
Imaginator: A Text-To-Image Model Pipeline

Readers/Advisors
Reale, Michael, Ph.D., Confer, Amos, Dr., Chiang, Chen-fu, Dr.
Term and Year
Spring 2024
Publication Date
2023-12-22
Abstract
This work presents a pipeline of three separate parts that creates an image from a passage of text, whether from a book or some other form of media. It uses Gradio, a framework for hosting web apps, to combine these parts into one pipeline.[1] It also includes a way to generate a dataset of optimal Stable Diffusion prompts, using ChatGPT (gpt-3.5-turbo-1106), for fine-tuning or training.[2] Based on research, this may be the first dataset of its kind in the field. First, the pipeline uses PyTesseract (Tesseract OCR) and OpenCV to clean up an image of a book page, or other written text, and extract plain text from it. Then it sends this plain text to a fine-tuned LLM based on long-t5-tglobal-xl-16384-book-summary, which is itself based on the LongT5 document summarization model; the LLM is fine-tuned to produce output suited to Stable Diffusion.[12] This output is a series of tags or short descriptors separated by commas. It is then sent to the final step in the pipeline, a Stable Diffusion model, specifically Stable Diffusion XL Turbo, which produces an image based on the summarized text.[15] In user testing, the generated image was fairly accurate to the original book passage. Because of project limitations, and because this is a first-of-its-kind project, there is no comparable output to evaluate it against.
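The three stages described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code. The function names (`ocr_page`, `tidy_prompt`, `generate_image`, `run_pipeline`) are hypothetical, Otsu thresholding is an assumed stand-in for the image clean-up step, `tidy_prompt` is an assumed post-processing helper, and the summarizer is passed in as a callable because the project's fine-tuned checkpoint is not named here. The public `stabilityai/sdxl-turbo` checkpoint and a CUDA GPU are also assumed.

```python
from typing import Any, Callable, Tuple


def ocr_page(image_path: str) -> str:
    """Stage 1: clean up a page image with OpenCV, then read it with Tesseract."""
    import cv2  # provided by the opencv-python package
    import pytesseract  # requires the Tesseract binary to be installed

    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(f"could not read image: {image_path}")
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Assumption: Otsu binarization stands in for the project's clean-up step.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return pytesseract.image_to_string(binary)


def tidy_prompt(raw: str) -> str:
    """Hypothetical helper: trim tags, drop empty and repeated ones."""
    seen, tags = set(), []
    for tag in (part.strip() for part in raw.split(",")):
        if tag and tag.lower() not in seen:
            seen.add(tag.lower())
            tags.append(tag)
    return ", ".join(tags)


def generate_image(prompt: str) -> Any:
    """Stage 3: render the prompt with SDXL Turbo via diffusers (GPU assumed)."""
    import torch
    from diffusers import AutoPipelineForText2Image

    pipe = AutoPipelineForText2Image.from_pretrained(
        "stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16"
    ).to("cuda")
    # SDXL Turbo is intended to run with very few steps and guidance disabled.
    return pipe(prompt=prompt, num_inference_steps=1, guidance_scale=0.0).images[0]


def run_pipeline(
    image_path: str,
    ocr: Callable[[str], str],
    summarize: Callable[[str], str],
    generate: Callable[[str], Any],
) -> Tuple[str, Any]:
    """Chain the stages: page image -> plain text -> tag prompt -> image."""
    text = ocr(image_path)
    prompt = tidy_prompt(summarize(text))
    return prompt, generate(prompt)
```

With the real dependencies installed, a call might look like `run_pipeline("page.png", ocr_page, my_summarizer, generate_image)`, where `my_summarizer` is a placeholder for whatever wraps the fine-tuned LongT5 model. Passing the stages as callables keeps each one replaceable, so the chain can be exercised with simple stand-ins before any model is loaded.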