Imaginator: A Text-To-Image Model Pipeline
dc.contributor.author | Horton, Brandon H. | |
dc.date.accessioned | 2024-10-21T18:40:28Z | |
dc.date.available | 2024-10-21T18:40:28Z | |
dc.date.issued | 2023-12-22 | |
dc.identifier.uri | http://hdl.handle.net/20.500.12648/15609 | |
dc.description.abstract | This work presents a pipeline of three separate parts that creates an image from a passage of text, whether from a book or some other form of media. It uses Gradio, a web-app hosting framework, to combine these parts into one pipeline.[1] It also includes a way to generate a dataset of optimized Stable Diffusion prompts, using ChatGPT (gpt-3.5-turbo-1106), for the purposes of fine-tuning or training.[2] Based on research, this may be a first-of-its-kind dataset for the field. First, the pipeline uses PyTesseract (Tesseract OCR) and OpenCV to clean up an image of a book page, or other written text, and extract plain text from it. Then, the pipeline sends this plain text to a fine-tuned LLM based on long-t5-tglobal-xl-16384-book-summary, itself built on the LongT5 document summarization architecture, which is fine-tuned to produce output suited to Stable Diffusion.[12] This output consists of a series of tags or short descriptors separated by commas. Once this output is produced, it is sent to the final step in the pipeline, a Stable Diffusion model, specifically Stable Diffusion XL Turbo, which produces an image based on the summarized text.[15] In user testing, the generated images were fairly accurate to the original book passage. Because of project limitations, and because this is a first-of-its-kind project, there is no comparable output against which to evaluate it. | en_US |
dc.language.iso | N/A | en_US |
dc.publisher | SUNY Polytechnic Institute | en_US |
dc.subject | Gradio | en_US |
dc.subject | artificial intelligence (AI) | en_US |
dc.subject | text-to-image generation model | en_US |
dc.subject | deep learning techniques | en_US |
dc.subject | multi-document summarization | en_US |
dc.subject | Optical Character Recognition (OCR) | en_US |
dc.subject | Large Language Model (LLM) | en_US |
dc.subject | Stable diffusion | en_US |
dc.title | Imaginator: A Text-To-Image Model Pipeline | en_US |
dc.title.alternative | A Project: Submitted to the Graduate Faculty of the State University of New York Polytechnic Institute in Partial Fulfillment of the Requirements for the Degree of Master of Science | en_US |
dc.type | Masters Project | en_US |
dc.description.version | NA | en_US |
refterms.dateFOA | 2024-10-21T18:40:29Z | |
dc.description.institution | SUNY Polytechnic Institute | en_US |
dc.description.department | College of Engineering, Department of Computer & Information Science | en_US |
dc.description.degreelevel | MS | en_US |
dc.description.advisor | Reale, Michael, Ph.D. | |
dc.description.advisor | Confer, Amos, Dr. | |
dc.description.advisor | Chiang, Chen-fu, Dr. | |
dc.date.semester | Spring 2024 | en_US |
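The three-stage flow described in the abstract (OCR, then prompt generation, then image generation) can be sketched as a chain of pluggable stages. This is a minimal illustration, not the project's code: the names `run_pipeline` and `normalize_prompt` and the stage signatures are hypothetical, and the stubs stand in for real models. In practice the stages might wrap an OCR call such as pytesseract's `image_to_string`, the fine-tuned LongT5 summarizer, and a Stable Diffusion XL Turbo model.

```python
from typing import Callable, List

# Hypothetical stage signatures (assumptions; the record does not
# describe the project's actual interfaces).
OcrStage = Callable[[bytes], str]      # page image -> plain text
PromptStage = Callable[[str], str]     # plain text -> raw tag string
ImageStage = Callable[[str], bytes]    # prompt -> encoded image


def normalize_prompt(raw: str, max_tags: int = 20) -> str:
    """Turn raw model output into a deduplicated, comma-separated tag list."""
    seen = set()
    tags: List[str] = []
    for piece in raw.replace("\n", ",").split(","):
        tag = " ".join(piece.split())  # collapse internal whitespace
        key = tag.lower()
        if tag and key not in seen:    # skip empties and case-insensitive repeats
            seen.add(key)
            tags.append(tag)
    return ", ".join(tags[:max_tags])


def run_pipeline(
    page_image: bytes,
    ocr: OcrStage,
    to_prompt: PromptStage,
    to_image: ImageStage,
) -> bytes:
    """Run OCR, prompt generation, and image generation in sequence."""
    text = ocr(page_image)
    prompt = normalize_prompt(to_prompt(text))
    return to_image(prompt)


if __name__ == "__main__":
    # Stub stages standing in for the real models.
    result = run_pipeline(
        b"a castle on a hill",
        ocr=lambda img: img.decode("utf-8"),
        to_prompt=lambda text: text.replace(" on ", ", "),
        to_image=lambda prompt: prompt.encode("utf-8"),
    )
    print(result)
```

Passing the stages in as callables keeps each model swappable and lets the flow be exercised with stubs before any model weights are loaded.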
This item appears in the following Collection(s)
- SUNY Polytechnic Institute College of Engineering
This collection contains master's theses, capstone projects, and other student and faculty work from programs within the Department of Engineering, including computer science and network security.