
dc.contributor.author | Horton, Brandon H.
dc.date.accessioned | 2024-10-21T18:40:28Z
dc.date.available | 2024-10-21T18:40:28Z
dc.date.issued | 2023-12-22
dc.identifier.uri | http://hdl.handle.net/20.500.12648/15609
dc.description.abstract | This work presents a pipeline of three separate parts that creates an image from a passage of text, whether from a book or some other form of media. It uses Gradio, a web-app-based hosting program, to combine these parts into one pipeline.[1] It also includes a way to generate a dataset of optimal Stable Diffusion prompts, using chatgptv3.5-turbo-1106, for the purposes of fine-tuning or training.[2] Based on research, this may be a first-of-a-kind dataset for the field. First, the pipeline uses PyTesseract (TesseractOCR) and opencv2 to clean up the image and obtain plain text from an image of a book page or other written text. Then, the pipeline sends this plain text to a fine-tuned LLM based on long-t5-tglobal-xl-16384-book-summary, which is in turn based on the LongT5 document summarization model type, fine-tuned to produce output that is friendly for Stable Diffusion.[12] This output can be characterized as a series of tags or short descriptors separated by commas. Once this output is produced, it is sent to the final step in the pipeline, a Stable Diffusion model, specifically Stable Diffusion XL Turbo, which produces an image based on the summarized text.[15] In user testing, the generated image is fairly accurate to the original book passage. Because of limitations, and because this is a first-of-a-kind project, there is no existing output to compare it against. | en_US
dc.language.iso | N/A | en_US
dc.publisher | SUNY Polytechnic Institute | en_US
dc.subject | Gradio | en_US
dc.subject | artificial intelligence (AI) | en_US
dc.subject | text-to-image generation model | en_US
dc.subject | deep learning techniques | en_US
dc.subject | multi-document summarization | en_US
dc.subject | Optical Character Recognition (OCR) | en_US
dc.subject | Large Language Model (LLM) | en_US
dc.subject | Stable diffusion | en_US
dc.title | Imaginator: A Text-To-Image Model Pipeline | en_US
dc.title.alternative | A Project: Submitted to the Graduate Faculty of the State University of New York Polytechnic Institute in Partial Fulfillment of the Requirements for the Degree of Master of Science | en_US
dc.type | Masters Project | en_US
dc.description.version | NA | en_US
refterms.dateFOA | 2024-10-21T18:40:29Z
dc.description.institution | SUNY Polytechnic Institute | en_US
dc.description.department | College of Engineering, Department of Computer & Information Science | en_US
dc.description.degreelevel | MS | en_US
dc.description.advisor | Reale, Michael, Ph.D.
dc.description.advisor | Confer, Amos, Dr.
dc.description.advisor | Chiang, Chen-fu, Dr.
dc.date.semester | Spring 2024 | en_US
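
The three-stage flow described in the abstract (OCR, then summarization into comma-separated tags, then image generation) can be pictured with a minimal sketch. This is not the project's code: the class name `TextToImagePipeline`, the helper `to_tag_prompt`, and the `max_tags` parameter are hypothetical names introduced here for illustration, and the three stages are passed in as plain callables so that the sketch runs without any OCR or model libraries installed.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


def to_tag_prompt(summary: str, max_tags: int = 20) -> str:
    """Normalize summarizer output into a comma-separated tag prompt.

    Hypothetical helper: trims whitespace, drops empty entries, and removes
    case-insensitive duplicates while preserving the original order.
    """
    seen = set()
    tags = []
    for raw in summary.replace("\n", ",").split(","):
        tag = " ".join(raw.split())  # collapse runs of whitespace
        key = tag.lower()
        if tag and key not in seen:
            seen.add(key)
            tags.append(tag)
    return ", ".join(tags[:max_tags])


@dataclass
class TextToImagePipeline:
    """Hypothetical wrapper chaining the three stages named in the abstract."""

    ocr: Callable[[Any], str]        # page image -> plain text
    summarize: Callable[[str], str]  # plain text -> tag-style summary
    generate: Callable[[str], Any]   # prompt -> image object

    def run(self, page_image: Any) -> Tuple[str, Any]:
        text = self.ocr(page_image)
        prompt = to_tag_prompt(self.summarize(text))
        return prompt, self.generate(prompt)


if __name__ == "__main__":
    # Stub stages stand in for the real OCR, LLM, and diffusion models.
    pipeline = TextToImagePipeline(
        ocr=lambda image: "The old lighthouse stood above a stormy sea.",
        summarize=lambda text: "old lighthouse, stormy sea,\nOld Lighthouse, dramatic sky",
        generate=lambda prompt: {"rendered_from": prompt},
    )
    prompt, image = pipeline.run(b"fake-image-bytes")
    print(prompt)
```

In a real deployment the three callables would presumably wrap an OCR call, the fine-tuned summarization model, and the Stable Diffusion model, with a Gradio interface invoking `run`; how the project actually wires these together is not stated in this record, so that part is an assumption.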


Files in this item

Name: Masters_Project_Brandon_Horton ...
Size: 16.41Mb
Format: PDF

Name: Brandon_Horton_Library_Release ...
Size: 162.2Kb
Format: PDF

This item appears in the following Collection(s)

  • SUNY Polytechnic Institute College of Engineering
    This collection contains master's theses, capstone projects, and other student and faculty work from programs within the Department of Engineering, including computer science and network security.
