SUNY Polytechnic Institute College of Engineering
Recent Submissions
-
Imaginator: A Text-To-Image Model Pipeline
This work presents a pipeline of three separate parts that create an image from a passage of text, whether from a book or some other form of media. It uses Gradio, a web-app framework, to combine these parts into one pipeline.[1] It also includes a way to generate a dataset of optimal Stable Diffusion prompts, using ChatGPT (gpt-3.5-turbo-1106), for fine-tuning or training.[2] Based on our research, this may be a first-of-its-kind dataset for the field. First, the pipeline uses PyTesseract (Tesseract OCR) and OpenCV to clean up the image and obtain plain text from an image of a book page or other written text. It then sends this plain text to a fine-tuned LLM based on long-t5-tglobal-xl-16384-book-summary, itself built on the LongT5 document-summarization architecture, fine-tuned to produce output that is friendly to Stable Diffusion.[12] This output can be characterized as a series of comma-separated tags or short descriptors. Finally, the output is sent to the last step in the pipeline, a Stable Diffusion model, specifically Stable Diffusion XL Turbo, which produces an image based on the summarized text.[15] In user testing, the result is fairly accurate to the original book passage. Because this is a first-of-its-kind project, there is no prior output to compare against.
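A minimal sketch of the OCR stage described above, assuming Tesseract is installed locally; the Otsu-threshold cleanup is one plausible preprocessing choice, not necessarily the project's exact one, and the file name is a placeholder.

```python
import cv2
import pytesseract

def page_to_text(image_path: str) -> str:
    """Read a photographed book page and return plain text."""
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Otsu thresholding flattens uneven lighting before OCR.
    _, clean = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return pytesseract.image_to_string(clean)

print(page_to_text("book_page.jpg"))  # placeholder path
```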
-
Face Recognition on Edge Devices
In the rapidly evolving landscape of face recognition technologies, this project addresses the pressing need to evaluate and optimize diverse face recognition models for deployment on edge devices. Spanning traditional methods such as Eigenfaces to contemporary deep learning approaches like VGG Face, DeepFace, and ArcFace, the study conducts a rigorous comparative analysis across various operational settings. The central focus is on elucidating the impact of Graphics Processing Units (GPUs) in enhancing model performance, particularly within the resource constraints inherent to edge devices. Empirical testing is carried out on the widely recognized Labeled Faces in the Wild (LFW) dataset. Using Python as the primary programming language, TensorFlow as the machine learning backbone, and the capabilities of Keras, OpenCV, and Adam Geitgey's Face Recognition library, the project assembles a versatile toolkit for model evaluation. Upon completion, this project is poised to contribute substantial insights into the deployability of face recognition models on edge devices, providing practical guidance for developers, engineers, and decision-makers. The anticipated outcomes encompass a nuanced understanding of the models' adaptability, their limitations, and the role of GPUs in enhancing performance. As technology converges toward decentralized computing paradigms, this study is positioned to play a pivotal role in shaping the landscape of face recognition on the edge.
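A minimal sketch of one evaluation path using Adam Geitgey's face_recognition library; the image file names are placeholders, and 0.6 is the library's default match threshold.

```python
import face_recognition

known = face_recognition.load_image_file("person_a.jpg")   # enrolled face
probe = face_recognition.load_image_file("unknown.jpg")    # face to verify

known_enc = face_recognition.face_encodings(known)[0]
probe_enc = face_recognition.face_encodings(probe)[0]

# Lower distance means a closer match.
distance = face_recognition.face_distance([known_enc], probe_enc)[0]
print("match" if distance < 0.6 else "no match", distance)
```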
-
Using Linear Regression and Machine Learning Techniques to Predict Housing Prices Based on Economic Factors
When trying to predict housing prices, most studies rely on data from a specific area and the features of the homes there. In this study, the goal is to use linear regression and machine learning to predict housing prices based on overarching economic factors. A mix of machine learning and linear regression methods was used, including TensorFlow Keras, OLS, Ridge, Lasso, Elastic Net, XGBoost, Random Forest, and SVM. Datasets featured include the Average Sales Price of Houses Sold for the United States, closing stock prices (NASDAQ, S&P), 30-year mortgage interest rates, average monthly rent, number of houses sold, number of houses constructed, mean family income, median family income, GDP, and unemployment rate.
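A minimal sketch of the regression setup on synthetic stand-in data; the column names and coefficients are illustrative, not the study's actual features or findings.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "mortgage_rate_30y": rng.uniform(3, 8, n),
    "gdp": rng.uniform(18e3, 25e3, n),
    "unemployment_rate": rng.uniform(3, 10, n),
})
# Synthetic target loosely tied to the features, for demonstration only.
df["avg_sales_price"] = (400 - 20 * df["mortgage_rate_30y"]
                         + 0.01 * df["gdp"] + rng.normal(0, 10, n))

X_train, X_test, y_train, y_test = train_test_split(
    df.drop(columns="avg_sales_price"), df["avg_sales_price"],
    test_size=0.2, shuffle=False)          # no shuffle: time-ordered data
model = Ridge(alpha=1.0).fit(X_train, y_train)
print("test MSE:", mean_squared_error(y_test, model.predict(X_test)))
```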
-
Implementing a NIDS System for Protecting Computer and Wireless Networks Using Various Machine Learning Approaches
In modern wireless networks, security is critical. With the ever-evolving attacks on wireless networks, both public and private, the use of Network Intrusion Detection Systems (NIDS) is at an all-time high. While NIDS is needed more than ever, its current security structure is starting to show signs of becoming obsolete. With the alarming rate of attacks in the modern digital space, NIDS needs a way to react effectively. Setting up a NIDS is too slow and can cause many issues when defending against these attacks, leaving wireless networks vulnerable. Three approaches to NIDS are often discussed: signature-based, anomaly-based, and hybrid. Implementing machine learning falls under an updated version of anomaly-based detection. The hope is that these techniques will allow new attacks to be caught without user interference. This paper discusses various forms of machine learning, NIDS, and the integration of the two. It surveys current options in machine learning NIDS and explains how they have evolved thus far and the advantages gained at each stage. While focusing mainly on the machine learning implementation of NIDS, the paper also touches briefly on how this implementation could strengthen current security in wireless networks.
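As one concrete illustration of anomaly-based detection, the sketch below fits an Isolation Forest (a technique chosen here for brevity; the paper surveys several) to baseline traffic and flags flows that deviate from it. Features and data are synthetic placeholders.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(1)
# Rows = flows; columns could be packet rate, mean packet size, duration.
normal_traffic = rng.normal(loc=[100, 500, 2.0], scale=[10, 50, 0.5],
                            size=(500, 3))
new_flows = np.array([[105, 510, 2.1],    # resembles baseline traffic
                      [900, 60, 0.01]])   # burst of tiny packets: suspicious

detector = IsolationForest(contamination=0.01, random_state=0)
detector.fit(normal_traffic)
print(detector.predict(new_flows))        # 1 = normal, -1 = flagged anomaly
```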
-
PixelatedGAN: A Generative Adversarial Network For Pixelated Images
This work presents a generative adversarial network which generates images in a pixelated output space. The results have utility both for more accurate training and generation when the input images are pixelated, and for creating uniquely, intelligently pixelated outputs when trained on non-pixelated input images. Pixelated images are used often in video games and art. They are also uniquely useful for image compression, since they lose no visual information when made smaller: at minimum, a pixelated image can be compressed to a quarter of its original size without losing any data. Several prior attempts have been made by researchers in generative AI to create a neural network that generates pixel art. However, those attempts focused more on the artistic value of images stylistically similar to pixelated images than on having the network create images that are properly pixelated.
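A minimal sketch of one way to make generator output pixelated by construction: synthesize at low resolution, then nearest-neighbor upsample so each generated value becomes a uniform block. This illustrates the idea, not the paper's actual architecture; all sizes are arbitrary.

```python
import torch
import torch.nn as nn

class PixelGenerator(nn.Module):
    def __init__(self, latent_dim: int = 64, block: int = 4):
        super().__init__()
        # Tiny stand-in for a real generator: latent -> 16x16 RGB image.
        self.net = nn.Sequential(nn.Linear(latent_dim, 16 * 16 * 3), nn.Tanh())
        # Each low-res pixel becomes a block x block square of identical
        # values, so the 64x64 output is properly pixelated by design.
        self.upsample = nn.Upsample(scale_factor=block, mode="nearest")

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        low_res = self.net(z).view(-1, 3, 16, 16)
        return self.upsample(low_res)

img = PixelGenerator()(torch.randn(1, 64))
print(img.shape)  # torch.Size([1, 3, 64, 64])
```

Downsampling such an output by the same factor recovers it exactly, which is the lossless-compression property noted above.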
-
Filter Design on Graphs
Graphs are a fundamental tool in computer science across areas such as artificial intelligence, machine learning, networking, signal processing, brain mapping, and social networks. Many of these graphs can have millions of nodes, so some form of filtering is usually in order to extract the useful information. The aim of this thesis is to develop the mathematical background and building blocks fundamental to designing these filters, and to lay out a clear blueprint for creating graph filters for undirected graphs. The two major approaches covered are filter design in the vertex domain and in the spectral domain. Filter design in the vertex domain acts on the Laplacian or adjacency matrices directly, while design in the spectral domain acts on the spectral properties of the graph. Some polynomial and rational filters are proposed in this thesis and applied to sample graphs to demonstrate their effectiveness. Further study could address directed graphs, a different variety of polynomial families, or the efficiency of computing these filters on larger graphs.
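A minimal sketch of vertex-domain filtering with a polynomial in the Laplacian, h(L) = h0 I + h1 L + h2 L^2, applied to a toy four-node path graph; the filter taps are arbitrary.

```python
import numpy as np

A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)  # adjacency of a path graph
L = np.diag(A.sum(axis=1)) - A              # combinatorial Laplacian

coeffs = [1.0, -0.5, 0.05]                  # filter taps h0, h1, h2
H = sum(h * np.linalg.matrix_power(L, k) for k, h in enumerate(coeffs))

x = np.array([1.0, 0.0, 0.0, 0.0])          # graph signal: impulse at node 0
print(H @ x)                                 # filtered signal
```

The same filter can equivalently be applied spectrally by acting on the eigenvalues of L, which is the second design approach the thesis develops.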
-
A Comparison Between Relational and Dimensional Model Techniques in a Business Intelligence Setting
Data modeling is the process of analyzing and defining the different data a business collects and produces, and the relationships between that data. [8] Today there are two prevalent database models: relational modeling and dimensional modeling. These models are used to connect the various tables in a database for data analysis. Most traditional businesses use a relational database to store all of their information, and when attempting to analyze data, the relational database is transformed into a dimensional one. Dimensional models are considered simpler and better suited to executing queries against. Dimensional modeling is the current industry standard when analyzing data with a business intelligence tool, but are there cases when the transformation step can be skipped and a developer can work with a relational model with minimal impact on development and user experience? This project evaluates various aspects, such as the complexity of each model, query execution time, and the time to update report elements in business intelligence tools in response to user interaction. It also determines whether there are cases where the conversion step from a relational to a dimensional model can be skipped and queries created against the relational model instead.
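A toy illustration of the difference in join depth, using pandas tables in place of database tables; all table and column names are invented for the example.

```python
import pandas as pd

orders = pd.DataFrame({"order_id": [1, 2], "cust_id": [10, 11], "amount": [50, 75]})
customers = pd.DataFrame({"cust_id": [10, 11], "region_id": [1, 2]})
regions = pd.DataFrame({"region_id": [1, 2], "region": ["East", "West"]})

# Relational model: chain joins through the normalized tables.
rel = orders.merge(customers, on="cust_id").merge(regions, on="region_id")

# Dimensional model: the dimension is pre-flattened, so one join suffices.
dim_customer = customers.merge(regions, on="region_id")
star = orders.merge(dim_customer, on="cust_id")

print(rel.groupby("region")["amount"].sum().equals(
      star.groupby("region")["amount"].sum()))  # True: same answer either way
```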
-
Fine-Grained Categorization Using a Mixture of Transfer Learning Networks
In this paper, we apply a mixture of experts approach to further enhance the accuracy of transfer learning networks on a fine-grained categorization problem, expanding on the work of Firsching and Hashem [4]. Mixture of experts approaches may help to improve accuracy on categorization problems. Likewise, transfer learning is a highly effective technique for solving machine learning problems of varying complexity. We illustrate here that mixtures of trained transfer learning networks, when applied properly, may further improve categorization accuracy.
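A minimal sketch of the general mixture-of-experts shape with two transfer-learning backbones and a softmax gate; the backbone choices, class count, and gate design are illustrative, not the paper's configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers

inp = layers.Input(shape=(224, 224, 3))
feat_a = tf.keras.applications.MobileNetV2(include_top=False, pooling="avg")(inp)
feat_b = tf.keras.applications.ResNet50(include_top=False, pooling="avg")(inp)

logits_a = layers.Dense(200)(feat_a)   # 200 fine-grained classes (placeholder)
logits_b = layers.Dense(200)(feat_b)

# Gating network: per-input soft weights over the two experts.
gate = layers.Dense(2, activation="softmax")(layers.Concatenate()([feat_a, feat_b]))
mixed = logits_a * gate[:, 0:1] + logits_b * gate[:, 1:2]

model = tf.keras.Model(inp, layers.Activation("softmax")(mixed))
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```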
-
A Bayesian Approach to Stock Trading
This project focuses on using probabilistic programming, and specifically the Bayesian approach, to devise an effective trading strategy. It does so by implementing a novel cointegration model for pairs trading using probabilistic programming. As opposed to the traditional and simpler frequentist approach to pair determination, I have implemented a more sophisticated Bayesian approach to pair trading. Pair trading is a market-neutral strategy that enables traders to profit from virtually any market condition: uptrend, downtrend, or sideways movement. It is characterized as a statistical arbitrage and convergence trading strategy. Pair trading combined with cointegration as the pairing criterion makes for a successful and reliable trading strategy. Unlike simpler frequentist cointegration tests, the Bayesian approach allows one to monitor the relationship between a pair of equities over time, and thus to follow pairs whose cointegration parameters change steadily or abruptly. Bayesian statistics also accounts for uncertainty in making predictions. It provides mathematical tools to update beliefs about random events in light of new data or evidence, and it can do so without requiring a large dataset. It interprets probability as a measure of believability or confidence that an individual might possess about the occurrence of a particular event, while keeping uncertainty in the equation. Along with a mean-reversion trading algorithm, this approach can be used as a viable trading strategy, open to further evaluation and risk management.
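A minimal sketch of the Bayesian ingredient: a probabilistic-programming regression of one price series on its partner, yielding a posterior over the hedge ratio rather than a point estimate. This static version (in PyMC, on synthetic prices) omits the time-varying cointegration tracking the project implements.

```python
import numpy as np
import pymc as pm

rng = np.random.default_rng(2)
x = np.cumsum(rng.normal(0, 1, 300)) + 100   # synthetic price of equity A
y = 1.5 * x + rng.normal(0, 2, 300)          # cointegrated partner B

with pm.Model():
    alpha = pm.Normal("alpha", 0, 10)
    beta = pm.Normal("beta", 0, 5)           # hedge ratio
    sigma = pm.HalfNormal("sigma", 5)
    pm.Normal("obs", mu=alpha + beta * x, sigma=sigma, observed=y)
    trace = pm.sample(1000, tune=1000, progressbar=False)

# The posterior spread of beta carries the uncertainty a point estimate hides.
print(trace.posterior["beta"].mean().item())
```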
-
Real-time Exercise Posture Correction Using Human Pose Detection Technique
Human pose detection is one of the fascinating research areas in computer vision, with many unsolved challenges. Detecting and capturing human activity is advantageous in many fields, such as sports analysis, human coordination tracking, and public surveillance. During the COVID-19 pandemic it became hard for society to access exercise facilities, and newcomers to fitness were left without the personal guidance they would usually get in a gym through one-on-one interaction. As these resources are not always available, human pose detection can stand in for a human personal trainer: a real-time exercise posture correction system, running on recorded videos or a real-time image stream, allows people to exercise safely at home while avoiding injuries. This project uses a pre-trained OpenPose Caffe model with two datasets, COCO and MPII, to correct one of the exercises common in the fitness industry. The project also discusses various pose estimation and keypoint detection techniques in detail, as well as the different deep learning models used for pose classification.
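A minimal sketch of the correction logic once keypoints are available: the angle at a joint (here, an elbow) decides whether the form is acceptable. The keypoint coordinates are placeholders standing in for OpenPose output, and the 160-degree threshold is arbitrary.

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle in degrees at point b, formed by segments b->a and b->c."""
    ba, bc = np.array(a) - np.array(b), np.array(c) - np.array(b)
    cos = np.dot(ba, bc) / (np.linalg.norm(ba) * np.linalg.norm(bc))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

shoulder, elbow, wrist = (120, 80), (140, 150), (135, 220)  # pixel coords
angle = joint_angle(shoulder, elbow, wrist)
print("good rep" if angle > 160 else f"straighten your arm ({angle:.0f} deg)")
```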
-
Stock Price Prediction Using Sentiment Analysis and LSTM
This work presents multiple Long Short-Term Memory (LSTM) neural networks used in conjunction with sentiment analysis to predict stock prices over time. Multiple datasets and input features are used with an LSTM model to determine which features produce the best predictions and whether the sentiment of posts correlates with the rise of a stock. The project applies embedding-based sentiment analysis to a dataset collected from Kaggle, which includes over one million posts made on the subreddit r/wallstreetbets. This subreddit recently came under fire in the media over the shorting of GameStop in the stock market: it was theorized that the subreddit was working as a collective to drive up the price of multiple stocks, thereby hurting large corporations such as hedge funds that held large short positions.
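A minimal sketch of the model shape: an LSTM fed sliding windows of [price, daily sentiment score] pairs and trained to predict the next-day price. The data here is random stand-in noise; window length and layer sizes are illustrative.

```python
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(3)
window, features = 30, 2                 # 30 days of [price, sentiment]
X = rng.normal(size=(500, window, features)).astype("float32")
y = rng.normal(size=(500, 1)).astype("float32")  # next-day price (stand-in)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(window, features)),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=2, verbose=0)
```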
-
Exploring Deep Learning for Vulnerability Detection in Smart Contracts
This project explores vulnerability detection in Solidity smart contracts. The report provides a brief overview of blockchain technology, smart-contract-specific vulnerabilities, and the tooling that exists to detect them. The application of deep learning as a vulnerability detection tool is explored in more detail. The result of this work is an LSTM trained to detect re-entrancy vulnerabilities in smart contracts. The model is trained on smart contracts identified and labeled in the ScrawlD dataset provided by Yashavant et al.
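A minimal sketch of the classifier shape: token IDs from a tokenized Solidity contract feed an embedding and an LSTM that outputs a re-entrancy probability. Vocabulary size, sequence length, and layer widths are placeholders, not the trained model's values.

```python
import tensorflow as tf

vocab_size, max_len = 5000, 512
model = tf.keras.Sequential([
    tf.keras.Input(shape=(max_len,)),
    tf.keras.layers.Embedding(vocab_size, 64),
    tf.keras.layers.LSTM(128),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # P(re-entrancy)
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```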
-
High Performance OFDM PHY in C++
Orthogonal Frequency Division Multiplexing (OFDM) is a popular modulation technique used in some of the most well-known waveforms today, such as 5G and Wi-Fi. Almost all waveform implementations are currently built in hardware and FPGA firmware because of the high performance these methods allow, with a significant trade-off being the time and cost of developing with them. Almost all software-based research and development is done in GNU Radio, which, while a very quick and easy environment to test with, has nowhere near the performance of a pure C++ implementation. This work investigates software optimization techniques that can be used in C++ to create quick, high-performance applications on general-purpose processors (GPPs), and sets a benchmark for what can be achieved on common platforms such as a laptop and a PC. The results show that sample rates and bandwidths of well over 1000 MHz can be achieved. To the best of my knowledge, the contributions presented in this paper have resulted in the highest-performing completely software-based OFDM PHY in terms of sample rate, bandwidth, and subcarrier count.
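For readers unfamiliar with the PHY itself, the sketch below shows what an OFDM modulator computes, in NumPy rather than the paper's optimized C++: map bits to QPSK symbols, take an IFFT, and prepend a cyclic prefix; the receiver inverts the steps. Subcarrier and prefix counts are arbitrary.

```python
import numpy as np

n_sc, cp_len = 64, 16                        # subcarriers, cyclic prefix
bits = np.random.randint(0, 2, 2 * n_sc)
qpsk = (1 - 2 * bits[0::2]) + 1j * (1 - 2 * bits[1::2])  # map bit pairs

time_domain = np.fft.ifft(qpsk)              # one OFDM symbol
tx = np.concatenate([time_domain[-cp_len:], time_domain])  # prepend CP

rx = np.fft.fft(tx[cp_len:])                 # receiver: drop CP, FFT
print(np.allclose(rx, qpsk))                 # True: symbols recovered
```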
-
Computer-To-Computer Based Communication Through Natural Language
While highly accurate and efficient computer-to-computer communication exists, communication between computer models via natural language is still worth exploring. This paper explores the creation of a "Speaker" model, which performs the image-to-text operation, and a "Listener" model, which performs the reverse text-to-image task. Together these models can form the basis of computer natural-language communication, not only in existing languages such as English but in completely new generated languages as well, with the help of the "Rambler" model, which combines the Speaker and Listener to perform the entire image-to-text-to-image process. By comparing the images on both sides of the process, the effectiveness of the communication can be measured. While natural-language-based computer communication will likely never be commonplace, it nonetheless poses some interesting and unique challenges.
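A toy sketch of the Rambler roundtrip, with trivial speak()/listen() placeholders standing in for the real image-to-text and text-to-image models; only the structure (image to text to image, then compare) reflects the paper.

```python
import numpy as np

def speak(image: np.ndarray) -> str:
    """Placeholder Speaker: quantize the image into a token string."""
    return " ".join(f"tok{int(v * 9)}" for v in image.flatten())

def listen(text: str) -> np.ndarray:
    """Placeholder Listener: rebuild an image from the tokens."""
    vals = [int(t[3:]) / 9 for t in text.split()]
    return np.array(vals).reshape(4, 4)

original = np.random.default_rng(4).random((4, 4))
reconstructed = listen(speak(original))      # the full roundtrip
print("roundtrip MSE:", np.mean((original - reconstructed) ** 2))
```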
-
Forecast of COVID-19 New Cases Using an MLP Regressor
Over the past year the COVID-19 pandemic has overwhelmed healthcare systems and government institutions worldwide, creating the need for accurate prediction of confirmed cases. Current research focuses on prediction methodologies aimed at mitigating economic anxiety and aiding in the detection, preemption, and forecasting of the pandemic. Modeling with artificial neural networks (ANNs) is an important component of this effort, and this research contributes a methodology to forecast new COVID-19 cases 1, 2, and 7 days ahead using historical case data. These approaches were compared and contrasted with published results in terms of network architecture and the neural network used. Three regressors were developed and showcased in this document, with accuracy evaluated using root mean squared error (RMSE) values ranging from 1179 to 2806. The quality evaluation supports the conclusion that these regressors compete with those of current architectures.
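A minimal sketch of the forecasting setup with scikit-learn's MLPRegressor: seven lagged daily counts predict the count one day ahead. The case series here is synthetic, and the architecture is illustrative rather than one of the paper's three regressors.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(5)
cases = np.abs(np.cumsum(rng.normal(50, 20, 400)))  # stand-in daily counts

lags = 7
X = np.array([cases[i:i + lags] for i in range(len(cases) - lags)])
y = cases[lags:]                                    # target: 1 day ahead

model = MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=1000,
                     random_state=0).fit(X[:300], y[:300])
pred = model.predict(X[300:])
print("RMSE:", np.sqrt(np.mean((pred - y[300:]) ** 2)))
```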
-
Face Recognition and Emotion Identification
While face recognition has been around in one form or another since the 1960s, recent technological developments have led to a wide proliferation of this technology. It is no longer something out of science fiction movies like Minority Report. With the release of the iPhone X, millions of people now literally have face recognition technology in the palms of their hands, protecting their data and personal information. While mobile phone access control might be the most recognizable use of face recognition, it is being employed for a wide range of use cases, including preventing crime, protecting events, and making air travel more convenient. This project uses advanced Python libraries such as OpenCV, scikit-learn, and face_recognition to improve face recognition accuracy. The project explores the data, trains models for further use, and evaluates the results on real-time video. It then examines emotion recognition algorithms using cv2 and Seaborn, highlighting areas of the human face according to different emotions. Large datasets (fer2013, Olivetti faces) are used for training and testing. PCA, leave-one-out cross-validation, grid search CV, machine learning pipelines, and CNN models are used to estimate and increase accuracy. The project is executed in an Anaconda environment with Jupyter Notebook; as the datasets are huge, Google Colaboratory is used for execution.
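A minimal sketch of one pipeline named above: PCA features into an SVM with a grid search, evaluated on the Olivetti faces (which scikit-learn can fetch directly). The component count and C grid are illustrative.

```python
from sklearn.datasets import fetch_olivetti_faces
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

faces = fetch_olivetti_faces()
X_train, X_test, y_train, y_test = train_test_split(
    faces.data, faces.target, stratify=faces.target, random_state=0)

pipe = Pipeline([("pca", PCA(n_components=100, whiten=True)),
                 ("svm", SVC())])
search = GridSearchCV(pipe, {"svm__C": [1, 10, 100]}, cv=5)
search.fit(X_train, y_train)
print("test accuracy:", search.score(X_test, y_test))
```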
-
Multicyclic Loss for Multidomain Image-to-Image Translation
GANs developed to translate an image's style between different domains often only address the initial translation, not the ability to translate an image further. This can cause issues if one wants to generate from an image and then change that image even more later on, creating a "gap" between the base images and the generated images. In this paper a Multicyclic Loss is presented, in which the neural network also trains on further translations of images that were already translated.
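A minimal sketch of the objective's shape, with G as a placeholder multidomain generator G(image, target_domain): alongside the standard one-hop cycle term, a multicyclic term penalizes reconstruction error after a second translation. This illustrates the idea, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def multicyclic_loss(G, x, d1, d2):
    """Cycle loss plus a term for translating an already-translated image."""
    once = G(x, d1)                        # base image moved to domain d1
    twice = G(once, d2)                    # translated image moved again
    cycle = F.l1_loss(G(once, "src"), x)   # standard cycle: one hop and back
    multi = F.l1_loss(G(twice, "src"), x)  # multicyclic: two hops and back
    return cycle + multi

G = lambda img, dom: img                   # identity stand-in, for demo only
x = torch.rand(1, 3, 64, 64)
print(multicyclic_loss(G, x, "night", "sketch"))  # tensor(0.)
```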
-
Text Detection from an Image
Recently, a variety of real-world applications have triggered a huge demand for techniques that can extract textual information from images and videos. Image text detection and recognition have therefore become active research topics in computer vision. The current trend in object detection and localization is to learn predictions with high-capacity deep neural networks trained on very large amounts of annotated data and using large amounts of processing power. In this project, I built an approach to text detection using object detection techniques, treating the text as objects. We use an object detection method, YOLO (You Only Look Once), to detect the text in images, framing detection as a regression problem to spatially separated bounding boxes and associated class probabilities. YOLO is a single neural network that predicts bounding boxes and class probabilities directly from full images in one evaluation. Since the whole detection pipeline is a single network, it can be optimized end-to-end directly on detection performance. The MobileNet pre-trained deep learning architecture was used and modified in different ways to find the best-performing model. The goal is to achieve high accuracy in text spotting. Experiments on the standard ICDAR 2015 dataset demonstrate that the proposed algorithm significantly outperforms comparable methods in terms of both accuracy and efficiency.
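A minimal sketch of the regression framing: each grid cell predicts box offsets and a confidence, which decode back to pixel coordinates. The numbers below are placeholders for one cell of a network's output, using YOLO's 7x7 grid on 448x448 inputs.

```python
grid_size, img_size = 7, 448
cell_row, cell_col = 3, 2
# (x, y) offsets within the cell, (w, h) relative to the image, confidence
tx, ty, tw, th, conf = 0.4, 0.7, 0.20, 0.10, 0.91

cx = (cell_col + tx) / grid_size * img_size   # box center x, in pixels
cy = (cell_row + ty) / grid_size * img_size   # box center y, in pixels
w, h = tw * img_size, th * img_size
if conf > 0.5:                                # keep confident detections only
    print(f"text box at ({cx:.0f}, {cy:.0f}), {w:.0f}x{h:.0f} px")
```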
-
Technology Case Study in Storage Area Networks
In today's world we need immediate access to data. The demand for networked data access has increased exponentially in the last 20 years, and with that demand the importance and volume of networked data have grown exponentially as well. The speed at which data can be accessed has increased, and data has moved from individual workstations to networked locations. Over the last decade there has been a trend to move mission-critical data away from individual workstations to a centralized data center, which removes the location constraint for accessing the data. If critical data is stored on individual servers, a failure will make the data inaccessible. Today, mission-critical applications are spanned over multiple servers for redundancy, and with this topology, keeping the data in a central location allows the individual servers to work with it more effectively. With the addition of virtualization, servers can be moved online from one physical host to another; if the data is centralized, it can be presented to all hosts in the cluster, allowing servers to move efficiently between hosts without losing access to critical data. Many businesses in industries such as finance, airlines, hospitals, and research depend on the speed and secure availability of their centralized data to function efficiently.
-
Non-Convex Optimization: RMSProp Based Optimization for Long Short-Term Memory Network
This project gives a comprehensive picture of non-convex optimization for deep learning and explains in detail the Long Short-Term Memory (LSTM) network and RMSProp. We start by illustrating the internal mechanisms of LSTM, such as the network structure and backpropagation through time (BPTT). We then introduce RMSProp optimization, along with relevant mathematical theorems and proofs, which give a clear picture of how the RMSProp algorithm helps escape saddle points. Finally, we train an LSTM with RMSProp in an experiment; the results demonstrate the method's efficiency and accuracy, and in particular how it beats traditional strategies in non-convex optimization.
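A minimal sketch of the RMSProp update on a toy non-convex function f(x) = x^4 - 3x^2: a running average of squared gradients rescales each step, which is what helps the optimizer keep moving through flat saddle regions. Hyperparameters are the usual textbook defaults.

```python
import numpy as np

def grad(x):                    # gradient of f(x) = x**4 - 3*x**2
    return 4 * x**3 - 6 * x

x, avg, lr, rho, eps = 2.0, 0.0, 0.01, 0.9, 1e-8
for _ in range(500):
    g = grad(x)
    avg = rho * avg + (1 - rho) * g**2       # running average of g^2
    x -= lr * g / (np.sqrt(avg) + eps)       # gradient step, rescaled
print(x)                                      # near the minimum at sqrt(1.5)
```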