• Accessible Formal Methods: A Study of the Java Modeling Language

      Rawding, Michael; Andriamanalimanana, Bruno; Advisor; Spetka, Scott; Reviewer; Vishwanathan, Roopa; Reviewer (2017-04-17)
      While formal methods offer the highest level of confidence that software behaves as intended, they are notoriously difficult to use. The Java Modeling Language and the associated OpenJML tool aim to make formal specification and verification more accessible to Java developers. This report gives an overview of JML and assesses its current status and usability. Though many common Java features have been implemented, lack of standard library support is identified as an obstacle to using JML effectively. To help address that problem, this report documents the process of adding support for a new library to OpenJML.
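      As a hedged illustration of what JML specifications look like (this is a generic sketch, not code from the report), the example below annotates a hypothetical account class with an invariant, a precondition, and postconditions; OpenJML can check such //@ annotations statically or at runtime.

          public class Account {
              private /*@ spec_public @*/ int balance;

              //@ public invariant balance >= 0;

              //@ requires amount > 0 && amount <= balance;
              //@ ensures balance == \old(balance) - amount;
              //@ ensures \result == balance;
              public int withdraw(int amount) {
                  balance -= amount;
                  return balance;
              }
          }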
    • Aligning the SUNY Poly NCS Program with Nationally Recognized Accreditation

      Cook, John; Marsh, John; Adviser; Hash, Larry; Reviewer; Bull, Ronny; Reviewer (2015-01-29)
      This document explores what types of curriculum changes must be made to accommodate accreditation. Among program accrediting bodies, none is more authoritative or more appropriate than the Accreditation Board for Engineering and Technology (ABET). ABET’s requirements for accreditation define and delineate computing-related programs. On further exploration, it becomes clear that the Association for Computing Machinery (ACM) has driven the development of those definitions. The ACM further defines goals and objectives for these disciplines, as well as curriculum models. When reviewing other accreditations, not only are these ACM definitions recognized within them, but goal and outcome alignment is also present. This ‘goal and outcome’ methodology also appears in the institution-level accreditations that SUNY Poly must comply with. After reviewing the ACM program definitions and comparing them to the NCS program, it is concluded that NCS most closely resembles an ACM-defined IT program. This leads to the recommendation to adopt and align with the ACM IT program guidelines, which provide solutions to multiple program and institutional requirements and create a solid pathway to accreditation.
    • Applicability of the Julia Programming Language to Forward Error-Correction Coding in Digital Communications Systems

      Quinn, Ryan; Andriamanalimanana, Bruno R.; Advisor; Sengupta, Saumendra; Reviewer; Spetka, Scott; Reviewer (2018-05)
      Traditionally, software-defined radio (SDR) has been implemented in C and C++ for execution speed and processor efficiency. Interpreted and high-level languages were considered too slow to handle the challenges of digital signal processing (DSP). The Julia programming language is a new language developed for scientific and mathematical purposes that is intended to be written like Python or MATLAB yet execute like C or FORTRAN. Given the touted strengths of the Julia language, it was worth investigating whether it is suitable for DSP. This project specifically addresses the applicability of Julia to forward error correction (FEC), a highly mathematical topic to which Julia should be well suited. It was found that Julia offers many advantages over C/C++ for faithful implementations of FEC specifications, but the optimizations necessary to use FEC in real systems are likely to blunt this advantage during normal use. The Julia implementations generally achieved a 33% or greater reduction in the source lines of code (SLOC) required, yet Julia implementations of FEC algorithms generally ran at no more than one-third the speed of mature C/C++ implementations. While Julia has the potential to achieve the required performance for FEC, the optimizations required to do so will generally obscure the closeness of the implementation to the specification. At the current time it seems unlikely that Julia will pose a serious challenge to the dominance of C/C++ in the field of DSP.
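      For readers unfamiliar with FEC, the sketch below shows a classic Hamming(7,4) encoder and single-error-correcting decoder. It is an illustrative Java example only (not the report's Julia code), and the bit layout chosen here is just one common convention.

          public class Hamming74 {
              // Encode 4 data bits d[0..3] into a 7-bit codeword (positions 1..7).
              static int[] encode(int[] d) {
                  int p1 = d[0] ^ d[1] ^ d[3];
                  int p2 = d[0] ^ d[2] ^ d[3];
                  int p3 = d[1] ^ d[2] ^ d[3];
                  return new int[] { p1, p2, d[0], p3, d[1], d[2], d[3] };
              }

              // Correct at most one flipped bit in place and return the decoded data bits.
              static int[] decode(int[] c) {
                  int s1 = c[0] ^ c[2] ^ c[4] ^ c[6];   // checks positions 1,3,5,7
                  int s2 = c[1] ^ c[2] ^ c[5] ^ c[6];   // checks positions 2,3,6,7
                  int s3 = c[3] ^ c[4] ^ c[5] ^ c[6];   // checks positions 4,5,6,7
                  int syndrome = s1 + 2 * s2 + 4 * s3;  // 0 = no error, else 1-based error position
                  if (syndrome != 0) c[syndrome - 1] ^= 1;
                  return new int[] { c[2], c[4], c[5], c[6] };
              }

              public static void main(String[] args) {
                  int[] code = encode(new int[] { 1, 0, 1, 1 });
                  code[4] ^= 1;                          // flip one bit to simulate channel noise
                  System.out.println(java.util.Arrays.toString(decode(code))); // [1, 0, 1, 1]
              }
          }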
    • BGP Routing Protocol

      Parasa, Sai Kiran; Hash, Larry; Advisor (2016-08)
      Border Gateway Protocol (BGP) is the protocol that makes the Internet work. It is used at the service-provider level, between different autonomous systems (AS). An autonomous system is a network whose administration is controlled by a single organization. Routing within an autonomous system is called intra-AS routing, and routing between autonomous systems is called inter-AS routing. The routing protocols used within an autonomous system are called Interior Gateway Protocols (IGP), and the protocols used between autonomous systems are called Exterior Gateway Protocols (EGP). Routing Information Protocol (RIP), Open Shortest Path First (OSPF), and Enhanced Interior Gateway Routing Protocol (EIGRP) are examples of IGPs, while BGP is the standard EGP. Most routing protocols use a metric to calculate the best path over which to transfer routing information; rather than using a single metric, BGP uses a set of path attributes to select the best path. Once the best path is selected, BGP begins sending updates through the network, and every router running BGP installs this best path in its Routing Information Base. Only one best route is selected and advertised to the whole network. [17] Due to the tremendous increase in the size of the Internet and its user base, the protocol's convergence time after a link failure is very high.
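      The attribute-based path selection mentioned above can be sketched as a comparator. The following Java sketch is a hypothetical, simplified subset of the real decision process: it prefers higher local preference, then shorter AS path, then lower MED, whereas real BGP also evaluates weight, origin, eBGP vs. iBGP, IGP metric, router ID, and more.

          import java.util.Comparator;
          import java.util.List;

          public class BgpBestPath {
              // A simplified view of a BGP route and a few of its attributes.
              record Route(String prefix, int localPref, int asPathLength, int med, String nextHop) {}

              // Simplified best-path selection: higher LOCAL_PREF wins, then shorter AS_PATH, then lower MED.
              static Route bestPath(List<Route> candidates) {
                  return candidates.stream()
                          .min(Comparator.comparingInt((Route r) -> -r.localPref())
                                  .thenComparingInt(Route::asPathLength)
                                  .thenComparingInt(Route::med))
                          .orElseThrow();
              }

              public static void main(String[] args) {
                  List<Route> routes = List.of(
                          new Route("10.0.0.0/8", 100, 3, 50, "192.0.2.1"),
                          new Route("10.0.0.0/8", 200, 5, 10, "192.0.2.2"));
                  System.out.println(bestPath(routes)); // the LOCAL_PREF 200 route is installed
              }
          }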
    • Botnet Campaign Detection on Twitter

      Fields, Jeremy; Sengupta, Saumendra; Adviser; White, Joshua; Reviewer; Spetka, Scott; Reviewer (2016-08)
      The goal of this thesis is to investigate and analyze botnet activity on social media networks. We begin by creating an algorithm and scoring method for “likely bots,” and analyze them in conjunction with their neighboring messages to determine whether there is a likely group of bots, or botnet. Chapters 1 & 2 cover the overview of the work and previous research done by others. Multiple datasets were collected from Twitter over different time frames, including random samples and targeted topics. Chapters 3 & 4 cover the methodology and the results of applying the approach to these datasets. The method is shown to have high accuracy.
    • A Case Study on Apache HBase

      Nalla, Rohit Reddy; Sengupta, Sam; Adviser; Novillo, Jorge; Reviewer; Rezk, Mohamed; Reviewer (2015-05-16)
      Apache HBase is an open-source, non-relational, distributed database system built on top of HDFS (Hadoop Distributed File System). HBase was modeled after Google’s Bigtable, is written in Java, and was developed as part of Apache’s Hadoop project. It provides a fault-tolerant way of storing sparse data: small amounts of meaningful data scattered within large amounts of empty cells. HBase is used when real-time read/write access to very large datasets is required. The HBase project was started at the end of 2006 by Chad Walters and Jim Kellerman at Powerset. [2] The main purpose of HBase is to process large amounts of data. Mike Cafarella initially worked on the code of the working system, and Jim Kellerman later carried it to the next stage. HBase was first released as part of Hadoop 0.15.0 in October 2007 [2]. The project’s goal was to hold very large tables, on the order of billions of rows by millions of columns. In May 2010, HBase became an Apache Top-Level Project. Several companies, such as Adobe, Twitter, Yahoo, and Trend Micro, use this database, and social networking sites like Facebook have implemented their messenger applications using HBase. This document helps us understand how HBase works and how it differs from other databases. It highlights current challenges in data security, and a couple of models are proposed for securing data and controlling levels of data access to overcome those challenges. It also discusses workload challenges and techniques for overcoming them. Finally, an overview is given of how HBase has been implemented in a real-time application, the Facebook Messenger app.
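      To make the read/write access pattern concrete, the sketch below uses the standard HBase Java client to put and then get one cell; the table name, column family, and row key are hypothetical examples, not values from this case study.

          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.hbase.HBaseConfiguration;
          import org.apache.hadoop.hbase.TableName;
          import org.apache.hadoop.hbase.client.Connection;
          import org.apache.hadoop.hbase.client.ConnectionFactory;
          import org.apache.hadoop.hbase.client.Get;
          import org.apache.hadoop.hbase.client.Put;
          import org.apache.hadoop.hbase.client.Result;
          import org.apache.hadoop.hbase.client.Table;
          import org.apache.hadoop.hbase.util.Bytes;

          public class HBaseMessageExample {
              public static void main(String[] args) throws Exception {
                  Configuration conf = HBaseConfiguration.create();   // reads hbase-site.xml from the classpath
                  try (Connection conn = ConnectionFactory.createConnection(conf);
                       Table table = conn.getTable(TableName.valueOf("messages"))) {

                      // Write one cell: row key "user42#msg001", column family "d", qualifier "body".
                      Put put = new Put(Bytes.toBytes("user42#msg001"));
                      put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("body"), Bytes.toBytes("hello, hbase"));
                      table.put(put);

                      // Real-time random read of the same row by key.
                      Result result = table.get(new Get(Bytes.toBytes("user42#msg001")));
                      String body = Bytes.toString(result.getValue(Bytes.toBytes("d"), Bytes.toBytes("body")));
                      System.out.println(body);
                  }
              }
          }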
    • Comparison of Network Switch Architectures by CISCO

      Vemula, Veera Venkata Satyanarayana; Hash, Larry; Advisor (2016-02-01)
      This project compares two major switching architectures provided by Cisco. Cisco is a network device manufacturer that has contributed to the networking world by developing many networking protocols used to improve network performance and health. In this document the Catalyst and Nexus switching architectures are compared: the available features of each architecture are listed, and the operation of the supported protocols is explained in detail. The document also considers three network scenarios and explains in detail which architecture is best suited to each and why.
    • Data Mining and BI Data Warehousing Based Implementation for a Random Film Studio

      Bonthi, Sneha; Andriamanalimanana, Bruno; Adviser; Rezk, Mohamed; Reviewer; Reale, Michael; Reviewer (2016-12-01)
      The purpose of this report is to study a dataset of movies and analyse the possibility and feasibility of implementing a data warehousing or data mining application to improve analytics and decision making. The report describes how the raw data originating from data collection centres and box offices can be modelled and transformed into a specific format and structure that would help business analysts identify patterns and trends and take important business decisions. The report explores the benefits of extracting, transforming and loading this raw data into a dimensional model. According to the proposed implementation, one can create a reporting layer that performs aggregations, groups them by attributes such as date, genre, actor and country, and presents them through dashboards and reports to enable better decision making. This single point of data, which is the result of the data mining activity, can be shared, and brainstorming sessions can then be carried out to infer priceless market information and effectively utilize time and effort to maximize profits.
    • Data Mining: Privacy Preservation in Data Mining Using Perturbation Techniques

      Patel, Nikunjkumar; Sengupta, Sam; Adviser; Andriamanalimanana, Bruno; Reviewer; Novillo, Jorge; Reviewer (2015-05-06)
      In recent years, data mining has become an important player in determining future business strategies. Data mining helps identify patterns and trends in large amounts of data, which can be used to reduce cost, increase revenue, and more. With the increased use of various data mining technologies and larger storage devices, the amount of data collected and stored has increased significantly. This data contains personal information such as credit card details and contact and residential information. All these reasons have made it essential to concentrate on the privacy of the data. To alleviate privacy concerns, a number of techniques have recently been proposed to perform data mining in a privacy-preserving way. This project gives a brief overview of various data mining models and explains perturbation techniques in detail. The main objective of this project is twofold: first, preserve the accuracy of the data mining models, and second, preserve the privacy of the original data. The discussion of transformation-invariant data mining models shows that multiplicative perturbations can theoretically guarantee zero loss of accuracy for a number of models.
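      As a minimal sketch of the multiplicative-perturbation idea (assuming two-dimensional records and a single random rotation, which is only one of the techniques the project covers), rotating every record by the same secret angle preserves pairwise Euclidean distances, so distance-based mining models lose no accuracy while the published values no longer reveal the originals.

          import java.security.SecureRandom;
          import java.util.Arrays;

          public class RotationPerturbation {
              // Rotate each 2-D record by the same secret angle. Rotation is an orthogonal
              // transform, so pairwise Euclidean distances (and hence distance-based models
              // such as k-NN or k-means) are unaffected, while the published values differ
              // from the originals.
              static double[][] perturb(double[][] data, double theta) {
                  double cos = Math.cos(theta), sin = Math.sin(theta);
                  double[][] out = new double[data.length][2];
                  for (int i = 0; i < data.length; i++) {
                      out[i][0] = cos * data[i][0] - sin * data[i][1];
                      out[i][1] = sin * data[i][0] + cos * data[i][1];
                  }
                  return out;
              }

              public static void main(String[] args) {
                  double[][] original = { { 3.0, 4.0 }, { 6.0, 8.0 } };
                  double secretAngle = new SecureRandom().nextDouble() * 2 * Math.PI; // kept by the data owner
                  System.out.println(Arrays.deepToString(perturb(original, secretAngle)));
              }
          }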
    • De-anonymizing Social Network Neighborhoods Using Auxiliary and Semantic Information

      Morgan, Steven Michael; Novillo, Jorge; Adviser; Andriamanalimanana, Bruno; Reviewer; Reale, Michael; Reviewer (2015-12-11)
      The increasing popularity of social networks and their progressively more robust uses provide an interesting intersection of data. Social graphs have been rigorously studied for de-anonymization. Users of social networks provide feedback to pages of interest and create vibrant profiles. In addition to user interests, textual analysis provides another feature set for users. The user profile can be viewed as a classical relational dataset in conjunction with graph data. This paper uses semantic information to improve the accuracy of de-anonymizing social network data.
    • The Deep Space Network - A Technology Case Study and What Improvements to the Deep Space Network are Needed to Support Crewed Missions to Mars?

      Falke, Prasad; Hash, Larry; Advisor; Marsh, John; Reviewer; White, Joshua; Reviewer; Climek, David; Reviewer; Kwiat, Kevin; Reviewer (2017-05-28)
      The purpose of this thesis research is to find out what experts and interested people think about Deep Space Network (DSN) technology for future crewed Mars missions. The research document also addresses possible limitations that need to be fixed before any critical missions. The paper discusses issues such as data rates, hardware upgrades and new installation requirements and the budget they would demand, propagation delay, the need for dedicated antenna support for the mission, and security constraints. The Technology Case Study (TCS) and focused discussion help identify possible solutions and gauge what the community thinks about DSN technology. Public platforms such as Quora, Reddit, StackExchange, and the Facebook Mars Society group assisted in gathering technical answers from experts and individuals interested in this research.
    • Employee Collaboration in SharePoint

      Vempati, Sai Sandeep Soumithri; Chiang, Chen-Fu; Adviser; Novillo, Jorge; Reviewer; Rezk, Mohamed; Reviewer (2016-12-01)
      This project aims at developing a portal in SharePoint Online for a company’s internal needs, including a leave-request portal, a pre-sales dashboard, and a document-sharing list for employees. SharePoint Online is a web-based content management system (CMS) provided by Microsoft. Microsoft introduced SharePoint in 2001, and it was an instant success: it had all the features needed for storage and collaboration. SharePoint later evolved into two major versions, an on-premises version and a cloud version. The cloud version proved to be a feasible CMS for start-ups and small companies, and because SharePoint Online minimises the burden of server maintenance and administration, more companies have started using it. The utility of SharePoint has caught the attention of many companies lately; it has scaled up to 75,000 organisations serving 160 million users [8]. The use of SharePoint has led companies to develop portals that are interactive and act as platforms for collaboration and exchange of information, and the workflow automation provided by SharePoint helps simplify business process management. Web technologies can be used to develop the portal in a user-friendly and responsive manner. In this project, a portal is developed that has three main functionalities: a leave-application platform, a pre-sales dashboard, and a list that supports sharing of information. The leave-application feature is based on the workflow automation service provided by SharePoint, in which the user can request leave approval from the appropriate manager; the whole approval process is automated in the portal. The pre-sales dashboard presents project data that the pre-sales team of a company can use to develop reports; the data is shown in various forms suitable for easy understanding, using web parts in the dashboard. A list that demonstrates file approval is also included in the portal.
    • Enhancing the Effectiveness of Software Test Automation

      Jansing, David; Novillo, Jorge; Adviser; Cavallo, Roger; Reviewer; Spetka, Scott; Reviewer (2015-12-01)
      Effective software testing can save money and effort by catching problems before they make it very far through the software development process. It is known that the longer a defect remains undetected, the more expensive it is to fix. Testing is therefore a critical part of the development process. It can also be expensive and labor intensive, particularly when done by hand; it is estimated that testing consumes at least half of a project’s overall labor. Automation can make much of an organization’s testing more accurate and cheaper than merely putting several people in a room and having them run tests from a paper script. It also frees the testing staff to do more specific and in-depth testing than would otherwise be possible. This paper focuses mainly on software test automation techniques and how automation can enhance the efficiency of a software team as well as the quality of the final product.
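      As a small illustration of automated testing in Java (a generic sketch, not an example taken from this paper), the JUnit 5 tests below exercise a hypothetical discount-calculation method so that the checks run on every build instead of from a paper script.

          import static org.junit.jupiter.api.Assertions.assertEquals;
          import static org.junit.jupiter.api.Assertions.assertThrows;

          import org.junit.jupiter.api.Test;

          class PricingTest {
              // Hypothetical unit under test: applies a percentage discount to a price in cents.
              static int applyDiscount(int priceCents, int percent) {
                  if (percent < 0 || percent > 100) throw new IllegalArgumentException("bad percent");
                  return priceCents - (priceCents * percent) / 100;
              }

              @Test
              void tenPercentOffIsComputedCorrectly() {
                  assertEquals(900, applyDiscount(1000, 10));
              }

              @Test
              void invalidPercentIsRejected() {
                  assertThrows(IllegalArgumentException.class, () -> applyDiscount(1000, 150));
              }
          }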
    • A Genetic Algorithm for Locating Acceptable Structure Models of Systems (Reconstructability Analysis)

      Heath, Joshua; Cavallo, Roger; Advisor; Reale, Michael; Reviewer; Sengupta, Saumendra; Reviewer (2018-05)
      The emergence of the field of General Systems Theory (GST) can best be attributed to the belief that all systems, irrespective of context, share simple organizational principles capable of being mathematically modeled with any of many forms of abstraction. Structure modeling is a well-developed aspect of GST specializing in analyzing the structure of a system, that is, the interactions between the attributes of a system. These interactions, while intuitive in smaller systems, become increasingly difficult to comprehend as the number of measurable attributes of a system increases. To combat this, one may approach an overall system by analyzing its various subsystems and, potentially, reconstruct properties of that system using knowledge gained from considering a collection of these subsystems (a structure model). In situations where the overall system cannot be fully reconstructed from a given structure model, the benefits and detriments of using such a model should both be considered. For example, while a model may be simpler to understand, or require less storage space in memory than the system as a whole, not all information regarding that system may be inferable from the model. As systems grow in size, determining the acceptability of every meaningful structure model of a system in order to find the most acceptable one becomes exceedingly resource-intensive. In this thesis, a measure of the memory requirements associated with storing a system or a set of subsystems (a structure model) is defined and is used to define an objective measure of the acceptability of a structure as a representation of an overall system. A Genetic Algorithm for Locating Acceptable Structures (GALAS) is then outlined, with this acceptability criterion serving as an optimizable fitness function. The goal of this heuristic is to search the set of all meaningful structure models, without exhaustively generating each one, and produce those that are most acceptable based on predefined acceptability criteria.
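      A generic genetic-algorithm skeleton might look like the following Java sketch. This is an illustration only: GALAS's actual chromosome encoding and acceptability-based fitness are defined in the thesis, and the bit-counting fitness here is merely a placeholder.

          import java.util.Random;

          public class SimpleGa {
              static final int POP = 50, GENES = 20, GENERATIONS = 200;
              static final double MUTATION_RATE = 0.01;
              static final Random RNG = new Random();

              // Placeholder fitness: count of set bits. GALAS would instead score how
              // acceptable a candidate structure model is as a representation of the system.
              static int fitness(boolean[] chromosome) {
                  int score = 0;
                  for (boolean gene : chromosome) if (gene) score++;
                  return score;
              }

              // Tournament selection: the fitter of two random individuals survives.
              static boolean[] tournamentSelect(boolean[][] pop) {
                  boolean[] a = pop[RNG.nextInt(POP)], b = pop[RNG.nextInt(POP)];
                  return fitness(a) >= fitness(b) ? a : b;
              }

              public static void main(String[] args) {
                  boolean[][] pop = new boolean[POP][GENES];
                  for (boolean[] c : pop) for (int g = 0; g < GENES; g++) c[g] = RNG.nextBoolean();

                  for (int gen = 0; gen < GENERATIONS; gen++) {
                      boolean[][] next = new boolean[POP][GENES];
                      for (int i = 0; i < POP; i++) {
                          boolean[] p1 = tournamentSelect(pop), p2 = tournamentSelect(pop);
                          int cut = RNG.nextInt(GENES);                       // one-point crossover
                          for (int g = 0; g < GENES; g++) {
                              next[i][g] = g < cut ? p1[g] : p2[g];
                              if (RNG.nextDouble() < MUTATION_RATE) next[i][g] = !next[i][g]; // mutation
                          }
                      }
                      pop = next;
                  }
                  int best = 0;
                  for (boolean[] c : pop) best = Math.max(best, fitness(c));
                  System.out.println("Best fitness found: " + best);
              }
          }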
    • High Performance Distributed Big File Cloud Storage

      Shakelli, Anusha; Sengupta, Sam; Adviser; White, Joshua; Reviewer (2016-05-01)
      Cloud storage services are growing at a fast rate and are emerging in the data storage field. These services are used by people for backing up data and sharing files through social networks like Facebook [3] and Zing Me [2]. Users can upload data from a computer, mobile phone, or tablet, and also download it and share it with others; as a result, the system load in cloud storage becomes huge. Nowadays, cloud storage services have become a crucial requirement for many enterprises due to features like cost savings, performance, security, and flexibility. To design an efficient storage engine for cloud-based systems, it is always necessary to deal with requirements like big-file processing, lightweight metadata, deduplication, and high scalability. Here we suggest a big-file cloud architecture to handle these problems, proposing a scalable, distributed cloud storage system that supports big files with sizes up to several terabytes. In cloud storage the system load is usually heavy, and data deduplication is needed to reduce the storage space wasted by storing the same static data from different users. To solve these problems, a common method used in cloud storage is to divide a big file into small blocks, store them on disk, and then manage them using a metadata system [1], [6], [19], [20]. Current cloud storage services have complex metadata systems; consequently, the space complexity of the metadata system is O(n), which is not scalable for big files. In this research, a new big-file cloud storage architecture and a better solution to reduce the space complexity of metadata are suggested.
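      The block-splitting idea can be sketched as follows. This is a minimal illustration assuming a fixed block size and SHA-256 content hashes as deduplication keys; the architecture proposed in the report is more elaborate.

          import java.io.FileInputStream;
          import java.io.IOException;
          import java.security.MessageDigest;
          import java.security.NoSuchAlgorithmException;
          import java.util.ArrayList;
          import java.util.HexFormat;
          import java.util.List;

          public class FileChunker {
              static final int BLOCK_SIZE = 4 * 1024 * 1024; // 4 MiB fixed-size blocks

              // Split a file into fixed-size blocks and return one SHA-256 hash per block.
              // Blocks with identical hashes can be stored once (deduplication); the ordered
              // list of hashes is the lightweight metadata kept for the big file.
              static List<String> chunkHashes(String path) throws IOException, NoSuchAlgorithmException {
                  List<String> hashes = new ArrayList<>();
                  byte[] buffer = new byte[BLOCK_SIZE];
                  try (FileInputStream in = new FileInputStream(path)) {
                      int read;
                      while ((read = in.readNBytes(buffer, 0, BLOCK_SIZE)) > 0) {
                          MessageDigest sha = MessageDigest.getInstance("SHA-256");
                          sha.update(buffer, 0, read);
                          hashes.add(HexFormat.of().formatHex(sha.digest()));
                      }
                  }
                  return hashes;
              }

              public static void main(String[] args) throws Exception {
                  for (String h : chunkHashes(args[0])) System.out.println(h);
              }
          }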
    • Image Processing In F#

      Odoi, Kaia; Andriamanalimanana, Bruno; Advisor; Novillo, Jorge; Reviewer; Sengupta, Sam; Reviewer (2017-05-01)
      Image searching is an essential feature of many software applications. Histograms can be used to represent the pixel color intensities of images. Measuring the similarities between images by comparing the histograms can be performed through the use of information-theoretic measures, such as the Kullback-Leibler divergence and cross-entropy. In this project, a query image is selected from a collection of images and it is compared to the other images to determine which image is most similar to the query image. This process is carried out by creating histograms of each image, and then using measures such as the Kullback-Leibler divergence and cross-entropy to compare the histograms. The .NET functional language, F#, is used in the implementation of this project. The C# language, another .NET language, was also used for coding the graphical user interface.
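      A small Java sketch of the histogram comparison described above follows (the project itself uses F# and C#; the smoothing constant and the example histograms here are illustrative assumptions).

          public class HistogramCompare {
              // Kullback-Leibler divergence D(P || Q) between two normalized histograms.
              // A tiny epsilon guards against zero bins, which would otherwise make the
              // logarithm undefined; the choice of epsilon is an assumption of this sketch.
              static double klDivergence(double[] p, double[] q) {
                  double eps = 1e-10, d = 0.0;
                  for (int i = 0; i < p.length; i++) {
                      double pi = p[i] + eps, qi = q[i] + eps;
                      d += pi * Math.log(pi / qi);
                  }
                  return d;
              }

              // Cross-entropy H(P, Q) = H(P) + D(P || Q); lower values mean more similar histograms.
              static double crossEntropy(double[] p, double[] q) {
                  double eps = 1e-10, h = 0.0;
                  for (int i = 0; i < p.length; i++) {
                      h -= (p[i] + eps) * Math.log(q[i] + eps);
                  }
                  return h;
              }

              public static void main(String[] args) {
                  double[] query = { 0.2, 0.5, 0.3 };   // normalized color-intensity histogram of the query image
                  double[] other = { 0.25, 0.45, 0.3 };
                  System.out.println(klDivergence(query, other));
                  System.out.println(crossEntropy(query, other));
              }
          }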
    • An Inventory Management App in Salesforce

      Chennamaneni, Rahul Madhava Rao; Chiang, Chen-Fu; Adviser; Novillo, Jorge; Reviewer; Rezk, Mohamed; Reviewer (2016-12-01)
      Salesforce is a cloud-based customer relationship management (CRM) platform that accelerates business relationships and can transform the working lives of a team. Marc Benioff developed it in the late 1990s, and Forbes Magazine has named Salesforce the world's most innovative company for six consecutive years [1]. Unlike traditional CRM software, Salesforce is an internet service: it is available with just a sign-up and a log-in through a browser, and it is immediately usable. It is based on cloud computing, so customers can access it through the internet for their business needs without installing any traditional software [2]. Inventory management (IM) is the method of controlling and supervising the storage, utilization, and ordering of components so that an organization can keep track of the items it sells; it is the act of controlling and administering the quantities of products for sale. For a business, inventory is a main asset that represents an investment by the owner until the item is sold [3]. To demonstrate the functionality of Salesforce, I created an application for inventory management. The application has two parts: an administration part and a customer portal. The administrator manages the inventory and store operations, and the customer buys the products in the inventory through the customer portal.
    • Live Tweet Map with Sentimental Analysis

      Kotrika, Rohila; Sengupta, Saumendra; Advisor; Chiang, Chen-Fu; Reviewer; Andriamanalimanana, Bruno; Reviewer (2016-05-01)
      This project aims to build a system for real-time analysis of trends and public views around the world by storing and analyzing the stream of tweets from the Twitter live streaming API, which produces a huge amount of data. The tweets, tweet IDs, timestamps, and other relevant elements are stored in a database and represented on a map that is updated in near real time with the help of the Google Maps API. The project also performs sentiment analysis of the tweets by sending them to a natural language processing API, which processes the tweets and returns a result indicating whether they are positive, negative, or neutral in nature. The map clusters tweets to show where people are tweeting from most, according to the sample tweets received from the streaming API. These clusters are shown in different colors according to the sentiment evaluation received from the sentiment API by Vivek Narayanan, which works by examining individual words and short sequences of words (n-grams) and comparing them with a probability model. The probability model is built on a pre-labeled set of IMDb movie reviews. It can also detect negations in phrases; for example, the phrase "not bad" will be classified as positive despite having two individual words with a negative sentiment. The web service uses an event-based coroutine server so that the trained database can be loaded into shared memory for all requests, which makes it quite scalable and fast. The API supports batch calls so that network latency isn't the main bottleneck. For instance, a tweet evaluated as negative is shown as a red marker on the map, green for positive, and grey for neutral. The application also provides a heat map of all the tweets stored in the database, which gives a clear picture of which parts of the world most of the tweets come from. In this project we create a dynamic web application with Apache Tomcat as the target runtime environment. The server is initialized with a context listener that keeps loading tweets into the database until the server is stopped. The most popular worldwide and citywide trends are provided in a drop-down menu to select from, which gives a clear perspective on how each trend behaves. The system also offers the public, the media, politicians, and scholars a new and timely perspective on the dynamics of worldwide trends and public opinion.
    • NautiCode: Coding for Kids

      Zeo, Brittany; Mullick, Rosemary; Adviser; Sarner, Ronald; Reviewer; Urban, Christopher; Reviewer (2016-05-08)
      Throughout my college career, I have asked students what made them decide to major in Computer Science. The answers I received very seldom revealed previous coding experience. Unfortunately, this is the system: you don’t know what you want to major in, so you choose something that looks interesting and hope it works out for the best. Fortunately for me, I had four years of programming experience in classes before reaching college, as well as being a programmer on my high school’s FIRST Robotics team. This previous exposure to coding allowed me to make an educated decision about what I wanted to major in. Not every individual gets this experience, and I want to change that. For my Master’s Project, I decided to build a website to get kids to learn and practice some basic concepts of coding: NautiCode. My target audience is mid to upper elementary school children, and best of all, no previous coding experience is needed when using NautiCode. Even if Computer Science is not their career choice, they can have the exposure at an early age. Coding does not only benefit computer scientists; just having background knowledge of concepts such as logic, data storage, and how things relate can be beneficial to an individual in any major. These ideas can help individuals think about problems differently and come up with solutions that would not have been possible had they not been exposed to computer science concepts. What better time in an individual’s life to introduce these concepts than childhood? Children’s brains are magnificent: they can absorb so much information, and they think differently about the world, which leads to creative solutions and new perspectives. What I aim to do with NautiCode is to get children thinking in new ways, tap into their creativity, and spark new ideas. I aim to explain the simple concepts in an introduction and gradually work up toward more difficult problems. Children are more capable than they know, and with a little guidance they can start creating their own technologies in no time. NautiCode is a fully functional website that I created on my own. The front end uses SCSS and HTML5, while the back end uses PHP, SQL, JavaScript, and AJAX. The databases are hosted locally through phpMyAdmin and MAMP.
    • New Techniques for Public Key Encryption with Sender Recovery

      Godi, Murali; Viswanathan, Roopa; Adviser; Novillo, Jorge; Reviewer; Chiang, Chen-Fu; Reviewer (2016-12-15)
      In this paper, we consider a situation where a sender transmits a ciphertext to a receiver using a public-key encryption scheme, and at a later point in time wants to retrieve the plaintext, without having to request the receiver’s help in decrypting the ciphertext and without having to store a set of plaintext/ciphertext pairs for every receiver the sender interacts with. This problem, known as public key encryption with sender recovery, has intuitive solutions based on KEM/DEM schemes. We propose a KEM/DEM-based solution that is CCA-secure, only requires the receiver to be equipped with a public/secret key pair (the sender needs only a symmetric recovery key), and has much simpler proofs compared to prior work in this area. We prove our protocols secure in the single-receiver and multi-receiver settings. To achieve our goals, we use an analysis technique called plaintext randomization that results in greatly simplified and intuitive proofs for protocols that use a PKE internally as a component and compose the PKE with other primitives. We instantiate our protocol for public key encryption with sender recovery with the well-known KEM/DEM scheme due to Cramer and Shoup.
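      The KEM/DEM pattern underlying the construction can be illustrated with standard Java cryptography. The sketch below is a generic hybrid-encryption example under assumed algorithm choices (RSA-OAEP and AES-GCM), not the paper's sender-recovery scheme or its Cramer-Shoup instantiation: a fresh AES key encrypts the message (DEM), and the receiver's RSA public key encapsulates that AES key (KEM).

          import java.nio.charset.StandardCharsets;
          import java.security.KeyPair;
          import java.security.KeyPairGenerator;
          import java.security.SecureRandom;
          import javax.crypto.Cipher;
          import javax.crypto.KeyGenerator;
          import javax.crypto.SecretKey;
          import javax.crypto.spec.GCMParameterSpec;

          public class HybridEncryptionSketch {
              public static void main(String[] args) throws Exception {
                  // Receiver's long-term key pair (KEM side).
                  KeyPairGenerator kpg = KeyPairGenerator.getInstance("RSA");
                  kpg.initialize(2048);
                  KeyPair receiver = kpg.generateKeyPair();

                  // DEM: encrypt the message under a fresh symmetric data-encryption key.
                  KeyGenerator kg = KeyGenerator.getInstance("AES");
                  kg.init(256);
                  SecretKey dek = kg.generateKey();
                  byte[] iv = new byte[12];
                  new SecureRandom().nextBytes(iv);
                  Cipher dem = Cipher.getInstance("AES/GCM/NoPadding");
                  dem.init(Cipher.ENCRYPT_MODE, dek, new GCMParameterSpec(128, iv));
                  byte[] ciphertext = dem.doFinal("example plaintext".getBytes(StandardCharsets.UTF_8));

                  // KEM: encapsulate the symmetric key under the receiver's public key.
                  Cipher kem = Cipher.getInstance("RSA/ECB/OAEPWithSHA-256AndMGF1Padding");
                  kem.init(Cipher.WRAP_MODE, receiver.getPublic());
                  byte[] encapsulatedKey = kem.wrap(dek);

                  // The sender transmits (encapsulatedKey, iv, ciphertext) to the receiver.
                  System.out.println(ciphertext.length + " ciphertext bytes, "
                          + encapsulatedKey.length + " encapsulated-key bytes");
              }
          }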