• Data Mining: Privacy Preservation in Data Mining Using Perturbation Techniques

      Patel, Nikunjkumar; Sengupta, Sam; Adviser; Andriamanalimanana, Bruno; Reviewer; Novillo, Jorge; Reviewer (2015-05-06)
      In recent years, data mining has become an important player in determining future business strategies. Data mining helps identify patterns and trends in large amounts of data, which can be used to reduce cost, increase revenue, and more. With the increased use of data mining technologies and larger storage devices, the amount of data collected and stored has increased significantly. This data contains personal information such as credit card details and contact and residential information. These factors have made it essential to concentrate on the privacy of the data. To alleviate privacy concerns, a number of techniques have recently been proposed to perform data mining in a privacy-preserving way. This project briefly reviews various data mining models and explains perturbation techniques in detail. The main objective of this project is twofold: first, to preserve the accuracy of the data mining models and, second, to preserve the privacy of the original data. The discussion of transformation-invariant data mining models shows that multiplicative perturbations can theoretically guarantee zero loss of accuracy for a number of models.
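The claim about transformation-invariant models can be illustrated with a minimal sketch. It assumes the multiplicative perturbation is a rotation (an orthogonal matrix), which the abstract does not specify. The data, the angle, and the function names are hypothetical. Because a rotation preserves Euclidean distances, a distance-based model such as nearest-neighbor search gives the same answer on the perturbed records as on the originals.

```python
import math
import random

def rotate(points, theta):
    """Multiply each 2-D record by a rotation matrix (an orthogonal transform)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

def nearest_neighbor(points, query_index):
    """Index of the record closest (Euclidean) to points[query_index]."""
    q = points[query_index]
    candidates = [i for i in range(len(points)) if i != query_index]
    return min(candidates, key=lambda i: math.dist(points[i], q))

# Hypothetical two-attribute records standing in for sensitive data.
rng = random.Random(0)
original = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(20)]

# The angle plays the role of the data owner's secret; 1.234 is an arbitrary value.
perturbed = rotate(original, theta=1.234)

# Distance-based results should agree between original and perturbed data.
same_neighbors = all(
    nearest_neighbor(original, i) == nearest_neighbor(perturbed, i)
    for i in range(len(original))
)

# Largest change in any pairwise distance; only floating-point rounding remains.
max_distance_error = max(
    abs(math.dist(original[i], original[j]) - math.dist(perturbed[i], perturbed[j]))
    for i in range(len(original))
    for j in range(len(original))
)
```

The published attribute values differ from the originals, yet every pairwise distance is unchanged up to rounding, which is why models that depend only on distances lose no accuracy under this kind of perturbation.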
    • De-anonymizing Social Network Neighborhoods Using Auxiliary and Semantic Information

      Morgan, Steven Michael; Novillo, Jorge; Adviser; Andriamanalimanana, Bruno; Reviewer; Reale, Michael; Reviewer (2015-12-11)
      The increasing popularity of social networks and their progressively more robust uses provide an interesting intersection of data. Social graphs have been rigorously studied for de-anonymization. Users of social networks provide feedback on pages of interest and create vibrant profiles. In addition to user interests, textual analysis provides another feature set for users. The user profile can be viewed as a classical relational dataset in conjunction with graph data. This paper uses semantic information to improve the accuracy of de-anonymizing social network data.
    • Live Tweet Map with Sentimental Analysis

      Kotrika, Rohila; Chen-Fu Chiang; Reviewer; Saumendra, Sengupta; Advisor; Andriamanalimanana, Bruno; Reviewer (2016-05-01)
      This project aims to build a system for real-time analysis of trends and public views around the world by storing and analyzing the stream of tweets from the Twitter live API, which produces a huge amount of data. The tweets, tweet IDs, times, and other relevant elements are stored in a database and shown on a map that is updated in near real time with the help of the Google Maps API. The project also performs sentiment analysis by sending the tweets to a natural language processing API, which processes them and reports whether each tweet is positive, negative, or neutral. The map clusters tweets to show where people are tweeting from most, according to the sample tweets received from the streaming API. These clusters are shown in different colors according to the evaluation received from the sentiment API by Vivek Narayanan, which works by examining individual words and short sequences of words (n-grams) and comparing them with a probability model. The probability model is built on a pre-labeled test set of IMDb movie reviews. It can also detect negations in phrases; for example, the phrase "not bad" is classified as positive despite containing two individual words with a negative sentiment. The web service uses an event-based coroutine server, so the trained database can be loaded into shared memory for all requests, which makes it quite scalable and fast. The API supports batch calls so that network latency is not the main bottleneck. For instance, a tweet evaluated as negative is shown with a red marker on the map, a positive tweet with a green marker, and a neutral tweet with a grey marker. The application also presents a heat map of all tweets stored in the database, which shows which parts of the world most of the tweets come from.
In this project we create a dynamic web application with Apache Tomcat Server as the target runtime environment. The server is initialized with a context listener, which starts the code that loads tweets into the database and keeps it running until the server is stopped. The most popular worldwide and citywide trends are provided in a drop-down list, and selecting one gives a clear perspective on how that trend behaves. The application also offers the public, the media, politicians, and scholars a new and timely perspective on the dynamics of worldwide trends and public opinion.
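The sentiment-to-marker mapping described above can be sketched in a few lines. This is a toy illustration only: the score table, the function names, and the scoring rule are hypothetical stand-ins for the external sentiment API, which the abstract says uses a probability model. Only the three marker colors and the "not bad" example come from the abstract.

```python
# Hypothetical toy scores; the service in the abstract uses a trained probability model.
SCORES = {
    ("good",): 1.0,
    ("great",): 1.5,
    ("bad",): -1.0,
    ("not",): -0.5,
    ("not", "bad"): 1.0,    # the bigram overrides its two negative words
    ("not", "good"): -1.0,
}

# Marker colors as stated in the abstract.
MARKER_COLORS = {"positive": "green", "negative": "red", "neutral": "grey"}

def classify(text):
    """Label text by summing scores, preferring a known bigram over single words."""
    words = text.lower().split()
    total, i = 0.0, 0
    while i < len(words):
        bigram = tuple(words[i:i + 2])
        if len(bigram) == 2 and bigram in SCORES:
            total += SCORES[bigram]
            i += 2  # both words of the bigram are consumed
        else:
            total += SCORES.get((words[i],), 0.0)
            i += 1
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"

def marker_color(text):
    """Map a tweet's sentiment label to the color of its map marker."""
    return MARKER_COLORS[classify(text)]
```

For example, `marker_color("this is not bad")` yields a green marker because the bigram "not bad" is scored as a unit, while `marker_color("bad")` yields red and text with no scored words yields grey.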
    • Representational State Transfer as a Web Service

      Desai, Dhruv; Sengupta, Sam; Adviser; Novillo, Jorge; Reviewer; Andriamanalimanana, Bruno; Reviewer (2015-12-01)
      This report studies the Representational State Transfer (REST) architectural style and its usefulness for implementing web services. It highlights the differences between perceiving REST as an architectural style and as a web service. It also discusses web services in general and highlights important differences between the different web services in programming languages. The goal of this report is to clarify that REST is an architectural style that has proved to be a popular choice for implementing web services, rather than a web service itself, and to compare web services based on their performance in a Java application.