    • Data Mining and Bi Data Warehousing Based Implementation for a Random Film Studio

      Bonthi, Sneha; Andriamanalimanana, Bruno; Adviser; Rezk, Mohamed; Reviewer; Reale, Michael; Reviewer (2016-12-01)
      The purpose of this report is to study a dataset of movies and analyse the possibility and feasibility of implementing a data warehousing or data mining application to improve analytics and decision making. The report describes how raw data originating from data collection centres and box offices can be modelled and transformed into a specific format and structure that helps business analysts identify patterns and trends and so make important business decisions. The report explores the benefits of extracting, transforming and loading this raw data into a dimensional model. Under the proposed implementation, one can create a reporting layer that performs aggregations, groups them by attributes such as date, genre, actor and country, and presents them in dashboards and reports to enable better decision making. This single point of data, which is the result of the data mining activity, can be shared, and brainstorming sessions can then be carried out to infer priceless market information and use time and effort effectively to maximize profits.
    • De-anonymizing Social Network Neighborhoods Using Auxiliary and Semantic Information

      Morgan, Steven Michael; Novillo, Jorge; Adviser; Andriamanalimanana, Bruno; Reviewer; Reale, Michael; Reviewer (2015-12-11)
      The increasing popularity of social networks and their progressively more robust uses provide an interesting intersection of data. Social graphs have been rigorously studied for de-anonymization. Users of social networks provide feedback on pages of interest and create a vibrant profile. In addition to user interests, textual analysis provides another feature set for users. The user profile can be viewed as a classical relational dataset in conjunction with graph data. This paper uses semantic information to improve the accuracy of de-anonymizing social network data.
    • A Genetic Algorithm for Locating Acceptable Structure Models of Systems (Reconstructability Analysis)

      Heath, Joshua; Cavallo, Roger; Advisor; Reale, Michael; Reviewer; Sengupta, Saumendra; Reviewer (2018-05)
      The emergence of the field of General Systems Theory (GST) can best be attributed to the belief that all systems, irrespective of context, share simple organizational principles capable of being mathematically modeled with any of many forms of abstraction. Structure modeling is a well-developed aspect of GST specializing in analyzing the structure of a system, that is, the interactions between the attributes of a system. These interactions, while intuitive in smaller systems, become increasingly difficult to comprehend as the number of measurable attributes of a system increases. To combat this, one may approach an overall system by analyzing its various subsystems and, potentially, reconstruct properties of that system using knowledge gained from considering a collection of these subsystems (a structure model). In situations where the overall system cannot be fully reconstructed based on a given structure model, the benefits and detriments associated with using such a model should both be considered. For example, while a model may be simpler to understand, or require less storage space in memory than the system as a whole, not all information regarding that system may be inferable from that model. As systems grow in size, determining the acceptability of every meaningful structure model of a system in order to find the most acceptable becomes exceedingly resource-intensive. In this thesis, a measure of the memory requirements associated with storing a system or a set of subsystems (a structure model) is defined and is used in defining an objective measure of the acceptability of a structure as a representation of an overall system. A Genetic Algorithm for Locating Acceptable Structures (GALAS) is then outlined, with this acceptability criterion serving as an optimizable fitness function. The goal of this heuristic is to search the set of all meaningful structure models, without the need for exhaustively generating each, and produce those that are the most acceptable, based on predefined acceptability criteria.
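
The general idea of searching a space of candidate models with a genetic algorithm, rather than enumerating every candidate, can be illustrated with a minimal generic sketch. This is not GALAS and does not reproduce the thesis's acceptability measure. The bitstring encoding (one bit per hypothetical subsystem), the `INFO_GAIN` and `MEMORY_COST` numbers, and the toy `fitness` function are all assumptions made purely for illustration.

```python
import random

# Hypothetical per-subsystem scores; illustrative numbers only, not from the thesis.
INFO_GAIN = [5, 1, 4, 2, 6, 1, 3, 2]
MEMORY_COST = [2, 3, 1, 4, 2, 2, 5, 1]


def fitness(candidate):
    """Toy stand-in for an acceptability criterion: gain minus memory cost."""
    return sum(g - c for bit, g, c in zip(candidate, INFO_GAIN, MEMORY_COST) if bit)


def tournament(population, rng, k=3):
    """Return the fittest of k randomly chosen candidates."""
    return max(rng.sample(population, k), key=fitness)


def crossover(a, b, rng):
    """One-point crossover of two parent bitstrings."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(candidate, rng, rate=0.1):
    """Flip each bit independently with probability `rate`."""
    return tuple(bit ^ 1 if rng.random() < rate else bit for bit in candidate)


def run_ga(pop_size=20, generations=40, seed=0):
    """Evolve bitstrings and return the best candidate in the final population."""
    rng = random.Random(seed)
    n = len(INFO_GAIN)
    population = [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(pop_size)]
    for _ in range(generations):
        elite = max(population, key=fitness)  # elitism: carry the best over unchanged
        children = [
            mutate(crossover(tournament(population, rng),
                             tournament(population, rng), rng), rng)
            for _ in range(pop_size - 1)
        ]
        population = [elite] + children
    return max(population, key=fitness)


best = run_ga()
print(best, fitness(best))
```

Because the elite candidate is carried over unchanged each generation, the best fitness in the population never decreases. A real acceptability criterion would replace `fitness`, and a real encoding would need to exclude structure models that are not meaningful.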