Computer Science

Big Data Analytics at Oklahoma State University -- Projects

NSF Research Experience for Undergraduates (REU) program

Oklahoma State University

Computer Science Department

Stillwater, OK

Research Projects

  • Attacks on Privacy-Enhancing Technologies
  • Adversarial Machine Learning
  • Well Drilling Speed Prediction
  • Visual Data Analytics
  • Twitter Data Analytics
  • Intelligent Scene Understanding from Imagery and Video Big Data

Project Descriptions

    Title: Attacks on Privacy-Enhancing Technologies

    Description: Even though encryption protects the contents of communication between a client and a server on the Internet, an adversary can still collect metadata such as the size and number of packets sent and received, and use it to infer which website the client is visiting. This type of attack is called a website fingerprinting attack. Participants in this project will collect real web data, parse it using standard statistical tools, and analyze the accuracy of such an attack using machine learning and other statistical techniques.
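
    As a rough illustration of the idea, the sketch below guesses which site produced an encrypted trace using only packet counts and byte totals. The traces, the site labels, and the choice of a 1-nearest-neighbour classifier are all assumptions made for this example; the project itself does not prescribe a particular feature set or model.

```python
import math

# Hypothetical traces, NOT real captures: each trace is a list of packet
# sizes in bytes (positive = sent by the client, negative = received).
TRAINING = {
    "site_a": [[300, -1500, -1500, -900], [310, -1500, -1400, -950]],
    "site_b": [[200, -600, 250, -500, 220, -640],
               [210, -580, 240, -520, 230, -600]],
}

def features(trace):
    """Metadata-only features: packet counts and byte totals per direction."""
    sent = [p for p in trace if p > 0]
    received = [-p for p in trace if p < 0]
    return (len(sent), len(received), sum(sent), sum(received))

def classify(trace, training=TRAINING):
    """Return the label of the training trace closest in feature space."""
    target = features(trace)
    best_label, best_dist = None, math.inf
    for label, traces in training.items():
        for known in traces:
            dist = math.dist(target, features(known))
            if dist < best_dist:
                best_label, best_dist = label, dist
    return best_label

# The payload is never inspected, only sizes and directions.
print(classify([305, -1500, -1450, -920]))
```

    A real study would use many traces per site and would report accuracy on traces held out from training.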

    Title: Adversarial Machine Learning

    Description: Big data analytics usually involves machine learning for prediction or analysis. A machine learning model must be trained on data before it can be used, and it is often assumed that this training data is representative of the data the model will later encounter. In this project, participants will explore how adversarial or malicious data in the training set affects the prediction accuracy of the resulting model. Possible countermeasures will also be analyzed.
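
    To make the effect of malicious training data concrete, here is a minimal sketch of a poisoning attack on a toy nearest-centroid classifier. The one-dimensional data, the classifier, and the injected points are all invented for illustration and are not taken from the project.

```python
from statistics import mean

def train_centroids(samples):
    """samples: list of (value, label). Returns {label: mean value}."""
    by_label = {}
    for x, y in samples:
        by_label.setdefault(y, []).append(x)
    return {y: mean(xs) for y, xs in by_label.items()}

def predict(centroids, x):
    """Assign x to the label whose centroid is nearest."""
    return min(centroids, key=lambda y: abs(x - centroids[y]))

def accuracy(centroids, samples):
    return sum(predict(centroids, x) == y for x, y in samples) / len(samples)

# Made-up data: class 0 has small values, class 1 has large values.
clean = [(1.0, 0), (2.0, 0), (3.0, 0), (7.0, 1), (8.0, 1), (9.0, 1)]
held_out = [(1.5, 0), (2.5, 0), (4.0, 0), (6.0, 1), (7.5, 1), (8.5, 1)]

# The attacker injects extreme points labeled as class 0, which drags the
# class-0 centroid far away from the genuine class-0 data.
poison = [(30.0, 0), (30.0, 0), (30.0, 0)]

print(accuracy(train_centroids(clean), held_out))           # prints 1.0
print(accuracy(train_centroids(clean + poison), held_out))  # prints 0.5
```

    One simple countermeasure to try on this toy setup would be discarding training points that lie far from the rest of their class before fitting.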

    Title: Well Drilling Speed Prediction

    Description: A large dataset of drilling speeds and drilling attributes (such as rock formation, weight, torque, and direction) is available for analysis. The goal is to accurately predict the drilling speed from the drilling conditions.
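
    The prediction task can be sketched, under heavy simplification, as a least-squares line fit from a single attribute to drilling speed. The numbers below are synthetic and the single-feature linear model is an assumption for illustration; the actual dataset has many attributes and may call for a different model.

```python
from statistics import mean

# Synthetic observations, NOT from the project's dataset:
# weight (arbitrary units) and the drilling speed observed at that weight.
weights = [10, 20, 30, 40, 50]
speeds = [12, 19, 33, 41, 50]

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def mean_absolute_error(xs, ys, slope, intercept):
    """Average size of the prediction error, in the units of y."""
    return mean([abs(y - (slope * x + intercept)) for x, y in zip(xs, ys)])

slope, intercept = fit_line(weights, speeds)
print(f"predicted speed at weight 60: {slope * 60 + intercept:.1f}")
print(f"training MAE: {mean_absolute_error(weights, speeds, slope, intercept):.2f}")
```

    Measuring the error on data that was not used for fitting gives a more honest picture of accuracy than the training error shown here.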

    Title: Visual Data Analytics

    Description: While "Big Data" commonly refers to datasets that contain a very large number of data points, the term also often implies datasets of very high dimensionality. Visualization and visual analytics tools have become integral to the knowledge discovery process across various domains. Data visualization is the graphical display of abstract information for data analysis and communication. Interactive visual representations of data amplify human cognition and thereby yield deeper, faster insights that can facilitate decision-making. We will focus on multidimensional data visualization with standard high-dimensional methods such as parallel coordinates, projections such as multidimensional scaling (MDS), and non-projective techniques such as node-link graphs. In this project, participants will collect metadata from real web data, visualize it using several selected methods so that they can interact with the data directly, and analyze the correlations among data points or clusters in order to gain insight, draw conclusions, and make decisions.
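
    As one small example of what a parallel-coordinates view involves, the sketch below rescales each dimension to a common [0, 1] axis and turns every record into a polyline. The records and the function name `parallel_coordinates` are hypothetical, and the drawing step (which would need a plotting library) is deliberately left out.

```python
def parallel_coordinates(rows):
    """Rescale each column to [0, 1]; return one polyline per row.

    Each polyline is a list of (axis_index, height) points, ready to be
    drawn across evenly spaced vertical axes.
    """
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    lines = []
    for row in rows:
        line = []
        for axis, value in enumerate(row):
            span = highs[axis] - lows[axis]
            # A constant column has no spread, so place it mid-axis.
            height = (value - lows[axis]) / span if span else 0.5
            line.append((axis, height))
        lines.append(line)
    return lines

# Made-up records: (bytes transferred, packet count, number of hosts).
records = [(1500, 12, 2), (500, 40, 8), (1000, 26, 5)]
for line in parallel_coordinates(records):
    print(line)
```

    Polylines that cross between two neighbouring axes, as the first two records do here, suggest a negative relationship between those dimensions.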

    Title: Twitter Data Analytics

    Description: As early as 2010, researchers observed: "Because social media is already a significant part of the information ecosystem and as social media platforms and applications gain widespread adoption with unprecedented reach to users, consumers, voters, businesses, governments, and nonprofit organizations alike, interest in social media from all walks of life has been skyrocketing from both application and research perspectives." Twitter is one of the most popular social media platforms in use today, and this project will focus on Twitter data. We will collect Twitter data, apply filtering methods to extract pertinent data, and perform empirical validation of some predictive and analytic models.
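
    The filtering step might look something like the sketch below, which keeps original tweets that mention a keyword as a whole word. The tweet records are invented, the `text` field is an assumed layout rather than a statement about any particular Twitter API, and the retweet check is a crude text heuristic.

```python
import re

def is_pertinent(tweet, keywords):
    """True if the tweet is not a retweet and mentions any keyword."""
    text = tweet["text"].lower()
    if text.startswith("rt @"):  # crude retweet heuristic (assumption)
        return False
    # Tokenize into words so "flu" does not match inside "influence".
    words = set(re.findall(r"[a-z0-9_']+", text))
    return any(k.lower() in words for k in keywords)

# Hypothetical records for illustration only.
tweets = [
    {"text": "Flu season is hitting campus hard this week"},
    {"text": "RT @someone: flu shots available today"},
    {"text": "Influence of weather on traffic"},
    {"text": "Got my #flu shot"},
]

pertinent = [t["text"] for t in tweets if is_pertinent(t, ["flu"])]
print(pertinent)
```

    Matching on whole words keeps the filter from picking up unrelated tweets, at the cost of missing misspellings and related terms.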

    Title: Intelligent Scene Understanding from Imagery and Video Big Data

    Description: A computer system's ability to interpret and understand context in visual data is growing rapidly, with the advent of deep learning and access to large-scale imagery sources and repositories. Such an ability is vital for machines that operate in the real world, such as autonomous vehicles. Participants in this project will develop machine learning techniques to recognize objects, actions and relationships in image data.

    Questions? Contact: Dr. Eric Chan-Tin (