Large companies such as Nike, Fitbit, Apple, Google, and Under Armour are entering the market with IoT-based wearables and intuitive mobile apps that monitor individuals' health activities. We want to provide a solution that accurately classifies an individual's activity as sitting, sitting-down, standing, standing-up, or walking from the given accelerometer data, and that also determines the best location for the accelerometer.
The dataset comprises 5 classes (sitting-down, standing-up, standing, walking, and sitting) collected over 8 hours of activity from 4 healthy subjects. Each data point consists of the subject's physical attributes, 3-axis readings from 4 accelerometers, and the corresponding physical activity as the class label. The data was generated by placing four accelerometers on different body parts of each of the four subjects: waist, left thigh, right arm, and right ankle. The readings were taken over a time window of 150 ms and are represented as a temporal sequence.
Since the target has 5 classes (a multinomial outcome), Naive Bayes, K-Nearest Neighbours, and Random Forest classifiers were tested on the raw dataset to predict human activity. The input variables are all numeric (x, y, z axes), so PCA was selected for dimensionality reduction.
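As an illustration of one of the three classifiers named above, the following is a minimal K-Nearest Neighbours sketch in plain Python. The readings, the two labels shown, and the function name `knn_predict` are made up for the example; it is not the model or the data used in the project.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training sample.
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    # Count the labels of the k closest samples and return the most frequent.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Made-up (x, y, z) readings for a single accelerometer; not real data.
train_X = [
    (0.0, 0.1, 1.0), (0.1, 0.0, 0.9), (0.0, 0.0, 1.1),  # "sitting"
    (1.0, 0.9, 0.1), (0.9, 1.1, 0.0), (1.1, 1.0, 0.2),  # "walking"
]
train_y = ["sitting"] * 3 + ["walking"] * 3

print(knn_predict(train_X, train_y, (0.05, 0.05, 1.0)))
print(knn_predict(train_X, train_y, (1.0, 1.0, 0.1)))
```

In the real dataset each sample would carry 12 readings (3 axes from each of 4 accelerometers) rather than 3, and the same distance-and-vote logic would apply unchanged.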
The activity of an individual can be predicted from the given accelerometer data with 77% accuracy. If only a single accelerometer is used, it should be worn on the waist. Additionally, the right-arm accelerometer can be combined with it for improved change-point detection across all 5 human activities.
People like to listen to songs online on Spotify, Pandora, iTunes, etc., and they have their own playlists and favourite songs. We want to predict the release year of a song based on its features.
The dataset is a subset of the Million Song Dataset, a freely available collection of audio features and metadata for a million contemporary popular music tracks. There are 90 attributes: 12 timbre averages and 78 timbre covariances. The first value is the year (target), ranging from 1922 to 2011. The averages and covariances are taken over all 'segments', each segment being described by a 12-dimensional timbre vector.
Since the target variable is quantitative, we use linear regression techniques. A baseline model is created using the intercept (average) of the responses. Linear regression with regularization and grid search was used to improve the performance of the model.
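The baseline described above can be sketched as follows: predict the mean release year for every song and score the prediction with RMSE. The five years are made-up values chosen for the example, not rows from the dataset.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(y_true))


# Made-up release years; the real target ranges from 1922 to 2011.
years = [1995, 2001, 2007, 2009, 1988]

# Intercept-only model: every prediction is the average of the responses.
baseline = sum(years) / len(years)
baseline_rmse = rmse(years, [baseline] * len(years))

print(baseline, round(baseline_rmse, 3))
```

Any regression model worth keeping should score a lower RMSE than this constant prediction on held-out data.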
The performance of the model improved from an RMSE of 22.137 for the baseline model to an RMSE of 16.525 for the final interaction model with regularization.
Online advertisements are used in marketing to target specific groups of people. Click-through rate is the ratio of users who click on a specific link to the total number of users who view a page. We want to predict whether a user will click on an online advertisement based on the attributes of the user and their web history.
The dataset is available on Kaggle and was provided by Criteo Labs. There are 39 attributes: 13 count features and 26 categorical features. The first value is the label, either 0 or 1, indicating whether the user clicked on the advertisement.
Since the target variable is qualitative, we use classification techniques. A baseline model is created using the intercept (average) of the responses. Since there are many categorical features, one-hot encoding and feature hashing were used to represent them, with hashing keeping the dimensionality of the sparse feature space manageable. Logistic regression with a hyperparameter grid search was used to improve the performance of the model.
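To illustrate the feature hashing idea, here is a minimal sketch that maps each (column, value) pair to one of a fixed number of buckets. The function name `hash_features`, the bucket count, the use of MD5 as the hash, and the sample row are all assumptions made for the example, not details of the project's pipeline.

```python
import hashlib


def hash_features(categorical_values, n_buckets=16):
    """Map (column, value) pairs to a sparse {bucket index: count} dict."""
    vector = {}
    for column, value in enumerate(categorical_values):
        token = f"{column}={value}".encode("utf-8")
        # Use a stable hash; Python's built-in hash() is salted per process.
        index = int(hashlib.md5(token).hexdigest(), 16) % n_buckets
        vector[index] = vector.get(index, 0) + 1
    return vector


# Made-up categorical values standing in for one row of the 26 columns.
row = ["a9f3", "mobile", "US"]
hashed = hash_features(row)
print(hashed)
```

The output length is fixed by `n_buckets` no matter how many distinct category values appear, at the cost of occasional collisions between unrelated values.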
The performance of the model improved from a log loss of 0.542 for the baseline model to a log loss of 0.459 for the final logistic regression model.
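For reference, log loss and the intercept-only baseline can be sketched as below. The four labels and the "better" predictions are made-up values for illustration; the baseline simply predicts the observed click rate for every impression.

```python
import math


def log_loss(y_true, p_pred, eps=1e-15):
    """Average negative log-likelihood of binary labels under predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        # Clip probabilities so that log(0) never occurs.
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


# Made-up click labels: 1 = clicked, 0 = not clicked.
clicks = [0, 0, 0, 1]

# Baseline: predict the average of the responses for every row.
base_rate = sum(clicks) / len(clicks)
baseline_loss = log_loss(clicks, [base_rate] * len(clicks))

# A hypothetical model that separates the classes scores a lower loss.
better_loss = log_loss(clicks, [0.1, 0.1, 0.1, 0.9])

print(round(baseline_loss, 4), round(better_loss, 4))
```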
Natural Language Processing (NLP) was used to extract keywords for an online dashboard. The BeautifulSoup library was used to scrape the U.S. Patent Application website. Keywords were extracted from the plain text using noise removal, scrubbing, stemming, normalization, and word tagging. Author names and application data were also extracted and processed to create HTML pages.
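The steps above can be sketched roughly as follows. To stay self-contained, this example swaps in the standard library's `html.parser` for BeautifulSoup and a crude suffix stripper for a real stemmer; the stop-word list, the helper names, and the sample HTML are all made up for illustration.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text nodes of an HTML document, dropping the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


# Tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "the", "of", "for", "and", "is", "in", "to", "with"}


def naive_stem(word):
    """Crude suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def extract_keywords(html, top_n=3):
    parser = TextExtractor()
    parser.feed(html)
    # Normalization: lowercase and keep alphabetic tokens only.
    tokens = re.findall(r"[a-z]+", " ".join(parser.chunks).lower())
    stems = [naive_stem(t) for t in tokens if t not in STOP_WORDS]
    return [word for word, _ in Counter(stems).most_common(top_n)]


sample = (
    "<html><body><h1>Battery charging system</h1>"
    "<p>A system for charging batteries with sensors.</p></body></html>"
)
print(extract_keywords(sample, top_n=2))
```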
Neuroimaging analysis is performed to learn about the neural activity of the brain when an action is performed. Here, exploratory data analysis is used to study the brain activity of larval zebrafish. The fish is shown images in 12 different directions, each for a 20-second interval.
The dataset is composed of images of brain activity taken using light-sheet microscopy. The recording covers 240 seconds in total, and the image dimensions are 230 x 202. The raw data is in text format, where each row corresponds to a single pixel. There are 46,460 pixels in total; the first two attributes of each row are the coordinates of the pixel, and the remaining values are the time series of the pixel's intensity.
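A minimal sketch of reading this row format is shown below. It assumes whitespace-separated fields, which the description does not specify, and the two sample rows are made up.

```python
def parse_pixel_row(line):
    """Split one row into ((x, y), [intensity time series])."""
    fields = line.split()  # assumption: fields are whitespace-separated
    x, y = int(float(fields[0])), int(float(fields[1]))
    series = [float(v) for v in fields[2:]]
    return (x, y), series


# Made-up rows with only three time points each; real rows are much longer.
sample_rows = [
    "0 0 103.0 105.5 98.2",
    "0 1 88.1 90.0 91.7",
]
pixels = dict(parse_pixel_row(row) for row in sample_rows)
print(pixels[(0, 1)])
```

Keying the dictionary by coordinate makes it straightforward to place per-pixel results back onto the 230 x 202 image grid.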
Since the data is multi-dimensional, we use Principal Component Analysis. We aggregated the features by time, creating 20 features in which the first feature sums the first second of every direction's interval, the second feature sums the second second, and so on. We also aggregated the features by direction, giving 12 features.
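The two aggregations can be sketched for a single pixel as follows. This assumes one sample per second (240 values per pixel) and that the 12 directions occupy consecutive 20-second blocks; both are assumptions for the example, as are the function names and the synthetic series.

```python
N_DIRECTIONS = 12
SECONDS_PER_DIRECTION = 20


def aggregate_by_time(series):
    """20 features: feature t sums second t of every direction's block."""
    return [
        sum(series[d * SECONDS_PER_DIRECTION + t] for d in range(N_DIRECTIONS))
        for t in range(SECONDS_PER_DIRECTION)
    ]


def aggregate_by_direction(series):
    """12 features: feature d sums the 20 seconds of direction d's block."""
    return [
        sum(series[d * SECONDS_PER_DIRECTION:(d + 1) * SECONDS_PER_DIRECTION])
        for d in range(N_DIRECTIONS)
    ]


# Synthetic intensity series for one pixel: 0.0, 1.0, ..., 239.0.
series = [float(i) for i in range(N_DIRECTIONS * SECONDS_PER_DIRECTION)]
by_time = aggregate_by_time(series)
by_direction = aggregate_by_direction(series)
print(len(by_time), len(by_direction))
```

Either aggregated matrix (pixels by 20, or pixels by 12) could then be passed to PCA in place of the full 240-column time series.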
We learned that the response to the stimulus is similar across all directions. Regions on either side of the midline of the brain are colored differently, which suggests that direction selectivity is represented differently on the two sides of the brain.
A data warehouse for state-wise drought data across the USA was created to provide business insights. A star schema was created after dimensional modelling of the data. MS SQL Server Management Studio 2014 (SSIS, SSAS) was used for Extraction, Transformation and Loading (ETL).