Predicting Latencies in Networks Predicting latencies between communicating hosts over large networks. Used
Matrix Factorization and Time Series Forecasting to predict latencies of random hosts from incomplete data.
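As a rough illustration of the matrix factorization idea (not the project's actual code), the sketch below fits two low-rank factor matrices to the observed entries of a latency matrix with stochastic gradient descent and then reads off predictions for the unobserved host pairs. The toy latencies, the rank, the learning rate and the function names are all assumptions made for this example; the time series forecasting part is not covered.

```python
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def factorize(observed, n_hosts, rank=2, lr=0.01, reg=0.01, epochs=2000, seed=0):
    """Fit latency[i][j] ~ dot(U[i], V[j]) using only the observed pairs."""
    rng = random.Random(seed)
    U = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n_hosts)]
    V = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n_hosts)]
    for _ in range(epochs):
        for (i, j), y in observed.items():
            err = y - dot(U[i], V[j])
            for k in range(rank):
                u, v = U[i][k], V[j][k]
                # Gradient step on the squared error with L2 regularisation.
                U[i][k] += lr * (err * v - reg * u)
                V[j][k] += lr * (err * u - reg * v)
    return U, V


# Hypothetical toy data: latency between hosts i and j in arbitrary units.
scale = [1.0, 1.5, 2.0, 2.5, 3.0]
truth = {(i, j): scale[i] * scale[j] for i in range(5) for j in range(5) if i != j}
held_out = {(0, 4), (3, 1)}  # pairs treated as "never measured"
observed = {pair: y for pair, y in truth.items() if pair not in held_out}

U, V = factorize(observed, n_hosts=5)
for i, j in sorted(held_out):
    print(f"hosts {i}->{j}: predicted {dot(U[i], V[j]):.2f}, measured {truth[(i, j)]:.2f}")
```

Real latency matrices are noisy and only approximately low-rank, so the rank and regularisation strength would need tuning on held-out measurements.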
Explaining Aggregates for Exploratory Analytics Building explanatory regression functions to facilitate large-scale data exploration. A tool to guide data analysts
when exploring unknown data sets.
Query-Driven Learning for Approximate Query Processing Building Machine Learning models that learn to predict the answers of queries from previously executed queries.
This approach offers order-of-magnitude speedups in executing aggregate queries by trading off some accuracy.
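To make the idea concrete, here is a minimal sketch of query-driven learning, assuming range COUNT queries over a single numeric column. Past queries and their exact answers are logged, and a new query is answered from its nearest logged neighbours without touching the data. The nearest-neighbour model, the class name QueryAnswerModel and the toy data are assumptions for illustration, not the models used in the project.

```python
import random


def exact_count(data, low, high):
    """Exact answer: scans every value (the expensive path)."""
    return sum(1 for x in data if low <= x <= high)


class QueryAnswerModel:
    """Predicts a query's answer from the k most similar past queries."""

    def __init__(self, k=3):
        self.k = k
        self.log = []  # list of ((low, high), answer)

    def observe(self, query, answer):
        self.log.append((query, answer))

    def predict(self, query):
        if not self.log:
            raise ValueError("no past queries to learn from")

        def sq_dist(entry):
            return sum((a - b) ** 2 for a, b in zip(entry[0], query))

        nearest = sorted(self.log, key=sq_dist)[: self.k]
        return sum(answer for _, answer in nearest) / len(nearest)


rng = random.Random(0)
data = [rng.uniform(0, 100) for _ in range(10_000)]

# Build the query log by executing a workload of exact queries once.
model = QueryAnswerModel(k=3)
for _ in range(500):
    low = rng.uniform(0, 80)
    high = low + rng.uniform(5, 20)
    model.observe((low, high), exact_count(data, low, high))

# Answer a new query from the log alone and compare with the exact scan.
query = (30.0, 45.0)
print("approximate:", model.predict(query))
print("exact:      ", exact_count(data, *query))
```

The approximate answer costs a lookup over 500 logged queries instead of a scan over 10,000 rows, which is where the speed-for-accuracy trade-off comes from.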
Detecting Interesting Regions in Data Using Query-Driven learning models to automatically detect interesting regions in large data sets.
This approach fuses Machine Learning and Evolutionary Optimization to discover regions that are potentially interesting
to the user.
EDA Analysis for Chicago Crimes An exploratory data analysis of the Chicago Crimes dataset, focusing on its spatio-temporal
dimensions. In this Kaggle kernel I constructed visualisations to answer questions that I had when I first
encountered the data set.
Research Paper Recommender This project aims to help researchers and other readers of
scientific papers find new papers that might be of interest to them.
It uses arXiv and recent developments in word embeddings
(via the popular gensim library) to automatically fetch new papers for
a given category. Papers that are found
to be similar are then added to a specified folder.
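The similarity check could look roughly like the sketch below: each paper's text is represented by the average of its word vectors, and two papers are compared by cosine similarity against a threshold. The project obtains its vectors through gensim; here a tiny hand-written dictionary stands in for a trained model, and the comparison against a reference paper, the threshold value and all function names are assumptions for illustration.

```python
import math

# Hypothetical 2-dimensional word vectors standing in for a trained embedding model.
TOY_VECTORS = {
    "graph": [1.0, 0.0],
    "network": [0.9, 0.1],
    "protein": [0.0, 1.0],
    "folding": [0.1, 0.9],
}


def embed(text, vectors=TOY_VECTORS):
    """Average the vectors of all known words; returns None if none are known."""
    known = [vectors[w] for w in text.lower().split() if w in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[d] for v in known) / len(known) for d in range(dim)]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def is_similar(text_a, text_b, threshold=0.8):
    """True when both texts have known words and their embeddings are close."""
    a, b = embed(text_a), embed(text_b)
    if a is None or b is None:
        return False
    return cosine(a, b) >= threshold


reference = "graph network"
for title in ["network graph", "protein folding"]:
    print(title, "->", is_similar(reference, title))
```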
Growing Networks An implementation of the algorithm described in the paper
"A self-organising network that grows when required".
This algorithm can be used for clustering, vector quantization and dimensionality reduction tasks.
Cyprus Budget Explorer This project aims to build a system for exploring the budget
of the Republic of Cyprus. The data is available in PDF format here: http://bit.ly/2biYQ0y
Because the data is not available in any other format, the PDF file was
parsed and cleaned using Tabula and Google Refine.
MapReduce/HBase code examples As part of the Big Data course, I implemented a number of algorithms performing top-k queries,
exploratory analysis and more. The code examples show how to use secondary sorting and how to efficiently leverage
the clean-up phase of Mappers and Reducers.
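The clean-up pattern for top-k queries can be sketched as follows. This is a plain Python simulation, not Hadoop code and not the course implementation: each mapper keeps a bounded min-heap while processing its split and emits only its local top k once, in the clean-up step, so the reducer sees at most k records per mapper. The class name TopKMapper and the driver function top_k are hypothetical.

```python
import heapq


class TopKMapper:
    """Keeps the k largest (score, key) pairs seen in one input split."""

    def __init__(self, k):
        self.k = k
        self.heap = []  # min-heap, so the smallest kept item is at index 0

    def map(self, key, score):
        # Nothing is emitted per record; only the local heap is updated.
        item = (score, key)
        if len(self.heap) < self.k:
            heapq.heappush(self.heap, item)
        elif item > self.heap[0]:
            heapq.heapreplace(self.heap, item)

    def cleanup(self):
        # Called once after the last record: emit the local top k.
        return sorted(self.heap, reverse=True)


def top_k(splits, k):
    """Simulates the job: one mapper per split, one reducer merging the results."""
    candidates = []
    for split in splits:
        mapper = TopKMapper(k)
        for key, score in split:
            mapper.map(key, score)
        candidates.extend(mapper.cleanup())
    # Reducer: the global top k among at most k * len(splits) candidates.
    return heapq.nlargest(k, candidates)


splits = [
    [("a", 5), ("b", 1), ("c", 9)],
    [("d", 7), ("e", 3)],
]
print(top_k(splits, 2))  # [(9, 'c'), (7, 'd')]
```

Emitting from the clean-up step rather than from every map call is what keeps the shuffle small: the data sent to the reducer depends on k and the number of mappers, not on the input size.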