The Garage48 Open & Big Data 2016 hackathon will be one of the most important hackathons in our recent history. It is one of the first data science hackathons in our region and it will put emphasis on emerging technologies. Data science and machine learning are the key to extracting valuable knowledge from large collections of data. Data science is a multidisciplinary field that combines subjects such as efficient programming, artificial intelligence, data warehousing, visualization, pattern recognition, statistics and predictive analytics. Over the years, a multitude of tools have been developed to apply this complicated machinery in practice. To make data science easier and more approachable at the hackathon, we are giving you 8 tips that will help everyone make friends with data science.
Whether you are a back-end or front-end developer, UI/UX designer, marketer, business developer, data scientist or data scraper, our 8 tips will make your life easy and fun at the hackathon. :)
Our tips for Garage48 Open & Big Data 2016 are the following:
- Look through the list of publicly available datasets that we have collected for you (https://docs.google.com/spreadsheets/d/1otI5CjtgHBqQ2Ls_mYn008KxniEU36e8A_62Q1EWLtw/edit#gid=0) and pick the one that needs the least pre-processing effort.
- Get to know your data before going deep into analysis, e.g. by examining summary statistics.
- Pick a language depending on the goals of your project and your personal expertise. R is suitable for analysing small and medium-sized datasets, and its ggplot2 library can produce high-quality visualisations (https://www.ggplot2-exts.org/ has a list of add-ons to ggplot2 that will make your visualisations even cooler). Python offers almost the same functionality as R, but is also better suited to analysing large datasets and building a more deployment-ready product. Look up libraries such as numpy, pandas and scikit-learn when using Python.
- Use your access to the University of Tartu High Performance Computing Center to store large amounts of data and run heavy analysis scripts.
- Feel absolutely free to ask our mentors for help and advice.
- Develop on a subsample of the data to save time if your dataset is big.
- Set aside a test set first thing on Friday and use it on Sunday evening to estimate the true final performance of your model.
- If the main goal of your project is data analysis, spend some time making beautiful and clear visualisations of your results (see https://goo.gl/ZpsL8b for a few examples for inspiration).
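As a rough illustration of three of the tips above (setting aside a test set, examining summary statistics and developing on a subsample), here is a minimal Python sketch using pandas. The dataset is synthetic, and the column names (`age`, `income`, `city`), the 20% test share and the 1,000-row subsample are assumptions made for the example, not recommendations for any particular dataset.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a real dataset; the columns are hypothetical.
rng = np.random.default_rng(seed=48)
n_rows = 10_000
data = pd.DataFrame({
    "age": rng.integers(18, 80, size=n_rows),
    "income": rng.normal(1500.0, 400.0, size=n_rows),
    "city": rng.choice(["Tartu", "Tallinn", "Narva"], size=n_rows),
})

# Tip: set aside a test set first and do not touch it while developing.
# The 20% share is an assumption for this example.
test_set = data.sample(frac=0.2, random_state=48)
train_set = data.drop(test_set.index)

# Tip: get to know the data through summary statistics.
summary = train_set.describe(include="all")
missing_per_column = train_set.isna().sum()

# Tip: develop on a small subsample so each experiment runs quickly.
dev_sample = train_set.sample(n=1_000, random_state=48)

print(summary)
print(missing_per_column)
print(len(train_set), len(test_set), len(dev_sample))
```

With a real dataset you would replace the synthetic `data` frame with your own, for example one loaded from a CSV file. If you prefer scikit-learn, its `train_test_split` function is a common alternative for creating the held-out test set.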
Want to join the Garage48 Open & Big Data 2016 hackathon on Oct 21-23 in Tartu?
LAST seats available via http://garage48.org/events/openbigdata
“Garage48 Open & Big Data 2016” will take place within the framework of the EU Structural Funds support scheme “Raising Public Awareness about the Information Society” and is funded by the European Regional Development Fund.