Course: Big Data analysis with Hadoop and RHadoop

Europe/Ljubljana
Event will take place via ZOOM

https://us06web.zoom.us/j/84616931743?pwd=Rkl5VkZET1BOKyt3bjFDUGQ5Q2lWQT09
Description

Overview: 
This training course will focus on the foundations of “Big Data” analysis by introducing the Hadoop distributed computing architecture and providing an introductory tutorial on Big Data analysis using Hadoop and RHadoop. Although the course is online, it will be hands-on, allowing participants to work interactively with real data in the high-performance computing (HPC) environment of the University of Ljubljana.

Description: 
The training event will consist of two 4-hour sessions on two consecutive days. The first day will focus on big data management and data analysis with Hadoop. Participants will learn (i) how to move big data efficiently to a cluster and into the Hadoop Distributed File System (HDFS), and (ii) how to perform simple big data analysis with Python scripts using MapReduce and Hadoop. The second day will focus on big data management and analysis using RHadoop. We will work entirely within RStudio and write all scripts in R, using several state-of-the-art libraries for parallel computation, such as parallel, doParallel and foreach, and libraries for working with Hadoop, such as rmr, rhdfs and rhbase.
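The course's own exercises are not reproduced here. As an illustrative sketch only, the classic word-count job shows the general shape of a Python MapReduce script for Hadoop Streaming: the mapper emits tab-separated key/value lines, Hadoop sorts them by key, and the reducer aggregates consecutive lines that share a key. The file name wordcount.py, the `map`/`reduce` argument convention, the lowercasing of words and the `simulate` helper are hypothetical choices made for this example.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' record per word, as a Hadoop Streaming mapper would."""
    for line in lines:
        # Lowercasing is an illustrative choice, not a Hadoop requirement.
        for word in line.strip().lower().split():
            yield f"{word}\t1"


def reducer(lines):
    """Sum counts per word; assumes input is sorted by key, as Hadoop does between stages."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def simulate(lines):
    """Hypothetical local stand-in for the Hadoop pipeline: map, sort (shuffle), reduce."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 wordcount.py map|reduce  (reads stdin, writes stdout)
    step = mapper if sys.argv[1] == "map" else reducer
    for record in step(sys.stdin):
        print(record)
```

The same script can be tested locally with a shell pipe such as `cat input.txt | python3 wordcount.py map | sort | python3 wordcount.py reduce`, and then submitted to the cluster through Hadoop Streaming's `-mapper` and `-reducer` options; the location of the streaming jar depends on the installation.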

Target audience: 
Everyone interested in big data management and analysis 

Prerequisite knowledge: 
For the first day: basic Linux shell commands and Python
For the second day: basic Linux shell commands and R 

Workflow: 
The course will take place online via Zoom. Participants will need a local computer to connect to the HPC system at the University of Ljubljana. Before the course starts, each participant will receive a student account on this supercomputer, and all examples will be run on this machine. Participants will keep this account for two more weeks after the course, so they can repeat the exercises and transfer the data and examples to a local machine.

Skills to be gained:
By the end of the course, participants will be able to:

  • Connect to a supercomputer using the NoMachine tool;

  • Move big data to a supercomputer and store it in a distributed file system;

  • Write Python scripts to perform basic data management and data analysis tasks with Hadoop;

  • Write R scripts to perform basic data management and data analysis tasks with RHadoop libraries such as rmr, rhdfs and rhbase.

Trainers:  

  • Prof. Janez Povh (University of Ljubljana, Slovenia): applied mathematics, high performance computing, big data analysis

  • Dr. Giovanna Roda (EuroCC Austria, BOKU, and TU Wien, Austria): high performance computing, big data analysis

  • Liana Akobian (TU Wien, Austria): high performance computing, big data analysis


Organisers:

This course is a EuroCC event jointly organised by EuroCC Slovenia and EuroCC Austria.



Organized by

Fakulteta za računalništvo in informatiko, Univerza v Ljubljani (Faculty of Computer and Information Science, University of Ljubljana)
