Workshop: Introduction to big data manipulation using Hive

24 June 2016
University of Manchester

Hive is a package that lets users manipulate large datasets within the Hadoop environment, either to reduce the data to a size that can be analysed in a desktop package such as Stata or R, or to carry out the analysis within the Hadoop environment itself.

In this workshop you will be introduced to Hive and the Hive query language (HiveQL) within a Hortonworks Data Platform (HDP) environment.

This course will cover how to:

  • load big datasets into the Hadoop file system and process them using Hive
  • run simple HiveQL queries to start exploring the contents of a dataset
  • ‘slice’ and ‘dice’ the dataset into smaller datasets which can be consumed by traditional desktop applications
  • access Hive tables from R using Open Database Connectivity (ODBC)

This workshop is free and is intended for researchers with experience of doing quantitative research. It will be of most interest to researchers who have used commands in packages like Stata, R or SPSS but have no in-depth knowledge of programming with data.

The workshop will combine presentations with hands-on practical exercises using Hive.

Laptops with all necessary software will be provided for this workshop.