Webinar: What is Hive?

22 March 2016
Online, 15.00 - 16.00

Hive is a package that works with Hadoop and allows users to manipulate very large datasets. This webinar gives an overview of what Hive is and why you might want to learn more about it.

This webinar will provide an overview of:

  • how Hive integrates into a Hadoop system and provides access to the large distributed datasets stored in Hadoop
  • why you might want to use Hive
  • the range of things you can do with Hive
  • two ways of accessing Hive directly: via a Web interface and from desktop applications
  • examples demonstrating: selecting specific columns from a dataset; selecting rows with specific column values; aggregating and obtaining basic statistical measures for column values; and joining two datasets using a common column.
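
As a rough illustration of the four kinds of example listed above, the sketch below runs comparable SQL queries from Python, using the standard-library sqlite3 module as a stand-in. It does not connect to Hive, and it is not taken from the webinar material. Hive's query language is SQL-like, so similar SELECT, WHERE, GROUP BY and JOIN statements are the general shape to expect, but the tables (people, regions), their columns and all values here are invented for illustration.

```python
import sqlite3


def build_demo_db():
    """Create a small in-memory database with two hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE people (name TEXT, region_id INTEGER, age INTEGER);
        CREATE TABLE regions (region_id INTEGER, region_name TEXT);
        INSERT INTO people VALUES ('Ann', 1, 30), ('Bob', 1, 40), ('Cat', 2, 50);
        INSERT INTO regions VALUES (1, 'North'), (2, 'South');
        """
    )
    return conn


conn = build_demo_db()

# 1. Select specific columns from a dataset.
names_ages = conn.execute(
    "SELECT name, age FROM people ORDER BY name"
).fetchall()

# 2. Select rows with specific column values.
north_people = conn.execute(
    "SELECT name FROM people WHERE region_id = 1 ORDER BY name"
).fetchall()

# 3. Aggregate and obtain basic statistical measures per group.
region_stats = conn.execute(
    "SELECT region_id, COUNT(*), AVG(age) "
    "FROM people GROUP BY region_id ORDER BY region_id"
).fetchall()

# 4. Join two datasets using a common column (region_id).
joined = conn.execute(
    "SELECT p.name, r.region_name "
    "FROM people p JOIN regions r ON p.region_id = r.region_id "
    "ORDER BY p.name"
).fetchall()

for label, rows in [
    ("columns", names_ages),
    ("filtered rows", north_people),
    ("aggregates", region_stats),
    ("join", joined),
]:
    print(label, rows)
```

Against a real Hive installation the connection step and some syntax details would differ; only the overall pattern of the four queries is meant to carry over.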

This webinar is intended for researchers with no in-depth knowledge of programming with data. However, attendees are more likely to find it of interest if they already have some experience of simple data manipulation (e.g. obtaining summary statistics or aggregating data in SPSS, Stata or R).

The webinar will consist of a 30-minute presentation followed by 20 minutes for questions.