For large datasets, e.g. millions of records, a dedicated database environment would be the storage medium of choice. A database environment is designed to store data efficiently while making retrieval of the data you want very straightforward using simple-to-learn SQL queries. An often hidden advantage of a database system is that the actual database storage can be on a remote server while the query environment is on your desktop. The only data that takes up memory on your machine is the set of records returned by your queries.
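The idea can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, not the setup the webinar will use: SQLite is a file-based database rather than a remote server, and the table and column names here are invented for the example. The key point it shows is that only the rows matching the query come back to your program.

```python
import sqlite3

# Hypothetical example data; an in-memory database stands in for a
# real (possibly remote) database server such as PostgreSQL or MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, region TEXT, value REAL)")
conn.executemany(
    "INSERT INTO records (region, value) VALUES (?, ?)",
    [("North", 12.5), ("South", 8.1), ("North", 3.4)],
)

# Only the matching rows are returned to your machine;
# the full table stays inside the database.
rows = conn.execute(
    "SELECT region, value FROM records WHERE region = ?", ("North",)
).fetchall()
print(rows)  # [('North', 12.5), ('North', 3.4)]
conn.close()
```

With millions of rows, the same `SELECT ... WHERE` pattern applies unchanged; only the storage backend differs.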
This free webinar, organised by the UK Data Service, will explain various database environments and storage regimes and show you how easy it can be to start retrieving data using standard SQL queries.
About the webinar series: getting data, storing data, manipulating data
All projects need data. You can generate it yourself via surveys, or you can get some from the UK Data Service. If you didn’t generate it yourself, the chances are it is not quite what you wanted, but you can adapt it to your needs. You can adapt the data by a variety of means: cleaning it, extracting the parts you need (and ignoring the rest) and, trickiest of all, joining data from different sources.
This series of three webinars will cover ways of dealing with these data issues using both familiar software tools such as Excel and others that you may be aware of but have no direct experience of.
In the first webinar we will look at SQL and databases: useful for storing large datasets (hundreds of GB or even more) and using simple SQL queries to retrieve only the items you want, possibly across more than one dataset.
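Querying "across more than one dataset" is what SQL's `JOIN` does. The sketch below, again using Python's `sqlite3` with made-up table names, combines two small tables on a shared key:

```python
import sqlite3

# Two hypothetical datasets loaded as separate tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people  (person_id INTEGER PRIMARY KEY, name TEXT, region_id INTEGER);
CREATE TABLE regions (region_id INTEGER PRIMARY KEY, region_name TEXT);
INSERT INTO people  VALUES (1, 'Ann', 10), (2, 'Bob', 20);
INSERT INTO regions VALUES (10, 'North'), (20, 'South');
""")

# A JOIN matches rows from both tables on the shared region_id key.
result = conn.execute("""
    SELECT p.name, r.region_name
    FROM people p
    JOIN regions r ON p.region_id = r.region_id
    ORDER BY p.name
""").fetchall()
print(result)  # [('Ann', 'North'), ('Bob', 'South')]
conn.close()
```

The same one-statement join works whether the tables hold two rows or two million, which is exactly the scenario where a database beats a spreadsheet.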
In the second webinar we will look at various means of getting data from the Internet: this might range from simple copy and paste, through systematic data downloads from datasets, to scraping specific items from hundreds of similar web pages.
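As a taste of what "scraping specific items" means, here is a minimal sketch using only Python's standard-library `html.parser`; real projects often use libraries such as Requests and BeautifulSoup instead. The HTML string and the `price` class are invented stand-ins for one of many similarly structured pages you might fetch with `urllib.request`.

```python
from html.parser import HTMLParser

# Hypothetical page content; in practice this would be downloaded,
# e.g. with urllib.request.urlopen(url).read().
PAGE = """
<html><body>
  <span class="price">12.99</span>
  <span class="note">ignore me</span>
  <span class="price">8.50</span>
</body></html>
"""

class PriceExtractor(HTMLParser):
    """Collect the text of every <span class="price"> element."""
    def __init__(self):
        super().__init__()
        self.in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        # Flag that the next text chunk belongs to a price span.
        if tag == "span" and ("class", "price") in attrs:
            self.in_price = True

    def handle_data(self, data):
        if self.in_price:
            self.prices.append(data.strip())
            self.in_price = False

parser = PriceExtractor()
parser.feed(PAGE)
print(parser.prices)  # ['12.99', '8.50']
```

Run the same extractor over hundreds of similar pages and you have a dataset that never existed as a single downloadable file.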
The third webinar will look at the new functionality available in the later versions of Excel: Power Pivot, which breaks the one-million-row limit and makes dataset joins much easier, and the latest dynamic array functions, which can simplify many common tasks.