How to read the datasets
All datasets below are provided as csv files. We provide you with a class for loading these files into memory: download tableDemos.zip and uncompress it in your Processing project folder. The zip file contains the Table class (all files named Table.pde are identical) as well as examples of how to use it:
- tableDemo1 shows how to load a csv (comma-separated values) file and read data directly from the table.
- tableDemo2 shows how to store csv data into objects.
- tableDemo3 shows how to preprocess csv content.
- tableTest loads a csv file and displays it as a table in the Processing window. It also contains instructions on how to load each of the datasets below.
Since some of the datasets include country data, we also provide you with a file countries.csv that lists country names, country codes and vertices for drawing them on the screen. For an example of how to use this file to draw a map, download mapDemo.zip. Please note that country names in countries.csv will not necessarily match all country names in your dataset.
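One possible way to cope with mismatched country names is to compare normalized keys rather than raw strings, and to register known aliases by hand. The sketch below is plain Java and only an illustration: the class name CountryNameMatcher, its methods, and the two-letter codes in the usage example are assumptions, not part of the provided Table class or of countries.csv. Depending on your Processing version, a class with static methods may need to live in its own .java tab.

```java
import java.text.Normalizer;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: maps loosely spelled country names to a code.
final class CountryNameMatcher {
    private final Map<String, String> codeByKey = new HashMap<>();

    // Builds a comparison key: no accents, lower case, punctuation turned into spaces.
    static String key(String name) {
        String decomposed = Normalizer.normalize(name, Normalizer.Form.NFD);
        return decomposed
            .replaceAll("\\p{M}", "")          // drop combining accent marks
            .toLowerCase(Locale.ROOT)
            .replaceAll("[^a-z0-9]+", " ")     // punctuation and spaces become one space
            .trim();
    }

    // Registers a name (or an alias) for a code.
    void add(String name, String code) {
        codeByKey.put(key(name), code);
    }

    // Returns the code for a name, or null when nothing matches.
    String lookup(String name) {
        return codeByKey.get(key(name));
    }

    public static void main(String[] args) {
        CountryNameMatcher matcher = new CountryNameMatcher();
        matcher.add("C\u00f4te d'Ivoire", "CI");   // codes here are placeholders
        matcher.add("Russia", "RU");
        matcher.add("Russian Federation", "RU");  // alias added by hand
        System.out.println(matcher.lookup("COTE D IVOIRE"));
        System.out.println(matcher.lookup("Atlantis"));
    }
}
```

Names that still return null after normalization are the ones worth listing and fixing manually with extra add calls.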
We don’t provide code for parsing dates and other structured types. For this you might have to use the DateFormat class or regular expressions. See tableDemo3 for simple uses of regular expressions.
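As a rough illustration of both approaches, the plain Java sketch below parses a date cell with DateFormat and pulls a year out of free text with a regular expression. The date pattern "yyyy-MM-dd" and the class and method names are assumptions made for the example; check the actual format of the cells in your dataset and adjust the pattern accordingly.

```java
import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for turning csv cells into dates or years.
final class DateParsingSketch {
    // Assumed cell format, e.g. "2008-03-15"; change it to match your data.
    private static final String ASSUMED_PATTERN = "yyyy-MM-dd";

    // A standalone four-digit number, e.g. the "1994" in "Released in 1994".
    private static final Pattern YEAR = Pattern.compile("\\b(\\d{4})\\b");

    // Parses a cell with DateFormat; returns null when the cell does not match.
    static Date parseDate(String cell) {
        DateFormat format = new SimpleDateFormat(ASSUMED_PATTERN, Locale.ROOT);
        format.setLenient(false); // reject impossible dates such as 2008-02-31
        try {
            return format.parse(cell.trim());
        } catch (ParseException e) {
            return null;
        }
    }

    // Returns the first standalone four-digit number in the cell, or -1 if none.
    static int extractYear(String cell) {
        Matcher m = YEAR.matcher(cell);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    public static void main(String[] args) {
        System.out.println(parseDate("2008-03-15"));
        System.out.println(parseDate("not a date"));
        System.out.println(extractYear("Released in 1994 (USA)"));
    }
}
```

Returning null or -1 for unparseable cells keeps the loading loop simple; you can then count or skip the bad rows explicitly.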
Also remember that Processing is based on Java, so if there is anything you need to do that Processing does not support, feel free to use Java classes.
If you find a bug in the Processing code above, please send an e-mail to james.eagan@telecom-paristech.fr.
Classic datasets
These are simple multidimensional datasets that are for the most part classic infovis datasets. Use one of these if you prefer to be safe.
Cameras
A dataset of about 1000 cameras with 13 properties such as weight, focal length, price, etc.
Cars
A dataset of about 400 cars with 8 characteristics such as horsepower, acceleration, etc.
Cereals
About 80 cereal products with their dietary characteristics.
Countries
A dataset of 160 countries with ~40 characteristics such as debt, electricity consumption, Internet users, etc.
Films
About 1600 movies with properties such as length, main actor and actress, director and popularity.
Wikipedia Edits
A log of 1000 Wikipedia edits with article name, user, date and amount of changes.
Less common datasets
These are datasets that we did not visualize, but the Table class loads them without any apparent problem. They are more interesting in that fewer (or no) visualizations are available online yet, and they can lead to interesting insights.
Causes of Death
Causes of death in France from 2001 to 2008. Variables include year, gender, cause of death, and number of deaths.
Other data on European countries can be downloaded from the Eurostat Website:
- Use the tree to browse the databases by themes, then open the database of your choice by clicking on the left icon.
- A default tabular view appears and a user interface allows you to add more dimensions or filter the data (it might require some time to get used to).
- Once you are satisfied with the table, click on the disk icon at the top, then select the xls format. Clean up the xls file using Excel, then export it as a csv file.
New Born Baby Patterns
This dataset consists of three files: sleep periods, feeding periods, and diaper changes of a baby in its first 2.5 months.
Time Use
How people spend their time depending on country and sex, with activities such as paid work, household and family care, etc.
You can generate csv files that include other dimensions such as day of the week or month by going to the Eurostat Website and proceeding as indicated above.
Happiness
European quality of life survey with questions related to income, life satisfaction or perceived quality of society.
The above table is quite small and only provides the average rating for the question "How happy would you say you are these days? Rating 1 (low) to 10 (high)", by country and by sex. On its own, this dataset is probably insufficient for this class project. You are encouraged to download and visualize answers to other questions as well. For this, go to the Eurofound Website, select the question on the left, then use the bottom links to download the csv file.
Income Inequalities
The Gini index per country per year (sparse data).
Other data per country per year can be downloaded from Gapminder, such as electricity generation per person, alcohol consumption, air traffic accidents, and more classical measures such as GDP. You can also combine several indicators.
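Combining two indicators usually means matching rows on the same country and year. A minimal plain Java sketch of such a join is given below, assuming each indicator has already been loaded into a map from a "country|year" key to a value; the class IndicatorJoin, the key format, and the choice to drop pairs missing from either indicator are all assumptions made for illustration. Since the data is sparse, expect the joined result to be smaller than either input.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: joins two per-country, per-year indicators.
final class IndicatorJoin {
    // Builds the shared lookup key, e.g. "France|2005".
    static String key(String country, int year) {
        return country + "|" + year;
    }

    // Keeps only the (country, year) pairs present in both indicators.
    // Each result holds { value from first, value from second }.
    static Map<String, double[]> join(Map<String, Double> first, Map<String, Double> second) {
        Map<String, double[]> joined = new LinkedHashMap<>();
        for (Map.Entry<String, Double> entry : first.entrySet()) {
            Double other = second.get(entry.getKey());
            if (other != null) {
                joined.put(entry.getKey(), new double[] { entry.getValue(), other });
            }
        }
        return joined;
    }
}
```

Country names may differ between files, so it is worth normalizing them before building the keys.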
HIV Prevalence
HIV prevalence per country per year, with uncertainty bounds. Cells need some parsing.
- Download csv file.
- Explanation of columns.
- Source Website (seems to be offline).
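The exact layout of the cells is described in the explanation of columns. Purely as an illustration, the plain Java sketch below assumes a cell holds an estimate followed by its lower and upper bounds, for example "0.8 [0.6-1.1]", and simply extracts every unsigned decimal number in order. The cell layout, the class BoundedCellParser and its method are assumptions, so verify them against the real file before relying on this.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls the numbers out of a cell such as "0.8 [0.6-1.1]".
final class BoundedCellParser {
    // Unsigned decimals only, so the hyphen between bounds is not read as a minus sign.
    private static final Pattern NUMBER = Pattern.compile("\\d+(?:\\.\\d+)?");

    // Returns all numbers found in the cell, in order; empty array for a blank cell.
    static double[] numbersIn(String cell) {
        List<Double> found = new ArrayList<>();
        Matcher m = NUMBER.matcher(cell);
        while (m.find()) {
            found.add(Double.parseDouble(m.group()));
        }
        double[] values = new double[found.size()];
        for (int i = 0; i < values.length; i++) {
            values[i] = found.get(i);
        }
        return values;
    }

    public static void main(String[] args) {
        double[] v = numbersIn("0.8 [0.6-1.1]"); // assumed layout: estimate [low-high]
        System.out.println(v.length);
    }
}
```

A caller can then treat an array of length 3 as (estimate, lower bound, upper bound) and handle any other length as missing or irregular data.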
Speed Dating
Speed dating data with over 8,000 observations of matches and non-matches, with answers to survey questions about how people rate themselves and how they rate others on several dimensions. This is a large and rich dataset which might take you some time to fully understand.
World Values Survey
A comprehensive survey consisting of 300+ questions asked to people from different countries about their values, gathered across several years. You can answer a subset of the questions here and see which country best represents your values.
- No csv file is provided here for the moment, but you can download Excel files for individual questions by following the link below. Requires some cleaning up.
- Source Website.
WVS Cultural Map of the World
An aggregated dataset computed from the World Values Survey that measures cultural proximity of countries across two dimensions, and for different time periods. A small but interesting dataset.
Dream Bank
A collection of over 20,000 dream reports with dates. The reports come from a variety of different sources and research studies, from people ages 7 to 74.
- No dataset file is provided here for the moment, but you can download text files by following the link below. Requires some cleaning up. See example of a query result.
- Source Website.
Your own data
You may also choose your own dataset. In order to do so, you must first get your dataset approved by the instructor. Data should be sufficiently complex.
These datasets have been gathered and cleaned up by Petra Isenberg, Pierre Dragicevic and Yvonne Jansen. Please acknowledge these authors when reusing content from this page, and the source data authors for external links. This page is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.