Archive for February 21st, 2008

Sample datasets I am trying to import

I am trying to collect sample datasets to test in OpenVisuals.org. These sample datasets should:

  • represent a generic type of dataset.
  • come in different but common formats, such as the output of copying and pasting or of exporting from Excel. (Right now, I can think of differences in the delimiter of "comma-separated" values: sometimes a comma, sometimes a tab, sometimes a space.)
  • invite comparison among their data.

I will use these datasets to make sure that the website (and hence the applets) works fine with them. So far I have found a couple of suitable datasets:

MDF (wood type) production by continent per year (gathered from the United Nations FAO website)

years	subject	commodity	America +	Asia +	Europe +	Oceania +	

1995	Production Quantity	MDF	2398000.00	1582000.00	3363300.00	540000.00

1996	Production Quantity	MDF	2725000.00	2190000.00	3604300.00	786000.00

1997	Production Quantity	MDF	3285036.00	3737000.00	4587300.00	853000.00

1998	Production Quantity	MDF	3899694.00	3484000.00	6557913.00	899000.00

1999	Production Quantity	MDF	4635000.00	3854000.00	7288500.00	1025000.00

2000	Production Quantity	MDF	4773000.00	4652000.00	8380493.00	1241000.00

2001	Production Quantity	MDF	5101472.00	8002300.00	9163590.00	1350000.00

2002	Production Quantity	MDF	5601924.00	10135100.00	10386690.00	1419000.00

2003	Production Quantity	MDF	6327229.00	13970200.00	11110990.00	1627000.00

2004	Production Quantity	MDF	7729545.00	18658026.00	11846105.00	1639000.00

2005	Production Quantity	MDF	7751384.00	24133523.00	12704400.00	1622000.00

2006	Production Quantity	MDF	8467770.00	27017523.00	13383800.00	1635000.00

In the table above, the generated dataset is delimited not with commas but with tabs (although the file is .csv). This also happens when you copy and paste from Excel.
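Since the file extension cannot be trusted, an importer has to guess the delimiter from the content. Below is a minimal Python sketch of one way to do that, not the code OpenVisuals.org actually uses: it tries comma, tab and space, and keeps the candidate that splits every line into the same number of fields. The function name `guess_delimiter` and the candidate list are my own assumptions, and stripping each line (to cope with the trailing tab in the header above) would also drop a genuinely empty last column.

```python
import csv

# Candidate delimiters, taken from the cases mentioned in this post.
CANDIDATES = [",", "\t", " "]

def guess_delimiter(text, candidates=CANDIDATES):
    """Return the candidate that splits every non-empty line into the
    same number of fields (more than one), or None if none does."""
    # Strip each line so a trailing delimiter (like the tab that ends
    # the header row) does not create an extra empty field.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    best, best_fields = None, 1
    for delim in candidates:
        rows = csv.reader(lines, delimiter=delim)
        counts = {len(row) for row in rows}
        if len(counts) == 1:          # every line has the same width
            width = counts.pop()
            if width > best_fields:   # prefer the widest consistent split
                best, best_fields = delim, width
    return best

# First rows of the MDF dataset, with the tabs written out explicitly.
sample = (
    "years\tsubject\tcommodity\tAmerica +\tAsia +\tEurope +\tOceania +\t\n"
    "1995\tProduction Quantity\tMDF\t2398000.00\t1582000.00\t3363300.00\t540000.00\n"
    "1996\tProduction Quantity\tMDF\t2725000.00\t2190000.00\t3604300.00\t786000.00\n"
)

print(repr(guess_delimiter(sample)))  # prints '\t'
```

The space candidate is rejected here because "Production Quantity" and "America +" contain spaces, so the header and the data rows split into different numbers of fields.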

Below is a very typical example of a dataset with empty cells (unknown data); its subject is one of the most popular topics lately.

"State","Barack Obama","Hillary Rodham Clinton  ","John Edwards","John McCain ","Mike Huckabee","Mitt Romney","Ron Paul"
"Alabama",10,19,0,16,21,0,0
"Arizona",21,25,0,50,0,0,0
"Arkansas",3,10,0,1,29,1,0
"California",155,195,0,146,0,3,0
"Connecticut",26,22,0,27,0,0,0
"Delaware",9,6,0,18,0,0,0
"Georgia",22,12,0,3,45,0,0
"Illinois",68,35,0,54,0,3,0
"Kansas",15,6,0,,,,0
"Massachusetts",38,55,0,18,0,22,0
"Missouri",36,36,0,58,0,0,0
"Montana",,,,0,0,25,0
"New Jersey",46,54,0,52,0,0,0
"New Mexico",12,14,0,,,,0
"New York",80,121,0,101,0,0,0
"North Dakota",,,,5,5,8,5
"Oklahoma",14,24,0,32,6,0,0
"Tennessee",14,24,0,19,25,8,0
"Utah",14,9,0,0,0,36,0
"West Virginia",,,,,18,0,0
"Florida",,,,57,0,0,0
"South Carolina",25,12,8,19,5,0,0
"Michigan",,,,6,1,23,0
"New Hampshire",9,9,4,7,1,4,0
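The empty cells are the tricky part: an unknown value is not the same as 0, so the importer should not silently turn one into the other. Here is a rough Python sketch of how a few rows of the dataset above could be read with empty cells kept as `None`. The function name `load_counts` is just an illustration, and I am assuming every non-empty cell is a whole number, as it is in this dataset.

```python
import csv
import io

# A few rows copied from the dataset above.
sample = '''"State","Barack Obama","Hillary Rodham Clinton  ","John Edwards","John McCain ","Mike Huckabee","Mitt Romney","Ron Paul"
"Alabama",10,19,0,16,21,0,0
"Kansas",15,6,0,,,,0
"Montana",,,,0,0,25,0
'''

def load_counts(text):
    """Parse the CSV text into {state: {column name: int or None}}."""
    reader = csv.reader(io.StringIO(text))
    # Some header names carry stray trailing spaces, so trim them.
    header = [name.strip() for name in next(reader)]
    names = header[1:]
    table = {}
    for row in reader:
        state, cells = row[0], row[1:]
        # An empty cell means "unknown", which is kept as None, not 0.
        table[state] = {
            name: (int(cell) if cell.strip() else None)
            for name, cell in zip(names, cells)
        }
    return names, table

names, table = load_counts(sample)
print(table["Kansas"]["John McCain"])   # prints None (unknown)
print(table["Kansas"]["Ron Paul"])      # prints 0 (known to be zero)
```

Keeping `None` separate from 0 lets a chart skip the unknown values instead of drawing them as zero-height bars.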

Here is also the imported version of this dataset in OpenVisuals.org.