An Incomplete #dh2013 Twitter Archive (Conference Days Only; Times in GMT and BST)
This .XLS file contains a dataset of Tweets tagged with #dh2013 (case-insensitive).
This file was created and shared by Ernesto Priego (Centre for Information Science, City University London) under a Creative Commons Attribution license (CC-BY) for academic research and educational use.
The Digital Humanities 2013 conference took place at the University of Nebraska–Lincoln, USA, 16-19 July 2013.
The file contains approximately 6,661 Tweets published publicly and tagged with #dh2013 between Mon Jul 15 07:12:10 +0000 and Sat Jul 20 23:20:04 +0000.
Please note that the data in this set is incomplete. If you have any of the missing Tweets, please let us know so that we can complete the set.
The Tweets contained in this file were originally collected in July 2013 using Martin Hawksey’s TAGS 5.1. Due to the volume of Tweets, several Google Spreadsheets were created before and during the event; these were subsequently refined into individual sheets. The chronology was reconstructed manually. With thanks to Lisa Rhody, who contributed some Tweets I had failed to collect.
Sheet 0. A 'Cite Me' sheet, including the provenance of this file, citation information, information about its contents, the methods employed and some context.
Sheet 1. Monday 15 July 2013 (371 Tweets; noticeably incomplete)
Sheet 2. Tuesday 16 July 2013 (1,187 Tweets)
Sheet 3. Wednesday 17 July 2013 (2,227 Tweets)
Sheet 4. Thursday 18 July 2013 (2,826 Tweets)
Sheet 5. Friday 19 July 2013 (approx. 1,500 Tweets; various Tweets with line breaks; noticeably incomplete due to high volumes and collection times; set starts from Fri Jul 19 13:41:01 +0000)
Sheet 6. Saturday 20 July 2013 (122 Tweets; incomplete; set starts from Sat Jul 20 17:42:30)
Times are, unfortunately, in GMT (created) and BST (time). Ideally they would be in Nebraska time, which was CDT during the conference, though of course not all Tweets were sent from the conference location. Because of this time difference, the dates do not always correspond to conference days.
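Anyone re-dating the Tweets will need to shift them into conference time. The following is a minimal sketch, not part of the original workflow. It assumes that the GMT 'created' values follow Twitter's classic created_at format including the year (e.g. "Mon Jul 15 07:12:10 +0000 2013"), and that the BST 'time' values have already been read as naive datetimes. Fixed offsets are used (BST = UTC+1, CDT = UTC-5), which are only valid for summer dates such as July 2013. The helper names are hypothetical, and month/weekday parsing assumes an English (C) locale.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets valid for July 2013 (daylight saving time in both the UK and Nebraska).
BST = timezone(timedelta(hours=1), "BST")
CDT = timezone(timedelta(hours=-5), "CDT")

# Assumed format of a 'created' value: Twitter's classic created_at string.
CREATED_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def created_to_cdt(created: str) -> datetime:
    """Parse a GMT 'created' timestamp and express it in Nebraska time (CDT)."""
    return datetime.strptime(created, CREATED_FORMAT).astimezone(CDT)


def bst_to_cdt(naive_bst: datetime) -> datetime:
    """Treat a naive 'time' value as BST and convert it to CDT."""
    return naive_bst.replace(tzinfo=BST).astimezone(CDT)


if __name__ == "__main__":
    print(created_to_cdt("Mon Jul 15 07:12:10 +0000 2013").isoformat())  # 2013-07-15T02:12:10-05:00
    print(created_to_cdt("Sat Jul 20 23:20:04 +0000 2013").isoformat())  # 2013-07-20T18:20:04-05:00
```

With a full timezone database available, `zoneinfo.ZoneInfo("America/Chicago")` would handle daylight saving transitions automatically. Fixed offsets simply avoid that dependency.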
Only users with at least 2 followers were included in the archive. Retweets have been included. The data might require deduplication.
Due to the different methods employed in attempting to catch a high volume of Tweets, the metadata in the set is unfortunately not complete. The lack of ISO language metadata in most of these sheets is particularly disappointing, as it would have provided interesting insights.
Some work was done to ensure the chronology was complete; I have highlighted gaps in the Tweets in yellow on the sheets and in the listing above.
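Because the sheets were assembled from several overlapping spreadsheets, the same Tweet may appear more than once. A minimal deduplication sketch follows. It assumes each sheet has been exported to CSV with a Tweet ID column named `id_str`; that column name is an assumption, not something this file documents. IDs are compared as strings, because numeric Tweet IDs exceed the precision of spreadsheet numbers. The function name is hypothetical.

```python
import csv
import io
from typing import Iterable, Iterator


def dedupe_tweets(rows: Iterable[dict], key: str = "id_str") -> Iterator[dict]:
    """Yield each Tweet once, keeping the first occurrence of every ID.

    Rows with an empty or missing ID cannot be matched, so they are passed through.
    """
    seen = set()
    for row in rows:
        tweet_id = (row.get(key) or "").strip()
        if not tweet_id:
            yield row
            continue
        if tweet_id in seen:
            continue  # duplicate collected by an overlapping spreadsheet
        seen.add(tweet_id)
        yield row


if __name__ == "__main__":
    # Hypothetical toy rows, standing in for one exported sheet.
    sample = io.StringIO(
        "id_str,from_user,text\n"
        "1,user_a,hello #dh2013\n"
        "2,user_b,RT @user_a: hello #dh2013\n"
        "1,user_a,hello #dh2013\n"
    )
    print(len(list(dedupe_tweets(csv.DictReader(sample)))))  # 2
```

Note that retweets have distinct IDs of their own, so this removes repeated collection of the same Tweet without discarding retweets.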
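Gaps like the ones highlighted in yellow can also be flagged programmatically by looking for unusually long intervals between consecutive Tweets. This is a minimal sketch, not the manual method used for this file. The threshold is an arbitrary assumption, and a long interval may simply reflect a quiet period (for example overnight) rather than a collection failure, so any flagged span still needs checking by hand.

```python
from datetime import datetime, timedelta
from typing import List, Sequence, Tuple


def find_gaps(times: Sequence[datetime], threshold: timedelta) -> List[Tuple[datetime, datetime]]:
    """Return (start, end) pairs where consecutive Tweets are more than `threshold` apart."""
    ordered = sorted(times)
    return [
        (earlier, later)
        for earlier, later in zip(ordered, ordered[1:])
        if later - earlier > threshold
    ]


if __name__ == "__main__":
    # Hypothetical timestamps, not taken from the dataset.
    stamps = [
        datetime(2013, 7, 19, 13, 0),
        datetime(2013, 7, 19, 13, 5),
        datetime(2013, 7, 19, 14, 0),
    ]
    for start, end in find_gaps(stamps, timedelta(minutes=30)):
        print(start, "->", end)  # 2013-07-19 13:05:00 -> 2013-07-19 14:00:00
```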
Please note that both research and experience show that the Twitter search API isn't 100% reliable. Large tweet volumes affect the search collection process. The API might "over-represent the more central users", not offering "an accurate picture of peripheral activity" (González-Bailón, Sandra, et al. 2012).
The Tweet volume exceeded what the available collection methods could capture, so the data in this file is known to be incomplete. There is no guarantee that this file contains every Tweet tagged with #dh2013 during the indicated period; it is shared for comparative and indicative educational and research purposes only.
Please note that the data in this file is likely to require further refinement and even deduplication. The data is shared as is. This dataset is shared to encourage open research into scholarly activity on Twitter. If you use or refer to this data in any way, please cite and link back using the citation information above.