"The BBC's Great Debate": Anonymised Data from a #BBCDebate Archive
The raw data was downloaded as an Excel spreadsheet file
containing an archive of 38,166 Tweets (38,066 unique Tweets) publicly
published with the queried hashtag (#BBCDebate) between 14/06/2016
22:03:18 and 22/06/2016 09:12:32 BST. Due to the expected high volume of
Tweets, only users with at least 10 followers were included in the archive.
The Tweets contained in the Archive sheet were collected using Martin Hawksey’s TAGS 6.0. Given the relatively large volume of activity expected around #BBCDebate and the public and political nature of the hashtag, I have shared only indicative data. Neither full Tweets nor any other associated metadata have been shared.
The dataset contains a metrics summary as well as a table with the column headings created_at, time, geo_coordinates, user_lang and user_followers_count, with data corresponding to each Tweet. The geo_coordinates column has been anonymised: where data was present, YES is indicated; where no data was present, the cell has been left blank.
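As a rough illustration of how a table with these column headings might be summarised, the following Python sketch counts geotagged Tweets and interface languages. It assumes the Archive sheet has been exported to CSV. The sample rows, including the created_at layout, are made up for the example and are not taken from the dataset.

```python
import csv
import io
from collections import Counter

# Hypothetical rows using the column headings listed in the description.
# All values (and the created_at layout) are illustrative assumptions.
SAMPLE_CSV = """created_at,time,geo_coordinates,user_lang,user_followers_count
Tue Jun 14 21:03:18 +0000 2016,14/06/2016 22:03:18,,en,120
Tue Jun 14 21:04:02 +0000 2016,14/06/2016 22:04:02,YES,en,15
Tue Jun 14 21:05:40 +0000 2016,14/06/2016 22:05:40,,fr,2300
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

# geo_coordinates is anonymised: "YES" where data existed, blank otherwise.
geotagged = sum(1 for r in rows if r["geo_coordinates"].strip().upper() == "YES")

# Distribution of the user_lang values across the sample rows.
languages = Counter(r["user_lang"] for r in rows)

# Follower counts as integers, e.g. to check the 10-follower threshold.
followers = [int(r["user_followers_count"]) for r in rows]

print("geotagged:", geotagged, "of", len(rows))
print("languages:", dict(languages))
print("minimum followers:", min(followers))
```

To work with the real file, the in-memory sample would be replaced by the exported sheet opened with `open(...)`; the file name and export step are left to the reader.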
Timestamps should suffice to prove the existence of the Tweets and could be useful to run analyses of activity on Twitter around a real-time media event.
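One simple way to look at activity over time is to count Tweets per clock hour. The sketch below is a minimal example under stated assumptions: it treats the time column as holding strings in the DD/MM/YYYY HH:MM:SS layout used for the dates above, and it uses made-up sample values rather than rows from the dataset. The function name `tweets_per_hour` is hypothetical.

```python
from collections import Counter
from datetime import datetime

# Hypothetical timestamps in an assumed DD/MM/YYYY HH:MM:SS layout;
# the real `time` column may be formatted differently.
SAMPLE_TIMES = [
    "14/06/2016 22:03:18",
    "14/06/2016 22:45:10",
    "14/06/2016 23:01:00",
    "21/06/2016 20:00:01",
    "21/06/2016 20:15:42",
    "21/06/2016 20:59:59",
]


def tweets_per_hour(times, fmt="%d/%m/%Y %H:%M:%S"):
    """Count timestamps per clock hour; keys are datetimes truncated to the hour."""
    counts = Counter()
    for raw in times:
        ts = datetime.strptime(raw, fmt)
        counts[ts.replace(minute=0, second=0, microsecond=0)] += 1
    return counts


hourly = tweets_per_hour(SAMPLE_TIMES)
for hour, n in sorted(hourly.items()):
    print(hour.strftime("%Y-%m-%d %H:00"), n)
```

Bucketing by hour is only one choice; a finer interval such as one minute may suit a live broadcast better, at the cost of noisier counts.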
No personally identifiable information (PII) or sensitive personal information (SPI) was collected or is contained in the dataset.
Some basic deduplication and refining of the collected data has been performed.
For more information, including methodological issues and limitations, please see the references listed below.