An Archive of #APE2017 Tweets Published 16-18 January 2017 GMT
Dataset posted on 19.01.2017 by Ernesto Priego
The Academic Publishing in Europe 12 (APE 2017) conference took place in Berlin, Germany, on 17-18 January 2017, with a Pre-Conference Day on 16 January 2017. A hashtag used on Twitter to report from and discuss the conference was #APE2017.
What This Output Is
This is a CSV file containing a total of 2,011 Tweets publicly published with the hashtag #APE2017 between Monday January 16 2017 at 00:48:59 +0000 and Wednesday January 18 2017 22:42:06 +0000.
Please note the conference took place in Berlin (GMT +1); the local time of publishing appears under column E.
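As an illustration of the relationship between the GMT timestamps and the local conference time, the following minimal sketch converts the first and last timestamps quoted above to GMT +1. It assumes the timestamps are stored in the Twitter-style `created_at` string format ("Mon Jan 16 00:48:59 +0000 2017") and an English/C locale for day and month names; the function name `to_conference_local` is hypothetical and is not part of the dataset or of TAGS.

```python
from datetime import datetime, timedelta, timezone

# Fixed GMT +1 offset, as stated for Berlin during the conference dates.
CONFERENCE_TZ = timezone(timedelta(hours=1))

# Assumed timestamp layout (Twitter-style created_at string).
TWITTER_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def to_conference_local(created_at: str) -> datetime:
    """Parse a GMT timestamp string and return it in conference local time."""
    utc_time = datetime.strptime(created_at, TWITTER_FORMAT)
    return utc_time.astimezone(CONFERENCE_TZ)


first = to_conference_local("Mon Jan 16 00:48:59 +0000 2017")
last = to_conference_local("Wed Jan 18 22:42:06 +0000 2017")
print(first.isoformat())  # 2017-01-16T01:48:59+01:00
print(last.isoformat())   # 2017-01-18T23:42:06+01:00
```

A fixed offset is used here rather than a named time zone because the whole collection period falls in winter time.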
Methodology and Limitations
The Tweets contained in this file were collected by Ernesto Priego using Martin Hawksey's TAGS 6.0. The original data collection gathered 2,133 Tweets over a period covering 10-19 January 2017.
For the purpose of this particular dataset, the original collection was refined to include only the conference period of 16-18 January, and the data was re-ordered chronologically.
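The refinement described above (keeping only 16-18 January and sorting chronologically) could be reproduced along the following lines. This is only a sketch under stated assumptions, not the procedure actually used by the author: the column names `id_str` and `created_at`, the timestamp format, the use of GMT day boundaries, and the sample rows are all illustrative.

```python
from datetime import datetime, timezone

# Assumed timestamp layout (Twitter-style created_at string).
TWITTER_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def refine(rows, start, end, time_column="created_at"):
    """Keep rows with start <= timestamp < end, sorted oldest first."""
    dated = [(datetime.strptime(r[time_column], TWITTER_FORMAT), r) for r in rows]
    kept = [(t, r) for t, r in dated if start <= t < end]
    kept.sort(key=lambda pair: pair[0])
    return [r for _, r in kept]


# Made-up rows standing in for the original 10-19 January collection.
sample = [
    {"id_str": "3", "created_at": "Thu Jan 19 09:00:00 +0000 2017"},
    {"id_str": "2", "created_at": "Wed Jan 18 22:42:06 +0000 2017"},
    {"id_str": "0", "created_at": "Tue Jan 10 12:00:00 +0000 2017"},
    {"id_str": "1", "created_at": "Mon Jan 16 00:48:59 +0000 2017"},
]

start = datetime(2017, 1, 16, tzinfo=timezone.utc)  # inclusive
end = datetime(2017, 1, 19, tzinfo=timezone.utc)    # exclusive
refined = refine(sample, start, end)
print([r["id_str"] for r in refined])  # ['1', '2']
```

In practice the rows would come from `csv.DictReader` applied to the deposited file, with the column names adjusted to match its actual header.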
Retweets have been included (Retweets count as Tweets), so Tweet text duplication is normal. The collection spreadsheet was customised to reflect the time zone and geographical location of the conference and GMT (columns D and E).
The profile_image_url and entities_str metadata were removed before public sharing in this archive.
Though initial data refining was conducted, please bear in mind that the conference hashtag might have been spammed, so some Tweets collected may be from spam accounts. Some automated refining has been performed to remove Tweets not related to the conference, but the data is likely to require further refining and deduplication.
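For anyone wishing to carry out the further deduplication mentioned above, one possible starting point is sketched below. It drops rows that repeat the same Tweet identifier and separates Retweets from original Tweets using the "RT @" text prefix, which is a heuristic and not a guaranteed marker. The column names `id_str` and `text`, the function names, and the sample rows are assumptions made for illustration.

```python
def drop_duplicate_ids(rows, id_column="id_str"):
    """Return rows with repeated identifiers removed, keeping the first of each."""
    seen = set()
    unique = []
    for row in rows:
        if row[id_column] not in seen:
            seen.add(row[id_column])
            unique.append(row)
    return unique


def split_retweets(rows, text_column="text"):
    """Split rows into (originals, retweets) using the 'RT @' prefix heuristic."""
    originals, retweets = [], []
    for row in rows:
        if row[text_column].startswith("RT @"):
            retweets.append(row)
        else:
            originals.append(row)
    return originals, retweets


# Made-up rows: one original Tweet, one Retweet, and an accidental repeat of the Retweet.
sample = [
    {"id_str": "1", "text": "Opening keynote starting now #APE2017"},
    {"id_str": "2", "text": "RT @example_user: Opening keynote starting now #APE2017"},
    {"id_str": "2", "text": "RT @example_user: Opening keynote starting now #APE2017"},
]

unique = drop_duplicate_ids(sample)
originals, retweets = split_retweets(unique)
print(len(unique), len(originals), len(retweets))  # 2 1 1
```

Deduplicating on the identifier rather than on the text keeps legitimate Retweets, which share text with the original but are counted as separate Tweets in this archive.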
Both research and experience show that the Twitter search API is not 100% reliable. Large Tweet volumes affect the search collection process. The API might "over-represent the more central users", not offering "an accurate picture of peripheral activity" (Gonzalez-Bailon, Sandra, et al. 2012).
Apart from the filters and limitations already declared, it cannot be guaranteed that this file contains each and every Tweet tagged with #APE2017 during the indicated period, and the dataset is shared for archival, comparative and indicative educational research purposes only. Other hashtag combinations might have been used for the conference as well.
Only content from public accounts is included and was obtained from the Twitter Search API. The shared data is also publicly available to all Twitter users via the Twitter Search API and available to anyone with an Internet connection via the Twitter and Twitter Search web client and mobile apps without the need of a Twitter account.
Each Tweet and its contents were published openly on the Web with the queried hashtag and are the responsibility of the original authors. Original Tweets are likely to be copyright of their individual authors, but please check individually.
This dataset is shared to archive, document and encourage open educational research into scholarly activity on Twitter.
Tweets published publicly by scholars during academic conferences are often tagged (labeled) with a hashtag dedicated to the conference in question.
The purpose and function of hashtags is to organise and describe information/outputs under the relevant label in order to enhance the discoverability of the labeled information/outputs (Tweets in this case).
A hashtag is metadata users choose freely to use so their content is associated, directly linked to and categorised with the chosen hashtag.
Though the reasons for Tweeters' use of hashtags can be neither generalised nor predicted, it can be argued that scholarly Twitter users form specialised, self-selecting public professional networks that tend to observe scholarly practices and accepted modes of social and professional behaviour.
In general terms it can be argued that scholarly Twitter users willingly and consciously tag their public Tweets with a conference hashtag as a means to network and to promote, report from, reflect on, comment on and generally contribute publicly to the scholarly conversation around conferences. As Twitter users, conference Twitter hashtag contributors have agreed to Twitter's Privacy and data sharing policies.
Professional associations like the Modern Language Association recognise Tweets as citeable scholarly outputs. Archiving scholarly Tweets is a means to preserve this form of rapid online scholarship that otherwise can very likely become unretrievable as time passes; Twitter's search API has well-known temporal limitations for retrospective historical search and collection.
Beyond individual Tweets as scholarly outputs, the collective scholarly activity on Twitter around a conference or academic project or event can provide interesting insights for the contemporary history of scholarly communications. To date, collecting in real time is the only relatively accurate method to archive Tweets at a small scale.
Though these datasets have limitations and are not thoroughly systematic, it is hoped they can contribute to developing new insights into the discipline's presence on Twitter over time.
The CC-BY license has been applied to the output in the repository as a curated dataset. Authorial/curatorial/collection work has been performed on the file in order to make it available as part of the scholarly record. The data contained in the deposited file is otherwise freely available elsewhere through different methods, and anyone not wishing to attribute the data to the creator of this output is, needless to say, free to do their own collection and clean their own data.