#WLIC2016 Most Frequent Terms Roundup
The IFLA World Library and Information Congress 2016 and 82nd IFLA General Conference and Assembly, ‘Connections. Collaboration. Community’, took place 13–19 August 2016 at the Greater Columbus Convention Center (GCCC) in Columbus, Ohio, United States. The official hashtag of the conference was #WLIC2016.
This spreadsheet contains the results of a text analysis of 22327 Tweets publicly labeled with #WLIC2016 between Sunday 14 and Thursday 18 August 2016.
The source dataset was collected with a Twitter Archiving Google Spreadsheet, and the automated text analysis was done with the Terms tool from Voyant Tools.
The spreadsheet contains:
A sheet containing a table summarising the source archive.
A sheet containing a table detailing tweet counts per day.
Sheets containing the 'raw' (no stop word filter applied, no manual refining) tables of the top 300 most frequent terms and their counts for the Sun-Thu corpus and for each individual corpus (1 per day).
Sheets containing the 'edited' (edited English stop word filter applied, manually refined) tables of the top 50 most frequent terms and their counts for the Sun-Thu corpus and for each individual corpus (1 per day).
A sheet containing a comparison table of the top 50 per day.
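For readers who want a rough sense of what the 'raw' and 'edited' term tables represent, the following is a minimal Python sketch of top-N term counting with and without a stop word filter. It is not the Voyant Tools implementation; the tokenisation rule, the function names and the short stop word list are assumptions made for illustration only, and the manual refining applied to the 'edited' tables is not reproduced.

```python
import re
from collections import Counter

# Illustrative stop word list only (an assumption); the 'edited' sheets
# used an edited English stop word list in Voyant Tools.
STOP_WORDS = frozenset({"the", "at", "are", "a", "and", "of", "to", "in", "rt"})


def tokenize(text):
    """Lowercase a Tweet and split it into word-like tokens, keeping # and @ prefixes."""
    return re.findall(r"[#@]?\w+", text.lower())


def top_terms(tweets, n, stop_words=frozenset()):
    """Return the n most frequent terms as (term, count) pairs.

    Ties are broken alphabetically so the result is deterministic.
    """
    counts = Counter(
        token
        for tweet in tweets
        for token in tokenize(tweet)
        if token not in stop_words
    )
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))[:n]
```

Under these assumptions, `top_terms(tweets, 300)` would approximate a 'raw' table and `top_terms(tweets, 50, STOP_WORDS)` an unrefined version of an 'edited' table, computed either over the whole Sun-Thu corpus or over the Tweets of a single day.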
Only Tweets published by accounts with at least one follower were included in the source archive.
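The description does not say how this filter was applied. As a hedged illustration, a follower threshold could be applied to archive rows along these lines; the column name `user_followers_count` and both function names are assumptions, not details confirmed by this dataset.

```python
def has_min_followers(row, min_followers=1):
    """Return True if the row's follower count meets the threshold.

    Missing or empty values are treated as zero followers.
    'user_followers_count' is an assumed column name.
    """
    value = row.get("user_followers_count") or 0
    return int(value) >= min_followers


def filter_archive(rows, min_followers=1):
    """Keep only the archive rows whose account has at least min_followers followers."""
    return [row for row in rows if has_min_followers(row, min_followers)]
```

Each row is assumed to be a dictionary, for example as produced by `csv.DictReader` on an exported copy of the archive sheet.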
Both research and experience show that the Twitter search API is not 100% reliable. Large Tweet volumes affect the search collection process. The API might "over-represent the more central users", not offering "an accurate picture of peripheral activity" (González-Bailon, Sandra, et al, 2012).
Apart from the filters and limitations already declared, it cannot be guaranteed that each and every Tweet tagged with #WLIC2016 during the indicated period was analysed. The dataset was shared for archival, comparative and indicative educational research purposes only.
Only content from public accounts, obtained from the Twitter Search API, was analysed. The source data is also publicly available to all Twitter users via the Twitter Search API, and to anyone with an Internet connection via the Twitter and Twitter Search web clients and mobile apps, without the need for a Twitter account.
This file contains the results of analyses of Tweets that were published openly on the Web with the queried hashtag; the source Tweets are not included. The content of the source Tweets is the responsibility of the original authors. Original Tweets are likely to be copyright of their individual authors, but please check individually.
A hashtag is metadata that users freely choose to use so that their content is associated with, directly linked to and categorised under the chosen hashtag. The purpose and function of hashtags is to organise and describe information/outputs under the relevant label in order to enhance the discoverability of the labeled information/outputs (Tweets in this case). Tweets published publicly by scholars or other professionals during academic conferences are often publicly tagged (labeled) with a hashtag dedicated to the conference in question. This practice used to be confined to a few 'niche' fields; it is increasingly becoming the norm rather than the exception.
Though not every reason for Tweeters' use of hashtags can be generalised or predicted, it can be argued that scholarly Twitter users form specialised, self-selecting public professional networks that tend to observe scholarly practices and accepted modes of social and professional behaviour.
In general terms it can be argued that scholarly Twitter users willingly and consciously tag their public Tweets with a conference hashtag as a means to network and to promote, report from, reflect on, comment on and generally contribute publicly to the scholarly conversation around conferences. As Twitter users, conference Twitter hashtag contributors have agreed to Twitter's Privacy and data sharing policies.
Professional associations like the Modern Language Association and the American Psychological Association recognise Tweets as citeable scholarly outputs. Archiving scholarly Tweets is a means to preserve this form of rapid online scholarship, which can otherwise very likely become unretrievable as time passes; Twitter's search API has well-known temporal limitations for retrospective historical search and collection.
Beyond individual Tweets as scholarly outputs, the collective scholarly activity on Twitter around a conference or academic project or event can provide interesting insights for the contemporary history of scholarly communications. Though this work has limitations and might not be thoroughly systematic, it is hoped it can contribute to developing new insights into a discipline's public concerns as expressed on Twitter over time.
As is increasingly recommended for data sharing, the CC-0 license has been applied to the resulting output in the repository. It is important, however, to bear in mind that some terms appearing in the dataset might be individually licensed under different terms; copyright of the source Tweets (and sometimes of individual terms) belongs to their authors.
Authorial/curatorial/collection work has been performed on the shared file as a curated dataset resulting from analysis, in order to make it available as part of the scholarly record. If this dataset is consulted, attribution is always welcome.
Ideally, for proper reproducibility and to encourage other studies, the whole archive dataset should be available. Those wishing to obtain the full Tweets should still be able to collect them themselves via text and data mining methods.