24 Hours of #DHDiversity - figshare.csv (430.53 kB)

24 Hours of #DHDiversity

dataset

posted on 2016-07-14, 09:00 authored by Ernesto Priego

Background

The Digital Humanities 2016 conference is taking/took place in Kraków, Poland, between Monday 11 July and Saturday 16 July 2016. #DH2016 is/was the official conference hashtag.

On Wednesday 13 July 2016 from 11:30am to 1:00pm local time the panel titled “Quality Matters: Diversity and the Digital Humanities in 2016” was chaired by Amy Earhart and included presentations by Alex Gil, Roopika Risam, Barbara Bordalejo, Isabel Galina, Lorna Hughes, and Melissa Terras.

After the lunch break, a second diversity panel, titled “Boundary Land: Diversity as a defining feature of the Digital Humanities”, took place from 2:30 to 4:00 pm. It was chaired by Isabel Galina Russell and included presentations by Barbara Bordalejo, Padmini Ray Murray, Gimena del Rio and Elena González-Blanco.

These sessions were discussed on Twitter with the additional #dhdiversity hashtag (not case-sensitive).

What This Output Is

This is a CSV file containing a total of 1151 Tweets publicly published with the hashtag #DHDiversity.

The archive starts with a Tweet published on Wednesday 13 July 2016 at 07:41:01 +0000 and finishes with a Tweet published on Thursday 14 July 2016 at 08:10:10 +0000.
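As a quick check on the period these two timestamps cover, the following minimal sketch computes the span with Python's standard library. It reads the final time as 08:10:10 +0000 and builds the datetimes by hand; it does not parse the CSV, whose timestamp column format is not described here.

```python
from datetime import datetime, timezone

# First and last Tweet times as stated in the description (both +0000, i.e. UTC).
first_tweet = datetime(2016, 7, 13, 7, 41, 1, tzinfo=timezone.utc)
last_tweet = datetime(2016, 7, 14, 8, 10, 10, tzinfo=timezone.utc)

# Subtracting two aware datetimes gives a timedelta.
span = last_tweet - first_tweet
hours, remainder = divmod(int(span.total_seconds()), 3600)
minutes, seconds = divmod(remainder, 60)

print(f"Archive span: {hours}h {minutes}m {seconds}s")  # Archive span: 24h 29m 9s
```

The archive therefore covers slightly more than the 24 hours named in the title.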

Methodology and Limitations

The Tweets contained in this file were collected by Ernesto Priego using Martin Hawksey's TAGS 6.0.

Only users with at least 1 follower were included in the archive. Retweets have been included (Retweets count as Tweets). The collection spreadsheet was customised to reflect the time zone and geographical location of the conference.

The profile_image_url and entities_str metadata were removed before public sharing in this archive.
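For anyone preparing a similar archive, removing columns before sharing can be sketched as below. This is an illustration only, not the procedure actually used for this file. The two removed column names come from the description above; the other column names (id_str, from_user, text) and the sample row are hypothetical.

```python
import csv
import io

# Column names taken from the dataset description.
REMOVED_COLUMNS = {"profile_image_url", "entities_str"}


def drop_columns(rows, removed=REMOVED_COLUMNS):
    """Return copies of the rows without the named columns."""
    return [
        {name: value for name, value in row.items() if name not in removed}
        for row in rows
    ]


# Hypothetical one-row sample standing in for a collection spreadsheet export.
sample = io.StringIO(
    "id_str,from_user,text,profile_image_url,entities_str\n"
    "1,example_user,Hello #DHDiversity,http://example.org/a.png,{}\n"
)

rows = drop_columns(csv.DictReader(sample))
print(list(rows[0].keys()))  # ['id_str', 'from_user', 'text']
```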

Please bear in mind that the conference hashtag has been spammed, so some of the Tweets collected may be from spam accounts. Some automated refining has been performed to remove Tweets not related to the conference, but the data is likely to require further refining and deduplication.
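A first deduplication pass could look like the sketch below. It assumes each row carries a unique Tweet identifier in a column named id_str; that column name and the sample rows are assumptions for illustration and should be checked against the header of the CSV. Retweets have their own identifiers, so they are kept, consistent with Retweets being counted as Tweets in this archive.

```python
import csv
import io


def deduplicate(rows, key="id_str"):
    """Keep the first row seen for each value of `key`, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        if row[key] in seen:
            continue  # exact repeat of a Tweet already kept
        seen.add(row[key])
        unique.append(row)
    return unique


# Hypothetical sample: the third row repeats the first Tweet.
sample = io.StringIO(
    "id_str,from_user,text\n"
    "1,user_a,First #DHDiversity\n"
    "2,user_b,RT @user_a: First #DHDiversity\n"
    "1,user_a,First #DHDiversity\n"
)

unique_rows = deduplicate(csv.DictReader(sample))
print(len(unique_rows))  # 2
```

To run this against the deposited file, the in-memory sample can be replaced with the file opened via open(path, newline="", encoding="utf-8"). Spam removal needs manual or more elaborate filtering and is not covered here.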

Both research and experience show that the Twitter search API is not 100% reliable. Large Tweet volumes affect the search collection process. The API might "over-represent the more central users", not offering "an accurate picture of peripheral activity" (Gonzalez-Bailon, Sandra, et al. 2012).

Apart from the filters and limitations already declared, it cannot be guaranteed that this file contains each and every Tweet tagged with #DHDiversity during the indicated period, and the dataset is shared for archival, comparative and indicative educational research purposes only.

Only content from public accounts is included, and it was obtained from the Twitter Search API. The shared data is also publicly available to all Twitter users via the Twitter Search API, and to anyone with an Internet connection via the Twitter and Twitter Search web clients and mobile apps, without the need for a Twitter account.

Each Tweet and its contents were published openly on the Web with the queried hashtag and are the responsibility of the original authors.

No private personal information is shared in this dataset. The collection and sharing of this dataset is enabled and allowed by Twitter's Privacy Policy. The sharing of this dataset complies with Twitter's Developer Rules of the Road.

This dataset is shared to archive, document and encourage open educational research into scholarly activity on Twitter.

Other Considerations

Tweets published publicly by scholars during academic conferences are often tagged (labeled) with a hashtag dedicated to the conference in question.

The purpose and function of hashtags is to organise and describe information/outputs under the relevant label in order to enhance the discoverability of the labeled information/outputs (Tweets in this case).

A hashtag is metadata users choose freely to use so their content is associated, directly linked to and categorised with the chosen hashtag.

Though the reasons Tweeters use hashtags can be neither generalised nor predicted, it can be argued that scholarly Twitter users form specialised, self-selecting public professional networks that tend to observe scholarly practices and accepted modes of social and professional behaviour.

In general terms it can be argued that scholarly Twitter users willingly and consciously tag their public Tweets with a conference hashtag as a means to network and to promote, report from, reflect on, comment on and generally contribute publicly to the scholarly conversation around conferences. As Twitter users, conference Twitter hashtag contributors have agreed to Twitter's Privacy and data sharing policies.

Professional associations like the Modern Language Association recognise Tweets as citeable scholarly outputs. Archiving scholarly Tweets is a means to preserve this form of rapid online scholarship that otherwise can very likely become unretrievable as time passes; Twitter's search API has well-known temporal limitations for retrospective historical search and collection.

Beyond individual tweets as scholarly outputs, the collective scholarly activity on Twitter around a conference or academic project or event can provide interesting insights for the contemporary history of scholarly communications. To date, collecting in real time is the only relatively accurate method to archive tweets at a small scale.

Though these datasets have limitations and are not thoroughly systematic, it is hoped they can contribute to developing new insights into the discipline's presence on Twitter over time.

The CC-BY license has been applied to the output in the repository as a curated dataset. Authorial/curatorial/collection work has been performed on the file in order to make it available as part of the scholarly record. The data contained in the deposited file is otherwise freely available elsewhere through different methods, and anyone not wishing to attribute the data to the creator of this output is, needless to say, free to do their own collection and clean their own data.