This is an Excel workbook containing two sheets. The first sheet contains 503 rows corresponding to 503 Tweet id strings from the account with from_user_id_str 25073877, together with the following metadata:
created_at time user_lang in_reply_to_user_id_str f from_user_id_str in_reply_to_status_id_str source user_followers_count user_friends_count
Tweet texts, URLs and other metadata such as profile_image_url, status_url and entities_str have not been included.
An attempt was made to remove duplicated entries, but some duplicates may remain, so further data refining may be required before analysis.
The second sheet contains 400 rows corresponding to the most frequent terms in the dataset's Tweets' texts. The text analysis was performed with the Terms Tool from Voyant Tools by Stéfan Sinclair & Geoffrey Rockwell (2017). An edited English stop words list was applied to remove Twitter data specific terms such as t.co, https, user names, etc. The analysed Tweets contained emojis and other special characters; due to character encoding these will be reflected in the terms list as character combinations.
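As an illustration only, the kind of term count described above can be approximated in a few lines of Python. This is a minimal sketch and not the Voyant Tools implementation: the stop list, the regular expressions and the function name top_terms are all assumptions made for the example, and the sample texts are invented. Note that this simple tokeniser discards emojis and other special characters rather than rendering them as character combinations.

```python
import re
from collections import Counter

# Hypothetical stop list: a few English function words plus Twitter-specific
# tokens of the kind mentioned in the description (https, fragments of t.co).
STOP_WORDS = {"the", "a", "an", "and", "to", "of", "in", "is",
              "https", "http", "t", "co", "rt", "amp"}

# Matches URLs and @user names so they can be dropped before counting.
NOISE = re.compile(r"https?://\S+|@\w+")
# In this sketch a term is a run of lowercase letters, digits or apostrophes.
TERM = re.compile(r"[a-z0-9']+")


def top_terms(texts, n=400, stop_words=STOP_WORDS):
    """Return the n most frequent terms across texts, excluding stop words."""
    counts = Counter()
    for text in texts:
        cleaned = NOISE.sub(" ", text.lower())
        counts.update(t for t in TERM.findall(cleaned) if t not in stop_words)
    return counts.most_common(n)


# Invented sample texts, for demonstration only.
sample = [
    "Great meeting today https://t.co/abc123",
    "Great news, thank you @someone!",
    "The meeting was great",
]
print(top_terms(sample, n=2))
```

Because the deposited workbook does not include Tweet texts, a sketch like this can only be run against texts obtained separately.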
Motivations to Share this Data
Archived
Tweets can provide interesting insights for the study of contemporary
history of
media, politics, diplomacy, etc. The queried account is a public
account widely agreed to
be of exceptional national and international public interest. Though
they provide public access to tweeted content in real time, Twitter Web
and mobile clients are not suited for appropriate Tweet corpus analysis.
For anyone researching social media, access to the data is absolutely
essential in order to perform, review and reproduce studies.
Archiving Tweets of public interest
due to their historic significance is a means to both preserve and enable reproducible study of this form of
rapid online communication that otherwise can very likely become
unretrievable as time passes. Due to Twitter's current business model and API limits, to date collecting in real time is the
only relatively reliable method to archive Tweets at a small scale.
So far Twitter data analysis and visualisation has been done without researchers providing access to the source data that would allow reproducibility. It is appreciated that an Excel workbook is far from ideal as a file format, but due to the small scale the intention is to make this data human readable and available to researchers in a variety of non-technical fields.
Methodology and Limitations
The
Tweets contained in this file were collected by Ernesto Priego using a
Python script. The data collection search query was
from:realdonaldtrump. A trigger was scheduled to collect automatically every hour; this means that Tweets deleted shortly after publication, before the next hourly collection, have not been collected.
The
originally harvested data was refined to delete duplicates, to
comply with Twitter's Terms and Conditions, and to sort the data
in chronological order.
Duplication of data due to the automated collection is possible so further data refining might be required.
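For readers who need to refine the data further, the following is a minimal sketch of one way to drop repeated rows and restore chronological order. It is not the script used to prepare this file. It assumes the first sheet has been exported to rows with a hypothetical id_str column holding the Tweet id strings, and that created_at follows the classic Twitter API layout (for example "Wed Aug 27 13:08:45 +0000 2008"); both assumptions should be checked against the workbook. The sample rows are invented.

```python
from datetime import datetime

# Assumed timestamp layout: the classic Twitter API "created_at" style,
# e.g. "Wed Aug 27 13:08:45 +0000 2008". Adjust if the sheet differs.
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def refine(rows):
    """Drop rows whose id_str was already seen, then sort oldest first."""
    seen = set()
    unique = []
    for row in rows:
        # Compare ids as strings: spreadsheet software can round long
        # numeric ids when it stores them as numbers.
        key = str(row["id_str"]).strip()
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return sorted(
        unique,
        key=lambda r: datetime.strptime(r["created_at"], CREATED_AT_FORMAT),
    )


# Invented rows, for demonstration only; "id_str" is a hypothetical column name.
rows = [
    {"id_str": "1002", "created_at": "Tue Mar 07 13:00:00 +0000 2017"},
    {"id_str": "1001", "created_at": "Tue Mar 07 12:00:00 +0000 2017"},
    {"id_str": "1002", "created_at": "Tue Mar 07 13:00:00 +0000 2017"},
]
for row in refine(rows):
    print(row["id_str"], row["created_at"])
```

Keeping the first occurrence of each id is a design choice for this sketch; if repeated rows differ in fields such as user_followers_count, a different rule (for example keeping the latest capture) may be more appropriate.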
The file may not contain data from Tweets deleted by the queried user account immediately after original publication.
Both research and experience show that the Twitter
Search API is not 100% reliable
(Gonzalez-Bailon, Sandra, et al. 2012).
Apart from the filters
and limitations already declared, it cannot be guaranteed that this file
contains each and every Tweet posted by the queried account during the
indicated period. This dataset is shared for archival,
comparative and indicative educational research purposes only.
The
content included is from a public Twitter account and was obtained from
the Twitter Search API. The shared data is also publicly available to
all Twitter users via the Twitter Search API and available to anyone
with an Internet connection via the Twitter and Twitter Search web
client and mobile apps without the need of a Twitter account.
The original Tweets, their contents and associated metadata were published openly on the Web from the
queried public account and are the responsibility of the original authors.
Original Tweets are likely to be the copyright of their individual authors but
please check individually. The license on this output applies to the data collection; third-party content should be attributed to the original authors and copyright owners.
Please note that usernames, user profile pictures and full text of the Tweets collected have not been included in this file. No private personal information is
shared in this dataset. As indicated above this dataset does not contain
the text of the Tweets. The collection and sharing of this dataset is
enabled and allowed by Twitter's Privacy Policy. The sharing of this
dataset complies with Twitter's Developer Rules of the Road.
This dataset is shared to archive, document and encourage open educational research into political activity on Twitter.
Other Considerations
All
Twitter users agree to Twitter's Privacy and data sharing policies.
Social media research remains in its infancy and, though work has been
done to develop best practices, there is as yet no agreement on a series of
grey areas relating to research methodologies, including ad hoc social
media specific research ethics guidelines for reproducible research.
It is understood that public figures Tweet publicly with the conscious intention to have their Tweets publicly accessed and discussed. It is assumed that a public figure Tweeting publicly is of public interest and that such figure, as a Twitter user, has given implicit consent, by agreeing explicitly to Twitter's Terms and Conditions, for their Tweets to be publicly accessed and discussed, including critical analysis, without the need for prior written permission. There is therefore no difference between collecting data and performing data analysis from a public printed or online publication and collecting data and performing data analysis of a dataset containing Twitter data from a public account from a public user in a public role. Though
these datasets have limitations and are not thoroughly systematic, it
is hoped they can contribute to developing new insights into the
discipline's presence on Twitter over time. Reproducibility is
considered here a key value for robust and trustworthy research.
Various
scholarly professional associations, such as the Modern Language
Association, recognise Tweets, datasets and other online and digital resources as citable scholarly outputs.
The data contained in the deposited file is also available elsewhere through other methods.