Covid-19 Sentiments from Tagesschau.de and Twitter

Abstract

This entry covers the extraction of user comments from Tagesschau.de and of Twitter replies to @tagesschau for the respective news articles. A sentiment analysis is then run on both data sets, and the results are described, discussed, and plotted.

1. Get the tagesschau.de User Comments

This section describes how the user comments were extracted. First, the structure of the Tagesschau website was assessed. Tagesschau is a publicly funded German television news service that covers national and international news:

Screenshot of the meta.tagesschau.de website

Next, the user comments were extracted with a Python-based web crawler. The crawler ran on an AWS EC2 instance and iterated through the article IDs (the number at the end of the URL):

https://meta.tagesschau.de/id/145925
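
The ID iteration could look roughly like the sketch below. It uses only the standard library; the function names `article_urls` and `fetch_html`, the timeout, and the choice to skip IDs that fail to load are assumptions for illustration, not the original crawler code.

```python
from urllib.request import urlopen

BASE_URL = "https://meta.tagesschau.de/id/{article_id}"


def article_urls(first_id, last_id):
    """Yield one URL per article ID in the inclusive range."""
    for article_id in range(first_id, last_id + 1):
        yield BASE_URL.format(article_id=article_id)


def fetch_html(url, timeout=10):
    """Return the page HTML, or None if the page cannot be loaded."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses,
        # so an ID without a page is skipped instead of stopping the crawl.
        return None
```

A crawl would then loop over `article_urls(first_id, last_id)`, call `fetch_html` on each URL, and pass every non-empty result to the parsing step.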

The web crawler used the Python package BeautifulSoup to parse the HTML of each page, which includes the user comments. From this HTML, the comment text and metadata such as the comment date were extracted and appended to a pandas DataFrame. After the crawler had iterated through all news articles, it ran a sentiment analysis on the text using the package textblob_de. Finally, it wrote a .CSV file that was downloaded for the subsequent analyses:

Screenshot of the crawler results
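
The parsing step can be sketched roughly as follows. The sample markup, the tag and class names, and the helper names `parse_comments` and `add_sentiment` are assumptions for illustration; the real page structure of meta.tagesschau.de is not shown here and would need to be inspected first. The sentiment scorer is passed in as a function so that the sketch runs without textblob_de installed.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical markup: the tag and class names are assumptions, not the
# actual structure of the Tagesschau comment pages.
SAMPLE_HTML = """
<div class="comment">
  <span class="date">2020-03-01 10:15</span>
  <p class="body">Gute und sachliche Berichterstattung.</p>
</div>
<div class="comment">
  <span class="date">2020-03-01 11:40</span>
  <p class="body">Das ist eine schlechte Entscheidung.</p>
</div>
"""


def parse_comments(html, article_id):
    """Extract date and text of every comment block into a DataFrame."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for node in soup.find_all("div", class_="comment"):
        rows.append({
            "article_id": article_id,
            "date": node.find("span", class_="date").get_text(strip=True),
            "text": node.find("p", class_="body").get_text(strip=True),
        })
    df = pd.DataFrame(rows, columns=["article_id", "date", "text"])
    df["date"] = pd.to_datetime(df["date"])
    return df


def add_sentiment(df, score):
    """Return a copy of df with a 'sentiment' column computed by score(text)."""
    out = df.copy()
    out["sentiment"] = out["text"].map(score)
    return out


comments = parse_comments(SAMPLE_HTML, article_id=145925)
```

With textblob_de installed, a scorer along the lines of `lambda text: TextBlobDE(text).sentiment.polarity` could be passed as `score`, and `DataFrame.to_csv` would then produce the .CSV file.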

2. Get the Twitter Replies to @tagesschau

The next web crawler gathered all Twitter replies addressed to @tagesschau. It was also written in Python and used the module GetOldTweets3, which made it possible to bypass the Twitter API limitations and retrieve older tweets matching the search term "@tagesschau".

Screenshot of the Twitter search term

For each reply, the crawler also extracted the number of replies, favorites, and retweets, and computed the sentiment of the reply text. It ran for about 2 days on an AWS EC2 server and wrote a .CSV file on completion.
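
As a rough illustration of the export step, the sketch below writes such records to CSV with the standard library. The `Reply` record, its field names, and the sample values are made up for this example; the original crawler's column names are not given in this entry.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class Reply:
    """One Twitter reply with the metrics named above (hypothetical layout)."""
    date: str
    text: str
    replies: int
    favorites: int
    retweets: int
    sentiment: float


def replies_to_csv(replies, handle):
    """Write a header row followed by one CSV row per reply."""
    writer = csv.DictWriter(handle, fieldnames=[f.name for f in fields(Reply)])
    writer.writeheader()
    for reply in replies:
        writer.writerow(asdict(reply))


# Illustrative record only, not real data.
buffer = io.StringIO()
replies_to_csv(
    [Reply("2020-03-01", "@tagesschau Danke für den Bericht", 2, 5, 1, 0.7)],
    buffer,
)
csv_text = buffer.getvalue()
```

Passing an open file handle instead of the `StringIO` buffer would write the same rows to disk.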

3. Analyze the Sentiment Changes from January 2020 until June 2020

The next step is a quick first visualization of how the sentiment changed over time, both for the Twitter replies to @tagesschau and for the user comments on the Tagesschau.de website. To make the plot more visually appealing and to smooth out daily spikes, the data was transformed using a 7-day moving average:

Screenshot of moving average dataframe for the Tagesschau.de dataframe
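
One way to compute such a 7-day moving average with pandas is sketched below. The column names `date` and `sentiment`, the function name, and the choice to average per day before smoothing are assumptions; the original transformation may differ in detail.

```python
import pandas as pd


def daily_moving_average(df, window_days=7):
    """Average sentiment per calendar day, then smooth with a rolling mean."""
    daily = df.set_index("date")["sentiment"].resample("D").mean()
    # With the default min_periods, the first window_days - 1 values are NaN.
    return daily.rolling(window=window_days).mean()


# Illustrative data: one comment per day with slowly rising sentiment.
comments = pd.DataFrame({
    "date": pd.date_range("2020-03-01", periods=8, freq="D"),
    "sentiment": [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7],
})
smoothed = daily_moving_average(comments)
```

The same function can be applied to the Twitter data, so both series end up on a common daily index.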

The same was done for the Twitter sentiments. A first visualization was then created with the matplotlib library, which is commonly used to visualize and animate data in Python.

Tagesschau user comments sentiments vs. Twitter user replies sentiment

As we can see, the sentiments from the two data sources appear to be highly correlated: both lines follow the same trends and show similar peaks and drops.
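
The visual impression could be backed by a number, for example the Pearson correlation of the two daily series. The sketch below assumes both series share a daily date index; the function name and the sample values are illustrative and are not results from this analysis.

```python
import pandas as pd


def sentiment_correlation(site, twitter):
    """Pearson correlation of two date-indexed series on their shared days."""
    aligned = pd.concat({"site": site, "twitter": twitter}, axis=1).dropna()
    return aligned["site"].corr(aligned["twitter"])


# Made-up values: the Twitter series is a shifted and scaled copy of the site
# series, plus one extra day that has no counterpart and is dropped.
days = pd.date_range("2020-03-01", periods=5, freq="D")
site = pd.Series([0.1, 0.2, 0.3, 0.2, 0.1], index=days)
twitter = pd.Series(
    [-0.2, 0.0, 0.2, 0.0, -0.2, 0.5],
    index=days.append(pd.DatetimeIndex(["2020-03-06"])),
)
r = sentiment_correlation(site, twitter)
```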

4. Focusing on Covid-19 comments

However, the sentiments in Section 3 include all comments and replies, not just those about Covid-19. To narrow this down, I simply searched the news-article text corpus for specific words related to Covid-19:

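# Covid-19 keywords; matching is case-insensitive, so one spelling of each is enough
searchfor = ['corona', 'covid', 'virus', 'pandemie']
# Flag rows whose text contains at least one keyword (missing text counts as no match)
twitter_analysis_df3['corona'] = twitter_analysis_df3['text'].str.contains('|'.join(searchfor), case=False, regex=True, na=False)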

Plotting the filtered data showed an interesting effect:

Covid-19 related Tagesschau user comments sentiments vs. Twitter user replies sentiment

We can clearly see that the Twitter replies to @tagesschau news articles were much more negative at the beginning of the pandemic.

5. Conclusions

We can conclude that it is quite easy to extract user comments from websites such as Tagesschau.de and user replies to accounts such as @tagesschau on Twitter. The graphs of the overall sentiments are also quite similar and seem to follow the same trends. This suggests that either (a) the same people comment on both platforms, or (b) the mood of the commenters is similar and depends on the news posts. For the Covid-19-related comments and replies only, the Twitter replies were far more negative at the beginning of the pandemic in Europe; after some time, however, they again follow the trend of the Tagesschau.de sentiments. Finally, the comments made on the Tagesschau.de website seem to be more positive in general, possibly because the comment section is moderated.