Introduction

The novel coronavirus disease, COVID-19, is an infectious disease that broke out in Wuhan, China, in December 2019 and spread globally [63]. The World Health Organization declared the outbreak a pandemic in March 2020 [12, 63]. As of June 9, 2022, the virus has infected over 530 million people and caused over 6 million deaths globally [14]. The social distancing measures enforced by local governments pushed people toward online social interaction through social media [8]. According to Li et al., people share situational information on social media consisting of cautions and advice, notifications or measures, donations of money or services, emotional support, help-seeking, doubting and criticizing state actions, and refuting rumors [33]. This leads to rich sentiment data (opinions and emotions) related to COVID-19 on social media platforms such as Twitter [8]. The most common language on Twitter is English, and it remained the dominant language of sentiment expression during the COVID-19 pandemic. However, many tweets are also posted in less widely spoken languages such as Finnish [30].

In Finland, most people use the Finnish language to express their opinions on social media [1, 45]. Similarly, during COVID-19, Finns expressed their opinions and shared information about the pandemic on Twitter in Finnish [1]. Analyzing and processing social media content in the Finnish language is important to extract information relevant to the local context. Sentiment analysis helps determine whether people's opinions on a certain subject are positive or negative. Sentiment analysis methods, tools, and resources are often lacking for languages other than English [4, 29]. Sentiment analysis methods for languages such as Chinese [62, 68], Korean [50], and Arabic [16] have been proposed in the literature to explore local content about COVID-19. However, methods for less widespread languages, such as Finnish, are also needed to analyze the emotions and sentiments of local people.

To prevent the spread of the virus, governments, health organizations, and researchers worldwide are collecting and analyzing data to understand and respond to the situation [8, 32, 47]. These data serve as a backbone for making informed decisions to combat the pandemic. Sentiment analysis may therefore also help local health authorities better understand people's opinions and emotional states, to monitor the situation during a pandemic and respond accordingly.

To perform sentiment analysis, Natural Language Processing (NLP) resources and tools are required. However, the NLP resources (annotated datasets, sentiment lexicons) that are publicly available for sentiment analysis in Finnish are rather limited, partly because Finnish is a morphologically complex language [23, 54]. Existing tools for resource-rich, morphologically simpler languages such as English [4, 22, 57, 66] cannot easily be reused for Finnish [23].

Independently of the language, it is important to note that sentiment analysis is a domain-specific problem [9]. COVID-19 is a recent phenomenon; therefore, existing general-purpose sentiment and NLP resources, such as lexicons or word embeddings, may not perform well in classifying COVID-19-related text, because they lack vocabulary that is either specific to the pandemic or has become more prevalent since its beginning. On the other hand, the performance of a machine learning approach trained on a labeled dataset from a specific domain declines when it is applied to a new or different domain [20].

Thus, specialized sentiment analysis systems are required to efficiently determine the opinions and emotions related to the COVID-19 pandemic and to provide reliable analysis and monitoring of social media conversations. In this paper, to address this need, we propose a sentiment analyzer for the Finnish language to determine local sentiment polarity and trends during the pandemic. The solution is based on machine learning methods that determine the sentiment polarity (positive, negative, or neutral) of Finnish tweets regarding COVID-19. To the best of our knowledge, this paper is the first to investigate the performance of existing Finnish sentiment analysis methods in the context of COVID-19 and to propose a sentiment analyzer tailored to COVID-19 for the Finnish language. It may also be the first study to analyze Finnish COVID-19 tweets using sentiment analysis.

For this purpose, we extracted tweets in the Finnish language posted between April and June 2020 and answered the following research questions:

  • RQ1: What set(s) of features best predict sentiment polarity of COVID-19 Finnish tweets?

  • RQ2: How does the best sentiment polarity prediction model for COVID-19 Finnish tweets compare to a similar generic model?

  • RQ3: How did the sentiment of Finnish COVID-19 tweets evolve between April and June 2020?

We publicly share our annotated Finnish dataset, made of tweet IDs and the different sentiment labels, our sentiment polarity prediction model, and the replication code for this study [10].

The remainder of this paper is structured as follows. First, in the section “Related Work”, we present existing work on both social media content analysis for COVID-19 and sentiment analysis for the Finnish language. Then, in the section “Methods”, we describe the methodology followed for extracting data, pre-processing it, annotating a subset of tweets, building machine learning models, and evaluating them. In the section “Results”, we answer the three research questions and discuss their implications in the section “Discussion”. Finally, we present the limitations of this study in the section “Limitations” and conclude in the section “Conclusions and Future Work”.

Related Work

Compared to the existing work on sentiment analysis, we propose a sentiment classifier for the Finnish language tuned to and evaluated in the context of COVID-19 on Twitter. We also show that using this classifier rather than a readily available lexicon-based tool leads to different results when analyzing the evolution of sentiment in a large dataset of COVID-19 tweets. In this section, we first present the literature about sentiment analysis methods for the Finnish language. Then, we summarize the literature on social media content analysis for COVID-19 and, more specifically, on sentiment analysis methods.

Sentiment Analysis for Finnish

Sentiment analysis is an automatic natural language processing technique for analyzing the opinions and emotions expressed in textual content. Sentiment analysis is used for different purposes and applications, e.g., analyzing customer reviews [17], predicting stock prices, identifying political trends, and determining opinions regarding events [38]. Existing sentiment analysis methods can be divided into two types: lexicon-based (unsupervised) and machine learning (supervised) methods. Lexicon-based methods rely on a pre-built sentiment dictionary in which words are associated with a sentiment orientation. Machine learning methods use existing learning approaches, such as logistic regression, Support Vector Machines, Naïve Bayes, or neural networks. Conducting sentiment analysis for Finnish comes with its own challenges.
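The lexicon-based family of methods can be illustrated with a minimal sketch. The toy lexicon and the classifier below are ours for illustration only; they are not part of any tool discussed in this paper.

```python
# Minimal lexicon-based polarity classifier (toy, illustrative lexicon).
# Each word maps to a sentiment orientation score; the document polarity
# is the sign of the summed scores of the words found in the lexicon.
TOY_LEXICON = {"good": 1, "great": 2, "bad": -1, "terrible": -2}

def lexicon_polarity(text: str) -> str:
    score = sum(TOY_LEXICON.get(tok, 0) for tok in text.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(lexicon_polarity("a great day"))       # positive
print(lexicon_polarity("terrible service"))  # negative
```

A machine learning method would instead learn such word weights from a labeled dataset rather than read them from a dictionary.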

Honkela et al. highlighted the issues and challenges of text mining for the Finnish language [23]. Finnish is a morphologically complex language, whereas English is considered morphologically simple. In Finnish, a noun has approximately 2000 distinct inflected forms, and a verb has more than 10,000. Additionally, Finnish has billions of surface word forms that are hard to categorize and list exhaustively. One of the challenges when developing text mining methods, such as sentiment analysis, is dealing with these varying word forms. Although lexicon-based sentiment analysis methods can be proposed for Finnish, Honkela et al. emphasize the need for statistical machine learning and neural network methods.

A limited number of sentiment analysis methods have been proposed in the literature for the Finnish language. Rashkin et al. [55] proposed a multilingual sentiment analysis method that extends the English connotation frames [53] to ten European languages, including low-resource languages such as Finnish and Polish. The multilingual connotation frame is a framework for inferring the context-based polarity of opinion-bearing words toward entities or events [55]. The method is based on parallel corpora between English and the ten other languages, and on translated connotation frames.

Ahmadi [3] investigated the feasibility of developing a sentiment analysis application for the Finnish language. Instead of proposing a new method, the study assessed the feasibility of existing methods. The author identified a lack of sentiment lexicons and natural language processing tools for Finnish, suggested using translated versions of existing English sentiment lexicons, and emphasized the need to develop a dictionary for the Finnish language.

Jussila et al. [28] evaluated and compared the performance of two existing sentiment analysis tools for Finnish against human annotators: SentiStrength [58, 59] and the Nemo Sentiment and Data Analyzer [48]. The SentiStrength algorithm uses a lexicon of words with polarity scores to classify tweets as positive, negative, or neutral. The Nemo Sentiment and Data Analyzer, on the other hand, determines sentiment using two different algorithms: linear regression and random forest. The level of agreement between these tools and the human annotators was poor [28].

More recent research has applied machine learning and neural networks to Finnish sentiment analysis [23, 30, 44, 60]. However, all of these studies are based on movie or product reviews. They usually either try to predict the rating given by the reviewer or are limited to binomial classification (i.e., positive and negative). Social media posts do not only express polarized opinions or emotions; they also convey various kinds of information, such as news or facts, that are more neutral and potentially more objective. Thus, the presence of neutral documents not expressing sentiment or emotion cannot be ignored when analyzing sentiment on Twitter, which calls for multinomial classification.

Social Media Content and Sentiment Analysis for COVID-19

A number of methods have been proposed in the literature to analyze social media content regarding COVID-19 [6, 26, 33, 37, 52, 65]. Most of these studies identify local trends with methods such as keyword analysis [6, 52], topic modeling and word network analysis [26], author analysis [67], social network analysis [37], and machine learning [33]. Another popular technique for gaining insights into how people react online is sentiment analysis.

Since the COVID-19 outbreak, sentiment analysis has been recommended as a technique for better understanding how people react to news [36], and several studies have relied on it for analyzing COVID-19 social media posts [2, 5, 16, 50, 51, 62, 68]. As of 2021, research has been conducted using modern NLP techniques, such as BERT-based models, on COVID-19 Twitter data [34, 41]. Still, most studies analyzing Twitter data rely either on lexicon-based methods or on supervised methods built with pre-COVID data, without evaluating their accuracy [13, 15, 18, 24, 25, 39, 56, 64]. In particular, lexicon-based techniques are popular because they are easy to reuse and require no labeling or model training. However, according to a recent literature review [22], lexicon-based sentiment analysis is largely inferior to methods relying on traditional machine learning, neural networks, or language models.

Jongeling et al. [27] have shown that different generic sentiment analysis tools can lead to different results, and hence to different conclusions, when used in domain-specific contexts such as software engineering. Research also shows that such specific domains require domain-specific sentiment tools [7] and, more recently, that platform-specific or topic-specific tools are necessary [43].

Methods

In this section, we describe how we extracted data and processed it to answer our research questions. First, we describe how we extracted data from Twitter for both the Finnish and English languages. Then, we present how we manually annotated a sample of Finnish tweets for sentiment polarity. Following this, we report how we processed the Finnish tweets for natural language processing and detail the different text features we computed on the processed tweets. Finally, we introduce the machine learning algorithm used and how it was validated for building our Finnish sentiment polarity prediction model.

Data Extraction

Finnish Twitter Data

We started extracting data from Twitter on April 30, 2020, using the Twitter API by running an R script relying on the rtweet package [31] once a day. We used the following query for searching for Finnish tweets: covid OR corona OR korona OR pandemi OR epidemi. These terms were chosen because they are the stems of the most common Finnish terms related to COVID-19, while excluding non-COVID-19 tweets. We investigated other terms, such as infection, but found that they returned too many tweets unrelated to COVID-19. We kept running the script daily until June 18 and extracted 146,445 tweets in Finnish, the oldest posted on April 21 at 19:07 UTC and the last on June 17 at 23:56 UTC. These tweets were posted by 47,587 different users.
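The stem-based query can be checked against a tweet text with a simple pattern match. The sketch below is a Python illustration (the original extraction used an R script with rtweet); the helper name is ours.

```python
import re

# The five stems used in the Twitter search query described above.
QUERY = re.compile(r"covid|corona|korona|pandemi|epidemi", re.IGNORECASE)

def matches_query(tweet_text: str) -> bool:
    """True if the tweet contains any of the COVID-19 query stems."""
    return QUERY.search(tweet_text) is not None

print(matches_query("Koronavirus leviää"))  # True: contains the stem 'korona'
print(matches_query("Mukava kesäpäivä"))    # False
```

Because the query matches stems, it also catches inflected Finnish forms such as "koronavirus" or "pandemian".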

We chose data from the peak of the COVID-19 pandemic in Finland, because trends on social media change rapidly over time; the earliest peak of the COVID-19 event is therefore appropriate for analyzing public sentiment on social media. Over the chosen period, the daily infection count dropped from 117 (7-day average, April 21) to 9 (7-day average, June 17) [14].

Manual Annotation of Tweets

On May 12, we took a random sample of the tweets that had already been extracted. We took 5000 random tweets out of the original ones (i.e., not retweets).

Early annotation revealed that many tweets tagged as Finnish by Twitter were written in other languages. Before proceeding further with the annotation, we ran Google’s Compact Language Detector 3 (using R package cld3 [46]) to detect the language more accurately. This left 3976 Finnish tweets after filtering.
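The language-filtering step can be sketched as follows. The paper ran Google's Compact Language Detector 3 through the R package cld3; in this Python illustration, `detect` is an injected callable returning an ISO 639-1 code, so any real detector could be plugged in, and the stub detector below is a toy stand-in of our own.

```python
def keep_finnish(tweets, detect):
    """Keep only the tweets the detector identifies as Finnish ('fi')."""
    return [t for t in tweets if detect(t) == "fi"]

def stub_detector(text):
    # Toy stand-in for a real language detector (illustration only).
    return "fi" if "ä" in text else "en"

print(keep_finnish(["Tämä on testi", "This is a test"], stub_detector))
# → ['Tämä on testi']
```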

Out of these, 1943 tweets were annotated as positive, negative, or neutral by the three authors who are native Finnish speakers. The annotation guidelines were discussed beforehand by the participating annotators. The annotators also marked tweets that appeared sarcastic or ironic, and we filtered those out. When a tweet contained both positive and negative sentiment, it was annotated based on the strongest polarity, or left out if neither polarity was clearly stronger.

Three rounds of annotations were conducted:

  1. The same set of tweets was annotated by all three annotators in order to compute an agreement rate. This resulted in 183 tweets annotated by all three annotators.

  2. We divided the tweets among the annotators so that they could annotate different tweets, maximizing the number of tweets with at least one annotation. This resulted in a set of 1943 tweets.

  3. The already annotated tweets were divided among the annotators, prioritizing tweets already annotated by two people and then ordering the remaining tweets so as to maximize the number of tweets with three annotations with the least effort.

Out of the 1943, 1897 tweets were annotated and confirmed as Finnish and free of irony or sarcasm. Table 1 reports the number of tweets annotated by each annotator. Figure 1 depicts the process followed for annotation. In total, 653 tweets were annotated by all three annotators, 227 by two annotators, and the remaining 1017 by only one annotator.

Table 1 Number of sample tweets annotated by each annotator
Fig. 1

Annotation process; the number of person-icons represents the number of annotators

From the 653 tweets annotated by all three annotators, we report an agreement rate of 53.5% and a weighted Krippendorff's \(\alpha\) of 0.705. After annotating the tweets, the three annotators met to discuss the reasons for the disagreements. This is further detailed in the section "Discussion" and motivated the aggregated polarity annotation described next.
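The raw agreement rate can be computed as the fraction of tweets for which all three annotators chose the same label (the weighted Krippendorff's \(\alpha\) is more involved and is not sketched here). The helper and the sample annotations below are illustrative, not the study's data.

```python
def agreement_rate(labels_per_tweet):
    """Fraction of tweets on which all annotators assigned the same label."""
    agreed = sum(1 for labels in labels_per_tweet if len(set(labels)) == 1)
    return agreed / len(labels_per_tweet)

# Made-up annotations from three annotators for four tweets.
annotations = [("pos", "pos", "pos"), ("pos", "neu", "neg"),
               ("neg", "neg", "neg"), ("neu", "pos", "neu")]
print(agreement_rate(annotations))  # 0.5
```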

Finally, we computed an aggregated polarity annotation for tweets with disagreements, using a majority vote when possible. In case of a tie (Footnote 1), we proceeded as follows:

  • If one tied annotation is positive and the other negative, we discard the tweet from the annotated dataset.

  • If one tied annotation is positive (or respectively negative) and the other neutral, the tweet is labeled as positive (respectively negative).

In the end, the annotated dataset contains 1867 tweets out of which 517 are labeled as positive, 630 as negative, and 720 as neutral.
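The aggregation rules above can be sketched as a small function. This Python illustration is ours; the label names are placeholders.

```python
from collections import Counter

def aggregate_polarity(labels):
    """Aggregate 1-3 annotations: majority vote when possible; on a
    positive/negative tie, discard the tweet (return None); on a tie
    between a polar label and neutral, keep the polar label."""
    top = Counter(labels).most_common()
    if len(top) == 1 or top[0][1] > top[1][1]:
        return top[0][0]                       # clear majority
    tied = {label for label, n in top if n == top[0][1]}
    if {"positive", "negative"} <= tied:
        return None                            # discard the tweet
    return "positive" if "positive" in tied else "negative"

print(aggregate_polarity(["positive", "neutral"]))             # positive
print(aggregate_polarity(["positive", "negative"]))            # None
print(aggregate_polarity(["neutral", "neutral", "positive"]))  # neutral
```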

Finnish Tweets Pre-processing

Out of the 146,445 Finnish tweets, only 80,372 are original (i.e., not retweets). After running cld3 on all of them, 63,080 original Finnish tweets remained, posted by 17,804 different users between April 21 at 19:07 and June 16 at 23:09 UTC.

The text of both the Finnish and English tweets was pre-processed by removing hashtag symbols and URLs using regular expressions. The tweets were tokenized with the R package tokenizers [40], taking care to keep Unicode emojis as individual tokens. For the Finnish tweets, the tokens were stemmed using Voikko [61], an open-source morphological analyzer for the Finnish language. When Voikko identified more than one potential stem for a token, the first stem was selected.
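The URL/hashtag cleaning and tokenization steps can be sketched as follows. This is a simplified Python illustration (the study used the R tokenizers package and Voikko for stemming, which is omitted here); the emoji character range is a rough approximation of our own.

```python
import re

def preprocess(text: str):
    """Strip hashtag symbols and URLs, then tokenize, keeping emojis
    as individual tokens (via a rough emoji code-point range)."""
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = text.replace("#", "")               # keep the hashtag word, drop '#'
    # Surround emojis with spaces so they become separate tokens.
    text = re.sub(r"([\U0001F300-\U0001FAFF\u2600-\u27BF])", r" \1 ", text)
    return text.lower().split()

print(preprocess("Korona uutiset 😷 #Suomi https://t.co/abc"))
# → ['korona', 'uutiset', '😷', 'suomi']
```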

Generic Finnish Sentiment Data

To compare with our sentiment analysis model for Finnish tweets, we also used the FinnSentiment dataset [35], which contains data extracted in 2019 from the social media website Suomi24. The dataset contains 27,000 posts annotated as positive, negative, or neutral by three different annotators. From this dataset, we took a random sample of 517 positive, 630 negative, and 720 neutral posts to match the size of our annotated dataset.

Feature Engineering

For building a machine learning classification model for sentiment polarity, we computed different sets of text features for the Finnish tweets. Our features rely on ngrams (unigrams and bigrams) and two sentiment polarity lexicons available online. The first is the lexicon of SentiStrength [57], a popular lexicon-based sentiment analysis tool. The second is a Finnish translation of the AFINN-165 polarity lexicon [42]. For the ngrams, we only considered those appearing at least ten times across all tweets.

The set of features considered in this study are the following:

  • No Stemming: unigrams (i.e., individual tokens) before stemming with Voikko

  • Uni.: unigrams stemmed with Voikko

  • Bi.: bigrams (successive tokens) stemmed with Voikko

  • SS: number of (non-stemmed) tokens matching positive and negative words from the SentiStrength Finnish lexicon.

  • SS Full: number of (non-stemmed) tokens matching each polarity value \((-5, -4, -3, -2, -1, 1, 2, 3, 4)\) from the SentiStrength Finnish lexicon.

  • AFINN: number of (stemmed) tokens matching positive and negative words from the AFINN Finnish lexicon.

  • AFINN Full: number of (stemmed) tokens matching each polarity value \((-5, -4, -3, -2, -1, 1, 2, 3, 4, 5)\) from the AFINN Finnish lexicon.
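The lexicon-count feature sets can be sketched as follows. The tiny `AFINN_FI` dictionary below is a made-up stand-in for the Finnish AFINN-165 lexicon (stem mapped to a polarity value in \(-5..5\)); only the AFINN and AFINN Full variants are shown, the SS variants being analogous.

```python
# Illustrative stand-in for the Finnish AFINN lexicon (not the real data).
AFINN_FI = {"hyvä": 3, "iloinen": 3, "huono": -2, "kamala": -3}

def afinn_features(stems):
    """AFINN: counts of positive and negative lexicon matches."""
    pos = sum(1 for s in stems if AFINN_FI.get(s, 0) > 0)
    neg = sum(1 for s in stems if AFINN_FI.get(s, 0) < 0)
    return {"afinn_pos": pos, "afinn_neg": neg}

def afinn_full_features(stems):
    """AFINN Full: one match count per polarity value in -5..5."""
    feats = {v: 0 for v in range(-5, 6) if v != 0}
    for s in stems:
        v = AFINN_FI.get(s)
        if v:
            feats[v] += 1
    return feats

print(afinn_features(["hyvä", "kamala", "päivä"]))
# → {'afinn_pos': 1, 'afinn_neg': 1}
```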

Machine Learning

We used weighted binomial and multinomial logistic regression with the Lasso penalty to build classification models for sentiment polarity. Lasso regression performs both variable selection and regularization, which allows testing language features of high dimensionality (e.g., ngrams generated from large document collections) without having to worry about feature selection or over-fitting. Penalized regression is a recommended strategy for natural language processing tasks [19] due to its ability to handle large and sparse input spaces.

Moreover, with a penalized regression model, contrary to black-box models such as random forests, the variables that are the best predictors can be identified using the log odds (Footnote 2).

For validating the models, we ran a tenfold cross-validation repeated ten times. We selected the best \(\lambda\) hyperparameter for the Lasso regression using the Area Under the ROC Curve (AUC). We also computed and report the accuracy and macro-averaged F1 score, two popular performance metrics for machine learning models, as well as the balanced accuracy to account for the class imbalance in the dataset.
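The modeling step can be sketched as follows. This is a scikit-learn illustration on synthetic data (the original study used R); the toy matrix `X` stands in for the sparse ngram/lexicon feature matrix, and the hyperparameters shown are placeholders rather than the study's tuned values.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Toy data: 200 samples, 50 features, only the first two informative.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Weighted L1-penalized (Lasso) logistic regression, evaluated by
# tenfold cross-validation with AUC as the selection metric.
model = LogisticRegression(penalty="l1", solver="liblinear",
                           C=1.0, class_weight="balanced")
scores = cross_val_score(model, X, y, cv=10, scoring="roc_auc")
print(round(scores.mean(), 2))  # high AUC on this easy toy data

model.fit(X, y)
# With the L1 penalty, many coefficients shrink to exactly zero,
# which is the built-in variable selection mentioned above.
print(int((model.coef_ == 0).sum()))
```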

We consider the output of SentiStrength with the Finnish lexicon as a baseline when evaluating and comparing the machine learning models. As SentiStrength provides both a negative and a positive score (between \(-1\) and \(-5\), and between \(+1\) and \(+5\), respectively) for a piece of text, a polarity class is inferred for each tweet by summing the two scores. Accuracy, balanced accuracy, and F1 scores are computed and reported for the baseline model.
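The score-summing rule for the baseline can be sketched as follows; the function name is ours.

```python
def sentistrength_class(pos: int, neg: int) -> str:
    """Infer a polarity class from SentiStrength's positive (1..5) and
    negative (-5..-1) scores by summing them, as done for the baseline."""
    total = pos + neg
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"

print(sentistrength_class(3, -1))  # positive
print(sentistrength_class(2, -2))  # neutral
```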

Results

RQ1: What Set(s) of Features Best Predict Sentiment Polarity of COVID-19 Finnish Tweets?

For predicting the sentiment polarity of the COVID-19 tweets, we built two different types of models: binomial models that predict whether a tweet's polarity is positive or negative (2-class problem), and a multinomial model that also takes neutral tweets into account (3-class problem). For the 2-class problem, we only consider the 1147 tweets labeled as positive or negative. The 2-class problem is of interest as it is common in the sentiment analysis literature and allows comparing our model with other methods that only address the 2-class problem, such as those based on product reviews. However, this paper's main goal is to solve the 3-class problem, as neutral tweets not expressing sentiment (e.g., stating facts or reporting news) are common on Twitter. A sentiment polarity prediction model dealing only with positive and negative tweets would thus be of limited interest in this context.

2-Class Problem

Table 2 reports the AUC, accuracy, balanced accuracy, and macro-averaged F1 score for different feature sets for the 2-class problem. The table shows that the two best individual feature sets are the unigrams (Uni., 0.75 AUC) and the AFINN Finnish lexicon (AFINN and AFINN Full, 0.72 AUC). The difference between the model without stemming and the model using unigrams is statistically significant \((P < 0.001)\) and has a large effect size (Cohen’s \(d=1.14\)).

Table 2 Feature size, area under the (ROC) curve, accuracy, balanced accuracy, and macro F1 for the different feature sets for the 2-class problem

It can be observed that adding bigrams to the unigrams (Uni. + Bi.) does not improve the model's performance, potentially because only a small number of bigrams (80) remain after filtering for bigrams used at least ten times. Using lexicon-based features that keep the strength of the sentiment value (SS Full and AFINN Full) does not improve the model's performance compared with simpler feature sets that only count the numbers of positive and negative words (SS and AFINN).

Overall, the best prediction model is obtained by adding both lexicons to the unigrams (Uni. + SS + AFINN), which provides an AUC of 0.785, an accuracy of 0.71, a balanced accuracy of 0.712, and an F1 score of 0.723. However, while this model exhibits a statistically significant difference with both models Uni. \((P < 0.001\), Cohen’s \(d = 0.94\)) and Uni. + AFINN \((P < 0.001\), Cohen’s \(d = 0.55\)), the difference is not statistically significant when compared with the model Uni. + SS \((P = 0.22).\)

3-Class Problem

Table 3 reports the AUC, accuracy, balanced accuracy, and macro-averaged F1 score for different feature sets for the 3-class problem. For comparison, the table also reports as a baseline the accuracy, balanced accuracy, and F1 score of running SentiStrength with the Finnish lexicon on the annotated data.

Table 3 Feature size, area under the (ROC) curve, accuracy, balanced accuracy, and macro F1 for the different feature sets for the 3-class problem

Adding neutral tweets significantly decreases the performance of the models in comparison with the 2-class problem. As before, the table shows that the two best individual feature sets are the unigrams (Uni., 0.65 AUC) and the AFINN Finnish lexicon (AFINN, 0.63 AUC). The difference between the model without stemming and the model using unigrams is statistically significant \((P < 0.001)\) and has a large effect size (Cohen's \(d = 1.03\)).

The best feature set is obtained by combining both lexicons with the unigrams (Uni. + SS + AFINN), which provides an AUC of 0.667, an accuracy of 0.474, a balanced accuracy of 0.607, and an F1 score of 0.475. However, while this model exhibits a statistically significant difference and a strong effect size with the model Uni. \((P < 0.001\), Cohen’s \(d = 0.82\)), the difference is small with the model Uni. + AFINN \((P = 0.008\), Cohen’s \(d = 0.38\)) and not statistically significant with the model Uni. + SS \((P = 0.15).\)

Table 4 Confusion matrix for the best 3-class model (Uni. + SS + AFINN) and when running SentiStrength directly on the tweets (baseline)

When running SentiStrength with the Finnish lexicon directly on the tweets rather than building a logistic regression model using the lexicon (baseline in Table 3), the best model yields an increase of 0.044 in balanced accuracy and 0.07 in F1 score over the baseline. Even the model without stemming outperforms the baseline in terms of balanced accuracy and F1 score. However, the difference between the baseline and our model goes beyond overall accuracy. Table 4 shows the confusion matrix of one of the best models and the confusion matrix obtained by running SentiStrength with the Finnish lexicon directly on the tweets. It highlights that the SentiStrength Finnish lexicon cannot properly detect positive and negative tweets: its recall is 28.4% for the positive class (vs. 52.9% for the regression model) and 23.7% for the negative class (vs. 49%).
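Per-class recall is read off a confusion matrix as the fraction of true instances of a class that the model predicted as that class. The helper and the small matrix below are illustrative, not the study's figures.

```python
def recall(confusion, cls):
    """Recall of `cls` from a confusion matrix stored as a dict of
    {true_class: {predicted_class: count}} (illustrative layout)."""
    row = confusion[cls]
    total = sum(row.values())
    return row[cls] / total if total else 0.0

# Made-up 3-class confusion matrix.
cm = {"pos": {"pos": 8, "neg": 1, "neu": 1},
      "neg": {"pos": 2, "neg": 6, "neu": 2},
      "neu": {"pos": 3, "neg": 3, "neu": 4}}
print(recall(cm, "pos"))  # 0.8
```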

Table 5 reports the best predictors of the best model for the 3-class problem. While many predictors are generic positive or negative words (fuck, after the death, wonderful, joy, fine, the thumbs-up emoji, and SS positive words), the model also exhibits predictors that are specific to Finland and the pandemic, such as THL (Footnote 3), (Sanna) Mari(n) (Footnote 4), senior, and May. The predictors also include words that would usually be considered stop words, such as how, below, I, and or.

Table 5 Best predictors for the best model for the 3-class problem (Uni. + SS + AFINN)

RQ2: How Does the Best Sentiment Polarity Prediction Model for COVID-19 Finnish Tweets Compare to a Similar Generic Model?

Using the Suomi24 dataset, we built two different sentiment polarity prediction models, one using the full dataset and one using a random subsample matching our annotated dataset’s size and class distribution.

Table 6 Area under the (ROC) curve, accuracy, balanced accuracy, and macro F1 for the model based on the Suomi24 dataset with all the data, subsampled to match our annotated dataset, and using our annotated dataset

Table 6 reports the performance metrics for these two models and for the model trained on our annotated COVID-19 tweets. Overall, both Suomi24 models perform better than the COVID-19 model. Downsizing the Suomi24 dataset from 27,000 posts to 1867, to match the COVID-19 annotated dataset, reduces the AUC from 0.795 to 0.753 and the F1 score from 0.594 to 0.588. The COVID-19 model, however, exhibits much lower prediction performance, with an AUC of 0.666 and an F1 score of 0.475.

Looking at the confusion matrices in Table 7 for the sampled Suomi24 and the COVID-19 models, the major difference between both prediction models is caused by the misclassification of neutral tweets.

In the Suomi24 model, 233 (32%) of the 734 neutral tweets are misclassified as positive or negative, giving a recall of 68% for the neutral tweets. On the other hand, the COVID-19 model misclassifies 420 (58%) of the 720 neutral tweets, giving a recall of 42% for the neutral tweets.

Table 7 Confusion matrix for the Suomi24 model and the best COVID-19 model (Uni. + SS + AFINN)

RQ3: How Did the Sentiment of Finnish COVID-19 Tweets Evolve Between April and June 2020?

We ran the final sentiment polarity model for the Finnish language (Uni. + SS + AFINN in Table 3) on the 63,080 original Finnish tweets posted from April 21 to June 17. Figures 2 and 3 show the 7-day running average of the evolution of the daily (relative and absolute) numbers of positive, negative, and neutral tweets.
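The 7-day running average used in the figures can be sketched as a trailing window over the daily counts; this simple Python helper is illustrative (the window is shorter at the start of the series).

```python
def running_average(daily_counts, window=7):
    """Trailing running average over `window` days."""
    out = []
    for i in range(len(daily_counts)):
        chunk = daily_counts[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

# Seven flat days then a spike: the spike is smoothed into the average.
print(running_average([7, 7, 7, 7, 7, 7, 7, 14], window=7))
# → [7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 8.0]
```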

Fig. 2

7-Day running average of the daily evolution of the ratio of positive, negative, and neutral Finnish tweets

Fig. 3

7-Day running average of the daily evolution of the number of positive, negative, and neutral Finnish tweets

Figure 2 shows a decreasing trend in relative negative sentiment and an increasing trend in positive sentiment over time. However, as seen in Fig. 3, the overall number of COVID-19 tweets decreases from over 1600 tweets per day in late April to fewer than 600 a day in mid-June. This decrease is particularly noticeable in mid-May.

For comparison, Fig. 4 shows the evolution of sentiment when running SentiStrength with the Finnish lexicon instead of the sentiment polarity prediction model presented in this paper. SentiStrength does not show the changes in sentiment polarity observed in Fig. 2, because of its inability to correctly detect positive or negative sentiment.

Fig. 4

7-Day running average of the daily evolution of the ratio of positive, negative, and neutral Finnish tweets as computed by SentiStrength

Discussion

Finnish Sentiment Analysis of COVID-19

All the linear models built to answer RQ1, including the worst logistic regression model based on SentiStrength's Finnish lexicon, provide better accuracy than running the SentiStrength tool directly on the annotated dataset. More specifically, while our multinomial model is far from perfectly accurate, it exhibits better recall for the positive (55% recall) and negative (50% recall) classes than running SentiStrength directly on the tweets (28% and 24%, respectively). Thus, for predicting positive and negative tweets, which are often the cases researchers are interested in [2], our sentiment analyzer provides far more reliable and useful results than the Finnish version of SentiStrength.

The analysis of all the extracted Finnish tweets reveals a decreasing trend in the number of COVID-19 tweets, but also a decreasing trend in negativity mirrored by an increasing trend in positivity. This observation parallels the decreasing infection rate in Finland (see the section "Data Extraction") over our collection period. These trends are particularly noticeable from mid-May, when the Finnish government started to gradually loosen the restrictions. Moreover, these results match previous findings of a higher amount of positive than negative sentiment [5], or of an increase in positive sentiment over time [68].

Implications for Sentiment Analysis

Even though our results focus on tweets written in Finnish, our findings also have broader implications for sentiment analysis in the context of COVID-19 and, more generally, for medical social media analysis in other languages.

Our results show that the accuracy of a generic sentiment analysis tool for the 3-class problem is potentially lower in the context of COVID-19 than in a generic context. In RQ2, we found that the sentiment polarity prediction model for the COVID-19 tweets performed worse than the non-COVID-19 model based on Suomi24. This difference is explained by a much lower recall for the neutral case of COVID-19 tweets than for the Suomi24 dataset. Thus, we conclude that detecting neutral tweets might be more difficult in the context of COVID-19 than in a general context.

The best predictors identified in RQ2 unveil that non-sentiment-bearing words can act as good predictors in the specific context of the COVID-19 pandemic. Specifically, the best predictor for positive sentiment was May (toukokuu), which relates to restrictions being gradually lifted by the Finnish government in May 2020.

These findings imply that sentiment analysis tools developed for (or with) data with a broad scope are potentially less accurate in specific contexts, such as the COVID-19 pandemic. Therefore, further effort needs to be invested in developing sentiment analysis tools tailored to medical and epidemic events, and potentially to other major events causing global disruption such as financial crises, to provide accurate social media monitoring tools.

Disagreement Among the Annotators

After annotating the tweets, the three annotators met to discuss disagreements. The tweets that included a fact, such as a news headline, were the most typical reason for differing opinions. This resulted in neutral labels when the tweets were interpreted as a statement of a fact, or positive/negative labels if the tone of the statement was deemed positive/negative. Thus, considering the source of a tweet (e.g., whether it comes from a news website or not) when training a machine learning model for sentiment analysis could potentially improve the recall of neutral tweets.
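As a sketch of this idea, a binary source indicator could be appended to a tweet's feature vector; the `NEWS_DOMAINS` list and the URL-matching heuristic below are hypothetical, for illustration only:

```python
# Sketch: appending a news-source indicator to a tweet's feature vector.
NEWS_DOMAINS = {"yle.fi", "hs.fi"}  # hypothetical placeholder list

def add_source_feature(features, source_url):
    """Append 1 if the tweet links to a known news domain, else 0."""
    url = source_url or ""
    is_news = any(domain in url for domain in NEWS_DOMAINS)
    return features + [1 if is_news else 0]
```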

Another typical case was a tweet that was part of a discussion thread. As the surrounding thread was not visible to the annotators, each annotator inferred a different context, and the tweet’s interpretation differed accordingly. This implies that tweets that are part of a discussion thread should be annotated by showing the whole thread rather than a single tweet. Moreover, sentiment analysis tools could also benefit from using the previous tweets in a thread to improve their performance.

Some tweets were very short, consisting of only one or a few words rather than whole sentences. Ratings deviated in these cases, as the interpretation was based on evaluating the meaning of a single word or a small collection of words in context (e.g., a one-word comment on a news item).

Disagreement Between the Annotation and the Prediction Model

The annotators also met to discuss the differences between human annotators and the algorithm to identify possible directions for future improvements.

Several themes were identified to explain the differences between the human annotators and the automatic sentiment analysis performed with logistic regression. First, a small fraction of the differences were later found to be human errors, in which cases the automatic annotation was correct.

However, it appeared that criticism (e.g., of government policies) was especially difficult for the algorithm to detect as negative sentiment. There were also other tweets with subtle underlying messages that the human annotators identified but the algorithm could not detect, which resulted in a different annotation.

The algorithm was often capable of detecting the proper sentiment but, understandably, could not infer the meaning behind words from context as humans do. Furthermore, the sentiments expressed in this dataset were heavily influenced by the ongoing extraordinary situation in society, and differences in sentiment were often expressed with only subtle differences in wording.

Limitations

Regardless of rigorous research methods, the study comes with some limitations. First, the amount of data that could be annotated was relatively limited. With more data, the accuracy of the model based on the pre-COVID dataset increases from 0.687 to 0.727. Thus, we can expect a similar improvement with more training data for the COVID-19 model.

Furthermore, the sentiment polarity prediction model could be improved with better text features and better machine learning algorithms. We only used n-grams and two sentiment lexicons as text features; using other features, such as word embeddings to capture the semantics of words, could lead to better performance [22]. We relied on logistic regression because it yields interpretable models, but black-box algorithms such as random forests or neural networks could yield better accuracy.
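As an illustration of this feature set, unigram indicators can be combined with counts from two sentiment lexicons; the mini-lexicons below are placeholders, not the actual Finnish resources used in this work:

```python
# Sketch: unigram indicator features plus two lexicon-count features.
POS_LEX = {"hyvä", "kiitos"}   # placeholder positive lexicon
NEG_LEX = {"huono", "paha"}    # placeholder negative lexicon

def features(tokens, vocab):
    """Build one feature vector from a tokenised tweet."""
    vec = [1 if word in tokens else 0 for word in vocab]  # unigram indicators
    vec.append(sum(t in POS_LEX for t in tokens))         # lexicon feature 1
    vec.append(sum(t in NEG_LEX for t in tokens))         # lexicon feature 2
    return vec
```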

The annotation is limited not only by the total number of tweets annotated but also by the number of annotations per tweet. A single person annotated 53% of the tweets, and only 653 tweets were annotated by all three annotators. For these, we report a weighted Krippendorff’s \(\alpha\) of 0.705. Thus, the final annotated dataset is biased toward one annotator.
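For reference, Krippendorff’s \(\alpha\) compares observed to expected disagreement. The sketch below uses an interval-level (squared-difference) disagreement function with numerically coded labels (e.g., −1, 0, 1) as a stand-in for the weighted variant reported above; the exact weighting used in the study may differ:

```python
# Sketch: Krippendorff's alpha with squared-difference (interval) disagreement.
def krippendorff_alpha(units, delta=lambda a, b: (a - b) ** 2):
    """units: list of rating lists, one inner list per annotated item."""
    values = [v for unit in units for v in unit]
    n = len(values)
    observed = 0.0  # disagreement between raters within each item
    for unit in units:
        m = len(unit)
        if m < 2:
            continue
        observed += sum(delta(a, b) for i, a in enumerate(unit)
                        for j, b in enumerate(unit) if i != j) / (m - 1)
    observed /= n
    # expected disagreement: all ordered pairs across the pooled values
    expected = sum(delta(a, b) for i, a in enumerate(values)
                   for j, b in enumerate(values) if i != j) / (n * (n - 1))
    return 1.0 if expected == 0 else 1.0 - observed / expected
```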

Finally, even though they match with the previous results [5, 68], the results of RQ3 might not generalize to all times and places [21, 49].

The higher amount of positive sentiment than negative sentiment could result from the situation in Finland being significantly less severe in 2020 than in many other countries. Moreover, the decrease in negative sentiment and an increase in positive sentiment observed in mid-May, supposedly linked to the lifting of the first restrictions, could differ in other countries. It might be observed later for other European countries or for countries where the number of daily new cases kept increasing during the summer of 2020.

Conclusions and Future Work

In this paper, we presented how we built a sentiment polarity prediction model tailored to Finnish COVID-19 Twitter discussions using Twitter data extracted from the end of April 2020 to the middle of June 2020. To the best of our knowledge, this paper is the first attempt at developing a sentiment analyzer tailored to COVID-19 online discussions in the Finnish language.

Our best prediction model is based on logistic regression with the Lasso penalty trained with stemmed unigrams and two existing sentiment lexicons for the Finnish language. Even though the prediction model is relatively simple, it provides better accuracy than an existing popular tool, SentiStrength. We publicly release our annotated Finnish dataset and the final prediction model alongside all source code used for processing, training, and evaluation of machine learning models [10].

We observed a significant increase in performance when using a pre-COVID-19 generic Finnish sentiment dataset with the same amount of training data. This difference is mostly due to a higher number of misclassified neutral tweets, with the recall of the neutral case dropping from 68% when using the pre-COVID-19 generic Finnish dataset to 42% when using the COVID-19 dataset. We conclude that sentiments expressed in COVID-19 tweets are more difficult to detect automatically. Thus, sentiment analysis for COVID-19, and more broadly for epidemic monitoring, would benefit from tailored sentiment analysis solutions. Our model is trained on and evaluated with the COVID-19 dataset; how it will perform on completely new Finnish text remains to be tested. However, we anticipate that our method will perform similarly on texts about other pandemics or health-related phenomena.

Applying our sentiment analyzer to all the data we collected over the course of almost two months, we found that the trend in sentiment became gradually more positive as the Finnish government started to lift restrictions during the spring of 2020.

In the future, we want to extend these results by investigating more advanced techniques for sentiment analysis. In particular, a recent literature review of existing sentiment analysis techniques [22] shows that methods based on language models usually outperform techniques based on traditional machine learning and, even more so, those based on lexicon matching. Thus, we plan to explore how language models and word embeddings, such as BERT and word2vec, can be leveraged to improve our current model for the Finnish language.

Eventually, we plan to reuse our method to build sentiment analyzers for other languages. In particular, we are interested in annotating another sample of tweets in Swedish to analyze COVID-19 Swedish tweets, as Finland and Sweden are neighboring countries that adopted completely different measures in the face of the COVID-19 pandemic. We believe that having sentiment analysis tools built similarly for both languages could enable an interesting comparison of how people reacted to the two countries’ COVID-19 strategies.