Predicting stock price movements using sentiment analysis of unstructured text data

The influence of social media on financial markets has grown considerably in recent years. Probably the best-known example is the surge in GameStop's share price in 2021, driven by coordinated buying among users of the social media platform Reddit. This suggests that social media has become a relevant factor in predicting stock performance. The aim of this project is to use unstructured text data from social media, such as tweets or comments, to derive opinion indicators for the future stock performance of various companies and to use these indicators to improve stock prediction. Suitable natural language processing (NLP) methods will be used for this purpose.

To narrow our research question, we pay special attention to phenomena such as the GameStop short squeeze, which have so far received little research attention. The recently published paper [1] analyzes these phenomena (hereafter referred to as "YOLO" events) using data from Reddit. However, the authors do not take the sentiment of the collected data into account. Our project will examine the extent to which other platforms, and sentiment in particular, are also suitable as indicators of, for example, share price volatility. To this end, we will run a sentiment analysis on historical Twitter data for the stocks examined in the paper.

Our analysis focuses on four stocks: GameStop, AMC Entertainment Holdings, BlackBerry and Nokia. All four have experienced known YOLO events in the recent past. We will examine these events specifically and determine the correlation between volatility and Twitter sentiment.
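To make the planned comparison more concrete, the following is a minimal sketch of how a correlation between volatility and sentiment could be computed. It rests on several assumptions that are not part of the project description: the absolute daily log return serves as a simple volatility proxy, sentiment is available as one score per trading day, and the Pearson coefficient is used as the correlation measure. The numbers are made up for illustration and are not real market or Twitter data.

```python
import math


def log_returns(closes):
    """Daily log returns from a list of closing prices."""
    return [math.log(b / a) for a, b in zip(closes, closes[1:])]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Illustrative, made-up values (assumption: one sentiment score per return day).
closes = [20.0, 22.0, 30.0, 27.0, 40.0, 35.0]
daily_sentiment = [0.1, 0.6, -0.2, 0.8, -0.4]

# Assumption: absolute log return as a simple daily volatility proxy.
volatility_proxy = [abs(r) for r in log_returns(closes)]
rho = pearson(volatility_proxy, daily_sentiment)
print(f"correlation between volatility proxy and sentiment: {rho:.3f}")
```

Other volatility measures (for example, a rolling standard deviation of returns) or a rank correlation could be substituted without changing the overall structure.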

We obtain the required data through the Twitter API. On this basis, we will write a program that selects the relevant tweets and prepares them for sentiment analysis. This preparation step is an important and non-trivial part of data collection: filtering out unwanted data is essential, especially when images or emoticons make up an important part of a tweet. We retrieve the stock data for the selected time periods via a separate API. We then compare the results of the sentiment analysis with the stock data and examine to what extent a correlation between the two data sets can be established.
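The preparation and scoring steps described above could look roughly like the following sketch. It is not the project's actual pipeline: the word list `LEXICON` is a tiny hypothetical stand-in for a real sentiment model, the function names are invented for illustration, and tweets are assumed to arrive as `(date, text)` pairs that have already been downloaded. As a simplification, emoticons and emoji are ignored during scoring, although they may well carry sentiment in practice.

```python
import re
from collections import defaultdict

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
TOKEN_RE = re.compile(r"[a-z']+")

# Hypothetical toy lexicon; a real analysis would use a proper sentiment model.
LEXICON = {
    "moon": 1, "buy": 1, "hold": 1, "squeeze": 1,
    "sell": -1, "crash": -1, "dump": -1, "scam": -1,
}


def clean_tweet(text):
    """Strip URLs and @mentions, drop '#' and '$' markers, lowercase the rest."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ").replace("$", " ")
    return " ".join(text.lower().split())


def sentiment_score(text):
    """Mean lexicon value of matched words, in [-1, 1]; 0.0 if nothing matches."""
    tokens = TOKEN_RE.findall(clean_tweet(text))
    hits = [LEXICON[t] for t in tokens if t in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0


def daily_sentiment(tweets):
    """Average the per-tweet scores for each day; tweets are (date, text) pairs."""
    by_day = defaultdict(list)
    for day, text in tweets:
        by_day[day].append(sentiment_score(text))
    return {day: sum(scores) / len(scores) for day, scores in sorted(by_day.items())}


# Made-up example tweets, not real data.
example = [
    ("2021-01-27", "@someone BUY $GME and hold! https://example.com/x"),
    ("2021-01-27", "Buy or sell? #GME"),
    ("2021-01-28", "This is going to crash, total scam"),
]
print(daily_sentiment(example))
```

The resulting daily scores can then be aligned by date with the stock data retrieved from the second API before any correlation is computed.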


Literature

[1] Lyócsa, Š., Baumöhl, E. & Výrost, T. (2022). In: Finance Research Letters, 46 (Part A). https://www.sciencedirect.com/science/article/pii/S1544612321003603, last accessed 13 November 2022.

Student project: Predicting stock price movements using sentiment analysis of unstructured text data

Funding period: 1 October 2022 to 31 March 2023 (6 months)

Students: Thomas Löhden & Tim Matthies

Mentor: Dr. Jun-Patrick Raabe