Preface

This is the Python code appendix (1/2) of the research paper "Social interactions in volatile markets - a GameStop story".

Gather Reddit data

The Pushshift.io API was created by the /r/datasets mod team to provide enhanced functionality and search capabilities for Reddit comments and submissions. This RESTful API offers full search functionality for Reddit data and can also create powerful data aggregations. https://github.com/pushshift/api

The code below is kept general so that it can easily be applied in further research. GME-specific attributes are recognisable and marked as such.

The code for downloading the dataset is based on: https://colab.research.google.com/drive/1biLcXeHs8yZD1x9f3gv-cNJXEq7tpyoO?usp=sharing#scrollTo=MBikywNJ8ufl

Get submissions from subreddit

The Pushshift API is accessed by building a URL that contains the relevant parameters.

There are two ways in which a user can participate in a subreddit: submitting a post or commenting on the posts of others. Since submissions are a more significant contribution to the community than comments, the analysis focuses on them.

Parameters for the Pushshift URL include:

Furthermore, there are other parameters specific to submissions that need to be included:
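A minimal sketch of how such a search URL can be assembled with the standard library. It assumes the historical Pushshift submission endpoint and the commonly documented parameters subreddit, after, before, size, sort_type, sort and fields; these names, the selection of returned fields and the function name build_search_url are illustrative assumptions, not necessarily the exact set used for this study.

```python
from urllib.parse import urlencode

# Assumed historical Pushshift endpoint for searching Reddit submissions.
BASE_URL = "https://api.pushshift.io/reddit/search/submission/"

# Hypothetical selection of submission attributes to return.
DEFAULT_FIELDS = ("id", "created_utc", "author", "title",
                  "link_flair_text", "num_comments", "score")


def build_search_url(subreddit: str, after: int, before: int,
                     size: int = 100, fields=DEFAULT_FIELDS) -> str:
    """Assemble a Pushshift search URL (parameter names are assumptions)."""
    params = {
        "subreddit": subreddit,      # community to search, e.g. "GME" (GME-specific)
        "after": after,              # Unix timestamp: only posts created after this
        "before": before,            # Unix timestamp: only posts created before this
        "size": size,                # number of results per request
        "sort_type": "created_utc",  # order results by creation time ...
        "sort": "asc",               # ... oldest first, which simplifies paging
        "fields": ",".join(fields),  # submission attributes to return
    }
    return BASE_URL + "?" + urlencode(params)


print(build_search_url("GME", after=1609459200, before=1612137600))
```

urlencode takes care of escaping, so the comma-separated field list is encoded safely.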

Create the timestamps and queries for the search URL. The after and before parameters take Unix epoch timestamps, which can be generated with https://timestampgenerator.com/.
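As an alternative to the online generator, the epoch timestamps can be computed directly in Python. This is a small sketch; the helper name to_unix and the study window below are hypothetical placeholders.

```python
from datetime import datetime, timezone


def to_unix(date_str: str) -> int:
    """Convert a 'YYYY-MM-DD' date, read as midnight UTC, into a Unix epoch timestamp."""
    dt = datetime.strptime(date_str, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(dt.timestamp())


# Hypothetical study window; substitute the period actually analysed.
after = to_unix("2021-01-01")
before = to_unix("2021-02-01")
print(after, before)  # → 1609459200 1612137600
```

Fixing the time zone to UTC avoids shifts of several hours that occur when a date is silently interpreted in the local time zone.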

Create a .csv file from the data returned by the download function. To validate the results, compare the user activity with https://subredditstats.com/r/palantir.
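The download step can be sketched as a paging loop: request results sorted oldest first, then move the after timestamp to the newest post received until a page comes back empty. This assumes the historical endpoint answered with JSON of the form {"data": [...]} and that after is exclusive; all function names are hypothetical, and the HTTP fetcher is passed in as a parameter so the paging logic can be tested offline.

```python
import csv
import json
import time
import urllib.parse
import urllib.request
from typing import Callable, Dict, Iterable, List

# Assumed historical Pushshift endpoint for submissions.
BASE_URL = "https://api.pushshift.io/reddit/search/submission/"


def fetch_page_http(params: Dict) -> List[Dict]:
    """Request one page of results; assumes a JSON reply of the form {"data": [...]}."""
    url = BASE_URL + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)["data"]


def download_submissions(fetch_page: Callable[[Dict], List[Dict]], subreddit: str,
                         after: int, before: int, size: int = 100,
                         pause: float = 1.0) -> List[Dict]:
    """Collect all submissions between after and before by moving after forward."""
    rows: List[Dict] = []
    while True:
        page = fetch_page({"subreddit": subreddit, "after": after, "before": before,
                           "size": size, "sort_type": "created_utc", "sort": "asc"})
        if not page:  # no more results in the window
            break
        rows.extend(page)
        # Continue after the newest post seen so far. Posts sharing that exact
        # second but cut off by size would be skipped by this simple scheme.
        after = page[-1]["created_utc"]
        time.sleep(pause)  # throttle requests to respect the API rate limit
    return rows


def write_csv(rows: Iterable[Dict], path: str, fields: List[str]) -> int:
    """Write the selected fields of each row to a .csv file; return the row count."""
    count = 0
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fields, extrasaction="ignore")
        writer.writeheader()
        for row in rows:
            writer.writerow(row)
            count += 1
    return count
```

A real run would look like rows = download_submissions(fetch_page_http, "GME", 1609459200, 1612137600) followed by write_csv(rows, "gme_submissions.csv", ["id", "created_utc", "title", "link_flair_text"]), where the file name is a placeholder.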

Clean data - technically correct and consistent

Set variable types correctly
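A sketch of the type conversion with pandas, assuming the raw columns created_utc (Unix seconds), score, num_comments and link_flair_text as returned by Pushshift; the column selection and the function name set_types are assumptions.

```python
import pandas as pd


def set_types(df: pd.DataFrame) -> pd.DataFrame:
    """Cast raw Pushshift columns to suitable dtypes and index by creation time."""
    out = df.copy()
    # created_utc holds Unix seconds; parse it into timezone-aware UTC timestamps.
    out["created_utc"] = pd.to_datetime(pd.to_numeric(out["created_utc"]), unit="s", utc=True)
    # Counts become nullable integers; unparsable entries turn into <NA>.
    for col in ("score", "num_comments"):
        out[col] = pd.to_numeric(out[col], errors="coerce").astype("Int64")
    # Flairs are a small set of labels, so a categorical dtype fits.
    out["link_flair_text"] = out["link_flair_text"].astype("category")
    return out.set_index("created_utc").sort_index()
```

Using the creation time as a sorted DatetimeIndex makes the later resampling and merging steps straightforward.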

Clean up Flairs for Subreddit /r/gme

Merge flair categories and unify their content.
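One way to unify flairs is to normalise case and whitespace and then map spelling variants onto a single category. The mapping below is purely hypothetical; the actual /r/gme flair variants have to be read off the downloaded data.

```python
import pandas as pd

# Hypothetical mapping from normalised raw flairs to unified categories (GME-specific).
FLAIR_MAP = {
    "dd": "DD",
    "due diligence": "DD",
    "news": "News",
    "meme": "Meme",
    "memes": "Meme",
}


def clean_flairs(flairs: pd.Series, mapping: dict = FLAIR_MAP, other: str = "Other") -> pd.Series:
    """Normalise case/whitespace, map variants onto one category, bucket the rest as other."""
    key = flairs.fillna("").str.strip().str.lower()
    return key.map(mapping).fillna(other)
```

Collecting unmapped and missing flairs in a single "Other" bucket keeps the number of categories small for the descriptive plots.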

Descriptive analysis

Number of submissions in the subreddit over time
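Given a DataFrame indexed by creation time, the submission counts per period follow from resampling. A minimal sketch; the function name and the daily default frequency are assumptions.

```python
import pandas as pd


def submissions_per_period(df: pd.DataFrame, freq: str = "D") -> pd.Series:
    """Count submissions per period; expects a DatetimeIndex of creation times."""
    # resample also creates empty periods, which are counted as 0.
    return df.resample(freq).size()
```

The resulting Series can be plotted directly with its .plot() method.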

Share of flairs in the subreddit over time
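The share of each flair per period is a cross-tabulation normalised by row. This sketch assumes a DatetimeIndex and a cleaned flair column (the name flair is hypothetical); values are passed as plain arrays so that pandas does not try to align on the timestamp index.

```python
import pandas as pd


def flair_shares(df: pd.DataFrame, flair_col: str = "flair", freq: str = "M") -> pd.DataFrame:
    """Share of each flair among all submissions per period (each row sums to 1)."""
    periods = df.index.to_period(freq)
    return pd.crosstab(periods, df[flair_col].to_numpy(),
                       rownames=["period"], colnames=[flair_col], normalize="index")
```

A stacked area chart (shares.plot.area()) is one way to show how the mix of flairs shifts over time.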


Analysis

Merge data frames

Pricing data (prepared in a separate Jupyter notebook) is required.
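A sketch of the merge, assuming both frames have a DatetimeIndex in the same time zone and that the price bars are labelled on the full hour; the column names close and submissions and the function name are hypothetical. Submissions are aggregated to the price frequency first, then joined onto the price index.

```python
import pandas as pd


def merge_with_prices(posts: pd.DataFrame, prices: pd.DataFrame,
                      freq=pd.offsets.Hour()) -> pd.DataFrame:
    """Aggregate submissions to the price frequency and left-join them onto the prices."""
    activity = posts.resample(freq).size().rename("submissions")
    merged = prices.join(activity, how="left")
    # Price bars without any submission get 0 instead of NaN.
    merged["submissions"] = merged["submissions"].fillna(0).astype(int)
    return merged
```

A left join keeps exactly the trading hours present in the pricing data; if the price bars started at :30 (e.g. a 9:30 market open), the bins would need to be shifted to match.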

Preparation for time series

To include the time dimension in the correlation analysis, the dataframe has to be prepared accordingly: new variables for the time dimension (such as quarter, month, day and hour) are derived from the dataframe's index.
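A sketch of deriving these variables from a DatetimeIndex; the column names are assumptions. A year column is included so that the same quarter or month in different years is not pooled.

```python
import pandas as pd


def add_time_features(df: pd.DataFrame) -> pd.DataFrame:
    """Derive calendar variables from the DatetimeIndex for grouping the analysis."""
    if not isinstance(df.index, pd.DatetimeIndex):
        raise TypeError("expected a DatetimeIndex")
    out = df.copy()
    idx = out.index
    out["year"] = idx.year          # keeps Q1 2021 and Q1 2022 apart
    out["quarter"] = idx.quarter
    out["month"] = idx.month
    out["day"] = idx.normalize()    # timestamp truncated to midnight
    out["hour"] = idx.hour
    out["weekday"] = idx.dayofweek  # Monday = 0, Sunday = 6
    return out
```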

Pearson correlation

For the complete dataframe
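For the complete dataframe, pandas computes the pairwise Pearson correlation matrix directly. In this sketch the column names and the min_periods threshold of 30 are hypothetical choices; pandas excludes missing values pair by pair.

```python
import pandas as pd


def pearson_matrix(df: pd.DataFrame, cols: list, min_periods: int = 30) -> pd.DataFrame:
    """Pairwise Pearson correlations; pairs with fewer than min_periods
    complete observations are reported as NaN."""
    return df[cols].corr(method="pearson", min_periods=min_periods)
```

Requiring a minimum number of observations avoids reporting coefficients that rest on only a handful of data points.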

For quarters

For months
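Quarters and months differ only in the grouping keys, so one helper can serve both: pass ["year", "quarter"] or ["year", "month"]. This sketch assumes those columns were created in the time-series preparation step; the helper name is hypothetical. The number of observations is returned alongside r, since coefficients from short periods are less reliable.

```python
import pandas as pd


def correlation_by_period(df: pd.DataFrame, x: str, y: str, period_cols: list) -> pd.DataFrame:
    """Pearson r between columns x and y within each period, plus the number of rows."""
    def _stats(group: pd.DataFrame) -> pd.Series:
        return pd.Series({"pearson_r": group[x].corr(group[y]), "n": len(group)})

    return df.groupby(period_cols)[[x, y]].apply(_stats)
```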

For highly volatile events (days/hours)

First: add an event marker whenever the stock price moved by +/-x% or more within one hour.
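A sketch of the event marker, assuming hourly price bars in a column named close and a hypothetical threshold x = 5%. The return is taken relative to the previous row, so with gaps such as overnight closures a "one-hour" move is really a move from one bar to the next.

```python
import pandas as pd


def mark_events(prices: pd.DataFrame, threshold: float = 0.05,
                price_col: str = "close") -> pd.DataFrame:
    """Flag bars whose return versus the previous bar is at least +/- threshold."""
    out = prices.copy()
    # Simple return relative to the previous row (one hour for hourly bars).
    out["return_1h"] = out[price_col] / out[price_col].shift(1) - 1
    # The first row has a NaN return, which compares as False and is never flagged.
    out["event"] = out["return_1h"].abs() >= threshold
    return out
```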

Second: calculate the correlation for the marked events.
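A sketch of the second step that compares the correlation during marked events with the remaining hours. Column names are hypothetical. Because the events include both up and down moves, correlating with an absolute return is one reasonable choice; with signed returns, positive and negative moves could cancel each other out.

```python
import pandas as pd


def event_vs_rest(df: pd.DataFrame, x: str, y: str, event_col: str = "event") -> dict:
    """Pearson r between x and y for event rows and for all other rows, with row counts."""
    mask = df[event_col].astype(bool)
    return {
        "events": (df.loc[mask, x].corr(df.loc[mask, y]), int(mask.sum())),
        "rest": (df.loc[~mask, x].corr(df.loc[~mask, y]), int((~mask).sum())),
    }
```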

Additional Analysis

Scatter Plot
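A scatter plot shows whether a correlation coefficient is driven by a general trend or by a few outliers. This matplotlib sketch uses hypothetical column names and annotates the plot with the Pearson coefficient.

```python
import matplotlib.pyplot as plt
import pandas as pd


def scatter_plot(df: pd.DataFrame, x: str, y: str, path: str = None):
    """Scatter plot of y against x, titled with the Pearson coefficient."""
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.scatter(df[x], df[y], s=12, alpha=0.5)  # transparency reveals overlapping points
    ax.set_xlabel(x)
    ax.set_ylabel(y)
    ax.set_title(f"{y} vs. {x} (Pearson r = {df[x].corr(df[y]):.2f})")
    if path is not None:
        fig.savefig(path, dpi=150, bbox_inches="tight")
    return fig, ax
```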

Augmented Dickey-Fuller Test