Storyline uses sentiment analysis to visualize average positive and negative sentiment levels throughout classic works of literature.
First, the text is cleaned to remove elements that are irrelevant to the story or that could interfere with the sentiment analysis. The VADER sentiment analyzer is then applied to each sentence: using the sentiment ratings of the individual words in a sentence, VADER produces overall sentiment scores for that sentence. Once every sentence of a work has been scored, the result is an array of sentiment scores that follows the story from beginning to end.
These arrays are smoothed by convolving them with a Gaussian function. The standard deviation of the Gaussian can be varied while the window size is held constant, which changes the resolution, or level of detail, that is visible. At a low resolution (a large standard deviation), the user can see the overall shifts in sentiment across the whole story; at a high resolution (a small standard deviation), the details of smaller story arcs become visible.
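The smoothing step can be sketched in plain Python as follows. This is a minimal illustration, not Storyline's actual code: the names `gaussian_kernel` and `smooth` are hypothetical, the window is assumed to be an odd number of sentences, and the handling of the ends of the array (truncating the kernel and renormalizing its weights rather than padding with zeros) is an assumption.

```python
import math
from typing import List, Sequence


def gaussian_kernel(window: int, sigma: float) -> List[float]:
    """Return a Gaussian kernel of odd length `window`, normalized to sum to 1."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    half = window // 2
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(scores: Sequence[float], window: int, sigma: float) -> List[float]:
    """Convolve per-sentence scores with a Gaussian; output has the same length.

    Assumption: near either end of the array the kernel is truncated and its
    remaining weights renormalized, so every output value is a weighted
    average of real scores.
    """
    kernel = gaussian_kernel(window, sigma)
    half = window // 2
    n = len(scores)
    smoothed = []
    for i in range(n):
        acc = 0.0
        weight_sum = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half  # index of the neighbouring sentence
            if 0 <= j < n:
                acc += w * scores[j]
                weight_sum += w
        smoothed.append(acc / weight_sum)
    return smoothed


# Usage: same window size, different standard deviations.
scores = [0.5, -0.2, 0.8, -0.6, 0.1, 0.4, -0.9, 0.3]
broad_arc = smooth(scores, window=7, sigma=3.0)     # low resolution
fine_detail = smooth(scores, window=7, sigma=0.5)   # high resolution
```

Because the window stays fixed, only `sigma` controls how much neighbouring sentences influence each point: with `sigma=0.5` the weights fall off so quickly that the curve stays close to the raw scores, while `sigma=3.0` spreads the weight almost evenly across the window and flattens local swings.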
All of the source texts come from Project Gutenberg. This visualizer builds on work by Reagan et al. and Kim et al. The project is open source and available on GitLab.
Kim, E. et al. “Investigating the Relationship between Literary Genres and Emotional Plot Development.” LaTeCH@ACL (2017).
Reagan, A. J. et al. “The emotional arcs of stories are dominated by six basic shapes.” EPJ Data Science 5 (2016): 1-12.