The Impact of Scale on Content Analysis of Goodreads Reviews

  • We use content analysis: a quantitative method for analysing the content of reviews
  • Subsets of reviews with different types of focus and different scales (from 1 to 100 to 10,000 to 1 million reviews)
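Comparing the same analysis across scales only makes sense if the subsets are drawn consistently. The files loaded below were created differently (see the Filtering Goodreads reviews notebook), but as a minimal sketch, nested random samples can be built by shuffling a dataframe once with a fixed seed and taking prefixes of increasing size. The helper name `nested_samples` is hypothetical.

```python
import pandas as pd


def nested_samples(df, sizes, seed=0):
    """Return {size: subset}, where every smaller subset is contained in each larger one.

    Hypothetical helper: shuffle once with a fixed seed, then take prefixes.
    """
    shuffled = df.sample(frac=1, random_state=seed)
    # head(n) returns all rows if n exceeds the number of rows
    return {size: shuffled.head(size) for size in sorted(sizes)}


# usage with a toy dataframe of 1,000 "reviews"
toy_df = pd.DataFrame({'review_id': range(1000)})
samples = nested_samples(toy_df, [1, 100, 10])
for size, subset in samples.items():
    print(size, len(subset))
```

Because every subset is a prefix of the same shuffled order, any difference between the results at scale 100 and scale 10,000 comes from adding reviews, not from drawing a different sample.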
In [1]:
# The autoreload extension reloads modules from the scripts directory when they change.
# It is only needed during development and can be removed once those modules are stable.
%reload_ext autoreload
%autoreload 2

# This is needed to add the repo dir to the path so jupyter
# can load the modules in the scripts directory from the notebooks
import os
import sys
repo_dir = os.path.split(os.getcwd())[0]
print(repo_dir)
if repo_dir not in sys.path:
    sys.path.append(repo_dir)
    
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import json
import csv

data_dir = '../data/GoodReads'

books_10k_file = os.path.join(data_dir, 'goodreads_reviews-books_above_10k_lang_reviews.csv.gz')
reviewers_5k_file = os.path.join(data_dir, 'goodreads_reviews-reviewers_above_5k_reviews.csv.gz')
random_1M_file = os.path.join(data_dir, 'goodreads_reviews-random_sample_1M.csv.gz')
author_file = os.path.join(data_dir, 'goodreads_book_authors.csv.gz') # author information
book_file = os.path.join(data_dir, 'goodreads_books.csv.gz') # basic book metadata
/Users/marijnkoolen/Code/Huygens/scale

Loading and Merging Data

We start with a subset of reviews for frequently reviewed books. To see how this subset was created, go to the Filtering Goodreads reviews notebook. This subset contains all reviews for books that have at least 10,000 reviews each.
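The actual filtering is done in that notebook; as a rough sketch of the idea, a "books with at least N reviews" subset can be selected by counting reviews per book and keeping the books that reach the threshold. The function name `books_with_min_reviews` is hypothetical.

```python
import pandas as pd


def books_with_min_reviews(df, min_reviews=10_000, book_col='book_id'):
    # number of reviews per book
    counts = df[book_col].value_counts()
    # ids of the books that reach the threshold
    keep = counts[counts >= min_reviews].index
    return df[df[book_col].isin(keep)]


# toy example with a threshold of 2 reviews instead of 10,000
toy = pd.DataFrame({'book_id': [1, 1, 2, 3, 3, 3], 'review_id': list('abcdef')})
print(books_with_min_reviews(toy, min_reviews=2).book_id.unique())
```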

We first load the reviews into a Pandas dataframe, then add metadata for the reviewed books from some of the datasets with book metadata.

In [2]:
# the review dataframe
review_df = pd.read_csv(books_10k_file, sep='\t', compression='gzip')

review_df
Out[2]:
Unnamed: 0 user_id book_id review_id rating date_added date_updated read_at started_at n_votes n_comments review_length review_text author_id title author_name review_lang
0 0 8842281e1d1347389f2ab93d60773d4d 2767052 248c011811e945eca861b5c31a549291 5 Wed Jan 13 13:38:25 -0800 2010 Wed Mar 22 11:46:36 -0700 2017 Sun Mar 25 00:00:00 -0700 2012 Fri Mar 23 00:00:00 -0700 2012 24 25 1326 I cracked and finally picked this up. Very enj... 153394 The Hunger Games (The Hunger Games, #1) Suzanne Collins en
1 1 704eb93a316aff687a93d5215882eb21 2767052 c52e231744768e9d7f939d1cbeb87666 5 Fri Jul 20 13:59:12 -0700 2012 Sun Aug 23 20:49:13 -0700 2015 Sat Feb 18 00:00:00 -0800 2012 NaN 0 0 31 Exciting, fun, entertaining! :) 153394 The Hunger Games (The Hunger Games, #1) Suzanne Collins en
2 2 4b3636a043e5c99fa27ac897ccfa1151 2767052 89f5c6ed51ba6f70d3955a620f9af830 5 Thu Jun 09 22:05:49 -0700 2011 Fri Sep 13 08:47:42 -0700 2013 Tue Jul 05 00:00:00 -0700 2011 Mon Jul 04 00:00:00 -0700 2011 0 0 201 This was the perfect quick read for a beach va... 153394 The Hunger Games (The Hunger Games, #1) Suzanne Collins en
3 3 012aa353140af13109d00ca36cdc0637 2767052 77fa951667b104fd565d5bd6c760437b 5 Sun Nov 04 18:57:00 -0800 2012 Mon Apr 15 12:57:23 -0700 2013 Sun Apr 14 00:00:00 -0700 2013 NaN 0 0 1523 The United States (and I assume most other soc... 153394 The Hunger Games (The Hunger Games, #1) Suzanne Collins en
4 4 2f6af21d14c83a5df6cdcef5e6af0b3e 2767052 46f876086c1e378859f889e87d1e6e5c 4 Thu Jun 07 10:31:00 -0700 2012 Thu Jun 07 10:33:17 -0700 2012 Mon Apr 16 00:00:00 -0700 2012 NaN 0 0 98 A page turner. Since I hate reality TV I value... 153394 The Hunger Games (The Hunger Games, #1) Suzanne Collins en
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
121925 121972 d168e4a91a8cb0795d72d0adbe9a5897 10818853 a72358e15220c703fbcd1a61ceb60ea6 3 Tue Aug 06 16:05:58 -0700 2013 Tue Aug 06 16:06:37 -0700 2013 NaN NaN 0 0 107 Very shocking content. Not well written. Makes... 4725841 Fifty Shades of Grey (Fifty Shades, #1) E.L. James en
121926 121973 d43b94b7a0a02e0bbaa6b93b884a0c9d 10818853 f35af15602f353e3c4b8b357ca2cfd01 4 Sat Jun 16 04:02:33 -0700 2012 Sat Jun 16 04:03:52 -0700 2012 Fri Jun 08 00:00:00 -0700 2012 NaN 1 0 45 A wonderful, if slightly twisted, love story. 4725841 Fifty Shades of Grey (Fifty Shades, #1) E.L. James en
121927 121974 43202656e9c338bb711afbc7136ab344 10818853 0931f46ea40d06bb201410a1c465b2ff 2 Sun Nov 11 01:28:33 -0800 2012 Sun Nov 11 01:29:46 -0800 2012 NaN NaN 0 0 118 Read to see what all the hype was about. Mills... 4725841 Fifty Shades of Grey (Fifty Shades, #1) E.L. James en
121928 121975 d94c83867337514c94738b57a1d19677 10818853 bf6e6e995804cd92d2e0f66a0fe4c5d8 5 Sat Sep 08 09:20:43 -0700 2012 Wed Dec 26 03:13:01 -0800 2012 NaN NaN 0 0 296 This book killed the little innocence in me. I... 4725841 Fifty Shades of Grey (Fifty Shades, #1) E.L. James en
121929 121976 e60fcbb1c70ed4f383145efcae21c7ac 10818853 6b298c960776d63607d06023ad38b567 4 Tue Jul 21 03:53:31 -0700 2015 Sun Jul 26 09:25:26 -0700 2015 Fri Jul 24 00:00:00 -0700 2015 Tue Jul 21 00:00:00 -0700 2015 0 0 274 I actually to my own surprise, enjoyed this bo... 4725841 Fifty Shades of Grey (Fifty Shades, #1) E.L. James en

121930 rows × 17 columns

In [3]:
from dateutil import tz
from dateutil.parser import parse

utc = tz.gettz('UTC')

def parse_date(date_str):
    # parse a Goodreads timestamp and convert it to UTC; missing values (NaN) become None
    try:
        return parse(date_str).astimezone(utc)
    except TypeError:
        return None

# pd.to_datetime guarantees datetime64 columns, so the .dt accessor works in later cells
for col in ['date_added', 'date_updated', 'read_at', 'started_at']:
    review_df[col] = pd.to_datetime(review_df[col].apply(parse_date), utc=True)
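Because `dateutil` works out the layout of each string separately, applying `parse_date` row by row is typically slow on over 100,000 reviews. Assuming every timestamp follows the layout visible in the table above (e.g. `Wed Jan 13 13:38:25 -0800 2010`), a vectorised alternative passes an explicit format to `pd.to_datetime`. The format string below is our assumption and should be checked against the data before relying on it.

```python
import pandas as pd

# assumed Goodreads timestamp layout, e.g. 'Wed Jan 13 13:38:25 -0800 2010'
GOODREADS_DATE_FORMAT = '%a %b %d %H:%M:%S %z %Y'

dates = pd.Series(['Wed Jan 13 13:38:25 -0800 2010', None])
# utc=True converts every offset to UTC; missing values become NaT
parsed = pd.to_datetime(dates, format=GOODREADS_DATE_FORMAT, utc=True)
print(parsed)
print(parsed.dt.year)
```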
In [4]:
# get a list of book ids that are in the review dataset
review_book_ids = set(review_df.book_id.unique())

# load basic book metadata (only book and author id and book title)
bookmeta_df = pd.read_csv(book_file, sep='\t', compression='gzip', usecols=['book_id', 'work_id', 'author_id', 'title'])

# filter the book metadata to only the book ids in the review dataset
bookmeta_df = bookmeta_df[bookmeta_df.book_id.isin(review_book_ids)]

# load the author metadata to get author names 
author_df = pd.read_csv(author_file, sep='\t', compression='gzip', usecols=['author_id', 'name'])
author_df = author_df.rename(columns={'name': 'author_name'})

# merge the book and author metadata into a single dataframe, 
# keeping only author names for books in the review dataset
metadata_df = pd.merge(bookmeta_df, author_df, how='left')

# merge the review dataset with the book metadata
review_df = pd.merge(review_df, metadata_df, on='book_id')
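According to the output of In [2], the reviews file already includes `author_id`, `title` and `author_name`. When two tables share non-key columns, `pd.merge` keeps both copies with `_x`/`_y` suffixes. A minimal sketch on hypothetical toy tables of merging only the metadata columns the reviews lack; `how='left'` also keeps reviews whose book is missing from the metadata, whereas the default inner merge drops them.

```python
import pandas as pd

# toy stand-ins for the review and book metadata tables (hypothetical values)
reviews = pd.DataFrame({'review_id': ['r1', 'r2', 'r3'],
                        'book_id': [1, 1, 2],
                        'title': ['Book A', 'Book A', 'Book B']})
books = pd.DataFrame({'book_id': [1, 2],
                      'work_id': [10, 20],
                      'title': ['Book A', 'Book B']})

# merging on book_id when both tables have a 'title' column yields suffixed duplicates
naive = pd.merge(reviews, books, on='book_id')
print(sorted(naive.columns))

# keep only the key plus the metadata columns the reviews do not have yet
new_cols = ['book_id'] + [c for c in books.columns if c not in reviews.columns]
merged = pd.merge(reviews, books[new_cols], on='book_id', how='left')
print(sorted(merged.columns))
```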

We remove empty reviews as they are non-reviews (see Filtering Goodreads Reviews for details on how and why we do this).

In [5]:
print('Number of empty reviews:', len(review_df[review_df.review_length == 0]))
review_df = review_df[review_df.review_length > 0]
Number of empty reviews: 0
In [7]:
# This step writes the current dataframe to file, 
# so all the merging steps can be skipped in reruns of the notebook
merged_data_file = os.path.join(data_dir, 'goodreads_reviews-books_above_10k.merged.csv.gzip')
review_df.to_csv(merged_data_file, sep='\t', compression='gzip')

#review_df = pd.read_csv(merged_data_file, sep='\t', compression='gzip')

This dataset contains reviews for nine books that each have at least 10,000 reviews:

In [290]:
review_df.groupby(['author_name', 'title']).size()
Out[290]:
author_name      title                                  
E.L. James       Fifty Shades of Grey (Fifty Shades, #1)    11176
John Green       The Fault in Our Stars                     20738
Markus Zusak     The Book Thief                             11297
Paula Hawkins    The Girl on the Train                      13401
Stephenie Meyer  Twilight (Twilight, #1)                    10532
Suzanne Collins  Catching Fire (The Hunger Games, #2)       11900
                 Mockingjay (The Hunger Games, #3)          13534
                 The Hunger Games (The Hunger Games, #1)    18613
Veronica Roth    Divergent (Divergent, #1)                  10739
dtype: int64

Suzanne Collins has three books, all part of the same trilogy, among the most frequently reviewed books:

In [291]:
review_df.author_name.value_counts()
Out[291]:
Suzanne Collins    44047
John Green         20738
Paula Hawkins      13401
Markus Zusak       11297
E.L. James         11176
Veronica Roth      10739
Stephenie Meyer    10532
Name: author_name, dtype: int64

There are reviews in different languages:

In [292]:
review_df.review_lang.value_counts()
Out[292]:
en         113338
es           1650
af            666
id            624
unknown       516
it            486
de            450
tl            385
cy            331
fr            302
so            283
pt            270
sv            254
nl            252
sl            245
no            227
ro            213
ca            186
pl            172
sw            156
da            155
tr            124
et            107
hr            103
vi             89
sk             86
hu             66
cs             63
sq             46
fi             45
lt             23
lv             17
Name: review_lang, dtype: int64

For content analysis, we'll remove the non-English reviews, so content can be more easily compared across reviews.

In [293]:
review_df = review_df[review_df.review_lang == 'en']
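Before moving on it can be worth checking how much data a filter discards. A hypothetical helper, `filter_language`, returns the filtered reviews together with the fraction retained, shown here on a toy dataframe:

```python
import pandas as pd


def filter_language(df, lang='en', lang_col='review_lang'):
    # keep only the reviews detected as `lang` and report the fraction retained
    kept = df[df[lang_col] == lang]
    return kept, len(kept) / len(df)


toy = pd.DataFrame({'review_lang': ['en', 'en', 'es', 'en', 'unknown']})
english, share = filter_language(toy)
print(len(english), share)
```

With the language counts above, 113,338 of the 121,930 reviews are English, so the filter keeps about 93% of the data.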

First, we compare how the reviews are spread over time, for all books together and per book.

In [379]:
plt.rcParams['figure.figsize'] = [15, 5]

# group all reviews by year and month that they were published
g = review_df.groupby([review_df.date_updated.dt.year, review_df.date_updated.dt.month]).size()
# plot the number of reviews per month as a bar chart
ax = g.plot(kind='bar')
# update the ticks on the x-axis so that they remain readable...
ax.set_xticks(range(len(g)));
# ... with only a tick label for January of each year
ax.set_xticklabels(["%s-%02d" % item if item[1] == 1 else '' for item in g.index.tolist()], rotation=90);
plt.gcf().autofmt_xdate()
plt.xlabel('Review month')
plt.ylabel('Number of reviews')
plt.show()

The first reviews are from late 2007, the last from late 2017. The total number of reviews for these nine books increases from late 2007 onwards, with a sudden jump in 2012 and another in 2014. However, at the current scale (over 100,000 reviews) and focus (reviews for nine popular books), we cannot see whether these patterns differ per book. We therefore shift our focus to the number of reviews per book.

In [353]:
# Group the number of reviews by year and by book title
g = review_df.groupby([review_df.date_updated.dt.year, 'title']).size()
# one column per title; years in which a book has no reviews get a count of zero
u = g.unstack('title').fillna(0)
# divide each book's yearly counts by its total number of reviews to get proportions
u = u / u.sum()
# plot as bar chart
u.plot(kind='bar')
Out[353]:
<AxesSubplot:xlabel='date_updated'>

We notice some marked differences in how the reviews of a book are spread over time. Some books get a large burst of reviews just after release: Fifty Shades of Grey, for instance, received 50% of its reviews in 2012, after which the number of reviews drops off rapidly. For others the reviews are more spread out, like Twilight and particularly The Book Thief. The Book Thief was released in 2005 and received only a small fraction of its reviews in 2007, but its number of reviews kept increasing up to a peak in 2014, a full 9 years after its release, and it was still receiving many reviews in 2017.
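The contrast between a burst of reviews and a long spread can be summarised per book by two numbers: the year with the most reviews, and that year's share of all the book's reviews. A sketch, assuming the date column is already parsed to datetime; `peak_year_share` is a hypothetical helper and is demonstrated on toy data rather than on `review_df`.

```python
import pandas as pd


def peak_year_share(df, date_col='date_updated', title_col='title'):
    # number of reviews per book per year
    counts = df.groupby([df[title_col], df[date_col].dt.year]).size()
    # each year's count as a fraction of that book's total
    shares = counts / counts.groupby(level=0).transform('sum')
    by_book = shares.groupby(level=0)
    return pd.DataFrame({
        # idxmax returns the (title, year) label of the largest share
        'peak_year': by_book.idxmax().map(lambda label: label[1]),
        'peak_share': by_book.max(),
    })


toy = pd.DataFrame({
    'title': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'date_updated': pd.to_datetime(['2012-03-01', '2012-06-01', '2012-09-01', '2013-01-01',
                                    '2010-05-01', '2011-02-01', '2011-08-01', '2012-04-01']),
})
print(peak_year_share(toy))
```

A high peak share signals a burst (like Fifty Shades of Grey), while a low peak share signals reviews spread over many years (like The Book Thief).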

We start by analysing the reviews for a single book, picked at random from the book ids:

In [22]:
np.random.choice(list(review_book_ids))
Out[22]:
7260188

We create a new dataframe by selecting only the reviews for the randomly selected book.

In [10]:
book_id = 7260188
book_df = review_df[review_df.book_id == book_id]
book_df.title.drop_duplicates()
Out[10]:
18613    Mockingjay (The Hunger Games, #3)
Name: title, dtype: object

The chosen book is Mockingjay, the third book in The Hunger Games trilogy by Suzanne Collins. Let's start with a quick look at the ratings to see whether we can expect positive and/or negative reviews:

In [11]:
book_df.rating.value_counts()
Out[11]:
5    4817
4    4084
3    2834
2    1133
1     363
0     303
Name: rating, dtype: int64

The ratings of zero are not actual ratings, but non-ratings, i.e. the reviewer wrote a review but provided no explicit rating.
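Because zeros are non-ratings, averaging the raw `rating` column treats them as 0-star reviews and pulls the mean down; the per-year averages printed in the next cell use `rating.mean()` on all reviews and therefore include them. A small worked check using the Mockingjay counts above:

```python
import numpy as np
import pandas as pd

# Mockingjay rating counts from the value_counts() output above (0 = no rating)
rating_counts = pd.Series({5: 4817, 4: 4084, 3: 2834, 2: 1133, 1: 363, 0: 303})

# weighted mean over all reviews: the non-ratings count as 0 stars
mean_all = np.average(rating_counts.index, weights=rating_counts.values)
# drop the non-ratings before averaging
rated = rating_counts.drop(0)
mean_rated = np.average(rated.index, weights=rated.values)
print(f'{mean_all:.2f} {mean_rated:.2f}')  # 3.81 3.90
```

The difference is small here (about 0.09 stars) because only 303 of the 13,534 reviews lack a rating, but it can matter for books or years with more unrated reviews.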

In [12]:
plt.rcParams['figure.figsize'] = [15, 5]

g = book_df.groupby([book_df.date_added.dt.year, 'rating']).size()
u = g.unstack('date_added')
print('year\tavg. rating')
for year in u.columns:
    print(f'{year}\t{book_df[book_df.date_added.dt.year == year].rating.mean(): >4.2f}')
    u[year] = u[year] / sum(u[year])

g = u.stack()
u = g.unstack('rating')
u.plot(kind='bar')

The plot shows the fraction of reviews per year that received a rating of 1 to 5 stars (or no rating, represented by zero).

The majority of reviews have a positive rating. Although the fraction of 5-star reviews drops somewhat after the first year (the lowest average rating is in 2014), the majority remains positive. This is typical of online reviews: people don't choose books to read at random, but pick those they expect to like. Furthermore, people who liked a book are more likely to be willing to put effort into reviewing it.

Let's look at the differences in review length:

In [13]:
book_df.review_length.value_counts().sort_index()
Out[13]:
1        10
2        10
3        30
4        25
5        22
         ..
11845     1
12221     1
12472     1
15704     1
17786     1
Name: review_length, Length: 2503, dtype: int64
In [14]:
from collections import Counter

# count the number of reviews of each length
counts = book_df.review_length.value_counts().sort_index()
print('The shortest review (in text characters):', book_df.review_length.min())
print('The longest review (in text characters):', book_df.review_length.max())
print('The average review length:', book_df.review_length.mean())
print('The standard deviation in review lengths:', book_df.review_length.std())
print('\nNumber of reviews with fewer than 100 characters:', sum(book_df.review_length < 100))
print('Number of reviews of below average length:', sum(book_df.review_length < book_df.review_length.mean()))
print('Number of reviews of above average length:', sum(book_df.review_length > book_df.review_length.mean()))

dist = {length: count for length, count in counts.iteritems()}
book_df.review_length.value_counts()
x, y = zip(*book_df.review_length.value_counts().sort_index().iteritems())
plt.plot(x, y)
plt.axvline(x=book_df.review_length.mean(), color='red', linestyle='dotted')
plt.xscale('log')
The shortest review (in text characters): 1
The longest review (in text characters): 17786
The average review length: 608.3044923895375
The standard deviation in review lengths: 1013.9769171514645

Number of reviews with fewer than 100 characters: 3512
Number of reviews of below average length: 9850
Number of reviews of above average length: 3684

The plot above shows the distribution of review lengths in number of characters per review. There is a large spread in review lengths, with thousands of reviews of fewer than 100 characters. English words average just over 4 characters, so counting the whitespace between words, these are reviews of fewer than 20 words. The average length is 608 characters (the red dotted line), while the longest review is almost 18,000 characters long (roughly 3,600 words).
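The rule of thumb above (about 5 characters per word, including the following space) can be written down as a tiny conversion, next to an actual token count for when the text is at hand. Both `approx_word_count` and `word_count` are hypothetical helpers, and the 5-characters-per-word figure is the assumption from the paragraph above.

```python
def approx_word_count(n_chars, chars_per_word=5.0):
    # assumption: about 4 characters per English word plus one whitespace character
    return round(n_chars / chars_per_word)


def word_count(text):
    # an actual count of whitespace-separated tokens
    return len(text.split())


print(approx_word_count(100), approx_word_count(17786))  # 20 3557
print(word_count('Exciting, fun, entertaining! :)'))      # 4
```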

Slight tangent on the distribution: the standard deviation is larger than the average length, which for a quantity that cannot be negative suggests a long tail of very long reviews. The distribution is right-skewed (positively skewed): most reviews are shorter than the average (9,850 below versus 3,684 above). See the notebook on Analysing Distributions for a detailed analysis of the different types of distributions and our arguments on why it is important to know about them and take them into account when interpreting data.
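A toy illustration, with made-up lengths, of the signature of a right-skewed distribution: the mean lies far above the median, the standard deviation exceeds the mean, most values fall below the mean, and pandas' `Series.skew()` (sample skewness) is positive. Comparing `book_df.review_length.median()` with the mean would be a quick way to check this on the real data.

```python
import pandas as pd

# hypothetical review lengths: mostly short, with one very long review
lengths = pd.Series([12, 40, 85, 150, 300, 600, 9000])

print('mean:  ', round(lengths.mean(), 1))
print('median:', lengths.median())
print('std:   ', round(lengths.std(), 1))
print('skew:  ', round(lengths.skew(), 2))
print('below mean:', (lengths < lengths.mean()).sum(), 'of', len(lengths))
```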

Let's sample a review and look at the text.

In [24]:
random_seed = 1205921

sample_df = book_df.sample(1, random_state=random_seed)

review_text_col = list(sample_df.columns).index('review_text')
sample_df.iloc[0,review_text_col]
Out[24]:
"I found the writing great and the story well moving until it got to the end and the unnecessary tortures and games and killing of children ensued. Why repeat that? We didn't need the shock value. We have already established how cruel the Capitol was. I really didn't need to read about people getting killed in more and more creative ways - it seemed self-serving, like the author has some morbid fascination with meat grinders and burning people or letting them torn apart by vicious monsters. It really detracted from the story for me, which I did find interesting and the ending was surprising - although I would have liked to read about the trial instead of Katniss getting locked up again (she is locked up a lot in this book, another annoying bit).   I liked the book very much but I deducted one star for the self-serving, morbid violence that didn't further the story."

This review describes a somewhat negative reading experience due to the violence in the book, but the reviewer found the story interesting and the ending surprising.

Let's compare a small sample of 10 reviews:

In [25]:
from scripts.text_tail_analysis import get_dataframe_review_texts
sample_size = 10
sample_df = book_df.sample(sample_size, random_state=random_seed)

for ri, review_text in enumerate(get_dataframe_review_texts(sample_df)):
    print(f'review {ri+1}:', review_text)
    print('\n')
review 1: I found the writing great and the story well moving until it got to the end and the unnecessary tortures and games and killing of children ensued. Why repeat that? We didn't need the shock value. We have already established how cruel the Capitol was. I really didn't need to read about people getting killed in more and more creative ways - it seemed self-serving, like the author has some morbid fascination with meat grinders and burning people or letting them torn apart by vicious monsters. It really detracted from the story for me, which I did find interesting and the ending was surprising - although I would have liked to read about the trial instead of Katniss getting locked up again (she is locked up a lot in this book, another annoying bit).   I liked the book very much but I deducted one star for the self-serving, morbid violence that didn't further the story.


review 2: No where near as good as the first two


review 3: Amazing! Review to come.


review 4: 1.5 stars   Before you come at me with your pitch forks, screaming at me for disliking a book in the Hunger Games trilogy, let me explain something to you.   I don't give books 5 stars just because I loved the previous 2 in the series. I refuse to give a book a postive review just because it is the conclusion to the series.   I did not like Mockingjay. Not only was I disappointed but I felt robbed (and I do not mean by my $20 that could have been spent on something better like a giant ice-cream cone)   I felt robbed because after 2 great books, Suzanne Collins ends her series in such an anti-climatic way that leaves me stunned, but not in the good way.   Perhaps I am biased. Scratch that, I am biased. Many of my favourite characters died in this final installment. I know, I know, there was a war. There are going to be deaths. Suzanne Collins put them in the book to make it more real.   Well you know what? I don't buy that!   Those deaths had nothing of importance, they happened, in my opinion, just to cause angst. And our main character Katniss already provided a lot of that.   Let's move on to Katniss, the true reason for my less-than-positive review. She was infuriating and annoying. She seemed like a hopeless little girl that could not pull herself together. She seemed like, dare I say it, Bella Swan. Whiny, indecisive, dependant ... all attributes that the Katniss I loved in the first novel did not possess. One of my favorite female characters, was just completely changed in this one book. I loved her for her strength and fight, not her inability to stand up for herself.   This seemed like a watered-down version of Katniss, created to further show her love for Peeta is so strong that she can't function without him.   Wait.   Back pedal on that.   Katniss Everdeen, the girl on fire, overly reliant on a boy?   You heard (or read) me correctly.   No longer is this girl the head-strong, kick-ass female protaginist I loved. 
In her place is a girl who is desperately needing Peeta, while toying with Gale only then to feel no remorse when Gale leaves and keeps Peeta just because he's there   Let me clarify something before I go on. I am a fan of Peenis. Hehe.   Peenis is the pairing of Katniss and Peeta. This book, despite what happened in the end, is not in support of Peenis. Why? Because Katniss is not the same Katniss and Peeta is not the same Peeta. They're imposters, they're not the characters I rooted for, loved and cherished.   Despite the suger-coated epilogue, I could not bring myself to like the ending of the novel. It did not seem the least bit real or in-character. Sure, I wanted Peeta to be Katniss' love. But did that mean she would feel nothing when Gale leaves? Obviously not considering they are best friends.   There were so many deaths in the book that, as I've mentioned, felt unneccesary.   Why did Prim have to die? Why? The series began with Prim being the innocent little sister that Katniss sought to protect. Why make her die? With her dead, the purpose of the series escapes me. Is Suzanne Collins implying that in the war, there is no hope whatsoever. I feel that's a terrible message, I understand war is killing humanity but shouldn't there be some hope?   Finnick Odair was my favorite character. He was hot, a hero, hot, sarcastic, hot, kind, hot, smart and did I mention hot? Yes, I did have a book-crush on Finnick. So his role in the book really pissed me off. After giving him such a great reunion with Annie, letting them getting married, why do you have to kill him off? And Annie's reaction (from Katniss' point of view) is little to nothing. We don't hear about her tears, her sadness, just ... nothing   If I go on to explain all the deaths in the book, I might just die myself so I'll stop. But the enormous amount of deaths served no purpose whatsoever.   The theme in this book (if there was any) is death and revenge and more death. 
I have a strong feeling the only reason Katniss killed Coin was because of Prim's death. And then, spoiler alert, when they are all voting on if there should be another Hunger Games but with the children of the Capital being the players, why in the world does Katniss say yes? And then say "for Prim"? I don't think Prim would like another Games to happen. This whole series is about how awful the Games are, how no one should have to suffer them, how they are cruel and a disgrace. So why does she do a 360 and vote for another Games? It's beyond me. It's purely for revenge. When she had a shimmer of goodness in the book, when she defended her dressing commitee because she claimed they were innocent and didn't know better because they were raised to regard the Games as okay, this is all in spite because then she chooses to make their children play the Games. So the theme in the book? Revenge. Pure revenge. It teaches us that humanity is a nothing but bad, that the only way to resolve problems if fighting fire with fire, to forget about tolerance and forgiveness and to make people pay. In other words, go ahead and be a hypocrite. Nice theme for a book targeted at young adults (notice my sarcasm).   Other things I did not like include: the slow beginning, the lack of Peeta (his brain-washed version of himself is not Peeta but just another cause for drama and angst), the filming of everything (and I mean everything, in the midst of war, they are concerned about filming Katniss fight? Seriously?) and, of course, the fragments. That happened. Every. Single. Other. Sentance. I. Understand. That. Everyone. Has. There. Own. Way. Of. Writing. But. Making. All. These. Fragments. Does. Not. Create. Suspense. And. No. Teenage. Girl. Thinks. Like. This.   Overall, it was a huge let down. I read the other 2 in the series, loved them both, and this felt like a huge downer. 
I had to force myself to finish this book while with the others, I finished them in one sitting because they were that good.   Why give it 1.5 stars rather than just 1? Because, despite my disappointment, I know many people who do not read at all and yet they were reading this series and for that, I give this book the 0.5 star since it was able to get people to read.   This review is ending kind of anti-climatic actually .... oh well, it goes with how Mockingjay ended   If you're pissed off that I didn't like this book, I don't really care. I think if Collins decided that instead of writing a series, to just make the book in one book, the first one, it would have been better. All this extra rebellion, District 13, blahblahblah was unnecessary. I felt she could have wrapped up the book nicely in Book 1. Oh well.   Still pissed I didn't give this book 5 stars?


review 5: This book was more than I expected, the key word being expected. Three times in this book I was blindsided. I turned the page to the next chapter expecting something to happen and then got an eyeful of something else, stunning my conscious recognition. This is just one thing I loved about this book. I also love that this book is no fairytale. It is gut wrenchingly raw and cruel and can solicit emotions of repulsion and disgust right alongside desires of hope and happiness. There is no sugar coating to this series. It was a realistic read. Collins writes believability into a complex snare that draws you in and makes it so hard to put the books down. These books will definitely reside on my favorites shelf. For me this book was worth the wait not that I wouldn't have like to have had it sooner. Now I'm off to savor this book again and glean even more while listening to the audio.


review 6: Okay, Actually, I'm a bit disappointing by the end of the story


review 7: ** spoiler alert **   This book moved the slowest for me reading-wise. Specifically because it was so descriptive. Dialog is always quicker and more engaging to read.   All the psychological stuff that I felt was left out of The Hunger Games, coagulates here in Mockingjay. Katniss, having been given only a few months of what she think will be the rest of her more semi-normal life, has to return to survival mode. Peeta's character development is almost scary, but I think was a good move. He moves from being just the baker's son who was drawn for the games and survived with Katniss' help, to being a survivor himself. Struggling with the Capitol's hijacking and the slow process of recovery.   I had a hard time with Finnick's death, almost as much as I had with Cinna's in Catching Fire. But like the deaths in the Harry Potter series, as much as I disliked it such a thing is what gives a story depth and importance. Without loss the struggle loses it's meaning. Without it the story loses grasp on reality.   Having read all three books, I can absolutely not agree with anyone that tries claiming this is another stab at a Twilight-style saga.


review 8: This series is just hard to describe. For one thing I was continually pulled out of the narrative when the narrator switched back and forth between and past and present tense. It was strange, but maybe that is just me.   Overall the idea and the world behind this story is brilliant. I really enjoyed the first book, but as the story went on into the second and third books it just got painful the read. These characters are really tortured beyond what is normal for a young adult book.   And towards the end of this finale book I just got fed up with Katniss and her self loathing. She alternates between blaming the capital and the Games for all her problems and then blaming herself. I actually think that is is probably very realistic. But personally I left this series feeling like I should have stopped reading at the end of book one when I still liked all the characters.


review 9: I fell in love with this series just months before Mocking Jay came out, one of my fellow library friends introduced me to the series and I devoured both Hunger Games and Catching Fire within days! I was devastated to find out that I had to wait for the conclusion, specially with the cliff hanger in Catching Fire! It was well worth the wait and like always, the conclusion was my favorite part. I love a good book that can make you feel a lot of emotions. My favorite part of this series was that it wasn't centered around boys like Twilight. There was two possible love interests but that wasn't the focus of the story. It was also my first time seeing a strong female lead. I felt like I connected the best with Katniss versus Bella or Trish. I also enjoyed the movies which is a plus knowing the most movies don't compare to the books. Finally I enjoy a book that throws you for a curve and this one had me speechless, and also crying at the end for the rest of the 3 hour drive to visit my family!


review 10: Re-read november of 2014:   I loved this book even more than I did the first time around.   Maybe because I took my time reading it this time, or I already watched part 1 of the film but I just lovee itt.   The ending makes sense now, at first it felt rushed and well maybe the second half of the books feels that way, but that's how war is. Fast-paced, no room for thinking things over, move or you're dead. And this book captures that essence really well.   This is not a happy ending story, it is more bitter than sweet.But it shows that sometimes in life you need to accept what has happened and those things you can't change. Adapt, and keep going.   I give it a 4.5. I think it's still my least favorite from the series considering how dark it is, but it's a great conclusion.


Several reviews are very short, just one or two brief sentences. Many reviewers mention the ending, which is not unexpected: this is the last book of a trilogy, so it wraps up a longer narrative. Opinions about it differ considerably.

Taking a first step towards a more quantitative analysis of the content, we do a Keyword in Context (KWiC) search for the words 'end', 'ends', 'ended', 'ending' and 'endings' to gain insight into what reviewers say about it.

In [26]:
import re


def kwic(pattern, reviews, word_boundaries=True, context=40):
    # Print every match of pattern with `context` characters on either side.
    # The pattern must contain a capturing group: match[1] is printed as the keyword.
    if word_boundaries:
        pattern = r'\b' + pattern + r'\b'
    for review in reviews:
        for match in re.finditer(pattern, review):
            start = max(0, match.start(0) - context)
            end = match.end(0) + context
            print(f'{match[1]: <15}{review[start:end]}')

pattern = '(end|ends|ended|ending|endings)'
kwic(pattern, get_dataframe_review_texts(sample_df))
end            e story well moving until it got to the end and the unnecessary tortures and games 
ending         e, which I did find interesting and the ending was surprising - although I would have 
ends           se after 2 great books, Suzanne Collins ends her series in such an anti-climatic way
end            This book, despite what happened in the end, is not in support of Peenis. Why? Beca
ending         e, I could not bring myself to like the ending of the novel. It did not seem the least
ending         to get people to read.   This review is ending kind of anti-climatic actually .... oh 
ended          .. oh well, it goes with how Mockingjay ended   If you're pissed off that I didn't li
end            ctually, I'm a bit disappointing by the end of the story
end            r a young adult book.   And towards the end of this finale book I just got fed up w
end            ke I should have stopped reading at the end of book one when I still liked all the 
end            d me speechless, and also crying at the end for the rest of the 3 hour drive to vis
ending         of the film but I just lovee itt.   The ending makes sense now, at first it felt rushe
ending         ence really well.   This is not a happy ending story, it is more bitter than sweet.But

Another way to get insight into the content of multiple reviews is to make frequency lists.

In [27]:
import re
from collections import Counter

tf = Counter()
for text in get_dataframe_review_texts(sample_df):
    # split the texts on any non-word characters
    words = re.split(r'\W+', text.strip())
    # count the number of times each word occurs across the review texts
    tf.update(words)

tf.most_common(20)
Out[27]:
[('the', 119),
 ('I', 88),
 ('and', 56),
 ('to', 51),
 ('a', 46),
 ('of', 43),
 ('that', 38),
 ('is', 38),
 ('book', 37),
 ('in', 36),
 ('was', 29),
 ('this', 29),
 ('it', 26),
 ('for', 22),
 ('Katniss', 19),
 ('not', 19),
 ('t', 18),
 ('with', 18),
 ('just', 18),
 ('like', 17)]

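Apart from function words such as 'the', 'I' and 'and', the top 20 most frequent words contain a domain-generic term, 'book', and the name of the main character, 'Katniss'. The token 't' comes from contractions such as "didn't", which splitting on non-word characters breaks into 'didn' and 't'.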

How often do variants of 'end' and 'ending' appear in these 10 reviews?

In [28]:
for term in ['end', 'ends', 'ended', 'ending', 'endings']:
    print(f'{term}:', tf[term])
end: 6
ends: 1
ended: 1
ending: 5
endings: 0
In [29]:
print('Number of words:', sum(tf.values()))
print('Number of distinct words:', len(tf.keys()))
Number of words: 2327
Number of distinct words: 765

We can also use some of the many wonderful open source Natural Language Processing (NLP) tools to get more control over the textual content. We use Spacy to parse the reviews, which gives us access to the individual sentences and words, and provides additional information on word forms, part-of-speech tags, lemmas, etc.

We start by listing all named entities that Spacy identified in the sample of reviews, with their frequencies.

In [30]:
import spacy

# load the large model for English
nlp = spacy.load('en_core_web_lg')

# use nlp to parse each text and store the parsed results as a list of docs
docs = [nlp(text) for text in get_dataframe_review_texts(sample_df)]

# iterate over the docs, then over the entities in each doc and count them
tf = Counter([entity.text for doc in docs for entity in doc.ents])

tf.most_common()
Out[30]:
[('Katniss', 18),
 ('Peeta', 10),
 ('first', 7),
 ('one', 6),
 ('2', 3),
 ('Mockingjay', 3),
 ('Suzanne Collins', 3),
 ('Peenis', 3),
 ('Catching Fire', 3),
 ('Capitol', 2),
 ('two', 2),
 ('1.5', 2),
 ('5', 2),
 ('Finnick', 2),
 ('Annie', 2),
 ('Collins', 2),
 ('the Hunger Games', 1),
 ('20', 1),
 ('Bella Swan', 1),
 ('One', 1),
 ('Katniss Everdeen', 1),
 ('Gale', 1),
 ('Finnick Odair', 1),
 ('Coin', 1),
 ('Prim', 1),
 ('another Hunger Games', 1),
 ('Capital', 1),
 ('360', 1),
 ('Games', 1),
 ('Sentance', 1),
 ('I.', 1),
 ('0.5', 1),
 ('blahblahblah', 1),
 ('Book 1', 1),
 ('Three', 1),
 ('The Hunger Games', 1),
 ('only a few months', 1),
 ('baker', 1),
 ('Cinna', 1),
 ('Harry Potter', 1),
 ('three', 1),
 ('second', 1),
 ('third', 1),
 ('just months', 1),
 ('Jay', 1),
 ('Hunger Games', 1),
 ('days', 1),
 ('Twilight', 1),
 ('Bella', 1),
 ('Trish', 1),
 ('the 3 hour', 1),
 ('november of 2014', 1),
 ('1', 1),
 ('the second half', 1),
 ('4.5', 1)]

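The 10 reviews yield only a short list of entities, most of which appear only once. Note that Spacy's entity types also include numbers, ordinals and dates, which is why 'first', 'one', '2' and 'november of 2014' show up, and that some items are misrecognised, such as 'baker', 'I.' and 'blahblahblah'. If we look not just at named entities but at all noun phrases, we get a much longer list: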

In [31]:
# instead of named entities, we can also look at noun phrases (noun chunks)
tf = Counter([chunk.text for doc in docs for chunk in doc.noun_chunks])

tf.most_common()
Out[31]:
[('I', 87),
 ('it', 26),
 ('me', 14),
 ('this book', 13),
 ('Katniss', 12),
 ('you', 11),
 ('It', 10),
 ('she', 10),
 ('they', 10),
 ('the book', 9),
 ('Peeta', 8),
 ('the story', 7),
 ('the end', 6),
 ('them', 6),
 ('the series', 6),
 ('something', 5),
 ('what', 5),
 ('this series', 5),
 ('people', 4),
 ('a book', 4),
 ('nothing', 4),
 ('She', 4),
 ('Prim', 4),
 ('the Games', 4),
 ('We', 3),
 ('the conclusion', 3),
 ('Mockingjay', 3),
 ('Suzanne Collins', 3),
 ('herself', 3),
 ('her', 3),
 ('him', 3),
 ('fire', 3),
 ('who', 3),
 ('Gale', 3),
 ('Peenis', 3),
 ('This book', 3),
 ('myself', 3),
 ('war', 3),
 ('the books', 3),
 ('Catching Fire', 3),
 ('the ending', 2),
 ('5 stars', 2),
 ('deaths', 2),
 ('importance', 2),
 ('angst', 2),
 ('a lot', 2),
 ('humanity', 2),
 ('He', 2),
 ('revenge', 2),
 ('the world', 2),
 ('another Games', 2),
 ('himself', 2),
 ('everything', 2),
 ('Collins', 2),
 ('emotions', 2),
 ('the rest', 2),
 ('the writing', 1),
 ('the unnecessary tortures', 1),
 ('games', 1),
 ('killing', 1),
 ('children', 1),
 ('the shock value', 1),
 ('the Capitol', 1),
 ('more and more creative ways', 1),
 ('the author', 1),
 ('some morbid fascination', 1),
 ('meat grinders', 1),
 ('vicious monsters', 1),
 ('the trial', 1),
 ('another annoying bit', 1),
 ('one star', 1),
 ('the self-serving, morbid violence', 1),
 ('Review', 1),
 ('your pitch forks', 1),
 ('the Hunger Games', 1),
 ('trilogy', 1),
 ('books', 1),
 ('a postive review', 1),
 ('a giant ice-cream cone', 1),
 ('2 great books', 1),
 ('her series', 1),
 ('such an anti-climatic way', 1),
 ('the good way', 1),
 ('my favourite characters', 1),
 ('this final installment', 1),
 ('a war', 1),
 ('Those deaths', 1),
 ('my opinion', 1),
 ('our main character', 1),
 ("'s", 1),
 ('the true reason', 1),
 ('my less-than-positive review', 1),
 ('a hopeless little girl', 1),
 ('all attributes', 1),
 ('the Katniss', 1),
 ('the first novel', 1),
 ('my favorite female characters', 1),
 ('this one book', 1),
 ('her strength', 1),
 ('a watered-down version', 1),
 ('her love', 1),
 ('Wait', 1),
 ('Back pedal', 1),
 ('Katniss Everdeen', 1),
 ('the girl', 1),
 ('a boy', 1),
 ('You', 1),
 ('this girl', 1),
 ('the head-strong, kick-ass female protaginist', 1),
 ('her place', 1),
 ('a girl', 1),
 ('no remorse', 1),
 ('he', 1),
 ('a fan', 1),
 ('the pairing', 1),
 ('support', 1),
 ('the same Katniss', 1),
 ('the same Peeta', 1),
 ('They', 1),
 ('imposters', 1),
 ('the characters', 1),
 ('the suger-coated epilogue', 1),
 ('the novel', 1),
 ('character', 1),
 ("Katniss' love", 1),
 ('best friends', 1),
 ('so many deaths', 1),
 ('The series', 1),
 ('the innocent little sister', 1),
 ('the purpose', 1),
 ('the war', 1),
 ('no hope', 1),
 ('a terrible message', 1),
 ('some hope', 1),
 ('Finnick Odair', 1),
 ('my favorite character', 1),
 ('a book-crush', 1),
 ('Finnick', 1),
 ('his role', 1),
 ('such a great reunion', 1),
 ('Annie', 1),
 ("Annie's reaction", 1),
 ("Katniss' point", 1),
 ('view', 1),
 ('her tears', 1),
 ('her sadness', 1),
 ('all the deaths', 1),
 ('the enormous amount', 1),
 ('no purpose', 1),
 ('The theme', 1),
 ('death', 1),
 ('more death', 1),
 ('a strong feeling', 1),
 ('the only reason', 1),
 ('Coin', 1),
 ("Prim's death", 1),
 ('spoiler alert', 1),
 ('another Hunger Games', 1),
 ('the children', 1),
 ('the Capital', 1),
 ('the players', 1),
 ('This whole series', 1),
 ('no one', 1),
 ('a disgrace', 1),
 ('vote', 1),
 ('a shimmer', 1),
 ('goodness', 1),
 ('her dressing commitee', 1),
 ('spite', 1),
 ('their children', 1),
 ('the theme', 1),
 ('Revenge', 1),
 ('Pure revenge', 1),
 ('us', 1),
 ('a nothing', 1),
 ('the only way', 1),
 ('problems', 1),
 ('tolerance', 1),
 ('forgiveness', 1),
 ('other words', 1),
 ('a hypocrite', 1),
 ('Nice theme', 1),
 ('young adults', 1),
 ('my sarcasm', 1),
 ('Other things', 1),
 ('the slow beginning', 1),
 ('the lack', 1),
 ('his brain-washed version', 1),
 ('just another cause', 1),
 ('drama', 1),
 ('the filming', 1),
 ('the midst', 1),
 ('course', 1),
 ('Sentance', 1),
 ('I. Understand', 1),
 ('Everyone', 1),
 ('Way', 1),
 ('Writing', 1),
 ('Fragments', 1),
 ('Suspense', 1),
 ('Teenage', 1),
 ('Girl', 1),
 ('a huge let', 1),
 ('a huge downer', 1),
 ('the others', 1),
 ('1.5 stars', 1),
 ('my disappointment', 1),
 ('many people', 1),
 ('the 0.5 star', 1),
 ('This review', 1),
 ('a series', 1),
 ('one book', 1),
 ('All this extra rebellion', 1),
 ('District', 1),
 ('blahblahblah', 1),
 ('Book', 1),
 ('the key word', 1),
 ('the page', 1),
 ('the next chapter', 1),
 ('an eyeful', 1),
 ('my conscious recognition', 1),
 ('just one thing', 1),
 ('no fairytale', 1),
 ('repulsion', 1),
 ('disgust', 1),
 ('desires', 1),
 ('hope', 1),
 ('happiness', 1),
 ('no sugar coating', 1),
 ('a realistic read', 1),
 ('believability', 1),
 ('a complex snare', 1),
 ('These books', 1),
 ('my favorites shelf', 1),
 ('the audio', 1),
 ('** spoiler alert', 1),
 ('Dialog', 1),
 ('All the psychological stuff', 1),
 ('The Hunger Games', 1),
 ('only a few months', 1),
 ('her more semi-normal life', 1),
 ('survival mode', 1),
 ("Peeta's character development", 1),
 ('a good move', 1),
 ("just the baker's son", 1),
 ('the games', 1),
 ("Katniss' help", 1),
 ('a survivor', 1),
 ("the Capitol's hijacking", 1),
 ('the slow process', 1),
 ('recovery', 1),
 ('a hard time', 1),
 ("Finnick's death", 1),
 ('Cinna', 1),
 ('the deaths', 1),
 ('the Harry Potter series', 1),
 ('such a thing', 1),
 ('a story depth', 1),
 ('loss', 1),
 ('the struggle', 1),
 ('grasp', 1),
 ('reality', 1),
 ('all three books', 1),
 ('anyone', 1),
 ('another stab', 1),
 ('a Twilight-style saga', 1),
 ('This series', 1),
 ('one thing', 1),
 ('the narrative', 1),
 ('the narrator', 1),
 ('just me', 1),
 ('the idea', 1),
 ('this story', 1),
 ('the first book', 1),
 ('the second and third books', 1),
 ('the read', 1),
 ('These characters', 1),
 ('a young adult book', 1),
 ('this finale book', 1),
 ('her self loathing', 1),
 ('the capital', 1),
 ('all her problems', 1),
 ('book', 1),
 ('all the characters', 1),
 ('love', 1),
 ('Jay', 1),
 ('my fellow library friends', 1),
 ('both Hunger Games', 1),
 ('days', 1),
 ('the cliff hanger', 1),
 ('my favorite part', 1),
 ('a good book', 1),
 ('My favorite part', 1),
 ('boys', 1),
 ('Twilight', 1),
 ('two possible love interests', 1),
 ('the focus', 1),
 ('my first time', 1),
 ('a strong female lead', 1),
 ('Bella', 1),
 ('Trish', 1),
 ('the movies', 1),
 ('a plus', 1),
 ('the most movies', 1),
 ('a curve', 1),
 ('this one', 1),
 ('the 3 hour drive', 1),
 ('my family', 1),
 ('Re-read november', 1),
 ('the first time', 1),
 ('my time', 1),
 ('part', 1),
 ('the film', 1),
 ('The ending', 1),
 ('sense', 1),
 ('the second half', 1),
 ('Fast-paced, no room', 1),
 ('things', 1),
 ('that essence', 1),
 ('a happy ending story', 1),
 ('life', 1),
 ('a great conclusion', 1)]

Many of these noun chunks are pronouns, such as 'I', 'me', 'you', 'she', 'they', 'them' and 'we'. These are common in reviews, as reviewers often describe their personal reading experience and the effect that the book had on them. In a small sample, they get in the way of seeing which content aspects are mentioned.

Spacy also adds lexical attributes to each token, such as whether it is a stopword (is_stop) or punctuation (is_punct). This makes it easy to filter out common stopwords to get a better view of the content words that are mentioned.

In [32]:
tf = Counter([token.text for doc in docs for token in doc if not token.is_stop])

tf.most_common(20)
Out[32]:
[('.', 157),
 (',', 106),
 ('  ', 40),
 ('book', 37),
 ('-', 21),
 ('?', 19),
 ('Katniss', 19),
 ('like', 17),
 ('series', 17),
 ('read', 11),
 ('story', 10),
 ('Games', 10),
 ('Peeta', 10),
 ('(', 8),
 (')', 8),
 ('books', 8),
 ('loved', 8),
 ('felt', 8),
 ('end', 6),
 ('deaths', 6)]

Now we see many punctuation symbols. Let's filter those out as well.

In [33]:
tf = Counter([token.text for doc in docs for token in doc if not token.is_stop and not token.is_punct])

tf.most_common(20)
Out[33]:
[('  ', 40),
 ('book', 37),
 ('Katniss', 19),
 ('like', 17),
 ('series', 17),
 ('read', 11),
 ('story', 10),
 ('Games', 10),
 ('Peeta', 10),
 ('books', 8),
 ('loved', 8),
 ('felt', 8),
 ('end', 6),
 ('deaths', 6),
 ('love', 6),
 ('think', 6),
 ('people', 5),
 ('ending', 5),
 ('good', 5),
 ('Collins', 5)]

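The most common words are clearly related to the book domain (such as book, read, series, story) and the review domain (like, loved, felt, love, good). Notice that several of them are morphological variants of each other, such as book and books, or love and loved. The remaining '  ' entry is a whitespace token for the double spaces in the review texts; it could be filtered out with Spacy's is_space token attribute.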

We can also count the word lemmas instead of the surface variants in the text:

In [34]:
tf = Counter([token.lemma_ for doc in docs for token in doc if not token.is_stop and not token.is_punct])

tf.most_common(20)
Out[34]:
[('book', 45),
 ('  ', 40),
 ('like', 21),
 ('Katniss', 19),
 ('series', 17),
 ('read', 15),
 ('love', 14),
 ('feel', 14),
 ('story', 10),
 ('end', 10),
 ('Games', 10),
 ('death', 10),
 ('character', 9),
 ('Peeta', 9),
 ('think', 8),
 ('get', 7),
 ('good', 7),
 ('way', 6),
 ('let', 6),
 ('star', 6)]

Counting lemmas merges inflected forms: 'book' and 'books' are now counted together, and 'end' (10) and 'death' (10) move up the list.

Zooming out to more reviewers

With 10 short reviews we can only see a few commonalities and differences. Several reviews mention the ending; some reviewers like it and some don't. A quantitative perspective doesn't give us much beyond what a close reading of the reviews would give us.

If we zoom out to a larger group of 10,000 reviews, we get a more stable picture of which aspects are commonly mentioned. But now a different problem rears up.

In [54]:
from scripts.text_tail_analysis import read_spacy_docs_for_dataframe, select_dataframe_spacy_docs
import spacy
import time

nlp = spacy.load('en_core_web_lg')
fname = '../data/goodreads-reviews-books_above_10k.doc_bin'
start = time.time()
review_docs = read_spacy_docs_for_dataframe(fname, review_df, nlp)
print('took:', time.time() - start, 'seconds')
print('number of spacy docs loaded:', len(review_docs))
book_docs = select_dataframe_spacy_docs(book_df, review_docs, as_dict=True)
print('number of spacy docs selected:', len(book_docs.keys()))
took: 294.7146508693695 seconds
number of spacy docs loaded: 113338
number of spacy docs selected: 12607
In [35]:
sample_size = 10000
sample_df = book_df.sample(sample_size, random_state=random_seed)
# reuse the pre-parsed Spacy docs for the sampled reviews instead of parsing them again
docs = select_dataframe_spacy_docs(sample_df, review_docs, as_dict=False)


# calculate the term frequency of individual words
tf = Counter([token.lemma_ for doc in docs for token in doc if not token.is_stop and not token.is_punct])

tf.most_common(20)
Out[35]:
[('book', 16843),
 ('  ', 14091),
 ('Katniss', 9164),
 ('like', 6744),
 ('read', 6651),
 ('end', 6424),
 ('series', 5948),
 ('love', 5158),
 ('think', 4737),
 ('character', 4217),
 ('Peeta', 4105),
 ('feel', 4090),
 ('ending', 3969),
 ('good', 3882),
 ('story', 3162),
 ('Collins', 3136),
 ('time', 2856),
 ('Games', 2853),
 ('way', 2794),
 ('Hunger', 2763)]

This list is very similar to the one for ten reviews: it contains book and review domain terms, plus the names of the book (Hunger, Games), the author (Collins) and the main characters (Katniss, Peeta).

Plain word lists are a quick way to get an overview of what is common across a set of reviews. Apart from total word counts, we can also count each word once per document, regardless of how often the reviewer uses it, to get insight into how many reviewers mention a specific term, e.g. 'ending'. With each review being a document, this frequency is known as the document frequency.

In [38]:
from scripts.text_tail_analysis import get_doc_word_token_set

df = Counter([lemma for doc in docs for lemma in get_doc_word_token_set(doc, use_lemma=True)])

df.most_common(20)
Out[38]:
[('book', 6163),
 ('series', 3850),
 ('end', 3713),
 ('read', 3688),
 ('like', 3324),
 ('Katniss', 3118),
 ('ending', 2949),
 ('love', 2880),
 ('good', 2792),
 ('think', 2639),
 ('character', 2375),
 ('feel', 2305),
 ('story', 1960),
 ('Collins', 1950),
 ('Hunger', 1891),
 ('trilogy', 1870),
 ('Games', 1861),
 ('time', 1852),
 ('Peeta', 1842),
 ('way', 1832)]

This is quite insightful. There are 3,713 reviews (37% of the 10,000 in the sample) that mention the word 'end' and 2,949 reviews (many of which may also mention 'end') that mention 'ending'. Also, 2,375 reviews mention the word 'character', and 1,960 mention 'story'.
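The difference between term frequency and document frequency can be illustrated on a few toy reviews (hypothetical data, already tokenised and lower-cased). Counting the set of distinct tokens per review, rather than the token list, is what makes each word count at most once per document; the notebook's helper get_doc_word_token_set presumably plays that role for Spacy docs, but this sketch does not use it.

```python
from collections import Counter

# Hypothetical toy reviews, already split into lower-cased tokens.
reviews = [
    ['the', 'end', 'was', 'rushed', 'the', 'end', 'was', 'sad'],
    ['loved', 'the', 'ending'],
    ['the', 'end', 'made', 'me', 'cry'],
]

# Term frequency: every occurrence of a word counts.
toy_tf = Counter(token for review in reviews for token in review)

# Document frequency: a word counts at most once per review,
# because we iterate over the set of distinct tokens in each review.
toy_df = Counter(token for review in reviews for token in set(review))

print(toy_tf['end'], toy_df['end'])             # 3 2
print(f"{toy_df['end'] / len(reviews):.0%}")    # 67% of the reviews mention 'end'
```

The reviewer who writes 'end' twice contributes 2 to the term frequency but only 1 to the document frequency, which is why document frequency is the better measure of how many reviewers mention something.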

But what is the problem that rears up here? Let's look at the total number of words and of distinct lemmas (recall that tf counts lemmas, without stopwords and punctuation):

In [39]:
print('Number of words:', sum(tf.values()))
print('Number of distinct words:', len(tf.keys()))
Number of words: 487298
Number of distinct words: 27632

The 10,000 reviews contain 487,298 words in total (excluding stopwords and punctuation), and 27,632 distinct lemmas. Above, we have looked at only the 20 most frequent ones. What are the remaining 27,612?

This is where the highly skewed distribution of word frequencies throws up barriers to analysis. How do we get a good overview of what those low-frequency words are?

In [40]:
sizes = [10, 20, 100, 200]
total = sum(tf.values())
for size in sizes:
    # share of all tokens covered by the `size` most frequent lemmas
    sum_top = sum(freq for _, freq in tf.most_common(size))
    print(f'Sum frequency of top {size} terms: {sum_top} (fraction: {sum_top / total: >.2f})')
Sum frequency of top 10 terms: 79977 (fraction: 0.16)
Sum frequency of top 20 terms: 113587 (fraction: 0.23)
Sum frequency of top 100 terms: 211555 (fraction: 0.43)
Sum frequency of top 200 terms: 263236 (fraction: 0.54)

These top 20 terms represent only 23% of all words. Even if we look at the top 200 terms, we are still ignoring almost half (46%) of the text.
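The coverage figures above can be wrapped in small helpers that work on any Counter. This is a sketch, not code from the project's scripts: top_k_coverage and tail_size are hypothetical names, and the toy counts are made up to mimic a skewed distribution.

```python
from collections import Counter


def top_k_coverage(counts, k):
    """Fraction of all tokens accounted for by the k most frequent terms."""
    total = sum(counts.values())
    top = sum(freq for _, freq in counts.most_common(k))
    return top / total if total else 0.0


def tail_size(counts, max_freq=1):
    """Number of distinct terms that occur at most max_freq times."""
    return sum(1 for freq in counts.values() if freq <= max_freq)


# Hypothetical toy counts with a few frequent terms and a tail of rare ones.
counts = Counter({'book': 50, 'read': 25, 'story': 17, 'plot': 12,
                  'ending': 10, 'pacing': 2, 'bleak': 1, 'hype': 1, 'arrow': 1})
for k in (1, 3, 5):
    print(k, round(top_k_coverage(counts, k), 2))
print('terms occurring once:', tail_size(counts))
```

Called on the notebook's lemma counter, top_k_coverage(tf, 20) performs the same computation as the loop above, and tail_size(tf) gives the number of lemmas that occur only once.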

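Counting lemmas together with their part-of-speech tag separates different uses of the same word, such as 'like' as a verb and 'like' as a subordinating conjunction:

In [172]: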
tf_lemma_pos = Counter([(token.lemma_, token.pos_) for doc in docs for token in doc if not token.is_stop and not token.is_punct])

tf_lemma_pos.most_common(20)
Out[172]:
[(('book', 'NOUN'), 1785),
 (('  ', 'SPACE'), 1221),
 (('Katniss', 'PROPN'), 874),
 (('read', 'VERB'), 633),
 (('series', 'NOUN'), 575),
 (('think', 'VERB'), 530),
 (('character', 'NOUN'), 473),
 (('feel', 'VERB'), 417),
 (('ending', 'NOUN'), 416),
 (('love', 'VERB'), 416),
 (('good', 'ADJ'), 409),
 (('Peeta', 'PROPN'), 387),
 (('like', 'SCONJ'), 371),
 (('end', 'NOUN'), 354),
 (('go', 'VERB'), 315),
 (('story', 'NOUN'), 313),
 (('like', 'VERB'), 306),
 (('Collins', 'PROPN'), 305),
 (('want', 'VERB'), 302),
 (('know', 'VERB'), 289)]

Long Tails and Classification

One way to organise the items in the long tail is to categorise or classify them, for instance:

  • group them by part-of-speech (as we do below);
  • group them by the frequent terms in the same sentence that they have a syntactic dependency with;
  • group them by semantic information about each word from external resources, such as sentiment, synonyms, hypernyms, or domain-specific word categorisations (e.g. LIWC, WordNet, ...).
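The third option can be sketched as a lookup of each low-frequency term in a lexicon. The lexicon below is a hypothetical, hand-made stand-in for a real resource such as a sentiment lexicon, WordNet or LIWC, and group_tail is an illustrative name, not a function from the project's scripts.

```python
from collections import Counter, defaultdict

# Hypothetical mini-lexicon mapping words to categories; a real analysis
# would take these labels from an external resource.
LEXICON = {
    'boring': 'negative', 'predictable': 'negative', 'bleak': 'negative',
    'compelling': 'positive', 'entertaining': 'positive',
    'pacing': 'narrative', 'climax': 'narrative', 'plot': 'narrative',
}


def group_tail(counts, lexicon, max_freq=5):
    """Group terms occurring at most max_freq times by lexicon category.

    Terms missing from the lexicon are collected under 'unknown'."""
    groups = defaultdict(list)
    for term, freq in counts.items():
        if freq <= max_freq:
            groups[lexicon.get(term, 'unknown')].append(term)
    return {category: sorted(terms) for category, terms in groups.items()}


# Hypothetical toy counts: 'book' and 'plot' are frequent, the rest form the tail.
counts = Counter({'book': 40, 'plot': 12, 'boring': 3, 'pacing': 2,
                  'compelling': 1, 'climax': 1, 'arrow': 1})
for category, terms in sorted(group_tail(counts, LEXICON).items()):
    print(category, terms)
```

The 'unknown' group shows the limitation of this approach: any tail word that the external resource does not cover stays unclassified.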
In [299]:
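from collections import Counter
from scripts.text_tail_analysis import show_pos_tail_distribution

# count (lemma, part-of-speech) pairs, ignoring stopwords and punctuation
tf_lemma_pos = Counter([(token.lemma_, token.pos_) for doc in docs for token in doc if not token.is_stop and not token.is_punct])

# print the POS distribution for all terms and for the low-frequency tail
# (assumed signature: the helper takes the (lemma, POS) counter as its argument)
show_pos_tail_distribution(tf_lemma_pos)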
Word form	All TF (frac)	TF <= 5 (frac)	TF = 1 (frac)
----------------------------------------------
VERB        	 12830  0.28	  1671  0.21	   564  0.19
NOUN        	 16541  0.36	  2876  0.36	  1014  0.34
ADJ         	  6866  0.15	  1757  0.22	   674  0.22
PROPN       	  4738  0.10	   715  0.09	   408  0.13
SCONJ       	   418  0.01	     4   0.0	     1   0.0
NUM         	   481  0.01	   121  0.02	    49  0.02
ADV         	  2438  0.05	   502  0.06	   210  0.07
X           	    51  0.00	    41  0.01	    26  0.01
INTJ        	   407  0.01	   141  0.02	    35  0.01
SPACE       	  1221  0.03	     0   0.0	     0   0.0
PUNCT       	    32  0.00	    32   0.0	    22  0.01
CCONJ       	    12  0.00	    12   0.0	     0   0.0
PRON        	    17  0.00	    17   0.0	     4   0.0
PART        	    35  0.00	     0   0.0	     0   0.0
ADP         	    40  0.00	    20   0.0	    10   0.0
DET         	    10  0.00	    10   0.0	     6   0.0
SYM         	     1  0.00	     1   0.0	     1   0.0

Above we see the proportion of part-of-speech tags across all words, across words that occur at most five times, and across words that occur exactly once. Remember, this is after removal of stopwords and punctuation.

  • The largest categories overall are nouns (36%), verbs (28%), adjectives (15%), proper nouns (10%) and adverbs (5%). Proper nouns refer to single identifiable entities.

  • Among the less frequent words, the proportion of nouns remains stable and that of adverbs rises slightly, while the proportion of verbs drops (from 28% to 21% and 19%) and the proportion of adjectives goes up (from 15% to 22%). Proper nouns make up a larger share of the words that occur only once (13%).

In other words, the tail has relatively many adjectives and entities, but also many other nouns. In terms of content analysis, these are important categories. Of course, with 1000 reviews and only a few thousand of these words, it is possible to go through all of them to get insight into what they are and how they relate to the book, the reading experience or something else. If we were to scale up to tens of thousands or millions of reviews, this would become increasingly infeasible.
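The table above was produced by show_pos_tail_distribution from the project's scripts, whose implementation is not shown here. Assuming that each fraction is a tag's share of the summed term frequencies, the sketch below computes that share for all terms and for terms up to a frequency threshold. It is a hypothetical reimplementation for illustration, not the helper itself.

```python
from collections import Counter


def pos_tail_distribution(lemma_pos_counts, max_freq=None):
    """Share of tokens per POS tag, optionally restricted to the tail.

    lemma_pos_counts maps (lemma, pos) pairs to frequencies. If max_freq
    is given, only pairs occurring at most max_freq times are counted.
    """
    per_pos = Counter()
    for (_lemma, pos), freq in lemma_pos_counts.items():
        if max_freq is None or freq <= max_freq:
            per_pos[pos] += freq
    total = sum(per_pos.values())
    return {pos: freq / total for pos, freq in per_pos.items()} if total else {}


# Hypothetical toy counts of (lemma, POS) pairs.
counts = Counter({('book', 'NOUN'): 20, ('read', 'VERB'): 10,
                  ('bleak', 'ADJ'): 2, ('arrow', 'NOUN'): 1,
                  ('Cinna', 'PROPN'): 1, ('gripping', 'ADJ'): 1})
print(pos_tail_distribution(counts))              # all terms
print(pos_tail_distribution(counts, max_freq=1))  # terms occurring once
```

In this toy example the frequent verb 'read' disappears from the tail, while adjectives and proper nouns gain share, which is the kind of shift the table reports.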

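Another way to get a handle on the long tail is to compare groups of reviews. Below, we split the reviews into positive reviews (rating above 3) and negative reviews (rating below 3), and compare how many reviews in each group use a term.

In [421]:
from scripts.text_tail_analysis import get_lemma_pos_df_index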
In [431]:
df_group1 = book_df[book_df.rating > 3]
df_group2 = book_df[book_df.rating < 3]

book_docs_group1 = select_dataframe_spacy_docs(df_group1, review_docs, as_dict=True)
book_docs_group2 = select_dataframe_spacy_docs(df_group2, review_docs, as_dict=True)

print(len(book_docs_group1), len(book_docs_group2))
8269 1671
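The next cell compares the two groups by document frequency: for each of the 1,000 most frequent (lemma, POS) pairs in the positive reviews, it divides the proportion of negative reviews containing the term by the proportion of positive reviews containing it, and prints the terms with a ratio of at least 1.5. The same ratio can be sketched in plain Python on tokenised reviews. The function names and toy data are hypothetical, and unlike the notebook cell this sketch considers all terms from the first group rather than only the top 1,000.

```python
from collections import Counter


def doc_freq(docs):
    """Number of documents (token lists) in which each term occurs."""
    return Counter(term for doc in docs for term in set(doc))


def overrepresented(docs_a, docs_b, min_ratio=1.5):
    """Terms whose document proportion in docs_b is at least min_ratio times
    their proportion in docs_a, sorted by decreasing ratio.

    Only terms occurring in docs_a are considered, so prop_a is never zero."""
    if not docs_a or not docs_b:
        return []
    df_a, df_b = doc_freq(docs_a), doc_freq(docs_b)
    results = []
    for term, freq_a in df_a.items():
        prop_a = freq_a / len(docs_a)
        prop_b = df_b[term] / len(docs_b)
        ratio = prop_b / prop_a
        if ratio >= min_ratio:
            results.append((term, prop_a, prop_b, ratio))
    return sorted(results, key=lambda row: row[3], reverse=True)


# Hypothetical toy reviews: group a = positive, group b = negative.
positive = [['loved', 'ending'], ['great', 'ending'], ['boring', 'middle'], ['loved', 'it']]
negative = [['boring', 'ending'], ['boring', 'plot']]
for term, prop_a, prop_b, ratio in overrepresented(positive, negative):
    print(f'{term: <10}{prop_a: >6.2f}{prop_b: >6.2f}{ratio: >6.2f}')
```

With small counts such ratios are noisy (a single extra review can double them), which is one reason to restrict the comparison to frequent terms, as the notebook cell does.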
In [438]:
token_pos_types = ['ADJ', 'ADV', 'NOUN', 'PROPN', 'VERB']
# document frequencies of (lemma, POS) pairs in each group
docs_group1 = [book_docs_group1[review_id] for review_id in book_docs_group1]
docfreq_group1 = get_lemma_pos_df_index(docs_group1, keep_pron=True)

docs_group2 = [book_docs_group2[review_id] for review_id in book_docs_group2]
docfreq_group2 = get_lemma_pos_df_index(docs_group2, keep_pron=True)

total_group1 = len(book_docs_group1)
total_group2 = len(book_docs_group2)

for pos_type in token_pos_types:
    # consider the 1,000 most frequent terms in the positive reviews, one POS tag at a time
    for term, freq in docfreq_group1.most_common(1000):
        lemma, pos = term
        if pos != pos_type:
            continue
        # proportion of reviews in each group that contain the term
        prop_group1 = freq / total_group1
        prop_group2 = docfreq_group2[term] / total_group2
        # keep terms that are at least 1.5 times as common in negative reviews
        ratio = prop_group2 / prop_group1
        if ratio < 1.5:
            continue
        print(f'{lemma: <20}{pos: <6}{freq: >6}{docfreq_group2[term]: >6}{prop_group1: >8.4f}{prop_group2: >8.4f}{ratio: >6.2f}')
bad                 ADJ      684   324  0.0827  0.1939  2.34
well                ADJ      627   201  0.0758  0.1203  1.59
second              ADJ      597   213  0.0722  0.1275  1.77
strong              ADJ      521   200  0.0630  0.1197  1.90
big                 ADJ      370   149  0.0447  0.0892  1.99
interesting         ADJ      339   131  0.0410  0.0784  1.91
main                ADJ      282   145  0.0341  0.0868  2.54
old                 ADJ      270    82  0.0327  0.0491  1.50
disappointed        ADJ      217   186  0.0262  0.1113  4.24
depressing          ADJ      215    94  0.0260  0.0563  2.16
high                ADJ      170    70  0.0206  0.0419  2.04
horrible            ADJ      150    79  0.0181  0.0473  2.61
possible            ADJ      136    50  0.0164  0.0299  1.82
weak                ADJ      125    91  0.0151  0.0545  3.60
major               ADJ      123    72  0.0149  0.0431  2.90
huge                ADJ      119    58  0.0144  0.0347  2.41
terrible            ADJ      118    83  0.0143  0.0497  3.48
poor                ADJ      114    46  0.0138  0.0275  2.00
okay                ADJ      111    50  0.0134  0.0299  2.23
disappointing       ADJ      108   167  0.0131  0.0999  7.65
predictable         ADJ      104    41  0.0126  0.0245  1.95
actual              ADJ       96    37  0.0116  0.0221  1.91
rushed              ADJ       96    33  0.0116  0.0197  1.70
complete            ADJ       95    39  0.0115  0.0233  2.03
boring              ADJ       94   107  0.0114  0.0640  5.63
close               ADJ       94    29  0.0114  0.0174  1.53
sorry               ADJ       85    41  0.0103  0.0245  2.39
compelling          ADJ       85    33  0.0103  0.0197  1.92
confused            ADJ       82    36  0.0099  0.0215  2.17
mad                 ADJ       80    38  0.0097  0.0227  2.35
obvious             ADJ       78    25  0.0094  0.0150  1.59
female              ADJ       77    35  0.0093  0.0209  2.25
general             ADJ       75    29  0.0091  0.0174  1.91
fine                ADJ       72    46  0.0087  0.0275  3.16
awful               ADJ       69    51  0.0083  0.0305  3.66
original            ADJ       69    25  0.0083  0.0150  1.79
stupid              ADJ       68    45  0.0082  0.0269  3.27
anti                ADJ       68    38  0.0082  0.0227  2.77
confusing           ADJ       68    34  0.0082  0.0203  2.47
entertaining        ADJ       65    21  0.0079  0.0126  1.60
depressed           ADJ       65    48  0.0079  0.0287  3.65
single              ADJ       63    26  0.0076  0.0156  2.04
present             ADJ       63    33  0.0076  0.0197  2.59
annoying            ADJ       62    47  0.0075  0.0281  3.75
interested          ADJ       61    19  0.0074  0.0114  1.54
bleak               ADJ       60    23  0.0073  0.0138  1.90
cold                ADJ       58    20  0.0070  0.0120  1.71
unnecessary         ADJ       56    52  0.0068  0.0311  4.60
total               ADJ       56    20  0.0068  0.0120  1.77
weird               ADJ       56    26  0.0068  0.0156  2.30
selfish             ADJ       56    45  0.0068  0.0269  3.98
decent              ADJ       53    20  0.0064  0.0120  1.87
popular             ADJ       53    17  0.0064  0.0102  1.59
ok                  ADJ       52    32  0.0063  0.0192  3.05
teenage             ADJ       52    18  0.0063  0.0108  1.71
normal              ADJ       51    20  0.0062  0.0120  1.94
tired               ADJ       51    35  0.0062  0.0209  3.40
non                 ADJ       50    16  0.0060  0.0096  1.58
dramatic            ADJ       48    15  0.0058  0.0090  1.55
extreme             ADJ       47    17  0.0057  0.0102  1.79
post                ADJ       47    18  0.0057  0.0108  1.90
instead             ADV      345   189  0.0417  0.1131  2.71
honestly            ADV      179    93  0.0216  0.0557  2.57
simply              ADV      163    54  0.0197  0.0323  1.64
later               ADV      145    44  0.0175  0.0263  1.50
seriously           ADV      120    76  0.0145  0.0455  3.13
somewhat            ADV      117    38  0.0141  0.0227  1.61
literally           ADV      106    45  0.0128  0.0269  2.10
nearly              ADV       95    39  0.0115  0.0233  2.03
anymore             ADV       89    61  0.0108  0.0365  3.39
basically           ADV       87    62  0.0105  0.0371  3.53
way                 ADV       85    34  0.0103  0.0203  1.98
kinda               ADV       83    27  0.0100  0.0162  1.61
obviously           ADV       80    27  0.0097  0.0162  1.67
ahead               ADV       66    29  0.0080  0.0174  2.17
suddenly            ADV       60    31  0.0073  0.0186  2.56
unfortunately       ADV       59    42  0.0071  0.0251  3.52
constantly          ADV       58    23  0.0070  0.0138  1.96
barely              ADV       55    42  0.0067  0.0251  3.78
utterly             ADV       50    26  0.0060  0.0156  2.57
character           NOUN    3184  1071  0.3851  0.6409  1.66
death               NOUN    1080   371  0.1306  0.2220  1.70
page                NOUN     866   265  0.1047  0.1586  1.51
author              NOUN     716   274  0.0866  0.1640  1.89
point               NOUN     631   280  0.0763  0.1676  2.20
plot                NOUN     550   273  0.0665  0.1634  2.46
girl                NOUN     436   143  0.0527  0.0856  1.62
fact                NOUN     410   145  0.0496  0.0868  1.75
reason              NOUN     408   161  0.0493  0.0963  1.95
person              NOUN     388   134  0.0469  0.0802  1.71
sense               NOUN     365   117  0.0441  0.0700  1.59
triangle            NOUN     346   142  0.0418  0.0850  2.03
writing             NOUN     337   133  0.0408  0.0796  1.95
fire                NOUN     333   105  0.0403  0.0628  1.56
scene               NOUN     326   106  0.0394  0.0634  1.61
decision            NOUN     262    95  0.0317  0.0569  1.79
idea                NOUN     241   119  0.0291  0.0712  2.44
guy                 NOUN     223    75  0.0270  0.0449  1.66
kid                 NOUN     217    92  0.0262  0.0551  2.10
problem             NOUN     212    94  0.0256  0.0563  2.19
heroine             NOUN     195    87  0.0236  0.0521  2.21
boy                 NOUN     178    54  0.0215  0.0323  1.50
self                NOUN     176   112  0.0213  0.0670  3.15
half                NOUN     175    62  0.0212  0.0371  1.75
mother              NOUN     147    48  0.0178  0.0287  1.62
writer              NOUN     146    51  0.0177  0.0305  1.73
development         NOUN     144    64  0.0174  0.0383  2.20
pawn                NOUN     138    42  0.0167  0.0251  1.51
finale              NOUN     138    53  0.0167  0.0317  1.90
expectation         NOUN     137    49  0.0166  0.0293  1.77
resolution          NOUN     127    39  0.0154  0.0233  1.52
hell                NOUN     122    42  0.0148  0.0251  1.70
sort                NOUN     114    41  0.0138  0.0245  1.78
month               NOUN     104    36  0.0126  0.0215  1.71
control             NOUN     100    42  0.0121  0.0251  2.08
bomb                NOUN     100    35  0.0121  0.0209  1.73
example             NOUN      92    40  0.0111  0.0239  2.15
interest            NOUN      92    47  0.0111  0.0281  2.53
strength            NOUN      92    35  0.0111  0.0209  1.88
president           NOUN      86    29  0.0104  0.0174  1.67
sentence            NOUN      85    44  0.0103  0.0263  2.56
teenager            NOUN      83    26  0.0100  0.0156  1.55
protagonist         NOUN      83    52  0.0100  0.0311  3.10
tv                  NOUN      83    30  0.0100  0.0180  1.79
attention           NOUN      81    27  0.0098  0.0162  1.65
state               NOUN      81    27  0.0098  0.0162  1.65
lack                NOUN      79    44  0.0096  0.0263  2.76
version             NOUN      77    24  0.0093  0.0144  1.54
snow                NOUN      76    32  0.0092  0.0192  2.08
personality         NOUN      75    27  0.0091  0.0162  1.78
hospital            NOUN      75    63  0.0091  0.0377  4.16
impact              NOUN      74    28  0.0089  0.0168  1.87
coin                NOUN      74    35  0.0089  0.0209  2.34
direction           NOUN      74    25  0.0089  0.0150  1.67
blood               NOUN      73    25  0.0088  0.0150  1.69
element             NOUN      70    30  0.0085  0.0180  2.12
climax              NOUN      69    32  0.0083  0.0192  2.29
narrative           NOUN      69    23  0.0083  0.0138  1.65
depression          NOUN      68    23  0.0082  0.0138  1.67
propaganda          NOUN      68    23  0.0082  0.0138  1.67
purpose             NOUN      68    33  0.0082  0.0197  2.40
explanation         NOUN      67    27  0.0081  0.0162  1.99
cause               NOUN      65    20  0.0079  0.0120  1.52
fighting            NOUN      63    22  0.0076  0.0132  1.73
group               NOUN      63    25  0.0076  0.0150  1.96
disappointment      NOUN      62   133  0.0075  0.0796 10.62
human               NOUN      61    19  0.0074  0.0114  1.54
screen              NOUN      61    20  0.0074  0.0120  1.62
perspective         NOUN      61    19  0.0074  0.0114  1.54
setting             NOUN      61    22  0.0074  0.0132  1.78
quality             NOUN      59    38  0.0071  0.0227  3.19
result              NOUN      58    22  0.0070  0.0132  1.88
number              NOUN      57    20  0.0069  0.0120  1.74
excitement          NOUN      57    21  0.0069  0.0126  1.82
flaw                NOUN      56    26  0.0068  0.0156  2.30
revenge             NOUN      56    36  0.0068  0.0215  3.18
volume              NOUN      55    22  0.0067  0.0132  1.98
fantasy             NOUN      54    18  0.0065  0.0108  1.65
form                NOUN      53    18  0.0064  0.0108  1.68
effort              NOUN      53    21  0.0064  0.0126  1.96
arrow               NOUN      53    26  0.0064  0.0156  2.43
pacing              NOUN      52    28  0.0063  0.0168  2.66
citizen             NOUN      52    20  0.0063  0.0120  1.90
mission             NOUN      51    30  0.0062  0.0180  2.91
room                NOUN      51    20  0.0062  0.0120  1.94
song                NOUN      49    15  0.0059  0.0090  1.51
hype                NOUN      48    17  0.0058  0.0102  1.75
killing             NOUN      48    15  0.0058  0.0090  1.55
name                NOUN      46    16  0.0056  0.0096  1.72
despair             NOUN      46    14  0.0056  0.0084  1.51
ground              NOUN      46    14  0.0056  0.0084  1.51
trouble             NOUN      46    17  0.0056  0.0102  1.83
Fire                PROPN    646   216  0.0781  0.1293  1.65
Catching            PROPN    646   222  0.0781  0.1329  1.70
Snow                PROPN    573   192  0.0693  0.1149  1.66
Twilight            PROPN    104    34  0.0126  0.0203  1.62
Cinna               PROPN     98    38  0.0119  0.0227  1.92
Book                PROPN     90    44  0.0109  0.0263  2.42
PTSD                PROPN     71    25  0.0086  0.0150  1.74
Johanna             PROPN     61    20  0.0074  0.0120  1.62
Rue                 PROPN     54    22  0.0065  0.0132  2.02
get                 VERB    1243   451  0.1503  0.2699  1.80
die                 VERB    1024   364  0.1238  0.2178  1.76
kill                VERB     918   390  0.1110  0.2334  2.10
start               VERB     741   249  0.0896  0.1490  1.66
try                 VERB     690   224  0.0834  0.1341  1.61
hate                VERB     626   238  0.0757  0.1424  1.88
tell                VERB     558   182  0.0675  0.1089  1.61
mean                VERB     501   155  0.0606  0.0928  1.53
lose                VERB     483   167  0.0584  0.0999  1.71
let                 VERB     462   172  0.0559  0.1029  1.84
have                VERB     361   129  0.0437  0.0772  1.77
fall                VERB     292   100  0.0353  0.0598  1.69
decide              VERB     285   105  0.0345  0.0628  1.82
rush                VERB     278   103  0.0336  0.0616  1.83
care                VERB     275   180  0.0333  0.1077  3.24
follow              VERB     222    76  0.0268  0.0455  1.69
save                VERB     222    70  0.0268  0.0419  1.56
force               VERB     217    78  0.0262  0.0467  1.78
deserve             VERB     213    67  0.0258  0.0401  1.56
spend               VERB     203   111  0.0245  0.0664  2.71
throw               VERB     195    93  0.0236  0.0557  2.36
lead                VERB     180    64  0.0218  0.0383  1.76
suppose             VERB     177    85  0.0214  0.0509  2.38
stand               VERB     162    62  0.0196  0.0371  1.89
explain             VERB     155    62  0.0187  0.0371  1.98
build               VERB     143    53  0.0173  0.0317  1.83
compare             VERB     138    43  0.0167  0.0257  1.54
suffer              VERB     137    45  0.0166  0.0269  1.63
develop             VERB     133    52  0.0161  0.0311  1.93
run                 VERB     130    62  0.0157  0.0371  2.36
mention             VERB     127    55  0.0154  0.0329  2.14
drag                VERB     126    51  0.0152  0.0305  2.00
allow               VERB     122    42  0.0148  0.0251  1.70
use                 VERB     109    35  0.0132  0.0209  1.59
resolve             VERB     108    33  0.0131  0.0197  1.51
stick               VERB     107    52  0.0129  0.0311  2.40
shoot               VERB     106    48  0.0128  0.0287  2.24
cause               VERB     102    39  0.0123  0.0233  1.89
act                 VERB     101    32  0.0122  0.0192  1.57
buy                 VERB      96    42  0.0116  0.0251  2.16
ruin                VERB      94    62  0.0114  0.0371  3.26
drive               VERB      91    36  0.0110  0.0215  1.96
return              VERB      88    28  0.0106  0.0168  1.57
hang                VERB      87    39  0.0105  0.0233  2.22
suck                VERB      86    53  0.0104  0.0317  3.05
bother              VERB      85    70  0.0103  0.0419  4.08
lack                VERB      82    49  0.0099  0.0293  2.96
root                VERB      78    28  0.0094  0.0168  1.78
dislike             VERB      75    47  0.0091  0.0281  3.10
manipulate          VERB      74    28  0.0089  0.0168  1.87
introduce           VERB      68    27  0.0082  0.0162  1.96
fail                VERB      68    64  0.0082  0.0383  4.66
hide                VERB      67    37  0.0081  0.0221  2.73
walk                VERB      66    21  0.0080  0.0126  1.57
occur               VERB      63    23  0.0076  0.0138  1.81
blame               VERB      61    19  0.0074  0.0114  1.54
treat               VERB      60    21  0.0073  0.0126  1.73
sound               VERB      60    20  0.0073  0.0120  1.65
control             VERB      57    18  0.0069  0.0108  1.56
wake                VERB      57    34  0.0069  0.0203  2.95
pass                VERB      56    29  0.0068  0.0174  2.56
invest              VERB      55    28  0.0067  0.0168  2.52
appear              VERB      55    20  0.0067  0.0120  1.80
support             VERB      53    25  0.0064  0.0150  2.33
serve               VERB      52    17  0.0063  0.0102  1.62
rise                VERB      52    20  0.0063  0.0120  1.90
marry               VERB      52    19  0.0063  0.0114  1.81
annoy               VERB      52    28  0.0063  0.0168  2.66
vote                VERB      51    32  0.0062  0.0192  3.10
drop                VERB      51    36  0.0062  0.0215  3.49
complete            VERB      51    20  0.0062  0.0120  1.94
adore               VERB      49    16  0.0059  0.0096  1.62
cut                 VERB      49    17  0.0059  0.0102  1.72
step                VERB      48    15  0.0058  0.0090  1.55
warn                VERB      47    19  0.0057  0.0114  2.00
suggest             VERB      47    17  0.0057  0.0102  1.79

Adjectives:

  • 'bad', 'strong', 'big', 'interesting', 'main', 'old', 'disappointed'

Adverbs:

Nouns:

  • 'character', 'page', 'author', 'point', 'plot', 'writing'

Proper nouns:

Among the proper nouns, 'Twilight' stands out: the negative reviews compare the book to the Twilight series.

Verbs:

  • 'get', 'die', 'kill', 'start', 'try', 'hate', 'tell', 'mean', 'care', 'spend', 'throw'
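The frequency table above appears to give, for each lemma and POS tag, its frequency in each of the two review groups, the corresponding fraction of documents, and the ratio of the second fraction to the first (for 'hell', 0.0251 / 0.0148 ≈ 1.70). These column meanings are inferred from the numbers; this section does not state them. Below is a minimal sketch of that kind of comparison, assuming each review is represented as a set of (lemma, POS) pairs. The helpers `doc_freq` and `compare_doc_freq` and the `min_ratio` cut-off are hypothetical, not functions from the notebook's `scripts` package:

```python
from collections import Counter
from typing import List, Set, Tuple

Term = Tuple[str, str]  # (lemma, POS tag), e.g. ('hell', 'NOUN')


def doc_freq(docs: List[Set[Term]]) -> Counter:
    """Count in how many documents each (lemma, POS) term occurs."""
    df = Counter()
    for doc_terms in docs:
        df.update(set(doc_terms))  # each term counts at most once per document
    return df


def compare_doc_freq(docs1: List[Set[Term]], docs2: List[Set[Term]],
                     min_ratio: float = 1.5) -> List[dict]:
    """Hypothetical helper: terms whose document fraction in docs2 is at least
    min_ratio times their document fraction in docs1."""
    df1, df2 = doc_freq(docs1), doc_freq(docs2)
    n1, n2 = len(docs1), len(docs2)
    rows = []
    for (lemma, pos), freq2 in df2.items():
        freq1 = df1[(lemma, pos)]  # Counter returns 0 for unseen terms
        if freq1 == 0:
            continue  # absent from group 1: the ratio is undefined, so skip it
        frac1, frac2 = freq1 / n1, freq2 / n2
        ratio = frac2 / frac1
        if ratio >= min_ratio:
            rows.append({'lemma': lemma, 'pos': pos,
                         'freq1': freq1, 'freq2': freq2,
                         'frac1': round(frac1, 4), 'frac2': round(frac2, 4),
                         'ratio': round(ratio, 2)})
    # order as in the table: by POS tag, then by descending frequency in group 1
    rows.sort(key=lambda row: (row['pos'], -row['freq1']))
    return rows


# toy data, not taken from the Goodreads reviews
group1 = [{('book', 'NOUN'), ('love', 'VERB')}, {('book', 'NOUN')},
          {('plot', 'NOUN')}, {('love', 'VERB')}]
group2 = [{('book', 'NOUN'), ('hate', 'VERB')}, {('plot', 'NOUN'), ('hate', 'VERB')}]
rows = compare_doc_freq(group1, group2)
print(rows)
```

In this toy example 'book' has the same document fraction (0.5) in both groups and is dropped, while 'plot' doubles from 0.25 to 0.5 and is kept. Terms that never occur in the first group (here 'hate') are skipped rather than given an infinite ratio, which may differ from how the notebook itself handles them.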
In [459]:
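# collect the dependency relations (head/child) of terms in the group 2 reviews, with LIWC categories
tail_groupings = get_tail_groupings(docs_group2, docfreq_group2, token_pos_types, liwc, max_threshold=5000, min_threshold=10)

tail_df = pd.DataFrame(tail_groupings)

book_terms = ['book', 'novel', 'story', 'plot', 'character', 'twist', 'development']

# show all nouns that are linked to the adjective 'bad' through a dependency relation
tail_df[(tail_df.dependency_word == 'bad') & (tail_df.tail_pos == 'NOUN')]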
Out[459]:
dependency_type dependency_word dependency_pos dependency_freq tail_word tail_pos tail_freq dep_tail_freq liwc_category
19237 head bad ADJ 324 year NOUN 102 1 relativ|time
19239 head bad ADJ 324 character NOUN 1071 1 None
43829 child bad ADJ 324 book NOUN 3681 20 None
43830 child bad ADJ 324 term NOUN 24 1 quant|relativ|time
43832 child bad ADJ 324 thing NOUN 519 13 None
43833 child bad ADJ 324 part NOUN 71 8 funct|quant
43834 child bad ADJ 324 mood NOUN 10 2 affect
43835 child bad ADJ 324 outcome NOUN 15 1 None
43836 child bad ADJ 324 ending NOUN 565 10 relativ|time
43838 child bad ADJ 324 writing NOUN 133 4 social|cogmech
43843 child bad ADJ 324 guy NOUN 75 10 None
43845 child bad ADJ 324 choice NOUN 82 2 None
43846 child bad ADJ 324 memory NOUN 29 2 None
43848 child bad ADJ 324 person NOUN 134 3 social|humans
43850 child bad ADJ 324 read NOUN 65 1 work|leisure
43851 child bad ADJ 324 case NOUN 35 1 None
43852 child bad ADJ 324 triangle NOUN 142 1 None
43853 child bad ADJ 324 interest NOUN 47 1 None
43855 child bad ADJ 324 situation NOUN 43 1 None
43856 child bad ADJ 324 people NOUN 385 1 None
43860 child bad ADJ 324 sequel NOUN 18 1 None
43862 child bad ADJ 324 conclusion NOUN 95 3 None
43864 child bad ADJ 324 reason NOUN 161 1 None
43865 child bad ADJ 324 taste NOUN 19 4 None
43866 child bad ADJ 324 aspect NOUN 27 1 None
43867 child bad ADJ 324 loss NOUN 24 1 None
43869 child bad ADJ 324 ass NOUN 22 4 swear|bio|body|sexual
43870 child bad ADJ 324 stuff NOUN 32 1 funct|pronoun|ipron
43873 child bad ADJ 324 model NOUN 12 1 None
43874 child bad ADJ 324 good NOUN 13 1 affect|posemo
43875 child bad ADJ 324 idea NOUN 119 1 cogmech|insight
43876 child bad ADJ 324 decision NOUN 95 2 None
43877 child bad ADJ 324 death NOUN 371 1 None
43881 child bad ADJ 324 dialogue NOUN 14 1 None
43882 child bad ADJ 324 game NOUN 187 1 None
43883 child bad ADJ 324 series NOUN 925 1 quant
43884 child bad ADJ 324 way NOUN 516 1 relativ
43885 child bad ADJ 324 movie NOUN 149 1 None
43887 child bad ADJ 324 sentence NOUN 44 1 None
43888 child bad ADJ 324 boy NOUN 54 1 social|humans
43889 child bad ADJ 324 one NOUN 45 1 funct|number
43892 child bad ADJ 324 trilogy NOUN 304 1 None
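The cell above defines `book_terms` but does not use it. One way it could be used, sketched here on a handful of rows copied from the output above (only the columns needed), is to restrict the relations with 'bad' to book-related nouns and rank them by how often the pair co-occurs (`dep_tail_freq`):

```python
import pandas as pd

# a few rows copied from the tail_df output above (subset of columns)
tail_df = pd.DataFrame({
    'dependency_type': ['head', 'child', 'child', 'child', 'child'],
    'dependency_word': ['bad'] * 5,
    'tail_word': ['character', 'book', 'thing', 'ending', 'writing'],
    'tail_pos': ['NOUN'] * 5,
    'dep_tail_freq': [1, 20, 13, 10, 4],
})

book_terms = ['book', 'novel', 'story', 'plot', 'character', 'twist', 'development']

# keep nouns linked to 'bad' that are in book_terms, most frequent pairs first
bad_book_terms = (
    tail_df[(tail_df.dependency_word == 'bad')
            & (tail_df.tail_pos == 'NOUN')
            & (tail_df.tail_word.isin(book_terms))]
    .sort_values('dep_tail_freq', ascending=False)
)
print(bad_book_terms[['dependency_type', 'tail_word', 'dep_tail_freq']])
```

Because 'ending' and 'writing' co-occur with 'bad' more often than 'character' does in the output above, the term list might be worth extending before drawing conclusions from it.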
In [453]:
from scripts.text_tail_analysis import has_lemma_pos, sentence_iter

# print every sentence in group 2 that contains the lemma 'writing' tagged as a noun
lemma = 'writing'
pos = 'NOUN'

for sent in sentence_iter(docs_group2):
    if has_lemma_pos(sent, lemma, pos):
        print(sent)
I question the plot, the writing, and the capricious actions of characters.
Ofcourse, I was not expecting any Shakespearean writing in it but the whole book literally felt like that it was written by a 10-year-old; nothing really deep, not as exciting as the previous ones and a rush, out of nowhere ending!   
The writing style was still good, but I couldn't get into the storyline as deeply.
The quality of the writing has seriously deteriorated, the main character has become a hysterical, trauma-ridden, selfish, self-pitying whiner, and the plot has taken a suicide leap.   
It seems that without the Arena to set the stage Collin's writing has stuttered out and Katniss has lost her fire.   
With the monotone writing, the monotone character of Katniss, and the monotone plot, the monotone and banal setting of District 13 doesn't help much.
I also felt cheated by the writing in this installation.
I obviously expect a war to be violent but the tone of the writing in this book (and the previous two) did not prepare me for the level of violence
The world is still very interesting, and Collins' writing style is enjoyable.
I was very disappointed with the quality of the writing and dwindling character development & storylines.
I figured out how these books don't apply the regular paths of writing: three disasters, character's inner goal, external goal; having to decide between making the right decision or achieving the initial goal.
As I was reading this, I slowly realized how bad the writing was.
I know that I didn't like Hunger Games because the writing focused on so much details that really made no sense.
The writing for the action/fighting scenes was awkward and unclear.
Not because that's what the book's bad writing imposed.
The book's writing was marvelous.
Collins' writing style, which was so vivid and present in the first two books, is flattened to a dull, passive observation.
What lazy writing.
As for the writing - at times it rushed forward at the expense of clarity and at other times languished in the inconsequential, repetitive doldrums.
Had the writing been stronger, less roughshod, then it would have sat better with me.
I also didn't love the writing style.
Writing was incredible.
I would rate this a 3.5, but since that's not an option I am going with a 4 because I still find the writing quality and the overall message and themes to be very gripping and kept me glued.
The writing is still brillant.
The language and writing style didn't correspond to what actually happened.   
These irregularities in Collins writing stopped me from giving this a much higher rating.
Suzanne Collins has a fantastic writing style and imagination.
, I think Suzanne Collins just got tired of writing and everything went into super-hyper-wtf most-undeveloped-plot-ever speed.
It poses as a Y.A., with its simple writing and juvenile style...
I found the writing weak and needlessly long.
**   The quality of writing was piss-poor and long stretches of this were utterly dull.
Can you take some initiative in your writing and explain this for us?)
I mean, she didn't kill Peeta, Beetee, or Annie, which she could easily have done since none of them were particularly stable), and perhaps that's even what is supposed to have happened, but the writing just doesn't support it.
Its like the author got tired of writing and just suddenly ended the book with very little resolution.
All that great writing went out with a whimper, and I am mad about it!
What a disappointment... I loved the first two, loved the author's style of writing and ability to keep you on the edge of your seat throughout the book -
I was satisfied with the ending, but the writing was much sloppier, probably because Collins went from telling a story to telling an ideology.
It seemed like Collins was just tired of writing, and so started throwing at a dartboard of characters to see who she would kill off next.
There are several problems with the writing:   
The writing is strong but by Mockingjay I just did not care if the characters lived or died.
I also was very frustrated with how crucial points were handled in the writing.
Damn Suzanne Collins has a brilliant imagination and amazing writing skill.
Hunger Games was great, Catching Fire was defendable, Mockingjay was just sloppy writing.
I found the writing simplistic and could not get attached to characters in a meaningful way.   
I thought the writing was as strong in this book as the others and the imagery was outstanding.
But I still like Collins' style of writing.
The writing.
THE BIGGEST WRITING FLAW
I sure would like to see Ms Collins write a book with the scales balanced in the other direction because she IS a 5 star writing talent!
This was the weakest of the trilogy, although the plot was more interesting than the second, the writing had gotten weaker.
The writing was just TERRIBLE.
but I felt the overall simplicity and poor structure of the writing was an insult to young readers.
I'll still see the movie because the story intrigues me, but the writing has too many holes for me.
And Collins employed my most hated writing device- knocking the main character unconscious during the climactic battle and wrapping up the story in a random rehashing of crucial events.
yes the story overall was good but the writing was the worst I've read in a long time.
Again, I think Collins ended with some lazy writing, excusable possibly by the fact that she was trying to get us to read into the character's head, but it was boring and dry.   
I've read all three books and although the writing was ok the stories and characters
Writing.
The writing, the intensity, the characters, the creativity....
Seemed like the author got bored and wanted to hurry up and finish writing.
The writing was good, but the plot of this book and the overall story bored me so much
The end was even lazier than Collins' usual writing.
Hell, there were moments of powerful writing in the final chapters.
It really is a solid piece of writing on its own.
The writing in the first book of the trilogy wasn't anything impressive, but I think it deteriorated greatly by the time the third book came around.
Her writing is a copy and paste of one minute self loathing to "
and I sincerely apologize for my sloppy writing and overindulgence in run-on sentences!
The bad writing, forgivable in the first 2 for good plot, is constantly in your face in this book.
As for the writing?
In the end my rating is more to do with my enjoyment of the book than the writing or story.
the level writing and poor editing shown in Collins' last installment of the Hunger Game novels was unacceptable.
Seriously, it is like when I am having my elementary students write a story that they get tired of writing, so they just say, "
A new triangle between leaders is squandered in the writing.
They teach it in many writing workshops and seminars: "Show, don't tell."   
, some heart-pumping, soul-inspiring lines ("If we burn, you burn with us!"), jaw-dropping surprises (I won't say more than The Return of Peeta), Collins' generally high-quality writing AND best of all a happy ending.   
It became a non-priority finish around page 100 when I didn't see where the action was going, and even the writing seemed to have changed a little.
Both the writing and the content were not up to my expectations, and I felt very dissatisfied when I finished.
The writing is second rate and so dumbed down a 4 year old has the IQ to follow and where the characters could have been built up and mad more complex
While Collins has shown periods of brilliant writing, other passages leads one to wonder if she was under the influence of morphling herself.
In evaluating this whole series, its biggest limitation is its poor writing.
It was nice to see that in writing.
I just don't like her writing and the character development.
her writing is beautiful.
I still love Suzanne Collins writing, she is an amazing writer but sometimes the plots' just fail.
I was utterly taken aback with the many flaws in the writing and the utter lack of growth by the main character.
His death was marred by horrible writing and I ended up feeling swindled and angry rather than remorse and sorrow.   
The writing lacked any emotion whatsoever.
Everything feels authentically numb and unfeeling, a credit to Collins' writing style.
Also, in my opinion, the quality of writing decreased in this novel.
A friend of mine who originally suggested this series to me said something to the effect of "You can tell by the writing that she loved writing the first book, endured writing the second book, and hated writing the third book.
Without battle scenes to sustain the writing this novel shrivels and wilts.
It was a little anti-climatic, I skipped a lot of unneeded writing.
Bad writing.
I think Collins got bored writing the story and felt like she had to conclude it as quickly - regardless of how sloppy and erratic the writing became - as possible.
Very lazy, apathetic writing towards the end.
It felt to me like Collins just became tired of writing, killed everyone off except for the final two characters to hopefully start their very own Garden of Eden.
I found the book to be boring, unimaginative, and a significant lack of creative writing.
But there were alot of problems with the writing, all of which, I hate to say, seems to go back to bad revising and editing.   
Because of this, the quality of writing suffered and then the book was choppy and difficult to follow.   
From a writing standpoint, it was seriously lacking in dialogue.
It was like Collins was taking her merry time writing and then realized abruptly that she had a deadline.
That said, the writing was pretty good, I guess.   
My problem with the book was the terrible writing and plot.
{sigh again} Yes, it is great writing and full vivid images (
Basically, I didn't feel that the writing was anything spectacular and I preferred the action of the arena as opposed to the bizarre young adult political tension that Mockingjay was driven by.   
To me, there was no growth for the author in her writing for this novel.   
The first of the Hunger Games trilogy was absolutely amazing, I was astounded by the writing and the whole story in general. '
To be fare, some nice writing in this book, but a major let down regarding expectations.
When an author's writing is two for two, you start getting really excited about the third one.
I still think it's worth a 3 because if nothing else, her writing is very captivating
that would be effin' brilliant writing!   
, IT'S JUST BAD WRITING.
it had its good parts, but it felt like suzanne collins just got bored with writing and sped right through the story.
I could hardly follow up the plot, I was not sure if it was because the author's fluid writing skill had deteriorated or because I was reading it in Katniss' voice.
I apologize to Suzanne Collins--but I feel that perhaps her book schedule of editing and writing squeezed her too hard and left her unable to write the way she did for book one.
She should have been given more time to finish writing and editing this book properly.
It never promised to be more than young adult fiction, so I can't fault the writing so much.
I finished the whole trilogy just because I really liked the idea of Panem and yearly Hunger games, but it was Twilight in terms od quality and writing skills.
The themes and ideas were interesting, but the unbearable central character and the naive writing brought the book down.
Seriously, I can't help but wonder if Ms. Collins suffered a major crisis during the writing of this book, had a horrific event take place in her childhood or if she is off her meds.
Both the writing and the plot are rushed.
To understand my rating, you have to understand that I love the writing.
However, her writing is so strong that I had a strong visceral reaction to the story.
This dulls the effectiveness of the great characters and writing.
so you see, Katniss, after you passed out the others..." The writing in this book is by far the least polished, and I had a lot of trouble following the chaotic street battle in the Capitol.
Writing: 4 stars
The writing is atrocious.   
but the lack of development, like Collins just got tired of writing.
In my experience, the final book of a series is generally the best writing-wise.
However, as any writer knows, long words do not equal good writing.
To be able to experience Panem, its corrupted core, and everything else the book contains without being weighed down with the chintzy, high school-level writing will be very exciting.   
It is nothing personal, but I just can't stand the terrible writing, cliche-plot, irrational characters, and the whining teen-girl.
In my view, that's just not good writing -- it's like seeing the stage hand during a play.   
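The lemma search above returns every sentence that mentions writing, whether the reviewer praises or criticises it. Assuming the parsed sentences are spaCy-style objects (iterable over tokens with `lemma_`, `pos_` and `head` attributes, which the POS tags and head/child relations above suggest but this section does not confirm), a sketch that keeps only sentences in which an adjective is directly attached to a noun could look like this. `has_modifier` is a hypothetical helper, not part of `scripts.text_tail_analysis`, and the small `Token`/`Sentence` classes only exist to make the example runnable without a parser:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Token:
    """Mock token with the spaCy-like attributes used below."""
    text: str
    lemma_: str
    pos_: str
    head: Optional['Token'] = None  # syntactic head; the root points to itself


@dataclass
class Sentence:
    tokens: List[Token] = field(default_factory=list)

    def __iter__(self):
        return iter(self.tokens)


def has_modifier(sent, noun_lemma: str, adj_lemma: str) -> bool:
    """True if a NOUN with noun_lemma and an ADJ with adj_lemma are directly
    linked by a dependency arc, in either direction."""
    target = {(noun_lemma, 'NOUN'), (adj_lemma, 'ADJ')}
    for token in sent:
        head = token.head
        if head is None or head is token:
            continue  # skip the root, which has no real head
        if {(token.lemma_, token.pos_), (head.lemma_, head.pos_)} == target:
            return True
    return False


# "Bad writing." : 'Bad' modifies 'writing' directly
writing = Token('writing', 'writing', 'NOUN')
writing.head = writing
s1 = Sentence([Token('Bad', 'bad', 'ADJ', head=writing), writing])

# "The writing was bad" : 'bad' hangs off the copula, not off 'writing'
was = Token('was', 'be', 'AUX')
was.head = was
w2 = Token('writing', 'writing', 'NOUN', head=was)
s2 = Sentence([Token('The', 'the', 'DET', head=w2), w2, was,
               Token('bad', 'bad', 'ADJ', head=was)])

print(has_modifier(s1, 'writing', 'bad'), has_modifier(s2, 'writing', 'bad'))
```

In spaCy's English dependency scheme a predicative adjective ("the writing was bad") typically attaches to the verb rather than to the noun, so a direct-link check like this mainly catches attributive uses such as "bad writing"; catching the predicative case would need an extra step through the shared head. With real parsed data, the check could replace `has_lemma_pos` in the loop above, provided `sentence_iter` yields spaCy `Span` objects.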
In [454]:
from scripts.text_tail_analysis import has_lemma_pos, sentence_iter

# print every sentence in group 2 that contains the lemma 'bad' tagged as an adjective
lemma = 'bad'
pos = 'ADJ'

for sent in sentence_iter(docs_group2):
    if has_lemma_pos(sent, lemma, pos):
        print(sent)
At times it's like reading a bad diary.
The worst, though, was that I could see it all coming
This is not my favorite book ever, but it's also not the worst book I've ever read.
First of all, Katniss and Gale stop speaking on bad terms.
This was bad....
worst book in the series.
And good and bad are not clearly defined black and white.
Even in real life, no matter how bad things may be, there is always hope.
Katniss is likely suffering from some form of PTSD, but rather than use the horrific experiences that keep piling up on Katniss as a jumping off point to explore what war-like events do to younger people (such as child soldiers), Collins shuffles around the whole issue by constantly throwing more bad things at Katniss so any recovery or clarity she gains is immediately destroyed by what happens next.   
But worse than this is the final fifty pages of the book when the twist is revealed, and I was left completely not caring because all the build up to make Katniss strong, to have her work through her pain, to have her continue forward because what she's doing it right, all of it gets cancelled out for a shock value plot twist that reeks of trying to make a point and failing.
And the worst part is that this all happens over the course of a couple of pages.
Maybe I'm tired and in a bad mood, didn't have a particularly good week, maybe that's making me go more annoyed at the book.
And yes - that made me very emotional with Katniss and Buttercup on the same bad, both crying over Prim.
so I steeled myself for the worst possible outcome because I obsessively fell in love with this series.
We stay right there and see all Katniss sees, good or bad the whole time.
too bad.
Worst ending for a book series.
She would be incredibly mean to people in the capital, and then the next she would suddenly feel bad.   
As I was reading this, I slowly realized how bad the writing was.
I guess it was a good story, but it's always a bad sign when you really couldn't care less if the main character dies (and at some points, really hope that it happens).
Not because that's what the book's bad writing imposed.
Also, although the first two books are a bit gruesome, I didn't think they would be too bad for kids to read because Collins doesn't go into too much graphic detail, or dwell too long on the deaths and make them unbearable.
I didn't mind it in the beginning, and even expected it; after all Peeta has been captured, her home has been destroyed, and she's been through hell and back, only to find out the worst is yet to come.
Then there's what happened to Peeta - that may have been the worst part for me.
The story wasn't bad
Perhaps most impressively the books lack a clear good and bad narrative - Katniss herself is seen by others (and often by herself) as unemotional and manipulative in her triangulation between Peeta and Gale, while in turn being manipulated by others particularly in the second book and through the third.
This book had the worst ending known to man!
This book was one of the worst books I've ever read, near the end
the writing[in my opinion] was just bad
You've got to take the good with the bad
the worst of the trilogy, and honestly the love story was never my favorite part of the books.
It was bad enough that her beloved Cinna was killed off, Finnicke killed just after his wedding to the love of his life, Peeta's memories Hijacked, and Gale badly wounded over and over again, Ms Collins then decides to land the biggest blow of all by having Prim blown to bits.
The first person narrative made it all the worse.
War is bad?
Even her relationship with her mother was the same, if not worse, than it was at the start of the series....
The story is full of intrigue, excitement, action and suspense, but there's that lingering "which guy will she choose" trope that gets worse and worse as it goes along.   
Those are bad stats, Suzanne Collins.   
I hated Peeta throughout all three books and it got even worse when he announced that he loved her in the interview.
I almost felt like the bad guys won on this one, and I didn't like it.
And as if that isn't bad enough, there was still so much promise throughout the whole story that never comes to fruition!   
I mean, I expect bad endings with some authors like Nicholas Sparks, but I was completely thrown on this one.
Even while expecting the worst, I didn't mind the first half of the book.
There are probably going to be people throwing shoes at me now, but do your worst.
It was worse than the first book, but better than the second.
The novel reads like a bad space opera complete with shallow characters and a functional single level plot.
To compound this bad structural choice, what action that does occurs is short-circuited.
Bad, bad, so bad.
It's too bad :(.
Like a bad memory, I shall quietly sweep the memory of this book under a rug and then roll the rug up and throw it in the garbage, vacuum the leftover dirt (memories) and never think of the rug dirt again but if it happens to seep into my memories I will hastily override that memory with anything else that comes to mind.
Finnick, the victor; Finnick, who had just been married after what is possibly the worst imaginable existence -
The worst book I've ever read.
Worst ending possible.
I have to assume nothing changed in the government and it's just as bad as it was.
Does that make them bad
It was rude the way he said she was ugly and a bad person and a liar
The worst part is, that it can't be undone, it can't be rewritten, the series is forever tainted.   
Nothing gets resolved, everything ends worse, and the whole thing is a giant unsatisfying downer.
is bad and humans are dumb.
It could be worse.
I first thought it was a good book with a depressing message; now I realize it is a bad book with a condescending, obvious (and poorly delivered) message.   
And it only makes it worse that it's geared to teens.   
The series got worse and worse.
At its worst, this series reminds me of the movie Idiocracy, which condemns scatalogical humor as a major theme, then proceeds to appeal to the audience with only scatological humor for 2 hours.
and although she does feel bad for some of the destruction that she has inadvertedly caused Katniss just does wat she wants
Honestly, finishing it just made me in a bad mood.
Honestly one of the worst books I've ever read.
But nothing was worse than the her fate for the Big Bad, Snow.
I feel really bad giving this a two-star rating
I should've stopped with the first as each in the series has gotten progressively worse for me.
A couple of times, Katniss gets injured, things look bad and then CUT!
The best thing I can say about it is that it wasn't as bad as the Harry Potter epilogue.
The whole thing was a bit of a mess, but it could've been worse.
but overall it wasn't a bad read.
This book was the worst of the 3 in the series.
That may make me a bad person due to the horrors she has seen
And what's worse, President Snow has made it clear that no one else is safe either.
Not bad.
Was it better, worse....
Of course, there's nothing bad in a little "razzle dazzle" (as Plutarch would say) when it comes to slaughtering children (Or, wait...
But not even the worst case of PTSD could justify the level of age regression and brokeness we are shown in this story!   
The HG must be one of THE worst love triangle in history!!!!!!
And while we're on that subject, might I point out how I believe that Katniss could very well be one of the worst love interest in any love triangle in history????!!!!!
to their guy, or anyone for that matter, because someone else wants it so bad.
definetly the worst book in the series.
Now the third book will just be some crazy gaga, some impossible rebels that turn out to be worse than the people we thought were bad guys.
There was just so much focus on the bad and nothing good.
Ok, I completely understand everyone saying that this was a very bad situation, and that war is hard, and that Katniss, of course, is suffering from PTSD.
Really bad ending to the series.   
, I still feel there is something left to say, I want to know more, which to me, is the worst thing that can happen when I finish a book.   
"War is BAD!
If you fight back, you are JUST as bad!"
"You all don't fight - You'll be bad people!
I get the dark "war is bad" thing.
What was worse was that Katniss was stripped of any decency she had and
Worst of all, Gale is stripped of any closure.
This is worst of the three, and feels like meeting the deadline.
For me it was one of those things where I felt the worst had already happened to her, so it was hard for me to think the conditions she was in were really that terrible.   
I was tempted to give this a 1-star, but okay this book was bad but not to that extent.
Worst of the series - extremely violent, not very well written, and pretty confusing.
Definitely the worst of the series.
I understand a lot is going on, but her reactions are not consistent, this was a minor problem in the first book that was only made worse now.
More like a 1.5 than 2 stars, because this is worst that the previous
One of the worst sequels I've ever read, which is truly disappointing considering
The only reason she ended up with Peeta was because Gale had the bad luck of being involved with the bombs and was whisked out of the story.
Would it have been so bad for her to kill them?
Very slow and boring, the worst of the 3
yes the story overall was good but the writing was the worst I've read in a long time.
To live there is worse than to live in prison.
I could go on and on and on with things like this, but I think you get the point why this is a bad book.
Seriously, it's almost as if Collins had a brain storming session, "Let's see, what's the absolute worst possible thing that could happen to a person?
Bad Things   - Gale and Katniss shoot war planes down with bows and trick arrows.
The worst love triangle conclusion since Pretty in Pink.   
And trust me with all the wars, games and gore in these books, they did not seem as bad as that anti climatic ending.
And whats worst, Katniss was all for it.
they had her give in to someone else's pressure, which is one of the worst possible reasons to become a parent.
It teaches us that humanity is a nothing but bad, that the only way to resolve problems if fighting fire with fire, to forget about tolerance and forgiveness and to make people pay.
I have nothing bad to say about Collins writing and her books.
My rating on this book is very low, not because it was bad, but because by the end of the series I wished I have never began them.
The thoughts this story left me with gave a bad taste in my mouth and horrible images in my mind.
What's worse than living under a fascist regime?
I can't say enough bad things about this book or how this series was ended.
I read a lot of the negative reviews posted on Goodreads, which I still agree with them all, but after reading the book myself these negatives didn't seem as bad as when I read those reviews.
Now whether that's because I was already prepared for them or the book wasn't as bad as they said, hmmm, IDK.   
it is by far the worst of the series.
This is the worst of the series, it was choppy and sloppily written.
Probably the worst aspect of the series was the character of Katniss who is a lazy, self-centered, mean and immature junkie who for some reason is in the middle of a love triangle (the boys seem sensible apart from this choice) and also the leader of the revolution.
the promise that life can go on, no matter how bad our losses.
Not saying the book was bad
I almost feel bad for Gale, because he never really had a chance with her.
But this one, should have been as The Hunger Games made me believe: a story about a girl that had the qualities to be admired, the strength to overcome the worst and the capacity to become a hero.
The story wasn't as captivating, the names got worse (Leeg 1 and Leeg 2?
the worst book of the series.
To many reviews already on this book both good and bad.
As if to confirm by suspicions we are renewed with the flip at the end of the book:   I'll tell them that on bad mornings, it feels impossible to take pleasure in anything because I'm afraid it could be taken away.   
Bad ending hard to even finish was very let down after the first two books
But worst than all of that is the pacing!
The ending was the worst.
So is that good or bad?
I know war is bad and effects everyone in tragic ways, but still Suzanne!!
You made Katniss seem like such a bad-ass throughout the entire series and in the end you make her pick THE F**KING DANDELIONS over the FIRE?!
I usually finish books in a day or a week depending on how good or bad it is.
I understand that it turned all of his good memories to bad, but it shouldn't really affect his personality in any way...
Hey, this one wasn't actually as bad as the second!
In Mockingjay, all these traits are scrapped and we get a Katniss-clone who is angsty and bitchy and whiny (wasn't Bella in Twilight bad enough?).
Not only did she not improve herself from the first book (she was kickass in the first book btw), she got WORSE, an empty shadow of her former self.
And good and bad are not clearly defined black and white.
Even in real life, no matter how bad things may be, there is always hope.
One of the worst books I have ever read.
The bad writing, forgivable in the first 2 for good plot, is constantly in your face in this book.
Bad stuff happens.
Another book I feel bad that I am going to piss readers off with my review.   
Too bad.
That was the worst part of the whole book.
It felt like Breaking Bad for YA.
But as before, this is a bad choice.
In fact, it's even worse in this book because Katniss spends most of her time as everyone else's sock puppet.
War is bad.
Catching Fire wasn't too bad.
I can stomach stories where bad things happen
This is not to say it is a bad book.
Worst of all, she never stepped up.
Character deaths went from compelling and tragic in the first book to cheap attempts to show that War Is Bad in this book.   
It's bad when it's 140 pages in and
It's too bad, there are so many things in here that could have been amazing and truly emotional to explore.
As I said I dont read these kind of books and the war issue in the Mockingjay brought too much bad memories on the surface.   
That is not to say that this one is very much worse than the other two, but just that I am having second thoughts about the whole trilogy.
she's kissing every on that just a bad rule model for a teenage   
When I return: the good, the bad, and what the page flip between Chapters 24 and 25 mean for this book and the series.   
Here are the reasons why this book is bad:   1.
Book was worst of the series, became very predictable, and ended just like I thought it would.
The worst book of the series.
All I can say is that for me this is the worst from the three books even if the ending is quite satisfying.   
You want an example of bad conclusion of a trilogy/series that started off on the highest tops?   
All in all, worst ending ever!
It was worse than a B Western in the movies.
The bad guys were so obvious and yet Katniss didn't see them.
Is this the worst and least coherent part of the series, or a point made, stating that there is no right or logical side of war?   
It was unnecessary, and it astounds me that neither Collins or her editor thought it would be a bad idea to include such an incessantly long explanation.
Katniss turned out to have way too many issues, got injuried every other chapter and after a while I was just hoping for it to end without being worse and worse as a I read.
After waiting in anticipation for both Mockingjay and Catching Fire, the series has gotten progressively worse.
This is the only series I've ever read that gets progressively worse.
that was the worst ending to such a great series!
By this third book, I knew that certain main characters would live (they do) that Katniss would make one bad decision after another in complete defiance of all good sense or advice (she does), that every other important character in the book would stand up in defense of her bad judgment, make excuses for her behavior and protect her (they do), that it would all get passed off as stress and mental damage from her participation in the Games (it is) and that there would be no ramifications that really mattered (there aren't).   
she drifts along only making everything worse and 25 years of her life are wrapped up in the last 5 pages of the book without much explanation and no psychological investment.   
It's really too bad.
This was an ok book with a good moral, "War is bad".
And what's worse, President Snow has made it clear that no one else is safe either.
worst of the 3 books
The books have gotten progressively worse, almost as though they were destined to be made into movies.   
I mean I think Snow should have had a worse death
Wow it sucked so bad
Say to the world that Coin was as bad--or would be as bad--as Snow and how they needed to be careful.
Seeing my favorite characters mangled and insane at the end was maybe the worst scenario.
This can be a good or bad thing depending on the tastes of the reader.   
It was repetitive but still not bad.
Too bad---loved The Hunger Games, but am now disillusioned.....
**   Bad.
It's hard to begrudge Collins for concentrating on the complexities of war, where there are no good guys and bad guys, only uneasy alliances, or giving Katniss such a lot to think about.
, so it wasn't the worst thing I've ever read.
After the death of a beloved character, Collins composed a passage that I think aimed for high-brow narrative of loss but came across as an LSD-fueled bungle of bad descriptive poetry;   (5)
Things get worse from there.   
The worst ending I could have imagined.
Too bad.   
Out of the three books, this one I thought was the worst.
Can't really say I was expecting much, I find that the third books in trilogies to often be worse than the first two.
"it's all my fault, I'm a bad person" business.
Yes, war is bad.
Everything gets even worse in this book.
Bad dialogue.
Bad writing.
When I reached to the ending, it got even worse.
Not Mockingjay, what a bad ending to a decent series.
One of the WORST endings to a trilogy I have ever seen.
I was hoping for some kind of tech warfare or economic warfare or some good ol' Ludlum-like lie/spy/get the bad guy.
I can't believe how bad it was.
Seriously, to think that you single-handedly have the best reason to bullet the target bad guy
One of the worst books that I have ever read!
Having been warned, I was expecting the worst and hoping for the best from this book, and I landed somewhere in the middle.
But there were alot of problems with the writing, all of which, I hate to say, seems to go back to bad revising and editing.   
I loved how the last line was,'But there are much worse games to play'.
A lot of the story can be attributed to her actions, for better or for worse.
Is I bad that my favorite part of this book was reading the reviews of broken-heated readers?
She was tough, bad ass, smart, etc.
In short, I think the author covered a weak plot with action and violence and trying to make Katniss into a bad-ass rebel that didn't fit with the set up from the previous books.
***   I officially take back every bad thing I wrote about Catching Fire.
How bad is Mockingjay?
In fact she made all the other characters worse because they loved her...
And the end was really bad.   
The trilogy gets worse as it progresses.
This final chapter is the worst of the three.   
However I wouldn't call the series bad.
Why does this President Snow (bad guy) or anybody else in this universe give a rip about what Katniss says or does as she mopes around underground?
You would think that now both our protagonist have survived the games they would have a happy life, well not exactly.. they run into a bit of bad luck
I couldn't feel bad for her
A bad ending
And the ending is the worst half-ass ending ever for a book.
This was literally one of the worst book I have ever read.   
**   i'm sorry but this book was really bad   
The first book sets our hero up as a self-sufficient bad-ass survivor.
I feel bad for Peeta that he's stuck with Katniss as the only emotional link to his forgotten past.
The secound book was even worse than this one so don't bother with that.
There are so many reasons why I thought this was a bad book.
Worst of the 3.
While that results in lots of angst and inevitable reader frustration, it doesn't mean it's a bad route to take - lots of people who had been put into Katniss's position
His parting script should read, "Thank you, Suzanne, for making it virtually indistinguishable who is the bad guy in this book.
This only horrifies the reader and turns them off in the worst way possible.
This series of books just got worse and worse....
I was beginning to wonder who the good guy was, and who the bad guy was.
Anyways, this review is really bad, but I really need to get started on all my homework which I chose to do at the end of the book.   
We knew Coin was bad from the beginning.
What make it worst is, katniss's too often mental breakdown, that triggered with everything.
Finally, the ending is the worst and make me hope suzanne Collins will announce that she is disappointed with her book and will proceed to write the real ending.
Mockingjay has a bad pass at the latter.
Worse, Katniss lapses into a passive heroine.
I thought the movies are bad but
but god honestly this book is so bad...
The sum of the experience , mirrors my sentiment towards each of the books: it is not bad, but it is not good.
I can't believe how bad this one was.
It was so cheesy and something out of a bad 80s Sci-fi movie.   
The ending was bad.
Really bad.
I can't even say it was bad but good, just terrible.
Bad
I think "Bright and early the next morning, the brains assemble to take on the problem of the Nut" is possibly the worst sentence ever written in a YA book.... EVER!
, IT'S JUST BAD WRITING.
BAD DECISION.
It left such a bad taste in my mouth so that whenever anyone mentions the series, all I can think about is how they end up.
I know it wasn't THAT bad, but this was probably the closest I've ever come to writing something with what could be described best as "burning, flesh-rotting rage".
"And I was thinking," Collins continued, "That we could make Gale, our 'bad boy,' into a huge terrorist.
But worst of all was the way Katniss ends up with Peeta.
The worst was all the characters that we have become attached to that have died (spoiler Alert) especially Prim.
In the end, Katniss is extremely depressed and the example she presents is about the worst one I think of.
This is one of the worse books I have ever read.
Too bad.
The ending of the Hunger Games left a bad taste in my mouth.
It just went from bad to worse, to will someone please just slit my wrists and put me out of my misery?!   
This is definitely the worst book of the series.
The worst part of it is that the focus leaves the world and the war, and centers more on Katniss's boy troubles.
but I feel this ending was the worst.
WAR IS BAD.
We understand that it's especially bad for children, too
This book is disturbing, and left me with a bad taste in my mouth.
Bad conclusion to a series.
I did however love the ending where the leader of the rebellion was just as bad as President Snow
This was a huge disappointment - it was a huge surprise to me how bad this third book was
The ending wasn't too bad though, could've been worse.
That whole last third of the book, Finnick's death was THE WORST.
THAT IS BAD,
EQUALS= BAD BOOK.
I FEEL BAD
Worst of the series she made me not care about people who later die , and or characters who survive and I hate the ending where the hunger games continue what did they fight a war for ?
The last mission is the worst in terms of pointlessness.
The worse thing is that you know that this could have been epic (like all Spartacus and 300 like).
Would it really have been a bad thing to give one character an actual happy ending?
And when Peeta came back brainwashed thinking she was the bad guy after he gets rescued from his captors and tries repeatedly to kill her, I was convinced she would end up with Gail.
Here she's confused and torn, in a love triangle that has been building up but suddenly feels underdeveloped and is frankly dull, and fighting to keep her identity in a rebellion that doesn't have any redeeming traits and seems just as bad - if not worse - than the evil Capitol domination of them.
Other than feeling bad for herself for loving two people, she never really considered it at all, or truly acted upon it.
Too bad.
This was worse than the second book.
Worst part - everything else.
Progressively worse
I waited so much for this book that at the end it just let me down so bad.
It's unfortunate that this seems to be common practice for the last book of a series (Harry Potter, Twilight), but for this book it seems particularly misguided, because Mockingjay is (a) no longer than the first two books, and (b) without a doubt the worst book in the trilogy.
The worst part for me was the lack of character development, I was expecting anything tragic to happen to any character including Katniss
she was also still haunted with her bad dream, not really a good ending to me....
No, Katniss has flaws, she has depth, and we watch her struggle and punish herself and we see both her good and bad sides.   
And the head of the Capitol, President Snow, felt like a cardboard cutout of a typical bad guy/tyrannical ruler.
All in all, not a bad trilogy.
I found her at best whiny and at worse annoyingly weak willed.
What makes the worst even worse is that it seems Suzanne Collins is attempting to justify all of her actions by making them not her fault and thus make Katniss unnameable for the books plot.
I find this to be the worse book in the series.
In [439]:
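# Terms relatively more frequent in group 1: keep only ratios of at most 0.66
for pos_type in token_pos_types:
    for term, freq in docfreq_group1.most_common(1000):
        lemma, pos = term
        # only print terms of the current part-of-speech tag
        if pos != pos_type:
            continue
        # relative frequency of the term in each group
        prop_group1 = freq / total_group1
        prop_group2 = docfreq_group2[term] / total_group2
        # ratio of group 2 to group 1 (below 1 means more typical of group 1)
        prop = prop_group2 / prop_group1
        if prop > 0.66:
            continue
        print(f'{lemma: <20}{pos: <6}{freq: >6}{docfreq_group2[term]: >6}{prop_group1: >8.4f}{prop_group2: >8.4f}{prop: >6.2f}')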
great               ADJ     1713   205  0.2072  0.1227  0.59
happy               ADJ     1209   127  0.1462  0.0760  0.52
sad                 ADJ      999   128  0.1208  0.0766  0.63
amazing             ADJ      929    86  0.1123  0.0515  0.46
perfect             ADJ      556    23  0.0672  0.0138  0.20
dark                ADJ      436    58  0.0527  0.0347  0.66
realistic           ADJ      367    44  0.0444  0.0263  0.59
wonderful           ADJ      310    23  0.0375  0.0138  0.37
excellent           ADJ      258    15  0.0312  0.0090  0.29
fantastic           ADJ      257    29  0.0311  0.0174  0.56
powerful            ADJ      223    23  0.0270  0.0138  0.51
brilliant           ADJ      223    28  0.0270  0.0168  0.62
satisfied           ADJ      215    22  0.0260  0.0132  0.51
beautiful           ADJ      209    11  0.0253  0.0066  0.26
intense             ADJ      191    21  0.0231  0.0126  0.54
certain             ADJ      190    25  0.0230  0.0150  0.65
heartbreaking       ADJ      153     8  0.0185  0.0048  0.26
surprised           ADJ      153    17  0.0185  0.0102  0.55
unexpected          ADJ      147    18  0.0178  0.0108  0.61
bittersweet         ADJ      143     9  0.0173  0.0054  0.31
fitting             ADJ      136     6  0.0164  0.0036  0.22
brutal              ADJ      133    14  0.0161  0.0084  0.52
sweet               ADJ      129    15  0.0156  0.0090  0.58
favourite           ADJ      122    15  0.0148  0.0090  0.61
incredible          ADJ      119    12  0.0144  0.0072  0.50
shocking            ADJ      108    11  0.0131  0.0066  0.50
quick               ADJ       97    12  0.0117  0.0072  0.61
ready               ADJ       95    11  0.0115  0.0066  0.57
pleased             ADJ       92     3  0.0111  0.0018  0.16
impossible          ADJ       87    10  0.0105  0.0060  0.57
fast                ADJ       85     8  0.0103  0.0048  0.47
tough               ADJ       85    11  0.0103  0.0066  0.64
broken              ADJ       80     9  0.0097  0.0054  0.56
wrenching           ADJ       70     5  0.0085  0.0030  0.35
raw                 ADJ       67     5  0.0081  0.0030  0.37
pretty              ADJ       67     8  0.0081  0.0048  0.59
appropriate         ADJ       63     7  0.0076  0.0042  0.55
worried             ADJ       61     5  0.0074  0.0030  0.41
devastating         ADJ       61     1  0.0074  0.0006  0.08
bright              ADJ       58     7  0.0070  0.0042  0.60
riveting            ADJ       58     3  0.0070  0.0018  0.26
fabulous            ADJ       56     7  0.0068  0.0042  0.62
flawed              ADJ       55     6  0.0067  0.0036  0.54
mixed               ADJ       54     6  0.0065  0.0036  0.55
unique              ADJ       52     6  0.0063  0.0036  0.57
open                ADJ       49     4  0.0059  0.0024  0.40
current             ADJ       49     6  0.0059  0.0036  0.61
fictional           ADJ       48     2  0.0058  0.0012  0.21
definitely          ADV      624    68  0.0755  0.0407  0.54
highly              ADV      217    16  0.0262  0.0096  0.36
emotionally         ADV      199    25  0.0241  0.0150  0.62
matter              ADV      149    16  0.0180  0.0096  0.53
perfectly           ADV      143    15  0.0173  0.0090  0.52
slightly            ADV      131    16  0.0158  0.0096  0.60
nicely              ADV       91     5  0.0110  0.0030  0.27
beautifully         ADV       57     4  0.0069  0.0024  0.35
differently         ADV       56     5  0.0068  0.0030  0.44
necessarily         ADV       54     6  0.0065  0.0036  0.55
rarely              ADV       47     6  0.0057  0.0036  0.63
heart               NOUN     575    51  0.0695  0.0305  0.44
read                NOUN     573    65  0.0693  0.0389  0.56
question            NOUN     314    30  0.0380  0.0180  0.47
tear                NOUN     277    17  0.0335  0.0102  0.30
loss                NOUN     264    24  0.0319  0.0144  0.45
job                 NOUN     264    35  0.0319  0.0209  0.66
future              NOUN     252    22  0.0305  0.0132  0.43
turn                NOUN     199    26  0.0241  0.0156  0.65
horror              NOUN     189    16  0.0229  0.0096  0.42
night               NOUN     183    17  0.0221  0.0102  0.46
favorite            NOUN     172    14  0.0208  0.0084  0.40
reading             NOUN     154    19  0.0186  0.0114  0.61
change              NOUN     144    19  0.0174  0.0114  0.65
journey             NOUN     142    13  0.0172  0.0078  0.45
struggle            NOUN     136    15  0.0164  0.0090  0.55
ride                NOUN     133     8  0.0161  0.0048  0.30
effect              NOUN     131    16  0.0158  0.0096  0.60
nature              NOUN     119     9  0.0144  0.0054  0.37
peace               NOUN     118    15  0.0143  0.0090  0.63
tribute             NOUN     118    13  0.0143  0.0078  0.55
adventure           NOUN     114    13  0.0138  0.0078  0.56
politic             NOUN     109    12  0.0132  0.0072  0.54
truth               NOUN     107    13  0.0129  0.0078  0.60
tale                NOUN     105    13  0.0127  0.0078  0.61
today               NOUN      99     9  0.0120  0.0054  0.45
ability             NOUN      95     9  0.0115  0.0054  0.47
evil                NOUN      87    11  0.0105  0.0066  0.63
answer              NOUN      86    10  0.0104  0.0060  0.58
plenty              NOUN      85     4  0.0103  0.0024  0.23
seat                NOUN      85     3  0.0103  0.0018  0.17
justice             NOUN      84    11  0.0102  0.0066  0.65
consequence         NOUN      81     8  0.0098  0.0048  0.49
side                NOUN      81     7  0.0098  0.0042  0.43
cost                NOUN      79     6  0.0096  0.0036  0.38
entertainment       NOUN      75     5  0.0091  0.0030  0.33
turner              NOUN      74     8  0.0089  0.0048  0.53
anger               NOUN      71     8  0.0086  0.0048  0.56
roller              NOUN      67     8  0.0081  0.0048  0.59
coaster             NOUN      67     6  0.0081  0.0036  0.44
morning             NOUN      66     6  0.0080  0.0036  0.45
copy                NOUN      62     7  0.0075  0.0042  0.56
school              NOUN      58     6  0.0070  0.0036  0.51
discussion          NOUN      57     7  0.0069  0.0042  0.61
cruelty             NOUN      57     4  0.0069  0.0024  0.35
punch               NOUN      57     2  0.0069  0.0012  0.17
commentary          NOUN      53     7  0.0064  0.0042  0.65
cover               NOUN      52     4  0.0063  0.0024  0.38
gut                 NOUN      51     6  0.0062  0.0036  0.58
aftermath           NOUN      50     6  0.0060  0.0036  0.59
dandelion           NOUN      50     4  0.0060  0.0024  0.40
damage              NOUN      49     5  0.0059  0.0030  0.50
father              NOUN      48     4  0.0058  0.0024  0.41
genius              NOUN      47     3  0.0057  0.0018  0.32
Everdeen            PROPN    257    34  0.0311  0.0203  0.65
series              PROPN    124    13  0.0150  0.0078  0.52
Trilogy             PROPN    124     9  0.0150  0.0054  0.36
Quell               PROPN     71     9  0.0086  0.0054  0.63
Quarter             PROPN     69     9  0.0083  0.0054  0.65
MOCKINGJAY          PROPN     62     7  0.0075  0.0042  0.56
cry                 VERB     704    53  0.0851  0.0317  0.37
recommend           VERB     496    56  0.0600  0.0335  0.56
break               VERB     414    53  0.0501  0.0317  0.63
reread              VERB     209    16  0.0253  0.0096  0.38
face                VERB     161    19  0.0195  0.0114  0.58
pack                VERB     136    15  0.0164  0.0090  0.55
thank               VERB     124    15  0.0150  0.0090  0.60
surprise            VERB     104     8  0.0126  0.0048  0.38
provoke             VERB     100     3  0.0121  0.0018  0.15
laugh               VERB      88     7  0.0106  0.0042  0.39
haunt               VERB      83     6  0.0100  0.0036  0.36
review              VERB      83    11  0.0100  0.0066  0.66
grip                VERB      83     8  0.0100  0.0048  0.48
answer              VERB      73     1  0.0088  0.0006  0.07
devour              VERB      69     5  0.0083  0.0030  0.36
relate              VERB      69     9  0.0083  0.0054  0.65
escape              VERB      67     8  0.0081  0.0048  0.59
predict             VERB      66     3  0.0080  0.0018  0.22
be                  VERB      64     5  0.0077  0.0030  0.39
unfold              VERB      60     2  0.0073  0.0012  0.16
inspire             VERB      56     6  0.0068  0.0036  0.53
overthrow           VERB      54     7  0.0065  0.0042  0.64
admire              VERB      54     7  0.0065  0.0042  0.64
disagree            VERB      54     7  0.0065  0.0042  0.64
share               VERB      51     3  0.0062  0.0018  0.29
sob                 VERB      47     2  0.0057  0.0012  0.21
volunteer           VERB      46     5  0.0056  0.0030  0.54
reflect             VERB      46     3  0.0056  0.0018  0.32
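Terms whose relative frequency in group 2 is at most 0.66 times that in group 1, i.e. terms that are relatively more typical of group 1:
  • adjectives: 'great', 'happy', 'sad', 'amazing', 'perfect', 'dark', 'realistic'. The word 'dark' is probably intended as a positive aspect.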
  • adverbs: 'definitely', 'highly', 'emotionally', 'perfectly', 'beautifully'
  • nouns: 'heart', 'read', 'question', 'tear'
  • proper nouns: 'Everdeen', 'series', 'Trilogy', 'MOCKINGJAY'
  • verbs: 'cry', 'recommend', 'reread', 'thank', 'provoke', 'surprise', 'grip', 'devour', 'relate'
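Both comparison cells rely on the same statistic: a term's relative frequency in group 2 divided by its relative frequency in group 1, so values well below 1 mark terms typical of group 1 and values above 1 mark terms typical of group 2. The sketch below restates that computation in self-contained form on made-up toy data, reusing the notebook's cut-offs of 0.66 and 1.5. It assumes that each document is a list of (lemma, POS) pairs and that the normaliser is the number of documents per group; `doc_freq` and `compare_groups` are hypothetical helper names, and the notebook's own `total_group1`/`total_group2` may be counted differently (its proportion for 'book' exceeds 1).

```python
from collections import Counter


def doc_freq(docs):
    """Count in how many documents each (lemma, pos) term occurs."""
    df = Counter()
    for doc in docs:
        # set() so a term repeated within one document is counted only once
        df.update(set(doc))
    return df


def compare_groups(docs_group1, docs_group2, low=0.66, high=1.5, top_n=None):
    """Assign group-1 terms to a band based on their relative document frequencies.

    Assumption: normalise by the number of documents in each group.
    ratio = (df2 / n2) / (df1 / n1): at most `low` leans towards group 1,
    above `high` leans towards group 2, anything in between is treated as shared.
    Only terms occurring in group 1 are considered, as in the notebook cells.
    """
    df1, df2 = doc_freq(docs_group1), doc_freq(docs_group2)
    n1, n2 = len(docs_group1), len(docs_group2)
    bands = {'group1': {}, 'shared': {}, 'group2': {}}
    # most_common(None) returns every term; most_common(1000) only the top 1000
    for term, freq in df1.most_common(top_n):
        # a Counter returns 0 for terms it has never seen
        ratio = (df2[term] / n2) / (freq / n1)
        if ratio <= low:
            band = 'group1'
        elif ratio > high:
            band = 'group2'
        else:
            band = 'shared'
        bands[band][term] = round(ratio, 2)
    return bands


# Made-up toy documents (hypothetical data): each is a list of (lemma, POS) pairs
group1 = [
    [('great', 'ADJ'), ('book', 'NOUN')],
    [('great', 'ADJ'), ('cry', 'VERB')],
    [('book', 'NOUN'), ('great', 'ADJ')],
    [('book', 'NOUN'), ('war', 'NOUN')],
]
group2 = [
    [('bad', 'ADJ'), ('book', 'NOUN'), ('war', 'NOUN')],
    [('book', 'NOUN'), ('worst', 'ADJ'), ('war', 'NOUN')],
    [('book', 'NOUN'), ('war', 'NOUN')],
    [('great', 'ADJ'), ('bad', 'ADJ')],
]

bands = compare_groups(group1, group2)
# print each band in a fixed-width layout similar to the notebook output
for band, terms in bands.items():
    for (lemma, pos), ratio in sorted(terms.items()):
        print(f'{band: <8}{lemma: <12}{pos: <6}{ratio: >6.2f}')
```

In this toy example 'great' and 'cry' land in the group-1 band, 'book' is shared and 'war' leans towards group 2. With the notebook's own data, passing the tokenised reviews of each group and `top_n=1000` should approximate the two tables, up to differences in how documents and totals are counted.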
In [437]:
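# Terms with a similar relative frequency in both groups: keep ratios between 0.66 and 1.5
for pos_type in token_pos_types:
    for term, freq in docfreq_group1.most_common(1000):
        lemma, pos = term
        # only print terms of the current part-of-speech tag
        if pos != pos_type:
            continue
        # relative frequency of the term in each group
        prop_group1 = freq / total_group1
        prop_group2 = docfreq_group2[term] / total_group2
        # ratio of group 2 to group 1 (close to 1 means no clear preference)
        prop = prop_group2 / prop_group1
        if prop < 0.66 or prop > 1.5:
            continue
        print(f'{lemma: <20}{pos: <6}{freq: >6}{docfreq_group2[term]: >6}{prop_group1: >8.4f}{prop_group2: >8.4f}{prop: >6.2f}')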
good                ADJ     3199   594  0.3869  0.3555  0.92
little              ADJ     1204   208  0.1456  0.1245  0.85
real                ADJ     1151   183  0.1392  0.1095  0.79
final               ADJ      867   178  0.1048  0.1065  1.02
favorite            ADJ      618   100  0.0747  0.0598  0.80
sure                ADJ      592   117  0.0716  0.0700  0.98
different           ADJ      592    85  0.0716  0.0509  0.71
hard                ADJ      539   118  0.0652  0.0706  1.08
glad                ADJ      507    76  0.0613  0.0455  0.74
young               ADJ      475    81  0.0574  0.0485  0.84
entire              ADJ      472   136  0.0571  0.0814  1.43
new                 ADJ      451    68  0.0545  0.0407  0.75
emotional           ADJ      424    65  0.0513  0.0389  0.76
true                ADJ      401    76  0.0485  0.0455  0.94
right               ADJ      392    58  0.0474  0.0347  0.73
previous            ADJ      356    84  0.0431  0.0503  1.17
long                ADJ      341    72  0.0412  0.0431  1.04
able                ADJ      309    62  0.0374  0.0371  0.99
slow                ADJ      288    73  0.0348  0.0437  1.25
worth               ADJ      262    49  0.0317  0.0293  0.93
easy                ADJ      257    43  0.0311  0.0257  0.83
human               ADJ      252    48  0.0305  0.0287  0.94
wrong               ADJ      243    59  0.0294  0.0353  1.20
awesome             ADJ      242    36  0.0293  0.0215  0.74
important           ADJ      231    67  0.0279  0.0401  1.44
nice                ADJ      227    52  0.0275  0.0311  1.13
dead                ADJ      215    55  0.0260  0.0329  1.27
exciting            ADJ      193    56  0.0233  0.0335  1.44
difficult           ADJ      185    40  0.0224  0.0239  1.07
clear               ADJ      180    37  0.0218  0.0221  1.02
political           ADJ      180    26  0.0218  0.0156  0.71
alive               ADJ      168    34  0.0203  0.0203  1.00
satisfying          ADJ      164    24  0.0198  0.0144  0.72
dystopian           ADJ      161    31  0.0195  0.0186  0.95
crazy               ADJ      143    34  0.0173  0.0203  1.18
angry               ADJ      138    38  0.0167  0.0227  1.36
short               ADJ      138    39  0.0167  0.0233  1.40
violent             ADJ      136    37  0.0164  0.0221  1.35
safe                ADJ      135    27  0.0163  0.0162  0.99
honest              ADJ      133    32  0.0161  0.0192  1.19
deep                ADJ      127    22  0.0154  0.0132  0.86
personal            ADJ      125    22  0.0151  0.0132  0.87
necessary           ADJ      118    27  0.0143  0.0162  1.13
loose               ADJ      118    24  0.0143  0.0144  1.01
mental              ADJ      118    29  0.0143  0.0174  1.22
epic                ADJ      118    35  0.0143  0.0209  1.47
overall             ADJ      110    32  0.0133  0.0192  1.44
small               ADJ      107    18  0.0129  0.0108  0.83
tragic              ADJ      105    21  0.0127  0.0126  0.99
upset               ADJ      103    16  0.0125  0.0096  0.77
believable          ADJ      102    20  0.0123  0.0120  0.97
evil                ADJ       99    25  0.0120  0.0150  1.25
enjoyable           ADJ       97    15  0.0117  0.0090  0.77
cruel               ADJ       96    22  0.0116  0.0132  1.13
romantic            ADJ       91    20  0.0110  0.0120  1.09
painful             ADJ       90    25  0.0109  0.0150  1.37
excited             ADJ       90    24  0.0109  0.0144  1.32
heavy               ADJ       89    15  0.0108  0.0090  0.83
large               ADJ       84    15  0.0102  0.0090  0.88
prim                ADJ       80    15  0.0097  0.0090  0.93
surprising          ADJ       78    12  0.0094  0.0072  0.76
particular          ADJ       77    19  0.0093  0.0114  1.22
psychological       ADJ       76    11  0.0092  0.0066  0.72
free                ADJ       76    20  0.0092  0.0120  1.30
physical            ADJ       75    11  0.0091  0.0066  0.73
thrilling           ADJ       75    13  0.0091  0.0078  0.86
simple              ADJ       75    20  0.0091  0.0120  1.32
complex             ADJ       74    11  0.0089  0.0066  0.74
disturbing          ADJ       73    20  0.0088  0.0120  1.36
solid               ADJ       71    13  0.0086  0.0078  0.91
brave               ADJ       70    12  0.0085  0.0072  0.85
innocent            ADJ       70    18  0.0085  0.0108  1.27
similar             ADJ       69    18  0.0083  0.0108  1.29
fair                ADJ       65    10  0.0079  0.0060  0.76
willing             ADJ       64    14  0.0077  0.0084  1.08
past                ADJ       63    12  0.0076  0.0072  0.94
constant            ADJ       63    18  0.0076  0.0108  1.41
afraid              ADJ       60    14  0.0073  0.0084  1.15
impressed           ADJ       59    13  0.0071  0.0078  1.09
harsh               ADJ       59    11  0.0071  0.0066  0.92
rebel               ADJ       57    12  0.0069  0.0072  1.04
strange             ADJ       56     9  0.0068  0.0054  0.80
late                ADJ       55    11  0.0067  0.0066  0.99
unpredictable       ADJ       55     8  0.0067  0.0048  0.72
bloody              ADJ       54    16  0.0065  0.0096  1.47
ultimate            ADJ       54    10  0.0065  0.0060  0.92
fun                 ADJ       52    12  0.0063  0.0072  1.14
low                 ADJ       52    13  0.0063  0.0078  1.24
bitter              ADJ       51    13  0.0062  0.0078  1.26
worthy              ADJ       51    13  0.0062  0.0078  1.26
horrific            ADJ       49    13  0.0059  0.0078  1.31
beloved             ADJ       49     7  0.0059  0.0042  0.71
minor               ADJ       49     9  0.0059  0.0054  0.91
pure                ADJ       49    12  0.0059  0.0072  1.21
mature              ADJ       48    10  0.0058  0.0060  1.03
early               ADJ       48    14  0.0058  0.0084  1.44
future              ADJ       48     8  0.0058  0.0048  0.82
hopeful             ADJ       47    11  0.0057  0.0066  1.16
smart               ADJ       47    14  0.0057  0.0084  1.47
suspenseful         ADJ       47    12  0.0057  0.0072  1.26
graphic             ADJ       46    13  0.0056  0.0078  1.40
book                NOUN   13445  3681  1.6260  2.2029  1.35
series              NOUN    5372   925  0.6497  0.5536  0.85
ending              NOUN    3488   565  0.4218  0.3381  0.80
end                 NOUN    2939   705  0.3554  0.4219  1.19
story               NOUN    2635   580  0.3187  0.3471  1.09
time                NOUN    2475   547  0.2993  0.3273  1.09
way                 NOUN    2298   516  0.2779  0.3088  1.11
trilogy             NOUN    2250   304  0.2721  0.1819  0.67
thing               NOUN    2060   519  0.2491  0.3106  1.25
people              NOUN    1764   385  0.2133  0.2304  1.08
war                 NOUN    1752   374  0.2119  0.2238  1.06
life                NOUN    1279   257  0.1547  0.1538  0.99
spoiler             NOUN    1151   265  0.1392  0.1586  1.14
lot                 NOUN    1134   243  0.1371  0.1454  1.06
love                NOUN    1095   296  0.1324  0.1771  1.34
bit                 NOUN     929   180  0.1123  0.1077  0.96
world               NOUN     862   158  0.1042  0.0946  0.91
action              NOUN     862   258  0.1042  0.1544  1.48
movie               NOUN     841   149  0.1017  0.0892  0.88
district            NOUN     817   151  0.0988  0.0904  0.91
review              NOUN     796   179  0.0963  0.1071  1.11
star                NOUN     765   154  0.0925  0.0922  1.00
reader              NOUN     754   208  0.0912  0.1245  1.37
game                NOUN     722   187  0.0873  0.1119  1.28
novel               NOUN     714   195  0.0863  0.1167  1.35
alert               NOUN     703   181  0.0850  0.1083  1.27
conclusion          NOUN     599    95  0.0724  0.0569  0.78
rebellion           NOUN     577   113  0.0698  0.0676  0.97
day                 NOUN     567    79  0.0686  0.0473  0.69
child               NOUN     535   141  0.0647  0.0844  1.30
year                NOUN     526   102  0.0636  0.0610  0.96
feeling             NOUN     464   108  0.0561  0.0646  1.15
moment              NOUN     464    98  0.0561  0.0586  1.05
chapter             NOUN     427   103  0.0516  0.0616  1.19
twist               NOUN     406    65  0.0491  0.0389  0.79
friend              NOUN     404    64  0.0489  0.0383  0.78
mind                NOUN     404    79  0.0489  0.0473  0.97
rebel               NOUN     400    89  0.0484  0.0533  1.10
epilogue            NOUN     391    89  0.0473  0.0533  1.13
emotion             NOUN     385    54  0.0466  0.0323  0.69
place               NOUN     359    96  0.0434  0.0575  1.32
event               NOUN     358    95  0.0433  0.0569  1.31
word                NOUN     356    80  0.0431  0.0479  1.11
part                NOUN     348    71  0.0421  0.0425  1.01
choice              NOUN     344    82  0.0416  0.0491  1.18
revolution          NOUN     340    74  0.0411  0.0443  1.08
adult               NOUN     334    65  0.0404  0.0389  0.96
beginning           NOUN     324    96  0.0392  0.0575  1.47
violence            NOUN     299    62  0.0362  0.0371  1.03
hope                NOUN     299    80  0.0362  0.0479  1.32
thought             NOUN     295    60  0.0357  0.0359  1.01
line                NOUN     285    81  0.0345  0.0485  1.41
power               NOUN     284    49  0.0343  0.0293  0.85
relationship        NOUN     283    53  0.0342  0.0317  0.93
fan                 NOUN     270    53  0.0327  0.0317  0.97
hunger              NOUN     266    67  0.0322  0.0401  1.25
arena               NOUN     264    54  0.0319  0.0323  1.01
sister              NOUN     263    67  0.0318  0.0401  1.26
detail              NOUN     260    52  0.0314  0.0311  0.99
opinion             NOUN     258    77  0.0312  0.0461  1.48
family              NOUN     249    67  0.0301  0.0401  1.33
reality             NOUN     249    44  0.0301  0.0263  0.87
installment         NOUN     248    63  0.0300  0.0377  1.26
kind                NOUN     245    74  0.0296  0.0443  1.49
course              NOUN     234    48  0.0283  0.0287  1.02
rest                NOUN     229    57  0.0277  0.0341  1.23
government          NOUN     225    49  0.0272  0.0293  1.08
head                NOUN     218    50  0.0264  0.0299  1.13
battle              NOUN     218    62  0.0264  0.0371  1.41
romance             NOUN     217    44  0.0262  0.0263  1.00
one                 NOUN     215    45  0.0260  0.0269  1.04
eye                 NOUN     213    42  0.0258  0.0251  0.98
theme               NOUN     204    53  0.0247  0.0317  1.29
hand                NOUN     204    49  0.0247  0.0293  1.19
face                NOUN     200    43  0.0242  0.0257  1.06
situation           NOUN     198    43  0.0239  0.0257  1.07
pain                NOUN     197    41  0.0238  0.0245  1.03
hero                NOUN     196    50  0.0237  0.0299  1.26
message             NOUN     195    55  0.0236  0.0329  1.40
role                NOUN     191    55  0.0231  0.0329  1.42
fiction             NOUN     186    29  0.0225  0.0174  0.77
rating              NOUN     183    38  0.0221  0.0227  1.03
society             NOUN     175    33  0.0212  0.0197  0.93
issue               NOUN     174    47  0.0210  0.0281  1.34
surprise            NOUN     172    24  0.0208  0.0144  0.69
team                NOUN     166    42  0.0201  0.0251  1.25
leader              NOUN     165    42  0.0200  0.0251  1.26
couple              NOUN     164    41  0.0198  0.0245  1.24
capitol             NOUN     162    29  0.0196  0.0174  0.89
symbol              NOUN     158    23  0.0191  0.0138  0.72
experience          NOUN     156    31  0.0189  0.0186  0.98
man                 NOUN     147    39  0.0178  0.0233  1.31
closure             NOUN     147    44  0.0178  0.0263  1.48
work                NOUN     145    33  0.0175  0.0197  1.13
level               NOUN     144    33  0.0174  0.0197  1.13
order               NOUN     142    32  0.0172  0.0192  1.12
finish              NOUN     141    24  0.0171  0.0144  0.84
piece               NOUN     138    29  0.0167  0.0174  1.04
film                NOUN     137    19  0.0166  0.0114  0.69
hour                NOUN     136    20  0.0164  0.0120  0.73
week                NOUN     133    27  0.0161  0.0162  1.00
stuff               NOUN     133    32  0.0161  0.0192  1.19
teen                NOUN     131    26  0.0158  0.0156  0.98
destruction         NOUN     131    18  0.0158  0.0108  0.68
case                NOUN     124    35  0.0150  0.0209  1.40
age                 NOUN     124    19  0.0150  0.0114  0.76
survival            NOUN     122    18  0.0148  0.0108  0.73
pace                NOUN     122    22  0.0148  0.0132  0.89
memory              NOUN     122    29  0.0148  0.0174  1.18
aspect              NOUN     120    27  0.0145  0.0162  1.11
home                NOUN     120    19  0.0145  0.0114  0.78
edge                NOUN     120    18  0.0145  0.0108  0.74
start               NOUN     117    29  0.0141  0.0174  1.23
suspense            NOUN     117    23  0.0141  0.0138  0.97
view                NOUN     116    20  0.0140  0.0120  0.85
style               NOUN     114    27  0.0138  0.0162  1.17
happiness           NOUN     111    18  0.0134  0.0108  0.80
depth               NOUN     109    28  0.0132  0.0168  1.27
victor              NOUN     109    24  0.0132  0.0144  1.09
storyline           NOUN     109    28  0.0132  0.0168  1.27
chance              NOUN     108    25  0.0131  0.0150  1.15
fight               NOUN     107    23  0.0129  0.0138  1.06
capital             NOUN     105    27  0.0127  0.0162  1.27
middle              NOUN     105    29  0.0127  0.0174  1.37
woman               NOUN     105    25  0.0127  0.0150  1.18
survivor            NOUN     103    14  0.0125  0.0084  0.67
enemy               NOUN     102    14  0.0123  0.0084  0.68
sadness             NOUN     102    14  0.0123  0.0084  0.68
audience            NOUN     102    27  0.0123  0.0162  1.31
type                NOUN     100    19  0.0121  0.0114  0.94
trauma              NOUN      98    20  0.0119  0.0120  1.01
reaction            NOUN      98    23  0.0119  0.0138  1.16
need                NOUN      96    27  0.0116  0.0162  1.39
humanity            NOUN      95    22  0.0115  0.0132  1.15
nightmare           NOUN      94    16  0.0114  0.0096  0.84
outcome             NOUN      92    15  0.0111  0.0090  0.81
drama               NOUN      92    19  0.0111  0.0114  1.02
freedom             NOUN      92    26  0.0111  0.0156  1.40
term                NOUN      92    24  0.0111  0.0144  1.29
note                NOUN      91    25  0.0110  0.0150  1.36
torture             NOUN      89    24  0.0108  0.0144  1.33
minute              NOUN      89    18  0.0108  0.0108  1.00
act                 NOUN      88    17  0.0106  0.0102  0.96
complaint           NOUN      88    14  0.0106  0.0084  0.79
shock               NOUN      88    23  0.0106  0.0138  1.29
People              NOUN      88    20  0.0106  0.0120  1.12
conflict            NOUN      88    22  0.0106  0.0132  1.24
tragedy             NOUN      88    21  0.0106  0.0126  1.18
matter              NOUN      88    16  0.0106  0.0096  0.90
voice               NOUN      87    17  0.0105  0.0102  0.97
description         NOUN      87    19  0.0105  0.0114  1.08
tone                NOUN      85    21  0.0103  0.0126  1.22
difference          NOUN      85    23  0.0103  0.0138  1.34
country             NOUN      84    14  0.0102  0.0084  0.82
warning             NOUN      82    14  0.0099  0.0084  0.84
mockingjay          NOUN      81    20  0.0098  0.0120  1.22
soldier             NOUN      81    20  0.0098  0.0120  1.22
picture             NOUN      80    16  0.0097  0.0096  0.99
doubt               NOUN      76    14  0.0092  0.0084  0.91
good                NOUN      76    13  0.0092  0.0078  0.85
sacrifice           NOUN      74    12  0.0089  0.0072  0.80
plan                NOUN      74    21  0.0089  0.0126  1.40
fear                NOUN      71    13  0.0086  0.0078  0.91
history             NOUN      71    15  0.0086  0.0090  1.05
paragraph           NOUN      70    12  0.0085  0.0072  0.85
circumstance        NOUN      69    15  0.0083  0.0090  1.08
baby                NOUN      67    14  0.0081  0.0084  1.03
literature          NOUN      67    15  0.0081  0.0090  1.11
force               NOUN      66    15  0.0080  0.0090  1.12
cat                 NOUN      65    16  0.0079  0.0096  1.22
peeta               NOUN      65    15  0.0079  0.0090  1.14
spot                NOUN      65     9  0.0079  0.0054  0.69
bow                 NOUN      65    14  0.0079  0.0084  1.07
list                NOUN      64    14  0.0077  0.0084  1.08
imagination         NOUN      61    10  0.0074  0.0060  0.81
rule                NOUN      61    12  0.0074  0.0072  0.97
grief               NOUN      61    10  0.0074  0.0060  0.81
fate                NOUN      61    13  0.0074  0.0078  1.05
odd                 NOUN      61     9  0.0074  0.0054  0.73
friendship          NOUN      61    11  0.0074  0.0066  0.89
genre               NOUN      61    13  0.0074  0.0078  1.05
darkness            NOUN      60     9  0.0073  0.0054  0.74
suffering           NOUN      60    13  0.0073  0.0078  1.07
light               NOUN      60    13  0.0073  0.0078  1.07
body                NOUN      60    14  0.0073  0.0084  1.15
food                NOUN      59    14  0.0071  0.0084  1.17
blog                NOUN      59    10  0.0071  0.0060  0.84
trial               NOUN      59    13  0.0071  0.0078  1.09
break               NOUN      59    13  0.0071  0.0078  1.09
meaning             NOUN      57    17  0.0069  0.0102  1.48
goal                NOUN      57    13  0.0069  0.0078  1.13
comment             NOUN      57    13  0.0069  0.0078  1.13
medium              NOUN      57     8  0.0069  0.0048  0.69
set                 NOUN      56    12  0.0068  0.0072  1.06
path                NOUN      56    11  0.0068  0.0066  0.97
lesson              NOUN      56    16  0.0068  0.0096  1.41
camera              NOUN      54    16  0.0065  0.0096  1.47
breath              NOUN      54     8  0.0065  0.0048  0.73
promise             NOUN      54    15  0.0065  0.0090  1.37
hatred              NOUN      54    10  0.0065  0.0060  0.92
image               NOUN      54    15  0.0065  0.0090  1.37
feel                NOUN      54    11  0.0065  0.0066  1.01
dream               NOUN      53     9  0.0064  0.0054  0.84
scar                NOUN      52     8  0.0063  0.0048  0.76
saga                NOUN      51     7  0.0062  0.0042  0.68
connection          NOUN      51    12  0.0062  0.0072  1.16
past                NOUN      51    10  0.0062  0.0060  0.97
joy                 NOUN      51    10  0.0062  0.0060  0.97
deal                NOUN      50    15  0.0060  0.0090  1.48
uprising            NOUN      50    11  0.0060  0.0066  1.09
thank               NOUN      49    11  0.0059  0.0066  1.11
tension             NOUN      49    14  0.0059  0.0084  1.41
favor               NOUN      49     7  0.0059  0.0042  0.71
conversation        NOUN      49    10  0.0059  0.0060  1.01
confusion           NOUN      49    12  0.0059  0.0072  1.21
tree                NOUN      49    11  0.0059  0.0066  1.11
soul                NOUN      48    10  0.0058  0.0060  1.03
right               NOUN      48    10  0.0058  0.0060  1.03
comparison          NOUN      48    13  0.0058  0.0078  1.34
danger              NOUN      48    13  0.0058  0.0078  1.34
courage             NOUN      48    12  0.0058  0.0072  1.24
brain               NOUN      47    12  0.0057  0.0072  1.26
return              NOUN      46     8  0.0056  0.0048  0.86
motive              NOUN      46    10  0.0056  0.0060  1.08
fun                 NOUN      46    12  0.0056  0.0072  1.29
Katniss             PROPN   7040  1953  0.8514  1.1688  1.37
Peeta               PROPN   3185   772  0.3852  0.4620  1.20
Collins             PROPN   2577   644  0.3116  0.3854  1.24
Games               PROPN   2385   548  0.2884  0.3279  1.14
Hunger              PROPN   2302   535  0.2784  0.3202  1.15
Mockingjay          PROPN   2162   487  0.2615  0.2914  1.11
Gale                PROPN   1990   503  0.2407  0.3010  1.25
Suzanne             PROPN   1059   180  0.1281  0.1077  0.84
Capitol             PROPN   1002   259  0.1212  0.1550  1.28
Prim                PROPN    817   246  0.0988  0.1472  1.49
Finnick             PROPN    648   168  0.0784  0.1005  1.28
District            PROPN    469   102  0.0567  0.0610  1.08
Coin                PROPN    467   133  0.0565  0.0796  1.41
President           PROPN    457    97  0.0553  0.0580  1.05
Panem               PROPN    361    58  0.0437  0.0347  0.80
YA                  PROPN    298    63  0.0360  0.0377  1.05
Haymitch            PROPN    255    56  0.0308  0.0335  1.09
Harry               PROPN    198    50  0.0239  0.0299  1.25
Potter              PROPN    171    41  0.0207  0.0245  1.19
Capital             PROPN    165    33  0.0200  0.0197  0.99
Annie               PROPN    162    27  0.0196  0.0162  0.82
Team                PROPN    160    24  0.0193  0.0144  0.74
katniss             PROPN    142    31  0.0172  0.0186  1.08
Overall             PROPN    112    18  0.0135  0.0108  0.80
Game                PROPN     93    18  0.0112  0.0108  0.96
Kat                 PROPN     81    15  0.0098  0.0090  0.92
Buttercup           PROPN     80    18  0.0097  0.0108  1.11
peeta               PROPN     71    12  0.0086  0.0072  0.84
Boggs               PROPN     65    17  0.0079  0.0102  1.29
Plutarch            PROPN     51    11  0.0062  0.0066  1.07
read                VERB    5248  1040  0.6347  0.6224  0.98
think               VERB    4184   804  0.5060  0.4811  0.95
love                VERB    3635   500  0.4396  0.2992  0.68
feel                VERB    3047   881  0.3685  0.5272  1.43
end                 VERB    2456   542  0.2970  0.3244  1.09
like                VERB    2250   560  0.2721  0.3351  1.23
know                VERB    2192   500  0.2651  0.2992  1.13
want                VERB    2109   516  0.2550  0.3088  1.21
go                  VERB    2046   569  0.2474  0.3405  1.38
happen              VERB    1525   459  0.1844  0.2747  1.49
come                VERB    1468   312  0.1775  0.1867  1.05
find                VERB    1345   332  0.1627  0.1987  1.22
leave               VERB    1117   289  0.1351  0.1730  1.28
enjoy               VERB    1109   184  0.1341  0.1101  0.82
finish              VERB    1074   297  0.1299  0.1777  1.37
write               VERB    1035   288  0.1252  0.1724  1.38
take                VERB    1024   231  0.1238  0.1382  1.12
give                VERB     899   218  0.1087  0.1305  1.20
make                VERB     842   189  0.1018  0.1131  1.11
need                VERB     817   152  0.0988  0.0910  0.92
expect              VERB     805   194  0.0974  0.1161  1.19
say                 VERB     717   154  0.0867  0.0922  1.06
live                VERB     686   148  0.0830  0.0886  1.07
keep                VERB     578   135  0.0699  0.0808  1.16
see                 VERB     574   110  0.0694  0.0658  0.95
understand          VERB     546   155  0.0660  0.0928  1.40
believe             VERB     521   113  0.0630  0.0676  1.07
wait                VERB     498   122  0.0602  0.0730  1.21
turn                VERB     497   138  0.0601  0.0826  1.37
wish                VERB     496   123  0.0600  0.0736  1.23
hope                VERB     480   103  0.0580  0.0616  1.06
change              VERB     477    94  0.0577  0.0563  0.98
look                VERB     442   102  0.0535  0.0610  1.14
survive             VERB     429    60  0.0519  0.0359  0.69
choose              VERB     413   119  0.0499  0.0712  1.43
fight               VERB     404   116  0.0489  0.0694  1.42
guess               VERB     402    87  0.0486  0.0521  1.07
bring               VERB     400    89  0.0484  0.0533  1.10
stop                VERB     367   109  0.0444  0.0652  1.47
realize             VERB     365    78  0.0441  0.0467  1.06
play                VERB     333    62  0.0403  0.0371  0.92
help                VERB     333    74  0.0403  0.0443  1.10
show                VERB     327    57  0.0395  0.0341  0.86
wrap                VERB     321    49  0.0388  0.0293  0.76
pick                VERB     318    93  0.0385  0.0557  1.45
grow                VERB     312    80  0.0377  0.0479  1.27
disappoint          VERB     307    87  0.0371  0.0521  1.40
watch               VERB     300    66  0.0363  0.0395  1.09
stay                VERB     293    51  0.0354  0.0305  0.86
work                VERB     283    71  0.0342  0.0425  1.24
miss                VERB     282    63  0.0341  0.0377  1.11
continue            VERB     276    59  0.0334  0.0353  1.06
begin               VERB     275    69  0.0333  0.0413  1.24
hold                VERB     267    43  0.0323  0.0257  0.80
create              VERB     264    42  0.0319  0.0251  0.79
remember            VERB     257    54  0.0311  0.0323  1.04
will                VERB     257    59  0.0311  0.0353  1.14
move                VERB     243    49  0.0294  0.0293  1.00
wonder              VERB     241    61  0.0291  0.0365  1.25
hear                VERB     233    62  0.0282  0.0371  1.32
talk                VERB     228    54  0.0276  0.0323  1.17
learn               VERB     223    34  0.0270  0.0203  0.75
forget              VERB     213    48  0.0258  0.0287  1.12
catch               VERB     210    60  0.0254  0.0359  1.41
satisfy             VERB     204    29  0.0247  0.0174  0.70
agree               VERB     204    48  0.0247  0.0287  1.16
admit               VERB     200    40  0.0242  0.0239  0.99
ask                 VERB     198    27  0.0239  0.0162  0.67
set                 VERB     196    52  0.0237  0.0311  1.31
appreciate          VERB     191    28  0.0231  0.0168  0.73
win                 VERB     191    51  0.0231  0.0305  1.32
consider            VERB     190    49  0.0230  0.0293  1.28
describe            VERB     189    38  0.0229  0.0227  0.99
pull                VERB     189    31  0.0229  0.0186  0.81
imagine             VERB     187    35  0.0226  0.0209  0.93
add                 VERB     180    45  0.0218  0.0269  1.24
destroy             VERB     174    50  0.0210  0.0299  1.42
tie                 VERB     173    26  0.0209  0.0156  0.74
deal                VERB     173    35  0.0209  0.0209  1.00
figure              VERB     169    34  0.0204  0.0203  1.00
remind              VERB     161    32  0.0195  0.0192  0.98
sit                 VERB     159    45  0.0192  0.0269  1.40
manage              VERB     151    34  0.0183  0.0203  1.11
rescue              VERB     150    25  0.0181  0.0150  0.82
remain              VERB     149    30  0.0180  0.0180  1.00
include             VERB     142    36  0.0172  0.0215  1.25
blow                VERB     142    34  0.0172  0.0203  1.18
capture             VERB     140    27  0.0169  0.0162  0.95
tear                VERB     140    21  0.0169  0.0126  0.74
trust               VERB     139    19  0.0168  0.0114  0.68
protect             VERB     134    27  0.0162  0.0162  1.00
hurt                VERB     134    31  0.0162  0.0186  1.14
struggle            VERB     128    25  0.0155  0.0150  0.97
spoil               VERB     127    30  0.0154  0.0180  1.17
draw                VERB     126    30  0.0152  0.0180  1.18
involve             VERB     122    32  0.0148  0.0192  1.30
meet                VERB     120    27  0.0145  0.0162  1.11
fit                 VERB     119    21  0.0144  0.0126  0.87
provide             VERB     118    19  0.0143  0.0114  0.80
experience          VERB     118    20  0.0143  0.0120  0.84
fill                VERB     117    18  0.0141  0.0108  0.76
accept              VERB     111    22  0.0134  0.0132  0.98
speak               VERB     111    22  0.0134  0.0132  0.98
rate                VERB     110    27  0.0133  0.0162  1.21
handle              VERB     108    25  0.0131  0.0150  1.15
pace                VERB     107    20  0.0129  0.0120  0.92
torture             VERB     106    26  0.0128  0.0156  1.21
portray             VERB     101    19  0.0122  0.0114  0.93
reach               VERB      97    21  0.0117  0.0126  1.07
affect              VERB      97    27  0.0117  0.0162  1.38
focus               VERB      96    27  0.0116  0.0162  1.39
question            VERB      91    13  0.0110  0.0078  0.71
call                VERB      90    26  0.0109  0.0156  1.43
put                 VERB      90    25  0.0109  0.0150  1.37
exist               VERB      89    23  0.0108  0.0138  1.28
hit                 VERB      88    16  0.0106  0.0096  0.90
discuss             VERB      88    13  0.0106  0.0078  0.73
listen              VERB      87    14  0.0105  0.0084  0.80
send                VERB      87    24  0.0105  0.0144  1.37
burn                VERB      86    26  0.0104  0.0156  1.50
carry               VERB      85    19  0.0103  0.0114  1.11
sacrifice           VERB      83    21  0.0100  0.0126  1.25
contain             VERB      83    18  0.0100  0.0108  1.07
complain            VERB      82    15  0.0099  0.0090  0.91
deliver             VERB      81    21  0.0098  0.0126  1.28
recover             VERB      80    12  0.0097  0.0072  0.74
push                VERB      80    16  0.0097  0.0096  0.99
damage              VERB      79    14  0.0096  0.0084  0.88
lie                 VERB      78    15  0.0094  0.0090  0.95
close               VERB      75    13  0.0091  0.0078  0.86
conclude            VERB      73    15  0.0088  0.0090  1.02
plan                VERB      72    21  0.0087  0.0126  1.44
hijack              VERB      71    10  0.0086  0.0060  0.70
settle              VERB      69    19  0.0083  0.0114  1.36
prefer              VERB      69    16  0.0083  0.0096  1.15
prove               VERB      68    12  0.0082  0.0072  0.87
present             VERB      68    18  0.0082  0.0108  1.31
avoid               VERB      67    16  0.0081  0.0096  1.18
base                VERB      67    17  0.0081  0.0102  1.26
worry               VERB      66    12  0.0080  0.0072  0.90
shock               VERB      66    12  0.0080  0.0072  0.90
prepare             VERB      66    14  0.0080  0.0084  1.05
offer               VERB      65    11  0.0079  0.0066  0.84
touch               VERB      64    11  0.0077  0.0066  0.85
attach              VERB      63    16  0.0076  0.0096  1.26
overcome            VERB      63    13  0.0076  0.0078  1.02
explore             VERB      62    16  0.0075  0.0096  1.28
release             VERB      61    16  0.0074  0.0096  1.30
reveal              VERB      61    13  0.0074  0.0078  1.05
jump                VERB      58    12  0.0070  0.0072  1.02
open                VERB      57    14  0.0069  0.0084  1.22
endure              VERB      57    11  0.0069  0.0066  0.95
engage              VERB      57    11  0.0069  0.0066  0.95
discover            VERB      56     9  0.0068  0.0054  0.80
kick                VERB      55    10  0.0067  0.0060  0.90
pay                 VERB      55     9  0.0067  0.0054  0.81
bear                VERB      54    13  0.0065  0.0078  1.19
sleep               VERB      54    14  0.0065  0.0084  1.28
mock                VERB      52    12  0.0063  0.0072  1.14
post                VERB      50     9  0.0060  0.0054  0.89
scream              VERB      50    15  0.0060  0.0090  1.48
justify             VERB      50    15  0.0060  0.0090  1.48
represent           VERB      50     7  0.0060  0.0042  0.69
teach               VERB      50     7  0.0060  0.0042  0.69
heal                VERB      49     9  0.0059  0.0054  0.91
anticipate          VERB      49    10  0.0059  0.0060  1.01
connect             VERB      47    13  0.0057  0.0078  1.37

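The last column is the ratio of the two relative frequencies. Values below 1 mark words used relatively more in positive reviews, and values above 1 mark words used more in negative reviews (compare 'love' as a verb at 0.68 with 'disappoint' at 1.40). On that reading, the word 'end' is used slightly more in negative reviews (1.19 as a noun), while 'ending' is used more in positive reviews (0.80).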
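The normalised columns appear to be per-review relative frequencies. For example, 13445 occurrences of 'book' at a relative frequency of 1.6260 implies roughly 8,269 reviews in the first group. The sketch below shows that computation and the ratio. It assumes the first count/frequency pair belongs to the positive group and that term counts and review counts per group are already available. `relative_freq_ratio` is a hypothetical helper, not a function from the repository's scripts.

```python
from collections import Counter


def relative_freq_ratio(pos_counts: Counter, neg_counts: Counter,
                        n_pos_reviews: int, n_neg_reviews: int):
    """Hypothetical: per-review relative frequency of each term in a positive
    and a negative review group, plus the negative/positive ratio.
    Only terms that occur in both groups are returned, most frequent
    (in the positive group) first."""
    rows = []
    for term in pos_counts.keys() & neg_counts.keys():
        pos_rel = pos_counts[term] / n_pos_reviews
        neg_rel = neg_counts[term] / n_neg_reviews
        rows.append((term, pos_counts[term], neg_counts[term],
                     round(pos_rel, 4), round(neg_rel, 4), round(neg_rel / pos_rel, 2)))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# toy counts, not taken from the data above
toy_pos = Counter({'end': 30, 'ending': 40})
toy_neg = Counter({'end': 12, 'ending': 6})
for row in relative_freq_ratio(toy_pos, toy_neg, n_pos_reviews=100, n_neg_reviews=30):
    print(row)
```

Dividing by the number of reviews, rather than raw counts, makes the two groups comparable even though the negative group is much smaller.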

In [365]:
from scripts.liwc import LIWC

# This dictionary is part of LIWC 2007, which is a commercial product, so it is not available in our GitHub repo
liwc_dict_file = '../data/LIWC2007_English131104.dic'

liwc = LIWC(liwc_dict_file)
Reading dictionary file ../data/LIWC2007_English131104.dic
encoding = utf-8
number of words : 4482
number of categories : 64
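The `LIWC` class from the `scripts.liwc` module is not shown here. The following is a rough illustration of what loading a LIWC-style `.dic` file involves. It assumes the usual layout: a category block between two `%` lines that maps numeric ids to names, then one word per line with its category ids, where a trailing `*` marks a prefix pattern. The functions `parse_dic` and `lookup_categories` are hypothetical. They skip entries with parenthesised context rules and are not the repository's implementation.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy dictionary in the assumed .dic layout (tab-separated).
DIC = (
    "%\n"
    "1\tfunct\n"
    "125\taffect\n"
    "126\tposemo\n"
    "%\n"
    "a\t1\n"
    "happ*\t125\t126\n"
    "like\t(02 134)125/464\t126\n"
)


def parse_dic(text: str) -> Tuple[Dict[str, str], Dict[str, List[str]], Dict[str, List[str]]]:
    """Parse a LIWC-style dictionary into (categories, exact words, prefix patterns)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    # the category block sits between the first two lines that consist of '%'
    start = lines.index('%')
    end = lines.index('%', start + 1)
    categories = {}
    for line in lines[start + 1:end]:
        cat_id, name = line.split()[:2]
        categories[cat_id] = name
    exact, prefixes = {}, {}
    for line in lines[end + 1:]:
        word, *cat_ids = line.split()
        # keep plain numeric ids only; parenthesised context rules are skipped
        names = [categories[c] for c in cat_ids if c in categories]
        if word.endswith('*'):
            prefixes[word[:-1]] = names
        else:
            exact[word] = names
    return categories, exact, prefixes


def lookup_categories(word: str, exact, prefixes) -> Optional[List[str]]:
    """Exact match first, then the longest matching prefix pattern; None if no match."""
    word = word.lower()
    if word in exact:
        return exact[word]
    for cut in range(len(word), 0, -1):
        if word[:cut] in prefixes:
            return prefixes[word[:cut]]
    return None


categories, exact, prefixes = parse_dic(DIC)
print('|'.join(lookup_categories('happy', exact, prefixes)))  # affect|posemo
```

Joining the category names with `|` mirrors the format of the `liwc_category` column in the tables below.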
In [366]:
sample_size = 1000
sample_df = book_df.sample(sample_size, random_state=random_seed)
sample_docs = select_dataframe_spacy_docs(sample_df, review_docs, as_dict=True)
In [367]:
from scripts.text_tail_analysis import get_lemma_pos_tf_index, group_by_head, group_by_child

token_pos_types = ['ADJ', 'NOUN', 'PROPN', 'VERB']
doc_list = [sample_docs[review_id] for review_id in sample_docs]
tf_lemma_pos = get_lemma_pos_tf_index(doc_list)
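`get_lemma_pos_tf_index` is defined in `scripts.text_tail_analysis` and is not shown. The following is a minimal equivalent, assuming the function counts how often each (lemma, POS) pair occurs across the parsed reviews. Tokens are plain `(lemma, pos)` tuples that stand in for spaCy's `token.lemma_` and `token.pos_`. The function name `lemma_pos_tf` is hypothetical. Keying on the pair keeps the noun 'end' apart from the verb 'end', and keeping case separates 'Katniss' from 'katniss', as in the frequency table above.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Token = Tuple[str, str]  # (lemma, POS tag), e.g. ('end', 'NOUN')


def lemma_pos_tf(docs: Iterable[List[Token]]) -> Counter:
    """Count every (lemma, POS) pair over all documents; case is kept as-is."""
    tf = Counter()
    for doc in docs:
        tf.update(doc)  # each (lemma, pos) tuple is a key
    return tf


# toy_ names avoid overwriting the notebook's own docs / tf variables
toy_docs = [
    [('book', 'NOUN'), ('end', 'VERB'), ('well', 'ADV')],
    [('end', 'NOUN'), ('Katniss', 'PROPN'), ('katniss', 'PROPN')],
]
toy_tf = lemma_pos_tf(toy_docs)
print(toy_tf[('end', 'NOUN')], toy_tf[('end', 'VERB')])
```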
In [369]:
from scripts.text_tail_analysis import get_tail_groupings

tail_groupings = get_tail_groupings(doc_list, tf_lemma_pos, token_pos_types, liwc, max_threshold=5, min_threshold=0)

tail_df = pd.DataFrame(tail_groupings)

book_terms = ['book', 'novel', 'story', 'plot', 'character', 'twist', 'development']

tail_df[(tail_df.tail_pos == 'ADJ') & (tail_df.dependency_word == 'book')]
Out[369]:
dependency_type dependency_word dependency_pos dependency_freq tail_word tail_pos tail_freq dep_tail_freq liwc_category
218 head book NOUN 1630 eighth ADJ 1 1 None
220 head book NOUN 1630 sappy ADJ 2 1 None
221 head book NOUN 1630 disjointed ADJ 5 1 None
223 head book NOUN 1630 government ADJ 1 1 None
224 head book NOUN 1630 exceptional ADJ 5 1 None
226 head book NOUN 1630 middle ADJ 5 1 None
227 head book NOUN 1630 lengthy ADJ 2 1 None
229 head book NOUN 1630 later ADJ 2 1 relativ|time
232 head book NOUN 1630 remarkable ADJ 3 1 None
233 head book NOUN 1630 separate ADJ 2 1 None
234 head book NOUN 1630 war ADJ 4 1 affect|negemo|anger|death
235 head book NOUN 1630 adequate ADJ 2 1 None
236 head book NOUN 1630 dull ADJ 4 1 None
237 head book NOUN 1630 special ADJ 2 1 affect|posemo
240 head book NOUN 1630 triumphant ADJ 1 1 None
242 head book NOUN 1630 audio ADJ 2 2 None
245 head book NOUN 1630 best ADJ 2 1 funct|quant|affect|posemo|achieve
249 head book NOUN 1630 7th ADJ 1 1 None
250 head book NOUN 1630 light ADJ 3 1 percept
251 head book NOUN 1630 fourth ADJ 1 1 None
255 head book NOUN 1630 dystopic ADJ 2 1 None
256 head book NOUN 1630 expletive ADJ 1 1 None
258 head book NOUN 1630 pretty ADJ 5 1 affect|posemo|cogmech|tentat
259 head book NOUN 1630 paced ADJ 5 1 None
260 head book NOUN 1630 thrilling ADJ 5 1 None
263 head book NOUN 1630 anticipated ADJ 1 1 None
269 head book NOUN 1630 darned ADJ 1 1 None
270 head book NOUN 1630 previious ADJ 1 1 None
5111 child book NOUN 1630 intriguing ADJ 1 1 None
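Each row above pairs a low-frequency ("tail") word with a word it is syntactically linked to. From the columns, `dependency_type` appears to indicate whether the dependency word is the tail word's head ('head') or one of its children ('child'). The thresholds appear to bound the tail word's frequency. The following is a hypothetical sketch of that grouping. It assumes tokens are `(lemma, pos, head_index)` tuples, with the root pointing at itself as in spaCy, and that a tail word satisfies `min_threshold < freq <= max_threshold`. The LIWC column is left out, and this is not the actual `get_tail_groupings`.

```python
from collections import Counter


def tail_groupings(docs, tf, pos_types, min_threshold=0, max_threshold=5):
    """Hypothetical sketch: pair each tail word with its syntactic head and children.

    docs: list of documents, each a list of (lemma, pos, head_index) tuples.
    tf:   Counter of (lemma, pos) frequencies over the whole collection.
    """
    pair_freq = Counter()
    for doc in docs:
        for i, (lemma, pos, head_i) in enumerate(doc):
            freq = tf[(lemma, pos)]
            if pos not in pos_types or not (min_threshold < freq <= max_threshold):
                continue  # not a tail word of interest
            if head_i != i:  # the root token has no separate head
                head_lemma, head_pos, _ = doc[head_i]
                pair_freq[('head', head_lemma, head_pos, lemma, pos)] += 1
            for j, (child_lemma, child_pos, child_head) in enumerate(doc):
                if child_head == i and j != i:
                    pair_freq[('child', child_lemma, child_pos, lemma, pos)] += 1
    rows = []
    for (dep_type, dep_word, dep_pos, tail_word, tail_pos), n in pair_freq.items():
        rows.append({
            'dependency_type': dep_type,
            'dependency_word': dep_word,
            'dependency_pos': dep_pos,
            'dependency_freq': tf[(dep_word, dep_pos)],
            'tail_word': tail_word,
            'tail_pos': tail_pos,
            'tail_freq': tf[(tail_word, tail_pos)],
            'dep_tail_freq': n,
        })
    return rows


# toy parses: "an eighth book" and "book intriguing" (root = 'intriguing')
toy_docs = [
    [('an', 'DET', 2), ('eighth', 'ADJ', 2), ('book', 'NOUN', 2)],
    [('book', 'NOUN', 1), ('intriguing', 'ADJ', 1)],
]
toy_tf = Counter((lemma, pos) for doc in toy_docs for lemma, pos, _ in doc)
for row in tail_groupings(toy_docs, toy_tf, ['ADJ', 'NOUN', 'PROPN', 'VERB'], max_threshold=1):
    print(row)
```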
In [221]:
tail_groupings = get_tail_groupings(doc_list, tf_lemma_pos, token_pos_types, liwc, max_threshold=50, min_threshold=10)

tail_df = pd.DataFrame(tail_groupings)

book_terms = [
    'book', 'novel', 'story', 'plot', 'character', 'twist', 'development', 
    'pace', 'scene', 'setting', 'narrative', 'theme', 'event']
author_terms = ['writing', 'style', 'write', 'author', 'writer', 'voice', 'describe', 'explain']
reader_terms = ['reader', 'feel', 'feeling', 'make', 'pull', 'throw']
tail_df[(tail_df.dependency_word == 'describe')]
Out[221]:
dependency_type dependency_word dependency_pos dependency_freq tail_word tail_pos tail_freq dep_tail_freq liwc_category
2501 head describe VERB 20 describe VERB 20 3 verb|present|social
2502 head describe VERB 20 event NOUN 40 1 relativ|time
2503 head describe VERB 20 feeling NOUN 50 1 None
2504 head describe VERB 20 theme NOUN 19 1 None
2505 head describe VERB 20 scene NOUN 30 1 None
2506 head describe VERB 20 explain VERB 29 1 verb|present|social|cogmech|insight
9015 child describe VERB 20 word NOUN 42 3 None
9016 child describe VERB 20 begin VERB 35 1 verb|present|relativ|time
9017 child describe VERB 20 reality NOUN 19 1 cogmech|certain
9018 child describe VERB 20 pull VERB 11 1 None
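The term lists defined in the cell above (`book_terms`, `author_terms`, `reader_terms`) are not used beyond inspecting 'describe'. One way to select every tail grouping whose dependency word belongs to one of these lists, and to label each row with its group, is to combine pandas' `Series.isin` and `Series.map`. The helper `label_term_group` and the toy frame are hypothetical. If a word appeared in more than one list, the last group would win.

```python
import pandas as pd

# the three term lists from the cell above, keyed by a group label
term_groups = {
    'book': ['book', 'novel', 'story', 'plot', 'character', 'twist', 'development',
             'pace', 'scene', 'setting', 'narrative', 'theme', 'event'],
    'author': ['writing', 'style', 'write', 'author', 'writer', 'voice', 'describe', 'explain'],
    'reader': ['reader', 'feel', 'feeling', 'make', 'pull', 'throw'],
}


def label_term_group(tail_df: pd.DataFrame, term_groups: dict) -> pd.DataFrame:
    """Keep rows whose dependency_word is in one of the groups and add a 'term_group' column."""
    word_to_group = {word: group for group, words in term_groups.items() for word in words}
    selected = tail_df[tail_df.dependency_word.isin(list(word_to_group))].copy()
    selected['term_group'] = selected.dependency_word.map(word_to_group)
    return selected


# toy frame with the same column names as tail_df
toy_tail_df = pd.DataFrame({
    'dependency_word': ['describe', 'book', 'arena'],
    'tail_word': ['scene', 'eighth', 'bloody'],
})
selected = label_term_group(toy_tail_df, term_groups)
print(selected)
```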
In [251]:
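from nltk.corpus import wordnet as wn

# compare every WordNet sense of each term with the verb sense 'affect.v.01'
affect = wn.synset('affect.v.01')
terms = ['memorable', 'chilling', 'overwrought', 'gripping']
for term in terms:
    for syn in wn.synsets(term):
        print(syn.lemmas())
        print(syn.hypernyms())
        # Wu-Palmer similarity; prints None when NLTK cannot compute it for the pair
        print(syn.wup_similarity(affect))
        print()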
[Lemma('memorable.s.01.memorable')]
[]
None

[Lemma('cooling.n.01.cooling'), Lemma('cooling.n.01.chilling'), Lemma('cooling.n.01.temperature_reduction')]
[Synset('temperature_change.n.01')]
None

[Lemma('chill.v.01.chill')]
[Synset('depress.v.01')]
0.2857142857142857

[Lemma('cool.v.01.cool'), Lemma('cool.v.01.chill'), Lemma('cool.v.01.cool_down')]
[Synset('change.v.01')]
0.3333333333333333

[Lemma('cool.v.02.cool'), Lemma('cool.v.02.chill'), Lemma('cool.v.02.cool_down')]
[Synset('change_state.v.01')]
0.2857142857142857

[Lemma('chilling.s.01.chilling'), Lemma('chilling.s.01.scarey'), Lemma('chilling.s.01.scary'), Lemma('chilling.s.01.shivery'), Lemma('chilling.s.01.shuddery')]
[]
None

[Lemma('distraught.s.01.distraught'), Lemma('distraught.s.01.overwrought')]
[]
None

[Lemma('grip.v.01.grip')]
[Synset('seize.v.01')]
0.2857142857142857

[Lemma('grapple.v.02.grapple'), Lemma('grapple.v.02.grip')]
[Synset('seize.v.01')]
0.2857142857142857

[Lemma('fascinate.v.02.fascinate'), Lemma('fascinate.v.02.transfix'), Lemma('fascinate.v.02.grip'), Lemma('fascinate.v.02.spellbind')]
[Synset('interest.v.01')]
0.25

[Lemma('absorbing.s.01.absorbing'), Lemma('absorbing.s.01.engrossing'), Lemma('absorbing.s.01.fascinating'), Lemma('absorbing.s.01.gripping'), Lemma('absorbing.s.01.riveting')]
[]
None
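NLTK's `Synset.wup_similarity` implements Wu-Palmer similarity, 2 * depth(lcs) / (depth(s1) + depth(s2)), where lcs is the lowest common subsumer (the deepest shared ancestor) of the two synsets. The toy version below makes that formula concrete on a hand-made taxonomy. The nodes and links are invented for illustration, not WordNet data, and NLTK's own depth counting may differ, so the numbers will not match the output above. It does show where the `None` values come from: when two synsets share no ancestor, as with the noun and adjective senses compared against the verb `affect.v.01` above, there is no subsumer to measure.

```python
# Toy taxonomy, child -> parent (hypothetical, NOT real WordNet data)
PARENT = {
    'affect': 'change',
    'chill': 'change',
    'fascinate': 'interest',
    'change': 'ROOT',
    'interest': 'ROOT',
}

def ancestors(node):
    """Return [node, parent, grandparent, ..., root]."""
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path

def depth(node):
    # the root has depth 1, so the denominator below is never zero
    return len(ancestors(node))

def wup(a, b):
    """Wu-Palmer similarity, or None if a and b share no ancestor."""
    anc_b = set(ancestors(b))
    # lowest common subsumer: first ancestor of a that is also an ancestor of b
    lcs = next((n for n in ancestors(a) if n in anc_b), None)
    if lcs is None:
        return None
    return 2 * depth(lcs) / (depth(a) + depth(b))

print(wup('chill', 'affect'))      # siblings under 'change'
print(wup('fascinate', 'affect'))  # only the root is shared
print(wup('memorable', 'affect'))  # None: not in the taxonomy at all
```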

In [315]:
review_df.groupby(['book_id', 'author_name', 'title']).size()
Out[315]:
book_id   author_name      title                                  
19063     Markus Zusak     The Book Thief                             10547
41865     Stephenie Meyer  Twilight (Twilight, #1)                     9826
2767052   Suzanne Collins  The Hunger Games (The Hunger Games, #1)    17403
6148028   Suzanne Collins  Catching Fire (The Hunger Games, #2)       11057
7260188   Suzanne Collins  Mockingjay (The Hunger Games, #3)          12607
10818853  E.L. James       Fifty Shades of Grey (Fifty Shades, #1)    10257
11870085  John Green       The Fault in Our Stars                     19151
13335037  Veronica Roth    Divergent (Divergent, #1)                   9866
22557272  Paula Hawkins    The Girl on the Train                      12624
dtype: int64
In [321]:
sample_size = 1000
# all reviews of The Hunger Games (book_id 2767052), then a random sample of them
hg1_df = review_df[review_df.book_id == 2767052]
sample_hg1_df = hg1_df.sample(sample_size)
docs_hg1 = [nlp(text) for text in get_sample_review_texts(sample_hg1_df)]
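`hg1_df.sample(sample_size)` draws a different set of 1,000 reviews on every run, so the groupings printed below cannot be reproduced exactly. pandas' `DataFrame.sample` accepts a `random_state` argument for this purpose. The standard-library sketch below illustrates the same principle with a seeded generator over hypothetical review IDs (17,403 of them, matching the number of Hunger Games reviews listed above); `sample_ids` is an illustrative helper, not part of the notebook.

```python
import random

def sample_ids(ids, k, seed):
    """Draw k items without replacement; the same seed gives the same sample."""
    rng = random.Random(seed)  # local generator, leaves the global random state alone
    return rng.sample(ids, k)

review_ids = [f"r{i}" for i in range(17403)]  # hypothetical IDs, one per review
a = sample_ids(review_ids, 1000, seed=42)
b = sample_ids(review_ids, 1000, seed=42)
print(a == b, len(set(a)))  # True 1000
```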
In [322]:
tf_lemma_pos = get_lemma_pos_tf_index(docs + docs_hg1)
In [326]:
# group_by_child (defined earlier) appears to map each (lemma, POS) token to the
# counts of the (lemma, POS) tokens it is linked to in the dependency parse
child_group_hg1 = group_by_child(docs_hg1, tf_lemma_pos, token_pos_types, max_threshold=5)

# tokens that occur in both this sample and the earlier one (child_group)
shared_tokens = [token for token in child_group_hg1 if token in child_group]

# inspect a single token in the Hunger Games sample only
token_lemma_pos = ('Katniss', 'PROPN')
if token_lemma_pos in child_group_hg1:
    print(token_lemma_pos)
    for token_pos in token_pos_types:
        print('\t', token_pos, '\t', [lemma for lemma, pos in child_group_hg1[token_lemma_pos] if pos == token_pos])
    print()

# for shared tokens with at least 20 links in this sample, print per POS the linked
# lemmas from the earlier sample (first line) and from this sample (second line)
for token_lemma_pos in shared_tokens:
    if sum(child_group_hg1[token_lemma_pos].values()) < 20:
        continue
    print(token_lemma_pos)
    for token_pos in token_pos_types:
        print('\t', token_pos, '\t', [lemma for lemma, pos in child_group[token_lemma_pos] if pos == token_pos])
        print('\t', token_pos, '\t', [lemma for lemma, pos in child_group_hg1[token_lemma_pos] if pos == token_pos])
        print()
('Katniss', 'PROPN')
	 ADJ 	 []
	 NOUN 	 ['prowess', 'behaviour', 'mask', 'monologueing', 'selflessness', 'crisis', 'tomboyishness', 'forehead', 'pin', 'volunter', 'reluctance', 'falseness', 'eyesight', 'volenteer', 'hunt', 'viewpoint']
	 PROPN 	 ['Goes', 'Szeyenne', 'story', 'Ngunit', 'Everdeen-', 'edumped']

('read', 'VERB')
	 ADJ 	 ['exhausting', 'stoked', 'eager']
	 ADJ 	 ['hesitant', 'psyched', 'adamant']

	 NOUN 	 ['generation', 'joy', 'push', 'try', 'sob']
	 NOUN 	 ['headache', 'wish', 'yearn', 'holiday', 'crime', 'reluctance', 'promise', 'thon', 'chore', 'million', 'blessing', 'urge', 'disclaimer', 'car', 'instance', 'hurry', 'category']

	 PROPN 	 []
	 PROPN 	 []

('Katniss', 'PROPN')
	 ADJ 	 ['redeeming', 'lead', 'twisted', 'married']
	 ADJ 	 []

	 NOUN 	 ['host', 'liberation', 'intervention', 'evolution', 'talk', 'desire', 'progression', 'hallucination', 'relationsip', 'pov', 'channel', 'trial', 'turmoil', 'doing', 'house', 'search', 'bout', 'escapade', 'involve', 'process', 'insight', 'vulnerability', 'mom', 'mark', 'trait', 'challenge', 'reflection', 'whining', 'recovery', 'dilemma', 'bow', 'rumination', 'obsession', 'attempt', 'plague', 'vote']
	 NOUN 	 ['prowess', 'behaviour', 'mask', 'monologueing', 'selflessness', 'crisis', 'tomboyishness', 'forehead', 'pin', 'volunter', 'reluctance', 'falseness', 'eyesight', 'volenteer', 'hunt', 'viewpoint']

	 PROPN 	 ['Everdden', 'Everdean', 'Everdine', 'p.o.v']
	 PROPN 	 ['Goes', 'Szeyenne', 'story', 'Ngunit', 'Everdeen-', 'edumped']

('book', 'NOUN')
	 ADJ 	 ['underwhelming', 'touching', 'watery', 'mesmerizing', 'unpredictable', 'special']
	 ADJ 	 ['unfun', 'intolerable', 'prescient', 'remarkable', 'provoking']

	 NOUN 	 ['wasn', 'hiding', 'tour', 'defense', 'plotline', 'content', 'hangover', 'club', 'weakness', 'center', 'recovery']
	 NOUN 	 ['length', '#', 'jacket', 'chore', 'report', 'anthem', 'seductiveness', 'readability', 'worm', 'advance', 'hipster']

	 PROPN 	 []
	 PROPN 	 ['Gregor', 'can;t']

('little', 'ADJ')
	 ADJ 	 ['tiring', 'skewed', 'hokey', 'lost', 'spastic', 'agonizing', 'underwhelmed', 'dense', 'odd', 'extreme', 'facetious', 'akward', 'nervous', 'wary', 'eyed', 'paranoid', 'ridiculous', 'anxious', 'wordy']
	 ADJ 	 ['grumpy', 'dimensional', 'intimidated', 'thin', 'gross', 'icky', 'childish', 'shaky', 'clunky', 'forced', 'cheesy']

	 NOUN 	 ['fit', 'annoyance', 'visit', 'package', 'talk', 'bow', 'light', 'meh', 'redemption', 'primrose', 'guarantee', 'heartedness', 'duck', 'distinction', 'outside', 'dark', 'context', 'pansy']
	 NOUN 	 ['suspension', 'guidance', 'showing', 'publicity', 'juvenile', 'duplicity', 'dark', 'ambiguity', 'quibble', 'mention', 'machina', 'batter', 'roll']

	 PROPN 	 ['Duck']
	 PROPN 	 ['OTT']

('very', 'ADV')
	 ADJ 	 ['unconvincing', 'lacking', 'cringy', 'imaginative', 'worthwhile', 'intriguing', 'impulsive', 'hatefull', 'military', 'consistent', 'loyal', 'worried', 'rich', 'fond', 'sympathetic', 'peculiar', 'shallow', 'sensitive', 'touching', 'vivid', 'impressive', 'subtle', 'substantial', 'narrow', 'sensetive', 'mediocre', 'conflicting', 'independent', 'tangible', 'mopey', 'precious', 'conflicted', 'telling', 'cranky', 'concerned', 'dissapointed']
	 ADJ 	 ['unlikely', 'telling', 'energetic', 'concerned', 'intertaining', 'addicting', 'talented', 'sensible', 'touching', 'unmoved', 'admirable', 'foward', 'interior', 'connected', 'distinct', 'devoted', 'torn', 'poignant', 'convenient', 'cryptic', 'critical', 'resourceful', 'absorbing', 'english', 'boyish', 'unlucky', 'unusual', 'squicky', 'approachable', 'angsty', 'cinematic', 'gripping']

	 NOUN 	 ['end!!!!!!!!!!I']
	 NOUN 	 ['sublty', 'gentleman']

	 PROPN 	 []
	 PROPN 	 []

('more', 'ADV')
	 ADJ 	 ['sophisticated', 'bearable', 'insipid', 'detailed', 'introspective', 'thoughtful', 'eloquent', 'triumphant', 'caring', 'hidden', 'focused', 'lovely', 'meaningful', 'understandable', 'readable', 'serious', 'likable', 'rounded', 'concerned', 'critical', 'evident', 'subtle', 'valuable', 'convincing', 'heightened', 'positive', 'natural', 'vulnerable', 'paced', 'gripping', 'forgiving', 'acquainted', 'profound', 'frequent', 'likely', 'effective', 'ideal', 'integral', 'redemptive', 'explicit', 'practical']
	 ADJ 	 ['articulate', 'savage', 'substantial', 'suitable', 'dimensional', 'meaty', 'contrived', 'psychotic', 'endearing', 'accessible', 'jaded', 'superficial', 'lethal', 'layered', 'imagined', 'appealing', 'stylistic', 'involved', 'coherent', 'poetic', 'mystified', 'enthralling', 'enthralled', 'amorous']

	 NOUN 	 ['gravitas']
	 NOUN 	 []

	 PROPN 	 []
	 PROPN 	 []

('so', 'ADV')
	 ADJ 	 ['frustrating', 'brave', 'cringy', 'staged', 'unsure', 'jaded', 'thankful', 'badass', 'intelligent', 'ish', 'attached', 'limited', 'underwhelming', 'hollow', 'indifferent', 'unhappy', 'lackluster', 'aggrivated', 'distant', 'impure', 'dissapointing', 'ephemeral', 'unsettled', 'conflicted', 'damaged', 'descriptive', 'dangerous', 'climatic', 'surprising', 'riveting', 'repetitive', 'disjointed', 'pitiful', 'cliche', 'extraordinary', 'stunned', 'masculine', 'stoked', 'dire', 'fustrating', 'skillful', 'frightening', 'enthralled', 'saccharine', 'exhausted', 'climactic', 'detailed', 'naive', 'gloomy', 'connected', 'mesmerizing', 'engrossing', 'likeable', 'grateful', 'special']
	 ADJ 	 ['observant', 'charming', 'wide', 'cheesy', 'repetitive', 'colourful', 'incredibly', 'stressful', 'hyper', 'thick', 'rounded', 'tardy', 'oblivious', 'unneccessary', 'sucky', 'distressed', 'lucky', 'hurt', 'discrete', 'connected', 'blurry', 'menacing', 'immersed', 'vicious', 'fond']

	 NOUN 	 ['reference', 'scatter', 'heartbreaking--']
	 NOUN 	 ['manything', 'applause']

	 PROPN 	 ['wright']
	 PROPN 	 ['edumped', 'twisted!After']
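The printed lists are easiest to compare by eye, but the overlap between the two samples can also be put into a number. A minimal sketch using Jaccard similarity (a choice made here, not something the notebook computes), applied to short excerpts copied from the ('very', 'ADV') ADJ lines above: the first excerpt from the earlier sample (`child_group`), the second from the Hunger Games sample (`child_group_hg1`).

```python
def jaccard(a, b):
    """Size of the intersection divided by size of the union (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Excerpts of the ADJ lemmas printed for ('very', 'ADV') above
sample_a = ['touching', 'vivid', 'telling', 'concerned', 'subtle']  # child_group
sample_b = ['telling', 'concerned', 'touching', 'gripping']         # child_group_hg1
print(sorted(set(sample_a) & set(sample_b)))  # ['concerned', 'telling', 'touching']
print(jaccard(sample_a, sample_b))            # 0.5
```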

In [198]:
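from collections import Counter
from itertools import combinations

def filter_doc_terms(token_set, filter_terms):
    """Keep only the tokens that appear in filter_terms."""
    return [token for token in token_set if token in filter_terms]

def doc_generator(docs, use_sentences=False):
    """Yield each spaCy Doc, or each of its sentences if use_sentences is True."""
    for doc in docs:
        if use_sentences:
            yield from doc.sents
        else:
            yield doc

def get_cooc(docs, filter_terms=None, use_sentences=False, use_lemma=False):
    """Count, once per review or per sentence, each pair of terms returned by
    get_doc_token_set (defined earlier) for that review or sentence."""
    cooc = Counter()
    for doc in doc_generator(docs, use_sentences=use_sentences):
        token_set = get_doc_token_set(doc, use_lemma=use_lemma)
        if filter_terms:
            token_set = filter_doc_terms(token_set, filter_terms)
        # sorting makes pairs order-independent: ('Games', 'Hunger'), never ('Hunger', 'Games')
        cooc.update(combinations(sorted(token_set), 2))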
    return cooc
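`get_cooc` counts a pair once per counting unit, so switching `use_sentences` from `False` to `True` changes what co-occurring means: anywhere in the same review, or within the same sentence. The sketch below reproduces that logic on plain lists of lemmas instead of spaCy `Doc` objects, so it runs without spaCy; `cooc_counts` is a hypothetical stand-in that assumes `get_doc_token_set` returns the set of lemmas. It shows why the rankings in the two cells below differ: a tight pair such as ('Games', 'Hunger') can be counted in several sentences of one review, while looser associations such as ('Katniss', 'end') lose counts at sentence level.

```python
from collections import Counter
from itertools import combinations

def cooc_counts(units, filter_terms=None):
    """Count each unordered term pair once per unit (a review or a sentence)."""
    cooc = Counter()
    for unit in units:
        terms = set(unit)  # presence within the unit, not frequency
        if filter_terms is not None:
            terms &= set(filter_terms)
        cooc.update(combinations(sorted(terms), 2))
    return cooc

# Three toy "reviews", each a list of sentences, each sentence a list of lemmas
reviews = [
    [['Katniss', 'love', 'Peeta'], ['book', 'end']],
    [['Hunger', 'Games', 'book'], ['Katniss', 'end']],
    [['Hunger', 'Games'], ['Hunger', 'Games', 'end']],
]
doc_level = cooc_counts([[t for sent in r for t in sent] for r in reviews])
sent_level = cooc_counts([sent for r in reviews for sent in r])
print(doc_level[('Katniss', 'end')], sent_level[('Katniss', 'end')])  # 2 1
print(doc_level[('Games', 'Hunger')], sent_level[('Games', 'Hunger')])  # 2 3
```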
In [199]:
# keep terms counted at least 100 times in df, skipping the whitespace token '  '
common_terms = [term for term, freq in df.most_common() if freq >= 100 and term != '  ']
cooc = get_cooc(docs, filter_terms=common_terms, use_sentences=False, use_lemma=True)
cooc.most_common(50)
Out[199]:
[(('book', 'read'), 308),
 (('book', 'end'), 280),
 (('book', 'series'), 271),
 (('book', 'like'), 271),
 (('Katniss', 'book'), 254),
 (('book', 'good'), 214),
 (('book', 'think'), 213),
 (('book', 'love'), 212),
 (('book', 'character'), 201),
 (('Katniss', 'end'), 199),
 (('book', 'ending'), 198),
 (('read', 'series'), 198),
 (('book', 'feel'), 196),
 (('end', 'series'), 190),
 (('Games', 'Hunger'), 189),
 (('Katniss', 'read'), 184),
 (('like', 'read'), 179),
 (('end', 'read'), 178),
 (('end', 'like'), 177),
 (('book', 'way'), 176),
 (('Katniss', 'like'), 174),
 (('Collins', 'book'), 170),
 (('book', 'story'), 169),
 (('Games', 'book'), 168),
 (('Hunger', 'book'), 168),
 (('book', 'time'), 164),
 (('like', 'series'), 163),
 (('Katniss', 'series'), 161),
 (('read', 'think'), 154),
 (('end', 'think'), 153),
 (('Katniss', 'think'), 153),
 (('Katniss', 'Peeta'), 152),
 (('Katniss', 'character'), 148),
 (('end', 'love'), 148),
 (('character', 'end'), 147),
 (('like', 'think'), 146),
 (('book', 'trilogy'), 146),
 (('Katniss', 'love'), 146),
 (('character', 'series'), 145),
 (('good', 'read'), 145),
 (('Peeta', 'book'), 144),
 (('love', 'series'), 144),
 (('feel', 'read'), 144),
 (('love', 'read'), 144),
 (('feel', 'like'), 143),
 (('good', 'series'), 142),
 (('character', 'read'), 141),
 (('ending', 'like'), 141),
 (('end', 'good'), 140),
 (('book', 'want'), 139)]
In [200]:
cooc = get_cooc(docs, filter_terms=common_terms, use_sentences=True, use_lemma=True)
cooc.most_common(50)
Out[200]:
[(('Games', 'Hunger'), 283),
 (('book', 'read'), 200),
 (('book', 'like'), 179),
 (('Katniss', 'Peeta'), 164),
 (('Katniss', 'book'), 144),
 (('book', 'series'), 129),
 (('book', 'good'), 126),
 (('Katniss', 'end'), 117),
 (('book', 'end'), 117),
 (('feel', 'like'), 115),
 (('book', 'think'), 109),
 (('book', 'love'), 109),
 (('Gale', 'Katniss'), 108),
 (('Gale', 'Peeta'), 103),
 (('book', 'feel'), 100),
 (('Collins', 'Suzanne'), 98),
 (('Katniss', 'like'), 90),
 (('end', 'series'), 82),
 (('Collins', 'book'), 81),
 (('read', 'series'), 78),
 (('book', 'character'), 77),
 (('Hunger', 'book'), 74),
 (('Katniss', 'think'), 74),
 (('Games', 'book'), 73),
 (('Katniss', 'love'), 73),
 (('end', 'like'), 69),
 (('book', 'trilogy'), 67),
 (('Katniss', 'feel'), 66),
 (('Katniss', 'character'), 65),
 (('Peeta', 'end'), 65),
 (('book', 'story'), 63),
 (('end', 'way'), 63),
 (('book', 'way'), 62),
 (('book', 'time'), 60),
 (('Peeta', 'love'), 60),
 (('end', 'think'), 58),
 (('book', 'thing'), 56),
 (('book', 'finish'), 56),
 (('Mockingjay', 'book'), 56),
 (('Games', 'series'), 55),
 (('like', 'series'), 55),
 (('Peeta', 'book'), 54),
 (('love', 'series'), 54),
 (('Hunger', 'series'), 54),
 (('book', 'go'), 54),
 (('read', 'time'), 53),
 (('book', 'final'), 53),
 (('Peeta', 'like'), 52),
 (('good', 'series'), 50),
 (('book', 'want'), 49)]
In [268]:
from helper import get_pmi_cooc

pmi_cooc = get_pmi_cooc(df, cooc, filter_terms=common_terms)

for ti, term_pair in enumerate(pmi_cooc):
    print(term_pair, pmi_cooc[term_pair])
    if ti == 10:
        break
('Games', 'Hunger') 6.460479234356834
('Collins', 'Suzanne') 6.088259967047493
('Gale', 'Peeta') 5.99832727669207
('Katniss', 'Peeta') 5.610687390134804
('Gale', 'Katniss') 5.495901734017394
('Collins', 'write') 4.992401742361956
('feel', 'like') 4.943319340093928
('Games', 'final') 4.925125255903836
('Hunger', 'final') 4.915074920050335
('Games', 'Mockingjay') 4.733835029127121
('Hunger', 'Mockingjay') 4.72378469327362
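The scores above come from `get_pmi_cooc` in the project's `helper` module, whose source is not shown here. For reference, the standard definition of pointwise mutual information is PMI(x, y) = log p(x, y) / (p(x) p(y)). The sketch below computes it from counts, assuming each probability is a count divided by the number of counting units (reviews or sentences) and a base-2 logarithm; the helper may use another base or normalisation, so its numbers need not match.

```python
import math

def pmi(pair_count, count_x, count_y, n_units, base=2):
    """log( p(x,y) / (p(x) * p(y)) ), each probability estimated as count / n_units."""
    p_xy = pair_count / n_units
    p_x = count_x / n_units
    p_y = count_y / n_units
    return math.log(p_xy / (p_x * p_y), base)

# made-up counts: 1,000 sentences, x in 100, y in 50, together in 20
print(pmi(20, 100, 50, 1000))  # about 2.0: four times more often than chance
print(pmi(5, 100, 50, 1000))   # about 0.0: exactly as often as chance
```

A PMI of 0 means the pair co-occurs exactly as often as independence predicts. PMI is known to inflate scores for rare pairs, which is one reason to restrict it to frequent terms, as `common_terms` does above.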
In [296]:
from collections import Counter
from itertools import combinations
from helper import get_doc_content_chunks

# content chunks from every review (get_doc_content_chunks is defined in helper)
token_sets = [sent_chunks for doc in docs for sent_chunks in get_doc_content_chunks(doc)]
# use lemmas, except for pronouns, which spaCy 2.x lemmatizes to '-PRON-'
token_sets = [[token.lemma_ if token.lemma_ != '-PRON-' else token.text for token in token_set] for token_set in token_sets]
token_freq = Counter(token for token_set in token_sets for token in token_set)
# pairs keep their positional order and repeated lemmas are not removed, so both
# ('Gale', 'Peeta') and ('Peeta', 'Gale'), and self-pairs such as ('lot', 'lot'), are counted
cooc_freq = Counter(token_pair for token_set in token_sets for token_pair in combinations(token_set, 2))
pmi_cooc = get_pmi_cooc(token_freq, cooc_freq, filter_terms=common_terms)

for ti, term_pair in enumerate(pmi_cooc):
    print(term_pair, pmi_cooc[term_pair])
    if ti == 10:
        break
('ending', 'trilogy') 5.6627197008620245
('lot', 'lot') 3.73396263103296
('like', 'Games') 3.703295832639988
('way', 'way') 3.5018627676966796
('enjoy', 'series') 3.13889763482435
('series', 'book') 3.086841272868297
('like', 'ending') 3.0101486520800425
('Suzanne', 'character') 2.9081494841249214
('lot', 'people') 2.805975859395614
('story', 'thing') 2.583653438550424
('book', 'book') 2.541614222384974
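Unlike `get_cooc`, the cell above forms pairs with `combinations` on the raw lemma list, without sorting or removing repeats. That is likely why self-pairs such as ('lot', 'lot') and ('book', 'book') appear, and why ('series', 'book') is listed in that order. The small example below, on a made-up chunk, contrasts the two ways of forming pairs; whether `PMICOOC` does the same internally is not shown in this notebook.

```python
from collections import Counter
from itertools import combinations

chunk = ['lot', 'people', 'lot']  # hypothetical chunk with a repeated lemma

# as in the cell above: positional order, repeats kept
raw = Counter(combinations(chunk, 2))
# as in get_cooc: unordered pairs of distinct lemmas
clean = Counter(combinations(sorted(set(chunk)), 2))

print(sorted(raw))    # [('lot', 'lot'), ('lot', 'people'), ('people', 'lot')]
print(sorted(clean))  # [('lot', 'people')]
```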
In [313]:
from scripts.pmi import PMICOOC
pmi_cooc = PMICOOC(token_sets, filter_terms=common_terms)

for term in pmi_cooc.highest(5):
    print(term, pmi_cooc[term])
('ending', 'trilogy') 5.6627197008620245
('lot', 'lot') 3.73396263103296
('like', 'Games') 3.703295832639988
('way', 'way') 3.5018627676966796
('enjoy', 'series') 3.13889763482435
('series', 'book') 3.086841272868297
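The call `pmi_cooc.highest(5)` above printed six pairs. The source of `PMICOOC.highest` is not shown, so this may be intended (for example, an inclusive bound). If exactly k pairs are wanted, a hypothetical helper built on `heapq.nlargest` returns them, best first. The scores below are rounded values from the output above.

```python
import heapq

def highest(scores, k):
    """Return the k keys with the largest score, best first."""
    return heapq.nlargest(k, scores, key=scores.get)

scores = {('ending', 'trilogy'): 5.66, ('lot', 'lot'): 3.73,
          ('like', 'Games'): 3.70, ('way', 'way'): 3.50}
print(highest(scores, 2))  # [('ending', 'trilogy'), ('lot', 'lot')]
```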
In [317]:
for term in pmi_cooc:
    print(term, pmi_cooc[term])
('ending', 'trilogy') 5.6627197008620245
('lot', 'lot') 3.73396263103296
('like', 'Games') 3.703295832639988
('way', 'way') 3.5018627676966796
('enjoy', 'series') 3.13889763482435
('series', 'book') 3.086841272868297
('like', 'ending') 3.0101486520800425
('Suzanne', 'character') 2.9081494841249214
('lot', 'people') 2.805975859395614
('story', 'thing') 2.583653438550424
('book', 'book') 2.541614222384974
('series', 'leave') 2.476760691006965
('thing', 'happen') 2.34846154872004
('great', 'way') 2.343352337447501
('Collins', 'write') 2.314179600159026
('time', 'trilogy') 2.312815613587419
('lot', 'war') 2.2233705532354926
('Peeta', 'Gale') 2.211760809640805
('Gale', 'Peeta') 2.211760809640805
('come', 'series') 2.208959989109013
('Suzanne', 'write') 2.1851493404152946
('write', 'Suzanne') 2.1851493404152946
('great', 'character') 2.14268164198535
('character', 'great') 2.14268164198535
('Collins', 'war') 2.135814662211173
('people', 'lot') 2.1128286788356685
('Gale', 'character') 2.0972192679085926
('Suzanne', 'war') 2.0608516237377175
('war', 'people') 1.9885309621580913
('Suzanne', 'happen') 1.9798104137345918
('Collins', 'final') 1.9798104137345918
('Suzanne', 'people') 1.9503097493378938
('people', 'Suzanne') 1.9503097493378938
('think', 'series') 1.9364527146764674
('final', 'Gale') 1.9048473752611363
('trilogy', 'people') 1.9015195851684619
('write', 'way') 1.8850447479649568
('way', 'write') 1.8850447479649568
('story', 'happen') 1.8845002339302668
('great', 'great') 1.8779890877582677
('enjoy', 'people') 1.8626041689187882
('story', 'get') 1.854999569533569
('time', 'happen') 1.8487821513281877
('enjoy', 'enjoy') 1.847219250079309
('time', 'get') 1.8192814869314895
('go', 'way') 1.8140930119926724
('story', 'Gale') 1.8095371954568116
('enjoy', 'character') 1.7218316150377058
('know', 'happen') 1.7217288110591014
('find', 'happen') 1.6882895649756453
('end', 'series') 1.6861581742963068
('character', 'finish') 1.6843740525028057
('people', 'enjoy') 1.6802826121248335
('read', 'series') 1.6790407065274429
('get', 'little') 1.6555102091172489
('find', 'way') 1.6386443344864796
('time', 'thing') 1.6316446240741898
('like', 'series') 1.6238542909601519
('think', 'lot') 1.5881460204082518
('way', 'little') 1.5612576708710595
('little', 'way') 1.5612576708710595
('Peeta', 'character') 1.5568348419010574
('Katniss', 'Gale') 1.5560293028176242
('Katniss', 'Peeta') 1.5437123070105858
('write', 'spoiler') 1.5432954542428998
('feel', 'Suzanne') 1.5422639946362378
('thing', 'little') 1.5368662177469004
('go', 'happen') 1.5323811065273951
('little', 'time') 1.5071904496007837
('time', 'little') 1.5071904496007837
('enjoy', 'trilogy') 1.4806695582208178
('write', 'trilogy') 1.4432119956859175
('find', 'character') 1.4379736390243285
('think', 'Suzanne') 1.4256270909104771
('way', 'go') 1.408627903884508
('war', 'war') 1.4059256559979703
('book', 'great') 1.4050826988545706
('great', 'book') 1.4050826988545706
('enjoy', 'book') 1.3896977800150914
('Collins', 'leave') 1.3841185693253595
('final', 'Peeta') 1.3644629492536011
('Mockingjay', 'war') 1.3528158306840217
('spoiler', 'come') 1.3439625516224087
('know', 'go') 1.3328678580029723
('end', 'Peeta') 1.3280953050827264
('want', 'know') 1.3272430942782067
('want', 'happen') 1.3119810411589363
('series', 'love') 1.3023545629693218
('time', 'want') 1.2883112970730315
('character', 'love') 1.2715829043025684
('story', 'Peeta') 1.2691527694492766
('leave', 'way') 1.2603653666824723
('thing', 'thing') 1.255855284112142
('finish', 'Mockingjay') 1.2539699960473891
('Collins', 'trilogy') 1.2378730690052144
('people', 'thing') 1.2203485956552322
('story', 'Katniss') 1.2155966649803143
('get', 'happen') 1.2143425715950202
('feel', 'little') 1.1986742902461607
('finish', 'read') 1.1972026196347048
('way', 'finish') 1.1918975674050114
('spoiler', 'feel') 1.1880921809156237
('people', 'people') 1.1848419071983223
('love', 'way') 1.184571527312939
('write', 'write') 1.184517460107389
('Mockingjay', 'spoiler') 1.1427443609371646
('Suzanne', 'feel') 1.1367988865280734
('book', 'time') 1.123231546713583
('know', 'lot') 1.116137521523476
('good', 'great') 1.1119805589334504
('book', 'find') 1.1058398040017139
('time', 'go') 1.0668786101624514
('spoiler', 'go') 1.066878610162451
('think', 'write') 1.049149519675565
('Mockingjay', 'trilogy') 1.042660902380182
('feel', 'character') 1.0414887067237484
('think', 'Collins') 1.0379666074358191
('read', 'book') 1.0284531403862938
('Suzanne', 'Katniss') 1.0232247723328585
('want', 'end') 1.0183288017206515
('enjoy', 'Collins') 1.016636095961586
('Suzanne', 'go') 1.0155853157749009
('time', 'war') 1.0135326294571585
('war', 'time') 1.0135326294571585
('enjoy', 'write') 0.9988314713280796
('book', 'feel') 0.9970369441529148
('leave', 'feel') 0.9930921658618983
('write', 'character') 0.9912268719428605
('Collins', 'way') 0.9865586407243083
('Gale', 'Katniss') 0.9854844443500113
('love', 'character') 0.9839008318507876
('love', 'Peeta') 0.9806593379266164
('think', 'go') 0.9781977837032804
('Gale', 'leave') 0.9726832942306912
('feel', 'thing') 0.9664535207808344
('Gale', 'Gale') 0.9603857664202851
('get', 'way') 0.9570579763276105
('happen', 'happen') 0.9561611635399372
('know', 'Suzanne') 0.9536185920257009
('Suzanne', 'know') 0.9536185920257009
('thing', 'time') 0.9384974435142445
('come', 'people') 0.9326665232034513
('thing', 'get') 0.9326665232034513
('go', 'go') 0.9248309525064365
('Gale', 'way') 0.9115956022508531
('find', 'end') 0.9091295097556594
('think', 'end') 0.906833297495309
('war', 'leave') 0.9055439913930627
('Peeta', 'Katniss') 0.900575547025302
('trilogy', 'want') 0.900545766064268
('find', 'find') 0.8740381899443894
('lot', 'like') 0.8700824885837721
('like', 'Peeta') 0.8603263136384072
('like', 'book') 0.8554836891626193
('little', 'go') 0.8543171681787783
('feel', 'way') 0.8366942940777351
('thing', 'leave') 0.8305088054501487
('leave', 'thing') 0.8305088054501487
('people', 'good') 0.8242984864816697
('want', 'finish') 0.8241727872796942
('know', 'book') 0.8138566496505423
('Katniss', 'end') 0.8119156786656515
('Katniss', 'character') 0.8101315568721499
('enjoy', 'come') 0.7994985687075882
('read', 'Mockingjay') 0.7966515263289691
('Katniss', 'go') 0.7893695654237208
('Collins', 'character') 0.7858879452621571
('happen', 'character') 0.7858879452621571
('Collins', 'come') 0.7798456308061944
('enjoy', 'leave') 0.7796171981537594
('go', 'leave') 0.7765684153044009
('end', 'Gale') 0.7698674424221519
('think', 'happen') 0.7697026208411397
('thing', 'write') 0.762041006172688
('enjoy', 'find') 0.7524310578496025
('want', 'get') 0.7434838760295515
('leave', 'write') 0.7421596356188591
('write', 'leave') 0.7421596356188591
('leave', 'finish') 0.7421596356188591
('think', 'good') 0.7414485803332919
('book', 'Collins') 0.7414361826913233
('find', 'come') 0.7387841440084206
('thing', 'end') 0.7330534692994355
('find', 'Collins') 0.7327781199482086
('great', 'finish') 0.726534317715778
('think', 'little') 0.7253624425816675
('feel', 'finish') 0.7239536711222865
('Collins', 'feel') 0.7192854399037342
('know', 'leave') 0.7146016915552009
('trilogy', 'read') 0.7139598104838558
('know', 'end') 0.7124565352088127
('get', 'book') 0.7119355182946253
('write', 'enjoy') 0.7111493988762986
('character', 'Gale') 0.7109249067887018
('like', 'Gale') 0.7075635590859971
('like', 'Suzanne') 0.7075635590859971
('leave', 'little') 0.7060546309767428
('know', 'war') 0.7041577323941179
('think', 'Gale') 0.6947395823676844
('Collins', 'go') 0.6850832461401916
('enjoy', 'little') 0.6750443942341823
('feel', 'leave') 0.6746384347433637
('finish', 'finish') 0.6736918363413983
('like', 'thing') 0.6707495859632806
('read', 'story') 0.6674397948489631
('Gale', 'book') 0.666473144217868
('love', 'finish') 0.6663657962493253
('think', 'time') 0.6659901690816986
('think', 'get') 0.6601592487709054
('get', 'go') 0.6555825817434937
('way', 'know') 0.6535139995753627
('finish', 'know') 0.6461338922777402
('Mockingjay', 'little') 0.6425008465017109
('want', 'write') 0.6418512304857396
('book', 'good') 0.6390741700297535
('come', 'war') 0.6377432894951106
('war', 'thing') 0.6377432894951106
('write', 'little') 0.6375868316992819
('great', 'like') 0.635242897506371
('like', 'end') 0.6334555869322751
('read', 'time') 0.6317217122468838
('Mockingjay', 'leave') 0.6292906147649046
('think', 'Katniss') 0.6268921477879663
('want', 'thing') 0.624839884659203
('feel', 'happen') 0.6239752600994095
('Collins', 'want') 0.618833860598991
('little', 'end') 0.6085992948260295
('Katniss', 'get') 0.6060636244615027
('time', 'know') 0.5994467783050872
('great', 'Peeta') 0.5989951071140298
('know', 'people') 0.593615857994294
('happen', 'end') 0.593516052614701
('think', 'war') 0.5883795663767747
('come', 'Mockingjay') 0.5846334641811621
('love', 'love') 0.5820787150211242
('find', 'want') 0.5777723738012172
('know', 'know') 0.5777539536938271
('Collins', 'Katniss') 0.5749396670417656
('come', 'thing') 0.5627081035521967
('thing', 'come') 0.5627081035521967
('Collins', 'happen') 0.5506960554317728
('know', 'Gale') 0.5481534839175365
('Gale', 'know') 0.5481534839175365
('book', 'Mockingjay') 0.546224016066291
('Mockingjay', 'book') 0.546224016066291
('feel', 'Katniss') 0.5428583560515761
('Katniss', 'find') 0.5338781802439918
('leave', 'end') 0.530850541951652
('people', 'come') 0.5272014150952868
('thing', 'people') 0.5272014150952868
('Collins', 'people') 0.521195391035075
('happen', 'people') 0.521195391035075
('think', 'character') 0.5193866948898233
('know', 'Peeta') 0.518594681675992
('Gale', 'end') 0.5185530141412458
('think', 'come') 0.5133443804338607
('read', 'happen') 0.5122906126921154
('Peeta', 'little') 0.5100476210975337
('little', 'Peeta') 0.5100476210975337
('Katniss', 'want') 0.5095460795844611
('know', 'read') 0.5046683719777977
('happen', 'go') 0.5027616893462369
('want', 'little') 0.500385710185797
('know', 'love') 0.49570700854499405
('feel', 'time') 0.49494500035567873
('enjoy', 'read') 0.49272283744022766
('people', 'get') 0.491694726638377
('leave', 'know') 0.4914581402409912
('like', 'find') 0.4901506824807723
('feel', 'get') 0.4891140800448854
('Katniss', 'time') 0.4867314018182897
('Peeta', 'love') 0.48166817180762866
('like', 'Katniss') 0.48134780873481714
('great', 'find') 0.4801339042373011
('enjoy', 'get') 0.47630980779889764
('happen', 'Gale') 0.47573301695831754
('go', 'Peeta') 0.4752008897673657
('finish', 'come') 0.47435893372090715
('know', 'feel') 0.4732521757444189
('write', 'feel') 0.4726392428413805
('go', 'book') 0.47035826529157765
('way', 'end') 0.4697628499718137
('like', 'Mockingjay') 0.46953139527803645
('Peeta', 'get') 0.46546371448950735
('know', 'Katniss') 0.4650385772070299
('Mockingjay', 'read') 0.46017928970775623
('little', 'Katniss') 0.4564915166285714
('go', 'end') 0.4559695278394779
('find', 'Peeta') 0.4539028920884314
('know', 'like') 0.44948195641050676
('know', 'come') 0.446800989657249
('Gale', 'get') 0.4462323525616195
('get', 'Gale') 0.4462323525616195
('great', 'Gale') 0.4462323525616195
('Gale', 'great') 0.4462323525616195
('time', 'write') 0.44468316557479043
('feel', 'Gale') 0.44365170596812803
('want', 'read') 0.44322729634584823
('book', 'read') 0.44066647548417454
('know', 'want') 0.4399398992773038
('get', 'finish') 0.43885224526399735
('little', 'come') 0.43825392907879057
('happen', 'want') 0.43651230380503636
('find', 'Gale') 0.4346715301605436
('end', 'end') 0.4333952058009389
('little', 'happen') 0.4322479050185786
('like', 'good') 0.4288501566169764
('know', 'write') 0.4229903409635305
('know', 'finish') 0.4229903409635305
('great', 'good') 0.41883337837350515
('get', 'know') 0.4112943012003392
('end', 'people') 0.40986470839074485
('want', 'Katniss') 0.40946262102747866
('feel', 'come') 0.40683773284541175
('book', 'want') 0.40410887975037707
('like', 'like') 0.40409761893295393
('great', 'read') 0.40274724062188094
('little', 'get') 0.40274724062188083
('feel', 'Collins') 0.4008317087851997
('go', 'good') 0.40039967668466736
('think', 'read') 0.3888902059604547
('Suzanne', 'love') 0.386063831095167
('love', 'Suzanne') 0.386063831095167
('feel', 'want') 0.3819581369627882
('people', 'feel') 0.37133104438850184
('feel', 'people') 0.37133104438850184
('Peeta', 'way') 0.37121117624331795
('finish', 'good') 0.3659908969991254
('write', 'good') 0.3659908969991254
('want', 'Gale') 0.36154926533158105
('Gale', 'little') 0.3572848665451234
('little', 'Gale') 0.3572848665451234
('enjoy', 'feel') 0.3559461255490225
('like', 'people') 0.3475608250545902
('people', 'like') 0.3475608250545902
('thing', 'Peeta') 0.34681972311915893
('good', 'good') 0.34597203010863326
('like', 'know') 0.33169892075412327
('finish', 'feel') 0.3184885630141222
('feel', 'write') 0.3184885630141222
('people', 'love') 0.31374316951554093
('get', 'Peeta') 0.31131303466224913
('Mockingjay', 'go') 0.3075495227212048
('read', 'find') 0.3041750412311751
('Peeta', 'find') 0.2997522122611732
('love', 'go') 0.2953094678267029
('want', 'war') 0.29440996249395257
('thing', 'Katniss') 0.29326361865019684
('book', 'enjoy') 0.29108549134698153
('come', 'go') 0.28562416209223923
('go', 'come') 0.28562416209223923
('go', 'thing') 0.28562416209223923
('thing', 'go') 0.28562416209223923
('trilogy', 'think') 0.2782246380729349
('come', 'come') 0.2750260311004159
('Katniss', 'leave') 0.2733822480963677
('know', 'think') 0.2696038950290279
('end', 'know') 0.2604714114657554
('get', 'Katniss') 0.257756930193287
('people', 'Katniss') 0.257756930193287
('people', 'think') 0.2546941406627408
('book', 'finish') 0.2536279288120812
('finish', 'book') 0.2536279288120812
('book', 'write') 0.2536279288120812
('go', 'get') 0.2501174736353292
('go', 'people') 0.2501174736353292
('great', 'go') 0.2501174736353292
('find', 'Katniss') 0.24619610779221102
('like', 'leave') 0.2454031073012873
('time', 'come') 0.24535026295429915
('Katniss', 'know') 0.24189502589282005
('read', 'go') 0.2412126952923697
('Collins', 'time') 0.23934423889408732
('Peeta', 'Peeta') 0.23629185834394714
('happen', 'get') 0.23351331858329408
('read', 'lot') 0.2321217235911176
('think', 'Peeta') 0.22846312851387138
('thing', 'know') 0.2236574383430393
('Collins', 'find') 0.22195249618221805
('come', 'love') 0.21571846534792813
('like', 'enjoy') 0.21439287055872705
('get', 'time') 0.20984357449738925
('Katniss', 'come') 0.20625224166056685
('go', 'Gale') 0.2046550995585718
('want', 'leave') 0.1994934059972098
('leave', 'want') 0.1994934059972098
('time', 'find') 0.19828275209631332
('go', 'finish') 0.19727499226094958
('people', 'know') 0.18815074988612948
('great', 'know') 0.18815074988612948
('Gale', 'happen') 0.18805094450653684
('war', 'enjoy') 0.181386574090557
('character', 'Peeta') 0.17054048078116676
('go', 'Katniss') 0.1703303570174975
('go', 'want') 0.16543438640529065
('feel', 'good') 0.16493830349910737
('come', 'Peeta') 0.16449816632520414
('read', 'read') 0.16426802063442092
('go', 'little') 0.16116998761883308
('Collins', 'good') 0.16065197031842227
('Peeta', 'want') 0.15763707594525905
('read', 'come') 0.1505718566270098
('come', 'leave') 0.1373616248902034
('good', 'time') 0.13698222623251757
('read', 'enjoy') 0.1360478935014953
('write', 'know') 0.1353082685117497
('leave', 'happen') 0.13135560082999143
('get', 'good') 0.1311513059217244
('finish', 'want') 0.13102560671974892
('write', 'want') 0.13102560671974892
('go', 'think') 0.13089992331607678
('feel', 'go') 0.12975379138545406
('time', 'read') 0.12089608848089323
('read', 'spoiler') 0.12089608848089302
('find', 'good') 0.11959048352064833
('feel', 'feel') 0.11743596951410425
('Peeta', 'end') 0.1116999807582332
('thing', 'enjoy') 0.10635138814764288
('enjoy', 'thing') 0.10635138814764288
('read', 'finish') 0.09859033096659497
('Katniss', 'good') 0.09788420493878523
('little', 'want') 0.09492060207763253
('time', 'feel') 0.08947989224751424
('Katniss', 'lot') 0.08713141316252361
('lot', 'Katniss') 0.08713141316252361
('Katniss', 'love') 0.07980537307045085
('Katniss', 'people') 0.0754353733993322
('Mockingjay', 'feel') 0.07208814953564498
('write', 'come') 0.0688938256127427
('come', 'finish') 0.0688938256127427
('finish', 'thing') 0.0688938256127427
('Mockingjay', 'happen') 0.06780181635495974
('Mockingjay', 'Collins') 0.06780181635495974
('Collins', 'finish') 0.06288780155253057
('happen', 'write') 0.06288780155253057
('finish', 'happen') 0.06288780155253057
('love', 'come') 0.06156778552066988
('love', 'feel') 0.05984809464114317
('come', 'book') 0.054295026191589914
('book', 'happen') 0.04828900213137792
('happen', 'book') 0.04828900213137792
('time', 'Mockingjay') 0.044132072269054995
('get', 'Mockingjay') 0.03830115195826193
('end', 'happen') 0.03390026467927831
('Collins', 'end') 0.03390026467927831
('write', 'get') 0.03338713715583276
('get', 'write') 0.03338713715583276
('people', 'finish') 0.03338713715583276
('come', 'read') 0.03278882097062636
('read', 'thing') 0.03278882097062636
('love', 'time') 0.03189201737455294
('think', 'way') 0.026910209792029096
('way', 'think') 0.026910209792029096
('Mockingjay', 'find') 0.02674032955718588
('find', 'Mockingjay') 0.02674032955718588
('love', 'people') 0.026061097063760082
('like', 'think') 0.02354886208932412
('Mockingjay', 'know') 0.02243924765779492
('read', 'trilogy') 0.02081262992391053
('end', 'leave') 0.020024918185661537
('book', 'get') 0.018788337734680024
('war', 'Peeta') 0.016389800953908365
('little', 'leave') 0.012907450416797465
('read', 'want') 0.007909225088002656
('find', 'book') 0.007227515333603956
('people', 'end') 0.004399600282580265
('think', 'thing') 0.0025187566678700193
('feel', 'end') 0.0018189536890887228
('Katniss', 'happen') -0.0004244778617961298
('go', 'feel') -0.0037776012390683955
('enjoy', 'end') -0.010985318556898898
('love', 'Gale') -0.019401277012997264
('Gale', 'love') -0.019401277012997264
('go', 'read') -0.021151569175121548
('thing', 'like') -0.02239759459666465
('book', 'Gale') -0.026674036342077354
('go', 'time') -0.03173367850565846
('think', 'people') -0.03298793178903991
('finish', 'end') -0.04844288109179921
('think', 'know') -0.048849836089506604
('think', 'want') -0.05313249788150729
('good', 'book') -0.054073010530191784
('war', 'come') -0.05540389106483467
('write', 'read') -0.05556034886066321
('character', 'feel') -0.05712358194436134
('Peeta', 'come') -0.05864538498900561
('Peeta', 'thing') -0.05864538498900561
('come', 'know') -0.0640246341087415
('end', 'good') -0.06846174798229139
('find', 'like') -0.06946510545465037
('think', 'feel') -0.07330890636537855
('like', 'go') -0.07633798474241234
('end', 'want') -0.08028348694745813
('Gale', 'go') -0.08302697289320887
('end', 'little') -0.08454788573391582
('war', 'get') -0.0909105795217445
('know', 'time') -0.09370040225485816
('Mockingjay', 'enjoy') -0.09486680253760127
('enjoy', 'Mockingjay') -0.09486680253760127
('feel', 'Peeta') -0.09673272003940706
('find', 'war') -0.10247140192282031
('like', 'finish') -0.11074676442795403
('come', 'Katniss') -0.11220148945796768
('end', 'think') -0.11481795003667217
('feel', 'read') -0.12308154973615869
('love', 'know') -0.12333219986122934
('want', 'want') -0.1239585504501197
('want', 'love') -0.12761486165323013
('love', 'think') -0.13089678123833973
('leave', 'Katniss') -0.1320828600117966
('Mockingjay', 'write') -0.1323243650725016
('think', 'leave') -0.13514564954234243
('Gale', 'war') -0.13637295359850174
('come', 'happen') -0.1364451010679606
('happen', 'thing') -0.1364451010679606
('thing', 'Collins') -0.1364451010679606
('leave', 'go') -0.1397223165697542
('love', 'get') -0.15626045973019462
('book', 'love') -0.15916321938815614
('love', 'book') -0.15916321938815614
('get', 'thing') -0.16594576546465847
('enjoy', 'think') -0.16615588628490285
('Peeta', 'good') -0.1670134217107873
('find', 'love') -0.16782128213127065
('read', 'leave') -0.16941410637715734
('people', 'Collins') -0.17195178952487034
('people', 'happen') -0.17195178952487034
('know', 'good') -0.17239267083052323
('good', 'know') -0.17239267083052323
('Peeta', 'know') -0.1745524988839533
('come', 'find') -0.1775065878657345
('thing', 'find') -0.1775065878657345
('find', 'thing') -0.1775065878657345
('want', 'Peeta') -0.17883516067595395
('read', 'feel') -0.18762007087372987
('read', 'know') -0.18847880858214766
('Mockingjay', 'Katniss') -0.1956366444868284
('think', 'Mockingjay') -0.19869943401737422
('get', 'get') -0.20145245392156833
('get', 'people') -0.20145245392156833
('go', 'Mockingjay') -0.20327610104478588
('finish', 'think') -0.20361344881980326
('think', 'finish') -0.20361344881980326
('Katniss', 'feel') -0.20744723834831777
('finish', 'go') -0.20819011584721497
('war', 'feel') -0.21127426177161948
('Gale', 'come') -0.21140813954141596
('come', 'Gale') -0.21140813954141596
('thing', 'Gale') -0.21140813954141596
('Katniss', 'think') -0.21285850696385425
('find', 'get') -0.21301327632264422
('people', 'find') -0.21301327632264422
('book', 'think') -0.218212248240956
('feel', 'know') -0.2198950048155264
('way', 'read') -0.2305017983569954
('read', 'way') -0.2305017983569954
('enjoy', 'know') -0.23269927706151453
('know', 'enjoy') -0.23269927706151453
('want', 'enjoy') -0.23698193885351523
('end', 'go') -0.2371776527204674
('good', 'thing') -0.23880711372953034
('good', 'happen') -0.24481313778974226
('people', 'Gale') -0.24691482799832576
('come', 'little') -0.2548932514811547
('Gale', 'find') -0.2584756503994017
('Gale', 'think') -0.260771862659752
('happen', 'little') -0.2608992755413667
('know', 'Mockingjay') -0.2652428247939857
('want', 'Mockingjay') -0.2695254865859864
('Mockingjay', 'want') -0.2695254865859864
('good', 'get') -0.2743138021864402
('good', 'people') -0.2743138021864402
('end', 'time') -0.2774515518584073
('end', 'get') -0.28328247216920044
('thing', 'feel') -0.2863094477145336
('leave', 'leave') -0.2879848537717899
('find', 'read') -0.3019607623391402
('find', 'little') -0.3019607623391405
('love', 'end') -0.30708334946477833
('like', 'Collins') -0.31608569110865736
('Gale', 'good') -0.31977617626319743
('love', 'Mockingjay') -0.32197196195852906
('Katniss', 'war') -0.3248483759668344
('love', 'Katniss') -0.32565973503771367
('war', 'think') -0.3279111654973804
('Mockingjay', 'come') -0.3316572676929928
('Mockingjay', 'thing') -0.3316572676929928
('thing', 'Mockingjay') -0.3316572676929928
('go', 'war') -0.3324878325247921
('Peeta', 'go') -0.3357293264489633
('go', 'know') -0.3411085755686991
('love', 'thing') -0.3438973225874946
('want', 'go') -0.3453912373606998
('enjoy', 'Katniss') -0.34541465354831163
('like', 'get') -0.3455863555053551
('love', 'happen') -0.34990334664770667
('love', 'Collins') -0.34990334664770667
('happen', 'Peeta') -0.3523334815009983
('finish', 'leave') -0.3564526530492507
('read', 'good') -0.3632612882029363
('little', 'good') -0.3632612882029363
('people', 'Mockingjay') -0.36716395614990255
('Gale', 'feel') -0.3672785102482009
('read', 'think') -0.37324984608644207
('Katniss', 'way') -0.3754921087855896
('Peeta', 'time') -0.376003225586903
('great', 'love') -0.37940401104440435
('enjoy', 'finish') -0.3874628897918111
('love', 'find') -0.39096483344548044
('Gale', 'like') -0.3910487295821126
('leave', 'read') -0.392557657691367
('think', 'love') -0.3932610457058308
('war', 'want') -0.3987372180659927
('thing', 'think') -0.40294635144029434
('happen', 'Katniss') -0.4058895859699604
('Mockingjay', 'Gale') -0.41262633022666
('finish', 'write') -0.42492045232671155
('time', 'Katniss') -0.42955933005586533
('character', 'read') -0.4311724938191467
('like', 'read') -0.43453384152185137
('think', 'great') -0.43845303989720447
('read', 'Collins') -0.4432208323353211
('Mockingjay', 'end') -0.4489939743975347
('find', 'think') -0.45001386229828044
('think', 'find') -0.45001386229828044
('love', 'good') -0.45226535930927625
('think', 'think') -0.4523100745586306
('find', 'go') -0.4545905293256922
('end', 'book') -0.4685067886211167
('know', 'thing') -0.469489742216906
('read', 'Peeta') -0.4707816319141924
('thing', 'want') -0.4737724040089067
('Collins', 'know') -0.475495766277118
('know', 'Collins') -0.475495766277118
('enjoy', 'like') -0.47875431000121826
('Peeta', 'leave') -0.48399186365099905
('leave', 'Peeta') -0.48399186365099905
('write', 'think') -0.49129552127158416
('know', 'get') -0.5049964306738158
('get', 'want') -0.5092790924658166
('people', 'want') -0.5092790924658166
('want', 'people') -0.5092790924658166
('enjoy', 'Peeta') -0.5150021003935593
('good', 'go') -0.5158910551894879
('like', 'write') -0.5162118725361186
('finish', 'like') -0.5162118725361186
('know', 'find') -0.5165572530748919
('end', 'Katniss') -0.5172202686142907
('want', 'find') -0.5208399148668927
('Katniss', 'little') -0.5243377363831546
('little', 'think') -0.5274005259137006
('book', 'like') -0.5308106719572714
('end', 'like') -0.5451994094093711
('Mockingjay', 'Peeta') -0.5475456481260306
('finish', 'Peeta') -0.5524596629284597
('Gale', 'want') -0.5547414665425742
('book', 'Peeta') -0.5670584623496124
('come', 'feel') -0.5739915201663144
('character', 'Katniss') -0.5761628042477407
('end', 'war') -0.578205705877541
('war', 'end') -0.578205705877541
('happen', 'feel') -0.5799975442265263
('come', 'enjoy') -0.5867957924123024
('Katniss', 'Collins') -0.5882111427639152
('get', 'leave') -0.5912922441266516
('Collins', 'enjoy') -0.5928018164725143
('little', 'know') -0.5939439166903122
('know', 'little') -0.5939439166903122
('find', 'leave') -0.6028530665277277
('leave', 'find') -0.6028530665277277
('write', 'Katniss') -0.6060157673974217
('get', 'feel') -0.6094982086232241
('Katniss', 'book') -0.6206145668185746
('book', 'Katniss') -0.6206145668185746
('find', 'feel') -0.6210590310243002
('read', 'end') -0.6235443864666027
('want', 'feel') -0.6296427747156919
('write', 'happen') -0.6302593790074147
('write', 'Collins') -0.6302593790074147
('feel', 'love') -0.6332990859188021
('leave', 'Gale') -0.6367546182034091
('like', 'war') -0.6405095892136957
('like', 'feel') -0.6482715945491851
('come', 'end') -0.6532408918204551
('like', 'want') -0.6534129940496035
('time', 'love') -0.6612551631853923
('good', 'leave') -0.6641535923915237
('leave', 'good') -0.6641535923915237
('happen', 'read') -0.666364383649531
('enjoy', 'Gale') -0.6677648549459695
('finish', 'find') -0.6713208658051885
('find', 'finish') -0.6713208658051885
('find', 'write') -0.6713208658051885
('write', 'find') -0.6713208658051885
('Peeta', 'war') -0.676757379606037
('good', 'feel') -0.6823595568880961
('read', 'like') -0.6858482698027576
('get', 'end') -0.688747580277365
('enjoy', 'good') -0.695163829134084
('read', 'people') -0.6958650480462288
('Collins', 'think') -0.6966344479522872
('end', 'find') -0.700308402678441
('Gale', 'write') -0.7052224174808699
('like', 'love') -0.7058594694221462
('come', 'like') -0.71554477515661
('time', 'think') -0.7203041920381921
('happen', 'like') -0.721550799216822
('like', 'happen') -0.721550799216822
('get', 'think') -0.7261351123489852
('good', 'finish') -0.7326213916689844
('love', 'read') -0.7560335695126815
('leave', 'Mockingjay') -0.757003746354986
('Peeta', 'happen') -0.7577985896091629
('love', 'leave') -0.769243801249488
('feel', 'Mockingjay') -0.7752097108515585
('Katniss', 'Mockingjay') -0.7834233093889474
('little', 'read') -0.7848125340627251
('read', 'little') -0.7848125340627251
('Peeta', 'people') -0.7872992540058608
('good', 'think') -0.7989964606138569
('love', 'enjoy') -0.8002540379920483
('Peeta', 'think') -0.8011562886672869
('Katniss', 'thing') -0.805348670017913
('end', 'feel') -0.8091112625272401
('Mockingjay', 'finish') -0.8254715456324468
('leave', 'think') -0.8282928301022877
('come', 'Collins') -0.8295922816279059
('happen', 'Collins') -0.8355983056881179
('write', 'love') -0.8377116005269488
('finish', 'love') -0.8377116005269488
('like', 'little') -0.8399989496300161
('great', 'Katniss') -0.8408553584748228
('Katniss', 'great') -0.8408553584748228
('get', 'come') -0.8590929460246038
('enjoy', 'go') -0.8638797338722599
('Collins', 'get') -0.8650989700848156
('get', 'Collins') -0.8650989700848156
('Mockingjay', 'think') -0.8918466145773195
('Katniss', 'write') -0.8936978398492025
('Katniss', 'finish') -0.8936978398492025
('write', 'go') -0.9013372964071603
('go', 'write') -0.9013372964071603
('Peeta', 'feel') -0.907662936255736
('Collins', 'Gale') -0.9105613441615731
('Gale', 'Collins') -0.9105613441615731
('think', 'book') -0.9113594288009013
('Peeta', 'like') -0.9314331555896477
('come', 'good') -0.9319542942894756
('happen', 'good') -0.9379603183496875
('good', 'Collins') -0.9379603183496875
('love', 'war') -0.9620093172045259
('good', 'find') -0.9790218051474614
('think', 'like') -0.9880520495891558
('go', 'like') -0.9926287166165674
('feel', 'find') -1.0265241391324649
('feel', 'think') -1.028820351392815
('Gale', 'read') -1.029009494574767
('Collins', 'Mockingjay') -1.0308104723131502
('Collins', 'love') -1.043050527207652
('happen', 'love') -1.043050527207652
('good', 'read') -1.0564084687628816
('come', 'think') -1.0960935320002396
('little', 'feel') -1.103910802747885
('end', 'enjoy') -1.1095976072250087
('Mockingjay', 'good') -1.1331724849747198
('leave', 'like') -1.1408912538186033
('end', 'write') -1.147055169759909
('write', 'end') -1.147055169759909
('end', 'finish') -1.147055169759909
('go', 'find') -1.1477377098856376
('end', 'love') -1.154381209851982
('little', 'love') -1.161498677620846
('happen', 'know') -1.1686429468370634
('want', 'Collins') -1.172925608629064
('end', 'read') -1.1831601744020255
('find', 'know') -1.2097044336348373
('read', 'Katniss') -1.2174849169430997
('Peeta', 'write') -1.2456068434884051
('Peeta', 'finish') -1.2456068434884051
('good', 'want') -1.2752876212906339
('want', 'good') -1.2752876212906339
('war', 'read') -1.2784703542063502
('finish', 'Katniss') -1.299162947957367
('go', 'love') -1.3141284446073975
('Katniss', 'read') -1.3352679525994833
('read', 'love') -1.3438202344148005
('end', 'come') -1.3463880723804005
('thing', 'read') -1.3535055401492642
('Collins', 'read') -1.3595115642094764
('get', 'read') -1.3890122286061741
('read', 'get') -1.3890122286061741
('people', 'read') -1.3890122286061741
('Katniss', 'like') -1.3904543681667745
('Collins', 'like') -1.4146979797767674
('war', 'Katniss') -1.4234606646349444
('read', 'Gale') -1.4344746026829316
('want', 'think') -1.4394268590013979
('get', 'like') -1.4441986441734649
('Collins', 'Peeta') -1.4509457701691082
('Peeta', 'Collins') -1.4509457701691082
('good', 'end') -1.4547561091021821
('leave', 'love') -1.4623909818094332
('Katniss', 'Katniss') -1.4802582630280776
('enjoy', 'love') -1.4934012185519936
('good', 'like') -1.517059992438337
('love', 'write') -1.530858781086894
('good', 'Peeta') -1.5533077828306778
('feel', 'like') -1.5645623264233401
('Peeta', 'read') -1.569393920582302
('good', 'Katniss') -1.6068638872996401
('love', 'like') -1.6221502012963014
('Peeta', 'Mockingjay') -1.6461579367941404
('want', 'like') -1.7520252827177134
('get', 'love') -1.765698372164295
('happen', 'think') -1.7952467366203972
('read', 'write') -1.8473198180887183
('love', 'want') -2.0735250107085434
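The listing above contains ordered word pairs, and the two orderings of a pair do not always get the same score. ('thing', 'enjoy') and ('enjoy', 'thing') both score 0.106, but ('feel', 'go') scores 0.130 while ('go', 'feel') scores -0.004. Below is a minimal sketch for finding the pairs whose two orderings diverge most. It assumes the scores are held in a dict keyed by `(word1, word2)` tuples. The `pair_scores` variable and the `order_asymmetry` function are hypothetical and are not part of the notebook.

```python
def order_asymmetry(pair_scores):
    """Return (a, b, score_ab, score_ba, diff) for every word pair that occurs
    in both orders, sorted by descending absolute score difference."""
    rows = []
    for (a, b), score_ab in pair_scores.items():
        # Visit each unordered pair once, and skip self-pairs such as ('feel', 'feel')
        if a >= b or (b, a) not in pair_scores:
            continue
        score_ba = pair_scores[(b, a)]
        rows.append((a, b, score_ab, score_ba, abs(score_ab - score_ba)))
    return sorted(rows, key=lambda row: row[-1], reverse=True)

# A few scores copied from the listing above (hypothetical container)
pair_scores = {
    ('feel', 'go'): 0.12975379138545406,
    ('go', 'feel'): -0.0037776012390683955,
    ('thing', 'enjoy'): 0.10635138814764288,
    ('enjoy', 'thing'): 0.10635138814764288,
    ('end', 'feel'): -0.8091112625272401,
    ('feel', 'end'): 0.0018189536890887228,
}

for a, b, ab, ba, diff in order_asymmetry(pair_scores):
    print(f"({a}, {b}) {ab:.3f}   ({b}, {a}) {ba:.3f}   diff {diff:.3f}")
```

Only a handful of scores are copied in here. Running the same function on the full set of pairs ranks them by how much the ordering affects the score.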
  • Compare genres.

  • Differences in subjectivity are not noticeable at small scale. A particular, larger-scale focus is needed to bring them out, but in very large sets they can drown again.

  • Topics need large scale.

  • Named entities are manageable at small scale but harder to deal with at large scale: most of them are long-tail and unknown, and recognition accuracy is lower.

  • Many aspects become harder to summarise and organise at large scale.
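The long-tail point about named entities can be made measurable. A simple indicator is the share of distinct entity names that occur only once across a set of reviews. Below is a minimal sketch, assuming entities have already been extracted by some NER step into one list of names per review. That input format, the `entities_per_review` toy data and the `long_tail_share` function are hypothetical.

```python
from collections import Counter

def long_tail_share(entities_per_review):
    """Share of distinct entity names that occur exactly once across all reviews."""
    counts = Counter(name for entities in entities_per_review for name in entities)
    if not counts:
        return 0.0
    singletons = sum(1 for freq in counts.values() if freq == 1)
    return singletons / len(counts)

# Hypothetical NER output: one list of entity names per review
entities_per_review = [
    ['Katniss', 'Peeta', 'Collins'],
    ['Katniss', 'Gale'],
    ['Katniss', 'Peeta'],
    ['Mockingjay'],
]
print(long_tail_share(entities_per_review))  # 3 of 5 distinct names occur once: 0.6
```

Computing this share for each subset (100, 10,000 and 1 million reviews) would make the claim about the long tail at large scale checkable, rather than anecdotal.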

In [13]:
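import math

# Split 113,000 reviews into batches of 10,000 and print each batch index
num_reviews = 113000
chunk_size = 10000
chunks = math.ceil(num_reviews / chunk_size)  # 12 chunks, the last one partial
for chunk in range(chunks):
    print(chunk)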
0
1
2
3
4
5
6
7
8
9
10
11
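The cell above only prints chunk indices. Below is a minimal sketch of how the same indices could be used to slice a dataframe into batches with `DataFrame.iloc`. It uses a small synthetic dataframe in place of `review_df`, and the `iter_chunks` helper is hypothetical.

```python
import math

import pandas as pd

def iter_chunks(df, chunk_size=10000):
    """Yield (chunk_index, sub-dataframe) pairs of at most chunk_size rows."""
    num_chunks = math.ceil(len(df) / chunk_size)
    for chunk in range(num_chunks):
        start = chunk * chunk_size
        yield chunk, df.iloc[start:start + chunk_size]

# Small stand-in for review_df: 25 rows split into chunks of 10
toy_df = pd.DataFrame({'review_id': range(25)})
for chunk, chunk_df in iter_chunks(toy_df, chunk_size=10):
    print(chunk, len(chunk_df))  # prints "0 10", "1 10", "2 5"
```

An alternative is to pass a `chunksize` argument to `pd.read_csv`. This returns an iterator of dataframes and avoids loading the whole compressed file into memory at once.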