Tutorial link:
https://www.kaggle.com/c/word2vec-nlp-tutorial/overview/part-1-for-beginners-bag-of-words
Reading the training data
The training data consists of 25,000 movie reviews.
import pandas as pd
train = pd.read_csv("./data/labeledTrainData.tsv", header=0, delimiter="\t", quoting=3)
train.head(3)
| | id | sentiment | review |
|---|---|---|---|
| 0 | "5814_8" | 1 | "With all this stuff going down at the moment … |
| 1 | "2381_9" | 1 | "\"The Classic War of the Worlds\" by Timothy … |
| 2 | "7759_3" | 0 | "The film starts with a manager (Nicholas Bell… |
train.shape
(25000, 3)
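The quotation marks that remain around the id and review values above are a consequence of quoting=3 in the read_csv call. The value 3 is the standard library constant csv.QUOTE_NONE, which tells the parser to treat quote characters as ordinary text. The following is a minimal sketch of that behavior using only the csv module; the sample row is made up for illustration and is not taken from labeledTrainData.tsv.

```python
import csv
import io

# csv.QUOTE_NONE is the named constant behind quoting=3
print(csv.QUOTE_NONE)

# A made-up tab-separated sample with the same three columns as the training file
sample = 'id\tsentiment\treview\n"1_1"\t1\t"A "great" film"\n'

# With QUOTE_NONE the reader does no special processing of quote characters,
# so the quotes stay inside the field values
rows = list(csv.reader(io.StringIO(sample), delimiter="\t", quoting=csv.QUOTE_NONE))
print(rows[1])
```

Because reviews often contain unbalanced double quotes, leaving them untouched is a reasonable choice here; the surrounding quotes are removed later together with the other punctuation.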
example = train['review'][0]
example
'"With all this stuff going down at the moment with MJ i\'ve started listening to his music, watching the odd documentary here and there, watched The Wiz and watched Moonwalker again. Maybe i just want to get a certain insight into this guy who i thought was really cool in the eighties just to maybe make up my mind whether he is guilty or innocent. Moonwalker is part biography, part feature film which i remember going to see at the cinema when it was originally released. Some of it has subtle messages about MJ\'s feeling towards the press and also the obvious message of drugs are bad m\'kay.<br /><br />Visually impressive but of course this is all about Michael Jackson so unless you remotely like MJ in anyway then you are going to hate this and find it boring. Some may call MJ an egotist for consenting to the making of this movie BUT MJ and most of his fans would say that he made it for the fans which if true is really nice of him.<br /><br />The actual feature film bit when it finally starts is only on for 20 minutes or so excluding the Smooth Criminal sequence and Joe Pesci is convincing as a psychopathic all powerful drug lord. Why he wants MJ dead so bad is beyond me. Because MJ overheard his plans? Nah, Joe Pesci\'s character ranted that he wanted people to know it is he who is supplying drugs etc so i dunno, maybe he just hates MJ\'s music.<br /><br />Lots of cool things in this like MJ turning into a car and a robot and the whole Speed Demon sequence. Also, the director must have had the patience of a saint when it came to filming the kiddy Bad sequence as usually directors hate working with one kid let alone a whole bunch of them performing a complex dance scene.<br /><br />Bottom line, this movie is for people who like MJ on one level or another (which i think is most people). If not, then stay away. It does try and give off a wholesome message and ironically MJ\'s bestest buddy in this movie is a girl! 
Michael Jackson is truly one of the most talented people ever to grace this planet but is he guilty? Well, with all the attention i\'ve gave this subject....hmmm well i don\'t know because people can be different behind closed doors, i know this for a fact. He is either an extremely nice but stupid guy or one of the most sickest liars. I hope he is not the latter."'
The review column in train contains HTML. To strip the HTML tags and keep only the review text, we use BeautifulSoup.
Processing with BeautifulSoup
from bs4 import BeautifulSoup
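# Create a BeautifulSoup object; the parser is named explicitly (lxml produces the <html><body> wrapper seen in the output below)
soup = BeautifulSoup(example, "lxml")
# Pretty-print the parsed document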
print(soup.prettify())
<html>
<body>
<p>
"With all this stuff going down at the moment with MJ i've started listening to his music, watching the odd documentary here and there, watched The Wiz and watched Moonwalker again. Maybe i just want to get a certain insight into this guy who i thought was really cool in the eighties just to maybe make up my mind whether he is guilty or innocent. Moonwalker is part biography, part feature film which i remember going to see at the cinema when it was originally released. Some of it has subtle messages about MJ's feeling towards the press and also the obvious message of drugs are bad m'kay.
<br/>
<br/>
Visually impressive but of course this is all about Michael Jackson so unless you remotely like MJ in anyway then you are going to hate this and find it boring. Some may call MJ an egotist for consenting to the making of this movie BUT MJ and most of his fans would say that he made it for the fans which if true is really nice of him.
<br/>
<br/>
The actual feature film bit when it finally starts is only on for 20 minutes or so excluding the Smooth Criminal sequence and Joe Pesci is convincing as a psychopathic all powerful drug lord. Why he wants MJ dead so bad is beyond me. Because MJ overheard his plans? Nah, Joe Pesci's character ranted that he wanted people to know it is he who is supplying drugs etc so i dunno, maybe he just hates MJ's music.
<br/>
<br/>
Lots of cool things in this like MJ turning into a car and a robot and the whole Speed Demon sequence. Also, the director must have had the patience of a saint when it came to filming the kiddy Bad sequence as usually directors hate working with one kid let alone a whole bunch of them performing a complex dance scene.
<br/>
<br/>
Bottom line, this movie is for people who like MJ on one level or another (which i think is most people). If not, then stay away. It does try and give off a wholesome message and ironically MJ's bestest buddy in this movie is a girl! Michael Jackson is truly one of the most talented people ever to grace this planet but is he guilty? Well, with all the attention i've gave this subject....hmmm well i don't know because people can be different behind closed doors, i know this for a fact. He is either an extremely nice but stupid guy or one of the most sickest liars. I hope he is not the latter."
</p>
</body>
</html>
# Look up individual tags
print(soup.title)
print(soup.head)
print(soup.a)
print(soup.p)
None
None
None
<p>"With all this stuff going down at the moment with MJ i've started listening to his music, watching the odd documentary here and there, watched The Wiz and watched Moonwalker again. Maybe i just want to get a certain insight into this guy who i thought was really cool in the eighties just to maybe make up my mind whether he is guilty or innocent. Moonwalker is part biography, part feature film which i remember going to see at the cinema when it was originally released. Some of it has subtle messages about MJ's feeling towards the press and also the obvious message of drugs are bad m'kay.<br/><br/>Visually impressive but of course this is all about Michael Jackson so unless you remotely like MJ in anyway then you are going to hate this and find it boring. Some may call MJ an egotist for consenting to the making of this movie BUT MJ and most of his fans would say that he made it for the fans which if true is really nice of him.<br/><br/>The actual feature film bit when it finally starts is only on for 20 minutes or so excluding the Smooth Criminal sequence and Joe Pesci is convincing as a psychopathic all powerful drug lord. Why he wants MJ dead so bad is beyond me. Because MJ overheard his plans? Nah, Joe Pesci's character ranted that he wanted people to know it is he who is supplying drugs etc so i dunno, maybe he just hates MJ's music.<br/><br/>Lots of cool things in this like MJ turning into a car and a robot and the whole Speed Demon sequence. Also, the director must have had the patience of a saint when it came to filming the kiddy Bad sequence as usually directors hate working with one kid let alone a whole bunch of them performing a complex dance scene.<br/><br/>Bottom line, this movie is for people who like MJ on one level or another (which i think is most people). If not, then stay away. It does try and give off a wholesome message and ironically MJ's bestest buddy in this movie is a girl! 
Michael Jackson is truly one of the most talented people ever to grace this planet but is he guilty? Well, with all the attention i've gave this subject....hmmm well i don't know because people can be different behind closed doors, i know this for a fact. He is either an extremely nice but stupid guy or one of the most sickest liars. I hope he is not the latter."</p>
# Iterate over the children of <body>
for child in soup.body.children:
    print(child)
<p>"With all this stuff going down at the moment with MJ i've started listening to his music, watching the odd documentary here and there, watched The Wiz and watched Moonwalker again. Maybe i just want to get a certain insight into this guy who i thought was really cool in the eighties just to maybe make up my mind whether he is guilty or innocent. Moonwalker is part biography, part feature film which i remember going to see at the cinema when it was originally released. Some of it has subtle messages about MJ's feeling towards the press and also the obvious message of drugs are bad m'kay.<br/><br/>Visually impressive but of course this is all about Michael Jackson so unless you remotely like MJ in anyway then you are going to hate this and find it boring. Some may call MJ an egotist for consenting to the making of this movie BUT MJ and most of his fans would say that he made it for the fans which if true is really nice of him.<br/><br/>The actual feature film bit when it finally starts is only on for 20 minutes or so excluding the Smooth Criminal sequence and Joe Pesci is convincing as a psychopathic all powerful drug lord. Why he wants MJ dead so bad is beyond me. Because MJ overheard his plans? Nah, Joe Pesci's character ranted that he wanted people to know it is he who is supplying drugs etc so i dunno, maybe he just hates MJ's music.<br/><br/>Lots of cool things in this like MJ turning into a car and a robot and the whole Speed Demon sequence. Also, the director must have had the patience of a saint when it came to filming the kiddy Bad sequence as usually directors hate working with one kid let alone a whole bunch of them performing a complex dance scene.<br/><br/>Bottom line, this movie is for people who like MJ on one level or another (which i think is most people). If not, then stay away. It does try and give off a wholesome message and ironically MJ's bestest buddy in this movie is a girl! 
Michael Jackson is truly one of the most talented people ever to grace this planet but is he guilty? Well, with all the attention i've gave this subject....hmmm well i don't know because people can be different behind closed doors, i know this for a fact. He is either an extremely nice but stupid guy or one of the most sickest liars. I hope he is not the latter."</p>
# find_all is very flexible: it accepts a string, a list, a regular expression, a function, and more.
print(soup.find_all('br'))
[<br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>]
# soup.select accepts CSS selectors; '.p' matches elements with class "p" (not <p> tags), so the result here is empty
print(soup.select('.p'))
[]
# Get the text
print(soup.get_text())
"With all this stuff going down at the moment with MJ i've started listening to his music, watching the odd documentary here and there, watched The Wiz and watched Moonwalker again. Maybe i just want to get a certain insight into this guy who i thought was really cool in the eighties just to maybe make up my mind whether he is guilty or innocent. Moonwalker is part biography, part feature film which i remember going to see at the cinema when it was originally released. Some of it has subtle messages about MJ's feeling towards the press and also the obvious message of drugs are bad m'kay.Visually impressive but of course this is all about Michael Jackson so unless you remotely like MJ in anyway then you are going to hate this and find it boring. Some may call MJ an egotist for consenting to the making of this movie BUT MJ and most of his fans would say that he made it for the fans which if true is really nice of him.The actual feature film bit when it finally starts is only on for 20 minutes or so excluding the Smooth Criminal sequence and Joe Pesci is convincing as a psychopathic all powerful drug lord. Why he wants MJ dead so bad is beyond me. Because MJ overheard his plans? Nah, Joe Pesci's character ranted that he wanted people to know it is he who is supplying drugs etc so i dunno, maybe he just hates MJ's music.Lots of cool things in this like MJ turning into a car and a robot and the whole Speed Demon sequence. Also, the director must have had the patience of a saint when it came to filming the kiddy Bad sequence as usually directors hate working with one kid let alone a whole bunch of them performing a complex dance scene.Bottom line, this movie is for people who like MJ on one level or another (which i think is most people). If not, then stay away. It does try and give off a wholesome message and ironically MJ's bestest buddy in this movie is a girl! Michael Jackson is truly one of the most talented people ever to grace this planet but is he guilty? 
Well, with all the attention i've gave this subject....hmmm well i don't know because people can be different behind closed doors, i know this for a fact. He is either an extremely nice but stupid guy or one of the most sickest liars. I hope he is not the latter."
import nltk
import re
from nltk.corpus import stopwords
from nltk.stem.lancaster import LancasterStemmer
lancaster_stemmer = LancasterStemmer()
print (stopwords.words("english"))
['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "you're", "you've", "you'll", "you'd", 'your', 'yours', 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', "she's", 'her', 'hers', 'herself', 'it', "it's", 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves', 'what', 'which', 'who', 'whom', 'this', 'that', "that'll", 'these', 'those', 'am', 'is', 'are', 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does', 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until', 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into', 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down', 'in', 'out', 'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here', 'there', 'when', 'where', 'why', 'how', 'all', 'any', 'both', 'each', 'few', 'more', 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only', 'own', 'same', 'so', 'than', 'too', 'very', 's', 't', 'can', 'will', 'just', 'don', "don't", 'should', "should've", 'now', 'd', 'll', 'm', 'o', 're', 've', 'y', 'ain', 'aren', "aren't", 'couldn', "couldn't", 'didn', "didn't", 'doesn', "doesn't", 'hadn', "hadn't", 'hasn', "hasn't", 'haven', "haven't", 'isn', "isn't", 'ma', 'mightn', "mightn't", 'mustn', "mustn't", 'needn', "needn't", 'shan', "shan't", 'shouldn', "shouldn't", 'wasn', "wasn't", 'weren', "weren't", 'won', "won't", 'wouldn', "wouldn't"]
Building the bag of words and vectors
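# A function that cleans a single review
def review_to_words(raw_review):
    # Strip the HTML markup
    review_text = BeautifulSoup(raw_review, "lxml").get_text()
    # Remove punctuation and digits, keeping only ? and ! as markers of strong emotion
    letters_only = re.sub("[^a-zA-Z?!]", " ", review_text)
    # Convert to lowercase and split into words
    words = letters_only.lower().split()
    # Membership tests are faster on a set than on a list, so convert the stop word list to a set
    stops = set(stopwords.words("english"))
    # Remove stop words and reduce each remaining word to its stem
    meaningful_words = [lancaster_stemmer.stem(w) for w in words if w not in stops]
    # Join the words back into one space-separated string
    return " ".join(meaningful_words)
# Build the list of cleaned review strings
num_reviews = train["review"].size
clean_train_reviews = []
for i in range(num_reviews):
    clean_train_reviews.append(review_to_words(train["review"][i]))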
uk edit show rath less extrav us vert person concern get new kitch perhap bedroom bathroom wond grat got us vert show everyth real tv instead mak improv hous occup could afford entir hous get rebuilt know show try show lousy welf system ex us beg hard enough receiv rath vulg produc plac tak plac particul sear also uncal rsther turn on famy depr are pot millionair would far bet help commun whol instead spend hundr thousand doll on hom build someth whol commun perhap plac diy pow tool borrow return along build mat everyon benefit want giv on person caus enorm res among rest loc commun stil liv run hous
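To see what the regular expression in review_to_words actually keeps, here is a small sketch that runs only the substitution, lowercasing, and splitting steps on a made-up sentence. It leaves out stop word removal and stemming, so it needs nothing beyond the standard library.

```python
import re

# Made-up input, not taken from the data set
text = "Is he guilty? Well, i don't know... 20 minutes!"

# Same pattern as in review_to_words: everything except letters, ? and ! becomes a space
letters_only = re.sub("[^a-zA-Z?!]", " ", text)
words = letters_only.lower().split()
print(words)
# ['is', 'he', 'guilty?', 'well', 'i', 'don', 't', 'know', 'minutes!']
```

The ? and ! stay attached to the preceding word rather than becoming tokens of their own. Note also that CountVectorizer's default token pattern only keeps runs of two or more word characters, so these marks are unlikely to survive as separate features in the later vectorization step.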
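Before moving on, it is worth introducing the bag-of-words model. To turn several sentences into vectors, the first step is to put every distinct word they contain into a "bag" (the vocabulary). Each sentence can then be converted into a vector whose length equals the number of words in the bag. Each position in the vector corresponds to one word in the bag and holds the number of times that word occurs in the sentence, or 0 if it does not occur.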
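The description above can be sketched by hand in a few lines of plain Python. This is only an illustration of the idea on two made-up sentences, not how CountVectorizer is implemented; the names vocabulary and to_vector are hypothetical.

```python
from collections import Counter

# Two made-up sentences
sentences = ["the cat sat on the hat", "the dog ate the cat and the hat"]

# Step 1: the "bag" holds every distinct word once, in a fixed (sorted) order
vocabulary = sorted({word for s in sentences for word in s.split()})

# Step 2: each sentence becomes a vector of per-word counts, 0 when a word is absent
def to_vector(sentence, vocabulary):
    counts = Counter(sentence.split())
    return [counts[word] for word in vocabulary]

vectors = [to_vector(s, vocabulary) for s in sentences]

print(vocabulary)  # ['and', 'ate', 'cat', 'dog', 'hat', 'on', 'sat', 'the']
print(vectors[0])  # [0, 0, 1, 0, 1, 1, 1, 2]
print(vectors[1])  # [1, 1, 1, 1, 1, 0, 0, 3]
```

Every vector has the same length as the vocabulary, which is why the feature matrix produced later has one column per vocabulary word.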
from sklearn.feature_extraction.text import CountVectorizer
vectorizer = CountVectorizer(analyzer = "word", tokenizer = None, preprocessor = None, stop_words = None, max_features = 5000)
train_data_features = vectorizer.fit_transform(clean_train_reviews)
train_data_features = train_data_features.toarray()
print(train_data_features.shape)
(25000, 5000)
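Inspecting what the bag of words actually contains
# get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() replaces it
vocab = vectorizer.get_feature_names_out().tolist()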
print(len(vocab))
print(vocab)
5000
['abandon', 'abbot', 'abc', 'abduc', 'abl', 'abomin', 'aborigin', 'abort', 'abound', 'about', 'abraham', 'abrupt', 'abs', 'absolv', 'absorb', 'absurd', 'abud', 'abund', 'abus', 'abysm', 'ac', 'academy', 'acc', 'acceiv', 'access', 'accid', 'acclaim', 'accompany', 'accompl', 'accord', 'account', 'accus', 'ach', 'achiev', 'acid', 'acknowledg', 'acquaint', 'acquir', 'across', 'act', 'actress', 'ad', 'adam', 'adapt', 'addict', 'addit', 'address', 'adel', 'adequ', 'adjust', 'admin', 'admir', 'admit', 'adolesc', 'adopt', 'adr', 'adult', 'adv', 'advers', 'advert', 'aesthet', 'af', 'affair', 'affect', 'affirm', 'affleck', 'afford', 'afr', 'afraid', 'afric', 'afterma', 'afternoon', 'afterward', 'ag', 'again', 'agend', 'aggress', 'ago', 'agon', 'agony', 'agr', 'agree', 'ah', 'ahead', 'aid', 'aim', 'aimless', 'air', 'airpl', 'airport', 'ak', 'akin', 'akshay', 'al', 'ala', 'alarm', 'alb', 'albeit', 'albert', 'alcohol', 'alec', 'alert', 'alex', 'alexand', 'alexandr', 'alfr', 'alic', 'alik', 'alison', 'all', 'alleg', 'alley', 'allow', 'allud', 'almost', 'alon', 'along', 'alongsid', 'alot', 'already', 'alright', 'also', 'alt', 'altern', 'although', 'altm', 'altogeth', 'alvin', 'alway', 'aly', 'am', 'amand', 'amaz', 'amazon', 'amb', 'ambigu', 'ambit', 'amby', 'americ', 'amidst', 'amitabh', 'among', 'amongst', 'amount', 'ampl', 'amrit', 'amus', 'amy', 'an', 'analys', 'anch', 'anct', 'and', 'anderson', 'andr', 'andre', 'andrew', 'andy', 'ang', 'angel', 'angl', 'angry', 'angst', 'angy', 'anil', 'anim', 'ann', 'annount', 'annoy', 'anny', 'anoth', 'answ', 'ant', 'antagon', 'antholog', 'anthony', 'anticip', 'anton', 'antonio', 'antonion', 'antwon', 'anxy', 'anybody', 'anyhow', 'anym', 'anyon', 'anyone', 'anyth', 'anytim', 'anyway', 'anywh', 'ap', 'apart', 'apocalypt', 'apolog', 'app', 'appal', 'appear', 'appl', 'applaud', 'apply', 'apprecy', 'approach', 'appropry', 'approv', 'approxim', 'april', 'apt', 'ar', 'arab', 'arc', 'arch', 'archaeolog', 'architect', 'are', 'area', 'argentin', 
'argu', 'ariel', 'aristocr', 'arkin', 'arm', 'armstrong', 'army', 'arnold', 'around', 'arquet', 'arrang', 'arrest', 'arrog', 'arrow', 'art', 'arth', 'artic', 'artsy', 'artwork', 'arty', 'as', 'ash', 'asham', 'ashley', 'asid', 'ask', 'asleep', 'aspect', 'aspir', 'ass', 'assassin', 'assault', 'assembl', 'assert', 'asset', 'assign', 'assist', 'assocy', 'assort', 'assum', 'astair', 'aston', 'astound', 'astronaut', 'asyl', 'at', 'athlet', 'atl', 'atmosph', 'atroc', 'atrocy', 'attach', 'attack', 'attempt', 'attenborough', 'attend', 'attitud', 'attorney', 'attract', 'attribut', 'aud', 'audio', 'audit', 'audrey', 'audy', 'august', 'aunt', 'aur', 'aussy', 'aust', 'austin', 'austral', 'aut', 'auth', 'auto', 'autobiograph', 'autom', 'av', 'avail', 'aveng', 'avid', 'avoid', 'aw', 'await', 'awak', 'award', 'away', 'awesom', 'awhil', 'awkward', 'ax', 'aztec', 'bab', 'baby', 'babysit', 'bacal', 'bach', 'bachch', 'bachel', 'back', 'backdrop', 'background', 'backst', 'backward', 'bacon', 'bad', 'baddy', 'baffl', 'bag', 'bait', 'bak', 'baksh', 'bal', 'bald', 'baldwin', 'ballet', 'ban', 'band', 'bang', 'bank', 'bant', 'bar', 'barb', 'barbar', 'barbr', 'bargain', 'bark', 'barn', 'barney', 'baron', 'barrel', 'barry', 'barrym', 'bas', 'basebal', 'bash', 'basket', 'basketbal', 'bastard', 'bat', 'bath', 'bathroom', 'batm', 'battl', 'battlefield', 'bau', 'bay', 'bbc', 'be', 'beach', 'bean', 'bear', 'beard', 'beast', 'beat', 'beatl', 'beatty', 'beauty', 'beav', 'becam', 'beckham', 'beckins', 'becom', 'bed', 'bedroom', 'beer', 'beetl', 'befriend', 'beg', 'begin', 'begun', 'behav', 'behavio', 'behavy', 'behind', 'behold', 'being', 'bel', 'belg', 'believ', 'belong', 'belov', 'belt', 'belush', 'ben', 'bend', 'benea', 'benefit', 'bennet', 'bent', 'beowulf', 'bergm', 'berkeley', 'berlin', 'bernard', 'besid', 'best', 'bet', 'betray', 'better', 'betty', 'bev', 'bew', 'bewild', 'beyond', 'bias', 'bibl', 'big', 'biggest', 'bik', 'bikin', 'biko', 'bil', 'bimbo', 'bin', 'bind', 'bing', 'biograph', 
'biop', 'bir', 'bird', 'birthday', 'bit', 'bitch', 'bittersweet', 'bizar', 'bla', 'black', 'blackmail', 'blad', 'blah', 'blair', 'blak', 'blam', 'bland', 'blank', 'blast', 'blat', 'blaz', 'bleak', 'blee', 'blend', 'bless', 'blew', 'blind', 'blink', 'bliss', 'blob', 'block', 'blockbust', 'blond', 'blood', 'bloody', 'bloom', 'blossom', 'blow', 'blown', 'blu', 'blunt', 'blur', 'bo', 'board', 'boast', 'boat', 'bob', 'bobby', 'body', 'bog', 'bogart', 'boggl', 'boil', 'bol', 'bold', 'bollywood', 'bomb', 'bon', 'bond', 'bonny', 'boo', 'boob', 'boog', 'book', 'boom', 'boost', 'boot', 'bor', 'bord', 'boredom', 'born', 'borrow', 'boss', 'boston', 'both', 'bottl', 'bottom', 'bought', 'bound', 'bount', 'bounty', 'bourn', 'bout', 'bow', 'bowl', 'box', 'boy', 'boyfriend', 'boyl', 'brad', 'brady', 'brain', 'brainless', 'branagh', 'branch', 'brand', 'brando', 'brat', 'brav', 'braveheart', 'bravo', 'brazil', 'brea', 'bread', 'break', 'breakdown', 'breakfast', 'breast', 'breath', 'breathtak', 'bree', 'brend', 'brent', 'bret', 'bri', 'brick', 'brid', 'bridg', 'bridget', 'brief', 'bright', 'bril', 'bring', 'brit', 'britain', 'bro', 'broad', 'broadcast', 'broadway', 'brok', 'bronson', 'bront', 'brood', 'brook', 'brooklyn', 'brosn', 'broth', 'brought', 'brow', 'brown', 'bruc', 'bruno', 'brush', 'brut', 'bry', 'bsg', 'btw', 'bubbl', 'buck', 'bucket', 'bud', 'buddy', 'budget', 'buff', 'buffalo', 'bug', 'build', 'built', 'bul', 'bulk', 'bullet', 'bum', 'bumbl', 'bump', 'bunch', 'bunny', 'bur', 'burd', 'burk', 'burn', 'burst', 'burt', 'burton', 'bury', 'bus', 'busey', 'bush', 'businessm', 'bust', 'busy', 'but', 'butch', 'butl', 'button', 'buy', 'buzz', 'bye', 'cab', 'cabin', 'cabl', 'caf', 'cag', 'cagney', 'cain', 'cak', 'cal', 'calc', 'calib', 'californ', 'calm', 'cam', 'cambod', 'camcord', 'cameo', 'camer', 'camera', 'cameram', 'cameron', 'camp', 'campaign', 'campbel', 'campy', 'can', 'canad', 'cancel', 'candid', 'candl', 'candy', 'cannib', 'cannon', 'cannot', 'cant', 'canyon', 'cap', 
'capac', 'capit', 'capot', 'capt', 'captain', 'car', 'card', 'cardboard', 'carel', 'cares', 'caretak', 'carey', 'carl', 'carlito', 'carlo', 'carm', 'carn', 'carol', 'carolin', 'caron', 'carp', 'carradin', 'carrey', 'carry', 'cart', 'cartoon', 'cary', 'cas', 'casablanc', 'cash', 'casino', 'casp', 'cassavet', 'cassidy', 'cast', 'castl', 'cat', 'catalog', 'catastroph', 'catch', 'catchy', 'categ', 'catherin', 'cathol', 'cattl', 'caught', 'caus', 'caut', 'cav', 'cbs', 'cd', 'ceas', 'cecil', 'ceil', 'cel', 'celebr', 'celest', 'celluloid', 'cemetery', 'cens', 'cent', 'century', 'cerebr', 'ceremony', 'certain', 'cg', 'cgi', 'chain', 'chainsaw', 'chair', 'challeng', 'chamb', 'chamberlain', 'champ', 'chan', 'chang', 'channel', 'chant', 'chao', 'chaplin', 'chapt', 'char', 'charact', 'charg', 'charism', 'charl', 'charlot', 'charlton', 'charm', 'chas', 'chat', 'chavez', 'che', 'cheadl', 'cheap', 'cheaply', 'check', 'cheek', 'chees', 'cheesy', 'chem', 'cher', 'chess', 'chest', 'chew', 'chib', 'chicago', 'chick', 'chief', 'chil', 'child', 'childr', 'chin', 'chines', 'chip', 'cho', 'chocol', 'choir', 'chok', 'chong', 'choos', 'chop', 'choppy', 'chor', 'choreograph', 'chos', 'chris', 'christ', 'christian', 'christianity', 'christians', 'christie', 'christina', 'christine', 'christmas', 'christopher', 'christy', 'chronicles', 'chuck', 'chuckl', 'church', 'churn', 'cia', 'cigaret', 'cinderell', 'cindy', 'cinem', 'cinema', 'cinematograph', 'circ', 'circumst', 'cit', 'city', 'civil', 'cla', 'clad', 'claim', 'clair', 'clan', 'clar', 'clark', 'clash', 'class', 'classm', 'classy', 'claud', 'claustrophob', 'claw', 'clay', 'cle', 'clear', 'clerk', 'clev', 'cli', 'clich', 'click', 'cliff', 'cliffhang', 'clim', 'climact', 'climax', 'climb', 'clin', 'clint', 'clip', 'cliv', 'cloak', 'clock', 'clon', 'clooney', 'clos', 'closest', 'closet', 'closeup', 'cloth', 'cloud', 'clown', 'clu', 'club', 'clueless', 'clumsy', 'clunky', 'clut', 'co', 'coach', 'coast', 'coat', 'cod', 'cody', 'coff', 'coffin', 
'coh', 'coher', 'coincid', 'cok', 'col', 'cold', 'colin', 'coll', 'collab', 'collaps', 'colleagu', 'collect', 'colleg', 'collet', 'collin', 'colm', 'colo', 'colon', 'colonel', 'colony', 'columb', 'columbo', 'com', 'comb', 'combin', 'comeback', 'comedy', 'comfort', 'command', 'commend', 'commerc', 'commit', 'common', 'commun', 'comp', 'company', 'comparison', 'compass', 'compel', 'compens', 'compet', 'competit', 'compl', 'complain', 'complaint', 'complet', 'complex', 'comply', 'compos', 'composit', 'compound', 'compr', 'comprehend', 'comprom', 'compuls', 'comput', 'con', 'conceit', 'conceiv', 'concern', 'concert', 'conclud', 'concoct', 'cond', 'condemn', 'condit', 'conduc', 'conf', 'confess', 'confid', 'confin', 'confirm', 'conflict', 'confront', 'confus', 'congrat', 'connect', 'connery', 'conqu', 'conrad', 'conscy', 'consequ', 'conserv', 'consid', 'consist', 'conspir', 'const', 'constitut', 'construct', 'consum', 'cont', 'contact', 'contain', 'contemp', 'contempl', 'contempt', 'contend', 'contest', 'context', 'contin', 'continu', 'contract', 'contradict', 'contrast', 'contribut', 'control', 'controvers', 'controversy', 'conv', 'conveny', 'convers', 'convert', 'convey', 'convict', 'convint', 'convolv', 'cook', 'cooky', 'cool', 'coop', 'cop', 'cor', 'corbet', 'corey', 'corm', 'corn', 'corny', 'corp', 'corps', 'correct', 'corrid', 'corrupt', 'cost', 'costum', 'couch', 'could', 'counsel', 'count', 'counterpart', 'countless', 'country', 'countrysid', 'county', 'coup', 'coupl', 'cour', 'cours', 'court', 'courtroom', 'cousin', 'cov', 'cow', 'coward', 'cowboy', 'cox', 'crack', 'craft', 'craig', 'cram', 'crap', 'crappy', 'crash', 'crav', 'crawford', 'crawl', 'craz', 'crazy', 'cre', 'cream', 'creasy', 'cred', 'credit', 'creek', 'creep', 'creepy', 'crew', 'cri', 'crim', 'crimin', 'cring', 'crippl', 'cris', 'crisp', 'crit', 'crocodil', 'crook', 'crop', 'crosby', 'cross', 'crow', 'crowd', 'crown', 'cru', 'cruc', 'crud', 'cruel', 'crush', 'cry', 'crypt', 'cryst', 'cub', 'cue', 
'culmin', 'cult', 'cum', 'cunningham', 'cup', 'cur', 'curios', 'curs', 'curt', 'curtain', 'cury', 'cusack', 'cush', 'custom', 'cut', 'cyborg', 'cyc', 'cyn', 'cyph', 'da', 'dad', 'daddy', 'dahl', 'dahm', 'dai', 'daisy', 'dal', 'dalton', 'dam', 'damn', 'damon', 'dan', 'dandy', 'dang', 'daniel', 'danny', 'dant', 'dar', 'dark', 'darl', 'darn', 'dash', 'dat', 'daught', 'dav', 'david', 'davy', 'dawn', 'dawson', 'day', 'daylight', 'dazzl', 'de', 'dea', 'dead', 'deaf', 'deal', 'dealt', 'dean', 'deann', 'dear', 'death', 'deb', 'debby', 'debr', 'debt', 'debut', 'dec', 'decad', 'decapit', 'deceas', 'deceiv', 'decid', 'deck', 'decl', 'declin', 'ded', 'dee', 'deem', 'deep', 'deeply', 'deer', 'def', 'defend', 'defens', 'defin', 'definit', 'defy', 'deg', 'degr', 'degrad', 'del', 'delay', 'delet', 'delib', 'delicy', 'delight', 'deliry', 'delivery', 'delud', 'delv', 'dem', 'demand', 'demil', 'democr', 'demon', 'demonst', 'den', 'deniro', 'denou', 'dent', 'deny', 'denzel', 'dep', 'depart', 'depend', 'depict', 'deprav', 'depress', 'depth', 'deputy', 'der', 'derang', 'derek', 'des', 'desc', 'descend', 'describ', 'desert', 'deserv', 'design', 'desir', 'desp', 'despair', 'despit', 'destin', 'destiny', 'destroy', 'destruct', 'det', 'detach', 'detail', 'detect', 'determin', 'detery', 'detract', 'detroit', 'dev', 'devast', 'develop', 'devil', 'devo', 'devoid', 'devot', 'devy', 'dialog', 'diamond', 'dian', 'diary', 'dick', 'dict', 'did', 'die', 'died', 'diff', 'difficul', 'difficult', 'dig', 'digest', 'digit', 'dign', 'dil', 'dilemm', 'dim', 'dimend', 'dimin', 'din', 'dinosa', 'dir', 'direct', 'dirt', 'dirty', 'dis', 'disagr', 'disappear', 'disappoint', 'disast', 'disbeliev', 'disc', 'discern', 'disciplin', 'disco', 'discov', 'discovery', 'discuss', 'diseas', 'disgrac', 'disgu', 'disgust', 'dish', 'disjoint', 'dislik', 'dism', 'dismiss', 'disney', 'disord', 'dispatch', 'display', 'dispos', 'disregard', 'disrespect', 'dissolv', 'dist', 'distinct', 'distort', 'distract', 'distress', 
'distribut', 'district', 'disturb', 'div', 'divers', 'divert', 'divid', 'divin', 'divorc', 'dixon', 'dj', 'do', 'doc', 'doct', 'docu', 'dodg', 'dog', 'dogm', 'dol', 'doll', 'dolph', 'dom', 'domest', 'domin', 'domino', 'don', 'donald', 'donn', 'dont', 'doo', 'doom', 'door', 'dor', 'dorothy', 'dos', 'dot', 'doubl', 'doubt', 'dougla', 'down', 'downey', 'downhil', 'download', 'downright', 'doyl', 'doz', 'dr', 'drab', 'dracul', 'draft', 'drag', 'dragon', 'drain', 'drak', 'dram', 'drama', 'draw', 'drawn', 'dre', 'dread', 'dream', 'dreamy', 'dreck', 'dress', 'drew', 'drift', 'dril', 'drink', 'drip', 'driv', 'drivel', 'dron', 'drop', 'drov', 'drown', 'drug', 'drum', 'drunk', 'dry', 'du', 'dub', 'duby', 'duck', 'dud', 'dudley', 'due', 'duel', 'duh', 'duk', 'dul', 'dumb', 'dumbest', 'dump', 'dun', 'duo', 'dur', 'dust', 'dustin', 'dutch', 'duty', 'duval', 'dvd', 'dvds', 'dwarf', 'dwel', 'dying', 'dyl', 'dynam', 'dysfunct', 'eag', 'eagl', 'ear', 'earl', 'earn', 'earnest', 'eas', 'east', 'eastern', 'eastwood', 'easy', 'eat', 'ebert', 'ecc', 'echo', 'econom', 'ed', 'eddy', 'edg', 'edgy', 'edi', 'edison', 'edit', 'educ', 'edward', 'edy', 'eery', 'effect', 'efficy', 'effort', 'effortless', 'eg', 'ego', 'egypt', 'eight', 'eighty', 'einstein', 'eith', 'el', 'elab', 'eld', 'elect', 'electron', 'eleg', 'eleph', 'elev', 'elimin', 'elit', 'elizabe', 'elliot', 'elm', 'els', 'else', 'elsewh', 'elud', 'elv', 'elvir', 'em', 'embark', 'embarrass', 'embody', 'embrac', 'emerg', 'emil', 'emm', 'emot', 'emp', 'empath', 'empathy', 'emphas', 'empir', 'employ', 'empty', 'emy', 'en', 'ench', 'enco', 'encount', 'end', 'endear', 'ending', 'endless', 'enemy', 'energet', 'energy', 'enforc', 'eng', 'engin', 'engl', 'england', 'engross', 'enh', 'enigm', 'enjoy', 'enl', 'enlight', 'enorm', 'enough', 'ens', 'ensembl', 'ensu', 'ent', 'enterpr', 'entertain', 'enthral', 'enthusiasm', 'enthusiast', 'entir', 'entitl', 'entry', 'environ', 'envy', 'ep', 'episod', 'epitom', 'eq', 'equ', 'equip', 'er', 'eras', 
'erik', 'erot', 'errol', 'escap', 'esp', 'espec', 'esquir', 'ess', 'est', 'esth', 'estrang', 'et', 'etc', 'etern', 'eth', 'ethn', 'eug', 'europ', 'ev', 'evalu', 'evelyn', 'ever', 'every', 'everybody', 'everyday', 'everyon', 'everyth', 'everywh', 'evid', 'evil', 'evok', 'evolv', 'ex', 'exact', 'exag', 'examin', 'exampl', 'exceiv', 'excel', 'excess', 'exchang', 'excit', 'exclud', 'excrucy', 'excus', 'execut', 'exempl', 'exerc', 'exhaust', 'exhibit', 'exit', 'exorc', 'exot', 'expand', 'expect', 'expedit', 'expend', 'expens', 'expert', 'expery', 'expl', 'explain', 'explicit', 'explod', 'exploit', 'expos', 'exposit', 'express', 'exquisit', 'ext', 'extend', 'extery', 'extinct', 'extr', 'extra', 'extraordin', 'extrem', 'ey', 'eyebrow', 'eyr', 'fab', 'fabl', 'fabr', 'fac', 'facil', 'fact', 'fad', 'fai', 'fail', 'faint', 'fair', 'fairbank', 'fairy', 'faith', 'fak', 'fal', 'falk', 'fallon', 'fals', 'fam', 'famili', 'famy', 'fan', 'fant', 'fantast', 'fantasy', 'far', 'farc', 'farm', 'farrel', 'fart', 'fasc', 'fascin', 'fash', 'fassbind', 'fast', 'fat', 'fath', 'fault', 'fav', 'favo', 'favorit', 'favourit', 'fay', 'fbi', 'fear', 'feast', 'feat', 'fed', 'fee', 'feebl', 'feel', 'feet', 'feinston', 'fel', 'felix', 'fellin', 'fellow', 'felt', 'fem', 'femin', 'feminin', 'fent', 'fer', 'ferrel', 'fest', 'fet', 'fetch', 'fev', 'fi', 'fiant', 'fict', 'fido', 'field', 'fiend', 'fierc', 'fif', 'fifteen', 'fifty', 'fig', 'fight', 'fil', 'film', 'filmmak', 'filt', 'filthy', 'fin', 'find', 'finest', 'fing', 'finney', 'fir', 'firm', 'first', 'fish', 'fishburn', 'fist', 'fit', 'fiv', 'fix', 'flag', 'flair', 'flam', 'flash', 'flashback', 'flashy', 'flat', 'flav', 'flaw', 'flawless', 'fle', 'fleet', 'flem', 'flesh', 'fli', 'flick', 'flight', 'flimsy', 'flip', 'flirt', 'flo', 'flock', 'flood', 'flop', 'flor', 'florid', 'flow', 'fluff', 'fluid', 'fly', 'flyn', 'foc', 'focus', 'fog', 'foil', 'folk', 'follow', 'fond', 'fontain', 'food', 'fool', 'foot', 'footbal', 'for', 'forbid', 'forc', 'ford', 
'foreign', 'foremost', 'forest', 'forev', 'forg', 'forget', 'forgot', 'form', 'formul', 'formula', 'forrest', 'fort', 'fortun', 'forty', 'forward', 'fost', 'fought', 'foul', 'found', 'four', 'fox', 'foxx', 'frag', 'fragil', 'frail', 'fram', 'franch', 'francisco', 'franco', 'frank', 'frankenstein', 'franklin', 'franky', 'frant', 'fraud', 'fre', 'freak', 'freaky', 'fred', 'freddy', 'freedom', 'freem', 'freez', 'french', 'frenzy', 'frequ', 'fresh', 'fri', 'friday', 'friend', 'fright', 'frog', 'from', 'front', 'fronty', 'frost', 'froz', 'fruit', 'frust', 'fry', 'fu', 'fuel', 'ful', 'fulc', 'fulfil', 'fun', 'funct', 'fund', 'funda', 'funniest', 'funny', 'furnit', 'furtherm', 'fury', 'fut', 'fuzzy', 'fx', 'gabl', 'gabriel', 'gadget', 'gag', 'gain', 'gal', 'galactic', 'galaxy', 'gallery', 'gam', 'gambl', 'gamer', 'gandh', 'gang', 'gangst', 'gap', 'gar', 'garb', 'garbo', 'gard', 'garland', 'garn', 'gary', 'gas', 'gasp', 'gat', 'gath', 'gav', 'gay', 'gaz', 'gear', 'geek', 'gem', 'gen', 'gend', 'genet', 'geni', 'genius', 'genr', 'gentl', 'gentlem', 'genuin', 'geny', 'georg', 'ger', 'gerard', 'germ', 'germany', 'gershwin', 'gest', 'get', 'ghetto', 'ghost', 'giallo', 'giant', 'gibson', 'gielgud', 'gift', 'gig', 'giggl', 'gil', 'gilbert', 'gilliam', 'gimmick', 'gin', 'ging', 'giovann', 'girl', 'girlfriend', 'giv', 'glad', 'glady', 'glam', 'glant', 'glar', 'glass', 'gle', 'glen', 'glimps', 'glob', 'gloom', 'glor', 'glory', 'glov', 'glow', 'glu', 'go', 'goal', 'goat', 'god', 'godard', 'godfath', 'godzill', 'goe', 'goer', 'going', 'gold', 'goldberg', 'goldbl', 'goldsworthy', 'goldy', 'gon', 'gonn', 'good', 'goodby', 'goodm', 'goody', 'goof', 'goofy', 'gor', 'gordon', 'gorg', 'gory', 'gosh', 'got', 'goth', 'gott', 'govern', 'govind', 'grab', 'grac', 'grad', 'gradu', 'graham', 'grainy', 'gram', 'grand', 'grandfath', 'grandm', 'grandmoth', 'grandp', 'grant', 'graph', 'grasp', 'grass', 'grat', 'gratuit', 'grav', 'graveyard', 'gray', 'grayson', 'gre', 'great', 'greatest', 'gree', 
'greedy', 'greek', 'green', 'greet', 'greg', 'grew', 'grey', 'grief', 'griev', 'griffi', 'grim', 'grin', 'grinch', 'grind', 'grip', 'gritty', 'gro', 'gross', 'grotesqu', 'ground', 'group', 'grow', 'grown', 'grudg', 'gruesom', 'guar', 'guarantee', 'guard', 'guess', 'guest', 'guid', 'guil', 'guilt', 'guin', 'guine', 'guit', 'gum', 'gun', 'gundam', 'gunfight', 'gung', 'gut', 'guy', 'gwyne', 'gypo', 'gypsy', 'ha', 'habit', 'hack', 'hackm', 'hackney', 'hadley', 'hag', 'hail', 'hain', 'hair', 'hal', 'half', 'halfway', 'hallmark', 'halloween', 'hallucin', 'ham', 'hamilton', 'hamlet', 'hammy', 'han', 'hand', 'handicap', 'handl', 'handsom', 'hang', 'hank', 'hannah', 'hap', 'hapless', 'happy', 'har', 'harass', 'harb', 'hard', 'hardc', 'hardy', 'hark', 'harlow', 'harm', 'harmless', 'harold', 'harp', 'harriet', 'harrison', 'harrow', 'harry', 'harsh', 'hart', 'hartley', 'harvey', 'hat', 'hatch', 'haunt', 'havoc', 'hawk', 'hawn', 'hay', 'haywor', 'hbo', 'hea', 'head', 'headach', 'heal', 'healthy', 'heap', 'hear', 'heard', 'heart', 'heartbreak', 'heartfelt', 'heartwarm', 'heat', 'heav', 'heavy', 'heck', 'hect', 'heel', 'height', 'heist', 'hel', 'held', 'helicopt', 'hello', 'helm', 'helmet', 'help', 'helpless', 'henchm', 'henry', 'hent', 'hepburn', 'her', 'herbert', 'herd', 'here', 'herm', 'hero', 'heroin', 'hesit', 'heston', 'hey', 'hi', 'hick', 'hid', 'high', 'highest', 'highlight', 'highway', 'hil', 'him', 'hind', 'hint', 'hip', 'hippy', 'hir', 'hist', 'hit', 'hitch', 'hitchcock', 'hitl', 'hk', 'hmmm', 'ho', 'hoffm', 'hog', 'hokey', 'hol', 'hold', 'holiday', 'hollow', 'hollywood', 'holm', 'holocaust', 'holy', 'hom', 'homeless', 'homicid', 'homosex', 'hon', 'honest', 'honesty', 'hong', 'hono', 'hood', 'hook', 'hoop', 'hoot', 'hop', 'hopeless', 'hopkin', 'hor', 'horn', 'horny', 'horr', 'horrend', 'horrid', 'hors', 'hospit', 'host', 'hostel', 'hostil', 'hot', 'hotel', 'hound', 'hour', 'hous', 'household', 'housew', 'how', 'howard', 'howev', 'howl', 'http', 'hudson', 'hug', 'hugh', 
'huh', 'hulk', 'hum', 'humbl', 'humo', 'humy', 'hundr', 'hung', 'hungry', 'hunt', 'hurry', 'hurt', 'husband', 'hustl', 'huston', 'hybrid', 'hyd', 'hyp', 'hypnot', 'hyst', 'ian', 'ic', 'icon', 'id', 'ide', 'idea', 'ident', 'ideolog', 'idiot', 'idol', 'ie', 'if', 'ign', 'ii', 'il', 'illeg', 'illog', 'illud', 'illust', 'im', 'imagery', 'imagin', 'imdb', 'imit', 'immedy', 'immens', 'immers', 'immigr', 'immort', 'imo', 'imp', 'impact', 'impecc', 'imperson', 'impl', 'implaus', 'imply', 'import', 'impos', 'imposs', 'impress', 'imprison', 'improb', 'improv', 'impuls', 'in', 'inacc', 'inadvert', 'inappropry', 'incap', 'incarn', 'incest', 'inch', 'incid', 'inclin', 'includ', 'incoh', 'incompet', 'incomprehens', 'inconsist', 'incorp', 'incorrect', 'increas', 'incred', 'ind', 'indee', 'independ', 'indian', 'indiff', 'individ', 'induc', 'indulg', 'indust', 'industry', 'indy', 'inept', 'inevit', 'inexpery', 'inexpl', 'inf', 'infam', 'infect', 'infery', 'infinit', 'inflict', 'influ', 'info', 'inform', 'ing', 'ingeny', 'ingredy', 'ingrid', 'inh', 'inhabit', 'inherit', 'init', 'inject', 'injury', 'injust', 'inm', 'innoc', 'innov', 'innuendo', 'ins', 'insec', 'insect', 'insert', 'insid', 'insight', 'insign', 'insipid', 'insist', 'insomn', 'inspect', 'inspir', 'inst', 'instal', 'instead', 'instinct', 'institut', 'instru', 'instruct', 'insult', 'int', 'intact', 'integr', 'intellect', 'intellig', 'intend', 'intens', 'interact', 'interest', 'interf', 'intern', 'internet', 'interpret', 'interrupt', 'intertwin', 'interv', 'interview', 'intery', 'intim', 'intol', 'intrigu', 'intro', 'introduc', 'intrud', 'inv', 'invad', 'invas', 'invest', 'investig', 'invis', 'invit', 'involv', 'iq', 'ir', 'iraq', 'ireland', 'iron', 'irony', 'irrelev', 'irrit', 'is', 'isabel', 'ish', 'islam', 'island', 'isol', 'israel', 'issu', 'it', 'ita', 'item', 'iturb', 'iv', 'jack', 'jacket', 'jackson', 'jacky', 'jacob', 'jacqu', 'jad', 'jaff', 'jag', 'jail', 'jak', 'jam', 'jamy', 'jan', 'jap', 'japanes', 'jar', 
'jason', 'jaw', 'jay', 'jazz', 'jeal', 'jealousy', 'jean', 'jed', 'jeff', 'jeffrey', 'jen', 'jenn', 'jenny', 'jeremy', 'jerk', 'jerry', 'jersey', 'jes', 'jess', 'jessic', 'jet', 'jew', 'jewel', 'jil', 'jim', 'jimmy', 'joan', 'job', 'jock', 'jody', 'joe', 'joel', 'joey', 'johansson', 'john', 'johnny', 'johnson', 'join', 'joint', 'jok', 'joly', 'jon', 'jonath', 'jord', 'jos', 'joseph', 'josh', 'journ', 'journey', 'jov', 'joy', 'jr', 'juan', 'jud', 'judg', 'judy', 'juic', 'jul', 'juliet', 'july', 'jump', 'jun', 'jungl', 'junk', 'juny', 'jury', 'just', 'justin', 'juvenil', 'kan', 'kansa', 'kapo', 'kar', 'karl', 'karloff', 'kat', 'kathleen', 'kathryn', 'kathy', 'katy', 'kay', 'kaz', 'keaton', 'keen', 'keep', 'kei', 'kel', 'ken', 'kenne', 'kennedy', 'kent', 'kept', 'kevin', 'key', 'khan', 'kick', 'kid', 'kiddy', 'kidm', 'kidnap', 'kil', 'kim', 'kind', 'king', 'kingdom', 'kinnear', 'kirk', 'kiss', 'kit', 'kitch', 'kitty', 'klin', 'kne', 'knew', 'knif', 'knight', 'knightley', 'knock', 'know', 'knowledg', 'known', 'kolchak', 'kong', 'kor', 'kore', 'kri', 'kubrick', 'kudo', 'kum', 'kung', 'kurosaw', 'kurt', 'kyl', 'la', 'lab', 'label', 'lac', 'lack', 'lacklust', 'lad', 'lady', 'laid', 'lak', 'lam', 'lamb', 'lampoon', 'lan', 'land', 'landmark', 'landscap', 'lang', 'langu', 'lant', 'laput', 'lar', 'larg', 'larry', 'las', 'last', 'lat', 'latest', 'latin', 'latino', 'laugh', 'laught', 'launch', 'laur', 'laurel', 'laury', 'lav', 'law', 'lawr', 'lawy', 'lay', 'lazy', 'le', 'lead', 'leagu', 'lean', 'leap', 'learn', 'least', 'leath', 'leav', 'lect', 'led', 'lee', 'left', 'leg', 'legend', 'legitim', 'leigh', 'lemmon', 'len', 'lend', 'leng', 'lengthy', 'lennon', 'leo', 'leon', 'leonard', 'les', 'lesb', 'less', 'lesson', 'lest', 'let', 'leth', 'lev', 'level', 'lew', 'lex', 'li', 'liam', 'lib', 'liberty', 'libr', 'licens', 'lie', 'lif', 'life', 'lifeless', 'lifestyl', 'lifetim', 'lift', 'light', 'lightn', 'lik', 'likew', 'lil', 'lily', 'limb', 'limit', 'lin', 'lincoln', 'lind', 
'lindsay', 'linear', 'ling', 'link', 'lion', 'lionel', 'lip', 'lis', 'list', 'lit', 'littl', 'liu', 'liv', 'liz', 'lizard', 'lloyd', 'load', 'loath', 'loc', 'lock', 'log', 'loi', 'lol', 'lon', 'london', 'long', 'longest', 'longor', 'look', 'loom', 'loop', 'loos', 'lor', 'lord', 'lorett', 'los', 'loss', 'lost', 'lot', 'lou', 'loud', 'lousy', 'lov', 'love', 'low', 'lowest', 'loy', 'loyal', 'luc', 'luca', 'lucil', 'luck', 'lucky', 'lucy', 'ludicr', 'lugos', 'lui', 'luk', 'luka', 'lumet', 'lun', 'lunch', 'lundgr', 'lung', 'lur', 'lurk', 'lush', 'lust', 'luth', 'luxury', 'lying', 'lynch', 'lyr', 'mabel', 'mac', 'macabr', 'macarth', 'macdonald', 'machin', 'macho', 'macy', 'mad', 'made', 'madm', 'madonn', 'mads', 'mae', 'maf', 'mag', 'magazin', 'maggy', 'magn', 'maid', 'mail', 'main', 'mainstream', 'maintain', 'maj', 'mak', 'makeup', 'mal', 'malon', 'mam', 'man', 'mand', 'mandy', 'mang', 'manhat', 'maniac', 'manifest', 'manip', 'mankind', 'manufact', 'many', 'map', 'mar', 'marc', 'march', 'margaret', 'margin', 'marilyn', 'marin', 'mario', 'mark', 'market', 'marl', 'marlon', 'marqu', 'marry', 'marsh', 'marshal', 'mart', 'marth', 'martin', 'marty', 'marvel', 'marx', 'mary', 'mask', 'masoch', 'mason', 'mass', 'massacr', 'mast', 'masterpiec', 'masterson', 'masturb', 'mat', 'match', 'mathieu', 'matrix', 'matthau', 'matthew', 'maureen', 'max', 'maxim', 'may', 'mayb', 'mayhem', 'mccarthy', 'mccoy', 'mclaglen', 'mcqueen', 'me', 'meadow', 'meal', 'mean', 'meand', 'meaningless', 'meant', 'meantim', 'meanwhil', 'meas', 'meat', 'mech', 'med', 'medicin', 'mediev', 'mediocr', 'medit', 'meek', 'meet', 'meg', 'mel', 'meliss', 'melodram', 'melody', 'melt', 'melvyn', 'mem', 'memb', 'men', 'menac', 'ment', 'mer', 'merc', 'merciless', 'mercy', 'merit', 'mermaid', 'merry', 'meryl', 'mesm', 'mess', 'messy', 'met', 'metaph', 'method', 'mex', 'mexico', 'mey', 'mgm', 'miam', 'mic', 'mich', 'michael', 'michel', 'mick', 'mickey', 'mid', 'middl', 'midget', 'midl', 'midnight', 'midst', 'might', 
'mighty', 'miik', 'mik', 'mil', 'mild', 'mildr', 'milit', 'milk', 'millionair', 'milo', 'mim', 'min', 'mind', 'mindless', 'minim', 'minisery', 'minnell', 'minut', 'mir', 'mirac', 'mirand', 'mis', 'miscast', 'misery', 'misfit', 'misfortun', 'misguid', 'mislead', 'miss', 'missil', 'mist', 'mistak', 'mistress', 'misunderstand', 'misunderstood', 'mitch', 'mitchel', 'mix', 'mixt', 'miyazak', 'mm', 'mob', 'mobl', 'mobst', 'mock', 'mod', 'model', 'modern', 'modest', 'modesty', 'moe', 'mol', 'molest', 'mom', 'moment', 'mon', 'money', 'monit', 'monk', 'monkey', 'monolog', 'monoton', 'monst', 'mont', 'montan', 'month', 'monty', 'monu', 'mood', 'moody', 'moon', 'moor', 'mor', 'morbid', 'more', 'moreov', 'morg', 'mormon', 'morn', 'moron', 'mort', 'moss', 'most', 'mot', 'moth', 'motorcyc', 'mou', 'mount', 'mountain', 'mourn', 'mous', 'mouth', 'mov', 'movie', 'movies', 'movy', 'mr', 'mrs', 'ms', 'mst', 'mtv', 'much', 'muddl', 'mug', 'mult', 'multipl', 'mum', 'mummy', 'mund', 'muppet', 'murd', 'murky', 'murph', 'murray', 'mus', 'musc', 'muse', 'muslim', 'must', 'mut', 'mutil', 'myer', 'myrtl', 'myst', 'mystery', 'myth', 'mytholog', 'nad', 'nail', 'naiv', 'nak', 'nam', 'nant', 'nar', 'narrow', 'naschy', 'nasty', 'nat', 'nata', 'natal', 'nath', 'naughty', 'naus', 'navy', 'naz', 'nbc', 'nd', 'near', 'nearby', 'neat', 'necess', 'neck', 'ned', 'nee', 'needless', 'neg', 'neglect', 'neighb', 'neighbo', 'neil', 'neith', 'nelson', 'nemes', 'neo', 'nephew', 'nerd', 'nerv', 'net', 'netflix', 'network', 'neurot', 'neut', 'nev', 'nevertheless', 'new', 'newcom', 'newm', 'newspap', 'next', 'nic', 'nichola', 'nicholson', 'nick', 'nicol', 'nicola', 'niec', 'night', 'nightclub', 'nightm', 'nin', 'ninj', 'niro', 'niv', 'no', 'nobl', 'nobody', 'nod', 'noir', 'nois', 'nol', 'nolt', 'nomin', 'non', 'nonetheless', 'nonsens', 'nop', 'nor', 'norm', 'northam', 'northern', 'nos', 'nostalg', 'not', 'notch', 'noteworthy', 'noth', 'novak', 'novel', 'now', 'nowaday', 'nowh', 'nuant', 'nuclear', 'nud', 'num', 
'numb', 'nun', 'nurs', 'nut', 'ny', 'nyc', 'object', 'oblig', 'obnoxy', 'obsc', 'observ', 'obsess', 'obstac', 'obtain', 'obvy', 'oc', 'occ', 'occas', 'occult', 'occup', 'occupy', 'occur', 'octob', 'od', 'odyssey', 'off', 'offb', 'offend', 'oft', 'oh', 'oil', 'ok', 'okay', 'ol', 'old', 'oldest', 'oliv', 'olivy', 'olymp', 'om', 'omin', 'omit', 'on', 'one', 'onlin', 'onto', 'op', 'oper', 'opin', 'oppon', 'opportun', 'oppos', 'opposit', 'oppress', 'opt', 'optim', 'or', 'orang', 'orchest', 'ord', 'ordin', 'org', 'orgy', 'origin', 'orl', 'orph', 'orson', 'ory', 'osc', 'oth', 'othello', 'otherw', 'otto', 'ought', 'out', 'outcom', 'outd', 'outdo', 'outfit', 'outland', 'outlaw', 'outlin', 'outright', 'outsid', 'outstand', 'ov', 'over', 'overact', 'overal', 'overblown', 'overboard', 'overcom', 'overdon', 'overlong', 'overlook', 'overshadow', 'overt', 'overwhelm', 'ow', 'owl', 'own', 'oz', 'pac', 'pacino', 'pack', 'pad', 'pag', 'paid', 'pain', 'paint', 'pair', 'pal', 'palac', 'palestin', 'palm', 'paltrow', 'pamel', 'pan', 'pant', 'pap', 'par', 'parad', 'parallel', 'paramount', 'parano', 'paranoid', 'park', 'parody', 'parrot', 'parson', 'part', 'particip', 'particul', 'partn', 'party', 'pass', 'passeng', 'past', 'pat', 'patch', 'path', 'pathet', 'patho', 'patric', 'patrick', 'patriot', 'patron', 'pattern', 'paty', 'pau', 'paul', 'paus', 'paxton', 'pay', 'paycheck', 'payoff', 'pc', 'peac', 'peak', 'pearl', 'peck', 'peculi', 'pedest', 'pee', 'peer', 'peg', 'pen', 'penelop', 'penguin', 'penny', 'peopl', 'people', 'pep', 'per', 'perc', 'perceiv', 'perfect', 'perform', 'perhap', 'peril', 'period', 'perm', 'permit', 'perpet', 'perry', 'person', 'perspect', 'persuad', 'pervers', 'pervert', 'pet', 'petty', 'pfeiff', 'pg', 'phantasm', 'phantom', 'phas', 'phenom', 'phenomenon', 'phil', 'philip', 'phillip', 'philosoph', 'phoenix', 'phon', 'phony', 'photo', 'photograph', 'phrase', 'phys', 'piano', 'pick', 'pickford', 'pict', 'pie', 'piec', 'pier', 'pierc', 'pig', 'pil', 'pilot', 'pin', 
'pink', 'pion', 'pip', 'pir', 'pistol', 'pit', 'pitch', 'pity', 'pivot', 'pix', 'plac', 'place', 'plagu', 'plain', 'plan', 'planet', 'plant', 'plast', 'plat', 'platform', 'plaus', 'play', 'playboy', 'playwright', 'pleas', 'please', 'plenty', 'plight', 'plod', 'plot', 'plu', 'plug', 'plum', 'pocket', 'poe', 'poem', 'poet', 'poetry', 'poign', 'point', 'pointless', 'poison', 'pok', 'pokemon', 'pol', 'polansk', 'policem', 'policy', 'polit', 'pond', 'pool', 'poor', 'pop', 'popcorn', 'popul', 'porn', 'porno', 'pornograph', 'port', 'portrait', 'portray', 'pos', 'posey', 'posit', 'poss', 'possess', 'post', 'pot', 'pound', 'pour', 'poverty', 'pow', 'powel', 'pra', 'pract', 'prank', 'pray', 'pre', 'preach', 'preachy', 'prec', 'precy', 'pred', 'predecess', 'predict', 'pref', 'prefer', 'pregn', 'prejud', 'prem', 'premy', 'prep', 'prepost', 'prequel', 'pres', 'preserv', 'presid', 'press', 'preston', 'presum', 'pretend', 'pretenty', 'pretty', 'prev', 'prevail', 'preview', 'prevy', 'prey', 'pri', 'pric', 'priceless', 'prid', 'priest', 'prim', 'primit', 'princess', 'princip', 'principl', 'print', 'prison', 'priv', 'privileg', 'priz', 'pro', 'prob', 'problem', 'proc', 'process', 'proclaim', 'produc', 'prof', 'profess', 'profil', 'profit', 'profound', 'program', 'progress', 'project', 'prolog', 'prom', 'promin', 'promot', 'prompt', 'pronount', 'proof', 'prop', 'propagand', 'property', 'prophecy', 'prophet', 'proport', 'propos', 'prosecut', 'prospect', 'prostitut', 'protagon', 'protect', 'protest', 'proud', 'prov', 'provid', 'provoc', 'provok', 'ps', 'pseudo', 'psych', 'psycho', 'psycholog', 'psychopa', 'psychot', 'psychy', 'pub', 'publ', 'puerto', 'pul', 'pulp', 'pumba', 'pump', 'pun', 'punch', 'punk', 'puppet', 'puppy', 'pur', 'purchas', 'purpl', 'purpos', 'pursu', 'pursuit', 'push', 'put', 'puzzl', 'python', 'quaid', 'qual', 'quant', 'quart', 'quas', 'queen', 'quentin', 'quest', 'quick', 'quiet', 'quin', 'quintess', 'quirky', 'quit', 'quot', 'rabbit', 'rac', 'rachel', 'rack', 
'rad', 'radio', 'rady', 'rag', 'raid', 'rail', 'rain', 'rainy', 'rais', 'raj', 'ralph', 'ram', 'rambl', 'rambo', 'ramon', 'ramp', 'ran', 'ranch', 'randolph', 'random', 'randy', 'rang', 'rank', 'rant', 'rao', 'rap', 'rapid', 'rapt', 'rar', 'rat', 'rath', 'ratso', 'rav', 'raw', 'ray', 'raymond', 'raz', 'rd', 'rea', 'reach', 'react', 'read', 'ready', 'real', 'really', 'realm', 'rear', 'reason', 'rebel', 'rec', 'recal', 'receiv', 'recit', 'reckless', 'recogn', 'recognit', 'recommend', 'record', 'recov', 'recr', 'recruit', 'recyc', 'red', 'redeem', 'redempt', 'redneck', 'reduc', 'redund', 'ree', 'reel', 'reev', 'ref', 'refer', 'reflect', 'refresh', 'refug', 'refus', 'reg', 'regain', 'regard', 'regardless', 'regim', 'regret', 'regul', 'rehash', 'rehears', 'reid', 'reign', 'reincarn', 'reinforc', 'reis', 'reject', 'rel', 'relax', 'releas', 'relentless', 'relev', 'reliev', 'relig', 'religy', 'reluct', 'rely', 'remad', 'remain', 'remak', 'remark', 'rememb', 'remind', 'reminisc', 'remot', 'remov', 'ren', 'renaiss', 'rend', 'rendit', 'rent', 'rep', 'repetit', 'replac', 'replay', 'reply', 'report', 'repr', 'repres', 'repress', 'republ', 'repuls', 'reput', 'request', 'requir', 'rerun', 'res', 'rescu', 'research', 'resembl', 'reserv', 'resid', 'resist', 'resolv', 'reson', 'resort', 'resourc', 'respect', 'respond', 'respons', 'rest', 'resta', 'restrain', 'restraint', 'restrict', 'result', 'resum', 'resurrect', 'ret', 'retain', 'retard', 'retir', 'retriev', 'retrospect', 'return', 'reun', 'reunit', 'rev', 'revel', 'reveng', 'revers', 'review', 'revisit', 'revolt', 'revolv', 'reward', 'rewrit', 'rex', 'reynold', 'rhym', 'rhythm', 'ric', 'rich', 'richard', 'richardson', 'rick', 'ricky', 'rid', 'riddl', 'ridic', 'riff', 'rifl', 'rig', 'right', 'ring', 'riot', 'rip', 'ripoff', 'ris', 'risk', 'rit', 'ritchy', 'riv', 'rivet', 'road', 'roam', 'roar', 'rob', 'robbery', 'robbin', 'robby', 'robert', 'robertson', 'robin', 'robinson', 'robot', 'rochest', 'rock', 'rocket', 'rocky', 'rod', 
'rog', 'rohm', 'rol', 'rom', 'romeo', 'romero', 'romp', 'ron', 'ronald', 'ronny', 'roof', 'rooky', 'room', 'rooney', 'root', 'rop', 'ros', 'rosario', 'rosem', 'ross', 'rot', 'roth', 'rough', 'round', 'rous', 'rout', 'routin', 'row', 'rowland', 'roy', 'rub', 'ruby', 'rud', 'rug', 'ruin', 'rukh', 'rul', 'rum', 'run', 'runaway', 'rur', 'rush', 'russ', 'russel', 'ruth', 'ruthless', 'ryan', 'sabot', 'sabrin', 'sack', 'sacr', 'sad', 'saddl', 'saf', 'sag', 'said', 'sail', 'saint', 'sak', 'sal', 'salesm', 'salm', 'saloon', 'salt', 'salv', 'sam', 'samanth', 'sammo', 'samura', 'san', 'sand', 'sandl', 'sandr', 'sang', 'sant', 'sap', 'sappy', 'sar', 'sarah', 'sarandon', 'sarcasm', 'sarcast', 'sassy', 'sat', 'satir', 'satisfact', 'satisfy', 'saturday', 'sav', 'saw', 'say', 'scal', 'scan', 'scand', 'scar', 'scarc', 'scarecrow', 'scarfac', 'scariest', 'scarlet', 'scary', 'scat', 'scen', 'scenario', 'scenery', 'schedule', 'scheme', 'schlock', 'schneider', 'school', 'schools', 'sci', 'scif', 'scooby', 'scoop', 'scop', 'scor', 'scorses', 'scot', 'scotland', 'scratch', 'scream', 'screaming', 'screams', 'screen', 'screening', 'screenplay', 'screens', 'screenwriter', 'screenwriters', 'screw', 'screwball', 'screwed', 'script', 'scripted', 'scripting', 'scripts', 'scrooge', 'se', 'sea', 'seag', 'seal', 'sean', 'search', 'season', 'seat', 'sebast', 'sec', 'second', 'secret', 'sect', 'seduc', 'see', 'seedy', 'seek', 'seem', 'seen', 'seg', 'sel', 'seldom', 'select', 'self', 'sem', 'sen', 'send', 'sens', 'senseless', 'sensit', 'sent', 'sentinel', 'senty', 'seny', 'sep', 'septemb', 'sequ', 'sequel', 'ser', 'serb', 'serg', 'serv', 'sery', 'sess', 'set', 'settl', 'setup', 'sev', 'seventy', 'sew', 'sex', 'sexy', 'seymo', 'sf', 'sg', 'sgt', 'sh', 'shad', 'shadow', 'shaggy', 'shah', 'shahid', 'shak', 'shakespear', 'shaky', 'shal', 'shallow', 'sham', 'shameless', 'shangha', 'shap', 'shar', 'shark', 'sharon', 'sharp', 'shat', 'shav', 'shaw', 'she', 'shed', 'sheen', 'sheet', 'shel', 'shelf', 
'shelley', 'shelt', 'shepard', 'shepherd', 'sheriff', 'shield', 'shift', 'shin', 'ship', 'shirley', 'shirt', 'sho', 'shock', 'shoddy', 'shoot', 'shootout', 'shop', 'shor', 'short', 'shortcom', 'shot', 'shotgun', 'should', 'shout', 'shov', 'show', 'showcas', 'showdown', 'shown', 'shut', 'shy', 'sibl', 'sick', 'sid', 'sidekick', 'sidewalk', 'sidney', 'sigh', 'sight', 'sign', 'sil', 'silv', 'sim', 'simil', 'simmon', 'simon', 'simpl', 'simply', 'simpson', 'simult', 'sin', 'sinatr', 'sing', 'singl', 'sink', 'sint', 'sir', 'sirk', 'sissy', 'sist', 'sit', 'sitcom', 'situ', 'six', 'sixteen', 'sixty', 'siz', 'skat', 'skept', 'sketch', 'ski', 'skil', 'skin', 'skinny', 'skip', 'skit', 'skul', 'sky', 'slack', 'slam', 'slap', 'slapstick', 'slash', 'slat', 'slaught', 'slav', 'slay', 'sleaz', 'sleazy', 'sleep', 'sleepwalk', 'slic', 'slick', 'slid', 'slight', 'slightest', 'slim', 'slimy', 'slip', 'slo', 'sloppy', 'slow', 'slug', 'slut', 'sly', 'smack', 'smal', 'smart', 'smash', 'smel', 'smi', 'smil', 'smok', 'smoo', 'smug', 'smuggl', 'snak', 'snap', 'snatch', 'sneak', 'snip', 'snl', 'snob', 'snow', 'snowm', 'snuff', 'so', 'soap', 'sob', 'soc', 'socc', 'socy', 'soderbergh', 'soft', 'sol', 'sold', 'soldy', 'solid', 'solo', 'solv', 'somebody', 'someday', 'somehow', 'someon', 'someth', 'sometim', 'somewh', 'son', 'sondr', 'song', 'sonny', 'soon', 'soph', 'soprano', 'sor', 'sorrow', 'sorry', 'sort', 'sou', 'sought', 'soul', 'sound', 'soundtrack', 'soup', 'sour', 'sourc', 'southern', 'soviet', 'sox', 'soyl', 'spac', 'spacey', 'spad', 'spaghett', 'spain', 'span', 'spar', 'spark', 'sparkl', 'spawn', 'speak', 'spear', 'spec', 'spect', 'spectac', 'spectacul', 'specy', 'spee', 'speech', 'spel', 'spend', 'spent', 'spi', 'spic', 'spid', 'spielberg', 'spik', 'spil', 'spin', 'spir', 'spirit', 'spit', 'splatter', 'splendid', 'split', 'spock', 'spoil', 'spok', 'spont', 'spoof', 'spooky', 'spoon', 'sport', 'spot', 'spotlight', 'spout', 'spread', 'spree', 'spring', 'springer', 'spy', 'squ', 'squad', 
'squeez', 'st', 'stab', 'stabl', 'stack', 'stad', 'staff', 'stag', 'stair', 'stak', 'stal', 'stalk', 'stallon', 'stamp', 'stan', 'stand', 'standard', 'standout', 'stanley', 'stant', 'stanwyck', 'star', 'stardom', 'stardust', 'starg', 'stark', 'start', 'startl', 'starv', 'stat', 'statu', 'stay', 'ste', 'steady', 'steam', 'steel', 'stell', 'step', 'steph', 'stephany', 'stereotyp', 'sterl', 'stern', 'stev', 'stewart', 'stick', 'stiff', 'stil', 'stilt', 'stim', 'stink', 'stir', 'stock', 'stol', 'stomach', 'ston', 'stood', 'stoog', 'stop', 'stor', 'storm', 'story', 'storylin', 'storytel', 'straight', 'straightforward', 'stranded', 'strange', 'strangely', 'stranger', 'strangers', 'stream', 'streep', 'street', 'streets', 'streisand', 'strength', 'strengths', 'stress', 'stretch', 'stretched', 'strict', 'strictly', 'strike', 'strikes', 'striking', 'string', 'strings', 'strip', 'stroke', 'strong', 'stronger', 'strongest', 'strongly', 'struck', 'structure', 'struggle', 'struggles', 'struggling', 'stuart', 'stuck', 'stud', 'studio', 'study', 'stuff', 'stumbl', 'stun', 'stunt', 'stupid', 'styl', 'sub', 'subject', 'sublim', 'submarin', 'submit', 'subplot', 'subsequ', 'subst', 'substitut', 'subt', 'subtext', 'subtitl', 'subtl', 'suburb', 'subvert', 'subway', 'success', 'suck', 'sud', 'sue', 'suff', 'sufficy', 'sug', 'suggest', 'suicid', 'suit', 'sul', 'sum', 'summ', 'sun', 'sund', 'sunday', 'sung', 'sunk', 'sunny', 'sunr', 'sunset', 'sunshin', 'sup', 'superb', 'superbl', 'superf', 'superhero', 'superm', 'supern', 'superst', 'supery', 'supply', 'support', 'suppos', 'suppress', 'suprem', 'sur', 'surf', 'surfac', 'surgery', 'surpass', 'surpr', 'surrend', 'surround', 'surv', 'sus', 'susp', 'suspect', 'suspend', 'suspens', 'suspicy', 'sustain', 'sutherland', 'swallow', 'sway', 'swe', 'swear', 'swed', 'sweep', 'sweet', 'swept', 'swift', 'swim', 'swing', 'switch', 'sword', 'sydney', 'symbol', 'sympath', 'sympathet', 'sympathy', 'syndrom', 'synops', 'system', 'tabl', 'taboo', 'tack', 
'tackl', 'tacky', 'tact', 'tad', 'tag', 'tail', 'tak', 'tal', 'talk', 'talky', 'tam', 'tang', 'tank', 'tap', 'tar', 'tarantino', 'target', 'tarz', 'task', 'tast', 'tasteless', 'tat', 'tattoo', 'taught', 'tax', 'tayl', 'tcm', 'tea', 'teach', 'team', 'tear', 'teas', 'tech', 'techn', 'technicol', 'technolog', 'ted', 'tedy', 'tee', 'teen', 'tel', 'televid', 'temp', 'templ', 'tempt', 'ten', 'tend', 'tens', 'tent', 'ter', 'term', 'termin', 'terr', 'territ', 'terry', 'test', 'testa', 'texa', 'text', 'th', 'thank', 'that', 'the', 'thelm', 'them', 'therapy', 'there', 'theref', 'thick', 'thief', 'thiev', 'thin', 'thing', 'think', 'thinking', 'third', 'thirty', 'this', 'tho', 'thoma', 'thompson', 'thorn', 'thorough', 'though', 'thought', 'thousand', 'thread', 'threat', 'threatened', 'threatening', 'threatens', 'three', 'threw', 'thrill', 'thrilled', 'thriller', 'thrillers', 'thrilling', 'thrills', 'throat', 'throughout', 'throw', 'throwing', 'thrown', 'throws', 'thru', 'thu', 'thug', 'thumb', 'thund', 'thunderbird', 'thurm', 'tick', 'ticket', 'tid', 'tie', 'tied', 'tierney', 'tig', 'tight', 'til', 'tim', 'timberlak', 'time', 'timeless', 'timmy', 'timon', 'timothy', 'tin', 'tiny', 'tip', 'tir', 'tiresom', 'tit', 'titl', 'toby', 'tod', 'today', 'toe', 'togeth', 'toilet', 'tok', 'tokyo', 'tol', 'told', 'tom', 'tomato', 'tomb', 'tome', 'tommy', 'tomorrow', 'ton', 'tongu', 'tonight', 'tony', 'too', 'took', 'tool', 'top', 'topless', 'tor', 'torch', 'torn', 'toronto', 'tort', 'toss', 'tot', 'touch', 'tough', 'tour', 'tow', 'toward', 'town', 'toy', 'trac', 'track', 'tracy', 'trad', 'trademark', 'tradit', 'traff', 'trag', 'tragedy', 'trail', 'train', 'trait', 'tramp', 'transcend', 'transf', 'transform', 'transit', 'transl', 'transmit', 'transp', 'transpl', 'transport', 'trap', 'trash', 'trashy', 'traum', 'trav', 'travel', 'travesty', 'tre', 'treas', 'trek', 'tremend', 'trend', 'tri', 'triangl', 'trib', 'tribut', 'trick', 'trig', 'trilog', 'trio', 'trip', 'tripl', 'trit', 'triumph', 
'triv', 'trom', 'troop', 'troubl', 'tru', 'truck', 'trum', 'trust', 'truth', 'try', 'tub', 'tuck', 'tun', 'tunnel', 'turd', 'turk', 'turkey', 'turmoil', 'turn', 'turtl', 'tv', 'twelv', 'twenty', 'twic', 'twilight', 'twin', 'twist', 'two', 'tyl', 'typ', 'ug', 'ugh', 'uh', 'uk', 'ultim', 'ultimat', 'ultr', 'um', 'un', 'unansw', 'unattract', 'unaw', 'unbear', 'unbeliev', 'unc', 'uncanny', 'uncomfort', 'unconscy', 'unconv', 'unconvint', 'uncov', 'uncut', 'und', 'undead', 'undeny', 'under', 'underground', 'undermin', 'undernea', 'underst', 'understand', 'understood', 'undertak', 'underw', 'underwear', 'underworld', 'undoubt', 'uneasy', 'unev', 'unexpect', 'unexplain', 'unfair', 'unfold', 'unforg', 'unforget', 'unfortun', 'unfunny', 'unhappy', 'uniform', 'unimagin', 'uninspir', 'unint', 'uninterest', 'unit', 'univers', 'unknown', 'unleash', 'unless', 'unlik', 'unnecess', 'unorigin', 'unpleas', 'unpredict', 'unr', 'unravel', 'unrel', 'uns', 'unsatisfy', 'unseen', 'unsettl', 'unst', 'unsuspect', 'unsympathet', 'unus', 'unw', 'unwatch', 'unwil', 'up', 'upcom', 'upd', 'uplift', 'upon', 'upset', 'upsid', 'urb', 'urg', 'us', 'useless', 'ustinov', 'ut', 'util', 'uw', 'vac', 'vacu', 'vad', 'vagu', 'vain', 'val', 'valentin', 'valid', 'valley', 'valu', 'vampir', 'van', 'vaness', 'vanill', 'vant', 'vary', 'vast', 'vault', 'veg', 'vega', 'vehic', 'vein', 'ven', 'venezuel', 'veng', 'venom', 'vent', 'ver', 'verb', 'verdict', 'verg', 'verhoev', 'vers', 'versatil', 'vert', 'vet', 'vhs', 'via', 'vib', 'vibr', 'vic', 'vict', 'victim', 'victor', 'vicy', 'vid', 'video', 'vietnam', 'view', 'viewpoint', 'vigil', 'vignet', 'vil', 'villain', 'vint', 'viol', 'vir', 'virgin', 'virt', 'virtu', 'vis', 'viscont', 'visit', 'vit', 'viv', 'vivid', 'voc', 'voic', 'void', 'voight', 'volum', 'volunt', 'vomit', 'von', 'vonnegut', 'vot', 'voy', 'vs', 'vulg', 'vuln', 'wacky', 'wag', 'wagn', 'wait', 'waitress', 'wak', 'wal', 'walk', 'wallac', 'walsh', 'walt', 'wan', 'wand', 'wang', 'wann', 'wannab', 'want', 
'war', 'ward', 'wardrob', 'warhol', 'warm', 'warn', 'warp', 'warry', 'was', 'wash', 'washington', 'wast', 'wat', 'watch', 'watson', 'wav', 'wax', 'way', 'wayn', 'weak', 'weakest', 'weal', 'wealthy', 'weapon', 'wear', 'weary', 'weath', 'weav', 'web', 'websit', 'wed', 'wee', 'week', 'weekend', 'weight', 'weird', 'wel', 'welcom', 'well', 'wendigo', 'wendy', 'went', 'werewolf', 'werewolv', 'wes', 'west', 'western', 'wet', 'whack', 'whal', 'what', 'whatev', 'whatsoev', 'wheel', 'wheelchair', 'whenev', 'wherea', 'wheth', 'whilst', 'whin', 'whiny', 'whip', 'whistl', 'whit', 'who', 'whoev', 'whol', 'wholesom', 'whoop', 'whor', 'whos', 'why', 'wick', 'wid', 'widescreen', 'widmark', 'widow', 'wield', 'wif', 'wig', 'wil', 'wild', 'william', 'wilson', 'win', 'winchest', 'wind', 'window', 'wing', 'wint', 'wip', 'wir', 'wis', 'wisdom', 'wish', 'wit', 'witch', 'witchcraft', 'within', 'without', 'witty', 'wiv', 'wizard', 'woe', 'wolf', 'wom', 'wond', 'wonderland', 'wong', 'wont', 'woo', 'wood', 'woody', 'wor', 'word', 'work', 'world', 'worm', 'worn', 'worry', 'wors', 'worst', 'worthless', 'worthwhil', 'worthy', 'would', 'wound', 'wow', 'wrap', 'wreck', 'wrench', 'wrestl', 'wretch', 'wright', 'writ', 'wrong', 'wrot', 'wtf', 'ww', 'wwe', 'wwi', 'www', 'ya', 'yank', 'yard', 'yawn', 'ye', 'yeah', 'year', 'yearn', 'years', 'yel', 'yellow', 'yep', 'yesterday', 'yet', 'yoka', 'york', 'you', 'young', 'youngest', 'youngst', 'youth', 'youtub', 'zan', 'zen', 'zero', 'zizek', 'zomb', 'zomby', 'zon', 'zoom', 'zorro']
Predicting results
Classify the reviews with a random forest classifier
from sklearn.ensemble import RandomForestClassifier
forest = RandomForestClassifier(n_estimators = 100)
forest = forest.fit(train_data_features, train["sentiment"])
Write the submission file
# Load the unlabeled test set with the same parsing options as the training data
test = pd.read_csv("./data/testData.tsv", header=0, delimiter="\t", quoting=3)
num_reviews = len(test["review"])
clean_test_reviews = []
# Clean each test review with the same function used for the training reviews
for i in range(num_reviews):
    clean_review = review_to_words(test["review"][i])
    clean_test_reviews.append(clean_review)
# Reuse the fitted vectorizer (transform only) so the columns match the training features
test_data_features = vectorizer.transform(clean_test_reviews)
test_data_features = test_data_features.toarray()
result = forest.predict(test_data_features)
output = pd.DataFrame(data={"id": test["id"], "sentiment": result})
output.to_csv("Bag_of_Words_model.csv", index=False, quoting=3)
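The test reviews are passed through `vectorizer.transform` rather than being fitted again, so that every test column refers to the same word as the matching training column. The sketch below shows that idea in plain Python. It is only an illustration: `fit_vocabulary` and `transform` are hypothetical helpers, not part of scikit-learn, and the three tiny documents are made up.

```python
from collections import Counter


def fit_vocabulary(docs):
    """Collect the sorted set of words seen in the training documents."""
    return sorted({word for doc in docs for word in doc.split()})


def transform(docs, vocabulary):
    """Count each vocabulary word per document; words outside it are ignored."""
    rows = []
    for doc in docs:
        counts = Counter(doc.split())
        rows.append([counts[word] for word in vocabulary])
    return rows


train_docs = ["good film good cast", "bad film"]
test_docs = ["good plot bad bad"]

vocab = fit_vocabulary(train_docs)
print(vocab)                        # ['bad', 'cast', 'film', 'good']
print(transform(test_docs, vocab))  # [[2, 0, 0, 1]]
```

The word "plot" never appears in the training documents, so it gets no column and is dropped from the test vector. A vocabulary fitted on the test set instead would produce columns that a model trained on the original features could not interpret.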
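Trying XGBoost
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
# Hold out 20% of the labeled data to monitor validation loss during training
X_train, X_test, y_train, y_test = train_test_split(train_data_features, train["sentiment"], test_size=0.2)
model = XGBClassifier()
eval_set = [(X_test, y_test)]
# Stop once validation logloss has not improved for 10 rounds.
# Note: in xgboost 2.0 and later, early_stopping_rounds and eval_metric are
# passed to the XGBClassifier constructor instead of fit().
model.fit(X_train, y_train, early_stopping_rounds=10, eval_metric="logloss", eval_set=eval_set, verbose=True)
result = model.predict(test_data_features)
output = pd.DataFrame(data={"id": test["id"], "sentiment": result})
output.to_csv("xgbBag_of_Words_model.csv", index=False, quoting=3)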
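The `early_stopping_rounds=10` argument asks XGBoost to stop training once the validation logloss has gone 10 rounds without improving. A minimal plain-Python sketch of that patience rule follows. The helper `best_round` is hypothetical and the loss values are made up; they are not the values in the training log that follows.

```python
def best_round(losses, patience):
    """Return (round where training stops, round with the lowest loss)."""
    best_idx = 0
    best_loss = float("inf")
    for i, loss in enumerate(losses):
        if loss < best_loss:
            # New best validation loss: remember it and keep training
            best_loss = loss
            best_idx = i
        elif i - best_idx >= patience:
            # No improvement for `patience` rounds: stop here
            return i, best_idx
    # Never ran out of patience: training used every round
    return len(losses) - 1, best_idx


losses = [0.68, 0.66, 0.65, 0.66, 0.67, 0.66]
stopped_at, best_at = best_round(losses, patience=3)
print(stopped_at, best_at)  # 5 2
```

When the loss keeps falling every round, as it does in the log from the original run, the rule never triggers and training runs until the configured number of boosting rounds is reached.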
[0] validation_0-logloss:0.676219
Will train until validation_0-logloss hasn't improved in 10 rounds.
[1] validation_0-logloss:0.662174
[2] validation_0-logloss:0.651057
[3] validation_0-logloss:0.640874
[4] validation_0-logloss:0.632644
[5] validation_0-logloss:0.625076
[6] validation_0-logloss:0.617619
[7] validation_0-logloss:0.611682
[8] validation_0-logloss:0.605249
[9] validation_0-logloss:0.599587
[10] validation_0-logloss:0.594965
[11] validation_0-logloss:0.589799
[12] validation_0-logloss:0.585117
[13] validation_0-logloss:0.580564
[14] validation_0-logloss:0.576377
[15] validation_0-logloss:0.572584
[16] validation_0-logloss:0.568511
[17] validation_0-logloss:0.565177
[18] validation_0-logloss:0.561793
[19] validation_0-logloss:0.558281
[20] validation_0-logloss:0.55503
[21] validation_0-logloss:0.552451
[22] validation_0-logloss:0.549323
[23] validation_0-logloss:0.546664
[24] validation_0-logloss:0.544006
[25] validation_0-logloss:0.54108
[26] validation_0-logloss:0.538433
[27] validation_0-logloss:0.535872
[28] validation_0-logloss:0.533465
[29] validation_0-logloss:0.5312
[30] validation_0-logloss:0.528723
[31] validation_0-logloss:0.526622
[32] validation_0-logloss:0.524268
[33] validation_0-logloss:0.522295
[34] validation_0-logloss:0.519956
[35] validation_0-logloss:0.518042
[36] validation_0-logloss:0.515848
[37] validation_0-logloss:0.514131
[38] validation_0-logloss:0.512278
[39] validation_0-logloss:0.510431
[40] validation_0-logloss:0.508723
[41] validation_0-logloss:0.506938
[42] validation_0-logloss:0.505074
[43] validation_0-logloss:0.50362
[44] validation_0-logloss:0.501969
[45] validation_0-logloss:0.500489
[46] validation_0-logloss:0.499067
[47] validation_0-logloss:0.497414
[48] validation_0-logloss:0.496192
[49] validation_0-logloss:0.494645
[50] validation_0-logloss:0.493216
[51] validation_0-logloss:0.49187
[52] validation_0-logloss:0.490369
[53] validation_0-logloss:0.489028
[54] validation_0-logloss:0.487349
[55] validation_0-logloss:0.486212
[56] validation_0-logloss:0.485081
[57] validation_0-logloss:0.483909
[58] validation_0-logloss:0.482761
[59] validation_0-logloss:0.481767
[60] validation_0-logloss:0.480625
[61] validation_0-logloss:0.479329
[62] validation_0-logloss:0.478402
[63] validation_0-logloss:0.477328
[64] validation_0-logloss:0.476377
[65] validation_0-logloss:0.475029
[66] validation_0-logloss:0.473751
[67] validation_0-logloss:0.472692
[68] validation_0-logloss:0.471596
[69] validation_0-logloss:0.470421
[70] validation_0-logloss:0.469413
[71] validation_0-logloss:0.468299
[72] validation_0-logloss:0.467431
[73] validation_0-logloss:0.466318
[74] validation_0-logloss:0.465558
[75] validation_0-logloss:0.464642
[76] validation_0-logloss:0.463728
[77] validation_0-logloss:0.462841
[78] validation_0-logloss:0.46207
[79] validation_0-logloss:0.461132
[80] validation_0-logloss:0.460134
[81] validation_0-logloss:0.45898
[82] validation_0-logloss:0.458173
[83] validation_0-logloss:0.457472
[84] validation_0-logloss:0.456591
[85] validation_0-logloss:0.456256
[86] validation_0-logloss:0.455629
[87] validation_0-logloss:0.454958
[88] validation_0-logloss:0.454081
[89] validation_0-logloss:0.453485
[90] validation_0-logloss:0.452779
[91] validation_0-logloss:0.452121
[92] validation_0-logloss:0.45126
[93] validation_0-logloss:0.450549
[94] validation_0-logloss:0.450048
[95] validation_0-logloss:0.44925
[96] validation_0-logloss:0.448478
[97] validation_0-logloss:0.447839
[98] validation_0-logloss:0.447183
[99] validation_0-logloss:0.446421
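The log above never reports an early stop: the validation log-loss is still falling at round 99, which is the last round because XGBClassifier builds 100 trees by default. Below is a minimal sketch for checking this from the printed log, assuming the log lines have been pasted into a string. `parse_logloss` and `still_improving` are hypothetical helper names for illustration, not part of xgboost.

```python
import re

# Matches lines such as "[12] validation_0-logloss:0.585117"
LOG_LINE = re.compile(r"\[(\d+)\]\s+validation_0-logloss:([0-9.]+)")

def parse_logloss(log_text):
    """Return a list of (round, logloss) pairs found in the log text."""
    return [(int(r), float(v)) for r, v in LOG_LINE.findall(log_text)]

def still_improving(history, patience=10):
    """True if the best loss occurred within the last `patience` rounds."""
    if not history:
        return False
    best_round = min(history, key=lambda pair: pair[1])[0]
    last_round = history[-1][0]
    return last_round - best_round < patience

# A few lines copied from the log above (hypothetical excerpt for the demo)
sample = """[0] validation_0-logloss:0.676219
Will train until validation_0-logloss hasn't improved in 10 rounds.
[1] validation_0-logloss:0.662174
[98] validation_0-logloss:0.447183
[99] validation_0-logloss:0.446421"""

history = parse_logloss(sample)
print(history[-1])               # last (round, logloss) pair
print(still_improving(history))  # whether the best loss is recent
```

If the check is true at the final round, the run ended because it hit the tree limit rather than because validation loss stalled, so a larger `n_estimators` (or a higher learning rate) may still lower the loss.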
In the end, random forest still worked better here.
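The post does not show the scores behind this conclusion. One way to make such a comparison concrete is to fit both models on the same training split and score them on the same held-out split. The sketch below is an assumption-laden illustration, not the author's procedure: `compare_on_holdout` is a hypothetical helper, ROC AUC is an assumed metric, the data is synthetic, and scikit-learn's `GradientBoostingClassifier` stands in for `XGBClassifier` so that the example needs only scikit-learn.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def compare_on_holdout(models, X, y, test_size=0.2, seed=0):
    """Fit each model on the same training split and return its validation AUC."""
    X_tr, X_val, y_tr, y_val = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )
    scores = {}
    for name, model in models.items():
        model.fit(X_tr, y_tr)
        # Probability of the positive class (column 1) drives the ROC curve
        proba = model.predict_proba(X_val)[:, 1]
        scores[name] = roc_auc_score(y_val, proba)
    return scores

# Synthetic stand-in data; in the post this would be
# train_data_features and train["sentiment"].
X, y = make_classification(n_samples=400, n_features=20, random_state=0)

models = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    # Stand-in for XGBClassifier so the sketch needs only scikit-learn
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}
scores = compare_on_holdout(models, X, y)
for name, auc in scores.items():
    print(f"{name}: validation AUC = {auc:.3f}")
```

Fixing `random_state` for the split keeps the comparison repeatable; with an unseeded split, as in the XGBoost code above, the scores shift from run to run.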
Published by 全栈程序员-用户IM. Please credit the source when reposting: https://javaforall.cn/143707.html Original link: https://javaforall.cn