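Word frequency counting with Python

GitHub: FightingBob (Give me a star, thanks.)

Word frequency counting

Count how many times each English word appears in a plain English text file (e.g. 瓦尔登湖(英文版).txt, the English edition of Walden) and record the results.

Code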

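import string
from os import path

# Read the file as text (UTF-8 is assumed here; adjust to match your file)
with open('瓦尔登湖(英文版).txt', 'r', encoding='utf-8') as text1:
    # Split on whitespace, strip surrounding punctuation, lowercase each word
    words = [word.strip(string.punctuation).lower() for word in text1.read().split()]
words = [word for word in words if word]  # drop tokens that were only punctuation
words_index = set(words)  # unique words
count_dict = {index: words.count(index) for index in words_index}

# Append the results to file1.txt in the same directory as this script
out_path = path.join(path.dirname(path.abspath(__file__)), 'file1.txt')
with open(out_path, 'a+', encoding='utf-8') as text2:
    text2.write('Word frequency results:\n')
    # Most frequent words first
    for word in sorted(count_dict, key=lambda x: count_dict[x], reverse=True):
        text2.write('{}--{} times\n'.format(word, count_dict[word]))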
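
One detail worth checking when reading the file: in binary mode ('rb'), read() returns bytes, and calling str() on bytes gives the b'...' representation rather than the text itself. Newlines then appear as a literal backslash followed by n, so split() no longer separates words across lines. A minimal sketch with an in-memory bytes value (the sample text is made up for illustration):

```python
import string

# What read() would return for a two-line file opened with 'rb'
raw = b"The pond.\nThe woods."

as_repr = str(raw)             # the repr of the bytes object, starting with b'
as_text = raw.decode('utf-8')  # the actual text, assuming UTF-8

# The repr keeps the b' prefix and fuses words across the escaped newline
repr_tokens = as_repr.split()
text_tokens = as_text.split()
print(len(repr_tokens), len(text_tokens))

# Cleaning only recovers the real words when starting from decoded text
clean = [t.strip(string.punctuation).lower() for t in text_tokens]
print(clean)
```

Opening the file in text mode with an explicit encoding, or decoding the bytes before splitting, avoids the problem.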

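Code walkthrough
- Open the file for reading. Text mode with an explicit encoding makes read() return a str; in binary mode ('rb'), read() returns bytes, and str() on bytes yields the b'...' representation rather than the text. UTF-8 is assumed here.
  - with open('瓦尔登湖(英文版).txt', 'r', encoding='utf-8') as text1:
- Build the word list
  - Read the content first
    - content = text1.read()
  - Then split it into words (split() with no arguments splits the string on any run of whitespace)
    - words = content.split()
  - Lowercase each word and strip punctuation from both ends
    - word.strip(string.punctuation).lower()
  - Drop tokens that are empty after stripping (for example a standalone "--")
    - words = [word for word in words if word]
  - Remove duplicate words
    - words_index = set(words)
- Build a dictionary that maps each word to its count
  - count_dict = {index: words.count(index) for index in words_index}
- Write the word frequencies
  - Create the output file in the script's directory and open it in append mode, so repeated runs add to the file instead of overwriting it
    - out_path = path.join(path.dirname(path.abspath(__file__)), 'file1.txt')
    - with open(out_path, 'a+', encoding='utf-8') as text2:
  - Write the header line, ending with a newline
    - text2.write('Word frequency results:\n')
  - Sort the words by count in descending order (key=lambda x: count_dict[x] sorts by value)
    - sorted(count_dict, key=lambda x: count_dict[x], reverse=True)
  - Write each word and its count on its own line
    - text2.write('{}--{} times\n'.format(word, count_dict[word]))
  - Release resources
    - Both files are opened in with blocks, which close them automatically on exit, so explicit text1.close() and text2.close() calls are unnecessary.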
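
The dictionary step above calls words.count() once per unique word, and each call rescans the whole list, which gets slow on a book-length text. As an alternative sketch (not the method used in this post), collections.Counter from the standard library counts every word in a single pass. The function name count_words and the sample string are made up for illustration:

```python
import string
from collections import Counter


def count_words(text):
    """Count lowercase words in text, ignoring surrounding punctuation."""
    words = (token.strip(string.punctuation).lower() for token in text.split())
    # Skip tokens that were only punctuation and are now empty
    return Counter(word for word in words if word)


sample = "The pond. The woods -- the POND!"
counts = count_words(sample)

# Cross-check against the set + words.count() approach from the walkthrough
words = [w.strip(string.punctuation).lower() for w in sample.split()]
words = [w for w in words if w]
assert dict(counts) == {w: words.count(w) for w in set(words)}

# most_common() yields (word, count) pairs, highest count first
for word, n in counts.most_common():
    print('{}--{} times'.format(word, n))
# the--3 times
# pond--2 times
# woods--1 times
```

To use it on the file, pass the text returned by read() to count_words() and write the most_common() pairs out the same way as before.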


Reposted from: https://www.cnblogs.com/littlebob/p/9189794.html

Published by 全栈程序员-用户IM. Please credit the source when reposting: https://javaforall.cn/107469.html. Original link: https://javaforall.cn
