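I have the following two DataFrames, badges and comments. From badges I created a list of "gold users", i.e. users who hold a badge with Class = 1.
Here Name is the name of the badge and Class is the badge level (1 = gold, 2 = silver, 3 = bronze).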
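A list like this is usually built with a boolean mask on Class. Below is a minimal sketch that assumes the column names from the badges table further down and uses a few made-up rows. Note that the resulting IDs keep the dtype of UserId (integers here). The IDs in the error message, by contrast, are strings, and that difference matters when they are matched against comments['UserId'].

```python
import pandas as pd

# Made-up badges rows for illustration; only the Class == 1 rows matter here.
badges = pd.DataFrame({
    "Id": [2, 3, 10, 11],
    "UserId": [23, 22, 1532, 290],
    "Name": ["Autobiographer", "Autobiographer", "Famous Question", "Great Answer"],
    "Class": [3, 3, 1, 1],
})

# Unique IDs (in order of first appearance) of users with at least one gold badge.
gold_users = badges.loc[badges["Class"] == 1, "UserId"].unique().tolist()
print(gold_users)  # [1532, 290]
```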
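I have already preprocessed comments['Text'] (each cell is now a list of tokens), and now I want to count the words that the gold users use in comments['Text'].
I tried the code shown at the end, but got the following error: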
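Since each comments['Text'] cell already holds a list of tokens, the lists have to be flattened before counting. Calling Counter on the column directly fails because lists are not hashable. Below is a minimal sketch with made-up rows. It counts over all comments; restricting the count to gold users is a separate filtering step.

```python
from collections import Counter
from itertools import chain

import pandas as pd

# Made-up preprocessed comments: each Text cell is already a list of tokens.
comments = pd.DataFrame({
    "Id": [6, 8, 9],
    "Text": [["course", "allen", "course"], ["also", "note"], ["course", "note"]],
    "UserId": [3, 1, 3],
})

# Flatten the per-row token lists into one stream, then count each token.
word_counts = Counter(chain.from_iterable(comments["Text"]))
print(word_counts.most_common(2))  # [('course', 3), ('note', 2)]
```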
KeyError: "None of [Index(['1532', '290', '1946', '1459', '6094', '766', '10446', '3106', '1', '1587', ... '35760', '45979', '113061', '35306', '104330', '40739', '4181', '58888', '2833', '58158'], dtype='object', length=1708)] are in the [index]"
How can I fix this?
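The message says that .loc searched the row index for these labels and found none of them. `comments.Text.loc[gold_users]` selects rows by index label. It does not look at the UserId column, so if comments has the default integer index, string IDs such as '1532' match nothing. Below is a minimal reproduction with made-up rows, assuming a default RangeIndex.

```python
import pandas as pd

# Made-up frame with the default RangeIndex (0, 1), as read_csv gives without index_col.
comments = pd.DataFrame({"Text": [["a"], ["b"]], "UserId": [3, 1]})

# String IDs, like the Index shown in the error message.
gold_users = ["1", "3"]

error_message = None
try:
    # .loc looks these labels up in the row index (0, 1), not in the UserId column.
    comments.Text.loc[gold_users]
except KeyError as err:
    error_message = str(err)

print(error_message)
```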
DataFrame 1 (badges)
Id | UserId | Name | Date | Class | TagBased
2 | 23 | Autobiographer | 2016-01-12T18:44:49.267 | 3 | False
3 | 22 | Autobiographer | 2016-01-12T18:44:49.267 | 3 | False
4 | 21 | Autobiographer | 2016-01-12T18:44:49.267 | 3 | False
5 | 20 | Autobiographer | 2016-01-12T18:44:49.267 | 3 | False
6 | 19 | Autobiographer | 2016-01-12T18:44:49.267 | 3 | False
DataFrame 2 (comments)
Id | Text | UserId
6| [2006, course, allen, knutsons, 2001, course, ... | 3
8| [also, theo, johnsonfreyd, note, mark, haimans... | 1
Code
from collections import Counter

for index, rows in comments.iterrows():
    gold_comments = rows[comments.Text.loc[gold_users]]
    Counter(gold_comments)
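One way to avoid the error is to filter comments on its UserId column with isin rather than indexing with .loc[gold_users], and to give both sides the same dtype (the error shows the gold IDs as strings). The iterrows loop is not needed for this selection. Below is a minimal sketch with made-up rows shaped like the two tables above. It is one possible approach, not the only fix.

```python
from collections import Counter
from itertools import chain

import pandas as pd

# Made-up data shaped like the badges and comments tables above.
badges = pd.DataFrame({"UserId": [23, 3, 1], "Class": [3, 1, 2]})
comments = pd.DataFrame({
    "Id": [6, 8, 9],
    "Text": [["course", "allen", "course"], ["also", "note"], ["note"]],
    "UserId": [3, 1, 3],
})

# IDs of users with at least one gold badge, cast to str so both sides share one dtype.
gold_users = set(badges.loc[badges["Class"] == 1, "UserId"].astype(str))

# Filter on the UserId column (not the row index), again comparing as strings.
is_gold = comments["UserId"].astype(str).isin(gold_users)
gold_comments = comments.loc[is_gold, "Text"]

# Flatten the token lists of the selected comments and count the tokens.
gold_counts = Counter(chain.from_iterable(gold_comments))
print(gold_counts.most_common(3))
```

If the real gold_users list already holds strings, the astype(str) on the badges side is harmless. The important part is converting comments['UserId'] to the same type before calling isin.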
