Data mining - pandas DataFrame memory - 吾爱随笔录

I have a question about memory usage.

I want to do four things:

1) Make a DataFrame from one of several columns of a data source, say a JSON string
2) Make the third column of the original dataset the index of the DataFrame
3) Change the name of another column
4) Convert the Series I've created into a DataFrame

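My question is about memory efficiency. For step 1), it seems that I first load the entire DataFrame and then run a concat command to join the columns I want.

For step 2), I again need to save the new DataFrame as another object.

For step 3), the change seems to stick in place, so nothing extra is needed.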
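The difference between steps 2) and 3) can be illustrated with a small sketch. The toy data here is hypothetical; only the column name `columnC` and the index name `foo` come from the commands in this post. By default `set_index` returns a new DataFrame and leaves the original untouched, while assigning to `index.names` modifies the existing object.

```python
import pandas as pd

# Hypothetical toy data standing in for the real source.
df = pd.DataFrame({"columnA": [1, 2], "columnC": ["k1", "k2"]})

# Step 2: set_index returns a new DataFrame; df itself is unchanged.
indexed = df.set_index("columnC")

# Step 3: assigning to index.names renames the index in place.
indexed.index.names = ["foo"]
```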

If a more efficient way to go about this exists, please suggest it.

Commands:

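   import pandas as pd

   # jsonobject is the parsed JSON source, defined elsewhere
   df = pd.DataFrame(jsonobject)            # step 1: loads every column
   df = df.set_index("columnC")             # step 2: returns a new DataFrame
   df.index.names = ["foo"]                 # step 3: renames the index in place
   df1 = df["foo"].map(lambda x: x["id"])   # Series holding each record's "id"
   df2 = pd.DataFrame(df1)                  # step 4: Series -> DataFrame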
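One possible leaner variant of the commands above, offered as a sketch rather than a definitive answer: pull out only the needed fields before building the DataFrame, so the unused columns are never materialised. It assumes `jsonobject` is a list of records in which `columnC` holds the index values and `foo` holds a dict with an `id` key. The sample records and the output column name `id` are hypothetical.

```python
import pandas as pd

# Hypothetical sample standing in for the question's jsonobject.
jsonobject = [
    {"columnA": 1, "columnB": "x", "columnC": "k1", "foo": {"id": 10}},
    {"columnA": 2, "columnB": "y", "columnC": "k2", "foo": {"id": 20}},
]

# Extract only the two fields that are needed.
index = pd.Index([rec["columnC"] for rec in jsonobject], name="foo")
ids = [rec["foo"]["id"] for rec in jsonobject]

# Build the final one-column DataFrame directly, with no intermediate frames.
df2 = pd.DataFrame({"id": ids}, index=index)
```

Whether this actually saves memory depends on the shape of the real data; `df2.memory_usage(deep=True)` can be used to compare the two approaches.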