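I have a pandas DataFrame with a partially corrupted date field, as shown below. Some entries are numbers (not dates) or NaNs. The real DataFrame also has a very large number of rows. I want to find the non-date values and replace each one with the date from the nearest row. For example, if the date field in row 3 is NaN or a junk value (a number or a string), I want the date in row 3 to equal the date in row 2 or row 4. Is there a way to do this without iterating over the whole DataFrame in a for loop?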
import pandas as pd
from numpy import nan
from pandas import Timestamp

inputArr = [['A', Timestamp('2021-06-01 00:00:00'), 9],
['A', Timestamp('2021-06-01 00:00:00'), 60],
['A', Timestamp('2021-06-01 00:00:00'), 39],
['A', 3, 51],
['A', Timestamp('2021-06-01 00:00:00'), 99],
['B', Timestamp('2021-06-01 00:00:00'), 21],
['B', Timestamp('2021-06-01 00:00:00'), 93],
['B', Timestamp('2021-06-01 00:00:00'), 42],
['B', 'xpwh1i3992aisan', 87],
['B', Timestamp('2021-06-01 00:00:00'), 33],
['C', nan, 72],
['C', Timestamp('2021-06-01 00:00:00'), 90],
['C', Timestamp('2021-06-01 00:00:00'), 42],
['C', 3, 87],
['C', 'items 44', 30],
['D', Timestamp('2021-06-01 00:00:00'), 75],
['D', Timestamp('2021-06-01 00:00:00'), 87],
['D', Timestamp('2021-06-01 00:00:00'), 78],
['D', 3, 75],
['D', Timestamp('2021-06-01 00:00:00'), 60],
['E', Timestamp('2021-06-01 00:00:00'), 0],
['E', nan, 69],
['E', Timestamp('2021-06-01 00:00:00'), 21],
['E', 3, 30],
['E', Timestamp('2021-06-01 00:00:00'), 69]]
trialPD = pd.DataFrame(inputArr, columns = ["Name", "date_purchase", "num_items"])
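One possible approach, offered as a minimal sketch rather than a definitive answer: blank out every cell that is not a real `Timestamp`, then fill the gaps from neighbouring rows with `ffill` and `bfill`. The small frame `df` and the helper `keep_timestamp` are hypothetical names introduced for illustration, and the sample uses two different dates so that the direction of the fill is visible. The sketch assumes that "nearest" means the previous valid row, falling back to the next valid row when there is no previous one. It avoids calling `pd.to_datetime(..., errors="coerce")` directly on the raw column, because bare integers such as `3` may be interpreted as offsets from the Unix epoch instead of being rejected.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the frame in the question.
df = pd.DataFrame(
    [
        ["A", pd.Timestamp("2021-06-01"), 9],
        ["A", 3, 51],
        ["B", "xpwh1i3992aisan", 87],
        ["B", pd.Timestamp("2021-06-02"), 33],
        ["C", np.nan, 72],
    ],
    columns=["Name", "date_purchase", "num_items"],
)


def keep_timestamp(value):
    """Return the value if it is a real Timestamp, otherwise NaT."""
    return value if isinstance(value, pd.Timestamp) else pd.NaT


# Numbers, strings, and NaN all become NaT; valid dates pass through.
cleaned = pd.to_datetime(df["date_purchase"].map(keep_timestamp))

# Fill each gap from the previous valid row, then use the next valid row
# for any gaps left at the very top of the column.
df["date_purchase"] = cleaned.ffill().bfill()

print(df)
```

`Series.map` still calls the helper once per cell, but the fill itself is vectorised and there is no explicit loop over rows. If dates should never leak from one `Name` to another, the fill can be applied per group instead, for example with `cleaned.groupby(df["Name"]).ffill()` followed by the matching `bfill`.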