根据前一行用扭曲填充列

数据挖掘 熊猫 数据争吵
2022-02-06 15:27:16

我正在努力解决熊猫问题。我有以下数据。

+--------+------+---------+---------+-------------+-------------+--------------+------------+-------------+------------+----------+
| symbol | side | status  | origQty | executedQty |     qty     | availableQty |   price    | boughtValue | soldValue  | dcaLevel |
+--------+------+---------+---------+-------------+-------------+--------------+------------+-------------+------------+----------+
| DGDBTC | BUY  | FILLED  |   0.125 |  0.12500000 |  0.12500000 |   0.12500000 | 0.02000700 |  0.00250088 |            |          |
| DGDBTC | BUY  | FILLED  |   0.125 |  0.12500000 |  0.12500000 |   0.25000000 | 0.01960100 |  0.00245013 |            |          |
| DGDBTC | SELL | FILLED  |    0.25 |  0.25000000 | -0.25000000 |   0.00000000 | 0.02005900 |             | 0.00501475 |          |
| DGDBTC | BUY  | FILLED  |   0.113 |  0.11300000 |  0.11300000 |   0.11300000 | 0.02203000 |  0.00248939 |            |          |
| DGDBTC | BUY  | FILLED  |   0.113 |  0.11300000 |  0.11300000 |   0.22600000 | 0.02160300 |  0.00244114 |            |          |
| DGDBTC | BUY  | EXPIRED |   0.226 |  0.00000000 |  0.00000000 |              | 0.02125500 |             |            |          |
| DGDBTC | BUY  | PARTIAL |   0.226 |  0.15800000 |  0.15800000 |   0.38400000 | 0.02126100 |             |            |          |
| DGDBTC | SELL | EXPIRED |   0.384 |  0.00000000 |  0.00000000 |              | 0.02196600 |             |            |          |
| DGDBTC | SELL | EXPIRED |   0.384 |  0.00000000 |  0.00000000 |              | 0.02214300 |             |            |          |
| DGDBTC | SELL | EXPIRED |   0.384 |  0.00000000 |  0.00000000 |              | 0.02189900 |             |            |          |
| DGDBTC | BUY  | FILLED  |   0.384 |  0.38400000 |  0.38400000 |   0.76800000 | 0.02082900 |  0.00799834 |            |          |
| DGDBTC | BUY  | FILLED  |   0.768 |  0.76800000 |  0.76800000 |   1.53600000 | 0.01984300 |  0.01523942 |            |          |
| DGDBTC | SELL | EXPIRED |   1.536 |  0.00000000 |  0.00000000 |              | 0.02074400 |             |            |          |
| DGDBTC | SELL | EXPIRED |   1.536 |  0.00000000 |  0.00000000 |              | 0.02094100 |             |            |          |
| DGDBTC | SELL | EXPIRED |   1.536 |  0.00000000 |  0.00000000 |              | 0.02076800 |             |            |          |
| DGDBTC | SELL | PARTIAL |   1.536 |  0.30300000 | -0.30300000 |   1.23300000 | 0.02065000 |             |            |          |
| DGDBTC | SELL | FILLED  |   1.233 |  1.23300000 | -1.23300000 |   0.00000000 | 0.02070000 |             | 0.02552310 |          |
+--------+------+---------+---------+-------------+-------------+--------------+------------+-------------+------------+----------+

这是按符号分组的数据子集。对于每个符号,我想用符合以下规则的值填充最后一列:

  1. 系列中的每个买单 (side=BUY) 的值为零 (0)。
  2. 对于每个连续的买单,价值增加一 (1)。
  3. 当达到卖单(side=SELL)时,它标志着一个新的买单系列。
  4. 状态为 EXPIRED 的行将被跳过。

例子:

+--------+------+---------+---------+-------------+-------------+--------------+------------+-------------+------------+----------+
| symbol | side | status  | origQty | executedQty |     qty     | availableQty |   price    | boughtValue | soldValue  | dcaLevel |
+--------+------+---------+---------+-------------+-------------+--------------+------------+-------------+------------+----------+
| DGDBTC | BUY  | FILLED  |   0.125 |  0.12500000 |  0.12500000 |   0.12500000 | 0.02000700 |  0.00250088 |            |        0 |
| DGDBTC | BUY  | FILLED  |   0.125 |  0.12500000 |  0.12500000 |   0.25000000 | 0.01960100 |  0.00245013 |            |        1 |
| DGDBTC | SELL | FILLED  |    0.25 |  0.25000000 | -0.25000000 |   0.00000000 | 0.02005900 |             | 0.00501475 |          |
| DGDBTC | BUY  | FILLED  |   0.113 |  0.11300000 |  0.11300000 |   0.11300000 | 0.02203000 |  0.00248939 |            |        0 |
| DGDBTC | BUY  | FILLED  |   0.113 |  0.11300000 |  0.11300000 |   0.22600000 | 0.02160300 |  0.00244114 |            |        1 |
| DGDBTC | BUY  | EXPIRED |   0.226 |  0.00000000 |  0.00000000 |              | 0.02125500 |             |            |          |
| DGDBTC | BUY  | PARTIAL |   0.226 |  0.15800000 |  0.15800000 |   0.38400000 | 0.02126100 |             |            |        2 |
| DGDBTC | SELL | EXPIRED |   0.384 |  0.00000000 |  0.00000000 |              | 0.02196600 |             |            |          |
| DGDBTC | SELL | EXPIRED |   0.384 |  0.00000000 |  0.00000000 |              | 0.02214300 |             |            |          |
| DGDBTC | SELL | EXPIRED |   0.384 |  0.00000000 |  0.00000000 |              | 0.02189900 |             |            |          |
| DGDBTC | BUY  | FILLED  |   0.384 |  0.38400000 |  0.38400000 |   0.76800000 | 0.02082900 |  0.00799834 |            |        3 |
| DGDBTC | BUY  | FILLED  |   0.768 |  0.76800000 |  0.76800000 |   1.53600000 | 0.01984300 |  0.01523942 |            |        4 |
| DGDBTC | SELL | EXPIRED |   1.536 |  0.00000000 |  0.00000000 |              | 0.02074400 |             |            |          |
| DGDBTC | SELL | EXPIRED |   1.536 |  0.00000000 |  0.00000000 |              | 0.02094100 |             |            |          |
| DGDBTC | SELL | EXPIRED |   1.536 |  0.00000000 |  0.00000000 |              | 0.02076800 |             |            |          |
| DGDBTC | SELL | PARTIAL |   1.536 |  0.30300000 | -0.30300000 |   1.23300000 | 0.02065000 |             |            |          |
| DGDBTC | SELL | FILLED  |   1.233 |  1.23300000 | -1.23300000 |   0.00000000 | 0.02070000 |             | 0.02552310 |          |
+--------+------+---------+---------+-------------+-------------+--------------+------------+-------------+------------+----------+

我尝试了以下两种方法。

merged_df['dcaLevel'] = merged_df[(merged_df['side'] == 'BUY') & (merged_df['status'].isin(['FILLED', 'PARTIAL']))].groupby(['symbol'])['dcaLevel'].apply(lambda x: x.shift(1) + 1)

这种方式会引发错误。

merged_df['dcaLevel'] = merged_df[(merged_df['side'] == 'BUY') & (merged_df['status'].isin(['FILLED', 'PARTIAL']))].groupby(['symbol'])['dcaLevel'].apply(lambda x: 0 if x.shift(1) else x.shift(1) + 1)

我尝试了以下替代方法。

symbol_df = merged_df.loc[merged_df['symbol'] == 'DGDBTC']
tmp_df = symbol_df[(symbol_df['side'] == 'BUY') & (symbol_df['status'].isin(['FILLED', 'PARTIAL']))]
tmp_df['dcaLevel'] = np.where(tmp_df['availableQty'] < tmp_df['availableQty'].shift(1), 0, tmp_df['dcaLevel'].shift(1) + 1)

它适用于某些行而不适用于其他行,并且系列中的第一个购买订单仍然是 NaN。

我编写了以下代码,它确实有效,但我确信有一种更简单的方法可以用 Pandas 做到这一点。

merged_df['dcaLevel'] = np.NaN
grouped = merged_df[merged_df['status'].isin(['FILLED', 'PARTIAL'])].groupby(['symbol'])
col_idx = merged_df.columns.get_loc('dcaLevel')
for name, group in grouped:
    first = True
    for index, row in group.iterrows():
        if row['side'] == 'SELL':
            first = True
            dca_level = np.NaN
        else:
            if first:
                first = False
                dca_level = 0
            else:
                dca_level = dca_level + 1
            merged_df.iloc[index, col_idx] = dca_level
merged_df[merged_df['symbol'] == 'DGDBTC']

我希望有人可以帮助解决这个问题。

1个回答

一个简单的方法可能如下:

import numpy as np
import pandas as pd

counter = 0

def conditional_cumulative_sum(x):
    global counter

    if x['status'] == 'EXPIRED':
        return np.nan
    elif x['side'] == 'BUY':
        temporal = counter
        counter += 1
        return temporal
    elif x['side'] == 'SELL':
        counter = 0
        return np.nan

frame = pd.read_csv('sample.csv')
frame['dcaLevel'] = frame.apply(conditional_cumulative_sum, axis=1)

假设sample.csv包含示例中提供的数据,并且列中的空值dcaLevel对应于 NaN 值。