Determining the position of variable values by counting characters

Question

I need to compute the character position of each variable's value in a DataFrame. For example, take this DataFrame:

import pandas as pd

# Create the DataFrame
data = {
    'ol': ['H_KXKnn1_01_p_lk0', 'H_KXKnn1_02_p_lk0', 'H_KXKnn1_03_p_lk0'],
    'nl': [12.01, 89.01, 25.01],
    'nol': ['Xn', 'Ln', 'Rn'],
    'nolp': [68, 70, 72],
    'nolxx': [0.0, 1.0, 5.0]
}

df = pd.DataFrame(data)

I save this DataFrame as a tab-separated .dat file:

df.to_csv('your_file.dat', sep='\t', index=False)

When I count the characters in the .dat file to find where each value starts and ends, I get:

variable position (start,end)
ol        (0,17)
nl        (18,23)
nol       (24,26)
nolp      (27,29)
nolxx     (30,33)
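For reference, these positions can be reproduced by walking along one line of the file. This is only a minimal sketch: `field_positions` is a hypothetical helper name, the header and row strings are typed in by hand to mirror what `to_csv` writes for the first data row, and the end of each span is taken to be the index of the tab that follows the value.

```python
def field_positions(header, row, sep="\t"):
    """Map each column name to (start, end) within one separated line.

    `end` is the index just past the value, i.e. where the separator sits.
    """
    positions = {}
    start = 0
    for name, value in zip(header.split(sep), row.split(sep)):
        end = start + len(value)
        positions[name] = (start, end)
        start = end + 1  # skip the separator itself
    return positions


# Hand-written copies of the header and first data row (assumption).
header = "ol\tnl\tnol\tnolp\tnolxx"
row = "H_KXKnn1_01_p_lk0\t12.01\tXn\t68\t0.0"

result = field_positions(header, row)
print(result)
# {'ol': (0, 17), 'nl': (18, 23), 'nol': (24, 26), 'nolp': (27, 29), 'nolxx': (30, 33)}
```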

I also want "_", "." and spaces to count as characters. However, when I run this code, which iterates over each column:

positions = {}
current_pos = 0

for col in df.columns:
    col_length = df[col].astype(str).apply(len).max() + df[col].astype(str).apply(lambda x: x.count('_') + x.count('.')).max()
    positions[col] = (current_pos, current_pos + col_length - 1)
    current_pos += col_length + 1

positions_df = pd.DataFrame(list(positions.items()), columns=['Variable', 'Position'])

it returns the following values:

  Variable    Position
      ol      (0, 20)
      nl     (22, 27)
     nol     (29, 30)
    nolp     (32, 33)
   nolxx     (35, 38)

I am not sure why it returns different positions. Any help on how to do this would be very welcome. Thanks!

Answer 1

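Every string in the first column is 17 characters long, and len already counts '_' and '.' as characters. Your code then adds the following extra term to that length, which is what makes the results differ:

df[col].astype(str).apply(lambda x: x.count('_') + x.count('.')).max()

Each string in ol contains 4 '_' characters, so col_length becomes 17 + 4 = 21, and the end position becomes 0 + 21 - 1 = 20.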
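A quick check on one value from the ol column shows the double counting. This sketch uses only built-in string methods; the variable names are illustrative.

```python
s = "H_KXKnn1_01_p_lk0"

length = len(s)                      # already includes '_' and '.'
extra = s.count("_") + s.count(".")  # the term the question's code adds on top
inflated = length + extra            # col_length as computed in the question
end_pos = 0 + inflated - 1           # end position reported for 'ol'

print(length, extra, inflated, end_pos)
# 17 4 21 20
```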

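Here is a modified version of your code that drops the extra term (and the "- 1" in the end position), so it produces the same positions as your first table: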

positions = {}
current_pos = 0

for col in df.columns:
    col_length = df[col].astype(str).apply(len).max()
    positions[col] = (current_pos, current_pos + col_length)
    current_pos += col_length + 1

positions_df = pd.DataFrame(list(positions.items()), columns=['Variable', 'Position'])

This is the output:

 Variable  Position
0       ol   (0, 17)
1       nl  (18, 23)
2      nol  (24, 26)
3     nolp  (27, 29)
4    nolxx  (30, 33)
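One way to sanity-check these spans is to slice a line of the file with them: treated as half-open ranges (start included, end excluded), they should return the original values. The sketch below assumes the row string is a hand-written copy of the first data line. It works here because every value in a given column has the same width; in a tab-separated file with values of varying width, the positions would differ from row to row.

```python
# Hand-written copy of the first data line of the .dat file (assumption).
row = "H_KXKnn1_01_p_lk0\t12.01\tXn\t68\t0.0"

positions = {
    "ol": (0, 17),
    "nl": (18, 23),
    "nol": (24, 26),
    "nolp": (27, 29),
    "nolxx": (30, 33),
}

# Slice the line with each (start, end) pair; end is exclusive.
fields = {name: row[start:end] for name, (start, end) in positions.items()}
print(fields)
# {'ol': 'H_KXKnn1_01_p_lk0', 'nl': '12.01', 'nol': 'Xn', 'nolp': '68', 'nolxx': '0.0'}
```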
