Python - Extract hyphen-separated numbers from a string

Question (votes: 0, answers: 3)

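I am trying to extract two numbers (depth from and depth to) that are separated by a hyphen in a dataframe column named Depth. The first number is extracted correctly, but the second one is not. I have tried many approaches.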

import re
import pandas as pd

ConvCore = pd.read_csv(r'ConvCore.csv', encoding='cp1252')
ConvCore.columns = ['Depth', 'k', 'phi', 'Well']
# First number (DepthFrom): this extraction works
ConvCore['DepthFrom'] = ConvCore['Depth'].str.extract(r'([0-9.]+)', expand=False)

#ConvCore['DepthTo'] = ConvCore['Depth'].str.extract('-([0-9.]+)')
#for i in ConvCore:
    #ConvCore['DepthTo'] = re.search(r'(\d+)-', ConvCore['Depth'][i-1])
    #ConvCore['DepthFrom'] = ConvCore['Depth'].str.extract('(\d+)').astype(float)
    #DepthTo = ConvCore['Depth'].str.extract('(?P<digit1>[0123456789])').astype(float)
    #ConvCore['DepthTo'] = ConvCore['Depth'].str.split("-")
    #ConvCore['DepthFrom'] = re.match(r'(\d+)', ConvCore['Depth']).group()

python string expression extract
3 Answers
0 votes

Try this approach:

ConvCore['DepthFrom'] = ConvCore['Depth'].str.split("-", expand=True)[0]
ConvCore['DepthTo'] = ConvCore['Depth'].str.split("-", expand=True)[1]
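
As a minimal sketch of the same split-based idea, the example below splits once and reuses the result, then converts both pieces to numbers. The sample values are hypothetical stand-ins for the contents of ConvCore.csv, which is not shown in the question.

```python
import pandas as pd

# Hypothetical sample data standing in for the 'Depth' column of ConvCore.csv
ConvCore = pd.DataFrame({'Depth': ['1000.5-1001.0', '1001.0-1002.25', '2000-2001']})

# Split at most once on the hyphen; expand=True returns one column per piece
parts = ConvCore['Depth'].str.split('-', n=1, expand=True)

# The pieces are strings, so convert them to numbers before using them as depths
ConvCore['DepthFrom'] = pd.to_numeric(parts[0])
ConvCore['DepthTo'] = pd.to_numeric(parts[1])

print(ConvCore)
```

Splitting once avoids running the same string operation twice, and the numeric conversion matters if the depths are later compared or plotted.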

0 votes

You can split the values and then assign the new values to the dataframe. I used a sample dataset to simulate your scenario:

In [4]: df = pd.DataFrame({'num_legs': ['20-30', '40-60', '80-90', '0-10'],
   ...:                    'num_wings': [2, 0, 0, 0],
   ...:                    'num_specimen_seen': [10, 2, 1, 8]},
   ...:                   index=['falcon', 'dog', 'spider', 'fish'])

In [5]: ndf = pd.DataFrame(df.num_legs.str.split('-').tolist(), columns = ['x1', 'x2'])

In [6]: df[ ndf.columns ] = ndf.values

In [7]: df
Out[7]:
       num_legs  num_wings  num_specimen_seen  x1  x2
falcon    20-30          2                 10  20  30
dog       40-60          0                  2  40  60
spider    80-90          0                  1  80  90
fish       0-10          0                  8   0  10

So in your case, the code would look like this:

ndf = pd.DataFrame(ConvCore.Depth.str.split('-').tolist(), columns = ['DepthFrom', 'DepthTo'])

ConvCore[ ndf.columns ] = ndf.values
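
A possible variation on this answer, offered as a sketch rather than a replacement: assigning the result of str.split(..., expand=True) directly keeps the original index, so no intermediate dataframe or .values is needed. It reuses the sample num_legs data from above; the numeric conversion with pd.to_numeric is an addition not present in the answer.

```python
import pandas as pd

# Same sample data as in the answer, reduced to the column being split
df = pd.DataFrame({'num_legs': ['20-30', '40-60', '80-90', '0-10']},
                  index=['falcon', 'dog', 'spider', 'fish'])

# expand=True yields a two-column frame that shares df's index,
# so it can be assigned to two new columns in one step
pieces = df['num_legs'].str.split('-', expand=True)

# Convert each string column to a numeric dtype (an extra step, not in the answer)
df[['x1', 'x2']] = pieces.apply(pd.to_numeric)

print(df)
```

Because the assignment aligns on the index, rows stay matched even when the dataframe has been filtered or reordered beforehand.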

0 votes

Your column values contain a single hyphen (-), so use:

ConvCore['DepthFrom'] = ConvCore['Depth'].str.extract(r'(\d*\.?\d+)-', expand=False)
ConvCore['DepthTo'] = ConvCore['Depth'].str.extract(r'-(\d*\.?\d+)', expand=False)


Regex explanation

\d*  matches a digit (equivalent to [0-9]) between zero and unlimited times, as many times as possible, giving back as needed (greedy)
\.? matches the character . (index 46 decimal, 2E hexadecimal, 56 octal) literally between zero and one times, as many times as possible, giving back as needed (greedy)
\d+ matches a digit (equivalent to [0-9]) between one and unlimited times, as many times as possible, giving back as needed (greedy)
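
The two patterns above could also be combined into one, as in the following sketch. The sample values are hypothetical, the named groups are an illustrative choice, and the optional \s* around the hyphen is an assumption (tolerating stray spaces) that is not part of the answer.

```python
import pandas as pd

# Hypothetical sample data; the real 'Depth' values are not shown in the question
ConvCore = pd.DataFrame({'Depth': ['1000.5-1001.0', '.5-1.25', '2000 - 2001']})

# One pattern with two named groups; the group names become the column names.
# \s* around the hyphen is an assumption to tolerate optional whitespace.
pattern = r'(?P<DepthFrom>\d*\.?\d+)\s*-\s*(?P<DepthTo>\d*\.?\d+)'

# str.extract returns a dataframe with one column per named group (as strings)
depths = ConvCore['Depth'].str.extract(pattern).astype(float)

# Attach the two new numeric columns to the original dataframe
ConvCore = ConvCore.join(depths)

print(ConvCore)
```

Rows that do not match the pattern come back as NaN in both columns, which makes malformed depth strings easy to spot.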