Extract rows based on conditions (Pandas, Python)

Question

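I need to extract rows that satisfy certain conditions.

The col1 column should contain all the words in list_words.
The last word should be Story.
The last word of the next row should be ac.

Here is my current code: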

import pandas as pd

df = pd.DataFrame({'col1': ['Draft SW Quality Assurance Story', 'alex ac', 'anny ac', 'antoine ac','aze epic', 'bella ac', 'Complete SW Quality Assurance Plan Story', 'celine ac','wqas epic', 'karmen ac', 'kameilia ac', 'Update SW Quality Assurance Plan Story', 'joseph ac','Update SW Quality Assurance Plan ac', 'joseph ac'],
                   'col2': ['aa', 'bb', 'cc', 'dd','ee', 'ff', 'gg', 'hh', 'ii', 'jj', 'kk', 'll', 'mm', 'nn', 'oo']}) 
print(df)

list_words="SW Quality Plan Story"
set_words = set(list_words.split())
#check if list_words is in the cell
df['TrueFalse']=pd.concat([df.col1.str.contains(word,regex=False) for word in list_words.split()],axis=1).sum(1) > 1 

print('\n',df)
#extract last word
df["Suffix"] = df["col1"].str.split().str[-1]
print('\n',df)
df['ok']=''
for i in range (len(df)-1):
    if ((df["Suffix"].iloc[i]=='Story') & (df["TrueFalse"].iloc[i]=='True') & (df["Suffix"].iloc[i+1]=='ac')):
        df['ok'].iloc[i+1]=df["Suffix"].iloc[i+1]

print('\n',df)

Output:

                                         col1 col2  TrueFalse Suffix ok
0           Draft SW Quality Assurance Story   aa       True  Story   
1                                    alex ac   bb      False     ac   
2                                    anny ac   cc      False     ac   
3                                 antoine ac   dd      False     ac   
4                                   aze epic   ee      False   epic   
5                                   bella ac   ff      False     ac   
6   Complete SW Quality Assurance Plan Story   gg       True  Story   
7                                  celine ac   hh      False     ac   
8                                  wqas epic   ii      False   epic   
9                                  karmen ac   jj      False     ac   
10                               kameilia ac   kk      False     ac   
11    Update SW Quality Assurance Plan Story   ll       True  Story   
12                                 joseph ac   mm      False     ac   
13       Update SW Quality Assurance Plan ac   nn       True     ac   
14                                 joseph ac   oo      False     ac

Row 13 should be set to False.

Desired output:

                                         col1 col2  TrueFalse Suffix 
0           Draft SW Quality Assurance Story   aa       True  Story 
1                                    alex ac   bb      False     ac   
2                                    anny ac   cc      False     ac   
3                                 antoine ac   dd      False     ac   
6   Complete SW Quality Assurance Plan Story   gg       True  Story   
7                                  celine ac   hh      False     ac   
11    Update SW Quality Assurance Plan Story   ll       True  Story   
12                                 joseph ac   mm      False     ac

Answer 1

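Here is one way to do it. Join the search words with the pipe delimiter so they form a regular expression; note that this alternation matches a row containing any one of the words, not necessarily all of them. Then check that the same row ends with Story and that the next row (df['col1'].shift(-1)) ends with ac.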

import pandas as pd

df = pd.DataFrame({'col1': ['Draft SW Quality Assurance Story', 'alex ac', 'anny ac', 'antoine ac','aze epic', 'bella ac', 'Complete SW Quality Assurance Plan Story', 'celine ac','wqas epic', 'karmen ac', 'kameilia ac', 'Update SW Quality Assurance Plan Story', 'joseph ac','Update SW Quality Assurance Plan ac', 'joseph ac'],
                   'col2': ['aa', 'bb', 'cc', 'dd','ee', 'ff', 'gg', 'hh', 'ii', 'jj', 'kk', 'll', 'mm', 'nn', 'oo']}) 
print(df)

list_words="SW Quality Plan Story"
set_words = set(list_words.split())
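# True when the row contains any of the words, ends with 'Story',
# and the next row ends with 'ac' (na=False covers the last row, which has no next row)
df['TrueFalse'] = (df['col1'].str.contains('|'.join(set_words))
                   & df['col1'].str.endswith('Story')
                   & df['col1'].shift(-1).str.endswith('ac', na=False))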
print(df)

                                        col1 col2  TrueFalse
0           Draft SW Quality Assurance Story   aa       True
1                                    alex ac   bb      False
2                                    anny ac   cc      False
3                                 antoine ac   dd      False
4                                   aze epic   ee      False
5                                   bella ac   ff      False
6   Complete SW Quality Assurance Plan Story   gg       True
7                                  celine ac   hh      False
8                                  wqas epic   ii      False
9                                  karmen ac   jj      False
10                               kameilia ac   kk      False
11    Update SW Quality Assurance Plan Story   ll       True
12                                 joseph ac   mm      False
13       Update SW Quality Assurance Plan ac   nn      False
14                                 joseph ac   oo      False
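
The pipe-joined pattern above is true as soon as one of the words occurs, which is why row 0 (no "Plan") is still flagged. As a minimal sketch of the difference between "any word" and "all words", here is a comparison on three of the sample strings; the names has_any and has_all are only illustrative and do not appear in the question.

```python
import pandas as pd

col = pd.Series(['Draft SW Quality Assurance Story',
                 'Complete SW Quality Assurance Plan Story',
                 'alex ac'])
words = ['SW', 'Quality', 'Plan', 'Story']

# Regex alternation: True when at least one of the words occurs in the string
has_any = col.str.contains('|'.join(words))

# Set comparison: True only when every word occurs as a whitespace-separated token
has_all = col.apply(lambda s: set(words).issubset(s.split()))

print(pd.DataFrame({'col1': col, 'has_any': has_any, 'has_all': has_all}))
```

The first string passes the "any" test but fails the "all" test because it lacks "Plan"; which of the two the question really wants is for the asker to decide.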

Answer 2

Here are your conditions, each written separately. Note how condition_1 works now:

# Condition 1: set_words minus the words in col1 is empty, i.e. col1 contains every word
df["condition_1"] = df.col1.apply(lambda x: not bool(set_words - set(x.split())))

# Condition 2: the last word should be 'Story'
df["condition_2"] = df.col1.str.endswith("Story") 

# Condition 3: the last word in the next row should be ac. See `shift(-1)`
df["condition_3"] = df.col1.str.endswith("ac").shift(-1) 

print(df)

Output:

                                        col1 col2  condition_1  condition_2 condition_3
0           Draft SW Quality Assurance Story   aa        False         True        True
1                                    alex ac   bb        False        False        True
2                                    anny ac   cc        False        False        True
3                                 antoine ac   dd        False        False       False
4                                   aze epic   ee        False        False        True
5                                   bella ac   ff        False        False       False
6   Complete SW Quality Assurance Plan Story   gg         True         True        True
7                                  celine ac   hh        False        False       False
8                                  wqas epic   ii        False        False        True
9                                  karmen ac   jj        False        False        True
10                               kameilia ac   kk        False        False       False
11    Update SW Quality Assurance Plan Story   ll         True         True        True
12                                 joseph ac   mm        False        False        True
13       Update SW Quality Assurance Plan ac   nn        False        False        True
14                                 joseph ac   oo        False        False         NaN
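
The NaN in the last row of condition_3 appears because shift(-1) has nothing to pull into the final position. If a plain boolean column is preferred, one option (a sketch, not part of the original answer) is the fill_value argument of Series.shift; the three-row Series here is just a cut-down stand-in for col1.

```python
import pandas as pd

col1 = pd.Series(['Update SW Quality Assurance Plan Story', 'joseph ac', 'wqas epic'])

# Without fill_value the shifted Series ends in NaN, so it is no longer purely boolean
shifted = col1.str.endswith('ac').shift(-1)

# With fill_value=False the last row becomes False and the dtype stays bool
condition_3 = col1.str.endswith('ac').shift(-1, fill_value=False)

print(condition_3.tolist())  # [True, False, False]
```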

Here is how to find all rows that satisfy all three conditions:

>>> print(df[df.condition_1 & df.condition_2 & df.condition_3])
                                        col1 col2  condition_2 condition_3  condition_1
6   Complete SW Quality Assurance Plan Story   gg         True        True         True
11    Update SW Quality Assurance Plan Story   ll         True        True         True

Or you can store the result in a separate column, conditions:

df["conditions"] = df.condition_1 & df.condition_2 & df.condition_3

>>> print(df)
                                        col1 col2  condition_2 condition_3  condition_1  conditions
0           Draft SW Quality Assurance Story   aa         True        True        False       False
1                                    alex ac   bb        False        True        False       False
2                                    anny ac   cc        False        True        False       False
3                                 antoine ac   dd        False       False        False       False
4                                   aze epic   ee        False        True        False       False
5                                   bella ac   ff        False       False        False       False
6   Complete SW Quality Assurance Plan Story   gg         True        True         True        True
7                                  celine ac   hh        False       False        False       False
8                                  wqas epic   ii        False        True        False       False
9                                  karmen ac   jj        False        True        False       False
10                               kameilia ac   kk        False       False        False       False
11    Update SW Quality Assurance Plan Story   ll         True        True         True        True
12                                 joseph ac   mm        False        True        False       False
13       Update SW Quality Assurance Plan ac   nn        False        True        False       False
14                                 joseph ac   oo        False         NaN        False       False
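
The masks above flag only the Story rows, whereas the desired output in the question also keeps the ac rows that follow each flagged Story row. One possible way to get there is sketched below. It rests on two assumptions that the question does not state outright: a heading qualifies when it contains any of the words (row 0 has no "Plan" but is in the desired output), and a new block starts at every row that either does not end with ac or mentions one of the words (so that row 13 opens its own, rejected block). The names starts_block, block_id, is_head and keep are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['Draft SW Quality Assurance Story', 'alex ac', 'anny ac', 'antoine ac',
             'aze epic', 'bella ac', 'Complete SW Quality Assurance Plan Story',
             'celine ac', 'wqas epic', 'karmen ac', 'kameilia ac',
             'Update SW Quality Assurance Plan Story', 'joseph ac',
             'Update SW Quality Assurance Plan ac', 'joseph ac'],
    'col2': ['aa', 'bb', 'cc', 'dd', 'ee', 'ff', 'gg', 'hh', 'ii', 'jj', 'kk',
             'll', 'mm', 'nn', 'oo'],
})
words = ['SW', 'Quality', 'Plan', 'Story']

suffix = df['col1'].str.split().str[-1]              # last word of each row
has_word = df['col1'].str.contains('|'.join(words))  # any of the words occurs

# Assumption: a block starts at a row that does not end with 'ac' or mentions a word
starts_block = (suffix != 'ac') | has_word
block_id = starts_block.cumsum()

# A block head qualifies when it ends with 'Story' and the next row ends with 'ac'
is_head = has_word & (suffix == 'Story') & (suffix.shift(-1) == 'ac')

# Keep every row whose block has a qualifying head
keep = block_id.isin(block_id[is_head])
result = df[keep]
print(result)
```

On the sample data this keeps rows 0, 1, 2, 3, 6, 7, 11 and 12. If the rule for where a block starts is different in the real data, starts_block is the line to adjust.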
