import re

def parseline(line):
    line = line.values.flatten().tolist()  # flatten labeled point pandas dataframe to python list
    strLine1 = listToString(line)  # custom function just converts list to string for regex operations.
    strLine2 = re.sub(r"^1:1 |2:\d+.\d+ ", "", strLine1)  # filter string to eliminate first two indices; python string
    splitLine = strLine2.replace("0 ", "").split(" ")  # eliminate specific val; split on spaces; python list of strings
    positive = 0  # variable for presence/absence of something instantiated
    for feature in splitLine:
        featureIndex = feature.split(":")[0]
        featureValue = feature.split(":")[1]
        if featureIndex in toRemove:  # toRemove is a list of vals to eliminate from each line; this works
            positive = 1
    newLine = ""
    if positive == 1:
        newLine = [i for i in toRemove not in splitLine]  # goal here is to remove values found in the toRemove from the newLine
        newLine = "1" + " " + newLine
        print(newLine)
    else:
        newLine = "0" + " " + strLine2
    return newLine
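This is some code from a project I'm working on. I have successfully generated a list containing the values I don't want included in each line. The list is called "toRemove".
The conditional "if featureIndex in toRemove" works. A print statement confirms this: it prints "This index needs removing from final list" next to every "featureIndex" found in "toRemove".
The problem is the second conditional (if positive == 1, vs. else). The "if positive == 1" branch returns a list that is just a duplicate of "toRemove", whereas the "else" branch returns the correct list.
For example: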
'if positive == 1:' list output:
['20', '68', '112', '264', '384', '449', '454', '749', '839',...] #this is just a copy of the 'toRemove' list
'else:' list output:
0 3:0.0 4:1 12:1 36710:1 36725:1 36791:1 86715:1 98190:1
I initially tried to treat this as a data type problem, hence the bookkeeping comments next to the conversion statements.
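One way to look at the type question, as a minimal sketch rather than a confirmed diagnosis: in the comprehension above, "toRemove not in splitLine" is evaluated first and yields a single boolean, so nothing is actually filtered. A filter would instead compare each feature's index against the removal list, and because a list comprehension produces a list, the result has to be joined back into a string before it can be concatenated with "1". The sample values and the names to_remove, split_line, and kept are hypothetical.

```python
# Hypothetical sample data, modeled on the formats shown in the question.
to_remove = ["20", "68", "112"]  # indices stored as strings, as in the output above
split_line = ["3:00", "4:1", "9:1", "20:1", "40:1"]

# Keep only the features whose index (the part before ":") is not in to_remove.
kept = [f for f in split_line if f.split(":")[0] not in to_remove]

# kept is a list of strings; join it into one string before adding the label.
new_line = "1 " + " ".join(kept)
print(new_line)
```

Note that the indices are compared as strings on both sides; if to_remove held integers instead, the membership test would silently never match.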
Where exactly am I going wrong here?
EDIT: The input file sent through the 'parseline' function has the following format:
1:1 2:00 3:00 4:1 9:1 20:1 40:1... # say index 20 is one of the indices in 'toRemove'
1:1 2:10 3:00 45:1 85:1 99:1 100:1... # say none of the index vals in this line are in 'toRemove'
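"parseline(line)" removes indices 1 and 2, then checks each remaining index against the "toRemove" list, drops the matching items, and outputs a "newLine" string for each line in the original input file.
For the same two sample inputs, the 'newLine' output should be: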
1 3:00 4:1 9:1 40:1... #notice index 20 is gone, and its presence in the list is accounted for by the 1
0 3:00 45:1 85:1 99:1 100:1... #notice since none of the indices in the original list were in the 'toRemove' list,
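The behavior described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the code that was eventually used: it assumes each line is already a plain string (the pandas flattening and listToString steps are left out), it drops indices 1 and 2 by index instead of by regex, and it omits the replace("0 ", "") step because the expected output keeps values such as 3:00. The names parse_line and to_remove are hypothetical.

```python
def parse_line(line, to_remove):
    """Drop indices 1 and 2, remove unwanted indices, and prefix a 0/1 label."""
    # Assumption: line is a plain string such as "1:1 2:00 3:00 4:1".
    features = line.split()

    # Drop the first two indices (1 and 2).
    features = [f for f in features if f.split(":")[0] not in ("1", "2")]

    # Use a set of string indices for fast membership tests.
    remove = set(to_remove)
    kept = [f for f in features if f.split(":")[0] not in remove]

    # Label is "1" if at least one feature was removed, otherwise "0".
    label = "1" if len(kept) < len(features) else "0"
    return " ".join([label] + kept)


print(parse_line("1:1 2:00 3:00 4:1 9:1 20:1 40:1", ["20", "68"]))
print(parse_line("1:1 2:10 3:00 45:1 85:1 99:1 100:1", ["20", "68"]))
```

Building the result with " ".join keeps everything a string from start to finish, which sidesteps mixing list and string types when the label is added.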
It was a data type problem. The problem has been solved. Thanks, everyone.