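I'm working with a text file that has an irregular structure: a header followed by data in different sections. What I want to do is walk through the list and jump to the next section once a certain character is encountered. I've put together a simple example below. What is an elegant way to handle this?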
lines = ['a', 'b', 'c', '$', 1, 2, 3]
for line in lines:
    if line == '$':
        print("FOUND END OF HEADER")
        break
    else:
        print("Reading letters")
# Here, I start again, but I would like to continue with the actual
# state of the iterator, in order to only read the remaining elements.
for line in lines:
    print("Reading numbers")
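You can in fact share a single iterator between both loops by creating it outside the for loops with the built-in function iter. That way it is partially exhausted by the first loop, and the second loop picks up where the first one stopped.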
lines = ['a', 'b', 'c', '$', 1, 2, 3]
iter_lines = iter(lines)  # This creates an iterator on lines
for line in iter_lines:
    if line == '$':
        print("FOUND END OF HEADER")
        break
    else:
        print("Reading letters")
for line in iter_lines:
    print("Reading numbers")
The code above prints this result:
Reading letters
Reading letters
Reading letters
FOUND END OF HEADER
Reading numbers
Reading numbers
Reading numbers
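The same idea carries over to the text file mentioned in the question, because a file object is already its own iterator. Below is a minimal sketch, assuming the header ends at a line that contains only `$`. `io.StringIO` stands in for a real open file, and `read_sections` is a hypothetical helper name, not something from the original answer.

```python
import io

def read_sections(stream, marker="$"):
    """Split a stream of lines into (header, body) at the first marker line."""
    stream = iter(stream)  # no-op for files; makes plain lists work too
    header = []
    for line in stream:
        if line.strip() == marker:
            break  # the iterator stays positioned just after the marker
        header.append(line.strip())
    # This second pass resumes where the first loop stopped.
    body = [line.strip() for line in stream]
    return header, body

# io.StringIO stands in for an open text file in this sketch.
sample = io.StringIO("a\nb\nc\n$\n1\n2\n3\n")
header, body = read_sections(sample)
print(header)  # ['a', 'b', 'c']
print(body)    # ['1', '2', '3']
```

Calling `iter` on the argument first is a design choice: it leaves a file object unchanged, but it keeps the helper correct when it is handed a list, which would otherwise restart from the beginning in the second pass.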
You can use enumerate to keep track of where you are in the iteration:
lines = ['a', 'b', 'c', '$', 1, 2, 3]
for i, line in enumerate(lines):
    if line == '$':
        print("FOUND END OF HEADER")
        break
    else:
        print("Reading letters")
print(lines[i+1:])  # prints [1, 2, 3]
However, unless you actually need to process the header section, @EdChum's idea of simply using index is probably better.
A simpler way, and perhaps more Pythonic:
lines = ['a','b','c','$', 1, 2, 3]
print(lines[lines.index('$') + 1:])
# [1, 2, 3]
If you want to read each element after $ into a separate variable, try this:
lines = ['a','b','c','$', 1, 2, 3]
a, b, c = lines[lines.index('$') + 1:]
print(a, b, c)
# 1 2 3
Or, if you don't know how many elements follow $, you can do this:
lines = ['a','b','c','$', 1, 2, 3, 4, 5, 6]
a, *b = lines[lines.index('$') + 1:]
print(a, *b)
# 1 2 3 4 5 6
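One caveat with the index approach: list.index raises ValueError when the separator is not in the list, and the starred unpacking above also raises ValueError when nothing follows $. Here is a minimal sketch of a guarded version. `after_marker` is a hypothetical helper name, and returning an empty list when the separator is missing is an assumption about the desired behavior.

```python
def after_marker(items, marker="$"):
    """Return the elements following the first marker, or [] if it is absent."""
    try:
        start = items.index(marker) + 1
    except ValueError:
        return []  # assumption: a missing marker means "no data section"
    return items[start:]

print(after_marker(['a', 'b', 'c', '$', 1, 2, 3]))  # [1, 2, 3]
print(after_marker(['a', 'b', 'c']))                # []
print(after_marker(['a', '$']))                     # []
```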
If you have more separators of this kind, the most general solution is to build a small state machine to parse your data:
def state0(line):
    pass  # processing function for state0

def state1(line):
    pass  # processing function for state1

# and so on...

states = (state0, state1, ...)  # tuple grouping all processing functions
separators = {'$': 1, '#': 2, ...}  # linking separators and states
state = 0  # initial state
for line in text:
    if line in separators:
        print('Found separator', line)
        state = separators[line]  # change state
    else:
        states[state](line)  # process line with associated function
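This solution correctly handles any number of separators, in any order and with any number of repetitions. The only constraint is that a given separator is always followed by the same kind of data, which can be processed by its associated function.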
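To make the template above concrete, here is a minimal runnable sketch. The three sections (header, numbers, notes), the `parse` function, and the sample input are assumptions made for illustration. Each processing function here simply collects its lines, whereas real handlers would do actual parsing.

```python
def parse(lines):
    """Route each line to the handler of the section it belongs to."""
    header, numbers, notes = [], [], []

    # One processing function per state; here each just collects its lines.
    states = (header.append, numbers.append, notes.append)
    # Each separator switches to the state that handles the data after it.
    separators = {'$': 1, '#': 2}

    state = 0  # initial state: the header, before any separator
    for line in lines:
        if line in separators:
            state = separators[line]  # change state
        else:
            states[state](line)  # process line with associated function
    return header, numbers, notes

# '$' appears twice to show that separators may repeat in any order.
text = ['a', 'b', '$', 1, 2, '#', 'x', '$', 3]
print(parse(text))  # (['a', 'b'], [1, 2, 3], ['x'])
```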