Split a Python string at the newline closest to the middle

Question  Votes: 0  Answers: 4

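I have a string in Python that is about 3,900 characters long. It contains a variety of characters, including many newlines. For simplicity, consider the following string: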

s = "this is a looooooooooooooooooooooooooong string which is \n split into \n a lot of \n new lines \n and I need to split \n it into roughly \n two halves on the new line\n"

I want to split the string above into roughly two halves at a \n, so the desired result would look like this:

first part = "this is a looooooooooooooooooooooooooong string which is \n split into \n a lot of "
second part = " new lines \n and I need to split \n it into roughly \n two halves on the new line\n"

I have this Python code:

firstpart, secondpart = s[:len(s)//2], s[len(s)//2:]  # // keeps the index an int on Python 3

But obviously this cuts the string exactly in half, at whatever character happens to be at that position.

python string
4 Answers
3
votes

Something like this?

mid = len(s)//2

try:
    break_at = mid + min(-s[mid::-1].index('\n'), s[mid:].index('\n'), key=abs)
except ValueError:  # no '\n' at or before mid, or none at or after it
    break_at = len(s)

firstpart, secondpart = s[:break_at], s[break_at:]

secondpart will start with the newline.
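Note that the snippet above falls back to len(s) as soon as either search raises ValueError, which also happens when a newline exists on only one side of the midpoint. A minimal sketch of a variant built on str.rfind and str.find, which return -1 instead of raising; the function name split_near_middle is my own and not part of the answer:

```python
def split_near_middle(s):
    """Split s at the newline closest to its midpoint (hypothetical helper)."""
    mid = len(s) // 2
    left = s.rfind('\n', 0, mid + 1)  # last newline at or before mid, -1 if none
    right = s.find('\n', mid)         # first newline at or after mid, -1 if none
    candidates = [i for i in (left, right) if i != -1]
    if not candidates:
        # No newline anywhere: keep the whole string in the first part.
        return s, ''
    # On a tie the left candidate wins, because it comes first in the list.
    break_at = min(candidates, key=lambda i: abs(i - mid))
    return s[:break_at], s[break_at:]
```

As in the answer, the second part starts with the newline; use s[break_at + 1:] if it should be dropped.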


2
votes

Try this:

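mid = len(s)//2  # integer division, so the result can be used as an index on Python 3
about_mid = mid + s[mid:].index('\n')  # first newline at or after mid; ValueError if there is none

parts = s[:about_mid], s[about_mid+1:]  # the newline itself is dropped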

2
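votes

Here's another way. Split the string on '\n' and keep track of three things:

  • the index in the list of split strings
  • the absolute difference between the current substring's position and the middle of the string
  • the substring itself

For example: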

s_split = [(i, abs(len(s)//2 - s.find(x)), x) for i, x in enumerate(s.split('\n'))]
#[(0, 81, 'this is a looooooooooooooooooooooooooong string which is '),
# (1, 23, ' split into '),
# (2, 10, ' a lot of '),
# (3, 1, ' new lines '),
# (4, 13, ' and I need to split '),
# (5, 35, ' it into roughly '),
# (6, 53, ' two halves on the new line'),
# (7, 81, '')]

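Now you can rank this list by the second element of each tuple to find the substring closest to the middle (the code below takes the minimum). Use its index to rebuild the two strings by joining with '\n':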

idx_left = min(s_split, key=lambda x: x[1])[0]
first = "\n".join([s_split[i][2] for i in range(idx_left)])
second = "\n".join([s_split[i][2] for i in range(idx_left, len(s_split))])

print("%r"%first)
print("%r"%second)
#'this is a looooooooooooooooooooooooooong string which is \n split into \n a lot of '
#' new lines \n and I need to split \n it into roughly \n two halves on the new line\n'
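One caveat: s.find(x) returns the first occurrence of x, so repeated or empty lines (such as the trailing '' above, which is reported at offset 0) are measured from the wrong place. A sketch of the same idea that tracks a running offset instead; split_by_line_start is a hypothetical name, and it assumes the split should happen just before the line whose start is nearest the midpoint:

```python
def split_by_line_start(s):
    """Split s before the line whose start is nearest the midpoint (hypothetical helper)."""
    mid = len(s) // 2
    lines = s.split('\n')
    offset = 0                      # start position of the current line in s
    best_idx, best_dist = 0, None
    for i, line in enumerate(lines):
        dist = abs(mid - offset)
        if best_dist is None or dist < best_dist:
            best_idx, best_dist = i, dist
        offset += len(line) + 1     # +1 for the '\n' that split() removed
    first = '\n'.join(lines[:best_idx])
    second = '\n'.join(lines[best_idx:])
    return first, second
```

With "ab\nab\nab\nab" every line is identical, so the find-based version would measure all of them from offset 0, while the running offset picks the third line. If the string has no newline at all, the first part comes back empty.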

0
votes

Try this as well.

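split = s.splitlines()
half = len(split) // 2  # halves by line count, not by character position

# splitlines() drops the newlines, so put them back when joining
first = '\n'.join(split[:half])
second = '\n'.join(split[half:])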
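str.splitlines() discards the line terminators, so they are lost unless they are re-inserted when joining. A small sketch of the same line-count split using keepends=True, which leaves each terminator attached to its line so that the two parts concatenate back to the original string; the helper name split_by_line_count is hypothetical:

```python
def split_by_line_count(s):
    """Split s into two parts with roughly the same number of lines (hypothetical helper)."""
    lines = s.splitlines(keepends=True)  # each item keeps its trailing line terminator
    half = len(lines) // 2
    return ''.join(lines[:half]), ''.join(lines[half:])
```

This balances the number of lines rather than the number of characters, so one very long line can still make the two parts quite uneven.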