Python: pick random lines from a file, then delete those lines

Question

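I'm new to Python (I learned it through the Codecademy course) and could use some help figuring this out.

I have a file, 'TestingDeleteLines.txt', that contains about 300 lines of text. Right now I'm trying to get it to print 10 random lines from that file and then delete those lines.

So, if my file has these 10 lines: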

Carrot
Banana
Strawberry
Canteloupe
Blueberry
Snacks
Apple
Raspberry
Papaya
Watermelon

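I need it to pick randomly from those lines, tell me that it picked Blueberry, Carrot, Watermelon and Banana, and then delete those lines.

The issue is that when Python reads a file, it reads through it once, and when it reaches the end it won't go back and delete the lines. My current thinking is that I could write the lines to a list, then reopen the file, match the list against the text file, and delete a line whenever there is a match.

My current problem is twofold:

It duplicates the random elements. If it picks a line, I need it not to pick that same line again. However, using random.sample doesn't seem to work, because I need the lines kept separate for when I later append each one to a URL.

I don't feel that my logic (write to an array -> find matches in the text file -> delete) is the most ideal logic. Is there a better way to write this?

import webbrowser
import random

# Eventually, I need a base URL + my 10 random lines opening in each new tab:
# url = 'http://www.google.com'
# webbrowser.open_new_tab(url + myline)

def ShowMeTheRandoms():
    DeleteList = []
    lines = open('TestingDeleteLines.txt').read().splitlines()
    for x in range(0, 10):
        myline = random.choice(lines)  # can return the same line more than once
        print(myline)  # debugging, remove later
        DeleteList.append(myline)
    print(DeleteList)  # debugging, remove later

ShowMeTheRandoms()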

Answer 1

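"I have a file, 'TestingDeleteLines.txt', that contains about 300 lines of text. Right now I'm trying to get it to print 10 random lines from that file and then delete those lines."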

#!/usr/bin/env python
import random

k = 10
filename = 'TestingDeleteLines.txt'
with open(filename) as file:
    lines = file.read().splitlines()

if len(lines) > k:
    random_lines = random.sample(lines, k)
    print("\n".join(random_lines)) # print random lines

    with open(filename, 'w') as output_file:
        output_file.writelines(line + "\n"
                               for line in lines if line not in random_lines)
elif lines: # file is too small
    print("\n".join(lines)) # print all lines
    with open(filename, 'wb', 0): # empty the file
        pass

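The O(n**2) algorithm can be improved if necessary (you don't need to for a tiny file such as your input). The cost comes from the test "line not in random_lines", which scans the list of chosen lines once for every line in the file.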
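One possible way to make that improvement, sketched here under the assumption that it is acceptable to sample line positions instead of line text: choose k indices, keep them in a set so each membership test is O(1), and rewrite the file in a single pass. The function name pop_random_lines is made up for this illustration and is not part of the answer above.

```python
import random

def pop_random_lines(filename, k):
    """Remove up to k randomly chosen lines from filename and return them."""
    with open(filename) as f:
        lines = f.read().splitlines()
    k = min(k, len(lines))  # a short file simply gives up all of its lines
    # Sample positions rather than text: set lookups are O(1).
    chosen = set(random.sample(range(len(lines)), k))
    picked = [lines[i] for i in sorted(chosen)]
    with open(filename, 'w') as f:
        f.writelines(line + "\n"
                     for i, line in enumerate(lines) if i not in chosen)
    return picked
```

Because positions are compared instead of text, a line that happens to appear twice in the file is only removed at the position that was actually drawn.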

Answer 2

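The point is: you don't "delete" from a file; you rewrite the whole file (or another file) with the new content. The canonical way is to read the original file line by line, write the lines you want to keep to a temporary file, and then replace the old file with the new one.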

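import os

# should_we_keep_this_line() stands for your own predicate
with open("/path/to/source.txt") as src, open("/path/to/temp.txt", "w") as dest:
    for line in src:
        if should_we_keep_this_line(line):
            dest.write(line)
# os.replace (Python 3.3+) overwrites an existing target on every platform;
# os.rename raises an error on Windows when the target already exists
os.replace("/path/to/temp.txt", "/path/to/source.txt")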
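A self-contained variant of the same read, filter, replace pattern might look like the sketch below. The helper name filter_file and its keep parameter are assumptions for illustration; the temporary file is created with tempfile.mkstemp in the same directory as the original so the final swap stays on one filesystem.

```python
import os
import tempfile

def filter_file(path, keep):
    """Rewrite path in place, keeping only lines for which keep(line) is true."""
    directory = os.path.dirname(os.path.abspath(path))
    # Put the temp file next to the original so os.replace is a plain rename.
    fd, tmp_path = tempfile.mkstemp(dir=directory, text=True)
    try:
        with os.fdopen(fd, 'w') as dest, open(path) as src:
            for line in src:
                if keep(line):
                    dest.write(line)
        os.replace(tmp_path, path)  # swap the new file in over the old one
    except BaseException:
        os.remove(tmp_path)  # do not leave a stray temp file behind
        raise
```

For example, filter_file('TestingDeleteLines.txt', lambda line: line.strip() != 'Banana') would drop every line that reads Banana. Note that mkstemp creates the file with restrictive permissions, so the rewritten file may not keep the original's mode.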

Answer 3

What about list.pop? It gives you the item and updates the list in one step.

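import random

with open('TestingDeleteLines.txt') as f:  # file name taken from the question
    lines = f.readlines()
deleted = []

# sample distinct indices; never ask for more than the file has
indices_to_delete = random.sample(range(len(lines)), min(10, len(lines)))

# sort to delete the biggest index first, so earlier pops don't shift later ones
indices_to_delete.sort(reverse=True)

for i in indices_to_delete:
    # lines.pop(i) deletes the item at index i and returns it;
    # keep the index too in case you need its position in the original file
    deleted.append((i, lines.pop(i)))

# write the updated *lines* back to the file (or to a new file);
# everything that was removed is in *deleted* if you need it again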
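Why the indices are sorted in reverse can be seen on a tiny list. The following sketch uses fixed indices instead of random ones (an assumption made only so the effect is reproducible) and compares popping in ascending and descending order.

```python
lines = ['a', 'b', 'c', 'd', 'e']
indices = [1, 3]  # fixed here instead of random, to make the effect visible

# Ascending order: after pop(1) everything shifts left, so index 3
# now refers to 'e' instead of 'd'.
ascending = list(lines)
wrong = [ascending.pop(i) for i in sorted(indices)]

# Descending order: the highest position goes first, so the lower
# positions still refer to the same items.
descending = list(lines)
right = [descending.pop(i) for i in sorted(indices, reverse=True)]

print(wrong)       # ['b', 'e']
print(right)       # ['d', 'b']
print(descending)  # ['a', 'c', 'e']
```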

Answer 4

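To pick a random line from a file, you can use the space-efficient, single-pass reservoir-sampling algorithm. To delete that line, you can print everything except the selected line: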

#!/usr/bin/env python3
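import fileinput

filename = 'TestingDeleteLines.txt'  # the file from the question
# select_random_it() is defined further down in this answer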

with open(filename) as file:
    k = select_random_it(enumerate(file), default=[-1])[0]

if k >= 0: # file is not empty
    with fileinput.FileInput(filename, inplace=True, backup='.bak') as file:
        for i, line in enumerate(file):
            if i != k: # keep line
                print(line, end='') # stdout is redirected to filename

where select_random_it() implements the reservoir-sampling algorithm:

import random

def select_random_it(iterator, default=None, randrange=random.randrange):
    """Return a random element from iterator.

    Return default if iterator is empty.
    iterator is exhausted.
    O(n)-time, O(1)-space algorithm.
    """
    # from https://stackoverflow.com/a/1456750/4279
    # select 1st item with probability 100% (if input is one item, return it)
    # select 2nd item with probability 50% (or 50% the selection stays the 1st)
    # select 3rd item with probability 33.(3)%
    # select nth item with probability 1/n
    selection = default
    for i, item in enumerate(iterator, start=1):
        if randrange(i) == 0: # random [0..i)
            selection = item
    return selection
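The comments in select_random_it() imply that every item ends up selected with the same probability 1/n: item i must be picked when it is seen (probability 1/i) and then survive each later item j (probability 1 - 1/j). As a quick check, and only as an illustration that is not part of the original answer, that product can be evaluated exactly with fractions:

```python
from fractions import Fraction

def selection_probability(i, n):
    """Exact probability that the i-th of n items ends up as the selection."""
    p = Fraction(1, i)               # item i is picked when it is seen
    for j in range(i + 1, n + 1):
        p *= 1 - Fraction(1, j)      # and is not replaced by later item j
    return p

n = 7
probabilities = [selection_probability(i, n) for i in range(1, n + 1)]
print(probabilities)  # every entry is Fraction(1, 7)
```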

To print k random lines from a file and delete them:

#!/usr/bin/env python3
import random
import sys

k = 10
filename = 'TestingDeleteLines.txt'
with open(filename) as file:
    random_lines = reservoir_sample(file, k) # get k random lines

if not random_lines: # file is empty
    sys.exit() # do nothing, exit immediately

print("\n".join(map(str.strip, random_lines))) # print random lines
delete_lines(filename, random_lines) # delete them from the file

where reservoir_sample() uses the same algorithm as select_random_it() but allows choosing k items instead of one:

import random
from itertools import islice

def reservoir_sample(iterable, k,
                     randrange=random.randrange, shuffle=random.shuffle):
    """Select *k* random elements from *iterable*.

    Use O(n) Algorithm R https://en.wikipedia.org/wiki/Reservoir_sampling

    If the number of items is less than *k* then return all items in random order.
    """
    it = iter(iterable)
    if not (k > 0):
        raise ValueError("sample size must be positive")

    sample = list(islice(it, k)) # fill the reservoir
    shuffle(sample)
    for i, item in enumerate(it, start=k+1):
        j = randrange(i) # random [0..i)
        if j < k:
            sample[j] = item # replace item with gradually decreasing probability
    return sample

and the delete_lines() utility function deletes the chosen random lines from the file:

import fileinput
import os

def delete_lines(filename, lines):
    """Delete *lines* from *filename*."""
    lines = set(lines) # for amortized O(1) lookup
    with fileinput.FileInput(filename, inplace=True, backup='.bak') as file:
        for line in file:
            if line not in lines:
                print(line, end='')
    os.unlink(filename + '.bak') # remove backup if there is no exception

The reservoir_sample() and delete_lines() functions do not load the whole file into memory, so they can work on arbitrarily large files.

Answer 5

Let's assume you have the list of lines from your file stored in items:

>>> import random
>>> items = ['a', 'b', 'c', 'd', 'e', 'f']
>>> choices = random.sample(items, 2)  # select 2 items
>>> choices  # here are the two
['b', 'c']
>>> for i in choices:
...     items.remove(i)
...
>>> items  # ta-da, no more b or c
['a', 'd', 'e', 'f']

From here, you would overwrite the previous text file with the contents of items joined with your preferred line ending, \r\n or \n. readlines() does not strip line endings, so if you use that method you do not need to add your own.
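The point about line endings can be illustrated without touching a real file. The sketch below uses io.StringIO as a stand-in for an open file (an assumption made for brevity) and compares readlines() with splitlines():

```python
import io

text = "Carrot\nBanana\nApple\n"

kept_endings = io.StringIO(text).readlines()  # each item still ends in "\n"
stripped = text.splitlines()                  # endings removed

print(kept_endings)  # ['Carrot\n', 'Banana\n', 'Apple\n']
print(stripped)      # ['Carrot', 'Banana', 'Apple']

# readlines() output can be written back unchanged ...
assert "".join(kept_endings) == text
# ... while splitlines() output needs the ending added again.
assert "\n".join(stripped) + "\n" == text
```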

Answer 6

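Maybe you could try generating 10 random numbers between 0 and 300 using

deleteLineNums = random.sample(range(len(lines)), 10)

and then delete from the lines array by making a copy with a list comprehension:

linesCopy = [line for idx, line in enumerate(lines) if idx not in deleteLineNums]
lines[:] = linesCopy

and then write lines back to 'TestingDeleteLines.txt'.

To see why the copy code above works, this post may help:

Remove items from a list while iterating

EDIT: To get the lines at the randomly generated indices (do this before removing them from lines), simply do:

actualLines = []
for n in deleteLineNums:
    actualLines.append(lines[n])

actualLines then contains the actual line text for the randomly generated line indices.

EDIT: Or even better, use a list comprehension:

actualLines = [lines[n] for n in deleteLineNums]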
