Fastest way to iterate over multiple lists

Question

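I am trying to take 2 huge text files and merge their data into a single text file.

One text file contains 200,000 to 500,000 lines of

[node_id, x, y, z, temperature]

and the other is a 203,000-line file of vectors.

The idea is to match the nodes in one file to the vectors in the other and combine them into a new text file. I have already split both files into chunks of 3,000 to 6,000 lines to keep the run time manageable, but the only approach I can think of is: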

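A `for` loop over the vectors to get the start and end coordinates.
A `for` loop over every node to check whether its coordinates fall inside the vector's bounding box.
Compute the distance from the node to the vector and select the nodes that lie on the vector.
Move on to the next vector.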

This still means a for loop of 5,000 iterations nested inside another for loop of 5,000 iterations. Is there a faster way to do this?
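The per-node check described in the steps above can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the asker's actual `cross_finder`: the names `distance_to_segment` and `nodes_near_vector` are hypothetical, nodes are assumed to be `(node_id, x, y)` tuples, and the bounding box is padded by the distance threshold so that nodes next to a horizontal or vertical vector are not rejected too early. The 0.07 threshold is taken from the question's code.

```python
import math


def distance_to_segment(px, py, x0, y0, x1, y1):
    """Shortest 2D distance from point (px, py) to the segment (x0, y0)-(x1, y1)."""
    dx, dy = x1 - x0, y1 - y0
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        # Degenerate vector: start and end coincide.
        return math.hypot(px - x0, py - y0)
    # Projection parameter of the point onto the segment, clamped to [0, 1].
    t = ((px - x0) * dx + (py - y0) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (x0 + t * dx), py - (y0 + t * dy))


def nodes_near_vector(nodes, start, end, max_dist=0.07):
    """Return ids of nodes closer than max_dist to the vector start -> end.

    nodes is an iterable of (node_id, x, y) tuples (hypothetical layout).
    """
    (x0, y0), (x1, y1) = start, end
    # Bounding box of the vector, padded by max_dist (an assumption, so that
    # nodes beside a horizontal or vertical vector are not rejected early).
    x_lo, x_hi = min(x0, x1) - max_dist, max(x0, x1) + max_dist
    y_lo, y_hi = min(y0, y1) - max_dist, max(y0, y1) + max_dist
    hits = []
    for node_id, px, py in nodes:
        if not (x_lo <= px <= x_hi and y_lo <= py <= y_hi):
            continue  # cheap rejection before the distance computation
        if distance_to_segment(px, py, x0, y0, x1, y1) < max_dist:
            hits.append(node_id)
    return hits
```

Using `min` and `max` for the box also covers vectors that run from a larger coordinate to a smaller one, a case that a plain `x_start <= x <= x_end` comparison would miss.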

Example from the node file: [node_id, x, y, z, temperature]

[21,-10.0,-12.0,4.0,160.0]

Example from the vector file: [vector_time, x, y, z, wattage]

[8.83,-9.82,-3.16,0.05,150.00]
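Rows in this shape could be parsed once, up front, rather than split again inside the inner loop. The sketch below assumes each line holds exactly five comma-separated fields, optionally wrapped in square brackets as in the examples; whether the real files include the brackets is not stated. The names `parse_node` and `parse_vector` are hypothetical.

```python
def _fields(line):
    """Split one text line into its raw comma-separated fields."""
    return [field.strip() for field in line.strip().strip("[]").split(",")]


def parse_node(line):
    """Parse a node row into (node_id, x, y, z, temperature)."""
    node_id, x, y, z, temperature = _fields(line)
    # int(float(...)) also accepts an id written as "21.0".
    return int(float(node_id)), float(x), float(y), float(z), float(temperature)


def parse_vector(line):
    """Parse a vector row into (vector_time, x, y, z, wattage)."""
    vector_time, x, y, z, wattage = _fields(line)
    return float(vector_time), float(x), float(y), float(z), float(wattage)
```

Parsing every line into a tuple of numbers a single time means the nested loops only compare floats, instead of re-splitting strings on each iteration.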

What I have tried:

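from tqdm import tqdm  # progress bar used in the loop below

# Split, Laser, Node, cross_finder, layer and the current*Min/current*Next
# bounds are defined elsewhere in the asker's script and are not shown here.
with open('Node_Temp_result.txt', 'r') as nodeFile:
    nodesInfo = nodeFile.readlines()

with open('laser.txt', 'r') as laserFile:
    laserInfo = laserFile.readlines()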
    
for laser in tqdm(range(currentLaserMin, currentLaserNext)):
        local_laser = Split(laserInfo, laser)
        laser_search = Laser(local_laser.col0(), local_laser.col1(), local_laser.col2(), local_laser.col3(), local_laser.col4())
        splits = 0

        x_search_laser = laser_search.laser_x()
        y_search_laser = laser_search.laser_y()
        z_search_laser = laser_search.laser_layer()

        if layer - 1 < z_search_laser <= layer and laser_search.laser_power() > 0:
            laser_next = Split(laserInfo, laser + 1)
            laser_search_next = Laser(laser_next.col0(), laser_next.col1(), laser_next.col2(), laser_next.col3(), laser_next.col4())

            x_start = x_search_laser
            x_end = laser_search_next.laser_x()

            y_start = y_search_laser
            y_end = laser_search_next.laser_y()

            num = 0

            for node in range(currentNodeMin, currentNodeNext):
                local_node = Split(nodesInfo, node)
                node_search = Node(local_node.col0(), local_node.col1(), local_node.col2(), local_node.col3(), local_node.col4())

                x_search_node = node_search.node_x()
                y_search_node = node_search.node_y()
                z_search_node = node_search.node_layer()

                if node_search.node_temp() > 250:
                    if layer - 1 < z_search_node <= layer:
                        if x_start <= x_search_node <= x_end and y_start <= y_search_node <= y_end:

                            crossLocations = cross_finder(x_start, x_end, y_start, y_end)

                            if crossLocations[3] < 0.07:
                                splits += 1
                                # append the node to the vector

Answer 1

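Some options you could consider for this task:

Use Python multiprocessing, which lets you run tasks in parallel: https://docs.python.org/3/library/multiprocessing.html
Use Dask DataFrames: https://www.dask.org/ This is a very easy-to-use library with very good performance.
Use Polars DataFrames: https://www.pola.rs/ Polars has fewer features and weaker documentation, but should offer better performance.
Use PySpark to run the task on a cluster: https://spark.apache.org/docs/latest/api/python/
Use a faster programming language such as Rust, C, or C++.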
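As a rough illustration of the first option, the matching work can be split into chunks of vectors and handed to a `multiprocessing.Pool`. This is a minimal sketch under assumptions, not a drop-in replacement for the question's code: segments are assumed to be `(x0, y0, x1, y1)` tuples and nodes `(node_id, x, y)` tuples, the helper names are hypothetical, and the 0.07 threshold is taken from the question.

```python
import math
from functools import partial
from multiprocessing import Pool


def segment_distance(px, py, x0, y0, x1, y1):
    """Shortest 2D distance from a point to the segment (x0, y0)-(x1, y1)."""
    dx, dy = x1 - x0, y1 - y0
    length_sq = dx * dx + dy * dy
    t = 0.0 if length_sq == 0.0 else ((px - x0) * dx + (py - y0) * dy) / length_sq
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(px - (x0 + t * dx), py - (y0 + t * dy))


def match_chunk(segments, nodes, max_dist=0.07):
    """Serial worker: for each segment, collect ids of nodes closer than max_dist."""
    return [
        [nid for nid, px, py in nodes if segment_distance(px, py, *seg) < max_dist]
        for seg in segments
    ]


def match_parallel(segments, nodes, workers=4, chunk_size=1000):
    """Run match_chunk over chunks of segments in separate worker processes."""
    chunks = [segments[i:i + chunk_size] for i in range(0, len(segments), chunk_size)]
    with Pool(processes=workers) as pool:
        # pool.map preserves chunk order, so results line up with segments.
        per_chunk = pool.map(partial(match_chunk, nodes=nodes), chunks)
    return [hits for chunk in per_chunk for hits in chunk]
```

`match_parallel` should be called from under an `if __name__ == "__main__":` guard, which is required on platforms where worker processes are spawned rather than forked. Note that this spreads the same nested-loop work across cores and does not reduce the total number of comparisons.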
