Python: grabbing groups of 1000 items for processing

Question

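I have a large table with thousands of records (anywhere from 3,000 to 75,000), and I put all of the numeric IDs into a sorted list. I want to process the IDs in order, one group of 1000 at a time. How can I elegantly grab the first 1000 and "label" the group as, say, "223344 to 337788"? (Does a dictionary make sense here, or should I just capture the first and last items of the list slice to keep track of which groups have been processed?) And so on, until all of the IDs are in groups of 1000, with the last group holding the remainder. I don't know anything about pandas, but I saw a somewhat similar question that used pandas; would that make sense here? Apologies for the possibly clunky question. It is still a bit clunky in my head, and I could use a sounding board.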

Answer 1

You don't necessarily need pandas for this task; a regular Python list and some basic logic are enough.

    # Stand-in data; replace with your actual sorted list of IDs
    ids = list(range(3000, 75000))

    # Each entry holds one group of up to 1000 IDs plus its label
    processed_sets = []

    # Walk the list in steps of 1000; the final slice holds the remainder
    for i in range(0, len(ids), 1000):
        # Get the current group
        current_set = ids[i:i + 1000]

        # Label the group by its first and last ID
        label = f"{current_set[0]} to {current_set[-1]}"

        # Store the label together with the IDs
        processed_sets.append({
            'label': label,
            'ids': current_set
        })

    for group in processed_sets:
        print(group['label'], group['ids'])
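
If the groups only need to be handled one after another, the same slicing can be wrapped in a generator so that no list of all groups is built up front. This is a minimal sketch, not part of the code above: the name `iter_chunks`, the `completed` list, and the stand-in IDs are all assumptions made for illustration.

```python
from typing import Iterator, List, Tuple


def iter_chunks(sorted_ids: List[int], size: int = 1000) -> Iterator[Tuple[str, List[int]]]:
    """Yield (label, chunk) pairs, one chunk at a time (hypothetical helper)."""
    for start in range(0, len(sorted_ids), size):
        chunk = sorted_ids[start:start + size]
        # The label is built from the first and last ID of the slice
        yield f"{chunk[0]} to {chunk[-1]}", chunk


# Stand-in data: 2,500 consecutive IDs, so the last group is a remainder of 500
ids = list(range(223344, 223344 + 2500))

completed = []  # labels of the groups handled so far
for label, chunk in iter_chunks(ids):
    # Replace the print with the real per-group processing
    print(f"{label}: {len(chunk)} IDs")
    completed.append(label)
```

Keeping only the labels in `completed` is enough to tell which groups are done, which suggests the first and last items of each slice may be all the bookkeeping needed; a dictionary becomes useful mainly when a group has to be looked up again by its label.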

Answer 2

Using pandas makes sense for this task, because it provides powerful tools for data manipulation and processing. Here is how to do it in pandas:

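Load the data: create the list of IDs. Replace the placeholder values with your actual integer IDs.
Create the DataFrame: convert the ID list into a pandas DataFrame.
Sort: sort the DataFrame by the ID column.
Split: split the sorted DataFrame into chunks of 1000 rows each.
Label: label each chunk with the range of IDs it contains.
Process: iterate over the labeled chunks and process each one.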

    import pandas as pd
    
    # Example list of IDs
    id_list = [223344, 223345, 223346, 300000, 300001, 337788]  # Replace with your actual IDs
    
    # Convert the list to a pandas DataFrame
    df = pd.DataFrame(id_list, columns=['ID'])
    
    # Sort the DataFrame
    df = df.sort_values(by='ID').reset_index(drop=True)
    
    # Define the chunk size
    chunk_size = 1000
    
    # Split the DataFrame into chunks of chunk_size rows (the last one may be shorter)
    chunks = [df.iloc[i:i + chunk_size] for i in range(0, len(df), chunk_size)]
    
    # Map each chunk's "first to last" label to the chunk itself
    chunk_labels = {}
    for chunk in chunks:
        start_id = chunk['ID'].iloc[0]
        end_id = chunk['ID'].iloc[-1]
        chunk_labels[f"{start_id} to {end_id}"] = chunk
    
    # Iterate over the labeled chunks in order
    for label, chunk in chunk_labels.items():
        print(f"Processing IDs {label}")
        print(chunk)  # Replace this line with your actual processing logic
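
Where only the labels and group sizes are needed, for example to keep a record of which ranges exist, one possible alternative is to number the rows by position and let `groupby` summarize each block. The sketch below is an assumption-laden variant, not the approach in the code above: the stand-in IDs, the `group_number` array, and the `summary` table are all illustrative names.

```python
import numpy as np
import pandas as pd

chunk_size = 1000

# Stand-in data: 2,500 IDs in shuffled order (replace with your actual IDs)
rng = np.random.default_rng(0)
df = pd.DataFrame({"ID": rng.permutation(np.arange(223344, 223344 + 2500))})
df = df.sort_values(by="ID").reset_index(drop=True)

# Row position // chunk_size is 0 for the first 1000 rows, 1 for the next 1000, ...
group_number = np.arange(len(df)) // chunk_size

# One row per group: smallest ID, largest ID, and number of IDs
summary = df.groupby(group_number)["ID"].agg(["min", "max", "count"])
summary["label"] = summary["min"].astype(str) + " to " + summary["max"].astype(str)
print(summary)
```

Because the frame is sorted before it is numbered, the minimum and maximum of each group are also its first and last IDs, so the labels match the ones built with `iloc[0]` and `iloc[-1]`.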
