I need to process a huge (12,000+ row) Excel range, and processing speed matters a lot.
I'm not sure which programming language you're using, but one approach is to export the data to .csv and process it with pandas. To cut the wall-clock time, you can split the data into batches, process the batches in parallel with a ThreadPoolExecutor, and aggregate the results once the threads finish. See the example below:
import pandas as pd
from concurrent.futures import ThreadPoolExecutor

def process_batch(batch: pd.DataFrame) -> list:
    # Your processing logic goes here; return a list of results
    ...

def process_data_in_batches(file_path: str, batch_size: int) -> list:
    # Read the CSV file
    data = pd.read_csv(file_path)
    # Split the data into batches of batch_size rows
    batches = [data.iloc[i:i + batch_size] for i in range(0, len(data), batch_size)]
    aggregated_results = []
    with ThreadPoolExecutor() as executor:
        # Submit one task per batch, then collect results as each finishes
        futures = [executor.submit(process_batch, batch) for batch in batches]
        for future in futures:
            aggregated_results.extend(future.result())
    return aggregated_results
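To make this concrete, here is a runnable sketch with a hypothetical `process_batch` that filters a made-up "value" column (the column name and threshold are assumptions for illustration, not part of your data):

```python
import pandas as pd
from concurrent.futures import ThreadPoolExecutor

def process_batch(batch: pd.DataFrame) -> list:
    # Hypothetical logic: keep values above a threshold of 50
    return batch.loc[batch["value"] > 50, "value"].tolist()

def process_data_in_batches(data: pd.DataFrame, batch_size: int) -> list:
    # Split the DataFrame into row batches and process them in parallel
    batches = [data.iloc[i:i + batch_size] for i in range(0, len(data), batch_size)]
    results = []
    with ThreadPoolExecutor() as executor:
        futures = [executor.submit(process_batch, b) for b in batches]
        for f in futures:
            results.extend(f.result())
    return results

data = pd.DataFrame({"value": range(100)})
results = process_data_in_batches(data, batch_size=25)
print(len(results))  # 49 values (51 through 99) exceed the threshold
```

One caveat: if your per-batch logic is CPU-bound pure Python, threads may not help much because of Python's GIL; in that case swapping `ThreadPoolExecutor` for `ProcessPoolExecutor` (same interface in `concurrent.futures`) is worth trying. Also note that pandas can read Excel files directly via `pd.read_excel`, which may save the manual CSV export step.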