Plotting a histogram of large data

Question

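I am trying to plot a histogram of a large dataset (nearly 7 million points) in Python, and I want to know the frequency of each value. I tried the code below, but it took far too long to finish: more than an hour! Any suggestions?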

import numpy as np
import matplotlib.pyplot as plt

file_path = "D:/results/planarity2.txt" 
data_array = []

with open(file_path, "r") as file:
    for line in file:
        value = line.strip()  
        data_array.append(value)
column_values = data_array 

unique_values, counts = np.unique(column_values, return_counts=True)

value_frequency = dict(zip(unique_values, counts))


x_values = list(value_frequency.keys())
y_values = list(value_frequency.values())

plt.bar(x_values, y_values, edgecolor='black', alpha=0.7)


plt.xlabel('Column Values')
plt.ylabel('Frequency')
plt.title('Frequency of Points Based on Column Values')
plt.show()

I also tried this version, but it did not help:

import numpy as np
import matplotlib.pyplot as plt

file_path = "D:/results/planarity2.txt" 
data_array = []

with open(file_path, "r") as file:
    for line in file:
        value = line.strip()  
        data_array.append(value)
column_values = data_array 
value_frequency = {}

for value in column_values:
    if value in value_frequency:
        value_frequency[value] += 1
    else:
        value_frequency[value] = 1

x_values = list(value_frequency.keys())
y_values = list(value_frequency.values())

plt.bar(x_values, y_values, edgecolor='black', alpha=0.7)

plt.xlabel('Column Values')
plt.ylabel('Frequency')
plt.title('Frequency of Points Based on Column Values')
plt.show()

Answer 1

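I think your main problem is that you appear to be reading the file and keeping its contents as strings, rather than converting the values to numbers and holding them in a NumPy array (assuming your values are just numbers?). Having 7 million data points should not be a particular problem. One thing to try first is reading the file with NumPy's loadtxt function, which automatically converts the values to floats as it reads them and returns a NumPy array. For example, instead of: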

file_path = "D:/results/planarity2.txt" 
data_array = []

with open(file_path, "r") as file:
    for line in file:
        value = line.strip()  
        data_array.append(value)
column_values = data_array

just use:

file_path = "D:/results/planarity2.txt"
column_values = np.loadtxt(file_path)

and see whether that helps.
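Once the values are numeric, a possible follow-up is to count them in bins instead of drawing one bar per distinct value. This is only a sketch and goes beyond what was suggested above: it assumes the file holds continuous floating-point values, in which case nearly every value may be unique and plt.bar would have to draw millions of bars. The helper names binned_counts and plot_binned_counts, and the choice of 100 bins, are illustrative and not from the original post.

```python
import numpy as np


def binned_counts(values, bins=100):
    """Count how many values fall into each of `bins` equal-width intervals."""
    values = np.asarray(values, dtype=float)
    # np.histogram returns the per-bin counts and the bins + 1 bin edges.
    counts, edges = np.histogram(values, bins=bins)
    return counts, edges


def plot_binned_counts(counts, edges):
    """Draw one bar per bin rather than one bar per distinct value."""
    # Imported here so the counting step works without matplotlib installed.
    import matplotlib.pyplot as plt

    # Each bar starts at its left edge and spans the width of its bin.
    plt.bar(edges[:-1], counts, width=np.diff(edges), align="edge",
            edgecolor="black", alpha=0.7)
    plt.xlabel("Column Values")
    plt.ylabel("Frequency")
    plt.title("Frequency of Points Based on Column Values")
    plt.show()
```

To use it, pass the array returned by np.loadtxt(file_path) to binned_counts, then hand the resulting counts and edges to plot_binned_counts. If the column really contains only a small set of discrete values, the np.unique approach from the question should work fine on the numeric array and binning is unnecessary.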
