I want to plot the probability density function of the offsets recorded in a log file. Here is the code:
timestamps = []
sequences = []
log_Name = 'test_rtt_25-01-17_13-07-41_values5_rate50.log'
log_Path = "/home/ubuntu/results-25-01-09-docker/"
true_Path = log_Path + log_Name
with open(true_Path, "r") as f:
    for line in f:
        # Skip comment lines
        if not line.startswith('#'):
            # Each data line is split as "<base>+<offset>(<seq>)"
            time_part, seq_part = line.strip().split('(')
            base, offset = time_part.split('+')
            timestamps.append(float(offset))
            seq = int(seq_part[:-1])  # drop the trailing ')'
            sequences.append(seq)
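The code reads the log file and stores the offsets in `timestamps` and the sequence numbers in `sequences`.
Here is a sample of `timestamps` and `sequences`: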
[0.001009023, 0.001055868, 0.000992934, 0.001148472, 0.001086814, 0.001110649, 0.001066759, 0.00126167, 0.001231778, 0.000944345]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
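As you can see, there are 10 offsets and 10 sequence numbers. Each offset has its own number, e.g. 0.001009023 is number 1. I want to plot the probability density function, and I tried this: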
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats

source = {'seqs': sequences, 'times': timestamps}
df = pd.DataFrame(source)
df.sort_values(by = ['times'], inplace=True)
df_mean = np.mean(df['times'])
df_std = np.std(df['times'])
pdf = stats.norm.pdf(df['times'], df_mean, df_std)
plt.plot(df['times'], pdf)
plt.xlabel('Offsets') # Label for the x-axis
plt.savefig('/home/ubuntu/')
I don't understand why the probability is much greater than 1; it should be less than 1. Does anyone know what I am doing wrong?
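Here is how to estimate the PDF of the distribution underlying your data in three ways: a maximum likelihood estimate (assuming a normal distribution), a kernel density estimate, and Rosenblatt's shifted histogram.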
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats, integrate
times = [0.001009023, 0.001055868, 0.000992934, 0.001148472, 0.001086814,
0.001110649, 0.001066759, 0.00126167, 0.001231778, 0.000944345]
times = np.asarray(times)
x = np.linspace(0.0008, 0.0014, 300)
# Maximum likelihood estimate normal distribution
mu, sigma = stats.norm.fit(times) # simply the mean and uncorrected (ddof=0) standard deviation
X = stats.Normal(mu=mu, sigma=sigma)
# Kernel density estimate
Y = stats.gaussian_kde(times)
# Rosenblatt's Shifted Histogram
z = stats.mstats.rsh(times, points=x)
plt.plot(x, X.pdf(x), label='MLE Normal Distribution')
plt.plot(x, Y.evaluate(x), label='KDE')
plt.plot(x, z, label='RSH')
plt.legend()
plt.title("PDF Estimates")
plt.xlabel("Offsets")
plt.show()
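As for the values being much greater than 1: a PDF gives a density, not a probability, so it is not bounded by 1. Only the area under the curve has to equal 1, and because these offsets span only a few tenths of a millisecond, the curve has to be very tall. The following is a minimal sketch of that check for the normal fit, reusing the sample data from the question; the variable names (`dist`, `peak`, `area`, `prob`) and the integration range of 8 standard deviations are my own choices for illustration.

```python
import numpy as np
from scipy import stats, integrate

times = np.array([0.001009023, 0.001055868, 0.000992934, 0.001148472, 0.001086814,
                  0.001110649, 0.001066759, 0.00126167, 0.001231778, 0.000944345])

# Fit a normal distribution (mean and uncorrected standard deviation)
mu, sigma = stats.norm.fit(times)
dist = stats.norm(loc=mu, scale=sigma)

# The peak density is 1 / (sigma * sqrt(2 * pi)); it is large because sigma is tiny
peak = dist.pdf(mu)

# The area under the density is still 1 (numerical check over mu +/- 8 sigma)
x = np.linspace(mu - 8 * sigma, mu + 8 * sigma, 2001)
area = integrate.trapezoid(dist.pdf(x), x)

# Probabilities come from integrating the density over an interval,
# e.g. the probability that an offset lies within one sigma of the mean
prob = dist.cdf(mu + sigma) - dist.cdf(mu - sigma)

print(f"peak density: {peak:.1f}")
print(f"area under pdf: {area:.6f}")
print(f"P(mu - sigma < offset < mu + sigma): {prob:.4f}")
```

With this sample the peak density should come out in the thousands while the area stays at 1, so the original plot is not wrong. If you want values below 1, plot probabilities of intervals (for example, histogram bin heights multiplied by the bin width) rather than the density itself.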