Removing duplicate data in Python

Question

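I have a huge database of flow distribution on a mesh. The problem is that the grid cells are too small, so part of the data is useless and makes my calculations harder. In the y dimension each grid length is 0.00032, and y ranges from 0 to 0.45. As you can see, there is a lot of useless data.

I want each grid length to be 0.00128 instead, by deleting the rows whose y value is not divisible by 0.00128. How can I do that?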

trainProcessed = trainProcessed[trainProcessed[:,4]%0.00128==0]

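I tried this line of code (trainProcessed is my data as a NumPy array), but the result goes 0 -> 0.00128 -> 0.00256 -> 0.00512. Some rows have the value 0.00384, which is also divisible by 0.00128, yet they are dropped. By the way, the array shape is (888300, 8).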

Sample data:

X: [0,0,0,0,0.00031999,0.00031999,0.00063999,0.00064,0.00096,0.00096,0.000128,0.000128]

Sample output:

X: [0,0,0,0,0.000128,0.000128]
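
One possible reason the `% 0.00128 == 0` test misses rows can be sketched as follows. This is only an illustration and assumes that some y values are stored slightly below their grid point, the way 0.00031999 appears in the sample data; the array `y` is hypothetical. In that case the remainder lands just under the step instead of at zero, so an exact comparison with 0 fails.

```python
import numpy as np

step = 0.00128  # target grid spacing from the question

# Hypothetical y values: the last one sits just below the grid point 0.00384,
# in the same way that 0.00031999 sits just below 0.00032 in the sample data.
y = np.array([0.0, 0.00128, 0.00383999])

remainder = y % step      # float remainder for each value
exact = remainder == 0    # the exact test used in the question

print(remainder)  # the last remainder is close to `step`, not close to 0
print(exact)      # so the last row would be dropped by the exact test
```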

Answer 1

For this case, and for the modulo operation in particular, I would use Decimal:

import pandas as pd
from decimal import Decimal

df = pd.DataFrame({'values': [0.00128, 0.00384, 0.367, 0.128, 0.34]})
print(df)

# Convert each float to str, then to Decimal, and apply the modulo exactly.
# Keep only the rows whose value is divisible by 0.00128.
# (Named `mask` rather than `filter` to avoid shadowing the built-in.)
mask = df['values'].apply(lambda v: Decimal(str(v)) % Decimal('0.00128') == Decimal('0'))

# If the data are smaller, you could multiply by a power of 10 before the modulo:
# mask = df['values'].apply(lambda v: Decimal(str(v * 1000)) % Decimal('0.00128') == Decimal('0'))
df = df[mask].reset_index(drop=True)

# Using df[~mask].reset_index(drop=True) instead keeps the rows that are NOT divisible.
print(df)

Initial output:

    values
0  0.00128
1  0.00384
2  0.36700
3  0.12800
4  0.34000

Final output:

    values
0  0.00128
1  0.00384
2  0.12800
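
For the (888300, 8) NumPy array in the question, a row-by-row Decimal check may be slow. A vectorised alternative, given here only as a sketch and not as part of the original answer, is to divide the y column by the step and keep the rows whose ratio is close to a whole number. The array `train` below is a hypothetical stand-in for `trainProcessed`, with y assumed to be in column 4 as in the question, and the tolerance `atol=1e-2` is an assumed value that should be tuned to the real data.

```python
import numpy as np

step = 0.00128  # target grid spacing from the question

# Hypothetical stand-in for trainProcessed: 8 columns, y in column 4,
# sampled on the fine 0.00032 grid with a little float noise.
y = np.array([0.0, 0.00031999, 0.00064, 0.00096, 0.00128,
              0.00256, 0.00384, 0.00383999, 0.00512])
train = np.zeros((y.size, 8))
train[:, 4] = y

# How many coarse steps each y value corresponds to.
ratio = train[:, 4] / step

# Keep rows whose ratio is within an assumed absolute tolerance of an integer.
# Points of the 0.00032 grid that are not on the coarse grid have ratios
# ending in .25, .5 or .75, so they stay far outside the tolerance.
mask = np.isclose(ratio, np.round(ratio), rtol=0.0, atol=1e-2)
coarse = train[mask]

print(coarse[:, 4])
```

Unlike the exact `== 0` test, this keeps a value such as 0.00383999 together with 0.00384, which may or may not be what you want for your data.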
