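I have a program where users can copy an Excel table and paste it into the program itself for data analysis. The only values allowed in the table are floats. The problem is that there are 4 different ways to write a float: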
x = 123456.789
y = 123456,789
a = 123.456,789
b = 123,456.789
I need code that interprets all 4 of them as:
123456.789
What is the best way to do this so that it works on Windows, Linux and Mac?
The full function is as follows:
import numbers

import pandas as pd
# Qt binding assumed to be PyQt5; the question does not say which one is used
from PyQt5.QtWidgets import QMessageBox, QTableWidget, QTableWidgetItem

# Copies csv from clipboard to a pandas dataframe
clipboard_dataframe = pd.read_clipboard()
# Get row and column count
paste_rows = len(clipboard_dataframe.index)
paste_columns = len(clipboard_dataframe.columns)
# Create the table to show the values
paste_janela = QTableWidget(paste_rows, paste_columns)
# Define the labels for headers and indexes (Qt expects a list of strings)
paste_janela.setHorizontalHeaderLabels([str(c) for c in clipboard_dataframe.columns])
paste_janela.setVerticalHeaderLabels([str(i) for i in clipboard_dataframe.index])
# Populate the table with the proper values
for x in range(paste_janela.rowCount()):
    for y in range(paste_janela.columnCount()):
        # Error handling in case the cell value isn't a float
        if not isinstance(clipboard_dataframe.iat[x, y], numbers.Number):
            QMessageBox.critical(self.janela_principal,
                                 "Erro de importação de dados",
                                 "Houve um erro na importação de dados de tabela. \nOs únicos valores aceitos são números reais")
            raise ValueError("Valor inválido foi encontrado na tabela. Só são aceitos números reais nas tabelas")
        table_value = str(clipboard_dataframe.iat[x, y])
        table_item = QTableWidgetItem(table_value)
        paste_janela.setItem(x, y, table_item)
# Pass the table to the MDI Area, turning it into a subwindow in the process
self.sandbox_mdiarea.addSubWindow(paste_janela)
# Needed to load the window, otherwise it will be hidden as default
paste_janela.show()
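Here is a solution that works for your test cases and for a few test cases of my own. The biggest problem is a number like 123,456, where it is unclear whether it should be 123456 or 123.456.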
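One way to handle the 123,456 case is to detect it and ask the user instead of guessing. The following is a minimal sketch, assuming that a string is ambiguous when it has exactly one separator, a leading group of one to three digits that does not start with 0, and exactly three digits after the separator; the name is_ambiguous is hypothetical:

```python
import re

# Hypothetical helper: flags strings such as '123,456' or '1.234', where the
# single separator could be either a decimal point or a thousands separator.
# Pattern: optional sign, a first group of 1-3 digits not starting with 0,
# exactly one '.' or ',', then exactly three digits.
_AMBIGUOUS = re.compile(r'[+-]?[1-9]\d{0,2}[.,]\d{3}')


def is_ambiguous(number_str: str) -> bool:
    """Return True if number_str cannot be parsed without knowing the convention."""
    return _AMBIGUOUS.fullmatch(number_str.strip()) is not None


for s in ['123,456', '1.234', '1234,56', '1,234.56', '0,123']:
    print(f'{s:>10} ambiguous: {is_ambiguous(s)}')
```

If any pasted cell is flagged, the program could ask once which convention the data uses and then parse every cell with that choice.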
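The basic approach is to check whether the string contains a comma, a period, or both, and then adjust it so that the built-in float function can parse it. Assuming the decimal separator is always present (so a single comma or period is read as the decimal point), I believe this function meets your requirements: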
import math


def read_float(floatstr: str) -> float:
    """Parse a float that uses ',' and/or '.' as decimal or thousands separator."""
    period_count = floatstr.count('.')
    comma_count = floatstr.count(',')
    if period_count > 1 and comma_count > 1:
        raise ValueError(f'Unable to read float with multiple of both period and comma: {floatstr}')
    if period_count == 0 and comma_count == 0:
        # No separator at all, e.g. '123'
        return float(floatstr)
    if period_count == 0:
        # Only commas: several commas can only be thousands separators
        # ('1,234,567'); a single comma is taken as the decimal separator ('1,25')
        if comma_count > 1:
            return float(floatstr.replace(',', ''))
        return float(floatstr.replace(',', '.'))
    if comma_count == 0:
        # Only periods: same reasoning ('1.234.567' vs '1.25')
        if period_count > 1:
            return float(floatstr.replace('.', ''))
        return float(floatstr)
    # Both present: whichever separator comes first groups the thousands
    period_first = floatstr.find('.')
    comma_first = floatstr.find(',')
    if period_first < comma_first:
        return float(floatstr.replace('.', '').replace(',', '.'))
    return float(floatstr.replace(',', ''))


def _main():
    test_points = [
        ('123', 123),
        ('1.25', 1.25),
        ('1234.56', 1234.56),
        ('1,234.56', 1234.56),
        ('1,234,567.89', 1234567.89),
        ('1234567.89', 1234567.89),
        ('1,25', 1.25),
        ('1234,56', 1234.56),
        ('1.234,56', 1234.56),
        ('1.234.567,89', 1234567.89),
        ('1234567,89', 1234567.89),
        ('123,456', 123.456)  # this is problematic - no way to tell if 123456 or 123.456
    ]
    for teststr, testval in test_points:
        output = read_float(teststr)
        print(f'{teststr:>12} - {"PASS" if math.isclose(testval, output) else "FAIL"}')


if __name__ == '__main__':
    _main()
This gives the following output:
         123 - PASS
        1.25 - PASS
     1234.56 - PASS
    1,234.56 - PASS
1,234,567.89 - PASS
  1234567.89 - PASS
        1,25 - PASS
     1234,56 - PASS
    1.234,56 - PASS
1.234.567,89 - PASS
  1234567,89 - PASS
     123,456 - PASS
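For comparison, the same idea can be phrased as a "rightmost separator wins" rule: the last '.' or ',' in the string is the decimal separator, and any other separator before it groups thousands. The sketch below is an alternative formulation, not the function above. It also shows how a parser like this could be applied cell by cell to the tab-separated text that Excel places on the clipboard, instead of relying on pandas to type the cells. The names parse_float_rightmost and parse_clipboard_grid are hypothetical, and the grid helper assumes the pasted block has no header row and only numeric cells:

```python
import csv
import io
from typing import List


def parse_float_rightmost(number_str: str) -> float:
    """Parse a float, treating the rightmost '.' or ',' as the decimal separator."""
    s = number_str.strip()
    last = max(s.rfind('.'), s.rfind(','))
    if last == -1:
        return float(s)  # no separator at all, e.g. '123'
    sep = s[last]
    other = ',' if sep == '.' else '.'
    if s.count(sep) > 1:
        # The same separator used several times can only group thousands,
        # e.g. '1,234,567'; mixing in the other separator is rejected.
        if other in s:
            raise ValueError(f'Cannot parse {number_str!r}')
        return float(s.replace(sep, ''))
    # Exactly one decimal separator: drop thousands separators before it.
    integer_part = s[:last].replace(other, '')
    return float(f'{integer_part}.{s[last + 1:]}')


def parse_clipboard_grid(tsv_text: str) -> List[List[float]]:
    """Parse tab-separated text (as Excel copies it) into rows of floats.

    Assumes every cell holds a number; raises ValueError naming the bad cell.
    """
    rows = []
    for r, row in enumerate(csv.reader(io.StringIO(tsv_text), delimiter='\t')):
        parsed_row = []
        for c, cell in enumerate(row):
            try:
                parsed_row.append(parse_float_rightmost(cell))
            except ValueError:
                raise ValueError(f'Invalid number {cell!r} at row {r}, column {c}') from None
        rows.append(parsed_row)
    return rows


clipboard_text = '1.234,5\t7\r\n1,25\t2,000.5\r\n'
print(parse_clipboard_grid(clipboard_text))  # [[1234.5, 7.0], [1.25, 2000.5]]
```

In the question's code, a similar effect could be obtained by reading the clipboard with pd.read_clipboard(dtype=str), so that every cell stays a string, and converting each cell with the parser before the isinstance check.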
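I'm sure there is a simpler solution using some localization library, but I couldn't find one.
If you have any questions, let me know. If you find a test point where this function doesn't work, tell me and I'll look into updating it.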
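On the localization route: when the number format is known in advance (for example, from a setting or a one-time question to the user), there is no need to guess at all. pandas' read_csv accepts decimal and thousands arguments, and pd.read_clipboard passes extra keyword arguments through to read_csv, so pd.read_clipboard(decimal=',', thousands='.') should read 123.456,789 as 123456.789. Python's standard locale.atof can also parse locale-formatted numbers, but it depends on the process-wide locale and on locale names that differ between Windows and Linux/macOS, which works against the cross-platform requirement. Outside pandas, a minimal sketch of explicit-format parsing could look like this; parse_with_separators and its parameters are hypothetical names:

```python
def parse_with_separators(number_str: str, decimal: str = '.', thousands: str = ',') -> float:
    """Parse a number whose decimal and thousands separators are known.

    Hypothetical helper: `decimal` and `thousands` would come from a user
    setting or a one-time prompt, not from guessing per cell.
    """
    if decimal == thousands:
        raise ValueError('decimal and thousands separators must differ')
    cleaned = number_str.strip().replace(thousands, '')
    if cleaned.count(decimal) > 1:
        raise ValueError(f'More than one decimal separator in {number_str!r}')
    return float(cleaned.replace(decimal, '.'))


# The ambiguous '123,456' now has one meaning per chosen convention.
print(parse_with_separators('123,456'))                               # 123456.0
print(parse_with_separators('123,456', decimal=',', thousands='.'))   # 123.456
```

The trade-off is that the user has to state the convention once, but in exchange every cell, including 123,456, is parsed without ambiguity.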