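I want to remove the dollar signs and commas from a column and then convert it to float. This is what I have done so far, and it doesn't work: nothing actually changes. The data looks like ["$200,00", "$1,000.00", ..., "$50.00"]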
import pandas as pd
import string
y_train = train.iloc[:,-1]
needtoclean=y_train.to_list()#''.join(y_train.to_list())
to_delete = set(string.punctuation) - {'$',','}
clean = [x for x in needtoclean if x not in to_delete]
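Try this. Next time you should post your code.
Iterate over the list by index so that you can modify the values in place.
1) Remove the $
2) Cast to float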
for i in range(len(your_list)):
    # Strip the dollar sign, then convert the remainder to float
    your_list[i] = float(your_list[i].replace("$", ""))
If the dollar sign is always at the same position in these strings, this should do the job. I'm assuming you are using a pandas DataFrame.
df["needtoclean"] = df["needtoclean"].apply(lambda x: float(x[1:]))
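Slicing off the first character is not enough for a value such as "$1,000.00" from the question, because `float()` rejects the thousands separator. A minimal sketch using pandas string methods, assuming the column holds strings; the DataFrame and the column name `price` below are made up for illustration:

```python
import pandas as pd

# Hypothetical sample data; "price" is an illustrative column name.
df = pd.DataFrame({"price": ["$200.00", "$1,000.00", "$50.00"]})

# Remove every "$" and "," with one regex character class, then cast to float.
df["price"] = df["price"].str.replace(r"[$,]", "", regex=True).astype(float)

print(df["price"].tolist())  # [200.0, 1000.0, 50.0]
```

This works on the whole column at once, so no explicit loop or `apply` is needed.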
list_ = ['$58.00', '$60.00']  # Your list
new_list = []  # Initialise new list
for elem in list_:  # Iterate over previous list's elements
    elem = elem.replace("$", '')  # Remove the `$` sign
    new_list.append(float(elem))  # Add the converted float to the new list
This is easy to solve with a list comprehension.
unclean = ['$58.00', '$125.00'] # your data
clean = [float(value[1:]) for value in unclean if value.startswith('$')]
# you can remove "if value.startswith('$')" if you are sure
# that all values start with $
If you want it as a function:
unclean = ['$58.00', '$125.00']
def to_clean_float(unclean):
    return [float(value[1:]) for value in unclean if value.startswith('$')]
print(to_clean_float(unclean)) # Gives: [58.0, 125.0]
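Two caveats apply here: the `startswith('$')` filter silently drops any value without a leading dollar sign, and `float()` raises `ValueError` on a thousands separator such as the one in "$1,000.00". A sketch of a variant that strips both characters instead; `parse_currency` is a hypothetical helper name, and it assumes US-style formatting (comma as thousands separator, dot as decimal point):

```python
def parse_currency(text):
    """Convert a string like '$1,000.00' to a float."""
    # Assumes "," is a thousands separator and "." is the decimal point.
    return float(text.strip().replace("$", "").replace(",", ""))


unclean = ["$58.00", "$1,000.00", " $50.00"]
clean = [parse_currency(value) for value in unclean]
print(clean)  # [58.0, 1000.0, 50.0]
```

Because nothing is filtered out, a malformed value raises an error instead of quietly disappearing from the result.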
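If you don't need the result as a concrete list but want to process the data further, you can also create a generator expression. If the list is very large, this can save a lot of memory.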
unclean = ['$58.00', '$125.00']
def to_clean_float(unclean):
    return (float(value[1:]) for value in unclean if value.startswith('$'))
clean_generator = to_clean_float(unclean)
print(list(clean_generator))  # Gives: [58.0, 125.0]
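The memory saving only holds while the generator is consumed lazily; wrapping it in `list(...)` builds the full list again. A small sketch of consuming it directly with `sum`, reusing the same sample values (the function name `iter_clean_floats` is illustrative):

```python
def iter_clean_floats(values):
    # Return a generator that yields one float at a time.
    return (float(value[1:]) for value in values if value.startswith("$"))


unclean = ["$58.00", "$125.00"]
total = sum(iter_clean_floats(unclean))  # no intermediate list is created
print(total)  # 183.0
```

Keep in mind that a generator can be iterated only once; call the function again if the values are needed a second time.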