Pandas: change the data type of a Series to string

Question

I am using Pandas 'ver 0.12.0' with Python 2.7 and have a DataFrame like this:

df = pd.DataFrame({'id' : [123,512,'zhub1', 12354.3, 129, 753, 295, 610],
                    'colour': ['black', 'white','white','white',
                            'black', 'black', 'white', 'white'],
                    'shape': ['round', 'triangular', 'triangular','triangular','square',
                                        'triangular','round','triangular']
                    },  columns= ['id','colour', 'shape'])

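The `id` Series consists of some integers and strings. Its `dtype` is `object` by default. I want to convert all contents of `id` to strings. I tried `astype(str)`, which produced the output below.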

df['id'].astype(str)
0    1
1    5
2    z
3    1
4    1
5    7
6    2
7    6

1) How can I convert all elements of `id` to strings?

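2) I will eventually use `id` to index the DataFrame. Would having string indices in a DataFrame slow things down, compared to having an integer index?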

Answer 1

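A new answer to reflect current practice: as of now (v1.2.4), neither `astype('str')` nor `astype(str)` works, in the sense that the resulting Series still has dtype `object` rather than a dedicated string dtype.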

According to the documentation, a Series can be converted to the string dtype in the following ways:

df['id'] = df['id'].astype("string")

df['id'] = pandas.Series(df['id'], dtype="string")

df['id'] = pandas.Series(df['id'], dtype=pandas.StringDtype())

End-to-end example:

import pandas as pd

# Create a sample DataFrame
data = {
    'Name': ['John', 'Alice', 'Bob', 'John', 'Alice'],
    'Age': [25, 30, 35, 25, 30],
    'City': ['New York', 'London', 'Paris', 'New York', 'London'],
    'Salary': [50000, 60000, 70000, 50000, 60000],
    'Category': ['A', 'B', 'C', 'A', 'B']
}

df = pd.DataFrame(data)

# Print the DataFrame
print("Original DataFrame:")
print(df)
print("\nData types:")
print(df.dtypes)
cat_cols_ = None
# Apply the code to change data types
if not cat_cols_:
    # Get the columns with object data type
    object_columns = df.select_dtypes(include=['object']).columns.tolist()
    
    if len(object_columns) > 0:
        print(f"\nObject columns found, converting to string: {object_columns}")
        
        # Convert object columns to string type
        df[object_columns] = df[object_columns].astype('string')
    
    # Get the categorical columns (including string and category data types)
    cat_cols_ = df.select_dtypes(include=['category', 'string']).columns.tolist()

# Print the updated DataFrame and data types
print("\nUpdated DataFrame:")
print(df)
print("\nUpdated data types:")
print(df.dtypes)
print(f"\nCategorical columns (cat_cols_): {cat_cols_}")

Original DataFrame:
    Name  Age      City  Salary Category
0   John   25  New York   50000        A
1  Alice   30    London   60000        B
2    Bob   35     Paris   70000        C
3   John   25  New York   50000        A
4  Alice   30    London   60000        B

Data types:
Name        object
Age          int64
City        object
Salary       int64
Category    object
dtype: object

Object columns found, converting to string: ['Name', 'City', 'Category']

Updated DataFrame:
    Name  Age      City  Salary Category
0   John   25  New York   50000        A
1  Alice   30    London   60000        B
2    Bob   35     Paris   70000        C
3   John   25  New York   50000        A
4  Alice   30    London   60000        B

Updated data types:
Name        string[python]
Age                  int64
City        string[python]
Salary               int64
Category    string[python]
dtype: object

Categorical columns (cat_cols_): ['Name', 'City', 'Category']
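
To make the `astype(str)` remark at the top of this answer concrete, here is a minimal sketch using the `id` values from the question (the variable names are only for illustration). It checks the type of each element separately from the dtype the Series reports. The dtype line is printed rather than asserted, since it depends on the pandas version; on the 1.x releases this answer refers to, it should be `object`.

```python
import pandas as pd

# The mixed-type column from the question (ints, a float and a string).
s = pd.Series([123, 512, 'zhub1', 12354.3, 129, 753, 295, 610], name='id')

converted = s.astype(str)

# Every element is now a Python str ...
element_types = {type(x).__name__ for x in converted}
print(element_types)

# ... while the dtype reported by the Series depends on the pandas version.
print(converted.dtype)
```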

Answer 2

You can convert all elements of `id` to `str` using `apply`:

df.id.apply(str)

0        123
1        512
2      zhub1
3    12354.3
4        129
5        753
6        295
7        610

Edit by OP:

I think the issue was related to the Python version (2.7); this worked:

df['id'].astype(basestring)
0        123
1        512
2      zhub1
3    12354.3
4        129
5        753
6        295
7        610
Name: id, dtype: object

Answer 3

You have to assign the result back, like this:

df['id']= df['id'].astype(str)

Answer 4

Personally, none of the above worked for me. What did:

new_str = [str(x) for x in old_obj][0]

Answer 5

You can use:

df.loc[:,'id'] = df.loc[:, 'id'].astype(str)

This is the reason they recommend this solution: Pandas doc

TL;DR

Reflecting on some of the other answers:

df['id'] = df['id'].astype("string")

This will break on the given example, because it tries to convert to a StringArray, which cannot handle any numbers in the 'string'.

df['id']= df['id'].astype(str)

For me, this solution raised a warning:

> SettingWithCopyWarning:  
> A value is trying to be set on a copy of a
> slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead
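
For context, this warning usually shows up when the frame being assigned to was itself produced by slicing or filtering another frame. The sketch below assumes that situation: the filtering step and the name `white` are hypothetical and not taken from the question. Taking an explicit `.copy()` first makes the subset an independent frame, so a plain column assignment is unambiguous.

```python
import pandas as pd

df = pd.DataFrame({'id': [123, 512, 'zhub1', 12354.3],
                   'colour': ['black', 'white', 'white', 'white']})

# Hypothetical filtering step; .copy() makes the subset an independent frame,
# so assigning a column to it does not touch (or warn about) df.
white = df[df['colour'] == 'white'].copy()
white['id'] = white['id'].astype(str)

print(white['id'].tolist())
```

The original `df` is left unchanged; only the copied subset gets the string values.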

Answer 6

There are two possibilities:

Use
```
.astype("str").astype("string")
```
, as seen here.
Use
```
.astype(pd.StringDtype())
```
, from the official documentation.
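
A small sketch of the two-step route on values like those in the question. Converting to `str` first means the second conversion only ever sees strings, which should sidestep problems with mixed values. The second variant chains `pd.StringDtype()` after the same first step; that chaining is my own variation for comparison, not something the linked sources prescribe.

```python
import pandas as pd

s = pd.Series([123, 512, 'zhub1', 12354.3], name='id')

# Step 1 turns each value into a Python str; step 2 switches to the string dtype.
two_step = s.astype("str").astype("string")

# Same idea, but naming the target dtype via pd.StringDtype().
via_dtype = s.astype("str").astype(pd.StringDtype())

print(two_step.dtype, via_dtype.dtype)
print(two_step.tolist())
```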

Answer 7

This worked for me:

 df['id'].convert_dtypes()

See the documentation here:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.convert_dtypes.html
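
One caveat worth checking on your own data: as far as I can tell, `convert_dtypes()` infers the best dtype from the values it finds. An object column that holds only strings becomes a string dtype, while a mixed column like the question's `id` may simply stay `object`. A minimal sketch, with made-up Series names:

```python
import pandas as pd

only_strings = pd.Series(['black', 'white', 'white'], dtype=object)
mixed = pd.Series([123, 'zhub1', 12354.3], dtype=object)

strings_result = only_strings.convert_dtypes()
mixed_result = mixed.convert_dtypes()

# Compare what was inferred for each column.
print(strings_result.dtype)
print(mixed_result.dtype)
```

If the mixed column does stay `object`, converting the values explicitly with `astype(str)` first is the safer route.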

Answer 8

Use the pandas string methods, i.e.

df['id'].str.cat()

Answer 9

If you want to do it dynamically:

df_obj = df.select_dtypes(include='object')
df[df_obj.columns] = df_obj.astype(str)

Answer 10

Your problem can easily be solved by converting it to object first. Once it is an object, just use `astype` to convert it to str.

obj = lambda x:x[1:]
df['id']=df['id'].apply(obj).astype('str')

Answer 11

For me, `.to_string()` worked:

df['id']=df['id'].to_string()
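
A word of caution before relying on this: `Series.to_string()` returns a single string that renders the whole Series (index included), not one string per element. Assigning it back to the column would therefore put that same rendered text in every row. The sketch below, with illustrative names, contrasts it with a per-element conversion.

```python
import pandas as pd

s = pd.Series([123, 'zhub1'], name='id')

rendered = s.to_string()     # one str for the whole Series, index included
per_element = s.astype(str)  # one str per row

print(repr(rendered))
print(per_element.tolist())
```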
