How can I replicate a pandas function in MySQL?

I'm new to SQL and am trying to carry over what I already know from Python. I have a script that connects to SSMS over ODBC so I can process the data in Python:

import pyodbc
import pandas as pd
#odbc
conn = pyodbc.connect('Driver={SQL Server};'
                      r'Server=PMZZ315\RION;'  # raw string: keeps the backslash literal
                      'Database=Warehouse;'
                      'Trusted_Connection=yes;')

cursor = conn.cursor()

df = pd.read_sql_query("SELECT [LetId],[StreetAddressLine1],[CompanyName] FROM Dim.Let", conn)
df

df.head()
#print(df.columns)


# Select duplicate rows, except the first occurrence, based on CompanyName and StreetAddressLine1
duplicateRowsDF = df[df.duplicated(['CompanyName','StreetAddressLine1'])]

#print("Duplicate rows except first occurrence, based on CompanyName and StreetAddressLine1, are:")
print(duplicateRowsDF)
duplicateRowsDF.to_csv("duplicateRowsDFodbc.csv")

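Which SQL function can replace df.duplicated? What I'm trying to do is: when the company name and street address are repeated, show the duplicate records but leave out the first occurrence.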

A sample of the output dataset:

LetId  StreetAddressLine1            CompanyName
32     1451 West Brimson View Court  Palmer
405    1808 North Lonion Ave         Ozark
465    4223 Monty Hwy                Alabama
python odbc ssms
1 Answer

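SQL tables represent unordered sets. Ordering is provided only by columns in the data, so without an explicit ordering there is no "first" row. Let me assume that letid defines the order.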

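The canonical approach in SQL uses row_number(). The query below keeps the first row of each (CompanyName, StreetAddressLine1) group; changing the filter to seqnum > 1 returns the later duplicates instead, which is what df.duplicated selects: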

select t.*
from (select t.*,
             row_number() over (partition by CompanyName, StreetAddressLine1 order by letid) as seqnum
      from t
     ) t
where seqnum = 1;
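
To see how the two filters behave, here is a minimal sketch that runs the same row_number() pattern against an in-memory SQLite database using only Python's standard library. The table name dim_let, the helper let_ids, and the sample rows are hypothetical and are not data from the question. It also assumes the bundled SQLite is version 3.25 or later, which is needed for window functions. SQL Server accepts the same row_number() syntax, but this sketch was not run against it.

```python
import sqlite3

# Hypothetical sample rows standing in for Dim.Let (not data from the question).
rows = [
    (1, "10 Main St", "Acme"),
    (2, "10 Main St", "Acme"),
    (3, "22 Oak Ave", "Birch"),
    (4, "10 Main St", "Acme"),
    (5, "22 Oak Ave", "Cedar"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dim_let (LetId INTEGER, StreetAddressLine1 TEXT, CompanyName TEXT)"
)
conn.executemany("INSERT INTO dim_let VALUES (?, ?, ?)", rows)

# Number the rows inside each (CompanyName, StreetAddressLine1) group, ordered by LetId.
NUMBERED = """
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY CompanyName, StreetAddressLine1
                              ORDER BY LetId) AS seqnum
    FROM dim_let t
"""

def let_ids(where):
    """Return the LetId values of the numbered rows that match the given filter."""
    sql = f"SELECT LetId FROM ({NUMBERED}) AS n WHERE {where} ORDER BY LetId"
    return [r[0] for r in conn.execute(sql)]

first_rows = let_ids("seqnum = 1")      # first occurrence of each group
duplicate_rows = let_ids("seqnum > 1")  # later occurrences, like df.duplicated

print(first_rows)      # [1, 3, 5]
print(duplicate_rows)  # [2, 4]
```

With seqnum > 1 the result corresponds to df[df.duplicated([...])] under pandas' default keep='first', provided LetId reflects the row order of the DataFrame.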