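I'm trying to run a Python script that takes data from one table and inserts it into another, a kind of simple ETL. However, I keep running into this error: SyntaxError: unexpected EOF while parsing
I'm struggling a bit because I'm using techniques I've seen other people use, so I don't really know my way around.
Here is my code so far: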
import psycopg2

try:
    connectionone = psycopg2.connect(user = "postgres",
                                     password = "xxxxxx",
                                     host = "127.0.0.1",
                                     port = "5432",
                                     database = "xxxxxx")
    connectiontwo = psycopg2.connect(user = "postgres",
                                     password = "xxxxxx",
                                     host = "127.0.0.1",
                                     port = "5432",
                                     database = "xxxxxx")
    cursorsource = connectionone.cursor()
    cursordest = connectiontwo.cursor()

    #Truncating dest table
    print("Truncating Destination")
    cursordest.execute('delete from testarea.salepersons_2')
    connectiontwo.commit()

    #Fetch source data
    cursorsource.execute('SELECT sp_no, sp_name, sp_territory, sp_product,
    active FROM testarea.salepersons_original;')
    rows = cursorsource.fetchall()

    sql_insert = 'INSERT INTO testarea.salepersons_2 (sp_no, sp_name,
    p_territory, sp_product, active) values '
    sql_values = ['(%s, %s, %s, %s, %s)']
    data_values = []
    batch_size = 1000 #customize for size of tables...
    sql_stmt = sql_insert + ','.join(sql_values*batch_size) + ';'

    for i, row in enumerate(rows, 1):
        data_values += row[:5] #relates to number of columns (%s)
        if i % batch_size == 0:
            cursordest.execute (sql_stmt , data_values )
            cursordest.commit()
            print("Inserting")
            data_values = []

    if (i % batch_size != 0):
        sql_stmt = sql_insert + ','.join(sql_values*(i % batch_size)) +
        ';'
        cursordest.execute (sql_stmt, data_values)
        print("Last Values ....")
        connectiontwo.commit()
except (Exception, psycopg2.Error) as error :
    print ("Error occured :-(", error)
finally:
    #closing database connection.
    if(connectionone):
        cursorsource.close()
        connectionone.close()
        print("PostgreSQL connection is closed")
    #closing database connection.
    if(connectiontwo):
        cursordest.close()
        connectiontwo.close()
        print("PostgreSQL connection is closed")
    #close connections
    cursorsource.close()
    cursordest.close()
    cursorsource.close()
    cursordest.close()
The first problem is simple to fix. You have a multi-line string enclosed only in single quotes:
cursorsource.execute('SELECT sp_no, sp_name, sp_territory, sp_product,
active FROM testarea.salepersons_original;')
You should enclose it in triple quotes instead, which does not affect how the SQL executes:
cursorsource.execute("""SELECT sp_no, sp_name, sp_territory, sp_product,
active FROM testarea.salepersons_original;""")
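As a small self-contained check of the point above, the sketch below compiles a shortened copy of the broken statement and compares two ways of writing a long query. The helper name `compiles` and the shortened query text are made up for illustration; only the triple-quoted query comes from this answer. Adjacent string literals inside parentheses are an alternative I am adding here, not something the original code used.

```python
def compiles(source):
    """Return True if `source` is syntactically valid Python (hypothetical helper)."""
    try:
        compile(source, "<snippet>", "exec")
        return True
    except SyntaxError:
        return False

# A shortened stand-in for the question's statement: the "\n" puts a real
# line break inside a single-quoted literal, which Python rejects.
broken_source = "query = 'SELECT sp_no,\nactive FROM t;'"

# Triple quotes allow the literal to span lines; the newline stays in the SQL.
triple_quoted = """SELECT sp_no, sp_name, sp_territory, sp_product,
active FROM testarea.salepersons_original;"""

# Alternative: adjacent literals in parentheses are joined into one string,
# with no newline embedded.
adjacent = ("SELECT sp_no, sp_name, sp_territory, sp_product, "
            "active FROM testarea.salepersons_original;")

print(compiles(broken_source))
print("\n" in triple_quoted, "\n" in adjacent)
```

The same treatment would presumably apply to the other strings in the question that are split across lines, such as the INSERT statement.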
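I find the rest of the code hard to follow. I doubt you actually have a table with 5000 columns, so I assume you are trying to insert 1000 rows of 5 values each. If my understanding is correct, I can only offer a general approach: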
import random
import string

# Create some fake data to visualise
fake_data = [random.choice(list(string.ascii_letters)) for x in range(50)]

# Chunk the data (https://stackoverflow.com/a/1751478/4799172)
# This reshapes it into sublists each of length 5.
# This can fail if your original list is not a multiple of 5, but I think your
# existing code will still throw the same issues.
def chunks(l, n):
    n = max(1, n)
    return (l[i:i+n] for i in range(0, len(l), n))

chunked_data = chunks(fake_data, 5)

sql_insert = """INSERT INTO testarea.salepersons_2 (sp_no, sp_name,
sp_territory, sp_product, active) values (?, ?, ?, ?, ?)"""

# Use executemany, not execute in a loop, to repeat for each sublist in
# chunked_data
# (cursor stands for your destination cursor, e.g. cursordest)
cursor.executemany(sql_insert, chunked_data)
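Note that in this case I am using a parameterized query to prevent SQL injection (I use ? as the placeholder for the values). Different libraries expect different placeholders; for example, MySQL wrappers expect %s while SQLite expects ?. I used ? here to make it unambiguous that this is not ordinary string formatting, but you will probably have to change it back to %s, which is what psycopg2 expects.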
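To make the general approach something you can run without a PostgreSQL server, here is a minimal sketch using the standard library's sqlite3 module with an in-memory database. The sample rows, the table definition, and the unqualified table name `salepersons_2` are assumptions made for the demo; they are not taken from your schema.

```python
import sqlite3

def chunks(values, n):
    """Yield successive sublists of length n from a flat list."""
    n = max(1, n)
    return (values[i:i + n] for i in range(0, len(values), n))

# Hypothetical flat data: 3 rows of 5 values each.
flat = [1, "Ann", "North", "Widgets", 1,
        2, "Bob", "South", "Gadgets", 0,
        3, "Cy", "East", "Gizmos", 1]

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE salepersons_2 (
    sp_no INTEGER, sp_name TEXT, sp_territory TEXT,
    sp_product TEXT, active INTEGER)""")

# sqlite3 reports its placeholder style as "qmark", i.e. "?".
print(sqlite3.paramstyle)

sql_insert = """INSERT INTO salepersons_2 (sp_no, sp_name,
sp_territory, sp_product, active) VALUES (?, ?, ?, ?, ?)"""

# One call inserts every 5-value sublist as its own row.
cur = conn.cursor()
cur.executemany(sql_insert, chunks(flat, 5))
conn.commit()

inserted = conn.execute("SELECT COUNT(*) FROM salepersons_2").fetchone()[0]
names = [r[0] for r in conn.execute(
    "SELECT sp_name FROM salepersons_2 ORDER BY sp_no")]
print(inserted, names)
conn.close()
```

With psycopg2 the placeholders would be %s instead of ?. Since fetchall() already returns one tuple per row, you could likely pass rows straight to cursordest.executemany without any chunking, then call commit() on the connection (connectiontwo) rather than on the cursor.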