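I need to access data that resides in a remote DB2 database with SQL statements and load the results into a Pandas DataFrame, all from my Mac. I looked at using Pandas' read_sql with the ibm_db_sa adapter, but it looks like the prerequisite client software isn't supported on the Mac.
I came up with a JDBC option, which I am posting, but I'm curious whether anyone else has other ideas.
Here is an option that uses JDBC, the pip-installable JayDeBeApi, and the appropriate database jar file.
Note: this can be used with other JDBC/JayDeBeApi-compliant databases such as Oracle, MS SQL Server, etc.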
import jaydebeapi
import pandas as pd


def read_jdbc(sql, jclassname, driver_args, jars=None, libs=None):
    '''
    Reads jdbc compliant data sources and returns a Pandas DataFrame

    uses jaydebeapi.connect and doc strings :-)
    https://pypi.python.org/pypi/JayDeBeApi/

    :param sql: select statement
    :param jclassname: Full qualified Java class name of the JDBC driver.
        e.g. org.postgresql.Driver or com.ibm.db2.jcc.DB2Driver
    :param driver_args: Argument or sequence of arguments to be passed to the
        Java DriverManager.getConnection method. Usually the
        database URL. See
        http://docs.oracle.com/javase/6/docs/api/java/sql/DriverManager.html
        for more details
    :param jars: Jar filename or sequence of filenames for the JDBC driver
    :param libs: Dll/so filenames or sequence of dlls/sos used as
        shared library by the JDBC driver
    :return: Pandas DataFrame
    '''
    # This call matches the JayDeBeApi 0.x signature; as far as I know,
    # 1.x takes the URL as a separate second argument.
    # A jaydebeapi.DatabaseError raised here propagates to the caller.
    conn = jaydebeapi.connect(jclassname, driver_args, jars, libs)
    try:
        curs = conn.cursor()
        try:
            curs.execute(sql)
            # column headers come from the DB-API cursor description
            columns = [desc[0] for desc in curs.description]
            # convert the list of tuples from fetchall() to a df
            return pd.DataFrame(curs.fetchall(), columns=columns)
        finally:
            # only reached once the cursor exists, so curs is always defined
            curs.close()
    finally:
        conn.close()
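The cursor handling in read_jdbc is the standard DB-API pattern, so it can be tried without a JDBC driver. Below is a minimal sketch that uses the standard-library sqlite3 module as a stand-in for the jaydebeapi connection. The helper name fetch_columns_and_rows and the demo table contents are made up for illustration.

```python
import sqlite3
from contextlib import closing


def fetch_columns_and_rows(conn, sql):
    """Run sql on a DB-API connection and return (column_names, rows)."""
    # closing() closes the cursor even if execute() raises
    with closing(conn.cursor()) as curs:
        curs.execute(sql)
        # the first item of each description entry is the column name
        columns = [desc[0] for desc in curs.description]
        return columns, curs.fetchall()


# sqlite3 stands in for jaydebeapi.connect(...) (assumption for this demo)
demo_conn = sqlite3.connect(':memory:')
demo_conn.execute('CREATE TABLE table_name (name TEXT)')
demo_conn.executemany('INSERT INTO table_name VALUES (?)',
                      [('alice',), ('bob',)])
columns, rows = fetch_columns_and_rows(
    demo_conn, 'SELECT name FROM table_name ORDER BY name')
demo_conn.close()
```

Passing the result to pd.DataFrame(rows, columns=columns) gives the same kind of frame that read_jdbc returns.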
Some examples:
# DB2 (no jars argument is passed, so the driver jar is presumably on the CLASSPATH)
conn = 'jdbc:db2://<host>:5032/<db>:currentSchema=<schema>;'
class_name = 'com.ibm.db2.jcc.DB2Driver'
sql = 'SELECT name FROM table_name FETCH FIRST 5 ROWS ONLY'
df = read_jdbc(sql, class_name, [conn, 'myname', 'mypwd'])

# PostgreSQL
conn = 'jdbc:postgresql://<host>:5432/<db>?currentSchema=<schema>'
class_name = 'org.postgresql.Driver'
jar = '/path/to/jar/postgresql-9.4.1212.jar'
sql = 'SELECT name FROM table_name LIMIT 5'
df = read_jdbc(sql, class_name, [conn, 'myname', 'mypwd'], jars=jar)
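The two URLs above attach the schema differently: DB2 puts properties after a colon and ends each one with a semicolon, while PostgreSQL uses a query string. A small sketch of helpers that build both forms is shown here. The function names are hypothetical and only cover the currentSchema property used in the examples.

```python
def db2_jdbc_url(host, port, db, schema=None):
    """Build a DB2 JDBC URL in the form used in the example above."""
    url = 'jdbc:db2://{}:{}/{}'.format(host, port, db)
    if schema:
        # DB2 properties follow the database name after ':' and end with ';'
        url += ':currentSchema={};'.format(schema)
    return url


def postgresql_jdbc_url(host, port, db, schema=None):
    """Build a PostgreSQL JDBC URL in the form used in the example above."""
    url = 'jdbc:postgresql://{}:{}/{}'.format(host, port, db)
    if schema:
        # PostgreSQL properties are passed as a query string
        url += '?currentSchema={}'.format(schema)
    return url
```

The result can be dropped into the driver_args list, for example [db2_jdbc_url('<host>', 5032, '<db>', '<schema>'), 'myname', 'mypwd'].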
I found a simpler answer at https://stackoverflow.com/a/33805547/914967 that uses only the pip-installable ibm_db module:
import ibm_db
import ibm_db_dbi
import pandas as pd
conn_str = 'DATABASE={};HOSTNAME={};PORT={};PROTOCOL=TCPIP;UID={};PWD={};'.format(
    db_name, hostname, port_number, user, password)
conn_handle = ibm_db.connect(conn_str, '', '')
conn = ibm_db_dbi.Connection(conn_handle)
df = pd.read_sql(sql, conn)
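The snippet above leaves db_name, hostname, port_number, user and password undefined. One way to supply them without hard-coding credentials is to read them from environment variables. This is only a sketch: the helper names and the DB2_* variable names are invented for illustration and are not part of ibm_db.

```python
import os


def build_db2_conn_str(db_name, hostname, port_number, user, password):
    """Assemble the keyword string in the format shown above."""
    parts = [
        ('DATABASE', db_name),
        ('HOSTNAME', hostname),
        ('PORT', port_number),
        ('PROTOCOL', 'TCPIP'),
        ('UID', user),
        ('PWD', password),
    ]
    # every KEY=value pair is terminated by ';'
    return ''.join('{}={};'.format(key, value) for key, value in parts)


def conn_str_from_env(env=None):
    """Read settings from (hypothetical) DB2_* environment variables."""
    env = os.environ if env is None else env
    return build_db2_conn_str(
        env['DB2_DATABASE'], env['DB2_HOSTNAME'], env['DB2_PORT'],
        env['DB2_USER'], env['DB2_PASSWORD'])
```

The returned string would then be passed as the first argument of ibm_db.connect, with '' for the other two arguments as in the snippet above.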
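Bob, you should take a look at ibmdbpy (https://pypi.python.org/pypi/ibmdbpy). It is a pandas DataFrame-style API for DB2 and dashDB tables. It supports the underlying DB2 client driver, ODBC, and JDBC.
So, as a prerequisite, you need to set up the DB2 client driver package for Mac, which you can find here: http://www-01.ibm.com/support/docview.wss?uid=swg21385217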
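Following @IanBjorhovde's comment on my question, I investigated another solution that lets me use sqlalchemy and pandas' read_sql().
Here are the steps I took. Note: I'm on OS X Yosemite (10.10.4) using Python 3.4 and 3.5.
1) Download IBM DB2 Express-C (the free community edition of DB2)
2) After navigating to the unzipped directory, run: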
sudo ./db2_install
I accepted the default location of /opt/IBM/db2/V10.1
3) Install ibm_db and ibm_db_sa
pip install ibm_db
I built ibm_db_sa from source because the pip install failed:
python setup.py install
That should do it. When you try to connect to a database you may get an error like "Reason: image not found", so read this for the fix. Note: a restart may be required.
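One common cause of an "image not found" error on OS X is that the dynamic loader cannot locate the DB2 client libraries. The sketch below is an assumption on my part and not necessarily the fix the link describes: it only checks whether a directory is listed in DYLD_LIBRARY_PATH. The helper name and the lib64 subdirectory are hypothetical.

```python
import os


def dir_on_library_path(lib_dir, env=None, var='DYLD_LIBRARY_PATH'):
    """Return True if lib_dir is one of the entries of the given path variable."""
    env = os.environ if env is None else env
    # split the variable into entries, skipping empty ones
    entries = [p for p in env.get(var, '').split(os.pathsep) if p]
    target = os.path.normpath(lib_dir)
    # normpath() makes the comparison ignore trailing slashes
    return any(os.path.normpath(p) == target for p in entries)


# hypothetical library directory under the default install location
db2_lib_dir = '/opt/IBM/db2/V10.1/lib64'
db2_libs_visible = dir_on_library_path(db2_lib_dir)
```

If db2_libs_visible is False, adding the directory that actually holds the DB2 client libraries to DYLD_LIBRARY_PATH before starting Python may be worth trying.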
Usage example:
import ibm_db_sa
import pandas as pd
from sqlalchemy import create_engine
eng = create_engine('ibm_db_sa://<user_name>:<pwd>@<host>:5032/<db name>')
sql = 'SELECT name FROM table_name FETCH FIRST 5 ROWS ONLY'
df = pd.read_sql(sql, eng)
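If the user name or password contains characters such as @, : or /, they need to be percent-encoded before being placed in the engine URL. A minimal sketch using only the standard library follows. The helper name ibm_db_sa_url is made up for illustration.

```python
from urllib.parse import quote_plus


def ibm_db_sa_url(user, password, host, port, db_name):
    """Build an ibm_db_sa engine URL with percent-encoded credentials."""
    return 'ibm_db_sa://{}:{}@{}:{}/{}'.format(
        quote_plus(user),      # escape special characters in the user name
        quote_plus(password),  # escape '@', ':' and '/' in the password
        host, port, db_name)


url = ibm_db_sa_url('bob', 'p@ss:w/rd', 'dbhost', 5032, 'SAMPLE')
```

The resulting string can be passed to create_engine in place of the literal URL used above.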