Installing the ODBC driver on a Databricks cluster

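The init script on my cluster pointed to a DBFS path. The error says DBFS init scripts are no longer supported and that I have to move the script to the workspace or use an ABFSS path. I moved it to the workspace.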
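When a cluster is defined through the Databricks Clusters API (or the JSON view of the cluster in the UI), each init script is an entry under init_scripts, keyed by where the script lives: "workspace" for a workspace file, "abfss" for an ADLS Gen2 path. The minimal Python sketch below builds that fragment for a moved script. The helper name and the path /Users/someone@example.com/pyodbs.sh are hypothetical placeholders, and the key names reflect my understanding of the Clusters API rather than anything shown in the question.

```python
import json


def init_script_entry(path: str) -> dict:
    """Build one init_scripts entry for a Databricks cluster spec (hypothetical helper).

    Workspace files use a "workspace" destination; ADLS Gen2 paths use "abfss".
    """
    if path.startswith("dbfs:/"):
        # Mirrors the error in the question: DBFS-hosted init scripts are rejected
        raise ValueError("DBFS init scripts are no longer supported; move the script")
    if path.startswith("abfss://"):
        return {"abfss": {"destination": path}}
    return {"workspace": {"destination": path}}


# Hypothetical workspace location of the moved script
spec_fragment = {"init_scripts": [init_script_entry("/Users/someone@example.com/pyodbs.sh")]}
print(json.dumps(spec_fragment, indent=2))
```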

The script pyodbs.sh looks like this:

#!/bin/bash
curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add -
curl https://packages.microsoft.com/config/ubuntu/16.04/prod.list > /etc/apt/sources.list.d/mssql-release.list 

sudo apt-get update
dpkg --configure -a
sudo ACCEPT_EULA=Y apt-get -q -y install msodbcsql17

/databricks/python/bin/pip list | egrep 'thrift-sasl|sasl'
/databricks/python/bin/pip install --upgrade thrift
dpkg -l | egrep 'thrift_sasl|libsasl2-dev|gcc|python-dev'
sudo apt-get -y install libsasl2-dev gcc 

sudo apt-get -q -y install unixodbc unixodbc-dev
sudo apt-get -q -y install python3-dev
/databricks/python/bin/pip install pyodbc

I ran these commands from a notebook:

%sh
curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add -
curl https://packages.microsoft.com/config/ubuntu/16.04/prod.list > /etc/apt/sources.list.d/mssql-release.list 
apt-get update
ACCEPT_EULA=Y apt-get install msodbcsql17 
apt-get -y install unixodbc-dev
pip3 install --upgrade pyodbc
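To confirm from a notebook whether the driver actually ended up registered with unixODBC, pyodbc can list the driver names it sees. The sketch below does this. The helper names are hypothetical, and "ODBC Driver 17 for SQL Server" is assumed to be the name msodbcsql17 registers under.

```python
def has_sql_server_driver(drivers, name="ODBC Driver 17 for SQL Server"):
    """Return True if the named driver appears in a list of ODBC driver names."""
    return name in drivers


def installed_odbc_drivers():
    """List driver names known to the ODBC driver manager.

    Returns an empty list if pyodbc (or the unixODBC library it links against)
    cannot be imported.
    """
    try:
        import pyodbc
    except ImportError:
        return []
    return pyodbc.drivers()


if __name__ == "__main__":
    drivers = installed_odbc_drivers()
    print(drivers)
    print("msodbcsql17 registered:", has_sql_server_driver(drivers))
```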

This is the error I get:

[Screenshot of the error]

Can someone help me fix this? It started failing right after I changed the init script path.

My cluster configuration: [screenshot]

Answer (0 votes):

I tried the following:

!curl https://packages.microsoft.com/keys/microsoft.asc | sudo apt-key add -
!curl https://packages.microsoft.com/config/ubuntu/16.04/prod.list | sudo tee /etc/apt/sources.list.d/mssql-release.list
!sudo apt-get update
!sudo ACCEPT_EULA=Y apt-get install -q -y msodbcsql17
!pip install --upgrade pyodbc

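The code above adds the Microsoft GPG key used to verify the packages, then adds the Microsoft SQL Server repository for Ubuntu 16.04, refreshes the package lists, and installs ODBC Driver 17 for SQL Server (msodbcsql17). Finally, it upgrades pyodbc with pip.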
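The apt output that follows shows the cluster image itself is Ubuntu 22.04 ("jammy"), while the repository being added is the 16.04 ("xenial") one. Microsoft's own install instructions derive the release from the running system (lsb_release -rs) instead of hard-coding it. The minimal Python sketch below follows the same idea by reading VERSION_ID from /etc/os-release. It assumes Microsoft publishes a prod.list under the same URL pattern for that release, and the helper names are hypothetical.

```python
from pathlib import Path


def parse_os_release(text: str) -> dict:
    """Parse KEY=value lines of /etc/os-release into a dict, dropping quotes."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        info[key] = value.strip().strip('"')
    return info


def prod_list_url(version_id: str) -> str:
    """Microsoft package list URL for an Ubuntu release such as '22.04' (assumed pattern)."""
    return f"https://packages.microsoft.com/config/ubuntu/{version_id}/prod.list"


if __name__ == "__main__":
    os_release = Path("/etc/os-release")
    if os_release.exists():
        info = parse_os_release(os_release.read_text())
        version = info.get("VERSION_ID")
        # Only meaningful on Ubuntu images such as the Databricks runtime
        if info.get("ID") == "ubuntu" and version:
            print(info.get("VERSION_CODENAME"), prod_list_url(version))
```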

Result:

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   983  100   983    0     0  10516      0 --:--:-- --:--:-- --:--:-- 10569
Warning: apt-key is deprecated. Manage keyring files in trusted.gpg.d instead (see apt-key(8)).
OK
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    79  100    79    0     0    888      0 --:--:-- --:--:-- --:--:--   897
deb [arch=amd64] https://packages.microsoft.com/ubuntu/16.04/prod xenial main
Hit:1 https://packages.microsoft.com/ubuntu/16.04/prod xenial InRelease
Hit:2 https://repos.azul.com/zulu/deb stable InRelease
Hit:3 http://security.ubuntu.com/ubuntu jammy-security InRelease               
Hit:4 http://archive.ubuntu.com/ubuntu jammy InRelease                         
Hit:5 http://archive.ubuntu.com/ubuntu jammy-updates InRelease
Hit:6 http://archive.ubuntu.com/ubuntu jammy-backports InRelease
Reading package lists... Done
W: https://packages.microsoft.com/ubuntu/16.04/prod/dists/xenial/InRelease: Key is stored in legacy trusted.gpg keyring (/etc/apt/trusted.gpg), see the DEPRECATION section in apt-key(8) for details.
W: https://repos.azul.com/zulu/deb/dists/stable/InRelease: Key is stored in legacy trusted.gpg keyring (/etc/apt/trusted.gpg), see the DEPRECATION section in apt-key(8) for details.
Reading package lists...
Building dependency tree...
Reading state information...
msodbcsql17 is already the newest version (17.8.1.1-1).
0 upgraded, 0 newly installed, 0 to remove and 86 not upgraded.
1 not fully installed or removed.
After this operation, 0 B of additional disk space will be used.
Setting up msodbcsql17 (17.8.1.1-1) ...
dpkg: error processing package msodbcsql17 (--configure):
 installed msodbcsql17 package post-installation script subprocess returned error exit status 127
Errors were encountered while processing:
 msodbcsql17
E: Sub-process /usr/bin/dpkg returned an error code (1)
Note: you may need to restart the kernel using dbutils.library.restartPython() to use updated packages.
Requirement already satisfied: pyodbc in /databricks/python3/lib/python3.10/site-packages (4.0.32)
Collecting pyodbc
  Downloading pyodbc-5.1.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (334 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 334.7/334.7 kB 1.8 MB/s eta 0:00:00
Installing collected packages: pyodbc
  Attempting uninstall: pyodbc
    Found existing installation: pyodbc 4.0.32
    Not uninstalling pyodbc at /databricks/python3/lib/python3.10/site-packages, outside environment /local_disk0/.ephemeral_nfs/envs/pythonEnv-9de0cbfa-9f9d-4ae7-b838-2c6b8f93e753
    Can't uninstall 'pyodbc'. No files were found to uninstall.
Successfully installed pyodbc-5.1.0
Name: pyodbc
Version: 5.1.0
Summary: DB API module for ODBC
Home-page: https://github.com/mkleehammer/pyodbc
Author:
Author-email: Michael Kleehammer <[email protected]>
License: MIT License
Location: /local_disk0/.ephemeral_nfs/envs/pythonEnv-9de0cbfa-9f9d-4ae7-b838-2c6b8f93e753/lib/python3.10/site-packages
Requires:
Required-by:

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.
msodbcsql17/xenial,now 17.8.1.1-1 amd64 [installed]
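The apt output above reports "1 not fully installed or removed" and a failing post-installation script for msodbcsql17 (exit status 127, which shells use for "command not found"), even though apt lists the package as [installed]. The minimal sketch below checks the package state from Python. It assumes dpkg-query is on the PATH, as it is on Ubuntu images, and the helper names are hypothetical. A fully configured package reports the status "install ok installed", whereas a failed post-installation step typically leaves it "half-configured".

```python
import shutil
import subprocess


def package_state(status: str) -> str:
    """Return the state (last field) of a dpkg Status string such as 'install ok installed'."""
    fields = status.split()
    return fields[-1] if fields else "unknown"


def dpkg_status(package: str) -> str:
    """Return dpkg's Status field for a package, or '' if unavailable."""
    if shutil.which("dpkg-query") is None:
        return ""
    result = subprocess.run(
        ["dpkg-query", "-W", "--showformat=${Status}", package],
        capture_output=True,
        text=True,
        check=False,  # a missing package exits non-zero; treat it as empty output
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # "installed" means configured; "half-configured" points at a failed postinst
    print("msodbcsql17 state:", package_state(dpkg_status("msodbcsql17")))
```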

I agree with @Ganesh Chandrasekaran: if you are connecting to SQL Server, you can use PySpark instead, bypassing ODBC entirely and connecting directly from Databricks through the JDBC driver.

Here is an example:

# Connection details for the Azure SQL database (replace the placeholders with your own)
jdbcHostname = "<Your Sql server>.database.windows.net"
jdbcPort = 1433
jdbcDatabase = "db02"
jdbcUrl = f"jdbc:sqlserver://{jdbcHostname}:{jdbcPort};database={jdbcDatabase}"
# Sample credentials only; in practice read them with dbutils.secrets.get(scope, key)
connectionProperties = {
    "user": "admin02",
    "password": "Welcome@1"
}
# Read dbo.tbl02 through Spark's JDBC data source (spark is the notebook's SparkSession)
remote_table = (spark.read
  .format("jdbc")
  .option("url", jdbcUrl)
  .option("user", connectionProperties["user"])
  .option("password", connectionProperties["password"])
  .option("dbtable", "dbo.tbl02")
  .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
  .load()
)
remote_table.show()
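The JDBC URL in this example carries only the host, port, and database. The Microsoft JDBC driver also accepts additional ";key=value" properties in the URL, and connection strings copied from the Azure portal typically include encrypt=true, trustServerCertificate=false, hostNameInCertificate=*.database.windows.net, and loginTimeout=30. The small sketch below assembles such a URL. The function name and the chosen properties are illustrative, not part of the answer.

```python
def sqlserver_jdbc_url(host: str, port: int, database: str, **props) -> str:
    """Assemble a jdbc:sqlserver:// URL, appending optional ;key=value properties in order."""
    url = f"jdbc:sqlserver://{host}:{port};database={database}"
    for key, value in props.items():
        if isinstance(value, bool):
            value = str(value).lower()  # the driver expects lowercase true/false
        url += f";{key}={value}"
    return url


jdbcUrl = sqlserver_jdbc_url(
    "<Your Sql server>.database.windows.net", 1433, "db02",
    encrypt=True,
    trustServerCertificate=False,
    hostNameInCertificate="*.database.windows.net",
    loginTimeout=30,
)
print(jdbcUrl)
```

The resulting string can be passed to .option("url", jdbcUrl) in the same way as the hand-built URL.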

Result:

+---+-------+
| id|   Name|
+---+-------+
|  1|  dilip|
|  2|    Raj|
|  3|Narayan|
+---+-------+