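We are getting timeouts when connecting to Google Sheets (specifically through googleapiclient). The code had been working, but after some new deployments we started getting this error. The error persists even after we roll the changes back.
We run Airflow on MWAA (Airflow 2.6.3) and build our dependencies from Python WHL files. We also tried installing the requirements from the Python Package Index, but that timed out:
WARNING: requirements.txt installation timed out after 9 minutes. Some requirements may not have installed.
and the DAGs were broken.
Airflow can connect to other third-party services (Jira and others), but the DAGs that connect to the Google Sheets API fail.
Please share any solutions, or places we could look to troubleshoot this. Thanks.
Code snippet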
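from googleapiclient.discovery import build

# Build the Sheets v4 client, then look up its spreadsheets() resource by name.
# The attribute name passed to getattr must be a string.
service = getattr(build(
    serviceName='sheets',
    version='v4',
    credentials=<credentials>), 'spreadsheets')()
# Fetch the spreadsheet metadata; the timeout is raised from this call.
service.get(spreadsheetId=<spreadsheet_id>).execute()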
We get the following stack trace
Traceback (most recent call last):
  File "/usr/local/airflow/dags/common/spreadsheet.py", line 199, in get_spreadsheet
    return service.get(spreadsheetId=self._id).execute()
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/googleapiclient/_helpers.py", line 130, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/googleapiclient/http.py", line 923, in execute
    resp, content = _retry_request(
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/googleapiclient/http.py", line 222, in _retry_request
    raise exception
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/googleapiclient/http.py", line 191, in _retry_request
    resp, content = http.request(uri, method, *args, **kwargs)
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/google_auth_httplib2.py", line 209, in request
    self.credentials.before_request(self._request, method, uri, request_headers)
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/google/auth/credentials.py", line 151, in before_request
    self.refresh(request)
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/google/oauth2/service_account.py", line 434, in refresh
    access_token, expiry, _ = _client.jwt_grant(
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/google/oauth2/_client.py", line 312, in jwt_grant
    response_data = _token_endpoint_request(
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/google/oauth2/_client.py", line 272, in _token_endpoint_request
    response_status_ok, response_data, retryable_error = _token_endpoint_request_no_throw(
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/google/oauth2/_client.py", line 219, in _token_endpoint_request_no_throw
    request_succeeded, response_data, retryable_error = _perform_request()
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/google/oauth2/_client.py", line 195, in _perform_request
    response = request(
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/google_auth_httplib2.py", line 119, in __call__
    response, data = self.http.request(
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/httplib2/__init__.py", line 1724, in request
    (response, content) = self._request(
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/httplib2/__init__.py", line 1444, in _request
    (response, content) = self._conn_request(conn, request_uri, method, body, headers)
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/httplib2/__init__.py", line 1366, in _conn_request
    conn.connect()
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/httplib2/__init__.py", line 1156, in connect
    sock.connect((self.host, self.port))
TimeoutError: timed out
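The trace ends in sock.connect while the service-account token is being refreshed, so the failure is at the TCP connection level rather than in the Sheets call itself. One way to narrow this down is to try a plain TCP connect to every address a host resolves to and report the result per IP family. The following is only a minimal sketch: the helper name probe_families and its parameters are made up for illustration, and it would need to run from the same environment as the failing task (for example inside a test DAG) to be meaningful.

```python
import socket

# Human-readable labels for the two common address families.
FAMILY_NAMES = {socket.AF_INET: "IPv4", socket.AF_INET6: "IPv6"}


def probe_families(host, port=443, timeout=5.0):
    """Attempt a TCP connect to each resolved address of `host`.

    Returns a dict mapping "IPv4"/"IPv6" to a list of
    (address, outcome) tuples. Hypothetical helper, not part of any library.
    """
    results = {}
    # getaddrinfo raises socket.gaierror if the name cannot be resolved at all.
    for family, socktype, proto, _canon, sockaddr in socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    ):
        label = FAMILY_NAMES.get(family, str(family))
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
            outcome = "ok"
        except OSError as exc:  # includes timeouts and unreachable networks
            outcome = f"failed: {exc!r}"
        results.setdefault(label, []).append((sockaddr[0], outcome))
    return results
```

Calling probe_families("oauth2.googleapis.com") is a reasonable starting point, assuming the credentials use Google's default token endpoint; the token_uri field in the service-account key file shows the actual host. If one family reports "ok" and the other times out, the problem is likely in the routing for the failing family.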
Configurations:
MWAA: Airflow 2.6.3
Installed Packages (Using plugins.zip):
- Levenshtein-0.21.1
- PyGithub-1.59.0
- adtk-0.6.2
- apache-airflow-providers-atlassian-jira-2.1.1
- apache-airflow-providers-github-2.3.1
- apache-airflow-providers-mysql-5.1.1
- apache-airflow-providers-snowflake-4.2.0
- asttokens-2.2.1
- atlassian-python-api-3.39.0
- aws-requests-auth-0.4.3
- backcall-0.2.0
- cachetools-5.3.1
- comm-0.2.2
- cycler-0.12.1
- debugpy-1.8.1
- defusedxml-0.7.1
- executing-1.2.0
- fonttools-4.50.0
- google-api-core-2.11.0
- google-api-python-client-2.92.0
- google-auth-2.21.0
- google-auth-httplib2-0.1.0
- googleapis-common-protos-1.59.1
- gql-3.3.0
- graphql-core-3.2.3
- httplib2-0.22.0
- iniconfig-2.0.0
- ipykernel-6.25.1
- ipython-8.14.0
- jedi-0.18.2
- jira-3.5.2
- joblib-1.3.2
- jupyter-client-8.3.0
- jupyter-core-5.3.1
- kiwisolver-1.4.5
- matplotlib-3.5.2
- matplotlib-inline-0.1.6
- mpld3-0.5.9
- mysqlclient-2.2.0
- nest-asyncio-1.6.0
- numpy-1.24.4
- oauthlib-3.2.2
- oscrypto-1.3.0
- pandas-1.5.3
- parso-0.8.3
- patsy-0.5.6
- pickleshare-0.7.5
- pillow-10.2.0
- playwright-1.37.0
- protobuf-4.23.4
- pure-eval-0.2.2
- py-1.11.0
- pyOpenSSL-23.2.0
- pyasn1-0.4.8
- pyasn1-modules-0.2.8
- pycryptodomex-3.18.0
- pyee-9.0.4
- pynacl-1.5.0
- pypika-0.48.9
- pytest-7.4.0
- python-Levenshtein-0.21.1
- pyzmq-25.1.0
- requests-oauthlib-1.3.1
- retry-0.9.2
- rsa-4.9
- scikit-learn-1.3.0
- scipy-1.12.0
- snowflake-connector-python-3.0.4
- snowflake-sqlalchemy-1.4.7
- sortedcontainers-2.4.0
- sql-formatter-0.6.2
- stack-data-0.6.2
- statsmodels-0.14.1
- thefuzz-0.20.0
- threadpoolctl-3.4.0
- traitlets-5.9.0
- uritemplate-4.1.1
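For anyone who ends up here:
After a lot of trial and error, we finally found that IPv6 on our network was causing problems with the Google API packages (per this answer: https://stackoverflow.com/a/75375184/15938510). We removed IPv6 from the AWS network, and the code now runs fine.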
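If changing the network is not an option, a workaround sometimes used at the application level is to make name resolution return IPv4 addresses only. This is not what we did, and it is only a sketch: the ipv4_only context manager below is a hypothetical helper, it patches socket.getaddrinfo for the whole process (so it is not thread-safe), and it assumes the HTTP client in use resolves hosts through socket.getaddrinfo.

```python
import contextlib
import socket


@contextlib.contextmanager
def ipv4_only():
    """Temporarily make socket.getaddrinfo return IPv4 results only."""
    original = socket.getaddrinfo

    def getaddrinfo_v4(host, port, family=0, type=0, proto=0, flags=0):
        # Ignore the requested family and always ask for AF_INET.
        return original(host, port, socket.AF_INET, type, proto, flags)

    socket.getaddrinfo = getaddrinfo_v4
    try:
        yield
    finally:
        # Always restore the real resolver, even if the body raised.
        socket.getaddrinfo = original
```

The Sheets call from the question would then be wrapped as "with ipv4_only(): service.get(...).execute()". Fixing the IPv6 routing itself remains the cleaner solution, since the patch hides the network problem rather than resolving it.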