I'm using Python 3.8, Azure Data Lake Gen 2, and the following packages...
azure-storage-blob==12.4.0
azure-storage-file-datalake==12.1.1
How can I check whether a specific path exists on the file system? I tried this
from azure.storage.filedatalake import DataLakeFileClient
...
file = DataLakeFileClient.from_connection_string(
    DATA_LAKE_CONN_STR,
    file_system_name=filesystem,
    file_path=path
)
but got an error saying that DataLakeFileClient has no "exists" method.
A simpler way to test whether a file or path exists:
from azure.storage.filedatalake import DataLakeServiceClient
...
try:
    file_system_client = service_client.get_file_system_client(file_system="my-file-system")
    if file_system_client.get_file_client("my-file").exists():
        print("file exists")
    else:
        print("file does not exist")
except Exception as e:
    print(e)
Change get_file_client() to get_directory_client() to test for a path.
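For reference, the directory variant of the same check looks like this. This is a sketch using `unittest.mock` stand-ins so it runs without an Azure account; with real credentials, `service_client` would come from `DataLakeServiceClient.from_connection_string(DATA_LAKE_CONN_STR)` and the names "my-file-system" / "my-directory" are placeholders:

```python
from unittest.mock import MagicMock

# Stand-in for a real DataLakeServiceClient; here the chained call is
# mocked to report that the directory exists.
service_client = MagicMock()
service_client.get_file_system_client.return_value \
    .get_directory_client.return_value.exists.return_value = True

file_system_client = service_client.get_file_system_client(file_system="my-file-system")
if file_system_client.get_directory_client("my-directory").exists():
    print("path exists")
else:
    print("path does not exist")
```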
If you want to check whether a file exists on the file system, refer to the following code:
from azure.storage.filedatalake import DataLakeFileClient

account_name = 'testadls05'
account_key = 'CpfCQot******JOLvB+aJOZbsQ=='
file_system_name = 'test'

file_client = DataLakeFileClient(
    account_url="{}://{}.dfs.core.windows.net".format("https", account_name),
    file_system_name=file_system_name,
    file_path='test.txt',
    credential=account_key
)

try:
    file_client.get_file_properties()
except Exception as error:
    print(error)
    if type(error).__name__ == 'ResourceNotFoundError':
        print("the path does not exist")
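Comparing `type(error).__name__` to a string works, but catching the exception class directly is more idiomatic (the real class is `azure.core.exceptions.ResourceNotFoundError`). A minimal, self-contained sketch of the pattern, using a stand-in exception class and stub lookup so it runs without the SDK:

```python
# Stand-in for azure.core.exceptions.ResourceNotFoundError, so the
# pattern can be demonstrated without the Azure SDK installed.
class ResourceNotFoundError(Exception):
    pass

def get_file_properties(path, existing_paths):
    # Stand-in for DataLakeFileClient.get_file_properties(): raises
    # ResourceNotFoundError when the path is absent.
    if path not in existing_paths:
        raise ResourceNotFoundError("The specified path does not exist: " + path)
    return {"name": path}

def path_exists(path, existing_paths):
    # Catch the exception class directly instead of comparing
    # type(error).__name__ against a string.
    try:
        get_file_properties(path, existing_paths)
        return True
    except ResourceNotFoundError:
        return False

print(path_exists("test.txt", {"test.txt"}))     # True
print(path_exists("missing.txt", {"test.txt"}))  # False
```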
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient
import pandas as pd
import io

storage_account_name = "leowk"
container_name = "jana"
blob_name = "sub.csv"

credential = DefaultAzureCredential()
storage_account_url = f"https://{storage_account_name}.dfs.core.windows.net"
data_lake_service_client = DataLakeServiceClient(account_url=storage_account_url, credential=credential)

file_system_client = data_lake_service_client.get_file_system_client(file_system=container_name)
# the file path is relative to the file system, so no container prefix
file_client = file_system_client.get_file_client(blob_name)

with io.BytesIO() as file_stream:
    file_client.download_file().readinto(file_stream)
    file_stream.seek(0)
    df = pd.read_csv(file_stream)

print(df.head())
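The download-into-a-stream step above is independent of Azure: `readinto()` just fills a `BytesIO` buffer, which pandas then reads like a file. A minimal local sketch of that pattern, with hard-coded CSV bytes standing in for the downloaded content:

```python
import io
import pandas as pd

# Stand-in for the bytes that file_client.download_file().readinto(...)
# would write into the stream.
csv_bytes = b"id,name\n1,alice\n2,bob\n"

with io.BytesIO() as file_stream:
    file_stream.write(csv_bytes)  # in the Azure code, readinto() fills the stream
    file_stream.seek(0)           # rewind before handing the stream to pandas
    df = pd.read_csv(file_stream)

print(df.shape)  # (2, 2)
```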