I am trying to loop over files on an FTP server and then store them. However, on the second iteration I get the error:
FileNotFoundError: [Errno 2] No such file or directory:
Here is my code:
# TODO: auth
from ftplib import FTP

def extract(environment):
    ftp = FTP(auth["host"])
    # Monitor and extract
    with ftp.login(user=auth['username'], passwd=auth['password']) as ftp:
        folders = []
        try:
            folders = ftp.nlst()
        except:
            print('Probably no folders in this directory')
        for f in folders:
            # Go into subfolder per subfund
            path = "".join(['/', f])
            ftp.cwd(path)
            # List files
            files = []
            try:
                files = ftp.nlst()
            except:
                print('Probably no files in this directory')
            for filename in files:
                if ".csv" in filename:
                    with open(filename, 'r+') as source_file:
                        print('opened, this works for the 1st only')
                        store_to_gcs(source_file, filename)
def store_to_gcs(source_file, filename):
    # TODO: bucket = storage.bucket(app=app)
    # After it I store it to GCS, does it have anything to do with it?
    storage_ref = "test/" + filename
    blob = bucket.blob(storage_ref)
    blob.upload_from_file(source_file)
with open(filename, 'r+') as source_file:
works only for the first file in the list, but not for the second. I can confirm that I am in the correct directory, because I checked with ftp.pwd().
open(filename, 'r+')
opens a local file, while I believe you want to open a remote one. You probably happen to have a local copy of ffpos1_708524_57474156_18022019_036521_1.csv, but not of fflia1_708470_57474842_18022019_036521_1.csv. That would explain why open seemingly succeeds on the first iteration.
There is no open-like function in ftplib.
There are two solutions:
Download the file into an in-memory BytesIO file-like object.
See Retrieve data from gz file on FTP server without writing it locally.
You can then pass the BytesIO to blob.upload_from_file.
This is easy to implement, but it can be a problem if the file is too large.
from io import BytesIO

for filename in files:
    if ".csv" in filename:
        flo = BytesIO()
        ftp.retrbinary('RETR ' + filename, flo.write)
        flo.seek(0)
        store_to_gcs(flo, filename)
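The one detail that is easy to miss here is the flo.seek(0): retrbinary invokes the callback with successive chunks of the remote file, leaving the buffer's position at the end, so it must be rewound before a consumer such as blob.upload_from_file reads it. A minimal self-contained illustration (the FTP transfer is simulated with two hard-coded chunks, since there is no server to connect to):

```python
from io import BytesIO

buf = BytesIO()

# Simulate ftp.retrbinary('RETR ' + filename, buf.write):
# the callback is called once per chunk of the remote file.
for chunk in (b"col1,col2\n", b"1,2\n"):
    buf.write(chunk)

# Position is now at the end of the buffer; a read() here returns b''.
# Rewind so the next consumer reads from the start.
buf.seek(0)
print(buf.read())  # b'col1,col2\n1,2\n'
```

Without the seek(0), upload_from_file would read zero bytes and you would end up with an empty object in the bucket.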