I want to add a raw dataset file to my DagsHub repository (it is my first repository, and I am using it with the MLflow tutorial).
This statement is giving me trouble:
repo = dagshub.upload.Repo(USER_NAME,REPO_NAME)
repo.upload(local_path='data/winequality.txt',
            remote_path='data/raw/winequality.txt',
            commit_message='Added Raw Data',
            versioning='dvc')
This is the error I get:
Uploading files (1) to "USER_NAME/REPO_NAME"...
---------------------------------------------------------------------------
DagsHubAPIError Traceback (most recent call last)
<ipython-input-49-e8d1e8493248> in <cell line: 4>()
2 repo = dagshub.upload.Repo(USER_NAME,REPO_NAME)
3
----> 4 repo.upload(local_path='data/winequality.txt',
5 remote_path='data/raw/winequality.txt',
6 commit_message='Added Raw Data',
2 frames
/usr/local/lib/python3.10/dist-packages/dagshub/upload/wrapper.py in upload(self, local_path, commit_message, remote_path, **kwargs)
286 else:
287 file_to_upload = DataSet.get_file(str(local_path), remote_path)
--> 288 self.upload_files([file_to_upload], commit_message=commit_message, **kwargs)
289
290 def upload_files(
/usr/local/lib/python3.10/dist-packages/dagshub/upload/wrapper.py in upload_files(self, files, directory_path, commit_message, versioning, new_branch, last_commit, force)
375 timeout=None,
376 )
--> 377 self._log_upload_details(data, res, files)
378
379 # The ETag header contains the hash of the uploaded commit,
/usr/local/lib/python3.10/dist-packages/dagshub/upload/wrapper.py in _log_upload_details(self, data, res, files)
413 log_message(f"Got unknown successful status code {res.status_code}")
414 else:
--> 415 raise determine_upload_api_error(res)
416
417 def _poll_mirror_up_to_date(self):
DagsHubAPIError: file missing from storage:
Required resource is missing from the storage, is '' stored in your repository DagsHub storage?
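The repo file structure looks like this:
Local:
root/
|...data/
    |...winequality.txt
Remote:
root/
|...data/
    |...raw/
Note that "raw" is versioned by DVC, but the DagsHub docs show this as the way to do it: Upload Data
I'm not sure what I'm missing.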
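The issue seems to be caused by a missing DVC tracking file, which prevents new files from being added to the directory. To resolve it, first install DVC with S3 support, if it is not already installed:
pip install dvc "dvc[s3]"
Then clone the repository and configure the DVC remote: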
git clone https://dagshub.com/<user_name>/<repo_name>.git
cd <repo_name>
dvc remote add origin --local s3://dvc
dvc remote modify origin --local endpointurl https://dagshub.com/<user_name>/<repo_name>.s3
dvc remote modify origin --local access_key_id <your_token>
dvc remote modify origin --local secret_access_key <your_token>
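If you would rather script the remote configuration, for example from a notebook, the four `dvc remote` commands above can be generated and run from Python. This is only a minimal sketch: the helper names `dvc_remote_commands` and `configure_remote` are made up for illustration, and it assumes `dvc` is on your PATH and that `cwd` points at the cloned repository.

```python
import subprocess


def dvc_remote_commands(user_name, repo_name, token, remote="origin"):
    """Build the `dvc remote` commands shown above as argv lists."""
    endpoint = f"https://dagshub.com/{user_name}/{repo_name}.s3"
    base = ["dvc", "remote"]
    return [
        base + ["add", remote, "--local", "s3://dvc"],
        base + ["modify", remote, "--local", "endpointurl", endpoint],
        base + ["modify", remote, "--local", "access_key_id", token],
        base + ["modify", remote, "--local", "secret_access_key", token],
    ]


def configure_remote(user_name, repo_name, token, cwd="."):
    """Run each command inside the cloned repo; stop at the first failure."""
    for cmd in dvc_remote_commands(user_name, repo_name, token):
        subprocess.run(cmd, cwd=cwd, check=True)
```

Passing the commands as argv lists avoids shell quoting problems with the token, and `check=True` raises as soon as one of the commands fails.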
Once the remote is configured, run the following commands:
mkdir -p data/raw
dvc commit data/raw.dvc
dvc push -r origin
Then run your code again. It should work now!
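When re-running the upload, it can also help to rule out a local path problem before the request is sent. The wrapper below is a hypothetical helper, not part of the dagshub client. It assumes `repo` is the `dagshub.upload.Repo` object from the question (or anything with a compatible `upload` method) and passes on the same arguments the question used.

```python
from pathlib import Path


def upload_raw_file(repo, local_path, remote_path, commit_message="Added Raw Data"):
    """Upload one file with DVC versioning, failing early if it is missing locally.

    `repo` is expected to behave like dagshub.upload.Repo; this helper itself
    is hypothetical and only wraps the call from the question.
    """
    local = Path(local_path)
    if not local.is_file():
        raise FileNotFoundError(f"Local file not found: {local}")
    repo.upload(
        local_path=str(local),
        remote_path=remote_path,
        commit_message=commit_message,
        versioning="dvc",
    )


# Usage, mirroring the call from the question:
# import dagshub.upload
# repo = dagshub.upload.Repo(USER_NAME, REPO_NAME)
# upload_raw_file(repo, "data/winequality.txt", "data/raw/winequality.txt")
```

The check only covers the local side. The "file missing from storage" error in the traceback comes from the server, so the DVC steps above are still needed.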
That said, this is probably something we can improve on our side, so I will share it with the engineering team!
Thanks for the question :)