Calculating MD5 from an AWS S3 ETag

Question

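I know it is possible to calculate the ETag of a locally stored file, but that does not help in my case. I have a pipeline that zips files and uploads them straight to S3 from memory: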

zip -r - $input_path | tee >(md5sum - >> $MD5_FILE) >(aws s3 cp - s3://$bucket_name/$final_path_zip) >/dev/null

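Afterwards I would like to check that the ETag matches the md5 I calculated in this command. So I would like to know whether it is possible (ideally in bash) to calculate the MD5 checksum of the whole file when all I know is the ETag.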

The alternative would be to calculate the ETag from the piped zip, but I have no idea how to do that (wc -c gave no result).

Answer 1

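You cannot get an MD5 digest from an arbitrary ETag in S3. For a non-encrypted object uploaded with a single PutObject request, the ETag is simply the MD5 digest of the content. For an object uploaded with a multipart upload, it is documented as a composite checksum: the MD5 digest of each part's digest concatenated together, with a suffix appended that gives the number of parts. Since the MD5 hash algorithm is not reversible, you cannot recover the hashes of the individual parts from it, let alone the MD5 of the whole file.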
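To make the composite checksum concrete, here is a minimal sketch that computes it for data already held in memory. The function name composite_etag and the part_size parameter are illustrative assumptions, not part of any AWS SDK, and the result only matches S3 if the upload really was split into parts of exactly that size.

```python
from hashlib import md5


def composite_etag(data: bytes, part_size: int) -> str:
    """Return a multipart-style ETag for data split into part_size-byte parts.

    Hypothetical helper for illustration; assumes every part except the last
    is exactly part_size bytes long.
    """
    # Split the payload the way a multipart upload with fixed-size parts would.
    parts = [data[i:i + part_size] for i in range(0, len(data), part_size)]
    # Concatenate the raw 16-byte MD5 digests of every part...
    digests = b"".join(md5(part).digest() for part in parts)
    # ...then hash that concatenation and append the number of parts.
    return md5(digests).hexdigest() + "-" + str(len(parts))
```

An object sent with a single PutObject request would instead have the plain md5(data).hexdigest() as its ETag, with no "-N" suffix.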

For encrypted objects uploaded by any method, the ETag is documented only as "not an MD5 digest of their object data".

So, if you want to compare the ETag of an object in S3 with an object you have created, you need to calculate the ETag using the same technique S3 does.

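md5sum by itself is not sufficient for multipart uploads; you need something more involved. The following Python script does this, outputting either the plain MD5 digest for smaller files, or the digest of the part digests for larger uploads: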

#!/usr/bin/env python3

import sys
from hashlib import md5

MULTIPART_THRESHOLD = 8388608
MULTIPART_CHUNKSIZE = 8388608
BUFFER_SIZE = 1048576

# Verify some assumptions are correct
assert(MULTIPART_CHUNKSIZE >= MULTIPART_THRESHOLD)
assert((MULTIPART_THRESHOLD % BUFFER_SIZE) == 0)
assert((MULTIPART_CHUNKSIZE % BUFFER_SIZE) == 0)

hash = md5()
read = 0
chunks = None

while True:
    # Read some from stdin, if we're at the end, stop reading
    bits = sys.stdin.buffer.read(BUFFER_SIZE)
    if len(bits) == 0: break
    read += len(bits)
    hash.update(bits)
    if chunks is None:
        # We're handling a multi-part upload, so switch to calculating 
        # hashes of each chunk
        if read >= MULTIPART_THRESHOLD:
            chunks = b''
    if chunks is not None:
        if (read % MULTIPART_CHUNKSIZE) == 0:
            # Done with a chunk, add it to the list of hashes to hash later
            chunks += hash.digest()
            hash = md5()

if chunks is None:
    # Normal upload, just output the MD5 hash
    etag = hash.hexdigest()
else:
    # Multipart upload, need to output the hash of the hashes
    if (read % MULTIPART_CHUNKSIZE) != 0:
        # Add the last part if we have a partial chunk
        chunks += hash.digest()
    etag = md5(chunks).hexdigest() + "-" + str(len(chunks) // 16)

# Just show the etag, adding quotes to mimic how S3 operates
print('"' + etag + '"')

It is a drop-in replacement for your md5sum call:

$ zip -r - "$input_path" | tee >(python calculate_etag_from_pipe - >> "$MD5_FILE") >(aws s3 cp - s3://$bucket_name/$final_path_zip) >/dev/null
[ ... zip file is created and uploaded to S3 ... ]

$ cat "$MD5_FILE"
"ef5c64605cb198b65b2451a76719b8d8-96"

$ aws s3api head-object --bucket $bucket_name --key $final_path_zip --query ETag --output text
"ef5c64605cb198b65b2451a76719b8d8-96"

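Note that the script shown makes some assumptions about how the upload is split into parts. These assumptions roughly mirror how the AWS CLI operates by default, but that is not guaranteed. If you use a different SDK, or different settings for the CLI, you will need to adjust MULTIPART_THRESHOLD and MULTIPART_CHUNKSIZE.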
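When the two ETags do not match, one way to check whether the chunk size is the culprit is to compare the part count in the ETag suffix with the count a given chunk size would produce. The sketch below is only an aid for that check: parse_etag and chunk_size_matches are hypothetical helper names, and the object size is assumed to be known (for example from the ContentLength field that head-object returns).

```python
def parse_etag(etag: str):
    """Split an S3 ETag into (hex_digest, part_count).

    part_count is None when the ETag has no "-N" suffix, i.e. the object
    was most likely not uploaded as a multipart upload.
    """
    value = etag.strip().strip('"')
    digest, sep, count = value.partition("-")
    return digest, (int(count) if sep else None)


def chunk_size_matches(object_size: int, part_count: int, chunk_size: int) -> bool:
    """Check whether chunk_size would yield part_count parts for object_size bytes."""
    # Integer ceiling division: every part is full-sized except possibly the last.
    expected_parts = (object_size + chunk_size - 1) // chunk_size
    return expected_parts == part_count
```

If chunk_size_matches returns False for the value in MULTIPART_CHUNKSIZE, the uploader used a different part size and the script's constants need adjusting before the ETags can agree.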
