在Python中转换文件大小的更好方法[关闭]

问题描述 投票:0回答:12

我正在使用一个读取文件并返回其大小(以字节为单位)的库。

然后将该文件大小显示给最终用户;为了让他们更容易理解,我将文件大小除以

MB
显式转换为
1024.0 * 1024.0
。当然这可行,但我想知道在 Python 中是否有更好的方法来做到这一点?

更好,我的意思可能是一个 stdlib 函数,可以根据我想要的类型操纵大小。就像如果我指定

MB
,它会自动将其除以
1024.0 * 1024.0
。这些线上有一些东西。

python filesize
12个回答
269
投票

这是我使用的:

import math

def convert_size(size_bytes):
   if size_bytes == 0:
       return "0B"
   size_name = ("B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB")
   i = int(math.floor(math.log(size_bytes, 1024)))
   p = math.pow(1024, i)
   s = round(size_bytes / p, 2)
   return "%s %s" % (s, size_name[i])

注意:大小应以字节为单位发送。


150
投票

hurry.filesize 它将获取以字节为单位的大小,并生成一个漂亮的字符串。

>>> from hurry.filesize import size
>>> size(11000)
'10K'
>>> size(198283722)
'189M'

或者如果您想要 1K == 1000(这是大多数用户的假设):

>>> from hurry.filesize import size, si
>>> size(11000, system=si)
'11K'
>>> size(198283722, system=si)
'198M'

它也有 IEC 支持(但没有记录):

>>> from hurry.filesize import size, iec
>>> size(11000, system=iec)
'10Ki'
>>> size(198283722, system=iec)
'189Mi'

因为它是由 Awesome Martijn Faassen 编写的,所以代码很小、清晰且可扩展。编写自己的系统非常简单。

这是一个:

mysystem = [
    (1024 ** 5, ' Megamanys'),
    (1024 ** 4, ' Lotses'),
    (1024 ** 3, ' Tons'), 
    (1024 ** 2, ' Heaps'), 
    (1024 ** 1, ' Bunches'),
    (1024 ** 0, ' Thingies'),
    ]

像这样使用:

>>> from hurry.filesize import size
>>> size(11000, system=mysystem)
'10 Bunches'
>>> size(198283722, system=mysystem)
'189 Heaps'

61
投票

您可以使用

1024 * 1024
按位移位运算符
代替 << 的大小除数,即
1<<20
获得兆字节,
1<<30
获得千兆字节等。

在最简单的情况下,您可以有例如一个常量

MBFACTOR = float(1<<20)
,然后可以与字节一起使用,即:
megas = size_in_bytes/MBFACTOR

兆字节通常就是您所需要的,或者可以使用类似的东西:

# bytes pretty-printing
UNITS_MAPPING = [
    (1<<50, ' PB'),
    (1<<40, ' TB'),
    (1<<30, ' GB'),
    (1<<20, ' MB'),
    (1<<10, ' KB'),
    (1, (' byte', ' bytes')),
]


def pretty_size(bytes, units=UNITS_MAPPING):
    """Get human-readable file sizes.
    simplified version of https://pypi.python.org/pypi/hurry.filesize/
    """
    for factor, suffix in units:
        if bytes >= factor:
            break
    amount = int(bytes / factor)

    if isinstance(suffix, tuple):
        singular, multiple = suffix
        if amount == 1:
            suffix = singular
        else:
            suffix = multiple
    return str(amount) + suffix

print(pretty_size(1))
print(pretty_size(42))
print(pretty_size(4096))
print(pretty_size(238048577))
print(pretty_size(334073741824))
print(pretty_size(96995116277763))
print(pretty_size(3125899904842624))

## [Out] ###########################
1 byte
42 bytes
4 KB
227 MB
311 GB
88 TB
2 PB

49
投票

如果您已经知道自己想要什么单位尺寸,可以使用以下一些易于复制的单衬。 如果您正在寻找具有一些不错选项的更通用的功能,请参阅我的 2021 年 2 月更新...

字节

print(f"{os.path.getsize(filepath):,} B") 

千比特

print(f"{os.path.getsize(filepath)/(1<<7):,.0f} kb")

千字节

print(f"{os.path.getsize(filepath)/(1<<10):,.0f} KB")

兆位

print(f"{os.path.getsize(filepath)/(1<<17):,.0f} mb")

兆字节

print(f"{os.path.getsize(filepath)/(1<<20):,.0f} MB")

千兆位

print(f"{os.path.getsize(filepath)/(1<<27):,.0f} gb")

千兆字节

print(f"{os.path.getsize(filepath)/(1<<30):,.0f} GB")

太字节

print(f"{os.path.getsize(filepath)/(1<<40):,.0f} TB")

2021 年 2 月更新 以下是我更新和充实的函数,用于 a) 获取文件/文件夹大小,b) 转换为所需的单位:

from pathlib import Path

def get_path_size(path = Path('.'), recursive=False):
    """
    Gets file size, or total directory size

    Parameters
    ----------
    path: str | pathlib.Path
        File path or directory/folder path

    recursive: bool
        True -> use .rglob i.e. include nested files and directories
        False -> use .glob i.e. only process current directory/folder

    Returns
    -------
    int:
        File size or recursive directory size in bytes
        Use cleverutils.format_bytes to convert to other units e.g. MB
    """
    path = Path(path)
    if path.is_file():
        size = path.stat().st_size
    elif path.is_dir():
        path_glob = path.rglob('*.*') if recursive else path.glob('*.*')
        size = sum(file.stat().st_size for file in path_glob)
    return size


def format_bytes(bytes, unit, SI=False):
    """
    Converts bytes to common units such as kb, kib, KB, mb, mib, MB

    Parameters
    ---------
    bytes: int
        Number of bytes to be converted

    unit: str
        Desired unit of measure for output


    SI: bool
        True -> Use SI standard e.g. KB = 1000 bytes
        False -> Use JEDEC standard e.g. KB = 1024 bytes

    Returns
    -------
    str:
        E.g. "7 MiB" where MiB is the original unit abbreviation supplied
    """
    if unit.lower() in "b bit bits".split():
        return f"{bytes*8} {unit}"
    unitN = unit[0].upper()+unit[1:].replace("s","")  # Normalised
    reference = {"Kb Kib Kibibit Kilobit": (7, 1),
                 "KB KiB Kibibyte Kilobyte": (10, 1),
                 "Mb Mib Mebibit Megabit": (17, 2),
                 "MB MiB Mebibyte Megabyte": (20, 2),
                 "Gb Gib Gibibit Gigabit": (27, 3),
                 "GB GiB Gibibyte Gigabyte": (30, 3),
                 "Tb Tib Tebibit Terabit": (37, 4),
                 "TB TiB Tebibyte Terabyte": (40, 4),
                 "Pb Pib Pebibit Petabit": (47, 5),
                 "PB PiB Pebibyte Petabyte": (50, 5),
                 "Eb Eib Exbibit Exabit": (57, 6),
                 "EB EiB Exbibyte Exabyte": (60, 6),
                 "Zb Zib Zebibit Zettabit": (67, 7),
                 "ZB ZiB Zebibyte Zettabyte": (70, 7),
                 "Yb Yib Yobibit Yottabit": (77, 8),
                 "YB YiB Yobibyte Yottabyte": (80, 8),
                 }
    key_list = '\n'.join(["     b Bit"] + [x for x in reference.keys()]) +"\n"
    if unitN not in key_list:
        raise IndexError(f"\n\nConversion unit must be one of:\n\n{key_list}")
    units, divisors = [(k,v) for k,v in reference.items() if unitN in k][0]
    if SI:
        divisor = 1000**divisors[1]/8 if "bit" in units else 1000**divisors[1]
    else:
        divisor = float(1 << divisors[0])
    value = bytes / divisor
    return f"{value:,.0f} {unitN}{(value != 1 and len(unitN) > 3)*'s'}"


# Tests 
>>> assert format_bytes(1,"b") == '8 b'
>>> assert format_bytes(1,"bits") == '8 bits'
>>> assert format_bytes(1024, "kilobyte") == "1 Kilobyte"
>>> assert format_bytes(1024, "kB") == "1 KB"
>>> assert format_bytes(7141000, "mb") == '54 Mb'
>>> assert format_bytes(7141000, "mib") == '54 Mib'
>>> assert format_bytes(7141000, "Mb") == '54 Mb'
>>> assert format_bytes(7141000, "MB") == '7 MB'
>>> assert format_bytes(7141000, "mebibytes") == '7 Mebibytes'
>>> assert format_bytes(7141000, "gb") == '0 Gb'
>>> assert format_bytes(1000000, "kB") == '977 KB'
>>> assert format_bytes(1000000, "kB", SI=True) == '1,000 KB'
>>> assert format_bytes(1000000, "kb") == '7,812 Kb'
>>> assert format_bytes(1000000, "kb", SI=True) == '8,000 Kb'
>>> assert format_bytes(125000, "kb") == '977 Kb'
>>> assert format_bytes(125000, "kb", SI=True) == '1,000 Kb'
>>> assert format_bytes(125*1024, "kb") == '1,000 Kb'
>>> assert format_bytes(125*1024, "kb", SI=True) == '1,024 Kb'

2022 年 10 月更新

我对最近评论的回答太长,因此不需要对 1<<20 magic! I also notice that float 进行进一步解释,因此我已将其从上面的示例中删除。

正如另一条回复中所述(上文)“<<" is called a "bitwise operator". It converts the left hand side to binary and moves the binary digits 20 places to the left (in this case). When we count normally in decimal, the total number of digits dictates whether we've reached the tens, hundreds, thousands, millions etc. Similar thing in binary except the number of digits dictates whether we're talking bits, bytes, kilobytes, megabytes etc. So.... 1<<20 is actually the same as (binary) 1 with 20 (binary) zeros after it, or if you remember how to convert from binary to decimal: 2 to the power of 20 (2**20) which equals 1048576. In the snippets above, os.path.getsize returns a value in BYTES and 1048576 bytes are strictly speaking a Mebibyte (MiB) and casually speaking a Megabyte (MB).


31
投票

这是计算尺寸的紧凑函数

def GetHumanReadable(size,precision=2):
    suffixes=['B','KB','MB','GB','TB']
    suffixIndex = 0
    while size > 1024 and suffixIndex < 4:
        suffixIndex += 1 #increment the index of the suffix
        size = size/1024.0 #apply the division
    return "%.*f%s"%(precision,size,suffixes[suffixIndex])

更详细的输出及反之操作请参考:http://code.activestate.com/recipes/578019-bytes-to- human- human-to-bytes-converter/


24
投票

这是:

def convert_bytes(size):
    for x in ['bytes', 'KB', 'MB', 'GB', 'TB']:
        if size < 1024.0:
            return "%3.1f %s" % (size, x)
        size /= 1024.0

    return size

输出

>>> convert_bytes(1024)
'1.0 KB'
>>> convert_bytes(102400)
'100.0 KB'

12
投票

以防万一有人正在寻找这个问题的反面(我确实这样做了),这对我有用:

def get_bytes(size, suffix):
    size = int(float(size))
    suffix = suffix.lower()

    if suffix == 'kb' or suffix == 'kib':
        return size << 10
    elif suffix == 'mb' or suffix == 'mib':
        return size << 20
    elif suffix == 'gb' or suffix == 'gib':
        return size << 30

    return False

9
投票
UNITS = {1000: ['KB', 'MB', 'GB'],
            1024: ['KiB', 'MiB', 'GiB']}

def approximate_size(size, use_base_1024=True):
    mult = 1024 if use_base_1024 else 1000
    for unit in UNITS[mult]:
        size = size / mult
        if size < mult:
            return '{0:.3f} {1}'.format(size, unit)
            
approximate_size(2123, False)

3
投票

这是我的两分钱,它允许上下投射,并增加可定制的精度:

def convertFloatToDecimal(f=0.0, precision=2):
    '''
    Convert a float to string of decimal.
    precision: by default 2.
    If no arg provided, return "0.00".
    '''
    return ("%." + str(precision) + "f") % f

def formatFileSize(size, sizeIn, sizeOut, precision=0):
    '''
    Convert file size to a string representing its value in B, KB, MB and GB.
    The convention is based on sizeIn as original unit and sizeOut
    as final unit. 
    '''
    assert sizeIn.upper() in {"B", "KB", "MB", "GB"}, "sizeIn type error"
    assert sizeOut.upper() in {"B", "KB", "MB", "GB"}, "sizeOut type error"
    if sizeIn == "B":
        if sizeOut == "KB":
            return convertFloatToDecimal((size/1024.0), precision)
        elif sizeOut == "MB":
            return convertFloatToDecimal((size/1024.0**2), precision)
        elif sizeOut == "GB":
            return convertFloatToDecimal((size/1024.0**3), precision)
    elif sizeIn == "KB":
        if sizeOut == "B":
            return convertFloatToDecimal((size*1024.0), precision)
        elif sizeOut == "MB":
            return convertFloatToDecimal((size/1024.0), precision)
        elif sizeOut == "GB":
            return convertFloatToDecimal((size/1024.0**2), precision)
    elif sizeIn == "MB":
        if sizeOut == "B":
            return convertFloatToDecimal((size*1024.0**2), precision)
        elif sizeOut == "KB":
            return convertFloatToDecimal((size*1024.0), precision)
        elif sizeOut == "GB":
            return convertFloatToDecimal((size/1024.0), precision)
    elif sizeIn == "GB":
        if sizeOut == "B":
            return convertFloatToDecimal((size*1024.0**3), precision)
        elif sizeOut == "KB":
            return convertFloatToDecimal((size*1024.0**2), precision)
        elif sizeOut == "MB":
            return convertFloatToDecimal((size*1024.0), precision)

根据需要添加

TB
等。


2
投票

这是与 ls -lh 的输出匹配的版本。

def human_size(num: int) -> str:
    base = 1
    for unit in ['B', 'K', 'M', 'G', 'T', 'P', 'E', 'Z', 'Y']:
        n = num / base
        if n < 9.95 and unit != 'B':
            # Less than 10 then keep 1 decimal place
            value = "{:.1f}{}".format(n, unit)
            return value
        if round(n) < 1000:
            # Less than 4 digits so use this
            value = "{}{}".format(round(n), unit)
            return value
        base *= 1024
    value = "{}{}".format(round(n), unit)
    return value

2
投票

我想要 2 路转换,并且我想使用 Python 3 format() 支持来实现最 Pythonic。也许尝试 datasize 库模块? https://pypi.org/project/datasize/

$ pip install -qqq datasize
$ python
...
>>> from datasize import DataSize
>>> 'My new {:GB} SSD really only stores {:.2GiB} of data.'.format(DataSize('750GB'),DataSize(DataSize('750GB') * 0.8))
'My new 750GB SSD really only stores 558.79GiB of data.'

-1
投票

这是我的实现:

from bisect import bisect

def to_filesize(bytes_num, si=True):
    decade = 1000 if si else 1024
    partitions = tuple(decade ** n for n in range(1, 6))
    suffixes = tuple('BKMGTP')

    i = bisect(partitions, bytes_num)
    s = suffixes[i]

    for n in range(i):
        bytes_num /= decade

    f = '{:.3f}'.format(bytes_num)

    return '{}{}'.format(f.rstrip('0').rstrip('.'), s)

它将打印最多三位小数,并删除尾随的零和句点。布尔参数

si
将切换基于 10 和基于 2 的大小大小的使用。

这是它的对应物。它允许编写干净的配置文件,例如

{'maximum_filesize': from_filesize('10M')
。它返回一个近似于预期文件大小的整数。我没有使用位移位,因为源值是浮点数(它会接受
from_filesize('2.15M')
就可以了)。将其转换为整数/小数是可行的,但会使代码更加复杂,而且它已经可以正常工作了。

def from_filesize(spec, si=True):
    decade = 1000 if si else 1024
    suffixes = tuple('BKMGTP')

    num = float(spec[:-1])
    s = spec[-1]
    i = suffixes.index(s)

    for n in range(i):
        num *= decade

    return int(num)
© www.soinside.com 2019 - 2024. All rights reserved.