如果参数是列表对象，Python也是动态数组使用的sys.getsizeof返回内存

Question

python中的列表是由动态数组实现的。因此，列表对象中有一个指针指向堆中的数组。源代码为here。

typedef struct {
PyObject_VAR_HEAD
/* Vector of pointers to list elements.  list[0] is ob_item[0], etc. */
PyObject **ob_item;

/* ob_item contains space for 'allocated' elements.  The number
 * currently in use is ob_size.
 * Invariants:
 *     0 <= ob_size <= allocated
 *     len(list) == ob_size
 *     ob_item == NULL implies ob_size == allocated == 0
 * list.sort() temporarily sets allocated to -1 to detect mutations.
 *
 * Items must normally not be NULL, except during construction when
 * the list is not yet visible outside the function that builds it.
 */
Py_ssize_t allocated;

} PyListObject;

所以我认为无论列表中有多少项，sys.getsizeof都将返回一个常数，因为列表对象仅包含一个指针，一个MACRO（PyObject_HEAD Py_ssize_t ob_size;）和一个整数。

但是，我发现列表中sys.getsizeof的返回值随着列表中项目数的增加而增加。

示例：

sys.getsizeof([])                         #returns 72
sys.getsizeof([1])                        #returns 80
sys.getsizeof([x for x in xrange(1024)])  #returns 9032

为什么列表上的sys.getsizeof的返回值不是常数？

Answer 1

这是优化问题。

让我们举个例子：假设append在当前“已满”（从内存的PoV）列表中发出。 2种可能的结果：

您认为很自然的人：
- 为新元素分配了内存
- 新元素引用保存在上面的存储槽中
实际的一个：
- 为一堆新元素分配了内存
- 新元素引用保存在第一个内存插槽中（从上一个项目符号中保存）

重复该过程时，可以看到2 ^nd方法的性能更好，因为内存分配较少（下一个append不必像以前那样分配内存）。需要注意的是，由于列表内容是一个数组（realloc（在[[整个列表内容上）每次涉及到内存操作时，对于大的（并且不仅如此）列表）。

不用说，
non“完整”列表的概念不适用于1 st方法的上下文。
^{实施细节：每次修改列表（可能/也将应用于其他容器）（添加或删除元素）时，都会重新计算其大小，因此它是[[resize d。
来自[GitHub]: python/cpython - (master) cpython/Objects/listobject.c的注释（列表_ 调整大小}的注释部分，重点是我的）]/* This over-allocates proportional to the list size, making room * for additional growth. The over-allocation is mild, but is * enough to give linear-time amortized behavior over a long * sequence of appends() in the presence of a poorly-performing * system realloc(). * The growth pattern is: 0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ... * Note: new_allocated won't overflow because the largest possible value * is PY_SSIZE_T_MAX * (9 / 8) + 6 which always fits in a size_t. */
下面是一个例子。
code00.py
：
#!/usr/bin/env python import sys def list_growth(max_elems): test_list = [] prev_size = 0 data = [] for _ in range(max_elems): cur_size = sys.getsizeof(test_list) if cur_size != prev_size: data.append((len(test_list), cur_size)) prev_size = cur_size test_list.append(0) return data def main(*argv): count = 3000 print("List growth example (.append) for {0:d} ints".format(count)) pairs = list_growth(count) for pair in pairs: print("{0:4} - {1:5}".format(*pair)) ''' print("Same example for initially constructed list: {0:d}".format(count)) for x, size in pairs: test_list = [0] * x print("{0:4} - {1:5}".format(x, sys.getsizeof(test_list))) ''' if __name__ == "__main__": print("Python {0:s} {1:d}bit on {2:s}\n".format(" ".join(item.strip() for item in sys.version.split("\n")), 64 if sys.maxsize > 0x100000000 else 32, sys.platform)) main(*sys.argv[1:]) print("\nDone.")
输出
：
e:\Work\Dev\StackOverflow\q060502188>"e:\Work\Dev\VEnvs\py_pc064_03.07.06_test0\Scripts\python.exe" code00.py Python 3.7.6 (tags/v3.7.6:43364a7ae0, Dec 19 2019, 00:42:30) [MSC v.1916 64 bit (AMD64)] 64bit on win32 List growth example (.append) for 3000 ints 0 - 64 1 - 96 5 - 128 9 - 192 17 - 264 26 - 344 36 - 432 47 - 528 59 - 640 73 - 768 89 - 912 107 - 1072 127 - 1248 149 - 1448 174 - 1672 202 - 1928 234 - 2216 270 - 2536 310 - 2896 355 - 3304 406 - 3760 463 - 4272 527 - 4848 599 - 5496 680 - 6232 772 - 7056 875 - 7984 991 - 9024 1121 - 10200 1268 - 11520 1433 - 13008 1619 - 14680 1828 - 16560 2063 - 18672 2327 - 21048 2624 - 23728 2959 - 26736 Done.
如图所示，结果与注释中的结果匹配-偏移量为1：）。
可能还想检查[SO]: Why does list ask about __len__?（确保也到达答案的最后一部分）。
处理已创建的列表时，大小要小一些，但原理是相同的。

如果参数是列表对象，Python也是动态数组使用的sys.getsizeof返回内存

问题描述投票：0回答：1

1个回答

最新问题

如果参数是列表对象，Python也是动态数组使用的sys.getsizeof返回内存

问题描述 投票：0回答：1

1个回答

最新问题

问题描述投票：0回答：1