What is wrong with my code for converting a matrix to and from a byte string?

Question

I have these functions, which convert a binary 2D array into a byte array:

import numpy as np


def flatten_and_pad_to_multiple_of_8(binary_matrix):
    # Step 1: Calculate the size of the original flattened array
    rows, cols = binary_matrix.shape
    current_length = rows * cols
    
    # Step 2: Calculate the required length that is a multiple of 8
    padded_length = ((current_length + 7) // 8) * 8
    
    # Step 3: Initialize flat_bits with the required padded length
    flat_bits = np.zeros(padded_length, dtype=np.uint8)
    
    # Step 4: Fill flat_bits with values from the binary matrix
    idx = 0
    for i in range(rows):
        for j in range(cols):
            flat_bits[idx] = binary_matrix[i, j]
            idx += 1
    
    return flat_bits

def matrix_to_ascii(matrix):
    flat_bits = flatten_and_pad_to_multiple_of_8(matrix)    
    # Convert the flattened bits into bytes
    ascii_string = ""
    for i in range(0, len(flat_bits), 8):
        byte = 0
        for j in range(8):
            byte = (byte << 1) | flat_bits[i + j]
        ascii_char = chr(byte)
        ascii_string += ascii_char
    return ascii_string

If

matrix = np.array([[0, 1, 1, 1, 1],
[1, 0, 1, 1, 1],
[1, 1, 0, 1, 1],
[1, 1, 1, 0, 1],
[1, 1, 1, 1, 0]], dtype=np.uint8)

then matrix_to_ascii(matrix) is '}÷ß\x00', which is a str rather than bytes, so I then have to call matrix_to_ascii(matrix).encode(). My problem is converting the result back into a matrix.

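I first convert the string to a byte array to save space. I need to save space in my code. Here is the broken code that converts it back into a matrix: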

def ascii_to_matrix(byte_array, original_shape):
    """
    byte_array must be a bytes object (not a str) before it is passed in.
    """
    # Initialize the binary matrix with the original shape
    rows, cols = original_shape
    binary_matrix = np.zeros((rows, cols), dtype=np.uint8)
    
    # Fill the binary matrix with bits from the byte array
    bit_idx = 0
    for byte in byte_array:
        for j in range(8):
            if bit_idx < rows * cols:
                binary_matrix[bit_idx // cols, bit_idx % cols] = (byte >> (7 - j)) & 1
                bit_idx += 1
            else:
                break
    return binary_matrix

Unfortunately, it gives the wrong output:

ascii_to_matrix(matrix_to_ascii(matrix).encode(), (5, 5))

array([[0, 1, 1, 1, 1],
       [1, 0, 1, 1, 1],
       [0, 0, 0, 0, 1],
       [1, 1, 0, 1, 1],
       [0, 1, 1, 1, 1]], dtype=uint8)

What am I doing wrong?

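(I am not using any of the fancier numpy functions because I eventually want to speed all of this up with numba. In particular, I cannot use packbits or tobytes, because numba does not support them.)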

Answer 1
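
The likely culprit is the .encode() call rather than ascii_to_matrix itself. str.encode() defaults to UTF-8, which writes every code point from 128 to 255 as two bytes, so the four characters '}÷ß\x00' become six bytes and the decoder reads bits that were never in the matrix. A minimal sketch of the effect, using the string from the question; encoding with 'latin-1' (which maps code points 0 to 255 onto single bytes) is one possible fix, and building a bytes object directly instead of a str is another:

```python
# The str returned by matrix_to_ascii(matrix) in the question: '}÷ß\x00'
packed = "}\u00f7\u00df\x00"

utf8_bytes = packed.encode()             # default codec is UTF-8
latin1_bytes = packed.encode("latin-1")  # one byte per code point in 0..255

print(len(packed), len(utf8_bytes), len(latin1_bytes))  # 4 6 4
print(utf8_bytes.hex(" "))    # 7d c3 b7 c3 9f 00
print(latin1_bytes.hex(" "))  # 7d f7 df 00
```

Reading the first 25 bits of 7d c3 b7 c3 most significant bit first gives exactly the wrong matrix shown in the question, which is consistent with this explanation. Passing matrix_to_ascii(matrix).encode("latin-1") to ascii_to_matrix hands the decoder the four bytes it expects.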

import numpy as np

matrix = np.array(
    [
        [0, 1, 1, 1, 1],
        [1, 0, 1, 1, 1],
        [1, 1, 0, 1, 1],
        [1, 1, 1, 0, 1],
        [1, 1, 1, 1, 0],
    ],
    dtype=np.uint8,
)


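def matrix_to_bytes(matrix):
    flat_matrix = matrix.flatten()
    # Pad with zeros up to the next multiple of 8; -n % 8 is 0 when n is already a multiple.
    chunks = np.pad(flat_matrix, (0, -len(flat_matrix) % 8)).reshape(-1, 8)
    # Read each group of 8 bits as a base-2 number, most significant bit first.
    return bytes(int("".join(map(str, chunk)), 2) for chunk in chunks)


def bytes_to_matrix(bytestring, shape):
    # Expand every byte to its 8-bit binary representation, then drop the padding bits.
    bits = "".join(f"{byte:08b}" for byte in bytestring)
    return np.array(list(bits), dtype=np.uint8)[: np.prod(shape)].reshape(shape)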


print(x := matrix_to_bytes(matrix))
print(y := bytes_to_matrix(x, (5, 5)))
assert np.array_equal(matrix, y)

This seems to do the trick.
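
Since the question rules out packbits and tobytes with numba in mind, here is a minimal sketch of the same round trip written with plain loops and integer operations only, skipping the intermediate str entirely. The names pack_bits and unpack_bits are hypothetical, and the functions have not been compiled with numba here; whether they compile unchanged under a given numba version is an assumption to verify.

```python
import numpy as np


def pack_bits(matrix):
    """Pack a 2D array of 0/1 values into a uint8 array, most significant bit first."""
    rows, cols = matrix.shape
    n_bits = rows * cols
    # Round up to a whole number of bytes; unused trailing bits stay 0.
    packed = np.zeros((n_bits + 7) // 8, dtype=np.uint8)
    for idx in range(n_bits):
        if matrix[idx // cols, idx % cols]:
            # Set bit (7 - idx % 8) of byte idx // 8.
            packed[idx // 8] |= np.uint8(1 << (7 - idx % 8))
    return packed


def unpack_bits(packed, shape):
    """Rebuild the 0/1 matrix of the given shape from the packed bytes."""
    rows, cols = shape
    matrix = np.zeros((rows, cols), dtype=np.uint8)
    for idx in range(rows * cols):
        byte = int(packed[idx // 8])
        matrix[idx // cols, idx % cols] = (byte >> (7 - idx % 8)) & 1
    return matrix


matrix = np.array(
    [
        [0, 1, 1, 1, 1],
        [1, 0, 1, 1, 1],
        [1, 1, 0, 1, 1],
        [1, 1, 1, 0, 1],
        [1, 1, 1, 1, 0],
    ],
    dtype=np.uint8,
)

packed = pack_bits(matrix)
# tobytes() is called here outside any compiled function, only for display.
print(packed.tobytes().hex(" "))  # 7d f7 df 00
restored = unpack_bits(packed, matrix.shape)
assert np.array_equal(matrix, restored)
```

Keeping the packed data as a uint8 array avoids text encodings altogether, so the UTF-8 expansion of values above 127 cannot occur.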
