Is it possible to repair a serialized string that was corrupted by truncation?

Question  Votes: 0  Answers: 15

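I have a huge multidimensional array that was serialized by PHP. It was stored in MySQL, but the column was not large enough, so the end of the string was cut off.

I need to extract the data, but unserialize does not work.

Does anyone know of a solution that closes all open arrays and recalculates the string lengths to produce a new, valid serialized string?

There is far too much data to fix by hand.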

php serialization corruption truncation
15 answers
39
votes

This recalculates the lengths of the string elements in a serialized array:

$fixed = preg_replace_callback(
    '/s:([0-9]+):\"(.*?)\";/',
    function ($matches) { return "s:".strlen($matches[2]).':"'.$matches[2].'";';     },
    $serialized
);

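However, it does not work if your strings contain "; (a double quote followed by a semicolon). In that case the serialized array string cannot be repaired automatically, and manual editing is required.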
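A minimal sketch of both cases described above: one where only the declared length is wrong, and one where the value contains "; and the lazy match stops too early. The helper name recalc_string_lengths and the sample strings are assumptions made for illustration.

```php
<?php
// Hypothetical helper name; it only wraps the regex from the answer above.
function recalc_string_lengths(string $serialized): string
{
    return preg_replace_callback(
        '/s:([0-9]+):"(.*?)";/',
        function (array $m): string {
            // strlen() counts bytes, which is what the serialize format stores.
            return 's:' . strlen($m[2]) . ':"' . $m[2] . '";';
        },
        $serialized
    );
}

// Case 1: the declared length (9) is wrong; the value has 5 bytes.
$wrongLength = 'a:1:{i:0;s:9:"hello";}';
$fixed = recalc_string_lengths($wrongLength);
$ok = unserialize($fixed);

// Case 2: the value itself is a";b, so the lazy (.*?) stops at the first ";
// and the "repaired" string is still invalid.
$tricky = 'a:1:{i:0;s:99:"a";b";}';
$broken = recalc_string_lengths($tricky);
$bad = @unserialize($broken);

var_dump($fixed, $ok, $broken, $bad);
```

In the second case the callback rewrites only the fragment up to the first "; and leaves the rest of the value dangling, which is why such strings need manual editing or a real parser.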


25
votes

Solutions:

1) Try it online:

Serialized string fixer (online tool)

2) Use a function:

unserialize(serialize_corrector($serialized_string));

Code:

function serialize_corrector($serialized_string){
    // First check whether fixing is needed at all (unserialize() returns false on failure), then sanity-check the prefix.
    if ( @unserialize($serialized_string) === false && preg_match('/^[aOs]:/', $serialized_string) ) {
        $serialized_string = preg_replace_callback( '/s\:(\d+)\:\"(.*?)\";/s',    function($matches){return 's:'.strlen($matches[2]).':"'.$matches[2].'";'; },   $serialized_string );
    }
    return $serialized_string;
} 

There is also this script, which I have not tested.
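The "is fixing needed at all" check relies on unserialize() returning false when it cannot parse its input. A small sketch of that guard follows; the function name needs_repair is an assumption, and the allowed_classes option (available since PHP 7.0) is added here only as a precaution so that no objects are instantiated while probing.

```php
<?php
// Hypothetical helper: true when unserialize() cannot parse the string.
function needs_repair(string $s): bool
{
    // 'b:0;' legitimately unserializes to false, so it must not count as broken.
    if ($s === 'b:0;') {
        return false;
    }
    // @ suppresses the notice/warning PHP emits for malformed input.
    return @unserialize($s, ['allowed_classes' => false]) === false;
}

var_dump(needs_repair('a:1:{i:0;s:5:"hello";}')); // valid input
var_dump(needs_repair('a:1:{i:0;s:9:"hello";}')); // wrong string length
var_dump(needs_repair('b:0;'));                   // valid serialized false
```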


22
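votes

I tried everything I found in this thread, and nothing worked for me. After several painful hours, this is what I found deep in the Google results, and it finally worked: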

function fix_str_length($matches) {
    $string = $matches[2];
    $right_length = strlen($string); // yes, strlen even for UTF-8 characters, PHP wants the mem size, not the char count
    return 's:' . $right_length . ':"' . $string . '";';
}
function fix_serialized($string) {
    // securities
    if ( !preg_match('/^[aOs]:/', $string) ) return $string;
    if ( @unserialize($string) !== false ) return $string;
    $string = preg_replace("%\n%", "", $string);
    // doublequote exploding
    $data = preg_replace('%";%', "µµµ", $string);
    $tab = explode("µµµ", $data);
    $new_data = '';
    foreach ($tab as $line) {
        $new_data .= preg_replace_callback('%\bs:(\d+):"(.*)%', 'fix_str_length', $line);
    }
    return $new_data;
}

You can call the routine as follows:

//Let's consider we store the serialization inside a txt file
$corruptedSerialization = file_get_contents('corruptedSerialization.txt');

//Try to unserialize original string
$unSerialized = unserialize($corruptedSerialization);

//In case of failure let's try to repair it
if(!$unSerialized){
    $repairedSerialization = fix_serialized($corruptedSerialization);
    $unSerialized = unserialize($repairedSerialization);
}

//Keep your fingers crossed
var_dump($unSerialized);
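The comment in fix_str_length about using strlen even for UTF-8 can be checked directly: the serialize format stores the byte length, not the character count. A short sketch, with a sample value chosen for illustration:

```php
<?php
// "h\u{e9}llo" is "héllo": 5 characters, but 6 bytes in UTF-8 (é takes 2 bytes).
$value = "h\u{e9}llo";

$byteLength = strlen($value);      // bytes, as used by serialize()
$native     = serialize($value);   // what PHP itself produces

// Hand-built strings: one with the character count, one with the byte count.
$byCharCount = 's:5:"' . $value . '";';
$byByteCount = 's:' . $byteLength . ':"' . $value . '";';

var_dump($native === $byByteCount);     // PHP's own output uses the byte count
var_dump(@unserialize($byCharCount));   // the character count does not parse
var_dump(unserialize($byByteCount));    // the byte count round-trips
```

This is why the repair callbacks in this thread use strlen rather than mb_strlen.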

4
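votes

The following snippet attempts to read and parse a corrupted serialized string (blob data) recursively. This is useful, for example, when a string was too long for its database column and was cut off. Numeric primitives and booleans are guaranteed to be valid; strings may be cut off and/or array keys may be missing. The routine can be useful if recovering the important part (rather than all) of the data is a good enough solution for you.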

class Unserializer
{
    /**
    * Parse blob string tolerating corrupted strings & arrays
    * @param string $str Corrupted blob string
    */
    public static function parseCorruptedBlob(&$str)
    {
        // array pattern:    a:236:{...;}
        // integer pattern:  i:123;
        // double pattern:   d:329.0001122;
        // boolean pattern:  b:1; or b:0;
        // string pattern:   s:14:"date_departure";
        // null pattern:     N;
        // not supported: object O:{...}, reference R:{...}

        // NOTES:
        // - primitive types (bool, int, float) except for string are guaranteed uncorrupted
        // - arrays are tolerant to corrupted keys/values
        // - references & objects are not supported
        // - we use single byte string length calculation (strlen rather than mb_strlen) since source string is ISO-8859-2, not utf-8

        if(preg_match('/^a:(\d+):{/', $str, $match)){
            list($pattern, $cntItems) = $match;
            $str = substr($str, strlen($pattern));
            $array = [];
            for($i=0; $i<$cntItems; ++$i){
                $key = self::parseCorruptedBlob($str);
                if(trim($key)!==''){ // hmm, we wont allow null and "" as keys..
                    $array[$key] = self::parseCorruptedBlob($str);
                }
            }
            $str = ltrim($str, '}'); // closing array bracket
            return $array;
        }elseif(preg_match('/^s:(\d+):/', $str, $match)){
            list($pattern, $length) = $match;
            $str = substr($str, strlen($pattern));
            $val = substr($str, 0, $length + 2); // include also surrounding double quotes
            $str = substr($str, strlen($val) + 1); // include also semicolon
            $val = trim($val, '"'); // remove surrounding double quotes
            if(preg_match('/^a:(\d+):{/', $val)){
                // parse instantly another serialized array
                return (array) self::parseCorruptedBlob($val);
            }else{
                return (string) $val;
            }
        }elseif(preg_match('/^i:(-?\d+);/', $str, $match)){
            list($pattern, $val) = $match;
            $str = substr($str, strlen($pattern));
            return (int) $val;
        }elseif(preg_match('/^d:(-?[\d.]+(?:E[+-]?\d+)?);/', $str, $match)){
            list($pattern, $val) = $match;
            $str = substr($str, strlen($pattern));
            return (float) $val;
        }elseif(preg_match('/^b:(0|1);/', $str, $match)){
            list($pattern, $val) = $match;
            $str = substr($str, strlen($pattern));
            return (bool) $val;
        }elseif(preg_match('/^N;/', $str, $match)){
            $str = substr($str, strlen('N;'));
            return null;
        }
    }
}

// usage:
$unserialized = Unserializer::parseCorruptedBlob($serializedString);
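For the pure truncation case from the question, the same idea can be sketched more compactly: walk the string token by token, keep every key/value pair that is fully present, and stop at the first incomplete token. This is not the class above but an illustrative variant; the names TruncatedInput, salvage_value and salvage are assumptions, objects and references are not handled, and it assumes the declared string lengths are correct (only the tail is missing).

```php
<?php
// Hypothetical names throughout; a sketch for truncated input only.
final class TruncatedInput extends Exception {}

function salvage_value(string $s, int &$pos)
{
    // Complete scalar tokens: null, bool, int, float.
    if (preg_match('/\G(?:N|b:[01]|i:-?\d+|d:-?[0-9.E+-]+);/', $s, $m, 0, $pos)) {
        $pos += strlen($m[0]);
        return @unserialize($m[0]);
    }
    // Strings: accepted only if all declared bytes and the closing "; exist.
    if (preg_match('/\Gs:(\d+):"/', $s, $m, 0, $pos)) {
        $start = $pos + strlen($m[0]);
        $len = (int) $m[1];
        if (substr($s, $start + $len, 2) !== '";') {
            throw new TruncatedInput();
        }
        $pos = $start + $len + 2;
        return substr($s, $start, $len);
    }
    // Arrays: collect complete pairs; on truncation keep what was parsed.
    if (preg_match('/\Ga:(\d+):\{/', $s, $m, 0, $pos)) {
        $pos += strlen($m[0]);
        $result = [];
        try {
            for ($i = 0; $i < (int) $m[1]; $i++) {
                $key = salvage_value($s, $pos);
                $result[$key] = salvage_value($s, $pos);
            }
            if (substr($s, $pos, 1) === '}') {
                $pos++;
            }
        } catch (TruncatedInput $e) {
            $pos = strlen($s); // nothing usable remains after this point
        }
        return $result;
    }
    throw new TruncatedInput(); // unsupported or incomplete token
}

function salvage(string $s)
{
    $pos = 0;
    try {
        return salvage_value($s, $pos);
    } catch (TruncatedInput $e) {
        return null; // nothing could be recovered
    }
}

$partial = salvage('a:3:{s:3:"one";i:1;s:3:"two";s:4:"nice";s:5:"three";s:');
var_dump($partial);
```

A key whose value was cut off is dropped entirely, so the result contains only pairs that were stored intact.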

3
votes

Use preg_replace_callback() instead of preg_replace(.../e), because the /e modifier is deprecated (and was removed in PHP 7.0).

$fixed_serialized_String = preg_replace_callback('/s:([0-9]+):\"(.*?)\";/',function($match) {
    return "s:".strlen($match[2]).':"'.$match[2].'";';
}, $serializedString);

$correct_array= unserialize($fixed_serialized_String);

2
votes

The best solution for me:

$output_array = unserialize(My_checker($serialized_string));

Code:

function My_checker($serialized_string){
    // securities
    if (empty($serialized_string))                      return '';
    if ( !preg_match('/^[aOs]:/', $serialized_string) ) return $serialized_string;
    if ( @unserialize($serialized_string) !== false ) return $serialized_string;

    return
    preg_replace_callback(
        '/s\:(\d+)\:\"(.*?)\";/s', 
        function ($matches){  return 's:'.strlen($matches[2]).':"'.$matches[2].'";';  },
        $serialized_string )
    ;
}

0
votes

Based on @Emil M's answer, here is a fixed version that works for text containing double quotes.

function fix_broken_serialized_array($match) {
    return "s:".strlen($match[2]).":\"".$match[2]."\";"; 
}
$fixed = preg_replace_callback(
    '/s:([0-9]+):"(.*?)";/',
    "fix_broken_serialized_array",
    $serialized
);

0
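votes

[Update] Colleagues, I am not sure whether this is allowed here, but I built my own tool for exactly this kind of case and put it on my own site. Please give it a try: https://saysimsim.ru/tools/SerializedDataEditor

[Old text] Conclusion :-) After 3 days (instead of the estimated 2 hours) of migrating a WordPress site to a new domain, I finally found this page! Colleagues, please consider this my "Thank_You_Very_Much_Indeed" for all your answers. The code below combines all of your solutions with almost nothing added. JFYI: for me personally, SOLUTION 3 worked best. Kamal Saleh, you are the best!!!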

function hlpSuperUnSerialize($str) {
    #region Simple Security
    if (
        empty($str)
        || !is_string($str)
        || !preg_match('/^[aOs]:/', $str)
    ) {
        return FALSE;
    }
    #endregion Simple Security

    #region SOLUTION 0
    // PHP default :-)
    $repSolNum = 0;
    $strFixed  = $str;
    $arr       = @unserialize($strFixed);
    if (FALSE !== $arr) {
        error_log("UNSERIALIZED!!! SOLUTION {$repSolNum} worked!!!");

        return $arr;
    }
    #endregion SOLUTION 0

    #region SOLUTION 1
    // @link https://stackoverflow.com/a/5581004/3142281
    $repSolNum = 1;
    $strFixed  = preg_replace_callback(
        '/s:([0-9]+):\"(.*?)\";/',
        function ($matches) { return "s:" . strlen($matches[2]) . ':"' . $matches[2] . '";'; },
        $str
    );
    $arr       = @unserialize($strFixed);
    if (FALSE !== $arr) {
        error_log("UNSERIALIZED!!! SOLUTION {$repSolNum} worked!!!");

        return $arr;
    }
    #endregion SOLUTION 1

    #region SOLUTION 2
    // @link https://stackoverflow.com/a/24995701/3142281
    $repSolNum = 2;
    $strFixed  = preg_replace_callback(
        '/s:([0-9]+):\"(.*?)\";/',
        function ($match) {
            return "s:" . strlen($match[2]) . ':"' . $match[2] . '";';
        },
        $str);
    $arr       = @unserialize($strFixed);
    if (FALSE !== $arr) {
        error_log("UNSERIALIZED!!! SOLUTION {$repSolNum} worked!!!");

        return $arr;
    }
    #endregion SOLUTION 2

    #region SOLUTION 3
    // @link https://stackoverflow.com/a/34224433/3142281
    $repSolNum = 3;
    // securities
    $strFixed = preg_replace("%\n%", "", $str);
    // doublequote exploding
    $data     = preg_replace('%";%', "µµµ", $strFixed);
    $tab      = explode("µµµ", $data);
    $new_data = '';
    foreach ($tab as $line) {
        $new_data .= preg_replace_callback(
            '%\bs:(\d+):"(.*)%',
            function ($matches) {
                $string       = $matches[2];
                $right_length = strlen($string); // yes, strlen even for UTF-8 characters, PHP wants the mem size, not the char count

                return 's:' . $right_length . ':"' . $string . '";';
            },
            $line);
    }
    $strFixed = $new_data;
    $arr      = @unserialize($strFixed);
    if (FALSE !== $arr) {
        error_log("UNSERIALIZED!!! SOLUTION {$repSolNum} worked!!!");

        return $arr;
    }
    #endregion SOLUTION 3

    #region SOLUTION 4
    // @link https://stackoverflow.com/a/36454402/3142281
    $repSolNum = 4;
    $strFixed  = preg_replace_callback(
        '/s:([0-9]+):"(.*?)";/',
        function ($match) {
            return "s:" . strlen($match[2]) . ":\"" . $match[2] . "\";";
        },
        $str
    );
    $arr       = @unserialize($strFixed);
    if (FALSE !== $arr) {
        error_log("UNSERIALIZED!!! SOLUTION {$repSolNum} worked!!!");

        return $arr;
    }
    #endregion SOLUTION 4

    #region SOLUTION 5
    // @link https://stackoverflow.com/a/38890855/3142281
    $repSolNum = 5;
    $strFixed  = preg_replace_callback('/s\:(\d+)\:\"(.*?)\";/s', function ($matches) { return 's:' . strlen($matches[2]) . ':"' . $matches[2] . '";'; }, $str);
    $arr       = @unserialize($strFixed);
    if (FALSE !== $arr) {
        error_log("UNSERIALIZED!!! SOLUTION {$repSolNum} worked!!!");

        return $arr;
    }
    #endregion SOLUTION 5

    #region SOLUTION 6
    // @link https://stackoverflow.com/a/38891026/3142281
    $repSolNum = 6;
    $strFixed  = preg_replace_callback(
        '/s\:(\d+)\:\"(.*?)\";/s',
        function ($matches) { return 's:' . strlen($matches[2]) . ':"' . $matches[2] . '";'; },
        $str);
    $arr = @unserialize($strFixed);
    if (FALSE !== $arr) {
        error_log("UNSERIALIZED!!! SOLUTION {$repSolNum} worked!!!");

        return $arr;
    }
    #endregion SOLUTION 6
    error_log('Completely unable to deserialize.');

    return FALSE;
}

0
votes

We ran into some problems too. In the end we used a modified version of roman-newaza's answer that also works for data containing newlines.

<?php 


$mysql = mysqli_connect("localhost", "...", "...", "...");
$res = mysqli_query($mysql, "SELECT option_id,option_value from ... where option_value like 'a:%'");

$prep = mysqli_prepare($mysql, "UPDATE ... set option_value = ? where option_id = ?");


function fix_str_length($matches) {
    $string = $matches[2];
    $right_length = strlen($string); // yes, strlen even for UTF-8 characters, PHP wants the mem size, not the char count
    return 's:' . $right_length . ':"' . $string . '";';
}
function fix_serialized($string) {
    if ( !preg_match('/^[aOs]:/', $string) ) return $string;
    if ( @unserialize($string) !== false ) return $string;
    $data = preg_replace('%";%', "µµµ", $string);
    $tab = explode("µµµ", $data);
    $new_data = '';
    foreach ($tab as $line) {
        $new_data .= preg_replace_callback('%\bs:(\d+):"(.*)%s', 'fix_str_length', $line);
    }
    return $new_data;
}

while ( $val = mysqli_fetch_row($res) ) {
  $y = $val[0];
  $x = $val[1];

  $unSerialized = unserialize($x);

  //In case of failure let's try to repair it
  if($unSerialized === false){
      echo "fixing $y\n";
      $repairedSerialization = fix_serialized($x);
      //$unSerialized = unserialize($repairedSerialization);
      mysqli_stmt_bind_param($prep, "si", $repairedSerialization, $y);
      mysqli_stmt_execute($prep);
  }

}

0
votes

The top-voted answers cannot repair a serialized array that has unquoted string values, for example:

a:1:{i:0;s:2:14;}

function unserialize_corrupted(string $str): array {
    // Fix serialized array with unquoted strings
    if(preg_match('/^(a:\d+:{)/', $str)) {
        preg_match_all('/(s:\d+:(?!").+(?!");)/U', $str, $pm_corruptedStringValues);

        foreach($pm_corruptedStringValues[0] as $_corruptedStringValue) {
            // Get post string data
            preg_match('/^(s:\d+:)/', $_corruptedStringValue, $pm_strBase);

            // Get unquoted string
            $stringValue = substr($_corruptedStringValue, strlen($pm_strBase[0]), -1);
            // Rebuild serialized data with quoted string
            $correctedStringValue = "$pm_strBase[0]\"$stringValue\";";

            // replace corrupted data
            $str = str_replace($_corruptedStringValue, $correctedStringValue, $str);
        }
    }

    // Fix offset error
    $str = preg_replace_callback(
        '/s:(\d+):\"(.*?)\";/',
        function($matches) { return "s:".strlen($matches[2]).':"'.$matches[2].'";'; },
        $str
    );

    $unserializedString = unserialize($str);

    if($unserializedString === false) {
        // Return empty array if string can't be fixed
        $unserializedString = array();
    }

    return $unserializedString;
}

0
votes

Based on @Preciel's solution; this version also fixes objects.

public function unserialize(string $string2array): array {
    if (preg_match('/^(a:\d+:{)/', $string2array)) {
        preg_match_all('/((s:\d+:(?!").+(?!");)|(O:\d+:(?!").+(?!"):))/U', $string2array, $matches);
        foreach ($matches[0] as $match) {
            preg_match('/^((s|O):\d+:)/', $match, $strBase);
            $stringValue = substr($match, strlen($strBase[0]), -1);
            $endSymbol = substr($match, -1);
            $fixedValue = $strBase[2] . ':' . strlen($stringValue) . ':"' . $stringValue . '"' . $endSymbol;
            $string2array = str_replace($match, $fixedValue, $string2array);
        }
    }

    $string2array = preg_replace_callback(
        '/(a|s|b|d|i):(\d+):\"(.*?)\";/',
        function ($matches) {
            return $matches[1] . ":" . strlen($matches[3]) . ':"' . $matches[3] . '";';
        },
        $string2array
    );

    $unserializedString = (!empty($string2array) && @unserialize($string2array)) ? unserialize($string2array) : array();
    return $unserializedString;
}

-3
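votes

I doubt anyone would write code to retrieve a partially saved array :) I fixed something like this once, but by hand, and it took hours. Then I realized I did not need that part of the array...

Unless the data is really important (and I mean REALLY important), you are better off letting it go.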


-3
votes

You can turn invalid serialized data back into a normal array this way :)

$str = "a:1:{i:0;a:4:{s:4:\"name\";s:26:\"20141023_544909d85b868.rar\";s:5:\"dname\";s:20:\"HTxRcEBC0JFRWhtk.rar\";s:4:\"size\";i:19935;s:4:\"dead\";i:0;}}";

// $re (the regular expression pattern) is not defined in this answer as posted.
preg_match_all($re, $str, $matches);

if(is_array($matches) && !empty($matches[1]) && !empty($matches[2]))
{
    foreach($matches[1] as $ksel => $serv)
    {
        if(!empty($serv))
        {
            $retva[] = $serv;
        }else{
            $retva[] = $matches[2][$ksel];
        }
    }

    $count = 0;
    $arrk = array();
    $arrv = array();
    if(is_array($retva))
    {
        foreach($retva as $k => $va)
        {
            ++$count;
            if($count/2 == 1)
            {
                $arrv[] = $va;
                $count = 0;
            }else{
                $arrk[] = $va;
            }
        }
        $returnse = array_combine($arrk,$arrv);
    }

}

print_r($returnse);

-5
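votes

I think this is almost impossible. Before you can repair the array you need to know how it was damaged. How many children are missing? What did they contain?

Sorry, but IMHO you cannot do it.

Proof: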

<?php

$serialized = serialize(
    [
        'one'   => 1,
        'two'   => 'nice',
        'three' => 'will be damaged'
    ]
);

var_dump($serialized); // a:3:{s:3:"one";i:1;s:3:"two";s:4:"nice";s:5:"three";s:15:"will be damaged";}

var_dump(unserialize('a:3:{s:3:"one";i:1;s:3:"two";s:4:"nice";s:5:"tee";s:15:"will be damaged";}')); // please note 'tee'

var_dump(unserialize('a:3:{s:3:"one";i:1;s:3:"two";s:4:"nice";s:5:"three";s:')); // serialized string is truncated

Link: https://ideone.com/uvISQu

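Even if you can recalculate the lengths of the keys/values, you cannot trust data retrieved from this source, because you cannot recompute the values themselves. For example, if the serialized data is an object, its properties will no longer be accessible.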
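The 'tee' case from the proof can be run through the length-recalculation regex used elsewhere in this thread to illustrate the point: the string becomes parseable again, but the damaged key is not restored. A sketch, reusing a shortened form of the example data above:

```php
<?php
// The key was 'three' (5 bytes) but the stored text is 'tee'; unserialize() rejects it.
$damaged = 'a:1:{s:5:"tee";s:15:"will be damaged";}';
$before = @unserialize($damaged);

// Recalculate every declared string length from the bytes actually present.
$repaired = preg_replace_callback(
    '/s:(\d+):"(.*?)";/',
    function (array $m): string {
        return 's:' . strlen($m[2]) . ':"' . $m[2] . '";';
    },
    $damaged
);
$after = unserialize($repaired);

var_dump($before);   // parsing the damaged string fails
var_dump($after);    // parses now, but the key is 'tee', not 'three'
```

The repaired string is valid, yet nothing in it reveals that the key used to be 'three', so the result still has to be checked against what the application expects.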


-5
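votes

Serialization is almost always a bad idea, because you cannot search the data in any way. Sorry, it looks like you have been backed into a corner...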
