Parsing square-bracket-tagged text into an associative array

Question (votes: 0, answers: 2)

I have a PHP string that looks like this:

[line1]this is some test text[/line1][line2]This is line 2 text[/line2][line3]This is line 3 text[/line3]

I am trying to create an array that looks like this:

array(
    "line1" => "this is some test text",
    "line2" => "This is line 2 text",
    "line3" => "This is line 3 text"
)

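The string is created dynamically, so it could contain line1 through line99 and so on.

What is the best way to do this while keeping it scalable? Could someone show me an example?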

php arrays text-parsing
2 Answers
2
votes

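As far as regular expressions go, this is probably a reasonable compromise for matching the pattern.

Note: this will not handle nesting/recursion.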

\[([^\]]+)\](.*?)\[/\g{1}\]

Usage:

preg_match_all( '%\[([^\]]+)\](.*?)\[/\g{1}\]%', $subject, $matches, PREG_SET_ORDER );
var_dump( $matches );

Match the character “[” literally «\[»
Match the regex below and capture its match into backreference number 1 «([^\]]+)»
   Match any character that is NOT a “]” «[^\]]+»
      Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match the character “]” literally «\]»
Match the regex below and capture its match into backreference number 2 «(.*?)»
   Match any single character that is NOT a line break character (line feed) «.*?»
      Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match the character “[” literally «\[»
Match the character “/” literally «/»
Match the same text that was most recently matched by capturing group number 1 (case sensitive; fail if the group did not participate in the match so far) «\g{1}»
Match the character “]” literally «\]»
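
The usage snippet above stops at var_dump(), so here is a minimal sketch of one way to turn the match sets into the associative array the question asks for. It assumes the sample string from the question as $subject and uses array_column() to take capture group 1 as the key and capture group 2 as the value. The variable name $result is only illustrative.

```php
<?php
// Sample input copied from the question (assumed here for illustration).
$subject = '[line1]this is some test text[/line1][line2]This is line 2 text[/line2][line3]This is line 3 text[/line3]';

// With PREG_SET_ORDER each entry of $matches is
// array(0 => full match, 1 => tag name, 2 => inner text).
preg_match_all('%\[([^\]]+)\](.*?)\[/\g{1}\]%', $subject, $matches, PREG_SET_ORDER);

// Column 1 (tag name) becomes the key, column 2 (inner text) the value.
$result = array_column($matches, 2, 1);

var_export($result);
echo "\n";
```

If the same tag name appears more than once, later matches overwrite earlier ones, because array keys are unique.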

0
votes

You can use explode with this approach (splitting the input string several times) to isolate the relevant data, namely the line ID and the line text:

$text = "[line1]this is some test text[/line1][line2]This is line 2 text[/line2][line3]This is line 3 text[/line3]";

$text = explode( '][line', ']'.$text.'[line');

// you'll get a spurious item at index 0 and at the end, they'll be skipped

$n = count( $text );

$output = array();

for( $i = 1; $i < $n - 1; $i++ )
{
    $line = explode( ']', $text[ $i] );
    $id = 'line' . $line[ 0 ];
    $line = explode( '[', $line[ 1 ] );
    $value = $line[ 0 ];
    $output[ $id ] = $value;
}

var_export( $output );
echo "\n";

You get:

array (
  'line1' => 'this is some test text',
  'line2' => 'This is line 2 text',
  'line3' => 'This is line 3 text',
)

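Notes:

Empty "lines" are tolerated and handled properly.

Square brackets inside the line text will break the code and mess everything up.

The input must strictly follow the form [lineN]text[/lineN]... where N is the line number.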

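If you have further requirements, you can adjust the code.

I think this can be a good starting point.

Note (again):

This is a working solution (given the limitations mentioned above).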

Another way is to use a regular expression and preg_match_all() with capture groups to retrieve the line ID and the line text.
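
As a rough sketch of that alternative (not necessarily what the answerer had in mind), the pattern below captures the numeric line ID in group 1 and the line text in group 2. It assumes tags are always of the form [lineN]...[/lineN]. The sample input is hypothetical and deliberately puts square brackets inside the text, which the explode version cannot cope with.

```php
<?php
// Hypothetical input: line2 contains square brackets inside its text.
$text = '[line1]first[/line1][line2]has [brackets] inside[/line2]';

// Group 1: numeric line ID. Group 2: line text, matched lazily up to the
// closing tag with the same ID (\1). The "s" modifier lets "." match newlines.
preg_match_all('~\[line(\d+)\](.*?)\[/line\1\]~s', $text, $m);

$output = array();
foreach ($m[1] as $i => $id) {
    // $m[1] holds the IDs and $m[2] the texts, in matching order.
    $output['line' . $id] = $m[2][$i];
}

var_export($output);
echo "\n";
```

Like the other regular expression answer, this does not handle nested tags with the same ID.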
