Get all substrings inside potentially nested braces


I'm trying to parse the following format in PHP:

// This is a comment
{
this is an entry
}
{
this is another entry
}
{
entry
{entry within entry}
{entry within entry}
}

Maybe it's just a lack of caffeine, but I can't think of a decent way to get the contents of the braces.

php string tokenize text-parsing
2 Answers
1
vote

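This is a very common parsing task. Basically, you need to keep track of the various states you can be in (here: ordinary character data or a // comment), using constants to name the states and a recursive function call for each nested brace.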

Here is some rather inelegant code that does just that:

<?php

$input = file_get_contents('input.txt');

define('STATE_CDATA', 0);
define('STATE_COMMENT', 1);

function parseBrace($input, &$i)
{
    $parsed = array(
        'cdata' => '',
        'children' => array()
    );
    $length = strlen($input);
    $state = STATE_CDATA;
    for(++$i; $i < $length; ++$i) {
        switch($input[$i]) {
            case '/':
                // "//" starts a comment; the bounds check avoids reading past the end
                if ($i + 1 < $length && '/' === $input[$i + 1]) {
                    $state = STATE_COMMENT;
                    ++$i;
                } elseif (STATE_CDATA === $state) {
                    $parsed['cdata'] .= $input[$i];
                }
                break;
            case '{':
                if (STATE_CDATA === $state) {
                    $parsed['children'][] = parseBrace($input, $i);
                }
                break;
            case '}':
                if (STATE_CDATA === $state) {
                    break 2; // for
                }
                break;
            case "\n":
                if (STATE_CDATA === $state) {
                    $parsed['cdata'] .= $input[$i];
                }
                $state = STATE_CDATA;
                break;
            default:
                if (STATE_CDATA === $state) {
                    $parsed['cdata'] .= $input[$i];
                }
        }
    }
    return $parsed;
}

function parseInput($input)
{
    $parsed = array(
        'cdata' => '',
        'children' => array()
    );
    $state = STATE_CDATA;
    $length = strlen($input);
    for($i = 0; $i < $length; ++$i) {
        switch($input[$i]) {
            case '/':
                // "//" starts a comment; the bounds check avoids reading past the end
                if ($i + 1 < $length && '/' === $input[$i + 1]) {
                    $state = STATE_COMMENT;
                    ++$i;
                } elseif (STATE_CDATA === $state) {
                    $parsed['cdata'] .= $input[$i];
                }
                break;
            case '{':
                if (STATE_CDATA === $state) {
                    $parsed['children'][] = parseBrace($input, $i);
                }
                break;
            case "\n":
                if (STATE_CDATA === $state) {
                    $parsed['cdata'] .= $input[$i];
                }
                $state = STATE_CDATA;
                break;
            default:
                if (STATE_CDATA === $state) {
                    $parsed['cdata'] .= $input[$i];
                }
        }
    }
    return $parsed;
}

print_r(parseInput($input));

This produces the following output:

Array
(
    [cdata] =>




    [children] => Array
    (
        [0] => Array
        (
            [cdata] =>
this is an entry

            [children] => Array
            (
            )

        )

        [1] => Array
        (
            [cdata] =>
this is another entry

            [children] => Array
            (
            )   

        )

        [2] => Array
        (
            [cdata] => 
entry



            [children] => Array
            (
                [0] => Array
                (
                    [cdata] => entry within entry
                    [children] => Array
                    (
                    )


                )

                [1] => Array
                (
                    [cdata] => entry within entry
                    [children] => Array
                    (
                    )

                )

            )

        )

    )

)

You will probably want to clean up all the whitespace, but a few well-placed trim() calls will sort that out for you.
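As a rough illustration of that cleanup, here is a minimal sketch. The helper name `trimTree` is hypothetical and not part of the code above; it only assumes the array shape that parseInput() returns ('cdata' string plus 'children' array), and the sample tree is built by hand.

```php
<?php
// Hypothetical helper (an assumption, not part of the answer's code):
// recursively trims the 'cdata' string of a node and of all its children.
function trimTree(array $node)
{
    $node['cdata'] = trim($node['cdata']);
    $node['children'] = array_map('trimTree', $node['children']);
    return $node;
}

// Hand-built sample in the same shape that parseInput() returns.
$tree = array(
    'cdata' => "\n\n",
    'children' => array(
        array(
            'cdata' => "\nentry\n\n\n",
            'children' => array(
                array('cdata' => 'entry within entry', 'children' => array()),
            ),
        ),
    ),
);

$clean = trimTree($tree);
print_r($clean);
```

Note that trim() only strips whitespace at the two ends of each 'cdata' string, so blank lines in the middle of an entry are left alone.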


0
votes

This may not be the best solution for large amounts of content, but it works.

<?php
$text = "I am out of the brackets {hi i am in the brackets} Back out { Back in}";
print $text . '<hr />';

// Split on "{": every piece after the first begins right after an opening brace.
$pieces = explode("{", $text);
$wantedText = array();
foreach ($pieces as $piece) {
    // Keep only pieces that contain a closing brace, cut off at that brace.
    if (strpos($piece, "}") !== false) {
        $parts = explode("}", $piece);
        $wantedText[] = $parts[0];
    }
}
print_r($wantedText);
?>

Result:

Array ( [0] => hi i am in the brackets [1] => Back in )
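
The explode() approach above treats every "{" as a fresh start, so with nested braces such as the third entry in the question it would return only the innermost pieces. A minimal sketch of one alternative is to count brace depth and collect each top-level group whole. The function name `topLevelGroups` is hypothetical, and this sketch does not skip // comments.

```php
<?php
// Hypothetical helper (an assumption for illustration): returns the contents
// of every top-level {...} group, keeping nested braces inside the string.
function topLevelGroups($text)
{
    $groups = array();
    $depth = 0;
    $start = 0;
    $length = strlen($text);
    for ($i = 0; $i < $length; ++$i) {
        if ('{' === $text[$i]) {
            if (0 === $depth) {
                $start = $i + 1; // first character after the opening brace
            }
            ++$depth;
        } elseif ('}' === $text[$i] && $depth > 0) {
            --$depth;
            if (0 === $depth) {
                // The outermost brace just closed: store what it enclosed.
                $groups[] = substr($text, $start, $i - $start);
            }
        }
    }
    return $groups;
}

$groups = topLevelGroups("out {a {b} c} out {d}");
print_r($groups);
```

For this input the two groups are "a {b} c" and "d". Calling topLevelGroups() again on a returned group would give the next level down.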