I want to parse strings like
"blub blib blab"
"\n \n blub \t \t \n blib \t \n blab \n \t \n"
" blub blib blab "
and extract "blub", "blib" and "blab" into the members a, b and c of a struct (defined in the code below).
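Thought: I think *qi::space and +qi::space are becoming part of my result set, and the "wrongly" filled members contain the whitespace that was matched.
I'm trying to parse a scripting language with Spirit Qi, and this is my first step after years of not using Spirit. I know I could parse this easily with regular expressions, but that is not my intention.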
#include <cassert>
#include <string>
#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/include/adapt_struct.hpp>
namespace qi = boost::spirit::qi;
struct Out
{
    Out() = default;
    Out( const std::string& a_, const std::string& b_, const std::string& c_ ) :
        a( a_ ), b( b_ ), c( c_ )
    {
    }
    std::string a;
    std::string b;
    std::string c;
};
BOOST_FUSION_ADAPT_STRUCT( Out, a, b, c )
int main()
{
    qi::rule<char const*, std::string()> identifier_rule =
        qi::char_( "a-zA-Z_" ) >> *qi::char_( "a-zA-Z0-9_" );
    boost::spirit::qi::rule<char const*, Out()> abc_rule =
        *qi::space >> identifier_rule >> +qi::space >> identifier_rule >> +qi::space >> identifier_rule >> *qi::space;
    std::string test = "blub blib blab";
    //std::string test = "\n \n blub \t \t \n blib \t \n blab \n \t \n";
    //std::string test = " blub blib blab ";
    Out o;
    char const* f( test.c_str() );
    char const* l( f + test.size() );
    // Parse outside of assert() so the parse still runs when NDEBUG is defined
    bool ok = qi::parse( f, l, abc_rule, o );
    assert( ok );
    assert( o.a == "blub" );
    assert( o.b == "blib" );
    assert( o.c == "blab" );
    return 0;
}
I made a self-contained test bed for all cases:
#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/include/adapt_struct.hpp>
#include <iomanip>
#include <iostream>
#include <string>
#include <string_view>
namespace qi = boost::spirit::qi;
struct Out {
    std::string a, b, c;
};
BOOST_FUSION_ADAPT_STRUCT(Out, a, b, c)
int main() {
    using It = std::string_view::const_iterator;
    qi::rule<It, std::string()> identifier_rule = qi::char_("a-zA-Z_") >> *qi::char_("a-zA-Z0-9_");
    boost::spirit::qi::rule<It, Out()> abc_rule //
        = *qi::space >> identifier_rule         //
        >> +qi::space >> identifier_rule        //
        >> +qi::space >> identifier_rule        //
        >> *qi::space;
    for (std::string_view test : {
             "blub blib blab",
             "\n \n blub \t \t \n blib \t \n blab \n \t \n",
             " blub blib blab ",
         }) {
        It f = test.begin(), l = test.end();
        if (Out o; qi::parse(f, l, abc_rule, o))
        {
            std::cout << "Parsed:\n"
                      << "A: " << quoted(o.a) << "\n"
                      << "B: " << quoted(o.b) << "\n"
                      << "C: " << quoted(o.c) << "\n";
        } else
            std::cout << "Failed to parse " << quoted(test) << std::endl;
    }
}
The output is exactly what I expected:
Parsed:
A: ""
B: "blub"
C: " "
Parsed:
A: "
"
B: "blub"
C: "
"
Parsed:
A: " "
B: "blub"
C: " "
What you probably expected is that the qi::space matches are left out of the attribute. You have to tell Spirit that explicitly:
boost::spirit::qi::rule<It, Out()> abc_rule //
= qi::omit[*qi::space] >> identifier_rule //
>> qi::omit[+qi::space] >> identifier_rule //
>> qi::omit[+qi::space] >> identifier_rule //
>> qi::omit[*qi::space];
Prints
Parsed:
A: "blub"
B: "blib"
C: "blab"
Parsed:
A: "blub"
B: "blib"
C: "blab"
Parsed:
A: "blub"
B: "blib"
C: "blab"
The idiomatic way is to use a skipper¹ instead. Then it becomes simple:
boost::spirit::qi::rule<It, Out()> abc_rule =
    qi::skip(qi::space)[identifier_rule >> identifier_rule >> identifier_rule];
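This still prints the same output. Because identifier_rule is declared without a skipper, it acts as an implicit lexeme: whitespace is skipped between the identifiers but never inside one.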
¹ For background, see: Boost spirit skipper issues