I am trying to get this method working in a string filter:
public function truncate($string, $chars = 50, $terminator = ' …');
I want this
$in = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWYXZ1234567890";
$out = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUV …";
and this
$in = "âãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿĀāĂ㥹ĆćĈĉĊċČčĎďĐđĒēĔĕĖėĘęĚěĜĝ";
$out = "âãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿĀāĂ㥹ĆćĈĉĊċČčĎďĐđ …";
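to be truncated to $chars minus the number of characters in the $terminator string.
In addition, the filter should cut at the first word boundary below the $chars limit, e.g.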
$in = "Answer to the Ultimate Question of Life, the Universe, and Everything.";
$out = "Answer to the Ultimate Question of Life, the …";
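I am pretty sure this should work in a few simple steps.
However, I have now tried various combinations of str* and mb_* functions, and all of them produced wrong results. This cannot be that difficult, so I am obviously missing something. Would someone share a working implementation for this, or point me to a resource where I can finally understand how to do it?
P.S. Yes, I did check https://stackoverflow.com/search?q=truncate+string+php before :)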
function truncate($string, $chars = 50, $terminator = ' …') {
$cutPos = $chars - mb_strlen($terminator);
$boundaryPos = mb_strrpos(mb_substr($string, 0, mb_strpos($string, ' ', $cutPos)), ' ');
return mb_substr($string, 0, $boundaryPos === false ? $cutPos : $boundaryPos) . $terminator;
}
But you need to make sure that the internal encoding is set correctly.
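The note about the internal encoding matters because the mb_* functions fall back to that setting whenever no encoding argument is passed. A minimal sketch of the difference, assuming the input is UTF-8 and the source file is saved as UTF-8 (the sample string and variable names are only illustrative):

```php
<?php
// Assumption: all input strings are UTF-8 encoded.
mb_internal_encoding('UTF-8');

$sample = 'âãäåæ';

// strlen() counts bytes; each of these characters takes two bytes in UTF-8.
$byteLength = strlen($sample);              // 10
// mb_strlen() counts characters using the internal encoding set above.
$charLength = mb_strlen($sample);           // 5
// The encoding can also be passed explicitly, independent of the global setting.
$explicit   = mb_strlen($sample, 'UTF-8');  // 5

// Cutting by characters never splits a multibyte character in half.
$firstThree = mb_substr($sample, 0, 3);     // 'âãä'
```

Passing the encoding explicitly on each call is an alternative when changing the global setting is not desirable.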
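mb_strimwidth: Get truncated string with specified width
function truncate( $string, $chars = 50, $terminate = ' ...' )
{
    $chars -= mb_strlen($terminate);
    if ( $chars <= 0 )
        return $terminate;
    $string = mb_substr($string, 0, $chars);
    $space = mb_strrpos($string, ' ');
    if ($space < mb_strlen($string) / 2)
        return $string . $terminate;
    else
        return mb_substr($string, 0, $space) . $terminate;
}
I have not tried running this, but it should work, or at least get you 90% of the way there.
First, to clarify the quality of the input values: Gordon said the function must be multibyte-safe and respect word boundaries. The sample data does not reveal how non-space, non-word characters (such as punctuation) should be treated when determining the truncation position, so we must assume that targeting whitespace characters is sufficient. That is sensible, because most "read more" strings do not worry about respecting punctuation when truncating.
Second, it is fairly common to need an ellipsis on large bodies of text that contain newline characters.
Third, let's arbitrarily agree on some basic data normalization, such as: the value of $chars will always be greater than the mb_strlen() of $terminator.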
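The second point can be seen in a small sketch: a search for a literal space never notices a newline, while a whitespace-aware pattern does. This is only an illustration under the assumption of UTF-8 input; the variable names are not from the answers.

```php
<?php
mb_internal_encoding('UTF-8'); // assumption: UTF-8 input

$text = "A single line of text to be followed by another\nline of text";

// Looking for a literal space in the first 48 characters skips the newline
// at offset 47 and lands on the space before "another".
$lastSpace = mb_strrpos(mb_substr($text, 0, 48), ' ');   // 39

// \s matches any whitespace character, including "\n". The s modifier lets
// the dot cross newlines and the u modifier keeps the match multibyte-safe.
preg_match('~^.{0,47}(?=\s)~us', $text, $match);
$lastWhitespace = mb_strlen($match[0]);                   // 47
```

Cutting at offset 39 drops the word "another", while cutting at offset 47 keeps it.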
Functions:
function truncateGumbo($string, $chars = 50, $terminator = ' …') {
    $cutPos = $chars - mb_strlen($terminator);
    $boundaryPos = mb_strrpos(mb_substr($string, 0, mb_strpos($string, ' ', $cutPos)), ' ');
    return mb_substr($string, 0, $boundaryPos === false ? $cutPos : $boundaryPos) . $terminator;
}
function truncateGordon($string, $chars = 50, $terminator = ' …') {
    return mb_strimwidth($string, 0, $chars, $terminator);
}
function truncateSoapBox($string, $chars = 50, $terminate = ' …') {
    $chars -= mb_strlen($terminate);
    if ( $chars <= 0 )
        return $terminate;
    $string = mb_substr($string, 0, $chars);
    $space = mb_strrpos($string, ' ');
    if ($space < mb_strlen($string) / 2)
        return $string . $terminate;
    else
        return mb_substr($string, 0, $space) . $terminate;
}
function truncateMickmackusa($string, $max = 50, $terminator = ' …') {
    $trunc = $max - mb_strlen($terminator, 'UTF-8');
    return preg_replace("~(?=.{{$max}})(?:\S{{$trunc}}|.{0,$trunc}(?=\s))\K.+~us", $terminator, $string);
}
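One thing to keep in mind about truncateGordon: mb_strimwidth works with display width, which is not always the same as the number of characters, because East Asian wide characters count as two columns. A small sketch of the difference, assuming UTF-8 input (the sample strings are illustrative only):

```php
<?php
mb_internal_encoding('UTF-8'); // assumption: UTF-8 input

// For plain ASCII text, width and character count coincide.
$ascii = mb_strimwidth('Hello World', 0, 10, '...');   // 'Hello W...'

// CJK ideographs are counted as two columns each by mb_strwidth().
$cjk        = '日本語';
$charCount  = mb_strlen($cjk);    // 3
$widthCount = mb_strwidth($cjk);  // 6

// With a width budget of 7, the result (marker included) stays within 7 columns,
// so fewer than 7 of the wide characters survive.
$trimmed = mb_strimwidth('日本語のテキスト', 0, 7, '...');
$fits    = mb_strwidth($trimmed) <= 7;
```

For the Latin test strings used here the two measures agree, so this only becomes relevant for text containing wide characters.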
Test cases:
$tests = [
    [
        'testCase' => "Answer to the Ultimate Question of Life, the Universe, and Everything.",
        // 50th char ---------------------------------------------------^
        'expected' => "Answer to the Ultimate Question of Life, the …",
    ],
    [
        'testCase' => "A single line of text to be followed by another\nline of text",
        // 50th char ----------------------------------------------------^
        'expected' => "A single line of text to be followed by another …",
    ],
    [
        'testCase' => "âãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿĀāĂ㥹ĆćĈĉĊċČčĎďĐđĒēĔĕĖėĘęĚěĜĝ",
        // 50th char ---------------------------------------------------^
        'expected' => "âãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿĀāĂ㥹ĆćĈĉĊċČčĎďĐđ …",
    ],
    [
        'testCase' => "123456789 123456789 123456789 123456789 123456789",
        // 50th char doesn't exist -------------------------------------^
        'expected' => "1234567890123456789012345678901234567890123456789",
    ],
    [
        'testCase' => "Hello worldly world",
        // 50th char doesn't exist -------------------------------------^
        'expected' => "Hello worldly world",
    ],
    [
        'testCase' => "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWYXZ1234567890",
        // 50th char ---------------------------------------------------^
        'expected' => "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUV …",
    ],
];
Execution:
foreach ($tests as ['testCase' => $testCase, 'expected' => $expected]) {
    echo "\tSample Input:\t\t$testCase\n";
    echo "\n\ttruncateGumbo:\t\t" , truncateGumbo($testCase);
    echo "\n\ttruncateGordon:\t\t" , truncateGordon($testCase);
    echo "\n\ttruncateSoapBox:\t" , truncateSoapBox($testCase);
    echo "\n\ttruncateMickmackusa:\t" , truncateMickmackusa($testCase);
    echo "\n\tExpected Result:\t{$expected}";
    echo "\n-----------------------------------------------------\n";
}
Output:
Sample Input:           Answer to the Ultimate Question of Life, the Universe, and Everything.
truncateGumbo:          Answer to the Ultimate Question of Life, the …
truncateGordon:         Answer to the Ultimate Question of Life, the Uni …
truncateSoapBox:        Answer to the Ultimate Question of Life, the …
truncateMickmackusa:    Answer to the Ultimate Question of Life, the …
Expected Result:        Answer to the Ultimate Question of Life, the …
-----------------------------------------------------
Sample Input:           A single line of text to be followed by another line of text
truncateGumbo:          A single line of text to be followed by …
truncateGordon:         A single line of text to be followed by another …
truncateSoapBox:        A single line of text to be followed by …
truncateMickmackusa:    A single line of text to be followed by another …
Expected Result:        A single line of text to be followed by another …
-----------------------------------------------------
Sample Input:           âãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿĀāĂ㥹ĆćĈĉĊċČčĎďĐđĒēĔĕĖėĘęĚěĜĝ
truncateGumbo:          âãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿĀāĂ㥹ĆćĈĉĊċČčĎďĐđ …
truncateGordon:         âãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿĀāĂ㥹ĆćĈĉĊċČčĎďĐđ …
truncateSoapBox:        âãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿĀāĂ㥹ĆćĈĉĊċČčĎďĐđ …
truncateMickmackusa:    âãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿĀāĂ㥹ĆćĈĉĊċČčĎďĐđ …
Expected Result:        âãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿĀāĂ㥹ĆćĈĉĊċČčĎďĐđ …
-----------------------------------------------------
Sample Input:           123456789 123456789 123456789 123456789 123456789
truncateGumbo:          123456789 123456789 123456789 123456789 12345678 …
truncateGordon:         123456789 123456789 123456789 123456789 123456789
truncateSoapBox:        123456789 123456789 123456789 123456789 …
truncateMickmackusa:    123456789 123456789 123456789 123456789 123456789
Expected Result:        1234567890123456789012345678901234567890123456789
-----------------------------------------------------
Sample Input:           Hello worldly world
truncateGumbo:
Warning: mb_strpos(): Offset not contained in string in /in/ibFH5 on line 4
Hello worldly world …
truncateGordon:         Hello worldly world
truncateSoapBox:        Hello worldly …
truncateMickmackusa:    Hello worldly world
Expected Result:        Hello worldly world
-----------------------------------------------------
Sample Input:           abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWYXZ1234567890
truncateGumbo:          abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUV …
truncateGordon:         abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUV …
truncateSoapBox:        abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUV …
truncateMickmackusa:    abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUV …
Expected Result:        abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUV …
-----------------------------------------------------
My pattern explanation: although it admittedly looks rather ugly, most of the garbled-looking pattern syntax is just a matter of inserting the numeric values as dynamic quantifiers.
I could also have written:
'~(?:\S{' . $trunc . '}|(?=.{' . $max . '}).{0,' . $trunc . '}(?=\s))\K.+~us'
For simplicity, I'll replace $trunc with 48 and $max with 50.
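With 48 and 50 filled in, the pattern used in truncateMickmackusa can be read piece by piece. The annotated sketch below is one reading of it, checked against two of the test cases above. The x modifier and the variable names are additions made here for readability and are not part of the original pattern.

```php
<?php
// Pattern from truncateMickmackusa with $trunc = 48 and $max = 50 filled in.
// The x modifier is added here only so the pattern can carry comments.
$pattern = '~
    (?=.{50})          # require at least 50 characters ahead of the match start
    (?:
        \S{48}         # either 48 characters with no whitespace at all
      |
        .{0,48}(?=\s)  # or as many as possible, up to 48, that end right before whitespace
    )
    \K                 # keep everything matched so far out of the replacement
    .+                 # the rest of the string is what gets replaced
~usx';

$long  = "Answer to the Ultimate Question of Life, the Universe, and Everything.";
$short = "Hello worldly world";

$longResult  = preg_replace($pattern, ' …', $long);
$shortResult = preg_replace($pattern, ' …', $short);

echo $longResult, "\n";   // Answer to the Ultimate Question of Life, the …
echo $shortResult, "\n";  // Hello worldly world
```

The short string is returned untouched because the lookahead for 50 characters fails at every position, so nothing is replaced.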