Counting matching letter pairs in lists of bigrams in Power Query M

Question (votes: 0, answers: 1)

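After the great success of my previous question about bigram decomposition in M, I have now run into a new problem.

I am trying to count the matching pairs between two lists of bigrams, and both approaches I have tried sometimes undercount; I cannot pin down why. For example, {"He","el","ll","lo"} and {"He","el","lo"} are correctly counted as 3 matches (well, 6, but I deliberately double the count inside the function), whereas {"He","el","ll","lo"} and {"Hi","il","lo"} are incorrectly counted as 0 matches instead of 1.

I apply List.Sort() to the inputs before calling either implementation, although the problem does not depend on the sorting.

Both of my functions are based on the counter in the Java implementation on the Dice similarity page at Wikibooks.

Excerpt here: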

int matches = 0, i = 0, j = 0;
    while (i < n && j < m)
    {
        if (sPairs[i] == tPairs[j])
        {
            matches += 2;
            i++;
            j++;
        }
        else if (sPairs[i] < tPairs[j])
            i++;
        else
            j++;
    }

My initial hack resulted in a recursive function:

(x as list, y as list, i as number, j as number, matches) as number => 
let 
    matcher = if x{i} = y{j} then matches + 2 else matches,
    ineq = if x{i} < y{j} then 1 else 0,
    Check = if i = List.Count(x) - 1 or j = List.Count(y) - 1 then matcher
            else if matcher > matches then @Counter1(x,y,i+1,j+1,matcher)
            else if ineq = 1 then @Counter1(x,y,i+1,j,matcher)
            else @Counter1(x,y,i,j+1,matcher)
in Check

To avoid recursion because of potential speed problems, I also had Copilot write me a function using List.Accumulate:

(x as list, y as list) as number => 
let 
    n = List.Count(x),
    m = List.Count(y),
    matches = List.Accumulate(
        {0..n-1},
        [i = 0, j = 0, matches = 0],
        (state, current) =>
            if state[i] < n and state[j] < m then
                if x{state[i]} = y{state[j]} then
                    [i = state[i] + 1, j = state[j] + 1, matches = state[matches] + 2]
                else if x{state[i]} < y{state[j]} then
                    [i = state[i] + 1, j = state[j], matches = state[matches]]
                else
                    [i = state[i], j = state[j] + 1, matches = state[matches]]
            else
                state
    )[matches]
in
    matches

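As far as I can tell, both functions give the same output, and the second one certainly feels faster, although that also means both of them undercount.

The only thing that comes to mind is that the Java code I adapted compares the letter pairs using their binary representation, and I am not sure how M compares the letter pairs in my functions.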
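
On the comparison question: as far as the M language specification describes it, `=` and `<` on text values compare character by character, ordinally and case-sensitively, so uppercase letters order before lowercase ones. A small check along these lines shows how the bigrams from the example compare; the step names are only illustrative and not part of the original functions:

```powerquery
let
    // Ordinal comparison: "H" has a lower code point than "e"
    UpperBeforeLower = "He" < "el",
    // Same case: ordinary alphabetical order applies
    SameCase = "il" < "lo",
    // Comparer.Ordinal returns -1, 0 or 1 for the same kind of comparison
    OrdinalResult = Comparer.Ordinal("He", "el"),
    // Default sort of the first example list
    Sorted = List.Sort({"lo", "el", "He", "ll"})
in
    [
        UpperBeforeLower = UpperBeforeLower,
        SameCase = SameCase,
        OrdinalResult = OrdinalResult,
        Sorted = Sorted
    ]
```

If `List.Sort` and `<` agree on the ordering, the comparison itself is probably not what causes the undercount.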

Any help would be greatly appreciated!

powerbi powerquery m
1 Answer
0 votes

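I think your coding skills are much better than mine, but I will give it a try anyway. Being lazy, I chose the simplest method I could think of: expand the second bigram list against every bigram in the first list and count the matches. I am sure it can be done more simply, but this is my simple way using Power Query: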

let
    // Load the original table; substitute other sources as needed
    Source = Table.FromRecords({
        [Column1 = "He", Column2 = "Hi"],
        [Column1 = "el", Column2 = "il"],
        [Column1 = "ll", Column2 = "lo"],
        [Column1 = "lo", Column2 = null]
    }),

    // Separate Column1 and Column2 into two tables
    Table1 = Table.SelectColumns(Source, {"Column1"}),
    Table2 = Table.SelectColumns(Source, {"Column2"}),

    // Remove null values from Table2, not strictly required
    Table2NonNull = Table.SelectRows(Table2, each [Column2] <> null),

    // Create a custom column in Table1 to add all rows of Table2 to each row of Table1
    CrossJoin = Table.AddColumn(Table1, "Column2", each Table2NonNull[Column2]),

    // Expand the new column to create the cross join effect
    ExpandedTable = Table.ExpandListColumn(CrossJoin, "Column2"),
    // Count matches
    Custom = Table.AddColumn(ExpandedTable, "Custom", each if [Column1]=[Column2] then 1 else 0),
    // Sum all ones in the column
    TotalMatchCount = List.Sum(Custom[Custom])
in
    TotalMatchCount
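
If the sorted-merge counter from the question is preferred, one possible sketch follows. It assumes both lists are already sorted with the default `List.Sort` ordering, and it lets `List.Accumulate` run for n + m steps rather than n. A merge advances only one index on a mismatch, so it can need more than n steps; tracing the second example by hand, the `{0..n-1}` version in the question appears to run out of steps before "lo" is ever compared with "lo". The name `CountMatches` is made up for illustration.

```powerquery
let
    // Hypothetical helper: counts matching pairs in two sorted lists,
    // adding 2 per match as in the question
    CountMatches = (x as list, y as list) as number =>
        let
            n = List.Count(x),
            m = List.Count(y),
            // Every step advances at least one index, so n + m steps always suffice
            Steps = List.Repeat({null}, n + m),
            Final = List.Accumulate(
                Steps,
                [i = 0, j = 0, matches = 0],
                (state, current) =>
                    // Once either list is exhausted, pass the state through unchanged
                    if state[i] >= n or state[j] >= m then
                        state
                    else if x{state[i]} = y{state[j]} then
                        [i = state[i] + 1, j = state[j] + 1, matches = state[matches] + 2]
                    else if x{state[i]} < y{state[j]} then
                        [i = state[i] + 1, j = state[j], matches = state[matches]]
                    else
                        [i = state[i], j = state[j] + 1, matches = state[matches]]
            )
        in
            Final[matches],

    Result = CountMatches(
        List.Sort({"He", "el", "ll", "lo"}),
        List.Sort({"Hi", "il", "lo"})
    )
in
    Result
```

Traced by hand, this should give 2 for the example, that is, one match doubled. Unlike the cross join above, a merge of this kind counts a repeated bigram only as many times as it occurs in both lists, which may or may not be what is wanted.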