C++ 函数返回极其缓慢，远慢于功能等效的 python 代码

Question

我有一个在我编写的脚本中使用的函数，用于从列表中删除多余的阻塞关键字。基本上，输入（以任何顺序）：

{"apple", "bapple", "banana", "cherry", "bananaman", "sweetherrypie", "sweet", "b"}

它应该输出一个缩小的字符串数组（以任何顺序）：

{"apple", "cherry", "sweet", "b"}

这样做是为了减少匹配算法需要通过的关键字数量，作为个人学习项目（如果字符串包含“banana”，它也会包含“b”，所以香蕉是不必要的，可以丢弃

它需要具有极高的性能，因为它必须在超过 10,000,000 个字符串的列表上运行。

最初，我编写了一些 Python 代码来执行此操作：

def deduplicate_keywords(keywords):
    # Remove duplicates and sort the keywords by length in ascending order
    keywords = sorted(set(keywords), key=len)
    unique_keywords = []

    for keyword in keywords:
        not_redundant = True

        for unique_keyword in unique_keywords:
            # Check if the current keyword includes any of the unique keywords
            if unique_keyword in keyword:
                not_redundant = False
                break
        # If the keyword does not include any of the shorter keywords, add it to unique_keywords
        if not_redundant:
            unique_keywords.append(keyword)

    return unique_keywords

以及衡量其性能的性能基准：

class TestDeduplicateKeywords(unittest.TestCase):
    def test_performance_intensive_case(self):
        keywords = [
            "abcdef",
            "def",
            "ghidef",
            "jkl",
            "mnop",
            "qrst",
            "uvwxyz",
            "ghijkl",
            "abcdefgh",
            "ijklmnop",
            "mnopqrstuv",
            "rstuvwx",
            "mnopqrstuvwx",
            "uvwx",
        ]

        # Extending the list to make it very long and complex
        base_keywords = keywords[:]
        for i in range(200000):
            base_keywords.extend([f"{k}{i}" for k in keywords])
            base_keywords.extend([f"{i}{k}" for k in keywords])
            base_keywords.extend([f"{i}{k}{i}" for k in keywords])

        keywords = base_keywords

        old_function_time = 0
        new_function_time = 0

        for repeat in range(1): 
            # This loop (not present in the cpp code) is used to loop it multiple times to find an average
            start_time = time.time()
            result = deduplicate_keywords(keywords)
            end_time = time.time()
            old_function_time = (
                repeat * old_function_time + (end_time - start_time)
            ) / (repeat + 1)
            start_time = time.time()
            result = deduplicate_keywords_new(keywords) 
            # Above is a function that can be edited to check whether changes speed up the code
            end_time = time.time()
            new_function_time = (
                repeat * new_function_time + (end_time - start_time)
            ) / (repeat + 1)

        # As the list is extended with numbers, the result will be unique non-redundant keywords
        expected_result = ["def", "jkl", "mnop", "qrst", "uvwx"]

        self.assertEqual(set(result), set(expected_result))
        print(
            f"Test performance_intensive_case for old function took {old_function_time} seconds\n"
            f"Test performance_intensive_case for new function took {new_function_time} seconds\n"
        )

        if old_function_time < new_function_time:
            print(f"Old is faster than new")
        else:
            print(f"New is faster than old")


if __name__ == "__main__":
    unittest.main()

这对于我在

12 seconds

的弱笔记本电脑上的需求来说太慢了。

因此，我尝试在 cpp 中编写相同的内容，看看是否能获得性能提升。它so慢得多，所以我向函数本身添加了一些计时代码，以查看速度减慢的地方：

#include <iostream>
#include <vector>
#include <string>
#include <set>
#include <algorithm>
#include <chrono>
#include <cassert>

std::vector<std::string> deduplicate_keywords(const std::vector<std::string>& keywords) {
    auto t_start = std::chrono::high_resolution_clock::now();

    // Remove duplicates using a set
    auto t1 = std::chrono::high_resolution_clock::now();
    std::cout << "started old function!" << std::endl;
    std::set<std::string> unique_set(keywords.begin(), keywords.end());
    auto t2 = std::chrono::high_resolution_clock::now();
    std::cout << "removed duplicates in " << std::chrono::duration<double>(t2 - t1).count() << " seconds" << std::endl;

    // Convert set back to vector and sort by length
    auto t3 = std::chrono::high_resolution_clock::now();
    std::vector<std::string> sorted_keywords(unique_set.begin(), unique_set.end());
    std::sort(sorted_keywords.begin(), sorted_keywords.end(), [](const std::string& a, const std::string& b) {
        return a.length() < b.length();
    });
    auto t4 = std::chrono::high_resolution_clock::now();
    std::cout << "Sorted by length in " << std::chrono::duration<double>(t4 - t3).count() << " seconds" << std::endl;

    // Filter out redundant keywords
    auto t5 = std::chrono::high_resolution_clock::now();
    std::vector<std::string> unique_keywords;
    for (const auto& keyword : sorted_keywords) {

        for (const auto& unique_keyword : unique_keywords) {
            // Check if the current keyword includes any of the unique keywords
            if (keyword.find(unique_keyword) != std::string::npos) {
                goto next_iteration;
            }
        }

        // If the keyword does not include any of the shorter keywords, add it to unique_keywords
        unique_keywords.push_back(keyword);
        next_iteration:
            continue;
    }
    auto t6 = std::chrono::high_resolution_clock::now();
    std::cout << "Filtered in " << std::chrono::duration<double>(t6 - t5).count() << " seconds" << std::endl;

    auto t7 = std::chrono::high_resolution_clock::now();
    sorted_keywords.clear();
    sorted_keywords.shrink_to_fit();
    unique_keywords.shrink_to_fit();
    auto t8 = std::chrono::high_resolution_clock::now();
    std::cout << "Destructed in " << std::chrono::duration<double>(t8 - t7).count() << " seconds" << std::endl;

    auto t9 = std::chrono::high_resolution_clock::now();
    auto total_time = std::chrono::duration<double>(t9 - t_start).count();
    std::cout << "Total function time: " << total_time << " seconds" << std::endl;

    return unique_keywords;
}

void test_performance_intensive_case() {
    std::vector<std::string> keywords = {
        "abcdef", "def", "ghidef", "jkl", "mnop", "qrst", "uvwxyz",
        "ghijkl", "abcdefgh", "ijklmnop", "mnopqrstuv", "rstuvwx",
        "mnopqrstuvwx", "uvwx"
    };

    // Extending the list to make it very long and complex
    std::vector<std::string> base_keywords = keywords;
    for (int i = 0; i < 20000; ++i) {
        for (const auto& k : keywords) {
            base_keywords.push_back(k + std::to_string(i));
            base_keywords.push_back(std::to_string(i) + k);
            base_keywords.push_back(std::to_string(i) + k + std::to_string(i));
        }
    }

    auto keywords_extended = base_keywords;
    std::set<std::string> expected_result = {"def", "jkl", "mnop", "qrst", "uvwx"};

    // there is no looping to calculate an average because it is so much slower
    auto start_time_total = std::chrono::high_resolution_clock::now();
    auto start_time = std::chrono::high_resolution_clock::now();
    std::vector<std::string> result_old = deduplicate_keywords(keywords_extended);
    auto end_time = std::chrono::high_resolution_clock::now();
    double old_function_time = std::chrono::duration<double>(end_time - start_time).count();
    auto end_time_total = std::chrono::high_resolution_clock::now();
    double total_test_time = std::chrono::duration<double>(end_time_total - start_time_total).count();
    assert(std::set<std::string>(result_old.begin(), result_old.end()) == expected_result);

    std::cout << "old function time: " << old_function_time << " seconds" << std::endl;
    std::cout << "Total test case time: " << total_test_time << " seconds" << std::endl;

    auto start_time_total_new = std::chrono::high_resolution_clock::now();
    auto start_time_new = std::chrono::high_resolution_clock::now();
    std::vector<std::string>result_new = deduplicate_keywords_new(keywords_extended); // same as in the python code, useful for comparison
    auto end_time_new = std::chrono::high_resolution_clock::now();
    double new_function_time = std::chrono::duration<double>(end_time - start_time).count();
    auto end_time_total_new = std::chrono::high_resolution_clock::now();
    double total_test_time_new = std::chrono::duration<double>(end_time_total_new - start_time_total_new).count();
    assert(std::set<std::string>(result_new.begin(), result_new.end()) == expected_result);


    std::cout << "new function time: " << new_function_time << " seconds" << std::endl;
    std::cout << "Total test case time: " << total_test_time_new << " seconds" << std::endl;

    std::cout << "Test performance_intensive_case for old function took " << old_function_time
              << " seconds but new function took " << new_function_time << " seconds"  << std::endl;

    if (old_function_time < new_function_time) {
        std::cout << "Old is faster than new" << std::endl;
    } else {
        std::cout << "New is faster than old" << std::endl;
    }
}

int main() {
    test_performance_intensive_case();
    return 0;
}

这会输出到终端 this （缩短，删除当前空的 deduplicate_keywords_new 函数的日志）：

started old function!
removed duplicates in 9.64173 seconds
Sorted by length in 3.36138 seconds
Filtered in 0.366933 seconds
Destructed in 7.63208 seconds
Total function time: 21.0037 seconds
old function time: 405.213 seconds
Total test case time: 405.213 seconds

除了总体速度降低之外，日志似乎表明返回函数（

return unique keywords

）似乎需要非常长的时间。为什么是这样？我能做些什么来阻止这种情况呢？我对 cpp 编码和开发相当陌生，因此如果您的解释详细而透彻，那就太好了。谢谢！

Answer 1

恕我直言，这个解决方案更快：

struct sort {
    bool operator () (const std::string& s1, const std::string& s2) const {
        size_t len1 = s1.length();
        size_t len2 = s2.length();
        return len1 == len2 ? s1 < s2 : len1 < len2;
    }
};

struct unique {
    bool operator () (const std::string& s1, const std::string& s2) const {
        return s1.find(s2) == std::string::npos &&
               s2.find(s1) == std::string::npos &&
               s1 < s2;
    }
};

std::vector<std::string> deduplicate_keywords (const std::vector<std::string>& keywords)
{
    std::set<std::string, sort> sorted_keywords (keywords.cbegin(), keywords.cend());
    std::set<std::string, unique> unique_keywords (sorted_keywords.cbegin(), sorted_keywords.cend());
    return std::vector<std::string>(unique_keywords.cbegin(), unique_keywords.cend());
}

比较运算符由std::set的优化插入算法使用。

C++ 函数返回极其缓慢，远慢于功能等效的 python 代码

问题描述投票：0回答：1

1个回答

最新问题

C++ 函数返回极其缓慢，远慢于功能等效的 python 代码

问题描述 投票：0回答：1

1个回答

最新问题

问题描述投票：0回答：1