How can I optimize big-data processing in Laravel?

Question · Votes: 0 · Answers: 3

My task is: "Take the transactions table, group the rows by transaction date, and count the statuses. This produces statistics that will be rendered on a page."

Here is my statistics-generation method:

public static function getStatistics(Website $website = null)
{
    if($website == null) return [];

    $query = \DB::table('transactions')->where("website_id", $website->id)->orderBy("dt", "desc")->get();

    $transitions = collect(static::convertDate($query))->groupBy("dt");
    $statistics = collect();

    // dd($transitions); // debug leftover — dd() halts execution here if left in

    foreach ($transitions as $date => $trans) {
        $subscriptions = $trans->where("status", 'subscribe')->count();
        $unsubscriptions = $trans->where("status", 'unsubscribe')->count();
        $prolongations = $trans->where("status", 'rebilling')->count();
        $redirections = $trans->where("status", 'redirect_to_lp')->count();
        $conversion = $redirections == 0 ? 0 : ((float) ($subscriptions / $redirections));
        $earnings = $trans->sum("pay");

        $statistics->push((object)[
            "date" => $date,
            "subscriptions" => $subscriptions,
            'unsubscriptions' => $unsubscriptions,
            'prolongations' => $prolongations,
            'redirections' => $redirections,
            'conversion' => round($conversion, 2),
            'earnings' => $earnings,
        ]);

    }

    return $statistics;
}

If the transactions table has fewer than 100,000 rows, everything works fine. But once the count reaches 150–200k, nginx throws a 502 Bad Gateway. What would you advise? I have no experience with big-data processing. Is there something fundamentally wrong with my approach?

php laravel laravel-5 bigdata
3 Answers

4 votes

Big data is never easy, but my advice is to use Laravel's chunk instead of get.

https://laravel.com/docs/5.1/eloquent (ctrl+f "::chunk")

What ::chunk does is select n rows at a time and let you process them piece by piece. This is convenient because it allows you to stream updates to the browser, but in the ~150k-result range I would recommend looking into pushing this work into a background process instead of handling it per request.
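As a rough sketch of that chunk approach (assuming the same transactions table from the question; the per-row tallying here just mirrors the question's counting loop):

```php
<?php
// Sketch only: assumes Laravel's query builder and the question's schema.
// chunk() issues one query per batch, so only $size rows are in memory at a time.
$size = 1000;
$counts = [];

\DB::table('transactions')
    ->where('website_id', $website->id)
    ->orderBy('dt')                 // chunk() requires a stable ordering
    ->chunk($size, function ($rows) use (&$counts) {
        foreach ($rows as $row) {
            $day = substr($row->dt, 0, 10);   // group key: YYYY-MM-DD
            $counts[$day][$row->status] = ($counts[$day][$row->status] ?? 0) + 1;
        }
    });
```

For 150k+ rows you would still want to dispatch this from a queued job rather than run it inside the request, as suggested above.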


4 votes

After several days of researching this problem, I found the right answer:

Don't process the raw data in PHP. Use SQL instead!

In my case we are on PostgreSQL.

Below is the SQL query that worked for me; perhaps it will help someone else.

WITH
        cte_range(dt) AS
        (
            SELECT
                generate_series('2016-04-01 00:00:00'::timestamp with time zone, '{$date} 00:00:00'::timestamp with time zone, INTERVAL '1 day')
        ),

        cte_data AS
        (
            SELECT
                date_trunc('day', dt) AS dt,
                COUNT(*) FILTER (WHERE status = 'subscribe') AS count_subscribes,
                COUNT(*) FILTER (WHERE status = 'unsubscribe') AS count_unsubscribes,
                COUNT(*) FILTER (WHERE status = 'rebilling') AS count_rebillings,
                COUNT(*) FILTER (WHERE status = 'redirect_to_lp') AS count_redirects_to_lp,
                SUM(pay) AS earnings,
                CASE
                    WHEN COUNT(*) FILTER (WHERE status = 'redirect_to_lp') > 0 THEN 100.0 * COUNT(*) FILTER (WHERE status = 'subscribe')::float / COUNT(*) FILTER (WHERE status = 'redirect_to_lp')::float
                    ELSE 0
                END
                AS conversion_percent

            FROM
                transactions

            WHERE
                website_id = {$website->id}

            GROUP BY
                date_trunc('day', dt)
        )

        SELECT
            to_char(cte_range.dt, 'YYYY-MM-DD') AS day,
            COALESCE(cte_data.count_subscribes, 0) AS count_subscribe,
            COALESCE(cte_data.count_unsubscribes, 0) AS count_unsubscribes,
            COALESCE(cte_data.count_rebillings, 0) AS count_rebillings,
            COALESCE(cte_data.count_redirects_to_lp, 0) AS count_redirects_to_lp,
            COALESCE(cte_data.conversion_percent, 0) AS conversion_percent,
            COALESCE(cte_data.earnings, 0) AS earnings

        FROM
            cte_range

        LEFT JOIN
            cte_data
            ON cte_data.dt = cte_range.dt

        ORDER BY
            cte_range.dt DESC
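One caveat with the query above: it interpolates $website->id and $date directly into the SQL string, which is injection-prone. A safer sketch (assuming the query text is held in a $sql variable with the two interpolated values replaced by ? placeholders) passes them as bindings:

```php
<?php
// Sketch: same CTE query, but with bound parameters instead of string interpolation.
// $sql is assumed to contain the query above with ? in place of the two values.
$rows = \DB::select($sql, [$date . ' 00:00:00', $website->id]);
// $rows is an array of stdClass objects, one per day, ready to render.
```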

0 votes

The best way to process large amounts of database data/arrays with Laravel is to use LazyCollection (docs) together with chunks. This is also discussed here.

<?php
$chunkSize = 500;

// lazy($chunkSize) fetches rows from the database in batches of $chunkSize,
// but yields the models one at a time, so memory use stays flat.
MyModel::lazy($chunkSize)->each(function (MyModel $item) {
    // $item handling
});

// For a plain array, wrap it and chunk explicitly:
// collect($bigDataArray)->lazy()->chunk($chunkSize)
//     ->each(function (LazyCollection $items) { /* handle one batch */ });
© www.soinside.com 2019 - 2024. All rights reserved.