My task is: "Take the transactions table, group the rows by transaction date, and count the statuses. This produces statistics that will be rendered on a page."

Here is my statistics-generation method:
public static function getStatistics(Website $website = null)
{
    if ($website == null) {
        return [];
    }

    $query = \DB::table('transactions')
        ->where('website_id', $website->id)
        ->orderBy('dt', 'desc')
        ->get();

    $transitions = collect(static::convertDate($query))->groupBy('dt');
    $statistics = collect();

    foreach ($transitions as $date => $trans) {
        $subscriptions   = $trans->where('status', 'subscribe')->count();
        $unsubscriptions = $trans->where('status', 'unsubscribe')->count();
        $prolongations   = $trans->where('status', 'rebilling')->count();
        $redirections    = $trans->where('status', 'redirect_to_lp')->count();
        $conversion      = $redirections == 0 ? 0 : (float) ($subscriptions / $redirections);
        $earnings        = $trans->sum('pay');

        $statistics->push((object) [
            'date'            => $date,
            'subscriptions'   => $subscriptions,
            'unsubscriptions' => $unsubscriptions,
            'prolongations'   => $prolongations,
            'redirections'    => $redirections,
            'conversion'      => round($conversion, 2),
            'earnings'        => $earnings,
        ]);
    }

    return $statistics;
}
If the transactions table has fewer than 100,000 rows, everything works fine. But once the count climbs to 150-200k, nginx throws a 502 Bad Gateway. What would you advise? I have no experience with processing large data sets. Is my approach fundamentally wrong?
Big data is never easy, but I would suggest using Laravel's chunk instead of get. See https://laravel.com/docs/5.1/eloquent (ctrl+f "::chunk").

What ::chunk does is select n rows at a time and let you process them piece by piece. That is handy because it lets you stream updates to the browser, but at ~150k results I would recommend looking into how to push this work to a background process rather than doing it per request.
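As a hedged sketch of what that could look like applied to the method in the question (assuming the same `transactions` table and columns; `chunkById` and the per-day grouping on `dt` are my assumptions, not the asker's code), the aggregation can be accumulated chunk by chunk instead of loading every row into memory at once:

```php
<?php
// Sketch only: page through the table by primary key so memory stays flat,
// and accumulate the per-day counters as each chunk arrives.
$statistics = [];

\DB::table('transactions')
    ->where('website_id', $website->id)
    ->orderBy('id')
    ->chunkById(1000, function ($rows) use (&$statistics) {
        foreach ($rows as $row) {
            $day = substr($row->dt, 0, 10); // assumes dt is 'YYYY-MM-DD HH:MM:SS'

            $statistics[$day][$row->status] = ($statistics[$day][$row->status] ?? 0) + 1;
            $statistics[$day]['earnings']   = ($statistics[$day]['earnings'] ?? 0) + $row->pay;
        }
    });
```

Note that `chunk`/`chunkById` require a stable ordering, which is why the query orders by `id` here rather than `dt desc`; the per-day totals can be sorted after aggregation.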
After a few days of researching this problem, I found the right answer:

Do not process the raw data in PHP. Use SQL instead!

In my case, we are using PostgreSQL. Below is the SQL query that worked for me; perhaps it will help someone else.
WITH
cte_range(dt) AS
(
SELECT
generate_series('2016-04-01 00:00:00'::timestamp with time zone, '{$date} 00:00:00'::timestamp with time zone, INTERVAL '1 day')
),
cte_data AS
(
SELECT
date_trunc('day', dt) AS dt,
COUNT(*) FILTER (WHERE status = 'subscribe') AS count_subscribes,
COUNT(*) FILTER (WHERE status = 'unsubscribe') AS count_unsubscribes,
COUNT(*) FILTER (WHERE status = 'rebilling') AS count_rebillings,
COUNT(*) FILTER (WHERE status = 'redirect_to_lp') AS count_redirects_to_lp,
SUM(pay) AS earnings,
CASE
WHEN COUNT(*) FILTER (WHERE status = 'redirect_to_lp') > 0 THEN 100.0 * COUNT(*) FILTER (WHERE status = 'subscribe')::float / COUNT(*) FILTER (WHERE status = 'redirect_to_lp')::float
ELSE 0
END
AS conversion_percent
FROM
transactions
WHERE
website_id = {$website->id}
GROUP BY
date_trunc('day', dt)
)
SELECT
to_char(cte_range.dt, 'YYYY-MM-DD') AS day,
COALESCE(cte_data.count_subscribes, 0) AS count_subscribes,
COALESCE(cte_data.count_unsubscribes, 0) AS count_unsubscribes,
COALESCE(cte_data.count_rebillings, 0) AS count_rebillings,
COALESCE(cte_data.count_redirects_to_lp, 0) AS count_redirects_to_lp,
COALESCE(cte_data.conversion_percent, 0) AS conversion_percent,
COALESCE(cte_data.earnings, 0) AS earnings
FROM
cte_range
LEFT JOIN
cte_data
ON cte_data.dt = cte_range.dt
ORDER BY
cte_range.dt DESC
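One caution about the query above: interpolating `{$website->id}` and `{$date}` directly into the SQL string is open to SQL injection. In Laravel the raw query can be run with bound parameters instead. A minimal sketch, assuming the query text is stored in `$sql` with `?` placeholders substituted for the two interpolated values:

```php
<?php
// Sketch: bind the date bound and website id instead of interpolating them.
// $sql is assumed to hold the query above with '?' in place of
// '{$date} 00:00:00' and {$website->id}.
$rows = \DB::select($sql, [$date . ' 00:00:00', $website->id]);
```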
The best way to process large amounts of database data/arrays with Laravel is to use LazyCollection (docs) together with chunks. It is also discussed here:
<?php

use Illuminate\Support\LazyCollection;

$chunkSize = 500;

// Model::lazy() queries the table in chunks under the hood but yields the
// models one at a time, so the each() callback receives a single model,
// not a LazyCollection of items:
MyModel::lazy($chunkSize)->each(function (MyModel $item) {
    // $item handling
});

// or, for an in-memory array, process it lazily in chunks:
// collect($bigDataArray)->lazy()->chunk($chunkSize)->each(function (LazyCollection $items) { ... });