I have a memory leak (I believe that's what this is called) that I have been unable to resolve despite countless attempts over the last 24 hours.
Background: I have an articles table (~37M rows) with article_id and journal_id columns, and a journal_specialty table (~29K rows) with journal_id and specialty_id columns. I want to populate an article_specialty table, with article_id and specialty_id columns, using data from the other two tables: each article should get the specialties of the journal it was published in. My code:
namespace App\Console\Commands\Articles;

use Illuminate\Console\Command;
use Illuminate\Support\Facades\DB;

class xa02_ArticleSpecialty extends Command
{
    protected $signature = 'articles:xa02-article-specialty';
    protected $description = 'Populate article_specialty table using journal_specialty relationships';

    private const BATCH_SIZE = 50000;
    private const INSERT_CHUNK_SIZE = 25000;

    public function handle()
    {
        $this->info('Starting article_specialty population...');
        $startTime = microtime(true);

        $journalSpecialties = DB::table('journal_specialty')
            ->select('journal_id', 'specialty_id')
            ->get()
            ->groupBy('journal_id')
            ->map(fn($group) => $group->pluck('specialty_id')->toArray())
            ->toArray();

        $totalRecords = DB::select("SHOW TABLE STATUS LIKE 'articles'")[0]->Rows;
        $this->info("Total articles to process: $totalRecords");

        $bar = $this->output->createProgressBar($totalRecords);
        $bar->start();

        $offset = 0;

        // Debugging: Show initial memory usage
        $this->info('Initial memory usage: ' . memory_get_usage() . ' bytes');

        while ($offset < $totalRecords) {
            $articles = DB::table('articles')
                ->orderBy('date_e')
                ->limit(self::BATCH_SIZE)
                ->offset($offset)
                ->get()
                ->toArray();

            // Debugging: Show memory usage after fetching articles
            $this->info('Memory usage after fetching articles: ' . memory_get_usage() . ' bytes');

            if (empty($articles)) {
                break; // ✅ No more data to process
            }

            $insertData = [];
            foreach ($articles as $article) {
                if (isset($journalSpecialties[$article->journal_id])) {
                    foreach ($journalSpecialties[$article->journal_id] as $specialty_id) {
                        $insertData[] = [
                            'article_id' => $article->article_id,
                            'specialty_id' => $specialty_id
                        ];
                    }
                }

                if (count($insertData) >= self::INSERT_CHUNK_SIZE) {
                    DB::table('article_specialty')->insertOrIgnore($insertData);
                    $insertData = []; // ✅ Free memory immediately
                }
            }

            // Debugging: Show memory usage after insert
            $this->info('Memory usage after inserting: ' . memory_get_usage() . ' bytes');

            if (!empty($insertData)) {
                DB::table('article_specialty')->insertOrIgnore($insertData);
            }

            $bar->advance(count($articles));

            // Trying to free memory after processing
            $articles = null;
            $insertData = null;
            gc_collect_cycles();
            clearstatcache();
            DB::table('articles')->newQuery();
            DB::flushQueryLog();

            $offset += self::BATCH_SIZE;

            // Debugging: Show memory usage after processing batch
            $this->newLine();
            $this->info('Memory usage after processing batch: ' . memory_get_usage() . ' bytes');
        }

        $bar->finish();
        $this->newLine();

        $totalTime = microtime(true) - $startTime;
        $this->info('✅ Completed! Processed articles in ' . gmdate("H:i:s", $totalTime) . '.');

        // Debugging: Show final memory usage
        $this->info('Final memory usage: ' . memory_get_usage() . ' bytes');
    }
}
outcome:
Starting article_specialty population...
Total articles to process: 37765760
0/37765760 [>---------------------------] 0%
Initial memory usage: 35593704 bytes
Memory usage after fetching articles: 141872128 bytes
Memory usage after inserting: 147389216 bytes
100000/37765760 [>---------------------------] 0%
Memory usage after processing batch: 41217656 bytes
Memory usage after fetching articles: 145440808 bytes
Memory usage after inserting: 155017720 bytes
200000/37765760 [>---------------------------] 0%
Memory usage after processing batch: 46857472 bytes
Memory usage after fetching articles: 151319400 bytes
Memory usage after inserting: 161351936 bytes
300000/37765760 [>---------------------------] 0%
Memory usage after processing batch: 52506008 bytes
Memory usage after fetching articles: 156635200 bytes
Memory usage after inserting: 166937624 bytes
...
Memory keeps accumulating until the command dies with an out-of-memory error:
PHP Fatal error: Allowed memory size of 536870912 bytes exhausted (tried to allocate 1052672 bytes)
What am I missing?
That is 29K records loaded into memory all at once; you need to split the work into smaller parts.

$journalSpecialties = DB::table('journal_specialty')
    ->select('journal_id', 'specialty_id')
    ->get()

Instead, query only the journal_specialty records relevant to the current chunk of articles, so that all filtering is handled in SQL (reducing the result size) before get() is called:
DB::table('articles')->orderBy('date_e')->chunk(self::BATCH_SIZE, function ($articles) {
    $insertData = [];

    foreach ($articles as $article) {
        $specialties = DB::table('journal_specialty')
            ->where('journal_id', $article->journal_id)
            ->pluck('specialty_id')
            ->toArray();

        foreach ($specialties as $specialty_id) {
            $insertData[] = [
                'article_id' => $article->article_id,
                'specialty_id' => $specialty_id
            ];
        }

        if (count($insertData) >= self::INSERT_CHUNK_SIZE) {
            DB::table('article_specialty')->insertOrIgnore($insertData);
            $insertData = [];
        }
    }

    if (!empty($insertData)) {
        DB::table('article_specialty')->insertOrIgnore($insertData);
    }

    gc_collect_cycles();
});
Since you only need specialty_id, you can use pluck() to retrieve just that one column, reducing the amount of data held in memory. This does, of course, result in more queries, but your memory won't fill up.
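Since the goal here is simply to materialize a join, it may also be worth considering pushing the whole operation into the database with a single set-based statement, so that no rows ever pass through PHP at all. A minimal sketch, assuming MySQL and a unique key on (article_id, specialty_id) in article_specialty (so INSERT IGNORE skips duplicates the way insertOrIgnore does):

```sql
-- Let the database perform the join and the duplicate handling itself;
-- PHP memory stays flat because no result set is ever fetched.
INSERT IGNORE INTO article_specialty (article_id, specialty_id)
SELECT a.article_id, js.specialty_id
FROM articles a
INNER JOIN journal_specialty js ON js.journal_id = a.journal_id;
```

On ~37M articles this runs as one long statement, so on a busy server you may still want to split it into ranges of article_id, but each range is then a single round trip instead of thousands of queries.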