How do I find bad data in CosmosDB using EF Core?


I am trying to pull every entry from a DbSet in CosmosDb, like this:

var list = context.MarketplaceTransactions.ToList();

This fails with:

info: 8/23/2024 12:44:05.527 CosmosEventId.ExecutingSqlQuery[30100] (Microsoft.EntityFrameworkCore.Database.Command)
      Executing SQL query for container 'StrangeCloudDbContext' in partition '(null)' [Parameters=[]]
      SELECT c
      FROM root c
      WHERE (c["Discriminator"] = "MarketplaceTransaction")
info: 8/23/2024 12:44:07.788 CosmosEventId.ExecutedReadNext[30102] (Microsoft.EntityFrameworkCore.Database.Command)
      Executed ReadNext (2088.0046 ms, 39.43 RU) ActivityId='cdfdac63-7605-4746-a90d-9669cf864d0e', Container='StrangeCloudDbContext', Partition='(null)', Parameters=[]
      SELECT c
      FROM root c
      WHERE (c["Discriminator"] = "MarketplaceTransaction")
fail: 8/23/2024 12:44:07.802 CoreEventId.QueryIterationFailed[10100] (Microsoft.EntityFrameworkCore.Query)
      An exception occurred while iterating over the results of a query for context type 'StrangeCloud.Api.Data.StrangeCloudDbContext'.
      System.InvalidOperationException: Nullable object must have a value.
         at lambda_method14(Closure, QueryContext, JObject)
         at Microsoft.EntityFrameworkCore.Cosmos.Query.Internal.CosmosShapedQueryCompilingExpressionVisitor.QueryingEnumerable`1.Enumerator.MoveNext()

(I have already turned on detailed errors and sensitive data logging.)

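Presumably one of the documents in the database has a null (or perhaps missing) value for a property that is mapped as non-nullable. But there are millions of documents and dozens of properties, and the error tells me neither which property nor which document is at fault. How can I find and fix the problem?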

entity-framework-core azure-cosmosdb
1 Answer

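Instead of ToList(), which tries to materialize every entity in one go, you can stream the results with AsAsyncEnumerable() (or AsEnumerable()) and enumerate them one at a time. That way you can catch the exception for each problematic entity: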

using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

async Task ProcessMarketplaceTransactions(StrangeCloudDbContext context)
{
    var problematicEntities = new List<(int index, Exception exception)>();
    var validEntities = new List<MarketplaceTransaction>();
    int totalProcessed = 0;

    try
    {
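        // Build the query without change tracking; nothing runs until it is enumerated
        var query = context.MarketplaceTransactions.AsNoTracking();

        // Use a small Take() to test whether the query executes at all.
        // This also materializes one entity, so it throws if that document is bad.
        await query.Take(1).ToListAsync();

        // If we get here, the query itself is valid. Now process the entities one by one.
        // Materialization happens inside MoveNextAsync(), so that call must sit inside
        // the try block; a try/catch in an "await foreach" body would never see the error.
        // Assumption: the provider's enumerator can continue after a failed
        // MoveNextAsync(). That is not a documented guarantee, so the loop stops
        // after 100 failures (an arbitrary cap) instead of risking an endless loop.
        await using var enumerator = query.AsAsyncEnumerable().GetAsyncEnumerator();
        while (problematicEntities.Count < 100)
        {
            try
            {
                if (!await enumerator.MoveNextAsync()) break;
                validEntities.Add(enumerator.Current);
            }
            catch (Exception ex)
            {
                problematicEntities.Add((totalProcessed, ex));
            }
            totalProcessed++;

            // Optional: Add a break condition if you want to limit processing
            // if (totalProcessed >= 1000000) break;
        }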
    }
    catch (Exception ex)
    {
        Console.WriteLine($"Query execution failed: {ex.Message}");
        return; // Exit if we can't even start the query
    }

    Console.WriteLine($"Total entities processed: {totalProcessed}");
    Console.WriteLine($"Valid entities: {validEntities.Count}");
    Console.WriteLine($"Problematic entities: {problematicEntities.Count}");

    foreach (var (index, exception) in problematicEntities)
    {
        Console.WriteLine($"Error at index {index}: {exception.Message}");
    }
}

// Usage
await ProcessMarketplaceTransactions(context);
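
  1. It first tries to execute the query with Take(1) to check that the query itself is valid. This catches any exception that occurs during query execution.
  2. If the query is valid, it uses AsAsyncEnumerable() to stream the results, which suits large datasets better than loading everything at once.
  3. It then materializes each entity individually, catching the exception for any problematic one.
  4. The outer try-catch handles any exception raised while executing the query, while the inner try-catch handles exceptions raised while materializing an entity.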
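
The index of a failing entity still does not say which property is null. One way to narrow that down, if you can get at the raw JSON of the documents, is to check each one against the list of properties your entity maps as non-nullable. Below is a minimal sketch of that check using System.Text.Json. The property names ("Amount", "CreatedAt"), the sample documents, and the helper name FindBadProperties are all made up for illustration, and how you read the raw documents from the container is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical list: replace with the non-nullable properties of your entity.
var required = new[] { "Amount", "CreatedAt" };

// Hypothetical raw documents, standing in for JSON read from the container.
var documents = new[]
{
    "{\"id\":\"a1\",\"Amount\":12.5,\"CreatedAt\":\"2024-08-01\"}",
    "{\"id\":\"a2\",\"Amount\":null,\"CreatedAt\":\"2024-08-02\"}",
    "{\"id\":\"a3\",\"Amount\":3.0}"
};

foreach (var json in documents)
{
    var bad = FindBadProperties(json, required);
    if (bad.Count > 0)
    {
        // Report the document id together with the offending properties.
        using var doc = JsonDocument.Parse(json);
        var id = doc.RootElement.TryGetProperty("id", out var idValue)
            ? idValue.ToString()
            : "(no id)";
        Console.WriteLine($"{id}: {string.Join(", ", bad)}");
    }
}

// Returns one entry per required property that is missing or explicitly null.
// Assumes the JSON root is an object, as Cosmos documents are.
static List<string> FindBadProperties(string json, IEnumerable<string> requiredNames)
{
    var bad = new List<string>();
    using var doc = JsonDocument.Parse(json);
    foreach (var name in requiredNames)
    {
        if (!doc.RootElement.TryGetProperty(name, out var value))
            bad.Add($"{name} (missing)");
        else if (value.ValueKind == JsonValueKind.Null)
            bad.Add($"{name} (null)");
    }
    return bad;
}
```

The check treats a missing property and an explicit null separately, since either one can leave a non-nullable property without a value during materialization.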