Kubernetes .NET application: SocketExceptionFactory+ExtendedSocketException

Question (votes: 0, answers: 2)

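We recently started running into a problem when rolling out .NET Core applications to k8s in Azure: the applications cannot resolve hostnames, such as the name of our Azure database.

The problem appears to be intermittent, because our older pods still run fine, and even when we bounce them they come back up normally.

The error below looks like a Hangfire problem, but it is actually a DNS resolution failure.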

Hangfire.SqlServer.SqlServerObjectsInstaller       - An exception occurred while trying to perform the migration. Retrying...
System.Data.SqlClient.SqlException (0x80131904): A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: TCP Provider, error: 35 - An internal exception was caught)
 ---> System.Net.Internals.SocketExceptionFactory+ExtendedSocketException (00000001, 11): Resource temporarily unavailable
   at System.Net.Dns.InternalGetHostByName(String hostName)
   at System.Net.Dns.GetHostAddresses(String hostNameOrAddress)
   at System.Data.SqlClient.SNI.SNITCPHandle.Connect(String serverName, Int32 port, TimeSpan timeout)
   at System.Data.SqlClient.SNI.SNITCPHandle..ctor(String serverName, Int32 port, Int64 timerExpire, Object callbackObject, Boolean parallel)
   at System.Data.ProviderBase.DbConnectionPool.CheckPoolBlockingPeriod(Exception e)
   at System.Data.ProviderBase.DbConnectionPool.CreateObject(DbConnection owningObject, DbConnectionOptions userOptions, DbConnectionInternal oldConnection)
   at System.Data.ProviderBase.DbConnectionPool.UserCreateRequest(DbConnection owningObject, DbConnectionOptions userOptions, DbConnectionInternal oldConnection)
   at System.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, UInt32 waitForMultipleObjectsTimeout, Boolean allowCreate, Boolean onlyOneCheckConnection, DbConnectionOptions userOptions, DbConnectionInternal& connection)
   at System.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal& connection)
   at System.Data.ProviderBase.DbConnectionFactory.TryGetConnection(DbConnection owningConnection, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal oldConnection, DbConnectionInternal& connection)
   at System.Data.ProviderBase.DbConnectionInternal.TryOpenConnectionInternal(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource`1 retry, DbConnectionOptions userOptions)
   at System.Data.ProviderBase.DbConnectionClosed.TryOpenConnection(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource`1 retry, DbConnectionOptions userOptions)
   at System.Data.SqlClient.SqlConnection.TryOpen(TaskCompletionSource`1 retry)
   at System.Data.SqlClient.SqlConnection.Open()
   at Hangfire.SqlServer.SqlServerStorage.CreateAndOpenConnection()
   at Hangfire.SqlServer.SqlServerStorage.UseConnection[T](DbConnection dedicatedConnection, Func`2 func)
   at Hangfire.SqlServer.SqlServerStorage.UseConnection(DbConnection dedicatedConnection, Action`1 action)
   at Hangfire.SqlServer.SqlServerStorage.Initialize()
ClientConnectionId:00000000-0000-0000-0000-000000000000
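
The innermost frames of the trace above are `System.Net.Dns.GetHostAddresses` failing with `Resource temporarily unavailable`, which suggests the lookup itself is failing rather than SQL Server. One way to confirm that independently of Hangfire is to repeat the lookup from a shell inside an affected pod. The following is only a sketch: the function name `check_dns` is made up for illustration, and it assumes the container image ships `getent` (common on glibc-based images, not guaranteed elsewhere).

```shell
# check_dns HOST [ATTEMPTS] [DELAY_SECONDS]
# Hypothetical helper: resolves HOST repeatedly and reports how many lookups failed.
# Assumes `getent` is available in the container image.
check_dns() {
  local host="$1" attempts="${2:-10}" delay="${3:-1}"
  local i=1 failures=0
  while [ "$i" -le "$attempts" ]; do
    if getent hosts "$host" > /dev/null 2>&1; then
      echo "attempt $i: resolved"
    else
      echo "attempt $i: FAILED"
      failures=$((failures + 1))
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "$failures of $attempts lookups failed for $host"
  # Exit status is 0 only if every lookup succeeded.
  [ "$failures" -eq 0 ]
}
```

To use it, open a shell in the pod (for example `kubectl exec -it <pod-name> -- sh`), paste the function, and call `check_dns <your-database-host> 20 1`. A mix of successes and failures would point to an intermittent DNS problem in the cluster rather than to the application.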
.net azure kubernetes
2 Answers
0
votes

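It turned out that the problem was caused by our service principal credentials expiring, which happens once a year. This article explains how to update your service principal.

TL;DR

Run this bash script. (If you are on Windows, it will run in Git Bash. Just remember to install the Azure CLI. It does not work in PowerShell.)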

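RESOURCE="<your resource group>"
NAME="<cluster name>"
# Look up the service principal the cluster is currently using
SP_ID=$(az aks show --resource-group "$RESOURCE" --name "$NAME" --query servicePrincipalProfile.clientId -o tsv)
# Reset its credential and capture the new secret (newer Azure CLI versions may expect --id instead of --name)
SP_SECRET=$(az ad sp credential reset --name "$SP_ID" --query password -o tsv)
# Hand the new secret to the cluster
az aks update-credentials --resource-group "$RESOURCE" --name "$NAME" --reset-service-principal --service-principal "$SP_ID" --client-secret "$SP_SECRET"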

Note: The final command runs for more than 5 minutes. I killed my process, and it still completed successfully.
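
Since the root cause here was an expired service principal secret, it may be worth checking the credential's validity dates before resetting anything. This is a rough sketch rather than a verified procedure: the function name `show_sp_credentials` is hypothetical, it assumes the Azure CLI is installed and logged in, and it prints the raw table because the names of the date columns differ between CLI versions.

```shell
# show_sp_credentials RESOURCE_GROUP CLUSTER_NAME
# Hypothetical helper: lists the credentials of the service principal used by an AKS cluster.
# Assumes a logged-in Azure CLI; inspect the end-date column of the output for expiry.
show_sp_credentials() {
  local rg="$1" cluster="$2" sp_id
  # Same lookup as in the script above: the client ID of the cluster's service principal.
  sp_id=$(az aks show --resource-group "$rg" --name "$cluster" \
    --query servicePrincipalProfile.clientId -o tsv) || return 1
  # List its password credentials, including their validity period.
  az ad sp credential list --id "$sp_id" -o table
}
```

Called as `show_sp_credentials "$RESOURCE" "$NAME"`, it should show whether the current secret has already passed its end date.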


0
votes

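I ran into a similar problem in several .NET applications (microservices and an API gateway) hosted in an AKS environment.

Specifically, we received exceptions of type System.Net.Sockets.SocketException, with varying timeouts: 10, 15, 20, 90 seconds...

In our case, the problem was related to the coredns pods (the equivalent of DNS servers in a traditional network). This comment pointed us in the right direction: https://github.com/Azure/AKS/issues/1320#issuecomment-555045638

In short, the solution was to regenerate all the coredns pods in our AKS environment. It is important to regenerate all of them, not just some.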
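
One possible way to regenerate every coredns pod in a single step is a rollout restart of the deployment. This is a sketch and not necessarily what the linked comment recommends: the function name `restart_coredns` is made up, and it assumes `kubectl` is pointed at the AKS cluster and that CoreDNS runs as a deployment called `coredns` in the `kube-system` namespace with the label `k8s-app=kube-dns` (the usual AKS layout, but worth checking first).

```shell
# restart_coredns [NAMESPACE] [DEPLOYMENT]
# Hypothetical helper: replaces all CoreDNS pods and waits for the new ones.
# Assumes kubectl is configured for the cluster and the defaults below match it.
restart_coredns() {
  local ns="${1:-kube-system}" deploy="${2:-coredns}"
  # Trigger a rolling replacement of every pod in the deployment.
  kubectl -n "$ns" rollout restart "deployment/$deploy" || return 1
  # Block until the replacement pods are rolled out (or the timeout expires).
  kubectl -n "$ns" rollout status "deployment/$deploy" --timeout=180s || return 1
  # Show the resulting pods so their age can be checked.
  kubectl -n "$ns" get pods -l k8s-app=kube-dns -o wide
}
```

After calling `restart_coredns`, every pod in the final listing should have a recent age. An older pod in the list would mean that not all of them were regenerated.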
