I'm reading BigQuery table data and loading it into case classes, and I get this NullPointerException while loading:
```
java.lang.NullPointerException: null
    at org.apache.spark.unsafe.UTF8StringBuilder.append(UTF8StringBuilder.java:76) ~[spark-unsafe_2.12-3.5.0.jar:3.5.0]
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.project_fieldToString_0_0$(Unknown Source) ~[?:?]
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.project_fieldToString_1_2$(Unknown Source) ~[?:?]
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.project_elementToString_1$(Unknown Source) ~[?:?]
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source) ~[?:?]
```
This query returns the two column mappings involved:

```sql
select cm from `test-project.MAPPINGS.dataset_configurations_test` t
,unnest(column_mappings) as cm
where t.data_type='PM'
and cm.column_name in ('XPI_MIN','WEEK');
```
```json
[{
  "cm": {
    "mapping_type": "aggregation",
    "source_column_name": "COLLECTTIME",
    "column_name": "WEEK",
    "name": "WEEK",
    "display_name": null,
    "description": null,
    "keep_source_column": "true",
    "formula": "DATE_FORMAT(COLLECTTIME, \"w\")",
    "functions": {
      "fun_temporal": "FIRST",
      "fun_regional": "FIRST",
      "fun_temporal_unit": [],
      "fun_regional_unit": []
    }
  }
}, {
  "cm": {
    "mapping_type": "ingestion",
    "source_column_name": "XPI_Min",
    "column_name": "XPI_MIN",
    "name": "XPI_MIN",
    "display_name": "XPI_Min",
    "description": "XPI_Min",
    "keep_source_column": "false",
    "formula": null,
    "functions": {
      "fun_temporal": "MIN",
      "fun_regional": "MIN",
      "fun_temporal_unit": [{
        "key": null,
        "value": null
      }],
      "fun_regional_unit": []
    }
  }
}]
```
This is the structure of the case class:
```scala
case class Functions
(
  fun_temporal: Option[String],
  fun_regional: Option[String],
  fun_temporal_unit: Option[Map[String,String]],
  fun_regional_unit: Option[Map[String,String]],
)
```
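The code fails when it tries to load column XPI_Min.

I can fix it by updating the BigQuery table data as shown below, but that is too expensive for us because we would have to update a large number of records. I'm looking for a solution in the case class declaration, or one that uses Scala/Spark.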
```sql
update `test-project.MAPPINGS.dataset_configurations_test` a
set column_mappings=
ARRAY(
  SELECT AS STRUCT mapping_type,source_column_name,column_name,b.name,display_name,description,keep_source_column,formula,
  STRUCT(functions.fun_temporal as fun_temporal
    , functions.fun_regional as fun_regional
    , CAST(NULL as ARRAY<STRUCT<key STRING, value STRING>>) as fun_temporal_unit
    , CAST(NULL as ARRAY<STRUCT<key STRING, value STRING>>) as fun_regional_unit
  ) as functions
  FROM UNNEST(column_mappings) b where b.column_name='XPI_MIN'
)
where a.name='SNIR_XPI' and a.technology='MW'
;
```
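It would help if you posted a reproducible example so this can be verified, but it looks like you are trying to build a map with a null key, which is not allowed:

> For a MapType value, keys are not allowed to have null values.

You have to either fix the data at the source, remove any entries with a null key via a select/projection before you try to show/map it, or treat the column as a `Seq[(String, String)]` and handle the nulls in your code.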
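A minimal sketch of the last option in plain Scala (no Spark dependency), assuming the unit columns can be loaded as sequences of key/value structs rather than as maps. `KeyValue`, `FunctionsRaw` and `NullKeyCleanup` are hypothetical names introduced for illustration. Entries whose key is null are dropped before the `Map` is built; a null value is replaced with an empty string, which is an arbitrary choice you may want to change.

```scala
// Hypothetical element type mirroring BigQuery's ARRAY<STRUCT<key STRING, value STRING>>.
// Both fields are Option because the source rows can contain nulls (see XPI_MIN).
final case class KeyValue(key: Option[String], value: Option[String])

// Hypothetical "raw" shape: the unit columns are kept as sequences, not maps,
// so a null key can be represented without violating MapType's rule.
final case class FunctionsRaw(
  fun_temporal: Option[String],
  fun_regional: Option[String],
  fun_temporal_unit: Option[Seq[KeyValue]],
  fun_regional_unit: Option[Seq[KeyValue]]
)

// The shape from the question, produced only after cleaning.
final case class Functions(
  fun_temporal: Option[String],
  fun_regional: Option[String],
  fun_temporal_unit: Option[Map[String, String]],
  fun_regional_unit: Option[Map[String, String]]
)

object NullKeyCleanup {
  // Keeps only entries with a defined key; a missing value becomes "" (hypothetical choice).
  def toMap(entries: Option[Seq[KeyValue]]): Option[Map[String, String]] =
    entries.map(_.collect { case KeyValue(Some(k), v) => k -> v.getOrElse("") }.toMap)

  // Converts the raw row into the question's case class.
  def clean(raw: FunctionsRaw): Functions =
    Functions(
      raw.fun_temporal,
      raw.fun_regional,
      toMap(raw.fun_temporal_unit),
      toMap(raw.fun_regional_unit)
    )

  def main(args: Array[String]): Unit = {
    // Mirrors the XPI_MIN row: one unit entry whose key and value are both null.
    val xpiMin = FunctionsRaw(Some("MIN"), Some("MIN"), Some(Seq(KeyValue(None, None))), Some(Seq.empty))
    println(clean(xpiMin)) // Functions(Some(MIN),Some(MIN),Some(Map()),Some(Map()))
  }
}
```

In a Spark job the same `clean` function could be applied with `Dataset.map` after reading rows into the raw shape. Whether the BigQuery connector exposes these columns as arrays of structs or as maps may depend on your connector version and settings, so check the schema with `printSchema()` first.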