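I need to create Parquet files in Java 8 with the apache-avro library, using POJOs that Maven generates automatically from an ".avsc" file during generate-sources. However, I'm having trouble with a BigDecimal field. I've read the apache-avro documentation (apache-avro) and managed to generate a POJO with the field types I need, but writing the file throws an exception. I've seen similar questions, but none of them had a solution that fixed my problem.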
Here is the code I'm working on: GitHub code
Employee_schema.avsc
{
  "type": "record",
  "namespace": "com.avro.example",
  "name": "Employee",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "email", "type": "string"},
    {
      "name": "salary",
      "type": {
        "type": "bytes",
        "logicalType": "decimal",
        "precision": 4,
        "scale": 2
      }
    }
  ]
}
Main.class
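import com.avro.example.Employee;
import java.io.File;
import java.io.IOException;
import java.math.BigDecimal;
import java.nio.file.Files;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.parquet.hadoop.metadata.CompressionCodecName;

public class Main {
    public static void main(String[] args) throws IOException {
        // Start from a clean file on every run
        File outputParquet = new File("./output.parquet");
        Files.deleteIfExists(outputParquet.toPath());
        Employee employee = new Employee("john", "[email protected]", BigDecimal.TEN);
        ParquetWriter<Employee> writer = AvroParquetWriter.<Employee>builder(new Path(outputParquet.getAbsolutePath()))
                .withCompressionCodec(CompressionCodecName.SNAPPY)
                .withSchema(employee.getSchema())
                .build();
        try {
            // The exception is thrown here, while writing the BigDecimal salary field
            writer.write(employee);
        } catch (Exception e) {
            e.printStackTrace();
        }
        writer.close();
    }
}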
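Separately from the exception, note that the schema declares precision 4 and scale 2, so salary can hold at most four digits, two of them after the decimal point (at most 99.99 in absolute value), while BigDecimal.TEN has scale 0. Some Avro versions reject a BigDecimal whose scale differs from the schema's instead of rescaling it, so normalizing the value before building the record is a cheap safeguard. A minimal JDK-only sketch, assuming you prefer to fail fast rather than round; DecimalFieldCheck and fit are hypothetical names, not part of Avro:

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical helper, not part of Avro: normalizes a BigDecimal for a
// decimal(precision, scale) field before it is put into a record.
final class DecimalFieldCheck {
    private DecimalFieldCheck() {
    }

    static BigDecimal fit(BigDecimal value, int precision, int scale) {
        // RoundingMode.UNNECESSARY makes setScale throw ArithmeticException
        // instead of silently rounding away digits (e.g. 1.234 at scale 2).
        BigDecimal rescaled = value.setScale(scale, RoundingMode.UNNECESSARY);
        // precision() counts the digits of the unscaled value: 10.00 -> 1000 -> 4 digits.
        if (rescaled.precision() > precision) {
            throw new IllegalArgumentException(value + " needs " + rescaled.precision()
                    + " digits, but the schema allows only " + precision);
        }
        return rescaled;
    }
}
```

For example, DecimalFieldCheck.fit(BigDecimal.TEN, 4, 2) returns 10.00, which could then be passed to the Employee constructor.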
pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.avro.example</groupId>
    <artifactId>avro-example</artifactId>
    <version>1.0-SNAPSHOT</version>
    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <maven.compiler.source>8</maven.compiler.source>
        <maven.compiler.target>8</maven.compiler.target>
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.apache.parquet</groupId>
            <artifactId>parquet-avro</artifactId>
            <version>1.11.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.avro</groupId>
            <artifactId>avro</artifactId>
            <version>1.11.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-core</artifactId>
            <version>1.2.1</version>
        </dependency>
    </dependencies>
    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.avro</groupId>
                <artifactId>avro-maven-plugin</artifactId>
                <version>1.11.1</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>schema</goal>
                        </goals>
                        <phase>generate-sources</phase>
                        <configuration>
                            <sourceDirectory>${project.basedir}/src/main/avro/</sourceDirectory>
                            <outputDirectory>${project.basedir}/src/main/java/</outputDirectory>
                            <enableDecimalLogicalType>true</enableDecimalLogicalType>
                            <fieldVisibility>private</fieldVisibility>
                            <stringType>String</stringType>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>
I don't know the technical reason, but you have to configure a decimal conversion.
First, create a data model:
GenericData dataModel = new SpecificData();
Then add the decimal conversion to it:
dataModel.addLogicalTypeConversion(new DecimalConversion());
Finally, pass the new dataModel to the AvroParquetWriter builder with its withDataModel method.
Result:
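// Imports needed in addition to those in Main.class
import org.apache.avro.Conversions.DecimalConversion;
import org.apache.avro.generic.GenericData;
import org.apache.avro.specific.SpecificData;

GenericData dataModel = new SpecificData();
dataModel.addLogicalTypeConversion(new DecimalConversion());
ParquetWriter<Employee> writer = AvroParquetWriter.<Employee>builder(new Path(outputParquet.getAbsolutePath()))
        .withSchema(employee.getSchema())
        .withDataModel(dataModel)
        .build();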
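For reference, this is the mapping the decimal conversion is responsible for. According to the Avro specification, a decimal backed by bytes stores the two's-complement, big-endian representation of the unscaled integer, while the scale comes from the schema. That is presumably why the write fails when no conversion is registered: the writer gets a BigDecimal where the schema calls for bytes. The sketch below only illustrates that mapping with plain JDK classes; DecimalBytesSketch is a hypothetical name, and this is not Avro's actual DecimalConversion code.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.math.RoundingMode;
import java.nio.ByteBuffer;

// Hypothetical illustration, not Avro's source code: the BigDecimal <-> bytes
// mapping described by the Avro spec for a bytes-backed decimal.
final class DecimalBytesSketch {
    private DecimalBytesSketch() {
    }

    // BigDecimal -> bytes: the unscaled value as big-endian two's complement.
    static ByteBuffer toBytes(BigDecimal value, int scale) {
        // The scale is not stored in the bytes, so the value must match the schema's scale.
        BigDecimal scaled = value.setScale(scale, RoundingMode.UNNECESSARY);
        return ByteBuffer.wrap(scaled.unscaledValue().toByteArray());
    }

    // bytes -> BigDecimal: read the unscaled value back and re-apply the schema's scale.
    static BigDecimal fromBytes(ByteBuffer buffer, int scale) {
        ByteBuffer copy = buffer.duplicate(); // leave the caller's buffer position untouched
        byte[] bytes = new byte[copy.remaining()];
        copy.get(bytes);
        return new BigDecimal(new BigInteger(bytes), scale);
    }
}
```

For the question's value, toBytes(BigDecimal.TEN, 2) produces the unscaled value 1000, i.e. the two bytes 0x03 0xE8.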
Likewise, to write java.time.LocalDateTime values, you need to add this conversion (import org.apache.avro.data.TimeConversions):
dataModel.addLogicalTypeConversion(new TimeConversions.LocalTimestampMillisConversion());
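LocalTimestampMillisConversion handles the local-timestamp-millis logical type, which annotates a plain long. Per the Avro specification, that long counts milliseconds from 1970-01-01 00:00:00.000 in local time, with no time zone attached. A JDK-only sketch of that mapping, assuming millisecond precision is enough; LocalTimestampMillisSketch is a hypothetical name and this is not Avro's TimeConversions code:

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

// Hypothetical illustration, not Avro's source: the long that a
// local-timestamp-millis field holds for a LocalDateTime, i.e. milliseconds
// from 1970-01-01T00:00:00.000 on a clock with no time zone attached.
final class LocalTimestampMillisSketch {
    private LocalTimestampMillisSketch() {
    }

    static long toMillis(LocalDateTime dateTime) {
        // Reading the wall-clock fields as if they were UTC gives a zone-free count;
        // digits below one millisecond are dropped.
        return dateTime.toInstant(ZoneOffset.UTC).toEpochMilli();
    }

    static LocalDateTime fromMillis(long millis) {
        // Inverse mapping with the same UTC convention, so the JVM's default zone is never used.
        return LocalDateTime.ofInstant(Instant.ofEpochMilli(millis), ZoneOffset.UTC);
    }
}
```

For the conversion to apply, the field in the .avsc must be declared as {"type": "long", "logicalType": "local-timestamp-millis"}.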