I am currently trying to implement batch inserts using Hibernate. Here are the pieces I have put in place:
1. The entity
@Entity
@Table(name = "my_bean_table")
@Data
@NoArgsConstructor // JPA requires a no-args constructor; @Data does not generate one once another constructor is declared
public class MyBean {

    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "seqGen")
    @SequenceGenerator(name = "seqGen", sequenceName = "my_bean_seq", allocationSize = 50)
    @Column(name = "my_bean_id")
    private Long id;

    @Column(name = "my_bean_name")
    private String name;

    @Column(name = "my_bean_age")
    private int age;

    public MyBean(String name, int age) {
        this.name = name;
        this.age = age;
    }
}
2. application.properties
Hibernate and the datasource are configured this way:
spring.datasource.url=jdbc:postgresql://{ip}:{port}/${db}?reWriteBatchedInserts=true&loggerLevel=TRACE&loggerFile=pgjdbc.log
spring.jpa.show-sql=true
spring.jpa.properties.hibernate.jdbc.batch_size=50
spring.jpa.properties.hibernate.order_inserts=true
NB: &loggerLevel=TRACE&loggerFile=pgjdbc.log is only there for debugging purposes.
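For context, reWriteBatchedInserts makes the PostgreSQL JDBC driver fold a batch of single-row inserts into multi-values INSERT statements. A minimal plain-JDBC sketch of that mechanism, assuming a hypothetical local database (the URL and credentials are placeholders, not from the question):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class BatchInsertDemo {
    public static void main(String[] args) throws SQLException {
        String url = "jdbc:postgresql://localhost:5432/mydb?reWriteBatchedInserts=true";
        try (Connection con = DriverManager.getConnection(url, "user", "password");
             PreparedStatement ps = con.prepareStatement(
                     "insert into my_bean_table (my_bean_age, my_bean_name, my_bean_id) values (?, ?, ?)")) {
            for (int i = 1; i <= 50; i++) {
                ps.setInt(1, 20 + i);
                ps.setString(2, "bean-" + i);
                ps.setLong(3, i);
                ps.addBatch();
            }
            // with reWriteBatchedInserts=true the driver rewrites these 50
            // single-row inserts into a few multi-values INSERTs, as the log below shows
            ps.executeBatch();
        }
    }
}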
3. The elements in my PostgreSQL database
CREATE TABLE my_bean_table
(
    my_bean_id bigint NOT NULL DEFAULT nextval('my_bean_seq'::regclass),
    my_bean_name character varying(100) NOT NULL,
    my_bean_age smallint NOT NULL,
    CONSTRAINT my_bean_table_pkey PRIMARY KEY (my_bean_id)
);

CREATE SEQUENCE my_bean_seq
    INCREMENT 50
    START 1
    MINVALUE 1
    MAXVALUE 9223372036854775807
    CACHE 1;
EDIT: added the ItemWriter
public class MyBeanWriter implements ItemWriter<MyBean> {

    private final Logger logger = LoggerFactory.getLogger(MyBeanWriter.class);

    @Autowired
    private MyBeanRepository repository;

    @Override
    public void write(List<? extends MyBean> items) throws Exception {
        repository.saveAll(items);
    }
}
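The repository itself is not shown; presumably it is a plain Spring Data JPA interface along these lines (only the name MyBeanRepository appears in the writer, the body is my assumption):

import org.springframework.data.jpa.repository.JpaRepository;

public interface MyBeanRepository extends JpaRepository<MyBean, Long> {
    // assumed: no custom methods needed; saveAll(...) is inherited from JpaRepository
}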
The commit-interval is also set to 50.
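For reference, this is roughly how such a chunk-oriented step looks in Java config (a sketch only: the reader is a placeholder, and the question's actual job definition, possibly XML with commit-interval="50", is not shown):

import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemReader;
import org.springframework.context.annotation.Bean;

@Bean
public Step insertStep(StepBuilderFactory stepBuilderFactory,
                       ItemReader<MyBean> myBeanReader, // placeholder reader, not part of the question
                       MyBeanWriter myBeanWriter) {
    return stepBuilderFactory.get("insertStep")
            .<MyBean, MyBean>chunk(50) // commit-interval: 50 items per chunk/transaction
            .reader(myBeanReader)
            .writer(myBeanWriter)
            .build();
}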
In the log file produced by the JDBC driver, I get the following lines:
avr. 10, 2020 7:26:48 PM org.postgresql.core.v3.QueryExecutorImpl execute
FINEST: batch execute 3 queries, handler=org.postgresql.jdbc.BatchResultHandler@1317ac2c, maxRows=0, fetchSize=0, flags=5
avr. 10, 2020 7:26:48 PM org.postgresql.core.v3.QueryExecutorImpl sendParse
FINEST: FE=> Parse(stmt=null,query="insert into my_bean_table (my_bean_age, my_bean_name, my_bean_id) values ($1, $2, $3),($4, $5, $6),($7, $8, $9),($10, $11, $12),($13, $14, $15),($16, $17, $18),($19, $20, $21),($22, $23, $24),($25, $26, $27),($28, $29, $30),($31, $32, $33),($34, $35, $36),($37, $38, $39),($40, $41, $42),($43, $44, $45),($46, $47, $48),($49, $50, $51),($52, $53, $54),($55, $56, $57),($58, $59, $60),($61, $62, $63),($64, $65, $66),($67, $68, $69),($70, $71, $72),($73, $74, $75),($76, $77, $78),($79, $80, $81),($82, $83, $84),($85, $86, $87),($88, $89, $90),($91, $92, $93),($94, $95, $96)",oids={23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20})
...
FINEST: FE=> Execute(portal=null,limit=1)
avr. 10, 2020 7:26:48 PM org.postgresql.core.v3.QueryExecutorImpl sendParse
FINEST: FE=> Parse(stmt=null,query="insert into my_bean_table (my_bean_age, my_bean_name, my_bean_id) values ($1, $2, $3),($4, $5, $6),($7, $8, $9),($10, $11, $12),($13, $14, $15),($16, $17, $18),($19, $20, $21),($22, $23, $24),($25, $26, $27),($28, $29, $30),($31, $32, $33),($34, $35, $36),($37, $38, $39),($40, $41, $42),($43, $44, $45),($46, $47, $48)",oids={23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20,23,1043,20})
...
avr. 10, 2020 7:26:48 PM org.postgresql.core.v3.QueryExecutorImpl sendParse
FINEST: FE=> Parse(stmt=null,query="insert into my_bean_table (my_bean_age, my_bean_name, my_bean_id) values ($1, $2, $3),($4, $5, $6)",oids={23,1043,20,23,1043,20})
Here is my question: why is the batch split into 3 queries (32, 16 and 2 parameter sets, adding up to 50 rows) instead of a single 50-row insert?
Note: I tried setting the batch size to 100 and 200, but I still get 3 distinct queries.
I don't have a definitive answer, but this behavior looks very similar to batch fetching, and probably has the same cause.
It uses distinct statements whose number of parameter sets is a power of two. This is done to minimize the number of distinct statements that have to be executed: the database needs to parse every statement and uses a cache to hold the parsed statements. If a client executes lots of statements that effectively do the same thing but differ in the number of parameter sets, this would thrash the cache.
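The row counts in the log fit a greedy decomposition into descending powers of two. A small sketch of that arithmetic (plain Java for illustration; that the driver decomposes exactly this way is my assumption):

import java.util.ArrayList;
import java.util.List;

public class PowerOfTwoBatches {
    // Decompose a row count into descending powers of two,
    // matching the batch sizes seen in the log: 50 -> [32, 16, 2].
    static List<Integer> batchSizes(int rows) {
        List<Integer> batches = new ArrayList<>();
        int size = Integer.highestOneBit(rows); // largest power of two <= rows
        while (rows > 0) {
            while (size > rows) {
                size >>= 1; // step down to the next power of two that still fits
            }
            batches.add(size);
            rows -= size;
        }
        return batches;
    }

    public static void main(String[] args) {
        System.out.println(batchSizes(50));  // [32, 16, 2]
        System.out.println(batchSizes(100)); // [64, 32, 4]
        System.out.println(batchSizes(200)); // [128, 64, 8]
    }
}

This would also explain why batch sizes of 100 and 200 still produce 3 queries: 100 = 64 + 32 + 4 and 200 = 128 + 64 + 8.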
On the other hand, I haven't seen this with batch inserts, only with batch fetching. I have a few guesses why that might be:
- Your IDs are generated by the database, so the IDs have to be queried from the database sequence before the data can be written. Maybe the behavior of those selects leaks over into the inserts.
- It could be an optimization done by the JDBC driver, which rewrites this kind of statement.
- Hibernate does this all the time and I just missed it so far, although I'd find it odd to do this when the number of parameter sets is exactly the batch size.