Tech stack: Express + TypeORM + MySQL
I'm looking for a solution to this task: I have a CSV file (more than 100,000 rows), and each row contains data such as:
reviewer, review, email, rating, employee, employee_position, employee_unique_id, company, company_description
I need to save all of this data into 5 different tables as fast as possible.
import { Column, Entity, ManyToOne, OneToMany, PrimaryColumn, PrimaryGeneratedColumn } from 'typeorm'

@Entity('user')
export class UserEntity {
  // The 'uuid' strategy generates a string identifier, so the property is typed as string
  @PrimaryGeneratedColumn('uuid')
  id: string
  @Column()
  name: string
  @Column()
  email: string
  @OneToMany(() => ReviewEntity, review => review.fromUser)
  reviews: ReviewEntity[]
}

@Entity('review')
export class ReviewEntity {
  @PrimaryGeneratedColumn()
  id: number
  @Column('longtext')
  text: string
  @Column()
  rating: number
  @ManyToOne(() => UserEntity, user => user.reviews)
  fromUser: UserEntity
  @ManyToOne(() => EmployeeEntity, employee => employee.reviews)
  forEmployee: EmployeeEntity
}

@Entity('employee')
export class EmployeeEntity {
  // Not generated: holds employee_unique_id from the CSV
  @PrimaryColumn()
  id: string
  @Column()
  name: string
  @ManyToOne(() => EmployeePositionEntity, position => position.employees)
  position: EmployeePositionEntity
  @ManyToOne(() => CompanyEntity, company => company.employees)
  company: CompanyEntity
  @OneToMany(() => ReviewEntity, review => review.forEmployee)
  reviews: ReviewEntity[]
}

@Entity('employee_position')
export class EmployeePositionEntity {
  @PrimaryGeneratedColumn()
  id: number
  @Column()
  name: string
  @OneToMany(() => EmployeeEntity, employee => employee.position)
  employees: EmployeeEntity[]
}

@Entity('company')
export class CompanyEntity {
  @PrimaryGeneratedColumn()
  id: number
  @Column()
  name: string
  @Column('longtext')
  description: string
  @OneToMany(() => EmployeeEntity, employee => employee.company)
  employees: EmployeeEntity[]
}
I tried parsing the data into 5 arrays and saving them in parallel in batches of 100 items, but that took too long.
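LOAD DATA INFILE -- to load all the data into a single table (raw) that has all the columns.
CREATE TABLE table1 SELECT ... FROM raw; -- and likewise for each of the other target tables
DROP TABLE raw;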
That is, since the layout of the incoming data is not what you need, plan to use it only as a staging area for the other tables.
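The staging step can be sketched as a small TypeScript helper that only builds the SQL statements. This is a minimal sketch under assumptions that are not in the question: the helper names (RAW_COLUMNS, quoteId, buildStagingSql) are hypothetical, every staging column is typed TEXT, and the CSV is assumed to be comma-separated with optional double quotes, "\n" line endings, and one header row. LOAD DATA INFILE without LOCAL reads the file on the database server, so the path must be readable by MySQL and allowed by its secure_file_priv setting.

```typescript
// Column names taken from the CSV header in the question.
const RAW_COLUMNS = [
  "reviewer", "review", "email", "rating", "employee",
  "employee_position", "employee_unique_id", "company", "company_description",
];

// Quote an identifier with backticks for MySQL.
const quoteId = (name: string): string => "`" + name + "`";

// Build the statements that create the staging table and bulk-load the CSV.
// Assumed file format (not stated in the question): comma-separated,
// optionally double-quoted fields, "\n" line endings, one header row.
function buildStagingSql(csvPath: string): string[] {
  const columnDefs = RAW_COLUMNS.map((c) => quoteId(c) + " TEXT").join(", ");
  const columnList = RAW_COLUMNS.map(quoteId).join(", ");
  // Escape backslashes and single quotes so the path is a valid SQL string literal.
  const sqlPath = csvPath.replace(/\\/g, "\\\\").replace(/'/g, "\\'");
  return [
    "DROP TABLE IF EXISTS `raw`",
    "CREATE TABLE `raw` (" + columnDefs + ")",
    "LOAD DATA INFILE '" + sqlPath + "' INTO TABLE `raw` " +
      "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' " +
      "LINES TERMINATED BY '\\n' IGNORE 1 LINES (" + columnList + ")",
  ];
}
```

Each returned string could then be executed in order, for example with DataSource.query in TypeORM 0.3, so the whole file is loaded by one statement instead of thousands of batched inserts.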
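If part of the task is to "normalize" the data, then getting the auto-increment IDs in a timely manner gets a bit trickier. That may call for this efficient way of doing it (once for each ID): Normalization.
This may involve 5+1 full table scans. But because each complete task is done in a single query, it will be about 10 times as fast as working one row at a time.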
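For the entities in the question, the normalization pass might look like the sketch below: one INSERT ... SELECT per target table, each reading the staging table once. It swaps the answer's CREATE TABLE ... SELECT for INSERT ... SELECT because the target tables already exist. Several things here are assumptions rather than facts from the thread: the target tables start empty, an email identifies a user, a company or position name identifies that row, and the foreign-key columns use TypeORM's default names (fromUserId, forEmployeeId, positionId, companyId). The function name buildNormalizeSql is hypothetical.

```typescript
// Build one set-based statement per target table, parents before children.
// Assumes the staging table `raw` is already loaded and the targets are empty.
function buildNormalizeSql(): string[] {
  return [
    // Lookup tables first: one row per distinct name.
    "INSERT INTO `company` (name, description) " +
      "SELECT company, MIN(company_description) FROM `raw` GROUP BY company",
    "INSERT INTO `employee_position` (name) " +
      "SELECT DISTINCT employee_position FROM `raw`",
    // The user primary key is a UUID string, generated here by MySQL's UUID().
    "INSERT INTO `user` (id, name, email) " +
      "SELECT UUID(), MIN(reviewer), email FROM `raw` GROUP BY email",
    // Employees pick up the generated IDs by joining back on the names.
    "INSERT INTO `employee` (id, name, positionId, companyId) " +
      "SELECT r.employee_unique_id, MIN(r.employee), MIN(p.id), MIN(c.id) " +
      "FROM `raw` r " +
      "JOIN `employee_position` p ON p.name = r.employee_position " +
      "JOIN `company` c ON c.name = r.company " +
      "GROUP BY r.employee_unique_id",
    // One review per CSV row, linked to its user and employee.
    "INSERT INTO `review` (text, rating, fromUserId, forEmployeeId) " +
      "SELECT r.review, CAST(r.rating AS SIGNED), u.id, r.employee_unique_id " +
      "FROM `raw` r JOIN `user` u ON u.email = r.email",
    // The staging table is no longer needed.
    "DROP TABLE `raw`",
  ];
}
```

Running the statements in the returned order keeps every foreign key valid at insert time. If the joins on name or email turn out to be slow, adding indexes on those columns before this pass is one option worth measuring.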