Open
Labels
enhancement: New feature or request
Description
Is your feature request related to a problem or challenge?
As part of #1382, we need to implement `insert_into` for `IcebergTableProvider` to support `INSERT INTO` queries in DataFusion:
insert into t values (1, 'a');
Physical Plans
Within `insert_into`, we will need to add a few nodes (DataFusion physical plans) to complete the write process. The entire write process is described by the flowchart below:
flowchart TD
A(["Input Node"]) --> F["Project Node"]
F --> B["Repartition Node"]
B --> C["Sort Node"]
C --> D["Writer Node"]
D --> E["Commit Node"]
- Input Node: Input physical plan that represents the input data
- Project Node: Calculate the partition value
- Repartition Node: Decide the partitioning mode for the best parallelism
- Sort Node: Sort the input data
- Writer Node: Spawn Iceberg writers and write the input data
- Commit Node: Commit the data written using Iceberg Tx API
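To make the data flow between the nodes concrete, here is a minimal, self-contained sketch of the stages as plain Rust functions. It does not use the DataFusion or iceberg-rust APIs. All names (`Row`, `project`, `repartition`, `sort`, `write`, `TableSketch`, `DataFileSketch`, `insert_into`) are hypothetical, and the bucket-style partition transform (`id` modulo `num_buckets`) and the per-partition sort by `id` are assumptions chosen only for illustration.

```rust
use std::collections::BTreeMap;

/// Hypothetical input row; stands in for the record batches of the Input Node.
#[derive(Debug, Clone, PartialEq)]
pub struct Row {
    pub id: i64,
    pub name: String,
}

/// Hypothetical descriptor of a written data file.
#[derive(Debug, PartialEq)]
pub struct DataFileSketch {
    pub partition: i64,
    pub record_count: usize,
}

/// Hypothetical table state; `commit` stands in for the Iceberg Tx API.
#[derive(Default)]
pub struct TableSketch {
    pub files: Vec<DataFileSketch>,
}

impl TableSketch {
    /// Commit Node: make the written files visible in one step.
    pub fn commit(&mut self, new_files: Vec<DataFileSketch>) {
        self.files.extend(new_files);
    }
}

/// Project Node: attach a partition value to every row.
/// Assumption: a bucket-like transform; `num_buckets` must be positive.
pub fn project(rows: Vec<Row>, num_buckets: i64) -> Vec<(i64, Row)> {
    rows.into_iter()
        .map(|r| (r.id.rem_euclid(num_buckets), r))
        .collect()
}

/// Repartition Node (simplified): group rows by their partition value.
pub fn repartition(rows: Vec<(i64, Row)>) -> BTreeMap<i64, Vec<Row>> {
    let mut groups: BTreeMap<i64, Vec<Row>> = BTreeMap::new();
    for (partition, row) in rows {
        groups.entry(partition).or_default().push(row);
    }
    groups
}

/// Sort Node (simplified): sort the rows inside each partition by `id`.
pub fn sort(groups: &mut BTreeMap<i64, Vec<Row>>) {
    for rows in groups.values_mut() {
        rows.sort_by_key(|r| r.id);
    }
}

/// Writer Node (simplified): emit one data file descriptor per partition.
pub fn write(groups: &BTreeMap<i64, Vec<Row>>) -> Vec<DataFileSketch> {
    groups
        .iter()
        .map(|(partition, rows)| DataFileSketch {
            partition: *partition,
            record_count: rows.len(),
        })
        .collect()
}

/// Chains the stages in the order of the flowchart and returns the row count written.
pub fn insert_into(table: &mut TableSketch, rows: Vec<Row>, num_buckets: i64) -> usize {
    let projected = project(rows, num_buckets);
    let mut groups = repartition(projected);
    sort(&mut groups);
    let files = write(&groups);
    let written = files.iter().map(|f| f.record_count).sum();
    table.commit(files);
    written
}
```

In the real implementation each stage would be a DataFusion physical plan operating on record batch streams, and the Repartition Node would choose a partitioning mode rather than always grouping by partition value.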
Writer Extension
Besides the writers mentioned in the writer path of #1382, other writers such as `RollingFileWriter` can be useful for splitting incoming data into multiple files.
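As a rough illustration of the rolling idea, the sketch below closes the current file and starts a new one once a target size would be exceeded. It is not the iceberg-rust `RollingFileWriter` API: `RollingWriterSketch`, `target_file_size`, and the in-memory byte buffers standing in for files are all assumptions made to keep the example self-contained.

```rust
/// Hypothetical, simplified rolling writer; "files" are in-memory byte buffers.
pub struct RollingWriterSketch {
    /// Assumed size threshold (in bytes) at which a new file is started.
    target_file_size: usize,
    /// Bytes of the file currently being written.
    current: Vec<u8>,
    /// Files that have already been rolled over.
    closed: Vec<Vec<u8>>,
}

impl RollingWriterSketch {
    pub fn new(target_file_size: usize) -> Self {
        Self {
            target_file_size,
            current: Vec::new(),
            closed: Vec::new(),
        }
    }

    /// Append one batch. If the batch would push a non-empty current file
    /// past the target size, roll over first. A single batch larger than the
    /// target is still written whole, so a file may exceed the target.
    pub fn write(&mut self, batch: &[u8]) {
        let would_exceed = self.current.len() + batch.len() > self.target_file_size;
        if !self.current.is_empty() && would_exceed {
            self.roll();
        }
        self.current.extend_from_slice(batch);
    }

    /// Close the current file and start an empty one.
    fn roll(&mut self) {
        let finished = std::mem::take(&mut self.current);
        self.closed.push(finished);
    }

    /// Flush the last file (if any) and return every file written.
    pub fn close(mut self) -> Vec<Vec<u8>> {
        if !self.current.is_empty() {
            self.roll();
        }
        self.closed
    }
}
```

A real writer would roll on the size reported by the underlying file writer and return data file metadata instead of raw bytes; the splitting logic is the part this sketch is meant to show.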
Tasks List
- Implement `RollingFileWriter`: Helps split incoming data into multiple files #1541
- Implement Project Node: Calculate the partition value #1542
- Implement Repartition Node: Decide the partitioning mode for the best parallelism #1543
- Implement Sort Node: Sort the input data #1544
- Implement Writer Node: Spawn Iceberg writers and write the input data #1545
- Implement Commit Node: Commit the data written using Iceberg Tx API #1546