
Implement insert_into for IcebergTableProvider #1540

@CTTY


Is your feature request related to a problem or challenge?

As part of #1382, we need to implement insert_into for IcebergTableProvider to support INSERT INTO queries in DataFusion:

insert into t values (1, 'a');

Physical Plans

Within insert_into, we need to add several DataFusion physical plan nodes to complete the write process. The entire write process is described by the flowchart below:

flowchart TD
    A(["Input Node"]) --> F["Project Node"]
    F --> B["Repartition Node"]
    B --> C["Sort Node"]
    C --> D["Writer Node"]
    D --> E["Commit Node"]
  • Input Node: The input physical plan that produces the data to be inserted
  • Project Node: Calculate partition values
  • Repartition Node: Choose the partitioning scheme that gives the best write parallelism
  • Sort Node: Sort the input data
  • Writer Node: Spawn Iceberg writers and write the input data
  • Commit Node: Commit the written data using the Iceberg transaction (Tx) API
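The data flow through these nodes can be modeled without DataFusion or iceberg-rust. The std-only Rust sketch below is illustrative only: `Row`, `DataFile`, the `id mod 4` partition transform, and every function name are hypothetical stand-ins, not iceberg-rust or DataFusion APIs. The streams run sequentially, and the "commit" only collects file metadata instead of running an Iceberg transaction. Assuming the Sort Node orders rows by partition value, it shows why hashing on the partition value during repartitioning lets each writer close a file as soon as the partition value changes.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical row type for a table `t(id INT, name STRING)`.
#[allow(dead_code)] // `name` is carried through the pipeline but never inspected here
#[derive(Debug, Clone)]
struct Row {
    id: i32,
    name: String,
}

// Hypothetical metadata for one written data file.
#[derive(Debug, Clone, PartialEq)]
struct DataFile {
    partition: i32,
    record_count: usize,
}

// Project Node: attach a partition value to every row.
// Assumption: `id mod 4` stands in for a real Iceberg partition transform.
fn project(rows: Vec<Row>) -> Vec<(i32, Row)> {
    rows.into_iter().map(|r| (r.id.rem_euclid(4), r)).collect()
}

// Repartition Node: hash on the partition value so all rows of one partition
// land in the same stream, and the `n` streams could be written in parallel.
fn repartition(rows: Vec<(i32, Row)>, n: usize) -> Vec<Vec<(i32, Row)>> {
    let mut streams: Vec<Vec<(i32, Row)>> = vec![Vec::new(); n];
    for (pv, row) in rows {
        let mut hasher = DefaultHasher::new();
        pv.hash(&mut hasher);
        let idx = (hasher.finish() % n as u64) as usize;
        streams[idx].push((pv, row));
    }
    streams
}

// Sort Node: order a stream by partition value (stable sort).
fn sort(stream: &mut [(i32, Row)]) {
    stream.sort_by_key(|(pv, _)| *pv);
}

// Writer Node: in a sorted stream, start a new file whenever the partition
// value changes, so each partition is written as one contiguous run.
fn write(stream: &[(i32, Row)]) -> Vec<DataFile> {
    let mut files: Vec<DataFile> = Vec::new();
    for (pv, _row) in stream {
        let same_partition = files.last().map_or(false, |f| f.partition == *pv);
        if same_partition {
            files.last_mut().unwrap().record_count += 1;
        } else {
            files.push(DataFile { partition: *pv, record_count: 1 });
        }
    }
    files
}

// Commit Node: gather the files from every writer into one result.
// A real implementation would add them to the table in an Iceberg transaction.
fn commit(per_stream: Vec<Vec<DataFile>>) -> Vec<DataFile> {
    let mut committed: Vec<DataFile> = per_stream.into_iter().flatten().collect();
    committed.sort_by_key(|f| f.partition);
    committed
}

// Runs Project -> Repartition -> Sort -> Write -> Commit over in-memory rows.
fn run_insert(rows: Vec<Row>, parallelism: usize) -> Vec<DataFile> {
    let written: Vec<Vec<DataFile>> = repartition(project(rows), parallelism)
        .into_iter()
        .map(|mut stream| {
            sort(&mut stream);
            write(&stream)
        })
        .collect();
    commit(written)
}

fn main() {
    let rows: Vec<Row> = (0..10)
        .map(|i| Row { id: i, name: format!("row-{i}") })
        .collect();
    for file in run_insert(rows, 3) {
        println!("{file:?}");
    }
}
```

With 10 rows and 3 streams, each of the four partitions ends up in exactly one file (3, 3, 2 and 2 records), because hashing on the partition value keeps all of a partition's rows in a single stream.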

Writer Extension

Besides the writers mentioned in the writer path of #1382, other writers such as RollingFileWriter can be useful for splitting incoming data into multiple files.
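To illustrate the rolling idea, here is a simplified in-memory model in Rust. `RollingWriter`, `InMemoryFile`, and `target_file_size` are hypothetical names for this sketch. It does not reproduce the actual RollingFileWriter API, only the policy of closing the current file and opening a new one once a size threshold would be exceeded.

```rust
// Hypothetical in-memory stand-in for a data file: the encoded records it holds.
#[derive(Debug, Default)]
struct InMemoryFile {
    bytes: Vec<u8>,
    record_count: usize,
}

// Simplified rolling writer: closes the current file and opens a new one once
// appending the next record would push the file past `target_file_size` bytes.
struct RollingWriter {
    target_file_size: usize,
    current: Option<InMemoryFile>,
    finished: Vec<InMemoryFile>,
}

impl RollingWriter {
    fn new(target_file_size: usize) -> Self {
        Self { target_file_size, current: None, finished: Vec::new() }
    }

    fn write(&mut self, record: &[u8]) {
        // An open file always holds at least one record, so a single record
        // larger than the target is still written, alone in its own file.
        let must_roll = self
            .current
            .as_ref()
            .map_or(false, |f| f.bytes.len() + record.len() > self.target_file_size);
        if must_roll {
            // `must_roll` implies `current` is Some.
            self.finished.push(self.current.take().unwrap());
        }
        let file = self.current.get_or_insert_with(InMemoryFile::default);
        file.bytes.extend_from_slice(record);
        file.record_count += 1;
    }

    // Flush the last open file and return every file written, in order.
    fn close(mut self) -> Vec<InMemoryFile> {
        if let Some(file) = self.current.take() {
            self.finished.push(file);
        }
        self.finished
    }
}

fn main() {
    let mut writer = RollingWriter::new(10);
    for i in 0..10u32 {
        writer.write(&i.to_le_bytes()); // 4-byte records
    }
    for (n, file) in writer.close().iter().enumerate() {
        println!("file {n}: {} records, {} bytes", file.record_count, file.bytes.len());
    }
}
```

With a 10-byte target and 4-byte records, each file holds two records, so ten records produce five files. Letting an oversized record through into its own file, rather than rejecting it, keeps the writer from failing on unusually large rows.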

Tasks List

Metadata

  • Assignees: No one assigned
  • Labels: enhancement (New feature or request)
  • Type: No type
  • Projects: No projects
  • Milestone: No milestone
  • Relationships: None yet
  • Development: No branches or pull requests
