[VL] The time taken to merge payload during the shuffle write is excessively high #10104

Open
@NEUpanning

Description

Backend

VL (Velox)

Bug description

The total time across all tasks was 78.1 hours for vanilla Spark but 1899.3 hours for Gluten. The flame graph shows that most of the time is spent merging payloads. After adding some logging, I see that the merge operation occurred 1,522,330 times for 1,084,128 rows in a single task, with each merge taking a few milliseconds.
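
For scale, here is a rough back-of-the-envelope check of the figures above. The 3 ms per merge is only an assumed stand-in for "a few milliseconds"; the variable names are illustrative and nothing here comes from Gluten itself.

```python
# Figures quoted in the bug description.
vanilla_hours = 78.1      # total task time, vanilla Spark
gluten_hours = 1899.3     # total task time, Gluten
merges = 1_522_330        # merge operations logged in one task
rows = 1_084_128          # rows processed by that task

# ASSUMPTION: "a few milliseconds" is taken as 3 ms purely for illustration.
assumed_ms_per_merge = 3.0

slowdown = gluten_hours / vanilla_hours
merges_per_row = merges / rows
merge_minutes = merges * assumed_ms_per_merge / 1000 / 60

print(f"slowdown vs. vanilla Spark: {slowdown:.1f}x")
print(f"merges per row: {merges_per_row:.2f}")
print(f"estimated merge time in this task: {merge_minutes:.0f} min")
```

Under that assumption the task would spend on the order of an hour in merges alone, with more than one merge per row on average.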

Flame graph:
Image

Gluten shuffle metrics, shown as total (min, med, max (stageId: taskId)):

shuffle records written: 39,407,858,231
shuffle write time: 17.86 h (0 ms, 27.4 s, 17.1 m (stage 0.0: task 499))
time to compress: 31.52 h (0 ms, 29.6 s, 20.3 m (stage 0.0: task 1191))
time to split: 1478.09 h (0 ms, 45.5 m, 1.71 h (stage 0.0: task 86))
time to spill: 15.77 h (0 ms, 25.2 s, 16.3 m (stage 0.0: task 499))
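
To put the split time in proportion, a small sketch that compares it against the other reported shuffle metrics and against the 1899.3 h total task time. It assumes the four timers are roughly comparable and non-overlapping, which the metrics themselves do not confirm.

```python
# Totals (in hours) copied from the shuffle metrics above.
metrics_hours = {
    "shuffle write": 17.86,
    "compress": 31.52,
    "split": 1478.09,
    "spill": 15.77,
}
total_task_hours = 1899.3  # total time across all Gluten tasks

# ASSUMPTION: the four timers do not overlap, so summing them is meaningful.
reported_hours = sum(metrics_hours.values())
split_share_of_metrics = metrics_hours["split"] / reported_hours
split_share_of_tasks = metrics_hours["split"] / total_task_hours

print(f"split share of reported shuffle time: {split_share_of_metrics:.1%}")
print(f"split share of total task time: {split_share_of_tasks:.1%}")
```

Either way the split phase dominates, which matches what the flame graph shows for payload merging.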

shuffle schema:

Image

Gluten version

Gluten-1.3

Spark version

Spark-3.5.x

Spark configurations

No response

System information

No response

Relevant logs

Metadata

Labels: bug (Something isn't working), triage
