ENH: speed up wide DataFrame.line plots by using a single LineCollection #61764

Open: wants to merge 5 commits into base: main

@EvMossan commented Jul 3, 2025

What does this PR change?

  • Speeds up DataFrame.plot(kind="line") when the frame is “wide”.
    If the DataFrame has more than 200 columns and the plot is not a
    time-series plot, is not stacked, and has no error bars, all columns
    are now drawn with a single matplotlib.collections.LineCollection
    instead of one Line2D per column.
  • No API changes; behaviour is unchanged for smaller frames and for the
    excluded cases above.
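
For readers unfamiliar with the technique, here is a minimal standalone sketch of drawing many columns as one LineCollection. It is not the code from this PR: `plot_wide` is a hypothetical helper, the data are random, and pandas-specific handling (colours, labels, index conversion) is left out.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.collections import LineCollection


def plot_wide(ax, x, values):
    """Hypothetical helper: draw each column of `values` as one polyline
    inside a single LineCollection (values has shape rows x cols)."""
    n_rows, n_cols = values.shape
    # Shape (n_cols, n_rows, 2): one sequence of (x, y) points per column.
    xs = np.broadcast_to(x, (n_cols, n_rows))
    segments = np.stack([xs, values.T], axis=-1)
    lc = LineCollection(segments)
    ax.add_collection(lc)
    # Rescale the view explicitly; adding a collection may not do it.
    ax.autoscale_view()
    return lc


rng = np.random.default_rng(0)
values = rng.standard_normal((500, 2000)).cumsum(axis=0)
x = np.arange(values.shape[0])

fig, ax = plt.subplots()
lc = plot_wide(ax, x, values)
```

The axes end up holding one artist for all 2000 columns rather than 2000 Line2D objects, which is where the saving would come from.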

Performance numbers

| 500 rows × 2000 cols (RangeIndex) | master | this PR | speed-up |
|---|---|---|---|
| df.plot(legend=False) | 0.342 s | 0.056 s | 6.1× |

Benchmarked on pandas 3.0.0.dev0+2183.g94ff63adb2, matplotlib 3.10.3, NumPy 2.2.6
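
The PR does not say how the timings were taken. The sketch below is one plausible way to reproduce a comparable measurement, under stated assumptions: `time_plot` is a hypothetical helper, the frame is filled with random data, and only the `df.plot` call is timed (not rendering to a canvas). Absolute numbers will differ by machine and library version.

```python
import time

import matplotlib

matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def time_plot(df, repeat=3):
    """Hypothetical helper: best-of-`repeat` wall time of df.plot(legend=False)."""
    best = float("inf")
    for _ in range(repeat):
        fig, ax = plt.subplots()
        start = time.perf_counter()
        df.plot(ax=ax, legend=False)
        best = min(best, time.perf_counter() - start)
        plt.close(fig)  # release the figure so repeats do not accumulate
    return best


rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((500, 2000)))  # default RangeIndex
elapsed = time_plot(df)
print(f"df.plot(legend=False): {elapsed:.3f} s")
```

Running this once on main and once on the PR branch would give the two columns of the table.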

Notes

  • This PR does not change anything for DatetimeIndex plots—those remain on the original per-column path. A follow-up could combine LineCollection with the x_compat=True workaround (see #61398) to similarly speed up time-series plots.
  • The threshold (> 200 columns) is a heuristic and can be tuned in review.
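
For context on the first note: `x_compat` is an existing keyword of DataFrame.plot that pandas documents as a way to suppress its automatic tick resolution adjustment for time-series data. A small illustration with made-up data follows; it shows only the keyword itself, not what #61398 proposes and not any LineCollection follow-up.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
idx = pd.date_range("2025-01-01", periods=500, freq="D")
df = pd.DataFrame(rng.standard_normal((500, 5)).cumsum(axis=0), index=idx)

fig, ax = plt.subplots()
# x_compat=True asks pandas to skip its time-series tick handling.
df.plot(ax=ax, x_compat=True, legend=False)
```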

cc @shadnikn @arthurlw – happy to take any feedback 🙂

@EvMossan marked this pull request as ready for review (July 3, 2025 08:59)
@EvMossan marked this pull request as draft (July 3, 2025 09:00)
@EvMossan force-pushed the plot-linecollection-speedup branch from 41f3346 to 1cc672b (July 3, 2025 09:09)
@EvMossan force-pushed the plot-linecollection-speedup branch from 7bf84c2 to 0febdd9 (July 3, 2025 10:23)
@EvMossan marked this pull request as ready for review (July 3, 2025 11:19)
Comment on lines +1556 to +1563

    threshold = 200  # switch when DataFrame has more than this many columns
    can_use_lc = (
        not self._is_ts_plot()  # not a TS plot
        and not self.stacked  # stacking not requested
        and not com.any_not_none(*self.errors.values())  # no error bars
        and len(self.data.columns) > threshold
    )
    if can_use_lc:

Member

I would prefer not to have special-casing like this, because it is difficult to maintain parity between a "fast path" and the existing path.

Is there a way to refactor our plotting here to generalize the plotting to this form rather than the iterative approach below?
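
To make the parity concern concrete, here is a small matplotlib-only sketch (not pandas internals, data made up) of how the two drawing strategies leave the Axes in different states. Anything that inspects `ax.get_lines()` or builds a legend from per-column handles would see different objects depending on which path ran.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.collections import LineCollection

rng = np.random.default_rng(0)
n_rows, n_cols = 50, 5
x = np.arange(n_rows)
values = rng.standard_normal((n_rows, n_cols)).cumsum(axis=0)

fig, (ax_a, ax_b) = plt.subplots(1, 2)

# Iterative path: one labelled Line2D per column.
for j in range(n_cols):
    ax_a.plot(x, values[:, j], label=f"col{j}")

# Collection path: a single unlabelled LineCollection for all columns.
segments = np.stack([np.broadcast_to(x, (n_cols, n_rows)), values.T], axis=-1)
ax_b.add_collection(LineCollection(segments))
ax_b.autoscale_view()

handles_a, labels_a = ax_a.get_legend_handles_labels()
handles_b, labels_b = ax_b.get_legend_handles_labels()
print(len(ax_a.get_lines()), len(ax_b.get_lines()))
print(len(handles_a), len(handles_b))
```

A generalized implementation would have to decide how per-column labels, colours, and legend entries map onto the single collection so that both counts stay meaningful to callers.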

@EvMossan force-pushed the plot-linecollection-speedup branch from 2cfc0ba to 0febdd9 (July 4, 2025 08:34)