Closed
Labels
Performance (Memory or execution speed), Regression (Functionality that used to work in a prior pandas version)
Description
After #19664, to_sql() performance is highly dependent on the number of columns.
Problem description
For a single column, 0.23 is faster than 0.22. Adding more columns has little influence on 0.22 but quickly degrades performance on 0.23; for 20 columns, 0.23 takes 1.5 times longer than 0.22. The degradation happens regardless of the chunksize parameter.
import time
import numpy as np
import pandas as pd
from sqlalchemy import create_engine

N_COLS = 20
df = pd.DataFrame({n: np.arange(0, 20_000, 1) for n in range(N_COLS)})

# create the engine connecting pandas to the PostgreSQL database
engine = create_engine('postgresql://user:@localhost/db')
conn = engine.connect()

# write the DataFrame to a SQL table and time it
start = time.time()
df.to_sql('test', engine, if_exists='replace', chunksize=1_000)
print('WRITE: {}'.format(time.time() - start))

result = conn.execute("select * from test")
conn.close()
Tested on Linux with PostgreSQL 9.6, Python 3.6, SQLAlchemy 1.2.2.
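To see the column-count dependence directly, a minimal micro-benchmark (not from the original report) can loop over several column counts and time each to_sql() call separately. This sketch uses an in-memory SQLite engine so it runs without a Postgres server; absolute timings will differ from the Postgres numbers above, but the scaling trend with column count is what matters.

```python
import time

import numpy as np
import pandas as pd
from sqlalchemy import create_engine

# in-memory SQLite database, so no server setup is required
engine = create_engine('sqlite://')

timings = {}
for n_cols in (1, 5, 20):
    # 20,000 rows per column, matching the scale of the report above
    df = pd.DataFrame({str(n): np.arange(20_000) for n in range(n_cols)})
    start = time.time()
    df.to_sql('test', engine, if_exists='replace', chunksize=1_000)
    timings[n_cols] = time.time() - start

for n_cols, elapsed in sorted(timings.items()):
    print('{:2d} columns: {:.3f}s'.format(n_cols, elapsed))
```

Running this under both 0.22 and 0.23 (e.g. in two virtualenvs) makes the regression visible as a widening gap at higher column counts.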