Closed
Labels
Performance (Memory or execution speed), Regression (Functionality that used to work in a prior pandas version)
Description
After #19664, to_sql() performance is highly dependent on the number of columns.
Problem description
For a single column, 0.23 is faster than 0.22. Adding more columns has little influence on 0.22 but quickly degrades performance on 0.23; for 20 columns, 0.23 takes 1.5 times longer than 0.22. The degradation happens regardless of the chunksize parameter.
import time
import numpy as np
import pandas as pd
from sqlalchemy import create_engine

N_COLS = 20
df = pd.DataFrame({n: np.arange(0, 20_000, 1) for n in range(N_COLS)})

# create the engine connecting pandas to the PostgreSQL database
engine = create_engine('postgresql://user:@localhost/db')
conn = engine.connect()

# write the DataFrame to a SQL table and time it
start = time.time()
df.to_sql('test', engine, if_exists='replace', chunksize=1_000)
print('WRITE: {}'.format(time.time() - start))

result = conn.execute("select * from test")
conn.close()
Tested on Linux with PostgreSQL 9.6, Python 3.6, SQLAlchemy 1.2.2.
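To see the column-count dependence directly, a minimal micro-benchmark (not from the original report) can loop over several column counts and time each to_sql() call separately. This sketch uses an in-memory SQLite engine so it runs without a Postgres server; absolute timings will differ from the Postgres numbers above, but the scaling trend with column count is what matters.

```python
import time

import numpy as np
import pandas as pd
from sqlalchemy import create_engine

# in-memory SQLite database, so no server setup is required
engine = create_engine('sqlite://')

timings = {}
for n_cols in (1, 5, 20):
    # 20,000 rows per column, matching the scale of the report above
    df = pd.DataFrame({str(n): np.arange(20_000) for n in range(n_cols)})
    start = time.time()
    df.to_sql('test', engine, if_exists='replace', chunksize=1_000)
    timings[n_cols] = time.time() - start

for n_cols, elapsed in sorted(timings.items()):
    print('{:2d} columns: {:.3f}s'.format(n_cols, elapsed))
```

Running this under both 0.22 and 0.23 (e.g. in two virtualenvs) makes the regression visible as a widening gap at higher column counts.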