Description
Is your feature request related to a problem? Please describe.
Currently, the columns parameter can only be used to reorder columns, and the list has to match exactly the columns returned by the query or table. See this TODO:
python-bigquery-pandas/pandas_gbq/gbq.py
Lines 939 to 944 in 912b615
# TODO(kiraksi): allow columns to be a subset of all columns in the table, with follow up PR
if columns is not None:
    if sorted(columns) == sorted(final_df.columns):
        final_df = final_df[columns]
    else:
        raise InvalidColumnOrder("Column order does not match this DataFrame.")
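To illustrate what a relaxed check could look like, here is a minimal sketch that accepts any subset of the available columns instead of requiring an exact match. The helper name `resolve_columns` is hypothetical, and `InvalidColumnOrder` is redefined locally as a stand-in so the snippet runs on its own; this is not the library's implementation.

```python
class InvalidColumnOrder(ValueError):
    """Local stand-in for the exception raised in the snippet above."""


def resolve_columns(requested, available):
    """Hypothetical helper: validate a requested subset of columns.

    Returns the requested names in the caller's order, or every available
    name when nothing was requested. Raises if a requested name is unknown.
    """
    if requested is None:
        return list(available)

    available_set = set(available)
    missing = [name for name in requested if name not in available_set]
    if missing:
        raise InvalidColumnOrder(
            f"Columns not found in this DataFrame: {missing}"
        )
    return list(requested)
```

Under that assumption, the existing branch could become something like `final_df = final_df[resolve_columns(columns, final_df.columns)]`, which both selects and reorders.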
Describe the solution you'd like
When the user passes a list of columns to read_gbq, download only those columns.
For queries:
Maybe these still need an exact column match, since one can already choose the columns in SQL? I don't see a selected_fields option in https://cloud.google.com/python/docs/reference/bigquery/latest/google.cloud.bigquery.client.Client#google_cloud_bigquery_client_Client_query_and_wait
For table IDs:
Pass the list of columns through as selected_fields to https://cloud.google.com/python/docs/reference/bigquery/latest/google.cloud.bigquery.client.Client#google_cloud_bigquery_client_Client_list_rows
Starting here:
python-bigquery-pandas/pandas_gbq/gbq.py
Lines 914 to 919 in 912b615
final_df = connector.download_table(
    query_or_table,
    max_results=max_results,
    progress_bar_type=progress_bar_type,
    dtypes=dtypes,
)
python-bigquery-pandas/pandas_gbq/gbq.py
Line 396 in 912b615
rows_iter = self.client.list_rows(table_ref, max_results=max_results)
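A rough sketch of how the table-ID path might forward the columns follows. It assumes, per the list_rows reference linked above, that selected_fields expects SchemaField objects rather than plain names, so the names are looked up in the table schema first. The helper `list_rows_for_columns` and the `FakeClient` stand-in are hypothetical and exist only so the example runs without credentials; a real `google.cloud.bigquery.Client` exposes `get_table` and `list_rows` with these keyword arguments.

```python
from types import SimpleNamespace


def list_rows_for_columns(client, table_ref, columns=None, max_results=None):
    """Hypothetical helper: request only `columns` from a table."""
    if columns is None:
        # Current behavior: download every column.
        return client.list_rows(table_ref, max_results=max_results)

    # Map the requested names onto the table's schema fields.
    table = client.get_table(table_ref)
    by_name = {field.name: field for field in table.schema}
    missing = [name for name in columns if name not in by_name]
    if missing:
        raise ValueError(f"Columns not found in table: {missing}")

    selected_fields = [by_name[name] for name in columns]
    return client.list_rows(
        table, selected_fields=selected_fields, max_results=max_results
    )


class FakeClient:
    """Stand-in for bigquery.Client that records the last list_rows call."""

    def __init__(self, column_names):
        schema = [SimpleNamespace(name=name) for name in column_names]
        self._table = SimpleNamespace(schema=schema)
        self.last_selected = None

    def get_table(self, table_ref):
        return self._table

    def list_rows(self, table, selected_fields=None, max_results=None):
        self.last_selected = selected_fields
        return []


fake = FakeClient(["name", "year", "state"])
list_rows_for_columns(fake, "project.dataset.table", columns=["state", "name"])
requested = [field.name for field in fake.last_selected]
```

Whether the returned rows follow the order of selected_fields is not something this sketch relies on, so the final `final_df[columns]` reordering step would probably still be worth keeping.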
Additional context
Aside: https://googleapis.dev/python/pandas-gbq/latest/reading.html does not mention that a table ID is supported. We should add a sample there.