Description
There are several places where pandas has hidden heuristics or thresholds that dictate behavior which is neither obvious to nor configurable by the user. IIRC, there have been bugs in rolling and to_datetime that only showed up when the data contained a particular value or reached a certain size, which can be hard to diagnose.
Ideally we should:
- Not change behavior based on introspection of some characteristic of the data
- At least expose an option that lets the user control the heuristic
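A minimal sketch of the second option, assuming a hypothetical helper `choose_fast_path` with `force` and `threshold` parameters (neither is an existing pandas API). An explicit user choice overrides the size heuristic, and the cutoff becomes a visible parameter instead of a hidden constant.

```python
from typing import Optional, Sized

# Hypothetical hard-coded cutoff, in the style of the constants in this issue.
_HIDDEN_THRESHOLD = 1_000_000


def choose_fast_path(
    data: Sized,
    force: Optional[bool] = None,
    threshold: int = _HIDDEN_THRESHOLD,
) -> bool:
    """Return True if the (hypothetical) fast path should be used."""
    # An explicit user choice always wins over introspecting the data.
    if force is not None:
        return force
    # Otherwise fall back to the size heuristic, which is now at least
    # visible and overridable through the `threshold` parameter.
    return len(data) > threshold
```

With `force=None` the behavior is unchanged from a hidden constant, so existing callers are unaffected; users hitting a size-dependent bug can pin the path explicitly.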
CSV reading tokenizer chunksize
pandas/pandas/_libs/parsers.pyx
Line 119 in bb0403b
int64_t DEFAULT_CHUNKSIZE = 256 * 1024
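A sketch of what a fixed tokenizer chunk size means in practice, assuming the constant is the number of bytes read from the source per call (262144 bytes). `iter_chunks` is hypothetical and only illustrates the chunking, not the C tokenizer.

```python
import io
from typing import BinaryIO, Iterator

DEFAULT_CHUNKSIZE = 256 * 1024  # value from the excerpt above (262144 bytes)


def iter_chunks(stream: BinaryIO, chunksize: int = DEFAULT_CHUNKSIZE) -> Iterator[bytes]:
    """Yield successive raw byte chunks of at most `chunksize` bytes."""
    while True:
        chunk = stream.read(chunksize)
        if not chunk:
            break
        # A chunk boundary can fall in the middle of a row; a real tokenizer
        # must carry the partial line over to the next chunk (not done here).
        yield chunk
```

Because the boundary position depends on the byte length of the input, any bug in carrying partial rows across chunks only shows up for files larger than the chunk size.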
CSV line buffer size
pandas/pandas/_libs/parsers.pyx
Line 587 in bb0403b
heuristic = 2**20 // self.table_width
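A sketch of how a width-dependent buffer heuristic behaves, assuming the value targets roughly 2**20 cells per batch and that the number of buffered lines is then rounded down to a power of two. The rounding step is an assumption not shown in the excerpt, and `buffer_lines` is a hypothetical helper.

```python
def buffer_lines(table_width: int) -> int:
    """Rows to buffer per batch for a table with `table_width` columns."""
    if table_width <= 0:
        raise ValueError("table_width must be positive")
    # Target roughly 2**20 cells per batch, as in the excerpt above.
    heuristic = 2**20 // table_width
    # Assumption: use the largest power of two strictly below the target
    # (minimum 1 line).
    lines = 1
    while lines * 2 < heuristic:
        lines *= 2
    return lines
```

For a 10-column table this gives 65536 lines per batch, while a table with more than 2**20 columns degrades to a single line per batch, so batch boundaries shift with the shape of the data.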
Number of elements above which numexpr is used automatically
_MIN_ELEMENTS = 1_000_000
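A sketch of a size-based engine dispatch, assuming the threshold is compared with the total element count and that numexpr is only used when installed. `select_engine` and its `numexpr_available` flag are hypothetical.

```python
_MIN_ELEMENTS = 1_000_000  # threshold from the excerpt above


def select_engine(
    n_elements: int,
    numexpr_available: bool = True,
    min_elements: int = _MIN_ELEMENTS,
) -> str:
    """Return which evaluation engine a size-based dispatch would pick."""
    # Small inputs stay on the plain path, on the assumption that the
    # accelerated engine's overhead does not pay off below the threshold.
    if numexpr_available and n_elements > min_elements:
        return "numexpr"
    return "python"
```

Under this rule an input with exactly 1,000,000 elements and one with 1,000,001 go through different code paths, so any numerical or dtype difference between the engines appears only past that size.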
TimedeltaArray (TDA) iteration chunk size
pandas/pandas/core/arrays/timedeltas.py
Line 387 in bb0403b
chunksize = 10000
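A sketch of chunked boxing during iteration, assuming the chunk size bounds how many scalar objects are created per step. It uses stdlib `timedelta` with integer microseconds instead of nanoseconds, which is a simplification; `iter_timedeltas` is hypothetical.

```python
from datetime import timedelta
from typing import Iterator, Sequence

CHUNKSIZE = 10000  # value from the excerpt above


def iter_timedeltas(micros: Sequence[int], chunksize: int = CHUNKSIZE) -> Iterator[timedelta]:
    """Yield boxed timedelta objects, converting one chunk at a time."""
    for start in range(0, len(micros), chunksize):
        block = micros[start : start + chunksize]
        # At most one chunk of freshly boxed objects is materialized per step.
        converted = [timedelta(microseconds=m) for m in block]
        yield from converted
```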
PyTables-related thresholds
pandas/pandas/core/computation/pytables.py
Line 101 in bb0403b
_max_selectors = 31
Line 1887 in bb0403b
chunksize = 100000
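A sketch of a selector limit, assuming `_max_selectors` caps how many values are turned into an in-query OR condition before falling back to filtering after the read. `build_selection` is hypothetical and the expression syntax is only loosely modeled on PyTables conditions.

```python
from typing import List, Sequence, Tuple, Union

_max_selectors = 31  # value from the excerpt above


def build_selection(
    column: str, values: Sequence[object], max_selectors: int = _max_selectors
) -> Tuple[str, Union[str, List[object]]]:
    """Return ("condition", expr) for few values, else ("filter", values)."""
    if len(values) <= max_selectors:
        # Few enough values: express membership as an OR of equality terms.
        expr = " | ".join(f"({column} == {v!r})" for v in values)
        return "condition", expr
    # Too many values: select everything and filter after reading instead.
    return "filter", list(values)
```

Selecting 31 values and 32 values would then take different paths, with different performance and potentially different edge-case behavior.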
Number of elements that determines when to_datetime automatically uses caching
pandas/pandas/core/tools/datetimes.py
Line 124 in bb0403b
start_caching_at = 50
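A simplified sketch of a caching decision, assuming inputs at or below `start_caching_at` are never cached and that larger inputs are cached only when a prefix sample shows enough repeated values. `should_cache`, `sample_size`, and `unique_share` are hypothetical here.

```python
from itertools import islice
from typing import Hashable, Sequence

start_caching_at = 50  # value from the excerpt above


def should_cache(
    values: Sequence[Hashable], sample_size: int = 500, unique_share: float = 0.7
) -> bool:
    """Decide whether converting through a cache of unique values pays off."""
    # Small inputs never use the cache.
    if len(values) <= start_caching_at:
        return False
    # Look at a prefix sample and only cache if it has enough repetition.
    sample = list(islice(values, sample_size))
    return len(set(sample)) <= len(sample) * unique_share
```

Under these assumptions 50 identical strings take the uncached path and 51 take the cached one, which is exactly the kind of size-dependent switch that makes bugs hard to reproduce.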
Chunk size used when writing CSV
pandas/pandas/io/formats/csvs.py
Line 166 in bb0403b
return (100000 // (len(self.cols) or 1)) or 1
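The expression above can be read as "about 100,000 cells per chunk, at least one row". A sketch assuming it gives the number of rows written per chunk; `csv_chunksize` and `iter_row_chunks` are hypothetical helpers.

```python
from typing import Iterator, Tuple


def csv_chunksize(n_cols: int) -> int:
    # Roughly 100,000 cells per chunk; `or 1` guards against zero columns
    # and against a zero result for very wide frames.
    return (100000 // (n_cols or 1)) or 1


def iter_row_chunks(n_rows: int, n_cols: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, stop) row ranges that would be written per chunk."""
    size = csv_chunksize(n_cols)
    for start in range(0, n_rows, size):
        yield start, min(start + size, n_rows)
```

A 3-column frame is written 33,333 rows at a time, whereas a frame with 40,000 columns is written 2 rows at a time.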
Number of compiled regexes cached when parsing times
pandas/pandas/_libs/tslibs/strptime.pyx
Line 576 in bb0403b
_CACHE_MAX_SIZE = 5  # Max number of regexes stored in _regex_cache
Rank tolerance
Line 61 in bb0403b
float64_t FP_ERR = 1e-13
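A sketch of average ranking with a tie tolerance, assuming adjacent sorted values whose absolute difference is at most `FP_ERR` are treated as ties. Whether pandas compares absolutely and between neighbors in exactly this way is an assumption; `rank_average` is hypothetical.

```python
from typing import List, Sequence

FP_ERR = 1e-13  # value from the excerpt above


def rank_average(values: Sequence[float], tol: float = FP_ERR) -> List[float]:
    """Average (1-based) ranks, treating near-equal neighbors as ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    n = len(order)
    start = 0
    for i in range(1, n + 1):
        # Close the current tie group at the end or when the gap exceeds tol.
        if i == n or abs(values[order[i]] - values[order[i - 1]]) > tol:
            avg = (start + 1 + i) / 2  # mean of ranks start+1 .. i
            for j in range(start, i):
                ranks[order[j]] = avg
            start = i
    return ranks
```

Because neighbors are compared pairwise, a long run of values each within `tol` of the next collapses into one tie group even if its ends differ by more than `tol`, and the result depends on the magnitude of the data since the tolerance is absolute.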
isin algorithm selection
pandas/pandas/core/algorithms.py
Line 521 in bb0403b
len(comps_array) > 1_000_000
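A sketch of a length-based switch between two membership strategies. Only the length cutoff comes from the excerpt; the sorted/bisect path is a stand-in, not pandas' actual large-input algorithm, and all helper names are hypothetical. The sorted path also requires mutually comparable values.

```python
from bisect import bisect_left
from typing import Hashable, List, Sequence

_LARGE_COMPS = 1_000_000  # length cutoff from the excerpt above


def _isin_hash(comps: Sequence[Hashable], values: Sequence[Hashable]) -> List[bool]:
    lookup = set(values)
    return [c in lookup for c in comps]


def _isin_sorted(comps: Sequence, values: Sequence) -> List[bool]:
    sorted_vals = sorted(values)
    out = []
    for c in comps:
        i = bisect_left(sorted_vals, c)
        out.append(i < len(sorted_vals) and sorted_vals[i] == c)
    return out


def isin(comps: Sequence, values: Sequence, cutoff: int = _LARGE_COMPS) -> List[bool]:
    # The strategy depends only on the input length, not on the user.
    if len(comps) > cutoff:
        return _isin_sorted(comps, values)
    return _isin_hash(comps, values)
```

In this sketch the two paths disagree on NaN: set membership matches the identical NaN object, while the equality check in the sorted path does not. Lowering `cutoff` to 0 reproduces the large-input behavior on tiny data, which is the kind of knob this issue asks for.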
Value formatting
pandas/pandas/io/formats/format.py
Line 1562 in bb0403b
has_large_values = (abs_vals > 1e6).any()
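A sketch of a formatting switch driven by the data, assuming that any absolute value above 1e6 moves the whole column to scientific notation. Other conditions pandas may also check are omitted, and `format_floats` is hypothetical.

```python
from typing import List, Sequence


def format_floats(values: Sequence[float], digits: int = 6) -> List[str]:
    """Format all values with one shared notation chosen from the data."""
    abs_vals = [abs(v) for v in values]
    # One large value switches the whole column to scientific notation.
    has_large_values = any(a > 1e6 for a in abs_vals)
    fmt = f"{{:.{digits}e}}" if has_large_values else f"{{:.{digits}f}}"
    return [fmt.format(v) for v in values]
```

Adding a single row with a value above 1e6 changes how every other row is displayed.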
Number of elements that determines whether a hash table is populated
Line 99 in bb0403b
_SIZE_CUTOFF = 1_000_000
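A sketch of a lookup structure that skips building a hash table for large, strictly increasing keys and uses binary search instead. That this is how `_SIZE_CUTOFF` is used is an assumption, as is the `>=` comparison; the `Lookup` class and its attributes are hypothetical.

```python
from bisect import bisect_left
from typing import Dict, Hashable, Iterable, Optional

_SIZE_CUTOFF = 1_000_000  # value from the excerpt above


class Lookup:
    def __init__(self, keys: Iterable[Hashable], size_cutoff: int = _SIZE_CUTOFF) -> None:
        self.keys = list(keys)
        self.is_monotonic = all(a < b for a, b in zip(self.keys, self.keys[1:]))
        self.over_size_threshold = len(self.keys) >= size_cutoff
        self._table: Optional[Dict[Hashable, int]] = None

    def get_loc(self, key: Hashable) -> int:
        if self.over_size_threshold and self.is_monotonic:
            # Large sorted keys: binary search, no hash table is built.
            i = bisect_left(self.keys, key)
            if i < len(self.keys) and self.keys[i] == key:
                return i
            raise KeyError(key)
        # Otherwise lazily populate a hash table on first lookup.
        if self._table is None:
            self._table = {k: i for i, k in enumerate(self.keys)}
        return self._table[key]
```

Both paths should return the same positions for unique keys, but their memory use and failure modes differ, and which one runs depends only on the key count.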