REDCap-EDA is a command-line tool for performing Exploratory Data Analysis (EDA) on REDCap datasets. It automates data inspection, schema enforcement, statistical analysis, visualization, and report generation.
- ✅ Automatic Data Type Enforcement (casts columns based on a predefined or user-defined schema)
- 📊 Summary Statistics (mean, median, std dev, outliers, categorical distributions)
- 📉 Visualizations (histograms, box plots, categorical distributions, time trends, word clouds)
- 📂 Comprehensive PDF Report Generation with UnifiedReport
- 🔄 Multiprocessing for Faster Execution
- 🔍 Progress Bars with tqdm
- 📂 Exports Reports (JSON, PDF, and saved visualizations)
- 📝 Interactive Schema Creation for custom datasets
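
To make the first two features concrete, here is a minimal, standard-library-only sketch of schema-based type casting followed by summary statistics with IQR-based outlier detection. It is an illustration of the general technique, not REDCap-EDA's implementation: the `SCHEMA` layout, the type labels, the helper names `cast_rows` and `summarize`, the 1.5 × IQR outlier rule, and the sample data are all assumptions made for this example.

```python
import csv
import io
import statistics
from datetime import datetime

# Hypothetical schema: column name -> type label (not REDCap-EDA's actual format).
SCHEMA = {"age": "int", "weight_kg": "float", "visit_date": "datetime", "site": "str"}

# Map each type label to a callable that parses a string value.
CASTERS = {
    "int": int,
    "float": float,
    "datetime": datetime.fromisoformat,
    "str": str,
}


def cast_rows(rows, schema):
    """Cast each value per the schema; blank or unparseable values become None."""
    typed = []
    for row in rows:
        out = {}
        for col, value in row.items():
            caster = CASTERS[schema.get(col, "str")]  # unknown columns stay strings
            try:
                out[col] = caster(value) if value != "" else None
            except (TypeError, ValueError):
                out[col] = None
        typed.append(out)
    return typed


def summarize(values):
    """Mean, median, sample std dev, and 1.5 * IQR outliers for a numeric column."""
    data = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "std": statistics.stdev(data),
        "outliers": [v for v in data if v < low or v > high],
    }


# Made-up sample data for illustration only.
RAW = """age,weight_kg,visit_date,site
34,70.5,2024-01-15,A
41,82.0,2024-02-01,B
29,,2024-02-20,A
38,68.2,2024-03-05,B
36,75.0,2024-03-18,A
95,71.3,2024-04-02,B
"""

rows = cast_rows(csv.DictReader(io.StringIO(RAW)), SCHEMA)
age_summary = summarize([r["age"] for r in rows])
print(age_summary)
```

Casting before analysis means a blank cell becomes an explicit missing value (`None`) rather than an empty string, so the numeric summary skips it instead of failing on it.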
pip install redcap-eda
redcap-eda analyze --sample
redcap-eda analyze --sample --sample-schema
redcap-eda analyze --csv path/to/your_data.csv
redcap-eda analyze --csv path/to/your_data.csv --schema path/to/schema.json
redcap-eda --debug analyze --sample
redcap-eda list-cases
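
If you want to drive these commands from a Python script (for example, to analyze several CSV files in a loop), the sketch below assembles the argument list and calls the CLI with `subprocess`. It uses only the flags shown above; the helper names `build_command` and `run_analysis` are hypothetical and are not part of REDCap-EDA, and the sketch assumes `redcap-eda` is on your `PATH`.

```python
import subprocess


def build_command(csv_path=None, schema_path=None, sample=False,
                  sample_schema=False, debug=False):
    """Assemble an argv list for `redcap-eda analyze` from the flags shown above."""
    cmd = ["redcap-eda"]
    if debug:
        # --debug is placed before the subcommand, mirroring the example above.
        cmd.append("--debug")
    cmd.append("analyze")
    if sample:
        cmd.append("--sample")
    if sample_schema:
        cmd.append("--sample-schema")
    if csv_path:
        cmd += ["--csv", str(csv_path)]
    if schema_path:
        cmd += ["--schema", str(schema_path)]
    return cmd


def run_analysis(**kwargs):
    """Run the CLI; subprocess raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_command(**kwargs), check=True)


# Example (not executed here):
# run_analysis(csv_path="path/to/your_data.csv", schema_path="path/to/schema.json")
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with paths that contain spaces.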
.
├── Makefile                            # Helper commands
├── README.md                           # Project documentation
├── dist                                # Distribution files for PyPI
├── mypy.ini                            # Type checking configuration
├── poetry.lock                         # Poetry dependency lock file
├── pyproject.toml                      # Poetry project configuration
├── schemas                             # Saved schema files
│   └── schema_sample_dataset.json
├── src
│   ├── logs
│   │   └── redcap_eda.log              # Log files
│   └── redcap_eda
│       ├── analysis                    # EDA analysis modules
│       │   ├── categorical
│       │   │   └── mixins.py           # Categorical data analysis
│       │   ├── datetime
│       │   │   └── mixins.py           # Datetime data analysis
│       │   ├── eda.py                  # Main EDA module
│       │   ├── json_report_handler.py  # JSON export utility
│       │   ├── lib.py                  # Shared data structures (e.g., AnalysisResult)
│       │   ├── missing
│       │   │   └── mixins.py           # Missing data analysis
│       │   ├── numerical
│       │   │   └── mixins.py           # Numerical data analysis
│       │   └── text
│       │       └── mixins.py           # Text data analysis
│       ├── cast_schema.py              # Schema enforcement
│       ├── cli.py                      # Command-line interface
│       ├── load_case_data.py           # Dataset loader
│       ├── logger.py                   # Logging utilities
│       └── unified_report.py           # PDF report generation
└── tests                               # Unit tests
    ├── __init__.py
    └── fixtures
        └── toy_data.csv                # Sample test data
- Fork the repository and create a feature branch.
- Run tests to ensure code integrity:
poetry run pytest tests/
- Submit a pull request with a detailed description.
This project is licensed under the MIT License.
- REDCap for enabling structured data collection.
- The Open Source Community for inspiration & contributions!