Hamilton

Submitting Author: Stefan Krawczyk (@skrawcz)
All current maintainers: (@skrawcz, @elijahbenizzy)
Package Name: Hamilton (sf-hamilton on pypi)
One-Line Description of Package: A general purpose micro-framework for defining dataflows.
Repository Link:  https://github.com/stitchfix/hamilton
Version submitted:   1.15.0
Editor: TBD  
Reviewer 1: TBD  
Reviewer 2: TBD  
Archive: TBD  
Version accepted: TBD 
Date accepted (month/day/year): TBD

---

## Description

- Include a brief paragraph describing what your package does:
Hamilton is a general purpose micro-framework for creating dataflows from python functions! Rather than write procedural python code in scripts, one declares python functions that together define a Directed Acyclic Graph (DAG). It was originally built to solve the (micro) challenges in wrangling and maintaining production code to create wide (1000+) column dataframes, but has been extended to enable modeling any python object generation. Hamilton aims for code clarity, low code upkeep costs, ease of modification, with always unit testable and naturally documentable code.

## Scope 
- Please indicate which [category or categories][PackageCategories] this package falls under:
	- [x] Data retrieval
	- [x] Data extraction
	- [x] Data munging
	- [ ] Data deposition
	- [ ] Reproducibility
	- [ ] Geospatial
	- [ ] Education
	- [ ] Data visualization*


- **For all submissions**, explain how the and why the package falls under the categories you indicated above. In your explanation, please address the following points (briefly, 1-2 sentences for each):  
  - Who is the target audience and what are scientific applications of this package?  
  - Are there other Python packages that accomplish the same thing? If so, how does yours differ?
  - If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or `@tag` the editor you contacted: 

Pre-submission enquiry - https://github.com/pyOpenSci/software-submission/issues/74
> Who is the target audience and what are scientific applications of this package?  

Broadly speaking, anyone doing any data transformations in python. This covers writing code to retrieve, extract, and munge data. Hamilton wants to help you structure the code that you write for these tasks.

Scientific applications: primarily feature engineering (retrieve, extract, munge), and then connecting it with modeling, e.g. time-series forecasting, machine learning; basically any work that involves executing a dataflow.

> data munging

Hamilton was built for a team to manage their time-series forecasting feature engineering. So it's design goal was to help data science teams maintain data munging code well.

> data extraction
> data retrieval

I see these two as the precursor steps to `data munging`. With Hamilton you can "model" these as steps in your dataflow and connect it with data munging. Hamilton does not provide any special features here, other than helping you structure you code in a way to ensure that these pieces are modular and can be swapped out easily.

> Are there other Python packages that accomplish the same thing? If so, how does yours differ?

Hamilton is a (Directed Acyclic Graph) DAG framework, so has similarities with other such systems that think similarly, e.g. airflow, dagster, etc. Where Hamilton differs, is that Hamilton focuses on the "micro", whereas these frameworks focus on the "macro". E.g. with Hamilton you can model feature engineering as multiple nodes in a DAG and Hamilton can help construct the dataframe that you want. You can't model that process with airflow, instead airflow wants you to string together tasks that represent "macro" things you might want to do, e.g. extract features, train a model, predict. Because of Hamilton's focus on the micro, it's actually embeddable within these "macro" frameworks, and in general, anywhere that python can run, e.g. jupyter notebooks, pyspark, python scripts, python webservices. For example, here's a blog post showing [Hamilton + Metaflow](https://outerbounds.com/blog/developing-scalable-feature-engineering-dags/) - Hamilton helps with the feature engineering task, and metaflow does the macro orchestration. 

Hamilton is similar in spirit to Pandera. Hamilton has a focus on testable, documentation friendly, and decoupled code, and it's only a library not a whole system that needs to be setup and managed. In this way it's similar in spirit to Pandera -- which is trying to make it simpler to maintain data pipelines. Of course Hamilton provides an integration with Pandera!


## Technical checks

For details about the pyOpenSci packaging requirements, see our [packaging guide][PackagingGuide]. Confirm each of the following by checking the box.  This package:

- [x] does not violate the Terms of Service of any service it interacts with. 
- [x] has an [OSI approved license][OsiApprovedLicense].
- [x] contains a README with instructions for installing the development version. See [here](https://github.com/stitchfix/hamilton/blob/main/developer_setup.md).
- [x] includes documentation with examples for all functions. 
- [x] contains a vignette with examples of its essential functions and uses. 
- [x] has a test suite.
- [x] has continuous integration, such as Travis CI, AppVeyor, CircleCI, and/or others.

## Publication options

- [x] Do you wish to automatically submit to the [Journal of Open Source Software][JournalOfOpenSourceSoftware]? If so:

<details>
 <summary>JOSS Checks</summary>  

- [x] The package has an **obvious research application** according to JOSS's definition in their [submission requirements][JossSubmissionRequirements]. Be aware that completing the pyOpenSci review process **does not** guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS.
- [x] The package is not a "minor utility" as defined by JOSS's [submission requirements][JossSubmissionRequirements]: "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria.
- [ ] The package contains a `paper.md` matching [JOSS's requirements][JossPaperRequirements] with a high-level description in the package root or in `inst/`.
- [ ] The package is deposited in a long-term repository with the DOI: 

*Note: Do not submit your package separately to JOSS*
  
</details>

## Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?
This option will allow reviewers to open smaller issues that can then be linked to PR's rather than submitting a more dense text based review. It will also allow you to demonstrate addressing the issue via PR links.

- [x] Yes I am OK with reviewers submitting requested changes as issues to my repo. Reviewers will then link to the issues in their submitted review.

## Code of conduct

- [x] I agree to abide by [pyOpenSci's Code of Conduct][PyOpenSciCodeOfConduct] during the review process and in maintaining my package should it be accepted.

## Please fill out our survey
- [x] [Last but not least please fill out our pre-review survey](https://forms.gle/F9mou7S3jhe8DMJ16). This helps us track
submission and improve our peer review process. We will also ask our reviewers 
and editors to fill this out.

**P.S.** *Have feedback/comments about our review process? Leave a comment [here][Comments]


## Editor and Review Templates

The [editor template can be found here][Editor Template].

The [review template can be found here][Review Template].

[PackagingGuide]: https://www.pyopensci.org/contributing-guide/authoring/index.html#packaging-guide

[PackageCategories]: https://www.pyopensci.org/contributing-guide/open-source-software-peer-review/aims-and-scope.html?highlight=data#package-categories

[NotesOnCategories]: https://www.pyopensci.org/contributing-guide/open-source-software-peer-review/aims-and-scope.html?highlight=data#notes-on-categories


[JournalOfOpenSourceSoftware]: http://joss.theoj.org/

[JossSubmissionRequirements]: https://joss.readthedocs.io/en/latest/submitting.html#submission-requirements

[JossPaperRequirements]: https://joss.readthedocs.io/en/latest/submitting.html#what-should-my-paper-contain

[PyOpenSciCodeOfConduct]: https://www.pyopensci.org/contributing-guide/open-source-software-peer-review/code-of-conduct.html?highlight=code%20conduct

[OsiApprovedLicense]: https://opensource.org/licenses

[Editor Template]: https://www.pyopensci.org/peer-review-guide/software-peer-review-guide/editors-guide.html#respond-to-the-submitter-in-the-github-issue

[Review Template]: https://www.pyopensci.org/peer-review-guide/software-peer-review-guide/reviewer-guide.html#peer-review-template

[Comments]: https://github.com/pyOpenSci/governance/issues/8


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Hamilton #80

Description

Scope

Technical checks

Publication options

Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?

Code of conduct

Please fill out our survey

Editor and Review Templates

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Hamilton #80

Description

Description

Scope

Technical checks

Publication options

Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?

Code of conduct

Please fill out our survey

Editor and Review Templates

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions