Download mnist.tar.gz if it is missing #55
This commit removes the data/ subdirectory and adds *.dat files to the .gitignore file.
After this commit is merged, it would be a good idea to use a git repo surgery tool to remove all copies of mnist.tar.gz and any other binary files from the repository's history. @everythingfunctional recently did this on another project and might have advice about what tool to use. After the binary files are removed from the commit history, anyone who has a local copy of this repository will need to obtain a fresh clone, especially if they will ever push commits to this repository or submit a pull request from a fork created before the binary files were removed.
Thanks, Damian. I moved the subroutine to the MNIST submodule because mod_io provides more general functions to read binary files. Let me know if it looks good. I'll merge tomorrow if no objections. Eventually we should generalize and move the downloader function into its own module as there will be more datasets to download.
@milancurcic looks good to me. Thanks for approving the PR.
LGTM
@milancurcic let me know if you would like any refinements to this PR before it's merged. Examples:
These are the sorts of things I would do if I were writing a shell script to download the code, but I'm on the fence as to whether these would be improvements for several reasons:
No, let's do any refinements in separate PRs. I prefer small PRs. We already check for presence of one .dat file to determine whether to download or not. This assumes that all files are present and correctly downloaded. We can improve this, but in a separate PR. As you work with me you'll find that I'm all about "good enough" and "mostly working" solutions, and never about bullet-proof or perfect solutions. This allows me to find out what's a real problem and what isn't, instead of guessing.
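For readers following along, the "check for one .dat file, otherwise download" logic described above might look roughly like the sketch below. This is not the code in this PR (which is Fortran); it is a Python illustration only. The function name `ensure_dataset`, its parameters, and the injectable `fetch` argument are all hypothetical, and it deliberately keeps the same assumption noted above: if the one sentinel file exists, everything is treated as present and correctly downloaded.

```python
import os
import tarfile
import urllib.request


def ensure_dataset(data_dir, sentinel_name, archive_url,
                   fetch=urllib.request.urlretrieve):
    """Download and unpack an archive only if a sentinel file is missing.

    All names here are hypothetical. Returns True if a download
    happened and False if it was skipped.
    """
    sentinel_path = os.path.join(data_dir, sentinel_name)
    if os.path.exists(sentinel_path):
        # "Good enough" check: one file present is taken to mean
        # that all files are present and intact.
        return False

    os.makedirs(data_dir, exist_ok=True)
    archive_path = os.path.join(data_dir, os.path.basename(archive_url))
    fetch(archive_url, archive_path)  # save the archive next to the data
    with tarfile.open(archive_path) as archive:
        archive.extractall(data_dir)  # unpack into data_dir
    return True
```

A call such as `ensure_dataset("data", "some_file.dat", "https://example.invalid/mnist.tar.gz")` would download on the first run and do nothing on later runs; the file name and URL in that call are placeholders, not the project's real ones. Verifying every extracted file or a checksum would be the kind of refinement deferred to a separate PR.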
@milancurcic We are very much aligned: I frequently advocate for getting working solutions out as early as possible and then making refinements based on the feedback of users or other developers.
This PR resolves issue #54 and contains all of the commits from PR #51 because it modifies submodules that won't exist on the main branch until #51 gets merged.