Skip to content

Download mnist.tar.gz if it is missing #55

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Apr 18, 2022

Conversation

rouson
Copy link
Collaborator

@rouson rouson commented Apr 17, 2022

This PR resolves issue #54 and contains all of the commits from PR #51 because it modifies submodules that won't exist on the main branch until #51 gets merged.

@rouson rouson requested a review from milancurcic April 17, 2022 00:14
@rouson
Copy link
Collaborator Author

rouson commented Apr 17, 2022

After this commit gets merged, it would be a good idea to use a git repo surgery tool to remove all copies of mnist.tar.gz and any other binary files from the repositories history. @everythingfunctional recently did this on another project and might have advice about what tool to use. After removing all binary files from the commit history, anyone who has a local copy of this repository will need to obtain a fresh clone -- especially if they will ever push commits to this repository or submit a pull request from a fork created before the binary files were removed.

@milancurcic milancurcic added the enhancement New feature or request label Apr 18, 2022
@milancurcic
Copy link
Member

Thanks, Damian. I moved the subroutine to the MNIST submodule because mod_io provides more general functions to read binary files. Let me know if it looks good. I'll merge tomorrow if no objections.

Eventually we should generalize and move the downloader function into its own module as there will be more datasets to download.

@rouson
Copy link
Collaborator Author

rouson commented Apr 18, 2022

@milancurcic looks good to me. Thanks for approving the PR.

Copy link
Collaborator Author

@rouson rouson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@rouson
Copy link
Collaborator Author

rouson commented Apr 18, 2022

@milancurcic let me know if you would like any refinements to this PR before it's merged. Examples:

  1. The code could check whether mnist.tar.gz is present, and if so, not download it.
  2. The code could prompt the user for permission to download the file before doing so.

These are the sorts of things I would do if I were writing a shell script to download the code, but I'm on the fence as to whether these would be improvements for several reasons:

  1. Each execute_command_line call requires assumptions about what type of OS shell is being used (must be Unix-like) what's available on the system and what commands are available in that shell (e.g., curl and ls).
  2. execute_command_line requires verbose arguments and verbose error handling, some of which might not even work depending on the system behavior (the error-handling code in this PR seemed to end up being a no-op in macOS -- I haven't tested it in Linux).
  3. Even if the file is present, it might be from a failed or incomplete download so presence of the file doesn't necessary guarantee the integrity of the file, which is why I defaulted to just doing the download if the uncompressed files are missing.

@milancurcic
Copy link
Member

No, let's do any refinements in separate PRs. I prefer small PRs.

We already check for presence of one .dat file to determine whether to download or not. This assumes that all files are present and correctly downloaded. We can improve this, but in a separate PR.

As you work with me you'll find that I'm all about "good enough" and "mostly working" solutions, and never about bullet-proof or perfect solutions. This allows me to find out what's a real problem and what isn't, instead of guessing.

@rouson
Copy link
Collaborator Author

rouson commented Apr 18, 2022

@milancurcic We are very much aligned: I frequently advocate for getting working solutions out as early as possible and then making refinements based on the feedback of users or other developers.

@rouson rouson merged commit 60b5357 into modern-fortran:main Apr 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants