This repository was archived by the owner on Jul 16, 2025. It is now read-only.

noahgift/rdedupe


RDedupe

A Rust-based file deduplication tool.

Goals

  • Build a multiplatform, fast deduplication tool that uses Rust parallelization.
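The goal above can be illustrated with a minimal sketch of parallel duplicate detection: hash each file's contents on worker threads, then group paths that share a hash. This is not RDedupe's implementation. It uses `std::thread::scope` instead of Rayon so that it has no dependencies, `DefaultHasher` stands in for a proper content digest, and the names `hash_file` and `find_duplicates` are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::fs;
use std::hash::Hasher;
use std::io;
use std::path::{Path, PathBuf};
use std::thread;

/// Hash a file's bytes. DefaultHasher is only a stand-in here: it is not
/// collision resistant, so a real tool would use a stronger digest.
fn hash_file(path: &Path) -> io::Result<u64> {
    let bytes = fs::read(path)?;
    let mut hasher = DefaultHasher::new();
    hasher.write(&bytes);
    Ok(hasher.finish())
}

/// Hash `paths` on up to `workers` threads and return the groups of paths
/// whose contents hash to the same value. Unreadable files are skipped.
fn find_duplicates(paths: &[PathBuf], workers: usize) -> Vec<Vec<PathBuf>> {
    let workers = workers.max(1);
    // Ceiling division, clamped to 1 because `chunks(0)` would panic.
    let chunk_size = ((paths.len() + workers - 1) / workers).max(1);

    let hashed: Vec<(u64, PathBuf)> = thread::scope(|s| {
        // Spawn one scoped thread per chunk before joining any of them.
        let handles: Vec<_> = paths
            .chunks(chunk_size)
            .map(|chunk| {
                s.spawn(move || {
                    chunk
                        .iter()
                        .filter_map(|p| hash_file(p).ok().map(|h| (h, p.clone())))
                        .collect::<Vec<_>>()
                })
            })
            .collect();
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    });

    // Group by hash and keep only groups with more than one member.
    let mut groups: HashMap<u64, Vec<PathBuf>> = HashMap::new();
    for (hash, path) in hashed {
        groups.entry(hash).or_default().push(path);
    }
    let mut dupes: Vec<Vec<PathBuf>> =
        groups.into_values().filter(|g| g.len() > 1).collect();
    // Sort for deterministic output regardless of thread scheduling.
    for group in &mut dupes {
        group.sort();
    }
    dupes.sort();
    dupes
}
```

With Rayon, the chunking and thread management would typically collapse into a single parallel iterator over `paths`; the grouping step stays the same.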

Current Status

  • Added Rayon parallelization
  • Added a progress bar
  • Added Polars DataFrame
  • Added statistics about files, with an optional CSV report

Future Improvements

  • Add a GUI
  • Add a web interface
  • Fix the GitHub Actions build process so it does not fail silently
  • Store logs about actions performed across multiple runs

Building and Running

  • Build: cd into rdedupe and run make all
  • Run: cargo run -- dedupe --path tests --pattern .txt
  • Run tests: make test
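The run command takes a directory (`--path`) and a pattern (`--pattern`). As a rough sketch of what that file-selection step could look like, the function below walks a directory tree and keeps files whose names contain the pattern. Treating the pattern as a plain substring match is an assumption, as is the name `collect_matching`; the actual tool may match differently.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Recursively collect files under `root` whose file name contains `pattern`.
/// Assumption: `pattern` is a plain substring such as ".txt", not a glob or regex.
fn collect_matching(root: &Path, pattern: &str) -> io::Result<Vec<PathBuf>> {
    let mut found = Vec::new();
    // Explicit stack of directories still to visit (avoids recursion).
    let mut pending = vec![root.to_path_buf()];
    while let Some(dir) = pending.pop() {
        for entry in fs::read_dir(&dir)? {
            let entry = entry?;
            let path = entry.path();
            // `file_type` does not follow symlinks, so linked directories
            // are not descended into and cannot cause cycles.
            if entry.file_type()?.is_dir() {
                pending.push(path);
            } else if entry.file_name().to_string_lossy().contains(pattern) {
                found.push(path);
            }
        }
    }
    // Sort so the result does not depend on directory iteration order.
    found.sort();
    Ok(found)
}
```

Calling `collect_matching(Path::new("tests"), ".txt")` would mirror the arguments in the run command above and return the candidate files to compare.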

OS X Install

  • Install Rust via rustup
  • Add the following to ~/.cargo/config (named ~/.cargo/config.toml in newer Cargo versions):
[target.x86_64-apple-darwin]
rustflags = [
  "-C", "link-arg=-undefined",
  "-C", "link-arg=dynamic_lookup",
]

[target.aarch64-apple-darwin]
rustflags = [
  "-C", "link-arg=-undefined",
  "-C", "link-arg=dynamic_lookup",
]
  • Run make all in the rdedupe directory
