High-performance code similarity detection tools written in Rust. They detect duplicate functions and similar code patterns across your codebase in multiple programming languages.
Tool | Language | Status | Description |
---|---|---|---|
similarity-ts | TypeScript/JavaScript | ✅ Production Ready | Most mature and production-tested |
similarity-py | Python | Not production-tested yet | |
similarity-rs | Rust | Not production-tested yet | |
similarity-elixir | Elixir | 🧪 Experimental | Early development stage |
similarity-generic | Go, Java, C/C++, C#, Ruby | 🧪 Experimental | Early development stage |
similarity-md | Markdown | 🧪 Experimental | Early development stage |
- Zero configuration - works out of the box
- Multi-language support - TypeScript/JavaScript, Python, and Rust, plus experimental support for Elixir, Go, Java, C/C++, C#, Ruby, and Markdown
- Fast & Accurate - AST-based comparison, not just text matching
- AI-friendly output - Easy to share with Claude, GPT-4, etc.
cargo install similarity-ts
# Scan current directory
similarity-ts .
# Scan specific files
similarity-ts src/utils.ts src/helpers.ts
# Show actual code
similarity-ts . --print
Copy the output and use this prompt with Claude:
Run `similarity-ts .` to detect semantic code similarities. Execute this command, analyze the duplicate code patterns, and create a refactoring plan. Check `similarity-ts -h` for detailed options.
Example output:
Duplicates in src/utils.ts:
────────────────────────────────────────────────────────────
src/utils.ts:10-20 calculateTotal <-> src/helpers.ts:5-15 computeSum
Similarity: 92.50%, Score: 9.2 points
The AI will analyze patterns and suggest refactoring strategies.
- AI Assistant Guide - Refactoring workflow and best practices
- similarity-ts - TypeScript/JavaScript similarity detection ✅ Most mature and production-tested
- similarity-py - Python similarity detection ⚠️ Not production-tested
- similarity-rs - Rust similarity detection ⚠️ Not production-tested
- similarity-elixir - Elixir similarity detection 🧪 Experimental
- similarity-generic - Generic similarity detection for Go, Java, C/C++, C#, Ruby 🧪 Experimental
- similarity-md - Markdown similarity detection 🧪 Experimental
# Install from crates.io
cargo install similarity-ts
# Use the installed binary
similarity-ts --help
# Install from crates.io
cargo install similarity-py
# Use the installed binary
similarity-py --help
# Install from crates.io
cargo install similarity-rs
# Use the installed binary
similarity-rs --help
# Install from crates.io
cargo install similarity-elixir
# Use the installed binary
similarity-elixir --help
# Install from crates.io
cargo install similarity-generic
# Use the installed binary
similarity-generic --language go main.go
similarity-generic --language java Main.java
# Clone the repository
git clone https://github.com/mizchi/similarity.git
cd similarity
# Build all tools
cargo build --release
# Or install specific tool
cargo install --path crates/similarity-ts
cargo install --path crates/similarity-py
cargo install --path crates/similarity-rs
- --threshold / -t - Similarity threshold (0.0-1.0, default: 0.85)
- --min-lines / -m - Minimum lines for functions (default: 3-5)
- --min-tokens - Minimum AST nodes for functions
- --print / -p - Print code in output
- --cross-file / -c - Enable cross-file comparison
- --no-size-penalty - Disable size difference penalty
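When the tool is driven from a script, the options above can be assembled into an argument list. The following is a minimal sketch in Python: `build_command` is a hypothetical helper, not part of the tool, and it uses only the flags listed above.

```python
import shlex


def build_command(path, threshold=None, min_lines=None, min_tokens=None,
                  print_code=False, cross_file=False, no_size_penalty=False,
                  tool="similarity-ts"):
    """Assemble an argument list from the documented common options.

    Hypothetical helper for illustration; it only builds the list and
    does not run anything.
    """
    cmd = [tool, path]
    if threshold is not None:
        if not 0.0 <= threshold <= 1.0:
            raise ValueError("threshold must be between 0.0 and 1.0")
        cmd += ["--threshold", str(threshold)]
    if min_lines is not None:
        cmd += ["--min-lines", str(min_lines)]
    if min_tokens is not None:
        cmd += ["--min-tokens", str(min_tokens)]
    if print_code:
        cmd.append("--print")
    if cross_file:
        cmd.append("--cross-file")
    if no_size_penalty:
        cmd.append("--no-size-penalty")
    return cmd


cmd = build_command("./src", threshold=0.8, min_lines=10, cross_file=True)
print(shlex.join(cmd))
# prints: similarity-ts ./src --threshold 0.8 --min-lines 10 --cross-file
```

The resulting list can be passed to `subprocess.run` as-is, which avoids shell quoting issues with paths.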
# Check for duplicate functions (default)
similarity-ts ./src
# Enable type checking (experimental)
similarity-ts ./src --experimental-types
# Check types only
similarity-ts ./src --no-functions --experimental-types
# Fast mode with bloom filter is enabled by default; pass --no-fast to disable it
similarity-ts ./src --no-fast
# Check Python files
similarity-py ./src
# Include test files
similarity-py . --extensions py,test.py
# Check Rust files
similarity-rs ./src
# Skip test functions (test_ prefix or #[test])
similarity-rs . --skip-test
# Set minimum tokens (default: 30)
similarity-rs . --min-tokens 50
The tool prints results in a VSCode-compatible format for easy navigation:
Duplicates in src/utils.ts:
────────────────────────────────────────────────────────────
src/utils.ts:10 | L10-15 similar-function: calculateSum
src/utils.ts:20 | L20-25 similar-function: addNumbers
Similarity: 85.00%, Priority: 8.5 (lines: 10)
Click on the file paths in VSCode's terminal to jump directly to the code.
Results are sorted by priority (lines × similarity) to help you focus on the most impactful duplications first.
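The priority ordering can be illustrated with a small Python sketch. It assumes only the formula stated above (lines × similarity); the `Match` class and all sample names other than `calculateSum`/`addNumbers` are hypothetical, and how the tool counts lines when the two functions differ in length is not specified here.

```python
from dataclasses import dataclass


@dataclass
class Match:
    name_a: str
    name_b: str
    lines: int         # size of the matched code in lines
    similarity: float  # 0.0 to 1.0

    @property
    def priority(self) -> float:
        # lines × similarity, as described in the text
        return self.lines * self.similarity


matches = [
    Match("calculateSum", "addNumbers", lines=10, similarity=0.85),
    Match("parseA", "parseB", lines=40, similarity=0.80),  # hypothetical pair
    Match("tiny1", "tiny2", lines=3, similarity=0.99),     # hypothetical pair
]

# Highest priority first: large, highly similar functions float to the top.
for m in sorted(matches, key=lambda m: m.priority, reverse=True):
    print(f"{m.name_a} <-> {m.name_b}: priority {m.priority:.1f}")
```

The first pair reproduces the sample report: 10 lines at 85% similarity gives a priority of 8.5. A near-identical 3-line pair still ranks last because it is small.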
For AI assistants (like Claude, GPT-4, etc.) to help with code deduplication:
Run `similarity-ts .` to detect semantic code similarities. Execute this command, analyze the duplicate code patterns, and create a refactoring plan. Check `similarity-ts -h` for detailed options.
- Run similarity detection:
  similarity-ts . --threshold 0.8 --min-lines 10
- Share output with AI: Copy the similarity report to your AI assistant
- AI analyzes patterns: The AI will identify common patterns and suggest refactoring strategies
- Iterative refinement: Adjust threshold and options based on AI recommendations
This tool can be integrated into:
- Pre-commit hooks to prevent duplicate code
- CI/CD pipelines for code quality checks
- IDE extensions for real-time duplicate detection
- AI-powered code review workflows
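As one possible shape for a pre-commit or CI check, the sketch below parses the `Similarity: NN.NN%` lines shown in the sample output and turns them into an exit code. It makes no assumption about the tool's own exit status; the report format is inferred from the sample above, and `similarities`, `gate`, and `main` are hypothetical names.

```python
import re
import subprocess

SIMILARITY_RE = re.compile(r"Similarity:\s*([0-9]+(?:\.[0-9]+)?)%")


def similarities(report: str) -> list[float]:
    """Extract every 'Similarity: NN.NN%' value from a report as a fraction."""
    return [float(m.group(1)) / 100 for m in SIMILARITY_RE.finditer(report)]


def gate(report: str, limit: float = 0.9) -> int:
    """Return an exit code: 1 if any reported pair is at or above `limit`."""
    worst = max(similarities(report), default=0.0)
    return 1 if worst >= limit else 0


def main(path: str = "./src") -> int:
    # Runs the documented command; requires similarity-ts on PATH.
    result = subprocess.run(
        ["similarity-ts", path, "--threshold", "0.8"],
        capture_output=True, text=True,
    )
    return gate(result.stdout)


sample = """\
Duplicates in src/utils.ts:
src/utils.ts:10 | L10-15 similar-function: calculateSum
src/utils.ts:20 | L20-25 similar-function: addNumbers
Similarity: 85.00%, Priority: 8.5 (lines: 10)
"""
print(similarities(sample), gate(sample))  # prints: [0.85] 0
```

In a hook script, `raise SystemExit(main())` would fail the commit when a pair crosses the limit. Parsing text is fragile, so the regular expression should be checked against the actual output of the installed version.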
- AST Parsing: Language-specific parsers convert code to ASTs
- TypeScript/JavaScript: oxc-parser (fast)
- Python/Rust: tree-sitter
- Tree Extraction: Extracts function/method nodes with structure
- TSED Algorithm: Tree Structure Edit Distance with size penalties
- Similarity Score: Normalized score between 0 and 1
- Impact Calculation: Considers code size for prioritization
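The pipeline above can be sketched in miniature. The example below is not the tool's implementation: it uses Python's built-in `ast` module instead of oxc-parser or tree-sitter, a simple memoized ordered-tree edit distance with unit costs, and it omits the size penalty. The normalization `max(1 - distance / max_nodes, 0)` is an assumed form of the TSED score.

```python
import ast
from functools import lru_cache


def to_tree(node):
    """Convert a Python AST node into a hashable (label, children) tuple."""
    return (type(node).__name__,
            tuple(to_tree(child) for child in ast.iter_child_nodes(node)))


def size(tree):
    """Number of nodes in a tree."""
    return 1 + sum(size(child) for child in tree[1])


@lru_cache(maxsize=None)
def forest_distance(f1, f2):
    """Edit distance between two ordered forests (tuples of trees).

    Insert, delete, and rename each cost 1. Suitable for small inputs only.
    """
    if not f1:
        return sum(size(t) for t in f2)
    if not f2:
        return sum(size(t) for t in f1)
    (label1, kids1), (label2, kids2) = f1[-1], f2[-1]
    return min(
        forest_distance(f1[:-1] + kids1, f2) + 1,  # delete rightmost root of f1
        forest_distance(f1, f2[:-1] + kids2) + 1,  # insert rightmost root of f2
        forest_distance(kids1, kids2)              # match the two roots
        + forest_distance(f1[:-1], f2[:-1])
        + (label1 != label2),                      # rename if labels differ
    )


def similarity(src_a, src_b):
    """Normalized score in [0, 1]; 1.0 means structurally identical."""
    a, b = to_tree(ast.parse(src_a)), to_tree(ast.parse(src_b))
    distance = forest_distance((a,), (b,))
    return max(1.0 - distance / max(size(a), size(b)), 0.0)


add = "def add(a, b):\n    return a + b\n"
plus = "def plus(x, y):\n    return x + y\n"
sub = "def sub(a, b):\n    return a - b\n"
print(similarity(add, plus))       # 1.0: identifiers are ignored, structure matches
print(similarity(add, sub) < 1.0)  # True: one operator node differs
```

Because only node types are compared, renamed variables do not lower the score, which is the practical difference between AST-based comparison and text matching.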
The --experimental-overlap flag enables detection of partial code overlaps within and across functions:
# Basic overlap detection
similarity-ts ./src --experimental-overlap
# With custom parameters
similarity-ts ./src --experimental-overlap \
--threshold 0.75 \
--overlap-min-window 8 \
--overlap-max-window 25 \
--overlap-size-tolerance 0.25
Parameters:
- --experimental-overlap: Enable overlap detection mode
- --overlap-min-window: Minimum AST nodes to consider (default: 8)
- --overlap-max-window: Maximum AST nodes to consider (default: 25)
- --overlap-size-tolerance: Size variation tolerance (default: 0.25)
Use Cases:
- Finding copy-pasted code fragments within larger functions
- Detecting similar algorithmic patterns across different contexts
- Identifying refactoring opportunities for common code blocks
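To give a rough intuition for the window parameters, the sketch below enumerates AST subtrees by node count and keeps those inside a window. This is an illustration only: it uses Python's `ast` module, and the reading of the size tolerance as a relative difference between two fragment sizes is an assumption, not documented behavior. All function names are hypothetical.

```python
import ast


def subtree_sizes(source):
    """Return (node type, line number, node count) for every subtree."""
    found = []

    def visit(node):
        count = 1 + sum(visit(child) for child in ast.iter_child_nodes(node))
        found.append((type(node).__name__, getattr(node, "lineno", None), count))
        return count

    visit(ast.parse(source))
    return found


def candidate_windows(source, min_window=8, max_window=25):
    """Keep only subtrees whose node count falls inside the window."""
    return [s for s in subtree_sizes(source) if min_window <= s[2] <= max_window]


def sizes_compatible(a, b, tolerance=0.25):
    """Assumed reading of the size tolerance: relative size difference."""
    return abs(a - b) / max(a, b) <= tolerance


src = "def f(a, b):\n    total = a + b\n    return total * 2\n"
for kind, line, count in candidate_windows(src):
    print(kind, line, count)
print(sizes_compatible(20, 16), sizes_compatible(20, 10))  # prints: True False
```

Under this reading, a smaller minimum window surfaces shorter fragments at the cost of more noise, and a larger tolerance allows fragments of more unequal size to be compared.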
- TypeScript: Type similarity detection (interfaces, type aliases)
- Python: Class and method detection, decorator support
- Rust: Test function filtering, impl block analysis
# Find duplicate functions
similarity-ts ./src --threshold 0.7 --print
# Find similar types across files
similarity-ts ./src --no-functions --experimental-types --cross-file --print
# Comprehensive analysis
similarity-ts ./src \
--threshold 0.8 \
--min-lines 10 \
--cross-file \
--extensions ts,tsx
# Detect partial code overlaps (Experimental)
similarity-ts ./src --experimental-overlap --threshold 0.75 --print
# Find duplicate functions in Python project
similarity-py ./src --threshold 0.85 --print
# Check with custom settings
similarity-py . \
--min-lines 5 \
--extensions py
# Find duplicates excluding tests
similarity-rs ./src --skip-test --print
# Strict checking with high token count
similarity-rs . \
--min-tokens 50 \
--threshold 0.9 \
--skip-test
⚠️ EXPERIMENTAL: The generic language support is in early development and may have limitations or bugs.
The similarity-generic tool provides experimental support for additional languages using tree-sitter parsers:
- Go
- Java
- C
- C++
- C#
- Ruby
- Elixir
# From crates.io (when available)
cargo install similarity-generic
# From source
cargo install --path crates/similarity-generic
# Detect Go duplicates
similarity-generic --language go ./src
# Detect Java duplicates
similarity-generic --language java ./src
# Detect C/C++ duplicates
similarity-generic --language c ./src
similarity-generic --language cpp ./src
# Detect C# duplicates
similarity-generic --language csharp ./src
# Detect Ruby duplicates
similarity-generic --language ruby ./src
# Detect Elixir duplicates
similarity-generic --language elixir ./src
# Common options work the same way
similarity-generic --language go ./src --threshold 0.8 --print
Language | File Extensions | Status |
---|---|---|
Go | .go | Experimental |
Java | .java | Experimental |
C | .c, .h | Experimental |
C++ | .cpp, .cc, .cxx, .hpp, .h | Experimental |
C# | .cs | Experimental |
Ruby | .rb | Experimental |
You can also provide custom language configurations:
# Use custom config file
similarity-generic --config ./my-language.json ./src
See examples/configs/custom-language-template.json for configuration format.
- Performance is slower than specialized tools (similarity-ts, similarity-py, similarity-rs)
- Detection accuracy may vary by language
- Some language-specific features may not be fully supported
- Custom configurations require understanding of tree-sitter node types
For production use, prefer the specialized tools when available.
- Written in Rust for maximum performance
- Concurrent file processing
- Memory-efficient algorithms
- Language-specific optimizations:
- TypeScript/JavaScript: Fast mode with bloom filters (~4x faster)
- Python/Rust: Tree-sitter based parsing
- Intelligent filtering reduces unnecessary comparisons
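The idea behind a Bloom-filter pre-check can be sketched as follows. The internals of the tool's fast mode are not described here, so this is an assumption-laden illustration: each function gets a 64-bit signature of the AST node types it contains (two hash functions per type), and pairs whose signatures share too few bits are skipped before the expensive tree comparison. `fingerprint`, `may_be_similar`, and the 0.5 cutoff are hypothetical.

```python
import ast
import zlib

BITS = 64


def fingerprint(source):
    """Bloom-filter-style signature: two bits set per AST node type present."""
    mask = 0
    for node in ast.walk(ast.parse(source)):
        label = type(node).__name__.encode()
        mask |= 1 << (zlib.crc32(label) % BITS)
        mask |= 1 << (zlib.adler32(label) % BITS)
    return mask


def may_be_similar(mask_a, mask_b, min_overlap=0.5):
    """Cheap pre-check: fraction of set bits shared by both signatures."""
    union = bin(mask_a | mask_b).count("1")
    if union == 0:
        return True
    common = bin(mask_a & mask_b).count("1")
    return common / union >= min_overlap


add = "def add(a, b):\n    return a + b\n"
plus = "def plus(x, y):\n    return x + y\n"
print(fingerprint(add) == fingerprint(plus))                # True: same node types
print(may_be_similar(fingerprint(add), fingerprint(plus)))  # True
```

A pre-check like this can only rule pairs out cheaply. Pairs that pass still need the full tree comparison, so accuracy depends on the cutoff being loose enough not to discard real duplicates.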
MIT