# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

Claude-Scripts is a Python code quality analysis toolkit implementing a layered, plugin-based architecture for detecting duplicates, computing complexity metrics, and finding modernization opportunities. The system uses similarity algorithms including LSH for scalable analysis of large codebases.

## Development Commands

### Essential Commands

```bash
# Activate virtual environment and install dependencies
source .venv/bin/activate && uv pip install -e ".[dev]"

# Run all quality checks
make check-all

# Run linting and auto-fix issues
make format

# Run type checking
make typecheck

# Run tests with coverage
make test-cov

# Run a single test
source .venv/bin/activate && pytest path/to/test_file.py::TestClass::test_method -xvs

# Install pre-commit hooks
make install-dev

# Build distribution packages
make build
```

### CLI Usage Examples

```bash
# Detect duplicate code
claude-quality duplicates src/ --threshold 0.8 --format console

# Analyze complexity
claude-quality complexity src/ --threshold 10 --format json

# Modernization analysis
claude-quality modernization src/ --include-type-hints

# Full analysis
claude-quality full-analysis src/ --output report.json

# Create exceptions template
claude-quality create-exceptions-template --output-path .quality-exceptions.yaml
```

## Architecture Overview

### Core Design Pattern: Plugin-Based Analysis Pipeline

```
CLI Layer (cli/main.py) → Configuration (config/schemas.py) → Analysis Engines → Output Formatters
```

The system implements multiple design patterns:

- **Strategy Pattern**: Similarity algorithms (`LevenshteinSimilarity`, `JaccardSimilarity`, etc.) are interchangeable
- **Visitor Pattern**: AST traversal for code analysis
- **Factory Pattern**: Dynamic engine creation based on configuration
- **Composite Pattern**: Multiple engines combine for `full_analysis`

### Critical Module Interactions

**Duplicate Detection Flow:**

1. `FileFinder` discovers Python files based on path configuration
2. `ASTAnalyzer` extracts code blocks (functions, classes, methods)
3. `DuplicateDetectionEngine` orchestrates analysis:
   - For small codebases: direct pairwise similarity comparison
   - For large codebases (>1000 files): LSH-based scalable detection
4. `SimilarityCalculator` applies the weighted algorithm combination
5. Results are filtered through `ExceptionFilter` for configured suppressions

**Similarity Algorithm System:**

- Multiple algorithms run in parallel with configurable weights
- Algorithms are grouped by type: text-based, token-based, structural, semantic
- Final score = weighted combination of individual algorithm scores
- LSH (Locality-Sensitive Hashing) enables O(n log n) scaling for large datasets

**Configuration Hierarchy:**

```
QualityConfig
├── detection: Algorithm weights, thresholds, LSH parameters
├── complexity: Metrics selection, thresholds per metric
├── languages: File extensions, language-specific rules
├── paths: Include/exclude patterns for file discovery
└── exceptions: Suppression rules with pattern matching
```

### Key Implementation Details

**Pydantic Version Constraint:**

- Must use Pydantic 2.5.x (not 2.6+ or 2.11+) due to compatibility issues
- Configuration schemas use Pydantic for validation and defaults

**AST Analysis Strategy:**

- Uses Python's standard `ast` module for parsing
- Custom `NodeVisitor` subclasses for different analysis types
- Preserves line numbers and column offsets for accurate reporting

**Performance Optimizations:**

- File-based caching with configurable TTL
- Parallel processing for multiple files
- LSH indexing for large-scale duplicate detection
- Incremental analysis support through the cache

### Testing Approach

**Test Structure:**

- Unit tests for individual algorithms and components
- Integration tests for end-to-end CLI commands
- Property-based testing for similarity algorithms
- Fixture-based test data in `tests/fixtures/`

**Coverage Requirements:**

- Minimum 80% coverage enforced in CI
- Focus on algorithm correctness and edge cases
- External dependencies (file I/O, Git operations) are mocked

### Important Configuration Files

**pyproject.toml:**

- Package metadata and dependencies
- Ruff configuration (linting rules)
- MyPy configuration (type checking)
- Pytest configuration (test discovery and coverage)

**Makefile:**

- Standardizes development commands
- Ensures virtual environment activation
- Combines multiple tools into single targets

**.pre-commit-config.yaml:**

- Automated code quality checks on commit
- Includes ruff, mypy, and standard hooks

## Code Quality Standards

### Linting Configuration

- Ruff with an extensive rule selection (E, F, W, UP, ANN, etc.)
- Ignored rules configured for pragmatic development
- Auto-formatting available via `make format`

### Type Checking

- Strict MyPy configuration
- All public APIs must have type annotations
- Ignores for third-party libraries without stubs

### Project Structure Conventions

- Similarity algorithms inherit from `BaseSimilarityAlgorithm`
- Analysis engines follow the `analyze()` → `AnalysisResult` pattern
- Configuration uses Pydantic models with validation
- Results are formatted through dedicated formatter classes

## Critical Dependencies

**Analysis Core:**

- `radon`: Industry-standard complexity metrics
- `datasketch`: LSH implementation for scalable similarity
- `python-Levenshtein`: Fast string similarity

**Infrastructure:**

- `click`: CLI framework with subcommand support
- `pydantic==2.5.3`: Configuration and validation (version-locked)
- `pyyaml`: Configuration file parsing

**Development:**

- `uv`: Fast Python package manager (replaces pip)
- `pytest`: Testing framework with coverage
- `ruff`: Fast Python linter and formatter
- `mypy`: Static type checking
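
## Appendix: Similarity Strategy Sketch

As a concrete illustration of the Strategy-pattern convention described above (interchangeable similarity algorithms inheriting from `BaseSimilarityAlgorithm`, combined by configurable weights), here is a minimal sketch. The `similarity` method name, the `LengthSimilarity` class, and the `combined_score` helper are illustrative assumptions, not the toolkit's actual API.

```python
from abc import ABC, abstractmethod


class BaseSimilarityAlgorithm(ABC):
    """Interchangeable similarity strategy (interface is illustrative)."""

    @abstractmethod
    def similarity(self, a: str, b: str) -> float:
        """Return a score in [0.0, 1.0]."""


class JaccardSimilarity(BaseSimilarityAlgorithm):
    """Token-based: Jaccard index over whitespace-split tokens."""

    def similarity(self, a: str, b: str) -> float:
        ta, tb = set(a.split()), set(b.split())
        if not ta and not tb:
            return 1.0
        return len(ta & tb) / len(ta | tb)


class LengthSimilarity(BaseSimilarityAlgorithm):
    """Trivial text-based stand-in: ratio of string lengths."""

    def similarity(self, a: str, b: str) -> float:
        if not a and not b:
            return 1.0
        return min(len(a), len(b)) / max(len(a), len(b))


def combined_score(
    a: str, b: str, weighted: list[tuple[BaseSimilarityAlgorithm, float]]
) -> float:
    """Final score = weighted combination of individual algorithm scores."""
    total = sum(w for _, w in weighted)
    return sum(alg.similarity(a, b) * w for alg, w in weighted) / total
```

Usage: `combined_score("a b c", "a b d", [(JaccardSimilarity(), 0.6), (LengthSimilarity(), 0.4)])` blends a 0.5 Jaccard score with a 1.0 length score into 0.7, showing how per-algorithm weights shape the final result.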