# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
Claude-Scripts is a Python code-quality analysis toolkit built on a layered, plugin-based architecture for detecting duplicate code, computing complexity metrics, and flagging modernization opportunities. It combines multiple similarity algorithms and uses LSH (locality-sensitive hashing) to scale analysis to large codebases.
## Development Commands
### Essential Commands

```bash
# Activate virtual environment and install dependencies
source .venv/bin/activate && uv pip install -e ".[dev]"

# Run all quality checks
make check-all

# Run linting and auto-fix issues
make format

# Run type checking
make typecheck

# Run tests with coverage
make test-cov

# Run a single test
source .venv/bin/activate && pytest path/to/test_file.py::TestClass::test_method -xvs

# Install pre-commit hooks
make install-dev

# Build distribution packages
make build
```
### CLI Usage Examples

```bash
# Detect duplicate code
claude-quality duplicates src/ --threshold 0.8 --format console

# Analyze complexity
claude-quality complexity src/ --threshold 10 --format json

# Modernization analysis
claude-quality modernization src/ --include-type-hints

# Full analysis
claude-quality full-analysis src/ --output report.json

# Create exceptions template
claude-quality create-exceptions-template --output-path .quality-exceptions.yaml
```
## Architecture Overview
### Core Design Pattern: Plugin-Based Analysis Pipeline

```
CLI Layer (cli/main.py) → Configuration (config/schemas.py) → Analysis Engines → Output Formatters
```
The system implements multiple design patterns:

- **Strategy Pattern**: Similarity algorithms (`LevenshteinSimilarity`, `JaccardSimilarity`, etc.) are interchangeable
- **Visitor Pattern**: AST traversal for code analysis
- **Factory Pattern**: Dynamic engine creation based on configuration
- **Composite Pattern**: Multiple engines combine for `full_analysis`
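
As a concrete illustration of the Strategy pattern, here is a minimal sketch assuming `BaseSimilarityAlgorithm` (named under Project Structure Conventions below) exposes a `similarity()` method returning a score in [0, 1]; the exact method name and signature are assumptions:

```python
from abc import ABC, abstractmethod


class BaseSimilarityAlgorithm(ABC):
    """Strategy interface: concrete algorithms are freely interchangeable."""

    @abstractmethod
    def similarity(self, left: str, right: str) -> float:
        """Return a similarity score in [0.0, 1.0]."""


class JaccardSimilarity(BaseSimilarityAlgorithm):
    """Token-set overlap; one interchangeable strategy among several."""

    def similarity(self, left: str, right: str) -> float:
        a, b = set(left.split()), set(right.split())
        return len(a & b) / len(a | b) if (a or b) else 1.0
```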
### Critical Module Interactions
**Duplicate Detection Flow:**

1. `FileFinder` discovers Python files based on path configuration
2. `ASTAnalyzer` extracts code blocks (functions, classes, methods)
3. `DuplicateDetectionEngine` orchestrates analysis:
   - For small codebases: direct similarity comparison
   - For large codebases (>1000 files): LSH-based scalable detection (see the sketch below)
4. `SimilarityCalculator` applies a weighted algorithm combination
5. Results are filtered through `ExceptionFilter` for configured suppressions
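
For the large-codebase path, the LSH index comes from `datasketch` (listed under Critical Dependencies). A minimal sketch of MinHash signatures plus LSH candidate lookup; the tokenization, block keys, and 0.8 threshold are illustrative, not the engine's actual internals:

```python
from datasketch import MinHash, MinHashLSH


def minhash_of(tokens: list[str], num_perm: int = 128) -> MinHash:
    """Build a MinHash signature from a code block's tokens."""
    m = MinHash(num_perm=num_perm)
    for tok in tokens:
        m.update(tok.encode("utf-8"))
    return m


# Index every extracted code block once, then query cheaply per block.
lsh = MinHashLSH(threshold=0.8, num_perm=128)
blocks = {
    "a.py:foo": ["def", "foo", "(", "x", ")", "return", "x", "+", "1"],
    "b.py:bar": ["def", "bar", "(", "x", ")", "return", "x", "+", "1"],
}
for key, tokens in blocks.items():
    lsh.insert(key, minhash_of(tokens))

candidates = lsh.query(minhash_of(blocks["a.py:foo"]))
# 'candidates' holds keys of likely duplicates (including "a.py:foo" itself)
```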
**Similarity Algorithm System:**

- Multiple algorithms run in parallel with configurable weights
- Algorithms are grouped by type: text-based, token-based, structural, semantic
- Final score = weighted combination of individual algorithm scores (see the sketch below)
- LSH (Locality-Sensitive Hashing) enables O(n log n) scaling for large datasets
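
A sketch of the final-score computation; the scorer type and normalization are assumptions about how the weighted combination could work, not the engine's actual code:

```python
from collections.abc import Callable

# Any similarity algorithm reduced to a (left, right) -> score callable
Scorer = Callable[[str, str], float]


def combined_score(left: str, right: str,
                   weighted_scorers: list[tuple[Scorer, float]]) -> float:
    """Weighted mean of individual algorithm scores, normalized by total weight."""
    total = sum(weight for _, weight in weighted_scorers)
    return sum(score(left, right) * weight
               for score, weight in weighted_scorers) / total
    # e.g. combined_score(a, b, [(jaccard, 0.6), (levenshtein_ratio, 0.4)])
```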
**Configuration Hierarchy:**

```
QualityConfig
├── detection: Algorithm weights, thresholds, LSH parameters
├── complexity: Metrics selection, thresholds per metric
├── languages: File extensions, language-specific rules
├── paths: Include/exclude patterns for file discovery
└── exceptions: Suppression rules with pattern matching
```
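
A hedged sketch of how this hierarchy maps onto Pydantic models; only the five top-level section names come from the tree above, and every field name below is an assumption:

```python
from pydantic import BaseModel, Field


class DetectionConfig(BaseModel):
    """'detection' section: weights, thresholds, LSH parameters (field names assumed)."""

    similarity_threshold: float = Field(0.8, ge=0.0, le=1.0)
    algorithm_weights: dict[str, float] = Field(default_factory=dict)
    lsh_num_perm: int = 128


class QualityConfig(BaseModel):
    detection: DetectionConfig = Field(default_factory=DetectionConfig)
    # complexity, languages, paths, and exceptions follow the same nested-model pattern
```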
### Key Implementation Details
**Pydantic Version Constraint:**

- Must use Pydantic 2.5.x (not 2.6 or later) due to compatibility issues
- Configuration schemas use Pydantic for validation and defaults
**AST Analysis Strategy:**

- Uses Python's standard `ast` module for parsing
- Custom `NodeVisitor` subclasses handle different analysis types
- Preserves line numbers and column offsets for accurate reporting (see the sketch below)
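
A minimal stdlib-only illustration of the visitor approach, keeping the line/column information the engines rely on; the collector class is hypothetical:

```python
import ast


class FunctionCollector(ast.NodeVisitor):
    """Collect function definitions with the positions needed for reporting."""

    def __init__(self) -> None:
        self.functions: list[tuple[str, int, int]] = []

    def visit_FunctionDef(self, node: ast.FunctionDef) -> None:
        self.functions.append((node.name, node.lineno, node.col_offset))
        self.generic_visit(node)  # recurse into nested functions


tree = ast.parse("def outer():\n    def inner():\n        pass\n")
collector = FunctionCollector()
collector.visit(tree)
# collector.functions == [("outer", 1, 0), ("inner", 2, 4)]
```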
**Performance Optimizations:**

- File-based caching with configurable TTL (sketched below)
- Parallel processing for multiple files
- LSH indexing for large-scale duplicate detection
- Incremental analysis support through the cache
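
One way the TTL check could look; the cache layout and JSON format here are illustrative assumptions, not the project's actual cache implementation:

```python
import json
import time
from pathlib import Path


def load_cached(cache_file: Path, ttl_seconds: float) -> dict | None:
    """Return cached analysis results if the entry is still fresh, else None."""
    if not cache_file.exists():
        return None
    if time.time() - cache_file.stat().st_mtime > ttl_seconds:
        return None  # stale entry: caller re-analyzes and rewrites the cache
    return json.loads(cache_file.read_text())
```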
## Testing Approach
**Test Structure:**

- Unit tests for individual algorithms and components
- Integration tests for end-to-end CLI commands
- Property-based tests for similarity algorithms (example below)
- Fixture-based test data in `tests/fixtures/`
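
As one example of the property-based style, symmetry and identity checks for a similarity function; this uses the Hypothesis library, which is not listed in the dependencies above, so treat the import as an assumption:

```python
from hypothesis import given, strategies as st


def jaccard(left: str, right: str) -> float:
    """Toy similarity under test: token-set overlap in [0.0, 1.0]."""
    a, b = set(left.split()), set(right.split())
    return len(a & b) / len(a | b) if (a or b) else 1.0


@given(st.text(), st.text())
def test_similarity_is_symmetric(left: str, right: str) -> None:
    assert jaccard(left, right) == jaccard(right, left)


@given(st.text())
def test_identical_inputs_score_one(text: str) -> None:
    assert jaccard(text, text) == 1.0
```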
**Coverage Requirements:**

- Minimum 80% coverage enforced in CI
- Focus on algorithm correctness and edge cases
- External dependencies (file I/O, Git operations) are mocked
## Important Configuration Files
**pyproject.toml:**

- Package metadata and dependencies
- Ruff configuration (linting rules)
- MyPy configuration (type checking)
- Pytest configuration (test discovery and coverage)
**Makefile:**

- Standardizes development commands
- Ensures virtual environment activation
- Combines multiple tools into single targets
**.pre-commit-config.yaml:**

- Automated code quality checks on commit
- Includes ruff, mypy, and standard hooks
## Code Quality Standards
### Linting Configuration

- Ruff with an extensive rule selection (E, F, W, UP, ANN, etc.)
- Ignored rules configured for pragmatic development
- Auto-formatting enabled via `make format`
### Type Checking

- Strict MyPy configuration
- All public APIs must have type annotations
- Ignores configured for third-party libraries without stubs
### Project Structure Conventions

- Similarity algorithms inherit from `BaseSimilarityAlgorithm`
- Analysis engines follow the `analyze()` → `AnalysisResult` pattern (sketched below)
- Configuration uses Pydantic models with validation
- Results are formatted through dedicated formatter classes
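
A sketch of the engine contract; the `AnalysisResult` fields and engine class name are assumptions made for illustration:

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class AnalysisResult:
    """Uniform result shape consumed by the formatter classes (fields assumed)."""

    engine: str
    findings: list[dict] = field(default_factory=list)


class ComplexityEngine:
    def analyze(self, paths: list[Path]) -> AnalysisResult:
        result = AnalysisResult(engine="complexity")
        # ... compute per-file metrics and append findings ...
        return result
```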
## Critical Dependencies
**Analysis Core:**

- `radon`: Industry-standard complexity metrics (usage sketch below)
- `datasketch`: LSH implementation for scalable similarity
- `python-Levenshtein`: Fast string similarity
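
As a quick illustration of the radon API the complexity engine likely wraps, `cc_visit` parses source and returns per-block cyclomatic complexity:

```python
from radon.complexity import cc_visit

source = '''
def classify(x):
    if x > 0:
        return "positive"
    elif x < 0:
        return "negative"
    return "zero"
'''

for block in cc_visit(source):
    # Each block exposes .name, .lineno, and .complexity
    print(block.name, block.lineno, block.complexity)  # classify 2 3
```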
**Infrastructure:**

- `click`: CLI framework with subcommand support
- `pydantic==2.5.3`: Configuration and validation (version-locked)
- `pyyaml`: Configuration file parsing
**Development:**

- `uv`: Fast Python package manager (replaces pip)
- `pytest`: Testing framework with coverage
- `ruff`: Fast Python linter and formatter
- `mypy`: Static type checking