CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

Claude-Scripts is a Python code quality analysis toolkit built on a layered, plugin-based architecture. It detects duplicate code, computes complexity metrics, and flags modernization opportunities, combining multiple similarity algorithms and using locality-sensitive hashing (LSH) to keep analysis scalable on large codebases.

Development Commands

Essential Commands

# Activate virtual environment and install dependencies
source .venv/bin/activate && uv pip install -e ".[dev]"

# Run all quality checks
make check-all

# Run linting and auto-fix issues
make format

# Run type checking
make typecheck

# Run tests with coverage
make test-cov

# Run a single test
source .venv/bin/activate && pytest path/to/test_file.py::TestClass::test_method -xvs

# Install pre-commit hooks
make install-dev

# Build distribution packages
make build

CLI Usage Examples

# Detect duplicate code
claude-quality duplicates src/ --threshold 0.8 --format console

# Analyze complexity
claude-quality complexity src/ --threshold 10 --format json

# Modernization analysis
claude-quality modernization src/ --include-type-hints

# Full analysis
claude-quality full-analysis src/ --output report.json

# Create exceptions template
claude-quality create-exceptions-template --output-path .quality-exceptions.yaml

Architecture Overview

Core Design Pattern: Plugin-Based Analysis Pipeline

CLI Layer (cli/main.py) → Configuration (config/schemas.py) → Analysis Engines → Output Formatters

The system implements multiple design patterns:

  • Strategy Pattern: Similarity algorithms (LevenshteinSimilarity, JaccardSimilarity, etc.) are interchangeable (see the sketch after this list)
  • Visitor Pattern: AST traversal for code analysis
  • Factory Pattern: Dynamic engine creation based on configuration
  • Composite Pattern: Multiple engines combine for full_analysis
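
To make the Strategy pattern concrete, here is a minimal sketch. BaseSimilarityAlgorithm and JaccardSimilarity are names used elsewhere in this file; the similarity() signature is an assumption about the internal API:

from abc import ABC, abstractmethod

class BaseSimilarityAlgorithm(ABC):
    """Interchangeable similarity strategy; engines depend only on this interface."""

    @abstractmethod
    def similarity(self, left: str, right: str) -> float:
        """Return a score in [0.0, 1.0]."""

class JaccardSimilarity(BaseSimilarityAlgorithm):
    def similarity(self, left: str, right: str) -> float:
        a, b = set(left.split()), set(right.split())
        return len(a & b) / len(a | b) if (a | b) else 1.0

# Any engine can swap algorithms without touching its orchestration code.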

Critical Module Interactions

Duplicate Detection Flow:

  1. FileFinder discovers Python files based on path configuration
  2. ASTAnalyzer extracts code blocks (functions, classes, methods)
  3. DuplicateDetectionEngine orchestrates analysis:
    • For small codebases: Direct similarity comparison
    • For large codebases (>1000 files): LSH-based scalable detection
  4. SimilarityCalculator applies weighted algorithm combination
  5. Results filtered through ExceptionFilter for configured suppressions
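
Stripped of the real classes, the flow above can be illustrated end to end with a self-contained sketch (standard library only; the project's actual FileFinder, engine, and filter classes are replaced by stand-ins, and the threshold is illustrative):

import ast
from pathlib import Path

def extract_functions(path: Path) -> list[tuple[str, str]]:
    """Stand-in for ASTAnalyzer: (qualified name, source) per function."""
    source = path.read_text()
    tree = ast.parse(source)
    return [(f"{path}:{node.name}", ast.get_source_segment(source, node) or "")
            for node in ast.walk(tree)
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))]

def jaccard(a: str, b: str) -> float:
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if (sa | sb) else 1.0

def detect_duplicates(root: Path, threshold: float = 0.8) -> list[tuple[str, str, float]]:
    files = list(root.rglob("*.py"))                           # 1. file discovery
    blocks = [b for f in files for b in extract_functions(f)]  # 2. AST extraction
    pairs = []                                                 # 3-4. direct pairwise scoring
    for i, (name_a, src_a) in enumerate(blocks):               #      (the real engine switches
        for name_b, src_b in blocks[i + 1:]:                   #      to LSH above ~1000 files)
            score = jaccard(src_a, src_b)
            if score >= threshold:
                pairs.append((name_a, name_b, score))
    return pairs                                               # 5. would pass ExceptionFilter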

Similarity Algorithm System:

  • Multiple algorithms run in parallel with configurable weights
  • Algorithms grouped by type: text-based, token-based, structural, semantic
  • Final score = weighted combination of individual algorithm scores (sketched below)
  • LSH (Locality-Sensitive Hashing) enables O(n log n) scaling for large datasets
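
The weighted combination reduces to a weighted average. A minimal sketch, where the weight names are assumptions:

def combined_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-algorithm scores."""
    total = sum(weights.get(name, 0.0) for name in scores)
    if total == 0:
        return 0.0
    return sum(s * weights.get(name, 0.0) for name, s in scores.items()) / total

combined_score({"levenshtein": 0.92, "jaccard": 0.75},
               {"levenshtein": 0.6, "jaccard": 0.4})  # -> 0.852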

Configuration Hierarchy:

QualityConfig
├── detection: Algorithm weights, thresholds, LSH parameters
├── complexity: Metrics selection, thresholds per metric
├── languages: File extensions, language-specific rules
├── paths: Include/exclude patterns for file discovery
└── exceptions: Suppression rules with pattern matching
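
On the Pydantic side, the hierarchy might be modeled roughly like this (field names beyond the tree above are assumptions):

from pydantic import BaseModel, Field

class DetectionConfig(BaseModel):
    threshold: float = Field(default=0.8, ge=0.0, le=1.0)
    algorithm_weights: dict[str, float] = {"levenshtein": 0.5, "jaccard": 0.5}
    lsh_num_perm: int = 128  # assumed name for an LSH parameter

class PathsConfig(BaseModel):
    include: list[str] = ["**/*.py"]
    exclude: list[str] = [".venv/**", "tests/fixtures/**"]

class QualityConfig(BaseModel):
    detection: DetectionConfig = DetectionConfig()
    paths: PathsConfig = PathsConfig()
    # complexity, languages, and exceptions follow the same pattern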

Key Implementation Details

Pydantic Version Constraint:

  • Must use Pydantic 2.5.x; 2.6 and later releases introduce compatibility issues
  • Configuration schemas use Pydantic for validation and defaults

AST Analysis Strategy:

  • Uses Python's standard ast module for parsing
  • Custom NodeVisitor subclasses for different analysis types
  • Preserves line numbers and column offsets for accurate reporting
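
A minimal NodeVisitor in this style, using only the standard ast module (the real visitor classes are internal, so the name here is illustrative):

import ast

class FunctionCollector(ast.NodeVisitor):
    """Collect function names with line/column info for accurate reporting."""

    def __init__(self) -> None:
        self.functions: list[tuple[str, int, int]] = []

    def visit_FunctionDef(self, node: ast.FunctionDef) -> None:
        self.functions.append((node.name, node.lineno, node.col_offset))
        self.generic_visit(node)  # keep walking to catch nested functions

collector = FunctionCollector()
collector.visit(ast.parse("def outer():\n    def inner():\n        pass"))
print(collector.functions)  # [('outer', 1, 0), ('inner', 2, 4)]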

Performance Optimizations:

  • File-based caching with configurable TTL
  • Parallel processing for multiple files
  • LSH indexing for large-scale duplicate detection
  • Incremental analysis support through cache
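
In spirit, the file-based cache with TTL might work like this sketch (the cache location, key scheme, and TTL are all assumptions):

import hashlib
import json
import time
from pathlib import Path
from typing import Callable

CACHE_DIR = Path(".quality-cache")  # assumed location
TTL_SECONDS = 24 * 3600             # assumed default TTL

def cached_analysis(path: Path, analyze: Callable[[Path], dict]) -> dict:
    """Reuse a prior result while the file's content hash matches and the TTL holds."""
    CACHE_DIR.mkdir(exist_ok=True)
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    entry = CACHE_DIR / f"{digest}.json"
    if entry.exists() and time.time() - entry.stat().st_mtime < TTL_SECONDS:
        return json.loads(entry.read_text())  # cache hit
    result = analyze(path)                    # cache miss: run the real analysis
    entry.write_text(json.dumps(result))
    return result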

Testing Approach

Test Structure:

  • Unit tests for individual algorithms and components
  • Integration tests for end-to-end CLI commands
  • Property-based testing for similarity algorithms (example below)
  • Fixture-based test data in tests/fixtures/
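
A property-based test in this vein might look as follows, using Hypothesis; the invariants shown (symmetry, identity) are typical properties, not necessarily the suite's exact tests:

from hypothesis import given, strategies as st

def jaccard(a: str, b: str) -> float:
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if (sa | sb) else 1.0

@given(st.text(), st.text())
def test_similarity_is_symmetric(a: str, b: str) -> None:
    assert jaccard(a, b) == jaccard(b, a)

@given(st.text())
def test_identical_inputs_score_one(a: str) -> None:
    assert jaccard(a, a) == 1.0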

Coverage Requirements:

  • Minimum 80% coverage enforced in CI
  • Focus on algorithm correctness and edge cases
  • Mock external dependencies (file I/O, Git operations)

Important Configuration Files

pyproject.toml:

  • Package metadata and dependencies
  • Ruff configuration (linting rules)
  • MyPy configuration (type checking)
  • Pytest configuration (test discovery and coverage)

Makefile:

  • Standardizes development commands
  • Ensures virtual environment activation
  • Combines multiple tools into single targets

.pre-commit-config.yaml:

  • Automated code quality checks on commit
  • Includes ruff, mypy, and standard hooks

Code Quality Standards

Linting Configuration

  • Ruff with extensive rule selection (E, F, W, UP, ANN, etc.)
  • Ignored rules configured for pragmatic development
  • Auto-formatting enabled with make format

Type Checking

  • Strict MyPy configuration
  • All public APIs must have type annotations
  • Ignores for third-party libraries without stubs

Project Structure Conventions

  • Similarity algorithms inherit from BaseSimilarityAlgorithm
  • Analysis engines follow the analyze() → AnalysisResult pattern (see the sketch after this list)
  • Configuration uses Pydantic models with validation
  • Results formatted through dedicated formatter classes
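
That engine contract can be written down as a Protocol; AnalysisResult's exact fields are an assumption:

from dataclasses import dataclass, field
from pathlib import Path
from typing import Protocol

@dataclass
class AnalysisResult:
    findings: list[dict] = field(default_factory=list)
    errors: list[str] = field(default_factory=list)

class AnalysisEngine(Protocol):
    def analyze(self, paths: list[Path]) -> AnalysisResult: ...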

Critical Dependencies

Analysis Core:

  • radon: Industry-standard complexity metrics
  • datasketch: LSH implementation for scalable similarity (usage sketched after this list)
  • python-Levenshtein: Fast string similarity
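
For reference, the datasketch calls that a scalable duplicate index builds on look like this (the tokenization and threshold here are illustrative, not the project's actual settings):

from datasketch import MinHash, MinHashLSH

def minhash_of(text: str, num_perm: int = 128) -> MinHash:
    m = MinHash(num_perm=num_perm)
    for token in text.split():
        m.update(token.encode("utf-8"))
    return m

lsh = MinHashLSH(threshold=0.8, num_perm=128)
lsh.insert("block-1", minhash_of("def add(a, b): return a + b"))
candidates = lsh.query(minhash_of("def add(x, y): return x + y"))  # candidate near-duplicates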

Infrastructure:

  • click: CLI framework with subcommand support
  • pydantic==2.5.3: Configuration and validation (version-locked)
  • pyyaml: Configuration file parsing

Development:

  • uv: Fast Python package manager (replaces pip)
  • pytest: Testing framework with coverage
  • ruff: Fast Python linter and formatter
  • mypy: Static type checking