# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
Claude-Scripts is a Python code quality analysis toolkit with a layered, plugin-based architecture for detecting duplicate code, computing complexity metrics, and flagging modernization opportunities. The system combines multiple similarity algorithms, including LSH (Locality-Sensitive Hashing), to scale analysis to large codebases.
## Development Commands
### Essential Commands
```bash
# Activate virtual environment and install dependencies
source .venv/bin/activate && uv pip install -e ".[dev]"
# Run all quality checks
make check-all
# Run linting and auto-fix issues
make format
# Run type checking
make typecheck
# Run tests with coverage
make test-cov
# Run a single test
source .venv/bin/activate && pytest path/to/test_file.py::TestClass::test_method -xvs
# Install pre-commit hooks
make install-dev
# Build distribution packages
make build
```
### CLI Usage Examples
```bash
# Detect duplicate code
claude-quality duplicates src/ --threshold 0.8 --format console
# Analyze complexity
claude-quality complexity src/ --threshold 10 --format json
# Modernization analysis
claude-quality modernization src/ --include-type-hints
# Full analysis
claude-quality full-analysis src/ --output report.json
# Create exceptions template
claude-quality create-exceptions-template --output-path .quality-exceptions.yaml
```
## Architecture Overview
### Core Design Pattern: Plugin-Based Analysis Pipeline
```
CLI Layer (cli/main.py) → Configuration (config/schemas.py) → Analysis Engines → Output Formatters
```
The system implements multiple design patterns:
- **Strategy Pattern**: Similarity algorithms (`LevenshteinSimilarity`, `JaccardSimilarity`, etc.) are interchangeable
- **Visitor Pattern**: AST traversal for code analysis
- **Factory Pattern**: Dynamic engine creation based on configuration
- **Composite Pattern**: Multiple engines combine for `full_analysis`
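The Strategy pattern is the load-bearing one for similarity: every algorithm implements the same interface, so the engine can swap or combine them freely. The class names below come from this document, but the bodies are simplified illustrations, not the repository's actual implementations:

```python
from abc import ABC, abstractmethod


class BaseSimilarityAlgorithm(ABC):
    """Common interface so concrete algorithms are interchangeable."""

    @abstractmethod
    def similarity(self, a: str, b: str) -> float:
        """Return a score in [0.0, 1.0]."""


class JaccardSimilarity(BaseSimilarityAlgorithm):
    def similarity(self, a: str, b: str) -> float:
        # Token-overlap similarity on whitespace-split tokens.
        sa, sb = set(a.split()), set(b.split())
        if not sa and not sb:
            return 1.0
        return len(sa & sb) / len(sa | sb)


class LevenshteinSimilarity(BaseSimilarityAlgorithm):
    def similarity(self, a: str, b: str) -> float:
        # Dynamic-programming edit distance, normalized to [0, 1].
        # (The real project uses the fast python-Levenshtein C extension.)
        if not a and not b:
            return 1.0
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            curr = [i]
            for j, cb in enumerate(b, 1):
                curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
            prev = curr
        return 1.0 - prev[-1] / max(len(a), len(b))
```

Because both classes satisfy `BaseSimilarityAlgorithm`, callers depend only on the interface and new algorithms can be registered without touching the engine.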
### Critical Module Interactions
**Duplicate Detection Flow:**
1. `FileFinder` discovers Python files based on path configuration
2. `ASTAnalyzer` extracts code blocks (functions, classes, methods)
3. `DuplicateDetectionEngine` orchestrates analysis:
   - For small codebases: direct pairwise similarity comparison
   - For large codebases (>1000 files): LSH-based scalable detection
4. `SimilarityCalculator` applies weighted algorithm combination
5. Results filtered through `ExceptionFilter` for configured suppressions
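The small/large branching in step 3 can be sketched as follows. The function signature, the `LSH_FILE_THRESHOLD` constant, and the injected callables are illustrative assumptions, not the engine's real API:

```python
from itertools import combinations
from typing import Callable, Iterable

LSH_FILE_THRESHOLD = 1000  # hypothetical cutoff mirroring the ">1000 files" rule

Block = tuple[str, str]  # (block_id, source) pairs, e.g. from an AST analyzer


def detect_duplicates(
    blocks: list[Block],
    n_files: int,
    similarity: Callable[[str, str], float],
    lsh_candidates: Callable[[list[Block]], Iterable[tuple[Block, Block]]],
    threshold: float = 0.8,
) -> list[tuple[str, str]]:
    """Brute-force pairwise comparison for small codebases;
    LSH candidate pre-filtering for large ones."""
    if n_files <= LSH_FILE_THRESHOLD:
        pairs: Iterable[tuple[Block, Block]] = combinations(blocks, 2)  # O(n^2), fine at small n
    else:
        pairs = lsh_candidates(blocks)  # LSH prunes the pair space first
    return [
        (id_a, id_b)
        for (id_a, src_a), (id_b, src_b) in pairs
        if similarity(src_a, src_b) >= threshold
    ]
```

The key point is that both branches feed the same scoring loop; LSH only changes which pairs are scored.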
**Similarity Algorithm System:**
- Multiple algorithms run in parallel with configurable weights
- Algorithms grouped by type: text-based, token-based, structural, semantic
- Final score = weighted combination of individual algorithm scores
- LSH (Locality-Sensitive Hashing) enables O(n log n) scaling for large datasets
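The weighted combination amounts to a weighted mean over per-algorithm scores. A minimal sketch (algorithm names in the dictionaries are placeholders):

```python
def combined_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted mean of per-algorithm similarity scores."""
    total_weight = sum(weights.values())
    if total_weight == 0:
        return 0.0
    return sum(weights[name] * scores[name] for name in weights) / total_weight
```

For example, a Jaccard score of 1.0 and a Levenshtein score of 0.5 with equal weights combine to 0.75.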
**Configuration Hierarchy:**
```
QualityConfig
├── detection: algorithm weights, thresholds, LSH parameters
├── complexity: metrics selection, thresholds per metric
├── languages: file extensions, language-specific rules
├── paths: include/exclude patterns for file discovery
└── exceptions: suppression rules with pattern matching
```
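The real schemas are Pydantic models in `config/schemas.py`. As a dependency-free illustration of the same nested shape, here is a dataclass sketch; all field names and defaults are assumptions (the 0.8 threshold matches the CLI example above):

```python
from dataclasses import dataclass, field


@dataclass
class DetectionConfig:
    threshold: float = 0.8  # assumed default, matching the CLI example
    algorithm_weights: dict[str, float] = field(
        default_factory=lambda: {"jaccard": 1.0, "levenshtein": 1.0}
    )
    lsh_num_perm: int = 128  # hypothetical LSH parameter


@dataclass
class PathsConfig:
    include: list[str] = field(default_factory=lambda: ["**/*.py"])
    exclude: list[str] = field(default_factory=lambda: [".venv/**"])


@dataclass
class QualityConfig:
    detection: DetectionConfig = field(default_factory=DetectionConfig)
    paths: PathsConfig = field(default_factory=PathsConfig)
```

Pydantic adds validation and YAML-friendly parsing on top of this structure; the nesting is what makes per-section overrides possible.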
### Key Implementation Details
**Pydantic Version Constraint:**
- Must use Pydantic 2.5.x; 2.6 and later are incompatible
- Configuration schemas use Pydantic for validation and defaults
**AST Analysis Strategy:**
- Uses Python's standard `ast` module for parsing
- Custom `NodeVisitor` subclasses for different analysis types
- Preserves line numbers and column offsets for accurate reporting
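A minimal example of the `NodeVisitor` approach, using only the standard `ast` module. The `FunctionCollector` class is illustrative, not one of the repository's visitors; note how it preserves `lineno` and `col_offset` for reporting:

```python
import ast


class FunctionCollector(ast.NodeVisitor):
    """Illustrative visitor: records each function with its line and column,
    the same positional data the engines need for accurate reporting."""

    def __init__(self) -> None:
        self.functions: list[tuple[str, int, int]] = []

    def visit_FunctionDef(self, node: ast.FunctionDef) -> None:
        self.functions.append((node.name, node.lineno, node.col_offset))
        self.generic_visit(node)  # descend into nested functions


source = "def outer():\n    def inner():\n        pass\n"
collector = FunctionCollector()
collector.visit(ast.parse(source))
# collector.functions now holds ("outer", 1, 0) and ("inner", 2, 4)
```

Each analysis type gets its own visitor subclass, so traversal logic stays shared while extraction logic varies.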
**Performance Optimizations:**
- File-based caching with configurable TTL
- Parallel processing for multiple files
- LSH indexing for large-scale duplicate detection
- Incremental analysis support through cache
### Testing Approach
**Test Structure:**
- Unit tests for individual algorithms and components
- Integration tests for end-to-end CLI commands
- Property-based testing for similarity algorithms
- Fixture-based test data in `tests/fixtures/`
**Coverage Requirements:**
- Minimum 80% coverage enforced in CI
- Focus on algorithm correctness and edge cases
- Mocking external dependencies (file I/O, Git operations)
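Pytest discovers plain `test_*` functions, so an edge-case unit test for a similarity algorithm might look like this. The `jaccard` helper is an inline stand-in for the real algorithm class, kept here so the example is self-contained:

```python
def jaccard(a: str, b: str) -> float:
    # Inline stand-in for the repository's token-based algorithm.
    sa, sb = set(a.split()), set(b.split())
    return 1.0 if not (sa | sb) else len(sa & sb) / len(sa | sb)


def test_identical_inputs_score_one() -> None:
    assert jaccard("x = 1", "x = 1") == 1.0


def test_empty_inputs_are_an_edge_case() -> None:
    # Both empty: defined as fully similar rather than a ZeroDivisionError.
    assert jaccard("", "") == 1.0


def test_disjoint_inputs_score_zero() -> None:
    assert jaccard("foo", "bar") == 0.0
```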
### Important Configuration Files
**pyproject.toml:**
- Package metadata and dependencies
- Ruff configuration (linting rules)
- MyPy configuration (type checking)
- Pytest configuration (test discovery and coverage)
**Makefile:**
- Standardizes development commands
- Ensures virtual environment activation
- Combines multiple tools into single targets
**.pre-commit-config.yaml:**
- Automated code quality checks on commit
- Includes ruff, mypy, and standard hooks
## Code Quality Standards
### Linting Configuration
- Ruff with extensive rule selection (E, F, W, UP, ANN, etc.)
- Ignored rules configured for pragmatic development
- Auto-formatting enabled with `make format`
### Type Checking
- Strict MyPy configuration
- All public APIs must have type annotations
- Ignores for third-party libraries without stubs
### Project Structure Conventions
- Similarity algorithms inherit from `BaseSimilarityAlgorithm`
- Analysis engines follow the `analyze()` → `AnalysisResult` pattern
- Configuration uses Pydantic models with validation
- Results formatted through dedicated formatter classes
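A sketch of the engine convention: every engine exposes `analyze()` and returns an `AnalysisResult`. Both classes below are hypothetical shapes for illustration; the real result type presumably carries richer finding data:

```python
from dataclasses import dataclass, field


@dataclass
class AnalysisResult:
    """Minimal result shape; the real class likely holds structured findings."""
    engine: str
    findings: list[str] = field(default_factory=list)


class ComplexityEngine:
    """Hypothetical engine following the analyze() -> AnalysisResult convention."""

    name = "complexity"

    def analyze(self, paths: list[str]) -> AnalysisResult:
        # Real engines parse files and compute metrics; stubbed here.
        return AnalysisResult(engine=self.name, findings=[f"analyzed {p}" for p in paths])
```

A uniform return type is what lets `full_analysis` compose multiple engines and hand their results to a single formatter layer.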
## Critical Dependencies
**Analysis Core:**
- `radon`: Industry-standard complexity metrics
- `datasketch`: LSH implementation for scalable similarity
- `python-Levenshtein`: Fast string similarity
**Infrastructure:**
- `click`: CLI framework with subcommand support
- `pydantic==2.5.3`: Configuration and validation (version-locked)
- `pyyaml`: Configuration file parsing
**Development:**
- `uv`: Fast Python package manager (replaces pip)
- `pytest`: Testing framework with coverage
- `ruff`: Fast Python linter and formatter
- `mypy`: Static type checking