# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

Claude-Scripts is a Python code quality analysis toolkit built on a layered, plugin-based architecture. It detects duplicate code, measures complexity, and flags modernization opportunities. Duplicate detection combines several similarity algorithms and uses locality-sensitive hashing (LSH) to scale to large codebases.

## Development Commands

### Essential Commands
```bash
# Activate virtual environment and install dependencies
source .venv/bin/activate && uv pip install -e ".[dev]"

# Run all quality checks
make check-all

# Run linting and auto-fix issues
make format

# Run type checking
make typecheck

# Run tests with coverage
make test-cov

# Run a single test
source .venv/bin/activate && pytest path/to/test_file.py::TestClass::test_method -xvs

# Install pre-commit hooks
make install-dev

# Build distribution packages
make build
```

### CLI Usage Examples
```bash
# Detect duplicate code
claude-quality duplicates src/ --threshold 0.8 --format console

# Analyze complexity
claude-quality complexity src/ --threshold 10 --format json

# Modernization analysis
claude-quality modernization src/ --include-type-hints

# Full analysis
claude-quality full-analysis src/ --output report.json

# Create exceptions template
claude-quality create-exceptions-template --output-path .quality-exceptions.yaml
```

## Architecture Overview

### Core Design Pattern: Plugin-Based Analysis Pipeline
```
CLI Layer (cli/main.py) → Configuration (config/schemas.py) → Analysis Engines → Output Formatters
```

The system implements multiple design patterns:
- **Strategy Pattern**: Similarity algorithms (`LevenshteinSimilarity`, `JaccardSimilarity`, etc.) are interchangeable
- **Visitor Pattern**: AST traversal for code analysis
- **Factory Pattern**: Dynamic engine creation based on configuration
- **Composite Pattern**: Multiple engines combine for `full_analysis`

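To illustrate the Strategy pattern listed above, here is a minimal sketch of two interchangeable similarity strategies behind one interface. `BaseSimilarityAlgorithm` and `JaccardSimilarity` are named in this document, but their real signatures are not shown here: the `similarity(a, b)` method, the `name` attribute, the `compare` helper, and the `difflib`-based `SequenceRatioSimilarity` class are assumptions made for this example.

```python
from abc import ABC, abstractmethod
from difflib import SequenceMatcher


class BaseSimilarityAlgorithm(ABC):
    """Common interface; the method name and signature are assumed."""

    name: str = "base"

    @abstractmethod
    def similarity(self, a: str, b: str) -> float:
        """Return a score in [0.0, 1.0]."""


class JaccardSimilarity(BaseSimilarityAlgorithm):
    """Token-set overlap: size of the intersection over size of the union."""

    name = "jaccard"

    def similarity(self, a: str, b: str) -> float:
        tokens_a, tokens_b = set(a.split()), set(b.split())
        if not tokens_a and not tokens_b:
            return 1.0  # two empty inputs are treated as identical
        return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


class SequenceRatioSimilarity(BaseSimilarityAlgorithm):
    """Hypothetical text-based strategy built on difflib (stdlib)."""

    name = "sequence_ratio"

    def similarity(self, a: str, b: str) -> float:
        return SequenceMatcher(None, a, b).ratio()


def compare(algorithm: BaseSimilarityAlgorithm, a: str, b: str) -> float:
    # The caller depends only on the interface, so strategies can be swapped.
    return algorithm.similarity(a, b)
```

Because `compare` only sees the base class, adding a new algorithm means adding a subclass rather than touching the calling code.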
### Critical Module Interactions

**Duplicate Detection Flow:**
1. `FileFinder` discovers Python files based on path configuration
2. `ASTAnalyzer` extracts code blocks (functions, classes, methods)
3. `DuplicateDetectionEngine` orchestrates analysis:
   - For small codebases: Direct similarity comparison
   - For large codebases (>1000 files): LSH-based scalable detection
4. `SimilarityCalculator` applies weighted algorithm combination
5. Results filtered through `ExceptionFilter` for configured suppressions

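The flow above could be wired together roughly as follows. This is a simplified sketch, not the project's implementation: `CodeBlock`, `find_duplicates`, and `is_suppressed` are hypothetical names, `difflib.SequenceMatcher` stands in for the weighted `SimilarityCalculator`, suppression is reduced to `fnmatch` patterns on file paths, and the LSH branch is left as a placeholder.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from fnmatch import fnmatch
from itertools import combinations

LSH_FILE_THRESHOLD = 1000  # from the text: LSH is used above 1000 files


@dataclass(frozen=True)
class CodeBlock:
    path: str
    name: str
    source: str


def is_suppressed(block: CodeBlock, patterns: tuple[str, ...]) -> bool:
    # Simplified exception filter: glob patterns matched against the file path.
    return any(fnmatch(block.path, pattern) for pattern in patterns)


def find_duplicates(
    blocks: list[CodeBlock],
    file_count: int,
    threshold: float = 0.8,
    suppress: tuple[str, ...] = (),
) -> list[tuple[str, str, float]]:
    if file_count > LSH_FILE_THRESHOLD:
        # Large codebases would take the LSH path, which is not sketched here.
        raise NotImplementedError("LSH-based detection is out of scope")
    matches = []
    # Small codebases: compare every pair of blocks directly, O(n^2) pairs.
    for left, right in combinations(blocks, 2):
        score = SequenceMatcher(None, left.source, right.source).ratio()
        if score >= threshold:
            matches.append((left, right, score))
    # Last step: drop matches that touch a suppressed path.
    return [
        (left.name, right.name, score)
        for left, right, score in matches
        if not (is_suppressed(left, suppress) or is_suppressed(right, suppress))
    ]
```

Filtering after scoring mirrors the order of the steps above; filtering blocks before comparison would save work, at the cost of diverging from that order.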
**Similarity Algorithm System:**
- Multiple algorithms run in parallel with configurable weights
- Algorithms grouped by type: text-based, token-based, structural, semantic
- Final score = weighted combination of individual algorithm scores
- LSH (Locality-Sensitive Hashing) avoids the O(n²) cost of comparing every pair of blocks and enables roughly O(n log n) scaling for large datasets

**Configuration Hierarchy:**
```
QualityConfig
├── detection: Algorithm weights, thresholds, LSH parameters
├── complexity: Metrics selection, thresholds per metric
├── languages: File extensions, language-specific rules
├── paths: Include/exclude patterns for file discovery
└── exceptions: Suppression rules with pattern matching
```

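The hierarchy above maps naturally onto nested models with defaults and validation. The sketch below uses stdlib `dataclasses` as a stand-in so that it runs without dependencies; the project itself uses Pydantic models. Only the section names come from the tree above. Every field name and default value is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class DetectionConfig:
    similarity_threshold: float = 0.8  # assumed default
    weights: dict[str, float] = field(default_factory=lambda: {"jaccard": 1.0})

    def __post_init__(self) -> None:
        # Hand-written check standing in for Pydantic field validation.
        if not 0.0 <= self.similarity_threshold <= 1.0:
            raise ValueError("similarity_threshold must be within [0, 1]")


@dataclass
class ComplexityConfig:
    cyclomatic_threshold: int = 10  # assumed default


@dataclass
class PathsConfig:
    include: list[str] = field(default_factory=lambda: ["**/*.py"])
    exclude: list[str] = field(default_factory=list)


@dataclass
class QualityConfig:
    detection: DetectionConfig = field(default_factory=DetectionConfig)
    complexity: ComplexityConfig = field(default_factory=ComplexityConfig)
    paths: PathsConfig = field(default_factory=PathsConfig)
    # `languages` and `exceptions` are omitted to keep the sketch short.


# Override one nested value; everything else falls back to defaults.
config = QualityConfig(detection=DetectionConfig(similarity_threshold=0.9))
```

Each section owns its defaults, so a configuration file only needs to list the values it changes.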
### Key Implementation Details

**Pydantic Version Constraint:**
- Must use Pydantic 2.5.x (pinned to `2.5.3`); versions 2.6 and later, including 2.11, cause compatibility issues
- Configuration schemas use Pydantic for validation and defaults

**AST Analysis Strategy:**
- Uses Python's standard `ast` module for parsing
- Custom `NodeVisitor` subclasses for different analysis types
- Preserves line numbers and column offsets for accurate reporting

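A custom `NodeVisitor` of the kind described above might look like the following. `BlockCollector` and its tuple layout are hypothetical; only `ast.NodeVisitor`, `ast.parse`, and the `lineno` and `col_offset` node attributes are standard library API.

```python
import ast


class BlockCollector(ast.NodeVisitor):
    """Collects functions and classes together with their source positions."""

    def __init__(self) -> None:
        # Each entry is (kind, name, line number, column offset).
        self.blocks: list[tuple[str, str, int, int]] = []

    def _record(self, kind: str, node: ast.AST) -> None:
        self.blocks.append((kind, node.name, node.lineno, node.col_offset))
        self.generic_visit(node)  # keep descending to reach nested blocks

    def visit_FunctionDef(self, node: ast.FunctionDef) -> None:
        self._record("function", node)

    def visit_AsyncFunctionDef(self, node: ast.AsyncFunctionDef) -> None:
        self._record("function", node)

    def visit_ClassDef(self, node: ast.ClassDef) -> None:
        self._record("class", node)


source = '''
class Greeter:
    def greet(self, name):
        return f"hello {name}"

def main():
    pass
'''
collector = BlockCollector()
collector.visit(ast.parse(source))
for block in collector.blocks:
    print(block)
```

For this sample, the collector reports `Greeter` at line 2, `greet` at line 3 with column offset 4, and `main` at line 6. Methods are recorded as functions here; telling them apart would require tracking the enclosing class.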
**Performance Optimizations:**
- File-based caching with configurable TTL
- Parallel processing for multiple files
- LSH indexing for large-scale duplicate detection
- Incremental analysis support through cache

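As a rough illustration of file-based caching with a TTL, the sketch below stores JSON results on disk keyed by a hash of the analyzed source. The `FileCache` class, its on-disk layout, and the one-hour default are assumptions for this example, not the project's cache format.

```python
import hashlib
import json
import time
from pathlib import Path
from typing import Any, Optional


class FileCache:
    """Stores JSON-serializable results on disk, keyed by content hash."""

    def __init__(self, directory: Path, ttl_seconds: float = 3600.0) -> None:
        self.directory = directory
        self.ttl_seconds = ttl_seconds
        self.directory.mkdir(parents=True, exist_ok=True)

    def _entry_path(self, source: str) -> Path:
        digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
        return self.directory / f"{digest}.json"

    def get(self, source: str) -> Optional[Any]:
        path = self._entry_path(source)
        if not path.exists():
            return None
        entry = json.loads(path.read_text(encoding="utf-8"))
        if time.time() - entry["stored_at"] > self.ttl_seconds:
            path.unlink()  # expired: drop the entry and report a miss
            return None
        return entry["result"]

    def set(self, source: str, result: Any) -> None:
        entry = {"stored_at": time.time(), "result": result}
        self._entry_path(source).write_text(json.dumps(entry), encoding="utf-8")
```

Keying on file content rather than file path is one way to get incremental analysis: an unchanged file hits the cache, while any edit produces a new key and triggers re-analysis.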
### Testing Approach

**Test Structure:**
- Unit tests for individual algorithms and components
- Integration tests for end-to-end CLI commands
- Property-based testing for similarity algorithms
- Fixture-based test data in `tests/fixtures/`

**Coverage Requirements:**
- Minimum 80% coverage enforced in CI
- Focus on algorithm correctness and edge cases
- Mocking external dependencies (file I/O, Git operations)

### Important Configuration Files

**pyproject.toml:**
- Package metadata and dependencies
- Ruff configuration (linting rules)
- MyPy configuration (type checking)
- Pytest configuration (test discovery and coverage)

**Makefile:**
- Standardizes development commands
- Ensures virtual environment activation
- Combines multiple tools into single targets

**.pre-commit-config.yaml:**
- Automated code quality checks on commit
- Includes ruff, mypy, and standard hooks

## Code Quality Standards

### Linting Configuration
- Ruff with extensive rule selection (E, F, W, UP, ANN, etc.)
- Ignored rules configured for pragmatic development
- Auto-formatting enabled with `make format`

### Type Checking
- Strict MyPy configuration
- All public APIs must have type annotations
- Ignores for third-party libraries without stubs

### Project Structure Conventions
- Similarity algorithms inherit from `BaseSimilarityAlgorithm`
- Analysis engines follow the `analyze()` → `AnalysisResult` pattern
- Configuration uses Pydantic models with validation
- Results formatted through dedicated formatter classes

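The `analyze()` → `AnalysisResult` convention and the separation between engines and formatters can be sketched like this. The fields of `AnalysisResult`, the `FunctionLengthEngine` class, and the `format_console` function are hypothetical; the real classes may carry different data.

```python
import ast
from dataclasses import dataclass, field


@dataclass
class AnalysisResult:
    engine: str
    findings: list[dict] = field(default_factory=list)


class FunctionLengthEngine:
    """Hypothetical engine that flags functions longer than max_lines."""

    def __init__(self, max_lines: int = 50) -> None:
        self.max_lines = max_lines

    def analyze(self, source: str) -> AnalysisResult:
        result = AnalysisResult(engine="function_length")
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                length = node.end_lineno - node.lineno + 1
                if length > self.max_lines:
                    result.findings.append(
                        {"name": node.name, "line": node.lineno, "length": length}
                    )
        return result


def format_console(result: AnalysisResult) -> str:
    # Formatting lives outside the engine, mirroring the dedicated formatters.
    lines = [f"[{result.engine}] {len(result.findings)} finding(s)"]
    for item in result.findings:
        lines.append(f"  {item['name']} (line {item['line']}): {item['length']} lines")
    return "\n".join(lines)
```

Because every engine returns the same result type, a console or JSON formatter can be written once and reused across engines.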
## Critical Dependencies

**Analysis Core:**
- `radon`: Industry-standard complexity metrics
- `datasketch`: LSH implementation for scalable similarity
- `python-Levenshtein`: Fast string similarity

**Infrastructure:**
- `click`: CLI framework with subcommand support
- `pydantic==2.5.3`: Configuration and validation (version-locked)
- `pyyaml`: Configuration file parsing

**Development:**
- `uv`: Fast Python package manager (replaces pip)
- `pytest`: Testing framework with coverage
- `ruff`: Fast Python linter and formatter
- `mypy`: Static type checking