# ROCm Support Implementation Checklist

This checklist tracks the implementation progress for Sprint 18.5.

---

## Phase 1: Device Abstraction Layer

### 1.1 GPU Detection Module

- [ ] Create `src/noteflow/infrastructure/gpu/__init__.py`
- [ ] Create `src/noteflow/infrastructure/gpu/detection.py`
  - [ ] Implement `GpuBackend` enum (NONE, CUDA, ROCM, MPS)
  - [ ] Implement `GpuInfo` dataclass
  - [ ] Implement `detect_gpu_backend()` function
  - [ ] Implement `get_gpu_info()` function
  - [ ] Add ROCm version detection via `torch.version.hip`
- [ ] Create `tests/infrastructure/gpu/test_detection.py`
  - [ ] Test no-torch case
  - [ ] Test CUDA detection
  - [ ] Test ROCm detection (HIP check)
  - [ ] Test MPS detection
  - [ ] Test CPU fallback

### 1.2 Domain Types

- [ ] Create `src/noteflow/domain/ports/gpu.py`
  - [ ] Export `GpuBackend` enum
  - [ ] Export `GpuInfo` type
  - [ ] Define `GpuDetectionProtocol`

### 1.3 ASR Device Types

- [ ] Update `src/noteflow/application/services/asr_config/types.py`
  - [ ] Add `ROCM = "rocm"` to `AsrDevice` enum
  - [ ] Add ROCm entry to `DEVICE_COMPUTE_TYPES` mapping
  - [ ] Update `AsrCapabilities` dataclass with `rocm_available` and `gpu_backend` fields

### 1.4 Diarization Device Mixin

- [ ] Update `src/noteflow/infrastructure/diarization/engine/_device_mixin.py`
  - [ ] Add ROCm detection in `_detect_available_device()`
  - [ ] Maintain backward compatibility with "cuda" device string

### 1.5 System Metrics

- [ ] Update `src/noteflow/infrastructure/metrics/system_resources.py`
  - [ ] Handle ROCm VRAM queries (same API as CUDA via HIP)
  - [ ] Add `gpu_backend` field to metrics

### 1.6 gRPC Proto

- [ ] Update `src/noteflow/grpc/proto/noteflow.proto`
  - [ ] Add `ASR_DEVICE_ROCM = 3` to `AsrDevice` enum
  - [ ] Add `rocm_available` field to `AsrConfiguration`
  - [ ] Add `gpu_backend` field to `AsrConfiguration`
- [ ] Regenerate Python stubs
- [ ] Run `scripts/patch_grpc_stubs.py`

### 1.7 Phase 1 Tests

- [ ] Run `pytest tests/infrastructure/gpu/`
- [ ] Run `make quality-py`
- [ ] Verify no regressions in CUDA detection

---

## Phase 2: ASR Engine Protocol

### 2.1 Engine Protocol Definition

- [ ] Extend `src/noteflow/infrastructure/asr/protocols.py` (or relocate to `domain/ports`)
  - [ ] Reuse `AsrResult` / `WordTiming` from `infrastructure/asr/dto.py`
  - [ ] Add `device` property (logical device: cpu/cuda/rocm)
  - [ ] Add `compute_type` property
  - [ ] Confirm `model_size` + `is_loaded` already covered
  - [ ] Add optional `transcribe_file()` helper (if needed)

### 2.2 Refactor FasterWhisperEngine

- [ ] Update `src/noteflow/infrastructure/asr/engine.py`
  - [ ] Ensure compliance with `AsrEngine`
  - [ ] Add explicit type annotations
  - [ ] Document as CUDA/CPU backend
- [ ] Create `tests/infrastructure/asr/test_protocol_compliance.py`
  - [ ] Verify `FasterWhisperEngine` implements protocol

### 2.3 PyTorch Whisper Engine (Fallback)

- [ ] Create `src/noteflow/infrastructure/asr/pytorch_engine.py`
  - [ ] Implement `WhisperPyTorchEngine` class
  - [ ] Implement all protocol methods
  - [ ] Handle device placement (cuda/rocm/cpu)
  - [ ] Support all compute types
- [ ] Create `tests/infrastructure/asr/test_pytorch_engine.py`
  - [ ] Test model loading
  - [ ] Test transcription
  - [ ] Test device handling

### 2.4 Engine Factory

- [ ] Create `src/noteflow/infrastructure/asr/factory.py`
  - [ ] Implement `create_asr_engine()` function
  - [ ] Implement `_resolve_device()` helper
  - [ ] Implement `_create_cpu_engine()` helper
  - [ ] Implement `_create_cuda_engine()` helper
  - [ ] Implement `_create_rocm_engine()` helper
  - [ ] Define `EngineCreationError` exception
- [ ] Create `tests/infrastructure/asr/test_factory.py`
  - [ ] Test auto device resolution
  - [ ] Test explicit device selection
  - [ ] Test fallback behavior
  - [ ] Test error cases

### 2.5 Update Engine Manager

- [ ] Update `src/noteflow/application/services/asr_config/_engine_manager.py`
  - [ ] Add `detect_rocm_available()` method
  - [ ] Update `build_capabilities()` for ROCm
  - [ ] Update `check_configuration()` for ROCm validation
  - [ ] Use factory for engine creation in `build_engine_for_job()`
- [ ] Update `tests/application/test_asr_config_service.py`
  - [ ] Add ROCm detection tests
  - [ ] Add ROCm validation tests

### 2.6 Phase 2 Tests

- [ ] Run full ASR test suite
- [ ] Run `make quality-py`
- [ ] Verify CUDA path unchanged

---

## Phase 3: ROCm-Specific Engine

### 3.1 ROCm Engine Implementation

- [ ] Create `src/noteflow/infrastructure/asr/rocm_engine.py`
  - [ ] Implement `FasterWhisperRocmEngine` class
  - [ ] Handle CTranslate2-ROCm import with fallback
  - [ ] Implement all protocol methods
  - [ ] Add ROCm-specific optimizations
- [ ] Create `tests/infrastructure/asr/test_rocm_engine.py`
  - [ ] Test import fallback behavior
  - [ ] Test engine creation (mock)
  - [ ] Test protocol compliance

### 3.2 Update Factory for ROCm

- [ ] Update `src/noteflow/infrastructure/asr/factory.py`
  - [ ] Add ROCm engine import with graceful fallback
  - [ ] Log warning when falling back to PyTorch
- [ ] Update factory tests for ROCm path

### 3.3 ROCm Installation Detection

- [ ] Update `src/noteflow/infrastructure/gpu/detection.py`
  - [ ] Add `is_ctranslate2_rocm_available()` function
  - [ ] Add `get_rocm_version()` function
- [ ] Add corresponding tests

### 3.4 Phase 3 Tests

- [ ] Run ROCm-specific tests (skip if no ROCm)
- [ ] Run `make quality-py`
- [ ] Test on AMD hardware (if available)

---

## Phase 4: Configuration & Distribution

### 4.1 Feature Flag

- [ ] Update `src/noteflow/config/settings/_features.py`
  - [ ] Add `NOTEFLOW_FEATURE_ROCM_ENABLED` flag
  - [ ] Document in settings
- [ ] Update any feature flag guards

### 4.2 gRPC Config Handlers

- [ ] Update `src/noteflow/grpc/mixins/asr_config.py`
  - [ ] Handle ROCm device in `GetAsrConfiguration()`
  - [ ] Handle ROCm device in `UpdateAsrConfiguration()`
  - [ ] Add ROCm to capabilities response
- [ ] Update tests in `tests/grpc/test_asr_config.py`

### 4.3 Dependencies

- [ ] Update `pyproject.toml`
  - [ ] Add `rocm` extras group
  - [ ] Add `openai-whisper` as optional dependency
  - [ ] Document ROCm installation in comments
- [ ] Create `requirements-rocm.txt` (optional)

### 4.4 Docker ROCm Image

- [ ] Create `docker/Dockerfile.rocm`
  - [ ] Base on `rocm/pytorch` image
  - [ ] Install NoteFlow with ROCm extras
  - [ ] Configure for GPU access
- [ ] Update `compose.yaml` (and/or add `compose.rocm.yaml`) with ROCm profile
- [ ] Test Docker image build

### 4.5 Documentation

- [ ] Create `docs/installation/rocm.md`
  - [ ] System requirements
  - [ ] PyTorch ROCm installation
  - [ ] CTranslate2-ROCm installation (optional)
  - [ ] Docker usage
  - [ ] Troubleshooting
- [ ] Update main README with ROCm section
- [ ] Update `CLAUDE.md` with ROCm notes

### 4.6 Phase 4 Tests

- [ ] Run full test suite
- [ ] Run `make quality`
- [ ] Build ROCm Docker image
- [ ] Test on AMD hardware

---

## Final Validation

### Quality Gates

- [ ] `pytest tests/quality/` passes
- [ ] `make quality-py` passes
- [ ] `make quality` passes (full stack)
- [ ] Proto regenerated correctly
- [ ] No type errors (`basedpyright`)
- [ ] No lint errors (`ruff`)

### Functional Validation

- [ ] CUDA path works (no regression)
- [ ] CPU path works (no regression)
- [ ] ROCm detection works
- [ ] PyTorch fallback works
- [ ] gRPC configuration works
- [ ] Device switching works

### Documentation

- [ ] Sprint README complete
- [ ] Implementation checklist complete
- [ ] Installation guide complete
- [ ] API documentation updated

---

## Notes

### Files Created

| File | Status |
|------|--------|
| `src/noteflow/domain/ports/gpu.py` | ❌ |
| `src/noteflow/domain/ports/asr.py` | optional (only if relocating protocol) |
| `src/noteflow/infrastructure/gpu/__init__.py` | ❌ |
| `src/noteflow/infrastructure/gpu/detection.py` | ❌ |
| `src/noteflow/infrastructure/asr/pytorch_engine.py` | ❌ |
| `src/noteflow/infrastructure/asr/rocm_engine.py` | ❌ |
| `src/noteflow/infrastructure/asr/factory.py` | ❌ |
| `docker/Dockerfile.rocm` | ❌ |
| `docs/installation/rocm.md` | ❌ |

### Files Modified

| File | Status |
|------|--------|
| `application/services/asr_config/types.py` | ❌ |
| `application/services/asr_config/_engine_manager.py` | ❌ |
| `infrastructure/diarization/engine/_device_mixin.py` | ❌ |
| `infrastructure/metrics/system_resources.py` | ❌ |
| `infrastructure/asr/engine.py` | ❌ |
| `infrastructure/asr/protocols.py` | ❌ |
| `grpc/proto/noteflow.proto` | ❌ |
| `grpc/mixins/asr_config.py` | ❌ |
| `config/settings/_features.py` | ❌ |
| `pyproject.toml` | ❌ |
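
### Reference Sketch: GPU Backend Detection

The detection items in section 1.1 could look roughly like the sketch below. It is not the final `detection.py`: the field names on `GpuInfo`, the `load_torch()` helper, the choice to pass the torch module in as an argument, and the `fake_torch()` test double are all assumptions made for illustration. What it relies on from PyTorch itself is that ROCm builds report their GPU through the `torch.cuda` API and set `torch.version.hip` to a version string, while CUDA builds leave it as `None`.

```python
from __future__ import annotations

import importlib
from dataclasses import dataclass
from enum import Enum
from types import SimpleNamespace
from typing import Any


class GpuBackend(str, Enum):
    NONE = "none"
    CUDA = "cuda"
    ROCM = "rocm"
    MPS = "mps"


@dataclass(frozen=True)
class GpuInfo:
    # Field names are hypothetical; the checklist only names the dataclass.
    backend: GpuBackend
    device_name: str | None = None
    rocm_version: str | None = None


def load_torch() -> Any | None:
    """Return the torch module, or None when PyTorch is not installed."""
    try:
        return importlib.import_module("torch")
    except ImportError:
        return None


def detect_gpu_backend(torch: Any | None) -> GpuBackend:
    """Classify the backend of an already-imported torch module."""
    if torch is None:
        return GpuBackend.NONE  # no-torch case
    if torch.cuda.is_available():
        # ROCm builds answer True here too; the HIP version tells them apart.
        if getattr(torch.version, "hip", None):
            return GpuBackend.ROCM
        return GpuBackend.CUDA
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return GpuBackend.MPS
    return GpuBackend.NONE  # CPU fallback


def get_gpu_info(torch: Any | None) -> GpuInfo:
    """Collect backend, device name, and ROCm version where applicable."""
    backend = detect_gpu_backend(torch)
    if backend in (GpuBackend.CUDA, GpuBackend.ROCM):
        name = torch.cuda.get_device_name(0)
        hip = torch.version.hip if backend is GpuBackend.ROCM else None
        return GpuInfo(backend=backend, device_name=name, rocm_version=hip)
    return GpuInfo(backend=backend)


def fake_torch(cuda: bool, hip: str | None, mps: bool) -> SimpleNamespace:
    """Hypothetical test double exposing only the attributes used above."""
    return SimpleNamespace(
        cuda=SimpleNamespace(
            is_available=lambda: cuda,
            get_device_name=lambda index: "fake-gpu",
        ),
        version=SimpleNamespace(hip=hip),
        backends=SimpleNamespace(mps=SimpleNamespace(is_available=lambda: mps)),
    )
```

In production code the call would be `get_gpu_info(load_torch())`. Passing the module in explicitly keeps the five test cases from section 1.1 (no torch, CUDA, ROCm, MPS, CPU fallback) runnable on machines without a GPU or without PyTorch installed.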