* UN-2901 [FIX] Container startup race condition with polling grace period

* UN-2901 [FIX] Add Redis retry resilience and fix container failure detection

- Add configurable Redis retry decorator with exponential backoff
- Fix critical bug where containers that never start are marked as SUCCESS
- Add robust env var validation for retry configuration
- Apply retry logic to FileExecutionStatusTracker and ToolExecutionTracker
- Document REDIS_RETRY_MAX_ATTEMPTS and REDIS_RETRY_BACKOFF_FACTOR env vars

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* UN-2901 [FIX] Address CodeRabbitAI review feedback for race condition fix

This commit addresses all valid CodeRabbitAI review comments on PR #1602:

1. **Fix retry loop semantics**: Changed retry loop to use range(max_retries + 1) where max_retries means "retries after initial attempt", not total attempts. Updated default from 5 to 4 (total 5 attempts) for clarity.

2. **Fix TypeError in file_execution_tracker.py**: Fixed json.loads() receiving dict instead of string by using string fallback values.

3. **Fix unsafe env var parsing**: Added _safe_get_env_int/_safe_get_env_float helpers with validation and fallback to defaults with warning logs.

4. **Fix status None check**: Added defensive None check before calling .get() on status dict in grace period reset logic.

5. **Update sample.env defaults**: Changed REDIS_RETRY_MAX_ATTEMPTS from 5 to 4 and updated comments to clarify retry semantics.

6. **Improve transient failure handling**: Changed logger.error to logger.warning for transient status fetch failures, added sleep before continue to respect polling interval and avoid API hammering.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
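
For reviewers, here is a minimal sketch of the retry semantics described above. It is not the code from this PR. The names `retry_with_backoff` and `safe_get_env_int`, the `backoff_factor` default of 0.5, the doubling delay schedule, and the use of the built-in `ConnectionError` in place of Redis client exceptions are all assumptions made for illustration. Only the `range(max_retries + 1)` loop and the default of 4 retries (5 total attempts) come from the commit description.

```python
import functools
import logging
import os
import time

logger = logging.getLogger(__name__)


def safe_get_env_int(name, default, min_value=0):
    """Hypothetical helper: read an int env var, fall back to default if invalid."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    try:
        value = int(raw)
    except ValueError:
        logger.warning("Invalid %s=%r, using default %s", name, raw, default)
        return default
    if value < min_value:
        logger.warning("%s=%s is below %s, using default %s",
                       name, value, min_value, default)
        return default
    return value


def retry_with_backoff(max_retries=4, backoff_factor=0.5,
                       exceptions=(ConnectionError,), sleep=time.sleep):
    """Hypothetical decorator: retry on transient errors with exponential backoff.

    max_retries counts retries after the initial attempt, so the wrapped
    function is called at most max_retries + 1 times.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions as exc:
                    if attempt == max_retries:
                        # Retries exhausted: surface the last error.
                        raise
                    # Assumed schedule: the delay doubles on every retry.
                    delay = backoff_factor * (2 ** attempt)
                    logger.warning("Attempt %d failed (%s), retrying in %.2fs",
                                   attempt + 1, exc, delay)
                    sleep(delay)
        return wrapper
    return decorator
```

Under these assumptions, a tracker method could be wrapped as `@retry_with_backoff(max_retries=safe_get_env_int("REDIS_RETRY_MAX_ATTEMPTS", 4))`. The `sleep` parameter is injectable only so that the behaviour can be checked without real delays.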