* UN-2901 [FIX] Container startup race condition with polling grace period
* UN-2901 [FIX] Add Redis retry resilience and fix container failure detection
- Add configurable Redis retry decorator with exponential backoff
- Fix critical bug where containers that never start are marked as SUCCESS
- Add robust env var validation for retry configuration
- Apply retry logic to FileExecutionStatusTracker and ToolExecutionTracker
- Document REDIS_RETRY_MAX_ATTEMPTS and REDIS_RETRY_BACKOFF_FACTOR env vars
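A minimal sketch of what a retry decorator with exponential backoff can look like. This is not the code from the PR: the name `retry_with_backoff`, its parameters, and the use of the built-in `ConnectionError` (standing in for the Redis client's connection errors) are assumptions made for illustration.

```python
import functools
import time


def retry_with_backoff(max_retries=4, backoff_factor=0.5,
                       exceptions=(ConnectionError,), sleep=time.sleep):
    """Retry the wrapped call on the given exceptions.

    max_retries counts retries after the initial attempt, so the call
    runs at most max_retries + 1 times. `sleep` is injectable for tests.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_retries:
                        raise  # retries exhausted, surface the error
                    # Delay doubles on each retry: factor, 2*factor, 4*factor, ...
                    sleep(backoff_factor * (2 ** attempt))
        return wrapper
    return decorator
```

Passing `sleep` as a parameter keeps the sketch testable without real delays.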
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* UN-2901 [FIX] Address CodeRabbitAI review feedback for race condition fix
This commit addresses all valid CodeRabbitAI review comments on PR #1602:
1. **Fix retry loop semantics**: Changed retry loop to use range(max_retries + 1)
where max_retries means "retries after initial attempt", not total attempts.
Updated default from 5 to 4 (total 5 attempts) for clarity.
2. **Fix TypeError in file_execution_tracker.py**: Fixed json.loads() receiving
dict instead of string by using string fallback values.
3. **Fix unsafe env var parsing**: Added _safe_get_env_int/_safe_get_env_float
helpers with validation and fallback to defaults with warning logs.
4. **Fix status None check**: Added defensive None check before calling .get()
on status dict in grace period reset logic.
5. **Update sample.env defaults**: Changed REDIS_RETRY_MAX_ATTEMPTS from 5 to 4
and updated comments to clarify retry semantics.
6. **Improve transient failure handling**: Changed logger.error to logger.warning
for transient status fetch failures, added sleep before continue to respect
polling interval and avoid API hammering.
---------
Co-authored-by: Claude <noreply@anthropic.com>
* UN-2889 [FIX] Handle Celery logger with empty request_id to prevent SIGSEGV crashes
- Simplified logging filters into RequestIDFilter and OTelFieldFilter
- Removed custom DjangoStyleFormatter and StructuredFormatter classes
- Removed Celery's worker_log_format config that created formatters without filters
- Removed LOG_FORMAT environment variable and all format options
- All workers now use single standardized format with filters always applied
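For illustration, a filter of this kind might look like the sketch below. Only the class name `RequestIDFilter` comes from this commit message; the `"-"` placeholder and the format string in the usage note are assumptions.

```python
import logging


class RequestIDFilter(logging.Filter):
    """Guarantee every record has a non-empty request_id attribute."""

    def filter(self, record):
        # Missing, None, or empty request_id is replaced by a placeholder
        # so a format string using %(request_id)s always resolves.
        if not getattr(record, "request_id", None):
            record.request_id = "-"
        return True  # never drop the record, only annotate it
```

A handler would attach it with `handler.addFilter(RequestIDFilter())` and use a format such as `"[%(request_id)s] %(message)s"`.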
* addressed CodeRabbit comment
* addressed CodeRabbit comment
- Add lock TTL (10 min default) to prevent deadlock if worker crashes during destination processing
- Add error check before marking DESTINATION_PROCESSING SUCCESS to ensure no false success status
- Add DESTINATION_PROCESSING_LOCK_TTL_IN_SECOND=600 to workers/sample.env
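The idea behind the lock TTL can be sketched with an in-memory stand-in. The class `TTLLock` is hypothetical and is not the worker code, which would hold the lock in a shared store rather than a local dict; the point is only that an expired lock can be re-acquired even if the holder never released it.

```python
import time


class TTLLock:
    """In-memory stand-in for a lock that expires on its own."""

    def __init__(self, ttl_seconds=600, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested
        self._expires_at = {}

    def acquire(self, key):
        now = self._clock()
        expires_at = self._expires_at.get(key)
        if expires_at is not None and expires_at > now:
            return False  # still held and not yet expired
        # Free, or the previous holder crashed and the TTL has lapsed.
        self._expires_at[key] = now + self._ttl
        return True

    def release(self, key):
        self._expires_at.pop(key, None)
```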
- Fixed stage order: DESTINATION_PROCESSING(2) → FINALIZATION(3) → COMPLETED(4)
- Implemented atomic lock via DESTINATION_PROCESSING IN_PROGRESS status
- Added 3-minute TTL for COMPLETED stage to optimize Redis memory
- Removed conflicting stage status updates from service.py
- Enhanced status history tracking with complete IN_PROGRESS → SUCCESS transitions
* UN-2860 [FIX] Fixed active file cache not preventing duplicate file processing
Fixed critical bug where ActiveFileFilter cache checks were failing to detect
files already being processed, causing duplicate file processing in concurrent
workflow executions.
Key fixes:
- Fixed cache data access: Extract execution_id from nested cache structure
(cached_data["data"]["execution_id"] instead of cached_data["execution_id"])
- Changed cache status from "EXECUTING" to "PENDING" for queued files
- Increased MAX_ACTIVE_FILE_CACHE_TTL from 1hr to 2hrs for resource-constrained environments
- Added cache cleanup in finally blocks to prevent stale entries
- Fixed cache key format consistency (hash-based) between backend and workers
- Optimized DB queries to skip files already found in cache
- Removed ~370 lines of dead code (file_management_utils.py and unused methods)
Root cause: RedisCacheBackend wraps data in {data: {...}, cached_at, ttl} but
filter_pipeline was accessing execution_id directly instead of from nested data key.
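The root cause can be illustrated with a small sketch. The helper name `get_cached_execution_id` is hypothetical; the wrapper shape follows the description above.

```python
def get_cached_execution_id(cached_data):
    """Return execution_id from a wrapped cache entry, or None."""
    if not isinstance(cached_data, dict):
        return None
    payload = cached_data.get("data")  # the real payload sits under "data"
    if not isinstance(payload, dict):
        return None
    return payload.get("execution_id")


# Entry shaped like the wrapper described above.
entry = {"data": {"execution_id": "exec-1"}, "cached_at": 0, "ttl": 7200}
```

Reading `entry.get("execution_id")` directly yields `None` for such an entry, which is why the filter never saw files as already in progress.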
* addressed code rabbit comments
* optional db param for clear cache
---------
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: harini-venkataraman <115449948+harini-venkataraman@users.noreply.github.com>
* UN-2854 [MISC] Set CELERY_TASK_REJECT_ON_WORKER_LOST to false
- Change default value from True to False in worker_models.py
- Update sample.env to reflect new default (false)
- Fix JSON credential quoting in sample.env (double to single quotes)
- Prevents duplicate task processing on worker connection loss
- Matches backend behavior (which never had this issue)
* task_reject_on_worker_lost default value was wrongly added
---------
Co-authored-by: Claude <noreply@anthropic.com>
* UN-2470 [MISC] Remove Django dependency from Celery workers
This commit introduces a new worker architecture that decouples
Celery workers from Django where possible, enabling support for
gevent/eventlet pool types and reducing worker startup overhead.
Key changes:
- Created separate worker modules (api-deployment, callback, file_processing, general)
- Added internal API endpoints for worker communication
- Implemented Django-free task execution where appropriate
- Added shared utilities and client facades
- Updated container configurations for new worker architecture
* Fix pre-commit issues: file permissions and ruff errors
Set up Docker for the new workers
- Add executable permissions to worker entrypoint files
- Fix import order in namespace package __init__.py
- Remove unused variable api_status in general worker
- Address ruff E402 and F841 errors
* refactored, Dockerfiles, fixes
* flexibility on celery run commands
* added debug logs
* handled filehistory for API
* cleanup
* cleanup
* cloud plugin structure
* minor changes in import plugin
* added notification and logger workers under new worker module
* add docker compatibility for new workers
* handled docker issues
* log consumer worker fixes
* added scheduler worker
* minor env changes
* cleanup the logs
* minor changes in logs
* resolved scheduler worker issues
* cleanup and refactor
* ensuring backward compatibility with existing workers
* added configuration internal apis and cache utils
* optimization
* Fix API client singleton pattern to share HTTP sessions
- Fix flawed singleton implementation that was trying to share BaseAPIClient instances
- Now properly shares HTTP sessions between specialized clients
- Eliminates 6x BaseAPIClient initialization by reusing the same underlying session
- Should reduce API deployment orchestration time by ~135ms (from 6 clients to 1 session)
- Added debug logging to verify singleton pattern activation
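A rough sketch of the session-sharing idea, with the caveat that only `BaseAPIClient` is named in this commit message. `FakeSession`, `ExecutionClient`, and `FileClient` are hypothetical placeholders, and the sketch is not thread-safe.

```python
class FakeSession:
    """Placeholder for a real HTTP session (connection pool, headers)."""


class BaseAPIClient:
    _shared_session = None  # one session for every client instance

    def __init__(self):
        # Client objects stay distinct; only the costly session is reused.
        if BaseAPIClient._shared_session is None:
            BaseAPIClient._shared_session = FakeSession()
        self.session = BaseAPIClient._shared_session


class ExecutionClient(BaseAPIClient):
    """Hypothetical specialized client."""


class FileClient(BaseAPIClient):
    """Hypothetical specialized client."""
```

Here `ExecutionClient().session is FileClient().session` holds, while the two client objects themselves remain separate instances.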
* cleanup and structuring
* cleanup in callback
* file system connectors issue
* celery env values changes
* optional gossip
* variables for sync, mingle and gossip
* Fix for file type check
* Task pipeline issue resolving
* api deployment failed response handled
* Task pipeline fixes
* updated file history cleanup with active file execution
* pipeline status update and workflow ui page execution
* cleanup and resolving conflicts
* remove unstract-core from connectors
* Commit uv.lock changes
* uv locks updates
* resolve migration issues
* defer connector-metadata
* Fix connector migration for production scale
- Add encryption key handling with defer() to prevent decryption failures
- Add final cleanup step to fix duplicate connector names
- Optimize for large datasets with batch processing and bulk operations
- Ensure unique constraint in migration 0004 can be created successfully
* hitl fixes
* minor fixes on hitl
* api_hub related changes
* dockerfile fixes
* api client cache fixes with actual response class
* fix: tags and llm_profile_id
* optimized clear cache
* cleanup
* enhanced logs
* added more handling on is file dir and added loggers
* cleanup the runplatform script
* internal apis are exempted from csrf
* sonar cloud issues
* sonar cloud issues
* resolving sonar cloud issues
* resolving sonar cloud issues
* Delta: added Batch size fix in workers
* comments addressed
* celery configurational changes for new workers
* fixes in callback regarding the pipeline type check
* change internal url registry logic
* gitignore changes
* gitignore changes
* addressing PR comments and cleaning up the code
* adding missed profiles for v2
* sonar cloud blocker issues resolved
* implement otel
* Commit uv.lock changes
* handle execution time and some cleanup
* adding user_data in metadata, PR: https://github.com/Zipstack/unstract/pull/1544
* scheduler backward compatibility
* replace user_data with custom_data
* Commit uv.lock changes
* celery worker command issue resolved
* enhance package imports in connectors by changing to lazy imports
* Update runner.py by removing the otel from it
Signed-off-by: ali <117142933+muhammad-ali-e@users.noreply.github.com>
* added delta changes
* handle error to destination db
* resolve tool instances id validation and hitl queue name in API
* handled direct execution from workflow page to worker and logs
* handle cost logs
* Update health.py
Signed-off-by: Ritwik G <100672805+ritwik-g@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* minor log changes
* introducing log consumer scheduler to bulk create, and socket.emit from worker for ws
* Commit uv.lock changes
* time limit or timeout celery config cleanup
* implemented redis client class in worker
* pipeline status enum mismatch
* notification worker fixes
* resolve uv lock conflicts
* workflow log fixes
* ws channel name issue resolved, handling redis down in status tracker, and removing redis keys
* default TTL changed for unified logs
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Signed-off-by: ali <117142933+muhammad-ali-e@users.noreply.github.com>
Signed-off-by: Ritwik G <100672805+ritwik-g@users.noreply.github.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Ritwik G <100672805+ritwik-g@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>