1296 Commits

Chandrasekharan M
8eb50f7128 UN-2830 [MISC] Suppress OpenTelemetry EventLogger deprecation warning in prompt-service (#1585)
UN-2830 Suppress OpenTelemetry EventLogger LogRecord deprecation warning in prompt-service

Suppressed the LogDeprecatedInitWarning from OpenTelemetry SDK 1.37.0's EventLogger.emit()
method which still uses deprecated trace_id/span_id/trace_flags parameters instead of the
context parameter.

This is a temporary workaround for an upstream bug in OpenTelemetry Python SDK where
EventLogger.emit() was not updated when LogRecord API was changed to accept context parameter.
The warning only appears in prompt-service because llama-index emits OpenTelemetry Events.

See: https://github.com/open-telemetry/opentelemetry-python/issues/4328

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-27 11:11:25 +05:30
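The suppression above can be kept narrow by filtering on the warning's message rather than silencing all `DeprecationWarning`s; a minimal sketch (the message pattern here is an assumption about OpenTelemetry's wording, not copied from the SDK):

```python
import warnings


def suppress_otel_eventlogger_warning() -> None:
    """Silence only the EventLogger.emit() deprecation noise from
    opentelemetry-sdk 1.37.0 (upstream issue #4328), where LogRecord's
    deprecated trace_id/span_id/trace_flags parameters are still used.

    Filtering by message keeps other deprecation warnings visible.
    """
    warnings.filterwarnings(
        "ignore",
        message=r".*trace_id.*span_id.*trace_flags.*",  # assumed wording
        category=DeprecationWarning,
    )
```

Because `filterwarnings` prepends its filter, this takes precedence over broader filters installed earlier, and unrelated deprecations still reach the logs.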
Chandrasekharan M
13642ba376 UN-2175 Avoid extracting when doc is already extracted during indexing (#1605)
Co-authored-by: harini-venkataraman <115449948+harini-venkataraman@users.noreply.github.com>
2025-10-27 11:00:14 +05:30
Chandrasekharan M
7d1a01b605 [MISC] Move Redis clients to unstract-core for better code organization (#1613)
- Moved redis_client.py and redis_queue_client.py from workers/shared/cache/ to unstract/core/src/unstract/core/cache/
- Created new cache module in unstract-core with __init__.py exports
- Updated all import statements across workers to use unstract.core.cache instead of shared.cache
- Updated imports in log_consumer tasks and cache_backends to use new location
- Maintains backward compatibility by keeping same API and functionality
2025-10-27 10:52:10 +05:30
ali
a08020a98b UN-2901 [FIX] Prevent race conditions in distributed file processing causing status corruption and premature cleanup (#1609)
This commit implements multiple defensive fixes to prevent race conditions when duplicate
workers process the same file execution, which was causing:
1. COMPLETED/ERROR status being overwritten with PENDING→EXECUTING by late workers
2. Premature chord callback cleanup (FileExpired errors) while workers still processing
3. Missing FINALIZATION stage detection during grace period
4. Incorrect COMPLETED:SUCCESS status for failed executions

Fixes implemented:
- Pre-creation: Check FileExecutionStatusTracker (Redis) + DB before PENDING update
- Grace period: Extended duplicate detection to include FINALIZATION stage
- Destination: Wait for DESTINATION_PROCESSING lock release (prevents chord cleanup)
- Terminal states: Handle both COMPLETED and ERROR in processor.py
- Status tracking: Set COMPLETED:FAILED for failed executions (was SUCCESS)
- Refactoring: Consolidated completed_ttl to method initialization

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-23 18:41:42 +05:30
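The "terminal states" guard in the fix list above amounts to refusing to let a late duplicate worker regress a finished file back to PENDING or EXECUTING. A minimal sketch, with illustrative names rather than the actual worker code:

```python
from enum import Enum


class ExecutionStatus(str, Enum):
    """Illustrative subset of the real status enum."""

    PENDING = "PENDING"
    EXECUTING = "EXECUTING"
    COMPLETED = "COMPLETED"
    ERROR = "ERROR"


TERMINAL_STATES = {ExecutionStatus.COMPLETED, ExecutionStatus.ERROR}


def safe_transition(
    current: ExecutionStatus, new: ExecutionStatus
) -> ExecutionStatus:
    """Drop a late worker's non-terminal update: once a file execution is
    COMPLETED or ERROR, a PENDING/EXECUTING write must not overwrite it."""
    if current in TERMINAL_STATES and new not in TERMINAL_STATES:
        return current  # keep the terminal state; the duplicate update is ignored
    return new
```

In the real fix the current state is read from the Redis `FileExecutionStatusTracker` plus the DB before the PENDING update, so the check and the write race window is small rather than eliminated.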
ali
a219b2dced UN-2901 [FIX] Prevent invalid status updates (EXECUTING/ERROR) from duplicate file processing runs (#1606)
* UN-2901 [FIX] Prevent invalid status updates (EXECUTING/ERROR) from duplicate file processing runs

Fixes race condition where late-arriving workers overwrite COMPLETED status
with invalid EXECUTING or ERROR states, causing files to appear failed/stuck
even though processing succeeded.

Changes:
- FileAPIClient: Fixed URL construction and method call bugs
- Fresh DB validation: Check current status before updating to EXECUTING
- Grace period optimization: Early exit when duplicate detected during tool polling
- File count accuracy: Include skipped files in total_files calculation

Impact: Files now correctly maintain COMPLETED status; no duplicate processing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* CodeRabbit comments addressed

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-23 10:35:58 +05:30
harini-venkataraman
91f17dcc7f UN-2859 [FIX] Fixing input dict for Smart table extractor (#1604)
* Fix/changing prompt keys

* Fix/changing prompt keys

* Fix/adding input file path to table settings

* Fix/adding input file path to table settings
2025-10-22 11:02:29 +05:30
ali
83d17ea4f5 UN-2901 [FIX] Container startup race condition with polling grace period (#1602)
* UN-2901 [FIX] Container startup race condition with polling grace period

* UN-2901 [FIX] Add Redis retry resilience and fix container failure detection

- Add configurable Redis retry decorator with exponential backoff
- Fix critical bug where containers that never start are marked as SUCCESS
- Add robust env var validation for retry configuration
- Apply retry logic to FileExecutionStatusTracker and ToolExecutionTracker
- Document REDIS_RETRY_MAX_ATTEMPTS and REDIS_RETRY_BACKOFF_FACTOR env vars

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* UN-2901 [FIX] Address CodeRabbitAI review feedback for race condition fix

This commit addresses all valid CodeRabbitAI review comments on PR #1602:

1. **Fix retry loop semantics**: Changed retry loop to use range(max_retries + 1)
   where max_retries means "retries after initial attempt", not total attempts.
   Updated default from 5 to 4 (total 5 attempts) for clarity.

2. **Fix TypeError in file_execution_tracker.py**: Fixed json.loads() receiving
   dict instead of string by using string fallback values.

3. **Fix unsafe env var parsing**: Added _safe_get_env_int/_safe_get_env_float
   helpers with validation and fallback to defaults with warning logs.

4. **Fix status None check**: Added defensive None check before calling .get()
   on status dict in grace period reset logic.

5. **Update sample.env defaults**: Changed REDIS_RETRY_MAX_ATTEMPTS from 5 to 4
   and updated comments to clarify retry semantics.

6. **Improve transient failure handling**: Changed logger.error to logger.warning
   for transient status fetch failures, added sleep before continue to respect
   polling interval and avoid API hammering.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-22 10:18:32 +05:30
ali
13baba491a UN-2897 [FIX] Prevent SIGSEGV crashes from Secret Manager gRPC calls in Google Drive connector (#1599)
UN-2897 [FIX] Prevent SIGSEGV from Secret Manager gRPC in Google Drive connector

Add module-level cache for client_secrets to prevent repeated Secret Manager
API calls which use gRPC and are not fork-safe. This addresses the second
root cause discovered after initial lazy initialization fix.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-22 10:05:18 +05:30
ali
70400eb7cc UN-2470 [FIX] Fix AttributeError when logging file batch creation in general worker (#1600)
Fixed incorrect dictionary-style access on FileBatchResponse dataclass.
Changed batch_response.get() calls to proper attribute access.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-22 09:58:31 +05:30
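The bug class above is simple but common: dataclasses are not dicts, so `.get()` raises `AttributeError`. A minimal illustration (field names here are assumptions, not the real `FileBatchResponse` definition):

```python
from dataclasses import dataclass


@dataclass
class FileBatchResponse:
    """Illustrative stand-in for the worker's batch-response dataclass."""

    batch_id: str
    total_files: int
    skipped_files: int = 0


batch_response = FileBatchResponse(batch_id="batch-1", total_files=3)

# Wrong: batch_response.get("total_files")  -> AttributeError
# Right: plain attribute access
total = batch_response.total_files
```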
harini-venkataraman
92b24ca214 UN-2859 [FIX] Fixing prompt keys for Smart table extractor (#1603)
* Fix/changing prompt keys

* Fix/changing prompt keys
2025-10-21 18:38:38 +05:30
Chandrasekharan M
8913c08197 [MISC] Consolidate test reports into single PR comment and document libmagic1 dependency (#1598)
* Update tox.ini to document libmagic1 system dependency requirement

* [CI] Consolidate test reports into single PR comment

This change reduces noise in PR comments by combining multiple test
reports (runner and sdk1) into a single consolidated comment.

Changes:
- Add combine-test-reports.sh script to merge test reports
- Update CI workflow to combine reports before posting
- Replace separate runner/sdk1 comment steps with single combined step
- Summary section shows test counts (passed/failed/total) collapsed by default
- Full detailed reports available in collapsible sections
- Simplify job summary to use combined report

Benefits:
- Single PR comment instead of multiple separate comments
- Cleaner PR comment section with less clutter
- Easy-to-read summary with detailed inspection on demand
- Maintains all existing test information

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Fix test count extraction in combine-test-reports script

The previous version was using text pattern matching which incorrectly
extracted test counts. This fix properly parses the pytest-md-report
markdown table format by:

- Finding the table header row to determine column positions
- Locating the TOTAL row (handles both TOTAL and **TOTAL** formatting)
- Extracting values from the passed, failed, and SUBTOTAL columns
- Using proper table parsing instead of pattern matching

This resolves the issue where the summary showed incorrect counts
(23 for both runner and sdk1 instead of the actual 11 and 66).

Fixes: Test count summary in PR comments now shows correct values

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Fix LaTeX formatting in pytest-md-report test count extraction

pytest-md-report wraps all table values in LaTeX formatting:
$$\textcolor{...}{\tt{VALUE}}$$

The previous fix attempted to parse the table but didn't handle
the LaTeX formatting, causing it to extract 0 for all counts.

Changes:
- Add strip_latex() function to extract values from LaTeX wrappers
- Update grep pattern to match TOTAL row specifically (not SUBTOTAL in header)
- Apply LaTeX stripping to all extracted values before parsing

This fix was tested locally with tox-generated reports and correctly
shows: Runner=11 passed, SDK1=66 passed

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-17 14:49:39 +05:30
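The `strip_latex()` step described above lives in a shell script (`combine-test-reports.sh`); the same parsing idea can be sketched in Python. pytest-md-report wraps each table cell as `$$\textcolor{...}{\tt{VALUE}}$$`, so extracting the inner `\tt{...}` payload recovers the raw count:

```python
import re

# Matches the \tt{VALUE} core of pytest-md-report's LaTeX cell wrapper.
_TT_VALUE = re.compile(r"\\tt\{([^}]*)\}")


def strip_latex(cell: str) -> str:
    r"""Pull VALUE out of $$\textcolor{...}{\tt{VALUE}}$$; cells that are
    not wrapped pass through unchanged (whitespace-trimmed)."""
    match = _TT_VALUE.search(cell)
    return match.group(1) if match else cell.strip()
```

Parsing the table's TOTAL row with this applied to each column is what fixed the summary showing 23 for both suites instead of the actual 11 and 66.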
ali
2bef7d30f2 UN-2897 [FIX] Google Drive connector SIGSEGV crashes in Celery ForkPoolWorker processes (#1597)
UN-2897 [FIX] Google Drive connector SIGSEGV crashes in Celery ForkPoolWorker

Implements lazy initialization for Google Drive API client to prevent
segmentation faults when Celery forks worker processes.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-16 15:32:14 +05:30
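Lazy initialization here means the Drive API client is not constructed until first use inside the already-forked worker, so `fork()` never copies live gRPC/network state. A sketch with an illustrative class; the real connector's factory differs:

```python
class DriveConnector:
    """Stand-in for the connector: nothing network-backed is created at
    construction time, so forked Celery workers inherit only plain state."""

    def __init__(self, settings: dict):
        self._settings = settings
        self._client = None  # created lazily, post-fork

    @property
    def client(self):
        """Build the API client on first access and reuse it afterwards."""
        if self._client is None:
            self._client = self._build_client()
        return self._client

    def _build_client(self):
        # Placeholder for the real Google Drive client factory call.
        return object()
```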
harini-venkataraman
26f34f6a73 UN-2859 [FIX] Removing redundant validations (#1596)
* Removing redundant validations

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2025-10-16 13:59:38 +05:30
ali
8d6c7e20f9 UN-2893 [FIX] Fix duplicate process handling status updates and UI error logs (#1594)
* UN-2893 [FIX] Fix duplicate process handling status updates and UI error logs

Prevent duplicate worker processes from updating file execution status
and showing UI error logs during GKE race conditions.

- Added is_duplicate_skip flag to FileProcessingResult dataclass
- Fixed destination_processed default value for correct duplicate detection
- Skip status updates and UI logs when duplicate is detected
- Only first worker updates status, second worker silently exits

* logger.error converted to logger.exception

* error to exception in logs
2025-10-16 10:50:39 +05:30
ali
464fbb30b6 UN-2882 [FIX] Fix BigQuery float precision issue in PARSE_JSON for metadata serialization (#1593)
* UN-2882 [FIX] Fix BigQuery float precision issue in PARSE_JSON for metadata serialization

- Added BigQuery-specific float sanitization with IEEE 754 double precision safe zone
- Consolidated duplicate float sanitization logic into shared utilities
- Fixed insertion errors caused by floats with >15 significant figures

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Fix truthiness check for empty values in JSON serialization

- Changed 'if sanitized_value' to 'if sanitized_value is not None'
- Prevents empty dicts {}, empty lists [], and zero values from becoming None
- Addresses CodeRabbit AI feedback on PR #1593

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-16 10:39:15 +05:30

harini-venkataraman
7ca14cf323 UN-2859 [FEAT] Smart table extractor - Implementation in API deployments for Excel files (#1582)
* feat/smart-table-extractor

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Address review comments

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Address review comments

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Address review comments

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Deepak K <89829542+Deepak-Kesavan@users.noreply.github.com>
2025-10-15 16:03:03 +05:30
ali
ebadc955a3 UN-2889 [FIX] Handle Celery logger with empty request_id to prevent SIGSEGV crashes (#1591)
* UN-2889 [FIX] Handle Celery logger with empty request_id to prevent SIGSEGV crashes

- Simplified logging filters into RequestIDFilter and OTelFieldFilter
- Removed custom DjangoStyleFormatter and StructuredFormatter classes
- Removed Celery's worker_log_format config that created formatters without filters
- Removed LOG_FORMAT environment variable and all format options
- All workers now use single standardized format with filters always applied

* Addressed CodeRabbit comment

* Addressed CodeRabbit comment
2025-10-15 15:00:10 +05:30
ali
7661377e00 UN-2885 [FIX] Fix MinIO cleanup failure due to incorrect organization ID format in workflow execution context (#1590)
UN-2885 [FIX] Fix MinIO cleanup failure due to incorrect organization ID format

Fixed MinIO path mismatch causing cleanup failures by correcting the
organization_id format returned in workflow execution context.

Changed backend/workflow_manager/internal_views.py:443 from returning
numeric organization.id (e.g., "2") to string organization.organization_id
(e.g., "org_HJqNgodUQsA99S3A") in _get_organization_context() method.

This ensures MinIO cleanup uses the correct organization ID format that
matches the paths where tools write execution files.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-15 14:32:11 +05:30
ali
16d045f02c UN-2882 [FIX] Fix BigQuery float precision issue in metadata serialization (#1589)
* Fix BigQuery float precision issue by normalizing floats before JSON serialization

- Added _sanitize_floats_for_database() helper function to recursively normalize
  float values to 6 decimal precision using string formatting
- Modified _add_processing_columns() to sanitize metadata before json.dumps()
- Fixes BigQuery insertion failures caused by floats that can't round-trip
  through string representation (e.g., 22.770092)
- Solution normalizes internal binary representation via float(f"{x:.6f}")
- Handles edge cases: NaN and Infinity converted to None
- Works recursively on nested dicts/lists
- Backward compatible, preserves meaningful precision
- Protects all database types (BigQuery, PostgreSQL, MySQL, Snowflake)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Addressed PR comments: apply the sanitize method over the data field too

* moved math import to top of the file

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-14 19:56:57 +05:30
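The sanitizer described above (6-decimal normalization via `float(f"{x:.6f}")`, NaN/Infinity to None, recursive over dicts and lists) can be sketched as follows; the name is illustrative of the private helper in the commit. The companion fix from PR #1593 applies when consuming the result: check `is not None` rather than truthiness, so `{}`, `[]`, and `0` survive:

```python
import math


def sanitize_floats(value, precision: int = 6):
    """Recursively normalize floats so they round-trip cleanly through
    their string form (e.g. 22.770092); NaN/Infinity become None."""
    if isinstance(value, float):
        if math.isnan(value) or math.isinf(value):
            return None
        return float(f"{value:.{precision}f}")
    if isinstance(value, dict):
        return {k: sanitize_floats(v, precision) for k, v in value.items()}
    if isinstance(value, list):
        return [sanitize_floats(v, precision) for v in value]
    return value  # ints, strings, bools, None pass through unchanged
```

Run before `json.dumps()`, this keeps values inside what BigQuery's `PARSE_JSON` accepts without loss, and is harmless for PostgreSQL, MySQL, and Snowflake.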
ali
407d5f54d6 Fix organization context pollution in shared HTTP sessions
- Remove X-Organization-ID from session headers in _setup_session()
- Remove X-Organization-ID from set_organization_context() method
- Update clear_organization_context() to only clear instance variables
- Use per-request headers in _make_request() to prevent pollution

This prevents callback workers from inheriting wrong organization context
when using shared HTTP sessions with singleton pattern.

Fixes: UN-2877

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-14 15:29:18 +05:30
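The fix above boils down to never writing organization context into a shared session's headers; context lives on the client instance and is injected per request. A sketch with assumed names (the real client's interface is richer):

```python
class InternalAPIClient:
    """Illustrative sketch: org context is instance state, injected into
    each request's headers, never into the shared session's headers."""

    def __init__(self, session):
        self._session = session  # may be a process-wide singleton
        self._org_id = None

    def set_organization_context(self, org_id: str) -> None:
        self._org_id = org_id  # instance variable only

    def clear_organization_context(self) -> None:
        self._org_id = None

    def _make_request(self, method: str, url: str, **kwargs):
        headers = dict(kwargs.pop("headers", {}))
        if self._org_id:
            headers.setdefault("X-Organization-ID", self._org_id)
        return self._session.request(method, url, headers=headers, **kwargs)
```

Because `session.headers` is never mutated, a callback worker reusing the same session cannot inherit another request's organization.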
ali
922d22e06f move the lock deletion to handle_output just after destination process 2025-10-14 15:23:51 +05:30
ali
b8c8aa94af minor additions 2025-10-14 15:23:51 +05:30
ali
cb37fe3e1d prevent status update from duplicate tasks 2025-10-14 15:23:51 +05:30
ali
e7efffaa7f remove CELERY_WORKER_HEARTBEAT=0 from sample.env 2025-10-14 15:23:51 +05:30
ali
dfa0e2a209 replacing .error with .exception in common exception 2025-10-14 15:23:51 +05:30
ali
241274bd8b replacing .error with .exception in common exception 2025-10-14 15:23:51 +05:30
ali
6867e0381a adding redis lock while DESTINATION_PROCESS stage 2025-10-14 15:23:51 +05:30
ali
84a9e5d4ef UN-2874 Address CodeRabbit review: Add lock TTL and error checking for destination processing
- Add lock TTL (10 min default) to prevent deadlock if worker crashes during destination processing
- Add error check before marking DESTINATION_PROCESSING SUCCESS to ensure no false success status
- Add DESTINATION_PROCESSING_LOCK_TTL_IN_SECOND=600 to workers/sample.env

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-14 15:23:51 +05:30
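The lock-with-TTL pattern above maps to a single atomic Redis command (`SET key owner NX EX ttl`); the sketch below models the same semantics against an in-memory store, since a two-step check-then-set against real Redis would not be atomic. Names and the 600s default mirror `DESTINATION_PROCESSING_LOCK_TTL_IN_SECOND`:

```python
import time


class DestinationLock:
    """First worker wins; the TTL lets a crashed worker's lock expire so
    the file is never deadlocked. In-memory stand-in for Redis SET NX EX."""

    def __init__(self, store: dict, ttl_seconds: int = 600):
        self._store = store  # key -> (owner, expires_at)
        self._ttl = ttl_seconds

    def acquire(self, key: str, owner: str) -> bool:
        entry = self._store.get(key)
        now = time.monotonic()
        if entry and entry[1] > now:
            return False  # a live lock is held by someone else
        self._store[key] = (owner, now + self._ttl)
        return True

    def release(self, key: str, owner: str) -> None:
        entry = self._store.get(key)
        if entry and entry[0] == owner:  # only the owner may release
            del self._store[key]
```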
ali
35be79f549 UN-2874 [FIX] Prevent duplicate destination entries when worker drops abruptly using atomic lock mechanism
- Fixed stage order: DESTINATION_PROCESSING(2) → FINALIZATION(3) → COMPLETED(4)
- Implemented atomic lock via DESTINATION_PROCESSING IN_PROGRESS status
- Added 3-minute TTL for COMPLETED stage to optimize Redis memory
- Removed conflicting stage status updates from service.py
- Enhanced status history tracking with complete IN_PROGRESS → SUCCESS transitions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-14 15:23:51 +05:30
ali
10571598f2 UN-2866 [FIX] Fix duplicate detection parameter name mismatch causing false positives on worker retry
Fixed parameter name from 'exclude_execution_id' to 'current_execution_id'
in worker API client to match backend endpoint expectations. This allows
worker retries after pod crashes to properly exclude current execution
from duplicate detection.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-13 16:53:07 +05:30
harini-venkataraman
4b0c57dfeb Revert "UN-2859 [FEAT] Smart table extractor - Implementation in API deployments for Excel files" (#1581)
Revert "UN-2859 [FEAT] Smart table extractor - Implementation in API deployme…"

This reverts commit 1c95bc56ea.
2025-10-13 15:54:22 +05:30
harini-venkataraman
1c95bc56ea UN-2859 [FEAT] Smart table extractor - Implementation in API deployments for Excel files (#1579)
* feat/smart-table-extractor

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Address review comments

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Address review comments

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Address review comments

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Gayathri <142381512+gaya3-zipstack@users.noreply.github.com>
2025-10-13 15:24:09 +05:30
Rahul Johny
89d4d6b5a1 UN-2871 [FEATURE] Log sharing across shared workflows/deployments (#1580)
* UN-2871 [FEATURE] Add shared workflow executions filter to enable multi-user access

Update WorkflowExecutionManager.for_user() to include executions from workflows shared with users, ensuring consistent access control across workflow and execution models.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Update backend/workflow_manager/workflow_v2/models/execution.py

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: Rahul Johny <116638720+johnyrahul@users.noreply.github.com>

* UN-2871 [FIX] Move Q import to top-level for PEP8 compliance

Move django.db.models.Q import from function-level to module-level to comply with linting standards and improve code organization.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* UN-2871 [SECURITY] Fix execution filtering to respect independent workflow and deployment sharing

Update WorkflowExecutionManager.for_user() to properly handle independent sharing between workflows and API deployments/pipelines. Previous implementation only checked workflow sharing, allowing users to see executions for unshared deployments.

Key changes:
- Add separate filters for API deployments and pipelines access
- Implement proper logic for independent sharing scenarios:
  * Workflow shared + no pipeline -> User sees workflow-level executions
  * API/Pipeline shared (regardless of workflow) -> User sees those executions
  * Both shared -> User sees all related executions
  * Neither shared -> User cannot see executions

This ensures users can only view executions for resources they have explicit access to, preventing unauthorized data exposure.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* UN-2871 [PERF] Optimize ExecutionFilter to use EXISTS instead of values_list for large datasets

Replace inefficient values_list() queries with EXISTS subqueries in filter_execution_entity().
This significantly improves performance when filtering by entity type on large datasets.

Performance improvements:
- API filter: Uses EXISTS check instead of fetching all API deployment IDs
- ETL filter: Uses EXISTS check instead of fetching all ETL pipeline IDs
- TASK filter: Uses EXISTS check instead of fetching all TASK pipeline IDs
- Workflow filter: Simplified to use isnull check (removed redundant workflow_id filter)

EXISTS is more efficient because:
1. Stops at first match (short-circuits)
2. Doesn't transfer data from database to application
3. Better query optimizer hints for the database
4. Reduced memory usage

The queryset is already filtered by user permissions via get_queryset(),
so this change only optimizes the entity type filtering step.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* UN-2871 [FIX] Move Exists and OuterRef imports to module level for PEP8 compliance

Move django.db.models.Exists and OuterRef imports from function-level to module-level to comply with linting standards and improve code organization.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Signed-off-by: Rahul Johny <116638720+johnyrahul@users.noreply.github.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-10-13 15:04:15 +05:30
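The `values_list()` → `EXISTS` optimization above compiles down to a correlated subquery the database can short-circuit, instead of shipping every deployment ID to the application for an `IN (...)` filter. A minimal SQL illustration with an invented two-table schema (Django's `Exists(OuterRef(...))` generates this shape):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE execution (id INTEGER PRIMARY KEY, pipeline_id INTEGER);
    CREATE TABLE api_deployment (id INTEGER PRIMARY KEY);
    INSERT INTO api_deployment (id) VALUES (10), (11);
    INSERT INTO execution (id, pipeline_id) VALUES (1, 10), (2, 99), (3, 11);
    """
)

# EXISTS stops at the first matching deployment row and transfers no ID
# list to the application, unlike SELECT id FROM api_deployment + IN (...).
rows = conn.execute(
    """
    SELECT e.id FROM execution e
    WHERE EXISTS (
        SELECT 1 FROM api_deployment d WHERE d.id = e.pipeline_id
    )
    ORDER BY e.id
    """
).fetchall()
```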
ali
2aebf04905 UN-2867 [FIX] Fixed webhook notifications showing incorrect pipeline name for SUCCESS/COMPLETED events (#1577)
* UN-2867 [FIX] Fixed webhook notifications showing incorrect pipeline name

- Added pipeline name fetch from Pipeline/APIDeployment models in callback worker
- Replaced workflow name fallback with explicit "Unknown API"/"Unknown Pipeline" values
- Extracted pipeline name fetching logic into reusable _fetch_pipeline_name_from_api() method

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Convert some exception errors to logger.exception for better debugging

* Replace logger.warning with logger.exception

* Addressed CodeRabbitAI's comments

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Ritwik G <100672805+ritwik-g@users.noreply.github.com>
2025-10-13 12:13:05 +05:30
ali
c5bc92be1e UN-2869 [FIX] Add broker heartbeat configuration to prevent RabbitMQ connection timeouts causing false duplicate detection (#1578)
UN-2869 [FIX] Add broker heartbeat configuration to prevent RabbitMQ connection timeouts

This fix addresses false duplicate file detection caused by stale IN_PROGRESS
records when RabbitMQ disconnects idle workers after 60 seconds.

Changes:
- Added broker_heartbeat=30s to WorkerCeleryConfig in workers/shared/models/worker_models.py
- Configurable via CELERY_BROKER_HEARTBEAT env var (default: 30s)
- Prevents RabbitMQ connection drops during long-running tasks
- Eliminates stale cache/DB entries that cause false duplicate detection

Technical Details:
- RabbitMQ default timeout: 60 seconds
- Recommended heartbeat: 30 seconds (half of timeout)
- Uses get_celery_setting() for hierarchical config: worker-specific -> global -> default

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-10 20:03:10 +05:30
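The hierarchical lookup above (worker-specific → global → default) can be sketched as a small env-var resolver; the real `get_celery_setting()` in `workers/shared/models/worker_models.py` likely differs in signature, so treat this as illustrative:

```python
import os


def get_celery_setting(worker: str, key: str, default: str) -> str:
    """Resolve a Celery setting hierarchically: a worker-specific env var
    (e.g. FILE_PROCESSING_CELERY_BROKER_HEARTBEAT) wins over the global
    one (CELERY_BROKER_HEARTBEAT), which wins over the built-in default."""
    return os.environ.get(f"{worker}_{key}", os.environ.get(key, default))


# Heartbeat at half RabbitMQ's 60s idle timeout keeps connections alive, e.g.:
#   heartbeat = int(get_celery_setting("FILE_PROCESSING", "CELERY_BROKER_HEARTBEAT", "30"))
```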
ali
f62b2d6d86 UN-2866 [MISC] Add database duplicate file check in file_processing worker with ERROR record creation for audit trail (#1575)
* UN-2866 [MISC] Add database duplicate file check in file_processing worker

This adds a defensive double-safeguard layer to prevent duplicate file
processing at the file_processing worker level, complementing the existing
duplicate check in the general worker.

Changes:
- Added _check_file_already_active_in_db() helper function that checks
  database (not Redis) for files already being processed in other executions
- Modified _pre_create_file_executions() to check for duplicates before
  creating file execution records
- Duplicates create WorkflowFileExecution records marked as ERROR status
  (not skipped) with detailed error message for full audit trail
- Added UI logging via log_file_processing_error() for user visibility
- Modified _process_individual_files() to handle skipped duplicates gracefully
- Updated _compile_batch_result() to include skipped_duplicates count

Benefits:
- Double protection against race conditions (catches edge cases)
- Full audit trail - every file has a database record with explanation
- Better user experience - users see why files were skipped in UI
- Transparent failure counting - duplicates counted as failed with reason
- Database-only check (efficient, no Redis overhead)

Implementation Details:
- Only checks database (Redis already updated by general worker)
- Fail-safe approach - allows processing if check fails
- Creates ERROR records for duplicates instead of silently skipping
- Proper logging at both console and UI levels
- Skipped duplicates included in failed_files count

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* UN-2866 [MISC] Enhanced duplicate file detection with Redis-first check and stable identifier tracking

- Redis-first duplicate check with DB fallback for faster detection
- Stable identifier tracking (uuid:path) to prevent misclassification
- Reuse ActiveFileManager methods for consistency

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Ritwik G <100672805+ritwik-g@users.noreply.github.com>
2025-10-10 13:03:26 +05:30
ali
77e14eadf6 UN-2865 [FIX] Remove premature COMPLETED status update in general worker after async file orchestration (#1574)
UN-2865 [FIX] Remove premature COMPLETED status update in general worker

This fix addresses a critical bug where the general worker incorrectly
marked workflow executions as COMPLETED immediately after orchestrating
async file processing, while files were still being processed.

Changes:
- Removed WorkflowExecutionStatusUpdate that set status to COMPLETED
- Removed incorrect execution_time update (only orchestration time)
- Removed incorrect total_files calculation
- Updated comments to clarify orchestration vs execution completion
- Updated logging to reflect async orchestration behavior

The callback worker now properly handles setting the final COMPLETED or
ERROR status after all files finish processing, matching the pattern
used by the API deployment worker.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-10 10:25:31 +05:30
ali
435759b12f UN-2857 [MISC] Fix Redis unacked hash memory leak in log consumer worker (#1566)
Added write_only=True parameter to KombuManager in log_consumer worker
to prevent accumulation of unacked messages in Redis.
2025-10-10 10:25:12 +05:30
ali
d7113fcf80 UN-2862 [FIX] Remove legacy QUEUED and CANCELED statuses from ExecutionStatus enum (#1571)
UN-2862 [FIX] Remove legacy QUEUED and CANCELED statuses

- Remove QUEUED and CANCELED enum values from ExecutionStatus
- Remove legacy status mappings from 4 worker status mapping files
- Remove dead LEGACY_STATUS_MAPPING dict (never used)
- Remove unreachable QUEUED/CANCELED mappings from callback tasks
- Fix unused QUEUED default parameter to PENDING in internal_client
- Add human-readable labels to ExecutionStatus.choices using .title()
- Remove unmigrated API constraint from file_execution model

All legacy status references cleaned up while maintaining backward compatibility
through string-based legacy mappings in callback tasks.

Co-authored-by: harini-venkataraman <115449948+harini-venkataraman@users.noreply.github.com>
2025-10-09 15:05:37 +05:30
Rahul Johny
0faa0b6cd7 UN-2852 [FIX] Include shared workflows in endpoint queryset (#1572)
Updated WorkflowEndpointViewSet to use Workflow.objects.for_user() method
instead of filtering by workflow__created_by. This ensures that workflow
endpoints are visible for both owned and shared workflows, maintaining
consistency with WorkflowViewSet behavior.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-09 14:57:31 +05:30
ali
0f22474e6b UN-2860 [FIX] Fixed active file cache not preventing duplicate file processing (#1570)
* UN-2860 [FIX] Fixed active file cache not preventing duplicate file processing

Fixed critical bug where ActiveFileFilter cache checks were failing to detect
files already being processed, causing duplicate file processing in concurrent
workflow executions.

Key fixes:
- Fixed cache data access: Extract execution_id from nested cache structure
  (cached_data["data"]["execution_id"] instead of cached_data["execution_id"])
- Changed cache status from "EXECUTING" to "PENDING" for queued files
- Increased MAX_ACTIVE_FILE_CACHE_TTL from 1hr to 2hrs for resource-constrained environments
- Added cache cleanup in finally blocks to prevent stale entries
- Fixed cache key format consistency (hash-based) between backend and workers
- Optimized DB queries to skip files already found in cache
- Removed ~370 lines of dead code (file_management_utils.py and unused methods)

Root cause: RedisCacheBackend wraps data in {data: {...}, cached_at, ttl} but
filter_pipeline was accessing execution_id directly instead of from nested data key.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Addressed CodeRabbit comments

* optional db param for clear cache

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: harini-venkataraman <115449948+harini-venkataraman@users.noreply.github.com>
2025-10-09 12:14:32 +05:30
Chandrasekharan M
0c0c8c1034 UN-2793 [FEAT] Add retry logic with exponential backoff to SDK1 (#1564)
* UN-2793 [FEAT] Add retry logic with exponential backoff to SDK1

Implemented automatic retry logic for platform and prompt service calls
with configurable exponential backoff, comprehensive test coverage, and
CI integration.

Features:
- Exponential backoff with jitter for transient failures
- Configurable via environment variables (MAX_RETRIES, MAX_TIME, BASE_DELAY, etc.)
- Retries ConnectionError, Timeout, HTTPError (502/503/504), OSError
- 67 tests with 100% pass rate
- CI integration with test reporting
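The backoff-with-jitter behavior can be sketched as below; the function name follows the `TestCalculateDelay` tests mentioned later in this message, but the exact signature and defaults are assumptions:

```python
import random

def calculate_delay(attempt: int, base_delay: float = 1.0,
                    max_delay: float = 60.0, jitter: bool = True) -> float:
    """Delay before retry `attempt` (0-based): exponential growth capped
    at max_delay, with optional full jitter so concurrent clients don't
    retry in lockstep against an already struggling service."""
    delay = min(base_delay * (2 ** attempt), max_delay)
    if jitter:
        delay = random.uniform(0, delay)
    return delay
```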

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* [SECURITY] Use full commit SHA for sticky-pull-request-comment action

Replace tag reference with full commit SHA for better security:
- marocchino/sticky-pull-request-comment@v2 → @7737449 (v2.9.4)

This prevents potential supply chain attacks where tags could be moved
to point to malicious code. Commit SHAs are immutable.

Fixes SonarQube security hotspot for external GitHub action.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* [FIX] Allow retryable HTTP errors (502/503/504) to propagate for retry

Fixed HTTPError handling in _get_adapter_configuration to check status
codes and re-raise retryable errors (502, 503, 504) so the retry
decorator can handle them. Non-retryable errors are still converted
to SdkError as before.

Changes:
- Check HTTPError status code before converting to SdkError
- Re-raise HTTPError for 502/503/504 to allow retry decorator to retry
- Added parametrized test for all retryable status codes (502, 503, 504)
- All 12 platform tests pass

This fixes a bug where 502/503/504 errors were not being retried
because they were converted to SdkError before the retry decorator
could see them.
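The control flow of the fix can be sketched as follows. A stand-in `HTTPError` class is used to keep the sketch dependency-free (the real SDK handles `requests.HTTPError` inside `_get_adapter_configuration`):

```python
class HTTPError(Exception):
    """Stand-in for requests.HTTPError, carrying a status code."""
    def __init__(self, status_code: int):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code

class SdkError(Exception):
    pass

RETRYABLE_STATUS_CODES = {502, 503, 504}

def handle_http_error(exc: HTTPError) -> None:
    """Re-raise retryable HTTP errors so an outer retry decorator can
    see and retry them; wrap everything else in SdkError as before."""
    if exc.status_code in RETRYABLE_STATUS_CODES:
        raise exc
    raise SdkError(str(exc)) from exc
```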

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* [FIX] Use pytest.approx() for floating point comparisons in tests

Replaced direct equality comparisons (==) with pytest.approx() for
floating point values to avoid precision issues and satisfy SonarQube
code quality check (python:S1244).

Changes in test_retry_utils.py:
- test_exponential_backoff_without_jitter: Use pytest.approx() for 1.0, 2.0, 4.0, 8.0
- test_max_delay_cap: Use pytest.approx() for 5.0

This is the proper way to compare floating point values in tests,
accounting for floating point precision limitations.
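A sketch of the pattern the two tests now follow (test bodies are illustrative, not the SDK's actual tests):

```python
import pytest

def test_exponential_backoff_without_jitter():
    delays = [1.0 * 2 ** n for n in range(4)]
    # `==` on floats trips SonarQube rule python:S1244; approx()
    # compares within a relative tolerance (1e-6 by default).
    assert delays == pytest.approx([1.0, 2.0, 4.0, 8.0])

def test_max_delay_cap():
    assert min(1.0 * 2 ** 10, 5.0) == pytest.approx(5.0)
```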

All 4 TestCalculateDelay tests pass.

Fixes SonarQube: python:S1244 - Do not perform equality checks with
floating point values.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* minor: Addressed code smells, ruff fixes

* misc: Fixed tox config for sdk1 tests

* misc: Ruff issues fixed

* misc: tox tests fixed

* prompt service lock file for venv

* updated lock files for backend and prompt-service

* UN-2793 [FEAT] Update to unstract-sdk v0.78.0 with retry logic support (#1567)

[FEAT] Update unstract-sdk to v0.78.0 across all services and tools

- Updated unstract-sdk dependency from v0.77.3 to v0.78.0 in all pyproject.toml files
  - Main repository, backend, workers, platform-service, prompt-service
  - filesystem and tool-registry modules
- Updated tool requirements.txt files (structure, classifier, text_extractor)
- Bumped tool versions in properties.json:
  - Structure tool: 0.0.88 → 0.0.89
  - Classifier tool: 0.0.68 → 0.0.69
  - Text extractor tool: 0.0.64 → 0.0.65
- Updated tool versions in backend/sample.env and public_tools.json
- Regenerated all uv.lock files with new SDK version

This update brings in the retry logic with exponential backoff from unstract-sdk v0.78.0

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <noreply@anthropic.com>

---------

Signed-off-by: Chandrasekharan M <117059509+chandrasekharan-zipstack@users.noreply.github.com>
Co-authored-by: Claude <noreply@anthropic.com>
2025-10-09 10:48:19 +05:30
Chandrasekharan M
9997ee10e5 [MISC] Updated chord retry to parse values as float for v2 workers (#1568)
misc: Updated chord retry to parse values as float for v2 workers
2025-10-09 10:35:20 +05:30
Chandrasekharan M
f15fe26a09 [MISC] Parse CELERY_RESULT_CHORD_RETRY_INTERVAL as float instead of int (#1563)
misc: Correct chord retry env parsing
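A sketch of why the parsing change matters; the helper name and default are assumptions, only the env var name comes from the commit:

```python
import os

def chord_retry_interval(default: float = 3.0) -> float:
    """Parse CELERY_RESULT_CHORD_RETRY_INTERVAL as a float.

    int("0.5") raises ValueError, so sub-second intervals broke the
    old int() parsing; float() accepts both "1" and "0.5".
    """
    raw = os.environ.get("CELERY_RESULT_CHORD_RETRY_INTERVAL")
    return float(raw) if raw is not None else default
```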
2025-10-08 13:51:17 +05:30
Praveen Kumar
d6e06704b1 UN-1856 [GATED-FEAT] - Added file storage deps for sdk1 (#1557)
* Added file storage deps for sdk1

* Removed specific file storage deps from tomls and clubbed them together with unstract-sdk1

* Commit uv.lock changes

* Pinned versions for azure-identity and azure-mgmt-apimanagement in backend's pyproject.toml

---------

Co-authored-by: pk-zipstack <187789273+pk-zipstack@users.noreply.github.com>
2025-10-08 12:30:01 +05:30
ali
26f423cea5 UN-2850 [FIX] Add INPROGRESS notification for scheduled ETL executions and adjust status mapping (#1562)
* UN-2850 [FIX] Update ETL webhook response status change for scheduled executions

- Add INPROGRESS status to NotificationStatus enum for scheduled executions
- Add SCHEDULED_EXECUTION source to NotificationSource enum
- Fix notification status mapping for backward compatibility:
  - API workflows use COMPLETED status
  - ETL/TASK workflows use SUCCESS status
- Update Slack webhook formatting to inline display
- Standardize key naming in Slack notifications (ID → Id)
- Trigger INPROGRESS notification when scheduled pipeline starts
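The backward-compatible status mapping above can be sketched as below; the function name is hypothetical and the enum is trimmed to the members this change touches:

```python
from enum import Enum

class NotificationStatus(Enum):
    INPROGRESS = "INPROGRESS"
    SUCCESS = "SUCCESS"
    COMPLETED = "COMPLETED"

def completion_status(workflow_type: str) -> NotificationStatus:
    """Map a finished workflow to its webhook status for backward
    compatibility: API deployments report COMPLETED, while ETL/TASK
    pipelines report SUCCESS."""
    return (NotificationStatus.COMPLETED if workflow_type == "API"
            else NotificationStatus.SUCCESS)
```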

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* missed to commit

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-08 11:45:35 +05:30
ali
490e394a71 UN-2854 [MISC] Set CELERY_TASK_REJECT_ON_WORKER_LOST to false by default to prevent duplicate task processing (#1565)
* UN-2854 [MISC] Set CELERY_TASK_REJECT_ON_WORKER_LOST to false

- Change default value from True to False in worker_models.py
- Update sample.env to reflect new default (false)
- Fix JSON credential quoting in sample.env (double to single quotes)
- Prevents duplicate task processing on worker connection loss
- Matches backend behavior (which never had this issue)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* task_reject_on_worker_lost default value was wrongly added

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-08 11:40:53 +05:30
Praveen Kumar
52362f1730 UN-1856 [GATED-FEAT] - Ruff linting issues in SDK v1 (#1560)
* fix(sdk1): resolve 185 ruff linting issues across multiple categories

Systematically fixed all ruff linting errors in the sdk1 directory by category:

- I001: Fixed 38 import sorting issues
- ANN201/ANN202/ANN204: Added missing return type annotations
- ANN003: Added type annotations for **kwargs parameters
- ANN001: Added type annotations for function arguments
- D415: Fixed docstring punctuation (first line ending)
- B006: Resolved 15 mutable argument default issues
- B007: Fixed 1 unused loop control variable
- B008: Corrected function calls in argument defaults
- B024: Fixed abstract base class declarations

* Fixed D205, B904, D107, F821, E501, D200, D411 lint issues

* Fixed lint issues

* Fixed lint issues

* Fixed all lint issues

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Added ANN101 to ignore list in ruff linter

* Added UP038 and ANN102 to ruff ignore list

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed issue with pycln in base.py

* - Removed global ignore for "UP038" and "C901" Added comment in the issue code block(C901) to supress the issue.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* added ignore for UP038 in tools.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Addressed comments by code rabbit

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* modified tool.py to ignore UP038 issue

* Addressed code rabbit suggestions

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Gayathri <142381512+gaya3-zipstack@users.noreply.github.com>
2025-10-07 14:18:03 +05:30
Ritwik G
e6b8e94ba3 [FIX] Corrected the nginx port number (#1561)
Corrected the nginx port number

Signed-off-by: Ritwik G <100672805+ritwik-g@users.noreply.github.com>
2025-10-06 12:03:48 +05:30
ali
0c5997f9a9 UN-2470 [FEAT] Remove Django dependency from Celery workers with internal APIs (#1494)
* UN-2470 [MISC] Remove Django dependency from Celery workers

This commit introduces a new worker architecture that decouples
Celery workers from Django where possible, enabling support for
gevent/eventlet pool types and reducing worker startup overhead.

Key changes:
- Created separate worker modules (api-deployment, callback, file_processing, general)
- Added internal API endpoints for worker communication
- Implemented Django-free task execution where appropriate
- Added shared utilities and client facades
- Updated container configurations for new worker architecture

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Fix pre-commit issues: file permissions and ruff errors

Setup the docker for new workers

- Add executable permissions to worker entrypoint files
- Fix import order in namespace package __init__.py
- Remove unused variable api_status in general worker
- Address ruff E402 and F841 errors

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* refactored Dockerfiles, fixes

* flexibility on celery run commands

* added debug logs

* handled filehistory for API

* cleanup

* cleanup

* cloud plugin structure

* minor changes in import plugin

* added notification and logger workers under new worker module

* add docker compatibility for new workers

* handled docker issues

* log consumer worker fixes

* added scheduler worker

* minor env changes

* cleanup the logs

* minor changes in logs

* resolved scheduler worker issues

* cleanup and refactor

* ensuring backward compatibility for existing workers

* added configuration internal apis and cache utils

* optimization

* Fix API client singleton pattern to share HTTP sessions

- Fix flawed singleton implementation that was trying to share BaseAPIClient instances
- Now properly shares HTTP sessions between specialized clients
- Eliminates 6x BaseAPIClient initialization by reusing the same underlying session
- Should reduce API deployment orchestration time by ~135ms (from 6 clients to 1 session)
- Added debug logging to verify singleton pattern activation
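The session-sharing singleton can be sketched as below. A stand-in `_SharedSession` class keeps the sketch dependency-free (the real code shares a `requests`-style session so clients reuse one connection pool); all class names here are assumptions:

```python
import threading

class _SharedSession:
    """Stand-in for an HTTP session object with a connection pool."""

class SessionProvider:
    _session = None
    _lock = threading.Lock()

    @classmethod
    def get(cls) -> _SharedSession:
        # Double-checked creation so concurrent clients still end up
        # with exactly one shared session.
        if cls._session is None:
            with cls._lock:
                if cls._session is None:
                    cls._session = _SharedSession()
        return cls._session

class BaseAPIClient:
    def __init__(self) -> None:
        # Every specialized client reuses the same underlying session
        # instead of constructing its own on init.
        self.session = SessionProvider.get()
```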

* cleanup and structuring

* cleanup in callback

* fixed file system connectors issue

* celery env values changes

* optional gossip

* variables for sync, mingle and gossip

* Fix for file type check

* resolved task pipeline issue

* api deployment failed response handled

* Task pipeline fixes

* updated file history cleanup with active file execution

* pipeline status update and workflow ui page execution

* cleanup and resolving conflicts

* remove unstract-core from connectors

* Commit uv.lock changes

* uv locks updates

* resolve migration issues

* defer connector-metadata

* Fix connector migration for production scale

- Add encryption key handling with defer() to prevent decryption failures
- Add final cleanup step to fix duplicate connector names
- Optimize for large datasets with batch processing and bulk operations
- Ensure unique constraint in migration 0004 can be created successfully

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* hitl fixes

* minor fixes on hitl

* api_hub related changes

* dockerfile fixes

* api client cache fixes with actual response class

* fix: tags and llm_profile_id

* optimized clear cache

* cleanup

* enhanced logs

* added more handling for the is-file-dir check and added loggers

* cleanup the runplatform script

* internal apis are exempted from csrf

* sonar cloud issues

* sonar-cloud issues

* resolving sonar cloud issues

* resolving sonar cloud issues

* Delta: added Batch size fix in workers

* comments addressed

* celery configurational changes for new workers

* fixes in callback regarding the pipeline type check

* change internal url registry logic

* gitignore changes

* gitignore changes

* addressing pr comments and cleaning up the code

* adding missed profiles for v2

* sonar cloud blocker issues resolved

* implement otel

* Commit uv.lock changes

* handle execution time and some cleanup

* adding user_data in metadata, PR: https://github.com/Zipstack/unstract/pull/1544

* scheduler backward compatibility

* replace user_data with custom_data

* Commit uv.lock changes

* celery worker command issue resolved

* enhance package imports in connectors by changing to lazy imports
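The lazy-import pattern for connectors can be sketched like this; stdlib module names stand in for the real `unstract.connectors` modules so the sketch is runnable:

```python
import importlib
from types import ModuleType

# Hypothetical registry: connector id -> module path. Real connector
# modules live under unstract.connectors; stdlib names are used here.
_CONNECTOR_MODULES = {
    "json_fs": "json",
    "csv_fs": "csv",
}
_loaded: dict[str, ModuleType] = {}

def get_connector(name: str) -> ModuleType:
    """Import a connector module on first use instead of at package
    import time, trimming worker startup cost."""
    if name not in _loaded:
        _loaded[name] = importlib.import_module(_CONNECTOR_MODULES[name])
    return _loaded[name]
```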

* Update runner.py by removing the otel from it

Update runner.py by removing the otel from it

Signed-off-by: ali <117142933+muhammad-ali-e@users.noreply.github.com>

* added delta changes

* handle errors to destination db

* resolve tool instance id validation and hitl queue name in API

* handled direct execution from workflow page to worker and logs

* handle cost logs

* Update health.py

Signed-off-by: Ritwik G <100672805+ritwik-g@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* minor log changes

* introducing log consumer scheduler for bulk creates, and socket.emit from worker for ws

* Commit uv.lock changes

* time limit or timeout celery config cleanup

* implemented redis client class in worker

* pipeline status enum mismatch

* notification worker fixes

* resolve uv lock conflicts

* workflow log fixes

* ws channel name issue resolved; handle redis being down in status tracker and remove redis keys

* default TTL changed for unified logs

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: ali <117142933+muhammad-ali-e@users.noreply.github.com>
Signed-off-by: Ritwik G <100672805+ritwik-g@users.noreply.github.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Ritwik G <100672805+ritwik-g@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2025-10-03 11:24:07 +05:30