Files
unstract/backend/notification_v2/internal_api_views.py
ali 0c5997f9a9 UN-2470 [FEAT] Remove Django dependency from Celery workers with internal APIs (#1494)
* UN-2470 [MISC] Remove Django dependency from Celery workers

This commit introduces a new worker architecture that decouples
Celery workers from Django where possible, enabling support for
gevent/eventlet pool types and reducing worker startup overhead.

Key changes:
- Created separate worker modules (api-deployment, callback, file_processing, general)
- Added internal API endpoints for worker communication
- Implemented Django-free task execution where appropriate
- Added shared utilities and client facades
- Updated container configurations for new worker architecture

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
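
To make the decoupling concrete, here is a minimal, hypothetical sketch of how a Django-free worker task could fetch data through an internal API instead of the ORM. The endpoint path, environment variables, and header names below are illustrative assumptions, not the PR's actual interface.

```python
# Hedged sketch: a Celery task with no Django imports; data access goes
# through the backend's internal HTTP API. All names here are assumptions.
import os

import requests
from celery import Celery

app = Celery("callback_worker", broker=os.environ["CELERY_BROKER_URL"])

INTERNAL_API_BASE_URL = os.environ.get("INTERNAL_API_BASE_URL", "http://backend:8000")
INTERNAL_SERVICE_API_KEY = os.environ["INTERNAL_SERVICE_API_KEY"]


@app.task(name="fetch_pipeline_notifications")
def fetch_pipeline_notifications(pipeline_id: str, organization_id: str) -> dict:
    """Fetch notification config over HTTP; no Django ORM in the worker."""
    response = requests.get(
        f"{INTERNAL_API_BASE_URL}/internal/pipeline/{pipeline_id}/notifications/",
        headers={
            "Authorization": f"Bearer {INTERNAL_SERVICE_API_KEY}",
            "X-Organization-ID": organization_id,
        },
        timeout=10,
    )
    response.raise_for_status()
    return response.json()
```

Because the task module never imports Django, the worker can start faster and run under gevent/eventlet pools without ORM thread-safety concerns.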

* Fix pre-commit issues: file permissions and ruff errors

Setup the docker for new workers

- Add executable permissions to worker entrypoint files
- Fix import order in namespace package __init__.py
- Remove unused variable api_status in general worker
- Address ruff E402 and F841 errors

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* refactored, Dockerfiles, fixes

* flexibility on celery run commands

* added debug logs

* handled file history for API

* cleanup

* cleanup

* cloud plugin structure

* minor changes in import plugin

* added notification and logger workers under new worker module

* add docker compatibility for new workers

* handled docker issues

* log consumer worker fixes

* added scheduler worker

* minor env changes

* cleanup the logs

* minor changes in logs

* resolved scheduler worker issues

* cleanup and refactor

* ensuring backward compatibility with existing workers

* added configuration internal apis and cache utils

* optimization

* Fix API client singleton pattern to share HTTP sessions

- Fix flawed singleton implementation that was trying to share BaseAPIClient instances
- Now properly shares HTTP sessions between specialized clients
- Eliminates 6x BaseAPIClient initialization by reusing the same underlying session
- Should reduce API deployment orchestration time by ~135ms (from 6 clients to 1 session)
- Added debug logging to verify singleton pattern activation
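
The shape of that fix might look like the following minimal sketch; the class names, base URL, and endpoint paths are assumptions for illustration, not the PR's actual code. The point is that specialized clients inherit one lazily created, class-level `requests.Session` rather than each building its own connection pool.

```python
# Hedged sketch of the shared-session singleton; all names are illustrative.
import requests

INTERNAL_API_BASE_URL = "http://backend:8000"  # assumption, not the PR's value


class BaseAPIClient:
    """Base for specialized internal API clients sharing one HTTP session."""

    _shared_session: requests.Session | None = None

    def __init__(self) -> None:
        # The first client to initialize creates the session; later clients
        # reuse it, so the connection pool is shared across all of them.
        if BaseAPIClient._shared_session is None:
            BaseAPIClient._shared_session = requests.Session()
        self.session = BaseAPIClient._shared_session


class PipelineClient(BaseAPIClient):
    def get_pipeline_data(self, pipeline_id: str) -> dict:
        resp = self.session.get(
            f"{INTERNAL_API_BASE_URL}/internal/pipeline/{pipeline_id}/", timeout=10
        )
        resp.raise_for_status()
        return resp.json()


class NotificationClient(BaseAPIClient):
    def get_pipeline_notifications(self, pipeline_id: str) -> dict:
        resp = self.session.get(
            f"{INTERNAL_API_BASE_URL}/internal/pipeline/{pipeline_id}/notifications/",
            timeout=10,
        )
        resp.raise_for_status()
        return resp.json()
```

Instantiating six such clients then costs one session construction instead of six, consistent with the ~135ms saving claimed above.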

* cleanup and structuring

* cleanup in callback

* file system connectors issue

* celery env value changes

* optional gossip

* variables for sync, mingle and gossip

* Fix for file type check

* Task pipeline issue resolved

* api deployment failed response handled

* Task pipeline fixes

* updated file history cleanup with active file execution

* pipeline status update and workflow UI page execution

* cleanup and resolving conflicts

* remove unstract-core from connectors

* Commit uv.lock changes

* uv locks updates

* resolve migration issues

* defer connector-metadata

* Fix connector migration for production scale

- Add encryption key handling with defer() to prevent decryption failures
- Add final cleanup step to fix duplicate connector names
- Optimize for large datasets with batch processing and bulk operations
- Ensure unique constraint in migration 0004 can be created successfully

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
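
A hedged sketch of the defer()-plus-batching approach described in the commit above; the app label, model, and field names are assumptions for illustration, not the actual migration code.

```python
# Hypothetical data-migration helper: defer() the encrypted column so rows
# with stale encryption keys can still be read, and rename duplicates in
# bulk-updated batches. All names here are illustrative.
def dedupe_connector_names(apps, schema_editor):
    Connector = apps.get_model("connector_v2", "ConnectorInstance")
    batch_size = 1000
    seen: set[tuple[str, str]] = set()
    updates = []
    for connector in (
        Connector.objects.defer("connector_metadata").order_by("created_at").iterator()
    ):
        key = (str(connector.organization_id), connector.connector_name)
        if key in seen:
            # Suffix the primary key to make the name unique before the
            # unique constraint in migration 0004 is created.
            connector.connector_name = f"{connector.connector_name} ({connector.id})"
            updates.append(connector)
        else:
            seen.add(key)
        if len(updates) >= batch_size:
            Connector.objects.bulk_update(updates, ["connector_name"])
            updates.clear()
    if updates:
        Connector.objects.bulk_update(updates, ["connector_name"])
```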

* hitl fixes

* minor fixes on hitl

* api_hub related changes

* dockerfile fixes

* api client cache fixes with actual response class

* fix: tags and llm_profile_id

* optimized clear cache

* cleanup

* enhanced logs

* added more handling for is-file/is-dir checks and added loggers

* cleanup the runplatform script

* internal APIs are exempted from CSRF

* sonar cloud issues

* sonar cloud issues

* resolving sonar cloud issues

* resolving sonar cloud issues

* Delta: added batch size fix in workers

* comments addressed

* celery configuration changes for new workers

* fixes in callback regarding the pipeline type check

* change internal url registry logic

* gitignore changes

* gitignore changes

* addressing PR comments and cleaning up the code

* adding missed profiles for v2

* sonar cloud blocker issues resolved

* implement otel

* Commit uv.lock changes

* handle execution time and some cleanup

* adding user_data in metadata, PR: https://github.com/Zipstack/unstract/pull/1544

* scheduler backward compatibility

* replace user_data with custom_data

* Commit uv.lock changes

* celery worker command issue resolved

* enhance package imports in connectors by changing to lazy imports

* Update runner.py by removing the otel from it

Update runner.py by removing the otel from it

Signed-off-by: ali <117142933+muhammad-ali-e@users.noreply.github.com>

* added delta changes

* handle errors to destination db

* resolve tool instance ID validation and HITL queue name in API

* handled direct execution from workflow page to worker and logs

* handle cost logs

* Update health.py

Signed-off-by: Ritwik G <100672805+ritwik-g@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* minor log changes

* introducing log consumer scheduler for bulk create, and socket.emit from worker for ws

* Commit uv.lock changes

* celery time limit/timeout config cleanup

* implemented redis client class in worker

* pipeline status enum mismatch

* notification worker fixes

* resolve uv lock conflicts

* workflow log fixes

* ws channel name issue resolved; handling redis down in status tracker and removing redis keys

* default TTL changed for unified logs

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: ali <117142933+muhammad-ali-e@users.noreply.github.com>
Signed-off-by: Ritwik G <100672805+ritwik-g@users.noreply.github.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Ritwik G <100672805+ritwik-g@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2025-10-03 11:24:07 +05:30


"""Internal API views for notification data access by workers.
These endpoints provide notification configuration data to workers
without exposing full Django models or requiring Django dependencies.
Security Note:
- CSRF protection is disabled for internal service-to-service communication
- Authentication is handled by InternalAPIAuthMiddleware using Bearer tokens
- These endpoints are not accessible from browsers and don't use session cookies
"""
import logging
from api_v2.models import APIDeployment
from django.http import JsonResponse
from django.shortcuts import get_object_or_404
from django.views.decorators.csrf import csrf_exempt
from django.views.decorators.http import require_http_methods
from pipeline_v2.models import Pipeline
from utils.organization_utils import filter_queryset_by_organization
from notification_v2.models import Notification
logger = logging.getLogger(__name__)
# Constants for error messages
INTERNAL_SERVER_ERROR_MSG = "Internal server error"


@csrf_exempt  # Safe: Internal API with Bearer token auth, service-to-service only
@require_http_methods(["GET"])
def get_pipeline_notifications(request, pipeline_id):
    """Get active notifications for a pipeline or API deployment.

    Used by callback worker to fetch notification configuration.
    """
    try:
        # Try to find the pipeline ID in the Pipeline model first
        pipeline_queryset = Pipeline.objects.filter(id=pipeline_id)
        pipeline_queryset = filter_queryset_by_organization(
            pipeline_queryset, request, "organization"
        )
        if pipeline_queryset.exists():
            pipeline = pipeline_queryset.first()
            # Get active notifications for this pipeline
            notifications = Notification.objects.filter(pipeline=pipeline, is_active=True)
            notifications_data = []
            for notification in notifications:
                notifications_data.append(
                    {
                        "id": str(notification.id),
                        "notification_type": notification.notification_type,
                        "platform": notification.platform,
                        "url": notification.url,
                        "authorization_type": notification.authorization_type,
                        "authorization_key": notification.authorization_key,
                        "authorization_header": notification.authorization_header,
                        "max_retries": notification.max_retries,
                        "is_active": notification.is_active,
                    }
                )
            return JsonResponse(
                {
                    "status": "success",
                    "pipeline_id": str(pipeline.id),
                    "pipeline_name": pipeline.pipeline_name,
                    "pipeline_type": pipeline.pipeline_type,
                    "notifications": notifications_data,
                }
            )
        else:
            # If not found in Pipeline, try the APIDeployment model
            api_queryset = APIDeployment.objects.filter(id=pipeline_id)
            api_queryset = filter_queryset_by_organization(
                api_queryset, request, "organization"
            )
            if api_queryset.exists():
                api = api_queryset.first()
                # Get active notifications for this API deployment
                notifications = Notification.objects.filter(api=api, is_active=True)
                notifications_data = []
                for notification in notifications:
                    notifications_data.append(
                        {
                            "id": str(notification.id),
                            "notification_type": notification.notification_type,
                            "platform": notification.platform,
                            "url": notification.url,
                            "authorization_type": notification.authorization_type,
                            "authorization_key": notification.authorization_key,
                            "authorization_header": notification.authorization_header,
                            "max_retries": notification.max_retries,
                            "is_active": notification.is_active,
                        }
                    )
                return JsonResponse(
                    {
                        "status": "success",
                        "pipeline_id": str(api.id),
                        "pipeline_name": api.api_name,
                        "pipeline_type": "API",
                        "notifications": notifications_data,
                    }
                )
            else:
                return JsonResponse(
                    {
                        "status": "error",
                        "message": "Pipeline or API deployment not found",
                    },
                    status=404,
                )
    except Exception as e:
        logger.error(f"Error getting pipeline notifications for {pipeline_id}: {e}")
        return JsonResponse(
            {"status": "error", "message": INTERNAL_SERVER_ERROR_MSG}, status=500
        )


@csrf_exempt  # Safe: Internal API with Bearer token auth, service-to-service only
@require_http_methods(["GET"])
def get_api_notifications(request, api_id):
    """Get active notifications for an API deployment.

    Used by callback worker to fetch notification configuration.
    """
    try:
        # Get API deployment with organization filtering
        api_queryset = APIDeployment.objects.filter(id=api_id)
        api_queryset = filter_queryset_by_organization(
            api_queryset, request, "organization"
        )
        api = get_object_or_404(api_queryset)

        # Get active notifications for this API
        notifications = Notification.objects.filter(api=api, is_active=True)
        notifications_data = []
        for notification in notifications:
            notifications_data.append(
                {
                    "id": str(notification.id),
                    "notification_type": notification.notification_type,
                    "platform": notification.platform,
                    "url": notification.url,
                    "authorization_type": notification.authorization_type,
                    "authorization_key": notification.authorization_key,
                    "authorization_header": notification.authorization_header,
                    "max_retries": notification.max_retries,
                    "is_active": notification.is_active,
                }
            )
        return JsonResponse(
            {
                "status": "success",
                "api_id": str(api.id),
                "api_name": api.api_name,
                "display_name": api.display_name,
                "notifications": notifications_data,
            }
        )
    except Http404:
        # get_object_or_404 raises Http404, not APIDeployment.DoesNotExist
        return JsonResponse(
            {"status": "error", "message": "API deployment not found"}, status=404
        )
    except Exception as e:
        logger.error(f"Error getting API notifications for {api_id}: {e}")
        return JsonResponse(
            {"status": "error", "message": INTERNAL_SERVER_ERROR_MSG}, status=500
        )


@csrf_exempt  # Safe: Internal API with Bearer token auth, service-to-service only
@require_http_methods(["GET"])
def get_pipeline_data(request, pipeline_id):
    """Get basic pipeline data for notification purposes.

    Used by callback worker to determine pipeline type and name.
    """
    try:
        # Get pipeline with organization filtering
        pipeline_queryset = Pipeline.objects.filter(id=pipeline_id)
        pipeline_queryset = filter_queryset_by_organization(
            pipeline_queryset, request, "organization"
        )
        pipeline = get_object_or_404(pipeline_queryset)

        return JsonResponse(
            {
                "status": "success",
                "pipeline_id": str(pipeline.id),
                "pipeline_name": pipeline.pipeline_name,
                "pipeline_type": pipeline.pipeline_type,
                "last_run_status": pipeline.last_run_status,
            }
        )
    except Http404:
        # get_object_or_404 raises Http404, not Pipeline.DoesNotExist
        return JsonResponse(
            {"status": "error", "message": "Pipeline not found"}, status=404
        )
    except Exception as e:
        logger.error(f"Error getting pipeline data for {pipeline_id}: {e}")
        return JsonResponse(
            {"status": "error", "message": INTERNAL_SERVER_ERROR_MSG}, status=500
        )


@csrf_exempt  # Safe: Internal API with Bearer token auth, service-to-service only
@require_http_methods(["GET"])
def get_api_data(request, api_id):
    """Get basic API deployment data for notification purposes.

    Used by callback worker to determine API name and details.
    """
    try:
        # Get API deployment with organization filtering
        api_queryset = APIDeployment.objects.filter(id=api_id)
        api_queryset = filter_queryset_by_organization(
            api_queryset, request, "organization"
        )
        api = get_object_or_404(api_queryset)

        return JsonResponse(
            {
                "status": "success",
                "api_id": str(api.id),
                "api_name": api.api_name,
                "display_name": api.display_name,
                "is_active": api.is_active,
            }
        )
    except Http404:
        # get_object_or_404 raises Http404, not APIDeployment.DoesNotExist
        return JsonResponse(
            {"status": "error", "message": "API deployment not found"}, status=404
        )
    except Exception as e:
        logger.error(f"Error getting API data for {api_id}: {e}")
        return JsonResponse(
            {"status": "error", "message": INTERNAL_SERVER_ERROR_MSG}, status=500
        )
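
For context, these views would be wired up in a URLconf along these lines; the route prefixes and names below are assumptions, since the actual urls.py is not shown here.

```python
# Hypothetical urls.py for the internal notification endpoints; the actual
# prefixes and route names in the repository may differ.
from django.urls import path

from notification_v2 import internal_api_views

urlpatterns = [
    path(
        "pipeline/<uuid:pipeline_id>/notifications/",
        internal_api_views.get_pipeline_notifications,
        name="internal-pipeline-notifications",
    ),
    path(
        "pipeline/<uuid:pipeline_id>/",
        internal_api_views.get_pipeline_data,
        name="internal-pipeline-data",
    ),
    path(
        "api/<uuid:api_id>/notifications/",
        internal_api_views.get_api_notifications,
        name="internal-api-notifications",
    ),
    path(
        "api/<uuid:api_id>/",
        internal_api_views.get_api_data,
        name="internal-api-data",
    ),
]
```

Workers authenticate to these routes with the Bearer token validated by InternalAPIAuthMiddleware, which is what makes the CSRF exemption safe, as the module docstring notes.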