unstract/run-platform.sh
ali 0c5997f9a9 UN-2470 [FEAT] Remove Django dependency from Celery workers with internal APIs (#1494)
* UN-2470 [MISC] Remove Django dependency from Celery workers

This commit introduces a new worker architecture that decouples
Celery workers from Django where possible, enabling support for
gevent/eventlet pool types and reducing worker startup overhead.

Key changes:
- Created separate worker modules (api-deployment, callback, file_processing, general)
- Added internal API endpoints for worker communication
- Implemented Django-free task execution where appropriate
- Added shared utilities and client facades
- Updated container configurations for new worker architecture

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Fix pre-commit issues: file permissions and ruff errors

Set up Docker for the new workers

- Add executable permissions to worker entrypoint files
- Fix import order in namespace package __init__.py
- Remove unused variable api_status in general worker
- Address ruff E402 and F841 errors

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* refactored, Dockerfiles, fixes

* flexibility on celery run commands

* added debug logs

* handled filehistory for API

* cleanup

* cleanup

* cloud plugin structure

* minor changes in import plugin

* added notification and logger workers under new worker module

* add docker compatibility for new workers

* handled docker issues

* log consumer worker fixes

* added scheduler worker

* minor env changes

* cleanup the logs

* minor changes in logs

* resolved scheduler worker issues

* cleanup and refactor

* ensuring backward compatibility with existing workers

* added configuration internal apis and cache utils

* optimization

* Fix API client singleton pattern to share HTTP sessions

- Fix flawed singleton implementation that was trying to share BaseAPIClient instances
- Now properly shares HTTP sessions between specialized clients
- Eliminates 6x BaseAPIClient initialization by reusing the same underlying session
- Should reduce API deployment orchestration time by ~135ms (from 6 clients to 1 session)
- Added debug logging to verify singleton pattern activation

* cleanup and structuring

* cleanup in callback

* file system connectors issue

* celery env values changes

* optional gossip

* variables for sync, mingle and gossip

* Fix for file type check

* Task pipeline issue resolving

* api deployment failed response handled

* Task pipeline fixes

* updated file history cleanup with active file execution

* pipeline status update and workflow ui page execution

* cleanup and resolving conflicts

* remove unstract-core from connectors

* Commit uv.lock changes

* uv locks updates

* resolve migration issues

* defer connector-metadata

* Fix connector migration for production scale

- Add encryption key handling with defer() to prevent decryption failures
- Add final cleanup step to fix duplicate connector names
- Optimize for large datasets with batch processing and bulk operations
- Ensure unique constraint in migration 0004 can be created successfully

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* hitl fixes

* minor fixes on hitl

* api_hub related changes

* dockerfile fixes

* api client cache fixes with actual response class

* fix: tags and llm_profile_id

* optimized clear cache

* cleanup

* enhanced logs

* added more handling on is file dir and added loggers

* cleanup the runplatform script

* internal APIs are exempted from CSRF

* SonarCloud issues

* SonarCloud issues

* resolving sonar cloud issues

* resolving sonar cloud issues

* Delta: added Batch size fix in workers

* comments addressed

* celery configurational changes for new workers

* fixes in callback regarding the pipeline type check

* change internal url registry logic

* gitignore changes

* gitignore changes

* addressing PR comments and cleaning up the code

* adding missed profiles for v2

* SonarCloud blocker issues resolved

* implement OTel

* Commit uv.lock changes

* handle execution time and some cleanup

* adding user_data in metadata Pr: https://github.com/Zipstack/unstract/pull/1544

* scheduler backward compatibility

* replace user_data with custom_data

* Commit uv.lock changes

* celery worker command issue resolved

* enhance package imports in connectors by changing to lazy imports

* Update runner.py by removing the otel from it

Update runner.py by removing the otel from it

Signed-off-by: ali <117142933+muhammad-ali-e@users.noreply.github.com>

* added delta changes

* handle error to destination db

* resolve tool instances id validation and hitl queue name in API

* handled direct execution from workflow page to worker and logs

* handle cost logs

* Update health.py

Signed-off-by: Ritwik G <100672805+ritwik-g@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* minor log changes

* introducing log consumer scheduler to bulk create, and socket.emit from worker for ws

* Commit uv.lock changes

* time limit or timeout celery config cleanup

* implemented redis client class in worker

* pipeline status enum mismatch

* notification worker fixes

* resolve uv lock conflicts

* workflow log fixes

* ws channel name issue resolved, handling Redis being down in the status tracker, and removing Redis keys

* default TTL changed for unified logs

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: ali <117142933+muhammad-ali-e@users.noreply.github.com>
Signed-off-by: Ritwik G <100672805+ritwik-g@users.noreply.github.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Ritwik G <100672805+ritwik-g@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2025-10-03 11:24:07 +05:30

361 lines
13 KiB
Bash
Executable File

#!/usr/bin/env bash

set -o nounset # exit if a variable is not set
set -o errexit # exit for any command failure

# text color escape codes (\033 == \e but macOS doesn't respect the \e)
blue_text='\033[94m'
green_text='\033[32m'
red_text='\033[31m'
default_text='\033[39m'
yellow_text='\033[33m'

# set -x/xtrace uses PS4 for more info
PS4="$blue_text"'${0}:${LINENO}: '"$default_text"

debug() {
  if [ "$opt_verbose" = true ]; then
    echo "$1"
  fi
}

check_dependencies() {
  if ! command -v git &> /dev/null; then
    echo -e "$red_text""git not found. Exiting.""$default_text"
    exit 1
  fi
  if ! command -v python3 &> /dev/null; then
    echo -e "$red_text""python3 not found. Exiting.""$default_text"
    exit 1
  fi
  if ! command -v docker &> /dev/null; then
    echo -e "$red_text""docker not found. Exiting.""$default_text"
    exit 1
  fi
  # For 'docker compose' vs 'docker-compose', see https://stackoverflow.com/a/66526176.
  # The probe runs inside the `if` condition so that a failure does not trigger errexit
  # and the 'docker-compose' fallback is still reached.
  if docker compose version &> /dev/null; then
    docker_compose_cmd="docker compose"
  elif command -v docker-compose &> /dev/null; then
    docker_compose_cmd="docker-compose"
  else
    echo -e "$red_text""Neither 'docker compose' nor 'docker-compose' found. Exiting.""$default_text"
    exit 1
  fi
}

display_banner() {
  # Print the banner only when the terminal is at least 64 columns wide
  if [ "$(tput cols 2>/dev/null || echo 0)" -ge 64 ]; then
    echo " █████ █████"
    echo "░░███ ░░███ "
    echo " ░███ ░███ "
    echo " ░███ ░███ "
    echo " ░███ ░███ "
    echo " ░███ ░███ "
    echo " ░░█████████ >UNSTRACT COMMUNITY EDITION"
    echo " ░░░░░░░░░ "
    echo ""
    sleep 1
  fi
}

display_help() {
  printf "Run Unstract platform in docker containers\n"
  echo
  echo -e "Syntax: $0 [options]"
  echo -e "Options:"
  echo -e "  -h, --help          Display help information"
  echo -e "  -e, --only-env      Only set up env files"
  echo -e "  -p, --only-pull     Only pull docker images"
  echo -e "  -b, --build-local   Build docker images locally"
  echo -e "  -u, --update        Update services to the version given by -v"
  echo -e "  -w, --workers-v2    Use v2 dedicated worker containers"
  echo -e "  -x, --trace         Enable trace mode"
  echo -e "  -V, --verbose       Print verbose logs"
  echo -e "  -v, --version       Docker images version tag (default \"latest\")"
  echo -e ""
}

parse_args() {
  while [[ $# -gt 0 ]]; do
    arg="$1"
    case $arg in
      -h | --help)
        display_help
        exit
        ;;
      -e | --only-env)
        opt_only_env=true
        ;;
      -p | --only-pull)
        opt_only_pull=true
        ;;
      -b | --build-local)
        opt_build_local=true
        ;;
      -u | --update)
        opt_update=true
        ;;
      -w | --workers-v2)
        opt_workers_v2=true
        ;;
      -x | --trace)
        set -o xtrace # display every line before execution; enables PS4
        ;;
      -V | --verbose)
        opt_verbose=true
        ;;
      -v | --version)
        if [ -z "${2-}" ]; then
          echo "No version specified."
          echo
          display_help
          exit 1
        else
          opt_version="$2"
        fi
        shift
        ;;
      *)
        echo "'$1' is not a known option."
        echo
        display_help
        exit 1
        ;;
    esac
    shift
  done

  debug "OPTION only_env: $opt_only_env"
  debug "OPTION only_pull: $opt_only_pull"
  debug "OPTION build_local: $opt_build_local"
  debug "OPTION update: $opt_update"
  debug "OPTION workers_v2: $opt_workers_v2"
  debug "OPTION verbose: $opt_verbose"
  debug "OPTION version: $opt_version"
}

do_git_pull() {
  if [ "$opt_update" = false ]; then
    return
  fi

  current_version=$(git describe --tags --abbrev=0)

  echo "Fetching release tags."
  git fetch --quiet --tags

  if [[ "$opt_version" == "latest" ]]; then
    target_branch=$(git ls-remote --tags origin | awk -F/ '{print $3}' | sort -V | tail -n1)
  elif [[ "$opt_version" == "main" ]]; then
    target_branch="main"
    opt_build_local=true
    echo -e "Choosing ""$blue_text""local build""$default_text"" of Docker images from ""$blue_text""main""$default_text"" branch."
  elif [ -z "$(git tag -l "$opt_version")" ]; then
    echo -e "$red_text""Version not found.""$default_text"
    version_regex="^v([0-9]+)\.([0-9]+)\.([0-9]+)(-[a-zA-Z0-9]+(\.[0-9]+)?)?$"
    if [[ ! $opt_version =~ $version_regex ]]; then
      echo -e "$red_text""Version must be provided with a 'v' prefix and follow SemVer (e.g. v0.47.0).""$default_text"
    fi
    exit 1
  else
    target_branch="$opt_version"
  fi

  echo -e "Performing ""$blue_text""git checkout""$default_text"" to ""$blue_text""$target_branch""$default_text""."
  git checkout --quiet "$target_branch"

  echo -e "Performing ""$blue_text""git pull""$default_text"" on ""$blue_text""$target_branch""$default_text""."
  git pull --quiet $(git remote) "$target_branch"
}

copy_or_merge_envs() {
  local src_file="$1"
  local dest_file="$2"
  local displayed_reason="$3"

  if [ ! -e "$dest_file" ]; then
    cp "$src_file" "$dest_file"
    echo -e "Created env for ""$blue_text""$displayed_reason""$default_text"" at ""$blue_text""$dest_file""$default_text""."
  elif [ "$opt_only_env" = true ] || [ "$opt_update" = true ]; then
    if ! python3 "$script_dir/docker/scripts/merge_env.py" "$src_file" "$dest_file"; then
      exit 1
    fi
    echo -e "Merged env for ""$blue_text""$displayed_reason""$default_text"" at ""$blue_text""$dest_file""$default_text""."
  fi
}

setup_env() {
  # Generate a Fernet key for both backend and platform-service.
  # See https://pypi.org/project/cryptography/.
  ENCRYPTION_KEY=$(python3 -c "import secrets, base64; print(base64.urlsafe_b64encode(secrets.token_bytes(32)).decode())")
  DEFAULT_AUTH_KEY="unstract"

  for service in "${services[@]}"; do
    # Skip services that are spawned at runtime
    for ignore_service in "${spawned_services[@]}"; do
      if [[ "$service" == "$ignore_service" ]]; then
        echo -e "Skipped env for ${blue_text}$service${default_text} as it's generated at runtime"
        continue 2
      fi
    done

    sample_env_path="$script_dir/$service/sample.env"
    env_path="$script_dir/$service/.env"
    if [ -e "$sample_env_path" ] && [ ! -e "$env_path" ]; then
      first_setup=true
      cp "$sample_env_path" "$env_path"
      # Add encryption secret for backend and platform-service.
      if [[ "$service" == "backend" || "$service" == "platform-service" ]]; then
        echo -e "$blue_text""Adding encryption secret to $service""$default_text"
        if [[ "$OSTYPE" == "darwin"* ]]; then
          sed -i '' "s/ENCRYPTION_KEY.*/ENCRYPTION_KEY=\"$ENCRYPTION_KEY\"/" "$env_path"
        else
          sed -i "s/ENCRYPTION_KEY.*/ENCRYPTION_KEY=\"$ENCRYPTION_KEY\"/" "$env_path"
        fi
      fi
      # Add default auth and system admin credentials for backend.
      if [ "$service" == "backend" ]; then
        echo -e "$blue_text""Adding default auth and system admin credentials to $service""$default_text"
        if [[ "$OSTYPE" == "darwin"* ]]; then
          sed -i '' "s/DEFAULT_AUTH_USERNAME.*/DEFAULT_AUTH_USERNAME=\"$DEFAULT_AUTH_KEY\"/" "$env_path"
          sed -i '' "s/DEFAULT_AUTH_PASSWORD.*/DEFAULT_AUTH_PASSWORD=\"$DEFAULT_AUTH_KEY\"/" "$env_path"
          # sed -i '' "s/SYSTEM_ADMIN_USERNAME.*/SYSTEM_ADMIN_USERNAME=\"$DEFAULT_AUTH_KEY\"/" "$env_path"
          # sed -i '' "s/SYSTEM_ADMIN_PASSWORD.*/SYSTEM_ADMIN_PASSWORD=\"$DEFAULT_AUTH_KEY\"/" "$env_path"
        else
          sed -i "s/DEFAULT_AUTH_USERNAME.*/DEFAULT_AUTH_USERNAME=\"$DEFAULT_AUTH_KEY\"/" "$env_path"
          sed -i "s/DEFAULT_AUTH_PASSWORD.*/DEFAULT_AUTH_PASSWORD=\"$DEFAULT_AUTH_KEY\"/" "$env_path"
          # sed -i "s/SYSTEM_ADMIN_USERNAME.*/SYSTEM_ADMIN_USERNAME=\"$DEFAULT_AUTH_KEY\"/" "$env_path"
          # sed -i "s/SYSTEM_ADMIN_PASSWORD.*/SYSTEM_ADMIN_PASSWORD=\"$DEFAULT_AUTH_KEY\"/" "$env_path"
        fi
      fi
      echo -e "Created env for ""$blue_text""$service""$default_text"" at ""$blue_text""$env_path""$default_text""."
    elif [ "$opt_only_env" = true ] || [ "$opt_update" = true ]; then
      if ! python3 "$script_dir/docker/scripts/merge_env.py" "$sample_env_path" "$env_path"; then
        exit 1
      fi
      echo -e "Merged env for ""$blue_text""$service""$default_text"" at ""$blue_text""$env_path""$default_text""."
    fi
  done

  copy_or_merge_envs "$script_dir/docker/sample.essentials.env" "$script_dir/docker/essentials.env" "essential services"
  copy_or_merge_envs "$script_dir/docker/sample.env" "$script_dir/docker/.env" "docker compose"

  if [ "$opt_only_env" = true ]; then
    echo -e "$green_text""Done.""$default_text" && exit 0
  fi
}

build_services() {
  pushd "$script_dir/docker" 1>/dev/null

  if [ "$opt_build_local" = true ]; then
    echo -e "$blue_text""Building""$default_text"" docker images ""$blue_text""$opt_version""$default_text"" locally."
    VERSION=$opt_version $docker_compose_cmd -f "$script_dir/docker/docker-compose.build.yaml" build || {
      echo -e "$red_text""Failed to build docker images.""$default_text"
      exit 1
    }
  elif [ "$first_setup" = true ] || [ "$opt_update" = true ]; then
    echo -e "$blue_text""Pulling""$default_text"" docker images tag ""$blue_text""$opt_version""$default_text""."
    # Try again on a slow network.
    VERSION=$opt_version $docker_compose_cmd -f "$script_dir/docker/docker-compose.yaml" pull ||
      VERSION=$opt_version $docker_compose_cmd -f "$script_dir/docker/docker-compose.yaml" pull || {
        echo -e "$red_text""Failed to pull docker images.""$default_text"
        echo -e "$red_text""Either version not found or docker is not running.""$default_text"
        echo -e "$red_text""Please check and try again.""$default_text"
        exit 1
      }
  fi

  popd 1>/dev/null

  if [ "$opt_only_pull" = true ]; then
    echo -e "$green_text""Done.""$default_text" && exit 0
  fi
}

run_services() {
  pushd "$script_dir/docker" 1>/dev/null

  if [ "$opt_workers_v2" = true ]; then
    echo -e "$blue_text""Starting docker containers with V2 dedicated workers in detached mode""$default_text"
    VERSION=$opt_version $docker_compose_cmd --profile workers-v2 up -d
  else
    echo -e "$blue_text""Starting docker containers with existing backend-based workers in detached mode""$default_text"
    VERSION=$opt_version $docker_compose_cmd up -d
  fi

  if [ "$opt_update" = true ]; then
    echo ""
    if [[ "$opt_version" == "main" ]]; then
      echo -e "$green_text""Updated platform to latest main (unstable).""$default_text"
    else
      echo -e "$green_text""Updated platform to $opt_version version.""$default_text"
    fi
    # Show release notes on version update if applicable
    python3 "$script_dir/docker/scripts/release-notes/print_release_notes.py" "$current_version" "$target_branch"
  fi

  echo -e "\nOnce the services are up, visit ""$blue_text""http://frontend.unstract.localhost""$default_text"" in your browser."
  echo -e "\nSee logs with:"
  echo -e " ""$blue_text""$docker_compose_cmd -f docker/docker-compose.yaml logs -f""$default_text"
  echo -e "Configure services by updating corresponding ""$yellow_text""<service>/.env""$default_text"" files."
  echo -e "Make sure to ""$yellow_text""restart""$default_text"" the services with:"
  echo -e " ""$blue_text""$docker_compose_cmd -f docker/docker-compose.yaml up -d""$default_text"

  if [ "$first_setup" = true ]; then
    echo -e "\n###################### BACKUP ENCRYPTION KEY ######################"
    echo -e "Copy the value of ""$yellow_text""ENCRYPTION_KEY""$default_text"" in any of the following env files"
    echo -e "to a secure location:\n"
    echo -e "- ""$red_text""backend/.env""$default_text"
    echo -e "- ""$red_text""platform-service/.env""$default_text"
    echo -e "\nAdapter credentials are encrypted by the platform using this key."
    echo -e "Its loss or change will make all existing adapters inaccessible!"
    echo -e "###################################################################"
  fi

  popd 1>/dev/null
}

#
# Run Unstract platform - BEGIN
#
check_dependencies

opt_only_env=false
opt_only_pull=false
opt_build_local=false
opt_update=false
opt_workers_v2=false
opt_verbose=false
opt_version="latest"

script_dir=$(dirname "$(readlink -f "${BASH_SOURCE[0]}")")
first_setup=false
# Extract service names from docker compose file
services=($(VERSION=$opt_version $docker_compose_cmd -f "$script_dir/docker/docker-compose.build.yaml" config --services))
# Add workers manually for env setup
services+=("workers")
spawned_services=("tool-structure" "tool-sidecar")
current_version=""
target_branch=""

display_banner
parse_args "$@"
do_git_pull
setup_env
build_services
run_services
#
# Run Unstract platform - END
#
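The script runs under `set -o errexit`, so a `$?` test placed on the line after a command is only reached when that command succeeded: a failure ends the script first. Putting the command itself in the `if` condition exempts it from errexit, which is what a fallback probe such as the `docker compose` / `docker-compose` check needs. The short demonstration below runs each pattern in a child `bash -c` so the outer shell survives; it is an illustration of Bash semantics, not part of `run-platform.sh`.

```shell
#!/usr/bin/env bash

# Pattern 1: under errexit the failing command ends the child shell before the
# $? check runs, so nothing is printed by the child and the fallback fires.
bash -c 'set -o errexit; false; if [ $? -ne 0 ]; then echo "checked"; fi' || echo "child shell exited early"

# Pattern 2: a command tested in the if condition does not trigger errexit,
# so the failure is handled and execution continues.
bash -c 'set -o errexit; if ! false; then echo "handled failure"; fi; echo "continued"'
```

The same reasoning applies to any `python3 ...; if [ $? -ne 0 ]` sequence: writing `if ! python3 ...; then` states the intent directly instead of relying on errexit.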
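`do_git_pull` resolves `-v latest` by piping `git ls-remote --tags origin` through `awk -F/ '{print $3}'`, `sort -V` and `tail -n1`. For annotated tags, `git ls-remote` also lists a peeled `refs/tags/<tag>^{}` entry, and that suffixed name can sort after the plain tag. The sketch below is a hypothetical variant (`latest_release_tag` is not in the script) that reads ls-remote-style lines on stdin and drops peeled entries before sorting; it assumes a `sort` that supports `-V` (GNU coreutils or a recent BSD sort), and whether Unstract's release tags are annotated is not stated here.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not in run-platform.sh): print the newest tag name from
# `git ls-remote --tags` style lines ("<sha><TAB>refs/tags/<name>") on stdin.
latest_release_tag() {
  # Field 3 when splitting on "/" is the tag name; peeled "^{}" entries are
  # removed; sort -V orders versions numerically (v0.9.0 before v0.10.0).
  awk -F/ '{print $3}' | grep -v -F '^{}' | sort -V | tail -n1
}

# In the script the input would come from: git ls-remote --tags origin
printf '%s\t%s\n' \
  aaa refs/tags/v0.9.0 \
  bbb refs/tags/v0.10.0 \
  ccc 'refs/tags/v0.10.0^{}' \
  ddd refs/tags/v0.2.1 |
  latest_release_tag
```

Feeding `printf` lines instead of a live `git ls-remote` call keeps the sketch testable offline; with the sample input it prints `v0.10.0`, which a plain lexical `sort` would not pick.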
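When a requested version tag is not found, `do_git_pull` checks it against `version_regex` to decide whether to print the SemVer hint. A minimal sketch that wraps the same pattern in a reusable predicate follows; the function name `is_valid_version_tag` is hypothetical and not part of the script.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeed if "$1" matches the script's version_regex,
# i.e. a 'v' prefix, MAJOR.MINOR.PATCH, and an optional "-label" or "-label.N".
is_valid_version_tag() {
  local version_regex="^v([0-9]+)\.([0-9]+)\.([0-9]+)(-[a-zA-Z0-9]+(\.[0-9]+)?)?$"
  [[ "$1" =~ $version_regex ]]
}

for tag in v0.47.0 v1.2.3-rc.1 0.47.0 v1.2; do
  if is_valid_version_tag "$tag"; then
    echo "$tag: accepted"
  else
    echo "$tag: rejected"
  fi
done
```

The regex variable is left unquoted on the right of `=~` on purpose: quoting it would make Bash match it as a literal string rather than as a regular expression.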
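`setup_env` duplicates every `sed -i` call in a `darwin*` branch because BSD `sed` on macOS expects a backup-suffix argument after `-i` (an empty `''` for none), while GNU `sed` would read that `''` as the script. A small wrapper can hold the branch once. This is a sketch only: `sed_in_place` is a hypothetical name, and the key below is a placeholder for the Fernet key the script generates.

```shell
#!/usr/bin/env bash

# Hypothetical helper: apply a sed expression to a file in place, using the
# BSD form (-i '') on macOS and the GNU form (-i) elsewhere, based on $OSTYPE.
sed_in_place() {
  local expression="$1"
  local file="$2"
  if [[ "$OSTYPE" == "darwin"* ]]; then
    sed -i '' "$expression" "$file"
  else
    sed -i "$expression" "$file"
  fi
}

# Usage, mirroring how setup_env fills ENCRYPTION_KEY in a copied sample.env.
env_file=$(mktemp)
printf 'ENCRYPTION_KEY=""\nOTHER=1\n' > "$env_file"
# Placeholder value. A real Fernet key is URL-safe base64 (A-Z a-z 0-9 - _ =),
# so it contains no '/' or '&' that would break the s/// replacement.
key="example-key"
sed_in_place "s/ENCRYPTION_KEY.*/ENCRYPTION_KEY=\"$key\"/" "$env_file"
cat "$env_file"
rm -f "$env_file"
```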
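On first setup, `setup_env` writes the same generated `ENCRYPTION_KEY` into both `backend/.env` and `platform-service/.env`, and `run_services` asks the user to back that value up. A hypothetical pre-backup check (neither `read_env_value` nor `check_encryption_keys_match` exists in the script) could read the key from both files, strip the double quotes the `sed` calls add, and confirm they are non-empty and identical. It assumes simple `KEY=value` lines as produced by the script, not full dotenv syntax.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the value of KEY from a KEY=value file, removing
# one pair of surrounding double quotes. Returns 1 if KEY is absent.
read_env_value() {
  local key="$1"
  local file="$2"
  local line
  line=$(grep -m1 "^${key}=" "$file") || return 1
  line="${line#*=}"   # drop the "KEY=" prefix
  line="${line#\"}"   # drop a leading double quote
  line="${line%\"}"   # drop a trailing double quote
  printf '%s\n' "$line"
}

# Hypothetical check: succeed only if both files hold the same non-empty key.
check_encryption_keys_match() {
  local first second
  first=$(read_env_value ENCRYPTION_KEY "$1") || return 1
  second=$(read_env_value ENCRYPTION_KEY "$2") || return 1
  [ -n "$first" ] && [ "$first" = "$second" ]
}
```

Run from the repository root, `check_encryption_keys_match backend/.env platform-service/.env && echo "keys match"` would confirm there is a single value to back up.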