unstract/backend/README.md
Jaseem Jas ba1df894d2 Python 3.9 to 3.12 (#1231)
2025-04-24 16:07:02 +05:30

Unstract Backend

Contains the backend services for Unstract written with Django and DRF.

Dependencies

  1. Postgres
  2. Redis

Getting started

Install and run locally

Create your virtual env

All commands assume that you have activated your venv.

Install UV: https://docs.astral.sh/uv/getting-started/installation/

# Create venv and install dependencies
uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
uv sync

Installing dependencies

Go to the service directory and install the dependencies listed in the corresponding pyproject.toml.

# Install dependencies
uv sync

# Install specific dev dependency group
uv sync --group dev
# Install with the deploy dependency group included
uv sync --group deploy

Running scripts

UV can run Python scripts from within the service directory.

uv run sample_script.py

Running commands

  • If you plan to run the Django server locally, make sure the dependent services are up (either locally or through docker compose)
  • Copy sample.env into .env and update the necessary variables. For example:
DJANGO_SETTINGS_MODULE='backend.settings.dev'
DB_HOST='localhost'
DB_USER='unstract_dev'
DB_PASSWORD='unstract_pass'
DB_NAME='unstract_db'
DB_PORT=5432
  • If you've made changes to the model, run uv run manage.py makemigrations, else ignore this step
  • Run the following to apply any migrations to the DB and start the server
uv run manage.py migrate
uv run manage.py runserver localhost:8000
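
Before starting the server, it can help to confirm that the variables from the example above are actually set in .env. The following is a minimal sketch, not part of the backend: parse_env and missing_keys are hypothetical helper names, the parser only handles simple KEY=VALUE lines, and the key list is an assumption based on the example above (sample.env may define more).

```python
# Hypothetical helper: checks a .env file for the variables shown in the example.
# Assumption: this key list mirrors the README example, not the full sample.env.
REQUIRED_KEYS = (
    "DJANGO_SETTINGS_MODULE",
    "DB_HOST",
    "DB_USER",
    "DB_PASSWORD",
    "DB_NAME",
    "DB_PORT",
)


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding single or double quotes, as used in the example.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(env: dict[str, str], required=REQUIRED_KEYS) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in required if not env.get(key)]
```

For example, missing_keys(parse_env(open(".env").read())) returns an empty list when every variable in the assumed list has a value.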

Authentication

The default username is unstract and the default password is unstract.

Initial Setup

To customize the username or password:

  1. Navigate to /backend/.env created from /backend/sample.env
  2. Update the values for DEFAULT_AUTH_USERNAME and DEFAULT_AUTH_PASSWORD with strong, unique credentials of your choosing
  3. Save the /backend/.env file and restart the server to apply changes

Updating Credentials

To update the username or password after initial setup:

  1. Modify the credentials in /backend/.env
    • DEFAULT_AUTH_USERNAME=your_new_username
    • DEFAULT_AUTH_PASSWORD=your_new_password
  2. Save changes and restart backend service

Now you can log in with the new credentials.

Important Notes

  • DEFAULT_AUTH_USERNAME must not match the username of any Django superuser or admin account. Keeping them distinct ensures security and avoids potential conflicts.
  • Use strong and unique credentials to protect your system.
  • The authentication system validates credentials against the values specified in the /backend/.env file.
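
The notes above can be turned into a small pre-flight check. This is only a sketch under stated assumptions: credential_warnings is a hypothetical helper, not something the backend provides, and the list of superuser names has to be supplied by the caller.

```python
def credential_warnings(username: str, password: str, superuser_names=()) -> list[str]:
    """Return human-readable warnings for the DEFAULT_AUTH_* settings."""
    warnings = []
    # "unstract" / "unstract" are the documented defaults and should be changed.
    if username == "unstract" or password == "unstract":
        warnings.append("default credentials are still in use")
    # The username must differ from every Django superuser or admin account.
    if username in set(superuser_names):
        warnings.append(f"username {username!r} collides with a superuser account")
    return warnings
```

Assuming the project uses a standard Django user model, the superuser names could be collected in a Django shell with get_user_model().objects.filter(is_superuser=True).values_list("username", flat=True) and passed in as superuser_names.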

Asynchronous Execution

This project uses Celery for handling asynchronous execution. Celery tasks are managed through various queues and consumed by workers.

ETL, TASK, and API Deployment tasks are handled by these asynchronous workers. Log management also utilizes Celery.

Queues

| Queue Name | Description | Tasks |
| --- | --- | --- |
| celery | Default queue for general Celery tasks, including those without a specific queue. | Webhook notifications, Pipeline (ETL, Tasks) Executions. |
| celery_periodic_logs | Queue for persisting logs into the database. | |
| celery_log_task_queue | Queue for publishing logs to WebSocket clients. | |
| celery_api_deployments | Queue for managing API deployment tasks. | |

Run Execution Worker

To start a Celery worker, use the following command:

celery -A backend worker --loglevel=info -Q <queue_name>

Autoscaling Workers

Celery supports autoscaling of worker processes, allowing you to dynamically adjust the number of workers based on workload. To enable it, pass the --autoscale option:

celery -A backend worker --loglevel=info -Q <queue_name> --autoscale=<max_workers>,<min_workers>

  • Max Workers (max_workers): This value is related to your CPU resources and the level of concurrency you need.

    • For CPU-bound tasks: Consider setting max_workers close to or slightly above the number of CPU cores.
    • For I/O-bound tasks: You can set a higher max_workers value, typically 2-3 times the number of CPU cores.
  • Min Workers (min_workers): This is the minimum number of worker processes that will always be running.
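
The rule of thumb above can be sketched as a small helper that derives the --autoscale values from the CPU count and assembles the worker command. suggest_autoscale and worker_command are hypothetical names, and the 2x multiplier for I/O-bound work is an assumption (the lower end of the 2-3x range); tune it for your workload.

```python
import shlex


def suggest_autoscale(cpu_cores: int, io_bound: bool, min_workers: int = 1) -> tuple[int, int]:
    """Return (max_workers, min_workers) following the guidance above."""
    # Assumption: 2x cores for I/O-bound tasks, 1x cores for CPU-bound tasks.
    max_workers = cpu_cores * 2 if io_bound else cpu_cores
    # Never suggest a maximum below the minimum.
    return max(max_workers, min_workers), min_workers


def worker_command(queue: str, max_workers: int, min_workers: int) -> str:
    """Build the worker command; Celery expects --autoscale=<max>,<min>."""
    return shlex.join([
        "celery", "-A", "backend", "worker", "--loglevel=info",
        "-Q", queue, f"--autoscale={max_workers},{min_workers}",
    ])
```

On a live machine, the core count could come from os.cpu_count(), for example worker_command("celery_api_deployments", *suggest_autoscale(os.cpu_count() or 1, io_bound=True)).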

Worker Dashboard

  • Ensure the flower package is installed in the current environment
  • Run the following command:
celery -A backend flower

This command starts Flower on the default port (5555), where it can be accessed via a web browser. Flower provides a user-friendly interface for monitoring and managing Celery tasks.

Connecting to Postgres

Follow the steps below to connect to the Postgres DB running with docker compose.

  1. Exec into a shell within the postgres container
docker compose exec -it db bash
  2. Connect to the db as the specified user
psql -d unstract_db -U unstract_dev
  3. Execute PSQL commands within the shell.
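
If the Postgres port is published to the host (an assumption; check your docker compose file), the database can also be reached without exec-ing into the container by passing a connection URI to psql or another client. The sketch below builds such a URI from the DB_* values used in .env; postgres_uri is a hypothetical helper, not part of the backend.

```python
from urllib.parse import quote


def postgres_uri(env: dict[str, str]) -> str:
    """Build a libpq-style connection URI from DB_* settings."""
    # Percent-encode credentials so characters like '@' or ':' stay unambiguous.
    user = quote(env["DB_USER"], safe="")
    password = quote(env["DB_PASSWORD"], safe="")
    # Assumption: fall back to localhost:5432 when host or port are not set.
    host = env.get("DB_HOST", "localhost")
    port = env.get("DB_PORT", "5432")
    return f"postgresql://{user}:{password}@{host}:{port}/{env['DB_NAME']}"
```

The resulting string can be passed directly to psql as its first argument.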

API Docs

While running the backend server locally, access the auto-generated API documentation at the backend endpoint /api/v1/doc/.

NOTE: There are known issues accessing this page when the Django server is run with gunicorn (for example, when running in a container).

Connectors

Google Drive

The Google Drive connector uses the PyDrive2 library and supports only OAuth 2.0 authentication. To set it up, follow the first step highlighted in Google's docs and set the client ID and client secret as environment variables in backend/.env

GOOGLE_OAUTH2_KEY="<client-id>"
GOOGLE_OAUTH2_SECRET="<client-secret>"

Tool Registry

Information regarding how tools are added and maintained can be found here.

Archived - (EXPERIMENTAL)

Accessing the admin site

  • If it's the first time, create a superuser and follow the on-screen instructions
python manage.py createsuperuser
  • Register your models in <app>/admin.py, for example
from django.contrib import admin
from .models import Prompt

admin.site.register(Prompt)
  • Make sure the server is running and hit the /admin endpoint

Running unit tests

Unit tests are run with pytest and pytest-django.

pytest
pytest prompt # To run for an app named prompt

All tests are organized within an app, for example: prompt/tests/test_urls.py

NOTE: The django server need not be up to run the tests, however the DB needs to be running.