
# LightRAG Offline Deployment Guide

This guide provides comprehensive instructions for deploying LightRAG in offline environments where internet access is limited or unavailable.

If you deploy LightRAG using Docker, there is no need to refer to this document, as the LightRAG Docker image is pre-configured for offline operation.

Packages that depend on transformers, torch, or CUDA are not included in the offline dependency group. Consequently, document-extraction tools such as Docling, and local LLM backends such as Hugging Face Transformers and LMDeploy, are outside the scope of offline installation support. Such compute-intensive services should not be embedded in LightRAG itself; Docling will be decoupled and deployed as a standalone service.

## Table of Contents

- [Overview](#overview)
- [Quick Start](#quick-start)
- [Layered Dependencies](#layered-dependencies)
- [Tiktoken Cache Management](#tiktoken-cache-management)
- [Complete Offline Deployment Workflow](#complete-offline-deployment-workflow)
- [Troubleshooting](#troubleshooting)
- [Best Practices](#best-practices)
- [Support](#support)

## Overview

LightRAG installs optional packages dynamically (via pipmaster), depending on the file types it processes and the storage and LLM backends you configure. In offline environments, these on-demand installations fail. This guide shows you how to pre-install all necessary dependencies and pre-download the required cache files.

### What Gets Dynamically Installed?

LightRAG dynamically installs packages for:

- **Storage Backends**: `redis`, `neo4j`, `pymilvus`, `pymongo`, `asyncpg`, `qdrant-client`
- **LLM Providers**: `openai`, `anthropic`, `ollama`, `zhipuai`, `aioboto3`, `voyageai`, `llama-index`, `lmdeploy`, `transformers`, `torch`
- **Tiktoken Models**: BPE encoding files downloaded from the OpenAI CDN

**Note:** Document processing dependencies (`pypdf`, `python-docx`, `python-pptx`, `openpyxl`) are now pre-installed with the `api` extras group and no longer require dynamic installation.
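Before packaging for offline use, it can help to confirm which optional modules are already importable in your environment. A minimal shell sketch (the module list is illustrative; note that `python-docx` imports as `docx` and `python-pptx` as `pptx`):

```bash
# Probe optional dependencies; adjust the module list to the features you use.
for mod in pypdf docx pptx openpyxl redis neo4j openai; do
    python -c "import $mod" 2>/dev/null && echo "✓ $mod" || echo "✗ $mod missing"
done
```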

## Quick Start

### Option 1: Using pip with Offline Extras

```bash
# Online environment: Install all offline dependencies
pip install lightrag-hku[offline]

# Download tiktoken cache
lightrag-download-cache

# Create offline package
pip download lightrag-hku[offline] -d ./offline-packages
tar -czf lightrag-offline.tar.gz ./offline-packages ~/.tiktoken_cache

# Transfer to offline server
scp lightrag-offline.tar.gz user@offline-server:/path/to/

# Offline environment: Install
tar -xzf lightrag-offline.tar.gz
pip install --no-index --find-links=./offline-packages lightrag-hku[offline]
export TIKTOKEN_CACHE_DIR=~/.tiktoken_cache
```
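A note for zsh users: square brackets are glob characters in zsh, so quote the extras specifier. The quoted form works in both bash and zsh:

```bash
pip install "lightrag-hku[offline]"
```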

### Option 2: Using Requirements Files

```bash
# Online environment: Download packages
pip download -r requirements-offline.txt -d ./packages

# Transfer to offline server
tar -czf packages.tar.gz ./packages
scp packages.tar.gz user@offline-server:/path/to/

# Offline environment: Install
tar -xzf packages.tar.gz
pip install --no-index --find-links=./packages -r requirements-offline.txt
```
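If the online and offline machines differ in OS, architecture, or Python version, a plain `pip download` fetches wheels for the wrong platform. A hedged sketch using pip's cross-platform flags (adjust the platform tag and Python version to match your offline server; `--only-binary=:all:` is required when targeting another platform):

```bash
# Download wheels for a different target platform (adjust tag and version).
pip download -r requirements-offline.txt -d ./packages \
    --platform manylinux2014_x86_64 \
    --python-version 3.11 \
    --only-binary=:all:
```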

## Layered Dependencies

LightRAG provides flexible dependency groups for different use cases:

### Available Dependency Groups

| Group | Description | Use Case |
|-------|-------------|----------|
| `api` | API server + document processing | FastAPI server with PDF, DOCX, PPTX, XLSX support |
| `offline-storage` | Storage backends | Redis, Neo4j, MongoDB, PostgreSQL, etc. |
| `offline-llm` | LLM providers | OpenAI, Anthropic, Ollama, etc. |
| `offline` | Complete offline package | API + Storage + LLM (all features) |

**Note:** Document processing (PDF, DOCX, PPTX, XLSX) is included in the `api` extras group. The previous `offline-docs` group has been merged into `api` for better integration.

As noted in the Overview, packages that depend on transformers, torch, or CUDA are not included in any offline dependency group.

### Installation Examples

```bash
# Install API with document processing
pip install lightrag-hku[api]

# Install API and storage backends
pip install lightrag-hku[api,offline-storage]

# Install all offline dependencies (recommended for offline deployment)
pip install lightrag-hku[offline]
```

### Using Individual Requirements Files

```bash
# Storage backends only
pip install -r requirements-offline-storage.txt

# LLM providers only
pip install -r requirements-offline-llm.txt

# All offline dependencies
pip install -r requirements-offline.txt
```

## Tiktoken Cache Management

Tiktoken downloads BPE encoding models on first use. In offline environments, you must pre-download these models.
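To verify that a pre-populated cache is actually being used, point `TIKTOKEN_CACHE_DIR` at it and load an encoding; when the cached files are present, this succeeds without network access (assumes the `tiktoken` package is installed):

```bash
# Load an encoding from the local cache instead of the OpenAI CDN
export TIKTOKEN_CACHE_DIR=~/.tiktoken_cache
python -c "import tiktoken; print(tiktoken.encoding_for_model('gpt-4o-mini').name)"
```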

### Using the CLI Command

After installing LightRAG, use the built-in command:

```bash
# Download to default location (see output for exact path)
lightrag-download-cache

# Download to specific directory
lightrag-download-cache --cache-dir ./tiktoken_cache

# Download specific models only
lightrag-download-cache --models gpt-4o-mini gpt-4
```

### Default Models Downloaded

- `gpt-4o-mini` (LightRAG default)
- `gpt-4o`
- `gpt-4`
- `gpt-3.5-turbo`
- `text-embedding-ada-002`
- `text-embedding-3-small`
- `text-embedding-3-large`
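Several of these models share an underlying BPE encoding (for example, `gpt-4` and `gpt-3.5-turbo` both use `cl100k_base`), so the cache typically contains fewer files than the model list suggests; the files are stored under hashed names. To inspect what was downloaded:

```bash
# List the cached encoding files (names are hashes, not model names)
ls -la ~/.tiktoken_cache/
```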

### Setting Cache Location in Offline Environment

```bash
# Option 1: Environment variable (temporary)
export TIKTOKEN_CACHE_DIR=/path/to/tiktoken_cache

# Option 2: Add to ~/.bashrc or ~/.zshrc (persistent)
echo 'export TIKTOKEN_CACHE_DIR=~/.tiktoken_cache' >> ~/.bashrc
source ~/.bashrc

# Option 3: Copy to the default location
mkdir -p ~/.tiktoken_cache
cp -r /path/to/tiktoken_cache/* ~/.tiktoken_cache/
```
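If LightRAG runs as a system service rather than from an interactive shell, profile files such as `~/.bashrc` are not read, so set the variable in the service definition instead. A hypothetical systemd override (the unit name `lightrag.service` and the cache path are placeholders for your deployment):

```bash
# Add the environment variable to a hypothetical lightrag.service unit
sudo systemctl edit lightrag.service
# In the editor that opens, add:
#   [Service]
#   Environment="TIKTOKEN_CACHE_DIR=/opt/lightrag/tiktoken_cache"
```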

## Complete Offline Deployment Workflow

### Step 1: Prepare in Online Environment

```bash
# 1. Install LightRAG with offline dependencies
pip install lightrag-hku[offline]

# 2. Download tiktoken cache
lightrag-download-cache --cache-dir ./offline_cache/tiktoken

# 3. Download all Python packages
pip download lightrag-hku[offline] -d ./offline_cache/packages

# 4. Create archive for transfer
tar -czf lightrag-offline-complete.tar.gz ./offline_cache

# 5. Verify contents
tar -tzf lightrag-offline-complete.tar.gz | head -20
```
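Optionally, record a checksum so the transfer can be verified on the offline side:

```bash
# Optional: record a checksum for integrity verification after transfer
sha256sum lightrag-offline-complete.tar.gz > lightrag-offline-complete.tar.gz.sha256
```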

### Step 2: Transfer to Offline Environment

```bash
# Using scp
scp lightrag-offline-complete.tar.gz user@offline-server:/tmp/

# Or using USB/physical media
# Copy lightrag-offline-complete.tar.gz to USB drive
```
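If you created a checksum in Step 1 and transferred the `.sha256` file alongside the archive, verify the transfer before installing:

```bash
# Verify the archive arrived intact
cd /tmp
sha256sum -c lightrag-offline-complete.tar.gz.sha256
```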

### Step 3: Install in Offline Environment

```bash
# 1. Extract archive
cd /tmp
tar -xzf lightrag-offline-complete.tar.gz

# 2. Install Python packages
pip install --no-index \
    --find-links=/tmp/offline_cache/packages \
    lightrag-hku[offline]

# 3. Set up tiktoken cache
mkdir -p ~/.tiktoken_cache
cp -r /tmp/offline_cache/tiktoken/* ~/.tiktoken_cache/
export TIKTOKEN_CACHE_DIR=~/.tiktoken_cache

# 4. Add to shell profile for persistence
echo 'export TIKTOKEN_CACHE_DIR=~/.tiktoken_cache' >> ~/.bashrc
```

### Step 4: Verify Installation

```bash
# Test Python import
python -c "from lightrag import LightRAG; print('✓ LightRAG imported')"

# Test tiktoken
python -c "from lightrag.utils import TiktokenTokenizer; t = TiktokenTokenizer(); print('✓ Tiktoken working')"

# Test optional dependencies (if installed)
python -c "import pypdf; print('✓ pypdf available')"
python -c "import redis; print('✓ Redis available')"
```

## Troubleshooting

### Issue: Tiktoken fails with network error

**Problem:** `Unable to load tokenizer for model gpt-4o-mini`

**Solution:**

```bash
# Ensure TIKTOKEN_CACHE_DIR is set
echo $TIKTOKEN_CACHE_DIR

# Verify cache files exist
ls -la ~/.tiktoken_cache/

# If empty, you need to download the cache in an online environment first
```
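To confirm the fix, load an encoding directly from the cache; with `TIKTOKEN_CACHE_DIR` pointing at a populated directory, this should print the encoding name without any network access:

```bash
TIKTOKEN_CACHE_DIR=~/.tiktoken_cache python -c "import tiktoken; print(tiktoken.get_encoding('cl100k_base').name)"
```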

### Issue: Dynamic package installation fails

**Problem:** `Error installing package xxx`

**Solution:**

```bash
# Pre-install the specific package you need
# For API with document processing:
pip install lightrag-hku[api]

# For storage backends:
pip install lightrag-hku[offline-storage]

# For LLM providers:
pip install lightrag-hku[offline-llm]
```

### Issue: Missing dependencies at runtime

**Problem:** `ModuleNotFoundError: No module named 'xxx'`

**Solution:**

```bash
# Check what you have installed
pip list | grep -i xxx

# Install missing component
pip install lightrag-hku[offline]  # Install all offline deps
```
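If the offline package directory from the Quick Start is still available, a single missing package can also be installed from it without network access:

```bash
# Install one missing package from the local wheel directory
pip install --no-index --find-links=./offline-packages xxx
```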

### Issue: Permission denied on tiktoken cache

**Problem:** `PermissionError: [Errno 13] Permission denied`

**Solution:**

```bash
# Ensure cache directory has correct permissions
chmod 755 ~/.tiktoken_cache
chmod 644 ~/.tiktoken_cache/*

# Or use a user-writable directory
export TIKTOKEN_CACHE_DIR=~/my_tiktoken_cache
mkdir -p ~/my_tiktoken_cache
```

## Best Practices

1. **Test in Online Environment First**: Always test your complete setup in an online environment before going offline.

2. **Keep Cache Updated**: Periodically update your offline cache when new models are released.

3. **Document Your Setup**: Keep notes on which optional dependencies you actually need.

4. **Version Pinning**: Consider pinning specific versions in production:

   ```bash
   pip freeze > requirements-production.txt
   ```

5. **Minimal Installation**: Only install what you need:

   ```bash
   # If you only need API with document processing
   pip install lightrag-hku[api]
   # Then manually add specific LLM: pip install openai
   ```
    


## Support

If you encounter issues not covered in this guide:

1. Check the GitHub Issues
2. Review the project documentation
3. Create a new issue with your offline deployment details