lightrag

Author	SHA1	Message	Date
yangdx	cdd53ee875	Remove manual initialize_pipeline_status() calls across codebase - Auto-init pipeline status in storages - Remove redundant import statements - Simplify initialization pattern - Update docs and examples	2025-11-17 12:54:33 +08:00
yangdx	e22ac52ebc	Auto-initialize pipeline status in LightRAG.initialize_storages() • Remove manual initialize_pipeline_status calls • Auto-init in initialize_storages method • Update error messages for clarity • Warn on workspace conflicts	2025-11-17 12:54:33 +08:00
yangdx	e8383df3b8	Fix NamespaceLock context variable timing to prevent lock bricking * Acquire lock before setting ContextVar * Prevent state corruption on cancellation * Fix permanent lock brick scenario * Store context only after success * Handle acquisition failure properly	2025-11-17 12:54:33 +08:00
yangdx	95e1fb1612	Remove final_namespace attribute for in-memory storage and use namespace in clean_llm_query_cache.py	2025-11-17 12:54:33 +08:00
yangdx	7ed0eac4c9	Fix workspace filtering logic in get_all_update_flags_status • Handle namespaces with/without prefixes • Fix workspace matching logic	2025-11-17 12:54:33 +08:00
yangdx	78689e8837	Fix pipeline status namespace check to handle root case - Add check for bare "pipeline_status" - Handle namespace without prefix	2025-11-17 12:54:33 +08:00
yangdx	d54d0d55d9	Standardize empty workspace handling from "_" to "" across storage * Unify empty workspace behavior by changing workspace from "_" to "" * Fixed incorrect empty workspace detection in get_all_update_flags_status()	2025-11-17 12:54:33 +08:00
yangdx	b6a5a90eaf	Fix NamespaceLock concurrent coroutine safety with ContextVar - Use ContextVar for per-coroutine storage - Prevent state interference between coroutines - Add re-entrance protection check	2025-11-17 12:54:33 +08:00
yangdx	fd486bc922	Refactor storage classes to use namespace instead of final_namespace	2025-11-17 12:54:33 +08:00
yangdx	01814bfc7a	Fix missing function call parentheses in get_all_update_flags_status	2025-11-17 12:54:33 +08:00
yangdx	7deb9a64b9	Refactor namespace lock to support reusable async context manager • Add NamespaceLock class wrapper • Fix lock re-entrance issues • Enable concurrent lock usage • Fresh context per async with block • Update get_namespace_lock API	2025-11-17 12:54:33 +08:00
yangdx	52c812b9a0	Fix workspace isolation for pipeline status across all operations - Fix final_namespace error in get_namespace_data() - Fix get_workspace_from_request return type - Add workspace param to pipeline status calls	2025-11-17 12:54:33 +08:00
yangdx	926960e957	Refactor workspace handling to use default workspace and namespace locks - Remove DB-specific workspace configs - Add default workspace auto-setting - Replace global locks with namespace locks - Simplify pipeline status management - Remove redundant graph DB locking	2025-11-17 12:54:33 +08:00
yangdx	acae404f04	Update env.example • Comment out Ollama config • Set OpenAI as active default • Add EMBEDDING_TOKEN_LIMIT option • Add Gemini embedding configuration	2025-11-17 12:54:33 +08:00
yangdx	ec05d89c2a	Add macOS fork safety check for Gunicorn multi-worker mode • Check OBJC_DISABLE_INITIALIZE_FORK_SAFETY • Prevent NumPy/Accelerate crashes • Show detailed error message • Provide multiple fix options • Exit early if misconfigured	2025-11-17 12:54:33 +08:00
Sleeep	8abc2ac1cb	Update edge keywords extraction in graph visualization 构建neo4j时候关键字的取值默认为d7 应该为修改后的d9	2025-11-17 12:54:32 +08:00
yangdx	e5addf4d94	Improve embedding config priority and add debug logging • Fix embedding_dim priority logic • Add final config logging	2025-11-17 12:54:32 +08:00
yangdx	2fb57e767d	Fix embedding token limit initialization order * Capture max_token_size before decorator * Apply wrapper after capturing attribute * Prevent decorator from stripping dataclass * Ensure token limit is properly set	2025-11-17 12:54:32 +08:00
yangdx	6b2af2b579	Refactor embedding function creation with proper attribute inheritance - Extract max_token_size from providers - Avoid double-wrapping EmbeddingFunc - Improve configuration priority logic - Add comprehensive debug logging - Return complete EmbeddingFunc instance	2025-11-17 12:54:32 +08:00
yangdx	f0254773c6	Convert embedding_token_limit from property to field with __post_init__ • Remove property decorator • Add field with init=False • Set value in __post_init__ method • embedding_token_limit is now in config dictionary	2025-11-17 12:54:32 +08:00
yangdx	14a6c24ed7	Add configurable embedding token limit with validation - Add EMBEDDING_TOKEN_LIMIT env var - Set max_token_size on embedding func - Add token limit property to LightRAG - Validate summary length vs limit - Log warning when limit exceeded	2025-11-17 12:54:32 +08:00
yangdx	f5b48587ed	Improve Bedrock error handling with retry logic and custom exceptions • Add specific exception types • Implement proper retry mechanism • Better error classification • Enhanced logging and validation • Enable embedding retry decorator	2025-11-17 12:54:32 +08:00
yangdx	77221564b0	Add max_token_size parameter to embedding function decorators - Add max_token_size=8192 to all embed funcs - Move siliconcloud to deprecated folder - Import wrap_embedding_func_with_attrs - Update EmbeddingFunc docstring - Fix langfuse import type annotation	2025-11-17 12:54:32 +08:00
yangdx	8283c86bce	Refactor exception handling in MemgraphStorage label methods	2025-11-17 12:54:32 +08:00
yangdx	423e4e927a	Fix null reference errors in graph database error handling - Initialize result vars to None - Add null checks before consume calls - Prevent crashes in except blocks - Apply fix to both Neo4J and Memgraph	2025-11-17 12:54:32 +08:00
yangdx	2f2f35b883	Add macOS compatibility check for DOCLING with multi-worker Gunicorn	2025-11-17 12:54:32 +08:00
yangdx	c246eff725	Improve docling integration with macOS compatibility and CLI flag - Add --docling CLI flag for easier setup - Add numpy version constraints - Exclude docling on macOS (fork-safety)	2025-11-17 12:54:32 +08:00
yangdx	63510478e5	Improve error handling and logging in cloud model detection	2025-11-17 12:54:32 +08:00
LacombeLouis	67dfd85679	Add a better regex	2025-11-17 12:54:32 +08:00
Louis Lacombe	5127bf20ae	Add support for environment variable fallback for API key and default host for cloud models	2025-11-17 12:54:32 +08:00
yangdx	fa9206d69a	Update uv.lock	2025-11-17 12:54:32 +08:00
yangdx	7b7f93d77c	Implement lazy configuration initialization for API server • Add lazy config initialization • Maintain backward compatibility • Support programmatic usage • Add gunicorn dependency • Explicit config in entry points	2025-11-17 12:54:32 +08:00
yangdx	69a0b74ce7	refactor: move document deps to api group, remove dynamic imports - Merge offline-docs into api extras - Remove pipmaster dynamic installs - Add async document processing - Pre-check docling availability - Update offline deployment docs	2025-11-17 12:54:32 +08:00
yangdx	7d394fb0a4	Replace asyncio.iscoroutine with inspect.isawaitable for better detection	2025-11-17 12:54:32 +08:00
yangdx	72f68c2a61	Update env.example	2025-11-17 12:54:32 +08:00
yangdx	a08bc72635	Fix empty dict handling after JSON sanitization • Replace truthy checks with `is not None` • Handle empty dict edge case properly • Prevent data reload failures • Add comprehensive test coverage • Fix JsonKVStorage and DocStatusStorage	2025-11-17 12:54:32 +08:00
yangdx	cca0800ed4	Fix migration to reload sanitized data and prevent memory corruption • Reload cleaned data after sanitization • Update shared memory with clean data • Add specific surrogate char tests • Test migration sanitization flow • Prevent dirty data in memory	2025-11-17 12:54:32 +08:00
yangdx	7f54f47093	Optimize JSON string sanitization with precompiled regex and zero-copy - Precompile regex pattern at module level - Zero-copy path for clean strings - Use C-level regex for performance - Remove deprecated _sanitize_json_data - Fast detection for common case	2025-11-17 12:54:32 +08:00
yangdx	f289cf6225	Optimize JSON write with fast/slow path to reduce memory usage - Fast path for clean data (no sanitization) - Slow path sanitizes during encoding - Reload shared memory after sanitization - Custom encoder avoids deep copies - Comprehensive test coverage	2025-11-17 12:54:32 +08:00
yangdx	93a3e47134	Remove deprecated response_type parameter from query settings - Bump API version to 0254 - Remove response format UI controls - Hard-code response_type in query params - Add migration for version 19 - Clean up settings store structure	2025-11-17 12:54:32 +08:00
yangdx	abeaac84fa	Improve JSON data sanitization to handle tuples and dict keys - Sanitize dictionary keys - Preserve tuple types - Handle nested structures better	2025-11-17 12:54:32 +08:00
yangdx	5885637ebf	Add specialized JSON string sanitizer to prevent UTF-8 encoding errors • Remove surrogate characters (U+D800-DFFF) • Filter Unicode non-characters • Direct char-by-char filtering	2025-11-17 12:54:32 +08:00
yangdx	23cbb9c9b2	Add data sanitization to JSON writing to prevent UTF-8 encoding errors • Add _sanitize_json_data helper function • Recursively clean strings in data • Sanitize before JSON serialization • Prevent encoding-related crashes • Use existing sanitize_text_for_encoding	2025-11-17 12:54:32 +08:00
yangdx	ff8f158891	Update env.example	2025-11-17 12:54:32 +08:00
yangdx	c434879c7a	Replace PyPDF2 with pypdf for PDF processing - Update import from PyPDF2 to pypdf - Change dependency to pypdf>=6.1.0 - Update all requirements files - Remove PyPDF2 from lock file - Use modern pypdf library	2025-11-17 12:54:32 +08:00
yangdx	af5423919b	Support async chunking functions in LightRAG processing pipeline - Add Awaitable and Union type imports - Update chunking_func type annotation - Handle coroutine results with await - Add return type validation - Update docstring for async support	2025-11-17 12:54:32 +08:00
Tong Da	5016025453	easier version: detect chunking_func result is coroutine or not	2025-11-17 12:54:32 +08:00
Tong Da	7740500693	support async chunking func to improve processing performance when a heavy `chunking_func` is passed in by user	2025-11-17 12:54:32 +08:00
BukeLy	18a4870229	fix: Add default workspace support for backward compatibility Fixes two compatibility issues in workspace isolation: 1. Problem: lightrag_server.py calls initialize_pipeline_status() without workspace parameter, causing pipeline to initialize in global namespace instead of rag's workspace. Solution: Add set_default_workspace() mechanism in shared_storage. LightRAG.initialize_storages() now sets default workspace, which initialize_pipeline_status() uses when called without parameters. 2. Problem: /health endpoint hardcoded to use "pipeline_status", cannot return workspace-specific status or support frontend workspace selection. Solution: Add LIGHTRAG-WORKSPACE header support. Endpoint now extracts workspace from header or falls back to server default, returning correct workspace-specific pipeline status. Changes: - lightrag/kg/shared_storage.py: Add set/get_default_workspace() - lightrag/lightrag.py: Call set_default_workspace() in initialize_storages() - lightrag/api/lightrag_server.py: Add get_workspace_from_request() helper, update /health endpoint to support LIGHTRAG-WORKSPACE header Testing: - Backward compatibility: Old code works without modification - Multi-instance safety: Explicit workspace passing preserved - /health endpoint: Supports both default and header-specified workspaces Related: #2353	2025-11-17 12:54:20 +08:00
BukeLy	eb52ec94d7	feat: Add workspace isolation support for pipeline status Problem: In multi-tenant scenarios, different workspaces share a single global pipeline_status namespace, causing pipelines from different tenants to block each other, severely impacting concurrent processing performance. Solution: - Extended get_namespace_data() to recognize workspace-specific pipeline namespaces with pattern "{workspace}:pipeline" (following GraphDB pattern) - Added workspace parameter to initialize_pipeline_status() for per-tenant isolated pipeline namespaces - Updated all 7 call sites to use workspace-aware locks: * lightrag.py: process_document_queue(), aremove_document() * document_routes.py: background_delete_documents(), clear_documents(), cancel_pipeline(), get_pipeline_status(), delete_documents() Impact: - Different workspaces can process documents concurrently without blocking - Backward compatible: empty workspace defaults to "pipeline_status" - Maintains fail-fast: uninitialized pipeline raises clear error - Expected N× performance improvement for N concurrent tenants Bug fixes: - Fixed AttributeError by using self.workspace instead of self.global_config - Fixed pipeline status endpoint to show workspace-specific status - Fixed delete endpoint to check workspace-specific busy flag Code changes: 4 files, 141 insertions(+), 28 deletions(-) Testing: All syntax checks passed, comprehensive workspace isolation tests completed	2025-11-17 12:53:44 +08:00

1 2 3 4 5 ...

5656 Commits