lightrag

Author	SHA1	Message	Date
yangdx	4ac5ec4c2f	Improve Qdrant workspace detection via payload sampling - Detect unindexed workspace_id via sample - Prevent cross-workspace data leakage - Fix empty workspace warning logic - Update migration tests for sampling	2025-12-19 18:00:17 +08:00
yangdx	1c083c6699	Remove redundant pytest.mark.asyncio decorators - Remove explicit asyncio markers - Clean up unused imports in tests	2025-12-19 16:00:37 +08:00
yangdx	0ae60d36bc	Improve Qdrant migration checks and verification logic - Check workspace data before migrating - Update Qdrant migration tests	2025-12-19 12:03:38 +08:00
yangdx	bf618fc976	Refactor Qdrant setup and migration logic - Validate dimensions before migration - Require namespace and workspace args - Raise error on vector size mismatch - Simplify collection initialization flow - Update tests for strict checks	2025-12-19 10:45:18 +08:00
yangdx	1b62ec9af5	refactor(Qdrant): simplify suffix generation and improve migration logic - Move suffix generation logic to BaseVectorStorage._generate_collection_suffix() - Remove EmbeddingFunc.get_model_identifier() and unused abstract methods - Qdrant: raise error on dimension mismatch, disable auto-deletion of legacy collections - Update tests accordingly BREAKING CHANGE: Qdrant dimension mismatch raises error; legacy collections require manual cleanup	2025-12-16 12:33:17 +08:00
BukeLy	5180c1e395	feat: implement dimension compatibility checks for PostgreSQL and Qdrant migrations. This update introduces checks for vector dimension compatibility before migrating legacy data in both PostgreSQL and Qdrant storage implementations. If a dimension mismatch is detected, the migration is skipped to prevent data loss, and a new empty table or collection is created for the new embedding model. Key changes: added dimension checks in `PGVectorStorage` and `QdrantVectorDBStorage` classes; enhanced logging to inform users about dimension mismatches and the creation of new storage; updated E2E tests to validate the new behavior, ensuring legacy data is preserved and new structures are created correctly. Impact: prevents potential data corruption during migrations with mismatched dimensions; improves user experience by providing clear logging and maintaining legacy data integrity. Testing: new tests confirm that the system behaves as expected when encountering dimension mismatches.	2025-11-20 12:22:13 +08:00
BukeLy	8386ea061e	refactor: unify PostgreSQL and Qdrant migration logic for consistency. Why this change is needed: previously, PostgreSQL and Qdrant had inconsistent migration behavior. PostgreSQL kept legacy tables after migration, requiring manual cleanup, while Qdrant auto-deleted legacy collections after migration. This inconsistency caused confusion for users and required different documentation for each backend. How it solves the problem: both backends now follow the same smart cleanup strategy. Case 1 (both exist): auto-delete the legacy store if it is empty, warn if it has data. Case 4 (migration): auto-delete the legacy store after successful verification. This provides a fully automated migration experience without manual intervention. Impact: eliminates the need for users to manually delete legacy tables/collections; reduces storage waste from duplicate data; provides consistent behavior across PostgreSQL and Qdrant; simplifies documentation and user experience. Testing: all 16 unit tests pass (8 PostgreSQL + 8 Qdrant); added 4 new tests for Case 1 scenarios (empty vs non-empty legacy); updated E2E tests to verify auto-deletion behavior; all lint checks pass (ruff-format, ruff, trailing-whitespace).	2025-11-20 11:37:59 +08:00
BukeLy	6bef40766d	style: fix lint errors (trailing whitespace and formatting)	2025-11-20 01:41:23 +08:00
BukeLy	5d9547344a	fix: correct Qdrant legacy_namespace for data migration Why this change is needed: The legacy_namespace logic was incorrectly including workspace in the collection name, causing migration to fail in E2E tests. When workspace was set (e.g., to a temp directory path), legacy_namespace became "/tmp/xxx_chunks" instead of "lightrag_vdb_chunks", so the migration logic couldn't find the legacy collection. How it solves it: Changed legacy_namespace to always use the old naming scheme without workspace prefix: "lightrag_vdb_{namespace}". This matches the actual collection names from pre-migration code and aligns with PostgreSQL's approach where legacy_table_name = base_table (without workspace). Impact: - Qdrant legacy data migration now works correctly in E2E tests - All unit tests pass (6/6 for both Qdrant and PostgreSQL) - E2E test_legacy_migration_qdrant should now pass Testing: - Unit tests: pytest tests/test_qdrant_migration.py -v (6/6 passed) - Unit tests: pytest tests/test_postgres_migration.py -v (6/6 passed) - Updated test_qdrant_collection_naming to verify new legacy_namespace	2025-11-20 01:08:15 +08:00
BukeLy	ad68624d02	feat: PostgreSQL model isolation and auto-migration Why this change is needed: PostgreSQL vector storage needs model isolation to prevent dimension conflicts when different workspaces use different embedding models. Without this, the first workspace locks the vector dimension for all subsequent workspaces, causing failures. How it solves it: - Implements dynamic table naming with model suffix: {table}_{model}_{dim}d - Adds setup_table() method mirroring Qdrant's approach for consistency - Implements 4-branch migration logic: both exist -> warn, only new -> use, neither -> create, only legacy -> migrate - Batch migration: 500 records/batch (same as Qdrant) - No automatic rollback to support idempotent re-runs Impact: - PostgreSQL tables now isolated by embedding model and dimension - Automatic data migration from legacy tables on startup - Backward compatible: model_name=None defaults to "unknown" - All SQL operations use dynamic table names Testing: - 6 new tests for PostgreSQL migration (100% pass) - Tests cover: naming, migration trigger, scenarios 1-3 - 3 additional scenario tests added for Qdrant completeness Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-19 22:54:37 +08:00
BukeLy	df5aacb545	feat: Qdrant model isolation and auto-migration Why this change is needed: To implement vector storage model isolation for Qdrant, allowing different workspaces to use different embedding models without conflict, and automatically migrating existing data. How it solves it: - Modified QdrantVectorDBStorage to use model-specific collection suffixes - Implemented automated migration logic from legacy collections to new schema - Fixed Shared-Data lock re-entrancy issue in multiprocess mode - Added comprehensive tests for collection naming and migration triggers Impact: - Existing users will have data automatically migrated on next startup - New workspaces will use isolated collections based on embedding model - Fixes potential lock-related bugs in shared storage Testing: - Added tests/test_qdrant_migration.py passing - Verified migration logic covers all 4 states (New/Legacy existence combinations)	2025-11-19 18:47:38 +08:00

11 Commits
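
The commits above describe a model-suffixed naming scheme ({table}_{model}_{dim}d, with model_name=None defaulting to "unknown"), a four-branch migration decision (both exist, only new, neither, only legacy), an error on vector size mismatch, and batches of 500 records. The sketch below illustrates how those pieces might fit together. It is not the LightRAG implementation: every name in it (suffixed_name, decide_migration, MigrationAction, batches) is hypothetical, the sanitizing of the model name is an assumption, and it raises on a dimension mismatch as the later Qdrant commits describe rather than skipping the migration as 5180c1e395 did.

```python
import re
from enum import Enum
from typing import Iterator, List, Optional, Sequence


class MigrationAction(Enum):
    """Hypothetical labels for the four branches named in the commit log."""
    WARN_BOTH_EXIST = "both exist -> warn"
    USE_NEW = "only new -> use"
    CREATE_NEW = "neither -> create"
    MIGRATE = "only legacy -> migrate"


def suffixed_name(base: str, model_name: Optional[str], dim: int) -> str:
    """Build a name of the form {base}_{model}_{dim}d.

    A missing model name falls back to "unknown", as the log states.
    Lowercasing and replacing non-alphanumerics is an assumption made here.
    """
    model = model_name or "unknown"
    safe = re.sub(r"[^a-z0-9]+", "_", model.lower()).strip("_")
    return f"{base}_{safe}_{dim}d"


def decide_migration(
    new_exists: bool,
    legacy_exists: bool,
    legacy_dim: Optional[int] = None,
    new_dim: Optional[int] = None,
) -> MigrationAction:
    """Pick one of the four branches; check dimensions before migrating."""
    if new_exists and legacy_exists:
        return MigrationAction.WARN_BOTH_EXIST
    if new_exists:
        return MigrationAction.USE_NEW
    if not legacy_exists:
        return MigrationAction.CREATE_NEW
    # Only the legacy store exists: validate vector size before copying data.
    if legacy_dim != new_dim:
        raise ValueError(
            f"vector size mismatch: legacy={legacy_dim}, new={new_dim}"
        )
    return MigrationAction.MIGRATE


def batches(records: Sequence, size: int = 500) -> Iterator[List]:
    """Yield consecutive slices of at most `size` records."""
    for start in range(0, len(records), size):
        yield list(records[start:start + size])
```

A caller would compute the target name with suffixed_name, probe whether the new and legacy stores exist, and copy records batch by batch only when decide_migration returns MigrationAction.MIGRATE.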