biz-bud/package.json
Travis Vasceannie aaa9fa285d Vasceannie/issue32 (#41)
* Repopatch (#31)

* fix: repomix output not being processed by analyze_content node

Fixed issue where repomix repository content was not being uploaded to R2R:
- Updated analyze_content_for_rag_node to check for repomix_output before scraped_content length check
- Fixed repomix content formatting to wrap in pages array as expected by upload_to_r2r_node
- Added proper metadata structure for repository content including URL preservation

The analyzer was returning early with "No new content to process" for git repos because scraped_content is empty for repomix. Now it properly processes repomix_output and formats it for R2R upload.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* fix: update scrape status summary to properly report git repository processing

- Add detection for repomix_output and is_git_repo fields
- Include git repository-specific status messages
- Show repository processing method (Repomix) and output size
- Display R2R collection name when available
- Update fallback summary for git repos
- Add unit tests for git repository summary generation

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* feat: enhance GitHub URL processing and add unit tests for collection name extraction

- Implemented logic in extract_collection_name to detect GitHub, GitLab, and Bitbucket repository names from URLs.
- Added comprehensive unit tests for extract_collection_name to validate various GitHub URL formats.
- Updated existing tests to reflect correct repository name extraction for GitHub URLs.

This commit improves the handling of repository URLs, ensuring accurate collection name extraction for R2R uploads.

* feat: enhance configuration and documentation for receipt processing system

- Added `max_concurrent_scrapes` setting in `config.yaml` to limit concurrent scraping operations.
- Updated `pyrefly.toml` to include project root in search paths for module resolution.
- Introduced new documentation files for receipt processing, including agent design, code examples, database design, executive summary, implementation guide, and paperless integration.
- Enhanced overall documentation structure for better navigation and clarity.

This commit improves the configuration management and provides comprehensive documentation for the receipt processing system, facilitating easier implementation and understanding of the workflow.

* fix: adjust FirecrawlApp configuration for improved performance

- Reduced timeout from 160 to 60 seconds and max retries from 2 to 1 in firecrawl_discover_urls_node and firecrawl_process_single_url_node for better responsiveness.
- Implemented a URL limit of 100 in firecrawl_discover_urls_node to prevent system overload and added logging for URL processing.
- Updated batch scraping concurrency settings in firecrawl_process_single_url_node to dynamically adjust based on batch size, enhancing efficiency.

These changes optimize the Firecrawl integration for more effective URL discovery and processing.
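
A minimal asyncio sketch of the two ideas above: cap the discovered URL list at 100 and size the concurrency limit from the batch. `scrape_discovered`, `choose_concurrency`, `scrape_one`, and the default cap of 10 concurrent scrapes are hypothetical; the real nodes call the Firecrawl client, which is not shown here.

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import TypeVar

T = TypeVar("T")

MAX_DISCOVERED_URLS = 100  # the discovery cap named in the commit message


def choose_concurrency(batch_size: int, max_concurrent: int = 10) -> int:
    """Hypothetical rule: at most one worker per URL, never above the cap, never below 1."""
    return max(1, min(batch_size, max_concurrent))


async def scrape_discovered(
    urls: list[str],
    scrape_one: Callable[[str], Awaitable[T]],
    max_concurrent: int = 10,
) -> list[T]:
    capped = urls[:MAX_DISCOVERED_URLS]
    semaphore = asyncio.Semaphore(choose_concurrency(len(capped), max_concurrent))

    async def bounded(url: str) -> T:
        # Each scrape waits for a free slot before it starts.
        async with semaphore:
            return await scrape_one(url)

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(bounded(url) for url in capped))
```

Tying the semaphore size to the batch avoids reserving more slots than there are URLs, while the cap keeps large batches from flooding the scraping service.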

* fix: resolve linting and type checking errors

- Fix line length error in statistics.py by breaking long string
- Replace Union type annotations with modern union syntax in types.py
- Fix module level import ordering in tools.py
- Add string quotes for forward references in arxiv.py and firecrawl.py
- Replace problematic Any type annotations with proper types where possible
- Fix isinstance call with union types using tuple syntax with noqa
- Move runtime imports to TYPE_CHECKING blocks to fix TC001 violations
- Fix typo in CLAUDE.md documentation
- Add codespell ignore for package-lock.json hash strings
- Fix cast() calls to use proper type objects
- Fix callback function to be synchronous instead of awaited
- Add noqa comments for legitimate Any usage in serialization utilities
- Regenerate package-lock.json to resolve integrity issues

* chore: update various configurations and documentation across the project

- Modified .gitignore to exclude unnecessary files and directories.
- Updated .roomodes and CLAUDE.md for improved clarity and organization.
- Adjusted package-lock.json and package.json for dependency management.
- Enhanced pyrefly.toml and pyrightconfig.json for better project configuration.
- Refined settings in .claude/settings.json and .roo/mcp.json for consistency.
- Improved documentation in .roo/rules and examples for better usability.
- Updated multiple test files and configurations to ensure compatibility and clarity.

These changes collectively enhance project organization, configuration management, and documentation quality.

* feat: enhance configuration and error handling in LLM services

- Updated `config.yaml` to improve clarity and organization, including detailed comments on configuration precedence and environment variable overrides.
- Modified LLM client to conditionally remove the temperature parameter for reasoning models, ensuring proper model behavior.
- Adjusted integration tests to reflect status changes from "completed" to "success" for better accuracy in test assertions.
- Enhanced error handling in various tests to ensure robust responses to missing API keys and other edge cases.

These changes collectively improve the configuration management, error handling, and overall clarity of the LLM services.
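
A minimal sketch of dropping `temperature` for reasoning models before a request is built. The prefix list and helper names are assumptions for illustration; which models reject `temperature` depends on the provider.

```python
# Hypothetical: model-name prefixes treated as reasoning models.
REASONING_MODEL_PREFIXES = ("o1", "o3", "o4")


def is_reasoning_model(model: str) -> bool:
    # Accept provider-qualified names such as "openai/o3-mini".
    name = model.rsplit("/", 1)[-1].lower()
    return name.startswith(REASONING_MODEL_PREFIXES)


def prepare_llm_kwargs(model: str, kwargs: dict[str, object]) -> dict[str, object]:
    """Return a copy of kwargs, without temperature when the model is a reasoning model."""
    cleaned = dict(kwargs)  # copy so the caller's dict is left untouched
    if is_reasoning_model(model):
        cleaned.pop("temperature", None)
    return cleaned
```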

* feat: add direct reference allowance in Hatch metadata

- Updated `pyproject.toml` to include `allow-direct-references` in Hatch metadata, enhancing package management capabilities.

This change improves the configuration for package references in the project.

* feat: add collection name override functionality in URL processing

- Enhanced `process_url_to_r2r`, `stream_url_to_r2r`, and `process_url_to_r2r_with_streaming` functions to accept an optional `collection_name` parameter for overriding automatic derivation.
- Updated `URLToRAGState` to include `collection_name` for better state management.
- Modified `upload_to_r2r_node` to utilize the override collection name when provided.
- Added comprehensive unit tests to validate the collection name override functionality.

These changes improve the flexibility of URL processing by allowing users to specify a custom collection name, enhancing the overall usability of the system.

* feat: add component extraction and categorization functionality

- Introduced `ComponentExtractor` and `ComponentCategorizer` classes for extracting and categorizing components from text across various industries.
- Updated `__init__.py` to include new component extraction functionalities in the domain module.
- Refactored import paths in `catalog_component_extraction.py` and test files to align with the new structure.

These changes enhance the system's ability to process and categorize components, improving overall functionality.

* feat: enhance Firecrawl integration and configuration management

- Updated `.gitignore` to exclude task files for better organization.
- Modified `config.yaml` to include `max_pages_to_map` for improved URL mapping capabilities.
- Enhanced `Makefile` to include `pyright` for type checking during linting.
- Introduced new scripts for cache clearing and improved error handling in various nodes.
- Added comprehensive tests for duplicate detection and URL processing, ensuring robust functionality.

These changes collectively enhance the Firecrawl integration, improve configuration management, and ensure better testing coverage for the system.

* feat: update URL processing logic to improve batch handling

- Modified `should_scrape_or_skip` function to return "increment_index" instead of "skip_to_summary" when no URLs are available, enhancing batch processing flow.
- Updated documentation and comments to reflect changes in the URL processing logic, clarifying the new behavior for empty batches.

These changes improve the efficiency of URL processing by ensuring that empty batches are handled correctly, allowing for seamless transitions to the next batch.
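
A minimal sketch of the routing described above, assuming a state with `sitemap_urls`, `current_url_index`, `batch_size`, and `batch_urls_to_scrape` (field names taken from this log, shapes assumed). The function names and the `scrape_batch`, `route_batch`, and `summary` labels are hypothetical.

```python
from typing import Literal, TypedDict


class BatchState(TypedDict):
    sitemap_urls: list[str]
    current_url_index: int
    batch_size: int
    batch_urls_to_scrape: list[str]


def route_batch(state: BatchState) -> Literal["scrape_batch", "increment_index"]:
    # An empty batch no longer jumps to the summary; it advances to the next batch.
    if state["batch_urls_to_scrape"]:
        return "scrape_batch"
    return "increment_index"


def increment_index(state: BatchState) -> BatchState:
    # Move the window forward by one batch and load the URLs it covers.
    start = state["current_url_index"] + state["batch_size"]
    return {
        **state,
        "current_url_index": start,
        "batch_urls_to_scrape": state["sitemap_urls"][start : start + state["batch_size"]],
    }


def route_after_increment(state: BatchState) -> Literal["route_batch", "summary"]:
    # Only running past the end of the URL list finishes the run.
    if state["current_url_index"] >= len(state["sitemap_urls"]):
        return "summary"
    return "route_batch"
```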

* fix: update linting script path in settings.json

- Changed the command path for the linting script in `.claude/settings.json` from `./scripts/lint-file.sh` to `../scripts/lint-file.sh` to ensure correct execution.

This change resolves the issue with the linting script not being found due to an incorrect relative path.

* feat: enhance linting script output and URL processing logic

- Updated the linting script in `scripts/lint-file.sh` to provide clearer output messages for linting results, including separators and improved failure messages.
- Modified `preserve_url_fields_node` function in `url_to_r2r.py` to increment the batch index for URL processing, ensuring better handling of batch completion and logging.

These changes improve the user experience during linting and enhance the URL processing workflow.

* feat: enhance URL processing and configuration management

- Added `max_pages_to_crawl` to `config.yaml` to increase the number of pages processed after discovery.
- Updated `preserve_url_fields_node` and `should_process_next_url` functions in `url_to_r2r.py` to utilize `sitemap_urls` for improved URL handling and logging.
- Introduced `batch_size` in `URLToRAGState` for better control over URL processing in batches.

These changes improve the efficiency and flexibility of URL processing and enhance configuration management.

* feat: increase max pages to crawl in configuration

- Updated `max_pages_to_crawl` in `config.yaml` from 1000 to 2000 to increase the number of pages processed after discovery.

* fix: clear batch_urls_to_scrape in firecrawl_process_single_url_node

- Added logic to clear `batch_urls_to_scrape` to signal batch completion in the `firecrawl_process_single_url_node` function, ensuring proper handling of batch states.
- Updated `.gitignore` to include a trailing space for better consistency in ignored task files.

* fix: update firecrawl_batch_process_node to clear batch_urls_to_scrape

- Changed the key for URLs to scrape from `urls_to_process` to `batch_urls_to_scrape` in the `firecrawl_batch_process_node` function.
- Added logic to clear `batch_urls_to_scrape` upon completion of the batch process, ensuring proper state management.

* fix: improve company name extraction and human assistance flow

- Updated `extract_company_names` function to skip empty company names during extraction, enhancing the accuracy of results.
- Modified `human_assistance` function to be asynchronous, allowing for non-blocking execution and improved workflow interruption handling.
- Adjusted logging in `firecrawl_legacy.py` to correctly format fallback config names, ensuring clarity in logs.
- Cleaned up test assertions in `test_agent_nodes_r2r.py` and `test_upload_r2r_comprehensive.py` for better readability and consistency.

* feat: add black formatting support in Makefile and scripts

- Introduced a new `black` target in the Makefile to format specified Python files using the Black formatter.
- Added a new script `black-file.sh` to handle pre-tool use hooks for formatting Python files before editing or writing.
- Updated `.claude/settings.json` to include the new linting script for pre-tool use, ensuring consistent formatting checks.

These changes enhance code quality by integrating automatic formatting into the development workflow.

* fix: update validation methods and enhance configuration models

- Refactored validation methods in various models to use instance methods instead of class methods for improved clarity and consistency.
- Updated `.gitignore` to include task files, ensuring better organization of ignored files.
- Added new fields and validation logic in configuration models for enhanced database and service configurations, improving overall robustness and usability.

These changes enhance code quality and maintainability across the project.

* feat: enhance configuration and validation structure

- Updated `.gitignore` to include `tasks.json` for better organization of ignored files.
- Added new documentation files for best practices and patterns in LangGraph implementation.
- Introduced new validation methods and configuration models to improve robustness and usability.
- Removed outdated documentation files to streamline the codebase.

These changes enhance the overall structure and maintainability of the project.

* fix: update test assertions and improve .gitignore

- Modified `.gitignore` to ensure proper organization of ignored task files by adding a trailing space.
- Updated assertions in `test_scrapers.py` to reflect the expected structure of the result when scraping an empty URL list.
- Adjusted the action type in `test_error_handling_integration.py` to use the correct custom action type for better clarity.
- Changed the import path in `test_semantic_extraction.py` to reflect the new module structure.

These changes enhance test accuracy and maintainability of the project.

* fix: update validation methods and enhance metadata handling

- Refactored validation methods in various models to use class methods for improved consistency.
- Updated `.gitignore` to ensure proper organization of ignored task files by adding a trailing space.
- Enhanced metadata handling in `FirecrawlStrategy` to convert `FirecrawlMetadata` to `PageMetadata`.
- Improved validation logic in multiple models to ensure proper type handling and error management.

These changes enhance code quality and maintainability across the project.

* ayoooo

* fix: update test component name extraction for consistency

- Modified test assertions in `test_catalog_research_integration.py` to ensure component names are converted to strings before applying the `lower()` method. This change enhances the robustness of the tests by preventing potential errors with non-string values.

These changes improve test reliability and maintainability.

---------

Co-authored-by: Claude <noreply@anthropic.com>

* fix: resolve all 226 pyrefly type errors across codebase

- Fixed import and function usage errors in example files
- Added proper type casting for intentional validation test mismatches
- Resolved missing argument errors in Pydantic model constructors
- Fixed dictionary unpacking type errors with explicit constructor calls
- Updated configuration validation tests to satisfy strict type checking

All fixes maintain original test logic while satisfying pyrefly requirements.
No use of type: ignore or Any types per project guidelines.

* fix: resolve all 226 pyrefly type errors with ruff-compatible approach

Final solution uses 'is None or isinstance(value, union_type)' pattern which:
- Pyrefly accepts: separate None check + isinstance with union types
- Ruff accepts and auto-formats to modern union syntax in isinstance calls

- COMPLETE: 226 → 0 pyrefly errors resolved
- All pre-commit hooks passing
- Ruff and pyrefly both satisfied

* chore: update .gitignore and add new configuration files

- Added tasks.json and tasks/ to .gitignore for better organization of ignored files.
- Introduced .mcp.json for project configuration management.
- Updated CLAUDE.local.md with development warnings and guidance.
- Enhanced dev.sh script with additional shell commands for container management.
- Removed test_fixes_summary.md as it is no longer needed.
- Updated various documentation files for clarity and consistency.

These changes improve project organization and provide clearer development guidelines.

* oh snap

* resolve: fix merge conflict in CLAUDE.md by removing duplicate content

* yo

* chore: update .gitignore and enhance project configuration

- Modified .gitignore to include a trailing space for better consistency in ignored task files.
- Added pytest-xdist to development dependencies in pyproject.toml for parallel test execution.
- Updated VSCode settings to reflect new package paths for improved development experience.
- Refactored get_stream_writer calls in various files to handle RuntimeError gracefully during tests.
- Introduced a new legacy firecrawl module for backward compatibility with tests.
- Added RepomixClient class for interacting with the repomix repository analysis tool.

These changes improve project organization, enhance testing capabilities, and ensure backward compatibility.
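
A minimal sketch of tolerating a missing stream writer, assuming (as the message above implies) that obtaining the writer raises `RuntimeError` when a node runs outside a graph, for example when a test calls it directly. To stay runnable without LangGraph, the sketch takes the writer factory as an argument; `safe_stream_writer` is a hypothetical helper.

```python
from collections.abc import Callable
from typing import Any

StreamWriter = Callable[[Any], None]


def _drop_chunk(_chunk: Any) -> None:
    """No-op writer used when no stream is available."""
    return None


def safe_stream_writer(factory: Callable[[], StreamWriter]) -> StreamWriter:
    """Return the writer from factory, or a no-op writer if factory raises RuntimeError."""
    try:
        return factory()
    except RuntimeError:
        # No runnable context, so progress updates are dropped
        # instead of failing the node.
        return _drop_chunk
```

Inside a node this might read `writer = safe_stream_writer(get_stream_writer)`, with `get_stream_writer` imported from `langgraph.config`.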

* fix: enhance error handling and threading in error registry

- Introduced threading lock in ErrorRegistry to ensure thread-safe singleton instantiation.
- Updated handle_exception_group decorator to support both sync and async functions, improving flexibility in error handling.
- Refactored exception handling logic to provide clearer error messages for exception groups in both sync and async contexts.

These changes improve the robustness of the error handling framework and enhance the overall usability of the error management system.

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-07-14 21:23:14 -04:00


{
  "dependencies": {
    "task-master-ai": "^0.19.0"
  },
  "packageManager": "pnpm@10.13.1+sha512.37ebf1a5c7a30d5fabe0c5df44ee8da4c965ca0c5af3dbab28c3a1681b70a256218d05c81c9c0dcf767ef6b8551eb5b960042b9ed4300c59242336377e01cfad"
}