* feat: implement catalog intelligence features and update configurations

- Added new catalog intelligence tools and workflows, including ingredient extraction and research nodes.
- Updated config.yaml to include new catalog items and categories.
- Introduced comprehensive documentation for catalog data source configuration and caching.
- Refactored existing code to replace menu-related references with catalog-specific terminology.
- Enhanced integration tests for catalog intelligence workflows and data sources.

This commit enhances the catalog management capabilities of the application, providing better support for Caribbean food items and related data processing.

* docs: add comprehensive documentation for docstring enrichment progress and session summaries

- Introduced new documentation files to track the progress of enriching docstrings across the Business Buddy framework.
- Added detailed summaries for each session, outlining accomplishments, statistics, and strategic value.
- Enhanced overall documentation quality to ensure clarity and accessibility for developers and users.

This commit significantly improves the documentation framework, providing a structured overview of the ongoing efforts to enhance code clarity and maintainability.
* Fix extraction package imports and update documentation

- Fixed SIM102 linting error by combining nested if statements
- Fixed JSON syntax in .vscode/settings.json by removing comments
- Fixed type error in ingredient_extractor.py by adding type checking
- Added 'nd' to codespell ignore list for ordinal numbers
- Updated DEVELOPMENT.md with comprehensive bb_extraction documentation

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Fix catalog intelligence and data loading issues

- Fixed catalog intelligence graph routing to always generate optimization reports
- Added basic catalog optimization suggestions when no ingredient focus is found
- Implemented database catalog loading and data source tracking
- Enhanced ingredient detection with context pattern matching
- Fixed fallback behavior to return proper data source metadata
- Fixed type checking issues with dynamic db_service method calls

Major fixes:
- catalog_intel.py: Route to generate_report even without ingredient focus
- c_intel.py: Generate basic suggestions and add context-aware ingredient detection
- load_catalog_data.py: Implement database loading and track data_source_used

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* feat: enhance business-buddy-extraction package with new features and configurations

- Updated .vscode settings for improved Python terminal integration and testing coverage.
- Expanded documentation in DEVELOPMENT.md to clarify content extraction utilities.
- Introduced new configuration files for Pyright and pytest to streamline development and testing processes.
- Added new core extraction utilities and refactored existing modules for better organization and maintainability.
- Implemented comprehensive test suite enhancements, including new fixtures and assertions for improved test reliability.

This commit significantly improves the functionality and usability of the business-buddy-extraction package, facilitating better development practices and testing capabilities.

* fix: resolve blocking I/O issue in browser scraping code

- Move get_logger import to module level to avoid blocking imports in async context
- Replace synchronous file operations with async-safe alternatives in cookie handling
- Use asyncio.run_in_executor for file I/O operations when in async context
- Maintain backward compatibility by falling back to sync operations when not in async context

This fixes the "Blocking call to io.TextIOWrapper.read" error that was occurring during browser scraping operations in async workflows.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* feat: introduce technology component research and extraction capabilities

- Added a new example for catalog component research specific to technology products.
- Implemented component extraction and research nodes to handle various industries, including technology.
- Refactored existing ingredient-related nodes and types to support component-focused workflows.
- Updated state management to track component research results and analytics.
- Enhanced testing coverage for the new component research functionalities.

This commit significantly expands the capabilities of the catalog research framework, allowing for comprehensive analysis and extraction of components across different industries.

* Fix item name lowercasing by adding assignment (#30)

Co-authored-by: Cursor Agent <cursoragent@cursor.com>

* refactor: enhance FirecrawlOptions with default formats and improve type checks

- Added field_validator to set default formats for FirecrawlOptions if none are provided.
- Updated type checks in WebBaseLoaderScraper to use isinstance for element validation.
- Improved error logging in catalog_component_extraction to include exception details.

These changes improve the robustness and clarity of the code, ensuring better handling of default values and error reporting.

* refactor: update ComponentExtractor to include quantity and unit fields

- Enhanced the ComponentExtractor class to add 'quantity' and 'unit' fields for better component data representation.
- Introduced 'raw_data' to encapsulate the original text for improved data handling.

These changes improve the clarity and usability of the component extraction process, allowing for more detailed component information to be captured.

* feat: add component matching function to improve ingredient detection

- Introduced _is_component_match function to enhance matching logic for components, preventing false positives using word boundaries.
- Updated identify_component_focus_node and find_affected_catalog_items_node to utilize the new matching function for more accurate component identification.

These changes improve the accuracy of ingredient detection in catalog intelligence analysis.

* refactor: improve type assertions in monitoring and query optimization

- Added type assertions for async and sync functions in HealthMonitor to enhance type safety.
- Specified type for the domain variable in extract_collection_name function to improve clarity.
- Clarified type for query_lower variable in QueryOptimizer to ensure consistent type usage.

These changes enhance code readability and maintainability by ensuring proper type handling across the affected modules.

* refactor: enhance type handling in URL extraction and query optimization

- Updated the domain variable in extract_collection_name to ensure it is always a string, improving robustness.
- Modified query_lower in QueryOptimizer to explicitly convert the query to a string before lowering, enhancing type safety.

These changes improve code clarity and maintainability by ensuring consistent type handling in the affected functions.
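The word-boundary matching that the `_is_component_match` commit describes can be illustrated with a short sketch. The commit message does not show the actual implementation, so the function name `component_matches` and the choice of lookaround assertions are assumptions made here for illustration only.

```python
import re


def component_matches(component: str, text: str) -> bool:
    """Hypothetical stand-in for _is_component_match (real code not shown).

    Returns True only when `component` appears in `text` as a whole word
    or phrase, so "rice" does not match inside "licorice".
    """
    needle = component.strip()
    if not needle:
        return False
    # (?<!\w) and (?!\w) act as word boundaries that also behave sensibly
    # when the component itself starts or ends with a non-word character.
    pattern = r"(?<!\w)" + re.escape(needle) + r"(?!\w)"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None
```

A plain substring test (`"rice" in "licorice candy"`) would report a match, which is the kind of false positive the commit says it prevents; the lookarounds reject it while still accepting multi-word components such as "jerk chicken".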
* refactor: improve type handling in URL extraction and query optimization

- Updated the domain variable in extract_collection_name to use cast for type safety.
- Modified query_lower in QueryOptimizer to utilize cast for consistent type handling.

These changes enhance code clarity and maintainability by ensuring robust type assertions in the affected functions.

* refactor: update lint workflow to use pyrefly for type checking

- Replaced mypy with pyrefly for type checking in the GitHub Actions workflow.
- Updated the corresponding echo message to reflect the change in type checking tool.

These changes enhance the linting process by integrating pyrefly for improved type checking capabilities.

* refactor: update research agent initialization to handle API key errors

- Removed module-level initialization of the research agent to prevent API key errors during import.
- Added error handling in the __getattr__ method to return a placeholder if the research agent cannot be created, ensuring graceful failure in tests.

These changes improve the robustness of the research agent handling, particularly in test environments.

* refactor: simplify cookie handling in Browser class

- Removed asynchronous handling for saving and loading cookies in the Browser class.
- Updated save_cookies and load_cookies methods to always operate synchronously, improving code clarity and maintainability.

These changes streamline cookie management by ensuring consistent synchronous operations.

* refactor: streamline virtual environment setup in CI workflow

- Removed caching of the virtual environment in the GitHub Actions workflow.
- Simplified the installation process by always creating a new virtual environment and installing dependencies directly.
- Updated cache status messages to reflect changes in caching strategy.

These changes enhance the clarity and efficiency of the CI workflow by simplifying the virtual environment management.
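The research agent commit above replaces import-time construction with a `__getattr__` hook that falls back to a placeholder. A minimal sketch of that pattern follows, assuming a module-level `__getattr__` (PEP 562). The names `PlaceholderAgent`, `_create_research_agent`, and `EXAMPLE_API_KEY` are hypothetical, and the module is built with `types.ModuleType` only so the example runs as a single file; in a real module the hook would simply be a top-level `def __getattr__(name)`.

```python
import os
import types


class PlaceholderAgent:
    """Hypothetical stand-in returned when the real agent cannot be built."""

    def __init__(self, reason: str) -> None:
        self.reason = reason

    def invoke(self, *args, **kwargs):
        raise RuntimeError(f"research agent unavailable: {self.reason}")


def _create_research_agent() -> dict:
    # Assumption: agent construction fails when an API key is missing.
    if not os.environ.get("EXAMPLE_API_KEY"):
        raise ValueError("EXAMPLE_API_KEY is not set")
    return {"name": "research_agent"}


def _module_getattr(name: str):
    # Called only for attributes the module does not define, so nothing
    # is constructed at import time.
    if name == "research_agent":
        try:
            return _create_research_agent()
        except ValueError as exc:
            return PlaceholderAgent(str(exc))
    raise AttributeError(f"module has no attribute {name!r}")


# Simulate a module; storing the hook in the module namespace enables it.
agents = types.ModuleType("agents_sketch")
agents.__getattr__ = _module_getattr
```

With this shape, `import` never touches the API key; only the first access to `agents.research_agent` attempts construction, and a test environment without credentials receives a placeholder instead of an import error.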
* chore: update CI workflow and enhance Firecrawl configuration handling

- Added step to download NLTK data in the CI workflow for improved testing capabilities.
- Modified Firecrawl configuration extraction to support both "api_config" and "api" keys for better compatibility with different config formats.
- Enhanced unit tests for Repomix to mock binary existence, ensuring more reliable test execution.

These changes improve the CI setup and enhance the flexibility of configuration handling in the Firecrawl integration.

* test: enhance Repomix tests by mocking binary executability

- Added mock for `os.access` to ensure the executability of the Repomix binary in unit tests.
- Updated comments to clarify the purpose of mocking binary existence and executability.

These changes improve the reliability of Repomix tests by accurately simulating the environment in which the binary operates.

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
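The Firecrawl change above accepts the API settings under either an "api_config" or an "api" key. One way such a lookup could work is sketched below. The function name `extract_api_section`, the precedence of "api_config" over "api", and the sample `firecrawl_api_key` field are all assumptions, since the commit message does not show the real code.

```python
from collections.abc import Mapping
from typing import Any


def extract_api_section(config: Mapping[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: return the API settings from either key.

    Assumption: "api_config" takes precedence when both keys are present.
    """
    for key in ("api_config", "api"):
        section = config.get(key)
        # Skip missing keys and values that are not mappings.
        if isinstance(section, Mapping):
            return dict(section)
    return {}


# Both config layouts resolve to the same settings.
new_style = {"api_config": {"firecrawl_api_key": "test-key"}}
old_style = {"api": {"firecrawl_api_key": "test-key"}}
same = extract_api_section(new_style) == extract_api_section(old_style)
```

Returning an empty dict when neither key is usable keeps callers free of `None` checks, though a real implementation might prefer to raise on a malformed config.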
105 lines
3.6 KiB
YAML
# This workflow will run unit tests for the current project

name: CI

on:
  push:
    branches: ["main", "develop", "semantic"]
  pull_request:
  workflow_dispatch: # Allows triggering the workflow manually in GitHub UI

# If another push to the same PR or branch happens while this workflow is still running,
# cancel the earlier run in favor of the next run.
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  unit-tests:
    name: Unit Tests
    strategy:
      matrix:
        os: [ubuntu-latest]
        python-version: ["3.12"]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}

      - name: Cache uv dependencies
        uses: actions/cache@v3
        with:
          path: ~/.cache/uv
          key: ${{ runner.os }}-uv-${{ hashFiles('**/pyproject.toml') }}
          restore-keys: |
            ${{ runner.os }}-uv-

      - name: Cache Ruff
        uses: actions/cache@v3
        with:
          path: ~/.cache/ruff
          key: ${{ runner.os }}-ruff-${{ hashFiles('src/**/*.py') }}
          restore-keys: |
            ${{ runner.os }}-ruff-

      - name: Cache pytest
        uses: actions/cache@v3
        with:
          path: .pytest_cache
          key: ${{ runner.os }}-pytest-${{ hashFiles('tests/**/*.py', 'src/**/*.py') }}
          restore-keys: |
            ${{ runner.os }}-pytest-

      - name: Cache Python bytecode
        uses: actions/cache@v3
        with:
          path: |
            src/**/__pycache__
            tests/**/__pycache__
            packages/**/__pycache__
          key: ${{ runner.os }}-pycache-${{ matrix.python-version }}-${{ hashFiles('src/**/*.py', 'tests/**/*.py', 'packages/**/*.py') }}
          restore-keys: |
            ${{ runner.os }}-pycache-${{ matrix.python-version }}-

      - name: Install uv
        run: curl -LsSf https://astral.sh/uv/install.sh | sh

      - name: Install dependencies
        run: |
          echo "📦 Installing dependencies..."
          echo "🆕 Creating virtual environment..."
          uv venv
          echo "📦 Installing project dependencies..."
          uv pip install -e ".[dev]"
          # Ensure pytest-cov is installed
          uv pip install pytest-cov
          # Download NLTK data
          echo "📥 Downloading NLTK data..."
          uv run python -c "import nltk; nltk.download('punkt_tab')"

      - name: Show cache status
        run: |
          echo "📊 Cache Status:"
          echo "UV cache: $([ -d "$HOME/.cache/uv" ] && echo '✅ Cached' || echo '❌ Not cached')"
          echo "Ruff cache: $([ -d "$HOME/.cache/ruff" ] && echo '✅ Cached' || echo '❌ Not cached')"
          echo "Pytest cache: $([ -d '.pytest_cache' ] && echo '✅ Cached' || echo '❌ Not cached')"

      # Ruff linting and formatting are handled by pre-commit hooks in lint.yml workflow
      # Spell checking is handled by pre-commit hooks in lint.yml workflow
      - name: Run tests with pytest
        env:
          # Set mock API keys for tests to prevent import errors
          OPENAI_API_KEY: "test-key"
          ANTHROPIC_API_KEY: "test-key"
          TAVILY_API_KEY: "test-key"
        run: |
          echo "🧪 Running tests with pytest (with cache)..."
          uv run pytest tests/unit_tests --cov=biz_bud --cov-report=term-missing --cov-report=html --cov-fail-under=0 || echo "Some tests failed - see logs above"
        continue-on-error: true