
Travis Vasceannie 18b93515cc Bb-core-restoration-backup (#54)

* fix: complete bb_tools migration with pre-commit compliance

- Migrate all bb_tools modules to src/biz_bud/tools structure
- Fix TypedDict definitions and type checking issues
- Create missing extraction modules (core/types.py, numeric/)
- Update pre-commit config with correct pyrefly paths
- Disable general pyrefly check (missing modules outside migration scope)
- Achieve pre-commit compliance for migration-specific modules

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Pre-config-migration-backup (#51)

* fix: resolve linting errors for ErrorDetails import, spacing, and unused variables

* fix: correct docstring imperative mood in conftest.py

- Change 'Factory for creating...' to 'Create...'
- Change 'Simple timer...' to 'Provide simple timer...'
- Ensure all docstrings use imperative mood as required by D401

* feat: add new configuration and migration tools

- Introduced new configuration files and scripts for dependency analysis and migration planning.
- Added new Python modules for dependency analysis and migration processes.
- Updated .gitignore to include task files.
- Enhanced existing examples and scripts to support new functionality.

These changes improve the overall configuration management and migration capabilities of the project.

* refactor: reorganize tools package and enhance LangGraph integration

- Moved tool factory and related components to a new core structure for better organization.
- Updated pre-commit configuration to enable pyrefly type checking.
- Introduced new scraping strategies and unified scraper implementations for improved functionality.
- Enhanced error handling and logging across various tools and services.
- Added new TypedDicts for state management and tool execution tracking.

These changes improve the overall architecture and maintainability of the tools package while ensuring compliance with LangGraph standards.

* refactor: apply final Sourcery improvements

- Use named expression for cleanup_tasks in container.py
- Fix whitespace issue in cleanup_registry.py

All Sourcery suggestions now implemented

* chore: update dependencies and improve error handling

- Bump version of @anthropic-ai/claude-code in package-lock.json to 1.0.64.
- Modify Dockerfile to allow 'npm' command in sudoers for the 'dev' user.
- Refactor buddy_execution.py and buddy_nodes_registry.py for improved readability.
- Enhance error handling in tool_exceptions.py with detailed docstrings.
- Update various decorators in langgraph to clarify their functionality in docstrings.
- Improve validation error handling in pydantic_models.py and security.py.
- Refactor catalog data loading to use asyncio for better performance.
- Enhance batch web search tool with a new result formatting function.

These changes enhance the overall functionality, maintainability, and clarity of the codebase.

* refactor: update .gitignore and improve configuration files

- Updated .gitignore to include task files with clearer formatting.
- Simplified the include paths in repomix.config.json for better clarity.
- Added a new documentation file for tool organization and refactoring plans.
- Enhanced docstrings across various files for improved clarity and consistency.

These changes enhance the organization and maintainability of the project while improving documentation clarity.

* refactor: streamline code with assignment expressions and improve readability

- Updated buddy_nodes_registry.py to simplify graph name assignment.
- Enhanced error handling in various files by using assignment expressions for clarity.
- Refactored multiple functions across the codebase to improve readability and maintainability.
- Adjusted return statements in validation and processing functions for better flow.

These changes enhance the overall clarity and efficiency of the codebase while maintaining functionality.

* refactor: enhance test structure and improve docstring clarity

- Added timeout decorator to improve async test handling in test_concurrency_races.py.
- Removed redundant imports and improved docstring clarity across multiple test files.
- Updated various test classes to ensure consistent and clear documentation.

These changes enhance the maintainability and readability of the test suite while ensuring proper async handling.

* refactor: enhance test documentation and structure

- Updated test fixture imports to include additional noqa codes for clarity.
- Added module docstrings for various test directories to improve documentation.
- Improved docstring formatting in test_embed_integration.py for consistency.

These changes enhance the clarity and maintainability of the test suite while ensuring proper documentation across test files.

* refactor: enhance test documentation and structure

- Added module docstrings to various test files for improved clarity.
- Improved individual test function docstrings to better describe their purpose.

These changes enhance the maintainability and readability of the test suite while ensuring proper documentation across test files.

* Refactoring of graphs nodes and tools (#52)

* Refactoring of graphs nodes and tools

* Update src/biz_bud/graphs/planner.py

Co-authored-by: qodo-merge-pro[bot] <151058649+qodo-merge-pro[bot]@users.noreply.github.com>

---------

Co-authored-by: qodo-merge-pro[bot] <151058649+qodo-merge-pro[bot]@users.noreply.github.com>

* Tool-streamlining (#53)

* feat: add new tools and capabilities for extraction, scraping, and search

- Introduced new modules for extraction, scraping, and search capabilities, enhancing the overall functionality of the tools package.
- Added unit tests for browser tools and capabilities, improving test coverage and reliability.
- Refactored existing code for better organization and maintainability, including the removal of obsolete directories and files.

These changes significantly enhance the toolset available for data extraction and processing, while ensuring robust testing and code quality.

* refactor: remove obsolete extraction, scraping, and search modules

- Deleted outdated modules related to extraction, scraping, and search functionalities to streamline the codebase.
- This cleanup enhances maintainability and reduces complexity by removing unused code.

* big

* refactor: enhance tool call validation and logging

- Improved validation for tool calls to handle both dictionary and ToolCall object formats.
- Added detailed logging for invalid tool call structures and missing required fields.
- Streamlined the process of filtering valid tool calls for better maintainability and clarity.

* refactor: enhance capability normalization and metadata structure in LLM client and tests

- Added normalization for capability names in LangchainLLMClient to prevent duplicates.
- Updated test_memory_exhaustion.py to include detailed metadata structure for documents.
- Improved test_state_corruption.py to use a more descriptive data structure for large data entries.
- Enhanced test visualization state with additional fields for better context and configuration.

* refactor: update .gitignore and remove obsolete files

- Updated .gitignore to include task files and ensure proper tracking.
- Deleted analyze_test_violations.py, comprehensive_violations_baseline.txt, domain-nodes-migration-summary.md, domain-specific-nodes-migration-plan.md, EXTRACTION_REORGANIZATION.md, graph-specific-nodes-migration-plan.md, legacy-nodes-cleanup-analysis.md, MIGRATION_COMPLETE_SUMMARY.md, MIGRATION_COMPLETE.md, node-migration-final-analysis.md, nodes-migration-analysis.md, phase1-import-migration-status.md, REDUNDANT_FILE_CLEANUP.md, REGISTRY_REMOVAL_SUMMARY.md, shared-types-migration-summary.md, and various test violation reports to streamline the codebase and remove unused files.

* refactor: update .gitignore and enhance message handling in LLM call

- Added environment files to .gitignore for better configuration management.
- Refactored agent imports in __init__.py to reflect changes in architecture.
- Improved message handling in call_model_node to ensure valid message lists and provide clearer error responses.
- Updated unit tests to reflect changes in error messages and ensure consistency in validation checks.

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: qodo-merge-pro[bot] <151058649+qodo-merge-pro[bot]@users.noreply.github.com>

2025-08-01 21:18:22 -04:00

Name                                    Last updated                Last commit
.claude                                 2025-08-01 21:18:22 -04:00  Bb-core-restoration-backup (#54)
.cursor/rules                           2025-06-06 00:46:23 -04:00  Semantic (#15)
.devcontainer                           2025-08-01 21:18:22 -04:00  Bb-core-restoration-backup (#54)
.github/workflows                       2025-07-07 20:04:25 -04:00  Menu_time (#29)
.idx                                    2025-05-12 23:06:28 +00:00  fbase
.roo                                    2025-07-12 23:06:26 -04:00  Repopatch (#31)
.vscode                                 2025-07-16 16:31:44 -04:00  feat: add Paperless NGX agent with robust error handling (#42)
.windsurf/rules                         2025-07-14 21:23:14 -04:00  Vasceannie/issue32 (#41)
docker                                  2025-08-01 21:18:22 -04:00  Bb-core-restoration-backup (#54)
docs                                    2025-08-01 21:18:22 -04:00  Bb-core-restoration-backup (#54)
examples                                2025-08-01 21:18:22 -04:00  Bb-core-restoration-backup (#54)
scripts                                 2025-08-01 21:18:22 -04:00  Bb-core-restoration-backup (#54)
src                                     2025-08-01 21:18:22 -04:00  Bb-core-restoration-backup (#54)
static                                  2025-07-12 23:06:26 -04:00  Repopatch (#31)
tests                                   2025-08-01 21:18:22 -04:00  Bb-core-restoration-backup (#54)
.codespellignore                        2025-07-12 23:06:26 -04:00  Repopatch (#31)
.dockerignore                           2025-07-17 21:22:04 -04:00  refac
.env.example                            2025-07-17 18:32:58 -04:00  route-n-plan (#44)
.env.production                         2025-07-17 18:32:58 -04:00  route-n-plan (#44)
.flake8                                 2025-08-01 21:18:22 -04:00  Bb-core-restoration-backup (#54)
.gitignore                              2025-08-01 21:18:22 -04:00  Bb-core-restoration-backup (#54)
.mcp.json                               2025-08-01 21:18:22 -04:00  Bb-core-restoration-backup (#54)
.pre-commit-config.yaml                 2025-08-01 21:18:22 -04:00  Bb-core-restoration-backup (#54)
.pylintrc                               2025-08-01 21:18:22 -04:00  Bb-core-restoration-backup (#54)
.pyreflyignore                          2025-08-01 21:18:22 -04:00  Bb-core-restoration-backup (#54)
.repomixignore                          2025-06-06 00:46:23 -04:00  Semantic (#15)
.roomodes                               2025-07-12 23:06:26 -04:00  Repopatch (#31)
.sourcery.yaml                          2025-08-01 21:18:22 -04:00  Bb-core-restoration-backup (#54)
CLAUDE.local.md                         2025-07-20 13:21:05 -04:00  Cleanup (#45)
CLAUDE.md                               2025-08-01 21:18:22 -04:00  Bb-core-restoration-backup (#54)
commit.gpgsign=false,tag.gpgsign=false  2025-08-01 21:18:22 -04:00  Bb-core-restoration-backup (#54)
config.yaml                             2025-07-20 13:21:05 -04:00  Cleanup (#45)
deploy.sh                               2025-07-17 18:32:58 -04:00  route-n-plan (#44)
dev.sh                                  2025-07-14 21:23:14 -04:00  Vasceannie/issue32 (#41)
docker-compose.production.yml           2025-07-17 18:32:58 -04:00  route-n-plan (#44)
Dockerfile.production                   2025-07-17 21:59:28 -04:00  fix: streamline Docker build with optimized package installation and PYTHONPATH config
langgraph.json                          2025-07-20 13:21:05 -04:00  Cleanup (#45)
LICENSE                                 2025-05-07 14:25:50 -04:00  feat: add core embeddings functionality with multi-provider support and Jina integration
Makefile                                2025-08-01 21:18:22 -04:00  Bb-core-restoration-backup (#54)
mypy.ini                                2025-07-12 23:06:26 -04:00  Repopatch (#31)
nginx.conf                              2025-07-17 18:32:58 -04:00  route-n-plan (#44)
package-lock.json                       2025-08-01 21:18:22 -04:00  Bb-core-restoration-backup (#54)
package.json                            2025-07-14 21:23:14 -04:00  Vasceannie/issue32 (#41)
pyproject.toml                          2025-08-01 21:18:22 -04:00  Bb-core-restoration-backup (#54)
pyrefly.toml                            2025-08-01 21:18:22 -04:00  Bb-core-restoration-backup (#54)
pyrightconfig.json                      2025-07-16 16:31:44 -04:00  feat: add Paperless NGX agent with robust error handling (#42)
README.md                               2025-07-20 13:21:05 -04:00  Cleanup (#45)
repomix.config.json                     2025-08-01 21:18:22 -04:00  Bb-core-restoration-backup (#54)
requirements.txt                        2025-06-27 00:26:39 -04:00  Cleanup (#27)
sourcery copy.txt                       2025-08-01 21:18:22 -04:00  Bb-core-restoration-backup (#54)
sourcery.txt                            2025-08-01 21:18:22 -04:00  Bb-core-restoration-backup (#54)
uv.lock                                 2025-08-01 21:18:22 -04:00  Bb-core-restoration-backup (#54)

README.md

Business Buddy (Biz Budz)

Business Buddy is a sophisticated AI agent framework built on LangGraph, designed for business research, analysis, and document processing workflows. It provides a modular architecture for creating, managing, and executing AI-powered tasks with built-in support for various LLM providers, advanced RAG capabilities, and comprehensive data processing tools.

🚀 Features

Core Capabilities

Advanced RAG Integration: Full R2R (RAG to Riches) support with document deduplication, batch processing, and intelligent collection management
Multi-LLM Support: Compatible with OpenAI, Anthropic, Google VertexAI, Cohere, and more
Modular Architecture: Organized into reusable nodes, graphs, and services for easy extension
Type Safety: Comprehensive type hints with Pydantic models throughout
Asynchronous by Design: Built for high-performance concurrent operations

Specialized Workflows

Market Research: Automated business and market analysis workflows
Menu Intelligence: Restaurant menu analysis and extraction
Document Processing: URL-to-R2R pipeline with intelligent content analysis
Web Scraping: Multiple strategies including Firecrawl, BeautifulSoup, and browser automation
Search Orchestration: Multi-provider search with caching and result ranking

📁 Project Structure

biz-budz/
├── src/biz_bud/          # Main application code
│   ├── graphs/           # LangGraph workflow definitions
│   ├── nodes/            # Modular processing nodes
│   │   ├── analysis/     # Data analysis and visualization
│   │   ├── core/         # Core functionality
│   │   ├── llm/          # LLM interactions
│   │   ├── rag/          # RAG and R2R integration
│   │   ├── research/     # Research workflows
│   │   ├── scraping/     # Web scraping strategies
│   │   ├── search/       # Search orchestration
│   │   └── validation/   # Content validation
│   ├── services/         # External service integrations
│   └── states/           # TypedDict state definitions
├── packages/             # Modular utility packages
│   ├── business-buddy-core/      # Core utilities
│   ├── business-buddy-extraction/# Entity extraction
│   ├── business-buddy-tools/     # Web tools & scrapers
│   └── business-buddy-utils/     # General utilities
└── examples/             # Usage examples and demos

🛠️ Installation

Prerequisites

Python 3.12+
UV package manager
Docker (for development services)

Quick Setup

Clone and set up:

git clone https://github.com/vasceannie/competitor-costing.git biz-budz
cd biz-budz
./scripts/setup-dev.sh

Configure environment:

cp .env.example .env
# Edit .env with your API keys:
# - OPENAI_API_KEY
# - ANTHROPIC_API_KEY (optional)
# - TAVILY_API_KEY (for web search)
# - FIRECRAWL_API_KEY (for advanced scraping)
# - R2R_BASE_URL (if using R2R)

Start development services:

make start  # Starts PostgreSQL, Redis, Qdrant

💻 Development

Commands

# Environment activation (always use this)
source .venv/bin/activate

# Run all code quality checks
make lint-all

# Run tests with coverage
make test

# Run tests in watch mode
make test_watch

# Format code
make format

# Run pre-commit hooks
make pre-commit

# Start/stop Docker services
make start
make stop

Code Quality Standards

This project enforces strict code quality:

Type Safety: No Any types or # type: ignore allowed
Linting: Ruff for style, Pyrefly for advanced type checking
Testing: Minimum 70% coverage requirement
Documentation: Imperative docstrings with punctuation
Pre-commit: Automatic hooks for all quality checks
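
As an illustration of the standards above, here is a minimal sketch of a function written in that style: fully annotated, with no `Any` or `# type: ignore`, and an imperative docstring that ends with punctuation. The names `TimerResult` and `summarize_timings` are hypothetical and do not come from the codebase.

```python
from typing import TypedDict


class TimerResult(TypedDict):
    """Describe one timing measurement (hypothetical example type)."""

    label: str
    seconds: float


def summarize_timings(results: list[TimerResult]) -> dict[str, float]:
    """Return the total elapsed seconds for each label."""
    totals: dict[str, float] = {}
    for item in results:
        # Accumulate per label; every value is a concrete float, never Any.
        totals[item["label"]] = totals.get(item["label"], 0.0) + item["seconds"]
    return totals
```

A docstring such as "Factory for creating timers" would fail the imperative-mood check (D401), whereas "Create a timer." would pass.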

Testing

# Run all tests
pytest

# Run specific test file
pytest tests/unit_tests/nodes/rag/test_analyzer.py

# Run with coverage report
pytest --cov=biz_bud --cov-report=html

# Run integration tests only
pytest tests/integration_tests/

🔧 Configuration

Business Buddy uses a hierarchical configuration system:

Environment Variables (highest priority)
YAML Configuration (config.yaml)
Default Values (lowest priority)

Example configuration usage:

from biz_bud.config.loader import load_config

config = load_config()
# Access nested configuration
llm_config = config.llm_config
api_keys = config.api_config

📚 Usage Examples

Running a Research Workflow

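from langchain_core.messages import HumanMessage

from biz_bud.graphs.research import research_graph

# Execute research workflow (await must run inside an async function;
# config is the object returned by load_config() above)
result = await research_graph.ainvoke({
    "messages": [HumanMessage(content="Research the coffee shop market in Seattle")],
    "config": config
})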

URL to R2R Document Processing

from biz_bud.graphs.url_to_r2r import url_to_r2r_graph

# Process URLs and upload to R2R
result = await url_to_r2r_graph.ainvoke({
    "url": "https://docs.example.com",
    "config": config
})

Using the RAG Agent

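# NOTE: biz_bud.agents.rag_agent has been deleted, so this example no longer runs.
# It is kept commented out for reference only.
# from biz_bud.agents.rag_agent import create_rag_agent_executor
#
# agent = create_rag_agent_executor(config)
# result = await agent.ainvoke({
#     "messages": [HumanMessage(content="What are the key features of R2R?")]
# })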

🏗️ Architecture Highlights

Key Design Patterns

State-Driven Workflows: TypedDict states ensure type safety across graph executions
Service Abstraction: Clean interfaces for all external dependencies
Decorator Pattern: Centralized error handling and logging via @log_config and @error_handling
Modular Nodes: Each node has a single responsibility and is independently testable
Parallel Processing: Extensive use of asyncio for concurrent operations
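
Several of these patterns can be shown together in a small, self-contained sketch. It uses only the standard library. The `ResearchState` shape, the `handle_errors` decorator, and both nodes are hypothetical stand-ins: the project's own decorators are `@log_config` and `@error_handling`, whose signatures are not shown in this README.

```python
import asyncio
import functools
import logging
from collections.abc import Awaitable, Callable
from typing import TypedDict

logger = logging.getLogger(__name__)


class ResearchState(TypedDict, total=False):
    """Hypothetical state shape used only for this sketch."""

    query: str
    results: list[str]
    errors: list[str]


NodeFn = Callable[[ResearchState], Awaitable[ResearchState]]


def handle_errors(node: NodeFn) -> NodeFn:
    """Wrap an async node so failures are logged and returned as state updates."""

    @functools.wraps(node)
    async def wrapper(state: ResearchState) -> ResearchState:
        try:
            return await node(state)
        except Exception as exc:  # illustrative catch-all, narrow it in real code
            logger.warning("node %s failed: %s", node.__name__, exc)
            return {"errors": [f"{node.__name__}: {exc}"]}

    return wrapper


@handle_errors
async def search_node(state: ResearchState) -> ResearchState:
    """Return a fake search result for the query."""
    return {"results": [f"result for {state['query']}"]}


@handle_errors
async def failing_node(state: ResearchState) -> ResearchState:
    """Simulate a provider outage."""
    raise RuntimeError("provider unavailable")


async def run_nodes(state: ResearchState) -> ResearchState:
    """Run both nodes concurrently and merge their partial updates."""
    updates = await asyncio.gather(search_node(state), failing_node(state))
    merged: ResearchState = {"query": state["query"], "results": [], "errors": []}
    for update in updates:
        merged["results"] += update.get("results", [])
        merged["errors"] += update.get("errors", [])
    return merged
```

Because each node takes a state and returns a partial update, it can be tested on its own with a plain dictionary, and the decorator keeps error handling out of the node bodies.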

RAG Integration

R2R Support: Full integration with R2R for document storage and retrieval
Intelligent Deduplication: Content-based and URL-based duplicate detection
Batch Processing: Efficient handling of large document sets
Collection Management: Automatic collection assignment based on source domains
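
One way the deduplication and collection-assignment ideas could work is sketched below. This is a hedged illustration, not the project's implementation: the URL normalization rules, the SHA-256 content fingerprint, and the domain-to-collection naming scheme are all assumptions.

```python
import hashlib
from urllib.parse import urlsplit


def normalize_url(url: str) -> str:
    """Lowercase the scheme and host and drop the fragment and trailing slash."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/")
    query = f"?{parts.query}" if parts.query else ""
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}{path}{query}"


def content_fingerprint(text: str) -> str:
    """Hash whitespace-normalized, lowercased text so reformatted copies match."""
    normalized = " ".join(text.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def collection_for(url: str) -> str:
    """Derive a collection name from the source domain (hypothetical rule)."""
    host = urlsplit(url).hostname or "unknown"
    return host.removeprefix("www.").replace(".", "_")


def deduplicate(docs: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Keep the first (url, text) pair per normalized URL and per fingerprint."""
    seen_urls: set[str] = set()
    seen_hashes: set[str] = set()
    unique: list[tuple[str, str]] = []
    for url, text in docs:
        key, digest = normalize_url(url), content_fingerprint(text)
        if key in seen_urls or digest in seen_hashes:
            continue  # duplicate by URL or by content
        seen_urls.add(key)
        seen_hashes.add(digest)
        unique.append((url, text))
    return unique
```

Checking both the URL and the content catches two different cases: the same page fetched under slightly different URLs, and identical content mirrored on different hosts.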

🤝 Contributing

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Make your changes with tests
Ensure all checks pass (make lint-all && make test)
Commit with a descriptive message
Push and create a Pull Request

Development Principles

Always use UV for package management
Ensure all code is strongly typed
Write tests for new functionality
Follow existing code patterns and conventions
Never use --no-verify flag for commits

🚀 CI/CD

GitHub Actions workflows ensure code quality:

Code Quality (lint.yml): Runs all linters and type checkers
Unit Tests (unit-tests.yml): Executes test suite with coverage
Integration Tests (integration-tests.yml): Validates full workflows

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgements

LangChain - Foundation for agent development
LangGraph - Graph-based workflow orchestration
R2R - RAG system integration
Firecrawl - Advanced web scraping
UV - Fast Python package management

Note: Always activate the virtual environment with source .venv/bin/activate and use UV for all package management operations.