Business Buddy (Biz Budz)
Business Buddy is a sophisticated AI agent framework built on LangGraph, designed for business research, analysis, and document processing workflows. It provides a modular architecture for creating, managing, and executing AI-powered tasks with built-in support for various LLM providers, advanced RAG capabilities, and comprehensive data processing tools.
🚀 Features
Core Capabilities
- Advanced RAG Integration: Full R2R support with document deduplication, batch processing, and intelligent collection management
- Multi-LLM Support: Compatible with OpenAI, Anthropic, Google VertexAI, Cohere, and more
- Modular Architecture: Organized into reusable nodes, graphs, and services for easy extension
- Type Safety: Comprehensive type hints with Pydantic models throughout
- Asynchronous by Design: Built for high-performance concurrent operations
Specialized Workflows
- Market Research: Automated business and market analysis workflows
- Menu Intelligence: Restaurant menu analysis and extraction
- Document Processing: URL-to-R2R pipeline with intelligent content analysis
- Web Scraping: Multiple strategies including Firecrawl, BeautifulSoup, and browser automation
- Search Orchestration: Multi-provider search with caching and result ranking
📁 Project Structure
biz-budz/
├── src/biz_bud/ # Main application code
│ ├── graphs/ # LangGraph workflow definitions
│ ├── nodes/ # Modular processing nodes
│ │ ├── analysis/ # Data analysis and visualization
│ │ ├── core/ # Core functionality
│ │ ├── llm/ # LLM interactions
│ │ ├── rag/ # RAG and R2R integration
│ │ ├── research/ # Research workflows
│ │ ├── scraping/ # Web scraping strategies
│ │ ├── search/ # Search orchestration
│ │ └── validation/ # Content validation
│ ├── services/ # External service integrations
│ └── states/ # TypedDict state definitions
├── packages/ # Modular utility packages
│ ├── business-buddy-core/ # Core utilities
│ ├── business-buddy-extraction/# Entity extraction
│ ├── business-buddy-tools/ # Web tools & scrapers
│ └── business-buddy-utils/ # General utilities
└── examples/ # Usage examples and demos
🛠️ Installation
Prerequisites
- Python 3.12+
- UV package manager
- Docker (for development services)
Quick Setup
- Clone and setup:
  git clone https://github.com/vasceannie/competitor-costing.git biz-budz
  cd biz-budz
  ./scripts/setup-dev.sh
- Configure environment:
  cp .env.example .env
  # Edit .env with your API keys:
  # - OPENAI_API_KEY
  # - ANTHROPIC_API_KEY (optional)
  # - TAVILY_API_KEY (for web search)
  # - FIRECRAWL_API_KEY (for advanced scraping)
  # - R2R_BASE_URL (if using R2R)
- Start development services:
  make start # Starts PostgreSQL, Redis, Qdrant
💻 Development
Commands
# Environment activation (always use this)
source .venv/bin/activate
# Run all code quality checks
make lint-all
# Run tests with coverage
make test
# Run tests in watch mode
make test_watch
# Format code
make format
# Run pre-commit hooks
make pre-commit
# Start/stop Docker services
make start
make stop
Code Quality Standards
This project enforces strict code quality:
- Type Safety: No Any types or # type: ignore comments allowed
- Linting: Ruff for style, Pyrefly for advanced type checking
- Testing: Minimum 70% coverage requirement
- Documentation: Imperative docstrings with punctuation
- Pre-commit: Automatic hooks for all quality checks
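As an illustration of these standards, the sketch below shows a fully typed function with an imperative, punctuated docstring and no Any annotations. The function name and its behaviour are hypothetical and are not taken from the Business Buddy codebase.

```python
from collections.abc import Sequence


def summarize_lengths(documents: Sequence[str]) -> dict[str, float]:
    """Return the count and mean character length of the given documents."""
    count = len(documents)
    # Guard against division by zero when no documents are supplied.
    mean_length = sum(len(doc) for doc in documents) / count if count else 0.0
    return {"count": float(count), "mean_length": mean_length}
```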
Testing
# Run all tests
pytest
# Run specific test file
pytest tests/unit_tests/nodes/rag/test_analyzer.py
# Run with coverage report
pytest --cov=biz_bud --cov-report=html
# Run integration tests only
pytest tests/integration_tests/
🔧 Configuration
Business Buddy uses a hierarchical configuration system:
- Environment Variables (highest priority)
- YAML Configuration (config.yaml)
- Default Values (lowest priority)
Example configuration usage:
from biz_bud.config.loader import load_config
config = load_config()
# Access nested configuration
llm_config = config.llm_config
api_keys = config.api_config
📚 Usage Examples
Running a Research Workflow
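from langchain_core.messages import HumanMessage
from biz_bud.graphs.research import research_graph
# Execute research workflow (call from inside an async function; config comes from load_config())
result = await research_graph.ainvoke({
    "messages": [HumanMessage(content="Research the coffee shop market in Seattle")],
    "config": config
})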
URL to R2R Document Processing
from biz_bud.graphs.url_to_r2r import url_to_r2r_graph
# Process URLs and upload to R2R
result = await url_to_r2r_graph.ainvoke({
    "url": "https://docs.example.com",
    "config": config
})
Using the RAG Agent
# The biz_bud.agents.rag_agent module has been deleted; this snippet is kept for reference only.
# from biz_bud.agents.rag_agent import create_rag_agent_executor
# agent = create_rag_agent_executor(config)
# result = await agent.ainvoke({
#     "messages": [HumanMessage(content="What are the key features of R2R?")]
# })
🏗️ Architecture Highlights
Key Design Patterns
- State-Driven Workflows: TypedDict states ensure type safety across graph executions
- Service Abstraction: Clean interfaces for all external dependencies
- Decorator Pattern: Centralized error handling and logging via @log_config and @error_handling
- Modular Nodes: Each node has a single responsibility and is independently testable
- Parallel Processing: Extensive use of asyncio for concurrent operations
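A minimal sketch of how these patterns can fit together: a TypedDict state, a decorator that centralizes error handling, and a node that fans out work with asyncio. All names here (ResearchState, with_error_handling, fetch_source, search_node, failing_node) are hypothetical stand-ins. The project's own @log_config and @error_handling decorators are not shown in this README and may behave differently.

```python
import asyncio
import functools
import logging
from collections.abc import Awaitable, Callable
from typing import TypedDict

logger = logging.getLogger(__name__)


class ResearchState(TypedDict):
    """Hypothetical state passed between nodes."""

    query: str
    results: list[str]
    errors: list[str]


NodeFn = Callable[[ResearchState], Awaitable[ResearchState]]


def with_error_handling(node: NodeFn) -> NodeFn:
    """Wrap a node so failures are logged and recorded in the state."""

    @functools.wraps(node)
    async def wrapper(state: ResearchState) -> ResearchState:
        try:
            return await node(state)
        except Exception as exc:  # CancelledError is not caught here by design.
            logger.warning("Node %s failed: %s", node.__name__, exc)
            return ResearchState(
                query=state["query"],
                results=state["results"],
                errors=[*state["errors"], str(exc)],
            )

    return wrapper


async def fetch_source(source: str, query: str) -> str:
    """Simulate one I/O-bound lookup."""
    await asyncio.sleep(0)
    return f"{source}: {query}"


@with_error_handling
async def search_node(state: ResearchState) -> ResearchState:
    """Query several sources concurrently and collect the results."""
    sources = ["web", "news"]
    found = await asyncio.gather(*(fetch_source(s, state["query"]) for s in sources))
    return ResearchState(
        query=state["query"],
        results=[*state["results"], *found],
        errors=state["errors"],
    )


@with_error_handling
async def failing_node(state: ResearchState) -> ResearchState:
    """Always fail, to show how errors end up in the state."""
    raise ValueError("boom")
```

Because each node is a plain async function over a typed state, it can be exercised in isolation with asyncio.run(search_node(state)) without building a full graph.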
RAG Integration
- R2R Support: Full integration with R2R for document storage and retrieval
- Intelligent Deduplication: Content-based and URL-based duplicate detection
- Batch Processing: Efficient handling of large document sets
- Collection Management: Automatic collection assignment based on source domains
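The deduplication, batching, and collection-assignment ideas above can be sketched with the standard library alone. This is an illustrative sketch under assumptions, not the project's implementation: the Document class, the normalization rules, the collection naming scheme, and the in_batches helper are all hypothetical, and no R2R API is called.

```python
import hashlib
from collections.abc import Iterator
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Document:
    """Hypothetical scraped page awaiting upload."""

    url: str
    content: str


def normalize_url(url: str) -> str:
    """Reduce a URL to host plus path, ignoring scheme and trailing slash."""
    parts = urlsplit(url)
    return f"{parts.hostname or ''}{parts.path.rstrip('/')}"


def content_fingerprint(content: str) -> str:
    """Hash whitespace-collapsed, lowercased text so trivial variants match."""
    normalized = " ".join(content.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def deduplicate(documents: list[Document]) -> list[Document]:
    """Keep the first document for each distinct URL and content fingerprint."""
    seen_urls: set[str] = set()
    seen_hashes: set[str] = set()
    unique: list[Document] = []
    for doc in documents:
        url_key = normalize_url(doc.url)
        digest = content_fingerprint(doc.content)
        if url_key in seen_urls or digest in seen_hashes:
            continue
        seen_urls.add(url_key)
        seen_hashes.add(digest)
        unique.append(doc)
    return unique


def collection_for(url: str) -> str:
    """Derive a collection name from the source domain."""
    host = urlsplit(url).hostname or ""
    return host.removeprefix("www.").replace(".", "_")


def in_batches(items: list[Document], size: int) -> Iterator[list[Document]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start : start + size]
```

A caller would typically run deduplicate first, group the survivors by collection_for(doc.url), and then upload each group in batches.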
🤝 Contributing
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Make your changes with tests
- Ensure all checks pass (make lint-all && make test)
- Commit with a descriptive message
- Push and create a Pull Request
Development Principles
- Always use UV for package management
- Ensure all code is strongly typed
- Write tests for new functionality
- Follow existing code patterns and conventions
- Never use the --no-verify flag for commits
🚀 CI/CD
GitHub Actions workflows ensure code quality:
- Code Quality (lint.yml): Runs all linters and type checkers
- Unit Tests (unit-tests.yml): Executes test suite with coverage
- Integration Tests (integration-tests.yml): Validates full workflows
📝 License
This project is licensed under the MIT License - see the LICENSE file for details.
🙏 Acknowledgements
- LangChain - Foundation for agent development
- LangGraph - Graph-based workflow orchestration
- R2R - RAG system integration
- Firecrawl - Advanced web scraping
- UV - Fast Python package management
Note: Always activate the virtual environment with source .venv/bin/activate and use UV for all package management operations.