
Travis Vasceannie 6079ea027e Refine (#59)

* feat: implement async factory functions and optimize blocking I/O operations

* refactor: improve thread safety and error handling in graph factory caching system

* Update src/biz_bud/agents/buddy_agent.py

Co-authored-by: qodo-merge-pro[bot] <151058649+qodo-merge-pro[bot]@users.noreply.github.com>

* Update src/biz_bud/services/factory/service_factory.py

Co-authored-by: qodo-merge-pro[bot] <151058649+qodo-merge-pro[bot]@users.noreply.github.com>

* Update src/biz_bud/core/edge_helpers/error_handling.py

Co-authored-by: qodo-merge-pro[bot] <151058649+qodo-merge-pro[bot]@users.noreply.github.com>

* refactor: extract validation helpers and improve code organization in workflow modules

* refactor: extract shared code into helper methods and reuse across modules

* fix: prevent task cancellation leaks and improve async handling across core services

* refactor: replace async methods with sync in URL processing and error handling

* refactor: replace emoji and unsafe regex with plain text and secure regex operations

* refactor: consolidate timeout handling and regex safety checks with cross-platform support

* refactor: consolidate async detection and error normalization utilities across core modules

* Update src/biz_bud/core/utils/url_analyzer.py

Co-authored-by: qodo-merge-pro[bot] <151058649+qodo-merge-pro[bot]@users.noreply.github.com>

* fix: prevent config override in LLM kwargs to maintain custom parameters

* refactor: centralize regex security and async context handling with improved error handling

* Update scripts/checks/audit_core_dependencies.py

Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>

* Update scripts/checks/audit_core_dependencies.py

Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>

* Update scripts/checks/audit_core_dependencies.py

Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>

* Update scripts/checks/audit_core_dependencies.py

Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>

* fix: correct regex pattern for alternation in repeated groups by removing escaped bracket

* fix: resolve pre-commit hooks configuration and update dependencies

- Clean and reinstall pre-commit hooks to fix corrupted cache
- Update isort to v6.0.1 to resolve deprecation warnings
- Fix pytest PT012 error by separating pytest.raises from context managers
- Fix pytest PT011 errors by using GraphInterrupt instead of generic Exception
- Fix formatting and trailing whitespace issues automatically applied by hooks

* chore: remove deprecated files and clean up project structure

* fix: use safe regex utilities in json_extractor validation

- Replace unsafe re.search() calls with SafeRegexCompiler in _validate_input_security
- Use already imported _compiler instance for pattern compilation
- Add timeout protection (0.5s) for security pattern checks
- Properly handle exceptions during pattern compilation
- This prevents the security validation itself from being vulnerable to ReDoS attacks

* chore: remove deprecated files and clean up project structure

* feat: enhance metric tracking and memory management utilities

- Introduced new functions for initializing and updating metrics in the langgraph module.
- Added memory usage and concurrency management functions in async_utils.
- Refactored extractors and orchestrator to utilize new memory management functions for dynamic concurrency scaling.
- Removed deprecated memory management code from extractors and orchestrator.

* refactor: enhance graph construction with new configuration approach

- Updated graph.py and subgraphs.py to utilize GraphBuilderConfig for defining nodes and edges.
- Removed deprecated StateGraph usage in favor of a more modular graph construction method.
- Improved readability and maintainability of graph creation functions across various modules.

* fix: improve error handling in async cleanup and shutdown processes

- Enhanced the cleanup method in the cache decorator to preserve CancelledError during cleanup.
- Updated the service lifecycle manager to better handle CancelledError during shutdown and service cleanup.
- Ensured that all exceptions are properly logged and handled, maintaining robustness in async operations.

* refactor: standardize return type annotations for CompiledStateGraph

- Updated function signatures across multiple modules to use a consistent return type annotation for CompiledStateGraph.
- Improved code clarity and maintainability by removing specific type parameters in favor of a more general annotation.

---------

Co-authored-by: qodo-merge-pro[bot] <151058649+qodo-merge-pro[bot]@users.noreply.github.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>

2025-08-08 00:27:49 -04:00

| Name | Last commit | Date |
| --- | --- | --- |
| .claude | Refine (#59) | 2025-08-08 00:27:49 -04:00 |
| .cursor/rules | Semantic (#15) | 2025-06-06 00:46:23 -04:00 |
| .devcontainer | Bb-core-restoration-backup (#54) | 2025-08-01 21:18:22 -04:00 |
| .github/workflows | feat: enhance coverage reporting and improve tool configuration (#55) | 2025-08-04 00:54:52 -04:00 |
| .idx | fbase | 2025-05-12 23:06:28 +00:00 |
| .roo | Repopatch (#31) | 2025-07-12 23:06:26 -04:00 |
| .sonar | Refine (#59) | 2025-08-08 00:27:49 -04:00 |
| .vscode | Tests (#56) | 2025-08-05 13:03:53 -04:00 |
| .windsurf/rules | Vasceannie/issue32 (#41) | 2025-07-14 21:23:14 -04:00 |
| docker | Refine (#59) | 2025-08-08 00:27:49 -04:00 |
| docs | Refine (#59) | 2025-08-08 00:27:49 -04:00 |
| examples | feat: enhance coverage reporting and improve tool configuration (#55) | 2025-08-04 00:54:52 -04:00 |
| scripts | Refine (#58) | 2025-08-06 13:09:30 -04:00 |
| src | Refine (#59) | 2025-08-08 00:27:49 -04:00 |
| static | Repopatch (#31) | 2025-07-12 23:06:26 -04:00 |
| tests | Refine (#59) | 2025-08-08 00:27:49 -04:00 |
| .claude.yaml | feat: enhance coverage reporting and improve tool configuration (#55) | 2025-08-04 00:54:52 -04:00 |
| .codespellignore | Repopatch (#31) | 2025-07-12 23:06:26 -04:00 |
| .dockerignore | refac | 2025-07-17 21:22:04 -04:00 |
| .env.example | route-n-plan (#44) | 2025-07-17 18:32:58 -04:00 |
| .env.production | route-n-plan (#44) | 2025-07-17 18:32:58 -04:00 |
| .flake8 | Bb-core-restoration-backup (#54) | 2025-08-01 21:18:22 -04:00 |
| .gitignore | feat: enhance coverage reporting and improve tool configuration (#55) | 2025-08-04 00:54:52 -04:00 |
| .mcp.json | Bb-core-restoration-backup (#54) | 2025-08-01 21:18:22 -04:00 |
| .pre-commit-config.yaml | Refine (#59) | 2025-08-08 00:27:49 -04:00 |
| .pre-commit-test-policy.yaml | Tests (#56) | 2025-08-05 13:03:53 -04:00 |
| .pylintrc | Bb-core-restoration-backup (#54) | 2025-08-01 21:18:22 -04:00 |
| .pyreflyignore | Bb-core-restoration-backup (#54) | 2025-08-01 21:18:22 -04:00 |
| .repomixignore | Semantic (#15) | 2025-06-06 00:46:23 -04:00 |
| .roomodes | Repopatch (#31) | 2025-07-12 23:06:26 -04:00 |
| .sourcery.yaml | Bb-core-restoration-backup (#54) | 2025-08-01 21:18:22 -04:00 |
| analyze_coverage.py | Tests (#56) | 2025-08-05 13:03:53 -04:00 |
| basedpyright_errors.txt | Tests (#56) | 2025-08-05 13:03:53 -04:00 |
| cicderrors.txt | feat: enhance coverage reporting and improve tool configuration (#55) | 2025-08-04 00:54:52 -04:00 |
| CLAUDE.local.md | feat: enhance coverage reporting and improve tool configuration (#55) | 2025-08-04 00:54:52 -04:00 |
| CLAUDE.md | Tests (#56) | 2025-08-05 13:03:53 -04:00 |
| commit.gpgsign=false,tag.gpgsign=false | Bb-core-restoration-backup (#54) | 2025-08-01 21:18:22 -04:00 |
| complex_errors.txt | Tests (#56) | 2025-08-05 13:03:53 -04:00 |
| config.yaml | Tests (#56) | 2025-08-05 13:03:53 -04:00 |
| deploy.sh | route-n-plan (#44) | 2025-07-17 18:32:58 -04:00 |
| dev.sh | Vasceannie/issue32 (#41) | 2025-07-14 21:23:14 -04:00 |
| docker-compose.production.yml | route-n-plan (#44) | 2025-07-17 18:32:58 -04:00 |
| Dockerfile.production | feat: enhance coverage reporting and improve tool configuration (#55) | 2025-08-04 00:54:52 -04:00 |
| full_audit.py | Tests (#56) | 2025-08-05 13:03:53 -04:00 |
| LANGGRAPH_ASYNC_FIXES.md | feat: implement async factory functions and optimize blocking I/O operations (#57) | 2025-08-05 21:10:48 -04:00 |
| langgraph.json | Refine (#59) | 2025-08-08 00:27:49 -04:00 |
| LICENSE | feat: add core embeddings functionality with multi-provider support and Jina integration | 2025-05-07 14:25:50 -04:00 |
| Makefile | feat: enhance coverage reporting and improve tool configuration (#55) | 2025-08-04 00:54:52 -04:00 |
| mypy.ini | feat: enhance coverage reporting and improve tool configuration (#55) | 2025-08-04 00:54:52 -04:00 |
| nginx.conf | route-n-plan (#44) | 2025-07-17 18:32:58 -04:00 |
| package-lock.json | Bb-core-restoration-backup (#54) | 2025-08-01 21:18:22 -04:00 |
| package.json | Vasceannie/issue32 (#41) | 2025-07-14 21:23:14 -04:00 |
| pyproject.toml | Refine (#59) | 2025-08-08 00:27:49 -04:00 |
| pyrefly.toml | Refine (#59) | 2025-08-08 00:27:49 -04:00 |
| pyrightconfig.json | feat: enhance coverage reporting and improve tool configuration (#55) | 2025-08-04 00:54:52 -04:00 |
| pytest-vscode.ini | Tests (#56) | 2025-08-05 13:03:53 -04:00 |
| quick_audit.py | Tests (#56) | 2025-08-05 13:03:53 -04:00 |
| README.md | Cleanup (#45) | 2025-07-20 13:21:05 -04:00 |
| repomix.config.json | Bb-core-restoration-backup (#54) | 2025-08-01 21:18:22 -04:00 |
| requirements.txt | feat: enhance coverage reporting and improve tool configuration (#55) | 2025-08-04 00:54:52 -04:00 |
| review.txt | feat: enhance coverage reporting and improve tool configuration (#55) | 2025-08-04 00:54:52 -04:00 |
| sonar-project.properties | feat: enhance coverage reporting and improve tool configuration (#55) | 2025-08-04 00:54:52 -04:00 |
| uv.lock | Refine (#59) | 2025-08-08 00:27:49 -04:00 |

README.md

Business Buddy (Biz Budz)

Business Buddy is a sophisticated AI agent framework built on LangGraph, designed for business research, analysis, and document processing workflows. It provides a modular architecture for creating, managing, and executing AI-powered tasks with built-in support for various LLM providers, advanced RAG capabilities, and comprehensive data processing tools.

🚀 Features

Core Capabilities

Advanced RAG Integration: Full R2R (RAG to Riches) support with document deduplication, batch processing, and intelligent collection management
Multi-LLM Support: Compatible with OpenAI, Anthropic, Google Vertex AI, Cohere, and more
Modular Architecture: Organized into reusable nodes, graphs, and services for easy extension
Type Safety: Comprehensive type hints with Pydantic models throughout
Asynchronous by Design: Built for high-performance concurrent operations

Specialized Workflows

Market Research: Automated business and market analysis workflows
Menu Intelligence: Restaurant menu analysis and extraction
Document Processing: URL-to-R2R pipeline with intelligent content analysis
Web Scraping: Multiple strategies including Firecrawl, BeautifulSoup, and browser automation
Search Orchestration: Multi-provider search with caching and result ranking

📁 Project Structure

biz-budz/
├── src/biz_bud/          # Main application code
│   ├── graphs/           # LangGraph workflow definitions
│   ├── nodes/            # Modular processing nodes
│   │   ├── analysis/     # Data analysis and visualization
│   │   ├── core/         # Core functionality
│   │   ├── llm/          # LLM interactions
│   │   ├── rag/          # RAG and R2R integration
│   │   ├── research/     # Research workflows
│   │   ├── scraping/     # Web scraping strategies
│   │   ├── search/       # Search orchestration
│   │   └── validation/   # Content validation
│   ├── services/         # External service integrations
│   └── states/           # TypedDict state definitions
├── packages/             # Modular utility packages
│   ├── business-buddy-core/      # Core utilities
│   ├── business-buddy-extraction/# Entity extraction
│   ├── business-buddy-tools/     # Web tools & scrapers
│   └── business-buddy-utils/     # General utilities
└── examples/             # Usage examples and demos

🛠️ Installation

Prerequisites

Python 3.12+
UV package manager
Docker (for development services)

Quick Setup

Clone and setup:

git clone https://github.com/vasceannie/competitor-costing.git biz-budz
cd biz-budz
./scripts/setup-dev.sh

Configure environment:

cp .env.example .env
# Edit .env with your API keys:
# - OPENAI_API_KEY
# - ANTHROPIC_API_KEY (optional)
# - TAVILY_API_KEY (for web search)
# - FIRECRAWL_API_KEY (for advanced scraping)
# - R2R_BASE_URL (if using R2R)

Start development services:

make start  # Starts PostgreSQL, Redis, Qdrant

💻 Development

Commands

# Environment activation (always use this)
source .venv/bin/activate

# Run all code quality checks
make lint-all

# Run tests with coverage
make test

# Run tests in watch mode
make test_watch

# Format code
make format

# Run pre-commit hooks
make pre-commit

# Start/stop Docker services
make start
make stop

Code Quality Standards

This project enforces strict code quality:

Type Safety: No Any types or # type: ignore allowed
Linting: Ruff for style, Pyrefly for advanced type checking
Testing: Minimum 70% coverage requirement
Documentation: Imperative docstrings with punctuation
Pre-commit: Automatic hooks for all quality checks
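
As a rough illustration of what these standards look like in practice, the sketch below shows a fully typed helper with imperative, punctuated docstrings and no Any. The names CoverageReport and meets_coverage_threshold are hypothetical and do not come from the project; only the 70% threshold is taken from the list above.

```python
from typing import TypedDict


class CoverageReport(TypedDict):
    """Describe the line counts for one test run (hypothetical structure)."""

    covered_lines: int
    total_lines: int


def meets_coverage_threshold(
    report: CoverageReport, minimum_percent: float = 70.0
) -> bool:
    """Return whether the report reaches the minimum coverage percentage."""
    if report["total_lines"] == 0:
        # An empty report cannot demonstrate any coverage.
        return False
    percent = 100.0 * report["covered_lines"] / report["total_lines"]
    return percent >= minimum_percent
```

Every parameter and return value carries a concrete type, so a checker such as Pyrefly has nothing to infer and no ignore comments are needed.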

Testing

# Run all tests
pytest

# Run specific test file
pytest tests/unit_tests/nodes/rag/test_analyzer.py

# Run with coverage report
pytest --cov=biz_bud --cov-report=html

# Run integration tests only
pytest tests/integration_tests/

🔧 Configuration

Business Buddy uses a hierarchical configuration system:

Environment Variables (highest priority)
YAML Configuration (config.yaml)
Default Values (lowest priority)

Example configuration usage:

from biz_bud.config.loader import load_config

config = load_config()
# Access nested configuration
llm_config = config.llm_config
api_keys = config.api_config
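
The precedence order described above can be sketched independently of the real loader. This is a minimal illustration, not the behavior of load_config: the function resolve_setting, the upper-cased environment variable naming, and the example keys are all assumptions made for the sketch.

```python
import os
from collections.abc import Mapping
from typing import Optional


def resolve_setting(
    key: str,
    yaml_values: Mapping[str, str],
    defaults: Mapping[str, str],
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the value for key, preferring env over YAML over defaults."""
    environment = os.environ if env is None else env
    # Assumption: the environment variable is the upper-cased setting name.
    env_key = key.upper()
    if env_key in environment:
        return environment[env_key]  # highest priority
    if key in yaml_values:
        return yaml_values[key]  # values parsed from config.yaml
    return defaults[key]  # lowest priority


if __name__ == "__main__":
    defaults = {"llm_model": "default-model"}
    yaml_values = {"llm_model": "yaml-model"}
    print(resolve_setting("llm_model", yaml_values, defaults, env={}))
```

Passing env explicitly keeps the lookup easy to test without touching the real process environment.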

📚 Usage Examples

Running a Research Workflow

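from langchain_core.messages import HumanMessage

from biz_bud.config.loader import load_config
from biz_bud.graphs.research import research_graph

config = load_config()

# Execute research workflow (run inside an async function or an async REPL)
result = await research_graph.ainvoke({
    "messages": [HumanMessage(content="Research the coffee shop market in Seattle")],
    "config": config
})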

URL to R2R Document Processing

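from biz_bud.config.loader import load_config
from biz_bud.graphs.url_to_r2r import url_to_r2r_graph

config = load_config()

# Process URLs and upload to R2R (run inside an async function or an async REPL)
result = await url_to_r2r_graph.ainvoke({
    "url": "https://docs.example.com",
    "config": config
})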

Using the RAG Agent

# NOTE: The biz_bud.agents.rag_agent module has been deleted, so this example no longer runs as written.
# from biz_bud.agents.rag_agent import create_rag_agent_executor
#
# agent = create_rag_agent_executor(config)
# result = await agent.ainvoke({
#     "messages": [HumanMessage(content="What are the key features of R2R?")]
# })

🏗️ Architecture Highlights

Key Design Patterns

State-Driven Workflows: TypedDict states ensure type safety across graph executions
Service Abstraction: Clean interfaces for all external dependencies
Decorator Pattern: Centralized error handling and logging via @log_config and @error_handling
Modular Nodes: Each node has single responsibility and is independently testable
Parallel Processing: Extensive use of asyncio for concurrent operations
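
Three of the patterns above (TypedDict state, decorator-based error handling, and asyncio concurrency) can be sketched together in a few lines. This is an illustrative sketch only: ResearchState, record_errors, and both nodes are hypothetical names, and the project's own @error_handling decorator may behave differently.

```python
import asyncio
import functools
from collections.abc import Awaitable, Callable
from typing import TypedDict


class ResearchState(TypedDict, total=False):
    """Hold the data passed between nodes (hypothetical fields)."""

    query: str
    results: list[str]
    errors: list[str]


NodeFn = Callable[[ResearchState], Awaitable[ResearchState]]


def record_errors(node: NodeFn) -> NodeFn:
    """Wrap a node so failures are recorded in the state instead of raised."""

    @functools.wraps(node)
    async def wrapper(state: ResearchState) -> ResearchState:
        try:
            return await node(state)
        except Exception as exc:
            # CancelledError derives from BaseException, so it is not caught here.
            errors = [*state.get("errors", []), f"{node.__name__}: {exc}"]
            return {**state, "errors": errors}

    return wrapper


@record_errors
async def search_node(state: ResearchState) -> ResearchState:
    """Pretend to search; a real node would call a search provider."""
    await asyncio.sleep(0)
    return {**state, "results": [f"result for {state['query']}"]}


@record_errors
async def failing_node(state: ResearchState) -> ResearchState:
    """Raise to show how the decorator captures the failure."""
    raise ValueError("provider unavailable")


async def run_parallel(state: ResearchState) -> list[ResearchState]:
    """Run independent nodes concurrently on the same input state."""
    return list(await asyncio.gather(search_node(state), failing_node(state)))


if __name__ == "__main__":
    for output in asyncio.run(run_parallel({"query": "coffee"})):
        print(output)
```

Because each node takes and returns the same state type, a node can be tested on its own by passing a small dictionary, which is what makes the single-responsibility claim practical.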

RAG Integration

R2R Support: Full integration with R2R for document storage and retrieval
Intelligent Deduplication: Content-based and URL-based duplicate detection
Batch Processing: Efficient handling of large document sets
Collection Management: Automatic collection assignment based on source domains
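
To make the deduplication and collection assignment ideas concrete, here is a minimal standard-library sketch. It does not reproduce the project's R2R integration: the normalization rules, the SHA-256 fingerprint, and the collection naming scheme are assumptions chosen for illustration.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Lower-case scheme and host, drop the fragment and trailing slash."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/")
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
    )


def content_fingerprint(text: str) -> str:
    """Hash whitespace-normalized text so trivial formatting changes match."""
    collapsed = " ".join(text.split())
    return hashlib.sha256(collapsed.encode("utf-8")).hexdigest()


def collection_for(url: str) -> str:
    """Derive a collection name from the source domain (hypothetical scheme)."""
    host = urlsplit(url).hostname or "unknown"
    return host.removeprefix("www.").replace(".", "_")


def deduplicate(documents: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Keep the first document for each distinct URL and each distinct content."""
    seen_urls: set[str] = set()
    seen_hashes: set[str] = set()
    unique: list[tuple[str, str]] = []
    for url, text in documents:
        key, digest = normalize_url(url), content_fingerprint(text)
        if key in seen_urls or digest in seen_hashes:
            continue  # duplicate by URL or by content
        seen_urls.add(key)
        seen_hashes.add(digest)
        unique.append((url, text))
    return unique
```

A document is skipped when either its normalized URL or its content hash has been seen, so mirrored pages and re-crawled URLs are both caught before upload.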

🤝 Contributing

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Make your changes with tests
Ensure all checks pass (make lint-all && make test)
Commit with descriptive message
Push and create a Pull Request

Development Principles

Always use UV for package management
Ensure all code is strongly typed
Write tests for new functionality
Follow existing code patterns and conventions
Never use --no-verify flag for commits

🚀 CI/CD

GitHub Actions workflows ensure code quality:

Code Quality (lint.yml): Runs all linters and type checkers
Unit Tests (unit-tests.yml): Executes test suite with coverage
Integration Tests (integration-tests.yml): Validates full workflows

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgements

LangChain - Foundation for agent development
LangGraph - Graph-based workflow orchestration
R2R - RAG system integration
Firecrawl - Advanced web scraping
UV - Fast Python package management

Note: Always activate the virtual environment with source .venv/bin/activate and use UV for all package management operations.