biz-bud/AGENTS.md
Travis Vasceannie a6731cc185 fix: resolve all pyrefly linting errors in Discord implementation
- Fix Pydantic Field constraints using Annotated pattern
- Fix database access to use asyncpg pool directly
- Fix LLM client max_tokens parameter usage
- Add type safety checks for dict operations
- Fix Discord.py type annotations and overrides
- Add pyrefly ignore comments for false positives
- Fix bot.user null checks in event handlers
- Ensure all Discord services pass type checking
2025-09-20 18:17:56 -04:00

Repository Guidelines

Comprehensive directory map for everything under src/ so agents and contributors can navigate confidently.

Legend & Scope

Lines reference paths relative to /home/vasceannie/repos/biz-budz. __pycache__/ folders exist in most packages and are excluded from detail. .backup files capture older implementations—consult primary modules first.
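
The scope rule above (every directory is listed, `__pycache__/` is skipped) can be sketched as a short helper for regenerating the raw directory list. This is a minimal sketch, not part of the repository: the function name `list_directories` is hypothetical, and it assumes the caller passes the `src/` directory of a local checkout.

```python
from pathlib import Path


def list_directories(root: Path) -> list[str]:
    """Return directories under root as sorted relative POSIX paths.

    Anything inside a __pycache__ folder is skipped, matching the scope
    rule of this guide. (Hypothetical helper, not part of biz_bud.)
    """
    found = []
    for path in root.rglob("*"):
        relative = path.relative_to(root)
        if path.is_dir() and "__pycache__" not in relative.parts:
            found.append(relative.as_posix())
    return sorted(found)
```

Calling `list_directories(Path("src"))` from the repository root would give a flat list to compare against the sections below.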

Root: src/

src/ holds all installable code declared in pyproject.toml. Ensure PYTHONPATH=src when invoking modules directly or running ad-hoc scripts.
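
For ad-hoc scripts where setting `PYTHONPATH=src` in the shell is inconvenient, roughly the same effect can be achieved inside the script for that one process. The sketch below is an assumption-laden illustration: `ensure_src_on_path` is a hypothetical helper, not something the repository provides, and it assumes the script knows where the repository root is.

```python
import sys
from pathlib import Path


def ensure_src_on_path(repo_root: Path) -> Path:
    """Prepend <repo_root>/src to sys.path unless it is already present.

    Hypothetical helper for ad-hoc scripts; it only affects the current
    interpreter process.
    """
    src_dir = (repo_root / "src").resolve()
    if str(src_dir) not in sys.path:
        sys.path.insert(0, str(src_dir))
    return src_dir
```

A script run from the repository root could call `ensure_src_on_path(Path.cwd())` before importing `biz_bud`. Setting the environment variable remains the simpler option when the shell is under your control.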

Package: src/biz_bud/

__init__.py exposes package exports; py.typed marks type completeness. PROJECT_OVERVIEW.md summarizes architecture; webapp.py defines the FastAPI entry point. .claude/settings.local.json stores assistant settings; safe to ignore for runtime logic.

Agents: src/biz_bud/agents/

  • AGENTS.md (package-level) documents agent orchestration expectations.
  • buddy_agent.py builds the Business Buddy orchestrator.
  • buddy_execution.py wires execution loops and callbacks.
  • buddy_routing.py handles task routing decisions.
  • buddy_nodes_registry.py maps node IDs to implementations.
  • buddy_state_manager.py encapsulates state mutations and safeguards.

Core: src/biz_bud/core/

Infrastructure shared by graphs, nodes, and services:

  • caching/ includes backends (cache_backends.py, memory.py, file.py), orchestrators (cache_manager.py), decorators, and redis.py; guidance lives in CACHING_GUIDELINES.md.
  • config/ provides layered config loading via loader.py, constants, ensure_tools_config.py, integration stubs, and schemas/ (TypedDict definitions for app, analysis, buddy, core, llm, research, services, tools).
  • edge_helpers/ centralizes graph routing logic: command_patterns.py, router_factories.py, secure_routing.py, workflow_routing.py, monitoring, validation, and edge docs (edges.md).
  • errors/ holds exception bases, aggregators, formatters, telemetry integration, LLM-specific exceptions, routing configuration, and tool exception wrappers.
  • langgraph/ wraps integration helpers (graph_builder.py, graph_config.py, cross_cutting.py, runnable_config.py, state_immutability.py).
  • logging/ is a placeholder for advanced logging bridges, for use when package-level logging diverges.
  • networking/ includes async HTTP and API clients, retry helpers, and typed models for external calls.
  • services/ offers container abstractions, lifecycle management, registries, monitoring hooks, and HTTP service scaffolding.
  • url_processing/ centralizes URL configuration, discovery, filtering, and validation utilities.
  • utils/ spans capability inference, JSON/HTML utilities, graph helpers, lazy loading, regex security, and URL analysis/normalization.
  • validation/ implements layered validation, including content checks, document chunking, condition security, statistics, LangGraph rule enforcement, and decorator support.

Examples: src/biz_bud/examples/

langgraph_state_patterns.py demonstrates state management strategies for LangGraph pipelines; reference before creating new graph state machines.

Graphs: src/biz_bud/graphs/

  • analysis/ contains graph.py and nodes/ covering data planning (plan.py), interpretation, visualization, and backups for legacy logic.
  • catalog/ delivers catalog intelligence flows: graph.py, nodes.py, and nodes/ with analysis, research, defaults, catalog loaders, plus backups for experimentation.
  • discord/ currently holds only __pycache__; reserved for future Discord graph support.
  • examples/ bundles runnable samples (human_feedback_example.py, service_factory_example.py) with .backup copies for archival reference.
  • paperless/ manages document processing: README.md, agent.py, graph.py, subgraphs.py, and nodes/ for document validation, receipt handling, and core processors.
  • rag/ orchestrates retrieval-augmented workflows: graph.py, integrations.py, and nodes/ housing agent nodes, duplicate checks, batch processing, R2R uploads, scraping helpers, utilities, and workflow routers.
    • rag/nodes/integrations/ delivers integration helpers (firecrawl/ config, repomix.py) for external connectors.
    • rag/nodes/scraping/ offers URL analyzer, discovery, router, and summary nodes (plus .backup history).
  • research/ packages research graphs: graph.py, backups, and nodes/ for query derivation, preparation, synthesis, processing, validation.
  • scraping/ supplies a focused scraping graph implementation via graph.py.

Logging: src/biz_bud/logging/

config.py consumes logging_config.yaml to configure structured logging. formatters.py and utils.py provide logging helpers, while unified_logging.py centralizes logger creation.

Nodes: src/biz_bud/nodes/

  • core/ exposes batch management, input normalization, output shaping, and error handling nodes.
  • error_handling/ provides analyzer, guidance, interceptor, and recovery logic to stabilize runs.
  • extraction/ bundles semantic extractors, orchestrators, consolidated pipelines, and structured extractors.
  • integrations/ currently focuses on Firecrawl configuration; extend for new data sources.
  • llm/ houses call.py with unified LangChain/LangGraph invocation wrappers.
  • scrape/ covers batch scraping, URL discovery, routing, and concrete scrape nodes.
  • search/ includes orchestrators, query optimization, caching, ranking, monitoring, and research-specific search utilities.
  • url_processing/ supplies typed discovery and validation nodes plus helper typing definitions.
  • validation/ provides content, human feedback, and logical validation nodes for graph checkpoints.

Prompts: src/biz_bud/prompts/

Template modules for consistent messaging: analysis.py, defaults.py, error_handling.py, feedback.py, paperless.py, research.py, all exposed via __init__.py.

Services: src/biz_bud/services/

  • Root modules (config_manager.py, registry.py, container.py, lifecycle.py, factories.py, monitoring.py, http_service.py) coordinate service registration and health.
  • factory/service_factory.py builds service instances for runtime injection.
  • llm/ wraps LLM service wiring with client.py, configuration schemas, shared types.py, and utility helpers.

States: src/biz_bud/states/

  • Documentation (README.md) and base.py outline state layering conventions.
  • Reusable fragments live in common_types.py, domain_types.py, focused_states.py, and unified.py.
  • Workflow modules: analysis.py, buddy.py, catalog.py, market.py, planner.py, research.py, search.py, extraction.py, feedback.py, reflection.py, validation.py, receipt.py.
  • RAG-specific files (rag.py, rag_agent.py, rag_orchestrator.py, url_to_rag.py, url_to_rag_r2r.py) cover retrieval agents.
  • Validation models reside in validation_models.py; tool-capability state in tools.py.
  • catalogs/ refines catalog structures via m_components.py and m_types.py.

Tools: src/biz_bud/tools/

browser/ defines browser abstractions (base.py, browser.py, driverless_browser.py, helper utilities). capabilities/ organizes tool registries by domain:

  • batch/receipt_processing.py batches receipt workflows.
  • database/tool.py and document/tool.py expose minimal wrappers.
  • external/paperless/tool.py binds to Paperless APIs.
  • extraction/ contains content.py, legacy_tools.py, receipt.py, statistics.py, structured.py, single_url_processor.py, and subpackages:
    • core/ (base classes, types)
    • numeric/ (numeric extraction, quality)
    • statistics_impl/ (statistical extractors)
    • text/ (structured text extraction)
  • fetch/tool.py standardizes remote fetch operations.
  • introspection/ provides tool.py, interface.py, models.py, and default providers.
  • scrape/ exposes interface.py, tool.py, and provider adapters (beautifulsoup.py, firecrawl.py, jina.py).
  • search/ mirrors scrape layout with providers for Arxiv, Jina, Tavily.
  • url_processing/ offers config.py, service.py, models, interface, and provider adapters for deduplication, discovery, normalization, validation.
  • utils/ currently awaits helper additions.
  • workflow/ implements execution/planning pipelines and validation helpers for orchestrated tool calls.

clients/ wraps Firecrawl (firecrawl.py), Tavily (tavily.py), Paperless (paperless.py), Jina (jina.py), and R2R (r2r.py, r2r_utils.py). loaders/ provides web_base_loader.py for resilient web content ingestion. utils/html_utils.py supports DOM cleanup for downstream tools.

Other Files

logging_config.yaml ensures consistent structured logging. Backup modules (*.backup) remain for comparison; update or remove once superseded.
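
To see which backup modules are still around before updating or removing them, a small inventory script can help. The following is a minimal sketch under stated assumptions: `find_backup_files` is a hypothetical helper (not part of the repository), and it assumes backups are identified purely by a `.backup` filename suffix, as the `*.backup` pattern above suggests.

```python
from pathlib import Path


def find_backup_files(src_root: Path) -> list[Path]:
    """Return every *.backup file under src_root, sorted, skipping __pycache__.

    Hypothetical helper; it only lists files and never deletes anything.
    """
    return sorted(
        path
        for path in src_root.rglob("*.backup")
        if path.is_file() and "__pycache__" not in path.relative_to(src_root).parts
    )
```

Running `find_backup_files(Path("src"))` from the repository root returns the candidates; whether each one is actually superseded still needs a manual comparison with its primary module.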

Maintenance Guidance

Update this guide whenever new directories or significant files appear under src/. Validate structural changes with basedpyright and pyrefly to catch import regressions. Keep placeholder directories until confirming nothing imports them as packages.
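
One way to check that nothing imports a placeholder directory as a package is a static scan of import statements. The sketch below uses only the standard-library `ast` module; `find_importers` is a hypothetical helper, not a tool the repository ships. It detects absolute imports only, so relative imports and dynamic imports (for example via `importlib`) would need a separate check.

```python
import ast
from pathlib import Path


def find_importers(src_root: Path, dotted_name: str) -> list[Path]:
    """List .py files under src_root that import dotted_name or a submodule of it.

    Hypothetical helper. Only absolute imports are inspected; relative
    imports (level > 0) and dynamic imports are not detected.
    """
    hits = []
    for py_file in sorted(src_root.rglob("*.py")):
        try:
            tree = ast.parse(py_file.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files that cannot be parsed
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                names = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
                # "from a.b import c" may refer to module a.b or submodule a.b.c
                names = [node.module]
                names += [f"{node.module}.{alias.name}" for alias in node.names]
            else:
                continue
            prefix = dotted_name + "."
            if any(name == dotted_name or name.startswith(prefix) for name in names):
                hits.append(py_file)
                break  # one match per file is enough
    return hits
```

For example, `find_importers(Path("src"), "biz_bud.graphs.discord")` would list any module that still imports the reserved Discord graph package. An empty result is a useful signal but not proof, given the limits noted above.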