Repository Guidelines
Comprehensive directory map for everything under src/ so agents and contributors can navigate confidently.
Legend & Scope
Lines reference paths relative to /home/vasceannie/repos/biz-budz.
__pycache__/ folders exist in most packages and are excluded from detail.
.backup files capture older implementations—consult primary modules first.
Root: src/
src/ holds all installable code declared in pyproject.toml.
Ensure PYTHONPATH=src when invoking modules directly or running ad-hoc scripts.
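For ad-hoc scripts that launch a module in a subprocess, the PYTHONPATH requirement above can be sketched as follows. The helper name env_with_src is illustrative and not part of the repository.

```python
import os
from pathlib import Path


def env_with_src(repo_root: Path) -> dict[str, str]:
    """Return a copy of the current environment with <repo_root>/src first on PYTHONPATH."""
    env = dict(os.environ)
    src = str(repo_root / "src")
    existing = env.get("PYTHONPATH")
    # Prepend src so it takes precedence over any path that is already configured.
    env["PYTHONPATH"] = os.pathsep.join([src, existing]) if existing else src
    return env
```

The returned mapping can be passed as env= to subprocess.run when starting a module from the repository root.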
Package: src/biz_bud/
__init__.py exposes package exports; py.typed marks the package as shipping inline type information (PEP 561).
PROJECT_OVERVIEW.md summarizes architecture; webapp.py defines the FastAPI entry point.
.claude/settings.local.json stores assistant settings; safe to ignore for runtime logic.
Agents: src/biz_bud/agents/
AGENTS.md (package-level) documents agent orchestration expectations.
buddy_agent.py builds the Business Buddy orchestrator.
buddy_execution.py wires execution loops and callbacks.
buddy_routing.py handles task routing decisions.
buddy_nodes_registry.py maps node IDs to implementations.
buddy_state_manager.py encapsulates state mutations and safeguards.
Core: src/biz_bud/core/
Infrastructure shared by graphs, nodes, and services.
caching/ includes backends (cache_backends.py, memory.py, file.py), orchestrators (cache_manager.py), decorators, and redis.py; guidance lives in CACHING_GUIDELINES.md.
config/ provides layered config loading via loader.py, constants, ensure_tools_config.py, integration stubs, and schemas/ (TypedDict definitions for app, analysis, buddy, core, llm, research, services, tools).
edge_helpers/ centralizes graph routing logic: command_patterns.py, router_factories.py, secure_routing.py, workflow_routing.py, monitoring, validation, and edge docs (edges.md).
errors/ holds exception bases, aggregators, formatters, telemetry integration, LLM-specific exceptions, routing configuration, and tool exception wrappers.
langgraph/ wraps integration helpers (graph_builder.py, graph_config.py, cross_cutting.py, runnable_config.py, state_immutability.py).
logging/ is a placeholder for advanced logging bridges, for use when core logging needs to diverge from package-level logging.
networking/ includes async HTTP and API clients, retry helpers, and typed models for external calls.
services/ offers container abstractions, lifecycle management, registries, monitoring hooks, and HTTP service scaffolding.
url_processing/ centralizes URL configuration, discovery, filtering, and validation utilities.
utils/ spans capability inference, JSON/HTML utilities, graph helpers, lazy loading, regex security, and URL analysis/normalization.
validation/ implements layered validation, including content checks, document chunking, condition security, statistics, LangGraph rule enforcement, and decorator support.
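The config/ entry above mentions TypedDict schemas and layered loading. A minimal sketch of that idea follows, assuming a hypothetical LLMConfig schema and merge_layers helper. Neither name is taken from loader.py or schemas/, and the assumption that later layers override earlier ones may not match the real loader.

```python
from typing import TypedDict


class LLMConfig(TypedDict, total=False):
    """Hypothetical schema; the real definitions live under config/schemas/."""

    model: str
    temperature: float
    max_tokens: int


def merge_layers(*layers: LLMConfig) -> LLMConfig:
    """Merge config layers left to right; later layers override earlier ones."""
    merged: LLMConfig = {}
    for layer in layers:
        merged.update(layer)
    return merged


# Illustrative values only.
defaults: LLMConfig = {"model": "example-model", "temperature": 0.0}
overrides: LLMConfig = {"temperature": 0.7, "max_tokens": 512}
config = merge_layers(defaults, overrides)
```

Because TypedDict instances are plain dictionaries at runtime, the schema adds static checking without changing how the merged result is consumed.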
Examples: src/biz_bud/examples/
langgraph_state_patterns.py demonstrates state management strategies for LangGraph pipelines; reference before creating new graph state machines.
Graphs: src/biz_bud/graphs/
analysis/ contains graph.py and nodes/ covering data planning (plan.py), interpretation, visualization, and backups for legacy logic.
catalog/ delivers catalog intelligence flows: graph.py, nodes.py, and nodes/ with analysis, research, defaults, catalog loaders, plus backups for experimentation.
discord/ currently holds only __pycache__; reserved for future Discord graph support.
examples/ bundles runnable samples (human_feedback_example.py, service_factory_example.py) with .backup copies for archival reference.
paperless/ manages document processing: README.md, agent.py, graph.py, subgraphs.py, and nodes/ for document validation, receipt handling, and core processors.
rag/ orchestrates retrieval-augmented workflows: graph.py, integrations.py, and nodes/ housing agent nodes, duplicate checks, batch processing, R2R uploads, scraping helpers, utilities, and workflow routers.
rag/nodes/integrations/ delivers integration helpers (firecrawl/ config, repomix.py) for external connectors.
rag/nodes/scraping/ offers URL analyzer, discovery, router, and summary nodes (plus .backup history).
research/ packages research graphs: graph.py, backups, and nodes/ for query derivation, preparation, synthesis, processing, validation.
scraping/ supplies a focused scraping graph implementation via graph.py.
Logging: src/biz_bud/logging/
config.py consumes logging_config.yaml to configure structured logging.
formatters.py and utils.py provide logging helpers, while unified_logging.py centralizes logger creation.
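As a rough illustration of what config.py does with logging_config.yaml, the sketch below applies an equivalent dictionary through the standard library's logging.config.dictConfig. The dictionary contents, the formatter layout, and the logger name biz_bud.example are assumptions. The real YAML file may differ, and loading it would also need a YAML parser.

```python
import logging
import logging.config

# Assumed shape of the configuration; not copied from logging_config.yaml.
LOGGING_CONFIG = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "structured": {
            "format": "%(asctime)s %(levelname)s %(name)s %(message)s",
        },
    },
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "formatter": "structured",
            "level": "INFO",
        },
    },
    "root": {"handlers": ["console"], "level": "INFO"},
}

logging.config.dictConfig(LOGGING_CONFIG)
logger = logging.getLogger("biz_bud.example")
logger.info("logging configured")
```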
Nodes: src/biz_bud/nodes/
core/ exposes batch management, input normalization, output shaping, and error handling nodes.
error_handling/ provides analyzer, guidance, interceptor, and recovery logic to stabilize runs.
extraction/ bundles semantic extractors, orchestrators, consolidated pipelines, and structured extractors.
integrations/ currently focuses on Firecrawl configuration; extend for new data sources.
llm/ houses call.py with unified LangChain/LangGraph invocation wrappers.
scrape/ covers batch scraping, URL discovery, routing, and concrete scrape nodes.
search/ includes orchestrators, query optimization, caching, ranking, monitoring, and research-specific search utilities.
url_processing/ supplies typed discovery and validation nodes plus helper typing definitions.
validation/ provides content, human feedback, and logical validation nodes for graph checkpoints.
Prompts: src/biz_bud/prompts/
Template modules for consistent messaging: analysis.py, defaults.py, error_handling.py, feedback.py, paperless.py, research.py, all exposed via __init__.py.
Services: src/biz_bud/services/
Root modules (config_manager.py, registry.py, container.py, lifecycle.py, factories.py, monitoring.py, http_service.py) coordinate service registration and health.
factory/service_factory.py builds service instances for runtime injection.
llm/ wraps LLM service wiring with client.py, configuration schemas, shared types.py, and utility helpers.
States: src/biz_bud/states/
Documentation (README.md) and base.py outline state layering conventions.
Reusable fragments live in common_types.py, domain_types.py, focused_states.py, and unified.py.
Workflow modules: analysis.py, buddy.py, catalog.py, market.py, planner.py, research.py, search.py, extraction.py, feedback.py, reflection.py, validation.py, receipt.py.
RAG-specific files (rag.py, rag_agent.py, rag_orchestrator.py, url_to_rag.py, url_to_rag_r2r.py) cover retrieval agents.
Validation models reside in validation_models.py; tool-capability state in tools.py.
catalogs/ refines catalog structures via m_components.py and m_types.py.
Tools: src/biz_bud/tools/
browser/ defines browser abstractions (base.py, browser.py, driverless_browser.py, helper utilities).
capabilities/ organizes tool registries by domain:
  batch/receipt_processing.py batches receipt workflows.
  database/tool.py and document/tool.py expose minimal wrappers.
  external/paperless/tool.py binds to Paperless APIs.
  extraction/ contains content.py, legacy_tools.py, receipt.py, statistics.py, structured.py, single_url_processor.py, and subpackages: core/ (base classes, types), numeric/ (numeric extraction, quality), statistics_impl/ (statistical extractors), text/ (structured text extraction).
  fetch/tool.py standardizes remote fetch operations.
  introspection/ provides tool.py, interface.py, models.py, and default providers.
  scrape/ exposes interface.py, tool.py, and provider adapters (beautifulsoup.py, firecrawl.py, jina.py).
  search/ mirrors scrape layout with providers for Arxiv, Jina, Tavily.
  url_processing/ offers config.py, service.py, models, interface, and provider adapters for deduplication, discovery, normalization, validation.
  utils/ currently awaits helper additions.
  workflow/ implements execution/planning pipelines and validation helpers for orchestrated tool calls.
clients/ wraps Firecrawl (firecrawl.py), Tavily (tavily.py), Paperless (paperless.py), Jina (jina.py), and R2R (r2r.py, r2r_utils.py).
loaders/ provides web_base_loader.py for resilient web content ingestion.
utils/html_utils.py supports DOM cleanup for downstream tools.
Other Files
logging_config.yaml ensures consistent structured logging.
Backup modules (*.backup) remain for comparison; update or remove once superseded.
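One way to decide which backups are superseded is to list each *.backup file alongside whether a live module still sits next to it. The sketch below assumes backups are named by appending .backup to the original filename (for example plan.py.backup). That naming is an assumption, so adjust the suffix handling if the repository uses a different scheme.

```python
from pathlib import Path


def find_backups(src_root: Path) -> list[tuple[Path, bool]]:
    """Return (backup_path, live_module_exists) pairs for every *.backup file under src_root."""
    results = []
    for backup in sorted(src_root.rglob("*.backup")):
        # Drop the trailing ".backup" suffix to get the assumed primary module path.
        primary = backup.with_suffix("")
        results.append((backup, primary.exists()))
    return results
```

Backups whose second element is False have no live counterpart under the assumed naming and deserve a closer look before removal.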
Maintenance Guidance
Update this guide whenever new directories or significant files appear under src/.
Validate structural changes with basedpyright and pyrefly to catch import regressions.
Keep placeholder directories until confirming nothing imports them as packages.
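To spot directories that this guide does not yet mention, a small script can list everything under src/ while skipping __pycache__/. This is a sketch only, and the function name list_package_dirs is illustrative.

```python
from pathlib import Path


def list_package_dirs(src_root: Path) -> list[str]:
    """Return sorted POSIX-style relative paths of all directories under src_root, excluding __pycache__."""
    dirs = []
    for path in src_root.rglob("*"):
        # Skip __pycache__ itself and anything nested inside it.
        if path.is_dir() and "__pycache__" not in path.parts:
            dirs.append(path.relative_to(src_root).as_posix())
    return sorted(dirs)
```

Comparing this output against the section headings in this guide gives a quick check for undocumented or newly emptied directories.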