* feat: enhance coverage reporting and improve tool configuration - Added support for JSON coverage reports in pyproject.toml. - Updated .gitignore to include coverage.json and task files for better management. - Introduced a new Type Safety Audit Report to document findings and recommendations for type safety improvements. - Created a comprehensive coverage configuration guide to assist in understanding coverage reporting setup. - Refactored tools configuration to utilize environment variables for concurrent scraping settings. These changes improve the project's testing and reporting capabilities while enhancing overall code quality and maintainability. * feat: enhance configuration handling and improve error logging - Introduced a new utility function `_get_env_int` for robust environment variable integer retrieval with validation. - Updated `WebToolsConfig` and `ToolsConfigModel` to utilize the new utility for environment variable defaults. - Enhanced logging in `CircuitBreaker` to provide detailed state transition information. - Improved URL handling in `url_analyzer.py` for better file extension extraction and normalization. - Added type validation and logging in `SecureInputMixin` to ensure input sanitization and validation consistency. These changes improve the reliability and maintainability of configuration management and error handling across the codebase. * refactor: update imports and enhance .gitignore for improved organization - Updated import paths in various example scripts to reflect the new structure under `biz_bud`. - Enhanced .gitignore to include clearer formatting for task files. - Removed obsolete function calls and improved error handling in several scripts. - Added public alias for backward compatibility in `upload_r2r.py`. These changes improve code organization, maintainability, and compatibility across the project. * refactor: update graph paths in langgraph.json for improved organization - Changed paths for research, catalog, paperless, and url_to_r2r graphs to reflect new directory structure. - Added new entries for analysis and scraping graphs to enhance functionality. These changes improve the organization and maintainability of the graph configurations. * fix: enhance validation and error handling in date range and scraping functions - Updated date validation in UserFiltersModel to ensure date values are strings. - Improved error messages in create_scraped_content_dict to clarify conditions for success and failure. - Enhanced test coverage for date validation and scraping content creation to ensure robustness. These changes improve input validation and error handling across the application, enhancing overall reliability. * refactor: streamline graph creation and enhance type annotations in examples - Simplified graph creation in `catalog_ingredient_research_example.py` and `catalog_tech_components_example.py` by directly compiling the graph. - Updated type annotations in `catalog_intel_with_config.py` for improved clarity and consistency. - Enhanced error handling in catalog data processing to ensure robustness against unexpected data types. These changes improve code readability, maintainability, and error resilience across example scripts. * Update src/biz_bud/nodes/extraction/extractors.py Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com> * Update src/biz_bud/core/validation/pydantic_models.py Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com> * refactor: migrate Jina and Tavily clients to use ServiceFactory dependency injection * refactor: migrate URL processing to provider-based architecture with improved error handling * feat: add FirecrawlApp compatibility classes and mock implementations * fix: add thread-safe locking to LazyLoader factory management * feat: implement service restart and refactor cache decorator helpers * refactor: move r2r_direct_api_call to tools.clients.r2r_utils and improve HTTP service error handling * chore: update Sonar task IDs in report configuration --------- Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
14 KiB
Of course. Writing a script to enforce architectural conventions is an excellent way to maintain a large codebase. Statically analyzing your code is far more reliable than manual reviews for catching these kinds of deviations.
This script will use Python's built-in ast (Abstract Syntax Tree) module. It's the most robust way to analyze Python code, as it understands the code's structure, unlike simple text-based searches which can be easily fooled.
The script will identify modules, functions, or packages that are NOT using your core dependency infrastructure by looking for "anti-patterns"—the use of standard libraries or direct instantiations where your custom framework should be used instead.
The Script: audit_core_dependencies.py
Save the following code as a Python file (e.g., audit_core_dependencies.py) in the root of your repository.
import ast
import os
import argparse
from typing import Any, Dict, List, Set, Tuple
# --- Configuration of Anti-Patterns ---
# Direct imports of libraries that should be replaced by your core infrastructure.
# Maps the disallowed module to the suggested core module/function.
DISALLOWED_IMPORTS: Dict[str, str] = {
"logging": "biz_bud.logging.unified_logging.get_logger",
"requests": "biz_bud.core.networking.http_client.HTTPClient",
"httpx": "biz_bud.core.networking.http_client.HTTPClient",
"aiohttp": "biz_bud.core.networking.http_client.HTTPClient",
"asyncio.gather": "biz_bud.core.networking.async_utils.gather_with_concurrency",
}
# Direct instantiation of service clients or tools that should come from the factory.
DISALLOWED_INSTANTIATIONS: Dict[str, str] = {
"TavilySearchProvider": "ServiceFactory.get_service() or create_tools_for_capabilities()",
"JinaSearchProvider": "ServiceFactory.get_service() or create_tools_for_capabilities()",
"ArxivProvider": "ServiceFactory.get_service() or create_tools_for_capabilities()",
"FirecrawlClient": "ServiceFactory.get_service() or a dedicated provider from ScrapeService",
"TavilyClient": "ServiceFactory.get_service()",
"PostgresStore": "ServiceFactory.get_db_service()",
"LangchainLLMClient": "ServiceFactory.get_llm_client()",
"HTTPClient": "HTTPClient.get_or_create_client() instead of direct instantiation",
}
# Built-in exceptions that should ideally be wrapped in a custom BusinessBuddyError.
DISALLOWED_EXCEPTIONS: Set[str] = {
"Exception",
"ValueError",
"KeyError",
"TypeError",
"AttributeError",
"NotImplementedError",
}
class InfrastructureVisitor(ast.NodeVisitor):
"""
AST visitor that walks the code tree and identifies violations
of the core dependency infrastructure usage.
"""
def __init__(self, filepath: str):
self.filepath = filepath
self.violations: List[Tuple[int, str]] = []
self.imported_names: Dict[str, str] = {} # Maps alias to full import path
def _add_violation(self, node: ast.AST, message: str):
self.violations.append((node.lineno, message))
def visit_Import(self, node: ast.Import) -> None:
"""Checks for `import logging`, `import requests`, etc."""
for alias in node.names:
if alias.name in DISALLOWED_IMPORTS:
suggestion = DISALLOWED_IMPORTS[alias.name]
self._add_violation(
node,
f"Disallowed import '{alias.name}'. Please use '{suggestion}'."
)
self.generic_visit(node)
def visit_ImportFrom(self, node: ast.ImportFrom) -> None:
"""Checks for direct service/client imports, e.g., `from biz_bud.tools.clients import TavilyClient`"""
if node.module:
for alias in node.names:
full_import_path = f"{node.module}.{alias.name}"
# Store the imported name (could be an alias)
self.imported_names[alias.asname or alias.name] = full_import_path
# Check for direct service/tool instantiation patterns
if "biz_bud.tools.clients" in node.module or \
"biz_bud.services" in node.module and "factory" not in node.module:
if alias.name in DISALLOWED_INSTANTIATIONS:
suggestion = DISALLOWED_INSTANTIATIONS[alias.name]
self._add_violation(
node,
f"Disallowed direct import of '{alias.name}'. Use the ServiceFactory: '{suggestion}'."
)
self.generic_visit(node)
def visit_Raise(self, node: ast.Raise) -> None:
"""Checks for `raise ValueError` instead of a custom error."""
if isinstance(node.exc, ast.Call) and isinstance(node.exc.func, ast.Name):
exception_name = node.exc.func.id
elif isinstance(node.exc, ast.Name):
exception_name = node.exc.id
else:
exception_name = "unknown"
if exception_name in DISALLOWED_EXCEPTIONS:
self._add_violation(
node,
f"Raising generic exception '{exception_name}'. Please use a custom `BusinessBuddyError` from `core.errors.base`."
)
self.generic_visit(node)
def visit_Assign(self, node: ast.Assign) -> None:
"""Checks for direct state mutation like `state['key'] = value`."""
for target in node.targets:
if isinstance(target, ast.Subscript) and isinstance(target.value, ast.Name):
if target.value.id == 'state':
self._add_violation(
node,
"Direct state mutation `state[...] = ...` detected. Please use `StateUpdater` for immutable updates."
)
self.generic_visit(node)
def visit_Call(self, node: ast.Call) -> None:
"""
Checks for:
1. Direct instantiation of disallowed classes (e.g., `TavilyClient()`).
2. Direct use of `asyncio.gather`.
3. Direct state mutation via `state.update(...)`.
"""
# 1. Check for direct instantiations
if isinstance(node.func, ast.Name):
class_name = node.func.id
if class_name in DISALLOWED_INSTANTIATIONS:
# Verify it's not a legitimate call, e.g. a function with the same name
if self.imported_names.get(class_name, "").endswith(class_name):
suggestion = DISALLOWED_INSTANTIATIONS[class_name]
self._add_violation(
node,
f"Direct instantiation of '{class_name}'. Use the ServiceFactory: '{suggestion}'."
)
# 2. Check for `asyncio.gather` and `state.update`
if isinstance(node.func, ast.Attribute):
attr = node.func
if isinstance(attr.value, ast.Name):
parent_name = attr.value.id
attr_name = attr.attr
# Check for asyncio.gather
if parent_name == 'asyncio' and attr_name == 'gather':
suggestion = DISALLOWED_IMPORTS['asyncio.gather']
self._add_violation(
node,
f"Direct use of 'asyncio.gather'. Please use '{suggestion}' for controlled concurrency."
)
# Check for state.update()
if parent_name == 'state' and attr_name == 'update':
self._add_violation(
node,
"Direct state mutation with `state.update()` detected. Please use `StateUpdater`."
)
self.generic_visit(node)
def audit_directory(directory: str) -> Dict[str, List[Tuple[int, str]]]:
"""Scans a directory for Python files and audits them."""
all_violations: Dict[str, List[Tuple[int, str]]] = {}
for root, _, files in os.walk(directory):
for file in files:
if file.endswith(".py"):
filepath = os.path.join(root, file)
try:
with open(filepath, "r", encoding="utf-8") as f:
source_code = f.read()
tree = ast.parse(source_code, filename=filepath)
visitor = InfrastructureVisitor(filepath)
visitor.visit(tree)
if visitor.violations:
all_violations[filepath] = visitor.violations
except (SyntaxError, ValueError) as e:
all_violations[filepath] = [(0, f"ERROR: Could not parse file: {e}")]
return all_violations
def main():
parser = argparse.ArgumentParser(description="Audit Python code for adherence to core infrastructure.")
parser.add_argument(
"directory",
nargs="?",
default="src/biz_bud",
help="The directory to scan. Defaults to 'src/biz_bud'."
)
args = parser.parse_args()
print(f"--- Auditing directory: {args.directory} ---\n")
violations = audit_directory(args.directory)
if not violations:
print("\033[92m✅ All scanned files adhere to the core infrastructure rules.\033[0m")
return
print(f"\033[91m🔥 Found {len(violations)} file(s) with violations:\033[0m\n")
total_violations = 0
for filepath, file_violations in violations.items():
print(f"\033[1m\033[93mFile: {filepath}\033[0m")
for line, message in sorted(file_violations):
print(f" \033[96mL{line}:\033[0m {message}")
total_violations += 1
print("-" * 20)
print(f"\n\033[1m\033[91mSummary: Found {total_violations} total violations in {len(violations)} files.\033[0m")
if __name__ == "__main__":
main()
How to Run the Script
- Save the file as
audit_core_dependencies.pyin your project's root directory. - Run from your terminal:
# Scan the default 'src/biz_bud' directory python audit_core_dependencies.py # Scan a different directory python audit_core_dependencies.py path/to/your/code
How It Works and What It Detects
This script defines a series of "anti-patterns" and then checks your code for them.
-
Logging (
DISALLOWED_IMPORTS):- Anti-Pattern:
import logging - Why: Your custom logging in
biz_bud.logging.unified_loggingandservices.logger_factoryis designed to provide structured, context-aware logs. Using the standardlogginglibrary directly bypasses this, leading to inconsistent log formats and loss of valuable context like trace IDs or node names. - Detection: The script flags any file that directly imports the
loggingmodule.
- Anti-Pattern:
-
Errors (
DISALLOWED_EXCEPTIONS):- Anti-Pattern:
raise ValueError("...")orexcept Exception: - Why: Your
core.errorsframework is built to create a predictable, structured error handling system. Raising generic exceptions bypasses your custom error types (BusinessBuddyError), telemetry, and routing logic. This leads to unhandled crashes and makes it difficult to implement targeted recovery strategies. - Detection: The
visit_Raisemethod checks if the code is raising a standard, built-in exception instead of a custom one.
- Anti-Pattern:
-
HTTP & APIs (
DISALLOWED_IMPORTS):- Anti-Pattern:
import requestsorimport httpx - Why: Your
core.networking.http_client.HTTPClientprovides a centralized, singleton session manager with built-in retry logic, timeouts, and potentially unified headers or proxy configurations. Using external HTTP libraries directly fragments this logic, leading to inconsistent network behavior and making it harder to manage connections. - Detection: Flags any file importing
requests,httpx, oraiohttp.
- Anti-Pattern:
-
Tools, Services, and Language Models (
DISALLOWED_INSTANTIATIONS):- Anti-Pattern:
from biz_bud.tools.clients import TavilyClient; client = TavilyClient() - Why: Your
ServiceFactoryis the single source of truth for creating and managing the lifecycle of services. It handles singleton behavior, dependency injection, and centralized configuration. Bypassing it means you might have multiple instances of a service (e.g., multiple database connection pools), services without proper configuration, or services that don't get cleaned up correctly. - Detection: The script first identifies direct imports of service or client classes and then uses
visit_Callto check if they are being instantiated directly.
- Anti-Pattern:
-
State Reducers (
visit_Assign,visit_Call):- Anti-Pattern:
state['key'] = valueorstate.update({...}) - Why: Your architecture appears to be moving towards immutable state updates (as hinted by
core/langgraph/state_immutability.pyand the concept of reducers). Direct mutation of the state dictionary is an anti-pattern because it can lead to unpredictable side effects, making the graph's flow difficult to trace and debug. Using aStateUpdaterclass or reducers ensures that state changes are explicit and traceable. - Detection: The script specifically looks for assignment to
state[...]or calls tostate.update().
- Anti-Pattern:
-
Concurrency (
visit_Call):- Anti-Pattern:
asyncio.gather(...) - Why: Your
gather_with_concurrencywrapper incore.networking.async_utilslikely adds a semaphore or other logic to limit the number of concurrent tasks. Callingasyncio.gatherdirectly bypasses this control, which can lead to overwhelming external APIs with too many requests, hitting rate limits, or exhausting system resources. - Detection: The script looks for direct calls to
asyncio.gather.
- Anti-Pattern:
This script provides a powerful, automated first line of defense to enforce your architectural standards and significantly reduce the classes of bugs you asked about.