
Charis 68cb1a1870 feat(content api): add management api references to semantic search (#36289)

* docs: add cursor rule for embedding generation process

Add documentation for cursor IDE about how docs embeddings are generated,
including the workflow for creating and uploading semantic search content.

* feat: improve API reference metadata upload with descriptive content

- Add preembeddings script to run codegen before embedding generation
- Enhance OpenApiReferenceSource to generate more descriptive content including
  parameters, responses, path information, and better structured documentation

* feat: add Management API references to searchDocs GraphQL query

- Add ManagementApiReference GraphQL type and model for API endpoint search results
- Integrate Management API references into global search results
- Update test snapshots and add comprehensive test coverage for Management API search

* style: format

2025-06-18 09:12:03 -04:00

2.5 KiB

Raw Permalink Blame History

Documentation Embeddings Generation System

Overview

The documentation embeddings generation system processes several kinds of documentation sources and uploads their metadata to a database to support semantic search. The system lives in apps/docs/scripts/search/ and works by:

Discovering content sources from multiple types of documentation
Processing content into structured sections with checksums
Generating embeddings using OpenAI's text-embedding-ada-002 model
Storing the results in the database with vector embeddings for semantic search

Architecture

Main Entry Point

generate-embeddings.ts - Main script that orchestrates the entire process
Supports --refresh flag to force regeneration of all content
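
A minimal sketch of how a --refresh flag could interact with checksum-based change detection. Only the flag name comes from this document; `CliOptions`, `parseCliOptions`, and `shouldRegenerate` are hypothetical names for illustration, not the script's actual API.

```typescript
// Hypothetical option shape; only the --refresh flag is taken from the document.
interface CliOptions {
  refresh: boolean
}

// Parse an argument list (for example process.argv.slice(2)) into options.
function parseCliOptions(args: string[]): CliOptions {
  return { refresh: args.includes('--refresh') }
}

// Decide whether a page needs new embeddings.
// With --refresh every page is regenerated; otherwise only pages whose
// checksum is new or differs from the stored one.
function shouldRegenerate(
  options: CliOptions,
  storedChecksum: string | null,
  currentChecksum: string
): boolean {
  if (options.refresh) return true
  return storedChecksum !== currentChecksum
}
```

Taking the argument list as a parameter, rather than reading it from a global, keeps the decision logic easy to test in isolation.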

Content Sources (`sources/` directory)

Base Classes

BaseLoader - Abstract class for loading content from different sources
BaseSource - Abstract class for processing and formatting content
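
The split between loaders and sources might look roughly like the sketch below. The class names BaseLoader and BaseSource come from this document, but every method, field, and the `InlineTextSource` / `InlineLoader` subclasses are assumptions made for illustration. The real classes also produce a checksum, which is omitted here.

```typescript
// One processed section of a page (field names are illustrative).
interface Section {
  heading?: string
  content: string
}

interface ProcessedContent {
  meta: Record<string, string>
  sections: Section[]
}

// Assumed shape: a source knows its path and how to turn raw content into sections.
abstract class BaseSource {
  readonly path: string
  constructor(path: string) {
    this.path = path
  }
  abstract process(): ProcessedContent
}

// Assumed shape: a loader discovers content and returns sources to process.
abstract class BaseLoader {
  abstract load(): Promise<BaseSource[]>
}

// Hypothetical source that treats blank-line-separated paragraphs as sections.
class InlineTextSource extends BaseSource {
  private readonly title: string
  private readonly text: string
  constructor(path: string, title: string, text: string) {
    super(path)
    this.title = title
    this.text = text
  }
  process(): ProcessedContent {
    const sections = this.text
      .split(/\n{2,}/)
      .map((chunk) => chunk.trim())
      .filter((chunk) => chunk.length > 0)
      .map((content) => ({ content }))
    return { meta: { title: this.title }, sections }
  }
}

// Hypothetical loader that simply hands back sources it was given.
class InlineLoader extends BaseLoader {
  private readonly sources: BaseSource[]
  constructor(sources: BaseSource[]) {
    super()
    this.sources = sources
  }
  async load(): Promise<BaseSource[]> {
    return this.sources
  }
}
```

With this shape, adding a new kind of documentation means writing one loader and one source subclass, while the orchestration script only depends on the two abstract types.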

Source Types

Markdown Sources (markdown.ts)
- Processes .mdx files from guides and documentation
- Extracts frontmatter metadata and content sections
Reference Documentation (reference-doc.ts)
- OpenAPI References - Management API documentation from OpenAPI specs
- Client Library References - JavaScript, Dart, Python, C#, Swift, Kotlin SDKs
- CLI References - Command-line interface documentation
- Processes YAML/JSON specs and matches them with common sections
GitHub Discussions (github-discussion.ts)
- Fetches troubleshooting discussions from GitHub using GraphQL API
- Uses GitHub App authentication for access
Partner Integrations (partner-integrations.ts)
- Fetches approved partner integration documentation from Supabase database
- Technology integrations only (excludes agencies)
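
As an illustration of the Markdown source listed above, the following sketch extracts frontmatter and splits the body into sections by heading. It is a simplified stand-in: `parseMarkdown` is a hypothetical helper, it only handles flat `key: value` frontmatter, and it does not account for `#` lines inside fenced code blocks. A real .mdx pipeline would more likely use a proper Markdown/MDX parser.

```typescript
interface MarkdownSection {
  heading: string | null
  content: string
}

interface ParsedMarkdown {
  meta: Record<string, string>
  sections: MarkdownSection[]
}

// Hypothetical helper: split frontmatter from the body, then cut the body at headings.
function parseMarkdown(source: string): ParsedMarkdown {
  const meta: Record<string, string> = {}
  let body = source

  // Frontmatter is the block between the leading pair of '---' lines.
  const match = /^---\n([\s\S]*?)\n---\n?/.exec(source)
  if (match) {
    for (const line of match[1].split('\n')) {
      const idx = line.indexOf(':')
      if (idx === -1) continue
      meta[line.slice(0, idx).trim()] = line.slice(idx + 1).trim()
    }
    body = source.slice(match[0].length)
  }

  const sections: MarkdownSection[] = []
  let current: MarkdownSection = { heading: null, content: '' }
  const flush = () => {
    // Skip an empty preamble, but keep headed sections even if they are empty.
    if (current.heading !== null || current.content.trim() !== '') {
      sections.push({ heading: current.heading, content: current.content.trim() })
    }
  }

  for (const line of body.split('\n')) {
    const heading = /^#{1,6}\s+(.*)$/.exec(line)
    if (heading) {
      flush()
      current = { heading: heading[1].trim(), content: '' }
    } else {
      current.content += line + '\n'
    }
  }
  flush()

  return { meta, sections }
}
```

Text that appears before the first heading becomes a section with a `null` heading, so introductory paragraphs are still indexed.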

Processing Flow

Content Discovery: Each source loader discovers and loads content files/data
Content Processing: Each source processes content into:
- Checksum for change detection
- Metadata (title, subtitle, etc.)
- Sections with headings and content
Change Detection: Compares checksums against existing database records
Embedding Generation: Uses OpenAI to generate embeddings for new/changed content
Database Storage: Stores pages in the page table and their sections, with embeddings, in the page_section table
Cleanup: Removes outdated pages using version tracking
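
The change detection and cleanup steps above can be sketched with an in-memory stand-in for the page table. Everything here is an assumption for illustration: `syncPages`, `PageRecord`, the SHA-256 checksum, and the use of a per-run version string to spot stale pages are not confirmed by this document, and the embedding call itself is only marked with a comment.

```typescript
import { createHash } from 'node:crypto'

// Hypothetical in-memory stand-in for a row in the page table.
interface PageRecord {
  path: string
  checksum: string
  version: string
}

// Assumed checksum scheme: SHA-256 over the raw content.
function checksumOf(content: string): string {
  return createHash('sha256').update(content).digest('hex')
}

// Run one sync pass. `version` identifies this run; any page not touched
// during the run keeps an older version and is removed at the end.
function syncPages(
  store: Map<string, PageRecord>,
  sources: { path: string; content: string }[],
  version: string
): { embedded: string[]; removed: string[] } {
  const embedded: string[] = []

  for (const source of sources) {
    const checksum = checksumOf(source.content)
    const existing = store.get(source.path)

    if (existing && existing.checksum === checksum) {
      // Unchanged: skip embedding, just mark the page as seen in this run.
      existing.version = version
      continue
    }

    // New or changed: this is where embeddings would be generated.
    embedded.push(source.path)
    store.set(source.path, { path: source.path, checksum, version })
  }

  // Cleanup: drop pages that were not seen in this run.
  const removed: string[] = []
  store.forEach((record, path) => {
    if (record.version !== version) {
      store.delete(path)
      removed.push(path)
    }
  })

  return { embedded, removed }
}
```

Running two passes with different version strings shows the intent: a page whose content is unchanged is skipped, a new page is embedded, and a page that has disappeared from the sources is removed.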

Database Schema

page table: Stores page metadata, content, checksum, version
page_section table: Stores individual sections with embeddings, token counts
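
For orientation, the two tables could be modeled as the row types below. The column names are guesses derived from the descriptions above (metadata, content, checksum, version, embeddings, token counts); `pageId`, `path`, and `heading` in particular are assumptions. The 1536-element length is the output size of text-embedding-ada-002.

```typescript
// Hypothetical row shapes; the actual column names are not given in this document.
interface PageRow {
  id: number
  path: string // assumption: a unique path per page
  meta: Record<string, unknown>
  content: string
  checksum: string
  version: string
}

interface PageSectionRow {
  id: number
  pageId: number // assumption: foreign key to PageRow.id
  heading: string | null
  content: string
  embedding: number[]
  tokenCount: number
}

// text-embedding-ada-002 returns vectors with 1536 dimensions.
const ADA_002_DIMENSIONS = 1536

// Basic sanity check before writing a section row.
function isStorableSection(row: PageSectionRow): boolean {
  return (
    row.embedding.length === ADA_002_DIMENSIONS &&
    Number.isInteger(row.tokenCount) &&
    row.tokenCount > 0
  )
}
```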
