The examples below show metadata tagging in Open WebUI and in Weaviate, that is, how to attach tags and structured metadata to knowledge objects for RAG and semantic search.
Example: Metadata Tagging in Open WebUI
You send a document to the Open WebUI API endpoint, attaching metadata and tags in the content field as a JSON string:
POST http://localhost/api/v1/documents/create
Content-Type: application/json
{
  "name": "policy_doc_2022",
  "title": "2022 Policy Handbook",
  "collection_name": "company_handbooks",
  "filename": "policy_2022.pdf",
  "content": "{\"tags\": [\"policy\", \"2022\", \"hr\"], \"source_url\": \"https://example.com/policy_2022.pdf\", \"author\": \"Jane Doe\"}"
}
- The "tags" field is a list of classification labels (policy, 2022, hr).
- The "source_url" and "author" fields record provenance metadata that can support retrieval, audit, and filtering.[1][2]
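Because the content field must be a JSON string rather than a nested object, building it with json.dumps avoids hand-escaping the quotes. The following is a minimal sketch using only the Python standard library. It assumes that your Open WebUI version exposes the documents/create endpoint shown above and that it accepts an API key as a bearer token; both are assumptions to check against your instance. build_document_payload and create_document are hypothetical helper names.

```python
import json
import urllib.request

# Assumption: a running Open WebUI instance at this base URL that exposes the
# documents/create endpoint shown above.
BASE_URL = "http://localhost"


def build_document_payload(name, title, collection_name, filename, metadata):
    """Build the request body; `content` carries the metadata as a JSON string."""
    return {
        "name": name,
        "title": title,
        "collection_name": collection_name,
        "filename": filename,
        # json.dumps produces the escaped string, so no manual backslashes.
        "content": json.dumps(metadata),
    }


def create_document(payload, token):
    """POST the payload; the bearer-token header is an assumption about auth."""
    request = urllib.request.Request(
        f"{BASE_URL}/api/v1/documents/create",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


payload = build_document_payload(
    name="policy_doc_2022",
    title="2022 Policy Handbook",
    collection_name="company_handbooks",
    filename="policy_2022.pdf",
    metadata={
        "tags": ["policy", "2022", "hr"],
        "source_url": "https://example.com/policy_2022.pdf",
        "author": "Jane Doe",
    },
)
# create_document(payload, token="YOUR_API_KEY")  # needs a live server
```

Keeping payload construction separate from the network call makes the encoding easy to check without a server.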
For pipeline-based ingestion, you might design a function to extract and append metadata before vectorization:
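# Hypothetical pipeline step: chunk, document_url, and document_author come
# from your own ingestion code, and embed_with_metadata is a placeholder for
# whatever function embeds the chunk and stores it with its metadata.
metadata = {
    "tags": ["policy", "2022"],
    "source_url": document_url,
    "author": document_author,
}
embed_with_metadata(chunk, metadata)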
This metadata becomes part of your retrieval context in RAG workflows.[1]
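The pipeline snippet leaves embed_with_metadata undefined. The following is a minimal sketch of one possible implementation. It assumes a hypothetical in-memory store and uses a toy bag-of-words vector in place of a real embedding model. It shows metadata stored alongside each chunk, used to filter retrieval, and then surfaced in the RAG context. STORE, embed, retrieve, and format_context are illustrative names, not part of Open WebUI.

```python
import math
from collections import Counter

# Hypothetical in-memory store standing in for a real vector database.
STORE = []


def embed(text):
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def embed_with_metadata(chunk, metadata):
    """Store the chunk's vector together with a copy of its metadata."""
    STORE.append({"text": chunk, "vector": embed(chunk), "metadata": dict(metadata)})


def retrieve(query, tag=None, k=3):
    """Rank stored chunks by similarity, optionally keeping only one tag."""
    query_vector = embed(query)
    candidates = [
        entry for entry in STORE
        if tag is None or tag in entry["metadata"].get("tags", [])
    ]
    ranked = sorted(candidates, key=lambda e: cosine(query_vector, e["vector"]), reverse=True)
    return ranked[:k]


def format_context(hits):
    """Prefix each retrieved chunk with its source URL for the LLM prompt."""
    return "\n".join(
        f"[source: {hit['metadata'].get('source_url', 'unknown')}] {hit['text']}"
        for hit in hits
    )


embed_with_metadata(
    "Employees accrue vacation days monthly.",
    {"tags": ["policy", "2022", "hr"],
     "source_url": "https://example.com/policy_2022.pdf",
     "author": "Jane Doe"},
)
embed_with_metadata("Expense reports are due within 30 days.", {"tags": ["finance"]})
print(format_context(retrieve("how many vacation days", tag="hr")))
```

A real deployment would replace STORE and embed with a vector database and an embedding model. The pattern of filtering on stored metadata before ranking stays the same.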
Example: Metadata Tagging in Weaviate
In Weaviate, metadata and tags are defined directly in the schema and attached to each object when added:
Schema definition:
{
  "class": "Document",
  "properties": [
    {"name": "title", "dataType": ["text"]},
    {"name": "tags", "dataType": ["text[]"]},
    {"name": "source_url", "dataType": ["text"]},
    {"name": "author", "dataType": ["text"]}
  ]
}
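The class has to exist before objects can be added to it. The following minimal sketch registers the schema above through the Weaviate Python client v3 (client.schema.get and client.schema.create_class). ensure_document_class is a hypothetical helper that skips creation when the class is already present, so the step is safe to rerun.

```python
# The schema from the example above, as a Python dict.
DOCUMENT_CLASS = {
    "class": "Document",
    "properties": [
        {"name": "title", "dataType": ["text"]},
        {"name": "tags", "dataType": ["text[]"]},
        {"name": "source_url", "dataType": ["text"]},
        {"name": "author", "dataType": ["text"]},
    ],
}


def ensure_document_class(client, class_obj=DOCUMENT_CLASS):
    """Create the class unless it exists; returns True if it was created.

    Uses the Weaviate Python client v3 schema API.
    """
    existing = {c["class"] for c in client.schema.get().get("classes", [])}
    if class_obj["class"] in existing:
        return False
    client.schema.create_class(class_obj)
    return True


# Usage (v3 client; a local instance on port 8080 is an assumption):
# import weaviate
# client = weaviate.Client("http://localhost:8080")
# ensure_document_class(client)
```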
Object creation example:
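import weaviate

# Weaviate Python client v3 API; assumes a local instance on port 8080.
client = weaviate.Client("http://localhost:8080")

client.data_object.create(
    data_object={
        "title": "2022 Policy Handbook",
        "tags": ["policy", "2022", "hr"],
        "source_url": "https://example.com/policy_2022.pdf",
        "author": "Jane Doe",
    },
    class_name="Document",
)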
- The "tags" field is a text array, suited to filtering and faceting by tag.
- The other fields store provenance metadata, supporting advanced queries and data governance.[3][4][5]
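To load many documents at once, the v3 client also provides a batch API. The sketch below assumes that API, where client.batch is used as a context manager together with add_data_object. normalize_tags and import_documents are hypothetical helpers. Lowercasing and de-duplicating tags is a design choice: it keeps stored values consistent regardless of the property's tokenization settings.

```python
def normalize_tags(tags):
    """Lowercase, strip, and de-duplicate tags, keeping first-seen order."""
    cleaned = []
    for tag in tags:
        value = tag.strip().lower()
        if value and value not in cleaned:
            cleaned.append(value)
    return cleaned


def import_documents(client, documents, class_name="Document"):
    """Batch-import documents with the Weaviate Python client v3 batch API.

    The batch is flushed when the `with` block exits.
    """
    count = 0
    with client.batch as batch:
        for doc in documents:
            # Copy the document, replacing its tags with the normalized list.
            properties = dict(doc, tags=normalize_tags(doc.get("tags", [])))
            batch.add_data_object(properties, class_name)
            count += 1
    return count


# Usage (v3 client):
# import_documents(client, [{"title": "2022 Policy Handbook",
#                            "tags": ["Policy", "HR", "policy"]}])
```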
Query with metadata filtering:
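# v3 client: filters are passed with .with_where(), and the value key names
# the value type ("valueTextArray" for a list of text values).
result = (
    client.query
    .get("Document", ["title", "tags", "author"])
    .with_where({
        "path": ["tags"],
        "operator": "ContainsAny",
        "valueTextArray": ["policy", "hr"],
    })
    .do()
)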
This returns documents tagged with "policy", "hr", or both.[4][3]
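Tag filters can also be combined with provenance fields. The helpers below (hypothetical names) build v3-client where filters. Switching to ContainsAll requires every listed tag, and an And operand list narrows results to one author. Whether Equal on a text property matches the whole string or individual tokens depends on the property's tokenization setting, so treat the author filter as an assumption to verify.

```python
def tag_filter(tags, match_all=False):
    """Where filter on the tags array: ContainsAny, or ContainsAll if match_all."""
    return {
        "path": ["tags"],
        "operator": "ContainsAll" if match_all else "ContainsAny",
        "valueTextArray": list(tags),
    }


def author_filter(author):
    """Where filter on the author property."""
    return {"path": ["author"], "operator": "Equal", "valueText": author}


def combine(*filters):
    """Join filters with And; a single filter is returned unchanged."""
    if len(filters) == 1:
        return filters[0]
    return {"operator": "And", "operands": list(filters)}


where = combine(tag_filter(["policy", "hr"]), author_filter("Jane Doe"))

# Usage with the v3 client:
# result = (
#     client.query
#     .get("Document", ["title", "tags", "author"])
#     .with_where(where)
#     .do()
# )
```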
Both platforms support metadata tagging for documents. Tags and provenance fields attached at ingestion can narrow retrieval through filters and give a RAG pipeline source context for each retrieved chunk.[5][2][3][4][1]