
Vectorization and Semantic Search: Complete Guide

Semantic search transforms how we find information by understanding meaning rather than just matching keywords. At its core is vectorization—converting text into numerical representations that capture semantic meaning, enabling machines to understand context, similarity, and relationships between pieces of information.

What you’ll learn: How to implement semantic search systems using embeddings and vector databases, from basic concepts to production-ready implementations.

Use cases:

  • Document search that understands user intent
  • Recommendation systems based on content similarity
  • Duplicate detection and content deduplication
  • Question answering and RAG systems
  • Multi-language search without translation

Time to complete: 60-75 minutes

Required knowledge:

  • Python 3.9+
  • Basic understanding of vectors and linear algebra
  • Familiarity with APIs and databases
  • Understanding of how LLMs work

Required accounts/tools:

  • OpenAI API key (for embeddings)
  • Optional: Pinecone, Weaviate, or Qdrant account for cloud vector databases
  • Python environment with pip

Optional but helpful:

  • Understanding of cosine similarity
  • Experience with NumPy
  • Knowledge of database systems

Embeddings are dense vector representations of data that capture semantic meaning. Similar concepts have similar vectors.

# Example: Words with similar meanings have similar vectors
"king"[0.2, 0.5, 0.8, ...] (1536 dimensions)
"queen"[0.25, 0.48, 0.82, ...] # Very similar!
"car"[-0.3, 0.1, -0.2, ...] # Very different
# Vector arithmetic captures relationships
king - man + woman ≈ queen
paris - france + italy ≈ rome
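
The analogy arithmetic above is only approximate for modern embedding APIs, but you can sanity-check the idea yourself. A minimal sketch, assuming an OpenAI API key is configured; it reuses the create_embedding helper defined later in this guide:

import numpy as np

# Embed the words, then test whether king - man + woman lands
# closer to "queen" than to an unrelated word.
words = ["king", "man", "woman", "queen", "car"]
vecs = {w: np.array(create_embedding(w)) for w in words}

analogy = vecs["king"] - vecs["man"] + vecs["woman"]

def cos(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print("similarity to 'queen':", round(cos(analogy, vecs["queen"]), 3))
print("similarity to 'car':  ", round(cos(analogy, vecs["car"]), 3))
# Expect the 'queen' similarity to be noticeably higher, though the
# classic word2vec-style arithmetic is only approximate for these models.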

Traditional keyword search:

Query: "How do I fix a leaky faucet?"
Matches: Documents containing "fix", "leaky", "faucet"
Misses: "Repairing a dripping tap" (different words, same meaning)

Semantic search with embeddings:

Query embedding: [0.3, 0.7, ...]
Similar documents:
- "Repairing a dripping tap" (similarity: 0.92)
- "Fix leaky faucet guide" (similarity: 0.95)
- "Plumbing maintenance tips" (similarity: 0.78)

Generating an embedding with the OpenAI API:

from openai import OpenAI
import numpy as np

client = OpenAI()

def create_embedding(text: str, model: str = "text-embedding-3-small") -> list[float]:
    """
    Create an embedding for text using OpenAI's API.

    Args:
        text: Input text to embed
        model: Embedding model to use

    Returns:
        List of floats representing the embedding vector
    """
    # Clean text
    text = text.replace("\n", " ").strip()

    # Create embedding
    response = client.embeddings.create(
        model=model,
        input=text
    )
    return response.data[0].embedding

# Example usage
text = "How do I build a semantic search system?"
embedding = create_embedding(text)

print(f"Embedding dimension: {len(embedding)}")
print(f"First 5 values: {embedding[:5]}")
# Output:
# Embedding dimension: 1536
# First 5 values: [0.0034, -0.0182, 0.0093, -0.0067, 0.0156]

OpenAI's current embedding models, with a simple helper for choosing between them:

from typing import Dict

EMBEDDING_MODELS = {
    "text-embedding-3-small": {
        "dimension": 1536,
        "cost_per_1m_tokens": 0.02,
        "max_tokens": 8191,
        "use_case": "General purpose, cost-effective"
    },
    "text-embedding-3-large": {
        "dimension": 3072,
        "cost_per_1m_tokens": 0.13,
        "max_tokens": 8191,
        "use_case": "Higher quality, more expensive"
    },
    "text-embedding-ada-002": {
        "dimension": 1536,
        "cost_per_1m_tokens": 0.10,
        "max_tokens": 8191,
        "use_case": "Legacy model, being phased out"
    }
}

def choose_embedding_model(
    quality_priority: str = "balanced",  # "cost", "balanced", "quality"
    volume: str = "medium"               # "low", "medium", "high"
) -> str:
    """Choose an appropriate embedding model based on requirements"""
    if quality_priority == "cost" or volume == "high":
        return "text-embedding-3-small"
    elif quality_priority == "quality":
        return "text-embedding-3-large"
    else:
        return "text-embedding-3-small"  # Default: best balance

# Usage
model = choose_embedding_model(quality_priority="cost", volume="high")
print(f"Recommended model: {model}")
print(f"Specs: {EMBEDDING_MODELS[model]}")

Embedding many documents is cheaper and faster when you batch them into a single API call:

from typing import List
import time

def create_embeddings_batch(
    texts: List[str],
    model: str = "text-embedding-3-small",
    batch_size: int = 100
) -> List[List[float]]:
    """
    Create embeddings for multiple texts efficiently.

    Args:
        texts: List of texts to embed
        model: Embedding model to use
        batch_size: Number of texts to embed per API call

    Returns:
        List of embedding vectors
    """
    all_embeddings = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]

        # Clean texts
        cleaned_batch = [text.replace("\n", " ").strip() for text in batch]

        # Create embeddings
        response = client.embeddings.create(
            model=model,
            input=cleaned_batch
        )

        # Extract embeddings
        batch_embeddings = [item.embedding for item in response.data]
        all_embeddings.extend(batch_embeddings)

        # Rate limiting (if needed)
        if i + batch_size < len(texts):
            time.sleep(0.1)

    return all_embeddings

# Example: Embed 1000 documents efficiently
documents = [f"Document {i} about various topics" for i in range(1000)]

start_time = time.time()
embeddings = create_embeddings_batch(documents, batch_size=100)
elapsed = time.time() - start_time

print(f"Embedded {len(documents)} documents in {elapsed:.2f}s")
print(f"Rate: {len(documents)/elapsed:.1f} docs/sec")

Cosine similarity is the most common metric for comparing embeddings; it measures the angle between two vectors rather than their magnitude.

import numpy as np
from numpy.linalg import norm

def cosine_similarity(vec1: List[float], vec2: List[float]) -> float:
    """
    Calculate cosine similarity between two vectors.

    Returns a value between -1 and 1:
     1: Identical direction (most similar)
     0: Orthogonal (unrelated)
    -1: Opposite direction (most dissimilar)
    """
    vec1_np = np.array(vec1)
    vec2_np = np.array(vec2)
    return np.dot(vec1_np, vec2_np) / (norm(vec1_np) * norm(vec2_np))

# Example
text1 = "I love programming in Python"
text2 = "Python is my favorite programming language"
text3 = "I enjoy cooking Italian food"

emb1 = create_embedding(text1)
emb2 = create_embedding(text2)
emb3 = create_embedding(text3)

sim_1_2 = cosine_similarity(emb1, emb2)
sim_1_3 = cosine_similarity(emb1, emb3)

print(f"Similarity (text1, text2): {sim_1_2:.4f}")  # High (~0.85)
print(f"Similarity (text1, text3): {sim_1_3:.4f}")  # Low (~0.30)

def euclidean_distance(vec1: List[float], vec2: List[float]) -> float:
    """
    Calculate Euclidean (L2) distance.
    Lower values = more similar.
    """
    vec1_np = np.array(vec1)
    vec2_np = np.array(vec2)
    return np.linalg.norm(vec1_np - vec2_np)

def dot_product_similarity(vec1: List[float], vec2: List[float]) -> float:
    """
    Calculate dot product similarity.
    Higher values = more similar.
    """
    return np.dot(vec1, vec2)

def manhattan_distance(vec1: List[float], vec2: List[float]) -> float:
    """
    Calculate Manhattan (L1) distance.
    Lower values = more similar.
    """
    vec1_np = np.array(vec1)
    vec2_np = np.array(vec2)
    return np.sum(np.abs(vec1_np - vec2_np))

# Comparison
metrics = {
    "Cosine Similarity": cosine_similarity(emb1, emb2),
    "Euclidean Distance": euclidean_distance(emb1, emb2),
    "Dot Product": dot_product_similarity(emb1, emb2),
    "Manhattan Distance": manhattan_distance(emb1, emb2)
}

for metric, value in metrics.items():
    print(f"{metric}: {value:.4f}")

When to use each metric:

  • Cosine Similarity: Best for most text applications (direction matters more than magnitude)
  • Euclidean Distance: When magnitude matters (e.g., image embeddings)
  • Dot Product: Fast approximation when vectors are normalized (see the sketch after this list)
  • Manhattan Distance: Robust to outliers, interpretable
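
To see why dot product is a fast stand-in for cosine similarity once vectors are normalized, here is a small sketch; it reuses emb1, emb2, and cosine_similarity from the code above, and the explicit normalization step is the only addition:

import numpy as np

# After L2-normalization, cosine similarity and dot product coincide,
# because both norms in the cosine denominator become 1.
a = np.array(emb1)
b = np.array(emb2)

a_norm = a / np.linalg.norm(a)
b_norm = b / np.linalg.norm(b)

print(np.dot(a_norm, b_norm))         # dot product of normalized vectors
print(cosine_similarity(emb1, emb2))  # same value (up to floating point)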

Naive search (computing similarity for all vectors):

def naive_search(query_embedding: List[float], all_embeddings: List[List[float]], k: int = 5):
    """O(n) complexity - slow for large datasets"""
    similarities = []
    for i, emb in enumerate(all_embeddings):
        sim = cosine_similarity(query_embedding, emb)
        similarities.append((i, sim))
    similarities.sort(key=lambda x: x[1], reverse=True)
    return similarities[:k]

# For 1M vectors: ~seconds per query ❌

Vector database (using approximate nearest neighbor algorithms):

# Same query on vector DB: ~milliseconds per query ✅
# Uses HNSW, IVF, or similar algorithms for fast search
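
For a feel of the difference, here is a minimal sketch comparing an exact index with an HNSW index using FAISS. Assumptions: the faiss-cpu package is installed, the data is random, and the dimension is kept small just so the demo builds quickly:

import time
import numpy as np
import faiss  # pip install faiss-cpu

d = 256                                               # smaller than real embeddings, to keep the demo quick
xb = np.random.rand(100_000, d).astype("float32")     # "database" vectors
xq = np.random.rand(100, d).astype("float32")         # 100 query vectors

# Exact brute-force index: every query is compared against every vector
flat = faiss.IndexFlatL2(d)
flat.add(xb)

# HNSW index: slower to build, but approximate queries are much faster
hnsw = faiss.IndexHNSWFlat(d, 32)                     # 32 = links per graph node
hnsw.add(xb)

for name, index in [("flat (exact)", flat), ("hnsw (approximate)", hnsw)]:
    start = time.time()
    distances, ids = index.search(xq, 5)              # top-5 neighbors for each query
    print(f"{name}: {time.time() - start:.4f}s for {len(xq)} queries")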

Popular vector databases at a glance:

Database | Type              | Best For                           | Key Features
ChromaDB | Embedded          | Development, small-medium datasets | Easy to use, embedded, open-source
Pinecone | Cloud             | Production, managed solution       | Fully managed, scalable, simple API
Weaviate | Self-hosted/Cloud | ML-native applications             | GraphQL, hybrid search, modular
Qdrant   | Self-hosted/Cloud | High performance, filtering        | Rust-based, fast, rich filters
Milvus   | Self-hosted       | Large scale, enterprise            | Highly scalable, multi-cloud
FAISS    | Library           | Research, custom solutions         | Facebook's library, very fast

Getting started with ChromaDB:

import chromadb

# Initialize a persistent local client
chroma_client = chromadb.PersistentClient(path="./chroma_db")

# Create collection
collection = chroma_client.create_collection(
    name="my_documents",
    metadata={"description": "Document embeddings for semantic search"}
)

# Add documents
documents = [
    "Python is a high-level programming language",
    "Machine learning is a subset of artificial intelligence",
    "Neural networks are inspired by biological brains",
    "Data science combines statistics and programming"
]

# ChromaDB can auto-generate embeddings (using its default model)
# or you can provide your own
from openai import OpenAI
openai_client = OpenAI()

embeddings = []
for doc in documents:
    response = openai_client.embeddings.create(
        model="text-embedding-3-small",
        input=doc
    )
    embeddings.append(response.data[0].embedding)

# Add to collection
collection.add(
    documents=documents,
    embeddings=embeddings,
    ids=[f"doc_{i}" for i in range(len(documents))],
    metadatas=[{"source": "manual", "index": i} for i in range(len(documents))]
)

# Query
query = "What is Python used for?"
query_embedding = openai_client.embeddings.create(
    model="text-embedding-3-small",
    input=query
).data[0].embedding

results = collection.query(
    query_embeddings=[query_embedding],
    n_results=2
)

print("Query:", query)
print("\nTop results:")
for i, (doc, distance) in enumerate(zip(results['documents'][0], results['distances'][0])):
    print(f"{i+1}. {doc}")
    print(f"   Distance: {distance:.4f}\n")

The same documents in Pinecone, a fully managed cloud option:

from pinecone import Pinecone, ServerlessSpec
import os

# Initialize Pinecone
pc = Pinecone(api_key=os.getenv("PINECONE_API_KEY"))

# Create index
index_name = "semantic-search-demo"

if index_name not in pc.list_indexes().names():
    pc.create_index(
        name=index_name,
        dimension=1536,  # text-embedding-3-small dimension
        metric="cosine",
        spec=ServerlessSpec(
            cloud="aws",
            region="us-east-1"
        )
    )

# Connect to index
index = pc.Index(index_name)

# Prepare vectors for upsert
vectors_to_upsert = [
    {
        "id": f"doc_{i}",
        "values": embedding,
        "metadata": {
            "text": doc,
            "source": "manual",
            "index": i
        }
    }
    for i, (doc, embedding) in enumerate(zip(documents, embeddings))
]

# Upsert vectors
index.upsert(vectors=vectors_to_upsert)

# Query
query_embedding = openai_client.embeddings.create(
    model="text-embedding-3-small",
    input="What is Python used for?"
).data[0].embedding

results = index.query(
    vector=query_embedding,
    top_k=2,
    include_metadata=True
)

print("Top results from Pinecone:")
for match in results['matches']:
    print(f"Score: {match['score']:.4f}")
    print(f"Text: {match['metadata']['text']}\n")

And in Qdrant, which can run locally or in the cloud:

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

# Initialize client (local, file-based)
qdrant = QdrantClient(path="./qdrant_db")

# Or cloud:
# qdrant = QdrantClient(
#     url="https://your-cluster.qdrant.io",
#     api_key=os.getenv("QDRANT_API_KEY")
# )

collection_name = "documents"

# Create collection
qdrant.create_collection(
    collection_name=collection_name,
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE)
)

# Prepare points
points = [
    PointStruct(
        id=i,
        vector=embedding,
        payload={
            "text": doc,
            "source": "manual",
            "index": i
        }
    )
    for i, (doc, embedding) in enumerate(zip(documents, embeddings))
]

# Upload points
qdrant.upsert(
    collection_name=collection_name,
    points=points
)

# Search
search_result = qdrant.search(
    collection_name=collection_name,
    query_vector=query_embedding,
    limit=2
)

print("Top results from Qdrant:")
for hit in search_result:
    print(f"Score: {hit.score:.4f}")
    print(f"Text: {hit.payload['text']}\n")

Combine traditional keyword search with semantic search for best results.

from rank_bm25 import BM25Okapi
import numpy as np

class HybridSearch:
    def __init__(self, documents: List[str], embeddings: List[List[float]]):
        self.documents = documents
        self.embeddings = np.array(embeddings)

        # Initialize BM25 for keyword search
        tokenized_docs = [doc.lower().split() for doc in documents]
        self.bm25 = BM25Okapi(tokenized_docs)

    def search(
        self,
        query: str,
        query_embedding: List[float],
        k: int = 5,
        alpha: float = 0.5  # Weight for semantic vs keyword (0=all keyword, 1=all semantic)
    ):
        """
        Perform hybrid search combining semantic and keyword search.

        Args:
            query: Search query text
            query_embedding: Embedding of the query
            k: Number of results to return
            alpha: Balance between semantic (1.0) and keyword (0.0) search

        Returns:
            List of (index, score, document) tuples
        """
        # Semantic search scores (cosine similarity against all documents)
        query_emb = np.array(query_embedding)
        semantic_scores = np.dot(self.embeddings, query_emb) / (
            np.linalg.norm(self.embeddings, axis=1) * np.linalg.norm(query_emb)
        )

        # Keyword search scores (BM25)
        tokenized_query = query.lower().split()
        keyword_scores = self.bm25.get_scores(tokenized_query)

        # Normalize both score sets to the 0-1 range
        semantic_scores_norm = (semantic_scores - semantic_scores.min()) / (
            semantic_scores.max() - semantic_scores.min() + 1e-10
        )
        keyword_scores_norm = (keyword_scores - keyword_scores.min()) / (
            keyword_scores.max() - keyword_scores.min() + 1e-10
        )

        # Combine scores
        combined_scores = alpha * semantic_scores_norm + (1 - alpha) * keyword_scores_norm

        # Get top k
        top_indices = np.argsort(combined_scores)[::-1][:k]

        results = [
            (idx, combined_scores[idx], self.documents[idx])
            for idx in top_indices
        ]
        return results

# Usage
hybrid_searcher = HybridSearch(documents, embeddings)

results = hybrid_searcher.search(
    query="programming Python",
    query_embedding=query_embedding,
    k=3,
    alpha=0.7  # 70% semantic, 30% keyword
)

print("Hybrid search results:")
for idx, score, doc in results:
    print(f"Score: {score:.4f} - {doc}")

First retrieve candidates with a fast bi-encoder (embeddings), then re-rank them with a slower but more accurate cross-encoder.

from sentence_transformers import CrossEncoder

class RerankedSearch:
    def __init__(self, vector_db, reranker_model: str = "cross-encoder/ms-marco-MiniLM-L-6-v2"):
        self.vector_db = vector_db
        self.cross_encoder = CrossEncoder(reranker_model)

    def search(self, query: str, initial_k: int = 20, final_k: int = 5):
        """
        Two-stage search with re-ranking.

        Stage 1: Fast retrieval of initial_k candidates
        Stage 2: Re-rank with cross-encoder to get final_k results
        """
        # Stage 1: Initial retrieval (embed the query, then search the vector DB)
        query_embedding = create_embedding(query)
        initial_results = self.vector_db.query(
            query_embeddings=[query_embedding],
            n_results=initial_k
        )
        documents = initial_results['documents'][0]

        # Stage 2: Re-rank with cross-encoder
        pairs = [[query, doc] for doc in documents]
        rerank_scores = self.cross_encoder.predict(pairs)

        # Sort by rerank scores
        ranked_results = sorted(
            zip(documents, rerank_scores),
            key=lambda x: x[1],
            reverse=True
        )[:final_k]

        return ranked_results

# Usage
reranked_searcher = RerankedSearch(collection)

results = reranked_searcher.search(
    query="How to build machine learning models?",
    initial_k=20,
    final_k=5
)

print("Re-ranked results:")
for doc, score in results:
    print(f"Score: {score:.4f} - {doc}")

Filter results based on metadata before or after similarity search.

# Add documents with rich metadata
collection.add(
    documents=[
        "Python programming tutorial",
        "Advanced Python techniques",
        "JavaScript basics for beginners"
    ],
    embeddings=[...],  # embeddings
    ids=["doc1", "doc2", "doc3"],
    metadatas=[
        {"language": "python", "level": "beginner", "year": 2024},
        {"language": "python", "level": "advanced", "year": 2024},
        {"language": "javascript", "level": "beginner", "year": 2023}
    ]
)

# Query with filters
results = collection.query(
    query_embeddings=[query_embedding],
    n_results=5,
    where={"language": "python"},              # Only Python documents
    where_document={"$contains": "tutorial"}   # Must contain "tutorial"
)

# Complex filters
results = collection.query(
    query_embeddings=[query_embedding],
    n_results=5,
    where={
        "$and": [
            {"language": {"$eq": "python"}},
            {"level": {"$ne": "advanced"}},
            {"year": {"$gte": 2024}}
        ]
    }
)

Search across multiple embedding spaces for different aspects.

class MultiVectorSearch:
    def __init__(self):
        chroma = chromadb.Client()
        self.collections = {
            "content": chroma.create_collection("content_embeddings"),
            "title": chroma.create_collection("title_embeddings"),
            "summary": chroma.create_collection("summary_embeddings")
        }

    def add_document(self, doc_id: str, content: str, title: str, summary: str):
        """Add document with multiple embeddings"""
        content_emb = create_embedding(content)
        title_emb = create_embedding(title)
        summary_emb = create_embedding(summary)

        self.collections["content"].add(ids=[doc_id], embeddings=[content_emb], documents=[content])
        self.collections["title"].add(ids=[doc_id], embeddings=[title_emb], documents=[title])
        self.collections["summary"].add(ids=[doc_id], embeddings=[summary_emb], documents=[summary])

    def search(self, query: str, weights: dict = None):
        """
        Search across multiple embedding spaces with weights.

        Args:
            query: Search query
            weights: Dict of weights for each embedding space,
                     e.g., {"content": 0.5, "title": 0.3, "summary": 0.2}
        """
        if weights is None:
            weights = {"content": 0.6, "title": 0.3, "summary": 0.1}

        query_emb = create_embedding(query)
        all_scores = {}

        for space, weight in weights.items():
            results = self.collections[space].query(
                query_embeddings=[query_emb],
                n_results=10
            )
            for doc_id, distance in zip(results['ids'][0], results['distances'][0]):
                if doc_id not in all_scores:
                    all_scores[doc_id] = 0
                # Convert distance to similarity and weight it
                similarity = 1 - distance  # Assumes a distance metric where smaller = closer
                all_scores[doc_id] += similarity * weight

        # Sort by combined score
        ranked = sorted(all_scores.items(), key=lambda x: x[1], reverse=True)
        return ranked[:5]

Chunking long documents before embedding improves retrieval quality:

from typing import List, Dict

class DocumentChunker:
    """Smart document chunking for optimal search performance"""

    @staticmethod
    def chunk_by_tokens(
        text: str,
        max_tokens: int = 512,
        overlap_tokens: int = 50
    ) -> List[str]:
        """Chunk text by token count with overlap"""
        # Simple word-based approximation (1 token ≈ 0.75 words)
        words = text.split()
        max_words = int(max_tokens * 0.75)
        overlap_words = int(overlap_tokens * 0.75)

        chunks = []
        start = 0
        while start < len(words):
            end = min(start + max_words, len(words))
            chunk = " ".join(words[start:end])
            chunks.append(chunk)
            if end >= len(words):
                break
            start = end - overlap_words
        return chunks

    @staticmethod
    def chunk_by_sentences(
        text: str,
        max_sentences: int = 5,
        overlap_sentences: int = 1
    ) -> List[str]:
        """Chunk text by sentences for better semantic coherence"""
        import re

        # Simple sentence splitting
        sentences = re.split(r'(?<=[.!?])\s+', text)

        chunks = []
        start = 0
        while start < len(sentences):
            end = min(start + max_sentences, len(sentences))
            chunk = " ".join(sentences[start:end])
            chunks.append(chunk)
            if end >= len(sentences):
                break
            start = end - overlap_sentences
        return chunks

    @staticmethod
    def chunk_semantic(
        text: str,
        max_chunk_size: int = 1000
    ) -> List[str]:
        """Chunk by semantic boundaries (paragraphs, sections)"""
        # Split by double newlines (paragraphs)
        paragraphs = text.split("\n\n")

        chunks = []
        current_chunk = ""
        for para in paragraphs:
            if len(current_chunk) + len(para) <= max_chunk_size:
                current_chunk += para + "\n\n"
            else:
                if current_chunk:
                    chunks.append(current_chunk.strip())
                current_chunk = para + "\n\n"
        if current_chunk:
            chunks.append(current_chunk.strip())
        return chunks

# Usage
text = """Your long document here..."""
chunker = DocumentChunker()

# Method 1: Token-based (most common)
chunks_tokens = chunker.chunk_by_tokens(text, max_tokens=512, overlap_tokens=50)

# Method 2: Sentence-based (better semantic coherence)
chunks_sentences = chunker.chunk_by_sentences(text, max_sentences=5)

# Method 3: Semantic boundaries (best for structured docs)
chunks_semantic = chunker.chunk_semantic(text, max_chunk_size=1000)

print(f"Token-based: {len(chunks_tokens)} chunks")
print(f"Sentence-based: {len(chunks_sentences)} chunks")
print(f"Semantic: {len(chunks_semantic)} chunks")

Caching embeddings avoids paying to embed the same text twice:

from typing import List, Optional
import hashlib
import pickle
import os

class EmbeddingCache:
    """Cache embeddings on disk to reduce API calls"""

    def __init__(self, cache_dir: str = "./embedding_cache"):
        self.cache_dir = cache_dir
        os.makedirs(cache_dir, exist_ok=True)

    def _get_cache_key(self, text: str, model: str) -> str:
        """Generate cache key from text and model"""
        content = f"{model}:{text}"
        return hashlib.md5(content.encode()).hexdigest()

    def get(self, text: str, model: str) -> Optional[List[float]]:
        """Get embedding from cache, or None on a miss"""
        cache_key = self._get_cache_key(text, model)
        cache_file = os.path.join(self.cache_dir, f"{cache_key}.pkl")
        if os.path.exists(cache_file):
            with open(cache_file, 'rb') as f:
                return pickle.load(f)
        return None

    def set(self, text: str, model: str, embedding: List[float]):
        """Store embedding in cache"""
        cache_key = self._get_cache_key(text, model)
        cache_file = os.path.join(self.cache_dir, f"{cache_key}.pkl")
        with open(cache_file, 'wb') as f:
            pickle.dump(embedding, f)

    def create_embedding_cached(self, text: str, model: str = "text-embedding-3-small") -> List[float]:
        """Create embedding with caching"""
        # Check cache first
        cached = self.get(text, model)
        if cached is not None:
            return cached

        # Create new embedding
        embedding = create_embedding(text, model)

        # Cache it
        self.set(text, model, embedding)
        return embedding

# Usage
cache = EmbeddingCache()

# First call: hits API
emb1 = cache.create_embedding_cached("Hello world")

# Second call: uses cache (instant!)
emb2 = cache.create_embedding_cached("Hello world")

print("Embeddings identical:", emb1 == emb2)  # True

A lightweight way to track search performance:

from dataclasses import dataclass
import time

@dataclass
class SearchMetrics:
    """Track search performance metrics"""
    query_count: int = 0
    total_latency: float = 0.0
    cache_hits: int = 0
    cache_misses: int = 0

    def record_query(self, latency: float, cache_hit: bool = False):
        """Record a search query"""
        self.query_count += 1
        self.total_latency += latency
        if cache_hit:
            self.cache_hits += 1
        else:
            self.cache_misses += 1

    def get_avg_latency(self) -> float:
        """Get average query latency"""
        if self.query_count == 0:
            return 0.0
        return self.total_latency / self.query_count

    def get_cache_hit_rate(self) -> float:
        """Get cache hit rate"""
        total = self.cache_hits + self.cache_misses
        if total == 0:
            return 0.0
        return self.cache_hits / total

    def report(self) -> str:
        """Generate metrics report"""
        return f"""
Search Performance Metrics:
- Total Queries: {self.query_count}
- Average Latency: {self.get_avg_latency():.3f}s
- Cache Hit Rate: {self.get_cache_hit_rate():.2%}
- Cache Hits: {self.cache_hits}
- Cache Misses: {self.cache_misses}
""".strip()

# Usage
metrics = SearchMetrics()

def search_with_metrics(query: str, collection) -> dict:
    """Search with performance tracking"""
    start = time.time()

    # Check cache (reusing the EmbeddingCache above as a simple result cache)
    cache_key = f"query:{query}"
    cached_result = cache.get(cache_key, "results")
    if cached_result:
        metrics.record_query(time.time() - start, cache_hit=True)
        return cached_result

    # Perform search
    query_emb = create_embedding(query)
    results = collection.query(query_embeddings=[query_emb], n_results=5)

    # Cache results
    cache.set(cache_key, "results", results)

    metrics.record_query(time.time() - start, cache_hit=False)
    return results

# After many queries
print(metrics.report())

Issue: Poor search quality

# Solutions:
# 1. Check chunk sizes - too large or too small?
# 2. Ensure embeddings match (same model for indexing and querying)
# 3. Try hybrid search (semantic + keyword)
# 4. Add re-ranking
# 5. Review your data quality

Issue: Slow search performance

# Solutions:
# 1. Use proper vector database (not naive search)
# 2. Optimize index parameters (HNSW parameters, IVF clusters) - see the sketch below
# 3. Implement caching
# 4. Use smaller embedding models
# 5. Pre-filter with metadata before similarity search
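
Point 2 in practice: a minimal sketch of tuning HNSW parameters when creating a Qdrant collection. The values here are examples only; the right settings depend on your data and your recall/latency targets:

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, HnswConfigDiff

qdrant = QdrantClient(path="./qdrant_db")

# Higher m / ef_construct usually means better recall,
# at the cost of a slower, more memory-hungry index build.
qdrant.create_collection(
    collection_name="documents_tuned",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
    hnsw_config=HnswConfigDiff(
        m=32,              # graph connectivity (Qdrant's default is 16)
        ef_construct=256   # build-time search depth (default is 100)
    )
)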

Issue: High costs

# Solutions:
# 1. Cache embeddings aggressively
# 2. Use text-embedding-3-small instead of large
# 3. Batch embedding operations
# 4. Implement TTL for rarely accessed embeddings - see the sketch below
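
Point 4 can be layered onto the EmbeddingCache from earlier; a minimal sketch using file modification times, where the 30-day window is an arbitrary example:

import os
import time

class TTLEmbeddingCache(EmbeddingCache):
    """EmbeddingCache variant that expires entries older than max_age_seconds."""

    def __init__(self, cache_dir: str = "./embedding_cache", max_age_seconds: int = 30 * 24 * 3600):
        super().__init__(cache_dir)
        self.max_age_seconds = max_age_seconds

    def get(self, text: str, model: str):
        cache_key = self._get_cache_key(text, model)
        cache_file = os.path.join(self.cache_dir, f"{cache_key}.pkl")
        if os.path.exists(cache_file):
            age = time.time() - os.path.getmtime(cache_file)
            if age > self.max_age_seconds:
                os.remove(cache_file)  # expired: drop it and report a miss
                return None
        return super().get(text, model)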

Advanced topics:

  • Multi-modal embeddings (text + images)
  • Fine-tuning embedding models
  • Sparse-dense hybrid search (SPLADE)
  • Cross-lingual semantic search
  • Embedding drift monitoring


Found an issue? Open an issue or submit a PR!