
Developer Guide Template

What you’ll build: [Describe the end result]

Use cases: [When would someone use this?]

Time to complete: [Realistic estimate]

Required knowledge:

  • [e.g., Python 3.9+]
  • [e.g., Understanding of async/await patterns]
  • [e.g., Familiarity with REST APIs]

Required accounts/tools:

  • [e.g., OpenAI API key]
  • [e.g., Pinecone account (free tier works)]
  • [e.g., Git and a code editor]

Optional but helpful:

  • [e.g., Docker knowledge for containerization]
  • [e.g., Experience with LangChain]

[Component 1] → [Component 2] → [Component 3]
       ↓              ↓              ↓
            [Data Flow Description]

Key components:

  • [Component 1]: [What it does]
  • [Component 2]: [What it does]
  • [Component 3]: [What it does]

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install packages (python-dotenv is needed to load the .env file below)
pip install langchain openai pinecone-client tiktoken python-dotenv

Create a .env file:

OPENAI_API_KEY=your-openai-key-here
PINECONE_API_KEY=your-pinecone-key-here
PINECONE_ENVIRONMENT=your-environment

Security note: Never commit .env files to version control. Add to .gitignore:

.gitignore
.env
venv/
__pycache__/

Goal: [What this step accomplishes]

# Import necessary modules
import os

from dotenv import load_dotenv
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Pinecone

# Load environment variables
load_dotenv()

# Initialize embeddings
embeddings = OpenAIEmbeddings(
    model="text-embedding-3-small",
    openai_api_key=os.getenv("OPENAI_API_KEY"),
)
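
A quick way to sanity-check the setup is to embed a short string and confirm the vector length (a minimal sketch using the embeddings object above; text-embedding-3-small returns 1536-dimensional vectors):

# Embed a test string and inspect the result
vector = embeddings.embed_query("Hello, world")
print(len(vector))  # expect 1536 for text-embedding-3-small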

Why this works: [Explain the technical reasoning]

Common issues:

  • Problem: [Common error]
    • Solution: [How to fix it]

Goal: [What this step accomplishes]

import pinecone

# Initialize Pinecone
pinecone.init(
    api_key=os.getenv("PINECONE_API_KEY"),
    environment=os.getenv("PINECONE_ENVIRONMENT"),
)

# Create or connect to index
index_name = "my-rag-index"
if index_name not in pinecone.list_indexes():
    pinecone.create_index(
        name=index_name,
        dimension=1536,  # OpenAI embedding dimension
        metric="cosine",
    )

# Initialize vector store
vectorstore = Pinecone.from_existing_index(
    index_name=index_name,
    embedding=embeddings,
)

Performance considerations:

  • [e.g., Cosine similarity is optimal for OpenAI embeddings]
  • [e.g., Dimension must match embedding model output]

[Add as many steps as needed to complete the implementation]
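
As one example of such a step, a document-ingestion and retrieval chain might look like the sketch below. It assumes the vectorstore from the previous step; RetrievalQA with ChatOpenAI is just one way to build the rag_chain that the tests below expect, and the sample documents are purely illustrative:

from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA

# Add a few sample documents (a real pipeline would load and chunk files first)
vectorstore.add_texts([
    "Document 1: refunds are available within 30 days of purchase.",
    "Document 2: support hours are Monday to Friday, 9am-5pm.",
])

# Build a simple retrieval-augmented QA chain
rag_chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model="gpt-3.5-turbo"),
    retriever=vectorstore.as_retriever(search_kwargs={"k": 3}),
)

print(rag_chain.run("What is the refund policy?"))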

import pytest

from your_module import your_function, vectorstore, rag_chain


def test_embedding_generation():
    """Test that embeddings are generated correctly"""
    text = "Test document"
    result = your_function(text)
    assert len(result) == 1536  # OpenAI embedding dimension
    assert all(isinstance(x, float) for x in result)


def test_retrieval():
    """Test that similar documents are retrieved"""
    query = "test query"
    results = vectorstore.similarity_search(query, k=3)
    assert len(results) <= 3
    assert all(hasattr(doc, 'page_content') for doc in results)


def test_full_pipeline():
    """Test the complete RAG pipeline"""
    # Add documents
    docs = ["Document 1", "Document 2", "Document 3"]
    vectorstore.add_texts(docs)
    # Query
    query = "test query"
    results = rag_chain.run(query)
    # Verify
    assert results is not None
    assert len(results) > 0

Run tests:

pytest tests/ -v

Embedding optimization:

  • Use text-embedding-3-small for cost/performance balance
  • Batch embeddings for large document sets
  • Cache frequently used embeddings (see the sketch after this list)
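
For example, batching and a simple in-process cache might look like this (a sketch; the cache here is a plain dict and the embed_cached helper is a hypothetical name):

# Batch many texts in one call instead of embedding them one at a time
texts = ["First document ...", "Second document ...", "Third document ..."]
vectors = embeddings.embed_documents(texts)

# A very simple in-process cache for repeated queries (illustrative only)
_embedding_cache = {}

def embed_cached(text):
    if text not in _embedding_cache:
        _embedding_cache[text] = embeddings.embed_query(text)
    return _embedding_cache[text]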

Vector store optimization:

  • Set appropriate k value for retrieval (3-5 typically optimal)
  • Use metadata filtering to narrow the search space (see the sketch after this list)
  • Consider approximate nearest neighbor (ANN) for large datasets
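
A sketch of metadata filtering with a small k, assuming documents were ingested with metadata (the team field and sample texts are made up for illustration):

# Ingest with metadata so searches can be restricted later
vectorstore.add_texts(
    ["Q3 revenue grew 12% quarter over quarter.", "Onboarding checklist for new hires."],
    metadatas=[{"team": "finance"}, {"team": "hr"}],
)

# Retrieve a handful of candidates, filtered to one team's documents
results = vectorstore.similarity_search(
    "How did revenue change last quarter?",
    k=3,                          # small k keeps prompts short and cheap
    filter={"team": "finance"},   # Pinecone metadata filter
)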

Estimated costs (as of 2025; a worked example follows the list):

  • Embeddings: ~$0.02 per 1M tokens
  • LLM calls: Varies by model ($0.50-$15 per 1M tokens)
  • Vector storage: ~$0.10 per 100k vectors/month
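
A rough back-of-the-envelope calculation using the figures above (the corpus size and token counts are assumptions; plug in your own numbers):

# Assumed workload: 10,000 documents averaging 500 tokens each
docs = 10_000
avg_tokens = 500

embedding_cost = docs * avg_tokens / 1_000_000 * 0.02      # ~$0.02 per 1M tokens
storage_cost_per_month = docs / 100_000 * 0.10              # ~$0.10 per 100k vectors

print(f"One-time embedding cost: ~${embedding_cost:.2f}")    # ~$0.10
print(f"Vector storage: ~${storage_cost_per_month:.2f}/mo")  # ~$0.01/mo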

Cost-saving strategies:

# Cache LLM responses in memory so repeated queries don't hit the API
import langchain
from langchain.cache import InMemoryCache

langchain.llm_cache = InMemoryCache()

# Use cheaper models for simple queries
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(model="gpt-3.5-turbo")  # vs gpt-4

# Dockerfile
FROM python:3.11-slim
WORKDIR /app
# requirements.txt should pin the packages installed earlier
# (langchain, openai, pinecone-client, tiktoken, python-dotenv)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "main.py"]

Build and run:

docker build -t my-rag-app .
docker run --env-file .env my-rag-app

Monitoring:

  • Track API latency and error rates
  • Monitor token usage for cost control (see the sketch after this list)
  • Set up alerts for API failures
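
One lightweight option for token tracking with this stack is LangChain's OpenAI callback (a sketch; wire the numbers into whatever metrics or alerting system you already use):

from langchain.callbacks import get_openai_callback

with get_openai_callback() as cb:
    answer = rag_chain.run("What is in the knowledge base?")

print(cb.prompt_tokens, cb.completion_tokens, cb.total_tokens)
print(f"Estimated cost for this call: ${cb.total_cost:.4f}")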

Security:

  • Rotate API keys regularly
  • Use environment variables, never hardcode keys
  • Implement rate limiting (see the sketch after this list)
  • Validate and sanitize user inputs
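
A minimal sketch of per-client rate limiting plus basic input validation (the limits, max length, and check_request helper are arbitrary examples; production systems usually enforce this in middleware or an API gateway):

import time
from collections import defaultdict, deque

MAX_REQUESTS_PER_MINUTE = 20
MAX_QUERY_CHARS = 2_000
_request_log = defaultdict(deque)

def check_request(client_id, query):
    # Basic input validation before the query reaches the LLM
    if not query or len(query) > MAX_QUERY_CHARS:
        raise ValueError("Query is empty or too long")

    # Sliding-window rate limit per client
    now = time.monotonic()
    window = _request_log[client_id]
    while window and now - window[0] > 60:
        window.popleft()
    if len(window) >= MAX_REQUESTS_PER_MINUTE:
        raise RuntimeError("Rate limit exceeded, try again later")
    window.append(now)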

Scalability:

  • Use async operations for concurrent requests (see the sketch after this list)
  • Implement caching layer (Redis/Memcached)
  • Consider serverless deployment (AWS Lambda, Cloud Run)
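
For the async point, chains in this version of LangChain expose arun, so several queries can be awaited together (a sketch assuming the rag_chain built earlier):

import asyncio

async def answer_many(questions):
    # Fan the queries out concurrently instead of awaiting them one by one
    tasks = [rag_chain.arun(q) for q in questions]
    return await asyncio.gather(*tasks)

answers = asyncio.run(answer_many([
    "What is the refund policy?",
    "When is support available?",
]))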

Error: RateLimitError: Rate limit exceeded

# Solution: implement exponential backoff
import time
from openai.error import RateLimitError  # openai<1.0; newer SDKs expose openai.RateLimitError

def call_with_retry(func, max_retries=3):
    for attempt in range(max_retries):
        try:
            return func()
        except RateLimitError:
            if attempt < max_retries - 1:
                time.sleep(2 ** attempt)  # back off 1s, 2s, 4s, ...
            else:
                raise

Error: InvalidDimensionError: Vector dimension mismatch

  • Cause: Embedding model output doesn’t match index dimension
    • Solution: Verify the index dimension matches the model's output (1536 for text-embedding-3-small); see the check below
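
A quick check that the two dimensions agree, using the objects created during setup (assumes the pinecone-client API shown earlier):

# The embedding length and the index dimension should both be 1536
sample = embeddings.embed_query("dimension check")
print(len(sample), pinecone.describe_index(index_name).dimension)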

Error: No results returned from similarity search

  • Cause: No documents in vector store or query too specific
    • Solution: Verify that documents were added (see the check below) and try broader query terms
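
To confirm the index actually contains vectors (a sketch using the pinecone-client API from the setup step):

# total_vector_count should be greater than zero after ingestion
index = pinecone.Index(index_name)
print(index.describe_index_stats())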

Enhancements to consider:

  • Add conversation memory for multi-turn interactions (see the sketch after this list)
  • Implement semantic caching to reduce API calls
  • Add metadata filtering for more precise retrieval
  • Create a web interface with Streamlit or FastAPI
  • Add observability with LangSmith or similar
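
For the first item, a conversational retrieval chain with buffer memory is one possible starting point (a sketch on the same LangChain APIs used above; chat_chain is a hypothetical name):

from langchain.chains import ConversationalRetrievalChain
from langchain.chat_models import ChatOpenAI
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

chat_chain = ConversationalRetrievalChain.from_llm(
    llm=ChatOpenAI(model="gpt-3.5-turbo"),
    retriever=vectorstore.as_retriever(),
    memory=memory,
)

chat_chain({"question": "What is the refund policy?"})
followup = chat_chain({"question": "Does that apply to digital products too?"})  # uses history
print(followup["answer"])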

Related guides:

  • [Link to related developer guide]
  • [Link to another relevant guide]

Official documentation:

  • [Link to official docs for each tool used, e.g., LangChain, OpenAI, Pinecone]

Example repositories:

  • [Link to GitHub repo with complete code]
  • [Link to related example]

Community:

  • [Link to community forum, Discord, or discussion board]

Found an issue with this guide? Open an issue or submit a PR!