
Vector Database

Latest update: 26/04/30


Definition

A vector database is a specialized database designed to store and search embeddings – the numerical representations of text, images, or other data – so an AI system can quickly find content based on meaning rather than exact keywords.

💡 How Does It Work?

When content gets added to a vector database – a document, a sentence, a product description – an embedding model converts it into a vector: a list of hundreds or thousands of numbers that encodes its meaning. That vector gets stored alongside the original content.

When you run a search, your query gets converted into a vector too. The database then finds the stored vectors that are closest to your query vector – using distance calculations in high-dimensional space. The closest matches are the most semantically relevant results.

Think of it like a city where every document lives at a specific address based on its meaning. Similar documents live near each other. When you search, you’re not looking up a street name – you’re asking “what’s near this coordinate?” The database finds everything in that neighborhood.
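The "what's near this coordinate?" search can be sketched in a few lines of NumPy, assuming the embeddings are already computed. The toy 3-dimensional vectors here are purely illustrative – real embedding models produce hundreds or thousands of dimensions – but the distance logic is the same:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Similarity of two vectors: 1.0 = same direction, ~0.0 = unrelated."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def nearest(query: np.ndarray, stored: dict[str, np.ndarray], k: int = 2) -> list[str]:
    """Return the k stored items whose vectors are closest to the query vector."""
    scored = [(cosine_similarity(query, vec), doc) for doc, vec in stored.items()]
    return [doc for _, doc in sorted(scored, reverse=True)[:k]]

# Toy 3-d "embeddings" -- in practice these come from an embedding model.
stored = {
    "refund policy":   np.array([0.9, 0.1, 0.0]),
    "return shipping": np.array([0.8, 0.3, 0.1]),
    "office hours":    np.array([0.0, 0.2, 0.9]),
}
query = np.array([0.85, 0.2, 0.05])  # e.g. "how do I get my money back"
nearest(query, stored)  # → ['refund policy', 'return shipping']
```

Note that "office hours" is filtered out not by any keyword rule, but simply because its vector points in a different direction than the query's.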

Why It Matters for Your Prompts

Vector databases don’t show up in most casual AI interactions – they work behind the scenes. But if you’re building AI tools, or if you’re using any system that lets you “ask questions about your documents,” a vector database is almost certainly involved.

Understanding them explains a behavior you might notice in RAG-powered tools: results can be oddly sensitive to phrasing. The system finds content based on semantic similarity to your query. If your query uses different vocabulary than the documents in the database, the distance between vectors grows – and relevance drops. Using terminology that matches your knowledge base’s language often retrieves better results than rephrasing in your own words.

It also explains why these systems can fail silently. If your query lands in a region of vector space where your documents are sparse, the “closest” results might still be a poor match – and the AI will generate from them anyway. The system found the nearest neighbor, not necessarily the right one.

🌐 Real-World Example

A law firm stores 50,000 contracts in a vector database. A paralegal asks: “Find any clauses about what happens to intellectual property if the partnership dissolves.”

A keyword search returns nothing – none of the contracts use the word “dissolves.” They say “termination,” “wind-down,” “expiry,” and “conclusion of the agreement.”

The vector database finds contracts with clauses about IP rights upon contract termination, ownership transfer at wind-down, and assignment of assets after expiry. Different words, same meaning, high semantic similarity.

The paralegal gets relevant results. The keyword search would have found nothing at all.

Related Terms

  • Embedding – Embeddings are what get stored in a vector database; without embedding models to convert content into vectors, the database has nothing to search.
  • Retrieval-Augmented Generation (RAG) – RAG systems use vector databases as their retrieval layer; the database is where the knowledge base lives.
  • Latent Space – The “space” a vector database searches through is essentially the latent space of the embedding model used to encode the content.
  • Semantic Search – Semantic search is the practical output of a vector database query – finding content by meaning instead of keyword.
  • Inference – In a RAG pipeline, the vector database retrieval happens just before inference – it feeds the language model the relevant content it needs to generate an accurate response.

Frequently Asked Questions

Do I need a vector database to build an AI application?

Not always. For small knowledge bases – a few dozen documents – you can often store embeddings in a regular database or even in memory and search them directly. Vector databases become necessary when you’re working at scale: thousands or millions of documents where search speed and efficiency matter. If you’re building a prototype, start simple. Graduate to a dedicated vector database when performance becomes a constraint.
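The "start simple" option above can be as small as this sketch: a brute-force in-memory store that scans every vector on each query. Class and method names are illustrative, not from any particular library:

```python
import numpy as np

class InMemoryVectorStore:
    """Prototype-scale store: brute-force scan over a list of embeddings.
    Fine for dozens of documents; dedicated vector databases replace the
    linear scan with approximate-nearest-neighbor indexing at scale."""

    def __init__(self) -> None:
        self.docs: list[str] = []
        self.vecs: list[np.ndarray] = []

    def add(self, text: str, embedding: np.ndarray) -> None:
        # Pre-normalize so search reduces to a dot product.
        self.vecs.append(embedding / np.linalg.norm(embedding))
        self.docs.append(text)

    def search(self, query_embedding: np.ndarray, k: int = 3) -> list[str]:
        q = query_embedding / np.linalg.norm(query_embedding)
        sims = np.stack(self.vecs) @ q      # cosine similarity for all docs at once
        top = np.argsort(sims)[::-1][:k]    # indices of the k best matches
        return [self.docs[i] for i in top]
```

In real use, `add` and `search` would receive embeddings from whatever embedding model the application calls; that step is omitted here.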

What’s the difference between a vector database and a regular search engine?

Regular search engines are optimized for keyword and phrase matching – they index words and find documents containing them. Vector databases search by mathematical proximity in embedding space – finding documents that are semantically close to a query regardless of exact wording. They’re complementary: keyword search is fast and precise for known terms; vector search handles ambiguous, conceptual, or natural language queries better. Many production systems use both.
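The "use both" approach is often implemented as a weighted blend of the two scores. Here is a minimal sketch of that idea – the keyword score is a crude word-overlap stand-in for what production systems would compute with BM25 or similar, and the weight `alpha` is an assumed tuning parameter:

```python
import numpy as np

def keyword_score(query: str, doc: str) -> float:
    """Crude word-overlap score (a stand-in for BM25 in real systems)."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0

def hybrid_score(query: str, doc: str,
                 q_vec: np.ndarray, d_vec: np.ndarray,
                 alpha: float = 0.5) -> float:
    """Blend vector similarity and keyword overlap; alpha weights the vector side."""
    vec_sim = float(np.dot(q_vec, d_vec) /
                    (np.linalg.norm(q_vec) * np.linalg.norm(d_vec)))
    return alpha * vec_sim + (1 - alpha) * keyword_score(query, doc)
```

With `alpha` near 1.0 the system behaves like pure semantic search; near 0.0 it behaves like keyword search. Production systems tune this balance per use case.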

Which vector databases are most commonly used?

Pinecone, Weaviate, Qdrant, Chroma, and Milvus are among the most widely adopted purpose-built options. Postgres with the pgvector extension is a popular choice for teams that want to add vector search to an existing database setup. OpenAI and many embedding API providers have their own storage options too. The right choice depends on scale, infrastructure, and whether you need managed hosting or prefer open source.

Can a vector database go out of date?

Yes – and it’s something to actively manage. If your documents change and you don’t re-embed and re-index them, the database serves stale vectors. The content might be updated but the vectors still reflect the old version. Any RAG system that relies on current information needs a process for keeping the vector database in sync with the underlying content it represents.
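One common way to keep an index in sync is to store a content hash next to each vector and re-embed only when the hash changes. This is a sketch of that pattern, not any specific product's API; `embed_fn` stands in for whatever embedding model the pipeline uses:

```python
import hashlib

def sync_index(documents: dict[str, str], index: dict[str, dict], embed_fn) -> list[str]:
    """Re-embed only documents whose content changed since last indexing.
    `index` maps doc id -> {"hash": ..., "vector": ...}."""
    updated = []
    for doc_id, text in documents.items():
        digest = hashlib.sha256(text.encode()).hexdigest()
        entry = index.get(doc_id)
        if entry is None or entry["hash"] != digest:   # new or stale document
            index[doc_id] = {"hash": digest, "vector": embed_fn(text)}
            updated.append(doc_id)
    return updated
```

Running this on a schedule (or on document-save events) keeps stale vectors from accumulating without re-embedding the entire corpus each time.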

Author Daniel: AI prompt specialist with over 5 years of experience in generative AI, LLM optimization, and prompt chain design. Daniel has helped hundreds of creators improve output quality through structured prompting techniques. At our AI Prompting Encyclopedia, he breaks down complex prompting strategies into clear, actionable guides.