Exploring Weaviate: Your Go-To Vector Database for RAG Applications
Chapter 1: Introduction to Weaviate
In the realm of Retrieval Augmented Generation (RAG) applications, a vector database plays an essential role. It not only stores vectors and their associated metadata but also efficiently computes similarity metrics and enhances semantic search capabilities. However, selecting the right vector database for production-ready RAG applications can be challenging. Factors such as cost, query performance, and scalability must be meticulously assessed, particularly when dealing with millions of vectors or real-time requirements.
My experience at a pharmaceutical company has led me through this decision-making process. In this article, I aim to elucidate why Weaviate stands out as an excellent choice for managing vector data and facilitating RAG development, especially for developers with limited experience in large language models (LLMs).
To simplify this tutorial, I will begin by loading data from a public API, indexing it in Weaviate, and showcasing some of its impressive features.
Here's what we will cover:
- Loading research papers from the Papers With Code public API
- Setting up Weaviate locally
- Indexing data in batches
- Exploring Weaviate's search functionalities:
- Vector search
- Keyword search
- Hybrid search
- Re-ranking
- Advanced filtering
- Generative search: Transforming your database into a RAG system
1. Loading Research Papers from the Papers With Code Public API 🔌
To explore Weaviate's features, we first need to ingest some data. Various datasets are available, but today we will focus on research papers related to Large Language Models. We will extract this dataset from the Papers With Code public API.
To collect the dataset, execute the following function:
import urllib.parse

import requests
from tqdm import tqdm

def extract_papers(query: str):
    # Fetch the first page of results (50 items per page)
    query = urllib.parse.quote(query)
    url = f"https://paperswithcode.com/api/v1/papers/?q={query}"
    response = requests.get(url).json()
    count = response["count"]
    results = response["results"]
    # Fetch the remaining pages
    num_pages = count // 50
    for page in tqdm(range(2, num_pages + 1)):
        response = requests.get(f"{url}&page={page}").json()
        results += response["results"]
    return results
query = "Large Language Models"
papers = extract_papers(query)
This dataset consists of approximately 11,000 paper abstracts, each containing various attributes and metadata, such as:
{
"id": "n-gram-counts-and-language-models-from-the",
"title": "N-gram Counts and Language Models from the Common Crawl",
"abstract": "We contribute 5-gram counts and language models trained on the Common Crawl corpus...",
"authors": ["Christian Buck", "Bas van Ooyen", "Kenneth Heafield"],
"published": "2014-05-01"
}
2. Setting Up Weaviate Locally ⚙️
Weaviate is an open-source, AI-native vector database designed to assist developers in creating intuitive and dependable AI-powered applications. It can be utilized in various ways:
- Locally via Docker Compose
- On a Kubernetes cluster
- Through Weaviate Cloud Services (WCS), a managed offering
For demonstration purposes, we'll use Docker Compose. The setup process is straightforward:
- Create an empty weaviate_data folder on your host to store Weaviate data.
- Prepare a Docker Compose file with the following configuration:
version: '3.4'
services:
  weaviate:
    command:
    - --host
    - 0.0.0.0
    - --port
    - '8080'
    - --scheme
    - http
    image: cr.weaviate.io/semitechnologies/weaviate:1.24.6
    ports:
    - 8080:8080
    - 50051:50051
    volumes:
    - ./weaviate_data:/var/lib/weaviate
    restart: on-failure:0
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      DEFAULT_VECTORIZER_MODULE: 'none'
      ENABLE_MODULES: 'text2vec-palm,reranker-cohere,generative-palm'
      AUTHENTICATION_APIKEY_ENABLED: 'true'
      AUTHENTICATION_APIKEY_ALLOWED_KEYS: 'admin'
      AUTHENTICATION_APIKEY_USERS: 'ahmed'
- Execute the command:
docker-compose up
To confirm that Weaviate has started successfully, check its readiness endpoint at http://localhost:8080/v1/.well-known/ready; an HTTP 200 response means the instance is up and accepting requests.
After launching Weaviate, you'll need to create a client to interact with it. This client can be configured to connect with other third-party cloud services like OpenAI, VertexAI, or Cohere.
Why is this connection necessary? Weaviate does more than merely store vectors and compute similarities. It can embed data while indexing, perform document re-ranking post-retrieval, and execute RAG operations by providing answers based on search results.
For this tutorial, I will link Weaviate to VertexAI for text embedding and generation, and to the Cohere API for the re-ranking task.
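As a minimal sketch of that setup with the v4 Python client: the `build_module_headers` helper and the environment-variable names are my own; the headers forward the third-party API credentials to the modules enabled in the Docker Compose file, and the API key matches `AUTHENTICATION_APIKEY_ALLOWED_KEYS`.

```python
import os

def build_module_headers() -> dict:
    # Hypothetical helper: env var names are assumptions; the header keys
    # are the ones Weaviate's PaLM and Cohere modules read credentials from.
    return {
        "X-Palm-Api-Key": os.environ.get("PALM_API_KEY", ""),
        "X-Cohere-Api-Key": os.environ.get("COHERE_API_KEY", ""),
    }

def connect():
    # Deferred import so the helper above stays importable on its own.
    import weaviate
    from weaviate.auth import AuthApiKey

    return weaviate.connect_to_local(
        host="localhost",
        port=8080,
        grpc_port=50051,
        auth_credentials=AuthApiKey("admin"),  # key from the compose file
        headers=build_module_headers(),
    )
```

Calling `connect()` returns a client that the rest of the snippets in this article assume is available as `client`.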
3. Indexing the Data in Batches 🗂️
Before we can index data into Weaviate, we must create a collection and define the schema. This is accomplished through the create method, which specifies:
- Collection name: MLPapers
- Properties (or fields)
properties = [
    Property(name="abstract", data_type=DataType.TEXT, tokenization=wvc.config.Tokenization.LOWERCASE, vectorize_property_name=True),
    Property(name="title", data_type=DataType.TEXT, tokenization=wvc.config.Tokenization.LOWERCASE, vectorize_property_name=True),
    Property(name="authors", data_type=DataType.TEXT_ARRAY, tokenization=wvc.config.Tokenization.LOWERCASE, vectorize_property_name=False),
    Property(name="conference", data_type=DataType.TEXT, vectorize_property_name=False),
    Property(name="date", data_type=DataType.DATE, vectorize_property_name=False),
    Property(name="paper_id", data_type=DataType.TEXT, vectorize_property_name=False),
    Property(name="arxiv_id", data_type=DataType.TEXT, vectorize_property_name=False),
    Property(name="nips_id", data_type=DataType.TEXT, vectorize_property_name=False),
]
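With the properties defined, the create call itself could look like the sketch below. The vectorizer, generative, and reranker configurations are assumptions that mirror the modules enabled in the Docker Compose file; `client` is the connected client and `properties` is the list above, and the `project_id` values are placeholders.

```python
def create_papers_collection(client, properties):
    """Sketch: create the MLPapers collection with module configs matching
    ENABLE_MODULES (text2vec-palm, generative-palm, reranker-cohere)."""
    import weaviate.classes as wvc  # deferred so the sketch stays importable

    return client.collections.create(
        name="MLPapers",
        properties=properties,
        vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_palm(
            project_id="your-gcp-project",  # placeholder, replace with yours
        ),
        generative_config=wvc.config.Configure.Generative.palm(
            project_id="your-gcp-project",  # placeholder
        ),
        reranker_config=wvc.config.Configure.Reranker.cohere(),
    )
```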
Once the collection is created, data can be indexed in batches, which expedites the process. Here's the code snippet for that:
batch_size = 50
indices = range(0, len(papers), batch_size)
batches = [papers[i:i + batch_size] for i in indices]
ids = []
for batch in tqdm(batches):
    inserted_objects = collection.data.insert_many(batch)
    ids += inserted_objects.all_responses
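Note that insert_many expects objects whose keys match the collection's properties, while the raw API records use different names (id instead of paper_id, published instead of date). A minimal mapping sketch is shown below; the helper name is mine, and the DATE property may additionally require an RFC 3339 timestamp rather than a bare date string.

```python
def to_weaviate_object(paper: dict) -> dict:
    """Map a raw Papers With Code record onto the collection's properties."""
    return {
        "paper_id": paper.get("id"),
        "title": paper.get("title"),
        "abstract": paper.get("abstract"),
        "authors": paper.get("authors") or [],
        "date": paper.get("published"),  # may need RFC 3339 formatting
        "conference": paper.get("conference"),
        "arxiv_id": paper.get("arxiv_id"),
        "nips_id": paper.get("nips_id"),
    }
```

Applying `to_weaviate_object` to each paper before building the batches keeps the inserted objects aligned with the schema.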
After the indexing process is complete, you can verify the number of vectors using the aggregate method:
collection.aggregate.over_all(total_count=True)
# AggregateReturn(properties={}, total_count=10592)
4. Weaviate Search Functionalities 🔍
Weaviate provides various search options that we'll explore in this section.
#### 4.1 Vector Search
The primary search method utilizes vector similarities. Here's how to implement it:
# Reuse the client configured earlier
collection = client.collections.get("MLPapers")

query = "How to fine-tune an LLM?"
semantic_responses = collection.query.near_text(
    query=query,
    limit=3,
    return_metadata=["certainty"],
)
Weaviate embeds the text query using the vectorizer and then retrieves similar results from the collection.
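By default, similarity is measured with cosine distance, and the certainty metadata maps cosine similarity from [-1, 1] onto [0, 1]. A small pure-Python illustration of that computation (not Weaviate's actual implementation):

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def certainty(a, b):
    # Weaviate reports certainty = (1 + cosine_similarity) / 2
    return (1 + cosine_similarity(a, b)) / 2

print(certainty([1.0, 0.0], [1.0, 0.0]))  # identical vectors -> 1.0
print(certainty([1.0, 0.0], [0.0, 1.0]))  # orthogonal vectors -> 0.5
```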
#### 4.2 Keyword Search
Weaviate supports keyword searches through the BM25 algorithm, a ranking function that scores documents on exact term matches, weighted by term frequency and inverse document frequency. To execute a keyword search, call the bm25 method:
keyword_response = collection.query.bm25(
    query=query,
    limit=5,
    return_metadata=["score"],
)
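For intuition, here is a toy sketch of the BM25 scoring formula with the usual k1 and b defaults. This is an illustration of the ranking function only, not Weaviate's implementation:

```python
import math

def bm25_score(query_terms, doc, corpus, k1=1.2, b=0.75):
    """Toy BM25: score one tokenized doc against query terms."""
    n = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)  # document frequency
        if df == 0:
            continue
        idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
        tf = doc.count(term)  # term frequency in this doc
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avgdl))
    return score

corpus = [
    ["fine", "tune", "llm"],
    ["vector", "database"],
    ["llm", "inference"],
]
print(bm25_score(["fine", "tune"], corpus[0], corpus))
```

Documents that contain none of the query terms score zero, which is why BM25 alone misses semantically related results phrased differently.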
#### 4.3 Hybrid Search
One of the standout features of Weaviate is its hybrid search, which merges multiple search algorithms to enhance the accuracy and relevance of results. This approach allows for matching keywords while also considering semantics.
To perform a hybrid search, use the following code. The alpha parameter balances the two signals: 0 means pure keyword search, 1 means pure vector search, and 0.5 weights them equally.

hybrid_response = collection.query.hybrid(
    query=query,
    limit=3,
    alpha=0.5,
)
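Conceptually, hybrid search runs both a vector query and a BM25 query, then fuses the two ranked lists. With relative score fusion, each list's scores are normalized to [0, 1] and combined using the alpha weight. A rough sketch of that fusion step (an illustration, not Weaviate's internal code):

```python
def normalize(scores):
    """Min-max normalize a {doc_id: score} mapping to [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {doc: (s - lo) / span for doc, s in scores.items()}

def fuse(vector_scores, bm25_scores, alpha=0.5):
    """alpha=1 -> pure vector ranking, alpha=0 -> pure keyword ranking."""
    v, k = normalize(vector_scores), normalize(bm25_scores)
    docs = set(v) | set(k)
    fused = {d: alpha * v.get(d, 0.0) + (1 - alpha) * k.get(d, 0.0) for d in docs}
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)
```

A document that ranks highly on only one signal can still win overall, which is what makes hybrid search robust to both exact-keyword and paraphrased queries.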
5. Generative Search: Transforming Your Database into a RAG System 🤖
Beyond retrieving relevant documents using various search algorithms, Weaviate can also combine search results into a prompt, send it to an LLM, and generate an answer.
To enable generative search, ensure that the generative-palm module is activated when starting Weaviate. Prepare a prompt template to integrate the retrieved context:
prompt = """
Answer the following question using the provided context only.
If the context is not helpful or doesn't provide any relevant details, just say "I don't know."

Question = {query}
Context = {abstract}
"""
Then, to generate an answer grounded in vector search results, with Cohere re-ranking applied to the retrieved documents first, execute:

from weaviate.classes.query import Rerank

response = collection.generate.near_text(
    query=query,
    limit=5,
    grouped_task=prompt,
    rerank=Rerank(
        prop="abstract",
        query=query,
    ),
)
Conclusion
Weaviate stands as an AI-native vector database that seamlessly integrates into the generative AI landscape. By connecting to third-party services and open-source models, it not only retrieves vectors but also performs additional tasks such as data vectorization, document re-ranking, and generating answers through RAG.
The flexibility of deploying Weaviate to a cluster using Kubernetes or accessing it through its managed service makes it an excellent choice for handling intensive workloads. One of the most appealing aspects of this open-source database is the developer experience; everything operates cohesively, and the API is well-designed for optimal usability.
The first video provides an overview of vector databases, focusing on their role and functionalities.
The second video guides you through creating your first vector database with Weaviate, showcasing practical implementation steps.