Unlocking Keyword Search: Enhancing Queries with Weaviate
Understanding Keyword Search
Searching for information is an integral part of daily life. We use search engines such as Google, Bing, or You.com regularly, and we also search for content on platforms like YouTube and social media. Searching is so ingrained in our routines that it is hard to imagine life without it. Businesses, too, rely on search to sift through internal documents. The most common way to implement such search systems is keyword search.
In this article, we will cover the following topics:
- The mechanics of keyword search
- Implementing a keyword search system in Python
- How language models can enhance search systems
Let's dive in!
How Does Keyword Search Operate?
To grasp the fundamentals of keyword search, consider the following illustration.
Imagine querying, “What does the abbreviation EPS represent in finance?” In our example, we are looking through a limited dataset of four sentences. By comparing the number of matching words between the query and the dataset, we observe that the fourth sentence aligns the most with the query, making it a likely candidate for retrieval.
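The word-matching idea described above can be sketched in a few lines of Python. This is a minimal illustration, not the scoring a production system uses: the four sentences are hypothetical stand-ins for the dataset in the illustration, and the helpers `tokenize` and `overlap_score` are names introduced here.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def overlap_score(query, document):
    """Count the distinct words shared by the query and the document."""
    return len(tokenize(query) & tokenize(document))

query = "What does the abbreviation EPS represent in finance?"

# Hypothetical sentences standing in for the four-sentence dataset.
documents = [
    "The weather in Berlin is sunny today.",
    "Python is a popular programming language.",
    "Stock markets closed higher on Friday.",
    "In finance, the abbreviation EPS stands for earnings per share.",
]

scores = [overlap_score(query, doc) for doc in documents]
best_index = max(range(len(documents)), key=lambda i: scores[i])

print(scores)                 # overlap count per sentence
print(documents[best_index])  # sentence with the most matching words
```

With these sample sentences, the fourth one shares the most words with the query, so it is the one retrieved. Note that common words such as "the" and "in" also count as matches, which is one reason real systems weight terms instead of simply counting them.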
A Closer Look at Keyword Search Systems
The schematic below presents a conceptual overview of a keyword search system.
Key components include the query, the search system, and an Inverted Index table. The process is straightforward: a query is submitted, and the system returns ranked results. Typically, the initial phase employs the BM25 algorithm to score documents and utilizes an Inverted Index for optimized performance. For instance, if the query includes the term “EPS,” the system quickly identifies documents containing that keyword, significantly enhancing search speed.
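To make the two building blocks more concrete, here is a small pure-Python sketch of an inverted index combined with BM25 scoring. It is an illustration under stated assumptions rather than the implementation any particular search engine uses: the documents are hypothetical, the function names are introduced here, the parameters `k1=1.2` and `b=0.75` are commonly used defaults, and the IDF term is the widespread variant that adds 1 inside the logarithm to keep scores non-negative.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(documents):
    """Map each term to a dict of {document id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in enumerate(documents):
        for term, freq in Counter(tokenize(text)).items():
            index[term][doc_id] = freq
    return index

def bm25_search(query, documents, index, k1=1.2, b=0.75):
    """Score only the documents that contain at least one query term."""
    doc_lengths = [len(tokenize(doc)) for doc in documents]
    avg_length = sum(doc_lengths) / len(documents)
    n_docs = len(documents)
    scores = defaultdict(float)
    for term in set(tokenize(query)):
        postings = index.get(term, {})
        if not postings:
            continue  # term appears in no document
        n_with_term = len(postings)
        idf = math.log(1 + (n_docs - n_with_term + 0.5) / (n_with_term + 0.5))
        for doc_id, freq in postings.items():
            # Length normalization: long documents are penalized slightly.
            norm = k1 * (1 - b + b * doc_lengths[doc_id] / avg_length)
            scores[doc_id] += idf * freq * (k1 + 1) / (freq + norm)
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)

# Hypothetical documents for illustration only.
documents = [
    "EPS stands for earnings per share.",
    "The company reported record revenue this quarter.",
    "Earnings per share is a common profitability measure.",
]

index = build_inverted_index(documents)
eps_postings = sorted(index.get("eps", {}))
results = bm25_search("earnings per share EPS", documents, index)
ranked_ids = [doc_id for doc_id, _ in results]

print(eps_postings)  # ids of documents containing "eps"
print(ranked_ids)    # document ids ordered by BM25 score
```

The lookup `index.get("eps")` touches only the documents that contain the term, which is where the speed advantage comes from: documents with no query terms are never scored at all.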
Practical Implementation of Keyword Search in Python
Now let’s look at a hands-on example in Python. This demonstration connects to a database to conduct a keyword search.
Setting Up the Environment
First, ensure Python, conda, and pip are installed on your machine. Create a virtual environment in your terminal with the following commands:
conda create -n keyword-search-demo python=3.9.12
conda activate keyword-search-demo
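With the environment ready, install the required packages. The code in this article uses the v3 interface of the Weaviate Python client (weaviate.Client), so the command pins weaviate-client below version 4:
pip install "weaviate-client<4" cohere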
Keyword Search Example
After creating an account on the Cohere platform, obtain your API key. Use the following code snippet to authenticate with the Weaviate database, which contains extensive data from Wikipedia:
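import cohere
import weaviate

# API key for the Weaviate database used in this demo
weaviate_api_key = "76320a90-53d8-42bc-b41d-678647c6672e"
auth_config = weaviate.auth.AuthApiKey(api_key=weaviate_api_key)

cohere_api_key = 'YOUR API KEY'

client = weaviate.Client(
    # Placeholder: the instance URL is not given in this article; supply your own
    url="YOUR WEAVIATE URL",
    auth_client_secret=auth_config,
    additional_headers={
        "X-Cohere-Api-Key": cohere_api_key,
    },
)

# Returns True when the Weaviate instance is reachable and ready
client.is_ready()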
Replace 'YOUR API KEY' with your actual Cohere API key. The last line verifies the connection.
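Hard-coding keys in a script makes them easy to leak. One option, sketched here, is to read the Cohere key from an environment variable instead. The variable name `COHERE_API_KEY` and the helper `load_api_key` are assumptions made for this illustration and are not part of the original setup.

```python
import os

def load_api_key(variable_name="COHERE_API_KEY"):
    """Return the key stored in the given environment variable.

    Raises RuntimeError if the variable is unset or empty, so a missing
    key is caught before any request is sent.
    """
    key = os.environ.get(variable_name)
    if not key:
        raise RuntimeError(f"Set the {variable_name} environment variable first.")
    return key
```

After exporting the variable in your shell, `cohere_api_key = load_api_key()` can replace the hard-coded assignment.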
Querying the Database
The function below executes a keyword search:
def keyword_search(query, results_lang='en', properties=["title", "url", "text"], num_results=3):
    # Restrict results to articles in the requested language
    where_filter = {
        "path": ["lang"],
        "operator": "Equal",
        "valueString": results_lang
    }

    # Rank the "Articles" class with BM25 and keep the top num_results
    response = (
        client.query.get("Articles", properties)
        .with_bm25(query=query)
        .with_where(where_filter)
        .with_limit(num_results)
        .do()
    )

    result = response['data']['Get']['Articles']
    return result
You can invoke this function with the following query:
query = "What does the abbreviation EPS stand for in the financial industry?"
keyword_search_results = keyword_search(query)
print(keyword_search_results)
This will yield a list of results, but the output may not be user-friendly. You can format it for better readability with:
def print_result(result):
    """ Print results with formatting """
    for i, item in enumerate(result):
        print(f'item {i}')
        for key in item.keys():
            print(f"{key}:{item.get(key)}")
            print()
        print()

print_result(keyword_search_results)
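Because the text passages can be long, it may help to shorten them when printing. The following is a hedged variant rather than the article's own formatter: `format_result` and its `max_chars` parameter are hypothetical names, and the sample record is made up to mimic the shape of the dictionaries returned by the search function.

```python
def format_result(result, max_chars=80):
    """Return the results as one string, truncating long field values."""
    lines = []
    for i, item in enumerate(result):
        lines.append(f"item {i}")
        for key, value in item.items():
            value = str(value)
            if len(value) > max_chars:
                value = value[:max_chars] + "..."
            lines.append(f"{key}:{value}")
        lines.append("")  # blank line between items
    return "\n".join(lines)

# Hypothetical record shaped like one search result.
sample = [
    {"title": "Example article", "url": "https://example.com", "text": "x" * 200},
]

formatted = format_result(sample, max_chars=20)
print(formatted)
```

Returning a string instead of printing directly also makes the formatter easier to test or to reuse in a log message.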
Output Example
The results will be structured as follows:
item 0
text:...
title:Subprime mortgage crisis
item 1
text:...
title:Nassim Nicholas Taleb
item 2
text:...
title:UBS
Language Models: Enhancing Search Systems
This section explores how language models can address the limitations inherent in traditional keyword search systems.
Limitations of Keyword Search Systems
Keyword search can struggle when a query and a relevant document express the same idea in different words. For example, the query “It’s a bear market” may return no relevant results if the matching document instead uses a phrase like “sharp declines in share prices,” because the two share no keywords. Language models improve on this by interpreting the overall meaning of the text rather than matching individual words, which allows more relevant documents to be retrieved.
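The vocabulary mismatch is easy to verify. In this short sketch (the `tokenize` helper is a name introduced here), the query and the document phrase from the example are split into words and compared:

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

query = "It's a bear market"
document = "sharp declines in share prices"

# Words that appear in both the query and the document.
shared_terms = tokenize(query) & tokenize(document)
print(shared_terms)
```

The intersection is empty, so a purely keyword-based scorer has nothing to match on, even though a human reader sees that the two phrases describe the same situation.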
Improving Search with Language Models
The illustration below depicts a search system augmented by language models.
Incorporating language models can refine both stages of the search process, enhancing accuracy through embeddings and generating more relevant results.
Conclusion
In this article, we explored the workings of keyword search systems and how to implement them using the Weaviate API. We also highlighted the limitations of such systems and introduced language models as a viable solution for improving search accuracy.