
Getting Started Guide With Text Embedding Models in LangChain

Text embedding is the process of converting text into numerical vectors so that a machine or model can process the data and run similarity searches, such as semantic search, efficiently. The LangChain framework lets developers build language model applications that work with text in natural language.

This post will demonstrate the process of getting started with text embedding models in LangChain.

Getting Started With Text Embedding Models in LangChain

The LangChain module lets the user create text embeddings in two ways: embedding a document containing multiple strings and embedding a single query. To learn how text embedding models work in LangChain, simply go through this straightforward guide:

Step 1: Install Modules
Start by installing the LangChain framework, which contains the required dependencies and libraries:

pip install langchain

Install tiktoken, the tokenizer used by OpenAI models to split text into tokens before the embeddings are created:

pip install tiktoken
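
As an optional sanity check (a minimal sketch, assuming the cl100k_base encoding used by OpenAI's embedding models), tiktoken can show how a sentence is split into tokens:

import tiktoken

# Load the encoding used by OpenAI's embedding models (cl100k_base is an assumption here)
encoding = tiktoken.get_encoding("cl100k_base")

tokens = encoding.encode("Hello Jud, who am I talking to?")
print(len(tokens))              # number of tokens the text is split into
print(encoding.decode(tokens))  # decoding the tokens returns the original text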

The last module to install for this guide is openai, which is needed to use the OpenAIEmbeddings() class for building the text embedders:

pip install openai

Step 2: Setting up OpenAI Environment and Importing Libraries
After installing all the required modules, simply set up the OpenAI environment using its API key:

import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")
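
If the key is already exported in the shell, the prompt can be skipped; the variant below (an optional convenience, not required by LangChain) only asks for the key when the environment variable is missing:

import os
import getpass

# Prompt for the key only if it is not already set in the environment
if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")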

Build the embedding model by importing the OpenAIEmbeddings class and calling it to define the model:

from langchain.embeddings import OpenAIEmbeddings

embeddings_model = OpenAIEmbeddings()
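
The constructor also accepts optional parameters such as the embedding model name; the line below is a sketch that assumes the text-embedding-ada-002 model (the usual default) is available on the account:

# Explicitly pick the OpenAI embedding model instead of relying on the default
embeddings_model = OpenAIEmbeddings(model="text-embedding-ada-002")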

Step 3: Embedding Documents
Once the model is defined, simply call its embed_documents() method on a list of strings and display the number of embeddings along with the length (dimensionality) of the first one:

embeddings = embeddings_model.embed_documents(
    [
        "Hello",
        "Oh, Hi there!",
        "Who am i talking to",
        "My name is Jud",
        "Hello Jud"
    ]
)
len(embeddings), len(embeddings[0])

Step 4: Embedding Queries
The last step uses the embed_query() method with a single string and displays only the first five values of the query's embedding vector:

embedded_query = embeddings_model.embed_query("What was the name mentioned in the conversation")
embedded_query[:5]
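
To tie both steps together, the sketch below ranks the embedded documents from Step 3 against the embedded query using cosine similarity (plain Python, no extra packages); the document texts listed here simply repeat the strings that were embedded above:

import math

documents = [
    "Hello",
    "Oh, Hi there!",
    "Who am i talking to",
    "My name is Jud",
    "Hello Jud"
]

# Cosine similarity between two embedding vectors
def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Score every document embedding against the query embedding and print the ranking
scores = [cosine_similarity(embedded_query, doc_vec) for doc_vec in embeddings]
for text, score in sorted(zip(documents, scores), key=lambda pair: pair[1], reverse=True):
    print(round(score, 3), text)

The string that mentions the name would be expected to score near the top, which is the basic idea behind semantic search with embeddings.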

That is all about the getting started guide with text embedding models in LangChain.

Conclusion

To get started with text embedding models in LangChain, simply install the required modules using the pip command. Setting up the OpenAI environment with its API key is also required before importing the OpenAIEmbeddings() class and building the embedding model. After that, simply use the embed_documents() and embed_query() methods to embed a list of documents or a single query. This guide illustrated the process of getting started with text embedding models in LangChain.


Source: linuxhint.com
