Working with LLMs: Handling Token Limits

Apr 19, 2023

What are tokens?

Tokens are the basic units of text that a language model reads and writes. In Natural Language Processing (NLP), tokens are produced by segmenting a sentence or document into words, subwords, or other meaningful units such as punctuation marks. This tokenization step is what allows Large Language Models (LLMs) to process and analyze large volumes of text data, making it possible to perform tasks such as language translation, text classification, sentiment analysis, and more.

Generally, you won’t tokenize text yourself, but you should be aware that there are always many ways to tokenize a single sentence! Suppose you want to tokenize “Let’s tokenize! Isn’t this easy?”
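To illustrate how the same sentence can tokenize differently, here is a minimal pure-Python sketch comparing two naive schemes (whitespace splitting versus separating out punctuation). Real LLM tokenizers such as OpenAI’s byte-pair encoders split text differently again, often into subword pieces:

```python
import re

sentence = "Let's tokenize! Isn't this easy?"

# Scheme 1: split on whitespace only.
whitespace_tokens = sentence.split()

# Scheme 2: separate runs of word characters from punctuation.
word_punct_tokens = re.findall(r"\w+|[^\w\s]", sentence)

print(whitespace_tokens)   # 5 tokens, punctuation glued to words
print(word_punct_tokens)   # 11 tokens: "Let's" becomes "Let", "'", "s"
```

Same sentence, two very different token counts, which is exactly why token counts depend on the tokenizer you use.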

Try the OpenAI tokenizer out here:

What are token limits?

The token limit is the maximum number of tokens a model can process at once, shared between the prompt and the completion. It is determined by the model’s architecture. Have you heard of GPT-4 or GPT-3.5? Each model has a different token limit, even though they all originate from OpenAI.

You can take a look here for token limits based on each model:

As you can see, GPT-4 has a higher token limit than GPT-3.

As you can imagine, if you have a long document, or a lot of text you want to summarize or run Q&A on top of, this can pose a problem. We have two mitigation strategies to help.
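Before choosing a strategy, it helps to estimate whether your text fits at all. A rough rule of thumb for English text is about four characters per token (an assumption, not an exact count; use a real tokenizer for accuracy), and you should leave room for the completion, since prompt and completion share the limit. The model limits below are illustrative:

```python
# Illustrative context limits (tokens); check your provider's docs for current values.
MODEL_LIMITS = {"gpt-3.5-turbo": 4096, "gpt-4": 8192}

def estimate_tokens(text: str) -> int:
    """Very rough estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def fits(text: str, model: str, completion_budget: int = 500) -> bool:
    """Reserve completion_budget tokens for the model's answer."""
    return estimate_tokens(text) + completion_budget <= MODEL_LIMITS[model]

print(fits("hello " * 100, "gpt-4"))        # short text: fits
print(fits("x" * 40_000, "gpt-3.5-turbo"))  # ~10k tokens: does not fit
```

If the text doesn’t fit, one of the two strategies below applies.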


Chunk and Summarize:

Let’s say you want to summarize all the Slack conversations that occurred in the last week. If you have a chatty team, that is definitely over the token limit. You can overcome this by splitting up, or chunking, the conversation and summarizing each chunk. You can even create a summary on top of the chunk summaries.


  • Think carefully about how you want to chunk your text. The right approach varies with the type of text, but you should keep related content together in the same chunk. Chunking could be time-based, user-based, or a simple sliding window.

  • Summarization prompts: Consider what is important to you, so the LLM retains that information in your summaries. Do you want to retain information about the user? Do you want to make sure any links or code shared are preserved, not just summarized away?

  • This approach is generally simple, but it may not work for all use cases because it does lead to information loss.
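The chunking step above can be sketched as follows: a greedy packer that groups consecutive messages under a token budget, keeping conversation order intact. The token estimate is a rough chars/4 heuristic, and `summarize` in the usage comment is a hypothetical stand-in for a call to an LLM:

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    return max(1, len(text) // 4)

def chunk_messages(messages: list[str], budget: int) -> list[list[str]]:
    """Greedily pack consecutive messages into chunks under a token budget."""
    chunks, current, used = [], [], 0
    for msg in messages:
        cost = estimate_tokens(msg)
        if current and used + cost > budget:
            chunks.append(current)
            current, used = [], 0
        current.append(msg)
        used += cost
    if current:
        chunks.append(current)
    return chunks

# Hypothetical usage: summarize each chunk, then summarize the summaries.
# summaries = [summarize("\n".join(chunk)) for chunk in chunk_messages(msgs, 3000)]
# final = summarize("\n".join(summaries))
```

More sophisticated chunkers split on time gaps or thread boundaries instead of a flat budget, but the map-then-reduce shape stays the same.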

Build a Knowledge Base:

To summarize and ask questions over a large corpus of text, another strategy is to build and then query a knowledge base on top of your data. You can store each data point (a conversation, a paragraph in a book, etc.) as an embedding. In simple terms, an embedding is a numeric representation of a text string that lets us measure relatedness. You can create these embeddings using the OpenAI APIs.

Learn more about embeddings here:

You can store these embeddings in a vector database like Pinecone, Elasticsearch, etc. Then, given a question, you can retrieve the top N most similar embeddings by a similarity metric such as cosine similarity, and use the corresponding text as context for your question.

The prompt can be as simple as:
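One possible shape for such a prompt (an illustrative template with hypothetical placeholder names, not the exact prompt we use):

```python
# {context} holds the retrieved text; {question} holds the user's question.
PROMPT_TEMPLATE = (
    "Answer the question using only the context below. "
    "If the answer is not in the context, say you don't know.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\nAnswer:"
)

prompt = PROMPT_TEMPLATE.format(context="...retrieved chunks...", question="...")
```

The instruction to admit ignorance matters: without it, models tend to answer from general knowledge even when the retrieved context doesn’t support it.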


  • Keep each data point short enough that the retrieved context doesn’t push you over the token limit.

  • Most vector databases have the query and retrieval process implemented for you, so you don’t have to worry about the ranking.

  • Along with the embeddings, you can store metadata like timestamps, users involved, etc., so that you can filter the embeddings first and then perform the similarity search.
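The retrieval step can be sketched in pure Python with cosine similarity. The embeddings here are toy two-dimensional vectors; in practice they would come from an embedding API, and a vector database would do this ranking for you:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product over the product of vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_n(query_vec, corpus, n=2):
    """corpus is a list of (text, embedding) pairs; returns the n closest texts."""
    ranked = sorted(corpus, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:n]]

corpus = [
    ("deploy runbook", [0.9, 0.1]),
    ("lunch menu", [0.1, 0.9]),
    ("rollback steps", [0.8, 0.3]),
]
print(top_n([1.0, 0.0], corpus))  # the two deployment-related entries rank first
```

The retrieved texts are then concatenated into the prompt’s context before asking the question.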

We learned these techniques while building PromptOps, a Slack bot that can help with all your DevOps needs. An ever-present issue in DevOps is ensuring that the on-call engineer has all the context necessary to solve their problem, a case tailor-made for building a knowledge base.

We hope you find these tips helpful. If you’re interested in learning more about how you can build products using LLMs, follow us on Twitter or LinkedIn, or just reach out to us at

© PromptOps 2023. All rights reserved.