What is Asyncio? Getting Started with Asynchronous Python and Using Asyncio in an AI Application with an LLM

Estimated reading time: 10 minutes
Key Takeaways
- `asyncio` is Python's built-in framework for asynchronous programming. It allows your program to switch between tasks instead of blocking, greatly improving efficiency.
- It offers significant performance boosts by running tasks concurrently rather than sequentially, especially for operations that involve waiting, such as network requests or API calls.
- Integrating `asyncio` with Large Language Models (LLMs) via asynchronous clients (e.g., `AsyncOpenAI`) can lead to dramatic speedups, as demonstrated by a nearly 6x improvement in the example below.
- Key applications include generating content for multiple users, multi-step LLM workflows, and fetching data from various APIs simultaneously.
- Embracing `asyncio` enhances performance, cost efficiency, user experience, and scalability in modern AI systems, making applications more responsive and robust under heavy load.
- What is Asyncio? Getting Started with Asynchronous Python and Using Asyncio in an AI Application with an LLM
- Key Takeaways
- What is Asyncio?
- Getting Started with Asynchronous Python
- Using Asyncio in an AI Application with an LLM
- Conclusion
- Frequently Asked Questions
In many AI applications today, performance is a big deal. You may have noticed that while working with LLMs, a lot of time is spent waiting—waiting for an API response, waiting for multiple calls to finish, or waiting for I/O operations.
That's where `asyncio` comes in. Surprisingly, many developers use LLMs without realizing they can speed up their apps with asynchronous programming.
This guide will walk you through:
- What is asyncio?
- Getting started with asynchronous Python
- Using asyncio in an AI application with an LLM
What is Asyncio?
Python's `asyncio` library brings asynchronous programming to the standard library, allowing multiple I/O-bound tasks to run efficiently within a single thread. At its core, `asyncio` works with awaitable objects—usually coroutines—that an event loop schedules and executes without blocking. This means your program doesn't halt completely while waiting for one task to finish; instead, it can switch to another task that is ready to proceed.
In simpler terms, synchronous code runs tasks one after another, like standing in a single grocery line, patiently waiting for each customer to be served before the next one starts. Asynchronous code works more like using multiple self-checkout machines: when one machine is scanning items, you can immediately start scanning items at another, making the overall process much faster. This non-blocking approach is especially useful for operations like network requests (e.g., calling APIs from OpenAI, Anthropic, Hugging Face), where most of the time is spent waiting for responses; by letting your program perform other operations during these waiting periods, it enables much faster execution.
Getting Started with Asynchronous Python
Understanding how to implement asynchronous patterns is crucial for optimizing modern applications, especially those involving external services or intensive I/O operations. Python's `asyncio` library is the standard tool for this. Before diving into complex scenarios, let's explore basic examples to grasp the core concepts of synchronous versus asynchronous execution.
Actionable Step 1: Set Up Your Environment and Basic Understanding
To begin, ensure you have Python 3.7+ installed. `asyncio` is a built-in library, so no extra installation is typically required. Familiarize yourself with the `async` keyword (to define a coroutine or asynchronous context manager) and `await` (to pause execution of a coroutine until an awaitable completes). The primary entry point for running asynchronous code is `asyncio.run()`.
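As a quick illustration of these keywords, here is a minimal sketch (the `fetch_greeting` coroutine is a made-up example, not part of this article's main code):

```python
import asyncio

async def fetch_greeting(name: str) -> str:
    # 'async def' defines a coroutine function
    await asyncio.sleep(1)  # 'await' pauses this coroutine without blocking the event loop
    return f"Hello, {name}!"

async def main():
    greeting = await fetch_greeting("asyncio")
    print(greeting)

asyncio.run(main())  # entry point: starts the event loop and runs main() to completion
```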
Example: Running Tasks With and Without asyncio
Let’s first observe how a program behaves when tasks are executed synchronously.
```python
import time

def say_hello():
    print("Hello...")
    time.sleep(2)  # simulate waiting (like an API call)
    print("...World!")

def main():
    say_hello()
    say_hello()
    say_hello()

if __name__ == "__main__":
    start = time.time()
    main()
    print(f"Finished in {time.time() - start:.2f} seconds")
```
In this example, we ran a simple function three times in a synchronous way. The output shows that each call to `say_hello()` prints “Hello…”, waits 2 seconds, then prints “…World!”. Since the calls happen one after another, the wait time adds up — 2 seconds × 3 calls = 6 seconds total. This clearly illustrates the bottleneck of sequential execution when tasks involve waiting.
Now, let's see the significant performance boost when we introduce `asyncio`.
```python
import nest_asyncio, asyncio
nest_asyncio.apply()

import time

async def say_hello():
    print("Hello...")
    await asyncio.sleep(2)  # simulate waiting (like an API call)
    print("...World!")

async def main():
    # Run tasks concurrently
    await asyncio.gather(
        say_hello(),
        say_hello(),
        say_hello()
    )

if __name__ == "__main__":
    start = time.time()
    asyncio.run(main())
    print(f"Finished in {time.time() - start:.2f} seconds")
```
The output shows that all three calls to the `say_hello()` function started almost at the same time. Each prints “Hello…” immediately, then waits 2 seconds concurrently before printing “…World!”. Because these tasks ran concurrently rather than one after another, the total time is roughly the longest single wait time (~2 seconds) instead of the sum of all waits (6 seconds in the synchronous version). This demonstrates the performance advantage of `asyncio` for I/O-bound tasks. The `nest_asyncio.apply()` line is included here for environments like Jupyter notebooks that might already have an event loop running, allowing `asyncio.run()` to function correctly.
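In recent Jupyter/IPython environments there is also a simpler alternative to `nest_asyncio`: because the notebook already runs an event loop and supports top-level `await`, you can await the coroutine directly in a cell:

```python
# Inside a notebook cell: no asyncio.run() needed
await main()
```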
Example: Download Simulation
Imagine you need to download several files. Each download takes time, but during that wait, your program can work on other downloads instead of sitting idle. This is a classic scenario where asynchronous programming shines.
```python
import asyncio
import random
import time

async def download_file(file_id: int):
    print(f"Start downloading file {file_id}")
    download_time = random.uniform(1, 3)  # simulate variable download time
    await asyncio.sleep(download_time)  # non-blocking wait
    print(f"Finished downloading file {file_id} in {download_time:.2f} seconds")
    return f"File {file_id} content"

async def main():
    files = [1, 2, 3, 4, 5]
    start_time = time.time()
    # Run downloads concurrently
    results = await asyncio.gather(*(download_file(f) for f in files))
    end_time = time.time()
    print("\nAll downloads completed.")
    print(f"Total time taken: {end_time - start_time:.2f} seconds")
    print("Results:", results)

if __name__ == "__main__":
    asyncio.run(main())
```
All downloads started almost at the same time, as shown by the “Start downloading file X” lines appearing immediately one after another. Each file took a different amount of time to “download” (simulated with `asyncio.sleep()`), so they finished at different times — for instance, file 3 might finish first in 1.42 seconds and file 1 last in 2.67 seconds. Since all downloads ran concurrently, the total time taken was roughly equal to the longest single download time (around 2.68 seconds in one run), not the sum of all times. This demonstrates the core benefit of asynchronous programming: when tasks involve waiting, the waits can overlap, greatly improving efficiency and throughput.
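A related detail worth knowing: `asyncio.gather()` waits for every task and returns results in the order the coroutines were passed in. If you would rather react to each download the moment it finishes, the standard library also offers `asyncio.as_completed()`. Here is a minimal sketch, assuming the `download_file` coroutine defined above:

```python
import asyncio

async def main():
    tasks = [download_file(f) for f in [1, 2, 3, 4, 5]]
    # as_completed yields awaitables in completion order, not submission order
    for finished in asyncio.as_completed(tasks):
        content = await finished
        print(f"Processed: {content}")

asyncio.run(main())
```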
Using Asyncio in an AI Application with an LLM
Now that we understand how `asyncio` works, let's apply it to a real-world AI example. Large Language Models (LLMs) such as OpenAI's GPT models often involve multiple API calls that each take time to complete. If we run these calls one after another, we waste valuable time waiting for responses, significantly increasing the total execution time of our applications.
Actionable Step 2: Install Asynchronous Libraries for LLMs
To interact with LLMs asynchronously, you'll need a client library that supports `async`/`await`. For OpenAI, this means using the `AsyncOpenAI` client from the official `openai` package. First, install the necessary package:
```
!pip install openai
```
And set up your API key securely:
```python
import asyncio
import os
from getpass import getpass

from openai import AsyncOpenAI

os.environ['OPENAI_API_KEY'] = getpass('Enter OpenAI API Key: ')
```
In this section, we'll compare running multiple prompts with and without `asyncio` using the OpenAI client. We'll use 15 short prompts to clearly demonstrate the performance difference.
Synchronous LLM Calls
First, let’s see the synchronous approach to querying an LLM multiple times. Each call blocks the program until a response is received.
```python
import time
from openai import OpenAI

# Create sync client
client = OpenAI()

def ask_llm(prompt: str):
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

def main():
    prompts = [
        "Briefly explain quantum computing.",
        "Write a 3-line haiku about AI.",
        "List 3 startup ideas in agri-tech.",
        "Summarize Inception in 2 sentences.",
        "Explain blockchain in 2 sentences.",
        "Write a 3-line story about a robot.",
        "List 5 ways AI helps healthcare.",
        "Explain Higgs boson in simple terms.",
        "Describe neural networks in 2 sentences.",
        "List 5 blog post ideas on renewable energy.",
        "Give a short metaphor for time.",
        "List 3 emerging trends in ML.",
        "Write a short limerick about programming.",
        "Explain supervised vs unsupervised learning in one sentence.",
        "List 3 ways to reduce urban traffic.",
    ]
    start = time.time()
    results = []
    for prompt in prompts:
        results.append(ask_llm(prompt))
    end = time.time()
    for i, res in enumerate(results, 1):
        print(f"\n--- Response {i} ---")
        print(res)
    print(f"\n[Synchronous] Finished in {end - start:.2f} seconds")

if __name__ == "__main__":
    main()
```
The synchronous version processed all 15 prompts one after another, so the total time is the sum of each request's duration. Because every call blocked until its response arrived, the overall runtime was far longer than any single request. This clearly demonstrates how waiting for each API call sequentially can drastically increase execution time for tasks involving multiple LLM interactions.
Asynchronous LLM Calls
Now, let's leverage `asyncio` and the `AsyncOpenAI` client to run the same prompts concurrently.
```python
import asyncio
import time

from openai import AsyncOpenAI

# Create async client
client = AsyncOpenAI()

async def ask_llm(prompt: str):
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

async def main():
    prompts = [
        "Briefly explain quantum computing.",
        "Write a 3-line haiku about AI.",
        "List 3 startup ideas in agri-tech.",
        "Summarize Inception in 2 sentences.",
        "Explain blockchain in 2 sentences.",
        "Write a 3-line story about a robot.",
        "List 5 ways AI helps healthcare.",
        "Explain Higgs boson in simple terms.",
        "Describe neural networks in 2 sentences.",
        "List 5 blog post ideas on renewable energy.",
        "Give a short metaphor for time.",
        "List 3 emerging trends in ML.",
        "Write a short limerick about programming.",
        "Explain supervised vs unsupervised learning in one sentence.",
        "List 3 ways to reduce urban traffic.",
    ]
    start = time.time()
    # Fan out all prompts concurrently
    results = await asyncio.gather(*(ask_llm(p) for p in prompts))
    end = time.time()
    for i, res in enumerate(results, 1):
        print(f"\n--- Response {i} ---")
        print(res)
    print(f"\n[Asynchronous] Finished in {end - start:.2f} seconds")

if __name__ == "__main__":
    asyncio.run(main())
```
The asynchronous version processed all 15 prompts concurrently, starting them almost at the same time instead of one by one. As a result, the total runtime was close to the time of the slowest single request rather than the sum of all requests. This is a nearly 6x speedup! The large difference arises because, in synchronous execution, each API call blocks the program until it finishes, so the times add up. In asynchronous execution with `asyncio`, the API calls overlap, allowing the program to handle many tasks while waiting for responses and drastically reducing total execution time.
Actionable Step 3: Integrate Asyncio into Your LLM Workflows
Whenever your AI application involves multiple I/O-bound operations, especially API calls to LLMs, refactor your code to use `asyncio`. Replace synchronous client calls (e.g., `OpenAI()`) with their asynchronous counterparts (e.g., `AsyncOpenAI()`) and use `asyncio.gather()` to run multiple tasks concurrently. This simple change can yield massive performance improvements.
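One practical wrinkle when refactoring: by default, `asyncio.gather()` propagates the first exception to the caller, so a single failed API call can hide the results of the rest. Passing `return_exceptions=True` collects errors alongside successful results instead. A minimal sketch, with a made-up `flaky_call` coroutine standing in for an LLM request:

```python
import asyncio

async def flaky_call(i: int) -> str:
    await asyncio.sleep(0.1)  # stand-in for an async API call
    if i == 2:
        raise RuntimeError(f"request {i} failed")
    return f"result {i}"

async def main():
    # Errors come back as objects in the results list instead of being raised
    results = await asyncio.gather(
        *(flaky_call(i) for i in range(4)),
        return_exceptions=True,
    )
    for r in results:
        print("failed:" if isinstance(r, Exception) else "ok:", r)

asyncio.run(main())
```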
Why This Matters in AI Applications
In real-world AI applications, waiting for each request to finish before starting the next can quickly become a bottleneck, especially when dealing with multiple queries or data sources. This is particularly common in workflows such as the following (a sketch after the list makes the first two concrete):
- Generating content for multiple users simultaneously — e.g., chatbots, recommendation engines, or multi-user dashboards, where many users expect near-instant responses.
- Calling the LLM several times in one workflow — such as for summarization, refinement, classification, or multi-step reasoning agents that might require iterative calls to an LLM.
- Fetching data from multiple APIs — for example, combining LLM output with information from a vector database, external knowledge bases, or other third-party APIs to enrich responses.
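Here is a hedged sketch in which each user's workflow runs its steps sequentially (a refine step depends on a summarize step) while all users are served concurrently. The `ask_llm` coroutine is a placeholder; in a real app it would be an `AsyncOpenAI` call like the one shown earlier:

```python
import asyncio

async def ask_llm(prompt: str) -> str:
    await asyncio.sleep(1)  # placeholder for a real async LLM call
    return f"response to: {prompt}"

async def handle_user(query: str) -> str:
    # Steps inside one workflow stay sequential: refine depends on summary
    summary = await ask_llm(f"Summarize: {query}")
    refined = await ask_llm(f"Refine this summary: {summary}")
    return refined

async def main():
    queries = ["query from user A", "query from user B", "query from user C"]
    # Independent users are handled concurrently
    answers = await asyncio.gather(*(handle_user(q) for q in queries))
    for answer in answers:
        print(answer)

asyncio.run(main())  # total time ~2s (two sequential steps), not ~6s
```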
Using `asyncio` in these cases brings significant benefits:
- Performance — by making concurrent API calls instead of waiting for each one sequentially, your system can handle more work in less time, directly impacting user satisfaction and system throughput.
- Cost efficiency — faster execution can reduce operational costs by minimizing server idle time, and batching requests where possible can further optimize usage of paid APIs that might charge per unit of time or request.
- User experience — concurrency makes applications feel more responsive, which is crucial for real-time systems like AI assistants and chatbots. Users expect quick interactions, and asynchronous programming helps deliver that.
- Scalability — asynchronous patterns allow your application to handle many more simultaneous requests without proportionally increasing resource consumption, making your AI services more robust and capable under heavy load (see the concurrency-limit sketch below).
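In practice, handling many simultaneous requests also means bounding concurrency so you do not trip provider rate limits. A common pattern is `asyncio.Semaphore`; here is a minimal sketch, again with a placeholder `ask_llm` coroutine:

```python
import asyncio

async def ask_llm(prompt: str) -> str:
    await asyncio.sleep(1)  # placeholder for a real async LLM call
    return f"response to: {prompt}"

async def ask_llm_bounded(sem: asyncio.Semaphore, prompt: str) -> str:
    async with sem:  # waits here whenever the limit is fully in use
        return await ask_llm(prompt)

async def main():
    sem = asyncio.Semaphore(5)  # at most 5 requests in flight at once
    prompts = [f"prompt {i}" for i in range(20)]
    results = await asyncio.gather(*(ask_llm_bounded(sem, p) for p in prompts))
    print(f"Got {len(results)} responses")

asyncio.run(main())
```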
Conclusion
Asynchronous programming with Python's `asyncio` changes how applications spend their waiting time. By switching from synchronous to asynchronous execution, especially in the context of AI applications relying on Large Language Models, developers can achieve dramatic improvements in performance, responsiveness, and scalability. The ability to manage multiple concurrent tasks efficiently, without blocking the main thread, directly translates into faster results, lower operational costs, and a superior user experience. Embracing `asyncio` is not just an optimization; it's a fundamental shift in how we build high-performance, modern AI systems.
Frequently Asked Questions
What is the main benefit of using asyncio in Python?
The primary benefit of using `asyncio` is a dramatic efficiency gain for I/O-bound operations. This includes tasks like network requests, file I/O, and API calls. Asyncio allows your program to perform other tasks while waiting for slow operations to complete, preventing the entire application from blocking.
When should I use asyncio in my Python projects?
You should use `asyncio` primarily for tasks where your program spends a lot of time waiting for external resources. Common use cases include:
- Web scraping
- API calls (especially to LLMs)
- Database interactions
- Network programming
For CPU-bound tasks (heavy computations), `asyncio` will not provide a direct speedup; multiprocessing would be a more suitable choice.
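For completeness, here is a minimal sketch of the multiprocessing alternative using the standard library's `concurrent.futures` (the `crunch` workload is a made-up example):

```python
from concurrent.futures import ProcessPoolExecutor

def crunch(n: int) -> int:
    # CPU-bound toy workload: sum of squares
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        # Each call runs in a separate process, using multiple CPU cores
        results = list(pool.map(crunch, [10_000_000] * 4))
    print(results)
```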
What are `async` and `await` keywords in Python?
The `async` keyword is used to define a coroutine function, which is a special type of function that can be paused and resumed. The `await` keyword is used inside an `async` function to pause its execution until an “awaitable” object (such as `asyncio.sleep()` or an asynchronous API call) completes. This pausing allows the event loop to switch to other tasks during the wait, enabling concurrency.
How does asyncio improve LLM application performance?
LLM applications frequently make multiple API calls to models like OpenAI's GPT. If these calls are executed synchronously, each one blocks the program until a response is received, leading to very long total execution times. By using `asyncio` with asynchronous LLM clients (e.g., `AsyncOpenAI`) and `asyncio.gather()`, multiple LLM calls can be initiated and processed concurrently. This dramatically reduces the overall waiting time, resulting in a significantly faster application.
Is asyncio suitable for all types of Python applications?
No, `asyncio` is primarily designed for, and most effective in, I/O-bound applications. For applications that are CPU-bound (i.e., they spend most of their time performing intensive calculations rather than waiting for external resources), `asyncio` will not offer a performance advantage. In such scenarios, techniques like multiprocessing are typically more appropriate.