Python’s Global Interpreter Lock (GIL) often raises questions about concurrency and performance, especially for web frameworks like FastAPI. How does FastAPI stay so fast despite the GIL, and how can you run it with multiple workers to fully leverage multi-core CPUs?
Let’s explore these concepts clearly.
The Global Interpreter Lock, or GIL, is a mutex that ensures only one thread executes Python bytecode at any given moment inside a single process. This simplifies memory management and protects Python objects from concurrent access issues. However, it means pure Python threads cannot run code in parallel on multiple CPU cores, limiting how multi-threaded Python programs handle CPU-bound tasks.
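A quick way to see this in practice is the classic toy benchmark below (a sketch; exact timings vary by machine and Python version): a CPU-bound countdown run twice sequentially and then on two threads finishes in roughly the same time, because the threads take turns holding the GIL.

```python
import threading
import time

def countdown(n: int) -> None:
    # Pure-Python CPU work; no I/O, so the threads never release the GIL for long
    while n > 0:
        n -= 1

N = 20_000_000

start = time.perf_counter()
countdown(N)
countdown(N)
print(f"sequential:  {time.perf_counter() - start:.2f}s")

start = time.perf_counter()
threads = [threading.Thread(target=countdown, args=(N,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"two threads: {time.perf_counter() - start:.2f}s")  # roughly the same, or slower
```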
This sounds like bad news for a web framework that needs to handle many requests simultaneously, right? Not entirely.
How Does FastAPI Achieve High Performance Despite the GIL?
FastAPI is designed to handle many simultaneous requests efficiently by leveraging Python’s asynchronous programming capabilities, specifically the async/await syntax.
- Asynchronous I/O: FastAPI endpoints can be defined as async functions. When such a function performs an I/O operation, like waiting on a database query, a network response, or file access, it yields control back to an event loop via await. While one request waits, the server can work on other requests, with no need for multiple threads running in parallel (see the sketch after this list).
- Single-threaded event loop: FastAPI runs on ASGI servers like Uvicorn, which manage an event loop in a single thread. This avoids the overhead and complexity of thread locking under the GIL: only one thread executes Python code, but it switches efficiently between the many tasks that are waiting on I/O.
- Ideal for I/O-bound tasks: Web APIs typically spend a lot of time waiting for I/O operations, so asynchronous concurrency lets FastAPI handle many requests without needing multiple CPU cores or threads.
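To make this concrete, here is a minimal sketch of an async endpoint. `asyncio.sleep` stands in for any awaitable I/O call (a database query, an HTTP request to another service); while it is awaited, the event loop is free to serve other requests.

```python
import asyncio
from fastapi import FastAPI

app = FastAPI()

@app.get("/items/{item_id}")
async def read_item(item_id: int):
    # `await` yields control to the event loop here, so other
    # requests can be processed while this one waits on I/O.
    await asyncio.sleep(0.1)  # stand-in for a real async I/O call
    return {"item_id": item_id}
```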
But What If Your Application Is CPU-bound or You Need More Parallelism?
For CPU-bound workloads (heavy calculations) or simply to better utilize multi-core CPUs for handling many requests in parallel, you need multiple processes. This is where Uvicorn’s worker processes come in.
Uvicorn, the ASGI server often used to run FastAPI, supports spawning multiple worker processes via the --workers option. Each worker is a separate process with its own Python interpreter and GIL. Workers run independently and can handle requests concurrently across different CPU cores. The master Uvicorn process listens on a port and delegates incoming requests to the worker processes.
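For example, assuming your app object lives in `main.py`, the sketch below starts four workers programmatically; it is equivalent to running `uvicorn main:app --workers 4` on the command line.

```python
import uvicorn

if __name__ == "__main__":
    # Each worker is a separate process with its own interpreter and GIL.
    # `workers` requires passing the app as an import string, not an object.
    uvicorn.run("main:app", host="0.0.0.0", port=8000, workers=4)  # often set to the CPU core count
```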
This model effectively sidesteps the single-threaded GIL limitation by scaling the workload horizontally across processes rather than threads, unlike multi-threaded frameworks on the JVM or .NET (e.g. Spring Boot, ASP.NET MVC).
Set the number of workers roughly equal to your CPU cores for optimal utilization. Each worker is a separate process, so memory usage will increase.
When deploying with containers or orchestration tools like Kubernetes, it’s common to run one worker per container and scale containers horizontally.
Note that the vast majority of web applications and REST APIs are I/O-bound, not CPU-bound. So even a single FastAPI server using async programming will often be more than sufficient; add a second server behind a load balancer if you need high availability.
But what if you depend on synchronous libraries and cannot write async code? FastAPI can handle sync routes as well:
When FastAPI routes are defined as synchronous functions (plain def), the framework runs their handlers in an external thread pool instead of on the main event loop thread. This keeps synchronous code from blocking the server's event loop, so requests can still be processed concurrently. Each sync handler effectively executes on a worker thread managed by the thread pool in the underlying Starlette framework. The trade-off: a blocking I/O operation in a sync route ties up a thread for its whole duration, which can limit scalability under heavy load. In other words, sync routes give you thread-based concurrency rather than the non-blocking concurrency of async def routes, and under the GIL those threads still cannot run Python bytecode in parallel. By default, the thread pool used for synchronous routes is capped at 40 threads.
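As an illustration, a sync route might look like the sketch below; `time.sleep` stands in for a blocking call into a synchronous library. Because the handler is declared with plain `def`, it is dispatched to the thread pool rather than run on the event loop.

```python
import time
from fastapi import FastAPI

app = FastAPI()

@app.get("/report")
def generate_report():
    # Plain `def` handler: FastAPI runs this in a worker thread,
    # so the blocking call below does not stall the event loop.
    # It does, however, occupy one of the pool's threads (40 by default).
    time.sleep(2)  # stand-in for a blocking library call
    return {"status": "done"}
```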