A concise overview of approaches available in Python


If you’re about to start a big data project, you will either be retrieving a lot of information over the network, crunching big numbers on your machine, or both. However, if the code runs sequentially or synchronously, your application may start to struggle.

Let’s see which concepts and Python libraries can improve your application’s performance in each case.

What concurrency and parallelism are and what problems they address

There are two dimensions along which you can speed your program — I/O and CPU consumption. If your code does, for instance, a lot of file accessing or communication over the network, it’s I/O-bound. CPU-bound code involves heavy computation. For example, training a statistical model is definitely a compute-intensive job.

How do both types of work differ regarding the resources they require?

When I/O-bound code sends multiple requests, it barely utilizes the machine’s CPU cores: essentially, it sits idle waiting for responses. Such applications therefore can’t improve their performance by adding more compute power. What matters is the wait time between request and response.

It’s the opposite for the CPU-bound pieces of code.

The two mechanisms that alleviate these bottlenecks are concurrency and parallelism, respectively.

Generally, concurrency is considered to be a broader concept than parallelism. Simply put, it’s doing multiple things at the same time. In practice, there is a particular angle to the distinction between the two ideas, especially in Python. Concurrency is often understood as “managing” multiple jobs simultaneously. In actuality, those jobs don’t all execute at the same time; they cleverly alternate.

Parallel execution, however, does mean running multiple jobs simultaneously, or in parallel. Parallelism lets you leverage multiple cores on a single machine.

Three Python libraries for concurrency and parallelism

In Python, concurrency is represented by threading and asyncio, whereas parallelism is achieved with multiprocessing.


With threading, you create multiple threads across which you distribute some I/O-bound workload. For example, if you have a simple function download_file(f) that downloads a single file, you can use the ThreadPoolExecutor to spin up threads and map the function over a list of files:

from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor() as executor:
    executor.map(download_file, files)
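Here is a fuller, runnable sketch of the same idea. The download_file body below is a stand-in of my own: it simulates network I/O with time.sleep, and the file names are made up for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def download_file(f):
    # Stand-in for real network I/O: sleeping releases the GIL,
    # so other threads can run while this one "waits".
    time.sleep(0.1)
    return f"downloaded {f}"

files = ["a.csv", "b.csv", "c.csv", "d.csv"]

start = time.perf_counter()
with ThreadPoolExecutor() as executor:
    results = list(executor.map(download_file, files))
elapsed = time.perf_counter() - start

print(results)
print(elapsed)  # well under 0.4 s: the four 0.1 s "downloads" overlap
```

Note that executor.map returns results in the order of the input arguments, regardless of which thread finishes first.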

Something worth mentioning here is that threading in Python doesn’t work quite the same way as in other languages like Java: CPython’s Global Interpreter Lock (GIL) protects the interpreter’s internal state, so only one thread can execute Python bytecode at a time (see more in the threading documentation). So it’s really a concurrent mechanism as defined above, not a parallel one.


With asyncio, you create tasks for similar purposes:

tasks = [asyncio.create_task(download_file(f)) for f in files]

But the idea behind asyncio’s tasks is different from threads. In fact, all tasks run on a single thread. However, each task yields control back to the event loop whenever it is waiting for a response, so another task can run instead of being blocked. That’s the essence of asynchronous I/O. (A more thorough walk-through of an asynchronous program will come in a later article.)

You’ll also need an event loop object to manage the tasks. Most often, you can simply hand your main coroutine to asyncio.run(), which creates and manages the loop for you.
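Putting the pieces together, here is a minimal runnable sketch. As before, download_file is a hypothetical coroutine of my own that fakes I/O with asyncio.sleep; in real code it might wrap an HTTP request.

```python
import asyncio
import time

async def download_file(f):
    # Stand-in for real async I/O (e.g. an HTTP request):
    # awaiting sleep yields control back to the event loop.
    await asyncio.sleep(0.1)
    return f"downloaded {f}"

async def main():
    files = ["a.csv", "b.csv", "c.csv"]
    tasks = [asyncio.create_task(download_file(f)) for f in files]
    # gather waits for all tasks and collects their results in order
    return await asyncio.gather(*tasks)

start = time.perf_counter()
results = asyncio.run(main())  # creates and manages the event loop
elapsed = time.perf_counter() - start
print(results)
```

Because the three tasks wait concurrently on one thread, the whole run takes roughly 0.1 s rather than 0.3 s.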

threading vs asyncio

The asyncio module can improve the performance of I/O-bound programs even further, because creating and managing tasks carries less overhead than creating and managing threads.

The threading approach can be deemed more dangerous because switching between threads can happen at any time, even in the middle of a statement, due to pre-emptive multitasking. asyncio tasks, by contrast, signal themselves when they are ready to switch over, a mechanism known as cooperative multitasking. If an issue occurs at switch time, it can be much more difficult to trace with the threading approach.

However, using the asyncio module involves writing a fair amount of code just to accommodate it.


Both approaches above work well for speeding up I/O-bound programs. As for the CPU-bound programs, it’s multiprocessing that will really help.

The multiprocessing module spawns a separate Python interpreter for each process, so it actually leverages multiple CPU cores on your machine. A typical example of a CPU-bound piece of code is compressing files. So if you have a function compress_file(f), the syntax for spinning up new processes and distributing the workload across them looks similar to the threading example:

from concurrent.futures import ProcessPoolExecutor

with ProcessPoolExecutor() as executor:
    executor.map(compress_file, files)

If multiprocessing is so amazing, why not use it all the time?

There are a couple of tricky things about writing code with multiprocessing. First, you need to establish whether some data needs to be accessed by all processes, because memory is not shared between them. Also, it can sometimes be difficult to figure out which parts of the program can be cleanly split into separate processes.

Finally, you should carefully weigh the performance gain from multiprocessing against its cost in your case. If the computation is in fact not all that intensive, multiprocessing might not speed things up much, because of the significant overhead of spinning up an interpreter for each process.

What are the results of your experiments with these modules? Any interesting observation about their behavior in different setups?

If you are interested in articles on Python, feel free to check out my post on new features in Python 3.9 and Python code optimization!

Concurrency and Parallelism in Python was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.