If you’ve heard a lot about the asyncio module being added to Python and are wondering how it compares to other concurrency methods, or if you’re wondering what concurrency is and how it might speed up your program, you’ve come to the right place. In this article, you’ll learn the following:
- What concurrency is
- What parallelism is
- How some of Python’s concurrency methods compare, including threading, asyncio, and multiprocessing
- When to use concurrency in your program and which module to use
This article assumes you are familiar with Python’s fundamentals and that you are running at least Python 3.6 to follow along with the code examples. The sample code is available for download from the Real Python GitHub repository.
What Is Concurrency?
The dictionary definition of concurrency is simultaneous occurrence. In Python, the things that occur “simultaneously” are called by different names, but at a high level they all refer to the same thing: a sequence of instructions that runs in order.
I like to think of them as different trains of thought. Each one can be stopped at certain points, and the CPU or brain processing them can switch to a different one. The state of each one is saved so it can be restarted right where it was interrupted.
You might wonder why Python uses multiple terms for the same concept. It turns out that threads, tasks, and processes are only the same if you view them at a high level. Once you start digging into the details, each one represents something slightly different. The differences between them will become clearer as you work through the examples.
Now let’s talk about the “simultaneous” part of that definition. You have to be a little careful because, when you get down to the details, only multiprocessing actually runs these trains of thought at literally the same time. threading and asyncio both run on a single processor and therefore only run one at a time. They just find clever ways to take turns to speed up the overall process. Even though their trains of thought don’t run simultaneously, we still call this concurrency.
The main distinction between threading and asyncio is in how the tasks take turns. With threading, the operating system actually knows about each thread and can interrupt it at any time to start running a different one. Because the OS can pre-empt your thread and make the switch without your code’s involvement, this is known as pre-emptive multitasking.
The benefit of pre-emptive multitasking is that the code in the thread doesn’t need to do anything to make the switch happen. The “at any time” part is also what makes it difficult: the switch can happen in the middle of a single Python statement, even a trivial one like x = x + 1.
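To make that concrete, here’s a minimal sketch (not from the original example code) of two threads racing on exactly that kind of statement. Because the read and the write of counter can be interleaved between threads, the final total can come up short:

```python
import threading

counter = 0


def increment_many():
    global counter
    for _ in range(100_000):
        # Read, add, and write back: the OS may swap threads mid-statement,
        # so two threads can read the same value and lose an increment.
        counter = counter + 1


threads = [threading.Thread(target=increment_many) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # can be less than the expected 200000
```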
asyncio, in contrast, uses cooperative multitasking. The tasks must cooperate by announcing when they are ready to be switched out. That means the code in the task has to change slightly to make this happen.
The payoff of this extra work up front is that you always know exactly where your task will be swapped out. It will not be swapped out in the middle of a Python statement unless that statement is explicitly marked. You’ll see later how this can simplify parts of your design.
What Is Parallelism?
You have studied concurrency on a single processor up until this point. Isn’t it amazing that your brand new laptop has multiple processor cores? What is the best way to utilize them? The solution is multiprocessing.
Python’s multiprocessing feature allows for the spawning of brand-new processes. You can loosely think of a new process as a new program, although strictly speaking, a process is usually defined as a collection of resources such as memory and file handles. One way to think about it is that each process runs in its own Python interpreter.
Because each train of thought in a multiprocessing program is a distinct process, it can run on a different core. Having each process run on its own separate core means they can actually run at the same time, which is fantastic. Doing this brings some complications, but Python handles them fairly well most of the time.
Now that you have an idea of what concurrency and parallelism are and why they’re useful, let’s review how the different types compare:
| Concurrency Type | Switching Decision | Number of Processors |
|---|---|---|
| Pre-emptive multitasking (threading) | The operating system decides when to switch tasks external to Python. | 1 |
| Cooperative multitasking (asyncio) | The tasks decide when to give up control. | 1 |
| Multiprocessing (multiprocessing) | The processes all run at the same time on different processors. | Many |
Each of these forms of concurrency can be useful. Let’s take a look at what kinds of programs they can help you speed up.
When Is Concurrency Useful?
Concurrency can make a big difference for two types of problems. These are generally called CPU-bound and I/O-bound.
I/O-bound problems cause your program to slow down because it frequently must wait for input or output from some external resource. They arise often when your program is working with things that are much slower than your CPU.
There are countless examples of things that are slower than your CPU, but most of them are completely irrelevant to your program. Your app’s most common bottlenecks will be the file system and network connections.
So, how does that manifest itself?
In the preceding diagram, red sections represent time spent waiting for an I/O operation to complete, while blue sections show time spent actually processing data. This diagram is not to scale, because requests over the internet can take several orders of magnitude longer than CPU instructions, so your program can end up spending almost all of its time waiting. This is what your browser is doing most of the time.
On the other hand, there exist categories of programs that can perform substantial processing without using the file system or network. These applications are known as CPU-bound because the CPU itself, rather than the network or the file system, is the bottleneck in their performance.
Here’s the corresponding diagram for a CPU-bound program:
Different types of concurrency perform better or worse with CPU-bound and I/O-bound programs, as you’ll see when you work through the examples in the next section. It is up to you to weigh the potential gains in speed against the additional code and complexity introduced by concurrency. You should have enough information to get started making that choice by the time you finish reading this article.
To make this clearer, here’s how you can think about the two concepts:
| I/O-Bound Process | CPU-Bound Process |
|---|---|
| Your program spends most of its time talking to a slow device, like a network connection, a hard drive, or a printer. | Your program spends most of its time doing CPU operations. |
| Speeding it up involves overlapping the times spent waiting for these devices. | Speeding it up involves finding ways to do more computations in the same amount of time. |
You’ll look at I/O-bound programs first. In the following section, you’ll see code that handles CPU-bound programs.
How to Speed Up an I/O-Bound Program
First, we’ll examine a common issue with I/O-bound programs: downloading content from remote servers. While we’ll use web page downloads as the example, the same ideas apply to any kind of network traffic. It’s just easier to visualize and set up with web pages.
Synchronous Version
We’ll begin with a non-concurrent version of this task. Note that this program requires the requests module. You should run pip install requests before running it, probably in a virtualenv. This version uses no concurrency at all:
```python
import requests
import time


def download_site(url, session):
    with session.get(url) as response:
        print(f"Read {len(response.content)} from {url}")


def download_all_sites(sites):
    with requests.Session() as session:
        for url in sites:
            download_site(url, session)


if __name__ == "__main__":
    sites = [
        "https://www.jython.org",
        "http://olympus.realpython.org/dice",
    ] * 80
    start_time = time.time()
    download_all_sites(sites)
    duration = time.time() - start_time
    print(f"Downloaded {len(sites)} in {duration} seconds")
```
As you can see, this is a fairly short program. download_site() just retrieves the data from a given URL and prints its size. One small thing worth pointing out is that we’re using a Session object from requests.
While calling get() directly from requests will work, creating a Session object lets requests do some clever networking tricks, such as reusing connections, and really speeds things up.
download_all_sites() creates the Session and then walks through the list of sites, downloading each one in turn. Finally, it prints out how long this process took, so you can have the satisfaction of seeing how much concurrency helps in the following examples.
The processing diagram for this program will look much like the I/O-bound diagram in the previous section. It’s important to remember that network traffic is extremely dynamic and can change drastically even within a single second. Due to network issues, I’ve observed the duration of these tests quadruple between runs.

The Many Benefits of the Synchronous Version
The great thing about this version of the code is that it’s easy. It was comparatively simple to write and debug. It’s also easier to reason about: there’s only one train of thought running through it, so you can always predict what the next step is and how it will behave.

The Problems With the Synchronous Version
The main drawback here is that it’s relatively slow compared to the alternatives we’ll provide. Here’s an example of what the final output looked like on my machine:
```
$ ./io_non_concurrent.py
[most output skipped]
Downloaded 160 in 14.289619207382202 seconds
```
Your mileage may vary. When I ran this script, the times ranged from 14.2 seconds up to 21.9 seconds. For this article, I took the fastest of three runs as the time. The differences between the approaches will still be easy to see.
Being slower isn’t always a deal breaker, however. If the program you’re running takes only 2 seconds with the synchronous version and is rarely run, it’s probably not worth adding concurrency. You can stop here.
What if your program is run frequently? What if it takes hours to run? Let’s move on to concurrency by rewriting this program using threading.

Threading Version
As you might expect, writing a threaded program takes more effort. However, in simple cases, you may be pleasantly surprised by how little extra work is required. Here’s what the same program looks like with threading:
```python
import concurrent.futures
import requests
import threading
import time


thread_local = threading.local()


def get_session():
    if not hasattr(thread_local, "session"):
        thread_local.session = requests.Session()
    return thread_local.session


def download_site(url):
    session = get_session()
    with session.get(url) as response:
        print(f"Read {len(response.content)} from {url}")


def download_all_sites(sites):
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        executor.map(download_site, sites)


if __name__ == "__main__":
    sites = [
        "https://www.jython.org",
        "http://olympus.realpython.org/dice",
    ] * 80
    start_time = time.time()
    download_all_sites(sites)
    duration = time.time() - start_time
    print(f"Downloaded {len(sites)} in {duration} seconds")
```
Adding threading requires only minor adjustments to the overall structure. download_all_sites() changed from simply calling the function once per site to a slightly more complex arrangement.
In this version, you’re creating a ThreadPoolExecutor, which seems like a complicated thing. Let’s break it down: ThreadPoolExecutor = Thread + Pool + Executor.
You already know about the Thread part: that’s just the train of thought we mentioned earlier. The Pool portion is where it starts to get interesting: this object is going to create a pool of threads, each of which can run concurrently. The Executor is the final piece, and it controls how and when each of the threads in the pool will run. It will execute the requests in the pool.
To make it easier to manage establishing and removing the pool of Threads, the standard library provides ThreadPoolExecutor as a context manager.
Once you have a ThreadPoolExecutor, its .map() method comes in handy. This method runs the passed-in function on each site in the list. The best part is that it automatically runs them concurrently, using the pool of threads it is managing.
Those who are used to objects and methods such as Thread.start(), Thread.join(), and Queue in other languages or even Python 2 may be confused as to their absence in Python 3.
These remain available and can be used to fine-tune the execution of your threads. But starting with Python 3.2, if you don’t require that granular control, the standard library provides a higher-level abstraction called Executors that handles many of the intricacies for you.
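If you’re curious, here’s a rough sketch of what download_all_sites() might look like with raw Thread objects instead of an Executor. This is an illustration under assumptions, not code from the repository, and note that it starts one thread per site rather than a capped pool:

```python
import threading


def download_all_sites(sites):
    threads = []
    for url in sites:
        thread = threading.Thread(target=download_site, args=(url,))
        thread.start()
        threads.append(thread)
    # Wait for every download thread to finish before returning.
    for thread in threads:
        thread.join()
```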
The other interesting change in our example is that each thread now needs to create its own requests.Session() object. This isn’t immediately obvious from the requests documentation, but reading this issue makes it fairly clear that each thread needs its own Session.
This is one of the interesting and difficult aspects of threading. Because the operating system is in control of when your task gets interrupted and another task starts, any data that is shared between the threads needs to be protected, or thread-safe. Unfortunately, requests.Session() is not thread-safe.
There are several strategies for making data access thread-safe, depending on what the data is and how you’re using it. One of them is to use thread-safe data structures like Queue from Python’s queue module.
These objects use low-level primitives like threading.Lock to ensure that only one thread can access a block of code or a chunk of memory at the same time. You are using this strategy indirectly by way of the ThreadPoolExecutor object.
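As a standalone illustration (not part of the downloader), here’s the classic pattern of protecting shared state with threading.Lock:

```python
import threading

lock = threading.Lock()
counter = 0


def safe_increment():
    global counter
    # Only one thread at a time can hold the lock, so the
    # read-modify-write below can't be interleaved.
    with lock:
        counter += 1
```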
Another strategy to use here is something called thread local storage. threading.local() creates an object that looks like a global but is specific to each individual thread. In this example, that’s done with thread_local and get_session():
```python
thread_local = threading.local()


def get_session():
    if not hasattr(thread_local, "session"):
        thread_local.session = requests.Session()
    return thread_local.session
```
local() is in the threading module specifically to solve this problem. It may seem counterintuitive, but you only want to create one of these objects, not one for each thread. The object itself takes care of separating access from different threads to different data.
When get_session() is called, the session it looks up is specific to the thread on which it’s running. So each thread will create a single session the first time it calls get_session() and then simply use that session on each subsequent call throughout its lifetime.
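Here’s a tiny, hypothetical demo of that isolation: every thread writes to the same thread_local object, but each one reads back only its own value:

```python
import threading

thread_local = threading.local()


def worker(value):
    thread_local.value = value
    # Each thread sees only the value it stored itself.
    print(f"{threading.current_thread().name} sees {thread_local.value}")


threads = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```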
Finally, a quick word about picking the number of threads. You can see that the example code uses 5 threads. Feel free to play around with this number and see how the overall time changes. You might expect that one thread per download would be the fastest, but at least on my system it was not. The best results I found were somewhere between 5 and 10 threads. If you go any higher than that, the extra overhead of creating and destroying the threads erases any time savings.
The difficult answer is that the correct number of threads is not a constant from one task to another. Some experimentation is required.

Reasons Why the Threaded Version Is Awesome
It’s fast! This was the fastest run in my tests. Remember that the non-concurrent version took more than 14 seconds:
```
$ ./io_threading.py
[most output skipped]
Downloaded 160 in 3.7238826751708984 seconds
```
Its time diagram for execution is as follows:
By having multiple open requests out to different web sites at the same time, your program can overlap the waiting times and get the final result faster. Yippee! That’s what we were hoping to accomplish.

Issues with the Threading Implementation
As the example demonstrates, however, doing so requires a bit more code, and you really need to give some thought to which data is shared between threads.
Threads can interact in ways that are subtle and hard to detect. These interactions can cause race conditions, which frequently result in random, intermittent bugs that are quite difficult to track down. Those of you who aren’t already familiar with the term “race conditions” may want to read the expanded, more detailed explanation below.
asyncio Version

Before you jump into examining the asyncio example code, let’s talk more about how asyncio works.

Asyncio 101
We’ll be looking at a simplified version of asyncio here. This glosses over many of the mechanics, but it conveys the idea of how it works.
The central idea of asyncio is that a single Python object, called the event loop, controls how and when each task gets run. The event loop is aware of each task and knows what state it’s in. In reality, tasks can be in many different states, but for now let’s imagine a simplified event loop that has just two states.
Tasks can be in one of two states: ready, indicating they have work to do and are ready to run, or waiting, indicating they are waiting for something external to complete, such as a network operation.
Your simplified event loop maintains two lists of tasks, one for each of these states. It selects one of the ready tasks and starts it running. That task is in complete control until it cooperatively hands control back to the event loop.
When the running task gives control back to the event loop, the event loop places that task into either the ready or waiting list, and then goes through each task in the waiting list to see if it has become ready because an I/O operation completed. It knows that the tasks in the ready list are still ready because they haven’t run yet.
Once all of the tasks have been sorted into the right list again, the event loop picks the next task to run, and the cycle begins again. Your simplified event loop picks the task that has been waiting the longest. This process repeats until the event loop is finished.
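In rough pseudocode, the simplified event loop described above might look like this. To be clear, this is a conceptual sketch with made-up helpers (all_tasks(), io_complete(), run_until_next_await()), not asyncio’s actual implementation:

```python
# Conceptual sketch only: not real asyncio internals.
ready = all_tasks()  # made-up helper: tasks that can run right now
waiting = []         # tasks blocked on I/O

while ready or waiting:
    # Move any task whose I/O has finished back onto the ready list.
    for task in list(waiting):
        if task.io_complete():
            waiting.remove(task)
            ready.append(task)

    # Run the task that has been ready the longest until it awaits.
    task = ready.pop(0)
    task.run_until_next_await()

    # Re-file the task according to its new state.
    if task.is_waiting_on_io():
        waiting.append(task)
    elif not task.is_done():
        ready.append(task)
```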
One key aspect of asyncio is that tasks never give up control unless they do so intentionally. They are never interrupted in the middle of an operation. This makes resource sharing somewhat simpler in asyncio than in threading: you don’t have to worry about making your code thread-safe.
That’s the big picture of asyncio in action. If you’re looking for more detail, this answer on StackOverflow provides some good depth.

async and await
Now let’s talk about two new keywords in Python: async and await. In light of the discussion above, you can view await as the magic that allows the task to hand control back to the event loop. When your code awaits a function call, it’s a signal that the call is likely to take a while and that the task should give up control.
It’s easiest to think of async as a flag to Python telling it that the function about to be defined uses await. While this isn’t strictly true in all cases (for example, with asynchronous generators), it holds for most situations and gives you a simple model while you’re getting started.
One exception to this that you’ll see in the next code is the async with statement, which creates a context manager from an object you would normally await. While the semantics are a little different, the underlying idea is the same: to flag this context manager as something that can get swapped out.
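Here’s a minimal, self-contained example of the two keywords in action, using asyncio.sleep() as a stand-in for a slow I/O call. Because each task gives up control at its await, the two one-second waits overlap:

```python
import asyncio


async def say_after(delay, message):
    await asyncio.sleep(delay)  # the task hands control back here
    print(message)


async def main():
    # Both sleeps overlap, so this takes about 1 second, not 2.
    await asyncio.gather(say_after(1, "first"), say_after(1, "second"))


asyncio.get_event_loop().run_until_complete(main())
```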
As you can imagine, there’s some complexity in managing the interaction between the event loop and the tasks. For developers starting out with asyncio, these details aren’t important, but you do need to remember that any function that uses await must be marked with async. You’ll get a syntax error otherwise.

Back to the Code
Now that you have a basic understanding of what asyncio is, let’s walk through the asyncio version of the example code and see how it all comes together. Note that this version adds aiohttp. You should run pip install aiohttp before running it:
```python
import asyncio
import time

import aiohttp


async def download_site(session, url):
    async with session.get(url) as response:
        print("Read {0} from {1}".format(response.content_length, url))


async def download_all_sites(sites):
    async with aiohttp.ClientSession() as session:
        tasks = []
        for url in sites:
            task = asyncio.ensure_future(download_site(session, url))
            tasks.append(task)
        await asyncio.gather(*tasks, return_exceptions=True)


if __name__ == "__main__":
    sites = [
        "https://www.jython.org",
        "http://olympus.realpython.org/dice",
    ] * 80
    start_time = time.time()
    asyncio.get_event_loop().run_until_complete(download_all_sites(sites))
    duration = time.time() - start_time
    print(f"Downloaded {len(sites)} sites in {duration} seconds")
```
This version is a bit more complex than the previous two. It has a similar structure, but there’s a bit more work setting up the tasks than there was creating the ThreadPoolExecutor. Let’s start at the top of the example.

download_site()
download_site() at the top is nearly identical to the threaded version, with the exception of the async keyword on the function definition line and the async with keywords when you actually call session.get(). You’ll see later why Session can be passed in here rather than using thread-local storage.

download_all_sites()
The most noticeable difference from the threading example occurs in download_all_sites().
The session is created as a context manager that can be shared by all of the tasks. The tasks can share the session because they are all running on the same thread. There is no way one task could interrupt another while the session is in a bad state.
Inside that context manager, a list of tasks is created and started using asyncio.ensure_future(). Once all the tasks are created, this function uses asyncio.gather() to keep the session context alive until all of the tasks have completed.
This is essentially what the threading code does, too, but there the ThreadPoolExecutor handles the details for you. There isn’t an AsyncioPoolExecutor class.
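As a side note, on Python 3.7 and later the same task fan-out can be written with asyncio.create_task(), which is the now-recommended spelling; here’s a sketch of the equivalent function:

```python
async def download_all_sites(sites):
    async with aiohttp.ClientSession() as session:
        # asyncio.create_task() (Python 3.7+) schedules each coroutine
        # on the running event loop, just like ensure_future() above.
        tasks = [
            asyncio.create_task(download_site(session, url)) for url in sites
        ]
        await asyncio.gather(*tasks, return_exceptions=True)
```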
However, the devil is in the details, and there is one small but important change buried in here. Remember how we talked about the number of threads to create? It wasn’t obvious in the threading example what the optimal number of threads was.
One of the cool advantages of asyncio is that it scales far better than threading. Each task takes far fewer resources and less time to create than a thread, so creating and running more of them works well. This example just creates a separate task for each site to download, and it works out quite well.

__main__
Because of how asyncio works, you have to start up the event loop and tell it which tasks to run. The __main__ section at the bottom of the file contains the code for get_event_loop() and run_until_complete(). If nothing else, they’ve done an excellent job of naming those functions.
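Extracted from the __main__ block, that call looks like this, shown alongside the Python 3.7+ shorthand discussed next (use one or the other, not both):

```python
# Python 3.6: fetch the event loop and run the coroutine to completion.
asyncio.get_event_loop().run_until_complete(download_all_sites(sites))

# Python 3.7+ equivalent: asyncio.run() creates and closes the loop for you.
asyncio.run(download_all_sites(sites))
```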
If you’ve upgraded to Python 3.7, the Python core developers have simplified this syntax for you. Instead of the tongue-twisting asyncio.get_event_loop().run_until_complete(), you can just use asyncio.run().

What’s So Great About the asyncio Version
It’s really fast! This version of the code was the fastest in the tests on my machine, by a good margin:
```
$ ./io_asyncio.py
[most output skipped]
Downloaded 160 in 2.5727896690368652 seconds
```
The time diagram for the execution is very similar to the threading example. It’s only that a single thread handles all of the I/O operations:
This code is slightly more involved than the threading example because there’s no convenient wrapper like the ThreadPoolExecutor to hide the details. This is a case where a little extra effort yields much better performance.
One common criticism is that having to add async and await in the proper locations adds complexity. To a small extent, that’s true. The flip side is that it forces you to think about when a given task will get swapped out, which can help you create a better, faster design.
The scaling issue also looms large here. Running the threading example above with one thread for each site is noticeably slower than running it with a handful of threads. The asyncio example ran smoothly even with hundreds of simultaneous tasks.

Issues with the asyncio Implementation
There are a couple of issues with asyncio at this point. To get the full advantage of asyncio, you need to use libraries that have been updated to work with its API. Had you downloaded the sites using just requests, it would have been much slower, because requests is not designed to notify the event loop that it’s blocked. This issue is getting smaller over time as more libraries embrace asyncio.
A more subtle issue is that all of the advantages of cooperative multitasking get thrown away if even one of the tasks doesn’t cooperate. A minor mistake in the code can cause a task to run off and hold the processor for a long time, starving the other tasks that need to run. There is no way for the event loop to break in if a task does not hand control back to it.
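A quick illustration of that failure mode (hypothetical code, not from the repository): a coroutine that calls the blocking time.sleep() instead of awaiting asyncio.sleep() holds the single thread hostage, and every other task has to wait for it:

```python
import asyncio
import time


async def misbehaving():
    # Blocking call: never yields to the event loop.
    time.sleep(5)


async def polite():
    # Cooperative call: hands control back while waiting.
    await asyncio.sleep(1)
    print("polite task done")


async def main():
    # polite() could finish after 1 second, but it doesn't even get
    # started until misbehaving()'s 5-second block releases the thread.
    await asyncio.gather(misbehaving(), polite())


asyncio.get_event_loop().run_until_complete(main())
```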
With that in mind, let’s step up to a radically different approach to concurrency: multiprocessing.

multiprocessing Version
Unlike the previous approaches, the multiprocessing version of the code takes full advantage of the multiple CPUs that your fancy new computer has. Or, in my case, that my antiquated laptop has. Let’s start with the code:
```python
import requests
import multiprocessing
import time

session = None


def set_global_session():
    global session
    if not session:
        session = requests.Session()


def download_site(url):
    with session.get(url) as response:
        name = multiprocessing.current_process().name
        print(f"{name}:Read {len(response.content)} from {url}")


def download_all_sites(sites):
    with multiprocessing.Pool(initializer=set_global_session) as pool:
        pool.map(download_site, sites)


if __name__ == "__main__":
    sites = [
        "https://www.jython.org",
        "http://olympus.realpython.org/dice",
    ] * 80
    start_time = time.time()
    download_all_sites(sites)
    duration = time.time() - start_time
    print(f"Downloaded {len(sites)} in {duration} seconds")
```
This is much shorter than the asyncio example and actually looks quite similar to the threading example, but before we dive into the code, let’s take a quick tour of what multiprocessing does for you.

In Brief: Multiprocessing
Up until this point, all of the examples of concurrency in this article have run on only a single CPU or core in your machine. The reasons for this are the current design of CPython and something called the Global Interpreter Lock, or GIL.
The details of the GIL’s implementation are outside the scope of this article. For now, it’s enough to know that the synchronous, threading, and asyncio versions of this example all run on a single CPU.
The goal of adding multiprocessing to the standard library was to remove this obstacle and allow your code to make use of numerous central processing units. In a nutshell, it does this by starting a separate instance of the Python interpreter on each CPU and distributing portions of your code to run there.
As you can imagine, bringing up a separate Python interpreter is not as fast as starting a new thread in the current interpreter. It’s a heavyweight operation and comes with some restrictions and difficulties, but for the right problem, it can make a huge difference.

Multiprocessing Code
There are a few small changes from our synchronous version. The first one is in download_all_sites(). Instead of simply calling download_site() repeatedly, it creates a multiprocessing.Pool object and has it map download_site to the iterable sites. This should look familiar from the threading example.
Here, the Pool spawns a number of independent Python interpreter processes, and instructs each one to execute the supplied function on a subset of the iterable’s contents (in this example, the list of sites). The multiprocessing module takes care of coordinating the exchange of information between the primary process and any subprocesses.
The line that creates the Pool is worth your attention. First, it does not specify how many processes to create in the Pool, although that is an optional parameter. By default, multiprocessing.Pool() will determine the number of CPUs in your computer and match that. This is frequently the best answer, and it is in our case.
For this problem, increasing the number of processes did not make things faster. The overhead of setting up and tearing down all of those processes was larger than the benefit of doing the I/O requests in parallel.
Next, we have the initializer=set_global_session part of that call. Remember that each process in our Pool has its own memory space. That means they cannot share things like a Session object. You don’t want to create a new Session each time the function is called; you want to create one for each process.
The initializer function parameter is built for just this case. There is no way to pass a return value from the initializer to the function called by the process, download_site(), but you can initialize a global session variable to hold the single session for each process. Because each process has its own memory space, the global will be different for each one.
That’s really all there is to it. The rest of the code should look familiar.

Reasons Why the Multiprocessing Version is Awesome
The multiprocessing version of this example is great because it’s relatively easy to set up and requires little extra code. It also takes full advantage of the CPU power in your computer. This code has the following execution timing diagram:
Flaws in the Multiprocessing Implementation
This version of the example does require some extra setup, and the single global session object is unusual. You have to spend some time thinking about which variables will be accessed in each process.
Finally, it’s clearly slower than the asyncio and threading versions in this example:
```
$ ./io_mp.py
[most output skipped]
Downloaded 160 in 5.718175172805786 seconds
```
That’s to be expected, as solving I/O-bound problems is not really why multiprocessing exists. You’ll see more of its strengths in the next section, which looks at CPU-bound examples.
How to Speed Up a CPU-Bound Program
Let’s shift gears here a little bit. All of the preceding examples dealt with an I/O-bound problem. Next, you’ll look at a CPU-bound problem. As you saw, an I/O-bound problem spends most of its time waiting for external operations, like a network call, to complete. A CPU-bound problem, by contrast, performs few I/O operations, and its overall runtime is a factor of how quickly the CPU can process the required data.
For the purposes of our example, we’ll use a rather inane function to create something that is very CPU-intensive. This function computes the sum of the squares of each number from 0 to the passed-in value:
```python
def cpu_bound(number):
    return sum(i * i for i in range(number))
```
You’ll be passing in large numbers, so this will take a while. Keep in mind that this is only a stand-in for the meatier, more processing-intensive code you have planned, such as solving complex equations or sorting a large data structure.
CPU-Bound Synchronous Version
Now let’s take a look at the non-concurrent version of the example:
```python
import time


def cpu_bound(number):
    return sum(i * i for i in range(number))


def find_sums(numbers):
    for number in numbers:
        cpu_bound(number)


if __name__ == "__main__":
    numbers = [5_000_000 + x for x in range(20)]
    start_time = time.time()
    find_sums(numbers)
    duration = time.time() - start_time
    print(f"Duration {duration} seconds")
```
This code calls cpu_bound() 20 times, passing a different large number each time. It does all of this on a single thread in a single process on a single CPU. This is the execution timing diagram:
Unlike the I/O-bound examples, CPU-bound examples are usually fairly consistent in their run times. This one takes about 7.8 seconds on my machine:
```
$ ./cpu_non_concurrent.py
Duration 7.834432125091553 seconds
```
Clearly we can do better than this. It’s all running on a single CPU with no concurrency. Let’s see what we can do to make it better.

threading and asyncio Versions
Do you believe this may be sped up by altering the code to make use of threading or asyncio?
If “not at all” was your response, you deserve a reward. Those who guessed “It will slow it down” are correct and deserve a cookie.
Here’s why: in your I/O-bound example above, much of the overall time was spent waiting for slow operations to finish. threading and asyncio sped things up by letting you overlap those waiting times instead of enduring them sequentially.
On a CPU-bound problem, however, there is no waiting. The CPU is cranking away as fast as it can to finish the problem. In Python, both threads and tasks run on the same CPU in the same process. That means the one CPU is doing all of the work of the non-concurrent code plus the extra work of setting up threads or tasks. It takes more than 10 seconds:
```
$ ./cpu_threading.py
Duration 10.407078266143799 seconds
```
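If you’re wondering what that threaded attempt looks like, here’s a plausible sketch (the version in the GitHub repo may differ in details). It’s just the synchronous loop pushed through a thread pool, and the GIL keeps it from helping:

```python
import concurrent.futures


def cpu_bound(number):
    return sum(i * i for i in range(number))


def find_sums(numbers):
    # The GIL lets only one thread execute Python bytecode at a time,
    # so the threads just add setup and switching overhead here.
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        executor.map(cpu_bound, numbers)
```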
The full threading version of this code is included with the rest of the examples in the GitHub repo so you can test this yourself. Don’t spend much time on it yet, however.
CPU-Bound multiprocessing Version
Now you’ve finally reached the point where multiprocessing really shines. Unlike the other concurrency libraries, multiprocessing was designed specifically to share heavy CPU workloads across multiple CPUs. Here’s what its execution timing diagram looks like:
And here’s what the code looks like:
```python
import multiprocessing
import time


def cpu_bound(number):
    return sum(i * i for i in range(number))


def find_sums(numbers):
    with multiprocessing.Pool() as pool:
        pool.map(cpu_bound, numbers)


if __name__ == "__main__":
    numbers = [5_000_000 + x for x in range(20)]
    start_time = time.time()
    find_sums(numbers)
    duration = time.time() - start_time
    print(f"Duration {duration} seconds")
```
Little of this code had to change from the non-concurrent version. After importing multiprocessing, you simply swap out the for loop for a multiprocessing.Pool object and use its .map() method to distribute the numbers among the available worker processes.
This is just what you did for the I/O-bound multiprocessing code, but here you don’t need to worry about the Session object.
As mentioned earlier, the processes optional parameter of the multiprocessing.Pool() constructor deserves some attention. It sets the maximum number of Process objects that can be created and managed by the Pool. By default, it will determine how many CPUs are in your machine and create a process for each one. While this works great for our simple example, you might want a little more control in a production environment.
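For instance, here’s a sketch of capping the pool explicitly (reusing cpu_bound() and numbers from the example above):

```python
import multiprocessing

# Default behavior: one worker process per CPU reported by the OS.
print(multiprocessing.cpu_count())

# Explicitly limit the Pool to 4 worker processes instead.
with multiprocessing.Pool(processes=4) as pool:
    pool.map(cpu_bound, numbers)
```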
Also, those of you who have written multithreaded and multiprocessing code in other languages, or even in Python 2, will find the multiprocessing.Pool code familiar, as it is built upon building blocks like Queue and Semaphore.

Reasons Why the Multiprocessing Version is Awesome
The multiprocessing version of this example is great because it’s relatively easy to set up and requires little extra code. It also takes full advantage of the CPU power in your computer.
Hey, that’s exactly what I said the last time we looked at multiprocessing. The big difference is that this time it is clearly the best option. It takes 2.5 seconds on my machine:
```
$ ./cpu_mp.py
Duration 2.5175397396087646 seconds
```
That’s a significant improvement over the alternatives we considered.

Flaws in the Multiprocessing Implementation
There are some drawbacks to using multiprocessing. They don’t really show up in this simple example, but splitting your problem up so each processor can work independently can sometimes be difficult.
Also, many solutions require more communication between the processes. This can add complexity to your solution that a non-concurrent program would not need to deal with.
When to Use Concurrency
You’ve covered a lot of ground here, so let’s review some of the key ideas and then discuss some decision points that will help you determine which, if any, concurrency module is right for your project.
The first thing to do is figure out if a concurrency module is necessary. While the provided examples make it appear as though each library is straightforward, concurrency inherently adds complexity and frequently leads to difficult-to-find issues.
Don’t rush into adding concurrency until you know you have a performance problem, and then determine which type of concurrency you need. As Donald Knuth has said, “Premature optimization is the root of all evil (or at least most of it) in programming.”
If you’ve decided to optimize your software, one of the first things you should do is determine whether or not it’s CPU- or I/O-bound. Keep in mind that CPU-bound programs are always hard at work processing data or crunching figures, while I/O-bound programs spend much of their time waiting for anything to happen.
CPU-bound problems only really gain from using multiprocessing. threading and asyncio did not help this type of problem at all.
For I/O-bound problems, there’s a general rule of thumb in the Python community: “Use asyncio when you can, threading when you must.” asyncio can provide the best speed-up for this type of program, but sometimes you will require critical libraries that have not been ported to take advantage of asyncio. Remember that any task that doesn’t give up control to the event loop will block all of the other tasks.