In computing, binary search is one of the oldest and most reliable algorithms. It’s a common question in technical job interviews and programming competitions. Even with an understanding of the concept, putting binary search into practise is difficult. If you need to perform a binary search in Python or another language, you should always use one of the many available libraries instead of writing your own. This guide will teach you how to:

Use the
bisect
module to do a binary search in Python 
Implement a binary search in Python both
recursively
and
iteratively

Recognize and fix
defects
in a binary search Python implementation 
Analyze the
timespace complexity
of the binary search algorithm 
Search even
faster
than binary search
The reader of this lesson should have some familiarity with programming and an intermediate level of interest in data structures and algorithms. You should know Python’s basic data types, including lists and tuples, at the very least. You need also have a basic understanding of recursion, classes, data classes, and lambdas to get the most out of this lesson.
The sample code used in this lesson may be downloaded from the link below (it requires Python 3.7 or later).
run:
Benchmarking
In the following section of this guide, you will use a subset of the IMDb to compare the efficiency of several search methods. Use of this dataset for any noncommercial use is completely free of charge. It’s tabseparated values (TSV) files that are compressed and updated everyday.
There is a Python script provided for your convenience in the provided sample code. It will get the appropriate file from IMDb, unzip it, and then extract the needed
pieces:
$ python download_imdb.py
Fetching data from IMDb...
Created "names.txt" and "sorted_names.txt"
Warning: this will generate two additional files, each around half the size of the original download (600 MB). It could take a minute or two for the download and processing of this data.
complete.
Download IMDb
Access the data by downloading the name.basics.tsv.gz file from https://datasets.imdbws.com/. This file contains information about actors, directors, authors, etc. The following is what you’ll find if you unzip the file:
content:
nconst primaryName birthYear deathYear (...)
nm0000001 Fred Astaire 1899 1987 (...)
nm0000002 Lauren Bacall 1924 2014 (...)
nm0000003 Brigitte Bardot 1934 \N (...)
nm0000004 John Belushi 1949 1982 (...)
The first line has the column names and the subsequent lines contain data records. Aside from a unique identification, full names, birth years, and other information are included in each record. A tab character separates each of these items.
Due to the massive number of records, a conventional text editor will likely crash if you try to access the file. Spreadsheets and other niche programmes may have trouble reading it. The highperformance data grid viewer seen in JupyterLab is an alternative.
example.
Read TabSeparated Values
A TSV file can be read in several different ways. You can, for instance, read it with Pandas, a specialised program, or certain commandline tools. However, the hasslefree Python script is included in the sample code and is highly recommended. Keep in mind that manual file parsing is fraught with the risk of human error.
marginal instances. A field’s column count may be shattered, for instance, if the tab character were used literally within quote marks. Look for a suitable module in the default library or a reliable thirdparty one if possible.
The final goal is to have two text files in your
disposal:

names.txt

sorted_names.txt
The first one will have a list of names extracted from the TSV’s second column.
file:
Fred Astaire
Lauren Bacall
Brigitte Bardot
John Belushi
Ingmar Bergman
...
The sorted version of this will appear in the second.
When you’re ready to use them using Python, import the two files with this
function:
def load_names(path):
with open(path) as text_file:
return text_file.read().splitlines()
names = load_names('names.txt')
sorted_names = load_names('sorted_names.txt')
This code returns a string containing a selection of names from the specified file. Keep in mind that the newline character at the end of each line is removed when you call.splitlines() on the final string. Text_file.readlines() is another option, however it will retain any extraneous lines in the file.
newlines.
Measure the Execution Time
Algorithms can be ranked in terms of how quickly they are able to process the IMDb dataset. The time and timeit modules are incorporated right into most programming languages and are commonly used for timing a section of code.
It is also possible to define a bespoke decorator for timing a function. Python 3.7’s new time.perf_counter_ns() provides the high accuracy needed for the presented examples.
nanoseconds.
Understanding Search Algorithms
The concept of searching is fundamental to computer science since it is used everywhere. You’ve done a number of searches on the internet today alone, but have you ever stopped to consider what it all means?
There is a wide variety of search algorithms. One possible scenario is that you
can:

Do a
fulltext search

Match strings with
fuzzy searching
 Find the shortest path in a graph
 Query a database
 Look for a minimum or maximum value
This guide will teach you how to locate a specific item in a hierarchical collection of results, such as a phone book. Questions like these may cross your mind while you look for this component:
questions:
Question  Answer 

Is it there?  Yes 
Where is it?  On the 42nd page 
Which one is it?  A person named John Doe 
If an item is in the set, the first inquiry will reveal that fact. It is either true or untrue without exception. The second possible response is the index of the missing piece, which may not be accessible. The element, or its absence, is the third possible explanation. It’s important to remember that there could be more than one right answer in some cases because of duplicate or similar items. If you have several contacts with the same name, for instance, all of them will match your search criteria. On the other hand, there may be no answer at all or simply a rough approximation.
To find what you need quickly, use “search by value,” which will compare each item in the collection to the one you specify. The full element, be it a number, a string, or a person, is what you’re looking for. If there is even the tiniest variance between the two elements being compared, there will be no match.
However, you can narrow your search by selecting a specific element property, such a person’s last name. Selecting a single or several criteria for evaluation is known as “searching by key.” You should first familiarise yourself with various search algorithms before diving into binary search in Python.
work.
Random Search
How could you possibly search for an item in that backpack? You could reach in and grab something at random to check if it’s what you’re after. If you don’t get what you want the first time, you have to return the item and try again. One of the least effective search methods, random search, is best illustrated here. This strategy is inefficient since it increases the likelihood that you will repeatedly select the wrong option. Funny thing about this tactic
theoretically be the most effective option if you were
fortunate or possessing a little set of items.
The following Python code snippet captures the essence of this algorithm:
code:
import random
def find(elements, value):
while True:
random_element = random.choice(elements)
if random_element == value:
return random_element
The function will keep iterating until a random element yields the expected result. Unfortunately, this isn’t particularly helpful because the function either returns None or the same value it was passed as a parameter. The whole implementation is included in the downloadable sample code at the following link:
The random search algorithm seems to be reasonably effective for small datasets.
fast:
>>>
>>> from search.random import * # Sample code to download
>>> fruits = ['orange', 'plum', 'banana', 'apple']
>>> contains(fruits, 'banana')
True
>>> find_index(fruits, 'banana')
2
>>> find(fruits, key=len, value=4)
'plum'
But think of the difficulty of such a search among millions of constituents! A brief summary of an IMDb performance test is provided below.
dataset:
Search Term  Element Index  Best Time  Average Time  Worst Time 

Fred Astaire 




Alicia Monica 




Baoyin Liu 




missing 




We took care to select nonredundant data stored in dissimilar places in memory. To compensate for the algorithm’s inherent randomness and for outside influences like garbage collection and system operations, we conducted each search 10 times. Note: If you would want to try this experiment on your own, please review the procedures outlined in the tutorial’s introduction. Your code’s efficiency can be evaluated with the inbuilt
modules like time and timeit, or you can create your own own
stylistic embellisher.
The results of using this algorithm are not predictable. The average time required to locate an element is independent of the element’s physical location, although the best and worst case scenarios can differ by two or three orders of magnitude. It also exhibits erratic behaviour at times. Think about a set of things that includes some repeats. Repeated invocations will yield unique results due to the algorithm’s random selection of elements.
What would make this better? Using a linear search is one approach that can solve both problems at once.
.
Linear Search
It’s common to scan the lunch menu erratically in hopes that something would jump out at you. You may also adopt a more methodical approach by reading the menu sequentially from top to bottom and giving each item your full attention. That’s the essence of linear search. Python programmers could use enumerate()’s tracking of the current element’s
index:
def find_index(elements, value):
for index, element in enumerate(elements):
if element == value:
return index
A collection of items is iterated over by the function in a consistent and welldefined pattern. If the element is found, or if there are none left to check, the process ends. Using an indexbased order for the traversal ensures that each element is only accessed once.
Here we will test how effectively linear search performs on your IMDb dataset.
before:
Search Term  Element Index  Best Time  Average Time  Worst Time 

Fred Astaire 




Alicia Monica 




Baoyin Liu 




missing 




The time it takes to look for a specific piece is extremely consistent. There is not much of a difference between the best and worst times and the average. Due to the consistent browsing order, the necessary number of comparisons to locate a given element remains constant.
However, as the index of a soughtafter item rises, so does the time required to locate it. The greater the distance between an item and the list’s beginning, the greater the number of comparisons that must be performed. When single item is missing, the entire set must be examined before a firm conclusion can be drawn.
The correlation between element placement and discovery time is readily apparent when experimental data is plotted and the dots are connected.
The name of the technique stems from the fact that all data points may be represented by a linear function and form a straight line. The time needed to find any element using a linear search is assumed to be proportionate to the total number of elements in the collection, on average. As the amount of data to be searched grows, their performance degrades.
If biometric scanners in some airports were constructed using linear search, they would take far longer to correctly identify passengers. However, for smaller datasets, the linear search strategy may be preferable because it doesn’t call for any data preprocessing. The advantages of preprocessing wouldn’t cancel out the costs in this scenario.
Linear search is already included in Python, so there’s no need to implement it from scratch. For instance, if you have a list data structure, you can access a method that either returns the element’s index or throws an exception.
otherwise:
>>>
>>> fruits = ['orange', 'plum', 'banana', 'apple']
>>> fruits.index('banana')
2
>>> fruits.index('blueberry')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: 'blueberry' is not in list
The in operator is more Pythonic, but this method can also determine whether or not the element is included in the collection.
:
>>>
>>> 'banana' in fruits
True
>>> 'blueberry' in fruits
False
The builtin functions and operators will easily outperform your solution, even though they rely on linear search internally. They were compiled to native machine code because C is a highlevel language. No amount of effort put in will be enough to beat it with the regular Python interpreter.
The timeit module demonstrates that a short test suggests the Python implementation could be over ten times slower than the equivalent native
one:
>>>
>>> import timeit
>>> from search.linear import contains
>>> fruits = ['orange', 'plum', 'banana', 'apple']
>>> timeit.timeit(lambda: contains(fruits, 'blueberry'))
1.8904765040024358
>>> timeit.timeit(lambda: 'blueberry' in fruits)
0.22473459799948614
The native code may work OK for smaller datasets, but for larger ones, you’ll need to rethink the technique. A linear search is not always performed by the in operator. If you apply it to a set, for instance, a hashbased search is performed. The controller is compatible with all
any type of iterable collection, such as a tuple, list, set, dictionary, or string. It can also be used to provide support for your own custom classes via an implementation of the
The.__contains__() magic method is used to define the underlying logic.
The linear search algorithm is rarely useful in realworld situations. There was a time when I couldn’t get my cat registered at the vet because their computer system kept crashing. My doctor informed me he needs to upgrade his computer soon since his database is getting too large.
At that moment, I remember thinking that whoever built that software had obviously never heard of the binary search.
algorithm!
Binary Search
The number 2 is commonly thought of when the phrase “binary” is mentioned. In this sense, it means to split an array in half and discard half of the entries at each algorithmic stage. This can significantly cut down on the number of tests conducted to locate a specific component. The catch is that the collection needs to be sorted before it can be used.
The concept is similar to turning to a certain page in a book. When looking for a specific page in a book, most people start by opening it to a random page or one that seems to be close to where they want to be.
On rare occasions, you’ll get lucky and land on the desired page right away. If the page number is too little, though, you know the page in question must be on the right. On the next try, if you overshoot and the current page number is more than the page you’re looking for, you may be certain that it is in between.
You do it all over again, this time checking the page that falls smack dab in the middle of the new range rather than picking at random. Attempts are reduced as a result. In the game of “guess the number,” a similar strategy can be utilised. If you’re unfamiliar with that game, a quick Internet search will yield a wealth of code snippets written in Python. Note that if the numbers are normally distributed, you can use to determine the middle value.
linear interpolation as opposed to arithmetic mean instead. Even fewer steps are needed in this variant of the method.
The lower bound and upper bound are the page numbers that determine how many pages to search. It is usual practise in binary search to use the first page as the lower bound and the last page as the upper bound. Both limits need to be constantly revised. If the page you’ve turned to is lower than the one you’re trying to find, for instance, you can use that number as a new minimum.
Suppose you needed to find a strawberry in a bunch of other fruits arranged in size order:
The initial attempt results with a lemon being the central component. Since it dwarfs a strawberry, you can safely ignore everything to the right of it, lemon included. The upper bound will be shifted to a new location, and the middle index will be refreshed.
You no longer have the full complement of fruits. The strawberry you were looking for was in the centre position all along. If it wasn’t the case, you’d just adjust the limits till they overlapped. If you’re lacking a plum and want to put one between your strawberries and kiwis, you’ll find the following.
You may have seen that not many comparisons were required to locate the soughtafter component. That’s where binary search comes in handy. There would only be a small number of tests needed, even if a million different pieces were involved. Due to halving, this sum won’t go above the logarithm of the total number of components in base two. That is to say, the number of remaining elements is halved at each stage.
The elements’ size ordering makes this feasible. To search for certain hues of fruit, however, would require reorganising the entire collection. One strategy for avoiding the timeconsuming and resourceintensive task of sorting involves precomputing alternative perspectives on the same set of data. The process is analogous to that of building an index in a database.
Think about the results of updating, removing, or adding to a collection. Maintaining the correct sort order is essential for the continued operation of a binary search. The following section will explain how to perform this operation using the bisect module.
Later on, you’ll learn how to write your own binary search algorithm in Python. Let’s test it out against the IMDb database right now. Take note that the persons you need to find have changed since you last looked. Because binary search reorders the items, sorting the dataset is required. To maintain consistency in the measurements, the new elements are placed at about the same indices as the old ones.
comparable:
Search Term  Element Index  Average Time  Comparisons 

(…) Berendse 


23 
Jonathan Samuangte 


24 
Yorgos Rahmatoulin 


23 
missing 


23 
The turnaround time for responses is really low. The typical time it takes the binary search to locate a single element among nine million is just a few microseconds. Aside from that, the formula below accurately describes the steady state of the number of comparisons for the selected elements:
The logarithm of the collection’s size is the number of comparisons needed to find most elements. On the flip side, there is one central component that can be identified immediately upon making a single comparison.
The divideandconquer strategy, of which binary search is a prime example, involves breaking down a large problem into several smaller problems of the same type. The answer is derived by summing the components’ solutions. The Quicksort algorithm is another wellknown application of this method. Please keep in mind that divide and conquer is not the same as
The method of dynamic programming is comparable.
Binary search, in contrast to other search algorithms, has applications beyond simple searching. It’s possible to conduct range inquiries, locate the highest or smallest value, determine the neighbouring value to the target, and more.
Binary search isn’t always the ideal option if you’re looking for the quickest solution possible. Algorithms that work well with hashbased data structures are even quicker. However, this extra memory is necessary for these methods, whereas binary search provides a fair spacetime tradeoff.
.
HashBased Search
You can search more quickly if you reduce the size of the problem space. In order to accomplish this, binary search halves the number of potential solutions at each stage. If all the items are sorted, then it only takes twenty comparisons to see if the element is present, even if there are a million of them.
Knowing exactly where to seek for something is the quickest method to find it. If you already knew where an element was stored in memory, you could skip the searching altogether. Hashing is the process of determining the memory address of an element by mapping either the element itself or one of its keys to that address.
Hashing is analogous to computing the index based on the element itself as opposed to searching for the specific element. A hash function is designed to do this, and it relies on specific mathematical qualities to succeed. Good hashing functionality
should:
 Take arbitrary input and turn it into a fixedsize output.

Have a uniform value distribution to mitigate
hash collisions
.  Produce deterministic results.

Be a
oneway function
. 
Amplify input changes to achieve the
avalanche effect
.
However, it can’t have a high computational cost or the benefits will be outweighed by the expense. Data integrity checking and cryptography are not the only applications of hash functions.
Maps, hash tables, dictionaries, and associative arrays are all types of data structures that implement this idea to translate between keys and their respective values. It’s important to remember that Python’s builtin data structures set and dict both use the hash function to quickly locate data. In contrast to how elements are hashed in a set, a dict utilises the hash function on the keys to its elements. Raymond Hettinger’s talk at a conference explains the Pythonic implementation of a dict.
Python dictionaries written recently.
Hashing can also be pictured by thinking of “buckets” where related items are stored under their respective keys. You could sort fruits by colour as you collect them, for instance.
Fruits like the kiwi and coconut go into the brown bucket, while apples go into the red bucket, and so on. This makes it possible to quickly scan some of the components. Each bucket should contain exactly one fruit. If you don’t, you’ll have a collision, which is a lot of extra work. It’s important to keep in mind that the buckets and their contents are rarely arranged in any logical fashion.
Let’s make a dictionary out of the names in the IMDb dataset, where each entry represents a key and its associated value represents a value.
index:
>>>
>>> from benchmark import load_names # Sample code to download
>>> names = load_names('names.txt')
>>> index_by_name = {
... name: index for index, name in enumerate(names)
... }
Once the textual names have been loaded into a flat list, the mapping can be created using enumerate() within a dictionary comprehension. Now, both the presence test and the index lookup are
straightforward:
>>>
>>> 'Guido van Rossum' in index_by_name
False
>>> 'Arnold Schwarzenegger' in index_by_name
True
>>> index_by_name['Arnold Schwarzenegger']
215
You don’t have to do any searching because a hash function is used internally.
See how well the hashbased search algorithm does on the Internet Movie Database below.
dataset:
Search Term  Element Index  Best Time  Average Time  Worst Time 

Fred Astaire 




Alicia Monica 




Baoyin Liu 




missing 




The average time is not only significantly faster than the existing binary search Python implementation, but it is also consistent across all elements.
The drawbacks include a load time increase of about 0.5 seconds, an increase in memory usage by the Python process of about 0.5 GB, and the necessity of maintaining consistency between the new data and the dictionary’s contents. As a result, lookups are lightning fast, while inserts and updates take only a fraction of the time of a list.
Additionally, dictionary keys must satisfy the constraints of being hashable and having constant hash values. Python’s hash() function can be used to determine whether or not a given data type can be hashed.
it:
>>>
>>> key = ['round', 'juicy']
>>> hash(key)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
Collections that can be modified, such as a list, set, or dictionary, cannot be hashed. Since the hash value of a key often depends on some properties of the key, it is best practise to make dictionary keys immutable. If the contents of a mutable collection were hashable, the hash value would change whenever the collection was used as a key. Imagine if the colour of a certain fruit changed as it ripened. You’d be sifting through the wrong container for it.
It’s not just for hashing data anymore. It’s used in cryptography, for instance, to ensure the security of sensitive information like passwords and other sensitive information.
verification.
Using the
The bisect module is built into Python and can be used to perform binary search as well as keep a list in its original order. The bisection approach to finding the roots of a function serves as its foundation. This module’s six features are split between two categories:
categories:
Find Index  Insert Element 







You can use these methods to locate an element’s index or insert a new one into the correct slot. The first two names are simply shorthand references to bisect_right() and insort_right(). In fact, you’re only dealing with four different capabilities. Important: The list must be sorted before being passed to any of the functions. It’s likely that you’ll get the wrong answers if the components aren’t sorted.
So, let’s jump right into observing the bisect module in
action.
Finding an Element
Use bisect_left to locate an item’s index in a sorted list.()
:
>>>
>>> import bisect
>>> sorted_fruits = ['apple', 'banana', 'orange', 'plum']
>>> bisect.bisect_left(sorted_fruits, 'banana')
1
Since the banana was located at position 1, the output indicates that it is the second fruit in the set. However, you would still get the expected result even if some component was missing.
position:
>>>
>>> bisect.bisect_left(sorted_fruits, 'apricot')
1
>>> bisect.bisect_left(sorted_fruits, 'watermelon')
4
These berries aren’t included just yet, but you can get an idea of where they belong. An apricot, for instance, would go between an apple and a banana, and a watermelon would come in last. You can tell if an element was discovered by comparing two
conditions:

Is the
index
within the size of the list? 
Is the
value
of the element the desired one?
A general function for locating elements can be derived from this.
value:
def find_index(elements, value):
index = bisect.bisect_left(elements, value)
if index < len(elements) and elements[index] == value:
return index
If a match is found, the function will return the index of the matching element. Otherwise, it will implicitly return None.
When conducting a search using a key, it is necessary to keep a separate list of keys. To avoid unnecessary expenditures, it is preferable to determine the keys in advance and employ them repeatedly. To facilitate searches with a variety of keys without introducing a great deal of new code, you can define a helper class.
duplication:
class SearchBy:
def __init__(self, key, elements):
self.elements_by_key = sorted([(key(x), x) for x in elements])
self.keys = [x[0] for x in self.elements_by_key]
The identifier is a function that is the first argument to __init__(). To retrieve a piece of data based on its key, you must first store it in the form of keyvalue pairs and then sort that list. When using tuples to represent pairs, the order of the first member of each pair is always maintained. The next step is to extract the keys, resulting in a flat list formatted appropriately for your binary search implementation in Python.
Then there’s the actual process of locating elements, which involves
key:
class SearchBy:
def __init__(self, key, elements):
...
def find(self, value):
index = bisect.bisect_left(self.keys, value)
if index < len(self.keys) and self.keys[index] == value:
return self.elements_by_key[index][1]
To determine an item’s index, this code divides the sorted key list in half. If such a key does exist, it can be used to retrieve the associated pair from the previously computed list of keyvalue pairs via the key’s index. The second part of those two words is the soughtafter sum. Please keep in mind that this is just an example. To get the best results, use the
official documentation recommends a particular recipe.
If you have more than one banana, bisect_left() will give you the one on the left.
instance:
>>>
>>> sorted_fruits = [
... 'apple',
... 'banana', 'banana', 'banana',
... 'orange',
... 'plum'
... ]
>>> bisect.bisect_left(sorted_fruits, 'banana')
1
Bisect_right(), or its alias bisect(), must be called in order to obtain the rightmost banana. For the purposes of determining where a new banana should be inserted, however, those two functions return an index that is one away from the actual rightmost banana.
element:
>>>
>>> bisect.bisect_right(sorted_fruits, 'banana')
4
>>> bisect.bisect(sorted_fruits, 'banana')
4
>>> sorted_fruits[4]
'orange'
You can figure out how many bananas you have when the codes are combined.
have:
>>>
>>> l = bisect.bisect_left(sorted_fruits, 'banana')
>>> r = bisect.bisect_right(sorted_fruits, 'banana')
>>> r  l
3
Both bisect_left() and bisect_right() would return zero if an element were absent.
bananas.
Inserting a New Element
The bisect module can also be used to ensure that the elements of a sorted list remain in the same order. After all, it would be inconvenient to rearrange the entire list every time an item needed to be added to it. Generally speaking, all three capabilities are applicable.
interchangeably:
>>>
>>> import bisect
>>> sorted_fruits = ['apple', 'banana', 'orange']
>>> bisect.insort(sorted_fruits, 'apricot')
>>> bisect.insort_left(sorted_fruits, 'watermelon')
>>> bisect.insort_right(sorted_fruits, 'plum')
>>> sorted_fruits
['apple', 'apricot', 'banana', 'orange', 'plum', 'watermelon']
In the absence of duplicates, you won’t notice a change. However, this will still not be obvious so long as the values being duplicated are relatively straightforward. A second banana placed on the left has the same result as a second placed on the right.
You’ll need a data type where individual objects can be distinguished from one another even if they share the same value in order to spot the distinction. Let’s use Python’s new @dataclass decorator to create our own custom type, the Person.
3.7:
from dataclasses import dataclass, field
@dataclass
class Person:
name: str
number: int = field(compare=False)
def __repr__(self):
return f'{self.name}({self.number})'
Each individual is given both a name and a random identification number. Two people can be considered equal even if they have different values in the number field by omitting that field from the equality test.
attribute:
>>>
>>> p1 = Person('John', 1)
>>> p2 = Person('John', 2)
>>> p1 == p2
True
However, it is possible to differentiate between these two variables because they refer to different things.
them:
>>>
>>> p1 is p2
False
>>> p1
John(1)
>>> p2
John(2)
Different things are being used for the p1 and p2 variables.
Keep in mind that you can’t use the bisection algorithm on data because instances of a data class aren’t comparable by default.
them:
>>>
>>> alice, bob = Person('Alice', 1), Person('Bob', 1)
>>> alice < bob
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: '<' not supported between instances of 'Person' and 'Person'
Since alice and bob are both objects of a special Python class, the language has no idea how to rank them. For the interpreter to understand how to compare objects of your class, you would normally add the magic method.__lt__() (less than). The @dataclass decorator, on the other hand, can take a few supplementary Boolean flags. When True for one of them, order, the magic methods for comparison are generated automatically.
:
@dataclass(order=True)
class Person:
...
As a result, you can evaluate two candidates and choose the one who best fits your needs.
first:
>>>
>>> alice < bob
True
>>> bob < alice
False
Last but not least, the name and number properties allow you to track where new members are added to the group as it is updated by various functions.
list:
>>>
>>> sorted_people = [Person('John', 1)]
>>> bisect.insort_left(sorted_people, Person('John', 2))
>>> bisect.insort_right(sorted_people, Person('John', 3))
>>> sorted_people
[John(2), John(1), John(3)]
Insertion order is indicated by the numbers in parentheses following the names. There was initially only one John, so he was given the position of 1. You started by making a second one and placing it to the left.
right.
Implementing Binary Search in Python
Keep in mind that unless you have a very good reason, you should probably not use the algorithm. You won’t have to recreate the wheel, saving both time and effort. The library’s code is likely stable, having been used and tested in a realworld production setting, and it likely features robust functionality developed collaboratively by a large number of developers.
However, there are occasions where getting your hands dirty is the best option. Because of licencing or security concerns, your company may have a policy against using certain open source libraries. Maybe you’re at capacity with your storage and your network and can’t take on any more reliance. Finally, if you’re looking to expand your knowledge, try your hand at coding.
Most algorithms have only two basic building blocks:
ways:

Iteratively

Recursively
There are, however, some notable deviations from the norm. The Ackermann function is a wellknown example that has no other possible form of expression.
Get comfortable with the binary search algorithm before continuing. If you need a refresher on something from earlier in this guide,
refresher.
Iteratively
The loop in the iterative algorithm will keep running the same operations over and over again until the termination condition is met. First, we’ll create a function that looks for elements by value and returns those elements.
index:
def find_index(elements, value):
...
This will be used again in a later project.
Assuming the elements are in order, the minimum and maximum values can be chosen arbitrarily.
sequence:
def find_index(elements, value):
left, right = 0, len(elements)  1
You need to find the component in the middle to check if it contains the needed information. Taking the mean of these two indices yields the median index.
boundaries:
def find_index(elements, value):
left, right = 0, len(elements)  1
middle = (left + right) // 2
The result of an integer division can be rounded down to accommodate an odd or even number of elements in the bounded range. A ceiling function may be used instead of updating the boundaries to define the stopping condition.
The next step is to either complete the sequence or divide it in half so that the search can be continued in either of the resulting
halves:
def find_index(elements, value):
left, right = 0, len(elements)  1
middle = (left + right) // 2
if elements[middle] == value:
return middle
if elements[middle] < value:
left = middle + 1
elif elements[middle] > value:
right = middle  1
Return the index of the matching element if it was found in the middle. Otherwise, if it was too low, the lower limit should be raised. If it was excessively large, the upper limit should be reduced.
As the lower boundary approaches the upper one, the process must be enclosed in a loop in order to continue.
one:
def find_index(elements, value):
left, right = 0, len(elements)  1
while left <= right:
middle = (left + right) // 2
if elements[middle] == value:
return middle
if elements[middle] < value:
left = middle + 1
elif elements[middle] > value:
right = middle  1
If the lower boundary is less than or equal to the upper boundary, then iteration should continue. If no match was found, the function will simply return None. In order to find something using a key, you have to look at its characteristics rather than its numerical value. The length of a fruit’s name could serve as a key. Modifying find_index() to take and apply a key is possible.
parameter:
def find_index(elements, value, key):
left, right = 0, len(elements)  1
while left <= right:
middle = (left + right) // 2
middle_element = key(elements[middle])
if middle_element == value:
return middle
if middle_element < value:
left = middle + 1
elif middle_element > value:
right = middle  1
However, before conducting the search, you should sort the list by the same key you’ll use to filter the results.
with:
>>>
>>> fruits = ['orange', 'plum', 'watermelon', 'apple']
>>> fruits.sort(key=len)
>>> fruits
['plum', 'apple', 'orange', 'watermelon']
>>> fruits[find_index(fruits, key=len, value=10)]
'watermelon'
>>> print(find_index(fruits, key=len, value=3))
None
Given that none of the other fruits on the list have names consisting of only three letters, watermelon was selected as the example fruit.
Wonderful, but now you can’t conduct searches based on a value. To fix this, you could give the key the value None by default and then check to see if it was actually given. A more elegant solution, however, would always involve simply calling the key. It would default to an identity function that simply returns the element.
itself:
def identity(element):
return element
def find_index(elements, value, key=identity):
...
You could also use an anonymous lambda expression inline to define the identity function.
:
def find_index(elements, value, key=lambda x: x):
...
Only one question can be answered by find_index(). Two more questions remain, “Is it there?as well as “What is it?Your answers to these two questions can be constructed upon
it:
def find_index(elements, value, key):
...
def contains(elements, value, key=identity):
return find_index(elements, value, key) is not None
def find(elements, value, key=identity):
index = find_index(elements, value, key)
return None if index is None else elements[index]
These three properties reveal an element’s essential nature. However, you still haven’t implemented anything to prevent duplicates. What if you had a group of people who all happened to share a first or last name? For instance, there could be several Johns or a Smith family among the
people:
people = [
Person('Bob', 'Williams'),
Person('John', 'Doe'),
Person('Paul', 'Brown'),
Person('Alice', 'Smith'),
Person('John', 'Smith'),
]
It is possible to extend an existing data class to model the Person
earlier:
from dataclasses import dataclass
@dataclass(order=True)
class Person:
name: str
surname: str
Take note of the order attribute’s role in facilitating the generation of magic methods for conducting crossfield comparisons between instances of the class. The namedtuple is a shorter alternative that could be used instead.
syntax:
from collections import namedtuple
Person = namedtuple('Person', 'name surname')
Both meanings can be used interchangeably and correctly. Each individual possesses both a given name and a surname. The attrgetter() builtin operator makes it easy to define the key function for sorting and searching by any of them.
module:
>>>
>>> from operator import attrgetter
>>> by_surname = attrgetter('surname')
>>> people.sort(key=by_surname)
>>> people
[Person(name='Paul', surname='Brown'),
Person(name='John', surname='Doe'),
Person(name='Alice', surname='Smith'),
Person(name='John', surname='Smith'),
Person(name='Bob', surname='Williams')]
Take note of the new, alphabetical order of last names. Even though there is John Smith and Alice Smith, only one of them will show up in a binary search for the Smith surname.
result:
>>>
>>> find(people, key=by_surname, value='Smith')
Person(name='Alice', surname='Smith')
Bisect_left() and bisect_right() can be written to replicate the functionality of the bisect module demonstrated previously. Checking for the existence of a duplicate element is necessary before locating its leftmost occurrence.
all:
def find_leftmost_index(elements, value, key=identity):
index = find_index(elements, value, key)
if index is not None:
...
return index
When an index is discovered, the search continues to the left until a different key is found or the set is exhausted.
elements:
def find_leftmost_index(elements, value, key=identity):
index = find_index(elements, value, key)
if index is not None:
while index >= 0 and key(elements[index]) == value:
index = 1
index += 1
return index
If you go past the very first item in the array, you’ll need to shift the index one spot to the right.
The process of identifying the optimal instance is very similar, but the
conditions:
def find_rightmost_index(elements, value, key=identity):
index = find_index(elements, value, key)
if index is not None:
while index < len(elements) and key(elements[index]) == value:
index += 1
index = 1
return index
To reach the end of the list, you will now scroll to the right instead of the left. Using both methods, you can locate every instance of a duplicate.
items:
def find_all_indices(elements, value, key=identity):
left = find_leftmost_index(elements, value, key)
right = find_rightmost_index(elements, value, key)
if left and right:
return set(range(left, right + 1))
return set()
The result of this function is a set every time. If the missing item cannot be located, the set is said to be “empty.” If there is only one instance of the element, then there will be only one index in the set. If this isn’t the case, then the set will have multiple indices.
Finally, you can round out your binary search Python with the definition of even more abstract functions.
library:
def find_leftmost(elements, value, key=identity):
index = find_leftmost_index(elements, value, key)
return None if index is None else elements[index]
def find_rightmost(elements, value, key=identity):
index = find_rightmost_index(elements, value, key)
return None if index is None else elements[index]
def find_all(elements, value, key=identity):
return {elements[i] for i in find_all_indices(elements, value, key)}
This not only helps you find items on the list, but it also makes retrieving them much easier. Detailed questions are at your disposal.
questions:
Is it there?  Where is it?  What is it? 












Here you can find the full source code for this binary search Python library.
below:
Recursively
The recursive form of contains(), which reports whether or not an element was found, is the only one you’ll be thinking about for the time being. Please take note that the definition of
In an earlier section, recursion was
scene from an
Series on functional programming in JavaScript, Fun Fun Function:
If a function can call itself indefinitely, it is said to be recursive.
Matthew P. Johansson
Taking the iterative form of binary search and chopping it up with the slicing operator is the simplest method.
list:
def contains(elements, value):
left, right = 0, len(elements)  1
if left <= right:
middle = (left + right) // 2
if elements[middle] == value:
return True
if elements[middle] < value:
return contains(elements[middle + 1:], value)
elif elements[middle] > value:
return contains(elements[:middle], value)
return False
By performing a single condition check, you can avoid looping and instead occasionally invoke the same function on a smaller list. Honestly, where do we go wrong there? Well, it turns out that slicing creates copies of referenced elements, which can result in significant extra work for the computer’s memory and processing power.
To avoid duplication, you could always use the same list within the function, but supply new boundary values.
necessary:
def contains(elements, value, left, right):
if left <= right:
middle = (left + right) // 2
if elements[middle] == value:
return True
if elements[middle] < value:
return contains(elements, value, middle + 1, right)
elif elements[middle] > value:
return contains(elements, value, left, middle  1)
return False
The drawback is that initial boundaries must be passed to the function every time it is called, ensuring that they are valid.
correct:
>>>
>>> sorted_fruits = ['apple', 'banana', 'orange', 'plum']
>>> contains(sorted_fruits, 'apple', 0, len(sorted_fruits)  1)
True
In the event of an error, it may not locate the specified component. The use of default function arguments or the creation of a helper function that delegates to the recursive function are two ways to enhance the current situation.
one:
def contains(elements, value):
return recursive(elements, value, 0, len(elements)  1)
def recursive(elements, value, left, right):
...
Further, nesting functions can help you conceal technical details and take advantage of variable reuse in outer functions.
scope:
def contains(elements, value):
def recursive(left, right):
if left <= right:
middle = (left + right) // 2
if elements[middle] == value:
return True
if elements[middle] < value:
return recursive(middle + 1, right)
elif elements[middle] > value:
return recursive(left, middle  1)
return False
return recursive(0, len(elements)  1)
Both the elements and value parameters are defined in the outer scope, but are still accessible to the recursive() inner function. Python’s socalled LEGB rule instructs the interpreter to look for symbols in the following contexts to determine the lifetime and visibility of variables.
order:

Local
scope 
Enclosing
scope 
Global
scope 
Builtin
symbols
This paves the way for using outerscope variables inside of nested blocks of code.
Performance, convenience, and individual preference all factor into whether or not an iterative or recursive implementation is chosen. On the other hand, recursion is not without its dangers, which will be discussed in the following section.
section.
Covering Tricky Details
In The Art of Computer Programming, the author explains how to use the binary search algorithm in practise as follows:
Despite the apparent simplicity of binary search, many skilled programmers have made rookie mistakes when first attempting it.
According to Donald Knuth
Hopefully this will dissuade you from trying to create the algorithm on your own if the previous one didn’t. A decade passed before anyone noticed that the Java standard library’s implementation of binary search contained a subtle bug. However, the history of the bug goes back much further than that. I was once screened out technically due to my ignorance of the binary search algorithm. Several encoding challenges, including a binary search puzzle, needed to be completed. Can you guess which one I didn’t finish? Yeah.
The following is by no means an allinclusive list, but simple oversights like failing to alphabetize your
list.
Integer Overflow
As previously mentioned, this is the Java flaw in question. The binary search Python algorithm, you may recall, studies the object in the centre of a given interval within a sorted set. Where does the middle piece come from, exactly? To find the midpoint, you typically take the mean between the lower and upper limits.
index:
middle = (left + right) // 2
In nearly all situations, this approach to determining the mean will yield accurate results. When the number of items in the collection grows too large, however, the total of the two limits will exceed the range of the integer data type. It will exceed the maximum positive integer value.
In such a case, the programme execution could be terminated immediately by raising an error in some programming languages. That isn’t always the case, though. When this occurs in Java, for instance, the value simply flips around and becomes an arbitrary number without raising any eyebrows. If the resulting number is negative, an IndexOutOfBoundsException will be thrown, alerting you to the issue.
In jshell, which can be thought of as an interactive interpreter for Java, we can see an example of this behaviour in action.
Java:
jshell> var a = Integer.MAX_VALUE
a ==> 2147483647
jshell> a + 1
$2 ==> 2147483648
Calculating the offset and then adding it to the lower index could be a more secure way to find the middle index.
boundary:
middle = left + (right  left) // 2
It is impossible for the above sum to be reached even if both values are at their absolute maximum. There are a few other ways, but fortunately, you never have to worry about them in Python because the language does not have the integer overflow error. The largest number an integer can be is only limited by your imagination.
memory:
>>>
>>> 2147483647**7
210624582650556372047028295576838759252690170086892944262392971263
There is, however, a catch. It is possible for library code to be subject to the C language constraints while still causing an overflow when called directly. Python has many Clanguage based library options. Using ctypes, you could even create your own C extension module or import a DLL into Python.
.
Stack Overflow
In theory, the recursive implementation of binary search could be at the heart of the stack overflow issue. There is a hard cap on the depth of nesting that can be achieved in most programming languages. A return address is kept on the stack for each call. Python’s default limit for such programmes is several thousand levels.
calls:
>>>
>>> import sys
>>> sys.getrecursionlimit()
3000
For many recursive algorithms, this is simply not enough. Since Python’s binary search is logarithmic, it’s highly improbable that it will ever require more than this. You’ll need a stockpile of materials equal to 23000. There are more than nine hundred digits in that!
However, if the stopping condition is stated incorrectly due to a bug, the infinite recursion error may still occur. A stack overflow occurs when an infinite recursion is performed. Observe: The
In languages where memory management is handled by hand, stack overflows are extremely frequent. People frequently googled those errors to see if anyone else had experienced them, hence the origin of the term “google error.”
Programmer Q&A forum.
The recursion limit can be temporarily increased or decreased to imitate a stack overflow. The features of the Python runtime environment mean that the effective limit will be lower.
call:
>>>
>>> def countup(limit, n=1):
... print(n)
... if n < limit:
... countup(limit, n + 1)
...
>>> import sys
>>> sys.setrecursionlimit(7) # Actual limit is 3
>>> countup(10)
1
2
3
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 4, in countup
File "<stdin>", line 4, in countup
File "<stdin>", line 2, in countup
RecursionError: maximum recursion depth exceeded while calling a Python object
There were three invocations of the recursive function before the stack was exhausted. The interactive interpreter was likely responsible for the remaining four calls. It’s possible to get a different result from the same code if you run it in PyCharm or another Python shell.
result.
Duplicate Elements
You understand that it’s possible for the list to contain duplicate items and that you must take appropriate action if this occurs. This is said only to stress the fact that a traditional binary search in Python may not yield definite outcomes. The result will vary depending on the number of items in the list and the order in which they were placed.
answer:
>>>
>>> from search.binary import *
>>> sorted_fruits = ['apple', 'banana', 'banana', 'orange']
>>> find_index(sorted_fruits, 'banana')
1
>>> sorted_fruits.append('plum')
>>> find_index(sorted_fruits, 'banana')
2
The list includes two bananas. When you first use find_index(), it will give you the left one. The same call, however, will yield a different banana if an unrelated item is appended to the end of the list.
Sorting algorithms are subject to the same rule, known as algorithm stability. Some do not affect the equilibrium between chemically similar elements, making them stable. Some people don’t provide assurances like that. The least important key to remember should always be used as the starting point when sorting elements by multiple criteria.
stability.
FloatingPoint Rounding
You’ve been looking for fruits and people, but have you considered looking for numbers? It stands to reason that they would be the same. In this example, we will use a list comprehension to generate a list of floatingpoint numbers with 0.1point intervals.
:
>>>
>>> sorted_numbers = [0.1*i for i in range(1, 4)]
Onetenth, twotenth, and threetenth numbers should fill out the list. Only two of those three numbers are actually possible.
found:
>>>
>>> from search.binary import contains
>>> contains(sorted_numbers, 0.1)
True
>>> contains(sorted_numbers, 0.2)
True
>>> contains(sorted_numbers, 0.3)
False
Since Python’s default linear search conforms to this expectation, this isn’t a binary searchspecific issue.
it:
>>>
>>> 0.1 in sorted_numbers
True
>>> 0.2 in sorted_numbers
True
>>> 0.3 in sorted_numbers
False
It has nothing to do with Python, but rather with the way that floatingpoint numbers are stored in RAM. IEEE 754 is the standard for floatingpoint arithmetic that specifies this. The binary representation of some decimal numbers is infinitely large, to put it mildly. These numbers are rounded off because of memory constraints, which results in a floatingpoint rounding error. Avoid using floatingpoint numbers if you need the highest degree of accuracy. For structural work, they are ideal. However, rounding errors should not accumulate in monetary operations. It is suggested that all monetary values be rounded down to the nearest integer (cents, pennies, etc.).
Alternately, many languages’ code bases accommodate
integers with a fixed decimal place, like the
Python’s support for decimals. You can now customise the timing and method of rounding.
When dealing with floatingpoint numbers, an approximate comparison is preferable to exact matching. Let’s compare two variables that vary
values:
>>>
>>> a = 0.3
>>> b = 0.1 * 3
>>> b
0.30000000000000004
>>> a == b
False
Standard comparison fails even though the two values are very close. Good thing there’s a builtin Python function to check if two numbers are within a predetermined tolerance for each other.
neighbourhood:
>>>
>>> import math
>>> math.isclose(a, b)
True
The maximum distance between the values, or “neighborhood,” is something that can be modified.
needed:
>>>
>>> math.isclose(a, b, rel_tol=1e16)
False
To perform a binary search in Python, use that function in the following
way:
import math
def find_index(elements, value):
left, right = 0, len(elements)  1
while left <= right:
middle = (left + right) // 2
if math.isclose(elements[middle], value):
return middle
if elements[middle] < value:
left = middle + 1
elif elements[middle] > value:
right = middle  1
However, this binary search implementation in Python only works with floatingpoint numbers. If you tried to use it for a different kind of search, you would always get an
error.
Analyzing the TimeSpace Complexity of Binary Search
In what follows, you will find some mathematical concepts but no code.
You can improve the efficiency of almost any algorithm in computing, but doing so will require more memory. For instance, you discovered that conducting a hashbased search of the IMDb dataset necessitated an additional half a gigabyte of memory in order to achieve unprecedented speed.
On the flip side, reducing the size of a video stream before sending it over the network requires more processing power but saves bandwidth. This phenomenon is known as the spacetime tradeoff and is useful in evaluating an algorithm’s complexity
.
TimeSpace Complexity
The computational complexity is a relative measure of how many resources an algorithm needs to do its job. The resources include computation time as well as the amount of memory it uses. Comparing the complexity of various algorithms allows you to make an informed decision about which is better in a given situation. Note: Algorithms that don’t need to allocate more memory than their input data already consumes are called
inplace , or
insitu , algorithms. This results in mutating the original data, which sometimes may have unwanted sideeffects.
You looked at a few search algorithms and their average performance against a large dataset. It’s clear from those measurements that a binary search is faster than a linear search. You can even tell by what factor.
However, if you took the same measurements in a different environment, you’d probably get slightly or perhaps entirely different results. There are invisible factors at play that can be influencing your test. Besides, such measurements aren’t always feasible. So, how can you compare time complexities quickly and objectively?
The first step is to break down the algorithm into smaller pieces and find the one that is doing the most work. It’s likely going to be some elementary operation that gets called a lot and consistently takes about the same time to run. For search algorithms, such an operation might be the comparison of two elements.
Having established that, you can now analyse the algorithm. To find the time complexity, you want to describe the relationship between the number of elementary operations executed versus the size of the input. Formally, such a relationship is a mathematical function. However, you’re not interested in looking for its exact algebraic formula but rather estimating its overall shape.
There are a few wellknown classes of functions that most algorithms fit in. Once you classify an algorithm according to one of them, you can put it on a scale:
Common Classes of Time Complexity
These classes tell you how the number of elementary operations increases with the growing size of the input. They are, from left to
right:
 Constant
 Logarithmic
 Linear
 Quasilinear
 Quadratic
 Exponential
 Factorial
This can give you an idea about the performance of the algorithm you’re considering. A constant complexity, regardless of the input size, is the most desired one. A logarithmic complexity is still pretty good, indicating a divideandconquer technique at use. The further to the right on this scale, the worse the complexity of the algorithm, because it has more work to do.
When you’re talking about the time complexity, what you typically mean is the asymptotic complexity , which describes the behaviour under very large data sets. This simplifies the function formula by eliminating all terms and coefficients but the one that grows at the fastest rate (for example, n squared).
However, a single function doesn’t provide enough information to compare two algorithms accurately. The time complexity may vary depending on the volume of data. For example, the binary search algorithm is like a turbocharged engine, which builds pressure before it’s ready to deliver power. On the other hand, the linear search algorithm is fast from the start but quickly reaches its peak power and ultimately loses the race:
In terms of speed, the binary search algorithm starts to overtake the linear search when there’s a certain number of elements in the collection. For smaller collections, a linear search might be a better choice. Note: Note that the same algorithm may have different
optimistic ,
pessimistic , and
average time complexity. For example, in the bestcase scenario, a linear search algorithm will find the element at the first index, after running just one comparison.
On the other end of the spectrum, it’ll have to compare a reference value to all elements in the collection. In practice, you want to know the pessimistic complexity of an algorithm.
There are a few mathematical notations of the asymptotic complexity, which are used to compare algorithms. By far the most popular one is the Big O notation
.
The Big O Notation
The
Big O notation represents the worstcase scenario of asymptotic complexity. Although this might sound rather intimidating, you don’t need to know the formal definition. Intuitively, it’s a very rough measure of the rate of growth at the tail of the function that describes the complexity. You pronounce it as “ big oh” of something :
That “something” is usually a function of data size or just the digit “one” that stands for a constant. For example, the linear search algorithm has a time complexity of O(n) , while a hashbased search has O(1) complexity. Note: When you say that some algorithm has complexity O(f(n)) , where n is the size of the input data, then it means that the function f(n) is an upper bound of the graph of that complexity. In other words, the actual complexity of that algorithm won’t grow faster than f(n) multiplied by some constant, when n approaches infinity.
In reallife, the Big O notation is used less formally as both an upper and a lower bound. This is useful for the classification and comparison of algorithms without having to worry about the exact function
formulas.
The Complexity of Binary Search
You’ll estimate the asymptotic time complexity of binary search by determining the number of comparisons in the worstcase scenario—when an element is missing—as a function of input size. You can approach this problem in three different
ways:
 Tabular
 Graphical
 Analytical
The tabular method is about collecting empirical data, putting it in a table, and trying to guess the formula by eyeballing sampled
values:
Number of Elements  Number of Comparisons 

0  0 
1  1 
2  2 
3  2 
4  3 
5  3 
6  3 
7  3 
8  4 
The number of comparisons grows as you increase the number of elements in the collection, but the rate of growth is slower than if it was a linear function. That’s an indication of a good algorithm that can scale with data.
If that doesn’t help you, you can try the graphical method, which visualises the sampled data by drawing a graph:
The data points seem to overlay with a curve, but you don’t have enough information to provide a conclusive answer. It could be a polynomial , whose graph turns up and down for larger inputs.
Taking the analytical approach, you can choose some relationship and look for patterns. For example, you might study how the number of elements shrinks in each step of the
algorithm:
Comparison  Number of Elements 

–  n 
1st  n/2 
2nd  n/4 
3rd  n/8 
⋮  ⋮ 
kth 
n/2 ^{ k } 
In the beginning, you start with the whole collection of n elements. After the first comparison, you’re left with only half of them. Next, you have a quarter, and so on. The pattern that arises from this observation is that after kth comparison, there are n/2 k elements. Variable k is the expected number of elementary operations.
After all k comparisons, there will be no more elements left. However, when you take one step back, that is k – 1 , there will be exactly one element left. This gives you a convenient equation:
Multiply both sides of the equation by the denominator, then take the logarithm base two of the result, and move the remaining constant to the right. You’ve just found the formula for the binary search complexity, which is on the order of O(log(n)) .