Python’s collections module provides a range of specialised container data types that were carefully designed to solve specific programming problems in a Pythonic and efficient way. The module also provides wrapper classes that make it safer to create custom classes that behave like the built-in types dict, list, and str.
Learning about the data types and classes in collections will add a valuable set of dependable and efficient tools to your programming toolbox.
This guide will teach you how to:
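Write readable code with namedtuple
Build efficient stacks and queues with deque
Count items all at once with Counter
Handle missing keys with defaultdict
Keep your dictionaries organised with OrderedDict
Chain dictionaries together with ChainMap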
To better understand the data types and classes in collections, you should know the fundamentals of working with Python’s built-in data types, such as lists, tuples, and dictionaries. In addition, the last section of this guide requires a basic understanding of object-oriented programming in Python.
Introduction to Python’s collections
In Python 2.4, Raymond Hettinger contributed the collections module to the standard library. The goal was to provide a variety of specialised collection data types to address specific programming problems.
At that time, collections consisted of just one data structure, deque, which was specifically designed as a double-ended queue that supports fast append and pop operations on both ends of the sequence. From then on, several standard library modules began using deque to improve the performance of their classes and structures. Notable examples include queue and threading.
Over time, several specialised container data types were added to the module:
deque
defaultdict
namedtuple(), a factory function for creating subclasses of tuple that provide named fields, which allow accessing items by name while keeping the ability to access items by index
OrderedDict
Counter
ChainMap
In addition to these specialised data types, collections provides three wrapper classes for creating custom lists, dictionaries, and strings:
UserDict, a wrapper class around dict
UserList, a wrapper class around list
UserString, a wrapper class around str
The ability to subclass the corresponding built-in data types directly has made these wrapper classes less necessary. Nonetheless, using them is sometimes safer and less error-prone than subclassing the built-in types.
With this brief introduction to collections and the specific use cases that its data structures and classes can address, it’s time to take a closer look at them. Before continuing, note that this guide is an introduction to collections as a whole. In most of the sections that follow, a note will point you to a dedicated article on the class or function at hand.
Enhancing Readability of Code: namedtuple()
Python’s namedtuple() is a factory function that lets you create tuple subclasses with named fields. These fields give you direct access to the values in a given named tuple using dot notation, as in obj.attribute.
This feature was needed because using indices to access the values in a regular tuple is cumbersome, hard to read, and error-prone. This is especially true if the tuple you’re working with has several items and was created far away from where you’re using it.
Note: Check out Create Pythonic and Clean Code With namedtuple for a more in-depth look at how to use namedtuple in Python.
Back in Python 2.6, a tuple subclass with named fields that developers could access through dot notation seemed like a desirable feature. That’s where namedtuple() comes from. Compared with regular tuples, the tuple subclasses you can build with this function are a big win for code readability.
To put the readability problem in perspective, consider divmod(). This built-in function takes two (non-complex) numbers and returns a tuple with the quotient and remainder that result from the integer division of the input values:
>>> divmod(12, 5)
(2, 2)
It works well. But is this result readable? Can you tell what each number in the output means? Fortunately, Python offers a way to improve this. With namedtuple(), you can write a custom version of divmod() that returns an explicit result:
>>> from collections import namedtuple
>>> def custom_divmod(x, y):
...     DivMod = namedtuple("DivMod", "quotient remainder")
...     return DivMod(*divmod(x, y))
...
>>> result = custom_divmod(12, 5)
>>> result
DivMod(quotient=2, remainder=2)
>>> result.quotient
2
>>> result.remainder
2
Now you know what each value in the result means. You can also access each individual value using dot notation and a descriptive field name.
To create a new tuple subclass with namedtuple(), you need two required arguments:
typename is the name of the class you’re creating. It must be a string containing a valid Python identifier.
field_names is the list of field names you’ll use to access the items in the resulting tuple. It can be:
An iterable of strings, such as ["field1", "field2", ..., "fieldN"]
A string with whitespace-separated field names, such as "field1 field2 ... fieldN"
A string with comma-separated field names, such as "field1, field2, ..., fieldN"
Here are several ways to create a 2D Point with two coordinates (x and y) using namedtuple():
>>> from collections import namedtuple
>>> # Use a list of strings as field names
>>> Point = namedtuple("Point", ["x", "y"])
>>> point = Point(2, 4)
>>> point
Point(x=2, y=4)
>>> # Access the coordinates
>>> point.x
2
>>> point.y
4
>>> point[0]
2
>>> # Use a generator expression as field names
>>> Point = namedtuple("Point", (field for field in "xy"))
>>> Point(2, 4)
Point(x=2, y=4)
>>> # Use a string with comma-separated field names
>>> Point = namedtuple("Point", "x, y")
>>> Point(2, 4)
Point(x=2, y=4)
>>> # Use a string with space-separated field names
>>> Point = namedtuple("Point", "x y")
>>> Point(2, 4)
Point(x=2, y=4)
In these examples, you first create Point using a list of field names. Then you instantiate Point to create a point object. Note that you can access x and y both by field name and by index.
The remaining examples show how to create an equivalent named tuple using a generator expression, a string of comma-separated field names, and a string of space-separated field names.
With named tuples, you can also provide default values for your fields, create a dictionary from a named tuple, replace the value of a given field, and more:
>>> from collections import namedtuple
>>> # Define default values for fields
>>> Person = namedtuple("Person", "name job", defaults=["Python Developer"])
>>> person = Person("Jane")
>>> person
Person(name='Jane', job='Python Developer')
>>> # Create a dictionary from a named tuple
>>> person._asdict()
{'name': 'Jane', 'job': 'Python Developer'}
>>> # Replace the value of a field
>>> person = person._replace(job="Web Developer")
>>> person
Person(name='Jane', job='Web Developer')
Here, you first create a Person class using namedtuple(). This time, you use an optional argument called defaults that accepts a sequence of default values for the tuple’s fields. Note that namedtuple() applies the default values to the rightmost fields.
In the second example, you create a dictionary from an existing named tuple using ._asdict(). This method returns a dictionary that uses the field names as keys.
Finally, you use ._replace() to replace the original value of job. This method doesn’t update the tuple in place. Instead, it returns a new named tuple with the new value stored in the corresponding field. Do you know why ._replace() returns a new named tuple?
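One way to answer that question: named tuples are tuple subclasses, and tuples are immutable. The following minimal sketch illustrates this, assuming a simplified Person class without defaults. The names assignment_failed and updated are only illustrative.

```python
from collections import namedtuple

Person = namedtuple("Person", "name job")
person = Person("Jane", "Python Developer")

# Named tuples are immutable, so assigning to a field raises AttributeError
try:
    person.job = "Web Developer"
    assignment_failed = False
except AttributeError:
    assignment_failed = True

# ._replace() builds a new tuple and leaves the original untouched
updated = person._replace(job="Web Developer")

print(assignment_failed)
print(person)
print(updated)
```

Because the original object cannot change, any code that still holds a reference to person keeps seeing the old values.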
Developing Efficient Stacks and Queues: deque
Python’s deque was the first data type added to collections. This sequence-like data type is a generalisation of stacks and queues that supports memory-efficient and fast append and pop operations on both ends of the data structure.
Note: The term deque stands for double-ended queue and is pronounced “deck.”
In Python, append and pop operations on the beginning, or left side, of list objects are inefficient, with O(n) time complexity. Python must shift all the existing items to the right to insert new items at the beginning of the list, which makes these operations especially expensive when you’re working with large lists.
In contrast, append and pop operations on the right side of a list are normally efficient (O(1)), except when Python needs to reallocate memory to grow the underlying list so that it can accept new items.
Python’s deque was created to overcome this problem. Since deques are implemented as a doubly linked list, append and pop operations on either end of the object are stable and equally efficient. That’s why deques are particularly useful for creating stacks and queues.
Take a queue as an example. It manages items in a first-in, first-out (FIFO) fashion. It works like a pipe, where you push new items in at one end and old items come out at the other end. Adding an item to the end of a queue is known as an enqueue operation. Removing an item from the front, or beginning, of a queue is called dequeue.
Note: Check out Python’s deque: Implement Efficient Queues and Stacks for a comprehensive look at using deque in your Python programs.
Suppose you’re modelling a line of people waiting to buy movie tickets. You can do that with a deque. Every time a new person arrives, you enqueue them. When the person at the front of the line gets their tickets, you dequeue them.
Here’s how you can simulate the process using a deque object:
>>> from collections import deque
>>> ticket_queue = deque()
>>> ticket_queue
deque([])
>>> # People arrive to the queue
>>> ticket_queue.append("Jane")
>>> ticket_queue.append("John")
>>> ticket_queue.append("Linda")
>>> ticket_queue
deque(['Jane', 'John', 'Linda'])
>>> # People bought their tickets
>>> ticket_queue.popleft()
'Jane'
>>> ticket_queue.popleft()
'John'
>>> ticket_queue.popleft()
'Linda'
>>> # No people on the queue
>>> ticket_queue.popleft()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: pop from an empty deque
Here, you first create an empty deque object to represent the queue of people. To enqueue a person, you use .append(), which adds items to the right end of a deque. To dequeue a person, you use .popleft(), which removes and returns the leftmost item in a deque.
Note: The Python standard library also includes a queue module. This module implements multi-producer, multi-consumer queues that allow data to be exchanged safely between multiple threads.
The initializer for deque accepts two optional arguments:
iterable holds an iterable that provides the initial data.
maxlen holds an integer that specifies the maximum length of the deque.
If you don’t provide an iterable, then you get an empty deque. If you provide a value for maxlen, then your deque will hold no more than maxlen items.
Having a maxlen is a handy feature. For instance, suppose you need to implement a list of recent files in one of your applications. In that case, you can do the following:
>>> from collections import deque
>>> recent_files = deque(["core.py", "README.md", "__init__.py"], maxlen=3)
>>> recent_files.appendleft("database.py")
>>> recent_files
deque(['database.py', 'core.py', 'README.md'], maxlen=3)
>>> recent_files.appendleft("requirements.txt")
>>> recent_files
deque(['requirements.txt', 'database.py', 'core.py'], maxlen=3)
Once the deque reaches its maximum size (three files in this example), adding a new file to one end automatically discards the file at the opposite end. If you don’t provide a value for maxlen, then the deque can grow to an arbitrary number of items.
You now know the basics of deques, including how to create them and how to append and pop items from both ends of a given deque. Deques provide a list-like interface with some additional features. Here are several examples:
>>> from collections import deque
>>> # Use different iterables to create deques
>>> deque((1, 2, 3, 4))
deque([1, 2, 3, 4])
>>> deque([1, 2, 3, 4])
deque([1, 2, 3, 4])
>>> deque("abcd")
deque(['a', 'b', 'c', 'd'])
>>> # Unlike lists, deque doesn't support .pop() with arbitrary indices
>>> deque("abcd").pop(2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: pop() takes no arguments (1 given)
>>> # Extend an existing deque
>>> numbers = deque([1, 2])
>>> numbers.extend([3, 4, 5])
>>> numbers
deque([1, 2, 3, 4, 5])
>>> numbers.extendleft([-1, -2, -3, -4, -5])
>>> numbers
deque([-5, -4, -3, -2, -1, 1, 2, 3, 4, 5])
>>> # Insert an item at a given position
>>> numbers.insert(5, 0)
>>> numbers
deque([-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5])
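In these examples, you first create deques from different kinds of iterables. One difference between deque and list is that deque.pop() doesn’t accept an index argument, so you can’t use it to remove an item at an arbitrary position.
Note that deque provides sister methods for .append(), .pop(), and .extend() with the suffix left, namely .appendleft(), .popleft(), and .extendleft(), which operate on the left end of the underlying deque. Because .extendleft() adds the items one at a time, they end up in reverse order.
Deques also support sequence operations:
.clear() removes all the items from the deque.
.copy() creates a shallow copy of the deque.
.count(x) counts the number of items equal to x.
.remove(value) removes the first occurrence of value.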
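The sequence operations listed above can be sketched as follows. This is a minimal illustration that assumes a small deque of letters; the variable names are hypothetical.

```python
from collections import deque

letters = deque("abca")

snapshot = letters.copy()        # shallow copy, unaffected by later changes
a_count = letters.count("a")     # number of items equal to "a"
letters.remove("a")              # removes only the first occurrence of "a"
after_remove = list(letters)
letters.clear()                  # empties the deque in place

print(a_count)
print(after_remove)
print(letters)
print(snapshot)
```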
Another interesting feature of deques is the ability to rotate their items using .rotate():
>>> from collections import deque
>>> ordinals = deque(["first", "second", "third"])
>>> ordinals.rotate()
>>> ordinals
deque(['third', 'first', 'second'])
>>> ordinals.rotate(2)
>>> ordinals
deque(['first', 'second', 'third'])
>>> ordinals.rotate(-2)
>>> ordinals
deque(['third', 'first', 'second'])
>>> ordinals.rotate(-1)
>>> ordinals
deque(['first', 'second', 'third'])
This method rotates the deque n steps to the right. The default value for n is 1. If you give n a negative value, then the rotation is to the left.
Finally, you can use indices to access the items in a deque, but you can’t slice a deque:
>>> from collections import deque
>>> ordinals = deque(["first", "second", "third"])
>>> ordinals[1]
'second'
>>> ordinals[0:2]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: sequence index must be integer, not 'slice'
Deques support indexing, but they don’t support slicing. If you try to get a slice from an existing deque, then you get a TypeError. Slicing isn’t available because performing a slice operation on a linked list would be inefficient.
Handling Missing Keys: defaultdict
A common problem when you’re working with dictionaries in Python is how to handle missing keys. If you try to access a key that doesn’t exist in a given dictionary, then you get a KeyError:
>>> favorites = {"pet": "dog", "color": "blue", "language": "Python"}
>>> favorites["fruit"]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'fruit'
There are several ways to work around this issue. For example, you can use .setdefault(). This method takes a key as an argument. If the key exists in the dictionary, then the method returns the associated value. If not, the method inserts the key, assigns it a default value, and returns that value.
For instance, calling favorites.setdefault("fruit", "apple") sets a default value for the fruit key. This key doesn’t exist in favorites, so .setdefault() creates it and assigns it the value apple. If you call .setdefault() with an existing key, then the call has no effect on the dictionary, and the key keeps its original value.
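A minimal sketch of the .setdefault() behaviour described above, reusing the favorites dictionary from the earlier example. The "cat" default and the variable names fruit and pet are only illustrative.

```python
favorites = {"pet": "dog", "color": "blue", "language": "Python"}

# Missing key: inserted with the default value, which is also returned
fruit = favorites.setdefault("fruit", "apple")

# Existing key: the dictionary is left unchanged and the current value is returned
pet = favorites.setdefault("pet", "cat")

print(fruit)
print(pet)
print(favorites)
```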
You can also use .get() to return a suitable default value if a given key is missing. For instance, favorites.get("fruit", "apple") returns apple because the key is missing from the underlying dictionary. However, .get() doesn’t create the new key for you.
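The .get() behaviour described above can be sketched as follows, again assuming the favorites dictionary from the earlier example.

```python
favorites = {"pet": "dog", "color": "blue", "language": "Python"}

# .get() returns the fallback value but does not insert the key
fruit = favorites.get("fruit", "apple")

print(fruit)
print("fruit" in favorites)
```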
Since handling missing keys in dictionaries is a common need, Python’s collections also provides a tool for that. The defaultdict type is a subclass of dict designed to help you out with missing keys.
Note: For further information on how to utilise Python’s defaultdict, see Handling Missing Keys With the defaultdict Type.
The constructor of defaultdict takes a function object as its first argument. When you access a key that doesn’t exist, defaultdict automatically calls that function without arguments to create a suitable default value for the key at hand.
To provide this functionality, defaultdict stores the input function in .default_factory and then overrides .__missing__() to automatically call the function and generate a default value when you access a missing key.
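The mechanism described above can be approximated in pure Python. The class below, SimpleDefaultDict, is a hypothetical and simplified stand-in written for illustration only; it is not how collections.defaultdict is actually implemented.

```python
class SimpleDefaultDict(dict):
    """Hypothetical, simplified imitation of collections.defaultdict."""

    def __init__(self, default_factory=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.default_factory = default_factory

    def __missing__(self, key):
        # dict.__getitem__() calls this hook when the key is not found
        if self.default_factory is None:
            raise KeyError(key)
        value = self.default_factory()  # called without arguments
        self[key] = value               # the new key is stored
        return value


counts = SimpleDefaultDict(int)
counts["a"] += 1             # "a" starts at int() == 0
missing_value = counts["z"]  # creates "z" with the default value

print(counts)
```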
You can use any callable to initialise your defaultdict objects. For example, with int() you can create a suitable counter to count different objects. In that case, you create the defaultdict with int() as its first argument. When you access a key that doesn’t exist, the dictionary automatically calls int(), which returns 0 as the default value for the key at hand. This kind of defaultdict object is particularly useful for counting things in Python.
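A minimal sketch of a defaultdict counter built with int(). The "dogs" and "cats" keys and the list of sightings are hypothetical sample data.

```python
from collections import defaultdict

counter = defaultdict(int)

# Accessing a missing key calls int(), stores 0, and returns it
initial = counter["dogs"]

# Hypothetical sample data
for animal in ["dogs", "cats", "dogs", "dogs", "cats"]:
    counter[animal] += 1

print(initial)
print(dict(counter))
```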
Another common use of defaultdict is grouping items. In this case, the handy factory function is list(). Suppose you have raw data about pets and their breeds, and you need to group the breeds by type of pet. If you use list() as the .default_factory when you create the defaultdict instance, then your dictionary provides an empty list ([]) as the default value for every missing key. You then use that list to store the breeds of your pets.
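The grouping approach described above might look like this. The pets list is hypothetical sample data made up for the sketch, as are the names group_pets, pet, and breed.

```python
from collections import defaultdict

# Hypothetical raw data: (pet type, breed) pairs
pets = [
    ("dog", "Terrier"),
    ("dog", "Boxer"),
    ("cat", "Birman"),
    ("dog", "Beagle"),
    ("cat", "Siamese"),
]

group_pets = defaultdict(list)
for pet, breed in pets:
    # A missing pet type gets an empty list before .append() runs
    group_pets[pet].append(breed)

for pet, breeds in group_pets.items():
    print(pet, "->", breeds)
```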
As defaultdict is a subclass of dict, it implements the same interface. This implies that you may use defaultdict objects in the same manner as a standard dictionary.
Organizing Your Dictionaries: OrderedDict
Sometimes you need your dictionaries to remember the order in which key-value pairs are inserted. For years, Python’s regular dictionaries were unordered data structures. In 2008, PEP 372 introduced the idea of adding a new dictionary class to collections.
The new class would remember the order of items based on the moment at which the keys were inserted. That was the origin of OrderedDict.
OrderedDict was introduced in Python 3.1. Its application programming interface (API) is substantially the same as that of dict. However, OrderedDict iterates over keys and values in the same order in which the keys were first inserted into the dictionary. If you assign a new value to an existing key, then the order of the key-value pair remains unchanged. If an entry is deleted and reinserted, then it is moved to the end of the dictionary.
Note: Check out OrderedDict vs dict in Python: The Right Tool for the Job to learn more about Python’s OrderedDict and why you should consider using it.
There are several ways to create OrderedDict objects. Most of them are identical to how you create a regular dictionary. For instance, you can create an empty ordered dictionary by instantiating the class without arguments and then insert key-value pairs as needed, just as you would with a regular dictionary.
When you iterate over such a dictionary, for example one named life_stages, the key-value pairs are returned in the same order in which you inserted them. Guaranteeing the order of items is the main problem that OrderedDict solves.
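A minimal sketch of building and iterating over an ordered dictionary. The name life_stages comes from the text above, while the keys and values are hypothetical sample data.

```python
from collections import OrderedDict

# Start empty, then insert key-value pairs as with a regular dictionary
life_stages = OrderedDict()
life_stages["childhood"] = "0-9"
life_stages["adolescence"] = "9-18"
life_stages["adulthood"] = "18-65"
life_stages["old"] = "+65"

# Iteration follows insertion order
for stage, years in life_stages.items():
    print(stage, "->", years)
```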
Python 3.6 introduced a new implementation of dict. This implementation brought an unexpected new feature: regular dictionaries now keep their items in the same order in which they were first inserted.
Initially, the feature was considered an implementation detail, and the documentation advised against relying on it. Since Python 3.7, however, the feature has been an official part of the language specification. So, what’s the point of using OrderedDict?
There are still several features that make OrderedDict valuable:
With OrderedDict, your code makes it clear that the order of items in the dictionary is important. You’re clearly communicating that your code needs or relies on the order of items in the underlying dictionary.
With OrderedDict, you have access to .move_to_end(), which is a method that allows you to manipulate the order of items in your dictionary. You’ll also have an enhanced variation of .popitem() that allows removing items from either end of the underlying dictionary.
With OrderedDict, equality tests between dictionaries take the order of items into account. So, if you have two ordered dictionaries with the same group of items but in a different order, then your dictionaries will be considered non-equal.
Backward compatibility is at least one additional reason for choosing OrderedDict. In Python versions earlier than 3.6, relying on regular dict objects to preserve the order of items can break your code.
Now it’s time to look at some of these OrderedDict features in action. With .move_to_end(), you can rearrange the items in a dictionary, for example to reorder a dictionary of letters. Note that .move_to_end() takes an optional argument called last that controls which end of the dictionary the item is moved to. This method is very handy when you need to sort the items in your dictionaries or change their order in any way.
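The reordering features described above can be sketched as follows. The contents of letters are hypothetical sample data, and the intermediate variable names exist only for this illustration.

```python
from collections import OrderedDict

letters = OrderedDict(b=2, d=4, a=1, c=3)

letters.move_to_end("b")              # move "b" to the right end (default)
letters.move_to_end("a", last=False)  # move "a" to the left end
after_moves = list(letters)

# Sort the dictionary by key, moving each key to the end in sorted order
for key in sorted(letters):
    letters.move_to_end(key)
sorted_keys = list(letters)

# .popitem() can remove items from either end
first_item = letters.popitem(last=False)
last_item = letters.popitem()

print(after_moves)
print(sorted_keys)
print(first_item, last_item)
```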
Another important difference between OrderedDict and a regular dictionary is how they compare for equality. Suppose that letters_0 and letters_1 hold the same items, but letters_1 has them in a different order. With regular dictionaries, this difference doesn’t matter, and both dictionaries compare equal. With ordered dictionaries, however, letters_0 and letters_1 are not equal, because equality tests between ordered dictionaries take into account both the content and the order of the items.
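A minimal sketch of the equality behaviour described above. The names letters_0 and letters_1 follow the text, and their contents are hypothetical sample data.

```python
from collections import OrderedDict

letters_0 = OrderedDict(a=1, b=2, c=3)
letters_1 = OrderedDict(b=2, a=1, c=3)  # same items, different order

# Regular dictionaries ignore order when comparing
plain_equal = dict(letters_0) == dict(letters_1)

# Ordered dictionaries compare both content and order
ordered_equal = letters_0 == letters_1

print(plain_equal)
print(ordered_equal)
```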
Counting Items All at Once: Counter
Counting objects is a common operation in programming. Suppose you need to count how many times a given item appears in a list or iterable. If your list is short, then counting its items can be simple and quick. If the list is long, then counting becomes more challenging.
To count objects, you typically use a counter, which is an integer variable with an initial value of zero. Then you increment the counter each time the object occurs.
In Python, you can use a dictionary to count several different objects at once. In this case, the keys store the individual objects, and the values hold the number of occurrences of each object, or its count.
For example, you can count the letters in the word “mississippi” using a regular dictionary and a for loop. The loop iterates over each letter in the word. A conditional statement checks whether the letter is already in the dictionary and, if it isn’t, initialises the letter’s count to zero. The final step in each iteration is to increment the letter’s count.
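The loop described above can be sketched like this, a minimal version that assumes the variable names word and counter:

```python
word = "mississippi"
counter = {}

for letter in word:
    # Initialise the count the first time a letter shows up
    if letter not in counter:
        counter[letter] = 0
    counter[letter] += 1

print(counter)
```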
As you already know, defaultdict objects are convenient for counting because you don’t need to check whether the key exists. The dictionary guarantees suitable default values for any missing keys. To count the letters in “mississippi” this way, you create a defaultdict object with int() as the factory function, so the dictionary automatically creates missing keys and initialises them to zero. You then increment the value of the current key in each iteration to compute the final letter counts.
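A minimal sketch of the same count using defaultdict, assuming the variable names word and counter:

```python
from collections import defaultdict

word = "mississippi"
counter = defaultdict(int)

for letter in word:
    # Missing letters start at int() == 0, so no membership check is needed
    counter[letter] += 1

print(dict(counter))
```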
As with other common programming problems, Python offers an efficient tool for counting. Counter, a dict subclass designed specifically for counting objects, is available in collections.
With Counter, the “mississippi” example takes a single line of code, and you’re done. Counter iterates over the string “mississippi” and produces a dictionary with the letters as keys and their frequencies as values.
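A minimal sketch of the one-line count with Counter. The variable name letters is only illustrative.

```python
from collections import Counter

# Counter consumes the iterable and tallies each item
letters = Counter("mississippi")

print(letters)
```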
Note: Check out Python’s Counter: The Pythonic Way to Count Things for a more in-depth look at Counter and how to use it to effectively count objects.
There are several ways to instantiate Counter. You can use lists, tuples, or any other iterable with repeated objects. The only restriction is that your objects must be hashable. For example, Counter works correctly with a list of integers because integers are hashable. In contrast, lists aren’t hashable, so passing an iterable of lists to Counter raises a TypeError.
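The hashability requirement can be illustrated with a short sketch. The input values below are hypothetical sample data, and unhashable_rejected is a flag introduced only for this illustration.

```python
from collections import Counter

# Integers are hashable, so they can be counted
numbers = Counter([1, 1, 2, 3, 3, 3, 4])

# Lists are not hashable, so they cannot become dictionary keys
try:
    Counter(([1], [1]))
    unhashable_rejected = False
except TypeError:
    unhashable_rejected = True

print(numbers)
print(unhashable_rejected)
```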
Being hashable means that your objects must have a hash value that never changes during their lifetime. This is a requirement because these objects will work as dictionary keys. In Python, immutable built-in objects such as numbers, strings, and tuples that contain only hashable items are hashable.
Note: In Counter, the counting functionality is provided via a highly efficient C function. If this method is unavailable for any reason, the class will utilise a similar but less efficient Python function.
Since Counter is a subclass of dict, their interfaces are mostly the same. However, there are some subtle differences. The first difference is that Counter doesn’t implement .fromkeys(). This avoids inconsistencies such as Counter.fromkeys("abbbc", 2), in which every letter would have an initial count of 2 regardless of its real count in the input iterable.
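In current CPython releases, calling Counter.fromkeys() raises NotImplementedError. The following minimal sketch shows that behaviour next to the regular constructor; fromkeys_supported is a flag introduced only for this illustration.

```python
from collections import Counter

try:
    Counter.fromkeys("abbbc", 2)
    fromkeys_supported = True
except NotImplementedError:
    fromkeys_supported = False

# The regular constructor reports the real counts instead
counts = Counter("abbbc")

print(fromkeys_supported)
print(counts)
```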
The second difference is that .update() doesn’t replace the count (value) of an existing object (key) with a new count. It adds both counts together. For example, if you update the m and i counts of a “mississippi” counter, then these letters hold the sum of their original count plus the value you passed to .update(). If you use a key that isn’t present in the original counter, then .update() creates the new key with the corresponding value. Finally, .update() accepts iterables, mappings, keyword arguments, and also other counters.
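A minimal sketch of how .update() accumulates counts. The specific increments passed to .update() are hypothetical values chosen for illustration.

```python
from collections import Counter

letters = Counter("mississippi")  # m: 1, i: 4, s: 4, p: 2

# Keyword arguments: counts are added, not replaced
letters.update(m=3, i=4)

# Mapping with a key that was not in the counter yet
letters.update({"a": 2})

# Another counter
letters.update(Counter("sp"))

print(letters)
```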
As Counter is a subclass of dict, there are no constraints on the types of objects that may be stored in its keys and values. The keys may contain any hashable object, whereas the values may contain any object. To properly function as counters, however, the values must be integers representing counts.
Another difference between Counter and dict is that accessing a missing key returns 0 instead of raising a KeyError. This behaviour signals that the count of an object that doesn’t exist in the counter is zero. For example, the letter “a” doesn’t appear in the word “mississippi”, so its count is 0.
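The missing-key behaviour can be sketched as follows, assuming a counter built from the word “mississippi”:

```python
from collections import Counter

letters = Counter("mississippi")

# A missing key yields 0 and is not added to the counter
a_count = letters["a"]

print(a_count)
print("a" in letters)
```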
You can also use Counter to emulate a multiset, or bag, in Python. Multisets are similar to sets, but they allow multiple instances of a given element. The number of instances of an element is known as its multiplicity. For example, you can have a multiset like {1, 1, 2, 3, 3, 3, 3, 4, 4}.
When you use Counter to emulate multisets, the keys represent the elements, and the values represent their respective multiplicity. The keys of such a counter are equivalent to a Python set, and the values hold the multiplicity of each element in the set.
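A minimal sketch of a multiset built with Counter, using the elements from the multiset mentioned above. The variable names are only illustrative.

```python
from collections import Counter

multiset = Counter([1, 1, 2, 3, 3, 3, 3, 4, 4])

# The keys behave like a set of distinct elements
elements = set(multiset.keys())

# The values hold the multiplicity of each element
multiplicity_of_3 = multiset[3]

print(multiset)
print(elements)
print(multiplicity_of_3)
```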
Python’s Counter provides a few additional features for working with multisets. For example, you can initialise your counters with a mapping of elements and their multiplicity. You can also perform math operations on the multiplicities of elements, and more.
Suppose you work at the local animal shelter. You are responsible for keeping track of how many animals are adopted each day and how many animals arrive at and leave the shelter. In this case, you can use Counter.
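The shelter scenario could be sketched as follows. The animal names and all the numbers are made up for illustration.

```python
from collections import Counter

# Hypothetical starting inventory
inventory = Counter(dogs=23, cats=14, pythons=7)

# Animals adopted today: subtract their counts in place
adopted = Counter(dogs=2, cats=5, pythons=1)
inventory.subtract(adopted)  # dogs=21, cats=9, pythons=6

# New arrivals: add their counts in place
new_pets = {"dogs": 4, "cats": 1}
inventory.update(new_pets)   # dogs=25, cats=10, pythons=6

# The - and + operators build new counters instead
inventory = inventory - Counter(dogs=2, cats=3, pythons=1)  # 23, 7, 5
inventory += Counter(dogs=1, cats=2)                        # 24, 9, 5

print(inventory)  # Counter({'dogs': 24, 'cats': 9, 'pythons': 5})
```

One difference worth keeping in mind: the + and - operators drop any element whose resulting count is zero or negative, whereas .subtract() keeps such counts.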
That’s cool! With Counter, you can now keep a record of your pets. Notice that .subtract() and .update() may be used to subtract and add counts or multiplicities, respectively. The addition (+) and subtraction (-) operators may also be used.
You can do much more with Counter objects as multisets in Python, so go ahead and experiment!
Chaining Together Dictionaries: ChainMap
ChainMap in Python groups multiple dictionaries and other mappings into a single object that behaves much like a dictionary. In other words, it turns several mappings into a single logical mapping.
ChainMap objects are updateable views, which means that changes to any of the linked mappings affect the ChainMap object as a whole. This is because ChainMap does not merge the input mappings. Instead, it keeps them in an internal list and reimplements the standard dictionary operations on top of that list. For instance, a key lookup searches the list of mappings in order until it finds the key.
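The view behaviour can be seen in a short sketch. The dictionaries and their contents are assumptions made for this example.

```python
from collections import ChainMap

numbers = {"one": 1, "two": 2}
letters = {"a": "A", "b": "B"}
chain = ChainMap(numbers, letters)

# Changing an underlying mapping is visible through the ChainMap
letters["c"] = "C"
print(chain["c"])    # C

# Lookups search the mappings in order: numbers first, then letters
print(chain["two"])  # 2
```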
Note: For a deeper look at using ChainMap in your Python code, see ChainMap in Python: Managing Many Contexts Properly.
When working with ChainMap objects, you can have several dictionaries with either unique or repeated keys.
In either scenario, ChainMap enables you to consider all of your dictionaries as a single entity. If all of your dictionaries have unique keys, you may access and modify the keys as if you were dealing with a single dictionary.
In addition to handling your dictionaries as one, you may use the internal list of mappings to create some type of access priority if your dictionaries include duplicate keys. Due to this capability, ChainMap objects are excellent for managing many contexts.
Suppose you are developing a command-line interface (CLI) application. The programme lets the user connect to the Internet through a proxy service. The settings are looked up in the following order of priority:
Command-line options (--proxy, -p)
Local proxy configuration
Global proxy configuration
If the user specifies a proxy on the command line, the programme must use that proxy. If not, the programme uses the proxy provided by the next configuration object, and so on. This is one of the most common use cases of ChainMap.
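A minimal sketch of this priority lookup is shown below. The variable names cmd_proxy, local_proxy, and global_proxy follow the names used in the text, while the host names are hypothetical placeholders.

```python
from collections import ChainMap

cmd_proxy = {}  # The user passed no proxy on the command line
local_proxy = {"proxy": "proxy.local.com"}    # hypothetical host
global_proxy = {"proxy": "proxy.global.com"}  # hypothetical host

# The order of the arguments defines the lookup priority
config = ChainMap(cmd_proxy, local_proxy, global_proxy)

print(config["proxy"])  # proxy.local.com
```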
ChainMap lets you define the appropriate priority for the application’s proxy configuration. A key lookup searches cmd_proxy, then local_proxy, and finally global_proxy, returning the first occurrence of the key it finds. In this example, the user does not provide a proxy on the command line, so your programme uses the proxy specified in local_proxy.
In general, ChainMap objects behave like regular dictionary objects. However, they have some additional features. For instance, they have a public .maps attribute that holds the internal list of mappings.
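The .maps attribute can be explored with a short sketch such as this one. The dictionaries are assumptions made for illustration.

```python
from collections import ChainMap

numbers = {"one": 1, "two": 2}
letters = {"a": "A", "b": "B"}
alpha_num = ChainMap(numbers, letters)

# .maps is a plain list holding the underlying mappings
print(alpha_num.maps)  # [{'one': 1, 'two': 2}, {'a': 'A', 'b': 'B'}]

# Because it is a list, you can append another mapping to the chain
alpha_num.maps.append({"pi": 3.14})
print(alpha_num["pi"])  # 3.14
```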
The .maps instance attribute gives you access to the internal list of mappings. This list is mutable, so you can add and remove mappings manually, iterate over the list, and more.
ChainMap also offers the .new_child() method and the .parents property.
The .new_child() method creates a new ChainMap object containing a new map (the child) followed by all the maps in the current instance. The map passed as the first argument becomes the first map in the list of maps. If no map is provided, the method uses an empty dictionary.
The .parents property returns a new ChainMap object that contains all the maps in the current instance except the first one. This is handy for skipping the first map in a key lookup.
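Both features can be tried out in a small sketch. The family dictionaries below are made-up sample data.

```python
from collections import ChainMap

dad = {"name": "John", "age": 35}
mom = {"name": "Jane", "age": 31}
family = ChainMap(mom, dad)

# .new_child() puts the given map in front of the existing ones
son = {"name": "Mike", "age": 0}
family = family.new_child(son)
print(family["name"])    # Mike
print(len(family.maps))  # 3

# .parents skips the first map in lookups
print(family.parents["name"])  # Jane
```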
Mutating operations, such as updating keys, adding new keys, deleting existing keys, popping keys, and clearing the dictionary, act only on the first mapping in ChainMap’s internal list of mappings.
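The following sketch shows a few mutating operations and where they land. The dictionaries are assumptions made for this example.

```python
from collections import ChainMap

numbers = {"one": 1, "two": 2}
letters = {"a": "A", "b": "B"}
alpha_num = ChainMap(numbers, letters)

alpha_num["c"] = "C"  # The new key is stored in numbers, not in letters
alpha_num.pop("one")  # Removes the key from numbers

# Deleting a key that lives only in a later mapping fails
try:
    del alpha_num["a"]
except KeyError as error:
    print("Cannot delete:", error)

alpha_num.clear()  # Empties only the first mapping
print(alpha_num)   # ChainMap({}, {'a': 'A', 'b': 'B'})
```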
Mutating a ChainMap object affects only the first mapping in the internal list. This is a crucial detail to keep in mind when working with ChainMap.
At first sight, it may seem possible to modify any existing key-value pair in a given ChainMap. However, you can only modify the key-value pairs in the first mapping, unless you use .maps to access and modify the other mappings in the list directly.
Customising Built-Ins: UserString, UserList, and UserDict
Sometimes, you need to customise built-in types such as strings, lists, and dictionaries in order to add or change functionality. Since Python 2.2, this has been possible by subclassing those types directly. However, as you’ll see in a moment, this approach can cause some problems.
Python’s collections module provides three wrapper classes that imitate the behaviour of the built-in data types:
UserString
UserList
UserDict
Using a combination of regular and special methods, you can imitate and customise the behaviour of strings, lists, and dictionaries with these classes.
When it comes to customising the behaviour of built-in types, developers often wonder whether UserString, UserList, and UserDict are still necessary. The answer is yes.
The open-closed principle informed the design and implementation of the built-in types. This means that they are open for extension but closed for modification. Allowing changes to the core features of these classes could break their invariants. So, the Python core developers decided to protect them from modification.
For instance, suppose you want a dictionary that automatically lowercases its keys upon insertion. You could subclass dict and override .__setitem__() so that every time the dictionary inserts a key, the key name is lowercased.
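A sketch of such a subclass might look like this. The class name LowerDict and the sample keys are hypothetical, and the output reflects CPython, where the dict constructor and .update() bypass an overridden .__setitem__().

```python
class LowerDict(dict):
    """A dict subclass that tries to lowercase keys on insertion."""

    def __setitem__(self, key, value):
        super().__setitem__(key.lower(), value)


ordinals = LowerDict({"FIRST": 1, "SECOND": 2})  # constructor bypasses __setitem__
ordinals["THIRD"] = 3                            # goes through __setitem__
ordinals.update({"FOURTH": 4})                   # update() bypasses __setitem__

print(ordinals)  # {'FIRST': 1, 'SECOND': 2, 'third': 3, 'FOURTH': 4}
print(isinstance(ordinals, dict))  # True
```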
This dictionary works correctly when you insert new keys using dictionary-style assignment with square brackets ([]). However, it does not work when you pass an initial dictionary to the class constructor or when you use .update(). This means that you would need to override .__init__(), .update(), and probably some other methods for your custom dictionary to work properly.
Now consider the same dictionary built on the UserDict base class.
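The UserDict version could be sketched as follows, again with the hypothetical class name LowerDict and made-up sample keys. Here the constructor and .update() both route through .__setitem__(), so a single override is enough.

```python
from collections import UserDict


class LowerDict(UserDict):
    """A UserDict subclass that lowercases keys on insertion."""

    def __setitem__(self, key, value):
        # self.data is the regular dict that UserDict wraps
        self.data[key.lower()] = value


ordinals = LowerDict({"FIRST": 1, "SECOND": 2})
ordinals["THIRD"] = 3
ordinals.update({"FOURTH": 4})

print(ordinals)  # {'first': 1, 'second': 2, 'third': 3, 'fourth': 4}
print(isinstance(ordinals, dict))  # False
```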
It works! Your custom dictionary now converts all new keys to lowercase before inserting them. Notice that because you do not inherit from dict directly, instances of your class are not dict instances, unlike in the preceding example.
UserDict stores a regular dictionary in the .data instance attribute and implements all of its methods on top of that dictionary. UserList and UserString work in the same way, except that their .data attributes hold a list and a str object, respectively.
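As a quick check of the .data attribute on the other two wrappers, consider this small sketch with made-up sample values:

```python
from collections import UserList, UserString

numbers = UserList([3, 1, 2])
text = UserString("hello")

# Each wrapper keeps the real built-in object in .data
print(type(numbers.data))  # <class 'list'>
print(type(text.data))     # <class 'str'>

# Working on .data directly changes what the wrapper exposes
numbers.data.sort()
print(numbers)  # [1, 2, 3]
```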
If you need to customise any of these classes, just override the relevant methods and change their behaviour as necessary.
Generally speaking, you should use UserDict, UserList, and UserString when you need a class that behaves almost identically to the wrapped built-in class and you want to customise part of its standard functionality.
Another reason to use these classes instead of the equivalent built-in classes is that you can access and manipulate the underlying .data attribute directly.
The ability to inherit directly from built-in types has largely superseded UserDict, UserList, and UserString. However, the underlying implementation of the built-in types makes it difficult to inherit from them safely without rewriting a substantial amount of code. In most situations, it is safer to use the appropriate class from collections. Doing so will spare you several problems and some odd behaviour.
Conclusion
Python’s collections module provides various specialised container data types that may be used to solve common programming problems, such as counting objects, creating queues and stacks, and handling missing dictionary keys, among others.
The data types and classes in collections were designed to be efficient and Pythonic. They can be really useful in your Python programming journey, so it is well worth your time to learn about them.
This tutorial taught you how to use:
namedtuple
deque
Counter
defaultdict
OrderedDict
ChainMap
In addition, you discovered three useful wrapper classes: UserDict, UserList, and UserString. These classes are useful for creating new classes that imitate the behaviour of the built-in dict, list, and str types.