Blame - Doc/howto/sorting.rst - platform/external/python/cpython3

blob: 12804466289914b78171f50bbc4298df12f73120 [file] [log] [blame]

Georg Brandl	c275e15	2010-11-05 07:10:41 +0000	[diff] [blame]	1	.. _sortinghowto:
				2
Raymond Hettinger	53c58f8	2010-09-01 09:15:42 +0000	[diff] [blame]	3	Sorting HOW TO
				4	**************
				5
				6	:Author: Andrew Dalke and Raymond Hettinger
Raymond Hettinger	b436b6c	2011-01-12 01:16:57 +0000	[diff] [blame]	7	:Release: 0.1
Raymond Hettinger	53c58f8	2010-09-01 09:15:42 +0000	[diff] [blame]	8
				9
				10	Python lists have a built-in :meth:`list.sort` method that modifies the list
Benjamin Peterson	1efb8dc	2011-01-12 04:44:41 +0000	[diff] [blame]	11	in-place. There is also a :func:`sorted` built-in function that builds a new
				12	sorted list from an iterable.
Raymond Hettinger	53c58f8	2010-09-01 09:15:42 +0000	[diff] [blame]	13
				14	In this document, we explore the various techniques for sorting data using Python.
				15
				16
				17	Sorting Basics
				18	==============
				19
				20	A simple ascending sort is very easy: just call the :func:`sorted` function. It
Raymond Hettinger	b436b6c	2011-01-12 01:16:57 +0000	[diff] [blame]	21	returns a new sorted list::
Raymond Hettinger	53c58f8	2010-09-01 09:15:42 +0000	[diff] [blame]	22
				23	>>> sorted([5, 2, 3, 1, 4])
				24	[1, 2, 3, 4, 5]
				25
Raymond Hettinger	810cd34	2011-02-06 06:11:29 +0000	[diff] [blame]	26	You can also use the :meth:`list.sort` method. It modifies the list
Serhiy Storchaka	ecf41da	2016-10-19 16:29:26 +0300	[diff] [blame]	27	in-place (and returns ``None`` to avoid confusion). Usually it's less convenient
Raymond Hettinger	53c58f8	2010-09-01 09:15:42 +0000	[diff] [blame]	28	than :func:`sorted` - but if you don't need the original list, it's slightly
				29	more efficient.
				30
				31	>>> a = [5, 2, 3, 1, 4]
				32	>>> a.sort()
				33	>>> a
				34	[1, 2, 3, 4, 5]
				35
				36	Another difference is that the :meth:`list.sort` method is only defined for
				37	lists. In contrast, the :func:`sorted` function accepts any iterable.
				38
				39	>>> sorted({1: 'D', 2: 'B', 3: 'B', 4: 'E', 5: 'A'})
				40	[1, 2, 3, 4, 5]
				41
				42	Key Functions
				43	=============
				44
Raymond Hettinger	99a5638	2012-04-29 09:32:30 -0700	[diff] [blame]	45	Both :meth:`list.sort` and :func:`sorted` have a key parameter to specify a
Raymond Hettinger	53c58f8	2010-09-01 09:15:42 +0000	[diff] [blame]	46	function to be called on each list element prior to making comparisons.
				47
				48	For example, here's a case-insensitive string comparison:
				49
				50	>>> sorted("This is a test string from Andrew".split(), key=str.lower)
				51	['a', 'Andrew', 'from', 'is', 'string', 'test', 'This']
				52
				53	The value of the key parameter should be a function that takes a single argument
				54	and returns a key to use for sorting purposes. This technique is fast because
				55	the key function is called exactly once for each input record.
				56
				57	A common pattern is to sort complex objects using some of the object's indices
				58	as keys. For example:
				59
				60	>>> student_tuples = [
Zachary Ware	378a1d7	2016-08-09 16:47:04 -0500	[diff] [blame]	61	... ('john', 'A', 15),
				62	... ('jane', 'B', 12),
				63	... ('dave', 'B', 10),
				64	... ]
Raymond Hettinger	53c58f8	2010-09-01 09:15:42 +0000	[diff] [blame]	65	>>> sorted(student_tuples, key=lambda student: student[2]) # sort by age
				66	[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
				67
				68	The same technique works for objects with named attributes. For example:
				69
				70	>>> class Student:
Zachary Ware	378a1d7	2016-08-09 16:47:04 -0500	[diff] [blame]	71	... def __init__(self, name, grade, age):
				72	... self.name = name
				73	... self.grade = grade
				74	... self.age = age
				75	... def __repr__(self):
				76	... return repr((self.name, self.grade, self.age))
Raymond Hettinger	53c58f8	2010-09-01 09:15:42 +0000	[diff] [blame]	77
				78	>>> student_objects = [
Zachary Ware	378a1d7	2016-08-09 16:47:04 -0500	[diff] [blame]	79	... Student('john', 'A', 15),
				80	... Student('jane', 'B', 12),
				81	... Student('dave', 'B', 10),
				82	... ]
Raymond Hettinger	53c58f8	2010-09-01 09:15:42 +0000	[diff] [blame]	83	>>> sorted(student_objects, key=lambda student: student.age) # sort by age
				84	[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
				85
				86	Operator Module Functions
				87	=========================
				88
				89	The key-function patterns shown above are very common, so Python provides
Raymond Hettinger	810cd34	2011-02-06 06:11:29 +0000	[diff] [blame]	90	convenience functions to make accessor functions easier and faster. The
				91	:mod:`operator` module has :func:`~operator.itemgetter`,
Raymond Hettinger	99a5638	2012-04-29 09:32:30 -0700	[diff] [blame]	92	:func:`~operator.attrgetter`, and a :func:`~operator.methodcaller` function.
Raymond Hettinger	53c58f8	2010-09-01 09:15:42 +0000	[diff] [blame]	93
				94	Using those functions, the above examples become simpler and faster:
				95
				96	>>> from operator import itemgetter, attrgetter
				97
				98	>>> sorted(student_tuples, key=itemgetter(2))
				99	[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
				100
				101	>>> sorted(student_objects, key=attrgetter('age'))
				102	[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
				103
				104	The operator module functions allow multiple levels of sorting. For example, to
				105	sort by grade then by age:
				106
				107	>>> sorted(student_tuples, key=itemgetter(1,2))
				108	[('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)]
				109
				110	>>> sorted(student_objects, key=attrgetter('grade', 'age'))
				111	[('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)]
				112
				113	Ascending and Descending
				114	========================
				115
				116	Both :meth:`list.sort` and :func:`sorted` accept a reverse parameter with a
Raymond Hettinger	99a5638	2012-04-29 09:32:30 -0700	[diff] [blame]	117	boolean value. This is used to flag descending sorts. For example, to get the
Raymond Hettinger	53c58f8	2010-09-01 09:15:42 +0000	[diff] [blame]	118	student data in reverse age order:
				119
				120	>>> sorted(student_tuples, key=itemgetter(2), reverse=True)
				121	[('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)]
				122
				123	>>> sorted(student_objects, key=attrgetter('age'), reverse=True)
				124	[('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)]
				125
				126	Sort Stability and Complex Sorts
				127	================================
				128
				129	Sorts are guaranteed to be `stable
Georg Brandl	5d94134	2016-02-26 19:37:12 +0100	[diff] [blame]	130	<https://en.wikipedia.org/wiki/Sorting_algorithm#Stability>`_\. That means that
Raymond Hettinger	53c58f8	2010-09-01 09:15:42 +0000	[diff] [blame]	131	when multiple records have the same key, their original order is preserved.
				132
				133	>>> data = [('red', 1), ('blue', 1), ('red', 2), ('blue', 2)]
				134	>>> sorted(data, key=itemgetter(0))
				135	[('blue', 1), ('blue', 2), ('red', 1), ('red', 2)]
				136
				137	Notice how the two records for blue retain their original order so that
				138	``('blue', 1)`` is guaranteed to precede ``('blue', 2)``.
				139
				140	This wonderful property lets you build complex sorts in a series of sorting
				141	steps. For example, to sort the student data by descending grade and then
				142	ascending age, do the age sort first and then sort again using grade:
				143
				144	>>> s = sorted(student_objects, key=attrgetter('age')) # sort on secondary key
				145	>>> sorted(s, key=attrgetter('grade'), reverse=True) # now sort on primary key, descending
				146	[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
				147
Georg Brandl	5d94134	2016-02-26 19:37:12 +0100	[diff] [blame]	148	The `Timsort <https://en.wikipedia.org/wiki/Timsort>`_ algorithm used in Python
Raymond Hettinger	53c58f8	2010-09-01 09:15:42 +0000	[diff] [blame]	149	does multiple sorts efficiently because it can take advantage of any ordering
				150	already present in a dataset.
				151
				152	The Old Way Using Decorate-Sort-Undecorate
				153	==========================================
				154
				155	This idiom is called Decorate-Sort-Undecorate after its three steps:
				156
				157	* First, the initial list is decorated with new values that control the sort order.
				158
				159	* Second, the decorated list is sorted.
				160
				161	* Finally, the decorations are removed, creating a list that contains only the
				162	initial values in the new order.
				163
				164	For example, to sort the student data by grade using the DSU approach:
				165
				166	>>> decorated = [(student.grade, i, student) for i, student in enumerate(student_objects)]
				167	>>> decorated.sort()
				168	>>> [student for grade, i, student in decorated] # undecorate
				169	[('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)]
				170
				171	This idiom works because tuples are compared lexicographically; the first items
				172	are compared; if they are the same then the second items are compared, and so
				173	on.
				174
				175	It is not strictly necessary in all cases to include the index i in the
				176	decorated list, but including it gives two benefits:
				177
				178	* The sort is stable -- if two items have the same key, their order will be
				179	preserved in the sorted list.
				180
				181	* The original items do not have to be comparable because the ordering of the
				182	decorated tuples will be determined by at most the first two items. So for
				183	example the original list could contain complex numbers which cannot be sorted
				184	directly.
				185
				186	Another name for this idiom is
Georg Brandl	5d94134	2016-02-26 19:37:12 +0100	[diff] [blame]	187	`Schwartzian transform <https://en.wikipedia.org/wiki/Schwartzian_transform>`_\,
Raymond Hettinger	53c58f8	2010-09-01 09:15:42 +0000	[diff] [blame]	188	after Randal L. Schwartz, who popularized it among Perl programmers.
				189
				190	Now that Python sorting provides key-functions, this technique is not often needed.
				191
				192
				193	The Old Way Using the cmp Parameter
				194	=====================================
				195
				196	Many constructs given in this HOWTO assume Python 2.4 or later. Before that,
				197	there was no :func:`sorted` builtin and :meth:`list.sort` took no keyword
				198	arguments. Instead, all of the Py2.x versions supported a cmp parameter to
				199	handle user specified comparison functions.
				200
				201	In Py3.0, the cmp parameter was removed entirely (as part of a larger effort to
				202	simplify and unify the language, eliminating the conflict between rich
				203	comparisons and the :meth:`__cmp__` magic method).
				204
				205	In Py2.x, sort allowed an optional function which can be called for doing the
				206	comparisons. That function should take two arguments to be compared and then
				207	return a negative value for less-than, return zero if they are equal, or return
				208	a positive value for greater-than. For example, we can do:
				209
				210	>>> def numeric_compare(x, y):
Zachary Ware	378a1d7	2016-08-09 16:47:04 -0500	[diff] [blame]	211	... return x - y
				212	>>> sorted([5, 2, 4, 1, 3], cmp=numeric_compare) # doctest: +SKIP
Raymond Hettinger	53c58f8	2010-09-01 09:15:42 +0000	[diff] [blame]	213	[1, 2, 3, 4, 5]
				214
				215	Or you can reverse the order of comparison with:
				216
				217	>>> def reverse_numeric(x, y):
Zachary Ware	378a1d7	2016-08-09 16:47:04 -0500	[diff] [blame]	218	... return y - x
				219	>>> sorted([5, 2, 4, 1, 3], cmp=reverse_numeric) # doctest: +SKIP
Raymond Hettinger	53c58f8	2010-09-01 09:15:42 +0000	[diff] [blame]	220	[5, 4, 3, 2, 1]
				221
				222	When porting code from Python 2.x to 3.x, the situation can arise when you have
				223	the user supplying a comparison function and you need to convert that to a key
Raymond Hettinger	b436b6c	2011-01-12 01:16:57 +0000	[diff] [blame]	224	function. The following wrapper makes that easy to do::
Raymond Hettinger	53c58f8	2010-09-01 09:15:42 +0000	[diff] [blame]	225
Raymond Hettinger	b436b6c	2011-01-12 01:16:57 +0000	[diff] [blame]	226	def cmp_to_key(mycmp):
				227	'Convert a cmp= function into a key= function'
Ezio Melotti	af8838f	2013-03-11 09:30:21 +0200	[diff] [blame]	228	class K:
Raymond Hettinger	b436b6c	2011-01-12 01:16:57 +0000	[diff] [blame]	229	def __init__(self, obj, *args):
				230	self.obj = obj
				231	def __lt__(self, other):
				232	return mycmp(self.obj, other.obj) < 0
				233	def __gt__(self, other):
				234	return mycmp(self.obj, other.obj) > 0
				235	def __eq__(self, other):
				236	return mycmp(self.obj, other.obj) == 0
				237	def __le__(self, other):
				238	return mycmp(self.obj, other.obj) <= 0
				239	def __ge__(self, other):
				240	return mycmp(self.obj, other.obj) >= 0
				241	def __ne__(self, other):
				242	return mycmp(self.obj, other.obj) != 0
				243	return K
Raymond Hettinger	53c58f8	2010-09-01 09:15:42 +0000	[diff] [blame]	244
				245	To convert to a key function, just wrap the old comparison function:
				246
Zachary Ware	378a1d7	2016-08-09 16:47:04 -0500	[diff] [blame]	247	.. testsetup::
				248
				249	from functools import cmp_to_key
				250
				251	.. doctest::
				252
Raymond Hettinger	53c58f8	2010-09-01 09:15:42 +0000	[diff] [blame]	253	>>> sorted([5, 2, 4, 1, 3], key=cmp_to_key(reverse_numeric))
				254	[5, 4, 3, 2, 1]
				255
				256	In Python 3.2, the :func:`functools.cmp_to_key` function was added to the
Raymond Hettinger	810cd34	2011-02-06 06:11:29 +0000	[diff] [blame]	257	:mod:`functools` module in the standard library.
Raymond Hettinger	53c58f8	2010-09-01 09:15:42 +0000	[diff] [blame]	258
				259	Odd and Ends
				260	============
				261
				262	* For locale aware sorting, use :func:`locale.strxfrm` for a key function or
				263	:func:`locale.strcoll` for a comparison function.
				264
Raymond Hettinger	810cd34	2011-02-06 06:11:29 +0000	[diff] [blame]	265	* The reverse parameter still maintains sort stability (so that records with
Raymond Hettinger	53c58f8	2010-09-01 09:15:42 +0000	[diff] [blame]	266	equal keys retain the original order). Interestingly, that effect can be
				267	simulated without the parameter by using the builtin :func:`reversed` function
				268	twice:
				269
				270	>>> data = [('red', 1), ('blue', 1), ('red', 2), ('blue', 2)]
Raymond Hettinger	b9531bc	2016-04-26 01:11:10 -0700	[diff] [blame]	271	>>> standard_way = sorted(data, key=itemgetter(0), reverse=True)
				272	>>> double_reversed = list(reversed(sorted(reversed(data), key=itemgetter(0))))
				273	>>> assert standard_way == double_reversed
				274	>>> standard_way
				275	[('red', 1), ('red', 2), ('blue', 1), ('blue', 2)]
Raymond Hettinger	53c58f8	2010-09-01 09:15:42 +0000	[diff] [blame]	276
				277	* The sort routines are guaranteed to use :meth:`__lt__` when making comparisons
				278	between two objects. So, it is easy to add a standard sort order to a class by
				279	defining an :meth:`__lt__` method::
				280
				281	>>> Student.__lt__ = lambda self, other: self.age < other.age
				282	>>> sorted(student_objects)
				283	[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
				284
				285	* Key functions need not depend directly on the objects being sorted. A key
				286	function can also access external resources. For instance, if the student grades
				287	are stored in a dictionary, they can be used to sort a separate list of student
				288	names:
				289
				290	>>> students = ['dave', 'john', 'jane']
				291	>>> newgrades = {'john': 'F', 'jane':'A', 'dave': 'C'}
				292	>>> sorted(students, key=newgrades.__getitem__)
				293	['jane', 'dave', 'john']