Skip to content

Latest commit

 

History

History
 
 

Bucket Sort

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
 
 
 
 
 
 
 
 
 
 

Bucket Sort

Bucket Sort, also known as Bin Sort, is a distributed sorting algorithm, which sort elements from an array by performing these steps:

  1. Distribute the elements into buckets or bins.
  2. Sort each bucket individually.
  3. Merge the buckets in order to produce a sorted array as the result.

See the algorithm in action here and here.

The performance for execution time is:

Case Performance
Worst O(n^2)
Best Omega(n + k)
Average Theta(n + k)

Where n = the number of elements and k is the number of buckets.

In the best case, the algorithm distributes the elements uniformly between buckets, a few elements are placed on each bucket and sorting the buckets is O(1). Rearranging the elements is one more run through the initial list.

In the worst case, the elements are sent all to the same bucket, making the process take O(n^2).

Pseudocode

A pseudocode of the algorithm can be as follows:

function bucketSort(array, n) is
    buckets ← new array of n empty lists
    for i = 0 to (length(array)-1) do
        insert array[i] into buckets[msbits(array[i], k)]
    for i = 0 to n - 1 do
        nextSort(buckets[i]);
    return the concatenation of buckets[0], ...., buckets[n-1]

Graphically explained

  1. Distribute elements in buckets:

distribution step

  1. Sorting inside every bucket and merging:

sorting each bucket and merge

An example

Input

Suppose we have the following list of elements: [2, 56, 4, 77, 26, 98, 55]. Let's use 10 buckets. To determine the capacity of each bucket we need to know the maximum element value, in this case 98.

So the buckets are:

  • bucket 1: from 0 to 9
  • bucket 2: from 10 to 19
  • bucket 3: from 20 to 29
  • and so on.

Distribution

Now we need to choose a distribution function.

bucketNumber = (elementValue / totalNumberOfBuckets) + 1

Such that by applying that function we distribute all the elements in the buckets.

In our example it is like the following:

  1. Apply the distribution function to 2. bucketNumber = (2 / 10) + 1 = 1
  2. Apply the distribution function to 56. bucketNumber = (56 / 10) + 1 = 6
  3. Apply the distribution function to 4. bucketNumber = (4 / 10) + 1 = 1
  4. Apply the distribution function to 77. bucketNumber = (77 / 10) + 1 = 8
  5. Apply the distribution function to 26. bucketNumber = (26 / 10) + 1 = 3
  6. Apply the distribution function to 98. bucketNumber = (98 / 10) + 1 = 10
  7. Apply the distribution function to 55. bucketNumber = (55 / 10) + 1 = 6

Our buckets will be filled now:

1 : [2, 4]
2 : []
3 : [26]
4 : []
5 : []
6 : [55, 56]
7 : []
8 : [77]
9 : []
10 : [98]

We can choose to insert the elements in every bucket in order, or sort every bucket after distributing all the elements.

Put the elements back in the list

Finally we go through all the buckets and put the elements back in the list:

[2, 4, 26, 55, 56, 77, 98]

Swift implementation

Here is a diagram that shows the functions, data structures and protocols for our bucker sort implementation:

classes

Main function

bucketSort() is a generic function that can apply the algorithm to any element of type T, as long as T is Sortable.

public func bucketSort<T:Sortable>(elements: [T],
                                distributor: Distributor,
                                     sorter: Sorter,
                                    buckets: [Bucket<T>]) -> [T] {
	precondition(allPositiveNumbers(elements))
	precondition(enoughSpaceInBuckets(buckets, elements: elements))

	var bucketsCopy = buckets
	for elem in elements {
		distributor.distribute(elem, buckets: &bucketsCopy)
	}

	var results = [T]()

	for bucket in bucketsCopy {
		results += bucket.sort(sorter)
	}

	return results
}

Bucket

public struct Bucket<T:Sortable> {
	var elements: [T]
	let capacity: Int

	public init(capacity: Int) {
		self.capacity = capacity
		elements = [T]()
	}

	public mutating func add(item: T) {
		if (elements.count < capacity) {
			elements.append(item)
		}
	}

	public func sort(algorithm: Sorter) -> [T] {
		return algorithm.sort(elements)
	}
}

Sortable

public protocol Sortable: IntConvertible, Comparable {
}

IntConvertible

The algorithm is designed to sort integers, so all the elements to be sorted should be mapped to an integer value.

public protocol IntConvertible {
	func toInt() -> Int
}

Sorter

public protocol Sorter {
	func sort<T:Sortable>(items: [T]) -> [T]
}

Distributor

public protocol Distributor {
	func distribute<T:Sortable>(element: T, inout buckets: [Bucket<T>])
}

Custom Sorter and Distributor

The current implementation make use of the following implementations for Sorter and Distributor.

Sorter

public struct InsertionSorter: Sorter {
	public func sort<T:Sortable>(items: [T]) -> [T] {
		var results = items
		for i in 0 ..< results.count {
			var j = i
			while ( j > 0 && results[j-1] > results[j]) {

				let auxiliar = results[j-1]
				results[j-1] = results[j]
				results[j] = auxiliar

				j -= 1
			}
		}
		return results
	}
}

Distributor

/*
 * An example of a simple distribution function that send every elements to
 * the bucket representing the range in which it fits.
 *
 * If the range of values to sort is 0..<49 i.e, there could be 5 buckets of capacity = 10
 * So every element will be classified by the ranges:
 *
 * -  0 ..< 10
 * - 10 ..< 20
 * - 20 ..< 30
 * - 30 ..< 40
 * - 40 ..< 50
 *
 * By following the formula: element / capacity = #ofBucket
 */
public struct RangeDistributor: Distributor {
	 public func distribute<T:Sortable>(element: T, inout buckets: [Bucket<T>]) {
	 let value = element.toInt()
	 let bucketCapacity = buckets.first!.capacity

	 let bucketIndex = value / bucketCapacity
	 buckets[bucketIndex].add(element)
	}
}

Make your own version

By reusing this code and implementing your own Sorter and Distributor you can experiment with different versions.

Other variations of Bucket Sort

The following are some of the variation to the general Bucket Sort implemented here:

Written for Swift Algorithm Club by Barbara Rodeker. Images from Wikipedia.