Selection
Selection algorithms enable us to efficiently find the k-th smallest or largest element in a collection. These algorithms are widely used in statistics, data analysis, and as subroutines in more complex algorithms.
What is Selection?
Selection refers to the process of finding the k-th smallest (or largest) element in an unsorted array or list. For example, finding the median (the middle value) is a classic selection problem. Selection is different from sorting: it focuses on finding a specific element rather than ordering the entire collection.
Selection by Sorting
The simplest way to solve a selection problem is to sort the array and then pick the k-th element.
- Time Complexity: O(n log n) due to sorting.
- Space Complexity: O(1) if sorting in-place.
- Pros: Simple to implement.
- Cons: Inefficient for large datasets when only one element is needed.
Partition-based Selection (Quickselect)
Quickselect is an efficient selection algorithm based on the partitioning logic of Quicksort. It works by recursively partitioning the array and narrowing down the search space.
- Choose a pivot from the array
- Partition the array so that A[low ... pivotpoint - 1] \(\le\) pivotpoint \(\le\) A[pivotpoint + 1 ... high]
- After partitioning if the pivot is at index k, return the pivot
- Else if pivot index > k: Repeat the process on left partition
fun partition(arr: IntArray, l: Int, r: Int, p: Int){
val pivotVal = arr[p]
arr.swap(p, r)
var idx = l
for(i in l until r) {
if(arr[i] < pivotVal) {
arr.swap(idx, i)
idx++
}
}
arr.swap(r, idx)
return idx
}
fun quickSelect(arr: IntArray, k: Int, l: Int, r: Int) {
if(l == r) return arr[l]
val pivot = Random.nextInt(l, r + 1)
val pivotIndex = partition(arr, l, r, pivot)
return when{
k == pivotIndex -> arr[k]
k < pivotIndex -> quickSelect(arr, k, l, pivotIndex - 1)
else -> quickSelect(arr, k, pivotIndex + 1, r)
}
}
- Average Time Complexity: O(n)
- Worst-case Time Complexity: O(n²) (rare)
- Space Complexity: O(n) (due to recursion and partitioning)
- Pros: Fast for most practical cases.
- Cons: Worst-case performance can be poor without careful pivot selection.
Linear Selection—Median of Medians
The Median of Medians algorithm guarantees linear time selection, even in the worst case. It works by recursively finding good pivots:
- Divide the array into groups of five.
- Find the median of each group (by sorting).
- Recursively find the median of these medians.
- Use this median as the pivot for partitioning.
fun linearSelection(arr: IntArray, k: Int) {
if(arr.size <= 5)
return arr.sorted()[k - 1]
// Divide into groups of 5
val medians = mutableListOf<Int>()
var i = 0
while(i < arr.size) {
val group = arr.slice(i until minOf(i + 5, arr.size)).toMutableList()
group.sort()
medians.add(group[group.size / 2])
i += 5
}
// Find median of medians
val pivot = linearSelect(medians, (medians.size + 1) / 2)
val lows = arr.filter { it < pivot }
val highs = arr.filter { it > pivot }
val pivots = arr.filter { it == pivot }
return when {
k <= lows.size -> linearSelect(lows, k)
k <= lows.size + pivots.size -> pivot
else -> linearSelect(highs, k - lows.size - pivots.size)
}
}
- Time Complexity: O(n)
- Space Complexity: O(n)
- Pros: Worst-case linear time.
- Cons: More complex to implement; higher constant factors than Quickselect.
Applications of Selection Algorithms
- Finding the Median: Useful in statistics and data analysis.
- Order Statistics: Finding the k-th largest/smallest value.
- Top-k Problems: E.g., finding the top k scores in a leaderboard.
- Data Stream Processing: Efficiently maintaining medians or percentiles.
- Subroutines in Other Algorithms: Used in algorithms like QuickSort, clustering, and more.
Comparison and Complexity Analysis
| Algorithm | Average Time | Worst-case Time | Space |
|---|---|---|---|
| Selection by Sorting | O(n log n) | O(n log n) | O(1) |
| Quickselect | O(n) | O(n²) | O(n) |
| Median of Medians | O(n) | O(n) | O(n) |
- For most practical cases, Quickselect is preferred due to its speed and simplicity.
- Median of Medians is used when worst-case guarantees are required.
Practical Tips
- For small arrays, sorting is often sufficient.
- For large datasets, use Quickselect or Median of Medians.
- Be aware of worst-case scenarios in Quickselect (e.g., adversarial input).
Explore more selection problems and solutions in the problems section.