(venv) ➜ test mlx_lm.generate --model lmstudio-community/Qwen2.5-Coder-32B-Instruct-MLX-4bit --prompt "Write a quicksort algorithm" --draft-model mlx-community/Qwen2.5-0.5B-Instruct-4bit -m 1000 --temp 0
==========
Certainly! Quicksort is a popular and efficient sorting algorithm that uses a divide-and-conquer approach to sort elements. Below is a simple implementation of the Quicksort algorithm in Python:
```python
def quicksort(arr):
    if len(arr) <= 1:
        return arr
    else:
        pivot = arr[len(arr) // 2]  # Choose the middle element as the pivot
        left = [x for x in arr if x < pivot]  # Elements less than the pivot
        middle = [x for x in arr if x == pivot]  # Elements equal to the pivot
        right = [x for x in arr if x > pivot]  # Elements greater than the pivot
        return quicksort(left) + middle + quicksort(right)

# Example usage:
arr = [3, 6, 8, 10, 1, 2, 1]
sorted_arr = quicksort(arr)
print("Sorted array:", sorted_arr)
```
Explanation:
1. Base Case: If the array has 0 or 1 element, it is already sorted, so we return it as is.
2. Pivot Selection: We choose the middle element of the array as the pivot.
3. Partitioning: We create three lists:
   - `left` for elements less than the pivot.
   - `middle` for elements equal to the pivot.
   - `right` for elements greater than the pivot.
4. Recursive Sorting: We recursively apply the `quicksort` function to the `left` and `right` lists and concatenate the results with the `middle` list.
This implementation is simple and easy to understand, but it may not be the most efficient in terms of space complexity due to the use of additional lists. For an in-place version, you can modify the algorithm to swap elements within the original array. Here's an in-place version:
```python
def quicksort_inplace(arr, low, high):
    if low < high:
        pi = partition(arr, low, high)  # Partitioning index
        quicksort_inplace(arr, low, pi - 1)  # Sort left part
        quicksort_inplace(arr, pi + 1, high)  # Sort right part

def partition(arr, low, high):
    pivot = arr[high]  # Choose the last element as the pivot
    i = low - 1  # Index of smaller element
    for j in range(low, high):
        if arr[j] <= pivot:
            i += 1
            arr[i], arr[j] = arr[j], arr[i]  # Swap
    arr[i + 1], arr[high] = arr[high], arr[i + 1]  # Swap pivot element
    return i + 1

# Example usage:
arr = [3, 6, 8, 10, 1, 2, 1]
quicksort_inplace(arr, 0, len(arr) - 1)
print("Sorted array:", arr)
```
In this in-place version, the partition function rearranges the elements in the array such that elements less than the pivot are on the left, elements greater than the pivot are on the right, and the pivot is in its correct position. The quicksort_inplace function then recursively sorts the subarrays.
==========
Sure, here's a simple implementation of the quicksort algorithm in Python:
```python
def quicksort(arr):
    # Base case: if the array is empty or has one element, it's already sorted
    if len(arr) <= 1:
        return arr
    # Choose a pivot element
    pivot = arr[len(arr) // 2]
    # Partition the array into two sub-arrays: elements less than or equal to the pivot and elements greater than or equal to the pivot
    less_than_pivot = [x for x in arr if x <= pivot]
    greater_than_pivot = [x for x in arr if x > pivot]
    # Recursively sort the two sub-arrays
    quicksort(less_than_pivot)
    quicksort(greater_than_pivot)
    # Merge the sorted sub-arrays
    return less_than_pivot + [pivot] + greater_than_pivot
```
This function takes an array as input and returns a new array sorted in ascending order. It uses a simple partitioning strategy: it selects a pivot element and partitions the array into two sub-arrays: all elements less than or equal to the pivot and all elements greater than or equal to the pivot. The function then recursively sorts the two sub-arrays and merges them to form the sorted array.
I ran a couple of benchmarks on an M3 Max and an M2 Ultra. As expected, the big model scales much better with sequence length on the M3 Max than on the M2 Ultra. This probably explains why we are seeing little to no performance improvement on the M2 Ultra.
In the figure below you can see how processing time changes as the sequence length increases. For the best possible speedup with speculative generation, you want the line to be as flat as possible.
On the optimistic side, from conversations with @angeloskath and @barronalex, there is likely room to improve small-batch qmm, which should help this use case considerably.
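To make the "flat line" point concrete, here is a rough back-of-the-envelope model of speculative decoding. It is only a sketch under stated assumptions: the acceptance rate, the number of draft tokens, and `verify_cost_ratio` (how much longer the big model takes to process several tokens in one pass than to process one) are hypothetical values, not measurements. Only the two throughput figures are taken from the logs in this issue. The expected-tokens formula assumes each draft token is accepted independently with the same probability.

```python
def expected_tokens_per_step(accept_rate: float, num_draft: int) -> float:
    """Expected tokens emitted per verification step.

    Assumes every draft token is accepted independently with probability
    accept_rate (a simplifying assumption, not a measured property).
    """
    if accept_rate >= 1.0:
        return num_draft + 1.0
    return (1.0 - accept_rate ** (num_draft + 1)) / (1.0 - accept_rate)


def estimated_speedup(t_target: float, t_draft: float, accept_rate: float,
                      num_draft: int, verify_cost_ratio: float) -> float:
    """Estimated speedup of speculative decoding over plain decoding.

    t_target: seconds per token for the big model decoding on its own
    t_draft: seconds per token for the draft model
    verify_cost_ratio: time for the big model to process num_draft + 1 tokens
        in one forward pass, divided by t_target (1.0 means a perfectly flat line)
    """
    step_time = num_draft * t_draft + verify_cost_ratio * t_target
    return expected_tokens_per_step(accept_rate, num_draft) * t_target / step_time


# Per-token times derived from the throughput numbers reported in this issue.
t_target = 1 / 29.803   # 32B model without speculative decoding
t_draft = 1 / 284.647   # 0.5B draft model on its own

# Hypothetical inputs: 80% acceptance, 3 draft tokens per step.
flat = estimated_speedup(t_target, t_draft, accept_rate=0.8, num_draft=3,
                         verify_cost_ratio=1.0)
steep = estimated_speedup(t_target, t_draft, accept_rate=0.8, num_draft=3,
                          verify_cost_ratio=2.5)

print(f"flat scaling:  {flat:.2f}x")
print(f"steep scaling: {steep:.2f}x")
```

With these made-up inputs the flat case comes out above 2x while the steep case lands close to 1x, which is the shape of the difference between the M3 and M2 Ultra results reported here.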
Speculative decoding does not seem to improve generation speed as expected on an M2 Ultra Mac Studio (128GB).
Main model: https://huggingface.co/lmstudio-community/Qwen2.5-Coder-32B-Instruct-MLX-4bit
Draft model: https://huggingface.co/lmstudio-community/Qwen2.5-Coder-0.5B-Instruct-MLX-4bit or https://huggingface.co/mlx-community/Qwen2.5-0.5B-Instruct-4bit
Prompt: "Write a quicksort algorithm"
Without spec decoding: 29.803 tokens-per-sec
With spec decoding: 29.051 tokens-per-sec
Qwen2.5-Coder-0.5B-Instruct-MLX-4Bit alone: 284.647 tokens-per-sec
In the same situation on an M3 Pro with 32GB of RAM, we see a large speedup (~7 tok/sec -> ~16 tok/sec).
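For reference, the numbers above can be turned into speedup ratios with a few lines of Python. This is just arithmetic on the figures quoted in this issue; the M3 Pro values are the approximate ones given above, and the `speedup` helper is a name introduced here for illustration.

```python
def speedup(baseline_tps: float, speculative_tps: float) -> float:
    """Ratio of speculative throughput to baseline throughput (>1 means faster)."""
    return speculative_tps / baseline_tps


m2_ultra = speedup(29.803, 29.051)  # exact figures from the M2 Ultra runs
m3_pro = speedup(7.0, 16.0)         # approximate figures quoted for the M3 Pro

print(f"M2 Ultra: {m2_ultra:.3f}x")
print(f"M3 Pro:   {m3_pro:.2f}x")
```

So on the M2 Ultra speculative decoding is slightly slower than the baseline (a ratio just under 1), while on the M3 Pro it is roughly 2.3x faster.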
Full logs:
(venv) ➜ test mlx_lm.generate --model lmstudio-community/Qwen2.5-Coder-32B-Instruct-MLX-4bit --prompt "Write a quicksort algorithm" --draft-model mlx-community/Qwen2.5-0.5B-Instruct-4bit -m 1000 --temp 0
==========
Prompt: 34 tokens, 71.386 tokens-per-sec
Generation: 709 tokens, 29.051 tokens-per-sec
Peak memory: 18.932 GB
(venv) ➜ test mlx_lm.generate --model lmstudio-community/Qwen2.5-Coder-32B-Instruct-MLX-4bit --prompt "Write a quicksort algorithm" -m 1000 --temp 0
==========
Prompt: 34 tokens, 75.790 tokens-per-sec
Generation: 709 tokens, 29.803 tokens-per-sec
Peak memory: 18.643 GB
(venv) ➜ test mlx_lm.generate --model lmstudio-community/Qwen2.5-Coder-0.5B-Instruct-MLX-4bit --prompt "Write a quicksort algorithm" -m 1000 --temp 0
==========
Prompt: 34 tokens, 683.032 tokens-per-sec
Generation: 276 tokens, 284.647 tokens-per-sec
Peak memory: 0.299 GB
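If you want to compare runs like these programmatically, a small parser for the summary lines is enough. The sketch below assumes the `Prompt:` / `Generation:` line format shown in the logs above; `parse_stats` is a hypothetical helper written for this example, not part of `mlx_lm`.

```python
import re

# Matches lines such as "Generation: 709 tokens, 29.051 tokens-per-sec".
STAT_RE = re.compile(r"^(Prompt|Generation): (\d+) tokens, ([\d.]+) tokens-per-sec$")


def parse_stats(log: str) -> dict:
    """Extract token counts, throughput, and derived wall-clock time per phase."""
    stats = {}
    for line in log.splitlines():
        match = STAT_RE.match(line.strip())
        if match:
            phase = match.group(1).lower()
            tokens = int(match.group(2))
            tps = float(match.group(3))
            stats[phase] = {"tokens": tokens, "tps": tps, "seconds": tokens / tps}
    return stats


# Summary lines copied from the speculative-decoding run above.
log = """Prompt: 34 tokens, 71.386 tokens-per-sec
Generation: 709 tokens, 29.051 tokens-per-sec
Peak memory: 18.932 GB"""

stats = parse_stats(log)
print(f"generation took about {stats['generation']['seconds']:.1f} s")
```

Lines that do not match the pattern, such as `Peak memory`, are simply ignored.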