Remove Duplicates from Sorted Array#
Problem#
Given an integer array nums
sorted in non-decreasing order, remove the
duplicates in-place
such that each unique element appears only once. The relative order of the
elements should be kept the same. Then return the number of unique elements in
nums
.
Consider the number of unique elements of nums
to be k
, to get accepted, you
need to do the following things:
Change the array
nums
such that the firstk
elements ofnums
contain the unique elements in the order they were present innums
initially. The remaining elements ofnums
are not important as well as the size ofnums
.Return
k
.
Intuition#
If we are allowed to use extra memory, we can use a hash table to store the unique elements in the array. Then, we can return the length of the hash table as the length of the array without duplicates. But since we are not allowed to use extra memory, we need to find a way to do this in-place. Usually this kind of pattern is solved using the two-pointer technique.
The two-pointer algorithm in question can be intuitively understood as a “runner and walker” or “fast and slow” strategy.
Let’s say you’re tasked with removing duplicates from a sorted array of numbers. Imagine these numbers are placed on a straight pathway, sorted from low to high, and duplicates are grouped together. You have two markers or pointers – the “fast” one and the “slow” one.
You start both pointers at the beginning of the pathway (the first element of the array). The fast pointer races through the pathway, visiting every number, while the slow one walks, taking its time. Whenever the fast pointer finds a new number (which is not a duplicate, since duplicates are grouped together), it signals the slow pointer to move one step forward and copy this new number.
So the slow pointer is building a collection of unique numbers in-place (i.e., using the same array), because it only moves and copies when the fast pointer signals that it has found a new unique number. The fast pointer, on the other hand, is doing the job of scanning through each element in the array.
By the time the fast pointer reaches the end of the array, the slow pointer would have gone through all unique elements, replacing duplicates with new unique elements in the original array. The position of the slow pointer (plus one, since array indexing typically starts from zero) will give the length of the array without duplicates.
This is the intuition behind the algorithm. It uses two pointers at different speeds to efficiently scan and modify the array in-place. The proof shows that this approach is sound and will give us the expected outcome.
Solution (Two-Pointers Moving In The Same Direction)#
Consider an array \(\mathcal{A}\) sorted in non-decreasing order. We seek to remove the duplicates in-place and return the new length of the array.
Intuition#
TODO.
Visualisation#
See visualization here.
Algorithm#
We use the two-pointer technique where we initiate two pointers, slow
and
fast
. The slow
pointer represents the location at which we want to place the
next unique element we find as we scan through the array. The fast
pointer
scans through the array.
Algorithm 11 (Two-Pointers)
Let’s define the following variables:
\(i\): as the slow pointer;
\(j\): as the fast pointer.
The algorithm is as follows:
Start with two pointers at the beginning of the array (index \(i\)) and one at the same location (index \(j\)).
Compare the elements at indices \(i\) and \(j\) in array \(\mathcal{A}\), i.e., check if \(\mathcal{A}[i] = \mathcal{A}[j]\).
If the elements are not equal (\(\mathcal{A}[i] \neq \mathcal{A}[j]\)), increment \(i\) (i.e., \(i = i + 1\)), and replace the element at index \(i\) with the element at \(j\) (i.e., \(\mathcal{A}[i] = \mathcal{A}[j]\)).
If the elements are equal (\(\mathcal{A}[i] = \mathcal{A}[j]\)), leave \(i\) as it is and only move \(j\) to the next position (i.e., \(j = j + 1\)).
Repeat steps 2-4 until \(j\) has scanned the entire array, i.e., until \(j = n\), where \(n\) is the length of the array.
At this point, \(i + 1\) gives the length of the array with all unique elements, and the elements from \(\mathcal{A}[0]\) to \(\mathcal{A}[i]\) are all unique elements.
In code like syntax, it would look like this:
Algorithm 12 (Two-Pointers (Code))
Start with two pointers, one at the beginning of the array (index
slow
) and one at the same location (indexfast
).Compare the elements at indices
slow
andfast
.If the elements are not equal, increment
slow
, replace the element at indexslow
with the element atfast
.If the elements are equal, leave
slow
as it is and only movefast
to the next position.Repeat steps 2-4 until
fast
has scanned the entire array.
Claim#
We first make a claim and proceed to prove it.
Theorem 73 (Claim)
Claim: Given an array \(\mathcal{A}\) sorted in non-decreasing order, the two-pointer algorithm correctly removes all duplicates from \(\mathcal{A}\) in-place, such that each element appears only once, and returns the length of the new array.
Proof#
We make a formal mathematical proof using the property of sorted arrays. The algorithm mentioned is often referred to as the “two-pointer” method, and it works efficiently on sorted arrays due to their inherent order.
Proof. Let’s denote the array at the beginning as \(\mathcal{A}\) of length \(n\). Let \(j\) be the fast pointer and \(i\) be the slow pointer. Both are indices of \(\mathcal{A}\), such that \(0 \leq i, j < n\).
Our goal is to prove that for all \(k\) where \(0 \leq k \leq i\), the elements in the subarray \(\mathcal{A}[0,...,k]\) are unique and sorted.
We will use induction on \(j\).
Base Case: When \(j = 0\), \(i = 0\) and the array consists of only one element. It is sorted and contains no duplicate elements. Therefore, the claim holds for the base case.
Inductive Step: Let’s assume that the claim is true for some \(j = k\). That is, the subarray \(\mathcal{A}[0,...,i]\) contains unique and sorted elements for some \(i \leq k\).
Now, let’s consider \(j = k + 1\). There are two possibilities:
\(\mathcal{A}[j] = \mathcal{A}[i]\): The element at index \(j\) is a duplicate of the element at index \(i\). The algorithm doesn’t increment \(i\), and the invariant still holds because the unique and sorted subarray hasn’t changed.
\(\mathcal{A}[j] \neq \mathcal{A}[i]\): The element at index \(j\) is a new unique element. The algorithm increments \(i\) and replaces \(\mathcal{A}[i]\) with \(\mathcal{A}[j]\). This maintains the invariant, as the subarray up to index \(i\) remains sorted and unique.
By induction, we can conclude that the algorithm maintains the invariant at every step, and hence the final array up to index \(i\) consists of unique and sorted elements.
The space complexity is \(\mathcal{O}(1)\), since we didn’t use any extra space in the process, which confirms the in-place nature of the algorithm.
Thus, our claim is proven.
The induction only involves \(j\) to avoid complexity as considering both \(i\) and \(j\) would make the proof more complex. However, the proof is still valid.
We give an informal proof of the claim below, which may be more intuitive.
In the sorted array, all duplicate elements are contiguous. When the fast
pointer encounters a new unique element (i.e., nums[fast] != nums[slow]
), we
increment slow
and replace nums[slow]
with nums[fast]
. Thus, we always
keep the first occurrence of each element, and duplicates are overwritten by new
unique elements.
By the end of the algorithm, slow
is at the last unique element’s index. So
the length of the array with all unique elements is slow + 1
.
The proof follows from the correctness of the algorithm, which is guaranteed by
the loop invariant: at the
start of each iteration of the loop, the array up to index slow
consists of
sorted unique elements of the array, and fast
is the index of the element to
compare with nums[slow]
. This invariant is clearly true at the start, and each
iteration of the loop maintains it, so when the loop terminates, the array up to
index slow
contains all unique elements.
The space complexity is \(\mathcal{O}(1)\) since no additional space is used. Thus, the algorithm meets the in-place requirement. Therefore, the algorithm is correct.
Implementation#
1from typing import List
2
3class Solution:
4 def removeDuplicates(self, nums: List[int]) -> int:
5 slow: int = 0
6
7 for fast in range(len(nums)):
8 if nums[fast] != nums[slow]:
9 slow += 1
10 nums[slow] = nums[fast]
11 return slow + 1
Tests#
1nums: List[int] = [1, 1, 1]
2compare_test_case(
3 actual=Solution().removeDuplicates(nums=nums),
4 expected=1,
5 description="test all duplicates",
6)