Monday, May 20, 2013

Finding distinct elements in an array


For example, consider an array of ten elements {4, 7, 3, 6, 4, 9, 6, 2, 9, 0}. The distinct elements are {4, 7, 3, 6, 9, 2, 0}. The element ordering in the output need not be maintained.

A simple approach is to sort the input array and find the distinct elements by comparing the adjacent elements. The time complexity is O(nlogn).

An efficient approach is to use a hash set/unordered set. A hash set is a data structure which holds distinct elements. It is an unordered collection of elements which provides insertion and look up operations in O(1) time. It uses a hashing algorithm to distribute the elements into buckets.

Read each of the input elements into a hash set. Once the entire array is read, just iterate over the hash set to get the distinct elements. The time complexity of this approach is O(n) as we are going over all the array elements exactly once.

Here is the C++ code:

int input_array[ARRAYSIZE] = {4, 7, 3, 6, 4, 9, 6, 2, 9, 0};

void RemoveDuplicates(const int* input_array) {
  unordered_set<int> input_distinct;
  for(int i = 0; i < ARRAYSIZE; ++i) {
    input_distinct.insert(input_array[i]);
  }
  unordered_set<int>::const_iterator iter;
  for(iter = input_distinct.begin(); iter != input_distinct.end(); ++iter) {
    cout << *iter << " ";
  }
}

Invoke the function as: RemoveDuplicates(input_array);

No comments:

Post a Comment