Segment Tree + Coordinate Compression: A love story or overkill?

Let’s begin with some context: I learned segment trees in college, but I never practiced them thoroughly, and that cost me a couple of questions in recent Codeforces rounds. So I wanted to understand the segment tree in more depth: its flexibility, and how I can use it to solve questions.

If you don’t know the segment tree, I am attaching resources for you to refer to.

I had, or perhaps still have, a bad habit of solving range query problems via the online approach rather than the offline approach.

Online Approach
Process the queries one by one, in the order in which they are given.
Offline Approach
Reorder the queries intelligently so that they can be processed more efficiently.

That said, I was able to solve a few questions via the online approach using another technique, which I will share in a later article. But while going through other people’s solutions, I realised that solving questions via the offline approach can be really easy, intuitive and efficient.

Let’s take one such question as an example: Enemy is Weak. It can be reduced to the following problem.

Given an array of n elements, find the number of triplets (i, j, k) such that i < j < k and a_i > a_j > a_k, where a_x is the power of the man standing at position x.

Constraints: 3 ≤ n ≤ 10^6, 1 ≤ i ≤ n, 1 ≤ a_i ≤ 10^9

My approach to solve this problem was to find:

For each 1 <= i <= n, calculate the number of indices j < i such that a[j] > a[i], and the number of indices k > i such that a[i] > a[k]. Multiply the two to get the number of triplets with i in the middle, and return the sum over all i.

Now the problem is reduced to the following steps.

  1. For each index i, find the number of indices j such that j < i and a[j] > a[i], and store it in a vector (say X).

  2. For each index i, find the number of indices k such that k > i and a[k] < a[i], and store it in a vector (say Y).

  3. For each index i, multiply the two values, i.e., X[i]*Y[i], and sum the products to get the total number of triplets.

The 3rd step is pretty straightforward; the problem is the first two steps, which we have to compute efficiently.

From this point onwards, I am going to explain how we can compute the values for the first step. The second step is very similar, so I leave it to you, the readers, as an exercise.

For each index i, find the number of indices j such that j<i & a[j]>a[i]

For a moment, let’s forget about the constraint and assume 1 <= a[i] <= 10^6 and that all the elements are distinct, i.e., vector a is a permutation of 1..n. Now it’s possible to create a vector of length 10^6, right?

With this assumption, we can solve the problem in the following way.

  1. Initialise a segment tree of length n, filled with zeros, that supports range-sum queries and point updates.

  2. Iterate over the array from left to right.

  3. For each i,

    1. Query the segment tree over the range [a[i]+1, n] and store this value.

    2. Set the segment tree’s value at index a[i] to 1.

Simple, right? But how can we do the same thing with 1 ≤ a[i] ≤ 10^9?

Observation

We can see that n ≤ 10^6 in the constraints, which means the total number of elements is within the limit: each a[i] can be up to 10^9, but there can be at most 10^6 of them.

So, what if we create a 1:1 mapping of all the a[i] values into the range [1, 10^6] while maintaining their relative order? Can we then simulate the same algorithm described above and get the correct answer?

Well, the answer is obviously yes; I just want you folks to think about why.

Coordinate Compression
In layman’s terms, it means creating a 1:1 mapping of large values to smaller values while maintaining their relative order.

How to create the mapping?

Easy peasy,

Sort the original array.

Yeah, that’s it. Each value’s index in the sorted array is its compressed coordinate.

WARNING: SPOILERS AHEAD, if you understood up till now, try solving on your own first.

If you are still confused refer to the below code.

    int n;
    cin >> n;
    vector<int> at(n);
    for (int i = 0; i < n; i++) cin >> at[i];
    vector<int> bt = at;            // sorted copy used for coordinate compression
    sort(bt.begin(), bt.end());
    vector<int> a(n, 0);
    SegmentTree seg_tree(a);        // range-sum tree with point updates
    vector<long long> left(n), right(n);
    for (int i = 0; i < n; i++) {
        // compressed index = position of at[i] in the sorted array
        int req = lower_bound(bt.begin(), bt.end(), at[i]) - bt.begin();
        seg_tree.update(req, 1);
        // elements inserted so far that are larger than at[i]
        long long val = seg_tree.query(req + 1, n - 1);
        left[i] = val;
        // right-side count without a second tree: of the req elements smaller
        // than at[i], (i - val) lie on the left, so the rest lie on the right
        right[i] = req - (i - val);
    }
    long long total = 0;            // the products can overflow int, so use long long
    for (int i = 0; i < n; i++) total += left[i] * right[i];
    cout << total << endl;

Why can it be overkill?

Segment Tree + Coordinate Compression is a great combo, and for me, writing C++ and compiling with Clang, there is no alternative as of now. But if you are using the GCC compiler, you can use a Policy Based Data Structure (PBDS) to solve this question relatively easily, without a segment tree.

If you don’t know about Policy Based Data Structures, read this article first.

How exactly is a Policy Based Data Structure helpful here?

Again, let’s just think about step 1 of the solution, i.e., for each index i, find the number of indices j such that j < i and a[j] > a[i].

If we iterate from left to right and insert each a[i] into an ordered set, then during the iteration for index i, if we can find the position of a[i] in the set, can’t we also find the number of elements greater than a[i] among those encountered so far?

And isn’t that the same as the number of indices j such that j < i and a[j] > a[i]? It is.

Yes, it is that simple. And if it is that simple, then why should we bother implementing a segment tree, doing coordinate compression, and pulling our hair out?

Well, even I don’t know, that’s why I called it an overkill.

Ordered Set
The elements of the set are in sorted order.

Now, some of you might be thinking that something is missing, as I haven’t spoken about the usage of the PBDS yet. And yes, a piece of the puzzle is still missing.

Finding the position of a[i] in an ordered set (std::set) requires iterating over the set, which takes O(n) and makes the whole algorithm O(n^2).

And here the PBDS comes to our rescue: it provides a way to find the number of elements smaller than a given value in the ordered set in logarithmic time, via the order_of_key() function, which makes the total time O(n*log(n)).

#include <bits/stdc++.h>
using namespace std;
#define ll long long

// for Policy based data structure we include these files
#include <ext/pb_ds/assoc_container.hpp> // Common file
#include <ext/pb_ds/tree_policy.hpp> 
#include <ext/pb_ds/detail/standard_policies.hpp>
using namespace __gnu_pbds;
typedef tree<int, null_type, less<int>, rb_tree_tag, tree_order_statistics_node_update> ordered_set;

signed main() {
    int n;
    cin>>n;
    vector<ll> a(n);
    for(int i=0;i<n;i++) cin>>a[i];
    ll total=0ll;
    ordered_set st;
    ordered_set bst;
    vector<ll> left(n);
    vector<ll> right(n);
    // Step 1
    for(int i=0;i<n;i++){
        st.insert(a[i]);
        ll noOfElementsSmaller = st.order_of_key(a[i]);
        ll noOfElementsLargerLeftSide = i - noOfElementsSmaller;
        left[i]=noOfElementsLargerLeftSide;
    }
    // Step 2
    for(int i=n-1;i>=0;i--){
        bst.insert(a[i]);
        ll noOfElementsSmaller = bst.order_of_key(a[i]);
        right[i]=noOfElementsSmaller;
    }
    // Step 3
    for(int i=0;i<n;i++) total+= (left[i]*right[i]);
    cout<<total<<endl;
}

Conclusion

Segment Tree + Coordinate Compression is an excellent combo and helps in solving a lot of problems, but with data structures like the PBDS available, using it everywhere may be overkill.

Also, the Policy based data structure is more intuitive than the segment tree, in my humble opinion.

Resources