I save my employer millions using techniques I learned in the QuantNet Advanced C++ course. AMA

I recently completed the QuantNet Advanced C++ course and shared my testimonial here. Several members have expressed interest in how I apply specific modern techniques in my job. I hope this will spur more interest in the course and show some things you can do to reduce latency and running costs, which at scale turn into huge savings.

I work for a large advertising business on the auction side. We receive a stream of bids from advertisers and have to show ads that help advertisers and creators while not being overly annoying to users.

The business operates at giant scale, with hundreds of billions of ad queries daily. The latency requirement is not quite as crazy as HFT, but ads still have to be served within 1 second, and within that 1 second a lot of memory, CPU, and GPU are used. Because of that scale, every tiny saving of just milliseconds gets multiplied by hundreds of billions of queries.

We have a system that calculates several statistics for each ad candidate based on the history of ads the user has seen in the past (to avoid repeating ads). A lot of these counts are written as plain for loops that were sufficient when the system did not include heavy ML models but are now a significant latency hog. I transformed the for loops into ranges-based code and devised a thread-based scheme to calculate these stats in parallel (the stats are independent across candidates). Doing so reduced latency, which saved several SWE-years for the company and freed up milliseconds for another, more resource-intensive project to launch.
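To give a flavor of the shape of the change (this is an illustrative sketch with made-up names like AdCandidate and repeat_count, not our production code):

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical candidate type: each candidate carries the ad IDs from the
// user's history that are relevant to it.
struct AdCandidate {
    int ad_id{};
    std::vector<int> seen_ad_ids;
};

// One independent statistic per candidate, ranges-based instead of a raw loop:
// how often this ad already appeared in the user's history.
long repeat_count(const AdCandidate& c) {
    return std::ranges::count(c.seen_ad_ids, c.ad_id);
}

// Because the stats are independent across candidates, chunks of candidates
// can be processed on separate threads and stitched together at the end.
std::vector<long> repeat_counts_parallel(const std::vector<AdCandidate>& cands,
                                         std::size_t num_tasks = 4) {
    if (num_tasks == 0) num_tasks = 1;
    std::vector<long> out(cands.size());
    std::vector<std::future<void>> tasks;
    const std::size_t chunk = (cands.size() + num_tasks - 1) / num_tasks;

    for (std::size_t begin = 0; begin < cands.size(); begin += chunk) {
        const std::size_t end = std::min(begin + chunk, cands.size());
        tasks.push_back(std::async(std::launch::async, [&, begin, end] {
            for (std::size_t i = begin; i < end; ++i)
                out[i] = repeat_count(cands[i]);   // each slot written by one thread
        }));
    }
    for (auto& t : tasks) t.get();   // wait for all workers
    return out;
}
```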

There are a lot of smaller savings as well. One time, I literally just changed a struct copy to a reference; because that struct is so big and so widely used, that alone was a huge saving. Another time I changed a map key from a concatenated string to a tuple. The savings are easy to monitor because the company does an excellent job tracking resource usage, down to every individual function call. Now that I know all these excellent C++ features, I expect many more savings in the future.
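As illustrations only (AdProfile, score, and bid_cache are invented names; the real code is different), those two changes look like this:

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// A hypothetical "big and widely used" struct.
struct AdProfile {
    std::string advertiser;
    std::vector<double> features;   // large payload
    // ... many more fields in reality
};

// Before: double score(AdProfile profile);   // copies the whole struct
// After:  borrow it instead, same call sites, no copy.
double score(const AdProfile& profile) {
    return profile.features.empty() ? 0.0 : profile.features.front();
}

// Before: the cache key was built by string concatenation,
//         e.g. campaign_id + "|" + locale + "|" + slot.
// After: a tuple key avoids building a temporary string on every lookup.
// std::map works out of the box because std::tuple has operator<;
// std::unordered_map would additionally need a custom hash.
using CacheKey = std::tuple<std::int64_t /*campaign*/,
                            std::string  /*locale*/,
                            int          /*slot*/>;

std::map<CacheKey, double> bid_cache;

void record_bid() {
    bid_cache[{1234, "en_US", 2}] = 0.42;   // no concatenation needed
}
```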

I hope the examples above illustrate the many ways one can leverage modern C++ techniques to improve legacy code in significant ways. Even outside of quant, learning C++ optimization techniques is still extremely useful, especially for large-scale systems that perform a lot of small tasks.

I'm happy to answer any questions you have about these features or other parts of my work.
 
Thank you for sharing your story @sonmanutd
It's great to learn how you are able to make a real impact in your work from what you learned in our courses. From reading this, it looks like even a small change to make the code more efficient can make a big difference the larger the code base gets.
And it also seems like you have only touched the tip of the iceberg.
Would love to hear more as you go through the system and find ways to optimize things.
 
Thank you for sharing your story @sonmanutd
It's great to learn how you are able to make a real impact in your work from what you learned in our courses. From reading this, it looks like even a small change to make the code more efficient can make a big difference the larger the code base gets.
And it also seems like you have only touched the tip of the iceberg.
Would love to hear more as you go through the system and find ways to optimize things.

These big advertising platforms tend to have great tools for tracking the efficiency of each piece of code. There are platforms showing how much a function contributes to latency/resource usage, so it is really easy to see which piece of code is inefficient. Opportunities for gains basically boil down to:

1. How familiar you are with the code base
2. How well tested the code you plan to change is
3. Whether the change would create any inconvenience for ongoing development

I am lucky that in my case I own the piece of code, so it is easier to institute change. As for number 2, if the code has a lot of unit tests, the owners of the code will be more comfortable allowing the change. Also, if a feature is still under development, we usually don't optimize the code right away, so the project owner can develop quickly.

There are a lot of things to consider, but I have often found my changes accepted in well-tested code that has been running in the system for a long time but is not yet up to date with the newest techniques. I made a lot of small gains just by converting inefficient loops into ranges.
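For example (made-up numbers and names, but representative of the kind of conversion), a helper that used to build a temporary vector can become a lazy pipeline:

```cpp
#include <ranges>
#include <vector>

// Before: copy the valid bids into a temporary vector, then sum them.
double sum_valid_old(const std::vector<double>& bids) {
    std::vector<double> valid;
    for (double b : bids)
        if (b > 0.0) valid.push_back(b);   // extra allocation and copies
    double total = 0.0;
    for (double b : valid) total += b;
    return total;
}

// After: a lazy ranges view, one pass, no temporary container.
double sum_valid(const std::vector<double>& bids) {
    double total = 0.0;
    for (double b : bids | std::views::filter([](double x) { return x > 0.0; }))
        total += b;
    return total;
}
```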
 
I need to figure out what the ranges-based approach is, because I was asked an array sorting question for a technical screening, and even though my multiple for loops would have worked for all cases, they wouldn't run in time for three of them (under two seconds).

This gives a great practical reason to move into the Advanced course when I'm able/finished with the first, thanks.

As for a question, what was your:
favorite/least favorite sections in the advanced course
hardest/easiest
most useful/most niche
 
I need to figure out what the ranges-based approach is, because I was asked an array sorting question for a technical screening, and even though my multiple for loops would have worked for all cases, they wouldn't run in time for three of them (under two seconds).
What was the technical interview question that you had? I don't think ranges actually help with time complexity; they just take advantage of certain aspects of the loop to make the code slightly more optimized. For array sorting, the issue is probably that your algorithm is not optimal. I am really curious about your screening question.

favorite/least favorite sections in the advanced course
I think Level 3 on Threads and Processes is great. It was especially helpful and powered much of the improvement I did. The template part is great and comprehensive too. Regex is the part I found quite tedious and too technical, in my opinion.

hardest/easiest
Exercise 7 of Module 7 was quite challenging and took me a bit of time to complete (though I liked the challenge).
No part felt really easy, in my opinion.

most useful/most niche
From the context of working in a big corporation, I would definitely say ranges and threads. There is so much room to make improvements, both big and small, using this knowledge. The most niche is probably concepts. It is quite interesting, and I have yet to grasp its full benefits. But the ability to classify types by what they can do and, I think, have the right method chosen at compile time thanks to templates (@Daniel Duffy, please correct me on this) means you save instructions and jumps, which adds up to a lot of time savings at scale.
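As a toy illustration of what I mean (invented names, not work code): a concept classifies candidate types by what they can do, and the compiler picks the right overload with no virtual call or runtime branch.

```cpp
#include <concepts>
#include <numeric>
#include <vector>

// A concept: does this candidate type already carry a precomputed score?
template <typename T>
concept HasCachedScore = requires(const T& t) {
    { t.cached_score() } -> std::convertible_to<double>;
};

struct SimpleCandidate {
    std::vector<double> signals;
};

struct CachedCandidate {
    double score = 0.0;
    double cached_score() const { return score; }
};

// Fallback: compute the score from raw signals.
template <typename C>
double score(const C& c) {
    return std::accumulate(c.signals.begin(), c.signals.end(), 0.0);
}

// More constrained overload: chosen at compile time whenever the concept
// holds, so the cached value is used with no recomputation.
template <HasCachedScore C>
double score(const C& c) {
    return c.cached_score();
}

int main() {
    SimpleCandidate a{{0.1, 0.2}};
    CachedCandidate b{0.3};
    double x = score(a);   // picks the accumulate version
    double y = score(b);   // picks the cached version
    (void)x; (void)y;
    return 0;
}
```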
 
What was the technical interview question that you had? I don't think ranges actually help with time complexity; they just take advantage of certain aspects of the loop to make the code slightly more optimized. For array sorting, the issue is probably that your algorithm is not optimal. I am really curious about your screening question.
I was given an array of positive integers and had to count the number of pairs. So less sorting than matching.
I believe my method was to take each element of the array and then loop through the rest of the array starting from that element (testing each new element with a boolean). If I found a match, I assigned the matched element the value -1, since I knew that wouldn't match anything else and mess me up later. Whether or not a pair was found, each new element was checked for a positive value and otherwise skipped. Pretty straightforward.

I did something like that anyway. The speed improvement I can think of right now would be to remove the element that paired with the current one as soon as I find it, so that I'm not looping through, checking, and then moving past a bunch of -1's. But I didn't think of a way to do that during the screening. Afterward I saw that you can just shift each subsequent element of the array back one position from that point on, but I would think that would take more time than my method anyway. It was a plain array, not an STL container, so I don't know if there is a function like remove() out there that I could have used.
 
I was given an array of positive integers and had to count the number of pairs. So less sorting than matching.
I believe my method was to take each element of the array and then loop through the rest of the array starting from that element (testing each new element with a boolean). If I found a match, I assigned the matched element the value -1, since I knew that wouldn't match anything else and mess me up later. Whether or not a pair was found, each new element was checked for a positive value and otherwise skipped. Pretty straightforward.

I did something like that anyway. The speed improvement I can think of right now would be to remove the element that paired with the current one as soon as I find it, so that I'm not looping through, checking, and then moving past a bunch of -1's. But I didn't think of a way to do that during the screening. Afterward I saw that you can just shift each subsequent element of the array back one position from that point on, but I would think that would take more time than my method anyway. It was a plain array, not an STL container, so I don't know if there is a function like remove() out there that I could have used.
Wouldn't that be O(N^2)?

You can count the occurrences of each number first and put them into a hash map.
For example {1: 5, 2: 6, 3: 8, 4: 1}, in which the key is the number and the value is the count.

Building this hash map is O(N).
Then for each number, you just divide its count by 2 to get the number of pairs.
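In code, a minimal version of this idea (assuming std::unordered_map for the counts) is:

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Count the pairs of equal values in the array: O(N) on average.
std::size_t count_pairs(const std::vector<int>& a) {
    std::unordered_map<int, std::size_t> freq;   // value -> number of occurrences
    for (int x : a) ++freq[x];                   // one pass to build the counts

    std::size_t pairs = 0;
    for (const auto& entry : freq)
        pairs += entry.second / 2;               // every 2 occurrences form a pair
    return pairs;
}
```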

Let's bring this convo to DM :D
 
Wouldn't that be O(N^2)?

You can count the occurrences of each number first and put them into a hash map.
For example {1: 5, 2: 6, 3: 8, 4: 1}, in which the key is the number and the value is the count.

Building this hash map is O(N).
Then for each number, you just divide its count by 2 to get the number of pairs.

Let's bring this convo to DM :D
Average or worst-case O(..)?

std::unordered_map is an associative container that contains key-value pairs with unique keys. Search, insertion, and removal of elements have average constant-time complexity.

Internally, the elements are not sorted in any particular order, but organized into buckets. Which bucket an element is placed into depends entirely on the hash of its key. Keys with the same hash code appear in the same bucket. This allows fast access to individual elements, since once the hash is computed, it refers to the exact bucket the element is placed into.
 
Average or worst-case O(..)?

std::unordered_map is an associative container that contains key-value pairs with unique keys. Search, insertion, and removal of elements have average constant-time complexity.

Internally, the elements are not sorted in any particular order, but organized into buckets. Which bucket an element is placed into depends entirely on the hash of its key. Keys with the same hash code appear in the same bucket. This allows fast access to individual elements, since once the hash is computed, it refers to the exact bucket the element is placed into.
Average O(N).

Worst case is O(N^2), but that only happens if every element inserted hashes to the same bucket, meaning each insert has to iterate the bucket. That is really rare in practice.
 
Average O(N).

Worst case is O(N^2), but that only happens if every element inserted hashes to the same bucket, meaning each insert has to iterate the bucket. That is really rare in practice.
??
That contradicts the official documentation.
 

The official documentation doesn't say anything about worst-case complexity.

I searched several sources and they all say the worst-case insertion time is O(N). You have to insert once for each of the N elements of the array, which would make the whole thing O(N^2) in the worst case.
 
An unordered_map is a data structure that stores data in the form of key-value pairs. The best case and the average case complexity for all the operations in an unordered_map is O(1). While in the worst case, the time complexity for all the operations in an unordered_map is O(n).
 
An unordered_map is a data structure that stores data in the form of key-value pairs. The best case and the average case complexity for all the operations in an unordered_map is O(1). While in the worst case, the time complexity for all the operations in an unordered_map is O(n).
I think we are saying the same thing. I was referring to O(N^2) as the worst-case complexity of the whole pair-counting algorithm, not just a single insertion into the unordered_map (you need a loop to insert all the elements into the unordered_map too).

I agree that on average a single insertion is O(1), but because we insert N times, the whole problem is O(N) on average.
Worst case, a single insertion is O(N); since we insert N times, the whole problem is O(N^2) in that (unlikely) worst case.

I guess we are turning this thread into leet code lol.
 
Are there other new C++2X features that you are interested in implementing at work?
I already mentioned concepts and ranges. There is still much to optimize with those features.

For other features like the spaceship operator, map::contains, string prefix/suffix checks, or the calendar library, I have found the benefits very incremental. Even when a new feature is genuinely better, there are often just too many instances of the old code, and the owner is unwilling to make a single exception if it hurts the clarity of the code or creates different-looking code that does the same thing. For example, if string prefixes are checked a certain way in the code base, the owner would rather either keep it the old way or replace every instance with the new method for the sake of "consistency". In terms of clarity, I have found that people are really receptive to std::accumulate but a bit afraid of std::transform_reduce. It is a constant battle to decide between better code, impact, and suitability for the organization.
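To show the clarity trade-off concretely (illustrative only, with a made-up Bid struct), here is the same sum written both ways:

```cpp
#include <functional>
#include <numeric>
#include <vector>

// A made-up bid record, just for illustration.
struct Bid { double price = 0.0; };

// Familiar and widely accepted: std::accumulate with a lambda.
double total_price(const std::vector<Bid>& bids) {
    return std::accumulate(bids.begin(), bids.end(), 0.0,
                           [](double acc, const Bid& b) { return acc + b.price; });
}

// Same intent with std::transform_reduce: a transform step (extract the price)
// followed by a reduce step (sum). It can also take an execution policy, but
// teammates may find it less familiar.
double total_price_tr(const std::vector<Bid>& bids) {
    return std::transform_reduce(bids.begin(), bids.end(), 0.0,
                                 std::plus<>{},
                                 [](const Bid& b) { return b.price; });
}
```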

Finally, after this round of optimization, I plan to focus more on parallelizing offline pipelines that process logs. Because the logs are so big, there are pipelines that run every day and cost a lot of resources. These have looser latency requirements but higher memory requirements. Threads will be really useful here.
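The rough shape I have in mind (a sketch with invented names, not an actual pipeline) is one worker per log shard with a cheap merge at the end:

```cpp
#include <cstddef>
#include <numeric>
#include <string>
#include <thread>
#include <vector>

// A shard of one day's logs (invented representation).
using LogShard = std::vector<std::string>;

// Count lines mentioning a given ad across many shards: one thread per shard,
// each writing to its own slot, then a cheap final merge.
std::size_t count_mentions(const std::vector<LogShard>& shards,
                           const std::string& ad_id) {
    std::vector<std::size_t> partial(shards.size(), 0);
    {
        std::vector<std::jthread> workers;
        for (std::size_t i = 0; i < shards.size(); ++i) {
            workers.emplace_back([&, i] {
                for (const std::string& line : shards[i])
                    if (line.find(ad_id) != std::string::npos)
                        ++partial[i];              // no sharing, no locks
            });
        }
    }   // std::jthread joins automatically when it goes out of scope
    return std::accumulate(partial.begin(), partial.end(), std::size_t{0});
}
```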
 
Finally, after this round of optimization, I plan to focus more on parallelizing offline pipelines that process logs. Because the logs are so big, there are pipelines that run every day and cost a lot of resources. These have looser latency requirements but higher memory requirements. Threads will be really useful here.
One funny story: someone forgot to turn off a pipeline that was no longer useful. It kept running and costing resources. When someone found out, we turned off the gas stove and saved resources equal to a few percent of our annual savings goal lol.
 
How did you go about calculating all these millions saved and developer years saved? I’d be curious to learn how to quantify this better for my own purposes as it seems like a valuable career thing to be able to measure concretely.
The company has excellent infrastructure for measuring the gains. Every binary job keeps a good record of CPU, RAM, GPU, and TPU usage, from when to when, down to the nanosecond. Based on this massive amount of data, an entire team is dedicated to data analysis, understanding the types of jobs that tend to perform poorly or hog resources. They then issue directives and policies on logging and resource costs and provide tools to help other engineers measure their gains. One of these guidelines is how to convert CPU, RAM, GPU, and TPU usage into a single unit: SWE-years.

In my own case, because parallelizing the ad candidate processing makes a query run some tens of milliseconds faster, the load balancer was able to push more queries onto each server as throughput increased. Therefore, fewer servers were needed for specific locales around the world. We added up all the reductions in server, RAM, and TPU usage, and that amounted to dozens of SWE-years, hence millions.

Even the compiler was modified for some in-house use cases. The binary also has something like a DEBUG mode on steroids, where it stores the clock time used by every single function. A small percentage of real-world queries are duplicated and rerouted to these binaries for the sole purpose of investigating resource hogging, which makes it easy to see which functions are costing a significant amount of resources. For smaller individual projects, something similar, where you simply record each function's total time, could be pretty useful.
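For a smaller project, even a minimal RAII timer (nothing like the in-house tooling, just the idea; this one records wall-clock time of a scope) gets you surprisingly far:

```cpp
#include <chrono>
#include <cstdio>
#include <string>

// An RAII timer: records how long a scope took and reports it on exit.
class ScopedTimer {
public:
    explicit ScopedTimer(std::string name)
        : name_(std::move(name)), start_(std::chrono::steady_clock::now()) {}
    ~ScopedTimer() {
        auto us = std::chrono::duration_cast<std::chrono::microseconds>(
                      std::chrono::steady_clock::now() - start_).count();
        std::printf("%s took %lld us\n", name_.c_str(),
                    static_cast<long long>(us));
    }
private:
    std::string name_;
    std::chrono::steady_clock::time_point start_;
};

double expensive_stat() {
    ScopedTimer t("expensive_stat");   // reported when the function returns
    double acc = 0.0;
    for (int i = 0; i < 1'000'000; ++i) acc += i * 1e-6;
    return acc;
}
```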
 
I need to figure out what the ranges-based approach is, because I was asked an array sorting question for a technical screening, and even though my multiple for loops would have worked for all cases, they wouldn't run in time for three of them (under two seconds).

This gives a great practical reason to move into the Advanced course when I'm able/finished with the first, thanks.

As for a question, what was your:
favorite/least favorite sections in the advanced course
hardest/easiest
most useful/most niche
Learning algorithm design techniques like dynamic programming, max flow/min cut, greedy algorithms, divide and conquer, etc. is really helpful for finding optimized solutions.
 
Learning algorithm design techniques like dynamic programming, max flow/min cut, greedy algorithms, divide and conquer, etc. is really helpful for finding optimized solutions.

Yes. This is why, to this day, I still encourage people to practice often on LeetCode.

A lot of people laugh at LeetCode for asking overly specific algorithm questions that they never get to use in their real jobs. That being said, I still think practicing LeetCode (up to a certain point) is a great way to understand algorithm complexity and get regular practice with it. That really helps in identifying these inefficiencies and contributing positively to whatever company you work for.
 