#### Background

Imagine you have a list of papers sorted alphabetically by the names at the top (assume by last name). How would you find the name John Stephens?

One obvious way to do it is to look at the first paper, then the second, then the third, and so on until you find John Stephens. Imagine that's the last name in the list. You would need to look at every paper to find it. That means if there are n papers, you'd look at all n. This is often called O(n) complexity.

Another way you might do it is split the papers in half. Maybe one stack ends with Tim Nichols and the other starts with Steve Norton. Norton comes before Stephens alphabetically, so you use that stack. Split it in half and repeat. Eventually you will find John Stephens. Each time you split, you cut the stack by 1/2. That means that your worst case scenario here is (2)^(# splits) = n. We can rearrange that to solve for the number of splits:

(2)^(# splits) = n

(# splits)*log(2) = log(n)

(# splits) ~ log(n)

That means if there are n papers, you'd look at log(n)/log(2) in the worst case. This is often called O(log(n)) complexity. If you are interested in the general rules, wikipedia is a good start. This particular algorithm is a simple binary search.

To see the next tier, we can move to sorts. How do you get a sorted list like that in the first place?

One method is to take the first number, compare it with the second, the third, and so on, then repeat for the second number. You search the entire list each time, and you do one search per item in the list. If the list is n items, this is n searches of n items which is n^2 operations, so that's called O(n^2).

Another method is to merge elements 1 and 2, 3, and 4, and so on in order. Then, merge the 1/2 and 3/4 blocks, the 5/6 and 7/8 blocks, and so on. Then merge the 1/2/3/4 and 5/6/7/8 blocks. You have to scan the n elements each time you merge, and since the groups scale up as 2^n, the total merges is ~ log(n). That's O(n*log(n)) overall. This particular algorithm is called a mergesort.

Below is a gif comparing those two. Pay attention to the left-right movement. That's your 'n' term. In the top one, you can see the groups go up by a factor of 2 each time which scales as log(n). In the bottom one, the left-right movement happens for ~ every element which scales as n:

Hopefully this gives a sense of the differences in algorithm complexities. Choosing a better algorithm can greatly increase the speed of your code which is generally extremely important. Note also that I'm using complexity to mean 'time complexity' here. There is also memory/space complexity that I ignored here.

One obvious way to do it is to look at the first paper, then the second, then the third, and so on until you find John Stephens. Imagine that's the last name in the list. You would need to look at every paper to find it. That means if there are n papers, you'd look at all n. This is often called O(n) complexity.

Another way you might do it is split the papers in half. Maybe one stack ends with Tim Nichols and the other starts with Steve Norton. Norton comes before Stephens alphabetically, so you use that stack. Split it in half and repeat. Eventually you will find John Stephens. Each time you split, you cut the stack by 1/2. That means that your worst case scenario here is (2)^(# splits) = n. We can rearrange that to solve for the number of splits:

(2)^(# splits) = n

(# splits)*log(2) = log(n)

(# splits) ~ log(n)

That means if there are n papers, you'd look at log(n)/log(2) in the worst case. This is often called O(log(n)) complexity. If you are interested in the general rules, wikipedia is a good start. This particular algorithm is a simple binary search.

#### Visualization

So the problem above found that two obvious searching algorithms have O(n) and O(log(n)) complexity. What does that mean in day to day life, and why does anyone care? To attempt to convey why it's useful for a developer to understand this, I've made two visualizations of the above algorithms.

First, each algorithm attempts to find the number 29 in a set of 40 numbers sorted from low to high. Each step of the algorithm takes 1 second in the gif below, and the current location is highlighted.

First, each algorithm attempts to find the number 29 in a set of 40 numbers sorted from low to high. Each step of the algorithm takes 1 second in the gif below, and the current location is highlighted.

Next, each algorithm attempts to find the number 83 in a set of 100 numbers sorted from low to high. Each step of the algorithm takes 0.5 seconds in the gif below, and the current location is again highlighted.

To see the next tier, we can move to sorts. How do you get a sorted list like that in the first place?

One method is to take the first number, compare it with the second, the third, and so on, then repeat for the second number. You search the entire list each time, and you do one search per item in the list. If the list is n items, this is n searches of n items which is n^2 operations, so that's called O(n^2).

Another method is to merge elements 1 and 2, 3, and 4, and so on in order. Then, merge the 1/2 and 3/4 blocks, the 5/6 and 7/8 blocks, and so on. Then merge the 1/2/3/4 and 5/6/7/8 blocks. You have to scan the n elements each time you merge, and since the groups scale up as 2^n, the total merges is ~ log(n). That's O(n*log(n)) overall. This particular algorithm is called a mergesort.

Below is a gif comparing those two. Pay attention to the left-right movement. That's your 'n' term. In the top one, you can see the groups go up by a factor of 2 each time which scales as log(n). In the bottom one, the left-right movement happens for ~ every element which scales as n:

Hopefully this gives a sense of the differences in algorithm complexities. Choosing a better algorithm can greatly increase the speed of your code which is generally extremely important. Note also that I'm using complexity to mean 'time complexity' here. There is also memory/space complexity that I ignored here.

## 0 comments:

## Post a Comment