Asymptotic Notations. Source: Eternally Confuzzled

By Julienne Walker
License: Public Domain

I try to minimize things that might not be well known to my target audience because I'm reasonably sure that when a lot of people reach some funky notation that they don't understand, they stop reading. Now, while I also try to explain the good, the bad, and the ugly without using those notations, that's hard to appreciate if you never get to that point. ;-) The most important measures that I use are for asymptotic growth of algorithms. This article will attempt to explain the use of these notations without requiring you to have a degree in computer science.

Because I'm simplifying reality, it should be noted that none of the information in this article will be completely accurate in a formal mathematical sense, so please refrain from emailing me with complaints that I'm not entirely correct. The information in this article is more than sufficient to know what I mean when I say O(N²) or Ω(N log₂ N) and which one is better. And to be perfectly frank, once you get the basic concept, that's more than enough to live your life unless you have a burning desire to formally analyze and prove the performance properties of an algorithm. I don't really care about that because I can get close enough with my simplified reality to know which algorithms to choose.

The very word “asymptotic” scares the bejeezus out of people because it sounds complicated. The definition doesn't serve to alleviate that fear. Something that's asymptotic relates to an asymptote, which is defined as “a line whose distance to a given curve tends toward zero”. That's damn near worthless, so let's say that something asymptotic refers to a limiting behavior based on a single variable and a desired measure.

For example, we can use the number of steps that it takes for an algorithm that works with N items to complete (a.k.a. time complexity) as the desired measure, and derive an asymptotic bound on the time complexity by increasing N toward infinity. In real person terms, we're just figuring out how much longer the algorithm takes when we add more items. The most common way to go about this is to double the number of items and see how much longer the algorithm takes.

Now, we could actually test this by writing the algorithm, profiling it to see how long it takes for N, then profiling it again after doubling N. The time difference is a rough estimate of the growth. This is called an empirical test. However, we can also do a theoretical test by measuring the steps that rely on the size of N and get a reasonably useful measure of how the time complexity grows. Because the steps that don't rely on N won't grow, we can remove them from the measure because at a certain point, they become so small as to be worthless. In other words, we pretend that they don't matter in all cases.
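
Here's a rough sketch of such an empirical test. The work function is just a stand-in that I made up for this example; you'd replace it with the algorithm you actually want to measure:

#include <stdio.h>
#include <time.h>

/* Stand-in for the algorithm being measured; swap in your own */
void work ( int n )
{
  volatile long sum = 0;
  int i;

  for ( i = 0; i < n; i++ )
    sum += i;
}

double time_work ( int n )
{
  clock_t start = clock();

  work ( n );

  return ( double ) ( clock() - start ) / CLOCKS_PER_SEC;
}

int main ( void )
{
  int n = 10000000;

  /* A ratio near 2 suggests linear growth; near 4 suggests quadratic */
  printf ( "%d items took %f seconds\n", n, time_work ( n ) );
  printf ( "%d items took %f seconds\n", n * 2, time_work ( n * 2 ) );

  return 0;
}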

This is the idea behind asymptotic notation. By removing the constants (terms with a fixed value that doesn't depend on N), we can focus on the part of the measure that grows and derive a simplified asymptotic bound on the algorithm. A common notation that removes constants is called Big O Notation, where the O means “order of” (there are variants that do something similar that we'll look at shortly). Let's look at an example:

#include <stdio.h>

void f ( int a[], int n )
{
  int i;

  /* O(1): doesn't depend on n */
  printf ( "N = %d\n", n );

  /* O(N): one step for each of the n items */
  for ( i = 0; i < n; i++ )
    printf ( "%d ", a[i] );

  /* O(1): doesn't depend on n */
  printf ( "\n" );
}
In this function, the only part that takes longer as the size of the array grows is the loop. Therefore, the two printf calls outside of the loop are said to have a constant time complexity, or O(1), as they don't rely on N. The loop itself has a number of steps equal to the size of the array, so we can say that the loop has a linear time complexity, or O(N). The entire function f has a time complexity of 2 * O(1) + O(N), and because constants are removed, it's simplified to O(1) + O(N).

Now, asymptotic notation also typically ignores the measures that grow more slowly because eventually the measure that grows more quickly will dominate the time complexity as N moves toward infinity. So by ignoring the constant time complexity because it grows more slowly than the linear time complexity, we can simplify the asymptotic bound of the function to O(N), so the conclusion is that f has linear time complexity.

Okay, but what does O really mean? Big O notation refers to the asymptotic upper bound, which means that it's a cap on how much the time complexity will grow. If we say that a function is O(1), then there's no growth and the function will always take a fixed amount of time to complete. If we say that a function is O(N), then when N doubles, the function's time complexity will at most double. It may be less, but never more. That's the upper bound of an algorithm, and it's the most common notation.

Now, even though O notation is the most common, it's not always the most accurate measure. For example, let's say we have a sequential search of an unordered array where the items are randomly distributed and we want both the average case growth and the worst case growth:

/* Sequential search: returns 1 if x is in a[0..n-1], 0 otherwise */
int find ( int a[], int n, int x )
{
  int i;

  for ( i = 0; i < n; i++ ) {
    if ( a[i] == x )
      return 1;
  }

  return 0;
}
This algorithm is clearly O(N) because it only has one loop that relies on the size of the array, and the time complexity of the loop doubles as the size of the array doubles. However, that's the worst case upper bound. We know (because smarter people than I figured it out) that on average, only half of the array is searched before the item is found due to the random distribution. So while the time complexity could reach O(N), it's usually less even though we don't really know how much less.
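
If you don't want to take that on faith, here's a quick sketch (my own addition, not part of the search itself) that wraps the loop with a step counter and averages over a bunch of random successful searches; the result comes out close to N / 2:

#include <stdio.h>
#include <stdlib.h>

/* Sequential search, instrumented to count how many items it inspects */
int find_count ( int a[], int n, int x, long *steps )
{
  int i;

  for ( i = 0; i < n; i++ ) {
    ++*steps;

    if ( a[i] == x )
      return 1;
  }

  return 0;
}

int main ( void )
{
  enum { N = 1000, TRIALS = 10000 };
  static int a[N];
  long steps = 0;
  int i;

  for ( i = 0; i < N; i++ )
    a[i] = i;

  for ( i = 0; i < TRIALS; i++ )
    find_count ( a, N, rand() % N, &steps );

  /* Expect an average of roughly N / 2 steps per successful search */
  printf ( "average steps: %.1f\n", ( double ) steps / TRIALS );

  return 0;
}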

Okay, how about a binary search instead of a sequential search? If the array is sorted, we can make the search a lot faster by splitting the array in half at each comparison and only searching the half where the item might be. That's common knowledge, but why is it faster? Here's the code for a binary search:

/* Binary search: a must be sorted in ascending order */
int find ( int a[], int n, int x )
{
  int i = 0;

  while ( i < n ) {
    int mid = ( i + n ) / 2;

    if ( a[mid] < x )
      i = mid + 1;  /* x can only be in the upper half */
    else if ( a[mid] > x )
      n = mid;      /* x can only be in the lower half */
    else
      return 1;
  }

  return 0;
}
We can call this an O(N) algorithm and not be wrong because the time complexity will never exceed O(N). But because the search space is cut in half at each step, the number of steps is at most about the base-2 logarithm of N, which is considerably less than N. So an even better choice would be to set the upper bound to log₂ N, which is the limit that we know we're guaranteed never to cross. Therefore, a more accurate claim is that binary search is a logarithmic, or O(log₂ N), algorithm.
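
To get a feel for how small log₂ N is, here's a quick throwaway program (my own addition) that just counts how many times a range of size N can be cut in half, which is about the most comparisons the binary search above will ever make:

#include <stdio.h>

int main ( void )
{
  long n;

  for ( n = 16; n <= 1048576; n *= 4 ) {
    int steps = 0;
    long size = n;

    /* Halve the range until one item is left, just like binary search */
    while ( size > 1 ) {
      size /= 2;
      ++steps;
    }

    printf ( "N = %7ld -> about %2d steps\n", n, steps );
  }

  return 0;
}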

Sometimes we're interested not in an upper bound, but in a lower bound. What's the smallest time complexity that we can expect? For example, what if we want to know the lower bound for the binary search we just found the upper bound for? Well, since a binary search, measured only in terms of N, is pegged to about log₂ N steps to complete, we can say that the lower bound is also logarithmic. There's a notation for the lower bound too, called Omega, so we can say that the lower bound for binary search is Ω(log₂ N).

Wait a second. The upper and lower bounds are the same! That's good, because now we can put a very accurate bound on the time complexity of a binary search. There's a notation for the asymptotically tight bound too, called Theta, and since we know the O and Ω for binary search and they're the same, we can say that binary search is Θ(log₂ N). O(log₂ N) is still correct, and you'll see it more often than any of the other notations, but Θ(log₂ N) is a much stronger claim.

Note that in the best case scenario, the first item we look at would be the one we're looking for and the search would effectively be O(1), so why is the lower bound Ω(log₂ N)? Remember that we're only using one variable, the size of the array, to derive our measure. If we use other variables, such as the contents of the array and the item being searched for, we can easily say that the lower bound is Ω(1) because the best possible case is an immediate match. However, lacking those extra variables, we can't make that assumption, so we measure against the longest possible run, which is logarithmic for both the upper and lower bounds.

Okay, what about a sorting algorithm? Let's start with selection sort. The algorithm is simple: find the largest item and move it to the back of the array. When you move an item to the back, decrease the size of the array so that you don't continually choose from the items that have already been selected:

void jsw_selection ( int a[], int n )
{
  while ( --n > 0 ) {
    int i, max = n;

    /* Find the index of the largest item in a[0..n] */
    for ( i = 0; i < n; i++ ) {
      if ( a[i] > a[max] )
        max = i;
    }

    /* Move it to the back of the unsorted portion */
    if ( max != n )
      jsw_swap ( &a[n], &a[max] );
  }
}
This algorithm has two loops, one inside of the other. Both rely on the size of the array, so the algorithm is clearly O(N * N), more commonly shown as O(N²) and referred to as quadratic. The fact that N decreases with each step of the outer loop is irrelevant unless you want a tight bound, and even then it's difficult to analyze. But that doesn't matter much because the upper bound is really all we care about for an existing sorting algorithm.
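
By the way, jsw_swap isn't shown in this article. I'm assuming it's nothing more than a helper that exchanges two ints, something along these lines:

void jsw_swap ( int *a, int *b )
{
  int save = *a;

  *a = *b;
  *b = save;
}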

Let's look at a faster sort. The heap sort algorithm uses a tree based structure to make the selection process faster. Because the selection part of a selection sort is Θ(N), and the outer loop that it's nested in is O(N), selection sort is O(N²). But by using a heap where selection is O(1) and fixing the heap is Θ(log₂ N), we can turn the whole process of selection into a Θ(log₂ N) process:

void jsw_do_heap ( int a[], int i, int n )
{
  int k = i * 2 + 1;  /* Left child of i */
  int save = a[i];

  /* Sift a[i] down until the heap property is restored */
  while ( k < n ) {
    if ( k + 1 < n && a[k] < a[k + 1] )
      ++k;  /* Use the larger of the two children */

    if ( save >= a[k] )
      break;

    a[i] = a[k];
    i = k;
    k = i * 2 + 1;
  }

  a[i] = save;
}

void jsw_heapsort ( int a[], int n )
{
  int i = n / 2;

  /* Build the initial max-heap */
  while ( i-- > 0 )
    jsw_do_heap ( a, i, n );

  /* Repeatedly move the largest item to the back and fix the heap */
  while ( --n > 0 ) {
    jsw_swap ( &a[0], &a[n] );
    jsw_do_heap ( a, 0, n );
  }
}
Because the heap is structured like a tree, jsw_do_heap is Θ(log₂ N). The first loop in jsw_heapsort runs N / 2 times, which is just O(N) once the constant is dropped, and the second loop is O(N) as well, so what we have is an O(N) loop that calls a Θ(log₂ N) function. We conclude that the upper bound of heap sort is O(N log₂ N), which doesn't have a set descriptive name, but it's often shown as O(N * log₂ N). However, because the lower bound of heap sort is also Ω(N log₂ N) for the same reasons as binary search, we can safely say that heap sort has Θ(N log₂ N) time complexity.
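
If you want to try it out, here's a small throwaway driver (my own addition, assuming the jsw_heapsort above and the jsw_swap helper sketched earlier):

#include <stdio.h>
#include <stdlib.h>

int main ( void )
{
  int a[10];
  int i;

  /* Fill the array with pseudo-random values, then sort and print it */
  for ( i = 0; i < 10; i++ )
    a[i] = rand() % 100;

  jsw_heapsort ( a, 10 );

  for ( i = 0; i < 10; i++ )
    printf ( "%d ", a[i] );

  printf ( "\n" );

  return 0;
}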

We've looked at the most common time complexities: O(1) for constant time, O(N) for linear time, O(log₂ N) for logarithmic time, O(N log₂ N), and O(N²) for quadratic time. Others exist, such as O(N!) for a ridiculous factorial growth, but you won't see them often. Here are the upper bound time complexities in order of growth from least to greatest that you're most likely to see:

O(1) - No growth
O(log₂ N) - Grows by one extra step when N doubles
O(N) - Doubles when N doubles
O(N log₂ N) - A little more than doubles when N doubles
O(N²) - Quadruples when N doubles
O(N!) - Factorial growth; explodes even for small increases in N
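
To put some rough numbers on those growth rates, here are my own back-of-the-envelope figures for two sample sizes:

                N = 1,024      N = 2,048
O(1)            1              1
O(log₂ N)       10             11
O(N)            1,024          2,048
O(N log₂ N)     10,240         22,528
O(N²)           1,048,576      4,194,304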
Hopefully this article clears up any confusion without delving too far into the theory. Unfortunately, algorithm analysis can only be simplified so much before it becomes useless, but I think I covered all of the foundations that would be useful in figuring out a basic time complexity for your own algorithms as well as understanding many of the time complexities given for existing algorithms.

From the twisted minds of Julienne Walker and The EC Team

More related stuff at:
http://www.eternallyconfuzzled.com

JMI's decision to provide legal aid to its two students surely deserves praise!!

I really appreciate Jamia Millia Islamia's decision to legally help its two students who are allegedly involved in the Delhi serial blasts.
The Delhi Police have shown how they work.
Killing two people in Jamia Nagar is not an outcome any trial could have delivered.
The guys had police verification before renting the house, and now the police say that the verification is false. So I question the law-keepers and lawmakers: why create rules first and then deny their authenticity?
I am also living in a house away from my home, and many of my friends are living in rented houses.
Some of them are Hindus and some are Muslims; all of them have police verification and domicile certificates with them.
Do they have all of that just so that one day someone can shoot them dead and claim they were terrorists and their verifications were fake?
In that case, the police department that issued the verification and NOC would be the biggest accused. Why did they permit a person to live in a particular locality if his identity or activities were suspect?

They say India is developing, and now I know how it is developing: shining on the outside while the base is rotten.

As far as Jamia Millia is concerned, the two students who got arrested were living in Jamia, and the university was their only guardian and home in the big city of Delhi, so the use of student funds for legal aid is totally fair.
I deeply honor and welcome the decision of Mr. Mushirul Hasan, the VC of the university, and welcome the UPA government too for its good sense.

JSmooth - Wrapping JARs into EXE files

Hello guys, I've been seeing that so many of you need to wrap (or convert, as you might say) your Java JAR archives into .exe files so that they look better.
The solution: JSmooth, which wraps your JAR archives into Windows EXE files.
Here's the link to that software:
JSmooth
It's open source and freeware.

And here's a guide that tells you how to get things done with it in 10 easy steps.
I have created this guide myself, so for any questions related to it, ask me.
Here's its link:

Format USB Pen Drive in NTFS format

Normally, all USB pen drives are formatted with the FAT or FAT32 file system by Windows.
But you can also have them formatted in NTFS to use the compression and file encryption features provided by Windows, as well as quota management if your drive is used by particular users.

To do this, follow these steps (a command-line alternative follows the list):
1.) Right click on My Computer > Properties.
2.) On the Hardware tab, click the Device Manager button.
3.) Select your pen drive from the list of storage devices.
4.) Right click on the pen drive and click Properties.
5.) On the Policies tab, check the Optimize for performance option.
6.) This will enable the NTFS file system when you format the drive normally.
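
If you prefer the Command Prompt, you can also format the drive directly (this part is my own addition to the steps above, assuming the pen drive is E:; on Windows XP you may still need the policy change above before NTFS is allowed on removable drives):

format E: /FS:NTFS

or use convert E: /FS:NTFS if you want to switch to NTFS without wiping the files already on the drive.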

That's it!
You're done.
You can now use NTFS on your pen drive too, but there may be compatibility issues with operating systems that don't support NTFS, like some Linux distributions, so keep that in mind.

And after doing this, please remove your pen drive only by using the Safely Remove Hardware option, not by just pulling it out of the USB port.
In Windows XP, the Safely Remove Hardware icon appears in the system tray when a pen drive is plugged in.

Enjoy!!

Removing autorun.inf virus

Sometimes your drives get infected with viruses that don't let you open the drive properly, or simply scramble your right-click context menu.
Why use an antivirus when you can remove it yourself?

Search for the autorun.inf file, and include hidden files and system files in your search. As soon as you find the file, press SHIFT+DELETE.
Now log out and log in again; your drives will be back to normal.
You can also enable the "Show hidden files" option from the Folder Options menu, but sometimes the virus disables this feature too. In that case, you can go to the Command Prompt and type:
C:\> DIR /AH

This will show you your hidden files. Note the names to delete, remove that autorun.inf file and any other unwanted files with the DEL command, and log in again.
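
For example, assuming the infected drive is E: (these exact commands are my own addition, not part of the original steps), you could do it like this:

E:
attrib -s -h -r autorun.inf
del autorun.inf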
I'll be posting the registry entry that controls the visibility of hidden files.
Try this until then, or mail me if you have any problems.

Stop Reserving, If you have the caliber, Compete

Recently I found myself competing for my Master's degree.
Though I was not very well prepared for the exams, rather than just finding out that I wasn't in, I also realized that a lot of people got in by taking advantage of reservation.

I was shocked to see that although I could have been selected, I wasn't!!

I also found many others who were not selected because competition in the general category has become very, very tough.

Why are we divided into categories?
When we get salaries or punishments, the caste system is not followed, so why in competitions?

We are citizens of India, and we are all Indians first and anything else second.
With this, the government is also dividing the people on the basis of castes and sects.
Won't this create bigger problems?

I can estimate the outcomes of such a system too: when reservation is provided in competitions, the chances of selecting unskilled people become high.
These are the people who ultimately work for the nation, and of course they aren't as skilled as the other people who couldn't get in because their seats went to reserved candidates.

This is what generates 'brain drain' from India, because a mass of highly skilled people go abroad and work for other countries, just because their talents aren't recognized in India, their own country.

I certainly wasn't a deserving candidate, but there would have been thousands of others, more skilled than me, who suffered because of the reservation system.

It is a question for Indian policymakers: how do they come up with such silly concepts without thinking about the future?

Are we still the biggest democracy when we're not giving 'equal' chances to every citizen?

Welcome!!

Hello guys!!
Here's my new blog, which will contain program code, the logic behind it, and some concepts I'll be working on.

It will also contain some tweaks and tricks for different things, either to show off your devices or to use them in times of need or crisis.

Compliments, comments, and corrections are always welcome. :)