Also of note, hash tables (unless they are using perfect hashing) are not O(1). Big O notation refers to the worst case time, and the worst case time for a hash table is when the hash keys collide and the desired element has to be found by scanning through a bucket. The relation between N and the complexity of this operation is somewhat implementation- and data-distribution-dependent, but in the true worst case it is O(N) (all keys in the same bucket).
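To make the degenerate case concrete, here's a quick Python sketch (the BadKey class is purely illustrative): a dict where every key hashes to the same value degrades to a linear scan, so the comparison count for a single lookup grows with N.

```python
class BadKey:
    """Key with a constant hash, so every entry lands in the same bucket."""
    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return 42  # pathological: all keys collide

    def __eq__(self, other):
        BadKey.comparisons += 1
        return isinstance(other, BadKey) and self.value == other.value

for n in (100, 200, 400):
    table = {BadKey(i): i for i in range(n)}
    BadKey.comparisons = 0
    _ = table[BadKey(n - 1)]           # a single lookup
    print(n, BadKey.comparisons)       # grows roughly linearly with n
```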
Not true. Big O notation can refer to worst-case time, average-case time, best-case time, or to any other mathematical function from the integers or real numbers to the real numbers.
Big-O notation is in fact independent of computer science. It’s sometimes taught in calculus courses and used to describe arbitrary functions that have nothing to do with algorithm running times.
Big-O in Bachmann–Landau notation specifically means an asymptotic upper bound.
There are lots of other interesting measures of asymptotic behavior, and it's true that in the vernacular Big-O is frequently bandied about as if it can mean many different things.
But I've never heard anyone in the computer science domain (which is surely what we're talking about when we're talking about hash tables?) argue that Big-O, when used precisely, is anything but upper bound.
Yes. The upper bound of some function. That function is not necessarily the worst-case runtime.
Best-case, average-case, and worst-case can all have upper bounds.
If the best case runtime as a function of data size is n/2, average case is n^2 + 5n, and worst case is e^n + 5n^2, these functions are asymptotically upper bounded by n, 2n^2, and 2e^n respectively, so they are big-O (incidentally they are also big-Omega and big-Theta) of n, n^2, and e^n respectively.
Big-O means the same thing in CS as it does in mathematics; this is not an issue of different domains using terms in different ways.
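A quick numeric sanity check of those three bounds (the constants 1, 2, 2 are the ones implied above; the cutoffs are just sample values I picked):

```python
import math

# Each claim: g(n) <= c * f(n) for all n beyond the cutoff m.
claims = [
    (lambda n: n / 2,                  lambda n: n,           1, 0),   # best case
    (lambda n: n**2 + 5 * n,           lambda n: n**2,        2, 5),   # average case
    (lambda n: math.exp(n) + 5 * n**2, lambda n: math.exp(n), 2, 10),  # worst case
]

for g, f, c, m in claims:
    assert all(g(n) <= c * f(n) for n in range(m + 1, 200))
print("all three bounds hold on the sampled range")
```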
A famous example is Quicksort which is O(n*log(n)) average case, but not worst case, where the tightest big-O bound is O(n^2). Any reputable reference you look up will agree with this. From Wikipedia, for example: “Mathematical analysis of quicksort shows that, on average, the algorithm takes O(n log n) comparisons to sort n items. In the worst case, it makes O(n^2) comparisons, though this behavior is rare.”
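For a concrete (and deliberately naive) illustration of that split, here's a sketch of quicksort with a first-element pivot; on shuffled input the comparison count tracks n log n, while already-sorted input hits the quadratic worst case:

```python
import random
import sys

sys.setrecursionlimit(5000)  # sorted input drives recursion depth to ~n

def quicksort(xs, counter):
    """Naive quicksort, first element as pivot; counter[0] tallies pivot comparisons."""
    if len(xs) <= 1:
        return xs
    pivot, rest = xs[0], xs[1:]
    counter[0] += len(rest)                 # each remaining element compared to the pivot
    left = [x for x in rest if x < pivot]
    right = [x for x in rest if x >= pivot]
    return quicksort(left, counter) + [pivot] + quicksort(right, counter)

n = 1000
for name, data in (("shuffled", random.sample(range(n), n)),
                   ("already sorted", list(range(n)))):
    counter = [0]
    quicksort(data, counter)
    print(name, counter[0])  # ~ n log n for shuffled, ~ n^2 / 2 for sorted
```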
When referring to the O() of a particular algorithm without further specification, upper bound is what is meant. When discussing the asymptotics of quicksort you say that it is O(n log(n)) in the average case, not that it is O(n log(n)) in general.
I tried to make this clear in my comment - (asymptotic) "upper bound" is a concept applicable to any function from N to R. So do you mean the upper bound of the worst case, the upper bound of the average case, the upper bound of the best case, the upper bound of the worst case memory usage, the upper bound of the median price of eggs in China on the nth day since 1970, or something else?
Ya, I just mean that without further qualification, it's the upper bound of the whole function that's being referred to, which is equivalent to the upper bound of the worst case.
No, you still don't understand. The worst-case runtime and the average-case runtime are different functions. There's no one "whole function" that both are part of.
They are not different functions. They are the same function. The average case is a special case of the general function in which some of the terms collapse.
I don't know what to say at this point. Any reputable reference will disagree with you, including the two Stack Overflow links I posted, or even the Wikipedia article on big-O notation, which doesn't say anywhere that it has to do with the worst case of algorithms.
> They are not different functions. They are the same function.
Sure, if you expand the term "function" to include "random variable", which is legitimate to do in some fields of math... but we're talking about deterministic functions from the natural numbers (or real numbers) to the real numbers here. Which is what big-O notation is defined for.
Can you write down the deterministic function from N->R giving the "runtime" of some algorithm? (I mean the actual function, not big-O).
This does not match my experience at all. I have never (including in academia) encountered use of Big O which implicitly referred to the worst case - it has _always_ referred to the average case. That being said, usage has rarely been implicit in my experience - average or worst has almost always been explicitly stated.
You're wrong about this. g in O(f) is defined as: there exist c and m such that for all n > m, c*f(n) > g(n). Big O notation doesn't care about algorithms or worst cases.
It is an upper bound on the growth. But _what_ you're upper bounding is not specified.
Big-O notation works on mathematical functions. In the usual case, the function you're upper bounding is the worst-case run time of a procedure for a given input size. But that's _far_ from the only case.
The Big O upper bound can be for the amortized worst case, though.
The other notations (small o, big Omega, big Theta) are about strict scaling (small o doesn't allow even a constant factor to close the gap) and pinching (big Theta means there are two constant factors, one forming a lower bound and the other an upper bound; big Omega on its own is just the lower bound).
True, the word amortized (i.e. average) is left out here, which still makes people's usage imprecise. But there is a decent argument that amortized is more meaningful than worst-case. Because amortized gives expected throughput and latency, whereas worst-case says very little about throughput and only informs you on the 99th percentile of latency.
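As an illustration of the difference (a toy sketch, not any particular library's implementation): a doubling dynamic array has O(n) worst-case cost for a single append, but O(1) amortized cost over a long sequence of appends.

```python
class GrowArray:
    """Toy dynamic array that doubles its capacity when full, tracking element copies."""
    def __init__(self):
        self.capacity = 1
        self.size = 0
        self.buf = [None]
        self.total_copies = 0   # elements moved across all resizes
        self.worst_single = 0   # most copies triggered by one append

    def append(self, item):
        if self.size == self.capacity:
            self.capacity *= 2
            new_buf = [None] * self.capacity
            new_buf[:self.size] = self.buf          # copy existing elements over
            self.total_copies += self.size
            self.worst_single = max(self.worst_single, self.size)
            self.buf = new_buf
        self.buf[self.size] = item
        self.size += 1

arr = GrowArray()
n = 1 << 16
for i in range(n):
    arr.append(i)

print(arr.worst_single)        # ~ n/2: one append can cost O(n)
print(arr.total_copies / n)    # ~ 1: amortized O(1) copies per append
```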