2014-05-28

Worst case vs. Average case

For years and years everyone's been focused on the average case and amortized analysis. Quicksort is most programmers' favorite sort, splay trees are popular in some groups due to their amortized performance, etc.
I'd like to propose an alternative: that for most problems this view is incorrect, and that we should be focused on the worst-case performance rather than the average. The reason for this is that ALL code is real-time.

The definition of realtime is:
A program is realtime if it must complete its task within a specified real-world time bound T.

If I'm playing a videogame and 1 out of every 100 frames takes longer than a hundredth of a second to render, I can visually see a flicker/stall in the video. As a result the videogame looks crappy and I hate playing it. This is a clear case where our software is realtime: if it takes longer than a certain amount of time, then it's not performing correctly.

Now let's say I go to visit a website. I sit there for 5 minutes waiting for it to load and it never does, so I give up on the website, close the browser, and never visit it again. Clearly this website's behavior was incorrect, and as a result it lost a user. So part of the correctness criteria for that website is a realtime bound. That website's server code is real-time!

Worst case vs. Average case

If I need my code to meet a real-time bound, always, then I need to make sure that its worst case is less than that bound. Thus, we want to choose algorithms with low variability in time, possibly accepting a worse average case, in exchange for ensuring that the worst case falls within our limitations.
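
To make that trade-off concrete, here's a minimal sketch in Python. It compares a deliberately naive first-element-pivot quicksort (great on random input, quadratic on sorted input) against heapsort built on the standard heapq module, which stays O(n log n) in the worst case at the price of a worse constant. The input sizes are just illustrative.

    import heapq
    import random
    import sys
    import time

    sys.setrecursionlimit(10000)  # the naive quicksort recurses n deep on sorted input

    def naive_quicksort(xs):
        # First-element pivot: O(n log n) on random input, O(n^2) on sorted input.
        if len(xs) <= 1:
            return xs
        pivot, rest = xs[0], xs[1:]
        return (naive_quicksort([x for x in rest if x < pivot]) + [pivot] +
                naive_quicksort([x for x in rest if x >= pivot]))

    def heapsort(xs):
        # O(n log n) in the worst case, with a worse constant factor on average.
        heapq.heapify(xs)
        return [heapq.heappop(xs) for _ in range(len(xs))]

    def timed(sort, data):
        start = time.perf_counter()
        sort(list(data))
        return time.perf_counter() - start

    n = 2000
    random_input = [random.random() for _ in range(n)]
    sorted_input = sorted(random_input)

    for name, data in (("random", random_input), ("sorted", sorted_input)):
        print("%-6s quicksort %.4fs  heapsort %.4fs"
              % (name, timed(naive_quicksort, data), timed(heapsort, data)))

On random input the naive quicksort wins; on sorted input its time blows up while heapsort barely moves. That flat worst case is exactly the property we're buying.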

This concept is especially true in distributed systems. In a normal system, if our worst case is highly unlikely it won't impact us often at all. On the other hand, if rendering the website the user wanted requires hitting 100,000 servers, then it's pretty likely that the worst case that only happens 0.0001% of the time will trigger on at least ONE of those servers. Examples of this include search engines, which fan each request out to many servers, each responsible for part of the space of all data being searched.
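
A quick back-of-the-envelope sketch of that fan-out effect, assuming the slow case hits each server independently (the 0.0001% figure is the one from the paragraph above; the fan-out sizes are hypothetical):

    # Probability that at least one of the fanned-out calls hits the rare slow path.
    def p_any_slow(fanout, p_slow):
        return 1.0 - (1.0 - p_slow) ** fanout

    p_slow = 0.000001  # 0.0001% chance of the bad case on any single server
    for fanout in (1, 100, 10000, 100000):
        print("fanout %6d -> %.2f%% of requests hit the bad case"
              % (fanout, 100 * p_any_slow(fanout, p_slow)))

At a fan-out of 100,000 servers, roughly 1 in 10 user requests sees the "one in a million" case somewhere.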

Surprisingly this even comes up on back-end systems. Let's say I'm building an enormous dataset on a large cluster, and I do this once a day. If I'm doing a mapreduce and 1 out of every 1000 shards of my mapreduce is running really slowly, then my total mapreduce time is bounded by those worst-case shards, not the average. And I have a realtime bound of 1 day (probably less, since it'd be nice if I had a little leeway).
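
A toy simulation of that straggler effect (the shard count and runtime distribution here are made up purely for illustration; only the shape matters):

    import random

    def shard_hours():
        # Hypothetical: most shards take about an hour, but 1 in 1000 runs ~10x slower.
        base = random.gauss(1.0, 0.1)
        return base * (10.0 if random.random() < 0.001 else 1.0)

    shards = [shard_hours() for _ in range(100000)]
    print("average shard: %.2f hours" % (sum(shards) / len(shards)))
    print("slowest shard: %.2f hours" % max(shards))  # the job can't finish before its slowest shard does

The average barely notices the slow shards; the job's completion time is dominated by them.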

On top of that, while waiting for those couple of shards to finish their work we're probably wasting the rest of the cluster's computational ability - it's just idle. We could try to cram in other workloads, but that gets really complicated. It's much easier if you can flatten every workload as much as possible.

Lies and Deceit

I've just told a lie. In practice, on really large systems, what matters is often actually the 99.99% latency (or some number of 9's anyway), and you may be able to just throw out or time out the remaining cases. You probably can't do this at every layer though! So, as you distribute and add more layers to a system, you still need to worry more and more about the worst-case performance.
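
One way to see why you can't keep punting to the next layer: if a page is only fast when all of its backend calls are fast, the percentile each backend has to hit grows quickly with fan-out. A rough sketch, under the (simplifying) assumptions that every call must be fast and the calls behave independently:

    # Per-backend "fast" probability needed so the whole page is fast page_target
    # of the time, assuming all `fanout` calls must be fast and are independent.
    def per_backend_target(page_target, fanout):
        return page_target ** (1.0 / fanout)

    for fanout in (1, 10, 100, 1000):
        print("fanout %4d -> each backend needs %.5f"
              % (fanout, per_backend_target(0.99, fanout)))

To give users a 99% experience across 100 backends, each backend has to deliver roughly 99.99% - which is how a tail percentile at one layer turns into something very close to the worst case at the layer below.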

Note for wankers: I'm using big O rather than big theta because it's easier to type, and while the distinction would be formally useful here, it's not widely used.

The point

When we start thinking of algorithms this way, the cost of mis-sizing a hashtable is an O(1) insert suddenly jumping to O(n) so we can resize. Our quicksort looks pretty bad at O(n²). Balanced trees, the extra memory needed for mergesort, or the extra moves for heapsort start to look a lot more tempting.
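
You can watch that amortized-vs-worst-case gap directly. This sketch times every append to a Python list, which (like a hashtable) occasionally has to reallocate and copy its backing array; the exact numbers depend on your machine and OS noise, but the worst single append reliably lands orders of magnitude above the average:

    import time

    xs = []
    total = 0.0
    worst = 0.0
    n = 1000000
    for i in range(n):
        start = time.perf_counter()
        xs.append(i)  # amortized O(1), but O(n) whenever the backing array grows
        elapsed = time.perf_counter() - start
        total += elapsed
        worst = max(worst, elapsed)

    print("average append: %8.0f ns" % (1e9 * total / n))
    print("worst append:   %8.0f ns" % (1e9 * worst))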

That is not to say average-case, or better yet amortized, analysis is not useful. If you do 100,000 operations on your dict and all you care about is the time bound on that total set of operations, then the amortized bound is exactly the right tool - but this is only true if no one else depends on the intermediate results.

So think about the real bounds of the code you are writing, and whenever someone says they want something to be "faster", stop for a second and ask yourself whether the worst case or the average is really what needs to improve.




6 comments:

  1. I always like to point out that any section of code that uses dynamic allocation can never be considered real-time, because there is no way to know what state the heap is in. It gets even worse in garbage collected environments because the collector may also need to run to make room for a new allocation.
    - orie

    Replies
    1. for the love of god, did google just use the captcha to have me label a home address for them?

    2. blargh, sorry about the captcha.

      As for the dynamic memory allocation... I used to agree with you, but I'm less convinced these days.

      There's nothing special about memory allocation really. You control the library, so you can control all aspects of it if you so desire. If you allocate large enough blocks it'll just be memory mapping underneath anyway, and if not, most libraries are log(n) in the size of the *free* heap, so if you don't free much you can be confident it'll be fast.

      That leaves the kernel mapping pages to you. It's true that it's not guaranteed to be fast, but it's guaranteed to be as fast as fault-loading your executable or your stack, which you can't control either. Of the things that screw you, dynamic memory allocation isn't really the worst one.

      What really makes realtime a joke is prefetching, branch prediction, and nondeterministic caching. That's if you control the whole machine; if not, you've also got competing workloads blowing out your cache lines, scheduling issues, IO contention, etc. Those screw you hard and can cause a piece of code's runtime to shoot up by 100 or 1000x compared to a random benchmark pretty easily.

    3. It's not the allocation that gets you - it's fragmentation and size lookups.
      You can structure your program to avoid those and still have reasonable bounds - for example, by using fixed-sized slabs (e.g. ropes or cords instead of contiguous strings) or region-scoped arenas.

      Actually writing programs with bounded memory usage, though, would require much better languages and/or static analysis tools than are currently usable.

  2. I think worst case is pretty much irrelevant if it never actually happens. Also, it tends to be obnoxious to even figure out what the worst case is. For example, if X and Y take N and M time worst case, then the worst case for X;Y is not necessarily N+M.

    Replies
    1. Then I suspect you haven't worked:
      1) in robotics, where you loop multiple times a second and the worst thing that can happen usually does.
      2) in extremely large systems, where everything that can go wrong probably is going wrong somewhere right now in your system.

      There's also a fundamental assumption here that the goal is to write software that works reliably. Saying "this case will never happen" is antithetical to that goal. If we want to keep moving forward towards systems we can rely on, at some point we need to aim higher... as the 9's get higher, the "worst case" becomes more and more relevant.

      Note, I did give a nod to your point though in my "lies and deceit" segment. If you are aiming for 5 9's and your worst case is outside of that, *shrug*, it may well not matter. But then you need to worry about your almost-worst case and make sure that's inside your bounds. You still need to think this way; it's just past the most rudimentary approach.
