Dots vs lines, which is better? When should I use one over the other? Should I even care? The boss just wants a pretty picture in the report, so I’ll use whatever the default is and go to lunch early.
In my opinion, this is a question that deserves careful thought when visualising test results. After all, it normally takes quite a lot of effort to get performance test environments built and configured and to get performance tests running correctly; it would be a shame not to glean as much information as possible from the results.
Here at Ideal Technology, Tableau is our tool of choice for visualising result data. When you plot time series data in Tableau, by default it will usually choose to represent the data with lines. Depending on the data, this may be the desirable choice. However, when the sample density gets beyond a certain point, the lines get far too busy and can actually hide important information in the data.
Take the below plots of response time over time; do they tell the same story?
These both actually represent the same data; the only difference is that the lower plot joins the data points with a line. When we compare them, the lower plot (lines) appears to show the system behaving much more slowly than the top plot does. In addition, the top plot clearly shows a couple of interesting patterns that are hard to see, or even invisible, when you plot with lines.
- The horizontal banding; in performance analysis this is a telltale sign of something requiring more analysis. It illustrates something happening in constant time, consistently, which is generally unexpected. This could be issues like connection timeouts or queuing problems.
- Response time distribution. The top plot clearly shows the vast majority of the requests were completing below 2 seconds. With the bottom plot, it appears as if the majority of responses were around 8 seconds; overstating response times by a factor of 4.
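To see why joining the points exaggerates the slow responses, here is a minimal Python sketch on synthetic data (not the test results plotted above). It assumes a hypothetical workload where about 3% of requests hit a fixed 8 second timeout and the rest finish in under 2 seconds, and a plot that is 800 pixel columns wide; the function names are made up for illustration. A line plot connects consecutive samples, so within each pixel column the ink stretches from the smallest to the largest value in that column. The sketch compares the share of requests that are actually fast with the share of columns where the line would reach the 8 second band.

```python
import random


def simulate_response_times(n, timeout_share=0.03, timeout_s=8.0, seed=42):
    """Synthetic samples in time order: mostly fast, a few at a fixed timeout.

    timeout_share and timeout_s are illustrative assumptions, not measurements.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        if rng.random() < timeout_share:
            samples.append(timeout_s)              # constant-time band
        else:
            samples.append(rng.uniform(0.1, 1.9))  # normal fast response
    return samples


def share_below(samples, threshold_s):
    """Fraction of samples that completed faster than threshold_s."""
    return sum(1 for s in samples if s < threshold_s) / len(samples)


def column_envelopes(samples, columns):
    """Split time-ordered samples into pixel columns; return (min, max) per column.

    A line joining consecutive points covers this whole range in each column.
    """
    size = len(samples) // columns
    if size == 0:
        raise ValueError("more columns than samples")
    return [
        (min(chunk), max(chunk))
        for chunk in (samples[i * size:(i + 1) * size] for i in range(columns))
    ]


samples = simulate_response_times(80_000)
fast_share = share_below(samples, 2.0)

envelopes = column_envelopes(samples, 800)
columns_reaching_band = sum(1 for _, hi in envelopes if hi >= 8.0) / len(envelopes)

print(f"requests under 2 s:            {fast_share:.1%}")
print(f"columns where line reaches 8 s: {columns_reaching_band:.1%}")
```

With roughly 100 samples per column, even a small share of slow requests is enough for the line to touch the slow band in most columns, which is why the line plot reads as if the system were slow nearly all the time.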
Now for another example; this time on a much more sparse dataset.
In this case it is easy to see that using lines adds to the information presented, particularly where data points in different series are close together or overlap (such as the yellow and green series for Monday, Tuesday and Wednesday above).
And so, to answer the question: neither is better overall. Each has its place. Use individual dots for lots of data over time, and lines for less data over discrete intervals or buckets. But you knew that already, right …?
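If you wanted to turn that rule of thumb into something mechanical, a rough sketch might look like the following. The `suggest_mark` function and its one-point-per-pixel threshold are purely illustrative assumptions of mine; this is not a Tableau setting, and the right cut-off depends on your data and plot size.

```python
def suggest_mark(point_count, plot_width_px, max_points_per_px=1.0):
    """Suggest 'dots' for dense series and 'lines' for sparse ones.

    max_points_per_px is a hypothetical threshold chosen for illustration.
    """
    if plot_width_px <= 0:
        raise ValueError("plot_width_px must be positive")
    density = point_count / plot_width_px
    return "dots" if density > max_points_per_px else "lines"


# A dense 3 hour test run versus a handful of daily buckets.
print(suggest_mark(800_000, 1_000))
print(suggest_mark(7, 1_000))
```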
And a quick note about how much data we often deal with. The first plot above has approximately 800,000 data points and was generated from a single 3 hour test using 50 virtual users. We use Tableau as it can plot this amount of data very quickly and cleanly. In fact, on a recent client engagement I created the below plot with around 6,000,000 unaggregated data points. Try doing that in Excel …