#### Simple demo

The code below is plotting roughly 400,000 points using 'scattergl' in plotly. When you enable 'Decimate', it plots the same data set using the decimation technique described in this post.

#### What is decimation?

Here, I only consider plots that linear interpolate between points ('lines' in plotly).

Imagine you have a plot with three points: (0,0), (1,0), and (2,2). When you plot that, it's just a line from (0,0) to (1,0) and another line from (1,0) to (2,2). If your plot is 500px wide with no margins, that line will stretch from pixel 0 to pixel 500.

Now, imagine you have a plot with 4 points: (0,0), (1,1), (2,2), and (10k,0). If it's 500 px wide, what line will be drawn? x = 0 is in pixel 500*(0/10k), or 0 (since you round). x = 1 is in pixel 500*(1/10k), or 0 (since you round). x = 2 is in pixel 500*(2/10k), or 0 (since you round). x = 10k is in pixel 500*(10k/10k), or 500. Thus, you get a vertical line from (0,0) to (2,2), and then a line from (2,2) to (10k,0).

The point (1,0) lies on that vertical line. There is no reason to plot it. That's the basic idea here. If a point will already be plotted by the linear interpolation in the plot, remove it when you go to plot.

There are many different meanings of decimate, but here I use it to mean keep only the max and min (by y-value) point in each horizontal pixel.

#### Why is it faster?

Consider the case in the demo above. That plot is only 500px wide. At most then, you have ~1000 points that can really be rendered using the above algorithm. Sending 1000 points to plotly to render is much, much faster than sending 400,000 points to it.

There is some overhead for decimating the traces. From experimenting on my machine, it seems to be faster when there are >= 5x as many points as there are horizontal pixels in the plot. This will vary by decimation algorithm, plot specifics, etc., so you'd need to play around with it.

#### What's the actual algorithm?

You can just copy it from the code if you want. It's not plotly-specific. The basics though are:

- Check if # of points is large enough to be worth decimating (return if not)
- Check if points are sorted by x (return if not)
- Find the subset you've zoomed on (can ignore if you don't want to support zoom)
- Set initial point and first bin (bin == horizontal pixel)
- Walk through all points in bin
- track max and min
- If leaving bin, store off max and min, and reset for next bin
- When done with all bins, set final point

That's it.

There are other ways you can do this. For example, if your data set is like the one in my 'what is decimation' section, you might get a large benefit with even small amounts of data.

## 0 comments:

## Post a Comment