Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

I'm trying to use the following effect on a HTML5 game: http://somethinghitme.com/projects/metaballs/

But since its a game (as opposed to graphical demo) I have tighter FPS requirements, I need time to calculate the physics and the some other things and my biggest bottleneck is the code for the metaballs.

The following code is what I got after stripping the original code for performance, its not as pretty but it's enough for my purposes:

ParticleSpawner.prototype.metabilize = function(ctx) {
    var imageData = this._tempCtx.getImageData(0,0,900,675),
    pix = imageData.data;
    this._tempCtx.putImageData(imageData,0,0);

    for (var i = 0, n = pix.length; i <n; i += 4) {
        if(pix[i+3]<210){
            pix[i+3] = 0;
        }
    }

    //ctx.clearRect(0,0,900,675);
    //ctx.drawImage(this._tempCanvas,0,0);
    ctx.putImageData(imageData, 0, 0);
}

I had another loop on my code and I managed to increase its performance by using the technique described on the following link http://www.fatagnus.com/unrolling-your-loop-for-better-performance-in-javascript/ but using the same on this actually decreases the performance (maybe I did it wrong?)

I also researched web workers to see if I could split the load (since the code runs for each pixel individually) but the example I found on this link http://blogs.msdn.com/b/eternalcoding/archive/2012/09/20/using-web-workers-to-improve-performance-of-image-manipulation.aspx actually runs slower when using web workers.

What else can I do? Is there a way to remove the branching from the loop? Another way to unroll it? Or is this the best I can do?

Edit:

This is some of the surrounding code:

ParticleSpawner.prototype.drawParticles = function(ctx) {
    this._tempCtx.clearRect(0,0,900,675);

    var iterations = Math.floor(this._particles.getNumChildren() / 8);
    var leftover = this._particles.getNumChildren() % 8;
    var i = 0;

    if(leftover > 0) {
        do {
            this.process(i++);
        } while(--leftover > 0);
    }

    do {
        this.process(i++);
        this.process(i++);
        this.process(i++);
        this.process(i++);
        this.process(i++);
        this.process(i++);
        this.process(i++);
        this.process(i++);
    } while(--iterations > 0);

    this.metabilize(ctx);

}

and the process method:

ParticleSpawner.prototype.process = function(i) {
    if(!this._particles.getChildAt(i)) return;
    var bx = this._particles.getChildAt(i).x;
    var by = this._particles.getChildAt(i).y;

    if(bx > 910 || bx < -10 || by > 685) {
        this._particles.getChildAt(i).destroy();
        return;
    }

    //this._tempCtx.drawImage(this._level._queue.getResult("particleGradient"),bx-20,by-20);

    var grad = this._tempCtx.createRadialGradient(bx,by,1,bx,by,20);
    this._tempCtx.beginPath();

    var color = this._particles.getChildAt(i).color;
    var c = "rgba("+color.r+","+color.g+","+color.b+",";

    grad.addColorStop(0, c+'1.0)');
    grad.addColorStop(0.6, c+'0.5)');
    grad.addColorStop(1, c+'0)');

    this._tempCtx.fillStyle = grad;
    this._tempCtx.arc(bx, by, 20, 0, Math.PI*2);
    this._tempCtx.fill();

};

As can be seen, I tried using images instead of gradient shapes, but the performance was worse, I also tried to use ctx.drawImage instead of putImageData, but it loses the alpha and is not faster. I can't think of an alternative to achieve the desired effect. The current code runs perfectly on Google Chrome, but Safari and Firefox are really slow. Is there anything else I can try? Should I just give up on those browsers?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
198 views
Welcome To Ask or Share your Answers For Others

1 Answer

Updated

Some techniques that can be applied

Here are some optimization techniques that can be applied to make this work more fluent in FF and Safari as well.

That being said: Chrome's canvas implementation is very good and much faster (at the moment) than the bone provided by Firefox and Safari. The new Opera uses the same engine as Chrome and is (about?) equally as fast as Chrome's.

For this to work fine cross-browser some compromises needs to be made and as always quality will suffer.

The techniques I try to demonstrate are:

  • Cache a single gradient that is used as meta ball basis
  • Cache everything if possible
  • Render in half resolution
  • Use drawImage() to update main canvas
  • Disable image smoothing
  • Use integer coordinates and sizes
  • Use requestAnimationFrame()
  • Use while loops as often as you can

Bottlenecks

There is a high cost in generating a gradient for each metaball. So when we cache this once and for all we will just by doing that notice a huge improvement in performance.

The other point is getImageData and putImageData and the fact that we need to use a high-level language to iterate over a low-level byte array. Fortunately the array is typed array so that helps a little but we won't be able to get much more out of it unless we sacrifice more quality.

When you need to squeeze everything you can the so-called micro-optimizations becomes vital (these has an undeserved bad reputation IMO).

From the impression of your post: You seem to be very close to have this working but from the provided code I cannot see what went wrong so-to-speak.

In any case - Here is an actual implementation of this (based on the code you refer to):

Fiddle demo

Pre-calculate variables in the initial steps - everything we can pre-calculate helps us later as we can use the value directly:

var ...,

// multiplicator for resolution (see comment below)
factor = 2,
width = 500,
height = 500,

// some dimension pre-calculations
widthF = width / factor,
heightF = height / factor,

// for the pixel alpha
threshold = 210,
thresholdQ = threshold * 0.25,

// for gradient (more for simply setting the resolution)
grad,
dia = 500 / factor,
radius = dia * 0.5,

...

We use a factor here to reduce the actual size and to scale the final render to on-screen canvas. For each 2 factor you save 4x pixels exponentially. I preset this to 2 in the demo and this works great with Chrome and good with Firefox. You might even be able to run factor of 1 (1:1 ratio) in both browsers on a better spec'ed machine than mine (Atom CPU).

Init the sizes of the various canvases:

// set sizes on canvases
canvas.width = width;
canvas.height = height;

// off-screen canvas
tmpCanvas.width = widthF;
tmpCanvas.height = heightF;

// gradient canvas
gCanvas.width = gCanvas.height = dia

Then generate a single instance of a gradient that will be cached for the other balls later. Worth to notice: I initially used only this to draw all the balls but later decided to cache each ball as an image (canvas) instead of drawing and scaling.

This has a memory penalty but increases the performance. If memory is of importance you can skip the caching of rendered balls in the loop that generates them and just drawImage the gradient canvas instead when you need to draw the balls.

Generate gradient:

var grad = gCtx.createRadialGradient(radius, radius, 1, radius, radius, radius);
grad.addColorStop(0, 'rgba(0,0,255,1)');
grad.addColorStop(1, 'rgba(0,0,255,0)');
gCtx.fillStyle = grad;
gCtx.arc(radius, radius, radius, 0, Math.PI * 2);
gCtx.fill();

Then in the loop that generates the various metaballs.

Cache the calculated and rendered metaball:

for (var i = 0; i < 50; i++) {

    // all values are rounded to integer values
    var x = Math.random() * width | 0,
        y = Math.random() * height | 0,
        vx = Math.round((Math.random() * 8) - 4),
        vy = Math.round((Math.random() * 8) - 4),
        size = Math.round((Math.floor(Math.random() * 200) + 200) / factor),

        // cache this variant as canvas
        c = document.createElement('canvas'),
        cc = c.getContext('2d');

    // scale and draw the metaball
    c.width = c.height = size;
    cc.drawImage(gCanvas, 0, 0, size, size);

    points.push({
        x: x,
        y: y,
        vx: vx,
        vy: vy,
        size: size,
        maxX: widthF + size,
        maxY: heightF + size,
        ball: c  // here we add the cached ball
    });
}

Then we turn off interpolating for images that are being scaled - this gains even more speed.

Note that you can also use CSS in some browsers to do the same as here.

Disable image smoothing:

// disable image smoothing for sake of speed
ctx.webkitImageSmoothingEnabled = false;
ctx.mozImageSmoothingEnabled = false;
ctx.msImageSmoothingEnabled = false;
ctx.oImageSmoothingEnabled = false;
ctx.imageSmoothingEnabled = false;  // future...

Now the non-critical parts are done. The rest of the code utilizes these tweaks to perform better.

The main loop now looks like this:

function animate() {

    var len = points.length,
        point;

    // clear the frame of off-sceen canvas
    tmpCtx.clearRect(0, 0, width, height);

    while(len--) {
        point = points[len];
        point.x += point.vx;
        point.y += point.vy;

        // the checks are now exclusive so only one of them is processed    
        if (point.x > point.maxX) {
            point.x = -point.size;
        } else if (point.x < -point.size) {
            point.x = point.maxX;
        }

        if (point.y > point.maxY) {
            point.y = -point.size;
        } else if (point.y < -point.size) {
            point.y = point.maxY;
        }

        // draw cached ball onto off-screen canvas
        tmpCtx.drawImage(point.ball, point.x, point.y, point.size, point.size);
    }

    // trigger levels
    metabalize();

    // low-level loop
    requestAnimationFrame(animate);
}

Using requestAnimationFrame squeezes a little more of the browser as it is intended to be more low-level and more efficient than just using a setTimeout.

The original code checked for both edges - this is not necessary as a ball can only cross one edge at the time (per axis).

The metabolize function is modified like this:

function metabalize(){

    // cache what can be cached
var imageData = tmpCtx.getImageData(0 , 0, widthF, heightF),
        pix = imageData.data,
        i = pix.length - 1,
        p;

    // using a while loop here instead of for is beneficial
    while(i > 0) {
    p = pix[i];
        if(p < threshold) {
    pix[i] = p * 0.1667; // multiply is faster than div
    if(p > thresholdQ){
        pix[i] = 0;
    }
        }
    i -= 4;
    }

    // put back data, clear frame and update scaled
    tmpCtx.putImageData(imageData, 0, 0);
    ctx.clearRect(0, 0, width, height);
    ctx.drawImage(tmpCanvas, 0, 0, width, height);
}

Some micro-optimizations that actually helps in this context.

We cache the pixel value for alpha channel as we use it more than two times. Instead of diving on 6 we multiply with 0.1667 as multiplication is a tad faster.

We have already cached tresholdQ value (25% of threshold). Putting the cached value inside the function would give a little more speed.

Unfortunately as this method is based on the alpha channel we need to clear also the main canvas. This has a (relatively) huge penalty in this context. The optimal would be to be able to use solid colors which you could "blit" directly but I didn't look Into that aspect here.

You could also had put the point data in an array instead of as objects. However, since there are so few this will probably not be worth it in this case.

In conclusion

I have probably missed one or two (or more) places which can be optimized further but you get the idea.

And as you can see the modified code runs several times faster than the original code mainly due to the compromise we make here with quality and some optimizations particularly with the gradient.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
...