— Pixels Commander


The state of HTML content rasterization: approaches, problems, solutions

Have you ever needed to rasterize HTML content in the browser? Probably not, but that is no reason to stop reading. Integration testing, page thumbnails, remote printing, GPU rendering: this is just a short list of tasks that require HTML rasterization. Sooner or later you will run into it. Let's look at the available tools and approaches, the current problems in this field, and think about a better way to rasterize HTML content.

How can we rasterize right now?

There is no standard API for drawing HTML content on a canvas or converting it to an image, but there are a few workarounds:

  • Inserting the content into an SVG foreignObject tag and then converting the SVG to a data URI
  • Drawing the content to a canvas based on the CSS styles and markup grabbed from the page. This effectively means reimplementing the browser's entire layout and styling model in JavaScript
  • Rasterizing with PhantomJS

Let's look at the pros and cons of each of these approaches.

Exporting content with SVG

This method was first explained in Robert O'Callahan's article “Drawing DOM Content To Canvas” and implemented as rasterizeHTML.js by Christoph Burgmer. In essence, it relies on the ability to inject HTML content into an SVG foreignObject tag and then convert the SVG to a data URI. The main drawback of this technique is the absence of IE support. You must also respect the same-origin policy for resources such as images, otherwise the content will not be rendered. A workaround is to convert the resources to data URIs, which removes their origin. Overall, this technique is quite easy to implement, but the lack of IE support makes it a bad choice for public projects.
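The core of the foreignObject trick can be sketched in a few lines. This is a minimal illustration, not the code of rasterizeHTML.js: the function names buildSvgDataUri and drawHtmlToCanvas are made up for this example, and the markup passed in is assumed to be well-formed XHTML, because it is parsed as XML inside the SVG.

```javascript
// Wrap an XHTML fragment in an SVG <foreignObject> and return the SVG as a data URI.
// Assumption: `xhtml` is well-formed XML (closed tags, escaped entities).
function buildSvgDataUri(xhtml, width, height) {
  const svg =
    '<svg xmlns="http://www.w3.org/2000/svg" width="' + width + '" height="' + height + '">' +
    '<foreignObject width="100%" height="100%">' +
    '<div xmlns="http://www.w3.org/1999/xhtml">' + xhtml + '</div>' +
    '</foreignObject>' +
    '</svg>';
  return 'data:image/svg+xml;charset=utf-8,' + encodeURIComponent(svg);
}

// Browser-only step: load the data URI as an image and paint it onto a canvas.
function drawHtmlToCanvas(xhtml, canvas) {
  return new Promise(function (resolve, reject) {
    const img = new Image();
    img.onload = function () {
      canvas.getContext('2d').drawImage(img, 0, 0);
      resolve(canvas);
    };
    img.onerror = reject;
    img.src = buildSvgDataUri(xhtml, canvas.width, canvas.height);
  });
}
```

In a page this could be called as drawHtmlToCanvas('<p>Hello</p>', document.querySelector('canvas')). External images referenced from the fragment would still run into the same-origin restriction described above.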

Drawing HTML content to Canvas based on page markup and styles

This is exceptionally hard to implement, since all the rendering logic has to be written in JavaScript. The good news is that Niklas von Hertzen has already done it, and the result is available as the html2canvas library. With this approach you still have the same-origin issue, as well as a lack of support for Retina displays (hopefully this will be fixed soon). What makes html2canvas really strong is its wide browser compatibility, which makes it the only choice for real-world projects.

Rasterization with PhantomJS

This method differs from the previous two in that it requires loading the page in the PhantomJS headless browser and then calling webpage.render(fileName). Obviously this does not fit client-side development; it can be used on the server side or during integration testing. However, even for tests there is an issue: PhantomJS is a single-engine implementation whose rendering result may differ from that of other browsers. This problem is common to every approach we have reviewed.

Common problem – inconsistency

There is no guarantee that the image you get with any of the available approaches will precisely match what the user sees in the browser. html2canvas ships with a good set of integration tests, but this does not solve the problem completely.

Common problem – performance

All these techniques share one more issue: a redundant amount of work. Using any of them means building the content a second time, and PhantomJS also performs file I/O, which is even worse. As a result, rasterization is inefficient and takes far longer than it should, since the rasterization result is already known to the browser! We only need to get the image from memory…

Rasterization libraries performance comparison

The benchmark measures rasterization time for a document of average complexity containing 282 DOM nodes (10 of which are medium-sized images and 1 is a fullscreen background). The test machine is a MacBook Pro with a 2.3 GHz Intel Core i7 and 16 GB of 1600 MHz DDR3 memory.

Results:

[Diagram: HTML rasterization libraries performance comparison]

  • html2canvas 170 msec
  • rasterizeHTML.js 450 msec

As we can see, html2canvas is about 2.6 times faster, but it still exceeds the maximum frame budget of 16 msec by more than 10 times. This leads to jank and delays in interactive applications and can be a real disaster on weaker devices.
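To put these numbers in perspective, the arithmetic can be written out. The sketch below assumes a 60 Hz display, so the frame budget is 1000 / 60 msec (the 16 msec figure used above is the rounded value), and it uses only the two timings from the benchmark. The names FRAME_BUDGET_MS and framesSpent are introduced here for illustration.

```javascript
// Frame budget on an assumed 60 Hz display, in milliseconds.
const FRAME_BUDGET_MS = 1000 / 60;

// Timings taken from the benchmark results above.
const measuredMs = { html2canvas: 170, rasterizeHTML: 450 };

// How many frames' worth of time a single rasterization consumes.
function framesSpent(ms) {
  return ms / FRAME_BUDGET_MS;
}

for (const [name, ms] of Object.entries(measuredMs)) {
  console.log(name + ': ' + ms + ' msec, about ' + framesSpent(ms).toFixed(1) + ' frames');
}
```

In other words, a single synchronous rasterization on the main thread costs roughly 10 frames with html2canvas and roughly 27 frames with rasterizeHTML.js on the test machine.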

Native rasterization API as a solution

Taking all the problems listed above into account, a draft proposal for a native rasterization API was developed. In short, it proposes exposing a document.rasterize(element, options) method that returns an ImageData object, which can then be saved to a file or displayed on a canvas or in an IMG tag. To address security concerns, it assumes that all inputs are rendered empty and that the same-origin policy is respected for resources and iframes. You can find the proposal here.

The rasterization API needs your comments and expertise! Join us!

P.S. Mozilla has already experimented in this field by implementing the drawWindow API, but it lacks flexibility, so there was little motivation to standardise it and implement it in other browsers.
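For illustration, here is how calling code might look if such a method existed. This is purely hypothetical: document.rasterize is only a draft proposal and is not available in any browser, the shape of the options argument is not described here (so an empty object is passed), and the helper name rasterizeToCanvas is invented for this sketch. The document object is passed in as a parameter so that the missing method can be detected.

```javascript
// Hypothetical usage of the proposed document.rasterize(element, options).
// Returns false when the method is missing, which is the case in every browser today.
function rasterizeToCanvas(doc, element, canvas) {
  if (typeof doc.rasterize !== 'function') {
    return false; // proposal not implemented: fall back to a library
  }
  // Assumption: the call returns an ImageData object, as the proposal describes.
  const imageData = doc.rasterize(element, {});
  canvas.width = imageData.width;
  canvas.height = imageData.height;
  canvas.getContext('2d').putImageData(imageData, 0, 0);
  return true;
}
```

A page would call rasterizeToCanvas(document, someElement, someCanvas) and switch to one of the library-based approaches when it returns false.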