WebGPU compute performance in comparison to WebGL
WebGPU is the successor to WebGL: a brand-new API for utilizing GPUs in the browser. It is promised to be available in regular Chrome in Q1 2022. In comparison to WebGL, WebGPU promises better performance and better compatibility with modern hardware, but the most recognizable feature of WebGPU is a dedicated API for performing computations on the GPU.
Doesn't WebGL have the same feature?
Yes and no. WebGL has no dedicated API for computation, but there is a hack that makes it possible. Data is converted into an image, the image is uploaded to the GPU as a texture, and the texture is rendered synchronously with a pixel shader that performs the actual computation. The result of the computation then sits as a set of pixels on a <canvas> element, which we have to read synchronously with readPixels, and the color codes have to be converted back into our data. Looks like an inefficient mess, right?
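To make the overhead of that round trip concrete, here is a minimal sketch of the pack/unpack step in plain JavaScript. The function names are illustrative (they are not from the article's benchmark code); the point is that every value has to be smuggled through a byte-per-channel pixel representation in both directions:

```javascript
// Sketch of the data round trip the WebGL hack forces on us.
// One 32-bit float occupies one RGBA texel (4 bytes).

// Pack a Float32Array into RGBA bytes, the way GPGPU-over-WebGL
// code smuggles data into a texture before uploading it.
function packFloatsToPixels(floats) {
  // Reinterpret the float buffer as raw bytes: 4 bytes -> 1 RGBA texel.
  return new Uint8Array(floats.buffer.slice(0));
}

// Reinterpret the pixels read back from the canvas as floats again,
// mimicking the decode step after the synchronous readback.
function unpackPixelsToFloats(pixels) {
  return new Float32Array(pixels.buffer.slice(0));
}

const input = new Float32Array([1.5, -2.25, 3.125]);
const pixels = packFloatsToPixels(input);

console.log(pixels.length);      // 12 (3 floats -> 3 RGBA texels)
console.log(Array.from(unpackPixelsToFloats(pixels))); // [ 1.5, -2.25, 3.125 ]
```

In the real hack, the readback in the middle is a blocking GPU sync, which is where most of the cost hides.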
How is WebGPU different?
The API WebGPU provides for computations (compute shaders) differs in ways whose importance is easy to miss; however, it empowers developers with entirely new capabilities. This is how it works:
The differences
- Data is uploaded to the GPU as a buffer; we do not convert it to pixels, so it is cheaper
- Computation is performed asynchronously and does not block the JS main thread (say hi to real-time post-processing and complex physics simulations at 60 FPS)
- We do not need a canvas element, so we avoid its size limitations
- We do not perform the expensive, synchronous readPixels call
- We do not spend time converting pixel values back into data
So WebGPU’s promise is that we can compute without blocking the main thread and compute faster, but how much faster?
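As a hedged sketch of what the buffer-based path can look like, here is a function that generates WGSL source for an n×n matrix-multiply compute shader. The function and binding names are illustrative, not the benchmark's actual code; in the browser, the resulting string would be passed to device.createShaderModule():

```javascript
// Illustrative sketch: build WGSL source for an n x n matrix multiply.
// Two read-only storage buffers in, one read-write storage buffer out --
// no textures, no canvas, no pixel decoding.
function matmulShaderSource(n, workgroupSize = 16) {
  return `
    @group(0) @binding(0) var<storage, read> a : array<f32>;
    @group(0) @binding(1) var<storage, read> b : array<f32>;
    @group(0) @binding(2) var<storage, read_write> result : array<f32>;

    @compute @workgroup_size(${workgroupSize}, ${workgroupSize})
    fn main(@builtin(global_invocation_id) gid : vec3<u32>) {
      let n = ${n}u;
      if (gid.x >= n || gid.y >= n) { return; }
      var sum = 0.0;
      for (var k = 0u; k < n; k = k + 1u) {
        sum = sum + a[gid.y * n + k] * b[k * n + gid.x];
      }
      result[gid.y * n + gid.x] = sum;
    }`;
}

const src = matmulShaderSource(256);
console.log(src.includes("@workgroup_size(16, 16)")); // true
```

Each invocation computes one output cell directly from the input buffers, which is why none of the pixel-conversion steps from the WebGL hack appear anywhere.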
How do we benchmark?
As a benchmark, we use matrix multiplication, which lets us scale the complexity and amount of computation easily.
For example, multiplying two 16×16 matrices takes 7936 arithmetic operations (multiplications and additions), and 60×60 already gets us to 428400 operations.
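Those figures follow from counting both multiplications and additions in a naive multiply; a quick sanity check (the function name is mine, not the benchmark's):

```javascript
// Count the arithmetic operations in a naive n x n matrix multiply:
// each of the n^2 output cells needs n multiplications and n - 1 additions,
// so the total is n^2 * (2n - 1).
function matmulOps(n) {
  return n * n * (2 * n - 1);
}

console.log(matmulOps(16)); // 7936
console.log(matmulOps(60)); // 428400
```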
Sure thing, we run the test in an appropriate browser, which is Chrome Canary with the #enable-unsafe-webgpu flag turned on.
Results
The first results were discouraging: WebGL outperformed WebGPU at the bigger matrix sizes:
Then I found that the size of a workgroup (the number of operations calculated in a single batch) was set in the code to be as big as the matrix side. That works fine as long as the matrix side does not exceed the number of ALUs (arithmetic logic units) on the GPU, which is reflected in the WebGPU API as the maxComputeWorkgroupSizeX limit. For me, it was 256. Once the workgroup size was capped at 256, this is the result we get:
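The fix can be sketched as follows. This is a simplified one-dimensional version I wrote for illustration (the benchmark likely used a 2D workgroup, and the real limit would come from device.limits at runtime): clamp the workgroup size to the device limit, then dispatch enough workgroups to still cover the whole matrix.

```javascript
// Clamp the workgroup size to the device limit (256 here, standing in
// for device.limits.maxComputeWorkgroupSizeX) and compute how many
// workgroups to dispatch so that every matrix element is still covered.
function planDispatch(matrixSide, maxWorkgroupSize = 256) {
  const workgroupSize = Math.min(matrixSide, maxWorkgroupSize);
  // workgroupSize * workgroupCount >= matrixSide must hold.
  const workgroupCount = Math.ceil(matrixSide / workgroupSize);
  return { workgroupSize, workgroupCount };
}

console.log(planDispatch(60));   // { workgroupSize: 60, workgroupCount: 1 }
console.log(planDispatch(4096)); // { workgroupSize: 256, workgroupCount: 16 }
```

Small matrices still fit in a single workgroup, while big ones get split into batches the hardware can actually schedule, which is exactly what the broken version failed to do.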
This is quite impressive, although expected. WebGPU initialization and data-transfer times are remarkably lower because we do not convert data to textures and do not read it back from pixels. WebGPU performance is significantly higher, reaching 3.5x faster than WebGL, and it does not block the main thread.
It is also interesting to see WebGL fail once the matrix size goes over 4096×4096 because of canvas and texture size limitations, while WebGPU keeps working for matrices up to 5000×5000. That may not sound like much of a difference, but it is 112552823744 more operations to perform and 8222784 more values to hold in each matrix.
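The operation gap checks out against the n²(2n−1) count for a naive multiply (redefined here so the snippet stands alone):

```javascript
// How many extra arithmetic operations the 4096 -> 5000 jump represents,
// using the naive-multiply count of n^2 * (2n - 1) operations.
function matmulOps(n) {
  return n * n * (2 * n - 1); // multiplications plus additions
}

const extraOps = matmulOps(5000) - matmulOps(4096);
console.log(extraOps); // 112552823744
```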
A small but interesting fact: both WebGL and WebGPU need some time to warm up, while JS goes full power straight away.
Conclusion
The experiment showed that WebGPU compute shaders are, in practice, up to 3.5x faster than WebGL computing with pixel shaders, have significantly higher limits on the amount of data they can process, and do not block the main thread. This enables new kinds of tasks in the browser: video and audio editing, real-time physics simulation, more realistic visual effects, and machine learning. And that is only a partial list of the jobs that stand to benefit from WebGPU; we can expect a new generation of apps to appear and the boundaries of what is possible on the Web to expand significantly.