The first generation of GPU instances was introduced by AWS back in 2010. The industry has changed dramatically since then, and server-side GPUs are now used for a broad range of tasks, from machine learning to media processing. In this article we’re going to build a server that securely runs graphical applications in isolated environments and leverages the power of modern GPUs.
Architecture
The application that we’re going to build is a webpage video recorder. The architecture is straightforward: we deploy an AWS ECS cluster of GPU-powered instances. The system image already ships with NVIDIA drivers, and making them available inside Docker takes only a little effort, so we will focus on the application itself. The application maintains a pool of virtual displays. Whenever a website recording is requested, an isolated Chrome instance is launched on one of these displays. Once the webpage has loaded, the server captures the screen and encodes it to a video file on the GPU in real time.
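The display pool described above can be pictured as a small allocator that hands out X display numbers and takes them back when a recording finishes. The following is a minimal, hypothetical sketch: the `DisplayPool` class, the starting display number and the pool size are illustrations, not the project's actual code.

```javascript
// Hypothetical sketch of a virtual display pool. Display numbers
// (:99, :100, ...) and the pool size are assumptions for illustration.
class DisplayPool {
  constructor(firstDisplay, size) {
    // Free display numbers, e.g. [99, 100, 101] for firstDisplay=99, size=3
    this.free = Array.from({ length: size }, (_, i) => firstDisplay + i);
    this.busy = new Set();
  }

  // Returns a free display number, or null when every display is in use.
  acquire() {
    const display = this.free.shift();
    if (display === undefined) return null;
    this.busy.add(display);
    return display;
  }

  // Hands a display back so the next recording request can reuse it.
  release(display) {
    if (!this.busy.delete(display)) {
      throw new Error(`display :${display} is not in use`);
    }
    this.free.push(display);
  }
}
```

A request handler would call `acquire()` before launching Chrome, reply with a "busy" error when it returns `null`, and call `release()` in a `finally` block so that a failed recording never leaks a display.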
G5 instances are equipped with a powerful NVIDIA A10G GPU that can process several HD video streams simultaneously, so the application can easily scale up the number of virtual displays, and thanks to Docker, it can run parallel workflows securely and in isolation.
The source code for this project is available on GitHub. It contains the application Docker image and the AWS infrastructure definition.
Docker Container
The container is based on the official NVIDIA CUDA image, which contains everything we need to take advantage of hardware acceleration. For the instance image, we use the official ECS-optimized AMI for GPU instances, which ships with recent NVIDIA drivers and libraries, a Docker runtime with GPU support, and everything else needed to run the instance in an ECS cluster. The only tricky part is building FFmpeg from source and linking it against the NVIDIA libraries available on the system. It’s also worth noting that the NVIDIA_DRIVER_CAPABILITIES variable controls which NVIDIA driver features are available to the container.
```dockerfile
# CUDA 11.4 matches the runtime at the latest AWS ECS-optimized AMI for GPU instances
```
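To illustrate how NVIDIA_DRIVER_CAPABILITIES fits in, here is a small sketch that builds `docker run` arguments for testing the image locally. The image name, the `buildDockerGpuArgs` helper and the chosen capability set are assumptions; the capability names are the ones documented for the NVIDIA Container Toolkit. A recorder like this one plausibly needs `compute` for CUDA, `video` for NVENC, `graphics` for OpenGL and `utility` for tools such as `nvidia-smi`.

```javascript
// Sketch: `docker run` arguments that expose the GPU to the container and
// select which driver features the NVIDIA container runtime mounts into it.
// The image name and capability choice are assumptions.
const KNOWN_CAPABILITIES = new Set([
  "compute", "compat32", "graphics", "utility", "video", "display", "all",
]);

function buildDockerGpuArgs(image, capabilities) {
  // Reject typos early: an unknown capability would otherwise be ignored
  for (const cap of capabilities) {
    if (!KNOWN_CAPABILITIES.has(cap)) {
      throw new Error(`unknown NVIDIA driver capability: ${cap}`);
    }
  }
  return [
    "run", "--rm",
    "--gpus", "all",
    // compute: CUDA, video: NVENC/NVDEC, graphics: OpenGL, utility: nvidia-smi
    "-e", `NVIDIA_DRIVER_CAPABILITIES=${capabilities.join(",")}`,
    image,
  ];
}
```

On ECS the GPU is requested through a `resourceRequirements` entry of type `GPU` in the task definition instead of `--gpus`, while the environment variable is set the same way in the container definition.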
Application
The application is a Node.js API server. It launches virtual displays with Xvfb (X virtual framebuffer) and manages the display pool. Each display runs its own window manager so that applications can draw their windows on it. The displays also get OpenGL support through VirtualGL. This is not a truly 3D-accelerated environment, and Xvfb is more CPU-demanding than a regular X server, but it is more than enough for our use case, so for the sake of simplicity we won’t discuss how to get around these limitations.
```javascript
// Start the display server
```
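Starting a display server comes down to launching Xvfb with a display number and a screen geometry, then pointing every X client at it through the DISPLAY variable. The helpers below are a hedged sketch: `buildXvfbArgs`, `displayEnv` and the 1920x1080x24 default are assumptions, not taken from the project.

```javascript
// Sketch: command-line arguments for one Xvfb virtual display, plus the
// environment an X client (window manager, Chrome, FFmpeg) needs to use it.
// Resolution and colour depth defaults are assumptions.
function buildXvfbArgs(display, { width = 1920, height = 1080, depth = 24 } = {}) {
  if (!Number.isInteger(display) || display < 0) {
    throw new Error(`invalid display number: ${display}`);
  }
  return [
    `:${display}`,
    // Screen 0 of this server is a width x height framebuffer at `depth` bits
    "-screen", "0", `${width}x${height}x${depth}`,
    // Refuse X connections over TCP; local clients use the Unix socket
    "-nolisten", "tcp",
  ];
}

// X clients find their server through the DISPLAY variable, e.g. ":99".
function displayEnv(display, baseEnv = {}) {
  return { ...baseEnv, DISPLAY: `:${display}` };
}
```

With `spawn` from `node:child_process`, a display could be started as `spawn("Xvfb", buildXvfbArgs(99))`, and the window manager and Chrome then launched with `env: displayEnv(99, process.env)`.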
Chrome is launched through Puppeteer with quite a few options that let it run inside Docker and enable hardware acceleration, which is usually disabled in server scenarios:
```shell
google-chrome-stable \
```
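For reference, a Puppeteer launch configuration along these lines might look like the sketch below. The flag list, the executable path and the `buildLaunchOptions` helper are assumptions rather than the project's exact set; GPU-related Chrome flags change between releases, so the chrome://gpu report remains the real check.

```javascript
// Sketch of Puppeteer launch options for a headful Chrome on an Xvfb display.
// Flags and executable path are assumptions, not the article's exact set.
function buildLaunchOptions(display, { width = 1920, height = 1080 } = {}) {
  return {
    executablePath: "/usr/bin/google-chrome-stable", // assumed install path
    headless: false, // render to the X display so FFmpeg can capture it
    defaultViewport: null, // let the page use the real window size
    ignoreDefaultArgs: ["--enable-automation"], // hide the automation infobar
    env: { ...process.env, DISPLAY: `:${display}` },
    args: [
      "--no-sandbox", // the Chrome sandbox usually cannot start inside Docker
      "--disable-dev-shm-usage", // Docker's default /dev/shm is only 64 MB
      "--ignore-gpu-blocklist", // allow GPU features on an unlisted GPU/driver
      "--enable-gpu-rasterization",
      "--kiosk", // no browser UI, just the page
      "--window-position=0,0",
      `--window-size=${width},${height}`,
    ],
  };
}
```

The result would be passed directly to `puppeteer.launch(buildLaunchOptions(99))`.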
The resulting GPU report from chrome://gpu looks like this and is similar to what Chrome shows on GPU-powered non-Windows platforms. It can vary greatly depending on the Chrome version and the installed driver version, but this outcome is perfectly fine for our use case:
Canvas: Hardware accelerated
Canvas out-of-process rasterization: Disabled
Direct Rendering Display Compositor: Disabled
Compositing: Hardware accelerated
Multiple Raster Threads: Enabled
OpenGL: Enabled
Rasterization: Hardware accelerated on all pages
Raw Draw: Disabled
Skia Renderer: Enabled
Video Decode: Hardware accelerated
Video Encode: Software only. Hardware acceleration disabled
Vulkan: Disabled
WebGL: Hardware accelerated
WebGL2: Hardware accelerated
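Because the report varies between Chrome and driver versions, it may be worth checking it automatically after each upgrade. The sketch below parses lines in the `Feature: status` format shown above and lists the required features that are not hardware accelerated; `parseGpuReport`, `missingAcceleration` and the choice of required features are hypothetical.

```javascript
// Sketch: parse "Feature: status" lines copied from chrome://gpu into an
// object, then report which required features lack hardware acceleration.
function parseGpuReport(text) {
  const status = {};
  for (const line of text.split("\n")) {
    const idx = line.indexOf(":");
    if (idx <= 0) continue; // skip blank or malformed lines
    status[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  return status;
}

// A feature counts as accelerated if its status starts with
// "Hardware accelerated" (this also covers "... on all pages").
function missingAcceleration(status, required) {
  return required.filter(
    (name) => !(status[name] || "").startsWith("Hardware accelerated"),
  );
}
```

"Video Encode: Software only" is harmless in this setup, because the video is encoded by FFmpeg with NVENC rather than by Chrome, so that feature would be left out of the required list.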
Screen Capture
The screen capture is performed by FFmpeg with X11 as the input source. The NVIDIA GPU in G5 instances provides NVENC, a hardware-accelerated video encoder. Since we’ve built a custom version of FFmpeg linked against the CUDA libraries, FFmpeg can run the HEVC (H.265) encoding on the GPU, leaving the CPU free for other tasks. This lets us capture and compress HD video in real time at high frame rates. The FFmpeg command that captures the screen is the following:
```shell
# https://en.wikipedia.org/wiki/FFmpeg
```
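A capture of this kind typically combines FFmpeg's `x11grab` input device with the `hevc_nvenc` encoder. The sketch below builds such an argument list; the `buildFfmpegArgs` helper, the 30 FPS default, the pixel format and the output path are assumptions, not the article's exact command.

```javascript
// Sketch: FFmpeg arguments that grab an Xvfb display with x11grab and encode
// it to HEVC with NVENC. Defaults and output path are assumptions.
function buildFfmpegArgs({ display, width, height, fps = 30, seconds, output }) {
  if (!(seconds > 0)) throw new Error("duration must be positive");
  return [
    "-y", // overwrite the output file if it already exists
    // Input: screen 0 of the virtual X display
    "-f", "x11grab",
    "-video_size", `${width}x${height}`,
    "-framerate", String(fps),
    "-draw_mouse", "0", // do not paint the mouse pointer into the video
    "-i", `:${display}.0`,
    // Stop after the requested number of seconds
    "-t", String(seconds),
    // Output: HEVC encoded by NVENC on the GPU
    "-c:v", "hevc_nvenc",
    "-pix_fmt", "yuv420p", // widely compatible 4:2:0 output
    "-tag:v", "hvc1", // HEVC tag in MP4 that Apple players expect
    output,
  ];
}
```

Passing the array to `spawn("ffmpeg", args)` avoids shell quoting issues, and the `-t` option makes FFmpeg stop on its own once the requested duration has been recorded.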
FFmpeg runs for the requested number of seconds, then stops, and the resulting MP4 file is served to the user. The following video was recorded from this CodePen demo. Thanks to hardware acceleration, it is smooth and crisp.
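Before a display is taken from the pool, the requested page and duration need to be validated. The sketch below assumes a hypothetical query-string API with `url` and `seconds` parameters and a hypothetical 60-second cap; the project's real API may differ. It also refuses non-HTTP(S) URLs so that Chrome cannot be pointed at `file://` paths inside the container.

```javascript
// Sketch: validate a recording request. Parameter names (url, seconds) and
// the cap below are hypothetical, not the project's actual API.
const MAX_SECONDS = 60; // hypothetical upper bound on recording length

function parseRecordRequest(requestUrl) {
  // Resolve against a dummy origin so a bare path like "/record?..." parses
  const params = new URL(requestUrl, "http://localhost").searchParams;

  let target;
  try {
    target = new URL(params.get("url") ?? "");
  } catch {
    throw new Error("missing or invalid url parameter");
  }
  // Only let Chrome open web pages, not file:// or chrome:// URLs
  if (target.protocol !== "http:" && target.protocol !== "https:") {
    throw new Error(`unsupported protocol: ${target.protocol}`);
  }

  const seconds = Number(params.get("seconds"));
  if (!Number.isInteger(seconds) || seconds < 1 || seconds > MAX_SECONDS) {
    throw new Error(`seconds must be an integer between 1 and ${MAX_SECONDS}`);
  }
  return { url: target.href, seconds };
}
```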