The first generation of GPU instances was introduced by AWS back in 2010. The industry has changed dramatically since then, and nowadays server-side GPUs are widely used for a broad range of tasks - from machine learning to media processing. In this article we’re going to build a server that securely runs graphical applications in isolated environments and leverages the power of modern GPUs.

Architecture

The application that we’re going to build is a webpage video recorder. The architecture is straightforward: we will deploy an AWS ECS cluster of GPU-powered instances. The system image ships with NVIDIA drivers that can be made available from within Docker with just a little effort, so we will focus on the application itself. The application maintains a pool of virtual displays. Whenever a webpage recording is requested, an isolated Chrome instance is launched on a free display. Once the page loads, the server captures the screen and encodes it in real time to a video file on the GPU.

Schema

G5 instances are equipped with a powerful NVIDIA A10G GPU that can process several HD video streams simultaneously, so the application scales easily with the number of virtual displays, and thanks to Docker it can run isolated parallel workflows securely.

The source code for this project is available on GitHub. It contains the application Docker image and the AWS infrastructure definition.

Docker Container

The container is based on the official NVIDIA CUDA image, which contains everything we need to leverage hardware acceleration. For the AWS instance image, we will use the official ECS-optimized AMI for GPU instances, which ships with fresh NVIDIA drivers and libraries, a Docker runtime with GPU support, and everything else required to use it in an ECS cluster. The only tricky part is building FFmpeg from source and linking it against the NVIDIA libraries available in the system. It’s also worth noting that the NVIDIA_DRIVER_CAPABILITIES variable controls which NVIDIA features are available to the container.
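The build script itself is not reproduced here; as a rough sketch following NVIDIA’s FFmpeg compilation guide, install/ffmpeg.sh could look something like the fragment below. The repository URLs and configure flags are real, but versions, paths, and the exact option set are illustrative assumptions, not the project’s actual script:

```shell
# Sketch of install/ffmpeg.sh: build FFmpeg with NVENC/NVDEC support.
apt-get install -y build-essential yasm pkg-config git

# Codec headers that let FFmpeg talk to the NVIDIA driver
git clone https://git.videolan.org/git/ffmpeg/nv-codec-headers.git
make -C nv-codec-headers install

git clone https://git.ffmpeg.org/ffmpeg.git
cd ffmpeg
./configure \
  --enable-nonfree --enable-gpl \
  --enable-cuda-nvcc --enable-nvenc --enable-nvdec \
  --extra-cflags=-I/usr/local/cuda/include \
  --extra-ldflags=-L/usr/local/cuda/lib64
make -j"$(nproc)" && make install
```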

Docker image
# CUDA 11.4 matches the runtime at the latest AWS ECS-optimized AMI for GPU instances
# https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-optimized_AMI.html
# Use the development version to be able to reference headers during FFmpeg build
FROM nvidia/cuda:11.4.0-devel-ubuntu20.04

ARG DEBIAN_FRONTEND=noninteractive

# Mount all driver libraries
# https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/user-guide.html#driver-capabilities
ENV NVIDIA_DRIVER_CAPABILITIES=all
ENV PORT=3000
ENV NODE_ENV=develop

RUN apt-get update -y && apt-get install -y curl wget

RUN mkdir /install
WORKDIR /install

# FFmpeg, build from source and include NVIDIA codecs
COPY ./install/ffmpeg.sh .
RUN bash -e ffmpeg.sh

# VirtualGL
COPY ./install/virtualgl.sh .
RUN bash -e virtualgl.sh

# Chrome dependencies
COPY ./install/chrome.sh .
RUN bash -e chrome.sh

# Utilities
COPY ./install/utils.sh .
RUN bash -e utils.sh

# Node.js
COPY ./install/node.sh .
RUN bash -e node.sh

RUN rm -rf /install

# Application
WORKDIR /usr/src/app

COPY package.json .
COPY package-lock.json .
RUN npm i

ADD . .

RUN npm run build && rm -rf src
RUN chmod -R o+rwx node_modules/puppeteer/.local-chromium

CMD npm start
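To try the image locally on a GPU-equipped host with the NVIDIA Container Toolkit installed, it can be built and run with Docker’s `--gpus` flag (the image tag here is illustrative):

```
docker build -t webpage-recorder .
docker run --rm --gpus all -p 3000:3000 webpage-recorder
```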

Application

The application is a Node.js API server. It launches virtual displays with Xvfb (which stands for X virtual framebuffer) and manages them as a resource pool. Each display runs a window manager so that applications can draw themselves on it, and the displays get OpenGL support from VirtualGL.

Display initialization
// Start the display server
// https://en.wikipedia.org/wiki/Xvfb
// https://en.wikipedia.org/wiki/VirtualGL
const item = await new Promise<Xvfb>((resolve, reject) => {
  const xvfb = new Xvfb({
    reuse: false,
    xvfb_args: [
      '-screen', '0', '1920x1080x24+32',
      '+extension', 'GLX', // Enable OpenGL
      '+extension', 'RANDR',
      '-nolisten', 'tcp', '-dpi', '96', '-ac', '-noreset',
    ],
  });
  xvfb.start((error) => {
    if (error) {
      reject(error);
    } else {
      resolve(xvfb);
    }
  });
});
const display = item.display(); // Get the display number, e.g. ":99"

// Start the window manager
// https://en.wikipedia.org/wiki/Fluxbox
const wm = execa('fluxbox', [], {
  env: { DISPLAY: display }, // Run on the new display
});

// Hide the cursor
// http://manpages.ubuntu.com/manpages/trusty/man1/unclutter.1.html
const cursor = execa('unclutter', ['-idle', '0'], {
  env: { DISPLAY: display },
});
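The pooling logic itself is not shown above. As a minimal sketch (the class and method names are illustrative, not taken from the project’s source), a display pool could look like this:

```typescript
// Minimal sketch of a resource pool for display numbers.
// The real project pools Xvfb instances; names here are hypothetical.
class DisplayPool {
  private free: string[];

  constructor(displays: string[]) {
    this.free = [...displays];
  }

  // Take a free display, or throw if all are busy
  acquire(): string {
    const display = this.free.pop();
    if (display === undefined) {
      throw new Error('No free displays');
    }
    return display;
  }

  // Return a display to the pool once the recording is done
  release(display: string): void {
    this.free.push(display);
  }

  get available(): number {
    return this.free.length;
  }
}
```

A request handler would acquire a display, launch Chrome and FFmpeg on it, and release it in a finally block, so a crashed recording cannot leak the display.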

Chrome is launched using Puppeteer with quite a few options that let it work inside Docker and enable the hardware acceleration that is usually disabled in server usage scenarios:

Chrome launch command
google-chrome-stable \
--kiosk --start-fullscreen --autoplay-policy=no-user-gesture-required \
--hide-scrollbars --disable-infobars --no-default-browser-check \
--no-sandbox --disable-setuid-sandbox \
--ignore-gpu-blacklist --ignore-gpu-blocklist \
--enable-features=VaapiVideoDecoder --enable-accelerated-video-decode \
--enable-gpu-rasterization --enable-oop-rasterization --enable-tcp-fast-open \
--use-gl=desktop --enable-webgl
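In the application these flags are handed to Puppeteer. A rough sketch of that wiring (the helper name is hypothetical, and the project’s exact launch options may differ):

```typescript
// Hypothetical helper that assembles the Chrome flags listed above.
function buildChromeArgs(): string[] {
  return [
    '--kiosk', '--start-fullscreen', '--autoplay-policy=no-user-gesture-required',
    '--hide-scrollbars', '--disable-infobars', '--no-default-browser-check',
    '--no-sandbox', '--disable-setuid-sandbox',
    '--ignore-gpu-blacklist', '--ignore-gpu-blocklist',
    '--enable-features=VaapiVideoDecoder', '--enable-accelerated-video-decode',
    '--enable-gpu-rasterization', '--enable-oop-rasterization', '--enable-tcp-fast-open',
    '--use-gl=desktop', '--enable-webgl',
  ];
}

// Launching Chrome on a given virtual display would then look like:
//
//   const browser = await puppeteer.launch({
//     headless: false,                         // a (virtual) display is attached
//     args: buildChromeArgs(),
//     env: { ...process.env, DISPLAY: ':99' }, // point Chrome at the Xvfb display
//   });
```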

The resulting GPU report from chrome://gpu looks like this and is similar to what you would see in Chrome on a GPU-powered non-Windows platform. It can vary greatly depending on the Chrome and driver versions installed, but this outcome is perfectly fine for our use case:

Canvas: Hardware accelerated
Canvas out-of-process rasterization: Disabled
Direct Rendering Display Compositor: Disabled
Compositing: Hardware accelerated
Multiple Raster Threads: Enabled
OpenGL: Enabled
Rasterization: Hardware accelerated on all pages
Raw Draw: Disabled
Skia Renderer: Enabled
Video Decode: Hardware accelerated
Video Encode: Software only. Hardware acceleration disabled
Vulkan: Disabled
WebGL: Hardware accelerated
WebGL2: Hardware accelerated

Screen Capture

The screen capture is performed by FFmpeg with X11 as the input source. The NVIDIA GPU used in G5 instances provides NVENC, a hardware-accelerated video encoding module. Since we’ve built a custom version of FFmpeg with the CUDA libraries, we can now encode HEVC (H.265) video entirely on the GPU, freeing the CPU for other tasks. This lets us capture and compress HD video in real time at a high frame rate. The FFmpeg command to capture the screen is the following:

FFmpeg screen recording
# https://en.wikipedia.org/wiki/FFmpeg
# https://en.wikipedia.org/wiki/X_Window_System#Key_terms
# Option reference (shell line continuations cannot be followed by
# inline comments, so the annotations live up here):
#   -f x11grab                  capture the X Window System
#   -hwaccel nvdec -hwaccel_output_format cuda
#                               enable GPU acceleration with NVDEC
#   -thread_queue_size/-probesize/-analyzeduration
#                               input buffering optimizations
#   -framerate 30 -s 1920x1080  input stream parameters
#   -i :99.0                    capture "display" 99 and "screen" 0
#   -c:v hevc_nvenc             encode as HEVC (H.265) on the GPU with NVENC
#   -preset fast -movflags +faststart -g 999999
#                               optimize for screen capture
ffmpeg \
  -f x11grab \
  -hwaccel nvdec -hwaccel_output_format cuda \
  -thread_queue_size 2048 -probesize 10M -analyzeduration 10M \
  -framerate 30 -s 1920x1080 \
  -i :99.0 \
  -c:v hevc_nvenc \
  -preset fast -movflags +faststart -g 999999 \
  ~/output.mp4

FFmpeg runs for the requested number of seconds, then stops, and the resulting MP4 file is served to the user. The following video was recorded from this CodePen demo. Thanks to the hardware acceleration, it is smooth and crisp.
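A timed recording can be sketched as follows (the function names are illustrative, and the project’s actual implementation may differ; the FFmpeg arguments mirror the command above):

```typescript
import { spawn } from 'child_process';

// Assemble the FFmpeg arguments from the command above
// (display is e.g. ':99', out is the target MP4 path).
function buildRecordArgs(display: string, out: string): string[] {
  return [
    '-f', 'x11grab',
    '-hwaccel', 'nvdec', '-hwaccel_output_format', 'cuda',
    '-thread_queue_size', '2048', '-probesize', '10M', '-analyzeduration', '10M',
    '-framerate', '30', '-s', '1920x1080',
    '-i', `${display}.0`,
    '-c:v', 'hevc_nvenc',
    '-preset', 'fast', '-movflags', '+faststart', '-g', '999999',
    out,
  ];
}

// Record for `seconds`, then ask FFmpeg to finish cleanly: sending 'q'
// on stdin makes it flush and close the output file properly.
function record(display: string, seconds: number, out: string): Promise<void> {
  return new Promise((resolve, reject) => {
    const ffmpeg = spawn('ffmpeg', buildRecordArgs(display, out));
    ffmpeg.on('error', reject);
    ffmpeg.on('close', (code) =>
      code === 0 ? resolve() : reject(new Error(`ffmpeg exited with code ${code}`)),
    );
    setTimeout(() => ffmpeg.stdin.write('q'), seconds * 1000);
  });
}
```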

Conclusion

As you can see, the toolset for server-side GPU applications has matured enough that building a simple app takes just a few hours. With the power of cloud scalability and modern software, you can build things that were hard or even unimaginable just a decade ago.