AWS Nitro System

At Tuesday Night Live with James Hamilton at the 2016 AWS re:Invent conference, I introduced the first Amazon Web Services custom silicon. The ASIC I showed formed the foundational core of our second generation custom network interface controllers and, even back in 2016, there was at least one of these ASICs going into every new server in the AWS fleet. This work has continued for many years now, and this part and subsequent generations form the hardware basis of the AWS Nitro System. The Nitro System is used to deliver these features for AWS Elastic Compute Cloud (EC2) instance types:

  1. High speed networking with hardware offload
  2. High speed EBS storage with hardware offload
  3. NVMe local storage
  4. Remote Direct Memory Access (RDMA) for MPI and Libfabric
  5. Hardware protection/firmware verification for bare metal instances
  6. All business logic needed to control EC2 instances

We continue to consume millions of the Nitro ASICs every year so, even though it’s only used by AWS, it’s actually a fairly high volume server component. This and follow-on technology have been supporting much of the innovation going on in EC2, but I haven’t had a chance to get into much detail on how Nitro actually works.

At re:Invent 2018, Anthony Liguori, one of the lead engineers on the AWS Nitro System project, gave what was, at least for me, one of the best talks at re:Invent outside of the keynotes. It’s worth watching the video (URL below), but I’ll cover some of what Anthony went through in his talk here.

The Nitro System has powered all EC2 instance types launched over the last couple of years. There are three major components:

  1. Nitro Card I/O Acceleration
  2. Nitro Security Chip
  3. Nitro Hypervisor

Different EC2 server instance types include different Nitro
System features and some server types have many Nitro System cards that
implement the five main features of the AWS Nitro System:

  1. Nitro Card for VPC (Virtual Private Cloud)
  2. Nitro Card for EBS (Elastic Block Store)
  3. Nitro Card for Instance Storage
  4. Nitro Card Controller
  5. Nitro Security Chip

These features formed the backbone for Anthony Liguori’s 2018
re:Invent talk and he went through some of the characteristics of each.

Nitro Card for VPC

The Nitro Card for VPC is essentially a PCIe-attached Network Interface Card (NIC), often called a network adapter or, in some parts of the industry, a network controller. This is the card that implements the hardware interface between EC2 servers and the network connection or connections implemented on that server type. And, like all NICs, interfacing with it requires that a specific device driver be loaded to support communicating with the network adapter. In the case of AWS NICs, that device driver is the Elastic Network Adapter (ENA). This driver is now included in all major operating systems and distributions.

The Nitro Card for VPC supports network packet encapsulation/decapsulation, implements EC2 security groups, enforces limits, and is responsible for routing. Having these features implemented off of the server hardware rather than in the hypervisor allows customers to fully use the underlying server hardware without impacting network performance or other users, and we don’t have to hold back some server cores from customers to handle networking tasks. It also allows secure networking support without requiring server resources to be reserved for AWS use. The largest instance types get access to all server cores.
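
To make the security group function a little more concrete, here is a minimal sketch of allow-list rule evaluation with a default of deny. It is only an illustration and not how the Nitro Card implements it: the `Rule` fields, the `allowed` function, and the example rules are all hypothetical names I've picked for the sketch.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Rule:
    """Hypothetical allow rule: protocol, inclusive port range, source CIDR."""
    protocol: str
    port_lo: int
    port_hi: int
    cidr: str


def allowed(rules, protocol, port, src_ip):
    """Return True if any rule permits the packet; otherwise deny."""
    src = ip_address(src_ip)
    return any(
        r.protocol == protocol
        and r.port_lo <= port <= r.port_hi
        and src in ip_network(r.cidr)
        for r in rules
    )


# Example rule set (assumed): HTTPS from anywhere, SSH only from 10.0.0.0/8.
rules = [
    Rule("tcp", 443, 443, "0.0.0.0/0"),
    Rule("tcp", 22, 22, "10.0.0.0/8"),
]
```

With these rules, `allowed(rules, "tcp", 22, "203.0.113.7")` is denied because no rule matches. The point of the offload is that a check of this kind runs on the card rather than on server cores.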

It wasn’t covered in the talk, but the Nitro Card for VPC also supports Remote Direct Memory Access (RDMA) networking. The Elastic Fabric Adapter (EFA) supports both the OpenFabrics Alliance Libfabric API and the popular Message Passing Interface (MPI). These APIs both provide network access with operating system bypass when used with EFA. MPI is in common use in high performance computing applications and, to a lesser extent, in latency sensitive data intensive applications and some distributed databases.

Nitro Card for EBS

The Nitro Card for EBS supports storage acceleration for EBS. All instance local storage is implemented as NVMe devices, and the Nitro Card for EBS supports transparent encryption, limits to protect the performance characteristics of the system for other users, drive monitoring to track SSD wear, and bare metal instance types.
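
The talk didn't go into how these limits are enforced, so the following is just a sketch of one common way to do it, a token bucket, and not a description of what the Nitro cards actually do. The `TokenBucket` class, its parameters, and the numbers used are assumptions chosen for illustration. Time is passed in explicitly so the behavior is easy to follow.

```python
class TokenBucket:
    """Hypothetical per-tenant limiter: refill at a fixed rate, cap at burst."""

    def __init__(self, rate_per_sec, burst):
        self.rate = rate_per_sec      # tokens added per second
        self.capacity = burst         # maximum tokens that can accumulate
        self.tokens = float(burst)    # start with a full bucket
        self.last = 0.0               # time of the previous request

    def allow(self, now, cost=1):
        """Admit a request costing `cost` tokens at time `now` (seconds)."""
        elapsed = now - self.last
        self.last = now
        # Refill for the elapsed time, never exceeding the burst capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A bucket created as `TokenBucket(rate_per_sec=2, burst=2)` admits two back-to-back requests, rejects a third at the same instant, and admits requests again once time has passed. One tenant can burst briefly but can't take a sustained share that degrades the device for other users.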

Remote storage is again implemented as NVMe devices, but this time as NVMe over Fabrics, supporting access to EBS volumes, again with encryption, again without impacting other EC2 users, and with security even in a bare metal environment.

The Nitro card for EBS was first
launched in the EC2 C4 instance family.

Nitro Card for Instance Storage

The Nitro Card for Instance Storage also implements NVMe (Non-Volatile Memory Express) for local EC2 instance storage.

Nitro Card Controller

The Nitro Card Controller coordinates all other Nitro cards,
the server hypervisor, and the Nitro Security Chip. It implements the hardware
root of trust using the Nitro Security Chip and supports instance monitoring
functions. It also implements the NVMe controller functionality for one or more
Nitro Cards for EBS.

Nitro Security Chip

The Nitro Security Chip traps all I/O to non-volatile storage, including BIOS, all I/O device firmware, and any other controller firmware on the server. This is a simple approach to security where the general purpose processor is simply unable to change any firmware or device configuration. Rather than accept the error prone and complex task of ensuring access is approved and correct, no access is allowed. EC2 servers can’t update their firmware. This is GREAT from a security perspective, but the obvious question is how the firmware is updated. It’s updated by AWS, and AWS only, through the Nitro System.

The Nitro Security Chip also implements the hardware root of trust. This system replaces tens of millions of lines of code for the Unified Extensible Firmware Interface (UEFI) and supports secure boot. It starts the server up untrusted, then measures every firmware system on the server to ensure that none have been modified or changed in any unauthorized way. Each checksum (device measure) is checked against the verified correct checksum stored in the Nitro Security Chip.
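
The measure-and-compare step can be sketched in a few lines. This is only an illustration of the idea, not the Nitro Security Chip's implementation: the use of SHA-256, the function names, and the component names are assumptions made for the example.

```python
import hashlib
import hmac


def measure(image: bytes) -> str:
    """Compute a measurement of a firmware image (SHA-256 is an assumption)."""
    return hashlib.sha256(image).hexdigest()


def verify_firmware(images, known_good):
    """Return the names of components whose measurement does not match.

    `images` maps a component name to its firmware bytes; `known_good` maps
    the same names to trusted measurements. Unknown components fail.
    """
    failures = []
    for name, image in images.items():
        expected = known_good.get(name)
        if expected is None or not hmac.compare_digest(measure(image), expected):
            failures.append(name)
    return failures


# Hypothetical components and trusted measurements for the example.
images = {"bios": b"bios-image-v1", "nic": b"nic-firmware-v7"}
known_good = {name: measure(image) for name, image in images.items()}
```

Here `verify_firmware(images, known_good)` returns an empty list, while changing any byte of any image puts that component's name in the result. The server only becomes trusted once every measurement matches.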

Summary

The Nitro System supports key network, server, security, firmware patching, and monitoring functions, freeing up the entire underlying server for customer use. This allows EC2 instances to have access to all cores, since none need to be reserved for storage or network I/O. It gives more resources over to our largest instance types for customer use because we don’t need to reserve resources for housekeeping, monitoring, security, network I/O, or storage. The Nitro System also makes possible the use of a very simple, lightweight hypervisor that is just about always quiescent, and it allows us to securely support bare metal instance types.

More data on the AWS Nitro System from Anthony Liguori, one
of the lead engineers behind the software systems that make up the AWS Nitro
System:

Three keynotes for a fast-paced view of what’s new across all of AWS:
