Confidential Kubernetes: Use Confidential Virtual Machines and Enclaves to improve your cluster security

Authors: Fabian Kammel (Edgeless Systems), Mikko Ylinen (Intel), Tobin Feldman-Fitzthum (IBM)

In this blog post, we will introduce the concept of Confidential Computing (CC) to improve any computing environment's security and privacy properties. Further, we will show how the Cloud-Native ecosystem, particularly Kubernetes, can benefit from the new compute paradigm.

Confidential Computing is a concept that has been introduced previously in the cloud-native world. The Confidential Computing Consortium (CCC) is a project community in the Linux Foundation that already worked on Defining and Enabling Confidential Computing. In the Whitepaper, they provide a great motivation for the use of Confidential Computing:

Data exists in three states: in transit, at rest, and in use. …Protecting sensitive data in all of its states is more critical than ever. Cryptography is now commonly deployed to provide both data confidentiality (stopping unauthorized viewing) and data integrity (preventing or detecting unauthorized changes). While techniques to protect data in transit and at rest are now commonly deployed, the third state - protecting data in use - is the new frontier.

Confidential Computing aims to primarily solve the problem of protecting data in use by introducing a hardware-enforced Trusted Execution Environment (TEE).

Trusted Execution Environments

For more than a decade, Trusted Execution Environments (TEEs) have been available in commercial computing hardware in the form of Hardware Security Modules (HSMs) and Trusted Platform Modules (TPMs). These technologies provide trusted environments for shielded computations. They can store highly sensitive cryptographic keys and carry out critical cryptographic operations such as signing or encrypting data.

TPMs are optimized for low cost, allowing them to be integrated into mainboards and act as a system's physical root of trust. To keep the cost low, TPMs are limited in scope, i.e., they provide storage for only a few keys and are capable of just a small subset of cryptographic operations.

In contrast, HSMs are optimized for high performance, providing secure storage for far more keys and offering advanced physical attack detection mechanisms. Additionally, high-end HSMs can be programmed so that arbitrary code can be compiled and executed. The downside is that they are very costly. A managed CloudHSM from AWS costs around $1.50 / hour or ~$13,500 / year.

In recent years, a new kind of TEE has gained popularity. Technologies like AMD SEV, Intel SGX, and Intel TDX provide TEEs that are closely integrated with userspace. Rather than low-power or high-performance devices that support specific use cases, these TEEs shield normal processes or virtual machines and can do so with relatively low overhead. These technologies each have different design goals, advantages, and limitations, and they are available in different environments, including consumer laptops, servers, and mobile devices.

Additionally, we should mention ARM TrustZone, which is optimized for embedded devices such as smartphones, tablets, and smart TVs, as well as AWS Nitro Enclaves, which are only available on Amazon Web Services and have a different threat model compared to the CPU-based solutions by Intel and AMD.

IBM Secure Execution for Linux lets you run your Kubernetes cluster's nodes as KVM guests within a trusted execution environment on IBM Z series hardware. You can use this hardware-enhanced virtual machine isolation to provide strong isolation between tenants in a cluster, with hardware attestation about the (virtual) node's integrity.

Security properties and feature set

In the following sections, we will review the security properties and additional features these new technologies bring to the table. Only some solutions will provide all properties; we will discuss each technology in further detail in their respective section.

The Confidentiality property ensures that information cannot be viewed while it is in use in the TEE. This provides us with the highly desired feature to secure data in use. Depending on the specific TEE used, both code and data may be protected from outside viewers. The differences in TEE architectures and how their use in a cloud native context are important considerations when designing end-to-end security for sensitive workloads with a minimal Trusted Computing Base (TCB) in mind. CCC has recently worked on a common vocabulary and supporting material that helps to explain where confidentiality boundaries are drawn with the different TEE architectures and how that impacts the TCB size.

Confidentiality is a great feature, but an attacker can still manipulate or inject arbitrary code and data for the TEE to execute and, therefore, easily leak critical information. Integrity guarantees a TEE owner that neither code nor data can be tampered with while running critical computations.

Availability is a basic property often discussed in the context of information security. However, this property is outside the scope of most TEEs. Usually, they can be controlled (shut down, restarted, …) by some higher level abstraction. This could be the CPU itself, the hypervisor, or the kernel. This is to preserve the overall system's availability, not the TEE itself. When running in the cloud, availability is usually guaranteed by the cloud provider in terms of Service Level Agreements (SLAs) and is not cryptographically enforceable.

Confidentiality and Integrity by themselves are only helpful in some cases. For example, consider a TEE running in a remote cloud. How would you know the TEE is genuine and running your intended software? It could be an imposter stealing your data as soon as you send it over. This fundamental problem is addressed by Attestability. Attestation allows us to verify the identity, confidentiality, and integrity of TEEs based on cryptographic certificates issued from the hardware itself. This feature can also be made available to clients outside of the confidential computing hardware in the form of remote attestation.

TEEs can hold and process information that predates or outlives the trusted environment. That could mean across restarts, different versions, or platform migrations. Therefore Recoverability is an important feature. Data and the state of a TEE need to be sealed before they are written to persistent storage to maintain confidentiality and integrity guarantees. The access to such sealed data needs to be well-defined. In most cases, the unsealing is bound to a TEE's identity. Hence, making sure the recovery can only happen in the same confidential context.

This does not have to limit the flexibility of the overall system. AMD SEV-SNP's migration agent (MA) allows users to migrate a confidential virtual machine to a different host system while keeping the security properties of the TEE intact.

Feature comparison

These sections of the article will dive a little bit deeper into the specific implementations, compare supported features and analyze their security properties.

AMD SEV

AMD's Secure Encrypted Virtualization (SEV) technologies are a set of features to enhance the security of virtual machines on AMD's server CPUs. SEV transparently encrypts the memory of each VM with a unique key. SEV can also calculate a signature of the memory contents, which can be sent to the VM's owner as an attestation that the initial guest memory was not manipulated.

The second generation of SEV, known as Encrypted State or SEV-ES, provides additional protection from the hypervisor by encrypting all CPU register contents when a context switch occurs.

The third generation of SEV, Secure Nested Paging or SEV-SNP, is designed to prevent software-based integrity attacks and reduce the risk associated with compromised memory integrity. The basic principle of SEV-SNP integrity is that if a VM can read a private (encrypted) memory page, it must always read the value it last wrote.

Additionally, by allowing the guest to obtain remote attestation statements dynamically, SNP enhances the remote attestation capabilities of SEV.

AMD SEV has been implemented incrementally. New features and improvements have been added with each new CPU generation. The Linux community makes these features available as part of the KVM hypervisor and for host and guest kernels. The first SEV features were discussed and implemented in 2016 - see AMD x86 Memory Encryption Technologies from the 2016 Usenix Security Symposium. The latest big addition was SEV-SNP guest support in Linux 5.19.

Confidential VMs based on AMD SEV-SNP are available in Microsoft Azure since July 2022. Similarly, Google Cloud Platform (GCP) offers confidential VMs based on AMD SEV-ES.

Intel SGX

Intel's Software Guard Extensions has been available since 2015 and were introduced with the Skylake architecture.

SGX is an instruction set that enables users to create a protected and isolated process called an enclave. It provides a reverse sandbox that protects enclaves from the operating system, firmware, and any other privileged execution context.

The enclave memory cannot be read or written from outside the enclave, regardless of the current privilege level and CPU mode. The only way to call an enclave function is through a new instruction that performs several protection checks. Its memory is encrypted. Tapping the memory or connecting the DRAM modules to another system will yield only encrypted data. The memory encryption key randomly changes every power cycle. The key is stored within the CPU and is not accessible.

Since the enclaves are process isolated, the operating system's libraries are not usable as is; therefore, SGX enclave SDKs are required to compile programs for SGX. This also implies applications need to be designed and implemented to consider the trusted/untrusted isolation boundaries. On the other hand, applications get built with very minimal TCB.

An emerging approach to easily transition to process-based confidential computing and avoid the need to build custom applications is to utilize library OSes. These OSes facilitate running native, unmodified Linux applications inside SGX enclaves. A library OS intercepts all application requests to the host OS and processes them securely without the application knowing it's running a TEE.

The 3rd generation Xeon CPUs (aka Ice Lake Server - "ICX") and later generations did switch to using a technology called Total Memory Encryption - Multi-Key (TME-MK) that uses AES-XTS, moving away from the Memory Encryption Engine that the consumer and Xeon E CPUs used. This increased the possible enclave page cache (EPC) size (up to 512GB/CPU) and improved performance. More info about SGX on multi-socket platforms can be found in the Whitepaper.

A list of supported platforms is available from Intel.

SGX is available on Azure, Alibaba Cloud, IBM, and many more.

Intel TDX

Where Intel SGX aims to protect the context of a single process, Intel's Trusted Domain Extensions protect a full virtual machine and are, therefore, most closely comparable to AMD SEV.

As with SEV-SNP, guest support for TDX was merged in Linux Kernel 5.19. However, hardware support will land with Sapphire Rapids during 2023: Alibaba Cloud provides invitational preview instances, and Azure has announced its TDX preview opportunity.

Overhead analysis

The benefits that Confidential Computing technologies provide via strong isolation and enhanced security to customer data and workloads are not for free. Quantifying this impact is challenging and depends on many factors: The TEE technology, the benchmark, the metrics, and the type of workload all have a huge impact on the expected performance overhead.

Intel SGX-based TEEs are hard to benchmark, as shown by different papers. The chosen SDK/library OS, the application itself, as well as the resource requirements (especially large memory requirements) have a huge impact on performance. A single-digit percentage overhead can be expected if an application is well suited to run inside an enclave.

Confidential virtual machines based on AMD SEV-SNP require no changes to the executed program and operating system and are a lot easier to benchmark. A benchmark from Azure and AMD shows that SEV-SNP VM overhead is <10%, sometimes as low as 2%.

Although there is a performance overhead, it should be low enough to enable real-world workloads to run in these protected environments and improve the security and privacy of our data.

Confidential Computing compared to FHE, ZKP, and MPC

Fully Homomorphic Encryption (FHE), Zero Knowledge Proof/Protocol (ZKP), and Multi-Party Computations (MPC) are all a form of encryption or cryptographic protocols that offer similar security guarantees to Confidential Computing but do not require hardware support.

Fully (also partially and somewhat) homomorphic encryption allows one to perform computations, such as addition or multiplication, on encrypted data. This provides the property of encryption in use but does not provide integrity protection or attestation like confidential computing does. Therefore, these two technologies can complement to each other.

Zero Knowledge Proofs or Protocols are a privacy-preserving technique (PPT) that allows one party to prove facts about their data without revealing anything else about the data. ZKP can be used instead of or in addition to Confidential Computing to protect the privacy of the involved parties and their data. Similarly, Multi-Party Computation enables multiple parties to work together on a computation, i.e., each party provides their data to the result without leaking it to any other parties.

Use cases of Confidential Computing

The presented Confidential Computing platforms show that both the isolation of a single container process and, therefore, minimization of the trusted computing base and the isolation of a `` full virtual machine are possible. This has already enabled a lot of interesting and secure projects to emerge:

Confidential Containers

Confidential Containers (CoCo) is a CNCF sandbox project that isolates Kubernetes pods inside of confidential virtual machines.

CoCo can be installed on a Kubernetes cluster with an operator. The operator will create a set of runtime classes that can be used to deploy pods inside an enclave on several different platforms, including AMD SEV, Intel TDX, Secure Execution for IBM Z, and Intel SGX.

CoCo is typically used with signed and/or encrypted container images which are pulled, verified, and decrypted inside the enclave. Secrets, such as image decryption keys, are conditionally provisioned to the enclave by a trusted Key Broker Service that validates the hardware evidence of the TEE prior to releasing any sensitive information.

CoCo has several deployment models. Since the Kubernetes control plane is outside the TCB, CoCo is suitable for managed environments. CoCo can be run in virtual environments that don't support nesting with the help of an API adaptor that starts pod VMs in the cloud. CoCo can also be run on bare metal, providing strong isolation even in multi-tenant environments.

Managed confidential Kubernetes

Azure and GCP both support the use of confidential virtual machines as worker nodes for their managed Kubernetes offerings.

Both services aim for better workload protection and security guarantees by enabling memory encryption for container workloads. However, they don't seek to fully isolate the cluster or workloads against the service provider or infrastructure. Specifically, they don't offer a dedicated confidential control plane or expose attestation capabilities for the confidential cluster/nodes.

Azure also enables Confidential Containers in their managed Kubernetes offering. They support the creation based on Intel SGX enclaves and AMD SEV-based VMs.

Constellation

Constellation is a Kubernetes engine that aims to provide the best possible data security. Constellation wraps your entire Kubernetes cluster into a single confidential context that is shielded from the underlying cloud infrastructure. Everything inside is always encrypted, including at runtime in memory. It shields both the worker and control plane nodes. In addition, it already integrates with popular CNCF software such as Cilium for secure networking and provides extended CSI drivers to write data securely.

Occlum and Gramine

Occlum and Gramine are examples of open source library OS projects that can be used to run unmodified applications in SGX enclaves. They are member projects under the CCC, but similar projects and products maintained by companies also exist. With these libOS projects, existing containerized applications can be easily converted into confidential computing enabled containers. Many curated prebuilt containers are also available.

Where are we today? Vendors, limitations, and FOSS landscape

As we hope you have seen from the previous sections, Confidential Computing is a powerful new concept to improve security, but we are still in the (early) adoption phase. New products are starting to emerge to take advantage of the unique properties.

Google and Microsoft are the first major cloud providers to have confidential offerings that can run unmodified applications inside a protected boundary. Still, these offerings are limited to compute, while end-to-end solutions for confidential databases, cluster networking, and load balancers have to be self-managed.

These technologies provide opportunities to bring even the most sensitive workloads into the cloud and enables them to leverage all the tools in the CNCF landscape.

Call to action

If you are currently working on a high-security product that struggles to run in the public cloud due to legal requirements or are looking to bring the privacy and security of your cloud-native project to the next level: Reach out to all the great projects we have highlighted! Everyone is keen to improve the security of our ecosystem, and you can play a vital role in that journey.