[Semi Thesis Review for me] Meltdown: Reading Kernel Memory from User Space -(3)

Jun 24, 2020

Authors: Moritz Lipp (1), Michael Schwarz (1), Daniel Gruss (1), Thomas Prescher (2), Werner Haas (2), Anders Fogh (3), Jann Horn (4), Stefan Mangard (1), Paul Kocher (5), Daniel Genkin (6, 9), Yuval Yarom (7), Mike Hamburg (8)

Affiliations: (1) Graz University of Technology, (2) Cyberus Technology GmbH, (3) G-Data Advanced Analytics, (4) Google Project Zero, (5) Independent (www.paulkocher.com), (6) University of Michigan, (7) University of Adelaide & Data61, (8) Rambus, Cryptography Research Division

This paper is included in the Proceedings of the 27th USENIX Security Symposium (August 15–17, 2018, Baltimore, MD, USA).

Total number of pages: 18

Writer: Yu-gyoung Yun
(DGIST, undergraduate student. searchien@dgist.ac.kr)

[5. Meltdown]

Meltdown is a powerful attack that allows an unprivileged user program to read arbitrary physical memory.

Attack setting:

In our attack, we consider personal computers and virtual machines in the cloud.

The attacker has arbitrary unprivileged code execution on the attacked system, i.e., the attacker can run any code with the privileges of a normal user.

However, the attacker has no physical access to the machine.

[5.1 Attack Description]

Meltdown combines the two building blocks discussed in Section 4.

Meltdown consists of 3 steps:

Step 1. Reading the secret.

The content of an attacker-chosen memory location, which is inaccessible to the attacker, is loaded into a register.

To load data from main memory into a register, the data is referenced using a virtual address.
In parallel to translating the virtual address into a physical address, the CPU also checks the permission bits of the virtual address, i.e., whether this virtual address is user accessible or only accessible by the kernel.

→ Modern operating systems always map the entire kernel into the virtual address space of every user process.

→ As a consequence, all kernel addresses lead to a valid physical address when they are translated, and the CPU can access the content of such addresses.
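The gap between "the address translates" and "the access is allowed" can be pictured with a toy model. This is only a sketch, not how a real MMU works: `PageTableEntry`, `translate`, the single-level table and the example mappings are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assuming 4 KB pages


@dataclass(frozen=True)
class PageTableEntry:
    physical_page: int     # physical page number the virtual page maps to
    user_accessible: bool  # permission bit: True = user, False = kernel only


def translate(page_table, virtual_address, from_user_mode):
    """Return (physical_address, allowed) for a toy single-level page table."""
    page_number, offset = divmod(virtual_address, PAGE_SIZE)
    entry = page_table[page_number]  # a KeyError would mean "not mapped"
    physical_address = entry.physical_page * PAGE_SIZE + offset
    allowed = entry.user_accessible or not from_user_mode
    return physical_address, allowed


# Hypothetical mappings: page 1 is a user page, page 7 is a kernel page.
page_table = {
    1: PageTableEntry(physical_page=10, user_accessible=True),
    7: PageTableEntry(physical_page=3, user_accessible=False),
}

user_result = translate(page_table, 1 * PAGE_SIZE + 8, from_user_mode=True)
kernel_result = translate(page_table, 7 * PAGE_SIZE + 8, from_user_mode=True)
```

In this model the kernel address still yields a physical address; only the `allowed` flag differs from the user-page case.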

Again, Meltdown exploits the out-of-order (OoO) execution of modern CPUs, which still execute instructions in the small time window between the illegal memory access and the raising of the exception.

In line 4 of Listing 2 in the paper, the byte value located at the target kernel address, which is stored in the RCX register, is loaded into the least significant byte of the RAX register, represented by AL.

When the MOV instruction is retired, the exception is registered and the pipeline is flushed to eliminate all results of subsequent instructions that were executed out of order. → There is a race condition between raising this exception and attack step 2, described below.

The authors found that prefetching the kernel address can slightly improve the performance of the attack on some systems.

Step 2. Transmitting the secret.

A transient instruction accesses a cache line based on the secret content of the register.

If the transient instruction sequence is executed before the MOV instruction is retired (i.e., before it raises the exception), and the sequence performs computations based on the secret, it can be used to transmit the secret to the attacker.

The attacker allocates a probe array in memory and ensures that no part of this array is cached.

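In line 5 of Listing 2, the secret value from step 1 is multiplied by the page size, i.e., 4 KB, so that accesses to the probe array are far apart and the hardware prefetcher does not load adjacent memory locations into the cache as well. We read a single byte at once. Hence, our probe array is 256 × 4096 bytes, assuming 4 KB pages.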
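The arithmetic behind the probe array can be written out as a short sketch. The names `probe_offset` and `PROBE_ARRAY_SIZE` are illustrative and do not come from the paper; the only facts used are the 4 KB page size and the one-byte secret mentioned above.

```python
PAGE_SIZE = 4096       # assuming 4 KB pages
NUM_BYTE_VALUES = 256  # one page per possible value of the secret byte
PROBE_ARRAY_SIZE = NUM_BYTE_VALUES * PAGE_SIZE


def probe_offset(secret_byte):
    """Offset into the probe array that is touched for a given secret byte."""
    if not 0 <= secret_byte < NUM_BYTE_VALUES:
        raise ValueError("secret must fit in one byte")
    # Shifting left by 12 bits is the same as multiplying by 4096.
    return secret_byte << 12


offsets = [probe_offset(b) for b in (0, 1, 0x41, 255)]
```

Every possible byte value lands on its own page, and the largest offset (255 × 4096) still lies inside the 1 MiB array.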

In the out-of-order execution there is a noise bias towards the register value ‘0’.

For this reason, we introduce retry logic into the transient instruction sequence. → If we read a ‘0’, we go back to step 1 and try to read the secret again.

In line 7, the multiplied secret is added to the base address of the probe array, forming the target address of the covert channel. → Reading this address brings the corresponding cache line into the cache (L1, L3). → The transient instruction sequence thus affects the cache state based on the secret value that was read in step 1.

Since the transient instruction sequence in step 2 races against raising the exception, reducing the runtime of step 2 can significantly improve the performance of the attack. → For instance, making sure that the address translation for the probe array is cached in the translation-lookaside buffer (TLB) increases the attack performance on some systems.

Step 3. Receiving the secret.

The attacker uses Flush+Reload to determine the accessed cache line and hence the secret stored at the chosen memory location.

By repeating these steps for different memory locations, the attacker can dump the kernel memory, including the entire physical memory.

When the transient instruction sequence of step 2 has executed, exactly one cache line of the probe array is cached. (The position of the cached cache line within the probe array depends only on the secret which is read in step 1.)

Thus, the attacker iterates over all 256 pages of the probe array and measures the access time for the first cache line (i.e., offset) on each page. The number of the page containing the cached cache line corresponds directly to the secret value.
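The receiving step can be illustrated with a pure simulation. Nothing here touches a real cache: `ToyCache`, the latency constants and the threshold are hypothetical stand-ins, and the single `access` call before `receive_byte` merely plays the role of the transient access from step 2.

```python
PAGE_SIZE = 4096   # assuming 4 KB pages
HIT_CYCLES = 40    # hypothetical latency of a cached access
MISS_CYCLES = 300  # hypothetical latency of an uncached access
THRESHOLD = 150    # hypothetical cut-off between hit and miss


class ToyCache:
    """Tracks which probe-array pages currently have a cached first line."""

    def __init__(self):
        self.cached_pages = set()

    def flush_all(self):
        self.cached_pages.clear()

    def access(self, offset):
        """Touch an offset and return the simulated access time in cycles."""
        page = offset // PAGE_SIZE
        latency = HIT_CYCLES if page in self.cached_pages else MISS_CYCLES
        self.cached_pages.add(page)  # any access leaves the line cached
        return latency


def receive_byte(cache):
    """Time the first cache line of all 256 pages and return the single hit."""
    hits = [page for page in range(256)
            if cache.access(page * PAGE_SIZE) < THRESHOLD]
    return hits[0] if len(hits) == 1 else None


cache = ToyCache()
cache.flush_all()               # the probe array starts uncached
cache.access(0x53 * PAGE_SIZE)  # stands in for the transient access of step 2
recovered = receive_byte(cache)
```

Because each measurement also caches the line it touches, the probe array would have to be flushed again before the next byte is received, which is why the technique is called Flush+Reload.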

Dumping the entire physical memory.

As the memory access to the kernel address raises an exception, we have to handle or suppress the exception.

As all major operating systems also typically map the entire physical memory into the kernel address space (cf. Section 2.2) in every user process, Meltdown can also read the entire physical memory of the target machine.

[5.2 Optimizations and Limitations]

* Inherent bias towards 0. → Race condition

Q. Race condition?

A. A race condition is an undesirable situation in which a device or system attempts to perform two or more operations at the same time, although, by the nature of the device or system, the operations must be done in the proper sequence to be done correctly.

The illegal memory load (line 4 of Listing 2) often returns a ‘0’, which can be observed clearly when the load is implemented using an add instruction instead of the mov.

The reason for this bias towards ‘0’ may be either that the memory load is masked out by a failed permission check, or that a speculated value is used because the data of the stalled load is not available yet.

The maximum number of retries is an optimization parameter influencing the attack performance and the error rate.
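How the retry limit trades attempts against errors can be explored with a small simulation. This is a sketch under stated assumptions: the 0.7 probability of losing the race is a made-up value, not a measurement from the paper, and `noisy_read`, `read_with_retries` and `error_rate` are hypothetical helper names.

```python
import random


def noisy_read(true_value, zero_probability, rng):
    """One simulated attempt: yields 0 whenever the race is lost."""
    return 0 if rng.random() < zero_probability else true_value


def read_with_retries(true_value, max_retries, zero_probability, rng):
    """Retry while an attempt yields 0, up to max_retries extra attempts."""
    value = noisy_read(true_value, zero_probability, rng)
    retries = 0
    while value == 0 and retries < max_retries:
        value = noisy_read(true_value, zero_probability, rng)
        retries += 1
    return value


def error_rate(max_retries, zero_probability=0.7, trials=10_000, seed=1):
    """Fraction of trials in which a non-zero secret is reported as 0."""
    rng = random.Random(seed)
    wrong = sum(
        read_with_retries(0x2A, max_retries, zero_probability, rng) != 0x2A
        for _ in range(trials)
    )
    return wrong / trials


rates = {n: error_rate(n) for n in (0, 2, 8)}
```

In this model the error rate is roughly `zero_probability ** (max_retries + 1)`, so more retries lower the error rate but cost more attempts when the secret byte really is 0.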

* Optimizing the case of 0.

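A cache hit on cache line ‘0’ gives the attacker no information, so measuring it can be omitted; if there is no hit on any other cache line, the value is assumed to be ‘0’. To make such cases rare, the transient instruction sequence retries the read until it sees a value different from ‘0’ (line 6 of Listing 2). This loop is terminated either by reading a non-zero value or by the raised exception of the invalid memory access. In either case, the loop does not measurably slow down the attack.

Hence, these optimizations may increase the attack performance.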
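One way to picture the zero-case handling on the receiving side is a decoder that never looks at cache line 0 and falls back to 0 when nothing else hits. The threshold and the timing values below are hypothetical, and `decode_skipping_zero` is an illustrative name rather than a function from the paper.

```python
THRESHOLD = 150  # hypothetical cycle count separating a hit from a miss


def decode_skipping_zero(timings):
    """timings[i] is the measured access time for page i of the probe array."""
    for value in range(1, 256):  # cache line 0 is never inspected
        if timings[value] < THRESHOLD:
            return value
    return 0  # no hit on a non-zero line: assume the byte is 0


all_miss = [300] * 256
hit_on_0x7f = list(all_miss)
hit_on_0x7f[0x7F] = 40
biased = list(hit_on_0x7f)
biased[0] = 40  # a spurious hit on line 0 does not change the result

results = (
    decode_skipping_zero(all_miss),
    decode_skipping_zero(hit_on_0x7f),
    decode_skipping_zero(biased),
)
```

The third case shows why ignoring line 0 helps: a hit caused by the bias towards ‘0’ cannot mask the real value.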

* Single-bit transmission.

In the attack description in Section 5.1, the attacker transmitted 8 bits through the covert channel at once and performed 2^8 = 256 Flush+Reload measurements to recover the secret.

Trade-off: running more transient instruction sequences vs. performing more Flush+Reload measurements.

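In fact, with this implementation, almost the entire time is spent on Flush+Reload measurements. By transmitting only a single bit at a time, all but one Flush+Reload measurement (the one on cache line 1) can be omitted.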

The number of bits read and transmitted at once is a trade-off between some implicit error-reduction and the overall transmission rate of the covert channel.
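The trade-off can be made concrete with a simple counting model. This is only a back-of-the-envelope sketch: it counts transmissions and Flush+Reload measurements per recovered byte, ignores retries and errors, and the function name and the `skip_zero` flag (modelling the "Optimizing the case of 0" item above) are assumptions made for illustration.

```python
def measurements_per_byte(bits_per_transmission, skip_zero=True):
    """Return (transmissions, flush_reload_measurements) to recover 8 bits."""
    if bits_per_transmission not in (1, 2, 4, 8):
        raise ValueError("this simple model only handles divisors of 8")
    transmissions = 8 // bits_per_transmission
    # 2**n possible values per transmission; optionally skip cache line 0.
    lines_to_measure = 2 ** bits_per_transmission - (1 if skip_zero else 0)
    return transmissions, transmissions * lines_to_measure


table = {n: measurements_per_byte(n) for n in (1, 2, 4, 8)}
full_byte_unoptimized = measurements_per_byte(8, skip_zero=False)
```

With one bit per transmission the model needs 8 transient instruction sequences but only 8 measurements per byte, whereas a whole byte at once needs a single sequence and up to 256 measurements.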