From: Michael Kelley <[email protected]>
Shared (decrypted) pages should never be returned to the page allocator,
lest future use of the pages store data that should not be exposed to
the host. Such pages may also crash the guest if they are used in a way
the hardware disallows (e.g. for executable code or as a page table).
Normally set_memory() call failures are rare. But in CoCo VMs
set_memory_XXcrypted() may involve calls to the untrusted host, and an
attacker could fail these calls such that:
 1. set_memory_encrypted() returns an error and leaves the pages fully
    shared.
 2. set_memory_decrypted() returns an error, but the pages are actually
    fully converted to shared.
This means that patterns like the ones below can cause problems:

	void *addr = alloc();
	int fail = set_memory_decrypted(addr, 1);
	if (fail)
		free_pages(addr, 0);

And:

	void *addr = alloc();
	int fail = set_memory_decrypted(addr, 1);
	if (fail) {
		set_memory_encrypted(addr, 1);
		free_pages(addr, 0);
	}

Unfortunately these patterns appear in the kernel, and it is not clear
what the set_memory() callers should do in this situation. They
shouldn't use the pages as shared because something clearly went wrong,
but they also need to fully reset the pages to private before freeing
them. The kernel needs the host's help to do that, and the host is
already being uncooperative around the needed operations, so the reset
isn't guaranteed to succeed and the caller is stuck with unusable pages.
The only choices are to panic or to leak the pages. The kernel tries not
to panic if at all possible, so just leak the pages at the call sites.
Separately there is a patch[1] to warn if the guest detects strange host
behavior around this. It is stalled, so in the meantime I'm proceeding
with fixing the callers to leak the pages. No additional warnings are
added, because the plan is to warn in a single place in x86 set_memory()
code.
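
The leak-instead-of-free rule can be illustrated with a small user-space
model. This is only a sketch under stated assumptions: struct fake_page,
fake_set_memory_decrypted(), fake_set_memory_encrypted(), share_page(),
unshare_and_release() and the host_fails_* switches are hypothetical
stand-ins invented for illustration, not kernel interfaces, and the model
assumes the worst case in which a failed conversion still leaves the page
shared.

```c
#include <stdbool.h>

/* Hypothetical model of a page; none of these names exist in the kernel. */
enum page_state { PAGE_IN_USE, PAGE_FREED, PAGE_LEAKED };

struct fake_page {
	bool shared;		/* true once the host may see the contents */
	enum page_state state;
};

/* Switches that simulate an uncooperative host. */
static bool host_fails_decrypt;
static bool host_fails_encrypt;

/* Worst case: an error is reported, yet the page ends up shared anyway. */
static int fake_set_memory_decrypted(struct fake_page *p)
{
	p->shared = true;
	return host_fails_decrypt ? -1 : 0;
}

/* On failure the page simply stays shared. */
static int fake_set_memory_encrypted(struct fake_page *p)
{
	if (host_fails_encrypt)
		return -1;
	p->shared = false;
	return 0;
}

/* Share a page; on failure its state is unknown, so leak it. */
static int share_page(struct fake_page *p)
{
	int ret = fake_set_memory_decrypted(p);

	if (ret)
		p->state = PAGE_LEAKED;
	return ret;
}

/* Free a page only if it was converted back to private successfully. */
static void unshare_and_release(struct fake_page *p)
{
	if (fake_set_memory_encrypted(p))
		p->state = PAGE_LEAKED;
	else
		p->state = PAGE_FREED;
}
```

In this model a page reaches PAGE_FREED only after a successful
conversion back to private. Every failure path ends in PAGE_LEAKED.
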
This series fixes the cases in the Hyper-V code.
This is the non-RFC/RFT version of Rick Edgecombe's previous series.[2]
Rick asked me to do this version based on my comments and the testing
I did. I've tested most of the error paths by hacking
set_memory_encrypted() to fail, and observing /proc/vmallocinfo and
/proc/buddyinfo to confirm that the memory is leaked as expected
instead of freed.
Changes in this version:
* Expanded commit message references to "TDX" to be "CoCo VMs" since
  set_memory_encrypted() could fail in other configurations, such as
  Hyper-V CoCo guests running with a paravisor on SEV-SNP processors.
* Changed "Subject:" prefixes to match historical practice in Hyper-V
  related source files
* Patch 1: Added handling of set_memory_decrypted() failure
* Patch 2: Changed where the "decrypted" flag is set so that
  error cases not related to set_memory_encrypted() are handled
  correctly
* Patch 2: Fixed the polarity of the test for set_memory_encrypted()
  failing
* Added Patch 5 to the series to properly handle freeing of
  ring buffer memory
* Fixed a few typos throughout
[1] https://lore.kernel.org/lkml/[email protected]/
[2] https://lore.kernel.org/linux-hyperv/[email protected]/
Michael Kelley (1):
  Drivers: hv: vmbus: Don't free ring buffers that couldn't be
    re-encrypted

Rick Edgecombe (4):
  Drivers: hv: vmbus: Leak pages if set_memory_encrypted() fails
  Drivers: hv: vmbus: Track decrypted status in vmbus_gpadl
  hv_netvsc: Don't free decrypted memory
  uio_hv_generic: Don't free decrypted memory

 drivers/hv/channel.c         | 29 ++++++++++++++++++++++++-----
 drivers/hv/connection.c      | 29 ++++++++++++++++++++++-------
 drivers/net/hyperv/netvsc.c  |  7 +++++--
 drivers/uio/uio_hv_generic.c | 12 ++++++++----
 include/linux/hyperv.h       |  1 +
 5 files changed, 60 insertions(+), 18 deletions(-)
--
2.25.1
From: Rick Edgecombe <[email protected]>
In CoCo VMs it is possible for the untrusted host to cause
set_memory_encrypted() or set_memory_decrypted() to fail such that an
error is returned and the resulting memory is shared. Callers need to
take care to handle these errors to avoid returning decrypted (shared)
memory to the page allocator, which could lead to functional or security
issues.
The netvsc driver could free decrypted/shared pages if
set_memory_decrypted() fails. Check the decrypted field in the gpadl
to decide whether to free the memory.
Signed-off-by: Rick Edgecombe <[email protected]>
Signed-off-by: Michael Kelley <[email protected]>
---
drivers/net/hyperv/netvsc.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 82e9796c8f5e..70b7f91fb96b 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -154,8 +154,11 @@ static void free_netvsc_device(struct rcu_head *head)
int i;
kfree(nvdev->extension);
- vfree(nvdev->recv_buf);
- vfree(nvdev->send_buf);
+
+ if (!nvdev->recv_buf_gpadl_handle.decrypted)
+ vfree(nvdev->recv_buf);
+ if (!nvdev->send_buf_gpadl_handle.decrypted)
+ vfree(nvdev->send_buf);
bitmap_free(nvdev->send_section_map);
for (i = 0; i < VRSS_CHANNEL_MAX; i++) {
--
2.25.1
From: Rick Edgecombe <[email protected]>
In CoCo VMs it is possible for the untrusted host to cause
set_memory_encrypted() or set_memory_decrypted() to fail such that an
error is returned and the resulting memory is shared. Callers need to
take care to handle these errors to avoid returning decrypted (shared)
memory to the page allocator, which could lead to functional or security
issues.
In order to make sure callers of vmbus_establish_gpadl() and
vmbus_teardown_gpadl() don't return decrypted/shared pages to
allocators, add a field in struct vmbus_gpadl to keep track of the
decryption status of the buffers. This will allow the callers to
know if they should free or leak the pages.
Signed-off-by: Rick Edgecombe <[email protected]>
Signed-off-by: Michael Kelley <[email protected]>
---
drivers/hv/channel.c | 25 +++++++++++++++++++++----
include/linux/hyperv.h | 1 +
2 files changed, 22 insertions(+), 4 deletions(-)
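
For readers following the flag's life cycle, the transitions this patch
introduces can be modelled in user space roughly as below. This is a
sketch, not the driver code: struct fake_gpadl, struct fake_host,
fake_establish(), fake_teardown() and may_free() are hypothetical names,
and the fail_* fields merely simulate where an error might occur.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the new field in struct vmbus_gpadl. */
struct fake_gpadl {
	bool decrypted;		/* true: buffer may be shared, do not free */
};

/* Simulated failure points (illustrative only). */
struct fake_host {
	bool fail_header;	/* error before the buffer is touched */
	bool fail_decrypt;	/* set_memory_decrypted() reports an error */
	bool fail_post;		/* error after the buffer was shared */
	bool fail_encrypt;	/* set_memory_encrypted() reports an error */
};

static int fake_establish(struct fake_gpadl *g, const struct fake_host *h)
{
	if (h->fail_header) {
		g->decrypted = false;	/* buffer is still private */
		return -1;
	}

	g->decrypted = true;		/* assume shared from here on */
	if (h->fail_decrypt)
		return -1;		/* state unknown: flag stays true */

	if (h->fail_post) {
		/* Clear the flag only if re-encryption succeeded. */
		if (!h->fail_encrypt)
			g->decrypted = false;
		return -1;
	}
	return 0;
}

static int fake_teardown(struct fake_gpadl *g, const struct fake_host *h)
{
	int ret = h->fail_encrypt ? -1 : 0;

	g->decrypted = (ret != 0);	/* still shared if re-encryption failed */
	return ret;
}

/* Callers would free the buffer only when this returns true. */
static bool may_free(const struct fake_gpadl *g)
{
	return !g->decrypted;
}
```

The point of the model is that the flag is cleared only on paths where
the buffer is known to be private, so a caller that checks it before
freeing errs on the side of leaking.
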
diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index 56f7e06c673e..bb5abdcda18f 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -472,9 +472,18 @@ static int __vmbus_establish_gpadl(struct vmbus_channel *channel,
(atomic_inc_return(&vmbus_connection.next_gpadl_handle) - 1);
ret = create_gpadl_header(type, kbuffer, size, send_offset, &msginfo);
- if (ret)
+ if (ret) {
+ gpadl->decrypted = false;
return ret;
+ }
+ /*
+ * Set the "decrypted" flag to true for the set_memory_decrypted()
+ * success case. In the failure case, the encryption state of the
+ * memory is unknown. Leave "decrypted" as true to ensure the
+ * memory will be leaked instead of going back on the free list.
+ */
+ gpadl->decrypted = true;
ret = set_memory_decrypted((unsigned long)kbuffer,
PFN_UP(size));
if (ret) {
@@ -563,9 +572,15 @@ static int __vmbus_establish_gpadl(struct vmbus_channel *channel,
kfree(msginfo);
- if (ret)
- set_memory_encrypted((unsigned long)kbuffer,
- PFN_UP(size));
+ if (ret) {
+ /*
+ * If set_memory_encrypted() fails, the decrypted flag is
+ * left as true so the memory is leaked instead of being
+ * put back on the free list.
+ */
+ if (!set_memory_encrypted((unsigned long)kbuffer, PFN_UP(size)))
+ gpadl->decrypted = false;
+ }
return ret;
}
@@ -886,6 +901,8 @@ int vmbus_teardown_gpadl(struct vmbus_channel *channel, struct vmbus_gpadl *gpad
if (ret)
pr_warn("Fail to set mem host visibility in GPADL teardown %d.\n", ret);
+ gpadl->decrypted = ret;
+
return ret;
}
EXPORT_SYMBOL_GPL(vmbus_teardown_gpadl);
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 2b00faf98017..5bac136c268c 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -812,6 +812,7 @@ struct vmbus_gpadl {
u32 gpadl_handle;
u32 size;
void *buffer;
+ bool decrypted;
};
struct vmbus_channel {
--
2.25.1
From: Rick Edgecombe <[email protected]>
In CoCo VMs it is possible for the untrusted host to cause
set_memory_encrypted() or set_memory_decrypted() to fail such that an
error is returned and the resulting memory is shared. Callers need to
take care to handle these errors to avoid returning decrypted (shared)
memory to the page allocator, which could lead to functional or security
issues.
VMBus code could free decrypted pages if set_memory_encrypted()/decrypted()
fails. Leak the pages if this happens.
Signed-off-by: Rick Edgecombe <[email protected]>
Signed-off-by: Michael Kelley <[email protected]>
---
drivers/hv/connection.c | 29 ++++++++++++++++++++++-------
1 file changed, 22 insertions(+), 7 deletions(-)
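
The approach in this patch relies on a simple idiom: once a page can no
longer be trusted to be private, the pointer to it is dropped so that no
later cleanup path can free it. A minimal user-space sketch of that idiom
follows. The names fake_setup(), fake_forget_pages(), fake_disconnect()
and freed_count are hypothetical, and incrementing freed_count merely
stands in for the real free call.

```c
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 2

static char backing[NPAGES];		/* stand-ins for the two pages */
static char *monitor_pages[NPAGES];	/* NULL means nothing to release */
static int freed_count;

static void fake_setup(void)
{
	for (int i = 0; i < NPAGES; i++)
		monitor_pages[i] = &backing[i];
	freed_count = 0;
}

/* Error path after a failed decrypt: forget both pages so nothing frees them. */
static void fake_forget_pages(void)
{
	for (int i = 0; i < NPAGES; i++)
		monitor_pages[i] = NULL;
}

/* Disconnect path: free a page only if it was re-encrypted successfully. */
static void fake_disconnect(const bool encrypt_fails[NPAGES])
{
	for (int i = 0; i < NPAGES; i++) {
		if (!monitor_pages[i])
			continue;		/* already leaked or never set up */
		if (!encrypt_fails[i])
			freed_count++;		/* models freeing the page */
		monitor_pages[i] = NULL;	/* pointer dropped either way */
	}
}
```

Dropping the pointer on every path keeps the cleanup code idempotent: a
second call finds only NULL entries and does nothing.
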
diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index 3cabeeabb1ca..f001ae880e1d 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -237,8 +237,17 @@ int vmbus_connect(void)
vmbus_connection.monitor_pages[0], 1);
ret |= set_memory_decrypted((unsigned long)
vmbus_connection.monitor_pages[1], 1);
- if (ret)
+ if (ret) {
+ /*
+ * If set_memory_decrypted() fails, the encryption state
+ * of the memory is unknown. So leak the memory instead
+ * of risking returning decrypted memory to the free list.
+ * For simplicity, always handle both pages the same.
+ */
+ vmbus_connection.monitor_pages[0] = NULL;
+ vmbus_connection.monitor_pages[1] = NULL;
goto cleanup;
+ }
/*
* Set_memory_decrypted() will change the memory contents if
@@ -337,13 +346,19 @@ void vmbus_disconnect(void)
vmbus_connection.int_page = NULL;
}
- set_memory_encrypted((unsigned long)vmbus_connection.monitor_pages[0], 1);
- set_memory_encrypted((unsigned long)vmbus_connection.monitor_pages[1], 1);
+ if (vmbus_connection.monitor_pages[0]) {
+ if (!set_memory_encrypted(
+ (unsigned long)vmbus_connection.monitor_pages[0], 1))
+ hv_free_hyperv_page(vmbus_connection.monitor_pages[0]);
+ vmbus_connection.monitor_pages[0] = NULL;
+ }
- hv_free_hyperv_page(vmbus_connection.monitor_pages[0]);
- hv_free_hyperv_page(vmbus_connection.monitor_pages[1]);
- vmbus_connection.monitor_pages[0] = NULL;
- vmbus_connection.monitor_pages[1] = NULL;
+ if (vmbus_connection.monitor_pages[1]) {
+ if (!set_memory_encrypted(
+ (unsigned long)vmbus_connection.monitor_pages[1], 1))
+ hv_free_hyperv_page(vmbus_connection.monitor_pages[1]);
+ vmbus_connection.monitor_pages[1] = NULL;
+ }
}
/*
--
2.25.1
From: Rick Edgecombe <[email protected]>
In CoCo VMs it is possible for the untrusted host to cause
set_memory_encrypted() or set_memory_decrypted() to fail such that an
error is returned and the resulting memory is shared. Callers need to
take care to handle these errors to avoid returning decrypted (shared)
memory to the page allocator, which could lead to functional or security
issues.
The VMBus device UIO driver could free decrypted/shared pages if
set_memory_decrypted() fails. Check the decrypted field in the gpadl
to decide whether to free the memory.
Signed-off-by: Rick Edgecombe <[email protected]>
Signed-off-by: Michael Kelley <[email protected]>
---
drivers/uio/uio_hv_generic.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/drivers/uio/uio_hv_generic.c b/drivers/uio/uio_hv_generic.c
index 20d9762331bd..6be3462b109f 100644
--- a/drivers/uio/uio_hv_generic.c
+++ b/drivers/uio/uio_hv_generic.c
@@ -181,12 +181,14 @@ hv_uio_cleanup(struct hv_device *dev, struct hv_uio_private_data *pdata)
{
if (pdata->send_gpadl.gpadl_handle) {
vmbus_teardown_gpadl(dev->channel, &pdata->send_gpadl);
- vfree(pdata->send_buf);
+ if (!pdata->send_gpadl.decrypted)
+ vfree(pdata->send_buf);
}
if (pdata->recv_gpadl.gpadl_handle) {
vmbus_teardown_gpadl(dev->channel, &pdata->recv_gpadl);
- vfree(pdata->recv_buf);
+ if (!pdata->recv_gpadl.decrypted)
+ vfree(pdata->recv_buf);
}
}
@@ -295,7 +297,8 @@ hv_uio_probe(struct hv_device *dev,
ret = vmbus_establish_gpadl(channel, pdata->recv_buf,
RECV_BUFFER_SIZE, &pdata->recv_gpadl);
if (ret) {
- vfree(pdata->recv_buf);
+ if (!pdata->recv_gpadl.decrypted)
+ vfree(pdata->recv_buf);
goto fail_close;
}
@@ -317,7 +320,8 @@ hv_uio_probe(struct hv_device *dev,
ret = vmbus_establish_gpadl(channel, pdata->send_buf,
SEND_BUFFER_SIZE, &pdata->send_gpadl);
if (ret) {
- vfree(pdata->send_buf);
+ if (!pdata->send_gpadl.decrypted)
+ vfree(pdata->send_buf);
goto fail_close;
}
--
2.25.1
From: Michael Kelley <[email protected]>
In CoCo VMs it is possible for the untrusted host to cause
set_memory_encrypted() or set_memory_decrypted() to fail such that an
error is returned and the resulting memory is shared. Callers need to
take care to handle these errors to avoid returning decrypted (shared)
memory to the page allocator, which could lead to functional or security
issues.
The VMBus ring buffer code could free decrypted/shared pages if
set_memory_decrypted() fails. Check the decrypted field in the struct
vmbus_gpadl for the ring buffers to decide whether to free the memory.
Signed-off-by: Michael Kelley <[email protected]>
---
drivers/hv/channel.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index bb5abdcda18f..47e1bd8de9fc 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -153,7 +153,9 @@ void vmbus_free_ring(struct vmbus_channel *channel)
hv_ringbuffer_cleanup(&channel->inbound);
if (channel->ringbuffer_page) {
- __free_pages(channel->ringbuffer_page,
+ /* In a CoCo VM leak the memory if it didn't get re-encrypted */
+ if (!channel->ringbuffer_gpadlhandle.decrypted)
+ __free_pages(channel->ringbuffer_page,
get_order(channel->ringbuffer_pagecount
<< PAGE_SHIFT));
channel->ringbuffer_page = NULL;
--
2.25.1
On Mon, Mar 11, 2024 at 09:15:53AM -0700, [email protected] wrote:
> [...]
Applied to hyperv-fixes. Thanks.
On Wed, 2024-04-10 at 21:34 +0000, Wei Liu wrote:
>
> Applied to hyperv-fixes. Thanks.
Thanks, and thanks to Michael for getting it across the finish line.