2023-12-30 16:25:33

by Michael Roth

Subject: [PATCH v1 15/26] x86/sev: Introduce snp leaked pages list

From: Ashish Kalra <[email protected]>

Pages are unsafe to be released back to the page-allocator if they
have been transitioned to firmware/guest state and can't be reclaimed
or transitioned back to hypervisor/shared state. In this case, add
them to an internal leaked pages list to ensure that they are not freed
or touched/accessed, which would cause fatal page faults.

Signed-off-by: Ashish Kalra <[email protected]>
[mdr: relocate to arch/x86/virt/svm/sev.c]
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/include/asm/sev.h | 2 ++
arch/x86/virt/svm/sev.c | 35 +++++++++++++++++++++++++++++++++++
2 files changed, 37 insertions(+)

diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index d3ccb7a0c7e9..435ba9bc4510 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -264,6 +264,7 @@ void snp_dump_hva_rmpentry(unsigned long address);
int psmash(u64 pfn);
int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid, bool immutable);
int rmp_make_shared(u64 pfn, enum pg_level level);
+void snp_leak_pages(u64 pfn, unsigned int npages);
#else
static inline bool snp_probe_rmptable_info(void) { return false; }
static inline int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level) { return -ENODEV; }
@@ -275,6 +276,7 @@ static inline int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int as
return -ENODEV;
}
static inline int rmp_make_shared(u64 pfn, enum pg_level level) { return -ENODEV; }
+static inline void snp_leak_pages(u64 pfn, unsigned int npages) {}
#endif

#endif
diff --git a/arch/x86/virt/svm/sev.c b/arch/x86/virt/svm/sev.c
index ee182351d93a..0f2e1ce241b5 100644
--- a/arch/x86/virt/svm/sev.c
+++ b/arch/x86/virt/svm/sev.c
@@ -60,6 +60,17 @@ static u64 probed_rmp_base, probed_rmp_size;
static struct rmpentry *rmptable __ro_after_init;
static u64 rmptable_max_pfn __ro_after_init;

+/* List of pages which are leaked and cannot be reclaimed */
+struct leaked_page {
+ struct page *page;
+ struct list_head list;
+};
+
+static LIST_HEAD(snp_leaked_pages_list);
+static DEFINE_SPINLOCK(snp_leaked_pages_list_lock);
+
+static unsigned long snp_nr_leaked_pages;
+
#undef pr_fmt
#define pr_fmt(fmt) "SEV-SNP: " fmt

@@ -476,3 +487,27 @@ int rmp_make_shared(u64 pfn, enum pg_level level)
return rmpupdate(pfn, &state);
}
EXPORT_SYMBOL_GPL(rmp_make_shared);
+
+void snp_leak_pages(u64 pfn, unsigned int npages)
+{
+ struct page *page = pfn_to_page(pfn);
+ struct leaked_page *leak;
+
+ pr_debug("%s: leaking PFN range 0x%llx-0x%llx\n", __func__, pfn, pfn + npages);
+
+ spin_lock(&snp_leaked_pages_list_lock);
+ while (npages--) {
+ leak = kzalloc(sizeof(*leak), GFP_KERNEL_ACCOUNT);
+ if (!leak)
+ goto unlock;
+ leak->page = page;
+ list_add_tail(&leak->list, &snp_leaked_pages_list);
+ dump_rmpentry(pfn);
+ snp_nr_leaked_pages++;
+ pfn++;
+ page++;
+ }
+unlock:
+ spin_unlock(&snp_leaked_pages_list_lock);
+}
+EXPORT_SYMBOL_GPL(snp_leak_pages);
--
2.25.1
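For illustration, the container-object scheme from the patch above can be sketched as a self-contained userspace program. `struct leaked_page` and `snp_leak_pages()` mirror the patch's names; the singly-linked list, `calloc()`, and the plain `pfn` field are simplified stand-ins for the kernel's `list_head`, `kzalloc()`, and `struct page *`, so this is a sketch of the idea, not kernel code:

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Userspace sketch of the container-object approach: each leaked pfn
 * gets a small tracking struct chained onto a global list.  Newest
 * entries are prepended here for simplicity; the kernel version uses
 * list_add_tail() on a doubly-linked list instead.
 */
struct leaked_page {
	unsigned long long pfn;		/* stand-in for struct page * */
	struct leaked_page *next;
};

static struct leaked_page *snp_leaked_pages_list;
static unsigned long snp_nr_leaked_pages;

/* Returns the number of pages actually recorded (allocation may fail). */
static unsigned int snp_leak_pages(unsigned long long pfn, unsigned int npages)
{
	unsigned int leaked = 0;

	while (npages--) {
		struct leaked_page *leak = calloc(1, sizeof(*leak));

		if (!leak)
			break;		/* partial tracking on allocation failure */
		leak->pfn = pfn++;
		leak->next = snp_leaked_pages_list;
		snp_leaked_pages_list = leak;
		snp_nr_leaked_pages++;
		leaked++;
	}
	return leaked;
}
```

The allocation-failure path is exactly what the review below objects to: losing memory may require allocating memory, and the tracking can silently become partial.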



2024-01-08 10:45:54

by Vlastimil Babka

Subject: Re: [PATCH v1 15/26] x86/sev: Introduce snp leaked pages list

On 12/30/23 17:19, Michael Roth wrote:
> From: Ashish Kalra <[email protected]>
>
> Pages are unsafe to be released back to the page-allocator if they
> have been transitioned to firmware/guest state and can't be reclaimed
> or transitioned back to hypervisor/shared state. In this case, add
> them to an internal leaked pages list to ensure that they are not freed
> or touched/accessed, which would cause fatal page faults.
>
> Signed-off-by: Ashish Kalra <[email protected]>
> [mdr: relocate to arch/x86/virt/svm/sev.c]
> Signed-off-by: Michael Roth <[email protected]>

Hi, sorry I didn't respond in time to the last mail discussing the previous
version in
https://lore.kernel.org/all/[email protected]/
due to the upcoming holidays.

I would rather avoid the approach of allocating container objects:
- it's allocating memory when effectively losing memory, a dangerous thing
- are all the callers and their context ok with GFP_KERNEL?
- GFP_KERNEL_ACCOUNT seems wrong, why would we be charging this to the
current process, it's probably not its fault the pages are leaked? Also the
charging can fail?
- given the benefit of having leaked pages on a list is basically just
debugging (i.e. crash dump or drgn inspection) this seems too heavy

I think it would be better and sufficient to use page->lru for order-0 and
head pages, and simply skip tail pages (possibly with adjusted warning
message for that case).
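The suggestion above, chaining leaked pages through list storage embedded in the page itself rather than allocating container objects, can be sketched in userspace as follows. `struct fake_page`, its flags, and `leak_range()` are hypothetical stand-ins for `struct page`, the compound-page state, and `snp_leak_pages()`:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Sketch of the intrusive-list approach: the list node lives inside the
 * page (as page->lru / page->buddy_list would in the kernel), so leaking
 * a page never needs a fresh allocation.  Compound tail pages are
 * skipped because their list field is not usable.
 */
struct list_node { struct list_node *prev, *next; };

struct fake_page {
	bool head;		/* compound head page */
	bool tail;		/* compound tail page: list field unusable */
	struct list_node lru;	/* reused to chain onto the leaked list */
};

static struct list_node leaked = { &leaked, &leaked };

static void list_add_tail_node(struct list_node *n, struct list_node *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Returns how many of the npages were actually put on the list. */
static unsigned int leak_range(struct fake_page *page, unsigned int npages)
{
	unsigned int listed = 0;

	for (; npages--; page++) {
		if (page->tail)
			continue;	/* skip tail pages, as suggested */
		list_add_tail_node(&page->lru, &leaked);
		listed++;
	}
	return listed;
}
```

The key design point is that no allocation happens on the leak path: the page being lost already provides the storage for tracking it.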

Vlastimil

<snip>

> +
> +void snp_leak_pages(u64 pfn, unsigned int npages)
> +{
> + struct page *page = pfn_to_page(pfn);
> + struct leaked_page *leak;
> +
> + pr_debug("%s: leaking PFN range 0x%llx-0x%llx\n", __func__, pfn, pfn + npages);
> +
> + spin_lock(&snp_leaked_pages_list_lock);
> + while (npages--) {
> + leak = kzalloc(sizeof(*leak), GFP_KERNEL_ACCOUNT);
> + if (!leak)
> + goto unlock;

Should we skip the dump_rmpentry() in such a case?

> + leak->page = page;
> + list_add_tail(&leak->list, &snp_leaked_pages_list);
> + dump_rmpentry(pfn);
> + snp_nr_leaked_pages++;
> + pfn++;
> + page++;
> + }
> +unlock:
> + spin_unlock(&snp_leaked_pages_list_lock);
> +}
> +EXPORT_SYMBOL_GPL(snp_leak_pages);


2024-01-09 22:21:00

by Ashish Kalra

Subject: Re: [PATCH v1 15/26] x86/sev: Introduce snp leaked pages list

Hello Vlastimil,

On 1/8/2024 4:45 AM, Vlastimil Babka wrote:
> On 12/30/23 17:19, Michael Roth wrote:
>> From: Ashish Kalra <[email protected]>
>>
>> Pages are unsafe to be released back to the page-allocator if they
>> have been transitioned to firmware/guest state and can't be reclaimed
>> or transitioned back to hypervisor/shared state. In this case, add
>> them to an internal leaked pages list to ensure that they are not freed
>> or touched/accessed, which would cause fatal page faults.
>>
>> Signed-off-by: Ashish Kalra <[email protected]>
>> [mdr: relocate to arch/x86/virt/svm/sev.c]
>> Signed-off-by: Michael Roth <[email protected]>
> Hi, sorry I didn't respond in time to the last mail discussing previous
> version in
> https://lore.kernel.org/all/[email protected]/
> due to upcoming holidays.
>
> I would rather avoid the approach of allocating container objects:
> - it's allocating memory when effectively losing memory, a dangerous thing
> - are all the callers and their context ok with GFP_KERNEL?
> - GFP_KERNEL_ACCOUNT seems wrong, why would we be charging this to the
> current process, it's probably not its fault the pages are leaked? Also the
> charging can fail?
> - given the benefit of having leaked pages on a list is basically just
> debugging (i.e. crash dump or drgn inspection) this seems too heavy
>
> I think it would be better and sufficient to use page->lru for order-0 and
> head pages, and simply skip tail pages (possibly with adjusted warning
> message for that case).
>
> Vlastimil
>
> <snip>

Considering the above thoughts, here is the updated version of
snp_leak_pages(); looking forward to any review comments/feedback you
have on it:

void snp_leak_pages(u64 pfn, unsigned int npages)
{
	struct page *page = pfn_to_page(pfn);

	pr_debug("%s: leaking PFN range 0x%llx-0x%llx\n", __func__, pfn, pfn + npages);

	spin_lock(&snp_leaked_pages_list_lock);
	while (npages--) {
		/*
		 * Reuse the page's buddy list for chaining into the leaked
		 * pages list. This page should not be on a free list currently
		 * and is also unsafe to be added to a free list.
		 */
		if (likely(!PageCompound(page)) || (PageCompound(page) &&
		    !PageTail(page) && compound_head(page) == page))
			/*
			 * Skip inserting tail pages of a compound page as
			 * page->buddy_list of tail pages is not usable.
			 */
			list_add_tail(&page->buddy_list, &snp_leaked_pages_list);
		sev_dump_rmpentry(pfn);
		snp_nr_leaked_pages++;
		pfn++;
		page++;
	}
	spin_unlock(&snp_leaked_pages_list_lock);
}

Thanks, Ashish
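As a mechanical check on the posted condition: since a compound page is either a head or a tail, and compound_head(page) == page holds exactly for head pages, the whole test collapses to !PageTail(page). A hypothetical userspace truth-table sketch (`struct state` and both predicate functions are stand-ins, not kernel code):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Model the three consistent page states (order-0, compound head,
 * compound tail) and show that the posted condition and !PageTail()
 * agree on all of them.
 */
struct state { bool compound, head, tail; };

static bool posted(struct state s)
{
	/* (!PageCompound) || (PageCompound && !PageTail && compound_head == page) */
	return !s.compound || (s.compound && !s.tail && s.head);
}

static bool simplified(struct state s)
{
	return !s.tail;		/* !PageTail() */
}
```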


2024-01-10 08:59:57

by Vlastimil Babka

Subject: Re: [PATCH v1 15/26] x86/sev: Introduce snp leaked pages list

On 1/9/24 23:19, Kalra, Ashish wrote:
> Hello Vlastimil,
>
> On 1/8/2024 4:45 AM, Vlastimil Babka wrote:
>> On 12/30/23 17:19, Michael Roth wrote:
>>> From: Ashish Kalra <[email protected]>
>>>
>>> Pages are unsafe to be released back to the page-allocator if they
>>> have been transitioned to firmware/guest state and can't be reclaimed
>>> or transitioned back to hypervisor/shared state. In this case, add
>>> them to an internal leaked pages list to ensure that they are not freed
>>> or touched/accessed, which would cause fatal page faults.
>>>
>>> Signed-off-by: Ashish Kalra <[email protected]>
>>> [mdr: relocate to arch/x86/virt/svm/sev.c]
>>> Signed-off-by: Michael Roth <[email protected]>
>> Hi, sorry I didn't respond in time to the last mail discussing previous
>> version in
>> https://lore.kernel.org/all/[email protected]/
>> due to upcoming holidays.
>>
>> I would rather avoid the approach of allocating container objects:
>> - it's allocating memory when effectively losing memory, a dangerous thing
>> - are all the callers and their context ok with GFP_KERNEL?
>> - GFP_KERNEL_ACCOUNT seems wrong, why would we be charging this to the
>> current process, it's probably not its fault the pages are leaked? Also the
>> charging can fail?
>> - given the benefit of having leaked pages on a list is basically just
>> debugging (i.e. crash dump or drgn inspection) this seems too heavy
>>
>> I think it would be better and sufficient to use page->lru for order-0 and
>> head pages, and simply skip tail pages (possibly with adjusted warning
>> message for that case).
>>
>> Vlastimil
>>
>> <snip>
>
> Considering the above thoughts, this is updated version of
> snp_leak_pages(), looking forward to any review comments/feedback you
> have on the same:
>
> void snp_leak_pages(u64 pfn, unsigned int npages)
> {
> 	struct page *page = pfn_to_page(pfn);
>
> 	pr_debug("%s: leaking PFN range 0x%llx-0x%llx\n", __func__, pfn, pfn + npages);
>
> 	spin_lock(&snp_leaked_pages_list_lock);
> 	while (npages--) {
> 		/*
> 		 * Reuse the page's buddy list for chaining into the leaked
> 		 * pages list. This page should not be on a free list currently
> 		 * and is also unsafe to be added to a free list.
> 		 */
> 		if (likely(!PageCompound(page)) || (PageCompound(page) &&
> 		    !PageTail(page) && compound_head(page) == page))

This is unnecessarily paranoid wrt that compound_head(page) test, but OTOH
it doesn't handle the weird case where we're leaking less than a whole
compound page (if that can even happen). So I'd suggest:

while (npages) {
	if (likely(!PageCompound(page)) || (PageHead(page) && compound_nr(page) <= npages))
		list_add_tail(&page->buddy_list, ...);

	... (no change from yours)

	npages--;
}

(or an equivalent for()) perhaps

> 			/*
> 			 * Skip inserting tail pages of a compound page as
> 			 * page->buddy_list of tail pages is not usable.
> 			 */
> 			list_add_tail(&page->buddy_list, &snp_leaked_pages_list);
> 		sev_dump_rmpentry(pfn);
> 		snp_nr_leaked_pages++;
> 		pfn++;
> 		page++;
> 	}
> 	spin_unlock(&snp_leaked_pages_list_lock);
> }
>
> Thanks, Ashish
>
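The refined test suggested above, list an order-0 page always and a compound head only when the whole compound fits inside the remaining range, can be sketched as a standalone predicate. `struct pg` and `should_list()` are hypothetical userspace stand-ins for `struct page` and the kernel's `PageCompound()`/`PageHead()`/`compound_nr()` helpers:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace sketch of the suggested predicate: a page goes onto the
 * leaked list if it is order-0, or if it is a compound head and the
 * entire compound lies within the pages still being leaked.
 */
struct pg {
	bool compound;		/* part of a compound page */
	bool head;		/* compound head */
	unsigned int nr;	/* pages in the compound (valid for heads) */
};

static bool should_list(const struct pg *p, unsigned int npages_left)
{
	if (!p->compound)	/* order-0: always safe to list */
		return true;
	/* head page, and the whole compound fits in the remaining range */
	return p->head && p->nr <= npages_left;
}
```

Compared with the earlier condition, this also refuses to list a compound head when only part of its compound is being leaked, which is the "weird case" the suggestion calls out.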