On Mon, Jun 12, 2023 at 09:08:58AM -0700, Dave Hansen wrote:
> On 6/11/23 21:25, Michael Roth wrote:
> > +/*
> > + * The RMP entry format is not architectural. The format is defined in PPR
> > + * Family 19h Model 01h, Rev B1 processor.
> > + */
> > +struct rmpentry {
> > + union {
> > + struct {
> > + u64 assigned : 1,
> > + pagesize : 1,
> > + immutable : 1,
> > + rsvd1 : 9,
> > + gpa : 39,
> > + asid : 10,
> > + vmsa : 1,
> > + validated : 1,
> > + rsvd2 : 1;
> > + } info;
> > + u64 low;
> > + };
> > + u64 high;
> > +} __packed;
>
> What's 'high' used for? The PPR says it's reserved. Why not call it
> reserved?
>
> It _looks_ like it's only used for a debugging pr_info(). It makes the
> struct look kinda goofy. I'd much rather limit the goofiness to the
> "dumping" code, like:
>
> u64 *__e = (void *)e;
> ....
> pr_info("RMPEntry paddr 0x%llx: [high=0x%016llx low=0x%016llx]\n",
> pfn << PAGE_SHIFT, __e[0], __e[1]);
>
> BTW, why does it do any good to dump all these reserved fields?
>
The reserved bits sometimes contain information that can be useful to
pass along to folks on the firmware side, so would definitely be helpful
to provide the full raw contents of the RMP entry.
So maybe something like this better captures the intended usage:
struct rmpentry {
union {
struct {
u64 assigned : 1,
pagesize : 1,
immutable : 1,
rsvd1 : 9,
gpa : 39,
asid : 10,
vmsa : 1,
validated : 1,
rsvd2 : 1;
u64 rsvd3;
} info;
u64 data[2];
};
} __packed;
But dropping the union and casting to u64[] locally in the debug/dumping
routine should work fine as well.
> > /*
> > * The first 16KB from the RMP_BASE is used by the processor for the
> > * bookkeeping, the range needs to be added during the RMP entry lookup.
> > */
> > #define RMPTABLE_CPU_BOOKKEEPING_SZ 0x4000
> > +#define RMPENTRY_SHIFT 8
> > +#define rmptable_page_offset(x) (RMPTABLE_CPU_BOOKKEEPING_SZ + \
> > + (((unsigned long)x) >> RMPENTRY_SHIFT))
>
> Is RMPENTRY_SHIFT just log2(sizeof(struct rmpentry))? If so, wouldn't
> this be a *LOT* more obvious to be written:
>
> unsigned long rmptable_offset(unsigned long paddr)
> {
> return RMPTABLE_CPU_BOOKKEEPING_SZ +
> paddr / sizeof(struct rmpentry);
> }
>
> ??
>
> It removes a cast, gives 'x' a real name, removes a random magic number
> #define and is clearer to boot.
Yes, we ended up reworking this to be an array of struct rmpentry's that
are indexed by PFN. That helps to simplify the lookup logic a good bit.
>
> > static unsigned long rmptable_start __ro_after_init;
> > static unsigned long rmptable_end __ro_after_init;
> > @@ -210,3 +235,63 @@ static int __init snp_rmptable_init(void)
> > * the page(s) used for DMA are hypervisor owned.
> > */
> > fs_initcall(snp_rmptable_init);
> > +
> > +static inline unsigned int rmpentry_assigned(const struct rmpentry *e)
> > +{
> > + return e->info.assigned;
> > +}
> > +
> > +static inline unsigned int rmpentry_pagesize(const struct rmpentry *e)
> > +{
> > + return e->info.pagesize;
> > +}
>
> I think these are a little silly. If you're going to go to the trouble
> of using bitfields, you don't need helpers for every bitfield. I say
> either use bitfields without helpers or just regular old:
>
> #define RMPENTRY_ASSIGNED_MASK BIT(1)
>
> and then *maybe* you can make helpers for them.
Yah, if we have the bitfields it makes sense to use them directly.
>
> > +static int rmptable_entry(unsigned long paddr, struct rmpentry *entry)
> > +{
> > + unsigned long vaddr;
> > +
> > + vaddr = rmptable_start + rmptable_page_offset(paddr);
> > + if (unlikely(vaddr > rmptable_end))
> > + return -EFAULT;
>
> This seems like more of a WARN_ON_ONCE() kind of thing than just an
> error return. Isn't it a big deal if an invalid (non-RMP-covered)
> address makes it in here?
Yes, now the lookups are by indexing into an array of struct rmpentry's,
that scenario basically the same as an attempted out-of-bounds access,
so definitely worth a warning.
>
> > + *entry = *(struct rmpentry *)vaddr;
> > +
> > + return 0;
> > +}
> > +
> > +static int __snp_lookup_rmpentry(u64 pfn, struct rmpentry *entry, int *level)
> > +{
>
> I've been bitten by pfn vs. paddr quite a few times. I'd really
> encourage you to add it to the names if you're going to pass them
> around, like __snp_lookup_rmpentry_pfn().
>
> > + unsigned long paddr = pfn << PAGE_SHIFT;
> > + struct rmpentry large_entry;
> > + int ret;
> > +
> > + if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> > + return -ENXIO;
>
> OK, so if you're running on non-SNP hardware, you return -ENXIO here.
> Remember this please.
>
> > + ret = rmptable_entry(paddr, entry);
> > + if (ret)
> > + return ret;
> > +
> > + /* Read a large RMP entry to get the correct page level used in RMP entry. */
> > + ret = rmptable_entry(paddr & PMD_MASK, &large_entry);
> > + if (ret)
> > + return ret;
> > +
> > + *level = RMP_TO_X86_PG_LEVEL(rmpentry_pagesize(&large_entry));
> > +
> > + return 0;
> > +}
>
> This is a bit weird. Should it say something like this?
>
> To do an 4k RMP lookup the hardware looks at two places in the RMP:
I'd word this as:
"To query all the relevant bit of an 4k RMP entry, the kernel must access
2 entries in the RMP table:"
Because it's possible hardware only looks at the 2M entry for
hardware-based lookups, depending on where the access is coming from, or
how the memory at the PFN range is mapped.
But otherwise it seems like an accurate description.
>
> 1. The actual 4k RMP entry
> 2. The 2M entry that "covers" the 4k entry
>
> The page size is not indicated in the 4k entries at all. It is solely
> indicated by the 2M entry.
>
> > +int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level)
> > +{
> > + struct rmpentry e;
> > + int ret;
> > +
> > + ret = __snp_lookup_rmpentry(pfn, &e, level);
> > + if (ret)
> > + return ret;
> > +
> > + *assigned = !!rmpentry_assigned(&e);
> > + return 0;
> > +}
> > +EXPORT_SYMBOL_GPL(snp_lookup_rmpentry);
> > diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
> > index b8357d6ecd47..bf0378136289 100644
> > --- a/arch/x86/include/asm/sev-common.h
> > +++ b/arch/x86/include/asm/sev-common.h
> > @@ -171,4 +171,8 @@ struct snp_psc_desc {
> > #define GHCB_ERR_INVALID_INPUT 5
> > #define GHCB_ERR_INVALID_EVENT 6
> >
> > +/* RMP page size */
> > +#define RMP_PG_SIZE_4K 0
> > +#define RMP_TO_X86_PG_LEVEL(level) (((level) == RMP_PG_SIZE_4K) ? PG_LEVEL_4K : PG_LEVEL_2M)
> > +
> > #endif
> > diff --git a/arch/x86/include/asm/sev-host.h b/arch/x86/include/asm/sev-host.h
> > new file mode 100644
> > index 000000000000..30d47e20081d
> > --- /dev/null
> > +++ b/arch/x86/include/asm/sev-host.h
> > @@ -0,0 +1,22 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +/*
> > + * AMD SVM-SEV Host Support.
> > + *
> > + * Copyright (C) 2023 Advanced Micro Devices, Inc.
> > + *
> > + * Author: Ashish Kalra <[email protected]>
> > + *
> > + */
> > +
> > +#ifndef __ASM_X86_SEV_HOST_H
> > +#define __ASM_X86_SEV_HOST_H
> > +
> > +#include <asm/sev-common.h>
> > +
> > +#ifdef CONFIG_KVM_AMD_SEV
> > +int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level);
> > +#else
> > +static inline int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level) { return 0; }
> > +#endif
>
> Above, -ENXIO was returned when SEV-SNP was not supported. Here, 0 is
> returned when it is compiled out. That inconsistent.
>
> Is snp_lookup_rmpentry() acceptable when SEV-SNP is in play? I'd like
> to see consistency between when it is compiled out and when it is
> compiled in but unsupported on the CPU.
I really don't think anything in the kernel should be calling
snp_lookup_rmpentry(), so I think it makes sense to adoption the -ENXIO
convention here and in any other stubs where that applies.
Thanks,
Mike
>
> > +#endif
> > diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
> > index d34c46db7dd1..446fc7a9f7b0 100644
> > --- a/arch/x86/include/asm/sev.h
> > +++ b/arch/x86/include/asm/sev.h
> > @@ -81,9 +81,6 @@ extern bool handle_vc_boot_ghcb(struct pt_regs *regs);
> > /* Software defined (when rFlags.CF = 1) */
> > #define PVALIDATE_FAIL_NOUPDATE 255
> >
> > -/* RMP page size */
> > -#define RMP_PG_SIZE_4K 0
> > -
> > #define RMPADJUST_VMSA_PAGE_BIT BIT(16)
> >
> > /* SNP Guest message request */
>