2024-01-24 17:40:17

by Steve Wahl

Subject: [repost PATCH v2] x86/mm/ident_map: Use gbpages only where full GB page should be mapped.

Instead of using gbpages for all memory regions, which can include
vast areas outside what's actually been requested, use them only when
map creation requests include the full GB page of space; descend to
using smaller 2M pages when only portions of a GB page are included in
the request.

No attempt is made to coalesce mapping requests. If a request requires
a map entry at the 2M (pmd) level, subsequent mapping requests within
the same 1G region will also be at the pmd level, even if adjacent or
overlapping such requests could theoretically have been combined to
map a full gbpage. Existing usage starts with larger regions and then
adds smaller regions, so this should not have any great consequence.

When gbpages are used exclusively to create identity maps, large
ranges of addresses not actually requested can be included in the
resulting table. On UV systems, this ends up including regions that
will cause hardware to halt the system if accessed (these are marked
"reserved" by BIOS). Even though code does not actually make
references to these addresses, including them in an active map allows
processor speculation into this region, which is enough to trigger the
system halt.

The kernel option "nogbpages" will disallow use of gbpages entirely
and avoid this problem, but uses a lot of extra memory for page tables
that are not really needed.

Signed-off-by: Steve Wahl <[email protected]>
---
repost: no changes except for this note. V2 got no replies.

v2: per Dave Hansen review: Additional changelog info,
moved pud_large() check earlier in the code, and
improved the comment describing the conditions
that restrict gbpage usage.

arch/x86/mm/ident_map.c | 20 +++++++++++++++-----
1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/arch/x86/mm/ident_map.c b/arch/x86/mm/ident_map.c
index 968d7005f4a7..5c88c3a7d12a 100644
--- a/arch/x86/mm/ident_map.c
+++ b/arch/x86/mm/ident_map.c
@@ -31,13 +31,23 @@ static int ident_pud_init(struct x86_mapping_info *info, pud_t *pud_page,
 		if (next > end)
 			next = end;

-		if (info->direct_gbpages) {
-			pud_t pudval;
+		/* if this is already a gbpage, this portion is already mapped */
+		if (pud_large(*pud))
+			continue;

-			if (pud_present(*pud))
-				continue;
+		/*
+		 * To be eligible to use a gbpage:
+		 * - gbpages must be enabled
+		 * - addr must be gb aligned (start of region)
+		 * - next must be gb aligned (end of region)
+		 * - PUD must be empty (nothing already mapped in this region)
+		 */
+		if (info->direct_gbpages
+		    && !(addr & ~PUD_MASK)
+		    && !(next & ~PUD_MASK)
+		    && !pud_present(*pud)) {
+			pud_t pudval;

-			addr &= PUD_MASK;
 			pudval = __pud((addr - info->offset) | info->page_flag);
 			set_pud(pud, pudval);
 			continue;
--
2.26.2

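The eligibility test in the hunk above can be tried outside the kernel. What follows is only a minimal userspace sketch, assuming x86-64 paging where one PUD entry covers 1 GiB (PUD_SHIFT of 30). The function names gbpage_eligible() and gbpage_examples() are hypothetical, and the pud state is flattened into a plain bool; none of this is part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: x86-64, where one PUD entry maps 1 GiB. */
#define PUD_SHIFT 30
#define PUD_SIZE  (UINT64_C(1) << PUD_SHIFT)
#define PUD_MASK  (~(PUD_SIZE - 1))

/*
 * Hypothetical stand-in for the patch's if() condition.
 * addr/next are the start and end of the chunk being mapped;
 * pud_is_present models pud_present(*pud).
 */
bool gbpage_eligible(bool direct_gbpages, uint64_t addr, uint64_t next,
		     bool pud_is_present)
{
	return direct_gbpages
	    && !(addr & ~PUD_MASK)	/* start is 1 GiB aligned */
	    && !(next & ~PUD_MASK)	/* end is 1 GiB aligned */
	    && !pud_is_present;		/* nothing mapped here yet */
}

/* A few illustrative cases. */
void gbpage_examples(void)
{
	/* A full, aligned 1 GiB chunk may use a gbpage. */
	assert(gbpage_eligible(true, UINT64_C(0x40000000),
			       UINT64_C(0x80000000), false));
	/* A 2 MiB chunk at the start of the region may not. */
	assert(!gbpage_eligible(true, UINT64_C(0x40000000),
				UINT64_C(0x40200000), false));
}
```

With these checks, a partial request falls through to the 2M (pmd) path instead of being rounded out to the whole 1 GiB region.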


2024-01-24 23:24:23

by Dave Hansen

Subject: Re: [repost PATCH v2] x86/mm/ident_map: Use gbpages only where full GB page should be mapped.

On 1/24/24 09:36, Steve Wahl wrote:
> Instead of using gbpages for all memory regions, which can include
> vast areas outside what's actually been requested, use them only when
> map creation requests include the full GB page of space; descend to
> using smaller 2M pages when only portions of a GB page are included in
> the request.

This kinda jumps immediately to the solution without stating the problem.

The problem is something like this:

Right now, ident_pud_init() will use 1GB pages to map an area as
long as 1G pages are supported. It does not consider the size
of the area being mapped. Mapping 1G? Use a 1GB mapping.
Mapping 4k? Also use a 1GB mapping. On UV systems, this ends
up mapping BIOS-reserved regions that will cause hardware to
halt the system if accessed, even speculatively.

Right?
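
To put a number on the over-mapping described above, here is a small userspace sketch. It assumes x86-64's 1 GiB PUD coverage and a request that lies within a single 1 GiB region; the struct and function names are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: x86-64, where one PUD entry maps 1 GiB. */
#define PUD_SHIFT 30
#define PUD_SIZE  (UINT64_C(1) << PUD_SHIFT)
#define PUD_MASK  (~(PUD_SIZE - 1))

struct span {
	uint64_t start;
	uint64_t end;
};

/*
 * Range covered by the single 1 GiB entry that the pre-patch code
 * installs for addr (it rounds addr down with "addr &= PUD_MASK").
 */
struct span gbpage_span(uint64_t addr)
{
	struct span s;

	s.start = addr & PUD_MASK;
	s.end = s.start + PUD_SIZE;
	return s;
}

/*
 * Bytes mapped but never requested, for a request [addr, end)
 * that sits inside one 1 GiB region.
 */
uint64_t unrequested_bytes(uint64_t addr, uint64_t end)
{
	struct span s = gbpage_span(addr);

	return (s.end - s.start) - (end - addr);
}
```

For a 4 KiB request at 0x40001000, gbpage_span() returns 0x40000000 to 0x80000000, so unrequested_bytes() reports 1 GiB minus 4 KiB (0x3ffff000) of address space that is mapped without having been asked for.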

> + /*
> + * To be eligible to use a gbpage:
> + * - gbpages must be enabled
> + * - addr must be gb aligned (start of region)
> + * - next must be gb aligned (end of region)
> + * - PUD must be empty (nothing already mapped in this region)
> + */

.. this also needs to say _why_. As it stands, it kinda just rewrites
the code in English which isn't super helpful. It's also awfully
awkward to write a multi-line comment above a multi-line if().

Why not refactor it to do something like:

	bool can_use_gbpages = info->direct_gbpages;

	/* Avoid using a gbpage when it would be too large: */
	can_use_gbpages &= !((addr & ~PUD_MASK) ||
			     (next & ~PUD_MASK));

	/* Never overwrite existing mappings: */
	can_use_gbpages &= !pud_present(*pud);

	if (can_use_gbpages) {
		...