2020-11-19 18:01:12

by Andrea Arcangeli

[permalink] [raw]
Subject: [PATCH 0/1] x86: restore the write back cache of reserved RAM in iounmap()

Hello everyone,

We identified some PCD set on the direct mapping causing hardly
reproducible performance issues and this patch fixes the ultimate root
cause.

The caller for now has been tweaked to avoid triggering this case (now
that we know about it) however if the observations on the proposed fix
aren't correct, it'd be great if we could still do some other change
to ioremap/iounmap and perhaps the other memtype APIs, to be sure
those PCD/PWT leftovers won't happen again a few years from now in
another place.

For example one more complex alternative would be to use the
page_mapcount of reserved pages (currently unused) to do proper
refcounting on the overlapping ioremap so you can resync the kernel
direct mapping to write back only at the very last iounmap.

Or a much simpler alternative that would remain fully neutral for
overlapping ioremaps, would be to overwrite all page_count of reserved
RAM in a way that if they're ever freed later it'll trigger a crash
during __free_pages.

==
// SPDX-License-Identifier: GPL-2.0-only
/*
* ioremap.c
*
* Copyright (C) 2020 Red Hat, Inc.
*
* Reproducer for bug io iounmap.
*/

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/mm.h>
#include <asm/io.h>

#define NR_PAGES 512
#define REPRODUCE

static struct page *pages[NR_PAGES];
static void __iomem *map[NR_PAGES];

int init_module(void)
{
int i;
for (i = 0; i < NR_PAGES; i++) {
pages[i] = alloc_pages(GFP_KERNEL|__GFP_NOWARN, MAX_ORDER-1);
if (!pages[i])
break;
__SetPageReserved(pages[i]);
#ifdef REPRODUCE
map[i] = ioremap(page_to_phys(pages[i]), 1L<<(MAX_ORDER-1));
WARN_ON_ONCE(!map[i]);
#endif
}

return 0;
}

void cleanup_module(void)
{
int i;
for (i = 0; i < NR_PAGES; i++) {
if (map[i])
iounmap(map[i]);
if (!pages[i])
break;
__ClearPageReserved(pages[i]);
__free_pages(pages[i], MAX_ORDER-1);
}
}

MODULE_LICENSE("GPL");
==

Andrea Arcangeli (1):
x86: restore the write back cache of reserved RAM in iounmap()

arch/x86/mm/ioremap.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)


2020-11-19 18:04:58

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 0/1] x86: restore the write back cache of reserved RAM in iounmap()

On Thu, Nov 19, 2020 at 12:59:01PM -0500, Andrea Arcangeli wrote:
> Hello everyone,
>
> We identified some PCD set on the direct mapping causing hardly
> reproducible performance issues and this patch fixes the ultimate root
> cause.
>
> The caller for now has been tweaked to avoid triggering this case (now
> that we know about it)

What is the callers? The whole SetPageReservered + ioremap* thing
you mention in the actual patch is completely bogus. I think we'll
need to reject that as well and fix the caller.

2020-11-19 19:06:02

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: [PATCH 0/1] x86: restore the write back cache of reserved RAM in iounmap()

Hello Christoph,

On Thu, Nov 19, 2020 at 06:02:06PM +0000, Christoph Hellwig wrote:
> What is the callers? The whole SetPageReservered + ioremap* thing
> you mention in the actual patch is completely bogus. I think we'll
> need to reject that as well and fix the caller.

The actual caller is not so much the focus here: the point here is to
be able to either handle the caller gracefully or to get a synchronous
kernel crash in __free_pages.

Otherwise the problem induced by such a caller (no matter if right or
wrong) becomes hardly debuggable.

The caller in question was the EFI_BOOT_SERVICE_DATA that is aliased
on non RAM but then freed later by swapping RAM under it.

Of course the caller has already been changed to stick to write back
and that specific caller is not a concern anymore. My concern is if we
leave the callee (iounmap) as it is, what does guarantee us that we
won't hit again in production a few years down the road?

When I first read the caller it felt nothing should have gone wrong,
it looked ok even the version that would leave PCD leftovers bits in
the direct map. So I didn't get why switching to write back would
prevent the PCD leftovers until I looked at the callee (iounmap).

Thanks,
Andrea