2018-07-05 05:48:13

by Yang, Bin

Subject: [PATCH v2] x86/mm: fix cpu stuck issue in __change_page_attr_set_clr

When changing a 4K page attr inside a 1G/2M large page range,
__change_page_attr() will call try_preserve_large_page() to decide
whether to split the big page or not. And try_preserve_large_page()
will call static_protections() to check all 4K pages inside the large
page range one by one. The check loop is very inefficient. In the
worst case, static_protections() will be called 1G/4K (262144) times.

This issue can be triggered by free_initmem() during kernel boot.
free_initmem() <-- free N pages
free_init_pages()
set_memory_rw()
change_page_attr_set()
change_page_attr_set_clr()
__change_page_attr_set_clr()
__change_page_attr() <-- loop N times
try_preserve_large_page()
static_protections() <-- worst case: loop 262144 * N times

Instead of checking every page, it is enough to check one page per
overlapping range. This patch enhances static_protections() to also
return the number of pages that, starting from the current one, lie in
the same overlapping range and share the same protection flags, so the
caller can skip ahead by that count. This reduces the check loop from
262144 iterations to fewer than 10.
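
The caller in try_preserve_large_page() can then step by the returned
count instead of advancing one 4K page at a time. Simplified sketch of
the resulting loop (the actual change is in the hunk below):

	for (i = 0; i < (psize >> PAGE_SHIFT);
	     i += pnum, addr += PAGE_SIZE * pnum, pfn += pnum) {
		pgprot_t chk_prot = static_protections(req_prot, addr,
						       pfn, &pnum);

		if (pgprot_val(chk_prot) != pgprot_val(new_prot))
			goto out_unlock;
	}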

Signed-off-by: Bin Yang <[email protected]>
---
arch/x86/mm/pageattr.c | 72 ++++++++++++++++++++++++++++++++++++++++----------
1 file changed, 58 insertions(+), 14 deletions(-)

diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 3bded76e..dee23ef 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -292,17 +292,27 @@ static void cpa_flush_array(unsigned long *start, int numpages, int cache,
* checks and fixes these known static required protection bits.
*/
static inline pgprot_t static_protections(pgprot_t prot, unsigned long address,
- unsigned long pfn)
+ unsigned long pfn, unsigned long *page_num)
{
pgprot_t forbidden = __pgprot(0);
+ unsigned long tmp;
+ unsigned long num = PUD_PAGE_SIZE >> PAGE_SHIFT;

/*
* The BIOS area between 640k and 1Mb needs to be executable for
* PCI BIOS based config access (CONFIG_PCI_GOBIOS) support.
*/
#ifdef CONFIG_PCI_BIOS
- if (pcibios_enabled && within(pfn, BIOS_BEGIN >> PAGE_SHIFT, BIOS_END >> PAGE_SHIFT))
- pgprot_val(forbidden) |= _PAGE_NX;
+ if (pcibios_enabled) {
+ tmp = (BIOS_BEGIN >> PAGE_SHIFT) > pfn ?
+ (BIOS_BEGIN >> PAGE_SHIFT) - pfn : ULONG_MAX;
+ if (within(pfn, BIOS_BEGIN >> PAGE_SHIFT,
+ BIOS_END >> PAGE_SHIFT)) {
+ pgprot_val(forbidden) |= _PAGE_NX;
+ tmp = (BIOS_END >> PAGE_SHIFT) - pfn;
+ }
+ num = num > tmp ? tmp : num;
+ }
#endif

/*
@@ -310,18 +320,30 @@ static inline pgprot_t static_protections(pgprot_t prot, unsigned long address,
* Does not cover __inittext since that is gone later on. On
* 64bit we do not enforce !NX on the low mapping
*/
- if (within(address, (unsigned long)_text, (unsigned long)_etext))
+ tmp = (unsigned long)_text > address ?
+ ((unsigned long)_text - address) >> PAGE_SHIFT : ULONG_MAX;
+ if (within(address, (unsigned long)_text, (unsigned long)_etext)) {
pgprot_val(forbidden) |= _PAGE_NX;
+ tmp = ((unsigned long)_etext - address) >> PAGE_SHIFT;
+ }
+ num = num > tmp ? tmp : num;

/*
* The .rodata section needs to be read-only. Using the pfn
* catches all aliases. This also includes __ro_after_init,
* so do not enforce until kernel_set_to_readonly is true.
*/
- if (kernel_set_to_readonly &&
- within(pfn, __pa_symbol(__start_rodata) >> PAGE_SHIFT,
- __pa_symbol(__end_rodata) >> PAGE_SHIFT))
- pgprot_val(forbidden) |= _PAGE_RW;
+ if (kernel_set_to_readonly) {
+ tmp = (__pa_symbol(__start_rodata) >> PAGE_SHIFT) > pfn ?
+ (__pa_symbol(__start_rodata) >> PAGE_SHIFT) - pfn :
+ ULONG_MAX;
+ if (within(pfn, __pa_symbol(__start_rodata) >> PAGE_SHIFT,
+ __pa_symbol(__end_rodata) >> PAGE_SHIFT)) {
+ pgprot_val(forbidden) |= _PAGE_RW;
+ tmp = (__pa_symbol(__end_rodata) >> PAGE_SHIFT) - pfn;
+ }
+ num = num > tmp ? tmp : num;
+ }

#if defined(CONFIG_X86_64)
/*
@@ -333,11 +355,15 @@ static inline pgprot_t static_protections(pgprot_t prot, unsigned long address,
* This will preserve the large page mappings for kernel text/data
* at no extra cost.
*/
+ tmp = kernel_set_to_readonly && (unsigned long)_text > address ?
+ ((unsigned long)_text - address) >> PAGE_SHIFT : ULONG_MAX;
if (kernel_set_to_readonly &&
within(address, (unsigned long)_text,
(unsigned long)__end_rodata_hpage_align)) {
unsigned int level;

+ tmp = ((unsigned long)__end_rodata_hpage_align
+ - address) >> PAGE_SHIFT;
/*
* Don't enforce the !RW mapping for the kernel text mapping,
* if the current mapping is already using small page mapping.
@@ -355,13 +381,28 @@ static inline pgprot_t static_protections(pgprot_t prot, unsigned long address,
* text mapping logic will help Linux Xen parvirt guest boot
* as well.
*/
- if (lookup_address(address, &level) && (level != PG_LEVEL_4K))
+ if (lookup_address(address, &level) && (level != PG_LEVEL_4K)) {
+ unsigned long psize = page_level_size(level);
+ unsigned long pmask = page_level_mask(level);
+ unsigned long nextpage_addr =
+ (address + psize) & pmask;
+ unsigned long numpages =
+ (nextpage_addr - address) >> PAGE_SHIFT;
+
pgprot_val(forbidden) |= _PAGE_RW;
+ tmp = tmp > numpages ? numpages : tmp;
+ }
}
+ num = kernel_set_to_readonly && num > tmp ? tmp : num;
#endif

prot = __pgprot(pgprot_val(prot) & ~pgprot_val(forbidden));

+ if (num == 0)
+ num = 1;
+ if (page_num)
+ *page_num = num;
+
return prot;
}

@@ -552,7 +593,8 @@ static int
try_preserve_large_page(pte_t *kpte, unsigned long address,
struct cpa_data *cpa)
{
- unsigned long nextpage_addr, numpages, pmask, psize, addr, pfn, old_pfn;
+ unsigned long nextpage_addr, numpages, pmask, psize, pnum,
+ addr, pfn, old_pfn;
pte_t new_pte, old_pte, *tmp;
pgprot_t old_prot, new_prot, req_prot;
int i, do_split = 1;
@@ -625,7 +667,7 @@ try_preserve_large_page(pte_t *kpte, unsigned long address,
pfn = old_pfn + ((address & (psize - 1)) >> PAGE_SHIFT);
cpa->pfn = pfn;

- new_prot = static_protections(req_prot, address, pfn);
+ new_prot = static_protections(req_prot, address, pfn, NULL);

/*
* We need to check the full range, whether
@@ -634,8 +676,10 @@ try_preserve_large_page(pte_t *kpte, unsigned long address,
*/
addr = address & pmask;
pfn = old_pfn;
- for (i = 0; i < (psize >> PAGE_SHIFT); i++, addr += PAGE_SIZE, pfn++) {
- pgprot_t chk_prot = static_protections(req_prot, addr, pfn);
+ for (i = 0; i < (psize >> PAGE_SHIFT);
+ i += pnum, addr += PAGE_SIZE * pnum, pfn += pnum) {
+ pgprot_t chk_prot =
+ static_protections(req_prot, addr, pfn, &pnum);

if (pgprot_val(chk_prot) != pgprot_val(new_prot))
goto out_unlock;
@@ -1246,7 +1290,7 @@ static int __change_page_attr(struct cpa_data *cpa, int primary)
pgprot_val(new_prot) &= ~pgprot_val(cpa->mask_clr);
pgprot_val(new_prot) |= pgprot_val(cpa->mask_set);

- new_prot = static_protections(new_prot, address, pfn);
+ new_prot = static_protections(new_prot, address, pfn, NULL);

new_prot = pgprot_clear_protnone_bits(new_prot);

--
2.7.4



2018-07-05 06:29:25

by Thomas Gleixner

Subject: Re: [PATCH v2] x86/mm: fix cpu stuck issue in __change_page_attr_set_clr

On Thu, 5 Jul 2018, Bin Yang wrote:
> static inline pgprot_t static_protections(pgprot_t prot, unsigned long address,
> - unsigned long pfn)
> + unsigned long pfn, unsigned long *page_num)
> {
> pgprot_t forbidden = __pgprot(0);
> + unsigned long tmp;
> + unsigned long num = PUD_PAGE_SIZE >> PAGE_SHIFT;

Everything is a PUD? Oh well, you clearly made a lot of effort to
understand the code you are changing.

> /*
> * The BIOS area between 640k and 1Mb needs to be executable for
> * PCI BIOS based config access (CONFIG_PCI_GOBIOS) support.
> */
> #ifdef CONFIG_PCI_BIOS
> - if (pcibios_enabled && within(pfn, BIOS_BEGIN >> PAGE_SHIFT, BIOS_END >> PAGE_SHIFT))
> - pgprot_val(forbidden) |= _PAGE_NX;
> + if (pcibios_enabled) {
> + tmp = (BIOS_BEGIN >> PAGE_SHIFT) > pfn ?
> + (BIOS_BEGIN >> PAGE_SHIFT) - pfn : ULONG_MAX;
> + if (within(pfn, BIOS_BEGIN >> PAGE_SHIFT,
> + BIOS_END >> PAGE_SHIFT)) {
> + pgprot_val(forbidden) |= _PAGE_NX;
> + tmp = (BIOS_END >> PAGE_SHIFT) - pfn;
> + }
> + num = num > tmp ? tmp : num;

What? I really gave you a hint which used an overlaps() helper in the
pseudo code. But sure, open coding the same thing in 5 places is faster,
right?
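
Roughly something like this, i.e. one trivial range check shared by all
the protection areas instead of five copies of the same min/max
arithmetic (sketch only, from memory):

	static inline bool overlaps(unsigned long r1_start, unsigned long r1_end,
				    unsigned long r2_start, unsigned long r2_end)
	{
		/* true if [r1_start, r1_end] and [r2_start, r2_end] intersect */
		return r1_start <= r2_end && r2_start <= r1_end;
	}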

You managed to make this code completely unreadable and I'm not even trying
to review that mess.

> + for (i = 0; i < (psize >> PAGE_SHIFT);
> + i += pnum, addr += PAGE_SIZE * pnum, pfn += pnum) {
> + pgprot_t chk_prot =
> + static_protections(req_prot, addr, pfn, &pnum);

When done right, there is no need for a loop at all. And I told you so.
But that would need more effort than creating a trainwreck, right?

Stop sending half baked and half thought out patches. I really spent a
lot of time explaining things to you in detail and giving you hints on
how it should be done. Feel free to ignore me, but don't be surprised
if I'm ignoring you as well.

Thanks,

tglx



2018-07-05 22:22:18

by Dave Hansen

Subject: Re: [PATCH v2] x86/mm: fix cpu stuck issue in __change_page_attr_set_clr

On 07/04/2018 10:47 PM, Bin Yang wrote:
> When changing a 4K page attr inside a 1G/2M large page range,
> __change_page_attr() will call try_preserve_large_page() to decide
> whether to split the big page or not. And try_preserve_large_page()
> will call static_protections() to check all 4K pages inside the large
> page range one by one. The check loop is very inefficient. In the
> worst case, static_protections() will be called 1G/4K (262144) times.

I wrote this before I read the entire thread and Thomas's analysis, but
I think we came to the same conclusion. Here goes anyway:

Let me make sure I got this right.

1. free_init_pages() frees a single 4k page
2. set_memory_rw() tries to set that 4k range read-write
3. try_preserve_large_page()->_lookup_address_cpa() finds a 1GB
page covering that 4k range.
4. We now go over the 1GB range (outside the 4k one we may or may not be
changing) to see if it needs to be converted because of
static_protections()

Correct?

#4 seems insane. Why is it the new call's job to fix up protections
outside the 4k area which it was called to change?

There's an argument to be made that we need the static_protections()
loop for the area that try_preserve_large_page() is being called over,
but by no means do we need that for anything larger.

So, for this patch's purposes, if we do a change_page_attr(), walk down
to the PTE/PMD/PUD/... mapping that virtual address, and find that that
pte fulfills the 'prot' we are going for (adjusted for
static_protections() and for large page prot bits), we are done.
Totally done. Done-aroo.
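
In rough pseudo code (hand-waving; adjust_prot() below is just a
stand-in for whatever the static_protections() adjustment ends up
looking like):

	unsigned int level;
	pte_t *pte = lookup_address(address, &level);
	/* adjust_prot(): hypothetical helper for the prot fixups */
	pgprot_t want = adjust_prot(req_prot, address, level);

	if (pte && pgprot_val(pte_pgprot(*pte)) == pgprot_val(want))
		return 0;	/* mapping already matches: done, no split */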