2020-05-12 09:41:03

by Jan Beulich

[permalink] [raw]
Subject: your "x86: mm: convert dump_pagetables to use walk_page_range" change

Steven,

in the description of this change you say:

"The effective permissions are passed down the chain using new fields in
struct pg_state."

I don't see how this works, and I suppose this part of the change is
(part of) the reason why a W+X warning has magically disappeared in
5.6.x (compared to 5.5.x) when running a 32-bit kernel under Xen.

Quoting the relevant piece of code:

if (level > 0) {
new_eff = effective_prot(st->prot_levels[level - 1],
new_prot);
} else {
new_eff = new_prot;
}

if (level >= 0)
st->prot_levels[level] = new_eff;

The generic framework calls note_page() only for leaf pages or holes
afaics. The protections for a leaf page found at a level other than
the numerically highest one have no meaning at all for a mapping at
a later address mapped with a numerically higher level mapping.
Instead it's the non-leaf page tables for that specific address
which determine the effective protection for any particular mapping.

To take an example, suppose the first present leaf page is found
at level 4. st->prot_levels[] will be all zero at this time, from
which it follows that new_eff will be zero then, too.

I don't think the intended effect can be achieved without either
retaining the original behavior of passing the effective protection
into note_page(), or calling note_page() also for non-leaf pages
(indicating to it which case it is, and adjusting it accordingly).

Am I overlooking something?

Additionally I'd like to note that note_page()'s "unsigned long val"
parameter isn't wide enough for 32-bit PAE PTEs, and hence the NX
flag will always be seen as clear in new_prot in such configs.

Jan


2020-05-12 13:06:09

by Steven Price

[permalink] [raw]
Subject: Re: your "x86: mm: convert dump_pagetables to use walk_page_range" change

On 12/05/2020 10:39, Jan Beulich wrote:
> Steven,

Hi Jan,

> in the description of this change you say:
>
> "The effective permissions are passed down the chain using new fields in
> struct pg_state."
>
> I don't see how this works, and I suppose this part of the change is
> (part of) the reason why a W+X warning has magically disappeared in
> 5.6.x (compared to 5.5.x) when running a 32-bit kernel under Xen.
>
> Quoting the relevant piece of code:
>
> if (level > 0) {
> new_eff = effective_prot(st->prot_levels[level - 1],
> new_prot);
> } else {
> new_eff = new_prot;
> }
>
> if (level >= 0)
> st->prot_levels[level] = new_eff;
>
> The generic framework calls note_page() only for leaf pages or holes
> afaics. The protections for a leaf page found at a level other than
> the numerically highest one have no meaning at all for a mapping at
> a later address mapped with a numerically higher level mapping.
> Instead it's the non-leaf page tables for that specific address
> which determine the effective protection for any particular mapping.
>
> To take an example, suppose the first present leaf page is found
> at level 4. st->prot_levels[] will be all zero at this time, from
> which it follows that new_eff will be zero then, too.
>
> I don't think the intended effect can be achieved without either
> retaining the original behavior of passing the effective protection
> into note_page(), or calling note_page() also for non-leaf pages
> (indicating to it which case it is, and adjusting it accordingly).
>
> Am I overlooking something?

Sadly I don't think you are - you're reasoning seems correct. It looks
like the computation of effective permissions will need to be done in
ptdump.c rather than dump_pagetables.c - as it's only ptdump.c that
deals with the non-leaf entries as you point out.

> Additionally I'd like to note that note_page()'s "unsigned long val"
> parameter isn't wide enough for 32-bit PAE PTEs, and hence the NX
> flag will always be seen as clear in new_prot in such configs.

Ah, interesting. I'm not sure what type is actually guaranteed to be
correct. pgprotval_t is x86 specific, but it might be necessary to
extend it to other architectures. I think I got the "unsigned long" from
the generic page.h (and because it happens to work on most
architectures) - but hadn't noticed that that file was specifically only
for NOMMU architectures.

I'll see if I can come up with fixes, but if you've got anything ready
already then please jump in.

Steve

2020-05-12 13:14:23

by Jan Beulich

[permalink] [raw]
Subject: Re: your "x86: mm: convert dump_pagetables to use walk_page_range" change

On 12.05.2020 15:02, Steven Price wrote:
> On 12/05/2020 10:39, Jan Beulich wrote:
>> Additionally I'd like to note that note_page()'s "unsigned long val"
>> parameter isn't wide enough for 32-bit PAE PTEs, and hence the NX
>> flag will always be seen as clear in new_prot in such configs.
>
> Ah, interesting. I'm not sure what type is actually guaranteed to be
> correct. pgprotval_t is x86 specific, but it might be necessary to
> extend it to other architectures. I think I got the "unsigned long" from
> the generic page.h (and because it happens to work on most
> architectures) - but hadn't noticed that that file was specifically only
> for NOMMU architectures.

Well, it's pteval_t (still x86-specific I guess) unless the call
sites could/would be switched to use pte_flags(). As per
https://lists.xenproject.org/archives/html/xen-devel/2020-05/msg00549.html
the use of pte_val() has another bad effect, but as described
there it might also be Xen specific code which is broken here.

> I'll see if I can come up with fixes, but if you've got anything ready
> already then please jump in.

No, I don't. I lack the time right now to make a change potentially
affecting multiple architectures.

Jan

2020-05-12 13:21:16

by Peter Zijlstra

[permalink] [raw]
Subject: Re: your "x86: mm: convert dump_pagetables to use walk_page_range" change

On Tue, May 12, 2020 at 02:02:51PM +0100, Steven Price wrote:
> > Additionally I'd like to note that note_page()'s "unsigned long val"
> > parameter isn't wide enough for 32-bit PAE PTEs, and hence the NX
> > flag will always be seen as clear in new_prot in such configs.
>
> Ah, interesting. I'm not sure what type is actually guaranteed to be
> correct. pgprotval_t is x86 specific, but it might be necessary to extend it
> to other architectures. I think I got the "unsigned long" from the generic
> page.h (and because it happens to work on most architectures) - but hadn't
> noticed that that file was specifically only for NOMMU architectures.

unsigned long long should cover the i386-PAE / ARMv7-LPAE cases. I
can't, from teh top of my head, remember if there's anybody else who has
a longer PTE.