2020-05-21 15:25:21

by Steven Price

Subject: [PATCH 0/2] Fix W+X debug feature on x86

Jan alerted me[1] that the W+X detection debug feature was broken on x86
by my change[2] to switch x86 to use the generic ptdump infrastructure.

Fundamentally the approach of trying to move the calculation of
effective permissions into note_page() was broken because note_page() is
only called for 'leaf' entries and the effective permissions are passed
down via the internal nodes of the page tree. The solution I've taken
here is to create a new (optional) callback which is called for all
nodes of the page tree and therefore can calculate the effective
permissions.
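
To make the idea concrete, here is a rough sketch (not the actual
mm/ptdump.c code, and walk_one_entry() is a made-up name) of how the
generic walker would use the new callback; it assumes the ptdump_state
structure from include/linux/ptdump.h as modified by patch 1:

	/*
	 * Sketch only: the optional effective_prot() callback is invoked
	 * for every entry at every level, so the architecture can
	 * accumulate effective permissions down the hierarchy, while
	 * note_page() is still only invoked for leaf entries and can just
	 * read back the accumulated value.
	 */
	static void walk_one_entry(struct ptdump_state *st, unsigned long addr,
				   int level, u64 val, bool leaf)
	{
		if (st->effective_prot)
			st->effective_prot(st, level, val);	/* all entries */

		if (leaf)
			st->note_page(st, addr, level, val);	/* leaves only */
	}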

Secondly on some configurations (32 bit with PAE) "unsigned long" is not
large enough to store the table entries. The fix here is simple - let's
just use a u64.
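
As a purely illustrative aside (a user-space demo, nothing like the
kernel code): a PAE page table entry is 64 bits wide with the NX bit
at bit 63, so narrowing it to a 32-bit unsigned long silently drops
exactly the bit the W+X check needs to see:

	#include <stdint.h>
	#include <stdio.h>

	int main(void)
	{
		/* NX (bit 63) | RW (bit 1) | PRESENT (bit 0) */
		uint64_t entry = (1ULL << 63) | 0x3;
		/* what a 32-bit "unsigned long" would keep */
		uint32_t narrowed = (uint32_t)entry;

		printf("full entry: 0x%016llx\n", (unsigned long long)entry);
		printf("narrowed:   0x%08x (NX bit lost)\n", narrowed);
		return 0;
	}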

I'd welcome testing (and other comments), especially if you have a
configuration which previously triggered W+X warnings as I don't have
such a setup.

[1] https://lore.kernel.org/lkml/[email protected]/
[2] 2ae27137b2db ("x86: mm: convert dump_pagetables to use walk_page_range")

Steven Price (2):
x86: mm: ptdump: Calculate effective permissions correctly
mm: ptdump: Expand type of 'val' in note_page()

arch/arm64/mm/dump.c | 2 +-
arch/x86/mm/dump_pagetables.c | 33 ++++++++++++++++++++-------------
include/linux/ptdump.h | 3 ++-
mm/ptdump.c | 17 ++++++++++++++++-
4 files changed, 39 insertions(+), 16 deletions(-)

--
2.20.1


2020-05-21 15:30:01

by Steven Price

Subject: [PATCH 1/2] x86: mm: ptdump: Calculate effective permissions correctly

By switching the x86 page table dump code to use the generic code the
effective permissions are no longer calculated correctly because the
note_page() function is only called for *leaf* entries. To calculate the
actual effective permissions it is necessary to observe the full
hierarchy of the page tree.

Introduce a new callback for ptdump which is called for every entry and
can therefore update the prot_levels array correctly. note_page() can
then simply access the appropriate element in the array.

Reported-by: Jan Beulich <[email protected]>
Fixes: 2ae27137b2db ("x86: mm: convert dump_pagetables to use walk_page_range")
Signed-off-by: Steven Price <[email protected]>
---
arch/x86/mm/dump_pagetables.c | 31 +++++++++++++++++++------------
include/linux/ptdump.h | 1 +
mm/ptdump.c | 17 ++++++++++++++++-
3 files changed, 36 insertions(+), 13 deletions(-)

diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
index 69309cd56fdf..199bbb7fbd79 100644
--- a/arch/x86/mm/dump_pagetables.c
+++ b/arch/x86/mm/dump_pagetables.c
@@ -249,10 +249,22 @@ static void note_wx(struct pg_state *st, unsigned long addr)
(void *)st->start_address);
}

-static inline pgprotval_t effective_prot(pgprotval_t prot1, pgprotval_t prot2)
+static void effective_prot(struct ptdump_state *pt_st, int level, u64 val)
{
- return (prot1 & prot2 & (_PAGE_USER | _PAGE_RW)) |
- ((prot1 | prot2) & _PAGE_NX);
+ struct pg_state *st = container_of(pt_st, struct pg_state, ptdump);
+ pgprotval_t prot = val & PTE_FLAGS_MASK;
+ pgprotval_t effective;
+
+ if (level > 0) {
+ pgprotval_t higher_prot = st->prot_levels[level - 1];
+
+ effective = (higher_prot & prot & (_PAGE_USER | _PAGE_RW)) |
+ ((higher_prot | prot) & _PAGE_NX);
+ } else {
+ effective = prot;
+ }
+
+ st->prot_levels[level] = effective;
}

/*
@@ -270,16 +282,10 @@ static void note_page(struct ptdump_state *pt_st, unsigned long addr, int level,
struct seq_file *m = st->seq;

new_prot = val & PTE_FLAGS_MASK;
+ new_eff = st->prot_levels[level];

- if (level > 0) {
- new_eff = effective_prot(st->prot_levels[level - 1],
- new_prot);
- } else {
- new_eff = new_prot;
- }
-
- if (level >= 0)
- st->prot_levels[level] = new_eff;
+ if (!val)
+ new_eff = 0;

/*
* If we have a "break" in the series, we need to flush the state that
@@ -374,6 +380,7 @@ static void ptdump_walk_pgd_level_core(struct seq_file *m,
struct pg_state st = {
.ptdump = {
.note_page = note_page,
+ .effective_prot = effective_prot,
.range = ptdump_ranges
},
.level = -1,
diff --git a/include/linux/ptdump.h b/include/linux/ptdump.h
index a67065c403c3..ac01502763bf 100644
--- a/include/linux/ptdump.h
+++ b/include/linux/ptdump.h
@@ -14,6 +14,7 @@ struct ptdump_state {
/* level is 0:PGD to 4:PTE, or -1 if unknown */
void (*note_page)(struct ptdump_state *st, unsigned long addr,
int level, unsigned long val);
+ void (*effective_prot)(struct ptdump_state *st, int level, u64 val);
const struct ptdump_range *range;
};

diff --git a/mm/ptdump.c b/mm/ptdump.c
index 26208d0d03b7..f4ce916f5602 100644
--- a/mm/ptdump.c
+++ b/mm/ptdump.c
@@ -36,6 +36,9 @@ static int ptdump_pgd_entry(pgd_t *pgd, unsigned long addr,
return note_kasan_page_table(walk, addr);
#endif

+ if (st->effective_prot)
+ st->effective_prot(st, 0, pgd_val(val));
+
if (pgd_leaf(val))
st->note_page(st, addr, 0, pgd_val(val));

@@ -53,6 +56,9 @@ static int ptdump_p4d_entry(p4d_t *p4d, unsigned long addr,
return note_kasan_page_table(walk, addr);
#endif

+ if (st->effective_prot)
+ st->effective_prot(st, 1, p4d_val(val));
+
if (p4d_leaf(val))
st->note_page(st, addr, 1, p4d_val(val));

@@ -70,6 +76,9 @@ static int ptdump_pud_entry(pud_t *pud, unsigned long addr,
return note_kasan_page_table(walk, addr);
#endif

+ if (st->effective_prot)
+ st->effective_prot(st, 2, pud_val(val));
+
if (pud_leaf(val))
st->note_page(st, addr, 2, pud_val(val));

@@ -87,6 +96,8 @@ static int ptdump_pmd_entry(pmd_t *pmd, unsigned long addr,
return note_kasan_page_table(walk, addr);
#endif

+ if (st->effective_prot)
+ st->effective_prot(st, 3, pmd_val(val));
if (pmd_leaf(val))
st->note_page(st, addr, 3, pmd_val(val));

@@ -97,8 +108,12 @@ static int ptdump_pte_entry(pte_t *pte, unsigned long addr,
unsigned long next, struct mm_walk *walk)
{
struct ptdump_state *st = walk->private;
+ pte_t val = READ_ONCE(*pte);
+
+ if (st->effective_prot)
+ st->effective_prot(st, 4, pte_val(val));

- st->note_page(st, addr, 4, pte_val(READ_ONCE(*pte)));
+ st->note_page(st, addr, 4, pte_val(val));

return 0;
}
--
2.20.1

2020-05-21 15:30:32

by Steven Price

Subject: [PATCH 2/2] mm: ptdump: Expand type of 'val' in note_page()

The page table entry is passed in the 'val' argument to note_page();
however, this was previously an "unsigned long", which is fine on
64-bit platforms. But on 32-bit x86 it is not always big enough to
contain a page table entry, which may be 64 bits.

Change the type to u64 to ensure that it is always big enough.

Reported-by: Jan Beulich <[email protected]>
Signed-off-by: Steven Price <[email protected]>
---
arch/arm64/mm/dump.c | 2 +-
arch/x86/mm/dump_pagetables.c | 2 +-
include/linux/ptdump.h | 2 +-
3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/mm/dump.c b/arch/arm64/mm/dump.c
index 860c00ec8bd3..d4313bc0c4c1 100644
--- a/arch/arm64/mm/dump.c
+++ b/arch/arm64/mm/dump.c
@@ -247,7 +247,7 @@ static void note_prot_wx(struct pg_state *st, unsigned long addr)
}

static void note_page(struct ptdump_state *pt_st, unsigned long addr, int level,
- unsigned long val)
+ u64 val)
{
struct pg_state *st = container_of(pt_st, struct pg_state, ptdump);
static const char units[] = "KMGTPE";
diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
index 199bbb7fbd79..17aa7ac94a34 100644
--- a/arch/x86/mm/dump_pagetables.c
+++ b/arch/x86/mm/dump_pagetables.c
@@ -273,7 +273,7 @@ static void effective_prot(struct ptdump_state *pt_st, int level, u64 val)
* print what we collected so far.
*/
static void note_page(struct ptdump_state *pt_st, unsigned long addr, int level,
- unsigned long val)
+ u64 val)
{
struct pg_state *st = container_of(pt_st, struct pg_state, ptdump);
pgprotval_t new_prot, new_eff;
diff --git a/include/linux/ptdump.h b/include/linux/ptdump.h
index ac01502763bf..2a3a95586425 100644
--- a/include/linux/ptdump.h
+++ b/include/linux/ptdump.h
@@ -13,7 +13,7 @@ struct ptdump_range {
struct ptdump_state {
/* level is 0:PGD to 4:PTE, or -1 if unknown */
void (*note_page)(struct ptdump_state *st, unsigned long addr,
- int level, unsigned long val);
+ int level, u64 val);
void (*effective_prot)(struct ptdump_state *st, int level, u64 val);
const struct ptdump_range *range;
};
--
2.20.1

2020-05-21 19:12:49

by Andrew Morton

Subject: Re: [PATCH 0/2] Fix W+X debug feature on x86

On Thu, 21 May 2020 16:23:06 +0100 Steven Price <[email protected]> wrote:

> Jan alerted me[1] that the W+X detection debug feature was broken on x86
> by my change[2] to switch x86 to use the generic ptdump infrastructure.
>
> Fundamentally the approach of trying to move the calculation of
> effective permissions into note_page() was broken because note_page() is
> only called for 'leaf' entries and the effective permissions are passed
> down via the internal nodes of the page tree. The solution I've taken
> here is to create a new (optional) callback which is called for all
> nodes of the page tree and therefore can calculate the effective
> permissions.
>
> Secondly on some configurations (32 bit with PAE) "unsigned long" is not
> large enough to store the table entries. The fix here is simple - let's
> just use a u64.

I assumed that a cc:stable was appropriate on both of these(?).

> I'd welcome testing (and other comments), especially if you have a
> configuration which previously triggered W+X warnings as I don't have
> such a setup.

I'll wait a while for such testing. If nothing happens then I guess we
merge it up and see what then happens.

2020-05-22 12:54:00

by Steven Price

Subject: Re: [PATCH 0/2] Fix W+X debug feature on x86

On 21/05/2020 20:08, Andrew Morton wrote:
> On Thu, 21 May 2020 16:23:06 +0100 Steven Price <[email protected]> wrote:
>
>> Jan alerted me[1] that the W+X detection debug feature was broken on x86
>> by my change[2] to switch x86 to use the generic ptdump infrastructure.
>>
>> Fundamentally the approach of trying to move the calculation of
>> effective permissions into note_page() was broken because note_page() is
>> only called for 'leaf' entries and the effective permissions are passed
>> down via the internal nodes of the page tree. The solution I've taken
>> here is to create a new (optional) callback which is called for all
>> nodes of the page tree and therefore can calculate the effective
>> permissions.
>>
>> Secondly on some configurations (32 bit with PAE) "unsigned long" is not
>> large enough to store the table entries. The fix here is simple - let's
>> just use a u64.
>
> I assumed that a cc:stable was appropriate on both of these(?).

Yes thanks.

>> I'd welcome testing (and other comments), especially if you have a
>> configuration which previously triggered W+X warnings as I don't have
>> such a setup.
>
> I'll wait a while for such testing. If nothing happens then I guess we
> merge it up and see what then happens.
>

Thanks,

Steve

2020-05-22 18:09:50

by Qian Cai

Subject: Re: [PATCH 1/2] x86: mm: ptdump: Calculate effective permissions correctly

On Thu, May 21, 2020 at 04:23:07PM +0100, Steven Price wrote:
> --- a/arch/x86/mm/dump_pagetables.c
> +++ b/arch/x86/mm/dump_pagetables.c
> @@ -249,10 +249,22 @@ static void note_wx(struct pg_state *st, unsigned long addr)
> @@ -270,16 +282,10 @@ static void note_page(struct ptdump_state *pt_st, unsigned long addr, int level,
> struct seq_file *m = st->seq;
>
> new_prot = val & PTE_FLAGS_MASK;
> + new_eff = st->prot_levels[level];

This will trigger,

.config (if ever matters):
https://raw.githubusercontent.com/cailca/linux-mm/master/x86.config

[ 104.532621] UBSAN: array-index-out-of-bounds in arch/x86/mm/dump_pagetables.c:284:27
[ 104.542620] index -1 is out of range for type 'pgprotval_t [5]'
[ 104.552624] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.7.0-rc6-next-20200522+ #5
[ 104.560865] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
[ 104.562604] Call Trace:
[ 104.562604] dump_stack+0xa7/0xea
[ 104.562604] ubsan_epilogue+0x9/0x45
[ 104.562604] __ubsan_handle_out_of_bounds.cold.12+0x2b/0x36
[ 104.562604] ? __kasan_check_write+0x14/0x20
[ 104.562604] note_page+0x91f/0xa00
[ 104.562604] ? down_read_non_owner+0x330/0x330
[ 104.562604] ? match_held_lock+0x20/0x250
[ 104.562604] ptdump_walk_pgd+0xa1/0xb0
[ 104.562604] ptdump_walk_pgd_level_core+0x19f/0x200
[ 104.562604] ? up+0x46/0x60
[ 104.562604] ? hugetlb_get_unmapped_area+0x590/0x590
[ 104.562604] ? lock_downgrade+0x3e0/0x3e0
[ 104.562604] ? _raw_spin_unlock_irqrestore+0x44/0x50
[ 104.562604] ? ptdump_walk_pgd_level_debugfs+0x80/0x80
[ 104.562604] ? ptdump_walk_pgd_level_core+0x200/0x200
[ 104.562604] ? virt_efi_set_variable_nonblocking+0xf1/0x110
[ 104.562604] ptdump_walk_pgd_level+0x32/0x40
[ 104.562604] efi_dump_pagetable+0x17/0x19
[ 104.562604] efi_enter_virtual_mode+0x3e5/0x3f5
[ 104.562604] start_kernel+0x848/0x8f6
[ 104.562604] ? __early_make_pgtable+0x2cb/0x314
[ 104.562604] ? thread_stack_cache_init+0xb/0xb
[ 104.562604] ? early_make_pgtable+0x21/0x23
[ 104.562604] ? early_idt_handler_common+0x35/0x4c
[ 104.562604] x86_64_start_reservations+0x24/0x26
[ 104.562604] x86_64_start_kernel+0xf4/0xfb
[ 104.562604] secondary_startup_64+0xb6/0xc0

>
> - if (level > 0) {
> - new_eff = effective_prot(st->prot_levels[level - 1],
> - new_prot);
> - } else {
> - new_eff = new_prot;
> - }
> -
> - if (level >= 0)
> - st->prot_levels[level] = new_eff;
> + if (!val)
> + new_eff = 0;
>
> /*
> * If we have a "break" in the series, we need to flush the state that

2020-05-26 13:42:32

by Steven Price

Subject: Re: [PATCH 1/2] x86: mm: ptdump: Calculate effective permissions correctly

On 22/05/2020 19:07, Qian Cai wrote:
> On Thu, May 21, 2020 at 04:23:07PM +0100, Steven Price wrote:
>> --- a/arch/x86/mm/dump_pagetables.c
>> +++ b/arch/x86/mm/dump_pagetables.c
>> @@ -249,10 +249,22 @@ static void note_wx(struct pg_state *st, unsigned long addr)
>> @@ -270,16 +282,10 @@ static void note_page(struct ptdump_state *pt_st, unsigned long addr, int level,
>> struct seq_file *m = st->seq;
>>
>> new_prot = val & PTE_FLAGS_MASK;
>> + new_eff = st->prot_levels[level];
>
> This will trigger,
>
> .config (if ever matters):
> https://raw.githubusercontent.com/cailca/linux-mm/master/x86.config
>
> [ 104.532621] UBSAN: array-index-out-of-bounds in arch/x86/mm/dump_pagetables.c:284:27
> [ 104.542620] index -1 is out of range for type 'pgprotval_t [5]'

Doh! In this case the result (new_eff) is never actually used: the -1
level is only there to flush the last proper entry out, and val == 0
there, so the following statement sets new_eff = 0 anyway. But
obviously an out-of-bounds access isn't great. Thanks for testing!

The following patch should fix it up by making the assignment
conditional on val != 0.

Andrew: Would you like me to resend, or can you squash in the below?

Steve

-----8<----
diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
index 17aa7ac94a34..ea9010113f69 100644
--- a/arch/x86/mm/dump_pagetables.c
+++ b/arch/x86/mm/dump_pagetables.c
@@ -282,10 +282,10 @@ static void note_page(struct ptdump_state *pt_st, unsigned long addr, int level,
struct seq_file *m = st->seq;

new_prot = val & PTE_FLAGS_MASK;
- new_eff = st->prot_levels[level];
-
if (!val)
new_eff = 0;
+ else
+ new_eff = st->prot_levels[level];

/*
* If we have a "break" in the series, we need to flush the state that

2020-05-27 19:03:10

by Steven Price

Subject: Re: [PATCH 1/2] x86: mm: ptdump: Calculate effective permissions correctly

On 27/05/2020 16:15, Jan Beulich wrote:
> On 21.05.2020 17:23, Steven Price wrote:
>> By switching the x86 page table dump code to use the generic code the
>> effective permissions are no longer calculated correctly because the
>> note_page() function is only called for *leaf* entries. To calculate the
>> actual effective permissions it is necessary to observe the full
>> hierarchy of the page tree.
>>
>> Introduce a new callback for ptdump which is called for every entry and
>> can therefore update the prot_levels array correctly. note_page() can
>> then simply access the appropriate element in the array.
>>
>> Reported-by: Jan Beulich <[email protected]>
>> Fixes: 2ae27137b2db ("x86: mm: convert dump_pagetables to use walk_page_range")
>> Signed-off-by: Steven Price <[email protected]>
>
> This (with the later correction) and the 2nd patch
> Tested-by: Jan Beulich <[email protected]>
>
> It allowed me to go and finally find why under Xen there was still
> a single W+X mapping left - another bug, another patch.
>
> Thanks, Jan
>

Thanks for testing (and sorry for breaking it in the first place)!

Steve

2020-05-27 21:47:24

by Jan Beulich

Subject: Re: [PATCH 1/2] x86: mm: ptdump: Calculate effective permissions correctly

On 21.05.2020 17:23, Steven Price wrote:
> By switching the x86 page table dump code to use the generic code the
> effective permissions are no longer calculated correctly because the
> note_page() function is only called for *leaf* entries. To calculate the
> actual effective permissions it is necessary to observe the full
> hierarchy of the page tree.
>
> Introduce a new callback for ptdump which is called for every entry and
> can therefore update the prot_levels array correctly. note_page() can
> then simply access the appropriate element in the array.
>
> Reported-by: Jan Beulich <[email protected]>
> Fixes: 2ae27137b2db ("x86: mm: convert dump_pagetables to use walk_page_range")
> Signed-off-by: Steven Price <[email protected]>

This (with the later correction) and the 2nd patch
Tested-by: Jan Beulich <[email protected]>

It allowed me to go and finally find why under Xen there was still
a single W+X mapping left - another bug, another patch.

Thanks, Jan