2022-09-09 15:33:01

by Dave Hansen

[permalink] [raw]
Subject: [PATCH] x86/mm: Set NX bit when making pages present

The x86 mm code now actively refuses to create writable, executable
mappings and warns when there is an attempt to create one.

0day ran across a case triggered by module unloading, but that looks
to be a generic problem. It presumably goes like this:

1. Load module with direct map, P=1,W=1,NX=1
2. Map module executable, set P=1,W=0,NX=0
3. Free module, land in vfree()->vm_remove_mappings()
4. Set P=0 during alias processing, P=0,W=0,NX=0
5. Restore kernel mapping via set_direct_map_default_noflush(),
set P=1,W=1, resulting in P=1,W=1,NX=0

That's clearly a writable, executable mapping which is a no-no. The
new W^X code is clearly doing its job.

Fix it by actively setting _PAGE_NX when creating writable mappings.

One concern: I haven't been able to actually reproduce this, even by
loading and unloading the module that 0day hit it with. I'd like to
be able to reproduce this before committing a fix.

Reported-by: kernel test robot <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Cc: Peter Zijlstra (Intel) <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: [email protected]
Cc: "H. Peter Anvin" <[email protected]>
Link: https://lore.kernel.org/all/[email protected]/

--

0day folks, please do share these as they come up. We want to keep
fixing them.
---
arch/x86/mm/pat/set_memory.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index 1a2d6376251c..5fb5874ea2c6 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -2247,6 +2247,12 @@ static int __set_pages_p(struct page *page, int numpages)
.mask_clr = __pgprot(0),
.flags = 0};

+ /*
+ * Avoid W^X mappings that occur if the old
+ * mapping was !_PAGE_RW and !_PAGE_NX.
+ */
+ pgprot_val(cpa.mask_set) |= __supported_pte_mask & _PAGE_NX;
+
/*
* No alias checking needed for setting present flag. otherwise,
* we may need to break large pages for 64-bit kernel text
--
2.34.1


2022-09-16 04:29:15

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH] x86/mm: Set NX bit when making pages present

Hi Dave,

On Fri, Sep 09, 2022 at 08:27:21AM -0700, Dave Hansen wrote:
> The x86 mm code now actively refuses to create writable, executable
> mappings and warns when there is an attempt to create one.
>
> 0day ran across a case triggered by module unloading, but that looks
> to be a generic problem. It presumably goes like this:
>
> 1. Load module with direct map, P=1,W=1,NX=1
> 2. Map module executable, set P=1,W=0,NX=0
> 3. Free module, land in vfree()->vm_remove_mappings()
> 4. Set P=0 during alias processing, P=0,W=0,NX=0
> 5. Restore kernel mapping via set_direct_map_default_noflush(),
> set P=1,W=1, resulting in P=1,W=1,NX=0
>
> That's clearly a writable, executable mapping which is a no-no. The
> new W^X code is clearly doing its job.
>
> Fix it by actively setting _PAGE_NX when creating writable mappings.
>
> One concern: I haven't been able to actually reproduce this, even by
> loading and unloading the module that 0day hit it with. I'd like to
> be able to reproduce this before committing a fix.

We followed the steps in original report and confirmed the case could be
reproduced by running in qemu. Here we put a brief log for your
information.

$ lkp-tests/bin/lkp qemu -k bzImage -m modules.cgz job-script
...
exec command: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -fsdev local,id=test_dev,path=/root/.lkp//result/rcuscale/300s-srcu/vm-snb/debian-i386-20191205.cgz/i386-randconfig-a013-20211012/gcc-11/652c5bf380ad018e15006a7f8349800245ddbbad/0,security_model=none -device virtio-9p-pci,fsdev=test_dev,mount_tag=9p/virtfs_mount -kernel bzImage -append root=/dev/ram0 RESULT_ROOT=/result/rcuscale/300s-srcu/vm-snb/debian-i386-20191205.cgz/i386-randconfig-a013-20211012/gcc-11/652c5bf380ad018e15006a7f8349800245ddbbad/3 BOOT_IMAGE=/pkg/linux/i386-randconfig-a013-20211012/gcc-11/652c5bf380ad018e15006a7f8349800245ddbbad/vmlinuz-5.19.0-00430-g652c5bf380ad branch=tip/x86/mm job=/lkp/jobs/scheduled/vm-meta-159/rcuscale-300s-srcu-debian-i386-20191205.cgz-652c5bf380ad018e15006a7f8349800245ddbbad-20220904-96286-fjjdsq-5.yaml user=lkp ARCH=i386 kconfig=i386-randconfig-a013-20211012 commit=652c5bf380ad018e15006a7f8349800245ddbbad vmalloc=256M initramfs_async=0 page_owner=on max_uptime=2100 LKP_LOCAL_RUN=1 selinux=0 debug apic=debug sysrq_always_enabled rcupdate.rcu_cpu_stall_timeout=100 net.ifnames=0 printk.devkmsg=on panic=-1 softlockup_panic=1 nmi_watchdog=panic oops=panic load_ramdisk=2 prompt_ramdisk=0 drbd.minor_count=8 systemd.log_level=err ignore_loglevel console=tty0 earlyprintk=ttyS0,115200 console=ttyS0,115200 vga=normal rw ip=dhcp result_service=9p/virtfs_mount -initrd /root/.lkp/cache/final_initrd -smp 2 -m 16384M -no-reboot -watchdog i6300esb -rtc base=localtime -device e1000,netdev=net0 -netdev user,id=net0 -display none -monitor null -serial stdio -drive file=/tmp/vdisk-root/disk-vm-snb-0,media=disk,if=virtio -drive file=/tmp/vdisk-root/disk-vm-snb-1,media=disk,if=virtio -drive file=/tmp/vdisk-root/disk-vm-snb-2,media=disk,if=virtio
early console in setup code
early console in extract_kernel
input_data: 0x02c1a094
input_len: 0x010390d3
output: 0x01000000
output_len: 0x0239fed0
kernel_total_size: 0x02c6b000
needed_size: 0x02c6b000

Decompressing Linux... Parsing ELF... No relocation needed... done.
Booting the kernel.
...
LKP: ttyS0: 287: Kernel tests: Boot OK!
LKP: ttyS0: 287: HOSTNAME vm-snb, MAC 52:54:00:12:34:56, kernel 5.19.0-00430-g652c5bf380ad 1
[ OK ] Reached target Printer.
LKP: ttyS0: 287: /lkp/lkp/src/bin/run-lkp /lkp/jobs/scheduled/vm-meta-159/rcuscale-300s-srcu-debian-i386-20191205.cgz-652c5bf380ad018e15006a7f8349800245ddbbad-20220904-96286-fjjdsq-5.yaml
[ 11.660921][ T203] ------------[ cut here ]------------
[ 11.661292][ T203] CPA refuse W^X violation: 0000000000000060 -> 0000000000000063 range: 0x00000000bab7e000 - 0x00000000bab7efff PFN 7ab7e
[ 11.662112][ T203] WARNING: CPU: 0 PID: 203 at arch/x86/mm/pat/set_memory.c:600 __change_page_attr+0x245/0x260
[ 11.662762][ T203] Modules linked in: torture
[ 11.663057][ T203] CPU: 0 PID: 203 Comm: kworker/0:3 Not tainted 5.19.0-00430-g652c5bf380ad #1
[ 11.663605][ T203] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
[ 11.664175][ T203] Workqueue: events do_free_init
[ 11.664545][ T203] EIP: __change_page_attr+0x245/0x260
[ 11.664881][ T203] Code: ff ff ff 8d 87 ff 0f 00 00 ff 75 e4 31 d2 50 8b 45 e0 57 52 31 d2 51 52 50 68 18 49 8d 42 c6 05 bc 1e 02 43 01 e8 c1 e0 e7 00 <0f> 0b 83 c4 20 e9 40 ff ff ff e8 4c 8d f2 00 8d b4 26 00 00 00 00
[ 11.666092][ T203] EAX: 00000077 EBX: 7ab7e060 ECX: 42af1540 EDX: 42af153c
[ 11.666526][ T203] ESI: beb69ea4 EDI: bab7e000 EBP: beb69e4c ESP: beb69e0c
[ 11.666970][ T203] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 EFLAGS: 00010292
[ 11.667441][ T203] CR0: 80050033 CR2: 0805ae40 CR3: 7ab70000 CR4: 00040690
[ 11.667884][ T203] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 11.668326][ T203] DR6: fffe0ff0 DR7: 00000400
[ 11.668618][ T203] Call Trace:
[ 11.668832][ T203] __change_page_attr_set_clr+0x49/0x170
[ 11.669178][ T203] ? _vm_unmap_aliases+0x101/0x120
[ 11.669494][ T203] set_direct_map_default_noflush+0x49/0x60
[ 11.669857][ T203] __vunmap+0x192/0x270
[ 11.670114][ T203] __vfree+0x20/0x50
[ 11.670361][ T203] vfree+0x29/0x60
[ 11.670591][ T203] module_memfree+0x1b/0x30
[ 11.670875][ T203] do_free_init+0x2c/0x50
[ 11.671142][ T203] process_one_work+0x20c/0x480
[ 11.671441][ T203] worker_thread+0x166/0x3c0
[ 11.671727][ T203] kthread+0xbf/0xe0
[ 11.671966][ T203] ? rescuer_thread+0x310/0x310
[ 11.672278][ T203] ? kthread_complete_and_exit+0x20/0x20
[ 11.672626][ T203] ret_from_fork+0x19/0x30
[ 11.672907][ T203] irq event stamp: 2183
[ 11.673161][ T203] hardirqs last enabled at (2191): [<410b8b5e>] __up_console_sem+0x6e/0x80
[ 11.673706][ T203] hardirqs last disabled at (2198): [<410b8b45>] __up_console_sem+0x55/0x80
[ 11.674238][ T203] softirqs last enabled at (2164): [<41f8ed1c>] __do_softirq+0x2ac/0x3b0
[ 11.674761][ T203] softirqs last disabled at (2151): [<41023525>] do_softirq_own_stack+0x25/0x30
[ 11.675312][ T203] ---[ end trace 0000000000000000 ]---
[ 11.677154][ T618] srcu-scale:--- Start of test: nreaders=1 nwriters=1 verbose=1 shutdown=0
[ 11.677750][ T618] srcu-torture: Creating rcu_scale_reader task
[ 11.684278][ T621] srcu-scale: rcu_scale_reader task started
[ 11.688805][ T618] srcu-torture: Creating rcu_scale_writer task
[ 11.692377][ T622] srcu-scale: rcu_scale_writer task started
[ 11.692962][ T204] BUG: unable to handle page fault for address: bab7f400
[ 11.693401][ T204] #PF: supervisor write access in kernel mode
[ 11.693791][ T204] #PF: error_code(0x0002) - not-present page
[ 11.694157][ T204] *pde = 7ab81063 *pte = 7ab7f060
[ 11.694477][ T204] Oops: 0002 [#1]
...

>
> Reported-by: kernel test robot <[email protected]>
> Signed-off-by: Dave Hansen <[email protected]>
> Cc: Peter Zijlstra (Intel) <[email protected]>
> Cc: Andy Lutomirski <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: [email protected]
> Cc: "H. Peter Anvin" <[email protected]>
> Link: https://lore.kernel.org/all/[email protected]/
>
> --
>
> 0day folks, please do share these as they come up. We want to keep
> fixing them.

We sent another report at:
https://lore.kernel.org/all/[email protected]/

The WARNING and Call Trace are slightly different from those in the
first report. Please kindly check if this is a unique one or a
duplicate. Thanks.

> ---
> arch/x86/mm/pat/set_memory.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
> index 1a2d6376251c..5fb5874ea2c6 100644
> --- a/arch/x86/mm/pat/set_memory.c
> +++ b/arch/x86/mm/pat/set_memory.c
> @@ -2247,6 +2247,12 @@ static int __set_pages_p(struct page *page, int numpages)
> .mask_clr = __pgprot(0),
> .flags = 0};
>
> + /*
> + * Avoid W^X mappings that occur if the old
> + * mapping was !_PAGE_RW and !_PAGE_NX.
> + */
> + pgprot_val(cpa.mask_set) |= __supported_pte_mask & _PAGE_NX;
> +
> /*
> * No alias checking needed for setting present flag. otherwise,
> * we may need to break large pages for 64-bit kernel text
> --
> 2.34.1
>

We tested this patch, but found it couldn't fix the problem:

=========================================================================================
compiler/kconfig/rootfs/runtime/scale_type/tbox_group/testcase:
gcc-11/i386-randconfig-a013-20211012/debian-i386-20191205.cgz/300s/srcu/vm-snb/rcuscale

commit:
652c5bf380ad0 ("x86/mm: Refuse W^X violations")
f7b4797d892f9 ("x86/mm: Set NX bit when making pages present")

652c5bf380ad018e f7b4797d892f968e0980fe820f7
---------------- ---------------------------
fail:runs %reproduction fail:runs
| | |
6:6 0% 6:6 dmesg.BUG:unable_to_handle_page_fault_for_address
6:6 0% 6:6 dmesg.EIP:__change_page_attr
6:6 0% 6:6 dmesg.Kernel_panic-not_syncing:Fatal_exception
6:6 0% 6:6 dmesg.Oops:#[##]
6:6 0% 6:6 dmesg.WARNING:at_arch/x86/mm/pat/set_memory.c:#__change_page_attr
6:6 0% 6:6 dmesg.boot_failures

=========================================================================================
compiler/kconfig/rootfs/runtime/tbox_group/test/testcase/torture_type:
gcc-11/i386-randconfig-a004-20211104/debian-i386-20191205.cgz/300s/vm-snb/cpuhotplug/rcutorture/tasks-tracing

commit:
652c5bf380ad0 ("x86/mm: Refuse W^X violations")
f7b4797d892f9 ("x86/mm: Set NX bit when making pages present")

652c5bf380ad018e f7b4797d892f968e0980fe820f7
---------------- ---------------------------
fail:runs %reproduction fail:runs
| | |
6:6 0% 6:6 dmesg.BUG:unable_to_handle_page_fault_for_address
6:6 0% 6:6 dmesg.EIP:verify_rwx
6:6 0% 6:6 dmesg.Kernel_panic-not_syncing:Fatal_exception
6:6 0% 6:6 dmesg.Oops:#[##]
6:6 0% 6:6 dmesg.WARNING:at_arch/x86/mm/pat/set_memory.c:#verify_rwx
6:6 0% 6:6 dmesg.boot_failures

--
Best Regards,
Yujie

2022-09-19 18:31:28

by Edgecombe, Rick P

[permalink] [raw]
Subject: Re: [PATCH] x86/mm: Set NX bit when making pages present

On Fri, 2022-09-09 at 08:27 -0700, Dave Hansen wrote:
> 0day ran across a case triggered by module unloading, but that looks
> to be a generic problem. It presumably goes like this:
>
> 1. Load module with direct map, P=1,W=1,NX=1
> 2. Map module executable, set P=1,W=0,NX=0
> 3. Free module, land in vfree()->vm_remove_mappings()
> 4. Set P=0 during alias processing, P=0,W=0,NX=0
> 5. Restore kernel mapping via
> set_direct_map_default_noflush(),
> set P=1,W=1, resulting in P=1,W=1,NX=0

I don't think this is right. CPA skips NX modification on the alias, so
the earlier module CPA's (1-2) shouldn't have touched NX where
set_direct_map() is operating. So NX *shouldn't* have been cleared in
this case.

Clearly somehow it is though. The original report has this in the log:
Notice: NX (Execute Disable) protection cannot be enabled: non-PAE
kernel!

So probably the W^X checker needs to check this non-PAE case
differently.

2022-09-19 19:20:33

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH] x86/mm: Set NX bit when making pages present

On 9/19/22 11:14, Edgecombe, Rick P wrote:
> Clearly somehow it is though. The original report has this in the log:
> Notice: NX (Execute Disable) protection cannot be enabled: non-PAE
> kernel!

Ah, crud. Nice catch, btw.

So, the CPU has NX, making cpu_feature_enabled(X86_FEATURE_NX)==1, but
the page table mode does not have support.

I guess we can either clear X86_FEATURE_NX around the "protection cannot
be enabled" message, or do something like the attached patch and just do
the check at runtime.

I'm not sure we want to mess with X86_FEATURE_NX itself. It seems to
get used for a few different things, including on the KVM side.

0day folks, can you see if _this_ one (totally untested) helps the
situation? At least this is a real oddball case. It's not something
that folks are very likely to hit at all.


Attachments:
nx.patch (923.00 B)

2022-09-20 03:45:24

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH] x86/mm: Set NX bit when making pages present

On Mon, Sep 19, 2022 at 11:30:09AM -0700, Dave Hansen wrote:
> On 9/19/22 11:14, Edgecombe, Rick P wrote:
> > Clearly somehow it is though. The original report has this in the log:
> > Notice: NX (Execute Disable) protection cannot be enabled: non-PAE
> > kernel!
>
> Ah, crud. Nice catch, btw.
>
> So, the CPU has NX, making cpu_feature_enabled(X86_FEATURE_NX)==1, but
> the page table mode does not have support.
>
> I guess we can either clear X86_FEATURE_NX around the "protection cannot
> be enabled" message, or do something like the attached patch and just do
> the check at runtime.
>
> I'm not sure we want to mess with X86_FEATURE_NX itself. It seems to
> get used for a few different things, including on the KVM side.
>
> 0day folks, can you see if _this_ one (totally untested) helps the
> situation? At least this is a real oddball case. It's not something
> that folks are very likely to hit at all.

This patch fixes the issues in below two reports. Thanks.

https://lore.kernel.org/all/[email protected]/
https://lore.kernel.org/all/[email protected]/

Comparison of test results:

=========================================================================================
compiler/kconfig/rootfs/runtime/scale_type/tbox_group/testcase:
gcc-11/i386-randconfig-a013-20211012/debian-i386-20191205.cgz/300s/srcu/vm-snb/rcuscale

commit:
652c5bf380ad01 ("x86/mm: Refuse W^X violations")
0718e6a68e2a7c ("nx.patch")

652c5bf380ad018e 0718e6a68e2a7cb20800b1b1372
---------------- ---------------------------
fail:runs %reproduction fail:runs
| | |
6:6 -100% :6 dmesg.BUG:unable_to_handle_page_fault_for_address
6:6 -100% :6 dmesg.EIP:__change_page_attr
6:6 -100% :6 dmesg.Kernel_panic-not_syncing:Fatal_exception
6:6 -100% :6 dmesg.Oops:#[##]
6:6 -100% :6 dmesg.WARNING:at_arch/x86/mm/pat/set_memory.c:#__change_page_attr


=========================================================================================
compiler/kconfig/rootfs/runtime/tbox_group/test/testcase/torture_type:
gcc-11/i386-randconfig-a004-20211104/debian-i386-20191205.cgz/300s/vm-snb/cpuhotplug/rcutorture/tasks-tracing

commit:
652c5bf380ad01 ("x86/mm: Refuse W^X violations")
0718e6a68e2a7c ("nx.patch")

652c5bf380ad018e 0718e6a68e2a7cb20800b1b1372
---------------- ---------------------------
fail:runs %reproduction fail:runs
| | |
6:6 -100% :6 dmesg.BUG:unable_to_handle_page_fault_for_address
6:6 -100% :6 dmesg.EIP:verify_rwx
6:6 -100% :6 dmesg.Kernel_panic-not_syncing:Fatal_exception
6:6 -100% :6 dmesg.Oops:#[##]
6:6 -100% :6 dmesg.WARNING:at_arch/x86/mm/pat/set_memory.c:#verify_rwx

> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 216fee7144ee..005492257abb 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -844,6 +844,7 @@ static void __init x86_report_nx(void)
> /* 32bit non-PAE kernel, NX cannot be used */
> printk(KERN_NOTICE "Notice: NX (Execute Disable) protection "
> "cannot be enabled: non-PAE kernel!\n");
> + __supported_pte_mask &= ~_PAGE_NX;
> #endif
> }
> }
> diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
> index 1a2d6376251c..f8162fe94bd0 100644
> --- a/arch/x86/mm/pat/set_memory.c
> +++ b/arch/x86/mm/pat/set_memory.c
> @@ -587,7 +587,7 @@ static inline pgprot_t verify_rwx(pgprot_t old, pgprot_t new, unsigned long star
> {
> unsigned long end;
>
> - if (!cpu_feature_enabled(X86_FEATURE_NX))
> + if (!(__supported_pte_mask & _PAGE_NX))
> return new;
>
> if (!((pgprot_val(old) ^ pgprot_val(new)) & (_PAGE_RW | _PAGE_NX)))