2021-10-19 02:59:18

by kernel test robot

[permalink] [raw]
Subject: [mm] 6128b3af2a: UBSAN:shift-out-of-bounds_in(null)



Greeting,

FYI, we noticed the following commit (built with clang-14):

commit: 6128b3af2a5e42386aa7faf37609b57f39fb7d00 ("mm: ignore MAP_DENYWRITE in ksys_mmap_pgoff()")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master


in testcase: trinity
version: trinity-i386-4d2343bd-1_20200320
with following parameters:

runtime: 300s

test-description: Trinity is a linux system call fuzz tester.
test-url: http://codemonkey.org.uk/projects/trinity/


on test machine: qemu-system-i386 -enable-kvm -cpu SandyBridge -smp 2 -m 4G

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):


+---------------------------------------------+------------+------------+
| | 8d0920bde5 | 6128b3af2a |
+---------------------------------------------+------------+------------+
| UBSAN:shift-out-of-bounds_in(null) | 0 | 9 |
+---------------------------------------------+------------+------------+


If you fix the issue, kindly add following tag
Reported-by: kernel test robot <[email protected]>



[ 126.758570][ T3293] ================================================================================
[ 126.758949][ T3293] UBSAN: shift-out-of-bounds in (null):0:0
[ 126.759174][ T3293] BUG: kernel NULL pointer dereference, address: 00000000
[ 126.759447][ T3293] #PF: supervisor read access in kernel mode
[ 126.759676][ T3293] #PF: error_code(0x0000) - not-present page
[ 126.759905][ T3293] *pde = 00000000
[ 126.760047][ T3293] Oops: 0000 [#1] SMP
[ 126.760205][ T3293] CPU: 1 PID: 3293 Comm: trinity-c4 Not tainted 5.14.0-00006-g6128b3af2a5e #1
[ 126.760541][ T3293] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[ 126.760890][ T3293] EIP: __ubsan_handle_shift_out_of_bounds+0x88/0x350
[ 126.761147][ T3293] Code: 00 83 c4 04 7f 23 47 04 7f 23 47 04 ff 37 68 ef ff 37 68 ef e3 77 d0 d7 e3 77 d0 d7 00 8b 45 f0 00 8b 45 f0 c4 14 66 83 c4 14 <66> 83 66
83 3f 00 66 83 3f 00 00 00 66 83 00 00 66 83 b9 01 00 00
[ 126.761889][ T3293] EAX: 00000000 EBX: f345b500 ECX: 00000027 EDX: eba9ce40
[ 126.762159][ T3293] ESI: 00000046 EDI: 00000000 EBP: f3575f40 ESP: f3575ecc
[ 126.762428][ T3293] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010286
[ 126.762718][ T3293] CR0: 80050033 CR2: 00000000 CR3: 33464000 CR4: 000406d0
[ 126.762989][ T3293] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 126.763259][ T3293] DR6: fffe0ff0 DR7: 00000400
[ 126.763436][ T3293] Call Trace:
[ 126.763562][ T3293] ? rcu_lock_acquire+0x30/0x30
[ 126.763749][ T3293] ? rcu_read_lock_sched_held+0x31/0x70
[ 126.763960][ T3293] ksys_mmap_pgoff+0x1fc/0x290
[ 126.764146][ T3293] __ia32_sys_mmap_pgoff+0x1c/0x30
[ 126.764343][ T3293] do_int80_syscall_32+0x39/0x80
[ 126.764532][ T3293] entry_INT80_32+0x10d/0x10d
[ 126.764709][ T3293] EIP: 0xb7fbda02
[ 126.764848][ T3293] Code: 95 01 00 05 25 36 02 00 83 ec 14 8d 80 e8 99 ff ff 50 6a 02 e8 1f ff 00 00 c7 04 24 7f 00 00 00 e8 7e 87 01 00 66 90 90 cd 80 <c3> 8d b6
00 00 00 00 8d bc 27 00 00 00 00 8b 1c 24 c3 8d b6 00 00
[ 126.765591][ T3293] EAX: ffffffda EBX: 00000000 ECX: 00001000 EDX: 55dd7eb6
[ 126.765859][ T3293] ESI: f0bd6374 EDI: ffffffff EBP: 00000000 ESP: bf9964d8
[ 126.766129][ T3293] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00000296
[ 126.766419][ T3293] Modules linked in: aesni_intel crypto_simd qemu_fw_cfg autofs4
[ 126.766715][ T3293] CR2: 0000000000000000
[ 126.766894][ T3293] ---[ end trace e6000e119f0dc7f3 ]---
[ 126.767105][ T3293] EIP: __ubsan_handle_shift_out_of_bounds+0x88/0x350
[ 126.767361][ T3293] Code: 00 83 c4 04 7f 23 47 04 7f 23 47 04 ff 37 68 ef ff 37 68 ef e3 77 d0 d7 e3 77 d0 d7 00 8b 45 f0 00 8b 45 f0 c4 14 66 83 c4 14 <66> 83 66 83 3f 00 66 83 3f 00 00 00 66 83 00 00 66 83 b9 01 00 00
[ 126.768112][ T3293] EAX: 00000000 EBX: f345b500 ECX: 00000027 EDX: eba9ce40
[ 126.768384][ T3293] ESI: 00000046 EDI: 00000000 EBP: f3575f40 ESP: f3575ecc
[ 126.768657][ T3293] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010286
[ 126.768947][ T3293] CR0: 80050033 CR2: 00000000 CR3: 33464000 CR4: 000406d0
[ 126.769223][ T3293] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 126.769496][ T3293] DR6: fffe0ff0 DR7: 00000400
[ 126.769680][ T3293] Kernel panic - not syncing: Fatal exception
[ 126.769946][ T3293] Kernel Offset: disabled



To reproduce:

# build kernel
cd linux
cp config-5.14.0-00006-g6128b3af2a5e .config
make HOSTCC=clang-14 CC=clang-14 ARCH=i386 olddefconfig prepare modules_prepare bzImage

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> job-script # job-script is attached in this email

# if come across any failure that blocks the test,
# please remove ~/.lkp and /lkp dir to run from a clean state.



---
0DAY/LKP+ Test Infrastructure Open Source Technology Center
https://lists.01.org/hyperkitty/list/[email protected] Intel Corporation

Thanks,
Oliver Sang


Attachments:
(No filename) (5.09 kB)
config-5.14.0-00006-g6128b3af2a5e (119.59 kB)
job-script (4.41 kB)
dmesg.xz (16.53 kB)
Download all attachments

2021-10-19 15:53:03

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [mm] 6128b3af2a: UBSAN:shift-out-of-bounds_in(null)

kernel test robot <[email protected]> writes:

> Greeting,
>
> FYI, we noticed the following commit (built with clang-14):
>
> commit: 6128b3af2a5e42386aa7faf37609b57f39fb7d00 ("mm: ignore MAP_DENYWRITE in ksys_mmap_pgoff()")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

I believe this failure is misattributed. Perhaps your reproducer
only intermittently reproduces the problem?

The change in question only contains

flags &= ~MAP_DENYWRITE

After all of the other users of MAP_DENYWRITE had been removed from the
kernel. So I don't see how it could possibly be responsible for the
reported shift out of bounds problem.

Eric




> in testcase: trinity
> version: trinity-i386-4d2343bd-1_20200320
> with following parameters:
>
> runtime: 300s
>
> test-description: Trinity is a linux system call fuzz tester.
> test-url: http://codemonkey.org.uk/projects/trinity/
>
>
> on test machine: qemu-system-i386 -enable-kvm -cpu SandyBridge -smp 2 -m 4G
>
> caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
>
>
> +---------------------------------------------+------------+------------+
> | | 8d0920bde5 | 6128b3af2a |
> +---------------------------------------------+------------+------------+
> | UBSAN:shift-out-of-bounds_in(null) | 0 | 9 |
> +---------------------------------------------+------------+------------+
>
>
> If you fix the issue, kindly add following tag
> Reported-by: kernel test robot <[email protected]>
>
>
>
> [ 126.758570][ T3293] ================================================================================
> [ 126.758949][ T3293] UBSAN: shift-out-of-bounds in (null):0:0
> [ 126.759174][ T3293] BUG: kernel NULL pointer dereference, address: 00000000
> [ 126.759447][ T3293] #PF: supervisor read access in kernel mode
> [ 126.759676][ T3293] #PF: error_code(0x0000) - not-present page
> [ 126.759905][ T3293] *pde = 00000000
> [ 126.760047][ T3293] Oops: 0000 [#1] SMP
> [ 126.760205][ T3293] CPU: 1 PID: 3293 Comm: trinity-c4 Not tainted 5.14.0-00006-g6128b3af2a5e #1
> [ 126.760541][ T3293] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> [ 126.760890][ T3293] EIP: __ubsan_handle_shift_out_of_bounds+0x88/0x350
> [ 126.761147][ T3293] Code: 00 83 c4 04 7f 23 47 04 7f 23 47 04 ff 37 68 ef ff 37 68 ef e3 77 d0 d7 e3 77 d0 d7 00 8b 45 f0 00 8b 45 f0 c4 14 66 83 c4 14 <66> 83 66
> 83 3f 00 66 83 3f 00 00 00 66 83 00 00 66 83 b9 01 00 00
> [ 126.761889][ T3293] EAX: 00000000 EBX: f345b500 ECX: 00000027 EDX: eba9ce40
> [ 126.762159][ T3293] ESI: 00000046 EDI: 00000000 EBP: f3575f40 ESP: f3575ecc
> [ 126.762428][ T3293] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010286
> [ 126.762718][ T3293] CR0: 80050033 CR2: 00000000 CR3: 33464000 CR4: 000406d0
> [ 126.762989][ T3293] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> [ 126.763259][ T3293] DR6: fffe0ff0 DR7: 00000400
> [ 126.763436][ T3293] Call Trace:
> [ 126.763562][ T3293] ? rcu_lock_acquire+0x30/0x30
> [ 126.763749][ T3293] ? rcu_read_lock_sched_held+0x31/0x70
> [ 126.763960][ T3293] ksys_mmap_pgoff+0x1fc/0x290
> [ 126.764146][ T3293] __ia32_sys_mmap_pgoff+0x1c/0x30
> [ 126.764343][ T3293] do_int80_syscall_32+0x39/0x80
> [ 126.764532][ T3293] entry_INT80_32+0x10d/0x10d
> [ 126.764709][ T3293] EIP: 0xb7fbda02
> [ 126.764848][ T3293] Code: 95 01 00 05 25 36 02 00 83 ec 14 8d 80 e8 99 ff ff 50 6a 02 e8 1f ff 00 00 c7 04 24 7f 00 00 00 e8 7e 87 01 00 66 90 90 cd 80 <c3> 8d b6
> 00 00 00 00 8d bc 27 00 00 00 00 8b 1c 24 c3 8d b6 00 00
> [ 126.765591][ T3293] EAX: ffffffda EBX: 00000000 ECX: 00001000 EDX: 55dd7eb6
> [ 126.765859][ T3293] ESI: f0bd6374 EDI: ffffffff EBP: 00000000 ESP: bf9964d8
> [ 126.766129][ T3293] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00000296
> [ 126.766419][ T3293] Modules linked in: aesni_intel crypto_simd qemu_fw_cfg autofs4
> [ 126.766715][ T3293] CR2: 0000000000000000
> [ 126.766894][ T3293] ---[ end trace e6000e119f0dc7f3 ]---
> [ 126.767105][ T3293] EIP: __ubsan_handle_shift_out_of_bounds+0x88/0x350
> [ 126.767361][ T3293] Code: 00 83 c4 04 7f 23 47 04 7f 23 47 04 ff 37 68 ef ff 37 68 ef e3 77 d0 d7 e3 77 d0 d7 00 8b 45 f0 00 8b 45 f0 c4 14 66 83 c4 14 <66> 83 66 83 3f 00 66 83 3f 00 00 00 66 83 00 00 66 83 b9 01 00 00
> [ 126.768112][ T3293] EAX: 00000000 EBX: f345b500 ECX: 00000027 EDX: eba9ce40
> [ 126.768384][ T3293] ESI: 00000046 EDI: 00000000 EBP: f3575f40 ESP: f3575ecc
> [ 126.768657][ T3293] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010286
> [ 126.768947][ T3293] CR0: 80050033 CR2: 00000000 CR3: 33464000 CR4: 000406d0
> [ 126.769223][ T3293] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> [ 126.769496][ T3293] DR6: fffe0ff0 DR7: 00000400
> [ 126.769680][ T3293] Kernel panic - not syncing: Fatal exception
> [ 126.769946][ T3293] Kernel Offset: disabled
>
>
>
> To reproduce:
>
> # build kernel
> cd linux
> cp config-5.14.0-00006-g6128b3af2a5e .config
> make HOSTCC=clang-14 CC=clang-14 ARCH=i386 olddefconfig prepare modules_prepare bzImage
>
> git clone https://github.com/intel/lkp-tests.git
> cd lkp-tests
> bin/lkp qemu -k <bzImage> job-script # job-script is attached in this email
>
> # if come across any failure that blocks the test,
> # please remove ~/.lkp and /lkp dir to run from a clean state.
>
>
>
> ---
> 0DAY/LKP+ Test Infrastructure Open Source Technology Center
> https://lists.01.org/hyperkitty/list/[email protected] Intel Corporation
>
> Thanks,
> Oliver Sang

2021-10-20 07:24:07

by David Hildenbrand

[permalink] [raw]
Subject: Re: [mm] 6128b3af2a: UBSAN:shift-out-of-bounds_in(null)

On 19.10.21 17:49, Eric W. Biederman wrote:
> kernel test robot <[email protected]> writes:
>
>> Greeting,
>>
>> FYI, we noticed the following commit (built with clang-14):
>>
>> commit: 6128b3af2a5e42386aa7faf37609b57f39fb7d00 ("mm: ignore MAP_DENYWRITE in ksys_mmap_pgoff()")
>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> I believe this failure is misattributed. Perhaps your reproducer
> only intermittently reproduces the problem?
>
> The change in question only contains
>
> flags &= ~MAP_DENYWRITE
>
> After all of the other users of MAP_DENYWRITE had been removed from the
> kernel. So I don't see how it could possibly be responsible for the
> reported shift out of bounds problem.
>
> Eric

Thanks for looking into this Eric while I spent the last couple of days
in bed feeling miserable. :)


So we get 9 new instances of "UBSAN:shift-out-of-bounds_in(null)" (NULL
pointer dereference) on 6128b3af2a compared to 6128b3af2a^ (8d0920bde5),
apparently inside ksys_mmap_pgoff() on 32bit.

As we're dealing with a fuzzer, is there any reproducer as sometimes
provided by syzkaller? The report itself is not very helpful when
judging if that patch is actually responsible for what we're seeing.

I agree with Eric that it's rather unlikely that when we stop masking
off a bit that's ignored throughout the kernel, that we suddenly trigger
a NULL pointer de-reference. But I learned that everything is possible ;)

--
Thanks,

David / dhildenb

2021-10-20 13:56:56

by kernel test robot

[permalink] [raw]
Subject: Re: [mm] 6128b3af2a: UBSAN:shift-out-of-bounds_in(null)

Hi, David, Hi, Eric,

On Wed, Oct 20, 2021 at 09:22:52AM +0200, David Hildenbrand wrote:
> On 19.10.21 17:49, Eric W. Biederman wrote:
> > kernel test robot <[email protected]> writes:
> >
> >> Greeting,
> >>
> >> FYI, we noticed the following commit (built with clang-14):
> >>
> >> commit: 6128b3af2a5e42386aa7faf37609b57f39fb7d00 ("mm: ignore MAP_DENYWRITE in ksys_mmap_pgoff()")
> >> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> >
> > I believe this failure is misattributed. Perhaps your reproducer
> > only intermittently reproduces the problem?

yes, we only reproduce the problem intermittently, those 9 instances are
out of 115 runs.
8d0920bde5eb8ec7 6128b3af2a5e42386aa7faf3760
---------------- ---------------------------
fail:runs %reproduction fail:runs
| | |
:115 8% 9:115 dmesg.UBSAN:shift-out-of-bounds_in(null) <--


> >
> > The change in question only contains
> >
> > flags &= ~MAP_DENYWRITE
> >
> > After all of the other users of MAP_DENYWRITE had been removed from the
> > kernel. So I don't see how it could possibly be responsible for the
> > reported shift out of bounds problem.
> >
> > Eric
>
> Thanks for looking into this Eric while I spent the last couple of days
> in bed feeling miserable. :)
>
>
> So we get 9 new instances of "UBSAN:shift-out-of-bounds_in(null)" (NULL
> pointer dereference) on 6128b3af2a compared to 6128b3af2a^ (8d0920bde5),
> apparently inside ksys_mmap_pgoff() on 32bit.
>
> As we're dealing with a fuzzer, is there any reproducer as sometimes
> provided by syzkaller? The report itself is not very helpful when
> judging if that patch is actually responsible for what we're seeing.
>
> I agree with Eric that it's rather unlikely that when we stop masking
> off a bit that's ignored throughout the kernel, that we suddenly trigger
> a NULL pointer de-reference. But I learned that everything is possible ;)


now we run parent 200 more times, the "UBSAN:shift-out-of-bounds_in(null)" (1)
still cannot be reproduced on parent:
8d0920bde5eb8ec7 6128b3af2a5e42386aa7faf3760
---------------- ---------------------------
fail:runs %reproduction fail:runs
| | |
45:315 -11% 9:115 dmesg.BUG:kernel_NULL_pointer_dereference,address
:315 3% 8:115 dmesg.BUG:unable_to_handle_page_fault_for_address
45:315 -9% 17:115 dmesg.EIP:__ubsan_handle_shift_out_of_bounds <--(2)
45:315 -9% 17:115 dmesg.Kernel_panic-not_syncing:Fatal_exception
45:315 -9% 17:115 dmesg.Oops:#[##]
:315 3% 9:115 dmesg.UBSAN:shift-out-of-bounds_in(null) <--(1)
45:315 -9% 17:115 dmesg.boot_failures


however, from above (2), we found parent dmesg (attached) has similar
Call Trace, which just does't have "UBSAN:shift-out-of-bounds_in(null)"
things:
[ 272.487295][ T7295] BUG: kernel NULL pointer dereference, address: 0000000c
[ 272.488078][ T7295] #PF: supervisor read access in kernel mode
[ 272.488673][ T7295] #PF: error_code(0x0000) - not-present page
[ 272.489266][ T7295] *pde = 00000000
[ 272.489751][ T7295] Oops: 0000 [#1] SMP
[ 272.490165][ T7295] CPU: 1 PID: 7295 Comm: trinity-c2 Not tainted 5.14.0-00005-g8d0920bde5eb #1
[ 272.491122][ T7295] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[ 272.492067][ T7295] EIP: __ubsan_handle_shift_out_of_bounds+0xe/0x350
[ 272.492760][ T7295] Code: 05 90 a6 c2 00 68 2a 54 00 68 2a 54 bd 4e 00 8d bd 4e 00 8d 00 00 66 90 00 00 66 90 57 56 83 ec 57 56 83 ec 89 c7 8b 48 89 c7 <8b> 48 8d\
b4 26 00 8d b4 26 00 75 b4 64 8b 75 b4 64 8b ca 83 bb 1c
[ 272.494890][ T7295] EAX: 00000000 EBX: c5d6cf38 ECX: 00000031 EDX: 00000000
[ 272.495686][ T7295] ESI: f138eb71 EDI: 00000000 EBP: f5a23f3c ESP: f5a23ec8
[ 272.496532][ T7295] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010292
[ 272.497383][ T7295] CR0: 80050033 CR2: 0000000c CR3: 3528d000 CR4: 000406d0
[ 272.498152][ T7295] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 272.498897][ T7295] DR6: fffe0ff0 DR7: 00000400
[ 272.499411][ T7295] Call Trace:
[ 272.499827][ T7295] ? __lock_acquire+0x955/0xb80
[ 272.500361][ T7295] ? rcu_lock_acquire+0x30/0x30
[ 272.500875][ T7295] ? rcu_read_lock_sched_held+0x31/0x70
[ 272.501500][ T7295] ksys_mmap_pgoff+0x1fd/0x290
[ 272.501990][ T7295] __ia32_sys_mmap_pgoff+0x1c/0x30
[ 272.502512][ T7295] do_int80_syscall_32+0x39/0x80
[ 272.503101][ T7295] entry_INT80_32+0x10d/0x10d
[ 272.503624][ T7295] EIP: 0xb7f71a02
[ 272.504029][ T7295] Code: 95 01 00 05 25 36 02 00 83 ec 14 8d 80 e8 99 ff ff 50 6a 02 e8 1f ff 00 00 c7 04 24 7f 00 00 00 e8 7e 87 01 00 66 90 90 cd 80 <c3> 8d b6 00 00 00 00 8d bc 27 00 00 00 00 8b 1c 24 c3 8d b6 00 00
[ 272.506044][ T7295] EAX: ffffffda EBX: 00000000 ECX: 00000000 EDX: f138eb71
[ 272.506825][ T7295] ESI: c5d6cf38 EDI: ffffffff EBP: 00000000 ESP: bfca54d8
[ 272.507592][ T7295] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00000296
[ 272.508417][ T7295] Modules linked in: aesni_intel crypto_simd qemu_fw_cfg autofs4
[ 272.509201][ T7295] CR2: 000000000000000c
[ 272.509704][ T7295] ---[ end trace 97b48cc676da14f9 ]---
[ 272.510293][ T7295] EIP: __ubsan_handle_shift_out_of_bounds+0xe/0x350
[ 272.511023][ T7295] Code: 05 90 a6 c2 00 68 2a 54 00 68 2a 54 bd 4e 00 8d bd 4e 00 8d 00 00 66 90 00 00 66 90 57 56 83 ec 57 56 83 ec 89 c7 8b 48 89 c7 <8b> 48 8d b4 26 00 8d b4 26 00 75 b4 64 8b 75 b4 64 8b ca 83 bb 1c
[ 272.513169][ T7295] EAX: 00000000 EBX: c5d6cf38 ECX: 00000031 EDX: 00000000
[ 272.513979][ T7295] ESI: f138eb71 EDI: 00000000 EBP: f5a23f3c ESP: f5a23ec8
[ 272.514800][ T7295] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010292
[ 272.515974][ T7295] CR0: 80050033 CR2: 0000000c CR3: 3528d000 CR4: 000406d0
[ 272.516787][ T7295] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 272.517619][ T7295] DR6: fffe0ff0 DR7: 00000400
[ 272.518105][ T7295] Kernel panic - not syncing: Fatal exception
[ 272.519566][ T7295] Kernel Offset: disabled


as contrast, in fbc:
[ 126.758570][ T3293] ================================================================================
[ 126.758949][ T3293] UBSAN: shift-out-of-bounds in (null):0:0
[ 126.759174][ T3293] BUG: kernel NULL pointer dereference, address: 00000000
[ 126.759447][ T3293] #PF: supervisor read access in kernel mode
[ 126.759676][ T3293] #PF: error_code(0x0000) - not-present page
[ 126.759905][ T3293] *pde = 00000000
[ 126.760047][ T3293] Oops: 0000 [#1] SMP
[ 126.760205][ T3293] CPU: 1 PID: 3293 Comm: trinity-c4 Not tainted 5.14.0-00006-g6128b3af2a5e #1
[ 126.760541][ T3293] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[ 126.760890][ T3293] EIP: __ubsan_handle_shift_out_of_bounds+0x88/0x350
[ 126.761147][ T3293] Code: 00 83 c4 04 7f 23 47 04 7f 23 47 04 ff 37 68 ef ff 37 68 ef e3 77 d0 d7 e3 77 d0 d7 00 8b 45 f0 00 8b 45 f0 c4 14 66 83 c4 14 <66> 83 66
83 3f 00 66 83 3f 00 00 00 66 83 00 00 66 83 b9 01 00 00
[ 126.761889][ T3293] EAX: 00000000 EBX: f345b500 ECX: 00000027 EDX: eba9ce40
[ 126.762159][ T3293] ESI: 00000046 EDI: 00000000 EBP: f3575f40 ESP: f3575ecc
[ 126.762428][ T3293] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010286
[ 126.762718][ T3293] CR0: 80050033 CR2: 00000000 CR3: 33464000 CR4: 000406d0
[ 126.762989][ T3293] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 126.763259][ T3293] DR6: fffe0ff0 DR7: 00000400
[ 126.763436][ T3293] Call Trace:
[ 126.763562][ T3293] ? rcu_lock_acquire+0x30/0x30
[ 126.763749][ T3293] ? rcu_read_lock_sched_held+0x31/0x70
[ 126.763960][ T3293] ksys_mmap_pgoff+0x1fc/0x290
[ 126.764146][ T3293] __ia32_sys_mmap_pgoff+0x1c/0x30
[ 126.764343][ T3293] do_int80_syscall_32+0x39/0x80
[ 126.764532][ T3293] entry_INT80_32+0x10d/0x10d
[ 126.764709][ T3293] EIP: 0xb7fbda02
[ 126.764848][ T3293] Code: 95 01 00 05 25 36 02 00 83 ec 14 8d 80 e8 99 ff ff 50 6a 02 e8 1f ff 00 00 c7 04 24 7f 00 00 00 e8 7e 87 01 00 66 90 90 cd 80 <c3> 8d b6
00 00 00 00 8d bc 27 00 00 00 00 8b 1c 24 c3 8d b6 00 00
[ 126.765591][ T3293] EAX: ffffffda EBX: 00000000 ECX: 00001000 EDX: 55dd7eb6
[ 126.765859][ T3293] ESI: f0bd6374 EDI: ffffffff EBP: 00000000 ESP: bf9964d8
[ 126.766129][ T3293] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00000296
[ 126.766419][ T3293] Modules linked in: aesni_intel crypto_simd qemu_fw_cfg autofs4
[ 126.766715][ T3293] CR2: 0000000000000000
[ 126.766894][ T3293] ---[ end trace e6000e119f0dc7f3 ]---
[ 126.767105][ T3293] EIP: __ubsan_handle_shift_out_of_bounds+0x88/0x350
[ 126.767361][ T3293] Code: 00 83 c4 04 7f 23 47 04 7f 23 47 04 ff 37 68 ef ff 37 68 ef e3 77 d0 d7 e3 77 d0 d7 00 8b 45 f0 00 8b 45 f0 c4 14 66 83 c4 14 <66> 83 66
+83 3f 00 66 83 3f 00 00 00 66 83 00 00 66 83 b9 01 00 00
[ 126.768112][ T3293] EAX: 00000000 EBX: f345b500 ECX: 00000027 EDX: eba9ce40
[ 126.768384][ T3293] ESI: 00000046 EDI: 00000000 EBP: f3575f40 ESP: f3575ecc
[ 126.768657][ T3293] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010286
[ 126.768947][ T3293] CR0: 80050033 CR2: 00000000 CR3: 33464000 CR4: 000406d0
[ 126.769223][ T3293] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 126.769496][ T3293] DR6: fffe0ff0 DR7: 00000400
[ 126.769680][ T3293] Kernel panic - not syncing: Fatal exception
[ 126.769946][ T3293] Kernel Offset: disabled


basically, we just based on the diff to report out, but maybe need your education
if this "UBSAN:shift-out-of-bounds_in(null)" diff really matter in this case.


>
> --
> Thanks,
>
> David / dhildenb
>


Attachments:
(No filename) (9.77 kB)
dmesg-parent.xz (17.33 kB)
Download all attachments

2021-10-20 16:46:28

by David Hildenbrand

[permalink] [raw]
Subject: Re: [mm] 6128b3af2a: UBSAN:shift-out-of-bounds_in(null)

On 20.10.21 16:13, Oliver Sang wrote:
> Hi, David, Hi, Eric,
>
> On Wed, Oct 20, 2021 at 09:22:52AM +0200, David Hildenbrand wrote:
>> On 19.10.21 17:49, Eric W. Biederman wrote:
>>> kernel test robot <[email protected]> writes:
>>>
>>>> Greeting,
>>>>
>>>> FYI, we noticed the following commit (built with clang-14):
>>>>
>>>> commit: 6128b3af2a5e42386aa7faf37609b57f39fb7d00 ("mm: ignore MAP_DENYWRITE in ksys_mmap_pgoff()")
>>>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>>>
>>> I believe this failure is misattributed. Perhaps your reproducer
>>> only intermittently reproduces the problem?
>
> yes, we only reproduce the problem intermittently, those 9 instances are
> out of 115 runs.
> 8d0920bde5eb8ec7 6128b3af2a5e42386aa7faf3760
> ---------------- ---------------------------
> fail:runs %reproduction fail:runs
> | | |
> :115 8% 9:115 dmesg.UBSAN:shift-out-of-bounds_in(null) <--
>
>
>>>
>>> The change in question only contains
>>>
>>> flags &= ~MAP_DENYWRITE
>>>
>>> After all of the other users of MAP_DENYWRITE had been removed from the
>>> kernel. So I don't see how it could possibly be responsible for the
>>> reported shift out of bounds problem.
>>>
>>> Eric
>>
>> Thanks for looking into this Eric while I spent the last couple of days
>> in bed feeling miserable. :)
>>
>>
>> So we get 9 new instances of "UBSAN:shift-out-of-bounds_in(null)" (NULL
>> pointer dereference) on 6128b3af2a compared to 6128b3af2a^ (8d0920bde5),
>> apparently inside ksys_mmap_pgoff() on 32bit.
>>
>> As we're dealing with a fuzzer, is there any reproducer as sometimes
>> provided by syzkaller? The report itself is not very helpful when
>> judging if that patch is actually responsible for what we're seeing.
>>
>> I agree with Eric that it's rather unlikely that when we stop masking
>> off a bit that's ignored throughout the kernel, that we suddenly trigger
>> a NULL pointer de-reference. But I learned that everything is possible ;)
>
>
> now we run parent 200 more times, the "UBSAN:shift-out-of-bounds_in(null)" (1)
> still cannot be reproduced on parent:
> 8d0920bde5eb8ec7 6128b3af2a5e42386aa7faf3760
> ---------------- ---------------------------
> fail:runs %reproduction fail:runs
> | | |
> 45:315 -11% 9:115 dmesg.BUG:kernel_NULL_pointer_dereference,address
> :315 3% 8:115 dmesg.BUG:unable_to_handle_page_fault_for_address
> 45:315 -9% 17:115 dmesg.EIP:__ubsan_handle_shift_out_of_bounds <--(2)
> 45:315 -9% 17:115 dmesg.Kernel_panic-not_syncing:Fatal_exception
> 45:315 -9% 17:115 dmesg.Oops:#[##]
> :315 3% 9:115 dmesg.UBSAN:shift-out-of-bounds_in(null) <--(1)
> 45:315 -9% 17:115 dmesg.boot_failures
>
>
> however, from above (2), we found parent dmesg (attached) has similar
> Call Trace, which just does't have "UBSAN:shift-out-of-bounds_in(null)"
> things:
> [ 272.487295][ T7295] BUG: kernel NULL pointer dereference, address: 0000000c
> [ 272.488078][ T7295] #PF: supervisor read access in kernel mode
> [ 272.488673][ T7295] #PF: error_code(0x0000) - not-present page
> [ 272.489266][ T7295] *pde = 00000000
> [ 272.489751][ T7295] Oops: 0000 [#1] SMP
> [ 272.490165][ T7295] CPU: 1 PID: 7295 Comm: trinity-c2 Not tainted 5.14.0-00005-g8d0920bde5eb #1
> [ 272.491122][ T7295] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> [ 272.492067][ T7295] EIP: __ubsan_handle_shift_out_of_bounds+0xe/0x350
> [ 272.492760][ T7295] Code: 05 90 a6 c2 00 68 2a 54 00 68 2a 54 bd 4e 00 8d bd 4e 00 8d 00 00 66 90 00 00 66 90 57 56 83 ec 57 56 83 ec 89 c7 8b 48 89 c7 <8b> 48 8d\
> b4 26 00 8d b4 26 00 75 b4 64 8b 75 b4 64 8b ca 83 bb 1c
> [ 272.494890][ T7295] EAX: 00000000 EBX: c5d6cf38 ECX: 00000031 EDX: 00000000
> [ 272.495686][ T7295] ESI: f138eb71 EDI: 00000000 EBP: f5a23f3c ESP: f5a23ec8
> [ 272.496532][ T7295] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010292
> [ 272.497383][ T7295] CR0: 80050033 CR2: 0000000c CR3: 3528d000 CR4: 000406d0
> [ 272.498152][ T7295] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> [ 272.498897][ T7295] DR6: fffe0ff0 DR7: 00000400
> [ 272.499411][ T7295] Call Trace:
> [ 272.499827][ T7295] ? __lock_acquire+0x955/0xb80
> [ 272.500361][ T7295] ? rcu_lock_acquire+0x30/0x30
> [ 272.500875][ T7295] ? rcu_read_lock_sched_held+0x31/0x70
> [ 272.501500][ T7295] ksys_mmap_pgoff+0x1fd/0x290
> [ 272.501990][ T7295] __ia32_sys_mmap_pgoff+0x1c/0x30
> [ 272.502512][ T7295] do_int80_syscall_32+0x39/0x80
> [ 272.503101][ T7295] entry_INT80_32+0x10d/0x10d
> [ 272.503624][ T7295] EIP: 0xb7f71a02
> [ 272.504029][ T7295] Code: 95 01 00 05 25 36 02 00 83 ec 14 8d 80 e8 99 ff ff 50 6a 02 e8 1f ff 00 00 c7 04 24 7f 00 00 00 e8 7e 87 01 00 66 90 90 cd 80 <c3> 8d b6 00 00 00 00 8d bc 27 00 00 00 00 8b 1c 24 c3 8d b6 00 00
> [ 272.506044][ T7295] EAX: ffffffda EBX: 00000000 ECX: 00000000 EDX: f138eb71
> [ 272.506825][ T7295] ESI: c5d6cf38 EDI: ffffffff EBP: 00000000 ESP: bfca54d8
> [ 272.507592][ T7295] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00000296
> [ 272.508417][ T7295] Modules linked in: aesni_intel crypto_simd qemu_fw_cfg autofs4
> [ 272.509201][ T7295] CR2: 000000000000000c
> [ 272.509704][ T7295] ---[ end trace 97b48cc676da14f9 ]---
> [ 272.510293][ T7295] EIP: __ubsan_handle_shift_out_of_bounds+0xe/0x350
> [ 272.511023][ T7295] Code: 05 90 a6 c2 00 68 2a 54 00 68 2a 54 bd 4e 00 8d bd 4e 00 8d 00 00 66 90 00 00 66 90 57 56 83 ec 57 56 83 ec 89 c7 8b 48 89 c7 <8b> 48 8d b4 26 00 8d b4 26 00 75 b4 64 8b 75 b4 64 8b ca 83 bb 1c
> [ 272.513169][ T7295] EAX: 00000000 EBX: c5d6cf38 ECX: 00000031 EDX: 00000000
> [ 272.513979][ T7295] ESI: f138eb71 EDI: 00000000 EBP: f5a23f3c ESP: f5a23ec8
> [ 272.514800][ T7295] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010292
> [ 272.515974][ T7295] CR0: 80050033 CR2: 0000000c CR3: 3528d000 CR4: 000406d0
> [ 272.516787][ T7295] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> [ 272.517619][ T7295] DR6: fffe0ff0 DR7: 00000400
> [ 272.518105][ T7295] Kernel panic - not syncing: Fatal exception
> [ 272.519566][ T7295] Kernel Offset: disabled
>
>
> as contrast, in fbc:
> [ 126.758570][ T3293] ================================================================================
> [ 126.758949][ T3293] UBSAN: shift-out-of-bounds in (null):0:0
> [ 126.759174][ T3293] BUG: kernel NULL pointer dereference, address: 00000000
> [ 126.759447][ T3293] #PF: supervisor read access in kernel mode
> [ 126.759676][ T3293] #PF: error_code(0x0000) - not-present page
> [ 126.759905][ T3293] *pde = 00000000
> [ 126.760047][ T3293] Oops: 0000 [#1] SMP
> [ 126.760205][ T3293] CPU: 1 PID: 3293 Comm: trinity-c4 Not tainted 5.14.0-00006-g6128b3af2a5e #1
> [ 126.760541][ T3293] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> [ 126.760890][ T3293] EIP: __ubsan_handle_shift_out_of_bounds+0x88/0x350
> [ 126.761147][ T3293] Code: 00 83 c4 04 7f 23 47 04 7f 23 47 04 ff 37 68 ef ff 37 68 ef e3 77 d0 d7 e3 77 d0 d7 00 8b 45 f0 00 8b 45 f0 c4 14 66 83 c4 14 <66> 83 66
> 83 3f 00 66 83 3f 00 00 00 66 83 00 00 66 83 b9 01 00 00
> [ 126.761889][ T3293] EAX: 00000000 EBX: f345b500 ECX: 00000027 EDX: eba9ce40
> [ 126.762159][ T3293] ESI: 00000046 EDI: 00000000 EBP: f3575f40 ESP: f3575ecc
> [ 126.762428][ T3293] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010286
> [ 126.762718][ T3293] CR0: 80050033 CR2: 00000000 CR3: 33464000 CR4: 000406d0
> [ 126.762989][ T3293] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> [ 126.763259][ T3293] DR6: fffe0ff0 DR7: 00000400
> [ 126.763436][ T3293] Call Trace:
> [ 126.763562][ T3293] ? rcu_lock_acquire+0x30/0x30
> [ 126.763749][ T3293] ? rcu_read_lock_sched_held+0x31/0x70
> [ 126.763960][ T3293] ksys_mmap_pgoff+0x1fc/0x290
> [ 126.764146][ T3293] __ia32_sys_mmap_pgoff+0x1c/0x30
> [ 126.764343][ T3293] do_int80_syscall_32+0x39/0x80
> [ 126.764532][ T3293] entry_INT80_32+0x10d/0x10d
> [ 126.764709][ T3293] EIP: 0xb7fbda02
> [ 126.764848][ T3293] Code: 95 01 00 05 25 36 02 00 83 ec 14 8d 80 e8 99 ff ff 50 6a 02 e8 1f ff 00 00 c7 04 24 7f 00 00 00 e8 7e 87 01 00 66 90 90 cd 80 <c3> 8d b6
> 00 00 00 00 8d bc 27 00 00 00 00 8b 1c 24 c3 8d b6 00 00
> [ 126.765591][ T3293] EAX: ffffffda EBX: 00000000 ECX: 00001000 EDX: 55dd7eb6
> [ 126.765859][ T3293] ESI: f0bd6374 EDI: ffffffff EBP: 00000000 ESP: bf9964d8
> [ 126.766129][ T3293] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00000296
> [ 126.766419][ T3293] Modules linked in: aesni_intel crypto_simd qemu_fw_cfg autofs4
> [ 126.766715][ T3293] CR2: 0000000000000000
> [ 126.766894][ T3293] ---[ end trace e6000e119f0dc7f3 ]---
> [ 126.767105][ T3293] EIP: __ubsan_handle_shift_out_of_bounds+0x88/0x350
> [ 126.767361][ T3293] Code: 00 83 c4 04 7f 23 47 04 7f 23 47 04 ff 37 68 ef ff 37 68 ef e3 77 d0 d7 e3 77 d0 d7 00 8b 45 f0 00 8b 45 f0 c4 14 66 83 c4 14 <66> 83 66
> +83 3f 00 66 83 3f 00 00 00 66 83 00 00 66 83 b9 01 00 00
> [ 126.768112][ T3293] EAX: 00000000 EBX: f345b500 ECX: 00000027 EDX: eba9ce40
> [ 126.768384][ T3293] ESI: 00000046 EDI: 00000000 EBP: f3575f40 ESP: f3575ecc
> [ 126.768657][ T3293] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010286
> [ 126.768947][ T3293] CR0: 80050033 CR2: 00000000 CR3: 33464000 CR4: 000406d0
> [ 126.769223][ T3293] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> [ 126.769496][ T3293] DR6: fffe0ff0 DR7: 00000400
> [ 126.769680][ T3293] Kernel panic - not syncing: Fatal exception
> [ 126.769946][ T3293] Kernel Offset: disabled
>
>
> basically, we just based on the diff to report out, but maybe need your education
> if this "UBSAN:shift-out-of-bounds_in(null)" diff really matter in this case.

The triggering code sequences are "ksys_mmap_pgoff+0x1fd/0x290" vs.
"ksys_mmap_pgoff+0x1fc/0x290", so my gut feeling is that we're dealing
with the same issue.

But I don't have an fully satisfactory explanation why we're getting
more often "address: 00000000" with that commit instead of via the
parent "address: 0000000c". Maybe simply because of changed code layout
"garbage" we're using to address is now "different garbage" :)


--
Thanks,

David / dhildenb