2022-05-10 13:29:53

by kernel test robot

Subject: [mm] 23e12fc477: UBSAN:shift-out-of-bounds_in_mm/page_isolation.c



Greetings,

FYI, we noticed the following commit (built with clang-15):

commit: 23e12fc477f1c2729af51c427087e777d6e63803 ("mm: make alloc_contig_range work at pageblock granularity")
https://github.com/hnaz/linux-mm master

in testcase: boot

on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

caused the changes below (please refer to the attached dmesg/kmsg for the entire log/backtrace):



If you fix the issue, kindly add the following tag:
Reported-by: kernel test robot <[email protected]>


[ 103.625478][ T1] ================================================================================
[ 103.628487][ T1] UBSAN: shift-out-of-bounds in mm/page_isolation.c:416:17
[ 103.631041][ T1] shift exponent 64 is too large for 64-bit type 'unsigned long'
[ 103.633539][ T1] CPU: 0 PID: 1 Comm: swapper Not tainted 5.18.0-rc4-mm1-00249-g23e12fc477f1 #1 4cafac2312e666eae49f8458f1d93cbe9d5338b2
[ 103.637394][ T1] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[ 103.640378][ T1] Call Trace:
[ 103.641583][ T1] <TASK>
[ 103.642670][ T1] __ubsan_handle_shift_out_of_bounds+0x356/0x3a0
[ 103.644703][ T1] isolate_single_pageblock+0x683/0x870
[ 103.646498][ T1] start_isolate_page_range+0x69/0xb10
[ 103.648349][ T1] alloc_contig_range+0x27b/0x680
[ 103.650010][ T1] alloc_contig_pages+0x413/0x550
[ 103.651549][ T1] debug_vm_pgtable_alloc_huge_page+0x27/0xc1
[ 103.653486][ T1] init_args+0xa5f/0xe06
[ 103.654924][ T1] ? __hugetlb_cgroup_file_legacy_init+0x61f/0x61f
[ 103.656949][ T1] debug_vm_pgtable+0x56/0x3e0
[ 103.658484][ T1] ? __hugetlb_cgroup_file_legacy_init+0x61f/0x61f
[ 103.660556][ T1] do_one_initcall+0x2bd/0x740
[ 103.662132][ T1] ? __hugetlb_cgroup_file_legacy_init+0x61f/0x61f
[ 103.664179][ T1] ? __llvm_gcov_reset+0x740/0x1320
[ 103.665837][ T1] do_initcall_level+0x13c/0x284
[ 103.667460][ T1] do_initcalls+0x75/0xb7
[ 103.668995][ T1] kernel_init_freeable+0x158/0x1f6
[ 103.670678][ T1] ? rest_init+0x2f0/0x2f0
[ 103.672143][ T1] kernel_init+0x18/0x2a0
[ 103.673544][ T1] ? rest_init+0x2f0/0x2f0
[ 103.675026][ T1] ret_from_fork+0x22/0x30
[ 103.676494][ T1] </TASK>
[ 103.677587][ T1] ================================================================================
[ 140.018114][ C0] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 32s!
[ 140.021174][ C0] Showing busy workqueues and worker pools:
[ 140.022912][ C0] workqueue events_power_efficient: flags=0x80
[ 140.024730][ C0] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=6/256 refcnt=7
[ 140.024759][ C0] pending: neigh_managed_work, neigh_managed_work, neigh_managed_work, neigh_periodic_work, neigh_periodic_work, neigh_periodic_work
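
Note on the UBSAN line above: "shift exponent 64 is too large for 64-bit type 'unsigned long'" means an expression shifted an unsigned long by its full width, which C leaves undefined. The following is only a minimal sketch of a guarded shift. The helper name pages_for_order and the choice to return 0 for an out-of-range order are assumptions made for illustration; this is not the code at mm/page_isolation.c:416 and not the fixup patch.

```c
#include <limits.h>

/* Number of value bits in unsigned long on this platform (64 on x86_64). */
#define ULONG_BITS (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical helper, not taken from the kernel tree.
 *
 * Returns 1UL << order when the shift is well defined, and 0 when the
 * shift count is not smaller than the width of unsigned long, which is
 * the case UBSAN flags as shift-out-of-bounds.
 */
static unsigned long pages_for_order(unsigned int order)
{
	if (order >= ULONG_BITS)
		return 0;
	return 1UL << order;
}
```

With this guard, pages_for_order(9) yields 512, while pages_for_order(64) yields 0 instead of invoking undefined behaviour. Whether an out-of-range order should be rejected, clamped, or treated as a bug depends on the caller.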




To reproduce:

# build kernel
cd linux
cp config-5.18.0-rc4-mm1-00249-g23e12fc477f1 .config
make HOSTCC=clang-15 CC=clang-15 ARCH=x86_64 olddefconfig prepare modules_prepare bzImage modules
make HOSTCC=clang-15 CC=clang-15 ARCH=x86_64 INSTALL_MOD_PATH=<mod-install-dir> modules_install
cd <mod-install-dir>
find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz


git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> -m modules.cgz job-script # job-script is attached in this email

# if you come across any failure that blocks the test,
# please remove ~/.lkp and /lkp dir to run from a clean state.



--
0-DAY CI Kernel Test Service
https://01.org/lkp



Attachments:
config-5.18.0-rc4-mm1-00249-g23e12fc477f1 (138.40 kB)
job-script (4.83 kB)
dmesg.xz (13.64 kB)

2022-05-10 20:53:45

by Zi Yan

Subject: Re: [mm] 23e12fc477: UBSAN:shift-out-of-bounds_in_mm/page_isolation.c

Hi kernel test robot,

There is a fixup patch for the commit: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-make-alloc_contig_range-work-at-pageblock-granularity-fix.patch
I verified that it fixes the issue by following the reproduction steps below; the boot no longer hangs.

--
Best Regards,
Yan, Zi

On 10 May 2022, at 5:58, kernel test robot wrote:

> Greetings,
>
> FYI, we noticed the following commit (built with clang-15):
>
> commit: 23e12fc477f1c2729af51c427087e777d6e63803 ("mm: make alloc_contig_range work at pageblock granularity")
> https://github.com/hnaz/linux-mm master
>
> in testcase: boot
>
> on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
>
> caused the changes below (please refer to the attached dmesg/kmsg for the entire log/backtrace):
>
>
>
> If you fix the issue, kindly add the following tag:
> Reported-by: kernel test robot <[email protected]>
>
>
> [ 103.625478][ T1] ================================================================================
> [ 103.628487][ T1] UBSAN: shift-out-of-bounds in mm/page_isolation.c:416:17
> [ 103.631041][ T1] shift exponent 64 is too large for 64-bit type 'unsigned long'
> [ 103.633539][ T1] CPU: 0 PID: 1 Comm: swapper Not tainted 5.18.0-rc4-mm1-00249-g23e12fc477f1 #1 4cafac2312e666eae49f8458f1d93cbe9d5338b2
> [ 103.637394][ T1] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> [ 103.640378][ T1] Call Trace:
> [ 103.641583][ T1] <TASK>
> [ 103.642670][ T1] __ubsan_handle_shift_out_of_bounds+0x356/0x3a0
> [ 103.644703][ T1] isolate_single_pageblock+0x683/0x870
> [ 103.646498][ T1] start_isolate_page_range+0x69/0xb10
> [ 103.648349][ T1] alloc_contig_range+0x27b/0x680
> [ 103.650010][ T1] alloc_contig_pages+0x413/0x550
> [ 103.651549][ T1] debug_vm_pgtable_alloc_huge_page+0x27/0xc1
> [ 103.653486][ T1] init_args+0xa5f/0xe06
> [ 103.654924][ T1] ? __hugetlb_cgroup_file_legacy_init+0x61f/0x61f
> [ 103.656949][ T1] debug_vm_pgtable+0x56/0x3e0
> [ 103.658484][ T1] ? __hugetlb_cgroup_file_legacy_init+0x61f/0x61f
> [ 103.660556][ T1] do_one_initcall+0x2bd/0x740
> [ 103.662132][ T1] ? __hugetlb_cgroup_file_legacy_init+0x61f/0x61f
> [ 103.664179][ T1] ? __llvm_gcov_reset+0x740/0x1320
> [ 103.665837][ T1] do_initcall_level+0x13c/0x284
> [ 103.667460][ T1] do_initcalls+0x75/0xb7
> [ 103.668995][ T1] kernel_init_freeable+0x158/0x1f6
> [ 103.670678][ T1] ? rest_init+0x2f0/0x2f0
> [ 103.672143][ T1] kernel_init+0x18/0x2a0
> [ 103.673544][ T1] ? rest_init+0x2f0/0x2f0
> [ 103.675026][ T1] ret_from_fork+0x22/0x30
> [ 103.676494][ T1] </TASK>
> [ 103.677587][ T1] ================================================================================
> [ 140.018114][ C0] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 32s!
> [ 140.021174][ C0] Showing busy workqueues and worker pools:
> [ 140.022912][ C0] workqueue events_power_efficient: flags=0x80
> [ 140.024730][ C0] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=6/256 refcnt=7
> [ 140.024759][ C0] pending: neigh_managed_work, neigh_managed_work, neigh_managed_work, neigh_periodic_work, neigh_periodic_work, neigh_periodic_work
>
>
>
>
> To reproduce:
>
> # build kernel
> cd linux
> cp config-5.18.0-rc4-mm1-00249-g23e12fc477f1 .config
> make HOSTCC=clang-15 CC=clang-15 ARCH=x86_64 olddefconfig prepare modules_prepare bzImage modules
> make HOSTCC=clang-15 CC=clang-15 ARCH=x86_64 INSTALL_MOD_PATH=<mod-install-dir> modules_install
> cd <mod-install-dir>
> find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz
>
>
> git clone https://github.com/intel/lkp-tests.git
> cd lkp-tests
> bin/lkp qemu -k <bzImage> -m modules.cgz job-script # job-script is attached in this email
>
> # if you come across any failure that blocks the test,
> # please remove ~/.lkp and /lkp dir to run from a clean state.
>
>
>
> --
> 0-DAY CI Kernel Test Service
> https://01.org/lkp



2022-05-10 23:05:08

by Andrew Morton

Subject: Re: [mm] 23e12fc477: UBSAN:shift-out-of-bounds_in_mm/page_isolation.c

On Tue, 10 May 2022 17:58:24 +0800 kernel test robot <[email protected]> wrote:

> commit: 23e12fc477f1c2729af51c427087e777d6e63803 ("mm: make alloc_contig_range work at pageblock granularity")
> https://github.com/hnaz/linux-mm master

That tree is no longer being updated. Please switch to

git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Use either the mm-unstable branch (hotfixes and MM) or the mm-everything
branch (mm-unstable plus non-MM patches).

mm-unstable includes
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-make-alloc_contig_range-work-at-pageblock-granularity-fix.patch,
which quite possibly fixes the issue you have detected.

Thanks.