Hello,
Since this commit (Commit: cba5e97280f5 - Merge tag
'sched_urgent_for_v5.13_rc6') we started to see some problems when
running the LTP "cfs_bandwidth01" test case.
Below is part of the call trace; the full console log is available at
https://arr-cki-prod-datawarehouse-public.s3.amazonaws.com/datawarehouse-public/2021/06/20/324119537/build_aarch64_redhat%3A1361789398/tests/10171670_aarch64_1_console.log
[ 3916.859758] LTP: starting cfs_bandwidth01 (cfs_bandwidth01 -i 5)
[ 3918.099939] ------------[ cut here ]------------
[ 3918.101813] rq->tmp_alone_branch != &rq->leaf_cfs_rq_list
[ 3918.101905] WARNING: CPU: 0 PID: 181633 at kernel/sched/fair.c:401 unthrottle_cfs_rq+0x504/0x51c
[ 3918.105454] Modules linked in: n_gsm pps_ldisc ppp_synctty mkiss ax25 ppp_async ppp_generic serport slcan slip slhc snd_hrtimer snd_seq snd_seq_device sctp snd_timer snd soundcore authenc pcrypt crypto_user sha3_generic algif_hash rfkill sunrpc vfat fat virtio_net net_failover failover fuse drm zram ip_tables x_tables xfs crct10dif_ce ghash_ce virtio_blk virtio_console qemu_fw_cfg virtio_mmio aes_neon_bs
[ 3918.114509] CPU: 0 PID: 181633 Comm: systemd-udevd Not tainted 5.13.0-rc6 #1
[ 3918.116316] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
[ 3918.118105] pstate: 404000c5 (nZcv daIF +PAN -UAO -TCO BTYPE=--)
[ 3918.119745] pc : unthrottle_cfs_rq+0x504/0x51c
[ 3918.120916] lr : unthrottle_cfs_rq+0x504/0x51c
[ 3918.122083] sp : ffff800010003d20
[ 3918.122954] x29: ffff800010003d20 x28: ffff0fb540c3c200 x27: ffff0fb5ff170400
[ 3918.124837] x26: 000000000000743e x25: 000000000000859d x24: ffffc818f03b1140
[ 3918.126720] x23: 0000000000000000 x22: ffff0fb5ff170400 x21: 0000000000000001
[ 3918.128547] x20: ffff0fb5ff1704c0 x19: 0000000000000009 x18: 0000000000000001
[ 3918.130369] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000030
[ 3918.132213] x14: ffffffffffffffff x13: ffff8000900039bf x12: ffff8000100039c8
[ 3918.134063] x11: ffffc818f047f5b0 x10: ffffc818f040d2f0 x9 : ffffc818ee20104c
[ 3918.135948] x8 : 00000000000172f8 x7 : ffffc818f03b82b8 x6 : 0000000000002e62
[ 3918.137805] x5 : ffff0fb5ff15d450 x4 : 0000000000000001 x3 : ffff479d0f337000
[ 3918.139668] x2 : ffff0fb5ff15d458 x1 : ffff0fb5780cc000 x0 : 000000000000002d
[ 3918.141536] Call trace:
[ 3918.142181] unthrottle_cfs_rq+0x504/0x51c
[ 3918.143258] distribute_cfs_runtime+0x1ec/0x2b4
[ 3918.144430] sched_cfs_period_timer+0xd4/0x250
[ 3918.145589] __run_hrtimer+0x11c/0x1d0
[ 3918.146572] __hrtimer_run_queues+0x80/0xf0
[ 3918.147683] hrtimer_interrupt+0xf4/0x2cc
[ 3918.148754] arch_timer_handler_virt+0x40/0x50
[ 3918.149952] handle_percpu_devid_irq+0x98/0x170
[ 3918.151159] __handle_domain_irq+0x88/0xec
[ 3918.152242] gic_handle_irq+0x5c/0xdc
[ 3918.153210] el1_irq+0xc0/0x148
[ 3918.154028] el0_svc_common.constprop.0+0x48/0x104
[ 3918.155281] do_el0_svc+0x30/0x9c
[ 3918.156148] el0_svc+0x2c/0x54
[ 3918.156940] el0_sync_handler+0x1a4/0x1b0
[ 3918.157940] el0_sync+0x19c/0x1c0
[ 3918.158782] irq event stamp: 428552
[ 3918.159654] hardirqs last enabled at (428551): [<ffffc818ee0dac14>] el0_svc_common.constprop.0+0x44/0x104
[ 3918.162091] hardirqs last disabled at (428552): [<ffffc818ef03b400>] enter_el1_irq_or_nmi+0x10/0x20
[ 3918.164392] softirqs last enabled at (428480): [<ffffc818ee0c6490>] put_cpu_fpsimd_context+0x30/0x70
[ 3918.166740] softirqs last disabled at (428478): [<ffffc818ee0c6408>] get_cpu_fpsimd_context+0x8/0x60
[ 3918.169065] ---[ end trace 350df9ac4e47440c ]---
[ 3918.170397]
[ 3918.170401] ======================================================
[ 3918.170402] WARNING: possible circular locking dependency detected
[ 3918.170404] 5.13.0-rc6 #1 Not tainted
This call trace is from aarch64, but the problem also happens on other architectures.
Here is the console log from x86_64:
https://arr-cki-prod-datawarehouse-public.s3.amazonaws.com/datawarehouse-public/2021/06/20/324119545/build_x86_64_redhat%3A1361789429/tests/10170701_x86_64_1_console.log
Thank you,
Bruno Goncalves
On Mon, Jun 21, 2021 at 7:54 AM CKI Project <[email protected]> wrote:
>
>
> Hello,
>
> We ran automated tests on a recent commit from this kernel tree:
>
> Kernel repo: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> Commit: cba5e97280f5 - Merge tag 'sched_urgent_for_v5.13_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>
> The results of these automated tests are provided below.
>
> Overall result: FAILED (see details below)
> Merge: OK
> Compile: OK
> Selftests compile: FAILED
> Tests: FAILED
>
> All kernel binaries, config files, and logs are available for download here:
>
> https://arr-cki-prod-datawarehouse-public.s3.amazonaws.com/index.html?prefix=datawarehouse-public/2021/06/20/324119537
>
> One or more kernel tests failed:
>
> aarch64:
> ❌ LTP
> ❌ CIFS Connectathon
>
> x86_64:
> ❌ xfstests - nfsv4.2
> ❌ power-management: cpupower/sanity test
> ❌ storage: software RAID testing
>
> We hope that these logs can help you find the problem quickly. For the full
> detail on our testing procedures, please scroll to the bottom of this message.
>
> Please reply to this email if you have any questions about the tests that we
> ran or if you have any suggestions on how to make future tests more effective.
>
> ,-. ,-.
> ( C ) ( K ) Continuous
> `-',-.`-' Kernel
> ( I ) Integration
> `-'
> ______________________________________________________________________________
>
> Compile testing
> ---------------
>
> We compiled the kernel for 4 architectures:
>
> aarch64:
> make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
>
> ppc64le:
> make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
>
> s390x:
> make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
>
> x86_64:
> make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
>
>
> We built the following selftests:
>
> x86_64:
> net: OK
> bpf: fail
> install and packaging: OK
>
> You can find the full log (build-selftests.log) in the artifact storage above.
>
>
> Hardware testing
> ----------------
> We booted each kernel and ran the following tests:
>
> aarch64:
> Host 1:
>
> ⚡ Internal infrastructure issues prevented one or more tests (marked
> with ⚡⚡⚡) from running on this architecture.
> This is not the fault of the kernel that was tested.
>
> ✅ Boot test
> ✅ ACPI table test
> ✅ ACPI enabled test
> ❌ LTP
> ❌ CIFS Connectathon
> ✅ POSIX pjd-fstest suites
> ⚡⚡⚡ Loopdev Sanity
> ⚡⚡⚡ jvm - jcstress tests
> ⚡⚡⚡ Memory: fork_mem
> ⚡⚡⚡ Memory function: memfd_create
> ⚡⚡⚡ AMTU (Abstract Machine Test Utility)
> ⚡⚡⚡ Networking bridge: sanity
> ⚡⚡⚡ Ethernet drivers sanity
> ⚡⚡⚡ Networking socket: fuzz
> ⚡⚡⚡ Networking: igmp conformance test
> ⚡⚡⚡ Networking route: pmtu
> ⚡⚡⚡ Networking route_func - local
> ⚡⚡⚡ Networking route_func - forward
> ⚡⚡⚡ Networking TCP: keepalive test
> ⚡⚡⚡ Networking UDP: socket
> ⚡⚡⚡ Networking cki netfilter test
> ⚡⚡⚡ Networking tunnel: geneve basic test
> ⚡⚡⚡ Networking tunnel: gre basic
> ⚡⚡⚡ L2TP basic test
> ⚡⚡⚡ Networking tunnel: vxlan basic
> ⚡⚡⚡ Networking ipsec: basic netns - transport
> ⚡⚡⚡ Networking ipsec: basic netns - tunnel
> ⚡⚡⚡ Libkcapi AF_ALG test
> ⚡⚡⚡ pciutils: update pci ids test
> ⚡⚡⚡ ALSA PCM loopback test
> ⚡⚡⚡ ALSA Control (mixer) Userspace Element test
> ⚡⚡⚡ storage: SCSI VPD
> ⚡⚡⚡ trace: ftrace/tracer
> ⚡⚡⚡ kdump - kexec_boot
> ⚡⚡⚡ xarray-idr-radixtree-test
> ⚡⚡⚡ i2c: i2cdetect sanity
> ⚡⚡⚡ Firmware test suite
> ⚡⚡⚡ Memory function: kaslr
> ⚡⚡⚡ audit: audit testsuite test
>
> ppc64le:
> Host 1:
>
> ⚡ Internal infrastructure issues prevented one or more tests (marked
> with ⚡⚡⚡) from running on this architecture.
> This is not the fault of the kernel that was tested.
>
> ✅ Boot test
> ⚡⚡⚡ LTP
> ⚡⚡⚡ CIFS Connectathon
> ⚡⚡⚡ POSIX pjd-fstest suites
> ⚡⚡⚡ Loopdev Sanity
> ⚡⚡⚡ jvm - jcstress tests
> ⚡⚡⚡ Memory: fork_mem
> ⚡⚡⚡ Memory function: memfd_create
> ⚡⚡⚡ AMTU (Abstract Machine Test Utility)
> ⚡⚡⚡ Networking bridge: sanity
> ⚡⚡⚡ Ethernet drivers sanity
> ⚡⚡⚡ Networking socket: fuzz
> ⚡⚡⚡ Networking route: pmtu
> ⚡⚡⚡ Networking route_func - local
> ⚡⚡⚡ Networking route_func - forward
> ⚡⚡⚡ Networking TCP: keepalive test
> ⚡⚡⚡ Networking UDP: socket
> ⚡⚡⚡ Networking cki netfilter test
> ⚡⚡⚡ Networking tunnel: geneve basic test
> ⚡⚡⚡ Networking tunnel: gre basic
> ⚡⚡⚡ L2TP basic test
> ⚡⚡⚡ Networking tunnel: vxlan basic
> ⚡⚡⚡ Networking ipsec: basic netns - tunnel
> ⚡⚡⚡ Libkcapi AF_ALG test
> ⚡⚡⚡ pciutils: update pci ids test
> ⚡⚡⚡ ALSA PCM loopback test
> ⚡⚡⚡ ALSA Control (mixer) Userspace Element test
> ⚡⚡⚡ trace: ftrace/tracer
> ⚡⚡⚡ xarray-idr-radixtree-test
> ⚡⚡⚡ Memory function: kaslr
> ⚡⚡⚡ audit: audit testsuite test
>
> s390x:
> Host 1:
>
> ⚡ Internal infrastructure issues prevented one or more tests (marked
> with ⚡⚡⚡) from running on this architecture.
> This is not the fault of the kernel that was tested.
>
> ⚡⚡⚡ Boot test
> ⚡⚡⚡ kdump - sysrq-c
> ⚡⚡⚡ kdump - file-load
>
> Host 2:
>
> ⚡ Internal infrastructure issues prevented one or more tests (marked
> with ⚡⚡⚡) from running on this architecture.
> This is not the fault of the kernel that was tested.
>
> ⚡⚡⚡ Boot test
> ⚡⚡⚡ xfstests - ext4
> ⚡⚡⚡ xfstests - xfs
> ⚡⚡⚡ Storage: swraid mdadm raid_module test
> ⚡⚡⚡ Podman system integration test - as root
> ⚡⚡⚡ Podman system integration test - as user
> ⚡⚡⚡ xfstests - btrfs
> ⚡⚡⚡ selinux-policy: serge-testsuite
> ⚡⚡⚡ Storage blktests
> ⚡⚡⚡ Storage nvme - tcp
> ⚡⚡⚡ stress: stress-ng
>
> Host 3:
>
> ⚡ Internal infrastructure issues prevented one or more tests (marked
> with ⚡⚡⚡) from running on this architecture.
> This is not the fault of the kernel that was tested.
>
> ⚡⚡⚡ Boot test
> ⚡⚡⚡ LTP
> ⚡⚡⚡ CIFS Connectathon
> ⚡⚡⚡ POSIX pjd-fstest suites
> ⚡⚡⚡ Loopdev Sanity
> ⚡⚡⚡ jvm - jcstress tests
> ⚡⚡⚡ Memory: fork_mem
> ⚡⚡⚡ Memory function: memfd_create
> ⚡⚡⚡ AMTU (Abstract Machine Test Utility)
> ⚡⚡⚡ Networking bridge: sanity
> ⚡⚡⚡ Ethernet drivers sanity
> ⚡⚡⚡ Networking route: pmtu
> ⚡⚡⚡ Networking route_func - local
> ⚡⚡⚡ Networking route_func - forward
> ⚡⚡⚡ Networking TCP: keepalive test
> ⚡⚡⚡ Networking UDP: socket
> ⚡⚡⚡ Networking cki netfilter test
> ⚡⚡⚡ Networking tunnel: geneve basic test
> ⚡⚡⚡ Networking tunnel: gre basic
> ⚡⚡⚡ L2TP basic test
> ⚡⚡⚡ Networking tunnel: vxlan basic
> ⚡⚡⚡ Networking ipsec: basic netns - transport
> ⚡⚡⚡ Networking ipsec: basic netns - tunnel
> ⚡⚡⚡ Libkcapi AF_ALG test
> ⚡⚡⚡ trace: ftrace/tracer
> ⚡⚡⚡ kdump - kexec_boot
> ⚡⚡⚡ xarray-idr-radixtree-test
> ⚡⚡⚡ Memory function: kaslr
> ⚡⚡⚡ audit: audit testsuite test
>
> Host 4:
>
> ⚡ Internal infrastructure issues prevented one or more tests (marked
> with ⚡⚡⚡) from running on this architecture.
> This is not the fault of the kernel that was tested.
>
> ⚡⚡⚡ Boot test
> ⚡⚡⚡ xfstests - ext4
> ⚡⚡⚡ xfstests - xfs
> ⚡⚡⚡ Storage: swraid mdadm raid_module test
> ⚡⚡⚡ Podman system integration test - as root
> ⚡⚡⚡ Podman system integration test - as user
> ⚡⚡⚡ xfstests - btrfs
> ⚡⚡⚡ selinux-policy: serge-testsuite
> ⚡⚡⚡ Storage blktests
> ⚡⚡⚡ Storage nvme - tcp
> ⚡⚡⚡ stress: stress-ng
>
> Host 5:
>
> ⚡ Internal infrastructure issues prevented one or more tests (marked
> with ⚡⚡⚡) from running on this architecture.
> This is not the fault of the kernel that was tested.
>
> ⚡⚡⚡ Boot test
> ⚡⚡⚡ LTP
> ⚡⚡⚡ CIFS Connectathon
> ⚡⚡⚡ POSIX pjd-fstest suites
> ⚡⚡⚡ Loopdev Sanity
> ⚡⚡⚡ jvm - jcstress tests
> ⚡⚡⚡ Memory: fork_mem
> ⚡⚡⚡ Memory function: memfd_create
> ⚡⚡⚡ AMTU (Abstract Machine Test Utility)
> ⚡⚡⚡ Networking bridge: sanity
> ⚡⚡⚡ Ethernet drivers sanity
> ⚡⚡⚡ Networking route: pmtu
> ⚡⚡⚡ Networking route_func - local
> ⚡⚡⚡ Networking route_func - forward
> ⚡⚡⚡ Networking TCP: keepalive test
> ⚡⚡⚡ Networking UDP: socket
> ⚡⚡⚡ Networking cki netfilter test
> ⚡⚡⚡ Networking tunnel: geneve basic test
> ⚡⚡⚡ Networking tunnel: gre basic
> ⚡⚡⚡ L2TP basic test
> ⚡⚡⚡ Networking tunnel: vxlan basic
> ⚡⚡⚡ Networking ipsec: basic netns - transport
> ⚡⚡⚡ Networking ipsec: basic netns - tunnel
> ⚡⚡⚡ Libkcapi AF_ALG test
> ⚡⚡⚡ trace: ftrace/tracer
> ⚡⚡⚡ kdump - kexec_boot
> ⚡⚡⚡ xarray-idr-radixtree-test
> ⚡⚡⚡ Memory function: kaslr
> ⚡⚡⚡ audit: audit testsuite test
>
> x86_64:
> Host 1:
>
> ⚡ Internal infrastructure issues prevented one or more tests (marked
> with ⚡⚡⚡) from running on this architecture.
> This is not the fault of the kernel that was tested.
>
> ✅ Boot test
> ✅ ACPI table test
> ✅ LTP
> ✅ CIFS Connectathon
> ✅ POSIX pjd-fstest suites
> ✅ Loopdev Sanity
> ⚡⚡⚡ jvm - jcstress tests
> ✅ Memory: fork_mem
> ✅ Memory function: memfd_create
> ✅ AMTU (Abstract Machine Test Utility)
> ✅ Networking bridge: sanity
> ✅ Ethernet drivers sanity
> ⚡⚡⚡ Networking socket: fuzz
> ⚡⚡⚡ Networking: igmp conformance test
> ⚡⚡⚡ Networking route: pmtu
> ⚡⚡⚡ Networking route_func - local
> ⚡⚡⚡ Networking route_func - forward
> ⚡⚡⚡ Networking TCP: keepalive test
> ⚡⚡⚡ Networking UDP: socket
> ⚡⚡⚡ Networking cki netfilter test
> ⚡⚡⚡ Networking tunnel: geneve basic test
> ⚡⚡⚡ Networking tunnel: gre basic
> ⚡⚡⚡ L2TP basic test
> ⚡⚡⚡ Networking tunnel: vxlan basic
> ⚡⚡⚡ Networking ipsec: basic netns - transport
> ⚡⚡⚡ Networking ipsec: basic netns - tunnel
> ⚡⚡⚡ Libkcapi AF_ALG test
> ⚡⚡⚡ pciutils: sanity smoke test
> ⚡⚡⚡ pciutils: update pci ids test
> ⚡⚡⚡ ALSA PCM loopback test
> ⚡⚡⚡ ALSA Control (mixer) Userspace Element test
> ⚡⚡⚡ storage: SCSI VPD
> ⚡⚡⚡ trace: ftrace/tracer
> ⚡⚡⚡ kdump - kexec_boot
> ⚡⚡⚡ xarray-idr-radixtree-test
> ⚡⚡⚡ i2c: i2cdetect sanity
> ⚡⚡⚡ Firmware test suite
> ⚡⚡⚡ Memory function: kaslr
> ⚡⚡⚡ audit: audit testsuite test
>
> Host 2:
>
> ⚡ Internal infrastructure issues prevented one or more tests (marked
> with ⚡⚡⚡) from running on this architecture.
> This is not the fault of the kernel that was tested.
>
> ✅ Boot test
> ✅ xfstests - ext4
> ✅ xfstests - xfs
> ❌ xfstests - nfsv4.2
> ❌ power-management: cpupower/sanity test
> ❌ storage: software RAID testing
> ✅ Storage: swraid mdadm raid_module test
> ❌ Podman system integration test - as root
> ✅ Podman system integration test - as user
> ✅ CPU: Idle Test
> ✅ xfstests - btrfs
> ⚡⚡⚡ xfstests - cifsv3.11
> ⚡⚡⚡ IPMI driver test
> ⚡⚡⚡ IPMItool loop stress test
> ⚡⚡⚡ selinux-policy: serge-testsuite
> ⚡⚡⚡ Storage blktests
> ⚡⚡⚡ Storage block - filesystem fio test
> ⚡⚡⚡ Storage block - queue scheduler test
> ⚡⚡⚡ Storage nvme - tcp
> ⚡⚡⚡ Storage nvdimm ndctl test suite
> ⚡⚡⚡ Storage: lvm device-mapper test
> ⚡⚡⚡ stress: stress-ng
>
> Test sources: https://gitlab.com/cki-project/kernel-tests
> Pull requests are welcome for new tests or improvements to existing tests!
>
> Aborted tests
> -------------
> Tests that didn't complete running successfully are marked with ⚡⚡⚡.
> If this was caused by an infrastructure issue, we try to mark that
> explicitly in the report.
>
> Waived tests
> ------------
> If the test run included waived tests, they are marked with . Such tests are
> executed but their results are not taken into account. Tests are waived when
> their results are not reliable enough, e.g. when they're just introduced or are
> being fixed.
>
> Testing timeout
> ---------------
> We aim to provide a report within a reasonable timeframe. Tests that haven't
> finished running yet are marked with ⏱.
>
>
Hello,
> Since this commit (Commit: cba5e97280f5 - Merge tag
> 'sched_urgent_for_v5.13_rc6') we started to see some problems when
> running the LTP "cfs_bandwidth01" test case.
We got a similar report here, together with some discussion:
https://lore.kernel.org/lkml/[email protected]/
It should be fixed by this patch, so feel free to test and report back:
https://lore.kernel.org/lkml/[email protected]/
It has already made its way into tip:
https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/log/?h=sched/urgent
Thanks
Odin
On Thu, Jun 24, 2021 at 2:30 PM Odin Ugedal <[email protected]> wrote:
>
> Hello,
>
> > Since this commit (Commit: cba5e97280f5 - Merge tag
> > 'sched_urgent_for_v5.13_rc6') we started to see some problems when
> > running the LTP "cfs_bandwidth01" test case.
>
> We got a similar report here, together with some discussion:
> https://lore.kernel.org/lkml/[email protected]/
>
> It should be fixed by this patch, so feel free to test and report back:
> https://lore.kernel.org/lkml/[email protected]/
Thank you for the reply. I've tested the patch and it works well.
Bruno
>
> It has already made its way into tip;
> https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/log/?h=sched/urgent
>
> Thanks
> Odin
>