Hi everyone,
I compiled Linux 5.13.0 and am running it on Ubuntu. I got a kernel warning
at kernel/sched/fair.c:3306.
Below is the system information:
Kernel: 5.13.0-icetea001-12377-gf55966571d5e
OS: Ubuntu 21.04
CPU: 4 cores
Hardware name: Acer Aspire ES1-421/OLVIA_BE, BIOS V1.05 07/02/2015
Reproduction steps:
1. Connect to a wireless network (with internet access).
2. After a while (the time to reproduce is random), the connection
suddenly hangs for a few seconds. After that the network is down,
but the interface state is still "connected".
The only way to get the network back is to reconnect to the wireless network.
# After the hang, the internet no longer works.
# See, it's still connected.
nmcli c
# Disconnect
nmcli c down qwerty;
# Connect again
nmcli c up qwerty;
# The internet works again after reconnecting.
3. Check `dmesg -Sr`.
Here is the warning (I have attached more of the log and the kernel config as well):
[ C0] ------------[ cut here ]------------
[ C0] cfs_rq->avg.load_avg || cfs_rq->avg.util_avg || cfs_rq->avg.runnable_avg
[ C0] WARNING: CPU: 0 PID: 12 at kernel/sched/fair.c:3306 update_blocked_averages+0x941/0x9a0
[ C0] Modules linked in: rfcomm xt_CHECKSUM xt_MASQUERADE
xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp nft_compat
nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4
nft_counter nf_tables nfnetlink bridge stp llc bfq cmac algif_hash
algif_skcipher af_alg bnep dm_multipath scsi_dh_rdac scsi_dh_emc
scsi_dh_alua snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio
btusb snd_hda_codec_hdmi btrtl snd_hda_intel uvcvideo btbcm btintel
snd_intel_dspcfg snd_intel_sdw_acpi videobuf2_vmalloc bluetooth
snd_hda_codec videobuf2_memops snd_hda_core videobuf2_v4l2 snd_hwdep
videobuf2_common snd_pcm edac_mce_amd videodev snd_seq_midi
snd_seq_midi_event snd_rawmidi kvm_amd ecdh_generic mc ecc kvm snd_seq
wl(OE) acer_wmi snd_seq_device sparse_keymap snd_timer cfg80211
input_leds snd soundcore wmi_bmof serio_raw ccp k10temp mac_hid
fam15h_power sch_fq_codel msr ip_tables x_tables autofs4 btrfs
blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq
[ C0] async_xor async_tx xor raid6_pq libcrc32c raid1 raid0
multipath linear amdgpu iommu_v2 gpu_sched radeon i2c_algo_bit
drm_ttm_helper ttm drm_kms_helper hid_generic syscopyarea rtsx_pci_sdmmc
sysfillrect sysimgblt crct10dif_pclmul fb_sys_fops cec crc32_pclmul
rc_core ghash_clmulni_intel usbhid aesni_intel sdhci_pci crypto_simd
cqhci xhci_pci r8169 psmouse drm xhci_pci_renesas ahci cryptd realtek
sdhci rtsx_pci libahci hid i2c_piix4 wmi video
[ C0] CPU: 0 PID: 12 Comm: ksoftirqd/0 Tainted: G OE 5.13.0-icetea001-12377-gf55966571d5e #3
[ C0] Hardware name: Acer Aspire ES1-421/OLVIA_BE, BIOS V1.05 07/02/2015
[ C0] RIP: 0010:update_blocked_averages+0x941/0x9a0
[ C0] Code: 00 e9 a7 fe ff ff e8 9e 22 c2 00 e9 4b f9 ff ff 0f 0b e9
da fe ff ff 48 c7 c7 88 c1 5c 82 c6 05 07 f6 ae 01 01 e8 d3 50 bc 00
<0f> 0b 41 8b 84 24 78 01 00 00 e9 f8 fa ff ff 48 c7 c7 88 bb 5c 82
[ C0] RSP: 0018:ffffc900001a7de0 EFLAGS: 00010082
[ C0] RAX: 0000000000000000 RBX: ffff888104ec6980 RCX: 0000000000000027
[ C0] RDX: ffff888313c18e28 RSI: 0000000000000001 RDI: ffff888313c18e20
[ C0] RBP: ffffc900001a7e58 R08: ffffffff82962048 R09: 00000000ffffdfff
[ C0] R10: ffffffff82882060 R11: ffffffff82882060 R12: ffff888104ec6800
[ C0] R13: 0000000000000000 R14: 0000735d623b8c53 R15: ffff888103830200
[ C0] FS: 0000000000000000(0000) GS:ffff888313c00000(0000) knlGS:0000000000000000
[ C0] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ C0] CR2: 00007ff1f8c91000 CR3: 000000017c42a000 CR4: 00000000000406f0
[ C0] Call Trace:
[ C0] run_rebalance_domains+0x53/0x80
[ C0] __do_softirq+0xd2/0x472
[ C0] run_ksoftirqd+0x3f/0x60
[ C0] smpboot_thread_fn+0xc2/0x170
[ C0] ? smpboot_register_percpu_thread+0xe0/0xe0
[ C0] kthread+0x138/0x160
[ C0] ? set_kthread_struct+0x50/0x50
[ C0] ret_from_fork+0x1f/0x30
[ C0] irq event stamp: 43203642
[ C0] hardirqs last enabled at (43203641): [<ffffffff810a6004>] run_ksoftirqd+0x44/0x60
[ C0] hardirqs last disabled at (43203642): [<ffffffff81d1e8af>] __schedule+0xfcf/0x17d0
[ C0] softirqs last enabled at (43203640): [<ffffffff810a5fff>] run_ksoftirqd+0x3f/0x60
[ C0] softirqs last disabled at (43203625): [<ffffffff810a5fff>] run_ksoftirqd+0x3f/0x60
[ C0] ---[ end trace 74d3894cf8cf6ef8 ]---
Attachments:
1) config.gz (kernel config used for the build)
2) dmesg.txt (more of the kernel log)
3) proc_cpuinfo.gz (output of cat /proc/cpuinfo)
4) proc_modules.gz (output of cat /proc/modules)
If you need more information or want me to do something, please let
me know. I will be happy to help.
Regards,
Ammar
Hi Ammar,
On 30/07/2021 17:21, Ammar Faizi wrote:
> Hi everyone,
>
> I compiled Linux 5.13.0 and use it on my Ubuntu. I got a kernel warning
> at kernel/sched/fair.c:3306.
>
> Below is the system information
> Kernel: 5.13.0-icetea001-12377-gf55966571d5e
So you're running with:
9e077b52d86a - sched/pelt: Check that *_avg are null when *_sum are
(2021-06-17 Vincent Guittot)
but not with:
ceb6ba45dc80 - sched/fair: Sync load_sum with load_avg after dequeue
(2021-07-02 Vincent Guittot)
The SCHED_WARN_ON you're hitting is harmless and just tells you that the
PELT load_avg and load_sum of one of your cfs_rq's are not in sync.
It has to be load (and not util or runnable) since load is the only one
still not fixed in f55966571d5e.
This should go away once you have applied ceb6ba45dc80.
-- Dietmar
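For readers following the thread, the condition printed in the warning can be illustrated with a small userspace sketch. It assumes, going by the title of 9e077b52d86a, that the check only applies once all three *_sum values are zero, at which point every *_avg is expected to be zero too. struct pelt_avg and pelt_avg_out_of_sync() are hypothetical names made up for this example; this is not the kernel's code.

```c
#include <stdbool.h>

/*
 * Hypothetical, simplified stand-in for the per-cfs_rq PELT signals.
 * Field names mirror those in the warning text; the types are an
 * assumption made for this sketch.
 */
struct pelt_avg {
	unsigned long long load_sum;
	unsigned long long runnable_sum;
	unsigned long long util_sum;
	unsigned long load_avg;
	unsigned long runnable_avg;
	unsigned long util_avg;
};

/*
 * Return true when every *_sum is zero but at least one *_avg is not.
 * That mismatch is what the expression in the log describes:
 *   cfs_rq->avg.load_avg || cfs_rq->avg.util_avg || cfs_rq->avg.runnable_avg
 */
bool pelt_avg_out_of_sync(const struct pelt_avg *a)
{
	/* Some signal has not fully decayed yet: nothing to check. */
	if (a->load_sum || a->util_sum || a->runnable_sum)
		return false;

	/* All sums are zero, so any non-zero average is stale. */
	return a->load_avg || a->util_avg || a->runnable_avg;
}
```

Under these assumptions, a structure with load_sum == 0 and load_avg == 1 is reported as out of sync, which matches the case described above where only the load signal is misaligned.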
On 8/2/21 3:42 PM, Dietmar Eggemann wrote:
> So you're running with:
>
> 9e077b52d86a - sched/pelt: Check that *_avg are null when *_sum are
> (2021-06-17 Vincent Guittot)
>
> but not with:
>
> ceb6ba45dc80 - sched/fair: Sync load_sum with load_avg after dequeue
> (2021-07-02 Vincent Guittot)
>
> The SCHED_WARN_ON you're hitting is harmless and just tells you that the
> PELT load_avg and load_sum part of one of your cfs_rq's is not aligned.
> Has to be load (and not util or runnable) since load is the only one
> still not fixed in f55966571d5e.
>
> This should go away once you applied ceb6ba45dc80.
Alright, I have just moved to 5.14-rc4 and it doesn't seem to have this
issue anymore.
Thanks for the response, Dietmar.
Ammar