2021-07-30 15:31:56

by Ammar Faizi

Subject: WARNING: CPU: 0 PID: 12 at kernel/sched/fair.c:3306 update_blocked_averages+0x941/0x9a0

Hi everyone,

I compiled Linux 5.13.0 and am running it on Ubuntu. I got a kernel warning
at kernel/sched/fair.c:3306.

Below is the system information:
Kernel: 5.13.0-icetea001-12377-gf55966571d5e
OS: Ubuntu 21.04
CPU: 4 Core
Hardware name: Acer Aspire ES1-421/OLVIA_BE, BIOS V1.05 07/02/2015

Reproduction steps:
1. Connect to a wireless network (with internet access).
2. After a while (the time to reproduce is random), the internet
connection suddenly hangs for a few seconds. After that the network is
down, but the interface state is still shown as connected.

The only way to get the network back is to reconnect the wireless.
  # After the hang, the internet no longer works.

  # See, it's still connected.
  nmcli c

  # Disconnect
  nmcli c down qwerty;

  # Connect again
  nmcli c up qwerty;

  # The internet works again after reconnecting.

3. Check `dmesg -Sr`.


Here is the warning (I have attached more logs and the kernel config as well):

[    C0] ------------[ cut here ]------------
[    C0] cfs_rq->avg.load_avg || cfs_rq->avg.util_avg ||
cfs_rq->avg.runnable_avg
[    C0] WARNING: CPU: 0 PID: 12 at kernel/sched/fair.c:3306
update_blocked_averages+0x941/0x9a0
[    C0] Modules linked in: rfcomm xt_CHECKSUM xt_MASQUERADE
xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp nft_compat
nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4
nft_counter nf_tables nfnetlink bridge stp llc bfq cmac algif_hash
algif_skcipher af_alg bnep dm_multipath scsi_dh_rdac scsi_dh_emc
scsi_dh_alua snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio
btusb snd_hda_codec_hdmi btrtl snd_hda_intel uvcvideo btbcm btintel
snd_intel_dspcfg snd_intel_sdw_acpi videobuf2_vmalloc bluetooth
snd_hda_codec videobuf2_memops snd_hda_core videobuf2_v4l2 snd_hwdep
videobuf2_common snd_pcm edac_mce_amd videodev snd_seq_midi
snd_seq_midi_event snd_rawmidi kvm_amd ecdh_generic mc ecc kvm snd_seq
wl(OE) acer_wmi snd_seq_device sparse_keymap snd_timer cfg80211
input_leds snd soundcore wmi_bmof serio_raw ccp k10temp mac_hid
fam15h_power sch_fq_codel msr ip_tables x_tables autofs4 btrfs
blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq
[    C0]  async_xor async_tx xor raid6_pq libcrc32c raid1 raid0
multipath linear amdgpu iommu_v2 gpu_sched radeon i2c_algo_bit
drm_ttm_helper ttm drm_kms_helper hid_generic syscopyarea rtsx_pci_sdmmc
sysfillrect sysimgblt crct10dif_pclmul fb_sys_fops cec crc32_pclmul
rc_core ghash_clmulni_intel usbhid aesni_intel sdhci_pci crypto_simd
cqhci xhci_pci r8169 psmouse drm xhci_pci_renesas ahci cryptd realtek
sdhci rtsx_pci libahci hid i2c_piix4 wmi video
[    C0] CPU: 0 PID: 12 Comm: ksoftirqd/0 Tainted: G           OE    
5.13.0-icetea001-12377-gf55966571d5e #3
[    C0] Hardware name: Acer Aspire ES1-421/OLVIA_BE, BIOS V1.05 07/02/2015
[    C0] RIP: 0010:update_blocked_averages+0x941/0x9a0
[    C0] Code: 00 e9 a7 fe ff ff e8 9e 22 c2 00 e9 4b f9 ff ff 0f 0b e9
da fe ff ff 48 c7 c7 88 c1 5c 82 c6 05 07 f6 ae 01 01 e8 d3 50 bc 00
<0f> 0b 41 8b 84 24 78 01 00 00 e9 f8 fa ff ff 48 c7 c7 88 bb 5c 82
[    C0] RSP: 0018:ffffc900001a7de0 EFLAGS: 00010082
[    C0] RAX: 0000000000000000 RBX: ffff888104ec6980 RCX: 0000000000000027
[    C0] RDX: ffff888313c18e28 RSI: 0000000000000001 RDI: ffff888313c18e20
[    C0] RBP: ffffc900001a7e58 R08: ffffffff82962048 R09: 00000000ffffdfff
[    C0] R10: ffffffff82882060 R11: ffffffff82882060 R12: ffff888104ec6800
[    C0] R13: 0000000000000000 R14: 0000735d623b8c53 R15: ffff888103830200
[    C0] FS:  0000000000000000(0000) GS:ffff888313c00000(0000)
knlGS:0000000000000000
[    C0] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    C0] CR2: 00007ff1f8c91000 CR3: 000000017c42a000 CR4: 00000000000406f0
[    C0] Call Trace:
[    C0]  run_rebalance_domains+0x53/0x80
[    C0]  __do_softirq+0xd2/0x472
[    C0]  run_ksoftirqd+0x3f/0x60
[    C0]  smpboot_thread_fn+0xc2/0x170
[    C0]  ? smpboot_register_percpu_thread+0xe0/0xe0
[    C0]  kthread+0x138/0x160
[    C0]  ? set_kthread_struct+0x50/0x50
[    C0]  ret_from_fork+0x1f/0x30
[    C0] irq event stamp: 43203642
[    C0] hardirqs last  enabled at (43203641): [<ffffffff810a6004>]
run_ksoftirqd+0x44/0x60
[    C0] hardirqs last disabled at (43203642): [<ffffffff81d1e8af>]
__schedule+0xfcf/0x17d0
[    C0] softirqs last  enabled at (43203640): [<ffffffff810a5fff>]
run_ksoftirqd+0x3f/0x60
[    C0] softirqs last disabled at (43203625): [<ffffffff810a5fff>]
run_ksoftirqd+0x3f/0x60
[    C0] ---[ end trace 74d3894cf8cf6ef8 ]---

Attachments:
1) config.gz (kernel config used for the build)
2) dmesg.txt (more of the kernel log)
3) proc_cpuinfo.gz (from cat /proc/cpuinfo)
4) proc_modules.gz (from cat /proc/modules)

If you need more information or want me to do something, please let
me know. I will be happy to help.

Regards,
  Ammar


Attachments:
config.gz (60.90 kB)
dmesg.txt (16.32 kB)
proc_cpuinfo.gz (845.00 B)
proc_modules.gz (1.61 kB)

2021-08-02 08:43:51

by Dietmar Eggemann

Subject: Re: WARNING: CPU: 0 PID: 12 at kernel/sched/fair.c:3306 update_blocked_averages+0x941/0x9a0

Hi Ammar,

On 30/07/2021 17:21, Ammar Faizi wrote:
> Hi everyone,
>
> I compiled Linux 5.13.0 and am running it on Ubuntu. I got a kernel warning
> at kernel/sched/fair.c:3306.
>
> Below is the system information
> Kernel: 5.13.0-icetea001-12377-gf55966571d5e

So you're running with:

9e077b52d86a - sched/pelt: Check that *_avg are null when *_sum are
(2021-06-17 Vincent Guittot)

but not with:

ceb6ba45dc80 - sched/fair: Sync load_sum with load_avg after dequeue
(2021-07-02 Vincent Guittot)

The SCHED_WARN_ON you're hitting is harmless and just tells you that the
PELT load_avg and load_sum part of one of your cfs_rq's is not aligned.
Has to be load (and not util or runnable) since load is the only one
still not fixed in f55966571d5e.

This should go away once you have applied ceb6ba45dc80.

-- Dietmar


2021-08-02 12:56:26

by Ammar Faizi

Subject: Re: WARNING: CPU: 0 PID: 12 at kernel/sched/fair.c:3306 update_blocked_averages+0x941/0x9a0

On 8/2/21 3:42 PM, Dietmar Eggemann wrote:
> So you're running with:
>
> 9e077b52d86a - sched/pelt: Check that *_avg are null when *_sum are
> (2021-06-17 Vincent Guittot)
>
> but not with:
>
> ceb6ba45dc80 - sched/fair: Sync load_sum with load_avg after dequeue
> (2021-07-02 Vincent Guittot)
>
> The SCHED_WARN_ON you're hitting is harmless and just tells you that the
> PELT load_avg and load_sum part of one of your cfs_rq's is not aligned.
> Has to be load (and not util or runnable) since load is the only one
> still not fixed in f55966571d5e.
>
> This should go away once you have applied ceb6ba45dc80.

Alright, I have just moved to 5.14-rc4 and it doesn't seem to have this
issue anymore.

Thanks for the response, Dietmar.

  Ammar