Received: by 2002:a05:6358:d09b:b0:dc:cd0c:909e with SMTP id jc27csp1131877rwb; Thu, 1 Dec 2022 12:52:35 -0800 (PST) X-Google-Smtp-Source: AA0mqf6/kLi2xWpiyVA9wHN0mnxR0PpQXlcfjFzR7NqUwJdMOt8E1ujp3GhvXf2fFGt+pL4ktRbS X-Received: by 2002:a17:902:ab12:b0:189:7d4d:b99b with SMTP id ik18-20020a170902ab1200b001897d4db99bmr26267353plb.147.1669927954846; Thu, 01 Dec 2022 12:52:34 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1669927954; cv=none; d=google.com; s=arc-20160816; b=XSxvjV8OY8L3Xpc3RDE5Wd8WGtqb7k6gWLSigTy52O19wWb/1eBkycYMT+cpr6GjKD Ke49YSlGqGfOrpYLLrO0EzDygYwL4RsFZgjNQIsLISReUciOIjzzucdvPgvv0G+7lk2y iMC2WeJ+ythpxdN/90QmebQ71ae0U7NkUVoRYULgXGHwZ16xz/y/OOvVEog4PSS1dYF+ IlxvrW7zpuPivrKCNaR9SqR2m5LR93rkK6ZCXP4DR454luJh+yRXNGMZtTidDHVFKQpc Y79JPsg1hbzL7uMLaJjHkeBakrf7bz/mWUlioVyBhxoQkBQaJZVoug/MG4rbf7e66MqO 0ZvQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:reply-to:message-id:subject:cc:to:from:date :dkim-signature; bh=RYM9YxeYq1Uhm+R3k3EuQq6ka07hBaRmT8dYommbKh8=; b=AgRnT2VADNqITskiA2JE/3DhAL3uVkWc5wG+QbGoKOh5wCvbx/1kqft3bG+DuY6P9n vAtl8CWY+ZXM0KOijWoZphtwit2q5GRA0wQiEu3Y7x3YZeRZiB+UJs8WFJ8FcUmJGyAY apAwicfh16DA7h/zIBHZJAvkZgkRH/Gw5TOjfuj1UyZj1QG1+OLfATCtlvFW2PNpLYwh w9pDdBlosduU9v0bJBCk6+v/34g7EBMndqImv/1okeMpnPj0mcyFeK0cDqfcoasX2qkC /zojSS5+r9R9P65Hwko8fuUJ7H5aaomAuHpNF+Sb3cfs77YA9WQ40l9D9cPs4lmDO0KY z0yg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=H1aObPDA; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id n3-20020a638f03000000b004769f0faab8si5134907pgd.740.2022.12.01.12.52.24; Thu, 01 Dec 2022 12:52:34 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=H1aObPDA; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230202AbiLAUaG (ORCPT + 81 others); Thu, 1 Dec 2022 15:30:06 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57848 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229623AbiLAUaD (ORCPT ); Thu, 1 Dec 2022 15:30:03 -0500 Received: from ams.source.kernel.org (ams.source.kernel.org [IPv6:2604:1380:4601:e00::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6C2FC769E7 for ; Thu, 1 Dec 2022 12:30:02 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id F03A1B82025 for ; Thu, 1 Dec 2022 20:30:00 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id A5CF0C433C1; Thu, 1 Dec 2022 20:29:59 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1669926599; bh=G6s/neJ/58a8Hb/nA6CqujMZezHgukxnl63LF3NqZnk=; h=Date:From:To:Cc:Subject:Reply-To:References:In-Reply-To:From; b=H1aObPDAZgD9K9Wf3p50VGZA3YtpH5al+bf1PD0AHR1Xo7d6ZkoV/iqbJVfPfpqfS dffyyzXfaJsHRDP7Y9Qj31tHjYsMBWY8IwBY+zBY2i53W1lEjbSmA8mm7qGrZx44ks 4uoQj7yMT3Iam8zyFGswBLdwixvtzOinxnzGaJ892h8UCN4q1oBavoDPDkvIVA7EV9 SaLFXt4Sk7ncRr0YeEvVu2K6yhmbwAys9CaF2gnCIAsPBr/CHyXO3EpIukCgnmnJ6s 5z4jGyyxQyi5CLkoozqjWC+H1fJM9lYxoKdKVvv4+MXyu77GXDjnGqUWGP+iMlqOKj hN3r95d3ICbdQ== Received: by paulmck-ThinkPad-P17-Gen-1.home (Postfix, from userid 1000) id 4DB8A5C05F8; Thu, 1 Dec 2022 12:29:59 -0800 (PST) Date: Thu, 1 Dec 2022 12:29:59 -0800 From: "Paul E. McKenney" To: Petr Mladek Cc: "Zhang, Qiang1" , "Liu, Yujie" , John Ogness , "oe-lkp@lists.linux.dev" , lkp , Thomas Gleixner , Miguel Ojeda , Linux Memory Management List , "linux-kernel@vger.kernel.org" Subject: Re: [linux-next:master] [printk] 8bdbdd7f43: BUG:scheduling_while_atomic Message-ID: <20221201202959.GJ4001@paulmck-ThinkPad-P17-Gen-1> Reply-To: paulmck@kernel.org References: <202211302358.ef0db537-yujie.liu@intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Dec 01, 2022 at 05:12:44PM +0100, Petr Mladek wrote: > On Thu 2022-12-01 05:27:13, Zhang, Qiang1 wrote: > > Greeting, > > > > FYI, we noticed BUG:scheduling_while_atomic due to commit (built with clang-14): > > > > commit: 8bdbdd7f43cd74c7faca6add8a62d541503ae21d ("printk: Prepare for SRCU console list protection") > > https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master > > > > in testcase: boot > > > > on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G > > > > caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace): > > > > > > [ 8.561823][ T0] BUG: scheduling while atomic: swapper/0/0x00000002 > > [ 8.569154][ T0] no locks held by swapper/0. > > [ 8.571934][ T0] Modules linked in: > > [ 8.573001][ T0] CPU: 0 PID: 0 Comm: swapper Not tainted 6.1.0-rc1-00015-g8bdbdd7f43cd #1 9e431b4e696756e99f94acf7e74ac6e512df80ce > > [ 8.576740][ T0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-4 04/01/2014 > > [ 8.579942][ T0] Call Trace: > > [ 8.581143][ T0] > > [ 8.582054][ T0] dump_stack_lvl (??:?) > > [ 8.583312][ T0] ? netdev_notice (??:?) > > [ 8.584753][ T0] ? lockdep_print_held_locks (lockdep.c:?) > > [ 8.586563][ T0] __schedule_bug (core.c:?) > > [ 8.588171][ T0] ? trace_sched_switch (core.c:?) > > [ 8.589753][ T0] ? save_trace (lockdep.c:?) > > [ 8.591135][ T0] schedule_debug (core.c:?) > > [ 8.592582][ T0] __schedule (core.c:?) > > [ 8.593902][ T0] ? __sched_text_start (core.c:?) > > [ 8.595356][ T0] ? add_chain_block (lockdep.c:?) > > [ 8.596847][ T0] ? find_held_lock (lockdep.c:?) > > [ 8.598368][ T0] schedule (??:?) > > [ 8.599564][ T0] schedule_timeout (??:?) > > [ 8.600937][ T0] ? console_conditional_schedule (??:?) > > [ 8.602773][ T0] do_wait_for_common (build_utility.c:?) > > [ 8.604522][ T0] ? console_conditional_schedule (??:?) > > [ 8.606462][ T0] ? bit_wait_io_timeout (build_utility.c:?) > > [ 8.608196][ T0] ? _raw_spin_lock_irq (??:?) > > [ 8.609935][ T0] ? lockdep_hardirqs_on (??:?) > > [ 8.611646][ T0] wait_for_completion (??:?) > > [ 8.613253][ T0] synchronize_srcu (??:?) > > Hmm, I do not understand it. It seems that this is called > in normal context. There was console_unlock() right > before synchronize_srcu(). It means that we took > semaphore in the same context. > > Also all the calls above looks correct. wait_for_completion() > seems to be called in normal context as well. Is it possible that this is happening between the time that the scheduler is running, but before workqueues are fully up and running? If so, I can adjust synchronize_srcu() to operate during that time. But I would need confirmation that this in fact the situation. Thanx, Paul > Best Regards, > Petr > > > [ 8.614825][ T0] ? srcu_gp_start_if_needed (??:?) > > [ 8.616664][ T0] ? rcu_read_lock_any_held (??:?) > > [ 8.618281][ T0] ? console_trylock_spinning (??:?) > > [ 8.620181][ T0] unregister_console (??:?) > > [ 8.621875][ T0] register_console (??:?) > > [ 8.623401][ T0] ? serial8250_isa_init_ports (8250_core.c:?) > > [ 8.625281][ T0] ? parse_options (super.c:?) > > [ 8.626887][ T0] univ8250_console_init (8250_core.c:?) > > [ 8.628583][ T0] console_init (??:?) > > [ 8.630025][ T0] start_kernel (??:?) > > [ 8.631558][ T0] secondary_startup_64_no_verify (??:?) > > [ 8.633502][ T0] > > [ 8.634624][ T0] ------------[ cut here ]------------ > > [ 8.636289][ T0] releasing a pinned lock > > [ 8.638693][ T0] WARNING: CPU: 0 PID: 0 at kernel/locking/lockdep.c:5352 lock_release (??:?) > > [ 8.641591][ T0] Modules linked in: > > [ 8.642864][ T0] CPU: 0 PID: 0 Comm: swapper Tainted: G W 6.1.0-rc1-00015-g8bdbdd7f43cd #1 9e431b4e696756e99f94acf7e74ac6e512df80ce > > [ 8.646469][ T0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-4 04/01/2014 > > [ 8.649578][ T0] RIP: 0010:lock_release (??:?) > > [ 8.651243][ T0] Code: 00 00 e9 0f fe ff ff 48 83 05 06 f6 ff 06 01 e8 91 4e 2d 03 e9 67 fe ff ff 48 c7 c7 20 8c cf 84 e8 90 3f ec ff 48 8b 54 24 08 <0f> 0b 48 83 05 a9 f6 ff 06 01 e9 66 fc ff ff e8 67 c9 99 01 85 c0 > > All code > > ======== > > 0: 00 00 add %al,(%rax) > > 2: e9 0f fe ff ff jmpq 0xfffffffffffffe16 > > 7: 48 83 05 06 f6 ff 06 addq $0x1,0x6fff606(%rip) # 0x6fff615 > > e: 01 > > f: e8 91 4e 2d 03 callq 0x32d4ea5 > > 14: e9 67 fe ff ff jmpq 0xfffffffffffffe80 > > 19: 48 c7 c7 20 8c cf 84 mov $0xffffffff84cf8c20,%rdi > > 20: e8 90 3f ec ff callq 0xffffffffffec3fb5 > > 25: 48 8b 54 24 08 mov 0x8(%rsp),%rdx > > 2a:* 0f 0b ud2 <-- trapping instruction > > 2c: 48 83 05 a9 f6 ff 06 addq $0x1,0x6fff6a9(%rip) # 0x6fff6dd > > 33: 01 > > 34: e9 66 fc ff ff jmpq 0xfffffffffffffc9f > > 39: e8 67 c9 99 01 callq 0x199c9a5 > > 3e: 85 c0 test %eax,%eax > > > > Code starting with the faulting instruction > > =========================================== > > 0: 0f 0b ud2 > > 2: 48 83 05 a9 f6 ff 06 addq $0x1,0x6fff6a9(%rip) # 0x6fff6b3 > > 9: 01 > > a: e9 66 fc ff ff jmpq 0xfffffffffffffc75 > > f: e8 67 c9 99 01 callq 0x199c97b > > 14: 85 c0 test %eax,%eax > > [ 8.656638][ T0] RSP: 0000:ffffffff862077c0 EFLAGS: 00010086 > > [ 8.658249][ T0] RAX: 0000000000000017 RBX: ffffffff86244244 RCX: ffffffff8631a420 > > [ 8.660383][ T0] RDX: ffffffff86244218 RSI: 0000000000000008 RDI: 0000000000000001 > > [ 8.662798][ T0] RBP: ffffffff862078e0 R08: dffffc0000000000 R09: 00000000035545d9 > > [ 8.665205][ T0] R10: dffff7fff0c40e7a R11: 0000000000000001 R12: ffffffff862fcf58 > > [ 8.667420][ T0] R13: 1ffffffff0c40f06 R14: ffffffff86244220 R15: dffffc0000000000 > > [ 8.669824][ T0] FS: 0000000000000000(0000) GS:ffffffff862ab000(0000) knlGS:0000000000000000 > > [ 8.672552][ T0] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > [ 8.674557][ T0] CR2: ffff88843ffff000 CR3: 0000000006235000 CR4: 00000000000406b0 > > [ 8.677082][ T0] Call Trace: > > [ 8.678220][ T0] > > [ 8.679230][ T0] ? __lock_acquire (??:?) > > [ 8.680709][ T0] ? kvm_sched_clock_read (kvmclock.c:?) > > [ 8.682309][ T0] ? sched_clock_cpu (??:?) > > [ 8.683854][ T0] ? kvm_sched_clock_read (kvmclock.c:?) > > [ 8.685268][ T0] _raw_spin_unlock (??:?) > > [ 8.686730][ T0] dequeue_task_idle (build_policy.c:?) > > [ 8.688259][ T0] ? set_next_task_idle (build_policy.c:?) > > [ 8.689822][ T0] ? update_rq_clock (??:?) > > [ 8.691387][ T0] __schedule (core.c:?) > > [ 8.692782][ T0] ? __sched_text_start (core.c:?) > > [ 8.694287][ T0] ? add_chain_block (lockdep.c:?) > > [ 8.695869][ T0] ? find_held_lock (lockdep.c:?) > > [ 8.697314][ T0] schedule (??:?) > > [ 8.698529][ T0] schedule_timeout (??:?) > > [ 8.701929][ T0] ? console_conditional_schedule (??:?) > > [ 8.703826][ T0] do_wait_for_common (build_utility.c:?) > > [ 8.705395][ T0] ? console_conditional_schedule (??:?) > > [ 8.707189][ T0] ? bit_wait_io_timeout (build_utility.c:?) > > [ 8.708779][ T0] ? _raw_spin_lock_irq (??:?) > > [ 8.710371][ T0] ? lockdep_hardirqs_on (??:?) > > [ 8.711982][ T0] wait_for_completion (??:?) > > [ 8.713487][ T0] synchronize_srcu (??:?) > > [ 8.715020][ T0] ? srcu_gp_start_if_needed (??:?) > > [ 8.716772][ T0] ? rcu_read_lock_any_held (??:?) > > [ 8.718477][ T0] ? console_trylock_spinning (??:?) > > [ 8.720179][ T0] unregister_console (??:?) > > [ 8.721644][ T0] register_console (??:?) > > [ 8.728846][ T0] ? serial8250_isa_init_ports (8250_core.c:?) > > [ 8.730692][ T0] ? parse_options (super.c:?) > > [ 8.732180][ T0] univ8250_console_init (8250_core.c:?) > > [ 8.733762][ T0] console_init (??:?) > > [ 8.735096][ T0] start_kernel (??:?) > > [ 8.736425][ T0] secondary_startup_64_no_verify (??:?) > > [ 8.738088][ T0] > > [ 8.739057][ T0] irq event stamp: 494 > > [ 8.740376][ T0] hardirqs last enabled at (493): dump_stack_lvl (??:?) > > [ 8.743081][ T0] hardirqs last disabled at (494): __schedule (core.c:?) > > [ 8.745797][ T0] softirqs last enabled at (0): 0x0 > > [ 8.747882][ T0] softirqs last disabled at (0): 0x0 > > [ 8.749889][ T0] ---[ end trace 0000000000000000 ]--- > > [ 8.751520][ T0] bad: scheduling from the idle thread! > > [ 8.753351][ T0] CPU: 0 PID: 0 Comm: swapper Tainted: G W 6.1.0-rc1-00015-g8bdbdd7f43cd #1 9e431b4e696756e99f94acf7e74ac6e512df80ce > > [ 8.757566][ T0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-4 04/01/2014 > > [ 8.760783][ T0] Call Trace: > > [ 8.761970][ T0] > > [ 8.763067][ T0] dump_stack_lvl (??:?) > > [ 8.764611][ T0] ? netdev_notice (??:?) > > [ 8.766213][ T0] ? lockdep_hardirqs_on_prepare (??:?) > > [ 8.768214][ T0] ? print_irqtrace_events (??:?) > > [ 8.769911][ T0] ? kvm_sched_clock_read (kvmclock.c:?) > > [ 8.771550][ T0] dequeue_task_idle (build_policy.c:?) > > [ 8.773087][ T0] ? set_next_task_idle (build_policy.c:?) > > [ 8.774647][ T0] ? update_rq_clock (??:?) > > [ 8.776280][ T0] __schedule (core.c:?) > > [ 8.777729][ T0] ? __sched_text_start (core.c:?) > > [ 8.779238][ T0] ? add_chain_block (lockdep.c:?) > > [ 8.780922][ T0] ? find_held_lock (lockdep.c:?) > > [ 8.782560][ T0] schedule (??:?) > > [ 8.783988][ T0] schedule_timeout (??:?) > > [ 8.785515][ T0] ? console_conditional_schedule (??:?) > > [ 8.787500][ T0] do_wait_for_common (build_utility.c:?) > > [ 8.789162][ T0] ? console_conditional_schedule (??:?) > > [ 8.790997][ T0] ? bit_wait_io_timeout (build_utility.c:?) > > [ 8.792567][ T0] ? _raw_spin_lock_irq (??:?) > > [ 8.794292][ T0] ? lockdep_hardirqs_on (??:?) > > [ 8.796008][ T0] wait_for_completion (??:?) > > [ 8.797520][ T0] synchronize_srcu (??:?) > > [ 8.799001][ T0] ? srcu_gp_start_if_needed (??:?) > > [ 8.800696][ T0] ? rcu_read_lock_any_held (??:?) > > [ 8.802508][ T0] ? console_trylock_spinning (??:?) > > [ 8.804316][ T0] unregister_console (??:?) > > [ 8.805810][ T0] register_console (??:?) > > [ 8.807364][ T0] ? serial8250_isa_init_ports (8250_core.c:?) > > [ 8.811167][ T0] ? parse_options (super.c:?) > > [ 8.812693][ T0] univ8250_console_init (8250_core.c:?) > > [ 8.814269][ T0] console_init (??:?) > > [ 8.815765][ T0] start_kernel (??:?) > > [ 8.817296][ T0] secondary_startup_64_no_verify (??:?) > > [ 8.819246][ T0] > > > > > > > > Through the config file, it can be found that the following options are enabled > > > > CONFIG_TINY_RCU=y > > CONFIG_RCU_EXPERT=y > > CONFIG_SRCU=y > > CONFIG_TINY_SRCU=y > > > > This problem should have been fixed: > > > > https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=0bf43fcbf8ebbf84a2b1b8770eaefcdb4a99afd6 > > > > commit dbc6ca150842650d513705f26e3e6b7a4e182ce9 > > Author: Zqiang > > Date: Wed Nov 9 15:36:38 2022 +0800 > > > > srcu: Make Tiny synchronize_srcu() check for readers > > > > This commit adds lockdep checks for illegal use of synchronize_srcu() > > within same-type SRCU read-side critical sections and within normal > > RCU read-side critical sections. It also makes synchronize_srcu() > > be a no-op during early boot. > > > > These changes bring Tiny synchronize_srcu() into line with both Tree > > synchronize_srcu() and Tiny synchronize_rcu(). > > > > Signed-off-by: Zqiang > > Signed-off-by: Paul E. McKenney > > > > > > Thanks > > Zqiang > > > > > > > > > > > > > > > > If you fix the issue, kindly add following tag > > | Reported-by: kernel test robot > > | Link: https://lore.kernel.org/oe-lkp/202211302358.ef0db537-yujie.liu@intel.com > > > > > > To reproduce: > > > > # build kernel > > cd linux > > cp config-6.1.0-rc1-00015-g8bdbdd7f43cd .config > > make HOSTCC=clang-14 CC=clang-14 ARCH=x86_64 olddefconfig prepare modules_prepare bzImage modules > > make HOSTCC=clang-14 CC=clang-14 ARCH=x86_64 INSTALL_MOD_PATH= modules_install > > cd > > find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz > > > > > > git clone https://github.com/intel/lkp-tests.git > > cd lkp-tests > > bin/lkp qemu -k -m modules.cgz job-script # job-script is attached in this email > > > > # if come across any failure that blocks the test, > > # please remove ~/.lkp and /lkp dir to run from a clean state. > > > > > > -- > > 0-DAY CI Kernel Test Service > > https://01.org/lkp