Received: by 2002:a05:6a10:f3d0:0:0:0:0 with SMTP id a16csp5375383pxv; Wed, 7 Jul 2021 02:11:10 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxw2kEFTqQcrXqMwVHJg46DmGpVj9YmA34EIQpKlvIG8Xzq6RDWGkLW4+eeJKZg7ABkI2CU X-Received: by 2002:aa7:cd85:: with SMTP id x5mr28721151edv.115.1625649070472; Wed, 07 Jul 2021 02:11:10 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1625649070; cv=none; d=google.com; s=arc-20160816; b=IdrhJRQ7O2/fRHD+ri9LbEE4tERoVxF9Z+5e39YL82WK0FioqzMXB6p4VxXlx+DCwc NXOHo5PD7oRv+aXMfNOozVPhyCLW08K7piuIThw6cf31FOwi2WQcndP5YsKWKc0tutOW /bIqtPzxIrfjMNYTJgV1ZCZZYEUWrUoaAO34yHhi/3lQ8IIOATuXMrYupxA2BRP94rbM tj+1hSXdF0P3OE51oLCcrLXHyioXCj/pa9d6DXU91uI/XSxhDx9Qzn2PXI5SmACzgrqb tNRQO8KX98YMOLxwqQ8FkCdfJvBILLn8NScu+qLj0Y4daKqg96CpUlmOon+YdKeMi2O5 f6xg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:user-agent:in-reply-to:content-disposition :mime-version:references:message-id:subject:cc:to:from:date; bh=3AnM3WWAHik+amzahIq1nrIIyCAvA09jDRmXxoEQOtE=; b=r/4JeivdG2Mq5rWJD5YeGTNTOxojQ/KTUys86HefQBakHXM3rDMP8fmZ1SbjcR43vr WfBiuoth2dTaOKR3MZu2rf3EcvVqhubI+f2p8EihiZxEgZY9tKEMKwR+bxJPAauiWA1Q SBDds3JELIvwtd6rtFZ/V9aBiXrkz9RGQhovVVp7v3lxZbxlV7DDo6gR06LE8jR4PpOm QkqdKHORhYKgp38GMSciNDsSTzpCiVOiN4iA9GkoKatrZNMtb8yQdLklmPQszNmsoJ48 x3spLDa18wmnmbd0p4/JLgqK0rkFvHGMHAofX1DaIqKZ2iE+Myt4JxBhc3vXmVL973Je uwgQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id j15si17274338edv.567.2021.07.07.02.10.47; Wed, 07 Jul 2021 02:11:10 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231190AbhGGJKR (ORCPT + 99 others); Wed, 7 Jul 2021 05:10:17 -0400 Received: from mga02.intel.com ([134.134.136.20]:43997 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230166AbhGGJKQ (ORCPT ); Wed, 7 Jul 2021 05:10:16 -0400 X-IronPort-AV: E=McAfee;i="6200,9189,10037"; a="196435148" X-IronPort-AV: E=Sophos;i="5.83,331,1616482800"; d="scan'208";a="196435148" Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 07 Jul 2021 02:07:29 -0700 X-IronPort-AV: E=Sophos;i="5.83,331,1616482800"; d="scan'208";a="486741150" Received: from shao2-debian.sh.intel.com (HELO localhost) ([10.239.13.11]) by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 07 Jul 2021 02:07:25 -0700 Date: Wed, 7 Jul 2021 17:07:22 +0800 From: kernel test robot To: Vincent Guittot Cc: Odin Ugedal , Peter Zijlstra , Sachin Sant , Naresh Kamboju , LKML , lkp@lists.01.org, kbuild test robot , ltp@lists.linux.it, Aubrey Li , Chen Yu Subject: Re: [sched/fair] 1c35b07e6d: RIP:native_queued_spin_lock_slowpath Message-ID: <20210707090722.GH2022171@shao2-debian> References: <20210706085652.GF2022171@shao2-debian> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.10.1 (2018-07-13) Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Jul 06, 2021 at 11:08:54AM +0200, Vincent Guittot wrote: > Hi Rong > > On Tue, 6 Jul 2021 at 10:56, kernel test robot wrote: > > > > Greeting, > > > > FYI, we noticed the following commit (built with gcc-9): > > > > commit: 1c35b07e6d3986474e5635be566e7bc79d97c64d ("sched/fair: Ensure _sum and _avg values stay consistent") > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master > > I don't think this commit is the real culprit as it mainly replaces a > sub by a mul whereas the dmesg mentioned spinlock deadlock . Have you > bisect the problem down to this commit or you faced the problem while > testing latest master branch ? Hi Vincent, It's bisected by 0day-CI, I tried to run more times and found the issue is not first introduced by this commit, but boot always failed with this commit. adf3c31e18b765ea 1c35b07e6d3986474e5635be566 ---------------- --------------------------- fail:runs %reproduction fail:runs | | | 7:15 -47% :12 last_state.booting 2:15 67% 12:12 dmesg.Kernel_panic-not_syncing:Hard_LOCKUP :15 7% 1:12 dmesg.RIP:cpuidle_enter_state 2:15 67% 12:12 dmesg.RIP:native_queued_spin_lock_slowpath :15 7% 1:12 dmesg.RIP:sd_init_command[sd_mod] 1:15 -7% :12 dmesg.RIP:update_blocked_averages 1:15 -7% :12 dmesg.WARNING:at_kernel/sched/fair.c:#update_blocked_averages 3:15 60% 12:12 dmesg.boot_failures Best Regards, Rong Chen > > > > > > > in testcase: ltp > > version: ltp-x86_64-14c1f76-1_20210703 > > with following parameters: > > > > disk: 1HDD > > fs: ext4 > > test: dio-01 > > ucode: 0xe2 > > > > test-description: The LTP testsuite contains a collection of tools for testing the Linux kernel and related features. > > test-url: http://linux-test-project.github.io/ > > > > > > on test machine: 8 threads Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz with 28G memory > > > > caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace): > > > > > > > > If you fix the issue, kindly add following tag > > Reported-by: kernel test robot > > > > > > [ 160.446205] > > [ 160.451594] <<>> > > [ 160.451595] > > [ 178.116525] ------------[ cut here ]------------ > > [ 203.592757] NMI watchdog: Watchdog detected hard LOCKUP on cpu 3 > > [ 203.592758] Modules linked in: dm_mod btrfs blake2b_generic xor zstd_compress raid6_pq libcrc32c ipmi_devintf ipmi_msghandler sd_mod t10_pi sg intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp i915 kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel mei_wdt intel_gtt drm_kms_helper ahci rapl syscopyarea libahci sysfillrect intel_cstate sysimgblt mei_me fb_sys_fops wmi_bmof drm intel_uncore libata mei joydev intel_pch_thermal wmi video intel_pmc_core acpi_pad ip_tables > > [ 203.592770] CPU: 3 PID: 3103 Comm: diotest6 Tainted: G I 5.13.0-rc6-00076-g1c35b07e6d39 #1 > > [ 203.592770] Hardware name: Dell Inc. OptiPlex 7040/0Y7WYT, BIOS 1.2.8 01/26/2016 > > [ 203.592771] RIP: 0010:native_queued_spin_lock_slowpath (kbuild/src/consumer/kernel/locking/qspinlock.c:382 kbuild/src/consumer/kernel/locking/qspinlock.c:315) > > [ 203.592771] Code: 6c f0 0f ba 2f 08 0f 92 c0 0f b6 c0 c1 e0 08 89 c2 8b 07 30 e4 09 d0 a9 00 01 ff ff 75 46 85 c0 74 0e 8b 07 84 c0 74 08 f3 90 <8b> 07 84 c0 75 f8 b8 01 00 00 00 66 89 07 c3 8b 37 b8 00 02 00 00 > > All code > > ======== > > 0: 6c insb (%dx),%es:(%rdi) > > 1: f0 0f ba 2f 08 lock btsl $0x8,(%rdi) > > 6: 0f 92 c0 setb %al > > 9: 0f b6 c0 movzbl %al,%eax > > c: c1 e0 08 shl $0x8,%eax > > f: 89 c2 mov %eax,%edx > > 11: 8b 07 mov (%rdi),%eax > > 13: 30 e4 xor %ah,%ah > > 15: 09 d0 or %edx,%eax > > 17: a9 00 01 ff ff test $0xffff0100,%eax > > 1c: 75 46 jne 0x64 > > 1e: 85 c0 test %eax,%eax > > 20: 74 0e je 0x30 > > 22: 8b 07 mov (%rdi),%eax > > 24: 84 c0 test %al,%al > > 26: 74 08 je 0x30 > > 28: f3 90 pause > > 2a:* 8b 07 mov (%rdi),%eax <-- trapping instruction > > 2c: 84 c0 test %al,%al > > 2e: 75 f8 jne 0x28 > > 30: b8 01 00 00 00 mov $0x1,%eax > > 35: 66 89 07 mov %ax,(%rdi) > > 38: c3 retq > > 39: 8b 37 mov (%rdi),%esi > > 3b: b8 00 02 00 00 mov $0x200,%eax > > > > Code starting with the faulting instruction > > =========================================== > > 0: 8b 07 mov (%rdi),%eax > > 2: 84 c0 test %al,%al > > 4: 75 f8 jne 0xfffffffffffffffe > > 6: b8 01 00 00 00 mov $0x1,%eax > > b: 66 89 07 mov %ax,(%rdi) > > e: c3 retq > > f: 8b 37 mov (%rdi),%esi > > 11: b8 00 02 00 00 mov $0x200,%eax > > [ 203.592772] RSP: 0018:ffffc90001f032d8 EFLAGS: 00000002 > > [ 203.592773] RAX: 0000000000000101 RBX: ffff88810d4a0000 RCX: ffff888759cc0000 > > [ 203.592773] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff888759ceba80 > > [ 203.592774] RBP: ffffc90001f032e8 R08: ffff888759ceb420 R09: ffff888759ceb420 > > [ 203.592774] R10: ffff88810cc01500 R11: 0000000000000000 R12: ffff888759ceba80 > > [ 203.592774] R13: 0000000000000000 R14: 0000000000000087 R15: ffff88810d4a0c8c > > [ 203.592775] FS: 00007fc252ae2740(0000) GS:ffff888759cc0000(0000) knlGS:0000000000000000 > > [ 203.592775] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > [ 203.592776] CR2: 00007fa0a4d577f8 CR3: 000000074d22a005 CR4: 00000000003706e0 > > [ 203.592776] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > > [ 203.592776] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > > [ 203.592777] Call Trace: > > [ 203.592777] _raw_spin_lock (kbuild/src/consumer/arch/x86/include/asm/paravirt.h:585 kbuild/src/consumer/arch/x86/include/asm/qspinlock.h:51 kbuild/src/consumer/include/asm-generic/qspinlock.h:85 kbuild/src/consumer/include/linux/spinlock.h:183 kbuild/src/consumer/include/linux/spinlock_api_smp.h:143 kbuild/src/consumer/kernel/locking/spinlock.c:151) > > [ 203.592777] raw_spin_rq_lock_nested (kbuild/src/consumer/arch/x86/include/asm/preempt.h:85 kbuild/src/consumer/kernel/sched/core.c:462) > > [ 203.592778] try_to_wake_up (kbuild/src/consumer/kernel/sched/sched.h:1536 kbuild/src/consumer/kernel/sched/sched.h:1611 kbuild/src/consumer/kernel/sched/core.c:3555 kbuild/src/consumer/kernel/sched/core.c:3835) > > [ 203.592778] __queue_work (kbuild/src/consumer/arch/x86/include/asm/paravirt.h:590 kbuild/src/consumer/arch/x86/include/asm/qspinlock.h:56 kbuild/src/consumer/include/linux/spinlock.h:212 kbuild/src/consumer/include/linux/spinlock_api_smp.h:151 kbuild/src/consumer/kernel/workqueue.c:1501) > > [ 203.592778] queue_work_on (kbuild/src/consumer/kernel/workqueue.c:1526) > > > > > > To reproduce: > > > > git clone https://github.com/intel/lkp-tests.git > > cd lkp-tests > > bin/lkp install job.yaml # job file is attached in this email > > bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run > > bin/lkp run generated-yaml-file > > > > > > > > --- > > 0DAY/LKP+ Test Infrastructure Open Source Technology Center > > https://lists.01.org/hyperkitty/list/lkp@lists.01.org Intel Corporation > > > > Thanks, > > Rong Chen > >