Date: Wed, 16 Mar 2022 17:03:12 +0100
From: Sebastian Andrzej Siewior
To: Steven Rostedt
Cc: Peter Zijlstra, LKML, Thomas Gleixner
Subject: Re: sched_core_balance() releasing interrupts with pi_lock held
References: <20220308161455.036e9933@gandalf.local.home> <20220315174606.02959816@gandalf.local.home>
In-Reply-To: <20220315174606.02959816@gandalf.local.home>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2022-03-15 17:46:06 [-0400], Steven Rostedt wrote:
> On Tue, 8 Mar 2022 16:14:55 -0500
> Steven Rostedt wrote:
>
> > Hi Peter,
>
> Have you had time to look into this?

Yes, I can confirm that it is a problem ;) So I did this:

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 33ce5cd113d8..56c286aaa01f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5950,7 +5950,6 @@ static bool try_steal_cookie(int this, int that)
 	unsigned long cookie;
 	bool success = false;
 
-	local_irq_disable();
 	double_rq_lock(dst, src);
 
 	cookie = dst->core->core_cookie;
@@ -5989,7 +5988,6 @@ static bool try_steal_cookie(int this, int that)
 
 unlock:
 	double_rq_unlock(dst, src);
-	local_irq_enable();
 
 	return success;
 }
@@ -6019,7 +6017,7 @@ static void sched_core_balance(struct rq *rq)
 
 	preempt_disable();
 	rcu_read_lock();
-	raw_spin_rq_unlock_irq(rq);
+	raw_spin_rq_unlock(rq);
 	for_each_domain(cpu, sd) {
 		if (need_resched())
 			break;
@@ -6027,7 +6025,7 @@ static void sched_core_balance(struct rq *rq)
 		if (steal_cookie_task(cpu, sd))
 			break;
 	}
-	raw_spin_rq_lock_irq(rq);
+	raw_spin_rq_lock(rq);
 	rcu_read_unlock();
 	preempt_enable();
 }

which looked right but RT still falls apart:

| =====================================
| WARNING: bad unlock balance detected!
| 5.17.0-rc8-rt14+ #10 Not tainted
| -------------------------------------
| gcc/2608 is trying to release lock ((lock)) at:
| [] folio_add_lru+0x60/0x90
| but there are no more locks to release!
|
| other info that might help us debug this:
| 4 locks held by gcc/2608:
|  #0: ffff88826ea6efe0 (&sb->s_type->i_mutex_key#12){++++}-{3:3}, at: xfs_ilock+0x90/0xd0
|  #1: ffff88826ea6f1a0 (mapping.invalidate_lock#2){++++}-{3:3}, at: page_cache_ra_unbounded+0x8e/0x1f0
|  #2: ffff88852aba8d18 ((lock)#3){+.+.}-{2:2}, at: folio_add_lru+0x2a/0x90
|  #3: ffffffff829a5140 (rcu_read_lock){....}-{1:2}, at: rt_spin_lock+0x5/0xe0
|
| stack backtrace:
| CPU: 18 PID: 2608 Comm: gcc Not tainted 5.17.0-rc8-rt14+ #10
| Hardware name: Intel Corporation S2600CP/S2600CP, BIOS SE5C600.86B.02.03.0003.041920141333 04/19/2014
| Call Trace:
|
|  dump_stack_lvl+0x4a/0x62
|  lock_release.cold+0x32/0x37
|  rt_spin_unlock+0x17/0x80
|  folio_add_lru+0x60/0x90
|  filemap_add_folio+0x53/0xa0
|  page_cache_ra_unbounded+0x1c3/0x1f0
|  filemap_get_pages+0xe3/0x5b0
|  filemap_read+0xc5/0x2f0
|  xfs_file_buffered_read+0x6b/0x1a0
|  xfs_file_read_iter+0x6a/0xd0
|  new_sync_read+0x11b/0x1a0
|  vfs_read+0x134/0x1d0
|  ksys_read+0x68/0xf0
|  do_syscall_64+0x59/0x80
|  entry_SYSCALL_64_after_hwframe+0x44/0xae
| RIP: 0033:0x7f3feab7310e

It is always the local-lock that breaks apart. Based on "locks held" and the
lock it tries to release, it looks like the lock was acquired on CPU-A and
released on CPU-B.

> Thanks,
>
> -- Steve

Sebastian
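
The suspected failure mode (a per-CPU lock taken on CPU-A and released on
CPU-B) can be sketched in userspace C. This is a toy model under stated
assumptions, not kernel code: struct toy_lock, toy_locks[], current_cpu,
toy_migrate() and run_section() are all hypothetical names invented for the
illustration, and the "migration" is just an assignment. It only shows why an
unlock that resolves "this CPU's lock" after the task has moved hits an
instance that was never acquired, while the original instance stays held.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define NR_CPUS 2

/* Toy stand-in for a per-CPU lock: one instance per CPU. */
struct toy_lock {
	bool held;
};

static struct toy_lock toy_locks[NR_CPUS];

/* Hypothetical stand-in for "the CPU this task currently runs on". */
static int current_cpu;

/* Move the task to another CPU (assumption: modelled as a plain assignment). */
static void toy_migrate(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	current_cpu = cpu;
}

/* Acquire the lock instance that belongs to the CPU we are on right now. */
static void toy_lock_acquire(void)
{
	toy_locks[current_cpu].held = true;
}

/*
 * Release the lock instance of the CPU we are on right now.
 * Returns false if that instance is not held (unbalanced unlock).
 */
static bool toy_lock_release(void)
{
	struct toy_lock *l = &toy_locks[current_cpu];

	if (!l->held) {
		printf("bad unlock balance on CPU %d\n", current_cpu);
		return false;
	}
	l->held = false;
	return true;
}

/*
 * Run one lock/unlock section starting on CPU 0, optionally migrating
 * to CPU 1 while the lock is held. Returns true if the unlock balanced.
 */
static bool run_section(bool migrate)
{
	for (int i = 0; i < NR_CPUS; i++)
		toy_locks[i].held = false;
	current_cpu = 0;

	toy_lock_acquire();
	if (migrate)
		toy_migrate(1);
	return toy_lock_release();
}
```

In this model run_section(false) balances, while run_section(true) reports an
unbalanced unlock on CPU 1 and leaves the CPU 0 instance held, which is the
same shape as the splat above: one (lock) instance still listed under "locks
held" and a different one being released.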