From: Waiman Long
To: Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng
Cc: linux-kernel@vger.kernel.org, john.p.donnelly@oracle.com,
	Hillf Danton, Mel Gorman, Waiman Long
Subject: [PATCH v2] locking/rwsem: Allow slowpath writer to ignore handoff bit if not set by first waiter
Date: Wed, 22 Jun 2022 16:04:19 -0400
Message-Id: <20220622200419.778799-1-longman@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

With commit d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more
consistent"), the writer that sets the handoff bit can be interrupted
out without clearing the bit if the wait queue isn't empty. This disables
reader and writer optimistic lock spinning and stealing.

Now if a non-first writer in the queue is somehow woken up or a new
waiter enters the slowpath, it can't acquire the lock. This was not the
case before commit d257cc8cb8d5, as the writer that set the handoff bit
would clear it when exiting via the out_nolock path.
This is less efficient as the busy rwsem stays in an unlocked state for a
longer time. In some cases, this new behavior may cause lockups as shown
in [1] and [2].

This patch allows a non-first writer to ignore the handoff bit if it was
not originally set or initiated by the first waiter. This patch is shown
to be effective in fixing the lockup problem reported in [1].

[1] https://lore.kernel.org/lkml/20220617134325.GC30825@techsingularity.net/
[2] https://lore.kernel.org/lkml/3f02975c-1a9d-be20-32cf-f1d8e3dfafcc@oracle.com/

Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more consistent")
Signed-off-by: Waiman Long
Tested-by: Mel Gorman
---
 kernel/locking/rwsem.c | 30 ++++++++++++++++++++----------
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index 9d1db4a54d34..ffd6188d4a7c 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -335,8 +335,6 @@ struct rwsem_waiter {
 	struct task_struct *task;
 	enum rwsem_waiter_type type;
 	unsigned long timeout;
-
-	/* Writer only, not initialized in reader */
 	bool handoff_set;
 };
 #define rwsem_first_waiter(sem) \
@@ -459,10 +457,12 @@ static void rwsem_mark_wake(struct rw_semaphore *sem,
 			 * to give up the lock), request a HANDOFF to
 			 * force the issue.
 			 */
-			if (!(oldcount & RWSEM_FLAG_HANDOFF) &&
-			    time_after(jiffies, waiter->timeout)) {
-				adjustment -= RWSEM_FLAG_HANDOFF;
-				lockevent_inc(rwsem_rlock_handoff);
+			if (time_after(jiffies, waiter->timeout)) {
+				if (!(oldcount & RWSEM_FLAG_HANDOFF)) {
+					adjustment -= RWSEM_FLAG_HANDOFF;
+					lockevent_inc(rwsem_rlock_handoff);
+				}
+				waiter->handoff_set = true;
 			}
 
 			atomic_long_add(-adjustment, &sem->count);
@@ -599,7 +599,7 @@ rwsem_del_wake_waiter(struct rw_semaphore *sem, struct rwsem_waiter *waiter,
 static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
 					struct rwsem_waiter *waiter)
 {
-	bool first = rwsem_first_waiter(sem) == waiter;
+	struct rwsem_waiter *first = rwsem_first_waiter(sem);
 	long count, new;
 
 	lockdep_assert_held(&sem->wait_lock);
@@ -609,11 +609,20 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
 		bool has_handoff = !!(count & RWSEM_FLAG_HANDOFF);
 
 		if (has_handoff) {
-			if (!first)
+			/*
+			 * Honor handoff bit and yield only when the first
+			 * waiter is the one that set it. Otherwise, we
+			 * still try to acquire the rwsem.
+			 */
+			if (first->handoff_set && (waiter != first))
 				return false;
 
-			/* First waiter inherits a previously set handoff bit */
-			waiter->handoff_set = true;
+			/*
+			 * First waiter can inherit a previously set handoff
+			 * bit and spin on rwsem if lock acquisition fails.
+			 */
+			if (waiter == first)
+				waiter->handoff_set = true;
 		}
 
 		new = count;
@@ -1027,6 +1036,7 @@ rwsem_down_read_slowpath(struct rw_semaphore *sem, long count, unsigned int stat
 	waiter.task = current;
 	waiter.type = RWSEM_WAITING_FOR_READ;
 	waiter.timeout = jiffies + RWSEM_WAIT_TIMEOUT;
+	waiter.handoff_set = false;
 
 	raw_spin_lock_irq(&sem->wait_lock);
 	if (list_empty(&sem->wait_list)) {
-- 
2.31.1
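
For readers who want to reason about the new handoff check outside the
kernel, here is a minimal user-space sketch of the decision made in the
patched rwsem_try_write_lock(). It is not kernel code: struct model_waiter
and model_may_try_write_lock() are hypothetical names invented for this
illustration, and the sketch models only the handoff gate, not the
subsequent count update or lock acquisition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct rwsem_waiter; only the field used here. */
struct model_waiter {
	bool handoff_set;
};

/*
 * Model of the handoff gate in the patched rwsem_try_write_lock().
 *
 * first:       waiter at the head of the wait queue
 * waiter:      waiter currently trying to take the write lock
 * has_handoff: whether the handoff bit is set in the lock count
 *
 * Returns true if 'waiter' may go on to attempt the lock, false if it
 * has to yield to the first waiter.
 */
static bool model_may_try_write_lock(struct model_waiter *first,
				     struct model_waiter *waiter,
				     bool has_handoff)
{
	assert(first != NULL && waiter != NULL);

	if (has_handoff) {
		/* Yield only if the first waiter owns the handoff request. */
		if (first->handoff_set && waiter != first)
			return false;

		/* The first waiter inherits a previously set handoff bit. */
		if (waiter == first)
			waiter->handoff_set = true;
	}
	return true;
}
```

With a stale handoff bit (set in the count, but first->handoff_set still
false), a non-first writer is allowed through. Once the first waiter has
run the check itself and inherited the bit, later non-first writers yield.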