Date: Fri, 20 Jan 2023 13:24:41 +0000
From: Mel Gorman <mgorman@techsingularity.net>
To: Sebastian Andrzej Siewior
Cc: Peter Zijlstra, Thomas Gleixner, Ingo Molnar, Davidlohr Bueso,
 Linux-RT, LKML
Subject: Re: [PATCH v2] locking/rwbase: Prevent indefinite writer starvation
Message-ID: <20230120132441.4jjke47rnpikiuf5@techsingularity.net>
References: <20230117083817.togfwc5cy4g67e5r@techsingularity.net>
 <20230117165021.t5m7c2d6frbbfzig@techsingularity.net>
 <20230118173130.4n2b3cs4pxiqnqd3@techsingularity.net>
 <20230119110220.kphftcehehhi5l5u@techsingularity.net>
 <20230119174101.rddtxk5xlamlnquh@techsingularity.net>

On Fri, Jan 20, 2023 at 09:25:00AM +0100, Sebastian Andrzej Siewior wrote:
> On 2023-01-19 17:41:01 [+0000], Mel Gorman wrote:
> >
> > Yes, it makes your concern much clearer but I'm not sure it actually matters
> > in terms of preventing write starvation or in terms of correctness. At
> > worst, a writer is blocked that could have acquired the lock during a tiny
> > race but that's a timing issue rather than a correctness issue.
>
> Correct. My concern is that one reader may need to wait 4ms+ for the
> lock while a following reader (the one that sees the timeout) does not.
> This can lead to confusion later on.

Ok, yes, that is a valid concern I had not considered when thinking in
terms of correctness or writer starvation. It would be very tricky to
diagnose if it happened.

> > The race could be closed by moving wait_lock acquisition before the
> > atomic_sub in rwbase_write_lock() but it expands the scope of the wait_lock
> > and I'm not sure that's necessary for either correctness or preventing
> > writer starvation. It's a more straightforward fix but expanding the
> > scope of a lock unnecessarily has been unpopular in the past.
> >
> > I think we can close the race that concerns you but I'm not convinced we
> > need to and changing the scope of wait_lock would need a big comment and
> > probably deserves a separate patch.
>
> Would it work to check the timeout vs 0 first and only apply the
> timeout check if it is != zero? The writer would need to unconditionally
> OR in the lowest bit. That should close the gap at a low price. The timeout
> variable is always read within the lock so there shouldn't be a need for
> any additional barriers.

Yes, as a bonus point, it can be checked early in rwbase_allow_reader_bias()
and is a cheap test for the common case, so it's a win-win all round.

Patch is now this;

--8<--

locking/rwbase: Prevent indefinite writer starvation

rw_semaphore and rwlock are explicitly unfair to writers in the
presence of readers by design with a PREEMPT_RT configuration. Commit
943f0edb754f ("locking/rt: Add base code for RT rw_semaphore and
rwlock") notes:

	The implementation is writer unfair, as it is not feasible to do
	priority inheritance on multiple readers, but experience has shown
	that real-time workloads are not the typical workloads which are
	sensitive to writer starvation.

While atypical, it's also trivial to block writers with PREEMPT_RT
indefinitely without ever making forward progress. Since LTP-20220121,
the dio_truncate test case went from having 1 reader to having 16
readers, and that number of readers is sufficient to prevent the
down_write from ever succeeding while readers exist. Eventually the
test is killed after 30 minutes as a failure. dio_truncate is not a
realtime application, but indefinite writer starvation is undesirable.
The test case has one writer appending and truncating files A and B
while multiple readers read file A. The readers and the writer contend
for one file's inode lock, which the writer never acquires because the
readers keep reading until the writer is done, which never happens.

This patch records a timestamp when the first writer is blocked. DL /
RT tasks can continue to take the lock for read indefinitely as long
as readers exist. Other readers can acquire the read lock unless a
writer has been blocked for a minimum of 4ms. This is sufficient to
allow the dio_truncate test case to complete within the 30-minute
timeout.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 include/linux/rwbase_rt.h  |  3 +++
 kernel/locking/rwbase_rt.c | 38 +++++++++++++++++++++++++++++++++++---
 2 files changed, 38 insertions(+), 3 deletions(-)

diff --git a/include/linux/rwbase_rt.h b/include/linux/rwbase_rt.h
index 1d264dd08625..b969b1d9bb85 100644
--- a/include/linux/rwbase_rt.h
+++ b/include/linux/rwbase_rt.h
@@ -10,12 +10,14 @@
 struct rwbase_rt {
 	atomic_t		readers;
+	unsigned long		waiter_timeout;
 	struct rt_mutex_base	rtmutex;
 };
 
 #define __RWBASE_INITIALIZER(name)				\
 {								\
 	.readers = ATOMIC_INIT(READER_BIAS),			\
+	.waiter_timeout = 0,					\
 	.rtmutex = __RT_MUTEX_BASE_INITIALIZER(name.rtmutex),	\
 }
 
@@ -23,6 +25,7 @@ struct rwbase_rt {
 	do {							\
 		rt_mutex_base_init(&(rwbase)->rtmutex);		\
 		atomic_set(&(rwbase)->readers, READER_BIAS);	\
+		(rwbase)->waiter_timeout = 0;			\
 	} while (0)

diff --git a/kernel/locking/rwbase_rt.c b/kernel/locking/rwbase_rt.c
index c201aadb9301..9d5bbf2985de 100644
--- a/kernel/locking/rwbase_rt.c
+++ b/kernel/locking/rwbase_rt.c
@@ -39,7 +39,10 @@
  * major surgery for a very dubious value.
  *
  * The risk of writer starvation is there, but the pathological use cases
- * which trigger it are not necessarily the typical RT workloads.
+ * which trigger it are not necessarily the typical RT workloads. SCHED_OTHER
+ * reader acquisitions will be forced into the slow path if a writer is
+ * blocked for more than RWBASE_RT_WAIT_TIMEOUT jiffies. New DL / RT readers
+ * can still starve a writer indefinitely.
  *
  * Fast-path orderings:
  * The lock/unlock of readers can run in fast paths: lock and unlock are only
@@ -65,6 +68,27 @@ static __always_inline int rwbase_read_trylock(struct rwbase_rt *rwb)
 	return 0;
 }
 
+/*
+ * Allow reader bias for SCHED_OTHER tasks with a pending writer for a
+ * minimum of 4ms or 1 tick. This matches RWSEM_WAIT_TIMEOUT for the
+ * generic RWSEM implementation.
+ */
+#define RWBASE_RT_WAIT_TIMEOUT	DIV_ROUND_UP(HZ, 250)
+
+static bool __sched rwbase_allow_reader_bias(struct rwbase_rt *rwb)
+{
+	/*
+	 * Allow reader bias if no writer is blocked or for DL / RT tasks.
+	 * Such tasks should be designed to avoid heavy writer contention
+	 * or indefinite starvation.
+	 */
+	if (!rwb->waiter_timeout || rt_task(current))
+		return true;
+
+	/* Allow reader bias unless a writer timeout has expired. */
+	return time_before(jiffies, rwb->waiter_timeout);
+}
+
 static int __sched __rwbase_read_lock(struct rwbase_rt *rwb,
 				      unsigned int state)
 {
@@ -74,9 +98,11 @@ static int __sched __rwbase_read_lock(struct rwbase_rt *rwb,
 	raw_spin_lock_irq(&rtm->wait_lock);
 	/*
 	 * Allow readers, as long as the writer has not completely
-	 * acquired the semaphore for write.
+	 * acquired the semaphore for write and reader bias is still
+	 * allowed.
 	 */
-	if (atomic_read(&rwb->readers) != WRITER_BIAS) {
+	if (atomic_read(&rwb->readers) != WRITER_BIAS &&
+	    rwbase_allow_reader_bias(rwb)) {
 		atomic_inc(&rwb->readers);
 		raw_spin_unlock_irq(&rtm->wait_lock);
 		return 0;
 	}
@@ -255,6 +281,7 @@ static int __sched rwbase_write_lock(struct rwbase_rt *rwb,
 	for (;;) {
 		/* Optimized out for rwlocks */
 		if (rwbase_signal_pending_state(state, current)) {
+			rwb->waiter_timeout = 0;
 			rwbase_restore_current_state();
 			__rwbase_write_unlock(rwb, 0, flags);
 			trace_contention_end(rwb, -EINTR);
@@ -264,12 +291,17 @@ static int __sched rwbase_write_lock(struct rwbase_rt *rwb,
 		if (__rwbase_write_trylock(rwb))
 			break;
 
+		/* Record timeout when reader bias is ignored. */
+		rwb->waiter_timeout = jiffies + RWBASE_RT_WAIT_TIMEOUT;
+
 		raw_spin_unlock_irqrestore(&rtm->wait_lock, flags);
 		rwbase_schedule();
 		raw_spin_lock_irqsave(&rtm->wait_lock, flags);
 
 		set_current_state(state);
 	}
+
+	rwb->waiter_timeout = 0;
 	rwbase_restore_current_state();
 	trace_contention_end(rwb, 0);
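
For anyone following the thread without the tree handy, the effect of the
waiter_timeout gate can be modelled in a few lines of plain user-space C.
This is only an illustrative sketch of the logic in the patch, not part of
it: the names (model_rwb, fake_jiffies, MODEL_WAIT_TIMEOUT) are made up for
the example, and the rt_task() exception and the wait_lock serialisation of
the real rwbase_allow_reader_bias() are deliberately left out.

/*
 * Stand-alone model of the reader-bias gate. In the kernel patch the
 * equivalents are rwb->waiter_timeout, jiffies, RWBASE_RT_WAIT_TIMEOUT
 * and time_before(), all accessed under rtmutex->wait_lock.
 */
#include <stdbool.h>
#include <stdio.h>

#define MODEL_WAIT_TIMEOUT	4	/* stand-in for RWBASE_RT_WAIT_TIMEOUT */

static unsigned long fake_jiffies;	/* stand-in for the jiffies counter */

struct model_rwb {
	unsigned long waiter_timeout;	/* 0 means no writer is waiting */
};

/* Mirrors rwbase_allow_reader_bias() without the rt_task(current) case. */
static bool model_allow_reader_bias(struct model_rwb *rwb)
{
	if (!rwb->waiter_timeout)
		return true;	/* no blocked writer: readers keep the fast path */

	/* emulate time_before(jiffies, rwb->waiter_timeout) */
	return (long)(fake_jiffies - rwb->waiter_timeout) < 0;
}

/* What the writer slow path does before sleeping... */
static void model_writer_blocks(struct model_rwb *rwb)
{
	rwb->waiter_timeout = fake_jiffies + MODEL_WAIT_TIMEOUT;
}

/* ...and what it does once the lock is acquired or the wait is aborted. */
static void model_writer_done(struct model_rwb *rwb)
{
	rwb->waiter_timeout = 0;
}

int main(void)
{
	struct model_rwb rwb = { .waiter_timeout = 0 };

	printf("no writer:       reader bias %d\n", model_allow_reader_bias(&rwb));

	model_writer_blocks(&rwb);
	printf("writer waiting:  reader bias %d\n", model_allow_reader_bias(&rwb));

	fake_jiffies += MODEL_WAIT_TIMEOUT + 1;	/* writer waited past the timeout */
	printf("timeout expired: reader bias %d\n", model_allow_reader_bias(&rwb));

	model_writer_done(&rwb);
	printf("writer done:     reader bias %d\n", model_allow_reader_bias(&rwb));
	return 0;
}

Built as an ordinary user-space program, it shows reader bias allowed while
no writer waits, still allowed while the writer has waited less than the
timeout, refused once the timeout has expired, and allowed again after the
writer clears waiter_timeout.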