Received: by 2002:a05:6358:d09b:b0:dc:cd0c:909e with SMTP id jc27csp5579705rwb; Mon, 14 Nov 2022 06:43:44 -0800 (PST) X-Google-Smtp-Source: AA0mqf6qWB8nFiA6Oo/KW2c3y4qIgLw9fd4NBNS8fnab3zJEVoFZPHrz0gCo4UqlpKOEUtm4DcR6 X-Received: by 2002:a63:f94d:0:b0:46f:fe3e:3ebe with SMTP id q13-20020a63f94d000000b0046ffe3e3ebemr12017509pgk.518.1668437024622; Mon, 14 Nov 2022 06:43:44 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1668437024; cv=none; d=google.com; s=arc-20160816; b=nJm4vIQryv+MvQHtQsUbleAKeSeB9BoYKO/48/aJquwn4TJmD4+IWIMYbgrjnCzKJ3 NTVaOBVe1FhpFGA7rL4w1fslLfJp9em2Xqk+w7zi0MsEfvjYD6aWrD91czcIsQfePWyW udzKvyOLl5OQ0AtYuiPsFDE5AlWDLBWyeM+pCg6Lb5+8KfzQJfMQwuwMhOdFI4xtvUXT tnng8H9oNedOjqUoR3XR2+Ya7tVLDbhZdG59VxlLblrdGhWoct18caZUqwuRQRVDdJqx bt9og1yDGopAZzcX96237JcgnpUKMP+nU3pyFrb5are1FRQyY03EsamDfGmswxxHt67w a9Cg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature :dkim-signature; bh=bvn13rfdfmvV1YQvbwuoN+ga+7VR2YZHLfDOkPJnpKQ=; b=khNFLJmmfrJux+1ardEyccl1ArNncho6PHBO5klqqvsEg7eXg//c2JITeJER6C39Kv pSyFSBqxXlkGu3+QhtKpGRp5KYt68DqJlgTrL1JHvUK7araDutYyHVoiIyn4YMxQfCED Pdxu2tJdNgRIwCVd8IcEQ5GQlJCzTCI6kyYmzct+3rDagf4LNxgssgG0VIzzX77xn3m1 quTA3ra8ACf9pqOXuSZ3CMPR/YNM0gyvtFoyPYtALAelIZWGJyWe8V0i+GVBQpKzohcR C/paP96iOB6mxi6T9hhbVVX8dwABju2RQvPZcKcgjGtqkoqxUC8YctalRvCOAVVrQ7nS a3/A== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@suse.cz header.s=susede2_rsa header.b=Dg9oJu3a; dkim=neutral (no key) header.i=@suse.cz; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id d23-20020a63d657000000b0047048d81089si10313078pgj.186.2022.11.14.06.43.32; Mon, 14 Nov 2022 06:43:44 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@suse.cz header.s=susede2_rsa header.b=Dg9oJu3a; dkim=neutral (no key) header.i=@suse.cz; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236756AbiKNOeJ (ORCPT + 87 others); Mon, 14 Nov 2022 09:34:09 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42900 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235948AbiKNOeH (ORCPT ); Mon, 14 Nov 2022 09:34:07 -0500 Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.220.29]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 51F3028D; Mon, 14 Nov 2022 06:34:06 -0800 (PST) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id 09BD92007A; Mon, 14 Nov 2022 14:34:05 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_rsa; t=1668436445; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=bvn13rfdfmvV1YQvbwuoN+ga+7VR2YZHLfDOkPJnpKQ=; b=Dg9oJu3a7HlQzF2fqyGJ0iU2JonKmI5Ux35yTs50cmGX8PQi3huYmg1RLQwTSi4+lCeY6h VCPhbwi8HFf2Qg1RGvhfePPU3OnOySTI/sc4OpEeg3cPMXBq0KPsL5tG6+rvrscPVaudby tocqHcpgVjB0cXh10ywW6ydZRThYcLY= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_ed25519; t=1668436445; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=bvn13rfdfmvV1YQvbwuoN+ga+7VR2YZHLfDOkPJnpKQ=; b=d463NL2mlAxK5hgLZJgILG0wG9v64ivh4KNOHjQ/dmeZ4zfSySJvzi/0QKBX6qAMuPVUhj FIXAG/td0NOeOACA== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id EFCA913A92; Mon, 14 Nov 2022 14:34:04 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id 3/+LOtxRcmOPcAAAMHmgww (envelope-from ); Mon, 14 Nov 2022 14:34:04 +0000 Received: by quack3.suse.cz (Postfix, from userid 1000) id 8119AA0709; Mon, 14 Nov 2022 15:34:04 +0100 (CET) Date: Mon, 14 Nov 2022 15:34:04 +0100 From: Jan Kara To: Gabriel Krisman Bertazi Cc: Jan Kara , axboe@kernel.dk, linux-kernel@vger.kernel.org, linux-block@vger.kernel.org, Hugh Dickins , Keith Busch , Liu Song Subject: Re: [PATCH] sbitmap: Advance the queue index before waking up the queue Message-ID: <20221114143404.c47lvnbfihz5tdj5@quack3> References: <20221105231055.25953-1-krisman@suse.de> <20221114132313.5cqhvzxarm7rwvmt@quack3> <87wn7xk46u.fsf_-_@gbertazi.udp.ovpn1.prg.suse.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <87wn7xk46u.fsf_-_@gbertazi.udp.ovpn1.prg.suse.de> X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED,SPF_HELO_NONE, SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon 14-11-22 09:20:57, Gabriel Krisman Bertazi wrote: > Jan Kara writes: > > > Gabriel, when looking through this patch, I've noticed we can loose wakeups > > after your latest simplifications. See below for details: > > > > On Sat 05-11-22 19:10:55, Gabriel Krisman Bertazi wrote: > >> @@ -587,7 +571,7 @@ static struct sbq_wait_state *sbq_wake_ptr(struct sbitmap_queue *sbq) > >> for (i = 0; i < SBQ_WAIT_QUEUES; i++) { > >> struct sbq_wait_state *ws = &sbq->ws[wake_index]; > >> > >> - if (waitqueue_active(&ws->wait) && atomic_read(&ws->wait_cnt)) { > >> + if (waitqueue_active(&ws->wait)) { > >> if (wake_index != atomic_read(&sbq->wake_index)) > >> atomic_set(&sbq->wake_index, wake_index); > >> return ws; > > > > Neither sbq_wake_ptr() nor sbitmap_queue_wake_up() now increment the > > wake_index after performing the wakeup. Thus we would effectively keep > > waking tasks from a single waitqueue until it becomes empty and only then > > go for the next waitqueue. This creates unnecessary unfairness in task > > wakeups and even possible starvation issues. So I think we need to advance > > wake_index somewhere. Perhaps here before returning waitqueue. > > right. This is indeed a problem. what do you think of the patch below? > > > Now this may be also problematic - when we were checking the number of woken > > waiters in the older version of the patch (for others: internal version of > > the patch) this was fine but now it may happen that the 'ws' we have > > selected has no waiters anymore. And in that case we need to find another > > waitqueue because otherwise we'd be loosing too many wakeups and we could > > deadlock. So I think this rather needs to be something like: > > > > do { > > if (atomic_read(&sbq->completion_cnt) - wakeups < wake_batch) > > return; > > } while (!atomic_try_cmpxchg(&sbq->wakeup_cnt, > > &wakeups, wakeups + wake_batch)); > > > > do { > > ws = sbq_wake_ptr(sbq); > > if (!ws) > > return; > > } while (!wake_up_nr(&ws->wait, wake_batch)); > > > > with our original version of wake_up_nr() returning number of woken > > waiters. What do you think? > > I agree, and it wouldn't happen with the wake_up_nr patch we had before. > I will revive it quickly and follow up. But, in this case, I want to be > cautious with benchmarking, so I will follow up still today, but as soon > as the new round of tests complete. Sure. > -- >8 -- > Subject: [PATCH] sbitmap: Advance the queue index before waking up the queue > > When a queue is awaken, the wake_index written by sbq_wake_ptr currently > keeps pointing to the same queue. On the next wake up, it will thus > retry the same queue, which is unfair to other queues, and can lead to > starvation. This patch, moves the index update to happen before the > queue is returned, such that it will now try a different queue first on > the next wake up, improving fairness. > > Reported-by: Jan Kara > Signed-off-by: Gabriel Krisman Bertazi Yes, nice. Feel free to add: Reviewed-by: Jan Kara Honza > --- > lib/sbitmap.c | 10 ++++++++-- > 1 file changed, 8 insertions(+), 2 deletions(-) > > diff --git a/lib/sbitmap.c b/lib/sbitmap.c > index eca462cba398..bea7984f7987 100644 > --- a/lib/sbitmap.c > +++ b/lib/sbitmap.c > @@ -571,13 +571,19 @@ static struct sbq_wait_state *sbq_wake_ptr(struct sbitmap_queue *sbq) > for (i = 0; i < SBQ_WAIT_QUEUES; i++) { > struct sbq_wait_state *ws = &sbq->ws[wake_index]; > > + /* > + * Advance the index before checking the current queue. > + * It improves fairness, by ensuring the queue doesn't > + * need to be fully emptied before trying to wake up > + * from the next one. > + */ > + wake_index = sbq_index_inc(wake_index); > + > if (waitqueue_active(&ws->wait)) { > if (wake_index != atomic_read(&sbq->wake_index)) > atomic_set(&sbq->wake_index, wake_index); > return ws; > } > - > - wake_index = sbq_index_inc(wake_index); > } > > return NULL; > -- > 2.35.3 > > > > > -- Jan Kara SUSE Labs, CR