Subject: Re: [PATCH] sbitmap: fix possible io hung due to lost wakeup
To: Yu Kuai, jack@suse.cz, axboe@kernel.dk, osandov@fb.com
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, yi.zhang@huawei.com, "yukuai (C)"
References: <20220803121504.212071-1-yukuai1@huaweicloud.com>
From: Yu Kuai
Date: Sat, 13 Aug 2022 13:58:59 +0800
In-Reply-To: <20220803121504.212071-1-yukuai1@huaweicloud.com>
On 2022/08/03 20:15, Yu Kuai wrote:
> From: Yu Kuai
>
> There are two problems that can lead to a lost wakeup:
>
> 1) invalid wakeup on the wrong waitqueue:
>
> For example, 2 * wake_batch tags are put, while only wake_batch threads
> are woken:
>
> __sbq_wake_up
>  atomic_cmpxchg -> reset wait_cnt
> 			__sbq_wake_up -> decrease wait_cnt
> 			...
> 			__sbq_wake_up -> wait_cnt is decreased to 0 again
> 			 atomic_cmpxchg
> 			 sbq_index_atomic_inc -> increase wake_index
> 			 wake_up_nr -> wake up and waitqueue might be empty
>  sbq_index_atomic_inc -> increase again, one waitqueue is skipped
>  wake_up_nr -> invalid wake up because old waitqueue might be empty
>
> To fix the problem, increase 'wake_index' before resetting 'wait_cnt'.
>
> 2) 'wait_cnt' can be decreased while the waitqueue is empty
>
> As pointed out by Jan Kara, the following race is possible:
>
> CPU1				CPU2
> __sbq_wake_up			__sbq_wake_up
>  sbq_wake_ptr()		 sbq_wake_ptr() -> the same
>  wait_cnt = atomic_dec_return()
>  /* decreased to 0 */
>  sbq_index_atomic_inc()
>  /* move to next waitqueue */
>  atomic_set()
>  /* reset wait_cnt */
>  wake_up_nr()
>  /* wake up on the old waitqueue */
> 				wait_cnt = atomic_dec_return()
> 				/*
> 				 * decrease wait_cnt in the old
> 				 * waitqueue, while it can be
> 				 * empty.
> 				 */
>
> Fix the problem by waking up before updating 'wake_index' and
> 'wait_cnt'.
>
> With this patch, note that 'wait_cnt' is still decreased in the old
> empty waitqueue; however, the wakeup is redirected to an active
> waitqueue, and the extra decrement on the old empty waitqueue is not
> handled.

friendly ping ...

> Fixes: 88459642cba4 ("blk-mq: abstract tag allocation out into sbitmap library")
> Signed-off-by: Yu Kuai
> Reviewed-by: Jan Kara
> ---
> Changes in official version:
> - fix spelling mistake in comments
> - add review tag
> Changes in rfc v4:
> - remove patch 1, which improved fairness at the cost of extra overhead
> - merge patch 2 and patch 3
> Changes in rfc v3:
> - rename patch 2, and add some comments.
> - add patch 3, which fixes a new issue pointed out by Jan Kara.
> Changes in rfc v2:
> - split into separate patches for the different problems.
> - add fix tag
>
> previous versions:
> rfc v1: https://lore.kernel.org/all/20220617141125.3024491-1-yukuai3@huawei.com/
> rfc v2: https://lore.kernel.org/all/20220619080309.1630027-1-yukuai3@huawei.com/
> rfc v3: https://lore.kernel.org/all/20220710042200.20936-1-yukuai1@huaweicloud.com/
> rfc v4: https://lore.kernel.org/all/20220723024122.2990436-1-yukuai1@huaweicloud.com/
>
>  lib/sbitmap.c | 55 ++++++++++++++++++++++++++++++---------------------
>  1 file changed, 33 insertions(+), 22 deletions(-)
>
> diff --git a/lib/sbitmap.c b/lib/sbitmap.c
> index 29eb0484215a..1aa55806f6a5 100644
> --- a/lib/sbitmap.c
> +++ b/lib/sbitmap.c
> @@ -611,32 +611,43 @@ static bool __sbq_wake_up(struct sbitmap_queue *sbq)
>  		return false;
>  
>  	wait_cnt = atomic_dec_return(&ws->wait_cnt);
> -	if (wait_cnt <= 0) {
> -		int ret;
> +	/*
> +	 * For concurrent callers of this, callers should call this function
> +	 * again to wakeup a new batch on a different 'ws'.
> +	 */
> +	if (wait_cnt < 0 || !waitqueue_active(&ws->wait))
> +		return true;
>  
> -		wake_batch = READ_ONCE(sbq->wake_batch);
> +	if (wait_cnt > 0)
> +		return false;
>  
> -		/*
> -		 * Pairs with the memory barrier in sbitmap_queue_resize() to
> -		 * ensure that we see the batch size update before the wait
> -		 * count is reset.
> -		 */
> -		smp_mb__before_atomic();
> +	wake_batch = READ_ONCE(sbq->wake_batch);
>  
> -		/*
> -		 * For concurrent callers of this, the one that failed the
> -		 * atomic_cmpxhcg() race should call this function again
> -		 * to wakeup a new batch on a different 'ws'.
> -		 */
> -		ret = atomic_cmpxchg(&ws->wait_cnt, wait_cnt, wake_batch);
> -		if (ret == wait_cnt) {
> -			sbq_index_atomic_inc(&sbq->wake_index);
> -			wake_up_nr(&ws->wait, wake_batch);
> -			return false;
> -		}
> +	/*
> +	 * Wake up first in case that concurrent callers decrease wait_cnt
> +	 * while waitqueue is empty.
> +	 */
> +	wake_up_nr(&ws->wait, wake_batch);
>  
> -		return true;
> -	}
> +	/*
> +	 * Pairs with the memory barrier in sbitmap_queue_resize() to
> +	 * ensure that we see the batch size update before the wait
> +	 * count is reset.
> +	 *
> +	 * Also pairs with the implicit barrier between decrementing wait_cnt
> +	 * and checking for waitqueue_active() to make sure waitqueue_active()
> +	 * sees result of the wakeup if atomic_dec_return() has seen the result
> +	 * of atomic_set().
> +	 */
> +	smp_mb__before_atomic();
> +
> +	/*
> +	 * Increase wake_index before updating wait_cnt, otherwise concurrent
> +	 * callers can see valid wait_cnt in old waitqueue, which can cause
> +	 * invalid wakeup on the old waitqueue.
> +	 */
> +	sbq_index_atomic_inc(&sbq->wake_index);
> +	atomic_set(&ws->wait_cnt, wake_batch);
>
>  	return false;
>  }
>