From: Jens Axboe
Date: Wed, 19 Oct 2022 10:37:33 -0700
Subject: Re: [PATCH 6.0 479/862] sbitmap: fix possible io hung due to lost wakeup
To: Greg Kroah-Hartman, Hugh Dickins
Cc: linux-kernel@vger.kernel.org, stable@vger.kernel.org, Yu Kuai, Jan Kara, Sasha Levin, linux-block@vger.kernel.org
References: <20221019083249.951566199@linuxfoundation.org> <20221019083311.114449669@linuxfoundation.org> <174a196-5473-4e93-a52a-5e26eb37949@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 10/19/22 10:25 AM, Greg Kroah-Hartman wrote:
> On Wed, Oct 19, 2022 at 08:06:26AM -0700, Hugh Dickins wrote:
>> On Wed, 19 Oct 2022, Greg Kroah-Hartman wrote:
>>
>>> From: Yu Kuai
>>>
>>> [ Upstream commit 040b83fcecfb86f3225d3a5de7fd9b3fbccf83b4 ]
>>>
>>> There are two problems that can lead to lost wakeups:
>>>
>>> 1) Invalid wakeup on the wrong waitqueue:
>>>
>>> For example, 2 * wake_batch tags are put, while only wake_batch threads
>>> are woken:
>>>
>>> __sbq_wake_up
>>>  atomic_cmpxchg -> reset wait_cnt
>>>                         __sbq_wake_up -> decrease wait_cnt
>>>                         ...
>>>                         __sbq_wake_up -> wait_cnt is decreased to 0 again
>>>                          atomic_cmpxchg
>>>                          sbq_index_atomic_inc -> increase wake_index
>>>                          wake_up_nr -> wake up and waitqueue might be empty
>>>  sbq_index_atomic_inc -> increase again, one waitqueue is skipped
>>>  wake_up_nr -> invalid wake up because old waitqueue might be empty
>>>
>>> To fix the problem, increase 'wake_index' before resetting 'wait_cnt'.
>>>
>>> 2) 'wait_cnt' can be decreased while the waitqueue is empty
>>>
>>> As pointed out by Jan Kara, the following race is possible:
>>>
>>> CPU1                               CPU2
>>> __sbq_wake_up                      __sbq_wake_up
>>>  sbq_wake_ptr()                     sbq_wake_ptr() -> the same
>>>  wait_cnt = atomic_dec_return()
>>>  /* decreased to 0 */
>>>  sbq_index_atomic_inc()
>>>  /* move to next waitqueue */
>>>  atomic_set()
>>>  /* reset wait_cnt */
>>>  wake_up_nr()
>>>  /* wake up on the old waitqueue */
>>>                                     wait_cnt = atomic_dec_return()
>>>                                     /*
>>>                                      * decrease wait_cnt in the old
>>>                                      * waitqueue, while it can be
>>>                                      * empty.
>>>                                      */
>>>
>>> Fix the problem by waking up before updating 'wake_index' and
>>> 'wait_cnt'.
>>>
>>> With this patch, note that 'wait_cnt' is still decreased in the old
>>> empty waitqueue; however, the wakeup is redirected to an active
>>> waitqueue, and the extra decrement on the old empty waitqueue is not
>>> handled.
>>>
>>> Fixes: 88459642cba4 ("blk-mq: abstract tag allocation out into sbitmap library")
>>> Signed-off-by: Yu Kuai
>>> Reviewed-by: Jan Kara
>>> Link: https://lore.kernel.org/r/20220803121504.212071-1-yukuai1@huaweicloud.com
>>> Signed-off-by: Jens Axboe
>>> Signed-off-by: Sasha Levin
>>
>> I have no authority on linux-block, but I'll say NAK to this one
>> (and 517/862), and let Jens and Jan overrule me if they disagree.
>>
>> This was the first of several 6.1-rc1 commits which had given me lost
>> wakeups never suffered before; was not tagged Cc stable; and (unless I've
>> missed it on lore) never had AUTOSEL posted to linux-block or linux-kernel.
>
> Ok, thanks for the review.
> I'll drop both of the sbitmap.c changes, and
> if people report issues and want them back, I'll be glad to revisit them
> then.

Sorry for being late. I did see Hugh respond to the original auto-select
as well, and was surprised to see it moving forward after that. Let's
please drop them for now.

--
Jens Axboe
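The batched-wakeup bookkeeping that the quoted commit message reasons about (a per-waitqueue wait_cnt counting freed tags down from wake_batch, and a wake_index rotating wakeups across a small ring of waitqueues) can be sketched as a single-threaded userspace model in C11. This is an illustration under stated assumptions, not the kernel's sbitmap.c: every model_* name is hypothetical, the ring size of 8 is chosen to match SBQ_WAIT_QUEUES, plain atomic stores stand in for the cmpxchg-based sbq_index_atomic_inc(), and the wake/advance/re-arm order simply follows the patch description. Because it runs on one thread it cannot reproduce either race, and it does not show whether the reordering is safe; the thread above drops the patch.

```c
/* Hypothetical single-threaded model of sbitmap-style batched wakeups.
 * Not kernel code: all model_* names are illustrative only. */
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define MODEL_WAIT_QUEUES 8 /* chosen to match SBQ_WAIT_QUEUES */

struct model_wq {
	atomic_int wait_cnt; /* frees left before this queue's next batch wakeup */
	int waiters;         /* stand-in for tasks sleeping on a wait_queue_head */
};

struct model_sbq {
	atomic_int wake_index; /* queue the next wakeup is aimed at */
	int wake_batch;        /* frees needed to trigger one batched wakeup */
	struct model_wq ws[MODEL_WAIT_QUEUES];
};

static void model_init(struct model_sbq *q, int wake_batch)
{
	assert(wake_batch > 0);
	atomic_init(&q->wake_index, 0);
	q->wake_batch = wake_batch;
	for (int i = 0; i < MODEL_WAIT_QUEUES; i++) {
		atomic_init(&q->ws[i].wait_cnt, wake_batch);
		q->ws[i].waiters = 0;
	}
}

/* Loosely follows sbq_wake_ptr(): return the first queue with sleepers,
 * scanning from wake_index, and move wake_index there if it differs. */
static struct model_wq *model_wake_ptr(struct model_sbq *q)
{
	int start = atomic_load(&q->wake_index);

	for (int i = 0; i < MODEL_WAIT_QUEUES; i++) {
		int idx = (start + i) % MODEL_WAIT_QUEUES;

		if (q->ws[idx].waiters > 0) {
			if (idx != start)
				atomic_store(&q->wake_index, idx);
			return &q->ws[idx];
		}
	}
	return NULL; /* nobody is waiting */
}

/* Models freeing one tag; returns how many sleepers were woken. */
static int model_put_tag(struct model_sbq *q)
{
	struct model_wq *ws = model_wake_ptr(q);
	int cnt, woken, idx;

	if (!ws)
		return 0;

	/* Same result as the kernel's atomic_dec_return(). */
	cnt = atomic_fetch_sub(&ws->wait_cnt, 1) - 1;
	if (cnt > 0)
		return 0;

	/* Order taken from the patch description: wake first... */
	woken = ws->waiters < q->wake_batch ? ws->waiters : q->wake_batch;
	ws->waiters -= woken;

	/* ...then advance wake_index to the next queue in the ring... */
	idx = (int)(ws - q->ws);
	atomic_store(&q->wake_index, (idx + 1) % MODEL_WAIT_QUEUES);

	/* ...then re-arm wait_cnt for the next batch. */
	atomic_store(&ws->wait_cnt, q->wake_batch);
	return woken;
}
```

For example, after model_init(&q, 4) with six sleepers on ws[0], the first three model_put_tag() calls return 0, the fourth returns 4 (one batch), wake_index moves to 1, and ws[0].wait_cnt is re-armed to 4.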