Subject: Re: [PATCH -next RFC v2 3/8] sbitmap: make sure waitqueues are balanced
To: "Li, Ming"
References: <20220408073916.1428590-1-yukuai3@huawei.com>
 <20220408073916.1428590-4-yukuai3@huawei.com>
From: "yukuai (C)"
Message-ID: <208a49d2-5f29-c48e-206c-260ee3f1d991@huawei.com>
Date: Fri, 15 Apr 2022 15:07:38 +0800
X-Mailing-List: linux-kernel@vger.kernel.org

On 2022/04/15 14:31, Li, Ming wrote:
>
>
> On 4/8/2022 3:39 PM, Yu Kuai wrote:
>> Currently, same waitqueue might be woken up continuously:
>>
>> __sbq_wake_up                 __sbq_wake_up
>>  sbq_wake_ptr -> assume 0
>>                                sbq_wake_ptr -> 0
>>  atomic_dec_return
>>                               atomic_dec_return
>>  atomic_cmpxchg -> succeed
>>                                atomic_cmpxchg -> failed
>>                                 return true
>>
>>                               __sbq_wake_up
>>                                sbq_wake_ptr
>>                                 atomic_read(&sbq->wake_index) -> still 0
>>  sbq_index_atomic_inc -> inc to 1
>>                                 if (waitqueue_active(&ws->wait))
>>                                  if (wake_index != atomic_read(&sbq->wake_index))
>>                                   atomic_set -> reset from 1 to 0
>>  wake_up_nr -> wake up first waitqueue
>>                                   // continue to wake up in first waitqueue
>>
>> What's worse, io hung is possible in theory because wake up might be
>> missed. For example, 2 * wake_batch tags are put, while only wake_batch
>> threads are worken:
>>
>> __sbq_wake_up
>>  atomic_cmpxchg -> reset wait_cnt
>>                               __sbq_wake_up -> decrease wait_cnt
>>                               ...
>>                               __sbq_wake_up -> wait_cnt is decreased to 0 again
>>                                atomic_cmpxchg
>>                                sbq_index_atomic_inc -> increase wake_index
>>                                wake_up_nr -> wake up and waitqueue might be empty
>>  sbq_index_atomic_inc -> increase again, one waitqueue is skipped
>>  wake_up_nr -> invalid wake up because old wakequeue might be empty
>>
>> To fix the problem, refactor to make sure waitqueues will be woken up
>> one by one, and also choose the next waitqueue by the number of threads
>> that are waiting to keep waitqueues balanced.
> Hi, do you think that updating wake_index before atomic_cmpxchg(ws->wait_cnt) also can solve these two problems?
> like this:

Hi,

The first problem occurs because sbq_wake_ptr() uses atomic_set() to
update 'wake_index'. The second problem occurs because __sbq_wake_up()
updates 'wait_cnt' before 'wake_index'.

> __sbq_wake_up()
> {
>     ....
>     if (wait_cnt <= 0) {
>         ret = atomic_cmpxchg(sbq->wake_index, old_wake_index, next_wake_index);

How is 'next_wake_index' chosen? The same question applies to
sbq_wake_ptr().

>         if (ret == old_wake_index) {
>             ret = atomic_cmpxchg(ws->wait_cnt, wait_cnt, wake_batch);

If this cmpxchg fails, does the function just return true with
'wake_index' already updated? The caller will then call it again, so it
seems this can't prevent 'wake_index' from being updated multiple times,
and 'wait_cnt' in the old 'ws' is not updated.

>             if (ret == wait_cnt)
>                 wake_up_nr(ws->wait, wake_batch);
>         }
>     }
> }
>
> Your solution is picking the waitqueue with the largest waiters_cnt as the next one to be waked up, I think that waitqueue is possible to starve.
> if lots of threads in a same waitqueue stop waiting before sbq wakes them up, it will cause the waiters_cnt of waitqueue is much less than others, looks like sbq_update_wake_index() would never pick this waitqueue. What do you think? is it possible?

That is possible if threads are not added to the waitqueues in a
balanced way, and I suppose it is no longer possible once tag preemption
is disabled.
However, instead of choosing the waitqueue with the largest waiters_cnt,
choosing the next waitqueue with 'waiters_cnt > 0' might be an
alternative.

Thanks,
Kuai
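
For illustration, the alternative mentioned above (taking the next
waitqueue with 'waiters_cnt > 0' rather than the one with the largest
waiters_cnt) could look roughly like the userspace sketch below. It is
not code from this patch series: 'struct wait_state', 'struct
wait_queues' and update_wake_index() are hypothetical, simplified
stand-ins for the kernel's sbq_wait_state, sbitmap_queue and
sbq_update_wake_index(), and C11 atomics are assumed in place of the
kernel's atomic_t helpers.

```c
#include <stdatomic.h>

#define SBQ_WAIT_QUEUES 8	/* same value as the kernel constant */

/* Hypothetical, simplified stand-in for struct sbq_wait_state. */
struct wait_state {
	atomic_int waiters_cnt;	/* threads currently waiting on this queue */
};

/* Hypothetical, simplified stand-in for struct sbitmap_queue. */
struct wait_queues {
	atomic_int wake_index;	/* queue that will be woken next */
	struct wait_state ws[SBQ_WAIT_QUEUES];
};

/*
 * Move wake_index away from 'old' to the next queue, in round-robin
 * order, that has at least one waiter.
 *
 * Returns the index that is current after the call:
 *  - the newly chosen index if this caller won the update,
 *  - the index installed by another caller if the update raced,
 *  - 'old' if no queue has any waiter.
 */
int update_wake_index(struct wait_queues *q, int old)
{
	int i;

	for (i = 1; i <= SBQ_WAIT_QUEUES; i++) {
		int next = (old + i) % SBQ_WAIT_QUEUES;
		int expected = old;

		if (atomic_load(&q->ws[next].waiters_cnt) <= 0)
			continue;

		/*
		 * Compare-and-exchange instead of an unconditional store:
		 * if someone else already moved the index, leave it alone.
		 */
		if (atomic_compare_exchange_strong(&q->wake_index,
						   &expected, next))
			return next;
		return expected;
	}

	return old;
}
```

Scanning in ring order from the current index means a waitqueue with few
waiters is not passed over indefinitely as long as it keeps at least one
waiter, and the compare-and-exchange means two callers racing with the
same 'old' value advance the index only once.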