Subject: Re: [PATCH -next v3 6/7] md/raid1-10: don't handle pluged bio by daemon thread
To: Xiao Ni, Yu Kuai
Cc: song@kernel.org, neilb@suse.de, akpm@osdl.org, linux-raid@vger.kernel.org,
        linux-kernel@vger.kernel.org, yi.zhang@huawei.com, yangerkun@huawei.com,
        "yukuai (C)"
References: <20230529131106.2123367-1-yukuai1@huaweicloud.com>
 <20230529131106.2123367-7-yukuai1@huaweicloud.com>
 <830352e1-ecfa-f5dc-ce7c-349553bd3003@huaweicloud.com>
From: Yu Kuai
Date: Wed, 31 May 2023 16:06:35 +0800

Hi,

On 2023/05/31 16:00, Xiao Ni wrote:
> On Wed, May 31, 2023 at 3:55 PM Yu Kuai wrote:
>>
>> Hi,
>>
>> On 2023/05/31 15:50, Xiao Ni wrote:
>>> On Mon, May 29, 2023 at 9:14 PM Yu Kuai wrote:
>>>>
>>>> From: Yu Kuai
>>>>
>>>> current->bio_list will be set under the submit_bio() context; in this
>>>> case bitmap io will be added to the list and will wait for the current
>>>> io submission to finish, while the current io submission must wait for
>>>> the bitmap io to be done. Commit 874807a83139 ("md/raid1{,0}: fix
>>>> deadlock in bitmap_unplug.") fixed the deadlock by handling plugged
>>>> bio in the daemon thread.
>>>
>>> Thanks for the historic introduction. I did a test and printed the
>>> logs in raid10_unplug. The tools I used are dd and mkfs. from_schedule
>>> is always true during I/O and it's 0 when io finishes. So I have a
>>> question here: how can I trigger the condition that from_schedule is 0
>>> and current->bio_list is not NULL? In other words, is there really a
>>> deadlock here? Before your patch it looks like all bios are merged
>>> into conf->pending_bio_list and are handled by raid10d. It can't
>>> submit bios directly in the originating process, which is mentioned in
>>> 57c67df48866.
>>>
>> As I mentioned below, after commit a214b949d8e3 this deadlock doesn't
>> exist anymore, and without this patch, patch 7 will introduce this
>> scenario again.
>>
>> Thanks,
>> Kuai
>>>>
>>>> On the one hand, the deadlock won't exist after commit a214b949d8e3
>>>> ("blk-mq: only flush requests from the plug in blk_mq_submit_bio"). On
>>>> the other hand, the current solution makes it impossible to flush
>>>> plugged bio in raid1/10_make_request(), because this would cause all
>>>> writes to go to the daemon thread.
>>>>
>>>> In order to limit the number of plugged bio, commit 874807a83139
>>>> ("md/raid1{,0}: fix deadlock in bitmap_unplug.") is reverted, and the
>>>> deadlock is fixed by handling bitmap io asynchronously.
>>>>
>>>> Signed-off-by: Yu Kuai
>>>> ---
>>>>   drivers/md/raid1-10.c | 14 ++++++++++++++
>>>>   drivers/md/raid1.c    |  4 ++--
>>>>   drivers/md/raid10.c   |  8 +++-----
>>>>   3 files changed, 19 insertions(+), 7 deletions(-)
>>>>
>>>> diff --git a/drivers/md/raid1-10.c b/drivers/md/raid1-10.c
>>>> index 73cc3cb9154d..17e55c1fd5a1 100644
>>>> --- a/drivers/md/raid1-10.c
>>>> +++ b/drivers/md/raid1-10.c
>>>> @@ -151,3 +151,17 @@ static inline bool raid1_add_bio_to_plug(struct mddev *mddev, struct bio *bio,
>>>>
>>>>          return true;
>>>>  }
>>>> +
>>>> +/*
>>>> + * current->bio_list will be set under submit_bio() context, in this case bitmap
>>>> + * io will be added to the list and wait for current io submission to finish,
>>>> + * while current io submission must wait for bitmap io to be done. In order to
>>>> + * avoid such deadlock, submit bitmap io asynchronously.
>>>> + */
>>>> +static inline void raid1_prepare_flush_writes(struct bitmap *bitmap)
>>>> +{
>>>> +       if (current->bio_list)
>>>> +               md_bitmap_unplug_async(bitmap);
>>>> +       else
>>>> +               md_bitmap_unplug(bitmap);
>>>> +}
>>>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
>>>> index 0778e398584c..006620fed595 100644
>>>> --- a/drivers/md/raid1.c
>>>> +++ b/drivers/md/raid1.c
>>>> @@ -794,7 +794,7 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
>>>>  static void flush_bio_list(struct r1conf *conf, struct bio *bio)
>>>>  {
>>>>         /* flush any pending bitmap writes to disk before proceeding w/ I/O */
>>>> -       md_bitmap_unplug(conf->mddev->bitmap);
>>>> +       raid1_prepare_flush_writes(conf->mddev->bitmap);
>>>
>>> If we unplug the bitmap asynchronously, can we make sure the bitmap is
>>> flushed before the corresponding data?
>
> Could you explain this question?

Sorry that I missed this... See the new helper in patch 5:
md_bitmap_unplug_async() will still wait for the bitmap io to finish.

md_bitmap_unplug_async
  DECLARE_COMPLETION_ONSTACK(done)
  ...
  wait_for_completion(&done)
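In case it helps, here is a rough sketch of how such an "asynchronous but
still waiting" unplug can be structured. The struct/function names and the
workqueue below are illustrative placeholders, not necessarily what patch 5
actually uses; only md_bitmap_unplug_async() itself comes from the series:

struct unplug_work {
        struct work_struct work;
        struct bitmap *bitmap;
        struct completion *done;
};

static void unplug_work_fn(struct work_struct *work)
{
        struct unplug_work *unplug =
                container_of(work, struct unplug_work, work);

        /* Runs in workqueue context, so current->bio_list is NULL here. */
        md_bitmap_unplug(unplug->bitmap);
        complete(unplug->done);
}

void md_bitmap_unplug_async(struct bitmap *bitmap)
{
        DECLARE_COMPLETION_ONSTACK(done);
        struct unplug_work unplug = {
                .bitmap = bitmap,
                .done = &done,
        };

        INIT_WORK_ONSTACK(&unplug.work, unplug_work_fn);
        /* Placeholder queue; the real code would use a dedicated md workqueue. */
        queue_work(system_unbound_wq, &unplug.work);

        /* The caller still blocks until the bitmap io is done. */
        wait_for_completion(&done);
        destroy_work_on_stack(&unplug.work);
}

So only the submission is moved out of the submit_bio() context; the caller
still waits for the bitmap writes, which keeps the bitmap-before-data
ordering.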
Thanks,
Kuai

>
> Regards
> Xiao
>
>
>>>
>>> Regards
>>> Xiao
>>>
>>>>         wake_up(&conf->wait_barrier);
>>>>
>>>>         while (bio) { /* submit pending writes */
>>>> @@ -1166,7 +1166,7 @@ static void raid1_unplug(struct blk_plug_cb *cb, bool from_schedule)
>>>>         struct r1conf *conf = mddev->private;
>>>>         struct bio *bio;
>>>>
>>>> -       if (from_schedule || current->bio_list) {
>>>> +       if (from_schedule) {
>>>>                 spin_lock_irq(&conf->device_lock);
>>>>                 bio_list_merge(&conf->pending_bio_list, &plug->pending);
>>>>                 spin_unlock_irq(&conf->device_lock);
>>>> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
>>>> index 6640507ecb0d..fb22cfe94d32 100644
>>>> --- a/drivers/md/raid10.c
>>>> +++ b/drivers/md/raid10.c
>>>> @@ -902,9 +902,7 @@ static void flush_pending_writes(struct r10conf *conf)
>>>>                 __set_current_state(TASK_RUNNING);
>>>>
>>>>                 blk_start_plug(&plug);
>>>> -               /* flush any pending bitmap writes to disk
>>>> -                * before proceeding w/ I/O */
>>>> -               md_bitmap_unplug(conf->mddev->bitmap);
>>>> +               raid1_prepare_flush_writes(conf->mddev->bitmap);
>>>>                 wake_up(&conf->wait_barrier);
>>>>
>>>>                 while (bio) { /* submit pending writes */
>>>> @@ -1108,7 +1106,7 @@ static void raid10_unplug(struct blk_plug_cb *cb, bool from_schedule)
>>>>         struct r10conf *conf = mddev->private;
>>>>         struct bio *bio;
>>>>
>>>> -       if (from_schedule || current->bio_list) {
>>>> +       if (from_schedule) {
>>>>                 spin_lock_irq(&conf->device_lock);
>>>>                 bio_list_merge(&conf->pending_bio_list, &plug->pending);
>>>>                 spin_unlock_irq(&conf->device_lock);
>>>> @@ -1120,7 +1118,7 @@ static void raid10_unplug(struct blk_plug_cb *cb, bool from_schedule)
>>>>
>>>>         /* we aren't scheduling, so we can do the write-out directly. */
>>>>         bio = bio_list_get(&plug->pending);
>>>> -       md_bitmap_unplug(mddev->bitmap);
>>>> +       raid1_prepare_flush_writes(mddev->bitmap);
>>>>         wake_up(&conf->wait_barrier);
>>>>
>>>>         while (bio) { /* submit pending writes */
>>>> --
>>>> 2.39.2
>>>>