Subject: Re: [PATCH -next v2 7/7] md/raid1-10: limit the number of plugged bio
From: Yu Kuai
To: Xiao Ni, Yu Kuai
Cc: song@kernel.org, akpm@osdl.org, neilb@suse.de, linux-raid@vger.kernel.org,
    linux-kernel@vger.kernel.org, yi.zhang@huawei.com, yangerkun@huawei.com,
    "yukuai (C)"
Date: Mon, 29 May 2023 16:50:32 +0800
References: <20230426082031.1299149-1-yukuai1@huaweicloud.com>
    <20230426082031.1299149-8-yukuai1@huaweicloud.com>
    <5e9852fe-0d47-92fc-f6a9-16d028d09ad4@huaweicloud.com>
    <25279079-2600-b0d3-5279-caaf6f664d71@huaweicloud.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi,

On 2023/05/29 15:57, Xiao Ni wrote:
> On Mon, May 29, 2023 at 11:18 AM Yu Kuai wrote:
>>
>> Hi,
>>
>> On 2023/05/29 11:10, Xiao Ni wrote:
>>> On Mon, May 29, 2023 at 10:20 AM Yu Kuai wrote:
>>>>
>>>> Hi,
>>>>
>>>> On 2023/05/29 10:08, Xiao Ni wrote:
>>>>> Hi Kuai,
>>>>>
>>>>> There is
a limitation of the memory in your test. But for most
>>>>> situations, customers should not set this. Can this change introduce a
>>>>> performance regression in other situations?
>>>>
>>>> Note that this limitation is just to trigger writeback as soon as
>>>> possible in the test; it's 100% sure that real situations can trigger
>>>> dirty page writeback asynchronously while continuing to produce new
>>>> dirty pages.
>>>
>>> Hi,
>>>
>>> I'm confused here. If we want to trigger writeback quickly, shouldn't
>>> these two values be set to smaller numbers, rather than 0 and 60?
>>
>> 60 is not required, I'll remove this setting.
>>
>> 0 just means write back whenever there are any dirty pages.
>
> Hi Kuai,
>
> Does 0 mean disabling writeback? I tried to find the doc that
> describes the meaning of setting dirty_background_ratio to 0, but I
> didn't find it.
> https://www.kernel.org/doc/html/next/admin-guide/sysctl/vm.html
> doesn't describe this either, but it does say something like this:
>
> Note:
>   dirty_background_bytes is the counterpart of dirty_background_ratio.
>   Only one of them may be specified at a time. When one sysctl is written
>   it is immediately taken into account to evaluate the dirty memory
>   limits and the other appears as 0 when read.
>
> Maybe you can set dirty_background_ratio to 1 if you want to
> trigger writeback ASAP.

The purpose here is to trigger writeback ASAP. I'm not an expert here,
but based on the test result, 0 obviously doesn't mean writeback is
disabled. Setting dirty_background_bytes to a value sets
dirty_background_ratio to 0 as well, which means dirty_background_ratio
is disabled. However, changing dirty_background_ratio from its default
value to 0 leaves both dirty_background_ratio and
dirty_background_bytes at 0, and based on the following related code, I
think 0 just means write back whenever there are any dirty pages:
domain_dirty_limits:
  bg_bytes = dirty_background_bytes                              -> 0
  bg_ratio = (dirty_background_ratio * PAGE_SIZE) / 100          -> 0

  if (bg_bytes)
          bg_thresh = DIV_ROUND_UP(bg_bytes, PAGE_SIZE);
  else
          bg_thresh = (bg_ratio * available_memory) / PAGE_SIZE; -> 0

  dtc->bg_thresh = bg_thresh;                                    -> 0

balance_dirty_pages:
  nr_reclaimable = global_node_page_state(NR_FILE_DIRTY);
  if (!laptop_mode && nr_reclaimable > gdtc->bg_thresh &&
      !writeback_in_progress(wb))
          wb_start_background_writeback(wb);                     -> writeback ASAP

Thanks,
Kuai

>
>>>>
>>>> If a lot of bio is not plugged, then it's the same as before; if a lot
>>>> of bio is plugged, note that before this patchset these bio would
>>>> spend quite a long time in the plug, and hence I think performance
>>>> should be better.
>>>
>>> Hmm, doesn't it depend on whether the io is sequential or not? If it's
>>> a big io request, can it miss the merge opportunity?
>>
>> The bio will still be merged into the underlying disk's rq (if it's rq
>> based); the underlying disk won't flush its plug until the number of
>> requests exceeds the threshold.
>
> Thanks for this.
>
> Regards
> Xiao
>>
>> Thanks,
>> Kuai
>>>
>>> Regards
>>> Xiao
>>>
>>>>
>>>> Thanks,
>>>> Kuai
>>>>>
>>>>> Best Regards
>>>>> Xiao
>>>>>
>>>>> On Wed, Apr 26, 2023 at 4:24 PM Yu Kuai wrote:
>>>>>>
>>>>>> From: Yu Kuai
>>>>>>
>>>>>> bio can be added to the plug infinitely, and the following writeback
>>>>>> test can trigger a huge amount of plugged bio:
>>>>>>
>>>>>> Test script:
>>>>>> modprobe brd rd_nr=4 rd_size=10485760
>>>>>> mdadm -CR /dev/md0 -l10 -n4 /dev/ram[0123] --assume-clean
>>>>>> echo 0 > /proc/sys/vm/dirty_background_ratio
>>>>>> echo 60 > /proc/sys/vm/dirty_ratio
>>>>>> fio -filename=/dev/md0 -ioengine=libaio -rw=write -bs=4k -numjobs=1 -iodepth=128 -name=test
>>>>>>
>>>>>> Test result:
>>>>>> Monitoring /sys/block/md0/inflight shows that inflight keeps
>>>>>> increasing until fio finishes writing; after running for about
>>>>>> 2 minutes:
>>>>>>
>>>>>> [root@fedora ~]# cat /sys/block/md0/inflight
>>>>>>        0  4474191
>>>>>>
>>>>>> Fix the problem by limiting the number of plugged bio based on the
>>>>>> number of copies for the original bio.
>>>>>>
>>>>>> Signed-off-by: Yu Kuai
>>>>>> ---
>>>>>>  drivers/md/raid1-10.c | 9 ++++++++-
>>>>>>  drivers/md/raid1.c    | 2 +-
>>>>>>  drivers/md/raid10.c   | 2 +-
>>>>>>  3 files changed, 10 insertions(+), 3 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/md/raid1-10.c b/drivers/md/raid1-10.c
>>>>>> index 98d678b7df3f..35fb80aa37aa 100644
>>>>>> --- a/drivers/md/raid1-10.c
>>>>>> +++ b/drivers/md/raid1-10.c
>>>>>> @@ -21,6 +21,7 @@
>>>>>>  #define IO_MADE_GOOD ((struct bio *)2)
>>>>>>
>>>>>>  #define BIO_SPECIAL(bio) ((unsigned long)bio <= 2)
>>>>>> +#define MAX_PLUG_BIO 32
>>>>>>
>>>>>>  /* for managing resync I/O pages */
>>>>>>  struct resync_pages {
>>>>>> @@ -31,6 +32,7 @@ struct resync_pages {
>>>>>>  struct raid1_plug_cb {
>>>>>>         struct blk_plug_cb      cb;
>>>>>>         struct bio_list         pending;
>>>>>> +       unsigned int            count;
>>>>>>  };
>>>>>>
>>>>>>  static void rbio_pool_free(void *rbio, void *data)
>>>>>> @@ -127,7 +129,7 @@ static inline void md_submit_write(struct bio *bio)
>>>>>>  }
>>>>>>
>>>>>>  static inline bool md_add_bio_to_plug(struct mddev *mddev, struct bio *bio,
>>>>>> -                                     blk_plug_cb_fn unplug)
>>>>>> +                                     blk_plug_cb_fn unplug, int copies)
>>>>>>  {
>>>>>>         struct raid1_plug_cb *plug = NULL;
>>>>>>         struct blk_plug_cb *cb;
>>>>>> @@ -147,6 +149,11 @@ static inline bool md_add_bio_to_plug(struct mddev *mddev, struct bio *bio,
>>>>>>
>>>>>>         plug = container_of(cb, struct raid1_plug_cb, cb);
>>>>>>         bio_list_add(&plug->pending, bio);
>>>>>> +       if (++plug->count / MAX_PLUG_BIO >= copies) {
>>>>>> +               list_del(&cb->list);
>>>>>> +               cb->callback(cb, false);
>>>>>> +       }
>>>>>> +
>>>>>>
>>>>>>         return true;
>>>>>>  }
>>>>>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
>>>>>> index 639e09cecf01..c6066408a913 100644
>>>>>> --- a/drivers/md/raid1.c
>>>>>> +++ b/drivers/md/raid1.c
>>>>>> @@ -1562,7 +1562,7 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
>>>>>>                                       r1_bio->sector);
>>>>>>         /* flush_pending_writes() needs access to the rdev so...*/
>>>>>>         mbio->bi_bdev =
(void *)rdev;
>>>>>> -       if (!md_add_bio_to_plug(mddev, mbio, raid1_unplug)) {
>>>>>> +       if (!md_add_bio_to_plug(mddev, mbio, raid1_unplug, disks)) {
>>>>>>                 spin_lock_irqsave(&conf->device_lock, flags);
>>>>>>                 bio_list_add(&conf->pending_bio_list, mbio);
>>>>>>                 spin_unlock_irqrestore(&conf->device_lock, flags);
>>>>>> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
>>>>>> index bd9e655ca408..7135cfaf75db 100644
>>>>>> --- a/drivers/md/raid10.c
>>>>>> +++ b/drivers/md/raid10.c
>>>>>> @@ -1306,7 +1306,7 @@ static void raid10_write_one_disk(struct mddev *mddev, struct r10bio *r10_bio,
>>>>>>
>>>>>>         atomic_inc(&r10_bio->remaining);
>>>>>>
>>>>>> -       if (!md_add_bio_to_plug(mddev, mbio, raid10_unplug)) {
>>>>>> +       if (!md_add_bio_to_plug(mddev, mbio, raid10_unplug, conf->copies)) {
>>>>>>                 spin_lock_irqsave(&conf->device_lock, flags);
>>>>>>                 bio_list_add(&conf->pending_bio_list, mbio);
>>>>>>                 spin_unlock_irqrestore(&conf->device_lock, flags);
>>>>>> --
>>>>>> 2.39.2
>>>>>>
>>>>>
>>>>> .
>>>>>
>>>>
>>>
>>> .
>>>
>>
>
> .
>