Subject: Re: [PATCH md-6.9 v4 03/11] md/raid1: record nonrot rdevs while adding/removing rdevs to conf
To: Paul Menzel, Yu Kuai
Cc: xni@redhat.com, paul.e.luse@linux.intel.com, song@kernel.org, neilb@suse.com, shli@fb.com, linux-raid@vger.kernel.org, linux-kernel@vger.kernel.org, yi.zhang@huawei.com, yangerkun@huawei.com, "yukuai (C)"
References: <20240229095714.926789-1-yukuai1@huaweicloud.com> <20240229095714.926789-4-yukuai1@huaweicloud.com> <7b030433-518e-4fe7-976c-3ffb5f7f1a85@molgen.mpg.de>
From: Yu Kuai
Date: Fri, 1 Mar 2024 09:59:50 +0800
In-Reply-To: <7b030433-518e-4fe7-976c-3ffb5f7f1a85@molgen.mpg.de>

Hi,

On 2024/03/01 0:37, Paul Menzel wrote:
> Dear Yu,
>
>
> Thank you for your patch.
>
>
> On 29.02.24 at 10:57, Yu Kuai wrote:
>> From: Yu Kuai
>>
>> For raid1, each read will iterate all the rdevs from conf and check if
>> any rdev is non-rotational, then choose rdev with minimal IO inflight
>> if so, or rdev with closest distance otherwise.
>>
>> Disk nonrot info can be changed through sysfs entry:
>>
>> /sys/block/[disk_name]/queue/rotational
>>
>> However, consider that this should only be used for testing, and user
>> really shouldn't do this in real life. Record the number of non-rotational
>> disks in conf, to avoid checking each rdev in IO fast path and simplify
>
> The comma is not needed.
>
>> read_balance() a little bit.
>
> Just to make sure, I understood correctly. Changing
> `/sys/block/[disk_name]/queue/rotational` will now not be considered
> anymore, right?

Yes, and I think this will cause performance to be worse in real life.

>
> For the summary, maybe you could also say “cache”. Maybe:
>
> Cache attribute rotational while adding/removing rdevs to conf
>
>> Co-developed-by: Paul Luse
>> Signed-off-by: Paul Luse
>> Signed-off-by: Yu Kuai
>> ---
>>   drivers/md/md.h    |  1 +
>>   drivers/md/raid1.c | 17 ++++++++++-------
>>   drivers/md/raid1.h |  1 +
>>   3 files changed, 12 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/md/md.h b/drivers/md/md.h
>> index a49ab04ab707..b2076a165c10 100644
>> --- a/drivers/md/md.h
>> +++ b/drivers/md/md.h
>> @@ -207,6 +207,7 @@ enum flag_bits {
>>                    * check if there is collision between raid1
>>                    * serial bios.
>>                    */
>> +    Nonrot,            /* non-rotational device (SSD) */
>>   };
>>   static inline int is_badblock(struct md_rdev *rdev, sector_t s, int sectors,
>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
>> index 6ec9998f6257..de6ea87d4d24 100644
>> --- a/drivers/md/raid1.c
>> +++ b/drivers/md/raid1.c
>> @@ -599,7 +599,6 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
>>       int sectors;
>>       int best_good_sectors;
>>       int best_disk, best_dist_disk, best_pending_disk;
>> -    int has_nonrot_disk;
>>       int disk;
>>       sector_t best_dist;
>>       unsigned int min_pending;
>> @@ -620,7 +619,6 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
>>       best_pending_disk = -1;
>>       min_pending = UINT_MAX;
>>       best_good_sectors = 0;
>> -    has_nonrot_disk = 0;
>>       choose_next_idle = 0;
>>       clear_bit(R1BIO_FailFast, &r1_bio->state);
>> @@ -637,7 +635,6 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
>>           sector_t first_bad;
>>           int bad_sectors;
>>           unsigned int pending;
>> -        bool nonrot;
>>           rdev = conf->mirrors[disk].rdev;
>>           if (r1_bio->bios[disk] == IO_BLOCKED
>> @@ -703,8 +700,6 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
>>               /* At least two disks to choose from so failfast is OK */
>>               set_bit(R1BIO_FailFast, &r1_bio->state);
>> -        nonrot = bdev_nonrot(rdev->bdev);
>> -        has_nonrot_disk |= nonrot;
>>           pending = atomic_read(&rdev->nr_pending);
>>           dist = abs(this_sector - conf->mirrors[disk].head_position);
>>           if (choose_first) {
>> @@ -731,7 +726,7 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
>>                * small, but not a big deal since when the second disk
>>                * starts IO, the first disk is likely still busy.
>>                */
>> -            if (nonrot && opt_iosize > 0 &&
>> +            if (test_bit(Nonrot, &rdev->flags) && opt_iosize > 0 &&
>>                   mirror->seq_start != MaxSector &&
>>                   mirror->next_seq_sect > opt_iosize &&
>>                   mirror->next_seq_sect - opt_iosize >=
>> @@ -763,7 +758,7 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
>>        * mixed ratation/non-rotational disks depending on workload.
>>        */
>>       if (best_disk == -1) {
>> -        if (has_nonrot_disk || min_pending == 0)
>> +        if (READ_ONCE(conf->nonrot_disks) || min_pending == 0)
>>               best_disk = best_pending_disk;
>>           else
>>               best_disk = best_dist_disk;
>> @@ -1768,6 +1763,11 @@ static bool raid1_add_conf(struct r1conf *conf, struct md_rdev *rdev, int disk,
>>       if (info->rdev)
>>           return false;
>> +    if (bdev_nonrot(rdev->bdev)) {
>> +        set_bit(Nonrot, &rdev->flags);
>> +        WRITE_ONCE(conf->nonrot_disks, conf->nonrot_disks + 1);
>> +    }
>> +
>>       rdev->raid_disk = disk;
>>       info->head_position = 0;
>>       info->seq_start = MaxSector;
>> @@ -1791,6 +1791,9 @@ static bool raid1_remove_conf(struct r1conf *conf, int disk)
>>           rdev->mddev->degraded < conf->raid_disks)
>>           return false;
>> +    if (test_and_clear_bit(Nonrot, &rdev->flags))
>> +        WRITE_ONCE(conf->nonrot_disks, conf->nonrot_disks - 1);
>> +
>>       WRITE_ONCE(info->rdev, NULL);
>>       return true;
>>   }
>> diff --git a/drivers/md/raid1.h b/drivers/md/raid1.h
>> index 14d4211a123a..5300cbaa58a4 100644
>> --- a/drivers/md/raid1.h
>> +++ b/drivers/md/raid1.h
>> @@ -71,6 +71,7 @@ struct r1conf {
>>                            * allow for replacements.
>>                            */
>>       int            raid_disks;
>> +    int            nonrot_disks;
>>       spinlock_t        device_lock;
>
> As you meant “fastpath” in the commit message, if I remember correctly,
> this does not improve the performance in benchmarks, right?

Yes, this just saves a few memory loads, which is too little to affect
performance benchmarks. The main idea here is to make read_balance()
cleaner.

Thanks,
Kuai

>
>
> Kind regards,
>
> Paul
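
For readers following the thread, the idea discussed above (sample the
rotational attribute once when a device is added, keep a count in conf,
and consult only that count when choosing a read target) can be sketched
as a small userspace model. This is a minimal sketch under stated
assumptions, not the kernel code: model_mirror, model_conf, model_add(),
model_remove() and model_pick() are hypothetical names invented for
illustration, and the model leaves out bad-block handling,
sequential-read detection, locking and the READ_ONCE()/WRITE_ONCE()
annotations used in the real patch.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

#define MODEL_MAX_DISKS 4

/* Hypothetical stand-in for one raid1 mirror slot and its rdev. */
struct model_mirror {
	bool present;          /* slot currently holds a device */
	bool nonrot;           /* cached at add time, like the Nonrot flag */
	unsigned int pending;  /* in-flight I/O, like rdev->nr_pending */
	long head_position;    /* last known head position, in sectors */
};

/* Hypothetical stand-in for struct r1conf. */
struct model_conf {
	struct model_mirror mirrors[MODEL_MAX_DISKS];
	int raid_disks;
	int nonrot_disks;      /* cached count of non-rotational members */
};

/* Add a device; the rotational attribute is sampled once, here. */
static bool model_add(struct model_conf *conf, int disk, bool nonrot)
{
	assert(conf->raid_disks <= MODEL_MAX_DISKS);
	assert(disk >= 0 && disk < conf->raid_disks);

	struct model_mirror *m = &conf->mirrors[disk];

	if (m->present)
		return false;
	m->present = true;
	m->nonrot = nonrot;
	m->pending = 0;
	m->head_position = 0;
	if (nonrot)
		conf->nonrot_disks++;
	return true;
}

/* Remove a device and keep the cached count in sync. */
static bool model_remove(struct model_conf *conf, int disk)
{
	assert(conf->raid_disks <= MODEL_MAX_DISKS);
	assert(disk >= 0 && disk < conf->raid_disks);

	struct model_mirror *m = &conf->mirrors[disk];

	if (!m->present)
		return false;
	if (m->nonrot) {
		m->nonrot = false;
		conf->nonrot_disks--;
	}
	m->present = false;
	return true;
}

/*
 * Fallback selection only: least in-flight I/O if any member is
 * non-rotational (or some member is idle), closest head otherwise.
 * Returns -1 when no device is present.
 */
static int model_pick(const struct model_conf *conf, long sector)
{
	int best_pending_disk = -1, best_dist_disk = -1;
	unsigned int min_pending = UINT_MAX;
	long best_dist = LONG_MAX;

	assert(conf->raid_disks <= MODEL_MAX_DISKS);

	for (int disk = 0; disk < conf->raid_disks; disk++) {
		const struct model_mirror *m = &conf->mirrors[disk];
		long dist;

		if (!m->present)
			continue;
		dist = labs(sector - m->head_position);
		if (m->pending < min_pending) {
			min_pending = m->pending;
			best_pending_disk = disk;
		}
		if (dist < best_dist) {
			best_dist = dist;
			best_dist_disk = disk;
		}
	}

	/* The per-read check is a single integer test, not a device query. */
	if (conf->nonrot_disks || min_pending == 0)
		return best_pending_disk;
	return best_dist_disk;
}
```

In this model, two busy rotational members are ranked by head distance;
once a member is added as non-rotational, model_pick() prefers whichever
member has the least in-flight I/O. Because the attribute is sampled only
in model_add(), a later change to it is not noticed until the device is
removed and added again, which matches the behaviour confirmed in the
reply above.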