From: linan666@huaweicloud.com
To: song@kernel.org
Cc: linux-raid@vger.kernel.org, linux-kernel@vger.kernel.org,
	linan122@huawei.com, yukuai3@huawei.com, yi.zhang@huawei.com,
	houtao1@huawei.com, yangerkun@huawei.com
Subject: [PATCH 1/2] md: factor out a helper exceed_read_errors() to check read_errors
Date: Mon, 4 Dec 2023 15:03:26 +0800
Message-Id: <20231204070327.3150356-2-linan666@huaweicloud.com>
X-Mailer: git-send-email 2.39.2
In-Reply-To: <20231204070327.3150356-1-linan666@huaweicloud.com>
References: <20231204070327.3150356-1-linan666@huaweicloud.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

From: Li Nan

Move check_decay_read_errors() to raid1-10.c and factor out a helper
exceed_read_errors() to check if read_errors exceeds the limit, so that
raid1 can also use it. There are no functional changes.

Signed-off-by: Li Nan
---
 drivers/md/raid1-10.c | 54 +++++++++++++++++++++++++++++++++++++++++++
 drivers/md/raid10.c   | 47 +------------------------------------
 2 files changed, 55 insertions(+), 46 deletions(-)

diff --git a/drivers/md/raid1-10.c b/drivers/md/raid1-10.c
index 3f22edec70e7..9e17eab915f5 100644
--- a/drivers/md/raid1-10.c
+++ b/drivers/md/raid1-10.c
@@ -173,3 +173,57 @@ static inline void raid1_prepare_flush_writes(struct bitmap *bitmap)
 	else
 		md_bitmap_unplug(bitmap);
 }
+
+/*
+ * Used by fix_read_error() to decay the per rdev read_errors.
+ * We halve the read error count for every hour that has elapsed
+ * since the last recorded read error.
+ */
+static inline void check_decay_read_errors(struct mddev *mddev, struct md_rdev *rdev)
+{
+	long cur_time_mon;
+	unsigned long hours_since_last;
+	unsigned int read_errors = atomic_read(&rdev->read_errors);
+
+	cur_time_mon = ktime_get_seconds();
+
+	if (rdev->last_read_error == 0) {
+		/* first time we've seen a read error */
+		rdev->last_read_error = cur_time_mon;
+		return;
+	}
+
+	hours_since_last = (long)(cur_time_mon -
+			    rdev->last_read_error) / 3600;
+
+	rdev->last_read_error = cur_time_mon;
+
+	/*
+	 * if hours_since_last is > the number of bits in read_errors
+	 * just set read errors to 0. We do this to avoid
+	 * overflowing the shift of read_errors by hours_since_last.
+	 */
+	if (hours_since_last >= 8 * sizeof(read_errors))
+		atomic_set(&rdev->read_errors, 0);
+	else
+		atomic_set(&rdev->read_errors, read_errors >> hours_since_last);
+}
+
+static inline bool exceed_read_errors(struct mddev *mddev, struct md_rdev *rdev)
+{
+	int max_read_errors = atomic_read(&mddev->max_corr_read_errors);
+	int read_errors;
+
+	check_decay_read_errors(mddev, rdev);
+	read_errors = atomic_inc_return(&rdev->read_errors);
+	if (read_errors > max_read_errors) {
+		pr_notice("md:%s: %pg: Raid device exceeded read_error threshold [cur %d:max %d]\n",
+			  mdname(mddev), rdev->bdev, read_errors, max_read_errors);
+		pr_notice("md:%s: %pg: Failing raid device\n",
+			  mdname(mddev), rdev->bdev);
+		md_error(mddev, rdev);
+		return true;
+	}
+
+	return false;
+}
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 375c11d6159f..8ea4974fb91c 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -2592,42 +2592,6 @@ static void recovery_request_write(struct mddev *mddev, struct r10bio *r10_bio)
 	}
 }
 
-/*
- * Used by fix_read_error() to decay the per rdev read_errors.
- * We halve the read error count for every hour that has elapsed
- * since the last recorded read error.
- *
- */
-static void check_decay_read_errors(struct mddev *mddev, struct md_rdev *rdev)
-{
-	long cur_time_mon;
-	unsigned long hours_since_last;
-	unsigned int read_errors = atomic_read(&rdev->read_errors);
-
-	cur_time_mon = ktime_get_seconds();
-
-	if (rdev->last_read_error == 0) {
-		/* first time we've seen a read error */
-		rdev->last_read_error = cur_time_mon;
-		return;
-	}
-
-	hours_since_last = (long)(cur_time_mon -
-			    rdev->last_read_error) / 3600;
-
-	rdev->last_read_error = cur_time_mon;
-
-	/*
-	 * if hours_since_last is > the number of bits in read_errors
-	 * just set read errors to 0. We do this to avoid
-	 * overflowing the shift of read_errors by hours_since_last.
-	 */
-	if (hours_since_last >= 8 * sizeof(read_errors))
-		atomic_set(&rdev->read_errors, 0);
-	else
-		atomic_set(&rdev->read_errors, read_errors >> hours_since_last);
-}
-
 static int r10_sync_page_io(struct md_rdev *rdev, sector_t sector,
 			    int sectors, struct page *page, enum req_op op)
 {
@@ -2665,7 +2629,6 @@ static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10
 	int sect = 0; /* Offset from r10_bio->sector */
 	int sectors = r10_bio->sectors, slot = r10_bio->read_slot;
 	struct md_rdev *rdev;
-	int max_read_errors = atomic_read(&mddev->max_corr_read_errors);
 	int d = r10_bio->devs[slot].devnum;
 
 	/* still own a reference to this rdev, so it cannot
@@ -2678,15 +2641,7 @@ static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10
 		   more fix_read_error() attempts */
 		return;
 
-	check_decay_read_errors(mddev, rdev);
-	atomic_inc(&rdev->read_errors);
-	if (atomic_read(&rdev->read_errors) > max_read_errors) {
-		pr_notice("md/raid10:%s: %pg: Raid device exceeded read_error threshold [cur %d:max %d]\n",
-			  mdname(mddev), rdev->bdev,
-			  atomic_read(&rdev->read_errors), max_read_errors);
-		pr_notice("md/raid10:%s: %pg: Failing raid device\n",
-			  mdname(mddev), rdev->bdev);
-		md_error(mddev, rdev);
+	if (exceed_read_errors(mddev, rdev)) {
 		r10_bio->devs[slot].bio = IO_BLOCKED;
 		return;
 	}
-- 
2.39.2
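
For context, a minimal sketch of how a raid1-side caller could adopt the new
helper. This is illustrative only and not part of this patch: the hypothetical
example_fix_read_error() below stands in for the raid1.c read-error path that
patch 2/2 is presumably expected to convert.

/*
 * Illustrative sketch, not part of this patch: a raid1-style caller can
 * replace its open-coded decay/increment/threshold handling with a single
 * call to exceed_read_errors().
 */
static void example_fix_read_error(struct mddev *mddev, struct md_rdev *rdev)
{
	/*
	 * exceed_read_errors() decays rdev->read_errors (halved once per
	 * elapsed hour), increments it for this error, and, when the
	 * mddev->max_corr_read_errors limit is crossed, logs the event,
	 * calls md_error() and returns true.
	 */
	if (exceed_read_errors(mddev, rdev))
		/* give up on this rdev; md_error() has already been called */
		return;

	/* otherwise continue with the normal read-error recovery path */
}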