Received: by 2002:a05:6a10:f347:0:0:0:0 with SMTP id d7csp1740668pxu; Thu, 17 Dec 2020 18:15:54 -0800 (PST) X-Google-Smtp-Source: ABdhPJyHM6HDCNcv6O/9gs1PE0E5FYR09wSXrpLNhpObeYEpYiQxcNrsPjyOJ9AtUuAJmhLmtQxn X-Received: by 2002:a17:906:5293:: with SMTP id c19mr1930685ejm.72.1608257754053; Thu, 17 Dec 2020 18:15:54 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1608257754; cv=none; d=google.com; s=arc-20160816; b=hqwIVzrOezddXu0qG4QNEoLCZeLhjlpUdhyO29gc68tIKQLcEt5xk1a+hZEGGl1fRJ Dy1KYKxmXSWHXa6mDgeY6ZkieMbdxfkdRSQqidGJqjZqJKFttjKXPFA8Y98rIzvpwiYY Yufi1ajWKVbpWNGr1kXP7nTbrxy6jIxs9fctvYXsNB1/v+IuKA4nUGJfyEhUgk0lM7jc ur6meaJe5NR53VrIunjVjSg8HDcGG769IeoF0CvU3w7Vj+Q1FPj/VAHQ5RRp+BUFFcdf gU1ZPEKMYRLrSpO/MBp2CU8v+9gMwFE6ixFosCYQ4CEB+5e0Tz0bnmZLzeGmDOTQNTJK T3LA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:content-language :in-reply-to:mime-version:user-agent:date:message-id:from:references :cc:to:subject; bh=c2yx7AE0oWDZIp35+vo9qjfvTDNeuH9wy+ojJ/ukwbA=; b=LAYLvA8wT8M3jmGxOuLZShqqj6eNRBsMxLou6CQ9e4s/fAZfc5rYsFR29lhJH4y+U0 zl/ad5uwu7U3W6hx3Y+CGIeBNyFO4V5TtQ3qK7ceA4D78T1x0wK/ISo1oeMxSRBRpaUz gO2AR3KdjB+GA34NNMiguyyVdR3r+blwA4evoY8yql8qfoC+C793zV8JmwQwIhMRQIwE 9OMdxttfLbW0LGy+1A96zRKGwvesY+DbqXBm9U3Weo/h/73Oz181oszY2m5SIE7aa74D dDYD8bnHCTOckoML2FYjeV6Dmyq6VrKq8ni+V4ngeAK/D+JrPVOAmUcG/3hCZZUzPkN2 sBag== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id bz13si3835528ejc.437.2020.12.17.18.15.31; Thu, 17 Dec 2020 18:15:54 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1732484AbgLRCOj (ORCPT + 99 others); Thu, 17 Dec 2020 21:14:39 -0500 Received: from mail.cn.fujitsu.com ([183.91.158.132]:46790 "EHLO heian.cn.fujitsu.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1727160AbgLRCOj (ORCPT ); Thu, 17 Dec 2020 21:14:39 -0500 X-IronPort-AV: E=Sophos;i="5.78,428,1599494400"; d="scan'208";a="102691740" Received: from unknown (HELO cn.fujitsu.com) ([10.167.33.5]) by heian.cn.fujitsu.com with ESMTP; 18 Dec 2020 10:13:47 +0800 Received: from G08CNEXMBPEKD05.g08.fujitsu.local (unknown [10.167.33.204]) by cn.fujitsu.com (Postfix) with ESMTP id DBE494CE6013; Fri, 18 Dec 2020 10:13:45 +0800 (CST) Received: from irides.mr (10.167.225.141) by G08CNEXMBPEKD05.g08.fujitsu.local (10.167.33.204) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Fri, 18 Dec 2020 10:13:45 +0800 Subject: Re: [RFC PATCH v3 8/9] md: Implement ->corrupted_range() To: "Darrick J. Wong" CC: , , , , , , , , , , , , , Theodore Ts'o References: <20201215121414.253660-1-ruansy.fnst@cn.fujitsu.com> <20201215121414.253660-9-ruansy.fnst@cn.fujitsu.com> <20201215205102.GB6918@magnolia> From: Ruan Shiyang Message-ID: Date: Fri, 18 Dec 2020 10:11:54 +0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.5.1 MIME-Version: 1.0 In-Reply-To: <20201215205102.GB6918@magnolia> Content-Type: text/plain; charset="UTF-8"; format=flowed Content-Language: en-US Content-Transfer-Encoding: 8bit X-Originating-IP: [10.167.225.141] X-ClientProxiedBy: G08CNEXCHPEKD06.g08.fujitsu.local (10.167.33.205) To G08CNEXMBPEKD05.g08.fujitsu.local (10.167.33.204) X-yoursite-MailScanner-ID: DBE494CE6013.AEDC9 X-yoursite-MailScanner: Found to be clean X-yoursite-MailScanner-From: ruansy.fnst@cn.fujitsu.com X-Spam-Status: No Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2020/12/16 上午4:51, Darrick J. Wong wrote: > On Tue, Dec 15, 2020 at 08:14:13PM +0800, Shiyang Ruan wrote: >> With the support of ->rmap(), it is possible to obtain the superblock on >> a mapped device. >> >> If a pmem device is used as one target of mapped device, we cannot >> obtain its superblock directly. With the help of SYSFS, the mapped >> device can be found on the target devices. So, we iterate the >> bdev->bd_holder_disks to obtain its mapped device. >> >> Signed-off-by: Shiyang Ruan >> --- >> drivers/md/dm.c | 66 +++++++++++++++++++++++++++++++++++++++++++ >> drivers/nvdimm/pmem.c | 9 ++++-- >> fs/block_dev.c | 21 ++++++++++++++ >> include/linux/genhd.h | 7 +++++ >> 4 files changed, 100 insertions(+), 3 deletions(-) >> >> diff --git a/drivers/md/dm.c b/drivers/md/dm.c >> index 4e0cbfe3f14d..9da1f9322735 100644 >> --- a/drivers/md/dm.c >> +++ b/drivers/md/dm.c >> @@ -507,6 +507,71 @@ static int dm_blk_report_zones(struct gendisk *disk, sector_t sector, >> #define dm_blk_report_zones NULL >> #endif /* CONFIG_BLK_DEV_ZONED */ >> >> +struct dm_blk_corrupt { >> + struct block_device *bdev; >> + sector_t offset; >> +}; >> + >> +static int dm_blk_corrupt_fn(struct dm_target *ti, struct dm_dev *dev, >> + sector_t start, sector_t len, void *data) >> +{ >> + struct dm_blk_corrupt *bc = data; >> + >> + return bc->bdev == (void *)dev->bdev && >> + (start <= bc->offset && bc->offset < start + len); >> +} >> + >> +static int dm_blk_corrupted_range(struct gendisk *disk, >> + struct block_device *target_bdev, >> + loff_t target_offset, size_t len, void *data) >> +{ >> + struct mapped_device *md = disk->private_data; >> + struct block_device *md_bdev = md->bdev; >> + struct dm_table *map; >> + struct dm_target *ti; >> + struct super_block *sb; >> + int srcu_idx, i, rc = 0; >> + bool found = false; >> + sector_t disk_sec, target_sec = to_sector(target_offset); >> + >> + map = dm_get_live_table(md, &srcu_idx); >> + if (!map) >> + return -ENODEV; >> + >> + for (i = 0; i < dm_table_get_num_targets(map); i++) { >> + ti = dm_table_get_target(map, i); >> + if (ti->type->iterate_devices && ti->type->rmap) { >> + struct dm_blk_corrupt bc = {target_bdev, target_sec}; >> + >> + found = ti->type->iterate_devices(ti, dm_blk_corrupt_fn, &bc); >> + if (!found) >> + continue; >> + disk_sec = ti->type->rmap(ti, target_sec); > > What happens if the dm device has multiple reverse mappings because the > physical storage is being shared at multiple LBAs? (e.g. a > deduplication target) I thought that the dm device knows the mapping relationship, and it can be done by implementation of ->rmap() in each target. Did I understand it wrong? > >> + break; >> + } >> + } >> + >> + if (!found) { >> + rc = -ENODEV; >> + goto out; >> + } >> + >> + sb = get_super(md_bdev); >> + if (!sb) { >> + rc = bd_disk_holder_corrupted_range(md_bdev, to_bytes(disk_sec), len, data); >> + goto out; >> + } else if (sb->s_op->corrupted_range) { >> + loff_t off = to_bytes(disk_sec - get_start_sect(md_bdev)); >> + >> + rc = sb->s_op->corrupted_range(sb, md_bdev, off, len, data); > > This "call bd_disk_holder_corrupted_range or sb->s_op->corrupted_range" > logic appears twice; should it be refactored into a common helper? > > Or, should the superblock dispatch part move to > bd_disk_holder_corrupted_range? bd_disk_holder_corrupted_range() requires SYSFS configuration. I introduce it to handle those block devices that can not obtain superblock by `get_super()`. Usually, if we create filesystem directly on a pmem device, or make some partitions at first, we can use `get_super()` to get the superblock. In other case, such as creating a LVM on pmem device, `get_super()` does not work. So, I think refactoring it into a common helper looks better. -- Thanks, Ruan Shiyang. > >> + } >> + drop_super(sb); >> + >> +out: >> + dm_put_live_table(md, srcu_idx); >> + return rc; >> +} >> + >> static int dm_prepare_ioctl(struct mapped_device *md, int *srcu_idx, >> struct block_device **bdev) >> { >> @@ -3084,6 +3149,7 @@ static const struct block_device_operations dm_blk_dops = { >> .getgeo = dm_blk_getgeo, >> .report_zones = dm_blk_report_zones, >> .pr_ops = &dm_pr_ops, >> + .corrupted_range = dm_blk_corrupted_range, >> .owner = THIS_MODULE >> }; >> >> diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c >> index 4688bff19c20..e8cfaf860149 100644 >> --- a/drivers/nvdimm/pmem.c >> +++ b/drivers/nvdimm/pmem.c >> @@ -267,11 +267,14 @@ static int pmem_corrupted_range(struct gendisk *disk, struct block_device *bdev, >> >> bdev_offset = (disk_sector - get_start_sect(bdev)) << SECTOR_SHIFT; >> sb = get_super(bdev); >> - if (sb && sb->s_op->corrupted_range) { >> + if (!sb) { >> + rc = bd_disk_holder_corrupted_range(bdev, bdev_offset, len, data); >> + goto out; >> + } else if (sb->s_op->corrupted_range) >> rc = sb->s_op->corrupted_range(sb, bdev, bdev_offset, len, data); >> - drop_super(sb); > > This is out of scope for this patch(set) but do you think that the scsi > disk driver should intercept media errors from sense data and call > ->corrupted_range too? ISTR Ted muttering that one of his employers had > a patchset to do more with sense data than the upstream kernel currently > does... > >> - } >> + drop_super(sb); >> >> +out: >> bdput(bdev); >> return rc; >> } >> diff --git a/fs/block_dev.c b/fs/block_dev.c >> index 9e84b1928b94..d3e6bddb8041 100644 >> --- a/fs/block_dev.c >> +++ b/fs/block_dev.c >> @@ -1171,6 +1171,27 @@ struct bd_holder_disk { >> int refcnt; >> }; >> >> +int bd_disk_holder_corrupted_range(struct block_device *bdev, loff_t off, size_t len, void *data) >> +{ >> + struct bd_holder_disk *holder; >> + struct gendisk *disk; >> + int rc = 0; >> + >> + if (list_empty(&(bdev->bd_holder_disks))) >> + return -ENODEV; >> + >> + list_for_each_entry(holder, &bdev->bd_holder_disks, list) { >> + disk = holder->disk; >> + if (disk->fops->corrupted_range) { >> + rc = disk->fops->corrupted_range(disk, bdev, off, len, data); >> + if (rc != -ENODEV) >> + break; >> + } >> + } >> + return rc; >> +} >> +EXPORT_SYMBOL_GPL(bd_disk_holder_corrupted_range); >> + >> static struct bd_holder_disk *bd_find_holder_disk(struct block_device *bdev, >> struct gendisk *disk) >> { >> diff --git a/include/linux/genhd.h b/include/linux/genhd.h >> index ed06209008b8..fba247b852fa 100644 >> --- a/include/linux/genhd.h >> +++ b/include/linux/genhd.h >> @@ -382,9 +382,16 @@ int blkdev_ioctl(struct block_device *, fmode_t, unsigned, unsigned long); >> long compat_blkdev_ioctl(struct file *, unsigned, unsigned long); >> >> #ifdef CONFIG_SYSFS >> +int bd_disk_holder_corrupted_range(struct block_device *bdev, loff_t off, >> + size_t len, void *data); >> int bd_link_disk_holder(struct block_device *bdev, struct gendisk *disk); >> void bd_unlink_disk_holder(struct block_device *bdev, struct gendisk *disk); >> #else >> +int bd_disk_holder_corrupted_range(struct block_device *bdev, loff_t off, >> + size_t len, void *data) >> +{ >> + return 0; >> +} >> static inline int bd_link_disk_holder(struct block_device *bdev, >> struct gendisk *disk) >> { >> -- >> 2.29.2 >> >> >> > >