From: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Subject: [PATCH v4 05/10] mm, pmem: Implement ->memory_failure() in pmem driver
Date: Fri, 4 Jun 2021 09:18:39 +0800
Message-ID: <20210604011844.1756145-6-ruansy.fnst@fujitsu.com>
In-Reply-To: <20210604011844.1756145-1-ruansy.fnst@fujitsu.com>
References: <20210604011844.1756145-1-ruansy.fnst@fujitsu.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Call the ->memory_failure() implemented by the pmem driver, so that the
filesystem is finally notified to handle the corrupted data.  The handler
that collects and kills processes is moved into mf_dax_kill_procs(),
which will be called by the filesystem.  Keep the old handler so we can
fall back when the driver or filesystem does not support
->memory_failure()/->corrupted_range().
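In outline, the dispatch this patch introduces can be sketched as standalone userspace C.  The structures below are simplified stand-ins for the kernel types, and the demo ops are hypothetical, for illustration only:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (illustration only). */
struct dev_pagemap_ops {
	/* driver hook: 0 on success, -EOPNOTSUPP when it cannot handle it */
	int (*memory_failure)(unsigned long pfn, int flags);
};

struct dev_pagemap {
	const struct dev_pagemap_ops *ops;
};

/* Stand-in for the generic collect-and-kill path. */
static int mf_generic_kill_procs(unsigned long pfn, int flags)
{
	(void)pfn; (void)flags;
	return 0;
}

/*
 * Mirrors the dispatch added to memory_failure_dev_pagemap(): try the
 * driver's ->memory_failure() first, and fall back to the generic
 * handler only when the driver reports -EOPNOTSUPP or has no hook.
 */
static int handle_dev_pagemap_failure(struct dev_pagemap *pgmap,
				      unsigned long pfn, int flags)
{
	if (pgmap->ops && pgmap->ops->memory_failure) {
		int rc = pgmap->ops->memory_failure(pfn, flags);
		if (rc != -EOPNOTSUPP)
			return rc;	/* handled, or failed hard: no fallback */
	}
	return mf_generic_kill_procs(pfn, flags);
}

/* Hypothetical demo drivers exercising both paths. */
static int demo_unsupported(unsigned long pfn, int flags)
{
	(void)pfn; (void)flags;
	return -EOPNOTSUPP;	/* triggers the generic fallback */
}

static int demo_busy(unsigned long pfn, int flags)
{
	(void)pfn; (void)flags;
	return -EBUSY;		/* a real failure: must not fall back */
}

static const struct dev_pagemap_ops demo_unsup_ops = { .memory_failure = demo_unsupported };
static const struct dev_pagemap_ops demo_busy_ops = { .memory_failure = demo_busy };
static struct dev_pagemap demo_unsup_pgmap = { .ops = &demo_unsup_ops };
static struct dev_pagemap demo_busy_pgmap = { .ops = &demo_busy_ops };
static struct dev_pagemap demo_plain_pgmap = { .ops = NULL };
```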
Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
---
 block/genhd.c         | 30 ++++++++++++++++++
 drivers/nvdimm/pmem.c | 14 +++++++++
 include/linux/genhd.h |  1 +
 mm/memory-failure.c   | 71 +++++++++++++++++++++++++++----------------
 4 files changed, 90 insertions(+), 26 deletions(-)

diff --git a/block/genhd.c b/block/genhd.c
index 9f8cb7beaad1..75834bd057df 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -718,6 +718,36 @@ struct block_device *bdget_disk(struct gendisk *disk, int partno)
 	return bdev;
 }
 
+/**
+ * bdget_disk_sector - get the block device that contains a given sector
+ * @disk: gendisk of interest
+ * @sector: sector number
+ *
+ * RETURNS: the block device in which @sector is located
+ */
+struct block_device *bdget_disk_sector(struct gendisk *disk, sector_t sector)
+{
+	struct block_device *part = NULL, *p;
+	unsigned long idx;
+
+	rcu_read_lock();
+	xa_for_each(&disk->part_tbl, idx, p) {
+		if (p->bd_partno == 0)
+			continue;
+		if (p->bd_start_sect <= sector &&
+		    sector < p->bd_start_sect + bdev_nr_sectors(p)) {
+			part = p;
+			break;
+		}
+	}
+	rcu_read_unlock();
+	if (!part)
+		part = disk->part0;
+
+	return bdget_disk(disk, part->bd_partno);
+}
+EXPORT_SYMBOL(bdget_disk_sector);
+
 /*
  * print a full list of all partitions - intended for places where the root
  * filesystem can't be mounted and thus to give the victim some idea of what
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index ed10a8b66068..98349e7d0a28 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -364,9 +364,23 @@ static void pmem_release_disk(void *__pmem)
 	put_disk(pmem->disk);
 }
 
+static int pmem_pagemap_memory_failure(struct dev_pagemap *pgmap,
+		unsigned long pfn, int flags)
+{
+	struct pmem_device *pdev =
+			container_of(pgmap, struct pmem_device, pgmap);
+	loff_t offset = PFN_PHYS(pfn) - pdev->phys_addr - pdev->data_offset;
+	struct block_device *bdev =
+			bdget_disk_sector(pdev->disk, offset >> SECTOR_SHIFT);
+
+	return dax_corrupted_range(pdev->dax_dev, bdev, offset,
+			page_size(pfn_to_page(pfn)), &flags);
+}
+
 static const struct dev_pagemap_ops fsdax_pagemap_ops = {
 	.kill			= pmem_pagemap_kill,
 	.cleanup		= pmem_pagemap_cleanup,
+	.memory_failure		= pmem_pagemap_memory_failure,
 };
 
 static int pmem_attach_disk(struct device *dev,
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index 6fc26f7bdf71..2ad70c02c343 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -219,6 +219,7 @@ static inline void add_disk_no_queue_reg(struct gendisk *disk)
 
 extern void del_gendisk(struct gendisk *gp);
 extern struct block_device *bdget_disk(struct gendisk *disk, int partno);
+extern struct block_device *bdget_disk_sector(struct gendisk *disk, sector_t sector);
 
 void set_disk_ro(struct gendisk *disk, bool read_only);
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 4377e727d478..43017d7f3918 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1247,6 +1247,36 @@ static void unmap_and_kill(struct list_head *to_kill, unsigned long pfn,
 	kill_procs(to_kill, flags & MF_MUST_KILL, false, pfn, flags);
 }
 
+static int mf_generic_kill_procs(unsigned long long pfn, int flags)
+{
+	struct page *page = pfn_to_page(pfn);
+	LIST_HEAD(to_kill);
+	dax_entry_t cookie;
+
+	/*
+	 * Prevent the inode from being freed while we are interrogating
+	 * the address_space, typically this would be handled by
+	 * lock_page(), but dax pages do not use the page lock. This
+	 * also prevents changes to the mapping of this pfn until
+	 * poison signaling is complete.
+	 */
+	cookie = dax_lock_page(page);
+	if (!cookie)
+		return -EBUSY;
+	/*
+	 * Unlike System-RAM there is no possibility to swap in a
+	 * different physical page at a given virtual address, so all
+	 * userspace consumption of ZONE_DEVICE memory necessitates
+	 * SIGBUS (i.e. MF_MUST_KILL)
+	 */
+	flags |= MF_ACTION_REQUIRED | MF_MUST_KILL;
+	collect_procs(page, &to_kill, flags & MF_ACTION_REQUIRED);
+
+	unmap_and_kill(&to_kill, pfn, page->mapping, page->index, flags);
+	dax_unlock_page(page, cookie);
+	return 0;
+}
+
 int mf_dax_kill_procs(struct address_space *mapping, pgoff_t index, int flags)
 {
 	LIST_HEAD(to_kill);
@@ -1348,9 +1378,7 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
 		struct dev_pagemap *pgmap)
 {
 	struct page *page = pfn_to_page(pfn);
-	LIST_HEAD(to_kill);
 	int rc = -EBUSY;
-	dax_entry_t cookie;
 
 	if (flags & MF_COUNT_INCREASED)
 		/*
@@ -1364,20 +1392,9 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
 		goto out;
 	}
 
-	/*
-	 * Prevent the inode from being freed while we are interrogating
-	 * the address_space, typically this would be handled by
-	 * lock_page(), but dax pages do not use the page lock. This
-	 * also prevents changes to the mapping of this pfn until
-	 * poison signaling is complete.
-	 */
-	cookie = dax_lock_page(page);
-	if (!cookie)
-		goto out;
-
 	if (hwpoison_filter(page)) {
 		rc = 0;
-		goto unlock;
+		goto out;
 	}
 
 	if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
@@ -1385,7 +1402,7 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
 		 * TODO: Handle HMM pages which may need coordination
 		 * with device-side memory.
 		 */
-		goto unlock;
+		goto out;
 	}
 
 	/*
@@ -1395,19 +1412,21 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
 	SetPageHWPoison(page);
 
 	/*
-	 * Unlike System-RAM there is no possibility to swap in a
-	 * different physical page at a given virtual address, so all
-	 * userspace consumption of ZONE_DEVICE memory necessitates
-	 * SIGBUS (i.e. MF_MUST_KILL)
+	 * Call the driver's implementation to handle the memory failure,
+	 * otherwise fall back to the generic handler.
 	 */
-	flags |= MF_ACTION_REQUIRED | MF_MUST_KILL;
-	collect_procs_file(page, page->mapping, page->index, &to_kill,
-			flags & MF_ACTION_REQUIRED);
+	if (pgmap->ops->memory_failure) {
+		rc = pgmap->ops->memory_failure(pgmap, pfn, flags);
+		/*
+		 * Fall back to the generic handler too if the operation
+		 * is not supported inside the driver/device/filesystem.
+		 */
+		if (rc != -EOPNOTSUPP)
+			goto out;
+	}
+
+	rc = mf_generic_kill_procs(pfn, flags);
 
-	unmap_and_kill(&to_kill, pfn, page->mapping, page->index, flags);
-	rc = 0;
-unlock:
-	dax_unlock_page(page, cookie);
 out:
 	/* drop pgmap ref acquired in caller */
 	put_dev_pagemap(pgmap);
-- 
2.31.1
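Postscript, not part of the patch: the pfn translation in pmem_pagemap_memory_failure() maps a poisoned pfn to a byte offset past the namespace metadata, then to a 512-byte sector for the partition lookup.  The shift values and layout numbers below are assumptions chosen only to make the arithmetic concrete:

```c
#include <assert.h>

#define PAGE_SHIFT   12	/* assuming 4 KiB pages */
#define SECTOR_SHIFT 9	/* 512-byte sectors */
#define PFN_PHYS(pfn) ((unsigned long long)(pfn) << PAGE_SHIFT)

/*
 * Same arithmetic as pmem_pagemap_memory_failure(): byte offset of the
 * poisoned pfn relative to the start of the pmem data area (i.e. past
 * pdev->data_offset).
 */
static unsigned long long pmem_failure_offset(unsigned long pfn,
		unsigned long long phys_addr, unsigned long long data_offset)
{
	return PFN_PHYS(pfn) - phys_addr - data_offset;
}

/* Sector handed to bdget_disk_sector() for the partition lookup. */
static unsigned long long pmem_failure_sector(unsigned long long offset)
{
	return offset >> SECTOR_SHIFT;
}
```

For a hypothetical region at 0x100000000 with a 2 MiB data_offset, the pfn covering physical address 0x100203000 yields offset 0x3000 and sector 0x18.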