2022-01-12 14:41:38

by Xiongwei Song

Subject: [PATCH v3 0/2] Add support for getting page info of ZONE_DEVICE by /proc/kpage*

From: Xiongwei Song <[email protected]>

Patch 1 adds pfn_to_devmap_page(), which looks up a ZONE_DEVICE page by
pfn. It checks whether the dev_pagemap is valid and, if so, returns the
page pointer.

Patch 2 completes /proc/kpage* support for exposing ZONE_DEVICE page info
to userspace.

The series was tested with "page-types -r" in qemu, started with the
following arguments:
-object memory-backend-file,id=mem2,share,mem-path=./virtio_pmem.img,size=2G
-device virtio-pmem-pci,memdev=mem2,id=nv1
which emulate a pmem device with 2G of memory.

As we know, pages in ZONE_DEVICE have only the PG_reserved flag set.
Before this series, "page-types -r" reports:
flags page-count MB symbolic-flags long-symbolic-flags
0x0000000100000000 24377 95 ___________________________r________________ reserved
which means that 24377 pages system-wide have only PG_reserved set.

Run "cat /proc/zoneinfo" to get the ZONE_DEVICE info:
Node 1, zone Device
pages free 0
boost 0
min 0
low 0
high 0
spanned 0
present 0
managed 0
cma 0
protection: (0, 0, 0, 0, 0)

After this series, "page-types -r" reports:
flags page-count MB symbolic-flags long-symbolic-flags
0x0000000100000000 548665 2143 ___________________________r________________ reserved
which means that 548665 pages system-wide have only PG_reserved set.

Run "cat /proc/zoneinfo" to get the ZONE_DEVICE info:
Node 1, zone Device
pages free 0
boost 0
min 0
low 0
high 0
spanned 524288
present 0
managed 0
cma 0
protection: (0, 0, 0, 0, 0)

The spanned field shows that 524288 pages were added to ZONE_DEVICE.
Meanwhile, 548665 - 24377 = 524288 is the increase in reserved pages,
which equals the spanned field of ZONE_DEVICE. Hence it looks like the
patchset works well.
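
The arithmetic above can be double-checked with a small userspace sketch.
It assumes 4 KiB base pages (an assumption, though it agrees with the MB
column printed by page-types); the helper names are made up for this
illustration and are not part of the series.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB base pages, as the page-types MB column suggests. */
#define PAGE_SIZE_BYTES 4096ULL

/* Convert a page count to bytes. */
static uint64_t pages_to_bytes(uint64_t pages)
{
	return pages * PAGE_SIZE_BYTES;
}

/* Pages gained between two "page-types -r" runs. */
static uint64_t reserved_delta(uint64_t before, uint64_t after)
{
	return after - before;
}

/* Check the figures quoted in the cover letter. */
static void check_cover_letter_numbers(void)
{
	uint64_t delta = reserved_delta(24377, 548665);

	assert(delta == 524288);                     /* equals "spanned" */
	assert(pages_to_bytes(delta) == 2ULL << 30); /* the 2G pmem device */
}
```

The delta also matches the size=2G backing file passed to qemu, since
524288 pages of 4 KiB each are exactly 2 GiB.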

v2 -> v3:
* Before returning page pointer, check validity of page by
pgmap_pfn_valid(). https://lkml.org/lkml/2022/1/10/853 .

v1 -> v2:
* Take David's suggestion to simplify the implementation of
pfn_to_devmap_page(). Please take a look at
https://lkml.org/lkml/2022/1/10/320 .

Xiongwei Song (2):
mm/memremap.c: Add pfn_to_devmap_page() to get page in ZONE_DEVICE
proc: Add getting pages info of ZONE_DEVICE support

fs/proc/page.c | 35 ++++++++++++++++++++-------------
include/linux/memremap.h | 8 ++++++++
mm/memremap.c | 42 ++++++++++++++++++++++++++++++++++++++++
3 files changed, 72 insertions(+), 13 deletions(-)

--
2.30.2



2022-01-12 14:46:37

by Xiongwei Song

Subject: [PATCH v3 1/2] mm/memremap.c: Add pfn_to_devmap_page() to get page in ZONE_DEVICE

From: Xiongwei Song <[email protected]>

When requesting page information via /proc/kpage*, the pages in ZONE_DEVICE
were ignored. We need a function to help with this.

pfn_to_devmap_page() is like pfn_to_online_page(), but only concerns pages
in ZONE_DEVICE.

Suggested-by: David Hildenbrand <[email protected]>
Signed-off-by: Xiongwei Song <[email protected]>
---
v3: Before returning page pointer, check validity of page by
pgmap_pfn_valid().
v2: Simplify pfn_to_devmap_page() as David suggested.
---
include/linux/memremap.h | 8 ++++++++
mm/memremap.c | 19 +++++++++++++++++++
2 files changed, 27 insertions(+)

diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index c0e9d35889e8..621723e9c4a5 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -137,6 +137,8 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap);
void devm_memunmap_pages(struct device *dev, struct dev_pagemap *pgmap);
struct dev_pagemap *get_dev_pagemap(unsigned long pfn,
struct dev_pagemap *pgmap);
+struct page *pfn_to_devmap_page(unsigned long pfn,
+ struct dev_pagemap **pgmap);
bool pgmap_pfn_valid(struct dev_pagemap *pgmap, unsigned long pfn);

unsigned long vmem_altmap_offset(struct vmem_altmap *altmap);
@@ -166,6 +168,12 @@ static inline struct dev_pagemap *get_dev_pagemap(unsigned long pfn,
return NULL;
}

+static inline struct page *pfn_to_devmap_page(unsigned long pfn,
+ struct dev_pagemap **pgmap)
+{
+ return NULL;
+}
+
static inline bool pgmap_pfn_valid(struct dev_pagemap *pgmap, unsigned long pfn)
{
return false;
diff --git a/mm/memremap.c b/mm/memremap.c
index 5a66a71ab591..782309b74d71 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -494,6 +494,25 @@ struct dev_pagemap *get_dev_pagemap(unsigned long pfn,
}
EXPORT_SYMBOL_GPL(get_dev_pagemap);

+/**
+ * pfn_to_devmap_page - get page pointer which belongs to dev_pagemap by @pfn
+ * @pfn: page frame number to lookup page_map
+ * @pgmap: to save pgmap address which is for putting reference
+ *
+ * If @pgmap is non-NULL, then pfn is on ZONE_DEVICE. Meanwhile check if
+ * pfn is valid in @pgmap, if yes return page pointer.
+ */
+struct page *pfn_to_devmap_page(unsigned long pfn, struct dev_pagemap **pgmap)
+{
+ if (pfn_valid(pfn)) {
+ *pgmap = get_dev_pagemap(pfn, NULL);
+ if (*pgmap && pgmap_pfn_valid(*pgmap, pfn))
+ return pfn_to_page(pfn);
+ }
+
+ return NULL;
+}
+
#ifdef CONFIG_DEV_PAGEMAP_OPS
void free_devmap_managed_page(struct page *page)
{
--
2.30.2
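
The lookup and reference handling in this patch can be modelled in
userspace. The sketch below is not kernel code: struct model_pgmap and all
model_* helpers are hypothetical stand-ins for struct dev_pagemap,
get_dev_pagemap(), pgmap_pfn_valid() and put_dev_pagemap(), and the
first_valid field is an assumed simplification of what pgmap_pfn_valid()
checks. It only illustrates that a reference may be held even when no page
is returned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct dev_pagemap: one pfn range and a refcount. */
struct model_pgmap {
	unsigned long start_pfn;   /* first pfn covered by the range */
	unsigned long nr_pages;    /* number of pfns in the range */
	unsigned long first_valid; /* first pfn with a usable page (assumed) */
	int refs;                  /* references currently held */
};

/* Model of get_dev_pagemap(): take a reference if pfn is in the range. */
static struct model_pgmap *model_get_pgmap(struct model_pgmap *map,
					   unsigned long pfn)
{
	if (pfn < map->start_pfn || pfn >= map->start_pfn + map->nr_pages)
		return NULL;
	map->refs++;
	return map;
}

/* Model of pgmap_pfn_valid(): the pfn must lie at or after first_valid. */
static bool model_pfn_valid(const struct model_pgmap *map, unsigned long pfn)
{
	return pfn >= map->first_valid &&
	       pfn < map->start_pfn + map->nr_pages;
}

/* Model of put_dev_pagemap() plus the caller resetting its pointer. */
static void model_put_pgmap(struct model_pgmap **ref)
{
	assert(*ref && (*ref)->refs > 0);
	(*ref)->refs--;
	*ref = NULL;
}

/*
 * Model of pfn_to_devmap_page(): returns true when a page would be returned.
 * *ref is non-NULL whenever a reference was taken, even on a false return,
 * so the caller must always check *ref and drop it.
 */
static bool model_pfn_to_devmap_page(struct model_pgmap *map,
				     unsigned long pfn,
				     struct model_pgmap **ref)
{
	*ref = model_get_pgmap(map, pfn);
	return *ref && model_pfn_valid(*ref, pfn);
}
```

A caller in this model follows the same pattern as the /proc/kpage* readers
in patch 2: look up, use the result, then drop the reference if one was
taken.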


2022-01-12 14:46:36

by Xiongwei Song

Subject: [PATCH v3 2/2] proc: Add getting pages info of ZONE_DEVICE support

From: Xiongwei Song <[email protected]>

When requesting page info via /proc/kpage*, the pages in ZONE_DEVICE were
ignored.

The pfn_to_devmap_page() function helps to get a page that belongs to
ZONE_DEVICE.

Signed-off-by: Xiongwei Song <[email protected]>
---
V3: Reset pgmap to NULL after putting dev_pagemap to prevent false non-NULL.
---
fs/proc/page.c | 41 ++++++++++++++++++++++++++++-------------
1 file changed, 28 insertions(+), 13 deletions(-)

diff --git a/fs/proc/page.c b/fs/proc/page.c
index 9f1077d94cde..d4fc308765f5 100644
--- a/fs/proc/page.c
+++ b/fs/proc/page.c
@@ -15,6 +15,7 @@
#include <linux/page_idle.h>
#include <linux/kernel-page-flags.h>
#include <linux/uaccess.h>
+#include <linux/memremap.h>
#include "internal.h"

#define KPMSIZE sizeof(u64)
@@ -46,6 +47,7 @@ static ssize_t kpagecount_read(struct file *file, char __user *buf,
{
const unsigned long max_dump_pfn = get_max_dump_pfn();
u64 __user *out = (u64 __user *)buf;
+ struct dev_pagemap *pgmap = NULL;
struct page *ppage;
unsigned long src = *ppos;
unsigned long pfn;
@@ -60,17 +62,20 @@ static ssize_t kpagecount_read(struct file *file, char __user *buf,
count = min_t(unsigned long, count, (max_dump_pfn * KPMSIZE) - src);

while (count > 0) {
- /*
- * TODO: ZONE_DEVICE support requires to identify
- * memmaps that were actually initialized.
- */
ppage = pfn_to_online_page(pfn);
+ if (!ppage)
+ ppage = pfn_to_devmap_page(pfn, &pgmap);

if (!ppage || PageSlab(ppage) || page_has_type(ppage))
pcount = 0;
else
pcount = page_mapcount(ppage);

+ if (pgmap) {
+ put_dev_pagemap(pgmap);
+ pgmap = NULL;
+ }
+
if (put_user(pcount, out)) {
ret = -EFAULT;
break;
@@ -229,10 +234,12 @@ static ssize_t kpageflags_read(struct file *file, char __user *buf,
{
const unsigned long max_dump_pfn = get_max_dump_pfn();
u64 __user *out = (u64 __user *)buf;
+ struct dev_pagemap *pgmap = NULL;
struct page *ppage;
unsigned long src = *ppos;
unsigned long pfn;
ssize_t ret = 0;
+ u64 flags;

pfn = src / KPMSIZE;
if (src & KPMMASK || count & KPMMASK)
@@ -242,13 +249,17 @@ static ssize_t kpageflags_read(struct file *file, char __user *buf,
count = min_t(unsigned long, count, (max_dump_pfn * KPMSIZE) - src);

while (count > 0) {
- /*
- * TODO: ZONE_DEVICE support requires to identify
- * memmaps that were actually initialized.
- */
ppage = pfn_to_online_page(pfn);
+ if (!ppage)
+ ppage = pfn_to_devmap_page(pfn, &pgmap);
+
+ flags = stable_page_flags(ppage);
+ if (pgmap) {
+ put_dev_pagemap(pgmap);
+ pgmap = NULL;
+ }

- if (put_user(stable_page_flags(ppage), out)) {
+ if (put_user(flags, out)) {
ret = -EFAULT;
break;
}
@@ -277,6 +288,7 @@ static ssize_t kpagecgroup_read(struct file *file, char __user *buf,
{
const unsigned long max_dump_pfn = get_max_dump_pfn();
u64 __user *out = (u64 __user *)buf;
+ struct dev_pagemap *pgmap = NULL;
struct page *ppage;
unsigned long src = *ppos;
unsigned long pfn;
@@ -291,17 +303,20 @@ static ssize_t kpagecgroup_read(struct file *file, char __user *buf,
count = min_t(unsigned long, count, (max_dump_pfn * KPMSIZE) - src);

while (count > 0) {
- /*
- * TODO: ZONE_DEVICE support requires to identify
- * memmaps that were actually initialized.
- */
ppage = pfn_to_online_page(pfn);
+ if (!ppage)
+ ppage = pfn_to_devmap_page(pfn, &pgmap);

if (ppage)
ino = page_cgroup_ino(ppage);
else
ino = 0;

+ if (pgmap) {
+ put_dev_pagemap(pgmap);
+ pgmap = NULL;
+ }
+
if (put_user(ino, out)) {
ret = -EFAULT;
break;
--
2.30.2
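
For context, a userspace consumer such as page-types reads these files as
arrays of 64-bit entries indexed by pfn. The following is a minimal sketch
of that access pattern, assuming the documented /proc/kpageflags layout and
KPF_RESERVED being bit 32 (which matches the 0x0000000100000000 value in the
cover letter). The helper names are hypothetical, and reading the file
normally requires root.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define KPM_ENTRY_SIZE   sizeof(uint64_t) /* one u64 per pfn */
#define KPF_RESERVED_BIT 32               /* assumed from kernel-page-flags.h */

/* Byte offset of the entry for @pfn in a /proc/kpage* file. */
static off_t kpm_offset(uint64_t pfn)
{
	return (off_t)(pfn * KPM_ENTRY_SIZE);
}

/* True if PG_reserved is the only flag reported for the page. */
static bool kpf_only_reserved(uint64_t flags)
{
	return flags == (1ULL << KPF_RESERVED_BIT);
}

/*
 * Read the flags of @pfn from /proc/kpageflags.
 * Returns 0 on success, -1 if the file cannot be opened or read.
 */
static int read_kpageflags(uint64_t pfn, uint64_t *flags)
{
	ssize_t n;
	int fd;

	assert(flags != NULL);

	fd = open("/proc/kpageflags", O_RDONLY);
	if (fd < 0)
		return -1;

	n = pread(fd, flags, sizeof(*flags), kpm_offset(pfn));
	close(fd);

	return n == (ssize_t)sizeof(*flags) ? 0 : -1;
}
```

With this series applied, entries for ZONE_DEVICE pfns would be expected to
report the reserved flag rather than reading back as zero.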


2022-01-13 15:31:25

by Michal Hocko

Subject: Re: [PATCH v3 2/2] proc: Add getting pages info of ZONE_DEVICE support

On Wed 12-01-22 22:35:17, [email protected] wrote:
> From: Xiongwei Song <[email protected]>
>
> When requesting pages info by /proc/kpage*, the pages in ZONE_DEVICE were
> ignored.
>
> The pfn_to_devmap_page() function can help to get page that belongs to
> ZONE_DEVICE.

Why is this needed? Who would consume that information and what for?
--
Michal Hocko
SUSE Labs

2022-01-14 10:03:35

by Xiongwei Song

Subject: Re: [PATCH v3 2/2] proc: Add getting pages info of ZONE_DEVICE support

Hi Michal,

On Thu, Jan 13, 2022 at 11:31 PM Michal Hocko <[email protected]> wrote:
>
> On Wed 12-01-22 22:35:17, [email protected] wrote:
> > From: Xiongwei Song <[email protected]>
> >
> > When requesting pages info by /proc/kpage*, the pages in ZONE_DEVICE were
> > ignored.
> >
> > The pfn_to_devmap_page() function can help to get page that belongs to
> > ZONE_DEVICE.
>
> Why is this needed? Who would consume that information and what for?

It's for debugging purposes: checking page flags system-wide. There is no
other special motivation. But from my understanding of David's comment, it
does not look appropriate to expose this now.
https://lore.kernel.org/linux-mm/[email protected]/T/#m4eccbb2698dbebc80ec00be47382b34b0f64b4fc

Regards,
Xiongwei

2022-01-14 10:18:50

by Michal Hocko

Subject: Re: [PATCH v3 2/2] proc: Add getting pages info of ZONE_DEVICE support

On Fri 14-01-22 18:03:04, Xiongwei Song wrote:
> HI Michal,
>
> On Thu, Jan 13, 2022 at 11:31 PM Michal Hocko <[email protected]> wrote:
> >
> > On Wed 12-01-22 22:35:17, [email protected] wrote:
> > > From: Xiongwei Song <[email protected]>
> > >
> > > When requesting pages info by /proc/kpage*, the pages in ZONE_DEVICE were
> > > ignored.
> > >
> > > The pfn_to_devmap_page() function can help to get page that belongs to
> > > ZONE_DEVICE.
> >
> > Why is this needed? Who would consume that information and what for?
>
> It's for debug purpose, which checks page flags in system wide. No any other
> special thought. But it looks like it's not appropriate to expose now from my
> understand, which is from David's comment.
> https://lore.kernel.org/linux-mm/[email protected]/T/#m4eccbb2698dbebc80ec00be47382b34b0f64b4fc

Yes, I do agree with David. This is the reason I am asking, because I
remember that we deliberately excluded those pages. If there is no real
user of that information, then I do not think we want to make the
code more complex and check for memmap and other peculiarities.

Thanks!
--
Michal Hocko
SUSE Labs