2017-09-22 20:14:11

by Reza Arbab

[permalink] [raw]
Subject: [PATCH] mm/device-public-memory: Enable move_pages() to stat device memory

The move_pages() syscall can be used to find the numa node where a page
currently resides. This is not working for device public memory pages,
which erroneously report -EFAULT (unmapped or zero page).

Enable by adding a FOLL_DEVICE flag for follow_page(), which
move_pages() will use. This could be done unconditionally, but adding a
flag seems like a safer change.

Cc: Jérôme Glisse <[email protected]>
Signed-off-by: Reza Arbab <[email protected]>
---
include/linux/mm.h | 1 +
mm/gup.c | 2 +-
mm/migrate.c | 2 +-
3 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index f8c10d3..783cb57 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2368,6 +2368,7 @@ static inline struct page *follow_page(struct vm_area_struct *vma,
#define FOLL_MLOCK 0x1000 /* lock present pages */
#define FOLL_REMOTE 0x2000 /* we are working on non-current tsk/mm */
#define FOLL_COW 0x4000 /* internal GUP flag */
+#define FOLL_DEVICE 0x8000 /* return device pages */

static inline int vm_fault_to_errno(int vm_fault, int foll_flags)
{
diff --git a/mm/gup.c b/mm/gup.c
index b2b4d42..6fbad70 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -110,7 +110,7 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
return NULL;
}

- page = vm_normal_page(vma, address, pte);
+ page = _vm_normal_page(vma, address, pte, flags & FOLL_DEVICE);
if (!page && pte_devmap(pte) && (flags & FOLL_GET)) {
/*
* Only return device mapping pages in the FOLL_GET case since
diff --git a/mm/migrate.c b/mm/migrate.c
index 6954c14..dea0ceb 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1690,7 +1690,7 @@ static void do_pages_stat_array(struct mm_struct *mm, unsigned long nr_pages,
goto set_status;

/* FOLL_DUMP to ignore special (like zero) pages */
- page = follow_page(vma, addr, FOLL_DUMP);
+ page = follow_page(vma, addr, FOLL_DUMP | FOLL_DEVICE);

err = PTR_ERR(page);
if (IS_ERR(page))
--
1.8.3.1


2017-09-22 20:32:09

by Reza Arbab

[permalink] [raw]
Subject: Re: [PATCH] mm/device-public-memory: Enable move_pages() to stat device memory

On Fri, Sep 22, 2017 at 08:13:56PM +0000, Reza Arbab wrote:
>The move_pages() syscall can be used to find the numa node where a page
>currently resides. This is not working for device public memory pages,
>which erroneously report -EFAULT (unmapped or zero page).

Argh. Please disregard this patch.

My test setup has a chunk of system memory carved out as pretend device
public memory, to experiment with. Of course the real thing has no numa
node!

Apologies all, it's been a long day.

--
Reza Arbab

2017-09-22 21:01:25

by Reza Arbab

[permalink] [raw]
Subject: Re: [PATCH] mm/device-public-memory: Enable move_pages() to stat device memory

On Fri, Sep 22, 2017 at 08:31:57PM +0000, Reza Arbab wrote:
>On Fri, Sep 22, 2017 at 08:13:56PM +0000, Reza Arbab wrote:
>>The move_pages() syscall can be used to find the numa node where a page
>>currently resides. This is not working for device public memory pages,
>>which erroneously report -EFAULT (unmapped or zero page).
>
>Argh. Please disregard this patch.
>
>My test setup has a chunk of system memory carved out as pretend
>device public memory, to experiment with. Of course the real thing has
>no numa node!

On third thought, yes it does!

static int hmm_devmem_pages_create(struct hmm_devmem *devmem)
{
:
nid = dev_to_node(device);
if (nid < 0)
nid = numa_mem_id();
:
if (devmem->pagemap.type == MEMORY_DEVICE_PUBLIC)
ret = arch_add_memory(nid, align_start, align_size, false);
:
}

So now I think the patch may be right after all. Please un-disregard it.
Regard it? Whatever.

--
Reza Arbab

2017-09-26 13:37:11

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH] mm/device-public-memory: Enable move_pages() to stat device memory

On Fri 22-09-17 15:13:56, Reza Arbab wrote:
> The move_pages() syscall can be used to find the numa node where a page
> currently resides. This is not working for device public memory pages,
> which erroneously report -EFAULT (unmapped or zero page).
>
> Enable by adding a FOLL_DEVICE flag for follow_page(), which
> move_pages() will use. This could be done unconditionally, but adding a
> flag seems like a safer change.

I do not understand purpose of this patch. What is the numa node of a
device memory?

> Cc: J?r?me Glisse <[email protected]>
> Signed-off-by: Reza Arbab <[email protected]>
> ---
> include/linux/mm.h | 1 +
> mm/gup.c | 2 +-
> mm/migrate.c | 2 +-
> 3 files changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index f8c10d3..783cb57 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2368,6 +2368,7 @@ static inline struct page *follow_page(struct vm_area_struct *vma,
> #define FOLL_MLOCK 0x1000 /* lock present pages */
> #define FOLL_REMOTE 0x2000 /* we are working on non-current tsk/mm */
> #define FOLL_COW 0x4000 /* internal GUP flag */
> +#define FOLL_DEVICE 0x8000 /* return device pages */
>
> static inline int vm_fault_to_errno(int vm_fault, int foll_flags)
> {
> diff --git a/mm/gup.c b/mm/gup.c
> index b2b4d42..6fbad70 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -110,7 +110,7 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
> return NULL;
> }
>
> - page = vm_normal_page(vma, address, pte);
> + page = _vm_normal_page(vma, address, pte, flags & FOLL_DEVICE);
> if (!page && pte_devmap(pte) && (flags & FOLL_GET)) {
> /*
> * Only return device mapping pages in the FOLL_GET case since
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 6954c14..dea0ceb 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1690,7 +1690,7 @@ static void do_pages_stat_array(struct mm_struct *mm, unsigned long nr_pages,
> goto set_status;
>
> /* FOLL_DUMP to ignore special (like zero) pages */
> - page = follow_page(vma, addr, FOLL_DUMP);
> + page = follow_page(vma, addr, FOLL_DUMP | FOLL_DEVICE);
>
> err = PTR_ERR(page);
> if (IS_ERR(page))
> --
> 1.8.3.1
>

--
Michal Hocko
SUSE Labs

2017-09-26 14:47:21

by Reza Arbab

[permalink] [raw]
Subject: Re: [PATCH] mm/device-public-memory: Enable move_pages() to stat device memory

On Tue, Sep 26, 2017 at 01:37:07PM +0000, Michal Hocko wrote:
>On Fri 22-09-17 15:13:56, Reza Arbab wrote:
>> The move_pages() syscall can be used to find the numa node where a page
>> currently resides. This is not working for device public memory pages,
>> which erroneously report -EFAULT (unmapped or zero page).
>>
>> Enable by adding a FOLL_DEVICE flag for follow_page(), which
>> move_pages() will use. This could be done unconditionally, but adding a
>> flag seems like a safer change.
>
>I do not understand purpose of this patch. What is the numa node of a
>device memory?

Well, using hmm_devmem_pages_create() it is added to this node:

nid = dev_to_node(device);
if (nid < 0)
nid = numa_mem_id();

I understand it's minimally useful information to userspace, but the
memory does have a nid and move_pages() is supposed to be able to return
what that is. I ran into this using a testcase which tries to verify
that user addresses were correctly migrated to coherent device memory.

That said, I'm okay with dropping this if you don't think it's
worthwhile.

--
Reza Arbab

2017-09-26 16:20:00

by Jerome Glisse

[permalink] [raw]
Subject: Re: [PATCH] mm/device-public-memory: Enable move_pages() to stat device memory

On Tue, Sep 26, 2017 at 09:47:10AM -0500, Reza Arbab wrote:
> On Tue, Sep 26, 2017 at 01:37:07PM +0000, Michal Hocko wrote:
> > On Fri 22-09-17 15:13:56, Reza Arbab wrote:
> > > The move_pages() syscall can be used to find the numa node where a page
> > > currently resides. This is not working for device public memory pages,
> > > which erroneously report -EFAULT (unmapped or zero page).
> > >
> > > Enable by adding a FOLL_DEVICE flag for follow_page(), which
> > > move_pages() will use. This could be done unconditionally, but adding a
> > > flag seems like a safer change.
> >
> > I do not understand purpose of this patch. What is the numa node of a
> > device memory?
>
> Well, using hmm_devmem_pages_create() it is added to this node:
>
> nid = dev_to_node(device);
> if (nid < 0)
> nid = numa_mem_id();
>
> I understand it's minimally useful information to userspace, but the memory
> does have a nid and move_pages() is supposed to be able to return what that
> is. I ran into this using a testcase which tries to verify that user
> addresses were correctly migrated to coherent device memory.
>
> That said, I'm okay with dropping this if you don't think it's worthwhile.

Just to add a data point, PCIE devices are tie to one CPU (architecturaly PCIE
lane are connected to CPU at least on x86/ppc AFAIK) and thus to one numa node.


Right now i am traveling but i want to check that this patch does not allow
user to inadvertaly pin device memory page. I will look into it once i am
back.

Cheers,
J?r?me

2017-09-26 16:32:46

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH] mm/device-public-memory: Enable move_pages() to stat device memory

On Tue 26-09-17 09:47:10, Reza Arbab wrote:
> On Tue, Sep 26, 2017 at 01:37:07PM +0000, Michal Hocko wrote:
> > On Fri 22-09-17 15:13:56, Reza Arbab wrote:
> > > The move_pages() syscall can be used to find the numa node where a page
> > > currently resides. This is not working for device public memory pages,
> > > which erroneously report -EFAULT (unmapped or zero page).
> > >
> > > Enable by adding a FOLL_DEVICE flag for follow_page(), which
> > > move_pages() will use. This could be done unconditionally, but adding a
> > > flag seems like a safer change.
> >
> > I do not understand purpose of this patch. What is the numa node of a
> > device memory?
>
> Well, using hmm_devmem_pages_create() it is added to this node:
>
> nid = dev_to_node(device);
> if (nid < 0)
> nid = numa_mem_id();

OK, but do all the HMM devices have concept of NUMA affinity? From the
code you are pasting they do not have to...

> I understand it's minimally useful information to userspace, but the memory
> does have a nid and move_pages() is supposed to be able to return what that
> is. I ran into this using a testcase which tries to verify that user
> addresses were correctly migrated to coherent device memory.
>
> That said, I'm okay with dropping this if you don't think it's worthwhile.

I am just worried that we allow information which is not generally
sensible and I am also not sure what the userspace can actually do with
that information.
--
Michal Hocko
SUSE Labs

2017-09-26 18:35:33

by Reza Arbab

[permalink] [raw]
Subject: Re: [PATCH] mm/device-public-memory: Enable move_pages() to stat device memory

On Tue, Sep 26, 2017 at 04:32:41PM +0000, Michal Hocko wrote:
>On Tue 26-09-17 09:47:10, Reza Arbab wrote:
>> On Tue, Sep 26, 2017 at 01:37:07PM +0000, Michal Hocko wrote:
>> > On Fri 22-09-17 15:13:56, Reza Arbab wrote:
>> > > The move_pages() syscall can be used to find the numa node where a page
>> > > currently resides. This is not working for device public memory pages,
>> > > which erroneously report -EFAULT (unmapped or zero page).
>> > >
>> > > Enable by adding a FOLL_DEVICE flag for follow_page(), which
>> > > move_pages() will use. This could be done unconditionally, but adding a
>> > > flag seems like a safer change.
>> >
>> > I do not understand purpose of this patch. What is the numa node of a
>> > device memory?
>>
>> Well, using hmm_devmem_pages_create() it is added to this node:
>>
>> nid = dev_to_node(device);
>> if (nid < 0)
>> nid = numa_mem_id();
>
>OK, but do all the HMM devices have concept of NUMA affinity? From the
>code you are pasting they do not have to...

I don't know the definitive answer here, but as Jerome said PCIE devices
should, and we are heading that way with NVLink/CAPI as well. It seems
the default is just the nearest node.

>> I understand it's minimally useful information to userspace, but the memory
>> does have a nid and move_pages() is supposed to be able to return what that
>> is. I ran into this using a testcase which tries to verify that user
>> addresses were correctly migrated to coherent device memory.
>>
>> That said, I'm okay with dropping this if you don't think it's worthwhile.
>
>I am just worried that we allow information which is not generally
>sensible and I am also not sure what the userspace can actually do with
>that information.

As mentioned, it is minimally useful, e.g. for verifying migration, so
returning the nid seems sensible to me. Alternatively, we might at least
change the documentation to say

-EFAULT
This is a zero page, a device page, or the memory area is not mapped by the process.
^^^^^^^^^^^^^

--
Reza Arbab