2017-09-26 07:57:13

by Xishi Qiu

Subject: [RFC] a question about mlockall() and mprotect()

When we call mlockall(), we will add VM_LOCKED to the vma,
if the vma prot is ---p, then mm_populate -> get_user_pages
will not alloc memory.

I found that it says "Ignore errors" in mm_populate():
static inline void mm_populate(unsigned long addr, unsigned long len)
{
	/* Ignore errors */
	(void) __mm_populate(addr, len, 1);
}

And if we later call mprotect() to change the prot, memory is
still not allocated for the mlocked vma.

My question is: should memory be allocated when the prot changes,
and who (kernel, glibc, user) should allocate it?

Thanks,
Xishi Qiu


2017-09-26 08:17:25

by Michal Hocko

Subject: Re: [RFC] a question about mlockall() and mprotect()

On Tue 26-09-17 15:56:55, Xishi Qiu wrote:
> When we call mlockall(), we will add VM_LOCKED to the vma,
> if the vma prot is ---p,

Not sure what you mean here. apply_mlockall_flags will set the flag on
all vmas except for special mappings (mlock_fixup). This phase ensures
that memory reclaim will not free already mapped pages in those vmas
(see page_check_references and the lazy move of mlocked pages to the
unevictable LRUs).

> then mm_populate -> get_user_pages will not alloc memory.

mm_populate populates all the vmas with pages. Well, there are certainly
some constraints - e.g. the memory cgroup hard limit might be hit and so
the faulting might fail.

> I find it said "ignore errors" in mm_populate()
> static inline void mm_populate(unsigned long addr, unsigned long len)
> {
> /* Ignore errors */
> (void) __mm_populate(addr, len, 1);
> }

But we do not report the failure because any failure past
apply_mlockall_flags would be tricky to handle. We have already dropped
the mmap_sem lock so some other address space operations could have
interfered.

> And later we call mprotect() to change the prot, then it is
> still not alloc memory for the mlocked vma.
>
> My question is that, shall we alloc memory if the prot changed,
> and who(kernel, glibc, user) should alloc the memory?

I do not understand your question but if you are asking how to get pages
to map your vmas then touching that area will fault the memory in.
--
Michal Hocko
SUSE Labs
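
A minimal user-space sketch of the "touch that area" advice above, assuming
the range is already readable and writable. The helper name prefault_write()
is hypothetical (it is not a kernel or libc interface). It writes each byte
back to itself, so a private mapping gets its own page rather than a shared
zero page; that read-modify-write is not atomic, so it is only safe while no
other thread is writing to the range.

```c
#include <stddef.h>
#include <unistd.h>

/*
 * Touch one byte in every page of [addr, addr + len) with a write, so
 * each page is faulted in now rather than on first real use.
 * Returns the number of pages touched.
 * prefault_write() is a hypothetical helper name used for illustration.
 */
static size_t prefault_write(void *addr, size_t len)
{
	volatile unsigned char *p = addr;
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	size_t touched = 0;

	for (size_t off = 0; off < len; off += page) {
		p[off] = p[off];	/* volatile keeps the access from being optimized away */
		touched++;
	}
	return touched;
}
```

Calling it right after mprotect() has made the range writable moves the
fault cost to a point the application controls.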

2017-09-26 08:40:12

by Xishi Qiu

Subject: Re: [RFC] a question about mlockall() and mprotect()

On 2017/9/26 16:17, Michal Hocko wrote:

> On Tue 26-09-17 15:56:55, Xishi Qiu wrote:
>> When we call mlockall(), we will add VM_LOCKED to the vma,
>> if the vma prot is ---p,
>
> not sure what you mean here. apply_mlockall_flags will set the flag on
> all vmas except for special mappings (mlock_fixup). This phase will
> cause that memory reclaim will not free already mapped pages in those
> vmas (see page_check_references and the lazy mlock pages move to
> unevictable LRUs).
>
>> then mm_populate -> get_user_pages will not alloc memory.
>
> mm_populate all the vmas with pages. Well there are certainly some
> constraints - e.g. memory cgroup hard limit might be hit and so the
> faulting might fail.
>
>> I find it said "ignore errors" in mm_populate()
>> static inline void mm_populate(unsigned long addr, unsigned long len)
>> {
>> /* Ignore errors */
>> (void) __mm_populate(addr, len, 1);
>> }
>
> But we do not report the failure because any failure past
> apply_mlockall_flags would be tricky to handle. We have already dropped
> the mmap_sem lock so some other address space operations could have
> interfered.
>
>> And later we call mprotect() to change the prot, then it is
>> still not alloc memory for the mlocked vma.
>>
>> My question is that, shall we alloc memory if the prot changed,
>> and who(kernel, glibc, user) should alloc the memory?
>
> I do not understand your question but if you are asking how to get pages
> to map your vmas then touching that area will fault the memory in.

Hi Michal,

syscall mlockall() will first apply the VM_LOCKED to the vma, then
call mm_populate() to map the vmas.

mm_populate
	populate_vma_page_range
		__get_user_pages
			check_vma_flags
And the above path may return -EFAULT in some cases, right?

If we call mprotect() to change the prot of vma, just let
check_vma_flags() return 0, then we will get the mlocked pages
in following page-fault, right?

My question is: should we map the vmas immediately when the prot
changes? If so, who (kernel, glibc, user) should do this step?

Thanks,
Xishi Qiu

2017-09-26 09:03:02

by Michal Hocko

Subject: Re: [RFC] a question about mlockall() and mprotect()

On Tue 26-09-17 16:39:56, Xishi Qiu wrote:
> On 2017/9/26 16:17, Michal Hocko wrote:
>
> > On Tue 26-09-17 15:56:55, Xishi Qiu wrote:
> >> When we call mlockall(), we will add VM_LOCKED to the vma,
> >> if the vma prot is ---p,
> >
> > not sure what you mean here. apply_mlockall_flags will set the flag on
> > all vmas except for special mappings (mlock_fixup). This phase will
> > cause that memory reclaim will not free already mapped pages in those
> > vmas (see page_check_references and the lazy mlock pages move to
> > unevictable LRUs).
> >
> >> then mm_populate -> get_user_pages will not alloc memory.
> >
> > mm_populate all the vmas with pages. Well there are certainly some
> > constraints - e.g. memory cgroup hard limit might be hit and so the
> > faulting might fail.
> >
> >> I find it said "ignore errors" in mm_populate()
> >> static inline void mm_populate(unsigned long addr, unsigned long len)
> >> {
> >> /* Ignore errors */
> >> (void) __mm_populate(addr, len, 1);
> >> }
> >
> > But we do not report the failure because any failure past
> > apply_mlockall_flags would be tricky to handle. We have already dropped
> > the mmap_sem lock so some other address space operations could have
> > interfered.
> >
> >> And later we call mprotect() to change the prot, then it is
> >> still not alloc memory for the mlocked vma.
> >>
> >> My question is that, shall we alloc memory if the prot changed,
> >> and who(kernel, glibc, user) should alloc the memory?
> >
> > I do not understand your question but if you are asking how to get pages
> > to map your vmas then touching that area will fault the memory in.
>
> Hi Michal,
>
> syscall mlockall() will first apply the VM_LOCKED to the vma, then
> call mm_populate() to map the vmas.
>
> mm_populate
> populate_vma_page_range
> __get_user_pages
> check_vma_flags
> And the above path maybe return -EFAULT in some case, right?
>
> If we call mprotect() to change the prot of vma, just let
> check_vma_flags() return 0, then we will get the mlocked pages
> in following page-fault, right?

Any future page fault to the existing vma will result in the mlocked
page. That is what VM_LOCKED guarantees.

> My question is that, shall we map the vmas immediately when
> the prot changed? If we should map it immediately, who(kernel, glibc, user)
> do this step?

This is still very fuzzy. What are you actually trying to achieve?
--
Michal Hocko
SUSE Labs

2017-09-26 09:14:16

by Xishi Qiu

Subject: Re: [RFC] a question about mlockall() and mprotect()

On 2017/9/26 17:02, Michal Hocko wrote:

> On Tue 26-09-17 16:39:56, Xishi Qiu wrote:
>> On 2017/9/26 16:17, Michal Hocko wrote:
>>
>>> On Tue 26-09-17 15:56:55, Xishi Qiu wrote:
>>>> When we call mlockall(), we will add VM_LOCKED to the vma,
>>>> if the vma prot is ---p,
>>>
>>> not sure what you mean here. apply_mlockall_flags will set the flag on
>>> all vmas except for special mappings (mlock_fixup). This phase will
>>> cause that memory reclaim will not free already mapped pages in those
>>> vmas (see page_check_references and the lazy mlock pages move to
>>> unevictable LRUs).
>>>
>>>> then mm_populate -> get_user_pages will not alloc memory.
>>>
>>> mm_populate all the vmas with pages. Well there are certainly some
>>> constraints - e.g. memory cgroup hard limit might be hit and so the
>>> faulting might fail.
>>>
>>>> I find it said "ignore errors" in mm_populate()
>>>> static inline void mm_populate(unsigned long addr, unsigned long len)
>>>> {
>>>> /* Ignore errors */
>>>> (void) __mm_populate(addr, len, 1);
>>>> }
>>>
>>> But we do not report the failure because any failure past
>>> apply_mlockall_flags would be tricky to handle. We have already dropped
>>> the mmap_sem lock so some other address space operations could have
>>> interfered.
>>>
>>>> And later we call mprotect() to change the prot, then it is
>>>> still not alloc memory for the mlocked vma.
>>>>
>>>> My question is that, shall we alloc memory if the prot changed,
>>>> and who(kernel, glibc, user) should alloc the memory?
>>>
>>> I do not understand your question but if you are asking how to get pages
>>> to map your vmas then touching that area will fault the memory in.
>>
>> Hi Michal,
>>
>> syscall mlockall() will first apply the VM_LOCKED to the vma, then
>> call mm_populate() to map the vmas.
>>
>> mm_populate
>> populate_vma_page_range
>> __get_user_pages
>> check_vma_flags
>> And the above path maybe return -EFAULT in some case, right?
>>
>> If we call mprotect() to change the prot of vma, just let
>> check_vma_flags() return 0, then we will get the mlocked pages
>> in following page-fault, right?
>
> Any future page fault to the existing vma will result in the mlocked
> page. That is what VM_LOCKED guarantees.
>
>> My question is that, shall we map the vmas immediately when
>> the prot changed? If we should map it immediately, who(kernel, glibc, user)
>> do this step?
>
> This is still very fuzzy. What are you actually trying to achieve?

I don't expect any more page faults after mlock.

2017-09-26 09:18:41

by Michal Hocko

Subject: Re: [RFC] a question about mlockall() and mprotect()

On Tue 26-09-17 17:13:59, Xishi Qiu wrote:
> On 2017/9/26 17:02, Michal Hocko wrote:
[...]
> > This is still very fuzzy. What are you actually trying to achieve?
>
> I don't expect page fault any more after mlock.

This should be the case normally. Except when mm_populate fails which
can happen e.g. when running inside a memcg with the hard limit
configured. Is there any other unexpected failure scenario you are
seeing?

--
Michal Hocko
SUSE Labs
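
Since mm_populate() ignores errors, one way an application could check
whether a range really ended up populated is to ask mincore() afterwards.
The following is only a sketch under that assumption; resident_pages() is a
hypothetical helper name, and the result is a snapshot that says nothing
about why a page is missing.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Count how many pages of [addr, addr + len) are currently resident.
 * addr must be page-aligned. Returns -1 if the vector cannot be
 * allocated or mincore() fails.
 * resident_pages() is a hypothetical helper name used for illustration.
 */
static long resident_pages(void *addr, size_t len)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	size_t npages = (len + page - 1) / page;
	unsigned char *vec = malloc(npages ? npages : 1);
	long resident = 0;

	if (!vec)
		return -1;
	if (mincore(addr, len, vec) != 0) {
		free(vec);
		return -1;
	}
	for (size_t i = 0; i < npages; i++)
		resident += vec[i] & 1;	/* bit 0 set: page is resident */
	free(vec);
	return resident;
}
```

Comparing the return value with the number of pages in the range after
mlockall() or mprotect() shows whether any page would still fault on first
access.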

2017-09-26 09:23:06

by Xishi Qiu

Subject: Re: [RFC] a question about mlockall() and mprotect()

On 2017/9/26 17:13, Xishi Qiu wrote:

> On 2017/9/26 17:02, Michal Hocko wrote:
>
>> On Tue 26-09-17 16:39:56, Xishi Qiu wrote:
>>> On 2017/9/26 16:17, Michal Hocko wrote:
>>>
>>>> On Tue 26-09-17 15:56:55, Xishi Qiu wrote:
>>>>> When we call mlockall(), we will add VM_LOCKED to the vma,
>>>>> if the vma prot is ---p,
>>>>
>>>> not sure what you mean here. apply_mlockall_flags will set the flag on
>>>> all vmas except for special mappings (mlock_fixup). This phase will
>>>> cause that memory reclaim will not free already mapped pages in those
>>>> vmas (see page_check_references and the lazy mlock pages move to
>>>> unevictable LRUs).
>>>>
>>>>> then mm_populate -> get_user_pages will not alloc memory.
>>>>
>>>> mm_populate all the vmas with pages. Well there are certainly some
>>>> constraints - e.g. memory cgroup hard limit might be hit and so the
>>>> faulting might fail.
>>>>
>>>>> I find it said "ignore errors" in mm_populate()
>>>>> static inline void mm_populate(unsigned long addr, unsigned long len)
>>>>> {
>>>>> /* Ignore errors */
>>>>> (void) __mm_populate(addr, len, 1);
>>>>> }
>>>>
>>>> But we do not report the failure because any failure past
>>>> apply_mlockall_flags would be tricky to handle. We have already dropped
>>>> the mmap_sem lock so some other address space operations could have
>>>> interfered.
>>>>
>>>>> And later we call mprotect() to change the prot, then it is
>>>>> still not alloc memory for the mlocked vma.
>>>>>
>>>>> My question is that, shall we alloc memory if the prot changed,
>>>>> and who(kernel, glibc, user) should alloc the memory?
>>>>
>>>> I do not understand your question but if you are asking how to get pages
>>>> to map your vmas then touching that area will fault the memory in.
>>>
>>> Hi Michal,
>>>
>>> syscall mlockall() will first apply the VM_LOCKED to the vma, then
>>> call mm_populate() to map the vmas.
>>>
>>> mm_populate
>>> populate_vma_page_range
>>> __get_user_pages
>>> check_vma_flags
>>> And the above path maybe return -EFAULT in some case, right?
>>>
>>> If we call mprotect() to change the prot of vma, just let
>>> check_vma_flags() return 0, then we will get the mlocked pages
>>> in following page-fault, right?
>>
>> Any future page fault to the existing vma will result in the mlocked
>> page. That is what VM_LOCKED guarantees.
>>
>>> My question is that, shall we map the vmas immediately when
>>> the prot changed? If we should map it immediately, who(kernel, glibc, user)
>>> do this step?
>>
>> This is still very fuzzy. What are you actually trying to achieve?
>
> I don't expect page fault any more after mlock.
>

Our app is something like RT (real-time), and a page fault may take a
lot of time, e.g. locking, memory reclaim ..., so I use mlock and don't
want any more page faults.

Thanks,
Xishi Qiu




2017-09-26 09:45:20

by Vlastimil Babka

Subject: Re: [RFC] a question about mlockall() and mprotect()

On 09/26/2017 11:22 AM, Xishi Qiu wrote:
> On 2017/9/26 17:13, Xishi Qiu wrote:
>>> This is still very fuzzy. What are you actually trying to achieve?
>>
>> I don't expect page fault any more after mlock.
>>
>
> Our apps is some thing like RT, and page-fault maybe cause a lot of time,
> e.g. lock, mem reclaim ..., so I use mlock and don't want page fault
> any more.

Why does your app then have restricted mprotect when calling mlockall()
and only later adjusts the mprotect?

Vlastimil

2017-09-26 11:00:16

by Michal Hocko

Subject: Re: [RFC] a question about mlockall() and mprotect()

On Tue 26-09-17 11:45:16, Vlastimil Babka wrote:
> On 09/26/2017 11:22 AM, Xishi Qiu wrote:
> > On 2017/9/26 17:13, Xishi Qiu wrote:
> >>> This is still very fuzzy. What are you actually trying to achieve?
> >>
> >> I don't expect page fault any more after mlock.
> >>
> >
> > Our apps is some thing like RT, and page-fault maybe cause a lot of time,
> > e.g. lock, mem reclaim ..., so I use mlock and don't want page fault
> > any more.
>
> Why does your app then have restricted mprotect when calling mlockall()
> and only later adjusts the mprotect?

Ahh, OK I see what is going on. So you have a PROT_NONE vma at the time
of mlockall, then later mprotect it to something else, and want to fault
in all that memory at mprotect time?

So basically to do
---
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 6d3e2f082290..b665b5d1c544 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -369,7 +369,7 @@ mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev,
 	 * Private VM_LOCKED VMA becoming writable: trigger COW to avoid major
 	 * fault on access.
 	 */
-	if ((oldflags & (VM_WRITE | VM_SHARED | VM_LOCKED)) == VM_LOCKED &&
+	if ((oldflags & (VM_WRITE | VM_LOCKED)) == VM_LOCKED &&
 			(newflags & VM_WRITE)) {
 		populate_vma_page_range(vma, start, end, NULL);
 	}

--
Michal Hocko
SUSE Labs

2017-09-27 05:51:24

by Xishi Qiu

Subject: Re: [RFC] a question about mlockall() and mprotect()

On 2017/9/26 19:00, Michal Hocko wrote:

> On Tue 26-09-17 11:45:16, Vlastimil Babka wrote:
>> On 09/26/2017 11:22 AM, Xishi Qiu wrote:
>>> On 2017/9/26 17:13, Xishi Qiu wrote:
>>>>> This is still very fuzzy. What are you actually trying to achieve?
>>>>
>>>> I don't expect page fault any more after mlock.
>>>>
>>>
>>> Our apps is some thing like RT, and page-fault maybe cause a lot of time,
>>> e.g. lock, mem reclaim ..., so I use mlock and don't want page fault
>>> any more.
>>
>> Why does your app then have restricted mprotect when calling mlockall()
>> and only later adjusts the mprotect?
>
> Ahh, OK I see what is going on. So you have a PROT_NONE vma at the time
> of mlockall, then later mprotect it to something else, and want to fault
> in all that memory at mprotect time?
>
> So basically to do
> ---
> diff --git a/mm/mprotect.c b/mm/mprotect.c
> index 6d3e2f082290..b665b5d1c544 100644
> --- a/mm/mprotect.c
> +++ b/mm/mprotect.c
> @@ -369,7 +369,7 @@ mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev,
>  	 * Private VM_LOCKED VMA becoming writable: trigger COW to avoid major
>  	 * fault on access.
>  	 */
> -	if ((oldflags & (VM_WRITE | VM_SHARED | VM_LOCKED)) == VM_LOCKED &&
> +	if ((oldflags & (VM_WRITE | VM_LOCKED)) == VM_LOCKED &&
>  			(newflags & VM_WRITE)) {
>  		populate_vma_page_range(vma, start, end, NULL);
>  	}
>

Hi Michal,

My kernel is v3.10, and I missed this code. Thank you for reminding me.

Thanks,
Xishi Qiu