by Eric B Munson

Subject: Re: [PATCH V5 0/7] Allow user to request memory to be locked on page fault

On Mon, 27 Jul 2015, Vlastimil Babka wrote:

> On 07/24/2015 11:28 PM, Eric B Munson wrote:
>
> ...
>
> >Changes from V4:
> >Drop all architectures for new sys call entries except x86[_64] and MIPS
> >Drop munlock2 and munlockall2
> >Make VM_LOCKONFAULT a modifier to VM_LOCKED only to simplify book keeping
> >Adjust tests to match
>
> Hi, thanks for considering my suggestions. Well, I do hope they
> were correct as APIs are hard and I'm no API expert. But since
> APIs are also impossible to change after merging, I'm sorry but
> I'll keep pestering for one last thing. Thanks again for persisting,
> I do believe it's for the good thing!
>
> The thing is that I still don't like that one has to call
> mlock2(MLOCK_LOCKED) to get the equivalent of the old mlock(). Why
> is that flag needed? We have two modes of locking now, and v5 no
> longer treats them separately in vma flags. But having two flags
> gives us four possible combinations, so two of them would serve
> nothing but to confuse the programmer IMHO. What will mlock2()
> without flags do? What will mlock2(MLOCK_LOCKED | MLOCK_ONFAULT) do?
> (Note I haven't studied the code yet, as having agreed on the API
> should come first. But I did suggest documenting these things more
> thoroughly too...)
> OK I checked now and both cases above seem to return EINVAL.
>
> So about the only point I see in MLOCK_LOCKED flag is parity with
> MAP_LOCKED for mmap(). But as Kirill said (and me before as well)
> MAP_LOCKED is broken anyway so we shouldn't twist the rest of
> the API just to keep the poor thing happier in its misery.
>
> Also note that AFAICS you don't have MCL_LOCKED for mlockall() so
> there's no full parity anyway. But please don't fix that by adding
> MCL_LOCKED :)
>
> Thanks!

I have an MLOCK_LOCKED flag because I prefer an interface to be
explicit. The caller of mlock2() will be required to fill in the flags
argument regardless. I can drop the MLOCK_LOCKED flag with 0 being the
value for LOCKED, but I thought it easier to make clear what was going
on at any call to mlock2(). If user space defines a MLOCK_LOCKED that
happens to be 0, I suppose that would be okay.
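
For reference, the V5 flag handling under discussion can be sketched as a small userspace model. This is only an illustration of the behavior Vlastimil reports (no flags and both flags each return EINVAL), not the patch's actual code; the TOY_ names and bit values are assumptions made up for the example.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical bit values; the real V5 values are not quoted in this thread. */
#define TOY_MLOCK_LOCKED  0x01u
#define TOY_MLOCK_ONFAULT 0x02u

/*
 * Toy model of the V5 check as described in the thread: exactly one of
 * the two mode flags must be set. Returns 0 if the combination would be
 * accepted, -EINVAL otherwise.
 */
static int toy_mlock2_v5_check(unsigned int flags)
{
    const unsigned int known = TOY_MLOCK_LOCKED | TOY_MLOCK_ONFAULT;

    if (flags & ~known)
        return -EINVAL;     /* unknown bits (assumed to be rejected) */
    if (flags == 0 || flags == known)
        return -EINVAL;     /* the two states that "serve nothing" */
    return 0;
}

/* Walk the four combinations of the two flags. */
static void toy_mlock2_v5_examples(void)
{
    assert(toy_mlock2_v5_check(TOY_MLOCK_LOCKED) == 0);
    assert(toy_mlock2_v5_check(TOY_MLOCK_ONFAULT) == 0);
    assert(toy_mlock2_v5_check(0) == -EINVAL);
    assert(toy_mlock2_v5_check(TOY_MLOCK_LOCKED | TOY_MLOCK_ONFAULT) == -EINVAL);
}
```

With two mode bits, half of the four combinations are invalid, which is the objection being raised. If 0 meant "lock and populate", only a single optional bit would remain and every combination would be valid.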

We do actually have an MCL_LOCKED, we just call it MCL_CURRENT. Would
you prefer that I match the name in mlock2() (add MLOCK_CURRENT
instead)?

Finally, on the question of MAP_LOCKONFAULT, do you just dislike
MAP_LOCKED and do not want to see it extended, or is this a NAK on the
set if that patch is included? I ask because I have to spin a V6 to get
the MLOCK flag declarations right, but I would prefer not to do a V7+.
If this is a NAK with that patch included, I can drop it and rework the
tests to cover the feature without the mmap flag. Otherwise I want to
keep it, as I have an internal user that would like to see it added.

2015-07-27 13:41:31

On Mon, 27 Jul 2015, Kirill A. Shutemov wrote:

> On Mon, Jul 27, 2015 at 09:41:26AM -0400, Eric B Munson wrote:
> > On Mon, 27 Jul 2015, Kirill A. Shutemov wrote:
> >
> > > On Fri, Jul 24, 2015 at 05:28:43PM -0400, Eric B Munson wrote:
> > > > The cost of faulting in all memory to be locked can be very high when
> > > > working with large mappings. If only portions of the mapping will be
> > > > used this can incur a high penalty for locking.
> > > >
> > > > Now that we have the new VMA flag for the locked but not present state,
> > > > expose it as an mmap option like MAP_LOCKED -> VM_LOCKED.
> > >
> > > As I mentioned before, I don't think this interface is justified.
> > >
> > > MAP_LOCKED has known issues[1]. The MAP_LOCKED problem does not
> > > necessarily affect MAP_LOCKONFAULT, but still.
> > >
> > > Let's not add new interface unless it's demonstrably useful.
> > >
> > > [1] http://lkml.kernel.org/g/[email protected]
> >
> > I understand and should have been more explicit. This patch is still
> > included because I have an internal user that wants to see it added.
> > The problem discussed in the thread you point out does not affect
> > MAP_LOCKONFAULT because we do not attempt to populate the region with
> > MAP_LOCKONFAULT.
> >
> > As I told Vlastimil, if this is a hard NAK with the patch I can work
> > with that. Otherwise I prefer it stays.
>
> That's not how it works.

I am not sure what you mean here. I have a user that will find this
useful and MAP_LOCKONFAULT does not suffer from the problem you point
out. I do not understand your NAK but thank you for being explicit about it.

>
> Once an ABI is added to the kernel it stays there practically forever.
> Therefore it must be useful to justify maintenance cost. I don't see it
> demonstrated.

I understand this, and I get that you do not like MAP_LOCKED, but I do
not see how your dislike for MAP_LOCKED means that this would not be
useful.

>
> So, NAK.
>

V6 will not have the new mmap flag unless there is someone else that
speaks up in favor of keeping it.

2015-07-27 14:16:42

by Vlastimil Babka

Subject: Re: [PATCH V5 0/7] Allow user to request memory to be locked on page fault

On 07/27/2015 03:35 PM, Eric B Munson wrote:
> On Mon, 27 Jul 2015, Vlastimil Babka wrote:
>
>> On 07/24/2015 11:28 PM, Eric B Munson wrote:
>>
>> ...
>>
>>> Changes from V4:
>>> Drop all architectures for new sys call entries except x86[_64] and MIPS
>>> Drop munlock2 and munlockall2
>>> Make VM_LOCKONFAULT a modifier to VM_LOCKED only to simplify book keeping
>>> Adjust tests to match
>>
>> Hi, thanks for considering my suggestions. Well, I do hope they
>> were correct as APIs are hard and I'm no API expert. But since
>> APIs are also impossible to change after merging, I'm sorry but
>> I'll keep pestering for one last thing. Thanks again for persisting,
>> I do believe it's for the good thing!
>>
>> The thing is that I still don't like that one has to call
>> mlock2(MLOCK_LOCKED) to get the equivalent of the old mlock(). Why
>> is that flag needed? We have two modes of locking now, and v5 no
>> longer treats them separately in vma flags. But having two flags
>> gives us four possible combinations, so two of them would serve
>> nothing but to confuse the programmer IMHO. What will mlock2()
>> without flags do? What will mlock2(MLOCK_LOCKED | MLOCK_ONFAULT) do?
>> (Note I haven't studied the code yet, as having agreed on the API
>> should come first. But I did suggest documenting these things more
>> thoroughly too...)
>> OK I checked now and both cases above seem to return EINVAL.
>>
>> So about the only point I see in MLOCK_LOCKED flag is parity with
>> MAP_LOCKED for mmap(). But as Kirill said (and me before as well)
>> MAP_LOCKED is broken anyway so we shouldn't twist the rest of
>> the API just to keep the poor thing happier in its misery.
>>
>> Also note that AFAICS you don't have MCL_LOCKED for mlockall() so
>> there's no full parity anyway. But please don't fix that by adding
>> MCL_LOCKED :)
>>
>> Thanks!
>
>
> I have an MLOCK_LOCKED flag because I prefer an interface to be
> explicit.

I think it's already explicit enough that the user calls mlock2(), no?
He obviously wants the range mlocked. An optional flag says that there
should be no pre-fault.

> The caller of mlock2() will be required to fill in the flags
> argument regardless.

I guess users not caring about MLOCK_ONFAULT will continue using plain
mlock() without flags anyway.

> I can drop the MLOCK_LOCKED flag with 0 being the
> value for LOCKED, but I thought it easier to make clear what was going
> on at any call to mlock2(). If user space defines a MLOCK_LOCKED that
> happens to be 0, I suppose that would be okay.

Yeah that would remove the weird 4-states-of-which-2-are-invalid problem
I mentioned, but at the cost of glibc wrapper behaving differently than
the kernel syscall itself. For little gain.

> We do actually have an MCL_LOCKED, we just call it MCL_CURRENT. Would
> you prefer that I match the name in mlock2() (add MLOCK_CURRENT
> instead)?

Hm it's similar but not exactly the same, because MCL_FUTURE is not the
same as MLOCK_ONFAULT :) So MLOCK_CURRENT would be even more confusing.
Especially if mlockall(MCL_CURRENT | MCL_FUTURE) is OK, but
mlock2(MLOCK_LOCKED | MLOCK_ONFAULT) is invalid.

> Finally, on the question of MAP_LOCKONFAULT, do you just dislike
> MAP_LOCKED and do not want to see it extended, or is this a NAK on the
> set if that patch is included? I ask because I have to spin a V6 to get
> the MLOCK flag declarations right, but I would prefer not to do a V7+.
> If this is a NAK with that patch included, I can drop it and rework the
> tests to cover the feature without the mmap flag. Otherwise I want to
> keep it, as I have an internal user that would like to see it added.

I don't want to NAK that patch if you think it's useful.

2015-07-27 14:54:13

On Tue, 28 Jul 2015, Michal Hocko wrote:

> [I am sorry but I didn't get to this sooner.]
>
> On Mon 27-07-15 10:54:09, Eric B Munson wrote:
> > Now that VM_LOCKONFAULT is a modifier to VM_LOCKED and
> > cannot be specified independently, it might make more sense to mirror
> > that relationship to userspace. Which would lead to something like the
> > following:
>
> A modifier makes more sense.
>
> > To lock and populate a region:
> > mlock2(start, len, 0);
> >
> > To lock on fault a region:
> > mlock2(start, len, MLOCK_ONFAULT);
> >
> > If LOCKONFAULT is seen as a modifier to mlock, then having a flags
> > argument of 0 mean classic mlock makes more sense to me.
> >
> > To mlock current on fault only:
> > mlockall(MCL_CURRENT | MCL_ONFAULT);
> >
> > To mlock future on fault only:
> > mlockall(MCL_FUTURE | MCL_ONFAULT);
> >
> > To lock everything on fault:
> > mlockall(MCL_CURRENT | MCL_FUTURE | MCL_ONFAULT);
>
> Makes sense to me. The only remaining and still tricky part would be
> the munlock{all}(flags) behavior. What should munlock(MLOCK_ONFAULT)
> do? Keep locked and populate the range or simply ignore the flag and
> just unlock?
>
> I can see some sense to allow munlockall(MCL_FUTURE[|MLOCK_ONFAULT]),
> munlockall(MCL_CURRENT) resp. munlockall(MCL_CURRENT|MCL_FUTURE) but
> other combinations sound weird to me.
>
> Anyway munlock with flags opens new doors of trickiness.

In the current revision there are no new munlock[all] system calls
introduced. munlockall() unconditionally cleared both MCL_CURRENT and
MCL_FUTURE before the set and now unconditionally clears all three.
munlock() does the same for VM_LOCKED and VM_LOCKONFAULT. If the user
wants to adjust mlockall flags today, they need to call mlockall a
second time with the new flags. This remains true for mlockall after
this set, and the same behavior is mirrored in mlock2. The only
remaining question I have is whether we should have 2 new mlockall flags so that
the caller can explicitly set VM_LOCKONFAULT in the mm->def_flags vs
locking all current VMAs on fault. I ask because if the user wants to
lock all current VMAs the old way, but all future VMAs on fault they
have to call mlockall() twice:

mlockall(MCL_CURRENT);
mlockall(MCL_CURRENT | MCL_FUTURE | MCL_ONFAULT);

This has the side effect of converting all the current VMAs to
VM_LOCKONFAULT, but because they were all made present and locked in the
first call, this should not matter in most cases. The catch is that,
like mmap(MAP_LOCKED), mlockall() does not communicate if mm_populate()
fails. This has been true of mlockall() from the beginning so I don't
know if it needs more than an entry in the man page to clarify (which I
will add when I add documentation for MCL_ONFAULT). In a much less
likely corner case, it is not possible in the current setup to request
all current VMAs be VM_LOCKONFAULT and all future be VM_LOCKED.
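
To make the modifier relationship concrete, here is a rough userspace model of the mlock2()/munlock() bookkeeping described in this message: flags of 0 mean classic mlock, MLOCK_ONFAULT modifies it, a second mlock2() call replaces the earlier mode, and munlock() clears both bits. This is a sketch under those assumptions only; the TOY_ names and bit values are invented for illustration and are not taken from the patch set.

```c
#include <assert.h>
#include <errno.h>

/* Illustrative values only; not the kernel's. */
#define TOY_MLOCK_ONFAULT   0x01u   /* userspace flag passed to mlock2() */
#define TOY_VM_LOCKED       0x01u   /* per-VMA bit: range is locked */
#define TOY_VM_LOCKONFAULT  0x02u   /* per-VMA modifier: do not pre-fault */

/* flags == 0 behaves like classic mlock(); TOY_MLOCK_ONFAULT is a modifier. */
static int toy_mlock2(unsigned int *vm_flags, unsigned int flags)
{
    if (flags & ~TOY_MLOCK_ONFAULT)
        return -EINVAL;                 /* unknown bits (assumed rejected) */

    *vm_flags &= ~TOY_VM_LOCKONFAULT;   /* a repeat call replaces the mode */
    *vm_flags |= TOY_VM_LOCKED;
    if (flags & TOY_MLOCK_ONFAULT)
        *vm_flags |= TOY_VM_LOCKONFAULT;
    return 0;
}

/* munlock() takes no flags and clears both bits unconditionally. */
static void toy_munlock(unsigned int *vm_flags)
{
    *vm_flags &= ~(TOY_VM_LOCKED | TOY_VM_LOCKONFAULT);
}

/* Lock on fault, switch to classic locking, then unlock. */
static void toy_mlock2_examples(void)
{
    unsigned int vm = 0;

    assert(toy_mlock2(&vm, TOY_MLOCK_ONFAULT) == 0);
    assert(vm == (TOY_VM_LOCKED | TOY_VM_LOCKONFAULT));

    assert(toy_mlock2(&vm, 0) == 0);    /* second call adjusts the mode */
    assert(vm == TOY_VM_LOCKED);

    toy_munlock(&vm);
    assert(vm == 0);
}
```

Since VM_LOCKONFAULT never appears without VM_LOCKED in this model, there are only three reachable states per VMA, which is the bookkeeping simplification the V5 change was after.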

2015-07-28 15:11:06

by Vlastimil Babka

Subject: Re: [PATCH V5 0/7] Allow user to request memory to be locked on page fault

On 07/28/2015 03:49 PM, Eric B Munson wrote:
> On Tue, 28 Jul 2015, Michal Hocko wrote:
>

[...]

> The only
> remaining question I have is should we have 2 new mlockall flags so that
> the caller can explicitly set VM_LOCKONFAULT in the mm->def_flags vs
> locking all current VMAs on fault. I ask because if the user wants to
> lock all current VMAs the old way, but all future VMAs on fault they
> have to call mlockall() twice:
>
> mlockall(MCL_CURRENT);
> mlockall(MCL_CURRENT | MCL_FUTURE | MCL_ONFAULT);
>
> This has the side effect of converting all the current VMAs to
> VM_LOCKONFAULT, but because they were all made present and locked in the
> first call, this should not matter in most cases.

Shouldn't the user be able to do this?

mlockall(MCL_CURRENT);
mlockall(MCL_FUTURE | MCL_ONFAULT);

Note that the second call shouldn't change (i.e. munlock) existing vma's
just because MCL_CURRENT is not present. The current implementation
doesn't do that thanks to the following in do_mlockall():

	if (flags == MCL_FUTURE)
		goto out;

before current vma's are processed and MCL_CURRENT is checked. This is
probably so that do_mlockall() can also handle the munlockall() syscall.
So we should be careful not to break this, but otherwise there are no
limitations by not having two MCL_ONFAULT flags. Having to invoke two
syscalls instead of one is not an issue, as this shouldn't be a frequent
syscall.

> The catch is that,
> like mmap(MAP_LOCKED), mlockall() does not communicate if mm_populate()
> fails. This has been true of mlockall() from the beginning so I don't
> know if it needs more than an entry in the man page to clarify (which I
> will add when I add documentation for MCL_ONFAULT).

Good point.

> In a much less
> likely corner case, it is not possible in the current setup to request
> all current VMAs be VM_LOCKONFAULT and all future be VM_LOCKED.

So again this should work:

mlockall(MCL_CURRENT | MCL_ONFAULT);
mlockall(MCL_FUTURE);

But the order matters here, as current implementation of do_mlockall()
will clear VM_LOCKED from def_flags if MCL_FUTURE is not passed. So
*it's different* from how it handles MCL_CURRENT (as explained above).
And not documented in manpage. Oh crap, this API is a closet full of
skeletons. Maybe it was an unnoticed regression and we can restore some
sanity?
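
The order dependence described above can be sketched with a toy userspace model of do_mlockall(). It follows only what is stated in this message: def_flags tracks MCL_FUTURE on every call, and a call with exactly MCL_FUTURE skips the existing VMAs. The struct, the TOY_ names and the bit values are assumptions made for the example; MCL_ONFAULT is left out to keep it short.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative values only. */
#define TOY_MCL_CURRENT 0x1u
#define TOY_MCL_FUTURE  0x2u
#define TOY_VM_LOCKED   0x1u

struct toy_mm {
    unsigned int def_flags;     /* flags inherited by future mappings */
    unsigned int vma_flags[4];  /* flags of the existing mappings */
    size_t nr_vmas;             /* number of used vma_flags entries */
};

static void toy_do_mlockall(struct toy_mm *mm, unsigned int flags)
{
    size_t i;

    assert(mm->nr_vmas <= 4);

    /* def_flags follows MCL_FUTURE: set if passed, cleared if not. */
    if (flags & TOY_MCL_FUTURE)
        mm->def_flags |= TOY_VM_LOCKED;
    else
        mm->def_flags &= ~TOY_VM_LOCKED;

    /* MCL_FUTURE alone leaves the existing VMAs untouched. */
    if (flags == TOY_MCL_FUTURE)
        return;

    /* Otherwise every existing VMA is locked or unlocked per MCL_CURRENT. */
    for (i = 0; i < mm->nr_vmas; i++) {
        mm->vma_flags[i] &= ~TOY_VM_LOCKED;
        if (flags & TOY_MCL_CURRENT)
            mm->vma_flags[i] |= TOY_VM_LOCKED;
    }
}

/* Returns 1 if both existing and future mappings end up locked. */
static int toy_fully_locked(const struct toy_mm *mm)
{
    size_t i;

    for (i = 0; i < mm->nr_vmas; i++)
        if (!(mm->vma_flags[i] & TOY_VM_LOCKED))
            return 0;
    return (mm->def_flags & TOY_VM_LOCKED) != 0;
}
```

In this model, calling with TOY_MCL_CURRENT and then TOY_MCL_FUTURE leaves everything locked, while the reverse order ends with def_flags cleared again, because the MCL_CURRENT-only call drops the future bit.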

2015-07-28 18:06:46

On 07/29/2015 12:45 PM, Michal Hocko wrote:
>> In a much less
>> likely corner case, it is not possible in the current setup to request
>> all current VMAs be VM_LOCKONFAULT and all future be VM_LOCKED.
>
> Vlastimil has already pointed that out. MCL_FUTURE doesn't clear
> MCL_CURRENT. I was quite surprised in the beginning but it makes
> perfect sense. An mlockall call shouldn't lead to munlocking, that would
> be just weird. Clearing MCL_FUTURE on MCL_CURRENT makes sense on the
> other hand because the request is explicit about _current_ memory and it
> doesn't lead to any munlocking.

Yeah after more thinking it does make some sense despite the perceived
inconsistency, but it's definitely worth documenting properly. It also already
covers the usecase for munlockall2(MCL_FUTURE) which IIRC you had in the earlier
revisions...