by Vlastimil Babka

[permalink] [raw]

Subject: Re: [PATCH v8 3/6] mm: Introduce VM_LOCKONFAULT

On 08/26/2015 08:24 PM, Eric B Munson wrote:
> The cost of faulting in all memory to be locked can be very high when
> working with large mappings. If only portions of the mapping will be
> used this can incur a high penalty for locking.
>
> For the example of a large file, this is the usage pattern for a large
> statical language model (probably applies to other statical or graphical
> models as well). For the security example, any application transacting
> in data that cannot be swapped out (credit card data, medical records,
> etc).
>
> This patch introduces the ability to request that pages are not
> pre-faulted, but are placed on the unevictable LRU when they are finally
> faulted in. The VM_LOCKONFAULT flag will be used together with
> VM_LOCKED and has no effect when set without VM_LOCKED. Setting the
> VM_LOCKONFAULT flag for a VMA will cause pages faulted into that VMA to
> be added to the unevictable LRU when they are faulted or if they are
> already present, but will not cause any missing pages to be faulted in.
>
> Exposing this new lock state means that we cannot overload the meaning
> of the FOLL_POPULATE flag any longer. Prior to this patch it was used
> to mean that the VMA for a fault was locked. This means we need the
> new FOLL_MLOCK flag to communicate the locked state of a VMA.
> FOLL_POPULATE will now only control if the VMA should be populated and
> in the case of VM_LOCKONFAULT, it will not be set.
>
> Signed-off-by: Eric B Munson <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: Vlastimil Babka <[email protected]>
> Cc: Jonathan Corbet <[email protected]>
> Cc: "Kirill A. Shutemov" <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]

Acked-by: Vlastimil Babka <[email protected]>

I just wonder if the call to populate_vma_page_range from mprotect_fixup
is just an potentially expensive no-op for VM_LOCKONFAULT vma's? It
might find many cow candidates but faultin_page() won't do anything. And
it shouldn't find any existing pages to put on the unevictable list from
this context.

But it's a corner case and preventing it would mean putting in another
VM_LOCKONFAULT check so maybe we can leave it like this.
-

2015-08-28 19:34:57

by Eric B Munson

[permalink] [raw]

Subject: Re: [PATCH v8 3/6] mm: Introduce VM_LOCKONFAULT

On Fri, 28 Aug 2015, Michal Hocko wrote:

> On Wed 26-08-15 14:24:22, Eric B Munson wrote:
> > The cost of faulting in all memory to be locked can be very high when
> > working with large mappings. If only portions of the mapping will be
> > used this can incur a high penalty for locking.
> >
> > For the example of a large file, this is the usage pattern for a large
> > statical language model (probably applies to other statical or graphical
> > models as well). For the security example, any application transacting
> > in data that cannot be swapped out (credit card data, medical records,
> > etc).
> >
> > This patch introduces the ability to request that pages are not
> > pre-faulted, but are placed on the unevictable LRU when they are finally
> > faulted in. The VM_LOCKONFAULT flag will be used together with
> > VM_LOCKED and has no effect when set without VM_LOCKED. Setting the
> > VM_LOCKONFAULT flag for a VMA will cause pages faulted into that VMA to
> > be added to the unevictable LRU when they are faulted or if they are
> > already present, but will not cause any missing pages to be faulted in.
>
> OK, I can live with this. Thank you for removing the part which exports
> the flag to the userspace.
>
> > Exposing this new lock state means that we cannot overload the meaning
> > of the FOLL_POPULATE flag any longer. Prior to this patch it was used
> > to mean that the VMA for a fault was locked. This means we need the
> > new FOLL_MLOCK flag to communicate the locked state of a VMA.
> > FOLL_POPULATE will now only control if the VMA should be populated and
> > in the case of VM_LOCKONFAULT, it will not be set.
>
> I thinking that this part is really unnecessary. populate_vma_page_range
> could have simply returned without calling gup for VM_LOCKONFAULT
> vmas. You would save the pte walk and the currently mapped pages would
> be still protected from the reclaim. The side effect would be that they
> would litter the regular LRUs and mlock/unevictable counters wouldn't be
> updated until those pages are encountered during the reclaim and culled
> to unevictable list.
>
> I would expect that mlock with this flag would be typically called
> on mostly unpopulated mappings so the side effects would be barely
> noticeable while the lack of pte walk would be really nice (especially
> for the large mappings).
>
> This would be a nice optimization and minor code reduction but I am not
> going to insist on it. I will leave the decision to you.

If I am understanding you correctly, this is how the lock on fault set
started. Jon Corbet pointed out that this would leave pages which were
present when mlock2(MLOCK_ONFAULT) was called in an unlocked state, only
locking them after they were reclaimed and then refaulted.

Even if this was never the case, we scan the entire range for a call to
mlock() and will lock the pages which are present. Why would we pay the
cost of getting the accounting right on the present pages for mlock, but
not lock on fault?

>
> > Signed-off-by: Eric B Munson <[email protected]>
> > Cc: Michal Hocko <[email protected]>
> > Cc: Vlastimil Babka <[email protected]>
> > Cc: Jonathan Corbet <[email protected]>
> > Cc: "Kirill A. Shutemov" <[email protected]>
> > Cc: [email protected]
> > Cc: [email protected]
> > Cc: [email protected]
>
> Acked-by: Michal Hocko <[email protected]>
>
> One note below:
>
> > ---
> > Changes from v7:
> > *Drop entries in smaps and dri code to avoid exposing VM_LOCKONFAULT to
> > userspace. VM_LOCKONFAULT is still exposed via mm/debug.c
> > *Create VM_LOCKED_CLEAR_MASK to be used anywhere we want to clear all
> > flags relating to locked VMAs
> >
> > include/linux/mm.h | 5 +++++
> > kernel/fork.c | 2 +-
> > mm/debug.c | 1 +
> > mm/gup.c | 10 ++++++++--
> > mm/huge_memory.c | 2 +-
> > mm/hugetlb.c | 4 ++--
> > mm/mlock.c | 2 +-
> > mm/mmap.c | 2 +-
> > mm/rmap.c | 6 ++++--
> > 9 files changed, 24 insertions(+), 10 deletions(-)
> [...]
> > diff --git a/mm/rmap.c b/mm/rmap.c
> > index 171b687..14ce002 100644
> > --- a/mm/rmap.c
> > +++ b/mm/rmap.c
> > @@ -744,7 +744,8 @@ static int page_referenced_one(struct page *page, struct vm_area_struct *vma,
> >
> > if (vma->vm_flags & VM_LOCKED) {
> > spin_unlock(ptl);
> > - pra->vm_flags |= VM_LOCKED;
> > + pra->vm_flags |=
> > + (vma->vm_flags & (VM_LOCKED | VM_LOCKONFAULT));
> > return SWAP_FAIL; /* To break the loop */
> > }
> >
> > @@ -765,7 +766,8 @@ static int page_referenced_one(struct page *page, struct vm_area_struct *vma,
> >
> > if (vma->vm_flags & VM_LOCKED) {
> > pte_unmap_unlock(pte, ptl);
> > - pra->vm_flags |= VM_LOCKED;
> > + pra->vm_flags |=
> > + (vma->vm_flags & (VM_LOCKED | VM_LOCKONFAULT));
> > return SWAP_FAIL; /* To break the loop */
> > }
>
> Why do we need to export this? Neither of the consumers care and should
> care. VM_LOCKONFAULT should never be set without VM_LOCKED which is the
> only thing that we should care about.

I exported VM_LOCKONFAULT because this is an internal interface and I
saw no harm in doing so. I do not have a use case for it at the moment,
so I would be fine dropping this hunk.

Attachments:

(No filename) (5.14 kB)
signature.asc (819.00 B)
Digital signature Download all attachments

2015-08-31 08:53:46

by Michal Hocko

[permalink] [raw]

Subject: Re: [PATCH v8 3/6] mm: Introduce VM_LOCKONFAULT

On Fri 28-08-15 15:34:54, Eric B Munson wrote:
> On Fri, 28 Aug 2015, Michal Hocko wrote:
>
> > On Wed 26-08-15 14:24:22, Eric B Munson wrote:
> > > The cost of faulting in all memory to be locked can be very high when
> > > working with large mappings. If only portions of the mapping will be
> > > used this can incur a high penalty for locking.
> > >
> > > For the example of a large file, this is the usage pattern for a large
> > > statical language model (probably applies to other statical or graphical
> > > models as well). For the security example, any application transacting
> > > in data that cannot be swapped out (credit card data, medical records,
> > > etc).
> > >
> > > This patch introduces the ability to request that pages are not
> > > pre-faulted, but are placed on the unevictable LRU when they are finally
> > > faulted in. The VM_LOCKONFAULT flag will be used together with
> > > VM_LOCKED and has no effect when set without VM_LOCKED. Setting the
> > > VM_LOCKONFAULT flag for a VMA will cause pages faulted into that VMA to
> > > be added to the unevictable LRU when they are faulted or if they are
> > > already present, but will not cause any missing pages to be faulted in.
> >
> > OK, I can live with this. Thank you for removing the part which exports
> > the flag to the userspace.
> >
> > > Exposing this new lock state means that we cannot overload the meaning
> > > of the FOLL_POPULATE flag any longer. Prior to this patch it was used
> > > to mean that the VMA for a fault was locked. This means we need the
> > > new FOLL_MLOCK flag to communicate the locked state of a VMA.
> > > FOLL_POPULATE will now only control if the VMA should be populated and
> > > in the case of VM_LOCKONFAULT, it will not be set.
> >
> > I thinking that this part is really unnecessary. populate_vma_page_range
> > could have simply returned without calling gup for VM_LOCKONFAULT
> > vmas. You would save the pte walk and the currently mapped pages would
> > be still protected from the reclaim. The side effect would be that they
> > would litter the regular LRUs and mlock/unevictable counters wouldn't be
> > updated until those pages are encountered during the reclaim and culled
> > to unevictable list.
> >
> > I would expect that mlock with this flag would be typically called
> > on mostly unpopulated mappings so the side effects would be barely
> > noticeable while the lack of pte walk would be really nice (especially
> > for the large mappings).
> >
> > This would be a nice optimization and minor code reduction but I am not
> > going to insist on it. I will leave the decision to you.
>
> If I am understanding you correctly, this is how the lock on fault set
> started. Jon Corbet pointed out that this would leave pages which were
> present when mlock2(MLOCK_ONFAULT) was called in an unlocked state, only
> locking them after they were reclaimed and then refaulted.

Not really. They would be lazily locked during the reclaim. Have a look
at try_to_unmap -> try_to_unmap_one path. So those pages will be
effectively locked - just not accounted for that fact yet.

> Even if this was never the case, we scan the entire range for a call to
> mlock() and will lock the pages which are present. Why would we pay the
> cost of getting the accounting right on the present pages for mlock, but
> not lock on fault?

Because mlock() has a different semantic and you _have_ to walk the whole
range just to pre-fault memory. Mlocking the already present pages is
not really adding much on top. Situation is different with lock on
fault because pre-faulting doesn't happen and crawling the whole range
just to find present pages sounds like a wasted time when the same can
be handled lazily.

But as I've said, I will not insist...

> > > Signed-off-by: Eric B Munson <[email protected]>
> > > Cc: Michal Hocko <[email protected]>
> > > Cc: Vlastimil Babka <[email protected]>
> > > Cc: Jonathan Corbet <[email protected]>
> > > Cc: "Kirill A. Shutemov" <[email protected]>
> > > Cc: [email protected]
> > > Cc: [email protected]
> > > Cc: [email protected]
> >
> > Acked-by: Michal Hocko <[email protected]>
> >
> > One note below:
> >
> > > ---
> > > Changes from v7:
> > > *Drop entries in smaps and dri code to avoid exposing VM_LOCKONFAULT to
> > > userspace. VM_LOCKONFAULT is still exposed via mm/debug.c
> > > *Create VM_LOCKED_CLEAR_MASK to be used anywhere we want to clear all
> > > flags relating to locked VMAs
> > >
> > > include/linux/mm.h | 5 +++++
> > > kernel/fork.c | 2 +-
> > > mm/debug.c | 1 +
> > > mm/gup.c | 10 ++++++++--
> > > mm/huge_memory.c | 2 +-
> > > mm/hugetlb.c | 4 ++--
> > > mm/mlock.c | 2 +-
> > > mm/mmap.c | 2 +-
> > > mm/rmap.c | 6 ++++--
> > > 9 files changed, 24 insertions(+), 10 deletions(-)
> > [...]
> > > diff --git a/mm/rmap.c b/mm/rmap.c
> > > index 171b687..14ce002 100644
> > > --- a/mm/rmap.c
> > > +++ b/mm/rmap.c
> > > @@ -744,7 +744,8 @@ static int page_referenced_one(struct page *page, struct vm_area_struct *vma,
> > >
> > > if (vma->vm_flags & VM_LOCKED) {
> > > spin_unlock(ptl);
> > > - pra->vm_flags |= VM_LOCKED;
> > > + pra->vm_flags |=
> > > + (vma->vm_flags & (VM_LOCKED | VM_LOCKONFAULT));
> > > return SWAP_FAIL; /* To break the loop */
> > > }
> > >
> > > @@ -765,7 +766,8 @@ static int page_referenced_one(struct page *page, struct vm_area_struct *vma,
> > >
> > > if (vma->vm_flags & VM_LOCKED) {
> > > pte_unmap_unlock(pte, ptl);
> > > - pra->vm_flags |= VM_LOCKED;
> > > + pra->vm_flags |=
> > > + (vma->vm_flags & (VM_LOCKED | VM_LOCKONFAULT));
> > > return SWAP_FAIL; /* To break the loop */
> > > }
> >
> > Why do we need to export this? Neither of the consumers care and should
> > care. VM_LOCKONFAULT should never be set without VM_LOCKED which is the
> > only thing that we should care about.
>
> I exported VM_LOCKONFAULT because this is an internal interface and I
> saw no harm in doing so. I do not have a use case for it at the moment,
> so I would be fine dropping this hunk.

I was objecting because nobody except for the population path should
really care about this flag. The real locking semantic is already
described by VM_LOCKED. If there ever is a user of VM_LOCKONFAULT from
those paths it should be added explicitly. So please drop these two.
The fewer instances of VM_LOCKONFAULT we have the easier this will be to
maintain.
--
Michal Hocko
SUSE Labs