From: Yin Fengwei <fengwei.yin@intel.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	yuzhao@google.com, willy@infradead.org, david@redhat.com,
	ryan.roberts@arm.com, shy828301@gmail.com
Cc: fengwei.yin@intel.com
Subject: [RFC PATCH v2 0/3] support large folio for mlock
Date: Wed, 12 Jul 2023 14:01:41 +0800
Message-Id: <20230712060144.3006358-1-fengwei.yin@intel.com>

Yu mentioned at [1] that mlock() currently can't be applied to large folios.
I studied the related code and here is my understanding:

- For RLIMIT_MEMLOCK, there is no problem, because the RLIMIT_MEMLOCK
  accounting is not tied to the underlying pages.
  That means mlocking or munlocking the underlying pages does not affect
  the RLIMIT_MEMLOCK accounting, which stays correct.
- For keeping the pages in RAM, there is no problem either. During
  try_to_unmap_one(), once the VMA is found to have VM_LOCKED set in
  vm_flags, the folio is kept regardless of whether it is mlocked or not.

So mlock of large folios is already functionally correct. But it is not
optimal, because page reclaim still has to scan these large folios and may
split them.

This series classifies the large folios seen by mlock into two types:
- large folios fully within a VM_LOCKED VMA range
- large folios crossing a VM_LOCKED VMA boundary

For the first type, we mlock the large folio so page reclaim will skip it.
For the second type, we don't mlock the large folio; it may still be picked
by page reclaim and split, so the pages outside the VM_LOCKED VMA range can
be reclaimed/released.

patch1 introduces an API to check whether a large folio is within a VMA
range (a rough sketch of the idea is appended at the end of this mail).
patch2 makes page reclaim/mlock_vma_folio/munlock_vma_folio support large
folio mlock/munlock.
patch3 makes the mlock/munlock syscalls support large folios (see the
second sketch appended below).

Testing done:
- kernel selftests. No extra failures introduced.

Matthew commented on v1 that a large folio should be split if it crosses
VMA boundaries. But there is no obviously correct way to handle a split
failure, and that is a common issue for mprotect, mlock, mremap,
munmap.... So v2 keeps the v1 behavior (do not split a folio that crosses
VMA boundaries).

[1] https://lore.kernel.org/linux-mm/CAOUHufbtNPkdktjt_5qM45GegVO-rCFOMkSh0HQminQ12zsV8Q@mail.gmail.com/

Changes from v1:
patch1:
- Add new function folio_within_vma() based on folio_in_range() per Yu's
  suggestion
patch2:
- Update folio_referenced_one() to skip the entries that are outside the
  VM_LOCKED VMA range if the large folio crosses VMA boundaries, per Yu's
  suggestion
patch3:
- Simplify the changes in mlock_pte_range() by introducing two helper
  functions should_mlock_folio() and get_folio_mlock_step() per Yu's
  suggestion

Yin Fengwei (3):
  mm: add functions folio_in_range() and folio_within_vma()
  mm: handle large folio when large folio in VM_LOCKED VMA range
  mm: mlock: update mlock_pte_range to handle large folio

 mm/internal.h |  43 +++++++++++++++++++--
 mm/mlock.c    | 104 +++++++++++++++++++++++++++++++++++++++++++++++---
 mm/rmap.c     |  34 +++++++++++++----
 3 files changed, 166 insertions(+), 15 deletions(-)

-- 
2.39.2
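
For illustration, here is a minimal sketch of the "is this large folio
entirely inside a VMA range" check that patch1 is about. The helper name
folio_within_range_sketch() and its exact logic are assumptions made for
this sketch only; the series itself adds folio_in_range() and
folio_within_vma() in mm/internal.h, whose implementation may differ.

/*
 * Sketch only, not the code from this series: check whether every page of
 * a folio falls inside [start, end) of a VMA.  Assumes the caller already
 * knows the folio is mapped by this VMA (e.g. it was found via a PTE).
 */
#include <linux/mm.h>
#include <linux/pagemap.h>

static inline bool folio_within_range_sketch(struct folio *folio,
					     struct vm_area_struct *vma,
					     unsigned long start,
					     unsigned long end)
{
	pgoff_t pgoff = folio_pgoff(folio);	/* pgoff of the folio's first page */
	unsigned long nr = folio_nr_pages(folio);
	unsigned long addr;

	/* Clamp the caller's range to the VMA itself. */
	start = max(start, vma->vm_start);
	end = min(end, vma->vm_end);

	/* The folio's first page must not start before the VMA's window. */
	if (pgoff < vma->vm_pgoff)
		return false;

	/* Virtual address of the folio's first page inside this VMA. */
	addr = vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);

	/* Every page of the folio must sit inside the clamped range. */
	return addr >= start && addr + (nr << PAGE_SHIFT) <= end;
}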
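
Building on the sketch above, this is a rough illustration of the mlock
decision described in the cover letter: small folios keep the existing
behavior, while a large folio is mlocked only when it lies fully inside
the VM_LOCKED VMA. The name should_mlock_folio_sketch() and the exact
check are assumptions; patch3's real should_mlock_folio() and
get_folio_mlock_step() helpers may work differently.

/*
 * Sketch only: decide whether mlock_pte_range() should mlock this folio.
 * Uses folio_within_range_sketch() from the previous sketch.
 */
static bool should_mlock_folio_sketch(struct folio *folio,
				      struct vm_area_struct *vma,
				      unsigned long start, unsigned long end)
{
	/* Small folios keep the existing behavior: always mlock. */
	if (!folio_test_large(folio))
		return true;

	/*
	 * A large folio is mlocked only when it sits entirely inside the
	 * VM_LOCKED VMA range; a folio crossing the boundary is left
	 * unmlocked so page reclaim may pick it up and split it.
	 */
	return (vma->vm_flags & VM_LOCKED) &&
	       folio_within_range_sketch(folio, vma, start, end);
}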