From: Jerome Marchand
Date: Wed, 29 Apr 2015 17:58:10 +0200
To: "Kirill A. Shutemov", Andrew Morton, Andrea Arcangeli, Hugh Dickins
Cc: Dave Hansen, Mel Gorman, Rik van Riel, Vlastimil Babka, Christoph Lameter,
    Naoya Horiguchi, Steve Capper, "Aneesh Kumar K.V", Johannes Weiner,
    Michal Hocko, Sasha Levin, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCHv5 07/28] thp, mlock: do not allow huge pages in mlocked area
In-Reply-To: <1429823043-157133-8-git-send-email-kirill.shutemov@linux.intel.com>

On 04/23/2015 11:03 PM, Kirill A. Shutemov wrote:
> With the new refcounting a THP can belong to several VMAs. This makes it
> tricky to track THP pages when they are partially mlocked. It can lead to
> leaking mlocked pages into non-VM_LOCKED VMAs and other problems.
>
> With this patch we split all pages on mlock and avoid faulting in or
> collapsing new THPs in VM_LOCKED VMAs.
>
> I tried an alternative approach: do not mark THP pages mlocked and keep
> them on the normal LRUs. That way vmscan could try to split huge pages
> under memory pressure and free up the subpages which don't belong to
> VM_LOCKED VMAs. But this is a user-visible change: we screw up the Mlocked
> accounting reported in meminfo, so I had to leave that approach aside.
>
> We can bring something better later, but this should be good enough for
> now.
>
> Signed-off-by: Kirill A. Shutemov
> Tested-by: Sasha Levin

Acked-by: Jerome Marchand
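For readers following the thread, here is a minimal userspace sketch of the
partially-mlocked-THP scenario the commit message describes. It is not part
of the patch; the 2M huge page size and the manual alignment are assumptions
for x86-64 with anonymous THP enabled.

/*
 * Map an aligned anonymous region, ask for a huge page, then mlock a
 * single 4K subpage. On a kernel with this series, the mlock path adds
 * FOLL_SPLIT, so the huge page is split instead of being left partially
 * mlocked.
 */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define HPAGE_SIZE (2UL << 20)          /* assumed THP size */

int main(void)
{
        /* Over-allocate so we can pick a 2M-aligned start address. */
        size_t len = 2 * HPAGE_SIZE;
        char *raw = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        char *p;

        if (raw == MAP_FAILED) {
                perror("mmap");
                return 1;
        }
        p = (char *)(((unsigned long)raw + HPAGE_SIZE - 1)
                     & ~(HPAGE_SIZE - 1));

        madvise(p, HPAGE_SIZE, MADV_HUGEPAGE); /* hint: back with a THP */
        memset(p, 1, HPAGE_SIZE);              /* fault the region in */

        /*
         * mlock a single subpage in the middle of the (presumed) huge
         * page. Before this series such a THP could end up straddling
         * VM_LOCKED and non-VM_LOCKED VMAs; with the patch applied the
         * page is split on mlock.
         */
        if (mlock(p + 4096, 4096) != 0)
                perror("mlock");
        return 0;
}

The quoted patch follows.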
> ---
>  mm/gup.c         |  2 ++
>  mm/huge_memory.c |  5 ++++-
>  mm/memory.c      |  3 ++-
>  mm/mlock.c       | 51 +++++++++++++++++++--------------------------------
>  4 files changed, 27 insertions(+), 34 deletions(-)
>
> diff --git a/mm/gup.c b/mm/gup.c
> index eaeeae15006b..7334eb24f414 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -882,6 +882,8 @@ long populate_vma_page_range(struct vm_area_struct *vma,
>  	VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_sem), mm);
>
>  	gup_flags = FOLL_TOUCH | FOLL_POPULATE;
> +	if (vma->vm_flags & VM_LOCKED)
> +		gup_flags |= FOLL_SPLIT;
>  	/*
>  	 * We want to touch writable mappings with a write fault in order
>  	 * to break COW, except for shared mappings because these don't COW
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index fd8af5b9917f..fa3d4f78b716 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -796,6 +796,8 @@ int do_huge_pmd_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
>
>  	if (haddr < vma->vm_start || haddr + HPAGE_PMD_SIZE > vma->vm_end)
>  		return VM_FAULT_FALLBACK;
> +	if (vma->vm_flags & VM_LOCKED)
> +		return VM_FAULT_FALLBACK;
>  	if (unlikely(anon_vma_prepare(vma)))
>  		return VM_FAULT_OOM;
>  	if (unlikely(khugepaged_enter(vma, vma->vm_flags)))
> @@ -2467,7 +2469,8 @@ static bool hugepage_vma_check(struct vm_area_struct *vma)
>  	if ((!(vma->vm_flags & VM_HUGEPAGE) && !khugepaged_always()) ||
>  	    (vma->vm_flags & VM_NOHUGEPAGE))
>  		return false;
> -
> +	if (vma->vm_flags & VM_LOCKED)
> +		return false;
>  	if (!vma->anon_vma || vma->vm_ops)
>  		return false;
>  	if (is_vma_temporary_stack(vma))
> diff --git a/mm/memory.c b/mm/memory.c
> index 559c6651d6b6..8bbd3f88544b 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2156,7 +2156,8 @@ static int wp_page_copy(struct mm_struct *mm, struct vm_area_struct *vma,
>
>  	pte_unmap_unlock(page_table, ptl);
>  	mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
> -	if (old_page) {
> +	/* THP pages are never mlocked */
> +	if (old_page && !PageTransCompound(old_page)) {
>  		/*
>  		 * Don't let another task, with possibly unlocked vma,
>  		 * keep the mlocked page.
> diff --git a/mm/mlock.c b/mm/mlock.c
> index 6fd2cf15e868..76cde3967483 100644
> --- a/mm/mlock.c
> +++ b/mm/mlock.c
> @@ -443,39 +443,26 @@ void munlock_vma_pages_range(struct vm_area_struct *vma,
>  		page = follow_page_mask(vma, start, FOLL_GET | FOLL_DUMP,
>  				&page_mask);
>
> -		if (page && !IS_ERR(page)) {
> -			if (PageTransHuge(page)) {
> -				lock_page(page);
> -				/*
> -				 * Any THP page found by follow_page_mask() may
> -				 * have gotten split before reaching
> -				 * munlock_vma_page(), so we need to recompute
> -				 * the page_mask here.
> -				 */
> -				page_mask = munlock_vma_page(page);
> -				unlock_page(page);
> -				put_page(page); /* follow_page_mask() */
> -			} else {
> -				/*
> -				 * Non-huge pages are handled in batches via
> -				 * pagevec. The pin from follow_page_mask()
> -				 * prevents them from collapsing by THP.
> -				 */
> -				pagevec_add(&pvec, page);
> -				zone = page_zone(page);
> -				zoneid = page_zone_id(page);
> +		if (page && !IS_ERR(page) && !PageTransCompound(page)) {
> +			/*
> +			 * Non-huge pages are handled in batches via
> +			 * pagevec. The pin from follow_page_mask()
> +			 * prevents them from collapsing by THP.
> +			 */
> +			pagevec_add(&pvec, page);
> +			zone = page_zone(page);
> +			zoneid = page_zone_id(page);
>
> -				/*
> -				 * Try to fill the rest of pagevec using fast
> -				 * pte walk. This will also update start to
> -				 * the next page to process. Then munlock the
> -				 * pagevec.
> -				 */
> -				start = __munlock_pagevec_fill(&pvec, vma,
> -						zoneid, start, end);
> -				__munlock_pagevec(&pvec, zone);
> -				goto next;
> -			}
> +			/*
> +			 * Try to fill the rest of pagevec using fast
> +			 * pte walk. This will also update start to
> +			 * the next page to process. Then munlock the
> +			 * pagevec.
> +			 */
> +			start = __munlock_pagevec_fill(&pvec, vma,
> +					zoneid, start, end);
> +			__munlock_pagevec(&pvec, zone);
> +			goto next;
>  		}
>  		/* It's a bug to munlock in the middle of a THP page */
>  		VM_BUG_ON((start >> PAGE_SHIFT) & page_mask);
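As a quick sanity check (again just a sketch, with a hypothetical helper not
taken from the patch), one can sum AnonHugePages over /proc/self/smaps before
and after the partial mlock in the program above. On a kernel with this
series the total for the test VMA should drop to zero, confirming the huge
page was split, and Mlocked in /proc/meminfo should grow by only 4 kB rather
than 2048 kB.

#include <stdio.h>

/* Sum the AnonHugePages: fields of /proc/self/smaps, in kB. */
static long anon_huge_kb(void)
{
        FILE *f = fopen("/proc/self/smaps", "r");
        char line[256];
        long kb, total = 0;

        if (!f)
                return -1;
        while (fgets(line, sizeof(line), f))
                if (sscanf(line, "AnonHugePages: %ld kB", &kb) == 1)
                        total += kb;
        fclose(f);
        return total;
}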