From: "Kirill A. Shutemov"
To: Andrew Morton
Cc: Vegard Nossum, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    "Kirill A. Shutemov", stable@vger.kernel.org, Zi Yan,
    Naoya Horiguchi, Vlastimil Babka, Andrea Arcangeli
Subject: [PATCH] mm, thp: Fix mlocking THP page with migration enabled
Date: Tue, 11 Sep 2018 13:34:03 +0300
Message-Id: <20180911103403.38086-1-kirill.shutemov@linux.intel.com>

A transparent huge page is represented by a single entry on an LRU list.
Therefore we can only make an entire compound page unevictable, not
individual subpages.

If a user tries to mlock() part of a huge page, the rest of the page
should stay reclaimable. We handle this by keeping PTE-mapped huge pages
on normal LRU lists: a PMD on the border of a VM_LOCKED VMA gets split
into a PTE table.

The introduction of THP migration breaks these rules around mlocking THP
pages: if there is a single PMD mapping of the page in an mlocked VMA,
the page gets mlocked regardless of any PTE mappings of the page.

For tmpfs/shmem this is easy to fix by checking PageDoubleMap() in
remove_migration_pmd().

Anon THP pages can only be shared between processes via fork(), and an
mlocked page can only be shared if the parent mlocked it before forking
(otherwise CoW is triggered on mlock()). For anon THP we can therefore
fix the issue by munlocking the page when removing a PTE migration entry
for it: PTEs for the page always come after the mlocked PMD, since rmap
walks VMAs from oldest to newest.

Test-case:

	#include <unistd.h>
	#include <sys/mman.h>
	#include <sys/types.h>
	#include <sys/wait.h>
	#include <numaif.h>

	int main(void)
	{
		unsigned long nodemask = 4;
		void *addr;

		addr = mmap((void *)0x20000000UL, 2UL << 20,
			    PROT_READ | PROT_WRITE,
			    MAP_PRIVATE | MAP_ANONYMOUS | MAP_LOCKED,
			    -1, 0);

		if (fork()) {
			wait(NULL);
			return 0;
		}

		mlock(addr, 4UL << 10);
		mbind(addr, 2UL << 20, MPOL_PREFERRED | MPOL_F_RELATIVE_NODES,
		      &nodemask, 4, MPOL_MF_MOVE | MPOL_MF_MOVE_ALL);

		return 0;
	}

Signed-off-by: Kirill A. Shutemov
Reported-by: Vegard Nossum
Fixes: 616b8371539a ("mm: thp: enable thp migration in generic path")
Cc: stable@vger.kernel.org [v4.14+]
Cc: Zi Yan
Cc: Naoya Horiguchi
Cc: Vlastimil Babka
Cc: Andrea Arcangeli
---
 mm/huge_memory.c | 2 +-
 mm/migrate.c     | 3 +++
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 533f9b00147d..00704060b7f7 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2931,7 +2931,7 @@ void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new)
 	else
 		page_add_file_rmap(new, true);
 	set_pmd_at(mm, mmun_start, pvmw->pmd, pmde);
-	if (vma->vm_flags & VM_LOCKED)
+	if ((vma->vm_flags & VM_LOCKED) && !PageDoubleMap(new))
 		mlock_vma_page(new);
 	update_mmu_cache_pmd(vma, address, pvmw->pmd);
 }
diff --git a/mm/migrate.c b/mm/migrate.c
index d6a2e89b086a..01dad96b25b5 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -275,6 +275,9 @@ static bool remove_migration_pte(struct page *page, struct vm_area_struct *vma,
 		if (vma->vm_flags & VM_LOCKED && !PageTransCompound(new))
 			mlock_vma_page(new);
 
+		if (PageTransCompound(new) && PageMlocked(page))
+			clear_page_mlock(page);
+
 		/* No need to invalidate - it was non-present before */
 		update_mmu_cache(vma, pvmw.address, pvmw.pte);
 	}
-- 
2.18.0