From: Kirill A. Shutemov
To: Andrew Morton
Cc: Vegard Nossum, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 Kirill A. Shutemov, stable@vger.kernel.org, Zi Yan, Naoya Horiguchi,
 Vlastimil Babka, Andrea Arcangeli
Subject: [PATCHv2] mm, thp: Fix mlocking THP page with migration enabled
Date: Mon, 17 Sep 2018 16:38:16 +0300
Message-Id: <20180917133816.43995-1-kirill.shutemov@linux.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

A transparent huge page is represented by a single entry on an LRU list.
Therefore, we can only make an entire compound page unevictable, not
individual subpages.

If a user tries to mlock() part of a huge page, we want the rest of the
page to be reclaimable.

We handle this by keeping PTE-mapped huge pages on normal LRU lists: the
PMD on the border of a VM_LOCKED VMA will be split into a PTE table.

The introduction of THP migration breaks[1] the rules around mlocking THP
pages. If we had a single PMD mapping of the page in an mlocked VMA, the
page will get mlocked, regardless of PTE mappings of the page.

For tmpfs/shmem it's easy to fix by checking PageDoubleMap() in
remove_migration_pmd().

Anon THP pages can only be shared between processes via fork(). An
mlocked page can only be shared if the parent mlocked it before forking;
otherwise CoW will be triggered on mlock().

For Anon-THP, we can fix the issue by munlocking the page when removing
the PTE migration entry for the page. PTEs for the page will always come
after the mlocked PMD: rmap walks VMAs from oldest to newest.

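The ordering argument above can be replayed in a small user-space toy
model. This is only a sketch and not kernel code: struct toy_page, struct
toy_vma and all toy_* functions are hypothetical stand-ins for the page
flags and VMA flags named in the description, and the replay assumes that
PageDoubleMap() is not set on the anon page when the PMD entry is
restored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the page state the description refers to. */
struct toy_page {
	bool trans_huge;	/* models PageTransHuge()/PageTransCompound() */
	bool double_map;	/* models PageDoubleMap() */
	bool mlocked;		/* models PageMlocked() */
};

/* Hypothetical stand-in for a VMA; only VM_LOCKED matters here. */
struct toy_vma {
	bool vm_locked;		/* models vma->vm_flags & VM_LOCKED */
};

/* Models the mlock decision when a PMD migration entry is removed. */
void toy_remove_migration_pmd(struct toy_page *page, const struct toy_vma *vma)
{
	assert(page != NULL && vma != NULL);
	/* Do not mlock a huge page that also has PTE mappings. */
	if (vma->vm_locked && !page->double_map)
		page->mlocked = true;
}

/* Models the mlock decision when a PTE migration entry is removed. */
void toy_remove_migration_pte(struct toy_page *page, const struct toy_vma *vma)
{
	assert(page != NULL && vma != NULL);
	/* Only small pages are mlocked through a PTE mapping. */
	if (vma->vm_locked && !page->trans_huge)
		page->mlocked = true;
	/* A PTE-mapped huge page must not stay mlocked: undo the PMD mlock. */
	if (page->trans_huge && page->mlocked)
		page->mlocked = false;
}

/*
 * Replays the rmap walk for an anon THP shared via fork(): the older
 * (parent) VMA maps the page with a PMD, the newer (child) VMA with PTEs.
 * Returns the final mlocked state of the page.
 */
bool toy_replay_anon_thp(bool parent_locked, bool child_locked)
{
	struct toy_page page = { .trans_huge = true, .double_map = false,
				 .mlocked = false };
	struct toy_vma parent = { .vm_locked = parent_locked };
	struct toy_vma child = { .vm_locked = child_locked };

	toy_remove_migration_pmd(&page, &parent);	/* oldest VMA first */
	toy_remove_migration_pte(&page, &child);	/* newer VMA after it */
	return page.mlocked;
}
```

In this model, toy_replay_anon_thp(true, false) returns false: the PMD
step mlocks the page and the later PTE step clears it again, which is why
the oldest-to-newest rmap order matters for the anon case.
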
Test-case:

	#include <unistd.h>
	#include <sys/mman.h>
	#include <sys/wait.h>
	#include <linux/mempolicy.h>
	#include <numaif.h>

	int main(void)
	{
		unsigned long nodemask = 4;
		void *addr;

		addr = mmap((void *)0x20000000UL, 2UL << 20, PROT_READ | PROT_WRITE,
			MAP_PRIVATE | MAP_ANONYMOUS | MAP_LOCKED, -1, 0);

		if (fork()) {
			wait(NULL);
			return 0;
		}

		mlock(addr, 4UL << 10);
		mbind(addr, 2UL << 20, MPOL_PREFERRED | MPOL_F_RELATIVE_NODES,
			&nodemask, 4, MPOL_MF_MOVE);

		return 0;
	}

[1] https://lkml.kernel.org/r/CAOMGZ=G52R-30rZvhGxEbkTw7rLLwBGadVYeo--iizcD3upL3A@mail.gmail.com

Signed-off-by: Kirill A. Shutemov
Reported-by: Vegard Nossum
Fixes: 616b8371539a ("mm: thp: enable thp migration in generic path")
Cc: [v4.14+]
Cc: Zi Yan
Cc: Naoya Horiguchi
Cc: Vlastimil Babka
Cc: Andrea Arcangeli
---
 mm/huge_memory.c | 2 +-
 mm/migrate.c     | 3 +++
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 533f9b00147d..00704060b7f7 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2931,7 +2931,7 @@ void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new)
 	else
 		page_add_file_rmap(new, true);
 	set_pmd_at(mm, mmun_start, pvmw->pmd, pmde);
-	if (vma->vm_flags & VM_LOCKED)
+	if ((vma->vm_flags & VM_LOCKED) && !PageDoubleMap(new))
 		mlock_vma_page(new);
 	update_mmu_cache_pmd(vma, address, pvmw->pmd);
 }
diff --git a/mm/migrate.c b/mm/migrate.c
index d6a2e89b086a..9d374011c244 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -275,6 +275,9 @@ static bool remove_migration_pte(struct page *page, struct vm_area_struct *vma,
 		if (vma->vm_flags & VM_LOCKED && !PageTransCompound(new))
 			mlock_vma_page(new);
 
+		if (PageTransHuge(page) && PageMlocked(page))
+			clear_page_mlock(page);
+
 		/* No need to invalidate - it was non-present before */
 		update_mmu_cache(vma, pvmw.address, pvmw.pte);
 	}
-- 
2.18.0