Subject: [PATCH] mremap: move_ptes: check pte dirty after its removal
To: Linus Torvalds
References: <026b73f6-ca1d-e7bb-766c-4aaeb7071ce6@intel.com>
 <20161128083715.GA21738@aaronlu.sh.intel.com>
 <20161128084012.GC21738@aaronlu.sh.intel.com>
Cc: Linux Memory Management List, Dave Hansen, Andrew Morton,
 "Kirill A. Shutemov", Huang Ying, Linux Kernel Mailing List
From: Aaron Lu
Message-ID: <977b6c8b-2df3-5f4b-0d6c-fe766cf3fae0@intel.com>
Date: Tue, 29 Nov 2016 10:57:53 +0800

On 11/29/2016 01:15 AM, Linus Torvalds wrote:
> However, I also independently think I found an actual bug while
> looking at the code as part of looking at the patch.
>
> This part looks racy:
>
>                 /*
>                  * We are remapping a dirty PTE, make sure to
>                  * flush TLB before we drop the PTL for the
>                  * old PTE or we may race with page_mkclean().
>                  */
>                 if (pte_present(*old_pte) && pte_dirty(*old_pte))
>                         force_flush = true;
>                 pte = ptep_get_and_clear(mm, old_addr, old_pte);
>
> where the issue is that another thread might make the pte be dirty (in
> the hardware walker, so no locking of ours make any difference)
> *after* we checked whether it was dirty, but *before* we removed it
> from the page tables.

Ah, very right. Thanks for the catch!

>
> So I think the "check for force-flush" needs to come *after*, and we should do
>
>                 pte = ptep_get_and_clear(mm, old_addr, old_pte);
>                 if (pte_present(pte) && pte_dirty(pte))
>                         force_flush = true;
>
> instead.
>
> This happens for the pmd case too.

Here is a fix patch, sorry for the trouble.

From c0dc52fd3d3be93afb5b97804937a1b1b7ef136e Mon Sep 17 00:00:00 2001
From: Aaron Lu
Date: Tue, 29 Nov 2016 10:33:37 +0800
Subject: [PATCH] mremap: move_ptes: check pte dirty after its removal

Linus found there still is a race in mremap after commit 5d1904204c99
("mremap: fix race between mremap() and page cleanning").

As described by Linus:
the issue is that another thread might make the pte be dirty (in
the hardware walker, so no locking of ours make any difference)
*after* we checked whether it was dirty, but *before* we removed it
from the page tables.

Fix it by moving the check after we removed it from the page table.

Suggested-by: Linus Torvalds
Signed-off-by: Aaron Lu
---
 mm/huge_memory.c | 2 +-
 mm/mremap.c      | 6 +++++-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index eff3de359d50..a3e466c489a9 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1456,9 +1456,9 @@ bool move_huge_pmd(struct vm_area_struct *vma, unsigned long old_addr,
 		new_ptl = pmd_lockptr(mm, new_pmd);
 		if (new_ptl != old_ptl)
 			spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);
+		pmd = pmdp_huge_get_and_clear(mm, old_addr, old_pmd);
 		if (pmd_present(*old_pmd) && pmd_dirty(*old_pmd))
 			force_flush = true;
-		pmd = pmdp_huge_get_and_clear(mm, old_addr, old_pmd);
 		VM_BUG_ON(!pmd_none(*new_pmd));
 
 		if (pmd_move_must_withdraw(new_ptl, old_ptl) &&
diff --git a/mm/mremap.c b/mm/mremap.c
index 6ccecc03f56a..4b39dd0974e5 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -149,14 +149,18 @@ static void move_ptes(struct vm_area_struct *vma, pmd_t *old_pmd,
 		if (pte_none(*old_pte))
 			continue;
 
+		pte = ptep_get_and_clear(mm, old_addr, old_pte);
 		/*
 		 * We are remapping a dirty PTE, make sure to
 		 * flush TLB before we drop the PTL for the
 		 * old PTE or we may race with page_mkclean().
+		 *
+		 * This check has to be done after we removed the
+		 * old PTE from page tables or another thread may
+		 * dirty it after the check and before the removal.
 		 */
 		if (pte_present(*old_pte) && pte_dirty(*old_pte))
 			force_flush = true;
-		pte = ptep_get_and_clear(mm, old_addr, old_pte);
 		pte = move_pte(pte, new_vma->vm_page_prot, old_addr, new_addr);
 		pte = move_soft_dirty_pte(pte);
 		set_pte_at(mm, new_addr, new_pte, pte);
-- 
2.5.5
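
For illustration only, the ordering issue discussed above can be modelled
in user space with C11 atomics. This is a rough sketch, not kernel code:
toy_pte_t, TOY_PRESENT, TOY_DIRTY and the toy_* helpers are all made-up
names, the bit layout is invented, and the "hardware walker" is simulated
by a flag that sets the dirty bit at a fixed point instead of running
concurrently. It contrasts the check-then-clear order with the
clear-then-check order from the quoted suggestion, where the test is
applied to the value returned by the atomic removal.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout for a toy "PTE"; this is not the kernel's. */
#define TOY_PRESENT ((uint64_t)1 << 0)
#define TOY_DIRTY   ((uint64_t)1 << 1)

typedef _Atomic uint64_t toy_pte_t;

/* Stands in for the hardware walker setting the dirty bit behind our back. */
static void toy_hw_dirty(toy_pte_t *slot)
{
	atomic_fetch_or(slot, TOY_DIRTY);
}

/* Same shape as the present-and-dirty test in the patch. */
static bool toy_needs_flush(uint64_t val)
{
	return (val & TOY_PRESENT) && (val & TOY_DIRTY);
}

/* Racy order: test the slot first, then clear it. */
static bool toy_check_then_clear(toy_pte_t *slot, bool hw_writes_in_window)
{
	bool force_flush = toy_needs_flush(atomic_load(slot));

	if (hw_writes_in_window)
		toy_hw_dirty(slot);	/* lands after the check, so it is missed */

	(void)atomic_exchange(slot, 0);
	return force_flush;
}

/* Suggested order: clear atomically, then test the value that was removed. */
static bool toy_clear_then_check(toy_pte_t *slot, bool hw_writes_in_window)
{
	if (hw_writes_in_window)
		toy_hw_dirty(slot);	/* any write before the clear is captured */

	uint64_t old = atomic_exchange(slot, 0);

	return toy_needs_flush(old);
}
```

With a slot initialised to TOY_PRESENT and the simulated write enabled,
toy_check_then_clear() reports no flush needed while toy_clear_then_check()
reports that one is. In this model the slot reads as zero once it has been
exchanged, so the test only means something when applied to the returned
value rather than to the slot.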