From: Michel Lespinasse
To: Linux-MM, Linux-Kernel
Cc: Laurent Dufour, Peter Zijlstra, Michal Hocko, Matthew Wilcox, Rik van Riel, Paul McKenney, Andrew Morton, Suren Baghdasaryan, Joel Fernandes, Andy Lutomirski, Michel Lespinasse
Subject: [PATCH 16/29] mm: add pte_map_lock() and pte_spinlock()
Date: Fri, 30 Apr 2021 12:52:17 -0700
Message-Id: <20210430195232.30491-17-michel@lespinasse.org>
In-Reply-To: <20210430195232.30491-1-michel@lespinasse.org>
References: <20210430195232.30491-1-michel@lespinasse.org>
X-Mailing-List: linux-kernel@vger.kernel.org

pte_map_lock() and pte_spinlock() are used by fault handlers to ensure
the pte is mapped and locked before they commit the faulted page to the
mm's address space at the end of the fault.

The functions differ in their preconditions: pte_map_lock() expects the
pte to be unmapped prior to the call, while pte_spinlock() expects it
to be already mapped.

In the speculative fault case, the functions verify, after locking the
pte, that the mmap sequence count has not changed since the start of
the fault, and thus that no mmap lock writers have been running
concurrently with the fault. After that point, the page table lock
serializes any further races with concurrent mmap lock writers.

If the mmap sequence count check fails, both functions return false,
leaving the pte unmapped and unlocked.
Signed-off-by: Michel Lespinasse
---
 include/linux/mm.h | 36 ++++++++++++++++++++++++++
 mm/memory.c        | 66 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 102 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index dee8a4833779..8124cd53ce15 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3183,5 +3183,41 @@ extern int sysctl_nr_trim_pages;
 
 void mem_dump_obj(void *object);
 
+#ifdef CONFIG_MMU
+#ifdef CONFIG_SPECULATIVE_PAGE_FAULT
+
+bool __pte_map_lock(struct vm_fault *vmf);
+
+static inline bool pte_map_lock(struct vm_fault *vmf)
+{
+	VM_BUG_ON(vmf->pte);
+	return __pte_map_lock(vmf);
+}
+
+static inline bool pte_spinlock(struct vm_fault *vmf)
+{
+	VM_BUG_ON(!vmf->pte);
+	return __pte_map_lock(vmf);
+}
+
+#else /* !CONFIG_SPECULATIVE_PAGE_FAULT */
+
+static inline bool pte_map_lock(struct vm_fault *vmf)
+{
+	vmf->pte = pte_offset_map_lock(vmf->vma->vm_mm, vmf->pmd, vmf->address,
+				       &vmf->ptl);
+	return true;
+}
+
+static inline bool pte_spinlock(struct vm_fault *vmf)
+{
+	vmf->ptl = pte_lockptr(vmf->vma->vm_mm, vmf->pmd);
+	spin_lock(vmf->ptl);
+	return true;
+}
+
+#endif /* CONFIG_SPECULATIVE_PAGE_FAULT */
+#endif /* CONFIG_MMU */
+
 #endif /* __KERNEL__ */
 #endif /* _LINUX_MM_H */
diff --git a/mm/memory.c b/mm/memory.c
index 3f5c3d6c0197..e2f9e4c096dd 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2586,6 +2586,72 @@ EXPORT_SYMBOL_GPL(apply_to_existing_page_range);
 #define speculative_page_walk_end()   local_irq_enable()
 #endif
 
+bool __pte_map_lock(struct vm_fault *vmf)
+{
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+	pmd_t pmdval;
+#endif
+	pte_t *pte = vmf->pte;
+	spinlock_t *ptl;
+
+	if (!(vmf->flags & FAULT_FLAG_SPECULATIVE)) {
+		vmf->ptl = pte_lockptr(vmf->vma->vm_mm, vmf->pmd);
+		if (!pte)
+			vmf->pte = pte_offset_map(vmf->pmd, vmf->address);
+		spin_lock(vmf->ptl);
+		return true;
+	}
+
+	speculative_page_walk_begin();
+	if (!mmap_seq_read_check(vmf->vma->vm_mm, vmf->seq))
+		goto fail;
+	/*
+	 * The mmap sequence count check guarantees that the page
+	 * tables are still valid at that point, and
+	 * speculative_page_walk_begin() ensures that they stay around.
+	 */
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+	/*
+	 * We check if the pmd value is still the same to ensure that there
+	 * is not a huge collapse operation in progress behind our back.
+	 */
+	pmdval = READ_ONCE(*vmf->pmd);
+	if (!pmd_same(pmdval, vmf->orig_pmd))
+		goto fail;
+#endif
+	ptl = pte_lockptr(vmf->vma->vm_mm, vmf->pmd);
+	if (!pte)
+		pte = pte_offset_map(vmf->pmd, vmf->address);
+	/*
+	 * Try locking the page table.
+	 *
+	 * Note that we might race against zap_pte_range() which
+	 * invalidates TLBs while holding the page table lock.
+	 * We are still under the speculative_page_walk_begin() section,
+	 * and zap_pte_range() could thus deadlock with us if we tried
+	 * using spin_lock() here.
+	 *
+	 * We also don't want to retry until spin_trylock() succeeds,
+	 * because of the starvation potential against a stream of lockers.
+	 */
+	if (unlikely(!spin_trylock(ptl)))
+		goto fail;
+	if (!mmap_seq_read_check(vmf->vma->vm_mm, vmf->seq))
+		goto unlock_fail;
+	speculative_page_walk_end();
+	vmf->pte = pte;
+	vmf->ptl = ptl;
+	return true;
+
+unlock_fail:
+	spin_unlock(ptl);
+fail:
+	if (pte)
+		pte_unmap(pte);
+	speculative_page_walk_end();
+	return false;
+}
+
 #endif /* CONFIG_SPECULATIVE_PAGE_FAULT */
 
 /*
-- 
2.20.1