From: Michel Lespinasse <michel@lespinasse.org>
To: Linux-MM
Cc: Laurent Dufour, Peter Zijlstra, Michal Hocko, Matthew Wilcox,
    Rik van Riel, Paul McKenney, Andrew Morton, Suren Baghdasaryan,
    Joel Fernandes, Rom Lemarchand, Linux-Kernel, Michel Lespinasse
Subject: [RFC PATCH 21/37] mm: implement speculative handling in do_swap_page()
Date: Tue, 6 Apr 2021 18:44:46 -0700
Message-Id: <20210407014502.24091-22-michel@lespinasse.org>
In-Reply-To: <20210407014502.24091-1-michel@lespinasse.org>
References: <20210407014502.24091-1-michel@lespinasse.org>

If the pte is larger than long, use pte_spinlock() to lock the page
table while verifying the pte; pte_spinlock() is necessary to ensure
the page table is still valid at the point where we lock it.
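(Aside for reviewers: the open-coded check at the top of do_swap_page()
in the second hunk below is equivalent to the following sketch. This is
illustrative only - the helper name is made up, and it assumes that
pte_spinlock(), introduced earlier in this series, returns false when a
speculative fault must be aborted because the vma changed under us.)

/*
 * Illustrative sketch, not part of this patch: the old pte_unmap_same()
 * logic recast in terms of struct vm_fault, as the hunk below open-codes
 * it. A negative return means the caller should return VM_FAULT_RETRY;
 * 0 means the pte changed and the fault should back out.
 */
static inline int pte_unmap_same_spf(struct vm_fault *vmf)
{
	int same = 1;
#if defined(CONFIG_SMP) || defined(CONFIG_PREEMPTION)
	if (sizeof(pte_t) > sizeof(unsigned long)) {
		/* Lock the page table; fails if the vma has changed. */
		if (!pte_spinlock(vmf))
			return -1;
		same = pte_same(*vmf->pte, vmf->orig_pte);
		spin_unlock(vmf->ptl);
	}
#endif
	pte_unmap(vmf->pte);
	vmf->pte = NULL;
	return same;
}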
Abort speculative faults if the pte is not a swap entry, or if the
desired page is not found in the swap cache, to keep things as simple
as possible.

Only use trylock when locking the swapped page - again, to keep things
simple; also, the usual lock_page_or_retry() would otherwise try to
release the mmap lock, which is not held in the speculative case.

Use pte_map_lock() to ensure proper synchronization when finally
committing the faulted page to the mm address space.

Signed-off-by: Michel Lespinasse <michel@lespinasse.org>
---
 mm/memory.c | 74 ++++++++++++++++++++++++++++++-----------------------
 1 file changed, 42 insertions(+), 32 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index fc555fae0844..ab3160719bf3 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2637,30 +2637,6 @@ bool __pte_map_lock(struct vm_fault *vmf)
 
 #endif /* CONFIG_SPECULATIVE_PAGE_FAULT */
 
-/*
- * handle_pte_fault chooses page fault handler according to an entry which was
- * read non-atomically. Before making any commitment, on those architectures
- * or configurations (e.g. i386 with PAE) which might give a mix of unmatched
- * parts, do_swap_page must check under lock before unmapping the pte and
- * proceeding (but do_wp_page is only called after already making such a check;
- * and do_anonymous_page can safely check later on).
- */
-static inline int pte_unmap_same(struct mm_struct *mm, pmd_t *pmd,
-				pte_t *page_table, pte_t orig_pte)
-{
-	int same = 1;
-#if defined(CONFIG_SMP) || defined(CONFIG_PREEMPTION)
-	if (sizeof(pte_t) > sizeof(unsigned long)) {
-		spinlock_t *ptl = pte_lockptr(mm, pmd);
-		spin_lock(ptl);
-		same = pte_same(*page_table, orig_pte);
-		spin_unlock(ptl);
-	}
-#endif
-	pte_unmap(page_table);
-	return same;
-}
-
 static inline bool cow_user_page(struct page *dst, struct page *src,
 				 struct vm_fault *vmf)
 {
@@ -3369,12 +3345,34 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 		return VM_FAULT_RETRY;
 	}
 
-	if (!pte_unmap_same(vma->vm_mm, vmf->pmd, vmf->pte, vmf->orig_pte))
-		goto out;
+#if defined(CONFIG_SMP) || defined(CONFIG_PREEMPTION)
+	if (sizeof(pte_t) > sizeof(unsigned long)) {
+		/*
+		 * vmf->orig_pte was read non-atomically. Before making
+		 * any commitment, on those architectures or configurations
+		 * (e.g. i386 with PAE) which might give a mix of
+		 * unmatched parts, we must check under lock before
+		 * unmapping the pte and proceeding.
+		 *
+		 * (but do_wp_page is only called after already making
+		 * such a check; and do_anonymous_page can safely
+		 * check later on).
+		 */
+		if (!pte_spinlock(vmf))
+			return VM_FAULT_RETRY;
+		if (!pte_same(*vmf->pte, vmf->orig_pte))
+			goto unlock;
+		spin_unlock(vmf->ptl);
+	}
+#endif
+	pte_unmap(vmf->pte);
+	vmf->pte = NULL;
 
 	entry = pte_to_swp_entry(vmf->orig_pte);
 	if (unlikely(non_swap_entry(entry))) {
-		if (is_migration_entry(entry)) {
+		if (vmf->flags & FAULT_FLAG_SPECULATIVE) {
+			ret = VM_FAULT_RETRY;
+		} else if (is_migration_entry(entry)) {
 			migration_entry_wait(vma->vm_mm, vmf->pmd,
 					     vmf->address);
 		} else if (is_device_private_entry(entry)) {
@@ -3395,8 +3393,14 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	swapcache = page;
 
 	if (!page) {
-		struct swap_info_struct *si = swp_swap_info(entry);
+		struct swap_info_struct *si;
 
+		if (vmf->flags & FAULT_FLAG_SPECULATIVE) {
+			delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
+			return VM_FAULT_RETRY;
+		}
+
+		si = swp_swap_info(entry);
 		if (data_race(si->flags & SWP_SYNCHRONOUS_IO) &&
 		    __swap_count(entry) == 1) {
 			/* skip swapcache */
@@ -3459,7 +3463,10 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 		goto out_release;
 	}
 
-	locked = lock_page_or_retry(page, vma->vm_mm, vmf->flags);
+	if (vmf->flags & FAULT_FLAG_SPECULATIVE)
+		locked = trylock_page(page);
+	else
+		locked = lock_page_or_retry(page, vma->vm_mm, vmf->flags);
 
 	delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
 	if (!locked) {
@@ -3487,10 +3494,13 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	cgroup_throttle_swaprate(page, GFP_KERNEL);
 
 	/*
-	 * Back out if somebody else already faulted in this pte.
+	 * Back out if the VMA has changed behind our back during a speculative
+	 * page fault or if somebody else already faulted in this pte.
 	 */
-	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address,
-			&vmf->ptl);
+	if (!pte_map_lock(vmf)) {
+		ret = VM_FAULT_RETRY;
+		goto out_page;
+	}
 	if (unlikely(!pte_same(*vmf->pte, vmf->orig_pte)))
 		goto out_nomap;

-- 
2.20.1