Received: by 10.223.185.116 with SMTP id b49csp4199264wrg; Mon, 26 Feb 2018 13:02:47 -0800 (PST) X-Google-Smtp-Source: AH8x227FOYfEQDgZ4hiYNQETXtkL12NgsrpLnxltUPUwuaLgJF8gmOnaGED2K8Q5IiLFQcC1bcdu X-Received: by 10.99.156.17 with SMTP id f17mr9580244pge.12.1519678967501; Mon, 26 Feb 2018 13:02:47 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1519678967; cv=none; d=google.com; s=arc-20160816; b=u4mZK6YUeBSgAQ1OmX5g2PiropidUTot1JAq8jjcZLstcmBX2iCkxhL+KR3M4F+Rfu XlqaAxBi6v3L3XySBZMSxcA556a6ioo2c20PsRNctwspSAJD+sAfBMs6KlsJwxRGp4F4 qmYJkrf5bp/IDcyoGMrv+xa+vn8oYBhf4TfsvUCNcJnVMl75soVs/dTpUmHkYGaCFhGF sjWo54mxup7M98lHtxg/40UVdap0u47ehQqEofNuBM4gy5tvsRUwLy1XbQbFcZ2PsYg6 TdIYVDGkKD38Qqbrga6VeJqv1UIwoBezIMzsTEao9HMlIWodpDNYOjV3BD6ap61EBGAe Xz0A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:mime-version:user-agent:references :in-reply-to:message-id:date:subject:cc:to:from :arc-authentication-results; bh=xLK+v4M8nEIpPPuIfhzGS9qigJCCFfJCe4WXqiujY6Y=; b=kRJwyy6HTwgUhY9RKSGq7NQ7NEcBTYm4QudxBk5aQlPdRD0tiTDQ1Up2dmVdY35d8a 6prxtcEUzcdbbJKBIQm0Wa3/ytoeX3whIzjGUj1RqAJWxvwGtHd9A22q0HjbJMImbiFZ cWVyhQJpLYHOWMO6A8DgH9A7/ceqs/JN2B9ZR5QmjP+xxouPI0QoX4cNGxpg1iFop2Ee T0BwUd40YVe0W5MJjLfFWhwhCkVX6wJCdAVJldeQYaCX/myGkwlZ7a2PB/4D7PNF8MES 7Q+J+1FTHY4hbaRoff6qiU4fIGFWzdZPB6DUSCRu+T+LcOjo3qV2oPCk6HFvXrPPfDue UGDA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id h3-v6si7269650plh.694.2018.02.26.13.02.20; Mon, 26 Feb 2018 13:02:47 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752189AbeBZUWP (ORCPT + 99 others); Mon, 26 Feb 2018 15:22:15 -0500 Received: from mail.linuxfoundation.org ([140.211.169.12]:34008 "EHLO mail.linuxfoundation.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751591AbeBZUWO (ORCPT ); Mon, 26 Feb 2018 15:22:14 -0500 Received: from localhost (clnet-b04-243.ikbnet.co.at [83.175.124.243]) by mail.linuxfoundation.org (Postfix) with ESMTPSA id 0C146BAE; Mon, 26 Feb 2018 20:22:13 +0000 (UTC) From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman , stable@vger.kernel.org, Ross Zwisler , Jan Kara , Pawel Lebioda , "Darrick J. Wong" , Alexander Viro , Christoph Hellwig , Dan Williams , Dave Hansen , Matthew Wilcox , "Kirill A . Shutemov" , Dave Jiang , Xiong Zhou , Eryu Guan , Andrew Morton , Linus Torvalds Subject: [PATCH 4.9 29/39] mm: avoid spurious bad pmd warning messages Date: Mon, 26 Feb 2018 21:20:50 +0100 Message-Id: <20180226201644.948091150@linuxfoundation.org> X-Mailer: git-send-email 2.16.2 In-Reply-To: <20180226201643.660109883@linuxfoundation.org> References: <20180226201643.660109883@linuxfoundation.org> User-Agent: quilt/0.65 X-stable: review MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org 4.9-stable review patch. If anyone has any objections, please let me know. ------------------ From: Ross Zwisler commit d0f0931de936a0a468d7e59284d39581c16d3a73 upstream. When the pmd_devmap() checks were added by 5c7fb56e5e3f ("mm, dax: dax-pmd vs thp-pmd vs hugetlbfs-pmd") to add better support for DAX huge pages, they were all added to the end of if() statements after existing pmd_trans_huge() checks. So, things like: - if (pmd_trans_huge(*pmd)) + if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd)) When further checks were added after pmd_trans_unstable() checks by commit 7267ec008b5c ("mm: postpone page table allocation until we have page to map") they were also added at the end of the conditional: + if (pmd_trans_unstable(fe->pmd) || pmd_devmap(*fe->pmd)) This ordering is fine for pmd_trans_huge(), but doesn't work for pmd_trans_unstable(). This is because DAX huge pages trip the bad_pmd() check inside of pmd_none_or_trans_huge_or_clear_bad() (called by pmd_trans_unstable()), which prints out a warning and returns 1. So, we do end up doing the right thing, but only after spamming dmesg with suspicious looking messages: mm/pgtable-generic.c:39: bad pmd ffff8808daa49b88(84000001006000a5) Reorder these checks in a helper so that pmd_devmap() is checked first, avoiding the error messages, and add a comment explaining why the ordering is important. Fixes: commit 7267ec008b5c ("mm: postpone page table allocation until we have page to map") Link: http://lkml.kernel.org/r/20170522215749.23516-1-ross.zwisler@linux.intel.com Signed-off-by: Ross Zwisler Reviewed-by: Jan Kara Cc: Pawel Lebioda Cc: "Darrick J. Wong" Cc: Alexander Viro Cc: Christoph Hellwig Cc: Dan Williams Cc: Dave Hansen Cc: Matthew Wilcox Cc: "Kirill A . Shutemov" Cc: Dave Jiang Cc: Xiong Zhou Cc: Eryu Guan Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- mm/memory.c | 40 ++++++++++++++++++++++++++++++---------- 1 file changed, 30 insertions(+), 10 deletions(-) --- a/mm/memory.c +++ b/mm/memory.c @@ -2848,6 +2848,17 @@ static int __do_fault(struct fault_env * return ret; } +/* + * The ordering of these checks is important for pmds with _PAGE_DEVMAP set. + * If we check pmd_trans_unstable() first we will trip the bad_pmd() check + * inside of pmd_none_or_trans_huge_or_clear_bad(). This will end up correctly + * returning 1 but not before it spams dmesg with the pmd_clear_bad() output. + */ +static int pmd_devmap_trans_unstable(pmd_t *pmd) +{ + return pmd_devmap(*pmd) || pmd_trans_unstable(pmd); +} + static int pte_alloc_one_map(struct fault_env *fe) { struct vm_area_struct *vma = fe->vma; @@ -2871,18 +2882,27 @@ static int pte_alloc_one_map(struct faul map_pte: /* * If a huge pmd materialized under us just retry later. Use - * pmd_trans_unstable() instead of pmd_trans_huge() to ensure the pmd - * didn't become pmd_trans_huge under us and then back to pmd_none, as - * a result of MADV_DONTNEED running immediately after a huge pmd fault - * in a different thread of this mm, in turn leading to a misleading - * pmd_trans_huge() retval. All we have to ensure is that it is a - * regular pmd that we can walk with pte_offset_map() and we can do that - * through an atomic read in C, which is what pmd_trans_unstable() - * provides. + * pmd_trans_unstable() via pmd_devmap_trans_unstable() instead of + * pmd_trans_huge() to ensure the pmd didn't become pmd_trans_huge + * under us and then back to pmd_none, as a result of MADV_DONTNEED + * running immediately after a huge pmd fault in a different thread of + * this mm, in turn leading to a misleading pmd_trans_huge() retval. + * All we have to ensure is that it is a regular pmd that we can walk + * with pte_offset_map() and we can do that through an atomic read in + * C, which is what pmd_trans_unstable() provides. */ - if (pmd_trans_unstable(fe->pmd) || pmd_devmap(*fe->pmd)) + if (pmd_devmap_trans_unstable(fe->pmd)) return VM_FAULT_NOPAGE; + /* + * At this point we know that our vmf->pmd points to a page of ptes + * and it cannot become pmd_none(), pmd_devmap() or pmd_trans_huge() + * for the duration of the fault. If a racing MADV_DONTNEED runs and + * we zap the ptes pointed to by our vmf->pmd, the vmf->ptl will still + * be valid and we will re-check to make sure the vmf->pte isn't + * pte_none() under vmf->ptl protection when we return to + * alloc_set_pte(). + */ fe->pte = pte_offset_map_lock(vma->vm_mm, fe->pmd, fe->address, &fe->ptl); return 0; @@ -3456,7 +3476,7 @@ static int handle_pte_fault(struct fault fe->pte = NULL; } else { /* See comment in pte_alloc_one_map() */ - if (pmd_trans_unstable(fe->pmd) || pmd_devmap(*fe->pmd)) + if (pmd_devmap_trans_unstable(fe->pmd)) return 0; /* * A regular pmd is established and it can't morph into a huge