From: David Hildenbrand <david@redhat.com>
To: linux-kernel@vger.kernel.org
Cc: Andrew Morton, Hugh Dickins, Linus Torvalds, David Rientjes,
    Shakeel Butt, John Hubbard, Jason Gunthorpe, Mike Kravetz,
    Mike Rapoport, Yang Shi, "Kirill A. Shutemov", Matthew Wilcox,
    Vlastimil Babka, Jann Horn, Michal Hocko, Nadav Amit,
    Rik van Riel, Roman Gushchin, Andrea Arcangeli, Peter Xu,
    Donald Dutile, Christoph Hellwig, Oleg Nesterov, Jan Kara,
    Liang Zhang, Pedro Gomes, Oded Gabbay, linux-mm@kvack.org,
    David Hildenbrand
Subject: [PATCH v1 15/15] mm/gup: sanity-check with CONFIG_DEBUG_VM that anonymous pages are exclusive when (un)pinning
Date: Tue, 8 Mar 2022 15:14:37 +0100
Message-Id: <20220308141437.144919-16-david@redhat.com>
In-Reply-To: <20220308141437.144919-1-david@redhat.com>
References: <20220308141437.144919-1-david@redhat.com>

Let's verify when (un)pinning anonymous pages that we always deal with
exclusive anonymous pages, which guarantees that we'll have a reliable
PIN, meaning that we cannot end up with the GUP pin being inconsistent
with the pages mapped into the page tables due to a COW triggered by a
write fault.

When pinning pages, after conditionally triggering GUP unsharing of
possibly shared anonymous pages, we should always only see exclusive
anonymous pages. Note that anonymous pages that are mapped writable
must be marked exclusive, otherwise we'd have a BUG.

When pinning during ordinary GUP, simply add a check after our
conditional GUP-triggered unsharing checks. As we know exactly how the
page is mapped, we know exactly in which page we have to check for
PageAnonExclusive().

When pinning via GUP-fast we have to be careful, because we can race
with fork(): verify only after we made sure via the seqcount that we
didn't race with concurrent fork() that we didn't end up pinning a
possibly shared anonymous page.

Similarly, when unpinning, verify that the pages are still marked as
exclusive: otherwise something turned the pages possibly shared, which
can result in random memory corruptions, which we really want to
catch.

With only the pinned pages at hand and not the actual page table
entries, we have to be a bit careful: hugetlb pages are always mapped
via a single logical page table entry referencing the head page, so
PG_anon_exclusive of the head page applies. Anon THP are a bit more
complicated, because we might have obtained the page reference either
via a PMD or a PTE -- depending on the mapping type, PageAnonExclusive
of either the head page (PMD-mapped THP) or the tail page (PTE-mapped
THP) applies: as we don't know which, and to make our life easier,
check whether either is set.

Take care not to verify in case we're unpinning during GUP-fast
because we detected concurrent fork(): we might stumble over an
anonymous page that is now shared.
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/gup.c         | 58 +++++++++++++++++++++++++++++++++++++++++++++++-
 mm/huge_memory.c |  3 +++
 mm/hugetlb.c     |  3 +++
 3 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/mm/gup.c b/mm/gup.c
index 9e4864130202..b36f02f2b720 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -45,6 +45,38 @@ static void hpage_pincount_sub(struct page *page, int refs)
 	atomic_sub(refs, compound_pincount_ptr(page));
 }
 
+static inline void sanity_check_pinned_pages(struct page **pages,
+					     unsigned long npages)
+{
+#ifdef CONFIG_DEBUG_VM
+	/*
+	 * We only pin anonymous pages if they are exclusive. Once pinned, we
+	 * can no longer turn them possibly shared and PageAnonExclusive() will
+	 * stick around until the page is freed.
+	 *
+	 * We'd like to verify that our pinned anonymous pages are still mapped
+	 * exclusively. The issue with anon THP is that we don't know how
+	 * they are/were mapped when pinning them. However, for anon
+	 * THP we can assume that either the given page (PTE-mapped THP) or
+	 * the head page (PMD-mapped THP) should be PageAnonExclusive(). If
+	 * neither is the case, there is certainly something wrong.
+	 */
+	for (; npages; npages--, pages++) {
+		struct page *page = *pages;
+		struct page *head = compound_head(page);
+
+		if (!PageAnon(head))
+			continue;
+		if (!PageCompound(head) || PageHuge(head))
+			VM_BUG_ON_PAGE(!PageAnonExclusive(head), page);
+		else
+			/* Either a PTE-mapped or a PMD-mapped THP. */
+			VM_BUG_ON_PAGE(!PageAnonExclusive(head) &&
+				       !PageAnonExclusive(page), page);
+	}
+#endif /* CONFIG_DEBUG_VM */
+}
+
 /* Equivalent to calling put_page() @refs times. */
 static void put_page_refs(struct page *page, int refs)
 {
@@ -250,6 +282,7 @@ bool __must_check try_grab_page(struct page *page, unsigned int flags)
  */
 void unpin_user_page(struct page *page)
 {
+	sanity_check_pinned_pages(&page, 1);
 	put_compound_head(compound_head(page), 1, FOLL_PIN);
 }
 EXPORT_SYMBOL(unpin_user_page);
@@ -340,6 +373,7 @@ void unpin_user_pages_dirty_lock(struct page **pages, unsigned long npages,
 		return;
 	}
 
+	sanity_check_pinned_pages(pages, npages);
 	for_each_compound_head(index, pages, npages, head, ntails) {
 		/*
 		 * Checking PageDirty at this point may race with
@@ -404,6 +438,21 @@ void unpin_user_page_range_dirty_lock(struct page *page, unsigned long npages,
 }
 EXPORT_SYMBOL(unpin_user_page_range_dirty_lock);
 
+static void unpin_user_pages_lockless(struct page **pages, unsigned long npages)
+{
+	unsigned long index;
+	struct page *head;
+	unsigned int ntails;
+
+	/*
+	 * Don't perform any sanity checks because we might have raced with
+	 * fork() and some anonymous pages might now actually be shared --
+	 * which is why we're unpinning after all.
+	 */
+	for_each_compound_head(index, pages, npages, head, ntails)
+		put_compound_head(head, ntails, FOLL_PIN);
+}
+
 /**
  * unpin_user_pages() - release an array of gup-pinned pages.
  * @pages:  array of pages to be marked dirty and released.
@@ -426,6 +475,7 @@ void unpin_user_pages(struct page **pages, unsigned long npages)
 	 */
 	if (WARN_ON(IS_ERR_VALUE(npages)))
 		return;
+	sanity_check_pinned_pages(pages, npages);
 	for_each_compound_head(index, pages, npages, head, ntails)
 		put_compound_head(head, ntails, FOLL_PIN);
 }
@@ -572,6 +622,10 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
 		page = ERR_PTR(-EMLINK);
 		goto out;
 	}
+
+	VM_BUG_ON((flags & FOLL_PIN) && PageAnon(page) &&
+		  !PageAnonExclusive(page));
+
 	/* try_grab_page() does nothing unless FOLL_GET or FOLL_PIN is set. */
 	if (unlikely(!try_grab_page(page, flags))) {
 		page = ERR_PTR(-ENOMEM);
@@ -2895,8 +2949,10 @@ static unsigned long lockless_pages_from_mm(unsigned long start,
 	 */
 	if (gup_flags & FOLL_PIN) {
 		if (read_seqcount_retry(&current->mm->write_protect_seq, seq)) {
-			unpin_user_pages(pages, nr_pinned);
+			unpin_user_pages_lockless(pages, nr_pinned);
 			return 0;
+		} else {
+			sanity_check_pinned_pages(pages, nr_pinned);
 		}
 	}
 	return nr_pinned;
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 0f02c121c884..d9559341a5f1 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1392,6 +1392,9 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
 	if (!pmd_write(*pmd) && gup_must_unshare(flags, page))
 		return ERR_PTR(-EMLINK);
 
+	VM_BUG_ON((flags & FOLL_PIN) && PageAnon(page) &&
+		  !PageAnonExclusive(page));
+
 	if (!try_grab_page(page, flags))
 		return ERR_PTR(-ENOMEM);
 
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index c5d63baa2957..0d150d100111 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6092,6 +6092,9 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		pfn_offset = (vaddr & ~huge_page_mask(h)) >> PAGE_SHIFT;
 		page = pte_page(huge_ptep_get(pte));
 
+		VM_BUG_ON((flags & FOLL_PIN) && PageAnon(page) &&
+			  !PageAnonExclusive(page));
+
 		/*
 		 * If subpage information not requested, update counters
 		 * and skip the same_page loop below.
-- 
2.35.1
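
As a usage note (illustration only, not part of the patch): every
FOLL_PIN user goes through the instrumented paths, so with
CONFIG_DEBUG_VM a hypothetical driver-style pin/unpin cycle like the
following exercises the new sanity checks at both ends:

	/*
	 * Hypothetical example (assumes <linux/mm.h> and <linux/slab.h>):
	 * pin_user_pages_fast() hits the VM_BUG_ON()s in the GUP paths if
	 * it would pin a non-exclusive anonymous page, and
	 * unpin_user_pages() runs sanity_check_pinned_pages() and
	 * complains if a pinned anonymous page lost PageAnonExclusive().
	 */
	static int demo_pin_cycle(unsigned long uaddr, int nr_pages)
	{
		struct page **pages;
		int pinned;

		pages = kmalloc_array(nr_pages, sizeof(*pages), GFP_KERNEL);
		if (!pages)
			return -ENOMEM;

		pinned = pin_user_pages_fast(uaddr, nr_pages, FOLL_WRITE,
					     pages);
		if (pinned < 0) {
			kfree(pages);
			return pinned;
		}

		/* ... direct access or DMA to the pinned pages ... */

		unpin_user_pages(pages, pinned);
		kfree(pages);
		return 0;
	}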