From: Jann Horn
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Jann Horn, Matthew Wilcox, "Kirill A . Shutemov", John Hubbard, Jan Kara, stable@vger.kernel.org
Subject: [PATCH] mm/gup: fix try_grab_compound_head() race with split_huge_page()
Date: Fri, 11 Jun 2021 10:00:27 +0200
Message-Id: <20210611080027.984937-1-jannh@google.com>

try_grab_compound_head() is used to grab a reference to a page from
get_user_pages_fast(), which is only protected against concurrent
freeing of page tables (via local_irq_save()), but not against
concurrent TLB flushes, freeing of data pages, or splitting of compound
pages.

Because no reference is held to the page when try_grab_compound_head()
is called, the page may have been freed and reallocated by the time its
refcount has been elevated; therefore, once we're holding a stable
reference to the page, the caller re-checks whether the PTE still points
to the same page (with the same access rights).
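To make the "grab first, then re-check" pattern above concrete, here is a minimal user-space model of the lockless lookup. It is a sketch under stated assumptions, not kernel code: `struct spec_page`, `spec_try_get()`, `spec_put()` and `spec_gup_fast()` are hypothetical names, a single atomic pointer stands in for the PTE, and C11 `<stdatomic.h>` operations stand in for the kernel's `page_ref_*` helpers. The model compares only the page pointer; the real re-check also covers access rights, as described above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page: just a reference count. */
struct spec_page {
	atomic_int refcount;
};

/*
 * Take a reference unless the page is already free (refcount == 0), in the
 * spirit of the kernel's get_page_unless_zero().
 */
static bool spec_try_get(struct spec_page *page)
{
	int old = atomic_load(&page->refcount);

	while (old > 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&page->refcount, &old, old + 1))
			return true;
	}
	return false;
}

static void spec_put(struct spec_page *page)
{
	int old = atomic_fetch_sub(&page->refcount, 1);

	assert(old > 0);	/* must not drop a reference we never held */
	(void)old;
}

/*
 * Lockless lookup: read the "PTE", take a speculative reference, then
 * re-check that the PTE still points at the same page. If it changed, the
 * page may have been freed and reused meanwhile, so undo and fail.
 */
static struct spec_page *spec_gup_fast(struct spec_page *_Atomic *pte)
{
	struct spec_page *page = atomic_load(pte);

	if (!page || !spec_try_get(page))
		return NULL;
	if (atomic_load(pte) != page) {
		spec_put(page);
		return NULL;
	}
	return page;
}
```

Refusing to move a refcount from 0 to 1 keeps an already-freed page from being resurrected; the PTE re-check then catches the case where the page was freed and reused between the PTE read and the increment.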
The problem is that try_grab_compound_head() has to grab a reference on
the head page; but between the time we look up what the head page is and
the time we actually grab a reference on the head page, the compound page
may have been split up (either explicitly through split_huge_page() or by
freeing the compound page to the buddy allocator and then allocating its
individual order-0 pages).
If that happens, get_user_pages_fast() may end up returning the right
page but lifting the refcount on a now-unrelated page, leading to
use-after-free of pages.

To fix it:
Re-check whether the pages still belong together after lifting the
refcount on the head page.
Move anything else that checks compound_head(page) below the refcount
increment.

This can't actually happen on bare-metal x86 (because there, disabling
IRQs locks out remote TLB flushes), but it can happen on virtualized x86
(e.g. under KVM) and probably also on arm64. The race window is pretty
narrow, and constantly allocating and shattering hugepages isn't exactly
fast; for now I've only managed to reproduce this in an x86 KVM guest
with an artificially widened timing window (by adding a loop that
repeatedly calls `inl(0x3f8 + 5)` in `try_get_compound_head()` to force
VM exits, so that PV TLB flushes are used instead of IPIs).

Cc: Matthew Wilcox
Cc: Kirill A. Shutemov
Cc: John Hubbard
Cc: Jan Kara
Cc: stable@vger.kernel.org
Fixes: 7aef4172c795 ("mm: handle PTE-mapped tail pages in gerneric fast gup implementaiton")
Signed-off-by: Jann Horn
---
 mm/gup.c | 54 +++++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 39 insertions(+), 15 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 3ded6a5f26b2..1f9c0ac15073 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -43,8 +43,21 @@ static void hpage_pincount_sub(struct page *page, int refs)
 
 	atomic_sub(refs, compound_pincount_ptr(page));
 }
 
+/* Equivalent to calling put_page() @refs times. */
+static void put_page_refs(struct page *page, int refs)
+{
+	VM_BUG_ON_PAGE(page_ref_count(page) < refs, page);
+	/*
+	 * Calling put_page() for each ref is unnecessarily slow. Only the last
+	 * ref needs a put_page().
+	 */
+	if (refs > 1)
+		page_ref_sub(page, refs - 1);
+	put_page(page);
+}
+
 /*
  * Return the compound head page with ref appropriately incremented,
  * or NULL if that failed.
  */
@@ -55,8 +68,23 @@ static inline struct page *try_get_compound_head(struct page *page, int refs)
 	if (WARN_ON_ONCE(page_ref_count(head) < 0))
 		return NULL;
 	if (unlikely(!page_cache_add_speculative(head, refs)))
 		return NULL;
+
+	/*
+	 * At this point we have a stable reference to the head page; but it
+	 * could be that between the compound_head() lookup and the refcount
+	 * increment, the compound page was split, in which case we'd end up
+	 * holding a reference on a page that has nothing to do with the page
+	 * we were given anymore.
+	 * So now that the head page is stable, recheck that the pages still
+	 * belong together.
+	 */
+	if (unlikely(compound_head(page) != head)) {
+		put_page_refs(head, refs);
+		return NULL;
+	}
+
 	return head;
 }
 
 /*
@@ -94,25 +122,28 @@ __maybe_unused struct page *try_grab_compound_head(struct page *page,
 		if (unlikely((flags & FOLL_LONGTERM) &&
 			     !is_pinnable_page(page)))
 			return NULL;
 
+		/*
+		 * CAUTION: Don't use compound_head() on the page before this
+		 * point, the result won't be stable.
+		 */
+		page = try_get_compound_head(page, refs);
+		if (!page)
+			return NULL;
+
 		/*
 		 * When pinning a compound page of order > 1 (which is what
 		 * hpage_pincount_available() checks for), use an exact count to
 		 * track it, via hpage_pincount_add/_sub().
 		 *
 		 * However, be sure to *also* increment the normal page refcount
 		 * field at least once, so that the page really is pinned.
 		 */
-		if (!hpage_pincount_available(page))
-			refs *= GUP_PIN_COUNTING_BIAS;
-
-		page = try_get_compound_head(page, refs);
-		if (!page)
-			return NULL;
-
 		if (hpage_pincount_available(page))
 			hpage_pincount_add(page, refs);
+		else
+			page_ref_add(page, refs * (GUP_PIN_COUNTING_BIAS - 1));
 
 		mod_node_page_state(page_pgdat(page), NR_FOLL_PIN_ACQUIRED,
 				    orig_refs);
 
@@ -134,16 +165,9 @@ static void put_compound_head(struct page *page, int refs, unsigned int flags)
 		else
 			refs *= GUP_PIN_COUNTING_BIAS;
 	}
 
-	VM_BUG_ON_PAGE(page_ref_count(page) < refs, page);
-	/*
-	 * Calling put_page() for each ref is unnecessarily slow. Only the last
-	 * ref needs a put_page().
-	 */
-	if (refs > 1)
-		page_ref_sub(page, refs - 1);
-	put_page(page);
+	put_page_refs(page, refs);
 }
 
 /**
  * try_grab_page() - elevate a page's refcount by a flag-dependent amount

base-commit: 614124bea77e452aa6df7a8714e8bc820b489922
-- 
2.32.0.272.g935e593368-goog
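The order of operations in the patched try_get_compound_head() can be sketched in user space as well. The model below is hypothetical and heavily simplified: `struct model_page` keeps only a refcount and a head pointer, `model_try_get_compound_head()` mirrors the lookup, increment and re-check sequence from the diff, `model_put_page_refs()` drops several references at once like the new put_page_refs() helper (but never frees anything), and `race_hook`/`simulate_split()` are test-only assumptions that inject a split between the head lookup and the refcount increment.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of a page: a refcount plus a pointer to the compound
 * head. A head page (or an order-0 page) points to itself. Only the head's
 * refcount is meaningful for a compound page.
 */
struct model_page {
	atomic_int refcount;
	struct model_page *_Atomic head;
};

/* Test-only hook run between the head lookup and the increment (assumption). */
static void (*race_hook)(struct model_page *page);

/* Simulated split: @page becomes its own (order-0) head. */
static void simulate_split(struct model_page *page)
{
	atomic_store(&page->head, page);
}

static struct model_page *model_compound_head(struct model_page *page)
{
	return atomic_load(&page->head);
}

/* Add @refs references unless the refcount is zero (page already freed). */
static bool model_ref_add_unless_zero(struct model_page *page, int refs)
{
	int old = atomic_load(&page->refcount);

	while (old > 0) {
		if (atomic_compare_exchange_weak(&page->refcount, &old,
						 old + refs))
			return true;
	}
	return false;
}

/* Drop @refs references at once; unlike put_page(), never frees. */
static void model_put_page_refs(struct model_page *page, int refs)
{
	int old = atomic_fetch_sub(&page->refcount, refs);

	assert(old >= refs);
	(void)old;
}

static struct model_page *model_try_get_compound_head(struct model_page *page,
						      int refs)
{
	struct model_page *head = model_compound_head(page);

	if (race_hook)
		race_hook(page);	/* window: a split may happen here */

	if (!model_ref_add_unless_zero(head, refs))
		return NULL;

	/*
	 * The head's refcount is now elevated, so re-check that @page still
	 * belongs to @head; if it was split meanwhile, undo and fail.
	 */
	if (model_compound_head(page) != head) {
		model_put_page_refs(head, refs);
		return NULL;
	}
	return head;
}
```

With `race_hook` left NULL, calling `model_try_get_compound_head(&tail, 3)` on a tail page returns the head with three extra references. With `race_hook = simulate_split`, the same call notices that the tail no longer belongs to the head, drops the three references again and returns NULL, which is the behaviour the patch adds. The FOLL_PIN reordering in the diff keeps the totals unchanged: taking `refs` references first and then adding `refs * (GUP_PIN_COUNTING_BIAS - 1)` still yields `refs * GUP_PIN_COUNTING_BIAS`.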