References: <20210615012014.1100672-1-jannh@google.com>
 <50d828d1-2ce6-21b4-0e27-fb15daa77561@nvidia.com>
 <6d21f8cb-4b72-bdec-386c-684ddbcdada1@suse.cz>
In-Reply-To: <6d21f8cb-4b72-bdec-386c-684ddbcdada1@suse.cz>
From: Yang Shi
Date: Wed, 16 Jun 2021 11:40:50 -0700
Subject: Re: [PATCH v2] mm/gup: fix try_grab_compound_head() race with split_huge_page()
To: Vlastimil Babka
Cc: Jann Horn, John Hubbard, Matthew Wilcox, Andrew Morton, Linux-MM,
 kernel list, "Kirill A . Shutemov", Jan Kara, stable
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Jun 16, 2021 at 10:27 AM Vlastimil Babka wrote:
>
> On 6/16/21 1:10 AM, Yang Shi wrote:
> > On Tue, Jun 15, 2021 at 5:10 AM Jann Horn wrote:
> >>
> >> On Tue, Jun 15, 2021 at 8:37 AM John Hubbard wrote:
> >> > On 6/14/21 6:20 PM, Jann Horn wrote:
> >> > > try_grab_compound_head() is used to grab a reference to a page from
> >> > > get_user_pages_fast(), which is only protected against concurrent
> >> > > freeing of page tables (via local_irq_save()), but not against
> >> > > concurrent TLB flushes, freeing of data pages, or splitting of compound
> >> > > pages.
> >> [...]
> >> > Reviewed-by: John Hubbard
> >>
> >> Thanks!
> >>
> >> [...]
> >> > > @@ -55,8 +72,23 @@ static inline struct page *try_get_compound_head(struct page *page, int refs)
> >> > >         if (WARN_ON_ONCE(page_ref_count(head) < 0))
> >> > >                 return NULL;
> >> > >         if (unlikely(!page_cache_add_speculative(head, refs)))
> >> > >                 return NULL;
> >> > > +
> >> > > +       /*
> >> > > +        * At this point we have a stable reference to the head page; but it
> >> > > +        * could be that between the compound_head() lookup and the refcount
> >> > > +        * increment, the compound page was split, in which case we'd end up
> >> > > +        * holding a reference on a page that has nothing to do with the page
> >> > > +        * we were given anymore.
> >> > > +        * So now that the head page is stable, recheck that the pages still
> >> > > +        * belong together.
> >> > > +        */
> >> > > +       if (unlikely(compound_head(page) != head)) {
> >> >
> >> > I was just wondering about what all could happen here. Such as: page gets split,
> >> > reallocated into a different-sized compound page, one that still has page pointing
> >> > to head. I think that's OK, because we don't look at or change other huge page
> >> > fields.
> >> >
> >> > But I thought I'd mention the idea in case anyone else has any clever ideas about
> >> > how this simple check might be insufficient here. It seems fine to me, but I
> >> > routinely lack enough imagination about concurrent operations. :)
> >>
> >> Hmmm... I think the scariest aspect here is probably the interaction
> >> with concurrent allocation of a compound page on architectures with
> >> store-store reordering (like ARM). *If* the page allocator handled
> >> compound pages with lockless, non-atomic percpu freelists, I think it
> >> might be possible that the zeroing of tail_page->compound_head in
> >> put_page() could be reordered after the page has been freed,
> >> reallocated and set to refcount 1 again?
> >>
> >> That shouldn't be possible at the moment, but it is still a bit scary.
> >
> > It might be possible after Mel's "mm/page_alloc: Allow high-order
> > pages to be stored on the per-cpu lists" patch
> > (https://patchwork.kernel.org/project/linux-mm/patch/20210611135753.GC30378@techsingularity.net/).
>
> Those would be percpu indeed, but not "lockless, non-atomic", no? They are
> protected by a local_lock.

The local_lock is *not* a lock on non-PREEMPT_RT kernels, IIUC. It
disables preemption and IRQs. But preempt disable is a no-op on
non-preemptible kernels. Disabling IRQs can guarantee atomic context,
but I'm not sure whether that is equivalent to "atomic freelists" in
Jann's context.

>
> >>
> >>
> >> I think the lockless page cache code also has to deal with somewhat
> >> similar ordering concerns when it uses page_cache_get_speculative(),
> >> e.g. in mapping_get_entry() - first it looks up a page pointer with
> >> xas_load(), and any access to the page later on would be a _dependent
> >> load_, but if the page then gets freed, reallocated, and inserted into
> >> the page cache again before the refcount increment and the re-check
> >> using xas_reload(), then there would be no data dependency from
> >> xas_reload() to the following use of the page...
> >>
> >
>
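
For readers following the thread, the "look up, take a speculative
reference, then recheck" sequence discussed above can be modelled in
userspace. This is only a sketch with C11 atomics, not kernel code:
struct model_page, model_get_unless_zero(), model_try_get_head(),
model_race_hook and model_simulate_split() are all hypothetical names
made up for illustration. It also uses the default seq_cst ordering, so
it deliberately sidesteps the store-store reordering question raised in
the thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_page {
	atomic_int refcount;               /* 0 means "free or being freed" */
	_Atomic(struct model_page *) head; /* head of the compound page we belong to */
};

/*
 * Illustration-only hook, called between the head lookup and the
 * refcount increment so that a "concurrent" split can be simulated
 * deterministically from a single thread.
 */
static void (*model_race_hook)(struct model_page *page);

/* Pretend the compound page was split: @page becomes its own head. */
static void model_simulate_split(struct model_page *page)
{
	atomic_store(&page->head, page);
	atomic_store(&page->refcount, 1);
}

/* Take @refs references unless the count is already zero. */
static bool model_get_unless_zero(struct model_page *p, int refs)
{
	int old = atomic_load(&p->refcount);

	do {
		if (old == 0)
			return false;
		/* On failure, the CAS reloads the current value into 'old'. */
	} while (!atomic_compare_exchange_weak(&p->refcount, &old, old + refs));
	return true;
}

/* Returns the head with @refs extra references, or NULL if we lost a race. */
static struct model_page *model_try_get_head(struct model_page *page, int refs)
{
	struct model_page *head = atomic_load(&page->head);

	assert(refs > 0);

	if (model_race_hook)
		model_race_hook(page);

	if (!model_get_unless_zero(head, refs))
		return NULL;

	/*
	 * The head is now pinned, but @page may have been split away from
	 * it in the window above. Recheck, and undo the references if the
	 * two pages no longer belong together. (Real code would also have
	 * to handle dropping the last reference here; this model does not.)
	 */
	if (atomic_load(&page->head) != head) {
		atomic_fetch_sub(&head->refcount, refs);
		return NULL;
	}
	return head;
}
```

Pointing model_race_hook at model_simulate_split before calling
model_try_get_head() on a tail page makes the recheck fail every time:
the call returns NULL and the old head's count ends up where it started.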