From: Linus Torvalds
Date: Tue, 12 Jan 2021 19:31:07 -0800
Subject: Re: [PATCH 0/1] mm: restore full accuracy in COW page reuse
To: Matthew Wilcox
Cc: John Hubbard, Andrea Arcangeli, Andrew Morton, Linux-MM,
    Linux Kernel Mailing List, Yu Zhao, Andy Lutomirski, Peter Xu,
    Pavel Emelyanov, Mike Kravetz, Mike Rapoport, Minchan Kim,
    Will Deacon, Peter Zijlstra, Hugh Dickins, "Kirill A. Shutemov",
    Oleg Nesterov, Jann Horn, Kees Cook, Leon Romanovsky,
    Jason Gunthorpe, Jan Kara, Kirill Tkhai, Nadav Amit, Jens Axboe
In-Reply-To: <20210113021619.GL35215@casper.infradead.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jan 12, 2021 at 6:16 PM Matthew Wilcox wrote:
>
> The thing about the speculative page cache references is that they can
> temporarily bump a refcount on a page which _used_ to be in the page
> cache and has now been reallocated as some other kind of page.

Oh, and thinking about this made me think we might actually have a
serious bug here, and it has nothing what-so-ever to do with COW, GUP,
or even the page count itself.

It's unlikely enough that I think it's mostly theoretical, but tell me
I'm wrong.

PLEASE tell me I'm wrong:

CPU1 does page_cache_get_speculative under RCU lock

CPU2 frees and re-uses the page

    CPU1                                        CPU2
    ----                                        ----

    page = xas_load(&xas);
    if (!page_cache_get_speculative(page))
            goto repeat;
    .. succeeds ..

                                                remove page from XA
                                                release page
                                                reuse for something else

    .. and then re-check ..
    if (unlikely(page != xas_reload(&xas))) {
            put_page(page);
            goto repeat;
    }

Ok, the above all looks fine. We got the speculative ref, but then we
noticed that it's not valid any more, so we put it again. All good,
right?

Wrong.

What if that "reuse for something else" was actually really quick, and
both allocated and released it?

That still sounds good, right? Yes, now the "put_page()" will be the
one that _actually_ releases the page, but we're still fine, right?

Very very wrong.

The "reuse for something else" on CPU2 might have gotten not an order-0
page, but a *high-order* page. So it allocated (and then immediately
free'd) maybe an order-2 allocation with _four_ pages, and the re-use
happened when we had coalesced the buddy pages.

But when we release the page on CPU1, we will release just _one_ page,
and the other three pages will be lost forever.
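
The accounting in that scenario can be replayed in a toy user-space model. This is only a sketch, not kernel code: every name in it (toy_page, toy_alloc, toy_get_speculative, toy_free_pages, toy_put_page, toy_race_leaked_pages) is hypothetical. It assumes that a non-compound order-2 block keeps its refcount only on the first page, that the order-aware free does nothing unless it drops the last reference, and that the order-unaware put releases exactly one page. It also runs the steps sequentially in one possible interleaving instead of using real threads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define NR_PAGES 4 /* exactly one order-2 block */

/* Hypothetical stand-in for struct page: a refcount and a "free" flag. */
struct toy_page {
	int refcount;
	bool is_free;
};

static struct toy_page mem[NR_PAGES];

/* Non-compound high-order allocation: only the first page holds a ref. */
static struct toy_page *toy_alloc(unsigned int order)
{
	unsigned int i, nr = 1u << order;

	assert(nr <= NR_PAGES);
	for (i = 0; i < nr; i++) {
		mem[i].is_free = false;
		mem[i].refcount = 0;
	}
	mem[0].refcount = 1;
	return &mem[0];
}

/* Speculative get: succeeds whenever the refcount is non-zero. */
static bool toy_get_speculative(struct toy_page *page)
{
	if (page->refcount == 0)
		return false;
	page->refcount++;
	return true;
}

/* Order-aware free: releases the whole block only on the last ref. */
static void toy_free_pages(struct toy_page *page, unsigned int order)
{
	unsigned int i;

	if (--page->refcount != 0)
		return;
	for (i = 0; i < (1u << order); i++)
		page[i].is_free = true;
}

/* Order-unaware put: the last ref releases exactly one page. */
static void toy_put_page(struct toy_page *page)
{
	if (--page->refcount == 0)
		page->is_free = true;
}

/* Replays the interleaving; returns how many pages were never freed. */
int toy_race_leaked_pages(void)
{
	struct toy_page *page;
	int i, leaked = 0;

	for (i = 0; i < NR_PAGES; i++) {
		mem[i].refcount = 0;
		mem[i].is_free = true;
	}

	page = toy_alloc(2);            /* CPU2: reuse as an order-2 block  */
	if (!toy_get_speculative(page)) /* CPU1: stale speculative ref      */
		return -1;
	toy_free_pages(page, 2);        /* CPU2: free, but CPU1 holds a ref */
	toy_put_page(page);             /* CPU1: re-check failed, drop ref  */

	for (i = 0; i < NR_PAGES; i++)
		if (!mem[i].is_free)
			leaked++;
	return leaked;
}

#ifdef TOY_DEMO_MAIN
int main(void)
{
	printf("leaked pages: %d\n", toy_race_leaked_pages());
	return 0;
}
#endif
```

Building with -DTOY_DEMO_MAIN gives a standalone program. In this model the refcount ends at zero, yet only the first of the four pages is marked free again, which is the "one page released, three lost" outcome described above.
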

IOW, we restored the page count perfectly fine, but we screwed up the
page sizes and buddy information.

Ok, so the above is so unlikely from a timing standpoint that I don't
think it ever happens, but I don't see why it couldn't happen in
theory.

Please somebody tell me I'm missing some clever thing we do to make
sure this can actually not happen..

               Linus