Date: Sat, 20 Mar 2021 03:25:56 +0000
From: Matthew Wilcox
To: Hugh Dickins
Cc: Johannes Weiner
, Andrew Morton, Michal Hocko, Zhou Guanghui, Zi Yan, Shakeel Butt, Roman Gushchin, linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH] mm: page_alloc: fix memcg accounting leak in speculative cache lookup
Message-ID: <20210320032556.GD3420@casper.infradead.org>
References: <20210319071547.60973-1-hannes@cmpxchg.org>

On Fri, Mar 19, 2021 at 06:52:58PM -0700, Hugh Dickins wrote:
> > +    /*
> > +     * Drop the base reference from __alloc_pages and free. In
> > +     * case there is an outstanding speculative reference, from
> > +     * e.g. the page cache, it will put and free the page later.
> > +     */
> > +    if (likely(put_page_testzero(page))) {
> >          free_the_page(page, order);
> > -    else if (!PageHead(page))
> > +        return;
> > +    }
> > +
> > +    /*
> > +     * The speculative reference will put and free the page.
> > +     *
> > +     * However, if the speculation was into a higher-order page
> > +     * chunk that isn't marked compound, the other side will know
> > +     * nothing about our buddy pages and only free the order-0
> > +     * page at the start of our chunk! We must split off and free
> > +     * the buddy pages here.
> > +     *
> > +     * The buddy pages aren't individually refcounted, so they
> > +     * can't have any pending speculative references themselves.
> > +     */
> > +    if (!PageHead(page) && order > 0) {
>
> The put_page_testzero() has released our reference to the first
> subpage of page: it's now under the control of the racing speculative
> lookup.  So it seems to me unsafe to be checking PageHead(page) here:
> if it was actually a compound page, PageHead might already be cleared
> by now, and we doubly free its tail pages below?  I think we need to
> use a "bool compound = PageHead(page)" on entry to __free_pages().
>
> Or alternatively, it's wrong to call __free_pages() on a compound
> page anyway, so we should not check PageHead at all, except in a
> WARN_ON_ONCE(PageCompound(page)) at the start?

Alas ...

$ git grep '__free_pages\>.*compound'
drivers/dma-buf/heaps/system_heap.c:    __free_pages(page, compound_order(page));
drivers/dma-buf/heaps/system_heap.c:    __free_pages(p, compound_order(p));
drivers/dma-buf/heaps/system_heap.c:    __free_pages(page, compound_order(page));
mm/huge_memory.c:    __free_pages(zero_page, compound_order(zero_page));
mm/huge_memory.c:    __free_pages(zero_page, compound_order(zero_page));
mm/slub.c:    __free_pages(page, compound_order(page));

Maybe we should disallow it!

There are a few other places to check:

$ grep -l __GFP_COMP $(git grep -lw __free_pages) | wc -l
24

(assuming the pages are allocated and freed in the same file, which is a
reasonable approximation, but not guaranteed to catch everything.  Many of
these 24 will be false positives, of course.)
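
To make the ordering concern above concrete, here is a single-threaded
user-space sketch of the race Hugh describes.  It is a toy model, not
kernel code: struct toy_page, toy_free_pages(), toy_speculative_put()
and toy_run() are hypothetical names invented for illustration, and the
"race" is simulated by letting the speculative holder finish its put
immediately after ours.  It assumes, as Hugh's reply suggests, that
freeing a compound page clears its head flag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define TOY_ORDER 2	/* the chunk covers 1 << 2 = 4 order-0 units */

/* Toy stand-in for struct page; NOT the kernel's layout. */
struct toy_page {
	atomic_int refcount;
	bool head;		/* models PageHead() */
};

static int freed_units;	/* order-0 units handed back so far */

/* Models free_the_page(): hands back 1 << order units. */
static void toy_free_the_page(struct toy_page *page, unsigned int order)
{
	(void)page;
	assert(order <= TOY_ORDER);
	freed_units += 1 << order;
}

/* Models the speculative holder dropping its reference. */
static void toy_speculative_put(struct toy_page *page)
{
	if (atomic_fetch_sub(&page->refcount, 1) == 1) {
		unsigned int order = page->head ? TOY_ORDER : 0;

		page->head = false;	/* assumption: freeing clears the flag */
		toy_free_the_page(page, order);
	}
}

static void toy_free_pages(struct toy_page *page, unsigned int order,
			   bool sample_on_entry)
{
	/* Sampled while we still hold our own reference. */
	bool compound = page->head;

	if (atomic_fetch_sub(&page->refcount, 1) == 1) {
		toy_free_the_page(page, order);
		return;
	}

	/* Worst case for the race: the other side completes right here. */
	toy_speculative_put(page);

	if (!sample_on_entry)
		compound = page->head;	/* late check, after our put */

	/* Non-compound chunk: the other side only freed the first unit. */
	if (!compound)
		while (order-- > 0)
			toy_free_the_page(page + (1 << order), order);
}

/*
 * One scenario: an order-2 chunk with one speculative reference
 * outstanding.  Returns the number of order-0 units freed in total;
 * anything other than 4 is a leak or a double free.
 */
int toy_run(bool compound, bool sample_on_entry)
{
	struct toy_page chunk[1 << TOY_ORDER] = { 0 };

	atomic_store(&chunk[0].refcount, 2);
	chunk[0].head = compound;
	freed_units = 0;

	toy_free_pages(&chunk[0], TOY_ORDER, sample_on_entry);
	return freed_units;
}
```

In this model toy_run(true, true) frees exactly the four units of the
chunk, while toy_run(true, false) frees seven: the late check sees the
head flag already cleared and frees the tail units a second time.  The
non-compound case frees four units either way.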