Received: by 10.213.65.68 with SMTP id h4csp3629128imn; Tue, 10 Apr 2018 01:58:14 -0700 (PDT) X-Google-Smtp-Source: AIpwx495A1OskETM/iUa3g+OpVb7hGJT0ETnOlddsk22jtc1tc8wcHMko5jQqYA8q/lJNoepuXtC X-Received: by 2002:a17:902:128c:: with SMTP id g12-v6mr41594401pla.98.1523350694026; Tue, 10 Apr 2018 01:58:14 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1523350693; cv=none; d=google.com; s=arc-20160816; b=uKDcCH2NhVLB9higioV6JFbAeyG5Sy2awy7q5ETwAEsPE0XI0k5yupolPrvRMNpY6Q APVSYr5sMo859LK6CRCt3kp7P1bpd9wH+Udh9WIrZ1g1lAvAqgWMgsaqnHmcL2R1/9XB vxG9hLzQd+h4IKd5GxiWK21u8Ug1Y5MDP3gpt1a61WMLeMxGVfrsfuF7fu9eM8zsb8KZ 8Gf13jhw7AkD40kOhVbpgXlbsqciHi/V4wUuOenDf3z+Zf7nzThyH4CmKiItOScOuXxH Wghb3o7DRvxSzzzpdGaCjhxTy51iCqephdLHVl+DkMbVZWMY/bFr8dZGmGAtgc68jaeo IF5g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:arc-authentication-results; bh=sFcI6VU4lG2pedxBPrVMZ4sdSi3/UiIpgwGA2j7g4W0=; b=RugkubN+w3ZkrGQfoGjv1jgc6ElYaRKNsBeK2r2w7MavaqTcy01c04wn5qwuQ15yvu DJ/XDLBGdPfU/gpnxZCdU2MF9YBz4KLMCQ2UdnUZNQ7mskFLhmongu14n64rDprVbQfO hlaS3nLo0f8GlLvHwcQ5nnqSAcBTa3TaKqQSvj6uyMP8OnSL5OVBOQ8x4PgmGjsrFP/7 v96LYjSZzaTTAf1a6zis+Ji2QvW06NRn7PmHhHOJg+9OhFFNaFhujslvQWXnf0djSvKL BGdSiI8rKGmKjT+/B9MVskW6Lc2PnjdGS1vhoZhsQlFEYTzS0/gS/pCCAKP72irujfDc LlTQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id z1-v6si2160121plb.101.2018.04.10.01.57.37; Tue, 10 Apr 2018 01:58:13 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752368AbeDJIu4 (ORCPT + 99 others); Tue, 10 Apr 2018 04:50:56 -0400 Received: from mx2.suse.de ([195.135.220.15]:45560 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751888AbeDJIuy (ORCPT ); Tue, 10 Apr 2018 04:50:54 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay1.suse.de (charybdis-ext.suse.de [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id 27622AF5E; Tue, 10 Apr 2018 08:50:52 +0000 (UTC) Received: by quack2.suse.cz (Postfix, from userid 1000) id 2A7561E09FD; Tue, 10 Apr 2018 10:50:49 +0200 (CEST) Date: Tue, 10 Apr 2018 10:50:49 +0200 From: Jan Kara To: Minchan Kim Cc: Matthew Wilcox , Chao Yu , Jaegeuk Kim , Christopher Lameter , Andrew Morton , linux-mm , LKML , Johannes Weiner , Jan Kara , Chris Fries , linux-f2fs-devel@lists.sourceforge.net, linux-fsdevel@vger.kernel.org Subject: Re: [PATCH] mm: workingset: fix NULL ptr dereference Message-ID: <20180410085049.7ysheqavwuykjkvn@quack2.suse.cz> References: <20180409111403.GA31652@bombadil.infradead.org> <20180409112514.GA195937@rodete-laptop-imager.corp.google.com> <7706245c-2661-f28b-f7f9-8f11e1ae932b@huawei.com> <20180409144958.GA211679@rodete-laptop-imager.corp.google.com> <20180409152032.GB11756@bombadil.infradead.org> <20180409230409.GA214542@rodete-desktop-imager.corp.google.com> <20180410011211.GA31282@bombadil.infradead.org> <20180410023339.GB214542@rodete-desktop-imager.corp.google.com> <20180410024152.GC31282@bombadil.infradead.org> <20180410025903.GA38000@rodete-desktop-imager.corp.google.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20180410025903.GA38000@rodete-desktop-imager.corp.google.com> User-Agent: NeoMutt/20170421 (1.8.2) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue 10-04-18 11:59:03, Minchan Kim wrote: > On Mon, Apr 09, 2018 at 07:41:52PM -0700, Matthew Wilcox wrote: > > On Tue, Apr 10, 2018 at 11:33:39AM +0900, Minchan Kim wrote: > > > @@ -522,7 +532,7 @@ EXPORT_SYMBOL(radix_tree_preload); > > > */ > > > int radix_tree_maybe_preload(gfp_t gfp_mask) > > > { > > > - if (gfpflags_allow_blocking(gfp_mask)) > > > + if (gfpflags_allow_blocking(gfp_mask) && !(gfp_mask & __GFP_ZERO)) > > > return __radix_tree_preload(gfp_mask, RADIX_TREE_PRELOAD_SIZE); > > > /* Preloading doesn't help anything with this gfp mask, skip it */ > > > preempt_disable(); > > > > No, you've completely misunderstood what's going on in this function. > > Okay, I hope this version clear current concerns. > > From fb37c41b90f7d3ead1798e5cb7baef76709afd94 Mon Sep 17 00:00:00 2001 > From: Minchan Kim > Date: Tue, 10 Apr 2018 11:54:57 +0900 > Subject: [PATCH v3] mm: workingset: fix NULL ptr dereference > > It assumes shadow entries of radix tree rely on the init state > that node->private_list allocated newly is list_empty state > for the working. Currently, it's initailized in SLAB constructor > which means node of radix tree would be initialized only when > *slub allocates new page*, not *slub alloctes new object*. > > If some FS or subsystem pass gfp_mask to __GFP_ZERO, that means > newly allocated node can have !list_empty(node->private_list) > by memset of slab allocator. It ends up calling NULL deference > at workingset_update_node by failing list_empty check. > > This patch fixes it. The patch looks good. I'd just rephrase the changelog to be more understandable. Something like: GFP mask passed to page cache functions (often coming from mapping->gfp_mask) is used both for allocation of page cache page and for allocation of radix tree metadata necessary to add the page to the page cache. When the mask contains __GFP_ZERO (as is the case for some f2fs metadata mappings), this breaks radix tree code as that code expects allocated radix tree nodes to be properly initialized by the slab constructor and not zeroed. In particular node->private_list is failing list_empty() check and the following list operation in workingset_update_node() will dereference NULL. Fix the problem by removing __GFP_ZERO from the mask for radix tree allocations. Also warn if __GFP_ZERO gets passed to __radix_tree_preload() to avoid silent breakage in the future for other radix tree users. With that fixed you can add: Reviewed-by: Jan Kara Honza > > Fixes: 449dd6984d0e ("mm: keep page cache radix tree nodes in check") > Cc: Johannes Weiner > Cc: Jan Kara > Cc: Matthew Wilcox > Cc: Jaegeuk Kim > Cc: Chao Yu > Cc: Christopher Lameter > Cc: linux-fsdevel@vger.kernel.org > Cc: stable@vger.kernel.org > Reported-by: Chris Fries > Signed-off-by: Minchan Kim > --- > lib/radix-tree.c | 9 +++++++++ > mm/filemap.c | 5 +++-- > 2 files changed, 12 insertions(+), 2 deletions(-) > > diff --git a/lib/radix-tree.c b/lib/radix-tree.c > index da9e10c827df..7569e637dbaa 100644 > --- a/lib/radix-tree.c > +++ b/lib/radix-tree.c > @@ -470,6 +470,15 @@ static __must_check int __radix_tree_preload(gfp_t gfp_mask, unsigned nr) > struct radix_tree_node *node; > int ret = -ENOMEM; > > + /* > + * New allocate node must have node->private_list as INIT_LIST_HEAD > + * state by workingset shadow memory implementation. > + * If user pass __GFP_ZERO by mistake, slab allocator will clear > + * node->private_list, which makes a BUG. Rather than going Oops, > + * just fix and warn about it. > + */ > + if (WARN_ON(gfp_mask & __GFP_ZERO)) > + gfp_mask &= ~__GFP_ZERO; > /* > * Nodes preloaded by one cgroup can be be used by another cgroup, so > * they should never be accounted to any particular memory cgroup. > diff --git a/mm/filemap.c b/mm/filemap.c > index ab77e19ab09c..b6de9d691c8a 100644 > --- a/mm/filemap.c > +++ b/mm/filemap.c > @@ -786,7 +786,7 @@ int replace_page_cache_page(struct page *old, struct page *new, gfp_t gfp_mask) > VM_BUG_ON_PAGE(!PageLocked(new), new); > VM_BUG_ON_PAGE(new->mapping, new); > > - error = radix_tree_preload(gfp_mask & ~__GFP_HIGHMEM); > + error = radix_tree_preload(gfp_mask & ~(__GFP_HIGHMEM | __GFP_ZERO)); > if (!error) { > struct address_space *mapping = old->mapping; > void (*freepage)(struct page *); > @@ -842,7 +842,8 @@ static int __add_to_page_cache_locked(struct page *page, > return error; > } > > - error = radix_tree_maybe_preload(gfp_mask & ~__GFP_HIGHMEM); > + error = radix_tree_maybe_preload(gfp_mask & > + ~(__GFP_HIGHMEM | __GFP_ZERO)); > if (error) { > if (!huge) > mem_cgroup_cancel_charge(page, memcg, false); > -- > 2.17.0.484.g0c8726318c-goog > > -- Jan Kara SUSE Labs, CR