Date: Tue, 10 Apr 2018 12:28:45 +0200
From: Jan Kara
To: Michal Hocko
Cc: Jan Kara, Minchan Kim, Andrew Morton, linux-mm, LKML, Johannes Weiner, Chris Fries
Subject: Re: [PATCH] mm: workingset: fix NULL ptr dereference
Message-ID: <20180410102845.3ixg2lbnumqn2o6z@quack2.suse.cz>
References: <20180409015815.235943-1-minchan@kernel.org> <20180410082243.GW21835@dhcp22.suse.cz> <20180410085531.m2xvzi7nenbrgbve@quack2.suse.cz> <20180410093241.GA21835@dhcp22.suse.cz>
In-Reply-To: <20180410093241.GA21835@dhcp22.suse.cz>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue 10-04-18 11:32:41, Michal Hocko wrote:
> On Tue 10-04-18 10:55:31, Jan Kara wrote:
> > On Tue 10-04-18 10:22:43, Michal Hocko wrote:
> > > On Mon 09-04-18 10:58:15, Minchan Kim wrote:
> > > > Recently, I got a report like below.
> > > >
> > > > [ 7858.792946] [] __list_del_entry+0x30/0xd0
> > > > [ 7858.792951] [] list_lru_del+0xac/0x1ac
> > > > [ 7858.792957] [] page_cache_tree_insert+0xd8/0x110
> > > > [ 7858.792962] [] __add_to_page_cache_locked+0xf8/0x4e0
> > > > [ 7858.792967] [] add_to_page_cache_lru+0x50/0x1ac
> > > > [ 7858.792972] [] pagecache_get_page+0x468/0x57c
> > > > [ 7858.792979] [] __get_node_page+0x84/0x764
> > > > [ 7858.792986] [] f2fs_iget+0x264/0xdc8
> > > > [ 7858.792991] [] f2fs_lookup+0x3b4/0x660
> > > > [ 7858.792998] [] lookup_slow+0x1e4/0x348
> > > > [ 7858.793003] [] walk_component+0x21c/0x320
> > > > [ 7858.793008] [] path_lookupat+0x90/0x1bc
> > > > [ 7858.793013] [] filename_lookup+0x8c/0x1a0
> > > > [ 7858.793018] [] vfs_fstatat+0x84/0x10c
> > > > [ 7858.793023] [] SyS_newfstatat+0x28/0x64
> > > >
> > > > The v4.9 kernel already has d3798ae8c6f3 ("mm: filemap: don't
> > > > plant shadow entries without radix tree node"), so I thought
> > > > it should be okay. When I was googling, I found others reporting
> > > > the same problem, and I think the current kernel still has it.
> > > >
> > > > https://bugzilla.redhat.com/show_bug.cgi?id=1431567
> > > > https://bugzilla.redhat.com/show_bug.cgi?id=1420335
> > > >
> > > > The shadow entry handling of the radix tree relies on the init
> > > > state: a freshly allocated node->private_list should be list_empty.
> > > > Currently, it's initialized in the SLAB constructor, which means
> > > > a radix tree node is initialized only when *slub allocates
> > > > a new page*, not *a new object*. So, if some FS or subsystem passes
> > > > __GFP_ZERO in gfp_mask, the slub allocator will do memset blindly.
> > > > That means an allocated node can have !list_empty(node->private_list).
> > > > It ends up in a NULL dereference at workingset_update_node because
> > > > the list_empty check fails.
> > > >
> > > > This patch should fix it.
> > > >
> > > > Fixes: 449dd6984d0e ("mm: keep page cache radix tree nodes in check")
> > > > Reported-by: Chris Fries
> > > > Cc: Johannes Weiner
> > > > Cc: Jan Kara
> > > > Signed-off-by: Minchan Kim
> > >
> > > Regardless of whether it makes sense to use __GFP_ZERO from the upper
> > > layer or not, it is subtle as hell to rely on the pre-existing state
> > > for a newly allocated object. So yes this makes perfect sense.
> > >
> > > Do we want CC: stable?
> > > Acked-by: Michal Hocko
> >
> > Well, for hot allocations we do rely on previous state a lot. After all
> > that's what slab constructor was created for. Whether radix tree node
> > allocation is such a hot path is a question for debate, I agree.
>
> I really doubt that LIST_INIT is something to notice for the radix tree
> allocation.

I agree with that.

> So I would rather have safe code than rely on the previous state which is
> really subtle.

And I agree on the subtlety part here as well. But even with LIST_INIT we'll
be relying on some fields being 0 / NULL, so you cannot really say that with
LIST_INIT we won't be relying on previous state. And fully memsetting
radix_tree_node on allocation *would* IMO have an effect on performance.
So I'm not convinced LIST_INIT buys us much. It deals with the __GFP_ZERO
problem but not much else.

> Btw. I am not a huge fan of ctor semantic as we have it. I am not really
> sure all users understand when it is called...

Yeah, ctor semantics are subtle and we had some hard-to-debug bugs in VFS
because someone didn't understand them. But OTOH for large structures and
frequent allocations the gain is worth it.

								Honza
-- 
Jan Kara
SUSE Labs, CR
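
For readers following the ctor discussion above, here is a minimal userspace
sketch of the failure mode. It is not kernel code: toy_cache, toy_alloc,
toy_free, toy_alloc_fixed and TOY_ZERO are hypothetical names that only mimic
a slab cache with a constructor and an allocation with __GFP_ZERO. The
constructor runs once, when the "slab" is populated, and never again per
object.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Minimal doubly linked list, modelled on the kernel's struct list_head. */
struct list_head { struct list_head *next, *prev; };

void init_list_head(struct list_head *h) { h->next = h; h->prev = h; }
bool list_empty(const struct list_head *h) { return h->next == h; }

/* Hypothetical stand-in for struct radix_tree_node. */
struct toy_node {
    unsigned int count;
    struct list_head private_list;
};

#define OBJS_PER_SLAB 4
#define TOY_ZERO 0x1u /* hypothetical flag playing the role of __GFP_ZERO */

/* Toy cache: one "slab" of objects, constructed once when first populated. */
struct toy_cache {
    struct toy_node slab[OBJS_PER_SLAB];
    bool in_use[OBJS_PER_SLAB];
    bool populated;
    void (*ctor)(struct toy_node *);
};

/* Constructor: establishes the state every free object is assumed to have. */
void toy_node_ctor(struct toy_node *n)
{
    memset(n, 0, sizeof(*n));
    init_list_head(&n->private_list);
}

struct toy_node *toy_alloc(struct toy_cache *c, unsigned int flags)
{
    if (!c->populated) {
        /* The ctor runs per slab population, not per object allocation. */
        for (size_t i = 0; i < OBJS_PER_SLAB; i++)
            c->ctor(&c->slab[i]);
        c->populated = true;
    }
    for (size_t i = 0; i < OBJS_PER_SLAB; i++) {
        if (!c->in_use[i]) {
            c->in_use[i] = true;
            /* Blind zeroing wipes the self-pointing list head. */
            if (flags & TOY_ZERO)
                memset(&c->slab[i], 0, sizeof(c->slab[i]));
            return &c->slab[i];
        }
    }
    return NULL;
}

/* Callers are expected to return objects in their constructed state. */
void toy_free(struct toy_cache *c, struct toy_node *n)
{
    ptrdiff_t idx = n - c->slab;
    assert(idx >= 0 && idx < OBJS_PER_SLAB);
    c->in_use[idx] = false;
}

/* Explicit list init after allocation, in the spirit of LIST_INIT above. */
struct toy_node *toy_alloc_fixed(struct toy_cache *c, unsigned int flags)
{
    struct toy_node *n = toy_alloc(c, flags);
    if (n)
        init_list_head(&n->private_list);
    return n;
}
```

In this sketch, an object obtained with toy_alloc(&c, TOY_ZERO) comes back
with private_list.next == NULL, so list_empty() is false even though the node
is on no list. toy_alloc_fixed() restores the list head, but still relies on
count being 0 from the constructor or from a well-behaved free, which is the
remaining dependence on previous state mentioned above.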