Date: Tue, 30 Jan 2018 15:01:04 +0100
From: Michal Hocko
To: Dmitry Vyukov
Cc: "Kirill A. Shutemov", Florian Westphal, Tetsuo Handa, David Miller,
 netfilter-devel@vger.kernel.org, coreteam@netfilter.org, netdev,
 Andrea Arcangeli, Yang Shi, syzkaller-bugs@googlegroups.com, LKML,
 Ingo Molnar, Linux-MM, David Rientjes, Andrew Morton, guro@fb.com,
 "Kirill A. Shutemov"
Subject: Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)
Message-ID: <20180130140104.GE21609@dhcp22.suse.cz>
References: <20180129072357.GD5906@breakpoint.cc>
 <20180129082649.sysf57wlp7i7ltb2@node.shutemov.name>
 <20180129165722.GF5906@breakpoint.cc>
 <20180129182811.fze4vrb5zd5cojmr@node.shutemov.name>
 <20180129223522.GG5906@breakpoint.cc>
 <20180130075226.GL21609@dhcp22.suse.cz>
 <20180130081127.GH5906@breakpoint.cc>
 <20180130082817.cbax5qj4mxancx4b@node.shutemov.name>
 <20180130095739.GV21609@dhcp22.suse.cz>
In-Reply-To: <20180130095739.GV21609@dhcp22.suse.cz>

On Tue 30-01-18 10:57:39, Michal Hocko wrote:
> On Tue 30-01-18 10:02:34, Dmitry Vyukov wrote:
> > On Tue, Jan 30, 2018 at 9:28 AM, Kirill A. Shutemov
> > wrote:
> > > On Tue, Jan 30, 2018 at 09:11:27AM +0100, Florian Westphal wrote:
> > >> Michal Hocko wrote:
> > >> > On Mon 29-01-18 23:35:22, Florian Westphal wrote:
> > >> > > Kirill A. Shutemov wrote:
> > >> > [...]
> > >> > > > I hate what I'm saying, but I guess we need some tunable here.
> > >> > > > Not sure what exactly.
> > >> > >
> > >> > > Would memcg help?
> > >> >
> > >> > That really depends. I would have to check whether vmalloc path obeys
> > >> > __GFP_ACCOUNT (I suspect it does except for page tables allocations but
> > >> > that shouldn't be a big deal). But then the other potential problem is
> > >> > the life time of the xt_table_info (or other potentially large) data
> > >> > structures. Are they bound to any process life time?
> > >>
> > >> No.
> > >
> > > Well, IIUC they are bound to net namespace life time, so killing all
> > > processes in the namespace would help to get memory back. :)
> >
> > ... unless the namespace is mounted into file system.
> >
> > Let's start with NOWARN as that's what kernel generally uses for
> > allocations with user-controllable size. ENOMEM is roughly as
> > informative as the WARNING message in this case.
>
> You want __GFP_NORETRY but that is not _fully_ supported by kvmalloc
> right now. More specifically kvmalloc doesn't guarantee that the request
> will not trigger the OOM killer (like regular __GFP_NORETRY). This is
> because of internal vmalloc restrictions. If you are however OK to
> simply bail out in most cases then __GFP_NORETRY should work reasonably
> fine.
>
> > I think we also need to consider setting up memory cgroup for
> > syzkaller test processes (we do RLIMIT_AS, but that's weak).
>
> Well, this is not about syzkaller, it merely pointed out a potential
> DoS... And that has to be addressed somehow.

So how about this?
---
From d48e950f1b04f234b57b9e34c363bdcfec10aeee Mon Sep 17 00:00:00 2001
From: Michal Hocko
Date: Tue, 30 Jan 2018 14:51:07 +0100
Subject: [PATCH] net/netfilter/x_tables.c: make allocation less aggressive

syzbot has noticed that xt_alloc_table_info can allocate a lot of
memory. This is an admin-only interface, but an admin in a namespace is
sufficient as well.

eacd86ca3b03 ("net/netfilter/x_tables.c: use kvmalloc() in
xt_alloc_table_info()") has changed the opencoded kmalloc->vmalloc
fallback into kvmalloc. It has dropped __GFP_NORETRY on the way because
vmalloc has simply never fully supported __GFP_NORETRY semantics. This
is still the case because e.g. page tables backing the vmalloc area are
hardcoded GFP_KERNEL.

Revert to __GFP_NORETRY as a poor man's defence against excessively
large allocation requests here. We will not rule out the OOM killer
completely but __GFP_NORETRY should at least stop the large request in
most cases.
Fixes: eacd86ca3b03 ("net/netfilter/x_tables.c: use kvmalloc() in xt_alloc_table_info()")
Signed-off-by: Michal Hocko
---
 net/netfilter/x_tables.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index d8571f414208..a5f5c29bcbdc 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -1003,7 +1003,13 @@ struct xt_table_info *xt_alloc_table_info(unsigned int size)
 	if ((SMP_ALIGN(size) >> PAGE_SHIFT) + 2 > totalram_pages)
 		return NULL;
 
-	info = kvmalloc(sz, GFP_KERNEL);
+	/*
+	 * __GFP_NORETRY is not fully supported by kvmalloc but it should
+	 * work reasonably well if sz is too large and bail out rather
+	 * than shoot all processes down before realizing there is nothing
+	 * more to reclaim.
+	 */
+	info = kvmalloc(sz, GFP_KERNEL | __GFP_NORETRY);
 	if (!info)
 		return NULL;
 
-- 
2.15.1

-- 
Michal Hocko
SUSE Labs
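The hunk context shows the only size check that runs before kvmalloc(): the request is rejected only when its aligned page count, plus 2, exceeds totalram_pages. The userspace C sketch below models that check so the numbers can be tried out. It assumes SMP_ALIGN() rounds up to a 64-byte cache line and that pages are 4 KiB; sketch_smp_align() and table_size_plausible() are hypothetical names for illustration, not kernel functions. It suggests why the check alone is no defence against the DoS discussed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumptions for this sketch: 64-byte cache lines and 4 KiB pages. */
#define SKETCH_SMP_CACHE_BYTES ((size_t)64)
#define SKETCH_PAGE_SHIFT 12

/* Round x up to a multiple of the cache-line size, like SMP_ALIGN(). */
static size_t sketch_smp_align(size_t x)
{
	return (x + SKETCH_SMP_CACHE_BYTES - 1) & ~(SKETCH_SMP_CACHE_BYTES - 1);
}

/*
 * Hypothetical helper mirroring the check in xt_alloc_table_info():
 * returns false (where the kernel returns NULL) only when the aligned
 * size, in pages, plus 2 exceeds the total number of RAM pages.
 */
static bool table_size_plausible(size_t size, unsigned long totalram_pages)
{
	return (sketch_smp_align(size) >> SKETCH_PAGE_SHIFT) + 2 <= totalram_pages;
}
```

With totalram_pages = 1 << 20 (4 GiB of 4 KiB pages), a 3 GiB request works out to 786432 + 2 pages and passes, while a 4 GiB request (1048576 + 2 pages) is rejected. Anything below that bound is handed to kvmalloc(), which is where __GFP_NORETRY has to do the work.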