Date: Tue, 5 May 2020 17:26:31 +0100
From: Al Viro
To: Eric Dumazet
Cc: SeongJae Park, Eric Dumazet, David Miller, Jakub Kicinski,
    Greg Kroah-Hartman, sj38.park@gmail.com, netdev, LKML, SeongJae Park,
    snu@amazon.com, amit@kernel.org, stable@vger.kernel.org
Subject: Re: Re: [PATCH net v2 0/2] Revert the 'socket_alloc' life cycle change
Message-ID: <20200505162631.GY23230@ZenIV.linux.org.uk>
References: <20200505154644.18997-1-sjpark@amazon.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, May 05, 2020 at 09:00:44AM -0700, Eric Dumazet wrote:
> > Not exactly the 10,000,000, as it is only the possible highest number, but I
> > was able to observe clear
> > exponential increase of the number of the objects
> > using slabtop. Before the start of the problematic workload, the number of
> > objects of 'kmalloc-64' was 5760, but I was able to observe the number increase
> > to 1,136,576.
> >
> >             OBJS   ACTIVE   USE  OBJ SIZE  SLABS  OBJ/SLAB  CACHE SIZE  NAME
> > before:     5760     5088   88%     0.06K     90        64        360K  kmalloc-64
> > after:   1136576  1136576  100%     0.06K  17759        64      71036K  kmalloc-64
> >
>
> Great, thanks.
>
> How recent is the kernel you are running for your experiment ?
>
> Let's make sure the bug is not in RCU.
>
> After Al changes, RCU got slightly better under stress.

The thing that worries me here is that this is far from being the only
source of RCU-delayed freeing of objects. If we really see bogus OOM
kills due to that (IRL, not in an artificial microbenchmark), we'd
better do something that would help with all those sources, not just
paper over the contributions from one of those. Because there's no
chance in hell to get rid of RCU-delayed freeing in general...

Does the problem extend to kfree_rcu()? And there's a lot of RCU
callbacks that boil down to kmem_cache_free(); those really look like
they should have exact same issue - sock_free_inode() is one of those,
after all.