Subject: Re: [PATCH net v2 0/2] Revert the 'socket_alloc' life cycle change
From: Eric Dumazet
To: SeongJae Park, Eric Dumazet
Cc: Eric Dumazet, David Miller, Al Viro, Jakub Kicinski, Greg Kroah-Hartman,
 sj38.park@gmail.com, netdev, LKML, SeongJae Park, snu@amazon.com,
 amit@kernel.org, stable@vger.kernel.org, Paul McKenney
Date: Tue, 5 May 2020 09:25:06 -0700
Message-ID: <05843a3c-eb9d-3a0d-f992-7e4b97cc1f19@gmail.com>
In-Reply-To: <20200505161302.547-1-sjpark@amazon.com>
References: <20200505161302.547-1-sjpark@amazon.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 5/5/20 9:13 AM, SeongJae Park wrote:
> On Tue, 5 May 2020 09:00:44 -0700 Eric Dumazet wrote:
>
>> On Tue, May 5, 2020 at 8:47 AM SeongJae Park wrote:
>>>
>>> On Tue, 5 May 2020 08:20:50 -0700 Eric Dumazet wrote:
>>>
>>>>
>>>> On 5/5/20 8:07 AM, SeongJae Park wrote:
>>>>> On Tue, 5 May 2020 07:53:39 -0700 Eric Dumazet wrote:
>>>>>
>>>>>> Why do we have 10,000,000 objects around? Could this be because of
>>>>>> some RCU problem?
>>>>>
>>>>> Mainly because of a long RCU grace period, as you guess. I have no idea how
>>>>> the grace period became so long in this case.
>>>>>
>>>>> As my test machine was a virtual machine instance, I guess a problem like
>>>>> RCU reader preemption[1] might have affected this.
>>>>>
>>>>> [1] https://www.usenix.org/system/files/conference/atc17/atc17-prasad.pdf
>>>>>
>>>>>>
>>>>>> Once Al's patches are reverted, do you have 10,000,000 sock_alloc around?
>>>>>
>>>>> Yes, neither the old kernel from before Al's patches nor the recent kernel
>>>>> with Al's patches reverted reproduced the problem.
>>>>>
>>>>
>>>> I repeat my question: do you have 10,000,000 (smaller) objects kept in slab caches?
>>>>
>>>> TCP sockets use the (very complex, error prone) SLAB_TYPESAFE_BY_RCU, but not the struct socket_wq
>>>> object that was allocated in sock_alloc_inode() before Al's patches.
>>>>
>>>> These objects should be visible in the kmalloc-64 kmem cache.
>>>
>>> Not exactly 10,000,000, as that is only the highest possible number, but I
>>> was able to observe a clear exponential increase in the number of objects
>>> using slabtop. Before the start of the problematic workload, the number of
>>> objects in 'kmalloc-64' was 5760, but I was able to observe the number increase
>>> to 1,136,576.
>>>
>>>            OBJS   ACTIVE   USE  OBJ SIZE  SLABS  OBJ/SLAB  CACHE SIZE  NAME
>>> before:    5760     5088   88%     0.06K     90        64        360K  kmalloc-64
>>> after:  1136576  1136576  100%     0.06K  17759        64      71036K  kmalloc-64
>>>
>>
>> Great, thanks.
>>
>> How recent is the kernel you are running for your experiment?
>
> It's based on 5.4.35.
>
>>
>> Let's make sure the bug is not in RCU.
>
> One thing I can currently say is that the grace period does eventually pass. I
> modified the benchmark to repeat only 5,000 times instead of 10,000, so that
> the test runs without OOM but with easily observable memory pressure. As soon
> as the benchmark finished, the memory was freed.
>
> If you need more tests, please let me know.
>

I would ask Paul's opinion on this issue, because we have many objects being freed
after RCU grace periods.

If the RCU subsystem cannot keep up, I guess other workloads will also suffer.

Sure, we can revert patches here and there trying to work around the issue,
but for objects allocated from process context, we should not have these problems.
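
For reference, the kmalloc-64 figures quoted from slabtop earlier in this thread can be cross-checked with simple arithmetic. The sketch below is only an illustration: the numbers are copied from the quoted slabtop rows, the helper name cache_size_kib is made up for this note, and it assumes the cache size is simply slabs * objects per slab * object size with no per-slab overhead.

```python
# Cross-check of the slabtop rows quoted in this thread (kmalloc-64).
# All numbers are copied from the email; cache_size_kib is a hypothetical helper.

OBJ_SIZE_BYTES = 64  # slabtop prints 0.06K for the 64-byte kmalloc-64 objects


def cache_size_kib(slabs, objs_per_slab, obj_size=OBJ_SIZE_BYTES):
    """Memory held by a slab cache in KiB, assuming no per-slab overhead."""
    return slabs * objs_per_slab * obj_size // 1024


before = {"objs": 5760, "slabs": 90, "objs_per_slab": 64}
after = {"objs": 1136576, "slabs": 17759, "objs_per_slab": 64}

for label, row in (("before", before), ("after", after)):
    # The OBJS column should equal SLABS * OBJ/SLAB.
    assert row["slabs"] * row["objs_per_slab"] == row["objs"]
    print(label, cache_size_kib(row["slabs"], row["objs_per_slab"]), "KiB")

# How much the object count grew while the workload was running.
growth = after["objs"] / before["objs"]
print(f"growth: {growth:.0f}x")
```

Under that assumption the computed sizes match the CACHE SIZE column (360K and 71036K), and the object count grows by roughly a factor of 197 between the two samples.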