From: SeongJae Park
To: "Paul E. McKenney"
CC: SeongJae Park, Eric Dumazet, Eric Dumazet, David Miller, "Al Viro",
    Jakub Kicinski, "Greg Kroah-Hartman", netdev, LKML, SeongJae Park
Subject: Re: Re: Re: Re: Re: Re: [PATCH net v2 0/2] Revert the 'socket_alloc' life cycle change
Date: Wed, 6 May 2020 17:20:25 +0200
Message-ID: <20200506152025.22085-1-sjpark@amazon.com>
In-Reply-To: <20200506144151.GZ2869@paulmck-ThinkPad-P72>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 6 May 2020 07:41:51 -0700 "Paul E. McKenney" wrote:

> On Wed, May 06, 2020 at 02:59:26PM +0200, SeongJae Park wrote:
> > TL;DR: It was not the kernel's fault, but the benchmark program's.
> >
> > So, the problem is reproducible using the lebench[1] only.  I carefully
> > read its code again.
> >
> > Before running the "poll big" sub test in which the problem occurred,
> > lebench executes the "context switch" sub test.  For that test, it sets
> > its own CPU affinity[2] and process priority[3] to '0' and '-20',
> > respectively.  However, it doesn't restore the original values even after
> > the "context switch" test is finished.  For this reason, the "select big"
> > sub test also runs bound to CPU 0 with the lowest nice value.  Therefore,
> > it can disturb the RCU callback thread for CPU 0, which processes the
> > deferred deallocations of the sockets, and as a result it triggers the
> > OOM.
> >
> > We confirmed that the problem disappears when offloading the RCU
> > callbacks from CPU 0 using the rcu_nocbs=0 boot parameter, or when simply
> > restoring the affinity and/or priority.
> >
> > Someone _might_ still argue that this is a kernel problem because the
> > problem didn't occur on the old kernels prior to Al's patches.  However,
> > setting the affinity and priority was possible only because the program
> > was given the permission.  Therefore, it would be reasonable to blame the
> > system administrators rather than the kernel.
> >
> > So, please ignore this patchset; apologies for the confusion.  If you
> > still have some doubts or need more tests, please let me know.
> >
> > [1] https://github.com/LinuxPerfStudy/LEBench
> > [2] https://github.com/LinuxPerfStudy/LEBench/blob/master/TEST_DIR/OS_Eval.c#L820
> > [3] https://github.com/LinuxPerfStudy/LEBench/blob/master/TEST_DIR/OS_Eval.c#L822
>
> Thank you for chasing this down!
>
> I have had this sort of thing on my list as a potential issue, but given
> that it is now really showing up, it sounds like it is time to bump
> up its priority a bit.  Of course there are limits, so if userspace is
> running at any of the real-time priorities, making sufficient CPU time
> available to RCU's kthreads becomes userspace's responsibility.  But if
> everything is running at SCHED_OTHER (which is the case here, correct?),

Correct.

> then it is reasonable for RCU to do some work to avoid this situation.

That would also be great!

>
> But still, yes, the immediate job is fixing the benchmark.  ;-)

Totally agreed.

>
> Thanx, Paul
>
> PS.  Why not just attack all potential issues on my list?  Because I
>      usually learn quite a bit from seeing the problem actually happen.
>      And sometimes other changes in RCU eliminate the potential issue
>      before it has a chance to happen.

Sounds interesting, I will try some of those in my spare time ;)


Thanks,
SeongJae Park
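
The benchmark-side fix discussed above (restore the affinity and priority once
the "context switch" sub test is done) could be sketched roughly as follows.
This is a minimal illustration only, not LEBench's actual code (LEBench is
written in C) and not the fix that was eventually applied. The helper name
`pinned` is hypothetical, and the sketch assumes a Linux system.

```python
import os
from contextlib import contextmanager


@contextmanager
def pinned(cpu, nice):
    """Temporarily pin the calling process to `cpu` at nice value `nice`.

    Hypothetical helper, Linux only.  The previous affinity mask and nice
    value are saved on entry and restored on exit, even if the body raises.
    """
    saved_cpus = os.sched_getaffinity(0)             # pid 0 means "this process"
    saved_nice = os.getpriority(os.PRIO_PROCESS, 0)
    try:
        os.sched_setaffinity(0, {cpu})
        # Going below the current nice value (e.g. to -20) normally requires
        # CAP_SYS_NICE; without it this raises PermissionError.
        os.setpriority(os.PRIO_PROCESS, 0, nice)
        yield
    finally:
        # Undo both settings so that later sub tests do not inherit them.
        # Raising the nice value back from -20 needs no privilege.
        os.sched_setaffinity(0, saved_cpus)
        os.setpriority(os.PRIO_PROCESS, 0, saved_nice)
```

With such a helper, a sub test would be wrapped as `with pinned(0, -20): ...`
when run with sufficient privileges, and any sub test that follows would run
with the original affinity mask and nice value again.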