Date: Fri, 18 Nov 2016 04:56:27 -0800
From: "Paul E. McKenney"
To: Ding Tianhong
Cc: josh@joshtriplett.org, rostedt@goodmis.org, mathieu.desnoyers@efficios.com, jiangshanlai@gmail.com, "linux-kernel@vger.kernel.org"
Subject: Re: [PATCH] rcu: Fix soft lockup for rcu_nocb_kthread
Message-Id: <20161118125627.GN3612@linux.vnet.ibm.com>

On Fri, Nov 18, 2016 at 08:37:28PM +0800, Ding Tianhong wrote:
> On 2016/8/10 9:59, Paul E. McKenney wrote:
> > On Wed, Aug 10, 2016 at 09:13:14AM +0800, Ding Tianhong wrote:
> >> On 2016/6/16 22:19, Paul E. McKenney wrote:
> >>> On Thu, Jun 16, 2016 at 02:09:47PM +0800, Ding Tianhong wrote:
> >>>> On 2016/6/15 23:49, Paul E. McKenney wrote:
> >>>>> On Wed, Jun 15, 2016 at 03:27:36PM +0800, Ding Tianhong wrote:
> >>>>>> I hit this problem when using the Testgine to send packets to an ixgbevf NIC
> >>>>>> with these steps:
> >>>>>> 1. Connect to the ixgbevf NIC and set the speed to 10Gb/s; it works fine.
> >>>>>> 2. Use ifconfig to bring the NIC down and up again, looping several times.
> >>>>>> 3. The system panics with a soft lockup.
> >>>>>
> >>>>> Good catch, queued for review and testing.  But what .config was your
> >>>>> kernel built with?
> >>>>>
> >>>>
> >>>> I used the redhat7.1 defconfig to build my kernel, and the RCU config is:
> >>>> #
> >>>> # RCU Subsystem
> >>>> #
> >>>> CONFIG_TREE_RCU=y
> >>>> # CONFIG_PREEMPT_RCU is not set
> >>>> CONFIG_RCU_STALL_COMMON=y
> >>>> CONFIG_CONTEXT_TRACKING=y
> >>>> CONFIG_RCU_USER_QS=y
> >>>> # CONFIG_CONTEXT_TRACKING_FORCE is not set
> >>>> CONFIG_RCU_FANOUT=64
> >>>> CONFIG_RCU_FANOUT_LEAF=16
> >>>> # CONFIG_RCU_FANOUT_EXACT is not set
> >>>> # CONFIG_RCU_FAST_NO_HZ is not set
> >>>> # CONFIG_TREE_RCU_TRACE is not set
> >>>> CONFIG_RCU_NOCB_CPU=y
> >>>> CONFIG_RCU_NOCB_CPU_ALL=y
> >>>> CONFIG_BUILD_BIN2C=y
> >>>
> >>> Thank you!  You were running with preemption disabled, so your system
> >>> would indeed be very susceptible to this problem.
> >>>
> >>>>> Also, I did tweak both the commit log and the patch.  Your cond_resched()
> >>>>> would prevent soft lockups, but not RCU stalls, so I substituted
> >>>>> cond_resched_rcu_qs().  Please let me know if either of those changes
> >>>>> causes problems at your end.
> >>>>
> >>>> Looks fine to me; I will apply this to my branch and test it, thanks.
> >>>
> >>> Please let me know how it goes!
> >>>
> >>>                                                 Thanx, Paul
> >>>
> >>
> >> Hi Paul:
> >>
> >> It has been a long time since I applied this patch, and I haven't found any
> >> problems, so I believe this patch is fine, thanks.
> >
> > Very good!  I will push this one upstream during the next merge window.
> >
> >                                                 Thanx, Paul
> >
>
> Hi Paul:
>
> Sorry to say that I have found this patch introduces an OOM problem. It is
> triggered when a huge number of abnormal IP packets arrive. It looks like
> avoiding processing any pending softirqs in the rcuos kthread is the best way
> to fix this problem, so I will send a new patch to revert this and fix the problem.

Interesting...  Could you please let me know exactly how the added
cond_resched_rcu_qs() leads to an OOM?  Is it that the softirqs prevent
the grace-period kthread from making progress?

                                                        Thanx, Paul
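For readers following the thread, here is a minimal user-space sketch of the loop shape under discussion. It is not kernel code and does not reproduce rcu_nocb_kthread(). The idea is that an offload kthread drains a long list of callbacks and calls a yield hook after each one. In the thread, plain cond_resched() only yields the CPU, which avoids soft lockups. cond_resched_rcu_qs() also reports a quiescent state to RCU, which avoids RCU stalls. Everything in the sketch is a hypothetical stand-in: struct cb, free_cb(), model_cond_resched_rcu_qs(), drain_cbs(), run_demo() and the counters. sched_yield() plays the role of rescheduling. The sketch does not model softirqs or the OOM reported above.

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical callback node, loosely modeled on struct rcu_head. */
struct cb {
	struct cb *next;
	void (*func)(struct cb *);
};

/* Illustrative counters standing in for kernel-side state. */
static unsigned long cbs_invoked;
static unsigned long qs_reported;

/* Hypothetical callback: just counts its invocation. */
static void free_cb(struct cb *c)
{
	(void)c;
	cbs_invoked++;
}

/*
 * User-space stand-in for cond_resched_rcu_qs(): note a quiescent
 * state (what plain cond_resched() would not do) and yield the CPU
 * (what cond_resched() would do).
 */
static void model_cond_resched_rcu_qs(void)
{
	qs_reported++;
	sched_yield();
}

/* Drain a callback list, yielding after every callback. */
static unsigned long drain_cbs(struct cb *list)
{
	unsigned long n = 0;

	while (list) {
		struct cb *next = list->next; /* read before func() may free it */

		list->func(list);
		n++;
		model_cond_resched_rcu_qs();
		list = next;
	}
	return n;
}

/* Build an n-element list in caller-provided storage, then drain it. */
static unsigned long run_demo(struct cb *storage, size_t n)
{
	size_t i;

	assert(n > 0);
	for (i = 0; i < n; i++) {
		storage[i].func = free_cb;
		storage[i].next = (i + 1 < n) ? &storage[i + 1] : NULL;
	}
	return drain_cbs(&storage[0]);
}
```

Calling run_demo() on a 1000-element array invokes all 1000 callbacks and records one modeled quiescent state per callback. Yielding after every callback keeps the sketch simple. A real implementation might batch the yields instead, which is a design choice this sketch does not explore.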