Date: Thu, 29 Nov 2018 06:27:12 -0800
From: "Paul E.
 McKenney"
To: "He, Bo"
Cc: "linux-kernel@vger.kernel.org", "josh@joshtriplett.org",
 "rostedt@goodmis.org", "mathieu.desnoyers@efficios.com",
 "jiangshanlai@gmail.com", "Zhang, Jun", "Xiao, Jin", "Zhang, Yanmin"
Subject: Re: rcu_preempt caused oom
Reply-To: paulmck@linux.ibm.com
References: <20181129130647.GG4170@linux.ibm.com>
In-Reply-To: <20181129130647.GG4170@linux.ibm.com>
Message-Id: <20181129142712.GA16607@linux.ibm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Nov 29, 2018 at 05:06:47AM -0800, Paul E. McKenney wrote:
> On Thu, Nov 29, 2018 at 08:49:35AM +0000, He, Bo wrote:
> > Hi,
> > we test on kernel 4.19.0 on android, after run more than 24 Hours monkey stress test, we see OOM on 1/10 2G memory board, the issue is not seen on the 4.14 kernel.
> > we have done some debugs:
> > 1. OOM is due to the filp consume too many memory: 300M vs 2G board.
> > 2. with the 120s hung task detect, most of the tasks will block at __wait_rcu_gp: wait_for_completion(&rs_array[i].completion);

Did you see any RCU CPU stall warnings?  Or have those been disabled?
If they have been disabled, could you please rerun with them enabled?

> > [47571.863839] Kernel panic - not syncing: hung_task: blocked tasks
> > [47571.875446] CPU: 1 PID: 13626 Comm: FinalizerDaemon Tainted: G U O 4.19.0-quilt-2e5dc0ac-gf3f313245eb6 #1
> > [47571.887603] Call Trace:
> > [47571.890547]  dump_stack+0x70/0xa5
> > [47571.894456]  panic+0xe3/0x241
> > [47571.897977]  ? wait_for_completion_timeout+0x72/0x1b0
> > [47571.903830]  __wait_rcu_gp+0x17b/0x180
> > [47571.908226]  synchronize_rcu.part.76+0x38/0x50
> > [47571.913393]  ? __call_rcu.constprop.79+0x3a0/0x3a0
> > [47571.918948]  ? __bpf_trace_rcu_invoke_callback+0x10/0x10
> > [47571.925094]  synchronize_rcu+0x43/0x50
> > [47571.929487]  evdev_detach_client+0x59/0x60
> > [47571.934264]  evdev_release+0x4e/0xd0
> > [47571.938464]  __fput+0xfa/0x1f0
> > [47571.942072]  ____fput+0xe/0x10
> > [47571.945683]  task_work_run+0x90/0xc0
> > [47571.949884]  exit_to_usermode_loop+0x9f/0xb0
> > [47571.954855]  do_syscall_64+0xfa/0x110
> > [47571.959151]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

This is indeed a task waiting on synchronize_rcu().

> > 3. after enable the rcu trace, we don't see rcu_quiescent_state_report trace in a long time, we see rcu_callback: rcu_preempt will never response with the rcu_invoke_callback.
> > [47572.040668] ps-12388 1d..1 47566097572us : rcu_grace_period: rcu_preempt 23716088 AccWaitCB
> > [47572.040707] ps-12388 1d... 47566097621us : rcu_callback: rcu_preempt rhp=00000000783a728b func=file_free_rcu 4354/82824
> > [47572.040734] ps-12388 1d..1 47566097622us : rcu_future_grace_period: rcu_preempt 23716088 23716092 0 0 3 Startleaf
> > [47572.040756] ps-12388 1d..1 47566097623us : rcu_future_grace_period: rcu_preempt 23716088 23716092 0 0 3 Prestarted
> > [47572.040778] ps-12388 1d..1 47566097623us : rcu_grace_period: rcu_preempt 23716088 AccWaitCB
> > [47572.040802] ps-12388 1d... 47566097674us : rcu_callback: rcu_preempt rhp=0000000042c76521 func=file_free_rcu 4354/82825
> > [47572.040824] ps-12388 1d..1 47566097676us : rcu_future_grace_period: rcu_preempt 23716088 23716092 0 0 3 Startleaf
> > [47572.040847] ps-12388 1d..1 47566097676us : rcu_future_grace_period: rcu_preempt 23716088 23716092 0 0 3 Prestarted
> > [47572.040868] ps-12388 1d..1 47566097676us : rcu_grace_period: rcu_preempt 23716088 AccWaitCB
> > [47572.040895] ps-12388 1d..1 47566097716us : rcu_callback: rcu_preempt rhp=000000005e40fde2 func=avc_node_free 4354/82826
> > [47572.040919] ps-12388 1d..1 47566097735us : rcu_callback: rcu_preempt rhp=00000000f80fe353 func=avc_node_free 4354/82827
> > [47572.040943] ps-12388 1d..1 47566097758us : rcu_callback: rcu_preempt rhp=000000007486f400 func=avc_node_free 4354/82828
> > [47572.040967] ps-12388 1d..1 47566097760us : rcu_callback: rcu_preempt rhp=00000000b87872a8 func=avc_node_free 4354/82829
> > [47572.040990] ps-12388 1d... 47566097789us : rcu_callback: rcu_preempt rhp=000000008c656343 func=file_free_rcu 4354/82830
> > [47572.041013] ps-12388 1d..1 47566097790us : rcu_future_grace_period: rcu_preempt 23716088 23716092 0 0 3 Startleaf
> > [47572.041036] ps-12388 1d..1 47566097790us : rcu_future_grace_period: rcu_preempt 23716088 23716092 0 0 3 Prestarted
> > [47572.041057] ps-12388 1d..1 47566097791us : rcu_grace_period: rcu_preempt 23716088 AccWaitCB
> > [47572.041081] ps-12388 1d... 47566097871us : rcu_callback: rcu_preempt rhp=000000007e6c898c func=file_free_rcu 4354/82831
> > [47572.041103] ps-12388 1d..1 47566097872us : rcu_future_grace_period: rcu_preempt 23716088 23716092 0 0 3 Startleaf
> > [47572.041126] ps-12388 1d..1 47566097872us : rcu_future_grace_period: rcu_preempt 23716088 23716092 0 0 3 Prestarted
> > [47572.041147] ps-12388 1d..1 47566097873us : rcu_grace_period: rcu_preempt 23716088 AccWaitCB
> > [47572.041170] ps-12388 1d... 47566097945us : rcu_callback: rcu_preempt rhp=0000000032f4f174 func=file_free_rcu 4354/82832
> > [47572.041193] ps-12388 1d..1 47566097946us : rcu_future_grace_period: rcu_preempt 23716088 23716092 0 0 3 Startleaf

Callbacks are being queued and future grace periods to handle them are
being requested, but as you say, no progress on the current grace period.
Is it possible to start the trace earlier?

> > Do you have any suggestions to debug the issue?
>
> If you do not already have CONFIG_RCU_BOOST=y set, could you please
> rebuild with that?
>
> Could you also please send your .config file?

So, to summarize:

1.	If you don't have RCU CPU stall warnings enabled, please enable
	them.  For example, please remove rcupdate.rcu_cpu_stall_suppress
	from the kernel boot parameters if it is there.  Getting an RCU
	CPU stall warning would be extremely helpful.  It contains many
	useful diagnostics.

2.	If possible, please start the trace before the last grace period
	starts.

3.	If CONFIG_RCU_BOOST=y is not set, please try setting it.

4.	Please send me your .config file.

Thanx, Paul
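
Items 1 and 3 in the summary above can be checked mechanically before rerunning the test. The following is a minimal sketch, not part of the original message: the helper names and the sample strings are hypothetical, and it assumes that any value other than "0" for rcupdate.rcu_cpu_stall_suppress suppresses the warnings.

```python
def stall_warnings_suppressed(cmdline: str) -> bool:
    """Return True if the boot command line suppresses RCU CPU stall warnings.

    Assumption: any value other than "0" (including a bare parameter with
    no value) is treated as suppressing the warnings.
    """
    for token in cmdline.split():
        name, _, value = token.partition("=")
        if name == "rcupdate.rcu_cpu_stall_suppress":
            return value != "0"
    return False


def rcu_boost_enabled(config_text: str) -> bool:
    """Return True if the kernel .config text contains CONFIG_RCU_BOOST=y."""
    return any(
        line.strip() == "CONFIG_RCU_BOOST=y"
        for line in config_text.splitlines()
    )


# Hypothetical sample inputs, for illustration only.
sample_cmdline = "console=ttyS0 rcupdate.rcu_cpu_stall_suppress=1"
sample_config = "CONFIG_PREEMPT=y\n# CONFIG_RCU_BOOST is not set\n"

if stall_warnings_suppressed(sample_cmdline):
    print("stall warnings suppressed: remove rcupdate.rcu_cpu_stall_suppress")
if not rcu_boost_enabled(sample_config):
    print("CONFIG_RCU_BOOST=y is not set: rebuild with it enabled")
```

On a running system, the command-line string would typically come from /proc/cmdline and the configuration text from the .config file used for the build.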