Received: by 2002:a25:7ec1:0:0:0:0:0 with SMTP id z184csp8679851ybc; Fri, 29 Nov 2019 14:55:02 -0800 (PST) X-Google-Smtp-Source: APXvYqyHfMRpfWayP6q+/p+WBF1xVIVJNU0zuKuMdG4J4AHaeQXBesh2jjHBd7a0ZBf71Ws5DGBC X-Received: by 2002:a17:906:1395:: with SMTP id f21mr62281422ejc.152.1575068102671; Fri, 29 Nov 2019 14:55:02 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1575068102; cv=none; d=google.com; s=arc-20160816; b=TFLlH/tXCZlGMPQHff9zLs8ohtGMCIc9ZpFQBaUSHryEcKAuSc4KbbQpFBPjK//+/T 3ADGV0LsgNkrjs7d3DMHXSIwU8wFW40TKH9wZtBHU8kc2NR6CEVUS3kmIGO/HgAojjSb 4YxfXGuhKuve/PncVyYM8Hz+IsxbO6soMe+k9R7cyxwuL5RSDfbIQZxGcwsf1oyrfOBS npU1AJxdSzslgMMLzkbA9atQEhRFaeDcmAmdYDEhAHNyBGdGMk3046+/ytdpPnWkCVIO AXOeM4nJ3t3G/fBX14e39dPiX064Uue5Rtf7E82I8N5K11hOQa4kuF14bNXzOoQiwQkm 7kgA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:to:subject:message-id:date:from :mime-version:dkim-signature; bh=KmlKmmT0etYCADZ8Ax0Q+YIHnJREMaeSktPwKEyQ4xc=; b=vtzRmT+ns5OYI6nML3TV93XPw3Zg5uDINQXlElmT0v00FA6CLEy1oc7jyqbr9OI7sl Zk9P8SQDgHvX76K200Y/T8Sb1o55scvHhiE2d8FasK+trpM7ir/iOtcEDwqA4/DtGNaZ FtVXaI0C7uTzDHO+pSFdiQnlrrIAgXOPeWlob1utk99k0FsyO27eHPthCz//dPeb2209 ICLlWb5jmRi6N3tncehQMy13B35nqwA06DfP6CfTvLQ7ipo+UNmBLTLrlXcCxsifqrGd Iqz/ICSkNSyKsdsasBkZQas8TngtLp8QbuauNARUV51cYgEb5K0/dgJSldftAwzAnf6b QTDQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=WtdYzZSg; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id w19si14844398eje.290.2019.11.29.14.54.39; Fri, 29 Nov 2019 14:55:02 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=WtdYzZSg; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727236AbfK2Wwe (ORCPT + 99 others); Fri, 29 Nov 2019 17:52:34 -0500 Received: from mail-oi1-f193.google.com ([209.85.167.193]:39185 "EHLO mail-oi1-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727073AbfK2Wwe (ORCPT ); Fri, 29 Nov 2019 17:52:34 -0500 Received: by mail-oi1-f193.google.com with SMTP id a67so8514351oib.6 for ; Fri, 29 Nov 2019 14:52:33 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:from:date:message-id:subject:to; bh=KmlKmmT0etYCADZ8Ax0Q+YIHnJREMaeSktPwKEyQ4xc=; b=WtdYzZSgyrqAOpVsdcCIvWuYbBSyTTULLDQVnIkGGUHFQe0oUsBWKbPPX12pH8HFrP 9qkgo1C/ZjmkHfu8S+QGtpuv4NstP8x817Jbtk2NerhHWADQoadM+WGfTieOWLeUsUdk 7dMW3jFlprkPNDcV+pUE39AZz1DyIYpdD3b2a5nCGm1YSJls02o4fy+hycoFnVuH//gR 8Se4kq++Ww+JE/Z00lAjlZKlMIYQp+wLv2vYJToKlVZN32XPSsAR0Dz1MhmVx/cdqs3y SyaUJBY+xZd0Wg4xucVSjgyM2X/SxDqBrflfFcyjkoinXelghnfz9Q9usiltz+oIBIZQ s4yA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:from:date:message-id:subject:to; bh=KmlKmmT0etYCADZ8Ax0Q+YIHnJREMaeSktPwKEyQ4xc=; b=ACScyy7CN5tPgwU53eqxyg0bd6If85dxlET9O+5hzYV02thh1ON77Z7Wwt2k04SrSV vE8oBpjJyNtHDIXaHFBkearqsUQuAl2B81c8f6cIFlmCtqH6RGEHuDE8HdvhdH6UHXgY WH4Lk/o7rGv7AmKiFhJngmnKjTEEuIEoe0Pl8muG2eFxquWXNuwDk96L9VrsHhZsJVmd iPEsQ0pfwXlo34QrrwfNRq5b1veQ3NjPSEDtMYhUC/R/3G55fi+xjRwiLSSRy/TtCUh/ RJX97BtD0FBqvRgjOJKq7ZoghHSpWRFqJ+zy78OR2nXr6k2SjUycU2jeHdQnoS3OGvpX BIxg== X-Gm-Message-State: APjAAAWPTgTlIBVRefo6VZrbzoQdUYBhnujoLbvVWwn3PKqFvt0cBV1s 0JwbehOkb6dJd3e0cuAz9NnaEgOAh+5KYIicoFI7U72M X-Received: by 2002:a05:6808:3ab:: with SMTP id n11mr2589367oie.5.1575067952824; Fri, 29 Nov 2019 14:52:32 -0800 (PST) MIME-Version: 1.0 From: Radu Rendec Date: Fri, 29 Nov 2019 17:52:22 -0500 Message-ID: Subject: RCU stall on 4.9.191/CONFIG_PREEMPT, single CPU To: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Everyone, I'm experiencing what appears to be a complete system freeze on a single CPU embedded system (PowerPC, MPC8378) running Linux 4.9.191 with CONFIG_PREEMPT enabled. Enabling all lockup detection in the kernel config revealed RCU stalls, detected by rcu_preempt (sample included below). These occur every 30 seconds, give or take. At that point the network stack is hosed (no replies to ARP requests). Softirqs seem to be hosed as well - I added a test module that logs a message every 0.5 seconds from a timer callback and, once the "freeze" occurs, these messages are never printed. I simply do not understand the RCU and scheduler implementation well enough to figure this out on my own, so any idea or suggestion would be helpful. Looking at Documentation/RCU/stallwarn.txt, I found this among the potential causes for an RCU stall: A CPU-bound real-time task in a CONFIG_PREEMPT kernel, which might happen to preempt a low-priority task in the middle of an RCU read-side critical section. This is especially damaging if that low-priority task is not permitted to run on any other CPU, in which case the next RCU grace period can never complete In my case all tasks are CPU-bound, simply because there is only one CPU. And yes, I do have realtime tasks (many of them), so any of them can potentially preempt a low-priority task in the middle of an RCU read critical section. Does that mean it's a bad idea to run a fully preemptible kernel on a single CPU? If that were the case, I would expect to see this kind of problems much sooner/more often, but they are only triggered by a very specific activity of the application that runs on top. Thanks, Radu Rendec [ 902.232175] INFO: rcu_preempt detected stalls on CPUs/tasks: [ 902.237847] (detected by 0, t=21002 jiffies, g=8316, c=8315, q=210) [ 902.244203] All QSes seen, last rcu_preempt kthread activity 21002 (602030-581028), jiffies_till_next_fqs=3, root ->qsmask 0x0 [ 902.255586] backupRestore R running task 0 1525 1254 0x00000000 [ 902.262633] Call Trace: [ 902.265093] [9ac0fc60] [8005b2f4] rcu_check_callbacks+0x848/0x860 (unreliable) [ 902.272319] [9ac0fcd0] [8005e878] update_process_times+0x30/0x60 [ 902.278328] [9ac0fce0] [8007046c] tick_sched_timer+0x58/0xb4 [ 902.283985] [9ac0fd10] [8005f844] __hrtimer_run_queues.constprop.34+0x100/0x18c [ 902.291291] [9ac0fd50] [8005fae0] hrtimer_interrupt+0xc0/0x290 [ 902.297122] [9ac0fda0] [8000a3c8] __timer_interrupt+0x15c/0x20c [ 902.303038] [9ac0fdb0] [8000a610] timer_interrupt+0x9c/0xc8 [ 902.308614] [9ac0fdd0] [8000f678] ret_from_except+0x0/0x14 [ 902.314106] --- interrupt: 901 at timespec64_add_safe+0x108/0x12c [ 902.314106] LR = poll_select_set_timeout+0x94/0x110 [ 902.325404] [9ac0fe90] [111294e8] 0x111294e8 (unreliable) [ 902.330804] [9ac0feb0] [800e8a1c] poll_select_set_timeout+0x50/0x110 [ 902.337155] [9ac0ff10] [800e9d70] SyS_poll+0x68/0x114 [ 902.342205] [9ac0ff40] [8000ef6c] ret_from_syscall+0x0/0x38 [ 902.347773] --- interrupt: c01 at 0xe5855f8 [ 902.347773] LR = 0xe5855e0 [ 902.354994] rcu_preempt kthread starved for 21002 jiffies! g8316 c8315 f0x2 RCU_GP_WAIT_FQS(3) ->state=0x0 [ 902.364638] rcu_preempt R running task 0 7 2 0x00000800 [ 902.371686] Call Trace: [ 902.374126] [9f043d00] [9ac08080] 0x9ac08080 (unreliable) [ 902.379526] [9f043dc0] [80415ca4] __schedule+0x278/0x490 [ 902.384835] [9f043e10] [8041608c] schedule+0x3c/0xac [ 902.389799] [9f043e20] [80418e9c] schedule_timeout+0x128/0x240 [ 902.395630] [9f043e60] [8005a0d4] rcu_gp_kthread+0x3dc/0xa6c [ 902.401289] [9f043ef0] [80038b7c] kthread+0xbc/0xd0 [ 902.406164] [9f043f40] [8000f094] ret_from_kernel_thread+0x5c/0x64