Received: by 2002:ac0:a5a7:0:0:0:0:0 with SMTP id m36-v6csp2041970imm; Mon, 16 Jul 2018 00:44:03 -0700 (PDT) X-Google-Smtp-Source: AAOMgpdYVZPjBN9FimpthHMWXvRVnFFEqKsIw0M+mmCj7odJ+81Aqml0SgCj5OjgmISN8YfccMBd X-Received: by 2002:a17:902:5a08:: with SMTP id q8-v6mr9769151pli.300.1531727043432; Mon, 16 Jul 2018 00:44:03 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1531727043; cv=none; d=google.com; s=arc-20160816; b=OzKT4o2ofFJCVEKUaNJHMyyjDx7PnZXJqAkxmWa2sc0m0EuMEpj63ZLcs4P9FnBkVN uX0XpzqctAfEkXjOgAOvzsR6KG/JhOw9ufqJ0yueo92msVO/WYnuGc2Nmd4Y0f/Gm0T5 pPbxkWA6h9shfvtNuBQMraSi+Al9L8f2Hw5DiUnjsJlVeEiJjKTq8wcVTtAa381zvGvp sMbRgnBT+Fjh/uTN6auODf0loGKSWRh2qR2ph6gskIjNvXr0L0XtAGk285xsfo79xBw4 m+NV5zGv9P54Jwh+3rQDkbHBHDbetlR1DwbiErAVJvZdvqFiOJOe7o60mMQ8gg67u8+l z5GQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:mime-version:user-agent:references :in-reply-to:message-id:date:subject:cc:to:from :arc-authentication-results; bh=BBqf49e6KHrq7AfNiJ/Y7ocOsgzlK8bPwHGF7L+Npk0=; b=SG3b9Dc8ar3e6EGOboyF4v0ok/EWBuRGJ8uXOrS14GnbLkKEpl1nFGStKEoyDGtj19 ZkESWSaJ7b7kTA0fClwcgvftc/eNTD1yA9WAaBEhMtZXq3YJQPfivVxE6m/QExnwdqE9 aH8sdhUZO6RdW7wZ589Bxk/GvDBy2XMPtk68eVdQex7GLjYpyO0B9TqsTUdT2H3q8RYl QMKcF9uhcndUOGK9MspUdKfUx6zHAa32xaMKThYiCjmqtSyeoXUbv8+LwB6ZnkqKfGzM WDuqUI+zZ7eFK4TpNGisyKfaoIumofJmKVTKeO9WVmKTIoJ4ZvPzUPz5/n7cJt3mcEYY HEwQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id x1-v6si12589676pge.521.2018.07.16.00.43.48; Mon, 16 Jul 2018 00:44:03 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388787AbeGPIIf (ORCPT + 99 others); Mon, 16 Jul 2018 04:08:35 -0400 Received: from mail.linuxfoundation.org ([140.211.169.12]:47804 "EHLO mail.linuxfoundation.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730161AbeGPIIe (ORCPT ); Mon, 16 Jul 2018 04:08:34 -0400 Received: from localhost (LFbn-1-12247-202.w90-92.abo.wanadoo.fr [90.92.61.202]) by mail.linuxfoundation.org (Postfix) with ESMTPSA id E4B70CA0; Mon, 16 Jul 2018 07:42:31 +0000 (UTC) From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman , stable@vger.kernel.org, Paul Burton , James Hogan , Ralf Baechle , Huacai Chen , linux-mips@linux-mips.org Subject: [PATCH 4.9 03/32] MIPS: Use async IPIs for arch_trigger_cpumask_backtrace() Date: Mon, 16 Jul 2018 09:36:11 +0200 Message-Id: <20180716073504.871411055@linuxfoundation.org> X-Mailer: git-send-email 2.18.0 In-Reply-To: <20180716073504.433996952@linuxfoundation.org> References: <20180716073504.433996952@linuxfoundation.org> User-Agent: quilt/0.65 X-stable: review MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org 4.9-stable review patch. If anyone has any objections, please let me know. ------------------ From: Paul Burton commit b63e132b6433a41cf311e8bc382d33fd2b73b505 upstream. The current MIPS implementation of arch_trigger_cpumask_backtrace() is broken because it attempts to use synchronous IPIs despite the fact that it may be run with interrupts disabled. This means that when arch_trigger_cpumask_backtrace() is invoked, for example by the RCU CPU stall watchdog, we may: - Deadlock due to use of synchronous IPIs with interrupts disabled, causing the CPU that's attempting to generate the backtrace output to hang itself. - Not succeed in generating the desired output from remote CPUs. - Produce warnings about this from smp_call_function_many(), for example: [42760.526910] INFO: rcu_sched detected stalls on CPUs/tasks: [42760.535755] 0-...!: (1 GPs behind) idle=ade/140000000000000/0 softirq=526944/526945 fqs=0 [42760.547874] 1-...!: (0 ticks this GP) idle=e4a/140000000000000/0 softirq=547885/547885 fqs=0 [42760.559869] (detected by 2, t=2162 jiffies, g=266689, c=266688, q=33) [42760.568927] ------------[ cut here ]------------ [42760.576146] WARNING: CPU: 2 PID: 1216 at kernel/smp.c:416 smp_call_function_many+0x88/0x20c [42760.587839] Modules linked in: [42760.593152] CPU: 2 PID: 1216 Comm: sh Not tainted 4.15.4-00373-gee058bb4d0c2 #2 [42760.603767] Stack : 8e09bd20 8e09bd20 8e09bd20 fffffff0 00000007 00000006 00000000 8e09bca8 [42760.616937] 95b2b379 95b2b379 807a0080 00000007 81944518 0000018a 00000032 00000000 [42760.630095] 00000000 00000030 80000000 00000000 806eca74 00000009 8017e2b8 000001a0 [42760.643169] 00000000 00000002 00000000 8e09baa4 00000008 808b8008 86d69080 8e09bca0 [42760.656282] 8e09ad50 805e20aa 00000000 00000000 00000000 8017e2b8 00000009 801070ca [42760.669424] ... [42760.673919] Call Trace: [42760.678672] [<27fde568>] show_stack+0x70/0xf0 [42760.685417] [<84751641>] dump_stack+0xaa/0xd0 [42760.692188] [<699d671c>] __warn+0x80/0x92 [42760.698549] [<68915d41>] warn_slowpath_null+0x28/0x36 [42760.705912] [] smp_call_function_many+0x88/0x20c [42760.713696] [<6bbdfc2a>] arch_trigger_cpumask_backtrace+0x30/0x4a [42760.722216] [] rcu_dump_cpu_stacks+0x6a/0x98 [42760.729580] [<796e7629>] rcu_check_callbacks+0x672/0x6ac [42760.737476] [<059b3b43>] update_process_times+0x18/0x34 [42760.744981] [<6eb94941>] tick_sched_handle.isra.5+0x26/0x38 [42760.752793] [<478d3d70>] tick_sched_timer+0x1c/0x50 [42760.759882] [] __hrtimer_run_queues+0xc6/0x226 [42760.767418] [] hrtimer_interrupt+0x88/0x19a [42760.775031] [<6765a19e>] gic_compare_interrupt+0x2e/0x3a [42760.782761] [<0558bf5f>] handle_percpu_devid_irq+0x78/0x168 [42760.790795] [<90c11ba2>] generic_handle_irq+0x1e/0x2c [42760.798117] [<1b6d462c>] gic_handle_local_int+0x38/0x86 [42760.805545] [] gic_irq_dispatch+0xa/0x14 [42760.812534] [<90c11ba2>] generic_handle_irq+0x1e/0x2c [42760.820086] [] do_IRQ+0x16/0x20 [42760.826274] [<9aef3ce6>] plat_irq_dispatch+0x62/0x94 [42760.833458] [<6a94b53c>] except_vec_vi_end+0x70/0x78 [42760.840655] [<22284043>] smp_call_function_many+0x1ba/0x20c [42760.848501] [<54022b58>] smp_call_function+0x1e/0x2c [42760.855693] [] flush_tlb_mm+0x2a/0x98 [42760.862730] [<0844cdd0>] tlb_flush_mmu+0x1c/0x44 [42760.869628] [] arch_tlb_finish_mmu+0x26/0x3e [42760.877021] [<1aeaaf74>] tlb_finish_mmu+0x18/0x66 [42760.883907] [] exit_mmap+0x76/0xea [42760.890428] [] mmput+0x80/0x11a [42760.896632] [] do_exit+0x1f4/0x80c [42760.903158] [] do_group_exit+0x20/0x7e [42760.909990] [<13fa8d54>] __wake_up_parent+0x0/0x1e [42760.917045] [<46cf89d0>] smp_call_function_many+0x1a2/0x20c [42760.924893] [<8c21a93b>] syscall_common+0x14/0x1c [42760.931765] ---[ end trace 02aa09da9dc52a60 ]--- [42760.938342] ------------[ cut here ]------------ [42760.945311] WARNING: CPU: 2 PID: 1216 at kernel/smp.c:291 smp_call_function_single+0xee/0xf8 ... This patch switches MIPS' arch_trigger_cpumask_backtrace() to use async IPIs & smp_call_function_single_async() in order to resolve this problem. We ensure use of the pre-allocated call_single_data_t structures is serialized by maintaining a cpumask indicating that they're busy, and refusing to attempt to send an IPI when a CPU's bit is set in this mask. This should only happen if a CPU hasn't responded to a previous backtrace IPI - ie. if it's hung - and we print a warning to the console in this case. I've marked this for stable branches as far back as v4.9, to which it applies cleanly. Strictly speaking the faulty MIPS implementation can be traced further back to commit 856839b76836 ("MIPS: Add arch_trigger_all_cpu_backtrace() function") in v3.19, but kernel versions v3.19 through v4.8 will require further work to backport due to the rework performed in commit 9a01c3ed5cdb ("nmi_backtrace: add more trigger_*_cpu_backtrace() methods"). Signed-off-by: Paul Burton Patchwork: https://patchwork.linux-mips.org/patch/19597/ Cc: James Hogan Cc: Ralf Baechle Cc: Huacai Chen Cc: linux-mips@linux-mips.org Cc: stable@vger.kernel.org # v4.9+ Fixes: 856839b76836 ("MIPS: Add arch_trigger_all_cpu_backtrace() function") Fixes: 9a01c3ed5cdb ("nmi_backtrace: add more trigger_*_cpu_backtrace() methods") Signed-off-by: Greg Kroah-Hartman --- arch/mips/kernel/process.c | 45 ++++++++++++++++++++++++++++++--------------- 1 file changed, 30 insertions(+), 15 deletions(-) --- a/arch/mips/kernel/process.c +++ b/arch/mips/kernel/process.c @@ -26,6 +26,7 @@ #include #include #include +#include #include #include @@ -633,28 +634,42 @@ unsigned long arch_align_stack(unsigned return sp & ALMASK; } -static void arch_dump_stack(void *info) -{ - struct pt_regs *regs; +static DEFINE_PER_CPU(call_single_data_t, backtrace_csd); +static struct cpumask backtrace_csd_busy; - regs = get_irq_regs(); - - if (regs) - show_regs(regs); - else - dump_stack(); +static void handle_backtrace(void *info) +{ + nmi_cpu_backtrace(get_irq_regs()); + cpumask_clear_cpu(smp_processor_id(), &backtrace_csd_busy); } -void arch_trigger_cpumask_backtrace(const cpumask_t *mask, bool exclude_self) +static void raise_backtrace(cpumask_t *mask) { - long this_cpu = get_cpu(); + call_single_data_t *csd; + int cpu; - if (cpumask_test_cpu(this_cpu, mask) && !exclude_self) - dump_stack(); + for_each_cpu(cpu, mask) { + /* + * If we previously sent an IPI to the target CPU & it hasn't + * cleared its bit in the busy cpumask then it didn't handle + * our previous IPI & it's not safe for us to reuse the + * call_single_data_t. + */ + if (cpumask_test_and_set_cpu(cpu, &backtrace_csd_busy)) { + pr_warn("Unable to send backtrace IPI to CPU%u - perhaps it hung?\n", + cpu); + continue; + } - smp_call_function_many(mask, arch_dump_stack, NULL, 1); + csd = &per_cpu(backtrace_csd, cpu); + csd->func = handle_backtrace; + smp_call_function_single_async(cpu, csd); + } +} - put_cpu(); +void arch_trigger_cpumask_backtrace(const cpumask_t *mask, bool exclude_self) +{ + nmi_trigger_cpumask_backtrace(mask, exclude_self, raise_backtrace); } int mips_get_process_fp_mode(struct task_struct *task)