Received: by 2002:ac0:a5a7:0:0:0:0:0 with SMTP id m36-v6csp2517367imm; Thu, 19 Jul 2018 23:24:17 -0700 (PDT) X-Google-Smtp-Source: AAOMgpeJcvv6Cv+gZX8yXqASDWnqaaA6njzp12P40WhVZn6bU2Xk3tS21q2Yk/COzig2q/58ZxzR X-Received: by 2002:a65:5141:: with SMTP id g1-v6mr797787pgq.418.1532067857672; Thu, 19 Jul 2018 23:24:17 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1532067857; cv=none; d=google.com; s=arc-20160816; b=Axo2JMQFP5ZZs9YjDlcHH8n5MPxBsskYnGqXgKTZnM70eHXcLEr56P82BZ8c89rG/K MpPdESPFQDvWGy8Dh8vwmrXYd+YpJxGqlZ/r3cDh79f4u2G1U6qQVb42SyUlSMaie4kp Uhksb5hS/JOrfT42PRFrwosD69eYV62yMQ1mF4VaH9tyT06glZrmQ7v9EvzwFhIv/X11 YFPY6MShpkOLAUkNI3cFKLdPsd99XICscZ1ePrwBbm9E6Y+gfYlOeM+KCMmcfGMTRiXC A3Pi1qthbY6x9t9JAupvn0yS9/UYHl5ep9d46ftanLZ/Kx+vMmgmRdvvpPNrqnKBnKc1 UnbA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:arc-authentication-results; bh=aB24qdN+yoktWuhgK/0vgeH8sbkc4FtdtKGX25SytYA=; b=HXNaB0gu5pd1lPDW3gVXVMkaFBGTUHEbA9wyHDfvKg6e/NhYuO6/OMECLdUaXAN1Db 8wfcjbhNRb6Ks6h6irPistTwFZyp1jllnvui2Cn2//RvXXBYTIbq2vVa3Kme9UE0QNQv HaGJqQLqan4hEiCwzwlbBx2l/zac692J093hqbfM8ckyKQ4iqX94J0Dcc+kfkFyk34Rz 8ZlOj/5yNGvVrvpmcos/4zvoGRkt680GJgpy/2N6ALJ7oAHT+0A1i1uuSthqCELC1mAq Cos9gjDjpqMICIVnrwvPSfbJ+OHR7+Y+6PPg1XxKi1rPs1cRYMbb9vekb+ncf0+igiMP oH+w== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id 3-v6si1188228pgm.37.2018.07.19.23.24.02; Thu, 19 Jul 2018 23:24:17 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727347AbeGTHKD (ORCPT + 99 others); Fri, 20 Jul 2018 03:10:03 -0400 Received: from usa-sjc-mx-foss1.foss.arm.com ([217.140.101.70]:58282 "EHLO foss.arm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727240AbeGTHKD (ORCPT ); Fri, 20 Jul 2018 03:10:03 -0400 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.72.51.249]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 9A7B27A9; Thu, 19 Jul 2018 23:23:24 -0700 (PDT) Received: from salmiak (usa-sjc-mx-foss1.foss.arm.com [217.140.101.70]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 110D73F246; Thu, 19 Jul 2018 23:23:20 -0700 (PDT) Date: Fri, 20 Jul 2018 07:23:15 +0100 From: Mark Rutland To: Venkata Narendra Kumar Gutta Cc: linux-arm-kernel@lists.infradead.org, tsoni@codeaurora.org, ckadabi@codeaurora.org, rishabhb@codeaurora.org, linux-kernel@vger.kernel.org, robh@kernel.org, hoeun.ryu@gmail.com, adobriyan@gmail.com, zhizhouzhang@asrmicro.com, suzuki.poulose@arm.com, james.morse@arm.com, will.deacon@arm.com, catalin.marinas@arm.com, Abhimanyu Kapur Subject: Re: [PATCH] ARM64: smp: Fix cpu_up() racing with sys_reboot Message-ID: <20180720062314.7qh23yfq2o5jciuh@salmiak> References: <1532038726-3376-1-git-send-email-vnkgutta@codeaurora.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1532038726-3376-1-git-send-email-vnkgutta@codeaurora.org> User-Agent: NeoMutt/20170113 (1.7.2) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Jul 19, 2018 at 03:18:46PM -0700, Venkata Narendra Kumar Gutta wrote: > Nothing stops a process from hotplugging in a CPU concurrently > with a sys_reboot() call. In such a situation we could have > ipi_cpu_stop() mark a cpu as 'offline' and _cpu_up() ignore the > fact that the CPU is not really offline and call the > CPU_UP_PREPARE notifier. When this happens stop_machine code will > complain that the cpu thread already exists and BUG_ON(). > > CPU0 CPU1 > > sys_reboot() > kernel_restart() > machine_restart() > machine_shutdown() > smp_send_stop() From a quick look arm64's machine_restart() disables IRQs, calls smp_send_stop(), then calls one of: * efi_reboot() * arm_pm_restart() * do_kernel_restart() ... and I can't see any of these calling machine_shutdown() directly. How does machine_shutdown() get called? Is that somewhere in the restart_list associated with do_kernel_restart? > ... ipi_cpu_stop() > set_cpu_online(1, false) > local_irq_disable() > while(1) > Given IRQs are disabled in machine_restart(), I don't understand how this preemption can occur. Where do they get re-enabled? > cpu_up() > _cpu_up() > if (!cpu_online(1)) > __cpu_notify(CPU_UP_PREPARE...) > > cpu_stop_cpu_callback() > BUG_ON(stopper->thread) > > This is easily reproducible by hotplugging in and out in a tight > loop while also rebooting. Is this reproducible with mainline? e.g. v4.18-rc5? I've packed away my HW in preparation for an office move, but I might be able to give this a go in a VM. > Since the CPU is not really offline and hasn't gone through the > proper steps to be marked as such, let's mark the CPU as inactive. > This is just as easily testable as online and avoids any possibility > of _cpu_up() trying to bring the CPU back online when it never was > offline to begin with. Based on the similar patchset by for arm > targets 040c163( "ARM: smp: Fix cpu_up() racing with sys_reboot)" Is this in mainline? Looking at v4.18-rc5 I don't see arm's ipi_cpu_stop touching the active mask. Thanks, Mark. > > Signed-off-by: Abhimanyu Kapur > Signed-off-by: Venkata Narendra Kumar Gutta > --- > arch/arm64/kernel/smp.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c > index 2faa986..adee4d3 100644 > --- a/arch/arm64/kernel/smp.c > +++ b/arch/arm64/kernel/smp.c > @@ -790,7 +790,7 @@ void arch_irq_work_raise(void) > */ > static void ipi_cpu_stop(unsigned int cpu) > { > - set_cpu_online(cpu, false); > + set_cpu_active(cpu, false); > > local_daif_mask(); > sdei_mask_local_cpu(); > @@ -925,10 +925,10 @@ void smp_send_stop(void) > > /* Wait up to one second for other CPUs to stop */ > timeout = USEC_PER_SEC; > - while (num_online_cpus() > 1 && timeout--) > + while (num_active_cpus() > 1 && timeout--) > udelay(1); > > - if (num_online_cpus() > 1) > + if (num_active_cpus() > 1) > pr_warning("SMP: failed to stop secondary CPUs %*pbl\n", > cpumask_pr_args(cpu_online_mask)); > > -- > The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, > a Linux Foundation Collaborative Project >