Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751242AbdHaHDk (ORCPT ); Thu, 31 Aug 2017 03:03:40 -0400 Received: from mail-io0-f173.google.com ([209.85.223.173]:35657 "EHLO mail-io0-f173.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750973AbdHaHDj (ORCPT ); Thu, 31 Aug 2017 03:03:39 -0400 MIME-Version: 1.0 In-Reply-To: <7e9037a6-195b-ca0f-75da-5a338c5e938d@imgtec.com> References: <1503476475-21069-1-git-send-email-matt.redfearn@imgtec.com> <71ea8331-78da-c22b-d46d-99ab6c187bbf@nokia.com> <7e9037a6-195b-ca0f-75da-5a338c5e938d@imgtec.com> From: Huacai Chen Date: Thu, 31 Aug 2017 15:03:38 +0800 Message-ID: Subject: Re: [PATCH] MIPS: Revert "MIPS: Fix race on setting and getting cpu_online_mask" To: Matt Redfearn Cc: Matija Glavinic Pecotic , Ralf Baechle , Linux MIPS Mailing List , Marcin Nowakowski , Paul Gortmaker , "linux-kernel@vger.kernel.org" , Ingo Molnar , Paul Burton Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2306 Lines: 58 Yes, your original patch (8f46cca1e6c06) is needed in 4.12/4.13, but they should be reverted in 4.1/4.4-stable branch. Huacai On Wed, Aug 30, 2017 at 9:24 PM, Matt Redfearn wrote: > Hi Huacai, > > On 29/08/17 02:43, Huacai Chen wrote: >> >> I suggest to drop sync_r4k completely, because it is inaccurate. You >> can use IPI to synchronize count/compare instead, as Loongson-3 does. > > > I am all for a better fix, such as this - but that would be a much more > invasive change than what I propose. Currently 4.13 is broken by the patch > that this is attempting to revert. It is easy to deadlock the system by > hotplugging a CPU while it is busy. That was what my patch 8f46cca1e6c06 > originally fixed. Even though it is, perhaps, not stylistically great to > have the synchronisation done by callers, the fact is that it *is* done > (added in 8df3e07e7f21f), so the behavior for 4.13 would be safe and > deadlocks not possible. We can then look at more invasive changes that are > acceptable to everyone during the 4.14 cycle. > > Thanks, > Matt > > >> >> Huacai >> >> On Mon, Aug 28, 2017 at 6:07 PM, Matija Glavinic Pecotic >> wrote: >>> >>> On 08/23/2017 10:21 AM, Matt Redfearn wrote: >>>> >>>> As noted in the commit message, upstream differs in this area. The >>>> hotplug code now waits on a completion event in bringup_wait_for_ap, >>>> which is set by the starting CPU in cpuhp_online_idle once it calls >>>> cpu_startup_entry. Thus there is no possibility of a race in upstream, >>>> and this commit has only re-introduced the deadlock condition, which can >>>> be observed on multiple platforms when running a heavy load test at the >>>> same time as hotplugging CPUs. See commit 8f46cca1e6c06 ("MIPS: SMP: Fix >>>> possibility of deadlock when bringing CPUs online") for details. >>> >>> I personally do not like the fact that synchronization is implicitly done >>> by the callers, it is the reason why the patch was proposed. As noted >>> before, it is enough someone checks cpu online mask somewhere in between and >>> there is race again. >>> >>> How about moving synchronise_count_slave before setting the cpu online? >>> Is there dependency it has to be done after completion? >>> >>> Regards, >>> >>> Matija >>> >