Date: Wed, 25 Feb 2015 17:00:11 +0000
From: Russell King - ARM Linux
To: Nicolas Pitre
Cc: "Paul E. McKenney", Mark Rutland, Krzysztof Kozlowski,
	Arnd Bergmann, Bartlomiej Zolnierkiewicz, Catalin Marinas,
	Stephen Boyd, linux-kernel@vger.kernel.org, Will Deacon,
	linux-arm-kernel@lists.infradead.org, Marek Szyprowski
Subject: Re: [PATCH v2] ARM: Don't use complete() during __cpu_die
Message-ID: <20150225170011.GC8656@n2100.arm.linux.org.uk>
References: <1423131270-24047-1-git-send-email-k.kozlowski@samsung.com>
	<20150205105035.GL8656@n2100.arm.linux.org.uk>
	<20150205142918.GA10634@linux.vnet.ibm.com>
	<20150205161100.GQ8656@n2100.arm.linux.org.uk>
	<20150225125610.GY8656@n2100.arm.linux.org.uk>

On Wed, Feb 25, 2015 at 11:47:48AM -0500, Nicolas Pitre wrote:
> I completely agree with the r/w spinlock.  Something like this ought to
> be sufficient to make gic_raise_softirq() reentrant which is the issue
> here, right?  I've been stress-testing it for a while with no problems
> so far.

No.  The issue is that we need a totally lockless way to raise an IPI
during CPU hot-unplug, so we can raise an IPI in __cpu_die() to tell
the __cpu_kill() code that it's safe to proceed to platform code.

As soon as that IPI has been received, the receiving CPU can decide to
cut power to the dying CPU.
So, it's entirely possible that power could be lost on the dying CPU
before the unlock has become visible.

It's a catch-22 - the reason we're sending the IPI is for
synchronisation, but right now we need another form of synchronisation
because we're using a form of synchronisation...

We could just use the spin-and-poll solution instead of an IPI, but
I really don't like that - when you see the complexity needed to
re-initialise it each time, it quickly becomes very yucky because
there is no well defined order between __cpu_die() and __cpu_kill()
being called by the two respective CPUs.

The last patch I saw doing that had multiple bits to indicate success
and timeout, and rather a lot of complexity to recover from failures,
and reinitialise state for a second CPU going down.
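
To make the re-initialisation problem with spin-and-poll concrete, here
is a rough userspace sketch using C11 atomics.  It is not kernel code and
not taken from any posted patch: the names handshake, HS_IDLE, HS_DEAD,
HS_TIMEOUT, handshake_rearm(), dying_cpu_report() and killer_wait() are
all made up for illustration, and the poll count stands in for a real
timeout.  It only shows why a single "dead" flag tends to grow a timeout
state and an explicit re-arm step once the two sides can arrive in
either order.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical handshake states; these are not kernel identifiers. */
enum { HS_IDLE, HS_DEAD, HS_TIMEOUT };

static _Atomic int handshake = HS_IDLE;

/*
 * Must run before every unplug attempt, because an earlier attempt
 * that timed out leaves HS_TIMEOUT behind.
 */
void handshake_rearm(void)
{
	atomic_store(&handshake, HS_IDLE);
}

/*
 * Dying side: a single lockless update as its final act.
 * Returns false if the waiter has already given up.
 */
bool dying_cpu_report(void)
{
	int expected = HS_IDLE;

	return atomic_compare_exchange_strong(&handshake, &expected, HS_DEAD);
}

/*
 * Surviving side: poll a bounded number of times.
 * Returns true once the dying side has reported, false on timeout.
 */
bool killer_wait(unsigned long max_polls)
{
	int expected = HS_IDLE;
	unsigned long i;

	for (i = 0; i < max_polls; i++) {
		if (atomic_load(&handshake) == HS_DEAD) {
			/* Consume the report so the word can be reused. */
			atomic_store(&handshake, HS_IDLE);
			return true;
		}
	}

	/* Give up, unless the report raced with the timeout. */
	if (atomic_compare_exchange_strong(&handshake, &expected, HS_TIMEOUT))
		return false;

	/* The exchange failed: the only valid cause is a late report. */
	assert(expected == HS_DEAD);
	atomic_store(&handshake, HS_IDLE);
	return true;
}
```

Even in this toy form there are three states, a race between the report
and the timeout, and a re-arm call that has to happen before the next
CPU goes down.  None of that says anything about when the final store
becomes visible relative to power being cut, which is the separate
problem described above.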