Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753310AbbDSAEu (ORCPT ); Sat, 18 Apr 2015 20:04:50 -0400 Received: from mail-ig0-f170.google.com ([209.85.213.170]:35994 "EHLO mail-ig0-f170.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751342AbbDSAEt (ORCPT ); Sat, 18 Apr 2015 20:04:49 -0400 MIME-Version: 1.0 In-Reply-To: <20150418234050.GA5987@roeck-us.net> References: <20150418232325.GA22411@roeck-us.net> <20150418234050.GA5987@roeck-us.net> Date: Sat, 18 Apr 2015 20:04:48 -0400 X-Google-Sender-Auth: Q_kMsJvcuWYI7R_OJ_tn3Mpf4bA Message-ID: Subject: Re: qemu:arm test failure due to commit 8053871d0f7f (smp: Fix smp_call_function_single_async() locking) From: Linus Torvalds To: Guenter Roeck Cc: Linux Kernel Mailing List , Peter Zijlstra , Ingo Molnar Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2131 Lines: 54 On Sat, Apr 18, 2015 at 7:40 PM, Guenter Roeck wrote: > On Sat, Apr 18, 2015 at 04:23:25PM -0700, Guenter Roeck wrote: >> >> my qemu test for arm:vexpress fails with the latest upstream kernel. It fails >> hard - I don't get any output from the console. Bisect points to commit >> 8053871d0f7f ("smp: Fix smp_call_function_single_async() locking"). >> Reverting this commit fixes the problem. Hmm. It being qemu, can you look at where it seems to lock? > Additional observation: The system boots if I add "-smp cpus=4" to the qemu > options. It does still hang, however, with "-smp cpus=2" and "-smp cpus=3". Funky. That patch still looks obviously correct to me after looking at it again, but I guess we need to revert it if somebody can't see what's wrong. It does make async (wait=0) smp_call_function_single() possibly be *really* asynchronous, ie the 'csd' ends up being released and can be re-used even before the call-single function has completed. That should be a good thing, but I wonder if that triggers some ARM bug. Instead of doing a full revert, what happens if you replace this part: + /* Do we wait until *after* callback? */ + if (csd->flags & CSD_FLAG_SYNCHRONOUS) { + func(info); + csd_unlock(csd); + } else { + csd_unlock(csd); + func(info); + } with just + func(info); + csd_unlock(csd); ie keeping the csd locked until the function has actually completed? I guess for completeness, we should do the same thing for the cpu == smp_processor_id() case (see the "We can unlock early" comment). Now, if that makes a difference, I think it implies a bug in the caller, so it's not the right fix, but it would be an interesting thing to test. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/