Message-ID: <55332900.9050401@roeck-us.net>
Date: Sat, 18 Apr 2015 21:03:12 -0700
From: Guenter Roeck
To: Rabin Vincent
CC: Linus Torvalds, Linux Kernel Mailing List, Peter Zijlstra, Ingo Molnar
Subject: Re: qemu:arm test failure due to commit 8053871d0f7f (smp: Fix smp_call_function_single_async() locking)
References: <20150418232325.GA22411@roeck-us.net> <20150418234050.GA5987@roeck-us.net> <55330B32.4010907@roeck-us.net> <20150419033940.GA25145@debian>
In-Reply-To: <20150419033940.GA25145@debian>
X-Mailing-List: linux-kernel@vger.kernel.org

On 04/18/2015 08:39 PM, Rabin Vincent wrote:
> On Sat, Apr 18, 2015 at 06:56:02PM -0700, Guenter Roeck wrote:
>> Further debugging (with added WARN_ON if cpu != 0 in smp_call_function_single) shows:
>>
>> [<800157ec>] (unwind_backtrace) from [<8001250c>] (show_stack+0x10/0x14)
>> [<8001250c>] (show_stack) from [<80494cb4>] (dump_stack+0x88/0x98)
>> [<80494cb4>] (dump_stack) from [<80024058>] (warn_slowpath_common+0x84/0xb4)
>> [<80024058>] (warn_slowpath_common) from [<80024124>] (warn_slowpath_null+0x1c/0x24)
>> [<80024124>] (warn_slowpath_null) from [<80078fc8>] (smp_call_function_single+0x170/0x178)
>> [<80078fc8>] (smp_call_function_single) from [<80090024>] (perf_event_exit_cpu+0x80/0xf0)
>> [<80090024>] (perf_event_exit_cpu) from [<80090110>] (perf_cpu_notify+0x30/0x48)
>> [<80090110>] (perf_cpu_notify) from [<8003d340>] (notifier_call_chain+0x44/0x84)
>> [<8003d340>] (notifier_call_chain) from [<8002451c>] (_cpu_up+0x120/0x168)
>> [<8002451c>] (_cpu_up) from [<800245d4>] (cpu_up+0x70/0x94)
>> [<800245d4>] (cpu_up) from [<80624234>] (smp_init+0xac/0xb0)
>> [<80624234>] (smp_init) from [<80618d84>] (kernel_init_freeable+0x118/0x268)
>> [<80618d84>] (kernel_init_freeable) from [<8049107c>] (kernel_init+0x8/0xe8)
>> [<8049107c>] (kernel_init) from [<8000f320>] (ret_from_fork+0x14/0x34)
>> ---[ end trace 2f9f1bb8a47b3a1b ]---
>> smp_call_function_single, cpu=1, wait=1, csd_stack=87825ea0
>> generic_exec_single, cpu=1, smp_processor_id()=0
>> csd_lock_wait: csd=87825ea0, flags=0x3
>>
>> This is repeated for each secondary CPU. But the secondary CPUs don't respond because
>> they are not enabled, which I guess explains why the lock is never released.
>>
>> So, in other words, this happens because the system believes (presumably per configuration
>> / fdt data) that there are four CPU cores, but that is not really the case. Previously that
>> did not matter, and was handled correctly. Now it is fatal.
>>
>> Does this help ?
>
> 8053871d0f7f67c7efb7f226ef031f78877d6625 moved the csd locking to the
> callers, but the offline check, which was earlier done before the csd
> locking, was not moved. The following moves the checks to the earlier
> point fixes your boot:
>

Yes, your patch fixes the problem.

Tested-by: Guenter Roeck

Thanks,
Guenter

> diff --git a/kernel/smp.c b/kernel/smp.c
> index 2aaac2c..ba1fb01 100644
> --- a/kernel/smp.c
> +++ b/kernel/smp.c
> @@ -159,9 +159,6 @@ static int generic_exec_single(int cpu, struct call_single_data *csd,
>  	}
>
>
> -	if ((unsigned)cpu >= nr_cpu_ids || !cpu_online(cpu))
> -		return -ENXIO;
> -
>  	csd->func = func;
>  	csd->info = info;
>
> @@ -289,6 +286,12 @@ int smp_call_function_single(int cpu, smp_call_func_t func, void *info,
>  	WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled()
>  		     && !oops_in_progress);
>
> +	if (cpu != smp_processor_id()
> +	    && ((unsigned)cpu >= nr_cpu_ids || !cpu_online(cpu))) {
> +		err = -ENXIO;
> +		goto out;
> +	}
> +
>  	csd = &csd_stack;
>  	if (!wait) {
>  		csd = this_cpu_ptr(&csd_data);
> @@ -300,6 +303,7 @@ int smp_call_function_single(int cpu, smp_call_func_t func, void *info,
>  	if (wait)
>  		csd_lock_wait(csd);
>
> +out:
>  	put_cpu();
>
>  	return err;
> @@ -328,6 +332,12 @@ int smp_call_function_single_async(int cpu, struct call_single_data *csd)
>
>  	preempt_disable();
>
> +	if (cpu != smp_processor_id()
> +	    && ((unsigned)cpu >= nr_cpu_ids || !cpu_online(cpu))) {
> +		err = -ENXIO;
> +		goto out;
> +	}
> +
>  	/* We could deadlock if we have to wait here with interrupts disabled! */
>  	if (WARN_ON_ONCE(csd->flags & CSD_FLAG_LOCK))
>  		csd_lock_wait(csd);
> @@ -336,8 +346,9 @@ int smp_call_function_single_async(int cpu, struct call_single_data *csd)
>  	smp_wmb();
>
>  	err = generic_exec_single(cpu, csd, csd->func, csd->info);
> -	preempt_enable();
>
> +out:
> +	preempt_enable();
>  	return err;
>  }
>  EXPORT_SYMBOL_GPL(smp_call_function_single_async);
>
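
For readers following the thread, the ordering problem described above (csd lock taken by the caller, offline check still done later) can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not kernel code: struct csd_model, MODEL_NR_CPUS, model_cpu_online, target_usable(), call_single_lock_first() and call_single_check_first() are all hypothetical names made up for the illustration, and the "remote CPU runs the function and unlocks" step is collapsed into a single line. In the real kernel the symptom is csd_lock_wait() spinning; the model merely shows that the lock flag is left set.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_CPUS 4u        /* hypothetical: CPUs the configuration claims */
#define CSD_FLAG_LOCK 0x01u     /* same flag name as in the thread */

/* Hypothetical stand-in for struct call_single_data: only the flags matter. */
struct csd_model {
	unsigned int flags;
};

/* Hypothetical online map: only CPU 0 is really there, as in the qemu case. */
static bool model_cpu_online[MODEL_NR_CPUS] = { true, false, false, false };

/* Mirrors the condition in the patch: the local CPU is always acceptable. */
static bool target_usable(int this_cpu, int cpu)
{
	if (cpu == this_cpu)
		return true;
	return (unsigned int)cpu < MODEL_NR_CPUS && model_cpu_online[cpu];
}

/* Problematic ordering: lock in the caller, offline check afterwards. */
static int call_single_lock_first(int this_cpu, int cpu, struct csd_model *csd)
{
	assert(csd != NULL);
	csd->flags |= CSD_FLAG_LOCK;
	if (!target_usable(this_cpu, cpu))
		return -ENXIO;              /* lock stays set: a waiter would spin */
	csd->flags &= ~CSD_FLAG_LOCK;       /* models the target CPU unlocking */
	return 0;
}

/* Ordering after the patch: check the target before touching the lock. */
static int call_single_check_first(int this_cpu, int cpu, struct csd_model *csd)
{
	assert(csd != NULL);
	if (!target_usable(this_cpu, cpu))
		return -ENXIO;              /* nothing was locked, nothing to wait for */
	csd->flags |= CSD_FLAG_LOCK;
	csd->flags &= ~CSD_FLAG_LOCK;       /* models the target CPU unlocking */
	return 0;
}
```

Calling both helpers from CPU 0 with target CPU 1 returns -ENXIO in each case; only the lock-first variant leaves CSD_FLAG_LOCK set in the descriptor, which is the state the csd_lock_wait debug line in the quoted trace reports.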