Message-ID: <55330B32.4010907@roeck-us.net>
Date: Sat, 18 Apr 2015 18:56:02 -0700
From: Guenter Roeck
To: Linus Torvalds
CC: Linux Kernel Mailing List, Peter Zijlstra, Ingo Molnar
Subject: Re: qemu:arm test failure due to commit 8053871d0f7f (smp: Fix smp_call_function_single_async() locking)

On 04/18/2015 05:04 PM, Linus Torvalds wrote:
> On Sat, Apr 18, 2015 at 7:40 PM, Guenter Roeck wrote:
>> On Sat, Apr 18, 2015 at 04:23:25PM -0700, Guenter Roeck wrote:
>>>
>>> my qemu test for arm:vexpress fails with the latest upstream kernel. It fails
>>> hard - I don't get any output from the console. Bisect points to commit
>>> 8053871d0f7f ("smp: Fix smp_call_function_single_async() locking").
>>> Reverting this commit fixes the problem.
>
> Hmm. It being qemu, can you look at where it seems to lock?
>

 static void csd_lock_wait(struct call_single_data *csd)
 {
+#if 0
 	while (smp_load_acquire(&csd->flags) & CSD_FLAG_LOCK)
 		cpu_relax();
+#else
+	pr_info("csd_lock_wait: flags=0x%x\n", smp_load_acquire(&csd->flags));
+#endif
 }

prints

csd_lock_wait: flags=0x3

repeatedly for each call to csd_lock_wait() [and bypasses the problem].

Further debugging shows that wait==1, and that csd points to the
pre-initialized csd_stack (which has CSD_FLAG_LOCK set). It seems that
CSD_FLAG_LOCK is never reset (there is no call to csd_unlock(), ever).
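For reference, a minimal userspace sketch of the handshake csd_lock_wait() depends on: the sender sets a lock bit, the target CPU clears it after running the callback, and the sender spins until it sees the bit cleared. C11 atomics stand in for smp_load_acquire()/smp_store_release(); the struct layout, the flag value 0x01, csd_init(), handle_call() and increment() are assumptions for illustration, not the code in kernel/smp.c. The "target CPU" is simulated by calling handle_call() directly.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Bit name from kernel/smp.c; the value 0x01 is an assumption for this model. */
#define CSD_FLAG_LOCK 0x01u

/* Hypothetical userspace stand-in for struct call_single_data. */
struct call_single_data {
	atomic_uint flags;
	void (*func)(void *info);
	void *info;
};

/* Hypothetical helper: set up an unlocked csd with its callback. */
static void csd_init(struct call_single_data *csd, void (*func)(void *), void *info)
{
	assert(func != NULL);
	atomic_init(&csd->flags, 0);
	csd->func = func;
	csd->info = info;
}

/* Sender: spin until the target clears CSD_FLAG_LOCK (acquire load). */
static void csd_lock_wait(struct call_single_data *csd)
{
	while (atomic_load_explicit(&csd->flags, memory_order_acquire) & CSD_FLAG_LOCK)
		; /* the kernel calls cpu_relax() here */
}

/* Sender: wait for any previous user of the csd, then mark it busy. */
static void csd_lock(struct call_single_data *csd)
{
	csd_lock_wait(csd);
	atomic_fetch_or(&csd->flags, CSD_FLAG_LOCK);
}

/* Target: clear the lock bit with release ordering once the callback is done. */
static void csd_unlock(struct call_single_data *csd)
{
	atomic_fetch_and_explicit(&csd->flags, ~CSD_FLAG_LOCK, memory_order_release);
}

/* Hypothetical IPI handler on the target CPU: run the callback, then unlock. */
static void handle_call(struct call_single_data *csd)
{
	csd->func(csd->info);
	csd_unlock(csd);
}

/* Example callback: increments the int that info points to. */
static void increment(void *info)
{
	++*(int *)info;
}
```

If handle_call() never runs for a locked csd, nothing ever clears the lock bit and csd_lock_wait() spins forever, which is consistent with the lock bit staying set in the flags output above.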
Further debugging (with an added WARN_ON if cpu != 0 in smp_call_function_single) shows:

[<800157ec>] (unwind_backtrace) from [<8001250c>] (show_stack+0x10/0x14)
[<8001250c>] (show_stack) from [<80494cb4>] (dump_stack+0x88/0x98)
[<80494cb4>] (dump_stack) from [<80024058>] (warn_slowpath_common+0x84/0xb4)
[<80024058>] (warn_slowpath_common) from [<80024124>] (warn_slowpath_null+0x1c/0x24)
[<80024124>] (warn_slowpath_null) from [<80078fc8>] (smp_call_function_single+0x170/0x178)
[<80078fc8>] (smp_call_function_single) from [<80090024>] (perf_event_exit_cpu+0x80/0xf0)
[<80090024>] (perf_event_exit_cpu) from [<80090110>] (perf_cpu_notify+0x30/0x48)
[<80090110>] (perf_cpu_notify) from [<8003d340>] (notifier_call_chain+0x44/0x84)
[<8003d340>] (notifier_call_chain) from [<8002451c>] (_cpu_up+0x120/0x168)
[<8002451c>] (_cpu_up) from [<800245d4>] (cpu_up+0x70/0x94)
[<800245d4>] (cpu_up) from [<80624234>] (smp_init+0xac/0xb0)
[<80624234>] (smp_init) from [<80618d84>] (kernel_init_freeable+0x118/0x268)
[<80618d84>] (kernel_init_freeable) from [<8049107c>] (kernel_init+0x8/0xe8)
[<8049107c>] (kernel_init) from [<8000f320>] (ret_from_fork+0x14/0x34)
---[ end trace 2f9f1bb8a47b3a1b ]---
smp_call_function_single, cpu=1, wait=1, csd_stack=87825ea0
generic_exec_single, cpu=1, smp_processor_id()=0
csd_lock_wait: csd=87825ea0, flags=0x3

This is repeated for each secondary CPU. But the secondary CPUs don't
respond because they are not enabled, which I guess explains why the lock
is never released.

So, in other words, this happens because the system believes (presumably
per configuration / fdt data) that there are four CPU cores, but that is
not really the case. Previously that did not matter, and was handled
correctly. Now it is fatal.

Does this help?
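To make the failure mode concrete, a small model of a system that advertises four CPUs (as the fdt data presumably does here) while only CPU 0 ever runs. A request queued for CPU 1 is never processed, so its lock bit is never cleared; a bounded variant of csd_lock_wait() reports that as a timeout instead of hanging. NR_CPUS_FDT, cpu_running[], queue_call(), run_cpus() and csd_lock_wait_bounded() are hypothetical names for illustration only; this is a sketch, not how kernel/smp.c actually delivers IPIs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define CSD_FLAG_LOCK 0x01u /* assumed bit value */
#define NR_CPUS_FDT   4     /* hypothetical: CPUs described by the device tree */

struct call_single_data {
	atomic_uint flags;
};

/* Hypothetical per-CPU pending request; NULL when nothing is queued. */
static struct call_single_data *pending[NR_CPUS_FDT];

/* Only CPU 0 actually boots in this model, mirroring the qemu setup. */
static const bool cpu_running[NR_CPUS_FDT] = { true, false, false, false };

/* Lock the csd and queue it for @cpu. */
static void queue_call(int cpu, struct call_single_data *csd)
{
	assert(cpu >= 0 && cpu < NR_CPUS_FDT);
	atomic_fetch_or(&csd->flags, CSD_FLAG_LOCK);
	pending[cpu] = csd;
}

/* One simulated step: every CPU that really runs handles and unlocks its request. */
static void run_cpus(void)
{
	for (int cpu = 0; cpu < NR_CPUS_FDT; cpu++) {
		if (cpu_running[cpu] && pending[cpu] != NULL) {
			atomic_fetch_and_explicit(&pending[cpu]->flags, ~CSD_FLAG_LOCK,
						  memory_order_release);
			pending[cpu] = NULL;
		}
	}
}

/* Bounded csd_lock_wait(): true once unlocked, false after max_spins steps. */
static bool csd_lock_wait_bounded(struct call_single_data *csd, unsigned max_spins)
{
	for (unsigned i = 0; ; i++) {
		if (!(atomic_load_explicit(&csd->flags, memory_order_acquire) & CSD_FLAG_LOCK))
			return true;
		if (i == max_spins)
			return false;
		run_cpus(); /* give the simulated target CPUs a chance to respond */
	}
}
```

A request for CPU 0 completes after one step, while a request for CPU 1 times out with CSD_FLAG_LOCK still set, which is the state the debug output above keeps reporting.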
Guenter