Date: Thu, 17 Mar 2016 22:01:38 +0100 (CET)
From: Thomas Gleixner
To: Josh Boyer
cc: "Richard W.M. Jones", x86, "Linux-Kernel@Vger. Kernel. Org"
Subject: Re: Oops from calibrate_delay_is_known on qemu machine with Linux v4.5-1523-g271ecc5253e2

Josh,

On Thu, 17 Mar 2016, Josh Boyer wrote:
> We've had a report [1] of the mainline kernel crashing on a single-cpu
> QEMU machine (not kvm) in Fedora. It looks as if the emulated machine
> is failing to provide a TSC and the calibrate_delay_is_known function
> is passing NULL to cpumask_any_but for the mask parameter. At least
> that's all I've been able to discern thus far.
>
> I was wondering if you had any insight into this issue, given your
> recent commit to change calibrate_delay_is_known to use
> topology_core_cpumask. The backtrace is below.
> at (null)
> [    0.010000] IP: [] _find_next_bit.part.0+0x15/0x70
> [    0.010000] PGD 0
>
> [    0.010000] RSP: 0000:ffffffff81e03e40  EFLAGS: 00000246
> [    0.010000] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> [    0.010000] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000
> [    0.010000] RBP: ffffffff81e03e50 R08: ffffffffffffffff R09: 0000000000000000
> [    0.010000] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> [    0.010000] R13: ffffffff82248960 R14: ffffffff822562e0 R15: 0000000000000000
> [    0.010000] FS:  0000000000000000(0000) GS:ffff88001ee00000(0000) knlGS:0000000000000000
> [    0.010000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    0.010000] CR2: 0000000000000000 CR3: 0000000001e06000 CR4: 00000000000006b0
> [    0.010000] Stack:
> [    0.010000]  ffffffff81e03e50 ffffffff81469928 ffffffff81e03e70 ffffffff81453d56
> [    0.010000]  0000000000000000 ffff88001f3fa780 ffffffff81e03e80 ffffffff81040495
> [    0.010000]  ffffffff81e03f40 ffffffff8100285a ffffffff810eefb3 ffffffff00000000
> [    0.010000] Call Trace:
> [    0.010000]  [] ? find_next_bit+0x18/0x20
> [    0.010000]  [] cpumask_any_but+0x26/0x50

Yuck. That requires that topology_core_cpumask(cpu) is NULL.

  #define topology_core_cpumask(cpu)   (per_cpu(cpu_core_map, cpu))
  ...
  DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_core_map);

So that can only result in a NULL pointer if you have
CONFIG_CPUMASK_OFFSTACK enabled and the allocation fails, which is not
checked !?@!

I tried to reproduce with Richard's script, but so far no dice. Can you
please provide your kernel config?

Thanks,

	tglx
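The NULL-mask reasoning above can be modelled in a few lines of userspace C. This is a minimal sketch, not kernel code: it assumes the CONFIG_CPUMASK_OFFSTACK=y case, where cpumask_var_t is a pointer that must be allocated (without that option it is a one-element array and cannot be NULL). The names topology_core_cpumask, cpu_core_map and cpumask_any_but mirror the mail but are simplified stand-ins (a plain array instead of per-CPU storage, a single word of bits instead of a real cpumask). find_sibling() is a hypothetical helper, and its NULL check is only one defensive option, not necessarily what the kernel ended up doing.

```c
#include <stddef.h>

#define NR_CPUS 4 /* assumption: small fixed CPU count for the model */

/* Simplified cpumask: one bit per CPU. */
struct cpumask { unsigned long bits; };

/* Models CONFIG_CPUMASK_OFFSTACK=y: cpumask_var_t is a pointer that
 * has to be allocated before use. */
typedef struct cpumask *cpumask_var_t;

/* Stand-in for the per-CPU cpu_core_map; static storage starts NULL,
 * which is the state when the allocation failed or has not happened. */
static cpumask_var_t cpu_core_map[NR_CPUS];

/* Simplified version of the macro quoted in the mail (array index
 * instead of per_cpu()). */
#define topology_core_cpumask(cpu) (cpu_core_map[(cpu)])

/* Model of cpumask_any_but(): returns any CPU in @mask other than @cpu,
 * or NR_CPUS (standing in for nr_cpu_ids) if there is none.
 * Like the real one, it dereferences @mask unconditionally. */
static unsigned int cpumask_any_but(const struct cpumask *mask,
                                    unsigned int cpu)
{
    for (unsigned int i = 0; i < NR_CPUS; i++)
        if (i != cpu && (mask->bits & (1UL << i)))
            return i;
    return NR_CPUS;
}

/* Hypothetical guarded lookup: returns a sibling CPU number, or -1 if
 * the core mask is not allocated or contains no other CPU. */
static int find_sibling(unsigned int cpu)
{
    const struct cpumask *mask = topology_core_cpumask(cpu);

    if (!mask) /* without this check we would dereference NULL */
        return -1;

    unsigned int sibling = cpumask_any_but(mask, cpu);
    return sibling < NR_CPUS ? (int)sibling : -1;
}
```

With cpu_core_map[0] still NULL, find_sibling(0) returns -1 instead of faulting; dropping the check reproduces the NULL dereference inside cpumask_any_but(), analogous to the oops above. Once the mask is set to 0x3 (CPUs 0 and 1), it returns 1, and a single-CPU guest with mask 0x1 gets -1 because there is no other CPU to borrow a calibration from.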