2013-05-08 23:57:43

by Robin Holt

Subject: Full dynticks needs evtdesc set before marking cpu online.

Thomas,

We are seeing failures booting medium-sized machines, which I think are
caused by a change in the expectations that dyntick puts on x86's
start_secondary.

During boot of cpus, we see an occasional panic in tick_do_broadcast at

	if (!cpumask_empty(mask)) {
		/*
		 * It might be necessary to actually check whether the devices
		 * have different broadcast functions. For now, just use the
		 * one of the first device. This works as long as we have this
		 * misfeature only on x86 (lapic)
		 */
		td = &per_cpu(tick_cpu_device, cpumask_first(mask));
		td->evtdev->broadcast(mask);
		    ^^^^^^
		    NULL


This is called from:
static void tick_do_periodic_broadcast(void)
{
	raw_spin_lock(&tick_broadcast_lock);

	cpumask_and(tmpmask, cpu_online_mask, tick_broadcast_mask);
	tick_do_broadcast(tmpmask);


Now the problem. In start_secondary, we have:
272	lock_vector_lock();
273	set_cpu_online(smp_processor_id(), true);
274	unlock_vector_lock();
275	per_cpu(cpu_state, smp_processor_id()) = CPU_ONLINE;
276	x86_platform.nmi_init();
277
278	/* enable local interrupts */
279	local_irq_enable();
280
281	/* to prevent fake stack check failure in clock setup */
282	boot_init_stack_canary();
283
284	x86_cpuinit.setup_percpu_clockev();

So we have the cpu marked online on line 273, but evtdev is not set
until line 284. This code has been in start_secondary for a considerable
period of time; I think the problem is only being exposed now.

It does not show up with a normal config, but taking a 'make
x86_64_defconfig' kernel and changing CONFIG_MAXSMP seems to change boot
timing enough to make it reproducible on machines with 4 or more sockets.

The following makes it boot, but I am not sure if this is the right
thing to do.

$ git diff
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 9c73b51..8456432 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -264,6 +264,8 @@ notrace static void __cpuinit start_secondary(void *unused)
 	 */
 	check_tsc_sync_target();

+	x86_cpuinit.setup_percpu_clockev();
+
 	/*
 	 * We need to hold vector_lock so there the set of online cpus
 	 * does not change while we are assigning vectors to cpus. Holding
@@ -281,8 +283,6 @@ notrace static void __cpuinit start_secondary(void *unused)
 	/* to prevent fake stack check failure in clock setup */
 	boot_init_stack_canary();

-	x86_cpuinit.setup_percpu_clockev();
-
 	wmb();
 	cpu_startup_entry(CPUHP_ONLINE);
 }


Thanks,
Robin Holt


2013-05-12 20:16:16

by Frederic Weisbecker

Subject: Re: Full dynticks needs evtdesc set before marking cpu online.

On Wed, May 08, 2013 at 06:57:36PM -0500, Robin Holt wrote:
> Thomas,
>
> We are seeing failures booting medium sized machines which I think is
> a change in expectations that dyntick put on x86's start_secondary.
>
> During boot of cpus, we see an occasional panic in tick_do_broadcast at
>
> 195 if (!cpumask_empty(mask)) {
> 196 /*
> 197 * It might be necessary to actually check whether the devices
> 198 * have different broadcast functions. For now, just use the
> 199 * one of the first device. This works as long as we have this
> 200 * misfeature only on x86 (lapic)
> 201 */
> 202 td = &per_cpu(tick_cpu_device, cpumask_first(mask));
> 203 td->evtdev->broadcast(mask);
> ^^^^^^
> NULL --------+

Seems not related to full dynticks. I can reproduce it at every boot
with CONFIG_NO_HZ_FULL=n.

Attached is the config that triggers it and the log:

[ 0.000000] Linux version 3.9.0+ (fweisbec@phenom) (gcc version 4.4.5 (Ubuntu/Linaro 4.4.4-14ubuntu5.1) ) #2 SMP PREEMPT Sat May 11 16:05:21 CEST 2013
[ 0.000000] Command line: root=/dev/sda1 ro ignore_loglevel rcu_nocbs=1-2 nohz_full=1-2 console=ttyS0,115200n8
[ 0.000000] KERNEL supported cpus:
[ 0.000000] Intel GenuineIntel
[ 0.000000] CPU: vendor_id 'AuthenticAMD' unknown, using generic init.
[ 0.000000] CPU: Your system may be unstable.
[ 0.000000] e820: BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009f3ff] usable
[ 0.000000] BIOS-e820: [mem 0x000000000009f400-0x000000000009ffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000cffdffff] usable
[ 0.000000] BIOS-e820: [mem 0x00000000cffe0000-0x00000000cffe2fff] ACPI NVS
[ 0.000000] BIOS-e820: [mem 0x00000000cffe3000-0x00000000cffeffff] ACPI data
[ 0.000000] BIOS-e820: [mem 0x00000000cfff0000-0x00000000cfffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000e0000000-0x00000000efffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fec00000-0x00000000ffffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000100000000-0x0000000107ffffff] usable
[ 0.000000] debug: ignoring loglevel setting.
[ 0.000000] NX (Execute Disable) protection: active
[ 0.000000] SMBIOS 2.5 present.
[ 0.000000] DMI: FUJITSU SIEMENS AMD690VM-FMH/AMD690VM-FMH, BIOS V5.13 03/14/2008
[ 0.000000] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
[ 0.000000] e820: remove [mem 0x000a0000-0x000fffff] usable
[ 0.000000] e820: last_pfn = 0x108000 max_arch_pfn = 0x400000000
[ 0.000000] MTRR default type: uncachable
[ 0.000000] MTRR fixed ranges enabled:
[ 0.000000] 00000-9FFFF write-back
[ 0.000000] A0000-BFFFF uncachable
[ 0.000000] C0000-C7FFF write-protect
[ 0.000000] C8000-FFFFF uncachable
[ 0.000000] MTRR variable ranges enabled:
[ 0.000000] 0 base 000000000000 mask FFFF80000000 write-back
[ 0.000000] 1 base 000080000000 mask FFFFC0000000 write-back
[ 0.000000] 2 base 0000C0000000 mask FFFFF0000000 write-back
[ 0.000000] 3 base 000100000000 mask FFFFF8000000 write-back
[ 0.000000] 4 disabled
[ 0.000000] 5 disabled
[ 0.000000] 6 disabled
[ 0.000000] 7 disabled
[ 0.000000] x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
[ 0.000000] e820: update [mem 0xd0000000-0xffffffff] usable ==> reserved
[ 0.000000] e820: last_pfn = 0xcffe0 max_arch_pfn = 0x400000000
[ 0.000000] found SMP MP-table at [mem 0x000f3dc0-0x000f3dcf] mapped at [ffff8800000f3dc0]
[ 0.000000] Scanning 1 areas for low memory corruption
[ 0.000000] Base memory trampoline at [ffff880000099000] 99000 size 24576
[ 0.000000] Using GB pages for direct mapping
[ 0.000000] init_memory_mapping: [mem 0x00000000-0x000fffff]
[ 0.000000] [mem 0x00000000-0x000fffff] page 4k
[ 0.000000] BRK [0x036f7000, 0x036f7fff] PGTABLE
[ 0.000000] BRK [0x036f8000, 0x036f8fff] PGTABLE
[ 0.000000] BRK [0x036f9000, 0x036f9fff] PGTABLE
[ 0.000000] init_memory_mapping: [mem 0x107e00000-0x107ffffff]
[ 0.000000] [mem 0x107e00000-0x107ffffff] page 2M
[ 0.000000] BRK [0x036fa000, 0x036fafff] PGTABLE
[ 0.000000] init_memory_mapping: [mem 0x104000000-0x107dfffff]
[ 0.000000] [mem 0x104000000-0x107dfffff] page 2M
[ 0.000000] init_memory_mapping: [mem 0x100000000-0x103ffffff]
[ 0.000000] [mem 0x100000000-0x103ffffff] page 2M
[ 0.000000] init_memory_mapping: [mem 0x80000000-0xcffdffff]
[ 0.000000] [mem 0x80000000-0xbfffffff] page 1G
[ 0.000000] [mem 0xc0000000-0xcfdfffff] page 2M
[ 0.000000] [mem 0xcfe00000-0xcffdffff] page 4k
[ 0.000000] BRK [0x036fb000, 0x036fbfff] PGTABLE
[ 0.000000] init_memory_mapping: [mem 0x00100000-0x7fffffff]
[ 0.000000] [mem 0x00100000-0x001fffff] page 4k
[ 0.000000] [mem 0x00200000-0x3fffffff] page 2M
[ 0.000000] [mem 0x40000000-0x7fffffff] page 1G
[ 0.000000] ACPI: RSDP 00000000000f7cc0 00024 (v02 FSC )
[ 0.000000] ACPI: RSDT 00000000cffe3040 0003C (v01 FSC PC 42302E31 AWRD 00000000)
[ 0.000000] ACPI: FACP 00000000cffe30c0 00074 (v01 FSC PC 42302E31 AWRD 00000000)
[ 0.000000] ACPI: DSDT 00000000cffe3180 049C0 (v01 FSC AWRDACPI 00001000 MSFT 03000000)
[ 0.000000] ACPI: FACS 00000000cffe0000 00040
[ 0.000000] ACPI: SSDT 00000000cffe7c80 00544 (v01 PTLTD POWERNOW 00000001 LTP 00000001)
[ 0.000000] ACPI: HPET 00000000cffe8240 00038 (v01 FSC PC 42302E31 AWRD 00000098)
[ 0.000000] ACPI: SLIC 00000000cffe82c0 00176 (v01 FSC PC 42302E31 AWRD 00000001)
[ 0.000000] ACPI: MCFG 00000000cffe8480 0003C (v01 FSC PC 42302E31 AWRD 00000000)
[ 0.000000] ACPI: APIC 00000000cffe7b80 00084 (v01 FSC PC 42302E31 AWRD 00000000)
[ 0.000000] ACPI: Local APIC address 0xfee00000
[ 0.000000] No NUMA configuration found
[ 0.000000] Faking a node at [mem 0x0000000000000000-0x0000000107ffffff]
[ 0.000000] Initmem setup node 0 [mem 0x00000000-0x107ffffff]
[ 0.000000] NODE_DATA [mem 0x107fd8000-0x107ffefff]
[ 0.000000] [ffffea0000000000-ffffea00041fffff] PMD -> [ffff880104000000-ffff8801075fffff] on node 0
[ 0.000000] Zone ranges:
[ 0.000000] DMA [mem 0x00001000-0x00ffffff]
[ 0.000000] DMA32 [mem 0x01000000-0xffffffff]
[ 0.000000] Normal [mem 0x100000000-0x107ffffff]
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x00001000-0x0009efff]
[ 0.000000] node 0: [mem 0x00100000-0xcffdffff]
[ 0.000000] node 0: [mem 0x100000000-0x107ffffff]
[ 0.000000] On node 0 totalpages: 884606
[ 0.000000] DMA zone: 64 pages used for memmap
[ 0.000000] DMA zone: 21 pages reserved
[ 0.000000] DMA zone: 3998 pages, LIFO batch:0
[ 0.000000] DMA32 zone: 13248 pages used for memmap
[ 0.000000] DMA32 zone: 847840 pages, LIFO batch:31
[ 0.000000] Normal zone: 512 pages used for memmap
[ 0.000000] Normal zone: 32768 pages, LIFO batch:7
[ 0.000000] ACPI: PM-Timer IO Port: 0x4008
[ 0.000000] ACPI: Local APIC address 0xfee00000
[ 0.000000] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled)
[ 0.000000] ACPI: NR_CPUS/possible_cpus limit of 2 reached. Processor 2/0x2 ignored.
[ 0.000000] ACPI: LAPIC (acpi_id[0x03] lapic_id[0x03] enabled)
[ 0.000000] ACPI: NR_CPUS/possible_cpus limit of 2 reached. Processor 3/0x3 ignored.
[ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
[ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
[ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x02] high edge lint[0x1])
[ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x03] high edge lint[0x1])
[ 0.000000] ACPI: IOAPIC (id[0x04] address[0xfec00000] gsi_base[0])
[ 0.000000] IOAPIC[0]: apic_id 4, version 33, address 0xfec00000, GSI 0-23
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
[ 0.000000] ACPI: IRQ0 used by override.
[ 0.000000] ACPI: IRQ2 used by override.
[ 0.000000] ACPI: IRQ9 used by override.
[ 0.000000] Using ACPI (MADT) for SMP configuration information
[ 0.000000] ACPI: HPET id: 0x10b9a201 base: 0xfed00000
[ 0.000000] smpboot: 4 Processors exceeds NR_CPUS limit of 2
[ 0.000000] smpboot: Allowing 2 CPUs, 0 hotplug CPUs
[ 0.000000] nr_irqs_gsi: 40
[ 0.000000] e820: [mem 0xd0000000-0xdfffffff] available for PCI devices
[ 0.000000] setup_percpu: NR_CPUS:2 nr_cpumask_bits:2 nr_cpu_ids:2 nr_node_ids:1
[ 0.000000] PERCPU: Embedded 477 pages/cpu @ffff880103c00000 s1921280 r8192 d24320 u2097152
[ 0.000000] pcpu-alloc: s1921280 r8192 d24320 u2097152 alloc=1*2097152
[ 0.000000] pcpu-alloc: [0] 0 [0] 1
[ 0.000000] Built 1 zonelists in Node order, mobility grouping on. Total pages: 870761
[ 0.000000] Policy zone: Normal
[ 0.000000] Kernel command line: root=/dev/sda1 ro ignore_loglevel rcu_nocbs=1-2 nohz_full=1-2 console=ttyS0,115200n8
[ 0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
[ 0.000000] Memory: 3372776k/4325376k available (14604k kernel code, 786952k absent, 165648k reserved, 9565k data, 3456k init)
[ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1
[ 0.000000] Preemptible hierarchical RCU implementation.
[ 0.000000] RCU debugfs-based tracing is enabled.
[ 0.000000] RCU lockdep checking is enabled.
[ 0.000000] Experimental no-CBs for all CPUs
[ 0.000000] Experimental no-CBs CPUs: 0-1.
[ 0.000000] NR_IRQS:4352 nr_irqs:512 16
[ 0.000000] Console: colour VGA+ 80x25
[ 0.000000] console [ttyS0] enabled
[ 0.000000] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
[ 0.000000] ... MAX_LOCKDEP_SUBCLASSES: 8
[ 0.000000] ... MAX_LOCK_DEPTH: 48
[ 0.000000] ... MAX_LOCKDEP_KEYS: 8191
[ 0.000000] ... CLASSHASH_SIZE: 4096
[ 0.000000] ... MAX_LOCKDEP_ENTRIES: 16384
[ 0.000000] ... MAX_LOCKDEP_CHAINS: 32768
[ 0.000000] ... CHAINHASH_SIZE: 16384
[ 0.000000] memory used by lock dependency info: 6367 kB
[ 0.000000] per task-struct memory footprint: 2688 bytes
[ 0.000000] ODEBUG: 49 of 49 active objects replaced
[ 0.000000] kmemleak: Kernel memory leak detector disabled
[ 0.000000] hpet clockevent registered
[ 0.000000] tsc: Fast TSC calibration failed
[ 0.000000] tsc: Unable to calibrate against PIT
[ 0.000000] tsc: using HPET reference calibration
[ 0.000000] tsc: Detected 2299.710 MHz processor
[ 0.000000] tsc: Marking TSC unstable due to TSCs unsynchronized
[ 0.011000] Calibrating delay loop (skipped), value calculated using timer frequency.. 4599.42 BogoMIPS (lpj=2299710)
[ 0.012024] pid_max: default: 32768 minimum: 301
[ 0.014303] Security Framework initialized
[ 0.015031] SELinux: Initializing.
[ 0.016185] SELinux: Starting in permissive mode
[ 0.022138] Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
[ 0.032281] Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
[ 0.037180] Mount-cache hash table entries: 256
[ 0.053269] Initializing cgroup subsys devices
[ 0.054124] Initializing cgroup subsys freezer
[ 0.055090] Initializing cgroup subsys net_cls
[ 0.056051] Initializing cgroup subsys blkio
[ 0.057033] Initializing cgroup subsys perf_event
[ 0.059062] Initializing cgroup subsys net_prio
[ 0.061168] mce: CPU supports 6 MCE banks
[ 0.062041] mce: unknown CPU type - not enabling MCE support
[ 0.063027] numa_add_cpu cpu 0 node 0: mask now 0
[ 0.065017] Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0
[ 0.065017] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0
[ 0.065017] tlb_flushall_shift: -1
[ 0.066280] Freeing SMP alternatives: 40k freed
[ 0.067055] ACPI: Core revision 20130328
[ 0.703059] ACPI: All ACPI Tables successfully acquired
[ 0.707111] ftrace: allocating 44867 entries in 176 pages
[ 0.726591] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[ 0.737707] smpboot: CPU0: AuthenticAMD AMD Phenom(tm) 9600 Quad-Core Processor (fam: 10, model: 02, stepping: 02)
[ 0.743000] Performance Events:
[ 0.757505] ftrace: Allocated trace_printk buffers
[ 0.775835] NMI watchdog: disabled (cpu0): hardware events not enabled
[ 0.786881] SMP alternatives: lockdep: fixing up alternatives
[ 0.787129] smpboot: Booting Node 0, Processors #1 OK
[ 0.011000] numa_add_cpu cpu 1 node 0: mask now 0-1
[ 0.011000] BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
[ 0.011000] IP: [<ffffffff81130e76>] clockevents_set_mode+0x16/0x70
[ 0.011000] PGD 0
[ 0.011000] Oops: 0000 [#1] PREEMPT SMP
[ 0.011000] Modules linked in:
[ 0.011000] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.9.0+ #2
[ 0.011000] Hardware name: FUJITSU SIEMENS AMD690VM-FMH/AMD690VM-FMH, BIOS V5.13 03/14/2008
[ 0.011000] task: ffff880102e6ca40 ti: ffff880102e66000 task.ti: ffff880102e66000
[ 0.011000] RIP: 0010:[<ffffffff81130e76>] [<ffffffff81130e76>] clockevents_set_mode+0x16/0x70
[ 0.011000] RSP: 0000:ffff880103e03ef8 EFLAGS: 00010082
[ 0.011000] RAX: ffff880103e0d620 RBX: 0000000000000001 RCX: 0000000000000000
[ 0.011000] RDX: 0000000000000003 RSI: 0000000000000003 RDI: 0000000000000000
[ 0.011000] RBP: ffff880103e03f08 R08: 0000000000000000 R09: 0000000000000001
[ 0.011000] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880103826a80
[ 0.011000] R13: 000000000000d640 R14: ffff880102e67e78 R15: 0000000000000000
[ 0.011000] FS: 0000000000000000(0000) GS:ffff880103e00000(0000) knlGS:0000000000000000
[ 0.011000] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 0.011000] CR2: 0000000000000038 CR3: 000000000260c000 CR4: 00000000000006a0
[ 0.011000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 0.011000] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 0.011000] Stack:
[ 0.011000] 0000000000000001 ffff880103826a80 ffff880103e03f28 ffffffff81132113
[ 0.011000] 000000000000d640 0000000000000001 ffff880103e03f58 ffffffff811339c2
[ 0.011000] 0000000000000001 ffffffffffffffc8 0000000000000000 ffff880102e67e78
[ 0.011000] Call Trace:
[ 0.011000] <IRQ>
[ 0.011000] [<ffffffff81132113>] tick_check_oneshot_broadcast+0x53/0x90
[ 0.011000] [<ffffffff811339c2>] tick_check_idle+0x32/0xe0
[ 0.011000] [<ffffffff810d4f7c>] irq_enter+0x7c/0x90
[ 0.011000] [<ffffffff81e3ea7e>] do_IRQ+0x3e/0xe0
[ 0.011000] [<ffffffff81e3c52c>] common_interrupt+0x6c/0x6c
[ 0.011000] <EOI>
[ 0.011000] [<ffffffff81e31bc8>] ? start_secondary+0x218/0x277
[ 0.011000] [<ffffffff81e31bc3>] ? start_secondary+0x213/0x277
[ 0.011000] Code: dc 48 83 c4 08 5b c9 c3 eb 0b 90 90 90 90 90 90 90 90 90 90 90 55 48 89 e5 48 83 ec 10 48 89 1c 24 4c 89 64 24 08 66 66 66 66 90 <39> 77 38 48 89 fb 41 89 f4 74 13 48 89 fe 44 89 e7 ff 53 50 41
[ 0.011000] RIP [<ffffffff81130e76>] clockevents_set_mode+0x16/0x70
[ 0.011000] RSP <ffff880103e03ef8>
[ 0.011000] CR2: 0000000000000038
[ 0.011000] ---[ end trace bd42872bae65a76c ]---
[ 0.011000] Kernel panic - not syncing: Fatal exception in interrupt
[ 0.866000] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
[ 0.866000] IP: [<ffffffff811328ae>] tick_do_broadcast+0x5e/0x100
[ 0.866000] PGD 0
[ 0.866000] Oops: 0000 [#2] PREEMPT SMP
[ 0.866000] Modules linked in:
[ 0.866000] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G D 3.9.0+ #2
[ 0.866000] Hardware name: FUJITSU SIEMENS AMD690VM-FMH/AMD690VM-FMH, BIOS V5.13 03/14/2008
[ 0.866000] task: ffff880102c20000 ti: ffff88010336a000 task.ti: ffff88010336a000
[ 0.866000] RIP: 0010:[<ffffffff811328ae>] [<ffffffff811328ae>] tick_do_broadcast+0x5e/0x100
[ 0.866000] RSP: 0000:ffff880103c03c40 EFLAGS: 00010002
[ 0.866000] RAX: 0000000000000000 RBX: ffff8801038263f0 RCX: 0000000000000001
[ 0.866000] RDX: ffff8801038263f0 RSI: 0000000000000002 RDI: ffff8801038263f0
[ 0.866000] RBP: ffff880103c03c60 R08: 0000000000000000 R09: 0000000000000008
[ 0.866000] R10: 0000000000000001 R11: 0000000000000000 R12: 000000000000d620
[ 0.866000] R13: 0000000000000000 R14: ffff880103c00080 R15: 0000000000000000
[ 0.866000] FS: 0000000000000000(0000) GS:ffff880103c00000(0000) knlGS:0000000000000000
[ 0.866000] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 0.866000] CR2: 0000000000000048 CR3: 000000000260c000 CR4: 00000000000006b0
[ 0.866000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 0.866000] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 0.866000] Stack:
[ 0.866000] ffffffff81132be5 0000000000000000 ffffffff82617e00 ffffffff82624700
[ 0.866000] ffff880103c03c70 ffffffff81132c11 ffff880103c03c90 ffffffff81132c34
[ 0.866000] ffffffff82617e00 ffff88010380c3d0 ffff880103c03ca0 ffffffff8104cd75
[ 0.866000] Call Trace:
[ 0.866000] <IRQ>
[ 0.866000] [<ffffffff81132be5>] ? tick_do_periodic_broadcast+0x15/0x50
[ 0.866000] [<ffffffff81132c11>] tick_do_periodic_broadcast+0x41/0x50
[ 0.866000] [<ffffffff81132c34>] tick_handle_periodic_broadcast+0x14/0x50
[ 0.866000] [<ffffffff8104cd75>] timer_interrupt+0x15/0x20
[ 0.866000] [<ffffffff811796ad>] handle_irq_event_percpu+0x9d/0x3b0
[ 0.866000] [<ffffffff81179a08>] handle_irq_event+0x48/0x70
[ 0.866000] [<ffffffff8117d1dd>] handle_edge_irq+0x6d/0x130
[ 0.866000] [<ffffffff8104c4a1>] handle_irq+0x71/0x190
[ 0.866000] [<ffffffff81116452>] ? vtime_account_system+0x52/0x60
[ 0.866000] [<ffffffff81e3ea9d>] do_IRQ+0x5d/0xe0
[ 0.866000] [<ffffffff81e3c52c>] common_interrupt+0x6c/0x6c
[ 0.866000] [<ffffffff810d5811>] ? __do_softirq+0xc1/0x470
[ 0.866000] [<ffffffff810d581b>] ? __do_softirq+0xcb/0x470
[ 0.866000] [<ffffffff810d5811>] ? __do_softirq+0xc1/0x470
[ 0.866000] [<ffffffff81e3bdc5>] ? _raw_spin_unlock+0x35/0x60
[ 0.866000] [<ffffffff810d5d1d>] irq_exit+0xed/0x100
[ 0.866000] [<ffffffff81e3eb8b>] smp_apic_timer_interrupt+0x6b/0x98
[ 0.866000] [<ffffffff81e3dcdc>] apic_timer_interrupt+0x6c/0x80
[ 0.866000] <EOI>
[ 0.866000] [<ffffffff8113a480>] ? mark_held_locks+0x90/0x150
[ 0.866000] [<ffffffff81e3bea6>] ? _raw_spin_unlock_irq+0x36/0x70
[ 0.866000] [<ffffffff81e3bea0>] ? _raw_spin_unlock_irq+0x30/0x70
[ 0.866000] [<ffffffff81e3a195>] __schedule+0x925/0x9c0
[ 0.866000] [<ffffffff81e3a359>] schedule+0x29/0x70
[ 0.866000] [<ffffffff81e3135e>] native_cpu_up+0x76e/0x966
[ 0.866000] [<ffffffff81e33326>] cpu_up+0xfb/0x17b
[ 0.866000] [<ffffffff8299037d>] smp_init+0x4e/0x8e
[ 0.866000] [<ffffffff829729dc>] kernel_init_freeable+0x21a/0x32e
[ 0.866000] [<ffffffff81e28200>] ? rest_init+0x160/0x160
[ 0.866000] [<ffffffff81e2820e>] kernel_init+0xe/0xf0
[ 0.866000] [<ffffffff81e3cfdc>] ret_from_fork+0x7c/0xb0
[ 0.866000] [<ffffffff81e28200>] ? rest_init+0x160/0x160
[ 0.866000] Code: 85 c0 75 2a 48 63 35 ba 30 65 01 48 89 df 49 c7 c4 20 d6 00 00 e8 53 f4 52 00 89 c0 48 89 df 48 8b 04 c5 40 26 78 82 4a 8b 04 20 <ff> 50 48 48 83 c4 10 5b 41 5c c9 c3 eb 04 90 90 90 90 39 f0 73
[ 0.866000] RIP [<ffffffff811328ae>] tick_do_broadcast+0x5e/0x100
[ 0.866000] RSP <ffff880103c03c40>
[ 0.866000] CR2: 0000000000000048
[ 0.866000] ---[ end trace bd42872bae65a76d ]---


Attachments:
(No filename) (19.41 kB)
config_crash (90.43 kB)

2013-05-13 09:21:05

by Thomas Gleixner

Subject: Re: Full dynticks needs evtdesc set before marking cpu online.

On Wed, 8 May 2013, Robin Holt wrote:

> Thomas,
>
> We are seeing failures booting medium sized machines which I think is
> a change in expectations that dyntick put on x86's start_secondary.
>
> During boot of cpus, we see an occasional panic in tick_do_broadcast at

http://lkml.indiana.edu/hypermail/linux/kernel/1305.0/01818.html

Will hit Linus tree soon.

Thanks,

tglx

2013-05-13 12:55:20

by Robin Holt

Subject: Re: Full dynticks needs evtdesc set before marking cpu online.

On Mon, May 13, 2013 at 11:21:00AM +0200, Thomas Gleixner wrote:
> On Wed, 8 May 2013, Robin Holt wrote:
>
> > Thomas,
> >
> > We are seeing failures booting medium sized machines which I think is
> > a change in expectations that dyntick put on x86's start_secondary.
> >
> > During boot of cpus, we see an occasional panic in tick_do_broadcast at
>
> http://lkml.indiana.edu/hypermail/linux/kernel/1305.0/01818.html
>
> Will hit Linus tree soon.

I think this is really due to the ordering in start_secondary. The cpu
has been marked as online, but its evtdev has not been initialized.
I sent a followup to this with a hack/patch.

It was essentially:
--- linux.orig/arch/x86/kernel/smpboot.c
+++ linux/arch/x86/kernel/smpboot.c
@@ -264,6 +264,8 @@ notrace static void __cpuinit start_seco
 	 */
 	check_tsc_sync_target();

+	x86_cpuinit.setup_percpu_clockev();
+
 	/*
 	 * We need to hold vector_lock so there the set of online cpus
 	 * does not change while we are assigning vectors to cpus. Holding
@@ -281,8 +283,6 @@ notrace static void __cpuinit start_seco
 	/* to prevent fake stack check failure in clock setup */
 	boot_init_stack_canary();

-	x86_cpuinit.setup_percpu_clockev();
-
 	wmb();
 	cpu_startup_entry(CPUHP_ONLINE);
 }


Robin

2013-05-13 13:04:02

by Thomas Gleixner

Subject: Re: Full dynticks needs evtdesc set before marking cpu online.

On Mon, 13 May 2013, Robin Holt wrote:

> On Mon, May 13, 2013 at 11:21:00AM +0200, Thomas Gleixner wrote:
> > On Wed, 8 May 2013, Robin Holt wrote:
> >
> > > Thomas,
> > >
> > > We are seeing failures booting medium sized machines which I think is
> > > a change in expectations that dyntick put on x86's start_secondary.
> > >
> > > During boot of cpus, we see an occasional panic in tick_do_broadcast at
> >
> > http://lkml.indiana.edu/hypermail/linux/kernel/1305.0/01818.html
> >
> > Will hit Linus tree soon.
>
> I think this is really due to a sequence in start_secondary. The cpu
> has been marked as online, but its evtdesc has not been initialized.
> I sent a followup to this with a hack/patch.

No, the real issue is that I messed up the cpumask conversion in the
broadcast code, i.e. using alloc instead of zalloc, which allocated
nonzeroed memory for the cpumasks, so any random bit set will crash
the machine. Your patch is just papering over the issue.


> It was essentially:
> --- linux.orig/arch/x86/kernel/smpboot.c
> +++ linux/arch/x86/kernel/smpboot.c
> @@ -264,6 +264,8 @@ notrace static void __cpuinit start_seco
> */
> check_tsc_sync_target();
>
> + x86_cpuinit.setup_percpu_clockev();
> +
> /*
> * We need to hold vector_lock so there the set of online cpus
> * does not change while we are assigning vectors to cpus. Holding
> @@ -281,8 +283,6 @@ notrace static void __cpuinit start_seco
> /* to prevent fake stack check failure in clock setup */
> boot_init_stack_canary();
>
> - x86_cpuinit.setup_percpu_clockev();
> -
> wmb();
> cpu_startup_entry(CPUHP_ONLINE);
> }
>
>
> Robin
>

2013-05-13 13:59:51

by Robin Holt

Subject: Re: Full dynticks needs evtdesc set before marking cpu online.

On Mon, May 13, 2013 at 03:03:55PM +0200, Thomas Gleixner wrote:
> On Mon, 13 May 2013, Robin Holt wrote:
>
> > On Mon, May 13, 2013 at 11:21:00AM +0200, Thomas Gleixner wrote:
> > > On Wed, 8 May 2013, Robin Holt wrote:
> > >
> > > > Thomas,
> > > >
> > > > We are seeing failures booting medium sized machines which I think is
> > > > a change in expectations that dyntick put on x86's start_secondary.
> > > >
> > > > During boot of cpus, we see an occasional panic in tick_do_broadcast at
> > >
> > > http://lkml.indiana.edu/hypermail/linux/kernel/1305.0/01818.html
> > >
> > > Will hit Linus tree soon.
> >
> > I think this is really due to a sequence in start_secondary. The cpu
> > has been marked as online, but its evtdesc has not been initialized.
> > I sent a followup to this with a hack/patch.
>
> No, the real issue is that I messed up the cpumask conversion in the
> broadcast code, i.e. using alloc instead of zalloc, which allocated
> nonzeroed memory for the cpumasks, so any random bit set will crash
> the machine. Your patch is just papering over the issue.

I believe I understand now. What would be the downside of moving
the initialization to before marking the cpu online? It seems like a
reasonable thing to expect as well, even though it is not the right
fix for the other bug.

Thanks,
Robin

2013-05-13 14:04:52

by Thomas Gleixner

Subject: Re: Full dynticks needs evtdesc set before marking cpu online.

On Mon, 13 May 2013, Robin Holt wrote:
> On Mon, May 13, 2013 at 03:03:55PM +0200, Thomas Gleixner wrote:
> > On Mon, 13 May 2013, Robin Holt wrote:
> >
> > > On Mon, May 13, 2013 at 11:21:00AM +0200, Thomas Gleixner wrote:
> > > > On Wed, 8 May 2013, Robin Holt wrote:
> > > >
> > > > > Thomas,
> > > > >
> > > > > We are seeing failures booting medium sized machines which I think is
> > > > > a change in expectations that dyntick put on x86's start_secondary.
> > > > >
> > > > > During boot of cpus, we see an occasional panic in tick_do_broadcast at
> > > >
> > > > http://lkml.indiana.edu/hypermail/linux/kernel/1305.0/01818.html
> > > >
> > > > Will hit Linus tree soon.
> > >
> > > I think this is really due to a sequence in start_secondary. The cpu
> > > has been marked as online, but its evtdesc has not been initialized.
> > > I sent a followup to this with a hack/patch.
> >
> > No, the real issue is that I messed up the cpumask conversion in the
> > broadcast code, i.e. using alloc instead of zalloc, which allocated
> > nonzeroed memory for the cpumasks, so any random bit set will crash
> > the machine. Your patch is just papering over the issue.
>
> I believe I understand now. What would be the downside of moving
> the initialization to before marking the cpu online? It seems like a
> reasonable thing to expect as well in spite of it not being the right
> fix to the other bug.

Yes, we can move it, but it's not a required thing that the tick device
is set up before onlining.

Thanks,

tglx

2013-05-13 14:31:08

by Robin Holt

Subject: Re: Full dynticks needs evtdesc set before marking cpu online.

On Mon, May 13, 2013 at 04:04:45PM +0200, Thomas Gleixner wrote:
> On Mon, 13 May 2013, Robin Holt wrote:
> > On Mon, May 13, 2013 at 03:03:55PM +0200, Thomas Gleixner wrote:
> > > On Mon, 13 May 2013, Robin Holt wrote:
> > >
> > > > On Mon, May 13, 2013 at 11:21:00AM +0200, Thomas Gleixner wrote:
> > > > > On Wed, 8 May 2013, Robin Holt wrote:
> > > > >
> > > > > > Thomas,
> > > > > >
> > > > > > We are seeing failures booting medium sized machines which I think is
> > > > > > a change in expectations that dyntick put on x86's start_secondary.
> > > > > >
> > > > > > During boot of cpus, we see an occasional panic in tick_do_broadcast at
> > > > >
> > > > > http://lkml.indiana.edu/hypermail/linux/kernel/1305.0/01818.html
> > > > >
> > > > > Will hit Linus tree soon.
> > > >
> > > > I think this is really due to a sequence in start_secondary. The cpu
> > > > has been marked as online, but its evtdesc has not been initialized.
> > > > I sent a followup to this with a hack/patch.
> > >
> > > No, the real issue is that I messed up the cpumask conversion in the
> > > broadcast code, i.e. using alloc instead of zalloc, which allocated
> > > nonzeroed memory for the cpumasks, so any random bit set will crash
> > > the machine. Your patch is just papering over the issue.
> >
> > I believe I understand now. What would be the downside of moving
> > the initialization to before marking the cpu online? It seems like a
> > reasonable thing to expect as well in spite of it not being the right
> > fix to the other bug.
>
> Yes, we can move it, but it's not a required thing that the tick device
> is set up before onlining.

I tested with your patch and it does fix my problem as well.

Thank you,
Robin