Date: Tue, 18 Oct 2016 15:40:25 +0200
From: Igor Mammedov
To: linux-kernel@vger.kernel.org
Cc: mingo@redhat.com, peterz@infradead.org, tglx@linutronix.de, efault@gmx.de, torvalds@linux-foundation.org, imammedo@redhat.com
Subject: regression since 4.8 and newer in select_idle_siblings()
Message-ID: <20161018154025.1e686cad@nial.brq.redhat.com>

The kernel crashes at runtime due to a NULL pointer dereference in:

    select_idle_sibling()
      -> select_idle_cpu()
           ...
           u64 avg_cost = this_sd->avg_scan_cost;

The regression bisects to:

    commit 10e2f1acd0106c05229f94c70a344ce3a2c8008b
    Author: Peter Zijlstra
        sched/core: Rewrite and improve select_idle_siblings()

To reproduce the crash at runtime, start a VM with:

    qemu-system-x86_64 [-enable-kvm] \
        -smp 4,sockets=2 \
        linux48_disk.img

and offline cpu1 in the guest:

    echo 0 > /sys/devices/system/cpu/cpu1/online

As a result, the guest panics, either immediately or after a short delay, on
some path that calls select_idle_sibling().

To reproduce the crash at boot, start a VM with a recent QEMU (2.7 or later):

    qemu-2.7/qemu-system-x86_64 -smp 1,sockets=2,cores=2,threads=1,maxcpus=4 \
        -device qemu64-x86_64-cpu,socket-id=1,core-id=0,thread-id=0 \
        -device qemu64-x86_64-cpu,socket-id=1,core-id=1,thread-id=0 \
        -kernel bzImage_v48 [-enable-kvm]

=== one of the panics ===
[ 0.688680] BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
[ 0.688685] IP: [] select_idle_sibling+0x172/0x3b0
[ 0.688686] PGD 0
[ 0.688687] Oops: 0000 [#1] SMP
[ 0.688690] CPU: 0 PID: 109 Comm: kworker/u8:2 Not tainted 4.8.0-rc8+ #675
[ 0.688690] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
[ 0.688694] Workqueue: events_unbound async_run_entry_fn
[ 0.688695] task: ffff88007c258000 task.stack: ffff88007c3b0000
[ 0.688697] RIP: 0010:[] [] select_idle_sibling+0x172/0x3b0
[ 0.688697] RSP: 0000:ffff88007c3b3bb0 EFLAGS: 00010007
[ 0.688698] RAX: 000000000000051b RBX: 0000000000000004 RCX: 0000000000000001
[ 0.688699] RDX: 0000000000000040 RSI: 0000000000000004 RDI: ffff88007d00a008
[ 0.688699] RBP: ffff88007c3b3c10 R08: 0000000000000000 R09: 0000000000000000
[ 0.688700] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000002
[ 0.688700] R13: ffff88007d00a008 R14: 0000000000000000 R15: 0000000000000004
[ 0.688701] FS: 0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
[ 0.688702] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.688703] CR2: 0000000000000078 CR3: 0000000001c06000 CR4: 00000000000006f0
[ 0.688705] Stack:
[ 0.688707] 0000000000000001 ffff88007c80e480 000000000000a118 ffff88007c282900
[ 0.688708] 0000000100000000 0000000000000002 0000000200000000 ffff88007c80e600
[ 0.688709] ffff88007c282900 0000000000018ec0 0000000000000000 0000000000000000
[ 0.688710] Call Trace:
[ 0.688712] [] select_task_rq_fair+0x717/0x730
[ 0.688713] [] ? update_curr+0xc7/0x150
[ 0.688715] [] ? __enqueue_entity+0x6c/0x70
[ 0.688718] [] try_to_wake_up+0x104/0x390
[ 0.688719] [] wake_up_process+0x15/0x20
[ 0.688724] [] scsi_eh_wakeup+0x33/0xa0
[ 0.688725] [] scsi_schedule_eh+0x4c/0x60
[ 0.688728] [] ata_std_sched_eh+0x3f/0x60
[ 0.688729] [] ata_port_schedule_eh+0x13/0x20
[ 0.688730] [] __ata_port_probe+0x44/0x60
[ 0.688731] [] ata_port_probe+0x20/0x40
[ 0.688732] [] async_port_probe+0x2e/0x60
[ 0.688734] [] async_run_entry_fn+0x39/0x140
[ 0.688736] [] process_one_work+0x152/0x400
[ 0.688738] [] worker_thread+0x125/0x4b0
[ 0.688739] [] ? process_one_work+0x400/0x400
[ 0.688740] [] kthread+0xd8/0xf0
[ 0.688744] [] ret_from_fork+0x1f/0x40
[ 0.688745] [] ? __kthread_parkme+0x70/0x70
[ 0.688757] Code: c7 c0 20 dd 00 00 65 48 03 05 c3 bd f2 7e 4c 8b 30 48 c7 c0 c0 8e 01 00 65 48 03 05 b1 bd f2 7e 48 8b 80 c8 09 00 00 48 c1 e8 09 <49> 39 46 78 0f 87 29 02 00 00 65 8b 3d 9d bd f2 7e e8 b8 c8 ff
[ 0.688758] RIP [] select_idle_sibling+0x172/0x3b0
[ 0.688759] RSP
[ 0.688759] CR2: 0000000000000078
[ 0.688762] ---[ end trace f10266de945b1779 ]---
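
As a cross-check on the oops above, the faulting instruction can be decoded by hand from the "Code:" line. The following is only a rough sketch, not a general disassembler: it handles the single encoding seen here (REX prefix, opcode 0x39, ModRM with an 8-bit displacement), and the helper names faulting_bytes(), decode_cmp_disp8() and fault_address() are made up for illustration. The byte string and register values are copied from the dump. That offset 0x78 corresponds to avg_scan_cost in struct sched_domain is an assumption; the oops does not say so directly, it is merely consistent with the source line quoted at the top of this report.

```python
# Bytes copied from the "Code:" line of the oops; the <..> marker is the
# first byte of the instruction that faulted.
CODE_LINE = (
    "c7 c0 20 dd 00 00 65 48 03 05 c3 bd f2 7e 4c 8b 30 48 c7 c0 c0 8e 01 00 "
    "65 48 03 05 b1 bd f2 7e 48 8b 80 c8 09 00 00 48 c1 e8 09 <49> 39 46 78 "
    "0f 87 29 02 00 00 65 8b 3d 9d bd f2 7e e8 b8 c8 ff"
)

# Values copied from the register dump in the oops.
REGS = {"rax": 0x51B, "r14": 0x0}
CR2 = 0x78

REG64 = [
    "rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
    "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15",
]


def faulting_bytes(code_line):
    """Return the bytes from the <..> marker to the end of the dump."""
    tokens = code_line.split()
    start = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    return bytes(int(tok.strip("<>"), 16) for tok in tokens[start:])


def decode_cmp_disp8(insn):
    """Decode 'REX 39 /r disp8' (cmp %reg, disp8(%base)); reject anything else."""
    rex, opcode, modrm, disp = insn[0], insn[1], insn[2], insn[3]
    if (rex & 0xF0) != 0x40 or opcode != 0x39:
        raise ValueError("not a REX-prefixed 'cmp r/m64, r64'")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 1 or rm == 4:
        raise ValueError("only the disp8 form without a SIB byte is handled")
    reg |= ((rex >> 2) & 1) << 3  # REX.R extends the register operand
    rm |= (rex & 1) << 3          # REX.B extends the base register
    if disp >= 0x80:              # disp8 is sign-extended
        disp -= 0x100
    return REG64[reg], REG64[rm], disp


def fault_address(regs, base, disp):
    """Effective address of the memory operand: base register plus offset."""
    return regs[base] + disp


src, base, disp = decode_cmp_disp8(faulting_bytes(CODE_LINE))
print(f"cmp %{src}, {disp:#x}(%{base})")
print(f"fault address: {fault_address(REGS, base, disp):#x}, CR2: {CR2:#x}")
```

Under these assumptions the faulting instruction is a compare against memory at 0x78(%r14), and R14 is 0 in the register dump, so the computed address equals CR2. In other words, the base pointer itself is NULL and the access is to a field at offset 0x78 from it.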