Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753548AbaBTEc5 (ORCPT ); Wed, 19 Feb 2014 23:32:57 -0500 Received: from e28smtp09.in.ibm.com ([122.248.162.9]:37364 "EHLO e28smtp09.in.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752748AbaBTEcz (ORCPT ); Wed, 19 Feb 2014 23:32:55 -0500 Message-ID: <5305856F.3000109@linux.vnet.ibm.com> Date: Thu, 20 Feb 2014 12:32:47 +0800 From: Michael wang User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.2.0 MIME-Version: 1.0 To: Sasha Levin , Peter Zijlstra CC: Ingo Molnar , LKML Subject: Re: sched: hang in migrate_swap References: <5304F32A.4040907@oracle.com> In-Reply-To: <5304F32A.4040907@oracle.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-TM-AS-MML: disable X-Content-Scanned: Fidelis XPS MAILER x-cbid: 14022004-2674-0000-0000-00000CCE0273 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 02/20/2014 02:08 AM, Sasha Levin wrote: > Hi all, > > While fuzzing with trinity inside a KVM tools guest, running latest > -next kernel, I see to hit the following hang quite often. Fix for the stuck issue around idle_balance() is now in progress, this may caused be the same problem, I suggest we do some retest after these patch got merged. Regards, Michael Wang > > The initial spew is: > > [ 293.110057] BUG: soft lockup - CPU#8 stuck for 22s! [migration/8:258] > [ 293.110057] Modules linked in: > [ 293.110057] irq event stamp: 20828 > [ 293.110057] hardirqs last enabled at (20827): [] > restore_args+0x0/0x30 > [ 293.110057] hardirqs last disabled at (20828): [] > apic_timer_interrupt+0x6d/0x80 > [ 293.110057] softirqs last enabled at (20826): [] > __do_softirq+0x447/0x4f0 > [ 293.110057] softirqs last disabled at (20821): [] > irq_exit+0x83/0x160 > [ 293.110057] CPU: 8 PID: 258 Comm: migration/8 Tainted: G W > 3.14.0-rc3-next-20140218-sasha-00008-g16140ff-dirty #109 > [ 293.110057] task: ffff8805b8283000 ti: ffff8805b82c0000 task.ti: > ffff8805b82c0000 > [ 293.110057] RIP: 0010:[] [] > multi_cpu_stop+0xc2/0x1c0 > [ 293.110057] RSP: 0000:ffff8805b82c1cd8 EFLAGS: 00000293 > [ 293.110057] RAX: 0000000000000001 RBX: ffffffff84385437 RCX: > 0000000000000000 > [ 293.110057] RDX: 0000000000000001 RSI: 0000000000000001 RDI: > ffff8800c40bb978 > [ 293.110057] RBP: ffff8805b82c1d18 R08: 0000000000000000 R09: > 0000000000000000 > [ 293.110057] R10: 0000000000000001 R11: 0000000000000001 R12: > ffff8805b82c1c48 > [ 293.110057] R13: ffff8805b8283000 R14: ffff8805b82c0000 R15: > ffff8805b8283000 > [ 293.110057] FS: 0000000000000000(0000) GS:ffff8805b9000000(0000) > knlGS:0000000000000000 > [ 293.110057] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > [ 293.110057] CR2: 0000000000942d58 CR3: 0000000005c26000 CR4: > 00000000000006e0 > [ 293.110057] DR0: 000000000089e000 DR1: 00007f938d51a000 DR2: > 0000000000000000 > [ 293.110057] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: > 0000000000000600 > [ 293.110057] Stack: > [ 293.110057] ffff8805b91d0be0 00000000c40bb9d8 ffff8805b82c1cf8 > ffff8805b91d0be0 > [ 293.110057] ffff8800c40bba28 ffffffff812037d0 ffff8800c40bb978 > ffff8805b91d0c30 > [ 293.110057] ffff8805b82c1de8 ffffffff81202f56 0000000000000001 > ffffffff81202e44 > [ 293.110057] Call Trace: > [ 293.110057] [] ? stop_cpus+0x60/0x60 > [ 293.110057] [] cpu_stopper_thread+0x96/0x190 > [ 293.110057] [] ? cpu_stop_should_run+0x44/0x60 > [ 293.110057] [] ? __lock_release+0x1da/0x1f0 > [ 293.110057] [] ? > _raw_spin_unlock_irqrestore+0x65/0xb0 > [ 293.110057] [] ? trace_hardirqs_on+0xd/0x10 > [ 293.110057] [] smpboot_thread_fn+0x2b6/0x2c0 > [ 293.110057] [] ? > smpboot_register_percpu_thread+0x100/0x100 > [ 293.110057] [] kthread+0x105/0x110 > [ 293.110057] [] ? wait_for_completion+0xc2/0x110 > [ 293.110057] [] ? set_kthreadd_affinity+0x30/0x30 > [ 293.110057] [] ret_from_fork+0x7c/0xb0 > [ 293.110057] [] ? set_kthreadd_affinity+0x30/0x30 > [ 293.110057] Code: b5 05 01 66 0f 1f 44 00 00 45 89 ed 4d 0f a3 2e 45 > 19 ed 45 85 ed 41 0f 95 c7 4c 8d 73 24 31 c0 c7 45 cc 00 00 00 00 66 90 > f3 90 <44> 8b 6b 20 41 39 c5 74 66 41 83 fd 02 74 0f 41 83 fd 03 75 41 > > This is followed by a dump of the rest of the CPU states, and I see the > following: > > [ 268.450444] NMI backtrace for cpu 34 > [ 268.450444] CPU: 34 PID: 9859 Comm: trinity-c129 Tainted: G W > 3.14.0-rc3-next-20140218-sasha-00008-g16140ff-dirty #109 > [ 268.450444] task: ffff880154aa3000 ti: ffff8800c40ba000 task.ti: > ffff8800c40ba000 > [ 268.450444] RIP: 0010:[] [] > pvclock_clocksource_read+0x1a/0xc0 > [ 268.450444] RSP: 0018:ffff8800c40bb538 EFLAGS: 00000082 > [ 268.450444] RAX: 00000000905f62ec RBX: ffff8805b93d8500 RCX: > 000000000000000a > [ 268.450444] RDX: 00000000000000a1 RSI: 0000000000000040 RDI: > ffff881027f12880 > [ 268.450444] RBP: ffff8800c40bb548 R08: 0000000000000000 R09: > 0000000000000001 > [ 268.450444] R10: 0000000000000002 R11: 0000000000000001 R12: > ffff8805b93d8510 > [ 268.450444] R13: 0000000000000000 R14: 0000000000000000 R15: > 0000000000000001 > [ 268.450444] FS: 00007f938e16a700(0000) GS:ffff8805b9200000(0000) > knlGS:0000000000000000 > [ 268.450444] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > [ 268.450444] CR2: 00007f938a658a38 CR3: 00000000c40b2000 CR4: > 00000000000006e0 > [ 268.450444] DR0: 000000000089e000 DR1: 00007f938d51a000 DR2: > 0000000000000000 > [ 268.450444] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: > 0000000000f7060a > [ 268.450444] Stack: > [ 268.450444] ffff8805b93d8500 ffff8805b93d8510 ffff8800c40bb558 > ffffffff810b6f04 > [ 268.450444] ffff8800c40bb568 ffffffff81076e1d ffff8800c40bb598 > ffffffff81180635 > [ 268.450444] ffffffff875f7be0 0000000000000022 00000000001d8500 > ffff8805b93d8500 > [ 268.450444] Call Trace: > [ 268.450444] [] kvm_clock_read+0x24/0x50 > [ 268.450444] [] sched_clock+0x1d/0x30 > [ 268.450444] [] sched_clock_local+0x25/0x90 > [ 268.450444] [] sched_clock_cpu+0xb8/0x110 > [ 268.450444] [] local_clock+0x2d/0x40 > [ 268.450444] [] __lock_acquire+0x2a8/0x580 > [ 268.450444] [] ? update_blocked_averages+0x61d/0x670 > [ 268.450444] [] lock_acquire+0x182/0x1d0 > [ 268.450444] [] ? idle_balance+0x3c/0x2c0 > [ 268.450444] [] idle_balance+0x71/0x2c0 > [ 268.450444] [] ? idle_balance+0x3c/0x2c0 > [ 268.450444] [] pick_next_task_fair+0x24e/0x290 > [ 268.450444] [] ? dequeue_task+0x6e/0x80 > [ 268.450444] [] __schedule+0x2a5/0x840 > [ 268.450444] [] schedule+0x65/0x70 > [ 268.450444] [] schedule_timeout+0x38/0x2a0 > [ 268.450444] [] ? put_lock_stats+0xe/0x30 > [ 268.450444] [] ? mark_held_locks+0x6c/0x90 > [ 268.450444] [] ? _raw_spin_unlock_irq+0x2b/0x80 > [ 268.450444] [] wait_for_completion+0xb7/0x110 > [ 268.450444] [] ? try_to_wake_up+0x2a0/0x2a0 > [ 268.450444] [] stop_two_cpus+0x261/0x2b0 > [ 268.450444] [] ? __migrate_swap_task+0xa0/0xa0 > [ 268.450444] [] ? kvm_clock_read+0x24/0x50 > [ 268.450444] [] ? stop_cpus+0x60/0x60 > [ 268.450444] [] ? stop_cpus+0x60/0x60 > [ 268.450444] [] ? put_lock_stats+0xe/0x30 > [ 268.450444] [] ? wait_for_completion+0x3f/0x110 > [ 268.450444] [] migrate_swap+0x175/0x1a0 > [ 268.450444] [] task_numa_migrate+0x510/0x600 > [ 268.450444] [] ? select_task_rq_fair+0x440/0x440 > [ 268.450444] [] numa_migrate_preferred+0x4d/0x60 > [ 268.450444] [] task_numa_fault+0x190/0x210 > [ 268.450444] [] do_huge_pmd_numa_page+0x432/0x450 > [ 268.450444] [] ? put_lock_stats+0xe/0x30 > [ 268.450444] [] ? delay_tsc+0xea/0x110 > [ 268.450444] [] __handle_mm_fault+0x244/0x3a0 > [ 268.450444] [] ? rcu_read_unlock+0x5d/0x60 > [ 268.450444] [] handle_mm_fault+0x10b/0x1b0 > [ 268.450444] [] ? __do_page_fault+0x2e2/0x590 > [ 268.450444] [] __do_page_fault+0x551/0x590 > [ 268.450444] [] ? vtime_account_user+0x91/0xa0 > [ 268.450444] [] ? > context_tracking_user_exit+0xa8/0x1c0 > [ 268.450444] [] ? _raw_spin_unlock+0x30/0x50 > [ 268.450444] [] ? vtime_account_user+0x91/0xa0 > [ 268.450444] [] ? > context_tracking_user_exit+0xa8/0x1c0 > [ 268.450444] [] do_page_fault+0x3d/0x70 > [ 268.450444] [] do_async_page_fault+0x35/0x100 > [ 268.450444] [] async_page_fault+0x28/0x30 > [ 268.450444] Code: 6d e1 10 00 e8 98 8a 15 00 c9 c3 66 0f 1f 44 00 00 > 55 48 89 e5 48 83 ec 10 8b 37 eb 06 0f 1f 40 00 89 c6 0f 1f 00 0f ae e8 > 0f 31 <48> c1 e2 20 89 c0 0f be 4f 1c 48 09 c2 44 8b 47 18 48 2b 57 08 > > The stack trace is always pointing to migrate_swap() calling > stop_two_cpus(). Note that I don't > see another CPU in a stopped state when it occurs. > > > Thanks, > Sasha > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/