Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1755456Ab2K0OiQ (ORCPT ); Tue, 27 Nov 2012 09:38:16 -0500
Received: from mail-vb0-f46.google.com ([209.85.212.46]:43540 "EHLO
	mail-vb0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1753280Ab2K0OiP (ORCPT );
	Tue, 27 Nov 2012 09:38:15 -0500
MIME-Version: 1.0
In-Reply-To: <1353993325.14050.49.camel@ThinkPad-T5421.cn.ibm.com>
References: <1353993325.14050.49.camel@ThinkPad-T5421.cn.ibm.com>
Date: Tue, 27 Nov 2012 15:38:14 +0100
Message-ID:
Subject: Re: [RFC PATCH] Fix abnormal rcu dynticks_nesting values related to async page fault
From: Frederic Weisbecker
To: Li Zhong
Cc: linux-next list, LKML, paulmck@linux.vnet.ibm.com,
	sasha.levin@oracle.com, gleb@redhat.com, avi@redhat.com
Content-Type: text/plain; charset=ISO-8859-1
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 2751
Lines: 49

2012/11/27 Li Zhong:
> I noticed some warnings complaining about dynticks_nesting value, like
>
> [ 267.545032] ------------[ cut here ]------------
> [ 267.545032] WARNING: at kernel/rcutree.c:382 rcu_eqs_enter+0xab/0xc0()
> [ 267.545032] Hardware name: Bochs
> [ 267.545032] Modules linked in:
> [ 267.545032] Pid: 0, comm: swapper/2 Not tainted 3.7.0-rc5-next-20121115 #8
> [ 267.545032] Call Trace:
> [ 267.545032] [] warn_slowpath_common+0x7f/0xc0
> [ 267.545032] [] warn_slowpath_null+0x1a/0x20
> [ 267.545032] [] rcu_eqs_enter+0xab/0xc0
> [ 267.545032] [] rcu_idle_enter+0x2b/0x70
> [ 267.545032] [] cpu_idle+0x6f/0x100
> [ 267.545032] [] start_secondary+0x205/0x20c
> [ 267.545032] ---[ end trace 924ae80da035028d ]---
>
> After enabling rcu-dyntick tracing, I got following abnormal
> dynticks_nesting values (13fffffffffffff, ff00000000000001,etc):
> ...
> 1 -0 [002] dN.2 18739.518567: rcu_dyntick: End 0 140000000000000 rcu_idle_exit
> 2 sshd-696 [002] d..1 18739.518675: rcu_dyntick: ++= 140000000000000 140000000000001 rcu_irq_enter - apf (not present)

How did that happen? When I look at do_async_page_fault(),
KVM_PV_REASON_PAGE_NOT_PRESENT doesn't do rcu_irq_enter().

>
> 3 -0 [002] d..2 18739.518705: rcu_dyntick: Start 140000000000001 0 rcu_idle_enter
> 4 -0 [002] d..2 18739.521252: rcu_dyntick: End 0 1 rcu_irq_enter - apf (page ready)
> 5 -0 [002] dN.2 18739.521261: rcu_dyntick: Start 1 0 rcu_irq_exit - apf (page ready)
> 6 -0 [002] dN.2 18739.521263: rcu_dyntick: End 0 140000000000000 rcu_idle_exit
>
> 7 sshd-696 [002] d..1 18739.521299: rcu_dyntick: --= 140000000000000 13fffffffffffff rcu_irq_exit - apf (not present)

I'm confused for the same reason here.

> 8 sshd-696 [002] d..1 18739.521302: rcu_dyntick: Start 13fffffffffffff 0 rcu_user_enter
> 9 sshd-696 [002] d..1 18739.521330: rcu_dyntick: End 0 1 rcu_irq_enter - apf (not present)

Same.

Now we certainly need to add some rcu_user_exit() on
do_async_page_fault(). Although I'm not quite sure when this function
is called. Is it an exception or an irq?

Thanks.
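
For reference, the arithmetic in the quoted trace can be replayed in user space. The following is only a minimal sketch, not kernel code: the constants follow the kernel's DYNTICK_TASK_* macro definitions as I understand them (an assumption to check against your tree), and the `apply`, `replay` and `TRACE_EVENTS` names and the simplified transition rules are hypothetical, made up for illustration. It does not answer which path calls rcu_irq_enter(); it only shows that an increment taken before an idle entry and its matching decrement taken after the next idle exit is enough to produce 13fffffffffffff.

```python
# User-space replay of the dynticks_nesting values in the quoted trace.
# Constants assumed to mirror the kernel's DYNTICK_TASK_* macros.
LLONG_MAX = 2**63 - 1
NEST_VALUE = (LLONG_MAX >> 7) + 1      # assumed DYNTICK_TASK_NEST_VALUE
TASK_FLAG = (NEST_VALUE // 8) * 2      # assumed DYNTICK_TASK_FLAG
EXIT_IDLE = NEST_VALUE + TASK_FLAG     # assumed DYNTICK_TASK_EXIT_IDLE


def apply(event, nesting):
    """Hypothetical, simplified transition rules (not the kernel's code)."""
    if event in ("idle_exit", "user_exit"):
        return EXIT_IDLE               # leaving an extended quiescent state
    if event in ("idle_enter", "user_enter"):
        return 0                       # entering one resets the counter
    if event == "irq_enter":
        return nesting + 1
    if event == "irq_exit":
        return nesting - 1
    raise ValueError(f"unknown event: {event}")


def replay(events, nesting=0):
    """Return a list of (event, old_value, new_value) tuples."""
    history = []
    for event in events:
        new = apply(event, nesting)
        history.append((event, nesting, new))
        nesting = new
    return history


# One entry per numbered line of the quoted trace, in order.
TRACE_EVENTS = [
    "idle_exit",   # 1
    "irq_enter",   # 2  apf (not present)
    "idle_enter",  # 3  the increment from step 2 is wiped out here
    "irq_enter",   # 4  apf (page ready)
    "irq_exit",    # 5  apf (page ready)
    "idle_exit",   # 6
    "irq_exit",    # 7  apf (not present): decrements the fresh value
    "user_enter",  # 8
    "irq_enter",   # 9  apf (not present)
]

if __name__ == "__main__":
    for event, old, new in replay(TRACE_EVENTS):
        print(f"{event:10s} {old:x} -> {new:x}")
```

Under these assumptions, step 7 moves the counter from 140000000000000 to 13fffffffffffff, matching the trace: the increment from step 2 was discarded at step 3, so its matching decrement has nothing left to balance.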