From: "Zheng, Lv"
To: "Rafael J. Wysocki", "Kirill A. Shutemov"
CC: "Wysocki, Rafael J", "Brown, Len", Lv Zheng, "linux-kernel@vger.kernel.org", "linux-acpi@vger.kernel.org"
Subject: RE: [PATCH 1/6] ACPI/EC: Introduce STARTED/STOPPED flags to replace BLOCKED flag.
Date: Wed, 19 Nov 2014 08:55:25 +0000
Message-ID: <1AE640813FDE7649BE1B193DEA596E88026A36FB@SHSMSX101.ccr.corp.intel.com>
References: <1AE640813FDE7649BE1B193DEA596E8802689778@SHSMSX101.ccr.corp.intel.com> <20141118132328.GA27428@node.dhcp.inet.fi> <4486101.LWS7CexbAj@vostro.rjw.lan>
In-Reply-To: <4486101.LWS7CexbAj@vostro.rjw.lan>

Hi, Rafael

I think you know this issue. [PATCH 1] can trigger this deadlock because it is actually based on another series that fixes a GPE deadlock. I have fixed the deadlock in acpi_ev_gpe_detect() or acpi_ev_gpe_dispatch(). The problem is that those fixes haven't been upstreamed to ACPICA yet, so I couldn't post them here.
I was thinking we could work around this by applying the acpi_os_wait_events_complete() enhancement before this series, because I thought the deadlock could only happen during suspend. But it seems it can also be triggered during boot.

So we have 3 choices here in order to merge this series:
1. Merge the GPE deadlock fix before it is merged in ACPICA upstream.
2. Change [PATCH 1] so that it does not hold the EC lock for now (this is racy, but the current code is already racy).
3. Revert [PATCH 1-4] and wait until the GPE deadlock is fixed in ACPICA upstream.
Which one do you prefer?

IMO, we have several issues, and their fixes form a dependency circle:
1. GPE deadlock: it may depend on DISPATCH_METHOD flushing (we shouldn't bump the enabling status up in acpi_ev_asynch_enable_gpe()).
2. EC transaction flushing: it depends on the GPE deadlock fix.
3. EC event polling: it depends on EC transaction flushing; this is required to support EC event draining as mentioned in bugzilla 44161.
4. DISPATCH_METHOD flushing: it depends on EC event polling; if we don't move the EC query out of the _Lxx/_Exx work queue, it may block DISPATCH_METHOD flushing.
So it seems we need to determine which one should be merged first. IMO, the GPE deadlock fix is the most basic one.

Thanks and best regards
-Lv

> From: Rafael J. Wysocki [mailto:rjw@rjwysocki.net]
> Sent: Wednesday, November 19, 2014 5:20 AM
> To: Kirill A. Shutemov
> Cc: Zheng, Lv; Wysocki, Rafael J; Brown, Len; Lv Zheng; linux-kernel@vger.kernel.org; linux-acpi@vger.kernel.org
> Subject: Re: [PATCH 1/6] ACPI/EC: Introduce STARTED/STOPPED flags to replace BLOCKED flag.
>
> On Tuesday, November 18, 2014 03:23:28 PM Kirill A. Shutemov wrote:
> > On Wed, Nov 05, 2014 at 02:52:36AM +0000, Zheng, Lv wrote:
>
> [cut]
>
> >
> > Here's lockdep warning I see on -next:
>
> Is patch [1/6] sufficient to trigger this or do you need all [1-4/6]?
> >
> > [    0.510159] ======================================================
> > [    0.510171] [ INFO: possible circular locking dependency detected ]
> > [    0.510185] 3.18.0-rc4-next-20141117-07404-g9dad2ab6df8b #66 Not tainted
> > [    0.510197] -------------------------------------------------------
> > [    0.510209] swapper/3/0 is trying to acquire lock:
> > [    0.510219]  (&(&ec->lock)->rlock){-.....}, at: [] acpi_ec_gpe_handler+0x21/0xfc
> > [    0.510254]
> > [    0.510254] but task is already holding lock:
> > [    0.510266]  (&(*(&acpi_gbl_gpe_lock))->rlock){-.....}, at: [] acpi_os_acquire_lock+0xe/0x10
> > [    0.510296]
> > [    0.510296] which lock already depends on the new lock.
> > [    0.510296]
> > [    0.510312]
> > [    0.510312] the existing dependency chain (in reverse order) is:
> > [    0.510327]
> > [    0.510327] -> #1 (&(*(&acpi_gbl_gpe_lock))->rlock){-.....}:
> > [    0.510344]        [] lock_acquire+0xdf/0x2d0
> > [    0.510364]        [] _raw_spin_lock_irqsave+0x50/0x70
> > [    0.510381]        [] acpi_os_acquire_lock+0xe/0x10
> > [    0.510398]        [] acpi_enable_gpe+0x22/0x68
> > [    0.510416]        [] acpi_ec_start+0x66/0x87
> > [    0.510432]        [] ec_install_handlers+0x41/0xa4
> > [    0.510449]        [] acpi_ec_ecdt_probe+0x1a9/0x1ea
> > [    0.510466]        [] acpi_init+0x8b/0x26e
> > [    0.510480]        [] do_one_initcall+0xd8/0x210
> > [    0.510496]        [] kernel_init_freeable+0x1f5/0x282
> > [    0.510513]        [] kernel_init+0xe/0xf0
> > [    0.510527]        [] ret_from_fork+0x7c/0xb0
> > [    0.510542]
> > [    0.510542] -> #0 (&(&ec->lock)->rlock){-.....}:
> > [    0.510558]        [] __lock_acquire+0x210f/0x2220
> > [    0.510574]        [] lock_acquire+0xdf/0x2d0
> > [    0.510589]        [] _raw_spin_lock_irqsave+0x50/0x70
> > [    0.510604]        [] acpi_ec_gpe_handler+0x21/0xfc
> > [    0.510620]        [] acpi_ev_gpe_dispatch+0xd2/0x143
> > [    0.510636]        [] acpi_ev_gpe_detect+0xc8/0x10f
> > [    0.510652]        [] acpi_ev_sci_xrupt_handler+0x22/0x38
> > [    0.510669]        [] acpi_irq+0x16/0x31
> > [    0.510684]        [] handle_irq_event_percpu+0x6f/0x540
> > [    0.510702]        [] handle_irq_event+0x41/0x70
> > [    0.510718]        [] handle_fasteoi_irq+0x86/0x140
> > [    0.510733]        [] handle_irq+0x22/0x40
> > [    0.510748]        [] do_IRQ+0x4f/0xf0
> > [    0.510762]        [] ret_from_intr+0x0/0x1a
> > [    0.510777]        [] default_idle+0x23/0x260
> > [    0.510792]        [] arch_cpu_idle+0xf/0x20
> > [    0.510806]        [] cpu_startup_entry+0x36b/0x5b0
> > [    0.510821]        [] start_secondary+0x1a4/0x1d0
> > [    0.510840]
> > [    0.510840] other info that might help us debug this:
> > [    0.510840]
> > [    0.510856]  Possible unsafe locking scenario:
> > [    0.510856]
> > [    0.510868]        CPU0                    CPU1
> > [    0.510877]        ----                    ----
> > [    0.510886]   lock(&(*(&acpi_gbl_gpe_lock))->rlock);
> > [    0.510898]                                lock(&(&ec->lock)->rlock);
> > [    0.510912]                                lock(&(*(&acpi_gbl_gpe_lock))->rlock);
> > [    0.510927]   lock(&(&ec->lock)->rlock);
> > [    0.510938]
> > [    0.510938]  *** DEADLOCK ***
> > [    0.510938]
> > [    0.510953] 1 lock held by swapper/3/0:
> > [    0.510961]  #0:  (&(*(&acpi_gbl_gpe_lock))->rlock){-.....}, at: [] acpi_os_acquire_lock+0xe/0x10
> > [    0.510990]
> > [    0.510990] stack backtrace:
> > [    0.511004] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 3.18.0-rc4-next-20141117-07404-g9dad2ab6df8b #66
> > [    0.511021] Hardware name: LENOVO 3460CC6/3460CC6, BIOS G6ET93WW (2.53 ) 02/04/2013
> > [    0.511035]  ffffffff82cb2f70 ffff88011e2c3bb8 ffffffff81afc316 0000000000000011
> > [    0.511055]  ffffffff82cb2f70 ffff88011e2c3c08 ffffffff81afae11 0000000000000001
> > [    0.511074]  ffff88011e2c3c68 ffff88011e2c3c08 ffff8801193f92d0 ffff8801193f9b20
> > [    0.511094] Call Trace:
> > [    0.511101]  [] dump_stack+0x4c/0x6e
> > [    0.511125]  [] print_circular_bug+0x2b2/0x2c3
> > [    0.511142]  [] __lock_acquire+0x210f/0x2220
> > [    0.511159]  [] lock_acquire+0xdf/0x2d0
> > [    0.511176]  [] ? acpi_ec_gpe_handler+0x21/0xfc
> > [    0.511192]  [] _raw_spin_lock_irqsave+0x50/0x70
> > [    0.511209]  [] ? acpi_ec_gpe_handler+0x21/0xfc
> > [    0.511225]  [] ? acpi_hw_write+0x4b/0x52
> > [    0.511241]  [] acpi_ec_gpe_handler+0x21/0xfc
> > [    0.511258]  [] acpi_ev_gpe_dispatch+0xd2/0x143
> > [    0.511274]  [] acpi_ev_gpe_detect+0xc8/0x10f
> > [    0.511292]  [] acpi_ev_sci_xrupt_handler+0x22/0x38
> > [    0.511309]  [] acpi_irq+0x16/0x31
> > [    0.511325]  [] handle_irq_event_percpu+0x6f/0x540
> > [    0.511342]  [] handle_irq_event+0x41/0x70
> > [    0.511357]  [] ? handle_fasteoi_irq+0x28/0x140
> > [    0.511372]  [] handle_fasteoi_irq+0x86/0x140
> > [    0.511388]  [] handle_irq+0x22/0x40
> > [    0.511402]  [] do_IRQ+0x4f/0xf0
> > [    0.511417]  [] common_interrupt+0x72/0x72
> > [    0.511428]  [] ? native_safe_halt+0x6/0x10
> > [    0.511454]  [] ? trace_hardirqs_on+0xd/0x10
> > [    0.511468]  [] default_idle+0x23/0x260
> > [    0.511482]  [] arch_cpu_idle+0xf/0x20
> > [    0.511496]  [] cpu_startup_entry+0x36b/0x5b0
> > [    0.511512]  [] start_secondary+0x1a4/0x1d0
> >
>
> --
> I speak only for myself.
> Rafael J. Wysocki, Intel Open Source Technology Center.
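
For reference, the lock-order inversion in the quoted report can be modeled outside the kernel. The following is only a toy sketch in Python, not kernel or lockdep code: the LockOrderTracker class and its acquire() method are hypothetical names made up for illustration. It records "lock B taken while lock A is held" edges and flags an acquisition that closes a cycle. It assumes, as chain #1 of the report suggests, that acpi_ec_start() calls acpi_enable_gpe() with ec->lock held.

```python
from collections import defaultdict


class LockOrderTracker:
    """Toy model of lock-order tracking (hypothetical; not the kernel's lockdep)."""

    def __init__(self):
        # edges[a] holds every lock that was acquired while a was held.
        self.edges = defaultdict(set)

    def _reachable(self, src, dst):
        # Iterative depth-first search over the recorded order edges.
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.edges[node])
        return False

    def acquire(self, held, new):
        """Record 'new taken while held'; return True if this closes a cycle."""
        inversion = self._reachable(new, held)
        self.edges[held].add(new)
        return inversion


tracker = LockOrderTracker()

# Chain #1: acpi_ec_start() -> acpi_enable_gpe() takes acpi_gbl_gpe_lock.
# Assumption: ec->lock is already held at that point.
boot_path = tracker.acquire(held="ec->lock", new="acpi_gbl_gpe_lock")

# Chain #0: the GPE detect/dispatch path holds acpi_gbl_gpe_lock and
# acpi_ec_gpe_handler() then takes ec->lock.
irq_path = tracker.acquire(held="acpi_gbl_gpe_lock", new="ec->lock")

print(boot_path, irq_path)
```

In this model the first acquisition is accepted and the second one is reported as an inversion, which matches the order of events in the report: the boot path establishes ec->lock before acpi_gbl_gpe_lock, and the interrupt path later asks for the opposite order. Removing either edge (not holding ec->lock around acpi_enable_gpe(), or not holding the GPE lock while the handler runs) breaks the cycle.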