Date: Fri, 10 Feb 2012 12:46:52 -0800
From: Tejun Heo
To: Alan Stern
Cc: Jens Axboe, "Rafael J. Wysocki", Linux-pm mailing list, Kernel development list
Subject: Re: Bug in disk event polling
Message-ID: <20120210204652.GJ19392@google.com>

(cc'ing Rafael)

Hello, Alan.

On Fri, Feb 10, 2012 at 03:31:20PM -0500, Alan Stern wrote:
> Don't ask me why this hasn't shown up earlier...  There's a big fat bug
> in the implementation of disk event polling.
>
> The polling is done using the system_nrt_wq work queue, which isn't
> freezable.  As a result, polling continues while the system is
> preparing for suspend or hibernation.
>
> Obviously I/O to suspended devices doesn't work well.  Somewhat less
> obviously, error recovery for the failed I/O attempts can interfere
> with normal system resume.

Hmmm.... I see.  Yeah, that can be a problem.

> You can see this for yourself easily enough by suspending or
> hibernating while a USB flash drive is plugged in.  You don't even need
> to go through the full suspend procedure; the first two stages are
> enough (echo devices >/sys/power/pm_test).  Check the system log
> afterward; most likely you'll find the flash drive got errors and had
> to be unregistered and re-enumerated.

Do you happen to have a log of such a failure?  Polling failure itself
shouldn't lead to such a failure mode.

> I have verified that changing all occurrences of system_nrt_wq in
> block/genhd.c to system_freezable_wq fixes the bug.  However this may
> not be the way you want to solve it; you may prefer to have a freezable
> non-reentrant work queue.

Please feel free to send out a patch to fix the issue. :)

Thanks.

--
tejun
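
For readers who want to see the reported failure mode in isolation, here is
a rough user-space sketch in Python.  It is not kernel code and does not use
the kernel workqueue API: ToyWorkqueue, ToyDisk and simulate() are
hypothetical names invented for this illustration.  It only models the idea
discussed in this message, that a self-rearming poll on a non-freezable
queue keeps touching a suspended device, while the same poll on a freezable
queue is held back until thaw.

```python
class ToyWorkqueue:
    """User-space stand-in for a workqueue (hypothetical, not the kernel API)."""

    def __init__(self, name, freezable):
        self.name = name
        self.freezable = freezable
        self.frozen = False
        self.pending = []

    def queue(self, work):
        self.pending.append(work)

    def freeze(self):
        # Only a freezable queue stops running work while suspending.
        if self.freezable:
            self.frozen = True

    def thaw(self):
        self.frozen = False

    def run_pending(self):
        """Run the items queued so far and return how many ran."""
        if self.frozen:
            return 0
        batch, self.pending = self.pending, []
        for work in batch:
            work()
        return len(batch)


class ToyDisk:
    """Counts polls and the polls that hit the device while it is suspended."""

    def __init__(self):
        self.suspended = False
        self.polls = 0
        self.io_errors = 0

    def poll_events(self):
        self.polls += 1
        if self.suspended:
            # I/O to a suspended device fails in this toy model.
            self.io_errors += 1


def simulate(freezable):
    """Return (polls, io_errors) for one suspend/resume cycle."""
    wq = ToyWorkqueue("toy_wq", freezable)
    disk = ToyDisk()

    def poll_work():
        disk.poll_events()
        wq.queue(poll_work)  # periodic polling: re-arm after each run

    wq.queue(poll_work)
    wq.run_pending()         # one poll during normal operation

    wq.freeze()              # system starts preparing for suspend
    disk.suspended = True
    wq.run_pending()         # two timer ticks while suspended
    wq.run_pending()

    disk.suspended = False   # resume
    wq.thaw()
    wq.run_pending()         # one poll after resume
    return disk.polls, disk.io_errors


if __name__ == "__main__":
    print("non-freezable:", simulate(freezable=False))
    print("freezable:    ", simulate(freezable=True))
```

In this toy model the non-freezable queue polls the suspended disk twice,
and the freezable queue does not poll it at all until after thaw.  Whether
the real error recovery then interferes with resume is not modelled here.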