Date: Mon, 7 Oct 2019 12:08:38 +0200
From: Jan Kara
To: Jens Axboe
Cc: Mika Westerberg, "Rafael J. Wysocki", Tejun Heo, "Rafael J. Wysocki",
    AceLan Kao, Jan Kara, Greg Kroah-Hartman, linux-kernel@vger.kernel.org
Subject: Re: System hangs if NVMe/SSD is removed during suspend
Message-ID: <20191007100838.GA24366@quack2.suse.cz>
In-Reply-To: <99b3ffb8-4205-9795-a48a-09125f5fceec@kernel.dk>

On Fri 04-10-19 07:32:40, Jens Axboe wrote:
> On 10/4/19 5:01 AM, Mika Westerberg wrote:
> > On Fri, Oct 04, 2019 at 11:59:26AM +0200, Rafael J. Wysocki wrote:
> >> On Friday, October 4, 2019 10:03:40 AM CEST Mika Westerberg wrote:
> >>> On Thu, Oct 03, 2019 at 09:50:33AM -0700, Tejun Heo wrote:
> >>>> Hello, Mika.
> >>>>
> >>>> On Wed, Oct 02, 2019 at 03:21:36PM +0300, Mika Westerberg wrote:
> >>>>> but from that discussion I don't see more generic solution to be
> >>>>> implemented.
> >>>>>
> >>>>> Any ideas we should fix this properly?
> >>>>
> >>>> Yeah, the only fix I can think of is not using freezable wq. It's
> >>>> just not a good idea and not all that difficult to avoid using.
> >>>
> >>> OK, thanks.
> >>>
> >>> In that case I will just make a patch that removes WQ_FREEZABLE from
> >>> bdi_wq and see what people think about it :)
> >>
> >> I guess that depends on why WQ_FREEZABLE was added to it in the first place. :-)
> >>
> >> The reason might be to avoid writes to persistent storage after creating an
> >> image during hibernation, since wqs remain frozen throughout the entire
> >> hibernation including the image saving phase.
> >
> > Good point.
> >
> >> Arguably, making the wq freezable is kind of a sledgehammer approach to that
> >> particular issue, but in principle it may prevent data corruption from
> >> occurring, so be careful there.
> >
> > I tried to find the commit that introduced the "freezing" and I think it
> > is this one:
> >
> >   03ba3782e8dc writeback: switch to per-bdi threads for flushing data
> >
> > Unfortunately from that commit it is not clear (at least to me) why it
> > calls set_freezable() for the bdi task. It does not look like it has
> > anything to do with blocking writes to storage while entering
> > hibernation but I may be mistaken.
>
> Wow, a decade ago...
>
> Honestly, I don't recall why these were marked freezable, and as I wrote
> in the other reply, I don't think there's a good reason for that to be
> the case.

Well, can't it happen that the flush worker gets stuck in D state because
some subsystem is already suspended, and thus hibernation fails (because
AFAIK processes in uninterruptible sleep block hibernation)?

I was also somewhat worried that the hibernation image could be
inconsistent if flush workers do something while the hibernation image is
being taken, but that does not seem to be a valid concern, as all kernel
processes get frozen before the hibernation image is taken.

								Honza
-- 
Jan Kara
SUSE Labs, CR