Date: Thu, 4 Aug 2022 10:45:44 -0400
From: "Theodore Ts'o"
To: bugzilla-daemon@kernel.org
Cc: linux-ext4@vger.kernel.org
Subject: Re: [Bug 216322] Freezing of tasks failed after 60.004 seconds (1 tasks refusing to freeze... task:fstrim ext4_trim_fs - Dell XPS 13 9310

On Thu, Aug 04, 2022 at 11:47:47AM +0000, bugzilla-daemon@kernel.org wrote:
>
> I agree that the FITRIM interface is flawed in this way. But
> ext4_try_to_trim_range() actually does have fatal_signal_pending() and
> will return -ERESTARTSYS if that's true. Or did you have something else in
> mind?

The fatal_signal_pending() only checks for SIGKILL.  I'm not sure why
it returns ERESTARTSYS, since that's not applicable for a kill -9
signal.  The fake_signal_wake_up() function in kernel/freezer.c
doesn't send a fatal signal, so the fatal_signal_pending() check isn't
going to help here.

> Also in that case, I see no reason why we would not be able to adjust
> the fstrim_range to make it easier to re-start where we left off if
> we're going to return -ERESTARTSYS. I am missing something?

Well, we could adjust fstrim_range.start and fstrim_range.len to make
it easier to restart --- but that's only if we know for sure that
we're going to be restarting the system call.
So we would need to break some abstraction barriers: if the signal
is one where, based on the sigaction flags, the system call is *not*
restarted, then fstrim_range.len is supposed to contain the number of
bytes trimmed.  And even if the system call is restarted, there's no
place to stash the number of bytes trimmed so far, since
fstrim_range.len is overloaded.  This is why the interface is so
horrible...

> I have not had time to look deeply into the traces, but are you actually
> sure that we're not stuck in blkdev_issue_discard() instead?

I'm not 100% certain, but unless the block device has been put to
sleep first (in which case I think we would have noticed much sooner,
since lots of other suspend-to-ram use cases would be failing --- in
writeback threads, for example), I'd be really surprised if we're
getting stuck there.  Even when we need to wait for the queue to be
drained so there is space to send the next discard, that shouldn't
take 60+ seconds.

					- Ted
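
For illustration only, here is a rough userspace sketch of the
fstrim_range.len overloading discussed above. It is not taken from
fstrim or from any kernel patch. The helper names trim_fn, do_fitrim
and trim_in_chunks are hypothetical; only FITRIM and struct
fstrim_range come from <linux/fs.h>. The idea is that a caller which
wants a resumable trim has to keep its own "next offset" and "bytes
trimmed so far", because on return the kernel overwrites len with the
number of bytes trimmed.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/fs.h>   /* FITRIM, struct fstrim_range */

/* Hypothetical callback type, so the ioctl can be swapped out for a dry run. */
typedef int (*trim_fn)(int fd, struct fstrim_range *r);

/* One real FITRIM call; fd must refer to a mounted filesystem. */
static int do_fitrim(int fd, struct fstrim_range *r)
{
	return ioctl(fd, FITRIM, r);
}

/*
 * Trim `len` bytes starting at `start`, at most `chunk` bytes per call.
 * The caller should pass a real length (e.g. the filesystem size), not
 * UINT64_MAX, since this loop does not know where the filesystem ends.
 *
 * *next    - offset to resume from if we stop early
 * *trimmed - sum of the byte counts the kernel reported back in len
 *
 * Returns 0 when the whole range was processed, -1 with errno set otherwise.
 */
static int trim_in_chunks(int fd, uint64_t start, uint64_t len,
			  uint64_t chunk, uint64_t minlen, trim_fn trim,
			  uint64_t *next, uint64_t *trimmed)
{
	uint64_t remaining = len;

	*next = start;
	*trimmed = 0;
	if (chunk == 0) {
		errno = EINVAL;
		return -1;
	}
	while (remaining > 0) {
		uint64_t want = remaining < chunk ? remaining : chunk;
		struct fstrim_range r = {
			.start = *next,
			.len = want,      /* in: bytes to consider */
			.minlen = minlen,
		};

		if (trim(fd, &r) < 0)
			return -1;        /* *next still marks the resume point */
		*trimmed += r.len;        /* out: bytes actually trimmed */
		*next += want;            /* advance by what we asked for */
		remaining -= want;
	}
	return 0;
}
```

A caller would pass do_fitrim as the trim argument and, after a
failure, call again with start set to the returned *next. Each chunk
is then a short system call, which is one possible way to sidestep the
restart problem from userspace rather than inside the ioctl.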