Date: Fri, 28 Apr 2023 09:41:15 +0800
From: Ming Lei
To: Baokun Li
Cc: Matthew Wilcox, Theodore Ts'o, linux-ext4@vger.kernel.org,
 Andreas Dilger, linux-block@vger.kernel.org, Andrew Morton,
 linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, Dave Chinner,
 Eric Sandeen, Christoph Hellwig, Zhang Yi, yangerkun, ming.lei@redhat.com
Subject: Re: [ext4 io hang] buffered write io hang in balance_dirty_pages
References: <663b10eb-4b61-c445-c07c-90c99f629c74@huawei.com>
X-Mailing-List: linux-ext4@vger.kernel.org

On Thu, Apr 27, 2023 at 07:27:04PM +0800, Ming Lei wrote:
> On Thu, Apr 27, 2023 at 07:19:35PM +0800, Baokun Li wrote:
> > On 2023/4/27 18:01, Ming Lei wrote:
> > > On Thu, Apr 27, 2023 at 02:36:51PM +0800, Baokun Li wrote:
> > > > On 2023/4/27 12:50, Ming Lei wrote:
> > > > > Hello Matthew,
> > > > >
> > > > > On Thu, Apr 27, 2023 at 04:58:36AM +0100, Matthew Wilcox wrote:
> > > > > > On Thu, Apr 27, 2023 at 10:20:28AM +0800, Ming Lei wrote:
> > > > > > > Hello Guys,
> > > > > > >
> > > > > > > I got one report in which buffered write IO hangs in balance_dirty_pages,
> > > > > > > after one nvme block device is unplugged physically, then umount can't
> > > > > > > succeed.
> > > > > > That's a feature, not a bug ... the dd should continue indefinitely?
> > > > > Can you explain what the feature is? And not see such 'issue' or 'feature'
> > > > > on xfs.
> > > > >
> > > > > The device has been gone, so IMO it is reasonable to see FS buffered write IO
> > > > > failed. Actually dmesg has shown that 'EXT4-fs (nvme0n1): Remounting
> > > > > filesystem read-only'. Seems these things may confuse user.
> > > >
> > > > The reason for this difference is that ext4 and xfs handle errors
> > > > differently.
> > > >
> > > > ext4 remounts the filesystem as read-only or even just continues, vfs_write
> > > > does not check for these.
> > > vfs_write may not find anything wrong, but ext4 remount could see that
> > > disk is gone, which might happen during or after remount, however.
> > >
> > > > xfs shuts down the filesystem, so it returns a failure at
> > > > xfs_file_write_iter when it finds an error.
> > > >
> > > >
> > > > ``` ext4
> > > > ksys_write
> > > >  vfs_write
> > > >   ext4_file_write_iter
> > > >    ext4_buffered_write_iter
> > > >     ext4_write_checks
> > > >      file_modified
> > > >       file_modified_flags
> > > >        __file_update_time
> > > >         inode_update_time
> > > >          generic_update_time
> > > >           __mark_inode_dirty
> > > >            ext4_dirty_inode ---> 2. void func, No propagating errors out
> > > >             __ext4_journal_start_sb
> > > >              ext4_journal_check_start ---> 1. Error found, remount-ro
> > > >     generic_perform_write ---> 3. No error sensed, continue
> > > >      balance_dirty_pages_ratelimited
> > > >       balance_dirty_pages_ratelimited_flags
> > > >        balance_dirty_pages
> > > >         // 4. Sleeping waiting for dirty pages to be freed
> > > >         __set_current_state(TASK_KILLABLE)
> > > >         io_schedule_timeout(pause);
> > > > ```
> > > >
> > > > ``` xfs
> > > > ksys_write
> > > >  vfs_write
> > > >   xfs_file_write_iter
> > > >    if (xfs_is_shutdown(ip->i_mount))
> > > >      return -EIO;    ---> dd fail
> > > > ```
> > > Thanks for the info which is really helpful for me to understand the
> > > problem.
> > >
> > > > > > balance_dirty_pages() is sleeping in KILLABLE state, so kill -9 of
> > > > > > the dd process should succeed.
> > > > > Yeah, dd can be killed, however it may be any application(s), :-)
> > > > >
> > > > > Fortunately it won't cause trouble during reboot/power off, given
> > > > > userspace will be killed at that time.
> > > > >
> > > > > Thanks,
> > > > > Ming
> > > > >
> > > > Don't worry about that, we always set the current thread to TASK_KILLABLE
> > > > while waiting in balance_dirty_pages().
> > > I have another concern, if 'dd' isn't killed, dirty pages won't be cleaned, and
> > > these (big amount)memory becomes not usable, and typical scenario could be USB HDD
> > > unplugged.
> > >
> > >
> > > thanks,
> > > Ming
> > Yes, it is unreasonable to continue writing data with the previously opened
> > fd after the file system becomes read-only, resulting in dirty page
> > accumulation.
> >
> > I provided a patch in another reply.
> > Could you help test if it can solve your problem?
> > If it can indeed solve your problem, I will officially send it to the email
> > list.
>
> OK, I will test it tomorrow.

Your patch avoids the dd hang when bs is the default 512, but if bs is
increased to 1G and more 'dd' tasks are started, the dd hang can still
be observed.

The reason should be the paragraph I posted earlier, quoted below.

Another question is whether remounting read-only makes sense on a dead
disk at all. Yes, the block layer doesn't export an interface for querying
whether a bdev is dead. However, I think it is reasonable to export such
an interface if the FS needs it.

>
> But I am afraid if it can avoid the issue completely because the
> old write task hang in balance_dirty_pages() may still write/dirty pages
> if it is one very big size write IO.

thanks,
Ming
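
For readers following the thread, the difference between the two write paths traced above can be modelled with a small userspace sketch. This is not kernel code and not the patch under discussion: `toy_fs`, `ext4_like_dirty_inode`, `ext4_like_write` and `xfs_like_write` are hypothetical names made up for illustration, and the model only captures two points from the traces, namely that a `void` helper cannot report the error it detects, and that an explicit shutdown check at the top of the write path fails the write early.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model, not kernel code. All names are hypothetical. */
struct toy_fs {
    bool device_gone;   /* backing device was unplugged */
    bool read_only;     /* set by the ext4-like error path ("remount-ro") */
    bool shut_down;     /* set elsewhere by the xfs-like error handling */
    size_t dirty_bytes; /* bytes dirtied in the cache, never written back */
};

/* ext4-like: the inode-dirtying helper returns void, so the error it
 * notices (step 1 in the trace) cannot reach the caller (step 2). */
void ext4_like_dirty_inode(struct toy_fs *fs)
{
    if (fs->device_gone)
        fs->read_only = true;
}

/* ext4-like write: no error is sensed, so it keeps dirtying (step 3). */
long ext4_like_write(struct toy_fs *fs, size_t len)
{
    ext4_like_dirty_inode(fs);
    fs->dirty_bytes += len;
    return (long)len;
}

/* xfs-like write: checks the shutdown flag first and fails with -EIO. */
long xfs_like_write(struct toy_fs *fs, size_t len)
{
    if (fs->shut_down)
        return -EIO;
    fs->dirty_bytes += len;
    return (long)len;
}
```

In this model an ext4-like write on a vanished device still reports success and grows `dirty_bytes`, even though `read_only` has been set, while the xfs-like write on a shut-down filesystem returns `-EIO` without dirtying anything. The throttling in balance_dirty_pages() itself is deliberately left out.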