Date: Thu, 27 Apr 2023 10:20:28 +0800
From: Ming Lei <ming.lei@redhat.com>
To: Theodore Ts'o, linux-ext4@vger.kernel.org
Cc: ming.lei@redhat.com, Andreas Dilger, linux-block@vger.kernel.org,
    Andrew Morton, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
    Dave Chinner, Eric Sandeen, Christoph Hellwig, Zhang Yi
Subject: [ext4 io hang] buffered write io hang in balance_dirty_pages

Hello Guys,

I got a report in which buffered write IO hangs in balance_dirty_pages
after an NVMe block device is physically unplugged, and umount can't
succeed afterwards.
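For reference, the stacks of all tasks stuck in uninterruptible sleep
can be dumped via sysrq-w (assuming CONFIG_MAGIC_SYSRQ is enabled),
which is a quick way to see whether umount is blocked in the same way:

    # dump stacks of all blocked (D state) tasks into the kernel log
    echo w > /proc/sysrq-trigger
    dmesg | tail -n 80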
It turns out to be a long-standing issue: it can be triggered at least
from v5.14 up to the latest v6.3, and it can be reproduced reliably in a
KVM guest:

1) run the following script inside the guest:

    mkfs.ext4 -F /dev/nvme0n1
    mount /dev/nvme0n1 /mnt
    dd if=/dev/zero of=/mnt/z.img &
    sleep 10
    echo 1 > /sys/block/nvme0n1/device/device/remove

2) the dd hang is observed, and /dev/nvme0n1 is actually gone:

    [root@ktest-09 ~]# ps -ax | grep dd
       1348 pts/0    D      0:33 dd if=/dev/zero of=/mnt/z.img
       1365 pts/0    S+     0:00 grep --color=auto dd

    [root@ktest-09 ~]# cat /proc/1348/stack
    [<0>] balance_dirty_pages+0x649/0x2500
    [<0>] balance_dirty_pages_ratelimited_flags+0x4c6/0x5d0
    [<0>] generic_perform_write+0x310/0x4c0
    [<0>] ext4_buffered_write_iter+0x130/0x2c0 [ext4]
    [<0>] new_sync_write+0x28e/0x4a0
    [<0>] vfs_write+0x62a/0x920
    [<0>] ksys_write+0xf9/0x1d0
    [<0>] do_syscall_64+0x59/0x90
    [<0>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

    [root@ktest-09 ~]# lsblk | grep nvme
    [root@ktest-09 ~]#

BTW, my VM has 2GB of RAM, and the NVMe disk size is 40GB.

So far this is only observed on ext4; I haven't seen it on XFS. I guess
it isn't related to the disk type; I haven't tried the test on other
types of disks yet, but will do.

It seems like dirty pages aren't cleaned after the ext4 bio fails in
this situation?

Thanks,
Ming
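P.S. A rough way to check the "dirty pages aren't cleaned" theory is to
watch the global counters from a second shell while dd is blocked and
see whether Dirty ever drops:

    # print the Dirty/Writeback counters once per second
    while :; do grep -E '^(Dirty|Writeback):' /proc/meminfo; sleep 1; done

If debugfs is mounted and the bdi is still registered after the unplug,
the per-device numbers under /sys/kernel/debug/bdi/<major:minor>/stats
give the same kind of picture.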