Received: by 2002:a05:6a10:1a4d:0:0:0:0 with SMTP id nk13csp6026711pxb; Mon, 14 Feb 2022 13:26:10 -0800 (PST) X-Google-Smtp-Source: ABdhPJxwmD8xA0qmvgEa1DGFz7shT4vWwRR8ZsbVYQhh7xWDIJ0+ujOvNpHUirpp0fB4Z/rgNaYR X-Received: by 2002:a17:902:b494:: with SMTP id y20mr809095plr.167.1644873970275; Mon, 14 Feb 2022 13:26:10 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1644873970; cv=none; d=google.com; s=arc-20160816; b=D5pH1cOIiZ2HDVB1F5bsHw9GfPVy6BZsOHxm3B14mR1+MToA/POQ0tXznmkxACMpWA YwFRrlwjV9d50rcDX1APoIExAcArmDz45svyNOAHQD6Zk126t47qoe9V/m6ACvNZTDxn ikLWhEXzuuclDhVk9ym6NFFJzHREGaOo66nckN6X48sPW9n6wYLrAYYqibDMPxfBMH7s H9wLZ1H5kdl7YkojfAOH6hHczezPorkfaj3CNVCv4xwdEvszFW+Biw8fDifiYXnqpmg9 MCNaxD+GKtow8yKLYD68sMhriV7fTV/E44GGhDXqVBrxatW5M7Nzn4i4AUrRFmVCgXl5 6Wqw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-transfer-encoding :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature; bh=DHTZ4X93EBgAXhQ+1KosV3ycLyMKWBkj8b7T4xLtbKw=; b=pKIjZ0dQHukGaiCXza3DiuyBlDQmwQfIpCvVhLVQEbZWwvDgobdTAhJggyQFKUsYdB zSbdGZjbNYQYDspkbC4mQh1Z9+C/h+XW5GARafsO8AOzkrk9c7Fn5F3sJ/6yOcr/r1Sg +uzxQ1El3ojmDHxFPPhP8bR+7rENmtQ8MPEuZ7MfarANIAH2jXu5bP/XIkbufGK1su5B uVGy91fsH6/WCHyC9UUFwpPjpYCPRgEKF2e9OTuKxOkrL+3WctDHMDKkS2nC9D21KpGF eRj+FYFsSno7iN4EcMrqSVQzBeK/7/EzFqwx+ewUpani5spPBfSR6iPT2ufjlgK0b67p HF5A== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@toxicpanda-com.20210112.gappssmtp.com header.s=20210112 header.b=kAbth2wJ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net. [2620:137:e000::1:18]) by mx.google.com with ESMTPS id f65si780536pgc.524.2022.02.14.13.26.08 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 14 Feb 2022 13:26:10 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:18 as permitted sender) client-ip=2620:137:e000::1:18; Authentication-Results: mx.google.com; dkim=pass header.i=@toxicpanda-com.20210112.gappssmtp.com header.s=20210112 header.b=kAbth2wJ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 03E52B23B8; Mon, 14 Feb 2022 12:45:45 -0800 (PST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1355592AbiBNPGs (ORCPT + 99 others); Mon, 14 Feb 2022 10:06:48 -0500 Received: from mxb-00190b01.gslb.pphosted.com ([23.128.96.19]:36962 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1355579AbiBNPGr (ORCPT ); Mon, 14 Feb 2022 10:06:47 -0500 Received: from mail-qk1-x734.google.com (mail-qk1-x734.google.com [IPv6:2607:f8b0:4864:20::734]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5386249F9D for ; Mon, 14 Feb 2022 07:06:39 -0800 (PST) Received: by mail-qk1-x734.google.com with SMTP id b35so14565973qkp.6 for ; Mon, 14 Feb 2022 07:06:39 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=toxicpanda-com.20210112.gappssmtp.com; s=20210112; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:content-transfer-encoding:in-reply-to; bh=DHTZ4X93EBgAXhQ+1KosV3ycLyMKWBkj8b7T4xLtbKw=; b=kAbth2wJxr7bdox2Ck1J0RadJRt1M64cYFeZgPfW/4WTH7QLyENPaBM4m586XUHfQa MjV9Kf0FoPQ1FwUZXlsWaUuhNbFvJtEcyg1CJLagf8ErMB3pycUTSO+3P8djes/k/zd9 FHLAsBdFpuRAD9s65ikGSM1CKfD5za6peJADsCBL37jdUlTpbqAEVa6lJmQVECu8kNTw 56z+Ee6Jv3jgdOb9Z8mW2olwiHL14QIuQUbEjCwXklCEDDzmjDzOGMtONCZ/rg0KV/7q LVnRo8bdeWL/Ct2GKAVij+HMNK9B0cyoBKuSBcSpxgUmgWhwUP43XsCRZURnQWb4AyPC a5vw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:content-transfer-encoding :in-reply-to; bh=DHTZ4X93EBgAXhQ+1KosV3ycLyMKWBkj8b7T4xLtbKw=; b=VC9qtOOsFluk9ARlh91U6ly5yMqaHUvnUFDlcrd5aBmclvRPmiXLgia7QHPgxUWglT eNRPYQqNvWv+0/AU8yDkkXq0pzQzkOiMt1ioSdXhlkFa2OgjU89q0GJDuRdPh9u8viZE D3IGqAjhgZbn2RYGKVELk4HOHpNAA99tuE52b/62tLme7+F6OE3msy/YP9F/UIpB+kjW xdVO58gnECNsNsf3JjBkCo2SX+vVmq90o+ZjmPpM0CVJGWrc0XyHHhW645sgt4xBRXBq GAq9OZkuyjtRqbH3pVTzAmq+eQpHBT9uxg5uQUHxB0+29geiUYPMZYgxMZv/1HrqJkaT I+PQ== X-Gm-Message-State: AOAM533wLICorXfRyD++kVTnCb9OEB34YHdigdYc5lF0VzlPU0bua3gk xhPfiZ7ceni/tkY9LoK6P0joBQ== X-Received: by 2002:a37:9fcf:: with SMTP id i198mr29775qke.288.1644851198226; Mon, 14 Feb 2022 07:06:38 -0800 (PST) Received: from localhost (cpe-174-109-172-136.nc.res.rr.com. [174.109.172.136]) by smtp.gmail.com with ESMTPSA id bi11sm15191055qkb.18.2022.02.14.07.06.37 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 14 Feb 2022 07:06:37 -0800 (PST) Date: Mon, 14 Feb 2022 10:06:36 -0500 From: Josef Bacik To: Paul Menzel Cc: Chris Mason , David Sterba , linux-btrfs@vger.kernel.org, LKML Subject: Re: I/O errors only during shutdown and free space cache failures Message-ID: References: <404c5f3b-24c1-c442-9132-82d3212bf7f1@molgen.mpg.de> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <404c5f3b-24c1-c442-9132-82d3212bf7f1@molgen.mpg.de> X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,RDNS_NONE, SPF_HELO_NONE,T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Feb 14, 2022 at 10:11:18AM +0100, Paul Menzel wrote: > Dear Linux folks, > > > On an IBM S822LC with Ubuntu 21.10, only when shutting down the system there > are I/O errors hanging the system. I was able to capture some messages from > the end: > > ``` > [XXXXXXXXXXXX3] I/O error, dev sda, sector 1198060800 op 0x1:(WRITE) flags > 0x100000 phys_seg 1 prio class 0 > [223109.260842] BTRFS error (device sda2): bdev /dev/sda2 errs: wr 3, rd 0, > flush 0, corrupt 0, gen 0 > [223109.260878] sd 0:0:0:0: [sda] tag#15 timing out command, waited 180s > [223109.261026] I/O error, dev sda, sector 1198829952 op 0x1:(WRITE) flags > 0x104000 phys_seg 20 prio class 0 > [223109.360562] sd 0:0:0:0: [sda] tag#28 timing out command, waited 180s > [223109.360615] I/O error, dev sda, sector 1198835072 op 0x1:(WRITE) flags > 0x100000 phys_seg 17 prio class 0 > [223109.360757] sd 0:0:0:0: [sda] tag#29 timing out command, waited 180s > [223109.360778] I/O error, dev sda, sector 1198832512 op 0x1:(WRITE) flags > 0x104000 phys_seg 20 prio class 0 > [223109.360798] BTRFS error (device sda2): bdev /dev/sda2 errs: wr 4, rd 0, > flush 0, corrupt 0, gen 0 > [223179.108402] INFO: task kworker/0:2:651024 blocked for more than 120 > seconds. > [223179.108481] Not tainted 5.17.0-rc3+ #1 > [223179.108580] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables > this message. > [223179.109311] INFO: task kworker/u321:7:652803 blocked for more than 120 > seconds. > [223179.109357] Not tainted 5.17.0-rc3+ #1 > [223179.109370] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables > this message. > [223289.259960] sd 0:0:0:0: [sda] tag#24 timing out command, waited 180s > [223289.260169] I/O error, dev sda, sector 1197961728 op 0x0:(READ) flags > 0x0 phys_seg 1 prio class 0 > [223289.260192] BTRFS error (device sda2): bdev /dev/sda2 errs: wr 4, rd 1, > flush 0, corrupt 0, gen 0 > [223289.363952] sd 0:0:0:0: [sda] tag#13 timing out command, waited 180s > [223289.363958] sd 0:0:0:0: [sda] tag#16 timing out command, waited 180s > [223289.364152] I/O error, dev sda, sector 1064447232 op 0x1:(WRITE) flags > 0x4800 phys_seg 20 prio class 0 > [223289.364176] I/O error, dev sda, sector 1198837504 op 0x1:(WRITE) flags > 0x100000 phys_seg 7 prio class 0 > [223289.364182] sd 0:0:0:0: [sda] tag#14 timing out command, waited 180s > [223289.364198] BTRFS error (device sda2): bdev /dev/sda2 errs: wr 5, rd 1, > flush 0, corrupt 0, gen 0 > [223289.364364] I/O error, dev sda, sector 1064444672 op 0x1:(WRITE) flags > 0x4800 phys_seg 20 prio class 0 > [223289.364377] sd 0:0:0:0: [sda] tag#17 timing out command, waited 180s > [223289.364388] sd 0:0:0:0: [sda] tag#15 timing out command, > [228549.050771442,5] IPMI: Soft shutdown requested > waited 180s > [223289.364417] I/O error, dev sda, sector 1198838528 op 0x1:(WRITE) flags > 0x100000 phys_seg 2 prio class 0 > [223289.364421] I/O error, dev sda, sector 1064452352 op 0x1:(WRITE) flags > 0x800 phys_seg 4 prio class 0 > [223289.364438] BTRFS error (device sda2): bdev /dev/sda2 errs: wr 6, rd 1, > flush 0, corrupt 0, gen 0 > [223289.364466] sd 0:0:0:0: [sda] tag#19 timing out command, waited 180s > [223289.364498] I/O error, dev sda, sector 1064449792 op 0x1:(WRITE) flags > 0x4800 phys_seg 20 prio class 0 > [223289.364520] BTRFS error (device sda2): bdev /dev/sda2 errs: wr 7, rd 1, > flush 0, corrupt 0, gen 0 > [223299.944147] INFO: task kworker/0:2:651024 blocked for more than 241 > seconds. > [223299.944197] Not tainted 5.17.0-rc3+ #1 > [223299.944327] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables > this message. > [223299.945059] INFO: task kworker/u321:7:652803 blocked for more than 241 > seconds. > [223299.945105] Not tainted 5.17.0-rc3+ #1 > [223299.945117] "echo 0 > /proc/sys/kernel/hun[228630.461459920,5] OPAL: > Shutdown request type 0x0... > g_task_timeout_secs" disables this message. > [223303.591904] sd 1:0:0:0: [sdb] tag#23 timing out command, waited 360s > [223303.592090] I/O error, dev sdb, sector 0 op 0x1:(WRITE) flags 0x800 > phys_seg 0 prio class 0 > [223420.771542] INFO: task kworker/35:2:1077 blocked for more than 120 > seconds. > [223420.771680] Not tainted 5.17.0-rc3+ #1 > [223420.771722] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables > this message. > [223469.263372] sd 0:0:0:0: [sda] tag#31 timing out command, waited 180s > [223469.263480] I/O error, dev sda, sector 1197961728 op 0x0:(READ) flags > 0x0 phys_seg 1 prio class 0 > [223469.263600] BTRFS error (device sda2): bdev /dev/sda2 errs: wr 7, rd 2, > flush 0, corrupt 0, gen 0 > [223469.367343] sd 0:0:0:0: [sda] tag#17 timing out command, waited 180s > [223469.367346] sd 0:0:0:0: [sda] tag#19 timing out command, waited 180s > [223469.367358] sd 0:0:0:0: [sda] tag#15 timing out command, waited 180s > [223469.367382] I/O error, dev sda, sector 1198836992 op 0x1:(WRITE) flags > 0x800 phys_seg 1 prio class 0 > [223469.367390] BTRFS error (device sda2): bdev /dev/sda2 errs: wr 8, rd 2, > flush 0, corrupt 0, gen 0 > [223469.367401] I/O error, dev sda, sector 1198839168 op 0x1:(WRITE) flags > 0x100000 phys_seg 1 prio class 0 > [223469.367406] BTRFS error (device sda2): bdev /dev/sda2 errs: wr 9, rd 2, > flush 0, corrupt 0, gen 0 > [223469.367433] sd 0:0:0:0: [sda] tag#18 timing out command, waited 180s > [223469.367442] I/O error, dev sda, sector 1198839424 op 0x1:(WRITE) flags > 0x100000 phys_seg 6 prio class 0 > [223469.367445] BTRFS error (device sda2): bdev /dev/sda2 errs: wr 10, rd 2, > flush 0, corrupt 0, gen 0 > [223469.367585] sd 0:0:0:0: [sda] tag#16 timing out command, waited 180s > [223469.367637] I/O error, dev sda, sector 480640 op 0x1:(WRITE) flags > 0x1800 phys_seg 1 prio class 0 > [223469.367670] BTRFS error (device sda2): bdev /dev/sda2 errs: wr 11, rd 2, > flush 0, corrupt 0, gen 0 > [223469.367690] I/O error, dev sda, sector 1198834816 op 0x1:(WRITE) flags > 0x800 phys_seg 1 prio class 0 > [223469.367798] sd 0:0:0:0: [sda] tag#20 timing out command, waited 180s > [223469.367845] BTRFS error (device sda2): bdev /dev/sda2 errs: wr 12, rd 2, > flush 0, corrupt 0, gen 0 > [223469.367888] sd 0:0:0:0: [sda] tag#21 timing out command, waited 180s > [223469.367929] I/O error, dev sda, sector 2577792 op 0x1:(WRITE) flags > 0x1800 phys_seg 1 prio class 0 > [223469.367949] BTRFS error (device sda2): bdev /dev/sda2 errs: wr 13, rd 2, > flush 0, corrupt 0, gen 0 > [223469.367949] I/O error, dev sda, sector 1198838272 op 0x1:(WRITE) flags > 0x800 phys_seg 1 prio class 0 > [223469.367989] BTRFS error (device sda2): bdev /dev/sda2 errs: wr 14, rd 2, > flush 0, corrupt 0, gen 0 > [223469.368018] sd 0:0:0:0: [sda] tag#22 timing out command, waited 180s > [223469.368046] I/O error, dev sda, sector 1198831360 op 0x1:(WRITE) flags > 0x800 phys_seg 1 prio class 0 > [223469.368069] BTRFS error (device sda2): bdev /dev/sda2 errs: wr 15, rd 2, > flush 0, corrupt 0, gen 0 > [223469.368105] sd 0:0:0:0: [sda] tag#23 timing out command, waited 180s > [223469.368132] I/O error, dev sda, sector 1198060800 op 0x1:(WRITE) flags > 0x800 phys_seg 1 prio class 0 > [223469.368156] BTRFS error (device sda2): bdev /dev/sda2 errs: wr 16, rd 2, > flush 0, corrupt 0, gen 0 > [223469.380930] reboot: Power down > ``` > > I had to power down the system manually. There are no messages when booting > (and mounting) the HDD, which is the system drive. > > Checking the partition, checking the free space cache returns some failures: > > ``` > $ btrfs version > btrfs-progs v5.10.1 > $ sudo btrfs check --readonly --force /dev/sda2 > Opening filesystem to check... > WARNING: filesystem mounted, continuing because of --force > Checking filesystem on /dev/sda2 > UUID: 2c3dd738-785a-469b-843e-9f0ba8b47b0d > [1/7] checking root items > [2/7] checking extents > [3/7] checking free space cache > btrfs: space cache generation (4119788) does not match inode (4119782) > failed to load free space cache for block group 29360128 > btrfs: space cache generation (4119788) does not match inode (4119775) > failed to load free space cache for block group 741448089600 > btrfs: space cache generation (4119786) does not match inode (4119782) > failed to load free space cache for block group 751111766016 > btrfs: space cache generation (4119788) does not match inode (4119782) > failed to load free space cache for block group 795135180800 > btrfs: space cache generation (4119788) does not match inode (4119782) > failed to load free space cache for block group 796208922624 > btrfs: space cache generation (4119788) does not match inode (4119782) > failed to load free space cache for block group 803725115392 > btrfs: space cache generation (4119788) does not match inode (4119782) > failed to load free space cache for block group 804798857216 > btrfs: space cache generation (4119788) does not match inode (4119782) > failed to load free space cache for block group 808020082688 > btrfs: space cache generation (4119788) does not match inode (4119782) > failed to load free space cache for block group 811241308160 > [4/7] checking fs roots > [5/7] checking only csums items (without verifying data) > [6/7] checking root refs > [7/7] checking quota groups skipped (not enabled on this FS) > found 604326461440 bytes used, no error found > total csum bytes: 35109100 > total tree bytes: 7996112896 > total fs tree bytes: 7701987328 > total extent tree bytes: 231276544 > btree space waste bytes: 1615676149 > file data blocks allocated: 818525175808 > referenced 724324974592 > ``` > > Is that related? Is there a way to fix the system partition during runtime? > (Otherwise I’d need to boot some live system or take the drive out.) > These messages are just informational, so you know it's going to throw out the free space cache and re-create it from the extent tree. These are common if there's an unclean shutdown. Just as a general recommendation I'd suggest switching to -o space_cache=v2, which is the new default and better for performance in general. That being said the disk is having trouble, I'd replace it. The fs is fine so you should be good to get everything off of it. Thanks, Josef