Subject: Re: linux-next: xfs metadata corruption since 30 March
From: Qian Cai
Date: Tue, 31 Mar 2020 22:13:42 -0400
To: Dave Chinner
Cc: "Darrick J. Wong", Christoph Hellwig, linux-xfs@vger.kernel.org, LKML
Message-Id: <05FB019A-F4DC-414C-B8D9-D2735AF22034@lca.pw>
In-Reply-To: <20200331221324.GZ10776@dread.disaster.area>
References: <990EDC4E-1A4E-4AC3-84D9-078ACF5EB9CC@lca.pw> <20200331221324.GZ10776@dread.disaster.area>

> On Mar 31, 2020, at 6:13 PM, Dave Chinner wrote:
>
> On Tue, Mar 31, 2020 at 05:57:24PM -0400, Qian Cai wrote:
>> Ever since two days ago, linux-next starts to trigger xfs metadata corruption
>> during compilation workloads on both powerpc and arm64,
>
> Is this on an existing filesystem, or a new filesystem?

New.

>
>> I suspect it could be one of those commits,
>>
>> https://lore.kernel.org/linux-xfs/20200328182533.GM29339@magnolia/
>>
>> Especially, those commits that would mark corruption more aggressively?
>>
>> [8d57c21600a5] xfs: add a function to deal with corrupt buffers post-verifiers
>> [e83cf875d67a] xfs: xfs_buf_corruption_error should take __this_address
>> [ce99494c9699] xfs: fix buffer corruption reporting when xfs_dir3_free_header_check fails
>> [1cb5deb5bc09] xfs: don't ever return a stale pointer from __xfs_dir3_free_read
>> [6fb5aac73310] xfs: check owner of dir3 free blocks
>> [a10c21ed5d52] xfs: check owner of dir3 data blocks
>> [1b2c1a63b678] xfs: check owner of dir3 blocks
>> [2e107cf869ee] xfs: mark dir corrupt when lookup-by-hash fails
>> [806d3909a57e] xfs: mark extended attr corrupt when lookup-by-hash fails
>
> Doubt it - they only add extra detection code and these:
>
>> [29331.182313][ T665] XFS (dm-2): Metadata corruption detected at xfs_inode_buf_verify+0x2b8/0x350 [xfs], xfs_inode block 0xa9b97900 xfs_inode_buf_verify
>> xfs_inode_buf_verify at fs/xfs/libxfs/xfs_inode_buf.c:101
>> [29331.182373][ T665] XFS (dm-2): Unmount and run xfs_repair
>> [29331.182386][ T665] XFS (dm-2): First 128 bytes of corrupted metadata buffer:
>> [29331.182402][ T665] 00000000: 2f 2a 20 53 50 44 58 2d 4c 69 63 65 6e 73 65 2d  /* SPDX-License-
>> [29331.182426][ T665] 00000010: 49 64 65 6e 74 69 66 69 65 72 3a 20 47 50 4c 2d  Identifier: GPL-
>
> Would get caught by the existing verifiers as they aren't valid
> metadata at all.
>
> Basically, you are getting file data where there should be inode
> metadata. First thing to do is fix the existing corruptions with
> xfs_repair - please post the entire output so we can see what was
> corruption and what it fixed.

# xfs_repair -v /dev/mapper/rhel_hpe--apollo--cn99xx--11-home
Phase 1 - find and verify superblock...
        - block cache size set to 4355512 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 793608 tail block 786824
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed. Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair. If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.

# mount /dev/mapper/rhel_hpe--apollo--cn99xx--11-home /home/
# umount /home/
# xfs_repair -v /dev/mapper/rhel_hpe--apollo--cn99xx--11-home
Phase 1 - find and verify superblock...
        - block cache size set to 4355512 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 793624 tail block 793624
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 2
        - agno = 1
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...

        XFS_REPAIR Summary    Tue Mar 31 22:10:54 2020

Phase           Start           End             Duration
Phase 1:        03/31 22:10:45  03/31 22:10:45
Phase 2:        03/31 22:10:45  03/31 22:10:45
Phase 3:        03/31 22:10:45  03/31 22:10:46  1 second
Phase 4:        03/31 22:10:46  03/31 22:10:53  7 seconds
Phase 5:        03/31 22:10:53  03/31 22:10:53
Phase 6:        03/31 22:10:53  03/31 22:10:53
Phase 7:        03/31 22:10:53  03/31 22:10:53

Total run time: 8 seconds
done

>
> Then if the problem is still reproducible, I suspect you are going
> to have to bisect it. i.e. run test, get corruption, mark bisect
> bad, run xfs_repair or mkfs to fix mess, install new kernel, run
> test again....
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
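
A side note on the hex dump quoted earlier in this thread: the sketch below decodes the two logged rows and compares the first two bytes against the on-disk inode magic "IN" (0x494e). It is only an illustration of why the buffer reads as file data. The helper name looks_like_inode is hypothetical, and the check mirrors just the magic-number test, not everything xfs_inode_buf_verify does.

```python
# Bytes copied from the "First 128 bytes of corrupted metadata buffer" dump
# quoted in this thread (only the two rows that were shown there).
DUMP_ROWS = [
    "2f 2a 20 53 50 44 58 2d 4c 69 63 65 6e 73 65 2d",
    "49 64 65 6e 74 69 66 69 65 72 3a 20 47 50 4c 2d",
]

# On-disk XFS inode magic: the ASCII characters "IN" (0x494e).
XFS_DINODE_MAGIC = b"IN"


def decode_rows(rows):
    """Turn rows of space-separated hex bytes into one bytes object."""
    return b"".join(bytes.fromhex(row) for row in rows)


def looks_like_inode(buf):
    """Hypothetical helper: checks only the 2-byte magic at offset 0."""
    return buf[:2] == XFS_DINODE_MAGIC


buf = decode_rows(DUMP_ROWS)
print(buf.decode("ascii"))
print("inode magic present:", looks_like_inode(buf))
```

The decoded text is the start of a source-file license comment, which fits Dave's reading that file data ended up where inode metadata was expected.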