Date: Sat, 1 Apr 2023 07:46:27 +1100
From: Dave Chinner
To: "Darrick J. Wong"
Cc: Aleksandr Nogikh, syzbot, linux-fsdevel@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-xfs@vger.kernel.org,
 syzkaller-bugs@googlegroups.com
Subject: Re: [syzbot] [xfs?] WARNING in xfs_bmap_extents_to_btree
Message-ID: <20230331204627.GH3223426@dread.disaster.area>
References: <0000000000003da76805f8021fb5@google.com>
 <20230330012750.GF3223426@dread.disaster.area>
 <20230330224302.GG3223426@dread.disaster.area>
 <20230331012537.GC4126677@frogsfrogsfrogs>
In-Reply-To: <20230331012537.GC4126677@frogsfrogsfrogs>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Mar 30, 2023 at 06:25:37PM -0700, Darrick J. Wong wrote:
> On Fri, Mar 31, 2023 at 09:43:02AM +1100, Dave Chinner wrote:
> > On Thu, Mar 30, 2023 at 10:52:37AM +0200, Aleksandr Nogikh wrote:
> > > On Thu, Mar 30, 2023 at 3:27 AM 'Dave Chinner' via syzkaller-bugs
> > > wrote:
> > > >
> > > > On Tue, Mar 28, 2023 at 09:08:01PM -0700, syzbot wrote:
> > > > > Hello,
> > > > >
> > > > > syzbot found the following issue on:
> > > > >
> > > > > HEAD commit:    1e760fa3596e Merge tag 'gfs2-v6.3-rc3-fix' of git://git.ke..
> > > > > git tree:       upstream
> > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=16f83651c80000
> > > > > kernel config:  https://syzkaller.appspot.com/x/.config?x=acdb62bf488a8fe5
> > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=0c383e46e9b4827b01b1
> > > > > compiler:       Debian clang version 15.0.7, GNU ld (GNU Binutils for Debian) 2.35.2
> > > > >
> > > > > Unfortunately, I don't have any reproducer for this issue yet.
> > > > >
> > > > > Downloadable assets:
> > > > > disk image:   https://storage.googleapis.com/syzbot-assets/17229b6e6fe0/disk-1e760fa3.raw.xz
> > > > > vmlinux:      https://storage.googleapis.com/syzbot-assets/69b5d310fba0/vmlinux-1e760fa3.xz
> > > > > kernel image: https://storage.googleapis.com/syzbot-assets/0c65624aace9/bzImage-1e760fa3.xz
> > > > >
> > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > > > Reported-by: syzbot+0c383e46e9b4827b01b1@syzkaller.appspotmail.com
> > > > >
> > > > > ------------[ cut here ]------------
> > > > > WARNING: CPU: 1 PID: 24101 at fs/xfs/libxfs/xfs_bmap.c:660 xfs_bmap_extents_to_btree+0xe1b/0x1190
> > > >
> > > > Allocation got an unexpected ENOSPC when it was supposed to have a
> > > > valid reservation for the space. Likely because of an inconsistency
> > > > that had been induced into the filesystem where superblock space
> > > > accounting doesn't exactly match the AG space accounting and/or the
> > > > tracked free space.
> > > >
> > > > Given this is a maliciously corrupted filesystem image, this sort of
> > > > warning is expected and there's probably nothing we can do to avoid
> > > > it short of a full filesystem verification pass during mount.
> > > > That's not a viable solution, so I think we should just ignore
> > > > syzbot when it generates this sort of warning....
> > >
> > > If it's not a warning about a kernel bug, then WARN_ON should probably
> > > be replaced by some more suitable reporting mechanism. Kernel coding
> > > style document explicitly says:
> > >
> > > "WARN*() must not be used for a condition that is expected to trigger
> > > easily, for example, by user space actions.
> >
> > That's exactly the case here. It should *never* happen in normal
> > production workloads, and if it does then we have the *potential*
> > for silent data loss occurring. That's *exactly* the sort of thing
> > we should be warning admins about in no uncertain terms.
> > Also, we use WARN_ON_ONCE(), so it's not going to spam the logs.
> >
> > syzbot is a malicious program - it is injecting broken stuff into
> > the kernel as root to try to trigger situations like this. That
> > doesn't make a warning it triggers bad or incorrect - syzbot is
> > perturbing tightly coupled structures in a way that makes the
> > information shared across those structures inconsistent and
> > eventually the code is going to trip over that inconsistency.
> >
> > IOWs, once someone has used root permissions to mount a maliciously
> > crafted filesystem image, *all bets are off*. The machine is running
> > a potentially compromised kernel at this point. Hence it is almost
> > guaranteed that at some point the kernel is going to discover things
> > are *badly wrong* and start dumping "this should never happen!"
> > warnings into the logs. That's what the warnings are supposed to do,
> > and the fact that syzbot can trigger them doesn't make the warnings
> > wrong.
> >
> > > pr_warn_once() is a
> > > possible alternative, if you need to notify the user of a problem."
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/coding-style.rst?id=1e760fa3596e8c7f08412712c168288b79670d78#n1223
> >
> > It is worth remembering that those are guidelines, not enforceable
> > rules and any experienced kernel developer will tell you the same
> > thing. We know the guidelines, we know when to apply them, we know
> > there are cases that the guidelines simply can't, don't or won't
> > cover.
>
> ...and perhaps the WARNs that can result from corrupted metadata should
> be changed to XFS_IS_CORRUPT() ?

Well, I think in this case it isn't -corrupt- metadata, it's more the
case that there is an inconsistency between different structures that
are each internally consistent. e.g. remove a free space extent from
the freespace tree without removing the space from the global free
space counters.
Now delalloc reservation is allowed by the global counters, but when
we go to allocate the extent - or the bmap btree block to index it -
we fail the allocation because the free space btrees are empty. The
allocation structures are not internally inconsistent or corrupt, so
the allocator has done the right thing by returning ENOSPC. The
global counters are not obviously inconsistent or corrupt, either.

So it can be triggered by just the right sort of corruption at exactly
the right time (i.e. at 100% ENOSPC), but the chances of this
convoluted set of circumstances happening in production systems are
pretty much infinitesimal.

> We still get a kernel log about something going wrong, only now the
> report doesn't trigger everyone's WARN triggers, and we tell the user to
> go run xfs_repair.

I think that is exactly the wrong thing to do. We have a history of
this WARN firing as a result of software bugs in XFS - typically a
transaction space reservation or allocation parameter setup issue -
in which case a WARN_ON_ONCE is more appropriate here than declaring
the filesystem corrupt.

That's the bottom line - this specific WARN has been placed because it
is an indicator of a bug in the code, not because it is something that
occurs because of filesystem corruption. The WARN is an indicator that
the bug needs to be reported, not simply put back on the user to clean
up the mess and continue on blissfully unaware that they tripped over
a kernel bug rather than some nebulous, unexplainable corruption.

syzbot being able to trip over it by corrupting the fs in just the
right way doesn't mean we should change it - syzbot is a malicious
attacker, not a production workload, and I really don't think we
should be changing warnings that we actually want users to report just
to shut up syzbot.

-Dave.
-- 
Dave Chinner
david@fromorbit.com
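
For illustration only, the accounting mismatch described in this
message can be modelled with a toy sketch. Everything here is
hypothetical: toy_fs, toy_reserve() and toy_alloc() are made-up names,
not XFS structures or functions, and the real code paths are far more
involved. The sketch assumes a single global free block counter and a
single free space index that are meant to agree.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Toy model only: these are NOT the real XFS structures or functions. */
struct toy_fs {
	uint64_t sb_fdblocks;	/* global free block counter */
	uint64_t ag_freeblks;	/* blocks indexed by the free space tree */
	uint64_t reserved;	/* blocks promised to delalloc reservations */
};

/* Delalloc reservation: consults the global counter only. */
static int toy_reserve(struct toy_fs *fs, uint64_t len)
{
	if (fs->sb_fdblocks < len)
		return -ENOSPC;
	fs->sb_fdblocks -= len;
	fs->reserved += len;
	return 0;
}

/* Real allocation: consults the free space index only. */
static int toy_alloc(struct toy_fs *fs, uint64_t len)
{
	assert(fs->reserved >= len);	/* caller must hold a reservation */
	if (fs->ag_freeblks < len)
		return -ENOSPC;		/* the "should never happen" case */
	fs->ag_freeblks -= len;
	fs->reserved -= len;
	return 0;
}
```

With both fields set to the same value, a reservation followed by an
allocation succeeds. If ag_freeblks is zeroed while sb_fdblocks is left
alone, toy_reserve() still succeeds and toy_alloc() then returns
-ENOSPC, even though each field looks valid when inspected on its own.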