Received: by 2002:a05:6a10:6d10:0:0:0:0 with SMTP id gq16csp4007846pxb; Tue, 19 Apr 2022 14:49:14 -0700 (PDT) X-Google-Smtp-Source: ABdhPJysWTFTGz7FjLrpRBSrioc9Q32Y+TZARRQXTnTfcgWZ900FcfxcXbexw0yKVIWk6OCWaLfX X-Received: by 2002:a17:902:c613:b0:159:9f9:85ec with SMTP id r19-20020a170902c61300b0015909f985ecmr8414639plr.67.1650404953812; Tue, 19 Apr 2022 14:49:13 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1650404953; cv=none; d=google.com; s=arc-20160816; b=rCgBGny3kfm2nB4sOFJ9xlUoVK/m0UiqVYXj9GnIStdyq8hsggyMA6xlToe1TiV5tf k6aM9M2aANMNvgxIrnZP9+GSJ5JFHtshXckawR8rDngsFTVZOC97buNLtdIo8bk0gAEa oN5VFtl1K6vWKgxm5iImuwMCe20sv3pP1/daGOrb7NhBOM6MDKBBhzoU6p/gX64yd7S+ pUlFD9/kV7zHX3RSJVbmMlm0xUiqShkTjuxwCwL7d50cpFFDpDN6AyeBMnrvnzriMIeF hdw5P8HN8dV0Nz/M6c8kXuobdmAKQR7t/KUDbVYLuc9l/COR+/iy5TK3dk8Os40gcgsp o6Yg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date; bh=DVh+Faq8gjOw0CX9GqEq2VjkWImIgkplghemS5N/sR0=; b=L78whz6nLov4ie+5oHVlH+aUqrUy5xUUu0CIGAdHBDxu+8q5YNDOLd+W0kiJTC1dkq af7KwJiRxOzI1ttTLZueutvdKKe272zgoWjGGECP+XDTA/a+UiroEIF1bAgwwxSvE0po NzAH1EHxS/EUQESrC7ayHCy0kqcl/eOwfZD+PrWYp1CbPKqvQ8lxzlm8pIZIWNnEtyho lkVKVw0d5w+xkGCHBU4nI44SzgybTqPe//j+4Cqw+t+8YjKytftMOphJD7iBiUqvHEcY iEefPcIbACzgjRscLxw7nLMHLb1tvU7SBwb4jxTy12O5PjhRYM10HWL4bvGR6OS1yQAe xqCA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id r10-20020a63e50a000000b003a283acb929si410475pgh.490.2022.04.19.14.48.57; Tue, 19 Apr 2022 14:49:13 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1355017AbiDSQsJ (ORCPT + 99 others); Tue, 19 Apr 2022 12:48:09 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54176 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1355092AbiDSQob (ORCPT ); Tue, 19 Apr 2022 12:44:31 -0400 X-Greylist: delayed 2119 seconds by postgrey-1.37 at lindbergh.monkeyblade.net; Tue, 19 Apr 2022 09:41:47 PDT Received: from nibbler.cm4all.net (nibbler.cm4all.net [IPv6:2001:8d8:970:e500:82:165:145:151]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2BD822A249 for ; Tue, 19 Apr 2022 09:41:47 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by nibbler.cm4all.net (Postfix) with ESMTP id 3F259C00E1 for ; Tue, 19 Apr 2022 18:41:46 +0200 (CEST) X-Virus-Scanned: Debian amavisd-new at nibbler.cm4all.net Received: from nibbler.cm4all.net ([127.0.0.1]) by localhost (nibbler.cm4all.net [127.0.0.1]) (amavisd-new, port 10024) with LMTP id URCB-aDoYIYF for ; Tue, 19 Apr 2022 18:41:39 +0200 (CEST) Received: from zero.intern.cm-ag (zero.intern.cm-ag [172.30.16.10]) by nibbler.cm4all.net (Postfix) with SMTP id 1679EC00D5 for ; Tue, 19 Apr 2022 18:41:39 +0200 (CEST) Received: (qmail 29906 invoked from network); 19 Apr 2022 22:31:40 +0200 Received: from unknown (HELO rabbit.intern.cm-ag) (172.30.3.1) by zero.intern.cm-ag with SMTP; 19 Apr 2022 22:31:40 +0200 Received: by rabbit.intern.cm-ag (Postfix, from userid 1023) id D8786460EFB; Tue, 19 Apr 2022 18:41:38 +0200 (CEST) Date: Tue, 19 Apr 2022 18:41:38 +0200 From: Max Kellermann To: David Howells Cc: Max Kellermann , linux-cachefs@redhat.com, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Subject: Re: fscache corruption in Linux 5.17? Message-ID: References: <454834.1650373340@warthog.procyon.org.uk> <508603.1650385022@warthog.procyon.org.uk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <508603.1650385022@warthog.procyon.org.uk> X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,SPF_HELO_NONE, SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2022/04/19 18:17, David Howells wrote: > find /var/cache/fscache -inum $((0xiiii)) > > and see if you can see the corruption in there. Note that there may be blocks > of zeroes corresponding to unfetched file blocks. I checked several known-corrupt files, but unfortunately, all corruption have disappeared :-( The /var/cache/fscache/ files have a time stamp half an hour ago (17:53 CET = 15:53 GMT). I don't know what happened at that time - too bad this disappeared after a week, just when we started investigating it. All those new files are all-zero. No data is stored in any of them. Note that I had to enable /sys/kernel/debug/tracing/events/cachefiles/enable; the trace events you named (read/write/trunc/io_error/vfs_error) do not emit anything. This is what I see: kworker/u98:11-1446185 [016] ..... 1813913.318370: cachefiles_ref: c=00014bd5 o=12080f1c u=1 NEW obj kworker/u98:11-1446185 [016] ..... 1813913.318379: cachefiles_lookup: o=12080f1c dB=3e01ee B=3e5580 e=0 kworker/u98:11-1446185 [016] ..... 1813913.318380: cachefiles_mark_active: o=12080f1c B=3e5580 kworker/u98:11-1446185 [016] ..... 1813913.318401: cachefiles_coherency: o=12080f1c OK B=3e5580 c=0 kworker/u98:11-1446185 [016] ..... 1813913.318402: cachefiles_ref: c=00014bd5 o=12080f1c u=1 SEE lookup_cookie > Also, what filesystem is backing your cachefiles cache? It could be useful to > dump the extent list of the file. You should be able to do this with > "filefrag -e". It's ext4. Filesystem type is: ef53 File size of /var/cache/fscache/cache/Infs,3.0,2,,a4214ac,c0000208,,,3002c0,10000,10000,12c,1770,bb8,1770,1/@58/T,c0000208,,1cf4167,184558d9,c0000208,,40,36bab37,40, is 188416 (46 blocks of 4096 bytes) /var/cache/fscache/cache/Infs,3.0,2,,a4214ac,c0000208,,,3002c0,10000,10000,12c,1770,bb8,1770,1/@58/T,c0000208,,1cf4167,184558d9,c0000208,,40,36bab37,40,: 0 extents found File size of /var/cache/fscache/cache/Infs,3.0,2,,a4214ac,c0000208,,,3002c0,10000,10000,12c,1770,bb8,1770,1/@ea/T,c0000208,,10cc976,1208c7f6,c0000208,,40,36bab37,40, is 114688 (28 blocks of 4096 bytes) /var/cache/fscache/cache/Infs,3.0,2,,a4214ac,c0000208,,,3002c0,10000,10000,12c,1770,bb8,1770,1/@ea/T,c0000208,,10cc976,1208c7f6,c0000208,,40,36bab37,40,: 0 extents found > As to why this happens, a write that's misaligned by 31 bytes should cause DIO > to a disk to fail - so it shouldn't be possible to write that. However, I'm > doing fallocate and truncate on the file to shape it so that DIO will work on > it, so it's possible that there's a bug there. The cachefiles_trunc trace > lines may help catch that. I don't think any write is misaligned. This was triggered by a WordPress update, so I think the WordPress updater truncated and rewrote all files. Random guess: some pages got transferred to the NFS server, but the local copy in fscache did not get updated. Max