Date: Tue, 12 Apr 2022 17:10:38 +0200
From: Max Kellermann
To: dhowells@redhat.com
Cc: linux-cachefs@redhat.com, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: fscache corruption in Linux 5.17?

Hi David,

two weeks ago, I updated a cluster of web servers to Linux kernel
5.17.1 (5.16.x previously), which includes your rewrite of the fscache
code.

In the last few days, there were numerous complaints about broken
WordPress installations after WordPress was updated. There were PHP
syntax errors everywhere.

Indeed there were broken PHP files, but the interesting part is: the
corruption appeared on only one of the web servers; the others served
the same files intact.

File size, time stamp and everything else in "stat" are identical; only
the file contents are corrupted, and they look like a mix of old and
new contents. The corruptions always started at multiples of 4096
bytes.

An example diff:

--- ok/wp-includes/media.php 2022-04-06 05:51:50.000000000 +0200
+++ broken/wp-includes/media.php 2022-04-06 05:51:50.000000000 +0200
@@ -5348,7 +5348,7 @@
 /**
  * Filters the threshold for how many of the first content media elements to not lazy-load.
  *
- * For these first content media elements, the `loading` attribute will be omitted. By default, this is the case
+ * For these first content media elements, the `loading` efault, this is the case
  * for only the very first content media element.
  *
  * @since 5.9.0
@@ -5377,3 +5377,4 @@
 return $content_media_count;
 }
+^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@

The corruption can be explained by WordPress commit
https://github.com/WordPress/WordPress/commit/07855db0ee8d5cff2 which
makes the file 31 bytes longer (185055 -> 185086). The "broken" web
server sees the new contents up to offset 184320 (= 45 * 4096), but
sees the old contents from there on, followed by 31 null bytes
(because the kernel reads past the end of the cache?).

All web servers mount a storage via NFSv3 with fscache.

My suspicion is that this is caused by an fscache regression in Linux
5.17. What do you think?

What can I do to debug this further? Is there any information you
need? I don't know much about how fscache works internally or how to
obtain information from it.

Max
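
For anyone trying to confirm the same pattern on their own files, the check described above (first differing offset on a 4096-byte boundary, identical size, NUL padding at the end) can be sketched in a few lines of Python. This is only an illustration and not part of the original report: the helper names first_mismatch, trailing_nuls and describe are hypothetical, and the 4096-byte page size is taken from the observation in the mail rather than queried from the system.

```python
from typing import Optional

# Assumption: 4096 is the granularity observed in the report,
# not a value read from the running kernel.
PAGE_SIZE = 4096


def first_mismatch(good: bytes, broken: bytes) -> Optional[int]:
    """Return the offset of the first differing byte, or None if identical."""
    for offset, (a, b) in enumerate(zip(good, broken)):
        if a != b:
            return offset
    if len(good) != len(broken):
        # One buffer is a strict prefix of the other.
        return min(len(good), len(broken))
    return None


def trailing_nuls(data: bytes) -> int:
    """Count the NUL bytes at the very end of the buffer."""
    return len(data) - len(data.rstrip(b"\x00"))


def describe(good: bytes, broken: bytes) -> dict:
    """Summarise how the broken copy deviates from the good one."""
    offset = first_mismatch(good, broken)
    return {
        "same_size": len(good) == len(broken),
        "first_mismatch": offset,
        # True when the corruption starts exactly on a page boundary.
        "page_aligned": offset is not None and offset % PAGE_SIZE == 0,
        "page_index": None if offset is None else offset // PAGE_SIZE,
        "trailing_nuls": trailing_nuls(broken),
    }


# The figures quoted in the mail are self-consistent:
assert 185086 - 185055 == 31
assert 45 * PAGE_SIZE == 184320
```

To use it, read the two copies in binary mode (for example the ok/wp-includes/media.php and broken/wp-includes/media.php files from the diff) and pass both byte strings to describe(). Note that the first differing byte can fall slightly after the page boundary when old and new contents happen to coincide there, so page_index is the more robust indicator.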