From: Ivan Zahariev
Date: Wed, 23 Nov 2022 16:48:01 +0200
Subject: kernel BUG at fs/ext4/inode.c:1914 - page_buffers()
To: linux-ext4@vger.kernel.org
Cc: Theodore Ts'o, Greg Kroah-Hartman

Hello,

Since moving to kernel 5.15, over the past eight months we have had a total of 12 kernel panics across a fleet of 1000 KVM/QEMU machines. They all look like this:

    kernel BUG at fs/ext4/inode.c:1914

Switching from kernel 4.14 to 5.15 triggered the problem almost immediately. We are therefore very confident that userland activity is more or less the same and is not the root cause.

The kernel function which triggers the BUG is __ext4_journalled_writepage(). In 5.15 the code for __ext4_journalled_writepage() in "fs/ext4/inode.c" is the same as in the current kernel "master". The line where the BUG is triggered is:

    struct buffer_head *page_bufs = page_buffers(page)

The definition of "page_buffers(page)" in "include/linux/buffer_head.h" hasn't changed since 4.14, so there is no difference here.
This is where the actual "kernel BUG" event is triggered:

    /* If we *know* page->private refers to buffer_heads */
    #define page_buffers(page) \
        ({ \
            BUG_ON(!PagePrivate(page)); \
            ((struct buffer_head *)page_private(page)); \
        })
    #define page_has_buffers(page) PagePrivate(page)

Initially I thought that the issue had already been discussed here:
https://lore.kernel.org/all/Yg0m6IjcNmfaSokM@google.com/

But that seems to be a different (solved) problem, and Theodore Ts'o already made a quick fix for it by simply reporting the rare occurrence and continuing forward. That commit is already in 5.15 (and in the latest kernel), so it does not help in our case:
https://github.com/torvalds/linux/commit/cc5095747edfb054ca2068d01af20be3fcc3634f

Back to the problem! 99% of the difference between 4.14 and the latest kernel for __ext4_journalled_writepage() in "fs/ext4/inode.c" comes from the following commit:
https://github.com/torvalds/linux/commit/5c48a7df91499e371ef725895b2e2d21a126e227

Is it safe to revert this patch on the latest 5.15 kernel, so that we can confirm whether this resolves the issue for us?

Best regards.
--Ivan
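
P.S. For anyone less familiar with page_buffers(): below is a minimal userspace sketch, not kernel code, of why a page without attached buffers is fatal at that line. The struct layouts and the names has_private, private_data, page_buffers_strict() and page_buffers_checked() are made up for illustration, and assert() merely stands in for BUG_ON().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are NOT the kernel's real types. */
struct buffer_head {
    int dummy;
};

struct page {
    bool has_private;   /* models the PG_private page flag */
    void *private_data; /* models page->private */
};

/*
 * Models page_buffers(): the caller claims to *know* that buffers are
 * attached. If the flag is clear, assert() aborts the program, in the
 * same spirit as BUG_ON() stopping the kernel.
 */
static inline struct buffer_head *page_buffers_strict(const struct page *page)
{
    assert(page->has_private);
    return (struct buffer_head *)page->private_data;
}

/*
 * Illustrative non-fatal variant: checks the flag first (the role that
 * page_has_buffers() plays) and returns NULL when nothing is attached.
 */
static inline struct buffer_head *page_buffers_checked(const struct page *page)
{
    if (!page->has_private)
        return NULL;
    return (struct buffer_head *)page->private_data;
}
```

Calling page_buffers_strict() on a page whose has_private is false aborts the program, which models the panic we see. page_buffers_checked() returns NULL in that case and leaves the decision to the caller. This only illustrates the failure mode; it says nothing about why the buffers are missing in our case.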