From: "Brian G."
Date: Sat, 15 Feb 2020 20:24:41 -0600
Subject: Re: BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 in nilfs_segctor_do_construct
To: ARAI Shun-ichi
Cc: linux-kernel@vger.kernel.org, linux-nilfs@vger.kernel.org
In-Reply-To: <20200216.111029.687350152614907818.hermes@ceres.dti.ne.jp>
References: <20200123.225827.1155989593018204741.hermes@ceres.dti.ne.jp> <20200210.224609.499887311281343618.hermes@ceres.dti.ne.jp> <20200216.111029.687350152614907818.hermes@ceres.dti.ne.jp>

This is my first post to the LKML, so please be kind :)

I have also been affected by this bug. The bug is triggered whenever a write happens to the filesystem, so mounting read-only remains an option for recovering data. I took the time to do a full bisect on the kernel sources and have identified the commit where the breakage happens.

Regarding versions, I can confirm that 4.19.83 is stable with regard to NILFS, and that 4.19.84 and later are broken. I can also confirm that 5.3.10 works fine, and I have heard that 5.3.12 breaks NILFS as well. I can also confirm that the 5.4.18 kernel still has this issue. I did not trace how far back the issue goes in the 5.4.x series, or look in more detail at the 5.3.x series.
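For readers who have not done a bisect before, the idea behind it can be sketched in a few lines. This is only an illustration of the binary search that a bisect performs, not the tooling or procedure used for this report; the function name `first_bad`, the `is_bad` callback, and the revision labels `c1`..`c3` are hypothetical placeholders.

```python
from typing import Callable, Sequence


def first_bad(revisions: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first revision for which is_bad() is true.

    Assumptions: revisions are ordered oldest to newest, revisions[0] is
    known good, revisions[-1] is known bad, and every revision after the
    first bad one is also bad.
    """
    lo, hi = 0, len(revisions) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid  # breakage is at mid or earlier
        else:
            lo = mid  # breakage is after mid
    return revisions[hi]


# Placeholder history between a good and a bad release; "c2" is an
# invented label standing in for the commit that introduces the breakage.
revs = ["v4.19.83", "c1", "c2", "c3", "v4.19.84"]
culprit = first_bad(revs, lambda r: revs.index(r) >= 2)
```

In practice `is_bad` would mean "build and boot this revision, then try the reproduction steps", which is why each step is expensive and halving the range matters.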
To simplify my bisection task, I used the 4.19.x series, and determined that commit d3b3c0a14615c495118acc4bdca23d53eea46ed2 is the commit that breaks NILFS. Furthermore, when this commit is reverted on otherwise clean 4.19.84 kernel sources, the NILFS issue no longer occurs.

I'm not familiar enough with NILFS's internals to determine why the small caching change to the kernel from that commit breaks NILFS, nor can I offer a patch to fix it (besides reverting the offending change), but I can confirm that this is the initial cause. I also know there have been a lot of changes to kernel caching in more recent (5.4 / 5.5 / 5.6) kernels, so perhaps there are still more diagnostics to do.

I have the test VM that I used for bisection available if someone wants to coordinate with me to put together a patch for this, but ideally someone can take my diagnostics effort here and make use of it directly. I saved dmesg logs from both good and bad cases and I can send them if someone is interested. I can also provide reasonably detailed system setup instructions to reproduce the issue.

I did my testing against an existing external hard drive, but I have been able to reproduce the issue consistently against a freshly created loopback mount as well, so it is not just caused by disk corruption or an unclean unmount.

- Brian

On Sat, Feb 15, 2020 at 8:11 PM ARAI Shun-ichi wrote:
>
> And,
>
> In <20200210.224609.499887311281343618.hermes@ceres.dti.ne.jp>;
> ARAI Shun-ichi wrote
> as Subject "Re: BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 in nilfs_segctor_do_construct":
>
> > Hi,
> >
> > FYI, reporting additional test results.
> >
> > I reproduced this problem with clean NILFS2 fs in previous mail.
> > "clean" means that "make filesystem before every tests."
> > In this mail, I tried to reproduce with/without VG/LV, LUKS, loopback.
> >
> > * Not reproduced
> > USB stick - primary partition - NILFS2
> > USB stick - primary partition - VG/LV - NILFS2
> > USB stick - primary partition - VG/LV - LUKS - NILFS2
> > USB stick - primary partition - LUKS - VG/LV - NILFS2
> > USB stick - primary partition - LUKS - VG/LV - LUKS - NILFS2
> > /tmp (tmpfs) - regular file - NILFS2 (loopback mount, kernel 4.19.82)
> > USB stick - primary partition(512MiB) - NILFS2
> >
> > * Reproduced (always, immediately)
> > /tmp (tmpfs) - regular file - NILFS2 (loopback mount)
> > USB stick - primary partition - ext4 - regular file - NILFS2 (loopback mount)
>
> this loopback problem is seen in Kernel 5.5.4.
>
> > Test conditions:
> > kernel 4.19.86 (same as previous test)
> > NILFS2/ext4 filesystem, VG/LV, LUKS were made with default parameters
> > size of "primary partition" in USB stick is approx. 14GiB
> > size of "regular file" is approx. 512MiB
> > "reproduce": mount NILFS2, touch file, sync
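
The "reproduce" steps quoted above (mount NILFS2, touch file, sync) on a loopback-backed regular file could be written down roughly as follows. This is a sketch under stated assumptions, not the exact commands either poster ran: the image path, mount point, loop device `/dev/loop0`, and test file name are hypothetical, and the function only builds the command list and does not execute anything, since running it requires root and may crash an affected kernel.

```python
from typing import List


def repro_plan(image: str, mountpoint: str,
               loopdev: str = "/dev/loop0",
               size_mib: int = 512) -> List[List[str]]:
    """Build (but do not run) the commands for a loopback NILFS2 test.

    All paths and the loop device are illustrative assumptions; the
    512 MiB default mirrors the "regular file" size in the quoted
    test conditions.
    """
    return [
        ["truncate", "-s", f"{size_mib}M", image],   # create the regular file
        ["losetup", loopdev, image],                 # attach it to a loop device
        ["mkfs.nilfs2", loopdev],                    # fresh ("clean") filesystem
        ["mkdir", "-p", mountpoint],
        ["mount", "-t", "nilfs2", loopdev, mountpoint],
        ["touch", f"{mountpoint}/testfile"],         # any write is reported to trigger the bug
        ["sync"],
    ]
```

Printing each entry of `repro_plan("/tmp/nilfs.img", "/mnt/nilfs")` joined with spaces gives a checklist that can be reviewed before anything is run by hand in a disposable VM.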