From: Maciej Żenczykowski
Subject: Re: NULL pointer dereference in ext4_ext_remove_space on 3.5.1
Date: Thu, 16 Aug 2012 14:40:53 -0700
References: <20120816024654.GB3781@thunk.org> <20120816111051.GA16036@localhost> <20120816152513.GA31346@thunk.org> <20120816211948.GF31346@thunk.org>
In-Reply-To: <20120816211948.GF31346@thunk.org>
To: "Theodore Ts'o", Maciej Żenczykowski, Fengguang Wu, Marti Raudsepp, Kernel hackers, ext4 hackers
Sender: linux-ext4-owner@vger.kernel.org

> Maciej, you weren't able to reliably repro the crash were you? I'm
> pretty sure this should fix the crash, but it would be really great to
> confirm things.
>
> I suspect creating a file system with a really small journal may make
> it easier to reproduce, but I haven't had time to try create a
> reliable repro for this bug yet.

This happened to me twice while moving data off of a ~1TB ext4 partition. The data portion was on a striped raid across 2 ~500GB drives; the journal was on a relatively large partition (500MB?) on an SSD (crypto and lvm were also involved). I've since emptied the partition and even deleted the raid array.

Both times it happened during rm: the first time during rm -rf of a directory tree, the second time during rm of a 250GB disk image generated by dd (from a notebook drive). Both rm's were run manually by me from a shell command line, and there was pretty much nothing else happening on the machine at the time.

I'm not aware of anything interesting (holes/punch/sparseness, much r/w activity in the middle of files, etc.) having happened on this filesystem. It was pretty much just a write-once data backup that I had copied elsewhere and was deleting. The 250GB disk image was definitely just a sequentially written disk dump, and I think the same holds true for the contents of the wiped directory tree (although spread over many much smaller files).

I know i=1 in both cases (and disassembly pointed to the location where the above debug patch is BUGing), but I don't think it's possible to figure out which inode # it crashed on.

Perhaps just untarring a bunch of kernels onto an empty partition, filling it up, then deleting those kernels would be sufficient to repro this (untried). Perhaps something like:

  create 1TB filesystem
  untar a thousand kernel source trees on to it
  create 20GB files of junk until it is full
  rm -rf /

- Maciej
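
For anyone who wants to script the fill-then-delete cycle sketched above, here is a minimal, untested-against-the-bug Python sketch. Everything in it is an assumption for illustration: the function name fill_and_delete, the scratch_dir argument (meant to be the mount point of a throwaway ext4 filesystem created separately), and the byte budget that stands in for "until it is full" so the script never actually runs a disk out of space. Random small files stand in for the untarred kernel trees. It only generates the write-once, delete-everything workload and makes no claim to trigger the crash.

```python
import os
import shutil


def fill_and_delete(scratch_dir, n_trees=3, files_per_tree=20,
                    small_size=4096, big_size=1 << 20, budget=8 << 20):
    """Scaled-down fill/delete cycle on a scratch directory.

    All parameters are hypothetical knobs; scale them up for a real
    attempt on a dedicated test filesystem. Returns bytes written.
    """
    written = 0

    # Step 1: many small files in several directory trees
    # (a stand-in for untarring many kernel source trees).
    for t in range(n_trees):
        tree = os.path.join(scratch_dir, f"tree{t}")
        os.makedirs(tree)
        for f in range(files_per_tree):
            with open(os.path.join(tree, f"f{f}"), "wb") as fh:
                fh.write(os.urandom(small_size))
            written += small_size

    # Step 2: large, sequentially written junk files until the byte
    # budget would be exceeded (a stand-in for "until it is full").
    i = 0
    while written + big_size <= budget:
        with open(os.path.join(scratch_dir, f"junk{i}"), "wb") as fh:
            fh.write(os.urandom(big_size))
        written += big_size
        i += 1

    # Step 3: delete everything under scratch_dir, keeping the
    # directory itself (the equivalent of rm -rf on its contents).
    for name in os.listdir(scratch_dir):
        path = os.path.join(scratch_dir, name)
        if os.path.isdir(path):
            shutil.rmtree(path)
        else:
            os.remove(path)

    return written
```

Pointing scratch_dir at anything other than a disposable mount point would delete whatever is already there, so this is only meant for a freshly created test filesystem.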