Date: Thu, 16 Aug 2012 14:40:53 -0700
Subject: Re: NULL pointer dereference in ext4_ext_remove_space on 3.5.1
From: Maciej Żenczykowski
To: "Theodore Ts'o", Maciej Żenczykowski, Fengguang Wu, Marti Raudsepp,
    Kernel hackers, ext4 hackers
In-Reply-To: <20120816211948.GF31346@thunk.org>
References: <20120816024654.GB3781@thunk.org>
    <20120816111051.GA16036@localhost>
    <20120816152513.GA31346@thunk.org>
    <20120816211948.GF31346@thunk.org>

> Maciej, you weren't able to reliably repro the crash were you?  I'm
> pretty sure this should fix the crash, but it would be really great to
> confirm things.
>
> I suspect creating a file system with a really small journal may make
> it easier to reproduce, but I haven't had time to try create a
> reliable repro for this bug yet.

This happened twice to me while moving data off of a ~1TB ext4
partition.  The data portion was on a striped RAID across 2 ~500GB
drives; the journal was on a relatively large partition (500MB?) on an
SSD (crypto and LVM were also involved).  I've since emptied the
partition and even deleted the RAID array.

Both times it happened during rm: the first time during rm -rf of a
directory tree, the second time during rm of a 250GB disk image
generated by dd (from a notebook drive).  Both rm's were run manually
by me from a shell command line, and there was pretty much nothing
else happening on the machine at the time.

I'm not aware of there having been anything interesting (like
holes/punch/sparseness, much r/w activity in the middle of files,
etc.) on this filesystem; it was pretty much just a write-once data
backup that I had copied elsewhere and was deleting.  The 250GB disk
image was definitely just a sequentially written disk dump, and I
think the same holds true for the contents of the wiped directory tree
(although in many much smaller files).

I know i=1 in both cases (and disassembly pointed to the location
where the above debug patch is BUGing), but I don't think it's
possible to figure out what inode # it crashed on.

Perhaps just untarring a bunch of kernels onto an empty partition,
filling it up, then deleting those kernels should be sufficient to
repro this (untried).

Perhaps something like:
  create 1TB filesystem
  untar a thousand kernel source trees on to it
  create 20GB files of junk until it is full
  rm -rf /

- Maciej
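
For anyone who wants to try the fill-then-delete idea above, here is a
rough Python sketch.  It is untested against the actual bug and may well
not trigger it.  The function name fill_then_delete, the "repro"
subdirectory, the tarball path and the max_junk_files cap are all made up
for illustration; the defaults (a thousand trees, 20GB junk files) just
mirror the steps listed above.  It assumes the test filesystem has
already been created and mounted, and it only removes its own "repro"
subdirectory rather than the filesystem root.

```python
import errno
import os
import shutil
import tarfile


def fill_then_delete(mount_point, tarball, copies=1000,
                     junk_bytes=20 * 1024**3, max_junk_files=None):
    """Untar `tarball` `copies` times, add junk files until the
    filesystem is full (or max_junk_files is reached), then delete it all.

    Returns (trees_extracted, junk_files_completed).
    """
    # Hypothetical scratch directory; everything is created under it.
    root = os.path.join(mount_point, "repro")
    os.makedirs(root, exist_ok=True)
    trees = junk = 0
    chunk = b"\0" * (1 << 20)  # written sequentially in 1MB pieces
    try:
        # Step 1: many small files, from repeated extraction of one tarball.
        for n in range(copies):
            with tarfile.open(tarball) as tar:
                tar.extractall(os.path.join(root, "tree%04d" % n))
            trees += 1
        # Step 2: large sequentially written files until ENOSPC.
        while max_junk_files is None or junk < max_junk_files:
            path = os.path.join(root, "junk%04d" % junk)
            with open(path, "wb") as f:
                left = junk_bytes
                while left > 0:
                    size = min(left, len(chunk))
                    f.write(chunk[:size])
                    left -= size
            junk += 1
    except OSError as e:
        # A full filesystem is the expected way for the fill phase to end.
        if e.errno != errno.ENOSPC:
            raise
    # Step 3: the delete phase, which is where both crashes were seen.
    shutil.rmtree(root)
    return trees, junk
```

Calling it with a small copies value, a small junk_bytes and
max_junk_files set makes for a quick dry run on a scratch directory
before pointing it at a real test filesystem.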