From: Ric Wheeler
Subject: Re: [PATCH] e2fsck: Discard free data and inode blocks.
Date: Fri, 22 Oct 2010 10:46:02 -0400
Message-ID: <4CC1A3AA.6040004@gmail.com>
References: <1287670556-23460-1-git-send-email-lczerner@redhat.com>
 <6388FD2D-50A8-42B9-A955-3824451ACBF4@dilger.ca>
 <4CC175E6.5000700@gmail.com>
 <4CC19BC2.9010503@gmail.com>
To: Lukas Czerner
Cc: Andreas Dilger, linux-ext4@vger.kernel.org, tytso@mit.edu, sandeen@redhat.com

On 10/22/2010 10:32 AM, Lukas Czerner wrote:
> On Fri, 22 Oct 2010, Ric Wheeler wrote:
>
>> On 10/22/2010 07:43 AM, Lukas Czerner wrote:
>>> On Fri, 22 Oct 2010, Ric Wheeler wrote:
>>>
>>>> On 10/22/2010 05:12 AM, Lukas Czerner wrote:
>>>>> On Thu, 21 Oct 2010, Andreas Dilger wrote:
>>>>>
>>>>>> On 2010-10-21, at 08:15, Lukas Czerner wrote:
>>>>>>> In Pass 5 when we are checking block and inode bitmaps we have great
>>>>>>> opportunity to discard free space and unused inodes on the device,
>>>>>>> because bitmaps has just been verified as valid. This commit takes
>>>>>>> advantage of this opportunity and discards both, all free space and
>>>>>>> unused inodes.
>>>>>>>
>>>>>>> I have added new option '-K' which when set, disables discard. Also
>>>>>>> when the underlying device does not support discard, or BLKDISCARD
>>>>>>> ioctl returns any kind of error, or when some errors occurred in
>>>>>>> bitmaps, the discard is disabled.
>>>>>> I'm always a bit nervous with patches like this, that will prevent
>>>>>> data recovery after an e2fsck run (which seems like the opposite of
>>>>>> what we want from e2fsck).
>>>>>>
>>>>>> Two suggestions:
>>>>>> - it probably makes sense to disable this by default, and allow it to
>>>>>>   be specified on the command-line and e2fsck.conf
>>>>>> - should we really have a short option, or a "-E discard" and
>>>>>>   "-E nodiscard" options, which allow us to change the default easily
>>>>>>   at some later time (which we can't do with a single -K flag)
>>>>> Right, I agree it would be probably better to disable this by default.
>>>>>
>>>> If we do disable it by default, I think that we might also want to be
>>>> consistent and disable the discard support in mkfs by default as well?
>>>>
>>>> thanks!
>>>>
>>>> Ric
>>>>
>>> I think that this will not be necessary. There is a concern that it might
>>> prevent data recovery after fsck because it might be already discarded
>>> (some weird fs corruption?) in pass 5. However in my opinion this is a
>>> very small window (if there even is any), because we have already passed
>>> check 1-4 and we have just confirmed that group descriptors should be ok.
>>> But when there is an even slight chance this might happen I would suggest
>>> that we really disable it by default (at least for a while - we will see
>>> then).
>>>
>>> On the other hand there is nothing to be afraid of in the case of mkfs,
>>> because we can not possibly lose any relevant data, because discard is
>>> done before the filesystem gets created.
>>>
>>> -Lukas
>> My concern with mkfs is that we have seen several devices which don't handle
>> this well.
>>
>> We will be using this TRIM (or UNMAP, etc) on lots of old, creaky hardware
>> with old firmware, so having it try on all devices is almost certainly going
>> to cause breakages, hangs, etc in the field....
>>
>> Ric
>>
> Well, so far the only breakages I have seen was with lots of small TRIMs
> (or UNMAPs, etc) issued in random pattern, never in case of mkfs which
> is quite a opposite - big sequential ranges.
>
> Hangs should be covered by those two patches:
>
> http://marc.info/?l=linux-ext4&m=128774558623608&w=2
> http://marc.info/?l=linux-ext4&m=128767099123375&w=2
>
> if, of course, they get upstream. Also there is a big win, when discard
> also zeroes data, because in that case we can just skip inode table
> initialization (zeroing) without any need of in-kernel lazyinit code
> enabled. And we get all this for free. It was introduced with Sandeens
> patch:
>
> http://marc.info/?l=linux-ext4&m=128234048208327&w=2
>
> So, I would rather leave it on by default.
>
> -Lukas

You cannot 100% depend on discard zeroing blocks - that is not a universal
requirement of devices that support it.

Specifically, for ATA devices, I think that there are optional bits that
specify how a device will behave when you read from a trimmed region.

Ric
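
To make the caveat about zeroing concrete, here is a minimal Python sketch. It assumes a Linux kernel that exposes the queue attribute discard_zeroes_data under /sys/block/<dev>/queue/ (newer kernels may always report 0 there). The helper names discard_zeroes_data() and can_skip_itable_zeroing() are hypothetical and are not part of e2fsprogs. The idea is that inode table zeroing is skipped only when the discard succeeded and the device is reported to return zeroes from discarded blocks; anything else falls back to explicit zeroing.

```python
import os


def discard_zeroes_data(dev_name, sysfs_root="/sys/block"):
    """Return True only if the kernel reports that discarded blocks
    read back as zeroes on this device.

    dev_name is a bare block device name such as "sda" (assumption:
    the caller has already resolved partitions to their parent disk).
    """
    path = os.path.join(sysfs_root, dev_name, "queue", "discard_zeroes_data")
    try:
        with open(path) as f:
            return f.read().strip() == "1"
    except OSError:
        # Attribute missing or unreadable: make no assumption about zeroing.
        return False


def can_skip_itable_zeroing(discard_succeeded, dev_name, sysfs_root="/sys/block"):
    """Skip inode table zeroing only when the discard went through AND
    the device is reported to zero discarded blocks."""
    return discard_succeeded and discard_zeroes_data(dev_name, sysfs_root)
```

For a device whose attribute reads 0, can_skip_itable_zeroing(True, "sda") is false, so the inode tables would still have to be zeroed explicitly or left to lazyinit.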