From: Mark Lord
Date: Fri, 19 Nov 2010 08:53:25 -0500
To: Steven Whitehouse
CC: Lukas Czerner, James Bottomley, Christoph Hellwig, Matthew Wilcox, Josef Bacik, tytso@mit.edu, linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, sandeen@redhat.com
Subject: Re: [PATCH 1/2] fs: Do not dispatch FITRIM through separate super_operation
Message-ID: <4CE68155.50705@teksavvy.com>
In-Reply-To: <1290168976.2570.45.camel@dolmen>

On 10-11-19 07:16 AM, Steven Whitehouse wrote:
>
> There doesn't seem to be any technical reason why faster
> implementations are not possible.

There is a very good reason why faster implementations may be
*difficult* (if not impossible) in many cases: DETERMINISTIC trim.
This requires that the drive guarantee the block ranges will return
a constant known value after TRIM. Which means they MUST write to flash
during the trim. And any WRITE to flash means a potential ERASE operation
may be needed.

Simply buffering the trim in RAM and returning success is not an option here,
because loss of power would negate the (virtual) TRIM. So they MUST record
the trim operation to non-volatile storage.

This can be done in a variety of ways, but one of the simplest is to just
do the full TRIM then and there, shuffling data and erasing the blocks
before signaling completion.

Another, possibly faster way, is to have TRIM just update a block bitmap
somewhere inside FLASH, and avoid ERASE until most of an entire flash block
(e.g. 256KB) is marked as "trimmed". This is the implementation we all hope for,
but which many (most?) current drives do not seem to implement.

Non-deterministic TRIM should also try to ensure that the original data
is no longer there (for security reasons), so it may have the same issues.

> Equally, FITRIM is useful since the overhead can be reduced to certain
> points in time when a system is less busy. With GFS2 (this may well also
> apply to OCFS2) doing a userspace trim is not very easy since there is
> no simple way to access the locking for the fs from userspace

wiper.sh locks the blocks by reserving the space for a file.
But it has to lock ALL freespace, whereas FITRIM could be clever
and only lock the bits it is actually trimming at any instant
(I'm agreeing with you!).

> I'm intending to put a patch together fairly shortly to implement FITRIM for GFS2.

Excellent. So eventually we might expect FITRIM to reappear at the VFS level,
rather than being buried inside each individual fs's ioctl() handler?
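
To make the "block bitmap" idea above a bit more concrete, here is a rough
sketch in C. It is only a toy model, not a description of how any real drive
firmware works: the names (flash_sim, sim_trim, sector_is_trimmed), the
512-byte sector size, and the 90% threshold standing in for "most of" a
256KB block are all assumptions made up for illustration. It also ignores
logical-to-physical remapping entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed geometry: 256KB erase block of 512-byte sectors. */
#define SECTORS_PER_BLOCK 512u
#define NUM_BLOCKS        4u
#define TOTAL_SECTORS     (SECTORS_PER_BLOCK * NUM_BLOCKS)
/* Assumed meaning of "most of a block": 90% of its sectors. */
#define ERASE_THRESHOLD   ((SECTORS_PER_BLOCK * 9u) / 10u)

/* Hypothetical device state. The bitmap stands in for the record that
 * would have to live in non-volatile storage to survive power loss. */
struct flash_sim {
    uint8_t  trimmed[NUM_BLOCKS][SECTORS_PER_BLOCK / 8]; /* 1 bit per sector */
    unsigned trimmed_count[NUM_BLOCKS];
    unsigned erase_count;
};

static bool sector_is_trimmed(const struct flash_sim *f, unsigned sector)
{
    unsigned blk = sector / SECTORS_PER_BLOCK;
    unsigned off = sector % SECTORS_PER_BLOCK;

    return (f->trimmed[blk][off / 8] >> (off % 8)) & 1u;
}

/*
 * Mark sectors [start, start + len) as trimmed in the bitmap only.
 * An erase is counted once per block, at the moment that block's
 * trimmed-sector count reaches ERASE_THRESHOLD.
 * Returns the number of erases triggered by this call.
 */
static unsigned sim_trim(struct flash_sim *f, unsigned start, unsigned len)
{
    unsigned erases = 0;

    assert(f != 0);
    for (unsigned s = start; s < start + len && s < TOTAL_SECTORS; s++) {
        unsigned blk = s / SECTORS_PER_BLOCK;
        unsigned off = s % SECTORS_PER_BLOCK;

        if (sector_is_trimmed(f, s))
            continue;                       /* already recorded */

        f->trimmed[blk][off / 8] |= (uint8_t)(1u << (off % 8));
        if (++f->trimmed_count[blk] == ERASE_THRESHOLD) {
            /* Real firmware would relocate any live sectors first. */
            f->erase_count++;
            erases++;
        }
    }
    return erases;
}
```

In this model a small trim costs only a bitmap update, and the expensive
erase is put off until a block is nearly all trimmed, which is the behaviour
hoped for above.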
Cheers