From: Pavel Machek
Subject: Re: [patch] ext2/3: document conditions when reliable operation is possible
Date: Sun, 30 Aug 2009 09:19:57 +0200
Message-ID: <20090830071957.GA1656@ucw.cz>
References: <4A92F6FC.4060907@redhat.com> <20090826111751.GC26595@elf.ucw.cz> <20090826122813.GI32712@mit.edu> <200908270106.15032.rob@landley.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Rob Landley, Theodore Tso, Rik van Riel, Ric Wheeler, Florian Weimer, Goswin von Brederlow, kernel list, Andrew Morton, mtk.manpages@gmail.com, rdunlap@xenotime.net, linux-doc@vger.kernel.org, linux-ext4@vger.kernel.org, corbet@lwn.net
To: david@lang.hm
Return-path:
Content-Disposition: inline
In-Reply-To:
Sender: linux-doc-owner@vger.kernel.org
List-Id: linux-ext4.vger.kernel.org

Hi!

>> I thought the reason for that was that if your metadata is horked, further
>> writes to the disk can trash unrelated existing data because it's lost track
>> of what's allocated and what isn't. So back when the assumption was "what's
>> written stays written", then keeping the metadata sane was still darn
>> important to prevent normal operation from overwriting unrelated existing
>> data.
>>
>> Then Pavel notified us of a situation where interrupted writes to the disk can
>> trash unrelated existing data _anyway_, because the flash block size on the 16
>> gig flash key I bought retail at Fry's is 2 megabytes, and the filesystem thinks
>> it's 4k or smaller. It seems like what _broke_ was the assumption that the
>> filesystem block size >= the disk block size, and nobody noticed for a while.
>> (Except the people making jffs2 and friends, anyway.)
>>
>> Today we have cheap plentiful USB keys that act like hard drives, except that
>> their write block size isn't remotely the same as hard drives', but they
>> pretend it is, and then the block wear levelling algorithms fuzz things
>> further.
>> (Gee, a drive controller lying about drive geometry, the scsi crowd
>> should feel right at home.)
>
> actually, you don't know if your USB key works that way or not. Pavel has
> some that do, that doesn't mean that all flash drives do
>
> when you do a write to a flash drive you have to do the following items
>
> 1. allocate an empty eraseblock to put the data on
>
> 2. read the old eraseblock
>
> 3. merge the incoming write to the eraseblock
>
> 4. write the updated data to the flash
>
> 5. update the flash translation layer to point reads at the new location
> instead of the old location.

That would need two erases per single sector written, no? Erase is in
the millisecond range, so the performance would be just way too bad :-(.

								Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
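
To make the cost argument concrete, here is a minimal sketch of the five
quoted steps as a toy copy-on-write translation layer, plus the arithmetic
behind the performance worry. Everything named here is hypothetical: ToyFTL,
the 512-byte sector, and the 1 ms erase time are assumptions for illustration
(the thread only gives the 2 MB eraseblock and "millisecond range"). The toy
erases only the retired block, so it counts one erase per sector write; the
estimate takes the erase count as a parameter so the figure of two can be
plugged in instead.

```python
SECTOR = 512                    # bytes; assumed sector size
ERASEBLOCK = 2 * 1024 * 1024    # bytes; the 2 MB figure quoted in the thread
ERASE_MS = 1.0                  # assumed; the thread only says "millisecond range"


class ToyFTL:
    """Hypothetical flash translation layer following the five quoted steps."""

    def __init__(self, logical_blocks, spare_blocks=1, eraseblock=ERASEBLOCK):
        self.eraseblock = eraseblock
        total = logical_blocks + spare_blocks
        # Erased flash reads back as all 0xff bytes.
        self.flash = [bytearray(b"\xff" * eraseblock) for _ in range(total)]
        self.map = list(range(logical_blocks))           # logical -> physical
        self.free = list(range(logical_blocks, total))   # erased, ready to program
        self.erases = 0
        self.programmed = 0                              # bytes written to flash

    def write_sector(self, lba, data):
        assert len(data) == SECTOR
        logical, offset = divmod(lba * SECTOR, self.eraseblock)
        new = self.free.pop()                       # 1. allocate an empty eraseblock
        old = self.map[logical]
        merged = bytearray(self.flash[old])         # 2. read the old eraseblock
        merged[offset:offset + SECTOR] = data       # 3. merge the incoming write
        self.flash[new][:] = merged                 # 4. write the updated data
        self.programmed += self.eraseblock
        self.map[logical] = new                     # 5. point reads at the new location
        # Assumption: the retired block is erased immediately so it can be reused.
        self.flash[old][:] = b"\xff" * self.eraseblock
        self.erases += 1
        self.free.append(old)

    def read_sector(self, lba):
        logical, offset = divmod(lba * SECTOR, self.eraseblock)
        return bytes(self.flash[self.map[logical]][offset:offset + SECTOR])


def max_sector_writes_per_second(erases_per_write, erase_ms=ERASE_MS):
    """Upper bound on sector writes per second if erase time dominates."""
    return 1000.0 / (erases_per_write * erase_ms)
```

Under these assumptions, one 512-byte write programs a full 2 MB eraseblock
(4096 times the payload), and with two 1 ms erases per write
max_sector_writes_per_second(2) caps out at 500 sector writes per second,
i.e. about 250 KB/s of small random writes.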