From: david@lang.hm
Subject: Re: [patch] ext2/3: document conditions when reliable operation is possible
Date: Wed, 26 Aug 2009 04:28:00 -0700 (PDT)
Message-ID:
References: <20090825232601.GF4300@elf.ucw.cz> <4A947682.2010204@redhat.com> <20090825235359.GJ4300@elf.ucw.cz> <4A947DA9.2080906@redhat.com> <20090826001645.GN4300@elf.ucw.cz> <4A948259.40007@redhat.com> <20090826010018.GA17684@mit.edu> <4A948C94.7040103@redhat.com> <20090826025849.GF32712@mit.edu> <4A9510D2.1090704@redhat.com> <20090826111208.GA26595@elf.ucw.cz>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: Ric Wheeler, Theodore Tso, Florian Weimer, Goswin von Brederlow, Rob Landley, kernel list, Andrew Morton, mtk.manpages@gmail.com, rdunlap@xenotime.net, linux-doc@vger.kernel.org, linux-ext4@vger.kernel.org, corbet@lwn.net
To: Pavel Machek
Return-path:
In-Reply-To: <20090826111208.GA26595@elf.ucw.cz>
Sender: linux-doc-owner@vger.kernel.org
List-Id: linux-ext4.vger.kernel.org

On Wed, 26 Aug 2009, Pavel Machek wrote:

> On Wed 2009-08-26 06:39:14, Ric Wheeler wrote:
>> On 08/25/2009 10:58 PM, Theodore Tso wrote:
>>> On Tue, Aug 25, 2009 at 09:15:00PM -0400, Ric Wheeler wrote:
>>>
>>>> I agree with the whole write up outside of the above - degraded RAID
>>>> does meet this requirement unless you have a second (or third, counting
>>>> the split write) failure during the rebuild.
>>>>
>>> The argument is that if the degraded RAID array is running in this
>>> state for a long time, and the power fails while the software RAID is
>>> in the middle of writing out a stripe, such that the stripe isn't
>>> completely written out, we could lose all of the data in that stripe.
>>>
>>> In other words, a power failure in the middle of writing out a stripe
>>> in a degraded RAID array counts as a second failure.
>>>
>>> To me, this isn't a particularly interesting or newsworthy point,
>>> since a competent system administrator who cares about his data and/or
>>> his hardware will (a) have a UPS, and (b) be running with a hot spare
>>> and/or will immediately replace a failed drive in a RAID array.
>>
>> I agree that this is not an interesting (or likely) scenario, certainly
>> when compared to the much more frequent failures that RAID will protect
>> against which is why I object to the document as Pavel suggested. It
>> will steer people away from using RAID and directly increase their
>> chances of losing their data if they use just a single disk.
>
> So instead of fixing or at least documenting known software deficiency
> in Linux MD stack, you'll try to suppress that information so that
> people use more of raid5 setups?
>
> Perhaps the better documentation will push them to RAID1, or maybe
> make them buy an UPS?

people aren't objecting to better documentation, they are objecting to
misleading documentation.

for flash drives the danger is very straightforward (although even then
you have to note that it depends heavily on the firmware of the device:
some will lose lots of data, some won't lose any)

a good thing to do here would be for someone to devise a test to show
this problem, and then gather the results of lots of people performing
this test to see what the commonalities are.

you are generalizing that since you have lost data on flash drives, all
flash drives are dangerous.

what if it turns out that only one manufacturer is doing things wrong?
you will have discouraged people from using flash drives for no reason.
(potentially causing them to lose data because they are scared away from
using flash drives and don't implement anything better)

to be safe, all that a flash drive needs to do is to not change the FTL
pointers until the data has been fully recorded in its new location.
this is probably a trivial firmware change.
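to illustrate the ordering I mean, here is a minimal sketch in Python. it is a toy model, not any real firmware: the class `ToyFTL`, the `pointer_last` flag and the `fail_after` parameter are all made up for illustration, and the "power failure" is just an exception raised part-way through programming a page.

```python
class PowerFailure(Exception):
    """Simulated loss of power in the middle of programming a page."""


class ToyFTL:
    """Toy logical-to-physical map; hypothetical, not modelled on real firmware."""

    def __init__(self, pointer_last=True):
        self.pages = {}        # physical page number -> bytes stored there
        self.l2p = {}          # logical block number -> physical page number
        self.next_free = 0
        self.pointer_last = pointer_last

    def _program(self, phys, data, fail_after=None):
        # Copy the data into the page one byte at a time so that a
        # simulated power failure can leave the page partially written.
        self.pages[phys] = b""
        for i in range(len(data)):
            if fail_after is not None and i == fail_after:
                raise PowerFailure
            self.pages[phys] += data[i:i + 1]

    def write(self, logical, data, fail_after=None):
        phys = self.next_free
        self.next_free += 1
        if self.pointer_last:
            self._program(phys, data, fail_after)  # data first
            self.l2p[logical] = phys               # pointer only once data is complete
        else:
            self.l2p[logical] = phys               # pointer first (the unsafe ordering)
            self._program(phys, data, fail_after)

    def read(self, logical):
        return self.pages[self.l2p[logical]]


def survive_interrupted_write(pointer_last):
    """Write a block, interrupt an overwrite of it, return what reads back."""
    ftl = ToyFTL(pointer_last)
    ftl.write(0, b"old contents")
    try:
        ftl.write(0, b"new contents", fail_after=3)
    except PowerFailure:
        pass
    return ftl.read(0)
```

in this toy model `survive_interrupted_write(True)` still reads back the old block, while `survive_interrupted_write(False)` reads back a partial page. whether real devices behave like either case is exactly what a test would have to establish.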

for raid arrays, we are still learning the nuances of what actually can
happen. the comment that Rik made a few hours ago when he pointed out
that with raid 5 you won't trash the entire stripe (which is what I
thought happened from prior comments), but instead run the risk of
losing two relatively definable chunks of data

1. the block you are writing (which you can lose anyway)
2. the block that would live on the disk that is missing.

that drastically lessens the impact of the problem

I would like to see someone explain what would happen on raid 6, and I
think that the possibilities that Neil talked about, where he said that
it was possible to try the various combinations and see which ones agree
with each other, would be a good thing to implement if he can do so.

but the super simplified statement you keep trying to make is
significantly overstating and oversimplifying the problem.

David Lang
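
P.S. to make the raid 5 point above concrete, here is a small sketch in Python. it assumes a simplified stripe of three data blocks plus one XOR parity block, with the disk holding the second data block already failed; the block contents and the function names are made up for illustration, and this is not how the MD code is actually structured.

```python
def xor(*blocks):
    """Byte-wise XOR of equally sized blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def interrupted_degraded_write():
    # Original stripe contents (illustrative values only).
    d0, d1, d2 = b"AAAA", b"BBBB", b"CCCC"
    parity = xor(d0, d1, d2)

    # The disk holding d1 has failed, so only these blocks are on disk.
    disks = {"d0": d0, "d2": d2, "p": parity}

    # Start rewriting d0: the new data reaches the disk, but power fails
    # before the matching parity update is written.
    disks["d0"] = b"ZZZZ"

    # After reboot, d1 can only be rebuilt from the surviving blocks,
    # and the parity no longer matches the new d0.
    rebuilt_d1 = xor(disks["d0"], disks["d2"], disks["p"])

    return {
        "d1_intact": rebuilt_d1 == d1,    # block on the missing disk
        "d2_intact": disks["d2"] == d2,   # untouched block on a healthy disk
    }
```

in this model the block being written is in an uncertain state and the block on the missing disk rebuilds incorrectly, but the third data block is read straight from its own disk and is unaffected.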