Date: Sun, 17 Jun 2007 13:14:10 -0400
From: Bill Davidsen
Organization: TMR Associates Inc, Schenectady NY
To: Neil Brown
CC: david@lang.hm, linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org
Subject: Re: limits on raid
Message-ID: <46756BE2.7010401@tmr.com>
In-Reply-To: <18034.3676.477575.490448@notabene.brown>
References: <18034.479.256870.600360@notabene.brown> <18034.3676.477575.490448@notabene.brown>

Neil Brown wrote:
> On Thursday June 14, david@lang.hm wrote:
>
>> On Fri, 15 Jun 2007, Neil Brown wrote:
>>
>>> On Thursday June 14, david@lang.hm wrote:
>>>
>>>> what is the limit for the number of devices that can be in a single
>>>> array?
>>>>
>>>> I'm trying to build a 45x750G array and want to experiment with the
>>>> different configurations. I'm trying to start with raid6, but mdadm is
>>>> complaining about an invalid number of drives.
>>>>
>>>> David Lang
>>>>
>>> "man mdadm", search for "limits" (forgive typos).
>>>
>> thanks.
>>
>> why does it still default to the old format after so many new versions?
>> (by the way, the documentation said 28 devices, but I couldn't get it to
>> accept more than 27)
>>
> Dunno - maybe I can't count...
>
>> it's now churning away 'rebuilding' the brand new array.
>>
>> a few questions/thoughts.
>>
>> why does it need to do a rebuild when making a new array? couldn't it
>> just zero all the drives instead? (or better still, just record most of
>> the space as 'unused' and initialize it as it starts using it?)
>>
> Yes, it could zero all the drives first. But that would take the same
> length of time (unless p/q generation was very very slow), and you
> wouldn't be able to start writing data until it had finished.
> You can "dd" /dev/zero onto all drives and then create the array with
> --assume-clean if you want to. You could even write a shell script to
> do it for you.
>
> Yes, you could record which space is used vs unused, but I really
> don't think the complexity is worth it.
>
How about a simple solution which would get an array online and still be
safe? All it would take is a flag which forces reconstruct writes for
RAID-5. You could set it with an option, or automatically when someone
combines --assume-clean with --create, and leave it in the superblock until
the first "repair" runs to completion. For repair you could then assume
that bad parity was not caused by an error but is simply unwritten.

Thought 2: I think the unwritten bit is easier than you think; you only
need it on parity blocks for RAID-5, not on data blocks. When a write is
done, if the bit is set, do a reconstruct, write the parity block, and
clear the bit. Keeping a bit per data block is madness, and appears to be
unnecessary as well.
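For reference, Neil's "zero the drives, then create with --assume-clean"
suggestion fits in a few lines of shell. This is only a sketch: the member
devices, array name, level and device count below are made-up examples, and
the dd pass destroys everything on those disks. The idea is that once every
member is all zeros, parity is trivially consistent (parity over all-zero
data is itself zero), so skipping the initial resync is safe:

  #!/bin/sh
  # Sketch only: substitute your own member devices, array name and level.
  # WARNING: the dd pass below erases everything on the listed disks.

  DEVICES="/dev/sdb /dev/sdc /dev/sdd /dev/sde"

  # Zero every member (in parallel) so data and parity start out consistent.
  # dd exits with "No space left on device" when it hits the end of each
  # disk; that is expected.
  for dev in $DEVICES; do
      dd if=/dev/zero of="$dev" bs=1M &
  done
  wait

  # The array really is clean now, so tell mdadm to skip the initial resync.
  mdadm --create /dev/md0 --level=6 --raid-devices=4 --assume-clean $DEVICES

As Neil says, this isn't any faster than letting md do the resync itself;
it just lets you choose when the sequential writes happen and start with a
known-consistent array.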
>> while I consider zfs to be ~80% hype, one advantage it could have (but I
>> don't know if it has) is that since the filesystem and raid are
>> integrated into one layer they can optimize the case where files are
>> being written onto unallocated space, and instead of reading blocks from
>> disk to calculate the parity they could just put zeros in the
>> unallocated space, potentially speeding up the system by reducing the
>> amount of disk I/O.
>>
> Certainly. But the raid doesn't need to be tightly integrated
> into the filesystem to achieve this. The filesystem need only know
> the geometry of the RAID, and when it comes to write, it tries to write
> full stripes at a time. If that means writing some extra blocks full
> of zeros, it can try to do that. This would require a little bit
> better communication between filesystem and raid, but not much. If
> anyone has a filesystem that they want to be able to talk to raid
> better, they need only ask...
>
>> is there any way that linux would be able to do this sort of thing? or
>> is it impossible due to the layering preventing the necessary knowledge
>> from being in the right place?
>>
> Linux can do anything we want it to. Interfaces can be changed. All
> it takes is a fairly well defined requirement, and the will to make it
> happen (and some technical expertise, and lots of time... and coffee?).
>
Well, I gave you two thoughts: one which would be slow until a repair but
sounds easy to do, and one which is slightly harder but works better and
minimizes the performance impact.

-- 
bill davidsen
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979
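P.S. The coarse, static half of "the filesystem need only know the geometry
of the RAID" is already available at mkfs time: you can pass the chunk size
and the number of data disks so that allocation lines up on stripe
boundaries. The numbers below are illustrative only (64k chunks, 4 data
disks out of a 6-drive RAID-6), and the exact option names depend on your
mkfs version, so treat this as a sketch rather than a recipe:

  CHUNK_KB=64      # md chunk size in KiB
  DATA_DISKS=4     # 6-drive RAID-6 => 4 data disks per stripe

  # XFS takes the stripe unit and stripe width directly:
  mkfs.xfs -d su=${CHUNK_KB}k,sw=${DATA_DISKS} /dev/md0

  # ext3: stride is the chunk size expressed in filesystem blocks (4k here):
  mke2fs -j -b 4096 -E stride=$((CHUNK_KB / 4)) /dev/md0

Full-stripe writes of the kind Neil describes (padding short writes with
zeros instead of doing read-modify-write) would still need the extra
filesystem/raid communication he mentions; the hints above only steer
allocation.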