2004-01-11 15:22:33

by Wilfried Weissmann

[permalink] [raw]
Subject: kernel 2.6 biosraid via device mapper - partition support

hello,

i just started to investigate how biosraid support for the HPT37X
IDE-chipsets can be implemented in the 2.6 kernel. implementing the
basic raid levels (0, 1, 0+1, JBOD) seems to be pretty straight forward.
this can be done by reading the raid signatures of the disks and then
pipeing the configuration through dmsetup or using the libdevmapper
library directly. what bothers me is the partition support. the number
of minor device nodes that are registered per mapped block device is 1.
this means that there is no way that the kernel does the
partition-handling by itself. the alternative is to do the partition
scanning in userspace and to use another device mapper layer to create
the partition device nodes. it appears that this was already suggested
by Christophe Varoqui ( http://lwn.net/Articles/13958/ ) but this
project is now idle. this also has the disadvantage that any changes in
the partitioning of the raid volume (e.g. by using *fdisk, distribution
installers, ...) require a manual re-invocation of the biosraid setup
tool. plus the whole code under linux/fs/partitions/... has to be
duplicated so that not only the dos partitioning scheme is supported,
but also BSD slices, x86 solaris, windows dynamic disks, ...

which way to go? is there another solution that i have missed?

regards,
Wilfried


2004-01-11 19:35:16

by Christophe Saout

[permalink] [raw]
Subject: Re: [dm-devel] kernel 2.6 biosraid via device mapper - partition support

Am So, den 11.01.2004 schrieb Wilfried Weissmann um 16:21:

> i just started to investigate how biosraid support for the HPT37X
> IDE-chipsets can be implemented in the 2.6 kernel. implementing the
> basic raid levels (0, 1, 0+1, JBOD) seems to be pretty straight forward.

Yepp.

> this can be done by reading the raid signatures of the disks and then
> pipeing the configuration through dmsetup or using the libdevmapper
> library directly. what bothers me is the partition support. the number
> of minor device nodes that are registered per mapped block device is 1.
> this means that there is no way that the kernel does the
> partition-handling by itself.

Yepp.

> the alternative is to do the partition
> scanning in userspace and to use another device mapper layer to create
> the partition device nodes. it appears that this was already suggested
> by Christophe Varoqui ( http://lwn.net/Articles/13958/ ) but this
> project is now idle.

Some people have written scripts to run in initrd for this the last
time. Using sfdisk -ld and shell scripts is very simple though not a
solution for everything, just for now.

> this also has the disadvantage that any changes in
> the partitioning of the raid volume (e.g. by using *fdisk, distribution
> installers, ...) require a manual re-invocation of the biosraid setup
> tool. plus the whole code under linux/fs/partitions/... has to be
> duplicated so that not only the dos partitioning scheme is supported,
> but also BSD slices, x86 solaris, windows dynamic disks, ...

The plan for Linux 2.7 is to rip out partition detection from the kernel
and do everything in userspace (probably initramfs). So someone could
start by making the partition detection code a library.

The detection script/program could then be run from udev (the hotplug
interface) when a new block device is created.

I don't know what can be done about the ioctl that tells the kernel to
reread the partition table. It could either be dropped or make the
kernel create a hotplug event but that seems rather ugly.

> which way to go? is there another solution that i have missed?

I think you're on the right way. But you see there are still a lot of
issues to be discussed.


2004-01-11 20:34:17

by Jeff Garzik

[permalink] [raw]
Subject: partition detection in 2.7

Christophe Saout wrote:
> The plan for Linux 2.7 is to rip out partition detection from the kernel
> and do everything in userspace (probably initramfs). So someone could
> start by making the partition detection code a library.


Linus just vehementally stated recently that partition parsing wouldn't
be leaving the kernel :)

However, I do think we will eventually move to a middle ground, where
partition parsing code will be in the kernel _source_ tree, but it will
be in initramfs as you describe.

The reason being is that block device attachment and setup is growing
more complicated over time, as people move to things like dm+lvm2+md or
iscsi+dm+evms. Thus, the support code to make those device combinations
to be used as a root device will get more and more complex.

RAID and device mapper were actually two big reasons why I am pushing
for klibc (pushed back to 2.7 now) and initramfs in the kernel source
tree. LVM, some EFI bits, dm, and md all have boot components are
mainly in the kernel because they need to happen (a) early and (b)
always, not because they actually need to be compiled into the kernel
image itself.

Jeff