2003-06-29 06:43:52

by rmoser

Subject: File System conversion -- ideas

I know I spout a ... wtf? HTML composing? *attempts to eliminate*

I know I spout a lot of crap, and wish I could just do it all (can we get
a "Make a small device driver for virtual hardware in Linux 2.4 and 2.5"
tutorial up on kernel.org?!), but I think I've got some good ideas. At
any rate, the good is kept and the bad is weeded out, right?

Anyhow, I'm thinking still about when reiser4 comes out. I want to
convert to it from reiser3.6. It came to my attention that a user-space
tool to convert between filesystems is NOT the best way to deal with
this. Seriously, you'd think it would be, right? Wrong, IMHO.

You have the filesystem code for every filesystem Linux supports. It's
there, in the kernel. So why maintain a kludgy userspace tool that has
to be rewritten to understand them all? I have a better idea.

How about a kernel syscall? It's possible to do this on a running
filesystem but it's far too difficult for a start, so let's start with
unmounted filesystems mmkay?

**** BEGIN WELL STRUCTURED MESSAGE ****

I'm going to go over a method of building into the kernel a filesystem
conversion suite. I am first going to give a brief overview of the concept,
then I will draw up a roadmap, and then I will explain why I believe this is
the best way to solve this problem.

What I am suggesting is a kernel syscall suite that will allow a simple
userspace application to invoke a conversion between filesystems on an
unmounted filesystem. The idea is that instead of maintaining the tool,
you (sorry I keep wanting to say "we" for no reason, so excuse me if I do)
simply code this and then maintain the kernel as usual, almost forgetting
about the tool because it changes with the kernel.

The first thing that has to happen is that the kernel filesystem drivers must
be altered to allow the filesystems to draw out the meta-data and group it
with the data, transmit it to the conversion functions, and have this data given
to them to be rewritten. This will require a quick pre-pass of each individual
inode and a comparison to decide if the converted filesystem will actually
fit on disk; ext3 being converted to ext3 with a larger block size will FAIL if
the conversion causes the data to be bigger than the media.
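
A rough sketch of that pre-pass, in Python for brevity rather than kernel C. The names (blocks_needed, converted_size, conversion_fits) and the flat per-file overhead are assumptions made up for illustration; a real check would also have to account for inode tables, journals and other filesystem-specific structures.

```python
def blocks_needed(size, block_size):
    """Whole blocks needed to hold `size` bytes (ceiling division)."""
    return -(-size // block_size)


def converted_size(file_sizes, block_size, per_file_overhead=0):
    """Estimated bytes used once every file is rounded up to whole blocks.

    `per_file_overhead` is a hypothetical stand-in for per-inode meta-data.
    """
    return sum(
        blocks_needed(size, block_size) * block_size + per_file_overhead
        for size in file_sizes
    )


def conversion_fits(file_sizes, block_size, media_size, per_file_overhead=0):
    """True if the estimated converted size does not exceed the media."""
    return converted_size(file_sizes, block_size, per_file_overhead) <= media_size


# 1000 tiny files on a 2 MiB device: fine with 1 KiB blocks,
# too big once each file is rounded up to a 4 KiB block.
small_files = [100] * 1000
media = 2 * 1024 * 1024
print(conversion_fits(small_files, 1024, media))
print(conversion_fits(small_files, 4096, media))
```

The point of running this before touching anything is that the conversion can be refused outright instead of failing halfway through.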

The second thing that must happen is a syscall has to be added that allows
for conversion to be invoked. Simple. Preferably these functions would fork
from the kernel in a new thread or process and work in userspace, to avoid
locking the kernel as they execute and lagging the userspace by making the
kernel eat massive resources.

The last thing is that a user-space program to invoke these syscalls has to be
coded.

Here is a suggested roadmap, with excessive detail:

1) Create a method for storing meta-data for each file/directory on a filesystem
which is being slowly destroyed. The data structures have to house everything
including the data that goes into the inode tables and all meta-data about the
inode, plus the data for the file/directory itself. It MUST be object oriented,
because some meta-data will not transfer from one filesystem to another. Each
unit should, possibly MUST, be compressed, since it MAY be larger than the
original input and also because it likely will have to be stored in the space of the
original data, unless the data is slowly shifted down the filesystem. It is
preferable to make this datasystem fault tolerant, so that if it goes down, the
conversion can be continued without damage. It should be possible to plan a
conversion on umount, so that the root filesystem may be converted at shutdown.
- Object oriented: Store meta-data that may not be recognized by the new
filesystem
- Journalized: Don't break!
- Compress data unit: Don't get bigger than input. Option is per-unit (compressed
files get bigger when compressed!)
- Store data that is needed to resume the conversion at any time: There may be
a colossal system crash during conversion!
- Differentiate between each filesystem structure and the datasystem used during
conversion: Must be able to disassemble one filesystem and reassemble it to
another WITHOUT getting lost!
- ... I had another important thing I forgot for now. You guys are smart and there's
more of you than me. You figure it out.

2) Write this datastructure into the filesystems section of the kernel.

3) Rewrite the filesystem drivers in the kernel to be able to communicate with
the filesystem conversion datastructure code. This will allow the slow systematic
destruction of the filesystem in place and at the same time the slow systematic
creation of the new filesystem for EVERY filesystem [with write support?] in the
kernel.

4) Implement a syscall to initiate this process. The functions should be run in
userspace if it is possible to fork execution out of the kernel and into userspace.
These syscalls include the checks to make sure the filesystem is not mounted.

5) Implement a userspace program to call these functions. It is slave to syscalls;
it does NOT do the checks itself.

6) Revisit steps 1 through 5 as needed until the process works properly.

7) Continue on to recode kernel VFS to allow the conversion to take place on a
running filesystem, and to allow that filesystem to be mounted even if the
conversion was forcibly stopped. This will allow a smoother conversion of the
root filesystem and allow the user to keep running during conversion of the root
filesystem, although likely with a massive latency issue.
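
Going back to the "Compress data unit" point in step 1, the per-unit option could look something like the sketch below. It is Python rather than kernel code, and the one-byte flag layout and the names pack_unit/unpack_unit are invented for illustration only; zlib is used simply because it is at hand.

```python
import os
import zlib

# Hypothetical one-byte flags marking how a unit's payload is stored.
RAW, DEFLATE = 0, 1


def pack_unit(payload: bytes) -> bytes:
    """Store the payload compressed only when that actually saves space."""
    compressed = zlib.compress(payload)
    if len(compressed) < len(payload):
        return bytes([DEFLATE]) + compressed
    return bytes([RAW]) + payload


def unpack_unit(unit: bytes) -> bytes:
    """Reverse pack_unit, using the leading flag byte."""
    flag, body = unit[0], unit[1:]
    if flag == DEFLATE:
        return zlib.decompress(body)
    if flag == RAW:
        return body
    raise ValueError("unknown unit flag: %d" % flag)


text = b"a" * 4096        # compresses well
noise = os.urandom(4096)  # stands in for an already-compressed file
for payload in (text, noise):
    unit = pack_unit(payload)
    assert unpack_unit(unit) == payload
    print(len(payload), "->", len(unit))
```

With the flag byte, a unit is never more than one byte larger than its input, which is what makes it safe to store in the space of the original data.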

I believe this is the best method of dealing with the problem of filesystem
conversion. Current methods include making a new, empty filesystem and
copying over all files as root with a 'umask 000' command first. Future methods
excluding this one may include a userspace program with understanding of
multiple filesystems, or a userspace program that understands kernel modules.
These have the following flaws:

- Copying the files requires a large amount of disk space to create a new
partition and place the new filesystem on it. Also, it alters the entire
partition layout and forces the user to either ping-pong between partitions
or rewrite his /etc/fstab and possibly his root= parameter in his kernel command
line.
- A userspace program with filesystems coded in will require constant maintenance
as new filesystems are created and old filesystems are maintained. The
constant development of filesystems such as ext2/3 (e.g. 2.0 doesn't understand
the extra features in the version of ext2 that 2.2 has) and reiserfs causes this
to require the rewriting of code in the userspace program, which is redundant
because the filesystem has already been implemented in an incompatible
manner in the kernel itself.
- A userspace program using kernel modules will be more prone to bugs, as it has
to simultaneously grok all versions of kernel modules. The structure of kernel
modules changes as time goes on. The program will grow and grow, or lose the
ability to communicate with older kernel versions. It has the advantage that it
may be written to also grok older kernel modules; however, the entire
infrastructure described above may be backported to older kernel versions,
making this argument moot.

My method has the flaw that you have to get the kernel developers to agree to it.
If Linus is reading this, I'd like to at least ask that you hold back rejecting this until
the other developers have a chance to examine it (pessimistic outlook I have on
things, isn't it?). It also has the flaw of requiring massive kernel-level work to
implement, which in itself may render the kernel useless if not done quite right.
These changes to the filesystem drivers should not affect the filesystem drivers
until the kernel explicitly calls them to do the conversions (and when the option
is enabled in the make {menu | x}config). It is flawed also in that it requires
massive CPU and possibly memory; but that is expected of any conversion of
filesystems.

Well, tell me what you think. This is where my thinking ends.


2003-06-29 09:48:38

by John Bradford

Subject: Re: File System conversion -- ideas

> Anyhow, I'm thinking still about when reiser4 comes out. I want to
> convert to it from reiser3.6. It came to my attention that a user-space
> tool to convert between filesystems is NOT the best way to deal with
> this. Seriously, you'd think it would be, right? Wrong, IMHO.
>
> You have the filesystem code for every filesystem Linux supports. It's
> there, in the kernel. So why maintain a kludgy userspace tool that has
> to be rewritten to understand them all? I have a better idea.
>
> How about a kernel syscall? It's possible to do this on a running
> filesystem but it's far too difficult for a start, so let's start with
> unmounted filesystems mmkay?

Apart from the special case of converting from one major version of a
filesystem to another major version of the same filesystem, I think
the performance of an on-the-fly filesystem conversion utility is
going to be so much worse than just creating a new partition and
copying the data across, that the only reason to do it would be if you
could do it on a read-write filesystem without unmounting it.

What I'd like to see is union mounts which allowed you to mount a new
filesystem of a different type over the original one, and have all new
writes go to the new filesystem. I.e., as files were modified, they
would be re-written to the new FS. That would be one way of avoiding
the performance hit on a busy server.

John.

2003-06-29 13:13:57

by Jamie Lokier

Subject: Re: File System conversion -- ideas

John Bradford wrote:
> I think
> the performance of an on-the-fly filesystem conversion utility is
> going to be so much worse than just creating a new partition and
> copying the data across,

which is awfully difficult if you have, say, a 60GB filesystem, a 60GB
disk, and nothing else.

> that the only reason to do it would be if you
> could do it on a read-write filesystem without unmounting it.

IMHO even if it requires the filesystem to be unmounted, it would
still be useful. More challenging to use - you'd have to boot and run
from ramdisk, but much more useful than not being able to convert at all.

> What I'd like to see is union mounts which allowed you to mount a new
> filesystem of a different type over the original one, and have all new
> writes go to the new fileystem. I.E. as files were modified, they
> would be re-written to the new FS. That would be one way of avoiding
> the performance hit on a busy server.

But useless unless you have a second disk lying around that you don't
use for anything but filesystem conversions.

-- Jamie

2003-06-29 13:38:07

by Leonard Milcin Jr.

Subject: Re: File System conversion -- ideas

Jamie Lokier wrote:
> John Bradford wrote:
> I think
>
>>the performance of an on-the-fly filesystem conversion utility is
>>going to be so much worse than just creating a new partition and
>>copying the data across,
>
>
> which is awfully difficult if you have, say, a 60GB filesystem, a 60GB
> disk, and nothing else.
>

I think that on-the-fly filesystem conversion is useless. Why? If you're
converting a filesystem, you have to make a good backup of the data
on that filesystem. It is likely that if something goes wrong during
conversion (power loss), the filesystem will be corrupted and data will be
lost. If you think the data is not worth backing up, you don't have
to convert it. Just delete the worthless filesystem and create a new one. If
the data is worth backing up, and you do back it up, you don't
need to convert it. You can just delete the filesystem and restore the data
from the copy. If, in turn, someone thinks the data is worth protecting from
loss but will not back it up, he risks losing the data, and
he should not get access to such things.

I think that copying the data to another filesystem and restoring it to
a newly created one is, most of the time, the best and fastest method of
converting filesystems.

Regards,

Leonard Milcin Jr.

--
"Unix IS user friendly... It's just selective about who its friends are."
-- Tollef Fog Heen

2003-06-29 13:36:44

by David D. Hagood

Subject: Re: File System conversion -- ideas

This is a place where logical volume management can help.

For example, suppose you have a 60G disk, 55G of data, in ext2, and you
wish to convert to ReiserFS.

Step 1: Shrink the volume to 55G. This requires a "shrink disk" utility
for the source file system (which exists for the major file systems in
use today).
Step 2: Create an LVM block in the remaining 5G.
Step 3: Create a ReiserFS in the LVM block.
Step 4: Move 5G of data from the ext2 system to the ReiserFS block.
Step 5: Shrink the ext2 volume by another 5G
Step 6: Convert that 5G into an LVM block
Step 7: Add that block to the ReiserFS volume group.
Step 8: Grow the ReiserFS.
Step 9: Repeat 4-8 as needed.
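
The loop in steps 4-8 can be simulated to see how many rounds it takes. This is only a back-of-the-envelope sketch in Python: plan_migration is a made-up name, sizes are in whole GB, and filesystem and LVM overhead are ignored entirely.

```python
def plan_migration(disk_gb, data_gb):
    """Simulate the shrink / move / grow loop; returns one tuple per round:
    (GB moved, old filesystem size, new filesystem size)."""
    old_size = data_gb             # step 1: shrink the old fs down to its data
    old_data = data_gb
    new_size = disk_gb - old_size  # steps 2-3: new fs in the freed space
    new_data = 0
    if new_size <= 0:
        raise ValueError("no free space to start the migration")

    rounds = []
    while old_data > 0:
        # Step 4: move as much as the new filesystem can currently hold.
        moved = min(new_size - new_data, old_data)
        old_data -= moved
        new_data += moved
        # Step 5: shrink the old filesystem by what was just moved out.
        freed = old_size - old_data
        old_size -= freed
        # Steps 6-8: hand the freed space to the new filesystem and grow it.
        new_size += freed
        rounds.append((moved, old_size, new_size))
    return rounds


rounds = plan_migration(disk_gb=60, data_gb=55)
print(len(rounds))   # 11 rounds of 5 GB each
print(rounds[-1])    # (5, 0, 60)
```

The free space never grows beyond the initial 5G, so the number of rounds is roughly the data size divided by the initial free space.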


This is why I'd really love to see LVM|EVM become standard, not just in
the kernel but in the distributions - if every distro by default made
all Linux volumes in LVM, then migrating data to bigger drives/adding
more space/converting file systems would be so much easier.

2003-06-29 15:51:04

by John Bradford

Subject: Re: File System conversion -- ideas

> > I think
> > the performance of an on-the-fly filesystem conversion utility is
> > going to be so much worse than just creating a new partition and
> > copying the data across,
>
> which is awfully difficult if you have, say, a 60GB filesystem, a 60GB
> disk, and nothing else.

Well, I don't partition all of the space on every new disk I buy
straight away, I partition off what I think I'll need, and leave the
rest unallocated.

> > that the only reason to do it would be if you
> > could do it on a read-write filesystem without unmounting it.
>
> IMHO even if it requires the filesystem to be unmounted, it would
> still be useful. More challenging to use - you'd have to boot and run
> from ramdisk, but much more useful than not being able to convert at all.

Only if it is the root filesystem, and the type of filesystem used for
root generally isn't going to affect overall performance that much.

> > What I'd like to see is union mounts which allowed you to mount a new
> > filesystem of a different type over the original one, and have all new
> > writes go to the new fileystem. I.E. as files were modified, they
> > would be re-written to the new FS. That would be one way of avoiding
> > the performance hit on a busy server.
>
> But useless unless you have a second disk lying around that you don't
> use for anything but filesystem conversions.

Not at all. You can just use unpartitioned space on your existing
disk.

John.

2003-06-29 16:02:04

by John Bradford

Subject: Re: File System conversion -- ideas

> > I think
> >
> >>the performance of an on-the-fly filesystem conversion utility is
> >>going to be so much worse than just creating a new partition and
> >>copying the data across,
> >
> >
> > which is awfully difficult if you have, say, a 60GB filesystem, a 60GB
> > disk, and nothing else.
> >
>
> I think that filesystem conversion on-the-fly is useless. Why? If you're
> making conversion of filesystem, you have to make good backup of data
> from that filesystem.

I agree.

Imagine a webserver with all its webpages on a 40 GB EXT-2 partition
on /dev/sda1.

If I wanted to move the data on to a ReiserFS partition, I would just:

* Create the new partition on another device, E.G. /dev/sdb1
* Mount /dev/sda1 read-only
* Copy the data across to /dev/sdb1 as a nice process
* Stop the webserver processes
* Unmount /dev/sda1
* Mount /dev/sdb1 read-only
* Restart the webserver processes
* Test it
* Mount /dev/sdb1 read-write
* Keep /dev/sda1 around as a quick-to-access backup until I was sure
it was all working correctly.
* Re-use /dev/sda1

The webserver would be off-line for only a few seconds, and
performance would not be significantly degraded at any time.

John.

2003-06-29 18:13:43

by rmoser

Subject: Re: File System conversion -- ideas



*********** REPLY SEPARATOR ***********

On 6/29/2003 at 11:11 AM John Bradford wrote:

>> Anyhow, I'm thinking still about when reiser4 comes out. I want to
>> convert to it from reiser3.6. It came to my attention that a user-space
>> tool to convert between filesystems is NOT the best way to deal with
>> this. Seriously, you'd think it would be, right? Wrong, IMHO.
>>
>> You have the filesystem code for every filesystem Linux supports. It's
>> there, in the kernel. So why maintain a kludgy userspace tool that has
>> to be rewritten to understand them all? I have a better idea.
>>
>> How about a kernel syscall? It's possible to do this on a running
>> filesystem but it's far too difficult for a start, so let's start with
>> unmounted filesystems mmkay?
>
>Apart from the special case of converting from one major version of a
>filesystem to another major version of the same filesystem, I think
>the performance of an on-the-fly filesystem conversion utility is
>going to be so much worse than just creating a new partition and
>copying the data across, that the only reason to do it would be if you
>could do it on a read-write filesystem without unmounting it.
>

You've entirely missed the point :/ Did you read the last section? I noted
that the "make new partition and copy" method requires, first off, space
for a new partition. All my partitions have massive amounts of data on them.
I can't do that. Those of us who can have to either do it twice, or rewrite
fstab.

Eventually I'm hoping it can be done on a read-write filesystem. It's
possible; I've thought about how to defragment read-write datasystems
without getting in the way of logical operations.

>What I'd like to see is union mounts which allowed you to mount a new
>filesystem of a different type over the original one, and have all new
>writes go to the new fileystem. I.E. as files were modified, they
>would be re-written to the new FS. That would be one way of avoiding
>the performance hit on a busy server.
>

mmmm, then you'd need both fs' though. That's not conversion ;-)


>John.
>-
>To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at http://vger.kernel.org/majordomo-info.html
>Please read the FAQ at http://www.tux.org/lkml/

--Bluefox Icy

2003-06-29 18:15:16

by John Bradford

Subject: Re: File System conversion -- ideas

> This is a place where logical volume management can help.
>
> For example, suppose you have a 60G disk, 55G of data, in ext2, and you
> wish to convert to ReiserFS.
>
> Step 1: Shrink the volume to 55G. This requires a "shrink disk" utility
> for the source file system (which exists for the major file systems in
> use today).
> Step 2: Create an LVM block in the remaining 5G.
> Step 3: Create a ReiserFS in the LVM block.
> Step 4: Move 5G of data from the ext2 system to the ReiserFS block.
> Step 5: Shrink the ext2 volume by another 5G
> Step 6: Convert that 5G into an LVM block
> Step 7: Add that block to the ReiserFS volume group.
> Step 8: Grow the ReiserFS.
> Step 9: Repeat 4-8 as needed.
>
>
> This is why I'd really love to see LVM|EVM become standard, not just in
> the kernel but in the distributions - if every distro by default made
> all Linux volumes in LVM, then migrating data to bigger drives/adding
> more space/converting file systems would be so much easier.

It's also a good reason not to use one huge partition on each disk,
and a good reason not to partition the whole disk when it's not
needed.

I've seen, (mainly desktop, not server), Linux machines with one
physical disk containing two partitions, root and swap, with the swap
partition being twice the physical memory of the box, even when the
box has more than a gigabyte of physical RAM.

It's usually more flexible just to partition the space you need, and
add more partitions when necessary. For typical desktop use, swap
isn't even necessary with 1 GB of physical RAM.

For example, if you have an 80 GB disk, you could initially partition
10 GB for the root partition, and leave 70 GB unused. When the root
partition fills up, you can simply use du -s /* to see which
directories are taking up the most space, and move them to separate
partitions.
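
For anyone who wants the same ranking sorted and scriptable, here is a small Python sketch of the idea. tree_size and largest_dirs are illustrative names, and unlike du it sums apparent file sizes rather than disk blocks, so the numbers will differ somewhat.

```python
import os


def tree_size(path):
    """Total apparent size in bytes of all files below `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                # lstat so symlinks are counted as links, not followed.
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return total


def largest_dirs(parent):
    """(size, name) for each directory directly under `parent`, largest first."""
    sizes = []
    for entry in os.scandir(parent):
        if entry.is_dir(follow_symlinks=False):
            sizes.append((tree_size(entry.path), entry.name))
    return sorted(sizes, reverse=True)


if __name__ == "__main__":
    # Walking / can take a while and will skip anything unreadable.
    for size, name in largest_dirs("/"):
        print(f"{size:>15}  /{name}")
```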

John.

2003-06-29 18:17:24

by rmoser

Subject: Re: File System conversion -- ideas



*********** REPLY SEPARATOR ***********

On 6/29/2003 at 8:50 AM David D. Hagood wrote:

>This is a place where logical volume management can help.
>
>For example, suppose you have a 60G disk, 55G of data, in ext2, and you
>wish to convert to ReiserFS.
>
>Step 1: Shrink the volume to 55G. This requires a "shrink disk" utility
>for the source file system (which exists for the major file systems in
>use today).
>Step 2: Create an LVM block in the remaining 5G.
>Step 3: Create a ReiserFS in the LVM block.
>Step 4: Move 5G of data from the ext2 system to the ReiserFS block.
>Step 5: Shrink the ext2 volume by another 5G
>Step 6: Convert that 5G into an LVM block
>Step 7: Add that block to the ReiserFS volume group.
>Step 8: Grow the ReiserFS.
>Step 9: Repeat 4-8 as needed.
>

Mess around for hours, each time risking a typo that kills both
filesystems, or risking having the LVM resize die from a power drop or a kick
to the power button (sorry, we don't all have immortal fault tolerance). I actually
thought about this one and figured it was too ridiculously annoying to actually
bring up :-p

>
>This is why I'd really love to see LVM|EVM become standard, not just in
>the kernel but in the distributions - if every distro by default made
>all Linux volumes in LVM, then migrating data to bigger drives/adding
>more space/converting file systems would be so much easier.
>

I've never used LVM, but I'll look at it one day. If it's stable, that's good; I
don't use Windows. I don't know exactly what LVM is but I have a pretty
good idea; it's been forever since I read the doc on it, I forgot what it said!


--Bluefox Icy

2003-06-29 18:33:11

by rmoser

Subject: Re: File System conversion -- ideas



*********** REPLY SEPARATOR ***********

On 6/29/2003 at 3:54 PM Leonard Milcin Jr. wrote:

>Jamie Lokier wrote:
>> John Bradford wrote:
>> I think
>>
>>>the performance of an on-the-fly filesystem conversion utility is
>>>going to be so much worse than just creating a new partition and
>>>copying the data across,
>>
>>
>> which is awfully difficult if you have, say, a 60GB filesystem, a 60GB
>> disk, and nothing else.
>>
>
>I think that filesystem conversion on-the-fly is useless. Why? If you're
>making conversion of filesystem, you have to make good backup of data
>from that filesystem. It is likely that when something goes wrong during
>conversion (power loss) filesystem will be corrupted, and data will be
>lost.

Let's try to make this a bit clearer. Remember I pointed out,

[QUOTE]
"The first thing ... is that the kernel filesystem drivers must ... allow the
filesystems to draw out the meta-data ... with the data, transmit it to the
conversion functions, and have this data given to them to be rewritten.
This will require a quick pre-pass ... to decide if [it] will actually fit"
[ENDQUOTE]

This applies to the datasystem that is rolled out on top of this too. I
did point out that you need to create a transaction-based datasystem
capable of rolling back any changes and storing each state as it goes,
so that it is 100% capable of picking up from where it left off without any
data loss. The "state" is the data that is stored to explain where each
filesystem is, what the conversion is doing next, and everything else it
needs to do. Before you do anything that alters that state, you create a
transaction with all of the original data in it (roll-back, not roll-forward).
Then you begin the change. If power is lost, the kernel will recognize that
the superblock on this filesystem is a conversion datasystem, replay the
journal, and continue the conversion as soon as someone tries to mount it.
Nothing short of a bug in the code can damage it.
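
To make the roll-back idea concrete, here is a toy undo journal in Python. It is a sketch only: the "disk" is a list in memory, UndoJournal and its methods are invented names, and a real journal would itself have to be written to stable storage before each block is changed.

```python
class UndoJournal:
    """Toy roll-back journal over an in-memory list of blocks."""

    def __init__(self, blocks):
        self.blocks = blocks   # the "disk": a list of bytes objects
        self.log = []          # (block_no, original_content) records
        self.active = False

    def begin(self):
        self.log.clear()
        self.active = True

    def write(self, block_no, data):
        if not self.active:
            raise RuntimeError("write outside a transaction")
        # Save the original content BEFORE changing the block.
        self.log.append((block_no, self.blocks[block_no]))
        self.blocks[block_no] = data

    def commit(self):
        # The change is complete; the saved originals are no longer needed.
        self.log.clear()
        self.active = False

    def recover(self):
        """Undo an interrupted transaction, newest record first."""
        while self.log:
            block_no, original = self.log.pop()
            self.blocks[block_no] = original
        self.active = False


disk = [b"A", b"B", b"C"]
journal = UndoJournal(disk)

journal.begin()
journal.write(0, b"X")
journal.write(1, b"Y")
# Simulated power loss here: commit() is never reached.
journal.recover()
print(disk)   # back to [b'A', b'B', b'C']
```

After recover() the conversion would simply redo the interrupted step from the last committed state.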

The issue is that you need the first filesystem to make space for you, for
the journal and for the conversion datasystem (CDS). It supplies the details of
where to find this space to the CDS, which uses it for the journal and such.
The CDS can move blocks around; it is consulted by both filesystems because
blocks may be moved and thus it must find their real physical location. The
most important one here is the primary superblock, which is moved away from
the beginning of the media. When the FS wants block N, it asks the CDS
for it. Thing is, you need space for this. It will be checked for, to make sure
that there's enough space to do the conversion.
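
The "ask the CDS for block N" part is essentially a lookup table. A minimal sketch in Python, with BlockMap, relocate and resolve as made-up names and block 9000 as an arbitrary example location:

```python
class BlockMap:
    """Toy logical-to-physical block table, as the CDS might keep one."""

    def __init__(self):
        self.moved = {}   # logical block number -> current physical block

    def relocate(self, logical, physical):
        """Record that a block now lives somewhere else on the media."""
        self.moved[logical] = physical

    def resolve(self, logical):
        """Where to really read block `logical`; unmoved blocks map to themselves."""
        return self.moved.get(logical, logical)


cds = BlockMap()
cds.relocate(0, 9000)   # e.g. the primary superblock, moved off the start
print(cds.resolve(0))   # 9000
print(cds.resolve(7))   # 7, never moved
```

Both filesystem drivers would go through resolve() for every access, so neither needs to know what the other has shuffled around.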

> If you think the data is not worth to make backup - you don't have
>to convert it. Just delete worthless filesystem, and create new one. I
>the data is worth making backup, and finally you make it - you don't
>need to convert it. You could just delete filesystem, and restore data
>from copy. If in turn one think the data is worth to protect it from
>loss, but he will not do it... he risks that the data will be lost, and
>he should not get access to such things.
>

Nrrrg. Yeah, I've got 80 gig and only CD-Rs to back up to, and no money.
A CDR may read for me the day it's written, and then not work the next
day. Still a risk.

>I think that copying data to another filesystem, and restoring it to
>newly created is most of the time best and fastest method of converting
>filesystems.
>

If you have the space.

>Regards,
>
>Leonard Milcin Jr.
>
>--
>"Unix IS user friendly... It's just selective about who its friends are."
> -- Tollef Fog Heen
>



2003-06-29 18:35:41

by rmoser

Subject: Re: File System conversion -- ideas

I'm the only one in the world who can have 80 gig of partitions
and not have capped on any of them (i.e. they've got free space
but it's getting less and less each day). ;-)

*********** REPLY SEPARATOR ***********

On 6/29/2003 at 7:37 PM John Bradford wrote:

>> This is a place where logical volume management can help.
>>
>> For example, suppose you have a 60G disk, 55G of data, in ext2, and you
>> wish to convert to ReiserFS.
>>
>> Step 1: Shrink the volume to 55G. This requires a "shrink disk" utility
>> for the source file system (which exists for the major file systems in
>> use today).
>> Step 2: Create an LVM block in the remaining 5G.
>> Step 3: Create a ReiserFS in the LVM block.
>> Step 4: Move 5G of data from the ext2 system to the ReiserFS block.
>> Step 5: Shrink the ext2 volume by another 5G
>> Step 6: Convert that 5G into an LVM block
>> Step 7: Add that block to the ReiserFS volume group.
>> Step 8: Grow the ReiserFS.
>> Step 9: Repeat 4-8 as needed.
>>
>>
>> This is why I'd really love to see LVM|EVM become standard, not just in
>> the kernel but in the distributions - if every distro by default made
>> all Linux volumes in LVM, then migrating data to bigger drives/adding
>> more space/converting file systems would be so much easier.
>
>It's also a good reason not to use one huge partition on each disk,
>and a good reason not to partition the whole disk when it's not
>needed.
>
>I've seen, (mainly desktop, not server), Linux machines with one
>physical disk containing two partitions, root and swap, with the swap
>partition being twice the physical memory of the box, even when the
>box has more than a gigabyte of physical RAM.
>
>It's usually more flexible just to partition the space you need, and
>add more partitions when necessary. For typical desktop use, swap
>isn't even necessary with 1 GB of physical RAM.
>
>For example, if you have an 80 GB disk, you could initially partition
>10 GB for the root partition, and leave 70 GB unused. When the root
>partition fills us, you can simply use du -s /* to see which
>directories are taking up the most space, and move them to separate
>partitions.
>
>John.



2003-06-29 18:35:46

by John Bradford

Subject: Re: File System conversion -- ideas

> >> Anyhow, I'm thinking still about when reiser4 comes out. I want to
> >> convert to it from reiser3.6. It came to my attention that a user-space
> >> tool to convert between filesystems is NOT the best way to deal with
> >> this. Seriously, you'd think it would be, right? Wrong, IMHO.
> >>
> >> You have the filesystem code for every filesystem Linux supports. It's
> >> there, in the kernel. So why maintain a kludgy userspace tool that has
> >> to be rewritten to understand them all? I have a better idea.
> >>
> >> How about a kernel syscall? It's possible to do this on a running
> >> filesystem but it's far too difficult for a start, so let's start with
> >> unmounted filesystems mmkay?
> >
> >Apart from the special case of converting from one major version of a
> >filesystem to another major version of the same filesystem, I think
> >the performance of an on-the-fly filesystem conversion utility is
> >going to be so much worse than just creating a new partition and
> >copying the data across, that the only reason to do it would be if you
> >could do it on a read-write filesystem without unmounting it.
> >
>
> You've entirely missed the point :/ Did you read the last section?

Yes, but...

> I noted
> that the "make new partition and copy" method requires, first off, space
> for a new partition. All my partitions have massive amount of data on them.
> I can't do that. Those of us that can have to either do it twice, or rewrite
> fstab.

Rewriting fstab shouldn't be a problem :-).

> Eventually I'm hoping it can be done on a read-write filesystem. It's
> possible; I've thought about how to defragment read-write datasystems
> without getting in the way of logical operations.

Seriously, though, I was thinking more of what's most useful in a
server situation, where it's not uncommon to have a lot of spare
capacity - I don't think that a kernel-mode converter which can't work on
a read-write filesystem is going to be much of an advantage over a
userspace solution in those
situations, whereas a read-write one would potentially be, because
although it's reasonable to expect backups to be done anyway, if you
can avoid the downtime needed for the restore, that's a Good Thing.

> >What I'd like to see is union mounts which allowed you to mount a new
> >filesystem of a different type over the original one, and have all new
> >writes go to the new fileystem. I.E. as files were modified, they
> >would be re-written to the new FS. That would be one way of avoiding
> >the performance hit on a busy server.
> >
>
> mmmm, then you'd need both fs' though. That's not conversion ;-)

The idea was to transparently delete files from the old filesystem
once they had been written to, and therefore transferred to the new
filesystem.

I think you've missed my point - for a desktop machine, an hour or two
downtime is usually no problem. For an ISP's webserver, it usually
is (unless there is a cluster of them serving requests for the same
sites). However, to be able to convert filesystems without:

* Significant performance loss of network serving applications
* Significant downtime

is a very desirable feature, but the ability to do this on a
read-write filesystem is critical - if it has to be unmounted, it's
not as useful.

The reason I mentioned union mounts was because BSD already has union
mounts - see the mount_union manual page for more details. I don't
know of an implementation that allows you to automatically delete the
file on the old filesystem, when the copy on the new filesystem has
been made, though.
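
To make the behaviour I have in mind concrete, here is a minimal Python
sketch. It is only a toy model under stated assumptions: the class name
MigratingUnion and the dict-backed "filesystems" are made up for
illustration, and it does not describe what mount_union actually does.
Reads fall through to the old filesystem, writes land on the new one, and
the stale copy on the old one is dropped straight away.

```python
class MigratingUnion:
    """Toy model: reads fall through to the old fs, writes land on the new fs."""

    def __init__(self, old_fs):
        self.old = dict(old_fs)   # path -> bytes, stands in for the old filesystem
        self.new = {}             # stands in for the new filesystem mounted on top

    def read(self, path):
        # The upper (new) layer wins; otherwise fall through to the old layer.
        if path in self.new:
            return self.new[path]
        return self.old[path]

    def write(self, path, data):
        # All writes go to the new filesystem...
        self.new[path] = data
        # ...and the stale copy on the old filesystem is deleted to free space.
        self.old.pop(path, None)

    def migrated_fraction(self):
        # Share of files that now live on the new filesystem.
        total = len(self.old) + len(self.new)
        return len(self.new) / total if total else 1.0


u = MigratingUnion({"/a": b"1", "/b": b"2"})
u.write("/a", b"one")        # /a now lives only on the new filesystem
```

Files that are never written stay on the old filesystem in this model, so
a real implementation would presumably also need a background pass to move
them across.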

John.

2003-06-29 18:59:12

by rmoser

Subject: Re: File System conversion -- ideas



*********** REPLY SEPARATOR ***********

On 6/29/2003 at 7:58 PM John Bradford wrote:

>> >> Anyhow, I'm thinking still about when reiser4 comes out. I want to
>> >> convert to it from reiser3.6. It came to my attention that a
>user-space
>> >> tool to convert between filesystems is NOT the best way to deal with
>> >> this. Seriously, you'd think it would be, right? Wrong, IMHO.
>> >>
>> >> You have the filesystem code for every filesystem Linux supports.
>It's
>> >> there, in the kernel. So why maintain a kludgy userspace tool that
>has
>> >> to be rewritten to understand them all? I have a better idea.
>> >>
>> >> How about a kernel syscall? It's possible to do this on a running
>> >> filesystem but it's far too difficult for a start, so let's start with
>> >> unmounted filesystems mmkay?
>> >
>> >Apart from the special case of converting from one major version of a
>> >filesystem to another major version of the same filesystem, I think
>> >the performance of an on-the-fly filesystem conversion utility is
>> >going to be so much worse than just creating a new partition and
>> >copying the data across, that the only reason to do it would be if you
>> >could do it on a read-write filesystem without unmounting it.
>> >
>>
>> You've entirely missed the point :/ Did you read the last section?
>
>Yes, but...
>
>> I noted
>> that the "make new partition and copy" method requires, first off, space
>> for a new partition. All my partitions have massive amount of data on
>them.
>> I can't do that. Those of us that can have to either do it twice, or
>rewrite
>> fstab.
>
>Rewriting fstab shouldn't be a problem :-).
>
>> Eventually I'm hoping it can be done on a read-write filesystem. It's
>> possible; I've thought about how to defragment read-write datasystems
>> without getting in the way of logical operations.
>
>Seriously, though, I was thinking more of what's most useful in a
>server situation, where it's not uncommon to have a lot of spare
>capacity - I don't think that the kernel mode read-only only converter
>is going to be much of an advantage over a userspace solution in those
>situations, whereas a read-write one would potentially be, because
>although it's reasonable to expect backups to be done anyway, if you
>can avoid the downtime needed for the restore, that's a Good Thing.
>

It should be easy enough. I dunno if it'll require a VFS rewrite or not though.
The idea is to buffer changes to and allow retrieval of logical filesystem
objects, which requires.. well, RAM. Although, since the inodes on the new
fs won't need to be in the same order they were in on the old fs, it should be
possible to simply write new data to the new fs, IF you watch what you're
doing. And yes, I do realize I'm talking about writing to half-existent
filesystems that by rights can't even mount. (Actually, more like an empty
filesystem that's jumbled around physically, but is being addressed logically
anyway).

Easy trick: Skip deleted inodes, and if you have to change an inode, have
the old fs go mark it as deleted real quick and free the space around it, giving
it to the conversion datasystem. Now you can run read-write while you do it.

Remember also that I insist that there must be a journal in the CDS
(conversion datasystem).
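
A minimal Python sketch of the "easy trick", as a toy model only: the
class name LiveConversion and the dict-backed inode tables are
hypothetical, and the journal is left out here. The converter skips
deleted inodes, and a write that arrives mid-conversion makes the old fs
mark that inode deleted so the new data lives only on the new side.

```python
class LiveConversion:
    """Toy model of converting while writes keep arriving."""

    def __init__(self, old_inodes):
        self.old = dict(old_inodes)   # ino -> bytes, or None when deleted
        self.new = {}                 # inodes already on the new filesystem

    def write(self, ino, data):
        # A changed inode is marked deleted on the old fs (freeing its space)
        # and handed over to the new side.
        if self.old.get(ino) is not None:
            self.old[ino] = None
        self.new[ino] = data

    def step(self):
        """Convert one remaining inode; return False when nothing is left."""
        for ino, data in self.old.items():
            if data is None:
                continue              # skip deleted inodes
            self.new[ino] = data
            self.old[ino] = None      # value update only, dict size is unchanged
            return True
        return False


conv = LiveConversion({1: b"a", 2: b"b", 3: None})
conv.write(2, b"B")                   # write arrives before inode 2 is converted
while conv.step():
    pass
```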

>> >What I'd like to see is union mounts which allowed you to mount a new
>> >filesystem of a different type over the original one, and have all new
>> >writes go to the new filesystem. I.E. as files were modified, they
>> >would be re-written to the new FS. That would be one way of avoiding
>> >the performance hit on a busy server.
>> >
>>
>> mmmm, then you'd need both fs' though. That's not conversion ;-)
>
>The idea was to transparently delete files from the old filesystem
>once they had been written to, and therefore transferred to the new
>filesystem.
>

Heh, sounds like what I'm doing but you're hitting my final goal from the
beginning, and using two partitions.

>I think you've missed my point - for a desktop machine, an hour or two
>downtime is usually no problem. For an ISP's webserver, it usually
>is, (unless there are a cluster of them serving requests for the same
>sites). However, to be able to convert filesystems without:
>
>* Significant performance loss of network serving applications
>* Significant downtime
>
>is a very desirable feature, but the ability to do this on a
>read-write filesystem is critical - if it has to be unmounted, it's
>not as useful.
>

That's the eventual idea. As for performance, errm. The performance loss
would be in referencing the CDS to find where the data in each filesystem is,
and in the CPU time and RAM used up, along with the massive disk access,
while the system does its job. Shouldn't be a problem on servers though;
IIRC they use SCSI disks and fast CPUs?

>The reason I mentioned union mounts was because BSD already has union
>mounts - see the mount_union manual page for more details. I don't
>know of an implementation that allows you to automatically delete the
>file on the old filesystem, when the copy on the new filesystem has
>been made, though.
>

If you think about it, you have this:

[PARTITION 1]
|
V
[PARTITION 2]

I have this (the == is an equivalence sign, i.e. this is what's inside):

[PARTITION]
==
[DATASYSTEM]
==
[FILESYSTEM 1]
|
V
[DATASYSTEM ATOMS]
|
V
[FILESYSTEM 2]

Both filesystems are the full size of the partition, and so is the
datasystem. The only difference is that before you start you have
to make sure that the datasystem's gonna fit in with the free space
on the first filesystem, and still have space to start the second
filesystem, and then have space for its atoms. These atoms will
slowly be destroyed as they go into the second filesystem. You
have to also make sure that the second FS won't be bigger than the
first, and will at the end have enough to hold at least the empty
datasystem and one atom.

I feel I should note, since I forgot before, that an atom can contain part
of the data for an inode, as long as you know this and can write the atom
out to the new filesystem and get more of the old.
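
The space checks above can be sketched in Python. This is a rough sketch
under assumptions: the function names, the parameters and the idea of
measuring everything in one common unit are mine, not part of any real
tool. can_start covers the check before conversion, can_finish the check
on the finished second filesystem, and atoms_for shows an inode's data
being cut into atoms that each carry only part of it.

```python
def can_start(partition, fs1_used, ds_overhead, fs2_initial, atom):
    """True if FS1's free space holds the datasystem, the seed of FS2 and one atom."""
    free_on_fs1 = partition - fs1_used
    return free_on_fs1 >= ds_overhead + fs2_initial + atom


def can_finish(partition, fs2_final, ds_overhead, atom):
    """True if the finished FS2 still leaves room for the empty datasystem and one atom."""
    return partition - fs2_final >= ds_overhead + atom


def atoms_for(data, atom_size):
    """Split one inode's data into atoms; an atom may carry only part of the data."""
    return [data[i:i + atom_size] for i in range(0, len(data), atom_size)]


# Hypothetical numbers, all in the same unit.
ok_to_begin = can_start(partition=1000, fs1_used=900, ds_overhead=10,
                        fs2_initial=20, atom=50)
pieces = atoms_for(b"abcdefg", 3)
```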

>John.
>-
>To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>the body of a message to [email protected]
>More majordomo info at http://vger.kernel.org/majordomo-info.html
>Please read the FAQ at http://www.tux.org/lkml/



2003-06-29 19:02:17

by Jamie Lokier

Subject: Re: File System conversion -- ideas

John Bradford wrote:
> > which is awfully difficult if you have, say, a 60GB filesystem, a 60GB
> > disk, and nothing else.
>
> Well, I don't partition all of the space on every new disk I buy
> straight away, I partition off what I think I'll need, and leave the
> rest unallocated.

I used to do something like that. It became awfully inconvenient, so I...

> > > that the only reason to do it would be if you
> > > could do it on a read-write filesystem without unmounting it.
> >
> > IMHO even if it requires the filesystem to be unmounted, it would
> > still be useful. More challenging to use - you'd have to boot and run
> > from ramdisk, but much more useful than not being able to convert at all.
>
> Only if it is the root filesystem, the filesystem of which generally
> isn't going to affect overall performance that much.

...now use a single "/" filesystem on most systems, with a tiny
"/boot" one to ensure booting. With journalling, the risk of losing
data this way is much lower than it used to be, and the old reason for
using multiple partitions - to avoid having to fsck /usr - no longer applies.

> > But useless unless you have a second disk lying around that you don't
> > use for anything but filesystem conversions.
>
> Not at all. You can just use unpartitioned space on your existing
> disk.

So you have as much space unpartitioned on your disks as you are
actually using to store data? I generally don't.

-- Jamie

2003-06-29 19:14:39

by Jamie Lokier

Subject: Re: File System conversion -- ideas

Leonard Milcin Jr. wrote:
> I think that filesystem conversion on-the-fly is useless. Why? If you're
> making conversion of filesystem, you have to make good backup of data
> from that filesystem.

I disagree with this statement.

> It is likely that when something goes wrong during
> conversion (power loss) filesystem will be corrupted, and data will be
> lost.

Only if the converter stores a temporarily inconsistent state on the
filesystem. Sometimes it is possible to write a converter where the
filesystem is in a consistent state throughout, except perhaps during
a critical transition from one filesystem type to the other.

> If you think the data is not worth to make backup - you don't have
> to convert it. Just delete worthless filesystem, and create new one.
> If the data is worth making backup, and finally you make it - you don't
> need to convert it.

You are discounting the existence of data which is valuable enough not
to have deleted already, yet which is not valuable enough to backup.
I'd count local mirrors in this category: backup is too expensive, yet
the cost of recreating the mirror is significant (days of
downloading), therefore worth keeping if possible.

Also MP3 & DIVX collections etc. If you lose them it's not the end of
the world, but you'd rather not.

> You could just delete filesystem, and restore data
> from copy. If in turn one think the data is worth to protect it from
> loss, but he will not do it... he risks that the data will be lost, and
> he should not get access to such things.
^^^^^^

It may not be worth it to _you_, but to me the cost of spare disks is
significant enough that I choose to risk my less valuable data. It's
my data hence my choice.

> I think that copying data to another filesystem, and restoring it to
> newly created is most of the time best and fastest method of converting
> filesystems.

You are right that this diminishes the value of an in-place filesystem
converter (and defragmenter), because it is not necessary if you have
the foresight to use multiple partitions or LVM, and enough spare disk
space.

However it would still be useful to some people, some of the time.

Consider that many people choose ext3 rather than reiser simply
because it is easy to convert ext2 to ext3, and hard to convert ext2
to reiser (and hard to convert back if they don't like it). I have
seen this written by many people who choose to use ext3. Thus proving
that there is value in in-place filesystem conversion :)

-- Jamie

2003-06-29 19:22:25

by rmoser

Subject: Re: File System conversion -- ideas



*********** REPLY SEPARATOR ***********

On 6/29/2003 at 8:28 PM Jamie Lokier wrote:

>Leonard Milcin Jr. wrote:
>> I think that filesystem conversion on-the-fly is useless. Why? If you're
>> making conversion of filesystem, you have to make good backup of data
>> from that filesystem.
>
>I disagree with this statement.
>

Me too. It's a GOOD IDEA. But... heck we don't all have tape backups.

>> It is likely that when something goes wrong during
>> conversion (power loss) filesystem will be corrupted, and data will be
>> lost.
>
>Only if the converter stores a temporarily inconsistent state on the
>filesystem. Sometimes it is possible to write a converter where the
>filesystem is in a consistent state throughout, except perhaps during
>a critical transition from one filesystem type to the other.
>

Dude come on I said put a journal in the datasystem so you DON'T get
inconsistencies like that! (A roll-back journal)
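
For anyone who wants it spelled out, a minimal Python sketch of what I
mean by a roll-back journal. It is a toy model: the class name
RollbackJournal and the in-memory list of blocks are made up for
illustration, and a real journal would itself have to be written to disk
before each change. The old contents are logged before a block is
overwritten, so an interrupted conversion can be undone in reverse order.

```python
class RollbackJournal:
    """Toy undo log over a list of blocks."""

    def __init__(self, blocks):
        self.blocks = blocks      # list of bytes, modified in place
        self.undo = []            # (block number, previous contents)

    def write(self, n, data):
        # Log the old contents first, then overwrite the block.
        self.undo.append((n, self.blocks[n]))
        self.blocks[n] = data

    def commit(self):
        # The step finished cleanly; the logged old contents are no longer needed.
        self.undo.clear()

    def rollback(self):
        # Undo uncommitted writes, newest first.
        while self.undo:
            n, old = self.undo.pop()
            self.blocks[n] = old


blocks = [b"a", b"b"]
journal = RollbackJournal(blocks)
journal.write(0, b"x")
journal.write(1, b"z")
journal.rollback()            # e.g. after a power loss mid-conversion
```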

>> If you think the data is not worth to make backup - you don't have
>> to convert it. Just delete worthless filesystem, and create new one.
>> If the data is worth making backup, and finally you make it - you don't
>> need to convert it.
>
>You are discounting the existence of data which is valuable enough not
>to have deleted already, yet which is not valuable enough to backup.
>I'd count local mirrors in this category: backup is too expensive, yet
>the cost of recreating the mirror is significant (days of
>downloading), therefore worth keeping if possible.
>

Mmhmm

>Also MP3 & DIVX collections etc. If you lose them it's not the end of
>the world, but you'd rather not.
>

HEY! It IS the end of the world if I lose /data/audio !!!!!!! I can't code
without music!

>> You could just delete filesystem, and restore data
>> from copy. If in turn one think the data is worth to protect it from
>> loss, but he will not do it... he risks that the data will be lost, and
>> he should not get access to such things.
> ^^^^^^
>
>It may not be worth it to _you_, but to me the cost of spare disks is
>significant enough that I choose to risk my less valuable data. It's
>my data hence my choice.
>

You forgot something. I only risk bugs in the code; that's why there's a
journal. You can have a bug in the filesystem code, too. You're taking the
same risks doing the conversion that you are when mounting the filesystem.

>> I think that copying data to another filesystem, and restoring it to
>> newly created is most of the time best and fastest method of converting
>> filesystems.
>
>You are right that this diminishes the value of an in-place filesystem
>converter (and defragmenter), because it is not necessary if you have
>the foresight to use multiple partitions or LVM, and enough spare disk
>space.
>

Erm? Not everyone has spare disk space or wants to be assed with it.
Those methods take more work.

>However it would still be useful to some people, some of the time.
>
>Consider that many people choose ext3 rather than reiser simply
>because it is easy to convert ext2 to ext3, and hard to convert ext2
>to reiser (and hard to convert back if they don't like it). I have
>seen this written by many people who choose to use ext3. Thus proving
>that there is value in in-place filesystem conversion :)
>

That's me. I cite that I want to go from reiser3.6 to reiser4, but I have only
one reiser3.6. I used to have all reiserfs, and yes it was a lot faster. Now
I want it back.

>-- Jamie



2003-06-29 19:21:13

by Leonard Milcin Jr.

Subject: Re: File System conversion -- ideas

> Nrrrg. Yeah, I've got 80 gig and only CDR's to back up to, and no money.
> A CDR may read for me the day it's written, and then not work the next
> day. Still a risk.

Say, why would you want to change filesystem type?

If you have to change filesystem type, I think it is because you have a
good reason to do it. I can't imagine the reason explaining the need of
converting filesystem if you use this system as home desktop. For
ordinary user filesystem is just used for storing data and managing
permissions to that data. These are not real-time or
performance-critical systems. Thus most of the popular filesystems like
ext2, ext3, reiserfs basically fit their needs. If they choose right
filesystem type at startup, they could use it for a time of life of
their hard disk.

There are very few situations when you really need to convert
filesystem. Most of the time this operation is done by a person who has
some experience with computers, and very probably by a person who has
access to additional hard disks, etc. I have never heard of anyone who had
to change filesystem type, and had no access to additional equipment.

I don't want to say it is not possible, to provide such a function
safely. What I want to say is that kernel developers should not
complicate filesystem code without *very* good reason. I think that
providing on-the-fly conversion capability is not a good reason. Good
reason is when you can improve usability for many users and most of the
time, not when you ease one operation needed by very few users few times
in their life, especially when they can do what they need by just
transferring their data back and forth to another disk, or machine.


Regards,


Leonard Milcin Jr.


--
"Unix IS user friendly... It's just selective about who its friends are."
-- Tollef Fog Heen

2003-06-29 19:27:26

by Leonard Milcin Jr.

Subject: Re: File System conversion -- ideas

Leonard Milcin Jr. wrote:
> If you have to change filesystem type, I think it is because you have a
> good reason to do it. I can't imagine the reason explaining the need of
> converting filesystem if you use this system as home desktop. For
> ordinary user filesystem is just used for storing data and managing
> permissions to that data. These are not real-time or
> performance-critical systems. Thus most of the popular filesystems like
> ext2, ext3, reiserfs basically fit their needs. If they choose right
> filesystem type at startup, they could use it for a time of life of
> their hard disk.
>
> There are very few situations when you really need to convert
> filesystem. Most of the time this operation is done by person who have
> some experience with computers, and highly probable by person who has
> access to additional hard disks, etc. I have never heard of one who had
> to change filesystem type, and had no access to additional equipment.
>
> I don't want to say it is not possible, to provide such a function
> safely. What I want to say is that kernel developers should not
> complicate filesystem code without *very* good reason. I think that
> providing on-the-fly conversion capability is not a good reason. Good
> reason is when you can improve usability for many users and most of the
> time, not when you ease one operation needed by very few users few times
> in their life, especially when they can do what they need by just
> transferring their data back and forth to another disk, or machine.

Ok, I forgot about enterprise users with lots of data, and probably
lacking free space, so I missed a point.

--
"Unix IS user friendly... It's just selective about who its friends are."
-- Tollef Fog Heen

2003-06-29 19:28:16

by Jamie Lokier

Subject: Re: File System conversion -- ideas

John Bradford wrote:
> It's usually more flexible just to partition the space you need, and
> add more partitions when necessary. For typical desktop use, swap
> isn't even necessary with 1 GB of physical RAM.

Partitions are never the right size when you fill one up.

I used to do what you describe, and got fed up when I had too many
strange symbolic links around, things like

/var/www -> /disk2/www
/var/log/httpd -> /disk2/httpd_logs
/home/jamie -> /disk2/jamie
/home/jamie/downloads -> /disk3/jamie_downloads

etc.

It seemed simpler to have one filesystem, and indeed it was.

(Now I have two drives at home I am back to the above, unfortunately.
At least the laptop is nice and simple, as it can only have 1 drive :)

Also, on a dedicated server I still use symbolic links between
partitions as it is too risky to try rearranging the partitions
remotely, and too expensive to rent more disk space.

-- Jamie

2003-06-29 19:28:05

by Al Viro

Subject: Re: File System conversion -- ideas

On Sun, Jun 29, 2003 at 08:28:47PM +0100, Jamie Lokier wrote:
> Consider that many people choose ext3 rather than reiser simply
> because it is easy to convert ext2 to ext3, and hard to convert ext2
> to reiser (and hard to convert back if they don't like it). I have
> seen this written by many people who choose to use ext3. Thus proving
> that there is value in in-place filesystem conversion :)

Uh-huh. You want to get in-kernel conversion between ext* and reiserfs?
With recoverable state if aborted? Get real.

2003-06-29 19:30:44

by Jamie Lokier

Subject: Re: File System conversion -- ideas

Leonard Milcin Jr. wrote:
> >Nrrrg. Yeah, I've got 80 gig and only CDR's to back up to, and no money.
> >A CDR may read for me the day it's written, and then not work the next
> >day. Still a risk.
>
> Say, why you would want to change filesystem type?

I'd like to try reiser4 when it is available because I heard from Hans
that it is faster...

Isn't that a good reason?

-- Jamie

2003-06-29 19:33:53

by rmoser

Subject: Re: File System conversion -- ideas



*********** REPLY SEPARATOR ***********

On 6/29/2003 at 8:42 PM [email protected] wrote:

>On Sun, Jun 29, 2003 at 08:28:47PM +0100, Jamie Lokier wrote:
>> Consider that many people choose ext3 rather than reiser simply
>> because it is easy to convert ext2 to ext3, and hard to convert ext2
>> to reiser (and hard to convert back if they don't like it). I have
>> seen this written by many people who choose to use ext3. Thus proving
>> that there is value in in-place filesystem conversion :)
>
>Uh-huh. You want to get in-kernel conversion between ext* and reiserfs?
>With recoverable state if aborted? Get real.

No, in-kernel conversion between everything. You don't think it can be done?
It's not that difficult a problem to manage data like that :D

2003-06-29 19:34:27

by rmoser

Subject: Re: File System conversion -- ideas



*********** REPLY SEPARATOR ***********

On 6/29/2003 at 9:37 PM Leonard Milcin Jr. wrote:

>> Nrrrg. Yeah, I've got 80 gig and only CDR's to back up to, and no
>money.
>> A CDR may read for me the day it's written, and then not work the next
>> day. Still a risk.
>
>Say, why you would want to change filesystem type?
>

I installed Debian and it couldn't boot bk24 (kernel 2.4) at install, so I have
like 1 reiserfs partition (created afterwards, because at the time I had
10 gig free) and the rest are ext3. ext3 is VERY slow. I know, it's not
THAT slow, but... I have like 5 partitions on each disk, and 2 disks. So,
take off swap and reiserfs, like 6 at one time. Painful.

>If you have to change filesystem type, I think it is because you have a
>good reason to do it. I can't imagine the reason explaining the need of
>converting filesystem if you use this system as home desktop. For
>ordinary user filesystem is just used for storing data and managing
>permissions to that data. These are not real-time or
>performance-critical systems. Thus most of the popular filesystems like
>ext2, ext3, reiserfs basically fit their needs. If they choose right
>filesystem type at startup, they could use it for a time of life of
>their hard disk.
>

reiserfs is the filesystem that servers should use. It has the least latency
these days ;-) And it's quite stable.

>There are very few situations when you really need to convert
>filesystem. Most of the time this operation is done by person who have
>some experience with computers, and highly probable by person who has
>access to additional hard disks, etc. I have never heard of one who had
>to change filesystem type, and had no access to additional equipment.
>

Some of us are walking brains with very shallow pockets.

>I don't want to say it is not possible, to provide such a function
>safely. What I want to say is that kernel developers should not
>complicate filesystem code without *very* good reason. I think that
>providing on-the-fly conversion capability is not a good reason. Good
>reason is when you can improve usability for many users and most of the
>time, not when you ease one operation needed by very few users few times
>in their life, especially when they can do what they need by just
>transferring their data back and forth to another disk, or machine.
>

The filesystem code wouldn't be much more complicated. The changes
needed for this would all be in a separate source file anyway. Most of the
complicated crap is all in the code for the datasystem that manages the
two filesystems that suddenly exist in the same space.

>
>Regards,
>
>
>Leonard Milcin Jr.
>
>
>--
>"Unix IS user friendly... It's just selective about who its friends are."
> -- Tollef Fog Heen
>

--Bluefox Icy

2003-06-29 19:37:19

by rmoser

Subject: Re: File System conversion -- ideas



*********** REPLY SEPARATOR ***********

On 6/29/2003 at 8:44 PM Jamie Lokier wrote:

>Leonard Milcin Jr. wrote:
>> >Nrrrg. Yeah, I've got 80 gig and only CDR's to back up to, and no
>money.
>> >A CDR may read for me the day it's written, and then not work the next
>> >day. Still a risk.
>>
>> Say, why you would want to change filesystem type?
>
>I'd like to try reiser4 when it is available because I heard from Hans
>that it is faster...
>

Me too! :D I want reiser4 for its full journaling support

>Isn't that a good reason?
>
>-- Jamie



2003-06-29 19:37:59

by rmoser

Subject: Re: File System conversion -- ideas



*********** REPLY SEPARATOR ***********


>Ok, I forgot about enterprise users with lots of data, and probably
>lacking free space, so I missed a point.
>

Yeppers. Also that the eventual goal (at least in my mind) is to allow
this to be done on a running r/w filesystem safely, which isn't as tough
a problem as it sounds.

>--
>"Unix IS user friendly... It's just selective about who its friends are."
> -- Tollef Fog Heen
>



2003-06-29 19:41:43

by David D. Hagood

Subject: Re: File System conversion -- ideas

rmoser wrote:

> Ass yourself for hours, each time risking making a typo and killing both
> filesystems, or risking having the LVM resize die from a powerdrop or a kick
> to the power button (sorry we don't all have immortal fault tolerance). I actually
> thought about this one and figured it was too ridiculously annoying to actually
> bring up :-p
>

> I've never used LVM, but I'll look at it one day. If it's stable, that's good; I
> don't use Windows. I don't know exactly what LVM is but I have a pretty
> good idea; it's been forever since I read the doc on it, I forgot what it said!
>


Funny how, having never used LVM you have an opinion about it.

I have. I have done EXACTLY what I described.

First of all, do you REALLY think my way is any less failure prone,
especially in the presence of the possibility of power failure, than any
other method? My method preserves a mountable, valid file system at each
step of the way - the resize downward of the old file system, the
resize upward of the new, the file copy.

Secondly, if you are REALLY concerned about the manual aspect of what I
suggested, you can write a simple shell script to do the work.

Third of all, the longest parts of the process I describe will be the
resize downward of the old file system and the copy of the data - the
LVM parts of this operation are pretty damn quick.
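
The shape of that loop can be modelled in a few lines of Python. This is a
toy simulation under loud assumptions: it calls no real LVM or resize
tools, the function name migrate and its size bookkeeping are made up, and
it ignores filesystem metadata overhead. Each round shrinks the old file
system to what it uses, grows the new one into the released space, then
copies whatever fits.

```python
def migrate(files, volume_size):
    """files: dict name -> size on the old fs. Returns (new fs contents, size history)."""
    old = dict(files)
    new = {}
    history = []                 # (old fs size, new fs size) after each round
    while old:
        # 1. Resize the old file system downward to what it actually uses.
        old_size = sum(old.values())
        # 2. Resize the new file system upward into the space just released.
        new_size = volume_size - old_size
        # 3. Copy as many files as fit, removing each from the old fs afterwards.
        room = new_size - sum(new.values())
        moved = False
        for name in sorted(old, key=old.get):   # smallest files first
            if old[name] <= room:
                room -= old[name]
                new[name] = old.pop(name)
                moved = True
        if not moved:
            raise RuntimeError("not enough free space to move any file")
        history.append((old_size, new_size))
    return new, history


new_fs, rounds = migrate({"a": 30, "b": 20, "c": 10}, volume_size=100)
```

Both file systems have a definite, non-overlapping size after every round
in this model, which is the property claimed above; if no file fits in the
free space, the model gives up rather than guessing.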

2003-06-29 19:43:56

by John Bradford

Subject: Re: File System conversion -- ideas

> > > > that the only reason to do it would be if you
> > > > could do it on a read-write filesystem without unmounting it.
> > >
> > > IMHO even if it requires the filesystem to be unmounted, it would
> > > still be useful. More challenging to use - you'd have to boot and run
> > > from ramdisk, but much more useful than not being able to convert at all.
> >
> > Only if it is the root filesystem, the filesystem of which generally
> > isn't going to affect overall performance that much.
>
> ...now use a single "/" filesystem on most systems, with a tiny
> "/boot" one to ensure booting. With journalling, this risk of losing
> data this way is much lower than it used to be, and the old reason for
> using multiple partitions - to avoid having to fsck /usr - no longer applies.

Well, I prefer to have separate partitions to reduce fragmentation and
increase flexibility, but I can see there are reasons for having a
single root filesystem.

> > > But useless unless you have a second disk lying around that you don't
> > > use for anything but filesystem conversions.
> >
> > Not at all. You can just use unpartitioned space on your existing
> > disk.
>
> So you have as much space unpartitioned on your disks as you are
> actually using to store data? I generally don't.

I probably average about 20% of the disk partitioned in my single disk
desktop boxes.

John.

2003-06-29 19:47:30

by Al Viro

Subject: Re: File System conversion -- ideas

On Sun, Jun 29, 2003 at 03:45:41PM -0400, rmoser wrote:

> >> seen this written by many people who choose to use ext3. Thus proving
> >> that there is value in in-place filesystem conversion :)
> >
> >Uh-huh. You want to get in-kernel conversion between ext* and reiserfs?
> >With recoverable state if aborted? Get real.
>
> no, in-kernel conversion between everything. You don't think it can be done?
> It's not that difficult a problem to manage data like that :D

I think that I will believe it when I see the patchset implementing it.
Provided that it will be convincing enough. Other than that... Not
really. You will need code for each pair of filesystems, since the
converter will need to know *both* layouts. No amount of handwaving
is likely to work around that. And we have what, something between
10 and 20 local filesystems? Have fun...

If you want your idea to be considered seriously - take reiserfs code,
take ext3 code, copy both to userland and put together a conversion
between them. Both ways. That, by definition, is easier than doing
it in kernel - you have the same code available and none of the limitations/
interaction with other stuff. When you have it working, well, time to
see what extra PITA will come from making it coexist with other parts
of kernel (and with a much poorer runtime environment).

AFAICS, it is _very_ hard to implement. Even outside of the kernel.
If you can get it done - well, that might do a lot for having the
idea considered seriously. "Might" since you need to do it in a way
that would survive transplantation into the kernel _and_ would scale
better than O((number of filesystem types)^2).
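
The scaling argument can be put in numbers with a short Python sketch. The
function names are made up for illustration, and the second count assumes
a common intermediate format that every filesystem can export to and
import from, which is exactly the part that is in dispute here.

```python
def pairwise_converters(n):
    """Direct converters needed for every ordered pair of n filesystem types."""
    return n * (n - 1)


def hub_converters(n):
    """Converters needed if each type only exports to and imports from one
    shared intermediate format (an assumption, not something shown to work)."""
    return 2 * n


for n in (10, 20):
    print(n, pairwise_converters(n), hub_converters(n))
```

For 10 filesystem types that is 90 direct converters against 20, and for
20 types it is 380 against 40, so anything that avoids the quadratic
growth has to rest on an intermediate representation that really can
express every layout.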

2003-06-29 19:50:51

by Al Viro

Subject: Re: File System conversion -- ideas

On Sun, Jun 29, 2003 at 08:44:23PM +0100, Jamie Lokier wrote:
> Leonard Milcin Jr. wrote:
> > >Nrrrg. Yeah, I've got 80 gig and only CDR's to back up to, and no money.
> > >A CDR may read for me the day it's written, and then not work the next
> > >day. Still a risk.
> >
> > Say, why you would want to change filesystem type?
>
> I'd like to try reiser4 when it is available because I heard from Hans
> that it is faster...
>
> Isn't that a good reason?

Not really. Never, ever, try new code on a live system. Put together
a test box and/or test disk. Regardless of the nature of the code in
question - if you want to test something, go for a dedicated test setup.

2003-06-29 19:54:12

by David D. Hagood

Subject: Re: File System conversion -- ideas

rmoser wrote:

> no, in-kernel conversion between everything. You don't think it can be done?
> It's not that difficult a problem to manage data like that :D

OK, then - Show Us The Code.

Everyone else who has expressed an opinion believes an in-kernel
converter to be far too difficult to get right. You disagree, and think
it should be easy.

So write it. Show us the code. Change our minds.

You opened with a "This should be possible". We raised you a "No, it's
hard, here are other ways to do it." You raised with a "It should be
easy." I call.

Let's see your cards.

2003-06-29 19:54:12

by rmoser

[permalink] [raw]
Subject: Re: File System conversion -- ideas



*********** REPLY SEPARATOR ***********

On 6/29/2003 at 2:55 PM David D. Hagood wrote:

>rmoser wrote:
>
>> Ass yourself for hours, each time risking making a typo and killing both
>> filesystems, or risking having the LVM resize die from a powerdrop or a
>> kick to the power button (sorry we don't all have immortal fault
>> tolerance). I actually thought about this one and figured it was too
>> ridiculously annoying to actually bring up :-p
>>
>
>> I've never used LVM, but I'll look at it one day. If it's stable, that's
>> good; I don't use Windows. I don't know exactly what LVM is but I have a
>> pretty good idea; it's been forever since I read the doc on it, I forgot
>> what it said!
>>
>
>
>Funny how, having never used LVM you have an opinion about it.
>
>I have. I have done EXACTLY what I described.
>
>First of all, do you REALLY think my way is any less failure prone,
>especially in the presence of the possibility of power failure than any
>other method? My method preserves a mountable, valid file system at each
>step of the way - the resized downward of the old file system, the
>resize upward of the new, the file copy.
>

Except for a crash at the precise moment that data is being written during
a resize of a partition in LVM or the filesystem itself. To my knowledge,
said operation is not journaled.

WTF is the doc for this? wth do I have an incomplete Documentation/ tree
or something? I dunno, maybe I read about LVM on some site or something.
It doesn't matter; it's been too many years since I've thought about it or read
about it or even seen it.

>Secondly, if you are REALLY concerned about the manual aspect of what I
>suggested, you can write a simple shell script to do the work.
>
>Third of all, the longest parts of the process I describe will be the
>resize downward of the old file system and the copy of the data - the
>LVM parts of this operation are pretty damn quick.



2003-06-29 19:59:26

by John Bradford

[permalink] [raw]
Subject: Re: File System conversion -- ideas

> >> You've entirely missed the point :/ Did you read the last section?
> >
> >Yes, but...
> >
> >> I noted
> >> that the "make new partition and copy" method requires, first off, space
> >> for a new partition. All my partitions have massive amount of data on
> >> them.
> >> I can't do that. Those of us that can have to either do it twice, or
> >> rewrite
> >> fstab.
> >
> >Rewriting fstab shouldn't be a problem :-).
> >
> >> Eventually I'm hoping it can be done on a read-write filesystem. It's
> >> possible; I've thought about how to defragment read-write datasystems
> >> without getting in the way of logical operations.
> >
> >Seriously, though, I was thinking more of what's most useful in a
> >server situation, where it's not uncommon to have a lot of spare
> >capacity - I don't think that the kernel mode read-only only converter
> >is going to be much of an advantage over a userspace solution in those
> >situations, whereas a read-write one would potentially be, because
> >although it's reasonable to expect backups to be done anyway, if you
> >can avoid the downtime needed for the restore, that's a Good Thing.
> >
>
> It should be easy enough. I dunno if it'll require a VFS rewrite or not though.
> The idea is to buffer changes to and allow retrieval of logical filesystem
> objects, which requires.. well, RAM. Although, since the inodes on the new
> fs won't need to be in the same order they were in on the old fs, it should be
> possible to simply write new data to the new fs, IF you watch what you're
> doing. And yes, I do realize I'm talking about writing to half-existent
> filesystems that by rights can't even mount. (Actually, more like an empty
> filesystem that's jumbled around physically, but is being addressed logically
> anyway).
>
> Easy trick: Skip deleted inodes, and if you have to change an inode, have
> the old fs go mark it as deleted real quick and free the space around it, giving
> it to the conversion datasystem. Now you can run read-write while you do it.
>
> Remember also that I insist that there must be a journal in the CDS
> (conversion datasystem).
>
> >> >What I'd like to see is union mounts which allowed you to mount a new
> >> >filesystem of a different type over the original one, and have all new
> >> >writes go to the new filesystem. I.E. as files were modified, they
> >> >would be re-written to the new FS. That would be one way of avoiding
> >> >the performance hit on a busy server.
> >> >
> >>
> >> mmmm, then you'd need both fs' though. That's not conversion ;-)
> >
> >The idea was to transparently delete files from the old filesystem
> >once they had been written to, and therefore transferred to the new
> >filesystem.
> >
>
> Heh, sounds like what I'm doing but you're hitting my final goal from the
> beginning, and using two partitions.
>
> >I think you've missed my point - for a desktop machine, an hour or two
> >downtime is usually no problem. For an ISPs webserver, it usually
> >is, (unless there are a cluster of them serving requests for the same
> >sites). However, to be able to convert filesystems without:
> >
> >* Significant performance loss of network serving applications
> >* Significant downtime
> >
> >is a very desirable feature, but the ability to do this on a
> >read-write filesystem is critical - if it has to be unmounted, it's
> >not as useful.
> >
>
> That's the eventual idea. As for performance, errm. The performance loss
> would be in referencing the CDS to find where the data in each filesystem is,
> and in the CPU time and RAM used up, along with the massive disk access,
> while the system does its job. Shouldn't be a problem on servers though;
> IIRC they use SCSI disks and fast CPUs?

The disk accesses were what I was thinking of. May well not be a
problem in reality.

> >The reason I mentioned union mounts was because BSD already has union
> >mounts - see the mount_union manual page for more details. I don't
> >know of an implementation that allows you to automatically delete the
> >file on the old filesystem, when the copy on the new filesystem has
> >been made, though.
> >
>
> If you think about it, you have this:
>
> [PARTITION 1]
> |
> V
> [PARTITION 2]
>
> I have this (the == is an equivalence sign, i.e. this is what's inside):
>
> [PARTITION]
> ==
> [DATASYSTEM]
> ==
> [FILESYSTEM 1]
> |
> V
> [DATASYSTEM ATOMS]
> |
> V
> [FILESYSTEM 2]
>
> Both filesystems are the full size of the partition, and so is the
> datasystem. The only difference is that before you start you have
> to make sure that the datasystem's gonna fit in with the free space
> on the first filesystem, and still have space to start the second
> filesystem, and then have space for its atoms.

Just thought - that's going to be a problem in read-write mode :-/.

If the disk fills up, we'd need to be able to maintain a consistent
filesystem structure, (at least good enough so that a separate
fsck-like utility could repair it - if the disk filled up, then the
conversion couldn't be done on-the-fly).

> These atoms will
> slowly be destroyed as they go into the second filesystem. You
> have to also make sure that the second FS won't be bigger than the
> first, and will at the end have enough to hold at least the empty
> datasystem and one atom.
>
> I feel I should note, since I forgot before, that an atom can contain part
> of the data for an inode, as long as you know this and can write the atom
> out to the new filesystem and get more of the old.

Seems like a solid idea, though. As long as it worked on at least
read-only mounted filesystems, I'd be quite interested in seeing it in
the mainline kernel.

John.
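
The point quoted above, that one atom may carry only part of an inode's data, can be sketched roughly as follows. This is only an illustration under assumptions: the `Atom` fields, `split_into_atoms`, and `write_atoms` are hypothetical names invented here, and plain byte strings stand in for on-disk file contents.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Atom:
    """Hypothetical self-contained fragment of one file's data."""
    inode: int    # identifier of the object on the source filesystem
    offset: int   # byte offset of this fragment within the file
    data: bytes   # the fragment itself
    last: bool    # True on the final fragment of the file


def split_into_atoms(inode: int, data: bytes, atom_size: int) -> Iterator[Atom]:
    """Yield the file's data as a sequence of bounded-size atoms."""
    if atom_size <= 0:
        raise ValueError("atom_size must be positive")
    if not data:
        # An empty file still needs one atom so the target sees it.
        yield Atom(inode, 0, b"", True)
        return
    for offset in range(0, len(data), atom_size):
        chunk = data[offset:offset + atom_size]
        yield Atom(inode, offset, chunk, offset + atom_size >= len(data))


def write_atoms(atoms: Iterable[Atom]) -> bytes:
    """Rebuild the file on the 'new filesystem' one atom at a time."""
    out = bytearray()
    for atom in atoms:
        if atom.offset != len(out):
            raise ValueError("atoms arrived out of order")
        out += atom.data
        if atom.last:
            break
    return bytes(out)
```

Because each atom records its own offset and whether it is the last one, the converter only ever needs room for one atom at a time, which is the property the free-space argument in this thread relies on.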

2003-06-29 20:05:49

by Leonard Milcin Jr.

[permalink] [raw]
Subject: Re: File System conversion -- ideas

David D. Hagood wrote:
> Funny how, having never used LVM you have an opinion about it.
>
> I have. I have done EXACTLY what I described.
>
> First of all, do you REALLY think my way is any less failure prone,
> especially in the presence of the possibility of power failure than any
> other method? My method preserves a mountable, valid file system at each
> step of the way - the resized downward of the old file system, the
> resize upward of the new, the file copy.
>
> Secondly, if you are REALLY concerned about the manual aspect of what I
> suggested, you can write a simple shell script to do the work.
>
> Third of all, the longest parts of the process I describe will be the
> resize downward of the old file system and the copy of the data - the
> LVM parts of this operation are pretty damn quick.

Yes, and I think it is the right way to go. If we ensure that each
of the described steps preserves filesystem integrity, we could automate
this, thus getting what was described in the initial idea, but simpler.
Better yet, there is already code that solves nearly all of the problems;
we only need to automate the fiddling with partitions and LVM, so the end
user will see this as really transparent :-)

--
"Unix IS user friendly... It's just selective about who its friends are."
-- Tollef Fog Heen

2003-06-29 20:10:37

by Leonard Milcin Jr.

[permalink] [raw]
Subject: Re: File System conversion -- ideas

[email protected] wrote:
> Not really. Never, ever, try a new code on live system. Put together
> a test box and/or test disk. Regardless of nature of code in question -
> if you want to test something, go for a dedicated test setup.

You forgot that new code will, after some time, become good and
well-tested code, which one would find useful and safe enough.

--
"Unix IS user friendly... It's just selective about who its friends are."
-- Tollef Fog Heen

2003-06-29 20:08:11

by Davide Libenzi

[permalink] [raw]
Subject: Re: File System conversion -- ideas

On Sun, 29 Jun 2003 [email protected] wrote:

> I think that I will believe it when I see the patchset implementing it.
> Provided that it will be convincing enough. Other than that... Not
> really. You will need code for each pair of filesystems, since
> convertor will need to know *both* layouts. No amount of handwaving
> is likely to work around that. And we have what, something between
> 10 and 20 local filesystems? Have fun...
>
> If you want your idea to be considered seriously - take reiserfs code,
> take ext3 code, copy both to userland and put together a conversion
> between them. Both ways. That, by definition, is easier than doing
> it in kernel - you have the same code available and none of the limitations/
> interaction with other stuff. When you have it working, well, time to
> see what extra PITA will come from making it coexist with other parts
> of kernel (and with much more poor runtime environment).
>
> AFAICS, it is _very_ hard to implement. Even outside of the kernel.
> If you can get it done - well, that might do a lot for having the
> idea considered seriously. "Might" since you need to do it in a way
> that would survive transplantation into the kernel _and_ would scale
> better that O((number of filesystem types)^2).

Maybe defining a "neutral" metadata export/import might help in limiting
such NFS^2 ...



- Davide

2003-06-29 20:13:51

by Al Viro

[permalink] [raw]
Subject: Re: File System conversion -- ideas

On Sun, Jun 29, 2003 at 01:19:24PM -0700, Davide Libenzi wrote:

> > AFAICS, it is _very_ hard to implement. Even outside of the kernel.
> > If you can get it done - well, that might do a lot for having the
> > idea considered seriously. "Might" since you need to do it in a way
> > that would survive transplantation into the kernel _and_ would scale
> > better that O((number of filesystem types)^2).
>
> Maybe defining a "neutral" metadata export/import might help in limiting
> such NFS^2 ...

Go for it - do it in userland, define the mapping between various sorts
of metadata and let's see how well you can make it work. Have fun.

2003-06-29 20:17:49

by rmoser

[permalink] [raw]
Subject: Re: File System conversion -- ideas



*********** REPLY SEPARATOR ***********

On 6/29/2003 at 9:00 PM [email protected] wrote:

>On Sun, Jun 29, 2003 at 03:45:41PM -0400, rmoser wrote:
>
>> >> seen this written by many people who choose to use ext3. Thus proving
>> >> that there is value in in-place filesystem conversion :)
>> >
>> >Uh-huh. You want to get in-kernel conversion between ext* and reiserfs?
>> >With recoverable state if aborted? Get real.
>>
>> no, in-kernel conversion between everything. You don't think it can be
>> done? It's not that difficult a problem to manage data like that :D
>
>I think that I will believe it when I see the patchset implementing it.
>Provided that it will be convincing enough. Other than that... Not
>really. You will need code for each pair of filesystems, since
>convertor will need to know *both* layouts. No amount of handwaving
>is likely to work around that. And we have what, something between
>10 and 20 local filesystems? Have fun...

NO! You're not getting the point at all!

You don't need a pair! If you have 10 filesystems, you need 10 sets of
code in each direction, not 90. You convert from the data/metadata set
in the first filesystem to a self-contained atom, and then back from the
atom to the data/metadata set in the new filesystem. The atom is object
oriented, so anything that can't be moved over--like ACLs or Reiser4's
extended attributes that nobody else has, or permissions if converting to
vfat--is just lost. Note that if the data has an attribute like "Compressed"
or "encrypted", it is expanded/decrypted and thus brought back to its
natural form before being stuffed into an atom.

You are thinking like this (direct):

EXT2 -> Reiserfs
EXT2 -> XFS
EXT2 -> JFS
Reiserfs -> EXT2
Reiserfs -> XFS
Reiserfs -> JFS
XFS -> EXT2
XFS -> Reiserfs
XFS -> JFS
JFS -> EXT2
JFS -> XFS
JFS -> Reiserfs

Total: 12

I am thinking like this (atom):

EXT2 -> atom
Reiserfs -> atom
XFS -> atom
JFS -> atom
atom -> Ext2
atom -> Reiserfs
atom -> XFS
atom -> JFS

total: 8

For 2 through 10 filesystems, the direct:atom converter counts are:

2:4 6:6 12:8 20:10 30:12 42:14 56:16 72:18 90:20

So with 10 filesystem types, N*(N-1) or 90 pairs to go directly from one
filesystem's datastructures to any other's; N*2 or 20 pairs to go from
Metadata/Data pair -> Self-contained object oriented possibly
compressed atom -> Metadata/Data pair. That's N sets of code to go
FS_OBJECT -> atom and N sets to go from atom -> FS_OBJECT, in
this case 10 and 10.

When we get to 20 filesystems, direct conversion needs 380 pieces of
code, whereas my method needs only 20 + 20 == 40. I obviously put
more thought into this than you, but that's okay; it's an obscure idea and
I don't expect everyone to think before answering.

I might note that although it needs 40 pieces of code to handle all 20
filesystems, the filesystems may not all support it. So, look at it as
2 pieces of code per filesystem: one to, one from.
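
The counting argument above can be checked with a few lines of Python. This is only a sketch of the arithmetic; the function names are made up for illustration, and it says nothing about whether the converters themselves are feasible to write.

```python
def direct_converters(n: int) -> int:
    """One converter per ordered (source, target) pair of distinct filesystems."""
    return n * (n - 1)


def atom_converters(n: int) -> int:
    """One exporter (fs -> atom) plus one importer (atom -> fs) per filesystem."""
    return 2 * n


if __name__ == "__main__":
    # Print filesystem count, direct converters, atom-based converters.
    for n in (3, 4, 10, 20):
        print(n, direct_converters(n), atom_converters(n))
```

The two schemes break even at three filesystems (6 converters each); beyond that the intermediate format needs fewer pieces of code, provided a single atom format can really express every filesystem's metadata.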

>If you want your idea to be considered seriously - take reiserfs code,
>take ext3 code, copy both to userland and put together a conversion
>between them. Both ways. That, by definition, is easier than doing
>it in kernel - you have the same code available and none of the limitations/
>interaction with other stuff. When you have it working, well, time to
>see what extra PITA will come from making it coexist with other parts
>of kernel (and with much more poor runtime environment).
>

That would be much harder to maintain as well. It would have to be altered
every time the filesystem code in the kernel is changed.

>AFAICS, it is _very_ hard to implement. Even outside of the kernel.
>If you can get it done - well, that might do a lot for having the
>idea considered seriously. "Might" since you need to do it in a way
>that would survive transplantation into the kernel _and_ would scale
>better that O((number of filesystem types)^2).

I've beaten the O((FS_COUNT)^2) already. And by the way, it's
O((FS_COUNT)*(FS_COUNT - 1)). There's exactly O(2*FS_COUNT)
and o(2*FS_COUNT) sets of code needed total to be able to convert
between any two filesystems.

Now, what's impractical is maintaining two sets of code that do exactly
the same thing. Why maintain code here to read the filesystems, and
also in the kernel? Just do it in the kernel. Don't lose sight of the fact
that the final goal (after all else is done) is to modify VFS to actually
run this thing as a filesystem. THAT is what's going to be a bitch. The
conversions are simple enough.

--Bluefox Icy

2003-06-29 20:19:17

by rmoser

[permalink] [raw]
Subject: Re: File System conversion -- ideas



*********** REPLY SEPARATOR ***********

On 6/29/2003 at 9:02 PM [email protected] wrote:

>On Sun, Jun 29, 2003 at 08:44:23PM +0100, Jamie Lokier wrote:
>> Leonard Milcin Jr. wrote:
>> > >Nrrrg. Yeah, I've got 80 gig and only CDR's to back up to, and no money.
>> > >A CDR may read for me the day it's written, and then not work the next
>> > >day. Still a risk.
>> >
>> > Say, why you would want to change filesystem type?
>>
>> I'd like to try reiser4 when it is available because I heard from Hans
>> that it is faster...
>>
>> Isn't that a good reason?
>
>Not really. Never, ever, try a new code on live system. Put together
>a test box and/or test disk. Regardless of nature of code in question -
>if you want to test something, go for a dedicated test setup.

Umm, reiser4 isn't going to be released as stable until it's well tested.
I heard months ago that initial betas were out--BETAs, as in IT'S DONE--
and stability was predicted by June. If the code has been thoroughly
tested, then it satisfies this basic security principle, and thus he just
wants to try it out to see how it works.

>-
>To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>the body of a message to [email protected]
>More majordomo info at http://vger.kernel.org/majordomo-info.html
>Please read the FAQ at http://www.tux.org/lkml/



2003-06-29 20:25:39

by rmoser

[permalink] [raw]
Subject: Re: File System conversion -- ideas



*********** REPLY SEPARATOR ***********

On 6/29/2003 at 3:05 PM David D. Hagood wrote:

>rmoser wrote:
>
>> no, in-kernel conversion between everything. You don't think it can be
>> done? It's not that difficult a problem to manage data like that :D
>
>OK, then - Show Us The Code.
>
>Everyone else who have expressed an opinion believe an in-kernel
>converter to be far to difficult to get right. You disagree, and think
>it should be easy.
>
>So write it. Show us the code. Change our minds.
>
>You opened with a a "This should be possible". We raised you a "No, it's
>hard, here's other ways to do it." You raised with a "It should be
>easy." I call.
>

Told you, I can't code it. I could work on making an initial design for the
most important part though, the datasystem that separates the two filesystems
and holds the meta-data and data in self-contained atoms. I KNOW I won't
get it right the first time, but I can give you a place to start.

I absolutely can not code anything in the kernel at this time. I've tried. I'll
get it eventually ;-)

Citing the original message:

[QUOTE]
I know I spout a lot of crap, and wish I could just do it all (can we get
a "Make a small device driver for virtual hardware in Linux 2.4 and 2.5"
tutorial up on kernel.org?!), but I think I've got some good ideas. At
any rate, the good is kept and the bad is weeded out, right?
[ENDQUOTE]

I actually started pretty sure that someone would want me to come up
with an initial design for the datasystem that is used to control this.
Not that I drew it out before-hand, mind you; I just thought someone
would ask. I KNEW someone would tell me to code it, which is why
I said I can't right off the bat.

>Let's see your cards.
>

--Bluefox Icy

2003-06-29 20:28:54

by rmoser

[permalink] [raw]
Subject: Re: File System conversion -- ideas



*********** REPLY SEPARATOR ***********

On 6/29/2003 at 1:19 PM Davide Libenzi wrote:

>On Sun, 29 Jun 2003 [email protected] wrote:
>
>> I think that I will believe it when I see the patchset implementing it.
>> Provided that it will be convincing enough. Other than that... Not
>> really. You will need code for each pair of filesystems, since
>> convertor will need to know *both* layouts. No amount of handwaving
>> is likely to work around that. And we have what, something between
>> 10 and 20 local filesystems? Have fun...
>>
>> If you want your idea to be considered seriously - take reiserfs code,
>> take ext3 code, copy both to userland and put together a conversion
>> between them. Both ways. That, by definition, is easier than doing
>> it in kernel - you have the same code available and none of the limitations/
>> interaction with other stuff. When you have it working, well, time to
>> see what extra PITA will come from making it coexist with other parts
>> of kernel (and with much more poor runtime environment).
>>
>> AFAICS, it is _very_ hard to implement. Even outside of the kernel.
>> If you can get it done - well, that might do a lot for having the
>> idea considered seriously. "Might" since you need to do it in a way
>> that would survive transplantation into the kernel _and_ would scale
>> better that O((number of filesystem types)^2).
>
>Maybe defining a "neutral" metadata export/import might help in limiting
>such NFS^2 ...
>
>

That was in the original message. :-p Some people don't read.

>
>- Davide
>

--Bluefox Icy


2003-06-29 20:36:21

by rmoser

[permalink] [raw]
Subject: Re: File System conversion -- ideas



*********** REPLY SEPARATOR ***********

On 6/29/2003 at 9:20 PM John Bradford wrote:

>> If you think about it, you have this:
>>
>> [PARTITION 1]
>> |
>> V
>> [PARTITION 2]
>>
>> I have this (the == is an equivalence sign, i.e. this is what's inside):
>>
>> [PARTITION]
>> ==
>> [DATASYSTEM]
>> ==
>> [FILESYSTEM 1]
>> |
>> V
>> [DATASYSTEM ATOMS]
>> |
>> V
>> [FILESYSTEM 2]
>>
>> Both filesystems are the full size of the partition, and so is the
>> datasystem. The only difference is that before you start you have
>> to make sure that the datasystem's gonna fit in with the free space
>> on the first filesystem, and still have space to start the second
>> filesystem, and then have space for its atoms.
>
>Just thought - that's going to be a problem in read-write mode :-/.
>
>If the disk fills up, we'd need to be able to maintain a consistent
>filesystem structure, (at least good enough so that a separate
>fsck-like utility could repair it - if the disk filled up, then the
>conversion couldn't be done on-the-fly).
>


mmm.. hadn't thought of that.

1 second answer: Lock down some of the freespace. Do NOT let it
get full. You know how ext2 reserves 5% for the superuser? Do that.
Reserve enough freespace to keep working and finish the conversion.
Predict from the beginning how much free space is going to be needed,
and how much is going to be left over at the very final stages of the
conversion.
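
As a rough sketch of the "lock down some of the freespace" idea: before starting, check that free space covers the converter's working set plus a reserve, and cap what ordinary writers may use while it runs. Everything here is an assumption made for illustration, including the function names, the block-based accounting, and the 5% default, which is borrowed from the ext2 comparison above rather than derived from any analysis.

```python
def can_start_conversion(total_blocks: int, used_blocks: int,
                         working_blocks: int, reserve_percent: int = 5) -> bool:
    """True if free space covers the converter's working set plus the reserve."""
    reserve = total_blocks * reserve_percent // 100
    free = total_blocks - used_blocks
    return free >= working_blocks + reserve


def writable_blocks(total_blocks: int, used_blocks: int,
                    working_blocks: int, reserve_percent: int = 5) -> int:
    """Blocks ordinary writers may still consume while the conversion runs."""
    reserve = total_blocks * reserve_percent // 100
    free = total_blocks - used_blocks
    # Never report a negative allowance; zero means writes must be refused.
    return max(0, free - working_blocks - reserve)
```

The hard part, predicting `working_blocks` for a given pair of filesystems, is exactly what this sketch leaves open.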

>> These atoms will
>> slowly be destroyed as they go into the second filesystem. You
>> have to also make sure that the second FS won't be bigger than the
>> first, and will at the end have enough to hold at least the empty
>> datasystem and one atom.
>>
>> I feel I should note, since I forgot before, that an atom can contain part
>> of the data for an inode, as long as you know this and can write the atom
>> out to the new filesystem and get more of the old.
>
>Seems like a solid idea, though. As long as it worked on at least
>read-only mounted filesystems, I'd be quite interested in seeing it in
>the mainline kernel.
>

On a side note, the bitch is gonna be trying to swap the superblock back
in over the datasystem's superblock. Maybe we should set it up so that
the datasystem has the journal in a fixed place, or tell the user where it
is, so that if we do crash at that absolute final stage, we can finish it out
:/

Sorry, just spouting extra things. To me, it's no good unless we've prepared
for 100% of the problems we will encounter, excluding bugs in the code.

>John.


--Bluefox Icy

2003-06-29 20:33:22

by rmoser

[permalink] [raw]
Subject: Re: File System conversion -- ideas



*********** REPLY SEPARATOR ***********

On 6/29/2003 at 9:25 PM [email protected] wrote:

>On Sun, Jun 29, 2003 at 01:19:24PM -0700, Davide Libenzi wrote:
>
>> > AFAICS, it is _very_ hard to implement. Even outside of the kernel.
>> > If you can get it done - well, that might do a lot for having the
>> > idea considered seriously. "Might" since you need to do it in a way
>> > that would survive transplantation into the kernel _and_ would scale
>> > better that O((number of filesystem types)^2).
>>
>> Maybe defining a "neutral" metadata export/import might help in limiting
>> such NFS^2 ...
>
>Go for it - do it in userland, define the mapping between various sorts
>of metadata and let's see how well you can make it work. Have fun.

You sound like virt



2003-06-29 20:29:52

by David D. Hagood

[permalink] [raw]
Subject: Re: File System conversion -- ideas

rmoser wrote:

> Except for a crash at the precise moment that data is being written during
> a resize of a partition in LVM or the filesystem itself. To my knowledge,
> said operation is not journaled.

And the window of vulnerability for my method is very small - for yours
it is very large (the whole duration of the conversion operation.)

Sorry, but I've seen too many folks like you in the past on lists like
this. You write in with a poorly considered idea, and when people try to
show you why it won't work you plug your ears and say
"Nyah-Nyah-Nyah-I'm-not-listening".

As I said before: if you think this is so easy to do, DO IT. SHOW US THE
CODE.

Until you do, I consider this "discussion" at an end.

2003-06-29 20:36:42

by Hugo Mills

[permalink] [raw]
Subject: Re: File System conversion -- ideas

On Sun, Jun 29, 2003 at 04:29:45PM -0400, rmoser wrote:
> *********** REPLY SEPARATOR ***********
>
> NO! You're not getting the point at all!
>
> You don't need a pair! If you have 10 filesystems, you need 10 sets of
> code in each direction, not 90. You convert from the data/metadata set
> in the first filesystem to a self-contained atom, and then back from the
> atom to the data/metadata set in the new filesystem. The atom is object
> oriented, so anything that can't be moved over--like ACLs or Reiser4's
> extended attributes that nobody else has, or permissions if converting to
> vfat--is just lost.

You will, of course, ensure that your atoms contain the superset of
all filesystem metadata semantics.

> Note that if the data has an attribute like "Compressed"
> or "encrypted", it is expanded/decrypted and thus brought back to its
> natural form before being stuffed into an atom.
[snip]

> So with 10 filesystem types, N*(N-1) or 90 pairs to go directly from one
> filesystem's datastructures to any other's; N*2 or 20 pairs to go from
> Metadata/Data pair -> Self-contained object oriented possibly
> compressed atom -> Metadata/Data pair. That's N sets of code to go
> FS_OBJECT -> atom and N sets to go from atom -> FS_OBJECT, in
> this case 10 and 10.
>
> When we get to 20 filesystems, direct conversion needs 380 pieces of
> code, whereas my method needs only 20 + 20 == 40. I obviously put
> more thought into this than you, but that's okay; it's an obscure idea and
> I don't expect everyone to think before answering.

Actually:

1) I think Viro did mention exactly this method in one of his mails.

2) It's not an obscure idea at all -- it's one of the standard
techniques if you've ever had to consider (let alone write!) a
set of data-conversion routines.

> >If you want your idea to be considered seriously - take reiserfs code,
> >take ext3 code, copy both to userland and put together a conversion
> >between them. Both ways. That, by definition, is easier than doing
> >it in kernel - you have the same code available and none of the limitations/
> >interaction with other stuff. When you have it working, well, time to
> >see what extra PITA will come from making it coexist with other parts
> >of kernel (and with much more poor runtime environment).
> >
>
> That would be much harder to maintain as well. It would have to be altered
> every time the filesystem code in the kernel is changed.

Yes, but the point is it's a much easier thing to implement and
test the concept than diving straight into kernel code. You don't have
to maintain it for very long (if at all) -- just long enough to prove
to everyone that this kind of conversion is possible, and that they
should help you roll it into the kernel.

> >AFAICS, it is _very_ hard to implement. Even outside of the kernel.
> >If you can get it done - well, that might do a lot for having the
> >idea considered seriously. "Might" since you need to do it in a way
> >that would survive transplantation into the kernel _and_ would scale
> >better that O((number of filesystem types)^2).
>
> I've beaten the O((FS_COUNT)^2) already. And by the way, it's
> O((FS_COUNT)*(FS_COUNT - 1)). There's exactly O(2*FS_COUNT)
> and o(2*FS_COUNT) sets of code needed total to be able to convert
> between any two filesystems.

There's no such thing as O(x*(x-1)). This is precisely O(x^2).
Similarly, O(2*x) is precisely the same as O(x). If you're going to
try to use mathematics to demonstrate your point, please at least make
sure that you're using it _right_.
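
For reference, the step being relied on here can be written out under the usual definition of big-O notation (a short derivation added for illustration, not taken from any of the mails):

```latex
\[
x(x-1) = x^2 - x \le x^2 \quad (x \ge 0)
\;\Longrightarrow\; x(x-1) = O(x^2),
\]
\[
x(x-1) \ge \tfrac{1}{2}\,x^2 \quad (x \ge 2)
\;\Longrightarrow\; x(x-1) = \Theta(x^2),
\qquad
2x \le 2 \cdot x \;\Longrightarrow\; 2x = O(x).
\]
```

The notation only discards constant factors and lower-order terms; the exact counts (90 versus 20 for ten filesystems) are still different, and the move from quadratic to linear growth is what the intermediate format would buy if it can be built.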

> Now, what's impractical is maintaining two sets of code that do exactly
> the same thing. Why maintain code here to read the filesystems, and
> also in the kernel?

It's not a maintenance thing at all -- it's a matter of
demonstrating that you can walk before you try running.

> Just do it in the kernel. Don't lose sight of the fact
> that the final goal (after all else is done) is to modify VFS to actually
> run this thing as a filesystem. THAT is what's going to be a bitch. The
> conversions are simple enough.

Hugo.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 1C335860 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- For months now, we have been making triumphant retreats ---
before a demoralised enemy who is advancing
in utter disorder.



2003-06-29 20:39:22

by Davide Libenzi

[permalink] [raw]
Subject: Re: File System conversion -- ideas

On Sun, 29 Jun 2003 [email protected] wrote:

> > Maybe defining a "neutral" metadata export/import might help in limiting
> > such NFS^2 ...
>
> Go for it - do it in userland, define the mapping between various sorts
> of metadata and let's see how well you can make it work. Have fun.

Al, I wouldn't even think about doing it :) Tar still works for me (and the
neutral format compatible with all filesystems will be nothing more than what
tar can export) and the thing is not even close to being interesting. It was
obvious though that:

# raiser.export | ext2.import && ext2.export | raiser.import

will produce a different raiser metadata.
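
A toy model of why the round trip is lossy, assuming the neutral format can carry only what every filesystem understands. The attribute names and the `SUPPORTED` table are invented for illustration and are not an inventory of real ext2 or reiserfs metadata.

```python
# Hypothetical per-filesystem metadata capabilities (illustrative names only).
SUPPORTED = {
    "reiser": {"mode", "owner", "mtime", "reiser_only_attr"},
    "ext2": {"mode", "owner", "mtime", "ext2_only_attr"},
}

# The "neutral" format is limited to the common subset.
NEUTRAL = set.intersection(*SUPPORTED.values())


def export(fs: str, metadata: dict) -> dict:
    """Keep only attributes that fs has and the neutral format can carry."""
    return {k: v for k, v in metadata.items()
            if k in SUPPORTED[fs] and k in NEUTRAL}


def import_(fs: str, neutral: dict) -> dict:
    """Store whatever neutral attributes the target filesystem supports."""
    return {k: v for k, v in neutral.items() if k in SUPPORTED[fs]}


def round_trip(metadata: dict, src: str, via: str) -> dict:
    """src -> neutral -> via -> neutral -> src, as in the pipeline above."""
    on_via = import_(via, export(src, metadata))
    return import_(src, export(via, on_via))
```

In this model any attribute outside the common subset disappears on the first export, so the metadata that comes back differs from what went in whenever the source used such an attribute.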



- Davide

2003-06-29 20:40:21

by Al Viro

Subject: Re: File System conversion -- ideas

On Sun, Jun 29, 2003 at 04:29:45PM -0400, rmoser wrote:

> NO! You're not getting the point at all!
>
> You don't need a pair! If you have 10 filesystems, you need 10 sets of
> code in each direction, not 90. You convert from the data/metadata set
> in the first filesystem to a self-contained atom, and then back from the

[snip handwaving]

> That would be much harder to maintain as well. It would have to be altered
> every time the filesystem code in the kernel is changed.

Not really, as long as filesystem _layout_ is stable.

> I've beaten the O((FS_COUNT)^2) already. And by the way, it's
> O((FS_COUNT)*(FS_COUNT - 1)). There's exactly O(2*FS_COUNT)
> and o(2*FS_COUNT) sets of code needed total to be able to convert
> between any two filesystems.

No, you have not. You are yet to demonstrate that it's doable.

> Now, what's impractical is maintaining two sets of code that do exactly
> the same thing. Why maintain code here to read the filesystems, and
> also in the kernel? Just do it in the kernel. Don't lose sight of the fact
> that the final goal (after all else is done) is to modify VFS to actually
> run this thing as a filesystem. THAT is what's going to be a bitch. The
> conversions are simple enough.

Then *SHOW* *THEM*. You keep repeating that it's simple. Fine, show that
it can be done. Then we can start talking about the rest - until you can
demonstrate (as in, show the working code) that does what you call simple,
there is no point in going any further.

_That_ is the point of contention. And no, saying the word "atom" does
not count as proof of feasibility. Show how to map metadata between different
filesystem types. Hell, show that you know what types of metadata are there.

Forget about in-core data structures. Whatever data structures you use,
it boils down to manipulating on-disk ones - that's kinda the point of
exercise, right? Show what should be done with them - with whatever in-core
objects you like. Assuming that VFS or any other parts of kernel do not
get into your way and do not impose any restrictions - how would you do this
stuff? From one on-disk layout to another. In details. Then we can go
and see how to make existing kernel objects live with that. That will be
extra condition and it will only make the problem harder. Until you have
a solution of easier problem, there's no sense in discussing harder one.

2003-06-29 20:42:52

by rmoser

Subject: Re: File System conversion -- ideas



*********** REPLY SEPARATOR ***********

On 6/29/2003 at 3:41 PM David D. Hagood wrote:

>rmoser wrote:
>
>> Except for a crash at the precise moment that data is being written
>during
>> a resize of a partition in LVM or the filesystem itself. To my
>knowledge,
>> said operation is not journaled.
>
>And the window of vulnerability for my method is very small - for yours
>it is very large (the whole duration of the conversion operation.)
>

Wrong. Go read it. Citing the original post:

[QUOTE]
1) Create a method for storing meta-data for each file/directory on a filesystem
which is being slowly destroyed. [...] It is
preferable to make this datasystem fault tolerant, so that if it goes down, the
conversion can be continued without damage. [...]
- Object oriented: Store meta-data that may not be recognized by the new
filesystem
- Journalized: Don't break!
[...]
- Store data that is needed to resume the conversion at any time: There may be
a colossal system crash during conversion!
- Differentiate between each filesystem structure and the datasystem used during
conversion: Must be able to disassemble one filesystem and reassemble it to
another WITHOUT getting lost!
[ENDQUOTE]

Everything that happens everywhere should be roll-back journalized, so that if anything
happens, we don't finish what we did but instead go back to the immediate prior
consistent state that will allow us to continue on with our work. There is only one
vulnerability point: the final swapping out of the datasystem's superblock for the new
filesystem's superblock at the very end. There's a way to fix this too. Display to the
user where exactly the journal is. Then stop. He writes this number down. Then, you
journal the change, as in roll-forward journaling, so it will complete if it crashes. If
for some reason the machine drops--kernel panic, power outage, cat finds the reset
button--you run the conversion against the system and request to replay journal at
the offset it gave you. The journal in this special case is sitting in a chunk of free
space in the filesystem somewhere, and houses only this one transaction. This means
that once this transaction finishes, the journal is just some random data written
to that area in the middle of free space on the filesystem. It's unimportant, and doesn't
have to be removed because there are no references to it.
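
The roll-forward step for the final superblock swap could look roughly like the sketch below. It works on an in-memory bytearray standing in for the disk; the offsets, the 16-byte superblock size, and the record layout (length, CRC32, payload) are all assumptions made for the example, not a proposal for a real format.

```python
import struct
import zlib

SB_OFFSET = 0    # hypothetical superblock location on the toy "disk"
SB_SIZE = 16     # hypothetical superblock size


def write_journal(disk, journal_off, new_sb):
    """Record the intended superblock, checksummed, before touching the real one."""
    record = struct.pack("<II", len(new_sb), zlib.crc32(new_sb)) + new_sb
    disk[journal_off:journal_off + len(record)] = record


def replay_journal(disk, journal_off):
    """Roll forward: reapply the journaled superblock if the record is intact."""
    length, crc = struct.unpack_from("<II", disk, journal_off)
    payload = bytes(disk[journal_off + 8:journal_off + 8 + length])
    if length != SB_SIZE or zlib.crc32(payload) != crc:
        return False    # torn or missing record: the old superblock still stands
    disk[SB_OFFSET:SB_OFFSET + SB_SIZE] = payload
    return True


disk = bytearray(256)
disk[SB_OFFSET:SB_OFFSET + SB_SIZE] = b"OLD-DATASYSTEM!!"
JOURNAL_OFF = 128    # the number the user would write down

print(replay_journal(disk, JOURNAL_OFF))    # False: nothing journaled yet
write_journal(disk, JOURNAL_OFF, b"NEW-FILESYSTEM!!")
# A crash at this point leaves the old superblock plus a complete record.
print(replay_journal(disk, JOURNAL_OFF))    # True
print(bytes(disk[:SB_SIZE]))                # b'NEW-FILESYSTEM!!'
```

Replaying twice is harmless because the record only ever says "make the superblock equal to this", which is what makes the single transaction safe to repeat after a crash.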

>Sorry, but I've seen too many folks like you in the past on lists like
>this. You write in with a poorly considered idea, and when people try to
>show you why it won't work you plug your ears and say
>"Nyah-Nyah-Nyah-I'm-not-listening".
>
>As I said before: if you think this is so easy to do, DO IT. SHOW US THE
>CODE.
>

I wish.

>Until you do, I consider this "discussion" at an end.

--Bluefox Icy

2003-06-29 20:47:46

by rmoser

Subject: Re: File System conversion -- ideas



*********** REPLY SEPARATOR ***********

On 6/29/2003 at 9:50 PM Hugo Mills wrote:

>On Sun, Jun 29, 2003 at 04:29:45PM -0400, rmoser wrote:
>> *********** REPLY SEPARATOR ***********
>>
>> NO! You're not getting the point at all!
>>
>> You don't need a pair! If you have 10 filesystems, you need 10 sets of
>> code in each direction, not 90. You convert from the data/metadata set
>> in the first filesystem to a self-contained atom, and then back from the
>> atom to the data/metadata set in the new filesystem. The atom is object
>> oriented, so anything that can't be moved over--like ACLs or Reiser4's
>> extended attributes that nobody else has, or permissions if converting to
>> vfat--is just lost.
>
> You will, of course, ensure that your atoms contain the superset of
>all filesystem metadata semantics.
>

Yes, that's the point of object orientation. Objects I don't understand I ignore.
Objects I do understand I keep. Objects I don't understand don't confuse
me because I can see the difference between two objects.
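
One way to read "objects I don't understand I ignore" is a tagged, length-prefixed record stream, where an importer skips any tag it does not know. The sketch below assumes a made-up layout (16-bit tag, 32-bit length, payload) and made-up tag numbers; nothing here comes from an existing format.

```python
import struct

TAG_MODE, TAG_ACL, TAG_DATA = 1, 2, 3    # hypothetical tag numbers


def pack_atom(attrs):
    """Serialize (tag, payload) pairs as tag:u16, length:u32, payload."""
    return b"".join(struct.pack("<HI", tag, len(data)) + data
                    for tag, data in attrs)


def unpack_atom(blob, known_tags):
    """Return only attributes whose tag is known; skip the rest by length."""
    kept, pos = {}, 0
    while pos < len(blob):
        tag, length = struct.unpack_from("<HI", blob, pos)
        pos += 6                          # size of the "<HI" header
        if tag in known_tags:
            kept[tag] = blob[pos:pos + length]
        pos += length                     # unknown payloads are stepped over
    return kept


atom = pack_atom([
    (TAG_MODE, struct.pack("<H", 0o644)),
    (TAG_ACL, b"user:bob:rw-"),
    (TAG_DATA, b"hello"),
])

# An importer for a vfat-like target that has no permissions or ACLs:
print(unpack_atom(atom, known_tags={TAG_DATA}))    # {3: b'hello'}
```

The length prefix is what lets an importer ignore an object safely; it says nothing about whether silently dropping that object is acceptable, which is the objection raised in the reply.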

>> Note that if the data has an attribute like "Compressed"
>> or "encrypted", it is expanded/decrypted and thus brought back to its
>> natural form before being stuffed into an atom.
>[snip]
>
>> So with 10 filesystem types, N*(N-1) or 90 pairs to go directly from one
>> filesystem's datastructures to any other's; N*2 or 20 pairs to go from
>> Metadata/Data pair -> Self-contained object oriented possibly
>> compressed atom -> Metadata/Data pair. That's N sets of code to go
>> FS_OBJECT -> atom and N sets to go from atom -> FS_OBJECT, in
>> this case 10 and 10.
>>
>> When we get to 20 filesystems, direct conversion needs 380 pieces of
>> code, whereas my method needs only 20 + 20 == 40. I obviously put
>> more thought into this than you, but that's okay; it's an obscure idea
>and
>> I don't expect everyone to think before answering.
>
> Actually:
>
>1) I think Viro did mention exactly this method in one of his mails.
>
>2) It's not an obscure idea at all -- it's one of the standard
> techniques if you've ever had to consider (let alone write!) a
> set of data-conversion routines.
>
wow, I re-invented another wheel.

>> >If you want your idea to be considered seriously - take reiserfs code,
>> >take ext3 code, copy both to userland and put together a conversion
>> >between them. Both ways. That, by definition, is easier than doing
>> >it in kernel - you have the same code available and none of the
>> >limitations/
>> >interaction with other stuff. When you have it working, well, time to
>> >see what extra PITA will come from making it coexist with other parts
>> >of kernel (and with much more poor runtime environment).
>> >
>>
>> That would be much harder to maintain as well. It would have to be
>altered
>> every time the filesystem code in the kernel is changed.
>
> Yes, but the point is it's a much easier thing to implement and
>test the concept than diving straight into kernel code. You don't have
>to maintain it for very long (if at all) -- just long enough to prove
>to everyone that this kind of conversion is possible, and that they
>should help you roll it into the kernel.
>

I can't code it. I want to, it'd be FUN, but I can't.

>> >AFAICS, it is _very_ hard to implement. Even outside of the kernel.
>> >If you can get it done - well, that might do a lot for having the
>> >idea considered seriously. "Might" since you need to do it in a way
>> >that would survive transplantation into the kernel _and_ would scale
>> >better than O((number of filesystem types)^2).
>>
>> I've beaten the O((FS_COUNT)^2) already. And by the way, it's
>> O((FS_COUNT)*(FS_COUNT - 1)). There's exactly O(2*FS_COUNT)
>> and o(2*FS_COUNT) sets of code needed total to be able to convert
>> between any two filesystems.
>
> There's no such thing as O(x*(x-1)). This is precisely O(x^2).
>Similarly, O(2*x) is precisely the same as O(x). If you're going to
>try to use mathematics to demonstrate your point, please at least make
>sure that you're using it _right_.
>

Big O notation is inappropriate here because it measures time complexity;
however, I was following Viro's lead. We're using it to measure code
complexity, sorry.

>> Now, what's impractical is maintaining two sets of code that do exactly
>> the same thing. Why maintain code here to read the filesystems, and
>> also in the kernel?
>
> It's not a maintenance thing at all -- it's a matter of
>demonstrating that you can walk before you try running.
>

Erm, if you're going to do it at all, do it right first. Actually demonstrating
it is not the only way to prove it's possible.

>> Just do it in the kernel. Don't lose sight of the fact
>> that the final goal (after all else is done) is to modify VFS to actually
>> run this thing as a filesystem. THAT is what's going to be a bitch. The
>> conversions are simple enough.
>
> Hugo.
>
>--
>=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
> PGP key: 1C335860 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
> --- For months now, we have been making triumphant retreats ---
> before a demoralised enemy who is advancing
> in utter disorder.
>

Calmest input I've seen yet.

--Bluefox Icy

2003-06-29 21:01:03

by Davide Libenzi

Subject: Re: File System conversion -- ideas

On Sun, 29 Jun 2003, rmoser wrote:

> >> I've beaten the O((FS_COUNT)^2) already. And by the way, it's
> >> O((FS_COUNT)*(FS_COUNT - 1)). There's exactly O(2*FS_COUNT)
> >> and o(2*FS_COUNT) sets of code needed total to be able to convert
> >> between any two filesystems.
> >
> > There's no such thing as O(x*(x-1)). This is precisely O(x^2).
> >Similarly, O(2*x) is precisely the same as O(x). If you're going to
> >try to use mathematics to demonstrate your point, please at least make
> >sure that you're using it _right_.
> >
>
> Big O notation is inappropriate here because it measures time complexity;
> however, I was following Viro's lead. We're using it to measure code
> complexity, sorry.

In which math book is O() a time thingy?


- Davide

2003-06-29 20:57:54

by Chris Friesen

Subject: Re: File System conversion -- ideas

[email protected] wrote:

> Then *SHOW* *THEM*. You keep repeating that it's simple. Fine, show that
> it can be done. Then we can start talking about the rest - until you can
> demonstrate (as in, show the working code) that does what you call simple,
> there is no point in going any further.
>
> _That_ is the point of contention. And no, saying the word "atom" does
> not count as proof of feasibility. Show how to map metadata between different
> filesystem types. Hell, show that you know what types of metadata are there.


Presumably the in-kernel metadata would need to be a superset of all of
the different metadata stored by all the different supported filesystems.

It then needs two converters for each filesystem, to/from the metadata
format. When you hit a new filesystem that your metadata doesn't
support, you extend that metadata.

Of course, you're screwed if the new filesystem is sufficiently odd that
it doesn't map nicely to the metadata organization.
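
As a rough sketch of that arrangement: a superset record with optional fields, plus a registry holding exactly two converters per filesystem. The two "filesystems" here are toy dictionaries named `unixlike` and `fatlike`, invented for the example; real metadata is much richer.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class NeutralMeta:
    """Superset record; fields a source cannot supply stay None/empty."""
    name: str
    size: int
    mode: Optional[int] = None
    uid: Optional[int] = None
    xattrs: Dict[str, bytes] = field(default_factory=dict)


@dataclass
class Converter:
    to_neutral: Callable[[dict], NeutralMeta]
    from_neutral: Callable[[NeutralMeta], dict]


REGISTRY: Dict[str, Converter] = {
    "unixlike": Converter(
        to_neutral=lambda n: NeutralMeta(n["name"], n["size"], n["mode"], n["uid"]),
        from_neutral=lambda m: {
            "name": m.name, "size": m.size,
            "mode": 0o644 if m.mode is None else m.mode,   # assumed default
            "uid": 0 if m.uid is None else m.uid,          # assumed default
        },
    ),
    "fatlike": Converter(
        to_neutral=lambda n: NeutralMeta(n["name"], n["size"]),
        from_neutral=lambda m: {"name": m.name, "size": m.size},
    ),
}


def convert(src, dst, native):
    """Any-to-any conversion through the neutral record."""
    return REGISTRY[dst].from_neutral(REGISTRY[src].to_neutral(native))


print(convert("unixlike", "fatlike",
              {"name": "a.txt", "size": 3, "mode": 0o600, "uid": 1000}))
```

Adding a filesystem means adding one registry entry and, if it stores something new, one more optional field on `NeutralMeta`. A filesystem whose model does not fit the record at all is exactly the case where this breaks down.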

Chris


--
Chris Friesen | MailStop: 043/33/F10
Nortel Networks | work: (613) 765-0557
3500 Carling Avenue | fax: (613) 765-2986
Nepean, ON K2H 8E9 Canada | email: [email protected]

2003-06-29 20:57:38

by rmoser

Subject: Re: File System conversion -- ideas



*********** REPLY SEPARATOR ***********

On 6/29/2003 at 9:51 PM [email protected] wrote:

>On Sun, Jun 29, 2003 at 04:29:45PM -0400, rmoser wrote:
>
>> NO! You're not getting the point at all!
>>
>> You don't need a pair! If you have 10 filesystems, you need 10 sets of
>> code in each direction, not 90. You convert from the data/metadata set
>> in the first filesystem to a self-contained atom, and then back from the
>
>[snip handwaving]
>
>> That would be much harder to maintain as well. It would have to be
>altered
>> every time the filesystem code in the kernel is changed.
>
>Not really, as long as filesystem _layout_ is stable.
>

Maybe heh.

>> I've beaten the O((FS_COUNT)^2) already. And by the way, it's
>> O((FS_COUNT)*(FS_COUNT - 1)). There's exactly O(2*FS_COUNT)
>> and o(2*FS_COUNT) sets of code needed total to be able to convert
>> between any two filesystems.
>
>No, you have not. You are yet to demonstrate that it's doable.
>
>> Now, what's impractical is maintaining two sets of code that do exactly
>> the same thing. Why maintain code here to read the filesystems, and
>> also in the kernel? Just do it in the kernel. Don't lose sight of the
>fact
>> that the final goal (after all else is done) is to modify VFS to actually
>> run this thing as a filesystem. THAT is what's going to be a bitch. The
>> conversions are simple enough.
>
>Then *SHOW* *THEM*. You keep repeating that it's simple. Fine, show that
>it can be done. Then we can start talking about the rest - until you can
>demonstrate (as in, show the working code) that does what you call simple,
>there is no point in going any further.
>

I'm not coding it. I wish I could. heh. Hmm.... :/ I can't keep the wheels in
my head from cranking out ideas on how to structure the datasystem though
:/ I'll go diagram that out for a start I guess.

>_That_ is the point of contention. And no, saying the word "atom" does
>not count as proof of feasibility. Show how to map metadata between
>different
>filesystem types. Hell, show that you know what types of metadata are
>there.
>

heh. Right-o. Need to find out about filesystem structure...

>Forget about in-core data structures. Whatever data structures you use,
>it boils down to manipulating on-disk ones - that's kinda the point of
>exercise, right? Show what should be done with them - with whatever
>in-core
>objects you like. Assuming that VFS or any other parts of kernel do not
>get into your way and do not impose any restrictions - how would you do
>this
>stuff? From one on-disk layout to another. In details. Then we can go
>and see how to make existing kernel objects live with that. That will be
>extra condition and it will only make the problem harder. Until you have
>a solution of easier problem, there's no sense in discussing harder one.

Yeah, I know. I always do keep the harder problem in mind, though, when
I intend to build it upon the easier problem. The reason is that I want to
make sure the design isn't going to get in the way once the easier part is
solved.

well I'll go play.

--Bluefox Icy

2003-06-29 21:17:45

by Diego Calleja

Subject: Re: File System conversion -- ideas

On Sun, 29 Jun 2003 15:45:41 -0400
rmoser <[email protected]> wrote:

> no, in-kernel conversion between everything. You don't think it can be done?
> It's not that difficult a problem to manage data like that :D

personally, when i want to change the filesystem of my data (not very often
though, 2 or 3 times in my life, and that was because i was bored), i just make
a new partition with the filesystem i want, mount it, and cp -a everything I
want (or tar it, or use whatever backup/script software you want). That's the
way of doing things, IMHO. Apart from letting you convert all your data to
another filesystem, it gives you flexibility, which I wouldn't get in the
kernel.

And well...how many people are you expecting to change from one filesystem
to another in the real world?

2003-06-29 21:23:12

by Hugo Mills

Subject: Re: File System conversion -- ideas

[Damn, forgot to cc the list]

On Sun, Jun 29, 2003 at 05:00:34PM -0400, rmoser wrote:
>
>
> *********** REPLY SEPARATOR ***********
>
> On 6/29/2003 at 9:50 PM Hugo Mills wrote:
>
> >On Sun, Jun 29, 2003 at 04:29:45PM -0400, rmoser wrote:
> >> That would be much harder to maintain as well. It would have to be
> >altered
> >> every time the filesystem code in the kernel is changed.
> >
> > Yes, but the point is it's a much easier thing to implement and
> >test the concept than diving straight into kernel code. You don't have
> >to maintain it for very long (if at all) -- just long enough to prove
> >to everyone that this kind of conversion is possible, and that they
> >should help you roll it into the kernel.
> >
>
> I can't code it. I want to, it'd be FUN, but I can't.

This, I think, is your problem. IIRC, you had the same problem last
time you posted an idea to LKML. It's like novelists -- come up with a
good idea for a story, and approach a novelist with it: "Hey! I've got
this great idea for a story!". They will tell you exactly where to
stick your great idea. Their problem is not a lack of ideas, it's a
lack of time in which to implement all of their ideas. For example,
I've got enough ideas right now for bits of code and research that I
want to write that I could probably work full-time for at least 10
years to implement just the ones I have _now_. Effectively, what
you're asking these people to do is to do all of the work, and give
you some of the credit for having the idea.

Start with the easy bits: make a list of _every_ piece of metadata
that can be stored by an ext2 filesystem. Do the same for ReiserFS.
Work out how one maps to the other. Write a C/C++ struct to contain
that metadata. Work out how you're going to store your metadata nodes
on-disk. Those are the easy bits.

Then it gets harder: see if you can get the documentation for the
on-disk format of ext2. I don't believe that it's _stunningly_
complicated, at least in its basics. If ext2 turns out to be too
complicated, try FAT32 (or even FAT16!). Write a piece of (userspace)
code which can read that format (or take it from either the kernel or
e2fsck) and which converts to your own format. Write another piece of
code which converts back (to ext2). Try it on a small ext2 loopback
device -- copy all the data on the "device" to a file somewhere else,
and then try to create another ext2 FS from your metadata.
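
A first userspace step along these lines might be nothing more than reading the ext2 superblock out of an image file. The sketch below relies on the documented ext2 layout (superblock at byte offset 1024, little-endian fields, magic 0xEF53 at offset 56 within the superblock); the function name and the synthetic test image are made up for the example.

```python
import struct

EXT2_SB_OFFSET = 1024    # the superblock starts 1024 bytes into the device
EXT2_MAGIC = 0xEF53


def read_ext2_superblock(image):
    """Parse a few superblock fields from the first bytes of an ext2 image."""
    sb = image[EXT2_SB_OFFSET:EXT2_SB_OFFSET + 1024]
    if len(sb) < 58:
        raise ValueError("image too small to hold an ext2 superblock")
    inodes, blocks = struct.unpack_from("<II", sb, 0)      # s_inodes_count, s_blocks_count
    (log_block_size,) = struct.unpack_from("<I", sb, 24)   # s_log_block_size
    (magic,) = struct.unpack_from("<H", sb, 56)            # s_magic
    if magic != EXT2_MAGIC:
        raise ValueError("not an ext2 filesystem (bad magic)")
    return {"inodes": inodes, "blocks": blocks,
            "block_size": 1024 << log_block_size}


# Synthetic stand-in for a loopback image, so the sketch runs without a real device.
fake = bytearray(2048)
struct.pack_into("<II", fake, EXT2_SB_OFFSET, 128, 1024)
struct.pack_into("<I", fake, EXT2_SB_OFFSET + 24, 0)
struct.pack_into("<H", fake, EXT2_SB_OFFSET + 56, EXT2_MAGIC)
print(read_ext2_superblock(bytes(fake)))
```

Against a real image one would pass the first 2048 bytes read from the file. Everything after the superblock (group descriptors, inode tables, directories) is where the actual work starts.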

Fix bugs, and repeat for ReiserFS.

By this point, you will know how ext2 and Reiser really work. Then
you can start considering how to manage your metadata objects inside a
partly-converted filesystem. Work out how to do that, and implement it
(still in user-space). At that point, you can take it back to LKML,
and say: "here's some code which I think will work". They will
probably tear it to bits, and find loads of holes. This is good -- it
probably means that someone's found it worthwhile enough to help you
with it. At this point, if it works, things may snowball, although
you'll still end up doing most of the work.

Personally, I think you're doomed, and I think you're probably
terminally doomed at the managing-the-atoms-in-the-filesystem bit,
just because by its very nature an atom is going to take up much more
space than the equivalent metadata in any given filesystem.

> >> I've beaten the O((FS_COUNT)^2) already. And by the way, it's
> >> O((FS_COUNT)*(FS_COUNT - 1)). There's exactly O(2*FS_COUNT)
> >> and o(2*FS_COUNT) sets of code needed total to be able to convert
> >> between any two filesystems.
> >
> > There's no such thing as O(x*(x-1)). This is precisely O(x^2).
> >Similarly, O(2*x) is precisely the same as O(x). If you're going to
> >try to use mathematics to demonstrate your point, please at least make
> >sure that you're using it _right_.
> >
>
> Big O notation is inappropriate here because it measures time complexity;
> however, I was following Viro's lead. We're using it to measure code
> complexity, sorry.

Just to put the record straight, computer scientists normally use
O-notation to describe time complexity. However, it's a general
notation for describing functions qualitatively, and could be used in
any context (growth of time, growth of code, growth of data, w.r.t.
some input parameter(s)). I've used O-notation for talking both about
data size and simply about arbitrary functions (when you expand a
polynomial series, you tend to write "... + O(e^3)", or whatever, at
the end of the useful expansion of the series, to indicate that it
keeps going).

Hugo.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 1C335860 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- For months now, we have been making triumphant retreats ---
before a demoralised enemy who is advancing
in utter disorder.



2003-06-29 21:36:46

by John Bradford

Subject: Re: File System conversion -- ideas

> >> Both filesystems are the full size of the partition, and so is the
> >> datasystem. The only difference is that before you start you have
> >> to make sure that the datasystem's gonna fit in with the free space
> >> on the first filesystem, and still have space to start the second
> >> filesystem, and then have space for its atoms.
> >
> >Just thought - that's going to be a problem in read-write mode :-/.
> >
> >If the disk fills up, we'd need to be able to maintain a consistent
> >filesystem structure, (at least good enough so that a separate
> >fsck-like utility could repair it - if the disk filled up, then the
> >conversion couldn't be done on-the-fly).
> >
>
>
> mmm.. hadn't thought of that.
>
> 1 second answer: Lock down some of the freespace. Do NOT let it
> get full. You know how ext2 reserves 5% for the superuser? Do that.
> Reserve enough freespace to keep working and finish the conversion.
> Predict from the beginning how much free space is going to be needed,
> and how much is going to be left over at the very final stages of the
> conversion.

That should work fine in most cases - it's not a problem to reserve
too much for the duration of the conversion, as it all gets freed
afterwards. In most cases, we'd probably only need a relatively small
amount of space to allow writes whilst the conversion is in progress.
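
The reservation idea can be sketched as a small accounting object: ordinary writes may not eat into the reserve, while the converter may. The class and method names are hypothetical, and how much to reserve is taken as a given input here, which is the part the thread leaves open.

```python
class SpaceReservation:
    """Toy block accounting for a conversion running on a read-write filesystem."""

    def __init__(self, total_blocks, used_blocks, reserved_blocks):
        if used_blocks + reserved_blocks > total_blocks:
            raise ValueError("not enough free space to start the conversion")
        self.total = total_blocks
        self.used = used_blocks
        self.reserved = reserved_blocks

    def user_alloc(self, n):
        """Ordinary write: refused if it would cut into the reserve."""
        if self.used + n + self.reserved > self.total:
            return False
        self.used += n
        return True

    def converter_alloc(self, n):
        """Converter write: draws from the reserve only."""
        if n > self.reserved:
            return False
        self.reserved -= n
        self.used += n
        return True

    def finish(self):
        """Conversion done: whatever reserve is left goes back to free space."""
        self.reserved = 0


r = SpaceReservation(total_blocks=100, used_blocks=60, reserved_blocks=30)
print(r.user_alloc(10))       # True: 10 unreserved blocks were free
print(r.user_alloc(1))        # False: only reserved space remains
print(r.converter_alloc(5))   # True: the converter may use the reserve
```

Over-reserving only costs users some space for the duration of the conversion; under-reserving is the failure that matters, since the converter would then stall on a half-converted disk.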

John.

2003-06-29 21:42:03

by rmoser

Subject: Re: File System conversion -- ideas



*********** REPLY SEPARATOR ***********

On 6/29/2003 at 10:37 PM Hugo Mills wrote:

>[Damn, forgot to cc the list]
>
>On Sun, Jun 29, 2003 at 05:00:34PM -0400, rmoser wrote:
>>
>>
>> *********** REPLY SEPARATOR ***********
>>
>> On 6/29/2003 at 9:50 PM Hugo Mills wrote:
>>
>> >On Sun, Jun 29, 2003 at 04:29:45PM -0400, rmoser wrote:
>> >> That would be much harder to maintain as well. It would have to be
>> >altered
>> >> every time the filesystem code in the kernel is changed.
>> >
>> > Yes, but the point is it's a much easier thing to implement and
>> >test the concept than diving straight into kernel code. You don't have
>> >to maintain it for very long (if at all) -- just long enough to prove
>> >to everyone that this kind of conversion is possible, and that they
>> >should help you roll it into the kernel.
>> >
>>
>> I can't code it. I want to, it'd be FUN, but I can't.
>
> This, I think, is your problem. IIRC, you had the same problem last
>time you posted an idea to LKML. It's like novelists -- come up with a
>good idea for a story, and approach a novelist with it: "Hey! I've got
>this great idea for a story!". They will tell you exactly where to
>stick your great idea. Their problem is not a lack of ideas, it's a
>lack of time in which to implement all of their ideas. For example,
>I've got enough ideas right now for bits of code and research that I
>want to write that I could probably work full-time for at least 10
>years to implement just the ones I have _now_. Effectively, what
>you're asking these people to do is to do all of the work, and give
>you some of the credit for having the idea.
>

I don't need credit. I didn't code it :-p

> Start with the easy bits: make a list of _every_ piece of metadata
>that can be stored by an ext2 filesystem. Do the same for ReiserFS.
>Work out how one maps to the other. Write a C/C++ struct to contain
>that metadata. Work out how you're going to store your metadata nodes
>on-disk. Those are the easy bits.
>

Nerg. Heh that's gonna be hard to find. Need to get a book on filesystems.

> Then it gets harder: see if you can get the documentation for the
>on-disk format of ext2. I don't believe that it's _stunningly_
>complicated, at least in its basics. If ext2 turns out to be too
>complicated, try FAT32 (or even FAT16!). Write a piece of (userspace)
>code which can read that format (or take it from either the kernel or
>e2fsck) and which converts to your own format. Write another piece of
>code which converts back (to ext2). Try it on a small ext2 loopback
>device -- copy all the data on the "device" to a file somewhere else,
>and then try to create another ext2 FS from your metadata.
>

I can likely do it with 2 datasystems of my own design but nobody would
give a rats ass.

> Fix bugs, and repeat for ReiserFS.
>
> By this point, you will know how ext2 and Reiser really work. Then
>you can start considering how to manage your metadata objects inside a
>partly-converted filesystem. Work out how to do that, and implement it

What? I'd rather structure the datasystem to handle it right off the bat.
(I'm expecting to get flamed for this statement lol)

>(still in user-space). At that point, you can take it back to LKML,
>and say: "here's some code which I think will work". They will
>probably tear it to bits, and find loads of holes. This is good -- it
>probably means that someone's found it worthwhile enough to help you
>with it. At this point, if it works, things may snowball, although
>you'll still end up doing most of the work.
>

I'll fry the kernel if I put it in there.

> Personally, I think you're doomed, and I think you're probably
>terminally doomed at the managing-the-atoms-in-the-filesystem bit,
>just because by its very nature an atom is going to take up much more
>space than the equivalent metadata in any given filesystem.
>

Like I said, it won't work 100% of the time. But of course Linux doesn't
boot 100% of the time; sometimes some fool tries to run it on a system
with 64k of RAM, and it just requires a bit more than that.

>> >> I've beaten the O((FS_COUNT)^2) already. And by the way, it's
>> >> O((FS_COUNT)*(FS_COUNT - 1)). There's exactly O(2*FS_COUNT)
>> >> and o(2*FS_COUNT) sets of code needed total to be able to convert
>> >> between any two filesystems.
>> >
>> > There's no such thing as O(x*(x-1)). This is precisely O(x^2).
>> >Similarly, O(2*x) is precisely the same as O(x). If you're going to
>> >try to use mathematics to demonstrate your point, please at least make
>> >sure that you're using it _right_.
>> >
>>
>> Big O notation is inappropriate here because it measures time complexity;
>> however, I was following Viro's lead. We're using it to measure code
>> complexity, sorry.
>
> Just to put the record straight, computer scientists normally use
>O-notation to describe time complexity. However, it's a general
>notation for describing functions qualitatively, and could be used in
>any context (growth of time, growth of code, growth of data, w.r.t.
>some input parameter(s)). I've used O-notation for talking both about
>data size and simply about arbitrary functions (when you expand a
>polynomial series, you tend to write "... + O(e^3)", or whatever, at
>the end of the useful expansion of the series, to indicate that it
>keeps going).
>
> Hugo.
>
>--
>=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
> PGP key: 1C335860 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
> --- For months now, we have been making triumphant retreats ---
> before a demoralised enemy who is advancing
> in utter disorder.
>

--Bluefox Icy

2003-06-29 22:11:36

by Hugo Mills

Subject: Re: File System conversion -- ideas

On Sun, Jun 29, 2003 at 05:54:58PM -0400, rmoser wrote:
> On 6/29/2003 at 10:37 PM Hugo Mills wrote:
> > Start with the easy bits: make a list of _every_ piece of metadata
> >that can be stored by an ext2 filesystem. Do the same for ReiserFS.
> >Work out how one maps to the other. Write a C/C++ struct to contain
> >that metadata. Work out how you're going to store your metadata nodes
> >on-disk. Those are the easy bits.
>
> Nerg. Heh that's gonna be hard to find. Need to get a book on filesystems.

Hardly difficult. I found what appears to be a pretty complete
35-page document on the structure of the FAT filesystem in about 10
seconds with Google. ext2 was a little trickier to find, but eventually
I got to the sourceforge project for e2fsprogs.

[snip]
> > Fix bugs, and repeat for ReiserFS.
> >
> > By this point, you will know how ext2 and Reiser really work. Then
> >you can start considering how to manage your metadata objects inside a
> >partly-converted filesystem. Work out how to do that, and implement it
>
> What? I'd rather structure the datasystem to handle it right off the bat.
> (I'm expecting to get flamed for this statement lol)

And quite rightly. Take small steps, moving towards your ultimate
goal. Don't try to get a perfect system working immediately -- you
_will_ fail (unless you're some previously unsung übercoder, in which
case you'll merely find it insanely difficult :) ).

Hugo.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 1C335860 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- For months now, we have been making triumphant retreats ---
before a demoralised enemy who is advancing
in utter disorder.



2003-06-29 23:50:32

by Richard Braakman

Subject: Re: File System conversion -- ideas

On Sun, Jun 29, 2003 at 04:36:15PM -0400, rmoser wrote:
> Told you, I can't code it. I could work on making an initial design for the
> most important part though, the datasystem that separates the two filesystems
> and holds the meta-data and data in self-contained atoms. I KNOW I won't
> get it right the first time, but I can give you a place to start.

I don't think that's the most important part. The most important part
is figuring out a layout for the filesystem while it's in transition,
such that it is at the same time a valid ext3 filesystem (so that the
ext3 export routines can work on it) and a valid reiser4 filesystem
(so that the reiser4 import routines can work on it). And you need
to do it in such a way that the import routines won't stomp on data
that hasn't been exported yet.

If you don't have that, then there's no point in putting it in the
kernel because you won't be able to re-use the kernel fs code anyway.

Then you need to generalize this to work with any pair of filesystems.

As for the datasystem to hold the metadata: I expect you'll find that
backup/restore systems already implement this. It's what they have to
do, after all.

Richard Braakman

2003-06-30 00:11:17

by Jan Harkes

[permalink] [raw]
Subject: Re: File System conversion -- ideas

On Sun, Jun 29, 2003 at 04:29:45PM -0400, rmoser wrote:
> NO! You're not getting the point at all!
>
> You don't need a pair! If you have 10 filesystems, you need 10 sets of
> code in each direction, not 90. You convert from the data/metadata set
> in the first filesystem to a self-contained atom, and then back from the
> atom to the data/metadata set in the new filesystem. The atom is object
> oriented, so anything that can't be moved over--like ACLs or Reiser4's
> extended attributes that nobody else has, or permissions if converting to
> vfat--is just lost. Note that if the data has an attribute like "Compressed"
> or "encrypted", it is expanded/decrypted and thus brought back to its
> natural form before being stuffed into an atom.

I typically call that 'tar' and it works great whenever I want to
convert from one filesystem to another. I just haven't got a clue why
you want to implement tar (or cpio) in the kernel as the userspace
implementation is already pretty usable.

Jan

2003-06-30 00:44:39

by rmoser

[permalink] [raw]
Subject: Re: File System conversion -- ideas



*********** REPLY SEPARATOR ***********

On 6/30/2003 at 3:05 AM Richard Braakman wrote:

>On Sun, Jun 29, 2003 at 04:36:15PM -0400, rmoser wrote:
>> Told you, I can't code it. I could work on making an initial design for
>the
>> most important part though, the datasystem that separates the two
>filesystems
>> and holds the meta-data and data in self-contained atoms. I KNOW I won't
>> get it right the first time, but I can give you a place to start.
>
>I don't think that's the most important part. The most important part
>is figuring out a layout for the filesystem while it's in transition,
>such that it is at the same time a valid ext3 filesystem (so that the
>ext3 export routines can work on it) and a valid reiser4 filesystem
>(so that the reiser4 import routines can work on it). And you need
>to do it in such a way that the import routines won't stomp on data
>that hasn't been exported yet.
>

Nerg. Filesystem isn't exactly readable while it's being disassembled.
Well, I'd leave everything in place for inodes that are there, but entries in
the inode table are slowly being removed. The idea is that you keep valid
what you still need to get to, and the hell with what you're done with. For
example, once you've drained all data out of the superblock and categorized
it so that you can pass it to functions to deal with the original fs (OFS), you can
destroy the superblock on-disk, and just make logical copies of it in RAM
whenever it needs to be referenced (just like buffering: before it's flushed to
disk, the buffer is read for the changed data).

By the same token, once you grab all data and meta-data for an inode and
place it into an atom, you're done with that inode and its data. You won't
be reading it again and you will NOT be writing to the OFS again. Poof, freed.
That space belongs to the converter datasystem.

Note that the inode table is now invalid. Still, you can mark down what has
already been taken apart, and point out where to start. The OFS driver will of
course have given you a chunk of all the important information from the
superblock and whatever else it needs to locate any given inode by index,
which it'll be querying when you tell it to get the NEXT inode. Of course you
extract that inode, and you have exactly what you need to locate the data
belonging to it and pack it into another atom.

By the same token, you can probably start off with a valid filesystem for
your TFS (Target FS), by spewing out a superblock and a fresh, empty
Inode table that you fill in slowly.

Now I'm working on this upstairs, and I've run into the directory issue.
You know, the inode numbers change as you go along. There's a way
to deal with that too. Three ways actually.

-- Hold directories off until last. As you rewrite inodes, you scan the
directory atoms and find the pointers to the original inode numbers.
Then change them to whatever the hell they changed to.

-- Inform the TFS of the original inode index for each inode as you send
it, and let it rewrite the directories

-- Make sure inodes match up

The last one is stupid. It may not be possible, or may just not work.
It requires copying deleted inodes too.

The second one is an admirable attempt, but still stupid: You have to
do a whole scan every time an inode is written to the TFS.

That first one is great. It puts all directories at the end of the TFS, true,
but they're all together, interestingly enough. Also, it lets the CDS
(Conversion DataSystem) control how these are indexed, which allows
universal optimization by rewriting the CDS' code.
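
The first option (hold directories until last) can be sketched roughly as
follows. This is only an illustration in Python, not kernel code: the atom
layout, the field names and the function convert_deferring_directories are
all hypothetical. Non-directory atoms are written first and given new inode
numbers, the old-to-new mapping is remembered, and directory entries are
translated in one pass at the end instead of rescanning on every write.

```python
def convert_deferring_directories(old_inodes, first_new_ino=1):
    """Renumber inodes, rewriting directory entries only at the end.

    old_inodes maps an old inode number to a hypothetical atom:
      {"type": "file", "data": ...} or
      {"type": "dir", "entries": {name: old_inode_number}}
    Returns (new_inodes, remap), where remap is old number -> new number.
    """
    remap = {}
    new_inodes = {}
    next_ino = first_new_ino

    # Pass 1: write every non-directory atom and remember its new number.
    for old_ino, atom in old_inodes.items():
        if atom["type"] != "dir":
            remap[old_ino] = next_ino
            new_inodes[next_ino] = dict(atom)
            next_ino += 1

    # Pass 2: number the directories, so that entries pointing at other
    # directories can be translated as well.
    directories = [ino for ino, atom in old_inodes.items()
                   if atom["type"] == "dir"]
    for old_ino in directories:
        remap[old_ino] = next_ino
        next_ino += 1

    # Pass 3: write the directories with their entries translated.
    for old_ino in directories:
        entries = old_inodes[old_ino]["entries"]
        new_inodes[remap[old_ino]] = {
            "type": "dir",
            "entries": {name: remap[target]
                        for name, target in entries.items()},
        }
    return new_inodes, remap


# Hypothetical source filesystem: root (2) holds a file and a subdirectory.
old = {
    2: {"type": "dir", "entries": {"etc": 11, "README": 12}},
    11: {"type": "dir", "entries": {"fstab": 40}},
    12: {"type": "file", "data": b"hello"},
    40: {"type": "file", "data": b"/dev/hda1 / reiserfs"},
}
new, remap = convert_deferring_directories(old)
```

In this toy run the directories end up with the highest inode numbers,
which is the "all directories at the end of the TFS" effect mentioned above.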

>If you don't have that, then there's no point in putting it in the
>kernel because you won't be able to re-use the kernel fs code anyway.
>

Some of it has to be rewritten.

>Then you need to generalize this to work with any pair of filesystems.
>

Done.

>As for the datasystem to hold the metadata: I expect you'll find that
>backup/restore systems already implement this. It's what they have to
>do, after all.
>

NOT in place they don't!

>Richard Braakman
>-
>To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>the body of a message to [email protected]
>More majordomo info at http://vger.kernel.org/majordomo-info.html
>Please read the FAQ at http://www.tux.org/lkml/



2003-06-30 00:48:05

by rmoser

[permalink] [raw]
Subject: Re: File System conversion -- ideas



*********** REPLY SEPARATOR ***********

On 6/29/2003 at 8:25 PM Jan Harkes wrote:

>On Sun, Jun 29, 2003 at 04:29:45PM -0400, rmoser wrote:
>> NO! You're not getting the point at all!
>>
>> You don't need a pair! If you have 10 filesystems, you need 10 sets of
>> code in each direction, not 90. You convert from the data/metadata set
>> in the first filesystem to a self-contained atom, and then back from the
>> atom to the data/metadata set in the new filesystem. The atom is object
>> oriented, so anything that can't be moved over--like ACLs or Reiser4's
>> extended attributes that nobody else has, or permissions if converting to
>> vfat--is just lost. Note that if the data has an attribute like
>"Compressed"
>> or "encrypted", it is expanded/decrypted and thus brought back to its
>> natural form before being stuffed into an atom.
>
>I typically call that 'tar' and it works great whenever I want to
>convert from one filesystem to another. I just haven't got a clue why
>you want to implement tar (or cpio) in the kernel as the userspace
>implementation is already pretty usable.
>

tar --inplace --fs-convert --targetfs=reiserfs /dev/hda1

....... it doesn't like it

>Jan
>



2003-06-30 03:38:57

by Horst H. von Brand

[permalink] [raw]
Subject: Re: File System conversion -- ideas

rmoser <[email protected]> said:
> "Leonard Milcin Jr." <[email protected]> said:

> >Ok, I forgot about enterprise users with lots of data, and probably
> >lacking free space, so I missed a point.

> Yeppers. Also that the eventual goal (at least in my mind) is to allow
> this to be done on a running r/w filesystem safely, which isn't as tough
> a problem as it sounds.

It is a lot of in-kernel complexity, for a one-shot job once in a blue moon
(or even once per machine, if that much). If it can be done easily (like
ext2 --> ext3), by all means go ahead! If there is the slightest hint of
complexity, forget it. Not worth the kernel code, plus it won't ever be
debugged past "nice toy" stage for people who really care about their data.

"Enterprise users" have backups, and are more than willing to just get a
new disk/machine for migrating data. Data is normally _much_ more valuable
than the media it sits on.

Can we now please drop this silliness? Either show the astonished world
by coding it up and debugging it that it _is_ doable, or shut up.
--
Dr. Horst H. von Brand User #22616 counter.li.org
Departamento de Informatica Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria +56 32 654239
Casilla 110-V, Valparaiso, Chile Fax: +56 32 797513

2003-06-30 08:32:44

by John Bradford

[permalink] [raw]
Subject: Re: File System conversion -- ideas

> >I typically call that 'tar' and it works great whenever I want to
> >convert from one filesystem to another. I just haven't got a clue why
> >you want to implement tar (or cpio) in the kernel as the userspace
> >implementation is already pretty usable.
> >
>
> tar --inplace --fs-convert --targetfs=reiserfs /dev/hda1
>
> ....... it doesn't like it

tar -cf - -C /old_filesystem . | tar -xf - -C /newfilesystem

Works fine, and copies symbolic links, and device files properly. If
you don't want sparse files expanded, you can use --sparse.

Yes, it needs both old and new filesystems on-line at once. That
isn't a problem for a lot of users.

It has the advantage over an on-line conversion utility that the files
are laid out in the way they were intended to be by the filesystem,
for performance, and anti-fragmentation reasons.

There are probably a few smaller ISPs, with customer webservers which
are not guaranteed to be backed up, who would like to be able to
switch to a more modern filesystem at some point in the future,
without downtime. Union mounts would potentially be useful here - the
old data can be kept on whatever filesystem it's on, and a new
filesystem union mounted over it. If a file is updated, it's
re-written to the new filesystem. Data that's changed would migrate
to the new filesystem. Once most of it is across, you could touch all
of the remaining files, and force them across. Your webserver
performance shouldn't be impacted during all of this, (and might even
improve, if write performance is much better on the new filesystem).
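
The migration behaviour described above can be modelled with a toy sketch in
Python. It is not how any real union mount is implemented; the class
UnionMount and its methods are hypothetical names, and the only assumption is
the one stated in the paragraph: reads fall through to the old filesystem
unless the file has been rewritten, and every write lands on the new one.

```python
class UnionMount:
    """Toy model: a new filesystem layered over an old, read-only one."""

    def __init__(self, old_fs):
        self.old = dict(old_fs)   # path -> contents on the old filesystem
        self.new = {}             # path -> contents on the new filesystem

    def read(self, path):
        # The upper (new) layer wins; otherwise fall through to the old one.
        if path in self.new:
            return self.new[path]
        return self.old[path]

    def write(self, path, data):
        # Any update lands on the new filesystem; the old copy is obsolete.
        self.new[path] = data
        self.old.pop(path, None)

    def touch_remaining(self):
        # Force the files that were never updated across to the new layer.
        for path in list(self.old):
            self.write(path, self.old[path])

    def migrated(self):
        return not self.old


site = UnionMount({"/index.html": "v1", "/logo.png": "png-bytes"})
site.write("/index.html", "v2")   # an updated file migrates on its own
before = site.migrated()          # logo.png is still on the old filesystem
site.touch_remaining()            # "touch all of the remaining files"
after = site.migrated()
```

Once migrated() reports true, nothing is served from the old filesystem any
more and it could be retired.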

For desktop users, small amounts of downtime usually don't matter,
filesystem performance isn't usually critical either, and if the data
isn't backed up anywhere, data integrity _is_ important, so I would
suggest that they either stick with their existing filesystem, or
backup and restore.

A conversion utility could save the time of the restore, but if it
leaves the user with a badly fragmented or poorly laid out
filesystem, it could well be counter-productive.

So, assuming that the main real world use would be small, but busy
servers which want better performance for new data, and old data
gradually migrated across, but with minimum performance impact, union
mounts would be a way to achieve this.

Union mounts would be a lot easier to implement, and a lot more
useful than a conversion utility. Note that BSD already has union
mounts.

IFF a conversion utility could be written that:

* Works on read-write mounted filesystems
* Doesn't produce a poorly laid out filesystem

Then _maybe_ it would be useful.

Personally, if I was interested in implementing this, (which I'm not),
I wouldn't worry about data integrity at all times - (if it failed for
some reason, it would require a restore of the backup which the user
was advised to make anyway), but create the framework of the new
filesystem image in memory, with references to the location of data in
the old, (real), filesystem, having moved all the data to the end of
the existing filesystem. Once complete, I'd unmount the existing FS,
and overwrite it with the in-memory filesystem, making sure to read
anything I needed from the old, (unmounted), filesystem image before
overwriting it.

I.E. to convert a filesystem with three files, (A, B, and C), and some
free space, (F). The block numbers are underneath.

Old filesystem with 4K block size
----------------------------
AAAAABCCCBBCCBBCCCCCCCFFFFFF
1234567891111111111222222222
         0123456789012345678

Move data to the end.

FFFFFFCCCBBCCBBCCCCCCCAAAAAB

Desired new filesystem with 8K block size
----------------------------
AAAAAaFFBBBBBbFFCCCCCCCCCCCC
1 2 3 4 5 6 7 8 9 1 1 1 1 1
                  0 1 2 3 4

(the lower case letters represent the extra space taken up by the
larger block size).

So, we would create a table in memory:

New FS    Old FS
1         23,24
2         25,26
3         27
5         28,10
6         11,14
7         15
9         7,8
10        9,12
11        13,16
12        17,18
13        19,20
14        21,22

This table maps the in-memory new filesystem to blocks in the old
filesystem. Now, we could unmount the old filesystem, and mount the
new virtual filesystem in its place, then start writing the new
filesystem to disk:

Read data from old blocks 23 and 24, and write it to new block 1, overwriting old blocks 1 and 2,
Read data from old blocks 25 and 26, and write it to new block 2, overwriting old blocks 3 and 4,
Read data from old block 27, and write it to new block 3, overwriting old blocks 5 and 6,
Store data from old blocks 7 and 8, (for new block 9), in RAM,
Erase new block 4, overwriting old blocks 7 and 8,
Store data from old block 9, (for new block 10), in RAM,
Read data from old blocks 28 and 10, and write it to new block 5, overwriting old blocks 9 and 10,
Store data from old block 12, (for new block 10), in RAM,
Read data from old blocks 11 and 14, and write it to new block 6, overwriting old blocks 11 and 12,
Store data from old block 13, (for new block 11), in RAM,
Read data from old block 15, and write it to new block 7, overwriting old blocks 13 and 14,
Store data from old block 16, (for new block 11), in RAM,
Erase new block 8, overwriting old blocks 15 and 16,
Store data from old blocks 17 and 18, (for new block 12), in RAM,
Write data from RAM in to new block 9, overwriting old blocks 17 and 18, and free that RAM.

...etc...

At the end, you'd have the new filesystem on disk, and there would be
a direct mapping between the virtual RAM-based filesystem and the disk
blocks. At that point, you could umount the virtual filesystem, and
mount the disk based one.
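
The scheme above can be checked with a small simulation. This is a rough
sketch in Python under stated assumptions, not a real converter: the function
names build_block_map and convert_in_place are made up, the extents are the
per-file old block lists from the example (after the move to the end), and
the placement of each file in the new filesystem is simply taken from the
"desired new filesystem" diagram. New blocks are written in ascending order,
and any old block that is about to be overwritten while still needed later is
first staged in RAM.

```python
def build_block_map(extents, placement, ratio=2):
    """Map each new (8K) block to the old (4K) blocks that fill it."""
    block_map = {}
    for name, old_blocks in extents.items():
        new_block = placement[name]
        for i in range(0, len(old_blocks), ratio):
            block_map[new_block] = old_blocks[i:i + ratio]
            new_block += 1
    return block_map


def convert_in_place(old_disk, block_map, ratio=2):
    """Write new blocks in order; return (new_disk, peak blocks held in RAM)."""
    disk = dict(old_disk)  # work on a copy of the old block contents
    # Old blocks whose data has not been written to its new home yet.
    pending = {old for olds in block_map.values() for old in olds}
    ram, new_disk, peak_ram = {}, {}, 0
    for new_block in range(1, len(disk) // ratio + 1):
        sources = block_map.get(new_block, [])
        # Take the data from RAM if it was staged earlier, else from disk.
        payload = [ram.pop(old) if old in ram else disk[old]
                   for old in sources]
        pending.difference_update(sources)
        # This new block covers `ratio` old blocks; save what is still needed.
        first_old = (new_block - 1) * ratio + 1
        for old in range(first_old, first_old + ratio):
            if old in pending:
                ram[old] = disk[old]
            disk[old] = None  # overwritten by the new block
        peak_ram = max(peak_ram, len(ram))
        new_disk[new_block] = payload
    return new_disk, peak_ram


# Old block numbers per file, in file order, after moving A and B's first
# block to the end of the disk (taken from the example above).
extents = {
    "A": [23, 24, 25, 26, 27],
    "B": [28, 10, 11, 14, 15],
    "C": [7, 8, 9, 12, 13, 16, 17, 18, 19, 20, 21, 22],
}
placement = {"A": 1, "B": 5, "C": 9}  # first new block of each file

old_disk = {n: None for n in range(1, 29)}
for name, blocks in extents.items():
    for index, old in enumerate(blocks):
        old_disk[old] = f"{name}{index}"  # label: file name + logical block

block_map = build_block_map(extents, placement)
new_disk, peak_ram = convert_in_place(old_disk, block_map)
```

For this particular layout block_map reproduces the table above, and the
simulation never holds more than six 4K blocks in RAM at once. A layout with
more data sitting in front of its destination would need correspondingly
more memory.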

This would be do-able on a read-write filesystem, because writes would
go only to the new RAM-based virtual filesystem - the original
filesystem would be mounted read-only before the conversion process
started.

So, it's interesting and possible in theory, but is it practical or
worth implementing? I don't think so. If somebody is interested in
implementing it I'd be pleased to see it in the kernel, but it's not a
project I'd have any real interest in myself.

John.

2003-06-30 08:58:56

by Nikita Danilov

[permalink] [raw]
Subject: Re: File System conversion -- ideas

[email protected] writes:
> On Sun, Jun 29, 2003 at 01:19:24PM -0700, Davide Libenzi wrote:
>
> > > AFAICS, it is _very_ hard to implement. Even outside of the kernel.
> > > If you can get it done - well, that might do a lot for having the
> > > idea considered seriously. "Might" since you need to do it in a way
> > > that would survive transplantation into the kernel _and_ would scale
> > > better than O((number of filesystem types)^2).
> >
> > Maybe defining a "neutral" metadata export/import might help in limiting
> > such NFS^2 ...
>
> Go for it - do it in userland, define the mapping between various sorts
> of metadata and let's see how well you can make it work. Have fun.

Some people are actually doing this:

http://tzukanov.narod.ru/convertfs/

> -

Nikita.

2003-06-30 09:22:23

by Hans Reiser

[permalink] [raw]
Subject: Re: File System conversion -- ideas

I tend to agree with the below. I just want to add though that there
are a lot of users who have one disk drive and no decent network
connection to somewhere with a lot of storage. It would be nice to
adapt tar to understand about the reiser4 resizer and mkreiser4 and the
reiser3 resizer, and the partitioner (yah, at this point it would no
longer really be tar, but.... ), and to have it shrink the V3 partition,
create a reiser4 partition, copy some of the V3 partition to the V4
partition, shrink the V3 partition some more, etc.....

Money will get us to do this. Otherwise we will work on what we are
contracted to do for DARPA.

Hans

John Bradford wrote:

>>>I typically call that 'tar' and it works great whenever I want to
>>>convert from one filesystem to another. I just haven't got a clue why
>>>you want to implement tar (or cpio) in the kernel as the userspace
>>>implementation is already pretty usable.
>>>
>>>
>>>
>>tar --inplace --fs-convert --targetfs=reiserfs /dev/hda1
>>
>>....... it doesn't like it
>>
>>
>
>tar -cf - -C /old_filesystem . | tar -xf - -C /newfilesystem
>
>Works fine, and copies symbolic links, and device files properly. If
>you don't want sparse files expanded, you can use --sparse.
>
>Yes, it needs both old and new filesystems on-line at once. That
>isn't a problem for a lot of users.
>
>It has the advantage over an on-line conversion utility that the files
>are laid out in the way they were intended to be by the filesystem,
>for performance, and anti-fragmentation reasons.
>
>


2003-06-30 12:52:04

by Jesse Pollard

[permalink] [raw]
Subject: Re: File System conversion -- ideas

On Sunday 29 June 2003 01:57, rmoser wrote:
> I know I spout a ... wtf? HTML composing? *attempts to eliminate*
>
> I know I spout a lot of crap, and wish I could just do it all (can we get
> a "Make a small device driver for virtual hardware in Linux 2.4 and 2.5"
> tutorial up on kernel.org?!), but I think I've got some good ideas. At
> any rate, the good is kept and the bad is weeded out, right?
>
> Anyhow, I'm thinking still about when reiser4 comes out. I want to
> convert to it from reiser3.6. It came to my attention that a user-space
> tool to convert between filesystems is NOT the best way to deal with
> this. Seriously, you'd think it would be, right? Wrong, IMHO.
>
> You have the filesystem code for every filesystem Linux supports. It's
> there, in the kernel. So why maintain a kludgy userspace tool that has
> to be rewritten to understand them all? I have a better idea.
>
> How about a kernel syscall? It's possible to do this on a running
> filesystem but it's far too difficult for a start, so let's start with
> unmounted filesystems mmkay?
>
> **** BEGIN WELL STRUCTURED MESSAGE ****
>
> I'm going to go over a method of building into the kernel a filesystem
> conversion suite. I am first going to go over a brief overrun of the
> concept, then I will draw up a roadmap, and then I will explain why I
> believe this is the best way to solve this problem.
[snip]

What's wrong with:

mount old filesystem
mkfs newfilesystem on different disk
mount new filesystem
cd old filesystem
tar -cpf - . | (cd new_filesystem; tar -xpf -)

Which is what I do.

If I MUST do something more in-place replacement....

1. backup to tape
2. backup to tape (never hurts)
3. verify tape
4. umount old_filesystem
5. mkfs new_filesystem (same disk)
6. mount new_filesystem
7. restore from tape

A lot longer, but there is no "kludgy userspace tool that has to be
rewritten to understand them all".

2003-06-30 13:13:00

by Jesse Pollard

[permalink] [raw]
Subject: Re: File System conversion -- ideas

On Sunday 29 June 2003 14:45, rmoser wrote:
> *********** REPLY SEPARATOR ***********
>
> On 6/29/2003 at 8:42 PM [email protected] wrote:
> >On Sun, Jun 29, 2003 at 08:28:47PM +0100, Jamie Lokier wrote:
> >> Consider that many people choose ext3 rather than reiser simply
> >> because it is easy to convert ext2 to ext3, and hard to convert ext2
> >> to reiser (and hard to convert back if they don't like it). I have
> >> seen this written by many people who choose to use ext3. Thus proving
> >> that there is value in in-place filesystem conversion :)
> >
> >Uh-huh. You want to get in-kernel conversion between ext* and reiserfs?
> >With recoverable state if aborted? Get real.
>
> no, in-kernel conversion between everything. You don't think it can be
> done? It's not that difficult a problem to manage data like that :D

You are ASSUMING that the new filesystem requires a less than or equal amount
of metadata. This is NOT always true. A conversion of a full EXT2 to ReiserFS
would fail simply because there is no free space to expand the needed
additional overhead.

Going in the other direction usually is possible (again, depending on the
filesystem) but there are exceptions... Try converting an EXT2 to DosFS.
In place. And maintain a recoverable state when aborted.

Not gonna happen.

Too much depends on what the target filesystem is, and what it may require.

Consider another - switching to an extent filesystem... If the datablocks
don't move, then you need MORE extents than the current indirect pointers.
And each extent is LARGER than the indirect pointers.

Then you have to compress/condense the extents (requiring shuffling data
blocks around to reduce the number of extents). Each requires free space
to do its work, and the amount of free blocks is not the same.

Faster to do a copy. More reliable too, and recoverable.

2003-06-30 13:28:08

by Hans Reiser

[permalink] [raw]
Subject: Re: File System conversion -- ideas

Jesse Pollard wrote:

>On Sunday 29 June 2003 14:45, rmoser wrote:
>
>
>>*********** REPLY SEPARATOR ***********
>>
>>On 6/29/2003 at 8:42 PM [email protected] wrote:
>>
>>
>>>On Sun, Jun 29, 2003 at 08:28:47PM +0100, Jamie Lokier wrote:
>>>
>>>
>>>>Consider that many people choose ext3 rather than reiser simply
>>>>because it is easy to convert ext2 to ext3, and hard to convert ext2
>>>>to reiser (and hard to convert back if they don't like it). I have
>>>>seen this written by many people who choose to use ext3. Thus proving
>>>>that there is value in in-place filesystem conversion :)
>>>>
>>>>
>>>Uh-huh. You want to get in-kernel conversion between ext* and reiserfs?
>>>With recoverable state if aborted? Get real.
>>>
>>>
>>no, in-kernel conversion between everything. You don't think it can be
>>done? It's not that difficult a problem to manage data like that :D
>>
>>
>
>You are ASSUMING that the new filesystem requires a less than or equal amount
>of metadata. This is NOT always true. A conversion of a full EXT2 to ReiserFS
>would fail simply because there is no free space to expand the needed
>additional overhead.
>
Uh, you mean converting reiserfs to ext2 would fail.... we are more
space efficient....

>
>Going in the other direction usually is possible (again, depending on the
>filesystem) but there are exceptions... Try converting an EXT2 to DosFS.
>In place. And maintain a recoverable state when aborted.
>
>Not gonna happen.
>
>Too much depends on what the target filesystem is, and what it may require.
>
>Consider another - switching to an extent filesystem... If the datablocks
>don't move, then you need MORE extents than the current indirect pointers.
>And each extent is LARGER than the indirect pointers.
>
>Then you have to compress/condense the extents (requiring shuffling data
>blocks around to reduce the number of extents). Each requires free space
>to do its work, and the amount of free blocks is not the same.
>
>Faster to do a copy. More reliable too, and recoverable.


--
Hans


2003-06-30 13:42:29

by Jesse Pollard

[permalink] [raw]
Subject: Re: File System conversion -- ideas

On Monday 30 June 2003 08:42, Hans Reiser wrote:
> Jesse Pollard wrote:
[snip]
> >>no, in-kernel conversion between everything. You don't think it can be
> >>done? It's not that difficult a problem to manage data like that :D
> >
> >You are ASSUMING that the new filesystem requires a less than or equal amount
> >of metadata. This is NOT always true. A conversion of a full EXT2 to
> > ReiserFS would fail simply because there is no free space to expand the
> > needed additional overhead.
>
> Uh, you mean converting reiserfs to ext2 would fail.... we are more
> space efficient....

Yes. Stupid me got the order backward.

2003-06-30 13:48:34

by John Bradford

[permalink] [raw]
Subject: Re: File System conversion -- ideas

> I tend to agree with the below. I just want to add though that there
> are a lot of users who have one disk drive and no decent network
> connection to somewhere with a lot of storage. It would be nice to
> adapt tar to understand about the reiser4 resizer and mkreiser4 and the
> reiser3 resizer, and the partitioner (yah, at this point it would no
> longer really be tar, but.... ), and to have it shrink the V3 partition,
> create a reiser4 partition, copy some of the V3 partition to the V4
> partition, shrink the V3 partition some more, etc.....

Out of interest, won't the resulting filesystem be excessively
fragmented, and cause worse performance than a virgin filesystem, or
does the reiser resizer actively prevent that?

John.

2003-06-30 15:29:30

by Leonard Milcin Jr.

[permalink] [raw]
Subject: Re: File System conversion -- ideas

John Bradford wrote:
> Out of interest, won't the resulting filesystem be excessively
> fragmented, and cause worse performance than a virgin filesystem, or
> does the reiser resizer actively prevent that?

This is not a big problem. Firstly, we have to get a working filesystem,
whether it will be fragmented or not. Then the filesystem can be
defragmented.

--
"Unix IS user friendly... It's just selective about who its friends are."
-- Tollef Fog Heen

Subject: Re: File System conversion -- ideas

"David D. Hagood" <[email protected]> writes:

>For example, suppose you have a 60G disk, 55G of data, in ext2, and you
>wish to convert to ReiserFS.

>Step 1: Shrink the volume to 55G. This requires a "shrink disk" utility
>for the source file system (which exists for the major file systems in
>use today).

You have a 6 GB file. You lose. :-)

Regards
Henning

--
Dipl.-Inf. (Univ.) Henning P. Schmiedehausen INTERMETA GmbH
[email protected] +49 9131 50 654 0 http://www.intermeta.de/

Java, perl, Solaris, Linux, xSP Consulting, Web Services
freelance consultant -- Jakarta Turbine Development -- hero for hire

--- Quote of the week: "It is pointless to tell people anything when
you know that they won't process the message." --- Jonathan Revusky

2003-06-30 16:15:32

by Al Viro

[permalink] [raw]
Subject: Re: File System conversion -- ideas

On Mon, Jun 30, 2003 at 01:36:41PM +0400, Hans Reiser wrote:
> I tend to agree with the below. I just want to add though that there
> are a lot of users who have one disk drive and no decent network
> connection to somewhere with a lot of storage. It would be nice to
> adapt tar to understand about the reiser4 resizer and mkreiser4 and the
> reiser3 resizer, and the partitioner (yah, at this point it would no
> longer really be tar, but.... ), and to have it shrink the V3 partition,
> create a reiser4 partition, copy some of the V3 partition to the V4
> partition, shrink the V3 partition some more, etc.....
>
> Money will get us to do this. Otherwise we will work on what we are
> contracted to do for DARPA.

*Ugh*. If one really wants reiserfs v3 -> v4 conversion, presumably
there are much more intelligent ways to do that. For one thing,
you really don't want to create an empty tree and move the stuff
from original node-by-node - that would give a shitload of IO on
tree rebalancing alone, not to mention the PITA it will be for allocator
(you get to reshuffle trees a lot on potentially almost full fs).

2003-06-30 16:43:09

by Leonard Milcin Jr.

[permalink] [raw]
Subject: Re: File System conversion -- ideas

Henning P. Schmiedehausen wrote:
> "David D. Hagood" <[email protected]> writes:
>
>
>>For example, suppose you have a 60G disk, 55G of data, in ext2, and you
>>wish to convert to ReiserFS.
>
>
>>Step 1: Shrink the volume to 55G. This requires a "shrink disk" utility
>>for the source file system (which exists for the major file systems in
>>use today).
>
>
> You have a 6 GB file. You lose. :-)
>
> Regards
> Henning
>

Hey folks! I don't use LVM, but I think it allows a file to be split
between different filesystems. Yes?


--
"Unix IS user friendly... It's just selective about who its friends are."
-- Tollef Fog Heen

2003-06-30 16:54:56

by Kevin Corry

[permalink] [raw]
Subject: Re: File System conversion -- ideas

On Monday 30 June 2003 11:59, Leonard Milcin Jr. wrote:
> Henning P. Schmiedehausen wrote:
> > "David D. Hagood" <[email protected]> writes:
> >>For example, suppose you have a 60G disk, 55G of data, in ext2, and you
> >>wish to convert to ReiserFS.
> >>
> >>
> >>Step 1: Shrink the volume to 55G. This requires a "shrink disk" utility
> >>for the source file system (which exists for the major file systems in
> >>use today).
> >
> > You have a 6 GB file. You lose. :-)
> >
> > Regards
> > Henning
>
> Hey folks! I don't use LVM, but I think it allows a file to be split
> between different filesystems. Yes?

Um, no. Volume managers allow you to span a volume across multiple disks. But
a filesystem (and thus all of its files) is still fully contained within a
single volume. IOW, volume management is a method for managing block-devices.
Filesystems are a method for managing files. There is a distinct line between
them.

--
Kevin Corry
[email protected]
http://evms.sourceforge.net/

2003-06-30 17:23:35

by Valdis Klētnieks

[permalink] [raw]
Subject: Re: File System conversion -- ideas

On Mon, 30 Jun 2003 18:59:34 +0200, "Leonard Milcin Jr." <[email protected]> said:

> Hey folks! I don't use LVM, but I think it allows a file to be split
> between different filesystems. Yes?

No.

LVM lets you "glue together" multiple partitions/disks/parts of disks to make ONE filesystem.

There's no way to say "Gig 1 through 3 of this file is in /fs1/foo and Gigs 4 and 5 are in /fs2/bar".

For starters, what happens if the permissions on foo and bar are different? ;)



2003-07-01 09:43:32

by Stewart Smith

[permalink] [raw]
Subject: Re: File System conversion -- ideas

On Mon, Jun 30, 2003 at 04:05:00PM +0000, Henning P. Schmiedehausen wrote:
> You have a 6 GB file. You lose. :-)

funny how people can get their head around partially downloaded files, but not around partial files on a disk.

copy 5GB first, resize, then the next gig. easy. (well, relatively).

Sure, you could crash and end up with 2 halves - which a 'cat' would fix!
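
As a rough illustration of the arithmetic, here is a small Python simulation
of the copy-resize-repeat loop. It only tracks sizes in whole GB; the function
migrate_in_chunks and the file names are hypothetical, and it assumes the
part of a file that has been copied can be cut off the source copy so the
source volume can shrink (in practice that means copying from the tail of
the file, since truncation only removes the end).

```python
def migrate_in_chunks(disk_size, file_sizes):
    """Simulate moving files between two volumes that share one disk.

    Sizes are whole GB. Each step hands the source's free space to the
    target, copies as much as fits, then drops the copied part from the
    source. Returns the list of (file name, chunk size) copies performed.
    """
    source_size = disk_size            # the source initially spans the disk
    source_used = sum(file_sizes.values())
    target_size = target_used = 0
    steps = []
    for name, remaining in file_sizes.items():
        while remaining > 0:
            # Shrink the source down to its data; grow the target by as much.
            slack = source_size - source_used
            source_size -= slack
            target_size += slack
            room = target_size - target_used
            if room == 0:
                raise RuntimeError("no free space at all: cannot make progress")
            chunk = min(room, remaining)
            steps.append((name, chunk))
            target_used += chunk
            source_used -= chunk       # copied part is removed from the source
            remaining -= chunk
    return steps


# The case from the thread: 60G disk, 55G of data, one 6G file among it.
steps = migrate_in_chunks(60, {"big_file": 6, "everything_else": 49})
```

With 5G free, the 6G file goes across as a 5G piece followed by a 1G piece,
and no single copy ever needs more than the 5G that is free at that moment.
A crash between the two pieces leaves two halves to be joined, as noted above.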

--
Stewart Smith
Vice President, Linux Australia
http://www.linux.org.au (personal: http://www.flamingspork.com)

2003-07-01 09:48:04

by Stewart Smith

[permalink] [raw]
Subject: Re: File System conversion -- ideas

On Sun, Jun 29, 2003 at 09:37:30PM +0200, Leonard Milcin Jr. wrote:

> Say, why you would want to change filesystem type?

Upgrading to Linux (FAT/NTFS/HFS/HFS+ -> ext2/ext3/reiser/XFS).

What would be *really* useful is a PC partition map -> LVM conversion utility.

--
Stewart Smith
Vice President, Linux Australia
http://www.linux.org.au (personal: http://www.flamingspork.com)

2003-07-01 10:02:10

by Stewart Smith

[permalink] [raw]
Subject: Re: File System conversion -- ideas

On Sun, Jun 29, 2003 at 03:48:06PM -0400, rmoser wrote:

> Yeppers. Also that the eventual goal (at least in my mind) is to allow
> this to be done on a running r/w filesystem safely, which isn't as tough
> a problem as it sounds.

Yes it is. In fact, it is more of a problem than you think.

Think of this simple scenario:
a script is running that downloads a kernel patch, applies it to a tree,
then renames the directory to $1-$patchname.

Halfway through this, during the patch, the backup script comes through
and starts to back up the filesystem.


Now - wipe the drive clean at the end and restore it to a sane state.

Doing live things on storage systems without transactions, snapshots or
whatever you want to call them is tricky at best. Resizing is going to
cause headaches.

--
Stewart Smith
Vice President, Linux Australia
http://www.linux.org.au (personal: http://www.flamingspork.com)

2003-07-01 14:40:43

by Leonard Milcin Jr.

[permalink] [raw]
Subject: Re: File System conversion -- ideas

Stewart Smith wrote:
> Yes it is. In fact, it is more of a problem than you think.
>
> Think of this simple scenario:
> a script is running that downloads a kernel patch, applies it to a tree,
> then renames the directory to $1-$patchname.
>
> Halfway through this, during the patch, the backup script comes through
> and starts to back up the filesystem.
>
>
> Now - wipe the drive clean at the end and restore it to a sane state.
>
> Doing live things on storage systems without transactions, snapshots or
> whatever you want to call them is tricky at best. resizing is going to
> cause headaches.
>

I am thinking of some sort of overlay filesystem on top of that *thing*. In
this case the overlay filesystem could serve as a redo log, as in a database
system. Then we only need to worry about read operations, not writes. Writes
will be stored in the redo log, and eventually they will be applied once the
underlying read-only filesystem has been converted.

What do you think about this?
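
A minimal sketch of the redo-log idea, assuming the filesystem can be modelled as a dictionary from path to contents. The class and method names are hypothetical, and nothing here addresses crash safety or the conversion itself:

```python
class RedoLogOverlay:
    """Read-only base plus an ordered log of writes and deletes."""

    def __init__(self, base):
        self.base = base  # path -> bytes; never modified by the overlay
        self.log = []     # ordered entries: (op, path, data)

    def write(self, path, data):
        self.log.append(("write", path, data))

    def delete(self, path):
        self.log.append(("delete", path, None))

    def read(self, path):
        # The newest log entry for a path wins over the base.
        for op, logged_path, data in reversed(self.log):
            if logged_path == path:
                if op == "delete":
                    raise FileNotFoundError(path)
                return data
        if path in self.base:
            return self.base[path]
        raise FileNotFoundError(path)

    def replay(self, target):
        """Apply the logged operations, oldest first, to the converted fs."""
        for op, path, data in self.log:
            if op == "write":
                target[path] = data
            else:
                target.pop(path, None)
        self.log.clear()


def demo():
    old_fs = {"/etc/motd": b"hello", "/tmp/junk": b"x"}
    overlay = RedoLogOverlay(old_fs)
    overlay.write("/etc/motd", b"hello again")  # lands in the log only
    overlay.delete("/tmp/junk")
    seen_during = overlay.read("/etc/motd")

    converted = dict(old_fs)  # stand-in for the real conversion step
    overlay.replay(converted)
    return seen_during, old_fs, converted
```

Reads still have to consult both the log and the base, so the read path is where the complexity ends up; the base stays untouched until the replay.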



--
"Unix IS user friendly... It's just selective about who its friends are."
-- Tollef Fog Heen

2003-07-01 15:28:43

by Stewart Smith

[permalink] [raw]
Subject: Re: File System conversion -- ideas

On Tue, Jul 01, 2003 at 04:55:58PM +0200, Leonard Milcin Jr. wrote:

> I am thinking of some sort of overlay filesystem on top of that *thing*. In
> this case the overlay filesystem could serve as a redo log, as in a database
> system. Then we only need to worry about read operations, not writes. Writes
> will be stored in the redo log, and eventually they will be applied once the
> underlying read-only filesystem has been converted.

This is exactly what has been said before in this thread
- i.e. mount the new FS over the old one (union style)
and new writes go to the new FS.

I really think automagic LVM resizing would be the way to go.
*much* cleaner and easier to implement.

The real useful thing to do would be to write a utility that would
convert non-LVM systems to LVM.

--
Stewart Smith
Vice President, Linux Australia
http://www.linux.org.au (personal: http://www.flamingspork.com)

2003-07-01 15:50:19

by Matt Reuther

[permalink] [raw]
Subject: Re: File System conversion -- ideas

It seems like the loopback device would be useful for this. You can move all
of your stuff into a mounted loopback device with the new fs. Is there not
some utility to take a filesystem image from inside an fs, and overwrite
that fs with it? It would be lots of sector-to-sector shuffling, but it
would be cleaner than trying to convert.

I guess you could try overlaying the old and new filesystems by virtualizing
the inodes, superblocks, directories, and other stuff in RAM, but you still
have to write it to disk, and some of the metadata from one fs will collide
with the other one. The superblock for ext2fs needs to be written to several
fixed places on the filesystem, which might also be needed by
reiserfs/xfs/whatever.

Matt


2003-07-01 16:00:00

by Frank Gevaerts

[permalink] [raw]
Subject: Re: File System conversion -- ideas

On Tue, Jul 01, 2003 at 12:04:37PM -0400, Matt Reuther wrote:
> It seems like the loopback device would be useful for this. You can move
> all of your stuff into a mounted loopback device with the new fs. Is there
> not some utility to take a filesystem image from inside an fs, and
> overwrite that fs with it? It would be lots of sector-to-sector shuffling,
> but it would be cleaner than trying to convert.

http://tzukanov.narod.ru/convertfs/ as someone (I don't remember who)
said earlier.

Frank

> Matt

2003-07-01 16:03:32

by Leonard Milcin Jr.

[permalink] [raw]
Subject: Re: File System conversion -- ideas

Stewart Smith wrote:
> This is exactly what has been said before in this thread
> - i.e. mount the new FS over the old one (union style)
> and new writes go to the new FS.
>
> I really think automagic LVM resizing would be the way to go.
> *much* cleaner and easier to implement.
>
> The real useful thing to do would be to write a utility that would
> convert non-LVM systems to LVM.
>

I said that too, some time ago. But don't know why it didn't reach LKML.
Perhaps my fault...

LVM resizing would be very good, because most of it is already coded.


--
"Unix IS user friendly... It's just selective about who its friends are."
-- Tollef Fog Heen

2003-07-02 10:44:26

by Pavel Machek

[permalink] [raw]
Subject: Re: File System conversion -- ideas

Hi!

> > >Uh-huh. You want to get in-kernel conversion between ext* and reiserfs?
> > >With recoverable state if aborted? Get real.
> >
> > no, in-kernel conversion between everything. You don't think it can be done?
> > It's not that difficult a problem to manage data like that :D
>
> I think that I will believe it when I see the patchset implementing it.
> Provided that it will be convincing enough. Other than that... Not
> really. You will need code for each pair of filesystems, since
> convertor will need to know *both* layouts. No amount of handwaving
> is likely to work around that. And we have what, something between
> 10 and 20 local filesystems? Have fun...
>
> If you want your idea to be considered seriously - take reiserfs code,
> take ext3 code, copy both to userland and put together a conversion
> between them. Both ways. That, by definition, is easier than doing

Actually partition surprise should be able
to do ext2<=>reiser. It does not have journal, IIRC :-(.
Pavel
--
Pavel
Written on sharp zaurus, because my Velo1 broke. If you have Velo you don't need...

2003-07-02 14:34:43

by Jan Kara

[permalink] [raw]
Subject: Re: File System conversion -- ideas

Hello,

> > > >Uh-huh. You want to get in-kernel conversion between ext* and reiserfs?
> > > >With recoverable state if aborted? Get real.
> > >
> > > no, in-kernel conversion between everything. You don't think it can be done?
> > > It's not that difficult a problem to manage data like that :D
> >
> > I think that I will believe it when I see the patchset implementing it.
> > Provided that it will be convincing enough. Other than that... Not
> > really. You will need code for each pair of filesystems, since
> > convertor will need to know *both* layouts. No amount of handwaving
> > is likely to work around that. And we have what, something between
> > 10 and 20 local filesystems? Have fun...
> >
> > If you want your idea to be considered seriously - take reiserfs code,
> > take ext3 code, copy both to userland and put together a conversion
> > between them. Both ways. That, by definition, is easier than doing
>
> Actually partition surprise should be able
> to do ext2<=>reiser. It does not have journal, IIRC :-(.
Partition Surprise could handle the conversions in userspace but not
in-kernel ones (which I think are discussed above)... We only had a set
of patches for online ext2 resizing and that was already non-trivial
(especially shifting block descriptors and such). Conversion could
actually be done in a similar way in principle - one could establish some
block remapper which would map metadata of the new filesystem into the free
space of the other one and could move files by deleting them from
one filesystem and creating them in the other one... But I would not
volunteer to write that (it would be really hard to do it atomically and
almost impossible to not make garbage of your data when power fails..)
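
A rough sketch of the space accounting behind such a file-by-file move, assuming both filesystems share one fixed pool of blocks and each file is created in the new filesystem before it is deleted from the old one. The per-file overhead and all names are hypothetical, and atomicity and power failure are not modelled at all:

```python
def migrate_in_place(files, total_blocks, new_overhead_per_file=1):
    """Move files one at a time between two filesystems sharing one device.

    files: dict mapping name -> size in blocks on the old filesystem.
    Returns the names in the order they were moved; raises RuntimeError
    when a file does not fit into the remaining free space.
    """
    old = dict(files)
    new = {}
    moved = []
    # Smallest first, so the largest file is attempted last.
    for name in sorted(old, key=old.get):
        size = old[name]
        free = total_blocks - sum(old.values()) - sum(new.values())
        needed = size + new_overhead_per_file  # data plus extra new-fs metadata
        if needed > free:
            raise RuntimeError(
                f"cannot move {name}: need {needed} blocks, only {free} free"
            )
        new[name] = needed  # create in the new filesystem...
        del old[name]       # ...then delete from the old one
        moved.append(name)
    return moved
```

With files of 2, 5 and 10 blocks, a 30-block device is just enough under these assumptions, while a 29-block device fails on the last file: the extra metadata eats into the free space with every file moved.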

Honza
--
Jan Kara <[email protected]>
SuSE CR Labs

2003-07-06 19:28:18

by Svein Ove Aas

[permalink] [raw]
Subject: Re: File System conversion -- ideas

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

mandag 30. juni 2003, 15:26, skrev Jesse Pollard:

> You are ASSUMING that the new filesystem requires a less than or equal amount
> of metadata. This is NOT always true. A conversion of a full EXT2 to
> ReiserFS would fail simply because there is no free space to expand the
> needed additional overhead.
>
> Going in the other direction usually is possible (again, depending on the
> filesystem) but there are exceptions... Try converting an EXT2 to DosFS.
> In place. And maintain a recoverable state when aborted.
>
> Not gonna happen.
>
> Too much depends on what the target filesystem is, and what it may require.
>
> Consider another - switching to an extent filesystem... If the datablocks
> don't move, then you need MORE extents than the current indirect pointers.
> And each extent is LARGER than the indirect pointers.
>
> Then you have to compress/condense the extents (requiring shuffling data
> blocks around to reduce the number of extents). Each requires free space
> to do its work, and the amount of free blocks is not the same.
>
> Faster to do a copy. more reliable too. and recoverable.

What this boils down to is, "there may not be enough space".
Personally I prefer incrementally resizing LVM partitions for conversion
anyway, but I'll take a stab at this.

Simple solution: In your conversion routines, allocate a chunk of another
filesystem (just another file), and use that for scratch space. Journalling
and the like won't get much harder, so why not?
And you can easily expand the file if you need to.

Problem: You might wind up with a 99%-converted filesystem and not have enough
space to do without the scratch file. This could be bad.
If you can mount the filesystem as-is, then simply deleting a few large files
would allow the conversion to complete. Otherwise you'll have to resize the
underlying partition and the partial FS - difficult, but doable.

I think this idea ranks under "cool, but difficult and not quite necessary".
Union mounts coupled with LVM and online resizing (which both Reiser3.6 and
Reiser4 support, I presume) would render it irrelevant anyway; even without
them, it can still be done with just LVM if taking the filesystem offline for
a few hours is okay.

And if it isn't then you can likely afford some backup tape.

- - Svein Ove Aas
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (GNU/Linux)

iD8DBQE/CHjS9OlFkai3rMARAtvRAKCJhS4pMI73/AQmT4Nu8nT3XkKfOwCeLfWI
DL+O7LSIRnWaXKJFoT6L550=
=lMxE
-----END PGP SIGNATURE-----

2003-07-07 08:22:15

by John Bradford

[permalink] [raw]
Subject: Re: File System conversion -- ideas

> What this boils down to is, "there may not be enough space".
> Personally I prefer incrementally resizing LVM partitions for conversion
> anyway, but I'll take a stab at this.

Depending on the filesystem, incrementally resizing LVM partitions
could be a very _bad_ way to do it - continuously re-sizing a
partition will typically encourage poor layout and fragmentation. It
would be possible to defragment and optimise the partition afterwards,
but that would extend the conversion time even more, especially if it
was done in a way which kept a consistent filesystem throughout, on a
filesystem without much free space.

The way to avoid, or at least minimise the problem of having one
partition filling the disk, is not to fully partition disks to begin
with - that gives you the flexibility to test and use different
partition types, and move data around. Even without using LVM, it's
easy to move data around if it's on partitions which are each no
bigger than 25% of the disk.

John.