2005-05-09 18:39:09

by Markus Klotzbuecher

Subject: [ANNOUNCE] mini_fo-0.6.0 overlay file system

mini_fo is a virtual kernel filesystem that can make read-only file
systems writable. This is done by redirecting modifying operations to
a writeable location called "storage directory", and leaving the
original data in the "base directory" untouched. When reading, the
file system merges the modified and original data so that only the
newest versions will appear. This occurs transparently to the user,
who can access the data like on any other read-write file system.
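
The read-side rule described above can be pictured with a small user-space sketch. It is not mini_fo's code: `struct layer`, `overlay_lookup()` and the in-memory name lists are hypothetical stand-ins for the storage and base directories, and deletions are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory stand-in for a directory: a list of file names. */
struct layer {
	const char **names;
	size_t count;
};

enum source { SRC_NONE, SRC_STORAGE, SRC_BASE };

static int layer_has(const struct layer *l, const char *name)
{
	for (size_t i = 0; i < l->count; i++)
		if (strcmp(l->names[i], name) == 0)
			return 1;
	return 0;
}

/* Reads see the storage copy if one exists, otherwise the base copy. */
enum source overlay_lookup(const struct layer *storage,
			   const struct layer *base, const char *name)
{
	assert(name != NULL);
	if (layer_has(storage, name))
		return SRC_STORAGE;
	if (layer_has(base, name))
		return SRC_BASE;
	return SRC_NONE;
}
```

With "passwd" present in both layers and "hosts" only in the base, a lookup of "passwd" resolves to the storage copy and "hosts" to the base copy, which is the "newest version wins" behaviour from the description.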

mini_fo was originally developed for use in embedded systems, and
therefore is lightweight in terms of module size (~50K), memory usage
and storage usage. Nevertheless it has proved useful for other
projects such as live CDs or for sandboxing and testing.

For more information and download of the sources visit the project
page:

http://www.denx.de/twiki/bin/view/Know/MiniFOHome

ChangeLog for mini_fo-0-6-0:
- support for 2.4 and 2.6 kernels.
- mini_fo now implements all file system operations.
- many bugfixes and code cleanup.


Markus Klotzbuecher


2005-05-10 06:07:43

by Eric Lammerts

Subject: Re: [ANNOUNCE] mini_fo-0.6.0 overlay file system

Markus Klotzbuecher wrote:
> mini_fo is a virtual kernel filesystem that can make read-only file
> systems writable.

Nice.

Some remarks:
Some functions return -ENOTSUPP on error, which makes "ls -l" complain
loudly when getxattr() fails. This should be -EOPNOTSUPP.

The module taints the kernel because of MODULE_LICENSE("LGPL").
Since all your copyright statements say it's GPL software, better change
this to "GPL".

cheers,

Eric
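
Background for the first remark, as a hedged user-space sketch: ENOTSUPP (524) belongs to the kernel-internal error codes that are not supposed to reach user space, so the C library has no message for it, while EOPNOTSUPP is an ordinary errno that tools can recognise and ignore. The macro and function names below are made up for illustration, and the 512 boundary is an assumption based on where the kernel-internal range starts (ERESTARTSYS).

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Kernel-internal value of ENOTSUPP; user-space <errno.h> does not define it. */
#define KERNEL_ENOTSUPP 524

/* Assumed start of the kernel-internal codes (ERESTARTSYS). */
#define KERNEL_INTERNAL_ERRNO_START 512

/* Returns 1 if the code is one user space is expected to understand. */
int errno_is_userspace_visible(int code)
{
	return code > 0 && code < KERNEL_INTERNAL_ERRNO_START;
}

/* Prints what the C library makes of a given error code. */
void describe_errno(int code)
{
	assert(code > 0);
	printf("%d: %s\n", code, strerror(code));
}
```

Calling `describe_errno(EOPNOTSUPP)` gives a readable message, whereas `describe_errno(KERNEL_ENOTSUPP)` usually falls back to a generic "unknown error" text, which matches the loud complaint from "ls -l".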

2005-05-10 15:30:43

by Lee Revell

Subject: Re: [ANNOUNCE] mini_fo-0.6.0 overlay file system

On Tue, 2005-05-10 at 02:07 -0400, Eric Lammerts wrote:
> Markus Klotzbuecher wrote:
> > mini_fo is a virtual kernel filesystem that can make read-only file
> > systems writable.
>
> Nice.
>
> Some remarks:
> Some functions return -ENOTSUPP on error, which makes "ls -l" complain
> loudly when getxattr() fails. This should be -EOPNOTSUPP.
>
> The module taints the kernel because of MODULE_LICENSE("LGPL").
> Since all your copyright statements say it's GPL software, better change
> this to "GPL".

Ugh. Why does an LGPL module taint the kernel again?

Lee

2005-05-10 15:42:14

by Arjan van de Ven

Subject: Re: [ANNOUNCE] mini_fo-0.6.0 overlay file system

On Tue, 2005-05-10 at 11:29 -0400, Lee Revell wrote:
> On Tue, 2005-05-10 at 02:07 -0400, Eric Lammerts wrote:
> > Markus Klotzbuecher wrote:
> > > mini_fo is a virtual kernel filesystem that can make read-only file
> > > systems writable.
> >
> > Nice.
> >
> > Some remarks:
> > Some functions return -ENOTSUPP on error, which makes "ls -l" complain
> > loudly when getxattr() fails. This should be -EOPNOTSUPP.
> >
> > The module taints the kernel because of MODULE_LICENSE("LGPL").
> > Since all your copyright statements say it's GPL software, better change
> > this to "GPL".
>
> Ugh. Why does an LGPL module taint the kernel again?

it's gpl anyway in practice when insmod'ed (LGPL mixed with GPL code
becomes GPL, as per the LGPL license)

you can say "GPL with additional rights" which would capture the spirit
of LGPL although the additional rights aren't really all that useful (eg
only for out-of-linux use)
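
The taint decision comes down to a string comparison on the MODULE_LICENSE() argument, and "LGPL" is not among the accepted strings. The sketch below is a user-space imitation of that check, not the kernel source; the list of accepted strings is the one found in later kernels and may differ from the 2.6 kernel discussed here. The string the kernel recognises for the case described above is "GPL and additional rights".

```c
#include <assert.h>
#include <string.h>

/*
 * User-space imitation of the string check applied to MODULE_LICENSE().
 * The accepted list is taken from later kernels (an assumption for 2005).
 */
int license_is_gpl_compatible(const char *license)
{
	static const char *const ok[] = {
		"GPL", "GPL v2", "GPL and additional rights",
		"Dual BSD/GPL", "Dual MIT/GPL", "Dual MPL/GPL",
	};

	assert(license != NULL);
	for (size_t i = 0; i < sizeof(ok) / sizeof(ok[0]); i++)
		if (strcmp(license, ok[i]) == 0)
			return 1;
	return 0;
}
```

A module whose license string fails this test is loaded anyway, but the kernel is marked tainted.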

2005-05-10 16:59:53

by Markus Klotzbuecher

Subject: Re: [ANNOUNCE] mini_fo-0.6.0 overlay file system

Hi Eric,

Thank you for the feedback.

On Tue, May 10, 2005 at 02:07:37AM -0400, Eric Lammerts wrote:
> Some remarks:
> Some functions return -ENOTSUPP on error, which makes "ls -l" complain
> loudly when getxattr() fails. This should be -EOPNOTSUPP.

You're right. Fixed in attached patch.

> The module taints the kernel because of MODULE_LICENSE("LGPL").
> Since all your copyright statements say it's GPL software, better change
> this to "GPL".

It seems to be ok to change this. Patch corrects this too.

Cheers

Markus



diff -Nru mini_fo.ORIG/inode.c mini_fo/inode.c
--- mini_fo.ORIG/inode.c 2005-05-06 23:59:08.000000000 +0200
+++ mini_fo/inode.c 2005-05-10 18:09:47.000000000 +0200
@@ -1259,7 +1259,7 @@
STATIC int
mini_fo_getxattr(struct dentry *dentry, const char *name, void *value, size_t size) {
struct dentry *hidden_dentry = NULL;
- int err = -ENOTSUPP;
+ int err = -EOPNOTSUPP;
/* Define these anyway so we don't need as much ifdef'ed code. */
char *encoded_name = NULL;
char *encoded_value = NULL;
@@ -1304,7 +1304,7 @@

{
struct dentry *hidden_dentry = NULL;
- int err = -ENOTSUPP;
+ int err = -EOPNOTSUPP;

/* Define these anyway, so we don't have as much ifdef'ed code. */
char *encoded_value = NULL;
@@ -1340,7 +1340,7 @@
STATIC int
mini_fo_removexattr(struct dentry *dentry, const char *name) {
struct dentry *hidden_dentry = NULL;
- int err = -ENOTSUPP;
+ int err = -EOPNOTSUPP;
char *encoded_name;

check_mini_fo_dentry(dentry);
@@ -1372,7 +1372,7 @@
STATIC int
mini_fo_listxattr(struct dentry *dentry, char *list, size_t size) {
struct dentry *hidden_dentry = NULL;
- int err = -ENOTSUPP;
+ int err = -EOPNOTSUPP;
char *encoded_list = NULL;

check_mini_fo_dentry(dentry);
diff -Nru mini_fo.ORIG/main.c mini_fo/main.c
--- mini_fo.ORIG/main.c 2005-05-06 23:59:08.000000000 +0200
+++ mini_fo/main.c 2005-05-10 17:54:13.000000000 +0200
@@ -405,7 +405,7 @@

MODULE_AUTHOR("Erez Zadok <[email protected]>");
MODULE_DESCRIPTION("FiST-generated mini_fo filesystem");
-MODULE_LICENSE("LGPL");
+MODULE_LICENSE("GPL");

/* MODULE_PARM(fist_debug_var, "i"); */
/* MODULE_PARM_DESC(fist_debug_var, "Debug level"); */


2005-05-12 12:18:57

by Jörn Engel

Subject: Re: [ANNOUNCE] mini_fo-0.6.0 overlay file system

On Mon, 9 May 2005 20:40:22 +0200, Markus Klotzbuecher wrote:
>
> mini_fo is a virtual kernel filesystem that can make read-only file
> systems writable. This is done by redirecting modifying operations to
> a writeable location called "storage directory", and leaving the
> original data in the "base directory" untouched. When reading, the
> file system merges the modified and original data so that only the
> newest versions will appear. This occurs transparently to the user,
> who can access the data like on any other read-write file system.
>
> mini_fo was originally developed for use in embedded systems, and
> therefore is lightweight in terms of module size (~50K), memory usage
> and storage usage. Nevertheless it has proved useful for other
> projects such as live CDs or for sandboxing and testing.

Just out of curiosity: how do you perform the copy-up operation?
In-kernel copies of large files are a huge problem and for union-mount
purposes, I'm clueless about how to fix things.

Jörn

--
The competent programmer is fully aware of the strictly limited size of
his own skull; therefore he approaches the programming task in full
humility, and among other things he avoids clever tricks like the plague.
-- Edsger W. Dijkstra

2005-05-12 16:43:30

by Markus Klotzbuecher

Subject: Re: [ANNOUNCE] mini_fo-0.6.0 overlay file system

Hi Joern,

On Thu, May 12, 2005 at 02:18:42PM +0200, Jörn Engel wrote:

> Just out of curiosity: how do you perform the copy-up operation?
> In-kernel copies of large files are a huge problem and for union-mount
> purposes, I'm clueless about how to fix things.

Basically I open the source and the target file on the lower file
systems for reading and writing respectively, then read and write page
sized chunks of data until all has been copied. Obviously not ideal
for large files, but I had no better idea so far.

Markus
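
The copy-up loop described above, transplanted to user space as a minimal sketch. mini_fo does this inside the kernel on the lower file systems; the function name `copy_up()` and the use of plain open/read/write are assumptions made only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Copies src_path to dst_path in page-sized chunks.
 * Returns 0 on success, -1 on error.
 */
int copy_up(const char *src_path, const char *dst_path)
{
	assert(src_path != NULL && dst_path != NULL);

	long page = sysconf(_SC_PAGESIZE);
	if (page <= 0)
		page = 4096;	/* fallback if the page size is unavailable */

	char *buf = malloc((size_t)page);
	if (!buf)
		return -1;

	int ret = -1;
	int out = -1;
	int in = open(src_path, O_RDONLY);
	if (in < 0)
		goto done;
	out = open(dst_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (out < 0)
		goto done;

	for (;;) {
		ssize_t n = read(in, buf, (size_t)page);
		if (n == 0) {		/* end of file: everything copied */
			ret = 0;
			break;
		}
		if (n < 0) {
			if (errno == EINTR)
				continue;
			break;
		}
		/* write() may be short, so loop until the chunk is out */
		for (ssize_t off = 0; off < n; ) {
			ssize_t w = write(out, buf + off, (size_t)(n - off));
			if (w < 0) {
				if (errno == EINTR)
					continue;
				goto done;
			}
			off += w;
		}
	}
done:
	if (in >= 0)
		close(in);
	if (out >= 0)
		close(out);
	free(buf);
	return ret;
}
```

The cost is proportional to the size of the whole file even if a single byte is about to change, which is the weakness for large files acknowledged above.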

2005-05-13 03:18:51

by Kyle Moffett

Subject: Re: [ANNOUNCE] mini_fo-0.6.0 overlay file system

On May 12, 2005, at 12:44:13, Markus Klotzbuecher wrote:
> Hi Joern,
>
> On Thu, May 12, 2005 at 02:18:42PM +0200, Jörn Engel wrote:
>> Just out of curiosity: how do you perform the copy-up operation?
>> In-kernel copies of large files are a huge problem and for union-mount
>> purposes, I'm clueless about how to fix things.
>>
>
> Basically I open the source and the target file on the lower file
> systems for reading and writing respectively, then read and write page
> sized chunks of data until all has been copied. Obviously not ideal
> for large files, but I had no better idea so far.

I've been thinking about a "-o union" mount option for a while now, and
I had a couple ideas on this topic.

1) This system should be a first-class VFS element, IE: -o union should
work on all filesystems, regardless of feature support. (As long as you
can read/write from/to the unioned fs and read from the underlying fs)

2) When forced to copy data, the copy should be done in the context of
whatever process is doing the "write" operation, and be interruptible,
etc. The end result is that if you union an nfs mount over another one,
it will just seem like a write to a big file takes a _really_ long time
to complete.

3) ext2/3 should get an extra flag for files and directories that
indicates nonresidence. This would be used by the VFS union layer to
map existence/nonexistence to the unioned filesystem (If it's ext2/3).
That way, if I later unmounted the unioned ext3 fs and remounted it
elsewhere without the underlying storage, I would be able to access the
parts of the directory structure and files that are resident, and the
rest would fail with a new error code ENONRESIDENT or similar.

For example, if I have two ext3 filesystems on /dev/hdb1 and /dev/hdb2,
then I could do this:

mount -t ext3 -o ro /dev/hdb1 /mnt
mount -t ext3 -o rw,union /dev/hdb2 /mnt

The on-disk structures might look like this:

/dev/hdb1:
foo/
bar => blocks(1-4)
baz => blocks(1-8)
oof/
xuuq => blocks(1-2)

/dev/hdb2:
foo/
bar => sparse,nonresident,blocks(2,4)
baz => sparse,nonresident,blocks(1-4,9-12)
quux => blocks(1-32)
oof/ => nonresident
mumble/
grumble => blocks(1-4)

The resultant view to the user:

/mnt
foo/
bar => blocks(1-4)
baz => blocks(1-12)
quux => blocks(1-32)
oof/
xuuq => blocks(1-2)
mumble/
grumble => blocks(1-4)

If they deleted /dev/hdb1, but still wanted whatever changes they had
made on /dev/hdb2, they could always get at them by remounting /dev/hdb2
somewhere _without_ "-o union", and use a modified tar to package up the
resident portions of files the same way it does for sparse files.
Naturally there would need to be a way to mark a sparse file's empty
spaces as nonresident if so desired when untarring.
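
The per-block resolution implied by the listings above can be sketched as follows. This is only an illustration of the idea: `struct blockmap`, the 32-block limit and the function names are hypothetical, and no such flag exists in ext2/3.

```c
#include <assert.h>
#include <stdint.h>

enum block_source { BLK_HOLE, BLK_UPPER, BLK_LOWER };

/* Hypothetical per-file block map: bit (n-1) set means block n is present. */
struct blockmap {
	uint32_t present;
	int nonresident;	/* missing blocks may live in a lower layer */
};

static int has_block(const struct blockmap *m, unsigned n)
{
	assert(n >= 1 && n <= 32);
	return (m->present >> (n - 1)) & 1u;
}

/* Helper: bitmap with blocks first..last (inclusive, 1-based) set. */
uint32_t block_range(unsigned first, unsigned last)
{
	uint32_t bits = 0;

	for (unsigned n = first; n <= last; n++)
		bits |= (uint32_t)1 << (n - 1);
	return bits;
}

/* Picks the layer that supplies block n of a file in the union view. */
enum block_source resolve_block(const struct blockmap *upper,
				const struct blockmap *lower, unsigned n)
{
	if (has_block(upper, n))
		return BLK_UPPER;
	/* Only a file flagged nonresident falls through to the layer below. */
	if (upper->nonresident && has_block(lower, n))
		return BLK_LOWER;
	return BLK_HOLE;
}
```

For "baz" in the example, blocks 1-4 and 9-12 resolve to /dev/hdb2 and blocks 5-8 to /dev/hdb1, giving the blocks(1-12) shown in the user's view.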


Cheers,
Kyle Moffett

-----BEGIN GEEK CODE BLOCK-----
Version: 3.12
GCM/CS/IT/U d- s++: a18 C++++>$ UB/L/X/*++++(+)>$ P+++(++++)>$
L++++(+++) E W++(+) N+++(++) o? K? w--- O? M++ V? PS+() PE+(-) Y+
PGP+++ t+(+++) 5 X R? tv-(--) b++++(++) DI+ D+ G e->++++$ h!*()>++$
r !y?(-)
------END GEEK CODE BLOCK------



2005-05-13 08:02:15

by Jörn Engel

Subject: Re: [ANNOUNCE] mini_fo-0.6.0 overlay file system

It took me quite some time to understand what you're really after.
There are a bunch of problems I can imagine, but this approach may
actually work out. If it does, it definitely solves the problem of
copy-up of insanely large files.

If it does...

On Thu, 12 May 2005 23:18:36 -0400, Kyle Moffett wrote:
>
> I've been thinking about a "-o union" mount option for a while now, and
> I had a couple ideas on this topic.
>
> 1) This system should be a first-class VFS element, IE: -o union should
> work on all filesystems, regardless of feature support. (As long as you
> can read/write from/to the unioned fs and read from the underlying fs)
>
> 2) When forced to copy data, the copy should be done in the context of
> whatever process is doing the "write" operation, and be interruptible,
> etc. The end result is that if you union an nfs mount over another one,
> it will just seem like a write to a big file takes a _really_ long time
> to complete.

Doesn't even have to be interruptible. Your trick, if I understand it
correctly, is to copy data up on a block level, not on a file level.
Hence, even those nasty multi-gigabyte files will be copied in
granularities of a few bytes - definitely nothing you absolutely need
to be interruptible.

> 3) ext2/3 should get an extra flag for files and directories that
> indicates nonresidence. This would be used by the VFS union layer to
> map existence/nonexistence to the unioned filesystem (If it's ext2/3).
> That way, if I later unmounted the unioned ext3 fs and remounted it
> elsewhere without the underlying storage, I would be able to access the
> parts of the directory structure and files that are resident, and the
> rest would fail with a new error code ENONRESIDENT or similar.

ENONRESIDENT bugs me somehow. I guess EIO would be quite sufficient.
Maybe you also want a new incompatible fs flag, just to make sure old
kernels without proper understanding don't mess up the fs.

> For example, if I have two ext3 filesystems on /dev/hdb1 and /dev/hdb2,
> then I could do this:
>
> mount -t ext3 -o ro /dev/hdb1 /mnt
> mount -t ext3 -o rw,union /dev/hdb2 /mnt
>
> The on-disk structures might look like this:
>
> /dev/hdb1:
> foo/
> bar => blocks(1-4)
> baz => blocks(1-8)
> oof/
> xuuq => blocks(1-2)
>
> /dev/hdb2:
> foo/
> bar => sparse,nonresident,blocks(2,4)
> baz => sparse,nonresident,blocks(1-4,9-12)
> quux => blocks(1-32)
> oof/ => nonresident
> mumble/
> grumble => blocks(1-4)
>
> The resultant view to the user:
>
> /mnt
> foo/
> bar => blocks(1-4)
> baz => blocks(1-12)
> quux => blocks(1-32)
> oof/
> xuuq => blocks(1-2)
> mumble/
> grumble => blocks(1-4)
>
> If they deleted /dev/hdb1, but still wanted whatever changes they had
> made on /dev/hdb2, they could always get at them by remounting /dev/hdb2
> somewhere _without_ "-o union", and use a modified tar to package up the
> resident portions of files the same way it does for sparse files.
> Naturally there would need to be a way to mark a sparse file's empty
> spaces as nonresident if so desired when untarring.

That's the old well-known (to some people) union-mount behaviour.

Really, your idea of a block (page, whatever) level granularity for
copying data is nice. It solves the biggest concern I had left for
union mount. Actually implementing it, though, depends on quite a bit
of infrastructure that just doesn't exist yet. Still, a very
interesting idea.

Jörn

--
To announce that there must be no criticism of the President, or that we
are to stand by the President, right or wrong, is not only unpatriotic
and servile, but is morally treasonable to the American public.
-- Theodore Roosevelt, Kansas City Star, 1918

2005-05-13 11:28:57

by Kyle Moffett

Subject: Re: [ANNOUNCE] mini_fo-0.6.0 overlay file system

On May 13, 2005, at 04:01:37, Jörn Engel wrote:
> Doesn't even have to be interruptible.

Well, I wrote in my first mail:

> On Thu, 12 May 2005 23:18:36 -0400, Kyle Moffett wrote:
>> 1) This system should be a first-class VFS element, IE: -o union should
>> work on all filesystems, regardless of feature support.

I'd like to have -o union work not just on ext2/3. It could potentially
be very _slow_ on other filesystems, until they get nonresident file
support, but it would definitely need to be an interruptible page copy
in that case.

> Your trick, if I understand it correctly, is to copy data up on a block
> level, not on a file level.

Precisely.

>> That way, if I later unmounted the unioned ext3 fs and remounted it
>> elsewhere without the underlying storage, I would be able to access the
>> parts of the directory structure and files that are resident, and the
>> rest would fail with a new error code ENONRESIDENT or similar.
>
> ENONRESIDENT bugs me somehow. I guess EIO would be quite sufficient.

Hmm. Ideally a program like tar would be able to determine which pages
of a file are resident in memory and only store those. How does this
currently work for sparse files?
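
On the sparse-file question, one heuristic available from user space is to compare the space a file occupies with its logical size. The sketch below assumes st_blocks counts 512-byte units, as on Linux; the function names are made up, and this is not a description of how any particular tar version works.

```c
#include <assert.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Pure check: fewer allocated bytes than the logical size suggests holes. */
int looks_sparse(off_t size, blkcnt_t blocks_512)
{
	assert(size >= 0 && blocks_512 >= 0);
	return (off_t)blocks_512 * 512 < size;
}

/* Convenience wrapper around stat(2); returns -1 if stat fails. */
int file_looks_sparse(const char *path)
{
	struct stat sb;

	if (stat(path, &sb) != 0)
		return -1;
	return looks_sparse(sb.st_size, sb.st_blocks);
}
```

This only says that a file has holes, not where they are; locating them means scanning for zero-filled blocks or asking the filesystem, and neither approach could tell a hole from a nonresident range without the extra flag proposed in this thread.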

> Maybe you also want a new incompatible fs flag, just to make sure old
> kernels without proper understanding don't mess up the fs.

Definitely. You'd only need to set this if there were any nonresident
files, however, and those would probably only be created if you union
mounted with "-o nonres" or similar.

>> If they deleted /dev/hdb1, but still wanted whatever changes they had
>> made on /dev/hdb2, they could always get at them by remounting /dev/hdb2
>> somewhere _without_ "-o union", and use a modified tar to package up the
>> resident portions of files the same way it does for sparse files.
>> Naturally there would need to be a way to mark a sparse file's empty
>> spaces as nonresident if so desired when untarring.
>
> That's the old well-known (to some people) union-mount behaviour.

I'm just describing the whole idea in totality, so that everybody can
get an idea of what's going on.

> Really, your idea of a block (page, whatever) level granularity for
> copying data is nice.

I liked the idea of the existing linux sparse file support, so I based
it off that.

> It solves the biggest concern I had left for union mount. Actually
> implementing it, though, depends on quite a bit of infrastructure that
> just doesn't exist yet. Still, a very interesting idea.

For ext2/ext3, the sparse-file-support _does_ exist, so the only major
parts that need to be added are:
o An extra ext2/ext3 flag that indicates nonresidence (For both sparse
  files, normal files, and directories).
o VFS-level support for the union operation with hooks to let each
  filesystem do something special.





Cheers,
Kyle Moffett

-----BEGIN GEEK CODE BLOCK-----
Version: 3.12
GCM/CS/IT/U d- s++: a18 C++++>$ UB/L/X/*++++(+)>$ P+++(++++)>$
L++++(+++) E W++(+) N+++(++) o? K? w--- O? M++ V? PS+() PE+(-) Y+
PGP+++ t+(+++) 5 X R? tv-(--) b++++(++) DI+ D+ G e->++++$ h!*()>++$
r !y?(-)
------END GEEK CODE BLOCK------



2005-05-13 12:25:05

by Jörn Engel

Subject: Re: [ANNOUNCE] mini_fo-0.6.0 overlay file system

On Fri, 13 May 2005 07:26:14 -0400, Kyle Moffett wrote:
>
> > It solves the biggest concern I had left for union mount. Actually
> > implementing it, though, depends on quite a bit of infrastructure that
> > just doesn't exist yet. Still, a very interesting idea.
>
> For ext2/ext3, the sparse-file-support _does_ exist, so the only major
> parts that need to be added are:
> o An extra ext2/ext3 flag that indicates nonresidence (For both sparse
>   files, normal files, and directories).
> o VFS-level support for the union operation with hooks to let each
>   filesystem do something special.

That and replacing the page cache by something different. Page cache
is referencing pages by inode,offset pairs. Having a potentially
infinite amount of inodes to look at, in order, may require a tiny bit
of patching. ;)

Jörn

--
Linux [...] existed just for discussion between people who wanted
to show off how geeky they were.
-- Rob Enderle

2005-05-13 12:49:44

by Jan Blunck

Subject: Re: [ANNOUNCE] mini_fo-0.6.0 overlay file system

On 5/13/05, Kyle Moffett <[email protected]> wrote:
>
> I've been thinking about a "-o union" mount option for a while now, and
> I had a couple ideas on this topic.
>

So, I'm not the only one :) Actually, I'm working on a VFS based union
mounts implementation. There are still some major bugs which I want to
fix before posting the patches.

Some key features:
- VFS based approach, all file systems *should* work when mounted read-only
- unification of directory listings with readdir()
- copy-up with the help of Jörn's madcow patches to sendfile
- whiteout implementation for the VFS and ext2
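
How a unified directory listing and whiteouts interact can be shown with a two-layer toy model. Nothing here is taken from the unposted patches: `struct uentry`, the `whiteout` flag and `union_readdir()` are hypothetical names chosen for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical directory entry; `whiteout` marks a name deleted in the union. */
struct uentry {
	const char *name;
	int whiteout;
};

static const struct uentry *find_entry(const struct uentry *dir, size_t n,
				       const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(dir[i].name, name) == 0)
			return &dir[i];
	return NULL;
}

/*
 * Writes the merged listing into out[] and returns the number of names,
 * or (size_t)-1 if out[] is too small.
 */
size_t union_readdir(const struct uentry *top, size_t ntop,
		     const struct uentry *bottom, size_t nbottom,
		     const char **out, size_t cap)
{
	size_t count = 0;

	assert(out != NULL);

	/* Top-layer names come first; whiteouts themselves are never listed. */
	for (size_t i = 0; i < ntop; i++) {
		if (top[i].whiteout)
			continue;
		if (count == cap)
			return (size_t)-1;
		out[count++] = top[i].name;
	}
	/* Bottom-layer names appear only if the top layer does not mention them. */
	for (size_t i = 0; i < nbottom; i++) {
		if (find_entry(top, ntop, bottom[i].name))
			continue;
		if (count == cap)
			return (size_t)-1;
		out[count++] = bottom[i].name;
	}
	return count;
}
```

A whiteout in the top layer hides the same name in the read-only layer below, which is how a file can appear deleted without touching the lower filesystem.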

The patches basically implement a union stack with/on dentries. That
is working quite well but I still have some issues with the dcache. At
the moment other stuff has higher priorities.

Jan

2005-05-13 20:47:40

by Markus Klotzbuecher

Subject: Re: [ANNOUNCE] mini_fo-0.6.0 overlay file system

On Thu, May 12, 2005 at 11:18:36PM -0400, Kyle Moffett wrote:

> 2) When forced to copy data, the copy should be done in the context of
> whatever process is doing the "write" operation, and be interruptible,
> etc. The end result is that if you union an nfs mount over another one,
> it will just seem like a write to a big file takes a _really_ long time
> to complete.

This is what happens with mini_fo, provided your kernel is preemptive.

> 3) ext2/3 should get an extra flag for files and directories that
> indicates nonresidence. This would be used by the VFS union layer to
[...]

I like the idea of copying modified data on a per block basis. This
really would avoid unnecessarily long copy operations and potentially
save a lot of storage. But I think a unifying layer should not rely on
specialities such as sparse files or flags provided by the lower layer
file systems.

Markus

2005-05-13 21:01:38

by Kyle Moffett

Subject: Re: [ANNOUNCE] mini_fo-0.6.0 overlay file system

On May 13, 2005, at 08:24:51, Jörn Engel wrote:
> That and replacing the page cache by something different. Page cache
> is referencing pages by inode,offset pairs. Having a potentially
> infinite amount of inodes to look at, in order, may require a tiny bit
> of patching. ;)

Why modify the page cache (much)? In the sparse-nonresident-file case,
when a filesystem is asked to return a page-cache page mapped to the
file, it looks it up in itself. If that call returns ENONRESIDENT or
similar, the VFS union code would go to the next filesystem in the
stack and attempt a similar lookup. If any filesystem but the root
returns a page, then the unionfs code would map it read-only and COW.
When the COW triggers, it will call into the VFS union code, which will
either:
(a) If the topmost filesystem supports sparse-nonresident, then
have it allocate the new page on-disk.
(b) Otherwise, copy the whole file and modify the page.

Then it would return the new now-writeable pagecache page.
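
The lookup and copy-on-write decision outlined above, reduced to a toy model. It assumes the "root" in the description is the topmost, writable layer (index 0 here); `struct layer_file`, the fixed page count and both function names are hypothetical, and no page cache is involved.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PAGES 16

/* Hypothetical per-layer view of one file. */
struct layer_file {
	unsigned char resident[MAX_PAGES];	/* 1 if this layer holds the page */
	int supports_nonresident;		/* layer can store partial files */
};

enum cow_action { COW_ALLOC_PAGE, COW_COPY_WHOLE_FILE };

/*
 * Walks the stack from the topmost layer (index 0) downwards and returns the
 * index of the first layer holding the page, or -1 if none does. *writable is
 * set only when the page comes from the topmost layer; anything found lower
 * would be mapped read-only and copied on write.
 */
int union_find_page(const struct layer_file *stack, size_t depth,
		    size_t page, int *writable)
{
	assert(page < MAX_PAGES && writable != NULL);
	for (size_t i = 0; i < depth; i++) {
		if (stack[i].resident[page]) {
			*writable = (i == 0);
			return (int)i;
		}
	}
	*writable = 0;
	return -1;
}

/* Decides what a write to a lower-layer page costs, per cases (a) and (b). */
enum cow_action union_cow_action(const struct layer_file *top)
{
	return top->supports_nonresident ? COW_ALLOC_PAGE : COW_COPY_WHOLE_FILE;
}
```

Case (a) keeps the cost of a write at one page, while case (b) degrades to the whole-file copy-up that mini_fo performs today.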

Cheers,
Kyle Moffett

-----BEGIN GEEK CODE BLOCK-----
Version: 3.12
GCM/CS/IT/U d- s++: a18 C++++>$ UB/L/X/*++++(+)>$ P+++(++++)>$
L++++(+++) E W++(+) N+++(++) o? K? w--- O? M++ V? PS+() PE+(-) Y+
PGP+++ t+(+++) 5 X R? tv-(--) b++++(++) DI+ D+ G e->++++$ h!*()>++$
r !y?(-)
------END GEEK CODE BLOCK------