2008-01-16 16:47:17

by Paul Albrecht

Subject: unionfs, cow, and whiteout

Hi,

I have a question about how unionfs handles file deletion when a write
enabled file system is union mounted over a read only file system. For
example, I do something like the following:

mount -t unionfs -o dirs=/cow=rw:/rofs=ro unionfs /mnt

If I create and delete a file in /mnt which is not present in /rofs it
persists as whiteout in the cow file system which is not what I would
have expected.

Why does the deleted file persist as whiteout in the /cow file system of
the union mount?

Please cc me in your response as I'm not subscribed to the lkml.

--

Paul Albrecht


2008-01-16 17:15:07

by Erez Zadok

Subject: Re: unionfs, cow, and whiteout

In message <1200500833.11000.0.camel@thinix-laptop>, Paul Albrecht writes:
> Hi,
>
> I have a question about how unionfs handles file deletion when a write
> enabled file system is union mounted over a read only file system. For
> example, I do something like the following:
>
> mount -t unionfs -o dirs=/cow=rw:/rofs=ro unionfs /mnt
>
> If I create and delete a file in /mnt which is not present in /rofs it
> persists as whiteout in the cow file system which is not what I would
> have expected.

Paul, the alternative is to scan all branches (there could be many) to
check whether a file by that name exists anywhere else, and if so, try to
delete all instances of it. This was deemed too expensive and complex.

Another possible problem is that if you choose to insert a new branch in the
middle, and you didn't have the whiteout, you may re-expose the file name
unintentionally.

You might want to take a look at our unionfs-odf version: it places
whiteouts in a separate persistent store outside the branches, not as .wh.*
files in individual branches.

> Why does the deleted file persist as whiteout in the /cow file system of
> the union mount?
>
> Please cc me in your response as I'm not subscribed to the lkml.
>
> --
>
> Paul Albrecht

Cheers,
Erez.

2008-01-16 19:49:00

by Paul Albrecht

Subject: Re: unionfs, cow, and whiteout


On Wed, 2008-01-16 at 12:13 -0500, Erez Zadok wrote:
> In message <1200500833.11000.0.camel@thinix-laptop>, Paul Albrecht writes:
> > Hi,
> >
> > I have a question about how unionfs handles file deletion when a write
> > enabled file system is union mounted over a read only file system. For
> > example, I do something like the following:
> >
> > mount -t unionfs -o dirs=/cow=rw:/rofs=ro unionfs /mnt
> >
> > If I create and delete a file in /mnt which is not present in /rofs it
> > persists as whiteout in the cow file system which is not what I would
> > have expected.
>
> Paul, the alternative is to scan all branches (there could be many) to
> check whether a file by that name exists anywhere else, and if so, try to
> delete all instances of it. This was deemed too expensive and complex.
>

I'm not sure we're talking about the same problem. What I do is union
mount a write enabled file system like tmpfs over a read only file
system like squashfs.

There's no way to create, modify, or delete files in a squashed file
system; files can only be copied up when they're modified or whited out
when they're deleted.

Whenever a file is created in the union mount, it necessarily gets
created in tmpfs. When that file gets deleted, it gets whited out which
doesn't make sense because it doesn't exist in the other layer.

This is a problem because over time as files are created, modified, and
deleted whiteout cruft accumulates in the cow layer of the union mount.

Fixing the problem doesn't seem that complex and shouldn't require
searching all the layers of the union mount.

If the union file system simply took note of whether a file was created
in the cow layer because it's new or because it's been modified and
copied up from the read only file system, then it would simply delete
the file in the former case and use whiteout in the latter.
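
A toy Python model of the idea above, for illustration only. It is not
unionfs code: the class and method names are hypothetical, the state lives
only in memory, and it covers only files that are present in the cow layer
(it ignores deleting a read-only file that was never copied up, and
re-creating a name over an existing whiteout).

```python
class CowLayer:
    """Hypothetical model of the writable layer that remembers file origin."""

    def __init__(self):
        self.files = {}         # name -> "created" or "copied_up"
        self.whiteouts = set()  # names hidden in the read-only layer

    def create(self, name):
        # Brand-new file: nothing by this name exists in the lower layer.
        self.files[name] = "created"

    def copy_up(self, name):
        # File modified after being copied up from the read-only layer.
        self.files[name] = "copied_up"

    def unlink(self, name):
        origin = self.files.pop(name)
        if origin == "copied_up":
            # The lower layer still holds the name, so it must be masked.
            self.whiteouts.add(name)
        # A "created" file is simply removed; no whiteout is left behind.
```

With this model, creating and then deleting a file leaves the whiteout set
empty, while deleting a copied-up file adds one entry to it.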

> Another possible problem is that if you choose to insert a new branch in the
> middle, and you didn't have the whiteout, you may re-expose the file name
> unintentionally.
>

I don't see how a "deleted" file in a read only file system could be
re-exposed unless its whiteout in the cow layer was deleted, but that's
really not the issue.

What I'm objecting to is creating the whiteout in the cow layer when the
file didn't get there via a copy up from a read only file system. In
this case there's no worry about re-exposing the deleted file because
it's really deleted.

> You might want to take a look at our unionfs-odf version: it places
> whiteouts in a separate persistent store outside the branches, not as .wh.*
> files in individual branches.
>
> > Why does the deleted file persist as whiteout in the /cow file system of
> > the union mount?
> >
> > Please cc me in your response as I'm not subscribed to the lkml.
> >
> > --
> >
> > Paul Albrecht
>
> Cheers,
> Erez.
--

Paul Albrecht

2008-01-16 20:18:59

by Erez Zadok

Subject: Re: unionfs, cow, and whiteout

[I recommend we direct future discussions in this thread to the unionfs
ML. -ezk]

In message <1200512926.12092.33.camel@thinix-laptop>, Paul Albrecht writes:
[...]
> I'm not sure we're talking about the same problem. What I do is union
> mount a write enabled file system like tmpfs over a read only file
> system like squashfs.
>
> There's no way to create, modify, or delete files in a squashed file
> system; files can only be copied up when they're modified or whited out
> when they're deleted.
>
> Whenever a file is created in the union mount, it necessarily gets
> created in tmpfs. When that file gets deleted, it gets whited out which
> doesn't make sense because it doesn't exist in the other layer.
>
> This is a problem because over time as files are created, modified, and
> deleted whiteout cruft accumulates in the cow layer of the union mount.
>
> Fixing the problem doesn't seem that complex and shouldn't require
> searching all the layers of the union mount.

Paul, you're looking into a specific 2-branch configuration where one branch
is r-o and the other is r-w. Yes, in that specific case, one could argue
that a whiteout isn't needed. But what if I have N branches, with a mix of
rw/ro branches, where a file or its whiteout could exist in any branch? If
I don't create a whiteout, then I have to scan all N branches and remove the
same file from there (assuming the file doesn't exist on a r-o branch --
then I have to abort).
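
To make the trade-off concrete, here is a small Python sketch. It is a toy
model, not unionfs code: `Branch` and both function names are hypothetical,
and `WH_PREFIX` only mirrors the .wh.* naming mentioned earlier in this
thread. It contrasts a delete that writes a whiteout in the top branch with
a delete that has to examine every branch.

```python
WH_PREFIX = ".wh."  # whiteout naming as mentioned in this thread (.wh.* files)


class Branch:
    """Toy stand-in for one branch: a set of file names plus a writable flag."""

    def __init__(self, names, writable):
        self.names = set(names)
        self.writable = writable


def unlink_with_whiteout(branches, name):
    """Delete by touching only the leftmost (highest-priority) branch."""
    top = branches[0]
    top.names.discard(name)
    top.names.add(WH_PREFIX + name)
    return 1  # number of branches visited


def unlink_delete_all(branches, name):
    """Delete by scanning every branch; abort if a read-only one holds the name."""
    holders = [b for b in branches if name in b.names]
    if any(not b.writable for b in holders):
        raise PermissionError(f"{name} exists on a read-only branch")
    for b in holders:
        b.names.discard(name)
    return len(branches)  # every branch had to be examined
```

In this model the whiteout path always visits one branch, while the
delete-all path visits all N and fails outright when any copy sits on a
read-only branch.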

Note also that branches could be dynamically marked r-o or r-w over the
lifetime of the union: so a file which was deletable before may not be
deletable in the future.

We used to have several modes of operation, including one called
DELETE_ALL, which was similar to what you're asking for. But it complicated
the code considerably and most users didn't use that mode. So we opted for
simplicity and clarity of code, rather than having special cases for
different branch configurations.

If you're willing to open a feature-request report on
https://bugzilla.filesystems.org/, then we'll be happy to consider your
request and see how it can be incorporated while keeping the base code
devoid of special cases. Thanks.

> If the union file system simply took note of whether a file was created
> in the cow layer because it's new or because it's been modified and
> copied up from the read only file system, then it would simply delete
> the file in the former case and use whiteout in the latter.

Taking that "note" requires that the information survives a reboot; so I
can't store it in memory, but it has to be stored persistently. That would
complicate the code and one might as well use unionfs-odf instead.

> > Another possible problem is that if you choose to insert a new branch in
> > the middle, and you didn't have the whiteout, you may re-expose the file
> > name unintentionally.
> >
>
> I don't see how a "deleted" file in a read only file system could be
> re-exposed unless its whiteout in the cow layer was deleted, but that's
> really not the issue.

Suppose you have your two branches, you created a file X and deleted it.
Now, you insert a new branch in the *middle*, which has file X in it: do you
want that new file to show up in /bin/ls, or not? If you didn't create a
whiteout in the /cow layer, then file X will re-appear after the user
supposedly deleted it. (To be fair, the desired semantics here are not
clear -- some users may want it one way or another -- but I want to ensure a
*consistent* semantics that is simple to understand).
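
The scenario can be played out in a few lines of Python. This is a
simplified model under stated assumptions, not the real lookup code:
branches are plain sets of names ordered from highest to lowest priority,
a whiteout hides the name in its own branch and in every branch below it,
and `visible` and `insert_branch` are hypothetical helpers.

```python
WH_PREFIX = ".wh."  # whiteout naming as mentioned in this thread (.wh.* files)


def visible(branches, name):
    """Return True if `name` shows up in the union.

    `branches` is a list of sets of file names, highest priority first.
    """
    for names in branches:
        if WH_PREFIX + name in names:
            return False  # masked from this branch downward
        if name in names:
            return True
    return False


def insert_branch(branches, index, names):
    """Return a new branch list with a branch inserted at `index`."""
    return branches[:index] + [set(names)] + branches[index:]


# /cow kept the whiteout for X: the newly inserted X stays hidden.
with_whiteout = insert_branch([{WH_PREFIX + "X"}, set()], 1, {"X"})
# /cow kept no whiteout: the newly inserted X re-appears.
without_whiteout = insert_branch([set(), set()], 1, {"X"})
```

Here `visible(with_whiteout, "X")` is False and
`visible(without_whiteout, "X")` is True, which is the re-exposure
described above.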

> What I'm objecting to is creating the whiteout in the cow layer when the
> file didn't get there via a copy up from a read only file system. In
> this case there's no worry about re-exposing the deleted file because
> it's really deleted.

Paul, it really looks to me that you'd prefer the unionfs-odf version: it
has a flavor of the older delete-all mode. In unionfs-odf, we first try to
delete the file from all branches. If we can't (b/c of r-o branches/media),
then we create a whiteout in the (small) /odf partition. Therefore,
whiteouts are never stored in the main union'ed branches.
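
A rough Python sketch of that delete policy follows. It is one reading of
the description above rather than the unionfs-odf implementation:
`odf_style_unlink` is a hypothetical name, branches are modeled as
(names, writable) pairs, and a plain set stands in for the /odf store.

```python
def odf_style_unlink(branches, whiteout_store, name):
    """Remove `name` where possible; record a whiteout only if a copy remains.

    `branches` is a list of (names, writable) pairs, where `names` is a set.
    `whiteout_store` is a set standing in for the separate /odf partition.
    Returns True if a whiteout had to be recorded.
    """
    remaining = False
    for names, writable in branches:
        if name in names:
            if writable:
                names.discard(name)   # really delete from writable branches
            else:
                remaining = True      # read-only copy cannot be removed
    if remaining:
        whiteout_store.add(name)      # whiteout lives outside the branches
    return remaining
```

In the tmpfs-over-squashfs case, a file that exists only in the writable
branch is removed with nothing recorded, and a file that also exists in the
read-only branch produces one entry in the separate store.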

Cheers,
Erez.