by Christian Stroetmann

Subject: Re: [PATCH 14/39] union-mount: Union mounts documentation

Aloha Everybody;

On 24.08.2010 22:48, Valerie Aurora wrote:
> On Tue, Aug 24, 2010 at 11:28:37AM +0900, J. R. Okajima wrote:
>> Thank you for explanation, very much.

Me too

> You are welcome!
>
>> When a rename happens on a layer directly, aufs receives a
>> inotify/fsnotify event. Following the event type, aufs makes the cached
>> dentry/inode obsoleted and they will be lookup-ed again in the
>> succeeding access. Finally aufs will know the upper parent_dir1 is not
>> covering the lower parent_dir2 anymore.
>> This notification is the main purpose of the strict option which is
>> called "udba=notify" (User's Direct Branch Access).
> No, that's not a sufficient description and leaves open questions
> about all sorts of deadlocks and race conditions. For example,
> inotify events occur while holding locks only on one layer. You
> obviously need to lock the top layer to update the inheritance and
> parent-child relationships. Now you are locking the lower layer first
> and the top layer second, which is the reverse of the usual order.
> Also, it should not be an option.
>
> If Al Viro says it's wrong, you need a very detailed explanation of
> why it is right. See Documentation/filesystem/directory-locking for
> an example of the argument you have to make to show that moving things
> around on the lower layer is safe. In general, your first task is to
> show a global lock ordering to prove lack of deadlocks (which I don't
> think you should spend time on because most VFS experts think it is
> impossible to do with two read-write layers).

This all reminds me of the dining philosophers problem (classically with
five philosophers) and its solutions, especially the waiter and the
resource hierarchy solutions (see [1]).
And I do think that such problems can always be solved in a real-world
context, but the solutions often cost a lot of time and/or space.
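
As a rough illustration of the resource hierarchy solution, here is a
minimal user-space sketch. It assumes POSIX threads, and the names
lock_pair(), unlock_pair() and run_two_philosophers() are made up for this
example. It is not kernel or aufs code. The point is only that if every
thread takes the two locks in the same global order (here, ascending
address), no cycle of waiters can form.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical helper: take two mutexes in one fixed global order
 * (ascending address), so two threads can never each hold one lock
 * while waiting for the other. */
static void lock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    if (a == b) {
        pthread_mutex_lock(a);
        return;
    }
    if ((uintptr_t)a > (uintptr_t)b) {
        pthread_mutex_t *tmp = a;
        a = b;
        b = tmp;
    }
    pthread_mutex_lock(a);
    pthread_mutex_lock(b);
}

static void unlock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    pthread_mutex_unlock(a);
    if (a != b)
        pthread_mutex_unlock(b);
}

static pthread_mutex_t fork_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t fork_b = PTHREAD_MUTEX_INITIALIZER;
static long meals;                /* protected by holding both forks */

struct seat {
    pthread_mutex_t *left, *right;
};

static void *philosopher(void *arg)
{
    struct seat *s = arg;

    for (int i = 0; i < 10000; i++) {
        lock_pair(s->left, s->right);
        meals++;
        unlock_pair(s->left, s->right);
    }
    return NULL;
}

/* Two philosophers who name the forks in opposite order. */
static long run_two_philosophers(void)
{
    struct seat s1 = { &fork_a, &fork_b };
    struct seat s2 = { &fork_b, &fork_a };
    pthread_t t1, t2;

    meals = 0;
    pthread_create(&t1, NULL, philosopher, &s1);
    pthread_create(&t2, NULL, philosopher, &s2);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return meals;
}
```

If lock_pair() took the locks in argument order instead, the two threads
above could deadlock. The price of the fixed order is that every caller
must know all the locks it needs up front.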

> I'm not going to explain any more how aufs is wrong; it's the
> maintainer's job to convince Al Viro and other maintainers that aufs
> is right. But I hope this gave you a start and showed why union
> mounts is a preferred approach for many people.
>
> Thanks,
>
> -VAL

[1] http://en.wikipedia.org/wiki/Dining_philosophers_problem

Have fun
Christian

2010-08-25 05:04:43

by J. R. Okajima

Subject: Re: [PATCH 14/39] union-mount: Union mounts documentation

Valerie Aurora:
> No, that's not a sufficient description and leaves open questions
> about all sorts of deadlocks and race conditions. For example,
> inotify events occur while holding locks only on one layer. You
> obviously need to lock the top layer to update the inheritance and
> parent-child relationships. Now you are locking the lower layer first
> and the top layer second, which is the reverse of the usual order.

I don't agree about the deadlock and the race condition.
When a user modifies the dir hierarchy on the layer directly while
aufs_rename() is running, aufs will detect it after lock_rename().
It behaves like this:
- decide the layer where the actual rename operates, and create the dir
  hierarchy on it if necessary.
- lock_rename() for the layer
- call ->rename()
or
- if the file being renamed exists on the lower readonly layer, aufs will
  copy it up to the upper writable layer under the rename target name.
  In this case, ->rename() is not called.
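
The choice between the two paths above can be sketched as follows. This is
only an illustration: struct layer, enum rename_action and
choose_rename_action() are hypothetical names, not aufs identifiers, and
the real decision also involves which layers the targets exist on.

```c
#include <stdbool.h>

/* Hypothetical model of a layer (branch): only its rw/ro attribute. */
struct layer {
    bool writable;
};

enum rename_action {
    DO_LAYER_RENAME,       /* call ->rename() on the layer */
    DO_COPYUP_AS_TARGET    /* copy up under the target name, no ->rename() */
};

/* src_index is the index of the layer where the source file exists. */
static enum rename_action
choose_rename_action(const struct layer *layers, int src_index)
{
    if (layers[src_index].writable)
        return DO_LAYER_RENAME;
    /* Source lives on a readonly layer: it cannot be renamed in place. */
    return DO_COPYUP_AS_TARGET;
}
```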

If a user changes the dir hierarchy directly on the layer before
aufs_rename(), then the notify event tells aufs about it and aufs gets the
latest hierarchy.

If it happens before lock_rename() in aufs_rename(), aufs verifies the
relationship between the target child and the locked dir. If it differs,
aufs returns EBUSY. Of course, lock_rename() follows the "ancestors first"
order described in Documentation/filesystems/directory-locking.
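
The "lock, then verify, else EBUSY" step can be sketched in user space as
below. This is a simplified model, not aufs code: struct node and
lock_dir_and_verify() are invented for the example, and a pthread mutex
stands in for the directory's mutex on the layer.

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for a dentry on the layer. */
struct node {
    struct node *parent;
    pthread_mutex_t lock;    /* plays the role of the dir mutex */
};

/*
 * Lock dir, then check that child still hangs under it.
 * Returns 0 with dir->lock held, or -EBUSY with the lock released
 * when someone moved the child away before we got the lock.
 */
static int lock_dir_and_verify(struct node *dir, struct node *child)
{
    pthread_mutex_lock(&dir->lock);
    if (child->parent != dir) {
        pthread_mutex_unlock(&dir->lock);
        return -EBUSY;
    }
    return 0;
}
```

The check is only meaningful after the lock is taken; a check done before
locking could be invalidated by a concurrent rename on the layer.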

> around on the lower layer is safe. In general, your first task is to
> show a global lock ordering to prove lack of deadlocks (which I don't
> think you should spend time on because most VFS experts think it is
> impossible to do with two read-write layers).

Since you may not read this anymore and other people don't seem to
be interested in aufs, it may not be meaningful to write down how
locking works in aufs. But I will try.

First,
- since aufs is an FS, it has its own super_block, dentry and inode.
- super_block, dentry and inode in aufs have private data which contains
  an rwsem.
- the locking order for these rwsems is child-first.
- aufs specifies FS_RENAME_DOES_D_MOVE.

locking order in aufs_rename()
+ down_read() for aufs sb
  protects sb from branch-add and branch-delete.
+ two down_write()s for src and dst child
  protect them from other processes in aufs.
+ down_write() for the dst_parent.
+ decide the layer where we will operate, by comparing the index of
  layers where the targets exist and the layer attribute (ro, rw).
+ copyup the dst dir hierarchy if necessary, by repeating
  - dget_parent(), down/up_read() for the parent (in aufs)
  - mutex_lock() for the dir (on the layer) to mkdir the non-existing
    child dir on the layer and verify the parent-child relationship.
  - mkdir and setattr on the layer.
  - mutex_unlock() the dir on the layer.
+ test that they are rename-able
  if it is a dir, it must be empty (logically) or must not have children
  on multiple branches.
+ if src_parent and dst_parent differ, down_write() both. up_write() for
  dst_parent may be necessary to keep the "child-first" rule in aufs.

(from here the "sub-VFS" characteristic of aufs appears)
+ lock_rename() on the layer
  and verify every relationship between child and parent.
+ test that the src_child is deletable.
+ test that the dst_child is add-able, or deletable if it exists.
+ vfs_rename() on the layer, or copyup src_child under the dst_child name.
+ unlock_rename() on the layer

(return to aufs world)
+ d_drop() dst_child if necessary.
+ d_move()
+ up_write() for src_parent and dst_parent
+ up_write() for src_child and dst_child
+ up_read() for aufs sb

Strictly speaking, aufs_rename() handles more things, such as inode
attributes, whiteouts, opaque dirs, internal pointers to the objects on
the layer, and temporary dir names. But they are essentially unrelated to
the locking order, so I didn't describe them.

Thank you for reading this long mail.

J. R. Okajima