2012-08-15 09:27:00

by Cyrill Gorcunov

[permalink] [raw]
Subject: [patch 4/8] fs, exportfs: Add export_encode_inode_fh helper

To provide fsnotify object inodes being watched without
binding to alphabetical path we need to encode them with
exportfs help. This patch adds a helper which operates
with plain inodes directly.

Signed-off-by: Cyrill Gorcunov <[email protected]>
Acked-by: Pavel Emelyanov <[email protected]>
CC: Al Viro <[email protected]>
CC: Alexey Dobriyan <[email protected]>
CC: Andrew Morton <[email protected]>
CC: James Bottomley <[email protected]>
---
fs/exportfs/expfs.c | 19 +++++++++++++++++++
include/linux/exportfs.h | 2 ++
2 files changed, 21 insertions(+)

Index: linux-2.6.git/fs/exportfs/expfs.c
===================================================================
--- linux-2.6.git.orig/fs/exportfs/expfs.c
+++ linux-2.6.git/fs/exportfs/expfs.c
@@ -302,6 +302,25 @@ out:
return error;
}

+int export_encode_inode_fh(struct inode *inode, struct fid *fid, int *max_len)
+{
+ int len = *max_len;
+ int type = FILEID_INO32_GEN;
+
+ if (len < 2) {
+ *max_len = 2;
+ return 255;
+ }
+
+ len = 2;
+ fid->i32.ino = inode->i_ino;
+ fid->i32.gen = inode->i_generation;
+ *max_len = len;
+
+ return type;
+}
+EXPORT_SYMBOL_GPL(export_encode_inode_fh);
+
/**
* export_encode_fh - default export_operations->encode_fh function
* @inode: the object to encode
Index: linux-2.6.git/include/linux/exportfs.h
===================================================================
--- linux-2.6.git.orig/include/linux/exportfs.h
+++ linux-2.6.git/include/linux/exportfs.h
@@ -177,6 +177,8 @@ struct export_operations {
int (*commit_metadata)(struct inode *inode);
};

+extern int export_encode_inode_fh(struct inode *inode, struct fid *fid, int *max_len);
+
extern int exportfs_encode_fh(struct dentry *dentry, struct fid *fid,
int *max_len, int connectable);
extern struct dentry *exportfs_decode_fh(struct vfsmount *mnt, struct fid *fid,


2012-08-15 20:45:49

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [patch 4/8] fs, exportfs: Add export_encode_inode_fh helper

On Wed, Aug 15, 2012 at 01:21:20PM +0400, Cyrill Gorcunov wrote:
> To provide fsnotify object inodes being watched without
> binding to alphabetical path we need to encode them with
> exportfs help. This patch adds a helper which operates
> with plain inodes directly.

I don't get it--this seems like a really roundabout way to get inode and
generation number, if that's all you want.

On the other hand, if you want a real filehandle then wouldn't you want
to e.g. call the filesystem's ->encode_fh() if necessary, as
exportfs_encode_fh() does?

--b.

>
> Signed-off-by: Cyrill Gorcunov <[email protected]>
> Acked-by: Pavel Emelyanov <[email protected]>
> CC: Al Viro <[email protected]>
> CC: Alexey Dobriyan <[email protected]>
> CC: Andrew Morton <[email protected]>
> CC: James Bottomley <[email protected]>
> ---
> fs/exportfs/expfs.c | 19 +++++++++++++++++++
> include/linux/exportfs.h | 2 ++
> 2 files changed, 21 insertions(+)
>
> Index: linux-2.6.git/fs/exportfs/expfs.c
> ===================================================================
> --- linux-2.6.git.orig/fs/exportfs/expfs.c
> +++ linux-2.6.git/fs/exportfs/expfs.c
> @@ -302,6 +302,25 @@ out:
> return error;
> }
>
> +int export_encode_inode_fh(struct inode *inode, struct fid *fid, int *max_len)
> +{
> + int len = *max_len;
> + int type = FILEID_INO32_GEN;
> +
> + if (len < 2) {
> + *max_len = 2;
> + return 255;
> + }
> +
> + len = 2;
> + fid->i32.ino = inode->i_ino;
> + fid->i32.gen = inode->i_generation;
> + *max_len = len;
> +
> + return type;
> +}
> +EXPORT_SYMBOL_GPL(export_encode_inode_fh);
> +
> /**
> * export_encode_fh - default export_operations->encode_fh function
> * @inode: the object to encode
> Index: linux-2.6.git/include/linux/exportfs.h
> ===================================================================
> --- linux-2.6.git.orig/include/linux/exportfs.h
> +++ linux-2.6.git/include/linux/exportfs.h
> @@ -177,6 +177,8 @@ struct export_operations {
> int (*commit_metadata)(struct inode *inode);
> };
>
> +extern int export_encode_inode_fh(struct inode *inode, struct fid *fid, int *max_len);
> +
> extern int exportfs_encode_fh(struct dentry *dentry, struct fid *fid,
> int *max_len, int connectable);
> extern struct dentry *exportfs_decode_fh(struct vfsmount *mnt, struct fid *fid,
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2012-08-15 21:02:44

by Cyrill Gorcunov

[permalink] [raw]
Subject: Re: [patch 4/8] fs, exportfs: Add export_encode_inode_fh helper

On Wed, Aug 15, 2012 at 04:45:46PM -0400, J. Bruce Fields wrote:
> On Wed, Aug 15, 2012 at 01:21:20PM +0400, Cyrill Gorcunov wrote:
> > To provide fsnotify object inodes being watched without
> > binding to alphabetical path we need to encode them with
> > exportfs help. This patch adds a helper which operates
> > with plain inodes directly.
>
> I don't get it--this seems like a really roundabout way to get inode and
> generation number, if that's all you want.

We can re-open the targets via filehandle on restore, this was the idea.
All this series aimed to achieve the way to restore objects after checkpoit,
thus we need to provide additional information which would be enough.

> On the other hand, if you want a real filehandle then wouldn't you want
> to e.g. call the filesystem's ->encode_fh() if necessary, as
> exportfs_encode_fh() does?

Well, one of the problem I hit when I've been trying to use encode_fh
is that every new implementation of encode_fh will require some size
(even unknown) in buffer where encoded data pushed. Correct me please
if I'm wrong. But with export_encode_inode_fh there is a small buffer
with pretty known size needed on stack needed for printing data in
fdinfo.

Cyrill

2012-08-15 22:06:27

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [patch 4/8] fs, exportfs: Add export_encode_inode_fh helper

On Thu, Aug 16, 2012 at 01:02:37AM +0400, Cyrill Gorcunov wrote:
> On Wed, Aug 15, 2012 at 04:45:46PM -0400, J. Bruce Fields wrote:
> > On Wed, Aug 15, 2012 at 01:21:20PM +0400, Cyrill Gorcunov wrote:
> > > To provide fsnotify object inodes being watched without
> > > binding to alphabetical path we need to encode them with
> > > exportfs help. This patch adds a helper which operates
> > > with plain inodes directly.
> >
> > I don't get it--this seems like a really roundabout way to get inode and
> > generation number, if that's all you want.
>
> We can re-open the targets via filehandle on restore, this was the idea.
> All this series aimed to achieve the way to restore objects after checkpoit,
> thus we need to provide additional information which would be enough.

For this to work it'll need to be something you can pass to
open_by_handle_at, won't it?

> > On the other hand, if you want a real filehandle then wouldn't you want
> > to e.g. call the filesystem's ->encode_fh() if necessary, as
> > exportfs_encode_fh() does?
>
> Well, one of the problem I hit when I've been trying to use encode_fh
> is that every new implementation of encode_fh will require some size
> (even unknown) in buffer where encoded data pushed. Correct me please
> if I'm wrong. But with export_encode_inode_fh there is a small buffer
> with pretty known size needed on stack needed for printing data in
> fdinfo.

You can just give encode_fh a too-small data and let it fail if it's not
big enough.

(In practice I think everyone supports NFSv3 filehandles which have a
maximum size of 64 bytes.)

--b.

2012-08-16 06:24:54

by Cyrill Gorcunov

[permalink] [raw]
Subject: Re: [patch 4/8] fs, exportfs: Add export_encode_inode_fh helper

On Wed, Aug 15, 2012 at 06:06:23PM -0400, J. Bruce Fields wrote:
> On Thu, Aug 16, 2012 at 01:02:37AM +0400, Cyrill Gorcunov wrote:
> > On Wed, Aug 15, 2012 at 04:45:46PM -0400, J. Bruce Fields wrote:
> > > On Wed, Aug 15, 2012 at 01:21:20PM +0400, Cyrill Gorcunov wrote:
> > > > To provide fsnotify object inodes being watched without
> > > > binding to alphabetical path we need to encode them with
> > > > exportfs help. This patch adds a helper which operates
> > > > with plain inodes directly.
> > >
> > > I don't get it--this seems like a really roundabout way to get inode and
> > > generation number, if that's all you want.
> >
> > We can re-open the targets via filehandle on restore, this was the idea.
> > All this series aimed to achieve the way to restore objects after checkpoit,
> > thus we need to provide additional information which would be enough.
>
> For this to work it'll need to be something you can pass to
> open_by_handle_at, won't it?

Yes, and for inotify we print out this handled via fdinfo, later on restore
we use it together with open_by_handle_at.

>
> > > On the other hand, if you want a real filehandle then wouldn't you want
> > > to e.g. call the filesystem's ->encode_fh() if necessary, as
> > > exportfs_encode_fh() does?
> >
> > Well, one of the problem I hit when I've been trying to use encode_fh
> > is that every new implementation of encode_fh will require some size
> > (even unknown) in buffer where encoded data pushed. Correct me please
> > if I'm wrong. But with export_encode_inode_fh there is a small buffer
> > with pretty known size needed on stack needed for printing data in
> > fdinfo.
>
> You can just give encode_fh a too-small data and let it fail if it's not
> big enough.
>
> (In practice I think everyone supports NFSv3 filehandles which have a
> maximum size of 64 bytes.)

I'll think about it, thanks!

Cyrill

2012-08-16 12:38:23

by Cyrill Gorcunov

[permalink] [raw]
Subject: Re: [patch 4/8] fs, exportfs: Add export_encode_inode_fh helper

On Thu, Aug 16, 2012 at 10:24:48AM +0400, Cyrill Gorcunov wrote:
> > > > On the other hand, if you want a real filehandle then wouldn't you want
> > > > to e.g. call the filesystem's ->encode_fh() if necessary, as
> > > > exportfs_encode_fh() does?
> > >
> > > Well, one of the problem I hit when I've been trying to use encode_fh
> > > is that every new implementation of encode_fh will require some size
> > > (even unknown) in buffer where encoded data pushed. Correct me please
> > > if I'm wrong. But with export_encode_inode_fh there is a small buffer
> > > with pretty known size needed on stack needed for printing data in
> > > fdinfo.
> >
> > You can just give encode_fh a too-small data and let it fail if it's not
> > big enough.
> >
> > (In practice I think everyone supports NFSv3 filehandles which have a
> > maximum size of 64 bytes.)
>
> I'll think about it, thanks!

Hi Bruce, thinking a bit more I guess using general encode_fh is not that
convenient since it operates with dentries while our fdinfo output deals
with inodes. Thus I should either provide some new encode_fh variant
which would deal with inodes directly without "parents". Which doesn't
look for me anyhow better than the new export_encode_inode_fh helper.

After all, if the use of encode_fh become a mandatory rule we can easily
extend fsnotify fdinfo output to support new scheme without breaking
user space, because output looks like

| fhandle-type: 1 f_handle: 49b1060023552153

(ie if something is changed than these fields will be simply updated).

Or maybe I miss something?

Cyrill

2012-08-16 12:47:09

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [patch 4/8] fs, exportfs: Add export_encode_inode_fh helper

On Thu, Aug 16, 2012 at 04:38:14PM +0400, Cyrill Gorcunov wrote:
> On Thu, Aug 16, 2012 at 10:24:48AM +0400, Cyrill Gorcunov wrote:
> > > > > On the other hand, if you want a real filehandle then wouldn't you want
> > > > > to e.g. call the filesystem's ->encode_fh() if necessary, as
> > > > > exportfs_encode_fh() does?
> > > >
> > > > Well, one of the problem I hit when I've been trying to use encode_fh
> > > > is that every new implementation of encode_fh will require some size
> > > > (even unknown) in buffer where encoded data pushed. Correct me please
> > > > if I'm wrong. But with export_encode_inode_fh there is a small buffer
> > > > with pretty known size needed on stack needed for printing data in
> > > > fdinfo.
> > >
> > > You can just give encode_fh a too-small data and let it fail if it's not
> > > big enough.
> > >
> > > (In practice I think everyone supports NFSv3 filehandles which have a
> > > maximum size of 64 bytes.)
> >
> > I'll think about it, thanks!
>
> Hi Bruce, thinking a bit more I guess using general encode_fh is not that
> convenient since it operates with dentries while our fdinfo output deals
> with inodes. Thus I should either provide some new encode_fh variant
> which would deal with inodes directly without "parents".

I can't see why that wouldn't work.

> Which doesn't
> look for me anyhow better than the new export_encode_inode_fh helper.

That isn't going to work for filesystems that define their own
encode_fh:

$ git grep '\.encode_fh'
fs/btrfs/export.c: .encode_fh = btrfs_encode_fh,
fs/ceph/export.c: .encode_fh = ceph_encode_fh,
fs/fat/inode.c: .encode_fh = fat_encode_fh,
fs/fuse/inode.c: .encode_fh = fuse_encode_fh,
fs/gfs2/export.c: .encode_fh = gfs2_encode_fh,
fs/isofs/export.c: .encode_fh = isofs_export_encode_fh,
fs/nilfs2/namei.c: .encode_fh = nilfs_encode_fh,
fs/ocfs2/export.c: .encode_fh = ocfs2_encode_fh,
fs/reiserfs/super.c: .encode_fh = reiserfs_encode_fh,
fs/udf/namei.c: .encode_fh = udf_encode_fh,
fs/xfs/xfs_export.c: .encode_fh = xfs_fs_encode_fh,
mm/shmem.c: .encode_fh = shmem_encode_fh,

--b.

> After all, if the use of encode_fh become a mandatory rule we can easily
> extend fsnotify fdinfo output to support new scheme without breaking
> user space, because output looks like
>
> | fhandle-type: 1 f_handle: 49b1060023552153
>
> (ie if something is changed than these fields will be simply updated).
>
> Or maybe I miss something?
>
> Cyrill

2012-08-16 13:16:40

by Cyrill Gorcunov

[permalink] [raw]
Subject: Re: [patch 4/8] fs, exportfs: Add export_encode_inode_fh helper

On Thu, Aug 16, 2012 at 08:47:03AM -0400, J. Bruce Fields wrote:
> > Hi Bruce, thinking a bit more I guess using general encode_fh is not that
> > convenient since it operates with dentries while our fdinfo output deals
> > with inodes. Thus I should either provide some new encode_fh variant
> > which would deal with inodes directly without "parents".
>
> I can't see why that wouldn't work.
>
> > Which doesn't
> > look for me anyhow better than the new export_encode_inode_fh helper.
>
> That isn't going to work for filesystems that define their own
> encode_fh:
>
> $ git grep '\.encode_fh'
> fs/btrfs/export.c: .encode_fh = btrfs_encode_fh,
> fs/ceph/export.c: .encode_fh = ceph_encode_fh,
...

Agreed. I'll try to cook something.

2012-08-16 13:43:45

by Al Viro

[permalink] [raw]
Subject: Re: [patch 4/8] fs, exportfs: Add export_encode_inode_fh helper

On Thu, Aug 16, 2012 at 04:38:14PM +0400, Cyrill Gorcunov wrote:

> Hi Bruce, thinking a bit more I guess using general encode_fh is not that
> convenient since it operates with dentries while our fdinfo output deals
> with inodes. Thus I should either provide some new encode_fh variant
> which would deal with inodes directly without "parents". Which doesn't
> look for me anyhow better than the new export_encode_inode_fh helper.

Huh? You do have dentries, for crying out loud...

2012-08-16 13:47:35

by Pavel Emelyanov

[permalink] [raw]
Subject: Re: [patch 4/8] fs, exportfs: Add export_encode_inode_fh helper

On 08/16/2012 05:43 PM, Al Viro wrote:
> On Thu, Aug 16, 2012 at 04:38:14PM +0400, Cyrill Gorcunov wrote:
>
>> Hi Bruce, thinking a bit more I guess using general encode_fh is not that
>> convenient since it operates with dentries while our fdinfo output deals
>> with inodes. Thus I should either provide some new encode_fh variant
>> which would deal with inodes directly without "parents". Which doesn't
>> look for me anyhow better than the new export_encode_inode_fh helper.
>
> Huh? You do have dentries, for crying out loud...

Sometimes we don't -- the inotify thing gets an inode only.
Unlike other notifies that have dentries at hands...

> .
>

Thanks,
Pavel

2012-08-16 13:48:53

by Al Viro

[permalink] [raw]
Subject: Re: [patch 4/8] fs, exportfs: Add export_encode_inode_fh helper

On Thu, Aug 16, 2012 at 02:43:39PM +0100, Al Viro wrote:
> On Thu, Aug 16, 2012 at 04:38:14PM +0400, Cyrill Gorcunov wrote:
>
> > Hi Bruce, thinking a bit more I guess using general encode_fh is not that
> > convenient since it operates with dentries while our fdinfo output deals
> > with inodes. Thus I should either provide some new encode_fh variant
> > which would deal with inodes directly without "parents". Which doesn't
> > look for me anyhow better than the new export_encode_inode_fh helper.
>
> Huh? You do have dentries, for crying out loud...

Oww... You are using it for idiotify and the rest of that pile of
garbage? What a mess...

2012-08-16 13:50:22

by Al Viro

[permalink] [raw]
Subject: Re: [patch 4/8] fs, exportfs: Add export_encode_inode_fh helper

On Thu, Aug 16, 2012 at 05:47:06PM +0400, Pavel Emelyanov wrote:
> On 08/16/2012 05:43 PM, Al Viro wrote:
> > On Thu, Aug 16, 2012 at 04:38:14PM +0400, Cyrill Gorcunov wrote:
> >
> >> Hi Bruce, thinking a bit more I guess using general encode_fh is not that
> >> convenient since it operates with dentries while our fdinfo output deals
> >> with inodes. Thus I should either provide some new encode_fh variant
> >> which would deal with inodes directly without "parents". Which doesn't
> >> look for me anyhow better than the new export_encode_inode_fh helper.
> >
> > Huh? You do have dentries, for crying out loud...
>
> Sometimes we don't -- the inotify thing gets an inode only.
> Unlike other notifies that have dentries at hands...

What's wrong with saying "we don't support idiotify"?

2012-08-16 13:53:51

by Pavel Emelyanov

[permalink] [raw]
Subject: Re: [patch 4/8] fs, exportfs: Add export_encode_inode_fh helper

On 08/16/2012 05:50 PM, Al Viro wrote:
> On Thu, Aug 16, 2012 at 05:47:06PM +0400, Pavel Emelyanov wrote:
>> On 08/16/2012 05:43 PM, Al Viro wrote:
>>> On Thu, Aug 16, 2012 at 04:38:14PM +0400, Cyrill Gorcunov wrote:
>>>
>>>> Hi Bruce, thinking a bit more I guess using general encode_fh is not that
>>>> convenient since it operates with dentries while our fdinfo output deals
>>>> with inodes. Thus I should either provide some new encode_fh variant
>>>> which would deal with inodes directly without "parents". Which doesn't
>>>> look for me anyhow better than the new export_encode_inode_fh helper.
>>>
>>> Huh? You do have dentries, for crying out loud...
>>
>> Sometimes we don't -- the inotify thing gets an inode only.
>> Unlike other notifies that have dentries at hands...
>
> What's wrong with saying "we don't support idiotify"?

Lot's of exiting software uses them and we cannot say something like "we cannot
migrate a container with Apache running inside" :(

> .

Thanks,
Pavel

2012-08-16 13:54:56

by Cyrill Gorcunov

[permalink] [raw]
Subject: Re: [patch 4/8] fs, exportfs: Add export_encode_inode_fh helper

On Thu, Aug 16, 2012 at 02:50:19PM +0100, Al Viro wrote:
> On Thu, Aug 16, 2012 at 05:47:06PM +0400, Pavel Emelyanov wrote:
> > On 08/16/2012 05:43 PM, Al Viro wrote:
> > > On Thu, Aug 16, 2012 at 04:38:14PM +0400, Cyrill Gorcunov wrote:
> > >
> > >> Hi Bruce, thinking a bit more I guess using general encode_fh is not that
> > >> convenient since it operates with dentries while our fdinfo output deals
> > >> with inodes. Thus I should either provide some new encode_fh variant
> > >> which would deal with inodes directly without "parents". Which doesn't
> > >> look for me anyhow better than the new export_encode_inode_fh helper.
> > >
> > > Huh? You do have dentries, for crying out loud...
> >
> > Sometimes we don't -- the inotify thing gets an inode only.
> > Unlike other notifies that have dentries at hands...
>
> What's wrong with saying "we don't support idiotify"?

Al, we need some way to restore inotifies after checkpoint.
At the very early versions of these patches I simply added
dentry to the inotify mark thus once inotify created we always
have a dentry to refer on in encode_fh, but I'm not sure if
this will be good design.

Cyrill

2012-08-16 14:03:09

by James Bottomley

[permalink] [raw]
Subject: Re: [patch 4/8] fs, exportfs: Add export_encode_inode_fh helper

On Thu, 2012-08-16 at 17:54 +0400, Cyrill Gorcunov wrote:
> On Thu, Aug 16, 2012 at 02:50:19PM +0100, Al Viro wrote:
> > On Thu, Aug 16, 2012 at 05:47:06PM +0400, Pavel Emelyanov wrote:
> > > On 08/16/2012 05:43 PM, Al Viro wrote:
> > > > On Thu, Aug 16, 2012 at 04:38:14PM +0400, Cyrill Gorcunov wrote:
> > > >
> > > >> Hi Bruce, thinking a bit more I guess using general encode_fh is not that
> > > >> convenient since it operates with dentries while our fdinfo output deals
> > > >> with inodes. Thus I should either provide some new encode_fh variant
> > > >> which would deal with inodes directly without "parents". Which doesn't
> > > >> look for me anyhow better than the new export_encode_inode_fh helper.
> > > >
> > > > Huh? You do have dentries, for crying out loud...
> > >
> > > Sometimes we don't -- the inotify thing gets an inode only.
> > > Unlike other notifies that have dentries at hands...
> >
> > What's wrong with saying "we don't support idiotify"?
>
> Al, we need some way to restore inotifies after checkpoint.
> At the very early versions of these patches I simply added
> dentry to the inotify mark thus once inotify created we always
> have a dentry to refer on in encode_fh, but I'm not sure if
> this will be good design.

Actually, I was about to suggest this. This can be done internally
within fs/notify without actually modifying the syscall interface, can't
it, since they take a path which is used to obtain the inode? It looks
like the whole of the inotify interface could be internally recast to
use dentries instead of inodes. Unless I've missed something obvious?

James

????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m???? ????????I?

2012-08-16 14:14:10

by Pavel Emelyanov

[permalink] [raw]
Subject: Re: [patch 4/8] fs, exportfs: Add export_encode_inode_fh helper

On 08/16/2012 06:03 PM, James Bottomley wrote:
> On Thu, 2012-08-16 at 17:54 +0400, Cyrill Gorcunov wrote:
>> On Thu, Aug 16, 2012 at 02:50:19PM +0100, Al Viro wrote:
>>> On Thu, Aug 16, 2012 at 05:47:06PM +0400, Pavel Emelyanov wrote:
>>>> On 08/16/2012 05:43 PM, Al Viro wrote:
>>>>> On Thu, Aug 16, 2012 at 04:38:14PM +0400, Cyrill Gorcunov wrote:
>>>>>
>>>>>> Hi Bruce, thinking a bit more I guess using general encode_fh is not that
>>>>>> convenient since it operates with dentries while our fdinfo output deals
>>>>>> with inodes. Thus I should either provide some new encode_fh variant
>>>>>> which would deal with inodes directly without "parents". Which doesn't
>>>>>> look for me anyhow better than the new export_encode_inode_fh helper.
>>>>>
>>>>> Huh? You do have dentries, for crying out loud...
>>>>
>>>> Sometimes we don't -- the inotify thing gets an inode only.
>>>> Unlike other notifies that have dentries at hands...
>>>
>>> What's wrong with saying "we don't support idiotify"?
>>
>> Al, we need some way to restore inotifies after checkpoint.
>> At the very early versions of these patches I simply added
>> dentry to the inotify mark thus once inotify created we always
>> have a dentry to refer on in encode_fh, but I'm not sure if
>> this will be good design.
>
> Actually, I was about to suggest this. This can be done internally
> within fs/notify without actually modifying the syscall interface, can't
> it, since they take a path which is used to obtain the inode? It looks
> like the whole of the inotify interface could be internally recast to
> use dentries instead of inodes. Unless I've missed something obvious?

This will change the observable by userspace behavior. Various apps inotify a
file, then rename/unlink/link it or do tricks with mounts container the file.
And it works one way if the dentry+mount reference is 0 (now) and some other
way if it's not (after the proposed change).

The dentries-related behavior is especially bad on NFS with its silly-renames
decisions based on dentry reference counters. The mount-related one is bad in
general.

I'm saying this, because we were facing such problems at approx. once-a-week
rate when we did this in OpenVZ :(

> James
>

Thanks,
Pavel

2012-08-16 14:16:00

by Cyrill Gorcunov

[permalink] [raw]
Subject: Re: [patch 4/8] fs, exportfs: Add export_encode_inode_fh helper

On Thu, Aug 16, 2012 at 02:03:00PM +0000, James Bottomley wrote:
> > > What's wrong with saying "we don't support idiotify"?
> >
> > Al, we need some way to restore inotifies after checkpoint.
> > At the very early versions of these patches I simply added
> > dentry to the inotify mark thus once inotify created we always
> > have a dentry to refer on in encode_fh, but I'm not sure if
> > this will be good design.
>
> Actually, I was about to suggest this. This can be done internally
> within fs/notify without actually modifying the syscall interface, can't
> it, since they take a path which is used to obtain the inode? It looks
> like the whole of the inotify interface could be internally recast to
> use dentries instead of inodes. Unless I've missed something obvious?

Well, after looking into do_sys_name_to_handle->exportfs_encode_fh
sequence more precisely it seems it will be easier to extend
exportfs_encode_fh to support inodes directly instead of playing
with notify code (again, if i'm not missing something too).
i'm cooking a patch to show (once it's tested i'll send it out).

2012-08-16 14:41:58

by Al Viro

[permalink] [raw]
Subject: Re: [patch 4/8] fs, exportfs: Add export_encode_inode_fh helper

On Thu, Aug 16, 2012 at 06:15:53PM +0400, Cyrill Gorcunov wrote:
> On Thu, Aug 16, 2012 at 02:03:00PM +0000, James Bottomley wrote:
> > > > What's wrong with saying "we don't support idiotify"?
> > >
> > > Al, we need some way to restore inotifies after checkpoint.
> > > At the very early versions of these patches I simply added
> > > dentry to the inotify mark thus once inotify created we always
> > > have a dentry to refer on in encode_fh, but I'm not sure if
> > > this will be good design.
> >
> > Actually, I was about to suggest this. This can be done internally
> > within fs/notify without actually modifying the syscall interface, can't
> > it, since they take a path which is used to obtain the inode? It looks
> > like the whole of the inotify interface could be internally recast to
> > use dentries instead of inodes. Unless I've missed something obvious?
>
> Well, after looking into do_sys_name_to_handle->exportfs_encode_fh
> sequence more precisely it seems it will be easier to extend
> exportfs_encode_fh to support inodes directly instead of playing
> with notify code (again, if i'm not missing something too).
> i'm cooking a patch to show (once it's tested i'll send it out).

Good luck doing that with e.g. VFAT... And then there's such thing
as filesystems that don't have ->encode_fh() for a lot of very good
reasons; just try to do that on sysfs, for example. Or on ramfs,
for that matter... And while saying "you can't export that over
NFS" seems to work fine, idiotify-lovers will screech if you try
to ban their perversion of choice on those filesystems.

2012-08-16 14:48:42

by Cyrill Gorcunov

[permalink] [raw]
Subject: Re: [patch 4/8] fs, exportfs: Add export_encode_inode_fh helper

On Thu, Aug 16, 2012 at 03:41:52PM +0100, Al Viro wrote:
> On Thu, Aug 16, 2012 at 06:15:53PM +0400, Cyrill Gorcunov wrote:
> > On Thu, Aug 16, 2012 at 02:03:00PM +0000, James Bottomley wrote:
> > > > > What's wrong with saying "we don't support idiotify"?
> > > >
> > > > Al, we need some way to restore inotifies after checkpoint.
> > > > At the very early versions of these patches I simply added
> > > > dentry to the inotify mark thus once inotify created we always
> > > > have a dentry to refer on in encode_fh, but I'm not sure if
> > > > this will be good design.
> > >
> > > Actually, I was about to suggest this. This can be done internally
> > > within fs/notify without actually modifying the syscall interface, can't
> > > it, since they take a path which is used to obtain the inode? It looks
> > > like the whole of the inotify interface could be internally recast to
> > > use dentries instead of inodes. Unless I've missed something obvious?
> >
> > Well, after looking into do_sys_name_to_handle->exportfs_encode_fh
> > sequence more precisely it seems it will be easier to extend
> > exportfs_encode_fh to support inodes directly instead of playing
> > with notify code (again, if i'm not missing something too).
> > i'm cooking a patch to show (once it's tested i'll send it out).
>
> Good luck doing that with e.g. VFAT... And then there's such thing
> as filesystems that don't have ->encode_fh() for a lot of very good

Wait, Al, it seems I messed up. If some fs has no encode_fh() implemented
the default encoding with FILEID_INO32_GEN_PARENT will be used for that.

> reasons; just try to do that on sysfs, for example. Or on ramfs,
> for that matter... And while saying "you can't export that over
> NFS" seems to work fine, idiotify-lovers will screech if you try
> to ban their perversion of choice on those filesystems.

Cyrill

2012-08-16 14:54:40

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [patch 4/8] fs, exportfs: Add export_encode_inode_fh helper

On Thu, Aug 16, 2012 at 06:48:35PM +0400, Cyrill Gorcunov wrote:
> On Thu, Aug 16, 2012 at 03:41:52PM +0100, Al Viro wrote:
> > On Thu, Aug 16, 2012 at 06:15:53PM +0400, Cyrill Gorcunov wrote:
> > > On Thu, Aug 16, 2012 at 02:03:00PM +0000, James Bottomley wrote:
> > > > > > What's wrong with saying "we don't support idiotify"?
> > > > >
> > > > > Al, we need some way to restore inotifies after checkpoint.
> > > > > At the very early versions of these patches I simply added
> > > > > dentry to the inotify mark thus once inotify created we always
> > > > > have a dentry to refer on in encode_fh, but I'm not sure if
> > > > > this will be good design.
> > > >
> > > > Actually, I was about to suggest this. This can be done internally
> > > > within fs/notify without actually modifying the syscall interface, can't
> > > > it, since they take a path which is used to obtain the inode? It looks
> > > > like the whole of the inotify interface could be internally recast to
> > > > use dentries instead of inodes. Unless I've missed something obvious?
> > >
> > > Well, after looking into do_sys_name_to_handle->exportfs_encode_fh
> > > sequence more precisely it seems it will be easier to extend
> > > exportfs_encode_fh to support inodes directly instead of playing
> > > with notify code (again, if i'm not missing something too).
> > > i'm cooking a patch to show (once it's tested i'll send it out).
> >
> > Good luck doing that with e.g. VFAT... And then there's such thing
> > as filesystems that don't have ->encode_fh() for a lot of very good
>
> Wait, Al, it seems I messed up. If some fs has no encode_fh() implemented
> the default encoding with FILEID_INO32_GEN_PARENT will be used for that.

Yeah, but then try decoding it; the first two lines of exportfs_decode_fh:

if (!nop || !nop->fh_to_dentry)
return ERR_PTR(-ESTALE);

Fundamentally, if you're asking for something that you can use to look up an
inode on a filesystem (and that works even after the inode's diseappeared from
the inode cache), then you're asking for a filehandle. Filesystems that
currently don't support filehandles probably lack that support for some good
reason.

--b.

>
> > reasons; just try to do that on sysfs, for example. Or on ramfs,
> > for that matter... And while saying "you can't export that over
> > NFS" seems to work fine, idiotify-lovers will screech if you try
> > to ban their perversion of choice on those filesystems.
>
> Cyrill

2012-08-16 14:55:32

by Al Viro

[permalink] [raw]
Subject: Re: [patch 4/8] fs, exportfs: Add export_encode_inode_fh helper

On Thu, Aug 16, 2012 at 06:48:35PM +0400, Cyrill Gorcunov wrote:

> > Good luck doing that with e.g. VFAT... And then there's such thing
> > as filesystems that don't have ->encode_fh() for a lot of very good
>
> Wait, Al, it seems I messed up. If some fs has no encode_fh() implemented
> the default encoding with FILEID_INO32_GEN_PARENT will be used for that.

... which doesn't work for a lot of filesystems. Not if you want to be
able to decode the result afterwards and get something useful out of
that. Trying to implement ->fh_to_dentry(), especially with fhandle
generated by inode alone is going to be really interesting for a bunch
of stuff...

2012-08-16 14:57:10

by Cyrill Gorcunov

[permalink] [raw]
Subject: Re: [patch 4/8] fs, exportfs: Add export_encode_inode_fh helper

On Thu, Aug 16, 2012 at 08:47:03AM -0400, J. Bruce Fields wrote:
> On Thu, Aug 16, 2012 at 04:38:14PM +0400, Cyrill Gorcunov wrote:
> > On Thu, Aug 16, 2012 at 10:24:48AM +0400, Cyrill Gorcunov wrote:
> > > > > > On the other hand, if you want a real filehandle then wouldn't you want
> > > > > > to e.g. call the filesystem's ->encode_fh() if necessary, as
> > > > > > exportfs_encode_fh() does?
> > > > >
> > > > > Well, one of the problem I hit when I've been trying to use encode_fh
> > > > > is that every new implementation of encode_fh will require some size
> > > > > (even unknown) in buffer where encoded data pushed. Correct me please
> > > > > if I'm wrong. But with export_encode_inode_fh there is a small buffer
> > > > > with pretty known size needed on stack needed for printing data in
> > > > > fdinfo.
> > > >
> > > > You can just give encode_fh a too-small data and let it fail if it's not
> > > > big enough.
> > > >
> > > > (In practice I think everyone supports NFSv3 filehandles which have a
> > > > maximum size of 64 bytes.)
> > >
> > > I'll think about it, thanks!
> >
> > Hi Bruce, thinking a bit more I guess using general encode_fh is not that
> > convenient since it operates with dentries while our fdinfo output deals
> > with inodes. Thus I should either provide some new encode_fh variant
> > which would deal with inodes directly without "parents".
>
> I can't see why that wouldn't work.

Guys, would the patch below be more-less acceptible?
In inotify I think we could pass "parent" as NULL and use general
encode engine then (ie it will look like someone called for
name_to_handle_at on inotify target).

---
fs, exportfs: Add exportfs_encode_inode_fh helper

This helpers allows to use per-sb encode_fh
functionality where inodes come in as arguments
(say the caller has no dentries to provide).

Signed-off-by: Cyrill Gorcunov <[email protected]>
CC: Pavel Emelyanov <[email protected]>
CC: Al Viro <[email protected]>
CC: "J. Bruce Fields" <[email protected]>
CC: Alexey Dobriyan <[email protected]>
CC: Andrew Morton <[email protected]>
CC: James Bottomley <[email protected]>
---
fs/exportfs/expfs.c | 19 ++++++++++++++-----
include/linux/exportfs.h | 2 ++
2 files changed, 16 insertions(+), 5 deletions(-)

Index: linux-2.6.git/fs/exportfs/expfs.c
===================================================================
--- linux-2.6.git.orig/fs/exportfs/expfs.c
+++ linux-2.6.git/fs/exportfs/expfs.c
@@ -341,10 +341,21 @@ static int export_encode_fh(struct inode
return type;
}

+int exportfs_encode_inode_fh(struct inode *inode, struct fid *fid,
+ int *max_len, struct inode *parent)
+{
+ const struct export_operations *nop = inode->i_sb->s_export_op;
+
+ if (nop)
+ return nop->encode_fh(inode, fid->raw, max_len, parent);
+
+ return export_encode_fh(inode, fid, max_len, parent);
+}
+EXPORT_SYMBOL_GPL(exportfs_encode_inode_fh);
+
int exportfs_encode_fh(struct dentry *dentry, struct fid *fid, int *max_len,
int connectable)
{
- const struct export_operations *nop = dentry->d_sb->s_export_op;
int error;
struct dentry *p = NULL;
struct inode *inode = dentry->d_inode, *parent = NULL;
@@ -357,10 +368,8 @@ int exportfs_encode_fh(struct dentry *de
*/
parent = p->d_inode;
}
- if (nop->encode_fh)
- error = nop->encode_fh(inode, fid->raw, max_len, parent);
- else
- error = export_encode_fh(inode, fid, max_len, parent);
+
+ error = exportfs_encode_inode_fh(inode, fid, max_len, parent);
dput(p);

return error;
Index: linux-2.6.git/include/linux/exportfs.h
===================================================================
--- linux-2.6.git.orig/include/linux/exportfs.h
+++ linux-2.6.git/include/linux/exportfs.h
@@ -177,6 +177,8 @@ struct export_operations {
int (*commit_metadata)(struct inode *inode);
};

+extern int exportfs_encode_inode_fh(struct inode *inode, struct fid *fid,
+ int *max_len, struct inode *parent);
extern int exportfs_encode_fh(struct dentry *dentry, struct fid *fid,
int *max_len, int connectable);
extern struct dentry *exportfs_decode_fh(struct vfsmount *mnt, struct fid *fid,

2012-08-16 15:05:47

by Al Viro

[permalink] [raw]
Subject: Re: [patch 4/8] fs, exportfs: Add export_encode_inode_fh helper

On Thu, Aug 16, 2012 at 06:57:00PM +0400, Cyrill Gorcunov wrote:

> Guys, would the patch below be more-less acceptible?
> In inotify I think we could pass "parent" as NULL and use general
> encode engine then (ie it will look like someone called for
> name_to_handle_at on inotify target).

Wait. What the hell are you going to do with those afterwards?
Again, there's a shitload of filesystems that cannot be exported
over NFS, exactly because there's no way to implement sanely
working fhandles. And idiotify is allowed for all of them.

You *can't* decode anything fhandle-like on e.g. sysfs. Or procfs.
Or configfs. Or any network filesystem (I'd argue that we should simply
ban idiotify on those, but good luck doing that). You *can* do that
for FAT derivatives, but only if you have parent directory when creating
that sucker.

2012-08-16 15:06:36

by Pavel Emelyanov

[permalink] [raw]
Subject: Re: [patch 4/8] fs, exportfs: Add export_encode_inode_fh helper

On 08/16/2012 06:55 PM, Al Viro wrote:
> On Thu, Aug 16, 2012 at 06:48:35PM +0400, Cyrill Gorcunov wrote:
>
>>> Good luck doing that with e.g. VFAT... And then there's such thing
>>> as filesystems that don't have ->encode_fh() for a lot of very good
>>
>> Wait, Al, it seems I messed up. If some fs has no encode_fh() implemented
>> the default encoding with FILEID_INO32_GEN_PARENT will be used for that.
>
> ... which doesn't work for a lot of filesystems. Not if you want to be
> able to decode the result afterwards and get something useful out of
> that. Trying to implement ->fh_to_dentry(), especially with fhandle
> generated by inode alone is going to be really interesting for a bunch
> of stuff...
> .
>

Hm... Then I suppose the best we can do is -- show in a fdinfo file the inode
number, device where it is and a filehandle _iff_ provided by a filesystem.
For fanotify/dnotify -- only a path.

Thanks,
Pavel

2012-08-16 15:07:55

by Cyrill Gorcunov

[permalink] [raw]
Subject: Re: [patch 4/8] fs, exportfs: Add export_encode_inode_fh helper

On Thu, Aug 16, 2012 at 03:55:27PM +0100, Al Viro wrote:
> On Thu, Aug 16, 2012 at 06:48:35PM +0400, Cyrill Gorcunov wrote:
>
> > > Good luck doing that with e.g. VFAT... And then there's such thing
> > > as filesystems that don't have ->encode_fh() for a lot of very good
> >
> > Wait, Al, it seems I messed up. If some fs has no encode_fh() implemented
> > the default encoding with FILEID_INO32_GEN_PARENT will be used for that.
>
> ... which doesn't work for a lot of filesystems. Not if you want to be
> able to decode the result afterwards and get something useful out of
> that. Trying to implement ->fh_to_dentry(), especially with fhandle
> generated by inode alone is going to be really interesting for a bunch
> of stuff...

hmm, yup (and Bruce just pointed me to exportfs_decode_fh). So, if fs doesn't
provide encode_fh (and decode as well) but the FILEID_INO32_GEN_PARENT used
instead, we will not be able to restore such inotify later. Then I suppose
such application will be unrestorable at least for a while. Would not this
be acceptable trade off?

Cyrill

2012-08-16 15:09:37

by Cyrill Gorcunov

[permalink] [raw]
Subject: Re: [patch 4/8] fs, exportfs: Add export_encode_inode_fh helper

On Thu, Aug 16, 2012 at 04:05:42PM +0100, Al Viro wrote:
> On Thu, Aug 16, 2012 at 06:57:00PM +0400, Cyrill Gorcunov wrote:
>
> > Guys, would the patch below be more-less acceptible?
> > In inotify I think we could pass "parent" as NULL and use general
> > encode engine then (ie it will look like someone called for
> > name_to_handle_at on inotify target).
>
> Wait. What the hell are you going to do with those afterwards?
> Again, there's a shitload of filesystems that cannot be exported
> over NFS, exactly because there's no way to implement sanely
> working fhandles. And idiotify is allowed for all of them.

Yes, just recognized that, Al. There will be no way to restore them
from fhandle provided via fdinfo. /me scratching the head...

> You *can't* decode anything fhandle-like on e.g. sysfs. Or procfs.
> Or configfs. Or any network filesystem (I'd argue that we should simply
> ban idiotify on those, but good luck doing that). You *can* do that
> for FAT derivatives, but only if you have parent directory when creating
> that sucker.

Cyrill

2012-08-16 15:36:07

by Cyrill Gorcunov

[permalink] [raw]
Subject: Re: [patch 4/8] fs, exportfs: Add export_encode_inode_fh helper

On Thu, Aug 16, 2012 at 07:06:18PM +0400, Pavel Emelyanov wrote:
> On 08/16/2012 06:55 PM, Al Viro wrote:
> > On Thu, Aug 16, 2012 at 06:48:35PM +0400, Cyrill Gorcunov wrote:
> >
> >>> Good luck doing that with e.g. VFAT... And then there's such thing
> >>> as filesystems that don't have ->encode_fh() for a lot of very good
> >>
> >> Wait, Al, it seems I messed up. If some fs has no encode_fh() implemented
> >> the default encoding with FILEID_INO32_GEN_PARENT will be used for that.
> >
> > ... which doesn't work for a lot of filesystems. Not if you want to be
> > able to decode the result afterwards and get something useful out of
> > that. Trying to implement ->fh_to_dentry(), especially with fhandle
> > generated by inode alone is going to be really interesting for a bunch
> > of stuff...
> > .
> >
>
> Hm... Then I suppose the best we can do is -- show in a fdinfo file the inode
> number, device where it is and a filehandle _iff_ provided by a filesystem.
> For fanotify/dnotify -- only a path.

For notifications on mount points it's not a problem (we already do that

+ ret = seq_printf(m, "fanotify mnt_id: %8x mask: %8x ignored_mask: %8x\n",
+ mnt->mnt_id, mark->mask, mark->ignored_mask);

printing inode number and device it's easy as well. Fetching the path is not
obvious for me since inotify carries inodes only. To generate path I would
have to obtain dentry from inode I suppose, that's what you mean?

Cyrill

2012-08-17 20:58:53

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [patch 4/8] fs, exportfs: Add export_encode_inode_fh helper

Al Viro <[email protected]> writes:

> On Thu, Aug 16, 2012 at 06:15:53PM +0400, Cyrill Gorcunov wrote:
>> On Thu, Aug 16, 2012 at 02:03:00PM +0000, James Bottomley wrote:
>> > > > What's wrong with saying "we don't support idiotify"?
>> > >
>> > > Al, we need some way to restore inotifies after checkpoint.
>> > > At the very early versions of these patches I simply added
>> > > dentry to the inotify mark thus once inotify created we always
>> > > have a dentry to refer on in encode_fh, but I'm not sure if
>> > > this will be good design.
>> >
>> > Actually, I was about to suggest this. This can be done internally
>> > within fs/notify without actually modifying the syscall interface, can't
>> > it, since they take a path which is used to obtain the inode? It looks
>> > like the whole of the inotify interface could be internally recast to
>> > use dentries instead of inodes. Unless I've missed something obvious?
>>
>> Well, after looking into do_sys_name_to_handle->exportfs_encode_fh
>> sequence more precisely it seems it will be easier to extend
>> exportfs_encode_fh to support inodes directly instead of playing
>> with notify code (again, if i'm not missing something too).
>> i'm cooking a patch to show (once it's tested i'll send it out).
>
> Good luck doing that with e.g. VFAT... And then there's such thing
> as filesystems that don't have ->encode_fh() for a lot of very good
> reasons; just try to do that on sysfs, for example. Or on ramfs,
> for that matter... And while saying "you can't export that over
> NFS" seems to work fine, idiotify-lovers will screech if you try
> to ban their perversion of choice on those filesystems.

For whatever it is worth inotify does not currently work on sysfs or
procfs or any other filesystem that looks like a network filesystem and
whose modifications don't proceed through the vfs like a normal
filesystem.

Eric

2012-08-20 14:28:56

by Aneesh Kumar K.V

[permalink] [raw]
Subject: Re: [patch 4/8] fs, exportfs: Add export_encode_inode_fh helper

Cyrill Gorcunov <[email protected]> writes:

> To provide fsnotify object inodes being watched without
> binding to alphabetical path we need to encode them with
> exportfs help. This patch adds a helper which operates
> with plain inodes directly.

doesn't name_to_handle_at() work for you ? It also allows to get a file
handle using file descriptor.

>
> Signed-off-by: Cyrill Gorcunov <[email protected]>
> Acked-by: Pavel Emelyanov <[email protected]>
> CC: Al Viro <[email protected]>
> CC: Alexey Dobriyan <[email protected]>
> CC: Andrew Morton <[email protected]>
> CC: James Bottomley <[email protected]>
> ---
> fs/exportfs/expfs.c | 19 +++++++++++++++++++
> include/linux/exportfs.h | 2 ++
> 2 files changed, 21 insertions(+)
>
> Index: linux-2.6.git/fs/exportfs/expfs.c
> ===================================================================
> --- linux-2.6.git.orig/fs/exportfs/expfs.c
> +++ linux-2.6.git/fs/exportfs/expfs.c
> @@ -302,6 +302,25 @@ out:
> return error;
> }
>
> +int export_encode_inode_fh(struct inode *inode, struct fid *fid, int *max_len)
> +{
> + int len = *max_len;
> + int type = FILEID_INO32_GEN;
> +
> + if (len < 2) {
> + *max_len = 2;
> + return 255;
> + }
> +
> + len = 2;
> + fid->i32.ino = inode->i_ino;
> + fid->i32.gen = inode->i_generation;
> + *max_len = len;
> +
> + return type;
> +}
> +EXPORT_SYMBOL_GPL(export_encode_inode_fh);
> +


If you are looking at getting file handle, that may not be
sufficient. Some file system put more info in file handle (btrfs)

> /**
> * export_encode_fh - default export_operations->encode_fh function
> * @inode: the object to encode
> Index: linux-2.6.git/include/linux/exportfs.h
> ===================================================================
> --- linux-2.6.git.orig/include/linux/exportfs.h
> +++ linux-2.6.git/include/linux/exportfs.h
> @@ -177,6 +177,8 @@ struct export_operations {
> int (*commit_metadata)(struct inode *inode);
> };
>
> +extern int export_encode_inode_fh(struct inode *inode, struct fid *fid, int *max_len);
> +
> extern int exportfs_encode_fh(struct dentry *dentry, struct fid *fid,
> int *max_len, int connectable);
> extern struct dentry *exportfs_decode_fh(struct vfsmount *mnt, struct fid *fid,
>

-aneesh

2012-08-20 16:33:45

by Cyrill Gorcunov

[permalink] [raw]
Subject: Re: [patch 4/8] fs, exportfs: Add export_encode_inode_fh helper

On Mon, Aug 20, 2012 at 07:49:23PM +0530, Aneesh Kumar K.V wrote:
> Cyrill Gorcunov <[email protected]> writes:
>
> > To provide fsnotify object inodes being watched without
> > binding to alphabetical path we need to encode them with
> > exportfs help. This patch adds a helper which operates
> > with plain inodes directly.
>
> doesn't name_to_handle_at() work for you ? It also allows to get a file
> handle using file descriptor.

Hi, sorry for dealy. Well, the last idea is to get rid of this helper,
I've sent out an updated version where ino+dev is only printed.

Cyrill

2012-08-20 18:32:33

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [patch 4/8] fs, exportfs: Add export_encode_inode_fh helper

On Mon, Aug 20, 2012 at 08:33:38PM +0400, Cyrill Gorcunov wrote:
> On Mon, Aug 20, 2012 at 07:49:23PM +0530, Aneesh Kumar K.V wrote:
> > Cyrill Gorcunov <[email protected]> writes:
> >
> > > To provide fsnotify object inodes being watched without
> > > binding to alphabetical path we need to encode them with
> > > exportfs help. This patch adds a helper which operates
> > > with plain inodes directly.
> >
> > doesn't name_to_handle_at() work for you ? It also allows to get a file
> > handle using file descriptor.
>
> Hi, sorry for dealy. Well, the last idea is to get rid of this helper,
> I've sent out an updated version where ino+dev is only printed.

I don't understand how ino and dev are useful to you, though, if you're
still hoping to be able to look up inodes using this information later
on.

--b.

2012-08-20 19:06:17

by Cyrill Gorcunov

[permalink] [raw]
Subject: Re: [patch 4/8] fs, exportfs: Add export_encode_inode_fh helper

On Mon, Aug 20, 2012 at 02:32:25PM -0400, J. Bruce Fields wrote:
> On Mon, Aug 20, 2012 at 08:33:38PM +0400, Cyrill Gorcunov wrote:
> > On Mon, Aug 20, 2012 at 07:49:23PM +0530, Aneesh Kumar K.V wrote:
> > > Cyrill Gorcunov <[email protected]> writes:
> > >
> > > > To provide fsnotify object inodes being watched without
> > > > binding to alphabetical path we need to encode them with
> > > > exportfs help. This patch adds a helper which operates
> > > > with plain inodes directly.
> > >
> > > doesn't name_to_handle_at() work for you ? It also allows to get a file
> > > handle using file descriptor.
> >
> > Hi, sorry for dealy. Well, the last idea is to get rid of this helper,
> > I've sent out an updated version where ino+dev is only printed.
>
> I don't understand how ino and dev are useful to you, though, if you're
> still hoping to be able to look up inodes using this information later
> on.

Hi Bruce, I believe having ino+dev is better than nothing. Otherwise we
simply have no clue which targets are bound to inotify mark. Sometime
(!) we can try to generate fhandle in userspace from this ino+dev bundle
and then open the target file.

Cyrill

2012-08-20 19:32:10

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [patch 4/8] fs, exportfs: Add export_encode_inode_fh helper

On Mon, Aug 20, 2012 at 11:06:06PM +0400, Cyrill Gorcunov wrote:
> On Mon, Aug 20, 2012 at 02:32:25PM -0400, J. Bruce Fields wrote:
> > On Mon, Aug 20, 2012 at 08:33:38PM +0400, Cyrill Gorcunov wrote:
> > > On Mon, Aug 20, 2012 at 07:49:23PM +0530, Aneesh Kumar K.V wrote:
> > > > Cyrill Gorcunov <[email protected]> writes:
> > > >
> > > > > To provide fsnotify object inodes being watched without
> > > > > binding to alphabetical path we need to encode them with
> > > > > exportfs help. This patch adds a helper which operates
> > > > > with plain inodes directly.
> > > >
> > > > doesn't name_to_handle_at() work for you ? It also allows to get a file
> > > > handle using file descriptor.
> > >
> > > Hi, sorry for dealy. Well, the last idea is to get rid of this helper,
> > > I've sent out an updated version where ino+dev is only printed.
> >
> > I don't understand how ino and dev are useful to you, though, if you're
> > still hoping to be able to look up inodes using this information later
> > on.
>
> Hi Bruce, I believe having ino+dev is better than nothing. Otherwise we
> simply have no clue which targets are bound to inotify mark. Sometime
> (!) we can try to generate fhandle in userspace from this ino+dev bundle
> and then open the target file.

That's insufficient to generate a filehandle in general.

(Also: there's the usual inode-number aliasing problem: the inode number
could get reused by another file. Unless you know the file is being
held open the whole time.)

--b.

2012-08-20 19:40:45

by Cyrill Gorcunov

[permalink] [raw]
Subject: Re: [patch 4/8] fs, exportfs: Add export_encode_inode_fh helper

On Mon, Aug 20, 2012 at 03:32:04PM -0400, J. Bruce Fields wrote:
> > > > Hi, sorry for dealy. Well, the last idea is to get rid of this helper,
> > > > I've sent out an updated version where ino+dev is only printed.
> > >
> > > I don't understand how ino and dev are useful to you, though, if you're
> > > still hoping to be able to look up inodes using this information later
> > > on.
> >
> > Hi Bruce, I believe having ino+dev is better than nothing. Otherwise we
> > simply have no clue which targets are bound to inotify mark. Sometime
> > (!) we can try to generate fhandle in userspace from this ino+dev bundle
> > and then open the target file.
>
> That's insufficient to generate a filehandle in general.

That's why I said `sometime', and this is better than nothing.

> (Also: there's the usual inode-number aliasing problem: the inode number
> could get reused by another file. Unless you know the file is being
> held open the whole time.)

For c/r session inode should exist and not get reused.

Cyrill

2012-08-21 09:18:47

by Pavel Emelyanov

[permalink] [raw]
Subject: Re: [patch 4/8] fs, exportfs: Add export_encode_inode_fh helper

On 08/20/2012 11:32 PM, J. Bruce Fields wrote:
> On Mon, Aug 20, 2012 at 11:06:06PM +0400, Cyrill Gorcunov wrote:
>> On Mon, Aug 20, 2012 at 02:32:25PM -0400, J. Bruce Fields wrote:
>>> On Mon, Aug 20, 2012 at 08:33:38PM +0400, Cyrill Gorcunov wrote:
>>>> On Mon, Aug 20, 2012 at 07:49:23PM +0530, Aneesh Kumar K.V wrote:
>>>>> Cyrill Gorcunov <[email protected]> writes:
>>>>>
>>>>>> To provide fsnotify object inodes being watched without
>>>>>> binding to alphabetical path we need to encode them with
>>>>>> exportfs help. This patch adds a helper which operates
>>>>>> with plain inodes directly.
>>>>>
>>>>> doesn't name_to_handle_at() work for you ? It also allows to get a file
>>>>> handle using file descriptor.
>>>>
>>>> Hi, sorry for dealy. Well, the last idea is to get rid of this helper,
>>>> I've sent out an updated version where ino+dev is only printed.
>>>
>>> I don't understand how ino and dev are useful to you, though, if you're
>>> still hoping to be able to look up inodes using this information later
>>> on.
>>
>> Hi Bruce, I believe having ino+dev is better than nothing. Otherwise we
>> simply have no clue which targets are bound to inotify mark. Sometime
>> (!) we can try to generate fhandle in userspace from this ino+dev bundle
>> and then open the target file.
>
> That's insufficient to generate a filehandle in general.

Yes, sure, but for live migration having inode and device is enough and that's why.
We can use two ways of having a filesystem on the target machine in the same
state (from paths points of view) as it was on destination one:

1. copy file tree in a rsync manner
2. copy a virtual disk image file

In the 1st case we can map inode number to path easily, since we iterate over a filesystem
anyway. I agree, that rsync is not perfect for migration but still.

In the 2nd case we can generate filehandle out of an inode number only since we _do_ know
that inode will not get reused.


However, if you have some better ideas on what information about inode should be exported
to the userspace please share.

> (Also: there's the usual inode-number aliasing problem: the inode number
> could get reused by another file. Unless you know the file is being
> held open the whole time.)
>
> --b.
> .
>

Thanks,
Pavel

2012-08-21 10:43:40

by Aneesh Kumar K.V

[permalink] [raw]
Subject: Re: [patch 4/8] fs, exportfs: Add export_encode_inode_fh helper

Pavel Emelyanov <[email protected]> writes:

> On 08/20/2012 11:32 PM, J. Bruce Fields wrote:
>> On Mon, Aug 20, 2012 at 11:06:06PM +0400, Cyrill Gorcunov wrote:
>>> On Mon, Aug 20, 2012 at 02:32:25PM -0400, J. Bruce Fields wrote:
>>>> On Mon, Aug 20, 2012 at 08:33:38PM +0400, Cyrill Gorcunov wrote:
>>>>> On Mon, Aug 20, 2012 at 07:49:23PM +0530, Aneesh Kumar K.V wrote:
>>>>>> Cyrill Gorcunov <[email protected]> writes:
>>>>>>
>>>>>>> To provide fsnotify object inodes being watched without
>>>>>>> binding to alphabetical path we need to encode them with
>>>>>>> exportfs help. This patch adds a helper which operates
>>>>>>> with plain inodes directly.
>>>>>>
>>>>>> doesn't name_to_handle_at() work for you ? It also allows to get a file
>>>>>> handle using file descriptor.
>>>>>
>>>>> Hi, sorry for dealy. Well, the last idea is to get rid of this helper,
>>>>> I've sent out an updated version where ino+dev is only printed.
>>>>
>>>> I don't understand how ino and dev are useful to you, though, if you're
>>>> still hoping to be able to look up inodes using this information later
>>>> on.
>>>
>>> Hi Bruce, I believe having ino+dev is better than nothing. Otherwise we
>>> simply have no clue which targets are bound to inotify mark. Sometime
>>> (!) we can try to generate fhandle in userspace from this ino+dev bundle
>>> and then open the target file.
>>
>> That's insufficient to generate a filehandle in general.
>
> Yes, sure, but for live migration having inode and device is enough and that's why.
> We can use two ways of having a filesystem on the target machine in the same
> state (from paths points of view) as it was on destination one:
>
> 1. copy file tree in a rsync manner
> 2. copy a virtual disk image file
>
> In the 1st case we can map inode number to path easily, since we iterate over a filesystem
> anyway. I agree, that rsync is not perfect for migration but still.
>
> In the 2nd case we can generate filehandle out of an inode number only since we _do_ know
> that inode will not get reused.

If you are going to to use open_by_handle, then that handle is not
sufficient right ? Or do you have open_by_inode ? as part of c/r ?

>
>
> However, if you have some better ideas on what information about inode should be exported
> to the userspace please share.
>

Why not use name_to_handle(fd,...) and open_by_handle(handle,..) ?


>> (Also: there's the usual inode-number aliasing problem: the inode number
>> could get reused by another file. Unless you know the file is being
>> held open the whole time.)
>>

-aneesh

2012-08-21 10:50:09

by Pavel Emelyanov

[permalink] [raw]
Subject: Re: [patch 4/8] fs, exportfs: Add export_encode_inode_fh helper

On 08/21/2012 02:42 PM, Aneesh Kumar K.V wrote:
> Pavel Emelyanov <[email protected]> writes:
>
>> On 08/20/2012 11:32 PM, J. Bruce Fields wrote:
>>> On Mon, Aug 20, 2012 at 11:06:06PM +0400, Cyrill Gorcunov wrote:
>>>> On Mon, Aug 20, 2012 at 02:32:25PM -0400, J. Bruce Fields wrote:
>>>>> On Mon, Aug 20, 2012 at 08:33:38PM +0400, Cyrill Gorcunov wrote:
>>>>>> On Mon, Aug 20, 2012 at 07:49:23PM +0530, Aneesh Kumar K.V wrote:
>>>>>>> Cyrill Gorcunov <[email protected]> writes:
>>>>>>>
>>>>>>>> To provide fsnotify object inodes being watched without
>>>>>>>> binding to alphabetical path we need to encode them with
>>>>>>>> exportfs help. This patch adds a helper which operates
>>>>>>>> with plain inodes directly.
>>>>>>>
>>>>>>> doesn't name_to_handle_at() work for you ? It also allows to get a file
>>>>>>> handle using file descriptor.
>>>>>>
>>>>>> Hi, sorry for dealy. Well, the last idea is to get rid of this helper,
>>>>>> I've sent out an updated version where ino+dev is only printed.
>>>>>
>>>>> I don't understand how ino and dev are useful to you, though, if you're
>>>>> still hoping to be able to look up inodes using this information later
>>>>> on.
>>>>
>>>> Hi Bruce, I believe having ino+dev is better than nothing. Otherwise we
>>>> simply have no clue which targets are bound to inotify mark. Sometime
>>>> (!) we can try to generate fhandle in userspace from this ino+dev bundle
>>>> and then open the target file.
>>>
>>> That's insufficient to generate a filehandle in general.
>>
>> Yes, sure, but for live migration having inode and device is enough and that's why.
>> We can use two ways of having a filesystem on the target machine in the same
>> state (from paths points of view) as it was on destination one:
>>
>> 1. copy file tree in a rsync manner
>> 2. copy a virtual disk image file
>>
>> In the 1st case we can map inode number to path easily, since we iterate over a filesystem
>> anyway. I agree, that rsync is not perfect for migration but still.
>>
>> In the 2nd case we can generate filehandle out of an inode number only since we _do_ know
>> that inode will not get reused.
>
> If you are going to to use open_by_handle, then that handle is not
> sufficient right ? Or do you have open_by_inode ? as part of c/r ?

Why? For e.g. ext4 you can construct a handle in userspace and open by it.

>>
>> However, if you have some better ideas on what information about inode should be exported
>> to the userspace please share.
>>
>
> Why not use name_to_handle(fd,...) and open_by_handle(handle,..) ?

Because we don't have an fd at hands by the time we need to know the handle.

>
>>> (Also: there's the usual inode-number aliasing problem: the inode number
>>> could get reused by another file. Unless you know the file is being
>>> held open the whole time.)
>>>
>
> -aneesh
>
> .
>

Thanks,
Pavel

2012-08-21 10:54:34

by Cyrill Gorcunov

[permalink] [raw]
Subject: Re: [patch 4/8] fs, exportfs: Add export_encode_inode_fh helper

On Tue, Aug 21, 2012 at 02:49:47PM +0400, Pavel Emelyanov wrote:
> >>
> >> However, if you have some better ideas on what information about inode should be exported
> >> to the userspace please share.
> >>
> >
> > Why not use name_to_handle(fd,...) and open_by_handle(handle,..) ?
>
> Because we don't have an fd at hands by the time we need to know the handle.

Yeah, this might be not clear from patchset itself but inotify marks carry
inodes inside kernel thus it's inodes what we can use when we fetch information
about targets and put it into fdinfo output.

Cyrill

2012-08-21 11:09:28

by Pavel Emelyanov

[permalink] [raw]
Subject: Re: [patch 4/8] fs, exportfs: Add export_encode_inode_fh helper

On 08/21/2012 02:54 PM, Cyrill Gorcunov wrote:
> On Tue, Aug 21, 2012 at 02:49:47PM +0400, Pavel Emelyanov wrote:
>>>>
>>>> However, if you have some better ideas on what information about inode should be exported
>>>> to the userspace please share.
>>>>
>>>
>>> Why not use name_to_handle(fd,...) and open_by_handle(handle,..) ?
>>
>> Because we don't have an fd at hands by the time we need to know the handle.
>
> Yeah, this might be not clear from patchset itself but inotify marks carry
> inodes inside kernel thus it's inodes what we can use when we fetch information
> about targets and put it into fdinfo output.

Al, Bruce, Aneesh,

What if we calculate the handle at the time we do have struct path at hands (i.e.
when we create the inotify) and store it on the inotify structure purely to be
shown later in proc. Would that be acceptable?

> Cyrill
> .

Thanks,
Pavel

2012-08-21 12:09:16

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [patch 4/8] fs, exportfs: Add export_encode_inode_fh helper

On Tue, Aug 21, 2012 at 02:49:47PM +0400, Pavel Emelyanov wrote:
> On 08/21/2012 02:42 PM, Aneesh Kumar K.V wrote:
> > Pavel Emelyanov <[email protected]> writes:
> >
> >> On 08/20/2012 11:32 PM, J. Bruce Fields wrote:
> >>> On Mon, Aug 20, 2012 at 11:06:06PM +0400, Cyrill Gorcunov wrote:
> >>>> On Mon, Aug 20, 2012 at 02:32:25PM -0400, J. Bruce Fields wrote:
> >>>>> On Mon, Aug 20, 2012 at 08:33:38PM +0400, Cyrill Gorcunov wrote:
> >>>>>> On Mon, Aug 20, 2012 at 07:49:23PM +0530, Aneesh Kumar K.V wrote:
> >>>>>>> Cyrill Gorcunov <[email protected]> writes:
> >>>>>>>
> >>>>>>>> To provide fsnotify object inodes being watched without
> >>>>>>>> binding to alphabetical path we need to encode them with
> >>>>>>>> exportfs help. This patch adds a helper which operates
> >>>>>>>> with plain inodes directly.
> >>>>>>>
> >>>>>>> doesn't name_to_handle_at() work for you ? It also allows to get a file
> >>>>>>> handle using file descriptor.
> >>>>>>
> >>>>>> Hi, sorry for dealy. Well, the last idea is to get rid of this helper,
> >>>>>> I've sent out an updated version where ino+dev is only printed.
> >>>>>
> >>>>> I don't understand how ino and dev are useful to you, though, if you're
> >>>>> still hoping to be able to look up inodes using this information later
> >>>>> on.
> >>>>
> >>>> Hi Bruce, I believe having ino+dev is better than nothing. Otherwise we
> >>>> simply have no clue which targets are bound to inotify mark. Sometime
> >>>> (!) we can try to generate fhandle in userspace from this ino+dev bundle
> >>>> and then open the target file.
> >>>
> >>> That's insufficient to generate a filehandle in general.
> >>
> >> Yes, sure, but for live migration having inode and device is enough and that's why.
> >> We can use two ways of having a filesystem on the target machine in the same
> >> state (from paths points of view) as it was on destination one:
> >>
> >> 1. copy file tree in a rsync manner
> >> 2. copy a virtual disk image file
> >>
> >> In the 1st case we can map inode number to path easily, since we iterate over a filesystem

OK. Then you don't care about unlinked files?

If the filesystem's frozen by the time you get here, I suppose you could
also just use paths?

> >> anyway. I agree, that rsync is not perfect for migration but still.
> >>
> >> In the 2nd case we can generate filehandle out of an inode number only since we _do_ know
> >> that inode will not get reused.
> >
> > If you are going to to use open_by_handle, then that handle is not
> > sufficient right ? Or do you have open_by_inode ? as part of c/r ?
>
> Why? For e.g. ext4 you can construct a handle in userspace and open by it.

If it's a real filehandle you want, in general you don't want to
construct it in userspace--depending on the filesystem it may require
filesystem-specific knowledge.

--b.

>
> >>
> >> However, if you have some better ideas on what information about inode should be exported
> >> to the userspace please share.
> >>
> >
> > Why not use name_to_handle(fd,...) and open_by_handle(handle,..) ?
>
> Because we don't have an fd at hands by the time we need to know the handle.
>
> >
> >>> (Also: there's the usual inode-number aliasing problem: the inode number
> >>> could get reused by another file. Unless you know the file is being
> >>> held open the whole time.)
> >>>
> >
> > -aneesh
> >
> > .
> >
>
> Thanks,
> Pavel

2012-08-21 12:12:01

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [patch 4/8] fs, exportfs: Add export_encode_inode_fh helper

On Tue, Aug 21, 2012 at 03:09:05PM +0400, Pavel Emelyanov wrote:
> On 08/21/2012 02:54 PM, Cyrill Gorcunov wrote:
> > On Tue, Aug 21, 2012 at 02:49:47PM +0400, Pavel Emelyanov wrote:
> >>>>
> >>>> However, if you have some better ideas on what information about inode should be exported
> >>>> to the userspace please share.
> >>>>
> >>>
> >>> Why not use name_to_handle(fd,...) and open_by_handle(handle,..) ?
> >>
> >> Because we don't have an fd at hands by the time we need to know the handle.
> >
> > Yeah, this might be not clear from patchset itself but inotify marks carry
> > inodes inside kernel thus it's inodes what we can use when we fetch information
> > about targets and put it into fdinfo output.
>
> Al, Bruce, Aneesh,
>
> What if we calculate the handle at the time we do have struct path at hands (i.e.
> when we create the inotify) and store it on the inotify structure purely to be
> shown later in proc. Would that be acceptable?

Was it the lack of a dentry that was really the problem? I thought it
was just the fact that not all filesystems support filehandles.

--b.

2012-08-21 12:23:15

by Pavel Emelyanov

[permalink] [raw]
Subject: Re: [patch 4/8] fs, exportfs: Add export_encode_inode_fh helper

On 08/21/2012 04:11 PM, J. Bruce Fields wrote:
> On Tue, Aug 21, 2012 at 03:09:05PM +0400, Pavel Emelyanov wrote:
>> On 08/21/2012 02:54 PM, Cyrill Gorcunov wrote:
>>> On Tue, Aug 21, 2012 at 02:49:47PM +0400, Pavel Emelyanov wrote:
>>>>>>
>>>>>> However, if you have some better ideas on what information about inode should be exported
>>>>>> to the userspace please share.
>>>>>>
>>>>>
>>>>> Why not use name_to_handle(fd,...) and open_by_handle(handle,..) ?
>>>>
>>>> Because we don't have an fd at hands by the time we need to know the handle.
>>>
>>> Yeah, this might be not clear from patchset itself but inotify marks carry
>>> inodes inside kernel thus it's inodes what we can use when we fetch information
>>> about targets and put it into fdinfo output.
>>
>> Al, Bruce, Aneesh,
>>
>> What if we calculate the handle at the time we do have struct path at hands (i.e.
>> when we create the inotify) and store it on the inotify structure purely to be
>> shown later in proc. Would that be acceptable?
>
> Was it the lack of a dentry that was really the problem? I thought it
> was just the fact that not all filesystems support filehandles.

Initial problem -- we don't know what is being watched by an inotify fd.

Having a dentry somewhere was the 1st attempt to solve this -- keep a path
in inotify and show it when required. It doesn't work since holding a ref on
path changes the behavior of watched inode (we cannot rename/unlink/remount
it the same way as we could before patching the kernel).

> --b.
> .
>

Thanks,
Pavel

2012-08-21 12:26:20

by Pavel Emelyanov

[permalink] [raw]
Subject: Re: [patch 4/8] fs, exportfs: Add export_encode_inode_fh helper

On 08/21/2012 04:09 PM, J. Bruce Fields wrote:
> On Tue, Aug 21, 2012 at 02:49:47PM +0400, Pavel Emelyanov wrote:
>> On 08/21/2012 02:42 PM, Aneesh Kumar K.V wrote:
>>> Pavel Emelyanov <[email protected]> writes:
>>>
>>>> On 08/20/2012 11:32 PM, J. Bruce Fields wrote:
>>>>> On Mon, Aug 20, 2012 at 11:06:06PM +0400, Cyrill Gorcunov wrote:
>>>>>> On Mon, Aug 20, 2012 at 02:32:25PM -0400, J. Bruce Fields wrote:
>>>>>>> On Mon, Aug 20, 2012 at 08:33:38PM +0400, Cyrill Gorcunov wrote:
>>>>>>>> On Mon, Aug 20, 2012 at 07:49:23PM +0530, Aneesh Kumar K.V wrote:
>>>>>>>>> Cyrill Gorcunov <[email protected]> writes:
>>>>>>>>>
>>>>>>>>>> To provide fsnotify object inodes being watched without
>>>>>>>>>> binding to alphabetical path we need to encode them with
>>>>>>>>>> exportfs help. This patch adds a helper which operates
>>>>>>>>>> with plain inodes directly.
>>>>>>>>>
>>>>>>>>> doesn't name_to_handle_at() work for you ? It also allows to get a file
>>>>>>>>> handle using file descriptor.
>>>>>>>>
>>>>>>>> Hi, sorry for dealy. Well, the last idea is to get rid of this helper,
>>>>>>>> I've sent out an updated version where ino+dev is only printed.
>>>>>>>
>>>>>>> I don't understand how ino and dev are useful to you, though, if you're
>>>>>>> still hoping to be able to look up inodes using this information later
>>>>>>> on.
>>>>>>
>>>>>> Hi Bruce, I believe having ino+dev is better than nothing. Otherwise we
>>>>>> simply have no clue which targets are bound to inotify mark. Sometime
>>>>>> (!) we can try to generate fhandle in userspace from this ino+dev bundle
>>>>>> and then open the target file.
>>>>>
>>>>> That's insufficient to generate a filehandle in general.
>>>>
>>>> Yes, sure, but for live migration having inode and device is enough and that's why.
>>>> We can use two ways of having a filesystem on the target machine in the same
>>>> state (from paths points of view) as it was on destination one:
>>>>
>>>> 1. copy file tree in a rsync manner
>>>> 2. copy a virtual disk image file
>>>>
>>>> In the 1st case we can map inode number to path easily, since we iterate over a filesystem
>
> OK. Then you don't care about unlinked files?

Yes. If it's unlinked file, we can emulate this on restore with opening any path,
then unlinking it. The inode number will change, yes, but in many cases this is
acceptable trade-off.

> If the filesystem's frozen by the time you get here, I suppose you could
> also just use paths?

Sure, but where can we get the path from? This is what we're trying to resolve.

>>>> anyway. I agree, that rsync is not perfect for migration but still.
>>>>
>>>> In the 2nd case we can generate filehandle out of an inode number only since we _do_ know
>>>> that inode will not get reused.
>>>
>>> If you are going to to use open_by_handle, then that handle is not
>>> sufficient right ? Or do you have open_by_inode ? as part of c/r ?
>>
>> Why? For e.g. ext4 you can construct a handle in userspace and open by it.
>
> If it's a real filehandle you want, in general you don't want to
> construct it in userspace--depending on the filesystem it may require
> filesystem-specific knowledge.

I see.

> --b.

Thanks,
Pavel

2012-08-21 12:29:15

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [patch 4/8] fs, exportfs: Add export_encode_inode_fh helper

On Tue, Aug 21, 2012 at 04:22:31PM +0400, Pavel Emelyanov wrote:
> On 08/21/2012 04:11 PM, J. Bruce Fields wrote:
> > On Tue, Aug 21, 2012 at 03:09:05PM +0400, Pavel Emelyanov wrote:
> >> On 08/21/2012 02:54 PM, Cyrill Gorcunov wrote:
> >>> On Tue, Aug 21, 2012 at 02:49:47PM +0400, Pavel Emelyanov wrote:
> >>>>>>
> >>>>>> However, if you have some better ideas on what information about inode should be exported
> >>>>>> to the userspace please share.
> >>>>>>
> >>>>>
> >>>>> Why not use name_to_handle(fd,...) and open_by_handle(handle,..) ?
> >>>>
> >>>> Because we don't have an fd at hands by the time we need to know the handle.
> >>>
> >>> Yeah, this might be not clear from patchset itself but inotify marks carry
> >>> inodes inside kernel thus it's inodes what we can use when we fetch information
> >>> about targets and put it into fdinfo output.
> >>
> >> Al, Bruce, Aneesh,
> >>
> >> What if we calculate the handle at the time we do have struct path at hands (i.e.
> >> when we create the inotify) and store it on the inotify structure purely to be
> >> shown later in proc. Would that be acceptable?
> >
> > Was it the lack of a dentry that was really the problem? I thought it
> > was just the fact that not all filesystems support filehandles.
>
> Initial problem -- we don't know what is being watched by an inotify fd.
>
> Having a dentry somewhere was the 1st attempt to solve this -- keep a path
> in inotify and show it when required. It doesn't work since holding a ref on
> path changes the behavior of watched inode (we cannot rename/unlink/remount
> it the same way as we could before patching the kernel).

OK. So if you don't mind the fact that there are filesystems with
inotify support but not filehandle support, then I think generating a
filehandle early as you describe would work. I guess it's a little more
memory per watched inode.

--b.

2012-08-21 12:40:35

by Pavel Emelyanov

[permalink] [raw]
Subject: Re: [patch 4/8] fs, exportfs: Add export_encode_inode_fh helper

>>>> Al, Bruce, Aneesh,
>>>>
>>>> What if we calculate the handle at the time we do have struct path at hands (i.e.
>>>> when we create the inotify) and store it on the inotify structure purely to be
>>>> shown later in proc. Would that be acceptable?
>>>
>>> Was it the lack of a dentry that was really the problem? I thought it
>>> was just the fact that not all filesystems support filehandles.
>>
>> Initial problem -- we don't know what is being watched by an inotify fd.
>>
>> Having a dentry somewhere was the 1st attempt to solve this -- keep a path
>> in inotify and show it when required. It doesn't work since holding a ref on
>> path changes the behavior of watched inode (we cannot rename/unlink/remount
>> it the same way as we could before patching the kernel).
>
> OK. So if you don't mind the fact that there are filesystems with
> inotify support but not filehandle support, then I think generating a
> filehandle early as you describe would work. I guess it's a little more
> memory per watched inode.

Great! Thanks, Bruce, we'll rework the patch accordingly :)

> --b.
> .
>

Thanks,
Pavel

2012-08-21 12:52:11

by Boaz Harrosh

[permalink] [raw]
Subject: Re: [patch 4/8] fs, exportfs: Add export_encode_inode_fh helper

On 08/21/2012 03:29 PM, J. Bruce Fields wrote:
<>

> OK. So if you don't mind the fact that there are filesystems with
> inotify support but not filehandle support, then I think generating a
> filehandle early as you describe would work. I guess it's a little more
> memory per watched inode.
>


For the minority of FSs that do not have a filehandle support it should
be easy to generate a generic one, that should work 95% of the time.

Are we guaranteed that after the checkpoint restore the version of
the Kernel is the same as the one that did the checkpoint? If yes
then I don't see any problem.

> --b.


Cheers
Boaz

2012-08-21 13:00:23

by Pavel Emelyanov

[permalink] [raw]
Subject: Re: [patch 4/8] fs, exportfs: Add export_encode_inode_fh helper

On 08/21/2012 04:51 PM, Boaz Harrosh wrote:
> On 08/21/2012 03:29 PM, J. Bruce Fields wrote:
> <>
>
>> OK. So if you don't mind the fact that there are filesystems with
>> inotify support but not filehandle support, then I think generating a
>> filehandle early as you describe would work. I guess it's a little more
>> memory per watched inode.
>>
>
>
> For the minority of FSs that do not have a filehandle support it should
> be easy to generate a generic one, that should work 95% of the time.

Yup, this is how exportfd_encode_fh currently works.

> Are we guaranteed that after the checkpoint restore the version of
> the Kernel is the same as the one that did the checkpoint? If yes
> then I don't see any problem.

Strictly speaking -- no we don't. Migration should to work across kernel
versions (from older to newer). Why kernel version matters in this case?

>> --b.
>
>
> Cheers
> Boaz
>

2012-08-21 13:08:43

by Boaz Harrosh

[permalink] [raw]
Subject: Re: [patch 4/8] fs, exportfs: Add export_encode_inode_fh helper

On 08/21/2012 03:59 PM, Pavel Emelyanov wrote:

> On 08/21/2012 04:51 PM, Boaz Harrosh wrote:
>> On 08/21/2012 03:29 PM, J. Bruce Fields wrote:
>> <>
>
> Strictly speaking -- no we don't. Migration should to work across kernel
> versions (from older to newer). Why kernel version matters in this case?


I was thinking of an FS that used to not implement export_fh but starts
to in the new version. Or some other such change in fhandles. But it's
such a far fetch I agree.

Boaz

2012-08-21 13:13:10

by Al Viro

[permalink] [raw]
Subject: Re: [patch 4/8] fs, exportfs: Add export_encode_inode_fh helper

On Tue, Aug 21, 2012 at 03:51:36PM +0300, Boaz Harrosh wrote:
> For the minority of FSs that do not have a filehandle support it should
> be easy to generate a generic one, that should work 95% of the time.

Great. Your task, then, is to show how to do that for sysfs. Or for nfs.
Those should be representative enough for you to get the appropriate
reality check.

2012-08-21 13:40:16

by Cyrill Gorcunov

[permalink] [raw]
Subject: Re: [patch 4/8] fs, exportfs: Add export_encode_inode_fh helper

On Tue, Aug 21, 2012 at 08:29:08AM -0400, J. Bruce Fields wrote:
> > Initial problem -- we don't know what is being watched by an inotify fd.
> >
> > Having a dentry somewhere was the 1st attempt to solve this -- keep a path
> > in inotify and show it when required. It doesn't work since holding a ref on
> > path changes the behavior of watched inode (we cannot rename/unlink/remount
> > it the same way as we could before patching the kernel).
>
> OK. So if you don't mind the fact that there are filesystems with
> inotify support but not filehandle support, then I think generating a
> filehandle early as you describe would work. I guess it's a little more
> memory per watched inode.

So, I thought about something like below, any comments?
---
fs/notify/inotify/inotify.h | 8 +++++++
fs/notify/inotify/inotify_user.c | 41 ++++++++++++++++++++++++++++++++++-----
2 files changed, 44 insertions(+), 5 deletions(-)

Index: linux-2.6.git/fs/notify/inotify/inotify.h
===================================================================
--- linux-2.6.git.orig/fs/notify/inotify/inotify.h
+++ linux-2.6.git/fs/notify/inotify/inotify.h
@@ -1,6 +1,7 @@
#include <linux/fsnotify_backend.h>
#include <linux/inotify.h>
#include <linux/slab.h> /* struct kmem_cache */
+#include <linux/exportfs.h>

extern struct kmem_cache *event_priv_cachep;

@@ -9,9 +10,16 @@ struct inotify_event_private_data {
int wd;
};

+#if defined(CONFIG_PROC_FS) && defined(CONFIG_EXPORTFS) && defined(CONFIG_CHECKPOINT_RESTORE)
+# define INOTIFY_USE_FHANDLE
+#endif
+
struct inotify_inode_mark {
struct fsnotify_mark fsn_mark;
int wd;
+#ifdef INOTIFY_USE_FHANDLE
+ __u8 fhandle[sizeof(struct file_handle) + MAX_HANDLE_SZ];
+#endif
};

extern void inotify_ignored_and_remove_idr(struct fsnotify_mark *fsn_mark,
Index: linux-2.6.git/fs/notify/inotify/inotify_user.c
===================================================================
--- linux-2.6.git.orig/fs/notify/inotify/inotify_user.c
+++ linux-2.6.git/fs/notify/inotify/inotify_user.c
@@ -566,6 +566,32 @@ static void inotify_free_mark(struct fsn
kmem_cache_free(inotify_inode_mark_cachep, i_mark);
}

+#ifdef INOTIFY_USE_FHANDLE
+static int inotify_encode_wd_fhandle(struct inotify_inode_mark *mark, struct dentry *dentry)
+{
+ struct file_handle *fhandle = (struct file_handle *)mark->fhandle;
+ int size, ret;
+
+ BUILD_BUG_ON(sizeof(mark->fhandle) <= sizeof(struct file_handle));
+
+ fhandle->handle_bytes = sizeof(mark->fhandle) - sizeof(struct file_handle);
+ size = fhandle->handle_bytes >> 2;
+
+ ret = exportfs_encode_fh(dentry, (struct fid *)fhandle->f_handle, &size, 0);
+ if ((ret == 255) || (ret == -ENOSPC))
+ return -EOVERFLOW;
+
+ fhandle->handle_type = ret;
+
+ return 0;
+}
+# else
+static int inotify_encode_wd_fhandle(struct inotify_inode_mark *mark, struct dentry *dentry)
+{
+ return 0;
+}
+#endif
+
static int inotify_update_existing_watch(struct fsnotify_group *group,
struct inode *inode,
u32 arg)
@@ -621,10 +647,11 @@ static int inotify_update_existing_watch
}

static int inotify_new_watch(struct fsnotify_group *group,
- struct inode *inode,
+ struct dentry *dentry,
u32 arg)
{
struct inotify_inode_mark *tmp_i_mark;
+ struct inode *inode = dentry->d_inode;
__u32 mask;
int ret;
struct idr *idr = &group->inotify_data.idr;
@@ -647,6 +674,10 @@ static int inotify_new_watch(struct fsno
if (atomic_read(&group->inotify_data.user->inotify_watches) >= inotify_max_user_watches)
goto out_err;

+ ret = inotify_encode_wd_fhandle(tmp_i_mark, dentry);
+ if (ret)
+ goto out_err;
+
ret = inotify_add_to_idr(idr, idr_lock, &group->inotify_data.last_wd,
tmp_i_mark);
if (ret)
@@ -673,16 +704,16 @@ out_err:
return ret;
}

-static int inotify_update_watch(struct fsnotify_group *group, struct inode *inode, u32 arg)
+static int inotify_update_watch(struct fsnotify_group *group, struct dentry *dentry, u32 arg)
{
int ret = 0;

retry:
/* try to update and existing watch with the new arg */
- ret = inotify_update_existing_watch(group, inode, arg);
+ ret = inotify_update_existing_watch(group, dentry->d_inode, arg);
/* no mark present, try to add a new one */
if (ret == -ENOENT)
- ret = inotify_new_watch(group, inode, arg);
+ ret = inotify_new_watch(group, dentry, arg);
/*
* inotify_new_watch could race with another thread which did an
* inotify_new_watch between the update_existing and the add watch
@@ -785,7 +816,7 @@ SYSCALL_DEFINE3(inotify_add_watch, int,
group = filp->private_data;

/* create/update an inode mark */
- ret = inotify_update_watch(group, inode, mask);
+ ret = inotify_update_watch(group, path.dentry, mask);
path_put(&path);
fput_and_out:
fput_light(filp, fput_needed);

2012-08-22 05:49:34

by Aneesh Kumar K.V

[permalink] [raw]
Subject: Re: [patch 4/8] fs, exportfs: Add export_encode_inode_fh helper

Pavel Emelyanov <[email protected]> writes:

> On 08/21/2012 02:42 PM, Aneesh Kumar K.V wrote:
>> Pavel Emelyanov <[email protected]> writes:
>>
>>> On 08/20/2012 11:32 PM, J. Bruce Fields wrote:
>>>> On Mon, Aug 20, 2012 at 11:06:06PM +0400, Cyrill Gorcunov wrote:
>>>>> On Mon, Aug 20, 2012 at 02:32:25PM -0400, J. Bruce Fields wrote:
>>>>>> On Mon, Aug 20, 2012 at 08:33:38PM +0400, Cyrill Gorcunov wrote:
>>>>>>> On Mon, Aug 20, 2012 at 07:49:23PM +0530, Aneesh Kumar K.V wrote:
>>>>>>>> Cyrill Gorcunov <[email protected]> writes:
>>>>>>>>
>>>>>>>>> To provide fsnotify object inodes being watched without
>>>>>>>>> binding to alphabetical path we need to encode them with
>>>>>>>>> exportfs help. This patch adds a helper which operates
>>>>>>>>> with plain inodes directly.
>>>>>>>>
>>>>>>>> doesn't name_to_handle_at() work for you ? It also allows to get a file
>>>>>>>> handle using file descriptor.
>>>>>>>
>>>>>>> Hi, sorry for dealy. Well, the last idea is to get rid of this helper,
>>>>>>> I've sent out an updated version where ino+dev is only printed.
>>>>>>
>>>>>> I don't understand how ino and dev are useful to you, though, if you're
>>>>>> still hoping to be able to look up inodes using this information later
>>>>>> on.
>>>>>
>>>>> Hi Bruce, I believe having ino+dev is better than nothing. Otherwise we
>>>>> simply have no clue which targets are bound to inotify mark. Sometime
>>>>> (!) we can try to generate fhandle in userspace from this ino+dev bundle
>>>>> and then open the target file.
>>>>
>>>> That's insufficient to generate a filehandle in general.
>>>
>>> Yes, sure, but for live migration having inode and device is enough and that's why.
>>> We can use two ways of having a filesystem on the target machine in the same
>>> state (from paths points of view) as it was on destination one:
>>>
>>> 1. copy file tree in a rsync manner
>>> 2. copy a virtual disk image file
>>>
>>> In the 1st case we can map inode number to path easily, since we iterate over a filesystem
>>> anyway. I agree, that rsync is not perfect for migration but still.
>>>
>>> In the 2nd case we can generate filehandle out of an inode number only since we _do_ know
>>> that inode will not get reused.
>>
>> If you are going to to use open_by_handle, then that handle is not
>> sufficient right ? Or do you have open_by_inode ? as part of c/r ?
>
> Why? For e.g. ext4 you can construct a handle in userspace and open by
> it.

open_by_handle use exportfs_decode_fh which use file system specific
fh_to_dentry

foe ext4 we require a generation number

inode = get_inode(sb, fid->i32.ino, fid->i32.gen);

For brtfs

objectid = fid->objectid;
root_objectid = fid->root_objectid;
generation = fid->gen;

return btrfs_get_dentry(sb, objectid, root_objectid, generation, 1);

-aneesh

2012-08-23 08:06:55

by Cyrill Gorcunov

[permalink] [raw]
Subject: Re: [patch 4/8] fs, exportfs: Add export_encode_inode_fh helper

On Wed, Aug 22, 2012 at 11:19:07AM +0530, Aneesh Kumar K.V wrote:
> Pavel Emelyanov <[email protected]> writes:
>
> > Why? For e.g. ext4 you can construct a handle in userspace and open by
> > it.
>
> open_by_handle use exportfs_decode_fh which use file system specific
> fh_to_dentry
>
> foe ext4 we require a generation number
>
> inode = get_inode(sb, fid->i32.ino, fid->i32.gen);
>

Hi Aneesh, yes we need i_generation but for ext3/4 it could be
fetched with ioctl as far as i see.

> For brtfs
>
> objectid = fid->objectid;
> root_objectid = fid->root_objectid;
> generation = fid->gen;
>
> return btrfs_get_dentry(sb, objectid, root_objectid, generation, 1);

For btrfs it become more complex. But still the last version I'm about
to send for review today (once everything get tested) will provide
fhandle carried with inotify mark _and_ inode number and device. This
information should be enough for us. After all having inode and device
should allow us to figure out the fs used on inotify target.

Cyrill

2012-08-23 09:01:22

by Marco Stornelli

[permalink] [raw]
Subject: Re: [patch 4/8] fs, exportfs: Add export_encode_inode_fh helper

Il 23/08/2012 10:06, Cyrill Gorcunov ha scritto:
> On Wed, Aug 22, 2012 at 11:19:07AM +0530, Aneesh Kumar K.V wrote:
>> Pavel Emelyanov <[email protected]> writes:
>>
>>> Why? For e.g. ext4 you can construct a handle in userspace and open by
>>> it.
>>
>> open_by_handle use exportfs_decode_fh which use file system specific
>> fh_to_dentry
>>
>> foe ext4 we require a generation number
>>
>> inode = get_inode(sb, fid->i32.ino, fid->i32.gen);
>>
>
> Hi Aneesh, yes we need i_generation but for ext3/4 it could be
> fetched with ioctl as far as i see.
>
>> For brtfs
>>
>> objectid = fid->objectid;
>> root_objectid = fid->root_objectid;
>> generation = fid->gen;
>>
>> return btrfs_get_dentry(sb, objectid, root_objectid, generation, 1);
>
> For btrfs it become more complex. But still the last version I'm about
> to send for review today (once everything get tested) will provide
> fhandle carried with inotify mark _and_ inode number and device. This
> information should be enough for us. After all having inode and device
> should allow us to figure out the fs used on inotify target.
>
> Cyrill
> --

Sorry if it's a really stupid question but I didn't follow deeply the
thread. How can you provide a device if the fs is mounted on "none" (ex
tmpfs)? In this case you can't associate device <=> fs, because you
haven't got a /dev/something.

Marco

2012-08-23 09:29:57

by Cyrill Gorcunov

[permalink] [raw]
Subject: Re: [patch 4/8] fs, exportfs: Add export_encode_inode_fh helper

On Thu, Aug 23, 2012 at 10:54:44AM +0200, Marco Stornelli wrote:
> >>For brtfs
> >>
> >> objectid = fid->objectid;
> >> root_objectid = fid->root_objectid;
> >> generation = fid->gen;
> >>
> >> return btrfs_get_dentry(sb, objectid, root_objectid, generation, 1);
> >
> >For btrfs it become more complex. But still the last version I'm about
> >to send for review today (once everything get tested) will provide
> >fhandle carried with inotify mark _and_ inode number and device. This
> >information should be enough for us. After all having inode and device
> >should allow us to figure out the fs used on inotify target.
>
> Sorry if it's a really stupid question but I didn't follow deeply
> the thread. How can you provide a device if the fs is mounted on
> "none" (ex tmpfs)? In this case you can't associate device <=> fs,
> because you haven't got a /dev/something.

That's ::s_dev from superblock, together with parsing /proc/$pid/mountinfo
and /proc/mounts you can figure out anything you need.

Cyrill