2017-03-10 08:24:18

by Cyrill Gorcunov

Subject: [patch 1/3] procfs: fdinfo -- Extend information about epoll target files

Since it is possible to have the same number in the tfd field (say
a file is added, closed, then another file is dup'ed to the same
number and added back), it is impossible to distinguish such target
files solely by their numbers.

Strictly speaking, regular applications don't need to recognize
these targets at all, but for the sake of checkpoint/restore we need
to collect the targets so that we can push them back at the restore
stage in the proper order.

Thus let's add the file position, inode and device number where
this target lies. These three fields can be used as a primary
key for sorting, and together with the help of kcmp CRIU can find
the exact target file (from the whole set of processes
being checkpointed).

Signed-off-by: Cyrill Gorcunov <[email protected]>
CC: Al Viro <[email protected]>
CC: Andrew Morton <[email protected]>
CC: Andrey Vagin <[email protected]>
CC: Pavel Emelyanov <[email protected]>
CC: Michael Kerrisk <[email protected]>
CC: Kir Kolyshkin <[email protected]>
CC: Jason Baron <[email protected]>
CC: Andy Lutomirski <[email protected]>
---
Documentation/filesystems/proc.txt | 6 +++++-
fs/eventpoll.c | 8 ++++++--
2 files changed, 11 insertions(+), 3 deletions(-)

Index: linux-ml.git/Documentation/filesystems/proc.txt
===================================================================
--- linux-ml.git.orig/Documentation/filesystems/proc.txt
+++ linux-ml.git/Documentation/filesystems/proc.txt
@@ -1779,12 +1779,16 @@ pair provide additional information part
pos: 0
flags: 02
mnt_id: 9
- tfd: 5 events: 1d data: ffffffffffffffff
+ tfd: 5 events: 1d data: ffffffffffffffff pos:0 ino:61af sdev:7

where 'tfd' is a target file descriptor number in decimal form,
'events' is events mask being watched and the 'data' is data
associated with a target [see epoll(7) for more details].

+ The 'pos' is current offset of the target file in decimal form
+ [see lseek(2)], 'ino' and 'sdev' are inode and device numbers
+ where target file resides, all in hex format.
+
Fsnotify files
~~~~~~~~~~~~~~
For inotify files the format is the following
Index: linux-ml.git/fs/eventpoll.c
===================================================================
--- linux-ml.git.orig/fs/eventpoll.c
+++ linux-ml.git/fs/eventpoll.c
@@ -883,10 +883,14 @@ static void ep_show_fdinfo(struct seq_fi
mutex_lock(&ep->mtx);
for (rbp = rb_first(&ep->rbr); rbp; rbp = rb_next(rbp)) {
struct epitem *epi = rb_entry(rbp, struct epitem, rbn);
+ struct inode *inode = file_inode(epi->ffd.file);

- seq_printf(m, "tfd: %8d events: %8x data: %16llx\n",
+ seq_printf(m, "tfd: %8d events: %8x data: %16llx "
+ " pos:%lli ino:%lx sdev:%x\n",
epi->ffd.fd, epi->event.events,
- (long long)epi->event.data);
+ (long long)epi->event.data,
+ (long long)epi->ffd.file->f_pos,
+ inode->i_ino, inode->i_sb->s_dev);
if (seq_has_overflowed(m))
break;
}
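
For readers who want to consume the extended line from userspace, here is a minimal sketch of parsing it and ordering the results by the (sdev, ino, pos) key described in the changelog. It is not part of the patch and not CRIU code: the struct and function names (epoll_tfd, parse_epoll_tfd, cmp_epoll_tfd) are hypothetical, and the field layout is assumed from the seq_printf() format string above.

```c
#include <stdio.h>

/* Hypothetical record for one "tfd:" line; the names are illustrative only. */
struct epoll_tfd {
	int tfd;                 /* target descriptor number, decimal */
	unsigned int events;     /* watched events mask, hex */
	unsigned long long data; /* user data, hex */
	long long pos;           /* file offset, decimal */
	unsigned long ino;       /* inode number, hex */
	unsigned int sdev;       /* device number as printed by the kernel, hex */
};

/*
 * Parse one line in the extended format.
 * Returns 0 on success, -1 if any field is missing (for example a line
 * in the old format that ends after "data:").
 */
static int parse_epoll_tfd(const char *line, struct epoll_tfd *out)
{
	int n = sscanf(line,
		       "tfd: %d events: %x data: %llx pos:%lld ino:%lx sdev:%x",
		       &out->tfd, &out->events, &out->data,
		       &out->pos, &out->ino, &out->sdev);

	return n == 6 ? 0 : -1;
}

/* Comparator usable with qsort(): orders targets by (sdev, ino, pos). */
static int cmp_epoll_tfd(const void *a, const void *b)
{
	const struct epoll_tfd *x = a, *y = b;

	if (x->sdev != y->sdev)
		return x->sdev < y->sdev ? -1 : 1;
	if (x->ino != y->ino)
		return x->ino < y->ino ? -1 : 1;
	if (x->pos != y->pos)
		return x->pos < y->pos ? -1 : 1;
	return 0;
}
```

Entries that compare equal under this key are only candidates for being the same file; telling them apart for certain would still need kcmp, as the changelog says.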


2017-03-17 05:03:18

by Andrei Vagin

Subject: Re: [patch 1/3] procfs: fdinfo -- Extend information about epoll target files

On Fri, Mar 10, 2017 at 11:16:56AM +0300, Cyrill Gorcunov wrote:
> Since it is possible to have the same number in the tfd field (say
> a file is added, closed, then another file is dup'ed to the same
> number and added back), it is impossible to distinguish such target
> files solely by their numbers.
>
> Strictly speaking, regular applications don't need to recognize
> these targets at all, but for the sake of checkpoint/restore we need
> to collect the targets so that we can push them back at the restore
> stage in the proper order.
>
> Thus let's add the file position, inode and device number where
> this target lies. These three fields can be used as a primary
> key for sorting, and together with the help of kcmp CRIU can find
> the exact target file (from the whole set of processes
> being checkpointed).
>
> Signed-off-by: Cyrill Gorcunov <[email protected]>
> CC: Al Viro <[email protected]>
> CC: Andrew Morton <[email protected]>
> CC: Andrey Vagin <[email protected]>
> CC: Pavel Emelyanov <[email protected]>
> CC: Michael Kerrisk <[email protected]>
> CC: Kir Kolyshkin <[email protected]>
> CC: Jason Baron <[email protected]>
> CC: Andy Lutomirski <[email protected]>
> ---
> Documentation/filesystems/proc.txt | 6 +++++-
> fs/eventpoll.c | 8 ++++++--
> 2 files changed, 11 insertions(+), 3 deletions(-)
>
> Index: linux-ml.git/Documentation/filesystems/proc.txt
> ===================================================================
> --- linux-ml.git.orig/Documentation/filesystems/proc.txt
> +++ linux-ml.git/Documentation/filesystems/proc.txt
> @@ -1779,12 +1779,16 @@ pair provide additional information part
> pos: 0
> flags: 02
> mnt_id: 9
> - tfd: 5 events: 1d data: ffffffffffffffff
> + tfd: 5 events: 1d data: ffffffffffffffff pos:0 ino:61af sdev:7

I think it may be better to print mnt_id instead of sdev, because there
may be two file descriptors opened from different bind mounts.

>
> where 'tfd' is a target file descriptor number in decimal form,
> 'events' is events mask being watched and the 'data' is data
> associated with a target [see epoll(7) for more details].
>
> + The 'pos' is current offset of the target file in decimal form
> + [see lseek(2)], 'ino' and 'sdev' are inode and device numbers
> + where target file resides, all in hex format.
> +
> Fsnotify files
> ~~~~~~~~~~~~~~
> For inotify files the format is the following
> Index: linux-ml.git/fs/eventpoll.c
> ===================================================================
> --- linux-ml.git.orig/fs/eventpoll.c
> +++ linux-ml.git/fs/eventpoll.c
> @@ -883,10 +883,14 @@ static void ep_show_fdinfo(struct seq_fi
> mutex_lock(&ep->mtx);
> for (rbp = rb_first(&ep->rbr); rbp; rbp = rb_next(rbp)) {
> struct epitem *epi = rb_entry(rbp, struct epitem, rbn);
> + struct inode *inode = file_inode(epi->ffd.file);
>
> - seq_printf(m, "tfd: %8d events: %8x data: %16llx\n",
> + seq_printf(m, "tfd: %8d events: %8x data: %16llx "
> + " pos:%lli ino:%lx sdev:%x\n",
> epi->ffd.fd, epi->event.events,
> - (long long)epi->event.data);
> + (long long)epi->event.data,
> + (long long)epi->ffd.file->f_pos,
> + inode->i_ino, inode->i_sb->s_dev);
> if (seq_has_overflowed(m))
> break;
> }
>

2017-03-17 08:26:26

by Cyrill Gorcunov

Subject: Re: [patch 1/3] procfs: fdinfo -- Extend information about epoll target files

On Thu, Mar 16, 2017 at 09:59:09PM -0700, Andrei Vagin wrote:
> On Fri, Mar 10, 2017 at 11:16:56AM +0300, Cyrill Gorcunov wrote:
> > Since it is possible to have the same number in the tfd field (say
> > a file is added, closed, then another file is dup'ed to the same
> > number and added back), it is impossible to distinguish such target
> > files solely by their numbers.
> >
> > Strictly speaking, regular applications don't need to recognize
> > these targets at all, but for the sake of checkpoint/restore we need
> > to collect the targets so that we can push them back at the restore
> > stage in the proper order.
> >
> > Thus let's add the file position, inode and device number where
> > this target lies. These three fields can be used as a primary
> > key for sorting, and together with the help of kcmp CRIU can find
> > the exact target file (from the whole set of processes
> > being checkpointed).
> >
> > Signed-off-by: Cyrill Gorcunov <[email protected]>
> > CC: Al Viro <[email protected]>
> > CC: Andrew Morton <[email protected]>
> > CC: Andrey Vagin <[email protected]>
> > CC: Pavel Emelyanov <[email protected]>
> > CC: Michael Kerrisk <[email protected]>
> > CC: Kir Kolyshkin <[email protected]>
> > CC: Jason Baron <[email protected]>
> > CC: Andy Lutomirski <[email protected]>
> > ---
> > Documentation/filesystems/proc.txt | 6 +++++-
> > fs/eventpoll.c | 8 ++++++--
> > 2 files changed, 11 insertions(+), 3 deletions(-)
> >
> > Index: linux-ml.git/Documentation/filesystems/proc.txt
> > ===================================================================
> > --- linux-ml.git.orig/Documentation/filesystems/proc.txt
> > +++ linux-ml.git/Documentation/filesystems/proc.txt
> > @@ -1779,12 +1779,16 @@ pair provide additional information part
> > pos: 0
> > flags: 02
> > mnt_id: 9
> > - tfd: 5 events: 1d data: ffffffffffffffff
> > + tfd: 5 events: 1d data: ffffffffffffffff pos:0 ino:61af sdev:7
>
> I think it may be better to print mnt_id instead of sdev, because there
> may be two file descriptors opened from different bind mounts.

Fetching mnt_id is not that cheap in comparison with sdev: instead of
a straight dereference of inode->i_sb->s_dev we would have to figure out
mnt_id from file+path, and our primary key is built from sdev+ino anyway,
so until it is _really_ needed I prefer the cheaper/simpler solution.