Many monitoring tools include open file count as a metric. Currently
the only way to get this number is to enumerate the files in /proc/pid/fd.
The problem with the current approach is that it does a lot of work that
callers generally don't care about when all they need is a single number
for a metric.
In our tests for cadvisor, which reports open file counts per cgroup,
we observed that reading the number of open files is slow. Out of the
35.23% of CPU time spent in `proc_readfd_common`, 29.43% is spent in
`proc_fill_cache`, which is responsible for filling in dentry info; in
other words, the bulk of the readdir cost goes into information that is
immediately thrown away. Some of this time is spinlock contention, but
it's contention on a lock we don't want to take in the first place.
We considered putting the number of open files in /proc/pid/stat.
Unfortunately, counting the number of fds requires iterating the
fdtable, which means it could slow down /proc/pid/stat for processes
with many open files. Instead, we opted to expose this info via
/proc/pid/fd, as the size member of the stat() result. Previously the
reported size was zero, so there's very little risk of breaking
anything, while still providing a reasonably logical way to count the
open files.
Previously:
```
$ sudo stat /proc/1/fd | head -n2
File: /proc/1/fd
Size: 0 Blocks: 0 IO Block: 1024 directory
```
With this patch:
```
$ sudo stat /proc/1/fd | head -n2
File: /proc/1/fd
Size: 65 Blocks: 0 IO Block: 1024 directory
```
Correctness check:
```
$ sudo ls /proc/1/fd | wc -l
65
```
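With this in place, a monitoring agent can fetch the count with a single
stat() call instead of a full directory walk. A minimal userspace sketch
(the helper name and error handling are illustrative, not part of the patch):
```
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: return the number of open fds for a pid, or -1
 * on error. Relies on this patch reporting the count in st_size. */
static long count_open_fds(pid_t pid)
{
	char path[64];
	struct stat st;

	snprintf(path, sizeof(path), "/proc/%d/fd", (int)pid);
	if (stat(path, &st) != 0)
		return -1;
	return (long)st.st_size;
}

int main(void)
{
	printf("pid 1 has %ld open fds\n", count_open_fds(1));
	return 0;
}
```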
There are two alternatives to this approach that I can see:
* Expose /proc/pid/fd_count with a count there
* Make fd count access O(1) and expose it in /proc/pid/status
I can probably figure out how to do the former, but the latter will
require somebody with more experience in file code than I have.
Signed-off-by: Ivan Babrou <[email protected]>
---
fs/proc/fd.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 47 insertions(+)
diff --git a/fs/proc/fd.c b/fs/proc/fd.c
index 913bef0d2a36..c7ac142500a8 100644
--- a/fs/proc/fd.c
+++ b/fs/proc/fd.c
@@ -279,6 +279,29 @@ static int proc_readfd_common(struct file *file, struct dir_context *ctx,
return 0;
}
+static int proc_readfd_count(struct inode *inode)
+{
+ struct task_struct *p = get_proc_task(inode);
+ unsigned int fd = 0, count = 0;
+
+ if (!p)
+ return -ENOENT;
+
+ rcu_read_lock();
+ while (task_lookup_next_fd_rcu(p, &fd)) {
+ rcu_read_unlock();
+
+ count++;
+ fd++;
+
+ cond_resched();
+ rcu_read_lock();
+ }
+ rcu_read_unlock();
+ put_task_struct(p);
+ return count;
+}
+
static int proc_readfd(struct file *file, struct dir_context *ctx)
{
return proc_readfd_common(file, ctx, proc_fd_instantiate);
@@ -319,9 +342,33 @@ int proc_fd_permission(struct user_namespace *mnt_userns,
return rv;
}
+int proc_fd_getattr(struct user_namespace *mnt_userns,
+ const struct path *path, struct kstat *stat,
+ u32 request_mask, unsigned int query_flags)
+{
+ struct inode *inode = d_inode(path->dentry);
+ struct proc_dir_entry *de = PDE(inode);
+
+ if (de) {
+ nlink_t nlink = READ_ONCE(de->nlink);
+
+ if (nlink > 0)
+ set_nlink(inode, nlink);
+ }
+
+ generic_fillattr(&init_user_ns, inode, stat);
+
+ /* If it's a directory, put the number of open fds there */
+ if (S_ISDIR(inode->i_mode))
+ stat->size = proc_readfd_count(inode);
+
+ return 0;
+}
+
const struct inode_operations proc_fd_inode_operations = {
.lookup = proc_lookupfd,
.permission = proc_fd_permission,
+ .getattr = proc_fd_getattr,
.setattr = proc_setattr,
};
--
2.37.2
(cc's added)
On Fri, 16 Sep 2022 16:08:52 -0700 Ivan Babrou <[email protected]> wrote:
> Instead, we opted to expose this info via /proc/pid/fd, as the size
> member of the stat() result. Previously the reported size was zero,
> so there's very little risk of breaking anything, while still
> providing a reasonably logical way to count the open files.
Documentation/filesystems/proc.rst would be an appropriate place to
document this ;)
On Fri, Sep 16, 2022 at 5:01 PM Andrew Morton <[email protected]> wrote:
> Documentation/filesystems/proc.rst would be an appropriate place to
> document this ;)
I am more than happy to add the docs after there's a confirmation that
this is an appropriate approach to expose this information. I probably
should've mentioned this explicitly, that's on me. There are two
alternative approaches at the bottom of my original email that might
be considered.
On Fri, Sep 16, 2022 at 05:01:15PM -0700, Andrew Morton wrote:
> > With this patch:
> >
> > ```
> > $ sudo stat /proc/1/fd | head -n2
> > File: /proc/1/fd
> > Size: 65 Blocks: 0 IO Block: 1024 directory
> > ```

Yes. This is the natural place.
> > There are two alternatives to this approach that I can see:
> >
> > * Expose /proc/pid/fd_count with a count there
> > * Make fd count access O(1) and expose it in /proc/pid/status
This is doable, next to FDSize.
Below is doable too.
> > +int proc_fd_getattr(struct user_namespace *mnt_userns,
> > + const struct path *path, struct kstat *stat,
> > + u32 request_mask, unsigned int query_flags)
> > +{
> > + struct inode *inode = d_inode(path->dentry);
> > + struct proc_dir_entry *de = PDE(inode);
> > +
> > + if (de) {
> > + nlink_t nlink = READ_ONCE(de->nlink);
> > +
> > + if (nlink > 0)
> > + set_nlink(inode, nlink);
> > + }
> > +
> > + generic_fillattr(&init_user_ns, inode, stat);
^^^^^^^^^^^^^
Is this correct? I'm not a userns guy at all.
> > +
> > + /* If it's a directory, put the number of open fds there */
> > + if (S_ISDIR(inode->i_mode))
> > + stat->size = proc_readfd_count(inode);
ENOENT can get in there. In principle this is OK; userspace can live with it.
On Fri, Sep 16, 2022 at 04:08:52PM -0700, Ivan Babrou wrote:
> Instead, we opted to expose this info via /proc/pid/fd, as the size
> member of the stat() result.
Instead of using the st_size of /proc/<pid>/fd, why not return that
value in st_nlink? /proc/<pid>/fd is a directory, so having st_nlink
return the number of fds plus 2 (for . and ..) would be much more natural.
Cheers,
- Ted
> > > * Make fd count access O(1) and expose it in /proc/pid/status
>
> This is doable, next to FDSize.
It feels like a better solution, but maybe I'm missing some context
here. Let me know whether this is preferred.
That said, I've tried doing it, but failed. There's a noticeable
mismatch in the numbers:
* systemd:

```
ivan@vm:~$ sudo ls -l /proc/1/fd | wc -l
66
ivan@vm:~$ cat /proc/1/status | fgrep FD
FDSize: 256
FDUsed: 71
```

* journald:

```
ivan@vm:~$ sudo ls -l /proc/803/fd | wc -l
29
ivan@vm:~$ cat /proc/803/status | fgrep FD
FDSize: 128
FDUsed: 37
```
I'll see if I can make it work next week. I'm happy to receive tips as well.
Below is my attempt (link in case gmail breaks patch formatting):
* https://gist.githubusercontent.com/bobrik/acce40881d629d8cce2e55966b31a0a2/raw/716eb4724a8fe3afeeb76fd2a7a47ee13790a9e9/fdused.patch
diff --git a/fs/file.c b/fs/file.c
index 3bcc1ecc314a..8bc0741cabf1 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -85,6 +85,8 @@ static void copy_fdtable(struct fdtable *nfdt, struct fdtable *ofdt)
memset((char *)nfdt->fd + cpy, 0, set);
copy_fd_bitmaps(nfdt, ofdt, ofdt->max_fds);
+
+ atomic_set(&nfdt->count, atomic_read(&ofdt->count));
}
/*
@@ -105,6 +107,7 @@ static void copy_fdtable(struct fdtable *nfdt, struct fdtable *ofdt)
static struct fdtable * alloc_fdtable(unsigned int nr)
{
struct fdtable *fdt;
+ atomic_t count = ATOMIC_INIT(0);
void *data;
/*
@@ -148,6 +151,7 @@ static struct fdtable * alloc_fdtable(unsigned int nr)
fdt->close_on_exec = data;
data += nr / BITS_PER_BYTE;
fdt->full_fds_bits = data;
+ fdt->count = count;
return fdt;
@@ -399,6 +403,8 @@ struct files_struct *dup_fd(struct files_struct *oldf, unsigned int max_fds, int
/* clear the remainder */
memset(new_fds, 0, (new_fdt->max_fds - open_files) * sizeof(struct file *));
+ atomic_set(&new_fdt->count, atomic_read(&old_fdt->count));
+
rcu_assign_pointer(newf->fdt, new_fdt);
return newf;
@@ -474,6 +480,7 @@ struct files_struct init_files = {
.close_on_exec = init_files.close_on_exec_init,
.open_fds = init_files.open_fds_init,
.full_fds_bits = init_files.full_fds_bits_init,
+ .count = ATOMIC_INIT(0),
},
.file_lock = __SPIN_LOCK_UNLOCKED(init_files.file_lock),
.resize_wait = __WAIT_QUEUE_HEAD_INITIALIZER(init_files.resize_wait),
@@ -613,6 +620,7 @@ void fd_install(unsigned int fd, struct file *file)
BUG_ON(fdt->fd[fd] != NULL);
rcu_assign_pointer(fdt->fd[fd], file);
spin_unlock(&files->file_lock);
+ atomic_inc(&fdt->count);
return;
}
/* coupled with smp_wmb() in expand_fdtable() */
@@ -621,6 +629,7 @@ void fd_install(unsigned int fd, struct file *file)
BUG_ON(fdt->fd[fd] != NULL);
rcu_assign_pointer(fdt->fd[fd], file);
rcu_read_unlock_sched();
+ atomic_inc(&fdt->count);
}
EXPORT_SYMBOL(fd_install);
@@ -646,6 +655,7 @@ static struct file *pick_file(struct files_struct *files, unsigned fd)
if (file) {
rcu_assign_pointer(fdt->fd[fd], NULL);
__put_unused_fd(files, fd);
+ atomic_dec(&fdt->count);
}
return file;
}
@@ -844,6 +854,7 @@ void do_close_on_exec(struct files_struct *files)
filp_close(file, files);
cond_resched();
spin_lock(&files->file_lock);
+ atomic_dec(&fdt->count);
}
}
@@ -1108,6 +1119,7 @@ __releases(&files->file_lock)
else
__clear_close_on_exec(fd, fdt);
spin_unlock(&files->file_lock);
+ atomic_inc(&fdt->count);
if (tofree)
filp_close(tofree, files);
diff --git a/fs/proc/array.c b/fs/proc/array.c
index 99fcbfda8e25..5847f077bfc3 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -153,7 +153,8 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns,
struct task_struct *tracer;
const struct cred *cred;
pid_t ppid, tpid = 0, tgid, ngid;
- unsigned int max_fds = 0;
+ struct fdtable *fdt;
+ unsigned int max_fds = 0, open_fds = 0;
rcu_read_lock();
ppid = pid_alive(p) ?
@@ -170,8 +171,11 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns,
task_lock(p);
if (p->fs)
umask = p->fs->umask;
- if (p->files)
- max_fds = files_fdtable(p->files)->max_fds;
+ if (p->files) {
+ fdt = files_fdtable(p->files);
+ max_fds = fdt->max_fds;
+ open_fds = atomic_read(&fdt->count);
+ }
task_unlock(p);
rcu_read_unlock();
@@ -194,6 +198,7 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns,
seq_put_decimal_ull(m, "\t", from_kgid_munged(user_ns, cred->sgid));
seq_put_decimal_ull(m, "\t", from_kgid_munged(user_ns, cred->fsgid));
seq_put_decimal_ull(m, "\nFDSize:\t", max_fds);
+ seq_put_decimal_ull(m, "\nFDUsed:\t", open_fds);
seq_puts(m, "\nGroups:\t");
group_info = cred->group_info;
diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h
index e066816f3519..59aceb1e4bc6 100644
--- a/include/linux/fdtable.h
+++ b/include/linux/fdtable.h
@@ -31,6 +31,7 @@ struct fdtable {
unsigned long *open_fds;
unsigned long *full_fds_bits;
struct rcu_head rcu;
+ atomic_t count;
};
static inline bool close_on_exec(unsigned int fd, const struct fdtable *fdt)
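For anyone who wants to reproduce the mismatch, a throwaway userspace
checker along these lines compares both numbers for every process (my
own sketch, not part of the patch; run as root so /proc/<pid>/fd is
readable):
```
/* Throwaway checker (assumes the FDUsed patch above): for every pid,
 * compare FDUsed from /proc/<pid>/status with the number of entries
 * in /proc/<pid>/fd and print the mismatches. */
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>

static long fd_dir_count(const char *pid)
{
	char path[64];
	struct dirent *e;
	DIR *d;
	long n = 0;

	snprintf(path, sizeof(path), "/proc/%s/fd", pid);
	d = opendir(path);
	if (!d)
		return -1;
	while ((e = readdir(d)))
		if (e->d_name[0] != '.')
			n++;
	closedir(d);
	return n;
}

static long fd_used(const char *pid)
{
	char path[64], line[256];
	long used = -1;
	FILE *f;

	snprintf(path, sizeof(path), "/proc/%s/status", pid);
	f = fopen(path, "r");
	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f))
		if (sscanf(line, "FDUsed: %ld", &used) == 1)
			break;
	fclose(f);
	return used;
}

int main(void)
{
	DIR *proc = opendir("/proc");
	struct dirent *e;

	while (proc && (e = readdir(proc))) {
		long used, real;

		if (!isdigit((unsigned char)e->d_name[0]))
			continue;
		used = fd_used(e->d_name);
		real = fd_dir_count(e->d_name);
		if (used >= 0 && real >= 0 && used != real)
			printf("pid %s: FDUsed=%ld actual=%ld\n",
			       e->d_name, used, real);
	}
	if (proc)
		closedir(proc);
	return 0;
}
```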
> > > +
> > > + generic_fillattr(&init_user_ns, inode, stat);
> ^^^^^^^^^^^^^
>
> Is this correct? I'm not userns guy at all.
I mostly copied from here:
* https://elixir.bootlin.com/linux/v6.0-rc5/source/fs/proc/generic.c#L150
Maybe it can be simplified even further to match this one:
* https://elixir.bootlin.com/linux/v6.0-rc5/source/fs/proc/root.c#L317
On Sat, Sep 17, 2022 at 8:31 AM Theodore Ts'o <[email protected]> wrote:
>
> Instead of using the st_size of /proc/<pid>/fd, why not return that
> value in st_nlink? /proc/<pid>/fd is a directory, so having st_nlink
> return the number of fds plus 2 (for . and ..) would be much more natural.
From what I see, st_nlink is used for the number of subdirectories
and doesn't include files. In /proc/pid/fd we only have files (well,
symlinks really). I'm still happy to use that instead if that's
preferred.
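For reference, Ted's variant would presumably amount to replacing the
stat->size assignment in proc_fd_getattr from my patch with something
like this (untested sketch):
```
	/* Untested sketch: report the fd count via st_nlink instead of
	 * st_size, with +2 for "." and "..". Skips the update if
	 * proc_readfd_count() returned an error. */
	if (S_ISDIR(inode->i_mode)) {
		int count = proc_readfd_count(inode);

		if (count >= 0)
			stat->nlink = count + 2;
	}
```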
On Sat, Sep 17, 2022 at 10:15:13AM +0300, Alexey Dobriyan wrote:
> > > + generic_fillattr(&init_user_ns, inode, stat);
> ^^^^^^^^^^^^^
>
> Is this correct? I'm not a userns guy at all.
This is correct. :)
On Sat, Sep 17, 2022 at 11:32:02AM -0700, Ivan Babrou wrote:
> > > > * Make fd count access O(1) and expose it in /proc/pid/status
> >
> > This is doable, next to FDSize.
>
> It feels like a better solution, but maybe I'm missing some context
> here. Let me know whether this is preferred.
I don't know. I'd put it in st_size as you did initially.
/proc/*/status should be slow.
On Mon, Sep 19, 2022 at 5:30 AM Alexey Dobriyan <[email protected]> wrote:
> I don't know. I'd put it in st_size as you did initially.
> /proc/*/status should be slow.
Could you elaborate on what you mean?
* Are you saying that having FDUsed in /proc/*/status _would_ be slow?
I would imagine that adding atomic_read() there shouldn't slow things
down too much.
* Are you saying that reading /proc/*/status is already slow and
reading the number of open files from there would be inefficient?