2007-11-19 21:00:09

by Pavel Machek

[permalink] [raw]
Subject: 2.6.24-rc3: find complains about /proc/net

Hi!

I think that this worked before:

root@amd:/proc# find . -name "timer_info"
find: WARNING: Hard link count is wrong for ./net: this may be a bug
in your filesystem driver. Automatically turning on find's -noleaf
option. Earlier results may have failed to include directories that
should have been searched.
root@amd:/proc#

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


2007-11-19 21:46:57

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: 2.6.24-rc3: find complains about /proc/net

On Monday, 19 of November 2007, Pavel Machek wrote:
> Hi!
>
> I think that this worked before:
>
> root@amd:/proc# find . -name "timer_info"
> find: WARNING: Hard link count is wrong for ./net: this may be a bug
> in your filesystem driver. Automatically turning on find's -noleaf
> option. Earlier results may have failed to include directories that
> should have been searched.
> root@amd:/proc#

I'm seeing that too.

Rafael

2007-11-20 15:52:34

by Pavel Emelyanov

[permalink] [raw]
Subject: Re: 2.6.24-rc3: find complains about /proc/net

Rafael J. Wysocki wrote:
> On Monday, 19 of November 2007, Pavel Machek wrote:
>> Hi!
>>
>> I think that this worked before:
>>
>> root@amd:/proc# find . -name "timer_info"
>> find: WARNING: Hard link count is wrong for ./net: this may be a bug
>> in your filesystem driver. Automatically turning on find's -noleaf
>> option. Earlier results may have failed to include directories that
>> should have been searched.
>> root@amd:/proc#
>
> I'm seeing that too.

I have a better things with 2.6.24-rc3 ;)

# cd /proc/net
# ls ..
ls: reading directory ..: Not a directory

and this

# cd /proc
# find
...
./net
find: . changed during execution of find
# find net
find: net changed during execution of find
# find net/
<this works ok however>

Moreover. Program that opens /proc/net and dumps the /proc/self/fd
files produces the following:

# cd /
# a.out /proc/net
...
lr-x------ 1 root root 64 Nov 20 18:02 3 -> /proc/net/net (deleted)
...
# cd /proc/net
# a.out .
...
lr-x------ 1 root root 64 Nov 20 18:03 3 -> /proc/net/net (deleted)
...
# a.out ..
...
lr-x------ 1 root root 64 Nov 20 18:03 3 -> /proc/net
...

This all is somehow related to the shadow proc files.
E.g. the first problem (with -ENOTDIR) is due to the shadow /proc/net
dentry doesn't implement the .readdir method:

static const struct file_operations proc_net_dir_operations = {
.read = generic_read_dir,
};

And I haven't managed to find out why the rest problems
occur...

Eric, do you have fixes for it?

> Rafael
> -
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

2007-11-20 21:53:37

by Eric W. Biederman

[permalink] [raw]
Subject: Re: 2.6.24-rc3: find complains about /proc/net

Pavel Emelyanov <[email protected]> writes:

> Rafael J. Wysocki wrote:
>> On Monday, 19 of November 2007, Pavel Machek wrote:
>>> Hi!
>>>
>>> I think that this worked before:
>>>
>>> root@amd:/proc# find . -name "timer_info"
>>> find: WARNING: Hard link count is wrong for ./net: this may be a bug
>>> in your filesystem driver. Automatically turning on find's -noleaf
>>> option. Earlier results may have failed to include directories that
>>> should have been searched.
>>> root@amd:/proc#
>>
>> I'm seeing that too.
>
> I have a better things with 2.6.24-rc3 ;)
>
> # cd /proc/net
> # ls ..
> ls: reading directory ..: Not a directory

Ok. That part is truly a bug.
Looks like you have tracked down the cause.
Grumble you are getting the wrong .. :(

> and this
>
> # cd /proc
> # find
> ...
> ./net
> find: . changed during execution of find
> # find net
> find: net changed during execution of find
> # find net/
> <this works ok however>
>
> Moreover. Program that opens /proc/net and dumps the /proc/self/fd
> files produces the following:
>
> # cd /
> # a.out /proc/net
> ...
> lr-x------ 1 root root 64 Nov 20 18:02 3 -> /proc/net/net (deleted)
> ...
> # cd /proc/net
> # a.out .
> ...
> lr-x------ 1 root root 64 Nov 20 18:03 3 -> /proc/net/net (deleted)
> ...

> # a.out ..
> ...
> lr-x------ 1 root root 64 Nov 20 18:03 3 -> /proc/net
> ...

Yes all of those are nasty. So much for my clever way of implementing
these things. Grr. Simple hacks that almost work!

> This all is somehow related to the shadow proc files.
> E.g. the first problem (with -ENOTDIR) is due to the shadow /proc/net
> dentry doesn't implement the .readdir method:
>
> static const struct file_operations proc_net_dir_operations = {
> .read = generic_read_dir,
> };
>
> And I haven't managed to find out why the rest problems
> occur...
>
> Eric, do you have fixes for it?

Not exactly. It is tricky. I have known there are issues but so far
the difficulty of a better solution has been higher then my annoyance
level with this problem.

A special solution for !CONFIG_NET_NS may be practical for 2.6.24.

The only way I know of to really solve this problem cleanly and
completely is to make /proc/net an explicit symlink to /proc/self/net
and make /proc/<pid>/net a magic mountpoint (ala nfs automounts) that
mounts a per network namespace filesystem. Al Viro wasn't to happy
when I suggested it (mostly because he was convinced such a solution
was likely to be full of races).

The half assed clean solution is to ensure nothing under /proc/net
gets cached and ensure the dentry tree is built properly, for the
current reader of /proc.

A third option is to fix .. in /proc/net. Although I'm a bit
dubious if that will do more then fix a few symptoms with the
current solution.

Eric

2007-11-20 21:59:45

by Ingo Molnar

[permalink] [raw]
Subject: Re: 2.6.24-rc3: find complains about /proc/net


* Eric W. Biederman <[email protected]> wrote:

> > lr-x------ 1 root root 64 Nov 20 18:03 3 -> /proc/net
> > ...
>
> Yes all of those are nasty. So much for my clever way of implementing
> these things. Grr. Simple hacks that almost work!

btw., in case you feel inclined, i recently did some userspace coding
and found to my surprise that /proc/self points to the parent task, not
the thread itself (giving threads no real way to examine themselves). If
you are hacking in this area, would it be a big trouble to add something
like /proc/self-task/ or something like that? I had to use a raw gettid
syscall to figure out the TID to get to /proc/*/tasks/TID/sched
instrumentation info - which is quite a PITA.

Ingo

2007-11-20 22:18:14

by Eric W. Biederman

[permalink] [raw]
Subject: Re: 2.6.24-rc3: find complains about /proc/net

Ingo Molnar <[email protected]> writes:

> * Eric W. Biederman <[email protected]> wrote:
>
>> > lr-x------ 1 root root 64 Nov 20 18:03 3 -> /proc/net
>> > ...
>>
>> Yes all of those are nasty. So much for my clever way of implementing
>> these things. Grr. Simple hacks that almost work!
>
> btw., in case you feel inclined, i recently did some userspace coding
> and found to my surprise that /proc/self points to the parent task, not
> the thread itself (giving threads no real way to examine themselves). If
> you are hacking in this area, would it be a big trouble to add something
> like /proc/self-task/ or something like that? I had to use a raw gettid
> syscall to figure out the TID to get to /proc/*/tasks/TID/sched
> instrumentation info - which is quite a PITA.

Agreed. I have been debating with myself in the last couple of days
if it is a bug that /proc/self uses the tgid and not the actual pid/tid
value.

If I can be convinced that posix threads don't care I will happily just
switch /proc/self, calling the current implementation a bug.

I think it is a bug the real question is what are the backwards
compatibility implications. Do posix threads care?

It appears to me that either we need to fix /proc/self or we need
to add /proc/task-self and fix /proc/mounts to point at that.

In the normal case we share all of the same things so I think it is
a don't care. Except that /proc/self/status | grep Pid returns the
tgid.

Hmm. I think I am just going to send Andrew a patch for 2.6.25 that
just fixes /proc/self. I just fail to see how using the tgid is correct.
The only cases we could care seem to do the wrong thing when we use the
tgid.

Eric

2007-11-20 22:36:29

by Ingo Molnar

[permalink] [raw]
Subject: Re: 2.6.24-rc3: find complains about /proc/net


these are all questions for Ulrich and Roland - Cc:-ed them.

* Eric W. Biederman <[email protected]> wrote:

> Ingo Molnar <[email protected]> writes:
>
> > * Eric W. Biederman <[email protected]> wrote:
> >
> >> > lr-x------ 1 root root 64 Nov 20 18:03 3 -> /proc/net
> >> > ...
> >>
> >> Yes all of those are nasty. So much for my clever way of implementing
> >> these things. Grr. Simple hacks that almost work!
> >
> > btw., in case you feel inclined, i recently did some userspace coding
> > and found to my surprise that /proc/self points to the parent task, not
> > the thread itself (giving threads no real way to examine themselves). If
> > you are hacking in this area, would it be a big trouble to add something
> > like /proc/self-task/ or something like that? I had to use a raw gettid
> > syscall to figure out the TID to get to /proc/*/tasks/TID/sched
> > instrumentation info - which is quite a PITA.
>
> Agreed. I have been debating with myself in the last couple of days
> if it is a bug that /proc/self uses the tgid and not the actual
> pid/tid value.
>
> If I can be convinced that posix threads don't care I will happily
> just switch /proc/self, calling the current implementation a bug.
>
> I think it is a bug the real question is what are the backwards
> compatibility implications. Do posix threads care?
>
> It appears to me that either we need to fix /proc/self or we need to
> add /proc/task-self and fix /proc/mounts to point at that.
>
> In the normal case we share all of the same things so I think it is a
> don't care. Except that /proc/self/status | grep Pid returns the
> tgid.
>
> Hmm. I think I am just going to send Andrew a patch for 2.6.25 that
> just fixes /proc/self. I just fail to see how using the tgid is
> correct. The only cases we could care seem to do the wrong thing when
> we use the tgid.

2007-11-20 22:42:32

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH] proc: Fix the threaded /proc/self.


Long ago when the CLONE_THREAD support first went it someone
thought it would be wise to point /proc/self at /proc/<tgid>
instead of /proc/<pid>.

Given that /proc/<tgid> can return information about a very different
task (if enough things have been unshared) then our current process
/proc/<tgid> seems blatantly wrong. So far I have yet to think up
an example where the current behavior would be advantageous, and
I can see several places where it is seriously non-intuitive.

We may be stuck with the current broken behavior for backwards
compatibility reasons but lets try fixing our ancient bug for the
2.6.25 time frame and see if anyone screams.

Signed-off-by: Eric W. Biederman <[email protected]>
---
fs/proc/base.c | 12 ++++++------
1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 34a1821..8502436 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2050,22 +2050,22 @@ static int proc_self_readlink(struct dentry *dentry, char __user *buffer,
int buflen)
{
struct pid_namespace *ns = dentry->d_sb->s_fs_info;
- pid_t tgid = task_tgid_nr_ns(current, ns);
+ pid_t pid = task_pid_nr_ns(current, ns);
char tmp[PROC_NUMBUF];
- if (!tgid)
+ if (!pid)
return -ENOENT;
- sprintf(tmp, "%d", tgid);
+ sprintf(tmp, "%d", pid);
return vfs_readlink(dentry,buffer,buflen,tmp);
}

static void *proc_self_follow_link(struct dentry *dentry, struct nameidata *nd)
{
struct pid_namespace *ns = dentry->d_sb->s_fs_info;
- pid_t tgid = task_tgid_nr_ns(current, ns);
+ pid_t pid = task_pid_nr_ns(current, ns);
char tmp[PROC_NUMBUF];
- if (!tgid)
+ if (!pid)
return ERR_PTR(-ENOENT);
- sprintf(tmp, "%d", task_tgid_nr_ns(current, ns));
+ sprintf(tmp, "%d", pid);
return ERR_PTR(vfs_follow_link(nd,tmp));
}

--
1.5.3.rc6.17.g1911

2007-11-20 22:55:20

by Roland McGrath

[permalink] [raw]
Subject: Re: 2.6.24-rc3: find complains about /proc/net

When did /proc/self get changed to follow tgid instead of pid? glibc uses
/proc/self to refer to various things that are usually shared anyway (fd,
maps, cwd, exe), but I think the expectation has always been that this
refers to the same calling thread, not the group leader. e.g., if one
thread has changed uids so it no longer has access to the group leader's
/proc/PID/fd, suddenly it using /proc/self/fd starts failing.


Thanks,
Roland

2007-11-20 22:58:29

by Guillaume Chazarain

[permalink] [raw]
Subject: Re: [PATCH] proc: Fix the threaded /proc/self.

Hello Eric,

This fills a need I had to get the current TID in a Java program,
so I'm very interested in this change. OTOH, how will someone
not reading LKML discover that the current TID is now in
/proc/self and that it was not always the case?

I would put my 2 cents in /proc/self/task/self, this way TGID are
always in /proc and TID in /proc/TGID/task.

--
Guillaume

2007-11-20 23:01:50

by Ingo Molnar

[permalink] [raw]
Subject: Re: 2.6.24-rc3: find complains about /proc/net


* Roland McGrath <[email protected]> wrote:

> When did /proc/self get changed to follow tgid instead of pid? glibc
> uses /proc/self to refer to various things that are usually shared
> anyway (fd, maps, cwd, exe), but I think the expectation has always
> been that this refers to the same calling thread, not the group
> leader. e.g., if one thread has changed uids so it no longer has
> access to the group leader's /proc/PID/fd, suddenly it using
> /proc/self/fd starts failing.

i guess it was a v2.6.24 change, hence a regression that needs to be
fixed?

Ingo

2007-11-20 23:03:47

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] proc: Fix the threaded /proc/self.


* Eric W. Biederman <[email protected]> wrote:

> We may be stuck with the current broken behavior for backwards
> compatibility reasons but lets try fixing our ancient bug for the
> 2.6.25 time frame and see if anyone screams.

to make sure i got you right - do you agree that this is a regression
and that we need the patch below included in 2.6.24?

> Signed-off-by: Eric W. Biederman <[email protected]>

Acked-by: Ingo Molnar <[email protected]>

> ---
> fs/proc/base.c | 12 ++++++------
> 1 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/fs/proc/base.c b/fs/proc/base.c
> index 34a1821..8502436 100644
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -2050,22 +2050,22 @@ static int proc_self_readlink(struct dentry *dentry, char __user *buffer,
> int buflen)
> {
> struct pid_namespace *ns = dentry->d_sb->s_fs_info;
> - pid_t tgid = task_tgid_nr_ns(current, ns);
> + pid_t pid = task_pid_nr_ns(current, ns);
> char tmp[PROC_NUMBUF];
> - if (!tgid)
> + if (!pid)
> return -ENOENT;
> - sprintf(tmp, "%d", tgid);
> + sprintf(tmp, "%d", pid);
> return vfs_readlink(dentry,buffer,buflen,tmp);
> }
>
> static void *proc_self_follow_link(struct dentry *dentry, struct nameidata *nd)
> {
> struct pid_namespace *ns = dentry->d_sb->s_fs_info;
> - pid_t tgid = task_tgid_nr_ns(current, ns);
> + pid_t pid = task_pid_nr_ns(current, ns);
> char tmp[PROC_NUMBUF];
> - if (!tgid)
> + if (!pid)
> return ERR_PTR(-ENOENT);
> - sprintf(tmp, "%d", task_tgid_nr_ns(current, ns));
> + sprintf(tmp, "%d", pid);
> return ERR_PTR(vfs_follow_link(nd,tmp));
> }
>
> --
> 1.5.3.rc6.17.g1911

Ingo

2007-11-20 23:07:06

by Guillaume Chazarain

[permalink] [raw]
Subject: Re: 2.6.24-rc3: find complains about /proc/net

On 11/21/07, Ingo Molnar <[email protected]> wrote:
> i guess it was a v2.6.24 change, hence a regression that needs to be
> fixed?

It seems to be

http://git.kernel.org/?p=linux/kernel/git/tglx/history.git;a=commitdiff;h=01660410

So, linux 2.6.0-test6

--
Guillaume

2007-11-20 23:26:46

by Roland McGrath

[permalink] [raw]
Subject: Re: 2.6.24-rc3: find complains about /proc/net

Oh, it seems it has indeed been that way for a very long time, so I was
mistaken. It still seems a little odd to me. Ulrich can say definitively
whether the kind of concern I mentioned really matters one way or the other
for glibc.


Thanks,
Roland

2007-11-20 23:36:27

by Ulrich Drepper

[permalink] [raw]
Subject: Re: 2.6.24-rc3: find complains about /proc/net

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Roland McGrath wrote:
> Oh, it seems it has indeed been that way for a very long time, so I was
> mistaken. It still seems a little odd to me. Ulrich can say definitively
> whether the kind of concern I mentioned really matters one way or the other
> for glibc.

glibc cannot survive (at least NPTL) if somebody uses funny CLONE_*
flags to separate various pieces of information, e.g., file descriptors.
So, all the information in each thread's /proc/self should be identical.

When the information is not the same, the current semantics seems to be
more useful. So I guess, no change is the way to go here.

- --
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org

iD8DBQFHQ25/2ijCOnn/RHQRAmhhAJsHRF7FqO8DWwZ97gHxIO/i4Z1AAQCffCGa
Q2J8kjthKbbNQf1USWMAw3Y=
=xl/a
-----END PGP SIGNATURE-----

2007-11-20 23:43:57

by Ingo Molnar

[permalink] [raw]
Subject: Re: 2.6.24-rc3: find complains about /proc/net


* Guillaume Chazarain <[email protected]> wrote:

> On 11/21/07, Ingo Molnar <[email protected]> wrote:
> > i guess it was a v2.6.24 change, hence a regression that needs to be
> > fixed?
>
> It seems to be
>
> http://git.kernel.org/?p=linux/kernel/git/tglx/history.git;a=commitdiff;h=01660410
>
> So, linux 2.6.0-test6

grumble :-/ So i guess /proc/self-task it has to be.

Ingo

2007-11-20 23:45:34

by Ingo Molnar

[permalink] [raw]
Subject: Re: 2.6.24-rc3: find complains about /proc/net


* Ulrich Drepper <[email protected]> wrote:

> > Oh, it seems it has indeed been that way for a very long time, so I
> > was mistaken. It still seems a little odd to me. Ulrich can say
> > definitively whether the kind of concern I mentioned really matters
> > one way or the other for glibc.
>
> glibc cannot survive (at least NPTL) if somebody uses funny CLONE_*
> flags to separate various pieces of information, e.g., file
> descriptors.
> So, all the information in each thread's /proc/self should be
> identical.
>
> When the information is not the same, the current semantics seems to
> be more useful. So I guess, no change is the way to go here.

can you see any danger to providing a /proc/self_task/ link? (or can you
think of a better name/API/approach)

Ingo

2007-11-20 23:51:37

by Roland McGrath

[permalink] [raw]
Subject: Re: 2.6.24-rc3: find complains about /proc/net

> can you see any danger to providing a /proc/self_task/ link? (or can you
> think of a better name/API/approach)

That is a poor name to choose given /proc/self/task exists as something
else (just try writing a sentence comparing them and then read it aloud).
Probably /proc/self/task/self is what makes the most sense structurally.
I don't know if it matters to whatever use you are concerned with to have
two more steps in the lookup.

Thanks,
Roland

2007-11-21 00:43:38

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: 2.6.24-rc3: find complains about /proc/net

On Wednesday, 21 of November 2007, Roland McGrath wrote:
> > can you see any danger to providing a /proc/self_task/ link? (or can you
> > think of a better name/API/approach)
>
> That is a poor name to choose given /proc/self/task exists as something
> else (just try writing a sentence comparing them and then read it aloud).
> Probably /proc/self/task/self is what makes the most sense structurally.
> I don't know if it matters to whatever use you are concerned with to have
> two more steps in the lookup.

Hm, /proc/this_thread maybe?

Rafael

2007-11-21 00:44:01

by Eric W. Biederman

[permalink] [raw]
Subject: Re: 2.6.24-rc3: find complains about /proc/net

Ulrich Drepper <[email protected]> writes:

> Roland McGrath wrote:
>> Oh, it seems it has indeed been that way for a very long time, so I was
>> mistaken. It still seems a little odd to me. Ulrich can say definitively
>> whether the kind of concern I mentioned really matters one way or the other
>> for glibc.
>
> glibc cannot survive (at least NPTL) if somebody uses funny CLONE_*
> flags to separate various pieces of information, e.g., file descriptors.
> So, all the information in each thread's /proc/self should be identical.

Which seems to confirm that glibc and native pthread can't care.

> When the information is not the same, the current semantics seems to be
> more useful. So I guess, no change is the way to go here.

Could you elaborate a bit on how the semantics of returning the
wrong information are more useful?

In particular if a thread does the logical equivalent of:
grep Pid: /proc/self/status. It always get the tgid despite
having a different process id.

How can that possibly be useful or correct?

>From the kernel side I really think the current semantics of /proc/self
in the context of threads is a bug and confusing. All of the kernel
developers first reaction when this was pointed out was that this
is a regression.

If it is truly useful to user space we can preserve this API design
bug forever. I just want to make certain we are not being bug
compatible without a good reason.

Currently we have several kernel side bugs with threaded
programs because /proc/self does not do the intuitive thing. Unless
something has changed recently selinux will cause accesses by a
non-leader thread to fail when accessing files through /proc/self.

So far the more I look at the current /proc/self behavior the
more I am convinced it is broken, and useless. Please help me see
where it is useful, so we can justify keeping it.

Thanks,
Eric

2007-11-21 00:49:24

by Eric W. Biederman

[permalink] [raw]
Subject: Re: 2.6.24-rc3: find complains about /proc/net

Roland McGrath <[email protected]> writes:

>> can you see any danger to providing a /proc/self_task/ link? (or can you
>> think of a better name/API/approach)
>
> That is a poor name to choose given /proc/self/task exists as something
> else (just try writing a sentence comparing them and then read it aloud).
> Probably /proc/self/task/self is what makes the most sense structurally.
> I don't know if it matters to whatever use you are concerned with to have
> two more steps in the lookup.

Well the only case it could matter is if you aren't allowed to access
/proc/<tgid> which I think may actually be the current selinux behavior.

So if we can't fix /proc/self we need to introduce /proc/task-self at
the top level, just to be certain we don't run into weird cases like
that. Otherwise /proc/self/task/self sounds like a wonderful suggestion.

Eric

2007-11-21 01:25:30

by Eric W. Biederman

[permalink] [raw]
Subject: Re: 2.6.24-rc3: find complains about /proc/net

Pavel Emelyanov <[email protected]> writes:

> Rafael J. Wysocki wrote:
>> On Monday, 19 of November 2007, Pavel Machek wrote:
>>> Hi!
>>>
>>> I think that this worked before:
>>>
>>> root@amd:/proc# find . -name "timer_info"
>>> find: WARNING: Hard link count is wrong for ./net: this may be a bug
>>> in your filesystem driver. Automatically turning on find's -noleaf
>>> option. Earlier results may have failed to include directories that
>>> should have been searched.
>>> root@amd:/proc#
>>
>> I'm seeing that too.
>
> I have a better things with 2.6.24-rc3 ;)
>
> # cd /proc/net
> # ls ..
> ls: reading directory ..: Not a directory
>
> and this
>
> # cd /proc
> # find
> ...
> ./net
> find: . changed during execution of find
> # find net
> find: net changed during execution of find
> # find net/
> <this works ok however>
>
> Moreover. Program that opens /proc/net and dumps the /proc/self/fd
> files produces the following:
>
> # cd /
> # a.out /proc/net
> ...
> lr-x------ 1 root root 64 Nov 20 18:02 3 -> /proc/net/net (deleted)
> ...
> # cd /proc/net
> # a.out .
> ...
> lr-x------ 1 root root 64 Nov 20 18:03 3 -> /proc/net/net (deleted)
> ...
> # a.out ..
> ...
> lr-x------ 1 root root 64 Nov 20 18:03 3 -> /proc/net
> ...
>
> This all is somehow related to the shadow proc files.
> E.g. the first problem (with -ENOTDIR) is due to the shadow /proc/net
> dentry doesn't implement the .readdir method:
>
> static const struct file_operations proc_net_dir_operations = {
> .read = generic_read_dir,
> };
>
> And I haven't managed to find out why the rest problems
> occur...
>
> Eric, do you have fixes for it?

Duh. There is one other possible solution I forgot to mention and
at least as a first pass it should be relatively simple. Have the
mount of proc capture the network namespace. I'm not certain
if it is what we want long term but it should be simple and relatively
easy to implement.

I don't like capturing the network namespace when we mount proc but
it is easier then implementing /proc/self/net. Which is the other
real alternative.

Eric

2007-11-21 01:26:10

by Robert Hancock

[permalink] [raw]
Subject: Re: 2.6.24-rc3: find complains about /proc/net

Eric W. Biederman wrote:
> Could you elaborate a bit on how the semantics of returning the
> wrong information are more useful?
>
> In particular if a thread does the logical equivalent of:
> grep Pid: /proc/self/status. It always get the tgid despite
> having a different process id.

The POSIX-defined userspace concept of a PID requires that all threads
appear to have the same PID. This is something that Linux didn't comply
with under the old LinuxThreads implementation and was finally fixed
with NPTL. This isn't a POSIX-defined interface, but I assume it's
trying to be consistent with getpid(), etc.

> How can that possibly be useful or correct?
>
> From the kernel side I really think the current semantics of /proc/self
> in the context of threads is a bug and confusing. All of the kernel
> developers first reaction when this was pointed out was that this
> is a regression.
>
> If it is truly useful to user space we can preserve this API design
> bug forever. I just want to make certain we are not being bug
> compatible without a good reason.
>
> Currently we have several kernel side bugs with threaded
> programs because /proc/self does not do the intuitive thing. Unless
> something has changed recently selinux will cause accesses by a
> non-leader thread to fail when accessing files through /proc/self.
>
> So far the more I look at the current /proc/self behavior the
> more I am convinced it is broken, and useless. Please help me see
> where it is useful, so we can justify keeping it.

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/

2007-11-21 01:44:13

by Eric W. Biederman

[permalink] [raw]
Subject: Re: 2.6.24-rc3: find complains about /proc/net

Robert Hancock <[email protected]> writes:

> Eric W. Biederman wrote:
>> Could you elaborate a bit on how the semantics of returning the
>> wrong information are more useful?
>>
>> In particular if a thread does the logical equivalent of:
>> grep Pid: /proc/self/status. It always get the tgid despite
>> having a different process id.
>
> The POSIX-defined userspace concept of a PID requires that all threads appear to
> have the same PID. This is something that Linux didn't comply with under the old
> LinuxThreads implementation and was finally fixed with NPTL. This isn't a
> POSIX-defined interface, but I assume it's trying to be consistent with
> getpid(), etc.


Linux exports two fields in /proc/self/status:
Tgid: 32698
Pid: 32698

The tgid maps to the posix concept. The pid is this context is the
thread id.

So it seems broken to me to return the same thread id for different threads.

Eric

2007-11-21 06:38:22

by Eric W. Biederman

[permalink] [raw]
Subject: Re: 2.6.24-rc3: find complains about /proc/net


Below is a preliminary patch. It solves the directory issue but it doesn't
play well with proc_mnt and proc_flush_task. It works by simply caching the
network namespace when we mount proc so we don't have to be fancy and dynamic.

Something for the discussion anyway.

I will start sorting out what makes sense tomorrow.

Eric


>From f359fde2469ba8be2123a465e788a83c7e6831e9 Mon Sep 17 00:00:00 2001
From: Eric W. Biederman <[email protected]>
Date: Tue, 20 Nov 2007 19:36:05 -0700
Subject: [PATCH] proc: Fix /proc/net directory listings.

Having proc dynamically display the contents of /proc/net is
hard. So make life simpler by capturing the network namespace
when we mount proc and only displaying that network namespace.

---
fs/proc/base.c | 8 ++--
fs/proc/generic.c | 4 ++-
fs/proc/internal.h | 13 +++++++
fs/proc/proc_net.c | 89 ++++-------------------------------------------
fs/proc/root.c | 50 ++++++++++++++++++--------
include/linux/proc_fs.h | 4 ++
6 files changed, 66 insertions(+), 102 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index aeaf0d0..9d4f06a 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2395,7 +2395,7 @@ struct dentry *proc_pid_lookup(struct inode *dir, struct dentry * dentry, struct
if (tgid == ~0U)
goto out;

- ns = dentry->d_sb->s_fs_info;
+ ns = proc_sbi(dentry->d_sb)->pid_ns;
rcu_read_lock();
task = find_task_by_pid_ns(tgid, ns);
if (task)
@@ -2476,7 +2476,7 @@ int proc_pid_readdir(struct file * filp, void * dirent, filldir_t filldir)
goto out;
}

- ns = filp->f_dentry->d_sb->s_fs_info;
+ ns = proc_sbi(filp->f_dentry->d_sb)->pid_ns;
tgid = filp->f_pos - TGID_OFFSET;
for (task = next_tgid(tgid, ns);
task;
@@ -2615,7 +2615,7 @@ static struct dentry *proc_task_lookup(struct inode *dir, struct dentry * dentry
if (tid == ~0U)
goto out;

- ns = dentry->d_sb->s_fs_info;
+ ns = proc_sbi(dentry->d_sb)->pid_ns;
rcu_read_lock();
task = find_task_by_pid_ns(tid, ns);
if (task)
@@ -2758,7 +2758,7 @@ static int proc_task_readdir(struct file * filp, void * dirent, filldir_t filldi
/* f_version caches the tgid value that the last readdir call couldn't
* return. lseek aka telldir automagically resets f_version to 0.
*/
- ns = filp->f_dentry->d_sb->s_fs_info;
+ ns = proc_sbi(filp->f_dentry->d_sb)->pid_ns;
tid = (int)filp->f_version;
filp->f_version = 0;
for (task = first_tid(leader, tid, pos - 2, ns);
diff --git a/fs/proc/generic.c b/fs/proc/generic.c
index 1bdb624..b58f0ec 100644
--- a/fs/proc/generic.c
+++ b/fs/proc/generic.c
@@ -398,7 +398,9 @@ struct dentry *proc_lookup(struct inode * dir, struct dentry *dentry, struct nam
continue;
if (!memcmp(dentry->d_name.name, de->name, de->namelen)) {
unsigned int ino = de->low_ino;
-
+
+ if (de->shadow_proc)
+ de = de->shadow_proc(dentry->d_sb, de);
de_get(de);
spin_unlock(&proc_subdir_lock);
error = -EINVAL;
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index 1820eb2..a26f115 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -11,6 +11,18 @@

#include <linux/proc_fs.h>

+struct pid_namespace;
+struct net;
+struct proc_sb_info {
+ struct pid_namespace *pid_ns;
+ struct net *net_ns;
+};
+
+static inline struct proc_sb_info *proc_sbi(struct super_block *sb)
+{
+ return sb->s_fs_info;
+}
+
#ifdef CONFIG_PROC_SYSCTL
extern int proc_sys_init(void);
#else
@@ -78,3 +90,4 @@ static inline int proc_fd(struct inode *inode)
{
return PROC_I(inode)->fd;
}
+
diff --git a/fs/proc/proc_net.c b/fs/proc/proc_net.c
index 131f9c6..8a82e29 100644
--- a/fs/proc/proc_net.c
+++ b/fs/proc/proc_net.c
@@ -50,89 +50,15 @@ struct net *get_proc_net(const struct inode *inode)
}
EXPORT_SYMBOL_GPL(get_proc_net);

-static struct proc_dir_entry *proc_net_shadow;
+static struct proc_dir_entry *shadow_pde;

-static struct dentry *proc_net_shadow_dentry(struct dentry *parent,
- struct proc_dir_entry *de)
+static struct proc_dir_entry *proc_net_shadow(struct super_block *sb,
+ struct proc_dir_entry *de)
{
- struct dentry *shadow = NULL;
- struct inode *inode;
- if (!de)
- goto out;
- de_get(de);
- inode = proc_get_inode(parent->d_inode->i_sb, de->low_ino, de);
- if (!inode)
- goto out_de_put;
- shadow = d_alloc_name(parent, de->name);
- if (!shadow)
- goto out_iput;
- shadow->d_op = parent->d_op; /* proc_dentry_operations */
- d_instantiate(shadow, inode);
-out:
- return shadow;
-out_iput:
- iput(inode);
-out_de_put:
- de_put(de);
- goto out;
-}
-
-static void *proc_net_follow_link(struct dentry *parent, struct nameidata *nd)
-{
- struct net *net = current->nsproxy->net_ns;
- struct dentry *shadow;
- shadow = proc_net_shadow_dentry(parent, net->proc_net);
- if (!shadow)
- return ERR_PTR(-ENOENT);
-
- dput(nd->dentry);
- /* My dentry count is 1 and that should be enough as the
- * shadow dentry is thrown away immediately.
- */
- nd->dentry = shadow;
- return NULL;
+ struct proc_sb_info *sbi = proc_sbi(sb);
+ return sbi->net_ns->proc_net;
}

-static struct dentry *proc_net_lookup(struct inode *dir, struct dentry *dentry,
- struct nameidata *nd)
-{
- struct net *net = current->nsproxy->net_ns;
- struct dentry *shadow;
-
- shadow = proc_net_shadow_dentry(nd->dentry, net->proc_net);
- if (!shadow)
- return ERR_PTR(-ENOENT);
-
- dput(nd->dentry);
- nd->dentry = shadow;
-
- return shadow->d_inode->i_op->lookup(shadow->d_inode, dentry, nd);
-}
-
-static int proc_net_setattr(struct dentry *dentry, struct iattr *iattr)
-{
- struct net *net = current->nsproxy->net_ns;
- struct dentry *shadow;
- int ret;
-
- shadow = proc_net_shadow_dentry(dentry->d_parent, net->proc_net);
- if (!shadow)
- return -ENOENT;
- ret = shadow->d_inode->i_op->setattr(shadow, iattr);
- dput(shadow);
- return ret;
-}
-
-static const struct file_operations proc_net_dir_operations = {
- .read = generic_read_dir,
-};
-
-static struct inode_operations proc_net_dir_inode_operations = {
- .follow_link = proc_net_follow_link,
- .lookup = proc_net_lookup,
- .setattr = proc_net_setattr,
-};
-
static __net_init int proc_net_ns_init(struct net *net)
{
struct proc_dir_entry *root, *netd, *net_statd;
@@ -185,9 +111,8 @@ static struct pernet_operations __net_initdata proc_net_ns_ops = {

int __init proc_net_init(void)
{
- proc_net_shadow = proc_mkdir("net", NULL);
- proc_net_shadow->proc_iops = &proc_net_dir_inode_operations;
- proc_net_shadow->proc_fops = &proc_net_dir_operations;
+ shadow_pde = proc_mkdir("net", NULL);
+ shadow_pde->shadow_proc = proc_net_shadow;

return register_pernet_subsys(&proc_net_ns_ops);
}
diff --git a/fs/proc/root.c b/fs/proc/root.c
index ec9cb3b..e60ac83 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -19,6 +19,7 @@
#include <linux/smp_lock.h>
#include <linux/mount.h>
#include <linux/pid_namespace.h>
+#include <net/net_namespace.h>

#include "internal.h"

@@ -26,15 +27,23 @@ struct proc_dir_entry *proc_bus, *proc_root_fs, *proc_root_driver;

static int proc_test_super(struct super_block *sb, void *data)
{
- return sb->s_fs_info == data;
+ struct proc_sb_info *sbi = proc_sbi(sb), *info = data;
+ return (sbi->pid_ns == info->pid_ns) &&
+ (sbi->net_ns == info->net_ns);
}

static int proc_set_super(struct super_block *sb, void *data)
{
- struct pid_namespace *ns;
-
- ns = (struct pid_namespace *)data;
- sb->s_fs_info = get_pid_ns(ns);
+
+ struct proc_sb_info *new, *info = data;
+
+ new = kzalloc(sizeof(*new), GFP_KERNEL);
+ if (!new)
+ return -ENOMEM;
+ *new = *info;
+ get_pid_ns(new->pid_ns);
+ get_net(new->net_ns);
+ sb->s_fs_info = new;
return set_anon_super(sb, NULL);
}

@@ -43,7 +52,7 @@ static int proc_get_sb(struct file_system_type *fs_type,
{
int err;
struct super_block *sb;
- struct pid_namespace *ns;
+ struct proc_sb_info info, *sbi;
struct proc_inode *ei;

if (proc_mnt) {
@@ -57,12 +66,14 @@ static int proc_get_sb(struct file_system_type *fs_type,
ei->pid = find_get_pid(1);
}

+ info.pid_ns = current->nsproxy->pid_ns;
+ info.net_ns = current->nsproxy->net_ns;
if (flags & MS_KERNMOUNT)
- ns = (struct pid_namespace *)data;
+ sbi = data;
else
- ns = current->nsproxy->pid_ns;
+ sbi = &info;

- sb = sget(fs_type, proc_test_super, proc_set_super, ns);
+ sb = sget(fs_type, proc_test_super, proc_set_super, sbi);
if (IS_ERR(sb))
return PTR_ERR(sb);

@@ -78,12 +89,13 @@ static int proc_get_sb(struct file_system_type *fs_type,
ei = PROC_I(sb->s_root->d_inode);
if (!ei->pid) {
rcu_read_lock();
- ei->pid = get_pid(find_pid_ns(1, ns));
+ ei->pid = get_pid(find_pid_ns(1, sbi->pid_ns));
rcu_read_unlock();
}

sb->s_flags |= MS_ACTIVE;
- ns->proc_mnt = mnt;
+ if (!sbi->pid_ns->proc_mnt)
+ sbi->pid_ns->proc_mnt = mnt;
}

return simple_set_mnt(mnt, sb);
@@ -91,11 +103,13 @@ static int proc_get_sb(struct file_system_type *fs_type,

static void proc_kill_sb(struct super_block *sb)
{
- struct pid_namespace *ns;
+ struct proc_sb_info *sbi;

- ns = (struct pid_namespace *)sb->s_fs_info;
+ sbi = proc_sbi(sb);
kill_anon_super(sb);
- put_pid_ns(ns);
+ put_pid_ns(sbi->pid_ns);
+ put_net(sbi->net_ns);
+ kfree(sbi);
}

static struct file_system_type proc_fs_type = {
@@ -106,13 +120,16 @@ static struct file_system_type proc_fs_type = {

void __init proc_root_init(void)
{
+ struct proc_sb_info info;
int err = proc_init_inodecache();
if (err)
return;
err = register_filesystem(&proc_fs_type);
if (err)
return;
- proc_mnt = kern_mount_data(&proc_fs_type, &init_pid_ns);
+ info.pid_ns = &init_pid_ns;
+ info.net_ns = current->nsproxy->net_ns;
+ proc_mnt = kern_mount_data(&proc_fs_type, &info);
err = PTR_ERR(proc_mnt);
if (IS_ERR(proc_mnt)) {
unregister_filesystem(&proc_fs_type);
@@ -214,8 +231,11 @@ struct proc_dir_entry proc_root = {

int pid_ns_prepare_proc(struct pid_namespace *ns)
{
+ struct proc_sb_info info;
struct vfsmount *mnt;

+ info.pid_ns = ns;
+ info.net_ns = current->nsproxy->net_ns;
mnt = kern_mount_data(&proc_fs_type, ns);
if (IS_ERR(mnt))
return PTR_ERR(mnt);
diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h
index 2b3c1d8..c22c558 100644
--- a/include/linux/proc_fs.h
+++ b/include/linux/proc_fs.h
@@ -48,6 +48,9 @@ typedef int (read_proc_t)(char *page, char **start, off_t off,
typedef int (write_proc_t)(struct file *file, const char __user *buffer,
unsigned long count, void *data);
typedef int (get_info_t)(char *, char **, off_t, int);
+struct proc_dir_entry;
+typedef struct proc_dir_entry *(shadow_proc_t)(struct super_block *sb,
+ struct proc_dir_entry *pde);

struct proc_dir_entry {
unsigned int low_ino;
@@ -79,6 +82,7 @@ struct proc_dir_entry {
int pde_users; /* number of callers into module in progress */
spinlock_t pde_unload_lock; /* proc_fops checks and pde_users bumps */
struct completion *pde_unload_completion;
+ shadow_proc_t *shadow_proc;
};

struct kcore_list {
--
1.5.3.rc6.17.g1911

2007-11-21 09:37:47

by Pavel Emelyanov

[permalink] [raw]
Subject: Re: 2.6.24-rc3: find complains about /proc/net

Eric W. Biederman wrote:
> Below is a preliminary patch. It solves the directory issue but it doesn't
> play well with proc_mnt and proc_flush_task. It works by simply caching the
> network namespace when we mount proc so we don't have to be fancy and dynamic.

Nice... Where should we apply this patch to?

> Something for the discussion anyway.
>
> I will start sorting out what makes sense tomorrow.
>
> Eric
>
>
>>From f359fde2469ba8be2123a465e788a83c7e6831e9 Mon Sep 17 00:00:00 2001
> From: Eric W. Biederman <[email protected]>
> Date: Tue, 20 Nov 2007 19:36:05 -0700
> Subject: [PATCH] proc: Fix /proc/net directory listings.
>
> Having proc dynamically display the contents of /proc/net is
> hard. So make life simpler by capturing the network namespace
> when we mount proc and only displaying that network namespace.
>
> ---
> fs/proc/base.c | 8 ++--
> fs/proc/generic.c | 4 ++-
> fs/proc/internal.h | 13 +++++++
> fs/proc/proc_net.c | 89 ++++-------------------------------------------
> fs/proc/root.c | 50 ++++++++++++++++++--------
> include/linux/proc_fs.h | 4 ++
> 6 files changed, 66 insertions(+), 102 deletions(-)
>
> diff --git a/fs/proc/base.c b/fs/proc/base.c
> index aeaf0d0..9d4f06a 100644
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -2395,7 +2395,7 @@ struct dentry *proc_pid_lookup(struct inode *dir, struct dentry * dentry, struct
> if (tgid == ~0U)
> goto out;
>
> - ns = dentry->d_sb->s_fs_info;
> + ns = proc_sbi(dentry->d_sb)->pid_ns;
> rcu_read_lock();
> task = find_task_by_pid_ns(tgid, ns);
> if (task)
> @@ -2476,7 +2476,7 @@ int proc_pid_readdir(struct file * filp, void * dirent, filldir_t filldir)
> goto out;
> }
>
> - ns = filp->f_dentry->d_sb->s_fs_info;
> + ns = proc_sbi(filp->f_dentry->d_sb)->pid_ns;
> tgid = filp->f_pos - TGID_OFFSET;
> for (task = next_tgid(tgid, ns);
> task;
> @@ -2615,7 +2615,7 @@ static struct dentry *proc_task_lookup(struct inode *dir, struct dentry * dentry
> if (tid == ~0U)
> goto out;
>
> - ns = dentry->d_sb->s_fs_info;
> + ns = proc_sbi(dentry->d_sb)->pid_ns;
> rcu_read_lock();
> task = find_task_by_pid_ns(tid, ns);
> if (task)
> @@ -2758,7 +2758,7 @@ static int proc_task_readdir(struct file * filp, void * dirent, filldir_t filldi
> /* f_version caches the tgid value that the last readdir call couldn't
> * return. lseek aka telldir automagically resets f_version to 0.
> */
> - ns = filp->f_dentry->d_sb->s_fs_info;
> + ns = proc_sbi(filp->f_dentry->d_sb)->pid_ns;
> tid = (int)filp->f_version;
> filp->f_version = 0;
> for (task = first_tid(leader, tid, pos - 2, ns);
> diff --git a/fs/proc/generic.c b/fs/proc/generic.c
> index 1bdb624..b58f0ec 100644
> --- a/fs/proc/generic.c
> +++ b/fs/proc/generic.c
> @@ -398,7 +398,9 @@ struct dentry *proc_lookup(struct inode * dir, struct dentry *dentry, struct nam
> continue;
> if (!memcmp(dentry->d_name.name, de->name, de->namelen)) {
> unsigned int ino = de->low_ino;
> -
> +
> + if (de->shadow_proc)
> + de = de->shadow_proc(dentry->d_sb, de);
> de_get(de);
> spin_unlock(&proc_subdir_lock);
> error = -EINVAL;
> diff --git a/fs/proc/internal.h b/fs/proc/internal.h
> index 1820eb2..a26f115 100644
> --- a/fs/proc/internal.h
> +++ b/fs/proc/internal.h
> @@ -11,6 +11,18 @@
>
> #include <linux/proc_fs.h>
>
> +struct pid_namespace;
> +struct net;
> +struct proc_sb_info {
> + struct pid_namespace *pid_ns;
> + struct net *net_ns;
> +};
> +
> +static inline struct proc_sb_info *proc_sbi(struct super_block *sb)
> +{
> + return sb->s_fs_info;
> +}
> +
> #ifdef CONFIG_PROC_SYSCTL
> extern int proc_sys_init(void);
> #else
> @@ -78,3 +90,4 @@ static inline int proc_fd(struct inode *inode)
> {
> return PROC_I(inode)->fd;
> }
> +
> diff --git a/fs/proc/proc_net.c b/fs/proc/proc_net.c
> index 131f9c6..8a82e29 100644
> --- a/fs/proc/proc_net.c
> +++ b/fs/proc/proc_net.c
> @@ -50,89 +50,15 @@ struct net *get_proc_net(const struct inode *inode)
> }
> EXPORT_SYMBOL_GPL(get_proc_net);
>
> -static struct proc_dir_entry *proc_net_shadow;
> +static struct proc_dir_entry *shadow_pde;
>
> -static struct dentry *proc_net_shadow_dentry(struct dentry *parent,
> - struct proc_dir_entry *de)
> +static struct proc_dir_entry *proc_net_shadow(struct super_block *sb,
> + struct proc_dir_entry *de)
> {
> - struct dentry *shadow = NULL;
> - struct inode *inode;
> - if (!de)
> - goto out;
> - de_get(de);
> - inode = proc_get_inode(parent->d_inode->i_sb, de->low_ino, de);
> - if (!inode)
> - goto out_de_put;
> - shadow = d_alloc_name(parent, de->name);
> - if (!shadow)
> - goto out_iput;
> - shadow->d_op = parent->d_op; /* proc_dentry_operations */
> - d_instantiate(shadow, inode);
> -out:
> - return shadow;
> -out_iput:
> - iput(inode);
> -out_de_put:
> - de_put(de);
> - goto out;
> -}
> -
> -static void *proc_net_follow_link(struct dentry *parent, struct nameidata *nd)
> -{
> - struct net *net = current->nsproxy->net_ns;
> - struct dentry *shadow;
> - shadow = proc_net_shadow_dentry(parent, net->proc_net);
> - if (!shadow)
> - return ERR_PTR(-ENOENT);
> -
> - dput(nd->dentry);
> - /* My dentry count is 1 and that should be enough as the
> - * shadow dentry is thrown away immediately.
> - */
> - nd->dentry = shadow;
> - return NULL;
> + struct proc_sb_info *sbi = proc_sbi(sb);
> + return sbi->net_ns->proc_net;
> }
>
> -static struct dentry *proc_net_lookup(struct inode *dir, struct dentry *dentry,
> - struct nameidata *nd)
> -{
> - struct net *net = current->nsproxy->net_ns;
> - struct dentry *shadow;
> -
> - shadow = proc_net_shadow_dentry(nd->dentry, net->proc_net);
> - if (!shadow)
> - return ERR_PTR(-ENOENT);
> -
> - dput(nd->dentry);
> - nd->dentry = shadow;
> -
> - return shadow->d_inode->i_op->lookup(shadow->d_inode, dentry, nd);
> -}
> -
> -static int proc_net_setattr(struct dentry *dentry, struct iattr *iattr)
> -{
> - struct net *net = current->nsproxy->net_ns;
> - struct dentry *shadow;
> - int ret;
> -
> - shadow = proc_net_shadow_dentry(dentry->d_parent, net->proc_net);
> - if (!shadow)
> - return -ENOENT;
> - ret = shadow->d_inode->i_op->setattr(shadow, iattr);
> - dput(shadow);
> - return ret;
> -}
> -
> -static const struct file_operations proc_net_dir_operations = {
> - .read = generic_read_dir,
> -};
> -
> -static struct inode_operations proc_net_dir_inode_operations = {
> - .follow_link = proc_net_follow_link,
> - .lookup = proc_net_lookup,
> - .setattr = proc_net_setattr,
> -};
> -
> static __net_init int proc_net_ns_init(struct net *net)
> {
> struct proc_dir_entry *root, *netd, *net_statd;
> @@ -185,9 +111,8 @@ static struct pernet_operations __net_initdata proc_net_ns_ops = {
>
> int __init proc_net_init(void)
> {
> - proc_net_shadow = proc_mkdir("net", NULL);
> - proc_net_shadow->proc_iops = &proc_net_dir_inode_operations;
> - proc_net_shadow->proc_fops = &proc_net_dir_operations;
> + shadow_pde = proc_mkdir("net", NULL);
> + shadow_pde->shadow_proc = proc_net_shadow;
>
> return register_pernet_subsys(&proc_net_ns_ops);
> }
> diff --git a/fs/proc/root.c b/fs/proc/root.c
> index ec9cb3b..e60ac83 100644
> --- a/fs/proc/root.c
> +++ b/fs/proc/root.c
> @@ -19,6 +19,7 @@
> #include <linux/smp_lock.h>
> #include <linux/mount.h>
> #include <linux/pid_namespace.h>
> +#include <net/net_namespace.h>
>
> #include "internal.h"
>
> @@ -26,15 +27,23 @@ struct proc_dir_entry *proc_bus, *proc_root_fs, *proc_root_driver;
>
> static int proc_test_super(struct super_block *sb, void *data)
> {
> - return sb->s_fs_info == data;
> + struct proc_sb_info *sbi = proc_sbi(sb), *info = data;
> + return (sbi->pid_ns == info->pid_ns) &&
> + (sbi->net_ns == info->net_ns);
> }
>
> static int proc_set_super(struct super_block *sb, void *data)
> {
> - struct pid_namespace *ns;
> -
> - ns = (struct pid_namespace *)data;
> - sb->s_fs_info = get_pid_ns(ns);
> +
> + struct proc_sb_info *new, *info = data;
> +
> + new = kzalloc(sizeof(*new), GFP_KERNEL);
> + if (!new)
> + return -ENOMEM;
> + *new = *info;
> + get_pid_ns(new->pid_ns);
> + get_net(new->net_ns);
> + sb->s_fs_info = new;
> return set_anon_super(sb, NULL);
> }
>
> @@ -43,7 +52,7 @@ static int proc_get_sb(struct file_system_type *fs_type,
> {
> int err;
> struct super_block *sb;
> - struct pid_namespace *ns;
> + struct proc_sb_info info, *sbi;
> struct proc_inode *ei;
>
> if (proc_mnt) {
> @@ -57,12 +66,14 @@ static int proc_get_sb(struct file_system_type *fs_type,
> ei->pid = find_get_pid(1);
> }
>
> + info.pid_ns = current->nsproxy->pid_ns;
> + info.net_ns = current->nsproxy->net_ns;
> if (flags & MS_KERNMOUNT)
> - ns = (struct pid_namespace *)data;
> + sbi = data;
> else
> - ns = current->nsproxy->pid_ns;
> + sbi = &info;
>
> - sb = sget(fs_type, proc_test_super, proc_set_super, ns);
> + sb = sget(fs_type, proc_test_super, proc_set_super, sbi);
> if (IS_ERR(sb))
> return PTR_ERR(sb);
>
> @@ -78,12 +89,13 @@ static int proc_get_sb(struct file_system_type *fs_type,
> ei = PROC_I(sb->s_root->d_inode);
> if (!ei->pid) {
> rcu_read_lock();
> - ei->pid = get_pid(find_pid_ns(1, ns));
> + ei->pid = get_pid(find_pid_ns(1, sbi->pid_ns));
> rcu_read_unlock();
> }
>
> sb->s_flags |= MS_ACTIVE;
> - ns->proc_mnt = mnt;
> + if (!sbi->pid_ns->proc_mnt)
> + sbi->pid_ns->proc_mnt = mnt;
> }
>
> return simple_set_mnt(mnt, sb);
> @@ -91,11 +103,13 @@ static int proc_get_sb(struct file_system_type *fs_type,
>
> static void proc_kill_sb(struct super_block *sb)
> {
> - struct pid_namespace *ns;
> + struct proc_sb_info *sbi;
>
> - ns = (struct pid_namespace *)sb->s_fs_info;
> + sbi = proc_sbi(sb);
> kill_anon_super(sb);
> - put_pid_ns(ns);
> + put_pid_ns(sbi->pid_ns);
> + put_net(sbi->net_ns);
> + kfree(sbi);
> }
>
> static struct file_system_type proc_fs_type = {
> @@ -106,13 +120,16 @@ static struct file_system_type proc_fs_type = {
>
> void __init proc_root_init(void)
> {
> + struct proc_sb_info info;
> int err = proc_init_inodecache();
> if (err)
> return;
> err = register_filesystem(&proc_fs_type);
> if (err)
> return;
> - proc_mnt = kern_mount_data(&proc_fs_type, &init_pid_ns);
> + info.pid_ns = &init_pid_ns;
> + info.net_ns = current->nsproxy->net_ns;
> + proc_mnt = kern_mount_data(&proc_fs_type, &info);
> err = PTR_ERR(proc_mnt);
> if (IS_ERR(proc_mnt)) {
> unregister_filesystem(&proc_fs_type);
> @@ -214,8 +231,11 @@ struct proc_dir_entry proc_root = {
>
> int pid_ns_prepare_proc(struct pid_namespace *ns)
> {
> + struct proc_sb_info info;
> struct vfsmount *mnt;
>
> + info.pid_ns = ns;
> + info.net_ns = current->nsproxy->net_ns;
> mnt = kern_mount_data(&proc_fs_type, ns);
> if (IS_ERR(mnt))
> return PTR_ERR(mnt);
> diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h
> index 2b3c1d8..c22c558 100644
> --- a/include/linux/proc_fs.h
> +++ b/include/linux/proc_fs.h
> @@ -48,6 +48,9 @@ typedef int (read_proc_t)(char *page, char **start, off_t off,
> typedef int (write_proc_t)(struct file *file, const char __user *buffer,
> unsigned long count, void *data);
> typedef int (get_info_t)(char *, char **, off_t, int);
> +struct proc_dir_entry;
> +typedef struct proc_dir_entry *(shadow_proc_t)(struct super_block *sb,
> + struct proc_dir_entry *pde);
>
> struct proc_dir_entry {
> unsigned int low_ino;
> @@ -79,6 +82,7 @@ struct proc_dir_entry {
> int pde_users; /* number of callers into module in progress */
> spinlock_t pde_unload_lock; /* proc_fops checks and pde_users bumps */
> struct completion *pde_unload_completion;
> + shadow_proc_t *shadow_proc;
> };
>
> struct kcore_list {

2007-11-24 23:35:36

by Eric W. Biederman

[permalink] [raw]
Subject: [CFT][PATCH] proc_net: Remove userspace visible changes.


Ok. I have kicked around a lot implementation ideas and took a good hard
look at my /proc/net implementation. The patch below should close all
of the holes with /proc/net that I am aware of.

Bind mounts work and properly capture /proc/net/
stat of /proc/net and /proc/net/ return the same information.
cd /proc/net/ ; ls .. works
The dentry has the proper parent and no longer appears deleted.

As well as few more theoretical cases I have been able to imagine,
like open("/proc/net", O_NOFOLLOW | O_DIRECTORY) getdents...

Please take a look and kick this patch around. I don't expect anyone
to find any issues but a few more eyeballs before I send this
along to Linus would be appreciated. Thanks.


From: Eric W. Biederman <[email protected]>
Subject: [PATCH] proc_net: Remove userspace visible changes.

This patch fixes some bugs in corner cases of the /proc/net
implementation.

In proc_net_shadow_dentry.
- Set the parent dentry properly.
- Make the dentry appear hashed so .. works.

Remove the unreachable proc_net_lookup.

Implement proc_net_getattr to complete the
set of implemented inode operations.

Implement proc_net_open which changes the directory we
are openting to remove the need to implement any other
file operations.

Add a big fat comment on how /proc/net works to make it
easier for someone else to look at and understand this code.

This patch should remove the last of the accidental user visible artifacts
that arose from adding network namespace support to /proc/net.

Signed-off-by: Eric W. Biederman <[email protected]>
---
fs/proc/proc_net.c | 116 +++++++++++++++++++++++++++++++++++++++++----------
1 files changed, 93 insertions(+), 23 deletions(-)

diff --git a/fs/proc/proc_net.c b/fs/proc/proc_net.c
index 131f9c6..b0b4b3f 100644
--- a/fs/proc/proc_net.c
+++ b/fs/proc/proc_net.c
@@ -50,24 +50,69 @@ struct net *get_proc_net(const struct inode *inode)
}
EXPORT_SYMBOL_GPL(get_proc_net);

+/*
+ * The contents of the files under /proc/net depend on which network
+ * namespace you are in.
+ *
+ * This implementation relies on the following properties.
+ *
+ * - Each network namespaces has it's own /proc/net dcache tree.
+ * - A directory with a follow_link method never calls lookup
+ * - It is possible in ->open to competely change which underlying
+ * filesystem, path, and inode the struct file refers to.
+ * - A dcache entry with DCACHE_UNHASHED clear and pprev set
+ * appares hashed (and thus valid) to the dcache.
+ *
+ * To give each network namespace it's own /proc/net directory
+ * in a manner transparent to user space (and not requiring /proc)
+ * be remounted we do the following things:
+ *
+ * Keep a different dentry tree for each network namespace under
+ * /proc/net.
+ *
+ * Have the root of the /proc/net dentry tree be a ``unhashed''
+ * dentry with it's root pointing at the /proc dentry. Making
+ * it appear in parallel with the normal /proc/net.
+ *
+ * Redirect all opens of the normal /proc/net to the one appropriate
+ * for the opening process in ->open.
+ *
+ * Redirect all directory traversals onto the appropriate /proc/net
+ * with a follow_link method.
+ *
+ * Wrap all other applicable inode operations so they appear to
+ * happen not on the normal /proc/net but on the network namespace
+ * specific one.
+ *
+ * Currently we can use a bind mount inside a network namespace
+ * to /proc/net visible to processes outside that network namespace.
+ * Long term /proc/net should migrate to /proc/<pid>/net removing
+ * the need for the bind mount for monitoring processes.
+ */
+
static struct proc_dir_entry *proc_net_shadow;

-static struct dentry *proc_net_shadow_dentry(struct dentry *parent,
- struct proc_dir_entry *de)
+static struct dentry *proc_net_shadow_dentry(struct net *net,
+ struct dentry *dentry)
{
+ struct proc_dir_entry *de = net->proc_net;
struct dentry *shadow = NULL;
struct inode *inode;
if (!de)
goto out;
de_get(de);
- inode = proc_get_inode(parent->d_inode->i_sb, de->low_ino, de);
+ inode = proc_get_inode(dentry->d_sb, de->low_ino, de);
if (!inode)
goto out_de_put;
- shadow = d_alloc_name(parent, de->name);
+ shadow = d_alloc(dentry->d_parent, &dentry->d_name);
if (!shadow)
goto out_iput;
- shadow->d_op = parent->d_op; /* proc_dentry_operations */
+ shadow->d_op = dentry->d_op; /* proc_dentry_operations */
d_instantiate(shadow, inode);
+
+ /* Make the dentry looked hashed */
+ shadow->d_hash.pprev = &shadow->d_hash.next;
+ shadow->d_flags &= ~DCACHE_UNHASHED;
out:
return shadow;
out_iput:
@@ -77,36 +122,36 @@ out_de_put:
goto out;
}

-static void *proc_net_follow_link(struct dentry *parent, struct nameidata *nd)
+static void *proc_net_follow_link(struct dentry *dentry, struct nameidata *nd)
{
struct net *net = current->nsproxy->net_ns;
struct dentry *shadow;
- shadow = proc_net_shadow_dentry(parent, net->proc_net);
+
+ shadow = proc_net_shadow_dentry(net, dentry);
if (!shadow)
- return ERR_PTR(-ENOENT);
+ goto out_err;

dput(nd->dentry);
- /* My dentry count is 1 and that should be enough as the
- * shadow dentry is thrown away immediately.
- */
nd->dentry = shadow;
+
return NULL;
+out_err:
+ return ERR_PTR(-ENOENT);
}

-static struct dentry *proc_net_lookup(struct inode *dir, struct dentry *dentry,
- struct nameidata *nd)
+static int proc_net_getattr(struct vfsmount *mnt, struct dentry *dentry,
+ struct kstat *stat)
{
struct net *net = current->nsproxy->net_ns;
struct dentry *shadow;
+ int ret;

- shadow = proc_net_shadow_dentry(nd->dentry, net->proc_net);
+ shadow = proc_net_shadow_dentry(net, dentry);
if (!shadow)
- return ERR_PTR(-ENOENT);
-
- dput(nd->dentry);
- nd->dentry = shadow;
-
- return shadow->d_inode->i_op->lookup(shadow->d_inode, dentry, nd);
+ return -ENOENT;
+ ret = shadow->d_inode->i_op->getattr(mnt, shadow, stat);
+ dput(shadow);
+ return ret;
}

static int proc_net_setattr(struct dentry *dentry, struct iattr *iattr)
@@ -115,7 +160,7 @@ static int proc_net_setattr(struct dentry *dentry, struct iattr *iattr)
struct dentry *shadow;
int ret;

- shadow = proc_net_shadow_dentry(dentry->d_parent, net->proc_net);
+ shadow = proc_net_shadow_dentry(net, dentry);
if (!shadow)
return -ENOENT;
ret = shadow->d_inode->i_op->setattr(shadow, iattr);
@@ -123,13 +168,38 @@ static int proc_net_setattr(struct dentry *dentry, struct iattr *iattr)
return ret;
}

+static int proc_net_open(struct inode *inode, struct file *filp)
+{
+ struct net *net = current->nsproxy->net_ns;
+ struct dentry *shadow;
+ int ret;
+
+ shadow = proc_net_shadow_dentry(net, filp->f_dentry);
+ if (!shadow)
+ return -ENOENT;
+
+ inode = shadow->d_inode;
+
+ fops_put(filp->f_op);
+ dput(filp->f_dentry);
+
+ filp->f_mapping = inode->i_mapping;
+ filp->f_op = fops_get(inode->i_fop);
+ filp->f_dentry = shadow;
+
+ ret = 0;
+ if (filp->f_op && filp->f_op->open)
+ ret = filp->f_op->open(inode, filp);
+ return ret;
+}
+
static const struct file_operations proc_net_dir_operations = {
- .read = generic_read_dir,
+ .open = proc_net_open,
};

static struct inode_operations proc_net_dir_inode_operations = {
.follow_link = proc_net_follow_link,
- .lookup = proc_net_lookup,
+ .getattr = proc_net_getattr,
.setattr = proc_net_setattr,
};

--
1.5.3.rc6.17.g1911

2007-11-26 08:45:28

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [CFT][PATCH] proc_net: Remove userspace visible changes.


Forget this patch. It works but I have found something better.
Full explanation in the morning.

Eric

2007-11-26 22:19:43

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH 2.6.24-rc3] Fix /proc/net breakage


Pavel Emelyanov <[email protected]> writes:

> Rafael J. Wysocki wrote:
>> On Monday, 19 of November 2007, Pavel Machek wrote:
>>> Hi!
>>>
>>> I think that this worked before:
>>>
>>> root@amd:/proc# find . -name "timer_info"
>>> find: WARNING: Hard link count is wrong for ./net: this may be a bug
>>> in your filesystem driver. Automatically turning on find's -noleaf
>>> option. Earlier results may have failed to include directories that
>>> should have been searched.
>>> root@amd:/proc#
>>
>> I'm seeing that too.
>
> I have a better things with 2.6.24-rc3 ;)
>
> # cd /proc/net
> # ls ..
> ls: reading directory ..: Not a directory
>
> and this
>
> # cd /proc
> # find
> ...
> ./net
> find: . changed during execution of find
> # find net
> find: net changed during execution of find
> # find net/
> <this works ok however>
>
> Moreover. Program that opens /proc/net and dumps the /proc/self/fd
> files produces the following:
>
> # cd /
> # a.out /proc/net
> ...
> lr-x------ 1 root root 64 Nov 20 18:02 3 -> /proc/net/net (deleted)
> ...
> # cd /proc/net
> # a.out .
> ...
> lr-x------ 1 root root 64 Nov 20 18:03 3 -> /proc/net/net (deleted)
> ...
> # a.out ..
> ...
> lr-x------ 1 root root 64 Nov 20 18:03 3 -> /proc/net
> ...

Well I clearly goofed when I added the initial network namespace support
for /proc/net. Currently things work but there are odd details visible
to user space, even when we have a single network namespace.

Since we do not cache proc_dir_entry dentries at the moment we can
just modify ->lookup to return a different directory inode depending
on the network namespace of the process looking at /proc/net, replacing
the current technique of using a magic and fragile follow_link method.

To accomplish that this patch:
- introduces a shadow_proc method to allow different dentries to
be returned from proc_lookup.
- Removes the old /proc/net follow_link magic
- Fixes a weakness in our not caching of proc generic dentries.

As shadow_proc uses a task struct to decided which dentry to return we
can go back later and fix the proc generic caching without modifying any code that
uses the shadow_proc method.

Signed-off-by: Eric W. Biederman <[email protected]>
---
fs/proc/generic.c | 12 ++++++-
fs/proc/proc_net.c | 86 +++--------------------------------------------
include/linux/proc_fs.h | 3 ++
3 files changed, 19 insertions(+), 82 deletions(-)

diff --git a/fs/proc/generic.c b/fs/proc/generic.c
index a9806bc..c2b7523 100644
--- a/fs/proc/generic.c
+++ b/fs/proc/generic.c
@@ -374,9 +374,16 @@ static int proc_delete_dentry(struct dentry * dentry)
return 1;
}

+static int proc_revalidate_dentry(struct dentry *dentry, struct nameidata *nd)
+{
+ d_drop(dentry);
+ return 0;
+}
+
static struct dentry_operations proc_dentry_operations =
{
.d_delete = proc_delete_dentry,
+ .d_revalidate = proc_revalidate_dentry,
};

/*
@@ -397,8 +404,11 @@ struct dentry *proc_lookup(struct inode * dir, struct dentry *dentry, struct nam
if (de->namelen != dentry->d_name.len)
continue;
if (!memcmp(dentry->d_name.name, de->name, de->namelen)) {
- unsigned int ino = de->low_ino;
+ unsigned int ino;

+ if (de->shadow_proc)
+ de = de->shadow_proc(current, de);
+ ino = de->low_ino;
de_get(de);
spin_unlock(&proc_subdir_lock);
error = -EINVAL;
diff --git a/fs/proc/proc_net.c b/fs/proc/proc_net.c
index 131f9c6..0afe21e 100644
--- a/fs/proc/proc_net.c
+++ b/fs/proc/proc_net.c
@@ -50,89 +50,14 @@ struct net *get_proc_net(const struct inode *inode)
}
EXPORT_SYMBOL_GPL(get_proc_net);

-static struct proc_dir_entry *proc_net_shadow;
+static struct proc_dir_entry *shadow_pde;

-static struct dentry *proc_net_shadow_dentry(struct dentry *parent,
+static struct proc_dir_entry *proc_net_shadow(struct task_struct *task,
struct proc_dir_entry *de)
{
- struct dentry *shadow = NULL;
- struct inode *inode;
- if (!de)
- goto out;
- de_get(de);
- inode = proc_get_inode(parent->d_inode->i_sb, de->low_ino, de);
- if (!inode)
- goto out_de_put;
- shadow = d_alloc_name(parent, de->name);
- if (!shadow)
- goto out_iput;
- shadow->d_op = parent->d_op; /* proc_dentry_operations */
- d_instantiate(shadow, inode);
-out:
- return shadow;
-out_iput:
- iput(inode);
-out_de_put:
- de_put(de);
- goto out;
-}
-
-static void *proc_net_follow_link(struct dentry *parent, struct nameidata *nd)
-{
- struct net *net = current->nsproxy->net_ns;
- struct dentry *shadow;
- shadow = proc_net_shadow_dentry(parent, net->proc_net);
- if (!shadow)
- return ERR_PTR(-ENOENT);
-
- dput(nd->dentry);
- /* My dentry count is 1 and that should be enough as the
- * shadow dentry is thrown away immediately.
- */
- nd->dentry = shadow;
- return NULL;
+ return task->nsproxy->net_ns->proc_net;
}

-static struct dentry *proc_net_lookup(struct inode *dir, struct dentry *dentry,
- struct nameidata *nd)
-{
- struct net *net = current->nsproxy->net_ns;
- struct dentry *shadow;
-
- shadow = proc_net_shadow_dentry(nd->dentry, net->proc_net);
- if (!shadow)
- return ERR_PTR(-ENOENT);
-
- dput(nd->dentry);
- nd->dentry = shadow;
-
- return shadow->d_inode->i_op->lookup(shadow->d_inode, dentry, nd);
-}
-
-static int proc_net_setattr(struct dentry *dentry, struct iattr *iattr)
-{
- struct net *net = current->nsproxy->net_ns;
- struct dentry *shadow;
- int ret;
-
- shadow = proc_net_shadow_dentry(dentry->d_parent, net->proc_net);
- if (!shadow)
- return -ENOENT;
- ret = shadow->d_inode->i_op->setattr(shadow, iattr);
- dput(shadow);
- return ret;
-}
-
-static const struct file_operations proc_net_dir_operations = {
- .read = generic_read_dir,
-};
-
-static struct inode_operations proc_net_dir_inode_operations = {
- .follow_link = proc_net_follow_link,
- .lookup = proc_net_lookup,
- .setattr = proc_net_setattr,
-};
-
static __net_init int proc_net_ns_init(struct net *net)
{
struct proc_dir_entry *root, *netd, *net_statd;
@@ -185,9 +110,8 @@ static struct pernet_operations __net_initdata proc_net_ns_ops = {

int __init proc_net_init(void)
{
- proc_net_shadow = proc_mkdir("net", NULL);
- proc_net_shadow->proc_iops = &proc_net_dir_inode_operations;
- proc_net_shadow->proc_fops = &proc_net_dir_operations;
+ shadow_pde = proc_mkdir("net", NULL);
+ shadow_pde->shadow_proc = proc_net_shadow;

return register_pernet_subsys(&proc_net_ns_ops);
}
diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h
index b070b3b..a5d22c1 100644
--- a/include/linux/proc_fs.h
+++ b/include/linux/proc_fs.h
@@ -48,6 +48,8 @@ typedef int (read_proc_t)(char *page, char **start, off_t off,
typedef int (write_proc_t)(struct file *file, const char __user *buffer,
unsigned long count, void *data);
typedef int (get_info_t)(char *, char **, off_t, int);
+typedef struct proc_dir_entry *(shadow_proc_t)(struct task_struct *task,
+ struct proc_dir_entry *pde);

struct proc_dir_entry {
unsigned int low_ino;
@@ -79,6 +81,7 @@ struct proc_dir_entry {
int pde_users; /* number of callers into module in progress */
spinlock_t pde_unload_lock; /* proc_fops checks and pde_users bumps */
struct completion *pde_unload_completion;
+ shadow_proc_t *shadow_proc;
};

struct kcore_list {
--
1.5.3.rc6.17.g1911

2007-11-27 11:21:54

by Pavel Emelyanov

[permalink] [raw]
Subject: Re: [PATCH 2.6.24-rc3] Fix /proc/net breakage

[snip]

>
> Well I clearly goofed when I added the initial network namespace support
> for /proc/net. Currently things work but there are odd details visible
> to user space, even when we have a single network namespace.
>
> Since we do not cache proc_dir_entry dentries at the moment we can
> just modify ->lookup to return a different directory inode depending
> on the network namespace of the process looking at /proc/net, replacing
> the current technique of using a magic and fragile follow_link method.
>
> To accomplish that this patch:
> - introduces a shadow_proc method to allow different dentries to
> be returned from proc_lookup.
> - Removes the old /proc/net follow_link magic
> - Fixes a weakness in our not caching of proc generic dentries.
>
> As shadow_proc uses a task struct to decided which dentry to return we
> can go back later and fix the proc generic caching without modifying any code that
> uses the shadow_proc method.
>
> Signed-off-by: Eric W. Biederman <[email protected]>

Thanks, Eric.

Much better ('find /proc' works and so does 'ls ..'), but one
issue is still unsolved :(

I mentioned the program, that opens the directory and dumps the
content of the /proc/self/fd. Here it is (stupid but simple):

==== prog.c
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <asm/fcntl.h>
#include <unistd.h>

int main(int argc, char **argv)
{
int fd;

fd = open(argv[1], O_RDONLY|O_DIRECTORY);
if (fd == -1) {
perror("Can't open");
return 1;
}

system("ls -l /proc/self/fd");
return 0;
}
====

So. Here's the result of running this program:

# cd /proc/net/
# pwd
/proc/net
# ~/a.out .
total 0
lrwx------ 1 root root 64 Nov 27 13:27 0 -> /dev/pts/0
lrwx------ 1 root root 64 Nov 27 13:27 1 -> /dev/pts/0
lrwx------ 1 root root 64 Nov 27 13:27 2 -> /dev/pts/0
lr-x------ 1 root root 64 Nov 27 13:27 3 -> /proc/net (deleted)
lr-x------ 1 root root 64 Nov 27 13:27 4 -> /proc/4475/fd

# cd /proc
# pwd
/proc
# ~/a.out net
total 0
lrwx------ 1 root root 64 Nov 27 13:27 0 -> /dev/pts/0
lrwx------ 1 root root 64 Nov 27 13:27 1 -> /dev/pts/0
lrwx------ 1 root root 64 Nov 27 13:27 2 -> /dev/pts/0
lr-x------ 1 root root 64 Nov 27 13:27 3 -> /proc/net
lr-x------ 1 root root 64 Nov 27 13:27 4 -> /proc/4477/fd

# cd /proc/net/stat
# pwd
/proc/net/stat
# ~/a.out ..
total 0
lrwx------ 1 root root 64 Nov 27 13:29 0 -> /dev/pts/0
lrwx------ 1 root root 64 Nov 27 13:29 1 -> /dev/pts/0
lrwx------ 1 root root 64 Nov 27 13:29 2 -> /dev/pts/0
lr-x------ 1 root root 64 Nov 27 13:29 3 -> /proc/net (deleted)
lr-x------ 1 root root 64 Nov 27 13:29 4 -> /proc/4482/fd
# ~/a.out .
total 0
lrwx------ 1 root root 64 Nov 27 13:32 0 -> /dev/pts/0
lrwx------ 1 root root 64 Nov 27 13:32 1 -> /dev/pts/0
lrwx------ 1 root root 64 Nov 27 13:32 2 -> /dev/pts/0
lr-x------ 1 root root 64 Nov 27 13:32 3 -> /proc/net/stat
lr-x------ 1 root root 64 Nov 27 13:32 4 -> /proc/4488/fd

Bad thing is that . when cdir is /proc/net and .. when cdir is
anything under /proc/net (i.e. the /proc/net itself) is marked as "(deleted)".

[snip]

Thanks,
Pavel

2007-11-27 12:38:38

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 2.6.24-rc3] Fix /proc/net breakage

Pavel Emelyanov <[email protected]> writes:

> [snip]
>
>>
>> Well I clearly goofed when I added the initial network namespace support
>> for /proc/net. Currently things work but there are odd details visible
>> to user space, even when we have a single network namespace.
>>
>> Since we do not cache proc_dir_entry dentries at the moment we can
>> just modify ->lookup to return a different directory inode depending
>> on the network namespace of the process looking at /proc/net, replacing
>> the current technique of using a magic and fragile follow_link method.
>>
>> To accomplish that this patch:
>> - introduces a shadow_proc method to allow different dentries to
>> be returned from proc_lookup.
>> - Removes the old /proc/net follow_link magic
>> - Fixes a weakness in our not caching of proc generic dentries.
>>
>> As shadow_proc uses a task struct to decided which dentry to return we
>> can go back later and fix the proc generic caching without modifying any code
> that
>> uses the shadow_proc method.
>>
>> Signed-off-by: Eric W. Biederman <[email protected]>
>
> Thanks, Eric.
>
> Much better ('find /proc' works and so does 'ls ..'), but one
> issue is still unsolved :(
>
> I mentioned the program, that opens the directory and dumps the
> content of the /proc/self/fd. Here it is (stupid but simple):

The cause is totally different this time, and is not limited
to /proc/net. But this is the only side effect of the changes now.

I used the one line version of your test program to confirm this:

$ cd /proc/driver/

$ ls -l /proc/self/fd/100 100< .

lr-x------ 1 eric eric 64 Nov 27 05:06 /proc/self/fd/100 -> /proc/driver/

$ ls -l /proc/self/fd/100 100< .

lr-x------ 1 eric eric 64 Nov 27 05:07 /proc/self/fd/100 -> /proc/driver (deleted)

What is happening is that the aggressive non-caching logic is dropping
the dentries and thus they show up as deleted. I.e. When something
triggers another lookup of that dentry revalidate drops the old
dentry.

This actually fixes a race in /proc today where if you open a file,
remove the module for it, reload the module for that file, and then
attempt to access it, lookup will return the old dentry with the
old fops (even if there is not a current version of that file).
kill_proc_inodes current keeps us from using the old version of
the file_operations for directories (because it removes them) but
it does not prevent this DOS attack where keeping a proc file open
always ensures everyone will see the old version.

Given that the behavior seems more correct then what we have currently
I can live with files showing up as deleted from time to time.

Ultimately the fix is to correct the caching logic in
fs/proc/generic.c to only drop dentries when necessary and then
the deleted markers will only show up when the file or directory
really goes away.

I don't expect user space will care about the semi-legitimate
(deleted) marks. So I don't think this one down side will be a big
deal for 2.6.24.

Eric




2007-12-07 04:52:21

by David Woodhouse

[permalink] [raw]
Subject: Re: [PATCH 2.6.24-rc3] Fix /proc/net breakage

On Mon, 2007-11-26 at 15:17 -0700, Eric W. Biederman wrote:
> Well I clearly goofed when I added the initial network namespace support
> for /proc/net. Currently things work but there are odd details visible
> to user space, even when we have a single network namespace.
>
> Since we do not cache proc_dir_entry dentries at the moment we can
> just modify ->lookup to return a different directory inode depending
> on the network namespace of the process looking at /proc/net, replacing
> the current technique of using a magic and fragile follow_link method.
>
> To accomplish that this patch:
> - introduces a shadow_proc method to allow different dentries to
> be returned from proc_lookup.
> - Removes the old /proc/net follow_link magic
> - Fixes a weakness in our not caching of proc generic dentries.
>
> As shadow_proc uses a task struct to decided which dentry to return we
> can go back later and fix the proc generic caching without modifying any code that
> uses the shadow_proc method.
>
> Signed-off-by: Eric W. Biederman <[email protected]>
> ---
> fs/proc/generic.c | 12 ++++++-
> fs/proc/proc_net.c | 86 +++--------------------------------------------
> include/linux/proc_fs.h | 3 ++
> 3 files changed, 19 insertions(+), 82 deletions(-)

(commit 2b1e300a9dfc3196ccddf6f1d74b91b7af55e416)

This seems to have broken the use of /proc/bus/usb as a mountpoint. It
always appears empty now, whatever's supposed to be mounted there.

--
dwmw2

2007-12-07 10:24:28

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 2.6.24-rc3] Fix /proc/net breakage

On Fri, 07 Dec 2007 04:51:37 +0000 David Woodhouse <[email protected]> wrote:

> On Mon, 2007-11-26 at 15:17 -0700, Eric W. Biederman wrote:
> > Well I clearly goofed when I added the initial network namespace support
> > for /proc/net. Currently things work but there are odd details visible
> > to user space, even when we have a single network namespace.
> >
> > Since we do not cache proc_dir_entry dentries at the moment we can
> > just modify ->lookup to return a different directory inode depending
> > on the network namespace of the process looking at /proc/net, replacing
> > the current technique of using a magic and fragile follow_link method.
> >
> > To accomplish that this patch:
> > - introduces a shadow_proc method to allow different dentries to
> > be returned from proc_lookup.
> > - Removes the old /proc/net follow_link magic
> > - Fixes a weakness in our not caching of proc generic dentries.
> >
> > As shadow_proc uses a task struct to decided which dentry to return we
> > can go back later and fix the proc generic caching without modifying any code that
> > uses the shadow_proc method.
> >
> > Signed-off-by: Eric W. Biederman <[email protected]>
> > ---
> > fs/proc/generic.c | 12 ++++++-
> > fs/proc/proc_net.c | 86 +++--------------------------------------------
> > include/linux/proc_fs.h | 3 ++
> > 3 files changed, 19 insertions(+), 82 deletions(-)
>
> (commit 2b1e300a9dfc3196ccddf6f1d74b91b7af55e416)
>
> This seems to have broken the use of /proc/bus/usb as a mountpoint. It
> always appears empty now, whatever's supposed to be mounted there.
>

Yes. Denis and Eric are tossing around competing patches but afaik nobody
is happy with any of them. Guys, could we get this sorted soonish please?

2007-12-07 11:11:10

by Denis V. Lunev

[permalink] [raw]
Subject: Re: [PATCH 2.6.24-rc3] Fix /proc/net breakage

Andrew Morton wrote:
> On Fri, 07 Dec 2007 04:51:37 +0000 David Woodhouse <[email protected]> wrote:
>
>> On Mon, 2007-11-26 at 15:17 -0700, Eric W. Biederman wrote:
>>> Well I clearly goofed when I added the initial network namespace support
>>> for /proc/net. Currently things work but there are odd details visible
>>> to user space, even when we have a single network namespace.
>>>
>>> Since we do not cache proc_dir_entry dentries at the moment we can
>>> just modify ->lookup to return a different directory inode depending
>>> on the network namespace of the process looking at /proc/net, replacing
>>> the current technique of using a magic and fragile follow_link method.
>>>
>>> To accomplish that this patch:
>>> - introduces a shadow_proc method to allow different dentries to
>>> be returned from proc_lookup.
>>> - Removes the old /proc/net follow_link magic
>>> - Fixes a weakness in our not caching of proc generic dentries.
>>>
>>> As shadow_proc uses a task struct to decided which dentry to return we
>>> can go back later and fix the proc generic caching without modifying any code that
>>> uses the shadow_proc method.
>>>
>>> Signed-off-by: Eric W. Biederman <[email protected]>
>>> ---
>>> fs/proc/generic.c | 12 ++++++-
>>> fs/proc/proc_net.c | 86 +++--------------------------------------------
>>> include/linux/proc_fs.h | 3 ++
>>> 3 files changed, 19 insertions(+), 82 deletions(-)
>> (commit 2b1e300a9dfc3196ccddf6f1d74b91b7af55e416)
>>
>> This seems to have broken the use of /proc/bus/usb as a mountpoint. It
>> always appears empty now, whatever's supposed to be mounted there.
>>
>
> Yes. Denis and Eric are tossing around competing patches but afaik nobody
> is happy with any of them. Guys, could we get this sorted soonish please?
>

Andrew, I become too relaxed after receiving
"Tested-by: Giacomo Catenazzi <[email protected]>"

Eric, I believe that reverting an original behavior is better than your
new one as
- you introduce search into the depth by calling have_submounts(dentry)
during revalidation for all(!) /proc dentries
- your shadowing behavior will be broken if you'll mount something in
the depth of shadowed tree (this can be done as a DoS attempt)

As a last minute call, may be it will be better to pin network namespace
like a pid namespace during mount to avoid this crap at all?

Regards,
Den

2007-12-27 17:41:09

by Andreas Mohr

[permalink] [raw]
Subject: Re: [PATCH 2.6.24-rc3] Fix /proc/net breakage

Hi,

On Fri, Dec 07, 2007 at 02:23:42AM -0800, Andrew Morton wrote:
> > (commit 2b1e300a9dfc3196ccddf6f1d74b91b7af55e416)
> >
> > This seems to have broken the use of /proc/bus/usb as a mountpoint. It
> > always appears empty now, whatever's supposed to be mounted there.
> >
>
> Yes. Denis and Eric are tossing around competing patches but afaik nobody
> is happy with any of them. Guys, could we get this sorted soonish please?

"Soonish" being rather earlier than 20071227?
'cause it's still throwing a fit for me on 2.6.24-rc6-mm1(!) (plus hotfix),
nothing visible in /proc/bus/usb, thus WLAN driver won't probe
anything.
Side note: later upgraded from hotplug-based setup to udev,
but didn't change status quo of problem of empty mount.
Now back again to a 2.6.19ish kernel, since this is the only one
that I could always fall back on after a sizeable number of attempts
to get to run anything newer trouble-free in this environment.
Frankly I'm slightly getting fed up with the number of regressions
I'm hitting in my smallish environment lately. But that's sorta
what you pay for when going experimental.
Just the same day I also had to discover that e100 doesn't usefully
support (IOW, fatal resource probing error) my Intel BNC combo 645477-004,
as opposed to eepro100. Swapped the card from this ""production""
headless box to an easily accessible one for followup debugging of e100.


K6-3@150 headless, Debian stable.

mount 2.12r-19
usbutils 0.72-7

fstab:
none /proc/bus/usb usbfs defaults 0 0

lsmod (NOTE: from 2.6.19-cks2, sorry):

Module Size Used by
act_police 6980 1
sch_ingress 3712 1
cls_u32 7300 5
sch_tbf 6400 1
sch_sfq 5888 4
sch_htb 16128 1
ppp_async 11648 1
crc_ccitt 2304 1 ppp_async
ipt_ULOG 7940 1
xt_state 2304 4
xt_limit 2816 11
xt_tcpudp 3200 48
xt_multiport 3200 10
iptable_mangle 2944 0
iptable_nat 7044 1
iptable_filter 3200 1
ip_tables 12360 3 iptable_mangle,iptable_nat,iptable_filter
ip_nat_tftp 2048 0
ip_conntrack_tftp 4504 1 ip_nat_tftp
ip_nat_h323 7296 0
ip_conntrack_h323 48028 1 ip_nat_h323
ip_nat_irc 2816 0
ip_nat_ftp 3456 0
ip_conntrack_irc 6928 1 ip_nat_irc
ip_conntrack_ftp 7312 1 ip_nat_ftp
ipt_MASQUERADE 3840 1
ip_nat 16684 6 iptable_nat,ip_nat_tftp,ip_nat_h323,ip_nat_irc,ip_nat_ftp,ipt_MASQUERADE
ip_conntrack 43884 12 xt_state,iptable_nat,ip_nat_tftp,ip_conntrack_tftp,ip_nat_h323,ip_conntrack_h323,ip_nat_irc,ip_nat_ftp,ip_conntrack_irc,ip_conntrack_ftp,ipt_MASQUERADE,ip_nat
ipt_REJECT 4608 0
ipt_LOG 6784 11
x_tables 14468 10 ipt_ULOG,xt_state,xt_limit,xt_tcpudp,xt_multiport,iptable_nat,ip_tables,ipt_MASQUERADE,ipt_REJECT,ipt_LOG
isdn 109920 0
sis5595 14472 0
i2c_isa 6528 1 sis5595
at76c503_rfmd 5900 0
firmware_class 9728 1 at76c503_rfmd
i2c_sis5595 8196 0
at76c503 85856 1 at76c503_rfmd
at76_usbdfu 5636 1 at76c503
i2c_sis630 9100 0
i2c_core 22656 4 sis5595,i2c_isa,i2c_sis5595,i2c_sis630
ohci_hcd 31748 0
usbcore 135044 5 at76c503_rfmd,at76c503,at76_usbdfu,ohci_hcd
e100 33800 0
3c59x 41384 0

lspci -v (2.6.19-cks2):

00:00.0 Host bridge: Silicon Integrated Systems [SiS] 5591/5592 Host (rev 02)
Flags: bus master, medium devsel, latency 64
Memory at e8000000 (32-bit, non-prefetchable) [size=64M]
Capabilities: [c0] AGP version 1.0

00:00.1 IDE interface: Silicon Integrated Systems [SiS] 5513 [IDE] (rev d0) (prog-if 8a [Master SecP PriP])
Flags: bus master, fast devsel, latency 64, IRQ 14
I/O ports at <ignored>
I/O ports at <ignored>
I/O ports at <ignored>
I/O ports at <ignored>
I/O ports at 4000 [size=16]

00:01.0 ISA bridge: Silicon Integrated Systems [SiS] SiS85C503/5513 (LPC Bridge) (rev 01)
Flags: bus master, medium devsel, latency 0

00:01.1 Class ff00: Silicon Integrated Systems [SiS] ACPI
Flags: medium devsel

00:01.2 USB Controller: Silicon Integrated Systems [SiS] USB 1.0 Controller (rev 11) (prog-if 10 [OHCI])
Flags: bus master, medium devsel, latency 64, IRQ 12
Memory at f0102000 (32-bit, non-prefetchable) [size=4K]

00:02.0 PCI bridge: Silicon Integrated Systems [SiS] Virtual PCI-to-PCI bridge (AGP) (prog-if 00 [Normal decode])
Flags: bus master, fast devsel, latency 0
Bus: primary=00, secondary=01, subordinate=01, sec-latency=64
I/O behind bridge: 0000d000-0000dfff
Memory behind bridge: ec000000-edffffff
Prefetchable memory behind bridge: e0000000-e7ffffff

00:09.0 Ethernet controller: Intel Corporation 82557/8/9 [Ethernet Pro 100] (rev 08)
Subsystem: Intel Corporation EtherExpress PRO/100+ Management Adapter
Flags: bus master, medium devsel, latency 64, IRQ 11
Memory at f0100000 (32-bit, non-prefetchable) [size=4K]
I/O ports at e000 [size=64]
Memory at f0000000 (32-bit, non-prefetchable) [size=1M]
Expansion ROM at ee000000 [disabled] [size=1M]
Capabilities: [dc] Power Management version 2

00:0f.0 Ethernet controller: 3Com Corporation 3c905B Deluxe Etherlink 10/100/BNC [Cyclone]
Subsystem: 3Com Corporation 3c905B Deluxe Etherlink 10/100/BNC [Cyclone]
Flags: bus master, medium devsel, latency 64, IRQ 10
I/O ports at e400 [size=128]
Memory at f0101000 (32-bit, non-prefetchable) [size=128]
Expansion ROM at ef000000 [disabled] [size=128K]
Capabilities: [dc] Power Management version 1

01:00.0 VGA compatible controller: ATI Technologies Inc Radeon RV100 QY [Radeon 7000/VE] (prog-if 00 [VGA])
Subsystem: ATI Technologies Inc Unknown device 110a
Flags: bus master, stepping, 66MHz, medium devsel, latency 64
Memory at e0000000 (32-bit, prefetchable) [size=128M]
I/O ports at d000 [size=256]
Memory at ed000000 (32-bit, non-prefetchable) [size=64K]
[virtual] Expansion ROM at ec000000 [disabled] [size=128K]
Capabilities: [58] AGP version 2.0
Capabilities: [50] Power Management version 2

Thanks,

Andreas Mohr

2007-12-27 18:42:15

by Alexey Dobriyan

[permalink] [raw]
Subject: Re: [PATCH 2.6.24-rc3] Fix /proc/net breakage

On Thu, Dec 27, 2007 at 06:40:56PM +0100, Andreas Mohr wrote:
> On Fri, Dec 07, 2007 at 02:23:42AM -0800, Andrew Morton wrote:
> > > (commit 2b1e300a9dfc3196ccddf6f1d74b91b7af55e416)
> > >
> > > This seems to have broken the use of /proc/bus/usb as a mountpoint. It
> > > always appears empty now, whatever's supposed to be mounted there.
> > >
> >
> > Yes. Denis and Eric are tossing around competing patches but afaik nobody
> > is happy with any of them. Guys, could we get this sorted soonish please?
>
> "Soonish" being rather earlier than 20071227?
> 'cause it's still throwing a fit for me on 2.6.24-rc6-mm1(!) (plus hotfix),
> nothing visible in /proc/bus/usb, thus WLAN driver won't probe
> anything.

Patch which restores usual behaviour was merged in 2.6.24-rc5
(3790ee4bd86396558eedd86faac1052cb782e4e1 "proc: remove/Fix proc generic d_revalidate")
so no hotfixes are needed. I just checked with bind mounting / to
/proc/bus/usb -- it works.

2007-12-27 22:17:42

by Andreas Mohr

[permalink] [raw]
Subject: Re: [PATCH 2.6.24-rc3] Fix /proc/net breakage

Hi,

On Thu, Dec 27, 2007 at 09:41:45PM +0300, Alexey Dobriyan wrote:
> On Thu, Dec 27, 2007 at 06:40:56PM +0100, Andreas Mohr wrote:
> > On Fri, Dec 07, 2007 at 02:23:42AM -0800, Andrew Morton wrote:
> > > > (commit 2b1e300a9dfc3196ccddf6f1d74b91b7af55e416)
> > > >
> > > > This seems to have broken the use of /proc/bus/usb as a mountpoint. It
> > > > always appears empty now, whatever's supposed to be mounted there.
> > > >
> > >
> > > Yes. Denis and Eric are tossing around competing patches but afaik nobody
> > > is happy with any of them. Guys, could we get this sorted soonish please?
> >
> > "Soonish" being rather earlier than 20071227?
> > 'cause it's still throwing a fit for me on 2.6.24-rc6-mm1(!) (plus hotfix),
> > nothing visible in /proc/bus/usb, thus WLAN driver won't probe
> > anything.
>
> Patch which restores usual behaviour was merged in 2.6.24-rc5
> (3790ee4bd86396558eedd86faac1052cb782e4e1 "proc: remove/Fix proc generic d_revalidate")

Via a kerneltrap.org article (problematic internet setup here currently,
non-C&P text terminal only) I could see that this is the patch which
removes the d_revalidate member and the corresponding proc_revalidate_dentry().
And this is the state that my 2.6.24-rc_six_-mm1 tree is in already.
So either it didn't help here or it broke again by some later change or
there's some dumb PEBKAC error here.

> so no hotfixes are needed. I just checked with bind mounting / to
> /proc/bus/usb -- it works.

OK, I'll try to re-check manual, raw, bare-metal mounting (bind etc.) soonish.

"CONFIG_NETNS" as mentioned in the patch description actually seems
to be "CONFIG_NET_NS", BTW.
(which I DON'T have set at the moment, if this happens to make a difference)

Thanks,

Andreas Mohr

2007-12-28 06:22:26

by Alexey Dobriyan

[permalink] [raw]
Subject: Re: [PATCH 2.6.24-rc3] Fix /proc/net breakage

On Thu, Dec 27, 2007 at 11:17:28PM +0100, Andreas Mohr wrote:
> Hi,
>
> On Thu, Dec 27, 2007 at 09:41:45PM +0300, Alexey Dobriyan wrote:
> > On Thu, Dec 27, 2007 at 06:40:56PM +0100, Andreas Mohr wrote:
> > > On Fri, Dec 07, 2007 at 02:23:42AM -0800, Andrew Morton wrote:
> > > > > (commit 2b1e300a9dfc3196ccddf6f1d74b91b7af55e416)
> > > > >
> > > > > This seems to have broken the use of /proc/bus/usb as a mountpoint. It
> > > > > always appears empty now, whatever's supposed to be mounted there.
> > > > >
> > > >
> > > > Yes. Denis and Eric are tossing around competing patches but afaik nobody
> > > > is happy with any of them. Guys, could we get this sorted soonish please?
> > >
> > > "Soonish" being rather earlier than 20071227?
> > > 'cause it's still throwing a fit for me on 2.6.24-rc6-mm1(!) (plus hotfix),
> > > nothing visible in /proc/bus/usb, thus WLAN driver won't probe
> > > anything.
> >
> > Patch which restores usual behaviour was merged in 2.6.24-rc5
> > (3790ee4bd86396558eedd86faac1052cb782e4e1 "proc: remove/Fix proc generic d_revalidate")
>
> Via a kerneltrap.org article (problematic internet setup here currently,
> non-C&P text terminal only) I could see that this is the patch which
> removes the d_revalidate member and the corresponding proc_revalidate_dentry().
> And this is the state that my 2.6.24-rc_six_-mm1 tree is in already.

OK.

> So either it didn't help here or it broke again by some later change or
> there's some dumb PEBKAC error here.

Do you by chance forgot CONFIG_USB_?HCI_HCD=y ? I see empty usbfs here if
they are deselected.

> > so no hotfixes are needed. I just checked with bind mounting / to
> > /proc/bus/usb -- it works.
>
> OK, I'll try to re-check manual, raw, bare-metal mounting (bind etc.) soonish.
>
> "CONFIG_NETNS" as mentioned in the patch description actually seems
> to be "CONFIG_NET_NS", BTW.
> (which I DON'T have set at the moment, if this happens to make a difference)

I shouldn't.

2007-12-28 07:21:30

by Andreas Mohr

[permalink] [raw]
Subject: Re: [PATCH 2.6.24-rc3] Fix /proc/net breakage

Hi,

On Fri, Dec 28, 2007 at 09:22:01AM +0300, Alexey Dobriyan wrote:
> On Thu, Dec 27, 2007 at 11:17:28PM +0100, Andreas Mohr wrote:
> > And this is the state that my 2.6.24-rc_six_-mm1 tree is in already.
>
> OK.
>
> > So either it didn't help here or it broke again by some later change or
> > there's some dumb PEBKAC error here.
>
> Do you by chance forgot CONFIG_USB_?HCI_HCD=y ? I see empty usbfs here if
> they are deselected.

Quite well-educated guess! In fact I had already discovered that
ohci-hcd fails to load, at all. Doh.
(however after going through all recent ohci.*hcd related LKML postings
it seems there's no report about such problems yet)

Sorry for barking up the entirely wrong tree!

For giggles, here's the output:

# modprobe ohci-hcd
FATAL: Error inserting ohci_hcd (/lib/modules/2.6.24-rc6-mm1-gate/kernel/drivers/usb/host/ohci-hcd.ko): No such device

dmesg:
ohci_hcd: 2006 August 04 USB 1.1 'Open' Host Controller (OHCI) Driver
ohci_hcd: block sizes: ed 64 td 64

(yes, that's all there is, despite CONFIG_USB_DEBUG being set)

The LED of a usb stick isn't active either, for obvious reasons.

And keep in mind that this is a (relatively old) OHCI-only machine...
(which had the 2.6.19 lsmod showing ohci-hcd just fine and working fine
with WLAN USB)

Now pondering whether to try -rc6 proper or whether to revert specific
guilty-looking USB changes...
And wondering how to properly elevate this issue (prompt Greg about it,
new thread, bug #, ...?)

Thanks,

Andreas Mohr

2007-12-30 16:14:54

by Ingo Molnar

[permalink] [raw]
Subject: [usb regression] Re: [PATCH 2.6.24-rc3] Fix /proc/net breakage


* Andreas Mohr <[email protected]> wrote:

> (yes, that's all there is, despite CONFIG_USB_DEBUG being set)
>
> The LED of a usb stick isn't active either, for obvious reasons.
>
> And keep in mind that this is a (relatively old) OHCI-only machine...
> (which had the 2.6.19 lsmod showing ohci-hcd just fine and working
> fine with WLAN USB)
>
> Now pondering whether to try -rc6 proper or whether to revert specific
> guilty-looking USB changes... And wondering how to properly elevate
> this issue (prompt Greg about it, new thread, bug #, ...?)

Cc:-ed Alan Stern for the USB regression.

Ingo

2007-12-30 20:34:57

by Alan Stern

[permalink] [raw]
Subject: Re: [usb regression] Re: [PATCH 2.6.24-rc3] Fix /proc/net breakage

On Sun, 30 Dec 2007, Ingo Molnar wrote:

> * Andreas Mohr <[email protected]> wrote:
>
> > (yes, that's all there is, despite CONFIG_USB_DEBUG being set)
> >
> > The LED of a usb stick isn't active either, for obvious reasons.
> >
> > And keep in mind that this is a (relatively old) OHCI-only machine...
> > (which had the 2.6.19 lsmod showing ohci-hcd just fine and working
> > fine with WLAN USB)
> >
> > Now pondering whether to try -rc6 proper or whether to revert specific
> > guilty-looking USB changes... And wondering how to properly elevate
> > this issue (prompt Greg about it, new thread, bug #, ...?)

It looks like Greg misused the debugfs API -- which is ironic, because
he wrote debugfs in the first place! :-)

Let me know if this patch fixes the problem. If it does, I'll submit
it to Greg with all the proper accoutrements.

Alan Stern


Index: 2.6.24-rc6-mm1/drivers/usb/host/ohci-hcd.c
===================================================================
--- 2.6.24-rc6-mm1.orig/drivers/usb/host/ohci-hcd.c
+++ 2.6.24-rc6-mm1/drivers/usb/host/ohci-hcd.c
@@ -1067,14 +1067,8 @@ static int __init ohci_hcd_mod_init(void

#ifdef DEBUG
ohci_debug_root = debugfs_create_dir("ohci", NULL);
- if (!ohci_debug_root || IS_ERR(ohci_debug_root)) {
- if (!ohci_debug_root)
- retval = -ENOENT;
- else
- retval = PTR_ERR(ohci_debug_root);
-
- goto error_debug;
- }
+ if (!ohci_debug_root)
+ return -ENOENT;
#endif

#ifdef PS3_SYSTEM_BUS_DRIVER
@@ -1142,7 +1136,6 @@ static int __init ohci_hcd_mod_init(void
#ifdef DEBUG
debugfs_remove(ohci_debug_root);
ohci_debug_root = NULL;
- error_debug:
#endif

return retval;
Index: 2.6.24-rc6-mm1/drivers/usb/host/ohci-dbg.c
===================================================================
--- 2.6.24-rc6-mm1.orig/drivers/usb/host/ohci-dbg.c
+++ 2.6.24-rc6-mm1/drivers/usb/host/ohci-dbg.c
@@ -813,30 +813,29 @@ static inline void create_debug_files (s
struct device *dev = bus->dev;

ohci->debug_dir = debugfs_create_dir(bus->bus_name, ohci_debug_root);
- if (!ohci->debug_dir || IS_ERR(ohci->debug_dir)) {
- ohci->debug_dir = NULL;
- goto done;
- }
+ if (!ohci->debug_dir)
+ return;

ohci->debug_async = debugfs_create_file("async", S_IRUGO,
ohci->debug_dir, dev,
&debug_async_fops);
- if (!ohci->debug_async || IS_ERR(ohci->debug_async))
+ if (!ohci->debug_async)
goto async_error;

ohci->debug_periodic = debugfs_create_file("periodic", S_IRUGO,
ohci->debug_dir, dev,
&debug_periodic_fops);
- if (!ohci->debug_periodic || IS_ERR(ohci->debug_periodic))
+ if (!ohci->debug_periodic)
goto periodic_error;

ohci->debug_registers = debugfs_create_file("registers", S_IRUGO,
ohci->debug_dir, dev,
&debug_registers_fops);
- if (!ohci->debug_registers || IS_ERR(ohci->debug_registers))
+ if (!ohci->debug_registers)
goto registers_error;

- goto done;
+ ohci_dbg(ohci, "created debug files\n");
+ return;

registers_error:
debugfs_remove(ohci->debug_periodic);
@@ -847,10 +846,6 @@ periodic_error:
async_error:
debugfs_remove(ohci->debug_dir);
ohci->debug_dir = NULL;
-done:
- return;
-
- ohci_dbg (ohci, "created debug files\n");
}

static inline void remove_debug_files (struct ohci_hcd *ohci)
Index: 2.6.24-rc6-mm1/drivers/usb/host/ehci-hcd.c
===================================================================
--- 2.6.24-rc6-mm1.orig/drivers/usb/host/ehci-hcd.c
+++ 2.6.24-rc6-mm1/drivers/usb/host/ehci-hcd.c
@@ -1019,14 +1019,8 @@ static int __init ehci_hcd_init(void)

#ifdef DEBUG
ehci_debug_root = debugfs_create_dir("ehci", NULL);
- if (!ehci_debug_root || IS_ERR(ehci_debug_root)) {
- if (!ehci_debug_root)
- retval = -ENOENT;
- else
- retval = PTR_ERR(ehci_debug_root);
-
- return retval;
- }
+ if (!ehci_debug_root)
+ return -ENOENT;
#endif

#ifdef PLATFORM_DRIVER
Index: 2.6.24-rc6-mm1/drivers/usb/host/ehci-dbg.c
===================================================================
--- 2.6.24-rc6-mm1.orig/drivers/usb/host/ehci-dbg.c
+++ 2.6.24-rc6-mm1/drivers/usb/host/ehci-dbg.c
@@ -914,30 +914,28 @@ static inline void create_debug_files (s
struct usb_bus *bus = &ehci_to_hcd(ehci)->self;

ehci->debug_dir = debugfs_create_dir(bus->bus_name, ehci_debug_root);
- if (!ehci->debug_dir || IS_ERR(ehci->debug_dir)) {
- ehci->debug_dir = NULL;
- goto done;
- }
+ if (!ehci->debug_dir)
+ return;

ehci->debug_async = debugfs_create_file("async", S_IRUGO,
ehci->debug_dir, bus,
&debug_async_fops);
- if (!ehci->debug_async || IS_ERR(ehci->debug_async))
+ if (!ehci->debug_async)
goto async_error;

ehci->debug_periodic = debugfs_create_file("periodic", S_IRUGO,
ehci->debug_dir, bus,
&debug_periodic_fops);
- if (!ehci->debug_periodic || IS_ERR(ehci->debug_periodic))
+ if (!ehci->debug_periodic)
goto periodic_error;

ehci->debug_registers = debugfs_create_file("registers", S_IRUGO,
ehci->debug_dir, bus,
&debug_registers_fops);
- if (!ehci->debug_registers || IS_ERR(ehci->debug_registers))
+ if (!ehci->debug_registers)
goto registers_error;

- goto done;
+ return;

registers_error:
debugfs_remove(ehci->debug_periodic);
@@ -948,9 +946,6 @@ periodic_error:
async_error:
debugfs_remove(ehci->debug_dir);
ehci->debug_dir = NULL;
-
-done:
- return;
}

static inline void remove_debug_files (struct ehci_hcd *ehci)

2007-12-31 05:24:25

by Greg KH

[permalink] [raw]
Subject: Re: [usb regression] Re: [PATCH 2.6.24-rc3] Fix /proc/net breakage

On Sun, Dec 30, 2007 at 03:34:45PM -0500, Alan Stern wrote:
> On Sun, 30 Dec 2007, Ingo Molnar wrote:
>
> > * Andreas Mohr <[email protected]> wrote:
> >
> > > (yes, that's all there is, despite CONFIG_USB_DEBUG being set)
> > >
> > > The LED of a usb stick isn't active either, for obvious reasons.
> > >
> > > And keep in mind that this is a (relatively old) OHCI-only machine...
> > > (which had the 2.6.19 lsmod showing ohci-hcd just fine and working
> > > fine with WLAN USB)
> > >
> > > Now pondering whether to try -rc6 proper or whether to revert specific
> > > guilty-looking USB changes... And wondering how to properly elevate
> > > this issue (prompt Greg about it, new thread, bug #, ...?)
>
> It looks like Greg misused the debugfs API -- which is ironic, because
> he wrote debugfs in the first place! :-)

Oh crap, sorry, I did mess that up :(

> Let me know if this patch fixes the problem. If it does, I'll submit
> it to Greg with all the proper accoutrements.

This isn't going to work if CONFIG_DEBUGFS is not enabled either :(

> Index: 2.6.24-rc6-mm1/drivers/usb/host/ohci-hcd.c
> ===================================================================
> --- 2.6.24-rc6-mm1.orig/drivers/usb/host/ohci-hcd.c
> +++ 2.6.24-rc6-mm1/drivers/usb/host/ohci-hcd.c
> @@ -1067,14 +1067,8 @@ static int __init ohci_hcd_mod_init(void
>
> #ifdef DEBUG
> ohci_debug_root = debugfs_create_dir("ohci", NULL);
> - if (!ohci_debug_root || IS_ERR(ohci_debug_root)) {
> - if (!ohci_debug_root)
> - retval = -ENOENT;
> - else
> - retval = PTR_ERR(ohci_debug_root);
> -
> - goto error_debug;
> - }
> + if (!ohci_debug_root)
> + return -ENOENT;

It needs to check for ERR_PTR(-ENODEV) which is the return value if
debugfs is not enabled, and if so, just ignore things.

If NULL is returned, or anything else, it's a real error.

So, try something like:
if (!ohci_debug_root) {
retval = -ENOENT;
goto error_debug;
}
if (IS_ERR(ohci_debug_root) && PTR_ERR(ohci_debug_root) != -ENODEV) {
retval = PTR_ERR(ohci_debug_root);
goto error_debug;
}

and let me know of that works for you.

Although the combination of CONFIG_USB_DEBUG and not CONFIG_DEBUGFS is
strange, we could just enable CONFIG_DEBUGFS is USB_DEBUG is enabled and
then simplify this logic a bunch at the same time.

thanks,

greg k-h

2007-12-31 17:50:07

by Alan Stern

[permalink] [raw]
Subject: Re: [usb regression] Re: [PATCH 2.6.24-rc3] Fix /proc/net breakage

On Sun, 30 Dec 2007, Greg KH wrote:

> > It looks like Greg misused the debugfs API -- which is ironic, because
> > he wrote debugfs in the first place! :-)
>
> Oh crap, sorry, I did mess that up :(
>
> > Let me know if this patch fixes the problem. If it does, I'll submit
> > it to Greg with all the proper accoutrements.
>
> This isn't going to work if CONFIG_DEBUGFS is not enabled either :(

Why not? The debugging files won't be created, true, but the driver
should work just fine regardless. This is exactly the way uhci-hcd
behaves, and it hasn't caused any problems.

> > Index: 2.6.24-rc6-mm1/drivers/usb/host/ohci-hcd.c
> > ===================================================================
> > --- 2.6.24-rc6-mm1.orig/drivers/usb/host/ohci-hcd.c
> > +++ 2.6.24-rc6-mm1/drivers/usb/host/ohci-hcd.c
> > @@ -1067,14 +1067,8 @@ static int __init ohci_hcd_mod_init(void
> >
> > #ifdef DEBUG
> > ohci_debug_root = debugfs_create_dir("ohci", NULL);
> > - if (!ohci_debug_root || IS_ERR(ohci_debug_root)) {
> > - if (!ohci_debug_root)
> > - retval = -ENOENT;
> > - else
> > - retval = PTR_ERR(ohci_debug_root);
> > -
> > - goto error_debug;
> > - }
> > + if (!ohci_debug_root)
> > + return -ENOENT;
>
> It needs to check for ERR_PTR(-ENODEV) which is the return value if
> debugfs is not enabled, and if so, just ignore things.

No. That is the mistake you made before. If debugfs isn't enabled
then the driver should just ignore things, yes -- and in particular it
should ignore the fact that the return code is ERR_PTR(-ENODEV). No
extra checking is required.

> If NULL is returned, or anything else, it's a real error.

If NULL is returned, it's a real error, agreed. But if anything else
is returned then the call was successful as far as the driver is
concerned.

> So, try something like:
> if (!ohci_debug_root) {
> retval = -ENOENT;
> goto error_debug;
> }
> if (IS_ERR(ohci_debug_root) && PTR_ERR(ohci_debug_root) != -ENODEV) {
> retval = PTR_ERR(ohci_debug_root);
> goto error_debug;
> }
>
> and let me know of that works for you.

Greg, it sounds like you have forgotten the rationale you originally
used for specifying the return codes in debugfs! The idea was to
return non-NULL if CONFIG_DEBUGFS was disabled, so that drivers could
pretend the calls had succeeded and not need to worry about matters
beyond their control.

Only a NULL return indicates a genuine error. The kerneldoc for
debugfs_create_dir says this very plainly.

> Although the combination of CONFIG_USB_DEBUG and not CONFIG_DEBUGFS is
> strange, we could just enable CONFIG_DEBUGFS is USB_DEBUG is enabled and
> then simplify this logic a bunch at the same time.

Most distributions enable CONFIG_DEBUGFS by default, don't they? So
this problem only comes up when people make up their own configs.
Having USB_DEBUG enabled and DEBUGFS disabled seems like a perfectly
reasonable combination to me. I wouldn't change any aspect of the
configuration logic.

Alan Stern

2007-12-31 19:28:48

by Greg KH

[permalink] [raw]
Subject: Re: [usb regression] Re: [PATCH 2.6.24-rc3] Fix /proc/net breakage

On Mon, Dec 31, 2007 at 12:49:52PM -0500, Alan Stern wrote:
> On Sun, 30 Dec 2007, Greg KH wrote:
>
> > > It looks like Greg misused the debugfs API -- which is ironic, because
> > > he wrote debugfs in the first place! :-)
> >
> > Oh crap, sorry, I did mess that up :(
> >
> > > Let me know if this patch fixes the problem. If it does, I'll submit
> > > it to Greg with all the proper accoutrements.
> >
> > This isn't going to work if CONFIG_DEBUGFS is not enabled either :(
>
> Why not? The debugging files won't be created, true, but the driver
> should work just fine regardless. This is exactly the way uhci-hcd
> behaves, and it hasn't caused any problems.

Ok, you are right, it was late and after some wine and I shouldn't have
been responding to email :)

I'll rework this, and send out a patch tonight or tomorrow when I get a
chance (am on the road right now). We should only care about NULL being
returned here, that's the only "real" error.

And in thinking about it some more, I should go audit all the debugfs
callers to make sure they are doing something sane too, it shouldn't be
this complex at all...

thanks,

greg k-h

2008-01-02 05:59:21

by Greg KH

[permalink] [raw]
Subject: Re: [usb regression] Re: [PATCH 2.6.24-rc3] Fix /proc/net breakage

On Mon, Dec 31, 2007 at 11:26:43AM -0800, Greg KH wrote:
> On Mon, Dec 31, 2007 at 12:49:52PM -0500, Alan Stern wrote:
> > On Sun, 30 Dec 2007, Greg KH wrote:
> >
> > > > It looks like Greg misused the debugfs API -- which is ironic, because
> > > > he wrote debugfs in the first place! :-)
> > >
> > > Oh crap, sorry, I did mess that up :(

Ok, no, I didn't write that patch, I'm getting very confused here.

In 2.6.24-rc6 there is no usage of debugfs in the ohci driver.

In the -mm tree there is a patch, from Tony Jones, that moves some debug
code out of sysfs and into debugfs where it belongs. It does it for
both the ehci and ohci USB host controller drivers, and this is the code
that is incorrect if CONFIG_DEBUGFS is not enabled.

So, for the 2.6.24 release, nothing needs to be changed, all is good,
and there is no regression.

Right? Or am I still confused about this whole thing?

I will go fix up Tony's patches in the -mm tree that do not handle the
error return values from debugfs properly, but that is not such a rush,
as Tony is on vacation for a few weeks, and those patches are going to
be going in only after 2.6.24 is out.

Hm, I wonder if this means I can go back to drinking more holiday
wine... :)

thanks,

greg k-h

2008-01-02 06:04:34

by Andreas Mohr

[permalink] [raw]
Subject: Re: [usb regression] Re: [PATCH 2.6.24-rc3] Fix /proc/net breakage

Hi,

On Sun, Dec 30, 2007 at 03:34:45PM -0500, Alan Stern wrote:
> It looks like Greg misused the debugfs API -- which is ironic, because
> he wrote debugfs in the first place! :-)
>
> Let me know if this patch fixes the problem. If it does, I'll submit
> it to Greg with all the proper accoutrements.

Finally a 2.6.24-rc6-mm1 with working USB WLAN. :)
IOW I can confirm that this was the cause of the USB problem I was having.

Thanks!

Andreas Mohr

2008-01-02 06:13:22

by Andreas Mohr

[permalink] [raw]
Subject: Re: [usb regression] Re: [PATCH 2.6.24-rc3] Fix /proc/net breakage

Hi,

On Tue, Jan 01, 2008 at 10:00:06PM -0800, Greg KH wrote:
> Ok, no, I didn't write that patch, I'm getting very confused here.
>
> In 2.6.24-rc6 there is no usage of debugfs in the ohci driver.
>
> In the -mm tree there is a patch, from Tony Jones, that moves some debug
> code out of sysfs and into debugfs where it belongs. It does it for
> both the ehci and ohci USB host controller drivers, and this is the code
> that is incorrect if CONFIG_DEBUGFS is not enabled.
>
> So, for the 2.6.24 release, nothing needs to be changed, all is good,
> and there is no regression.
>
> Right? Or am I still confused about this whole thing?

Probably, since I also wasn't sure about this getting added to the
post-2.6.23 regressions. I'd expect this problem to be -mm only myself,
but then I didn't actively verify it in code (and I haven't tried -rc6
proper on that machine yet either, build takes ages ;).
OK, since I cannot really offer anything other than positive feelings about
this being non-rc6 breakage, should I try -rc6 proper, too?

> Hm, I wonder if this means I can go back to drinking more holiday
> wine... :)

.....even more confusion? :)

Thanks for your help,

Andreas Mohr

2008-01-02 07:15:56

by Greg KH

[permalink] [raw]
Subject: Re: [usb regression] Re: [PATCH 2.6.24-rc3] Fix /proc/net breakage

On Wed, Jan 02, 2008 at 07:13:08AM +0100, Andreas Mohr wrote:
> Hi,
>
> On Tue, Jan 01, 2008 at 10:00:06PM -0800, Greg KH wrote:
> > Ok, no, I didn't write that patch, I'm getting very confused here.
> >
> > In 2.6.24-rc6 there is no usage of debugfs in the ohci driver.
> >
> > In the -mm tree there is a patch, from Tony Jones, that moves some debug
> > code out of sysfs and into debugfs where it belongs. It does it for
> > both the ehci and ohci USB host controller drivers, and this is the code
> > that is incorrect if CONFIG_DEBUGFS is not enabled.
> >
> > So, for the 2.6.24 release, nothing needs to be changed, all is good,
> > and there is no regression.
> >
> > Right? Or am I still confused about this whole thing?
>
> Probably, since I also wasn't sure about this getting added to the
> post-2.6.23 regressions. I'd expect this problem to be -mm only myself,
> but then I didn't actively verify it in code (and I haven't tried -rc6
> proper on that machine yet either, build takes ages ;).
> OK, since I cannot really offer anything other than positive feelings about
> this being non-rc6 breakage, should I try -rc6 proper, too?

Please do, that would be very helpful.

As the code being discussed isn't even in Linus's tree, it's hard to see
how this could be a proposed fix for the regression :)

thanks,

greg k-h

2008-01-02 15:56:55

by Alan Stern

[permalink] [raw]
Subject: Re: [usb regression] Re: [PATCH 2.6.24-rc3] Fix /proc/net breakage

On Tue, 1 Jan 2008, Greg KH wrote:

> On Mon, Dec 31, 2007 at 11:26:43AM -0800, Greg KH wrote:
> > On Mon, Dec 31, 2007 at 12:49:52PM -0500, Alan Stern wrote:
> > > On Sun, 30 Dec 2007, Greg KH wrote:
> > >
> > > > > It looks like Greg misused the debugfs API -- which is ironic, because
> > > > > he wrote debugfs in the first place! :-)

> Ok, no, I didn't write that patch, I'm getting very confused here.
>
> In 2.6.24-rc6 there is no usage of debugfs in the ohci driver.
>
> In the -mm tree there is a patch, from Tony Jones, that moves some debug
> code out of sysfs and into debugfs where it belongs. It does it for
> both the ehci and ohci USB host controller drivers, and this is the code
> that is incorrect if CONFIG_DEBUGFS is not enabled.

My mistake; I got the impression you had written that new code rather
than Tony. BTW, I don't recall ever seeing Tony's patch announced on
linux-usb or linux-usb-devel. Did I simply miss it?

> So, for the 2.6.24 release, nothing needs to be changed, all is good,
> and there is no regression.
>
> Right? Or am I still confused about this whole thing?

Correct. The problem exists only in -mm and your development tree.

> I will go fix up Tony's patches in the -mm tree that do not handle the
> error return values from debugfs properly, but that is not such a rush,
> as Tony is on vacation for a few weeks, and those patches are going to
> be going in only after 2.6.24 is out.

The fix I posted earlier in this thread should simply be merged into
Tony's patches.

Alan Stern

2008-01-02 18:49:00

by David Brownell

[permalink] [raw]
Subject: Re: [usb regression] Re: [PATCH 2.6.24-rc3] Fix /proc/net breakage

On Wednesday 02 January 2008, Alan Stern wrote:
> ?BTW, I don't recall ever seeing Tony's patch announced on
> linux-usb or linux-usb-devel. ?Did I simply miss it?

I think he didn't post it. I got some questions from him at
one point, which I answered, but as I recall he decided for
some reason to stop that work.