2013-12-31 12:33:07

by Jeff Layton

Subject: should we change how the kernel detects whether gssproxy is running?

I'm a bit concerned with how /proc/net/rpc/use-gss-proxy works...

For one thing, when the kernel first boots any read against that file
hangs. That's going to be extremely problematic for certain tools that
scrape info out of /proc for troubleshooting purposes (e.g. Red Hat's
sosreport tool).

Also, it seems like if gssproxy suddenly dies, then this switch stays
stuck in its position. There is no way to switch back after enabling
gssproxy.

All of that seems unnecessarily complicated. Is there some rationale
for it that I'm missing?

Would it instead make more sense to just have gssproxy
hold a file open under /proc? If the file is being held open, then
send upcalls to gssproxy. If not, then use the legacy code.

That way the upcalls would truly be conditional on whether gssproxy is
running...
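
Something like this, say (just a rough sketch to show what I mean --
none of these names exist in the kernel today):

#include <linux/atomic.h>
#include <linux/fs.h>

/*
 * Hypothetical: track opens of a /proc file so the kernel knows
 * whether gssproxy currently holds it. The upcall path would just
 * check gssp_is_running() when a packet arrives.
 */
static atomic_t gssp_users = ATOMIC_INIT(0);

static int gssp_proc_open(struct inode *inode, struct file *file)
{
        atomic_inc(&gssp_users);        /* gssproxy opened the file */
        return 0;
}

static int gssp_proc_release(struct inode *inode, struct file *file)
{
        atomic_dec(&gssp_users);        /* closed it, or gssproxy died */
        return 0;
}

static bool gssp_is_running(void)
{
        return atomic_read(&gssp_users) > 0;
}

The nice part is that ->release gets called even if gssproxy crashes,
so the flag could never get stuck.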

--
Jeff Layton <[email protected]>


2013-12-31 20:52:07

by Simo Sorce

Subject: Re: should we change how the kernel detects whether gssproxy is running?

On Tue, 2013-12-31 at 13:01 -0500, J. Bruce Fields wrote:
> On Tue, Dec 31, 2013 at 07:33:00AM -0500, Jeff Layton wrote:
> > I'm a bit concerned with how /proc/net/rpc/use-gss-proxy works...
> >
> > For one thing, when the kernel first boots any read against that file
> > hangs. That's going to be extremely problematic for certain tools that
> > scrape info out of /proc for troubleshooting purposes (e.g. Red Hat's
> > sosreport tool).
>
> Is that the only file under /proc for which that's true? (E.g. the rpc
> cache channel files probably do the same, don't they?) I was assuming
> tools like sosreport need to work from lists of specific paths.
>
> > Also, it seems like if gssproxy suddenly dies, then this switch stays
> > stuck in its position. There is no way to switch back after enabling
> > gssproxy.
>
> That's true. I didn't think the ability to change on the fly was
> necessary (but I'll admit it's annoying when debugging at least.)
>
> > All of that seems unnecessarily complicated. Is there some rationale
> > for it that I'm missing?
> >
> > Would it instead make more sense to just have gssproxy
> > hold a file open under /proc? If the file is being held open, then
> > send upcalls to gssproxy. If not, then use the legacy code.
>
> The kernel code needs to know which way to handle an incoming packet at
> the time it arrives. If it falls back on the legacy upcall that means
> failing large init_sec_context calls. So a delayed gss-proxy start (or
> a crash and restart) would mean clients erroring out (with fatal errors
> I think, not just something that would be retried).

Well, if gss-proxy is not running it will fail anyway, right?
We have 90s before nfsd starts receiving incoming calls at startup,
right? Isn't that enough to guarantee that whatever the admin
configured to start has started? If gss-proxy is dead for whatever
reason, a failure will happen anyway now, especially because rpc.gssd
will most probably not be running if the admin configured gss-proxy
instead...

Simo.

--
Simo Sorce * Red Hat, Inc * New York


2013-12-31 23:02:42

by Jeff Layton

Subject: Re: should we change how the kernel detects whether gssproxy is running?

On Wed, 1 Jan 2014 09:56:20 +1100
NeilBrown <[email protected]> wrote:

> On Tue, 31 Dec 2013 13:01:23 -0500 "J. Bruce Fields" <[email protected]>
> wrote:
>
> > On Tue, Dec 31, 2013 at 07:33:00AM -0500, Jeff Layton wrote:
> > > I'm a bit concerned with how /proc/net/rpc/use-gss-proxy works...
> > >
> > > For one thing, when the kernel first boots any read against that file
> > > hangs. That's going to be extremely problematic for certain tools that
> > > scrape info out of /proc for troubleshooting purposes (e.g. Red Hat's
> > > sosreport tool).
> >
> > Is that the only file under /proc for which that's true? (E.g. the rpc
> > cache channel files probably do the same, don't they?) I was assuming
> > tools like sosreport need to work from lists of specific paths.
>
> The rpc cache channel files do not block on reads, so 'cat' works well on
> them.
> A process (like mountd) that wants to see new additions will use select (or
> poll) for an 'exception' condition, and then read.
>
> I think that it is best if all files in /proc (or /sys) support 'cat'.
> If I "tar" up "/proc" on my notebook it doesn't block ... though it does take
> quite a while on /proc/kcore :-)
>

I think we have to deal with the possibility that something will
eventually get stuck reading this file.

I can understand why you might make the kernel hang for a little while
waiting for gssproxy to start up or something. What's not clear to me
is why you'd make userland reads against this particular proc file hang.
What problem is fixed by making this hang?

--
Jeff Layton <[email protected]>



2013-12-31 18:01:29

by J. Bruce Fields

Subject: Re: should we change how the kernel detects whether gssproxy is running?

On Tue, Dec 31, 2013 at 07:33:00AM -0500, Jeff Layton wrote:
> I'm a bit concerned with how /proc/net/rpc/use-gss-proxy works...
>
> For one thing, when the kernel first boots any read against that file
> hangs. That's going to be extremely problematic for certain tools that
> scrape info out of /proc for troubleshooting purposes (e.g. Red Hat's
> sosreport tool).

Is that the only file under /proc for which that's true? (E.g. the rpc
cache channel files probably do the same, don't they?) I was assuming
tools like sosreport need to work from lists of specific paths.

> Also, it seems like if gssproxy suddenly dies, then this switch stays
> stuck in its position. There is no way to switch back after enabling
> gssproxy.

That's true. I didn't think the ability to change on the fly was
necessary (but I'll admit it's annoying when debugging at least.)

> All of that seems unnecessarily complicated. Is there some rationale
> for it that I'm missing?
>
> Would it instead make more sense to just have gssproxy
> hold a file open under /proc? If the file is being held open, then
> send upcalls to gssproxy. If not, then use the legacy code.

The kernel code needs to know which way to handle an incoming packet at
the time it arrives. If it falls back on the legacy upcall that means
failing large init_sec_context calls. So a delayed gss-proxy start (or
a crash and restart) would mean clients erroring out (with fatal errors
I think, not just something that would be retried).
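
(For reference, the decision is made per-call in
net/sunrpc/auth_gss/svcauth_gss.c, roughly:

        case RPC_GSS_PROC_INIT:
        case RPC_GSS_PROC_CONTINUE_INIT:
                if (use_gss_proxy(SVC_NET(rqstp)))
                        return svcauth_gss_proxy_init(rqstp, gc, authp);
                else
                        return svcauth_gss_legacy_init(rqstp, gc, authp);

so whichever answer use_gss_proxy() returns when the init call arrives
is the one we're stuck with for that call.)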

--b.

> That way the upcalls would truly be conditional on whether gssproxy is
> running...
>
> --
> Jeff Layton <[email protected]>

2013-12-31 22:56:33

by NeilBrown

Subject: Re: should we change how the kernel detects whether gssproxy is running?

On Tue, 31 Dec 2013 13:01:23 -0500 "J. Bruce Fields" <[email protected]>
wrote:

> On Tue, Dec 31, 2013 at 07:33:00AM -0500, Jeff Layton wrote:
> > I'm a bit concerned with how /proc/net/rpc/use-gss-proxy works...
> >
> > For one thing, when the kernel first boots any read against that file
> > hangs. That's going to be extremely problematic for certain tools that
> > scrape info out of /proc for troubleshooting purposes (e.g. Red Hat's
> > sosreport tool).
>
> Is that the only file under /proc for which that's true? (E.g. the rpc
> cache channel files probably do the same, don't they?) I was assuming
> tools like sosreport need to work from lists of specific paths.

The rpc cache channel files do not block on reads, so 'cat' works well on
them.
A process (like mountd) that wants to see new additions will use select (or
poll) for an 'exception' condition, and then read.

I think that it is best if all files in /proc (or /sys) support 'cat'.
If I "tar" up "/proc" on my notebook it doesn't block ... though it does take
quite a while on /proc/kcore :-)
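
Something like the following is all a reader needs (a quick sketch,
error handling omitted; handle_request() is a stand-in for whatever
the daemon actually does with an upcall):

#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* stand-in for real upcall handling -- hypothetical */
static void handle_request(const char *req, ssize_t n)
{
        (void)req;
        (void)n;
}

static void watch_channel(const char *path)
{
        char buf[8192];
        ssize_t n;
        struct pollfd pfd;

        pfd.fd = open(path, O_RDWR);
        pfd.events = POLLIN | POLLPRI;  /* data or 'exception' */

        while (poll(&pfd, 1, -1) > 0) {
                n = read(pfd.fd, buf, sizeof(buf));
                if (n > 0)
                        handle_request(buf, n);
                /* n == 0 just means nothing is queued right now */
        }
        close(pfd.fd);
}

A plain read with nothing queued returns 0, which is why 'cat'
terminates instead of blocking.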

NeilBrown



2014-01-02 18:55:22

by J. Bruce Fields

Subject: Re: should we change how the kernel detects whether gssproxy is running?

On Wed, Jan 01, 2014 at 09:56:20AM +1100, NeilBrown wrote:
> On Tue, 31 Dec 2013 13:01:23 -0500 "J. Bruce Fields" <[email protected]>
> wrote:
>
> > On Tue, Dec 31, 2013 at 07:33:00AM -0500, Jeff Layton wrote:
> > > I'm a bit concerned with how /proc/net/rpc/use-gss-proxy works...
> > >
> > > For one thing, when the kernel first boots any read against that file
> > > hangs. That's going to be extremely problematic for certain tools that
> > > scrape info out of /proc for troubleshooting purposes (e.g. Red Hat's
> > > sosreport tool).
> >
> > Is that the only file under /proc for which that's true? (E.g. the rpc
> > cache channel files probably do the same, don't they?) I was assuming
> > tools like sosreport need to work from lists of specific paths.
>
> The rpc cache channel files do not block on reads, so 'cat' works well on
> them.
> A process (like mountd) that wants to see new additions will use select (or
> poll) for an 'exception' condition, and then read.
>
> I think that it is best if all files in /proc (or /sys) support 'cat'.
> If I "tar" up "/proc" on my notebook it doesn't block ... though it does take
> quite a while on /proc/kcore :-)

Yes, trying that myself, I see the delay reading kcore, a bunch of "file
removed before we read it"/"file changed as we read it" errors (not too
surprising), EBUSY on /proc/acpi/event, and a few permission errors.

So yes, letting a /proc read hang looks like a bug, my bad.

--b.

2014-01-01 12:21:15

by Jeff Layton

Subject: Re: should we change how the kernel detects whether gssproxy is running?

On Tue, 31 Dec 2013 13:01:23 -0500
"J. Bruce Fields" <[email protected]> wrote:

> On Tue, Dec 31, 2013 at 07:33:00AM -0500, Jeff Layton wrote:
> > I'm a bit concerned with how /proc/net/rpc/use-gss-proxy works...
> >
> > For one thing, when the kernel first boots any read against that file
> > hangs. That's going to be extremely problematic for certain tools that
> > scrape info out of /proc for troubleshooting purposes (e.g. Red Hat's
> > sosreport tool).
>
> Is that the only file under /proc for which that's true? (E.g. the rpc
> cache channel files probably do the same, don't they?) I was assuming
> tools like sosreport need to work from lists of specific paths.
>
> > Also, it seems like if gssproxy suddenly dies, then this switch stays
> > stuck in its position. There is no way to switch back after enabling
> > gssproxy.
>
> That's true. I didn't think the ability to change on the fly was
> necessary (but I'll admit it's annoying when debugging at least.)
>
> > All of that seems unnecessarily complicated. Is there some rationale
> > for it that I'm missing?
> >
> > Would it instead make more sense to just have gssproxy
> > hold a file open under /proc? If the file is being held open, then
> > send upcalls to gssproxy. If not, then use the legacy code.
>
> The kernel code needs to know which way to handle an incoming packet at
> the time it arrives. If it falls back on the legacy upcall that means
> failing large init_sec_context calls. So a delayed gss-proxy start (or
> a crash and restart) would mean clients erroring out (with fatal errors
> I think, not just something that would be retried).
>

Now that I look at the code, I'm not sure this is really doing what you
intend. The kernel upcall code immediately falls back on the legacy
code if gssproxy isn't up when the RPCs come in.

The only thing that seems to wait for that is reads from the
use-gss-proxy file, which doesn't make a lot of sense to me. AFAICT,
only gssproxy itself opens that file and all it does is write to it.
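
To be clear, what's in there now is roughly:

static int use_gss_proxy(struct net *net)
{
        struct sunrpc_net *sn = net_generic(net, sunrpc_net_id);

        /* never set? default to the legacy upcall on first use */
        if (sn->use_gss_proxy == -1)
                set_gss_proxy(net, 0);
        return sn->use_gss_proxy;
}

...and it's only read_gssp() that sleeps, waiting for the flag to
leave its initial -1 state.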

RFC patchset forthcoming. Probably best we continue the discussion there...

--
Jeff Layton <[email protected]>

2014-01-02 18:52:10

by J. Bruce Fields

Subject: Re: should we change how the kernel detects whether gssproxy is running?

On Tue, Dec 31, 2013 at 03:52:00PM -0500, Simo Sorce wrote:
> On Tue, 2013-12-31 at 13:01 -0500, J. Bruce Fields wrote:
> > On Tue, Dec 31, 2013 at 07:33:00AM -0500, Jeff Layton wrote:
> > > I'm a bit concerned with how /proc/net/rpc/use-gss-proxy works...
> > >
> > > For one thing, when the kernel first boots any read against that file
> > > hangs. That's going to be extremely problematic for certain tools that
> > > scrape info out of /proc for troubleshooting purposes (e.g. Red Hat's
> > > sosreport tool).
> >
> > Is that the only file under /proc for which that's true? (E.g. the rpc
> > cache channel files probably do the same, don't they?) I was assuming
> > tools like sosreport need to work from lists of specific paths.
> >
> > > Also, it seems like if gssproxy suddenly dies, then this switch stays
> > > stuck in its position. There is no way to switch back after enabling
> > > gssproxy.
> >
> > That's true. I didn't think the ability to change on the fly was
> > necessary (but I'll admit it's annoying when debugging at least.)
> >
> > > All of that seems unnecessarily complicated. Is there some rationale
> > > for it that I'm missing?
> > >
> > > Would it instead make more sense to just have gssproxy
> > > hold a file open under /proc? If the file is being held open, then
> > > send upcalls to gssproxy. If not, then use the legacy code.
> >
> > The kernel code needs to know which way to handle an incoming packet at
> > the time it arrives. If it falls back on the legacy upcall that means
> > failing large init_sec_context calls. So a delayed gss-proxy start (or
> > a crash and restart) would mean clients erroring out (with fatal errors
> > I think, not just something that would be retried).
>
> Well, if gss-proxy is not running it will fail anyway, right?

It will use the cache upcall, which I believe will wait at least a
little while before giving up on rpc.svcgssd.

> We have 90s before nfsd starts receiving incoming calls at startup,
> right?

No, nfsd needs to be able to process incoming init_sec_context calls the
moment it starts. You're probably thinking of the grace period, which
affects only opens and locks.

> Isn't that enough to guarantee that whatever the admin configured to
> start has started? If gss-proxy is dead for whatever reason, a failure
> will happen anyway now, especially because rpc.gssd will most probably
> not be running if the admin configured gss-proxy instead...

OK, I was probably unreasonably worried about gss-proxy going down while
nfsd was running. If we order startup correctly and don't allow
restarting gss-proxy without restarting nfsd, then that shouldn't be a
worry.

--b.