On Thu, Jan 22, 2009 at 09:36:53AM +1100, Greg Banks wrote:
> Chuck Lever wrote:
> >
> >
> > I think we need to visit this issue on a case-by-case basis.
> > Sometimes dprintk is appropriate. Sometimes printk(KERN_ERR).
> > Sometimes a performance metric.
> Well said.
>
> > Trond has always maintained that dprintk() is best for developers, but
> > probably inappropriate for field debugging,
> It's not a perfect tool but it beats nothing at all.
> > and I think that may also
> > apply to trace points.
> It depends on whether distros can be convinced to enable it by default,
> and install by default any necessary userspace infrastructure. The
> most important thing for field debugging is Just Knowing that you have
> all the bits necessary to perform useful debugging without having to
> find some RPM that matches the kernel that the machine is actually
> running now, and not the one that was present when the machine was
> installed.
On the mount case specifically: How far are we from the idea of a mount
program that can identify most problems itself? I know its error
reporting has gotten better....
I suppose the main feedback mount gets right now is an error code from
the mount system call, and that may be too narrow an interface to cover
most problems. Is there some way we can give mount a real interface it
can use to find out this stuff instead of just dumping more strings into
the logs?
My main obstacle to judging a solution is that I don't have in mind a
good list of (say) the top 10 problems that can cause the first mount to
fail. Hm:
- dns lookup of the server fails
- server isn't reachable
- server isn't running nfs
- requested path isn't known to server or isn't exported
- export is there, but requires more security
- user doesn't have gss credentials
- file permissions on the export are wrong
...
--b.
On Jan 21, 2009, at 5:56 PM, J. Bruce Fields wrote:
> On Thu, Jan 22, 2009 at 09:36:53AM +1100, Greg Banks wrote:
>> Chuck Lever wrote:
>>>
>>>
>>> I think we need to visit this issue on a case-by-case basis.
>>> Sometimes dprintk is appropriate. Sometimes printk(KERN_ERR).
>>> Sometimes a performance metric.
>> Well said.
>>
>>> Trond has always maintained that dprintk() is best for developers, but
>>> probably inappropriate for field debugging,
>> It's not a perfect tool but it beats nothing at all.
>>> and I think that may also
>>> apply to trace points.
>> It depends on whether distros can be convinced to enable it by default,
>> and install by default any necessary userspace infrastructure. The
>> most important thing for field debugging is Just Knowing that you have
>> all the bits necessary to perform useful debugging without having to
>> find some RPM that matches the kernel that the machine is actually
>> running now, and not the one that was present when the machine was
>> installed.
>
> On the mount case specifically: How far are we from the idea of a
> mount program that can identify most problems itself? I know its error
> reporting has gotten better....
> I suppose the main feedback mount gets right now is an error code from
> the mount system call, and that may be too narrow an interface to
> cover most problems. Is there some way we can give mount a real
> interface it can use to find out this stuff instead of just dumping
> more strings into the logs?
A main reason it does this rather than generate error messages on the
terminal is that mount has to run in "background" environments.
Mounts done at boot time do not have a controlling terminal. A bg
mount can drop into the background, and thus it loses its controlling
terminal. Automounter doesn't have a controlling terminal to begin
with.
My feeling is that, as mount is a system tool, it should report its
problems in the system log. If there's a controlling terminal, report
it there too. But by and large it is a tool that is run most often
without direct human intervention or monitoring.
In addition there are a lot of cases it can (and does) handle by
itself. Renegotiating mount option settings is one of these things.
It's a narrow interface, but I'm not sure yet it's entirely inadequate.
> My main obstacle to judging a solution is that I don't have in mind a
> good list of (say) the top 10 problems that can cause the first mount to
> fail. Hm:
>
> - dns lookup of the server fails
> - server isn't reachable
> - server isn't running nfs
> - requested path isn't known to server or isn't exported
> - export is there, but requires more security
> - user doesn't have gss credentials
> - file permissions on the export are wrong
> ...
- tcp_wrappers or iptables blocking access
- network routing problems
- v2/v3 server not running rpcbind or lockd
This is exactly why I want to start with some real-world examples.
Without examples we are much more likely to design something that
isn't useful to anyone. History (or e-mail archives) gives us a lot
of information about what might be common problems.
I think we handle some of these cases reasonably well today, though
they could probably stand some polish; others, like security
configuration, are still a little new and kind of a low priority (for
good or bad reasons) and so it is still a bit confusing.
But really, if mount can report a clear error message and suggest a
course of corrective action, I don't think a dprintk or trace point or
SystemTap will be of any greater help.
--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
My favorite: when you try a Kerberos mount and one of the kernel modules
required for this isn't loaded, mount gets ENOMEM and says "out of
memory".
-Dan
-----Original Message-----
From: J. Bruce Fields
Sent: Wednesday, January 21, 2009 2:56 PM
To: Greg Banks
Cc: Chuck Lever; Linux NFS Mailing list; Linux NFSv4 mailing list; SystemTAP
Subject: Re: [RFC][PATCH 0/5] NFS: trace points added to mounting path
J. Bruce Fields wrote:
>
> On the mount case specifically: How far are we from the idea of a mount
> program that can identify most problems itself? I know its error
> reporting has gotten better....
>
> I suppose the main feedback mount gets right now is an error code from
> the mount system call, and that may be too narrow an interface to cover
> most problems. Is there some way we can give mount a real interface it
> can use to find out this stuff instead of just dumping more strings into
> the logs?
Interesting.... Store something like a reason code (similar to what they have
in Kerberos) somewhere in the proc file system?
It seems to me this is a common problem among network file systems...
steved.
On Thu, Jan 22, 2009 at 10:59:49AM -0500, Steve Dickson wrote:
> J. Bruce Fields wrote:
> >
> > On the mount case specifically: How far are we from the idea of a mount
> > program that can identify most problems itself? I know its error
> > reporting has gotten better....
> >
> > I suppose the main feedback mount gets right now is an error code from
> > the mount system call, and that may be too narrow an interface to cover
> > most problems. Is there some way we can give mount a real interface it
> > can use to find out this stuff instead of just dumping more strings into
> > the logs?
> Interesting.... Store something like a reason code (similar to what they have
> in Kerberos)
Maybe.
> somewhere in the proc file system?
But then I don't know how you'd associate it with a particular mount
attempt.
--b.
> It seems to me this is a common problem among network file systems...
J. Bruce Fields wrote:
> On Thu, Jan 22, 2009 at 10:59:49AM -0500, Steve Dickson wrote:
>> J. Bruce Fields wrote:
>>
>> Interesting.... Store something like a reason code (similar to what they have
>> in Kerberos)
>
> Maybe.
>
>> somewhere in the proc file system?
>
> But then I don't know how you'd associate it with a particular mount
> attempt.
>
You could do something truly awful like add a new code in the unused
bits of the errno value returned from mount. It would confuse an
unmodified userspace, but reporting "Unknown Error" isn't much less
useful than "I/O Error".
--
Greg Banks, P.Engineer, SGI Australian Software Group.
the brightly coloured sporks of revolution.
I don't speak for SGI.