2010-02-16 12:20:31

by Nikita V. Youshchenko

Subject: Extended error reporting to user space?

Hi

I'm developing a device driver that, in its ioctl()s, accepts a complex
data structure. Before doing its operation, it performs a large number of
checks on whether the data is valid. If one of those checks fails, the
driver returns -EINVAL.

Unfortunately this -EINVAL is not really useful. E.g. a developer
sitting in his IDE and debugging his code will see ioctl()
returning -EINVAL and will have a hard time finding what exactly is wrong.

Before inventing driver-specific extended error reporting, I'd like to ask
if there is anything more or less generic for this.
I believe the situation where -Exxx is too weak an interface for error
reporting is common.

Nikita


2010-02-16 12:23:57

by Alexey Dobriyan

Subject: Re: Extended error reporting to user space?

On Tue, Feb 16, 2010 at 2:20 PM, Nikita V. Youshchenko <[email protected]> wrote:
> Unfortunately this -EINVAL is not really useful.

There is no such thing.
Driver can spit messages with printk() though.

2010-02-16 13:14:55

by Cong Wang

Subject: Re: Extended error reporting to user space?

On Tue, Feb 16, 2010 at 02:23:55PM +0200, Alexey Dobriyan wrote:
>On Tue, Feb 16, 2010 at 2:20 PM, Nikita V. Youshchenko <[email protected]> wrote:
>> Unfortunately this -EINVAL is not really useful.
>
>There is no such thing.
>Driver can spit messages with printk() though.

Right, if errno is not enough, printk() is.

You could also use other interfaces to debug, but printk() is the
simplest.

Thanks.

2010-02-17 09:58:42

by Andi Kleen

Subject: Re: Extended error reporting to user space?

"Nikita V. Youshchenko" <[email protected]> writes:

> I'm developing a device driver that, in it's ioctl()s, accepts a complex
> data structure. Before doing it's operation, it performs large number of
> checks if data is valid. If one of those checks fail, driver
> returns -EINVAL.
>
> Unfortunately this -EINVAL is not really useful. E.g. if a developer,
> sitting in his IDE and debugging his code, will see ioctl()
> returning -EINVAL, and will have hard times finding what exactly is wrong.
>
> Before inventing driver-specific extended error reporting, I'd like to ask
> if there is anything more or less generic for this.
> I believe situation when -Exxx is too weak interface for error reporting is
> common.

This is a very common problem in Linux unfortunately. I always
describe it as the "ed approach to error handling". Instead
of giving an error message you just give ?. It's just that ? happens
to be EINVAL in Linux.

My favourite example of this is the configuration of the networking
queueing disciplines, which configure complicated data structures and
algorithms and in many cases have tens of different error conditions
based on the input parameters -- and they all just report EINVAL.

The standard way (standard kludge or standard workaround would be a
better description) is to use printk; often guarded by a special
kernel tunable or ifdef to avoid flooding the log in the normal case.
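
A minimal user-space sketch of that workaround, under stated assumptions:
the request layout, the flag mask and the debug_enabled switch are all
hypothetical, and fprintf() stands in for printk(). In a real driver the
switch would more likely be a module parameter or a compile-time option.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical request layout, for illustration only. */
struct wombat_req {
    unsigned int count;
    unsigned int flags;
};

#define WOMBAT_FLAGS_KNOWN 0x3u  /* assumed set of valid flag bits */

/* Stand-in for the tunable that keeps the log quiet in the normal case. */
static int debug_enabled;

/* Report the reason only when the tunable is set, then fail with -EINVAL. */
static int reject(const char *reason)
{
    if (debug_enabled)
        fprintf(stderr, "wombat: rejected: %s\n", reason);
    return -EINVAL;
}

/* Every failed check still returns plain -EINVAL to the caller. */
static int wombat_validate(const struct wombat_req *req)
{
    if (req->count < 5)
        return reject("count must be at least 5");
    if (req->flags & ~WOMBAT_FLAGS_KNOWN)
        return reject("unknown bits set in flags");
    return 0;
}
```

The caller still sees only -EINVAL; the reason is visible in the log alone,
which is why this is a workaround rather than an interface.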

IMHO it would be best to simply add a way to return strings directly
in this case (a la plan9). This would probably not be too hard to
implement. It's not there unfortunately.

This could be done with one of the message oriented protocols,
e.g. netlink or read/write on a special minor.

-Andi

--
[email protected] -- Speaking for myself only.

2010-02-17 10:16:56

by Nikita V. Youshchenko

Subject: Re: Extended error reporting to user space?

> "Nikita V. Youshchenko" <[email protected]> writes:
> > I'm developing a device driver that, in it's ioctl()s, accepts a
> > complex data structure. Before doing it's operation, it performs large
> > number of checks if data is valid. If one of those checks fail, driver
> > returns -EINVAL.
> >
> > Unfortunately this -EINVAL is not really useful. E.g. if a developer,
> > sitting in his IDE and debugging his code, will see ioctl()
> > returning -EINVAL, and will have hard times finding what exactly is
> > wrong.
> >
> > Before inventing driver-specific extended error reporting, I'd like to
> > ask if there is anything more or less generic for this.
> > I believe situation when -Exxx is too weak interface for error
> > reporting is common.
>
> This is a very common problem in Linux unfortunately. I always
> describe that as a the "ed approach to error handling". Instead
> of giving a error message you just give ?. Just ? happens
> to be EINVAL in Linux.
>
> My favourite example of this is the configuration of the networking
> queueing disciplines, which configure complicated data structures and
> algorithms and in many cases have tens of different error conditions
> based on the input parameters -- and they all just report EINVAL.
>
> The standard way (standard kludge or standard workaround would be a
> better description) is to use printk; often guarded by a special
> kernel tunable or ifdef to avoid flooding the log in the normal case.
>
> IMHO it would be best to simply add a way to return strings directly
> in this case (a la plan9). This would be probably not too hard to
> implement. It's not there unfortunately.
>
> This could be done with one of the message oriented protocols,
> e.g. netlink or read/write on a special minor.

Why not create a generic solution for this, if one does not exist yet?

For example, have a "last error" string associated with task_struct, that:
- is cleared on each syscall entry,
- while the syscall is running, may be filled with printf-style routines,
- may be accessed from userspace with an additional syscall [that obviously
should not reset the error]?

This will give driver writers a common interface for extended error
reporting...
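
The proposal above can be sketched in user space roughly as follows. This is
only an illustration under assumptions: a thread-local buffer stands in for a
field in task_struct, and the names syscall_enter(), set_last_error() and
get_last_error() as well as the buffer size are hypothetical.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define LAST_ERROR_MAX 128  /* assumed buffer size */

/* One buffer per thread stands in for a field in task_struct. */
static _Thread_local char last_error[LAST_ERROR_MAX];

/* Called on every (simulated) syscall entry: forget the previous message. */
static void syscall_enter(void)
{
    last_error[0] = '\0';
}

/* printf-style helper a driver would call just before returning an error. */
static void set_last_error(const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    vsnprintf(last_error, sizeof(last_error), fmt, ap);
    va_end(ap);
}

/* The additional "syscall": copies the message out and leaves it in place. */
static size_t get_last_error(char *buf, size_t len)
{
    if (len == 0)
        return 0;
    snprintf(buf, len, "%s", last_error);
    return strlen(buf);
}
```

Because get_last_error() does not clear the buffer, it can be called
repeatedly; only the next syscall_enter() resets the message.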

Nikita

2010-02-17 10:35:11

by Andi Kleen

Subject: Re: Extended error reporting to user space?

On Wed, Feb 17, 2010 at 01:16:48PM +0300, Nikita V. Youshchenko wrote:
> > "Nikita V. Youshchenko" <[email protected]> writes:
> > > I'm developing a device driver that, in it's ioctl()s, accepts a
> > > complex data structure. Before doing it's operation, it performs large
> > > number of checks if data is valid. If one of those checks fail, driver
> > > returns -EINVAL.
> > >
> > > Unfortunately this -EINVAL is not really useful. E.g. if a developer,
> > > sitting in his IDE and debugging his code, will see ioctl()
> > > returning -EINVAL, and will have hard times finding what exactly is
> > > wrong.
> > >
> > > Before inventing driver-specific extended error reporting, I'd like to
> > > ask if there is anything more or less generic for this.
> > > I believe situation when -Exxx is too weak interface for error
> > > reporting is common.
> >
> > This is a very common problem in Linux unfortunately. I always
> > describe that as a the "ed approach to error handling". Instead
> > of giving a error message you just give ?. Just ? happens
> > to be EINVAL in Linux.
> >
> > My favourite example of this is the configuration of the networking
> > queueing disciplines, which configure complicated data structures and
> > algorithms and in many cases have tens of different error conditions
> > based on the input parameters -- and they all just report EINVAL.
> >
> > The standard way (standard kludge or standard workaround would be a
> > better description) is to use printk; often guarded by a special
> > kernel tunable or ifdef to avoid flooding the log in the normal case.
> >
> > IMHO it would be best to simply add a way to return strings directly
> > in this case (a la plan9). This would be probably not too hard to
> > implement. It's not there unfortunately.
> >
> > This could be done with one of the message oriented protocols,
> > e.g. netlink or read/write on a special minor.
>
> Why not create a generic solution for this, if one does not exist yet?

Someone would need to do it. Yes I think it would be a worthy project.

The trick is also to get around the objections of the "but we always
did it this way" Unix traditionalists.

>
> For example, have a "last error" string associated with task_struct, that:
> - will clean on each syscall entry,
> - while syscall is running, may be filled with printf-style routines,
> - may be accessible from userspace with additional syscall [that obviously
> should not reset error]?
>
> This will give driver writers a common interface for extended error
> reporting...

You would need a way to save/restore that string too (like it works
with errno), otherwise libraries cannot use it safely. Also
it would be good to have something that does not impact the system
call fast path for a non-error call.
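
A rough sketch of what such a save/restore pair could look like for a
library, mirroring the usual "save errno, do work, restore errno" idiom. The
thread-local last_error buffer, struct saved_error and both helper names are
assumptions made for the example, not an existing interface.

```c
#include <errno.h>
#include <string.h>

#define LAST_ERROR_MAX 128  /* assumed buffer size */

/* Stand-in for the per-task "last error" string. */
static _Thread_local char last_error[LAST_ERROR_MAX];

/* Snapshot of both errno and the extended message. */
struct saved_error {
    int err;
    char text[LAST_ERROR_MAX];
};

/* Take a copy before the library makes calls of its own. */
static void error_save(struct saved_error *s)
{
    s->err = errno;
    memcpy(s->text, last_error, sizeof(s->text));
}

/* Put the caller's view back once the library is done. */
static void error_restore(const struct saved_error *s)
{
    memcpy(last_error, s->text, sizeof(last_error));
    errno = s->err;
}
```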

From the basic semantics I think I would prefer a way
associated with each syscall. It could probably be fit into
many syscall ABIs, but that would need architecture specific
changes, which are difficult to coordinate (Linux has too many
architectures and many of them with inactive maintainers).

One way to do that would be an "extended ioctl" syscall that supports
this in a generic way (and perhaps could fix some of the other problems
of ioctl too, like better type safety).

Designing such a thing might end up being a rat-hole (and you would
probably need to be very careful to avoid the second system effect)

Of course the qdiscs and other code that uses netlink instead would also
need something equivalent.

Also I expect someone would come up with localization issues, although
the classical "translation database" approach would probably work
anyways.

-Andi

--
[email protected] -- Speaking for myself only.

2010-02-17 10:44:51

by Alan

Subject: Re: Extended error reporting to user space?

> For example, have a "last error" string associated with task_struct, that:
> - will clean on each syscall entry,
> - while syscall is running, may be filled with printf-style routines,
> - may be accessible from userspace with additional syscall [that obviously
> should not reset error]?
>
> This will give driver writers a common interface for extended error
> reporting...

That's probably overkill. For almost any ioctl type interface the only
thing you *need* to make more sense is the address of the field that was
deemed invalid.

So in your ioctl handler you'd do something like

    get_user(v, &foo->wombats);
    if (v < 5) {
        error_addr(&foo->wombats);
        return -EINVAL;
    }

Returning text is all very well, and printk can help debug, but neither
actually helps application code or particularly helps interpreters to dig
into the detail and act themselves to fix a problem or understand it. It
also costs material amounts of unswappable memory and also disk storage
for the kernel image on embedded devices.
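
A user-space sketch of the error_addr() idea, to show what the caller could
do with the recorded address. Everything here is hypothetical: the request
layout, the global that stands in for per-task state, and the error_offset()
helper that turns the address back into a field offset.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical request layout. */
struct foo_req {
    int wombats;
    int burrows;
};

/* Stand-in for per-task state recording the rejected field's address. */
static const void *last_error_addr;

static void error_addr(const void *field)
{
    last_error_addr = field;
}

/* Simulated handler: each failed check records which field was at fault. */
static int foo_ioctl(const struct foo_req *foo)
{
    last_error_addr = NULL;
    if (foo->wombats < 5) {
        error_addr(&foo->wombats);
        return -EINVAL;
    }
    if (foo->burrows < 1) {
        error_addr(&foo->burrows);
        return -EINVAL;
    }
    return 0;
}

/* Caller side: offset of the rejected field within the request, or -1. */
static ptrdiff_t error_offset(const struct foo_req *foo)
{
    if (last_error_addr == NULL)
        return -1;
    return (const char *)last_error_addr - (const char *)foo;
}
```

An application can compare the offset with offsetof() to find the field and
react to it, which a free-form string would not allow directly.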

Two other problems text returns bring up are ambiguity and translations -
it's almost impossible to keep them unique even within a big module. It's
also possible to get things like typos in the returned text or
mis-spellings that you then can't fix because some other app now has

    if (strcmp(returned_err, "No such wombat evalueted")==0) {
        ...
    }

in it. (HTTP 'referer' being a dark warning from history ...)

A lot of other systems keep message catalogues often indexed by
module:error. Text lookups in userspace (easy to do with existing
interfaces), and the OS providing generic, specific, and identifying
module info.

I guess the Linux extension to that would end up as

extended_error(&foo->wombats, E_NOT_A_VALID_BREEDING_POPULATION);

and internally expand to include THIS_MODULE and extract the module name.
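
A sketch of the user-space half of such a scheme, assuming the kernel side
hands back a module name and a numeric code. The struct layouts, the code
value and the catalogue contents are invented for the example.

```c
#include <stddef.h>
#include <string.h>

/* What the kernel side would report: module name plus a numeric code. */
struct ext_error {
    const char *module;
    int code;
};

enum { E_NOT_A_VALID_BREEDING_POPULATION = 1 };  /* hypothetical value */

/* User-space message catalogue indexed by module:error. */
struct catalogue_entry {
    const char *module;
    int code;
    const char *text;
};

static const struct catalogue_entry catalogue[] = {
    { "wombat", E_NOT_A_VALID_BREEDING_POPULATION,
      "not enough wombats for a valid breeding population" },
};

/* Returns the catalogue text, or NULL if the pair is unknown. */
static const char *catalogue_lookup(const struct ext_error *e)
{
    size_t i;

    for (i = 0; i < sizeof(catalogue) / sizeof(catalogue[0]); i++) {
        if (catalogue[i].code == e->code &&
            strcmp(catalogue[i].module, e->module) == 0)
            return catalogue[i].text;
    }
    return NULL;
}
```

Since the text lives in user space, it can be corrected or translated
without touching the kernel, and applications compare codes, not strings.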

There's another related problem here too - Unix style errors lack the
ability of some operating systems to report "It worked but ....." which
leads to interface oddities like termios where it reports "Ok" but you have
to inspect the returned structure to see if you got what you requested.
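
For the termios case, the check looks roughly like this: tcsetattr() can
report success when only some of the requested changes were applied, so the
caller reads the settings back with tcgetattr() and compares. The helper
name and its 0/1/-1 return convention are assumptions; only the four flag
words are compared here, and c_cc or the line speeds could be checked the
same way.

```c
#include <termios.h>

/*
 * Apply the settings, then read them back and compare the flag words.
 * Returns 0 if all requested flags took effect, 1 if the call succeeded
 * but some flags differ ("it worked but ..."), and -1 on failure.
 */
static int tcsetattr_checked(int fd, const struct termios *want)
{
    struct termios got;

    if (tcsetattr(fd, TCSANOW, want) < 0)
        return -1;
    if (tcgetattr(fd, &got) < 0)
        return -1;

    if (got.c_iflag != want->c_iflag || got.c_oflag != want->c_oflag ||
        got.c_cflag != want->c_cflag || got.c_lflag != want->c_lflag)
        return 1;
    return 0;
}
```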

Doesn't look too hard to add some of this or something similar as you
suggest and while it would take a long time to get coverage you have to
start somewhere.

Alan

2010-02-17 11:57:48

by Andi Kleen

Subject: Re: Extended error reporting to user space?

Hi Alan,

> Thats probably overkill. For almost any ioctl type interface the only
> thing you *need* to make more sense is the address of the field that was
> deemed invalid.

Take a look at all the return -EINVALs in net/sched/sch_cbq.c
and then tell me if you really still believe just knowing the field
is enough to diagnose those. A common issue for example
is if it depends on the current state somehow.

> actually help application code or particularly help interpreters to dig
> into the detail and act themselves to fix a problem or understand it. It
> also costs material amounts of unswappable memory and also disk storage
> for the kernel image on embedded devices.

Trading developer time for a few bytes saved is exactly the wrong
tradeoff, even on a small system. In principle it could be CONFIGed
of course, but I suspect it wouldn't be worth it (especially
compared to all the other bloat)

>
> Two other problems text returns bring up or ambiguity and translations -
> its almost impossible to keep them unique even within a big module. It's

For translations the "pragmatic text database" works reasonably well
I think. Also you don't necessarily need them to be unique
(if the English string is not unique, why would the translation need to be?)

Sure text won't solve all problems either, but it's infinitely
better than EINVAL.


> also possible to get things like typos in the returned text or
> mis-spellings that you then can't fix because some other app now has
>
> if (strcmp(returned_err, "No such wombat evalueted")==0) {
> ...
> }
>
> in it. (HTTP 'referer' being a dark warning from history ...)

You could get numbers wrong too. There's really no cure against
that.

But yes it's a good point -- would need to make sure that the spelling
police would direct their efforts elsewhere as much as possible.


>
> A lot of other systems keep message catalogues often indexed by
> module:error. Text lookups in userspace (easy to do with existing
> interfaces), and the OS providing generic, specific, and identifying
> module info.

That's the IBM approach. I have some doubts it would really work
for a distributed environment like Linux. I believe it has even
been tried already (e.g. there's a Japanese project for such
a catalog). I don't think it works that well.

I think I would prefer just text strings. In principle one
could still develop a convention inside them though.

>
> I guess the Linux extension to that would end up as
>
> extended_error(&foo->wombats, E_NOT_A_VALID_BREEDING_POPULATION);
>
> and internally expand to include THIS_MODULE and extract the module name.

Hmm, yes including the module might be reasonable.

> There's another related problem here too - Unix style errors lack the
> ability of some OS systems to report "It worked but ....." which leads to
> interface oddities like termios where it reports "Ok" but you have to
> inpsect the returned structure to see if you got what you requested.

I suspect that's better solved in some way specific to that call.
I don't think it's all that common anyways.

-Andi

--
[email protected] -- Speaking for myself only.

2010-02-17 19:50:22

by Dr. David Alan Gilbert

Subject: Re: Extended error reporting to user space?

* Andi Kleen ([email protected]) wrote:
> On Wed, Feb 17, 2010 at 01:16:48PM +0300, Nikita V. Youshchenko wrote:
> > > "Nikita V. Youshchenko" <[email protected]> writes:
> > > > I'm developing a device driver that, in it's ioctl()s, accepts a
> > > > complex data structure. Before doing it's operation, it performs large
> > > > number of checks if data is valid. If one of those checks fail, driver
> > > > returns -EINVAL.
> > > >
> > > > Unfortunately this -EINVAL is not really useful. E.g. if a developer,
> > > > sitting in his IDE and debugging his code, will see ioctl()
> > > > returning -EINVAL, and will have hard times finding what exactly is
> > > > wrong.
> > > >
> > > > Before inventing driver-specific extended error reporting, I'd like to
> > > > ask if there is anything more or less generic for this.
> > > > I believe situation when -Exxx is too weak interface for error
> > > > reporting is common.
> > >
> > > This is a very common problem in Linux unfortunately. I always
> > > describe that as a the "ed approach to error handling". Instead
> > > of giving a error message you just give ?. Just ? happens
> > > to be EINVAL in Linux.
> > >
> > > My favourite example of this is the configuration of the networking
> > > queueing disciplines, which configure complicated data structures and
> > > algorithms and in many cases have tens of different error conditions
> > > based on the input parameters -- and they all just report EINVAL.
> > >
> > > The standard way (standard kludge or standard workaround would be a
> > > better description) is to use printk; often guarded by a special
> > > kernel tunable or ifdef to avoid flooding the log in the normal case.
> > >
> > > IMHO it would be best to simply add a way to return strings directly
> > > in this case (a la plan9). This would be probably not too hard to
> > > implement. It's not there unfortunately.
> > >
> > > This could be done with one of the message oriented protocols,
> > > e.g. netlink or read/write on a special minor.
> >
> > Why not create a generic solution for this, if one does not exist yet?
>
> Someone would need to do it. Yes I think it would be a worthy project.
>
> The trick is also get around the objections of the "but we always
> did it this way" Unix traditionalists.

I'd wondered about some form of halfway house where the error
value is expanded but could be truncated for compatibility - i.e.
if at the moment we had:

return -EINVAL;

it would become:

return ERRORNUM(EINVAL, BADLENGTH);

and that would expand to something like:
return -(EINVAL + (BADLENGTH << ESHIFT));

existing syscall handlers could mask the extended error bits out on the way
back, and a new entry could pass the whole error value back where user space
could separate out the other part of the error.

This still feels quite like stretching the traditional way; but at the cost
of it still having the same problems (e.g. having to define a list of error
values).
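
A small sketch of that packing, with explicit parentheses around the shift
(in C, + binds tighter than <<). The 8-bit ESHIFT, the detail codes and the
two helper functions are assumptions chosen for the example; the low bits
hold the classic errno, the bits above it hold the detail.

```c
#include <errno.h>

#define ESHIFT 8                       /* assumed: low 8 bits hold the errno */
#define EMASK  ((1L << ESHIFT) - 1)

enum { BADLENGTH = 1, BADALIGN = 2 };  /* hypothetical detail codes */

/* Pack errno and detail into one negative return value. */
#define ERRORNUM(err, detail) (-((long)(err) | ((long)(detail) << ESHIFT)))

/* What an existing syscall exit path would do: drop the detail bits. */
static long errornum_truncate(long ret)
{
    return ret < 0 ? -(-ret & EMASK) : ret;
}

/* What a new entry point would let user space recover. */
static long errornum_detail(long ret)
{
    return ret < 0 ? (-ret >> ESHIFT) : 0;
}
```

With these definitions an old caller still sees -EINVAL after truncation,
while a new caller can additionally extract BADLENGTH.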

One hard problem is that often the thing that actually returns the error
has actually just got a failure from something that called it which didn't
return any diagnostics, so to do this properly errors have to be passed
around in a lot of places; you'll also have to figure out just how far
down you want to pass it - if a read() fails due to a SCSI error there
is a whole load of different levels of information that you have to choose
what to return.

<snip>

Dave (who has stared at mmap's that have returned EINVAL for way too long)

--
-----Open up your eyes, open up your mind, open up your code -------
/ Dr. David Alan Gilbert | Running GNU/Linux on Alpha,68K| Happy \
\ gro.gilbert @ treblig.org | MIPS,x86,ARM,SPARC,PPC & HPPA | In Hex /
\ _________________________|_____ http://www.treblig.org |_______/

2010-02-17 20:23:17

by Andi Kleen

Subject: Re: Extended error reporting to user space?

> I'd wondered about some form of halfway house where the error
> value is expanded but could be truncated for compatibility - i.e.

Who would do the truncation?

> if at the moment we had:
>
> return -EINVAL;
>
> it would become:
>
> return ERRORNUM(EINVAL, BADLENGTH);

x86 only has about 12 bits in the current ABI btw.
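
To make the limit concrete: a raw syscall return value is treated as an
error only when it lies in [-4095, -1], which is the kernel's MAX_ERRNO.
The sketch below checks that range and whether a packed errno/detail pair
would still fit; the helper names and the errno | (detail << shift) layout
are assumptions for illustration.

```c
#define MAX_ERRNO 4095L  /* same value as the kernel's MAX_ERRNO */

/* A raw return value counts as an error only in [-MAX_ERRNO, -1]. */
static int is_syscall_error(long ret)
{
    return ret < 0 && ret >= -MAX_ERRNO;
}

/* Would errno | (detail << shift) still be recognised as an error? */
static int fits_in_error_range(long err, long detail, int shift)
{
    long packed = err | (detail << shift);

    return packed > 0 && packed <= MAX_ERRNO;
}
```

With an 8-bit errno field this leaves room for only 15 distinct detail
codes, which is the practical problem with the packing approach.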

-Andi

2010-02-21 00:09:00

by Dr. David Alan Gilbert

Subject: Re: Extended error reporting to user space?

* Andi Kleen ([email protected]) wrote:
> > I'd wondered about some form of halfway house where the error
> > value is expanded but could be truncated for compatibility - i.e.
>
> Who would do the truncation?

I'd assumed something like the code in entry*.S or equivalent for
existing syscalls, but have another entry that would pass it through
to a newer libc that knew how to handle it.

> > if at the moment we had:
> >
> > return -EINVAL;
> >
> > it would become:
> >
> > return ERRORNUM(EINVAL, BADLENGTH);
>
> x86 only has about 12 bits in the current ABI btw.

Ah yes I can see that now, hmm.

Dave
--
-----Open up your eyes, open up your mind, open up your code -------
/ Dr. David Alan Gilbert | Running GNU/Linux on Alpha,68K| Happy \
\ gro.gilbert @ treblig.org | MIPS,x86,ARM,SPARC,PPC & HPPA | In Hex /
\ _________________________|_____ http://www.treblig.org |_______/