2024-02-07 09:55:02

by Miklos Szeredi

[permalink] [raw]
Subject: [LSF/MM/BPF TOPIC] tracing the source of errors

[I'm not planning to attend LSF this year, but I thought this topic
might be of interest to those who will.]

The errno thing is really ancient and yet quite usable. But when
trying to find out where a particular EINVAL is coming from, that's
often mission impossible.

Would it make sense to add infrastructure to allow tracing the source
of errors? E.g.

strace --errno-trace ls -l foo
..
statx(AT_FDCWD, "foo", ...) = -1 ENOENT [fs/namei.c:1852]
..

Don't know about others, but this issue comes up quite often for me.

I would implement this with macros that record the place where a
particular error has originated, and some way to query the last one
(which wouldn't be 100% accurate, but good enough I guess).

Thanks,
Miklos


2024-02-07 11:01:09

by Jan Kara

[permalink] [raw]
Subject: Re: [Lsf-pc] [LSF/MM/BPF TOPIC] tracing the source of errors

On Wed 07-02-24 10:54:34, Miklos Szeredi via Lsf-pc wrote:
> [I'm not planning to attend LSF this year, but I thought this topic
> might be of interest to those who will.]
>
> The errno thing is really ancient and yet quite usable. But when
> trying to find out where a particular EINVAL is coming from, that's
> often mission impossible.
>
> Would it make sense to add infrastructure to allow tracing the source
> of errors? E.g.
>
> strace --errno-trace ls -l foo
> ...
> statx(AT_FDCWD, "foo", ...) = -1 ENOENT [fs/namei.c:1852]
> ...
>
> Don't know about others, but this issue comes up quite often for me.

Yes, having this available would be really useful at times. Sometimes I
had to resort to kprobes or good old printks.

> I would implement this with macros that record the place where a
> particular error has originated, and some way to query the last one
> (which wouldn't be 100% accurate, but good enough I guess).

The problem always has been how to implement this functionality in a
transparent way so the code does not become a mess. So if you have some
idea, I'd say go for it :)

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2024-02-07 11:24:07

by Miklos Szeredi

[permalink] [raw]
Subject: Re: [Lsf-pc] [LSF/MM/BPF TOPIC] tracing the source of errors

On Wed, 7 Feb 2024 at 12:00, Jan Kara <[email protected]> wrote:

> The problem always has been how to implement this functionality in a
> transparent way so the code does not become a mess. So if you have some
> idea, I'd say go for it :)

My first idea would be to wrap all instances of E* (e.g. ERR(E*)).
But this could be made completely transparent by renaming current
definition of E* to _E* and defining E* to be the wrapped ones.
There's probably a catch (or several catches) somewhere, though.

Thanks,
Miklos

2024-02-07 13:06:56

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [Lsf-pc] [LSF/MM/BPF TOPIC] tracing the source of errors

On Wed, Feb 07, 2024 at 12:23:41PM +0100, Miklos Szeredi wrote:
> On Wed, 7 Feb 2024 at 12:00, Jan Kara <[email protected]> wrote:
>
> > The problem always has been how to implement this functionality in a
> > transparent way so the code does not become a mess. So if you have some
> > idea, I'd say go for it :)
>
> My first idea would be to wrap all instances of E* (e.g. ERR(E*)).
> But this could be made completely transparent by renaming current
> definition of E* to _E* and defining E* to be the wrapped ones.
> There's probably a catch (or several catches) somewhere, though.

To be perfectly clear, you're suggesting two things.

Option (a) change "all" code like this:
- ret = -EINVAL;
+ ret = -ERR(EINVAL);

where ERR would do some magic with __func__ and __LINE__.

Option (b)

-#define EINVAL 22
+#define E_INVAL 22
+#define EINVAL ERR(E_INVAL)

and then change all code that does something like:

if (err == -EINVAL)
to
if (err == -E_INVAL)

Or have I misunderstood?

2024-02-07 17:26:34

by Darrick J. Wong

[permalink] [raw]
Subject: Re: [LSF/MM/BPF TOPIC] tracing the source of errors

On Wed, Feb 07, 2024 at 10:54:34AM +0100, Miklos Szeredi wrote:
> [I'm not planning to attend LSF this year, but I thought this topic
> might be of interest to those who will.]
>
> The errno thing is really ancient and yet quite usable. But when
> trying to find out where a particular EINVAL is coming from, that's
> often mission impossible.
>
> Would it make sense to add infrastructure to allow tracing the source
> of errors? E.g.
>
> strace --errno-trace ls -l foo
> ...
> statx(AT_FDCWD, "foo", ...) = -1 ENOENT [fs/namei.c:1852]
> ...
>
> Don't know about others, but this issue comes up quite often for me.
>
> I would implement this with macros that record the place where a
> particular error has originated, and some way to query the last one
> (which wouldn't be 100% accurate, but good enough I guess).

Hmmm, weren't Kent and Suren working on code tagging for memory
allocation profiling? It would be kinda nice to wrap that up in the
error return paths as well.

Granted then we end up with some nasty macro mess like:

[Pretend that there's a struct errno_tag, DEFINE_ALLOC_TAG, and
__alloc_tag_add symbols that looks mostly like struct alloc_tag from [1]
and then (backslashes elided)]

#define Err(x)
({
int __errno = (x);
DEFINE_ERRNO_TAG(_errno_tag);

trace_return_errno(__this_address, __errno)
__errno_tag_add(&_errno_tag, __errno);
__errno;
})

foo = kmalloc(...);
if (!foo)
return Err(-ENOMEM);

or

if (fs_is_messed_up())
return Err(-EINVAL);

This would get us the ability to ftrace for where errno returns
initiate, as well as collect counters for how often we're actually
doing that in production. You could even add time_stats too, but
annotating the entire kernel might be a stretch.

--D

[1] https://lwn.net/Articles/906660/

> Thanks,
> Miklos
>

2024-02-07 19:39:31

by Eric Sandeen

[permalink] [raw]
Subject: Re: [Lsf-pc] [LSF/MM/BPF TOPIC] tracing the source of errors

On 2/7/24 5:23 AM, Miklos Szeredi wrote:
> On Wed, 7 Feb 2024 at 12:00, Jan Kara <[email protected]> wrote:
>
>> The problem always has been how to implement this functionality in a
>> transparent way so the code does not become a mess. So if you have some
>> idea, I'd say go for it :)
>
> My first idea would be to wrap all instances of E* (e.g. ERR(E*)).
> But this could be made completely transparent by renaming current
> definition of E* to _E* and defining E* to be the wrapped ones.
> There's probably a catch (or several catches) somewhere, though.
>
> Thanks,
> Miklos
>

Just FWIW, XFS has kind of been there and back again on wrapping error returns
with macros.

Long ago, we had an XFS_ERROR() macro, i.e.

if (error)
return -XFS_ERROR(error);

sprinkled (randomly) throughout the code.

(it didn't make it out through strace, and was pretty clunky but could printk or
BUG based on which error you were looking for, IIRC.)

In 2014(!) I removed it, pointing out that systemtap could essentially do the
same thing, and do it more flexibly (see: [PATCH 2/2] xfs: Nuke XFS_ERROR macro):

# probe module("xfs").function("xfs_*").return { if (@defined($return) &&
$return == VALUE) { ... } }

hch pointed out that systemtap was not a viable option for many, and further
discussion turned up a slightly kludgey way to use kprobes:

-- from dchinner --
#!/bin/bash

TRACEDIR=/sys/kernel/debug/tracing

grep -i 't xfs_' /proc/kallsyms | awk '{print $3}' ; while read F; do
echo "r:ret_$F $F \$retval" >> $TRACEDIR/kprobe_events
done

for E in $TRACEDIR/events/kprobes/ret_xfs_*/enable; do
echo 1 > $E
done;

echo 'arg1 > 0xffffffffffffff00' > $TRACEDIR/events/kprobes/filter

for T in $TRACEDIR/events/kprobes/ret_xfs_*/trigger; do
echo 'traceoff if arg1 > 0xffffffffffffff00' > $T
done
--------

which yields i.e.:

# dd if=/dev/zero of=/mnt/scratch/newfile bs=513 oflag=direct
dd: error writing ¿/mnt/scratch/newfile¿: Invalid argument
1+0 records in
0+0 records out
0 bytes (0 B) copied, 0.000259882 s, 0.0 kB/s
root@test4:~# cat /sys/kernel/debug/tracing/trace
# tracer: nop
#
# entries-in-buffer/entries-written: 1/1 #P:16
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
<...>-8073 [006] d... 145740.460546: ret_xfs_file_dio_aio_write:
(xfs_file_aio_write+0x170/0x180 <- xfs_file_dio_aio_write) arg1=0xffffffffffffffea

where that last negative number is the errno.

Not the prettiest thing but something that works today and could maybe be improved?

-Eric

2024-02-07 22:10:17

by Dave Chinner

[permalink] [raw]
Subject: Re: [LSF/MM/BPF TOPIC] tracing the source of errors

On Wed, Feb 07, 2024 at 10:54:34AM +0100, Miklos Szeredi wrote:
> [I'm not planning to attend LSF this year, but I thought this topic
> might be of interest to those who will.]
>
> The errno thing is really ancient and yet quite usable. But when
> trying to find out where a particular EINVAL is coming from, that's
> often mission impossible.
>
> Would it make sense to add infrastructure to allow tracing the source
> of errors? E.g.
>
> strace --errno-trace ls -l foo
> ...
> statx(AT_FDCWD, "foo", ...) = -1 ENOENT [fs/namei.c:1852]
> ...
>
> Don't know about others, but this issue comes up quite often for me.

ftrace using the function_graph tracer will emit the return values
of the functions if you use it with the 'funcgraph-retval' option.

Seems like a solved problem?

-Dave.
--
Dave Chinner
[email protected]

2024-02-08 09:10:22

by Miklos Szeredi

[permalink] [raw]
Subject: Re: [LSF/MM/BPF TOPIC] tracing the source of errors

On Wed, 7 Feb 2024 at 22:37, Dave Chinner <[email protected]> wrote:

> ftrace using the function_graph tracer will emit the return values
> of the functions if you use it with the 'funcgraph-retval' option.
>
> Seems like a solved problem?

Except

a) this seems exceedingly difficult to set up for non-developers,
which is often where this is needed. Even strace is pretty verbose
and the generated output too big, let alone all function calls across
the whole system.

b) can only point to the function was generated. But the same error
is often generated for several different reasons within the same
function and the return value doesn't help there.

I think a) is the critical one, and possibly the ftrace infrastructure
could be used for something more friendly that just pointed to the
function where the error was generated without having to go through
hoops.

Thanks,
Miklos

2024-02-08 15:57:42

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [LSF/MM/BPF TOPIC] tracing the source of errors

On Thu, Feb 08, 2024 at 08:37:29AM +1100, Dave Chinner wrote:
> ftrace using the function_graph tracer will emit the return values
> of the functions if you use it with the 'funcgraph-retval' option.

OK, but that may not be fine grained enough. Why is mmap() returning
-ENOMEM?

unsigned long do_mmap(struct file *file, unsigned long addr,
..
/* Careful about overflows.. */
len = PAGE_ALIGN(len);
if (!len)
return -ENOMEM;
..
/* Too many mappings? */
if (mm->map_count > sysctl_max_map_count)
return -ENOMEM;

So it can distinguish between mmap() returning ENOMEM because
get_unmapped_area() returned ENOMEM and do_mmap() returning ENOMEM of
its own accord (right?), but it can't tell you which of the above two
cases you hit. Or can it?

2024-02-08 20:39:26

by Gabriel Krisman Bertazi

[permalink] [raw]
Subject: Re: [Lsf-pc] [LSF/MM/BPF TOPIC] tracing the source of errors

Jan Kara <[email protected]> writes:

> On Wed 07-02-24 10:54:34, Miklos Szeredi via Lsf-pc wrote:
>> [I'm not planning to attend LSF this year, but I thought this topic
>> might be of interest to those who will.]
>>
>> The errno thing is really ancient and yet quite usable. But when
>> trying to find out where a particular EINVAL is coming from, that's
>> often mission impossible.
>>
>> Would it make sense to add infrastructure to allow tracing the source
>> of errors? E.g.
>>
>> strace --errno-trace ls -l foo
>> ...
>> statx(AT_FDCWD, "foo", ...) = -1 ENOENT [fs/namei.c:1852]
>> ...
>>
>> Don't know about others, but this issue comes up quite often for me.
>
> Yes, having this available would be really useful at times. Sometimes I
> had to resort to kprobes or good old printks.
>
>> I would implement this with macros that record the place where a
>> particular error has originated, and some way to query the last one
>> (which wouldn't be 100% accurate, but good enough I guess).
>
> The problem always has been how to implement this functionality in a
> transparent way so the code does not become a mess. So if you have some
> idea, I'd say go for it :)

I had a proposal to provide the LoC of filesystem errors as part of an
extended record of the FAN_FS_ERROR messages (fanotify interface). It
might be a sensible interface to expose this information if not
prohibitively expensive.

One might record the position with a macro and do the fsnotify_sb_error
from a safer context.

--
Gabriel Krisman Bertazi

2024-02-08 20:48:04

by Gabriel Krisman Bertazi

[permalink] [raw]
Subject: Re: [Lsf-pc] [LSF/MM/BPF TOPIC] tracing the source of errors

Matthew Wilcox <[email protected]> writes:

> Option (b)
>
> -#define EINVAL 22
> +#define E_INVAL 22
> +#define EINVAL ERR(E_INVAL)

Note there will surely be cases where EINVAL is used as a soft failure
and the kernel will just try something else, instead of propagating the
error up the stack. In this case, there is no point in logging the
first case of error, as it will just be expected behavior.

So there's really no way around explicitly annotating (ERR (EINVAL))
in place where it really matters, instead of changing the definition
of -EINVAL itself or automatically converting check sites.

>
> and then change all code that does something like:
>
> if (err == -EINVAL)
> to
> if (err == -E_INVAL)
>
> Or have I misunderstood?



--
Gabriel Krisman Bertazi

2024-02-09 01:55:32

by Dave Chinner

[permalink] [raw]
Subject: Re: [LSF/MM/BPF TOPIC] tracing the source of errors

On Thu, Feb 08, 2024 at 03:57:27PM +0000, Matthew Wilcox wrote:
> On Thu, Feb 08, 2024 at 08:37:29AM +1100, Dave Chinner wrote:
> > ftrace using the function_graph tracer will emit the return values
> > of the functions if you use it with the 'funcgraph-retval' option.
>
> OK, but that may not be fine grained enough. Why is mmap() returning
> -ENOMEM?
>
> unsigned long do_mmap(struct file *file, unsigned long addr,
> ...
> /* Careful about overflows.. */
> len = PAGE_ALIGN(len);
> if (!len)
> return -ENOMEM;
> ...
> /* Too many mappings? */
> if (mm->map_count > sysctl_max_map_count)
> return -ENOMEM;
>
> So it can distinguish between mmap() returning ENOMEM because
> get_unmapped_area() returned ENOMEM and do_mmap() returning ENOMEM of
> its own accord (right?),

The call stack trace should tell you which function the error
originated from, yes?

> but it can't tell you which of the above two
> cases you hit. Or can it?

Never used it, but it might be able to - the "sym-offset" option
will display exact offsets of the function in the trace. If you then
add "funcgraph-tail" it will emit the function being returned from.
If the return location is generated with an offset indicating the
actual return, then it might tell you the exact location.

If it doesn't, then this would seem like a reasonable thing to add
to ftrace - function return tracing with a filter to grab the return
location when the return value is less than 0 seems exactly the sort
of thing ftrace was intended to be used for...

-Dave.
--
Dave Chinner
[email protected]

2024-02-09 02:28:02

by Dave Chinner

[permalink] [raw]
Subject: Re: [LSF/MM/BPF TOPIC] tracing the source of errors

On Thu, Feb 08, 2024 at 10:09:36AM +0100, Miklos Szeredi wrote:
> On Wed, 7 Feb 2024 at 22:37, Dave Chinner <[email protected]> wrote:
>
> > ftrace using the function_graph tracer will emit the return values
> > of the functions if you use it with the 'funcgraph-retval' option.
> >
> > Seems like a solved problem?
>
> Except
>
> a) this seems exceedingly difficult to set up for non-developers,
> which is often where this is needed. Even strace is pretty verbose
> and the generated output too big, let alone all function calls across
> the whole system.

trace-cmd is your friend.

# trace-cmd record -p function_graph -l vfs_statx sleep 1
# trace-cmd report
...

There's also 'perf ftrace ...' as another wrapper for ftrace based
profiling, though I have never used that at all.

> b) can only point to the function was generated. But the same error
> is often generated for several different reasons within the same
> function and the return value doesn't help there.

In most cases knowing waht the function call parameters are
and the return value is enough to determine which error check
failed. Seems like that's within the scope of what ftrace could
provide us with...

> I think a) is the critical one, and possibly the ftrace infrastructure
> could be used for something more friendly that just pointed to the
> function where the error was generated without having to go through
> hoops.

*nod*

I don't use ftrace directly - never have. trace-cmd has been around
for a long time and does pretty much everything I've ever needed
over the past 15 years. That said, trace-cmd does not export all of
ftrace's features, but most of them are there and there are a lot of
users and developers already familiar with trace-cmd as a way of
seeing what is going on inside the kernel when things go wrong....

Hence I just don't see the "ftrace is difficult to use" argument as
being relevant to "how do we trace the source of the error"
discussions.

Cheers,

Dave.
--
Dave Chinner
[email protected]