LinuxLists.cc - [PATCH 1/3] SUNRPC: Move call to rpc_count_iostats before rpc_call

2019-05-23 20:14:35

Subject: [PATCH 1/3] SUNRPC: Move call to rpc_count_iostats before rpc_call_done

For diagnostic purposes, it would be useful to have an rpc_iostats
metric of RPCs completing with tk_status < 0. Unfortunately,
tk_status is reset inside the rpc_call_done functions for each
operation, and the call to tally the per-op metrics comes after
rpc_call_done. Refactor the call to rpc_count_iostat earlier in
rpc_exit_task so we can count these RPCs completing in error.

Signed-off-by: Dave Wysochanski <[email protected]>
---
net/sunrpc/sched.c | 5 +++++
net/sunrpc/xprt.c | 4 ----
2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index 28956c70100a..543caef296e4 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -22,6 +22,7 @@
#include <linux/sched/mm.h>

#include <linux/sunrpc/clnt.h>
+#include <linux/sunrpc/metrics.h>

#include "sunrpc.h"

@@ -770,6 +771,10 @@ rpc_reset_task_statistics(struct rpc_task *task)
void rpc_exit_task(struct rpc_task *task)
{
task->tk_action = NULL;
+ if (task->tk_ops->rpc_count_stats)
+ task->tk_ops->rpc_count_stats(task, task->tk_calldata);
+ else if (task->tk_client)
+ rpc_count_iostats(task, task->tk_client->cl_metrics);
if (task->tk_ops->rpc_call_done != NULL) {
task->tk_ops->rpc_call_done(task, task->tk_calldata);
if (task->tk_action != NULL) {
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index d7117d241460..d4477b227a96 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -1716,10 +1716,6 @@ void xprt_release(struct rpc_task *task)
}

xprt = req->rq_xprt;
- if (task->tk_ops->rpc_count_stats != NULL)
- task->tk_ops->rpc_count_stats(task, task->tk_calldata);
- else if (task->tk_client)
- rpc_count_iostats(task, task->tk_client->cl_metrics);
xprt_request_dequeue_all(task, req);
spin_lock_bh(&xprt->transport_lock);
xprt->ops->release_xprt(xprt, task);
--
2.20.1

2019-05-23 20:14:35

by David Wysochanski

[permalink] [raw]

Subject: [PATCH] mountstats: add per-op error counts for mountstats command

Display the count of ops completing with error status (status < 0)
on kernels that support it.

Signed-off-by: Dave Wysochanski <[email protected]>
---
tools/mountstats/mountstats.py | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/mountstats/mountstats.py b/tools/mountstats/mountstats.py
index c7fb8bb1..b8c720d5 100755
--- a/tools/mountstats/mountstats.py
+++ b/tools/mountstats/mountstats.py
@@ -475,7 +475,9 @@ class DeviceData:
retrans = stats[2] - count
if retrans != 0:
print('\t%d retrans (%d%%)' % (retrans, ((retrans * 100) / count)), end=' ')
- print('\t%d major timeouts' % stats[3])
+ print('\t%d major timeouts' % stats[3], end='')
+ if len(stats) >= 10 and stats[9] != 0:
+ print('%d errors (%d%%)' % (stats[9], ((stats[9] * 100) / count)))
else:
print('')
print('\tavg bytes sent per op: %d\tavg bytes received per op: %d' % \
--
2.20.1

2019-05-23 20:15:13

by David Wysochanski

[permalink] [raw]

Subject: [PATCH 3/3] SUNRPC: Count ops completing with tk_status < 0

We often see various error conditions with NFS4.x that show up with
a very high operation count all completing with tk_status < 0 in a
short period of time. Add a count to rpc_iostats to record on a
per-op basis the ops that complete in this manner, which will
enable lower overhead diagnostics.

Signed-off-by: Dave Wysochanski <[email protected]>
---
include/linux/sunrpc/metrics.h | 7 ++++++-
net/sunrpc/stats.c | 8 ++++++--
2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/include/linux/sunrpc/metrics.h b/include/linux/sunrpc/metrics.h
index 1b3751327575..0ee3f7052846 100644
--- a/include/linux/sunrpc/metrics.h
+++ b/include/linux/sunrpc/metrics.h
@@ -30,7 +30,7 @@
#include <linux/ktime.h>
#include <linux/spinlock.h>

-#define RPC_IOSTATS_VERS "1.0"
+#define RPC_IOSTATS_VERS "1.1"

struct rpc_iostats {
spinlock_t om_lock;
@@ -66,6 +66,11 @@ struct rpc_iostats {
ktime_t om_queue, /* queued for xmit */
om_rtt, /* RPC RTT */
om_execute; /* RPC execution */
+ /*
+ * The count of operations that complete with tk_status < 0.
+ * These statuses usually indicate error conditions.
+ */
+ unsigned long om_error_status;
} ____cacheline_aligned;

struct rpc_task;
diff --git a/net/sunrpc/stats.c b/net/sunrpc/stats.c
index 8b2d3c58ffae..737414247ca7 100644
--- a/net/sunrpc/stats.c
+++ b/net/sunrpc/stats.c
@@ -176,6 +176,8 @@ void rpc_count_iostats_metrics(const struct rpc_task *task,

execute = ktime_sub(now, task->tk_start);
op_metrics->om_execute = ktime_add(op_metrics->om_execute, execute);
+ if (task->tk_status < 0)
+ op_metrics->om_error_status++;

spin_unlock(&op_metrics->om_lock);

@@ -218,13 +220,14 @@ static void _add_rpc_iostats(struct rpc_iostats *a, struct rpc_iostats *b)
a->om_queue = ktime_add(a->om_queue, b->om_queue);
a->om_rtt = ktime_add(a->om_rtt, b->om_rtt);
a->om_execute = ktime_add(a->om_execute, b->om_execute);
+ a->om_error_status += b->om_error_status;
}

static void _print_rpc_iostats(struct seq_file *seq, struct rpc_iostats *stats,
int op, const struct rpc_procinfo *procs)
{
_print_name(seq, op, procs);
- seq_printf(seq, "%lu %lu %lu %llu %llu %llu %llu %llu\n",
+ seq_printf(seq, "%lu %lu %lu %llu %llu %llu %llu %llu %lu\n",
stats->om_ops,
stats->om_ntrans,
stats->om_timeouts,
@@ -232,7 +235,8 @@ static void _print_rpc_iostats(struct seq_file *seq, struct rpc_iostats *stats,
stats->om_bytes_recv,
ktime_to_ms(stats->om_queue),
ktime_to_ms(stats->om_rtt),
- ktime_to_ms(stats->om_execute));
+ ktime_to_ms(stats->om_execute),
+ stats->om_error_status);
}

void rpc_clnt_show_stats(struct seq_file *seq, struct rpc_clnt *clnt)
--
2.20.1

2019-05-23 20:34:06

by David Wysochanski

[permalink] [raw]

Subject: [PATCH v2] mountstats: add per-op error counts for mountstats command

Display the count of ops completing with error status (status < 0)
on kernels that support it.

Signed-off-by: Dave Wysochanski <[email protected]>
---
tools/mountstats/mountstats.py | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/mountstats/mountstats.py b/tools/mountstats/mountstats.py
index c7fb8bb1..2f525f4b 100755
--- a/tools/mountstats/mountstats.py
+++ b/tools/mountstats/mountstats.py
@@ -475,7 +475,9 @@ class DeviceData:
retrans = stats[2] - count
if retrans != 0:
print('\t%d retrans (%d%%)' % (retrans, ((retrans * 100) / count)), end=' ')
- print('\t%d major timeouts' % stats[3])
+ print('\t%d major timeouts' % stats[3], end='')
+ if len(stats) >= 10 and stats[9] != 0:
+ print('\t%d errors (%d%%)' % (stats[9], ((stats[9] * 100) / count)))
else:
print('')
print('\tavg bytes sent per op: %d\tavg bytes received per op: %d' % \
--
2.20.1

2019-05-30 21:56:36

by J. Bruce Fields

[permalink] [raw]

Subject: Re: [PATCH 3/3] SUNRPC: Count ops completing with tk_status < 0

On Thu, May 23, 2019 at 04:13:50PM -0400, Dave Wysochanski wrote:
> We often see various error conditions with NFS4.x that show up with
> a very high operation count all completing with tk_status < 0 in a
> short period of time. Add a count to rpc_iostats to record on a
> per-op basis the ops that complete in this manner, which will
> enable lower overhead diagnostics.

Looks like a good idea to me.

It's too bad we can't distinguish the errors. (E.g. ESTALE on a lookup
call is a lot more interesting than ENOENT.) But understood that
maintaining (number of possible errors) * (number of possible ops)
counters is probably overkill, so just counting the number of errors
seems like a good start.

--b.

>
> Signed-off-by: Dave Wysochanski <[email protected]>
> ---
> include/linux/sunrpc/metrics.h | 7 ++++++-
> net/sunrpc/stats.c | 8 ++++++--
> 2 files changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/sunrpc/metrics.h b/include/linux/sunrpc/metrics.h
> index 1b3751327575..0ee3f7052846 100644
> --- a/include/linux/sunrpc/metrics.h
> +++ b/include/linux/sunrpc/metrics.h
> @@ -30,7 +30,7 @@
> #include <linux/ktime.h>
> #include <linux/spinlock.h>
>
> -#define RPC_IOSTATS_VERS "1.0"
> +#define RPC_IOSTATS_VERS "1.1"
>
> struct rpc_iostats {
> spinlock_t om_lock;
> @@ -66,6 +66,11 @@ struct rpc_iostats {
> ktime_t om_queue, /* queued for xmit */
> om_rtt, /* RPC RTT */
> om_execute; /* RPC execution */
> + /*
> + * The count of operations that complete with tk_status < 0.
> + * These statuses usually indicate error conditions.
> + */
> + unsigned long om_error_status;
> } ____cacheline_aligned;
>
> struct rpc_task;
> diff --git a/net/sunrpc/stats.c b/net/sunrpc/stats.c
> index 8b2d3c58ffae..737414247ca7 100644
> --- a/net/sunrpc/stats.c
> +++ b/net/sunrpc/stats.c
> @@ -176,6 +176,8 @@ void rpc_count_iostats_metrics(const struct rpc_task *task,
>
> execute = ktime_sub(now, task->tk_start);
> op_metrics->om_execute = ktime_add(op_metrics->om_execute, execute);
> + if (task->tk_status < 0)
> + op_metrics->om_error_status++;
>
> spin_unlock(&op_metrics->om_lock);
>
> @@ -218,13 +220,14 @@ static void _add_rpc_iostats(struct rpc_iostats *a, struct rpc_iostats *b)
> a->om_queue = ktime_add(a->om_queue, b->om_queue);
> a->om_rtt = ktime_add(a->om_rtt, b->om_rtt);
> a->om_execute = ktime_add(a->om_execute, b->om_execute);
> + a->om_error_status += b->om_error_status;
> }
>
> static void _print_rpc_iostats(struct seq_file *seq, struct rpc_iostats *stats,
> int op, const struct rpc_procinfo *procs)
> {
> _print_name(seq, op, procs);
> - seq_printf(seq, "%lu %lu %lu %llu %llu %llu %llu %llu\n",
> + seq_printf(seq, "%lu %lu %lu %llu %llu %llu %llu %llu %lu\n",
> stats->om_ops,
> stats->om_ntrans,
> stats->om_timeouts,
> @@ -232,7 +235,8 @@ static void _print_rpc_iostats(struct seq_file *seq, struct rpc_iostats *stats,
> stats->om_bytes_recv,
> ktime_to_ms(stats->om_queue),
> ktime_to_ms(stats->om_rtt),
> - ktime_to_ms(stats->om_execute));
> + ktime_to_ms(stats->om_execute),
> + stats->om_error_status);
> }
>
> void rpc_clnt_show_stats(struct seq_file *seq, struct rpc_clnt *clnt)
> --
> 2.20.1

2019-05-30 22:22:39

by Chuck Lever III

[permalink] [raw]

Subject: Re: [PATCH 3/3] SUNRPC: Count ops completing with tk_status < 0

> On May 30, 2019, at 5:38 PM, [email protected] wrote:
>
> On Thu, May 23, 2019 at 04:13:50PM -0400, Dave Wysochanski wrote:
>> We often see various error conditions with NFS4.x that show up with
>> a very high operation count all completing with tk_status < 0 in a
>> short period of time. Add a count to rpc_iostats to record on a
>> per-op basis the ops that complete in this manner, which will
>> enable lower overhead diagnostics.
>
> Looks like a good idea to me.
>
> It's too bad we can't distinguish the errors. (E.g. ESTALE on a lookup
> call is a lot more interesting than ENOENT.) But understood that
> maintaining (number of possible errors) * (number of possible ops)
> counters is probably overkill, so just counting the number of errors
> seems like a good start.

We now have trace points that can do that too.

> --b.
>
>>
>> Signed-off-by: Dave Wysochanski <[email protected]>
>> ---
>> include/linux/sunrpc/metrics.h | 7 ++++++-
>> net/sunrpc/stats.c | 8 ++++++--
>> 2 files changed, 12 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/linux/sunrpc/metrics.h b/include/linux/sunrpc/metrics.h
>> index 1b3751327575..0ee3f7052846 100644
>> --- a/include/linux/sunrpc/metrics.h
>> +++ b/include/linux/sunrpc/metrics.h
>> @@ -30,7 +30,7 @@
>> #include <linux/ktime.h>
>> #include <linux/spinlock.h>
>>
>> -#define RPC_IOSTATS_VERS "1.0"
>> +#define RPC_IOSTATS_VERS "1.1"
>>
>> struct rpc_iostats {
>> spinlock_t om_lock;
>> @@ -66,6 +66,11 @@ struct rpc_iostats {
>> ktime_t om_queue, /* queued for xmit */
>> om_rtt, /* RPC RTT */
>> om_execute; /* RPC execution */
>> + /*
>> + * The count of operations that complete with tk_status < 0.
>> + * These statuses usually indicate error conditions.
>> + */
>> + unsigned long om_error_status;
>> } ____cacheline_aligned;
>>
>> struct rpc_task;
>> diff --git a/net/sunrpc/stats.c b/net/sunrpc/stats.c
>> index 8b2d3c58ffae..737414247ca7 100644
>> --- a/net/sunrpc/stats.c
>> +++ b/net/sunrpc/stats.c
>> @@ -176,6 +176,8 @@ void rpc_count_iostats_metrics(const struct rpc_task *task,
>>
>> execute = ktime_sub(now, task->tk_start);
>> op_metrics->om_execute = ktime_add(op_metrics->om_execute, execute);
>> + if (task->tk_status < 0)
>> + op_metrics->om_error_status++;
>>
>> spin_unlock(&op_metrics->om_lock);
>>
>> @@ -218,13 +220,14 @@ static void _add_rpc_iostats(struct rpc_iostats *a, struct rpc_iostats *b)
>> a->om_queue = ktime_add(a->om_queue, b->om_queue);
>> a->om_rtt = ktime_add(a->om_rtt, b->om_rtt);
>> a->om_execute = ktime_add(a->om_execute, b->om_execute);
>> + a->om_error_status += b->om_error_status;
>> }
>>
>> static void _print_rpc_iostats(struct seq_file *seq, struct rpc_iostats *stats,
>> int op, const struct rpc_procinfo *procs)
>> {
>> _print_name(seq, op, procs);
>> - seq_printf(seq, "%lu %lu %lu %llu %llu %llu %llu %llu\n",
>> + seq_printf(seq, "%lu %lu %lu %llu %llu %llu %llu %llu %lu\n",
>> stats->om_ops,
>> stats->om_ntrans,
>> stats->om_timeouts,
>> @@ -232,7 +235,8 @@ static void _print_rpc_iostats(struct seq_file *seq, struct rpc_iostats *stats,
>> stats->om_bytes_recv,
>> ktime_to_ms(stats->om_queue),
>> ktime_to_ms(stats->om_rtt),
>> - ktime_to_ms(stats->om_execute));
>> + ktime_to_ms(stats->om_execute),
>> + stats->om_error_status);
>> }
>>
>> void rpc_clnt_show_stats(struct seq_file *seq, struct rpc_clnt *clnt)
>> --
>> 2.20.1

--
Chuck Lever

2019-05-30 22:35:27

by J. Bruce Fields

[permalink] [raw]

Subject: Re: [PATCH 3/3] SUNRPC: Count ops completing with tk_status < 0

On Thu, May 30, 2019 at 06:19:54PM -0400, Chuck Lever wrote:
>
>
> > On May 30, 2019, at 5:38 PM, [email protected] wrote:
> >
> > On Thu, May 23, 2019 at 04:13:50PM -0400, Dave Wysochanski wrote:
> >> We often see various error conditions with NFS4.x that show up with
> >> a very high operation count all completing with tk_status < 0 in a
> >> short period of time. Add a count to rpc_iostats to record on a
> >> per-op basis the ops that complete in this manner, which will
> >> enable lower overhead diagnostics.
> >
> > Looks like a good idea to me.
> >
> > It's too bad we can't distinguish the errors. (E.g. ESTALE on a lookup
> > call is a lot more interesting than ENOENT.) But understood that
> > maintaining (number of possible errors) * (number of possible ops)
> > counters is probably overkill, so just counting the number of errors
> > seems like a good start.
>
> We now have trace points that can do that too.

You mean, that can report every error (and its value)?

I assume having these statistics in mountstats is still useful, though.

--b.

>
>
> > --b.
> >
> >>
> >> Signed-off-by: Dave Wysochanski <[email protected]>
> >> ---
> >> include/linux/sunrpc/metrics.h | 7 ++++++-
> >> net/sunrpc/stats.c | 8 ++++++--
> >> 2 files changed, 12 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/include/linux/sunrpc/metrics.h b/include/linux/sunrpc/metrics.h
> >> index 1b3751327575..0ee3f7052846 100644
> >> --- a/include/linux/sunrpc/metrics.h
> >> +++ b/include/linux/sunrpc/metrics.h
> >> @@ -30,7 +30,7 @@
> >> #include <linux/ktime.h>
> >> #include <linux/spinlock.h>
> >>
> >> -#define RPC_IOSTATS_VERS "1.0"
> >> +#define RPC_IOSTATS_VERS "1.1"
> >>
> >> struct rpc_iostats {
> >> spinlock_t om_lock;
> >> @@ -66,6 +66,11 @@ struct rpc_iostats {
> >> ktime_t om_queue, /* queued for xmit */
> >> om_rtt, /* RPC RTT */
> >> om_execute; /* RPC execution */
> >> + /*
> >> + * The count of operations that complete with tk_status < 0.
> >> + * These statuses usually indicate error conditions.
> >> + */
> >> + unsigned long om_error_status;
> >> } ____cacheline_aligned;
> >>
> >> struct rpc_task;
> >> diff --git a/net/sunrpc/stats.c b/net/sunrpc/stats.c
> >> index 8b2d3c58ffae..737414247ca7 100644
> >> --- a/net/sunrpc/stats.c
> >> +++ b/net/sunrpc/stats.c
> >> @@ -176,6 +176,8 @@ void rpc_count_iostats_metrics(const struct rpc_task *task,
> >>
> >> execute = ktime_sub(now, task->tk_start);
> >> op_metrics->om_execute = ktime_add(op_metrics->om_execute, execute);
> >> + if (task->tk_status < 0)
> >> + op_metrics->om_error_status++;
> >>
> >> spin_unlock(&op_metrics->om_lock);
> >>
> >> @@ -218,13 +220,14 @@ static void _add_rpc_iostats(struct rpc_iostats *a, struct rpc_iostats *b)
> >> a->om_queue = ktime_add(a->om_queue, b->om_queue);
> >> a->om_rtt = ktime_add(a->om_rtt, b->om_rtt);
> >> a->om_execute = ktime_add(a->om_execute, b->om_execute);
> >> + a->om_error_status += b->om_error_status;
> >> }
> >>
> >> static void _print_rpc_iostats(struct seq_file *seq, struct rpc_iostats *stats,
> >> int op, const struct rpc_procinfo *procs)
> >> {
> >> _print_name(seq, op, procs);
> >> - seq_printf(seq, "%lu %lu %lu %llu %llu %llu %llu %llu\n",
> >> + seq_printf(seq, "%lu %lu %lu %llu %llu %llu %llu %llu %lu\n",
> >> stats->om_ops,
> >> stats->om_ntrans,
> >> stats->om_timeouts,
> >> @@ -232,7 +235,8 @@ static void _print_rpc_iostats(struct seq_file *seq, struct rpc_iostats *stats,
> >> stats->om_bytes_recv,
> >> ktime_to_ms(stats->om_queue),
> >> ktime_to_ms(stats->om_rtt),
> >> - ktime_to_ms(stats->om_execute));
> >> + ktime_to_ms(stats->om_execute),
> >> + stats->om_error_status);
> >> }
> >>
> >> void rpc_clnt_show_stats(struct seq_file *seq, struct rpc_clnt *clnt)
> >> --
> >> 2.20.1
>
> --
> Chuck Lever
>
>

2019-05-31 00:18:55

by David Wysochanski

[permalink] [raw]

Subject: Re: [PATCH 3/3] SUNRPC: Count ops completing with tk_status < 0

On Thu, May 30, 2019 at 5:39 PM J. Bruce Fields <[email protected]> wrote:
>
> On Thu, May 23, 2019 at 04:13:50PM -0400, Dave Wysochanski wrote:
> > We often see various error conditions with NFS4.x that show up with
> > a very high operation count all completing with tk_status < 0 in a
> > short period of time. Add a count to rpc_iostats to record on a
> > per-op basis the ops that complete in this manner, which will
> > enable lower overhead diagnostics.
>
> Looks like a good idea to me.
>
> It's too bad we can't distinguish the errors. (E.g. ESTALE on a lookup
> call is a lot more interesting than ENOENT.) But understood that
> maintaining (number of possible errors) * (number of possible ops)
> counters is probably overkill, so just counting the number of errors
> seems like a good start.
>
> --b.
>

I did consider a more elaborate approach, where each code would be
counted. Most likely that would need to go into debugfs. I didn't
pursue that but may look into it in the future along with other work.

I do think the < 0 status counts have value even if we do not know the
specific code, they are very low overhead, and always there if added
to mountstats. I was envisioning using this along with something like
a periodic capture of mountstats possibly using PCP and then using
some statistics as a "NFS health" measure for a specific mount point.
There are already some "NFS health" metrics that may be calculated
with existing mountstats (for example if you're clever you can spot
certain NFS4 protocol loops), but the error counts would make it much
easier and reliable.

> >
> > Signed-off-by: Dave Wysochanski <[email protected]>
> > ---
> > include/linux/sunrpc/metrics.h | 7 ++++++-
> > net/sunrpc/stats.c | 8 ++++++--
> > 2 files changed, 12 insertions(+), 3 deletions(-)
> >
> > diff --git a/include/linux/sunrpc/metrics.h b/include/linux/sunrpc/metrics.h
> > index 1b3751327575..0ee3f7052846 100644
> > --- a/include/linux/sunrpc/metrics.h
> > +++ b/include/linux/sunrpc/metrics.h
> > @@ -30,7 +30,7 @@
> > #include <linux/ktime.h>
> > #include <linux/spinlock.h>
> >
> > -#define RPC_IOSTATS_VERS "1.0"
> > +#define RPC_IOSTATS_VERS "1.1"
> >
> > struct rpc_iostats {
> > spinlock_t om_lock;
> > @@ -66,6 +66,11 @@ struct rpc_iostats {
> > ktime_t om_queue, /* queued for xmit */
> > om_rtt, /* RPC RTT */
> > om_execute; /* RPC execution */
> > + /*
> > + * The count of operations that complete with tk_status < 0.
> > + * These statuses usually indicate error conditions.
> > + */
> > + unsigned long om_error_status;
> > } ____cacheline_aligned;
> >
> > struct rpc_task;
> > diff --git a/net/sunrpc/stats.c b/net/sunrpc/stats.c
> > index 8b2d3c58ffae..737414247ca7 100644
> > --- a/net/sunrpc/stats.c
> > +++ b/net/sunrpc/stats.c
> > @@ -176,6 +176,8 @@ void rpc_count_iostats_metrics(const struct rpc_task *task,
> >
> > execute = ktime_sub(now, task->tk_start);
> > op_metrics->om_execute = ktime_add(op_metrics->om_execute, execute);
> > + if (task->tk_status < 0)
> > + op_metrics->om_error_status++;
> >
> > spin_unlock(&op_metrics->om_lock);
> >
> > @@ -218,13 +220,14 @@ static void _add_rpc_iostats(struct rpc_iostats *a, struct rpc_iostats *b)
> > a->om_queue = ktime_add(a->om_queue, b->om_queue);
> > a->om_rtt = ktime_add(a->om_rtt, b->om_rtt);
> > a->om_execute = ktime_add(a->om_execute, b->om_execute);
> > + a->om_error_status += b->om_error_status;
> > }
> >
> > static void _print_rpc_iostats(struct seq_file *seq, struct rpc_iostats *stats,
> > int op, const struct rpc_procinfo *procs)
> > {
> > _print_name(seq, op, procs);
> > - seq_printf(seq, "%lu %lu %lu %llu %llu %llu %llu %llu\n",
> > + seq_printf(seq, "%lu %lu %lu %llu %llu %llu %llu %llu %lu\n",
> > stats->om_ops,
> > stats->om_ntrans,
> > stats->om_timeouts,
> > @@ -232,7 +235,8 @@ static void _print_rpc_iostats(struct seq_file *seq, struct rpc_iostats *stats,
> > stats->om_bytes_recv,
> > ktime_to_ms(stats->om_queue),
> > ktime_to_ms(stats->om_rtt),
> > - ktime_to_ms(stats->om_execute));
> > + ktime_to_ms(stats->om_execute),
> > + stats->om_error_status);
> > }
> >
> > void rpc_clnt_show_stats(struct seq_file *seq, struct rpc_clnt *clnt)
> > --
> > 2.20.1

--
Dave Wysochanski
Principal Software Maintenance Engineer
T: 919-754-4024

2019-05-31 13:26:26

by Chuck Lever III

[permalink] [raw]

Subject: Re: [PATCH 3/3] SUNRPC: Count ops completing with tk_status < 0

> On May 30, 2019, at 6:33 PM, Bruce Fields <[email protected]> wrote:
>
> On Thu, May 30, 2019 at 06:19:54PM -0400, Chuck Lever wrote:
>>
>>
>>> On May 30, 2019, at 5:38 PM, [email protected] wrote:
>>>
>>> On Thu, May 23, 2019 at 04:13:50PM -0400, Dave Wysochanski wrote:
>>>> We often see various error conditions with NFS4.x that show up with
>>>> a very high operation count all completing with tk_status < 0 in a
>>>> short period of time. Add a count to rpc_iostats to record on a
>>>> per-op basis the ops that complete in this manner, which will
>>>> enable lower overhead diagnostics.
>>>
>>> Looks like a good idea to me.
>>>
>>> It's too bad we can't distinguish the errors. (E.g. ESTALE on a lookup
>>> call is a lot more interesting than ENOENT.) But understood that
>>> maintaining (number of possible errors) * (number of possible ops)
>>> counters is probably overkill, so just counting the number of errors
>>> seems like a good start.
>>
>> We now have trace points that can do that too.
>
> You mean, that can report every error (and its value)?

Yes, the nfs_xdr_status trace point reports the error by value and symbolic name.

> I assume having these statistics in mountstats is still useful, though.
>
> --b.
>
>>
>>
>>> --b.
>>>
>>>>
>>>> Signed-off-by: Dave Wysochanski <[email protected]>
>>>> ---
>>>> include/linux/sunrpc/metrics.h | 7 ++++++-
>>>> net/sunrpc/stats.c | 8 ++++++--
>>>> 2 files changed, 12 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/include/linux/sunrpc/metrics.h b/include/linux/sunrpc/metrics.h
>>>> index 1b3751327575..0ee3f7052846 100644
>>>> --- a/include/linux/sunrpc/metrics.h
>>>> +++ b/include/linux/sunrpc/metrics.h
>>>> @@ -30,7 +30,7 @@
>>>> #include <linux/ktime.h>
>>>> #include <linux/spinlock.h>
>>>>
>>>> -#define RPC_IOSTATS_VERS "1.0"
>>>> +#define RPC_IOSTATS_VERS "1.1"
>>>>
>>>> struct rpc_iostats {
>>>> spinlock_t om_lock;
>>>> @@ -66,6 +66,11 @@ struct rpc_iostats {
>>>> ktime_t om_queue, /* queued for xmit */
>>>> om_rtt, /* RPC RTT */
>>>> om_execute; /* RPC execution */
>>>> + /*
>>>> + * The count of operations that complete with tk_status < 0.
>>>> + * These statuses usually indicate error conditions.
>>>> + */
>>>> + unsigned long om_error_status;
>>>> } ____cacheline_aligned;
>>>>
>>>> struct rpc_task;
>>>> diff --git a/net/sunrpc/stats.c b/net/sunrpc/stats.c
>>>> index 8b2d3c58ffae..737414247ca7 100644
>>>> --- a/net/sunrpc/stats.c
>>>> +++ b/net/sunrpc/stats.c
>>>> @@ -176,6 +176,8 @@ void rpc_count_iostats_metrics(const struct rpc_task *task,
>>>>
>>>> execute = ktime_sub(now, task->tk_start);
>>>> op_metrics->om_execute = ktime_add(op_metrics->om_execute, execute);
>>>> + if (task->tk_status < 0)
>>>> + op_metrics->om_error_status++;
>>>>
>>>> spin_unlock(&op_metrics->om_lock);
>>>>
>>>> @@ -218,13 +220,14 @@ static void _add_rpc_iostats(struct rpc_iostats *a, struct rpc_iostats *b)
>>>> a->om_queue = ktime_add(a->om_queue, b->om_queue);
>>>> a->om_rtt = ktime_add(a->om_rtt, b->om_rtt);
>>>> a->om_execute = ktime_add(a->om_execute, b->om_execute);
>>>> + a->om_error_status += b->om_error_status;
>>>> }
>>>>
>>>> static void _print_rpc_iostats(struct seq_file *seq, struct rpc_iostats *stats,
>>>> int op, const struct rpc_procinfo *procs)
>>>> {
>>>> _print_name(seq, op, procs);
>>>> - seq_printf(seq, "%lu %lu %lu %llu %llu %llu %llu %llu\n",
>>>> + seq_printf(seq, "%lu %lu %lu %llu %llu %llu %llu %llu %lu\n",
>>>> stats->om_ops,
>>>> stats->om_ntrans,
>>>> stats->om_timeouts,
>>>> @@ -232,7 +235,8 @@ static void _print_rpc_iostats(struct seq_file *seq, struct rpc_iostats *stats,
>>>> stats->om_bytes_recv,
>>>> ktime_to_ms(stats->om_queue),
>>>> ktime_to_ms(stats->om_rtt),
>>>> - ktime_to_ms(stats->om_execute));
>>>> + ktime_to_ms(stats->om_execute),
>>>> + stats->om_error_status);
>>>> }
>>>>
>>>> void rpc_clnt_show_stats(struct seq_file *seq, struct rpc_clnt *clnt)
>>>> --
>>>> 2.20.1
>>
>> --
>> Chuck Lever

--
Chuck Lever

2019-06-03 14:32:29

by Steve Dickson

[permalink] [raw]

Subject: Re: [PATCH v2] mountstats: add per-op error counts for mountstats command

On 5/23/19 4:33 PM, Dave Wysochanski wrote:
> Display the count of ops completing with error status (status < 0)
> on kernels that support it.
>
> Signed-off-by: Dave Wysochanski <[email protected]>
> ---
> tools/mountstats/mountstats.py | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
Committed...

steved.
>
> diff --git a/tools/mountstats/mountstats.py b/tools/mountstats/mountstats.py
> index c7fb8bb1..2f525f4b 100755
> --- a/tools/mountstats/mountstats.py
> +++ b/tools/mountstats/mountstats.py
> @@ -475,7 +475,9 @@ class DeviceData:
> retrans = stats[2] - count
> if retrans != 0:
> print('\t%d retrans (%d%%)' % (retrans, ((retrans * 100) / count)), end=' ')
> - print('\t%d major timeouts' % stats[3])
> + print('\t%d major timeouts' % stats[3], end='')
> + if len(stats) >= 10 and stats[9] != 0:
> + print('\t%d errors (%d%%)' % (stats[9], ((stats[9] * 100) / count)))
> else:
> print('')
> print('\tavg bytes sent per op: %d\tavg bytes received per op: %d' % \
>

2019-06-03 18:54:51

by David Wysochanski

[permalink] [raw]

Subject: Re: [PATCH 3/3] SUNRPC: Count ops completing with tk_status < 0

On Fri, 2019-05-31 at 09:25 -0400, Chuck Lever wrote:
> > On May 30, 2019, at 6:33 PM, Bruce Fields <[email protected]>
> > wrote:
> >
> > On Thu, May 30, 2019 at 06:19:54PM -0400, Chuck Lever wrote:
> > >
> > >
> > > > On May 30, 2019, at 5:38 PM, [email protected] wrote:
> > > >
> > > > On Thu, May 23, 2019 at 04:13:50PM -0400, Dave Wysochanski
> > > > wrote:
> > > > > We often see various error conditions with NFS4.x that show
> > > > > up with
> > > > > a very high operation count all completing with tk_status < 0
> > > > > in a
> > > > > short period of time. Add a count to rpc_iostats to record
> > > > > on a
> > > > > per-op basis the ops that complete in this manner, which will
> > > > > enable lower overhead diagnostics.
> > > >
> > > > Looks like a good idea to me.
> > > >
> > > > It's too bad we can't distinguish the errors. (E.g. ESTALE on
> > > > a lookup
> > > > call is a lot more interesting than ENOENT.) But understood
> > > > that
> > > > maintaining (number of possible errors) * (number of possible
> > > > ops)
> > > > counters is probably overkill, so just counting the number of
> > > > errors
> > > > seems like a good start.
> > >
> > > We now have trace points that can do that too.
> >
> > You mean, that can report every error (and its value)?
>
> Yes, the nfs_xdr_status trace point reports the error by value and
> symbolic name.
>

The tracepoint is very useful I agree. I don't think it will show:
a) the mount
b) the opcode

Or am I mistaken and there's a way to get those with a filter or
another tracepoint?

>
> > I assume having these statistics in mountstats is still useful,
> > though.
> >
> > --b.
> >
> > >
> > >
> > > > --b.
> > > >
> > > > >
> > > > > Signed-off-by: Dave Wysochanski <[email protected]>
> > > > > ---
> > > > > include/linux/sunrpc/metrics.h | 7 ++++++-
> > > > > net/sunrpc/stats.c | 8 ++++++--
> > > > > 2 files changed, 12 insertions(+), 3 deletions(-)
> > > > >
> > > > > diff --git a/include/linux/sunrpc/metrics.h
> > > > > b/include/linux/sunrpc/metrics.h
> > > > > index 1b3751327575..0ee3f7052846 100644
> > > > > --- a/include/linux/sunrpc/metrics.h
> > > > > +++ b/include/linux/sunrpc/metrics.h
> > > > > @@ -30,7 +30,7 @@
> > > > > #include <linux/ktime.h>
> > > > > #include <linux/spinlock.h>
> > > > >
> > > > > -#define RPC_IOSTATS_VERS "1.0"
> > > > > +#define RPC_IOSTATS_VERS "1.1"
> > > > >
> > > > > struct rpc_iostats {
> > > > > spinlock_t om_lock;
> > > > > @@ -66,6 +66,11 @@ struct rpc_iostats {
> > > > > ktime_t om_queue, /* queued for
> > > > > xmit */
> > > > > om_rtt, /* RPC RTT */
> > > > > om_execute; /* RPC
> > > > > execution */
> > > > > + /*
> > > > > + * The count of operations that complete with tk_status
> > > > > < 0.
> > > > > + * These statuses usually indicate error conditions.
> > > > > + */
> > > > > + unsigned long om_error_status;
> > > > > } ____cacheline_aligned;
> > > > >
> > > > > struct rpc_task;
> > > > > diff --git a/net/sunrpc/stats.c b/net/sunrpc/stats.c
> > > > > index 8b2d3c58ffae..737414247ca7 100644
> > > > > --- a/net/sunrpc/stats.c
> > > > > +++ b/net/sunrpc/stats.c
> > > > > @@ -176,6 +176,8 @@ void rpc_count_iostats_metrics(const
> > > > > struct rpc_task *task,
> > > > >
> > > > > execute = ktime_sub(now, task->tk_start);
> > > > > op_metrics->om_execute = ktime_add(op_metrics-
> > > > > >om_execute, execute);
> > > > > + if (task->tk_status < 0)
> > > > > + op_metrics->om_error_status++;
> > > > >
> > > > > spin_unlock(&op_metrics->om_lock);
> > > > >
> > > > > @@ -218,13 +220,14 @@ static void _add_rpc_iostats(struct
> > > > > rpc_iostats *a, struct rpc_iostats *b)
> > > > > a->om_queue = ktime_add(a->om_queue, b->om_queue);
> > > > > a->om_rtt = ktime_add(a->om_rtt, b->om_rtt);
> > > > > a->om_execute = ktime_add(a->om_execute, b-
> > > > > >om_execute);
> > > > > + a->om_error_status += b->om_error_status;
> > > > > }
> > > > >
> > > > > static void _print_rpc_iostats(struct seq_file *seq, struct
> > > > > rpc_iostats *stats,
> > > > > int op, const struct
> > > > > rpc_procinfo *procs)
> > > > > {
> > > > > _print_name(seq, op, procs);
> > > > > - seq_printf(seq, "%lu %lu %lu %llu %llu %llu %llu
> > > > > %llu\n",
> > > > > + seq_printf(seq, "%lu %lu %lu %llu %llu %llu %llu %llu
> > > > > %lu\n",
> > > > > stats->om_ops,
> > > > > stats->om_ntrans,
> > > > > stats->om_timeouts,
> > > > > @@ -232,7 +235,8 @@ static void _print_rpc_iostats(struct
> > > > > seq_file *seq, struct rpc_iostats *stats,
> > > > > stats->om_bytes_recv,
> > > > > ktime_to_ms(stats->om_queue),
> > > > > ktime_to_ms(stats->om_rtt),
> > > > > - ktime_to_ms(stats->om_execute));
> > > > > + ktime_to_ms(stats->om_execute),
> > > > > + stats->om_error_status);
> > > > > }
> > > > >
> > > > > void rpc_clnt_show_stats(struct seq_file *seq, struct
> > > > > rpc_clnt *clnt)
> > > > > --
> > > > > 2.20.1
> > >
> > > --
> > > Chuck Lever
>
> --
> Chuck Lever
>
>
>

2019-06-03 18:57:25

by Chuck Lever III

[permalink] [raw]

Subject: Re: [PATCH 3/3] SUNRPC: Count ops completing with tk_status < 0

> On Jun 3, 2019, at 2:53 PM, Dave Wysochanski <[email protected]> wrote:
>
> On Fri, 2019-05-31 at 09:25 -0400, Chuck Lever wrote:
>>> On May 30, 2019, at 6:33 PM, Bruce Fields <[email protected]>
>>> wrote:
>>>
>>> On Thu, May 30, 2019 at 06:19:54PM -0400, Chuck Lever wrote:
>>>>
>>>>
>>>>> On May 30, 2019, at 5:38 PM, [email protected] wrote:
>>>>>
>>>>> On Thu, May 23, 2019 at 04:13:50PM -0400, Dave Wysochanski
>>>>> wrote:
>>>>>> We often see various error conditions with NFS4.x that show
>>>>>> up with
>>>>>> a very high operation count all completing with tk_status < 0
>>>>>> in a
>>>>>> short period of time. Add a count to rpc_iostats to record
>>>>>> on a
>>>>>> per-op basis the ops that complete in this manner, which will
>>>>>> enable lower overhead diagnostics.
>>>>>
>>>>> Looks like a good idea to me.
>>>>>
>>>>> It's too bad we can't distinguish the errors. (E.g. ESTALE on
>>>>> a lookup
>>>>> call is a lot more interesting than ENOENT.) But understood
>>>>> that
>>>>> maintaining (number of possible errors) * (number of possible
>>>>> ops)
>>>>> counters is probably overkill, so just counting the number of
>>>>> errors
>>>>> seems like a good start.
>>>>
>>>> We now have trace points that can do that too.
>>>
>>> You mean, that can report every error (and its value)?
>>
>> Yes, the nfs_xdr_status trace point reports the error by value and
>> symbolic name.
>>
>
> The tracepoint is very useful I agree. I don't think it will show:
> a) the mount
> b) the opcode
>
> Or am I mistaken and there's a way to get those with a filter or
> another tracepoint?

The opcode can be exposed by another trace point, but the link between
the two trace points is tenuous and could be improved.

I don't believe any of the NFS trace points expose the mount. My testing
is largely on a single mount so my imagination stopped there.

>>> I assume having these statistics in mountstats is still useful,
>>> though.
>>>
>>> --b.
>>>
>>>>
>>>>
>>>>> --b.
>>>>>
>>>>>>
>>>>>> Signed-off-by: Dave Wysochanski <[email protected]>
>>>>>> ---
>>>>>> include/linux/sunrpc/metrics.h | 7 ++++++-
>>>>>> net/sunrpc/stats.c | 8 ++++++--
>>>>>> 2 files changed, 12 insertions(+), 3 deletions(-)
>>>>>>
>>>>>> diff --git a/include/linux/sunrpc/metrics.h
>>>>>> b/include/linux/sunrpc/metrics.h
>>>>>> index 1b3751327575..0ee3f7052846 100644
>>>>>> --- a/include/linux/sunrpc/metrics.h
>>>>>> +++ b/include/linux/sunrpc/metrics.h
>>>>>> @@ -30,7 +30,7 @@
>>>>>> #include <linux/ktime.h>
>>>>>> #include <linux/spinlock.h>
>>>>>>
>>>>>> -#define RPC_IOSTATS_VERS "1.0"
>>>>>> +#define RPC_IOSTATS_VERS "1.1"
>>>>>>
>>>>>> struct rpc_iostats {
>>>>>> spinlock_t om_lock;
>>>>>> @@ -66,6 +66,11 @@ struct rpc_iostats {
>>>>>> ktime_t om_queue, /* queued for
>>>>>> xmit */
>>>>>> om_rtt, /* RPC RTT */
>>>>>> om_execute; /* RPC
>>>>>> execution */
>>>>>> + /*
>>>>>> + * The count of operations that complete with tk_status
>>>>>> < 0.
>>>>>> + * These statuses usually indicate error conditions.
>>>>>> + */
>>>>>> + unsigned long om_error_status;
>>>>>> } ____cacheline_aligned;
>>>>>>
>>>>>> struct rpc_task;
>>>>>> diff --git a/net/sunrpc/stats.c b/net/sunrpc/stats.c
>>>>>> index 8b2d3c58ffae..737414247ca7 100644
>>>>>> --- a/net/sunrpc/stats.c
>>>>>> +++ b/net/sunrpc/stats.c
>>>>>> @@ -176,6 +176,8 @@ void rpc_count_iostats_metrics(const
>>>>>> struct rpc_task *task,
>>>>>>
>>>>>> execute = ktime_sub(now, task->tk_start);
>>>>>> op_metrics->om_execute = ktime_add(op_metrics-
>>>>>>> om_execute, execute);
>>>>>> + if (task->tk_status < 0)
>>>>>> + op_metrics->om_error_status++;
>>>>>>
>>>>>> spin_unlock(&op_metrics->om_lock);
>>>>>>
>>>>>> @@ -218,13 +220,14 @@ static void _add_rpc_iostats(struct
>>>>>> rpc_iostats *a, struct rpc_iostats *b)
>>>>>> a->om_queue = ktime_add(a->om_queue, b->om_queue);
>>>>>> a->om_rtt = ktime_add(a->om_rtt, b->om_rtt);
>>>>>> a->om_execute = ktime_add(a->om_execute, b-
>>>>>>> om_execute);
>>>>>> + a->om_error_status += b->om_error_status;
>>>>>> }
>>>>>>
>>>>>> static void _print_rpc_iostats(struct seq_file *seq, struct
>>>>>> rpc_iostats *stats,
>>>>>> int op, const struct
>>>>>> rpc_procinfo *procs)
>>>>>> {
>>>>>> _print_name(seq, op, procs);
>>>>>> - seq_printf(seq, "%lu %lu %lu %llu %llu %llu %llu
>>>>>> %llu\n",
>>>>>> + seq_printf(seq, "%lu %lu %lu %llu %llu %llu %llu %llu
>>>>>> %lu\n",
>>>>>> stats->om_ops,
>>>>>> stats->om_ntrans,
>>>>>> stats->om_timeouts,
>>>>>> @@ -232,7 +235,8 @@ static void _print_rpc_iostats(struct
>>>>>> seq_file *seq, struct rpc_iostats *stats,
>>>>>> stats->om_bytes_recv,
>>>>>> ktime_to_ms(stats->om_queue),
>>>>>> ktime_to_ms(stats->om_rtt),
>>>>>> - ktime_to_ms(stats->om_execute));
>>>>>> + ktime_to_ms(stats->om_execute),
>>>>>> + stats->om_error_status);
>>>>>> }
>>>>>>
>>>>>> void rpc_clnt_show_stats(struct seq_file *seq, struct
>>>>>> rpc_clnt *clnt)
>>>>>> --
>>>>>> 2.20.1
>>>>
>>>> --
>>>> Chuck Lever
>>
>> --
>> Chuck Lever

--
Chuck Lever

2019-06-03 19:06:18

by David Wysochanski

[permalink] [raw]

Subject: Re: [PATCH 3/3] SUNRPC: Count ops completing with tk_status < 0

On Mon, 2019-06-03 at 14:56 -0400, Chuck Lever wrote:
> > On Jun 3, 2019, at 2:53 PM, Dave Wysochanski <[email protected]>
> > wrote:
> >
> > On Fri, 2019-05-31 at 09:25 -0400, Chuck Lever wrote:
> > > > On May 30, 2019, at 6:33 PM, Bruce Fields <[email protected]
> > > > >
> > > > wrote:
> > > >
> > > > On Thu, May 30, 2019 at 06:19:54PM -0400, Chuck Lever wrote:
> > > > >
> > > > >
> > > > > > On May 30, 2019, at 5:38 PM, [email protected] wrote:
> > > > > >
> > > > > > On Thu, May 23, 2019 at 04:13:50PM -0400, Dave Wysochanski
> > > > > > wrote:
> > > > > > > We often see various error conditions with NFS4.x that
> > > > > > > show
> > > > > > > up with
> > > > > > > a very high operation count all completing with tk_status
> > > > > > > < 0
> > > > > > > in a
> > > > > > > short period of time. Add a count to rpc_iostats to
> > > > > > > record
> > > > > > > on a
> > > > > > > per-op basis the ops that complete in this manner, which
> > > > > > > will
> > > > > > > enable lower overhead diagnostics.
> > > > > >
> > > > > > Looks like a good idea to me.
> > > > > >
> > > > > > It's too bad we can't distinguish the errors. (E.g. ESTALE
> > > > > > on
> > > > > > a lookup
> > > > > > call is a lot more interesting than ENOENT.) But
> > > > > > understood
> > > > > > that
> > > > > > maintaining (number of possible errors) * (number of
> > > > > > possible
> > > > > > ops)
> > > > > > counters is probably overkill, so just counting the number
> > > > > > of
> > > > > > errors
> > > > > > seems like a good start.
> > > > >
> > > > > We now have trace points that can do that too.
> > > >
> > > > You mean, that can report every error (and its value)?
> > >
> > > Yes, the nfs_xdr_status trace point reports the error by value
> > > and
> > > symbolic name.
> > >
> >
> > The tracepoint is very useful I agree. I don't think it will show:
> > a) the mount
> > b) the opcode
> >
> > Or am I mistaken and there's a way to get those with a filter or
> > another tracepoint?
>
> The opcode can be exposed by another trace point, but the link
> between
> the two trace points is tenuous and could be improved.
>

I think the number is there but it's not decoded! This was for a WRITE
completing with BAD_STATEID - very nice:

kworker/u16:0-31130 [006] .... 949028.602298: nfs4_xdr_status: operation 38: nfs status -10025 (BAD_STATEID)

> I don't believe any of the NFS trace points expose the mount. My
> testing
> is largely on a single mount so my imagination stopped there.
>

Ok thanks for the confirmation.

>
> > > > I assume having these statistics in mountstats is still useful,
> > > > though.
> > > >
> > > > --b.
> > > >
> > > > >
> > > > >
> > > > > > --b.
> > > > > >
> > > > > > >
> > > > > > > Signed-off-by: Dave Wysochanski <[email protected]>
> > > > > > > ---
> > > > > > > include/linux/sunrpc/metrics.h | 7 ++++++-
> > > > > > > net/sunrpc/stats.c | 8 ++++++--
> > > > > > > 2 files changed, 12 insertions(+), 3 deletions(-)
> > > > > > >
> > > > > > > diff --git a/include/linux/sunrpc/metrics.h
> > > > > > > b/include/linux/sunrpc/metrics.h
> > > > > > > index 1b3751327575..0ee3f7052846 100644
> > > > > > > --- a/include/linux/sunrpc/metrics.h
> > > > > > > +++ b/include/linux/sunrpc/metrics.h
> > > > > > > @@ -30,7 +30,7 @@
> > > > > > > #include <linux/ktime.h>
> > > > > > > #include <linux/spinlock.h>
> > > > > > >
> > > > > > > -#define RPC_IOSTATS_VERS "1.0"
> > > > > > > +#define RPC_IOSTATS_VERS "1.1"
> > > > > > >
> > > > > > > struct rpc_iostats {
> > > > > > > spinlock_t om_lock;
> > > > > > > @@ -66,6 +66,11 @@ struct rpc_iostats {
> > > > > > > ktime_t om_queue, /* queued
> > > > > > > for
> > > > > > > xmit */
> > > > > > > om_rtt, /* RPC
> > > > > > > RTT */
> > > > > > > om_execute; /* RPC
> > > > > > > execution */
> > > > > > > + /*
> > > > > > > + * The count of operations that complete with tk_status
> > > > > > > < 0.
> > > > > > > + * These statuses usually indicate error conditions.
> > > > > > > + */
> > > > > > > + unsigned long om_error_status;
> > > > > > > } ____cacheline_aligned;
> > > > > > >
> > > > > > > struct rpc_task;
> > > > > > > diff --git a/net/sunrpc/stats.c b/net/sunrpc/stats.c
> > > > > > > index 8b2d3c58ffae..737414247ca7 100644
> > > > > > > --- a/net/sunrpc/stats.c
> > > > > > > +++ b/net/sunrpc/stats.c
> > > > > > > @@ -176,6 +176,8 @@ void rpc_count_iostats_metrics(const
> > > > > > > struct rpc_task *task,
> > > > > > >
> > > > > > > execute = ktime_sub(now, task->tk_start);
> > > > > > > op_metrics->om_execute = ktime_add(op_metrics-
> > > > > > > > om_execute, execute);
> > > > > > >
> > > > > > > + if (task->tk_status < 0)
> > > > > > > + op_metrics->om_error_status++;
> > > > > > >
> > > > > > > spin_unlock(&op_metrics->om_lock);
> > > > > > >
> > > > > > > @@ -218,13 +220,14 @@ static void _add_rpc_iostats(struct
> > > > > > > rpc_iostats *a, struct rpc_iostats *b)
> > > > > > > a->om_queue = ktime_add(a->om_queue, b->om_queue);
> > > > > > > a->om_rtt = ktime_add(a->om_rtt, b->om_rtt);
> > > > > > > a->om_execute = ktime_add(a->om_execute, b-
> > > > > > > > om_execute);
> > > > > > >
> > > > > > > + a->om_error_status += b->om_error_status;
> > > > > > > }
> > > > > > >
> > > > > > > static void _print_rpc_iostats(struct seq_file *seq,
> > > > > > > struct
> > > > > > > rpc_iostats *stats,
> > > > > > > int op, const struct
> > > > > > > rpc_procinfo *procs)
> > > > > > > {
> > > > > > > _print_name(seq, op, procs);
> > > > > > > - seq_printf(seq, "%lu %lu %lu %llu %llu %llu %llu
> > > > > > > %llu\n",
> > > > > > > + seq_printf(seq, "%lu %lu %lu %llu %llu %llu %llu %llu
> > > > > > > %lu\n",
> > > > > > > stats->om_ops,
> > > > > > > stats->om_ntrans,
> > > > > > > stats->om_timeouts,
> > > > > > > @@ -232,7 +235,8 @@ static void _print_rpc_iostats(struct
> > > > > > > seq_file *seq, struct rpc_iostats *stats,
> > > > > > > stats->om_bytes_recv,
> > > > > > > ktime_to_ms(stats->om_queue),
> > > > > > > ktime_to_ms(stats->om_rtt),
> > > > > > > - ktime_to_ms(stats->om_execute));
> > > > > > > + ktime_to_ms(stats->om_execute),
> > > > > > > + stats->om_error_status);
> > > > > > > }
> > > > > > >
> > > > > > > void rpc_clnt_show_stats(struct seq_file *seq, struct
> > > > > > > rpc_clnt *clnt)
> > > > > > > --
> > > > > > > 2.20.1
> > > > >
> > > > > --
> > > > > Chuck Lever
> > >
> > > --
> > > Chuck Lever
>
> --
> Chuck Lever
>
>
>

2019-06-03 19:10:57

by Chuck Lever III

[permalink] [raw]

Subject: Re: [PATCH 3/3] SUNRPC: Count ops completing with tk_status < 0

> On Jun 3, 2019, at 3:05 PM, Dave Wysochanski <[email protected]> wrote:
>
> On Mon, 2019-06-03 at 14:56 -0400, Chuck Lever wrote:
>>> On Jun 3, 2019, at 2:53 PM, Dave Wysochanski <[email protected]>
>>> wrote:
>>>
>>> On Fri, 2019-05-31 at 09:25 -0400, Chuck Lever wrote:
>>>>> On May 30, 2019, at 6:33 PM, Bruce Fields <[email protected]
>>>>>>
>>>>> wrote:
>>>>>
>>>>> On Thu, May 30, 2019 at 06:19:54PM -0400, Chuck Lever wrote:
>>>>>>
>>>>>>
>>>>>>> On May 30, 2019, at 5:38 PM, [email protected] wrote:
>>>>>>>
>>>>>>> On Thu, May 23, 2019 at 04:13:50PM -0400, Dave Wysochanski
>>>>>>> wrote:
>>>>>>>> We often see various error conditions with NFS4.x that
>>>>>>>> show
>>>>>>>> up with
>>>>>>>> a very high operation count all completing with tk_status
>>>>>>>> < 0
>>>>>>>> in a
>>>>>>>> short period of time. Add a count to rpc_iostats to
>>>>>>>> record
>>>>>>>> on a
>>>>>>>> per-op basis the ops that complete in this manner, which
>>>>>>>> will
>>>>>>>> enable lower overhead diagnostics.
>>>>>>>
>>>>>>> Looks like a good idea to me.
>>>>>>>
>>>>>>> It's too bad we can't distinguish the errors. (E.g. ESTALE
>>>>>>> on
>>>>>>> a lookup
>>>>>>> call is a lot more interesting than ENOENT.) But
>>>>>>> understood
>>>>>>> that
>>>>>>> maintaining (number of possible errors) * (number of
>>>>>>> possible
>>>>>>> ops)
>>>>>>> counters is probably overkill, so just counting the number
>>>>>>> of
>>>>>>> errors
>>>>>>> seems like a good start.
>>>>>>
>>>>>> We now have trace points that can do that too.
>>>>>
>>>>> You mean, that can report every error (and its value)?
>>>>
>>>> Yes, the nfs_xdr_status trace point reports the error by value
>>>> and
>>>> symbolic name.
>>>>
>>>
>>> The tracepoint is very useful I agree. I don't think it will show:
>>> a) the mount
>>> b) the opcode
>>>
>>> Or am I mistaken and there's a way to get those with a filter or
>>> another tracepoint?
>>
>> The opcode can be exposed by another trace point, but the link
>> between
>> the two trace points is tenuous and could be improved.
>>
>
> I think the number is there but it's not decoded! This was for a WRITE
> completing with BAD_STATEID - very nice:
>
> kworker/u16:0-31130 [006] .... 949028.602298: nfs4_xdr_status: operation 38: nfs status -10025 (BAD_STATEID)

OK, I'm smarter than I look. This shows the NFS operation,
not the RPC procedure (which would be NFS4 COMPOUND), which
is the right thing to do.

>> I don't believe any of the NFS trace points expose the mount. My
>> testing
>> is largely on a single mount so my imagination stopped there.
>>
>
> Ok thanks for the confirmation.

Well, it's something worth thinking carefully about: if
you see a slick way to make that information available,
have at it.

>>>>> I assume having these statistics in mountstats is still useful,
>>>>> though.
>>>>>
>>>>> --b.
>>>>>
>>>>>>
>>>>>>
>>>>>>> --b.
>>>>>>>
>>>>>>>>
>>>>>>>> Signed-off-by: Dave Wysochanski <[email protected]>
>>>>>>>> ---
>>>>>>>> include/linux/sunrpc/metrics.h | 7 ++++++-
>>>>>>>> net/sunrpc/stats.c | 8 ++++++--
>>>>>>>> 2 files changed, 12 insertions(+), 3 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/include/linux/sunrpc/metrics.h
>>>>>>>> b/include/linux/sunrpc/metrics.h
>>>>>>>> index 1b3751327575..0ee3f7052846 100644
>>>>>>>> --- a/include/linux/sunrpc/metrics.h
>>>>>>>> +++ b/include/linux/sunrpc/metrics.h
>>>>>>>> @@ -30,7 +30,7 @@
>>>>>>>> #include <linux/ktime.h>
>>>>>>>> #include <linux/spinlock.h>
>>>>>>>>
>>>>>>>> -#define RPC_IOSTATS_VERS "1.0"
>>>>>>>> +#define RPC_IOSTATS_VERS "1.1"
>>>>>>>>
>>>>>>>> struct rpc_iostats {
>>>>>>>> spinlock_t om_lock;
>>>>>>>> @@ -66,6 +66,11 @@ struct rpc_iostats {
>>>>>>>> ktime_t om_queue, /* queued
>>>>>>>> for
>>>>>>>> xmit */
>>>>>>>> om_rtt, /* RPC
>>>>>>>> RTT */
>>>>>>>> om_execute; /* RPC
>>>>>>>> execution */
>>>>>>>> + /*
>>>>>>>> + * The count of operations that complete with tk_status
>>>>>>>> < 0.
>>>>>>>> + * These statuses usually indicate error conditions.
>>>>>>>> + */
>>>>>>>> + unsigned long om_error_status;
>>>>>>>> } ____cacheline_aligned;
>>>>>>>>
>>>>>>>> struct rpc_task;
>>>>>>>> diff --git a/net/sunrpc/stats.c b/net/sunrpc/stats.c
>>>>>>>> index 8b2d3c58ffae..737414247ca7 100644
>>>>>>>> --- a/net/sunrpc/stats.c
>>>>>>>> +++ b/net/sunrpc/stats.c
>>>>>>>> @@ -176,6 +176,8 @@ void rpc_count_iostats_metrics(const
>>>>>>>> struct rpc_task *task,
>>>>>>>>
>>>>>>>> execute = ktime_sub(now, task->tk_start);
>>>>>>>> op_metrics->om_execute = ktime_add(op_metrics-
>>>>>>>>> om_execute, execute);
>>>>>>>>
>>>>>>>> + if (task->tk_status < 0)
>>>>>>>> + op_metrics->om_error_status++;
>>>>>>>>
>>>>>>>> spin_unlock(&op_metrics->om_lock);
>>>>>>>>
>>>>>>>> @@ -218,13 +220,14 @@ static void _add_rpc_iostats(struct
>>>>>>>> rpc_iostats *a, struct rpc_iostats *b)
>>>>>>>> a->om_queue = ktime_add(a->om_queue, b->om_queue);
>>>>>>>> a->om_rtt = ktime_add(a->om_rtt, b->om_rtt);
>>>>>>>> a->om_execute = ktime_add(a->om_execute, b-
>>>>>>>>> om_execute);
>>>>>>>>
>>>>>>>> + a->om_error_status += b->om_error_status;
>>>>>>>> }
>>>>>>>>
>>>>>>>> static void _print_rpc_iostats(struct seq_file *seq,
>>>>>>>> struct
>>>>>>>> rpc_iostats *stats,
>>>>>>>> int op, const struct
>>>>>>>> rpc_procinfo *procs)
>>>>>>>> {
>>>>>>>> _print_name(seq, op, procs);
>>>>>>>> - seq_printf(seq, "%lu %lu %lu %llu %llu %llu %llu
>>>>>>>> %llu\n",
>>>>>>>> + seq_printf(seq, "%lu %lu %lu %llu %llu %llu %llu %llu
>>>>>>>> %lu\n",
>>>>>>>> stats->om_ops,
>>>>>>>> stats->om_ntrans,
>>>>>>>> stats->om_timeouts,
>>>>>>>> @@ -232,7 +235,8 @@ static void _print_rpc_iostats(struct
>>>>>>>> seq_file *seq, struct rpc_iostats *stats,
>>>>>>>> stats->om_bytes_recv,
>>>>>>>> ktime_to_ms(stats->om_queue),
>>>>>>>> ktime_to_ms(stats->om_rtt),
>>>>>>>> - ktime_to_ms(stats->om_execute));
>>>>>>>> + ktime_to_ms(stats->om_execute),
>>>>>>>> + stats->om_error_status);
>>>>>>>> }
>>>>>>>>
>>>>>>>> void rpc_clnt_show_stats(struct seq_file *seq, struct
>>>>>>>> rpc_clnt *clnt)
>>>>>>>> --
>>>>>>>> 2.20.1
>>>>>>
>>>>>> --
>>>>>> Chuck Lever
>>>>
>>>> --
>>>> Chuck Lever
>>
>> --
>> Chuck Lever

--
Chuck Lever

2019-06-04 14:47:14

by J. Bruce Fields

[permalink] [raw]

Subject: Re: [PATCH 3/3] SUNRPC: Count ops completing with tk_status < 0

On Mon, Jun 03, 2019 at 02:56:29PM -0400, Chuck Lever wrote:
> > On Jun 3, 2019, at 2:53 PM, Dave Wysochanski <[email protected]> wrote:
> > On Fri, 2019-05-31 at 09:25 -0400, Chuck Lever wrote:
> >>> On May 30, 2019, at 6:33 PM, Bruce Fields <[email protected]>
> >>> wrote:
> >>> On Thu, May 30, 2019 at 06:19:54PM -0400, Chuck Lever wrote:
> >>>> We now have trace points that can do that too.
> >>>
> >>> You mean, that can report every error (and its value)?
> >>
> >> Yes, the nfs_xdr_status trace point reports the error by value and
> >> symbolic name.
> >
> > The tracepoint is very useful I agree. I don't think it will show:
> > a) the mount
> > b) the opcode
> >
> > Or am I mistaken and there's a way to get those with a filter or
> > another tracepoint?
>
> The opcode can be exposed by another trace point, but the link between
> the two trace points is tenuous and could be improved.
>
> I don't believe any of the NFS trace points expose the mount. My testing
> is largely on a single mount so my imagination stopped there.

Dumb question: is it possible to add more fields to tracepoints without
breaking some kind of backwards compatibility?

I wonder if adding, say, an xid and an xprt pointer to tracepoints when
available would help with this kind of thing.

In any case, I think Dave's stats will still be handy if only because
they're on all the time.

--b.

2019-06-04 14:59:17

by Trond Myklebust

[permalink] [raw]

Subject: Re: [PATCH 3/3] SUNRPC: Count ops completing with tk_status < 0

On Tue, 2019-06-04 at 10:45 -0400, Bruce Fields wrote:
> On Mon, Jun 03, 2019 at 02:56:29PM -0400, Chuck Lever wrote:
> > > On Jun 3, 2019, at 2:53 PM, Dave Wysochanski <[email protected]
> > > > wrote:
> > > On Fri, 2019-05-31 at 09:25 -0400, Chuck Lever wrote:
> > > > > On May 30, 2019, at 6:33 PM, Bruce Fields <
> > > > > [email protected]>
> > > > > wrote:
> > > > > On Thu, May 30, 2019 at 06:19:54PM -0400, Chuck Lever wrote:
> > > > > > We now have trace points that can do that too.
> > > > >
> > > > > You mean, that can report every error (and its value)?
> > > >
> > > > Yes, the nfs_xdr_status trace point reports the error by value
> > > > and
> > > > symbolic name.
> > >
> > > The tracepoint is very useful I agree. I don't think it will
> > > show:
> > > a) the mount
> > > b) the opcode
> > >
> > > Or am I mistaken and there's a way to get those with a filter or
> > > another tracepoint?
> >
> > The opcode can be exposed by another trace point, but the link
> > between
> > the two trace points is tenuous and could be improved.
> >
> > I don't believe any of the NFS trace points expose the mount. My
> > testing
> > is largely on a single mount so my imagination stopped there.
>
> Dumb question: is it possible to add more fields to tracepoints
> without
> breaking some kind of backwards compatibility?

Tracepoints are not to be considered an API.

> I wonder if adding, say, an xid and an xprt pointer to tracepoints
> when
> available would help with this kind of thing.
>
> In any case, I think Dave's stats will still be handy if only because
> they're on all the time.

Given that someone who shall remain anonymous added the struct rpc_rqst
to the struct xdr_stream for "debugging purposes", it should be quite
possible to add both the XID and the xprt in this case.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]

2019-06-07 14:28:08

by David Wysochanski

[permalink] [raw]

Subject: Re: [PATCH 3/3] SUNRPC: Count ops completing with tk_status < 0

On Tue, 2019-06-04 at 10:45 -0400, Bruce Fields wrote:
> On Mon, Jun 03, 2019 at 02:56:29PM -0400, Chuck Lever wrote:
> > > On Jun 3, 2019, at 2:53 PM, Dave Wysochanski <[email protected]
> > > > wrote:
> > > On Fri, 2019-05-31 at 09:25 -0400, Chuck Lever wrote:
> > > > > On May 30, 2019, at 6:33 PM, Bruce Fields <
> > > > > [email protected]>
> > > > > wrote:
> > > > > On Thu, May 30, 2019 at 06:19:54PM -0400, Chuck Lever wrote:
> > > > > > We now have trace points that can do that too.
> > > > >
> > > > > You mean, that can report every error (and its value)?
> > > >
> > > > Yes, the nfs_xdr_status trace point reports the error by value
> > > > and
> > > > symbolic name.
> > >
> > > The tracepoint is very useful I agree. I don't think it will
> > > show:
> > > a) the mount
> > > b) the opcode
> > >
> > > Or am I mistaken and there's a way to get those with a filter or
> > > another tracepoint?
> >
> > The opcode can be exposed by another trace point, but the link
> > between
> > the two trace points is tenuous and could be improved.
> >
> > I don't believe any of the NFS trace points expose the mount. My
> > testing
> > is largely on a single mount so my imagination stopped there.
>
> Dumb question: is it possible to add more fields to tracepoints
> without
> breaking some kind of backwards compatibility?
>
> I wonder if adding, say, an xid and an xprt pointer to tracepoints
> when
> available would help with this kind of thing.
>
> In any case, I think Dave's stats will still be handy if only because
> they're on all the time.
>
> --b.

Trond or Anna, will you take this series for mountstats or are you
opposed to it?

I think it is useful in conjuction with the tracepoints because it is
always on and easy to know which mount is involved (we often start with
a customer saying mount XYZ has some issue or is hanging). If you see
problems or want other testing please let me know.

Thanks!