Received: by 2002:a25:ab43:0:0:0:0:0 with SMTP id u61csp4495385ybi; Mon, 3 Jun 2019 11:57:25 -0700 (PDT) X-Google-Smtp-Source: APXvYqy2FxaXN31STZN1Av+V+SFsP3ahGV/YEH3Aoj0e5IuYrcMcAY4XwWGjBLl8RZsImJL9pxtr X-Received: by 2002:a62:4dc5:: with SMTP id a188mr33669253pfb.8.1559588244985; Mon, 03 Jun 2019 11:57:24 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1559588244; cv=none; d=google.com; s=arc-20160816; b=N8nMRbrPcfFJ4lUJjy2CjngElN3+CTiRECC2+STvpgCpTt0+qzgI/dIPm8XxtobgvC p9m0Mx0Rq0x8cfwwcSaQ/rsLf67RUYPTN8VxQKJVyfZAmg/BBGhj3TY5IC3b5JTKrmR9 gUY9EWH12CjAl+VFGmSX2gG/98fFI2woSZblV8F9W8tZAmIwwcuEzLtpQYaVucpgXWca VUSqlzNR+XgFnTvPw3aufSvOEkgazn5HusmkDqcwjtUMmkmFFSwFtsodGSnBGn/nVBht dNvls0mi5ixAHVLFuZsWlYarJZOjSLvPZEyFxK0/EYoiIKTFoKz+/9DAiOkWSEckjHOH S/kw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:to:references:message-id :content-transfer-encoding:cc:date:in-reply-to:from:subject :mime-version:dkim-signature; bh=jNvF0mCZQ85SIj5zZFKtF/4vMn6EwVBD9gbfakWH8FU=; b=lGnjZJ292T9ifPSStF3X54xnXGOnDuvhCoJrpWotafkNBMXJRDnh8Biie3bE2YjryA qmDvwds9OEUxhVYIBOxLdhic1mbZ88mD02Mpng8un5A8kMcijyeNUnHITPEGXmnZ982R ubUJv9NTN7s/6j4CkGUyiO9EUfHC3vY7nUeEwlehqesIQK/JBVyWH0nQuf0LXO3QWpFQ 5XbXvCtDRv1DFVcFWbr2TC0rjcGRxjqd85r6CD9Xg/cT+CMWTO9roa30YkwTLGh5VCkj VuPwdM+yR+2GkgClKX+2qeU+LTcvofzLQacSQzsj1Xusw/876V+LUCd3HPLiSnhbWMiF m9kg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@oracle.com header.s=corp-2018-07-02 header.b=IrBCKmdo; spf=pass (google.com: best guess record for domain of linux-nfs-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-nfs-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=oracle.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id d35si15791447pla.349.2019.06.03.11.57.08; Mon, 03 Jun 2019 11:57:24 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-nfs-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@oracle.com header.s=corp-2018-07-02 header.b=IrBCKmdo; spf=pass (google.com: best guess record for domain of linux-nfs-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-nfs-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=oracle.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726112AbfFCS5D (ORCPT + 99 others); Mon, 3 Jun 2019 14:57:03 -0400 Received: from userp2120.oracle.com ([156.151.31.85]:54412 "EHLO userp2120.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726049AbfFCS5D (ORCPT ); Mon, 3 Jun 2019 14:57:03 -0400 Received: from pps.filterd (userp2120.oracle.com [127.0.0.1]) by userp2120.oracle.com (8.16.0.27/8.16.0.27) with SMTP id x53Is1Ph188819; Mon, 3 Jun 2019 18:56:34 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=content-type : mime-version : subject : from : in-reply-to : date : cc : content-transfer-encoding : message-id : references : to; s=corp-2018-07-02; bh=jNvF0mCZQ85SIj5zZFKtF/4vMn6EwVBD9gbfakWH8FU=; b=IrBCKmdoRgF6vF0vtJbDXAd2BOCppVHlEvcmv57RvCz3GuehgkfGdcLjxOQKuJ0pyf31 pUWKiiXDbNThgRtz86WZg/lkm4XMXOMLSptCBDM5d7fromTXoffhZSKxyDBAQNZrOYJt aJG9T1XCbLovjP0f8kcGpQboMAEfI62i9X2lblt4Kw6Kv+vdl0EdB4MfGGJwqeUZsFXM kDq+fwS1CkrcMNXF/8/svabjEiWpSnYID7qQonAY5wB4vxaxXSr4LrJkYnBDzpeUtSI5 wx0Y/wD3eauv+tOyerkn3cNstdggpYoCL0n5tMkWOkqBvjmActsZ7+t9CemKKrxCWflN 1w== Received: from userp3020.oracle.com (userp3020.oracle.com [156.151.31.79]) by userp2120.oracle.com with ESMTP id 2suj0q8s3d-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 03 Jun 2019 18:56:34 +0000 Received: from pps.filterd (userp3020.oracle.com [127.0.0.1]) by userp3020.oracle.com (8.16.0.27/8.16.0.27) with SMTP id x53IuEJt126693; Mon, 3 Jun 2019 18:56:34 GMT Received: from aserv0121.oracle.com (aserv0121.oracle.com [141.146.126.235]) by userp3020.oracle.com with ESMTP id 2svnn8e7yw-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 03 Jun 2019 18:56:33 +0000 Received: from abhmp0022.oracle.com (abhmp0022.oracle.com [141.146.116.28]) by aserv0121.oracle.com (8.14.4/8.13.8) with ESMTP id x53IuU7e020977; Mon, 3 Jun 2019 18:56:30 GMT Received: from anon-dhcp-171.1015granger.net (/68.61.232.219) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Mon, 03 Jun 2019 11:56:30 -0700 Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 12.4 \(3445.104.11\)) Subject: Re: [PATCH 3/3] SUNRPC: Count ops completing with tk_status < 0 From: Chuck Lever In-Reply-To: Date: Mon, 3 Jun 2019 14:56:29 -0400 Cc: Bruce Fields , Linux NFS Mailing List Content-Transfer-Encoding: 7bit Message-Id: <5CE8A68E-F5C2-4321-8F57-451F5E5AF789@oracle.com> References: <20190523201351.12232-1-dwysocha@redhat.com> <20190523201351.12232-3-dwysocha@redhat.com> <20190530213857.GA24802@fieldses.org> <9B9F0C9B-C493-44F5-ABD1-6B2B4BAA2F08@oracle.com> <20190530223314.GA25368@fieldses.org> To: Dave Wysochanski X-Mailer: Apple Mail (2.3445.104.11) X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9277 signatures=668687 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=0 malwarescore=0 phishscore=0 bulkscore=0 spamscore=0 mlxscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1810050000 definitions=main-1906030128 X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9277 signatures=668687 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 priorityscore=1501 malwarescore=0 suspectscore=0 phishscore=0 bulkscore=0 spamscore=0 clxscore=1015 lowpriorityscore=0 mlxscore=0 impostorscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1810050000 definitions=main-1906030128 Sender: linux-nfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-nfs@vger.kernel.org > On Jun 3, 2019, at 2:53 PM, Dave Wysochanski wrote: > > On Fri, 2019-05-31 at 09:25 -0400, Chuck Lever wrote: >>> On May 30, 2019, at 6:33 PM, Bruce Fields >>> wrote: >>> >>> On Thu, May 30, 2019 at 06:19:54PM -0400, Chuck Lever wrote: >>>> >>>> >>>>> On May 30, 2019, at 5:38 PM, bfields@fieldses.org wrote: >>>>> >>>>> On Thu, May 23, 2019 at 04:13:50PM -0400, Dave Wysochanski >>>>> wrote: >>>>>> We often see various error conditions with NFS4.x that show >>>>>> up with >>>>>> a very high operation count all completing with tk_status < 0 >>>>>> in a >>>>>> short period of time. Add a count to rpc_iostats to record >>>>>> on a >>>>>> per-op basis the ops that complete in this manner, which will >>>>>> enable lower overhead diagnostics. >>>>> >>>>> Looks like a good idea to me. >>>>> >>>>> It's too bad we can't distinguish the errors. (E.g. ESTALE on >>>>> a lookup >>>>> call is a lot more interesting than ENOENT.) But understood >>>>> that >>>>> maintaining (number of possible errors) * (number of possible >>>>> ops) >>>>> counters is probably overkill, so just counting the number of >>>>> errors >>>>> seems like a good start. >>>> >>>> We now have trace points that can do that too. >>> >>> You mean, that can report every error (and its value)? >> >> Yes, the nfs_xdr_status trace point reports the error by value and >> symbolic name. >> > > The tracepoint is very useful I agree. I don't think it will show: > a) the mount > b) the opcode > > Or am I mistaken and there's a way to get those with a filter or > another tracepoint? The opcode can be exposed by another trace point, but the link between the two trace points is tenuous and could be improved. I don't believe any of the NFS trace points expose the mount. My testing is largely on a single mount so my imagination stopped there. >>> I assume having these statistics in mountstats is still useful, >>> though. >>> >>> --b. >>> >>>> >>>> >>>>> --b. >>>>> >>>>>> >>>>>> Signed-off-by: Dave Wysochanski >>>>>> --- >>>>>> include/linux/sunrpc/metrics.h | 7 ++++++- >>>>>> net/sunrpc/stats.c | 8 ++++++-- >>>>>> 2 files changed, 12 insertions(+), 3 deletions(-) >>>>>> >>>>>> diff --git a/include/linux/sunrpc/metrics.h >>>>>> b/include/linux/sunrpc/metrics.h >>>>>> index 1b3751327575..0ee3f7052846 100644 >>>>>> --- a/include/linux/sunrpc/metrics.h >>>>>> +++ b/include/linux/sunrpc/metrics.h >>>>>> @@ -30,7 +30,7 @@ >>>>>> #include >>>>>> #include >>>>>> >>>>>> -#define RPC_IOSTATS_VERS "1.0" >>>>>> +#define RPC_IOSTATS_VERS "1.1" >>>>>> >>>>>> struct rpc_iostats { >>>>>> spinlock_t om_lock; >>>>>> @@ -66,6 +66,11 @@ struct rpc_iostats { >>>>>> ktime_t om_queue, /* queued for >>>>>> xmit */ >>>>>> om_rtt, /* RPC RTT */ >>>>>> om_execute; /* RPC >>>>>> execution */ >>>>>> + /* >>>>>> + * The count of operations that complete with tk_status >>>>>> < 0. >>>>>> + * These statuses usually indicate error conditions. >>>>>> + */ >>>>>> + unsigned long om_error_status; >>>>>> } ____cacheline_aligned; >>>>>> >>>>>> struct rpc_task; >>>>>> diff --git a/net/sunrpc/stats.c b/net/sunrpc/stats.c >>>>>> index 8b2d3c58ffae..737414247ca7 100644 >>>>>> --- a/net/sunrpc/stats.c >>>>>> +++ b/net/sunrpc/stats.c >>>>>> @@ -176,6 +176,8 @@ void rpc_count_iostats_metrics(const >>>>>> struct rpc_task *task, >>>>>> >>>>>> execute = ktime_sub(now, task->tk_start); >>>>>> op_metrics->om_execute = ktime_add(op_metrics- >>>>>>> om_execute, execute); >>>>>> + if (task->tk_status < 0) >>>>>> + op_metrics->om_error_status++; >>>>>> >>>>>> spin_unlock(&op_metrics->om_lock); >>>>>> >>>>>> @@ -218,13 +220,14 @@ static void _add_rpc_iostats(struct >>>>>> rpc_iostats *a, struct rpc_iostats *b) >>>>>> a->om_queue = ktime_add(a->om_queue, b->om_queue); >>>>>> a->om_rtt = ktime_add(a->om_rtt, b->om_rtt); >>>>>> a->om_execute = ktime_add(a->om_execute, b- >>>>>>> om_execute); >>>>>> + a->om_error_status += b->om_error_status; >>>>>> } >>>>>> >>>>>> static void _print_rpc_iostats(struct seq_file *seq, struct >>>>>> rpc_iostats *stats, >>>>>> int op, const struct >>>>>> rpc_procinfo *procs) >>>>>> { >>>>>> _print_name(seq, op, procs); >>>>>> - seq_printf(seq, "%lu %lu %lu %llu %llu %llu %llu >>>>>> %llu\n", >>>>>> + seq_printf(seq, "%lu %lu %lu %llu %llu %llu %llu %llu >>>>>> %lu\n", >>>>>> stats->om_ops, >>>>>> stats->om_ntrans, >>>>>> stats->om_timeouts, >>>>>> @@ -232,7 +235,8 @@ static void _print_rpc_iostats(struct >>>>>> seq_file *seq, struct rpc_iostats *stats, >>>>>> stats->om_bytes_recv, >>>>>> ktime_to_ms(stats->om_queue), >>>>>> ktime_to_ms(stats->om_rtt), >>>>>> - ktime_to_ms(stats->om_execute)); >>>>>> + ktime_to_ms(stats->om_execute), >>>>>> + stats->om_error_status); >>>>>> } >>>>>> >>>>>> void rpc_clnt_show_stats(struct seq_file *seq, struct >>>>>> rpc_clnt *clnt) >>>>>> -- >>>>>> 2.20.1 >>>> >>>> -- >>>> Chuck Lever >> >> -- >> Chuck Lever -- Chuck Lever