2009-01-16 16:25:26

by Steve Dickson

[permalink] [raw]
Subject: [RFC][PATCH 0/5] NFS: trace points added to mounting path

Hello,

Very recently patches were added to the mainline kernel that
enabled the use of trace points. This patch series takes
advantage of those patch by introducing trace points
to the mounting path of NFS mounts. Its hoped these
trace points can be used by system administrators to
identify why NFS mounts are failing or hang in
production kernels.

IMHO, one general problem with today's "canned" NFS debugging today is it
becomes very verbose very quickly.... "I get here" and "I get there" type of
debugging statements. Although they help trace the code but very rarely
shows/defines what the actual problem was. So what I've try to do is
"define the error paths" by putting a trace point at every error exit
in hopes to define where and why things broke.

So the ultimate goal would be to replace all the dprintks with trace points
but still be able to enable them through the rpcdebug command (although we
might want to think about splitting the command out into three different
commands nfsdebug, nfsddebug, rpcdebug). Since trace points have very little
overhead, a set of trace points could be enable in production with have
little or no effect on functionality or performance.

Another advantage with trace points is the type and amount of
information that can be retrieved. With these trace points, I'm
passing in the error code as well as the data structure[s] associated
with that error. This allows the "canned" information that IT people
would used (via the rpcdebug command which would turn on a group of
trace points) as well as more detailed information that kernel developers
can used (via systemtap scripts which would turn on individual trace
points).

Patch summary:
* fs/nfs/client.c

* fs/nfs/getroot.c

* fs/nfs/super.c

The based files where traces where added.

* include/trace/nfs.h

* kernel/trace/Makefile

* kernel/trace/nfs-trace.c

The overhead of added the trace points and then converting them
into trace marks .

* samples/nfs/nfs_mount.stp

The systemtap script used to access the trace marks. I probably
should have documented the file better, but the first three
functions in the file are how structures are pulled from the
kernel. The rest are probes used to active the trace markers.


Comments... Acceptance??

steved.




2009-01-21 17:13:53

by Steve Dickson

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/5] NFS: trace points added to mounting path

Sorry for the delayed response... That darn flux capacitor broke again! ;-)


Chuck Lever wrote:
>
> I'm all for improving the observability of the NFS client.
Well, in theory, trace points will also touch the server and all
of the rpc code...

>
> But I don't (yet) see the advantage of adding this complexity in the
> mount path. Maybe the more complex and asynchronous parts of the NFS
> client, like the cached read and write paths, are more suitable to this
> type of tool.
Well the complexity is, at this point, due to how the trace points
are tied to and used by the systemtap. I'm hopeful this complexity
will die down as time goes on...

>
> Why can't we simply improve the information content of the dprintks?
The theory is trace point can be turned on, in production kernels, with
little or no performance issues...

> Can you give a few real examples of problems that these new trace points
> can identify that better dprintks wouldn't be able to address?
They can supply more information that can be used by both a kernel
guy and an IT guy.... Meaning they can supply detailed structure information
that a kernel guy would need as well as supplying the simple error code
that an IT guy would be interested.

> Generally, what kind of problems do admins face that the dprintks don't
> handle today, and what are the alternatives to addressing those issues?
Not being an admin guy, I really don't have an answer for this... but
I can say since trace point are not so much of a drag on the system as
printks are.. with in timing issues using trace point would be a big advantage
over printks

>
> Do admins who run enterprise kernels actually use SystemTap, or do they
> fall back on network traces and other tried and true troubleshooting
> methodologies?
Currently to run systemtap, one need kernel debug info and kernel developer
info installed on the system. Most productions system don't install those types
of packages.... But with trace points those type of packages will no longer be
needed, so I could definitely see admins using systemtap once its available...
Look at Dtrace... people are using that now that its available and fairly stable.

>
> If we think the mount path needs such instrumentation, consider updating
> fs/nfs/mount_clnt.c and net/sunrpc/rpcb_clnt.c as well.
>
I was just following what what was currently being debug when
'rpcinfo -m nfs -s mount' was set... maybe I missed something...
I'll take a look...

steved.

2009-01-16 16:30:09

by Steve Dickson

[permalink] [raw]
Subject: [PATCH 3/5] NFS: Adding trace points to nfs/client.c

Added trace points to a number of client routines
used in the mount process. If a routine was only called
by the the mounting process, the prefix on the trace
point starts with 'trace_nfs_mount' otherwise the
routine name as used as the prefix.

Signed-off-by: Steve Dickson <[email protected]>

--- linux/fs/nfs/client.c.orig 2009-01-14 15:54:19.000000000 -0500
+++ linux/fs/nfs/client.c 2009-01-15 12:40:08.000000000 -0500
@@ -37,6 +37,7 @@
#include <linux/in6.h>
#include <net/ipv6.h>
#include <linux/nfs_xdr.h>
+#include <trace/nfs.h>

#include <asm/system.h>

@@ -46,6 +47,19 @@
#include "iostat.h"
#include "internal.h"

+DEFINE_TRACE(nfs_create_rpc_client);
+DEFINE_TRACE(nfs_create_rpc_client_proto);
+DEFINE_TRACE(nfs_start_lockd);
+DEFINE_TRACE(nfs_init_server_rpcclient);
+DEFINE_TRACE(nfs_init_server_rpcclient_clone);
+DEFINE_TRACE(nfs_init_server_rpcclient_auth);
+DEFINE_TRACE(nfs_mount_init_clnt);
+DEFINE_TRACE(nfs_mount_init_srv);
+DEFINE_TRACE(nfs_probe_fsinfo);
+DEFINE_TRACE(nfs_probe_fsinfo_setcaps);
+DEFINE_TRACE(nfs_probe_fsinfo_fsinfo);
+DEFINE_TRACE(nfs_create_server);
+
#define NFSDBG_FACILITY NFSDBG_CLIENT

static DEFINE_SPINLOCK(nfs_client_lock);
@@ -507,6 +521,8 @@ static int nfs_create_rpc_client(struct
if (noresvport)
args.flags |= RPC_CLNT_CREATE_NONPRIVPORT;

+ trace_nfs_create_rpc_client(clp, timeparms, flavor, args.flags);
+
if (!IS_ERR(clp->cl_rpcclient))
return 0;

@@ -514,6 +530,7 @@ static int nfs_create_rpc_client(struct
if (IS_ERR(clnt)) {
dprintk("%s: cannot create RPC client. Error = %ld\n",
__func__, PTR_ERR(clnt));
+ trace_nfs_create_rpc_client_proto(clp, PTR_ERR(clnt));
return PTR_ERR(clnt);
}

@@ -554,9 +571,10 @@ static int nfs_start_lockd(struct nfs_se
return 0;

host = nlmclnt_init(&nlm_init);
- if (IS_ERR(host))
+ if (IS_ERR(host)) {
+ trace_nfs_start_lockd(server, PTR_ERR(host));
return PTR_ERR(host);
-
+ }
server->nlm_host = host;
server->destroy = nfs_destroy_server;
return 0;
@@ -601,9 +619,12 @@ static int nfs_init_server_rpcclient(str
{
struct nfs_client *clp = server->nfs_client;

+ trace_nfs_init_server_rpcclient(server, timeo, pseudoflavour);
+
server->client = rpc_clone_client(clp->cl_rpcclient);
if (IS_ERR(server->client)) {
dprintk("%s: couldn't create rpc_client!\n", __func__);
+ trace_nfs_init_server_rpcclient_clone(server, PTR_ERR(server->client));
return PTR_ERR(server->client);
}

@@ -618,6 +639,7 @@ static int nfs_init_server_rpcclient(str
auth = rpcauth_create(pseudoflavour, server->client);
if (IS_ERR(auth)) {
dprintk("%s: couldn't create credcache!\n", __func__);
+ trace_nfs_init_server_rpcclient_auth(server, PTR_ERR(auth));
return PTR_ERR(auth);
}
}
@@ -657,6 +679,7 @@ static int nfs_init_client(struct nfs_cl
error:
nfs_mark_client_ready(clp, error);
dprintk("<-- nfs_init_client() = xerror %d\n", error);
+ trace_nfs_mount_init_clnt(clp, error);
return error;
}

@@ -688,6 +711,7 @@ static int nfs_init_server(struct nfs_se
clp = nfs_get_client(&cl_init);
if (IS_ERR(clp)) {
dprintk("<-- nfs_init_server() = error %ld\n", PTR_ERR(clp));
+ trace_nfs_mount_init_srv(server, PTR_ERR(clp));
return PTR_ERR(clp);
}

@@ -807,19 +831,23 @@ static int nfs_probe_fsinfo(struct nfs_s
int error;

dprintk("--> nfs_probe_fsinfo()\n");
+ trace_nfs_probe_fsinfo(server, mntfh, fattr);

if (clp->rpc_ops->set_capabilities != NULL) {
error = clp->rpc_ops->set_capabilities(server, mntfh);
- if (error < 0)
+ if (error < 0) {
+ trace_nfs_probe_fsinfo_setcaps(server, error);
goto out_error;
+ }
}

fsinfo.fattr = fattr;
nfs_fattr_init(fattr);
error = clp->rpc_ops->fsinfo(server, mntfh, &fsinfo);
- if (error < 0)
+ if (error < 0) {
+ trace_nfs_probe_fsinfo_fsinfo(server, error);
goto out_error;
-
+ }
nfs_server_set_fsinfo(server, &fsinfo);
error = bdi_init(&server->backing_dev_info);
if (error)
@@ -956,6 +984,7 @@ struct nfs_server *nfs_create_server(con
if (!(fattr.valid & NFS_ATTR_FATTR)) {
error = server->nfs_client->rpc_ops->getattr(server, mntfh, &fattr);
if (error < 0) {
+ trace_nfs_create_server(server, mntfh, error);
dprintk("nfs_create_server: getattr error = %d\n", -error);
goto error;
}

2009-01-16 16:32:14

by Steve Dickson

[permalink] [raw]
Subject: [PATCH 4/5] NFS: Convert trace points to trace markers

This the magic that converts the trace points into trace marks
so systemtap scripts can read the data. There is a good chance
these interfaces could change, but it should not effect the
trace points define in the code.

Signed-off-by: Steve Dickson <[email protected]>

--- linux/kernel/trace/Makefile.orig 2009-01-14 15:54:20.000000000 -0500
+++ linux/kernel/trace/Makefile 2009-01-15 12:23:11.000000000 -0500
@@ -18,7 +18,7 @@ endif
obj-$(CONFIG_FUNCTION_TRACER) += libftrace.o
obj-$(CONFIG_RING_BUFFER) += ring_buffer.o

-obj-$(CONFIG_TRACING) += trace.o
+obj-$(CONFIG_TRACING) += trace.o trace_nfs.o
obj-$(CONFIG_CONTEXT_SWITCH_TRACER) += trace_sched_switch.o
obj-$(CONFIG_SYSPROF_TRACER) += trace_sysprof.o
obj-$(CONFIG_FUNCTION_TRACER) += trace_functions.o
--- /dev/null 2009-01-08 09:11:53.943261863 -0500
+++ linux/kernel/trace/trace_nfs.c 2009-01-15 12:23:11.000000000 -0500
@@ -0,0 +1,135 @@
+/*
+ * kernel/trace/nfs-trace.c
+ *
+ * NFS tracepoint probes.
+ */
+
+#include <linux/autoconf.h>
+#include <linux/module.h>
+#include <linux/marker.h>
+
+#include <linux/sunrpc/sched.h>
+#include <linux/sunrpc/xdr.h>
+#include <linux/sunrpc/xprt.h>
+
+#include <trace/nfs.h>
+
+/*
+ * NFS mount probs
+ */
+void probe_nfs_mount(struct file_system_type *fs_type,
+ int flags, const char *dev_name, void *raw_data, struct vfsmount *mnt)
+{
+ trace_mark(nfs_mount, "%p %x %p %p %p",
+ fs_type, flags, dev_name, raw_data, mnt);
+}
+void probe_nfs_mount_data_null(struct nfs_mount_data *data, int err)
+{
+ trace_mark(nfs_mount_data_null, "%p %d", data, err);
+}
+void probe_nfs_mount_data_invalvers(struct nfs_mount_data *data, int err)
+{
+ trace_mark(nfs_mount_data_invalvers, "%p %d", data, err);
+}
+void probe_nfs_mount_data_invalsec(struct nfs_mount_data *data, int err)
+{
+ trace_mark(nfs_mount_data_invalsec, "%p %d", data, err);
+}
+void probe_nfs_mount_data_nomem(struct nfs_mount_data *data, int err)
+{
+ trace_mark(nfs_mount_data_nomem, "%p %d", data, err);
+}
+void probe_nfs_mount_data_noaddr(struct nfs_mount_data *data, int err)
+{
+ trace_mark(nfs_mount_data_noaddr, "%p %d", data, err);
+}
+void probe_nfs_mount_data_badfh(struct nfs_mount_data *data, int err)
+{
+ trace_mark(nfs_mount_data_badfh, "%p %d", data, err);
+}
+void probe_nfs_mount_init_clnt(struct nfs_client *clp, int error) {
+
+ trace_mark(nfs_mount_init_clnt, "%p %d", clp, error);
+}
+void probe_nfs_mount_get_root(struct super_block *sb,
+ struct nfs_server *server, struct nfs_fh *mntfh)
+{
+ trace_mark(nfs_mount_get_root, "%p %p %p", sb, server, mntfh);
+}
+void probe_nfs__mount_get_root_fhget(struct super_block *sb,
+ struct nfs_server *server, int error)
+{
+ trace_mark(nfs__mount_get_root_fhget, "%p %p %d", sb, server, error);
+}
+void probe_nfs__mount_get_root_dummy_root(struct super_block *sb,
+ struct nfs_server *server, int error)
+{
+ trace_mark(nfs__mount_get_root_dummy_root, "%p %p %d", sb, server, error);
+}
+void probe_nfs__mount_get_root_alias(struct super_block *sb,
+ struct nfs_server *server, int error)
+{
+ trace_mark(nfs__mount_get_root_alias, "%p %p %d", sb, server, error);
+}
+
+void probe_nfs_create_rpc_client(struct nfs_client *clp,
+ const struct rpc_timeout *timeo, rpc_authflavor_t flavor,
+ int flags)
+{
+ trace_mark(nfs_create_rpc_client, "%p %p %d %d",
+ clp, timeo, flavor, flags);
+}
+void probe_nfs_create_rpc_client_proto(struct nfs_client *clp, int error)
+{
+
+ trace_mark(nfs_create_rpc_client_proto, "%p %d", clp, error);
+}
+void probe_nfs_start_lockd(struct nfs_server *server, int error)
+{
+ trace_mark(nfs_start_lockd, "%p %d", server, error);
+}
+void probe_nfs_init_server_rpcclient(struct nfs_server *server,
+ const struct rpc_timeout *timeo,
+ rpc_authflavor_t flavor)
+{
+ trace_mark(nfs_init_server_rpcclient, "%p %p %d", server, timeo, flavor);
+}
+void probe_nfs_init_server_rpcclient_clone(
+ struct nfs_server *server, int error)
+{
+ trace_mark(nfs_init_server_rpcclient_clone, "%p %d", server, error);
+}
+void probe_nfs_init_server_rpcclient_auth(
+ struct nfs_server *server, int error)
+{
+ trace_mark(nfs_init_server_rpcclient_auth, "%p %d", server, error);
+}
+void probe_nfs_mount_init_srv(
+ struct nfs_server *server, int error)
+{
+ trace_mark(nfs_mount_init_srv, "%p %d", server, error);
+}
+
+void probe_nfs_probe_fsinfo(struct nfs_server *server,
+ struct nfs_fh *mntfh, struct nfs_fattr *fattr)
+{
+ trace_mark(nfs_probe_fsinfo, "%p %p %p", server, mntfh, fattr);
+}
+void probe_nfs_probe_fsinfo_setcaps(struct nfs_server *server,
+ struct nfs_fh *mntfh, int error)
+{
+ trace_mark(nfs_probe_fsinfo_setcaps, "%p %p %d", server, mntfh, error);
+}
+void probe_nfs_probe_fsinfo_fsinfo(struct nfs_server *server, int error)
+{
+ trace_mark(nfs_probe_fsinfo_fsinfo, "%p %d", server, error);
+}
+void probe_nfs_create_server(struct nfs_server *server,
+ struct nfs_fh *mntfh, int error)
+{
+ trace_mark(nfs_create_server, "%p %p %d", server, mntfh, error);
+}
+void probe_nfs_mount_sget(struct nfs_server *server, int error)
+{
+ trace_mark(nfs_mount_sget, "%p %d", server, error);
+}
--- /dev/null 2009-01-08 09:11:53.943261863 -0500
+++ linux/include/trace/nfs.h 2009-01-15 12:30:45.000000000 -0500
@@ -0,0 +1,99 @@
+#ifndef _TRACE_NFS_H
+#define _TRACE_NFS_H
+#include <linux/init.h>
+#include <linux/kernel.h>
+
+#include <linux/buffer_head.h>
+#include <linux/tracepoint.h>
+
+#include <linux/nfs_mount.h>
+#include <linux/nfs4.h>
+#include <linux/nfs_xdr.h>
+#include <linux/nfs_fs_sb.h>
+#include <linux/sunrpc/auth.h>
+
+#include <linux/posix_acl.h>
+/*
+ * NFS mounting trace points
+ */
+DECLARE_TRACE(nfs_mount,
+ TPPROTO(struct file_system_type *fs_type, int flags,
+ const char *dev_name, void *raw_data, struct vfsmount *mnt),
+ TPARGS(fs_type, flags, dev_name, raw_data, mnt));
+
+DECLARE_TRACE(nfs_mount_data_null,
+ TPPROTO(struct nfs_mount_data *data, int err), TPARGS(data, err));
+DECLARE_TRACE(nfs_mount_data_invalvers,
+ TPPROTO(struct nfs_mount_data *data, int err), TPARGS(data, err));
+DECLARE_TRACE(nfs_mount_data_invalsec,
+ TPPROTO(struct nfs_mount_data *data, int err), TPARGS(data, err));
+DECLARE_TRACE(nfs_mount_data_nomem,
+ TPPROTO(struct nfs_mount_data *data, int err), TPARGS(data, err));
+DECLARE_TRACE(nfs_mount_data_noaddr,
+ TPPROTO(struct nfs_mount_data *data, int err), TPARGS(data, err));
+DECLARE_TRACE(nfs_mount_data_badfh,
+ TPPROTO(struct nfs_mount_data *data, int err), TPARGS(data, err));
+
+DECLARE_TRACE(nfs_mount_get_root,
+ TPPROTO(struct super_block *sb, struct nfs_server *server,
+ struct nfs_fh *mntfh),
+ TPARGS(sb, server, mntfh));
+DECLARE_TRACE(nfs_mount_getroot,
+ TPPROTO(struct super_block *sb, struct nfs_server *server,
+ int error),
+ TPARGS(sb, server, error));
+DECLARE_TRACE(nfs_mount_get_root_fhget,
+ TPPROTO(struct super_block *sb, struct nfs_server *server,
+ int error),
+ TPARGS(sb, server, error));
+DECLARE_TRACE(nfs_mount_get_root_alias,
+ TPPROTO(struct super_block *sb, struct nfs_server *server,
+ int error),
+ TPARGS(sb, server, error));
+DECLARE_TRACE(nfs_mount_get_root_dummy_root,
+ TPPROTO(struct super_block *sb, struct nfs_server *server,
+ int error),
+ TPARGS(sb, server, error));
+
+DECLARE_TRACE(nfs_create_rpc_client,
+ TPPROTO(struct nfs_client *clp, const struct rpc_timeout *timeo,
+ rpc_authflavor_t flavor, int flags),
+ TPARGS(clp, timeo, flavor, flags));
+
+DECLARE_TRACE(nfs_create_rpc_client_proto,
+ TPPROTO(struct nfs_client *clp, int error),
+ TPARGS(clp,error));
+//DECLARE_TRACE(nfs_create_rpc_client_client, TPPROTO(int error), TPARGS(error));
+DECLARE_TRACE(nfs_mount_init_clnt,
+ TPPROTO(struct nfs_client *clp, int error), TPARGS(clp, error));
+DECLARE_TRACE(nfs_mount_init_srv,
+ TPPROTO(struct nfs_server *server, int error), TPARGS(server, error));
+
+DECLARE_TRACE(nfs_mount_sget,
+ TPPROTO(struct nfs_server *server, int error), TPARGS(server, error));
+
+DECLARE_TRACE(nfs_init_server_rpcclient,
+ TPPROTO(struct nfs_server *server, const struct rpc_timeout *timeo,
+ rpc_authflavor_t flavor),
+ TPARGS(server, timeo, flavor));
+DECLARE_TRACE(nfs_init_server_rpcclient_clone,
+ TPPROTO(struct nfs_server *server, int error), TPARGS(server, error));
+DECLARE_TRACE(nfs_init_server_rpcclient_auth,
+ TPPROTO(struct nfs_server *server, int error), TPARGS(server, error));
+
+DECLARE_TRACE(nfs_start_lockd,
+ TPPROTO(struct nfs_server *server, int error), TPARGS(server, error));
+
+DECLARE_TRACE(nfs_probe_fsinfo,
+ TPPROTO(struct nfs_server *server, struct nfs_fh *mntfh,
+ struct nfs_fattr *fattr),
+ TPARGS(server, mntfh, fattr));
+DECLARE_TRACE(nfs_probe_fsinfo_fsinfo,
+ TPPROTO(struct nfs_server *server, int error), TPARGS(server, error));
+DECLARE_TRACE(nfs_probe_fsinfo_setcaps,
+ TPPROTO(struct nfs_server *server, int error), TPARGS(server, error));
+
+DECLARE_TRACE(nfs_create_server,
+ TPPROTO(struct nfs_server *server, struct nfs_fh *mntfh, int error),
+ TPARGS(server, mntfh, error));
+#endif

2009-01-16 16:33:56

by Steve Dickson

[permalink] [raw]
Subject: [PATCH 5/5] NFS: Systemtap script

The systemtap scripts used to pull and parse the information
from the kernel.

Signed-off-by: Steve Dickson <[email protected]>

--- /dev/null 2009-01-08 09:11:53.943261863 -0500
+++ linux/samples/nfs/nfs_mount.stp 2009-01-15 12:23:11.000000000 -0500
@@ -0,0 +1,160 @@
+%{
+#include <linux/mount.h>
+#include <linux/nfs_mount.h>
+%}
+
+function _fstype_name:string (_fstype:long) %{
+ struct file_system_type *fstype;
+ char *name;
+
+ fstype = (struct file_system_type *)(long)kread(&(THIS->_fstype));
+ name = (char *)(long)kread(&fstype->name);
+
+ snprintf(THIS->__retvalue, MAXSTRINGLEN, "name %s flags 0x%x",
+ name, fstype->fs_flags);
+
+ CATCH_DEREF_FAULT();
+%}
+function _vfsmnt_dump:string (_vfsmnt:long) %{
+ struct vfsmount *vfsmnt;
+ char *dev;
+
+ vfsmnt = (struct vfsmount *)(long)kread(&(THIS->_vfsmnt));
+ dev = (char *)(long)kread(&vfsmnt->mnt_devname);
+
+ snprintf(THIS->__retvalue, MAXSTRINGLEN, "dev %s flags=0x%x",
+ vfsmnt->mnt_devname, vfsmnt->mnt_flags);
+
+ CATCH_DEREF_FAULT();
+%}
+function _nfsmnt_dump:string (_nfsmnt:long) %{
+ struct nfs_mount_data *data;
+ unsigned char *bytes;
+
+ data = (struct nfs_mount_data *)(long)kread(&(THIS->_nfsmnt));
+ bytes = (unsigned char *)&data->addr.sin_addr.s_addr;
+
+ snprintf(THIS->__retvalue, MAXSTRINGLEN,
+ "vers %d flags 0x%x flavor %d hostname %s(%d.%d.%d.%d)",
+ data->version, data->flags, data->pseudoflavor,
+ data->hostname, bytes[0], bytes[1], bytes[2], bytes[3]);
+
+ CATCH_DEREF_FAULT();
+%}
+
+probe kernel.mark("nfs_mount") {
+ printf("nfs_mount:entry: fstype (%s) flags %x dev %s\n",
+ _fstype_name($arg1), $arg2, kernel_string($arg3));
+ printf("\tdata: %s\n\tmnt: %s\n",
+ _nfsmnt_dump($arg4), _vfsmnt_dump($arg5));
+
+}
+probe kernel.mark("nfs_mount_data_null") {
+ printf("nfs_mount: missing mount data: errno %d\n", $arg1);
+}
+probe kernel.mark("nfs_mount_data_badvers") {
+
+ if ($arg1 <= 0) {
+ printf("nfs_mount: invalid mount version: vers %d <= 0\n", $arg1);
+ } else {
+ printf("nfs_mount: invalid mount version: vers %d > %d\n",
+ $arg1, $arg2);
+ }
+}
+probe kernel.mark("nfs_mount_data_invalvers") {
+ if ($arg1 == 3) {
+ printf("nfs_mount: mount structure version %d does not support NFSv3\n", $arg1);
+ } else {
+ printf("nfs_mount: mount structure version %d does not support strong security\n", arg1)
+ }
+}
+probe kernel.mark("nfs_mount_data_noaddr") {
+ printf("nfs_mount: invalid server IP address:\n");
+}
+probe kernel.mark("nfs_mount_data_badsize") {
+ printf("nfs_mount: invalid root filehandle: fhsize %s > maxsize %d\n",
+ arg1, arg2);
+}
+probe kernel.mark("nfs_mount_get_root") {
+ printf("nfs_get_root: sb %p server %p mntfh %p\n", $arg1, $arg2, $arg3);
+}
+probe kernel.mark("nfs_mount_getroot_fhget1") {
+ printf("nfs_get_root: !s_root: nfs_fhget failed: errno %d (%s)\n",
+ $arg1, errno_str($arg1));
+}
+probe kernel.mark("nfs_mount_getroot_alloc1") {
+ printf("nfs_get_root: !s_root: d_alloc_root failed: errno %d (%s)\n",
+ $arg1, errno_str($arg1));
+}
+probe kernel.mark("nfs_mount_getroot_fhget2") {
+ printf("nfs_get_root: nfs_fhget failed: errno %d (%s)\n",
+ $arg1, errno_str($arg1));
+}
+probe kernel.mark("nfs_mount_getroot_alloc2") {
+ printf("nfs_get_root: d_alloc_root failed: errno %d (%s)\n",
+ $arg1, errno_str($arg1));
+}
+probe kernel.mark("nfs_mount_init_clnt") {
+ printf("nfs_init_client: nfs_create_rpc_client failed: errno %d (%s)\n",
+ $arg1, errno_str($arg1));
+}
+probe kernel.mark("nfs_create_rpc_client") {
+ printf("nfs_create_rpc_client: clp %p proto %d timeo %d retrans %d flavor %d",
+ $arg1, $arg2, $arg3, $arg4, $arg5);
+}
+probe kernel.mark("nfs_create_rpc_client_proto") {
+ printf("nfs_create_rpc_client: xprt_create_proto failed: errno %d (%s)\n",
+ $arg1, errno_str($arg1));
+}
+probe kernel.mark("nfs_create_rpc_client_client") {
+ printf("nfs_create_rpc_client: rpc_create_client failed: errno %d (%s)\n",
+ $arg1, errno_str($arg1));
+}
+probe kernel.mark("nfs_mount_init_srv") {
+ printf("nfs_init_server: nfs_get_client failed: errno %d (%s)\n",
+ $arg1, errno_str($arg1));
+}
+probe kernel.mark("nfs_mount_sget") {
+ printf("nfs_init_server: nfs_get_client failed: errno %d (%s)\n",
+ $arg1, errno_str($arg1));
+}
+
+probe kernel.mark("nfs_init_server_rpcclient") {
+ printf("nfs_init_server_rpcclient: server %p flavor %d\n",
+ $arg1, $arg2);
+}
+probe kernel.mark("nfs_init_server_rpcclient_clone") {
+ printf("nfs_init_server_rpcclient: rpc_clone_client failed errno %d (%s)\n",
+ $arg1, errno_str($arg1));
+}
+probe kernel.mark("nfs_init_server_rpcclient_auth") {
+ printf("nfs_init_server_rpcclient: rpcauth_create failed errno %d (%s)\n",
+ $arg1, errno_str($arg1));
+}
+probe kernel.mark("nfs_start_lockd") {
+ // struct nfs_server *server = $arg1;
+
+ printf("nfs_start_lockd: lockd_up_proto failed errno %d (%s)\n",
+ $arg2, errno_str($arg2));
+}
+probe kernel.mark("nfs_probe_fsinfo") {
+ printf("nfs_probe_fsinfo: server %p mntfh %p fattr %p\n",
+ $arg1, $arg2, $arg3);
+}
+probe kernel.mark("nfs_probe_fsinfo_setcaps") {
+ printf("nfs_probe_fsinfo: set_capabilities failed errno %d (%s)\n",
+ $arg1, errno_str($arg1));
+}
+probe kernel.mark("nfs_probe_fsinfo_fsinfo") {
+ printf("nfs_probe_fsinfo: fsinfo failed errno %d (%s)\n",
+ $arg1, errno_str($arg1));
+}
+probe kernel.mark("nfs_create_server") {
+ // struct nfs_server *server = $arg1;
+ // struct nfs_fh *mntfh = $arg2;
+
+ printf("nfs_create_server: getattr failed errno %d (%s)\n",
+ $arg3, errno_str($arg3));
+}
+probe begin { log("starting nfs_mount trace") }
+probe end { log("ending nfs_mount trace") }

2009-01-16 18:53:47

by Chuck Lever

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/5] NFS: trace points added to mounting path


On Jan 16, 2009, at Jan 16, 2009, 11:22 AM, Steve Dickson wrote:

> Hello,
>
> Very recently patches were added to the mainline kernel that
> enabled the use of trace points. This patch series takes
> advantage of those patch by introducing trace points
> to the mounting path of NFS mounts. Its hoped these
> trace points can be used by system administrators to
> identify why NFS mounts are failing or hang in
> production kernels.
>
>
> IMHO, one general problem with today's "canned" NFS debugging today
> is it
> becomes very verbose very quickly.... "I get here" and "I get there"
> type of
> debugging statements. Although they help trace the code but very
> rarely
> shows/defines what the actual problem was. So what I've try to do is
> "define the error paths" by putting a trace point at every error exit
> in hopes to define where and why things broke.
>
> So the ultimate goal would be to replace all the dprintks with trace
> points
> but still be able to enable them through the rpcdebug command
> (although we
> might want to think about splitting the command out into three
> different
> commands nfsdebug, nfsddebug, rpcdebug). Since trace points have
> very little
> overhead, a set of trace points could be enable in production with
> have
> little or no effect on functionality or performance.
>
> Another advantage with trace points is the type and amount of
> information that can be retrieved. With these trace points, I'm
> passing in the error code as well as the data structure[s] associated
> with that error. This allows the "canned" information that IT people
> would used (via the rpcdebug command which would turn on a group of
> trace points) as well as more detailed information that kernel
> developers
> can used (via systemtap scripts which would turn on individual trace
> points).
>
> Patch summary:
> * fs/nfs/client.c
>
> * fs/nfs/getroot.c
>
> * fs/nfs/super.c
>
> The based files where traces where added.
>
> * include/trace/nfs.h
>
> * kernel/trace/Makefile
>
> * kernel/trace/nfs-trace.c
>
> The overhead of added the trace points and then converting them
> into trace marks .
>
> * samples/nfs/nfs_mount.stp
>
> The systemtap script used to access the trace marks. I probably
> should have documented the file better, but the first three
> functions in the file are how structures are pulled from the
> kernel. The rest are probes used to active the trace markers.
>
>
> Comments... Acceptance??


I'm all for improving the observability of the NFS client.

But I don't (yet) see the advantage of adding this complexity in the
mount path. Maybe the more complex and asynchronous parts of the NFS
client, like the cached read and write paths, are more suitable to
this type of tool.

Why can't we simply improve the information content of the dprintks?
Can you give a few real examples of problems that these new trace
points can identify that better dprintks wouldn't be able to address?
Generally, what kind of problems do admins face that the dprintks
don't handle today, and what are the alternatives to addressing those
issues?

Do admins who run enterprise kernels actually use SystemTap, or do
they fall back on network traces and other tried and true
troubleshooting methodologies?

If we think the mount path needs such instrumentation, consider
updating fs/nfs/mount_clnt.c and net/sunrpc/rpcb_clnt.c as well.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com

2009-01-16 23:44:50

by Greg Banks

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/5] NFS: trace points added to mounting path

Steve Dickson wrote:
> So the ultimate goal would be to replace all the dprintks with trace points
> but still be able to enable them through the rpcdebug command
I have a patch which changes the definition of the dprintk() macro (but
*not* dprintk() callsites) to allow enabling and disabling individual
dprintk() statements through a /proc/ interface. Would you be
interested in that?

--
Greg Banks, P.Engineer, SGI Australian Software Group.
the brightly coloured sporks of revolution.
I don't speak for SGI.


2009-01-17 16:16:01

by Frank Ch. Eigler

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/5] NFS: trace points added to mounting path



Greg Banks <gnb-cP1dWloDopni96+mSzHFpQC/[email protected]> writes:

> I have a patch which changes the definition of the dprintk() macro
> (but *not* dprintk() callsites) to allow enabling and disabling
> individual dprintk() statements through a /proc/ interface. Would
> you be interested in that?

It would make more sense to me to turn dprintk's into trace_marks, then
use http://lkml.org/lkml/2008/12/30/297 to control transmission of the
data to ftrace.

- FChE


2009-01-18 16:40:37

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/5] NFS: trace points added to mounting path

On Fri, Jan 16, 2009 at 11:22:41AM -0500, Steve Dickson wrote:
> Comments... Acceptance??

Please add support to actually make use of these without a big pile of
out of tree crap.

2009-01-18 17:48:15

by Frank Ch. Eigler

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/5] NFS: trace points added to mounting path



Hi -

On Sat, Jan 17, 2009 at 09:58:45PM -0600, Tom Tucker wrote:
> [...]
> >>I have a patch which changes the definition of the dprintk() macro
> >>(but *not* dprintk() callsites) to allow enabling and disabling
> >>individual dprintk() statements through a /proc/ interface. Would
> >>you be interested in that?
> >
> >It would make more sense to me to turn dprintk's into trace_marks, then
> >use http://lkml.org/lkml/2008/12/30/297 to control transmission of the
> >data to ftrace.

> [...]
> That said, and sorry for my ignorance on trace markers, but:
>
> - Could you describe how we would define "classes" of trace markers. I
> certainly don't want to have to turn on and off each call-site
> individually. How would these classes be different than adding more bits
> to the current rpc_debug mechanism?

You're well prepared to reuse the classes you already have - see below.


> - From Steve's patches, it's not obvious to me how we would convert
> dprintk to trace markers without visiting every single call site. Can
> the current macros be munged to use the trace marker interfaces without
> losing debug information?

The minimal possibility is to just do something like this:

#define dprintk(format...) trace_mark(nfs_message, format)

Or for the dfprintk that includes the facility symbol:

#define dfprintk(facility,format...) trace_mark (nfs_#facility, format)

The result would be to have one marker family for each
{RPC,NFS,...}DBG_* type - or any other "facility" symbol you invent on
the spot. All members of each family can be enabled by attaching a
marker handler to the "nfs_FACILITY" name. With Lai's markers->ftrace
proposed patch, this would be done from user-space by something like:

echo -n 'nfs_FACILITY' > /debugfs/tracing/tracing_markers

(You can take away the nfs_ prefix if you like.)

> - What is the overhead of an "inactive" trace marker in data size
> and execution time relative to a dprintk?

It should be similar. The global {nfs,rpc,...}_debug numbers could go
away and just rely on the marker's built-in on/off control API.

> - What is the overhead of an "active" trace marker in data size and
> execution time relative to a dprintk?

If the markers end up being channeled to the ftrace buffers, it should
be significantly lighter-weight than sending them to printk. If some
other marker consumer connects also or instead, it depends on what
that does.


- FChE


2009-01-18 23:13:23

by Greg Banks

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/5] NFS: trace points added to mounting path

Frank Ch. Eigler wrote:
> Greg Banks <gnb-cP1dWloDopni96+mSzHFpQC/[email protected]> writes:
>
>
>> I have a patch which changes the definition of the dprintk() macro
>> (but *not* dprintk() callsites) to allow enabling and disabling
>> individual dprintk() statements through a /proc/ interface. Would
>> you be interested in that?
>>
>
> It would make more sense to me to turn dprintk's into trace_marks,
Umm, ok. Sorry to be so ignorant but where would I find the doc that
tells me about adding trace marks ?

> then
> use http://lkml.org/lkml/2008/12/30/297 to control transmission of the
> data to ftrace.
>
>
The control interface seems a little primitive. It seems like you can
only activate and deactivate single printks ? I don't see a way to e.g.
activate every trace make in a particular function, or in a particular
.c file. I thought both of these were useful things to do, so I
implemented them. Below is an extract from the doc that accompanies the
sgi-dprintk module.

The dprintk module has even more useful features:

* Simple query language allows turning on and off dprintks by matching
any combination of:

- source filename
- function name
- line number (including ranges of line numbers)
- module name
- format string

* Provides a /proc/dprintk which can be read to display the complete
list of all dprintk()s known, to help guide you

* The module is optional. The NFS dprintk()s still work with the
/proc/sys/sunrpc/ bitmasks. The dprintk module can be loaded or
unloaded at any time.

* In addition to enabling the print, two other behaviours can be enabled:

- printing a kernel stack trace
- crashing the kernel, so that a dump can be taken

...

Viewing dprintk() Behaviour
===========================

You can view the currently configured behaviour of all the dprintk()s
in loaded modules by reading /proc/dprintk. For example:

nullarbor:~ # cat /proc/dprintk
# filename:lineno [module]function flags format
/usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svc_rdma.c:323 [svcxprt_rdma]svc_rdma_cleanup - "SVCRDMA\040Module\040Removed,\040deregister\040RPC\040RDMA\040transport\012"
/usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svc_rdma.c:341 [svcxprt_rdma]svc_rdma_init - "\011max_inline\040\040\040\040\040\040\040:\040%d\012"
/usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svc_rdma.c:340 [svcxprt_rdma]svc_rdma_init - "\011sq_depth\040\040\040\040\040\040\040\040\040:\040%d\012"
/usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svc_rdma.c:338 [svcxprt_rdma]svc_rdma_init - "\011max_requests\040\040\040\040\040:\040%d\012"
...

Command Language Reference
==========================

At the lexical level, a command comprises a sequence of words separated
by whitespace characters. Note that newlines are treated as word
separators and do *not* end a command or allow multiple commands to
be done together. So these are all equivalent:

nullarbor:~ # echo -c 'file svcsock.c line 1603 +p' > /proc/dprintk
nullarbor:~ # echo -c ' file svcsock.c line 1603 +p ' > /proc/dprintk
nullarbor:~ # echo -c 'file svcsock.c\nline 1603 +p' > /proc/dprintk
nullarbor:~ # echo -n 'file svcsock.c line 1603 +p' > /proc/dprintk

Commands are bounded by a write() system call. If you want to do
multiple commands you need to do a separate "echo" for each, like:

nullarbor:~ # echo 'file svcsock.c line 1603 +p' > /proc/dprintk ;\
> echo 'file svcsock.c line 1563 +p' > /proc/dprintk

or even like:

nullarbor:~ # (
> echo 'file svcsock.c line 1603 +p' ;\
> echo 'file svcsock.c line 1563 +p' ;\
> ) > /proc/dprintk

At the syntactical level, a command comprises a sequence of match
specifications, followed by a flags change specification.

command ::= match-spec* flags-spec

The match-spec's are used to choose a subset of the known dprintk()
callsites to which to apply the flags-spec. Think of them as a query
with implicit ANDs between each pair. Note that an empty list of
match-specs is possible, but is not very useful because it will not
match any dprintk() callsites.

A match specification comprises a keyword, which controls the attribute
of the callsite to be compared, and a value to compare against. Possible
keywords are:

match-spec ::= 'func' string |
'file' string |
'module' string |
'format' string |
'line' line-range

line-range ::= lineno |
'-'lineno |
lineno'-' |
lineno'-'lineno
// Note: line-range cannot contain space, e.g.
// "1-30" is valid range but "1 - 30" is not.

lineno ::= unsigned-int

The meanings of each keyword are:

func
The given string is compared against the function name
of each callsite. Example:

func svc_tcp_accept

file
The given string is compared against either the full
pathname or the basename of the source file of each
callsite. Examples:

file svcsock.c
file /usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svcsock.c

module
The given string is compared against the module name
of each callsite. The module name is the string as
seen in "lsmod", i.e. without the directory or the .ko
suffix and with '-' changed to '_'. Examples:

module sunrpc
module nfsd

format
The given string is searched for in the dprintk() format
string. Note that the string does not need to match the
entire format, only some part. Whitespace and other
special characters can be escaped using C octal character
escape \ooo notation, e.g. the space character is \040.
Examples:

format svcrdma: // many of the NFS/RDMA server dprintks
format readahead // some dprintks in the readahead cache
format nfsd:\040SETATTR // how to match a format with whitespace

line
The given line number or range of line numbers is compared
against the line number of each dprintk() callsite. A single
line number matches the callsite line number exactly. A
range of line numbers matches any callsite between the first
and last line number inclusive. An empty first number means
the first line in the file, an empty line number means the
last number in the file. Examples:

line 1603 // exactly line 1603
line 1600-1605 // the six lines from line 1600 to line 1605
line -1605 // the 1605 lines from line 1 to line 1605
line 1600- // all lines from line 1600 to the end of the file

The flags specification comprises a change operation followed
by one or more flag characters. The change operation is one
of the characters:

-
remove the given flags

+
add the given flags

=
set the flags to the given flags

The flags are:

p
Causes a printk() message to be emitted to dmesg,
i.e. the obvious semantic of a dprintk().

s
Causes a kernel stack trace to be emitted to dmesg.
The printk() is emitted first, even if the 'p' flag
is not specified.

c
Causes the kernel to panic using the kernel BUG()
macro. This will cause the machine to drop into KDB
or take a kernel crash dump, according to how the
machine has been configured. The printk() is emitted
first, even if the 'p' flag is not specified.

Note the regexp ^[-+=][scp]+$ matches a flags specification.
Note also that there is no convenient syntax to remove all
the flags at once, you need to use "-psc".

Examples
========

// enable the message at line 1603 of file svcsock.c
nullarbor:~ # echo -n 'file svcsock.c line 1603 +p' > /proc/dprintk

// enable all the messages in file svcsock.c
nullarbor:~ # echo -n 'file svcsock.c +p' > /proc/dprintk

// enable all the messages in file svcsock.c
nullarbor:~ # echo -n 'file svcsock.c +p' > /proc/dprintk

// enable all the messages in the NFS server module
nullarbor:~ # echo -n 'module nfsd +p' > /proc/dprintk

// enable all 12 messages in the function svc_process()
nullarbor:~ # echo -n 'func svc_process +p' > /proc/dprintk

// disable all 12 messages in the function svc_process()
nullarbor:~ # echo -n 'func svc_process -p' > /proc/dprintk

// print a stack trace on every upcall to rpc.mountd or rpc.idmapd
nullarbor:~ # echo -n 'format Want\040update,\040refage +s' > /proc/dprintk

// cause a kernel crash dump when an RPC call to an
// unknown RPC program number is received
nullarbor:~ # echo -n 'format unknown\040program +c' > /proc/dprintk



--
Greg Banks, P.Engineer, SGI Australian Software Group.
the brightly coloured sporks of revolution.
I don't speak for SGI.