LinuxLists.cc - [PATCH] vfs: Add a trace point in the mark_inode

2009-10-26 05:52:35

Subject: [PATCH] vfs: Add a trace point in the mark_inode_dirty function

>From b894af8a33bec621dd1a4126603a3ca372bf0643 Mon Sep 17 00:00:00 2001
From: Arjan van de Ven <[email protected]>
Date: Sun, 25 Oct 2009 15:37:04 -0700
Subject: [PATCH] vfs: Add a trace point in the mark_inode_dirty function

PowerTOP would like to be able to show who is keeping the disk
busy by dirtying data. The most logical spot for this is in the vfs
in the mark_inode_dirty() function. Doing this on the block level
is not possible because by the time the IO hits the block layer the
guilty party can no longer be found ("kjournald" and "pdflush" are not
useful answers to "who caused this file to be dirty).

The trace point follows the same logic/style as the block_dump code
and pretty much dumps the same data, just not to dmesg (and thus to
/var/log/messages) but via the trace events streams.

Signed-of-by: Arjan van de Ven <[email protected]>
---
fs/fs-writeback.c | 3 ++
fs/inode.c | 4 +++
include/trace/events/vfs.h | 53 ++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 60 insertions(+), 0 deletions(-)
create mode 100644 include/trace/events/vfs.h

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 9d5360c..4102f20 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -25,6 +25,7 @@
#include <linux/blkdev.h>
#include <linux/backing-dev.h>
#include <linux/buffer_head.h>
+#include <trace/events/vfs.h>
#include "internal.h"

#define inode_to_bdi(inode) ((inode)->i_mapping->backing_dev_info)
@@ -1071,6 +1072,8 @@ void __mark_inode_dirty(struct inode *inode, int flags)
if ((inode->i_state & flags) == flags)
return;

+ trace_dirty_inode(inode, current);
+
if (unlikely(block_dump))
block_dump___mark_inode_dirty(inode);

diff --git a/fs/inode.c b/fs/inode.c
index 4d8e3be..a61e8ba 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1624,3 +1624,7 @@ void init_special_inode(struct inode *inode, umode_t mode, dev_t rdev)
inode->i_ino);
}
EXPORT_SYMBOL(init_special_inode);
+
+#define CREATE_TRACE_POINTS
+#include <trace/events/vfs.h>
+
diff --git a/include/trace/events/vfs.h b/include/trace/events/vfs.h
new file mode 100644
index 0000000..21cf9fb
--- /dev/null
+++ b/include/trace/events/vfs.h
@@ -0,0 +1,53 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM vfs
+
+#if !defined(_TRACE_VFS_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_VFS_H
+
+/*
+ * Tracepoint for dirtying an inode:
+ */
+TRACE_EVENT(dirty_inode,
+
+ TP_PROTO(struct inode *inode, struct task_struct *task),
+
+ TP_ARGS(inode, task),
+
+ TP_STRUCT__entry(
+ __array( char, comm, TASK_COMM_LEN )
+ __field( pid_t, pid )
+ __array( char, dev, 16 )
+ __array( char, file, 32 )
+ ),
+
+ TP_fast_assign(
+ if (inode->i_ino || strcmp(inode->i_sb->s_id, "bdev")) {
+ struct dentry *dentry;
+ const char *name = "?";
+
+ dentry = d_find_alias(inode);
+ if (dentry) {
+ spin_lock(&dentry->d_lock);
+ name = (const char *) dentry->d_name.name;
+ }
+
+ memcpy(__entry->comm, task->comm, TASK_COMM_LEN);
+ __entry->pid = task->pid;
+ strlcpy(__entry->file, name, 32);
+ strlcpy(__entry->dev, inode->i_sb->s_id, 16);
+
+ if (dentry) {
+ spin_unlock(&dentry->d_lock);
+ dput(dentry);
+ }
+ }
+ ),
+
+ TP_printk("task=%i (%s) file=%s dev=%s",
+ __entry->pid, __entry->comm, __entry->file, __entry->dev)
+);
+
+#endif /* _TRACE_VFS_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
--
1.6.2.5

--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2009-10-26 06:03:50

by Andrew Morton

[permalink] [raw]

Subject: Re: [PATCH] vfs: Add a trace point in the mark_inode_dirty function

On Sun, 25 Oct 2009 22:53:42 -0700 Arjan van de Ven <[email protected]> wrote:

> >From b894af8a33bec621dd1a4126603a3ca372bf0643 Mon Sep 17 00:00:00 2001
> From: Arjan van de Ven <[email protected]>
> Date: Sun, 25 Oct 2009 15:37:04 -0700
> Subject: [PATCH] vfs: Add a trace point in the mark_inode_dirty function
>
> PowerTOP would like to be able to show who is keeping the disk
> busy by dirtying data. The most logical spot for this is in the vfs
> in the mark_inode_dirty() function. Doing this on the block level
> is not possible because by the time the IO hits the block layer the
> guilty party can no longer be found ("kjournald" and "pdflush" are not
> useful answers to "who caused this file to be dirty).
>
> The trace point follows the same logic/style as the block_dump code
> and pretty much dumps the same data, just not to dmesg (and thus to
> /var/log/messages) but via the trace events streams.
>
> ...
>
> @@ -1071,6 +1072,8 @@ void __mark_inode_dirty(struct inode *inode, int flags)
> if ((inode->i_state & flags) == flags)
> return;
>
> + trace_dirty_inode(inode, current);
> +
> if (unlikely(block_dump))
> block_dump___mark_inode_dirty(inode);
>

Doesn't powertop also want to know who is spinning up the disk via
buffered reads, direct-io reads and direct-io writes?

That's why the block_dump hook in submit_bio() is there.

2009-10-26 06:54:06

by Arjan van de Ven

[permalink] [raw]

Subject: Re: [PATCH] vfs: Add a trace point in the mark_inode_dirty function

On Sun, 25 Oct 2009 23:03:21 -0700
Andrew Morton <[email protected]> wrote:

> > ...
> >
> > @@ -1071,6 +1072,8 @@ void __mark_inode_dirty(struct inode *inode,
> > int flags) if ((inode->i_state & flags) == flags)
> > return;
> >
> > + trace_dirty_inode(inode, current);
> > +
> > if (unlikely(block_dump))
> > block_dump___mark_inode_dirty(inode);
> >
>
> Doesn't powertop also want to know who is spinning up the disk via
> buffered reads, direct-io reads and direct-io writes?
>
> That's why the block_dump hook in submit_bio() is there.

there is already an existing trace event for this in the block layer...
I was planning to use block_bio_queue for this.

--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2009-10-27 16:02:43

by Jason Baron

[permalink] [raw]

Subject: Re: [PATCH] vfs: Add a trace point in the mark_inode_dirty function

On Sun, Oct 25, 2009 at 10:53:42PM -0700, Arjan van de Ven wrote:
> diff --git a/include/trace/events/vfs.h b/include/trace/events/vfs.h
> new file mode 100644
> index 0000000..21cf9fb
> --- /dev/null
> +++ b/include/trace/events/vfs.h
> @@ -0,0 +1,53 @@
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM vfs
> +
> +#if !defined(_TRACE_VFS_H) || defined(TRACE_HEADER_MULTI_READ)
> +#define _TRACE_VFS_H
> +
> +/*
> + * Tracepoint for dirtying an inode:
> + */

hi,

I would like to see us add docbook style comments for each added trace
event. I've already started a docbook for this, and written comments for
the irq tracepoints, in include/trace/events/irq.h. The docbook can be
viewed at: http://www.kernel.org/doc/htmldocs/tracepoint/

I think there has been some talk that the documentation should be
included as part of the kernel, so that 'perf' can easily access this. I
don't see how these docbook style comments wouldn't help with that
effort as well. We already have scripts to parse these comments, and it
probably wouldn't be hard to extend them to create a binary version, if
needed. Further, as your above comment indicates, these types of
comments want to be included in the source code above the TRACE_EVENTS
macros.

The initial lkml postings were:

1) http://lkml.indiana.edu/hypermail/linux/kernel/0904.3/02647.html
2) http://lkml.indiana.edu/hypermail/linux/kernel/0904.3/02650.html
3) http://lkml.indiana.edu/hypermail/linux/kernel/0904.3/02648.html
4) http://lkml.indiana.edu/hypermail/linux/kernel/0904.3/02651.html

thanks,

-Jason

2009-11-11 02:01:07

by Fengguang Wu

[permalink] [raw]

Subject: Re: [PATCH] vfs: Add a trace point in the mark_inode_dirty function

Hi Arjan,

> + TP_printk("task=%i (%s) file=%s dev=%s",
> + __entry->pid, __entry->comm, __entry->file, __entry->dev)

Maybe this is enough for POWERTOP, however for general use, the dirty
type(data/metadata) and inode number may be valuable to some users?

Thanks,
Fengguang

2009-11-11 02:34:51

by Li Zefan

[permalink] [raw]

Subject: Re: [PATCH] vfs: Add a trace point in the mark_inode_dirty function

> +/*
> + * Tracepoint for dirtying an inode:
> + */
> +TRACE_EVENT(dirty_inode,
> +
> + TP_PROTO(struct inode *inode, struct task_struct *task),
> +
> + TP_ARGS(inode, task),
> +
> + TP_STRUCT__entry(
> + __array( char, comm, TASK_COMM_LEN )
> + __field( pid_t, pid )
> + __array( char, dev, 16 )
> + __array( char, file, 32 )
> + ),
> +
> + TP_fast_assign(
> + if (inode->i_ino || strcmp(inode->i_sb->s_id, "bdev")) {
> + struct dentry *dentry;
> + const char *name = "?";
> +
> + dentry = d_find_alias(inode);
> + if (dentry) {
> + spin_lock(&dentry->d_lock);
> + name = (const char *) dentry->d_name.name;
> + }
> +
> + memcpy(__entry->comm, task->comm, TASK_COMM_LEN);
> + __entry->pid = task->pid;
> + strlcpy(__entry->file, name, 32);
> + strlcpy(__entry->dev, inode->i_sb->s_id, 16);
> +
> + if (dentry) {
> + spin_unlock(&dentry->d_lock);
> + dput(dentry);
> + }
> + }

This will leave __entry->comm, __entry->file and __entry->dev
uninitialized in the "else" case..

And is there any reason that we have to use __array() but can't
use __string()?

> + ),
> +
> + TP_printk("task=%i (%s) file=%s dev=%s",
> + __entry->pid, __entry->comm, __entry->file, __entry->dev)
> +);
> +
> +#endif /* _TRACE_VFS_H */

2009-11-11 06:33:30

by Arjan van de Ven

[permalink] [raw]

Subject: Re: [PATCH] vfs: Add a trace point in the mark_inode_dirty function

On Wed, 11 Nov 2009 10:01:08 +0800
Wu Fengguang <[email protected]> wrote:

> Hi Arjan,
>
> > + TP_printk("task=%i (%s) file=%s dev=%s",
> > + __entry->pid, __entry->comm, __entry->file,
> > __entry->dev)
>
> Maybe this is enough for POWERTOP, however for general use, the dirty
> type(data/metadata) and inode number may be valuable to some users?

what can a user do with an inode number????

--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2009-11-11 06:40:26

by Fengguang Wu

[permalink] [raw]

Subject: Re: [PATCH] vfs: Add a trace point in the mark_inode_dirty function

On Wed, Nov 11, 2009 at 02:34:56PM +0800, Arjan van de Ven wrote:
> On Wed, 11 Nov 2009 10:01:08 +0800
> Wu Fengguang <[email protected]> wrote:
>
> > Hi Arjan,
> >
> > > + TP_printk("task=%i (%s) file=%s dev=%s",
> > > + __entry->pid, __entry->comm, __entry->file,
> > > __entry->dev)
> >
> > Maybe this is enough for POWERTOP, however for general use, the dirty
> > type(data/metadata) and inode number may be valuable to some users?
>
> what can a user do with an inode number????

Hmm, maybe to tell one file from another in the trace stream :)
And maybe to compare it with other data sources, like "ls -i".

Thanks,
Fengguang

2009-11-11 07:42:42

by Jeff Garzik

[permalink] [raw]

Subject: Re: [PATCH] vfs: Add a trace point in the mark_inode_dirty function

On 11/11/2009 01:34 AM, Arjan van de Ven wrote:
> Wu Fengguang<[email protected]> wrote:
>> Maybe this is enough for POWERTOP, however for general use, the dirty
>> type(data/metadata) and inode number may be valuable to some users?
>
> what can a user do with an inode number????

Inode numbers have always been visible to userspace... IIRC, tar(1)
uses the st_ino member of struct stat to detect hard links in certain
cases. ls(1) displays inode numbers with -i, find(1) looks for them
with -inum, ...

Jeff

2009-11-11 07:45:55

by Ingo Molnar

[permalink] [raw]

Subject: Re: [PATCH] vfs: Add a trace point in the mark_inode_dirty function

* Jeff Garzik <[email protected]> wrote:

> On 11/11/2009 01:34 AM, Arjan van de Ven wrote:
> >Wu Fengguang<[email protected]> wrote:
> >>Maybe this is enough for POWERTOP, however for general use, the dirty
> >>type(data/metadata) and inode number may be valuable to some users?
> >
> >what can a user do with an inode number????
>
> Inode numbers have always been visible to userspace... IIRC, tar(1)
> uses the st_ino member of struct stat to detect hard links in certain
> cases. ls(1) displays inode numbers with -i, find(1) looks for them
> with -inum, ...

Without an inode->vfs-name lookup/matching service it's of limited
utility though to developers and users. So inode numbers are fine (as
nicely unique physical identifiers)- as long as corresponding vfs name
string is available too.

Ingo

2009-11-11 07:56:34

by Jeff Garzik

[permalink] [raw]

Subject: Re: [PATCH] vfs: Add a trace point in the mark_inode_dirty function

On 11/11/2009 02:45 AM, Ingo Molnar wrote:
>
> * Jeff Garzik<[email protected]> wrote:
>
>> On 11/11/2009 01:34 AM, Arjan van de Ven wrote:
>>> Wu Fengguang<[email protected]> wrote:
>>>> Maybe this is enough for POWERTOP, however for general use, the dirty
>>>> type(data/metadata) and inode number may be valuable to some users?
>>>
>>> what can a user do with an inode number????
>>
>> Inode numbers have always been visible to userspace... IIRC, tar(1)
>> uses the st_ino member of struct stat to detect hard links in certain
>> cases. ls(1) displays inode numbers with -i, find(1) looks for them
>> with -inum, ...
>
> Without an inode->vfs-name lookup/matching service it's of limited
> utility

Look in the quoted text for one such service... :)

Jeff

2009-11-11 11:15:45

by Ingo Molnar

[permalink] [raw]

Subject: Re: [PATCH] vfs: Add a trace point in the mark_inode_dirty function

* Jeff Garzik <[email protected]> wrote:

> On 11/11/2009 02:45 AM, Ingo Molnar wrote:
> >
> >* Jeff Garzik<[email protected]> wrote:
> >
> >>On 11/11/2009 01:34 AM, Arjan van de Ven wrote:
> >>>Wu Fengguang<[email protected]> wrote:
> >>>>Maybe this is enough for POWERTOP, however for general use, the dirty
> >>>>type(data/metadata) and inode number may be valuable to some users?
> >>>
> >>>what can a user do with an inode number????
> >>
> >>Inode numbers have always been visible to userspace... IIRC, tar(1)
> >>uses the st_ino member of struct stat to detect hard links in certain
> >>cases. ls(1) displays inode numbers with -i, find(1) looks for them
> >>with -inum, ...
> >
> >Without an inode->vfs-name lookup/matching service it's of limited
> >utility
>
> Look in the quoted text for one such service... :)

I'm not sure i understand your point - do you really suggest using
find(1) to make traces readable?

Ingo

2009-11-11 16:17:41

by Arjan van de Ven

[permalink] [raw]

Subject: Re: [PATCH] vfs: Add a trace point in the mark_inode_dirty function

On Wed, 11 Nov 2009 02:42:39 -0500
Jeff Garzik <[email protected]> wrote:

> On 11/11/2009 01:34 AM, Arjan van de Ven wrote:
> > Wu Fengguang<[email protected]> wrote:
> >> Maybe this is enough for POWERTOP, however for general use, the
> >> dirty type(data/metadata) and inode number may be valuable to some
> >> users?
> >
> > what can a user do with an inode number????
>
> Inode numbers have always been visible to userspace... IIRC, tar(1)
> uses the st_ino member of struct stat to detect hard links in certain
> cases. ls(1) displays inode numbers with -i, find(1) looks for them
> with -inum, ...

ok let me rephrase that. What would a user DO with this inode number
information, given that the filename is already passed on;
what additional useful, not-infinitely-hard-to-get piece of information
does this give the user that he can use to do <something> that he cannot
do with the current information ? E.g. what is that <something> ?

--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2009-11-11 17:27:44

by Kok, Auke

[permalink] [raw]

Subject: Re: [PATCH] vfs: Add a trace point in the mark_inode_dirty function

Ingo Molnar wrote:
> * Jeff Garzik <[email protected]> wrote:
>
>> On 11/11/2009 01:34 AM, Arjan van de Ven wrote:
>>> Wu Fengguang<[email protected]> wrote:
>>>> Maybe this is enough for POWERTOP, however for general use, the dirty
>>>> type(data/metadata) and inode number may be valuable to some users?
>>> what can a user do with an inode number????
>> Inode numbers have always been visible to userspace... IIRC, tar(1)
>> uses the st_ino member of struct stat to detect hard links in certain
>> cases. ls(1) displays inode numbers with -i, find(1) looks for them
>> with -inum, ...
>
> Without an inode->vfs-name lookup/matching service it's of limited
> utility though to developers and users. So inode numbers are fine (as
> nicely unique physical identifiers)- as long as corresponding vfs name
> string is available too.

agreed, this is the second problem I've stumbled upon (readahead was the first)
that could use something like this.

you can't reliably use inode numbers unless you actually know all of them at the
time that the tracepoint is generated: they could be deleted, modified etc. An
inode number is nice, but the filename and blockdev info would be superior for
both problems we're trying to solve.

Auke

2009-11-11 18:31:12

by Theodore Ts'o

[permalink] [raw]

Subject: Re: [PATCH] vfs: Add a trace point in the mark_inode_dirty function

On Wed, Nov 11, 2009 at 08:45:42AM +0100, Ingo Molnar wrote:
> Without an inode->vfs-name lookup/matching service it's of limited
> utility though to developers and users. So inode numbers are fine (as
> nicely unique physical identifiers)- as long as corresponding vfs name
> string is available too.

Inode numbers are quite usable for me; but I'm not afraid to do

debugfs /dev/sdb -R "ncheck 12345"

:-)

If you really want to avoid that, one relatively lightweight thing we
could do, which would avoid needing to dump the entire pathname out,
would be to print out the triple (devno, dir_ino, file_ino), and then
provide a privileged syscall which translates this to a user-visible
pathname. It won't be necessarily the pathname which the user used to
open the file (since there might be links, and bind mounts, et. al),
but if the goal is to give one of the user-friendly names of the inode
(as opposed to _the_ pathname used to open the file), it's quite sufficient.

- Ted

2009-11-11 18:56:39

by Ingo Molnar

[permalink] [raw]

Subject: Re: [PATCH] vfs: Add a trace point in the mark_inode_dirty function

* Theodore Tso <[email protected]> wrote:

> On Wed, Nov 11, 2009 at 08:45:42AM +0100, Ingo Molnar wrote:
> > Without an inode->vfs-name lookup/matching service it's of limited
> > utility though to developers and users. So inode numbers are fine (as
> > nicely unique physical identifiers)- as long as corresponding vfs name
> > string is available too.
>
> Inode numbers are quite usable for me; but I'm not afraid to do
>
> debugfs /dev/sdb -R "ncheck 12345"
>
> :-)
>
> If you really want to avoid that, one relatively lightweight thing we
> could do, which would avoid needing to dump the entire pathname out,
> would be to print out the triple (devno, dir_ino, file_ino), and then
> provide a privileged syscall which translates this to a user-visible
> pathname. It won't be necessarily the pathname which the user used to
> open the file (since there might be links, and bind mounts, et. al),
> but if the goal is to give one of the user-friendly names of the inode
> (as opposed to _the_ pathname used to open the file), it's quite
> sufficient.

Hm, why add a new syscall to retrieve the name we already had when the
event happened?

Also, why add a new syscall to retrieve something that might not exist
anymore? (the VFS namespace is quite dynamic - post-processing to
retrieve names is fundamentally racy)

What matters most for analysis is the 'name of the moment' - the thing
that the app used at that point.

Arjan isnt doing this just randomly, he's one of the few people trying
to speed up Linux booting - and this is the info he finds useful. We
should give that information in a reasonable way, and the tracepoint he
proposed looks pretty reasonable.

Ingo

2009-11-11 23:10:37

by Frank Ch. Eigler

[permalink] [raw]

Subject: Re: [PATCH] vfs: Add a trace point in the mark_inode_dirty function

Arjan van de Ven <[email protected]> writes:

> [...] ok let me rephrase that. What would a user DO with this inode
> number information, given that the filename is already passed on;
> what additional useful, not-infinitely-hard-to-get piece of
> information [...]

Perhaps "infinitely hard" is the wrong criterion, but passing inode
numbers lets a tracepoint client track changes to the same file, even
though the file may be renamed/unlinked over time.

- FChE

2009-11-11 23:38:12

by Kok, Auke

[permalink] [raw]

Subject: Re: [PATCH] vfs: Add a trace point in the mark_inode_dirty function

Frank Ch. Eigler wrote:
> Arjan van de Ven <[email protected]> writes:
>
>> [...] ok let me rephrase that. What would a user DO with this inode
>> number information, given that the filename is already passed on;
>> what additional useful, not-infinitely-hard-to-get piece of
>> information [...]
>
> Perhaps "infinitely hard" is the wrong criterion, but passing inode
> numbers lets a tracepoint client track changes to the same file, even
> though the file may be renamed/unlinked over time.

If you already know what the file object is, sure. We're interested in the case
where we have no clue what the file object actually is to begin with. Having a
trace with a random inode number pop up and then disappear into thin air won't
help much at all, especially if we can't map it back to something "real" on disk.
in time.

Auke

2009-11-12 02:14:07

by Arjan van de Ven

[permalink] [raw]

Subject: Re: [PATCH] vfs: Add a trace point in the mark_inode_dirty function

On Wed, 11 Nov 2009 13:29:25 -0500
Theodore Tso <[email protected]> wrote:

> If you really want to avoid that, one relatively lightweight thing we
> could do, which would avoid needing to dump the entire pathname out,
> would be to print out the triple (devno, dir_ino, file_ino), and then
> provide a privileged syscall which translates this to a user-visible
> pathname. It won't be necessarily the pathname which the user used to
> open the file (since there might be links, and bind mounts, et. al),
> but if the goal is to give one of the user-friendly names of the inode
> (as opposed to _the_ pathname used to open the file), it's quite
> sufficient.

here's the problem. PowerTOP wants to tell you "application FOO caused
the disk to spin up. BAD APPLICATION." And then give a suggestion as to
which file in question was involved, to give the developer a hint as to
what to fix. Oh and this has to be done 30 seconds after the actual
dirty happened of course (since clearly we don't want to wake a cpu up
or interfere with the running system).

What the patch does is perfectly fine for that. Your "solution" to get a
filename 30 seconds later with debugfs involves spinning up the disk...
... exactly the kind of thing this trace point wants to avoid in the
first place!
(and for those who say "yeah but the file may have been renamed". I
don't care! I want to show the filename that the app dirtied! Not what
name that file is right now, 30 seconds later. At least, I want to show
enough so that the developer of the app knows how to fix his pile of
power-eating-dung.

Fixing an OS this way (and yes I have used this patch to fix Moblin and
other partner linux OSes) saves around 0.5 Watts or so. Quite worth it
in terms of battery life.

I would like to turn this around: The current patch is sufficient for my
real-world usecase. I am still looking for the "other" mythical usecase
that would need an inode number instead.... lets put it this way: if
one of those comes around it's trivial to extend the trace data in an
ABI compatible way to include that. Until then.. it's just bloat.

--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2009-11-12 07:22:28

by Ingo Molnar

[permalink] [raw]

Subject: Re: [PATCH] vfs: Add a trace point in the mark_inode_dirty function

* Kok, Auke <[email protected]> wrote:

> If you already know what the file object is, sure. We're interested in
> the case where we have no clue what the file object actually is to
> begin with. Having a trace with a random inode number pop up and then
> disappear into thin air won't help much at all, especially if we can't
> map it back to something "real" on disk. in time.

Yep.

It's similar to PID/comm tracing, which we already do consistently for
all major task events such as fork/exit, sleep/wakeup/context-switch,
etc.

By the 'use inode numbers' argument it should be perfectly fine to only
trace the physical PID itself, and look up the comm later in /proc, or
to add a syscall to do it.

In reality it's not fine. Not just the unnecessary overhead (you have to
look up something you already had) - but also that tasks will exit in
high-freq workloads (so the comm is lost), the PID might not match up
anymore, tasks can change their comm, etc.

The most important principle with event logging is that we want the most
high quality information and we want to a trustable and simple data
source: so for tasks we want the PID and the comm, and for files we want
the top name component and perhaps also the inode number (plus a
filesystem id), captured when the event happened.

Ingo

2009-11-15 18:59:09

by Arjan van de Ven

[permalink] [raw]

Subject: Re: [PATCH] vfs: Add a trace point in the mark_inode_dirty function

On Wed, 11 Nov 2009 10:33:55 +0800
Li Zefan <[email protected]> wrote:
>
> This will leave __entry->comm, __entry->file and __entry->dev
> uninitialized in the "else" case..

I would expect the struct to be zeroed before.... is it?
(the TP_ stuff is so obscure I find it hard to figure out where it ends
up)

--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2009-11-16 00:57:12

by Li Zefan

[permalink] [raw]

Subject: Re: [PATCH] vfs: Add a trace point in the mark_inode_dirty function

Arjan van de Ven wrote:
> On Wed, 11 Nov 2009 10:33:55 +0800
> Li Zefan <[email protected]> wrote:
>> This will leave __entry->comm, __entry->file and __entry->dev
>> uninitialized in the "else" case..
>
> I would expect the struct to be zeroed before.... is it?
> (the TP_ stuff is so obscure I find it hard to figure out where it ends
> up)
>

No, it won't. Ring buffer won't zero the buffer before returning it,
and TP_ stuff won't zero the buffer after getting it.

See static void ftrace_raw_event_##call(proto) in include/event/ftrace.h

2009-11-20 10:43:47

by Christoph Hellwig

[permalink] [raw]

Subject: Re: [PATCH] vfs: Add a trace point in the mark_inode_dirty function

Guys, I think both the inode number and name do have a use case. For
file system developers observing the filesystem the inode number is very
useful, and if you look at the ext4 tracing already in tree or the xfs
tracing going in in the next window they use the inode number all over.

Which btw brings up another good argument - to make the tracing really
useful we need to have conventions. While the inode number seems to be
a realtively easy one printing the device is more difficult. XFS just
prints the raw major/minor to stay simple, ext4 has a complicated ad-hoc
cache of device names, and this one just prints the superblock id
string.

Of course for a user the name is a lot more meaninful, but also
relatively expensive to generate. Then again I'm not even sure how the
last pathname component only here is all that useful - it can't be used
to easily find the file.

2009-11-20 10:52:05

by Ingo Molnar

[permalink] [raw]

Subject: Re: [PATCH] vfs: Add a trace point in the mark_inode_dirty function

* Christoph Hellwig <[email protected]> wrote:

> Guys, I think both the inode number and name do have a use case. For
> file system developers observing the filesystem the inode number is
> very useful, and if you look at the ext4 tracing already in tree or
> the xfs tracing going in in the next window they use the inode number
> all over.
>
> Which btw brings up another good argument - to make the tracing really
> useful we need to have conventions. While the inode number seems to
> be a realtively easy one printing the device is more difficult. XFS
> just prints the raw major/minor to stay simple, ext4 has a complicated
> ad-hoc cache of device names, and this one just prints the superblock
> id string.

Agreed.

> Of course for a user the name is a lot more meaninful, but also
> relatively expensive to generate. Then again I'm not even sure how
> the last pathname component only here is all that useful - it can't be
> used to easily find the file.

That's not the main point though - the point is for app developers (and
users) being able to see 'oh, _that_ file it is, we need to fix that'.
In the context of a specific app, the last component filename carries
95% of the useful information.

Look at how PowerTOP does it, for a real-life usecase.

Ingo

2009-11-20 14:43:39

by Arjan van de Ven

[permalink] [raw]

Subject: Re: [PATCH] vfs: Add a trace point in the mark_inode_dirty function

On Fri, 20 Nov 2009 05:43:35 -0500
Christoph Hellwig <[email protected]> wrote:

>
> Which btw brings up another good argument - to make the tracing really
> useful we need to have conventions. While the inode number seems to
> be a realtively easy one printing the device is more difficult. XFS
> just prints the raw major/minor to stay simple, ext4 has a
> complicated ad-hoc cache of device names, and this one just prints
> the superblock id string.

I was just trying to stay compatible with blockdump, and it even makes
sense ;)

>
> Of course for a user the name is a lot more meaninful, but also
> relatively expensive to generate. Then again I'm not even sure how
> the last pathname component only here is all that useful - it can't
> be used to easily find the file.

in my case it's not about finding the file, but finding the place in
the application that is doing the writing. The last pathname component
is more than enough for this....

>

--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2009-11-20 16:05:51

by Jamie Lokier

[permalink] [raw]

Subject: Re: [PATCH] vfs: Add a trace point in the mark_inode_dirty function

Arjan van de Ven wrote:
> in my case it's not about finding the file, but finding the place in
> the application that is doing the writing. The last pathname component
> is more than enough for this....

So what you really need is the source file and line number in your
application where it does the writing :-)

-- Jamie

2009-11-20 16:44:21

by Arjan van de Ven

[permalink] [raw]

Subject: Re: [PATCH] vfs: Add a trace point in the mark_inode_dirty function

On Fri, 20 Nov 2009 16:05:26 +0000
Jamie Lokier <[email protected]> wrote:

> Arjan van de Ven wrote:
> > in my case it's not about finding the file, but finding the place in
> > the application that is doing the writing. The last pathname
> > component is more than enough for this....
>
> So what you really need is the source file and line number in your
> application where it does the writing :-)

while you present this as a joke.... perf can more or less do this
using the backtrace infrastructure ...

--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org