2018-04-13 16:44:00

by Steffen Maier

[permalink] [raw]
Subject: [PATCH v2 0/2] tracing/events: block: bring more on a par with blktrace

I had the need to understand I/O request processing in detail.
But I also had the need to enrich block traces with other trace events
including my own dynamic kprobe events. So I preferred block trace events
over blktrace to get everything nicely sorted into one ftrace output.
However, I missed device filtering for (un)plug events and also
the difference between the two flavors of unplug.

The first two patches bring block trace events closer to their
counterpart in blktrace tooling.

The last patch is just an RFC. I still kept it in this patch set because
it is inspired by PATCH 2/2.

Changes since v1:
[PATCH v2 1/2]
Use 0 and 1 instead of false and true for __print_symbolic table.
Now "trace-cmd report" can decode it. [Steven Rostedt]

Steffen Maier (3):
tracing/events: block: track and print if unplug was explicit or
schedule
tracing/events: block: dev_t via driver core for plug and unplug
events
tracing/events: block: also try to get dev_t via driver core for some
events

include/trace/events/block.h | 33 ++++++++++++++++++++++++++++-----
1 file changed, 28 insertions(+), 5 deletions(-)

--
2.13.5



2018-04-13 16:37:58

by Steffen Maier

[permalink] [raw]
Subject: [PATCH v2 1/2] tracing/events: block: track and print if unplug was explicit or schedule

Just like blktrace distinguishes explicit and schedule by means of
BLK_TA_UNPLUG_IO and BLK_TA_UNPLUG_TIMER, actually make use of the
existing argument "explicit" to distinguish the two cases in the one
common tracepoint block_unplug.

Complements v2.6.39 commit 49cac01e1fa7 ("block: make unplug timer trace
event correspond to the schedule() unplug") and commit d9c978331790
("block: remove block_unplug_timer() trace point").

Signed-off-by: Steffen Maier <[email protected]>
---

Changes since v1:
Use 0 and 1 instead of false and true for __print_symbolic table.
Now "trace-cmd report" can decode it. [Steven Rostedt]

include/trace/events/block.h | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/include/trace/events/block.h b/include/trace/events/block.h
index 81b43f5bdf23..e90bb6eb8097 100644
--- a/include/trace/events/block.h
+++ b/include/trace/events/block.h
@@ -470,6 +470,11 @@ TRACE_EVENT(block_plug,
TP_printk("[%s]", __entry->comm)
);

+#define show_block_unplug_explicit(val) \
+ __print_symbolic(val, \
+ {0, "schedule"}, \
+ {1, "explicit"})
+
DECLARE_EVENT_CLASS(block_unplug,

TP_PROTO(struct request_queue *q, unsigned int depth, bool explicit),
@@ -478,15 +483,18 @@ DECLARE_EVENT_CLASS(block_unplug,

TP_STRUCT__entry(
__field( int, nr_rq )
+ __field( bool, explicit )
__array( char, comm, TASK_COMM_LEN )
),

TP_fast_assign(
__entry->nr_rq = depth;
+ __entry->explicit = explicit;
memcpy(__entry->comm, current->comm, TASK_COMM_LEN);
),

- TP_printk("[%s] %d", __entry->comm, __entry->nr_rq)
+ TP_printk("[%s] %d %s", __entry->comm, __entry->nr_rq,
+ show_block_unplug_explicit(__entry->explicit))
);

/**
--
2.13.5


2018-04-13 16:38:39

by Steffen Maier

[permalink] [raw]
Subject: [RFC v2] tracing/events: block: also try to get dev_t via driver core for some events

Complements v2.6.31 commit 55782138e47d ("tracing/events: convert block
trace points to TRACE_EVENT()") for cases where rq->rq_disk == NULL:
block_rq_requeue, block_rq_insert, block_rq_issue;
and for cases where bio == NULL:
block_getrq, block_sleeprq.

NB: The NULL pointer check for q->kobj.parent is certainly racy and
I don't have enough experience if it's good enough for a trace event.

Since I don't know when above cases would occur,
I'm not sure this is worth it.

Signed-off-by: Steffen Maier <[email protected]>
---
include/trace/events/block.h | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/include/trace/events/block.h b/include/trace/events/block.h
index 6ea5a3899c2e..001e4f369df9 100644
--- a/include/trace/events/block.h
+++ b/include/trace/events/block.h
@@ -86,7 +86,9 @@ TRACE_EVENT(block_rq_requeue,
),

TP_fast_assign(
- __entry->dev = rq->rq_disk ? disk_devt(rq->rq_disk) : 0;
+ __entry->dev = rq->rq_disk ? disk_devt(rq->rq_disk) :
+ (q->kobj.parent ?
+ container_of(q->kobj.parent, struct device, kobj)->devt : 0);
__entry->sector = blk_rq_trace_sector(rq);
__entry->nr_sector = blk_rq_trace_nr_sectors(rq);

@@ -162,7 +164,9 @@ DECLARE_EVENT_CLASS(block_rq,
),

TP_fast_assign(
- __entry->dev = rq->rq_disk ? disk_devt(rq->rq_disk) : 0;
+ __entry->dev = rq->rq_disk ? disk_devt(rq->rq_disk) :
+ (q->kobj.parent ?
+ container_of(q->kobj.parent, struct device, kobj)->devt : 0);
__entry->sector = blk_rq_trace_sector(rq);
__entry->nr_sector = blk_rq_trace_nr_sectors(rq);
__entry->bytes = blk_rq_bytes(rq);
@@ -397,7 +401,9 @@ DECLARE_EVENT_CLASS(block_get_rq,
),

TP_fast_assign(
- __entry->dev = bio ? bio_dev(bio) : 0;
+ __entry->dev = bio ? bio_dev(bio) :
+ (q->kobj.parent ?
+ container_of(q->kobj.parent, struct device, kobj)->devt : 0);
__entry->sector = bio ? bio->bi_iter.bi_sector : 0;
__entry->nr_sector = bio ? bio_sectors(bio) : 0;
blk_fill_rwbs(__entry->rwbs,
--
2.13.5


2018-04-13 16:43:51

by Steffen Maier

[permalink] [raw]
Subject: [PATCH v2 2/2] tracing/events: block: dev_t via driver core for plug and unplug events

Complements v2.6.31 commit 55782138e47d ("tracing/events: convert block
trace points to TRACE_EVENT()") to be equivalent to traditional blktrace
output. Also this allows event filtering to not always get all (un)plug
events.

NB: The NULL pointer check for q->kobj.parent is certainly racy and
I don't have enough experience if it's good enough for a trace event.
The change did work for my cases (block device read/write I/O on
zfcp-attached SCSI disks and dm-mpath on top).

While I haven't seen any prior art using driver core (parent) relations
for trace events, there are other cases using this when no direct pointer
exists between objects, such as:
#define to_scsi_target(d) container_of(d, struct scsi_target, dev)
static inline struct scsi_target *scsi_target(struct scsi_device *sdev)
{
return to_scsi_target(sdev->sdev_gendev.parent);
}

This is the object model we make use of here:

struct gendisk {
struct hd_struct {
struct device { /*container_of*/
struct kobject kobj; <--+
dev_t devt; /*deref*/ |
} __dev; |
} part0; |
struct request_queue *queue; ..+ |
} : |
: |
struct request_queue { <..............+ |
/* queue kobject */ |
struct kobject { |
struct kobject *parent; --------+
} kobj;
}

The parent pointer comes from:
#define disk_to_dev(disk) (&(disk)->part0.__dev)
int blk_register_queue(struct gendisk *disk)
struct device *dev = disk_to_dev(disk);
struct request_queue *q = disk->queue;
ret = kobject_add(&q->kobj, kobject_get(&dev->kobj), "%s", "queue");
^^^parent

$ ls -d /sys/block/sdf/queue
/sys/block/sda/queue
$ cat /sys/block/sdf/dev
80:0

A partition does not have its own request queue:

$ cat /sys/block/sdf/sdf1/dev
8:81
$ ls -d /sys/block/sdf/sdf1/queue
ls: cannot access '/sys/block/sdf/sdf1/queue': No such file or directory

The difference to blktrace parsed output is that block events don't use the
partition's minor number but the containing block device's minor number:

$ dd if=/dev/sdf1 count=1

$ cat /sys/kernel/debug/tracing/trace
block_bio_remap: 8,80 R 2048 + 32 <- (8,81) 0
block_bio_queue: 8,80 R 2048 + 32 [dd]
block_getrq: 8,80 R 2048 + 32 [dd]
block_plug: 8,80 [dd]
^^^^
block_rq_insert: 8,80 R 16384 () 2048 + 32 [dd]
block_unplug: 8,80 [dd] 1 explicit
^^^^
block_rq_issue: 8,80 R 16384 () 2048 + 32 [dd]
block_rq_complete: 8,80 R () 2048 + 32 [0]

$ btrace /dev/sdf1
8,80 1 1 0.000000000 240240 A R 2048 + 32 <- (8,81) 0
8,81 1 2 0.000220890 240240 Q R 2048 + 32 [dd]
8,81 1 3 0.000229639 240240 G R 2048 + 32 [dd]
8,81 1 4 0.000231805 240240 P N [dd]
^^
8,81 1 5 0.000234671 240240 I R 2048 + 32 [dd]
8,81 1 6 0.000236365 240240 U N [dd] 1
^^
8,81 1 7 0.000238527 240240 D R 2048 + 32 [dd]
8,81 2 2 0.000613741 0 C R 2048 + 32 [0]

Signed-off-by: Steffen Maier <[email protected]>
---
include/trace/events/block.h | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/include/trace/events/block.h b/include/trace/events/block.h
index e90bb6eb8097..6ea5a3899c2e 100644
--- a/include/trace/events/block.h
+++ b/include/trace/events/block.h
@@ -460,14 +460,18 @@ TRACE_EVENT(block_plug,
TP_ARGS(q),

TP_STRUCT__entry(
+ __field( dev_t, dev )
__array( char, comm, TASK_COMM_LEN )
),

TP_fast_assign(
+ __entry->dev = q->kobj.parent ?
+ container_of(q->kobj.parent, struct device, kobj)->devt : 0;
memcpy(__entry->comm, current->comm, TASK_COMM_LEN);
),

- TP_printk("[%s]", __entry->comm)
+ TP_printk("%d,%d [%s]",
+ MAJOR(__entry->dev), MINOR(__entry->dev), __entry->comm)
);

#define show_block_unplug_explicit(val) \
@@ -482,18 +486,23 @@ DECLARE_EVENT_CLASS(block_unplug,
TP_ARGS(q, depth, explicit),

TP_STRUCT__entry(
+ __field( dev_t, dev )
__field( int, nr_rq )
__field( bool, explicit )
__array( char, comm, TASK_COMM_LEN )
),

TP_fast_assign(
+ __entry->dev = q->kobj.parent ?
+ container_of(q->kobj.parent, struct device, kobj)->devt : 0;
__entry->nr_rq = depth;
__entry->explicit = explicit;
memcpy(__entry->comm, current->comm, TASK_COMM_LEN);
),

- TP_printk("[%s] %d %s", __entry->comm, __entry->nr_rq,
+ TP_printk("%d,%d [%s] %d %s",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ __entry->comm, __entry->nr_rq,
show_block_unplug_explicit(__entry->explicit))
);

--
2.13.5


2018-04-13 17:51:52

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] tracing/events: block: bring more on a par with blktrace

On Fri, 13 Apr 2018 18:36:20 +0200
Steffen Maier <[email protected]> wrote:

> I had the need to understand I/O request processing in detail.
> But I also had the need to enrich block traces with other trace events
> including my own dynamic kprobe events. So I preferred block trace events
> over blktrace to get everything nicely sorted into one ftrace output.
> However, I missed device filtering for (un)plug events and also
> the difference between the two flavors of unplug.
>
> The first two patches bring block trace events closer to their
> counterpart in blktrace tooling.
>
> The last patch is just an RFC. I still kept it in this patch set because
> it is inspired by PATCH 2/2.
>
> Changes since v1:
> [PATCH v2 1/2]
> Use 0 and 1 instead of false and true for __print_symbolic table.
> Now "trace-cmd report" can decode it. [Steven Rostedt]
>
> Steffen Maier (3):
> tracing/events: block: track and print if unplug was explicit or
> schedule
> tracing/events: block: dev_t via driver core for plug and unplug
> events
> tracing/events: block: also try to get dev_t via driver core for some
> events
>
> include/trace/events/block.h | 33 ++++++++++++++++++++++++++++-----
> 1 file changed, 28 insertions(+), 5 deletions(-)
>

From just the tracing point of view:

Reviewed-by: Steven Rostedt (VMware) <[email protected]>

as for the race conditions in the block layer, I'll let someone else
decided that.

-- Steve