2010-07-28 06:27:12

by Sanagi, Koki

[permalink] [raw]
Subject: [RFC PATCH v4 0/5] netdev: show a process of packets

CHANGE-LOG since v3:
1) change arguments of softirq tracepoint into original one.
2) remove tracepoint of dev_kfree_skb_irq and skb_free_datagram_locked
and add trace_kfree_skb before __kfree_skb instead of them.
3) add tracepoint to netif_rx and display it by netdev-times script.

These patch-set adds tracepoints to show us a process of packets.
Using these tracepoints and existing points, we can get the time when
packet passes through some points in transmit or receive sequence.
For example, this is an output of perf script which is attached by patch 5/5.

106133.171439sec cpu=0
irq_entry(+0.000msec irq=24:eth4)
|
softirq_entry(+0.006msec)
|
|---netif_receive_skb(+0.010msec skb=f2d15900 len=100)
| |
| skb_copy_datagram_iovec(+0.039msec 10291::10291)
|
napi_poll_exit(+0.022msec eth4)

106134.175634sec cpu=1
irq_entry(+0.000msec irq=28:eth1)
|
|---netif_rx(+0.009msec skb=f3ef0a00)
|
softirq_entry(+0.018msec)
|
|---netif_receive_skb(+0.021msec skb=f3ef0a00 len=84)
| |
| skb_copy_datagram_iovec(+0.033msec 0:swapper)
|
napi_poll_exit(+0.035msec (no_device))

The above is a receive side(eth4 is NAPI. eth1 is non-NAPI). Like this, it can
show receive sequence frominterrupt(irq_entry) to application
(skb_copy_datagram_iovec).
This script shows one NET_RX softirq and events related to it. All relative
time bases on first irq_entry which raise NET_RX softirq.

dev len Qdisc netdevice free
eth4 74 106125.030004sec 0.006msec 0.009msec
eth4 87 106125.041020sec 0.007msec 0.023msec
eth4 66 106125.042291sec 0.003msec 0.012msec
eth4 66 106125.043274sec 0.006msec 0.004msec
eth4 850 106125.044283sec 0.007msec 0.018msec

The above is a transmit side. There are three check-time-points.
Point1 is before putting a packet to Qdisc. point2 is after ndo_start_xmit in
dev_hard_start_xmit. It indicates finishing putting a packet to driver.
point3 is in consume_skb and kfree_skb. It indicates freeing a transmitted packet.
Values of this script are, from left, device name, length of a packet, a time of
point1, an interval time between point1 and point2 and an interval time between
point2 and point3.

These times are useful to analyze a performance or to detect a point where
packet delays. For example,
- NET_RX softirq calling is late.
- Application is late to take a packet.
- It takes much time to put a transmitting packet to driver
(It may be caused by packed queue)

And also, these tracepoint help us to investigate a network driver's trouble
from memory dump because ftrace records it to memory. And ftrace is so light
even if always trace on. So, in a case investigating a problem which doesn't
reproduce, it is useful.

Thanks,
Koki Sanagi.


2010-07-28 06:29:21

by Sanagi, Koki

[permalink] [raw]
Subject: [RFC PATCH v4 1/5] irq: add tracepoint to softirq_raise

CHANGE-LOG since v3:
-change arguments of softirq tracepoint into original one.

From: Lai Jiangshan <[email protected]>

Add a tracepoint for tracing when softirq action is raised.

It and the existed tracepoints complete softirq's tracepoints:
softirq_raise, softirq_entry and softirq_exit.

And when this tracepoint is used in combination with
the softirq_entry tracepoint we can determine
the softirq raise latency.

Signed-off-by: Lai Jiangshan <[email protected]>
Acked-by: Mathieu Desnoyers <[email protected]>
Acked-by: Frederic Weisbecker <[email protected]>

[ factorize softirq events with DECLARE_EVENT_CLASS ]
Signed-off-by: Koki Sanagi <[email protected]>
---
include/linux/interrupt.h | 8 +++++++-
include/trace/events/irq.h | 26 ++++++++++++++++++++++++--
2 files changed, 31 insertions(+), 3 deletions(-)

diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index c233113..5460db4 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -18,6 +18,7 @@
#include <asm/atomic.h>
#include <asm/ptrace.h>
#include <asm/system.h>
+#include <trace/events/irq.h>

/*
* These correspond to the IORESOURCE_IRQ_* defines in
@@ -402,7 +403,12 @@ asmlinkage void do_softirq(void);
asmlinkage void __do_softirq(void);
extern void open_softirq(int nr, void (*action)(struct softirq_action *));
extern void softirq_init(void);
-#define __raise_softirq_irqoff(nr) do { or_softirq_pending(1UL << (nr)); } while (0)
+static inline void __raise_softirq_irqoff(unsigned int nr)
+{
+ trace_softirq_raise((struct softirq_action *)&nr, NULL);
+ or_softirq_pending(1UL << nr);
+}
+
extern void raise_softirq_irqoff(unsigned int nr);
extern void raise_softirq(unsigned int nr);
extern void wakeup_softirqd(void);
diff --git a/include/trace/events/irq.h b/include/trace/events/irq.h
index 0e4cfb6..3ddda02 100644
--- a/include/trace/events/irq.h
+++ b/include/trace/events/irq.h
@@ -5,7 +5,9 @@
#define _TRACE_IRQ_H

#include <linux/tracepoint.h>
-#include <linux/interrupt.h>
+
+struct irqaction;
+struct softirq_action;

#define softirq_name(sirq) { sirq##_SOFTIRQ, #sirq }
#define show_softirq_name(val) \
@@ -93,7 +95,10 @@ DECLARE_EVENT_CLASS(softirq,
),

TP_fast_assign(
- __entry->vec = (int)(h - vec);
+ if (vec)
+ __entry->vec = (int)(h - vec);
+ else
+ __entry->vec = *((int *)h);
),

TP_printk("vec=%d [action=%s]", __entry->vec,
@@ -136,6 +141,23 @@ DEFINE_EVENT(softirq, softirq_exit,
TP_ARGS(h, vec)
);

+/**
+ * softirq_raise - called immediately when a softirq is raised
+ * @h: pointer to struct softirq_action
+ * @vec: pointer to first struct softirq_action in softirq_vec array
+ *
+ * The @h parameter contains a pointer to the softirq vector number which is
+ * raised. @vec is NULL and it means @h includes vector number not
+ * softirq_action. When used in combination with the softirq_entry tracepoint
+ * we can determine the softirq raise latency.
+ */
+DEFINE_EVENT(softirq, softirq_raise,
+
+ TP_PROTO(struct softirq_action *h, struct softirq_action *vec),
+
+ TP_ARGS(h, vec)
+);
+
#endif /* _TRACE_IRQ_H */

/* This part must be outside protection */

2010-07-28 06:30:30

by Sanagi, Koki

[permalink] [raw]
Subject: [RFC PATCH v4 2/5] napi: convert trace_napi_poll to TRACE_EVENT

From: Neil Horman <[email protected]>

This patch converts trace_napi_poll from DECLARE_EVENT to TRACE_EVENT to improve
the usability of napi_poll tracepoint.

<idle>-0 [001] 241302.750777: napi_poll: napi poll on napi struct f6acc480 for device eth3
<idle>-0 [000] 241302.852389: napi_poll: napi poll on napi struct f5d0d70c for device eth1

An original patch is below.
http://marc.info/?l=linux-kernel&m=126021713809450&w=2
Signed-off-by: Neil Horman <[email protected]>

And add a fix by Steven Rostedt.
http://marc.info/?l=linux-kernel&m=126150506519173&w=2

Signed-off-by: Koki Sanagi <[email protected]>
---
include/trace/events/napi.h | 25 +++++++++++++++++++++++--
1 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/include/trace/events/napi.h b/include/trace/events/napi.h
index 188deca..8fe1e93 100644
--- a/include/trace/events/napi.h
+++ b/include/trace/events/napi.h
@@ -6,10 +6,31 @@

#include <linux/netdevice.h>
#include <linux/tracepoint.h>
+#include <linux/ftrace.h>
+
+#define NO_DEV "(no_device)"
+
+TRACE_EVENT(napi_poll,

-DECLARE_TRACE(napi_poll,
TP_PROTO(struct napi_struct *napi),
- TP_ARGS(napi));
+
+ TP_ARGS(napi),
+
+ TP_STRUCT__entry(
+ __field( struct napi_struct *, napi)
+ __string( dev_name, napi->dev ? napi->dev->name : NO_DEV)
+ ),
+
+ TP_fast_assign(
+ __entry->napi = napi;
+ __assign_str(dev_name, napi->dev ? napi->dev->name : NO_DEV);
+ ),
+
+ TP_printk("napi poll on napi struct %p for device %s",
+ __entry->napi, __get_str(dev_name))
+);
+
+#undef NO_DEV

#endif /* _TRACE_NAPI_H_ */

2010-07-28 06:31:30

by Sanagi, Koki

[permalink] [raw]
Subject: [RFC PATCH v4 3/5] netdev: add tracepoints to netdev layer

CHANGE-LOG since v3:
-add tracepoint to netif_rx.


This patch adds tracepoint to dev_queue_xmit, dev_hard_start_xmit, netif_rx and
netif_receive_skb. These tracepoints help you to monitor network driver's
input/output.

<idle>-0 [001] 112447.902030: netif_rx: dev=eth1 skbaddr=f3ef0900 len=84
<idle>-0 [001] 112447.902039: netif_receive_skb: dev=eth1 skbaddr=f3ef0900 len=84
sshd-6828 [000] 112447.903257: net_dev_queue: dev=eth4 skbaddr=f3fca538 len=226
sshd-6828 [000] 112447.903260: net_dev_xmit: dev=eth4 skbaddr=f3fca538 len=226 rc=0

Signed-off-by: Koki Sanagi <[email protected]>
---
include/trace/events/net.h | 82 ++++++++++++++++++++++++++++++++++++++++++++
net/core/dev.c | 6 +++
net/core/net-traces.c | 1 +
3 files changed, 89 insertions(+), 0 deletions(-)

diff --git a/include/trace/events/net.h b/include/trace/events/net.h
new file mode 100644
index 0000000..5f247f5
--- /dev/null
+++ b/include/trace/events/net.h
@@ -0,0 +1,82 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM net
+
+#if !defined(_TRACE_NET_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_NET_H
+
+#include <linux/skbuff.h>
+#include <linux/netdevice.h>
+#include <linux/ip.h>
+#include <linux/tracepoint.h>
+
+TRACE_EVENT(net_dev_xmit,
+
+ TP_PROTO(struct sk_buff *skb,
+ int rc),
+
+ TP_ARGS(skb, rc),
+
+ TP_STRUCT__entry(
+ __field( void *, skbaddr )
+ __field( unsigned int, len )
+ __field( int, rc )
+ __string( name, skb->dev->name )
+ ),
+
+ TP_fast_assign(
+ __entry->skbaddr = skb;
+ __entry->len = skb->len;
+ __entry->rc = rc;
+ __assign_str(name, skb->dev->name);
+ ),
+
+ TP_printk("dev=%s skbaddr=%p len=%u rc=%d",
+ __get_str(name), __entry->skbaddr, __entry->len, __entry->rc)
+);
+
+DECLARE_EVENT_CLASS(net_dev_template,
+
+ TP_PROTO(struct sk_buff *skb),
+
+ TP_ARGS(skb),
+
+ TP_STRUCT__entry(
+ __field( void *, skbaddr )
+ __field( unsigned int, len )
+ __string( name, skb->dev->name )
+ ),
+
+ TP_fast_assign(
+ __entry->skbaddr = skb;
+ __entry->len = skb->len;
+ __assign_str(name, skb->dev->name);
+ ),
+
+ TP_printk("dev=%s skbaddr=%p len=%u",
+ __get_str(name), __entry->skbaddr, __entry->len)
+)
+
+DEFINE_EVENT(net_dev_template, net_dev_queue,
+
+ TP_PROTO(struct sk_buff *skb),
+
+ TP_ARGS(skb)
+);
+
+DEFINE_EVENT(net_dev_template, netif_receive_skb,
+
+ TP_PROTO(struct sk_buff *skb),
+
+ TP_ARGS(skb)
+);
+
+DEFINE_EVENT(net_dev_template, netif_rx,
+
+ TP_PROTO(struct sk_buff *skb),
+
+ TP_ARGS(skb)
+);
+#endif /* _TRACE_NET_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/net/core/dev.c b/net/core/dev.c
index 6e1b437..9ea9c1e 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -130,6 +130,7 @@
#include <linux/jhash.h>
#include <linux/random.h>
#include <trace/events/napi.h>
+#include <trace/events/net.h>
#include <linux/pci.h>

#include "net-sysfs.h"
@@ -1981,6 +1982,7 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
}

rc = ops->ndo_start_xmit(skb, dev);
+ trace_net_dev_xmit(skb, rc);
if (rc == NETDEV_TX_OK)
txq_trans_update(txq);
return rc;
@@ -2001,6 +2003,7 @@ gso:
skb_dst_drop(nskb);

rc = ops->ndo_start_xmit(nskb, dev);
+ trace_net_dev_xmit(nskb, rc);
if (unlikely(rc != NETDEV_TX_OK)) {
if (rc & ~NETDEV_TX_MASK)
goto out_kfree_gso_skb;
@@ -2189,6 +2192,7 @@ int dev_queue_xmit(struct sk_buff *skb)
#ifdef CONFIG_NET_CLS_ACT
skb->tc_verd = SET_TC_AT(skb->tc_verd, AT_EGRESS);
#endif
+ trace_net_dev_queue(skb);
if (q->enqueue) {
rc = __dev_xmit_skb(skb, q, dev, txq);
goto out;
@@ -2515,6 +2519,7 @@ int netif_rx(struct sk_buff *skb)
if (netdev_tstamp_prequeue)
net_timestamp_check(skb);

+ trace_netif_rx(skb);
#ifdef CONFIG_RPS
{
struct rps_dev_flow voidflow, *rflow = &voidflow;
@@ -2829,6 +2834,7 @@ static int __netif_receive_skb(struct sk_buff *skb)
if (!netdev_tstamp_prequeue)
net_timestamp_check(skb);

+ trace_netif_receive_skb(skb);
if (vlan_tx_tag_present(skb) && vlan_hwaccel_do_receive(skb))
return NET_RX_SUCCESS;

diff --git a/net/core/net-traces.c b/net/core/net-traces.c
index afa6380..7f1bb2a 100644
--- a/net/core/net-traces.c
+++ b/net/core/net-traces.c
@@ -26,6 +26,7 @@

#define CREATE_TRACE_POINTS
#include <trace/events/skb.h>
+#include <trace/events/net.h>
#include <trace/events/napi.h>

EXPORT_TRACEPOINT_SYMBOL_GPL(kfree_skb);

2010-07-28 06:34:15

by Sanagi, Koki

[permalink] [raw]
Subject: [RFC PATCH v4 4/5] skb: add tracepoints to freeing skb

CHANGE-LOG since v3:
-remove tracepoint of dev_kfree_skb_irq and skb_free_datagram_locked.
Insead of them, add trace_kfree_skb before __kfree_skb in net_tx_action and
skb_free_datagram_locked. And there is no change of about the location of
trace_consume_skb to keep a symmetry against trace_kfree_skb.

This patch adds tracepoint to consume_skb and add trace_kfree_skb before
__kfree_skb in skb_free_datagram_locked and net_tx_action.
Combinating with tracepoint on dev_hard_start_xmit, we can check how long it
takes to free transmited packets. And using it, we can calculate how many
packets driver had at that time. It is useful when a drop of transmited packet
is a problem.

sshd-6828 [000] 112689.258154: consume_skb: skbaddr=f2d99bb8

Signed-off-by: Koki Sanagi <[email protected]>
---
include/trace/events/skb.h | 17 +++++++++++++++++
net/core/datagram.c | 1 +
net/core/dev.c | 2 ++
net/core/skbuff.c | 1 +
4 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h
index 4b2be6d..75ce9d5 100644
--- a/include/trace/events/skb.h
+++ b/include/trace/events/skb.h
@@ -35,6 +35,23 @@ TRACE_EVENT(kfree_skb,
__entry->skbaddr, __entry->protocol, __entry->location)
);

+TRACE_EVENT(consume_skb,
+
+ TP_PROTO(struct sk_buff *skb),
+
+ TP_ARGS(skb),
+
+ TP_STRUCT__entry(
+ __field( void *, skbaddr )
+ ),
+
+ TP_fast_assign(
+ __entry->skbaddr = skb;
+ ),
+
+ TP_printk("skbaddr=%p", __entry->skbaddr)
+);
+
TRACE_EVENT(skb_copy_datagram_iovec,

TP_PROTO(const struct sk_buff *skb, int len),
diff --git a/net/core/datagram.c b/net/core/datagram.c
index 251997a..282806b 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -243,6 +243,7 @@ void skb_free_datagram_locked(struct sock *sk, struct sk_buff *skb)
unlock_sock_fast(sk, slow);

/* skb is now orphaned, can be freed outside of locked section */
+ trace_kfree_skb(skb, skb_free_datagram_locked);
__kfree_skb(skb);
}
EXPORT_SYMBOL(skb_free_datagram_locked);
diff --git a/net/core/dev.c b/net/core/dev.c
index 9ea9c1e..92d79bf 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -131,6 +131,7 @@
#include <linux/random.h>
#include <trace/events/napi.h>
#include <trace/events/net.h>
+#include <trace/events/skb.h>
#include <linux/pci.h>

#include "net-sysfs.h"
@@ -2577,6 +2578,7 @@ static void net_tx_action(struct softirq_action *h)
clist = clist->next;

WARN_ON(atomic_read(&skb->users));
+ trace_kfree_skb(skb, net_tx_action);
__kfree_skb(skb);
}
}
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 7da58a2..9cb91a5 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -466,6 +466,7 @@ void consume_skb(struct sk_buff *skb)
smp_rmb();
else if (likely(!atomic_dec_and_test(&skb->users)))
return;
+ trace_consume_skb(skb);
__kfree_skb(skb);
}
EXPORT_SYMBOL(consume_skb);

2010-07-28 06:35:36

by Sanagi, Koki

[permalink] [raw]
Subject: [RFC PATCH v4 5/5] perf:add a script shows a process of packet

Add a perf script which shows a process of packets and processed time.
It helps us to investigate networking or network device.

If you want to use it, install perf and record perf.data like following.

#perf trace record netdev-times [script]

If you set script, perf gathers records until it ends.
If not, you must Ctrl-C to stop recording.

And if you want a report from record,

#perf trace report netdev-times [options]

If you use some options, you can limit an output.
Option is below.

tx: show only process of tx packets
rx: show only process of rx packets
dev=: show a process specified with this option
debug: work with debug mode. It shows buffer status.

For example, if you want to show a process of received packets associated
with eth4,

#perf trace report netdev-times rx dev=eth4
106133.171439sec cpu=0
irq_entry(+0.000msec irq=24:eth4)
|
softirq_entry(+0.006msec)
|
|---netif_receive_skb(+0.010msec skb=f2d15900 len=100)
| |
| skb_copy_datagram_iovec(+0.039msec 10291::10291)
|
napi_poll_exit(+0.022msec eth4)

This perf script helps us to analyze a process time of transmit/receive
sequence.

Signed-off-by: Koki Sanagi <[email protected]>
---
tools/perf/scripts/python/bin/netdev-times-record | 8 +
tools/perf/scripts/python/bin/netdev-times-report | 5 +
tools/perf/scripts/python/netdev-times.py | 464 +++++++++++++++++++++
3 files changed, 477 insertions(+), 0 deletions(-)

diff --git a/tools/perf/scripts/python/bin/netdev-times-record b/tools/perf/scripts/python/bin/netdev-times-record
new file mode 100644
index 0000000..2b59511
--- /dev/null
+++ b/tools/perf/scripts/python/bin/netdev-times-record
@@ -0,0 +1,8 @@
+#!/bin/bash
+perf record -c 1 -f -R -a -e net:net_dev_xmit -e net:net_dev_queue \
+ -e net:netif_receive_skb -e net:netif_rx \
+ -e skb:consume_skb -e skb:kfree_skb \
+ -e skb:skb_copy_datagram_iovec -e napi:napi_poll \
+ -e irq:irq_handler_entry -e irq:irq_handler_exit \
+ -e irq:softirq_entry -e irq:softirq_exit \
+ -e irq:softirq_raise $@
diff --git a/tools/perf/scripts/python/bin/netdev-times-report b/tools/perf/scripts/python/bin/netdev-times-report
new file mode 100644
index 0000000..c3d0a63
--- /dev/null
+++ b/tools/perf/scripts/python/bin/netdev-times-report
@@ -0,0 +1,5 @@
+#!/bin/bash
+# description: display a process of packet and processing time
+# args: [tx] [rx] [dev=] [debug]
+
+perf trace -s ~/libexec/perf-core/scripts/python/netdev-times.py $@
diff --git a/tools/perf/scripts/python/netdev-times.py b/tools/perf/scripts/python/netdev-times.py
new file mode 100644
index 0000000..9aa0a32
--- /dev/null
+++ b/tools/perf/scripts/python/netdev-times.py
@@ -0,0 +1,464 @@
+# Display a process of packets and processed time.
+# It helps us to investigate networking or network device.
+#
+# options
+# tx: show only tx chart
+# rx: show only rx chart
+# dev=: show only thing related to specified device
+# debug: work with debug mode. It shows buffer status.
+
+import os
+import sys
+
+sys.path.append(os.environ['PERF_EXEC_PATH'] + \
+ '/scripts/python/Perf-Trace-Util/lib/Perf/Trace')
+
+from perf_trace_context import *
+from Core import *
+from Util import *
+
+all_event_list = []; # insert all tracepoint event related with this script
+irq_dic = {}; # key is cpu and value is a list which stacks irqs
+ # which raise NET_RX softirq
+net_rx_dic = {}; # key is cpu and value include time of NET_RX softirq-entry
+ # and a list which stacks receive
+receive_hunk_list = []; # a list which include a sequence of receive events
+rx_skb_list = []; # received packet list for matching
+ # skb_copy_datagram_iovec
+
+buffer_budget = 65536; # the budget of rx_skb_list, tx_queue_list and
+ # tx_xmit_list
+of_count_rx_skb_list = 0; # overflow count
+
+tx_queue_list = []; # list of packets which pass through dev_queue_xmit
+of_count_tx_queue_list = 0; # overflow count
+
+tx_xmit_list = []; # list of packets which pass through dev_hard_start_xmit
+of_count_tx_xmit_list = 0; # overflow count
+
+tx_free_list = []; # list of packets which is freed
+
+# options
+show_tx = 0;
+show_rx = 0;
+dev = 0; # store a name of device specified by option "dev="
+debug = 0;
+
+# indices of event_info tuple
+EINFO_IDX_NAME= 0
+EINFO_IDX_CONTEXT=1
+EINFO_IDX_CPU= 2
+EINFO_IDX_TIME= 3
+EINFO_IDX_PID= 4
+EINFO_IDX_COMM= 5
+
+# Calculate a time interval(msec) from src(nsec) to dst(nsec)
+def diff_msec(src, dst):
+ return (dst - src) / 1000000.0
+
+# Display a process of transmitting a packet
+def print_transmit(hunk):
+ if dev != 0 and hunk['dev'].find(dev) < 0:
+ return
+ print "%7s %5d %6d.%06dsec %12.3fmsec %12.3fmsec" % \
+ (hunk['dev'], hunk['len'],
+ nsecs_secs(hunk['queue_t']),
+ nsecs_nsecs(hunk['queue_t'])/1000,
+ diff_msec(hunk['queue_t'], hunk['xmit_t']),
+ diff_msec(hunk['xmit_t'], hunk['free_t']))
+
+# Format for displaying rx packet processing
+PF_IRQ_ENTRY= " irq_entry(+%.3fmsec irq=%d:%s)"
+PF_SOFT_ENTRY=" softirq_entry(+%.3fmsec)"
+PF_NAPI_POLL= " napi_poll_exit(+%.3fmsec %s)"
+PF_JOINT= " |"
+PF_WJOINT= " | |"
+PF_NET_RECV= " |---netif_receive_skb(+%.3fmsec skb=%x len=%d)"
+PF_NET_RX= " |---netif_rx(+%.3fmsec skb=%x)"
+PF_CPY_DGRAM= " | skb_copy_datagram_iovec(+%.3fmsec %d:%s)"
+PF_KFREE_SKB= " | kfree_skb(+%.3fmsec location=%x)"
+PF_CONS_SKB= " | consume_skb(+%.3fmsec)"
+
+# Display a process of received packets and interrputs associated with
+# a NET_RX softirq
+def print_receive(hunk):
+ show_hunk = 0
+ irq_list = hunk['irq_list']
+ cpu = irq_list[0]['cpu']
+ base_t = irq_list[0]['irq_ent_t']
+ # check if this hunk should be showed
+ if dev != 0:
+ for i in range(len(irq_list)):
+ if irq_list[i]['name'].find(dev) >= 0:
+ show_hunk = 1
+ break
+ else:
+ show_hunk = 1
+ if show_hunk == 0:
+ return
+
+ print "%d.%06dsec cpu=%d" % \
+ (nsecs_secs(base_t), nsecs_nsecs(base_t)/1000, cpu)
+ for i in range(len(irq_list)):
+ print PF_IRQ_ENTRY % \
+ (diff_msec(base_t, irq_list[i]['irq_ent_t']),
+ irq_list[i]['irq'], irq_list[i]['name'])
+ print PF_JOINT
+ irq_event_list = irq_list[i]['event_list']
+ for j in range(len(irq_event_list)):
+ irq_event = irq_event_list[j]
+ if irq_event['event'] == 'netif_rx':
+ print PF_NET_RX % \
+ (diff_msec(base_t, irq_event['time']),
+ irq_event['skbaddr'])
+ print PF_JOINT
+ print PF_SOFT_ENTRY % \
+ diff_msec(base_t, hunk['sirq_ent_t'])
+ print PF_JOINT
+ event_list = hunk['event_list']
+ for i in range(len(event_list)):
+ event = event_list[i]
+ if event['event_name'] == 'napi_poll':
+ print PF_NAPI_POLL % \
+ (diff_msec(base_t, event['event_t']), event['dev'])
+ if i == len(event_list) - 1:
+ print ""
+ else:
+ print PF_JOINT
+ else:
+ print PF_NET_RECV % \
+ (diff_msec(base_t, event['event_t']), event['skbaddr'],
+ event['len'])
+ if 'comm' in event.keys():
+ print PF_WJOINT
+ print PF_CPY_DGRAM % \
+ (diff_msec(base_t, event['comm_t']),
+ event['pid'], event['comm'])
+ elif 'handle' in event.keys():
+ print PF_WJOINT
+ if event['handle'] == "kfree_skb":
+ print PF_KFREE_SKB % \
+ (diff_msec(base_t,
+ event['comm_t']),
+ event['location'])
+ elif event['handle'] == "consume_skb":
+ print PF_CONS_SKB % \
+ diff_msec(base_t,
+ event['comm_t'])
+ print PF_JOINT
+
+def trace_begin():
+ global show_tx
+ global show_rx
+ global dev
+ global debug
+
+ for i in range(len(sys.argv)):
+ if i == 0:
+ continue
+ arg = sys.argv[i]
+ if arg == 'tx':
+ show_tx = 1
+ elif arg =='rx':
+ show_rx = 1
+ elif arg.find('dev=',0, 4) >= 0:
+ dev = arg[4:]
+ elif arg == 'debug':
+ debug = 1
+ if show_tx == 0 and show_rx == 0:
+ show_tx = 1
+ show_rx = 1
+
+def trace_end():
+ # order all events in time
+ all_event_list.sort(lambda a,b :cmp(a[EINFO_IDX_TIME],
+ b[EINFO_IDX_TIME]))
+ # process all events
+ for i in range(len(all_event_list)):
+ event_info = all_event_list[i]
+ name = event_info[EINFO_IDX_NAME]
+ if name == 'irq__softirq_exit':
+ handle_irq_softirq_exit(event_info)
+ elif name == 'irq__softirq_entry':
+ handle_irq_softirq_entry(event_info)
+ elif name == 'irq__softirq_raise':
+ handle_irq_softirq_raise(event_info)
+ elif name == 'irq__irq_handler_entry':
+ handle_irq_handler_entry(event_info)
+ elif name == 'irq__irq_handler_exit':
+ handle_irq_handler_exit(event_info)
+ elif name == 'napi__napi_poll':
+ handle_napi_poll(event_info)
+ elif name == 'net__netif_receive_skb':
+ handle_netif_receive_skb(event_info)
+ elif name == 'net__netif_rx':
+ handle_netif_rx(event_info)
+ elif name == 'skb__skb_copy_datagram_iovec':
+ handle_skb_copy_datagram_iovec(event_info)
+ elif name == 'net__net_dev_queue':
+ handle_net_dev_queue(event_info)
+ elif name == 'net__net_dev_xmit':
+ handle_net_dev_xmit(event_info)
+ elif name == 'skb__kfree_skb':
+ handle_kfree_skb(event_info)
+ elif name == 'skb__consume_skb':
+ handle_consume_skb(event_info)
+ # display receive hunks
+ if show_rx:
+ for i in range(len(receive_hunk_list)):
+ print_receive(receive_hunk_list[i])
+ # display transmit hunks
+ if show_tx:
+ print " dev len Qdisc " \
+ " netdevice free"
+ for i in range(len(tx_free_list)):
+ print_transmit(tx_free_list[i])
+ if debug:
+ print "debug buffer status"
+ print "----------------------------"
+ print "xmit Qdisc:remain:%d overflow:%d" % \
+ (len(tx_queue_list), of_count_tx_queue_list)
+ print "xmit netdevice:remain:%d overflow:%d" % \
+ (len(tx_xmit_list), of_count_tx_xmit_list)
+ print "receive:remain:%d overflow:%d" % \
+ (len(rx_skb_list), of_count_rx_skb_list)
+
+# called from perf, when it finds a correspoinding event
+def irq__softirq_entry(name, context, cpu, sec, nsec, pid, comm, vec):
+ if symbol_str("irq__softirq_entry", "vec", vec) != "NET_RX":
+ return
+ event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm, vec)
+ all_event_list.append(event_info)
+
+def irq__softirq_exit(name, context, cpu, sec, nsec, pid, comm, vec):
+ if symbol_str("irq__softirq_entry", "vec", vec) != "NET_RX":
+ return
+ event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm, vec)
+ all_event_list.append(event_info)
+
+def irq__softirq_raise(name, context, cpu, sec, nsec, pid, comm, vec):
+ if symbol_str("irq__softirq_entry", "vec", vec) != "NET_RX":
+ return
+ event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm, vec)
+ all_event_list.append(event_info)
+
+def irq__irq_handler_entry(name, context, cpu, sec, nsec, pid, comm,
+ irq, irq_name):
+ event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
+ irq, irq_name)
+ all_event_list.append(event_info)
+
+def irq__irq_handler_exit(name, context, cpu, sec, nsec, pid, comm, irq, ret):
+ event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm, irq, ret)
+ all_event_list.append(event_info)
+
+def napi__napi_poll(name, context, cpu, sec, nsec, pid, comm, napi, dev_name):
+ event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
+ napi, dev_name)
+ all_event_list.append(event_info)
+
+def net__netif_receive_skb(name, context, cpu, sec, nsec, pid, comm, skbaddr,
+ skblen, dev_name):
+ event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
+ skbaddr, skblen, dev_name)
+ all_event_list.append(event_info)
+
+def net__netif_rx(name, context, cpu, sec, nsec, pid, comm, skbaddr,
+ skblen, dev_name):
+ event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
+ skbaddr, skblen, dev_name)
+ all_event_list.append(event_info)
+
+def net__net_dev_queue(name, context, cpu, sec, nsec, pid, comm,
+ skbaddr, skblen, dev_name):
+ event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
+ skbaddr, skblen, dev_name)
+ all_event_list.append(event_info)
+
+def net__net_dev_xmit(name, context, cpu, sec, nsec, pid, comm,
+ skbaddr, skblen, rc, dev_name):
+ event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
+ skbaddr, skblen, rc ,dev_name)
+ all_event_list.append(event_info)
+
+def skb__kfree_skb(name, context, cpu, sec, nsec, pid, comm,
+ skbaddr, protocol, location):
+ event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
+ skbaddr, protocol, location)
+ all_event_list.append(event_info)
+
+def skb__consume_skb(name, context, cpu, sec, nsec, pid, comm, skbaddr):
+ event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
+ skbaddr)
+ all_event_list.append(event_info)
+
+def skb__skb_copy_datagram_iovec(name, context, cpu, sec, nsec, pid, comm,
+ skbaddr, skblen):
+ event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
+ skbaddr, skblen)
+ all_event_list.append(event_info)
+
+def handle_irq_handler_entry(event_info):
+ (name, context, cpu, time, pid, comm, irq, irq_name) = event_info
+ if cpu not in irq_dic.keys():
+ irq_dic[cpu] = []
+ irq_record = {'irq':irq, 'name':irq_name, 'cpu':cpu, 'irq_ent_t':time}
+ irq_dic[cpu].append(irq_record)
+
+def handle_irq_handler_exit(event_info):
+ (name, context, cpu, time, pid, comm, irq, ret) = event_info
+ if cpu not in irq_dic.keys():
+ return
+ irq_record = irq_dic[cpu].pop()
+ if irq != irq_record['irq']:
+ return
+ irq_record.update({'irq_ext_t':time})
+ # if an irq doesn't include NET_RX softirq, drop.
+ if 'event_list' in irq_record.keys():
+ irq_dic[cpu].append(irq_record)
+
+def handle_irq_softirq_raise(event_info):
+ (name, context, cpu, time, pid, comm, vec) = event_info
+ if cpu not in irq_dic.keys() \
+ or len(irq_dic[cpu]) == 0:
+ return
+ irq_record = irq_dic[cpu].pop()
+ if 'event_list' in irq_record.keys():
+ irq_event_list = irq_record['event_list']
+ else:
+ irq_event_list = []
+ irq_event_list.append({'time':time, 'event':'sirq_raise'})
+ irq_record.update({'event_list':irq_event_list})
+ irq_dic[cpu].append(irq_record)
+
+def handle_irq_softirq_entry(event_info):
+ (name, context, cpu, time, pid, comm, vec) = event_info
+ net_rx_dic[cpu] = {'sirq_ent_t':time, 'event_list':[]}
+
+def handle_irq_softirq_exit(event_info):
+ (name, context, cpu, time, pid, comm, vec) = event_info
+ irq_list = []
+ event_list = 0
+ if cpu in irq_dic.keys():
+ irq_list = irq_dic[cpu]
+ del irq_dic[cpu]
+ if cpu in net_rx_dic.keys():
+ sirq_ent_t = net_rx_dic[cpu]['sirq_ent_t']
+ event_list = net_rx_dic[cpu]['event_list']
+ del net_rx_dic[cpu]
+ if irq_list == [] or event_list == 0:
+ return
+ rec_data = {'sirq_ent_t':sirq_ent_t, 'sirq_ext_t':time,
+ 'irq_list':irq_list, 'event_list':event_list}
+ # merge information realted to a NET_RX softirq
+ receive_hunk_list.append(rec_data)
+
+def handle_napi_poll(event_info):
+ (name, context, cpu, time, pid, comm, napi, dev_name) = event_info
+ if cpu in net_rx_dic.keys():
+ event_list = net_rx_dic[cpu]['event_list']
+ rec_data = {'event_name':'napi_poll',
+ 'dev':dev_name, 'event_t':time}
+ event_list.append(rec_data)
+
+def handle_netif_rx(event_info):
+ (name, context, cpu, time, pid, comm,
+ skbaddr, skblen, dev_name) = event_info
+ if cpu not in irq_dic.keys() \
+ or len(irq_dic[cpu]) == 0:
+ return
+ irq_record = irq_dic[cpu].pop()
+ if 'event_list' in irq_record.keys():
+ irq_event_list = irq_record['event_list']
+ else:
+ irq_event_list = []
+ irq_event_list.append({'time':time, 'event':'netif_rx',
+ 'skbaddr':skbaddr, 'skblen':skblen, 'dev_name':dev_name})
+ irq_record.update({'event_list':irq_event_list})
+ irq_dic[cpu].append(irq_record)
+
+def handle_netif_receive_skb(event_info):
+ global of_count_rx_skb_list
+
+ (name, context, cpu, time, pid, comm,
+ skbaddr, skblen, dev_name) = event_info
+ if cpu in net_rx_dic.keys():
+ rec_data = {'event_name':'netif_receive_skb',
+ 'event_t':time, 'skbaddr':skbaddr, 'len':skblen}
+ event_list = net_rx_dic[cpu]['event_list']
+ event_list.append(rec_data)
+ rx_skb_list.insert(0, rec_data)
+ if len(rx_skb_list) > buffer_budget:
+ rx_skb_list.pop()
+ of_count_rx_skb_list += 1
+
+def handle_net_dev_queue(event_info):
+ global of_count_tx_queue_list
+
+ (name, context, cpu, time, pid, comm,
+ skbaddr, skblen, dev_name) = event_info
+ skb = {'dev':dev_name, 'skbaddr':skbaddr, 'len':skblen, 'queue_t':time}
+ tx_queue_list.insert(0, skb)
+ if len(tx_queue_list) > buffer_budget:
+ tx_queue_list.pop()
+ of_count_tx_queue_list += 1
+
+def handle_net_dev_xmit(event_info):
+ global of_count_tx_xmit_list
+
+ (name, context, cpu, time, pid, comm,
+ skbaddr, skblen, rc, dev_name) = event_info
+ if rc == 0: # NETDEV_TX_OK
+ for i in range(len(tx_queue_list)):
+ skb = tx_queue_list[i]
+ if skb['skbaddr'] == skbaddr:
+ skb['xmit_t'] = time
+ tx_xmit_list.insert(0, skb)
+ del tx_queue_list[i]
+ if len(tx_xmit_list) > buffer_budget:
+ tx_xmit_list.pop()
+ of_count_tx_xmit_list += 1
+ return
+
+def handle_kfree_skb(event_info):
+ (name, context, cpu, time, pid, comm,
+ skbaddr, protocol, location) = event_info
+ for i in range(len(tx_queue_list)):
+ skb = tx_queue_list[i]
+ if skb['skbaddr'] == skbaddr:
+ del tx_queue_list[i]
+ return
+ for i in range(len(tx_xmit_list)):
+ skb = tx_xmit_list[i]
+ if skb['skbaddr'] == skbaddr:
+ skb['free_t'] = time
+ tx_free_list.append(skb)
+ del tx_xmit_list[i]
+ return
+ for i in range(len(rx_skb_list)):
+ rec_data = rx_skb_list[i]
+ if rec_data['skbaddr'] == skbaddr:
+ rec_data.update({'handle':"kfree_skb",
+ 'comm':comm, 'pid':pid, 'comm_t':time})
+ del rx_skb_list[i]
+ return
+
+def handle_consume_skb(event_info):
+ (name, context, cpu, time, pid, comm, skbaddr) = event_info
+ for i in range(len(tx_xmit_list)):
+ skb = tx_xmit_list[i]
+ if skb['skbaddr'] == skbaddr:
+ skb['free_t'] = time
+ tx_free_list.append(skb)
+ del tx_xmit_list[i]
+ return
+
+def handle_skb_copy_datagram_iovec(event_info):
+ (name, context, cpu, time, pid, comm, skbaddr, skblen) = event_info
+ for i in range(len(rx_skb_list)):
+ rec_data = rx_skb_list[i]
+ if skbaddr == rec_data['skbaddr']:
+ rec_data.update({'handle':"skb_copy_datagram_iovec",
+ 'comm':comm, 'pid':pid, 'comm_t':time})
+ del rx_skb_list[i]
+ return

2010-08-27 12:37:45

by Neil Horman

[permalink] [raw]
Subject: Re: [RFC PATCH v4 1/5] irq: add tracepoint to softirq_raise

On Wed, Jul 28, 2010 at 03:29:05PM +0900, Koki Sanagi wrote:
> CHANGE-LOG since v3:
> -change arguments of softirq tracepoint into original one.
>
> From: Lai Jiangshan <[email protected]>
>
> Add a tracepoint for tracing when softirq action is raised.
>
> It and the existed tracepoints complete softirq's tracepoints:
> softirq_raise, softirq_entry and softirq_exit.
>
> And when this tracepoint is used in combination with
> the softirq_entry tracepoint we can determine
> the softirq raise latency.
>
> Signed-off-by: Lai Jiangshan <[email protected]>
> Acked-by: Mathieu Desnoyers <[email protected]>
> Acked-by: Frederic Weisbecker <[email protected]>
>
Sorry its taken me awhile to get back to these, I've been on PTO
Acked-by: Neil Horman <[email protected]>

> [ factorize softirq events with DECLARE_EVENT_CLASS ]
> Signed-off-by: Koki Sanagi <[email protected]>
> ---
> include/linux/interrupt.h | 8 +++++++-
> include/trace/events/irq.h | 26 ++++++++++++++++++++++++--
> 2 files changed, 31 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
> index c233113..5460db4 100644
> --- a/include/linux/interrupt.h
> +++ b/include/linux/interrupt.h
> @@ -18,6 +18,7 @@
> #include <asm/atomic.h>
> #include <asm/ptrace.h>
> #include <asm/system.h>
> +#include <trace/events/irq.h>
>
> /*
> * These correspond to the IORESOURCE_IRQ_* defines in
> @@ -402,7 +403,12 @@ asmlinkage void do_softirq(void);
> asmlinkage void __do_softirq(void);
> extern void open_softirq(int nr, void (*action)(struct softirq_action *));
> extern void softirq_init(void);
> -#define __raise_softirq_irqoff(nr) do { or_softirq_pending(1UL << (nr)); } while (0)
> +static inline void __raise_softirq_irqoff(unsigned int nr)
> +{
> + trace_softirq_raise((struct softirq_action *)&nr, NULL);
> + or_softirq_pending(1UL << nr);
> +}
> +
> extern void raise_softirq_irqoff(unsigned int nr);
> extern void raise_softirq(unsigned int nr);
> extern void wakeup_softirqd(void);
> diff --git a/include/trace/events/irq.h b/include/trace/events/irq.h
> index 0e4cfb6..3ddda02 100644
> --- a/include/trace/events/irq.h
> +++ b/include/trace/events/irq.h
> @@ -5,7 +5,9 @@
> #define _TRACE_IRQ_H
>
> #include <linux/tracepoint.h>
> -#include <linux/interrupt.h>
> +
> +struct irqaction;
> +struct softirq_action;
>
> #define softirq_name(sirq) { sirq##_SOFTIRQ, #sirq }
> #define show_softirq_name(val) \
> @@ -93,7 +95,10 @@ DECLARE_EVENT_CLASS(softirq,
> ),
>
> TP_fast_assign(
> - __entry->vec = (int)(h - vec);
> + if (vec)
> + __entry->vec = (int)(h - vec);
> + else
> + __entry->vec = *((int *)h);
> ),
>
> TP_printk("vec=%d [action=%s]", __entry->vec,
> @@ -136,6 +141,23 @@ DEFINE_EVENT(softirq, softirq_exit,
> TP_ARGS(h, vec)
> );
>
> +/**
> + * softirq_raise - called immediately when a softirq is raised
> + * @h: pointer to struct softirq_action
> + * @vec: pointer to first struct softirq_action in softirq_vec array
> + *
> + * The @h parameter contains a pointer to the softirq vector number which is
> + * raised. @vec is NULL and it means @h includes vector number not
> + * softirq_action. When used in combination with the softirq_entry tracepoint
> + * we can determine the softirq raise latency.
> + */
> +DEFINE_EVENT(softirq, softirq_raise,
> +
> + TP_PROTO(struct softirq_action *h, struct softirq_action *vec),
> +
> + TP_ARGS(h, vec)
> +);
> +
> #endif /* _TRACE_IRQ_H */
>
> /* This part must be outside protection */
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

2010-08-27 12:38:09

by Neil Horman

[permalink] [raw]
Subject: Re: [RFC PATCH v4 2/5] napi: convert trace_napi_poll to TRACE_EVENT

On Wed, Jul 28, 2010 at 03:30:14PM +0900, Koki Sanagi wrote:
> From: Neil Horman <[email protected]>
>
> This patch converts trace_napi_poll from DECLARE_EVENT to TRACE_EVENT to improve
> the usability of napi_poll tracepoint.
>
> <idle>-0 [001] 241302.750777: napi_poll: napi poll on napi struct f6acc480 for device eth3
> <idle>-0 [000] 241302.852389: napi_poll: napi poll on napi struct f5d0d70c for device eth1
>
> An original patch is below.
> http://marc.info/?l=linux-kernel&m=126021713809450&w=2
> Signed-off-by: Neil Horman <[email protected]>
>
> And add a fix by Steven Rostedt.
> http://marc.info/?l=linux-kernel&m=126150506519173&w=2
>
> Signed-off-by: Koki Sanagi <[email protected]>
> ---
> include/trace/events/napi.h | 25 +++++++++++++++++++++++--
> 1 files changed, 23 insertions(+), 2 deletions(-)
>
Acked-by: Neil Horman <[email protected]>

> diff --git a/include/trace/events/napi.h b/include/trace/events/napi.h
> index 188deca..8fe1e93 100644
> --- a/include/trace/events/napi.h
> +++ b/include/trace/events/napi.h
> @@ -6,10 +6,31 @@
>
> #include <linux/netdevice.h>
> #include <linux/tracepoint.h>
> +#include <linux/ftrace.h>
> +
> +#define NO_DEV "(no_device)"
> +
> +TRACE_EVENT(napi_poll,
>
> -DECLARE_TRACE(napi_poll,
> TP_PROTO(struct napi_struct *napi),
> - TP_ARGS(napi));
> +
> + TP_ARGS(napi),
> +
> + TP_STRUCT__entry(
> + __field( struct napi_struct *, napi)
> + __string( dev_name, napi->dev ? napi->dev->name : NO_DEV)
> + ),
> +
> + TP_fast_assign(
> + __entry->napi = napi;
> + __assign_str(dev_name, napi->dev ? napi->dev->name : NO_DEV);
> + ),
> +
> + TP_printk("napi poll on napi struct %p for device %s",
> + __entry->napi, __get_str(dev_name))
> +);
> +
> +#undef NO_DEV
>
> #endif /* _TRACE_NAPI_H_ */
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

2010-08-27 12:38:35

by Neil Horman

[permalink] [raw]
Subject: Re: [RFC PATCH v4 3/5] netdev: add tracepoints to netdev layer

On Wed, Jul 28, 2010 at 03:31:17PM +0900, Koki Sanagi wrote:
> CHANGE-LOG since v3:
> -add tracepoint to netif_rx.
>
>
> This patch adds tracepoint to dev_queue_xmit, dev_hard_start_xmit, netif_rx and
> netif_receive_skb. These tracepoints help you to monitor network driver's
> input/output.
>
> <idle>-0 [001] 112447.902030: netif_rx: dev=eth1 skbaddr=f3ef0900 len=84
> <idle>-0 [001] 112447.902039: netif_receive_skb: dev=eth1 skbaddr=f3ef0900 len=84
> sshd-6828 [000] 112447.903257: net_dev_queue: dev=eth4 skbaddr=f3fca538 len=226
> sshd-6828 [000] 112447.903260: net_dev_xmit: dev=eth4 skbaddr=f3fca538 len=226 rc=0
>
> Signed-off-by: Koki Sanagi <[email protected]>
Acked-by: Neil Horman <[email protected]>

> ---
> include/trace/events/net.h | 82 ++++++++++++++++++++++++++++++++++++++++++++
> net/core/dev.c | 6 +++
> net/core/net-traces.c | 1 +
> 3 files changed, 89 insertions(+), 0 deletions(-)
>
> diff --git a/include/trace/events/net.h b/include/trace/events/net.h
> new file mode 100644
> index 0000000..5f247f5
> --- /dev/null
> +++ b/include/trace/events/net.h
> @@ -0,0 +1,82 @@
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM net
> +
> +#if !defined(_TRACE_NET_H) || defined(TRACE_HEADER_MULTI_READ)
> +#define _TRACE_NET_H
> +
> +#include <linux/skbuff.h>
> +#include <linux/netdevice.h>
> +#include <linux/ip.h>
> +#include <linux/tracepoint.h>
> +
> +TRACE_EVENT(net_dev_xmit,
> +
> + TP_PROTO(struct sk_buff *skb,
> + int rc),
> +
> + TP_ARGS(skb, rc),
> +
> + TP_STRUCT__entry(
> + __field( void *, skbaddr )
> + __field( unsigned int, len )
> + __field( int, rc )
> + __string( name, skb->dev->name )
> + ),
> +
> + TP_fast_assign(
> + __entry->skbaddr = skb;
> + __entry->len = skb->len;
> + __entry->rc = rc;
> + __assign_str(name, skb->dev->name);
> + ),
> +
> + TP_printk("dev=%s skbaddr=%p len=%u rc=%d",
> + __get_str(name), __entry->skbaddr, __entry->len, __entry->rc)
> +);
> +
> +DECLARE_EVENT_CLASS(net_dev_template,
> +
> + TP_PROTO(struct sk_buff *skb),
> +
> + TP_ARGS(skb),
> +
> + TP_STRUCT__entry(
> + __field( void *, skbaddr )
> + __field( unsigned int, len )
> + __string( name, skb->dev->name )
> + ),
> +
> + TP_fast_assign(
> + __entry->skbaddr = skb;
> + __entry->len = skb->len;
> + __assign_str(name, skb->dev->name);
> + ),
> +
> + TP_printk("dev=%s skbaddr=%p len=%u",
> + __get_str(name), __entry->skbaddr, __entry->len)
> +)
> +
> +DEFINE_EVENT(net_dev_template, net_dev_queue,
> +
> + TP_PROTO(struct sk_buff *skb),
> +
> + TP_ARGS(skb)
> +);
> +
> +DEFINE_EVENT(net_dev_template, netif_receive_skb,
> +
> + TP_PROTO(struct sk_buff *skb),
> +
> + TP_ARGS(skb)
> +);
> +
> +DEFINE_EVENT(net_dev_template, netif_rx,
> +
> + TP_PROTO(struct sk_buff *skb),
> +
> + TP_ARGS(skb)
> +);
> +#endif /* _TRACE_NET_H */
> +
> +/* This part must be outside protection */
> +#include <trace/define_trace.h>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 6e1b437..9ea9c1e 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -130,6 +130,7 @@
> #include <linux/jhash.h>
> #include <linux/random.h>
> #include <trace/events/napi.h>
> +#include <trace/events/net.h>
> #include <linux/pci.h>
>
> #include "net-sysfs.h"
> @@ -1981,6 +1982,7 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
> }
>
> rc = ops->ndo_start_xmit(skb, dev);
> + trace_net_dev_xmit(skb, rc);
> if (rc == NETDEV_TX_OK)
> txq_trans_update(txq);
> return rc;
> @@ -2001,6 +2003,7 @@ gso:
> skb_dst_drop(nskb);
>
> rc = ops->ndo_start_xmit(nskb, dev);
> + trace_net_dev_xmit(nskb, rc);
> if (unlikely(rc != NETDEV_TX_OK)) {
> if (rc & ~NETDEV_TX_MASK)
> goto out_kfree_gso_skb;
> @@ -2189,6 +2192,7 @@ int dev_queue_xmit(struct sk_buff *skb)
> #ifdef CONFIG_NET_CLS_ACT
> skb->tc_verd = SET_TC_AT(skb->tc_verd, AT_EGRESS);
> #endif
> + trace_net_dev_queue(skb);
> if (q->enqueue) {
> rc = __dev_xmit_skb(skb, q, dev, txq);
> goto out;
> @@ -2515,6 +2519,7 @@ int netif_rx(struct sk_buff *skb)
> if (netdev_tstamp_prequeue)
> net_timestamp_check(skb);
>
> + trace_netif_rx(skb);
> #ifdef CONFIG_RPS
> {
> struct rps_dev_flow voidflow, *rflow = &voidflow;
> @@ -2829,6 +2834,7 @@ static int __netif_receive_skb(struct sk_buff *skb)
> if (!netdev_tstamp_prequeue)
> net_timestamp_check(skb);
>
> + trace_netif_receive_skb(skb);
> if (vlan_tx_tag_present(skb) && vlan_hwaccel_do_receive(skb))
> return NET_RX_SUCCESS;
>
> diff --git a/net/core/net-traces.c b/net/core/net-traces.c
> index afa6380..7f1bb2a 100644
> --- a/net/core/net-traces.c
> +++ b/net/core/net-traces.c
> @@ -26,6 +26,7 @@
>
> #define CREATE_TRACE_POINTS
> #include <trace/events/skb.h>
> +#include <trace/events/net.h>
> #include <trace/events/napi.h>
>
> EXPORT_TRACEPOINT_SYMBOL_GPL(kfree_skb);
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

2010-08-27 12:38:57

by Neil Horman

[permalink] [raw]
Subject: Re: [RFC PATCH v4 4/5] skb: add tracepoints to freeing skb

On Wed, Jul 28, 2010 at 03:33:57PM +0900, Koki Sanagi wrote:
> CHANGE-LOG since v3:
> -remove tracepoint of dev_kfree_skb_irq and skb_free_datagram_locked.
> Insead of them, add trace_kfree_skb before __kfree_skb in net_tx_action and
> skb_free_datagram_locked. And there is no change of about the location of
> trace_consume_skb to keep a symmetry against trace_kfree_skb.
>
> This patch adds tracepoint to consume_skb and add trace_kfree_skb before
> __kfree_skb in skb_free_datagram_locked and net_tx_action.
> Combinating with tracepoint on dev_hard_start_xmit, we can check how long it
> takes to free transmited packets. And using it, we can calculate how many
> packets driver had at that time. It is useful when a drop of transmited packet
> is a problem.
>
> sshd-6828 [000] 112689.258154: consume_skb: skbaddr=f2d99bb8
>
> Signed-off-by: Koki Sanagi <[email protected]>
Acked-by: Neil Horman <[email protected]>

> ---
> include/trace/events/skb.h | 17 +++++++++++++++++
> net/core/datagram.c | 1 +
> net/core/dev.c | 2 ++
> net/core/skbuff.c | 1 +
> 4 files changed, 21 insertions(+), 0 deletions(-)
>
> diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h
> index 4b2be6d..75ce9d5 100644
> --- a/include/trace/events/skb.h
> +++ b/include/trace/events/skb.h
> @@ -35,6 +35,23 @@ TRACE_EVENT(kfree_skb,
> __entry->skbaddr, __entry->protocol, __entry->location)
> );
>
> +TRACE_EVENT(consume_skb,
> +
> + TP_PROTO(struct sk_buff *skb),
> +
> + TP_ARGS(skb),
> +
> + TP_STRUCT__entry(
> + __field( void *, skbaddr )
> + ),
> +
> + TP_fast_assign(
> + __entry->skbaddr = skb;
> + ),
> +
> + TP_printk("skbaddr=%p", __entry->skbaddr)
> +);
> +
> TRACE_EVENT(skb_copy_datagram_iovec,
>
> TP_PROTO(const struct sk_buff *skb, int len),
> diff --git a/net/core/datagram.c b/net/core/datagram.c
> index 251997a..282806b 100644
> --- a/net/core/datagram.c
> +++ b/net/core/datagram.c
> @@ -243,6 +243,7 @@ void skb_free_datagram_locked(struct sock *sk, struct sk_buff *skb)
> unlock_sock_fast(sk, slow);
>
> /* skb is now orphaned, can be freed outside of locked section */
> + trace_kfree_skb(skb, skb_free_datagram_locked);
> __kfree_skb(skb);
> }
> EXPORT_SYMBOL(skb_free_datagram_locked);
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 9ea9c1e..92d79bf 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -131,6 +131,7 @@
> #include <linux/random.h>
> #include <trace/events/napi.h>
> #include <trace/events/net.h>
> +#include <trace/events/skb.h>
> #include <linux/pci.h>
>
> #include "net-sysfs.h"
> @@ -2577,6 +2578,7 @@ static void net_tx_action(struct softirq_action *h)
> clist = clist->next;
>
> WARN_ON(atomic_read(&skb->users));
> + trace_kfree_skb(skb, net_tx_action);
> __kfree_skb(skb);
> }
> }
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 7da58a2..9cb91a5 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -466,6 +466,7 @@ void consume_skb(struct sk_buff *skb)
> smp_rmb();
> else if (likely(!atomic_dec_and_test(&skb->users)))
> return;
> + trace_consume_skb(skb);
> __kfree_skb(skb);
> }
> EXPORT_SYMBOL(consume_skb);
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

2010-08-27 12:39:37

by Neil Horman

[permalink] [raw]
Subject: Re: [RFC PATCH v4 5/5] perf:add a script shows a process of packet

On Wed, Jul 28, 2010 at 03:35:16PM +0900, Koki Sanagi wrote:
> Add a perf script which shows a process of packets and processed time.
> It helps us to investigate networking or network device.
>
> If you want to use it, install perf and record perf.data like following.
>
> #perf trace record netdev-times [script]
>
> If you set script, perf gathers records until it ends.
> If not, you must Ctrl-C to stop recording.
>
> And if you want a report from record,
>
> #perf trace report netdev-times [options]
>
> If you use some options, you can limit an output.
> Option is below.
>
> tx: show only process of tx packets
> rx: show only process of rx packets
> dev=: show a process specified with this option
> debug: work with debug mode. It shows buffer status.
>
> For example, if you want to show a process of received packets associated
> with eth4,
>
> #perf trace report netdev-times rx dev=eth4
> 106133.171439sec cpu=0
> irq_entry(+0.000msec irq=24:eth4)
> |
> softirq_entry(+0.006msec)
> |
> |---netif_receive_skb(+0.010msec skb=f2d15900 len=100)
> | |
> | skb_copy_datagram_iovec(+0.039msec 10291::10291)
> |
> napi_poll_exit(+0.022msec eth4)
>
> This perf script helps us to analyze a process time of transmit/receive
> sequence.
>
> Signed-off-by: Koki Sanagi <[email protected]>
Acked-by: Neil Horman <[email protected]>

> ---
> tools/perf/scripts/python/bin/netdev-times-record | 8 +
> tools/perf/scripts/python/bin/netdev-times-report | 5 +
> tools/perf/scripts/python/netdev-times.py | 464 +++++++++++++++++++++
> 3 files changed, 477 insertions(+), 0 deletions(-)
>
> diff --git a/tools/perf/scripts/python/bin/netdev-times-record b/tools/perf/scripts/python/bin/netdev-times-record
> new file mode 100644
> index 0000000..2b59511
> --- /dev/null
> +++ b/tools/perf/scripts/python/bin/netdev-times-record
> @@ -0,0 +1,8 @@
> +#!/bin/bash
> +perf record -c 1 -f -R -a -e net:net_dev_xmit -e net:net_dev_queue \
> + -e net:netif_receive_skb -e net:netif_rx \
> + -e skb:consume_skb -e skb:kfree_skb \
> + -e skb:skb_copy_datagram_iovec -e napi:napi_poll \
> + -e irq:irq_handler_entry -e irq:irq_handler_exit \
> + -e irq:softirq_entry -e irq:softirq_exit \
> + -e irq:softirq_raise $@
> diff --git a/tools/perf/scripts/python/bin/netdev-times-report b/tools/perf/scripts/python/bin/netdev-times-report
> new file mode 100644
> index 0000000..c3d0a63
> --- /dev/null
> +++ b/tools/perf/scripts/python/bin/netdev-times-report
> @@ -0,0 +1,5 @@
> +#!/bin/bash
> +# description: display a process of packet and processing time
> +# args: [tx] [rx] [dev=] [debug]
> +
> +perf trace -s ~/libexec/perf-core/scripts/python/netdev-times.py $@
> diff --git a/tools/perf/scripts/python/netdev-times.py b/tools/perf/scripts/python/netdev-times.py
> new file mode 100644
> index 0000000..9aa0a32
> --- /dev/null
> +++ b/tools/perf/scripts/python/netdev-times.py
> @@ -0,0 +1,464 @@
> +# Display a process of packets and processed time.
> +# It helps us to investigate networking or network device.
> +#
> +# options
> +# tx: show only tx chart
> +# rx: show only rx chart
> +# dev=: show only thing related to specified device
> +# debug: work with debug mode. It shows buffer status.
> +
> +import os
> +import sys
> +
> +sys.path.append(os.environ['PERF_EXEC_PATH'] + \
> + '/scripts/python/Perf-Trace-Util/lib/Perf/Trace')
> +
> +from perf_trace_context import *
> +from Core import *
> +from Util import *
> +
> +all_event_list = []; # insert all tracepoint event related with this script
> +irq_dic = {}; # key is cpu and value is a list which stacks irqs
> + # which raise NET_RX softirq
> +net_rx_dic = {}; # key is cpu and value include time of NET_RX softirq-entry
> + # and a list which stacks receive
> +receive_hunk_list = []; # a list which include a sequence of receive events
> +rx_skb_list = []; # received packet list for matching
> + # skb_copy_datagram_iovec
> +
> +buffer_budget = 65536; # the budget of rx_skb_list, tx_queue_list and
> + # tx_xmit_list
> +of_count_rx_skb_list = 0; # overflow count
> +
> +tx_queue_list = []; # list of packets which pass through dev_queue_xmit
> +of_count_tx_queue_list = 0; # overflow count
> +
> +tx_xmit_list = []; # list of packets which pass through dev_hard_start_xmit
> +of_count_tx_xmit_list = 0; # overflow count
> +
> +tx_free_list = []; # list of packets which is freed
> +
> +# options
> +show_tx = 0;
> +show_rx = 0;
> +dev = 0; # store a name of device specified by option "dev="
> +debug = 0;
> +
> +# indices of event_info tuple
> +EINFO_IDX_NAME= 0
> +EINFO_IDX_CONTEXT=1
> +EINFO_IDX_CPU= 2
> +EINFO_IDX_TIME= 3
> +EINFO_IDX_PID= 4
> +EINFO_IDX_COMM= 5
> +
> +# Calculate a time interval(msec) from src(nsec) to dst(nsec)
> +def diff_msec(src, dst):
> + return (dst - src) / 1000000.0
> +
> +# Display a process of transmitting a packet
> +def print_transmit(hunk):
> + if dev != 0 and hunk['dev'].find(dev) < 0:
> + return
> + print "%7s %5d %6d.%06dsec %12.3fmsec %12.3fmsec" % \
> + (hunk['dev'], hunk['len'],
> + nsecs_secs(hunk['queue_t']),
> + nsecs_nsecs(hunk['queue_t'])/1000,
> + diff_msec(hunk['queue_t'], hunk['xmit_t']),
> + diff_msec(hunk['xmit_t'], hunk['free_t']))
> +
> +# Format for displaying rx packet processing
> +PF_IRQ_ENTRY= " irq_entry(+%.3fmsec irq=%d:%s)"
> +PF_SOFT_ENTRY=" softirq_entry(+%.3fmsec)"
> +PF_NAPI_POLL= " napi_poll_exit(+%.3fmsec %s)"
> +PF_JOINT= " |"
> +PF_WJOINT= " | |"
> +PF_NET_RECV= " |---netif_receive_skb(+%.3fmsec skb=%x len=%d)"
> +PF_NET_RX= " |---netif_rx(+%.3fmsec skb=%x)"
> +PF_CPY_DGRAM= " | skb_copy_datagram_iovec(+%.3fmsec %d:%s)"
> +PF_KFREE_SKB= " | kfree_skb(+%.3fmsec location=%x)"
> +PF_CONS_SKB= " | consume_skb(+%.3fmsec)"
> +
> +# Display a process of received packets and interrputs associated with
> +# a NET_RX softirq
> +def print_receive(hunk):
> + show_hunk = 0
> + irq_list = hunk['irq_list']
> + cpu = irq_list[0]['cpu']
> + base_t = irq_list[0]['irq_ent_t']
> + # check if this hunk should be showed
> + if dev != 0:
> + for i in range(len(irq_list)):
> + if irq_list[i]['name'].find(dev) >= 0:
> + show_hunk = 1
> + break
> + else:
> + show_hunk = 1
> + if show_hunk == 0:
> + return
> +
> + print "%d.%06dsec cpu=%d" % \
> + (nsecs_secs(base_t), nsecs_nsecs(base_t)/1000, cpu)
> + for i in range(len(irq_list)):
> + print PF_IRQ_ENTRY % \
> + (diff_msec(base_t, irq_list[i]['irq_ent_t']),
> + irq_list[i]['irq'], irq_list[i]['name'])
> + print PF_JOINT
> + irq_event_list = irq_list[i]['event_list']
> + for j in range(len(irq_event_list)):
> + irq_event = irq_event_list[j]
> + if irq_event['event'] == 'netif_rx':
> + print PF_NET_RX % \
> + (diff_msec(base_t, irq_event['time']),
> + irq_event['skbaddr'])
> + print PF_JOINT
> + print PF_SOFT_ENTRY % \
> + diff_msec(base_t, hunk['sirq_ent_t'])
> + print PF_JOINT
> + event_list = hunk['event_list']
> + for i in range(len(event_list)):
> + event = event_list[i]
> + if event['event_name'] == 'napi_poll':
> + print PF_NAPI_POLL % \
> + (diff_msec(base_t, event['event_t']), event['dev'])
> + if i == len(event_list) - 1:
> + print ""
> + else:
> + print PF_JOINT
> + else:
> + print PF_NET_RECV % \
> + (diff_msec(base_t, event['event_t']), event['skbaddr'],
> + event['len'])
> + if 'comm' in event.keys():
> + print PF_WJOINT
> + print PF_CPY_DGRAM % \
> + (diff_msec(base_t, event['comm_t']),
> + event['pid'], event['comm'])
> + elif 'handle' in event.keys():
> + print PF_WJOINT
> + if event['handle'] == "kfree_skb":
> + print PF_KFREE_SKB % \
> + (diff_msec(base_t,
> + event['comm_t']),
> + event['location'])
> + elif event['handle'] == "consume_skb":
> + print PF_CONS_SKB % \
> + diff_msec(base_t,
> + event['comm_t'])
> + print PF_JOINT
> +
> +def trace_begin():
> + global show_tx
> + global show_rx
> + global dev
> + global debug
> +
> + for i in range(len(sys.argv)):
> + if i == 0:
> + continue
> + arg = sys.argv[i]
> + if arg == 'tx':
> + show_tx = 1
> + elif arg =='rx':
> + show_rx = 1
> + elif arg.find('dev=',0, 4) >= 0:
> + dev = arg[4:]
> + elif arg == 'debug':
> + debug = 1
> + if show_tx == 0 and show_rx == 0:
> + show_tx = 1
> + show_rx = 1
> +
> +def trace_end():
> + # order all events in time
> + all_event_list.sort(lambda a,b :cmp(a[EINFO_IDX_TIME],
> + b[EINFO_IDX_TIME]))
> + # process all events
> + for i in range(len(all_event_list)):
> + event_info = all_event_list[i]
> + name = event_info[EINFO_IDX_NAME]
> + if name == 'irq__softirq_exit':
> + handle_irq_softirq_exit(event_info)
> + elif name == 'irq__softirq_entry':
> + handle_irq_softirq_entry(event_info)
> + elif name == 'irq__softirq_raise':
> + handle_irq_softirq_raise(event_info)
> + elif name == 'irq__irq_handler_entry':
> + handle_irq_handler_entry(event_info)
> + elif name == 'irq__irq_handler_exit':
> + handle_irq_handler_exit(event_info)
> + elif name == 'napi__napi_poll':
> + handle_napi_poll(event_info)
> + elif name == 'net__netif_receive_skb':
> + handle_netif_receive_skb(event_info)
> + elif name == 'net__netif_rx':
> + handle_netif_rx(event_info)
> + elif name == 'skb__skb_copy_datagram_iovec':
> + handle_skb_copy_datagram_iovec(event_info)
> + elif name == 'net__net_dev_queue':
> + handle_net_dev_queue(event_info)
> + elif name == 'net__net_dev_xmit':
> + handle_net_dev_xmit(event_info)
> + elif name == 'skb__kfree_skb':
> + handle_kfree_skb(event_info)
> + elif name == 'skb__consume_skb':
> + handle_consume_skb(event_info)
> + # display receive hunks
> + if show_rx:
> + for i in range(len(receive_hunk_list)):
> + print_receive(receive_hunk_list[i])
> + # display transmit hunks
> + if show_tx:
> + print " dev len Qdisc " \
> + " netdevice free"
> + for i in range(len(tx_free_list)):
> + print_transmit(tx_free_list[i])
> + if debug:
> + print "debug buffer status"
> + print "----------------------------"
> + print "xmit Qdisc:remain:%d overflow:%d" % \
> + (len(tx_queue_list), of_count_tx_queue_list)
> + print "xmit netdevice:remain:%d overflow:%d" % \
> + (len(tx_xmit_list), of_count_tx_xmit_list)
> + print "receive:remain:%d overflow:%d" % \
> + (len(rx_skb_list), of_count_rx_skb_list)
> +
> +# called from perf, when it finds a correspoinding event
> +def irq__softirq_entry(name, context, cpu, sec, nsec, pid, comm, vec):
> + if symbol_str("irq__softirq_entry", "vec", vec) != "NET_RX":
> + return
> + event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm, vec)
> + all_event_list.append(event_info)
> +
> +def irq__softirq_exit(name, context, cpu, sec, nsec, pid, comm, vec):
> + if symbol_str("irq__softirq_entry", "vec", vec) != "NET_RX":
> + return
> + event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm, vec)
> + all_event_list.append(event_info)
> +
> +def irq__softirq_raise(name, context, cpu, sec, nsec, pid, comm, vec):
> + if symbol_str("irq__softirq_entry", "vec", vec) != "NET_RX":
> + return
> + event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm, vec)
> + all_event_list.append(event_info)
> +
> +def irq__irq_handler_entry(name, context, cpu, sec, nsec, pid, comm,
> + irq, irq_name):
> + event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
> + irq, irq_name)
> + all_event_list.append(event_info)
> +
> +def irq__irq_handler_exit(name, context, cpu, sec, nsec, pid, comm, irq, ret):
> + event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm, irq, ret)
> + all_event_list.append(event_info)
> +
> +def napi__napi_poll(name, context, cpu, sec, nsec, pid, comm, napi, dev_name):
> + event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
> + napi, dev_name)
> + all_event_list.append(event_info)
> +
> +def net__netif_receive_skb(name, context, cpu, sec, nsec, pid, comm, skbaddr,
> + skblen, dev_name):
> + event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
> + skbaddr, skblen, dev_name)
> + all_event_list.append(event_info)
> +
> +def net__netif_rx(name, context, cpu, sec, nsec, pid, comm, skbaddr,
> + skblen, dev_name):
> + event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
> + skbaddr, skblen, dev_name)
> + all_event_list.append(event_info)
> +
> +def net__net_dev_queue(name, context, cpu, sec, nsec, pid, comm,
> + skbaddr, skblen, dev_name):
> + event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
> + skbaddr, skblen, dev_name)
> + all_event_list.append(event_info)
> +
> +def net__net_dev_xmit(name, context, cpu, sec, nsec, pid, comm,
> + skbaddr, skblen, rc, dev_name):
> + event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
> + skbaddr, skblen, rc ,dev_name)
> + all_event_list.append(event_info)
> +
> +def skb__kfree_skb(name, context, cpu, sec, nsec, pid, comm,
> + skbaddr, protocol, location):
> + event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
> + skbaddr, protocol, location)
> + all_event_list.append(event_info)
> +
> +def skb__consume_skb(name, context, cpu, sec, nsec, pid, comm, skbaddr):
> + event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
> + skbaddr)
> + all_event_list.append(event_info)
> +
> +def skb__skb_copy_datagram_iovec(name, context, cpu, sec, nsec, pid, comm,
> + skbaddr, skblen):
> + event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
> + skbaddr, skblen)
> + all_event_list.append(event_info)
> +
> +def handle_irq_handler_entry(event_info):
> + (name, context, cpu, time, pid, comm, irq, irq_name) = event_info
> + if cpu not in irq_dic.keys():
> + irq_dic[cpu] = []
> + irq_record = {'irq':irq, 'name':irq_name, 'cpu':cpu, 'irq_ent_t':time}
> + irq_dic[cpu].append(irq_record)
> +
> +def handle_irq_handler_exit(event_info):
> + (name, context, cpu, time, pid, comm, irq, ret) = event_info
> + if cpu not in irq_dic.keys():
> + return
> + irq_record = irq_dic[cpu].pop()
> + if irq != irq_record['irq']:
> + return
> + irq_record.update({'irq_ext_t':time})
> + # if an irq doesn't include NET_RX softirq, drop.
> + if 'event_list' in irq_record.keys():
> + irq_dic[cpu].append(irq_record)
> +
> +def handle_irq_softirq_raise(event_info):
> + (name, context, cpu, time, pid, comm, vec) = event_info
> + if cpu not in irq_dic.keys() \
> + or len(irq_dic[cpu]) == 0:
> + return
> + irq_record = irq_dic[cpu].pop()
> + if 'event_list' in irq_record.keys():
> + irq_event_list = irq_record['event_list']
> + else:
> + irq_event_list = []
> + irq_event_list.append({'time':time, 'event':'sirq_raise'})
> + irq_record.update({'event_list':irq_event_list})
> + irq_dic[cpu].append(irq_record)
> +
> +def handle_irq_softirq_entry(event_info):
> + (name, context, cpu, time, pid, comm, vec) = event_info
> + net_rx_dic[cpu] = {'sirq_ent_t':time, 'event_list':[]}
> +
> +def handle_irq_softirq_exit(event_info):
> + (name, context, cpu, time, pid, comm, vec) = event_info
> + irq_list = []
> + event_list = 0
> + if cpu in irq_dic.keys():
> + irq_list = irq_dic[cpu]
> + del irq_dic[cpu]
> + if cpu in net_rx_dic.keys():
> + sirq_ent_t = net_rx_dic[cpu]['sirq_ent_t']
> + event_list = net_rx_dic[cpu]['event_list']
> + del net_rx_dic[cpu]
> + if irq_list == [] or event_list == 0:
> + return
> + rec_data = {'sirq_ent_t':sirq_ent_t, 'sirq_ext_t':time,
> + 'irq_list':irq_list, 'event_list':event_list}
> + # merge information realted to a NET_RX softirq
> + receive_hunk_list.append(rec_data)
> +
> +def handle_napi_poll(event_info):
> + (name, context, cpu, time, pid, comm, napi, dev_name) = event_info
> + if cpu in net_rx_dic.keys():
> + event_list = net_rx_dic[cpu]['event_list']
> + rec_data = {'event_name':'napi_poll',
> + 'dev':dev_name, 'event_t':time}
> + event_list.append(rec_data)
> +
> +def handle_netif_rx(event_info):
> + (name, context, cpu, time, pid, comm,
> + skbaddr, skblen, dev_name) = event_info
> + if cpu not in irq_dic.keys() \
> + or len(irq_dic[cpu]) == 0:
> + return
> + irq_record = irq_dic[cpu].pop()
> + if 'event_list' in irq_record.keys():
> + irq_event_list = irq_record['event_list']
> + else:
> + irq_event_list = []
> + irq_event_list.append({'time':time, 'event':'netif_rx',
> + 'skbaddr':skbaddr, 'skblen':skblen, 'dev_name':dev_name})
> + irq_record.update({'event_list':irq_event_list})
> + irq_dic[cpu].append(irq_record)
> +
> +def handle_netif_receive_skb(event_info):
> + global of_count_rx_skb_list
> +
> + (name, context, cpu, time, pid, comm,
> + skbaddr, skblen, dev_name) = event_info
> + if cpu in net_rx_dic.keys():
> + rec_data = {'event_name':'netif_receive_skb',
> + 'event_t':time, 'skbaddr':skbaddr, 'len':skblen}
> + event_list = net_rx_dic[cpu]['event_list']
> + event_list.append(rec_data)
> + rx_skb_list.insert(0, rec_data)
> + if len(rx_skb_list) > buffer_budget:
> + rx_skb_list.pop()
> + of_count_rx_skb_list += 1
> +
> +def handle_net_dev_queue(event_info):
> + global of_count_tx_queue_list
> +
> + (name, context, cpu, time, pid, comm,
> + skbaddr, skblen, dev_name) = event_info
> + skb = {'dev':dev_name, 'skbaddr':skbaddr, 'len':skblen, 'queue_t':time}
> + tx_queue_list.insert(0, skb)
> + if len(tx_queue_list) > buffer_budget:
> + tx_queue_list.pop()
> + of_count_tx_queue_list += 1
> +
> +def handle_net_dev_xmit(event_info):
> + global of_count_tx_xmit_list
> +
> + (name, context, cpu, time, pid, comm,
> + skbaddr, skblen, rc, dev_name) = event_info
> + if rc == 0: # NETDEV_TX_OK
> + for i in range(len(tx_queue_list)):
> + skb = tx_queue_list[i]
> + if skb['skbaddr'] == skbaddr:
> + skb['xmit_t'] = time
> + tx_xmit_list.insert(0, skb)
> + del tx_queue_list[i]
> + if len(tx_xmit_list) > buffer_budget:
> + tx_xmit_list.pop()
> + of_count_tx_xmit_list += 1
> + return
> +
> +def handle_kfree_skb(event_info):
> + (name, context, cpu, time, pid, comm,
> + skbaddr, protocol, location) = event_info
> + for i in range(len(tx_queue_list)):
> + skb = tx_queue_list[i]
> + if skb['skbaddr'] == skbaddr:
> + del tx_queue_list[i]
> + return
> + for i in range(len(tx_xmit_list)):
> + skb = tx_xmit_list[i]
> + if skb['skbaddr'] == skbaddr:
> + skb['free_t'] = time
> + tx_free_list.append(skb)
> + del tx_xmit_list[i]
> + return
> + for i in range(len(rx_skb_list)):
> + rec_data = rx_skb_list[i]
> + if rec_data['skbaddr'] == skbaddr:
> + rec_data.update({'handle':"kfree_skb",
> + 'comm':comm, 'pid':pid, 'comm_t':time})
> + del rx_skb_list[i]
> + return
> +
> +def handle_consume_skb(event_info):
> + (name, context, cpu, time, pid, comm, skbaddr) = event_info
> + for i in range(len(tx_xmit_list)):
> + skb = tx_xmit_list[i]
> + if skb['skbaddr'] == skbaddr:
> + skb['free_t'] = time
> + tx_free_list.append(skb)
> + del tx_xmit_list[i]
> + return
> +
> +def handle_skb_copy_datagram_iovec(event_info):
> + (name, context, cpu, time, pid, comm, skbaddr, skblen) = event_info
> + for i in range(len(rx_skb_list)):
> + rec_data = rx_skb_list[i]
> + if skbaddr == rec_data['skbaddr']:
> + rec_data.update({'handle':"skb_copy_datagram_iovec",
> + 'comm':comm, 'pid':pid, 'comm_t':time})
> + del rx_skb_list[i]
> + return
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>