In the status quo, we see three different outcomes for the reported
sched-out task state from perf-script, perf-sched-timehist, and TP_printk
of tracepoint sched_switch. It's not hard to figure out that the former
two are built upon the third one, and the reason for this inconsistency
is that the former two have not kept up with the internal changes to the
reported task state definitions as the kernel evolves.
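To illustrate the kind of drift described above, here is a minimal sketch
(not part of this series) that decodes the same raw prev_state bit with two
lookup strings. OLD_STATES is the string from the write_state() helper
removed in the libtraceevent patch; NEW_STATES is only an assumption, built
from 'RSDTtXZPI' with the leading 'R' dropped and assuming the bit order
follows the string order. decode_state() is a hypothetical helper name.

```c
#include <stddef.h>
#include <string.h>

/* Lookup string used by the old write_state() in plugin_sched_switch.c. */
static const char OLD_STATES[] = "SDTtZXxW";

/* Assumed newer ordering: 'RSDTtXZPI' without the leading 'R' (assumption). */
static const char NEW_STATES[] = "SDTtXZPI";

/*
 * Hypothetical helper: return the letter for the lowest set bit of val
 * according to table, or 'R' when no bit covered by the table is set.
 */
static char decode_state(unsigned int val, const char *table)
{
	size_t n = strlen(table);
	size_t i;

	for (i = 0; i < n; i++) {
		if (val & (1u << i))
			return table[i];
	}
	return 'R';
}
```

With these tables, decode_state(0x10, OLD_STATES) yields 'Z' while
decode_state(0x10, NEW_STATES) yields 'X', so a tool holding a stale table
reports a different state for the same raw value.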
IMHO, exporting internal representations of task state in the tracepoint
sched_switch is not good practice and should not be encouraged, since it
can easily break userspace tools that rely on it. This matters especially
now that tracepoints are widely used in observability tools because of
their stable nature; they are no longer used for debugging only, so we
should be careful in deciding what ought to be reported to userspace and
what ought not.
Therefore, to fix the issues mentioned above for good, instead of syncing
the userspace tracing tools with the latest task state constant mapping,
I propose to replace the reported task state in sched_switch with a
symbolic character. This saves userspace tools from further processing
and spares them from knowing further implementation details of the kernel.
After this patch series, we report 'RSDTtXZPI', the same as in procfs, plus
a 'p' which denotes PREEMPT_ACTIVE and is used for the sched_switch tracepoint only.
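As a rough sketch of what this means for a consumer (again, not part of
this series), a userspace tool could turn the reported character directly
into a label without knowing any kernel bit values. state_name() is a
hypothetical helper; the labels follow the names procfs prints in
/proc/<pid>/status, and "preempted" for 'p' is my own wording.

```c
#include <stddef.h>

/*
 * Hypothetical helper: map the symbolic prev_state character reported by
 * sched_switch to a human-readable label. Unknown characters are not
 * guessed at; they map to "unknown".
 */
static const char *state_name(char c)
{
	switch (c) {
	case 'R': return "running";
	case 'S': return "sleeping";
	case 'D': return "disk sleep";
	case 'T': return "stopped";
	case 't': return "tracing stop";
	case 'X': return "dead";
	case 'Z': return "zombie";
	case 'P': return "parked";
	case 'I': return "idle";
	case 'p': return "preempted";	/* sched_switch only, per this series */
	default:  return "unknown";
	}
}
```

A tool that only needs the raw letter can skip even this step and print
the character as-is, which is what the libtraceevent plugin does after
this series.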
Reviews welcome!
Regards,
Ze
Ze Gao (2):
sched, tracing: report task state in symbolic chars instead
perf sched: sync with latest sched_switch tracepoint definition
include/trace/events/sched.h | 41 ++++++++++-----------------
tools/perf/builtin-sched.c | 55 ++++++------------------------------
2 files changed, 24 insertions(+), 72 deletions(-)
Ze Gao (1):
libtraceevent: sync with latest sched_switch tracepoint definition
plugins/plugin_sched_switch.c | 23 +----------------------
1 file changed, 1 insertion(+), 22 deletions(-)
--
2.40.1
Since tracepoint sched_switch changes its reported task state type,
update the parsing logic accordingly.
Signed-off-by: Ze Gao <[email protected]>
---
plugins/plugin_sched_switch.c | 23 +----------------------
1 file changed, 1 insertion(+), 22 deletions(-)
diff --git a/plugins/plugin_sched_switch.c b/plugins/plugin_sched_switch.c
index 8752cae..37c1be2 100644
--- a/plugins/plugin_sched_switch.c
+++ b/plugins/plugin_sched_switch.c
@@ -9,27 +9,6 @@
#include "event-parse.h"
#include "trace-seq.h"
-static void write_state(struct trace_seq *s, int val)
-{
- const char states[] = "SDTtZXxW";
- int found = 0;
- int i;
-
- for (i = 0; i < (sizeof(states) - 1); i++) {
- if (!(val & (1 << i)))
- continue;
-
- if (found)
- trace_seq_putc(s, '|');
-
- found = 1;
- trace_seq_putc(s, states[i]);
- }
-
- if (!found)
- trace_seq_putc(s, 'R');
-}
-
static void write_and_save_comm(struct tep_format_field *field,
struct tep_record *record,
struct trace_seq *s, int pid)
@@ -100,7 +79,7 @@ static int sched_switch_handler(struct trace_seq *s,
trace_seq_printf(s, "[%d] ", (int) val);
if (tep_get_field_val(s, event, "prev_state", record, &val, 1) == 0)
- write_state(s, val);
+ trace_seq_putc(s, (char) val);
trace_seq_puts(s, " ==> ");
--
2.40.1