Report priorities in 'short' and prev_state in 'int' to save
some buffer space. And also reorder the fields so that we take
struct alignment into consideration to make the record compact.
Suggested-by: Steven Rostedt (Google) <[email protected]>
Signed-off-by: Ze Gao <[email protected]>
---
include/trace/events/sched.h | 22 +++++++++++-----------
1 file changed, 11 insertions(+), 11 deletions(-)
diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index e507901bcab8..36863ffb00c6 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -188,7 +188,7 @@ DEFINE_EVENT(sched_wakeup_template, sched_wakeup_new,
TP_ARGS(p));
#ifdef CREATE_TRACE_POINTS
-static inline long __trace_sched_switch_state(bool preempt,
+static inline int __trace_sched_switch_state(bool preempt,
unsigned int prev_state,
struct task_struct *p)
{
@@ -251,25 +251,25 @@ TRACE_EVENT(sched_switch,
TP_ARGS(preempt, prev, next, prev_state),
TP_STRUCT__entry(
- __array( char, prev_comm, TASK_COMM_LEN )
__field( pid_t, prev_pid )
- __field( int, prev_prio )
- __field( long, prev_state )
- __field( char, prev_state_char )
- __array( char, next_comm, TASK_COMM_LEN )
__field( pid_t, next_pid )
- __field( int, next_prio )
+ __field( short, prev_prio )
+ __field( short, next_prio )
+ __field( int, prev_state )
+ __array( char, prev_comm, TASK_COMM_LEN )
+ __array( char, next_comm, TASK_COMM_LEN )
+ __field( char, prev_state_char )
),
TP_fast_assign(
- memcpy(__entry->next_comm, next->comm, TASK_COMM_LEN);
__entry->prev_pid = prev->pid;
- __entry->prev_prio = prev->prio;
+ __entry->next_pid = next->pid;
+ __entry->prev_prio = (short) prev->prio;
+ __entry->next_prio = (short) next->prio;
__entry->prev_state = __trace_sched_switch_state(preempt, prev_state, prev);
__entry->prev_state_char = __trace_sched_switch_state_char(preempt, prev_state, prev);
memcpy(__entry->prev_comm, prev->comm, TASK_COMM_LEN);
- __entry->next_pid = next->pid;
- __entry->next_prio = next->prio;
+ memcpy(__entry->next_comm, next->comm, TASK_COMM_LEN);
/* XXX SCHED_DEADLINE */
),
--
2.40.1
On Tue, Aug 01, 2023 at 05:01:22PM +0800, Ze Gao wrote:
> Report priorities in 'short' and prev_state in 'int' to save
> some buffer space. And also reorder the fields so that we take
> struct alignment into consideration to make the record compact.
>
> Suggested-by: Steven Rostedt (Google) <[email protected]>
I don't see a single line describing the effort you've done to audit
consumers of this tracepoint.
*IF* you're wanting to break this tracepoint ABI, because seriously
that's what it is, then you get to invest the time and effort to audit
the users.
Oops, I thought sending this series for RFC is the "effort" you mean
to audit the users :/
Correct me if I'm making stupid moves here and enlighten me what
I should do furthermore to audit the users.
Thanks,
Ze
On Tue, Aug 1, 2023 at 7:47 PM Peter Zijlstra <[email protected]> wrote:
>
> On Tue, Aug 01, 2023 at 05:01:22PM +0800, Ze Gao wrote:
> > Report priorities in 'short' and prev_state in 'int' to save
> > some buffer space. And also reorder the fields so that we take
> > struct alignment into consideration to make the record compact.
> >
> > Suggested-by: Steven Rostedt (Google) <[email protected]>
>
> I don't see a single line describing the effort you've done to audit
> consumers of this tracepoint.
>
> *IF* you're wanting to break this tracepoint ABI, because seriously
> that's what it is, then you get to invest the time and effort to audit
> the users.
On Tue, 1 Aug 2023 13:46:50 +0200
Peter Zijlstra <[email protected]> wrote:
> On Tue, Aug 01, 2023 at 05:01:22PM +0800, Ze Gao wrote:
> > Report priorities in 'short' and prev_state in 'int' to save
> > some buffer space. And also reorder the fields so that we take
> > struct alignment into consideration to make the record compact.
> >
> > Suggested-by: Steven Rostedt (Google) <[email protected]>
>
> I don't see a single line describing the effort you've done to audit
> consumers of this tracepoint.
>
> *IF* you're wanting to break this tracepoint ABI, because seriously
> that's what it is, then you get to invest the time and effort to audit
> the users.
The known major users that I am aware of is raesdaemon,
powertop/latencytop, perf, trace-cmd and some bpf tools. The bpf tooling is
known to update per kernel. The others all use libtraceevent that can
handle this change.
What other tools are there? There's Perfetto, but it also looks at tracefs
to examine where the values are. There's LTTng, but I believe it uses the
raw tracepoint directly and doesn't look at the layout of the ftrace/perf
buffers.
All other tooling I am slightly aware of uses libtracefs and libtraceveent,
as I've been giving many talks on how to use those libraries.
-- Steve
On Tue, 1 Aug 2023 17:01:22 +0800
Ze Gao <[email protected]> wrote:
> Report priorities in 'short' and prev_state in 'int' to save
> some buffer space. And also reorder the fields so that we take
> struct alignment into consideration to make the record compact.
>
> Suggested-by: Steven Rostedt (Google) <[email protected]>
> Signed-off-by: Ze Gao <[email protected]>
I'd swap this patch with patch 3. That is, make the field changes first.
I'd like this to get in regardless of if the state_char is accepted. We may
want to get this in first to see if there's any regressions before we add a
state_char.
-- Steve
> ---
> include/trace/events/sched.h | 22 +++++++++++-----------
> 1 file changed, 11 insertions(+), 11 deletions(-)
>
> diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
> index e507901bcab8..36863ffb00c6 100644
> --- a/include/trace/events/sched.h
> +++ b/include/trace/events/sched.h
> @@ -188,7 +188,7 @@ DEFINE_EVENT(sched_wakeup_template, sched_wakeup_new,
> TP_ARGS(p));
>
> #ifdef CREATE_TRACE_POINTS
> -static inline long __trace_sched_switch_state(bool preempt,
> +static inline int __trace_sched_switch_state(bool preempt,
> unsigned int prev_state,
> struct task_struct *p)
> {
> @@ -251,25 +251,25 @@ TRACE_EVENT(sched_switch,
> TP_ARGS(preempt, prev, next, prev_state),
>
> TP_STRUCT__entry(
> - __array( char, prev_comm, TASK_COMM_LEN )
> __field( pid_t, prev_pid )
> - __field( int, prev_prio )
> - __field( long, prev_state )
> - __field( char, prev_state_char )
> - __array( char, next_comm, TASK_COMM_LEN )
> __field( pid_t, next_pid )
> - __field( int, next_prio )
> + __field( short, prev_prio )
> + __field( short, next_prio )
> + __field( int, prev_state )
> + __array( char, prev_comm, TASK_COMM_LEN )
> + __array( char, next_comm, TASK_COMM_LEN )
> + __field( char, prev_state_char )
> ),
>
> TP_fast_assign(
> - memcpy(__entry->next_comm, next->comm, TASK_COMM_LEN);
> __entry->prev_pid = prev->pid;
> - __entry->prev_prio = prev->prio;
> + __entry->next_pid = next->pid;
> + __entry->prev_prio = (short) prev->prio;
> + __entry->next_prio = (short) next->prio;
> __entry->prev_state = __trace_sched_switch_state(preempt, prev_state, prev);
> __entry->prev_state_char = __trace_sched_switch_state_char(preempt, prev_state, prev);
> memcpy(__entry->prev_comm, prev->comm, TASK_COMM_LEN);
> - __entry->next_pid = next->pid;
> - __entry->next_prio = next->prio;
> + memcpy(__entry->next_comm, next->comm, TASK_COMM_LEN);
> /* XXX SCHED_DEADLINE */
> ),
>
Thanks for clarifying this ! Steven. This is really helpful.
Regards,
Ze
On Tue, Aug 1, 2023 at 10:33 PM Steven Rostedt <[email protected]> wrote:
>
> On Tue, 1 Aug 2023 13:46:50 +0200
> Peter Zijlstra <[email protected]> wrote:
>
> > On Tue, Aug 01, 2023 at 05:01:22PM +0800, Ze Gao wrote:
> > > Report priorities in 'short' and prev_state in 'int' to save
> > > some buffer space. And also reorder the fields so that we take
> > > struct alignment into consideration to make the record compact.
> > >
> > > Suggested-by: Steven Rostedt (Google) <[email protected]>
> >
> > I don't see a single line describing the effort you've done to audit
> > consumers of this tracepoint.
> >
> > *IF* you're wanting to break this tracepoint ABI, because seriously
> > that's what it is, then you get to invest the time and effort to audit
> > the users.
>
> The known major users that I am aware of is raesdaemon,
> powertop/latencytop, perf, trace-cmd and some bpf tools. The bpf tooling is
> known to update per kernel. The others all use libtraceevent that can
> handle this change.
>
> What other tools are there? There's Perfetto, but it also looks at tracefs
> to examine where the values are. There's LTTng, but I believe it uses the
> raw tracepoint directly and doesn't look at the layout of the ftrace/perf
> buffers.
>
> All other tooling I am slightly aware of uses libtracefs and libtraceveent,
> as I've been giving many talks on how to use those libraries.
>
> -- Steve
Fair point, will do it in v4 as well.
Thanks,
Ze
On Tue, Aug 1, 2023 at 10:16 PM Steven Rostedt <[email protected]> wrote:
>
> On Tue, 1 Aug 2023 17:01:22 +0800
> Ze Gao <[email protected]> wrote:
>
> > Report priorities in 'short' and prev_state in 'int' to save
> > some buffer space. And also reorder the fields so that we take
> > struct alignment into consideration to make the record compact.
> >
> > Suggested-by: Steven Rostedt (Google) <[email protected]>
> > Signed-off-by: Ze Gao <[email protected]>
>
> I'd swap this patch with patch 3. That is, make the field changes first.
> I'd like this to get in regardless of if the state_char is accepted. We may
> want to get this in first to see if there's any regressions before we add a
> state_char.
>
> -- Steve
>
>
> > ---
> > include/trace/events/sched.h | 22 +++++++++++-----------
> > 1 file changed, 11 insertions(+), 11 deletions(-)
> >
> > diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
> > index e507901bcab8..36863ffb00c6 100644
> > --- a/include/trace/events/sched.h
> > +++ b/include/trace/events/sched.h
> > @@ -188,7 +188,7 @@ DEFINE_EVENT(sched_wakeup_template, sched_wakeup_new,
> > TP_ARGS(p));
> >
> > #ifdef CREATE_TRACE_POINTS
> > -static inline long __trace_sched_switch_state(bool preempt,
> > +static inline int __trace_sched_switch_state(bool preempt,
> > unsigned int prev_state,
> > struct task_struct *p)
> > {
> > @@ -251,25 +251,25 @@ TRACE_EVENT(sched_switch,
> > TP_ARGS(preempt, prev, next, prev_state),
> >
> > TP_STRUCT__entry(
> > - __array( char, prev_comm, TASK_COMM_LEN )
> > __field( pid_t, prev_pid )
> > - __field( int, prev_prio )
> > - __field( long, prev_state )
> > - __field( char, prev_state_char )
> > - __array( char, next_comm, TASK_COMM_LEN )
> > __field( pid_t, next_pid )
> > - __field( int, next_prio )
> > + __field( short, prev_prio )
> > + __field( short, next_prio )
> > + __field( int, prev_state )
> > + __array( char, prev_comm, TASK_COMM_LEN )
> > + __array( char, next_comm, TASK_COMM_LEN )
> > + __field( char, prev_state_char )
> > ),
> >
> > TP_fast_assign(
> > - memcpy(__entry->next_comm, next->comm, TASK_COMM_LEN);
> > __entry->prev_pid = prev->pid;
> > - __entry->prev_prio = prev->prio;
> > + __entry->next_pid = next->pid;
> > + __entry->prev_prio = (short) prev->prio;
> > + __entry->next_prio = (short) next->prio;
> > __entry->prev_state = __trace_sched_switch_state(preempt, prev_state, prev);
> > __entry->prev_state_char = __trace_sched_switch_state_char(preempt, prev_state, prev);
> > memcpy(__entry->prev_comm, prev->comm, TASK_COMM_LEN);
> > - __entry->next_pid = next->pid;
> > - __entry->next_prio = next->prio;
> > + memcpy(__entry->next_comm, next->comm, TASK_COMM_LEN);
> > /* XXX SCHED_DEADLINE */
> > ),
> >
>