2023-04-06 08:02:09

by Leonardo Bras

Subject: [RFC PATCH 1/1] smp: Add tracepoints for functions called with smp_call_function*()

When running RT workloads on isolated CPUs, many cases of deadline misses
are caused by remote CPU requests such as smp_call_function*().

For those cases, having the names of the functions running around the
deadline-miss moment could help find a target for the next improvements.

Add tracepoints for acquiring the function name & argument before entry and
after exiting the called function.

Signed-off-by: Leonardo Bras <[email protected]>
---
include/trace/events/smp.h | 56 ++++++++++++++++++++++++++++++++++++++
kernel/smp.c | 11 ++++++++
2 files changed, 67 insertions(+)
create mode 100644 include/trace/events/smp.h

diff --git a/include/trace/events/smp.h b/include/trace/events/smp.h
new file mode 100644
index 0000000000000..94aae8d71705d
--- /dev/null
+++ b/include/trace/events/smp.h
@@ -0,0 +1,56 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM smp
+
+#if !defined(_TRACE_SMP_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_SMP_H
+
+#include <linux/tracepoint.h>
+#include <linux/smp.h>
+
+/*
+ * Tracepoints for a function which is called as an effect of smp_call_function.*
+ */
+TRACE_EVENT(smp_call_function_entry,
+
+ TP_PROTO(smp_call_func_t func, void *info),
+
+ TP_ARGS(func, info),
+
+ TP_STRUCT__entry(
+ __field(void *, func)
+ __field(void *, info)
+ ),
+
+ TP_fast_assign(
+ __entry->func = func;
+ __entry->info = info;
+ ),
+
+ TP_printk("function %ps, argument = %p", __entry->func, __entry->info)
+);
+
+
+TRACE_EVENT(smp_call_function_exit,
+
+ TP_PROTO(smp_call_func_t func, void *info),
+
+ TP_ARGS(func, info),
+
+ TP_STRUCT__entry(
+ __field(void *, func)
+ __field(void *, info)
+ ),
+
+ TP_fast_assign(
+ __entry->func = func;
+ __entry->info = info;
+ ),
+
+ TP_printk("function %ps with argument = %p", __entry->func, __entry->info)
+);
+
+#endif /* _TRACE_SMP_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/kernel/smp.c b/kernel/smp.c
index 06a413987a14a..38d8dec28c39c 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -29,6 +29,9 @@
#include "smpboot.h"
#include "sched/smp.h"

+#define CREATE_TRACE_POINTS
+#include <trace/events/smp.h>
+
#define CSD_TYPE(_csd) ((_csd)->node.u_flags & CSD_FLAG_TYPE_MASK)

#ifdef CONFIG_CSD_LOCK_WAIT_DEBUG
@@ -517,7 +520,9 @@ static int generic_exec_single(int cpu, struct __call_single_data *csd)
csd_lock_record(csd);
csd_unlock(csd);
local_irq_save(flags);
+ trace_smp_call_function_entry(func, info);
func(info);
+ trace_smp_call_function_exit(func, info);
csd_lock_record(NULL);
local_irq_restore(flags);
return 0;
@@ -627,7 +632,9 @@ static void __flush_smp_call_function_queue(bool warn_cpu_offline)
}

csd_lock_record(csd);
+ trace_smp_call_function_entry(func, info);
func(info);
+ trace_smp_call_function_exit(func, info);
csd_unlock(csd);
csd_lock_record(NULL);
} else {
@@ -662,7 +669,9 @@ static void __flush_smp_call_function_queue(bool warn_cpu_offline)

csd_lock_record(csd);
csd_unlock(csd);
+ trace_smp_call_function_entry(func, info);
func(info);
+ trace_smp_call_function_exit(func, info);
csd_lock_record(NULL);
} else if (type == CSD_TYPE_IRQ_WORK) {
irq_work_single(csd);
@@ -975,7 +984,9 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
unsigned long flags;

local_irq_save(flags);
+ trace_smp_call_function_entry(func, info);
func(info);
+ trace_smp_call_function_exit(func, info);
local_irq_restore(flags);
}

--
2.40.0


by Sebastian Andrzej Siewior

Subject: Re: [RFC PATCH 1/1] smp: Add tracepoints for functions called with smp_call_function*()

On 2023-04-06 04:57:18 [-0300], Leonardo Bras wrote:
> When running RT workloads on isolated CPUs, many cases of deadline misses
> are caused by remote CPU requests such as smp_call_function*().
>
> For those cases, having the names of the functions running around the
> deadline-miss moment could help find a target for the next improvements.
>
> Add tracepoints for acquiring the function name & argument before entry and
> after exiting the called function.
>
> Signed-off-by: Leonardo Bras <[email protected]>

Check
https://lore.kernel.org/[email protected]

Sebastian

2023-04-06 08:44:39

by Leonardo Bras

Subject: Re: [RFC PATCH 1/1] smp: Add tracepoints for functions called with smp_call_function*()

On Thu, 2023-04-06 at 10:15 +0200, Sebastian Andrzej Siewior wrote:
> On 2023-04-06 04:57:18 [-0300], Leonardo Bras wrote:
> > When running RT workloads on isolated CPUs, many cases of deadline misses
> > are caused by remote CPU requests such as smp_call_function*().
> >
> > For those cases, having the names of the functions running around the
> > deadline-miss moment could help find a target for the next improvements.
> >
> > Add tracepoints for acquiring the function name & argument before entry and
> > after exiting the called function.
> >
> > Signed-off-by: Leonardo Bras <[email protected]>
>
> Check
> https://lore.kernel.org/[email protected]
>
> Sebastian
>

Hello Sebastian, thanks for the heads up!

IIUC Valentin is adding/improving tracepoints that are collected on the
requesting CPU, at the moment of scheduling the IPI, which are also useful.

But in my scenario, it would be nice to actually have the information on the
time window in which the requested function is running on the requested CPU.

This would allow us to check how many and which IPIs took too much time to
complete, and caused the RT deadline to be missed.
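
As a rough illustration of the tooling this enables: a small debug module
could hook the two tracepoints and flag remote calls that ran for too long.
Untested sketch; it assumes the tracepoints get exported with
EXPORT_TRACEPOINT_SYMBOL_GPL (which this patch does not do yet), and the
100us threshold is arbitrary:

#include <linux/module.h>
#include <linux/smp.h>
#include <linux/ktime.h>

#include <trace/events/smp.h>

/* Timestamp taken at smp_call_function_entry, consumed at _exit. */
static DEFINE_PER_CPU(u64, csd_entry_ts);

static void probe_entry(void *data, smp_call_func_t func, void *info)
{
	this_cpu_write(csd_entry_ts, ktime_get_mono_fast_ns());
}

static void probe_exit(void *data, smp_call_func_t func, void *info)
{
	u64 delta = ktime_get_mono_fast_ns() - this_cpu_read(csd_entry_ts);

	/* Arbitrary threshold: complain if the remote call ran over 100us. */
	if (delta > 100 * NSEC_PER_USEC)
		pr_warn("csd func %ps ran for %llu ns\n", func, delta);
}

static int __init csd_watch_init(void)
{
	int ret = register_trace_smp_call_function_entry(probe_entry, NULL);

	if (ret)
		return ret;

	ret = register_trace_smp_call_function_exit(probe_exit, NULL);
	if (ret)
		unregister_trace_smp_call_function_entry(probe_entry, NULL);

	return ret;
}

static void __exit csd_watch_exit(void)
{
	unregister_trace_smp_call_function_exit(probe_exit, NULL);
	unregister_trace_smp_call_function_entry(probe_entry, NULL);
	tracepoint_synchronize_unregister();
}

module_init(csd_watch_init);
module_exit(csd_watch_exit);
MODULE_LICENSE("GPL");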

(I noticed there is an include/trace/events/ipi.h that could be used instead
of creating a new include/trace/events/smp.h. What is your opinion on this?)

Please let me know if I am missing something.

Thank you!
Leo



2023-04-06 10:01:06

by Peter Zijlstra

Subject: Re: [RFC PATCH 1/1] smp: Add tracepoints for functions called with smp_call_function*()

On Thu, Apr 06, 2023 at 04:57:18AM -0300, Leonardo Bras wrote:
> When running RT workloads on isolated CPUs, many cases of deadline misses
> are caused by remote CPU requests such as smp_call_function*().
>
> For those cases, having the names of the functions running around the
> deadline-miss moment could help find a target for the next improvements.
>
> Add tracepoints for acquiring the function name & argument before entry and
> after exiting the called function.
>
> Signed-off-by: Leonardo Bras <[email protected]>

How are the patches queued there not sufficient?

https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/log/?h=smp/core

2023-04-06 13:55:16

by Steven Rostedt

Subject: Re: [RFC PATCH 1/1] smp: Add tracepoints for functions called with smp_call_function*()

On Thu, 06 Apr 2023 05:42:13 -0300
Leonardo Brás <[email protected]> wrote:

> Hello Sebastian, thanks for the heads up!
>
> IIUC Valentin is adding/improving tracepoints that are collected on the
> requesting CPU, at the moment of scheduling the IPI, which are also useful.
>
> But in my scenario, it would be nice to actually have the information on the
> time window in which the requested function is running on the requested CPU.
>

Well once you know the functions that are being called (running your test
case with the IPI trace events), then you can get the timings of those same
functions by:

# cd /sys/kernel/tracing
# echo '<space-delimited-list-of-functions>' > set_ftrace_filter
# echo function_graph > current_tracer

Then the function graph will give you those timings.

-- Steve

2023-04-19 03:49:25

by Leonardo Bras

Subject: Re: [RFC PATCH 1/1] smp: Add tracepoints for functions called with smp_call_function*()

On Thu, 2023-04-06 at 11:55 +0200, Peter Zijlstra wrote:
> On Thu, Apr 06, 2023 at 04:57:18AM -0300, Leonardo Bras wrote:
> > When running RT workloads on isolated CPUs, many cases of deadline misses
> > are caused by remote CPU requests such as smp_call_function*().
> >
> > For those cases, having the names of the functions running around the
> > deadline-miss moment could help find a target for the next improvements.
> >
> > Add tracepoints for acquiring the function name & argument before entry and
> > after exiting the called function.
> >
> > Signed-off-by: Leonardo Bras <[email protected]>
>
> How are the patches queued there not sufficient?
>
> https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/log/?h=smp/core
>

IIUC the last commits add tracepoints that are collected on the
requesting CPU, at the moment of scheduling the IPI, which are also useful in
some scenarios.

In my scenario, it could help a little, since it makes it possible to filter
what all the other CPUs are scheduling on the requested CPU. OTOH it could
also be misleading, as the requested CPU could be running something that was
scheduled way before.

The change I propose does exactly what my scenario needs: track exactly which
function was running at a given time on the requested CPU. With this info, we
can check which (if any) remotely requested function was running in a given
time window.

(An unrelated thing I just thought of: we could even use the commits you
pointed to together with my proposed change in order to measure how long it
takes for a requested function to run / complete on the requested CPU.)

Thanks for reviewing!

Leo

2023-04-19 04:20:42

by Leonardo Bras

Subject: Re: [RFC PATCH 1/1] smp: Add tracepoints for functions called with smp_call_function*()

On Thu, 2023-04-06 at 09:49 -0400, Steven Rostedt wrote:
> On Thu, 06 Apr 2023 05:42:13 -0300
> Leonardo Brás <[email protected]> wrote:
>
> > Hello Sebastian, thanks for the heads up!
> >
> > IIUC Valentin is adding/improving tracepoints that are collected on the
> > requesting CPU, at the moment of scheduling the IPI, which are also useful.
> >
> > But in my scenario, it would be nice to actually have the information on the
> > time window in which the requested function is running on the requested CPU.
> >

Hello Steven,

>
> Well once you know the functions that are being called (running your test
> case with the IPI trace events), then you can get the timings of those same
> functions by:
>
> # cd /sys/kernel/tracing
> # echo '<space-delimited-list-of-functions>' > set_ftrace_filter
> # echo function_graph > current_tracer
>
> Then the function graph will give you those timings.

IIUC, the usage scenario of your suggestion would be something like:

1) Run something until we get a deadline miss on a given CPU
2) Get the log -> which functions were scheduled on that CPU in the last N time units
3) Add those functions to set_ftrace_filter
4) Run again until a deadline miss happens again

If that's correct, there are some points worth mentioning:
a) It requires reproducing the deadline miss again, which may take
non-negligible time, slowing down the debug process. The suggested change would
get the functions in the first run.
b) There is no guarantee the next deadline time window will contain (only) those
functions. In case some are missing, we would need to add the new functions and
run the test once more.
c) Those functions could be called outside of the remote scheduling context, and
would dirty our log.

>
> -- Steve
>

Thanks for reviewing and contributing, Steve!

Leo

2023-05-03 04:40:28

by Leonardo Bras

Subject: Re: [RFC PATCH 1/1] smp: Add tracepoints for functions called with smp_call_function*()

On Wed, 2023-04-19 at 00:45 -0300, Leonardo Brás wrote:
> On Thu, 2023-04-06 at 11:55 +0200, Peter Zijlstra wrote:
> > On Thu, Apr 06, 2023 at 04:57:18AM -0300, Leonardo Bras wrote:
> > > When running RT workloads on isolated CPUs, many cases of deadline misses
> > > are caused by remote CPU requests such as smp_call_function*().
> > >
> > > For those cases, having the names of the functions running around the
> > > deadline-miss moment could help find a target for the next improvements.
> > >
> > > Add tracepoints for acquiring the function name & argument before entry and
> > > after exiting the called function.
> > >
> > > Signed-off-by: Leonardo Bras <[email protected]>
> >
> > How are the patches queued there not sufficient?
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/log/?h=smp/core


Hello Peter,

The trace I proposed will run on the CPU which received the request, not on
the one which originated it, meaning the trace will run on "cpu" from
smp_call_function*(cpu, ...).

IIUC the traces introduced by the commits in tip/smp/core (which you pointed
to) will run on the CPU which calls smp_call_function*(), which is different
from the above.

Did I get it right?
Could you please help me understand how the commits in tip/smp/core are
sufficient for my scenario?

Thanks!
Leo



> >
>
> IIUC the last commits add tracepoints that are collected on the
> requesting CPU, at the moment of scheduling the IPI, which are also useful in
> some scenarios.
>
> In my scenario, it could help a little, since it makes it possible to filter
> what all the other CPUs are scheduling on the requested CPU. OTOH it could
> also be misleading, as the requested CPU could be running something that was
> scheduled way before.
>
> The change I propose does exactly what my scenario needs: track exactly which
> function was running at a given time on the requested CPU. With this info, we
> can check which (if any) remotely requested function was running in a given
> time window.
>
> (An unrelated thing I just thought of: we could even use the commits you
> pointed to together with my proposed change in order to measure how long it
> takes for a requested function to run / complete on the requested CPU.)
>
> Thanks for reviewing!
>
> Leo

2023-05-03 15:06:52

by Peter Zijlstra

Subject: Re: [RFC PATCH 1/1] smp: Add tracepoints for functions called with smp_call_function*()

On Wed, Apr 19, 2023 at 12:45:08AM -0300, Leonardo Brás wrote:
> On Thu, 2023-04-06 at 11:55 +0200, Peter Zijlstra wrote:
> > On Thu, Apr 06, 2023 at 04:57:18AM -0300, Leonardo Bras wrote:
> > > When running RT workloads on isolated CPUs, many cases of deadline misses
> > > are caused by remote CPU requests such as smp_call_function*().
> > >
> > > For those cases, having the names of the functions running around the
> > > deadline-miss moment could help find a target for the next improvements.
> > >
> > > Add tracepoints for acquiring the function name & argument before entry and
> > > after exiting the called function.
> > >
> > > Signed-off-by: Leonardo Bras <[email protected]>
> >
> > How are the patches queued there not sufficient?
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/log/?h=smp/core
> >
>
> IIUC the last commits add tracepoints that are collected on the
> requesting CPU, at the moment of scheduling the IPI, which are also useful in
> some scenarios.
>
> In my scenario, it could help a little, since it makes it possible to filter
> what all the other CPUs are scheduling on the requested CPU. OTOH it could
> also be misleading, as the requested CPU could be running something that was
> scheduled way before.
>
> The change I propose does exactly what my scenario needs: track exactly which
> function was running at a given time on the requested CPU. With this info, we
> can check which (if any) remotely requested function was running in a given
> time window.

I was thinking you could simply (graph)-trace
__flush_smp_call_function_queue() with a max_graph_depth or so (Steve
says that ought to work).

But even that might be too specific; your use case sounds more like what
we have the irq-off latency tracer for, and that thing will immediately
tell you what functions were being run.

> (An unrelated thing I just thought of: we could even use the commits you
> pointed to together with my proposed change in order to measure how long it
> takes for a requested function to run / complete on the requested CPU.)

I don't think you could actually do that; the send tracepoints Valentin
added don't log the csd address, which means you cannot distinguish
two CSDs with the same function sent from different CPUs.

To do this you'd need to add the csd address to the ipi_send
tracepoints and your own (possibly replacing info -- because I don't
think that's too useful).

Valentin -- is any of this something you'd also find useful?

2023-05-03 16:09:08

by Leonardo Bras

Subject: Re: [RFC PATCH 1/1] smp: Add tracepoints for functions called with smp_call_function*()

On Wed, May 3, 2023 at 12:17 PM Peter Zijlstra <[email protected]> wrote:
>
> On Wed, Apr 19, 2023 at 12:45:08AM -0300, Leonardo Brás wrote:
> > On Thu, 2023-04-06 at 11:55 +0200, Peter Zijlstra wrote:
> > > On Thu, Apr 06, 2023 at 04:57:18AM -0300, Leonardo Bras wrote:
> > > > When running RT workloads on isolated CPUs, many cases of deadline misses
> > > > are caused by remote CPU requests such as smp_call_function*().
> > > >
> > > > For those cases, having the names of the functions running around the
> > > > deadline-miss moment could help find a target for the next improvements.
> > > >
> > > > Add tracepoints for acquiring the function name & argument before entry and
> > > > after exiting the called function.
> > > >
> > > > Signed-off-by: Leonardo Bras <[email protected]>
> > >
> > > How are the patches queued there not sufficient?
> > >
> > > https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/log/?h=smp/core
> > >
> >
> > IIUC the last commits add tracepoints that are collected on the
> > requesting CPU, at the moment of scheduling the IPI, which are also useful in
> > some scenarios.
> >
> > In my scenario, it could help a little, since it makes it possible to filter
> > what all the other CPUs are scheduling on the requested CPU. OTOH it could
> > also be misleading, as the requested CPU could be running something that was
> > scheduled way before.
> >
> > The change I propose does exactly what my scenario needs: track exactly which
> > function was running at a given time on the requested CPU. With this info, we
> > can check which (if any) remotely requested function was running in a given
> > time window.
>
> I was thinking you could simply (graph)-trace
> __flush_smp_call_function_queue() with a max_graph_depth or so (Steve
> says that ought to work).
>
> But even that might be too specific; your use case sounds more like what
> we have the irq-off latency tracer for, and that thing will immediately
> tell you what functions were being run.
>
> > (An unrelated thing I just thought of: we could even use the commits you
> > pointed to together with my proposed change in order to measure how long it
> > takes for a requested function to run / complete on the requested CPU.)
>
> I don't think you could actually do that; the send tracepoints Valentin
> added don't log the csd address, which means you cannot distinguish
> two CSDs with the same function sent from different CPUs.
>
> To do this you'd need to add the csd address to the ipi_send
> tracepoints and your own (possibly replacing info -- because I don't
> think that's too useful).

Sure, I will improve this in my patch.

Do you think I should rebase it on top of tip/smp/core in order to add it
to the set?

Thanks!
Leo

>
> Valentin -- is any of this something you'd also find useful?
>

2023-05-04 12:03:58

by Valentin Schneider

Subject: Re: [RFC PATCH 1/1] smp: Add tracepoints for functions called with smp_call_function*()

+Daniel

On 03/05/23 16:59, Peter Zijlstra wrote:
> On Wed, Apr 19, 2023 at 12:45:08AM -0300, Leonardo Brás wrote:
>>
>> IIUC the last commits add tracepoints that are collected on the
>> requesting CPU, at the moment of scheduling the IPI, which are also useful in
>> some scenarios.
>>
>> In my scenario, it could help a little, since it makes it possible to filter
>> what all the other CPUs are scheduling on the requested CPU. OTOH it could
>> also be misleading, as the requested CPU could be running something that was
>> scheduled way before.
>>
>> The change I propose does exactly what my scenario needs: track exactly which
>> function was running at a given time on the requested CPU. With this info, we
>> can check which (if any) remotely requested function was running in a given
>> time window.
>
> I was thinking you could simply (graph)-trace
> __flush_smp_call_function_queue() with a max_graph_depth or so (Steve
> says that ought to work).
>
> But even that might be too specific; your use case sounds more like what
> we have the irq-off latency tracer for, and that thing will immediately
> tell you what functions were being run.
>
>> (An unrelated thing I just thought of: we could even use the commits you
>> pointed to together with my proposed change in order to measure how long it
>> takes for a requested function to run / complete on the requested CPU.)
>
> I don't think you could actually do that; the send tracepoints Valentin
> added don't log the csd address, which means you cannot distinguish
> two CSDs with the same function sent from different CPUs.
>
> To do this you'd need to add the csd address to the ipi_send
> tracepoints and your own (possibly replacing info -- because I don't
> think that's too useful).
>
> Valentin -- is any of this something you'd also find useful?

Conceptually yeah, however:

With isolcpus + NOHZ_FULL, *any* IPI sent to an isolated CPU is game over:
you interrupt the userspace task and you eat the full meal that is
NOHZ_FULL kernel entry. Pretty much any such IPI will show up when
evaluating your setup with rteval/rtla/whatever, so having an event at
IPI reception is redundant.

IIUC Leonardo's looking at a setup with isolcpus but not necessarily
NOHZ_FULL, so he wants to look at *which* IPIs (if any) are pushing the
isolated task over its deadline/period. I would argue any IPI received
there is at risk of doing that, so it's the same fight as with NOHZ_FULL.

With that said, I suppose this could still be helpful for e.g. osnoise to
hook into and point the finger at which CPU/context sent the problematic
IPI. Or more generally, as Leonardo suggested, to measure CSD IPI delivery
times.

One thing though is that trace_ipi_send_cpu*() is not used solely for
CSDs, cf. irq_work_raise() or smp_send_reschedule(). We might want to
split that into e.g. trace_csd_queue_cpu*() + trace_ipi_send*().

Something like so...

---
diff --git a/include/trace/events/smp.h b/include/trace/events/smp.h
new file mode 100644
index 0000000000000..8fc725a2b45b9
--- /dev/null
+++ b/include/trace/events/smp.h
@@ -0,0 +1,69 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM smp
+
+#if !defined(_TRACE_SMP_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_SMP_H
+
+#include <linux/tracepoint.h>
+
+TRACE_EVENT(csd_queue_cpu,
+
+ TP_PROTO(const unsigned int cpu,
+ unsigned long callsite,
+ smp_call_func_t func,
+ call_single_data_t *csd),
+
+ TP_ARGS(cpu, callsite, func, csd),
+
+ TP_STRUCT__entry(
+ __field(unsigned int, cpu)
+ __field(void *, callsite)
+ __field(void *, func)
+ __field(void *, csd)
+ ),
+
+ TP_fast_assign(
+ __entry->cpu = cpu;
+ __entry->callsite = (void *)callsite;
+ __entry->func = func;
+ __entry->csd = csd;
+ ),
+
+ TP_printk("cpu=%u callsite=%pS func=%pS csd=%p",
+ __entry->cpu, __entry->callsite, __entry->func, __entry->csd)
+);
+
+DECLARE_EVENT_CLASS(csd_function,
+
+ TP_PROTO(smp_call_func_t func, call_single_data_t *csd),
+
+ TP_ARGS(func, csd),
+
+ TP_STRUCT__entry(
+ __field(void *, func)
+ __field(void *, csd)
+ ),
+
+ TP_fast_assign(
+ __entry->func = func;
+ __entry->csd = csd;
+ ),
+
+ TP_printk("func=%pS csd=%p", __entry->func, __entry->csd)
+);
+
+DEFINE_EVENT(csd_function, csd_function_entry,
+ TP_PROTO(smp_call_func_t func, call_single_data_t *csd),
+ TP_ARGS(func, csd)
+);
+
+DEFINE_EVENT(csd_function, csd_function_exit,
+ TP_PROTO(smp_call_func_t func, call_single_data_t *csd),
+ TP_ARGS(func, csd)
+);
+
+#endif /* _TRACE_SMP_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/kernel/smp.c b/kernel/smp.c
index ab3e5dad6cfe9..7d28db303e9bc 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -27,6 +27,9 @@
#include <linux/jump_label.h>

#include <trace/events/ipi.h>
+#define CREATE_TRACE_POINTS
+#include <trace/events/smp.h>
+#undef CREATE_TRACE_POINTS

#include "smpboot.h"
#include "sched/smp.h"
@@ -121,6 +124,14 @@ send_call_function_ipi_mask(struct cpumask *mask)
arch_send_call_function_ipi_mask(mask);
}

+static __always_inline void
+csd_do_func(smp_call_func_t func, void *info, call_single_data_t *csd)
+{
+ trace_csd_function_entry(func, csd);
+ func(info);
+ trace_csd_function_exit(func, csd);
+}
+
#ifdef CONFIG_CSD_LOCK_WAIT_DEBUG

static DEFINE_STATIC_KEY_MAYBE(CONFIG_CSD_LOCK_WAIT_DEBUG_DEFAULT, csdlock_debug_enabled);
@@ -329,7 +340,7 @@ void __smp_call_single_queue(int cpu, struct llist_node *node)
* even if we haven't sent the smp_call IPI yet (e.g. the stopper
* executes migration_cpu_stop() on the remote CPU).
*/
- if (trace_ipi_send_cpu_enabled()) {
+ if (trace_csd_queue_cpu_enabled()) {
call_single_data_t *csd;
smp_call_func_t func;

@@ -337,7 +348,7 @@ void __smp_call_single_queue(int cpu, struct llist_node *node)
func = CSD_TYPE(csd) == CSD_TYPE_TTWU ?
sched_ttwu_pending : csd->func;

- trace_ipi_send_cpu(cpu, _RET_IP_, func);
+ trace_csd_queue_cpu(cpu, _RET_IP_, func, csd);
}

/*
@@ -375,7 +386,7 @@ static int generic_exec_single(int cpu, struct __call_single_data *csd)
csd_lock_record(csd);
csd_unlock(csd);
local_irq_save(flags);
- func(info);
+ csd_do_func(func, info, NULL);
csd_lock_record(NULL);
local_irq_restore(flags);
return 0;
@@ -477,7 +488,7 @@ static void __flush_smp_call_function_queue(bool warn_cpu_offline)
}

csd_lock_record(csd);
- func(info);
+ csd_do_func(func, info, csd);
csd_unlock(csd);
csd_lock_record(NULL);
} else {
@@ -508,7 +519,7 @@ static void __flush_smp_call_function_queue(bool warn_cpu_offline)

csd_lock_record(csd);
csd_unlock(csd);
- func(info);
+ csd_do_func(func, info, csd);
csd_lock_record(NULL);
} else if (type == CSD_TYPE_IRQ_WORK) {
irq_work_single(csd);
@@ -522,8 +533,10 @@ static void __flush_smp_call_function_queue(bool warn_cpu_offline)
/*
* Third; only CSD_TYPE_TTWU is left, issue those.
*/
- if (entry)
- sched_ttwu_pending(entry);
+ if (entry) {
+ csd = llist_entry(entry, typeof(*csd), node.llist);
+ csd_do_func(sched_ttwu_pending, entry, csd);
+ }
}


@@ -624,7 +637,7 @@ EXPORT_SYMBOL(smp_call_function_single);

/**
* smp_call_function_single_async() - Run an asynchronous function on a
- * specific CPU.
+ * specific CPU.
* @cpu: The CPU to run on.
* @csd: Pre-allocated and setup data structure
*
@@ -728,7 +741,7 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
int cpu, last_cpu, this_cpu = smp_processor_id();
struct call_function_data *cfd;
bool wait = scf_flags & SCF_WAIT;
- int nr_cpus = 0, nr_queued = 0;
+ int nr_cpus = 0;
bool run_remote = false;
bool run_local = false;

@@ -786,21 +799,16 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
csd->node.src = smp_processor_id();
csd->node.dst = cpu;
#endif
+
+ trace_csd_queue_cpu(cpu, _RET_IP_, func, csd);
+
if (llist_add(&csd->node.llist, &per_cpu(call_single_queue, cpu))) {
__cpumask_set_cpu(cpu, cfd->cpumask_ipi);
nr_cpus++;
last_cpu = cpu;
}
- nr_queued++;
}

- /*
- * Trace each smp_function_call_*() as an IPI, actual IPIs
- * will be traced with func==generic_smp_call_function_single_ipi().
- */
- if (nr_queued)
- trace_ipi_send_cpumask(cfd->cpumask, _RET_IP_, func);
-
/*
* Choose the most efficient way to send an IPI. Note that the
* number of CPUs might be zero due to concurrent changes to the
@@ -816,7 +824,7 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
unsigned long flags;

local_irq_save(flags);
- func(info);
+ csd_do_func(func, info, NULL);
local_irq_restore(flags);
}


2023-05-04 13:42:58

by Steven Rostedt

Subject: Re: [RFC PATCH 1/1] smp: Add tracepoints for functions called with smp_call_function*()

On Thu, 04 May 2023 12:59:41 +0100
Valentin Schneider <[email protected]> wrote:

> +TRACE_EVENT(csd_queue_cpu,
> +
> + TP_PROTO(const unsigned int cpu,

Why the "const" ?

-- Steve

> + unsigned long callsite,
> + smp_call_func_t func,
> + call_single_data_t *csd),
> +
> + TP_ARGS(cpu, callsite, func, csd),
> +
> + TP_STRUCT__entry(
> + __field(unsigned int, cpu)
> + __field(void *, callsite)
> + __field(void *, func)
> + __field(void *, csd)
> + ),
> +
> + TP_fast_assign(
> + __entry->cpu = cpu;
> + __entry->callsite = (void *)callsite;
> + __entry->func = func;
> + __entry->csd = csd;
> + ),
> +
> + TP_printk("cpu=%u callsite=%pS func=%pS csd=%p",
> + __entry->cpu, __entry->callsite, __entry->func, __entry->csd)
> +);
> +

2023-05-04 15:07:58

by Peter Zijlstra

Subject: Re: [RFC PATCH 1/1] smp: Add tracepoints for functions called with smp_call_function*()

On Thu, May 04, 2023 at 12:59:41PM +0100, Valentin Schneider wrote:

> With that said, I suppose this could still be helpful for e.g. osnoise to
> hook into and point the finger at which CPU/context sent the problematic
> IPI. Or more generally, as Leonardo suggested, to measure CSD IPI delivery
> times.
>
> One thing though is that trace_ipi_send_cpu*() is not used solely for
> CSDs, cf. irq_work_raise() or smp_send_reschedule(). We might want to
> split that into e.g. trace_csd_queue_cpu*() + trace_ipi_send*().
>
> Something like so...
>

Yeah, looks about right I suppose.. it generates more events, but given
we need the csd that's inevitable.

2023-05-10 20:47:53

by Leonardo Bras

Subject: Re: [RFC PATCH 1/1] smp: Add tracepoints for functions called with smp_call_function*()

On Thu, 2023-05-04 at 12:59 +0100, Valentin Schneider wrote:
> +Daniel
>
> On 03/05/23 16:59, Peter Zijlstra wrote:
> > On Wed, Apr 19, 2023 at 12:45:08AM -0300, Leonardo Brás wrote:
> > >
> > > IIUC the last commits add tracepoints that are collected on the
> > > requesting CPU, at the moment of scheduling the IPI, which are also useful in
> > > some scenarios.
> > >
> > > In my scenario, it could help a little, since it makes it possible to filter
> > > what all the other CPUs are scheduling on the requested CPU. OTOH it could
> > > also be misleading, as the requested CPU could be running something that was
> > > scheduled way before.
> > >
> > > The change I propose does exactly what my scenario needs: track exactly which
> > > function was running at a given time on the requested CPU. With this info, we
> > > can check which (if any) remotely requested function was running in a given
> > > time window.
> >
> > I was thinking you could simply (graph)-trace
> > __flush_smp_call_function_queue() with a max_graph_depth or so (Steve
> > says that ought to work).
> >
> > But even that might be too specific; your use case sounds more like what
> > we have the irq-off latency tracer for, and that thing will immediately
> > tell you what functions were being run.
> >
> > > (An unrelated thing I just thought of: we could even use the commits you
> > > pointed to together with my proposed change in order to measure how long it
> > > takes for a requested function to run / complete on the requested CPU.)
> >
> > I don't think you could actually do that; the send tracepoints Valentin
> > added don't log the csd address, which means you cannot distinguish
> > two CSDs with the same function sent from different CPUs.
> >
> > To do this you'd need to add the csd address to the ipi_send
> > tracepoints and your own (possibly replacing info -- because I don't
> > think that's too useful).
> >
> > Valentin -- is any of this something you'd also find useful?
>
> Conceptually yeah, however:
>
> With isolcpus + NOHZ_FULL, *any* IPI sent to an isolated CPU is game over:
> you interrupt the userspace task and you eat the full meal that is
> NOHZ_FULL kernel entry. Pretty much any such IPI will show up when
> evaluating your setup with rteval/rtla/whatever, so having an event at
> IPI reception is redundant.
>
> IIUC Leonardo's looking at a setup with isolcpus but not necessarily
> NOHZ_FULL, so he wants to look at *which* IPIs (if any) are pushing the
> isolated task over its deadline/period. I would argue any IPI received
> there is at risk of doing that, so it's the same fight as with NOHZ_FULL.
>
> With that said, I suppose this could still be helpful for e.g. osnoise to
> hook into and point the finger at which CPU/context sent the problematic
> IPI. Or more generally, as Leonardo suggested, to measure CSD IPI delivery
> times.
>
> One thing though is that trace_ipi_send_cpu*() is not used solely for
> CSDs, cf. irq_work_raise() or smp_send_reschedule(). We might want to
> split that into e.g. trace_csd_queue_cpu*() + trace_ipi_send*().
>
> Something like so...
>
> ---
> diff --git a/include/trace/events/smp.h b/include/trace/events/smp.h
> new file mode 100644
> index 0000000000000..8fc725a2b45b9
> --- /dev/null
> +++ b/include/trace/events/smp.h
> @@ -0,0 +1,69 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM smp
> +
> +#if !defined(_TRACE_SMP_H) || defined(TRACE_HEADER_MULTI_READ)
> +#define _TRACE_SMP_H
> +
> +#include <linux/tracepoint.h>
> +
> +TRACE_EVENT(csd_queue_cpu,
> +
> + TP_PROTO(const unsigned int cpu,
> + unsigned long callsite,
> + smp_call_func_t func,
> + call_single_data_t *csd),
> +
> + TP_ARGS(cpu, callsite, func, csd),
> +
> + TP_STRUCT__entry(
> + __field(unsigned int, cpu)
> + __field(void *, callsite)
> + __field(void *, func)
> + __field(void *, csd)
> + ),
> +
> + TP_fast_assign(
> + __entry->cpu = cpu;
> + __entry->callsite = (void *)callsite;
> + __entry->func = func;
> + __entry->csd = csd;
> + ),
> +
> + TP_printk("cpu=%u callsite=%pS func=%pS csd=%p",
> + __entry->cpu, __entry->callsite, __entry->func, __entry->csd)
> +);

This is for the caller side, right?

> +
> +DECLARE_EVENT_CLASS(csd_function,
> +
> + TP_PROTO(smp_call_func_t func, call_single_data_t *csd),
> +
> + TP_ARGS(func, csd),
> +
> + TP_STRUCT__entry(
> + __field(void *, func)
> + __field(void *, csd)
> + ),
> +
> + TP_fast_assign(
> + __entry->func = func;
> + __entry->csd = csd;
> + ),
> +
> + TP_printk("func=%pS csd=%p", __entry->func, __entry->csd)
> +);
> +
> +DEFINE_EVENT(csd_function, csd_function_entry,
> + TP_PROTO(smp_call_func_t func, call_single_data_t *csd),
> + TP_ARGS(func, csd)
> +);
> +
> +DEFINE_EVENT(csd_function, csd_function_exit,
> + TP_PROTO(smp_call_func_t func, call_single_data_t *csd),
> + TP_ARGS(func, csd)
> +);

Oh, this is what event_class is for. Thanks for the example :)

> +
> +#endif /* _TRACE_SMP_H */
> +
> +/* This part must be outside protection */
> +#include <trace/define_trace.h>
> diff --git a/kernel/smp.c b/kernel/smp.c
> index ab3e5dad6cfe9..7d28db303e9bc 100644
> --- a/kernel/smp.c
> +++ b/kernel/smp.c
> @@ -27,6 +27,9 @@
> #include <linux/jump_label.h>
>
> #include <trace/events/ipi.h>
> +#define CREATE_TRACE_POINTS
> +#include <trace/events/smp.h>
> +#undef CREATE_TRACE_POINTS
>
> #include "smpboot.h"
> #include "sched/smp.h"
> @@ -121,6 +124,14 @@ send_call_function_ipi_mask(struct cpumask *mask)
> arch_send_call_function_ipi_mask(mask);
> }
>
> +static __always_inline void
> +csd_do_func(smp_call_func_t func, void *info, call_single_data_t *csd)
> +{
> + trace_csd_function_entry(func, csd);
> + func(info);
> + trace_csd_function_exit(func, csd);
> +}
> +

Good one, a helper to avoid calling those traces everywhere.

Honest question:
Since info == csd->info and func == csd->func, we could just pass csd, right?
I suppose the point of the 3-argument version is to use the values already
fetched from memory instead of fetching them again. Is that correct?
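
Just to make the comparison concrete, a csd-only variant would be something
like this (untested sketch):

static __always_inline void csd_do_func(call_single_data_t *csd)
{
	smp_call_func_t func = csd->func;	/* re-read from memory */
	void *info = csd->info;			/* re-read from memory */

	trace_csd_function_entry(func, csd);
	func(info);
	trace_csd_function_exit(func, csd);
}

Besides the extra loads, it would not cover the callers above that pass
csd == NULL (the local-execution paths), nor the TTWU case, where the
called function is sched_ttwu_pending and the argument is the llist entry
rather than csd->info.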

> #ifdef CONFIG_CSD_LOCK_WAIT_DEBUG
>
> static DEFINE_STATIC_KEY_MAYBE(CONFIG_CSD_LOCK_WAIT_DEBUG_DEFAULT, csdlock_debug_enabled);
> @@ -329,7 +340,7 @@ void __smp_call_single_queue(int cpu, struct llist_node *node)
> * even if we haven't sent the smp_call IPI yet (e.g. the stopper
> * executes migration_cpu_stop() on the remote CPU).
> */
> - if (trace_ipi_send_cpu_enabled()) {
> + if (trace_csd_queue_cpu_enabled()) {
> call_single_data_t *csd;
> smp_call_func_t func;
>
> @@ -337,7 +348,7 @@ void __smp_call_single_queue(int cpu, struct llist_node *node)
> func = CSD_TYPE(csd) == CSD_TYPE_TTWU ?
> sched_ttwu_pending : csd->func;
>
> - trace_ipi_send_cpu(cpu, _RET_IP_, func);
> + trace_csd_queue_cpu(cpu, _RET_IP_, func, csd);
> }
>
> /*
> @@ -375,7 +386,7 @@ static int generic_exec_single(int cpu, struct __call_single_data *csd)
> csd_lock_record(csd);
> csd_unlock(csd);
> local_irq_save(flags);
> - func(info);
> + csd_do_func(func, info, NULL);
> csd_lock_record(NULL);
> local_irq_restore(flags);
> return 0;
> @@ -477,7 +488,7 @@ static void __flush_smp_call_function_queue(bool warn_cpu_offline)
> }
>
> csd_lock_record(csd);
> - func(info);
> + csd_do_func(func, info, csd);
> csd_unlock(csd);
> csd_lock_record(NULL);
> } else {
> @@ -508,7 +519,7 @@ static void __flush_smp_call_function_queue(bool warn_cpu_offline)
>
> csd_lock_record(csd);
> csd_unlock(csd);
> - func(info);
> + csd_do_func(func, info, csd);
> csd_lock_record(NULL);
> } else if (type == CSD_TYPE_IRQ_WORK) {
> irq_work_single(csd);
> @@ -522,8 +533,10 @@ static void __flush_smp_call_function_queue(bool warn_cpu_offline)
> /*
> * Third; only CSD_TYPE_TTWU is left, issue those.
> */
> - if (entry)
> - sched_ttwu_pending(entry);
> + if (entry) {
> + csd = llist_entry(entry, typeof(*csd), node.llist);
> + csd_do_func(sched_ttwu_pending, entry, csd);
> + }
> }
>
>
> @@ -624,7 +637,7 @@ EXPORT_SYMBOL(smp_call_function_single);
>
> /**
> * smp_call_function_single_async() - Run an asynchronous function on a
> - * specific CPU.
> + * specific CPU.
> * @cpu: The CPU to run on.
> * @csd: Pre-allocated and setup data structure
> *
> @@ -728,7 +741,7 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
> int cpu, last_cpu, this_cpu = smp_processor_id();
> struct call_function_data *cfd;
> bool wait = scf_flags & SCF_WAIT;
> - int nr_cpus = 0, nr_queued = 0;
> + int nr_cpus = 0;
> bool run_remote = false;
> bool run_local = false;
>
> @@ -786,21 +799,16 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
> csd->node.src = smp_processor_id();
> csd->node.dst = cpu;
> #endif
> +
> + trace_csd_queue_cpu(cpu, _RET_IP_, func, csd);
> +
> if (llist_add(&csd->node.llist, &per_cpu(call_single_queue, cpu))) {
> __cpumask_set_cpu(cpu, cfd->cpumask_ipi);
> nr_cpus++;
> last_cpu = cpu;
> }
> - nr_queued++;
> }
>
> - /*
> - * Trace each smp_function_call_*() as an IPI, actual IPIs
> - * will be traced with func==generic_smp_call_function_single_ipi().
> - */
> - if (nr_queued)
> - trace_ipi_send_cpumask(cfd->cpumask, _RET_IP_, func);
> -
> /*
> * Choose the most efficient way to send an IPI. Note that the
> * number of CPUs might be zero due to concurrent changes to the
> @@ -816,7 +824,7 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
> unsigned long flags;
>
> local_irq_save(flags);
> - func(info);
> + csd_do_func(func, info, NULL);
> local_irq_restore(flags);
> }
>
>

I will rebase my patch on top of tip/smp/core and apply the suggested changes.
A v2 will be sent shortly.

Thank you all for reviewing!
Leo



2023-05-10 22:35:45

by Leonardo Bras

Subject: Re: [RFC PATCH 1/1] smp: Add tracepoints for functions called with smp_call_function*()

On Wed, May 10, 2023 at 5:27 PM Leonardo Brás <[email protected]> wrote:
>
> [...]
>
> I will rebase my patch on top of tip/smp/core and apply the suggested changes.
> A v2 will be sent shortly.

RFC-v2 sent:
https://lore.kernel.org/lkml/[email protected]/

>
> Thank you all for reviewing!
> Leo
>
>


2023-05-10 23:21:24

by Leonardo Bras

Subject: Re: [RFC PATCH 1/1] smp: Add tracepoints for functions called with smp_call_function*()

On Wed, May 10, 2023 at 7:18 PM Leonardo Bras Soares Passos
<[email protected]> wrote:
>
> On Wed, May 10, 2023 at 5:27 PM Leonardo Brás <[email protected]> wrote:
> >
> > On Thu, 2023-05-04 at 12:59 +0100, Valentin Schneider wrote:
> > > +Daniel
> > >
> > > On 03/05/23 16:59, Peter Zijlstra wrote:
> > > > On Wed, Apr 19, 2023 at 12:45:08AM -0300, Leonardo Brás wrote:
> > > > >
> > > > > IIUC the last commits add tracepoints that are collected in the
> > > > > requesting CPU, at the moment of scheduling the IPI, which are also useful in
> > > > > some scenarios.
> > > > >
> > > > > On my scenario, it could help a little, since it makes possible to filter what
> > > > > all other cpus are scheduling on the requested cpu. OTOH it could be also be
> > > > > misleading, as the requested cpu could be running something that was scheduled
> > > > > way before.
> > > > >
> > > > > The change I propose does exactly what my scenario need: track exactly which
> > > > > function was running at given time in the requested CPU. With this info, we can
> > > > > check which (if any) remotely requested function was running on given time
> > > > > window.
> > > >
> > > > I was thinking you could simply (graph)-trace
> > > > __flush_smp_call_function_queue() with a max_graph_depth or so (Steve
> > > > says that ought to work).
> > > >
> > > > But even that might be too specific, your use case sounds more like what
> > > > we have the irq-off latency tracer for, and that thing will immediately
> > > > tell you what functions were being ran.
> > > >
> > > > > (An unrelated thing I just thought: We could even use the commits you pointed
> > > > > together with my proposed change in order to measure how long does it take for a
> > > > > requested function to run / complete in the requested cpu)
> > > >
> > > > I don't think you could actually do that; the send tracepoints Valentin
> > > > added don't log the csd address, this means you cannot distinguish
> > > > two CSDs with the same function send from different CPUs.
> > > >
> > > > To do this you'd need to add the csd address to the the ipi_send
> > > > tracepoints and your own (possibly replacing info -- because I don't
> > > > think that's too useful).
> > > >
> > > > Valentin -- is any of this something you'd also find useful?
> > >
> > > Conceptually yeah, however:
> > >
> > > With isolcpus + NOHZ_FULL, *any* IPI sent to an isolated CPU is game over,
> > > you interrupt the userspace task and you eat the full meal that is
> > > NOHZ_FULL kernel entry. Pretty much any such IPI will show up when
> > > evaluating your setup with rteval/rlta/whatever, so having an event at
> > > IPI reception is redundant.
> > >
> > > IIUC Leonardo's looking at a setup with isolcpus but not necessarily
> > > NOHZ_FULL, so he wants to look at *which* IPIs (if any) are pushing the
> > > isolated task over its deadline/period. I would argue any IPI received
> > > there is at risk of doing that, so it's the same fight as with NOHZ_FULL.
> > >
> > > With that said, I suppose this could still be helpful for e.g. osnoise to
> > > hook into and point the finger at which CPU/context sent the problematic
> > > IPI. Or more generally, as Leonardo suggested, to measure CSD IPI delivery
> > > times.
> > >
> > > One thing though is that trace_ipi_send_cpu*() is not used solely for
> > > CSD's, cf. irq_work_raise() or smp_send_reschedule(). We might want to
> > > split that into e.g. trace_csd_queue_cpu*() + trace_ipi_send*().
> > >
> > > Something like so...
> > >
> > > ---
> > > diff --git a/include/trace/events/smp.h b/include/trace/events/smp.h
> > > new file mode 100644
> > > index 0000000000000..8fc725a2b45b9
> > > --- /dev/null
> > > +++ b/include/trace/events/smp.h
> > > @@ -0,0 +1,69 @@
> > > +/* SPDX-License-Identifier: GPL-2.0 */
> > > +#undef TRACE_SYSTEM
> > > +#define TRACE_SYSTEM smp
> > > +
> > > +#if !defined(_TRACE_SMP_H) || defined(TRACE_HEADER_MULTI_READ)
> > > +#define _TRACE_SMP_H
> > > +
> > > +#include <linux/tracepoint.h>
> > > +
> > > +TRACE_EVENT(csd_queue_cpu,
> > > +
> > > + TP_PROTO(const unsigned int cpu,
> > > + unsigned long callsite,
> > > + smp_call_func_t func,
> > > + call_single_data_t *csd),
> > > +
> > > + TP_ARGS(cpu, callsite, func, csd),
> > > +
> > > + TP_STRUCT__entry(
> > > + __field(unsigned int, cpu)
> > > + __field(void *, callsite)
> > > + __field(void *, func)
> > > + __field(void *, csd)
> > > + ),
> > > +
> > > + TP_fast_assign(
> > > + __entry->cpu = cpu;
> > > + __entry->callsite = (void *)callsite;
> > > + __entry->func = func;
> > > + __entry->csd = csd;
> > > + ),
> > > +
> > > + TP_printk("cpu=%u callsite=%pS func=%pS csd=%p",
> > > + __entry->cpu, __entry->callsite, __entry->func, __entry->csd)
> > > +);
> >
> > This is for the caller side, right?
> >
> > > +
> > > +DECLARE_EVENT_CLASS(csd_function,
> > > +
> > > + TP_PROTO(smp_call_func_t func, call_single_data_t *csd),
> > > +
> > > + TP_ARGS(func, csd),
> > > +
> > > + TP_STRUCT__entry(
> > > + __field(void *, func)
> > > + __field(void *, csd)
> > > + ),
> > > +
> > > + TP_fast_assign(
> > > + __entry->func = func;
> > > + __entry->csd = csd;
> > > + ),
> > > +
> > > + TP_printk("func=%pS csd=%p", __entry->func, __entry->csd)
> > > +);
> > > +
> > > +DEFINE_EVENT(csd_function, csd_function_entry,
> > > + TP_PROTO(smp_call_func_t func, call_single_data_t *csd),
> > > + TP_ARGS(func, csd)
> > > +);
> > > +
> > > +DEFINE_EVENT(csd_function, csd_function_exit,
> > > + TP_PROTO(smp_call_func_t func, call_single_data_t *csd),
> > > + TP_ARGS(func, csd)
> > > +);
> >
> > Oh, this is what event_class is for. Thanks for the example :)
> >
> > > +
> > > +#endif /* _TRACE_SMP_H */
> > > +
> > > +/* This part must be outside protection */
> > > +#include <trace/define_trace.h>
> > > diff --git a/kernel/smp.c b/kernel/smp.c
> > > index ab3e5dad6cfe9..7d28db303e9bc 100644
> > > --- a/kernel/smp.c
> > > +++ b/kernel/smp.c
> > > @@ -27,6 +27,9 @@
> > > #include <linux/jump_label.h>
> > >
> > > #include <trace/events/ipi.h>
> > > +#define CREATE_TRACE_POINTS
> > > +#include <trace/events/smp.h>
> > > +#undef CREATE_TRACE_POINTS
> > >
> > > #include "smpboot.h"
> > > #include "sched/smp.h"
> > > @@ -121,6 +124,14 @@ send_call_function_ipi_mask(struct cpumask *mask)
> > > arch_send_call_function_ipi_mask(mask);
> > > }
> > >
> > > +static __always_inline void
> > > +csd_do_func(smp_call_func_t func, void *info, call_single_data_t *csd)
> > > +{
> > > + trace_csd_function_entry(func, csd);
> > > + func(info);
> > > + trace_csd_function_exit(func, csd);
> > > +}
> > > +
> >
> > Good one, a helper to avoid calling those traces everywhere.
> >
> > Honest question:
> > Since info == csd->info and func == csd->func, we could just pass csd, right?
> > I suppose the point of the 3-argument version is to use the values already
> > fetched from memory instead of fetching them again. Is that correct?
> >
> > > #ifdef CONFIG_CSD_LOCK_WAIT_DEBUG
> > >
> > > static DEFINE_STATIC_KEY_MAYBE(CONFIG_CSD_LOCK_WAIT_DEBUG_DEFAULT, csdlock_debug_enabled);
> > > @@ -329,7 +340,7 @@ void __smp_call_single_queue(int cpu, struct llist_node *node)
> > > * even if we haven't sent the smp_call IPI yet (e.g. the stopper
> > > * executes migration_cpu_stop() on the remote CPU).
> > > */
> > > - if (trace_ipi_send_cpu_enabled()) {
> > > + if (trace_csd_queue_cpu_enabled()) {
> > > call_single_data_t *csd;
> > > smp_call_func_t func;
> > >
> > > @@ -337,7 +348,7 @@ void __smp_call_single_queue(int cpu, struct llist_node *node)
> > > func = CSD_TYPE(csd) == CSD_TYPE_TTWU ?
> > > sched_ttwu_pending : csd->func;
> > >
> > > - trace_ipi_send_cpu(cpu, _RET_IP_, func);
> > > + trace_csd_queue_cpu(cpu, _RET_IP_, func, csd);
> > > }
> > >
> > > /*
> > > @@ -375,7 +386,7 @@ static int generic_exec_single(int cpu, struct __call_single_data *csd)
> > > csd_lock_record(csd);
> > > csd_unlock(csd);
> > > local_irq_save(flags);
> > > - func(info);
> > > + csd_do_func(func, info, NULL);
> > > csd_lock_record(NULL);
> > > local_irq_restore(flags);
> > > return 0;
> > > @@ -477,7 +488,7 @@ static void __flush_smp_call_function_queue(bool warn_cpu_offline)
> > > }
> > >
> > > csd_lock_record(csd);
> > > - func(info);
> > > + csd_do_func(func, info, csd);
> > > csd_unlock(csd);
> > > csd_lock_record(NULL);
> > > } else {
> > > @@ -508,7 +519,7 @@ static void __flush_smp_call_function_queue(bool warn_cpu_offline)
> > >
> > > csd_lock_record(csd);
> > > csd_unlock(csd);
> > > - func(info);
> > > + csd_do_func(func, info, csd);
> > > csd_lock_record(NULL);
> > > } else if (type == CSD_TYPE_IRQ_WORK) {
> > > irq_work_single(csd);
> > > @@ -522,8 +533,10 @@ static void __flush_smp_call_function_queue(bool warn_cpu_offline)
> > > /*
> > > * Third; only CSD_TYPE_TTWU is left, issue those.
> > > */
> > > - if (entry)
> > > - sched_ttwu_pending(entry);
> > > + if (entry) {
> > > + csd = llist_entry(entry, typeof(*csd), node.llist);
> > > + csd_do_func(sched_ttwu_pending, entry, csd);
> > > + }
> > > }
> > >
> > >
> > > @@ -624,7 +637,7 @@ EXPORT_SYMBOL(smp_call_function_single);
> > >
> > > /**
> > > * smp_call_function_single_async() - Run an asynchronous function on a
> > > - * specific CPU.
> > > + * specific CPU.
> > > * @cpu: The CPU to run on.
> > > * @csd: Pre-allocated and setup data structure
> > > *
> > > @@ -728,7 +741,7 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
> > > int cpu, last_cpu, this_cpu = smp_processor_id();
> > > struct call_function_data *cfd;
> > > bool wait = scf_flags & SCF_WAIT;
> > > - int nr_cpus = 0, nr_queued = 0;
> > > + int nr_cpus = 0;
> > > bool run_remote = false;
> > > bool run_local = false;
> > >
> > > @@ -786,21 +799,16 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
> > > csd->node.src = smp_processor_id();
> > > csd->node.dst = cpu;
> > > #endif
> > > +
> > > + trace_csd_queue_cpu(cpu, _RET_IP_, func, csd);
> > > +
> > > if (llist_add(&csd->node.llist, &per_cpu(call_single_queue, cpu))) {
> > > __cpumask_set_cpu(cpu, cfd->cpumask_ipi);
> > > nr_cpus++;
> > > last_cpu = cpu;
> > > }
> > > - nr_queued++;
> > > }
> > >
> > > - /*
> > > - * Trace each smp_function_call_*() as an IPI, actual IPIs
> > > - * will be traced with func==generic_smp_call_function_single_ipi().
> > > - */
> > > - if (nr_queued)
> > > - trace_ipi_send_cpumask(cfd->cpumask, _RET_IP_, func);
> > > -
> > > /*
> > > * Choose the most efficient way to send an IPI. Note that the
> > > * number of CPUs might be zero due to concurrent changes to the
> > > @@ -816,7 +824,7 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
> > > unsigned long flags;
> > >
> > > local_irq_save(flags);
> > > - func(info);
> > > + csd_do_func(func, info, NULL);
> > > local_irq_restore(flags);
> > > }
> > >
> > >
> >
> > I will rebase my patch on top of tip/smp/core and apply the suggested changes.
> > A v2 will be sent shortly.
>
> RFC-v2 sent:
> https://lore.kernel.org/lkml/[email protected]/
>

Argh, I accidentally sent an unfinished patch, sorry about that.
Please disregard the v2 above and review the v3 instead:
https://lore.kernel.org/lkml/[email protected]/


> >
> > Thank you all for reviewing!
> > Leo
> >
> >


2023-05-11 08:23:35

by Valentin Schneider

[permalink] [raw]
Subject: Re: [RFC PATCH 1/1] smp: Add tracepoints for functions called with smp_call_function*()

On 10/05/23 17:27, Leonardo Brás wrote:
> On Thu, 2023-05-04 at 12:59 +0100, Valentin Schneider wrote:
>> +TRACE_EVENT(csd_queue_cpu,
>> +
>> + TP_PROTO(const unsigned int cpu,
>> + unsigned long callsite,
>> + smp_call_func_t func,
>> + call_single_data_t *csd),
>> +
>> + TP_ARGS(cpu, callsite, func, csd),
>> +
>> + TP_STRUCT__entry(
>> + __field(unsigned int, cpu)
>> + __field(void *, callsite)
>> + __field(void *, func)
>> + __field(void *, csd)
>> + ),
>> +
>> + TP_fast_assign(
>> + __entry->cpu = cpu;
>> + __entry->callsite = (void *)callsite;
>> + __entry->func = func;
>> + __entry->csd = csd;
>> + ),
>> +
>> + TP_printk("cpu=%u callsite=%pS func=%pS csd=%p",
>> + __entry->cpu, __entry->callsite, __entry->func, __entry->csd)
>> +);
>
> This is for the caller side, right?
>

Yep, see usage lower down.

>> +
>> +DECLARE_EVENT_CLASS(csd_function,
>> +
>> + TP_PROTO(smp_call_func_t func, call_single_data_t *csd),
>> +
>> + TP_ARGS(func, csd),
>> +
>> + TP_STRUCT__entry(
>> + __field(void *, func)
>> + __field(void *, csd)
>> + ),
>> +
>> + TP_fast_assign(
>> + __entry->func = func;
>> + __entry->csd = csd;
>> + ),
>> +
>> + TP_printk("func=%pS csd=%p", __entry->func, __entry->csd)
>> +);
>> +
>> +DEFINE_EVENT(csd_function, csd_function_entry,
>> + TP_PROTO(smp_call_func_t func, call_single_data_t *csd),
>> + TP_ARGS(func, csd)
>> +);
>> +
>> +DEFINE_EVENT(csd_function, csd_function_exit,
>> + TP_PROTO(smp_call_func_t func, call_single_data_t *csd),
>> + TP_ARGS(func, csd)
>> +);
>
> Oh, this is what event_class is for. Thanks for the example :)
>
>> +
>> +#endif /* _TRACE_SMP_H */
>> +
>> +/* This part must be outside protection */
>> +#include <trace/define_trace.h>
>> diff --git a/kernel/smp.c b/kernel/smp.c
>> index ab3e5dad6cfe9..7d28db303e9bc 100644
>> --- a/kernel/smp.c
>> +++ b/kernel/smp.c
>> @@ -27,6 +27,9 @@
>> #include <linux/jump_label.h>
>>
>> #include <trace/events/ipi.h>
>> +#define CREATE_TRACE_POINTS
>> +#include <trace/events/smp.h>
>> +#undef CREATE_TRACE_POINTS
>>
>> #include "smpboot.h"
>> #include "sched/smp.h"
>> @@ -121,6 +124,14 @@ send_call_function_ipi_mask(struct cpumask *mask)
>> arch_send_call_function_ipi_mask(mask);
>> }
>>
>> +static __always_inline void
>> +csd_do_func(smp_call_func_t func, void *info, call_single_data_t *csd)
>> +{
>> + trace_csd_function_entry(func, csd);
>> + func(info);
>> + trace_csd_function_exit(func, csd);
>> +}
>> +
>
> Good one, a helper to avoid calling those traces everywhere.
>
> Honest question:
> Since info == csd->info and func == csd->func, we could just pass csd, right?
> I suppose the point of the 3-argument version is to use the values already
> fetched from memory instead of fetching them again. Is that correct?
>

There's also the special case of CSD_TYPE_TTWU, where there is no csd->func
(instead we have an implicit func mapping to sched_ttwu_pending). I think
it's preferable to directly feed the right things to the TP rather than to
duplicate the "decoding" logic against the *csd passed as TP argument.


2023-05-11 09:35:43

by Leonardo Bras

[permalink] [raw]
Subject: Re: [RFC PATCH 1/1] smp: Add tracepoints for functions called with smp_call_function*()

On Thu, May 11, 2023 at 5:13 AM Valentin Schneider <[email protected]> wrote:
>
> On 10/05/23 17:27, Leonardo Brás wrote:
> > On Thu, 2023-05-04 at 12:59 +0100, Valentin Schneider wrote:
> >> [... tracepoint definitions and kernel/smp.c hunks quoted in full above, snipped ...]
> >> @@ -121,6 +124,14 @@ send_call_function_ipi_mask(struct cpumask *mask)
> >> arch_send_call_function_ipi_mask(mask);
> >> }
> >>
> >> +static __always_inline void
> >> +csd_do_func(smp_call_func_t func, void *info, call_single_data_t *csd)
> >> +{
> >> + trace_csd_function_entry(func, csd);
> >> + func(info);
> >> + trace_csd_function_exit(func, csd);
> >> +}
> >> +
> >
> > Good one, a helper to avoid calling those traces everywhere.
> >
> > Honest question:
> > Since info == csd->info and func == csd->func, we could just pass csd, right?
> > I suppose the point of the 3-argument version is to use the values already
> > fetched from memory instead of fetching them again. Is that correct?
> >
>
> There's also the special case of CSD_TYPE_TTWU, where there is no csd->func
> (instead we have an implicit func mapping to sched_ttwu_pending). I think
> it's preferable to directly feed the right things to the TP rather than to
> duplicate the "decoding" logic against the *csd passed as TP argument.
>

Quite interesting, thank you for sharing!

I sent a v3, which got a warning from the kernel test robot. I have
already solved the warning, so please provide feedback on the rest of
the patch.

About the warning:
It is an alignment error between 'struct __call_single_data' from
generic_exec_single() and 'call_single_data_t' from csd_do_func(): the
first is 8-byte aligned, and the second is 32-byte aligned according
to the typedef.

My first idea was to convert my patch's parameters from
call_single_data_t to 'struct __call_single_data', but then I found
out the 'struct' option allows a csd to be split across two cachelines,
which is usually bad.
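
For reference, the typedef in include/linux/smp.h looks roughly like
this (paraphrased from memory, so double-check the exact form in the
tree):

/* Use __aligned() to avoid using 2 cachelines for 1 csd */
typedef struct __call_single_data call_single_data_t
	__aligned(sizeof(struct __call_single_data));

Since the struct is 32 bytes on 64-bit, the typedef'd type is 32-byte
aligned, while the bare struct only requires the natural 8-byte
alignment of its members, hence the mismatch warning.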
Then I decided to send a patchset [1] fixing generic_exec_single() and
its callers. If it's accepted, the warning will go away, and v3 will
merge cleanly.

If not, I will send a v4 changing the parameters.

Thanks for reviewing,
Leo


1. https://lore.kernel.org/lkml/[email protected]