LinuxLists.cc - Query: Regarding Notifier chain callback debugging or profiling

2020-02-10 11:57:46

Subject: Query: Regarding Notifier chain callback debugging or profiling

Hi,

The Linux kernel uses notifier chains everywhere to deliver kernel
events, but we have no debugging or profiling mechanism that shows
which callback is slow or which callback we are currently stuck in
(the latter is hard to determine without dumps).

Below are a few ways we could profile callbacks on an as-needed
basis:

1) Use a trace event before and after each callback:

static int notifier_call_chain(struct notifier_block **nl,
                               unsigned long val, void *v,
                               int nr_to_call, int *nr_calls)
{
        int ret = NOTIFY_DONE;
        struct notifier_block *nb, *next_nb;
        [...]
+               trace_event for entry of callback
                ret = nb->notifier_call(nb, val, v);
+               trace_event for exit of callback
        [...]
        }
        return ret;
}

2) Or use pr_debug() instead of a trace event.

3) Both of the above approaches share a problem: they log the callbacks
of every notifier chain, which might flood the trace buffer or dmesg.

So we could use a bool to control this and log only the notifier
chains of interest.

Something like the following:

 struct srcu_notifier_head {
        struct mutex mutex;
        struct srcu_struct srcu;
        struct notifier_block __rcu *head;
+       bool debug_callback;
 };

 static int notifier_call_chain(struct notifier_block **nl,
                                unsigned long val, void *v,
-                               int nr_to_call, int *nr_calls)
+                               int nr_to_call, int *nr_calls,
+                               bool debug_callback)
 {
        int ret = NOTIFY_DONE;
        struct notifier_block *nb, *next_nb;
@@ -526,6 +526,7 @@ void srcu_init_notifier_head(struct srcu_notifier_head *nh)
        if (init_srcu_struct(&nh->srcu) < 0)
                BUG();
        nh->head = NULL;
+       nh->debug_callback = false;     /* off by default for every notifier chain */

4) We could also add pre and post functions that run before and after
each callback, enabled only for the chains that want profiling.

Please let us know which approach we could use, or suggest another
debugging mechanism for this.

Regards
Gaurav

--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center,
Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.

2020-02-10 21:07:54

by Greg Kroah-Hartman

Subject: Re: Query: Regarding Notifier chain callback debugging or profiling

On Mon, Feb 10, 2020 at 05:26:16PM +0530, Gaurav Kohli wrote:
> Hi,
>
> In Linux kernel, everywhere we are using notification chains to notify for
> any kernel events, But we don't have any debugging or profiling mechanism to
> know which callback is taking time or currently we are stuck on which call
> back(without dumps it is difficult to say for last problem)

Callbacks are a mess, I agree.

> Below are the few ways, which we can implement to profile callback on need
> basis:
>
> 1) Use trace event before and after callback:
>
> static int notifier_call_chain(struct notifier_block **nl,
> unsigned long val, void *v,
> int nr_to_call, int *nr_calls)
> {
> int ret = NOTIFY_DONE;
> struct notifier_block *nb, *next_nb;
>
>
> + trace_event for entry of callback
> ret = nb->notifier_call(nb, val, v);
> + trace_event for exit of callback

Ick.

> }
> return ret;
> }
>
> 2) Or use pr_debug instead of trace_event
>
> 3) Both of the above approach has certain problems, like it will dump
> callback for each notifier chain, which might flood trace buffer or dmesg.
>
> So we can use bool variable to control that and dump the required
> notification chain only.
>
> Some thing like below we can use:
>
> struct srcu_notifier_head {
> struct mutex mutex;
> struct srcu_struct srcu;
> struct notifier_block __rcu *head;
> + bool debug_callback;
> };
>
>
> static int notifier_call_chain(struct notifier_block **nl,
> unsigned long val, void *v,
> - int nr_to_call, int *nr_calls)
> + int nr_to_call, int *nr_calls, bool
> debug_callback)
> {
> int ret = NOTIFY_DONE;
> struct notifier_block *nb, *next_nb;
> @@ -526,6 +526,7 @@ void srcu_init_notifier_head(struct srcu_notifier_head
> *nh)
> if (init_srcu_struct(&nh->srcu) < 0)
> BUG();
> nh->head = NULL;
> + nh->debug_callback = false; -> by default it would be false for
> every notifier chain.
>
> 4) we can also think of something pre and post function, before and after
> each callback, And we can enable only for those who wants to profile.
>
> Please let us what approach we can use, or please suggest some debugging
> mechanism for the same.

Why not just pay attention to the specific notifier you want? Trace
when the specific blocking_notifier_call_chain() is called.

What specific notifier call chain is causing you problems that you need
to debug?

thanks,

greg k-h

2020-02-11 05:50:17

by Gaurav Kohli

Subject: Re: Query: Regarding Notifier chain callback debugging or profiling

On 2/11/2020 2:36 AM, Greg KH wrote:
> Why not just pay attention to the specific notifier you want? Trace
> when the specific blocking_notifier_call_chain() is called.
>
> What specific notifier call chain is causing you problems that you need
> to debug?

Thanks for the reply, Greg.
I agree that we can trace a specific notifier chain, but that is very
hacky (we have to add debug code here and there whenever a problem
comes up).

We use a lot of SRCU notifier chains to notify clients of events. With
a generic debugging mechanism, we would only have to switch it on for
a particular client during the initial testing phase.

As mentioned above, with something like the following, only the client
that wants to debug has to switch it on:
>> struct srcu_notifier_head {
>> struct mutex mutex;
>> struct srcu_struct srcu;
>> struct notifier_block __rcu *head;
>> + bool debug_callback; -> this can be turned on for a particular client.
>> };

Right now we have no generic way to debug notifier chains; please
suggest an approach. On a live target, it is difficult to tell where a
notification chain got stuck.

Regards
Gaurav

2020-02-11 12:46:37

by Greg Kroah-Hartman

Subject: Re: Query: Regarding Notifier chain callback debugging or profiling

On Tue, Feb 11, 2020 at 10:16:03AM +0530, Gaurav Kohli wrote:
> Thanks Greg for the reply.
> I agree, we can trace specific notifier chain, but that is very hacky(we
> have to add debug code here and there when problems comes)
>
> We are using lot of SRCU notifier callchain to notify clients for events,
> And if we have something generic debugging mechanism, we just have to switch
> on for that particular client for initial testing phase.

Why are you using SRCU notifier chains for events?

What are you using them for like this, what in-kernel code is this so
that I can see what you are doing?

That feels like a very slow way of doing things, especially given the
recent changes in compilers due to Spectre issues.

> As mentioned above, if we can come up with something like below then only
> client has to switch on who wants to debug:
> >> struct srcu_notifier_head {
> >> struct mutex mutex;
> >> struct srcu_struct srcu;
> >> struct notifier_block __rcu *head;
> >> + bool debug_callback; -> this we can turn on for particular
> client.
> >> };
>
> Right now we don't have any generic way to debug notifier chains, please
> suggest some approach. On live target, it is difficult to say where
> notification chain got stuck.

I suggest not using notifier chains for events :)

Seriously, try something local for your specific notifiers first. It
should be easy to just add tracing for all of them using ftrace or bpf,
right?

thanks,

greg k-h