2019-01-18 10:11:27

by Pintu Kumar

Subject: Need help: how to locate failure from irq_chip subsystem

Hi All,

I am debugging a boot-time crash on a Qualcomm Snapdragon arm64 board
running kernel 4.9.
I found the cause of the failure, but I cannot work out which
subsystem or driver triggers it.
If you have any ideas or suggestions for locating the issue, please let me know.

This is a snapshot of the crash log:
[ 6.907065] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[ 6.973938] PC is at 0x0
[ 6.976503] LR is at __ipipe_ack_fasteoi_irq+0x28/0x38
[ 7.151078] Process qmp_aop (pid: 24, stack limit = 0xfffffffbedc18000)
[ 7.242668] [< (null)>] (null)
[ 7.247416] [<ffffff9469f8d2e0>] __ipipe_dispatch_irq+0x78/0x340
[ 7.253469] [<ffffff9469e81564>] __ipipe_grab_irq+0x5c/0xd0
[ 7.341538] [<ffffff9469e81d68>] gic_handle_irq+0xc0/0x154

[ 6.288581] [PINTU]: __ipipe_ack_fasteoi_irq - called
[ 6.293698] [PINTU]: __ipipe_ack_fasteoi_irq: desc->irq_data.chip->irq_hold is NULL

When I checked, I found that the irq_hold implementation (expected by
ipipe) is missing in one of the irq_chip drivers, so I need to
implement it.

However, I am unable to work out which irq_chip driver it is.
If there are any good techniques for locating this in the kernel, please help.
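
As a reading aid for the log above, here is a minimal Python sketch that pulls the PC, the LR symbol and the backtrace symbols out of the oops text. The log lines are copied from this post; the function name summarize_oops is made up for illustration. A PC of 0x0 with the LR inside __ipipe_ack_fasteoi_irq is consistent with a call through a NULL function pointer made from that function, which matches the irq_hold debug print.

```python
import re

# Matches "symbol+0xoffset/0xsize" as it appears in the oops output.
SYM = re.compile(r"([A-Za-z_][\w.]*)\+0x([0-9a-f]+)/0x([0-9a-f]+)")

# Log lines copied from the crash snapshot in this post.
OOPS = """\
[ 6.973938] PC is at 0x0
[ 6.976503] LR is at __ipipe_ack_fasteoi_irq+0x28/0x38
[ 7.242668] [< (null)>] (null)
[ 7.247416] [<ffffff9469f8d2e0>] __ipipe_dispatch_irq+0x78/0x340
[ 7.253469] [<ffffff9469e81564>] __ipipe_grab_irq+0x5c/0xd0
[ 7.341538] [<ffffff9469e81d68>] gic_handle_irq+0xc0/0x154
"""


def summarize_oops(oops):
    """Return (pc, lr, frames) extracted from oops text.

    pc     -- the raw text after "PC is at"
    lr     -- (symbol, offset, size) for the link register, or None
    frames -- backtrace symbols in the order they are printed
    """
    pc = None
    lr = None
    frames = []
    for line in oops.splitlines():
        if "PC is at" in line:
            pc = line.split("PC is at", 1)[1].strip()
        elif "LR is at" in line:
            m = SYM.search(line)
            if m:
                lr = (m.group(1), int(m.group(2), 16), int(m.group(3), 16))
        elif "[<" in line:
            # Backtrace entry; the "(null)" frame has no symbol and is skipped.
            m = SYM.search(line)
            if m:
                frames.append(m.group(1))
    return pc, lr, frames


pc, lr, frames = summarize_oops(OOPS)
print("PC:", pc)
print("LR:", lr)
print("call chain (innermost first):", " <- ".join(frames))
```

The call chain shows the fault was taken on the interrupt entry path (gic_handle_irq), so the process named in the log is only whatever happened to be running when the interrupt arrived.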


Thanks,
Pintu


2019-01-18 10:33:08

by Sai Prakash Ranjan

Subject: Re: Need help: how to locate failure from irq_chip subsystem

Hi Pintu-san,

On 1/18/2019 3:38 PM, Pintu Agarwal wrote:
> Hi All,
>
> Currently, I am trying to debug a boot up crash on some qualcomm
> snapdragon arm64 board with kernel 4.9.
> I could find the cause of the failure, but I am unable to locate from
> which subsystem/drivers this is getting triggered.
> If you have any ideas or suggestions to locate the issue, please let me know.
>
> This is the snapshot of crash logs:
> [ 6.907065] Unable to handle kernel NULL pointer dereference at
> virtual address 00000000
> [ 6.973938] PC is at 0x0
> [ 6.976503] LR is at __ipipe_ack_fasteoi_irq+0x28/0x38
> [ 7.151078] Process qmp_aop (pid: 24, stack limit = 0xfffffffbedc18000)
> [ 7.242668] [< (null)>] (null)
> [ 7.247416] [<ffffff9469f8d2e0>] __ipipe_dispatch_irq+0x78/0x340
> [ 7.253469] [<ffffff9469e81564>] __ipipe_grab_irq+0x5c/0xd0
> [ 7.341538] [<ffffff9469e81d68>] gic_handle_irq+0xc0/0x154
>
> [ 6.288581] [PINTU]: __ipipe_ack_fasteoi_irq - called
> [ 6.293698] [PINTU]: __ipipe_ack_fasteoi_irq:
> desc->irq_data.chip->irq_hold is NULL
>
> When I check, I found that the irq_hold implementation is missing in
> one of the irq_chip driver (expected by ipipe), which I am supposed to
> implement.
>
> But I am unable to locate which irq_chip driver.
> If there are any good techniques to locate this in kernel, please help.
>

Could you please tell which QCOM SoC this board is based on?

Thanks,
Sai

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

2019-01-18 10:50:26

by Pintu Kumar

Subject: Re: Need help: how to locate failure from irq_chip subsystem

On Fri, Jan 18, 2019 at 3:54 PM Sai Prakash Ranjan wrote:
>
> Hi Pintu-san,
>
> On 1/18/2019 3:38 PM, Pintu Agarwal wrote:
> > Hi All,
> >
> > Currently, I am trying to debug a boot up crash on some qualcomm
> > snapdragon arm64 board with kernel 4.9.
> > I could find the cause of the failure, but I am unable to locate from
> > which subsystem/drivers this is getting triggered.
> > If you have any ideas or suggestions to locate the issue, please let me know.
> >
> > This is the snapshot of crash logs:
> > [ 6.907065] Unable to handle kernel NULL pointer dereference at
> > virtual address 00000000
> > [ 6.973938] PC is at 0x0
> > [ 6.976503] LR is at __ipipe_ack_fasteoi_irq+0x28/0x38
> > [ 7.151078] Process qmp_aop (pid: 24, stack limit = 0xfffffffbedc18000)
> > [ 7.242668] [< (null)>] (null)
> > [ 7.247416] [<ffffff9469f8d2e0>] __ipipe_dispatch_irq+0x78/0x340
> > [ 7.253469] [<ffffff9469e81564>] __ipipe_grab_irq+0x5c/0xd0
> > [ 7.341538] [<ffffff9469e81d68>] gic_handle_irq+0xc0/0x154
> >
> > [ 6.288581] [PINTU]: __ipipe_ack_fasteoi_irq - called
> > [ 6.293698] [PINTU]: __ipipe_ack_fasteoi_irq:
> > desc->irq_data.chip->irq_hold is NULL
> >
> > When I check, I found that the irq_hold implementation is missing in
> > one of the irq_chip driver (expected by ipipe), which I am supposed to
> > implement.
> >
> > But I am unable to locate which irq_chip driver.
> > If there are any good techniques to locate this in kernel, please help.
> >
>
> Could you please tell which QCOM SoC this board is based on?
>

It is a Snapdragon 845 with kernel 4.9.x.
I want to know which subsystem triggers it; I suspect something under drivers/soc/qcom/.

>
> --
> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
> of Code Aurora Forum, hosted by The Linux Foundation

2019-01-18 11:05:20

by Sai Prakash Ranjan

Subject: Re: Need help: how to locate failure from irq_chip subsystem

On 1/18/2019 4:18 PM, Pintu Agarwal wrote:
> On Fri, Jan 18, 2019 at 3:54 PM Sai Prakash Ranjan wrote:
>>
>> Hi Pintu-san,
>>
>> On 1/18/2019 3:38 PM, Pintu Agarwal wrote:
>>> Hi All,
>>>
>>> Currently, I am trying to debug a boot up crash on some qualcomm
>>> snapdragon arm64 board with kernel 4.9.
>>> I could find the cause of the failure, but I am unable to locate from
>>> which subsystem/drivers this is getting triggered.
>>> If you have any ideas or suggestions to locate the issue, please let me know.
>>>
>>> This is the snapshot of crash logs:
>>> [ 6.907065] Unable to handle kernel NULL pointer dereference at
>>> virtual address 00000000
>>> [ 6.973938] PC is at 0x0
>>> [ 6.976503] LR is at __ipipe_ack_fasteoi_irq+0x28/0x38
>>> [ 7.151078] Process qmp_aop (pid: 24, stack limit = 0xfffffffbedc18000)
>>> [ 7.242668] [< (null)>] (null)
>>> [ 7.247416] [<ffffff9469f8d2e0>] __ipipe_dispatch_irq+0x78/0x340
>>> [ 7.253469] [<ffffff9469e81564>] __ipipe_grab_irq+0x5c/0xd0
>>> [ 7.341538] [<ffffff9469e81d68>] gic_handle_irq+0xc0/0x154
>>>
>>> [ 6.288581] [PINTU]: __ipipe_ack_fasteoi_irq - called
>>> [ 6.293698] [PINTU]: __ipipe_ack_fasteoi_irq:
>>> desc->irq_data.chip->irq_hold is NULL
>>>
>>> When I check, I found that the irq_hold implementation is missing in
>>> one of the irq_chip driver (expected by ipipe), which I am supposed to
>>> implement.
>>>
>>> But I am unable to locate which irq_chip driver.
>>> If there are any good techniques to locate this in kernel, please help.
>>>
>>
>> Could you please tell which QCOM SoC this board is based on?
>>
>
> Snapdragon 845 with kernel 4.9.x
> I want to know from which subsystem it is triggered:drivers/soc/qcom/
>

The irqchip driver is "drivers/irqchip/irq-gic-v3.c". I suppose the
kernel you are using is msm-4.9, or is it some other kernel?

- Sai
--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

2019-01-18 11:22:14

by Pintu Kumar

Subject: Re: Need help: how to locate failure from irq_chip subsystem

> >> Could you please tell which QCOM SoC this board is based on?
> >>
> >
> > Snapdragon 845 with kernel 4.9.x
> > I want to know from which subsystem it is triggered:drivers/soc/qcom/
> >
>
> Irqchip driver is "drivers/irqchip/irq-gic-v3.c". The kernel you are
> using is msm-4.9 I suppose or some other kernel?
>
Yes, I am using a customized version of the Android-based msm-4.9 kernel.
And yes, the irqchip driver is irq-gic-v3, which I can see from the config.

But what I want to know is how to find out which driver module
(hopefully under drivers/soc/qcom/) that registers with this irq_chip
is being triggered at the time of the crash,
so that I can implement the irq_hold function whose absence causes the crash.

2019-01-18 11:55:36

by Sai Prakash Ranjan

Subject: Re: Need help: how to locate failure from irq_chip subsystem

On 1/18/2019 4:50 PM, Pintu Agarwal wrote:
>>>> Could you please tell which QCOM SoC this board is based on?
>>>>
>>>
>>> Snapdragon 845 with kernel 4.9.x
>>> I want to know from which subsystem it is triggered:drivers/soc/qcom/
>>>
>>
>> Irqchip driver is "drivers/irqchip/irq-gic-v3.c". The kernel you are
>> using is msm-4.9 I suppose or some other kernel?
>>
> Yes, I am using customized version of msm-4.9 kernel based on Android.
> And yes the irqchip driver is: irq-gic-v3, which I can see from config.
>
> But, what I wanted to know is, how to find out which driver module
> (hopefully under: /drivers/soc/qcom/) that register with this
> irq_chip, is getting triggered at the time of crash ?
> So, that I can implement irq_hold function for it, which is the cause of crash.
>

Hmm, since this is a boot-time crash, *initcall_debug* should help.
Add "initcall_debug ignore_loglevel" to the kernel command line and
check the last log line before the crash.
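
For a long boot log, scanning for the last initcall by eye is tedious. The following is a minimal Python sketch, not a definitive tool, that lists initcalls which were entered but never reported a return. It assumes the usual initcall_debug line shapes ("calling  <fn>+0x.../0x... @ <pid>" and "initcall <fn>+0x.../0x... returned <ret> after <n> usecs"). The sample log and the names example_a_init and example_b_init are hypothetical; only the final crash line is taken from this thread.

```python
import re

# Line shapes printed when the kernel is booted with initcall_debug.
# An optional " [module]" suffix after the symbol is tolerated.
CALLING = re.compile(r"calling\s+(\S+?)\+0x[0-9a-f]+/0x[0-9a-f]+")
RETURNED = re.compile(
    r"initcall\s+(\S+?)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[\S+\])? returned"
)

# Hypothetical sample log: the two initcall names are invented for
# illustration; the last line is the crash message from this thread.
SAMPLE_LOG = """\
[ 6.100000] calling  example_a_init+0x0/0x20 @ 1
[ 6.100500] initcall example_a_init+0x0/0x20 returned 0 after 12 usecs
[ 6.101000] calling  example_b_init+0x0/0x30 @ 1
[ 6.907065] Unable to handle kernel NULL pointer dereference at virtual address 00000000
"""


def unfinished_initcalls(log_text):
    """Return initcalls that logged 'calling' but never logged 'returned'."""
    pending = []
    for line in log_text.splitlines():
        done = RETURNED.search(line)
        if done:
            if done.group(1) in pending:
                pending.remove(done.group(1))
            continue
        started = CALLING.search(line)
        if started:
            pending.append(started.group(1))
    return pending


print(unfinished_initcalls(SAMPLE_LOG))
```

Since the fault here is taken in interrupt context, an unfinished initcall only narrows the time window; the driver that requested the interrupt may have been initialised earlier.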

- Sai

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

2019-01-21 12:36:46

by Pintu Kumar

Subject: Re: Need help: how to locate failure from irq_chip subsystem

On Fri, Jan 18, 2019 at 5:23 PM Sai Prakash Ranjan wrote:
>
> On 1/18/2019 4:50 PM, Pintu Agarwal wrote:
> >>>> Could you please tell which QCOM SoC this board is based on?
> >>>>
> >>>
> >>> Snapdragon 845 with kernel 4.9.x
> >>> I want to know from which subsystem it is triggered:drivers/soc/qcom/
> >>>
> >>
> >> Irqchip driver is "drivers/irqchip/irq-gic-v3.c". The kernel you are
> >> using is msm-4.9 I suppose or some other kernel?
> >>
> > Yes, I am using customized version of msm-4.9 kernel based on Android.
> > And yes the irqchip driver is: irq-gic-v3, which I can see from config.
> >
> > But, what I wanted to know is, how to find out which driver module
> > (hopefully under: /drivers/soc/qcom/) that register with this
> > irq_chip, is getting triggered at the time of crash ?
> > So, that I can implement irq_hold function for it, which is the cause of crash.
> >
>
> Hmm, since this is a bootup crash, *initcall_debug* should help.
> Add "initcall_debug ignore_loglevel" to kernel commandline and
> check the last log before crash.
>

OK, thanks Sai for your suggestions.
Yes, I had already tried that, but it did not help much.

Anyway, I finally found the culprit driver that the NULL
dereference was coming from.
So that issue is fixed.
Now I am looking into another issue.
If required, I will post further...

Thanks,
Pintu