2020-07-16 09:22:09

by liuqi (BA)

[permalink] [raw]
Subject: [PATCH] drivers/perf: Fix kernel panic when rmmod PMU modules during perf sampling

When users try to remove PMU modules during perf sampling, kernel panic
will happen because the pmu->read() is a NULL pointer here.

INFO on HiSilicon hip08 platform as follow:
pc : hisi_uncore_pmu_event_update+0x30/0xa4 [hisi_uncore_pmu]
lr : hisi_uncore_pmu_read+0x20/0x2c [hisi_uncore_pmu]
sp : ffff800010103e90
x29: ffff800010103e90 x28: ffff0027db0c0e40
x27: ffffa29a76f129d8 x26: ffffa29a77ceb000
x25: ffffa29a773a5000 x24: ffffa29a77392000
x23: ffffddffe5943f08 x22: ffff002784285960
x21: ffff002784285800 x20: ffff0027d2e76c80
x19: ffff0027842859e0 x18: ffff80003498bcc8
x17: ffffa29a76afe910 x16: ffffa29a7583f530
x15: 16151a1512061a1e x14: 0000000000000000
x13: ffffa29a76f1e238 x12: 0000000000000001
x11: 0000000000000400 x10: 00000000000009f0
x9 : ffff8000107b3e70 x8 : ffff0027db0c1890
x7 : ffffa29a773a7000 x6 : 00000007f5131013
x5 : 00000007f5131013 x4 : 09f257d417c00000
x3 : 00000002187bd7ce x2 : ffffa29a38f0f0d8
x1 : ffffa29a38eae268 x0 : ffff0027d2e76c80
Call trace:
hisi_uncore_pmu_event_update+0x30/0xa4 [hisi_uncore_pmu]
hisi_uncore_pmu_read+0x20/0x2c [hisi_uncore_pmu]
__perf_event_read+0x1a0/0x1f8
flush_smp_call_function_queue+0xa0/0x160
generic_smp_call_function_single_interrupt+0x18/0x20
handle_IPI+0x31c/0x4dc
gic_handle_irq+0x2c8/0x310
el1_irq+0xcc/0x180
arch_cpu_idle+0x4c/0x20c
default_idle_call+0x20/0x30
do_idle+0x1b4/0x270
cpu_startup_entry+0x28/0x30
secondary_start_kernel+0x1a4/0x1fc

To solve the above issue, current module should be registered to kernel,
so that try_module_get() can be invoked when perf sampling starts. This
adds the reference counting of module and could prevent users from removing
modules during sampling.

Signed-off-by: Qi Liu <[email protected]>
Reported-by: Haifeng Wang <[email protected]>
Reviewed-by: John Garry <[email protected]>
---
Kernel panic will also happen when users try to unbind PMU drivers with
device. This unbind issue could be solved by another patch latter.

drivers/perf/arm_smmuv3_pmu.c | 1 +
drivers/perf/fsl_imx8_ddr_perf.c | 1 +
drivers/perf/hisilicon/hisi_uncore_ddrc_pmu.c | 1 +
drivers/perf/hisilicon/hisi_uncore_hha_pmu.c | 1 +
drivers/perf/hisilicon/hisi_uncore_l3c_pmu.c | 1 +
5 files changed, 5 insertions(+)

diff --git a/drivers/perf/arm_smmuv3_pmu.c b/drivers/perf/arm_smmuv3_pmu.c
index 48e28ef..90caba56 100644
--- a/drivers/perf/arm_smmuv3_pmu.c
+++ b/drivers/perf/arm_smmuv3_pmu.c
@@ -742,6 +742,7 @@ static int smmu_pmu_probe(struct platform_device *pdev)
platform_set_drvdata(pdev, smmu_pmu);

smmu_pmu->pmu = (struct pmu) {
+ .module = THIS_MODULE,
.task_ctx_nr = perf_invalid_context,
.pmu_enable = smmu_pmu_enable,
.pmu_disable = smmu_pmu_disable,
diff --git a/drivers/perf/fsl_imx8_ddr_perf.c b/drivers/perf/fsl_imx8_ddr_perf.c
index 90884d1..2aed2d9 100644
--- a/drivers/perf/fsl_imx8_ddr_perf.c
+++ b/drivers/perf/fsl_imx8_ddr_perf.c
@@ -512,6 +512,7 @@ static int ddr_perf_init(struct ddr_pmu *pmu, void __iomem *base,
{
*pmu = (struct ddr_pmu) {
.pmu = (struct pmu) {
+ .module = THIS_MODULE,
.capabilities = PERF_PMU_CAP_NO_EXCLUDE,
.task_ctx_nr = perf_invalid_context,
.attr_groups = attr_groups,
diff --git a/drivers/perf/hisilicon/hisi_uncore_ddrc_pmu.c b/drivers/perf/hisilicon/hisi_uncore_ddrc_pmu.c
index 15713fa..71587f1 100644
--- a/drivers/perf/hisilicon/hisi_uncore_ddrc_pmu.c
+++ b/drivers/perf/hisilicon/hisi_uncore_ddrc_pmu.c
@@ -378,6 +378,7 @@ static int hisi_ddrc_pmu_probe(struct platform_device *pdev)
ddrc_pmu->sccl_id, ddrc_pmu->index_id);
ddrc_pmu->pmu = (struct pmu) {
.name = name,
+ .module = THIS_MODULE,
.task_ctx_nr = perf_invalid_context,
.event_init = hisi_uncore_pmu_event_init,
.pmu_enable = hisi_uncore_pmu_enable,
diff --git a/drivers/perf/hisilicon/hisi_uncore_hha_pmu.c b/drivers/perf/hisilicon/hisi_uncore_hha_pmu.c
index dcc5600..c199de7 100644
--- a/drivers/perf/hisilicon/hisi_uncore_hha_pmu.c
+++ b/drivers/perf/hisilicon/hisi_uncore_hha_pmu.c
@@ -390,6 +390,7 @@ static int hisi_hha_pmu_probe(struct platform_device *pdev)
hha_pmu->sccl_id, hha_pmu->index_id);
hha_pmu->pmu = (struct pmu) {
.name = name,
+ .module = THIS_MODULE,
.task_ctx_nr = perf_invalid_context,
.event_init = hisi_uncore_pmu_event_init,
.pmu_enable = hisi_uncore_pmu_enable,
diff --git a/drivers/perf/hisilicon/hisi_uncore_l3c_pmu.c b/drivers/perf/hisilicon/hisi_uncore_l3c_pmu.c
index 7719ae4..567d7e6 100644
--- a/drivers/perf/hisilicon/hisi_uncore_l3c_pmu.c
+++ b/drivers/perf/hisilicon/hisi_uncore_l3c_pmu.c
@@ -380,6 +380,7 @@ static int hisi_l3c_pmu_probe(struct platform_device *pdev)
l3c_pmu->sccl_id, l3c_pmu->index_id);
l3c_pmu->pmu = (struct pmu) {
.name = name,
+ .module = THIS_MODULE,
.task_ctx_nr = perf_invalid_context,
.event_init = hisi_uncore_pmu_event_init,
.pmu_enable = hisi_uncore_pmu_enable,
--
2.7.4


2020-07-16 09:43:07

by Will Deacon

[permalink] [raw]
Subject: Re: [PATCH] drivers/perf: Fix kernel panic when rmmod PMU modules during perf sampling

On Thu, Jul 16, 2020 at 05:19:25PM +0800, Qi Liu wrote:
> Kernel panic will also happen when users try to unbind PMU drivers with
> device. This unbind issue could be solved by another patch latter.
>
> drivers/perf/arm_smmuv3_pmu.c | 1 +
> drivers/perf/fsl_imx8_ddr_perf.c | 1 +
> drivers/perf/hisilicon/hisi_uncore_ddrc_pmu.c | 1 +
> drivers/perf/hisilicon/hisi_uncore_hha_pmu.c | 1 +
> drivers/perf/hisilicon/hisi_uncore_l3c_pmu.c | 1 +
> 5 files changed, 5 insertions(+)
>
> diff --git a/drivers/perf/arm_smmuv3_pmu.c b/drivers/perf/arm_smmuv3_pmu.c
> index 48e28ef..90caba56 100644
> --- a/drivers/perf/arm_smmuv3_pmu.c
> +++ b/drivers/perf/arm_smmuv3_pmu.c
> @@ -742,6 +742,7 @@ static int smmu_pmu_probe(struct platform_device *pdev)
> platform_set_drvdata(pdev, smmu_pmu);
>
> smmu_pmu->pmu = (struct pmu) {
> + .module = THIS_MODULE,

I thought platform_driver_register() did this automatically?

Will

2020-07-16 10:24:21

by John Garry

[permalink] [raw]
Subject: Re: [PATCH] drivers/perf: Fix kernel panic when rmmod PMU modules during perf sampling

On 16/07/2020 10:41, Will Deacon wrote:
> On Thu, Jul 16, 2020 at 05:19:25PM +0800, Qi Liu wrote:
>> Kernel panic will also happen when users try to unbind PMU drivers with
>> device. This unbind issue could be solved by another patch latter.
>>
>> drivers/perf/arm_smmuv3_pmu.c | 1 +
>> drivers/perf/fsl_imx8_ddr_perf.c | 1 +
>> drivers/perf/hisilicon/hisi_uncore_ddrc_pmu.c | 1 +
>> drivers/perf/hisilicon/hisi_uncore_hha_pmu.c | 1 +
>> drivers/perf/hisilicon/hisi_uncore_l3c_pmu.c | 1 +
>> 5 files changed, 5 insertions(+)
>>
>> diff --git a/drivers/perf/arm_smmuv3_pmu.c b/drivers/perf/arm_smmuv3_pmu.c
>> index 48e28ef..90caba56 100644
>> --- a/drivers/perf/arm_smmuv3_pmu.c
>> +++ b/drivers/perf/arm_smmuv3_pmu.c
>> @@ -742,6 +742,7 @@ static int smmu_pmu_probe(struct platform_device *pdev)
>> platform_set_drvdata(pdev, smmu_pmu);
>>
>> smmu_pmu->pmu = (struct pmu) {
>> + .module = THIS_MODULE,
>
> I thought platform_driver_register() did this automatically?
>

Isn't that something different? The perf framework knows nothing of the
platform_device/driver really, and just knows the event_source device
which it creates. And so we also need to tell the perf framework about
the module backing this pmu.

I think some relevant code is perf_try_init_event() ->
try_module_get(pmu->module).

Thanks,
John

2020-07-16 10:27:55

by Robin Murphy

[permalink] [raw]
Subject: Re: [PATCH] drivers/perf: Fix kernel panic when rmmod PMU modules during perf sampling

On 2020-07-16 10:41, Will Deacon wrote:
> On Thu, Jul 16, 2020 at 05:19:25PM +0800, Qi Liu wrote:
>> Kernel panic will also happen when users try to unbind PMU drivers with
>> device. This unbind issue could be solved by another patch latter.
>>
>> drivers/perf/arm_smmuv3_pmu.c | 1 +
>> drivers/perf/fsl_imx8_ddr_perf.c | 1 +
>> drivers/perf/hisilicon/hisi_uncore_ddrc_pmu.c | 1 +
>> drivers/perf/hisilicon/hisi_uncore_hha_pmu.c | 1 +
>> drivers/perf/hisilicon/hisi_uncore_l3c_pmu.c | 1 +
>> 5 files changed, 5 insertions(+)
>>
>> diff --git a/drivers/perf/arm_smmuv3_pmu.c b/drivers/perf/arm_smmuv3_pmu.c
>> index 48e28ef..90caba56 100644
>> --- a/drivers/perf/arm_smmuv3_pmu.c
>> +++ b/drivers/perf/arm_smmuv3_pmu.c
>> @@ -742,6 +742,7 @@ static int smmu_pmu_probe(struct platform_device *pdev)
>> platform_set_drvdata(pdev, smmu_pmu);
>>
>> smmu_pmu->pmu = (struct pmu) {
>> + .module = THIS_MODULE,
>
> I thought platform_driver_register() did this automatically?

For the platform device itself, yes, but this is for the PMU device -
perf needs to take a reference to the module, otherwise the platform
device can still be pulled out from under its feet.

I can't remember if we ever discussed making perf_pmu_register() do the
same trick as platform_device_register() and friends, but obviously it's
a possibility.

Robin.

2020-07-16 10:32:53

by Will Deacon

[permalink] [raw]
Subject: Re: [PATCH] drivers/perf: Fix kernel panic when rmmod PMU modules during perf sampling

On Thu, Jul 16, 2020 at 11:26:25AM +0100, Robin Murphy wrote:
> On 2020-07-16 10:41, Will Deacon wrote:
> > On Thu, Jul 16, 2020 at 05:19:25PM +0800, Qi Liu wrote:
> > > Kernel panic will also happen when users try to unbind PMU drivers with
> > > device. This unbind issue could be solved by another patch latter.
> > >
> > > drivers/perf/arm_smmuv3_pmu.c | 1 +
> > > drivers/perf/fsl_imx8_ddr_perf.c | 1 +
> > > drivers/perf/hisilicon/hisi_uncore_ddrc_pmu.c | 1 +
> > > drivers/perf/hisilicon/hisi_uncore_hha_pmu.c | 1 +
> > > drivers/perf/hisilicon/hisi_uncore_l3c_pmu.c | 1 +
> > > 5 files changed, 5 insertions(+)
> > >
> > > diff --git a/drivers/perf/arm_smmuv3_pmu.c b/drivers/perf/arm_smmuv3_pmu.c
> > > index 48e28ef..90caba56 100644
> > > --- a/drivers/perf/arm_smmuv3_pmu.c
> > > +++ b/drivers/perf/arm_smmuv3_pmu.c
> > > @@ -742,6 +742,7 @@ static int smmu_pmu_probe(struct platform_device *pdev)
> > > platform_set_drvdata(pdev, smmu_pmu);
> > >
> > > smmu_pmu->pmu = (struct pmu) {
> > > + .module = THIS_MODULE,
> >
> > I thought platform_driver_register() did this automatically?
>
> For the platform device itself, yes, but this is for the PMU device - perf
> needs to take a reference to the module, otherwise the platform device can
> still be pulled out from under its feet.

Urgh, gross.

> I can't remember if we ever discussed making perf_pmu_register() do the same
> trick as platform_device_register() and friends, but obviously it's a
> possibility.

Yeah, but I suppose this patch is the right thing to do for now. I'll queue
it as a fix.

Will

2020-07-16 10:38:33

by John Garry

[permalink] [raw]
Subject: Re: [PATCH] drivers/perf: Fix kernel panic when rmmod PMU modules during perf sampling

On 16/07/2020 11:30, Will Deacon wrote:
> On Thu, Jul 16, 2020 at 11:26:25AM +0100, Robin Murphy wrote:
>> On 2020-07-16 10:41, Will Deacon wrote:
>>> On Thu, Jul 16, 2020 at 05:19:25PM +0800, Qi Liu wrote:
>>>> Kernel panic will also happen when users try to unbind PMU drivers with
>>>> device. This unbind issue could be solved by another patch latter.
>>>>
>>>> drivers/perf/arm_smmuv3_pmu.c | 1 +
>>>> drivers/perf/fsl_imx8_ddr_perf.c | 1 +
>>>> drivers/perf/hisilicon/hisi_uncore_ddrc_pmu.c | 1 +
>>>> drivers/perf/hisilicon/hisi_uncore_hha_pmu.c | 1 +
>>>> drivers/perf/hisilicon/hisi_uncore_l3c_pmu.c | 1 +
>>>> 5 files changed, 5 insertions(+)
>>>>
>>>> diff --git a/drivers/perf/arm_smmuv3_pmu.c b/drivers/perf/arm_smmuv3_pmu.c
>>>> index 48e28ef..90caba56 100644
>>>> --- a/drivers/perf/arm_smmuv3_pmu.c
>>>> +++ b/drivers/perf/arm_smmuv3_pmu.c
>>>> @@ -742,6 +742,7 @@ static int smmu_pmu_probe(struct platform_device *pdev)
>>>> platform_set_drvdata(pdev, smmu_pmu);
>>>>
>>>> smmu_pmu->pmu = (struct pmu) {
>>>> + .module = THIS_MODULE,
>>> I thought platform_driver_register() did this automatically?
>> For the platform device itself, yes, but this is for the PMU device - perf
>> needs to take a reference to the module, otherwise the platform device can
>> still be pulled out from under its feet.
> Urgh, gross.
>
>> I can't remember if we ever discussed making perf_pmu_register() do the same
>> trick as platform_device_register() and friends, but obviously it's a
>> possibility.
> Yeah, but I suppose this patch is the right thing to do for now. I'll queue
> it as a fix.
>

Please also note what Qi Liu wrote about being able to unbind the driver
and cause the same issue. I don't know if you can about that issue also.

2020-07-16 10:45:28

by Will Deacon

[permalink] [raw]
Subject: Re: [PATCH] drivers/perf: Fix kernel panic when rmmod PMU modules during perf sampling

On Thu, Jul 16, 2020 at 11:36:10AM +0100, John Garry wrote:
> On 16/07/2020 11:30, Will Deacon wrote:
> > On Thu, Jul 16, 2020 at 11:26:25AM +0100, Robin Murphy wrote:
> > > On 2020-07-16 10:41, Will Deacon wrote:
> > > > On Thu, Jul 16, 2020 at 05:19:25PM +0800, Qi Liu wrote:
> > > > > Kernel panic will also happen when users try to unbind PMU drivers with
> > > > > device. This unbind issue could be solved by another patch latter.
> > > > >
> > > > > drivers/perf/arm_smmuv3_pmu.c | 1 +
> > > > > drivers/perf/fsl_imx8_ddr_perf.c | 1 +
> > > > > drivers/perf/hisilicon/hisi_uncore_ddrc_pmu.c | 1 +
> > > > > drivers/perf/hisilicon/hisi_uncore_hha_pmu.c | 1 +
> > > > > drivers/perf/hisilicon/hisi_uncore_l3c_pmu.c | 1 +
> > > > > 5 files changed, 5 insertions(+)
> > > > >
> > > > > diff --git a/drivers/perf/arm_smmuv3_pmu.c b/drivers/perf/arm_smmuv3_pmu.c
> > > > > index 48e28ef..90caba56 100644
> > > > > --- a/drivers/perf/arm_smmuv3_pmu.c
> > > > > +++ b/drivers/perf/arm_smmuv3_pmu.c
> > > > > @@ -742,6 +742,7 @@ static int smmu_pmu_probe(struct platform_device *pdev)
> > > > > platform_set_drvdata(pdev, smmu_pmu);
> > > > >
> > > > > smmu_pmu->pmu = (struct pmu) {
> > > > > + .module = THIS_MODULE,
> > > > I thought platform_driver_register() did this automatically?
> > > For the platform device itself, yes, but this is for the PMU device - perf
> > > needs to take a reference to the module, otherwise the platform device can
> > > still be pulled out from under its feet.
> > Urgh, gross.
> >
> > > I can't remember if we ever discussed making perf_pmu_register() do the same
> > > trick as platform_device_register() and friends, but obviously it's a
> > > possibility.
> > Yeah, but I suppose this patch is the right thing to do for now. I'll queue
> > it as a fix.
> >
>
> Please also note what Qi Liu wrote about being able to unbind the driver and
> cause the same issue. I don't know if you can about that issue also.

I guess we have to throw in some '.suppress_bind_attrs = true,' lines as
well, then? I'll queue this as-is, but happy to take a follow-up if somebody
can test it.

Will

2020-07-16 11:14:20

by John Garry

[permalink] [raw]
Subject: Re: [PATCH] drivers/perf: Fix kernel panic when rmmod PMU modules during perf sampling

On 16/07/2020 11:44, Will Deacon wrote:
>> Please also note what Qi Liu wrote about being able to unbind the driver and
>> cause the same issue. I don't know if you can about that issue also.
> I guess we have to throw in some '.suppress_bind_attrs = true,' lines as
> well, then?

Yeah, I think it would be a follow up using suppress_bind_attrs. TBH, I
don't know why this is an opt out feature, than an opt in. Probably a
good reason...

I'll queue this as-is, but happy to take a follow-up if somebody
> can test it.
>

Thanks,
john

2020-07-16 12:03:28

by Will Deacon

[permalink] [raw]
Subject: Re: [PATCH] drivers/perf: Fix kernel panic when rmmod PMU modules during perf sampling

On Thu, 16 Jul 2020 17:19:25 +0800, Qi Liu wrote:
> When users try to remove PMU modules during perf sampling, kernel panic
> will happen because the pmu->read() is a NULL pointer here.
>
> INFO on HiSilicon hip08 platform as follow:
> pc : hisi_uncore_pmu_event_update+0x30/0xa4 [hisi_uncore_pmu]
> lr : hisi_uncore_pmu_read+0x20/0x2c [hisi_uncore_pmu]
> sp : ffff800010103e90
> x29: ffff800010103e90 x28: ffff0027db0c0e40
> x27: ffffa29a76f129d8 x26: ffffa29a77ceb000
> x25: ffffa29a773a5000 x24: ffffa29a77392000
> x23: ffffddffe5943f08 x22: ffff002784285960
> x21: ffff002784285800 x20: ffff0027d2e76c80
> x19: ffff0027842859e0 x18: ffff80003498bcc8
> x17: ffffa29a76afe910 x16: ffffa29a7583f530
> x15: 16151a1512061a1e x14: 0000000000000000
> x13: ffffa29a76f1e238 x12: 0000000000000001
> x11: 0000000000000400 x10: 00000000000009f0
> x9 : ffff8000107b3e70 x8 : ffff0027db0c1890
> x7 : ffffa29a773a7000 x6 : 00000007f5131013
> x5 : 00000007f5131013 x4 : 09f257d417c00000
> x3 : 00000002187bd7ce x2 : ffffa29a38f0f0d8
> x1 : ffffa29a38eae268 x0 : ffff0027d2e76c80
> Call trace:
> hisi_uncore_pmu_event_update+0x30/0xa4 [hisi_uncore_pmu]
> hisi_uncore_pmu_read+0x20/0x2c [hisi_uncore_pmu]
> __perf_event_read+0x1a0/0x1f8
> flush_smp_call_function_queue+0xa0/0x160
> generic_smp_call_function_single_interrupt+0x18/0x20
> handle_IPI+0x31c/0x4dc
> gic_handle_irq+0x2c8/0x310
> el1_irq+0xcc/0x180
> arch_cpu_idle+0x4c/0x20c
> default_idle_call+0x20/0x30
> do_idle+0x1b4/0x270
> cpu_startup_entry+0x28/0x30
> secondary_start_kernel+0x1a4/0x1fc
>
> [...]

Applied to arm64 (for-next/fixes), thanks!

[1/1] drivers/perf: Fix kernel panic when rmmod PMU modules during perf sampling
https://git.kernel.org/arm64/c/bdc5c744c7b6

Cheers,
--
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev