2021-05-11 04:00:13

by chenxiang (M)

[permalink] [raw]
Subject: Qestion about device link

Hi Rafael and other guys,

I am trying to add a device link between scsi_host->shost_gendev and
hisi_hba->dev to support runtime PM for hisi_hba driver

(as it supports runtime PM for scsi host in some scenarios such as error
handler etc, we can avoid to do them again if adding a

device link between scsi_host->shost_gendev and hisi_hba->dev) as
follows (hisi_sas driver is under directory drivers/scsi/hisi_sas):

device_link_add(&shost->shost_gendev, hisi_hba->dev, DL_FLAG_PM_RUNTIME
| DL_FLAG_RPM_ACTIVE)

We have a full test on it, and it works well except when rmmod the
driver, some call trace occurs as follows:

[root@localhost ~]# rmmod hisi_sas_v3_hw
[ 105.377944] BUG: scheduling while atomic: kworker/113:1/811/0x00000201
[ 105.384469] Modules linked in: bluetooth rfkill ib_isert
iscsi_target_mod ib_ipoib ib_umad iptable_filter vfio_iommu_type1
vfio_pci vfio_virqfd vfio rpcrdma ib_is er
libiscsi scsi_transport_iscsi crct10dif_ce sbsa_gwdt hns_roce_hw_v2
hisi_sec2 hisi_hpre hisi_zip hisi_qm uacce spi_hisi_sfc_v3xx
hisi_trng_v2 rng_core hisi_uncore _hha_pmu
hisi_uncore_ddrc_pmu hisi_uncore_l3c_pmu spi_dw_mmio hisi_uncore_pmu
hns3 hclge hnae3 hisi_sas_v3_hw(-) hisi_sas_main libsas
[ 105.424841] CPU: 113 PID: 811 Comm: kworker/113:1 Kdump: loaded
Tainted: G W 5.12.0-rc1+ #1
[ 105.434454] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS
2280-V2 CS V5.B143.01 04/22/2021
[ 105.443287] Workqueue: rcu_gp srcu_invoke_callbacks
[ 105.448154] Call trace:
[ 105.450593] dump_backtrace+0x0/0x1a4
[ 105.454245] show_stack+0x24/0x40
[ 105.457548] dump_stack+0xc8/0x104
[ 105.460939] __schedule_bug+0x68/0x80
[ 105.464590] __schedule+0x73c/0x77c
[ 105.465700] BUG: scheduling while atomic: kworker/96:1/791/0x00000201
[ 105.468066] schedule+0x7c/0x110
[ 105.468068] schedule_timeout+0x194/0x1d4
[ 105.474490] Modules linked in:
[ 105.477692] wait_for_completion+0x8c/0x12c
[ 105.477695] rcu_barrier+0x1e0/0x2fc
[ 105.477697] scsi_host_dev_release+0x50/0xf0
[ 105.477701] device_release+0x40/0xa0
[ 105.477704] kobject_put+0xac/0x100
[ 105.477707] __device_link_free_srcu+0x50/0x74
[ 105.477709] srcu_invoke_callbacks+0x108/0x1a4
[ 105.484743] process_one_work+0x1dc/0x48c
[ 105.492468] worker_thread+0x7c/0x464
[ 105.492471] kthread+0x168/0x16c
[ 105.492473] ret_from_fork+0x10/0x18
...

After analyse the process, we find that it will
device_del(&shost->gendev) in function scsi_remove_host() and then

put_device(&shost->shost_gendev) in function scsi_host_put() when
removing the driver, if there is a link between shost and hisi_hba->dev,

it will try to delete the link in device_del(), and also will
call_srcu(__device_link_free_srcu) to put_device() link->consumer and
supplier.

But if put device() for shost_gendev in device_link_free() is later than
in scsi_host_put(), it will call scsi_host_dev_release() in

srcu_invoke_callbacks() while it is atomic and there are scheduling in
scsi_host_dev_release(),

so it reports the BUG "scheduling while atomic:...".

thread 1 thread2
hisi_sas_v3_remove
...
sas_remove_host()
...
scsi_remove_host()
...
device_del(&shost->shost_gendev)
...
device_link_purge()
__device_link_del()
device_unregister(&link->link_dev)
devlink_dev_release
call_srcu(__device_link_free_srcu) ----------->
srcu_invoke_callbacks (atomic)
__device_link_free_srcu
...
scsi_host_put()
put_device(&shost->shost_gendev) (ref = 1)
device_link_free()
put_device(link->consumer)
//shost->gendev ref = 0
...
scsi_host_dev_release
...
rcu_barrier
kthread_stop()


We can check kref of shost->shost_gendev to make sure scsi_host_put() to
release scsi host device in LLDD driver to avoid the issue,

but it seems be a common issue: function __device_link_free_srcu calls
put_device() for consumer and supplier,

but if it's ref =0 at that time and there are scheduling or sleep in
dev_release, it may have the issue.

Do you have any idea about the issue?


Best regards,

Xiang Chen



2021-05-11 10:43:33

by chenxiang (M)

[permalink] [raw]
Subject: Question about device link//Re: Qestion about device link

Re-edit the non-aligned flowchart and add CC to Greg-KH and Saravanna.


在 2021/5/11 11:59, chenxiang (M) 写道:
> Hi Rafael and other guys,
>
> I am trying to add a device link between scsi_host->shost_gendev and
> hisi_hba->dev to support runtime PM for hisi_hba driver
>
> (as it supports runtime PM for scsi host in some scenarios such as
> error handler etc, we can avoid to do them again if adding a
>
> device link between scsi_host->shost_gendev and hisi_hba->dev) as
> follows (hisi_sas driver is under directory drivers/scsi/hisi_sas):
>
> device_link_add(&shost->shost_gendev, hisi_hba->dev,
> DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE)
>
> We have a full test on it, and it works well except when rmmod the
> driver, some call trace occurs as follows:
>
> [root@localhost ~]# rmmod hisi_sas_v3_hw
> [ 105.377944] BUG: scheduling while atomic: kworker/113:1/811/0x00000201
> [ 105.384469] Modules linked in: bluetooth rfkill ib_isert
> iscsi_target_mod ib_ipoib ib_umad iptable_filter vfio_iommu_type1
> vfio_pci vfio_virqfd vfio rpcrdma ib_is er
> libiscsi scsi_transport_iscsi crct10dif_ce sbsa_gwdt hns_roce_hw_v2
> hisi_sec2 hisi_hpre hisi_zip hisi_qm uacce spi_hisi_sfc_v3xx
> hisi_trng_v2 rng_core hisi_uncore _hha_pmu
> hisi_uncore_ddrc_pmu hisi_uncore_l3c_pmu spi_dw_mmio hisi_uncore_pmu
> hns3 hclge hnae3 hisi_sas_v3_hw(-) hisi_sas_main libsas
> [ 105.424841] CPU: 113 PID: 811 Comm: kworker/113:1 Kdump: loaded
> Tainted: G W 5.12.0-rc1+ #1
> [ 105.434454] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS
> 2280-V2 CS V5.B143.01 04/22/2021
> [ 105.443287] Workqueue: rcu_gp srcu_invoke_callbacks
> [ 105.448154] Call trace:
> [ 105.450593] dump_backtrace+0x0/0x1a4
> [ 105.454245] show_stack+0x24/0x40
> [ 105.457548] dump_stack+0xc8/0x104
> [ 105.460939] __schedule_bug+0x68/0x80
> [ 105.464590] __schedule+0x73c/0x77c
> [ 105.465700] BUG: scheduling while atomic: kworker/96:1/791/0x00000201
> [ 105.468066] schedule+0x7c/0x110
> [ 105.468068] schedule_timeout+0x194/0x1d4
> [ 105.474490] Modules linked in:
> [ 105.477692] wait_for_completion+0x8c/0x12c
> [ 105.477695] rcu_barrier+0x1e0/0x2fc
> [ 105.477697] scsi_host_dev_release+0x50/0xf0
> [ 105.477701] device_release+0x40/0xa0
> [ 105.477704] kobject_put+0xac/0x100
> [ 105.477707] __device_link_free_srcu+0x50/0x74
> [ 105.477709] srcu_invoke_callbacks+0x108/0x1a4
> [ 105.484743] process_one_work+0x1dc/0x48c
> [ 105.492468] worker_thread+0x7c/0x464
> [ 105.492471] kthread+0x168/0x16c
> [ 105.492473] ret_from_fork+0x10/0x18
> ...
>
> After analyse the process, we find that it will
> device_del(&shost->gendev) in function scsi_remove_host() and then
>
> put_device(&shost->shost_gendev) in function scsi_host_put() when
> removing the driver, if there is a link between shost and hisi_hba->dev,
>
> it will try to delete the link in device_del(), and also will
> call_srcu(__device_link_free_srcu) to put_device() link->consumer and
> supplier.
>
> But if put device() for shost_gendev in device_link_free() is later
> than in scsi_host_put(), it will call scsi_host_dev_release() in
>
> srcu_invoke_callbacks() while it is atomic and there are scheduling in
> scsi_host_dev_release(),
>
> so it reports the BUG "scheduling while atomic:...".
>
> thread 1 thread2
> hisi_sas_v3_remove
> ...
> sas_remove_host()
> ...
> scsi_remove_host()
> ...
> device_del(&shost->shost_gendev)
> ...
> device_link_purge()
> __device_link_del()
> device_unregister(&link->link_dev)
> devlink_dev_release
> call_srcu(__device_link_free_srcu) ----------->
> srcu_invoke_callbacks (atomic)
> __device_link_free_srcu
> ...
> scsi_host_put()
> put_device(&shost->shost_gendev) (ref = 1)
> device_link_free()
> put_device(link->consumer)
> //shost->gendev ref = 0
> ...
> scsi_host_dev_release
> ...
> rcu_barrier
> kthread_stop()

Re-edit the non-aligned flowchart
thread 1 thread 2
hisi_sas_v3_remove()
...
sas_remove_host()
...
device_del(&shost->shost_gendev)
...
device_link_purge()
__device_link_del()
device_unregister(&link->link_dev)
devlink_dev_release
call_srcu(__device_link_free_srcu) ----------->
srcu_invoke_callbacks (atomic)
__device_link_free_srcu()
...
scsi_host_put()
put_device(&shost->shost_gendev) (ref = 1)
device_link_free()
put_device(link->consumer)
//shost->gendev ref = 0
...
scsi_host_dev_release()
...
rcu_barrier()
kthread_stop()

>
>
> We can check kref of shost->shost_gendev to make sure scsi_host_put()
> to release scsi host device in LLDD driver to avoid the issue,
>
> but it seems be a common issue: function __device_link_free_srcu
> calls put_device() for consumer and supplier,
>
> but if it's ref =0 at that time and there are scheduling or sleep in
> dev_release, it may have the issue.
>
> Do you have any idea about the issue?
>
>
> Best regards,
>
> Xiang Chen
>
>
>
> .
>


2021-05-11 14:43:22

by Wysocki, Rafael J

[permalink] [raw]
Subject: Re: Qestion about device link

On 5/11/2021 5:59 AM, chenxiang (M) wrote:
> Hi Rafael and other guys,
>
> I am trying to add a device link between scsi_host->shost_gendev and
> hisi_hba->dev to support runtime PM for hisi_hba driver
>
> (as it supports runtime PM for scsi host in some scenarios such as
> error handler etc, we can avoid to do them again if adding a
>
> device link between scsi_host->shost_gendev and hisi_hba->dev) as
> follows (hisi_sas driver is under directory drivers/scsi/hisi_sas):
>
> device_link_add(&shost->shost_gendev, hisi_hba->dev,
> DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE)
>
> We have a full test on it, and it works well except when rmmod the
> driver, some call trace occurs as follows:
>
> [root@localhost ~]# rmmod hisi_sas_v3_hw
> [  105.377944] BUG: scheduling while atomic: kworker/113:1/811/0x00000201
> [  105.384469] Modules linked in: bluetooth rfkill ib_isert
> iscsi_target_mod ib_ipoib ib_umad iptable_filter vfio_iommu_type1
> vfio_pci vfio_virqfd vfio rpcrdma ib_is                         er
> libiscsi scsi_transport_iscsi crct10dif_ce sbsa_gwdt hns_roce_hw_v2
> hisi_sec2 hisi_hpre hisi_zip hisi_qm uacce spi_hisi_sfc_v3xx
> hisi_trng_v2 rng_core hisi_uncore                         _hha_pmu
> hisi_uncore_ddrc_pmu hisi_uncore_l3c_pmu spi_dw_mmio hisi_uncore_pmu
> hns3 hclge hnae3 hisi_sas_v3_hw(-) hisi_sas_main libsas
> [  105.424841] CPU: 113 PID: 811 Comm: kworker/113:1 Kdump: loaded
> Tainted: G        W         5.12.0-rc1+ #1
> [  105.434454] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS
> 2280-V2 CS V5.B143.01 04/22/2021
> [  105.443287] Workqueue: rcu_gp srcu_invoke_callbacks
> [  105.448154] Call trace:
> [  105.450593]  dump_backtrace+0x0/0x1a4
> [  105.454245]  show_stack+0x24/0x40
> [  105.457548]  dump_stack+0xc8/0x104
> [  105.460939]  __schedule_bug+0x68/0x80
> [  105.464590]  __schedule+0x73c/0x77c
> [  105.465700] BUG: scheduling while atomic: kworker/96:1/791/0x00000201
> [  105.468066]  schedule+0x7c/0x110
> [  105.468068]  schedule_timeout+0x194/0x1d4
> [  105.474490] Modules linked in:
> [  105.477692]  wait_for_completion+0x8c/0x12c
> [  105.477695]  rcu_barrier+0x1e0/0x2fc
> [  105.477697]  scsi_host_dev_release+0x50/0xf0
> [  105.477701]  device_release+0x40/0xa0
> [  105.477704]  kobject_put+0xac/0x100
> [  105.477707]  __device_link_free_srcu+0x50/0x74
> [  105.477709]  srcu_invoke_callbacks+0x108/0x1a4
> [  105.484743]  process_one_work+0x1dc/0x48c
> [  105.492468]  worker_thread+0x7c/0x464
> [  105.492471]  kthread+0x168/0x16c
> [  105.492473]  ret_from_fork+0x10/0x18
> ...
>
> After analyse the process, we find that it will
> device_del(&shost->gendev) in function scsi_remove_host() and then
>
> put_device(&shost->shost_gendev) in function scsi_host_put() when
> removing the driver, if there is a link between shost and hisi_hba->dev,
>
> it will try to delete the link in device_del(), and also will
> call_srcu(__device_link_free_srcu) to put_device() link->consumer and
> supplier.
>
> But if put device() for shost_gendev in device_link_free() is later
> than in scsi_host_put(), it will call scsi_host_dev_release() in
>
> srcu_invoke_callbacks() while it is atomic and there are scheduling in
> scsi_host_dev_release(),
>
> so it reports the BUG "scheduling while atomic:...".
>
> thread 1                                                   thread2
> hisi_sas_v3_remove
>     ...
>     sas_remove_host()
>         ...
>         scsi_remove_host()
>             ...
>             device_del(&shost->shost_gendev)
>                 ...
>                 device_link_purge()
>                     __device_link_del()
>                         device_unregister(&link->link_dev)
>                             devlink_dev_release
> call_srcu(__device_link_free_srcu)    ----------->
> srcu_invoke_callbacks  (atomic)
>         __device_link_free_srcu
>     ...
>     scsi_host_put()
>         put_device(&shost->shost_gendev) (ref = 1)
>                 device_link_free()
>                               put_device(link->consumer)
> //shost->gendev ref = 0
>                                           ...
>                                           scsi_host_dev_release
>                                                       ...
> rcu_barrier
> kthread_stop()
>
>
> We can check kref of shost->shost_gendev to make sure scsi_host_put()
> to release scsi host device in LLDD driver to avoid the issue,
>
> but it seems be a common issue:  function __device_link_free_srcu
> calls put_device() for consumer and supplier,
>
> but if it's ref =0 at that time and there are scheduling or sleep in
> dev_release, it may have the issue.
>
> Do you have any idea about the issue?
>
Yes, this is a general issue.

If I'm not mistaken, it can be addressed by further deferring the
device_link_free() invocation through a workqueue.

Let me cut a patch doing this.


2021-05-11 18:28:15

by Saravana Kannan

[permalink] [raw]
Subject: Re: Question about device link//Re: Qestion about device link

On Tue, May 11, 2021 at 3:42 AM chenxiang (M) <[email protected]> wrote:
>
> Re-edit the non-aligned flowchart and add CC to Greg-KH and Saravanna.
>
>
> 在 2021/5/11 11:59, chenxiang (M) 写道:
> > Hi Rafael and other guys,
> >
> > I am trying to add a device link between scsi_host->shost_gendev and
> > hisi_hba->dev to support runtime PM for hisi_hba driver
> >
> > (as it supports runtime PM for scsi host in some scenarios such as
> > error handler etc, we can avoid to do them again if adding a
> >
> > device link between scsi_host->shost_gendev and hisi_hba->dev) as
> > follows (hisi_sas driver is under directory drivers/scsi/hisi_sas):
> >
> > device_link_add(&shost->shost_gendev, hisi_hba->dev,
> > DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE)
> >
> > We have a full test on it, and it works well except when rmmod the
> > driver, some call trace occurs as follows:
> >
> > [root@localhost ~]# rmmod hisi_sas_v3_hw
> > [ 105.377944] BUG: scheduling while atomic: kworker/113:1/811/0x00000201
> > [ 105.384469] Modules linked in: bluetooth rfkill ib_isert
> > iscsi_target_mod ib_ipoib ib_umad iptable_filter vfio_iommu_type1
> > vfio_pci vfio_virqfd vfio rpcrdma ib_is er
> > libiscsi scsi_transport_iscsi crct10dif_ce sbsa_gwdt hns_roce_hw_v2
> > hisi_sec2 hisi_hpre hisi_zip hisi_qm uacce spi_hisi_sfc_v3xx
> > hisi_trng_v2 rng_core hisi_uncore _hha_pmu
> > hisi_uncore_ddrc_pmu hisi_uncore_l3c_pmu spi_dw_mmio hisi_uncore_pmu
> > hns3 hclge hnae3 hisi_sas_v3_hw(-) hisi_sas_main libsas
> > [ 105.424841] CPU: 113 PID: 811 Comm: kworker/113:1 Kdump: loaded
> > Tainted: G W 5.12.0-rc1+ #1
> > [ 105.434454] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS
> > 2280-V2 CS V5.B143.01 04/22/2021
> > [ 105.443287] Workqueue: rcu_gp srcu_invoke_callbacks
> > [ 105.448154] Call trace:
> > [ 105.450593] dump_backtrace+0x0/0x1a4
> > [ 105.454245] show_stack+0x24/0x40
> > [ 105.457548] dump_stack+0xc8/0x104
> > [ 105.460939] __schedule_bug+0x68/0x80
> > [ 105.464590] __schedule+0x73c/0x77c
> > [ 105.465700] BUG: scheduling while atomic: kworker/96:1/791/0x00000201
> > [ 105.468066] schedule+0x7c/0x110
> > [ 105.468068] schedule_timeout+0x194/0x1d4
> > [ 105.474490] Modules linked in:
> > [ 105.477692] wait_for_completion+0x8c/0x12c
> > [ 105.477695] rcu_barrier+0x1e0/0x2fc
> > [ 105.477697] scsi_host_dev_release+0x50/0xf0
> > [ 105.477701] device_release+0x40/0xa0
> > [ 105.477704] kobject_put+0xac/0x100
> > [ 105.477707] __device_link_free_srcu+0x50/0x74
> > [ 105.477709] srcu_invoke_callbacks+0x108/0x1a4
> > [ 105.484743] process_one_work+0x1dc/0x48c
> > [ 105.492468] worker_thread+0x7c/0x464
> > [ 105.492471] kthread+0x168/0x16c
> > [ 105.492473] ret_from_fork+0x10/0x18
> > ...
> >
> > After analyse the process, we find that it will
> > device_del(&shost->gendev) in function scsi_remove_host() and then
> >
> > put_device(&shost->shost_gendev) in function scsi_host_put() when
> > removing the driver, if there is a link between shost and hisi_hba->dev,
> >
> > it will try to delete the link in device_del(), and also will
> > call_srcu(__device_link_free_srcu) to put_device() link->consumer and
> > supplier.
> >
> > But if put device() for shost_gendev in device_link_free() is later
> > than in scsi_host_put(), it will call scsi_host_dev_release() in
> >
> > srcu_invoke_callbacks() while it is atomic and there are scheduling in
> > scsi_host_dev_release(),
> >
> > so it reports the BUG "scheduling while atomic:...".
> >
> > thread 1 thread2
> > hisi_sas_v3_remove
> > ...
> > sas_remove_host()
> > ...
> > scsi_remove_host()
> > ...
> > device_del(&shost->shost_gendev)
> > ...
> > device_link_purge()
> > __device_link_del()
> > device_unregister(&link->link_dev)
> > devlink_dev_release
> > call_srcu(__device_link_free_srcu) ----------->
> > srcu_invoke_callbacks (atomic)
> > __device_link_free_srcu
> > ...
> > scsi_host_put()
> > put_device(&shost->shost_gendev) (ref = 1)
> > device_link_free()
> > put_device(link->consumer)
> > //shost->gendev ref = 0
> > ...
> > scsi_host_dev_release
> > ...
> > rcu_barrier
> > kthread_stop()
>
> Re-edit the non-aligned flowchart
> thread 1 thread 2
> hisi_sas_v3_remove()
> ...
> sas_remove_host()
> ...
> device_del(&shost->shost_gendev)
> ...
> device_link_purge()
> __device_link_del()
> device_unregister(&link->link_dev)
> devlink_dev_release
> call_srcu(__device_link_free_srcu) ----------->
> srcu_invoke_callbacks (atomic)
> __device_link_free_srcu()
> ...
> scsi_host_put()
> put_device(&shost->shost_gendev) (ref = 1)
> device_link_free()
> put_device(link->consumer)
> //shost->gendev ref = 0
> ...
> scsi_host_dev_release()
> ...
> rcu_barrier()
> kthread_stop()
>
> >
> >
> > We can check kref of shost->shost_gendev to make sure scsi_host_put()
> > to release scsi host device in LLDD driver to avoid the issue,
> >
> > but it seems be a common issue: function __device_link_free_srcu
> > calls put_device() for consumer and supplier,
> >
> > but if it's ref =0 at that time and there are scheduling or sleep in
> > dev_release, it may have the issue.
> >
> > Do you have any idea about the issue?

Another report for the same issue.
https://lore.kernel.org/lkml/CAGETcx80xSZ8d4JbZqiSz4L0VNtL+HCnFCS2u3F9aNC0QQoQjg@mail.gmail.com/

I don't have enough context yet about the need for SRCU (I haven't
read up all the runtime PM code), but this is a real issue that needs
to be solved.

Dirty/terrible hack is to kick off another work to do the
put_device(). But is there any SRCU option that'll try to do the
release in a non-atomic context?

-Saravana

2021-05-11 19:16:10

by Wysocki, Rafael J

[permalink] [raw]
Subject: Re: Question about device link//Re: Qestion about device link

On 5/11/2021 8:23 PM, Saravana Kannan wrote:
> On Tue, May 11, 2021 at 3:42 AM chenxiang (M) <[email protected]> wrote:
>> Re-edit the non-aligned flowchart and add CC to Greg-KH and Saravanna.
>>
>>
>> 在 2021/5/11 11:59, chenxiang (M) 写道:
>>> Hi Rafael and other guys,
>>>
>>> I am trying to add a device link between scsi_host->shost_gendev and
>>> hisi_hba->dev to support runtime PM for hisi_hba driver
>>>
>>> (as it supports runtime PM for scsi host in some scenarios such as
>>> error handler etc, we can avoid to do them again if adding a
>>>
>>> device link between scsi_host->shost_gendev and hisi_hba->dev) as
>>> follows (hisi_sas driver is under directory drivers/scsi/hisi_sas):
>>>
>>> device_link_add(&shost->shost_gendev, hisi_hba->dev,
>>> DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE)
>>>
>>> We have a full test on it, and it works well except when rmmod the
>>> driver, some call trace occurs as follows:
>>>
>>> [root@localhost ~]# rmmod hisi_sas_v3_hw
>>> [ 105.377944] BUG: scheduling while atomic: kworker/113:1/811/0x00000201
>>> [ 105.384469] Modules linked in: bluetooth rfkill ib_isert
>>> iscsi_target_mod ib_ipoib ib_umad iptable_filter vfio_iommu_type1
>>> vfio_pci vfio_virqfd vfio rpcrdma ib_is er
>>> libiscsi scsi_transport_iscsi crct10dif_ce sbsa_gwdt hns_roce_hw_v2
>>> hisi_sec2 hisi_hpre hisi_zip hisi_qm uacce spi_hisi_sfc_v3xx
>>> hisi_trng_v2 rng_core hisi_uncore _hha_pmu
>>> hisi_uncore_ddrc_pmu hisi_uncore_l3c_pmu spi_dw_mmio hisi_uncore_pmu
>>> hns3 hclge hnae3 hisi_sas_v3_hw(-) hisi_sas_main libsas
>>> [ 105.424841] CPU: 113 PID: 811 Comm: kworker/113:1 Kdump: loaded
>>> Tainted: G W 5.12.0-rc1+ #1
>>> [ 105.434454] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS
>>> 2280-V2 CS V5.B143.01 04/22/2021
>>> [ 105.443287] Workqueue: rcu_gp srcu_invoke_callbacks
>>> [ 105.448154] Call trace:
>>> [ 105.450593] dump_backtrace+0x0/0x1a4
>>> [ 105.454245] show_stack+0x24/0x40
>>> [ 105.457548] dump_stack+0xc8/0x104
>>> [ 105.460939] __schedule_bug+0x68/0x80
>>> [ 105.464590] __schedule+0x73c/0x77c
>>> [ 105.465700] BUG: scheduling while atomic: kworker/96:1/791/0x00000201
>>> [ 105.468066] schedule+0x7c/0x110
>>> [ 105.468068] schedule_timeout+0x194/0x1d4
>>> [ 105.474490] Modules linked in:
>>> [ 105.477692] wait_for_completion+0x8c/0x12c
>>> [ 105.477695] rcu_barrier+0x1e0/0x2fc
>>> [ 105.477697] scsi_host_dev_release+0x50/0xf0
>>> [ 105.477701] device_release+0x40/0xa0
>>> [ 105.477704] kobject_put+0xac/0x100
>>> [ 105.477707] __device_link_free_srcu+0x50/0x74
>>> [ 105.477709] srcu_invoke_callbacks+0x108/0x1a4
>>> [ 105.484743] process_one_work+0x1dc/0x48c
>>> [ 105.492468] worker_thread+0x7c/0x464
>>> [ 105.492471] kthread+0x168/0x16c
>>> [ 105.492473] ret_from_fork+0x10/0x18
>>> ...
>>>
>>> After analyse the process, we find that it will
>>> device_del(&shost->gendev) in function scsi_remove_host() and then
>>>
>>> put_device(&shost->shost_gendev) in function scsi_host_put() when
>>> removing the driver, if there is a link between shost and hisi_hba->dev,
>>>
>>> it will try to delete the link in device_del(), and also will
>>> call_srcu(__device_link_free_srcu) to put_device() link->consumer and
>>> supplier.
>>>
>>> But if put device() for shost_gendev in device_link_free() is later
>>> than in scsi_host_put(), it will call scsi_host_dev_release() in
>>>
>>> srcu_invoke_callbacks() while it is atomic and there are scheduling in
>>> scsi_host_dev_release(),
>>>
>>> so it reports the BUG "scheduling while atomic:...".
>>>
>>> thread 1 thread2
>>> hisi_sas_v3_remove
>>> ...
>>> sas_remove_host()
>>> ...
>>> scsi_remove_host()
>>> ...
>>> device_del(&shost->shost_gendev)
>>> ...
>>> device_link_purge()
>>> __device_link_del()
>>> device_unregister(&link->link_dev)
>>> devlink_dev_release
>>> call_srcu(__device_link_free_srcu) ----------->
>>> srcu_invoke_callbacks (atomic)
>>> __device_link_free_srcu
>>> ...
>>> scsi_host_put()
>>> put_device(&shost->shost_gendev) (ref = 1)
>>> device_link_free()
>>> put_device(link->consumer)
>>> //shost->gendev ref = 0
>>> ...
>>> scsi_host_dev_release
>>> ...
>>> rcu_barrier
>>> kthread_stop()
>> Re-edit the non-aligned flowchart
>> thread 1 thread 2
>> hisi_sas_v3_remove()
>> ...
>> sas_remove_host()
>> ...
>> device_del(&shost->shost_gendev)
>> ...
>> device_link_purge()
>> __device_link_del()
>> device_unregister(&link->link_dev)
>> devlink_dev_release
>> call_srcu(__device_link_free_srcu) ----------->
>> srcu_invoke_callbacks (atomic)
>> __device_link_free_srcu()
>> ...
>> scsi_host_put()
>> put_device(&shost->shost_gendev) (ref = 1)
>> device_link_free()
>> put_device(link->consumer)
>> //shost->gendev ref = 0
>> ...
>> scsi_host_dev_release()
>> ...
>> rcu_barrier()
>> kthread_stop()
>>
>>>
>>> We can check kref of shost->shost_gendev to make sure scsi_host_put()
>>> to release scsi host device in LLDD driver to avoid the issue,
>>>
>>> but it seems be a common issue: function __device_link_free_srcu
>>> calls put_device() for consumer and supplier,
>>>
>>> but if it's ref =0 at that time and there are scheduling or sleep in
>>> dev_release, it may have the issue.
>>>
>>> Do you have any idea about the issue?
> Another report for the same issue.
> https://lore.kernel.org/lkml/CAGETcx80xSZ8d4JbZqiSz4L0VNtL+HCnFCS2u3F9aNC0QQoQjg@mail.gmail.com/
>
> I don't have enough context yet about the need for SRCU (I haven't
> read up all the runtime PM code), but this is a real issue that needs
> to be solved.
>
> Dirty/terrible hack is to kick off another work to do the
> put_device().

I wouldn't call it dirty or terrible, but it may just be the thing that
needs to be done here.


> But is there any SRCU option that'll try to do the
> release in a non-atomic context?

No, the callbacks are run from a softirq if I'm not mistaken.

Thanks!


2021-05-11 19:18:50

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: Qestion about device link

On Tuesday, May 11, 2021 4:39:31 PM CEST Rafael J. Wysocki wrote:
> On 5/11/2021 5:59 AM, chenxiang (M) wrote:
> > Hi Rafael and other guys,
> >
> > I am trying to add a device link between scsi_host->shost_gendev and
> > hisi_hba->dev to support runtime PM for hisi_hba driver
> >
> > (as it supports runtime PM for scsi host in some scenarios such as
> > error handler etc, we can avoid to do them again if adding a
> >
> > device link between scsi_host->shost_gendev and hisi_hba->dev) as
> > follows (hisi_sas driver is under directory drivers/scsi/hisi_sas):
> >
> > device_link_add(&shost->shost_gendev, hisi_hba->dev,
> > DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE)
> >
> > We have a full test on it, and it works well except when rmmod the
> > driver, some call trace occurs as follows:
> >
> > [root@localhost ~]# rmmod hisi_sas_v3_hw
> > [ 105.377944] BUG: scheduling while atomic: kworker/113:1/811/0x00000201
> > [ 105.384469] Modules linked in: bluetooth rfkill ib_isert
> > iscsi_target_mod ib_ipoib ib_umad iptable_filter vfio_iommu_type1
> > vfio_pci vfio_virqfd vfio rpcrdma ib_is er
> > libiscsi scsi_transport_iscsi crct10dif_ce sbsa_gwdt hns_roce_hw_v2
> > hisi_sec2 hisi_hpre hisi_zip hisi_qm uacce spi_hisi_sfc_v3xx
> > hisi_trng_v2 rng_core hisi_uncore _hha_pmu
> > hisi_uncore_ddrc_pmu hisi_uncore_l3c_pmu spi_dw_mmio hisi_uncore_pmu
> > hns3 hclge hnae3 hisi_sas_v3_hw(-) hisi_sas_main libsas
> > [ 105.424841] CPU: 113 PID: 811 Comm: kworker/113:1 Kdump: loaded
> > Tainted: G W 5.12.0-rc1+ #1
> > [ 105.434454] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS
> > 2280-V2 CS V5.B143.01 04/22/2021
> > [ 105.443287] Workqueue: rcu_gp srcu_invoke_callbacks
> > [ 105.448154] Call trace:
> > [ 105.450593] dump_backtrace+0x0/0x1a4
> > [ 105.454245] show_stack+0x24/0x40
> > [ 105.457548] dump_stack+0xc8/0x104
> > [ 105.460939] __schedule_bug+0x68/0x80
> > [ 105.464590] __schedule+0x73c/0x77c
> > [ 105.465700] BUG: scheduling while atomic: kworker/96:1/791/0x00000201
> > [ 105.468066] schedule+0x7c/0x110
> > [ 105.468068] schedule_timeout+0x194/0x1d4
> > [ 105.474490] Modules linked in:
> > [ 105.477692] wait_for_completion+0x8c/0x12c
> > [ 105.477695] rcu_barrier+0x1e0/0x2fc
> > [ 105.477697] scsi_host_dev_release+0x50/0xf0
> > [ 105.477701] device_release+0x40/0xa0
> > [ 105.477704] kobject_put+0xac/0x100
> > [ 105.477707] __device_link_free_srcu+0x50/0x74
> > [ 105.477709] srcu_invoke_callbacks+0x108/0x1a4
> > [ 105.484743] process_one_work+0x1dc/0x48c
> > [ 105.492468] worker_thread+0x7c/0x464
> > [ 105.492471] kthread+0x168/0x16c
> > [ 105.492473] ret_from_fork+0x10/0x18
> > ...
> >
> > After analyse the process, we find that it will
> > device_del(&shost->gendev) in function scsi_remove_host() and then
> >
> > put_device(&shost->shost_gendev) in function scsi_host_put() when
> > removing the driver, if there is a link between shost and hisi_hba->dev,
> >
> > it will try to delete the link in device_del(), and also will
> > call_srcu(__device_link_free_srcu) to put_device() link->consumer and
> > supplier.
> >
> > But if put device() for shost_gendev in device_link_free() is later
> > than in scsi_host_put(), it will call scsi_host_dev_release() in
> >
> > srcu_invoke_callbacks() while it is atomic and there are scheduling in
> > scsi_host_dev_release(),
> >
> > so it reports the BUG "scheduling while atomic:...".
> >
> > thread 1 thread2
> > hisi_sas_v3_remove
> > ...
> > sas_remove_host()
> > ...
> > scsi_remove_host()
> > ...
> > device_del(&shost->shost_gendev)
> > ...
> > device_link_purge()
> > __device_link_del()
> > device_unregister(&link->link_dev)
> > devlink_dev_release
> > call_srcu(__device_link_free_srcu) ----------->
> > srcu_invoke_callbacks (atomic)
> > __device_link_free_srcu
> > ...
> > scsi_host_put()
> > put_device(&shost->shost_gendev) (ref = 1)
> > device_link_free()
> > put_device(link->consumer)
> > //shost->gendev ref = 0
> > ...
> > scsi_host_dev_release
> > ...
> > rcu_barrier
> > kthread_stop()
> >
> >
> > We can check kref of shost->shost_gendev to make sure scsi_host_put()
> > to release scsi host device in LLDD driver to avoid the issue,
> >
> > but it seems be a common issue: function __device_link_free_srcu
> > calls put_device() for consumer and supplier,
> >
> > but if it's ref =0 at that time and there are scheduling or sleep in
> > dev_release, it may have the issue.
> >
> > Do you have any idea about the issue?
> >
> Yes, this is a general issue.
>
> If I'm not mistaken, it can be addressed by further deferring the
> device_link_free() invocation through a workqueue.
>
> Let me cut a patch doing this.

Please test the patch below and let me know if it works for you.

---
drivers/base/core.c | 18 ++++++++++++++++--
include/linux/device.h | 5 ++++-
2 files changed, 20 insertions(+), 3 deletions(-)

Index: linux-pm/drivers/base/core.c
===================================================================
--- linux-pm.orig/drivers/base/core.c
+++ linux-pm/drivers/base/core.c
@@ -455,16 +455,30 @@ static void device_link_free(struct devi
}

#ifdef CONFIG_SRCU
+static void __device_link_free_fn(struct work_struct *work)
+{
+ device_link_free(container_of(work, struct device_link, srcu.work));
+}
+
static void __device_link_free_srcu(struct rcu_head *rhead)
{
- device_link_free(container_of(rhead, struct device_link, rcu_head));
+ struct device_link *link = container_of(rhead, struct device_link,
+ srcu.rhead);
+ struct work_struct *work = &link->srcu.work;
+
+ /*
+ * Because device_link_free() may sleep in some cases, schedule the
+ * execution of it instead of invoking it directly.
+ */
+ INIT_WORK(work, __device_link_free_fn);
+ schedule_work(work);
}

static void devlink_dev_release(struct device *dev)
{
struct device_link *link = to_devlink(dev);

- call_srcu(&device_links_srcu, &link->rcu_head, __device_link_free_srcu);
+ call_srcu(&device_links_srcu, &link->srcu.rhead, __device_link_free_srcu);
}
#else
static void devlink_dev_release(struct device *dev)
Index: linux-pm/include/linux/device.h
===================================================================
--- linux-pm.orig/include/linux/device.h
+++ linux-pm/include/linux/device.h
@@ -584,7 +584,10 @@ struct device_link {
refcount_t rpm_active;
struct kref kref;
#ifdef CONFIG_SRCU
- struct rcu_head rcu_head;
+ union {
+ struct rcu_head rhead;
+ struct work_struct work;
+ } srcu;
#endif
bool supplier_preactivated; /* Owned by consumer probe. */
};



2021-05-11 19:27:34

by Saravana Kannan

[permalink] [raw]
Subject: Re: Qestion about device link

On Tue, May 11, 2021 at 12:16 PM Rafael J. Wysocki <[email protected]> wrote:
>
> On Tuesday, May 11, 2021 4:39:31 PM CEST Rafael J. Wysocki wrote:
> > On 5/11/2021 5:59 AM, chenxiang (M) wrote:
> > > Hi Rafael and other guys,
> > >
> > > I am trying to add a device link between scsi_host->shost_gendev and
> > > hisi_hba->dev to support runtime PM for hisi_hba driver
> > >
> > > (as it supports runtime PM for scsi host in some scenarios such as
> > > error handler etc, we can avoid to do them again if adding a
> > >
> > > device link between scsi_host->shost_gendev and hisi_hba->dev) as
> > > follows (hisi_sas driver is under directory drivers/scsi/hisi_sas):
> > >
> > > device_link_add(&shost->shost_gendev, hisi_hba->dev,
> > > DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE)
> > >
> > > We have a full test on it, and it works well except when rmmod the
> > > driver, some call trace occurs as follows:
> > >
> > > [root@localhost ~]# rmmod hisi_sas_v3_hw
> > > [ 105.377944] BUG: scheduling while atomic: kworker/113:1/811/0x00000201
> > > [ 105.384469] Modules linked in: bluetooth rfkill ib_isert
> > > iscsi_target_mod ib_ipoib ib_umad iptable_filter vfio_iommu_type1
> > > vfio_pci vfio_virqfd vfio rpcrdma ib_is er
> > > libiscsi scsi_transport_iscsi crct10dif_ce sbsa_gwdt hns_roce_hw_v2
> > > hisi_sec2 hisi_hpre hisi_zip hisi_qm uacce spi_hisi_sfc_v3xx
> > > hisi_trng_v2 rng_core hisi_uncore _hha_pmu
> > > hisi_uncore_ddrc_pmu hisi_uncore_l3c_pmu spi_dw_mmio hisi_uncore_pmu
> > > hns3 hclge hnae3 hisi_sas_v3_hw(-) hisi_sas_main libsas
> > > [ 105.424841] CPU: 113 PID: 811 Comm: kworker/113:1 Kdump: loaded
> > > Tainted: G W 5.12.0-rc1+ #1
> > > [ 105.434454] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS
> > > 2280-V2 CS V5.B143.01 04/22/2021
> > > [ 105.443287] Workqueue: rcu_gp srcu_invoke_callbacks
> > > [ 105.448154] Call trace:
> > > [ 105.450593] dump_backtrace+0x0/0x1a4
> > > [ 105.454245] show_stack+0x24/0x40
> > > [ 105.457548] dump_stack+0xc8/0x104
> > > [ 105.460939] __schedule_bug+0x68/0x80
> > > [ 105.464590] __schedule+0x73c/0x77c
> > > [ 105.465700] BUG: scheduling while atomic: kworker/96:1/791/0x00000201
> > > [ 105.468066] schedule+0x7c/0x110
> > > [ 105.468068] schedule_timeout+0x194/0x1d4
> > > [ 105.474490] Modules linked in:
> > > [ 105.477692] wait_for_completion+0x8c/0x12c
> > > [ 105.477695] rcu_barrier+0x1e0/0x2fc
> > > [ 105.477697] scsi_host_dev_release+0x50/0xf0
> > > [ 105.477701] device_release+0x40/0xa0
> > > [ 105.477704] kobject_put+0xac/0x100
> > > [ 105.477707] __device_link_free_srcu+0x50/0x74
> > > [ 105.477709] srcu_invoke_callbacks+0x108/0x1a4
> > > [ 105.484743] process_one_work+0x1dc/0x48c
> > > [ 105.492468] worker_thread+0x7c/0x464
> > > [ 105.492471] kthread+0x168/0x16c
> > > [ 105.492473] ret_from_fork+0x10/0x18
> > > ...
> > >
> > > After analyse the process, we find that it will
> > > device_del(&shost->gendev) in function scsi_remove_host() and then
> > >
> > > put_device(&shost->shost_gendev) in function scsi_host_put() when
> > > removing the driver, if there is a link between shost and hisi_hba->dev,
> > >
> > > it will try to delete the link in device_del(), and also will
> > > call_srcu(__device_link_free_srcu) to put_device() link->consumer and
> > > supplier.
> > >
> > > But if put device() for shost_gendev in device_link_free() is later
> > > than in scsi_host_put(), it will call scsi_host_dev_release() in
> > >
> > > srcu_invoke_callbacks() while it is atomic and there are scheduling in
> > > scsi_host_dev_release(),
> > >
> > > so it reports the BUG "scheduling while atomic:...".
> > >
> > > thread 1 thread2
> > > hisi_sas_v3_remove
> > > ...
> > > sas_remove_host()
> > > ...
> > > scsi_remove_host()
> > > ...
> > > device_del(&shost->shost_gendev)
> > > ...
> > > device_link_purge()
> > > __device_link_del()
> > > device_unregister(&link->link_dev)
> > > devlink_dev_release
> > > call_srcu(__device_link_free_srcu) ----------->
> > > srcu_invoke_callbacks (atomic)
> > > __device_link_free_srcu
> > > ...
> > > scsi_host_put()
> > > put_device(&shost->shost_gendev) (ref = 1)
> > > device_link_free()
> > > put_device(link->consumer)
> > > //shost->gendev ref = 0
> > > ...
> > > scsi_host_dev_release
> > > ...
> > > rcu_barrier
> > > kthread_stop()
> > >
> > >
> > > We can check kref of shost->shost_gendev to make sure scsi_host_put()
> > > to release scsi host device in LLDD driver to avoid the issue,
> > >
> > > but it seems be a common issue: function __device_link_free_srcu
> > > calls put_device() for consumer and supplier,
> > >
> > > but if it's ref =0 at that time and there are scheduling or sleep in
> > > dev_release, it may have the issue.
> > >
> > > Do you have any idea about the issue?
> > >
> > Yes, this is a general issue.
> >
> > If I'm not mistaken, it can be addressed by further deferring the
> > device_link_free() invocation through a workqueue.
> >
> > Let me cut a patch doing this.
>
> Please test the patch below and let me know if it works for you.
>
> ---
> drivers/base/core.c | 18 ++++++++++++++++--
> include/linux/device.h | 5 ++++-
> 2 files changed, 20 insertions(+), 3 deletions(-)
>
> Index: linux-pm/drivers/base/core.c
> ===================================================================
> --- linux-pm.orig/drivers/base/core.c
> +++ linux-pm/drivers/base/core.c
> @@ -455,16 +455,30 @@ static void device_link_free(struct devi
> }
>
> #ifdef CONFIG_SRCU
> +static void __device_link_free_fn(struct work_struct *work)
> +{
> + device_link_free(container_of(work, struct device_link, srcu.work));
> +}
> +
> static void __device_link_free_srcu(struct rcu_head *rhead)
> {
> - device_link_free(container_of(rhead, struct device_link, rcu_head));
> + struct device_link *link = container_of(rhead, struct device_link,
> + srcu.rhead);
> + struct work_struct *work = &link->srcu.work;
> +
> + /*
> + * Because device_link_free() may sleep in some cases, schedule the
> + * execution of it instead of invoking it directly.
> + */
> + INIT_WORK(work, __device_link_free_fn);
> + schedule_work(work);
> }
>
> static void devlink_dev_release(struct device *dev)
> {
> struct device_link *link = to_devlink(dev);
>
> - call_srcu(&device_links_srcu, &link->rcu_head, __device_link_free_srcu);
> + call_srcu(&device_links_srcu, &link->srcu.rhead, __device_link_free_srcu);
> }
> #else
> static void devlink_dev_release(struct device *dev)
> Index: linux-pm/include/linux/device.h
> ===================================================================
> --- linux-pm.orig/include/linux/device.h
> +++ linux-pm/include/linux/device.h
> @@ -584,7 +584,10 @@ struct device_link {
> refcount_t rpm_active;
> struct kref kref;
> #ifdef CONFIG_SRCU
> - struct rcu_head rcu_head;
> + union {
> + struct rcu_head rhead;
> + struct work_struct work;
> + } srcu;

I'll do the actual review on a more final patch? I see you are trying
to save space, but is this worth the readability reduction?

-Saravana

2021-05-11 19:44:15

by Saravana Kannan

[permalink] [raw]
Subject: Re: Question about device link//Re: Qestion about device link

On Tue, May 11, 2021 at 12:13 PM Rafael J. Wysocki
<[email protected]> wrote:
>
> On 5/11/2021 8:23 PM, Saravana Kannan wrote:
> > On Tue, May 11, 2021 at 3:42 AM chenxiang (M) <[email protected]> wrote:
> >> Re-edit the non-aligned flowchart and add CC to Greg-KH and Saravanna.
> >>
> >>
> >> 在 2021/5/11 11:59, chenxiang (M) 写道:
> >>> Hi Rafael and other guys,
> >>>
> >>> I am trying to add a device link between scsi_host->shost_gendev and
> >>> hisi_hba->dev to support runtime PM for hisi_hba driver
> >>>
> >>> (as it supports runtime PM for scsi host in some scenarios such as
> >>> error handler etc, we can avoid to do them again if adding a
> >>>
> >>> device link between scsi_host->shost_gendev and hisi_hba->dev) as
> >>> follows (hisi_sas driver is under directory drivers/scsi/hisi_sas):
> >>>
> >>> device_link_add(&shost->shost_gendev, hisi_hba->dev,
> >>> DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE)
> >>>
> >>> We have a full test on it, and it works well except when rmmod the
> >>> driver, some call trace occurs as follows:
> >>>
> >>> [root@localhost ~]# rmmod hisi_sas_v3_hw
> >>> [ 105.377944] BUG: scheduling while atomic: kworker/113:1/811/0x00000201
> >>> [ 105.384469] Modules linked in: bluetooth rfkill ib_isert
> >>> iscsi_target_mod ib_ipoib ib_umad iptable_filter vfio_iommu_type1
> >>> vfio_pci vfio_virqfd vfio rpcrdma ib_is er
> >>> libiscsi scsi_transport_iscsi crct10dif_ce sbsa_gwdt hns_roce_hw_v2
> >>> hisi_sec2 hisi_hpre hisi_zip hisi_qm uacce spi_hisi_sfc_v3xx
> >>> hisi_trng_v2 rng_core hisi_uncore _hha_pmu
> >>> hisi_uncore_ddrc_pmu hisi_uncore_l3c_pmu spi_dw_mmio hisi_uncore_pmu
> >>> hns3 hclge hnae3 hisi_sas_v3_hw(-) hisi_sas_main libsas
> >>> [ 105.424841] CPU: 113 PID: 811 Comm: kworker/113:1 Kdump: loaded
> >>> Tainted: G W 5.12.0-rc1+ #1
> >>> [ 105.434454] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS
> >>> 2280-V2 CS V5.B143.01 04/22/2021
> >>> [ 105.443287] Workqueue: rcu_gp srcu_invoke_callbacks
> >>> [ 105.448154] Call trace:
> >>> [ 105.450593] dump_backtrace+0x0/0x1a4
> >>> [ 105.454245] show_stack+0x24/0x40
> >>> [ 105.457548] dump_stack+0xc8/0x104
> >>> [ 105.460939] __schedule_bug+0x68/0x80
> >>> [ 105.464590] __schedule+0x73c/0x77c
> >>> [ 105.465700] BUG: scheduling while atomic: kworker/96:1/791/0x00000201
> >>> [ 105.468066] schedule+0x7c/0x110
> >>> [ 105.468068] schedule_timeout+0x194/0x1d4
> >>> [ 105.474490] Modules linked in:
> >>> [ 105.477692] wait_for_completion+0x8c/0x12c
> >>> [ 105.477695] rcu_barrier+0x1e0/0x2fc
> >>> [ 105.477697] scsi_host_dev_release+0x50/0xf0
> >>> [ 105.477701] device_release+0x40/0xa0
> >>> [ 105.477704] kobject_put+0xac/0x100
> >>> [ 105.477707] __device_link_free_srcu+0x50/0x74
> >>> [ 105.477709] srcu_invoke_callbacks+0x108/0x1a4
> >>> [ 105.484743] process_one_work+0x1dc/0x48c
> >>> [ 105.492468] worker_thread+0x7c/0x464
> >>> [ 105.492471] kthread+0x168/0x16c
> >>> [ 105.492473] ret_from_fork+0x10/0x18
> >>> ...
> >>>
> >>> After analyse the process, we find that it will
> >>> device_del(&shost->gendev) in function scsi_remove_host() and then
> >>>
> >>> put_device(&shost->shost_gendev) in function scsi_host_put() when
> >>> removing the driver, if there is a link between shost and hisi_hba->dev,
> >>>
> >>> it will try to delete the link in device_del(), and also will
> >>> call_srcu(__device_link_free_srcu) to put_device() link->consumer and
> >>> supplier.
> >>>
> >>> But if put device() for shost_gendev in device_link_free() is later
> >>> than in scsi_host_put(), it will call scsi_host_dev_release() in
> >>>
> >>> srcu_invoke_callbacks() while it is atomic and there are scheduling in
> >>> scsi_host_dev_release(),
> >>>
> >>> so it reports the BUG "scheduling while atomic:...".
> >>>
> >>> thread 1 thread2
> >>> hisi_sas_v3_remove
> >>> ...
> >>> sas_remove_host()
> >>> ...
> >>> scsi_remove_host()
> >>> ...
> >>> device_del(&shost->shost_gendev)
> >>> ...
> >>> device_link_purge()
> >>> __device_link_del()
> >>> device_unregister(&link->link_dev)
> >>> devlink_dev_release
> >>> call_srcu(__device_link_free_srcu) ----------->
> >>> srcu_invoke_callbacks (atomic)
> >>> __device_link_free_srcu
> >>> ...
> >>> scsi_host_put()
> >>> put_device(&shost->shost_gendev) (ref = 1)
> >>> device_link_free()
> >>> put_device(link->consumer)
> >>> //shost->gendev ref = 0
> >>> ...
> >>> scsi_host_dev_release
> >>> ...
> >>> rcu_barrier
> >>> kthread_stop()
> >> Re-edit the non-aligned flowchart
> >> thread 1 thread 2
> >> hisi_sas_v3_remove()
> >> ...
> >> sas_remove_host()
> >> ...
> >> device_del(&shost->shost_gendev)
> >> ...
> >> device_link_purge()
> >> __device_link_del()
> >> device_unregister(&link->link_dev)
> >> devlink_dev_release
> >> call_srcu(__device_link_free_srcu) ----------->
> >> srcu_invoke_callbacks (atomic)
> >> __device_link_free_srcu()
> >> ...
> >> scsi_host_put()
> >> put_device(&shost->shost_gendev) (ref = 1)
> >> device_link_free()
> >> put_device(link->consumer)
> >> //shost->gendev ref = 0
> >> ...
> >> scsi_host_dev_release()
> >> ...
> >> rcu_barrier()
> >> kthread_stop()
> >>
> >>>
> >>> We can check kref of shost->shost_gendev to make sure scsi_host_put()
> >>> to release scsi host device in LLDD driver to avoid the issue,
> >>>
> >>> but it seems be a common issue: function __device_link_free_srcu
> >>> calls put_device() for consumer and supplier,
> >>>
> >>> but if it's ref =0 at that time and there are scheduling or sleep in
> >>> dev_release, it may have the issue.
> >>>
> >>> Do you have any idea about the issue?
> > Another report for the same issue.
> > https://lore.kernel.org/lkml/CAGETcx80xSZ8d4JbZqiSz4L0VNtL+HCnFCS2u3F9aNC0QQoQjg@mail.gmail.com/
> >
> > I don't have enough context yet about the need for SRCU (I haven't
> > read up all the runtime PM code), but this is a real issue that needs
> > to be solved.
> >
> > Dirty/terrible hack is to kick off another work to do the
> > put_device().
>
> I wouldn't call it dirty or terrible, but it may just be the thing that
> needs to be done here.
>
>
> > But is there any SRCU option that'll try to do the
> > release in a non-atomic context?
>
> No, the callbacks are run from a softirq if I'm not mistaken.

Right, I meant that this seems like a common thing some SRCU callbacks
might want to do. So, I thought that there might be a flag or option
to kick off work for srcu callbacks. Also, the stack trace shows that
this is already running in a work context but the callback is wrapped
with local_bh_disable/enable() and that's the reason for this warning.
But I don't know enough about SRCU implementation to make a comment on
whether "run stuff in a work queue" can be a generic SRCU feature.

Anyway, if kicking off a new work is what you want to do, I'm not
going to oppose that.

-Saravana

2021-05-11 19:45:00

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: Qestion about device link

On Tue, May 11, 2021 at 9:24 PM Saravana Kannan <[email protected]> wrote:
>
> On Tue, May 11, 2021 at 12:16 PM Rafael J. Wysocki <[email protected]> wrote:
> >
> > On Tuesday, May 11, 2021 4:39:31 PM CEST Rafael J. Wysocki wrote:
> > > On 5/11/2021 5:59 AM, chenxiang (M) wrote:
> > > > Hi Rafael and other guys,
> > > >
> > > > I am trying to add a device link between scsi_host->shost_gendev and
> > > > hisi_hba->dev to support runtime PM for hisi_hba driver
> > > >
> > > > (as it supports runtime PM for scsi host in some scenarios such as
> > > > error handler etc, we can avoid to do them again if adding a
> > > >
> > > > device link between scsi_host->shost_gendev and hisi_hba->dev) as
> > > > follows (hisi_sas driver is under directory drivers/scsi/hisi_sas):
> > > >
> > > > device_link_add(&shost->shost_gendev, hisi_hba->dev,
> > > > DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE)
> > > >
> > > > We have a full test on it, and it works well except when rmmod the
> > > > driver, some call trace occurs as follows:
> > > >
> > > > [root@localhost ~]# rmmod hisi_sas_v3_hw
> > > > [ 105.377944] BUG: scheduling while atomic: kworker/113:1/811/0x00000201
> > > > [ 105.384469] Modules linked in: bluetooth rfkill ib_isert
> > > > iscsi_target_mod ib_ipoib ib_umad iptable_filter vfio_iommu_type1
> > > > vfio_pci vfio_virqfd vfio rpcrdma ib_is er
> > > > libiscsi scsi_transport_iscsi crct10dif_ce sbsa_gwdt hns_roce_hw_v2
> > > > hisi_sec2 hisi_hpre hisi_zip hisi_qm uacce spi_hisi_sfc_v3xx
> > > > hisi_trng_v2 rng_core hisi_uncore _hha_pmu
> > > > hisi_uncore_ddrc_pmu hisi_uncore_l3c_pmu spi_dw_mmio hisi_uncore_pmu
> > > > hns3 hclge hnae3 hisi_sas_v3_hw(-) hisi_sas_main libsas
> > > > [ 105.424841] CPU: 113 PID: 811 Comm: kworker/113:1 Kdump: loaded
> > > > Tainted: G W 5.12.0-rc1+ #1
> > > > [ 105.434454] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS
> > > > 2280-V2 CS V5.B143.01 04/22/2021
> > > > [ 105.443287] Workqueue: rcu_gp srcu_invoke_callbacks
> > > > [ 105.448154] Call trace:
> > > > [ 105.450593] dump_backtrace+0x0/0x1a4
> > > > [ 105.454245] show_stack+0x24/0x40
> > > > [ 105.457548] dump_stack+0xc8/0x104
> > > > [ 105.460939] __schedule_bug+0x68/0x80
> > > > [ 105.464590] __schedule+0x73c/0x77c
> > > > [ 105.465700] BUG: scheduling while atomic: kworker/96:1/791/0x00000201
> > > > [ 105.468066] schedule+0x7c/0x110
> > > > [ 105.468068] schedule_timeout+0x194/0x1d4
> > > > [ 105.474490] Modules linked in:
> > > > [ 105.477692] wait_for_completion+0x8c/0x12c
> > > > [ 105.477695] rcu_barrier+0x1e0/0x2fc
> > > > [ 105.477697] scsi_host_dev_release+0x50/0xf0
> > > > [ 105.477701] device_release+0x40/0xa0
> > > > [ 105.477704] kobject_put+0xac/0x100
> > > > [ 105.477707] __device_link_free_srcu+0x50/0x74
> > > > [ 105.477709] srcu_invoke_callbacks+0x108/0x1a4
> > > > [ 105.484743] process_one_work+0x1dc/0x48c
> > > > [ 105.492468] worker_thread+0x7c/0x464
> > > > [ 105.492471] kthread+0x168/0x16c
> > > > [ 105.492473] ret_from_fork+0x10/0x18
> > > > ...
> > > >
> > > > After analyse the process, we find that it will
> > > > device_del(&shost->gendev) in function scsi_remove_host() and then
> > > >
> > > > put_device(&shost->shost_gendev) in function scsi_host_put() when
> > > > removing the driver, if there is a link between shost and hisi_hba->dev,
> > > >
> > > > it will try to delete the link in device_del(), and also will
> > > > call_srcu(__device_link_free_srcu) to put_device() link->consumer and
> > > > supplier.
> > > >
> > > > But if put device() for shost_gendev in device_link_free() is later
> > > > than in scsi_host_put(), it will call scsi_host_dev_release() in
> > > >
> > > > srcu_invoke_callbacks() while it is atomic and there are scheduling in
> > > > scsi_host_dev_release(),
> > > >
> > > > so it reports the BUG "scheduling while atomic:...".
> > > >
> > > > thread 1 thread2
> > > > hisi_sas_v3_remove
> > > > ...
> > > > sas_remove_host()
> > > > ...
> > > > scsi_remove_host()
> > > > ...
> > > > device_del(&shost->shost_gendev)
> > > > ...
> > > > device_link_purge()
> > > > __device_link_del()
> > > > device_unregister(&link->link_dev)
> > > > devlink_dev_release
> > > > call_srcu(__device_link_free_srcu) ----------->
> > > > srcu_invoke_callbacks (atomic)
> > > > __device_link_free_srcu
> > > > ...
> > > > scsi_host_put()
> > > > put_device(&shost->shost_gendev) (ref = 1)
> > > > device_link_free()
> > > > put_device(link->consumer)
> > > > //shost->gendev ref = 0
> > > > ...
> > > > scsi_host_dev_release
> > > > ...
> > > > rcu_barrier
> > > > kthread_stop()
> > > >
> > > >
> > > > We can check kref of shost->shost_gendev to make sure scsi_host_put()
> > > > to release scsi host device in LLDD driver to avoid the issue,
> > > >
> > > > but it seems be a common issue: function __device_link_free_srcu
> > > > calls put_device() for consumer and supplier,
> > > >
> > > > but if it's ref =0 at that time and there are scheduling or sleep in
> > > > dev_release, it may have the issue.
> > > >
> > > > Do you have any idea about the issue?
> > > >
> > > Yes, this is a general issue.
> > >
> > > If I'm not mistaken, it can be addressed by further deferring the
> > > device_link_free() invocation through a workqueue.
> > >
> > > Let me cut a patch doing this.
> >
> > Please test the patch below and let me know if it works for you.
> >
> > ---
> > drivers/base/core.c | 18 ++++++++++++++++--
> > include/linux/device.h | 5 ++++-
> > 2 files changed, 20 insertions(+), 3 deletions(-)
> >
> > Index: linux-pm/drivers/base/core.c
> > ===================================================================
> > --- linux-pm.orig/drivers/base/core.c
> > +++ linux-pm/drivers/base/core.c
> > @@ -455,16 +455,30 @@ static void device_link_free(struct devi
> > }
> >
> > #ifdef CONFIG_SRCU
> > +static void __device_link_free_fn(struct work_struct *work)
> > +{
> > + device_link_free(container_of(work, struct device_link, srcu.work));
> > +}
> > +
> > static void __device_link_free_srcu(struct rcu_head *rhead)
> > {
> > - device_link_free(container_of(rhead, struct device_link, rcu_head));
> > + struct device_link *link = container_of(rhead, struct device_link,
> > + srcu.rhead);
> > + struct work_struct *work = &link->srcu.work;
> > +
> > + /*
> > + * Because device_link_free() may sleep in some cases, schedule the
> > + * execution of it instead of invoking it directly.
> > + */
> > + INIT_WORK(work, __device_link_free_fn);
> > + schedule_work(work);
> > }
> >
> > static void devlink_dev_release(struct device *dev)
> > {
> > struct device_link *link = to_devlink(dev);
> >
> > - call_srcu(&device_links_srcu, &link->rcu_head, __device_link_free_srcu);
> > + call_srcu(&device_links_srcu, &link->srcu.rhead, __device_link_free_srcu);
> > }
> > #else
> > static void devlink_dev_release(struct device *dev)
> > Index: linux-pm/include/linux/device.h
> > ===================================================================
> > --- linux-pm.orig/include/linux/device.h
> > +++ linux-pm/include/linux/device.h
> > @@ -584,7 +584,10 @@ struct device_link {
> > refcount_t rpm_active;
> > struct kref kref;
> > #ifdef CONFIG_SRCU
> > - struct rcu_head rcu_head;
> > + union {
> > + struct rcu_head rhead;
> > + struct work_struct work;
> > + } srcu;
>
> I'll do the actual review on a more final patch? I see you are trying
> to save space, but is this worth the readability reduction?

Well, is this so much less readable?

Anyway, because we have a whole struct device in there now, the size
difference probably doesn't matter.

Also it is not particularly useful to do the entire unregistration
under the device links write lock, so the device_unregister() in
__device_link_del() could be delegated to a workqueue and run from
there, so the whole call_rcu() dance could be avoided.

Which would be a better fix IMO, but let's see if this one works.

2021-05-12 03:26:03

by chenxiang (M)

[permalink] [raw]
Subject: Re: Qestion about device link

Hi Rafael,


在 2021/5/12 3:16, Rafael J. Wysocki 写道:
> On Tuesday, May 11, 2021 4:39:31 PM CEST Rafael J. Wysocki wrote:
>> On 5/11/2021 5:59 AM, chenxiang (M) wrote:
>>> Hi Rafael and other guys,
>>>
>>> I am trying to add a device link between scsi_host->shost_gendev and
>>> hisi_hba->dev to support runtime PM for hisi_hba driver
>>>
>>> (as it supports runtime PM for scsi host in some scenarios such as
>>> error handler etc, we can avoid to do them again if adding a
>>>
>>> device link between scsi_host->shost_gendev and hisi_hba->dev) as
>>> follows (hisi_sas driver is under directory drivers/scsi/hisi_sas):
>>>
>>> device_link_add(&shost->shost_gendev, hisi_hba->dev,
>>> DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE)
>>>
>>> We have a full test on it, and it works well except when rmmod the
>>> driver, some call trace occurs as follows:
>>>
>>> [root@localhost ~]# rmmod hisi_sas_v3_hw
>>> [ 105.377944] BUG: scheduling while atomic: kworker/113:1/811/0x00000201
>>> [ 105.384469] Modules linked in: bluetooth rfkill ib_isert
>>> iscsi_target_mod ib_ipoib ib_umad iptable_filter vfio_iommu_type1
>>> vfio_pci vfio_virqfd vfio rpcrdma ib_is er
>>> libiscsi scsi_transport_iscsi crct10dif_ce sbsa_gwdt hns_roce_hw_v2
>>> hisi_sec2 hisi_hpre hisi_zip hisi_qm uacce spi_hisi_sfc_v3xx
>>> hisi_trng_v2 rng_core hisi_uncore _hha_pmu
>>> hisi_uncore_ddrc_pmu hisi_uncore_l3c_pmu spi_dw_mmio hisi_uncore_pmu
>>> hns3 hclge hnae3 hisi_sas_v3_hw(-) hisi_sas_main libsas
>>> [ 105.424841] CPU: 113 PID: 811 Comm: kworker/113:1 Kdump: loaded
>>> Tainted: G W 5.12.0-rc1+ #1
>>> [ 105.434454] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS
>>> 2280-V2 CS V5.B143.01 04/22/2021
>>> [ 105.443287] Workqueue: rcu_gp srcu_invoke_callbacks
>>> [ 105.448154] Call trace:
>>> [ 105.450593] dump_backtrace+0x0/0x1a4
>>> [ 105.454245] show_stack+0x24/0x40
>>> [ 105.457548] dump_stack+0xc8/0x104
>>> [ 105.460939] __schedule_bug+0x68/0x80
>>> [ 105.464590] __schedule+0x73c/0x77c
>>> [ 105.465700] BUG: scheduling while atomic: kworker/96:1/791/0x00000201
>>> [ 105.468066] schedule+0x7c/0x110
>>> [ 105.468068] schedule_timeout+0x194/0x1d4
>>> [ 105.474490] Modules linked in:
>>> [ 105.477692] wait_for_completion+0x8c/0x12c
>>> [ 105.477695] rcu_barrier+0x1e0/0x2fc
>>> [ 105.477697] scsi_host_dev_release+0x50/0xf0
>>> [ 105.477701] device_release+0x40/0xa0
>>> [ 105.477704] kobject_put+0xac/0x100
>>> [ 105.477707] __device_link_free_srcu+0x50/0x74
>>> [ 105.477709] srcu_invoke_callbacks+0x108/0x1a4
>>> [ 105.484743] process_one_work+0x1dc/0x48c
>>> [ 105.492468] worker_thread+0x7c/0x464
>>> [ 105.492471] kthread+0x168/0x16c
>>> [ 105.492473] ret_from_fork+0x10/0x18
>>> ...
>>>
>>> After analyse the process, we find that it will
>>> device_del(&shost->gendev) in function scsi_remove_host() and then
>>>
>>> put_device(&shost->shost_gendev) in function scsi_host_put() when
>>> removing the driver, if there is a link between shost and hisi_hba->dev,
>>>
>>> it will try to delete the link in device_del(), and also will
>>> call_srcu(__device_link_free_srcu) to put_device() link->consumer and
>>> supplier.
>>>
>>> But if put device() for shost_gendev in device_link_free() is later
>>> than in scsi_host_put(), it will call scsi_host_dev_release() in
>>>
>>> srcu_invoke_callbacks() while it is atomic and there are scheduling in
>>> scsi_host_dev_release(),
>>>
>>> so it reports the BUG "scheduling while atomic:...".
>>>
>>> thread 1 thread2
>>> hisi_sas_v3_remove
>>> ...
>>> sas_remove_host()
>>> ...
>>> scsi_remove_host()
>>> ...
>>> device_del(&shost->shost_gendev)
>>> ...
>>> device_link_purge()
>>> __device_link_del()
>>> device_unregister(&link->link_dev)
>>> devlink_dev_release
>>> call_srcu(__device_link_free_srcu) ----------->
>>> srcu_invoke_callbacks (atomic)
>>> __device_link_free_srcu
>>> ...
>>> scsi_host_put()
>>> put_device(&shost->shost_gendev) (ref = 1)
>>> device_link_free()
>>> put_device(link->consumer)
>>> //shost->gendev ref = 0
>>> ...
>>> scsi_host_dev_release
>>> ...
>>> rcu_barrier
>>> kthread_stop()
>>>
>>>
>>> We can check kref of shost->shost_gendev to make sure scsi_host_put()
>>> to release scsi host device in LLDD driver to avoid the issue,
>>>
>>> but it seems be a common issue: function __device_link_free_srcu
>>> calls put_device() for consumer and supplier,
>>>
>>> but if it's ref =0 at that time and there are scheduling or sleep in
>>> dev_release, it may have the issue.
>>>
>>> Do you have any idea about the issue?
>>>
>> Yes, this is a general issue.
>>
>> If I'm not mistaken, it can be addressed by further deferring the
>> device_link_free() invocation through a workqueue.
>>
>> Let me cut a patch doing this.
> Please test the patch below and let me know if it works for you.

I have a test on the patch, and it solves my issue.


>
> ---
> drivers/base/core.c | 18 ++++++++++++++++--
> include/linux/device.h | 5 ++++-
> 2 files changed, 20 insertions(+), 3 deletions(-)
>
> Index: linux-pm/drivers/base/core.c
> ===================================================================
> --- linux-pm.orig/drivers/base/core.c
> +++ linux-pm/drivers/base/core.c
> @@ -455,16 +455,30 @@ static void device_link_free(struct devi
> }
>
> #ifdef CONFIG_SRCU
> +static void __device_link_free_fn(struct work_struct *work)
> +{
> + device_link_free(container_of(work, struct device_link, srcu.work));
> +}
> +
> static void __device_link_free_srcu(struct rcu_head *rhead)
> {
> - device_link_free(container_of(rhead, struct device_link, rcu_head));
> + struct device_link *link = container_of(rhead, struct device_link,
> + srcu.rhead);
> + struct work_struct *work = &link->srcu.work;
> +
> + /*
> + * Because device_link_free() may sleep in some cases, schedule the
> + * execution of it instead of invoking it directly.
> + */
> + INIT_WORK(work, __device_link_free_fn);
> + schedule_work(work);
> }
>
> static void devlink_dev_release(struct device *dev)
> {
> struct device_link *link = to_devlink(dev);
>
> - call_srcu(&device_links_srcu, &link->rcu_head, __device_link_free_srcu);
> + call_srcu(&device_links_srcu, &link->srcu.rhead, __device_link_free_srcu);
> }
> #else
> static void devlink_dev_release(struct device *dev)
> Index: linux-pm/include/linux/device.h
> ===================================================================
> --- linux-pm.orig/include/linux/device.h
> +++ linux-pm/include/linux/device.h
> @@ -584,7 +584,10 @@ struct device_link {
> refcount_t rpm_active;
> struct kref kref;
> #ifdef CONFIG_SRCU
> - struct rcu_head rcu_head;
> + union {
> + struct rcu_head rhead;
> + struct work_struct work;
> + } srcu;
> #endif
> bool supplier_preactivated; /* Owned by consumer probe. */
> };
>
>
>
>
> .
>


2021-05-12 11:33:17

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: Qestion about device link

On Wed, May 12, 2021 at 8:38 AM Hillf Danton <[email protected]> wrote:
>
> On Tue, 11 May 2021 21:43:40 Rafael J. Wysocki wrote:
> > > #ifdef CONFIG_SRCU
> > > +static void __device_link_free_fn(struct work_struct *work)
> > > +{
> > > + device_link_free(container_of(work, struct device_link, srcu.work));
> > > +}
> > > +
> > > static void __device_link_free_srcu(struct rcu_head *rhead)
> > > {
> > > - device_link_free(container_of(rhead, struct device_link, rcu_head));
> > > + struct device_link *link = container_of(rhead, struct device_link,
> > > + srcu.rhead);
> > > + struct work_struct *work = &link->srcu.work;
> > > +
> > > + /*
> > > + * Because device_link_free() may sleep in some cases, schedule the
> > > + * execution of it instead of invoking it directly.
> > > + */
> > > + INIT_WORK(work, __device_link_free_fn);
> > > + schedule_work(work);
> > > }
>
> Nope, you need something like queue_work(system_unbound_wq, work); instead
> because of the blocking wq callback.

system_long_wq rather, as it really doesn't matter when it gets completed.

2021-05-12 14:07:23

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: Qestion about device link

On Wednesday, May 12, 2021 5:24:53 AM CEST chenxiang (M) wrote:
> Hi Rafael,
>
>
> 在 2021/5/12 3:16, Rafael J. Wysocki 写道:
> > On Tuesday, May 11, 2021 4:39:31 PM CEST Rafael J. Wysocki wrote:
> >> On 5/11/2021 5:59 AM, chenxiang (M) wrote:
> >>> Hi Rafael and other guys,
> >>>
> >>> I am trying to add a device link between scsi_host->shost_gendev and
> >>> hisi_hba->dev to support runtime PM for hisi_hba driver
> >>>
> >>> (as it supports runtime PM for scsi host in some scenarios such as
> >>> error handler etc, we can avoid to do them again if adding a
> >>>
> >>> device link between scsi_host->shost_gendev and hisi_hba->dev) as
> >>> follows (hisi_sas driver is under directory drivers/scsi/hisi_sas):
> >>>
> >>> device_link_add(&shost->shost_gendev, hisi_hba->dev,
> >>> DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE)
> >>>
> >>> We have a full test on it, and it works well except when rmmod the
> >>> driver, some call trace occurs as follows:
> >>>
> >>> [root@localhost ~]# rmmod hisi_sas_v3_hw
> >>> [ 105.377944] BUG: scheduling while atomic: kworker/113:1/811/0x00000201
> >>> [ 105.384469] Modules linked in: bluetooth rfkill ib_isert
> >>> iscsi_target_mod ib_ipoib ib_umad iptable_filter vfio_iommu_type1
> >>> vfio_pci vfio_virqfd vfio rpcrdma ib_is er
> >>> libiscsi scsi_transport_iscsi crct10dif_ce sbsa_gwdt hns_roce_hw_v2
> >>> hisi_sec2 hisi_hpre hisi_zip hisi_qm uacce spi_hisi_sfc_v3xx
> >>> hisi_trng_v2 rng_core hisi_uncore _hha_pmu
> >>> hisi_uncore_ddrc_pmu hisi_uncore_l3c_pmu spi_dw_mmio hisi_uncore_pmu
> >>> hns3 hclge hnae3 hisi_sas_v3_hw(-) hisi_sas_main libsas
> >>> [ 105.424841] CPU: 113 PID: 811 Comm: kworker/113:1 Kdump: loaded
> >>> Tainted: G W 5.12.0-rc1+ #1
> >>> [ 105.434454] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS
> >>> 2280-V2 CS V5.B143.01 04/22/2021
> >>> [ 105.443287] Workqueue: rcu_gp srcu_invoke_callbacks
> >>> [ 105.448154] Call trace:
> >>> [ 105.450593] dump_backtrace+0x0/0x1a4
> >>> [ 105.454245] show_stack+0x24/0x40
> >>> [ 105.457548] dump_stack+0xc8/0x104
> >>> [ 105.460939] __schedule_bug+0x68/0x80
> >>> [ 105.464590] __schedule+0x73c/0x77c
> >>> [ 105.465700] BUG: scheduling while atomic: kworker/96:1/791/0x00000201
> >>> [ 105.468066] schedule+0x7c/0x110
> >>> [ 105.468068] schedule_timeout+0x194/0x1d4
> >>> [ 105.474490] Modules linked in:
> >>> [ 105.477692] wait_for_completion+0x8c/0x12c
> >>> [ 105.477695] rcu_barrier+0x1e0/0x2fc
> >>> [ 105.477697] scsi_host_dev_release+0x50/0xf0
> >>> [ 105.477701] device_release+0x40/0xa0
> >>> [ 105.477704] kobject_put+0xac/0x100
> >>> [ 105.477707] __device_link_free_srcu+0x50/0x74
> >>> [ 105.477709] srcu_invoke_callbacks+0x108/0x1a4
> >>> [ 105.484743] process_one_work+0x1dc/0x48c
> >>> [ 105.492468] worker_thread+0x7c/0x464
> >>> [ 105.492471] kthread+0x168/0x16c
> >>> [ 105.492473] ret_from_fork+0x10/0x18
> >>> ...
> >>>
> >>> After analyse the process, we find that it will
> >>> device_del(&shost->gendev) in function scsi_remove_host() and then
> >>>
> >>> put_device(&shost->shost_gendev) in function scsi_host_put() when
> >>> removing the driver, if there is a link between shost and hisi_hba->dev,
> >>>
> >>> it will try to delete the link in device_del(), and also will
> >>> call_srcu(__device_link_free_srcu) to put_device() link->consumer and
> >>> supplier.
> >>>
> >>> But if put device() for shost_gendev in device_link_free() is later
> >>> than in scsi_host_put(), it will call scsi_host_dev_release() in
> >>>
> >>> srcu_invoke_callbacks() while it is atomic and there are scheduling in
> >>> scsi_host_dev_release(),
> >>>
> >>> so it reports the BUG "scheduling while atomic:...".
> >>>
> >>> thread 1 thread2
> >>> hisi_sas_v3_remove
> >>> ...
> >>> sas_remove_host()
> >>> ...
> >>> scsi_remove_host()
> >>> ...
> >>> device_del(&shost->shost_gendev)
> >>> ...
> >>> device_link_purge()
> >>> __device_link_del()
> >>> device_unregister(&link->link_dev)
> >>> devlink_dev_release
> >>> call_srcu(__device_link_free_srcu) ----------->
> >>> srcu_invoke_callbacks (atomic)
> >>> __device_link_free_srcu
> >>> ...
> >>> scsi_host_put()
> >>> put_device(&shost->shost_gendev) (ref = 1)
> >>> device_link_free()
> >>> put_device(link->consumer)
> >>> //shost->gendev ref = 0
> >>> ...
> >>> scsi_host_dev_release
> >>> ...
> >>> rcu_barrier
> >>> kthread_stop()
> >>>
> >>>
> >>> We can check kref of shost->shost_gendev to make sure scsi_host_put()
> >>> to release scsi host device in LLDD driver to avoid the issue,
> >>>
> >>> but it seems be a common issue: function __device_link_free_srcu
> >>> calls put_device() for consumer and supplier,
> >>>
> >>> but if it's ref =0 at that time and there are scheduling or sleep in
> >>> dev_release, it may have the issue.
> >>>
> >>> Do you have any idea about the issue?
> >>>
> >> Yes, this is a general issue.
> >>
> >> If I'm not mistaken, it can be addressed by further deferring the
> >> device_link_free() invocation through a workqueue.
> >>
> >> Let me cut a patch doing this.
> > Please test the patch below and let me know if it works for you.
>
> I have a test on the patch, and it solves my issue.

Great, thanks!

Please also test the patch appended below (it uses a slightly different approach).

---
drivers/base/core.c | 37 +++++++++++++++++++++++--------------
include/linux/device.h | 6 ++----
2 files changed, 25 insertions(+), 18 deletions(-)

Index: linux-pm/drivers/base/core.c
===================================================================
--- linux-pm.orig/drivers/base/core.c
+++ linux-pm/drivers/base/core.c
@@ -193,6 +193,11 @@ int device_links_read_lock_held(void)
{
return srcu_read_lock_held(&device_links_srcu);
}
+
+void device_link_synchronize_removal(void)
+{
+ synchronize_srcu(&device_links_srcu);
+}
#else /* !CONFIG_SRCU */
static DECLARE_RWSEM(device_links_lock);

@@ -223,6 +228,10 @@ int device_links_read_lock_held(void)
return lockdep_is_held(&device_links_lock);
}
#endif
+
+static inline void device_link_synchronize_removal(void)
+{
+}
#endif /* !CONFIG_SRCU */

static bool device_is_ancestor(struct device *dev, struct device *target)
@@ -444,8 +453,13 @@ static struct attribute *devlink_attrs[]
};
ATTRIBUTE_GROUPS(devlink);

-static void device_link_free(struct device_link *link)
+static void device_link_release_fn(struct work_struct *work)
{
+ struct device_link *link = container_of(work, struct device_link, rm_work);
+
+ /* Ensure that all references to the link object have been dropped. */
+ device_link_synchronize_removal();
+
while (refcount_dec_not_one(&link->rpm_active))
pm_runtime_put(link->supplier);

@@ -454,24 +468,19 @@ static void device_link_free(struct devi
kfree(link);
}

-#ifdef CONFIG_SRCU
-static void __device_link_free_srcu(struct rcu_head *rhead)
-{
- device_link_free(container_of(rhead, struct device_link, rcu_head));
-}
-
static void devlink_dev_release(struct device *dev)
{
struct device_link *link = to_devlink(dev);

- call_srcu(&device_links_srcu, &link->rcu_head, __device_link_free_srcu);
-}
-#else
-static void devlink_dev_release(struct device *dev)
-{
- device_link_free(to_devlink(dev));
+ INIT_WORK(&link->rm_work, device_link_release_fn);
+ /*
+ * It may take a while to complete this work because of the SRCU
+ * synchronization in device_link_release_fn() and if the consumer or
+ * supplier devices get deleted when it runs, so put it into the "long"
+ * workqueue.
+ */
+ queue_work(system_long_wq, &link->rm_work);
}
-#endif

static struct class devlink_class = {
.name = "devlink",
Index: linux-pm/include/linux/device.h
===================================================================
--- linux-pm.orig/include/linux/device.h
+++ linux-pm/include/linux/device.h
@@ -570,7 +570,7 @@ struct device {
* @flags: Link flags.
* @rpm_active: Whether or not the consumer device is runtime-PM-active.
* @kref: Count repeated addition of the same link.
- * @rcu_head: An RCU head to use for deferred execution of SRCU callbacks.
+ * @rm_work: Work structure used for removing the link.
* @supplier_preactivated: Supplier has been made active before consumer probe.
*/
struct device_link {
@@ -583,9 +583,7 @@ struct device_link {
u32 flags;
refcount_t rpm_active;
struct kref kref;
-#ifdef CONFIG_SRCU
- struct rcu_head rcu_head;
-#endif
+ struct work_struct rm_work;
bool supplier_preactivated; /* Owned by consumer probe. */
};




2021-05-12 20:38:16

by Saravana Kannan

[permalink] [raw]
Subject: Re: Qestion about device link

On Wed, May 12, 2021 at 7:04 AM Rafael J. Wysocki <[email protected]> wrote:
>
> On Wednesday, May 12, 2021 5:24:53 AM CEST chenxiang (M) wrote:
> > Hi Rafael,
> >
> >
> > 在 2021/5/12 3:16, Rafael J. Wysocki 写道:
> > > On Tuesday, May 11, 2021 4:39:31 PM CEST Rafael J. Wysocki wrote:
> > >> On 5/11/2021 5:59 AM, chenxiang (M) wrote:
> > >>> Hi Rafael and other guys,
> > >>>
> > >>> I am trying to add a device link between scsi_host->shost_gendev and
> > >>> hisi_hba->dev to support runtime PM for hisi_hba driver
> > >>>
> > >>> (as it supports runtime PM for scsi host in some scenarios such as
> > >>> error handler etc, we can avoid to do them again if adding a
> > >>>
> > >>> device link between scsi_host->shost_gendev and hisi_hba->dev) as
> > >>> follows (hisi_sas driver is under directory drivers/scsi/hisi_sas):
> > >>>
> > >>> device_link_add(&shost->shost_gendev, hisi_hba->dev,
> > >>> DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE)
> > >>>
> > >>> We have a full test on it, and it works well except when rmmod the
> > >>> driver, some call trace occurs as follows:
> > >>>
> > >>> [root@localhost ~]# rmmod hisi_sas_v3_hw
> > >>> [ 105.377944] BUG: scheduling while atomic: kworker/113:1/811/0x00000201
> > >>> [ 105.384469] Modules linked in: bluetooth rfkill ib_isert
> > >>> iscsi_target_mod ib_ipoib ib_umad iptable_filter vfio_iommu_type1
> > >>> vfio_pci vfio_virqfd vfio rpcrdma ib_is er
> > >>> libiscsi scsi_transport_iscsi crct10dif_ce sbsa_gwdt hns_roce_hw_v2
> > >>> hisi_sec2 hisi_hpre hisi_zip hisi_qm uacce spi_hisi_sfc_v3xx
> > >>> hisi_trng_v2 rng_core hisi_uncore _hha_pmu
> > >>> hisi_uncore_ddrc_pmu hisi_uncore_l3c_pmu spi_dw_mmio hisi_uncore_pmu
> > >>> hns3 hclge hnae3 hisi_sas_v3_hw(-) hisi_sas_main libsas
> > >>> [ 105.424841] CPU: 113 PID: 811 Comm: kworker/113:1 Kdump: loaded
> > >>> Tainted: G W 5.12.0-rc1+ #1
> > >>> [ 105.434454] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS
> > >>> 2280-V2 CS V5.B143.01 04/22/2021
> > >>> [ 105.443287] Workqueue: rcu_gp srcu_invoke_callbacks
> > >>> [ 105.448154] Call trace:
> > >>> [ 105.450593] dump_backtrace+0x0/0x1a4
> > >>> [ 105.454245] show_stack+0x24/0x40
> > >>> [ 105.457548] dump_stack+0xc8/0x104
> > >>> [ 105.460939] __schedule_bug+0x68/0x80
> > >>> [ 105.464590] __schedule+0x73c/0x77c
> > >>> [ 105.465700] BUG: scheduling while atomic: kworker/96:1/791/0x00000201
> > >>> [ 105.468066] schedule+0x7c/0x110
> > >>> [ 105.468068] schedule_timeout+0x194/0x1d4
> > >>> [ 105.474490] Modules linked in:
> > >>> [ 105.477692] wait_for_completion+0x8c/0x12c
> > >>> [ 105.477695] rcu_barrier+0x1e0/0x2fc
> > >>> [ 105.477697] scsi_host_dev_release+0x50/0xf0
> > >>> [ 105.477701] device_release+0x40/0xa0
> > >>> [ 105.477704] kobject_put+0xac/0x100
> > >>> [ 105.477707] __device_link_free_srcu+0x50/0x74
> > >>> [ 105.477709] srcu_invoke_callbacks+0x108/0x1a4
> > >>> [ 105.484743] process_one_work+0x1dc/0x48c
> > >>> [ 105.492468] worker_thread+0x7c/0x464
> > >>> [ 105.492471] kthread+0x168/0x16c
> > >>> [ 105.492473] ret_from_fork+0x10/0x18
> > >>> ...
> > >>>
> > >>> After analyse the process, we find that it will
> > >>> device_del(&shost->gendev) in function scsi_remove_host() and then
> > >>>
> > >>> put_device(&shost->shost_gendev) in function scsi_host_put() when
> > >>> removing the driver, if there is a link between shost and hisi_hba->dev,
> > >>>
> > >>> it will try to delete the link in device_del(), and also will
> > >>> call_srcu(__device_link_free_srcu) to put_device() link->consumer and
> > >>> supplier.
> > >>>
> > >>> But if put device() for shost_gendev in device_link_free() is later
> > >>> than in scsi_host_put(), it will call scsi_host_dev_release() in
> > >>>
> > >>> srcu_invoke_callbacks() while it is atomic and there are scheduling in
> > >>> scsi_host_dev_release(),
> > >>>
> > >>> so it reports the BUG "scheduling while atomic:...".
> > >>>
> > >>> thread 1 thread2
> > >>> hisi_sas_v3_remove
> > >>> ...
> > >>> sas_remove_host()
> > >>> ...
> > >>> scsi_remove_host()
> > >>> ...
> > >>> device_del(&shost->shost_gendev)
> > >>> ...
> > >>> device_link_purge()
> > >>> __device_link_del()
> > >>> device_unregister(&link->link_dev)
> > >>> devlink_dev_release
> > >>> call_srcu(__device_link_free_srcu) ----------->
> > >>> srcu_invoke_callbacks (atomic)
> > >>> __device_link_free_srcu
> > >>> ...
> > >>> scsi_host_put()
> > >>> put_device(&shost->shost_gendev) (ref = 1)
> > >>> device_link_free()
> > >>> put_device(link->consumer)
> > >>> //shost->gendev ref = 0
> > >>> ...
> > >>> scsi_host_dev_release
> > >>> ...
> > >>> rcu_barrier
> > >>> kthread_stop()
> > >>>
> > >>>
> > >>> We can check kref of shost->shost_gendev to make sure scsi_host_put()
> > >>> to release scsi host device in LLDD driver to avoid the issue,
> > >>>
> > >>> but it seems be a common issue: function __device_link_free_srcu
> > >>> calls put_device() for consumer and supplier,
> > >>>
> > >>> but if it's ref =0 at that time and there are scheduling or sleep in
> > >>> dev_release, it may have the issue.
> > >>>
> > >>> Do you have any idea about the issue?
> > >>>
> > >> Yes, this is a general issue.
> > >>
> > >> If I'm not mistaken, it can be addressed by further deferring the
> > >> device_link_free() invocation through a workqueue.
> > >>
> > >> Let me cut a patch doing this.
> > > Please test the patch below and let me know if it works for you.
> >
> > I have a test on the patch, and it solves my issue.
>
> Great, thanks!
>
> Please also test the patch appended below (it uses a slightly different approach).
>
> ---
> drivers/base/core.c | 37 +++++++++++++++++++++++--------------
> include/linux/device.h | 6 ++----
> 2 files changed, 25 insertions(+), 18 deletions(-)
>
> Index: linux-pm/drivers/base/core.c
> ===================================================================
> --- linux-pm.orig/drivers/base/core.c
> +++ linux-pm/drivers/base/core.c
> @@ -193,6 +193,11 @@ int device_links_read_lock_held(void)
> {
> return srcu_read_lock_held(&device_links_srcu);
> }
> +
> +void device_link_synchronize_removal(void)
> +{
> + synchronize_srcu(&device_links_srcu);
> +}
> #else /* !CONFIG_SRCU */
> static DECLARE_RWSEM(device_links_lock);
>
> @@ -223,6 +228,10 @@ int device_links_read_lock_held(void)
> return lockdep_is_held(&device_links_lock);
> }
> #endif
> +
> +static inline void device_link_synchronize_removal(void)
> +{
> +}
> #endif /* !CONFIG_SRCU */
>
> static bool device_is_ancestor(struct device *dev, struct device *target)
> @@ -444,8 +453,13 @@ static struct attribute *devlink_attrs[]
> };
> ATTRIBUTE_GROUPS(devlink);
>
> -static void device_link_free(struct device_link *link)
> +static void device_link_release_fn(struct work_struct *work)
> {
> + struct device_link *link = container_of(work, struct device_link, rm_work);
> +
> + /* Ensure that all references to the link object have been dropped. */
> + device_link_synchronize_removal();
> +
> while (refcount_dec_not_one(&link->rpm_active))
> pm_runtime_put(link->supplier);
>
> @@ -454,24 +468,19 @@ static void device_link_free(struct devi
> kfree(link);
> }
>
> -#ifdef CONFIG_SRCU
> -static void __device_link_free_srcu(struct rcu_head *rhead)
> -{
> - device_link_free(container_of(rhead, struct device_link, rcu_head));
> -}
> -
> static void devlink_dev_release(struct device *dev)
> {
> struct device_link *link = to_devlink(dev);
>
> - call_srcu(&device_links_srcu, &link->rcu_head, __device_link_free_srcu);
> -}
> -#else
> -static void devlink_dev_release(struct device *dev)
> -{
> - device_link_free(to_devlink(dev));
> + INIT_WORK(&link->rm_work, device_link_release_fn);
> + /*
> + * It may take a while to complete this work because of the SRCU
> + * synchronization in device_link_release_fn() and if the consumer or
> + * supplier devices get deleted when it runs, so put it into the "long"
> + * workqueue.
> + */
> + queue_work(system_long_wq, &link->rm_work);

Not too strong of an opinion, but this seems like an unnecessary work
queue when SRCUs aren't enabled. We could just leave this part as is
and limit your changes to the SRCU implementation?

For the SRCU implementation, yes, this is a lot cleaner/nicer than
kicking off another work from the SRCU callback.

-Saravana

> }
> -#endif
>
> static struct class devlink_class = {
> .name = "devlink",
> Index: linux-pm/include/linux/device.h
> ===================================================================
> --- linux-pm.orig/include/linux/device.h
> +++ linux-pm/include/linux/device.h
> @@ -570,7 +570,7 @@ struct device {
> * @flags: Link flags.
> * @rpm_active: Whether or not the consumer device is runtime-PM-active.
> * @kref: Count repeated addition of the same link.
> - * @rcu_head: An RCU head to use for deferred execution of SRCU callbacks.
> + * @rm_work: Work structure used for removing the link.
> * @supplier_preactivated: Supplier has been made active before consumer probe.
> */
> struct device_link {
> @@ -583,9 +583,7 @@ struct device_link {
> u32 flags;
> refcount_t rpm_active;
> struct kref kref;
> -#ifdef CONFIG_SRCU
> - struct rcu_head rcu_head;
> -#endif
> + struct work_struct rm_work;
> bool supplier_preactivated; /* Owned by consumer probe. */
> };
>

2021-05-13 03:05:51

by chenxiang (M)

[permalink] [raw]
Subject: Re: Qestion about device link

Hi Rafael,


在 2021/5/12 22:04, Rafael J. Wysocki 写道:
> On Wednesday, May 12, 2021 5:24:53 AM CEST chenxiang (M) wrote:
>> Hi Rafael,
>>
>>
>> 在 2021/5/12 3:16, Rafael J. Wysocki 写道:
>>> On Tuesday, May 11, 2021 4:39:31 PM CEST Rafael J. Wysocki wrote:
>>>> On 5/11/2021 5:59 AM, chenxiang (M) wrote:
>>>>> Hi Rafael and other guys,
>>>>>
>>>>> I am trying to add a device link between scsi_host->shost_gendev and
>>>>> hisi_hba->dev to support runtime PM for hisi_hba driver
>>>>>
>>>>> (as it supports runtime PM for scsi host in some scenarios such as
>>>>> error handler etc, we can avoid to do them again if adding a
>>>>>
>>>>> device link between scsi_host->shost_gendev and hisi_hba->dev) as
>>>>> follows (hisi_sas driver is under directory drivers/scsi/hisi_sas):
>>>>>
>>>>> device_link_add(&shost->shost_gendev, hisi_hba->dev,
>>>>> DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE)
>>>>>
>>>>> We have a full test on it, and it works well except when rmmod the
>>>>> driver, some call trace occurs as follows:
>>>>>
>>>>> [root@localhost ~]# rmmod hisi_sas_v3_hw
>>>>> [ 105.377944] BUG: scheduling while atomic: kworker/113:1/811/0x00000201
>>>>> [ 105.384469] Modules linked in: bluetooth rfkill ib_isert
>>>>> iscsi_target_mod ib_ipoib ib_umad iptable_filter vfio_iommu_type1
>>>>> vfio_pci vfio_virqfd vfio rpcrdma ib_is er
>>>>> libiscsi scsi_transport_iscsi crct10dif_ce sbsa_gwdt hns_roce_hw_v2
>>>>> hisi_sec2 hisi_hpre hisi_zip hisi_qm uacce spi_hisi_sfc_v3xx
>>>>> hisi_trng_v2 rng_core hisi_uncore _hha_pmu
>>>>> hisi_uncore_ddrc_pmu hisi_uncore_l3c_pmu spi_dw_mmio hisi_uncore_pmu
>>>>> hns3 hclge hnae3 hisi_sas_v3_hw(-) hisi_sas_main libsas
>>>>> [ 105.424841] CPU: 113 PID: 811 Comm: kworker/113:1 Kdump: loaded
>>>>> Tainted: G W 5.12.0-rc1+ #1
>>>>> [ 105.434454] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS
>>>>> 2280-V2 CS V5.B143.01 04/22/2021
>>>>> [ 105.443287] Workqueue: rcu_gp srcu_invoke_callbacks
>>>>> [ 105.448154] Call trace:
>>>>> [ 105.450593] dump_backtrace+0x0/0x1a4
>>>>> [ 105.454245] show_stack+0x24/0x40
>>>>> [ 105.457548] dump_stack+0xc8/0x104
>>>>> [ 105.460939] __schedule_bug+0x68/0x80
>>>>> [ 105.464590] __schedule+0x73c/0x77c
>>>>> [ 105.465700] BUG: scheduling while atomic: kworker/96:1/791/0x00000201
>>>>> [ 105.468066] schedule+0x7c/0x110
>>>>> [ 105.468068] schedule_timeout+0x194/0x1d4
>>>>> [ 105.474490] Modules linked in:
>>>>> [ 105.477692] wait_for_completion+0x8c/0x12c
>>>>> [ 105.477695] rcu_barrier+0x1e0/0x2fc
>>>>> [ 105.477697] scsi_host_dev_release+0x50/0xf0
>>>>> [ 105.477701] device_release+0x40/0xa0
>>>>> [ 105.477704] kobject_put+0xac/0x100
>>>>> [ 105.477707] __device_link_free_srcu+0x50/0x74
>>>>> [ 105.477709] srcu_invoke_callbacks+0x108/0x1a4
>>>>> [ 105.484743] process_one_work+0x1dc/0x48c
>>>>> [ 105.492468] worker_thread+0x7c/0x464
>>>>> [ 105.492471] kthread+0x168/0x16c
>>>>> [ 105.492473] ret_from_fork+0x10/0x18
>>>>> ...
>>>>>
>>>>> After analyse the process, we find that it will
>>>>> device_del(&shost->gendev) in function scsi_remove_host() and then
>>>>>
>>>>> put_device(&shost->shost_gendev) in function scsi_host_put() when
>>>>> removing the driver, if there is a link between shost and hisi_hba->dev,
>>>>>
>>>>> it will try to delete the link in device_del(), and also will
>>>>> call_srcu(__device_link_free_srcu) to put_device() link->consumer and
>>>>> supplier.
>>>>>
>>>>> But if put device() for shost_gendev in device_link_free() is later
>>>>> than in scsi_host_put(), it will call scsi_host_dev_release() in
>>>>>
>>>>> srcu_invoke_callbacks() while it is atomic and there are scheduling in
>>>>> scsi_host_dev_release(),
>>>>>
>>>>> so it reports the BUG "scheduling while atomic:...".
>>>>>
>>>>> thread 1 thread2
>>>>> hisi_sas_v3_remove
>>>>> ...
>>>>> sas_remove_host()
>>>>> ...
>>>>> scsi_remove_host()
>>>>> ...
>>>>> device_del(&shost->shost_gendev)
>>>>> ...
>>>>> device_link_purge()
>>>>> __device_link_del()
>>>>> device_unregister(&link->link_dev)
>>>>> devlink_dev_release
>>>>> call_srcu(__device_link_free_srcu) ----------->
>>>>> srcu_invoke_callbacks (atomic)
>>>>> __device_link_free_srcu
>>>>> ...
>>>>> scsi_host_put()
>>>>> put_device(&shost->shost_gendev) (ref = 1)
>>>>> device_link_free()
>>>>> put_device(link->consumer)
>>>>> //shost->gendev ref = 0
>>>>> ...
>>>>> scsi_host_dev_release
>>>>> ...
>>>>> rcu_barrier
>>>>> kthread_stop()
>>>>>
>>>>>
>>>>> We can check kref of shost->shost_gendev to make sure scsi_host_put()
>>>>> to release scsi host device in LLDD driver to avoid the issue,
>>>>>
>>>>> but it seems be a common issue: function __device_link_free_srcu
>>>>> calls put_device() for consumer and supplier,
>>>>>
>>>>> but if it's ref =0 at that time and there are scheduling or sleep in
>>>>> dev_release, it may have the issue.
>>>>>
>>>>> Do you have any idea about the issue?
>>>>>
>>>> Yes, this is a general issue.
>>>>
>>>> If I'm not mistaken, it can be addressed by further deferring the
>>>> device_link_free() invocation through a workqueue.
>>>>
>>>> Let me cut a patch doing this.
>>> Please test the patch below and let me know if it works for you.
>> I have a test on the patch, and it solves my issue.
> Great, thanks!
>
> Please also test the patch appended below (it uses a slightly different approach).

I have a test on this change, and it also solves my issue.

>
> ---
> drivers/base/core.c | 37 +++++++++++++++++++++++--------------
> include/linux/device.h | 6 ++----
> 2 files changed, 25 insertions(+), 18 deletions(-)
>
> Index: linux-pm/drivers/base/core.c
> ===================================================================
> --- linux-pm.orig/drivers/base/core.c
> +++ linux-pm/drivers/base/core.c
> @@ -193,6 +193,11 @@ int device_links_read_lock_held(void)
> {
> return srcu_read_lock_held(&device_links_srcu);
> }
> +
> +void device_link_synchronize_removal(void)
> +{
> + synchronize_srcu(&device_links_srcu);
> +}
> #else /* !CONFIG_SRCU */
> static DECLARE_RWSEM(device_links_lock);
>
> @@ -223,6 +228,10 @@ int device_links_read_lock_held(void)
> return lockdep_is_held(&device_links_lock);
> }
> #endif
> +
> +static inline void device_link_synchronize_removal(void)
> +{
> +}
> #endif /* !CONFIG_SRCU */
>
> static bool device_is_ancestor(struct device *dev, struct device *target)
> @@ -444,8 +453,13 @@ static struct attribute *devlink_attrs[]
> };
> ATTRIBUTE_GROUPS(devlink);
>
> -static void device_link_free(struct device_link *link)
> +static void device_link_release_fn(struct work_struct *work)
> {
> + struct device_link *link = container_of(work, struct device_link, rm_work);
> +
> + /* Ensure that all references to the link object have been dropped. */
> + device_link_synchronize_removal();
> +
> while (refcount_dec_not_one(&link->rpm_active))
> pm_runtime_put(link->supplier);
>
> @@ -454,24 +468,19 @@ static void device_link_free(struct devi
> kfree(link);
> }
>
> -#ifdef CONFIG_SRCU
> -static void __device_link_free_srcu(struct rcu_head *rhead)
> -{
> - device_link_free(container_of(rhead, struct device_link, rcu_head));
> -}
> -
> static void devlink_dev_release(struct device *dev)
> {
> struct device_link *link = to_devlink(dev);
>
> - call_srcu(&device_links_srcu, &link->rcu_head, __device_link_free_srcu);
> -}
> -#else
> -static void devlink_dev_release(struct device *dev)
> -{
> - device_link_free(to_devlink(dev));
> + INIT_WORK(&link->rm_work, device_link_release_fn);
> + /*
> + * It may take a while to complete this work because of the SRCU
> + * synchronization in device_link_release_fn() and if the consumer or
> + * supplier devices get deleted when it runs, so put it into the "long"
> + * workqueue.
> + */
> + queue_work(system_long_wq, &link->rm_work);
> }
> -#endif
>
> static struct class devlink_class = {
> .name = "devlink",
> Index: linux-pm/include/linux/device.h
> ===================================================================
> --- linux-pm.orig/include/linux/device.h
> +++ linux-pm/include/linux/device.h
> @@ -570,7 +570,7 @@ struct device {
> * @flags: Link flags.
> * @rpm_active: Whether or not the consumer device is runtime-PM-active.
> * @kref: Count repeated addition of the same link.
> - * @rcu_head: An RCU head to use for deferred execution of SRCU callbacks.
> + * @rm_work: Work structure used for removing the link.
> * @supplier_preactivated: Supplier has been made active before consumer probe.
> */
> struct device_link {
> @@ -583,9 +583,7 @@ struct device_link {
> u32 flags;
> refcount_t rpm_active;
> struct kref kref;
> -#ifdef CONFIG_SRCU
> - struct rcu_head rcu_head;
> -#endif
> + struct work_struct rm_work;
> bool supplier_preactivated; /* Owned by consumer probe. */
> };
>
>
>
>
>
> .
>


2021-05-13 20:22:46

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: Qestion about device link

On Thu, May 13, 2021 at 5:00 AM chenxiang (M) <[email protected]> wrote:
>
> Hi Rafael,
>
>
> 在 2021/5/12 22:04, Rafael J. Wysocki 写道:
> > On Wednesday, May 12, 2021 5:24:53 AM CEST chenxiang (M) wrote:
> >> Hi Rafael,
> >>
> >>
> >> 在 2021/5/12 3:16, Rafael J. Wysocki 写道:
> >>> On Tuesday, May 11, 2021 4:39:31 PM CEST Rafael J. Wysocki wrote:
> >>>> On 5/11/2021 5:59 AM, chenxiang (M) wrote:
> >>>>> Hi Rafael and other guys,
> >>>>>
> >>>>> I am trying to add a device link between scsi_host->shost_gendev and
> >>>>> hisi_hba->dev to support runtime PM for hisi_hba driver
> >>>>>
> >>>>> (as it supports runtime PM for scsi host in some scenarios such as
> >>>>> error handler etc, we can avoid to do them again if adding a
> >>>>>
> >>>>> device link between scsi_host->shost_gendev and hisi_hba->dev) as
> >>>>> follows (hisi_sas driver is under directory drivers/scsi/hisi_sas):
> >>>>>
> >>>>> device_link_add(&shost->shost_gendev, hisi_hba->dev,
> >>>>> DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE)
> >>>>>
> >>>>> We have a full test on it, and it works well except when rmmod the
> >>>>> driver, some call trace occurs as follows:
> >>>>>
> >>>>> [root@localhost ~]# rmmod hisi_sas_v3_hw
> >>>>> [ 105.377944] BUG: scheduling while atomic: kworker/113:1/811/0x00000201
> >>>>> [ 105.384469] Modules linked in: bluetooth rfkill ib_isert
> >>>>> iscsi_target_mod ib_ipoib ib_umad iptable_filter vfio_iommu_type1
> >>>>> vfio_pci vfio_virqfd vfio rpcrdma ib_is er
> >>>>> libiscsi scsi_transport_iscsi crct10dif_ce sbsa_gwdt hns_roce_hw_v2
> >>>>> hisi_sec2 hisi_hpre hisi_zip hisi_qm uacce spi_hisi_sfc_v3xx
> >>>>> hisi_trng_v2 rng_core hisi_uncore _hha_pmu
> >>>>> hisi_uncore_ddrc_pmu hisi_uncore_l3c_pmu spi_dw_mmio hisi_uncore_pmu
> >>>>> hns3 hclge hnae3 hisi_sas_v3_hw(-) hisi_sas_main libsas
> >>>>> [ 105.424841] CPU: 113 PID: 811 Comm: kworker/113:1 Kdump: loaded
> >>>>> Tainted: G W 5.12.0-rc1+ #1
> >>>>> [ 105.434454] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS
> >>>>> 2280-V2 CS V5.B143.01 04/22/2021
> >>>>> [ 105.443287] Workqueue: rcu_gp srcu_invoke_callbacks
> >>>>> [ 105.448154] Call trace:
> >>>>> [ 105.450593] dump_backtrace+0x0/0x1a4
> >>>>> [ 105.454245] show_stack+0x24/0x40
> >>>>> [ 105.457548] dump_stack+0xc8/0x104
> >>>>> [ 105.460939] __schedule_bug+0x68/0x80
> >>>>> [ 105.464590] __schedule+0x73c/0x77c
> >>>>> [ 105.465700] BUG: scheduling while atomic: kworker/96:1/791/0x00000201
> >>>>> [ 105.468066] schedule+0x7c/0x110
> >>>>> [ 105.468068] schedule_timeout+0x194/0x1d4
> >>>>> [ 105.474490] Modules linked in:
> >>>>> [ 105.477692] wait_for_completion+0x8c/0x12c
> >>>>> [ 105.477695] rcu_barrier+0x1e0/0x2fc
> >>>>> [ 105.477697] scsi_host_dev_release+0x50/0xf0
> >>>>> [ 105.477701] device_release+0x40/0xa0
> >>>>> [ 105.477704] kobject_put+0xac/0x100
> >>>>> [ 105.477707] __device_link_free_srcu+0x50/0x74
> >>>>> [ 105.477709] srcu_invoke_callbacks+0x108/0x1a4
> >>>>> [ 105.484743] process_one_work+0x1dc/0x48c
> >>>>> [ 105.492468] worker_thread+0x7c/0x464
> >>>>> [ 105.492471] kthread+0x168/0x16c
> >>>>> [ 105.492473] ret_from_fork+0x10/0x18
> >>>>> ...
> >>>>>
> >>>>> After analyse the process, we find that it will
> >>>>> device_del(&shost->gendev) in function scsi_remove_host() and then
> >>>>>
> >>>>> put_device(&shost->shost_gendev) in function scsi_host_put() when
> >>>>> removing the driver, if there is a link between shost and hisi_hba->dev,
> >>>>>
> >>>>> it will try to delete the link in device_del(), and also will
> >>>>> call_srcu(__device_link_free_srcu) to put_device() link->consumer and
> >>>>> supplier.
> >>>>>
> >>>>> But if put device() for shost_gendev in device_link_free() is later
> >>>>> than in scsi_host_put(), it will call scsi_host_dev_release() in
> >>>>>
> >>>>> srcu_invoke_callbacks() while it is atomic and there are scheduling in
> >>>>> scsi_host_dev_release(),
> >>>>>
> >>>>> so it reports the BUG "scheduling while atomic:...".
> >>>>>
> >>>>> thread 1 thread2
> >>>>> hisi_sas_v3_remove
> >>>>> ...
> >>>>> sas_remove_host()
> >>>>> ...
> >>>>> scsi_remove_host()
> >>>>> ...
> >>>>> device_del(&shost->shost_gendev)
> >>>>> ...
> >>>>> device_link_purge()
> >>>>> __device_link_del()
> >>>>> device_unregister(&link->link_dev)
> >>>>> devlink_dev_release
> >>>>> call_srcu(__device_link_free_srcu) ----------->
> >>>>> srcu_invoke_callbacks (atomic)
> >>>>> __device_link_free_srcu
> >>>>> ...
> >>>>> scsi_host_put()
> >>>>> put_device(&shost->shost_gendev) (ref = 1)
> >>>>> device_link_free()
> >>>>> put_device(link->consumer)
> >>>>> //shost->gendev ref = 0
> >>>>> ...
> >>>>> scsi_host_dev_release
> >>>>> ...
> >>>>> rcu_barrier
> >>>>> kthread_stop()
> >>>>>
> >>>>>
> >>>>> We can check kref of shost->shost_gendev to make sure scsi_host_put()
> >>>>> to release scsi host device in LLDD driver to avoid the issue,
> >>>>>
> >>>>> but it seems be a common issue: function __device_link_free_srcu
> >>>>> calls put_device() for consumer and supplier,
> >>>>>
> >>>>> but if it's ref =0 at that time and there are scheduling or sleep in
> >>>>> dev_release, it may have the issue.
> >>>>>
> >>>>> Do you have any idea about the issue?
> >>>>>
> >>>> Yes, this is a general issue.
> >>>>
> >>>> If I'm not mistaken, it can be addressed by further deferring the
> >>>> device_link_free() invocation through a workqueue.
> >>>>
> >>>> Let me cut a patch doing this.
> >>> Please test the patch below and let me know if it works for you.
> >> I have a test on the patch, and it solves my issue.
> > Great, thanks!
> >
> > Please also test the patch appended below (it uses a slightly different approach).
>
> I have a test on this change, and it also solves my issue.

Awesome, thanks!

> >
> > ---
> > drivers/base/core.c | 37 +++++++++++++++++++++++--------------
> > include/linux/device.h | 6 ++----
> > 2 files changed, 25 insertions(+), 18 deletions(-)
> >
> > Index: linux-pm/drivers/base/core.c
> > ===================================================================
> > --- linux-pm.orig/drivers/base/core.c
> > +++ linux-pm/drivers/base/core.c
> > @@ -193,6 +193,11 @@ int device_links_read_lock_held(void)
> > {
> > return srcu_read_lock_held(&device_links_srcu);
> > }
> > +
> > +void device_link_synchronize_removal(void)
> > +{
> > + synchronize_srcu(&device_links_srcu);
> > +}
> > #else /* !CONFIG_SRCU */
> > static DECLARE_RWSEM(device_links_lock);
> >
> > @@ -223,6 +228,10 @@ int device_links_read_lock_held(void)
> > return lockdep_is_held(&device_links_lock);
> > }
> > #endif
> > +
> > +static inline void device_link_synchronize_removal(void)
> > +{
> > +}
> > #endif /* !CONFIG_SRCU */
> >
> > static bool device_is_ancestor(struct device *dev, struct device *target)
> > @@ -444,8 +453,13 @@ static struct attribute *devlink_attrs[]
> > };
> > ATTRIBUTE_GROUPS(devlink);
> >
> > -static void device_link_free(struct device_link *link)
> > +static void device_link_release_fn(struct work_struct *work)
> > {
> > + struct device_link *link = container_of(work, struct device_link, rm_work);
> > +
> > + /* Ensure that all references to the link object have been dropped. */
> > + device_link_synchronize_removal();
> > +
> > while (refcount_dec_not_one(&link->rpm_active))
> > pm_runtime_put(link->supplier);
> >
> > @@ -454,24 +468,19 @@ static void device_link_free(struct devi
> > kfree(link);
> > }
> >
> > -#ifdef CONFIG_SRCU
> > -static void __device_link_free_srcu(struct rcu_head *rhead)
> > -{
> > - device_link_free(container_of(rhead, struct device_link, rcu_head));
> > -}
> > -
> > static void devlink_dev_release(struct device *dev)
> > {
> > struct device_link *link = to_devlink(dev);
> >
> > - call_srcu(&device_links_srcu, &link->rcu_head, __device_link_free_srcu);
> > -}
> > -#else
> > -static void devlink_dev_release(struct device *dev)
> > -{
> > - device_link_free(to_devlink(dev));
> > + INIT_WORK(&link->rm_work, device_link_release_fn);
> > + /*
> > + * It may take a while to complete this work because of the SRCU
> > + * synchronization in device_link_release_fn() and if the consumer or
> > + * supplier devices get deleted when it runs, so put it into the "long"
> > + * workqueue.
> > + */
> > + queue_work(system_long_wq, &link->rm_work);
> > }
> > -#endif
> >
> > static struct class devlink_class = {
> > .name = "devlink",
> > Index: linux-pm/include/linux/device.h
> > ===================================================================
> > --- linux-pm.orig/include/linux/device.h
> > +++ linux-pm/include/linux/device.h
> > @@ -570,7 +570,7 @@ struct device {
> > * @flags: Link flags.
> > * @rpm_active: Whether or not the consumer device is runtime-PM-active.
> > * @kref: Count repeated addition of the same link.
> > - * @rcu_head: An RCU head to use for deferred execution of SRCU callbacks.
> > + * @rm_work: Work structure used for removing the link.
> > * @supplier_preactivated: Supplier has been made active before consumer probe.
> > */
> > struct device_link {
> > @@ -583,9 +583,7 @@ struct device_link {
> > u32 flags;
> > refcount_t rpm_active;
> > struct kref kref;
> > -#ifdef CONFIG_SRCU
> > - struct rcu_head rcu_head;
> > -#endif
> > + struct work_struct rm_work;
> > bool supplier_preactivated; /* Owned by consumer probe. */
> > };
> >
> >
> >
> >
> >
> > .
> >
>
>

2021-05-13 20:22:53

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: Qestion about device link

On Thu, May 13, 2021 at 12:24 AM Saravana Kannan <[email protected]> wrote:
>
> On Wed, May 12, 2021 at 7:04 AM Rafael J. Wysocki <[email protected]> wrote:
> >
> > On Wednesday, May 12, 2021 5:24:53 AM CEST chenxiang (M) wrote:
> > > Hi Rafael,
> > >
> > >
> > > 在 2021/5/12 3:16, Rafael J. Wysocki 写道:
> > > > On Tuesday, May 11, 2021 4:39:31 PM CEST Rafael J. Wysocki wrote:
> > > >> On 5/11/2021 5:59 AM, chenxiang (M) wrote:
> > > >>> Hi Rafael and other guys,
> > > >>>
> > > >>> I am trying to add a device link between scsi_host->shost_gendev and
> > > >>> hisi_hba->dev to support runtime PM for hisi_hba driver
> > > >>>
> > > >>> (as it supports runtime PM for scsi host in some scenarios such as
> > > >>> error handler etc, we can avoid to do them again if adding a
> > > >>>
> > > >>> device link between scsi_host->shost_gendev and hisi_hba->dev) as
> > > >>> follows (hisi_sas driver is under directory drivers/scsi/hisi_sas):
> > > >>>
> > > >>> device_link_add(&shost->shost_gendev, hisi_hba->dev,
> > > >>> DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE)
> > > >>>
> > > >>> We have a full test on it, and it works well except when rmmod the
> > > >>> driver, some call trace occurs as follows:
> > > >>>
> > > >>> [root@localhost ~]# rmmod hisi_sas_v3_hw
> > > >>> [ 105.377944] BUG: scheduling while atomic: kworker/113:1/811/0x00000201
> > > >>> [ 105.384469] Modules linked in: bluetooth rfkill ib_isert
> > > >>> iscsi_target_mod ib_ipoib ib_umad iptable_filter vfio_iommu_type1
> > > >>> vfio_pci vfio_virqfd vfio rpcrdma ib_is er
> > > >>> libiscsi scsi_transport_iscsi crct10dif_ce sbsa_gwdt hns_roce_hw_v2
> > > >>> hisi_sec2 hisi_hpre hisi_zip hisi_qm uacce spi_hisi_sfc_v3xx
> > > >>> hisi_trng_v2 rng_core hisi_uncore _hha_pmu
> > > >>> hisi_uncore_ddrc_pmu hisi_uncore_l3c_pmu spi_dw_mmio hisi_uncore_pmu
> > > >>> hns3 hclge hnae3 hisi_sas_v3_hw(-) hisi_sas_main libsas
> > > >>> [ 105.424841] CPU: 113 PID: 811 Comm: kworker/113:1 Kdump: loaded
> > > >>> Tainted: G W 5.12.0-rc1+ #1
> > > >>> [ 105.434454] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS
> > > >>> 2280-V2 CS V5.B143.01 04/22/2021
> > > >>> [ 105.443287] Workqueue: rcu_gp srcu_invoke_callbacks
> > > >>> [ 105.448154] Call trace:
> > > >>> [ 105.450593] dump_backtrace+0x0/0x1a4
> > > >>> [ 105.454245] show_stack+0x24/0x40
> > > >>> [ 105.457548] dump_stack+0xc8/0x104
> > > >>> [ 105.460939] __schedule_bug+0x68/0x80
> > > >>> [ 105.464590] __schedule+0x73c/0x77c
> > > >>> [ 105.465700] BUG: scheduling while atomic: kworker/96:1/791/0x00000201
> > > >>> [ 105.468066] schedule+0x7c/0x110
> > > >>> [ 105.468068] schedule_timeout+0x194/0x1d4
> > > >>> [ 105.474490] Modules linked in:
> > > >>> [ 105.477692] wait_for_completion+0x8c/0x12c
> > > >>> [ 105.477695] rcu_barrier+0x1e0/0x2fc
> > > >>> [ 105.477697] scsi_host_dev_release+0x50/0xf0
> > > >>> [ 105.477701] device_release+0x40/0xa0
> > > >>> [ 105.477704] kobject_put+0xac/0x100
> > > >>> [ 105.477707] __device_link_free_srcu+0x50/0x74
> > > >>> [ 105.477709] srcu_invoke_callbacks+0x108/0x1a4
> > > >>> [ 105.484743] process_one_work+0x1dc/0x48c
> > > >>> [ 105.492468] worker_thread+0x7c/0x464
> > > >>> [ 105.492471] kthread+0x168/0x16c
> > > >>> [ 105.492473] ret_from_fork+0x10/0x18
> > > >>> ...
> > > >>>
> > > >>> After analyse the process, we find that it will
> > > >>> device_del(&shost->gendev) in function scsi_remove_host() and then
> > > >>>
> > > >>> put_device(&shost->shost_gendev) in function scsi_host_put() when
> > > >>> removing the driver, if there is a link between shost and hisi_hba->dev,
> > > >>>
> > > >>> it will try to delete the link in device_del(), and also will
> > > >>> call_srcu(__device_link_free_srcu) to put_device() link->consumer and
> > > >>> supplier.
> > > >>>
> > > >>> But if put device() for shost_gendev in device_link_free() is later
> > > >>> than in scsi_host_put(), it will call scsi_host_dev_release() in
> > > >>>
> > > >>> srcu_invoke_callbacks() while it is atomic and there are scheduling in
> > > >>> scsi_host_dev_release(),
> > > >>>
> > > >>> so it reports the BUG "scheduling while atomic:...".
> > > >>>
> > > >>> thread 1 thread2
> > > >>> hisi_sas_v3_remove
> > > >>> ...
> > > >>> sas_remove_host()
> > > >>> ...
> > > >>> scsi_remove_host()
> > > >>> ...
> > > >>> device_del(&shost->shost_gendev)
> > > >>> ...
> > > >>> device_link_purge()
> > > >>> __device_link_del()
> > > >>> device_unregister(&link->link_dev)
> > > >>> devlink_dev_release
> > > >>> call_srcu(__device_link_free_srcu) ----------->
> > > >>> srcu_invoke_callbacks (atomic)
> > > >>> __device_link_free_srcu
> > > >>> ...
> > > >>> scsi_host_put()
> > > >>> put_device(&shost->shost_gendev) (ref = 1)
> > > >>> device_link_free()
> > > >>> put_device(link->consumer)
> > > >>> //shost->gendev ref = 0
> > > >>> ...
> > > >>> scsi_host_dev_release
> > > >>> ...
> > > >>> rcu_barrier
> > > >>> kthread_stop()
> > > >>>
> > > >>>
> > > >>> We can check kref of shost->shost_gendev to make sure scsi_host_put()
> > > >>> to release scsi host device in LLDD driver to avoid the issue,
> > > >>>
> > > >>> but it seems be a common issue: function __device_link_free_srcu
> > > >>> calls put_device() for consumer and supplier,
> > > >>>
> > > >>> but if it's ref =0 at that time and there are scheduling or sleep in
> > > >>> dev_release, it may have the issue.
> > > >>>
> > > >>> Do you have any idea about the issue?
> > > >>>
> > > >> Yes, this is a general issue.
> > > >>
> > > >> If I'm not mistaken, it can be addressed by further deferring the
> > > >> device_link_free() invocation through a workqueue.
> > > >>
> > > >> Let me cut a patch doing this.
> > > > Please test the patch below and let me know if it works for you.
> > >
> > > I have a test on the patch, and it solves my issue.
> >
> > Great, thanks!
> >
> > Please also test the patch appended below (it uses a slightly different approach).
> >
> > ---
> > drivers/base/core.c | 37 +++++++++++++++++++++++--------------
> > include/linux/device.h | 6 ++----
> > 2 files changed, 25 insertions(+), 18 deletions(-)
> >
> > Index: linux-pm/drivers/base/core.c
> > ===================================================================
> > --- linux-pm.orig/drivers/base/core.c
> > +++ linux-pm/drivers/base/core.c
> > @@ -193,6 +193,11 @@ int device_links_read_lock_held(void)
> > {
> > return srcu_read_lock_held(&device_links_srcu);
> > }
> > +
> > +void device_link_synchronize_removal(void)
> > +{
> > + synchronize_srcu(&device_links_srcu);
> > +}
> > #else /* !CONFIG_SRCU */
> > static DECLARE_RWSEM(device_links_lock);
> >
> > @@ -223,6 +228,10 @@ int device_links_read_lock_held(void)
> > return lockdep_is_held(&device_links_lock);
> > }
> > #endif
> > +
> > +static inline void device_link_synchronize_removal(void)
> > +{
> > +}
> > #endif /* !CONFIG_SRCU */
> >
> > static bool device_is_ancestor(struct device *dev, struct device *target)
> > @@ -444,8 +453,13 @@ static struct attribute *devlink_attrs[]
> > };
> > ATTRIBUTE_GROUPS(devlink);
> >
> > -static void device_link_free(struct device_link *link)
> > +static void device_link_release_fn(struct work_struct *work)
> > {
> > + struct device_link *link = container_of(work, struct device_link, rm_work);
> > +
> > + /* Ensure that all references to the link object have been dropped. */
> > + device_link_synchronize_removal();
> > +
> > while (refcount_dec_not_one(&link->rpm_active))
> > pm_runtime_put(link->supplier);
> >
> > @@ -454,24 +468,19 @@ static void device_link_free(struct devi
> > kfree(link);
> > }
> >
> > -#ifdef CONFIG_SRCU
> > -static void __device_link_free_srcu(struct rcu_head *rhead)
> > -{
> > - device_link_free(container_of(rhead, struct device_link, rcu_head));
> > -}
> > -
> > static void devlink_dev_release(struct device *dev)
> > {
> > struct device_link *link = to_devlink(dev);
> >
> > - call_srcu(&device_links_srcu, &link->rcu_head, __device_link_free_srcu);
> > -}
> > -#else
> > -static void devlink_dev_release(struct device *dev)
> > -{
> > - device_link_free(to_devlink(dev));
> > + INIT_WORK(&link->rm_work, device_link_release_fn);
> > + /*
> > + * It may take a while to complete this work because of the SRCU
> > + * synchronization in device_link_release_fn() and if the consumer or
> > + * supplier devices get deleted when it runs, so put it into the "long"
> > + * workqueue.
> > + */
> > + queue_work(system_long_wq, &link->rm_work);
>
> Not too strong of an opinion, but this seems like an unnecessary work
> queue when SRCUs aren't enabled.

It is not strictly necessary then, but I want the code with and
without SRCU to be as similar as possible and it doesn't really hurt
to defer the freeing of memory associated with the device link even in
the non-SRCU case, because whoever drops the last reference to the
device link doesn't really care when that memory gets released and may
not want to wait until that actually happens.

> We could just leave this part as is and limit your changes to the SRCU implementation?

I don't really see why that would be better.

2021-05-13 22:38:27

by Saravana Kannan

[permalink] [raw]
Subject: Re: Qestion about device link

On Thu, May 13, 2021 at 5:13 AM Rafael J. Wysocki <[email protected]> wrote:
>
> On Thu, May 13, 2021 at 12:24 AM Saravana Kannan <[email protected]> wrote:
> >
> > On Wed, May 12, 2021 at 7:04 AM Rafael J. Wysocki <[email protected]> wrote:
> > >
> > > On Wednesday, May 12, 2021 5:24:53 AM CEST chenxiang (M) wrote:
> > > > Hi Rafael,
> > > >
> > > >
> > > > 在 2021/5/12 3:16, Rafael J. Wysocki 写道:
> > > > > On Tuesday, May 11, 2021 4:39:31 PM CEST Rafael J. Wysocki wrote:
> > > > >> On 5/11/2021 5:59 AM, chenxiang (M) wrote:
> > > > >>> Hi Rafael and other guys,
> > > > >>>
> > > > >>> I am trying to add a device link between scsi_host->shost_gendev and
> > > > >>> hisi_hba->dev to support runtime PM for hisi_hba driver
> > > > >>>
> > > > >>> (as it supports runtime PM for scsi host in some scenarios such as
> > > > >>> error handler etc, we can avoid to do them again if adding a
> > > > >>>
> > > > >>> device link between scsi_host->shost_gendev and hisi_hba->dev) as
> > > > >>> follows (hisi_sas driver is under directory drivers/scsi/hisi_sas):
> > > > >>>
> > > > >>> device_link_add(&shost->shost_gendev, hisi_hba->dev,
> > > > >>> DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE)
> > > > >>>
> > > > >>> We have a full test on it, and it works well except when rmmod the
> > > > >>> driver, some call trace occurs as follows:
> > > > >>>
> > > > >>> [root@localhost ~]# rmmod hisi_sas_v3_hw
> > > > >>> [ 105.377944] BUG: scheduling while atomic: kworker/113:1/811/0x00000201
> > > > >>> [ 105.384469] Modules linked in: bluetooth rfkill ib_isert
> > > > >>> iscsi_target_mod ib_ipoib ib_umad iptable_filter vfio_iommu_type1
> > > > >>> vfio_pci vfio_virqfd vfio rpcrdma ib_is er
> > > > >>> libiscsi scsi_transport_iscsi crct10dif_ce sbsa_gwdt hns_roce_hw_v2
> > > > >>> hisi_sec2 hisi_hpre hisi_zip hisi_qm uacce spi_hisi_sfc_v3xx
> > > > >>> hisi_trng_v2 rng_core hisi_uncore _hha_pmu
> > > > >>> hisi_uncore_ddrc_pmu hisi_uncore_l3c_pmu spi_dw_mmio hisi_uncore_pmu
> > > > >>> hns3 hclge hnae3 hisi_sas_v3_hw(-) hisi_sas_main libsas
> > > > >>> [ 105.424841] CPU: 113 PID: 811 Comm: kworker/113:1 Kdump: loaded
> > > > >>> Tainted: G W 5.12.0-rc1+ #1
> > > > >>> [ 105.434454] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS
> > > > >>> 2280-V2 CS V5.B143.01 04/22/2021
> > > > >>> [ 105.443287] Workqueue: rcu_gp srcu_invoke_callbacks
> > > > >>> [ 105.448154] Call trace:
> > > > >>> [ 105.450593] dump_backtrace+0x0/0x1a4
> > > > >>> [ 105.454245] show_stack+0x24/0x40
> > > > >>> [ 105.457548] dump_stack+0xc8/0x104
> > > > >>> [ 105.460939] __schedule_bug+0x68/0x80
> > > > >>> [ 105.464590] __schedule+0x73c/0x77c
> > > > >>> [ 105.465700] BUG: scheduling while atomic: kworker/96:1/791/0x00000201
> > > > >>> [ 105.468066] schedule+0x7c/0x110
> > > > >>> [ 105.468068] schedule_timeout+0x194/0x1d4
> > > > >>> [ 105.474490] Modules linked in:
> > > > >>> [ 105.477692] wait_for_completion+0x8c/0x12c
> > > > >>> [ 105.477695] rcu_barrier+0x1e0/0x2fc
> > > > >>> [ 105.477697] scsi_host_dev_release+0x50/0xf0
> > > > >>> [ 105.477701] device_release+0x40/0xa0
> > > > >>> [ 105.477704] kobject_put+0xac/0x100
> > > > >>> [ 105.477707] __device_link_free_srcu+0x50/0x74
> > > > >>> [ 105.477709] srcu_invoke_callbacks+0x108/0x1a4
> > > > >>> [ 105.484743] process_one_work+0x1dc/0x48c
> > > > >>> [ 105.492468] worker_thread+0x7c/0x464
> > > > >>> [ 105.492471] kthread+0x168/0x16c
> > > > >>> [ 105.492473] ret_from_fork+0x10/0x18
> > > > >>> ...
> > > > >>>
> > > > >>> After analyse the process, we find that it will
> > > > >>> device_del(&shost->gendev) in function scsi_remove_host() and then
> > > > >>>
> > > > >>> put_device(&shost->shost_gendev) in function scsi_host_put() when
> > > > >>> removing the driver, if there is a link between shost and hisi_hba->dev,
> > > > >>>
> > > > >>> it will try to delete the link in device_del(), and also will
> > > > >>> call_srcu(__device_link_free_srcu) to put_device() link->consumer and
> > > > >>> supplier.
> > > > >>>
> > > > >>> But if put device() for shost_gendev in device_link_free() is later
> > > > >>> than in scsi_host_put(), it will call scsi_host_dev_release() in
> > > > >>>
> > > > >>> srcu_invoke_callbacks() while it is atomic and there are scheduling in
> > > > >>> scsi_host_dev_release(),
> > > > >>>
> > > > >>> so it reports the BUG "scheduling while atomic:...".
> > > > >>>
> > > > >>> thread 1 thread2
> > > > >>> hisi_sas_v3_remove
> > > > >>> ...
> > > > >>> sas_remove_host()
> > > > >>> ...
> > > > >>> scsi_remove_host()
> > > > >>> ...
> > > > >>> device_del(&shost->shost_gendev)
> > > > >>> ...
> > > > >>> device_link_purge()
> > > > >>> __device_link_del()
> > > > >>> device_unregister(&link->link_dev)
> > > > >>> devlink_dev_release
> > > > >>> call_srcu(__device_link_free_srcu) ----------->
> > > > >>> srcu_invoke_callbacks (atomic)
> > > > >>> __device_link_free_srcu
> > > > >>> ...
> > > > >>> scsi_host_put()
> > > > >>> put_device(&shost->shost_gendev) (ref = 1)
> > > > >>> device_link_free()
> > > > >>> put_device(link->consumer)
> > > > >>> //shost->gendev ref = 0
> > > > >>> ...
> > > > >>> scsi_host_dev_release
> > > > >>> ...
> > > > >>> rcu_barrier
> > > > >>> kthread_stop()
> > > > >>>
> > > > >>>
> > > > >>> We can check kref of shost->shost_gendev to make sure scsi_host_put()
> > > > >>> to release scsi host device in LLDD driver to avoid the issue,
> > > > >>>
> > > > >>> but it seems be a common issue: function __device_link_free_srcu
> > > > >>> calls put_device() for consumer and supplier,
> > > > >>>
> > > > >>> but if it's ref =0 at that time and there are scheduling or sleep in
> > > > >>> dev_release, it may have the issue.
> > > > >>>
> > > > >>> Do you have any idea about the issue?
> > > > >>>
> > > > >> Yes, this is a general issue.
> > > > >>
> > > > >> If I'm not mistaken, it can be addressed by further deferring the
> > > > >> device_link_free() invocation through a workqueue.
> > > > >>
> > > > >> Let me cut a patch doing this.
> > > > > Please test the patch below and let me know if it works for you.
> > > >
> > > > I have a test on the patch, and it solves my issue.
> > >
> > > Great, thanks!
> > >
> > > Please also test the patch appended below (it uses a slightly different approach).
> > >
> > > ---
> > > drivers/base/core.c | 37 +++++++++++++++++++++++--------------
> > > include/linux/device.h | 6 ++----
> > > 2 files changed, 25 insertions(+), 18 deletions(-)
> > >
> > > Index: linux-pm/drivers/base/core.c
> > > ===================================================================
> > > --- linux-pm.orig/drivers/base/core.c
> > > +++ linux-pm/drivers/base/core.c
> > > @@ -193,6 +193,11 @@ int device_links_read_lock_held(void)
> > > {
> > > return srcu_read_lock_held(&device_links_srcu);
> > > }
> > > +
> > > +void device_link_synchronize_removal(void)
> > > +{
> > > + synchronize_srcu(&device_links_srcu);
> > > +}
> > > #else /* !CONFIG_SRCU */
> > > static DECLARE_RWSEM(device_links_lock);
> > >
> > > @@ -223,6 +228,10 @@ int device_links_read_lock_held(void)
> > > return lockdep_is_held(&device_links_lock);
> > > }
> > > #endif
> > > +
> > > +static inline void device_link_synchronize_removal(void)
> > > +{
> > > +}
> > > #endif /* !CONFIG_SRCU */
> > >
> > > static bool device_is_ancestor(struct device *dev, struct device *target)
> > > @@ -444,8 +453,13 @@ static struct attribute *devlink_attrs[]
> > > };
> > > ATTRIBUTE_GROUPS(devlink);
> > >
> > > -static void device_link_free(struct device_link *link)
> > > +static void device_link_release_fn(struct work_struct *work)
> > > {
> > > + struct device_link *link = container_of(work, struct device_link, rm_work);
> > > +
> > > + /* Ensure that all references to the link object have been dropped. */
> > > + device_link_synchronize_removal();
> > > +
> > > while (refcount_dec_not_one(&link->rpm_active))
> > > pm_runtime_put(link->supplier);
> > >
> > > @@ -454,24 +468,19 @@ static void device_link_free(struct devi
> > > kfree(link);
> > > }
> > >
> > > -#ifdef CONFIG_SRCU
> > > -static void __device_link_free_srcu(struct rcu_head *rhead)
> > > -{
> > > - device_link_free(container_of(rhead, struct device_link, rcu_head));
> > > -}
> > > -
> > > static void devlink_dev_release(struct device *dev)
> > > {
> > > struct device_link *link = to_devlink(dev);
> > >
> > > - call_srcu(&device_links_srcu, &link->rcu_head, __device_link_free_srcu);
> > > -}
> > > -#else
> > > -static void devlink_dev_release(struct device *dev)
> > > -{
> > > - device_link_free(to_devlink(dev));
> > > + INIT_WORK(&link->rm_work, device_link_release_fn);
> > > + /*
> > > + * It may take a while to complete this work because of the SRCU
> > > + * synchronization in device_link_release_fn() and if the consumer or
> > > + * supplier devices get deleted when it runs, so put it into the "long"
> > > + * workqueue.
> > > + */
> > > + queue_work(system_long_wq, &link->rm_work);
> >
> > Not too strong of an opinion, but this seems like an unnecessary work
> > queue when SRCUs aren't enabled.
>
> It is not strictly necessary then, but I want the code with and
> without SRCU to be as similar as possible and it doesn't really hurt
> to defer the freeing of memory associated with the device link even in
> the non-SRCU case, because whoever drops the last reference to the
> device link doesn't really care when that memory gets released and may
> not want to wait until that actually happens.

Yeah, I fully understand why you did it. I was just wondering if it
was worth the additional CPU cycles and context switching. I'll leave
it at that.

Whichever way you go:
Reviewed-by: Saravana Kannan <[email protected]>

-Saravana

>
> > We could just leave this part as is and limit your changes to the SRCU implementation?
>
> I don't really see why that would be better.