2019-11-19 01:20:35

by Xiang Zheng

Subject: [PATCH v2] pci: lock the pci_cfg_wait queue for the consistency of data

Commit "7ea7e98fd8d0" suggests that the "pci_lock" is sufficient,
and all the callers of pci_wait_cfg() are wrapped with the "pci_lock".

However, since commit "cdcb33f98244" was merged, accesses to the
pci_cfg_wait queue are no longer safe. The "pci_lock" alone is
insufficient, and we need to hold the queue's own lock as well while
reading or writing the wait queue.

So let's use the add_wait_queue()/remove_wait_queue() instead of
__add_wait_queue()/__remove_wait_queue(). Also move the wait queue
functionality around the "schedule()" function to avoid reintroducing
the deadlock addressed by "cdcb33f98244".

Signed-off-by: Xiang Zheng <[email protected]>
Cc: Heyi Guo <[email protected]>
Cc: Biaoxiang Ye <[email protected]>
---

v2:
- Move the wait queue functionality around the "schedule()" function to
avoid reintroducing the deadlock addressed by "cdcb33f98244"

---

drivers/pci/access.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/access.c b/drivers/pci/access.c
index 2fccb5762c76..09342a74e5ea 100644
--- a/drivers/pci/access.c
+++ b/drivers/pci/access.c
@@ -207,14 +207,14 @@ static noinline void pci_wait_cfg(struct pci_dev *dev)
 {
 	DECLARE_WAITQUEUE(wait, current);

-	__add_wait_queue(&pci_cfg_wait, &wait);
 	do {
 		set_current_state(TASK_UNINTERRUPTIBLE);
 		raw_spin_unlock_irq(&pci_lock);
+		add_wait_queue(&pci_cfg_wait, &wait);
 		schedule();
+		remove_wait_queue(&pci_cfg_wait, &wait);
 		raw_spin_lock_irq(&pci_lock);
 	} while (dev->block_cfg_access);
-	__remove_wait_queue(&pci_cfg_wait, &wait);
 }

/* Returns 0 on success, negative values indicate error. */
--
2.19.1
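
The locking distinction the commit message relies on can be illustrated outside the kernel. The following is a minimal userspace sketch, not kernel code: every name in it (wq_head, wq_add, wq_add_unlocked, wq_stress, and so on) is hypothetical, and a pthread mutex stands in for the wait queue head's internal lock. It assumes only that the double-underscore helpers expect the caller to serialize access to the list, while the plain helpers take the queue's own lock around the list update.

```c
/* Userspace model only. All names are hypothetical, not kernel APIs. */
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 16

struct wq_entry {
	struct wq_entry *prev, *next;
};

struct wq_head {
	pthread_mutex_t lock;	/* stands in for the queue's internal lock */
	struct wq_entry head;	/* sentinel of a circular doubly linked list */
	long nr_entries;	/* bookkeeping so consistency can be checked */
};

static void wq_init(struct wq_head *q)
{
	pthread_mutex_init(&q->lock, NULL);
	q->head.prev = q->head.next = &q->head;
	q->nr_entries = 0;
}

/* Unlocked variants: the caller must already hold q->lock. */
static void wq_add_unlocked(struct wq_head *q, struct wq_entry *e)
{
	e->next = q->head.next;
	e->prev = &q->head;
	q->head.next->prev = e;
	q->head.next = e;
	q->nr_entries++;
}

static void wq_remove_unlocked(struct wq_head *q, struct wq_entry *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	q->nr_entries--;
}

/* Locked variants: take the queue's own lock around the list update. */
static void wq_add(struct wq_head *q, struct wq_entry *e)
{
	pthread_mutex_lock(&q->lock);
	wq_add_unlocked(q, e);
	pthread_mutex_unlock(&q->lock);
}

static void wq_remove(struct wq_head *q, struct wq_entry *e)
{
	pthread_mutex_lock(&q->lock);
	wq_remove_unlocked(q, e);
	pthread_mutex_unlock(&q->lock);
}

struct worker_arg {
	struct wq_head *q;
	int iterations;
};

/* Each worker repeatedly enqueues and dequeues its own on-stack entry. */
static void *worker(void *p)
{
	struct worker_arg *arg = p;
	struct wq_entry e;

	for (int i = 0; i < arg->iterations; i++) {
		wq_add(arg->q, &e);
		wq_remove(arg->q, &e);
	}
	return NULL;
}

/* Returns 1 if the queue is empty and well formed after the run. */
static int wq_stress(int nthreads, int iterations)
{
	struct wq_head q;
	struct worker_arg arg = { &q, iterations };
	pthread_t tids[MAX_THREADS];
	int created = 0, ok;

	if (nthreads > MAX_THREADS)
		nthreads = MAX_THREADS;
	wq_init(&q);
	for (int i = 0; i < nthreads; i++) {
		if (pthread_create(&tids[created], NULL, worker, &arg) != 0)
			break;
		created++;
	}
	for (int i = 0; i < created; i++)
		pthread_join(tids[i], NULL);

	ok = q.nr_entries == 0 &&
	     q.head.next == &q.head && q.head.prev == &q.head;
	pthread_mutex_destroy(&q.lock);
	return ok;
}
```

Calling wq_stress(4, 10000) exercises concurrent add/remove through the locked helpers. Replacing wq_add()/wq_remove() in worker() with the unlocked variants, without any shared outer lock, would leave the list updates unserialized, which is the kind of inconsistency the patch description is concerned with.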



2019-11-19 20:26:55

by Bjorn Helgaas

Subject: Re: [PATCH v2] pci: lock the pci_cfg_wait queue for the consistency of data

On Tue, Nov 19, 2019 at 09:15:45AM +0800, Xiang Zheng wrote:
> Commit "7ea7e98fd8d0" suggests that the "pci_lock" is sufficient,
> and all the callers of pci_wait_cfg() are wrapped with the "pci_lock".
>
> However, since commit "cdcb33f98244" was merged, accesses to the
> pci_cfg_wait queue are no longer safe. The "pci_lock" alone is
> insufficient, and we need to hold the queue's own lock as well while
> reading or writing the wait queue.
>
> So let's use the add_wait_queue()/remove_wait_queue() instead of
> __add_wait_queue()/__remove_wait_queue(). Also move the wait queue
> functionality around the "schedule()" function to avoid reintroducing
> the deadlock addressed by "cdcb33f98244".

Procedural nits:

- Run "git log --oneline drivers/pci/access.c" and follow the
  convention there, e.g., the subject starts with "PCI: " and the
  first word after it is capitalized.

- Use conventional commit references, e.g., 7ea7e98fd8d0 ("PCI:
  Block on access to temporarily unavailable pci device") and
  cdcb33f98244 ("PCI: Avoid possible deadlock on pci_lock and
  p->pi_lock").

- IIRC you found that this actually caused a panic; please include
  the lore.kernel.org URL to that report.

You can wait for a while to see if there are more substantive comments
to address before posting a v3.

> Signed-off-by: Xiang Zheng <[email protected]>
> Cc: Heyi Guo <[email protected]>
> Cc: Biaoxiang Ye <[email protected]>
> ---
>
> v2:
> - Move the wait queue functionality around the "schedule()" function to
> avoid reintroducing the deadlock addressed by "cdcb33f98244"
>
> ---
>
> drivers/pci/access.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/pci/access.c b/drivers/pci/access.c
> index 2fccb5762c76..09342a74e5ea 100644
> --- a/drivers/pci/access.c
> +++ b/drivers/pci/access.c
> @@ -207,14 +207,14 @@ static noinline void pci_wait_cfg(struct pci_dev *dev)
>  {
>  	DECLARE_WAITQUEUE(wait, current);
>
> -	__add_wait_queue(&pci_cfg_wait, &wait);
>  	do {
>  		set_current_state(TASK_UNINTERRUPTIBLE);
>  		raw_spin_unlock_irq(&pci_lock);
> +		add_wait_queue(&pci_cfg_wait, &wait);
>  		schedule();
> +		remove_wait_queue(&pci_cfg_wait, &wait);
>  		raw_spin_lock_irq(&pci_lock);
>  	} while (dev->block_cfg_access);
> -	__remove_wait_queue(&pci_cfg_wait, &wait);
>  }
>
> /* Returns 0 on success, negative values indicate error. */
> --
> 2.19.1
>
>

2019-11-20 06:23:07

by Xiang Zheng

Subject: Re: [PATCH v2] pci: lock the pci_cfg_wait queue for the consistency of data


On 2019/11/20 4:23, Bjorn Helgaas wrote:
> On Tue, Nov 19, 2019 at 09:15:45AM +0800, Xiang Zheng wrote:
>> Commit "7ea7e98fd8d0" suggests that the "pci_lock" is sufficient,
>> and all the callers of pci_wait_cfg() are wrapped with the "pci_lock".
>>
>> However, since commit "cdcb33f98244" was merged, accesses to the
>> pci_cfg_wait queue are no longer safe. The "pci_lock" alone is
>> insufficient, and we need to hold the queue's own lock as well while
>> reading or writing the wait queue.
>>
>> So let's use the add_wait_queue()/remove_wait_queue() instead of
>> __add_wait_queue()/__remove_wait_queue(). Also move the wait queue
>> functionality around the "schedule()" function to avoid reintroducing
>> the deadlock addressed by "cdcb33f98244".
>
> Procedural nits:
>
> - Run "git log --oneline drivers/pci/access.c" and follow the
> convention, e.g., starts with "PCI: " and first subsequent word is
> capitalized.
>
> - Use conventional commit references, e.g., 7ea7e98fd8d0 ("PCI:
> Block on access to temporarily unavailable pci device") and
> cdcb33f98244 ("PCI: Avoid possible deadlock on pci_lock and
> p->pi_lock")
>
> - IIRC you found that this actually caused a panic; please include
> the lore.kernel.org URL to that report.
>

Got it, I will address these nits.

> You can wait for a while to see if there are more substantive comments
> to address before posting a v3.
>

OK.

>> Signed-off-by: Xiang Zheng <[email protected]>
>> Cc: Heyi Guo <[email protected]>
>> Cc: Biaoxiang Ye <[email protected]>
>> ---
>>
>> v2:
>> - Move the wait queue functionality around the "schedule()" function to
>> avoid reintroducing the deadlock addressed by "cdcb33f98244"
>>
>> ---
>>
>> drivers/pci/access.c | 4 ++--
>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/pci/access.c b/drivers/pci/access.c
>> index 2fccb5762c76..09342a74e5ea 100644
>> --- a/drivers/pci/access.c
>> +++ b/drivers/pci/access.c
>> @@ -207,14 +207,14 @@ static noinline void pci_wait_cfg(struct pci_dev *dev)
>>  {
>>  	DECLARE_WAITQUEUE(wait, current);
>>
>> -	__add_wait_queue(&pci_cfg_wait, &wait);
>>  	do {
>>  		set_current_state(TASK_UNINTERRUPTIBLE);
>>  		raw_spin_unlock_irq(&pci_lock);
>> +		add_wait_queue(&pci_cfg_wait, &wait);
>>  		schedule();
>> +		remove_wait_queue(&pci_cfg_wait, &wait);
>>  		raw_spin_lock_irq(&pci_lock);
>>  	} while (dev->block_cfg_access);
>> -	__remove_wait_queue(&pci_cfg_wait, &wait);
>>  }
>>
>> /* Returns 0 on success, negative values indicate error. */
>> --
>> 2.19.1
>>
>>
>
> .
>

--

Thanks,
Xiang