When I use the following command to power on a slot which has been
powered off already.
echo 1 > /sys/bus/pci/slots/22/power
It prints the following error:
-bash: echo: write error: No such device
But the slot is actually powered on and the devices is probed.
In function 'pciehp_sysfs_enable_slot()', we use 'wait_event()' to wait
until 'ctrl->pending_events' is cleared in 'pciehp_ist()'. But in some
situation, when 'pciehp_ist()' is woken up on a nearby CPU after
'pciehp_request' is called, 'ctrl->pending_events' is cleared before we
go into sleep state. 'wait_event()' will check the condition before
going into sleep. So we return immediately and '-ENODEV' is return.
This patch use struct completion to wait until irq_thread 'pciehp_ist'
is completed.
Signed-off-by: Xiongfeng Wang <[email protected]>
---
drivers/pci/hotplug/pciehp.h | 2 +-
drivers/pci/hotplug/pciehp_ctrl.c | 8 ++++----
drivers/pci/hotplug/pciehp_hpc.c | 4 ++--
3 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h
index 8c51a04..f8c3826 100644
--- a/drivers/pci/hotplug/pciehp.h
+++ b/drivers/pci/hotplug/pciehp.h
@@ -102,7 +102,7 @@ struct controller {
struct hotplug_slot hotplug_slot; /* hotplug core interface */
struct rw_semaphore reset_lock;
int request_result;
- wait_queue_head_t requester;
+ struct completion requester;
};
/**
diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
index 631ced0..2237793 100644
--- a/drivers/pci/hotplug/pciehp_ctrl.c
+++ b/drivers/pci/hotplug/pciehp_ctrl.c
@@ -366,9 +366,9 @@ int pciehp_sysfs_enable_slot(struct hotplug_slot *hotplug_slot)
* card before the thread wakes up, so initialize to -ENODEV.
*/
ctrl->request_result = -ENODEV;
+ reinit_completion(&ctrl->requester);
pciehp_request(ctrl, PCI_EXP_SLTSTA_PDC);
- wait_event(ctrl->requester,
- !atomic_read(&ctrl->pending_events));
+ wait_for_completion(&ctrl->requester);
return ctrl->request_result;
case POWERON_STATE:
ctrl_info(ctrl, "Slot(%s): Already in powering on state\n",
@@ -399,9 +399,9 @@ int pciehp_sysfs_disable_slot(struct hotplug_slot *hotplug_slot)
case BLINKINGOFF_STATE:
case ON_STATE:
mutex_unlock(&ctrl->state_lock);
+ reinit_completion(&ctrl->requester);
pciehp_request(ctrl, DISABLE_SLOT);
- wait_event(ctrl->requester,
- !atomic_read(&ctrl->pending_events));
+ wait_for_completion(&ctrl->requester);
return ctrl->request_result;
case POWEROFF_STATE:
ctrl_info(ctrl, "Slot(%s): Already in powering off state\n",
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index bd990e3..0a74b48 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -654,7 +654,7 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id)
up_read(&ctrl->reset_lock);
pci_config_pm_runtime_put(pdev);
- wake_up(&ctrl->requester);
+ complete(&ctrl->requester);
return IRQ_HANDLED;
}
@@ -862,7 +862,7 @@ struct controller *pcie_init(struct pcie_device *dev)
mutex_init(&ctrl->ctrl_lock);
mutex_init(&ctrl->state_lock);
init_rwsem(&ctrl->reset_lock);
- init_waitqueue_head(&ctrl->requester);
+ init_completion(&ctrl->requester);
init_waitqueue_head(&ctrl->queue);
INIT_DELAYED_WORK(&ctrl->button_work, pciehp_queue_pushbutton_work);
dbg_ctrl(ctrl);
--
1.7.12.4
On Thu, Jul 04, 2019 at 03:50:38PM +0800, Xiongfeng Wang wrote:
> When I use the following command to power on a slot which has been
> powered off already.
> echo 1 > /sys/bus/pci/slots/22/power
> It prints the following error:
> -bash: echo: write error: No such device
> But the slot is actually powered on and the devices is probed.
>
> In function 'pciehp_sysfs_enable_slot()', we use 'wait_event()' to wait
> until 'ctrl->pending_events' is cleared in 'pciehp_ist()'. But in some
> situation, when 'pciehp_ist()' is woken up on a nearby CPU after
> 'pciehp_request' is called, 'ctrl->pending_events' is cleared before we
> go into sleep state. 'wait_event()' will check the condition before
> going into sleep. So we return immediately and '-ENODEV' is return.
>
> This patch use struct completion to wait until irq_thread 'pciehp_ist'
> is completed.
Thank you, good catch.
Unfortunately your patch still allows the following race AFAICS:
* pciehp_ist() is running (e.g. due to a hotplug operation)
* a request to disable or enable the slot is submitted via sysfs,
the completion is reinitialized
* pciehp_ist() finishes, signals completion
* the sysfs request returns to user space prematurely
* pciehp_ist() is run, handles the sysfs request, signals completion again
I'd suggest something like the below instead, could you give it a whirl
and see if it reliably fixes the issue for you?
-- >8 --
Subject: [PATCH] PCI: pciehp: Avoid returning prematurely from sysfs requests
A sysfs request to enable or disable a PCIe hotplug slot should not
return before it has been carried out. That is sought to be achieved
by waiting until the controller's "pending_events" have been cleared.
However the IRQ thread pciehp_ist() clears the "pending_events" before
it acts on them. If pciehp_sysfs_enable_slot() / _disable_slot() happen
to check the "pending_events" after they have been cleared but while
pciehp_ist() is still running, the functions may return prematurely
with an incorrect return value.
Fix by introducing an "ist_running" flag which must be false before a
sysfs request is allowed to return.
Fixes: 32a8cef274fe ("PCI: pciehp: Enable/disable exclusively from IRQ thread")
Link: https://lore.kernel.org/linux-pci/[email protected]
Reported-by: Xiongfeng Wang <[email protected]>
Signed-off-by: Lukas Wunner <[email protected]>
Cc: [email protected] # v4.19+
---
drivers/pci/hotplug/pciehp.h | 2 ++
drivers/pci/hotplug/pciehp_ctrl.c | 6 ++++--
drivers/pci/hotplug/pciehp_hpc.c | 2 ++
3 files changed, 8 insertions(+), 2 deletions(-)
diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h
index 8c51a04b8083..e316bde45c7b 100644
--- a/drivers/pci/hotplug/pciehp.h
+++ b/drivers/pci/hotplug/pciehp.h
@@ -72,6 +72,7 @@ extern int pciehp_poll_time;
* @reset_lock: prevents access to the Data Link Layer Link Active bit in the
* Link Status register and to the Presence Detect State bit in the Slot
* Status register during a slot reset which may cause them to flap
+ * @ist_running: flag to keep user request waiting while IRQ thread is running
* @request_result: result of last user request submitted to the IRQ thread
* @requester: wait queue to wake up on completion of user request,
* used for synchronous slot enable/disable request via sysfs
@@ -101,6 +102,7 @@ struct controller {
struct hotplug_slot hotplug_slot; /* hotplug core interface */
struct rw_semaphore reset_lock;
+ unsigned int ist_running;
int request_result;
wait_queue_head_t requester;
};
diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
index 631ced0ab28a..1ce9ce335291 100644
--- a/drivers/pci/hotplug/pciehp_ctrl.c
+++ b/drivers/pci/hotplug/pciehp_ctrl.c
@@ -368,7 +368,8 @@ int pciehp_sysfs_enable_slot(struct hotplug_slot *hotplug_slot)
ctrl->request_result = -ENODEV;
pciehp_request(ctrl, PCI_EXP_SLTSTA_PDC);
wait_event(ctrl->requester,
- !atomic_read(&ctrl->pending_events));
+ !atomic_read(&ctrl->pending_events) &&
+ !ctrl->ist_running);
return ctrl->request_result;
case POWERON_STATE:
ctrl_info(ctrl, "Slot(%s): Already in powering on state\n",
@@ -401,7 +402,8 @@ int pciehp_sysfs_disable_slot(struct hotplug_slot *hotplug_slot)
mutex_unlock(&ctrl->state_lock);
pciehp_request(ctrl, DISABLE_SLOT);
wait_event(ctrl->requester,
- !atomic_read(&ctrl->pending_events));
+ !atomic_read(&ctrl->pending_events) &&
+ !ctrl->ist_running);
return ctrl->request_result;
case POWEROFF_STATE:
ctrl_info(ctrl, "Slot(%s): Already in powering off state\n",
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index bd990e3371e3..9e2d7688e8cc 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -608,6 +608,7 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id)
irqreturn_t ret;
u32 events;
+ ctrl->ist_running = true;
pci_config_pm_runtime_get(pdev);
/* rerun pciehp_isr() if the port was inaccessible on interrupt */
@@ -654,6 +655,7 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id)
up_read(&ctrl->reset_lock);
pci_config_pm_runtime_put(pdev);
+ ctrl->ist_running = false;
wake_up(&ctrl->requester);
return IRQ_HANDLED;
}
--
2.20.1
Hi, Lukas
On 2019/8/6 15:24, Lukas Wunner wrote:
> On Thu, Jul 04, 2019 at 03:50:38PM +0800, Xiongfeng Wang wrote:
>> When I use the following command to power on a slot which has been
>> powered off already.
>> echo 1 > /sys/bus/pci/slots/22/power
>> It prints the following error:
>> -bash: echo: write error: No such device
>> But the slot is actually powered on and the devices is probed.
>>
>> In function 'pciehp_sysfs_enable_slot()', we use 'wait_event()' to wait
>> until 'ctrl->pending_events' is cleared in 'pciehp_ist()'. But in some
>> situation, when 'pciehp_ist()' is woken up on a nearby CPU after
>> 'pciehp_request' is called, 'ctrl->pending_events' is cleared before we
>> go into sleep state. 'wait_event()' will check the condition before
>> going into sleep. So we return immediately and '-ENODEV' is return.
>>
>> This patch use struct completion to wait until irq_thread 'pciehp_ist'
>> is completed.
>
> Thank you, good catch.
>
> Unfortunately your patch still allows the following race AFAICS:
>
> * pciehp_ist() is running (e.g. due to a hotplug operation)
> * a request to disable or enable the slot is submitted via sysfs,
> the completion is reinitialized
> * pciehp_ist() finishes, signals completion
> * the sysfs request returns to user space prematurely
> * pciehp_ist() is run, handles the sysfs request, signals completion again
>
> I'd suggest something like the below instead, could you give it a whirl
> and see if it reliably fixes the issue for you?
I tested the below patch. It can fix the issue.
I am not sure whether the following sequence will be a problem.
* pciehp_ist() is running, and 'ctrl->pending_events' is cleared
* a request to disable the slot is submitted via sysfs. 'ctrl->pending_events'
is set and the irq_thread 'pciehp_ist' is waken up. But pciehp_ist() is running.
So it doesn't take effect. 'ctrl->pending_events' is not cleared until next time
pciehp_ist() is waken up. So pciehp_sysfs_enable_slot() will wait until next
pciehp_ist() is waken up. I am not sure how 'irq_wake_thread()' will effect
the running irq_thread.
How about making the process synchronous instead of waking up the irq_thread ?
Thanks,
Xiongfeng.
>
> -- >8 --
>
> Subject: [PATCH] PCI: pciehp: Avoid returning prematurely from sysfs requests
>
> A sysfs request to enable or disable a PCIe hotplug slot should not
> return before it has been carried out. That is sought to be achieved
> by waiting until the controller's "pending_events" have been cleared.
>
> However the IRQ thread pciehp_ist() clears the "pending_events" before
> it acts on them. If pciehp_sysfs_enable_slot() / _disable_slot() happen
> to check the "pending_events" after they have been cleared but while
> pciehp_ist() is still running, the functions may return prematurely
> with an incorrect return value.
>
> Fix by introducing an "ist_running" flag which must be false before a
> sysfs request is allowed to return.
>
> Fixes: 32a8cef274fe ("PCI: pciehp: Enable/disable exclusively from IRQ thread")
> Link: https://lore.kernel.org/linux-pci/[email protected]
> Reported-by: Xiongfeng Wang <[email protected]>
> Signed-off-by: Lukas Wunner <[email protected]>
> Cc: [email protected] # v4.19+
> ---
> drivers/pci/hotplug/pciehp.h | 2 ++
> drivers/pci/hotplug/pciehp_ctrl.c | 6 ++++--
> drivers/pci/hotplug/pciehp_hpc.c | 2 ++
> 3 files changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h
> index 8c51a04b8083..e316bde45c7b 100644
> --- a/drivers/pci/hotplug/pciehp.h
> +++ b/drivers/pci/hotplug/pciehp.h
> @@ -72,6 +72,7 @@ extern int pciehp_poll_time;
> * @reset_lock: prevents access to the Data Link Layer Link Active bit in the
> * Link Status register and to the Presence Detect State bit in the Slot
> * Status register during a slot reset which may cause them to flap
> + * @ist_running: flag to keep user request waiting while IRQ thread is running
> * @request_result: result of last user request submitted to the IRQ thread
> * @requester: wait queue to wake up on completion of user request,
> * used for synchronous slot enable/disable request via sysfs
> @@ -101,6 +102,7 @@ struct controller {
>
> struct hotplug_slot hotplug_slot; /* hotplug core interface */
> struct rw_semaphore reset_lock;
> + unsigned int ist_running;
> int request_result;
> wait_queue_head_t requester;
> };
> diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
> index 631ced0ab28a..1ce9ce335291 100644
> --- a/drivers/pci/hotplug/pciehp_ctrl.c
> +++ b/drivers/pci/hotplug/pciehp_ctrl.c
> @@ -368,7 +368,8 @@ int pciehp_sysfs_enable_slot(struct hotplug_slot *hotplug_slot)
> ctrl->request_result = -ENODEV;
> pciehp_request(ctrl, PCI_EXP_SLTSTA_PDC);
> wait_event(ctrl->requester,
> - !atomic_read(&ctrl->pending_events));
> + !atomic_read(&ctrl->pending_events) &&
> + !ctrl->ist_running);
> return ctrl->request_result;
> case POWERON_STATE:
> ctrl_info(ctrl, "Slot(%s): Already in powering on state\n",
> @@ -401,7 +402,8 @@ int pciehp_sysfs_disable_slot(struct hotplug_slot *hotplug_slot)
> mutex_unlock(&ctrl->state_lock);
> pciehp_request(ctrl, DISABLE_SLOT);
> wait_event(ctrl->requester,
> - !atomic_read(&ctrl->pending_events));
> + !atomic_read(&ctrl->pending_events) &&
> + !ctrl->ist_running);
> return ctrl->request_result;
> case POWEROFF_STATE:
> ctrl_info(ctrl, "Slot(%s): Already in powering off state\n",
> diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
> index bd990e3371e3..9e2d7688e8cc 100644
> --- a/drivers/pci/hotplug/pciehp_hpc.c
> +++ b/drivers/pci/hotplug/pciehp_hpc.c
> @@ -608,6 +608,7 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id)
> irqreturn_t ret;
> u32 events;
>
> + ctrl->ist_running = true;
> pci_config_pm_runtime_get(pdev);
>
> /* rerun pciehp_isr() if the port was inaccessible on interrupt */
> @@ -654,6 +655,7 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id)
> up_read(&ctrl->reset_lock);
>
> pci_config_pm_runtime_put(pdev);
> + ctrl->ist_running = false;
> wake_up(&ctrl->requester);
> return IRQ_HANDLED;
> }
>
On Wed, Aug 07, 2019 at 04:28:32PM +0800, Xiongfeng Wang wrote:
> On 2019/8/6 15:24, Lukas Wunner wrote:
> > I'd suggest something like the below instead, could you give it a whirl
> > and see if it reliably fixes the issue for you?
>
> I tested the below patch. It can fix the issue.
Thank you! I'll submit it as a proper patch then.
> I am not sure whether the following sequence will be a problem.
> * pciehp_ist() is running, and 'ctrl->pending_events' is cleared
> * a request to disable the slot is submitted via sysfs.
> 'ctrl->pending_events' is set and the irq_thread 'pciehp_ist' is waken up.
> But pciehp_ist() is running. So it doesn't take effect.
> 'ctrl->pending_events' is not cleared until next time pciehp_ist() is
> waken up. So pciehp_sysfs_enable_slot() will wait until next
> pciehp_ist() is waken up. I am not sure how 'irq_wake_thread()' will
> effect the running irq_thread.
That's not a problem. If irq_wake_thread() is called while pciehp_ist()
is running, the latter will automatically perform another iteration.
It's the same situation if an interrupt is received while pciehp_ist()
is running.
irq_wake_thread() sets the IRQTF_RUNTHREAD flag and irq_wait_for_interrupt()
checks that flag and causes irq_thread() to perform another invocation
of handler_fn(), which is pciehp_ist() in this case.
So pciehp basically treats a user request like an interrupt received from
the hardware. It's meant to simplify the pciehp code. But it's non-trivial
to understand because one needs to have an understanding of the genirq
code to appreciate the simplicity.
Let me know if this explanation wasn't clear enough and you have further
questions.
> How about making the process synchronous instead of waking up the
> irq_thread?
That's what we had before, but it has its own problems since it requires
locking and interaction between the IRQ thread and the sysfs entry points.
Thanks,
Lukas