2020-11-22 01:44:21

by Ashok Raj

Subject: [Patch v2 1/1] PCI: pciehp: Add support for handling MRL events

Linux currently does not process Manually-operated Retention Latch (MRL)
change events, even when an MRL sensor is present.

Hardware support for these events is available starting with Ice Lake
Server.

The following changes are needed when an MRL sensor is present.

1. Subscribe to MRL change events in SlotControl.
2. When MRL is closed:
   - If there is no ATTN button, power on the slot.
   - If there is an ATTN button and an MRL event is pending, ignore
     Presence Detect, since we want the ATTN button to drive the
     hotplug event.
3. When MRL is opened while the slot is powered on, bring the slot down.
   PCIe Base Spec 5.0 Chapter 6.7.1.3 states:

   "If an MRL Sensor is implemented without a corresponding
   MRL Sensor input on the Hot-Plug Controller, it is recommended
   that the MRL Sensor be routed to power fault input of the Hot-Plug
   Controller. This allows an active adapter to be powered off when the
   MRL is opened."

   This seems to suggest that the slot should be brought
   down as soon as MRL is opened.

Signed-off-by: Ashok Raj <[email protected]>
Co-developed-by: Kuppuswamy Sathyanarayanan <[email protected]>
---
Changes since v1:
- Changes suggested by Lukas Wunner
https://lore.kernel.org/linux-pci/20201119223749.GA103783@otc-nc-03/T/#m1f661ae901e7dedad73dea370bb63abd52c610eb
- Consolidate MRL handling in pciehp_handle_presence_or_link_change()
- Added helper latch_closed()
- Add comments why MRL open should function as hot-remove.
- Don't nuke PDC, it might mask a button PUSH synthesized event after 5
secs.
- Bjorn: Fix Subject to be consistent with other commits.
---
drivers/pci/hotplug/pciehp_ctrl.c | 36 +++++++++++++++++++++++++++++++++++-
drivers/pci/hotplug/pciehp_hpc.c | 14 ++++++++++++--
2 files changed, 47 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
index 9f85815b4f53..aa8b187ff769 100644
--- a/drivers/pci/hotplug/pciehp_ctrl.c
+++ b/drivers/pci/hotplug/pciehp_ctrl.c
@@ -224,9 +224,22 @@ void pciehp_handle_disable_request(struct controller *ctrl)
ctrl->request_result = pciehp_disable_slot(ctrl, SAFE_REMOVAL);
}

+static bool latch_closed(struct controller *ctrl)
+{
+ u8 getstatus = 0;
+
+ if (!MRL_SENS(ctrl))
+ return true;
+
+ pciehp_get_latch_status(ctrl, &getstatus);
+
+ return (getstatus == 0 ? true : false);
+}
+
void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
{
int present, link_active;
+ u8 getstatus = 0;

/*
* If the slot is on and presence or link has changed, turn it off.
@@ -246,6 +259,20 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
if (events & PCI_EXP_SLTSTA_PDC)
ctrl_info(ctrl, "Slot(%s): Card not present\n",
slot_name(ctrl));
+ if (events & PCI_EXP_SLTSTA_MRLSC)
+ ctrl_info(ctrl, "Slot(%s): Latch %s\n",
+ slot_name(ctrl), getstatus ? "Open" : "Closed");
+ /*
+ * PCIe Base Spec 5.0 Chapter 6.7.1.3 states.
+ *
+ * If an MRL Sensor is implemented without a corresponding MRL Sensor input
+ * on the Hot-Plug Controller, it is recommended that the MRL Sensor be
+ * routed to power fault input of the Hot-Plug Controller.
+ * This allows an active adapter to be powered off when the MRL is opened."
+ *
+ * This seems to suggest that the slot should be brought down as soon as MRL
+ * is opened.
+ */
pciehp_disable_slot(ctrl, SURPRISE_REMOVAL);
break;
default:
@@ -257,7 +284,7 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
mutex_lock(&ctrl->state_lock);
present = pciehp_card_present(ctrl);
link_active = pciehp_check_link_active(ctrl);
- if (present <= 0 && link_active <= 0) {
+ if ((present <= 0 && link_active <= 0) || !latch_closed(ctrl)) {
mutex_unlock(&ctrl->state_lock);
return;
}
@@ -275,6 +302,13 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
if (link_active)
ctrl_info(ctrl, "Slot(%s): Link Up\n",
slot_name(ctrl));
+ /*
+ * If slot is closed && ATTN button exists
+ * don't continue, let the ATTN button
+ * drive the hot-plug
+ */
+ if (((events & PCI_EXP_SLTSTA_MRLSC) && ATTN_BUTTN(ctrl)))
+ return;
ctrl->request_result = pciehp_enable_slot(ctrl);
break;
default:
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 53433b37e181..7cfa27bcf951 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -605,7 +605,7 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id)
*/
status &= PCI_EXP_SLTSTA_ABP | PCI_EXP_SLTSTA_PFD |
PCI_EXP_SLTSTA_PDC | PCI_EXP_SLTSTA_CC |
- PCI_EXP_SLTSTA_DLLSC;
+ PCI_EXP_SLTSTA_DLLSC | PCI_EXP_SLTSTA_MRLSC;

/*
* If we've already reported a power fault, don't report it again
@@ -710,8 +710,10 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id)
down_read(&ctrl->reset_lock);
if (events & DISABLE_SLOT)
pciehp_handle_disable_request(ctrl);
- else if (events & (PCI_EXP_SLTSTA_PDC | PCI_EXP_SLTSTA_DLLSC))
+ else if (events & (PCI_EXP_SLTSTA_PDC | PCI_EXP_SLTSTA_DLLSC |
+ PCI_EXP_SLTSTA_MRLSC))
pciehp_handle_presence_or_link_change(ctrl, events);
+
up_read(&ctrl->reset_lock);

ret = IRQ_HANDLED;
@@ -768,6 +770,14 @@ static void pcie_enable_notification(struct controller *ctrl)
cmd |= PCI_EXP_SLTCTL_ABPE;
else
cmd |= PCI_EXP_SLTCTL_PDCE;
+
+ /*
+ * If MRL sensor is present, then subscribe for MRL
+ * Changes to be notified as well.
+ */
+ if (MRL_SENS(ctrl))
+ cmd |= PCI_EXP_SLTCTL_MRLSCE;
+
if (!pciehp_poll_mode)
cmd |= PCI_EXP_SLTCTL_HPIE | PCI_EXP_SLTCTL_CCIE;

--
2.7.4


2020-11-22 09:12:58

by Lukas Wunner

Subject: Re: [Patch v2 1/1] PCI: pciehp: Add support for handling MRL events

On Sat, Nov 21, 2020 at 05:42:03PM -0800, Ashok Raj wrote:
> --- a/drivers/pci/hotplug/pciehp_ctrl.c
> +++ b/drivers/pci/hotplug/pciehp_ctrl.c
> void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
> {
> int present, link_active;
> + u8 getstatus = 0;
>
> /*
> * If the slot is on and presence or link has changed, turn it off.
> @@ -246,6 +259,20 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
> if (events & PCI_EXP_SLTSTA_PDC)
> ctrl_info(ctrl, "Slot(%s): Card not present\n",
> slot_name(ctrl));
> + if (events & PCI_EXP_SLTSTA_MRLSC)
> + ctrl_info(ctrl, "Slot(%s): Latch %s\n",
> + slot_name(ctrl), getstatus ? "Open" : "Closed");

This message will currently always be "Latch closed". It should be
"Latch open" instead because if the slot was up, the latch must have
been closed. So an MRLSC event can only mean that the latch is now open.
The "getstatus" variable can be removed.


> + /*
> + * PCIe Base Spec 5.0 Chapter 6.7.1.3 states.
> + *
> + * If an MRL Sensor is implemented without a corresponding MRL Sensor input
> + * on the Hot-Plug Controller, it is recommended that the MRL Sensor be
> + * routed to power fault input of the Hot-Plug Controller.
> + * This allows an active adapter to be powered off when the MRL is opened."
> + *
> + * This seems to suggest that the slot should be brought down as soon as MRL
> + * is opened.
> + */
> pciehp_disable_slot(ctrl, SURPRISE_REMOVAL);
> break;

The code comment is not wrapped at 80 chars and a bit long.
I'd move it to the commit message and keep only a shortened version here.

The "SURPRISE_REMOVAL" may now be problematic because the card may still
be in the slot (both presence and link still up) with only the MRL open.
My suggestion would be to add a local variable "bool safe_removal"
which is initialized to "SAFE_REMOVAL". In the two if-clauses for
DLLSC and PDC, it is set to SURPRISE_REMOVAL.
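
A minimal userspace sketch of that suggestion, not tested against the
driver. The removal_type() helper is hypothetical and exists only for
illustration; the bit values are the ones from
include/uapi/linux/pci_regs.h, and SAFE_REMOVAL/SURPRISE_REMOVAL are
assumed to match their definitions in pciehp.h:

```c
#include <stdbool.h>
#include <stdint.h>

/* Slot Status bits, values as in include/uapi/linux/pci_regs.h */
#define PCI_EXP_SLTSTA_MRLSC 0x0004 /* MRL Sensor Changed */
#define PCI_EXP_SLTSTA_PDC   0x0008 /* Presence Detect Changed */
#define PCI_EXP_SLTSTA_DLLSC 0x0100 /* Data Link Layer State Changed */

/* Assumed to match drivers/pci/hotplug/pciehp.h */
#define SAFE_REMOVAL     true
#define SURPRISE_REMOVAL false

/*
 * Hypothetical helper: choose the removal type for a set of events.
 * Default to safe removal (the MRL-only case, card may still be present)
 * and downgrade to surprise removal if presence or link changed.
 */
static bool removal_type(uint32_t events)
{
	bool safe_removal = SAFE_REMOVAL;

	if (events & PCI_EXP_SLTSTA_DLLSC)
		safe_removal = SURPRISE_REMOVAL;
	if (events & PCI_EXP_SLTSTA_PDC)
		safe_removal = SURPRISE_REMOVAL;

	return safe_removal;
}
```

In the handler, the result would then be passed to pciehp_disable_slot()
in place of the hardcoded SURPRISE_REMOVAL.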


> @@ -275,6 +302,13 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
> if (link_active)
> ctrl_info(ctrl, "Slot(%s): Link Up\n",
> slot_name(ctrl));
> + /*
> + * If slot is closed && ATTN button exists
> + * don't continue, let the ATTN button
> + * drive the hot-plug
> + */
> + if (((events & PCI_EXP_SLTSTA_MRLSC) && ATTN_BUTTN(ctrl)))
> + return;
> ctrl->request_result = pciehp_enable_slot(ctrl);
> break;

Hm, if the Attention Button is pressed with MRL still open, the slot is
not brought up. If the MRL is subsequently closed, it is still not
brought up. I guess the slot keeps blinking and one has to push the
button to abort the operation, then press it once more to attempt
another slot bringup. The spec doesn't seem to say how such a situation
should be handled. Oh well.

I'm wondering if this is the right place to bail out: Immediately
before the above hunk, the button_work is canceled, so it can't later
trigger bringup of the slot. Shouldn't the above check be in the
code block with the "Turn the slot on if it's occupied or link is up"
comment?

You're also not unlocking the state_lock here before bailing out of
the function.


> @@ -710,8 +710,10 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id)
> down_read(&ctrl->reset_lock);
> if (events & DISABLE_SLOT)
> pciehp_handle_disable_request(ctrl);
> - else if (events & (PCI_EXP_SLTSTA_PDC | PCI_EXP_SLTSTA_DLLSC))
> + else if (events & (PCI_EXP_SLTSTA_PDC | PCI_EXP_SLTSTA_DLLSC |
> + PCI_EXP_SLTSTA_MRLSC))
> pciehp_handle_presence_or_link_change(ctrl, events);
> +
> up_read(&ctrl->reset_lock);

Unnecessary newline added.


> @@ -768,6 +770,14 @@ static void pcie_enable_notification(struct controller *ctrl)
> cmd |= PCI_EXP_SLTCTL_ABPE;
> else
> cmd |= PCI_EXP_SLTCTL_PDCE;
> +
> + /*
> + * If MRL sensor is present, then subscribe for MRL
> + * Changes to be notified as well.
> + */
> + if (MRL_SENS(ctrl))
> + cmd |= PCI_EXP_SLTCTL_MRLSCE;
> +

The code comment doesn't add much information, so can probably be
dropped.

You need to add PCI_EXP_SLTCTL_MRLSCE to the "mask" variable in this
function (before PFDE, as in pcie_disable_notification()).
I don't think the interrupt is enabled at all if it's not added to
"mask", has this patch been tested at all?
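
To illustrate why the mask matters, here is a small userspace model of a
masked Slot Control update. It assumes the write path clears the masked
bits and then ORs in (cmd & mask); masked_update() is a hypothetical
stand-in, not the driver's function, and the bit values are taken from
include/uapi/linux/pci_regs.h:

```c
#include <stdint.h>

/* Slot Control bits, values as in include/uapi/linux/pci_regs.h */
#define PCI_EXP_SLTCTL_ABPE   0x0001 /* Attention Button Pressed Enable */
#define PCI_EXP_SLTCTL_PFDE   0x0002 /* Power Fault Detected Enable */
#define PCI_EXP_SLTCTL_MRLSCE 0x0004 /* MRL Sensor Changed Enable */
#define PCI_EXP_SLTCTL_PDCE   0x0008 /* Presence Detect Changed Enable */
#define PCI_EXP_SLTCTL_CCIE   0x0010 /* Command Completed Interrupt Enable */
#define PCI_EXP_SLTCTL_HPIE   0x0020 /* Hot-Plug Interrupt Enable */
#define PCI_EXP_SLTCTL_DLLSCE 0x1000 /* Data Link Layer State Changed Enable */

/*
 * Hypothetical model of a masked register update: bits outside "mask"
 * keep their old value, bits inside "mask" take their value from "cmd".
 * A bit set in "cmd" but missing from "mask" is therefore silently dropped.
 */
static uint16_t masked_update(uint16_t slot_ctrl, uint16_t cmd, uint16_t mask)
{
	slot_ctrl &= (uint16_t)~mask;
	slot_ctrl |= (uint16_t)(cmd & mask);
	return slot_ctrl;
}
```

With MRLSCE set in cmd but absent from mask, the returned value has
MRLSCE clear; adding MRLSCE to mask lets the bit through.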

Something else: When pciehp probes, it should check whether the slot
is up even though MRL is open. (E.g. the machine is booted, the card
in the slot was enumerated but the latch is open.) I think in that
case we need to bring down the slot. I suggest adding a check to
pciehp_check_presence() whether the latch is open. If so,
a PCI_EXP_SLTSTA_MRLSC event should be synthesized. You could rename
the latch_closed() function to pciehp_latch_closed() and remove its
"static" attribute so that you can call it from pciehp_core.c.

Thanks,

Lukas

2020-11-25 23:38:25

by Ashok Raj

Subject: Re: [Patch v2 1/1] PCI: pciehp: Add support for handling MRL events

Hi Lukas

On Sun, Nov 22, 2020 at 10:08:52AM +0100, Lukas Wunner wrote:
> On Sat, Nov 21, 2020 at 05:42:03PM -0800, Ashok Raj wrote:
> > --- a/drivers/pci/hotplug/pciehp_ctrl.c
> > +++ b/drivers/pci/hotplug/pciehp_ctrl.c
> > void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
> > {
> > int present, link_active;
> > + u8 getstatus = 0;
> >
> > /*
> > * If the slot is on and presence or link has changed, turn it off.
> > @@ -246,6 +259,20 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
> > if (events & PCI_EXP_SLTSTA_PDC)
> > ctrl_info(ctrl, "Slot(%s): Card not present\n",
> > slot_name(ctrl));
> > + if (events & PCI_EXP_SLTSTA_MRLSC)
> > + ctrl_info(ctrl, "Slot(%s): Latch %s\n",
> > + slot_name(ctrl), getstatus ? "Open" : "Closed");
>
> This message will currently always be "Latch closed". It should be
> "Latch open" instead because if the slot was up, the latch must have
> been closed. So an MRLSC event can only mean that the latch is now open.
> The "getstatus" variable can be removed.

We only report if there was an MRLSC event. What if there is a link event
while the MRL is closed? Printing getstatus reflects the current state
rather than hardcoding a value, which could be wrong when the slot is
being removed due to a DLLSC event?
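
Reflecting the current state would mean actually reading the latch state
before printing it. A rough userspace sketch of the decoding, assuming
the MRL Sensor State bit in Slot Status reads 1 when the latch is open;
latch_open() and latch_state_name() are hypothetical names used only
here:

```c
#include <stdbool.h>
#include <stdint.h>

/* Slot Status bit, value as in include/uapi/linux/pci_regs.h */
#define PCI_EXP_SLTSTA_MRLSS 0x0020 /* MRL Sensor State: 1 = open */

/* Hypothetical helper: decode the latch state from a Slot Status value. */
static bool latch_open(uint16_t slot_status)
{
	return (slot_status & PCI_EXP_SLTSTA_MRLSS) != 0;
}

/* Hypothetical helper: string for the "Latch %s" log message. */
static const char *latch_state_name(uint16_t slot_status)
{
	return latch_open(slot_status) ? "Open" : "Closed";
}
```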

>
>
> > + /*
> > + * PCIe Base Spec 5.0 Chapter 6.7.1.3 states.
> > + *
> > + * If an MRL Sensor is implemented without a corresponding MRL Sensor input
> > + * on the Hot-Plug Controller, it is recommended that the MRL Sensor be
> > + * routed to power fault input of the Hot-Plug Controller.
> > + * This allows an active adapter to be powered off when the MRL is opened."
> > + *
> > + * This seems to suggest that the slot should be brought down as soon as MRL
> > + * is opened.
> > + */
> > pciehp_disable_slot(ctrl, SURPRISE_REMOVAL);
> > break;
>
> The code comment is not wrapped at 80 chars and a bit long.
> I'd move it to the commit message and keep only a shortened version here.

Makes sense. I'll clean this up.
>
> The "SURPRISE_REMOVAL" may now be problematic because the card may still
> be in the slot (both presence and link still up) with only the MRL open.
> My suggestion would be to add a local variable "bool safe_removal"
> which is initialized to "SAFE_REMOVAL". In the two if-clauses for
> DLLSC and PDC, it is set to SURPRISE_REMOVAL.

I see, so for MRL we want to treat it as a safe removal, and for the other
two it's a surprise removal. Got it.

>
>
> > @@ -275,6 +302,13 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
> > if (link_active)
> > ctrl_info(ctrl, "Slot(%s): Link Up\n",
> > slot_name(ctrl));
> > + /*
> > + * If slot is closed && ATTN button exists
> > + * don't continue, let the ATTN button
> > + * drive the hot-plug
> > + */
> > + if (((events & PCI_EXP_SLTSTA_MRLSC) && ATTN_BUTTN(ctrl)))
> > + return;
> > ctrl->request_result = pciehp_enable_slot(ctrl);
> > break;
>
> Hm, if the Attention Button is pressed with MRL still open, the slot is
> not brought up. If the MRL is subsequently closed, it is still not
> brought up. I guess the slot keeps blinking and one has to push the
> button to abort the operation, then press it once more to attempt
> another slot bringup. The spec doesn't seem to say how such a situation
> should be handled. Oh well.

Looks like we are in the same boat today even without MRL. If there is no
card in the slot and someone presses ATTN, after the 5 second blink we get
the synthetic PDC event. But the check for present || link_active would
fail and return immediately. So the light would keep blinking until
someone presses ATTN to cancel?

Maybe in that case we should simply cancel the blinking before we return?

>
> I'm wondering if this is the right place to bail out: Immediately
> before the above hunk, the button_work is canceled, so it can't later
> trigger bringup of the slot. Shouldn't the above check be in the
> code block with the "Turn the slot on if it's occupied or link is up"
> comment?

Or maybe remove the !latch_closed(ctrl) check and let it fall through
anyway.
>
> You're also not unlocking the state_lock here before bailing out of
> the function.
>
>
> > @@ -710,8 +710,10 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id)
> > down_read(&ctrl->reset_lock);
> > if (events & DISABLE_SLOT)
> > pciehp_handle_disable_request(ctrl);
> > - else if (events & (PCI_EXP_SLTSTA_PDC | PCI_EXP_SLTSTA_DLLSC))
> > + else if (events & (PCI_EXP_SLTSTA_PDC | PCI_EXP_SLTSTA_DLLSC |
> > + PCI_EXP_SLTSTA_MRLSC))
> > pciehp_handle_presence_or_link_change(ctrl, events);
> > +
> > up_read(&ctrl->reset_lock);
>
> Unnecessary newline added.

Will remove.

>
>
> > @@ -768,6 +770,14 @@ static void pcie_enable_notification(struct controller *ctrl)
> > cmd |= PCI_EXP_SLTCTL_ABPE;
> > else
> > cmd |= PCI_EXP_SLTCTL_PDCE;
> > +
> > + /*
> > + * If MRL sensor is present, then subscribe for MRL
> > + * Changes to be notified as well.
> > + */
> > + if (MRL_SENS(ctrl))
> > + cmd |= PCI_EXP_SLTCTL_MRLSCE;
> > +
>
> The code comment doesn't add much information, so can probably be
> dropped.

Makes sense.

>
> You need to add PCI_EXP_SLTCTL_MRLSCE to the "mask" variable in this
> function (before PFDE, as in pcie_disable_notification()).
> I don't think the interrupt is enabled at all if it's not added to
> "mask", has this patch been tested at all?

The first patch was tested, but I didn't have that in the mask variable
even then.

>
> Something else: When pciehp probes, it should check whether the slot
> is up even though MRL is open. (E.g. the machine is booted, the card
> in the slot was enumerated but the latch is open.) I think in that
> case we need to bring down the slot. I suggest adding a check to
> pciehp_check_presence() whether the latch is open. If so,
> a PCI_EXP_SLTSTA_MRLSC event should be synthesized. You could rename
> the latch_closed() function to pciehp_latch_closed() and remove its
> "static" attribute so that you can call it from pciehp_core.c.

Good point. I missed that. I'll have another version spun after a test.

--
Cheers,
Ashok

[Forgiveness is the attribute of the STRONG - Gandhi]

2020-12-03 22:56:25

by Ashok Raj

Subject: Re: [Patch v2 1/1] PCI: pciehp: Add support for handling MRL events

Hi Lukas and Bjorn


On Sun, Nov 22, 2020 at 10:08:52AM +0100, Lukas Wunner wrote:
> > @@ -275,6 +302,13 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
> > if (link_active)
> > ctrl_info(ctrl, "Slot(%s): Link Up\n",
> > slot_name(ctrl));
> > + /*
> > + * If slot is closed && ATTN button exists
> > + * don't continue, let the ATTN button
> > + * drive the hot-plug
> > + */
> > + if (((events & PCI_EXP_SLTSTA_MRLSC) && ATTN_BUTTN(ctrl)))
> > + return;
> > ctrl->request_result = pciehp_enable_slot(ctrl);
> > break;
>
> Hm, if the Attention Button is pressed with MRL still open, the slot is
> not brought up. If the MRL is subsequently closed, it is still not
> brought up. I guess the slot keeps blinking and one has to push the
> button to abort the operation, then press it once more to attempt
> another slot bringup. The spec doesn't seem to say how such a situation
> should be handled. Oh well.
>
> I'm wondering if this is the right place to bail out: Immediately
> before the above hunk, the button_work is canceled, so it can't later
> trigger bringup of the slot. Shouldn't the above check be in the
> code block with the "Turn the slot on if it's occupied or link is up"
> comment?
>

I have a fix tested on the platform, but I'm wondering if that's exactly
what you had in mind.

Currently we don't subscribe to PDC events when ATTN exists. So the
behavior is similar to this MRL case after ATTN, but the slot is not
ready for hot-add.

- Press ATTN.
- Slot is empty.
- After 5 seconds the synthetic PDC arrives, but since there is no
  presence and no link active, we leave the slot in BLINKINGON_STATE
  and the LED keeps blinking.

If someone were to add a card after the 5 seconds, no hot-add is processed,
since we don't get notifications for PDC events when ATTN exists.

Can we automatically cancel the blinking and return the slot to OFF_STATE?

This way we don't need one button press to cancel first and then another
button press to restart the add.

According to section 6.7.1.5, "Attention Button":
Once the power indicator begins blinking, a 5 second abort interval exists
during which a second depression of the attention button cancels the operation.

If the operation initiated by the attention button fails for any reason, it
is recommended that system software present an error message explaining
failure via a software user interface, or add the error message to system
log.

It seems we can cancel the blinking and return to the powered-off state,
since the attention button press did not succeed in adding anything.

Alternatively, we could subscribe to PDC but ignore it if the slot is in
OFF_STATE, so that ATTN drives the add. If PDC happens while we are in
BLINKINGON_STATE, we could then process the hot-add. The spec only gives
a recommendation for software, but I think cancelling after 5 seconds
is better?

Cheers,
Ashok

2020-12-10 21:01:34

by Lukas Wunner

Subject: Re: [Patch v2 1/1] PCI: pciehp: Add support for handling MRL events

On Thu, Dec 03, 2020 at 02:51:24PM -0800, Raj, Ashok wrote:
> - Press ATTN,
> - Slot is empty
> - After 5 seconds synthetic PDC arrives.
> but since no presence and no link active, we leave slot in
> BLINKINGON_STATE, and led keeps blinking
>
> if someone were to add a card after the 5 seconds, no hot-add is processed
> since we don't get notifications for PDC events when ATTN exists.
>
> Can we automatically cancel the blinking and return slot back to OFF_STATE?

Yes.


> If the operation initiated by the attention button fails for any reason, it
> is recommended that system software present an error message explaining
> failure via a software user interface, or add the error message to system
> log.

Ah so we're supposed to log a message if the slot is empty.
That needs to be added then to the code snippet I sent you
yesterday in response to your off-list e-mail.


> Alternately we can also choose to subscribe to PDC, but ignore if slot is
> in OFF_STATE. So we let ATTN drive the add. But if PDC happens and we are
> in BLINKINGON_STATE, then we can process the hot-add? Spec says a software
> recommendation, but i think the cancel after 5 seconds seems better?

That approach seems more complicated. It's better to stop blinking
and return to OFF_STATE if after the 5 second interval the slot is
found to be empty.
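
As a rough userspace model of that behavior, not the driver code: the
struct slot, the button_timeout() handler and the log text below are all
hypothetical, and only the state names mirror pciehp's.

```c
#include <stdbool.h>
#include <stdio.h>

/* State names mirror pciehp's; everything else here is hypothetical. */
enum slot_state { OFF_STATE, BLINKINGON_STATE, POWERON_STATE };

struct slot {
	enum slot_state state;
	bool present;      /* card detected in the slot */
	bool link_active;  /* data link layer is up */
	bool blinking;     /* power indicator is blinking */
};

/*
 * Hypothetical handler for the event synthesized once the 5 second
 * abort interval after an Attention Button press has elapsed.
 */
static void button_timeout(struct slot *s)
{
	if (s->state != BLINKINGON_STATE)
		return;

	s->blinking = false;

	if (!s->present && !s->link_active) {
		/* Slot is empty: log it, stop blinking, go back to off. */
		fprintf(stderr, "Slot: empty, bringup aborted\n");
		s->state = OFF_STATE;
		return;
	}

	/* Slot is occupied or link is up: proceed with bringup. */
	s->state = POWERON_STATE;
}
```

With this shape, a later Attention Button press starts a fresh bringup
attempt without a separate press to cancel the previous one.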

Thanks,

Lukas