Hello!
I would like to open a question about PCIe Warm Reset. A Warm Reset of a
PCIe card is triggered by asserting the PERST# signal, and in most cases
the PERST# signal is controlled by a GPIO.
Basically every native Linux PCIe controller driver performs this Warm
Reset of the connected PCIe card during its own initialization
procedure.
And now the important question is: how long should a PCIe card be kept in
the Warm Reset state? After what delay can the PERST# signal be de-asserted
by the Linux controller driver?
Lorenzo and Rob have already expressed concerns [1] [2] that this Warm Reset
timeout should not be driver-specific, and I agree with them.
I have investigated which timeout each native PCIe driver uses [3], and
basically every driver uses a different one.
I have tried to find the timeouts in the PCIe specifications, but I was not
able to understand and deduce the correct timeout value for Warm Reset from
them. What I have found is written in my email [4].
Alex (as a "reset expert"), could you look at this issue?
Or is there somebody else who understands the PCIe specifications and PCIe
diagrams well enough to figure out the minimal timeout before de-asserting
the PERST# signal?
There are still some issues with WiFi cards (e.g. a Compex one) which
sometimes do not appear on the PCIe bus. Based on these "reset timeout
differences" in Linux PCIe controller drivers, I suspect that the problem is
not (only) in the WiFi cards but also in the Linux PCIe controller
drivers. In my email [3] I wrote that I had found that the WLE1216
card needs to be in the Warm Reset state for at least 10ms, otherwise the
card is not detected.
[1] - https://lore.kernel.org/linux-pci/20200513115940.fiemtnxfqcyqo6ik@pali/
[2] - https://lore.kernel.org/linux-pci/20200507212002.GA32182@bogus/
[3] - https://lore.kernel.org/linux-pci/20200424092546.25p3hdtkehohe3xw@pali/
[4] - https://lore.kernel.org/linux-pci/20200430082245.xblvb7xeamm4e336@pali/
On Tuesday 23 March 2021 21:49:41 Amey Narkhede wrote:
> On 21/03/10 12:05PM, Pali Rohár wrote:
> > Hello!
> >
> > I would like to open a question about PCIe Warm Reset. Warm Reset of
> > PCIe card is triggered by asserting PERST# signal and in most cases
> > PERST# signal is controlled by GPIO.
> >
> > Basically every native Linux PCIe controller driver is doing this Warm
> > Reset of connected PCIe card during native driver initialization
> > procedure.
> >
> > And now the important question is: How long should be PCIe card in Warm
> > Reset state? After which timeout can be PERST# signal de-asserted by
> > Linux controller driver?
> >
> > Lorenzo and Rob already expressed concerns [1] [2] that this Warm Reset
> > timeout should not be driver specific and I agree with them.
> >
> > I have done investigation which timeout is using which native PCIe
> > driver [3] and basically every driver is using different timeout.
> >
> > I have tried to find timeouts in PCIe specifications, I was not able to
> > understand and deduce correct timeout value for Warm Reset from PCIe
> > specifications. What I have found is written in my email [4].
> >
> > Alex (as a "reset expert"), could you look at this issue?
> >
> > Or is there somebody else who understand PCIe specifications and PCIe
> > diagrams to figure out what is the minimal timeout for de-asserting
> > PERST# signal?
> >
> > There are still some issues with WiFi cards (e.g. Compex one) which
> > sometimes do not appear on PCIe bus. And based on these "reset timeout
> > differences" in Linux PCIe controller drivers, I suspect that it is not
> > (only) the problems in WiFi cards but also in Linux PCIe controller
> > drivers. In my email [3] I have written that I figured out that WLE1216
> > card needs to be in Warm Reset state for at least 10ms, otherwise card
> > is not detected.
> >
> > [1] - https://lore.kernel.org/linux-pci/20200513115940.fiemtnxfqcyqo6ik@pali/
> > [2] - https://lore.kernel.org/linux-pci/20200507212002.GA32182@bogus/
> > [3] - https://lore.kernel.org/linux-pci/20200424092546.25p3hdtkehohe3xw@pali/
> > [4] - https://lore.kernel.org/linux-pci/20200430082245.xblvb7xeamm4e336@pali/
>
> I somehow got my hands on PCIe Gen4 spec. It says on page no 555-
> "When PERST# is provided to a component or adapter, this signal must be
> used by the component or adapter as Fundamental Reset.
> When PERST# is not provided to a component or adapter, Fundamental Reset is
> generated autonomously by the component or adapter, and the details of how
> this is done are outside the scope of this document."
> Not sure what component/adapter means in this context.
>
> Then below it says-
> "In some cases, it may be possible for the Fundamental Reset mechanism
> to be triggered by hardware without the removal and re-application of
> power to the component. This is called a warm reset. This document does
> not specify a means for generating a warm reset."
>
> Thanks,
> Amey
Hello Amey, the PCIe Base document does not specify how to control the PERST#
signal or how to issue a Warm Reset. But this is documented in the PCIe CEM,
Mini PCIe CEM and M.2 CEM documents (maybe in some other PCIe docs too).
It is necessary to look into several documents, "merge them in your head" and
then deduce the final meaning...
On 21/03/23 05:27PM, Pali Rohár wrote:
> On Tuesday 23 March 2021 21:49:41 Amey Narkhede wrote:
> > On 21/03/10 12:05PM, Pali Rohár wrote:
> > > Hello!
> > >
> > > I would like to open a question about PCIe Warm Reset. Warm Reset of
> > > PCIe card is triggered by asserting PERST# signal and in most cases
> > > PERST# signal is controlled by GPIO.
> > >
> > > Basically every native Linux PCIe controller driver is doing this Warm
> > > Reset of connected PCIe card during native driver initialization
> > > procedure.
> > >
> > > And now the important question is: How long should be PCIe card in Warm
> > > Reset state? After which timeout can be PERST# signal de-asserted by
> > > Linux controller driver?
> > >
> > > Lorenzo and Rob already expressed concerns [1] [2] that this Warm Reset
> > > timeout should not be driver specific and I agree with them.
> > >
> > > I have done investigation which timeout is using which native PCIe
> > > driver [3] and basically every driver is using different timeout.
> > >
> > > I have tried to find timeouts in PCIe specifications, I was not able to
> > > understand and deduce correct timeout value for Warm Reset from PCIe
> > > specifications. What I have found is written in my email [4].
> > >
> > > Alex (as a "reset expert"), could you look at this issue?
> > >
> > > Or is there somebody else who understand PCIe specifications and PCIe
> > > diagrams to figure out what is the minimal timeout for de-asserting
> > > PERST# signal?
> > >
> > > There are still some issues with WiFi cards (e.g. Compex one) which
> > > sometimes do not appear on PCIe bus. And based on these "reset timeout
> > > differences" in Linux PCIe controller drivers, I suspect that it is not
> > > (only) the problems in WiFi cards but also in Linux PCIe controller
> > > drivers. In my email [3] I have written that I figured out that WLE1216
> > > card needs to be in Warm Reset state for at least 10ms, otherwise card
> > > is not detected.
> > >
> > > [1] - https://lore.kernel.org/linux-pci/20200513115940.fiemtnxfqcyqo6ik@pali/
> > > [2] - https://lore.kernel.org/linux-pci/20200507212002.GA32182@bogus/
> > > [3] - https://lore.kernel.org/linux-pci/20200424092546.25p3hdtkehohe3xw@pali/
> > > [4] - https://lore.kernel.org/linux-pci/20200430082245.xblvb7xeamm4e336@pali/
> >
> > I somehow got my hands on PCIe Gen4 spec. It says on page no 555-
> > "When PERST# is provided to a component or adapter, this signal must be
> > used by the component or adapter as Fundamental Reset.
> > When PERST# is not provided to a component or adapter, Fundamental Reset is
> > generated autonomously by the component or adapter, and the details of how
> > this is done are outside the scope of this document."
> > Not sure what component/adapter means in this context.
> >
> > Then below it says-
> > "In some cases, it may be possible for the Fundamental Reset mechanism
> > to be triggered by hardware without the removal and re-application of
> > power to the component. This is called a warm reset. This document does
> > not specify a means for generating a warm reset."
> >
> > Thanks,
> > Amey
>
> Hello Amey, PCIe Base document does not specify how to control PERST#
> signal and how to issue Warm Reset. But it is documented in PCIe CEM,
> Mini PCIe CEM and M.2 CEM documents (maybe in some other PCIe docs too).
>
> It is needed look into more documents, "merge them in head" and then
> deduce final meaning...
Okay, so PCIe CEM revision 2.0 (from 2007) says on page 22:
"On power up, the deassertion of PERST# is delayed 100 ms (TPVPERL)
from the power rails achieving specified operating limits. Also, within
this time, the reference clocks (REFCLK+, REFCLK-) also become stable,
at least TPERST-CLK before PERST# is deasserted."
Then below it says-
"After there has been time (TPVPERL) for the power and clock to become
stable, PERST# is deasserted high and the PCI Express functions can start
up."
And then there is table of timing on page no 33-
Symbol      Parameter                              Min
TPVPERL     Power Stable to PERST# inactive        100ms
TPERST-CLK  REFCLK stable before PERST# inactive   100μs
TPERST      PERST# active time                     100μs
TFAIL       Power level invalid to PERST# active   500ns
...
I agree this is confusing.
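As a rough illustration only, the minimums quoted in the table above could be encoded as constants and enforced around a PERST# pulse. This is a minimal user-space sketch, not kernel code: FakePerstGpio and warm_reset are hypothetical names, and treating TPERST as the lower bound for a Warm Reset pulse is an assumption, since that is exactly the point still being debated in this thread.

```python
import time

# Minimum values quoted above from the PCIe CEM 2.0 timing table, in seconds.
T_PVPERL = 100e-3     # power stable to PERST# inactive
T_PERST_CLK = 100e-6  # REFCLK stable before PERST# inactive
T_PERST = 100e-6      # PERST# active time


class FakePerstGpio:
    """Hypothetical stand-in for a GPIO line driving PERST#; it only records calls."""

    def __init__(self):
        self.events = []

    def set_asserted(self, asserted):
        self.events.append(asserted)


def warm_reset(gpio, hold_time=T_PERST, sleep=time.sleep):
    """Assert PERST#, keep it asserted for hold_time seconds, then de-assert it."""
    if hold_time < T_PERST:
        # Assumption: T_PERST is treated as the absolute lower bound here.
        raise ValueError("hold time is below the T_PERST minimum")
    gpio.set_asserted(True)
    sleep(hold_time)
    gpio.set_asserted(False)
```

A caller that wants to follow the WLE1216 observation from the first email would pass hold_time=10e-3 instead of relying on the 100μs default.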
Thanks,
Amey
On 21/03/10 12:05PM, Pali Rohár wrote:
> Hello!
>
> I would like to open a question about PCIe Warm Reset. Warm Reset of
> PCIe card is triggered by asserting PERST# signal and in most cases
> PERST# signal is controlled by GPIO.
>
> Basically every native Linux PCIe controller driver is doing this Warm
> Reset of connected PCIe card during native driver initialization
> procedure.
>
> And now the important question is: How long should be PCIe card in Warm
> Reset state? After which timeout can be PERST# signal de-asserted by
> Linux controller driver?
>
> Lorenzo and Rob already expressed concerns [1] [2] that this Warm Reset
> timeout should not be driver specific and I agree with them.
>
> I have done investigation which timeout is using which native PCIe
> driver [3] and basically every driver is using different timeout.
>
> I have tried to find timeouts in PCIe specifications, I was not able to
> understand and deduce correct timeout value for Warm Reset from PCIe
> specifications. What I have found is written in my email [4].
>
> Alex (as a "reset expert"), could you look at this issue?
>
> Or is there somebody else who understand PCIe specifications and PCIe
> diagrams to figure out what is the minimal timeout for de-asserting
> PERST# signal?
>
> There are still some issues with WiFi cards (e.g. Compex one) which
> sometimes do not appear on PCIe bus. And based on these "reset timeout
> differences" in Linux PCIe controller drivers, I suspect that it is not
> (only) the problems in WiFi cards but also in Linux PCIe controller
> drivers. In my email [3] I have written that I figured out that WLE1216
> card needs to be in Warm Reset state for at least 10ms, otherwise card
> is not detected.
>
> [1] - https://lore.kernel.org/linux-pci/20200513115940.fiemtnxfqcyqo6ik@pali/
> [2] - https://lore.kernel.org/linux-pci/20200507212002.GA32182@bogus/
> [3] - https://lore.kernel.org/linux-pci/20200424092546.25p3hdtkehohe3xw@pali/
> [4] - https://lore.kernel.org/linux-pci/20200430082245.xblvb7xeamm4e336@pali/
I somehow got my hands on the PCIe Gen4 spec. On page 555 it says:
"When PERST# is provided to a component or adapter, this signal must be
used by the component or adapter as Fundamental Reset.
When PERST# is not provided to a component or adapter, Fundamental Reset is
generated autonomously by the component or adapter, and the details of how
this is done are outside the scope of this document."
Not sure what component/adapter means in this context.
Then below it says-
"In some cases, it may be possible for the Fundamental Reset mechanism
to be triggered by hardware without the removal and re-application of
power to the component. This is called a warm reset. This document does
not specify a means for generating a warm reset."
Thanks,
Amey
From: Amey Narkhede
> Sent: 23 March 2021 16:58
>
> On 21/03/23 05:27PM, Pali Rohár wrote:
> > On Tuesday 23 March 2021 21:49:41 Amey Narkhede wrote:
> > > On 21/03/10 12:05PM, Pali Rohár wrote:
> > > > Hello!
> > > >
> > > > I would like to open a question about PCIe Warm Reset. Warm Reset of
> > > > PCIe card is triggered by asserting PERST# signal and in most cases
> > > > PERST# signal is controlled by GPIO.
> > > >
> > > > Basically every native Linux PCIe controller driver is doing this Warm
> > > > Reset of connected PCIe card during native driver initialization
> > > > procedure.
> > > >
> > > > And now the important question is: How long should be PCIe card in Warm
> > > > Reset state? After which timeout can be PERST# signal de-asserted by
> > > > Linux controller driver?
> > > >
> > > > Lorenzo and Rob already expressed concerns [1] [2] that this Warm Reset
> > > > timeout should not be driver specific and I agree with them.
> > > >
> > > > I have done investigation which timeout is using which native PCIe
> > > > driver [3] and basically every driver is using different timeout.
> > > >
> > > > I have tried to find timeouts in PCIe specifications, I was not able to
> > > > understand and deduce correct timeout value for Warm Reset from PCIe
> > > > specifications. What I have found is written in my email [4].
> > > >
> > > > Alex (as a "reset expert"), could you look at this issue?
> > > >
> > > > Or is there somebody else who understand PCIe specifications and PCIe
> > > > diagrams to figure out what is the minimal timeout for de-asserting
> > > > PERST# signal?
> > > >
> > > > There are still some issues with WiFi cards (e.g. Compex one) which
> > > > sometimes do not appear on PCIe bus. And based on these "reset timeout
> > > > differences" in Linux PCIe controller drivers, I suspect that it is not
> > > > (only) the problems in WiFi cards but also in Linux PCIe controller
> > > > drivers. In my email [3] I have written that I figured out that WLE1216
> > > > card needs to be in Warm Reset state for at least 10ms, otherwise card
> > > > is not detected.
> > > >
> > > > [1] - https://lore.kernel.org/linux-pci/20200513115940.fiemtnxfqcyqo6ik@pali/
> > > > [2] - https://lore.kernel.org/linux-pci/20200507212002.GA32182@bogus/
> > > > [3] - https://lore.kernel.org/linux-pci/20200424092546.25p3hdtkehohe3xw@pali/
> > > > [4] - https://lore.kernel.org/linux-pci/20200430082245.xblvb7xeamm4e336@pali/
> > >
> > > I somehow got my hands on PCIe Gen4 spec. It says on page no 555-
> > > "When PERST# is provided to a component or adapter, this signal must be
> > > used by the component or adapter as Fundamental Reset.
> > > When PERST# is not provided to a component or adapter, Fundamental Reset is
> > > generated autonomously by the component or adapter, and the details of how
> > > this is done are outside the scope of this document."
> > > Not sure what component/adapter means in this context.
> > >
> > > Then below it says-
> > > "In some cases, it may be possible for the Fundamental Reset mechanism
> > > to be triggered by hardware without the removal and re-application of
> > > power to the component. This is called a warm reset. This document does
> > > not specify a means for generating a warm reset."
> > >
> > > Thanks,
> > > Amey
> >
> > Hello Amey, PCIe Base document does not specify how to control PERST#
> > signal and how to issue Warm Reset. But it is documented in PCIe CEM,
> > Mini PCIe CEM and M.2 CEM documents (maybe in some other PCIe docs too).
> >
> > It is needed look into more documents, "merge them in head" and then
> > deduce final meaning...
> Okay so PCIe CEM revision 2.0(from 2007) on page no 22 says-
> "On power up, the deassertion of PERST# is delayed 100 ms (TPVPERL)
> from the power rails achieving specified operating limits. Also, within
> this time, the reference clocks (REFCLK+, REFCLK-) also become stable,
> at least TPERST-CLK before PERST# is deasserted."
>
> Then below it says-
> "After there has been time (TPVPERL) for the power and clock to become
> stable, PERST# is deasserted high and the PCI Express functions can start
> up."
>
> And then there is table of timing on page no 33-
> Symbol      Parameter                              Min
> TPVPERL     Power Stable to PERST# inactive        100ms
> TPERST-CLK  REFCLK stable before PERST# inactive   100μs
> TPERST      PERST# active time                     100μs
> TFAIL       Power level invalid to PERST# active   500ns
> ...
>
> I agree this is confusing.
There is also the related issue of the time after reset is removed
before the target must respond to the first configuration cycle.
I can't see the value in the (nice bound) copy of the PCI 2.0 spec I have.
But IIRC it is 100ms (it might just be 500ms).
While this might seem like ages it can be problematic if targets have
to load large FPGA images from serial EEPROMs.
Most x86 systems have lots of slow bios code so tend to be fine.
But some other systems can try to enumerate the PCIe target before
it is actually ready - causing semi-random failures.
Our current fpgas do load the pcie interface before most of the
logic, and can be configured to force cycle reruns on the config
space accesses until fully loaded.
David
On Thu, 25 Mar 2021, David Laight wrote:
> I can't see the value in the (nice bound) copy of the PCI 2.0 spec I have.
> But IIRC it is 100ms (it might just be 500ms).
> While this might seem like ages it can be problematic if targets have
> to load large FPGA images from serial EEPROMs.
AFAICT it is 100ms for the Conventional Reset before Configuration
Requests are allowed to be issued in the first place, and then they are
allowed to fail with the Configuration Request Retry Status (CRS) status
until the device is ready to respond. Then it is 1.0s before the Root
Complex and/or system software is allowed to consider a device broken that
does not return a Successful Completion status for a valid Configuration
Request.
This 1.0s period is analogous to the Trhfa parameter for PCI/PCI-X buses
(2^25/2^27 bus clocks respectively; I don't know why the PCIe spec quotes
the latter value as 2^26, contrary to what the original PCI-X spec says,
but obviously the latter document is what sets the norm), which also has
to be respected for the respective bus segments in the presence of PCIe to
PCI/PCI-X bridges.
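A quick arithmetic check shows why these clock counts correspond to roughly one second. The bus frequencies below are an assumption (nominal 33.33 MHz for conventional PCI and 133.33 MHz for PCI-X); they are not stated in this thread.

```python
def clocks_to_seconds(clocks, bus_hz):
    """Convert a number of bus clocks into seconds at the given bus frequency."""
    return clocks / bus_hz


PCI_HZ = 33.33e6    # assumed nominal conventional PCI clock
PCIX_HZ = 133.33e6  # assumed nominal PCI-X clock

trhfa_pci = clocks_to_seconds(2**25, PCI_HZ)       # close to 1.0 s
trhfa_pcix = clocks_to_seconds(2**27, PCIX_HZ)     # close to 1.0 s
trhfa_pcix_26 = clocks_to_seconds(2**26, PCIX_HZ)  # only about half of that

print(f"PCI,   2^25 clocks: {trhfa_pci:.3f} s")
print(f"PCI-X, 2^27 clocks: {trhfa_pcix:.3f} s")
print(f"PCI-X, 2^26 clocks: {trhfa_pcix_26:.3f} s")
```

Under these assumed frequencies, 2^25 and 2^27 clocks both land near 1.0s, while 2^26 clocks at the PCI-X rate gives only about 0.5s, which is the discrepancy noted above.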
For Function-level reset the timeout is 100ms.
This is specified in sections 6.6.1. "Conventional Reset" and 6.6.2.
"Function-Level Reset (FLR)" respectively in the copy of PCIe 2.0 base
spec I have access to; I imagine other versions may have different section
numbers, but will have them named similarly.
If I were to implement this stuff, for good measure I'd give it a safety
margin beyond what the spec requires and use a timeout of say 2-4s while
actively querying the status of the device. The values given in the spec
are only the minimum requirements.
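The approach described above (wait 100ms, then keep retrying while the device answers with CRS, and give up only after a generous timeout) might be sketched as follows. This is only an illustration: wait_for_device and read_config are hypothetical names, read_config stands in for whatever issues a Configuration Request, and the 2s default is one value from the 2-4s range suggested above.

```python
import time

CRS = "crs"     # Configuration Request Retry Status
SUCCESS = "ok"  # Successful Completion


def wait_for_device(read_config, settle=0.100, timeout=2.0,
                    poll_interval=0.010, sleep=time.sleep, now=time.monotonic):
    """Wait `settle` seconds after reset, then poll until success or timeout.

    read_config is a hypothetical callable that issues one Configuration
    Request and returns SUCCESS or CRS.
    """
    sleep(settle)                # no Configuration Requests before 100ms
    deadline = now() + timeout
    while True:
        if read_config() == SUCCESS:
            return True          # device is ready
        if now() >= deadline:
            return False         # treat the device as broken
        sleep(poll_interval)     # device answered CRS, try again later
```

Passing sleep and now as parameters is only there so the timing can be exercised with a fake clock; real code would use whatever delay and time source its environment provides.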
HTH,
Maciej
On Tuesday 30 March 2021 15:04:02 Maciej W. Rozycki wrote:
> On Thu, 25 Mar 2021, David Laight wrote:
>
> > I can't see the value in the (nice bound) copy of the PCI 2.0 spec I have.
> > But IIRC it is 100ms (it might just be 500ms).
> > While this might seem like ages it can be problematic if targets have
> > to load large FPGA images from serial EEPROMs.
>
> AFAICT it is 100ms for the Conventional Reset before Configuration
> Requests are allowed to be issued in the first place, and then they are
> allowed to fail with the Configuration Request Retry Status (CRS) status
> until the device is ready to respond. Then it is 1.0s before the Root
> Complex and/or system software is allowed to consider a device broken that
> does not return a Successful Completion status for a valid Configuration
> Request.
>
> This 1.0s period is analogous to the Trhfa parameter for PCI/PCI-X buses
> (2^25/2^27 bus clocks respectively; I don't know why the PCIe spec quotes
> the latter value as 2^26, contrary to what the original PCI-X spec says,
> but obviously the latter document is what sets the norm), which also has
> to be respected for the respective bus segments in the presence of PCIe to
> PCI/PCI-X bridges.
Hello Maciej! Thank you for the information.
> For Function-level reset the timeout is 100ms.
>
> This is specified in sections 6.6.1. "Conventional Reset" and 6.6.2.
> "Function-Level Reset (FLR)" respectively in the copy of PCIe 2.0 base
> spec I have access to; I imagine other versions may have different section
> numbers, but will have them named similarly.
>
> If I were to implement this stuff, for good measure I'd give it a safety
> margin beyond what the spec requires and use a timeout of say 2-4s while
> actively querying the status of the device. The values given in the spec
> are only the minimum requirements.
Are you also able to figure out the minimal timeout value for a PCIe
Warm Reset?
We are having trouble "decoding" the correct minimal timeout value
for a PCIe Warm Reset (not a Function-level reset).
On Tue, 30 Mar 2021, Pali Rohár wrote:
> > If I were to implement this stuff, for good measure I'd give it a safety
> > margin beyond what the spec requires and use a timeout of say 2-4s while
> > actively querying the status of the device. The values given in the spec
> > are only the minimum requirements.
>
> Are you able to also figure out what is the minimal timeout value for
> PCIe Warm Reset?
>
> Because we are having troubles to "decode" correct minimal timeout value
> for this PCIe Warm Reset (not Function-level reset).
The spec does not give any exceptions AFAICT as to the timeouts required
between the three kinds of a Conventional Reset (Hot, Warm, or Cold) and
refers to them collectively as a Conventional Reset across the relevant
parts of the document, so clearly the same rules apply.
Maciej
On Tuesday 30 March 2021 16:34:47 Maciej W. Rozycki wrote:
> On Tue, 30 Mar 2021, Pali Rohár wrote:
>
> > > If I were to implement this stuff, for good measure I'd give it a safety
> > > margin beyond what the spec requires and use a timeout of say 2-4s while
> > > actively querying the status of the device. The values given in the spec
> > > are only the minimum requirements.
> >
> > Are you able to also figure out what is the minimal timeout value for
> > PCIe Warm Reset?
> >
> > Because we are having troubles to "decode" correct minimal timeout value
> > for this PCIe Warm Reset (not Function-level reset).
>
> The spec does not give any exceptions AFAICT as to the timeouts required
> between the three kinds of a Conventional Reset (Hot, Warm, or Cold) and
> refers to them collectively as a Conventional Reset across the relevant
> parts of the document, so clearly the same rules apply.
>
> Maciej
There are more timeouts specified that relate to Warm Reset and the PERST#
signal. They are just not in the Base spec, but in the CEM spec. See Amey's
previous email, which describes some of these timeouts, and also the links
in my first email, where I listed other timeouts defined in the specs that
are relevant to the PERST# signal and therefore also to Warm Reset.
On Tue, 30 Mar 2021, Pali Rohár wrote:
> > The spec does not give any exceptions AFAICT as to the timeouts required
> > between the three kinds of a Conventional Reset (Hot, Warm, or Cold) and
> > refers to them collectively as a Conventional Reset across the relevant
> > parts of the document, so clearly the same rules apply.
>
> There are specified more timeouts related to Warm reset and PERST#
> signal. Just they are not in Base spec, but in CEM spec. See previous
> Amey's email where are described some timeouts and also links in my
> first email where I put other timeouts defined in specs relevant for
> PERST# signal and therefore also for Warm Reset.
I specifically referred to the time allowed for devices to take between a
reset and the first successful configuration cycle David wondered about.
I don't think I can comment on the timeouts given in the CEM spec as I
don't have a copy. Sorry.
Maciej
On Tuesday 30 March 2021 15:04:02 Maciej W. Rozycki wrote:
> On Thu, 25 Mar 2021, David Laight wrote:
>
> > I can't see the value in the (nice bound) copy of the PCI 2.0 spec I have.
> > But IIRC it is 100ms (it might just be 500ms).
> > While this might seem like ages it can be problematic if targets have
> > to load large FPGA images from serial EEPROMs.
>
> AFAICT it is 100ms for the Conventional Reset before Configuration
> Requests are allowed to be issued in the first place...
Hi Maciej! Now I see that we have been talking about two different things.
My question is: how long should the card be kept in the reset state? You
described the timeouts that apply after the reset finishes... which are
different timeouts.
If you also know how long the card should stay in the reset state,
then please let us know!