2020-01-07 16:16:47

by Muni Sekhar

Subject: pcie: xilinx: kernel hang - ISR readl()

Hi,

I have a module with a Xilinx FPGA. It implements UART(s), SPI(s),
parallel I/O and interfaces them to the host CPU via the PCI Express bus.
My system freezes, without capturing a crash dump, during certain
tests. I debugged the issue and tracked it down to the interrupt
handler code below.


The ISR first reads the Interrupt Status register using ‘readl()’, as
shown below:
status = readl(ctrl->reg + INT_STATUS);


It then clears the pending interrupts using ‘writel()’, as shown below:
writel(status, ctrl->reg + INT_STATUS);


I've noticed a kernel hang if the INT_STATUS register is read again after
clearing the pending interrupts.

Can someone clarify why the kernel hangs without a crash dump if I
read the INT_STATUS register using readl() after clearing the
pending bits?

Can readl() block?


A snippet of the ISR code is given below:

https://pastebin.com/WdnZJZF5



static irqreturn_t pcie_isr(int irq, void *dev_id)
{
	struct test_device *ctrl = dev_id;
	u32 status;

	status = readl(ctrl->reg + INT_STATUS);

	/* Check to see if it was our interrupt */
	if (!(status & 0x000C))
		return IRQ_NONE;

	/* Clear the interrupt */
	writel(status, ctrl->reg + INT_STATUS);

	if (status & 0x0004) {
		/* Tx interrupt pending */
		....
	}

	if (status & 0x0008) {
		/* Rx interrupt pending */

		/*
		 * The system freezes if I read the INT_STATUS
		 * register again, as given below.
		 */
		status = readl(ctrl->reg + INT_STATUS);
		....
	}
	..
	return IRQ_HANDLED;
}
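
The clear-by-writing-back step in this handler suggests that INT_STATUS is a write-1-to-clear register, although the thread never states this. A minimal userspace model of that assumption is sketched below; struct fake_dev, the helper functions, and the INT_TX/INT_RX names are hypothetical and not taken from the driver. It shows that writing back the value just read clears only the bits that were observed, so an event arriving between the read and the write stays pending.

```c
#include <assert.h>
#include <stdint.h>

#define INT_TX 0x0004u  /* Tx pending bit, as tested in the posted ISR */
#define INT_RX 0x0008u  /* Rx pending bit, as tested in the posted ISR */

/* Hypothetical stand-in for the FPGA's INT_STATUS register. */
struct fake_dev {
	uint32_t int_status;
};

/* Model of a read: returns the pending bits without side effects. */
static uint32_t status_read(const struct fake_dev *dev)
{
	return dev->int_status;
}

/* Model of a write-1-to-clear write: each 1 bit written clears that bit. */
static void status_w1c_write(struct fake_dev *dev, uint32_t val)
{
	dev->int_status &= ~val;
}

/* Read, acknowledge exactly what was read, and return it. */
static uint32_t ack_pending(struct fake_dev *dev)
{
	uint32_t status = status_read(dev);

	status_w1c_write(dev, status);
	return status;
}
```

If the real register behaves differently (for example, clear-on-read), the second readl() in the Rx branch would have different side effects, which may be worth checking against the FPGA design.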



--
Thanks,
Sekhar


2020-01-08 14:56:34

by Muni Sekhar

Subject: Re: pcie: xilinx: kernel hang - ISR readl()

On Tue, Jan 7, 2020 at 9:45 PM Muni Sekhar <[email protected]> wrote:
>
> Hi,
>
> I have module with Xilinx FPGA. It implements UART(s), SPI(s),
> parallel I/O and interfaces them to the Host CPU via PCI Express bus.
> I see that my system freezes without capturing the crash dump for
> certain tests. I debugged this issue and it was tracked down to the
> below mentioned interrupt handler code.
>
>
> In ISR, first reads the Interrupt Status register using ‘readl()’ as
> given below.
> status = readl(ctrl->reg + INT_STATUS);
>
>
> And then clears the pending interrupts using ‘writel()’ as given blow.
> writel(status, ctrl->reg + INT_STATUS);
>
>
> I've noticed a kernel hang if INT_STATUS register read again after
> clearing the pending interrupts.
>
> Can someone clarify me why the kernel hangs without crash dump incase
> if I read the INT_STATUS register using readl() after clearing the
> pending bits?
>
> Can readl() block?
>
>
> Snippet of the ISR code is given blow:
>
> https://pastebin.com/WdnZJZF5
The correct snippet of the ISR code is here: https://pastebin.com/as2tSPwE



--
Thanks,
Sekhar

2020-01-08 20:16:15

by Bjorn Helgaas

Subject: Re: pcie: xilinx: kernel hang - ISR readl()

On Tue, Jan 07, 2020 at 09:45:13PM +0530, Muni Sekhar wrote:
> Hi,
>
> I have module with Xilinx FPGA. It implements UART(s), SPI(s),
> parallel I/O and interfaces them to the Host CPU via PCI Express bus.
> I see that my system freezes without capturing the crash dump for
> certain tests. I debugged this issue and it was tracked down to the
> below mentioned interrupt handler code.
>
>
> In ISR, first reads the Interrupt Status register using ‘readl()’ as
> given below.
> status = readl(ctrl->reg + INT_STATUS);
>
>
> And then clears the pending interrupts using ‘writel()’ as given blow.
> writel(status, ctrl->reg + INT_STATUS);
>
>
> I've noticed a kernel hang if INT_STATUS register read again after
> clearing the pending interrupts.
>
> Can someone clarify me why the kernel hangs without crash dump incase
> if I read the INT_STATUS register using readl() after clearing the
> pending bits?
>
> Can readl() block?

readl() should not block in software. Obviously at the hardware CPU
instruction level, the read instruction has to wait for the result of
the read. Since that data is provided by the device, i.e., your FPGA,
it's possible there's a problem there.

Can you tell whether the FPGA has received the Memory Read for
INT_STATUS and sent the completion?

On the architectures I'm familiar with, if a device doesn't respond,
something would eventually time out so the CPU doesn't wait forever.


2020-01-09 03:19:14

by Muni Sekhar

Subject: Re: pcie: xilinx: kernel hang - ISR readl()

On Thu, Jan 9, 2020 at 1:45 AM Bjorn Helgaas <[email protected]> wrote:
>
> On Tue, Jan 07, 2020 at 09:45:13PM +0530, Muni Sekhar wrote:
> > Hi,
> >
> > I have module with Xilinx FPGA. It implements UART(s), SPI(s),
> > parallel I/O and interfaces them to the Host CPU via PCI Express bus.
> > I see that my system freezes without capturing the crash dump for
> > certain tests. I debugged this issue and it was tracked down to the
> > below mentioned interrupt handler code.
> >
> >
> > In ISR, first reads the Interrupt Status register using ‘readl()’ as
> > given below.
> > status = readl(ctrl->reg + INT_STATUS);
> >
> >
> > And then clears the pending interrupts using ‘writel()’ as given blow.
> > writel(status, ctrl->reg + INT_STATUS);
> >
> >
> > I've noticed a kernel hang if INT_STATUS register read again after
> > clearing the pending interrupts.
> >
> > Can someone clarify me why the kernel hangs without crash dump incase
> > if I read the INT_STATUS register using readl() after clearing the
> > pending bits?
> >
> > Can readl() block?
>
> readl() should not block in software. Obviously at the hardware CPU
> instruction level, the read instruction has to wait for the result of
> the read. Since that data is provided by the device, i.e., your FPGA,
> it's possible there's a problem there.

Thank you very much for your reply.
Where can I find details about the protocol for reading memory-mapped
I/O? Can you point me to any useful links?
I tried to locate the exact point in the kernel code where the CPU waits
for the read, as shown below:
readl() -> __raw_readl() -> return *(const volatile u32 __force *)addr
Do I need to look at the assembly instructions here?

>
> Can you tell whether the FPGA has received the Memory Read for
> INT_STATUS and sent the completion?

Is there a way to find this out through software debugging (either by
enabling dynamic debug or by adding new debug prints)? Can you please
point me to the tools or hardware needed for this?


>
> On the architectures I'm familiar with, if a device doesn't respond,
> something would eventually time out so the CPU doesn't wait forever.

What is the timeout here? I mean, how long does the CPU wait for the
completion? Since this code runs in interrupt context, does it cause the
system to freeze if the timeout is long?

lspci output:
$ lspci
00:00.0 Host bridge: Intel Corporation Atom Processor Z36xxx/Z37xxx Series SoC Transaction Register (rev 11)
00:02.0 VGA compatible controller: Intel Corporation Atom Processor Z36xxx/Z37xxx Series Graphics & Display (rev 11)
00:13.0 SATA controller: Intel Corporation Atom Processor E3800 Series SATA AHCI Controller (rev 11)
00:14.0 USB controller: Intel Corporation Atom Processor Z36xxx/Z37xxx, Celeron N2000 Series USB xHCI (rev 11)
00:1a.0 Encryption controller: Intel Corporation Atom Processor Z36xxx/Z37xxx Series Trusted Execution Engine (rev 11)
00:1b.0 Audio device: Intel Corporation Atom Processor Z36xxx/Z37xxx Series High Definition Audio Controller (rev 11)
00:1c.0 PCI bridge: Intel Corporation Atom Processor E3800 Series PCI Express Root Port 1 (rev 11)
00:1c.2 PCI bridge: Intel Corporation Atom Processor E3800 Series PCI Express Root Port 3 (rev 11)
00:1c.3 PCI bridge: Intel Corporation Atom Processor E3800 Series PCI Express Root Port 4 (rev 11)
00:1d.0 USB controller: Intel Corporation Atom Processor Z36xxx/Z37xxx Series USB EHCI (rev 11)
00:1f.0 ISA bridge: Intel Corporation Atom Processor Z36xxx/Z37xxx Series Power Control Unit (rev 11)
00:1f.3 SMBus: Intel Corporation Atom Processor E3800 Series SMBus Controller (rev 11)
01:00.0 RAM memory: PLDA Device 5555
03:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)




--
Thanks,
Sekhar

2020-01-09 04:36:11

by Bjorn Helgaas

Subject: Re: pcie: xilinx: kernel hang - ISR readl()

On Thu, Jan 09, 2020 at 08:47:51AM +0530, Muni Sekhar wrote:
> On Thu, Jan 9, 2020 at 1:45 AM Bjorn Helgaas <[email protected]> wrote:
> > On Tue, Jan 07, 2020 at 09:45:13PM +0530, Muni Sekhar wrote:
> > > Hi,
> > >
> > > I have module with Xilinx FPGA. It implements UART(s), SPI(s),
> > > parallel I/O and interfaces them to the Host CPU via PCI Express bus.
> > > I see that my system freezes without capturing the crash dump for
> > > certain tests. I debugged this issue and it was tracked down to the
> > > below mentioned interrupt handler code.
> > >
> > >
> > > In ISR, first reads the Interrupt Status register using ‘readl()’ as
> > > given below.
> > > status = readl(ctrl->reg + INT_STATUS);
> > >
> > >
> > > And then clears the pending interrupts using ‘writel()’ as given blow.
> > > writel(status, ctrl->reg + INT_STATUS);
> > >
> > >
> > > I've noticed a kernel hang if INT_STATUS register read again after
> > > clearing the pending interrupts.
> > >
> > > Can someone clarify me why the kernel hangs without crash dump incase
> > > if I read the INT_STATUS register using readl() after clearing the
> > > pending bits?
> > >
> > > Can readl() block?
> >
> > readl() should not block in software. Obviously at the hardware CPU
> > instruction level, the read instruction has to wait for the result of
> > the read. Since that data is provided by the device, i.e., your FPGA,
> > it's possible there's a problem there.
>
> Thank you very much for your reply.
> Where can I find the details about what is protocol for reading the
> ‘memory mapped IO’? Can you point me to any useful links..
> I tried locate the exact point of the kernel code where CPU waits for
> read instruction as given below.
> readl() -> __raw_readl() -> return *(const volatile u32 __force *)add
> Do I need to check for the assembly instructions, here?

The C pointer dereference, e.g., "*address", will be some sort of a
"load" instruction in assembly. The CPU wait isn't explicit; it's
just that when you load a value, the CPU waits for the value.
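
The point about the load instruction can be illustrated in userspace. The sketch below assumes a hypothetical helper named mmio_read32 and uses an ordinary variable in place of a device register. It mirrors the shape of the generic __raw_readl() body quoted above rather than any particular architecture's implementation.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical userspace analogue of the quoted __raw_readl() body.
 * The volatile qualifier makes the compiler emit a real load on every
 * call. Nothing here waits explicitly: on real MMIO the load simply
 * cannot complete until the device returns the data.
 */
static inline uint32_t mmio_read32(const volatile void *addr)
{
	return *(const volatile uint32_t *)addr;
}

/* Ordinary memory standing in for a device register (assumption). */
static volatile uint32_t fake_int_status = 0x0008u;
```

Because the wait is a property of the load itself, there is no software loop in readl() to instrument; the place to look is the device's response.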

> > Can you tell whether the FPGA has received the Memory Read for
> > INT_STATUS and sent the completion?
>
> Is there a way to know this with the help of software debugging(either
> enabling dynamic debugging or adding new debug prints)? Can you please
> point some tools\hw needed to find this?

You could learn this either via a PCIe analyzer (expensive piece of
hardware) or possibly some logic in the FPGA that would log PCIe
transactions in a buffer and make them accessible via some other
interface (you mentioned it had parallel and other interfaces).

> > On the architectures I'm familiar with, if a device doesn't respond,
> > something would eventually time out so the CPU doesn't wait forever.
>
> What is timeout here? I mean how long CPU waits for completion? Since
> this code runs from interrupt context, does it causes the system to
> freeze if timeout is more?

The Root Port should have a Completion Timeout. This is required by
the PCIe spec. The *reporting* of the timeout is somewhat
implementation-specific since the reporting is outside the PCIe
domain. I don't know the duration of the timeout, but it certainly
shouldn't be long enough to look like a "system freeze".
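
How a timed-out read is reported varies by platform, as noted above. One common convention, assumed here and not confirmed for this Atom root port, is that a failed read returns all ones to the CPU. Under that assumption the mask test in the posted ISR would accept 0xFFFFFFFF as a valid status, so a defensive handler might reject it first. The names below (classify_status and the ST_* values) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define INT_MASK 0x000Cu      /* Tx | Rx bits from the posted ISR */
#define ALL_ONES 0xFFFFFFFFu  /* assumed value of a failed PCIe read */

enum status_class {
	ST_NOT_OURS,     /* no Tx/Rx bit set: would map to IRQ_NONE */
	ST_DEVICE_GONE,  /* read looks like an error response */
	ST_PENDING,      /* at least one Tx/Rx bit is set */
};

/* Classify a raw INT_STATUS value before acting on it. */
static enum status_class classify_status(uint32_t status)
{
	if (status == ALL_ONES)
		return ST_DEVICE_GONE;
	if (!(status & INT_MASK))
		return ST_NOT_OURS;
	return ST_PENDING;
}
```

This check is only meaningful if all ones can never be a legitimate INT_STATUS value on this device, which the thread does not establish.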

> lspci output:
> $ lspci
> 00:00.0 Host bridge: Intel Corporation Atom Processor Z36xxx/Z37xxx
> Series SoC Transaction Register (rev 11)
> 00:02.0 VGA compatible controller: Intel Corporation Atom Processor
> Z36xxx/Z37xxx Series Graphics & Display (rev 11)
> 00:13.0 SATA controller: Intel Corporation Atom Processor E3800 Series
> SATA AHCI Controller (rev 11)
> 00:14.0 USB controller: Intel Corporation Atom Processor
> Z36xxx/Z37xxx, Celeron N2000 Series USB xHCI (rev 11)
> 00:1a.0 Encryption controller: Intel Corporation Atom Processor
> Z36xxx/Z37xxx Series Trusted Execution Engine (rev 11)
> 00:1b.0 Audio device: Intel Corporation Atom Processor Z36xxx/Z37xxx
> Series High Definition Audio Controller (rev 11)
> 00:1c.0 PCI bridge: Intel Corporation Atom Processor E3800 Series PCI
> Express Root Port 1 (rev 11)
> 00:1c.2 PCI bridge: Intel Corporation Atom Processor E3800 Series PCI
> Express Root Port 3 (rev 11)
> 00:1c.3 PCI bridge: Intel Corporation Atom Processor E3800 Series PCI
> Express Root Port 4 (rev 11)
> 00:1d.0 USB controller: Intel Corporation Atom Processor Z36xxx/Z37xxx
> Series USB EHCI (rev 11)
> 00:1f.0 ISA bridge: Intel Corporation Atom Processor Z36xxx/Z37xxx
> Series Power Control Unit (rev 11)
> 00:1f.3 SMBus: Intel Corporation Atom Processor E3800 Series SMBus
> Controller (rev 11)
> 01:00.0 RAM memory: PLDA Device 5555

Is this 01:00.0 device the FPGA?

> 03:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network
> Connection (rev 03)

2020-01-09 04:51:39

by Muni Sekhar

Subject: Re: pcie: xilinx: kernel hang - ISR readl()

On Thu, Jan 9, 2020 at 10:05 AM Bjorn Helgaas <[email protected]> wrote:
>
> On Thu, Jan 09, 2020 at 08:47:51AM +0530, Muni Sekhar wrote:
> > On Thu, Jan 9, 2020 at 1:45 AM Bjorn Helgaas <[email protected]> wrote:
> > > On Tue, Jan 07, 2020 at 09:45:13PM +0530, Muni Sekhar wrote:
> > > > Hi,
> > > >
> > > > I have module with Xilinx FPGA. It implements UART(s), SPI(s),
> > > > parallel I/O and interfaces them to the Host CPU via PCI Express bus.
> > > > I see that my system freezes without capturing the crash dump for
> > > > certain tests. I debugged this issue and it was tracked down to the
> > > > below mentioned interrupt handler code.
> > > >
> > > >
> > > > In ISR, first reads the Interrupt Status register using ‘readl()’ as
> > > > given below.
> > > > status = readl(ctrl->reg + INT_STATUS);
> > > >
> > > >
> > > > And then clears the pending interrupts using ‘writel()’ as given blow.
> > > > writel(status, ctrl->reg + INT_STATUS);
> > > >
> > > >
> > > > I've noticed a kernel hang if INT_STATUS register read again after
> > > > clearing the pending interrupts.
> > > >
> > > > Can someone clarify me why the kernel hangs without crash dump incase
> > > > if I read the INT_STATUS register using readl() after clearing the
> > > > pending bits?
> > > >
> > > > Can readl() block?
> > >
> > > readl() should not block in software. Obviously at the hardware CPU
> > > instruction level, the read instruction has to wait for the result of
> > > the read. Since that data is provided by the device, i.e., your FPGA,
> > > it's possible there's a problem there.
> >
> > Thank you very much for your reply.
> > Where can I find the details about what is protocol for reading the
> > ‘memory mapped IO’? Can you point me to any useful links..
> > I tried locate the exact point of the kernel code where CPU waits for
> > read instruction as given below.
> > readl() -> __raw_readl() -> return *(const volatile u32 __force *)add
> > Do I need to check for the assembly instructions, here?
>
> The C pointer dereference, e.g., "*address", will be some sort of a
> "load" instruction in assembly. The CPU wait isn't explicit; it's
> just that when you load a value, the CPU waits for the value.
>
> > > Can you tell whether the FPGA has received the Memory Read for
> > > INT_STATUS and sent the completion?
> >
> > Is there a way to know this with the help of software debugging(either
> > enabling dynamic debugging or adding new debug prints)? Can you please
> > point some tools\hw needed to find this?
>
> You could learn this either via a PCIe analyzer (expensive piece of
> hardware) or possibly some logic in the FPGA that would log PCIe
> transactions in a buffer and make them accessible via some other
> interface (you mentioned it had parallel and other interfaces).
>
> > > On the architectures I'm familiar with, if a device doesn't respond,
> > > something would eventually time out so the CPU doesn't wait forever.
> >
> > What is timeout here? I mean how long CPU waits for completion? Since
> > this code runs from interrupt context, does it causes the system to
> > freeze if timeout is more?
>
> The Root Port should have a Completion Timeout. This is required by
> the PCIe spec. The *reporting* of the timeout is somewhat
> implementation-specific since the reporting is outside the PCIe
> domain. I don't know the duration of the timeout, but it certainly
> shouldn't be long enough to look like a "system freeze".
>
> > lspci output:
> > $ lspci
> > 00:00.0 Host bridge: Intel Corporation Atom Processor Z36xxx/Z37xxx
> > Series SoC Transaction Register (rev 11)
> > 00:02.0 VGA compatible controller: Intel Corporation Atom Processor
> > Z36xxx/Z37xxx Series Graphics & Display (rev 11)
> > 00:13.0 SATA controller: Intel Corporation Atom Processor E3800 Series
> > SATA AHCI Controller (rev 11)
> > 00:14.0 USB controller: Intel Corporation Atom Processor
> > Z36xxx/Z37xxx, Celeron N2000 Series USB xHCI (rev 11)
> > 00:1a.0 Encryption controller: Intel Corporation Atom Processor
> > Z36xxx/Z37xxx Series Trusted Execution Engine (rev 11)
> > 00:1b.0 Audio device: Intel Corporation Atom Processor Z36xxx/Z37xxx
> > Series High Definition Audio Controller (rev 11)
> > 00:1c.0 PCI bridge: Intel Corporation Atom Processor E3800 Series PCI
> > Express Root Port 1 (rev 11)
> > 00:1c.2 PCI bridge: Intel Corporation Atom Processor E3800 Series PCI
> > Express Root Port 3 (rev 11)
> > 00:1c.3 PCI bridge: Intel Corporation Atom Processor E3800 Series PCI
> > Express Root Port 4 (rev 11)
> > 00:1d.0 USB controller: Intel Corporation Atom Processor Z36xxx/Z37xxx
> > Series USB EHCI (rev 11)
> > 00:1f.0 ISA bridge: Intel Corporation Atom Processor Z36xxx/Z37xxx
> > Series Power Control Unit (rev 11)
> > 00:1f.3 SMBus: Intel Corporation Atom Processor E3800 Series SMBus
> > Controller (rev 11)
> > 01:00.0 RAM memory: PLDA Device 5555
>
> Is this 01:00.0 device the FPGA?
Yes, you are correct. The FPGA is 01:00.0 RAM memory: PLDA Device 5555

>
> > 03:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network
> > Connection (rev 03)



--
Thanks,
Sekhar

2020-01-18 01:49:02

by Muni Sekhar

Subject: Re: pcie: xilinx: kernel hang - ISR readl()

On Thu, Jan 9, 2020 at 10:05 AM Bjorn Helgaas <[email protected]> wrote:
>
> On Thu, Jan 09, 2020 at 08:47:51AM +0530, Muni Sekhar wrote:
> > On Thu, Jan 9, 2020 at 1:45 AM Bjorn Helgaas <[email protected]> wrote:
> > > On Tue, Jan 07, 2020 at 09:45:13PM +0530, Muni Sekhar wrote:
> > > > Hi,
> > > >
> > > > I have module with Xilinx FPGA. It implements UART(s), SPI(s),
> > > > parallel I/O and interfaces them to the Host CPU via PCI Express bus.
> > > > I see that my system freezes without capturing the crash dump for
> > > > certain tests. I debugged this issue and it was tracked down to the
> > > > below mentioned interrupt handler code.
> > > >
> > > >
> > > > In ISR, first reads the Interrupt Status register using ‘readl()’ as
> > > > given below.
> > > > status = readl(ctrl->reg + INT_STATUS);
> > > >
> > > >
> > > > And then clears the pending interrupts using ‘writel()’ as given blow.
> > > > writel(status, ctrl->reg + INT_STATUS);
> > > >
> > > >
> > > > I've noticed a kernel hang if INT_STATUS register read again after
> > > > clearing the pending interrupts.
> > > >
> > > > Can someone clarify me why the kernel hangs without crash dump incase
> > > > if I read the INT_STATUS register using readl() after clearing the
> > > > pending bits?
> > > >
> > > > Can readl() block?
> > >
> > > readl() should not block in software. Obviously at the hardware CPU
> > > instruction level, the read instruction has to wait for the result of
> > > the read. Since that data is provided by the device, i.e., your FPGA,
> > > it's possible there's a problem there.
> >
> > Thank you very much for your reply.
> > Where can I find the details about what is protocol for reading the
> > ‘memory mapped IO’? Can you point me to any useful links..
> > I tried locate the exact point of the kernel code where CPU waits for
> > read instruction as given below.
> > readl() -> __raw_readl() -> return *(const volatile u32 __force *)add
> > Do I need to check for the assembly instructions, here?
>
> The C pointer dereference, e.g., "*address", will be some sort of a
> "load" instruction in assembly. The CPU wait isn't explicit; it's
> just that when you load a value, the CPU waits for the value.
>
> > > Can you tell whether the FPGA has received the Memory Read for
> > > INT_STATUS and sent the completion?
I have not seen any ‘missing’ completions on the logic analyser. Are
there any other ways to debug this?

> >
> > Is there a way to know this with the help of software debugging(either
> > enabling dynamic debugging or adding new debug prints)? Can you please
> > point some tools\hw needed to find this?
>
> You could learn this either via a PCIe analyzer (expensive piece of
> hardware) or possibly some logic in the FPGA that would log PCIe
> transactions in a buffer and make them accessible via some other
> interface (you mentioned it had parallel and other interfaces).
>
> > > On the architectures I'm familiar with, if a device doesn't respond,
> > > something would eventually time out so the CPU doesn't wait forever.
> >
> > What is timeout here? I mean how long CPU waits for completion? Since
> > this code runs from interrupt context, does it causes the system to
> > freeze if timeout is more?
>
> The Root Port should have a Completion Timeout. This is required by
> the PCIe spec. The *reporting* of the timeout is somewhat
> implementation-specific since the reporting is outside the PCIe
> domain. I don't know the duration of the timeout, but it certainly
> shouldn't be long enough to look like a "system freeze".
>
> > lspci output:
> > $ lspci
> > 00:00.0 Host bridge: Intel Corporation Atom Processor Z36xxx/Z37xxx
> > Series SoC Transaction Register (rev 11)
> > 00:02.0 VGA compatible controller: Intel Corporation Atom Processor
> > Z36xxx/Z37xxx Series Graphics & Display (rev 11)
> > 00:13.0 SATA controller: Intel Corporation Atom Processor E3800 Series
> > SATA AHCI Controller (rev 11)
> > 00:14.0 USB controller: Intel Corporation Atom Processor
> > Z36xxx/Z37xxx, Celeron N2000 Series USB xHCI (rev 11)
> > 00:1a.0 Encryption controller: Intel Corporation Atom Processor
> > Z36xxx/Z37xxx Series Trusted Execution Engine (rev 11)
> > 00:1b.0 Audio device: Intel Corporation Atom Processor Z36xxx/Z37xxx
> > Series High Definition Audio Controller (rev 11)
> > 00:1c.0 PCI bridge: Intel Corporation Atom Processor E3800 Series PCI
> > Express Root Port 1 (rev 11)
> > 00:1c.2 PCI bridge: Intel Corporation Atom Processor E3800 Series PCI
> > Express Root Port 3 (rev 11)
> > 00:1c.3 PCI bridge: Intel Corporation Atom Processor E3800 Series PCI
> > Express Root Port 4 (rev 11)
> > 00:1d.0 USB controller: Intel Corporation Atom Processor Z36xxx/Z37xxx
> > Series USB EHCI (rev 11)
> > 00:1f.0 ISA bridge: Intel Corporation Atom Processor Z36xxx/Z37xxx
> > Series Power Control Unit (rev 11)
> > 00:1f.3 SMBus: Intel Corporation Atom Processor E3800 Series SMBus
> > Controller (rev 11)
> > 01:00.0 RAM memory: PLDA Device 5555
>
> Is this 01:00.0 device the FPGA?
>
> > 03:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network
> > Connection (rev 03)



--
Thanks,
Sekhar

2020-01-28 17:41:55

by Bjorn Helgaas

Subject: Re: pcie: xilinx: kernel hang - ISR readl()

On Sat, Jan 18, 2020 at 07:16:14AM +0530, Muni Sekhar wrote:
> On Thu, Jan 9, 2020 at 10:05 AM Bjorn Helgaas <[email protected]> wrote:
> >
> > On Thu, Jan 09, 2020 at 08:47:51AM +0530, Muni Sekhar wrote:
> > > On Thu, Jan 9, 2020 at 1:45 AM Bjorn Helgaas <[email protected]> wrote:
> > > > On Tue, Jan 07, 2020 at 09:45:13PM +0530, Muni Sekhar wrote:
> > > > > Hi,
> > > > >
> > > > > I have module with Xilinx FPGA. It implements UART(s), SPI(s),
> > > > > parallel I/O and interfaces them to the Host CPU via PCI Express bus.
> > > > > I see that my system freezes without capturing the crash dump for
> > > > > certain tests. I debugged this issue and it was tracked down to the
> > > > > below mentioned interrupt handler code.
> > > > >
> > > > >
> > > > > In ISR, first reads the Interrupt Status register using ‘readl()’ as
> > > > > given below.
> > > > > status = readl(ctrl->reg + INT_STATUS);
> > > > >
> > > > >
> > > > > And then clears the pending interrupts using ‘writel()’ as given blow.
> > > > > writel(status, ctrl->reg + INT_STATUS);
> > > > >
> > > > >
> > > > > I've noticed a kernel hang if INT_STATUS register read again after
> > > > > clearing the pending interrupts.
> > > > >
> > > > > Can someone clarify me why the kernel hangs without crash dump incase
> > > > > if I read the INT_STATUS register using readl() after clearing the
> > > > > pending bits?
> > > > >
> > > > > Can readl() block?
> > > >
> > > > readl() should not block in software. Obviously at the hardware CPU
> > > > instruction level, the read instruction has to wait for the result of
> > > > the read. Since that data is provided by the device, i.e., your FPGA,
> > > > it's possible there's a problem there.
> > >
> > > Thank you very much for your reply.
> > > Where can I find the details about what is protocol for reading the
> > > ‘memory mapped IO’? Can you point me to any useful links..
> > > I tried locate the exact point of the kernel code where CPU waits for
> > > read instruction as given below.
> > > readl() -> __raw_readl() -> return *(const volatile u32 __force *)add
> > > Do I need to check for the assembly instructions, here?
> >
> > The C pointer dereference, e.g., "*address", will be some sort of a
> > "load" instruction in assembly. The CPU wait isn't explicit; it's
> > just that when you load a value, the CPU waits for the value.
> >
> > > > Can you tell whether the FPGA has received the Memory Read for
> > > > INT_STATUS and sent the completion?
> I have not seen any ‘missing’ completions on the logic analyser. Is
> there any other ways to debug this one?

If you see the Memory Read and the associated Completion, and you
still see a hang in the kernel, then most likely the problem is not
in PCIe.

I would start by trying to prove that the instruction after the
readl() is or is not executed.
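
One way to run that experiment is to leave a marker before and after the suspect read, in a place that can still be inspected after the freeze (for example, a write to a spare device register that the logic analyser can observe; whether such a register exists here is not established in the thread). The userspace sketch below only models the idea. The names breadcrumb, traced_read, and fake_status_read are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum crumb { CRUMB_NONE, CRUMB_BEFORE_READ, CRUMB_AFTER_READ };

/* Last marker reached; a real driver needs storage that survives a hang. */
static volatile enum crumb breadcrumb = CRUMB_NONE;

typedef uint32_t (*reg_reader)(void);

/* Mark, perform the suspect read, then mark again. */
static uint32_t traced_read(reg_reader read_reg)
{
	uint32_t val;

	breadcrumb = CRUMB_BEFORE_READ;
	val = read_reg();
	breadcrumb = CRUMB_AFTER_READ;
	return val;
}

/* Stand-in for readl(ctrl->reg + INT_STATUS) that always returns. */
static uint32_t fake_status_read(void)
{
	return 0x0008u;
}
```

If the marker is stuck at the "before" value after the freeze, the load never completed; if it reached the "after" value, the hang is somewhere later in the handler.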

Bjorn

2020-01-30 16:09:30

by Muni Sekhar

Subject: Re: pcie: xilinx: kernel hang - ISR readl()

On Thu, Jan 9, 2020 at 10:05 AM Bjorn Helgaas <[email protected]> wrote:
>
> On Thu, Jan 09, 2020 at 08:47:51AM +0530, Muni Sekhar wrote:
> > On Thu, Jan 9, 2020 at 1:45 AM Bjorn Helgaas <[email protected]> wrote:
> > > On Tue, Jan 07, 2020 at 09:45:13PM +0530, Muni Sekhar wrote:
> > > > Hi,
> > > >
> > > > I have module with Xilinx FPGA. It implements UART(s), SPI(s),
> > > > parallel I/O and interfaces them to the Host CPU via PCI Express bus.
> > > > I see that my system freezes without capturing the crash dump for
> > > > certain tests. I debugged this issue and it was tracked down to the
> > > > below mentioned interrupt handler code.
> > > >
> > > >
> > > > In ISR, first reads the Interrupt Status register using ‘readl()’ as
> > > > given below.
> > > > status = readl(ctrl->reg + INT_STATUS);
> > > >
> > > >
> > > > And then clears the pending interrupts using ‘writel()’ as given blow.
> > > > writel(status, ctrl->reg + INT_STATUS);
> > > >
> > > >
> > > > I've noticed a kernel hang if INT_STATUS register read again after
> > > > clearing the pending interrupts.
> > > >
> > > > Can someone clarify me why the kernel hangs without crash dump incase
> > > > if I read the INT_STATUS register using readl() after clearing the
> > > > pending bits?
> > > >
> > > > Can readl() block?
> > >
> > > readl() should not block in software. Obviously at the hardware CPU
> > > instruction level, the read instruction has to wait for the result of
> > > the read. Since that data is provided by the device, i.e., your FPGA,
> > > it's possible there's a problem there.
> >
> > Thank you very much for your reply.
> > Where can I find the details about what is protocol for reading the
> > ‘memory mapped IO’? Can you point me to any useful links..
> > I tried locate the exact point of the kernel code where CPU waits for
> > read instruction as given below.
> > readl() -> __raw_readl() -> return *(const volatile u32 __force *)add
> > Do I need to check for the assembly instructions, here?
>
> The C pointer dereference, e.g., "*address", will be some sort of a
> "load" instruction in assembly. The CPU wait isn't explicit; it's
> just that when you load a value, the CPU waits for the value.
>
> > > Can you tell whether the FPGA has received the Memory Read for
> > > INT_STATUS and sent the completion?
> >
> > Is there a way to know this with the help of software debugging(either
> > enabling dynamic debugging or adding new debug prints)? Can you please
> > point some tools\hw needed to find this?
>
> You could learn this either via a PCIe analyzer (expensive piece of
> hardware) or possibly some logic in the FPGA that would log PCIe
> transactions in a buffer and make them accessible via some other
> interface (you mentioned it had parallel and other interfaces).
>
> > > On the architectures I'm familiar with, if a device doesn't respond,
> > > something would eventually time out so the CPU doesn't wait forever.
> >
> > What is timeout here? I mean how long CPU waits for completion? Since
> > this code runs from interrupt context, does it causes the system to
> > freeze if timeout is more?
>
> The Root Port should have a Completion Timeout. This is required by
> the PCIe spec. The *reporting* of the timeout is somewhat
> implementation-specific since the reporting is outside the PCIe
> domain. I don't know the duration of the timeout, but it certainly
> shouldn't be long enough to look like a "system freeze".
Does the kernel write to the PCIe configuration space register ‘Device
Control 2’ (offset 0x28)? When I read this register, I noticed that
bit 4 is set (which disables completion timeouts) and all other bits
are zero. So the Completion Timeout detection mechanism is disabled,
right? If so, what could be the reason for this?

>
> > lspci output:
> > $ lspci
> > 00:00.0 Host bridge: Intel Corporation Atom Processor Z36xxx/Z37xxx
> > Series SoC Transaction Register (rev 11)
> > 00:02.0 VGA compatible controller: Intel Corporation Atom Processor
> > Z36xxx/Z37xxx Series Graphics & Display (rev 11)
> > 00:13.0 SATA controller: Intel Corporation Atom Processor E3800 Series
> > SATA AHCI Controller (rev 11)
> > 00:14.0 USB controller: Intel Corporation Atom Processor
> > Z36xxx/Z37xxx, Celeron N2000 Series USB xHCI (rev 11)
> > 00:1a.0 Encryption controller: Intel Corporation Atom Processor
> > Z36xxx/Z37xxx Series Trusted Execution Engine (rev 11)
> > 00:1b.0 Audio device: Intel Corporation Atom Processor Z36xxx/Z37xxx
> > Series High Definition Audio Controller (rev 11)
> > 00:1c.0 PCI bridge: Intel Corporation Atom Processor E3800 Series PCI
> > Express Root Port 1 (rev 11)
> > 00:1c.2 PCI bridge: Intel Corporation Atom Processor E3800 Series PCI
> > Express Root Port 3 (rev 11)
> > 00:1c.3 PCI bridge: Intel Corporation Atom Processor E3800 Series PCI
> > Express Root Port 4 (rev 11)
> > 00:1d.0 USB controller: Intel Corporation Atom Processor Z36xxx/Z37xxx
> > Series USB EHCI (rev 11)
> > 00:1f.0 ISA bridge: Intel Corporation Atom Processor Z36xxx/Z37xxx
> > Series Power Control Unit (rev 11)
> > 00:1f.3 SMBus: Intel Corporation Atom Processor E3800 Series SMBus
> > Controller (rev 11)
> > 01:00.0 RAM memory: PLDA Device 5555
>
> Is this 01:00.0 device the FPGA?
>
> > 03:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network
> > Connection (rev 03)



--
Thanks,
Sekhar

2020-01-30 19:01:57

by Bjorn Helgaas

Subject: Re: pcie: xilinx: kernel hang - ISR readl()

On Thu, Jan 30, 2020 at 09:37:48PM +0530, Muni Sekhar wrote:
> On Thu, Jan 9, 2020 at 10:05 AM Bjorn Helgaas wrote:
> >
> > On Thu, Jan 09, 2020 at 08:47:51AM +0530, Muni Sekhar wrote:
> > > On Thu, Jan 9, 2020 at 1:45 AM Bjorn Helgaas wrote:
> > > > On Tue, Jan 07, 2020 at 09:45:13PM +0530, Muni Sekhar wrote:
> > > > > Hi,
> > > > >
> > > > > I have module with Xilinx FPGA. It implements UART(s), SPI(s),
> > > > > parallel I/O and interfaces them to the Host CPU via PCI Express bus.
> > > > > I see that my system freezes without capturing the crash dump for
> > > > > certain tests. I debugged this issue and it was tracked down to the
> > > > > below mentioned interrupt handler code.
> > > > >
> > > > >
> > > > > In ISR, first reads the Interrupt Status register using ‘readl()’ as
> > > > > given below.
> > > > > status = readl(ctrl->reg + INT_STATUS);
> > > > >
> > > > >
> > > > > And then clears the pending interrupts using ‘writel()’ as given blow.
> > > > > writel(status, ctrl->reg + INT_STATUS);
> > > > >
> > > > >
> > > > > I've noticed a kernel hang if INT_STATUS register read again after
> > > > > clearing the pending interrupts.
> > > > >
> > > > > Can someone clarify me why the kernel hangs without crash dump incase
> > > > > if I read the INT_STATUS register using readl() after clearing the
> > > > > pending bits?
> > > > >
> > > > > Can readl() block?
> > > >
> > > > readl() should not block in software. Obviously at the hardware CPU
> > > > instruction level, the read instruction has to wait for the result of
> > > > the read. Since that data is provided by the device, i.e., your FPGA,
> > > > it's possible there's a problem there.
> > >
> > > Thank you very much for your reply.
> > > Where can I find the details about what is protocol for reading the
> > > ‘memory mapped IO’? Can you point me to any useful links..
> > > I tried locate the exact point of the kernel code where CPU waits for
> > > read instruction as given below.
> > > readl() -> __raw_readl() -> return *(const volatile u32 __force *)add
> > > Do I need to check for the assembly instructions, here?
> >
> > The C pointer dereference, e.g., "*address", will be some sort of a
> > "load" instruction in assembly. The CPU wait isn't explicit; it's
> > just that when you load a value, the CPU waits for the value.
> >
> > > > Can you tell whether the FPGA has received the Memory Read for
> > > > INT_STATUS and sent the completion?
> > >
> > > Is there a way to know this with the help of software debugging(either
> > > enabling dynamic debugging or adding new debug prints)? Can you please
> > > point some tools\hw needed to find this?
> >
> > You could learn this either via a PCIe analyzer (expensive piece of
> > hardware) or possibly some logic in the FPGA that would log PCIe
> > transactions in a buffer and make them accessible via some other
> > interface (you mentioned it had parallel and other interfaces).
> >
> > > > On the architectures I'm familiar with, if a device doesn't respond,
> > > > something would eventually time out so the CPU doesn't wait forever.
> > >
> > > What is timeout here? I mean how long CPU waits for completion? Since
> > > this code runs from interrupt context, does it causes the system to
> > > freeze if timeout is more?
> >
> > The Root Port should have a Completion Timeout. This is required by
> > the PCIe spec. The *reporting* of the timeout is somewhat
> > implementation-specific since the reporting is outside the PCIe
> > domain. I don't know the duration of the timeout, but it certainly
> > shouldn't be long enough to look like a "system freeze".
> Does kernel writes to PCIe configuration space register ‘Device
> Control 2 Register’ (Offset 0x28)? When I tried to read this register,
> I noticed bit 4 is set (which disables completion timeouts) and rest
> all other bits are zero. So, Completion Timeout detection mechanism is
> disabled, right? If so what could be the reason for this?

To my knowledge Linux doesn't set PCI_EXP_DEVCTL2_COMP_TMOUT_DIS
except for one powerpc case. You can check yourself by using cscope
or grep to look for PCI_EXP_DEVCTL2_COMP_TMOUT_DIS or PCI_EXP_DEVCTL2.

If you're seeing bit 4 (PCI_EXP_DEVCTL2_COMP_TMOUT_DIS) set, it's
likely because firmware set it. You can try booting with
"pci=earlydump" to see what's there before Linux starts changing
things.
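
To make the bit test concrete, here is a small user-space sketch that
decodes a raw Device Control 2 value. The constant values mirror the
PCI_EXP_DEVCTL2_* definitions in include/uapi/linux/pci_regs.h; the
DEVCTL2_* names and the helper functions are illustrative only and are
not kernel APIs.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Values mirror PCI_EXP_DEVCTL2_* in include/uapi/linux/pci_regs.h.
 * The local names are hypothetical, chosen for this sketch.
 */
enum {
	DEVCTL2_COMP_TIMEOUT   = 0x000f, /* Completion Timeout Value, bits 3:0 */
	DEVCTL2_COMP_TMOUT_DIS = 0x0010, /* Completion Timeout Disable, bit 4 */
	DEVCTL2_LTR_EN         = 0x0400, /* LTR Mechanism Enable, bit 10 */
};

/* True if the Completion Timeout Disable bit is set. */
static inline bool devctl2_timeout_disabled(uint16_t devctl2)
{
	return (devctl2 & DEVCTL2_COMP_TMOUT_DIS) != 0;
}

/* Encoded Completion Timeout range; 0 selects the default range. */
static inline unsigned int devctl2_timeout_range(uint16_t devctl2)
{
	return devctl2 & DEVCTL2_COMP_TIMEOUT;
}

/* True if the LTR mechanism is enabled. */
static inline bool devctl2_ltr_enabled(uint16_t devctl2)
{
	return (devctl2 & DEVCTL2_LTR_EN) != 0;
}
```

For a register with only bit 4 set (0x0010), as described above,
devctl2_timeout_disabled() returns true and devctl2_timeout_range()
returns 0.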

Bjorn

2020-01-31 11:34:16

by David Laight

Subject: RE: pcie: xilinx: kernel hang - ISR readl()

From: Bjorn Helgaas
> Sent: 30 January 2020 19:01
..
> > > You could learn this either via a PCIe analyzer (expensive piece of
> > > hardware) or possibly some logic in the FPGA that would log PCIe
> > > transactions in a buffer and make them accessible via some other
> > > interface (you mentioned it had parallel and other interfaces).

You can probably use the Xilinx equivalent of Altera's SignalTap
(ChipScope, or the Integrated Logic Analyzer in Vivado) to work out
what is happening within the FPGA.

David


2020-01-31 16:35:29

by Muni Sekhar

Subject: Re: pcie: xilinx: kernel hang - ISR readl()

On Fri, Jan 31, 2020 at 12:30 AM Bjorn Helgaas wrote:
>
> On Thu, Jan 30, 2020 at 09:37:48PM +0530, Muni Sekhar wrote:
> > On Thu, Jan 9, 2020 at 10:05 AM Bjorn Helgaas wrote:
> > >
> > > On Thu, Jan 09, 2020 at 08:47:51AM +0530, Muni Sekhar wrote:
> > > > On Thu, Jan 9, 2020 at 1:45 AM Bjorn Helgaas wrote:
> > > > > On Tue, Jan 07, 2020 at 09:45:13PM +0530, Muni Sekhar wrote:
> > > > > > Hi,
> > > > > >
> > > > > > I have module with Xilinx FPGA. It implements UART(s), SPI(s),
> > > > > > parallel I/O and interfaces them to the Host CPU via PCI Express bus.
> > > > > > I see that my system freezes without capturing the crash dump for
> > > > > > certain tests. I debugged this issue and it was tracked down to the
> > > > > > below mentioned interrupt handler code.
> > > > > >
> > > > > >
> > > > > > In ISR, first reads the Interrupt Status register using ‘readl()’ as
> > > > > > given below.
> > > > > > status = readl(ctrl->reg + INT_STATUS);
> > > > > >
> > > > > >
> > > > > > And then clears the pending interrupts using ‘writel()’ as given blow.
> > > > > > writel(status, ctrl->reg + INT_STATUS);
> > > > > >
> > > > > >
> > > > > > I've noticed a kernel hang if INT_STATUS register read again after
> > > > > > clearing the pending interrupts.
> > > > > >
> > > > > > Can someone clarify me why the kernel hangs without crash dump incase
> > > > > > if I read the INT_STATUS register using readl() after clearing the
> > > > > > pending bits?
> > > > > >
> > > > > > Can readl() block?
> > > > >
> > > > > readl() should not block in software. Obviously at the hardware CPU
> > > > > instruction level, the read instruction has to wait for the result of
> > > > > the read. Since that data is provided by the device, i.e., your FPGA,
> > > > > it's possible there's a problem there.
> > > >
> > > > Thank you very much for your reply.
> > > > Where can I find the details about what is protocol for reading the
> > > > ‘memory mapped IO’? Can you point me to any useful links..
> > > > I tried locate the exact point of the kernel code where CPU waits for
> > > > read instruction as given below.
> > > > readl() -> __raw_readl() -> return *(const volatile u32 __force *)add
> > > > Do I need to check for the assembly instructions, here?
> > >
> > > The C pointer dereference, e.g., "*address", will be some sort of a
> > > "load" instruction in assembly. The CPU wait isn't explicit; it's
> > > just that when you load a value, the CPU waits for the value.
> > >
> > > > > Can you tell whether the FPGA has received the Memory Read for
> > > > > INT_STATUS and sent the completion?
> > > >
> > > > Is there a way to know this with the help of software debugging(either
> > > > enabling dynamic debugging or adding new debug prints)? Can you please
> > > > point some tools\hw needed to find this?
> > >
> > > You could learn this either via a PCIe analyzer (expensive piece of
> > > hardware) or possibly some logic in the FPGA that would log PCIe
> > > transactions in a buffer and make them accessible via some other
> > > interface (you mentioned it had parallel and other interfaces).
> > >
> > > > > On the architectures I'm familiar with, if a device doesn't respond,
> > > > > something would eventually time out so the CPU doesn't wait forever.
> > > >
> > > > What is timeout here? I mean how long CPU waits for completion? Since
> > > > this code runs from interrupt context, does it causes the system to
> > > > freeze if timeout is more?
> > >
> > > The Root Port should have a Completion Timeout. This is required by
> > > the PCIe spec. The *reporting* of the timeout is somewhat
> > > implementation-specific since the reporting is outside the PCIe
> > > domain. I don't know the duration of the timeout, but it certainly
> > > shouldn't be long enough to look like a "system freeze".
> > Does kernel writes to PCIe configuration space register ‘Device
> > Control 2 Register’ (Offset 0x28)? When I tried to read this register,
> > I noticed bit 4 is set (which disables completion timeouts) and rest
> > all other bits are zero. So, Completion Timeout detection mechanism is
> > disabled, right? If so what could be the reason for this?
>
> To my knowledge Linux doesn't set PCI_EXP_DEVCTL2_COMP_TMOUT_DIS
> except for one powerpc case. You can check yourself by using cscope
> or grep to look for PCI_EXP_DEVCTL2_COMP_TMOUT_DIS or PCI_EXP_DEVCTL2.
>
> If you're seeing bit 4 (PCI_EXP_DEVCTL2_COMP_TMOUT_DIS) set, it's
> likely because firmware set it. You can try booting with
> "pci=earlydump" to see what's there before Linux starts changing
> things.

[ 0.000000] pci 0000:01:00.0 config space:

00: 56 15 55 55 07 00 10 00 00 00 00 05 10 00 00 00
10: 00 00 40 d0 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 40 00 00
30: 00 00 00 00 40 00 00 00 00 00 00 00 0b 01 00 00
40: 01 48 03 00 08 00 00 00 05 60 00 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 10 00 02 00 c2 8f 00 00 00 28 01 00 21 f4 03 00
70: 01 00 21 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 00 00 00 00 02 00 00 00 10 00 00 00 00 00 00 00
90: 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00


"Device Control 2" is located at offset 0x28 in the PCI Express
Capability Structure. But where is the PCI Express Capability Structure
located in the configuration space dumped above?
>
> Bjorn



--
Thanks,
Sekhar

2020-01-31 20:48:50

by Bjorn Helgaas

Subject: Re: pcie: xilinx: kernel hang - ISR readl()

On Fri, Jan 31, 2020 at 10:04:05PM +0530, Muni Sekhar wrote:
> On Fri, Jan 31, 2020 at 12:30 AM Bjorn Helgaas wrote:
> > On Thu, Jan 30, 2020 at 09:37:48PM +0530, Muni Sekhar wrote:
> > > On Thu, Jan 9, 2020 at 10:05 AM Bjorn Helgaas wrote:
> > > >
> > > > On Thu, Jan 09, 2020 at 08:47:51AM +0530, Muni Sekhar wrote:
> > > > > On Thu, Jan 9, 2020 at 1:45 AM Bjorn Helgaas wrote:
> > > > > > On Tue, Jan 07, 2020 at 09:45:13PM +0530, Muni Sekhar wrote:
> > > > > > > Hi,
> > > > > > >
> > > > > > > I have module with Xilinx FPGA. It implements UART(s), SPI(s),
> > > > > > > parallel I/O and interfaces them to the Host CPU via PCI Express bus.
> > > > > > > I see that my system freezes without capturing the crash dump for
> > > > > > > certain tests. I debugged this issue and it was tracked down to the
> > > > > > > below mentioned interrupt handler code.
> > > > > > >
> > > > > > >
> > > > > > > In ISR, first reads the Interrupt Status register using ‘readl()’ as
> > > > > > > given below.
> > > > > > > status = readl(ctrl->reg + INT_STATUS);
> > > > > > >
> > > > > > >
> > > > > > > And then clears the pending interrupts using ‘writel()’ as given blow.
> > > > > > > writel(status, ctrl->reg + INT_STATUS);
> > > > > > >
> > > > > > >
> > > > > > > I've noticed a kernel hang if INT_STATUS register read again after
> > > > > > > clearing the pending interrupts.
> > > > > > >
> > > > > > > Can someone clarify me why the kernel hangs without crash dump incase
> > > > > > > if I read the INT_STATUS register using readl() after clearing the
> > > > > > > pending bits?
> > > > > > >
> > > > > > > Can readl() block?
> > > > > >
> > > > > > readl() should not block in software. Obviously at the hardware CPU
> > > > > > instruction level, the read instruction has to wait for the result of
> > > > > > the read. Since that data is provided by the device, i.e., your FPGA,
> > > > > > it's possible there's a problem there.
> > > > >
> > > > > Thank you very much for your reply.
> > > > > Where can I find the details about what is protocol for reading the
> > > > > ‘memory mapped IO’? Can you point me to any useful links..
> > > > > I tried locate the exact point of the kernel code where CPU waits for
> > > > > read instruction as given below.
> > > > > readl() -> __raw_readl() -> return *(const volatile u32 __force *)add
> > > > > Do I need to check for the assembly instructions, here?
> > > >
> > > > The C pointer dereference, e.g., "*address", will be some sort of a
> > > > "load" instruction in assembly. The CPU wait isn't explicit; it's
> > > > just that when you load a value, the CPU waits for the value.
> > > >
> > > > > > Can you tell whether the FPGA has received the Memory Read for
> > > > > > INT_STATUS and sent the completion?
> > > > >
> > > > > Is there a way to know this with the help of software debugging(either
> > > > > enabling dynamic debugging or adding new debug prints)? Can you please
> > > > > point some tools\hw needed to find this?
> > > >
> > > > You could learn this either via a PCIe analyzer (expensive piece of
> > > > hardware) or possibly some logic in the FPGA that would log PCIe
> > > > transactions in a buffer and make them accessible via some other
> > > > interface (you mentioned it had parallel and other interfaces).
> > > >
> > > > > > On the architectures I'm familiar with, if a device doesn't respond,
> > > > > > something would eventually time out so the CPU doesn't wait forever.
> > > > >
> > > > > What is timeout here? I mean how long CPU waits for completion? Since
> > > > > this code runs from interrupt context, does it causes the system to
> > > > > freeze if timeout is more?
> > > >
> > > > The Root Port should have a Completion Timeout. This is required by
> > > > the PCIe spec. The *reporting* of the timeout is somewhat
> > > > implementation-specific since the reporting is outside the PCIe
> > > > domain. I don't know the duration of the timeout, but it certainly
> > > > shouldn't be long enough to look like a "system freeze".
> > > Does kernel writes to PCIe configuration space register ‘Device
> > > Control 2 Register’ (Offset 0x28)? When I tried to read this register,
> > > I noticed bit 4 is set (which disables completion timeouts) and rest
> > > all other bits are zero. So, Completion Timeout detection mechanism is
> > > disabled, right? If so what could be the reason for this?
> >
> > To my knowledge Linux doesn't set PCI_EXP_DEVCTL2_COMP_TMOUT_DIS
> > except for one powerpc case. You can check yourself by using cscope
> > or grep to look for PCI_EXP_DEVCTL2_COMP_TMOUT_DIS or PCI_EXP_DEVCTL2.
> >
> > If you're seeing bit 4 (PCI_EXP_DEVCTL2_COMP_TMOUT_DIS) set, it's
> > likely because firmware set it. You can try booting with
> > "pci=earlydump" to see what's there before Linux starts changing
> > things.
>
> [ 0.000000] pci 0000:01:00.0 config space:
>
> 00: 56 15 55 55 07 00 10 00 00 00 00 05 10 00 00 00
> 10: 00 00 40 d0 00 00 00 00 00 00 00 00 00 00 00 00
> 20: 00 00 00 00 00 00 00 00 00 00 00 00 00 40 00 00
> 30: 00 00 00 00 40 00 00 00 00 00 00 00 0b 01 00 00
> 40: 01 48 03 00 08 00 00 00 05 60 00 00 00 00 00 00
> 50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 60: 10 00 02 00 c2 8f 00 00 00 28 01 00 21 f4 03 00
> 70: 01 00 21 00 00 00 00 00 00 00 00 00 00 00 00 00
> 80: 00 00 00 00 02 00 00 00 10 00 00 00 00 00 00 00
> 90: 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00
> a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
>
> Device Control 2" is located @offset 0x28 in PCI Express Capability
> Structure. But where does 'PCI Express Capability Structure' located
> in the above mentioned 'PCI Express Configuration Space'?

"lspci -v" tells you the location of the capability. For example, on
my system:

# lspci -vxxxs1c.0
00:1c.0 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port (rev f1) (prog-if 00 [Normal decode])
Flags: bus master, fast devsel, latency 0, IRQ 122
Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
I/O behind bridge: None
Memory behind bridge: f1100000-f11fffff [size=1M]
Prefetchable memory behind bridge: None
Capabilities: [40] Express Root Port (Slot+), MSI 00
Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
Capabilities: [90] Subsystem: Lenovo Sunrise Point-LP PCI Express Root Port
Capabilities: [a0] Power Management version 3
Capabilities: [100] Advanced Error Reporting
Capabilities: [140] Access Control Services
Capabilities: [200] L1 PM Substates
Capabilities: [220] Secondary PCI Express
LnkCtl3: LnkEquIntrruptEn-, PerformEqu-
LaneErrStat: 0
Kernel driver in use: pcieport
00: 86 80 10 9d 07 04 10 00 f1 00 04 06 00 00 81 00
10: 00 00 00 00 00 00 00 00 00 02 02 00 f0 00 00 20
20: 10 f1 10 f1 f1 ff 01 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 40 00 00 00 00 00 00 00 ff 01 02 00
40: 10 80 42 01 01 80 00 00 20 00 10 00 13 48 72 01
50: 42 00 12 70 00 b2 04 00 00 00 48 01 00 00 00 00
60: 00 00 00 00 37 08 00 00 00 04 00 00 0e 00 00 00
70: 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 05 90 01 00 18 02 e0 fe 00 00 00 00 00 00 00 00
90: 0d a0 00 00 aa 17 38 22 00 00 00 00 00 00 00 00
a0: 01 00 03 c8 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 11 10 00 07 42 18 00 00 08 00 9e 8b 00 00 00 00
e0: 00 f7 73 00 03 90 00 00 16 80 12 00 00 00 00 00
f0: 50 01 00 00 00 03 00 40 b3 0f 30 08 04 00 00 01

The PCI Express capability is at "[40]" (0x40) and PCI_EXP_DEVCTL2 is
a 16-bit register at offset 40 (0x28) from that. So on my system,
PCI_EXP_DEVCTL2 is at 0x68 with value 0x0400 (PCI_EXP_DEVCTL2_LTR_EN).
This matches what lspci decodes:

# lspci -vvs1c.0 | grep -A1 DevCtl2
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled ARIFwd-
AtomicOpsCtl: ReqEn- EgressBlck-
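
The same lookup can be done programmatically by walking the capability
list in a 256-byte config space dump, which is how the "[40]" above is
found. The sketch below is a user-space illustration only:
find_pci_cap() and read_le16() are hypothetical helpers, not kernel or
pciutils functions, and the constants mirror PCI_STATUS,
PCI_STATUS_CAP_LIST, PCI_CAPABILITY_LIST, PCI_CAP_ID_EXP and
PCI_EXP_DEVCTL2 from pci_regs.h under local names.

```c
#include <stddef.h>
#include <stdint.h>

enum {
	CFG_STATUS          = 0x06, /* Status register */
	CFG_STATUS_CAP_LIST = 0x10, /* Status bit 4: capability list present */
	CFG_CAP_PTR         = 0x34, /* pointer to the first capability */
	CAP_ID_EXP          = 0x10, /* PCI Express capability ID */
	EXP_DEVCTL2         = 0x28, /* Device Control 2, relative to the capability */
};

/* Read a little-endian 16-bit register from a config space dump. */
static inline uint16_t read_le16(const uint8_t *cfg, size_t off)
{
	return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

/*
 * Return the offset of capability cap_id in a 256-byte config space
 * dump, or 0 if the device does not advertise it.
 */
static inline uint8_t find_pci_cap(const uint8_t cfg[256], uint8_t cap_id)
{
	uint8_t pos;
	int ttl = 48; /* bound the walk in case the list is malformed */

	if (!(read_le16(cfg, CFG_STATUS) & CFG_STATUS_CAP_LIST))
		return 0;

	pos = cfg[CFG_CAP_PTR] & 0xfc; /* low two bits are reserved */
	while (pos >= 0x40 && ttl-- > 0) {
		if (cfg[pos] == cap_id)
			return pos;
		pos = cfg[pos + 1] & 0xfc; /* next-capability pointer */
	}
	return 0;
}
```

For the dump above, the Status register at 0x06 is 0x0010, the pointer
at 0x34 is 0x40, and the byte at 0x40 is 0x10, so
find_pci_cap(cfg, CAP_ID_EXP) returns 0x40 and
read_le16(cfg, 0x40 + EXP_DEVCTL2) reads the bytes "00 04" at 0x68,
giving 0x0400. A caller should check that the capability offset plus
0x28 still lies inside the 256-byte buffer before reading.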


2020-02-01 03:16:19

by Muni Sekhar

Subject: Re: pcie: xilinx: kernel hang - ISR readl()

On Sat, Feb 1, 2020 at 2:16 AM Bjorn Helgaas wrote:
>
> On Fri, Jan 31, 2020 at 10:04:05PM +0530, Muni Sekhar wrote:
> > On Fri, Jan 31, 2020 at 12:30 AM Bjorn Helgaas wrote:
> > > On Thu, Jan 30, 2020 at 09:37:48PM +0530, Muni Sekhar wrote:
> > > > On Thu, Jan 9, 2020 at 10:05 AM Bjorn Helgaas wrote:
> > > > >
> > > > > On Thu, Jan 09, 2020 at 08:47:51AM +0530, Muni Sekhar wrote:
> > > > > > On Thu, Jan 9, 2020 at 1:45 AM Bjorn Helgaas wrote:
> > > > > > > On Tue, Jan 07, 2020 at 09:45:13PM +0530, Muni Sekhar wrote:
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > I have module with Xilinx FPGA. It implements UART(s), SPI(s),
> > > > > > > > parallel I/O and interfaces them to the Host CPU via PCI Express bus.
> > > > > > > > I see that my system freezes without capturing the crash dump for
> > > > > > > > certain tests. I debugged this issue and it was tracked down to the
> > > > > > > > below mentioned interrupt handler code.
> > > > > > > >
> > > > > > > >
> > > > > > > > In ISR, first reads the Interrupt Status register using ‘readl()’ as
> > > > > > > > given below.
> > > > > > > > status = readl(ctrl->reg + INT_STATUS);
> > > > > > > >
> > > > > > > >
> > > > > > > > And then clears the pending interrupts using ‘writel()’ as given blow.
> > > > > > > > writel(status, ctrl->reg + INT_STATUS);
> > > > > > > >
> > > > > > > >
> > > > > > > > I've noticed a kernel hang if INT_STATUS register read again after
> > > > > > > > clearing the pending interrupts.
> > > > > > > >
> > > > > > > > Can someone clarify me why the kernel hangs without crash dump incase
> > > > > > > > if I read the INT_STATUS register using readl() after clearing the
> > > > > > > > pending bits?
> > > > > > > >
> > > > > > > > Can readl() block?
> > > > > > >
> > > > > > > readl() should not block in software. Obviously at the hardware CPU
> > > > > > > instruction level, the read instruction has to wait for the result of
> > > > > > > the read. Since that data is provided by the device, i.e., your FPGA,
> > > > > > > it's possible there's a problem there.
> > > > > >
> > > > > > Thank you very much for your reply.
> > > > > > Where can I find the details about what is protocol for reading the
> > > > > > ‘memory mapped IO’? Can you point me to any useful links..
> > > > > > I tried locate the exact point of the kernel code where CPU waits for
> > > > > > read instruction as given below.
> > > > > > readl() -> __raw_readl() -> return *(const volatile u32 __force *)add
> > > > > > Do I need to check for the assembly instructions, here?
> > > > >
> > > > > The C pointer dereference, e.g., "*address", will be some sort of a
> > > > > "load" instruction in assembly. The CPU wait isn't explicit; it's
> > > > > just that when you load a value, the CPU waits for the value.
> > > > >
> > > > > > > Can you tell whether the FPGA has received the Memory Read for
> > > > > > > INT_STATUS and sent the completion?
> > > > > >
> > > > > > Is there a way to know this with the help of software debugging(either
> > > > > > enabling dynamic debugging or adding new debug prints)? Can you please
> > > > > > point some tools\hw needed to find this?
> > > > >
> > > > > You could learn this either via a PCIe analyzer (expensive piece of
> > > > > hardware) or possibly some logic in the FPGA that would log PCIe
> > > > > transactions in a buffer and make them accessible via some other
> > > > > interface (you mentioned it had parallel and other interfaces).
> > > > >
> > > > > > > On the architectures I'm familiar with, if a device doesn't respond,
> > > > > > > something would eventually time out so the CPU doesn't wait forever.
> > > > > >
> > > > > > What is timeout here? I mean how long CPU waits for completion? Since
> > > > > > this code runs from interrupt context, does it causes the system to
> > > > > > freeze if timeout is more?
> > > > >
> > > > > The Root Port should have a Completion Timeout. This is required by
> > > > > the PCIe spec. The *reporting* of the timeout is somewhat
> > > > > implementation-specific since the reporting is outside the PCIe
> > > > > domain. I don't know the duration of the timeout, but it certainly
> > > > > shouldn't be long enough to look like a "system freeze".
> > > > Does kernel writes to PCIe configuration space register ‘Device
> > > > Control 2 Register’ (Offset 0x28)? When I tried to read this register,
> > > > I noticed bit 4 is set (which disables completion timeouts) and rest
> > > > all other bits are zero. So, Completion Timeout detection mechanism is
> > > > disabled, right? If so what could be the reason for this?
> > >
> > > To my knowledge Linux doesn't set PCI_EXP_DEVCTL2_COMP_TMOUT_DIS
> > > except for one powerpc case. You can check yourself by using cscope
> > > or grep to look for PCI_EXP_DEVCTL2_COMP_TMOUT_DIS or PCI_EXP_DEVCTL2.
> > >
> > > If you're seeing bit 4 (PCI_EXP_DEVCTL2_COMP_TMOUT_DIS) set, it's
> > > likely because firmware set it. You can try booting with
> > > "pci=earlydump" to see what's there before Linux starts changing
> > > things.
Yes, Linux doesn't set PCI_EXP_DEVCTL2_COMP_TMOUT_DIS; I verified this with earlydump.
By firmware, do you mean the BIOS? If so, is there a way to enable the timeout detection?

01:00.0 RAM memory: PLDA Device 5555
Subsystem: Device 4000:0000
Flags: bus master, fast devsel, latency 0, IRQ 16
Memory at d0400000 (32-bit, non-prefetchable) [size=4M]
Capabilities: [40] Power Management version 3
Capabilities: [48] MSI: Enable- Count=1/1 Maskable- 64bit-
Capabilities: [60] Express Endpoint, MSI 00
Kernel driver in use: PLDA PCI
Kernel modules: plda_pci

00: 56 15 55 55 07 00 10 00 00 00 00 05 10 00 00 00
10: 00 00 40 d0 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 40 00 00
30: 00 00 00 00 40 00 00 00 00 00 00 00 0b 01 00 00
40: 01 48 03 00 08 00 00 00 05 60 00 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 10 00 02 00 c2 8f 00 00 00 28 01 00 21 f4 03 00
70: 01 00 21 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 00 00 00 00 02 00 00 00 10 00 00 00 00 00 00 00
90: 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

So, on my system, the PCI Express capability is at "[60]" (0x60) and
PCI_EXP_DEVCTL2 is at 0x88 (0x60 + 0x28) with value 0x0010
(PCI_EXP_DEVCTL2_COMP_TMOUT_DIS). This also matches what lspci
decodes:

$ sudo lspci -vvs00.0 | grep -A1 DevCtl2
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis+, LTR-, OBFF Disabled
LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-



> >
> > [ 0.000000] pci 0000:01:00.0 config space:
> >
> > 00: 56 15 55 55 07 00 10 00 00 00 00 05 10 00 00 00
> > 10: 00 00 40 d0 00 00 00 00 00 00 00 00 00 00 00 00
> > 20: 00 00 00 00 00 00 00 00 00 00 00 00 00 40 00 00
> > 30: 00 00 00 00 40 00 00 00 00 00 00 00 0b 01 00 00
> > 40: 01 48 03 00 08 00 00 00 05 60 00 00 00 00 00 00
> > 50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > 60: 10 00 02 00 c2 8f 00 00 00 28 01 00 21 f4 03 00
> > 70: 01 00 21 00 00 00 00 00 00 00 00 00 00 00 00 00
> > 80: 00 00 00 00 02 00 00 00 10 00 00 00 00 00 00 00
> > 90: 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00
> > a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >
> >
> > Device Control 2" is located @offset 0x28 in PCI Express Capability
> > Structure. But where does 'PCI Express Capability Structure' located
> > in the above mentioned 'PCI Express Configuration Space'?
>
> "lspci -v" tells you the location of the capability. For example, on
> my system:
>
> # lspci -vxxxs1c.0
> 00:1c.0 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port (rev f1) (prog-if 00 [Normal decode])
> Flags: bus master, fast devsel, latency 0, IRQ 122
> Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
> I/O behind bridge: None
> Memory behind bridge: f1100000-f11fffff [size=1M]
> Prefetchable memory behind bridge: None
> Capabilities: [40] Express Root Port (Slot+), MSI 00
> Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
> Capabilities: [90] Subsystem: Lenovo Sunrise Point-LP PCI Express Root Port
> Capabilities: [a0] Power Management version 3
> Capabilities: [100] Advanced Error Reporting
> Capabilities: [140] Access Control Services
> Capabilities: [200] L1 PM Substates
> Capabilities: [220] Secondary PCI Express
> LnkCtl3: LnkEquIntrruptEn-, PerformEqu-
> LaneErrStat: 0
> Kernel driver in use: pcieport
> 00: 86 80 10 9d 07 04 10 00 f1 00 04 06 00 00 81 00
> 10: 00 00 00 00 00 00 00 00 00 02 02 00 f0 00 00 20
> 20: 10 f1 10 f1 f1 ff 01 00 00 00 00 00 00 00 00 00
> 30: 00 00 00 00 40 00 00 00 00 00 00 00 ff 01 02 00
> 40: 10 80 42 01 01 80 00 00 20 00 10 00 13 48 72 01
> 50: 42 00 12 70 00 b2 04 00 00 00 48 01 00 00 00 00
> 60: 00 00 00 00 37 08 00 00 00 04 00 00 0e 00 00 00
> 70: 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 80: 05 90 01 00 18 02 e0 fe 00 00 00 00 00 00 00 00
> 90: 0d a0 00 00 aa 17 38 22 00 00 00 00 00 00 00 00
> a0: 01 00 03 c8 00 00 00 00 00 00 00 00 00 00 00 00
> b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> d0: 11 10 00 07 42 18 00 00 08 00 9e 8b 00 00 00 00
> e0: 00 f7 73 00 03 90 00 00 16 80 12 00 00 00 00 00
> f0: 50 01 00 00 00 03 00 40 b3 0f 30 08 04 00 00 01
>
> The PCI Express capability is at "[40]" (0x40) and PCI_EXP_DEVCTL2 is
> a 16-bit register at offset 40 (0x28) from that. So on my system,
> PCI_EXP_DEVCTL2 is at 0x68 with value 0x0400 (PCI_EXP_DEVCTL2_LTR_EN).
> This matches what lspci decodes:
>
> # lspci -vvs1c.0 | grep -A1 DevCtl2
> DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled ARIFwd-
> AtomicOpsCtl: ReqEn- EgressBlck-
>
>


--
Thanks,
Sekhar

2020-02-01 18:30:42

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: pcie: xilinx: kernel hang - ISR readl()

On Sat, Feb 01, 2020 at 08:44:40AM +0530, Muni Sekhar wrote:
> On Sat, Feb 1, 2020 at 2:16 AM Bjorn Helgaas <[email protected]> wrote:
> > On Fri, Jan 31, 2020 at 10:04:05PM +0530, Muni Sekhar wrote:
> > > On Fri, Jan 31, 2020 at 12:30 AM Bjorn Helgaas <[email protected]> wrote:
> > > > On Thu, Jan 30, 2020 at 09:37:48PM +0530, Muni Sekhar wrote:
> > > > > On Thu, Jan 9, 2020 at 10:05 AM Bjorn Helgaas <[email protected]> wrote:
> > > > > >
> > > > > > On Thu, Jan 09, 2020 at 08:47:51AM +0530, Muni Sekhar wrote:
> > > > > > > On Thu, Jan 9, 2020 at 1:45 AM Bjorn Helgaas <[email protected]> wrote:
> > > > > > > > On Tue, Jan 07, 2020 at 09:45:13PM +0530, Muni Sekhar wrote:
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > I have module with Xilinx FPGA. It implements UART(s), SPI(s),
> > > > > > > > > parallel I/O and interfaces them to the Host CPU via PCI Express bus.
> > > > > > > > > I see that my system freezes without capturing the crash dump for
> > > > > > > > > certain tests. I debugged this issue and it was tracked down to the
> > > > > > > > > below mentioned interrupt handler code.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > In ISR, first reads the Interrupt Status register using ‘readl()’ as
> > > > > > > > > given below.
> > > > > > > > > status = readl(ctrl->reg + INT_STATUS);
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > And then clears the pending interrupts using ‘writel()’ as given blow.
> > > > > > > > > writel(status, ctrl->reg + INT_STATUS);
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > I've noticed a kernel hang if INT_STATUS register read again after
> > > > > > > > > clearing the pending interrupts.
> > > > > > > > >
> > > > > > > > > Can someone clarify me why the kernel hangs without crash dump incase
> > > > > > > > > if I read the INT_STATUS register using readl() after clearing the
> > > > > > > > > pending bits?
> > > > > > > > >
> > > > > > > > > Can readl() block?
> > > > > > > >
> > > > > > > > readl() should not block in software. Obviously at the hardware CPU
> > > > > > > > instruction level, the read instruction has to wait for the result of
> > > > > > > > the read. Since that data is provided by the device, i.e., your FPGA,
> > > > > > > > it's possible there's a problem there.
> > > > > > >
> > > > > > > Thank you very much for your reply.
> > > > > > > Where can I find the details about what is protocol for reading the
> > > > > > > ‘memory mapped IO’? Can you point me to any useful links..
> > > > > > > I tried locate the exact point of the kernel code where CPU waits for
> > > > > > > read instruction as given below.
> > > > > > > readl() -> __raw_readl() -> return *(const volatile u32 __force *)add
> > > > > > > Do I need to check for the assembly instructions, here?
> > > > > >
> > > > > > The C pointer dereference, e.g., "*address", will be some sort of a
> > > > > > "load" instruction in assembly. The CPU wait isn't explicit; it's
> > > > > > just that when you load a value, the CPU waits for the value.
> > > > > >
> > > > > > > > Can you tell whether the FPGA has received the Memory Read for
> > > > > > > > INT_STATUS and sent the completion?
> > > > > > >
> > > > > > > Is there a way to know this with the help of software debugging(either
> > > > > > > enabling dynamic debugging or adding new debug prints)? Can you please
> > > > > > > point some tools\hw needed to find this?
> > > > > >
> > > > > > You could learn this either via a PCIe analyzer (expensive piece of
> > > > > > hardware) or possibly some logic in the FPGA that would log PCIe
> > > > > > transactions in a buffer and make them accessible via some other
> > > > > > interface (you mentioned it had parallel and other interfaces).
> > > > > >
> > > > > > > > On the architectures I'm familiar with, if a device doesn't respond,
> > > > > > > > something would eventually time out so the CPU doesn't wait forever.
> > > > > > >
> > > > > > > What is timeout here? I mean how long CPU waits for completion? Since
> > > > > > > this code runs from interrupt context, does it causes the system to
> > > > > > > freeze if timeout is more?
> > > > > >
> > > > > > The Root Port should have a Completion Timeout. This is required by
> > > > > > the PCIe spec. The *reporting* of the timeout is somewhat
> > > > > > implementation-specific since the reporting is outside the PCIe
> > > > > > domain. I don't know the duration of the timeout, but it certainly
> > > > > > shouldn't be long enough to look like a "system freeze".
> > > > > Does kernel writes to PCIe configuration space register ‘Device
> > > > > Control 2 Register’ (Offset 0x28)? When I tried to read this register,
> > > > > I noticed bit 4 is set (which disables completion timeouts) and rest
> > > > > all other bits are zero. So, Completion Timeout detection mechanism is
> > > > > disabled, right? If so what could be the reason for this?
> > > >
> > > > To my knowledge Linux doesn't set PCI_EXP_DEVCTL2_COMP_TMOUT_DIS
> > > > except for one powerpc case. You can check yourself by using cscope
> > > > or grep to look for PCI_EXP_DEVCTL2_COMP_TMOUT_DIS or PCI_EXP_DEVCTL2.
> > > >
> > > > If you're seeing bit 4 (PCI_EXP_DEVCTL2_COMP_TMOUT_DIS) set, it's
> > > > likely because firmware set it. You can try booting with
> > > > "pci=earlydump" to see what's there before Linux starts changing
> > > > things.
>
> Yes Linux doesn't set PCI_EXP_DEVCTL2_COMP_TMOUT_DIS, verified with earlydump.
> Firmware means BIOS? If so is there a way to enable the timeout detection?

Sure; you can change the kernel to turn off
PCI_EXP_DEVCTL2_COMP_TMOUT_DIS (for debugging purposes, at least), or
you can do it with setpci, e.g.,

# setpci -s01:00.0 CAP_EXP+0x28.W=0x0000
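
Note that writing 0x0000 clears every bit in Device Control 2, not only
the timeout-disable bit. That is harmless when the register reads 0x0010,
but a read-modify-write that clears just bit 4 is the more careful form.
In a driver, the kernel's pcie_capability_clear_word() helper called with
PCI_EXP_DEVCTL2 and PCI_EXP_DEVCTL2_COMP_TMOUT_DIS would be the usual way
to do this. The standalone userspace sketch below only illustrates the
bit arithmetic; the helper name is made up for illustration.

```c
#include <stdint.h>

/* Bit 4 of Device Control 2, as defined in include/uapi/linux/pci_regs.h. */
#define PCI_EXP_DEVCTL2_COMP_TMOUT_DIS 0x0010

/*
 * Hypothetical helper: return the DevCtl2 value with only the
 * Completion Timeout Disable bit cleared. All other bits are preserved.
 */
uint16_t devctl2_enable_comp_timeout(uint16_t devctl2)
{
	return (uint16_t)(devctl2 & ~PCI_EXP_DEVCTL2_COMP_TMOUT_DIS);
}
```

With the value reported in this thread (0x0010) the result is 0x0000,
the same value the setpci command writes.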

> 01:00.0 RAM memory: PLDA Device 5555
> Subsystem: Device 4000:0000
> Flags: bus master, fast devsel, latency 0, IRQ 16
> Memory at d0400000 (32-bit, non-prefetchable) [size=4M]
> Capabilities: [40] Power Management version 3
> Capabilities: [48] MSI: Enable- Count=1/1 Maskable- 64bit-
> Capabilities: [60] Express Endpoint, MSI 00
> Kernel driver in use: PLDA PCI
> Kernel modules: plda_pci
>
> 00: 56 15 55 55 07 00 10 00 00 00 00 05 10 00 00 00
> 10: 00 00 40 d0 00 00 00 00 00 00 00 00 00 00 00 00
> 20: 00 00 00 00 00 00 00 00 00 00 00 00 00 40 00 00
> 30: 00 00 00 00 40 00 00 00 00 00 00 00 0b 01 00 00
> 40: 01 48 03 00 08 00 00 00 05 60 00 00 00 00 00 00
> 50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 60: 10 00 02 00 c2 8f 00 00 00 28 01 00 21 f4 03 00
> 70: 01 00 21 00 00 00 00 00 00 00 00 00 00 00 00 00
> 80: 00 00 00 00 02 00 00 00 10 00 00 00 00 00 00 00
> 90: 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00
> a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> So, on my system, the PCI Express capability is at "[60]" and
> PCI_EXP_DEVCTL2 is at 0x88 with value 0x0010
> (PCI_EXP_DEVCTL2_COMP_TMOUT_DIS). Also this matches what lspci
> decodes:
>
> $ sudo lspci -vvs00.0 | grep -A1 DevCtl2
> DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis+, LTR-, OBFF Disabled
> LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
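
The offsets quoted above can be cross-checked mechanically. The sketch
below is standalone userspace C, not kernel code. It copies the first
0xa0 bytes of the 01:00.0 dump into an array (the remaining bytes are all
zero in the dump and are zero-initialized here), walks the capability
list starting at the pointer at 0x34, and reads Device Control 2 at
offset 0x28 from the PCI Express capability. The function and array names
are hypothetical; the register constants mirror
include/uapi/linux/pci_regs.h.

```c
#include <assert.h>
#include <stdint.h>

#define PCI_STATUS                     0x06
#define PCI_STATUS_CAP_LIST            0x10
#define PCI_CAPABILITY_LIST            0x34
#define PCI_CAP_ID_EXP                 0x10
#define PCI_EXP_DEVCTL2                0x28
#define PCI_EXP_DEVCTL2_COMP_TMOUT_DIS 0x0010

/* Config space of 01:00.0 from the dump above; bytes 0xa0..0xff are zero. */
const uint8_t plda_cfg[256] = {
	0x56, 0x15, 0x55, 0x55, 0x07, 0x00, 0x10, 0x00, /* 00 */
	0x00, 0x00, 0x00, 0x05, 0x10, 0x00, 0x00, 0x00,
	0x00, 0x00, 0x40, 0xd0, 0x00, 0x00, 0x00, 0x00, /* 10 */
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 20 */
	0x00, 0x00, 0x00, 0x00, 0x00, 0x40, 0x00, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x40, 0x00, 0x00, 0x00, /* 30 */
	0x00, 0x00, 0x00, 0x00, 0x0b, 0x01, 0x00, 0x00,
	0x01, 0x48, 0x03, 0x00, 0x08, 0x00, 0x00, 0x00, /* 40 */
	0x05, 0x60, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 50 */
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x10, 0x00, 0x02, 0x00, 0xc2, 0x8f, 0x00, 0x00, /* 60 */
	0x00, 0x28, 0x01, 0x00, 0x21, 0xf4, 0x03, 0x00,
	0x01, 0x00, 0x21, 0x00, 0x00, 0x00, 0x00, 0x00, /* 70 */
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x02, 0x00, 0x00, 0x00, /* 80 */
	0x10, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, /* 90 */
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
};

/* Read a little-endian 16-bit register from a 256-byte config image. */
uint16_t cfg_read16(const uint8_t *cfg, unsigned int off)
{
	assert(off + 1 < 256);
	return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

/* Walk the capability list; return the offset of capability `id`, or 0. */
unsigned int cfg_find_cap(const uint8_t *cfg, uint8_t id)
{
	unsigned int pos, ttl = 48; /* bound the walk against looped lists */

	if (!(cfg_read16(cfg, PCI_STATUS) & PCI_STATUS_CAP_LIST))
		return 0;

	pos = cfg[PCI_CAPABILITY_LIST] & ~3u;
	while (pos >= 0x40 && ttl--) {
		if (cfg[pos] == id)
			return pos;
		pos = cfg[pos + 1] & ~3u; /* next-capability pointer */
	}
	return 0;
}
```

For this dump, cfg_find_cap(plda_cfg, PCI_CAP_ID_EXP) yields 0x60, so
Device Control 2 sits at 0x88, and cfg_read16() there returns 0x0010,
i.e. PCI_EXP_DEVCTL2_COMP_TMOUT_DIS is set.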

2020-02-04 15:36:09

by Muni Sekhar

[permalink] [raw]
Subject: Re: pcie: xilinx: kernel hang - ISR readl()

On Sat, Feb 1, 2020 at 11:59 PM Bjorn Helgaas <[email protected]> wrote:
>
> On Sat, Feb 01, 2020 at 08:44:40AM +0530, Muni Sekhar wrote:
> > On Sat, Feb 1, 2020 at 2:16 AM Bjorn Helgaas <[email protected]> wrote:
> > > On Fri, Jan 31, 2020 at 10:04:05PM +0530, Muni Sekhar wrote:
> > > > On Fri, Jan 31, 2020 at 12:30 AM Bjorn Helgaas <[email protected]> wrote:
> > > > > On Thu, Jan 30, 2020 at 09:37:48PM +0530, Muni Sekhar wrote:
> > > > > > On Thu, Jan 9, 2020 at 10:05 AM Bjorn Helgaas <[email protected]> wrote:
> > > > > > >
> > > > > > > On Thu, Jan 09, 2020 at 08:47:51AM +0530, Muni Sekhar wrote:
> > > > > > > > On Thu, Jan 9, 2020 at 1:45 AM Bjorn Helgaas <[email protected]> wrote:
> > > > > > > > > On Tue, Jan 07, 2020 at 09:45:13PM +0530, Muni Sekhar wrote:
> > > > > > > > > > Hi,
> > > > > > > > > >
> > > > > > > > > > I have module with Xilinx FPGA. It implements UART(s), SPI(s),
> > > > > > > > > > parallel I/O and interfaces them to the Host CPU via PCI Express bus.
> > > > > > > > > > I see that my system freezes without capturing the crash dump for
> > > > > > > > > > certain tests. I debugged this issue and it was tracked down to the
> > > > > > > > > > below mentioned interrupt handler code.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > In ISR, first reads the Interrupt Status register using ‘readl()’ as
> > > > > > > > > > given below.
> > > > > > > > > > status = readl(ctrl->reg + INT_STATUS);
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > And then clears the pending interrupts using ‘writel()’ as given blow.
> > > > > > > > > > writel(status, ctrl->reg + INT_STATUS);
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > I've noticed a kernel hang if INT_STATUS register read again after
> > > > > > > > > > clearing the pending interrupts.
> > > > > > > > > >
> > > > > > > > > > Can someone clarify me why the kernel hangs without crash dump incase
> > > > > > > > > > if I read the INT_STATUS register using readl() after clearing the
> > > > > > > > > > pending bits?
> > > > > > > > > >
> > > > > > > > > > Can readl() block?
> > > > > > > > >
> > > > > > > > > readl() should not block in software. Obviously at the hardware CPU
> > > > > > > > > instruction level, the read instruction has to wait for the result of
> > > > > > > > > the read. Since that data is provided by the device, i.e., your FPGA,
> > > > > > > > > it's possible there's a problem there.
> > > > > > > >
> > > > > > > > Thank you very much for your reply.
> > > > > > > > Where can I find the details about what is protocol for reading the
> > > > > > > > ‘memory mapped IO’? Can you point me to any useful links..
> > > > > > > > I tried locate the exact point of the kernel code where CPU waits for
> > > > > > > > read instruction as given below.
> > > > > > > > readl() -> __raw_readl() -> return *(const volatile u32 __force *)add
> > > > > > > > Do I need to check for the assembly instructions, here?
> > > > > > >
> > > > > > > The C pointer dereference, e.g., "*address", will be some sort of a
> > > > > > > "load" instruction in assembly. The CPU wait isn't explicit; it's
> > > > > > > just that when you load a value, the CPU waits for the value.
> > > > > > >
> > > > > > > > > Can you tell whether the FPGA has received the Memory Read for
> > > > > > > > > INT_STATUS and sent the completion?
> > > > > > > >
> > > > > > > > Is there a way to know this with the help of software debugging(either
> > > > > > > > enabling dynamic debugging or adding new debug prints)? Can you please
> > > > > > > > point some tools\hw needed to find this?
> > > > > > >
> > > > > > > You could learn this either via a PCIe analyzer (expensive piece of
> > > > > > > hardware) or possibly some logic in the FPGA that would log PCIe
> > > > > > > transactions in a buffer and make them accessible via some other
> > > > > > > interface (you mentioned it had parallel and other interfaces).
> > > > > > >
> > > > > > > > > On the architectures I'm familiar with, if a device doesn't respond,
> > > > > > > > > something would eventually time out so the CPU doesn't wait forever.
> > > > > > > >
> > > > > > > > What is timeout here? I mean how long CPU waits for completion? Since
> > > > > > > > this code runs from interrupt context, does it causes the system to
> > > > > > > > freeze if timeout is more?
> > > > > > >
> > > > > > > The Root Port should have a Completion Timeout. This is required by
> > > > > > > the PCIe spec. The *reporting* of the timeout is somewhat
> > > > > > > implementation-specific since the reporting is outside the PCIe
> > > > > > > domain. I don't know the duration of the timeout, but it certainly
> > > > > > > shouldn't be long enough to look like a "system freeze".
> > > > > > Does kernel writes to PCIe configuration space register ‘Device
> > > > > > Control 2 Register’ (Offset 0x28)? When I tried to read this register,
> > > > > > I noticed bit 4 is set (which disables completion timeouts) and rest
> > > > > > all other bits are zero. So, Completion Timeout detection mechanism is
> > > > > > disabled, right? If so what could be the reason for this?
> > > > >
> > > > > To my knowledge Linux doesn't set PCI_EXP_DEVCTL2_COMP_TMOUT_DIS
> > > > > except for one powerpc case. You can check yourself by using cscope
> > > > > or grep to look for PCI_EXP_DEVCTL2_COMP_TMOUT_DIS or PCI_EXP_DEVCTL2.
> > > > >
> > > > > If you're seeing bit 4 (PCI_EXP_DEVCTL2_COMP_TMOUT_DIS) set, it's
> > > > > likely because firmware set it. You can try booting with
> > > > > "pci=earlydump" to see what's there before Linux starts changing
> > > > > things.
> >
> > Yes Linux doesn't set PCI_EXP_DEVCTL2_COMP_TMOUT_DIS, verified with earlydump.
> > Firmware means BIOS? If so is there a way to enable the timeout detection?
>
> Sure; you can change the kernel to turn off
> PCI_EXP_DEVCTL2_COMP_TMOUT_DIS (for debugging purposes, at least), or
> you can do it with setpci, e.g.,
>
> # setpci -s01:00.0 CAP_EXP+0x28.W=0x0000
If a PCIe device (endpoint) doesn't respond to non-posted memory reads
and we turn off PCI_EXP_DEVCTL2_COMP_TMOUT_DIS as mentioned above,
then it should result in a timeout instead of a system freeze, right?

Also, is there a way to know whether a timeout occurred on the host
side (for example, from the kernel log, by enabling dynamic debug)?
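
One place a Completion Timeout may show up, assuming the Root Port
implements Advanced Error Reporting and the platform lets the kernel own
AER, is the Completion Timeout bit (bit 14, PCI_ERR_UNC_COMP_TIME) of the
AER Uncorrectable Error Status register, which "lspci -vv" prints as
CmpltTO in the UESta line. On many x86 systems a failed read also hands
all ones back to the driver, although that is platform behavior rather
than a guarantee. The sketch below is standalone userspace C showing only
the two checks; both helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* AER Uncorrectable Error Status bit, from include/uapi/linux/pci_regs.h. */
#define PCI_ERR_UNC_COMP_TIME 0x00004000 /* Completion Timeout */

/*
 * Hypothetical helper: true if an AER Uncorrectable Error Status value
 * has the Completion Timeout bit set.
 */
bool aer_saw_completion_timeout(uint32_t uncor_status)
{
	return (uncor_status & PCI_ERR_UNC_COMP_TIME) != 0;
}

/*
 * Hypothetical helper: true if a 32-bit MMIO read returned all ones.
 * This may indicate a failed read, but a register can also legitimately
 * hold 0xffffffff, so treat it as a hint only.
 */
bool mmio_read_looks_failed(uint32_t val)
{
	return val == UINT32_MAX;
}
```

In the ISR from this thread, the second check could be applied to the
value returned by readl(ctrl->reg + INT_STATUS) before acting on it.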

>
> > 01:00.0 RAM memory: PLDA Device 5555
> > Subsystem: Device 4000:0000
> > Flags: bus master, fast devsel, latency 0, IRQ 16
> > Memory at d0400000 (32-bit, non-prefetchable) [size=4M]
> > Capabilities: [40] Power Management version 3
> > Capabilities: [48] MSI: Enable- Count=1/1 Maskable- 64bit-
> > Capabilities: [60] Express Endpoint, MSI 00
> > Kernel driver in use: PLDA PCI
> > Kernel modules: plda_pci
> >
> > 00: 56 15 55 55 07 00 10 00 00 00 00 05 10 00 00 00
> > 10: 00 00 40 d0 00 00 00 00 00 00 00 00 00 00 00 00
> > 20: 00 00 00 00 00 00 00 00 00 00 00 00 00 40 00 00
> > 30: 00 00 00 00 40 00 00 00 00 00 00 00 0b 01 00 00
> > 40: 01 48 03 00 08 00 00 00 05 60 00 00 00 00 00 00
> > 50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > 60: 10 00 02 00 c2 8f 00 00 00 28 01 00 21 f4 03 00
> > 70: 01 00 21 00 00 00 00 00 00 00 00 00 00 00 00 00
> > 80: 00 00 00 00 02 00 00 00 10 00 00 00 00 00 00 00
> > 90: 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00
> > a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >
> > So, on my system, the PCI Express capability is at "[60]" and
> > PCI_EXP_DEVCTL2 is at 0x88 with value 0x0010
> > (PCI_EXP_DEVCTL2_COMP_TMOUT_DIS). Also this matches what lspci
> > decodes:
> >
> > $ sudo lspci -vvs00.0 | grep -A1 DevCtl2
> > DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis+, LTR-, OBFF Disabled
> > LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-



--
Thanks,
Sekhar