Hi Mika,
Since the commit commit 720d6a671a6e("PCI: pciehp: Do not handle events
if interrupts are masked"), the hotplug support on Qemu Guest(4.120-rc1)
with a vfio passthrough device seems to be broken. This is on an ARM64 platform.
I am booting a Guest with below command line options with the intention of
hot add a ixgbevf dev later,
./qemu-system-aarch64 -machine virt,kernel_irqchip=on,gic-version=3 -cpu host \
-kernel Image_4.20-rc1 \
-initrd rootfs-iperf.cpio \
-device ioh3420,id=rp1 \
-net none \
-m 4096 \
-nographic -D -d -enable-kvm \
-append "console=ttyAMA0 root=/dev/vda -m 4096 rw pciehp.pciehp_debug=1
pcie_ports=native searlycon=pl011,0x9000000"
But receives this on boot,
[ 1.327852] pciehp 0000:00:01.0:pcie004: Timeout
on hotplug command 0x03f1 (issued 1016 msec ago)
[ 1.335842] pciehp 0000:00:01.0:pcie004: Timeout on hotplug command
0x03f1 (issued 1024 msec ago)
[ 3.847843] pciehp 0000:00:01.0:pcie004: Failed to check link status
[ 3.855843] pciehp 0000:00:01.0:pcie004: Timeout on hotplug command
0x02f1 (issued 2520 msec ago)
[ 4.879846] pciehp 0000:00:01.0:pcie004: Timeout on hotplug command
0x06f1 (issued 1024 msec ago)
[ 5.911840] pciehp 0000:00:01.0:pcie004: Timeout on hotplug command
0x06f1 (issued 2056 msec ago)
[ 6.927844] pciehp 0000:00:01.0:pcie004: Timeout on hotplug command
0x07f1 (issued 1016 msec ago)
[ 7.951843] pciehp 0000:00:01.0:pcie004: Timeout on hotplug command
0x0771 (issued 1024 msec ago)
Trying to hot add using "device_addvfio-pci,host=0000:01:10.1,id=net0,bus=rp1"
doesn't work either. And if I boot the guest with an assigned device
(-device vfio-pci,host=0000:01:10.1,id=net0,bus=rp1), I can see the dev listed in
the Guest but then hot remove doesn't work.
This all works on 4.19 and bisect points to the above mentioned commit, where an
additional check is added in pciehp_isr(),
- * Interrupts only occur in D3hot or shallower (PCIe r4.0, sec 6.7.3.4).
+ * Interrupts only occur in D3hot or shallower and only if enabled
+ * in the Slot Control register (PCIe r4.0, sec 6.7.3.4).
*/
- if (pdev->current_state == PCI_D3cold)
+ if (pdev->current_state == PCI_D3cold ||
+ (!(ctrl->slot_ctrl & PCI_EXP_SLTCTL_HPIE) && !pciehp_poll_mode))
return IRQ_NONE;
I think this doesn't work for the first time, where the cmd with PCI_EXP_SLTCTL_HPIE bit set
is written,
pciehp_probe()
pcie_init_notification()
pcie_enable_notification()
pcie_do_write_cmd()
to begin with, ctrl->slot_ctrl = 0 in pciehp_isr() as this is only set once the write
is returned.
Or else I am missing something here. Please take a look and let me know.
Thanks,
Shameer
+Lukas
On Tue, Nov 13, 2018 at 11:45:42AM +0000, Shameerali Kolothum Thodi wrote:
> Hi Mika,
Hi,
> Since the commit commit 720d6a671a6e("PCI: pciehp: Do not handle events
> if interrupts are masked"), the hotplug support on Qemu Guest(4.120-rc1)
> with a vfio passthrough device seems to be broken. This is on an ARM64 platform.
>
> I am booting a Guest with below command line options with the intention of
> hot add a ixgbevf dev later,
>
> ./qemu-system-aarch64 -machine virt,kernel_irqchip=on,gic-version=3 -cpu host \
> -kernel Image_4.20-rc1 \
> -initrd rootfs-iperf.cpio \
> -device ioh3420,id=rp1 \
> -net none \
> -m 4096 \
> -nographic -D -d -enable-kvm \
> -append "console=ttyAMA0 root=/dev/vda -m 4096 rw pciehp.pciehp_debug=1
> pcie_ports=native searlycon=pl011,0x9000000"
>
> But receives this on boot,
>
> [ 1.327852] pciehp 0000:00:01.0:pcie004: Timeout
> on hotplug command 0x03f1 (issued 1016 msec ago)
> [ 1.335842] pciehp 0000:00:01.0:pcie004: Timeout on hotplug command
> 0x03f1 (issued 1024 msec ago)
> [ 3.847843] pciehp 0000:00:01.0:pcie004: Failed to check link status
> [ 3.855843] pciehp 0000:00:01.0:pcie004: Timeout on hotplug command
> 0x02f1 (issued 2520 msec ago)
> [ 4.879846] pciehp 0000:00:01.0:pcie004: Timeout on hotplug command
> 0x06f1 (issued 1024 msec ago)
> [ 5.911840] pciehp 0000:00:01.0:pcie004: Timeout on hotplug command
> 0x06f1 (issued 2056 msec ago)
> [ 6.927844] pciehp 0000:00:01.0:pcie004: Timeout on hotplug command
> 0x07f1 (issued 1016 msec ago)
> [ 7.951843] pciehp 0000:00:01.0:pcie004: Timeout on hotplug command
> 0x0771 (issued 1024 msec ago)
>
> Trying to hot add using "device_addvfio-pci,host=0000:01:10.1,id=net0,bus=rp1"
> doesn't work either. And if I boot the guest with an assigned device
> (-device vfio-pci,host=0000:01:10.1,id=net0,bus=rp1), I can see the dev listed in
> the Guest but then hot remove doesn't work.
>
> This all works on 4.19 and bisect points to the above mentioned commit, where an
> additional check is added in pciehp_isr(),
>
> - * Interrupts only occur in D3hot or shallower (PCIe r4.0, sec 6.7.3.4).
> + * Interrupts only occur in D3hot or shallower and only if enabled
> + * in the Slot Control register (PCIe r4.0, sec 6.7.3.4).
> */
> - if (pdev->current_state == PCI_D3cold)
> + if (pdev->current_state == PCI_D3cold ||
> + (!(ctrl->slot_ctrl & PCI_EXP_SLTCTL_HPIE) && !pciehp_poll_mode))
> return IRQ_NONE;
>
> I think this doesn't work for the first time, where the cmd with PCI_EXP_SLTCTL_HPIE bit set
> is written,
> pciehp_probe()
> pcie_init_notification()
> pcie_enable_notification()
> pcie_do_write_cmd()
>
> to begin with, ctrl->slot_ctrl = 0 in pciehp_isr() as this is only set once the write
> is returned.
>
> Or else I am missing something here. Please take a look and let me know.
Thanks for the detailed report and analysis. I think you are right and
the ctrl->slot_ctrl is only set after the actual value has been
programmed to the hardware, so if there was interrupt "pending" it will
trigger immediately (just to find ctrl->slot_ctrl == 0).
I wonder if the following change helps here?
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 7dd443aea5a5..cd9eae650aa5 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -156,9 +156,9 @@ static void pcie_do_write_cmd(struct controller *ctrl, u16 cmd,
slot_ctrl |= (cmd & mask);
ctrl->cmd_busy = 1;
smp_mb();
+ ctrl->slot_ctrl = slot_ctrl;
pcie_capability_write_word(pdev, PCI_EXP_SLTCTL, slot_ctrl);
ctrl->cmd_started = jiffies;
- ctrl->slot_ctrl = slot_ctrl;
/*
* Controllers with the Intel CF118 and similar errata advertise
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]]
> Sent: 13 November 2018 12:25
> To: Shameerali Kolothum Thodi <[email protected]>
> Cc: [email protected]; [email protected]; Wangzhou (B)
> <[email protected]>; Linuxarm <[email protected]>; Lukas
> Wunner <[email protected]>
> Subject: Re: Qemu Guest kernel 4.20-rc1 PCIe hotplug issue
>
> +Lukas
>
> On Tue, Nov 13, 2018 at 11:45:42AM +0000, Shameerali Kolothum Thodi wrote:
> > Hi Mika,
>
> Hi,
>
> > Since the commit commit 720d6a671a6e("PCI: pciehp: Do not handle events
> > if interrupts are masked"), the hotplug support on Qemu Guest(4.120-rc1)
> > with a vfio passthrough device seems to be broken. This is on an ARM64
> platform.
> >
> > I am booting a Guest with below command line options with the intention of
> > hot add a ixgbevf dev later,
> >
> > ./qemu-system-aarch64 -machine virt,kernel_irqchip=on,gic-version=3 -cpu
> host \
> > -kernel Image_4.20-rc1 \
> > -initrd rootfs-iperf.cpio \
> > -device ioh3420,id=rp1 \
> > -net none \
> > -m 4096 \
> > -nographic -D -d -enable-kvm \
> > -append "console=ttyAMA0 root=/dev/vda -m 4096 rw
> pciehp.pciehp_debug=1
> > pcie_ports=native searlycon=pl011,0x9000000"
> >
> > But receives this on boot,
> >
> > [ 1.327852] pciehp 0000:00:01.0:pcie004: Timeout
> > on hotplug command 0x03f1 (issued 1016 msec ago)
> > [ 1.335842] pciehp 0000:00:01.0:pcie004: Timeout on hotplug command
> > 0x03f1 (issued 1024 msec ago)
> > [ 3.847843] pciehp 0000:00:01.0:pcie004: Failed to check link status
> > [ 3.855843] pciehp 0000:00:01.0:pcie004: Timeout on hotplug command
> > 0x02f1 (issued 2520 msec ago)
> > [ 4.879846] pciehp 0000:00:01.0:pcie004: Timeout on hotplug command
> > 0x06f1 (issued 1024 msec ago)
> > [ 5.911840] pciehp 0000:00:01.0:pcie004: Timeout on hotplug command
> > 0x06f1 (issued 2056 msec ago)
> > [ 6.927844] pciehp 0000:00:01.0:pcie004: Timeout on hotplug command
> > 0x07f1 (issued 1016 msec ago)
> > [ 7.951843] pciehp 0000:00:01.0:pcie004: Timeout on hotplug command
> > 0x0771 (issued 1024 msec ago)
> >
> > Trying to hot add using "device_addvfio-
> pci,host=0000:01:10.1,id=net0,bus=rp1"
> > doesn't work either. And if I boot the guest with an assigned device
> > (-device vfio-pci,host=0000:01:10.1,id=net0,bus=rp1), I can see the dev listed
> in
> > the Guest but then hot remove doesn't work.
> >
> > This all works on 4.19 and bisect points to the above mentioned commit,
> where an
> > additional check is added in pciehp_isr(),
> >
> > - * Interrupts only occur in D3hot or shallower (PCIe r4.0, sec 6.7.3.4).
> > + * Interrupts only occur in D3hot or shallower and only if enabled
> > + * in the Slot Control register (PCIe r4.0, sec 6.7.3.4).
> > */
> > - if (pdev->current_state == PCI_D3cold)
> > + if (pdev->current_state == PCI_D3cold ||
> > + (!(ctrl->slot_ctrl & PCI_EXP_SLTCTL_HPIE) && !pciehp_poll_mode))
> > return IRQ_NONE;
> >
> > I think this doesn't work for the first time, where the cmd with
> PCI_EXP_SLTCTL_HPIE bit set
> > is written,
> > pciehp_probe()
> > pcie_init_notification()
> > pcie_enable_notification()
> > pcie_do_write_cmd()
> >
> > to begin with, ctrl->slot_ctrl = 0 in pciehp_isr() as this is only set once the
> write
> > is returned.
> >
> > Or else I am missing something here. Please take a look and let me know.
>
> Thanks for the detailed report and analysis. I think you are right and
> the ctrl->slot_ctrl is only set after the actual value has been
> programmed to the hardware, so if there was interrupt "pending" it will
> trigger immediately (just to find ctrl->slot_ctrl == 0).
>
> I wonder if the following change helps here?
>
> diff --git a/drivers/pci/hotplug/pciehp_hpc.c
> b/drivers/pci/hotplug/pciehp_hpc.c
> index 7dd443aea5a5..cd9eae650aa5 100644
> --- a/drivers/pci/hotplug/pciehp_hpc.c
> +++ b/drivers/pci/hotplug/pciehp_hpc.c
> @@ -156,9 +156,9 @@ static void pcie_do_write_cmd(struct controller *ctrl,
> u16 cmd,
> slot_ctrl |= (cmd & mask);
> ctrl->cmd_busy = 1;
> smp_mb();
> + ctrl->slot_ctrl = slot_ctrl;
Actually I tried this one, but it doesn't help in this case as the initial
pcie_capability_read_word() returns the slot_ctrl without PCI_EXP_SLTCTL_HPIE
bit set. It looks to me pcie_enable_notification() function enables this,
if (!pciehp_poll_mode)
cmd |= PCI_EXP_SLTCTL_HPIE | PCI_EXP_SLTCTL_CCIE;
I don't know this is as per the spec or not as the initial cap read doesn't seems to
have the PCI_EXP_SLTCTL_HPIE bit set.
Thanks,
Shameer
> pcie_capability_write_word(pdev, PCI_EXP_SLTCTL, slot_ctrl);
> ctrl->cmd_started = jiffies;
> - ctrl->slot_ctrl = slot_ctrl;
>
> /*
> * Controllers with the Intel CF118 and similar errata advertise
> -----Original Message-----
> From: Linuxarm [mailto:[email protected]] On Behalf Of
> Shameerali Kolothum Thodi
> Sent: 13 November 2018 12:36
> To: [email protected]
> Cc: [email protected]; Lukas Wunner <[email protected]>; linux-
> [email protected]; Linuxarm <[email protected]>
> Subject: RE: Qemu Guest kernel 4.20-rc1 PCIe hotplug issue
>
>
>
> > -----Original Message-----
> > From: [email protected]
> > [mailto:[email protected]]
> > Sent: 13 November 2018 12:25
> > To: Shameerali Kolothum Thodi <[email protected]>
> > Cc: [email protected]; [email protected]; Wangzhou (B)
> > <[email protected]>; Linuxarm <[email protected]>; Lukas
> > Wunner <[email protected]>
> > Subject: Re: Qemu Guest kernel 4.20-rc1 PCIe hotplug issue
> >
> > +Lukas
> >
> > On Tue, Nov 13, 2018 at 11:45:42AM +0000, Shameerali Kolothum Thodi
> wrote:
> > > Hi Mika,
> >
> > Hi,
> >
> > > Since the commit commit 720d6a671a6e("PCI: pciehp: Do not handle events
> > > if interrupts are masked"), the hotplug support on Qemu Guest(4.120-rc1)
> > > with a vfio passthrough device seems to be broken. This is on an ARM64
> > platform.
> > >
> > > I am booting a Guest with below command line options with the intention of
> > > hot add a ixgbevf dev later,
> > >
> > > ./qemu-system-aarch64 -machine virt,kernel_irqchip=on,gic-version=3 -cpu
> > host \
> > > -kernel Image_4.20-rc1 \
> > > -initrd rootfs-iperf.cpio \
> > > -device ioh3420,id=rp1 \
> > > -net none \
> > > -m 4096 \
> > > -nographic -D -d -enable-kvm \
> > > -append "console=ttyAMA0 root=/dev/vda -m 4096 rw
> > pciehp.pciehp_debug=1
> > > pcie_ports=native searlycon=pl011,0x9000000"
> > >
> > > But receives this on boot,
> > >
> > > [ 1.327852] pciehp 0000:00:01.0:pcie004: Timeout
> > > on hotplug command 0x03f1 (issued 1016 msec ago)
> > > [ 1.335842] pciehp 0000:00:01.0:pcie004: Timeout on hotplug command
> > > 0x03f1 (issued 1024 msec ago)
> > > [ 3.847843] pciehp 0000:00:01.0:pcie004: Failed to check link status
> > > [ 3.855843] pciehp 0000:00:01.0:pcie004: Timeout on hotplug command
> > > 0x02f1 (issued 2520 msec ago)
> > > [ 4.879846] pciehp 0000:00:01.0:pcie004: Timeout on hotplug command
> > > 0x06f1 (issued 1024 msec ago)
> > > [ 5.911840] pciehp 0000:00:01.0:pcie004: Timeout on hotplug command
> > > 0x06f1 (issued 2056 msec ago)
> > > [ 6.927844] pciehp 0000:00:01.0:pcie004: Timeout on hotplug command
> > > 0x07f1 (issued 1016 msec ago)
> > > [ 7.951843] pciehp 0000:00:01.0:pcie004: Timeout on hotplug command
> > > 0x0771 (issued 1024 msec ago)
> > >
> > > Trying to hot add using "device_addvfio-
> > pci,host=0000:01:10.1,id=net0,bus=rp1"
> > > doesn't work either. And if I boot the guest with an assigned device
> > > (-device vfio-pci,host=0000:01:10.1,id=net0,bus=rp1), I can see the dev
> listed
> > in
> > > the Guest but then hot remove doesn't work.
> > >
> > > This all works on 4.19 and bisect points to the above mentioned commit,
> > where an
> > > additional check is added in pciehp_isr(),
> > >
> > > - * Interrupts only occur in D3hot or shallower (PCIe r4.0, sec 6.7.3.4).
> > > + * Interrupts only occur in D3hot or shallower and only if enabled
> > > + * in the Slot Control register (PCIe r4.0, sec 6.7.3.4).
> > > */
> > > - if (pdev->current_state == PCI_D3cold)
> > > + if (pdev->current_state == PCI_D3cold ||
> > > + (!(ctrl->slot_ctrl & PCI_EXP_SLTCTL_HPIE) && !pciehp_poll_mode))
> > > return IRQ_NONE;
> > >
> > > I think this doesn't work for the first time, where the cmd with
> > PCI_EXP_SLTCTL_HPIE bit set
> > > is written,
> > > pciehp_probe()
> > > pcie_init_notification()
> > > pcie_enable_notification()
> > > pcie_do_write_cmd()
> > >
> > > to begin with, ctrl->slot_ctrl = 0 in pciehp_isr() as this is only set once the
> > write
> > > is returned.
> > >
> > > Or else I am missing something here. Please take a look and let me know.
> >
> > Thanks for the detailed report and analysis. I think you are right and
> > the ctrl->slot_ctrl is only set after the actual value has been
> > programmed to the hardware, so if there was interrupt "pending" it will
> > trigger immediately (just to find ctrl->slot_ctrl == 0).
> >
> > I wonder if the following change helps here?
> >
> > diff --git a/drivers/pci/hotplug/pciehp_hpc.c
> > b/drivers/pci/hotplug/pciehp_hpc.c
> > index 7dd443aea5a5..cd9eae650aa5 100644
> > --- a/drivers/pci/hotplug/pciehp_hpc.c
> > +++ b/drivers/pci/hotplug/pciehp_hpc.c
> > @@ -156,9 +156,9 @@ static void pcie_do_write_cmd(struct controller *ctrl,
> > u16 cmd,
> > slot_ctrl |= (cmd & mask);
> > ctrl->cmd_busy = 1;
> > smp_mb();
> > + ctrl->slot_ctrl = slot_ctrl;
>
> Actually I tried this one, but it doesn't help in this case as the initial
> pcie_capability_read_word() returns the slot_ctrl without
> PCI_EXP_SLTCTL_HPIE
> bit set. It looks to me pcie_enable_notification() function enables this,
>
> if (!pciehp_poll_mode)
> cmd |= PCI_EXP_SLTCTL_HPIE | PCI_EXP_SLTCTL_CCIE;
>
> I don't know this is as per the spec or not as the initial cap read doesn't seems
> to
> have the PCI_EXP_SLTCTL_HPIE bit set.
Hang on..I think this should work as you are setting the ctrl->slot_ctrl after updating
it with the cmd | mask. I will try this and report back.
Thanks,
Shameer
> Thanks,
> Shameer
>
> > pcie_capability_write_word(pdev, PCI_EXP_SLTCTL, slot_ctrl);
> > ctrl->cmd_started = jiffies;
> > - ctrl->slot_ctrl = slot_ctrl;
> >
> > /*
> > * Controllers with the Intel CF118 and similar errata advertise
>
> _______________________________________________
> Linuxarm mailing list
> [email protected]
> http://hulk.huawei.com/mailman/listinfo/linuxarm
On Tue, Nov 13, 2018 at 12:36:20PM +0000, Shameerali Kolothum Thodi wrote:
> > @@ -156,9 +156,9 @@ static void pcie_do_write_cmd(struct controller *ctrl,
> > u16 cmd,
> > slot_ctrl |= (cmd & mask);
> > ctrl->cmd_busy = 1;
> > smp_mb();
> > + ctrl->slot_ctrl = slot_ctrl;
>
> Actually I tried this one, but it doesn't help in this case as the initial
> pcie_capability_read_word() returns the slot_ctrl without PCI_EXP_SLTCTL_HPIE
> bit set. It looks to me pcie_enable_notification() function enables this,
>
> if (!pciehp_poll_mode)
> cmd |= PCI_EXP_SLTCTL_HPIE | PCI_EXP_SLTCTL_CCIE;
>
> I don't know this is as per the spec or not as the initial cap read doesn't seems to
> have the PCI_EXP_SLTCTL_HPIE bit set.
If I read the code right cmd value should end up in ctrl->slot_ctrl
properly from pcie_enable_notification().
However, I think we are missing check for PCI_EXP_SLTCTL_CCIE in
pciehp_isr().
Here's an updated patch, can you try and see if it makes any difference?
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 7dd443aea5a5..da2cbe892444 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -156,9 +156,9 @@ static void pcie_do_write_cmd(struct controller *ctrl, u16 cmd,
slot_ctrl |= (cmd & mask);
ctrl->cmd_busy = 1;
smp_mb();
+ ctrl->slot_ctrl = slot_ctrl;
pcie_capability_write_word(pdev, PCI_EXP_SLTCTL, slot_ctrl);
ctrl->cmd_started = jiffies;
- ctrl->slot_ctrl = slot_ctrl;
/*
* Controllers with the Intel CF118 and similar errata advertise
@@ -522,7 +522,7 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id)
* in the Slot Control register (PCIe r4.0, sec 6.7.3.4).
*/
if (pdev->current_state == PCI_D3cold ||
- (!(ctrl->slot_ctrl & PCI_EXP_SLTCTL_HPIE) && !pciehp_poll_mode))
+ (!(ctrl->slot_ctrl & (PCI_EXP_SLTCTL_HPIE | PCI_EXP_SLTCTL_CCIE)) && !pciehp_poll_mode))
return IRQ_NONE;
/*
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]]
> Sent: 13 November 2018 12:59
> To: Shameerali Kolothum Thodi <[email protected]>
> Cc: [email protected]; [email protected]; Wangzhou (B)
> <[email protected]>; Linuxarm <[email protected]>; Lukas
> Wunner <[email protected]>
> Subject: Re: Qemu Guest kernel 4.20-rc1 PCIe hotplug issue
>
> On Tue, Nov 13, 2018 at 12:36:20PM +0000, Shameerali Kolothum Thodi wrote:
> > > @@ -156,9 +156,9 @@ static void pcie_do_write_cmd(struct controller
> > > *ctrl,
> > > u16 cmd,
> > > slot_ctrl |= (cmd & mask);
> > > ctrl->cmd_busy = 1;
> > > smp_mb();
> > > + ctrl->slot_ctrl = slot_ctrl;
> >
> > Actually I tried this one, but it doesn't help in this case as the
> > initial
> > pcie_capability_read_word() returns the slot_ctrl without
> > PCI_EXP_SLTCTL_HPIE bit set. It looks to me
> > pcie_enable_notification() function enables this,
> >
> > if (!pciehp_poll_mode)
> > cmd |= PCI_EXP_SLTCTL_HPIE | PCI_EXP_SLTCTL_CCIE;
> >
> > I don't know this is as per the spec or not as the initial cap read
> > doesn't seems to have the PCI_EXP_SLTCTL_HPIE bit set.
>
> If I read the code right cmd value should end up in ctrl->slot_ctrl properly from
> pcie_enable_notification().
Right. As I mentioned in my previous mail, I missed the fact that you are updating
the ctrl->slot_ctrl with cmd value while in my test I did my update with the value
returned by pcie_capability_read_word().
> However, I think we are missing check for PCI_EXP_SLTCTL_CCIE in
> pciehp_isr().
Ok.
> Here's an updated patch, can you try and see if it makes any difference?
I just tried this and it works. Thanks.
See few comments below.
> diff --git a/drivers/pci/hotplug/pciehp_hpc.c
> b/drivers/pci/hotplug/pciehp_hpc.c
> index 7dd443aea5a5..da2cbe892444 100644
> --- a/drivers/pci/hotplug/pciehp_hpc.c
> +++ b/drivers/pci/hotplug/pciehp_hpc.c
> @@ -156,9 +156,9 @@ static void pcie_do_write_cmd(struct controller *ctrl,
> u16 cmd,
> slot_ctrl |= (cmd & mask);
> ctrl->cmd_busy = 1;
> smp_mb();
> + ctrl->slot_ctrl = slot_ctrl;
Does it make more sense if we can move this before smp_mb()?. Also I am not
sure updating the ctrl->slot_ctrl before actually the hardware is programmed
with that value will result in any other race conditions? TBH, I am not that familiar
with this code and I leave that to you :)
Thanks,
Shameer
> pcie_capability_write_word(pdev, PCI_EXP_SLTCTL, slot_ctrl);
> ctrl->cmd_started = jiffies;
> - ctrl->slot_ctrl = slot_ctrl;
>
> /*
> * Controllers with the Intel CF118 and similar errata advertise @@ -
> 522,7 +522,7 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id)
> * in the Slot Control register (PCIe r4.0, sec 6.7.3.4).
> */
> if (pdev->current_state == PCI_D3cold ||
> - (!(ctrl->slot_ctrl & PCI_EXP_SLTCTL_HPIE) && !pciehp_poll_mode))
> + (!(ctrl->slot_ctrl & (PCI_EXP_SLTCTL_HPIE | PCI_EXP_SLTCTL_CCIE))
> +&& !pciehp_poll_mode))
> return IRQ_NONE;
>
> /*
On Tue, Nov 13, 2018 at 01:28:04PM +0000, Shameerali Kolothum Thodi wrote:
>
> > -----Original Message-----
> > From: [email protected]
> > [mailto:[email protected]]
> > Sent: 13 November 2018 12:59
> > To: Shameerali Kolothum Thodi <[email protected]>
> > Cc: [email protected]; [email protected]; Wangzhou (B)
> > <[email protected]>; Linuxarm <[email protected]>; Lukas
> > Wunner <[email protected]>
> > Subject: Re: Qemu Guest kernel 4.20-rc1 PCIe hotplug issue
> >
> > On Tue, Nov 13, 2018 at 12:36:20PM +0000, Shameerali Kolothum Thodi wrote:
> > > > @@ -156,9 +156,9 @@ static void pcie_do_write_cmd(struct controller
> > > > *ctrl,
> > > > u16 cmd,
> > > > slot_ctrl |= (cmd & mask);
> > > > ctrl->cmd_busy = 1;
> > > > smp_mb();
> > > > + ctrl->slot_ctrl = slot_ctrl;
> > >
> > > Actually I tried this one, but it doesn't help in this case as the
> > > initial
> > > pcie_capability_read_word() returns the slot_ctrl without
> > > PCI_EXP_SLTCTL_HPIE bit set. It looks to me
> > > pcie_enable_notification() function enables this,
> > >
> > > if (!pciehp_poll_mode)
> > > cmd |= PCI_EXP_SLTCTL_HPIE | PCI_EXP_SLTCTL_CCIE;
> > >
> > > I don't know this is as per the spec or not as the initial cap read
> > > doesn't seems to have the PCI_EXP_SLTCTL_HPIE bit set.
> >
> > If I read the code right cmd value should end up in ctrl->slot_ctrl properly from
> > pcie_enable_notification().
>
> Right. As I mentioned in my previous mail, I missed the fact that you are updating
> the ctrl->slot_ctrl with cmd value while in my test I did my update with the value
> returned by pcie_capability_read_word().
OK, I see.
> > However, I think we are missing check for PCI_EXP_SLTCTL_CCIE in
> > pciehp_isr().
>
> Ok.
>
> > Here's an updated patch, can you try and see if it makes any difference?
>
> I just tried this and it works. Thanks.
Can you still check that the previous one (without _CCIE check) works?
> See few comments below.
>
> > diff --git a/drivers/pci/hotplug/pciehp_hpc.c
> > b/drivers/pci/hotplug/pciehp_hpc.c
> > index 7dd443aea5a5..da2cbe892444 100644
> > --- a/drivers/pci/hotplug/pciehp_hpc.c
> > +++ b/drivers/pci/hotplug/pciehp_hpc.c
> > @@ -156,9 +156,9 @@ static void pcie_do_write_cmd(struct controller *ctrl,
> > u16 cmd,
> > slot_ctrl |= (cmd & mask);
> > ctrl->cmd_busy = 1;
> > smp_mb();
> > + ctrl->slot_ctrl = slot_ctrl;
>
> Does it make more sense if we can move this before smp_mb()?. Also I am not
> sure updating the ctrl->slot_ctrl before actually the hardware is programmed
> with that value will result in any other race conditions? TBH, I am not that familiar
> with this code and I leave that to you :)
Both are good questions :)
For the moving ctrl->slot_ctrl before pcie_capability_write_word(), I
think we should be fine and this is actually more correct because if we
are unmasking interrupts they may trigger immediately making
pciehp_isr() find wrong values in ctrl->slot_ctrl (as can be seen in the
issue you reported).
The smb_mb() thing is not that clear (at least to me) because it is used
in two places in the driver and both seem to be making write to
ctrl->cmd_busy visible to other CPUs but I don't see where we deal with
the read part.
I may be missing something, though.
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]]
> Sent: 13 November 2018 15:08
> To: Shameerali Kolothum Thodi <[email protected]>
> Cc: [email protected]; [email protected]; Wangzhou (B)
> <[email protected]>; Linuxarm <[email protected]>; Lukas
> Wunner <[email protected]>
> Subject: Re: Qemu Guest kernel 4.20-rc1 PCIe hotplug issue
[...]
> > Right. As I mentioned in my previous mail, I missed the fact that you are
> updating
> > the ctrl->slot_ctrl with cmd value while in my test I did my update with the
> value
> > returned by pcie_capability_read_word().
>
> OK, I see.
>
> > > However, I think we are missing check for PCI_EXP_SLTCTL_CCIE in
> > > pciehp_isr().
> >
> > Ok.
> >
> > > Here's an updated patch, can you try and see if it makes any difference?
> >
> > I just tried this and it works. Thanks.
>
> Can you still check that the previous one (without _CCIE check) works?
Yes, it works for me without _CCIE.
> > See few comments below.
> >
> > > diff --git a/drivers/pci/hotplug/pciehp_hpc.c
> > > b/drivers/pci/hotplug/pciehp_hpc.c
> > > index 7dd443aea5a5..da2cbe892444 100644
> > > --- a/drivers/pci/hotplug/pciehp_hpc.c
> > > +++ b/drivers/pci/hotplug/pciehp_hpc.c
> > > @@ -156,9 +156,9 @@ static void pcie_do_write_cmd(struct controller
> *ctrl,
> > > u16 cmd,
> > > slot_ctrl |= (cmd & mask);
> > > ctrl->cmd_busy = 1;
> > > smp_mb();
> > > + ctrl->slot_ctrl = slot_ctrl;
> >
> > Does it make more sense if we can move this before smp_mb()?. Also I am
> not
> > sure updating the ctrl->slot_ctrl before actually the hardware is
> programmed
> > with that value will result in any other race conditions? TBH, I am not that
> familiar
> > with this code and I leave that to you :)
>
> Both are good questions :)
>
> For the moving ctrl->slot_ctrl before pcie_capability_write_word(), I
> think we should be fine and this is actually more correct because if we
> are unmasking interrupts they may trigger immediately making
> pciehp_isr() find wrong values in ctrl->slot_ctrl (as can be seen in the
> issue you reported).
Ok. I was more concerned about an unsolicited event triggering the _isr
while we are modifying the ctrl->slot_ctrl. But that's ok I think as the _isr
reads the hw status anyway.
> The smb_mb() thing is not that clear (at least to me) because it is used
> in two places in the driver and both seem to be making write to
> ctrl->cmd_busy visible to other CPUs but I don't see where we deal with
> the read part.
>
> I may be missing something, though.
I think the read part is in wait_event_timeout() which evaluates the condition.
The wake_up is called from the pciehp_isr(). Since the flag is being updated
in both process level and interrupt handler context, smp_mb() is used. I think
the same now applies to ctrl->slot_ctrl now as this being used in process
context and interrupt context as well.
Thanks,
Shameer
On Tue, Nov 13, 2018 at 03:57:47PM +0000, Shameerali Kolothum Thodi wrote:
> > The smb_mb() thing is not that clear (at least to me) because it is used
> > in two places in the driver and both seem to be making write to
> > ctrl->cmd_busy visible to other CPUs but I don't see where we deal with
> > the read part.
> >
> > I may be missing something, though.
>
> I think the read part is in wait_event_timeout() which evaluates the condition.
> The wake_up is called from the pciehp_isr(). Since the flag is being updated
> in both process level and interrupt handler context, smp_mb() is used. I think
> the same now applies to ctrl->slot_ctrl now as this being used in process
> context and interrupt context as well.
Right, but that would require to use another read/general barrier in the
pciehp_isr() before we read the variable in case interrupt happens
immediately on another CPU (at least that's my understanding). Since I'm
not too comfortable with all these barriers to be honest I would prefer
reading the slot control register directly in pciehp_isr() :-)
I wonder if the below works in your case? I think it is still easier to
understand than adding another barrier there.
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 7dd443aea5a5..575da1005836 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -518,11 +518,9 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id)
u16 status, events;
/*
- * Interrupts only occur in D3hot or shallower and only if enabled
- * in the Slot Control register (PCIe r4.0, sec 6.7.3.4).
+ * Interrupts only occur in D3hot or shallower (PCIe r4.0, sec 6.7.3.4).
*/
- if (pdev->current_state == PCI_D3cold ||
- (!(ctrl->slot_ctrl & PCI_EXP_SLTCTL_HPIE) && !pciehp_poll_mode))
+ if (pdev->current_state == PCI_D3cold)
return IRQ_NONE;
/*
@@ -548,6 +546,22 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id)
return IRQ_NONE;
}
+ if (!pciehp_poll_mode) {
+ u16 ctrl;
+
+ /*
+ * Check that the hotplug interrupt was enabled. It may
+ * be that the interrupt was meant for PME instead as
+ * they share the MSI vector.
+ */
+ pcie_capability_read_word(pdev, PCI_EXP_SLTCTL, &ctrl);
+ if (ctrl == (u16) ~0 || !(ctrl & PCI_EXP_SLTCTL_HPIE)) {
+ if (parent)
+ pm_runtime_put(parent);
+ return IRQ_NONE;
+ }
+ }
+
/*
* Slot Status contains plain status bits as well as event
* notification bits; right now we only want the event bits.
On Wed, Nov 14, 2018 at 11:52:25AM +0200, [email protected] wrote:
> On Tue, Nov 13, 2018 at 03:57:47PM +0000, Shameerali Kolothum Thodi wrote:
> > > The smb_mb() thing is not that clear (at least to me) because it is used
> > > in two places in the driver and both seem to be making write to
> > > ctrl->cmd_busy visible to other CPUs but I don't see where we deal with
> > > the read part.
> > >
> > > I may be missing something, though.
> >
> > I think the read part is in wait_event_timeout() which evaluates the
> > condition. The wake_up is called from the pciehp_isr(). Since the flag
> > is being updated in both process level and interrupt handler context,
> > smp_mb() is used. I think the same now applies to ctrl->slot_ctrl now
> > as this being used in process context and interrupt context as well.
>
> Right, but that would require to use another read/general barrier in the
> pciehp_isr() before we read the variable in case interrupt happens
> immediately on another CPU (at least that's my understanding).
In pcie_do_write_cmd(), please just move the
ctrl->slot_ctrl = slot_ctrl;
above the call to pcie_capability_write_word().
AFAICS an explicit memory barrier isn't needed here because of the call to
pcie_capability_write_word(), which "will [ordinarily] be guaranteed to be
fully ordered and uncombined" (Documentation/memory-barriers.txt, section
"KERNEL I/O BARRIER EFFECTS").
The memory barrier in pciehp_isr() is also bogus because the following
wake_up() implies a memory barrier if a task was woken. (And if none
was woken, who cares.)
> Since I'm
> not too comfortable with all these barriers to be honest I would prefer
> reading the slot control register directly in pciehp_isr() :-)
That is an approach I'd strongly object to: While pciehp itself only
signals very few interrupts (making an additional mmio read appear to
be negligible), it may share its interrupt with other devices. On my
MacBookPro9,1, a hotplug port of the Thunderbolt controller shares
its interrupt line with the Wifi card and SD card reader, and those
may signal a huge number of interrupts. On such a machine an additional
mmio read per interrupt becomes a problem.
Thanks,
Lukas
On Wed, Nov 14, 2018 at 02:30:14PM +0100, Lukas Wunner wrote:
> On Wed, Nov 14, 2018 at 11:52:25AM +0200, [email protected] wrote:
> > On Tue, Nov 13, 2018 at 03:57:47PM +0000, Shameerali Kolothum Thodi wrote:
> > > > The smb_mb() thing is not that clear (at least to me) because it is used
> > > > in two places in the driver and both seem to be making write to
> > > > ctrl->cmd_busy visible to other CPUs but I don't see where we deal with
> > > > the read part.
> > > >
> > > > I may be missing something, though.
> > >
> > > I think the read part is in wait_event_timeout() which evaluates the
> > > condition. The wake_up is called from the pciehp_isr(). Since the flag
> > > is being updated in both process level and interrupt handler context,
> > > smp_mb() is used. I think the same now applies to ctrl->slot_ctrl now
> > > as this being used in process context and interrupt context as well.
> >
> > Right, but that would require to use another read/general barrier in the
> > pciehp_isr() before we read the variable in case interrupt happens
> > immediately on another CPU (at least that's my understanding).
>
> In pcie_do_write_cmd(), please just move the
>
> ctrl->slot_ctrl = slot_ctrl;
>
> above the call to pcie_capability_write_word().
>
> AFAICS an explicit memory barrier isn't needed here because of the call to
> pcie_capability_write_word(), which "will [ordinarily] be guaranteed to be
> fully ordered and uncombined" (Documentation/memory-barriers.txt, section
> "KERNEL I/O BARRIER EFFECTS").
>
> The memory barrier in pciehp_isr() is also bogus because the following
> wake_up() implies a memory barrier if a task was woken. (And if none
> was woken, who cares.)
>
>
> > Since I'm
> > not too comfortable with all these barriers to be honest I would prefer
> > reading the slot control register directly in pciehp_isr() :-)
>
> That is an approach I'd strongly object to: While pciehp itself only
> signals very few interrupts (making an additional mmio read appear to
> be negligible), it may share its interrupt with other devices. On my
> MacBookPro9,1, a hotplug port of the Thunderbolt controller shares
> its interrupt line with the Wifi card and SD card reader, and those
> may signal a huge number of interrupts. On such a machine an additional
> mmio read per interrupt becomes a problem.
OK.
I just sent a patch moving ctrl->slot_ctrl assignment to happen before
pcie_capability_write_word().