2022-03-28 09:04:32

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re:

On Mon, Mar 28, 2022 at 12:56:41PM +0800, Jason Wang wrote:
> On Fri, Mar 25, 2022 at 6:10 PM Michael S. Tsirkin <[email protected]> wrote:
> >
> > On Fri, Mar 25, 2022 at 05:20:19PM +0800, Jason Wang wrote:
> > > On Fri, Mar 25, 2022 at 5:10 PM Michael S. Tsirkin <[email protected]> wrote:
> > > >
> > > > On Fri, Mar 25, 2022 at 03:52:00PM +0800, Jason Wang wrote:
> > > > > On Fri, Mar 25, 2022 at 2:31 PM Michael S. Tsirkin <[email protected]> wrote:
> > > > > >
> > > > > > Bcc:
> > > > > > Subject: Re: [PATCH 3/3] virtio: harden vring IRQ
> > > > > > Message-ID: <[email protected]>
> > > > > > Reply-To:
> > > > > > In-Reply-To: <[email protected]>
> > > > > >
> > > > > > On Fri, Mar 25, 2022 at 11:04:08AM +0800, Jason Wang wrote:
> > > > > > >
> > > > > > > 在 2022/3/24 下午7:03, Michael S. Tsirkin 写道:
> > > > > > > > On Thu, Mar 24, 2022 at 04:40:04PM +0800, Jason Wang wrote:
> > > > > > > > > This is a rework on the previous IRQ hardening that is done for
> > > > > > > > > virtio-pci where several drawbacks were found and were reverted:
> > > > > > > > >
> > > > > > > > > 1) try to use IRQF_NO_AUTOEN which is not friendly to affinity managed IRQ
> > > > > > > > > that is used by some device such as virtio-blk
> > > > > > > > > 2) done only for PCI transport
> > > > > > > > >
> > > > > > > > > In this patch, we tries to borrow the idea from the INTX IRQ hardening
> > > > > > > > > in the reverted commit 080cd7c3ac87 ("virtio-pci: harden INTX interrupts")
> > > > > > > > > by introducing a global irq_soft_enabled variable for each
> > > > > > > > > virtio_device. Then we can to toggle it during
> > > > > > > > > virtio_reset_device()/virtio_device_ready(). A synchornize_rcu() is
> > > > > > > > > used in virtio_reset_device() to synchronize with the IRQ handlers. In
> > > > > > > > > the future, we may provide config_ops for the transport that doesn't
> > > > > > > > > use IRQ. With this, vring_interrupt() can return check and early if
> > > > > > > > > irq_soft_enabled is false. This lead to smp_load_acquire() to be used
> > > > > > > > > but the cost should be acceptable.
> > > > > > > > Maybe it should be but is it? Can't we use synchronize_irq instead?
> > > > > > >
> > > > > > >
> > > > > > > Even if we allow the transport driver to synchornize through
> > > > > > > synchronize_irq() we still need a check in the vring_interrupt().
> > > > > > >
> > > > > > > We do something like the following previously:
> > > > > > >
> > > > > > > if (!READ_ONCE(vp_dev->intx_soft_enabled))
> > > > > > > return IRQ_NONE;
> > > > > > >
> > > > > > > But it looks like a bug since speculative read can be done before the check
> > > > > > > where the interrupt handler can't see the uncommitted setup which is done by
> > > > > > > the driver.
> > > > > >
> > > > > > I don't think so - if you sync after setting the value then
> > > > > > you are guaranteed that any handler running afterwards
> > > > > > will see the new value.
> > > > >
> > > > > The problem is not disabled but the enable.
> > > >
> > > > So a misbehaving device can lose interrupts? That's not a problem at all
> > > > imo.
> > >
> > > It's the interrupt raised before setting irq_soft_enabled to true:
> > >
> > > CPU 0 probe) driver specific setup (not commited)
> > > CPU 1 IRQ handler) read the uninitialized variable
> > > CPU 0 probe) set irq_soft_enabled to true
> > > CPU 1 IRQ handler) read irq_soft_enable as true
> > > CPU 1 IRQ handler) use the uninitialized variable
> > >
> > > Thanks
> >
> > Yea, it hurts if you do it. So do not do it then ;).
> >
> > irq_soft_enabled (I think driver_ok or status is a better name)
>
> I can change it to driver_ok.
>
> > should be initialized to false *before* irq is requested.
> >
> > And requesting irq commits all memory otherwise all drivers would be
> > broken,
>
> So I think we might talk different issues:
>
> 1) Whether request_irq() commits the previous setups, I think the
> answer is yes, since the spin_unlock of desc->lock (release) can
> guarantee this though there seems no documentation around
> request_irq() to say this.
>
> And I can see at least drivers/video/fbdev/omap2/omapfb/dss/dispc.c is
> using smp_wmb() before the request_irq().
>
> And even if write is ordered we still need read to be ordered to be
> paired with that.
>
> > if it doesn't it just needs to be fixed, not worked around in
> > virtio.
>
> 2) virtio drivers might do a lot of setups between request_irq() and
> virtio_device_ready():
>
> request_irq()
> driver specific setups
> virtio_device_ready()
>
> CPU 0 probe) request_irq()
> CPU 1 IRQ handler) read the uninitialized variable
> CPU 0 probe) driver specific setups
> CPU 0 probe) smp_store_release(intr_soft_enabled, true), commit the setups
> CPU 1 IRQ handler) read irq_soft_enable as true
> CPU 1 IRQ handler) use the uninitialized variable
>
> Thanks


As I said, virtio_device_ready needs to do synchronize_irq.
That will guarantee all setup is visible to the specific IRQ, this
is what it's point is.


> >
> >
> > > >
> > > > > We use smp_store_relase()
> > > > > to make sure the driver commits the setup before enabling the irq. It
> > > > > means the read needs to be ordered as well in vring_interrupt().
> > > > >
> > > > > >
> > > > > > Although I couldn't find anything about this in memory-barriers.txt
> > > > > > which surprises me.
> > > > > >
> > > > > > CC Paul to help make sure I'm right.
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > > To avoid breaking legacy device which can send IRQ before DRIVER_OK, a
> > > > > > > > > module parameter is introduced to enable the hardening so function
> > > > > > > > > hardening is disabled by default.
> > > > > > > > Which devices are these? How come they send an interrupt before there
> > > > > > > > are any buffers in any queues?
> > > > > > >
> > > > > > >
> > > > > > > I copied this from the commit log for 22b7050a024d7
> > > > > > >
> > > > > > > "
> > > > > > >
> > > > > > > This change will also benefit old hypervisors (before 2009)
> > > > > > > that send interrupts without checking DRIVER_OK: previously,
> > > > > > > the callback could race with driver-specific initialization.
> > > > > > > "
> > > > > > >
> > > > > > > If this is only for config interrupt, I can remove the above log.
> > > > > >
> > > > > >
> > > > > > This is only for config interrupt.
> > > > >
> > > > > Ok.
> > > > >
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > > Note that the hardening is only done for vring interrupt since the
> > > > > > > > > config interrupt hardening is already done in commit 22b7050a024d7
> > > > > > > > > ("virtio: defer config changed notifications"). But the method that is
> > > > > > > > > used by config interrupt can't be reused by the vring interrupt
> > > > > > > > > handler because it uses spinlock to do the synchronization which is
> > > > > > > > > expensive.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Jason Wang <[email protected]>
> > > > > > > >
> > > > > > > > > ---
> > > > > > > > > drivers/virtio/virtio.c | 19 +++++++++++++++++++
> > > > > > > > > drivers/virtio/virtio_ring.c | 9 ++++++++-
> > > > > > > > > include/linux/virtio.h | 4 ++++
> > > > > > > > > include/linux/virtio_config.h | 25 +++++++++++++++++++++++++
> > > > > > > > > 4 files changed, 56 insertions(+), 1 deletion(-)
> > > > > > > > >
> > > > > > > > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > > > > > > > > index 8dde44ea044a..85e331efa9cc 100644
> > > > > > > > > --- a/drivers/virtio/virtio.c
> > > > > > > > > +++ b/drivers/virtio/virtio.c
> > > > > > > > > @@ -7,6 +7,12 @@
> > > > > > > > > #include <linux/of.h>
> > > > > > > > > #include <uapi/linux/virtio_ids.h>
> > > > > > > > > +static bool irq_hardening = false;
> > > > > > > > > +
> > > > > > > > > +module_param(irq_hardening, bool, 0444);
> > > > > > > > > +MODULE_PARM_DESC(irq_hardening,
> > > > > > > > > + "Disalbe IRQ software processing when it is not expected");
> > > > > > > > > +
> > > > > > > > > /* Unique numbering for virtio devices. */
> > > > > > > > > static DEFINE_IDA(virtio_index_ida);
> > > > > > > > > @@ -220,6 +226,15 @@ static int virtio_features_ok(struct virtio_device *dev)
> > > > > > > > > * */
> > > > > > > > > void virtio_reset_device(struct virtio_device *dev)
> > > > > > > > > {
> > > > > > > > > + /*
> > > > > > > > > + * The below synchronize_rcu() guarantees that any
> > > > > > > > > + * interrupt for this line arriving after
> > > > > > > > > + * synchronize_rcu() has completed is guaranteed to see
> > > > > > > > > + * irq_soft_enabled == false.
> > > > > > > > News to me I did not know synchronize_rcu has anything to do
> > > > > > > > with interrupts. Did not you intend to use synchronize_irq?
> > > > > > > > I am not even 100% sure synchronize_rcu is by design a memory barrier
> > > > > > > > though it's most likely is ...
> > > > > > >
> > > > > > >
> > > > > > > According to the comment above tree RCU version of synchronize_rcu():
> > > > > > >
> > > > > > > """
> > > > > > >
> > > > > > > * RCU read-side critical sections are delimited by rcu_read_lock()
> > > > > > > * and rcu_read_unlock(), and may be nested. In addition, but only in
> > > > > > > * v5.0 and later, regions of code across which interrupts, preemption,
> > > > > > > * or softirqs have been disabled also serve as RCU read-side critical
> > > > > > > * sections. This includes hardware interrupt handlers, softirq handlers,
> > > > > > > * and NMI handlers.
> > > > > > > """
> > > > > > >
> > > > > > > So interrupt handlers are treated as read-side critical sections.
> > > > > > >
> > > > > > > And it has the comment for explain the barrier:
> > > > > > >
> > > > > > > """
> > > > > > >
> > > > > > > * Note that this guarantee implies further memory-ordering guarantees.
> > > > > > > * On systems with more than one CPU, when synchronize_rcu() returns,
> > > > > > > * each CPU is guaranteed to have executed a full memory barrier since
> > > > > > > * the end of its last RCU read-side critical section whose beginning
> > > > > > > * preceded the call to synchronize_rcu(). In addition, each CPU having
> > > > > > > """
> > > > > > >
> > > > > > > So on SMP it provides a full barrier. And for UP/tiny RCU we don't need the
> > > > > > > barrier, if the interrupt come after WRITE_ONCE() it will see the
> > > > > > > irq_soft_enabled as false.
> > > > > > >
> > > > > >
> > > > > > You are right. So then
> > > > > > 1. I do not think we need load_acquire - why is it needed? Just
> > > > > > READ_ONCE should do.
> > > > >
> > > > > See above.
> > > > >
> > > > > > 2. isn't synchronize_irq also doing the same thing?
> > > > >
> > > > >
> > > > > Yes, but it requires a config ops since the IRQ knowledge is transport specific.
> > > > >
> > > > > >
> > > > > >
> > > > > > > >
> > > > > > > > > + */
> > > > > > > > > + WRITE_ONCE(dev->irq_soft_enabled, false);
> > > > > > > > > + synchronize_rcu();
> > > > > > > > > +
> > > > > > > > > dev->config->reset(dev);
> > > > > > > > > }
> > > > > > > > > EXPORT_SYMBOL_GPL(virtio_reset_device);
> > > > > > > > Please add comment explaining where it will be enabled.
> > > > > > > > Also, we *really* don't need to synch if it was already disabled,
> > > > > > > > let's not add useless overhead to the boot sequence.
> > > > > > >
> > > > > > >
> > > > > > > Ok.
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > @@ -427,6 +442,10 @@ int register_virtio_device(struct virtio_device *dev)
> > > > > > > > > spin_lock_init(&dev->config_lock);
> > > > > > > > > dev->config_enabled = false;
> > > > > > > > > dev->config_change_pending = false;
> > > > > > > > > + dev->irq_soft_check = irq_hardening;
> > > > > > > > > +
> > > > > > > > > + if (dev->irq_soft_check)
> > > > > > > > > + dev_info(&dev->dev, "IRQ hardening is enabled\n");
> > > > > > > > > /* We always start by resetting the device, in case a previous
> > > > > > > > > * driver messed it up. This also tests that code path a little. */
> > > > > > > > one of the points of hardening is it's also helpful for buggy
> > > > > > > > devices. this flag defeats the purpose.
> > > > > > >
> > > > > > >
> > > > > > > Do you mean:
> > > > > > >
> > > > > > > 1) we need something like config_enable? This seems not easy to be
> > > > > > > implemented without obvious overhead, mainly the synchronize with the
> > > > > > > interrupt handlers
> > > > > >
> > > > > > But synchronize is only on tear-down path. That is not critical for any
> > > > > > users at the moment, even less than probe.
> > > > >
> > > > > I meant if we have vq->irq_pending, we need to call vring_interrupt()
> > > > > in the virtio_device_ready() and synchronize the IRQ handlers with
> > > > > spinlock or others.
> > > > >
> > > > > >
> > > > > > > 2) enable this by default, so I don't object, but this may have some risk
> > > > > > > for old hypervisors
> > > > > >
> > > > > >
> > > > > > The risk if there's a driver adding buffers without setting DRIVER_OK.
> > > > >
> > > > > Probably not, we have devices that accept random inputs from outside,
> > > > > net, console, input etc. I've done a round of audits of the Qemu
> > > > > codes. They look all fine since day0.
> > > > >
> > > > > > So with this approach, how about we rename the flag "driver_ok"?
> > > > > > And then add_buf can actually test it and BUG_ON if not there (at least
> > > > > > in the debug build).
> > > > >
> > > > > This looks like a hardening of the driver in the core instead of the
> > > > > device. I think it can be done but in a separate series.
> > > > >
> > > > > >
> > > > > > And going down from there, how about we cache status in the
> > > > > > device? Then we don't need to keep re-reading it every time,
> > > > > > speeding boot up a tiny bit.
> > > > >
> > > > > I don't fully understand here, actually spec requires status to be
> > > > > read back for validation in many cases.
> > > > >
> > > > > Thanks
> > > > >
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > > > > > index 962f1477b1fa..0170f8c784d8 100644
> > > > > > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > > > > > @@ -2144,10 +2144,17 @@ static inline bool more_used(const struct vring_virtqueue *vq)
> > > > > > > > > return vq->packed_ring ? more_used_packed(vq) : more_used_split(vq);
> > > > > > > > > }
> > > > > > > > > -irqreturn_t vring_interrupt(int irq, void *_vq)
> > > > > > > > > +irqreturn_t vring_interrupt(int irq, void *v)
> > > > > > > > > {
> > > > > > > > > + struct virtqueue *_vq = v;
> > > > > > > > > + struct virtio_device *vdev = _vq->vdev;
> > > > > > > > > struct vring_virtqueue *vq = to_vvq(_vq);
> > > > > > > > > + if (!virtio_irq_soft_enabled(vdev)) {
> > > > > > > > > + dev_warn_once(&vdev->dev, "virtio vring IRQ raised before DRIVER_OK");
> > > > > > > > > + return IRQ_NONE;
> > > > > > > > > + }
> > > > > > > > > +
> > > > > > > > > if (!more_used(vq)) {
> > > > > > > > > pr_debug("virtqueue interrupt with no work for %p\n", vq);
> > > > > > > > > return IRQ_NONE;
> > > > > > > > > diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> > > > > > > > > index 5464f398912a..957d6ad604ac 100644
> > > > > > > > > --- a/include/linux/virtio.h
> > > > > > > > > +++ b/include/linux/virtio.h
> > > > > > > > > @@ -95,6 +95,8 @@ dma_addr_t virtqueue_get_used_addr(struct virtqueue *vq);
> > > > > > > > > * @failed: saved value for VIRTIO_CONFIG_S_FAILED bit (for restore)
> > > > > > > > > * @config_enabled: configuration change reporting enabled
> > > > > > > > > * @config_change_pending: configuration change reported while disabled
> > > > > > > > > + * @irq_soft_check: whether or not to check @irq_soft_enabled
> > > > > > > > > + * @irq_soft_enabled: callbacks enabled
> > > > > > > > > * @config_lock: protects configuration change reporting
> > > > > > > > > * @dev: underlying device.
> > > > > > > > > * @id: the device type identification (used to match it with a driver).
> > > > > > > > > @@ -109,6 +111,8 @@ struct virtio_device {
> > > > > > > > > bool failed;
> > > > > > > > > bool config_enabled;
> > > > > > > > > bool config_change_pending;
> > > > > > > > > + bool irq_soft_check;
> > > > > > > > > + bool irq_soft_enabled;
> > > > > > > > > spinlock_t config_lock;
> > > > > > > > > spinlock_t vqs_list_lock; /* Protects VQs list access */
> > > > > > > > > struct device dev;
> > > > > > > > > diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> > > > > > > > > index dafdc7f48c01..9c1b61f2e525 100644
> > > > > > > > > --- a/include/linux/virtio_config.h
> > > > > > > > > +++ b/include/linux/virtio_config.h
> > > > > > > > > @@ -174,6 +174,24 @@ static inline bool virtio_has_feature(const struct virtio_device *vdev,
> > > > > > > > > return __virtio_test_bit(vdev, fbit);
> > > > > > > > > }
> > > > > > > > > +/*
> > > > > > > > > + * virtio_irq_soft_enabled: whether we can execute callbacks
> > > > > > > > > + * @vdev: the device
> > > > > > > > > + */
> > > > > > > > > +static inline bool virtio_irq_soft_enabled(const struct virtio_device *vdev)
> > > > > > > > > +{
> > > > > > > > > + if (!vdev->irq_soft_check)
> > > > > > > > > + return true;
> > > > > > > > > +
> > > > > > > > > + /*
> > > > > > > > > + * Read irq_soft_enabled before reading other device specific
> > > > > > > > > + * data. Paried with smp_store_relase() in
> > > > > > > > paired
> > > > > > >
> > > > > > >
> > > > > > > Will fix.
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > > + * virtio_device_ready() and WRITE_ONCE()/synchronize_rcu() in
> > > > > > > > > + * virtio_reset_device().
> > > > > > > > > + */
> > > > > > > > > + return smp_load_acquire(&vdev->irq_soft_enabled);
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > > /**
> > > > > > > > > * virtio_has_dma_quirk - determine whether this device has the DMA quirk
> > > > > > > > > * @vdev: the device
> > > > > > > > > @@ -236,6 +254,13 @@ void virtio_device_ready(struct virtio_device *dev)
> > > > > > > > > if (dev->config->enable_cbs)
> > > > > > > > > dev->config->enable_cbs(dev);
> > > > > > > > > + /*
> > > > > > > > > + * Commit the driver setup before enabling the virtqueue
> > > > > > > > > + * callbacks. Paried with smp_load_acuqire() in
> > > > > > > > > + * virtio_irq_soft_enabled()
> > > > > > > > > + */
> > > > > > > > > + smp_store_release(&dev->irq_soft_enabled, true);
> > > > > > > > > +
> > > > > > > > > BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > > > > dev->config->set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > > > > }
> > > > > > > > > --
> > > > > > > > > 2.25.1
> > > > > >
> > > >
> >


2022-03-28 11:30:42

by Jason Wang

[permalink] [raw]
Subject: Re:

On Mon, Mar 28, 2022 at 1:59 PM Michael S. Tsirkin <[email protected]> wrote:
>
> On Mon, Mar 28, 2022 at 12:56:41PM +0800, Jason Wang wrote:
> > On Fri, Mar 25, 2022 at 6:10 PM Michael S. Tsirkin <[email protected]> wrote:
> > >
> > > On Fri, Mar 25, 2022 at 05:20:19PM +0800, Jason Wang wrote:
> > > > On Fri, Mar 25, 2022 at 5:10 PM Michael S. Tsirkin <[email protected]> wrote:
> > > > >
> > > > > On Fri, Mar 25, 2022 at 03:52:00PM +0800, Jason Wang wrote:
> > > > > > On Fri, Mar 25, 2022 at 2:31 PM Michael S. Tsirkin <[email protected]> wrote:
> > > > > > >
> > > > > > > Bcc:
> > > > > > > Subject: Re: [PATCH 3/3] virtio: harden vring IRQ
> > > > > > > Message-ID: <[email protected]>
> > > > > > > Reply-To:
> > > > > > > In-Reply-To: <[email protected]>
> > > > > > >
> > > > > > > On Fri, Mar 25, 2022 at 11:04:08AM +0800, Jason Wang wrote:
> > > > > > > >
> > > > > > > > 在 2022/3/24 下午7:03, Michael S. Tsirkin 写道:
> > > > > > > > > On Thu, Mar 24, 2022 at 04:40:04PM +0800, Jason Wang wrote:
> > > > > > > > > > This is a rework on the previous IRQ hardening that is done for
> > > > > > > > > > virtio-pci where several drawbacks were found and were reverted:
> > > > > > > > > >
> > > > > > > > > > 1) try to use IRQF_NO_AUTOEN which is not friendly to affinity managed IRQ
> > > > > > > > > > that is used by some device such as virtio-blk
> > > > > > > > > > 2) done only for PCI transport
> > > > > > > > > >
> > > > > > > > > > In this patch, we tries to borrow the idea from the INTX IRQ hardening
> > > > > > > > > > in the reverted commit 080cd7c3ac87 ("virtio-pci: harden INTX interrupts")
> > > > > > > > > > by introducing a global irq_soft_enabled variable for each
> > > > > > > > > > virtio_device. Then we can to toggle it during
> > > > > > > > > > virtio_reset_device()/virtio_device_ready(). A synchornize_rcu() is
> > > > > > > > > > used in virtio_reset_device() to synchronize with the IRQ handlers. In
> > > > > > > > > > the future, we may provide config_ops for the transport that doesn't
> > > > > > > > > > use IRQ. With this, vring_interrupt() can return check and early if
> > > > > > > > > > irq_soft_enabled is false. This lead to smp_load_acquire() to be used
> > > > > > > > > > but the cost should be acceptable.
> > > > > > > > > Maybe it should be but is it? Can't we use synchronize_irq instead?
> > > > > > > >
> > > > > > > >
> > > > > > > > Even if we allow the transport driver to synchornize through
> > > > > > > > synchronize_irq() we still need a check in the vring_interrupt().
> > > > > > > >
> > > > > > > > We do something like the following previously:
> > > > > > > >
> > > > > > > > if (!READ_ONCE(vp_dev->intx_soft_enabled))
> > > > > > > > return IRQ_NONE;
> > > > > > > >
> > > > > > > > But it looks like a bug since speculative read can be done before the check
> > > > > > > > where the interrupt handler can't see the uncommitted setup which is done by
> > > > > > > > the driver.
> > > > > > >
> > > > > > > I don't think so - if you sync after setting the value then
> > > > > > > you are guaranteed that any handler running afterwards
> > > > > > > will see the new value.
> > > > > >
> > > > > > The problem is not disabled but the enable.
> > > > >
> > > > > So a misbehaving device can lose interrupts? That's not a problem at all
> > > > > imo.
> > > >
> > > > It's the interrupt raised before setting irq_soft_enabled to true:
> > > >
> > > > CPU 0 probe) driver specific setup (not commited)
> > > > CPU 1 IRQ handler) read the uninitialized variable
> > > > CPU 0 probe) set irq_soft_enabled to true
> > > > CPU 1 IRQ handler) read irq_soft_enable as true
> > > > CPU 1 IRQ handler) use the uninitialized variable
> > > >
> > > > Thanks
> > >
> > > Yea, it hurts if you do it. So do not do it then ;).
> > >
> > > irq_soft_enabled (I think driver_ok or status is a better name)
> >
> > I can change it to driver_ok.
> >
> > > should be initialized to false *before* irq is requested.
> > >
> > > And requesting irq commits all memory otherwise all drivers would be
> > > broken,
> >
> > So I think we might talk different issues:
> >
> > 1) Whether request_irq() commits the previous setups, I think the
> > answer is yes, since the spin_unlock of desc->lock (release) can
> > guarantee this though there seems no documentation around
> > request_irq() to say this.
> >
> > And I can see at least drivers/video/fbdev/omap2/omapfb/dss/dispc.c is
> > using smp_wmb() before the request_irq().
> >
> > And even if write is ordered we still need read to be ordered to be
> > paired with that.
> >
> > > if it doesn't it just needs to be fixed, not worked around in
> > > virtio.
> >
> > 2) virtio drivers might do a lot of setups between request_irq() and
> > virtio_device_ready():
> >
> > request_irq()
> > driver specific setups
> > virtio_device_ready()
> >
> > CPU 0 probe) request_irq()
> > CPU 1 IRQ handler) read the uninitialized variable
> > CPU 0 probe) driver specific setups
> > CPU 0 probe) smp_store_release(intr_soft_enabled, true), commit the setups
> > CPU 1 IRQ handler) read irq_soft_enable as true
> > CPU 1 IRQ handler) use the uninitialized variable
> >
> > Thanks
>
>
> As I said, virtio_device_ready needs to do synchronize_irq.
> That will guarantee all setup is visible to the specific IRQ,

Only the interrupt after synchronize_irq() returns.

>this
> is what it's point is.

What happens if an interrupt is raised in the middle like:

smp_store_release(dev->irq_soft_enabled, true)
IRQ handler
synchornize_irq()

If we don't enforce a reading order, the IRQ handler may still see the
uninitialized variable.

Thanks

>
>
> > >
> > >
> > > > >
> > > > > > We use smp_store_relase()
> > > > > > to make sure the driver commits the setup before enabling the irq. It
> > > > > > means the read needs to be ordered as well in vring_interrupt().
> > > > > >
> > > > > > >
> > > > > > > Although I couldn't find anything about this in memory-barriers.txt
> > > > > > > which surprises me.
> > > > > > >
> > > > > > > CC Paul to help make sure I'm right.
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > > To avoid breaking legacy device which can send IRQ before DRIVER_OK, a
> > > > > > > > > > module parameter is introduced to enable the hardening so function
> > > > > > > > > > hardening is disabled by default.
> > > > > > > > > Which devices are these? How come they send an interrupt before there
> > > > > > > > > are any buffers in any queues?
> > > > > > > >
> > > > > > > >
> > > > > > > > I copied this from the commit log for 22b7050a024d7
> > > > > > > >
> > > > > > > > "
> > > > > > > >
> > > > > > > > This change will also benefit old hypervisors (before 2009)
> > > > > > > > that send interrupts without checking DRIVER_OK: previously,
> > > > > > > > the callback could race with driver-specific initialization.
> > > > > > > > "
> > > > > > > >
> > > > > > > > If this is only for config interrupt, I can remove the above log.
> > > > > > >
> > > > > > >
> > > > > > > This is only for config interrupt.
> > > > > >
> > > > > > Ok.
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > > Note that the hardening is only done for vring interrupt since the
> > > > > > > > > > config interrupt hardening is already done in commit 22b7050a024d7
> > > > > > > > > > ("virtio: defer config changed notifications"). But the method that is
> > > > > > > > > > used by config interrupt can't be reused by the vring interrupt
> > > > > > > > > > handler because it uses spinlock to do the synchronization which is
> > > > > > > > > > expensive.
> > > > > > > > > >
> > > > > > > > > > Signed-off-by: Jason Wang <[email protected]>
> > > > > > > > >
> > > > > > > > > > ---
> > > > > > > > > > drivers/virtio/virtio.c | 19 +++++++++++++++++++
> > > > > > > > > > drivers/virtio/virtio_ring.c | 9 ++++++++-
> > > > > > > > > > include/linux/virtio.h | 4 ++++
> > > > > > > > > > include/linux/virtio_config.h | 25 +++++++++++++++++++++++++
> > > > > > > > > > 4 files changed, 56 insertions(+), 1 deletion(-)
> > > > > > > > > >
> > > > > > > > > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > > > > > > > > > index 8dde44ea044a..85e331efa9cc 100644
> > > > > > > > > > --- a/drivers/virtio/virtio.c
> > > > > > > > > > +++ b/drivers/virtio/virtio.c
> > > > > > > > > > @@ -7,6 +7,12 @@
> > > > > > > > > > #include <linux/of.h>
> > > > > > > > > > #include <uapi/linux/virtio_ids.h>
> > > > > > > > > > +static bool irq_hardening = false;
> > > > > > > > > > +
> > > > > > > > > > +module_param(irq_hardening, bool, 0444);
> > > > > > > > > > +MODULE_PARM_DESC(irq_hardening,
> > > > > > > > > > + "Disalbe IRQ software processing when it is not expected");
> > > > > > > > > > +
> > > > > > > > > > /* Unique numbering for virtio devices. */
> > > > > > > > > > static DEFINE_IDA(virtio_index_ida);
> > > > > > > > > > @@ -220,6 +226,15 @@ static int virtio_features_ok(struct virtio_device *dev)
> > > > > > > > > > * */
> > > > > > > > > > void virtio_reset_device(struct virtio_device *dev)
> > > > > > > > > > {
> > > > > > > > > > + /*
> > > > > > > > > > + * The below synchronize_rcu() guarantees that any
> > > > > > > > > > + * interrupt for this line arriving after
> > > > > > > > > > + * synchronize_rcu() has completed is guaranteed to see
> > > > > > > > > > + * irq_soft_enabled == false.
> > > > > > > > > News to me I did not know synchronize_rcu has anything to do
> > > > > > > > > with interrupts. Did not you intend to use synchronize_irq?
> > > > > > > > > I am not even 100% sure synchronize_rcu is by design a memory barrier
> > > > > > > > > though it's most likely is ...
> > > > > > > >
> > > > > > > >
> > > > > > > > According to the comment above tree RCU version of synchronize_rcu():
> > > > > > > >
> > > > > > > > """
> > > > > > > >
> > > > > > > > * RCU read-side critical sections are delimited by rcu_read_lock()
> > > > > > > > * and rcu_read_unlock(), and may be nested. In addition, but only in
> > > > > > > > * v5.0 and later, regions of code across which interrupts, preemption,
> > > > > > > > * or softirqs have been disabled also serve as RCU read-side critical
> > > > > > > > * sections. This includes hardware interrupt handlers, softirq handlers,
> > > > > > > > * and NMI handlers.
> > > > > > > > """
> > > > > > > >
> > > > > > > > So interrupt handlers are treated as read-side critical sections.
> > > > > > > >
> > > > > > > > And it has the comment for explain the barrier:
> > > > > > > >
> > > > > > > > """
> > > > > > > >
> > > > > > > > * Note that this guarantee implies further memory-ordering guarantees.
> > > > > > > > * On systems with more than one CPU, when synchronize_rcu() returns,
> > > > > > > > * each CPU is guaranteed to have executed a full memory barrier since
> > > > > > > > * the end of its last RCU read-side critical section whose beginning
> > > > > > > > * preceded the call to synchronize_rcu(). In addition, each CPU having
> > > > > > > > """
> > > > > > > >
> > > > > > > > So on SMP it provides a full barrier. And for UP/tiny RCU we don't need the
> > > > > > > > barrier, if the interrupt come after WRITE_ONCE() it will see the
> > > > > > > > irq_soft_enabled as false.
> > > > > > > >
> > > > > > >
> > > > > > > You are right. So then
> > > > > > > 1. I do not think we need load_acquire - why is it needed? Just
> > > > > > > READ_ONCE should do.
> > > > > >
> > > > > > See above.
> > > > > >
> > > > > > > 2. isn't synchronize_irq also doing the same thing?
> > > > > >
> > > > > >
> > > > > > Yes, but it requires a config ops since the IRQ knowledge is transport specific.
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > >
> > > > > > > > > > + */
> > > > > > > > > > + WRITE_ONCE(dev->irq_soft_enabled, false);
> > > > > > > > > > + synchronize_rcu();
> > > > > > > > > > +
> > > > > > > > > > dev->config->reset(dev);
> > > > > > > > > > }
> > > > > > > > > > EXPORT_SYMBOL_GPL(virtio_reset_device);
> > > > > > > > > Please add comment explaining where it will be enabled.
> > > > > > > > > Also, we *really* don't need to synch if it was already disabled,
> > > > > > > > > let's not add useless overhead to the boot sequence.
> > > > > > > >
> > > > > > > >
> > > > > > > > Ok.
> > > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > @@ -427,6 +442,10 @@ int register_virtio_device(struct virtio_device *dev)
> > > > > > > > > > spin_lock_init(&dev->config_lock);
> > > > > > > > > > dev->config_enabled = false;
> > > > > > > > > > dev->config_change_pending = false;
> > > > > > > > > > + dev->irq_soft_check = irq_hardening;
> > > > > > > > > > +
> > > > > > > > > > + if (dev->irq_soft_check)
> > > > > > > > > > + dev_info(&dev->dev, "IRQ hardening is enabled\n");
> > > > > > > > > > /* We always start by resetting the device, in case a previous
> > > > > > > > > > * driver messed it up. This also tests that code path a little. */
> > > > > > > > > one of the points of hardening is it's also helpful for buggy
> > > > > > > > > devices. this flag defeats the purpose.
> > > > > > > >
> > > > > > > >
> > > > > > > > Do you mean:
> > > > > > > >
> > > > > > > > 1) we need something like config_enable? This seems not easy to be
> > > > > > > > implemented without obvious overhead, mainly the synchronize with the
> > > > > > > > interrupt handlers
> > > > > > >
> > > > > > > But synchronize is only on tear-down path. That is not critical for any
> > > > > > > users at the moment, even less than probe.
> > > > > >
> > > > > > I meant if we have vq->irq_pending, we need to call vring_interrupt()
> > > > > > in the virtio_device_ready() and synchronize the IRQ handlers with
> > > > > > spinlock or others.
> > > > > >
> > > > > > >
> > > > > > > > 2) enable this by default, so I don't object, but this may have some risk
> > > > > > > > for old hypervisors
> > > > > > >
> > > > > > >
> > > > > > > The risk if there's a driver adding buffers without setting DRIVER_OK.
> > > > > >
> > > > > > Probably not, we have devices that accept random inputs from outside,
> > > > > > net, console, input etc. I've done a round of audits of the Qemu
> > > > > > codes. They look all fine since day0.
> > > > > >
> > > > > > > So with this approach, how about we rename the flag "driver_ok"?
> > > > > > > And then add_buf can actually test it and BUG_ON if not there (at least
> > > > > > > in the debug build).
> > > > > >
> > > > > > This looks like a hardening of the driver in the core instead of the
> > > > > > device. I think it can be done but in a separate series.
> > > > > >
> > > > > > >
> > > > > > > And going down from there, how about we cache status in the
> > > > > > > device? Then we don't need to keep re-reading it every time,
> > > > > > > speeding boot up a tiny bit.
> > > > > >
> > > > > > I don't fully understand here, actually spec requires status to be
> > > > > > read back for validation in many cases.
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > > > > > > index 962f1477b1fa..0170f8c784d8 100644
> > > > > > > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > > > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > > > > > > @@ -2144,10 +2144,17 @@ static inline bool more_used(const struct vring_virtqueue *vq)
> > > > > > > > > > return vq->packed_ring ? more_used_packed(vq) : more_used_split(vq);
> > > > > > > > > > }
> > > > > > > > > > -irqreturn_t vring_interrupt(int irq, void *_vq)
> > > > > > > > > > +irqreturn_t vring_interrupt(int irq, void *v)
> > > > > > > > > > {
> > > > > > > > > > + struct virtqueue *_vq = v;
> > > > > > > > > > + struct virtio_device *vdev = _vq->vdev;
> > > > > > > > > > struct vring_virtqueue *vq = to_vvq(_vq);
> > > > > > > > > > + if (!virtio_irq_soft_enabled(vdev)) {
> > > > > > > > > > + dev_warn_once(&vdev->dev, "virtio vring IRQ raised before DRIVER_OK");
> > > > > > > > > > + return IRQ_NONE;
> > > > > > > > > > + }
> > > > > > > > > > +
> > > > > > > > > > if (!more_used(vq)) {
> > > > > > > > > > pr_debug("virtqueue interrupt with no work for %p\n", vq);
> > > > > > > > > > return IRQ_NONE;
> > > > > > > > > > diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> > > > > > > > > > index 5464f398912a..957d6ad604ac 100644
> > > > > > > > > > --- a/include/linux/virtio.h
> > > > > > > > > > +++ b/include/linux/virtio.h
> > > > > > > > > > @@ -95,6 +95,8 @@ dma_addr_t virtqueue_get_used_addr(struct virtqueue *vq);
> > > > > > > > > > * @failed: saved value for VIRTIO_CONFIG_S_FAILED bit (for restore)
> > > > > > > > > > * @config_enabled: configuration change reporting enabled
> > > > > > > > > > * @config_change_pending: configuration change reported while disabled
> > > > > > > > > > + * @irq_soft_check: whether or not to check @irq_soft_enabled
> > > > > > > > > > + * @irq_soft_enabled: callbacks enabled
> > > > > > > > > > * @config_lock: protects configuration change reporting
> > > > > > > > > > * @dev: underlying device.
> > > > > > > > > > * @id: the device type identification (used to match it with a driver).
> > > > > > > > > > @@ -109,6 +111,8 @@ struct virtio_device {
> > > > > > > > > > bool failed;
> > > > > > > > > > bool config_enabled;
> > > > > > > > > > bool config_change_pending;
> > > > > > > > > > + bool irq_soft_check;
> > > > > > > > > > + bool irq_soft_enabled;
> > > > > > > > > > spinlock_t config_lock;
> > > > > > > > > > spinlock_t vqs_list_lock; /* Protects VQs list access */
> > > > > > > > > > struct device dev;
> > > > > > > > > > diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> > > > > > > > > > index dafdc7f48c01..9c1b61f2e525 100644
> > > > > > > > > > --- a/include/linux/virtio_config.h
> > > > > > > > > > +++ b/include/linux/virtio_config.h
> > > > > > > > > > @@ -174,6 +174,24 @@ static inline bool virtio_has_feature(const struct virtio_device *vdev,
> > > > > > > > > > return __virtio_test_bit(vdev, fbit);
> > > > > > > > > > }
> > > > > > > > > > +/*
> > > > > > > > > > + * virtio_irq_soft_enabled: whether we can execute callbacks
> > > > > > > > > > + * @vdev: the device
> > > > > > > > > > + */
> > > > > > > > > > +static inline bool virtio_irq_soft_enabled(const struct virtio_device *vdev)
> > > > > > > > > > +{
> > > > > > > > > > + if (!vdev->irq_soft_check)
> > > > > > > > > > + return true;
> > > > > > > > > > +
> > > > > > > > > > + /*
> > > > > > > > > > + * Read irq_soft_enabled before reading other device specific
> > > > > > > > > > + * data. Paried with smp_store_relase() in
> > > > > > > > > paired
> > > > > > > >
> > > > > > > >
> > > > > > > > Will fix.
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > > + * virtio_device_ready() and WRITE_ONCE()/synchronize_rcu() in
> > > > > > > > > > + * virtio_reset_device().
> > > > > > > > > > + */
> > > > > > > > > > + return smp_load_acquire(&vdev->irq_soft_enabled);
> > > > > > > > > > +}
> > > > > > > > > > +
> > > > > > > > > > /**
> > > > > > > > > > * virtio_has_dma_quirk - determine whether this device has the DMA quirk
> > > > > > > > > > * @vdev: the device
> > > > > > > > > > @@ -236,6 +254,13 @@ void virtio_device_ready(struct virtio_device *dev)
> > > > > > > > > > if (dev->config->enable_cbs)
> > > > > > > > > > dev->config->enable_cbs(dev);
> > > > > > > > > > + /*
> > > > > > > > > > + * Commit the driver setup before enabling the virtqueue
> > > > > > > > > > + * callbacks. Paried with smp_load_acuqire() in
> > > > > > > > > > + * virtio_irq_soft_enabled()
> > > > > > > > > > + */
> > > > > > > > > > + smp_store_release(&dev->irq_soft_enabled, true);
> > > > > > > > > > +
> > > > > > > > > > BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > > > > > dev->config->set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > > > > > }
> > > > > > > > > > --
> > > > > > > > > > 2.25.1
> > > > > > >
> > > > >
> > >
>

2022-03-28 22:26:28

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re:

On Mon, Mar 28, 2022 at 02:18:22PM +0800, Jason Wang wrote:
> On Mon, Mar 28, 2022 at 1:59 PM Michael S. Tsirkin <[email protected]> wrote:
> >
> > On Mon, Mar 28, 2022 at 12:56:41PM +0800, Jason Wang wrote:
> > > On Fri, Mar 25, 2022 at 6:10 PM Michael S. Tsirkin <[email protected]> wrote:
> > > >
> > > > On Fri, Mar 25, 2022 at 05:20:19PM +0800, Jason Wang wrote:
> > > > > On Fri, Mar 25, 2022 at 5:10 PM Michael S. Tsirkin <[email protected]> wrote:
> > > > > >
> > > > > > On Fri, Mar 25, 2022 at 03:52:00PM +0800, Jason Wang wrote:
> > > > > > > On Fri, Mar 25, 2022 at 2:31 PM Michael S. Tsirkin <[email protected]> wrote:
> > > > > > > >
> > > > > > > > Bcc:
> > > > > > > > Subject: Re: [PATCH 3/3] virtio: harden vring IRQ
> > > > > > > > Message-ID: <[email protected]>
> > > > > > > > Reply-To:
> > > > > > > > In-Reply-To: <[email protected]>
> > > > > > > >
> > > > > > > > On Fri, Mar 25, 2022 at 11:04:08AM +0800, Jason Wang wrote:
> > > > > > > > >
> > > > > > > > > 在 2022/3/24 下午7:03, Michael S. Tsirkin 写道:
> > > > > > > > > > On Thu, Mar 24, 2022 at 04:40:04PM +0800, Jason Wang wrote:
> > > > > > > > > > > This is a rework on the previous IRQ hardening that is done for
> > > > > > > > > > > virtio-pci where several drawbacks were found and were reverted:
> > > > > > > > > > >
> > > > > > > > > > > 1) try to use IRQF_NO_AUTOEN which is not friendly to affinity managed IRQ
> > > > > > > > > > > that is used by some device such as virtio-blk
> > > > > > > > > > > 2) done only for PCI transport
> > > > > > > > > > >
> > > > > > > > > > > In this patch, we tries to borrow the idea from the INTX IRQ hardening
> > > > > > > > > > > in the reverted commit 080cd7c3ac87 ("virtio-pci: harden INTX interrupts")
> > > > > > > > > > > by introducing a global irq_soft_enabled variable for each
> > > > > > > > > > > virtio_device. Then we can to toggle it during
> > > > > > > > > > > virtio_reset_device()/virtio_device_ready(). A synchornize_rcu() is
> > > > > > > > > > > used in virtio_reset_device() to synchronize with the IRQ handlers. In
> > > > > > > > > > > the future, we may provide config_ops for the transport that doesn't
> > > > > > > > > > > use IRQ. With this, vring_interrupt() can return check and early if
> > > > > > > > > > > irq_soft_enabled is false. This lead to smp_load_acquire() to be used
> > > > > > > > > > > but the cost should be acceptable.
> > > > > > > > > > Maybe it should be but is it? Can't we use synchronize_irq instead?
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Even if we allow the transport driver to synchornize through
> > > > > > > > > synchronize_irq() we still need a check in the vring_interrupt().
> > > > > > > > >
> > > > > > > > > We do something like the following previously:
> > > > > > > > >
> > > > > > > > > if (!READ_ONCE(vp_dev->intx_soft_enabled))
> > > > > > > > > return IRQ_NONE;
> > > > > > > > >
> > > > > > > > > But it looks like a bug since speculative read can be done before the check
> > > > > > > > > where the interrupt handler can't see the uncommitted setup which is done by
> > > > > > > > > the driver.
> > > > > > > >
> > > > > > > > I don't think so - if you sync after setting the value then
> > > > > > > > you are guaranteed that any handler running afterwards
> > > > > > > > will see the new value.
> > > > > > >
> > > > > > > The problem is not disabled but the enable.
> > > > > >
> > > > > > So a misbehaving device can lose interrupts? That's not a problem at all
> > > > > > imo.
> > > > >
> > > > > It's the interrupt raised before setting irq_soft_enabled to true:
> > > > >
> > > > > CPU 0 probe) driver specific setup (not commited)
> > > > > CPU 1 IRQ handler) read the uninitialized variable
> > > > > CPU 0 probe) set irq_soft_enabled to true
> > > > > CPU 1 IRQ handler) read irq_soft_enable as true
> > > > > CPU 1 IRQ handler) use the uninitialized variable
> > > > >
> > > > > Thanks
> > > >
> > > > Yea, it hurts if you do it. So do not do it then ;).
> > > >
> > > > irq_soft_enabled (I think driver_ok or status is a better name)
> > >
> > > I can change it to driver_ok.
> > >
> > > > should be initialized to false *before* irq is requested.
> > > >
> > > > And requesting irq commits all memory otherwise all drivers would be
> > > > broken,
> > >
> > > So I think we might talk different issues:
> > >
> > > 1) Whether request_irq() commits the previous setups, I think the
> > > answer is yes, since the spin_unlock of desc->lock (release) can
> > > guarantee this though there seems no documentation around
> > > request_irq() to say this.
> > >
> > > And I can see at least drivers/video/fbdev/omap2/omapfb/dss/dispc.c is
> > > using smp_wmb() before the request_irq().
> > >
> > > And even if write is ordered we still need read to be ordered to be
> > > paired with that.

IMO it synchronizes with the CPU to which irq is
delivered. Otherwise basically all drivers would be broken,
wouldn't they be?
I don't know whether it's correct on all platforms, but if not
we need to fix request_irq.

> > >
> > > > if it doesn't it just needs to be fixed, not worked around in
> > > > virtio.
> > >
> > > 2) virtio drivers might do a lot of setups between request_irq() and
> > > virtio_device_ready():
> > >
> > > request_irq()
> > > driver specific setups
> > > virtio_device_ready()
> > >
> > > CPU 0 probe) request_irq()
> > > CPU 1 IRQ handler) read the uninitialized variable
> > > CPU 0 probe) driver specific setups
> > > CPU 0 probe) smp_store_release(intr_soft_enabled, true), commit the setups
> > > CPU 1 IRQ handler) read irq_soft_enable as true
> > > CPU 1 IRQ handler) use the uninitialized variable
> > >
> > > Thanks
> >
> >
> > As I said, virtio_device_ready needs to do synchronize_irq.
> > That will guarantee all setup is visible to the specific IRQ,
>
> Only the interrupt after synchronize_irq() returns.

Anything else is a buggy device though.

> >this
> > is what it's point is.
>
> What happens if an interrupt is raised in the middle like:
>
> smp_store_release(dev->irq_soft_enabled, true)
> IRQ handler
> synchornize_irq()
>
> If we don't enforce a reading order, the IRQ handler may still see the
> uninitialized variable.
>
> Thanks

IMHO variables should be initialized before request_irq
to a value meaning "not a valid interrupt".
Specifically driver_ok = false.
Handler in the scenario you describe will then see !driver_ok
and exit immediately.


> >
> >
> > > >
> > > >
> > > > > >
> > > > > > > We use smp_store_relase()
> > > > > > > to make sure the driver commits the setup before enabling the irq. It
> > > > > > > means the read needs to be ordered as well in vring_interrupt().
> > > > > > >
> > > > > > > >
> > > > > > > > Although I couldn't find anything about this in memory-barriers.txt
> > > > > > > > which surprises me.
> > > > > > > >
> > > > > > > > CC Paul to help make sure I'm right.
> > > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > To avoid breaking legacy device which can send IRQ before DRIVER_OK, a
> > > > > > > > > > > module parameter is introduced to enable the hardening so function
> > > > > > > > > > > hardening is disabled by default.
> > > > > > > > > > Which devices are these? How come they send an interrupt before there
> > > > > > > > > > are any buffers in any queues?
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > I copied this from the commit log for 22b7050a024d7
> > > > > > > > >
> > > > > > > > > "
> > > > > > > > >
> > > > > > > > > This change will also benefit old hypervisors (before 2009)
> > > > > > > > > that send interrupts without checking DRIVER_OK: previously,
> > > > > > > > > the callback could race with driver-specific initialization.
> > > > > > > > > "
> > > > > > > > >
> > > > > > > > > If this is only for config interrupt, I can remove the above log.
> > > > > > > >
> > > > > > > >
> > > > > > > > This is only for config interrupt.
> > > > > > >
> > > > > > > Ok.
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > Note that the hardening is only done for vring interrupt since the
> > > > > > > > > > > config interrupt hardening is already done in commit 22b7050a024d7
> > > > > > > > > > > ("virtio: defer config changed notifications"). But the method that is
> > > > > > > > > > > used by config interrupt can't be reused by the vring interrupt
> > > > > > > > > > > handler because it uses spinlock to do the synchronization which is
> > > > > > > > > > > expensive.
> > > > > > > > > > >
> > > > > > > > > > > Signed-off-by: Jason Wang <[email protected]>
> > > > > > > > > >
> > > > > > > > > > > ---
> > > > > > > > > > > drivers/virtio/virtio.c | 19 +++++++++++++++++++
> > > > > > > > > > > drivers/virtio/virtio_ring.c | 9 ++++++++-
> > > > > > > > > > > include/linux/virtio.h | 4 ++++
> > > > > > > > > > > include/linux/virtio_config.h | 25 +++++++++++++++++++++++++
> > > > > > > > > > > 4 files changed, 56 insertions(+), 1 deletion(-)
> > > > > > > > > > >
> > > > > > > > > > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > > > > > > > > > > index 8dde44ea044a..85e331efa9cc 100644
> > > > > > > > > > > --- a/drivers/virtio/virtio.c
> > > > > > > > > > > +++ b/drivers/virtio/virtio.c
> > > > > > > > > > > @@ -7,6 +7,12 @@
> > > > > > > > > > > #include <linux/of.h>
> > > > > > > > > > > #include <uapi/linux/virtio_ids.h>
> > > > > > > > > > > +static bool irq_hardening = false;
> > > > > > > > > > > +
> > > > > > > > > > > +module_param(irq_hardening, bool, 0444);
> > > > > > > > > > > +MODULE_PARM_DESC(irq_hardening,
> > > > > > > > > > > + "Disalbe IRQ software processing when it is not expected");
> > > > > > > > > > > +
> > > > > > > > > > > /* Unique numbering for virtio devices. */
> > > > > > > > > > > static DEFINE_IDA(virtio_index_ida);
> > > > > > > > > > > @@ -220,6 +226,15 @@ static int virtio_features_ok(struct virtio_device *dev)
> > > > > > > > > > > * */
> > > > > > > > > > > void virtio_reset_device(struct virtio_device *dev)
> > > > > > > > > > > {
> > > > > > > > > > > + /*
> > > > > > > > > > > + * The below synchronize_rcu() guarantees that any
> > > > > > > > > > > + * interrupt for this line arriving after
> > > > > > > > > > > + * synchronize_rcu() has completed is guaranteed to see
> > > > > > > > > > > + * irq_soft_enabled == false.
> > > > > > > > > > News to me I did not know synchronize_rcu has anything to do
> > > > > > > > > > with interrupts. Did not you intend to use synchronize_irq?
> > > > > > > > > > I am not even 100% sure synchronize_rcu is by design a memory barrier
> > > > > > > > > > though it's most likely is ...
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > According to the comment above tree RCU version of synchronize_rcu():
> > > > > > > > >
> > > > > > > > > """
> > > > > > > > >
> > > > > > > > > * RCU read-side critical sections are delimited by rcu_read_lock()
> > > > > > > > > * and rcu_read_unlock(), and may be nested. In addition, but only in
> > > > > > > > > * v5.0 and later, regions of code across which interrupts, preemption,
> > > > > > > > > * or softirqs have been disabled also serve as RCU read-side critical
> > > > > > > > > * sections. This includes hardware interrupt handlers, softirq handlers,
> > > > > > > > > * and NMI handlers.
> > > > > > > > > """
> > > > > > > > >
> > > > > > > > > So interrupt handlers are treated as read-side critical sections.
> > > > > > > > >
> > > > > > > > > And it has the comment for explain the barrier:
> > > > > > > > >
> > > > > > > > > """
> > > > > > > > >
> > > > > > > > > * Note that this guarantee implies further memory-ordering guarantees.
> > > > > > > > > * On systems with more than one CPU, when synchronize_rcu() returns,
> > > > > > > > > * each CPU is guaranteed to have executed a full memory barrier since
> > > > > > > > > * the end of its last RCU read-side critical section whose beginning
> > > > > > > > > * preceded the call to synchronize_rcu(). In addition, each CPU having
> > > > > > > > > """
> > > > > > > > >
> > > > > > > > > So on SMP it provides a full barrier. And for UP/tiny RCU we don't need the
> > > > > > > > > barrier, if the interrupt come after WRITE_ONCE() it will see the
> > > > > > > > > irq_soft_enabled as false.
> > > > > > > > >
> > > > > > > >
> > > > > > > > You are right. So then
> > > > > > > > 1. I do not think we need load_acquire - why is it needed? Just
> > > > > > > > READ_ONCE should do.
> > > > > > >
> > > > > > > See above.
> > > > > > >
> > > > > > > > 2. isn't synchronize_irq also doing the same thing?
> > > > > > >
> > > > > > >
> > > > > > > Yes, but it requires a config ops since the IRQ knowledge is transport specific.
> > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > + */
> > > > > > > > > > > + WRITE_ONCE(dev->irq_soft_enabled, false);
> > > > > > > > > > > + synchronize_rcu();
> > > > > > > > > > > +
> > > > > > > > > > > dev->config->reset(dev);
> > > > > > > > > > > }
> > > > > > > > > > > EXPORT_SYMBOL_GPL(virtio_reset_device);
> > > > > > > > > > Please add comment explaining where it will be enabled.
> > > > > > > > > > Also, we *really* don't need to synch if it was already disabled,
> > > > > > > > > > let's not add useless overhead to the boot sequence.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Ok.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > @@ -427,6 +442,10 @@ int register_virtio_device(struct virtio_device *dev)
> > > > > > > > > > > spin_lock_init(&dev->config_lock);
> > > > > > > > > > > dev->config_enabled = false;
> > > > > > > > > > > dev->config_change_pending = false;
> > > > > > > > > > > + dev->irq_soft_check = irq_hardening;
> > > > > > > > > > > +
> > > > > > > > > > > + if (dev->irq_soft_check)
> > > > > > > > > > > + dev_info(&dev->dev, "IRQ hardening is enabled\n");
> > > > > > > > > > > /* We always start by resetting the device, in case a previous
> > > > > > > > > > > * driver messed it up. This also tests that code path a little. */
> > > > > > > > > > one of the points of hardening is it's also helpful for buggy
> > > > > > > > > > devices. this flag defeats the purpose.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Do you mean:
> > > > > > > > >
> > > > > > > > > 1) we need something like config_enable? This seems not easy to be
> > > > > > > > > implemented without obvious overhead, mainly the synchronize with the
> > > > > > > > > interrupt handlers
> > > > > > > >
> > > > > > > > But synchronize is only on tear-down path. That is not critical for any
> > > > > > > > users at the moment, even less than probe.
> > > > > > >
> > > > > > > I meant if we have vq->irq_pending, we need to call vring_interrupt()
> > > > > > > in the virtio_device_ready() and synchronize the IRQ handlers with
> > > > > > > spinlock or others.
> > > > > > >
> > > > > > > >
> > > > > > > > > 2) enable this by default, so I don't object, but this may have some risk
> > > > > > > > > for old hypervisors
> > > > > > > >
> > > > > > > >
> > > > > > > > The risk if there's a driver adding buffers without setting DRIVER_OK.
> > > > > > >
> > > > > > > Probably not, we have devices that accept random inputs from outside,
> > > > > > > net, console, input etc. I've done a round of audits of the Qemu
> > > > > > > codes. They look all fine since day0.
> > > > > > >
> > > > > > > > So with this approach, how about we rename the flag "driver_ok"?
> > > > > > > > And then add_buf can actually test it and BUG_ON if not there (at least
> > > > > > > > in the debug build).
> > > > > > >
> > > > > > > This looks like a hardening of the driver in the core instead of the
> > > > > > > device. I think it can be done but in a separate series.
> > > > > > >
> > > > > > > >
> > > > > > > > And going down from there, how about we cache status in the
> > > > > > > > device? Then we don't need to keep re-reading it every time,
> > > > > > > > speeding boot up a tiny bit.
> > > > > > >
> > > > > > > I don't fully understand here, actually spec requires status to be
> > > > > > > read back for validation in many cases.
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > > > > > > > index 962f1477b1fa..0170f8c784d8 100644
> > > > > > > > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > > > > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > > > > > > > @@ -2144,10 +2144,17 @@ static inline bool more_used(const struct vring_virtqueue *vq)
> > > > > > > > > > > return vq->packed_ring ? more_used_packed(vq) : more_used_split(vq);
> > > > > > > > > > > }
> > > > > > > > > > > -irqreturn_t vring_interrupt(int irq, void *_vq)
> > > > > > > > > > > +irqreturn_t vring_interrupt(int irq, void *v)
> > > > > > > > > > > {
> > > > > > > > > > > + struct virtqueue *_vq = v;
> > > > > > > > > > > + struct virtio_device *vdev = _vq->vdev;
> > > > > > > > > > > struct vring_virtqueue *vq = to_vvq(_vq);
> > > > > > > > > > > + if (!virtio_irq_soft_enabled(vdev)) {
> > > > > > > > > > > + dev_warn_once(&vdev->dev, "virtio vring IRQ raised before DRIVER_OK");
> > > > > > > > > > > + return IRQ_NONE;
> > > > > > > > > > > + }
> > > > > > > > > > > +
> > > > > > > > > > > if (!more_used(vq)) {
> > > > > > > > > > > pr_debug("virtqueue interrupt with no work for %p\n", vq);
> > > > > > > > > > > return IRQ_NONE;
> > > > > > > > > > > diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> > > > > > > > > > > index 5464f398912a..957d6ad604ac 100644
> > > > > > > > > > > --- a/include/linux/virtio.h
> > > > > > > > > > > +++ b/include/linux/virtio.h
> > > > > > > > > > > @@ -95,6 +95,8 @@ dma_addr_t virtqueue_get_used_addr(struct virtqueue *vq);
> > > > > > > > > > > * @failed: saved value for VIRTIO_CONFIG_S_FAILED bit (for restore)
> > > > > > > > > > > * @config_enabled: configuration change reporting enabled
> > > > > > > > > > > * @config_change_pending: configuration change reported while disabled
> > > > > > > > > > > + * @irq_soft_check: whether or not to check @irq_soft_enabled
> > > > > > > > > > > + * @irq_soft_enabled: callbacks enabled
> > > > > > > > > > > * @config_lock: protects configuration change reporting
> > > > > > > > > > > * @dev: underlying device.
> > > > > > > > > > > * @id: the device type identification (used to match it with a driver).
> > > > > > > > > > > @@ -109,6 +111,8 @@ struct virtio_device {
> > > > > > > > > > > bool failed;
> > > > > > > > > > > bool config_enabled;
> > > > > > > > > > > bool config_change_pending;
> > > > > > > > > > > + bool irq_soft_check;
> > > > > > > > > > > + bool irq_soft_enabled;
> > > > > > > > > > > spinlock_t config_lock;
> > > > > > > > > > > spinlock_t vqs_list_lock; /* Protects VQs list access */
> > > > > > > > > > > struct device dev;
> > > > > > > > > > > diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> > > > > > > > > > > index dafdc7f48c01..9c1b61f2e525 100644
> > > > > > > > > > > --- a/include/linux/virtio_config.h
> > > > > > > > > > > +++ b/include/linux/virtio_config.h
> > > > > > > > > > > @@ -174,6 +174,24 @@ static inline bool virtio_has_feature(const struct virtio_device *vdev,
> > > > > > > > > > > return __virtio_test_bit(vdev, fbit);
> > > > > > > > > > > }
> > > > > > > > > > > +/*
> > > > > > > > > > > + * virtio_irq_soft_enabled: whether we can execute callbacks
> > > > > > > > > > > + * @vdev: the device
> > > > > > > > > > > + */
> > > > > > > > > > > +static inline bool virtio_irq_soft_enabled(const struct virtio_device *vdev)
> > > > > > > > > > > +{
> > > > > > > > > > > + if (!vdev->irq_soft_check)
> > > > > > > > > > > + return true;
> > > > > > > > > > > +
> > > > > > > > > > > + /*
> > > > > > > > > > > + * Read irq_soft_enabled before reading other device specific
> > > > > > > > > > > + * data. Paried with smp_store_relase() in
> > > > > > > > > > paired
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Will fix.
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > + * virtio_device_ready() and WRITE_ONCE()/synchronize_rcu() in
> > > > > > > > > > > + * virtio_reset_device().
> > > > > > > > > > > + */
> > > > > > > > > > > + return smp_load_acquire(&vdev->irq_soft_enabled);
> > > > > > > > > > > +}
> > > > > > > > > > > +
> > > > > > > > > > > /**
> > > > > > > > > > > * virtio_has_dma_quirk - determine whether this device has the DMA quirk
> > > > > > > > > > > * @vdev: the device
> > > > > > > > > > > @@ -236,6 +254,13 @@ void virtio_device_ready(struct virtio_device *dev)
> > > > > > > > > > > if (dev->config->enable_cbs)
> > > > > > > > > > > dev->config->enable_cbs(dev);
> > > > > > > > > > > + /*
> > > > > > > > > > > + * Commit the driver setup before enabling the virtqueue
> > > > > > > > > > > + * callbacks. Paried with smp_load_acuqire() in
> > > > > > > > > > > + * virtio_irq_soft_enabled()
> > > > > > > > > > > + */
> > > > > > > > > > > + smp_store_release(&dev->irq_soft_enabled, true);
> > > > > > > > > > > +
> > > > > > > > > > > BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > > > > > > dev->config->set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > > > > > > }
> > > > > > > > > > > --
> > > > > > > > > > > 2.25.1
> > > > > > > >
> > > > > >
> > > >
> >

2022-03-29 08:01:33

by Jason Wang

[permalink] [raw]
Subject: Re:

On Mon, Mar 28, 2022 at 6:41 PM Michael S. Tsirkin <[email protected]> wrote:
>
> On Mon, Mar 28, 2022 at 02:18:22PM +0800, Jason Wang wrote:
> > On Mon, Mar 28, 2022 at 1:59 PM Michael S. Tsirkin <[email protected]> wrote:
> > >
> > > On Mon, Mar 28, 2022 at 12:56:41PM +0800, Jason Wang wrote:
> > > > On Fri, Mar 25, 2022 at 6:10 PM Michael S. Tsirkin <[email protected]> wrote:
> > > > >
> > > > > On Fri, Mar 25, 2022 at 05:20:19PM +0800, Jason Wang wrote:
> > > > > > On Fri, Mar 25, 2022 at 5:10 PM Michael S. Tsirkin <[email protected]> wrote:
> > > > > > >
> > > > > > > On Fri, Mar 25, 2022 at 03:52:00PM +0800, Jason Wang wrote:
> > > > > > > > On Fri, Mar 25, 2022 at 2:31 PM Michael S. Tsirkin <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > Bcc:
> > > > > > > > > Subject: Re: [PATCH 3/3] virtio: harden vring IRQ
> > > > > > > > > Message-ID: <[email protected]>
> > > > > > > > > Reply-To:
> > > > > > > > > In-Reply-To: <[email protected]>
> > > > > > > > >
> > > > > > > > > On Fri, Mar 25, 2022 at 11:04:08AM +0800, Jason Wang wrote:
> > > > > > > > > >
> > > > > > > > > > 在 2022/3/24 下午7:03, Michael S. Tsirkin 写道:
> > > > > > > > > > > On Thu, Mar 24, 2022 at 04:40:04PM +0800, Jason Wang wrote:
> > > > > > > > > > > > This is a rework on the previous IRQ hardening that is done for
> > > > > > > > > > > > virtio-pci where several drawbacks were found and were reverted:
> > > > > > > > > > > >
> > > > > > > > > > > > 1) try to use IRQF_NO_AUTOEN which is not friendly to affinity managed IRQ
> > > > > > > > > > > > that is used by some device such as virtio-blk
> > > > > > > > > > > > 2) done only for PCI transport
> > > > > > > > > > > >
> > > > > > > > > > > > In this patch, we tries to borrow the idea from the INTX IRQ hardening
> > > > > > > > > > > > in the reverted commit 080cd7c3ac87 ("virtio-pci: harden INTX interrupts")
> > > > > > > > > > > > by introducing a global irq_soft_enabled variable for each
> > > > > > > > > > > > virtio_device. Then we can to toggle it during
> > > > > > > > > > > > virtio_reset_device()/virtio_device_ready(). A synchornize_rcu() is
> > > > > > > > > > > > used in virtio_reset_device() to synchronize with the IRQ handlers. In
> > > > > > > > > > > > the future, we may provide config_ops for the transport that doesn't
> > > > > > > > > > > > use IRQ. With this, vring_interrupt() can return check and early if
> > > > > > > > > > > > irq_soft_enabled is false. This lead to smp_load_acquire() to be used
> > > > > > > > > > > > but the cost should be acceptable.
> > > > > > > > > > > Maybe it should be but is it? Can't we use synchronize_irq instead?
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Even if we allow the transport driver to synchornize through
> > > > > > > > > > synchronize_irq() we still need a check in the vring_interrupt().
> > > > > > > > > >
> > > > > > > > > > We do something like the following previously:
> > > > > > > > > >
> > > > > > > > > > if (!READ_ONCE(vp_dev->intx_soft_enabled))
> > > > > > > > > > return IRQ_NONE;
> > > > > > > > > >
> > > > > > > > > > But it looks like a bug since speculative read can be done before the check
> > > > > > > > > > where the interrupt handler can't see the uncommitted setup which is done by
> > > > > > > > > > the driver.
> > > > > > > > >
> > > > > > > > > I don't think so - if you sync after setting the value then
> > > > > > > > > you are guaranteed that any handler running afterwards
> > > > > > > > > will see the new value.
> > > > > > > >
> > > > > > > > The problem is not disabled but the enable.
> > > > > > >
> > > > > > > So a misbehaving device can lose interrupts? That's not a problem at all
> > > > > > > imo.
> > > > > >
> > > > > > It's the interrupt raised before setting irq_soft_enabled to true:
> > > > > >
> > > > > > CPU 0 probe) driver specific setup (not commited)
> > > > > > CPU 1 IRQ handler) read the uninitialized variable
> > > > > > CPU 0 probe) set irq_soft_enabled to true
> > > > > > CPU 1 IRQ handler) read irq_soft_enable as true
> > > > > > CPU 1 IRQ handler) use the uninitialized variable
> > > > > >
> > > > > > Thanks
> > > > >
> > > > > Yea, it hurts if you do it. So do not do it then ;).
> > > > >
> > > > > irq_soft_enabled (I think driver_ok or status is a better name)
> > > >
> > > > I can change it to driver_ok.
> > > >
> > > > > should be initialized to false *before* irq is requested.
> > > > >
> > > > > And requesting irq commits all memory otherwise all drivers would be
> > > > > broken,
> > > >
> > > > So I think we might talk different issues:
> > > >
> > > > 1) Whether request_irq() commits the previous setups, I think the
> > > > answer is yes, since the spin_unlock of desc->lock (release) can
> > > > guarantee this though there seems no documentation around
> > > > request_irq() to say this.
> > > >
> > > > And I can see at least drivers/video/fbdev/omap2/omapfb/dss/dispc.c is
> > > > using smp_wmb() before the request_irq().
> > > >
> > > > And even if write is ordered we still need read to be ordered to be
> > > > paired with that.
>
> IMO it synchronizes with the CPU to which irq is
> delivered. Otherwise basically all drivers would be broken,
> wouldn't they be?

I guess it's because most of the drivers don't care much about the
buggy/malicious device. And most of the devices may require an extra
step to enable device IRQ after request_irq(). Or it's the charge of
the driver to do the synchronization.

> I don't know whether it's correct on all platforms, but if not
> we need to fix request_irq.
>
> > > >
> > > > > if it doesn't it just needs to be fixed, not worked around in
> > > > > virtio.
> > > >
> > > > 2) virtio drivers might do a lot of setups between request_irq() and
> > > > virtio_device_ready():
> > > >
> > > > request_irq()
> > > > driver specific setups
> > > > virtio_device_ready()
> > > >
> > > > CPU 0 probe) request_irq()
> > > > CPU 1 IRQ handler) read the uninitialized variable
> > > > CPU 0 probe) driver specific setups
> > > > CPU 0 probe) smp_store_release(intr_soft_enabled, true), commit the setups
> > > > CPU 1 IRQ handler) read irq_soft_enable as true
> > > > CPU 1 IRQ handler) use the uninitialized variable
> > > >
> > > > Thanks
> > >
> > >
> > > As I said, virtio_device_ready needs to do synchronize_irq.
> > > That will guarantee all setup is visible to the specific IRQ,
> >
> > Only the interrupt after synchronize_irq() returns.
>
> Anything else is a buggy device though.

Yes, but the goal of this patch is to prevent the possible attack from
buggy(malicious) devices.

>
> > >this
> > > is what it's point is.
> >
> > What happens if an interrupt is raised in the middle like:
> >
> > smp_store_release(dev->irq_soft_enabled, true)
> > IRQ handler
> > synchornize_irq()
> >
> > If we don't enforce a reading order, the IRQ handler may still see the
> > uninitialized variable.
> >
> > Thanks
>
> IMHO variables should be initialized before request_irq
> to a value meaning "not a valid interrupt".
> Specifically driver_ok = false.
> Handler in the scenario you describe will then see !driver_ok
> and exit immediately.

So just to make sure we're on the same page.

1) virtio_reset_device() will set the driver_ok to false;
2) virtio_device_ready() will set the driver_ok to true

So for virtio drivers, it often did:

1) virtio_reset_device()
2) find_vqs() which will call request_irq()
3) other driver specific setups
4) virtio_device_ready()

In virtio_device_ready(), the patch perform the following currently:

smp_store_release(driver_ok, true);
set_status(DRIVER_OK);

Per your suggestion, to add synchronize_irq() after
smp_store_release() so we had

smp_store_release(driver_ok, true);
synchornize_irq()
set_status(DRIVER_OK)

Suppose there's a interrupt raised before the synchronize_irq(), if we do:

if (READ_ONCE(driver_ok)) {
vq->callback()
}

It will see the driver_ok as true but how can we make sure
vq->callback sees the driver specific setups (3) above?

And an example is virtio_scsi():

virtio_reset_device()
virtscsi_probe()
virtscsi_init()
virtio_find_vqs()
...
virtscsi_init_vq(&vscsi->event_vq, vqs[1])
....
virtio_device_ready()

In virtscsi_event_done():

virtscsi_event_done():
virtscsi_vq_done(vscsi, &vscsi->event_vq, ...);

We need to make sure the even_done reads driver_ok before read vscsi->event_vq.

Thanks

>
>
> > >
> > >
> > > > >
> > > > >
> > > > > > >
> > > > > > > > We use smp_store_relase()
> > > > > > > > to make sure the driver commits the setup before enabling the irq. It
> > > > > > > > means the read needs to be ordered as well in vring_interrupt().
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Although I couldn't find anything about this in memory-barriers.txt
> > > > > > > > > which surprises me.
> > > > > > > > >
> > > > > > > > > CC Paul to help make sure I'm right.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > To avoid breaking legacy device which can send IRQ before DRIVER_OK, a
> > > > > > > > > > > > module parameter is introduced to enable the hardening so function
> > > > > > > > > > > > hardening is disabled by default.
> > > > > > > > > > > Which devices are these? How come they send an interrupt before there
> > > > > > > > > > > are any buffers in any queues?
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > I copied this from the commit log for 22b7050a024d7
> > > > > > > > > >
> > > > > > > > > > "
> > > > > > > > > >
> > > > > > > > > > This change will also benefit old hypervisors (before 2009)
> > > > > > > > > > that send interrupts without checking DRIVER_OK: previously,
> > > > > > > > > > the callback could race with driver-specific initialization.
> > > > > > > > > > "
> > > > > > > > > >
> > > > > > > > > > If this is only for config interrupt, I can remove the above log.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > This is only for config interrupt.
> > > > > > > >
> > > > > > > > Ok.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > Note that the hardening is only done for vring interrupt since the
> > > > > > > > > > > > config interrupt hardening is already done in commit 22b7050a024d7
> > > > > > > > > > > > ("virtio: defer config changed notifications"). But the method that is
> > > > > > > > > > > > used by config interrupt can't be reused by the vring interrupt
> > > > > > > > > > > > handler because it uses spinlock to do the synchronization which is
> > > > > > > > > > > > expensive.
> > > > > > > > > > > >
> > > > > > > > > > > > Signed-off-by: Jason Wang <[email protected]>
> > > > > > > > > > >
> > > > > > > > > > > > ---
> > > > > > > > > > > > drivers/virtio/virtio.c | 19 +++++++++++++++++++
> > > > > > > > > > > > drivers/virtio/virtio_ring.c | 9 ++++++++-
> > > > > > > > > > > > include/linux/virtio.h | 4 ++++
> > > > > > > > > > > > include/linux/virtio_config.h | 25 +++++++++++++++++++++++++
> > > > > > > > > > > > 4 files changed, 56 insertions(+), 1 deletion(-)
> > > > > > > > > > > >
> > > > > > > > > > > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > > > > > > > > > > > index 8dde44ea044a..85e331efa9cc 100644
> > > > > > > > > > > > --- a/drivers/virtio/virtio.c
> > > > > > > > > > > > +++ b/drivers/virtio/virtio.c
> > > > > > > > > > > > @@ -7,6 +7,12 @@
> > > > > > > > > > > > #include <linux/of.h>
> > > > > > > > > > > > #include <uapi/linux/virtio_ids.h>
> > > > > > > > > > > > +static bool irq_hardening = false;
> > > > > > > > > > > > +
> > > > > > > > > > > > +module_param(irq_hardening, bool, 0444);
> > > > > > > > > > > > +MODULE_PARM_DESC(irq_hardening,
> > > > > > > > > > > > + "Disalbe IRQ software processing when it is not expected");
> > > > > > > > > > > > +
> > > > > > > > > > > > /* Unique numbering for virtio devices. */
> > > > > > > > > > > > static DEFINE_IDA(virtio_index_ida);
> > > > > > > > > > > > @@ -220,6 +226,15 @@ static int virtio_features_ok(struct virtio_device *dev)
> > > > > > > > > > > > * */
> > > > > > > > > > > > void virtio_reset_device(struct virtio_device *dev)
> > > > > > > > > > > > {
> > > > > > > > > > > > + /*
> > > > > > > > > > > > + * The below synchronize_rcu() guarantees that any
> > > > > > > > > > > > + * interrupt for this line arriving after
> > > > > > > > > > > > + * synchronize_rcu() has completed is guaranteed to see
> > > > > > > > > > > > + * irq_soft_enabled == false.
> > > > > > > > > > > News to me I did not know synchronize_rcu has anything to do
> > > > > > > > > > > with interrupts. Did not you intend to use synchronize_irq?
> > > > > > > > > > > I am not even 100% sure synchronize_rcu is by design a memory barrier
> > > > > > > > > > > though it's most likely is ...
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > According to the comment above tree RCU version of synchronize_rcu():
> > > > > > > > > >
> > > > > > > > > > """
> > > > > > > > > >
> > > > > > > > > > * RCU read-side critical sections are delimited by rcu_read_lock()
> > > > > > > > > > * and rcu_read_unlock(), and may be nested. In addition, but only in
> > > > > > > > > > * v5.0 and later, regions of code across which interrupts, preemption,
> > > > > > > > > > * or softirqs have been disabled also serve as RCU read-side critical
> > > > > > > > > > * sections. This includes hardware interrupt handlers, softirq handlers,
> > > > > > > > > > * and NMI handlers.
> > > > > > > > > > """
> > > > > > > > > >
> > > > > > > > > > So interrupt handlers are treated as read-side critical sections.
> > > > > > > > > >
> > > > > > > > > > And it has the comment for explain the barrier:
> > > > > > > > > >
> > > > > > > > > > """
> > > > > > > > > >
> > > > > > > > > > * Note that this guarantee implies further memory-ordering guarantees.
> > > > > > > > > > * On systems with more than one CPU, when synchronize_rcu() returns,
> > > > > > > > > > * each CPU is guaranteed to have executed a full memory barrier since
> > > > > > > > > > * the end of its last RCU read-side critical section whose beginning
> > > > > > > > > > * preceded the call to synchronize_rcu(). In addition, each CPU having
> > > > > > > > > > """
> > > > > > > > > >
> > > > > > > > > > So on SMP it provides a full barrier. And for UP/tiny RCU we don't need the
> > > > > > > > > > barrier, if the interrupt come after WRITE_ONCE() it will see the
> > > > > > > > > > irq_soft_enabled as false.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > You are right. So then
> > > > > > > > > 1. I do not think we need load_acquire - why is it needed? Just
> > > > > > > > > READ_ONCE should do.
> > > > > > > >
> > > > > > > > See above.
> > > > > > > >
> > > > > > > > > 2. isn't synchronize_irq also doing the same thing?
> > > > > > > >
> > > > > > > >
> > > > > > > > Yes, but it requires a config ops since the IRQ knowledge is transport specific.
> > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > + */
> > > > > > > > > > > > + WRITE_ONCE(dev->irq_soft_enabled, false);
> > > > > > > > > > > > + synchronize_rcu();
> > > > > > > > > > > > +
> > > > > > > > > > > > dev->config->reset(dev);
> > > > > > > > > > > > }
> > > > > > > > > > > > EXPORT_SYMBOL_GPL(virtio_reset_device);
> > > > > > > > > > > Please add comment explaining where it will be enabled.
> > > > > > > > > > > Also, we *really* don't need to synch if it was already disabled,
> > > > > > > > > > > let's not add useless overhead to the boot sequence.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Ok.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > @@ -427,6 +442,10 @@ int register_virtio_device(struct virtio_device *dev)
> > > > > > > > > > > > spin_lock_init(&dev->config_lock);
> > > > > > > > > > > > dev->config_enabled = false;
> > > > > > > > > > > > dev->config_change_pending = false;
> > > > > > > > > > > > + dev->irq_soft_check = irq_hardening;
> > > > > > > > > > > > +
> > > > > > > > > > > > + if (dev->irq_soft_check)
> > > > > > > > > > > > + dev_info(&dev->dev, "IRQ hardening is enabled\n");
> > > > > > > > > > > > /* We always start by resetting the device, in case a previous
> > > > > > > > > > > > * driver messed it up. This also tests that code path a little. */
> > > > > > > > > > > one of the points of hardening is it's also helpful for buggy
> > > > > > > > > > > devices. this flag defeats the purpose.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Do you mean:
> > > > > > > > > >
> > > > > > > > > > 1) we need something like config_enable? This seems not easy to be
> > > > > > > > > > implemented without obvious overhead, mainly the synchronize with the
> > > > > > > > > > interrupt handlers
> > > > > > > > >
> > > > > > > > > But synchronize is only on tear-down path. That is not critical for any
> > > > > > > > > users at the moment, even less than probe.
> > > > > > > >
> > > > > > > > I meant if we have vq->irq_pending, we need to call vring_interrupt()
> > > > > > > > in the virtio_device_ready() and synchronize the IRQ handlers with
> > > > > > > > spinlock or others.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > > 2) enable this by default, so I don't object, but this may have some risk
> > > > > > > > > > for old hypervisors
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > The risk if there's a driver adding buffers without setting DRIVER_OK.
> > > > > > > >
> > > > > > > > Probably not, we have devices that accept random inputs from outside,
> > > > > > > > net, console, input etc. I've done a round of audits of the Qemu
> > > > > > > > codes. They look all fine since day0.
> > > > > > > >
> > > > > > > > > So with this approach, how about we rename the flag "driver_ok"?
> > > > > > > > > And then add_buf can actually test it and BUG_ON if not there (at least
> > > > > > > > > in the debug build).
> > > > > > > >
> > > > > > > > This looks like a hardening of the driver in the core instead of the
> > > > > > > > device. I think it can be done but in a separate series.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > And going down from there, how about we cache status in the
> > > > > > > > > device? Then we don't need to keep re-reading it every time,
> > > > > > > > > speeding boot up a tiny bit.
> > > > > > > >
> > > > > > > > I don't fully understand here, actually spec requires status to be
> > > > > > > > read back for validation in many cases.
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > > > > > > > > index 962f1477b1fa..0170f8c784d8 100644
> > > > > > > > > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > > > > > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > > > > > > > > @@ -2144,10 +2144,17 @@ static inline bool more_used(const struct vring_virtqueue *vq)
> > > > > > > > > > > > return vq->packed_ring ? more_used_packed(vq) : more_used_split(vq);
> > > > > > > > > > > > }
> > > > > > > > > > > > -irqreturn_t vring_interrupt(int irq, void *_vq)
> > > > > > > > > > > > +irqreturn_t vring_interrupt(int irq, void *v)
> > > > > > > > > > > > {
> > > > > > > > > > > > + struct virtqueue *_vq = v;
> > > > > > > > > > > > + struct virtio_device *vdev = _vq->vdev;
> > > > > > > > > > > > struct vring_virtqueue *vq = to_vvq(_vq);
> > > > > > > > > > > > + if (!virtio_irq_soft_enabled(vdev)) {
> > > > > > > > > > > > + dev_warn_once(&vdev->dev, "virtio vring IRQ raised before DRIVER_OK");
> > > > > > > > > > > > + return IRQ_NONE;
> > > > > > > > > > > > + }
> > > > > > > > > > > > +
> > > > > > > > > > > > if (!more_used(vq)) {
> > > > > > > > > > > > pr_debug("virtqueue interrupt with no work for %p\n", vq);
> > > > > > > > > > > > return IRQ_NONE;
> > > > > > > > > > > > diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> > > > > > > > > > > > index 5464f398912a..957d6ad604ac 100644
> > > > > > > > > > > > --- a/include/linux/virtio.h
> > > > > > > > > > > > +++ b/include/linux/virtio.h
> > > > > > > > > > > > @@ -95,6 +95,8 @@ dma_addr_t virtqueue_get_used_addr(struct virtqueue *vq);
> > > > > > > > > > > > * @failed: saved value for VIRTIO_CONFIG_S_FAILED bit (for restore)
> > > > > > > > > > > > * @config_enabled: configuration change reporting enabled
> > > > > > > > > > > > * @config_change_pending: configuration change reported while disabled
> > > > > > > > > > > > + * @irq_soft_check: whether or not to check @irq_soft_enabled
> > > > > > > > > > > > + * @irq_soft_enabled: callbacks enabled
> > > > > > > > > > > > * @config_lock: protects configuration change reporting
> > > > > > > > > > > > * @dev: underlying device.
> > > > > > > > > > > > * @id: the device type identification (used to match it with a driver).
> > > > > > > > > > > > @@ -109,6 +111,8 @@ struct virtio_device {
> > > > > > > > > > > > bool failed;
> > > > > > > > > > > > bool config_enabled;
> > > > > > > > > > > > bool config_change_pending;
> > > > > > > > > > > > + bool irq_soft_check;
> > > > > > > > > > > > + bool irq_soft_enabled;
> > > > > > > > > > > > spinlock_t config_lock;
> > > > > > > > > > > > spinlock_t vqs_list_lock; /* Protects VQs list access */
> > > > > > > > > > > > struct device dev;
> > > > > > > > > > > > diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> > > > > > > > > > > > index dafdc7f48c01..9c1b61f2e525 100644
> > > > > > > > > > > > --- a/include/linux/virtio_config.h
> > > > > > > > > > > > +++ b/include/linux/virtio_config.h
> > > > > > > > > > > > @@ -174,6 +174,24 @@ static inline bool virtio_has_feature(const struct virtio_device *vdev,
> > > > > > > > > > > > return __virtio_test_bit(vdev, fbit);
> > > > > > > > > > > > }
> > > > > > > > > > > > +/*
> > > > > > > > > > > > + * virtio_irq_soft_enabled: whether we can execute callbacks
> > > > > > > > > > > > + * @vdev: the device
> > > > > > > > > > > > + */
> > > > > > > > > > > > +static inline bool virtio_irq_soft_enabled(const struct virtio_device *vdev)
> > > > > > > > > > > > +{
> > > > > > > > > > > > + if (!vdev->irq_soft_check)
> > > > > > > > > > > > + return true;
> > > > > > > > > > > > +
> > > > > > > > > > > > + /*
> > > > > > > > > > > > + * Read irq_soft_enabled before reading other device specific
> > > > > > > > > > > > + * data. Paried with smp_store_relase() in
> > > > > > > > > > > paired
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Will fix.
> > > > > > > > > >
> > > > > > > > > > Thanks
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > + * virtio_device_ready() and WRITE_ONCE()/synchronize_rcu() in
> > > > > > > > > > > > + * virtio_reset_device().
> > > > > > > > > > > > + */
> > > > > > > > > > > > + return smp_load_acquire(&vdev->irq_soft_enabled);
> > > > > > > > > > > > +}
> > > > > > > > > > > > +
> > > > > > > > > > > > /**
> > > > > > > > > > > > * virtio_has_dma_quirk - determine whether this device has the DMA quirk
> > > > > > > > > > > > * @vdev: the device
> > > > > > > > > > > > @@ -236,6 +254,13 @@ void virtio_device_ready(struct virtio_device *dev)
> > > > > > > > > > > > if (dev->config->enable_cbs)
> > > > > > > > > > > > dev->config->enable_cbs(dev);
> > > > > > > > > > > > + /*
> > > > > > > > > > > > + * Commit the driver setup before enabling the virtqueue
> > > > > > > > > > > > + * callbacks. Paried with smp_load_acuqire() in
> > > > > > > > > > > > + * virtio_irq_soft_enabled()
> > > > > > > > > > > > + */
> > > > > > > > > > > > + smp_store_release(&dev->irq_soft_enabled, true);
> > > > > > > > > > > > +
> > > > > > > > > > > > BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > > > > > > > dev->config->set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > > > > > > > }
> > > > > > > > > > > > --
> > > > > > > > > > > > 2.25.1
> > > > > > > > >
> > > > > > >
> > > > >
> > >
>

2022-03-29 11:24:08

by Thomas Gleixner

[permalink] [raw]
Subject: Re:

On Mon, Mar 28 2022 at 06:40, Michael S. Tsirkin wrote:
> On Mon, Mar 28, 2022 at 02:18:22PM +0800, Jason Wang wrote:
>> > > So I think we might talk different issues:
>> > >
>> > > 1) Whether request_irq() commits the previous setups, I think the
>> > > answer is yes, since the spin_unlock of desc->lock (release) can
>> > > guarantee this though there seems no documentation around
>> > > request_irq() to say this.
>> > >
>> > > And I can see at least drivers/video/fbdev/omap2/omapfb/dss/dispc.c is
>> > > using smp_wmb() before the request_irq().

That's a complete bogus example especially as there is not a single
smp_rmb() which pairs with the smp_wmb().

>> > > And even if write is ordered we still need read to be ordered to be
>> > > paired with that.
>
> IMO it synchronizes with the CPU to which irq is
> delivered. Otherwise basically all drivers would be broken,
> wouldn't they be?
> I don't know whether it's correct on all platforms, but if not
> we need to fix request_irq.

There is nothing to fix:

request_irq()
raw_spin_lock_irq(desc->lock); // ACQUIRE
....
raw_spin_unlock_irq(desc->lock); // RELEASE

interrupt()
raw_spin_lock(desc->lock); // ACQUIRE
set status to IN_PROGRESS
raw_spin_unlock(desc->lock); // RELEASE
invoke handler()

So anything which the driver set up _before_ request_irq() is visible to
the interrupt handler. No?

>> What happens if an interrupt is raised in the middle like:
>>
>> smp_store_release(dev->irq_soft_enabled, true)
>> IRQ handler
>> synchornize_irq()

This is bogus. The obvious order of things is:

dev->ok = false;
request_irq();

moar_setup();
synchronize_irq(); // ACQUIRE + RELEASE
dev->ok = true;

The reverse operation on teardown:

dev->ok = false;
synchronize_irq(); // ACQUIRE + RELEASE

teardown();

So in both cases a simple check in the handler is sufficient:

handler()
if (!dev->ok)
return;

I'm not understanding what you folks are trying to "fix" here. If any
driver does this in the wrong order, then the driver is broken.

Sure, you can do the same with:

dev->ok = false;
request_irq();
moar_setup();
smp_wmb();
dev->ok = true;

for the price of a smp_rmb() in the interrupt handler:

handler()
if (!dev->ok)
return;
smp_rmb();

but that's only working for the setup case correctly and not for
teardown.

Thanks,

tglx

2022-03-30 10:07:16

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re:

On Tue, Mar 29, 2022 at 10:35:21AM +0200, Thomas Gleixner wrote:
> On Mon, Mar 28 2022 at 06:40, Michael S. Tsirkin wrote:
> > On Mon, Mar 28, 2022 at 02:18:22PM +0800, Jason Wang wrote:
> >> > > So I think we might talk different issues:
> >> > >
> >> > > 1) Whether request_irq() commits the previous setups, I think the
> >> > > answer is yes, since the spin_unlock of desc->lock (release) can
> >> > > guarantee this though there seems no documentation around
> >> > > request_irq() to say this.
> >> > >
> >> > > And I can see at least drivers/video/fbdev/omap2/omapfb/dss/dispc.c is
> >> > > using smp_wmb() before the request_irq().
>
> That's a complete bogus example especially as there is not a single
> smp_rmb() which pairs with the smp_wmb().
>
> >> > > And even if write is ordered we still need read to be ordered to be
> >> > > paired with that.
> >
> > IMO it synchronizes with the CPU to which irq is
> > delivered. Otherwise basically all drivers would be broken,
> > wouldn't they be?
> > I don't know whether it's correct on all platforms, but if not
> > we need to fix request_irq.
>
> There is nothing to fix:
>
> request_irq()
> raw_spin_lock_irq(desc->lock); // ACQUIRE
> ....
> raw_spin_unlock_irq(desc->lock); // RELEASE
>
> interrupt()
> raw_spin_lock(desc->lock); // ACQUIRE
> set status to IN_PROGRESS
> raw_spin_unlock(desc->lock); // RELEASE
> invoke handler()
>
> So anything which the driver set up _before_ request_irq() is visible to
> the interrupt handler. No?
> >> What happens if an interrupt is raised in the middle like:
> >>
> >> smp_store_release(dev->irq_soft_enabled, true)
> >> IRQ handler
> >> synchornize_irq()
>
> This is bogus. The obvious order of things is:
>
> dev->ok = false;
> request_irq();
>
> moar_setup();
> synchronize_irq(); // ACQUIRE + RELEASE
> dev->ok = true;
>
> The reverse operation on teardown:
>
> dev->ok = false;
> synchronize_irq(); // ACQUIRE + RELEASE
>
> teardown();
>
> So in both cases a simple check in the handler is sufficient:
>
> handler()
> if (!dev->ok)
> return;


Thanks a lot for the analysis Thomas. This is more or less what I was
thinking.

>
> I'm not understanding what you folks are trying to "fix" here.

We are trying to fix the driver since at the moment it does not
have the dev->ok flag at all.


And I suspect virtio is not alone in that.
So it would have been nice if there was a standard flag
replacing the driver-specific dev->ok above, and ideally
would also handle the case of an interrupt triggering
too early by deferring the interrupt until the flag is set.

And in fact, it does kind of exist: IRQF_NO_AUTOEN, and you would call
enable_irq instead of dev->ok = true, except
- it doesn't work with affinity managed IRQs
- it does not work with shared IRQs

So using dev->ok as you propose above seems better at this point.

> If any
> driver does this in the wrong order, then the driver is broken.

I agree, however:
$ git grep synchronize_irq `git grep -l request_irq drivers/net/`|wc -l
113
$ git grep -l request_irq drivers/net/|wc -l
397

I suspect there are more drivers which in theory need the
synchronize_irq dance but in practice do not execute it.


> Sure, you can do the same with:
>
> dev->ok = false;
> request_irq();
> moar_setup();
> smp_wmb();
> dev->ok = true;
>
> for the price of a smp_rmb() in the interrupt handler:
>
> handler()
> if (!dev->ok)
> return;
> smp_rmb();
>
> but that's only working for the setup case correctly and not for
> teardown.
>
> Thanks,
>
> tglx

2022-03-31 04:28:13

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re:

On Tue, Mar 29, 2022 at 03:12:14PM +0800, Jason Wang wrote:
> > > > > > And requesting irq commits all memory otherwise all drivers would be
> > > > > > broken,
> > > > >
> > > > > So I think we might talk different issues:
> > > > >
> > > > > 1) Whether request_irq() commits the previous setups, I think the
> > > > > answer is yes, since the spin_unlock of desc->lock (release) can
> > > > > guarantee this though there seems no documentation around
> > > > > request_irq() to say this.
> > > > >
> > > > > And I can see at least drivers/video/fbdev/omap2/omapfb/dss/dispc.c is
> > > > > using smp_wmb() before the request_irq().
> > > > >
> > > > > And even if write is ordered we still need read to be ordered to be
> > > > > paired with that.
> >
> > IMO it synchronizes with the CPU to which irq is
> > delivered. Otherwise basically all drivers would be broken,
> > wouldn't they be?
>
> I guess it's because most of the drivers don't care much about the
> buggy/malicious device. And most of the devices may require an extra
> step to enable device IRQ after request_irq(). Or it's the charge of
> the driver to do the synchronization.

It is true that the use-case of malicious devices is somewhat boutique.
But I think most drivers do want to have their hotplug routines to be
robust, yes.

> > I don't know whether it's correct on all platforms, but if not
> > we need to fix request_irq.
> >
> > > > >
> > > > > > if it doesn't it just needs to be fixed, not worked around in
> > > > > > virtio.
> > > > >
> > > > > 2) virtio drivers might do a lot of setups between request_irq() and
> > > > > virtio_device_ready():
> > > > >
> > > > > request_irq()
> > > > > driver specific setups
> > > > > virtio_device_ready()
> > > > >
> > > > > CPU 0 probe) request_irq()
> > > > > CPU 1 IRQ handler) read the uninitialized variable
> > > > > CPU 0 probe) driver specific setups
> > > > > CPU 0 probe) smp_store_release(intr_soft_enabled, true), commit the setups
> > > > > CPU 1 IRQ handler) read irq_soft_enable as true
> > > > > CPU 1 IRQ handler) use the uninitialized variable
> > > > >
> > > > > Thanks
> > > >
> > > >
> > > > As I said, virtio_device_ready needs to do synchronize_irq.
> > > > That will guarantee all setup is visible to the specific IRQ,
> > >
> > > Only the interrupt after synchronize_irq() returns.
> >
> > Anything else is a buggy device though.
>
> Yes, but the goal of this patch is to prevent the possible attack from
> buggy(malicious) devices.

Right. However if a driver of a *buggy* device somehow sees driver_ok =
false even though it's actually initialized, that is not a deal breaker
as that does not open us up to an attack.

> >
> > > >this
> > > > is what it's point is.
> > >
> > > What happens if an interrupt is raised in the middle like:
> > >
> > > smp_store_release(dev->irq_soft_enabled, true)
> > > IRQ handler
> > > synchornize_irq()
> > >
> > > If we don't enforce a reading order, the IRQ handler may still see the
> > > uninitialized variable.
> > >
> > > Thanks
> >
> > IMHO variables should be initialized before request_irq
> > to a value meaning "not a valid interrupt".
> > Specifically driver_ok = false.
> > Handler in the scenario you describe will then see !driver_ok
> > and exit immediately.
>
> So just to make sure we're on the same page.
>
> 1) virtio_reset_device() will set the driver_ok to false;
> 2) virtio_device_ready() will set the driver_ok to true
>
> So for virtio drivers, it often did:
>
> 1) virtio_reset_device()
> 2) find_vqs() which will call request_irq()
> 3) other driver specific setups
> 4) virtio_device_ready()
>
> In virtio_device_ready(), the patch perform the following currently:
>
> smp_store_release(driver_ok, true);
> set_status(DRIVER_OK);
>
> Per your suggestion, to add synchronize_irq() after
> smp_store_release() so we had
>
> smp_store_release(driver_ok, true);
> synchornize_irq()
> set_status(DRIVER_OK)
>
> Suppose there's a interrupt raised before the synchronize_irq(), if we do:
>
> if (READ_ONCE(driver_ok)) {
> vq->callback()
> }
>
> It will see the driver_ok as true but how can we make sure
> vq->callback sees the driver specific setups (3) above?
>
> And an example is virtio_scsi():
>
> virtio_reset_device()
> virtscsi_probe()
> virtscsi_init()
> virtio_find_vqs()
> ...
> virtscsi_init_vq(&vscsi->event_vq, vqs[1])
> ....
> virtio_device_ready()
>
> In virtscsi_event_done():
>
> virtscsi_event_done():
> virtscsi_vq_done(vscsi, &vscsi->event_vq, ...);
>
> We need to make sure the even_done reads driver_ok before read vscsi->event_vq.
>
> Thanks


See response by Thomas. A simple if (!dev->driver_ok) should be enough,
it's all under a lock.

> >
> >
> > > >
> > > >
> > > > > >
> > > > > >
> > > > > > > >
> > > > > > > > > We use smp_store_relase()
> > > > > > > > > to make sure the driver commits the setup before enabling the irq. It
> > > > > > > > > means the read needs to be ordered as well in vring_interrupt().
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Although I couldn't find anything about this in memory-barriers.txt
> > > > > > > > > > which surprises me.
> > > > > > > > > >
> > > > > > > > > > CC Paul to help make sure I'm right.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > To avoid breaking legacy device which can send IRQ before DRIVER_OK, a
> > > > > > > > > > > > > module parameter is introduced to enable the hardening so function
> > > > > > > > > > > > > hardening is disabled by default.
> > > > > > > > > > > > Which devices are these? How come they send an interrupt before there
> > > > > > > > > > > > are any buffers in any queues?
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > I copied this from the commit log for 22b7050a024d7
> > > > > > > > > > >
> > > > > > > > > > > "
> > > > > > > > > > >
> > > > > > > > > > > This change will also benefit old hypervisors (before 2009)
> > > > > > > > > > > that send interrupts without checking DRIVER_OK: previously,
> > > > > > > > > > > the callback could race with driver-specific initialization.
> > > > > > > > > > > "
> > > > > > > > > > >
> > > > > > > > > > > If this is only for config interrupt, I can remove the above log.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > This is only for config interrupt.
> > > > > > > > >
> > > > > > > > > Ok.
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > Note that the hardening is only done for vring interrupt since the
> > > > > > > > > > > > > config interrupt hardening is already done in commit 22b7050a024d7
> > > > > > > > > > > > > ("virtio: defer config changed notifications"). But the method that is
> > > > > > > > > > > > > used by config interrupt can't be reused by the vring interrupt
> > > > > > > > > > > > > handler because it uses spinlock to do the synchronization which is
> > > > > > > > > > > > > expensive.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Signed-off-by: Jason Wang <[email protected]>
> > > > > > > > > > > >
> > > > > > > > > > > > > ---
> > > > > > > > > > > > > drivers/virtio/virtio.c | 19 +++++++++++++++++++
> > > > > > > > > > > > > drivers/virtio/virtio_ring.c | 9 ++++++++-
> > > > > > > > > > > > > include/linux/virtio.h | 4 ++++
> > > > > > > > > > > > > include/linux/virtio_config.h | 25 +++++++++++++++++++++++++
> > > > > > > > > > > > > 4 files changed, 56 insertions(+), 1 deletion(-)
> > > > > > > > > > > > >
> > > > > > > > > > > > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > > > > > > > > > > > > index 8dde44ea044a..85e331efa9cc 100644
> > > > > > > > > > > > > --- a/drivers/virtio/virtio.c
> > > > > > > > > > > > > +++ b/drivers/virtio/virtio.c
> > > > > > > > > > > > > @@ -7,6 +7,12 @@
> > > > > > > > > > > > > #include <linux/of.h>
> > > > > > > > > > > > > #include <uapi/linux/virtio_ids.h>
> > > > > > > > > > > > > +static bool irq_hardening = false;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +module_param(irq_hardening, bool, 0444);
> > > > > > > > > > > > > +MODULE_PARM_DESC(irq_hardening,
> > > > > > > > > > > > > + "Disalbe IRQ software processing when it is not expected");
> > > > > > > > > > > > > +
> > > > > > > > > > > > > /* Unique numbering for virtio devices. */
> > > > > > > > > > > > > static DEFINE_IDA(virtio_index_ida);
> > > > > > > > > > > > > @@ -220,6 +226,15 @@ static int virtio_features_ok(struct virtio_device *dev)
> > > > > > > > > > > > > * */
> > > > > > > > > > > > > void virtio_reset_device(struct virtio_device *dev)
> > > > > > > > > > > > > {
> > > > > > > > > > > > > + /*
> > > > > > > > > > > > > + * The below synchronize_rcu() guarantees that any
> > > > > > > > > > > > > + * interrupt for this line arriving after
> > > > > > > > > > > > > + * synchronize_rcu() has completed is guaranteed to see
> > > > > > > > > > > > > + * irq_soft_enabled == false.
> > > > > > > > > > > > News to me I did not know synchronize_rcu has anything to do
> > > > > > > > > > > > with interrupts. Did not you intend to use synchronize_irq?
> > > > > > > > > > > > I am not even 100% sure synchronize_rcu is by design a memory barrier
> > > > > > > > > > > > though it's most likely is ...
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > According to the comment above tree RCU version of synchronize_rcu():
> > > > > > > > > > >
> > > > > > > > > > > """
> > > > > > > > > > >
> > > > > > > > > > > * RCU read-side critical sections are delimited by rcu_read_lock()
> > > > > > > > > > > * and rcu_read_unlock(), and may be nested. In addition, but only in
> > > > > > > > > > > * v5.0 and later, regions of code across which interrupts, preemption,
> > > > > > > > > > > * or softirqs have been disabled also serve as RCU read-side critical
> > > > > > > > > > > * sections. This includes hardware interrupt handlers, softirq handlers,
> > > > > > > > > > > * and NMI handlers.
> > > > > > > > > > > """
> > > > > > > > > > >
> > > > > > > > > > > So interrupt handlers are treated as read-side critical sections.
> > > > > > > > > > >
> > > > > > > > > > > And it has the comment for explain the barrier:
> > > > > > > > > > >
> > > > > > > > > > > """
> > > > > > > > > > >
> > > > > > > > > > > * Note that this guarantee implies further memory-ordering guarantees.
> > > > > > > > > > > * On systems with more than one CPU, when synchronize_rcu() returns,
> > > > > > > > > > > * each CPU is guaranteed to have executed a full memory barrier since
> > > > > > > > > > > * the end of its last RCU read-side critical section whose beginning
> > > > > > > > > > > * preceded the call to synchronize_rcu(). In addition, each CPU having
> > > > > > > > > > > """
> > > > > > > > > > >
> > > > > > > > > > > So on SMP it provides a full barrier. And for UP/tiny RCU we don't need the
> > > > > > > > > > > barrier, if the interrupt come after WRITE_ONCE() it will see the
> > > > > > > > > > > irq_soft_enabled as false.
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > You are right. So then
> > > > > > > > > > 1. I do not think we need load_acquire - why is it needed? Just
> > > > > > > > > > READ_ONCE should do.
> > > > > > > > >
> > > > > > > > > See above.
> > > > > > > > >
> > > > > > > > > > 2. isn't synchronize_irq also doing the same thing?
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Yes, but it requires a config ops since the IRQ knowledge is transport specific.
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > + */
> > > > > > > > > > > > > + WRITE_ONCE(dev->irq_soft_enabled, false);
> > > > > > > > > > > > > + synchronize_rcu();
> > > > > > > > > > > > > +
> > > > > > > > > > > > > dev->config->reset(dev);
> > > > > > > > > > > > > }
> > > > > > > > > > > > > EXPORT_SYMBOL_GPL(virtio_reset_device);
> > > > > > > > > > > > Please add comment explaining where it will be enabled.
> > > > > > > > > > > > Also, we *really* don't need to synch if it was already disabled,
> > > > > > > > > > > > let's not add useless overhead to the boot sequence.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Ok.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > @@ -427,6 +442,10 @@ int register_virtio_device(struct virtio_device *dev)
> > > > > > > > > > > > > spin_lock_init(&dev->config_lock);
> > > > > > > > > > > > > dev->config_enabled = false;
> > > > > > > > > > > > > dev->config_change_pending = false;
> > > > > > > > > > > > > + dev->irq_soft_check = irq_hardening;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + if (dev->irq_soft_check)
> > > > > > > > > > > > > + dev_info(&dev->dev, "IRQ hardening is enabled\n");
> > > > > > > > > > > > > /* We always start by resetting the device, in case a previous
> > > > > > > > > > > > > * driver messed it up. This also tests that code path a little. */
> > > > > > > > > > > > one of the points of hardening is it's also helpful for buggy
> > > > > > > > > > > > devices. this flag defeats the purpose.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Do you mean:
> > > > > > > > > > >
> > > > > > > > > > > 1) we need something like config_enable? This seems not easy to be
> > > > > > > > > > > implemented without obvious overhead, mainly the synchronize with the
> > > > > > > > > > > interrupt handlers
> > > > > > > > > >
> > > > > > > > > > But synchronize is only on tear-down path. That is not critical for any
> > > > > > > > > > users at the moment, even less than probe.
> > > > > > > > >
> > > > > > > > > I meant if we have vq->irq_pending, we need to call vring_interrupt()
> > > > > > > > > in the virtio_device_ready() and synchronize the IRQ handlers with
> > > > > > > > > spinlock or others.
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > 2) enable this by default, so I don't object, but this may have some risk
> > > > > > > > > > > for old hypervisors
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > The risk if there's a driver adding buffers without setting DRIVER_OK.
> > > > > > > > >
> > > > > > > > > Probably not, we have devices that accept random inputs from outside,
> > > > > > > > > net, console, input etc. I've done a round of audits of the Qemu
> > > > > > > > > codes. They look all fine since day0.
> > > > > > > > >
> > > > > > > > > > So with this approach, how about we rename the flag "driver_ok"?
> > > > > > > > > > And then add_buf can actually test it and BUG_ON if not there (at least
> > > > > > > > > > in the debug build).
> > > > > > > > >
> > > > > > > > > This looks like a hardening of the driver in the core instead of the
> > > > > > > > > device. I think it can be done but in a separate series.
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > And going down from there, how about we cache status in the
> > > > > > > > > > device? Then we don't need to keep re-reading it every time,
> > > > > > > > > > speeding boot up a tiny bit.
> > > > > > > > >
> > > > > > > > > I don't fully understand here, actually spec requires status to be
> > > > > > > > > read back for validation in many cases.
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > > > > > > > > > index 962f1477b1fa..0170f8c784d8 100644
> > > > > > > > > > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > > > > > > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > > > > > > > > > @@ -2144,10 +2144,17 @@ static inline bool more_used(const struct vring_virtqueue *vq)
> > > > > > > > > > > > > return vq->packed_ring ? more_used_packed(vq) : more_used_split(vq);
> > > > > > > > > > > > > }
> > > > > > > > > > > > > -irqreturn_t vring_interrupt(int irq, void *_vq)
> > > > > > > > > > > > > +irqreturn_t vring_interrupt(int irq, void *v)
> > > > > > > > > > > > > {
> > > > > > > > > > > > > + struct virtqueue *_vq = v;
> > > > > > > > > > > > > + struct virtio_device *vdev = _vq->vdev;
> > > > > > > > > > > > > struct vring_virtqueue *vq = to_vvq(_vq);
> > > > > > > > > > > > > + if (!virtio_irq_soft_enabled(vdev)) {
> > > > > > > > > > > > > + dev_warn_once(&vdev->dev, "virtio vring IRQ raised before DRIVER_OK");
> > > > > > > > > > > > > + return IRQ_NONE;
> > > > > > > > > > > > > + }
> > > > > > > > > > > > > +
> > > > > > > > > > > > > if (!more_used(vq)) {
> > > > > > > > > > > > > pr_debug("virtqueue interrupt with no work for %p\n", vq);
> > > > > > > > > > > > > return IRQ_NONE;
> > > > > > > > > > > > > diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> > > > > > > > > > > > > index 5464f398912a..957d6ad604ac 100644
> > > > > > > > > > > > > --- a/include/linux/virtio.h
> > > > > > > > > > > > > +++ b/include/linux/virtio.h
> > > > > > > > > > > > > @@ -95,6 +95,8 @@ dma_addr_t virtqueue_get_used_addr(struct virtqueue *vq);
> > > > > > > > > > > > > * @failed: saved value for VIRTIO_CONFIG_S_FAILED bit (for restore)
> > > > > > > > > > > > > * @config_enabled: configuration change reporting enabled
> > > > > > > > > > > > > * @config_change_pending: configuration change reported while disabled
> > > > > > > > > > > > > + * @irq_soft_check: whether or not to check @irq_soft_enabled
> > > > > > > > > > > > > + * @irq_soft_enabled: callbacks enabled
> > > > > > > > > > > > > * @config_lock: protects configuration change reporting
> > > > > > > > > > > > > * @dev: underlying device.
> > > > > > > > > > > > > * @id: the device type identification (used to match it with a driver).
> > > > > > > > > > > > > @@ -109,6 +111,8 @@ struct virtio_device {
> > > > > > > > > > > > > bool failed;
> > > > > > > > > > > > > bool config_enabled;
> > > > > > > > > > > > > bool config_change_pending;
> > > > > > > > > > > > > + bool irq_soft_check;
> > > > > > > > > > > > > + bool irq_soft_enabled;
> > > > > > > > > > > > > spinlock_t config_lock;
> > > > > > > > > > > > > spinlock_t vqs_list_lock; /* Protects VQs list access */
> > > > > > > > > > > > > struct device dev;
> > > > > > > > > > > > > diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> > > > > > > > > > > > > index dafdc7f48c01..9c1b61f2e525 100644
> > > > > > > > > > > > > --- a/include/linux/virtio_config.h
> > > > > > > > > > > > > +++ b/include/linux/virtio_config.h
> > > > > > > > > > > > > @@ -174,6 +174,24 @@ static inline bool virtio_has_feature(const struct virtio_device *vdev,
> > > > > > > > > > > > > return __virtio_test_bit(vdev, fbit);
> > > > > > > > > > > > > }
> > > > > > > > > > > > > +/*
> > > > > > > > > > > > > + * virtio_irq_soft_enabled: whether we can execute callbacks
> > > > > > > > > > > > > + * @vdev: the device
> > > > > > > > > > > > > + */
> > > > > > > > > > > > > +static inline bool virtio_irq_soft_enabled(const struct virtio_device *vdev)
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > + if (!vdev->irq_soft_check)
> > > > > > > > > > > > > + return true;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + /*
> > > > > > > > > > > > > + * Read irq_soft_enabled before reading other device specific
> > > > > > > > > > > > > + * data. Paried with smp_store_relase() in
> > > > > > > > > > > > paired
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Will fix.
> > > > > > > > > > >
> > > > > > > > > > > Thanks
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > + * virtio_device_ready() and WRITE_ONCE()/synchronize_rcu() in
> > > > > > > > > > > > > + * virtio_reset_device().
> > > > > > > > > > > > > + */
> > > > > > > > > > > > > + return smp_load_acquire(&vdev->irq_soft_enabled);
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > /**
> > > > > > > > > > > > > * virtio_has_dma_quirk - determine whether this device has the DMA quirk
> > > > > > > > > > > > > * @vdev: the device
> > > > > > > > > > > > > @@ -236,6 +254,13 @@ void virtio_device_ready(struct virtio_device *dev)
> > > > > > > > > > > > > if (dev->config->enable_cbs)
> > > > > > > > > > > > > dev->config->enable_cbs(dev);
> > > > > > > > > > > > > + /*
> > > > > > > > > > > > > + * Commit the driver setup before enabling the virtqueue
> > > > > > > > > > > > > + * callbacks. Paried with smp_load_acuqire() in
> > > > > > > > > > > > > + * virtio_irq_soft_enabled()
> > > > > > > > > > > > > + */
> > > > > > > > > > > > > + smp_store_release(&dev->irq_soft_enabled, true);
> > > > > > > > > > > > > +
> > > > > > > > > > > > > BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > > > > > > > > dev->config->set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > > > > > > > > }
> > > > > > > > > > > > > --
> > > > > > > > > > > > > 2.25.1
> > > > > > > > > >
> > > > > > > >
> > > > > >
> > > >
> >

2022-04-12 21:25:29

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re:

On Tue, Mar 29, 2022 at 10:35:21AM +0200, Thomas Gleixner wrote:
> On Mon, Mar 28 2022 at 06:40, Michael S. Tsirkin wrote:
> > On Mon, Mar 28, 2022 at 02:18:22PM +0800, Jason Wang wrote:
> >> > > So I think we might talk different issues:
> >> > >
> >> > > 1) Whether request_irq() commits the previous setups, I think the
> >> > > answer is yes, since the spin_unlock of desc->lock (release) can
> >> > > guarantee this though there seems no documentation around
> >> > > request_irq() to say this.
> >> > >
> >> > > And I can see at least drivers/video/fbdev/omap2/omapfb/dss/dispc.c is
> >> > > using smp_wmb() before the request_irq().
>
> That's a complete bogus example especially as there is not a single
> smp_rmb() which pairs with the smp_wmb().
>
> >> > > And even if write is ordered we still need read to be ordered to be
> >> > > paired with that.
> >
> > IMO it synchronizes with the CPU to which irq is
> > delivered. Otherwise basically all drivers would be broken,
> > wouldn't they be?
> > I don't know whether it's correct on all platforms, but if not
> > we need to fix request_irq.
>
> There is nothing to fix:
>
> request_irq()
> raw_spin_lock_irq(desc->lock); // ACQUIRE
> ....
> raw_spin_unlock_irq(desc->lock); // RELEASE
>
> interrupt()
> raw_spin_lock(desc->lock); // ACQUIRE
> set status to IN_PROGRESS
> raw_spin_unlock(desc->lock); // RELEASE
> invoke handler()
>
> So anything which the driver set up _before_ request_irq() is visible to
> the interrupt handler. No?
>
> >> What happens if an interrupt is raised in the middle like:
> >>
> >> smp_store_release(dev->irq_soft_enabled, true)
> >> IRQ handler
> >> synchornize_irq()
>
> This is bogus. The obvious order of things is:
>
> dev->ok = false;
> request_irq();
>
> moar_setup();
> synchronize_irq(); // ACQUIRE + RELEASE
> dev->ok = true;
>
> The reverse operation on teardown:
>
> dev->ok = false;
> synchronize_irq(); // ACQUIRE + RELEASE
>
> teardown();
>
> So in both cases a simple check in the handler is sufficient:
>
> handler()
> if (!dev->ok)
> return;

Does this need to be if (!READ_ONCE(dev->ok)) ?



> I'm not understanding what you folks are trying to "fix" here. If any
> driver does this in the wrong order, then the driver is broken.
>
> Sure, you can do the same with:
>
> dev->ok = false;
> request_irq();
> moar_setup();
> smp_wmb();
> dev->ok = true;
>
> for the price of a smp_rmb() in the interrupt handler:
>
> handler()
> if (!dev->ok)
> return;
> smp_rmb();
>
> but that's only working for the setup case correctly and not for
> teardown.
>
> Thanks,
>
> tglx