Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18;
Date:   Thu, 20 Jan 2022 15:39:12 -0500
From:   "Michael S. Tsirkin" <mst@redhat.com>
To:     Peter Hilber <peter.hilber@opensynergy.com>
Cc:     Cristian Marussi <cristian.marussi@arm.com>,
        linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
        sudeep.holla@arm.com, james.quinlan@broadcom.com,
        Jonathan.Cameron@huawei.com, f.fainelli@gmail.com,
        etienne.carriere@linaro.org, vincent.guittot@linaro.org,
        souvik.chakravarty@arm.com, igor.skalkin@opensynergy.com,
        virtualization@lists.linux-foundation.org
Subject: Re: [PATCH v9 09/11] firmware: arm_scmi: Add atomic mode support to
 virtio transport
Message-ID: <20220120150418-mutt-send-email-mst@kernel.org>
References: <20211220195646.44498-10-cristian.marussi@arm.com>
 <20211221140027.41524-1-cristian.marussi@arm.com>
 <f231094a-6f34-3dc1-237d-97218e8fde91@opensynergy.com>
 <20220119122338.GE6113@e120937-lin>
 <2f1ea794-a0b9-2099-edc0-b2aeb3ca6b92@opensynergy.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <2f1ea794-a0b9-2099-edc0-b2aeb3ca6b92@opensynergy.com>
Precedence: bulk

On Thu, Jan 20, 2022 at 08:09:56PM +0100, Peter Hilber wrote:
> On 19.01.22 13:23, Cristian Marussi wrote:
> > On Tue, Jan 18, 2022 at 03:21:03PM +0100, Peter Hilber wrote:
> >> On 21.12.21 15:00, Cristian Marussi wrote:
> >>> Add support for .mark_txdone and .poll_done transport operations to SCMI
> >>> VirtIO transport as pre-requisites to enable atomic operations.
> >>>
> >>> Add a Kernel configuration option to enable SCMI VirtIO transport polling
> >>> and atomic mode for selected SCMI transactions while leaving it default
> >>> disabled.
> >>>
> >>
> >> Hi Cristian,
> >>
> >> thanks for the update. I have some more remarks inline below.
> >>
> > 
> > Hi Peter,
> > 
> > thanks for your review, much appreciated, please see my replies online.
> > 
> >> My impression is that the virtio core does not expose helper functions suitable
> >> to busy-poll for used buffers. But changing this might not be difficult. Maybe
> >> more_used() from virtio_ring.c could be exposed via a wrapper?
> >>
> > 
> > While I definitely agree that the virtio core support for polling is far from
> > ideal, some support is provided and my point was at first to try implement SCMI
> > virtio polling leveraging what we have now in the core and see if it was attainable
> > (indeed I tried early in this series to avoid as a whole to have to support polling
> > at the SCMI transport layer to attain SCMI cmds atomicity..but that was an ill
> > attempt that led nowhere good...)
> > 
> > Btw, I was planning to post a new series next week (after merge-windows) with some
> > fixes I did already, at this point I'll include also some fixes derived
> > from some of your remarks.
> > 
> >> Best regards,
> >>
> >> Peter
> >>
> [snip]>>> + *
> >>> + * Return: True once polling has successfully completed.
> >>> + */
> >>> +static bool virtio_poll_done(struct scmi_chan_info *cinfo,
> >>> +			     struct scmi_xfer *xfer)
> >>> +{
> >>> +	bool pending, ret = false;
> >>> +	unsigned int length, any_prefetched = 0;
> >>> +	unsigned long flags;
> >>> +	struct scmi_vio_msg *next_msg, *msg = xfer->priv;
> >>> +	struct scmi_vio_channel *vioch = cinfo->transport_info;
> >>> +
> >>> +	if (!msg)
> >>> +		return true;
> >>> +
> >>> +	spin_lock_irqsave(&msg->poll_lock, flags);
> >>> +	/* Processed already by other polling loop on another CPU ? */
> >>> +	if (msg->poll_idx == VIO_MSG_POLL_DONE) {
> >>> +		spin_unlock_irqrestore(&msg->poll_lock, flags);
> >>> +		return true;
> >>> +	}
> >>> +
> >>> +	/* Has cmdq index moved at all ? */
> >>> +	pending = virtqueue_poll(vioch->vqueue, msg->poll_idx);
> >>
> >> In my understanding, the polling comparison could still be subject to the ABA
> >> problem when exactly 2**16 messages have been marked as used since
> >> msg->poll_idx was set (unlikely scenario, granted).
> >>
> >> I think this would be a lot simpler if the virtio core exported some
> >> concurrency-safe helper function for such polling (similar to more_used() from
> >> virtio_ring.c), as discussed at the top.
> > 
> > So this is the main limitation indeed of the current implementation, I
> > cannot distinguish if there was an exact full wrap and I'm reading the same
> > last_idx as before but a whoppying 2**16 messages have instead gone through...
> > 
> > The tricky part seems to me here that even introducing dedicated helpers
> > for polling in order to account for such wrapping (similar to more_used())
> > those would be based by current VirtIO spec on a single bit wrap counter,
> > so how do you discern if 2 whole wraps have happened (even more unlikely..) ?
> > 
> > Maybe I'm missing something though...
> > 
> 
> In my understanding, there is no need to keep track of the old state. We
> actually only want to check whether the device has marked any buffers as `used'
> which we did not retrieve yet via virtqueue_get_buf_ctx().
> 
> This is what more_used() checks in my understanding. One would just need to
> translate the external `struct virtqueue' param to the virtio_ring.c internal
> representation `struct vring_virtqueue' and then call `more_used()'.
> 
> There would be no need to keep `poll_idx` then.
> 
> Best regards,
> 
> Peter

Not really, I don't think so.

There's no magic in more_used. No synchronization happens.
more_used is exactly like virtqueue_poll except
you get to maintain your own index.

As it is, it is quite possible to read the cached index,
then another thread makes 2^16 bufs available, then device
uses them all, and presto you get a false positive.

I guess we can play with memory barriers such that cache
read happens after the index read - but it seems that
will just lead to the same wrap around problem
in reverse. So IIUC it's quite a bit more involved than
just translating structures.

And yes, a more_used like API would remove the need to pass
the index around, but it will also obscure the fact that
there's internal state here and that it's inherently racy
wrt wrap arounds. Whereas I'm happy to see that virtqueue_poll
seems to have made it clear enough that people get it.


It's not hard to handle wrap around in the driver if you like though:
just have a 32 bit atomic counter and increment it each time you are
going to make 2^16 buffers available. That gets you to 2^48 with an
overhead of an atomic read and that should be enough short term. Make
sure the cache line where you put the counter is not needed elsewhere -
checking it in a tight loop with an atomic will force it to the local
CPU. And if you are doing that virtqueue_poll will be enough.


> 
> > I'll have a though about this, but in my opinion this seems something so
> > unlikely that we could live with it, for the moment at least...
> > [snip]