2022-06-08 18:18:40

by Anup Patel

Subject: [PATCH] rpmsg: virtio: Fix broken rpmsg_probe()

The rpmsg_probe() is broken at the moment because virtqueue_add_inbuf()
fails due to both virtqueues (Rx and Tx) marked as broken by the
__vring_new_virtqueue() function. To solve this, virtio_device_ready()
(which unbreaks queues) should be called before virtqueue_add_inbuf().

Fixes: 8b4ec69d7e09 ("virtio: harden vring IRQ")
Signed-off-by: Anup Patel <[email protected]>
---
drivers/rpmsg/virtio_rpmsg_bus.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/rpmsg/virtio_rpmsg_bus.c b/drivers/rpmsg/virtio_rpmsg_bus.c
index 905ac7910c98..71a64d2c7644 100644
--- a/drivers/rpmsg/virtio_rpmsg_bus.c
+++ b/drivers/rpmsg/virtio_rpmsg_bus.c
@@ -929,6 +929,9 @@ static int rpmsg_probe(struct virtio_device *vdev)
/* and half is dedicated for TX */
vrp->sbufs = bufs_va + total_buf_space / 2;

+ /* From this point on, we can notify and get callbacks. */
+ virtio_device_ready(vdev);
+
/* set up the receive buffers */
for (i = 0; i < vrp->num_bufs / 2; i++) {
struct scatterlist sg;
@@ -983,9 +986,6 @@ static int rpmsg_probe(struct virtio_device *vdev)
*/
notify = virtqueue_kick_prepare(vrp->rvq);

- /* From this point on, we can notify and get callbacks. */
- virtio_device_ready(vdev);
-
/* tell the remote processor it can start sending messages */
/*
* this might be concurrent with callbacks, but we are only
--
2.34.1


2022-06-29 18:04:38

by Mathieu Poirier

Subject: Re: [PATCH] rpmsg: virtio: Fix broken rpmsg_probe()

Hi Anup,

On Wed, Jun 08, 2022 at 10:43:34PM +0530, Anup Patel wrote:
> The rpmsg_probe() is broken at the moment because virtqueue_add_inbuf()
> fails due to both virtqueues (Rx and Tx) marked as broken by the
> __vring_new_virtqueue() function. To solve this, virtio_device_ready()
> (which unbreaks queues) should be called before virtqueue_add_inbuf().
>
> Fixes: 8b4ec69d7e09 ("virtio: harden vring IRQ")
> Signed-off-by: Anup Patel <[email protected]>
> ---
> drivers/rpmsg/virtio_rpmsg_bus.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/rpmsg/virtio_rpmsg_bus.c b/drivers/rpmsg/virtio_rpmsg_bus.c
> index 905ac7910c98..71a64d2c7644 100644
> --- a/drivers/rpmsg/virtio_rpmsg_bus.c
> +++ b/drivers/rpmsg/virtio_rpmsg_bus.c
> @@ -929,6 +929,9 @@ static int rpmsg_probe(struct virtio_device *vdev)
> /* and half is dedicated for TX */
> vrp->sbufs = bufs_va + total_buf_space / 2;
>
> + /* From this point on, we can notify and get callbacks. */
> + virtio_device_ready(vdev);
> +

Calling virtio_device_ready() here means that virtqueue_get_buf_ctx_split() can
potentially be called (by way of rpmsg_recv_done()), which will race with
virtqueue_add_inbuf(). If buffers in the virtqueue aren't available then
rpmsg_recv_done() will fail, potentially breaking remote processors' state
machines that don't expect their initial name service to fail when the "device"
has been marked as ready.

What does make me curious though is that nobody on the remoteproc mailing list
has complained about commit 8b4ec69d7e09 breaking their environment... By now,
i.e. rc4, that should have happened. Anyone from TI, ST and Xilinx care to test this on
their rig?

Thanks,
Mathieu

> /* set up the receive buffers */
> for (i = 0; i < vrp->num_bufs / 2; i++) {
> struct scatterlist sg;
> @@ -983,9 +986,6 @@ static int rpmsg_probe(struct virtio_device *vdev)
> */
> notify = virtqueue_kick_prepare(vrp->rvq);
>
> - /* From this point on, we can notify and get callbacks. */
> - virtio_device_ready(vdev);
> -
> /* tell the remote processor it can start sending messages */
> /*
> * this might be concurrent with callbacks, but we are only
> --
> 2.34.1
>

2022-06-30 07:34:20

by Anup Patel

Subject: Re: [PATCH] rpmsg: virtio: Fix broken rpmsg_probe()

Hi Mathieu,

On Wed, Jun 29, 2022 at 11:13 PM Mathieu Poirier
<[email protected]> wrote:
>
> Hi Anup,
>
> On Wed, Jun 08, 2022 at 10:43:34PM +0530, Anup Patel wrote:
> > The rpmsg_probe() is broken at the moment because virtqueue_add_inbuf()
> > fails due to both virtqueues (Rx and Tx) marked as broken by the
> > __vring_new_virtqueue() function. To solve this, virtio_device_ready()
> > (which unbreaks queues) should be called before virtqueue_add_inbuf().
> >
> > Fixes: 8b4ec69d7e09 ("virtio: harden vring IRQ")
> > Signed-off-by: Anup Patel <[email protected]>
> > ---
> > drivers/rpmsg/virtio_rpmsg_bus.c | 6 +++---
> > 1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/rpmsg/virtio_rpmsg_bus.c b/drivers/rpmsg/virtio_rpmsg_bus.c
> > index 905ac7910c98..71a64d2c7644 100644
> > --- a/drivers/rpmsg/virtio_rpmsg_bus.c
> > +++ b/drivers/rpmsg/virtio_rpmsg_bus.c
> > @@ -929,6 +929,9 @@ static int rpmsg_probe(struct virtio_device *vdev)
> > /* and half is dedicated for TX */
> > vrp->sbufs = bufs_va + total_buf_space / 2;
> >
> > + /* From this point on, we can notify and get callbacks. */
> > + virtio_device_ready(vdev);
> > +
>
> Calling virtio_device_ready() here means that virtqueue_get_buf_ctx_split() can
> potentially be called (by way of rpmsg_recv_done()), which will race with
> virtqueue_add_inbuf(). If buffers in the virtqueue aren't available then
> rpmsg_recv_done() will fail, potentially breaking remote processors' state
> machines that don't expect their initial name service to fail when the "device"
> has been marked as ready.

We have VirtIO RPMSG available for Xvisor Guest/VM. When I run Linux-5.19-rcX
as an Xvisor Guest/VM, I get the following warning for every call to
virtqueue_add_inbuf():

[guest0/uart0] [ 2.147931] ------------[ cut here ]------------
[guest0/uart0] [ 2.166242] WARNING: CPU: 0 PID: 1 at drivers/rpmsg/virtio_rpmsg_bus.c:941 rpmsg_probe+0x2e6/0x39e
[guest0/uart0] [ 2.190337] Modules linked in:
[guest0/uart0] [ 2.196514] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.19.0-rc4 #1
[guest0/uart0] [ 2.222706] Hardware name: Virt64 (DT)
[guest0/uart0] [ 2.231712] epc : rpmsg_probe+0x2e6/0x39e
[guest0/uart0] [ 2.243443] ra : rpmsg_probe+0x1f8/0x39e
[guest0/uart0] [ 2.256899] epc : ffffffff804f7b2c ra : ffffffff804f7a3e sp : ff2000000400b870
[guest0/uart0] [ 2.277700] gp : ffffffff810dc5b8 tp : ff60000001258000 t0 : ff2000000400b888
[guest0/uart0] [ 2.293144] t1 : 0000000000000008 t2 : 0000000000000040 s0 : ff2000000400b910
[guest0/uart0] [ 2.318932] s1 : ff600000012cde00 a0 : fffffffffffffffb a1 : ff2000000400b858
[guest0/uart0] [ 2.339688] a2 : 0000000000000001 a3 : 0000000000000000 a4 : 0000000000000001
[guest0/uart0] [ 2.361387] a5 : 0000000000000000 a6 : 0000000000000000 a7 : 0000000000000cc0
[guest0/uart0] [ 2.396055] s2 : ff60000003a00000 s3 : 0000000000000000 s4 : ff600000015ad428
[guest0/uart0] [ 2.414673] s5 : 0000000000000000 s6 : 0000000000000cc0 s7 : ff60000003a00000
[guest0/uart0] [ 2.439996] s8 : ffffffff80d91408 s9 : 0000000000040000 s10: ffffffff808000ac
[guest0/uart0] [ 2.463539] s11: 0000000000000000 t3 : ff6000000fb4f000 t4 : ffffffffffffffff
[guest0/uart0] [ 2.484201] t5 : 0000000000000000 t6 : ff600000025a20c4
[guest0/uart0] [ 2.500229] status: 0000000200000120 badaddr: 0000000000000000 cause: 0000000000000003
[guest0/uart0] [ 2.522769] [<ffffffff80374032>] virtio_dev_probe+0x162/0x2e6
[guest0/uart0] [ 2.544173] [<ffffffff803b57a8>] really_probe.part.0+0x56/0x1ec
[guest0/uart0] [ 2.574984] [<ffffffff803b59ae>] __driver_probe_device+0x70/0xde
[guest0/uart0] [ 2.596610] [<ffffffff803b5a48>] driver_probe_device+0x2c/0xb0
[guest0/uart0] [ 2.612964] [<ffffffff803b5f70>] __device_attach_driver+0x62/0x9a
[guest0/uart0] [ 2.640715] [<ffffffff803b3b78>] bus_for_each_drv+0x4c/0x8a
[guest0/uart0] [ 2.659895] [<ffffffff803b5c7a>] __device_attach+0x96/0x17a
[guest0/uart0] [ 2.679809] [<ffffffff803b60be>] device_initial_probe+0xe/0x16
[guest0/uart0] [ 2.696944] [<ffffffff803b4b80>] bus_probe_device+0x7c/0x82
[guest0/uart0] [ 2.720666] [<ffffffff803b2aec>] device_add+0x2da/0x6be
[guest0/uart0] [ 2.743288] [<ffffffff80373e4e>] register_virtio_device+0x192/0x214
[guest0/uart0] [ 2.765431] [<ffffffff80378048>] virtio_mmio_probe+0x134/0x1f2
[guest0/uart0] [ 2.795974] [<ffffffff803b77f2>] platform_probe+0x4e/0x96
[guest0/uart0] [ 2.816947] [<ffffffff803b57a8>] really_probe.part.0+0x56/0x1ec
[guest0/uart0] [ 2.832840] [<ffffffff803b59ae>] __driver_probe_device+0x70/0xde
[guest0/uart0] [ 2.854298] [<ffffffff803b5a48>] driver_probe_device+0x2c/0xb0
[guest0/uart0] [ 2.874422] [<ffffffff803b6008>] __driver_attach+0x60/0x108
[guest0/uart0] [ 2.897804] [<ffffffff803b3af2>] bus_for_each_dev+0x4a/0x84
[guest0/uart0] [ 2.927286] [<ffffffff803b522e>] driver_attach+0x1a/0x22
[guest0/uart0] [ 2.957400] [<ffffffff803b4d9e>] bus_add_driver+0x12c/0x196
[guest0/uart0] [ 2.977573] [<ffffffff803b677a>] driver_register+0x48/0xdc
[guest0/uart0] [ 3.001172] [<ffffffff803b7564>] __platform_driver_register+0x1c/0x24
[guest0/uart0] [ 3.028986] [<ffffffff8081c11c>] virtio_mmio_init+0x1a/0x22
[guest0/uart0] [ 3.047038] [<ffffffff800020dc>] do_one_initcall+0x38/0x174
[guest0/uart0] [ 3.071633] [<ffffffff80800fca>] kernel_init_freeable+0x1a6/0x20a
[guest0/uart0] [ 3.095840] [<ffffffff80652d04>] kernel_init+0x1e/0x10a
[guest0/uart0] [ 3.120816] [<ffffffff800032dc>] ret_from_exception+0x0/0xc
[guest0/uart0] [ 3.143400] ---[ end trace 0000000000000000 ]---

I am not claiming that this patch does the right thing, but with this patch
VirtIO RPMSG starts working again for me on Xvisor Guest/VM.

Upon further investigation, it seems this warning is caused by the recent kernel
commit mentioned in the "Fixes:" tag. It looks like with the latest 5.19-rcX, we can't
call virtqueue_add_inbuf() until the virtqueue is marked as unbroken by
virtio_device_ready().
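
To make the ordering problem concrete, here is a minimal user-space sketch,
not kernel code. All model_* names are hypothetical stand-ins. It assumes
only what the thread states: a new virtqueue starts out broken, adding a
buffer fails while it is broken, and the ready call unbreaks it. The -EIO
return value is an assumption of the model; it would be consistent with a0
reading fffffffffffffffb (-5) in the register dump above, but the dump alone
does not prove it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a virtqueue; not the kernel structure. */
struct model_vq {
	bool broken;	/* starts true, as described for new virtqueues */
	int posted;	/* buffers accepted so far */
};

static void model_vq_init(struct model_vq *vq)
{
	vq->broken = true;
	vq->posted = 0;
}

/* Models only the "unbreak" effect of virtio_device_ready(). */
static void model_device_ready(struct model_vq *vq)
{
	vq->broken = false;
}

/* Models virtqueue_add_inbuf(): refuse buffers while the queue is broken. */
static int model_add_inbuf(struct model_vq *vq)
{
	if (vq->broken)
		return -EIO;	/* assumed error code, see the note above */
	vq->posted++;
	return 0;
}

/*
 * Post n receive buffers, calling the ready step either before or after
 * the fill loop. Returns how many add calls failed.
 */
static int model_probe(struct model_vq *vq, int n, bool ready_first)
{
	int failures = 0;

	assert(n >= 0);
	model_vq_init(vq);
	if (ready_first)
		model_device_ready(vq);
	for (int i = 0; i < n; i++) {
		if (model_add_inbuf(vq) != 0)
			failures++;
	}
	if (!ready_first)
		model_device_ready(vq);
	return failures;
}
```

Calling model_probe(&vq, 4, false) mirrors the pre-patch order and fails on
every buffer; model_probe(&vq, 4, true) mirrors the patched order and posts
all four. The model says nothing about callbacks arriving early, which is
the concern raised against the patched order.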

Regards,
Anup

>
> What does make me curious though is that nobody on the remoteproc mailing list
> has complained about commit 8b4ec69d7e09 breaking their environment... By now,
> i.e. rc4, that should have happened. Anyone from TI, ST and Xilinx care to test this on
> their rig?
>
> Thanks,
> Mathieu
>
> > /* set up the receive buffers */
> > for (i = 0; i < vrp->num_bufs / 2; i++) {
> > struct scatterlist sg;
> > @@ -983,9 +986,6 @@ static int rpmsg_probe(struct virtio_device *vdev)
> > */
> > notify = virtqueue_kick_prepare(vrp->rvq);
> >
> > - /* From this point on, we can notify and get callbacks. */
> > - virtio_device_ready(vdev);
> > -
> > /* tell the remote processor it can start sending messages */
> > /*
> > * this might be concurrent with callbacks, but we are only
> > --
> > 2.34.1
> >

2022-06-30 16:38:35

by Arnaud Pouliquen

Subject: Re: [PATCH] rpmsg: virtio: Fix broken rpmsg_probe()

Hi,

On 6/29/22 19:43, Mathieu Poirier wrote:
> Hi Anup,
>
> On Wed, Jun 08, 2022 at 10:43:34PM +0530, Anup Patel wrote:
>> The rpmsg_probe() is broken at the moment because virtqueue_add_inbuf()
>> fails due to both virtqueues (Rx and Tx) marked as broken by the
>> __vring_new_virtqueue() function. To solve this, virtio_device_ready()
>> (which unbreaks queues) should be called before virtqueue_add_inbuf().
>>
>> Fixes: 8b4ec69d7e09 ("virtio: harden vring IRQ")
>> Signed-off-by: Anup Patel <[email protected]>
>> ---
>> drivers/rpmsg/virtio_rpmsg_bus.c | 6 +++---
>> 1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/rpmsg/virtio_rpmsg_bus.c b/drivers/rpmsg/virtio_rpmsg_bus.c
>> index 905ac7910c98..71a64d2c7644 100644
>> --- a/drivers/rpmsg/virtio_rpmsg_bus.c
>> +++ b/drivers/rpmsg/virtio_rpmsg_bus.c
>> @@ -929,6 +929,9 @@ static int rpmsg_probe(struct virtio_device *vdev)
>> /* and half is dedicated for TX */
>> vrp->sbufs = bufs_va + total_buf_space / 2;
>>
>> + /* From this point on, we can notify and get callbacks. */
>> + virtio_device_ready(vdev);
>> +
>
> Calling virtio_device_ready() here means that virtqueue_get_buf_ctx_split() can
> potentially be called (by way of rpmsg_recv_done()), which will race with
> virtqueue_add_inbuf(). If buffers in the virtqueue aren't available then
> rpmsg_recv_done() will fail, potentially breaking remote processors' state
> machines that don't expect their initial name service to fail when the "device"
> has been marked as ready.
>
> What does make me curious though is that nobody on the remoteproc mailing list
> has complained about commit 8b4ec69d7e09 breaking their environment... By now,
> i.e. rc4, that should have happened. Anyone from TI, ST and Xilinx care to test this on
> their rig?

I tested on an STM32MP1 board using tag v5.19-rc4 (03c765b0e3b4).
I confirm the issue!

Concerning the solution, I share Mathieu's concern. This could break legacy.
I ran a quick test and would suggest using __virtio_unbreak_device() instead,
to unbreak the virtqueues without changing the init sequence.

In this case the patch would be:

+ /*
+ * Unbreak the virtqueues to allow to add buffers before setting the vdev status
+ * to ready
+ */
+ __virtio_unbreak_device(vdev);
+

/* set up the receive buffers */
for (i = 0; i < vrp->num_bufs / 2; i++) {
struct scatterlist sg;
void *cpu_addr = vrp->rbufs + i * vrp->buf_size;

Regards,
Arnaud

>
> Thanks,
> Mathieu
>
>> /* set up the receive buffers */
>> for (i = 0; i < vrp->num_bufs / 2; i++) {
>> struct scatterlist sg;
>> @@ -983,9 +986,6 @@ static int rpmsg_probe(struct virtio_device *vdev)
>> */
>> notify = virtqueue_kick_prepare(vrp->rvq);
>>
>> - /* From this point on, we can notify and get callbacks. */
>> - virtio_device_ready(vdev);
>> -
>> /* tell the remote processor it can start sending messages */
>> /*
>> * this might be concurrent with callbacks, but we are only
>> --
>> 2.34.1
>>

2022-06-30 18:02:29

by Mathieu Poirier

Subject: Re: [PATCH] rpmsg: virtio: Fix broken rpmsg_probe()

+ [email protected]
+ [email protected]
+ [email protected]

On Thu, 30 Jun 2022 at 10:20, Arnaud POULIQUEN
<[email protected]> wrote:
>
> Hi,
>
> On 6/29/22 19:43, Mathieu Poirier wrote:
> > Hi Anup,
> >
> > On Wed, Jun 08, 2022 at 10:43:34PM +0530, Anup Patel wrote:
> >> The rpmsg_probe() is broken at the moment because virtqueue_add_inbuf()
> >> fails due to both virtqueues (Rx and Tx) marked as broken by the
> >> __vring_new_virtqueue() function. To solve this, virtio_device_ready()
> >> (which unbreaks queues) should be called before virtqueue_add_inbuf().
> >>
> >> Fixes: 8b4ec69d7e09 ("virtio: harden vring IRQ")
> >> Signed-off-by: Anup Patel <[email protected]>
> >> ---
> >> drivers/rpmsg/virtio_rpmsg_bus.c | 6 +++---
> >> 1 file changed, 3 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/drivers/rpmsg/virtio_rpmsg_bus.c b/drivers/rpmsg/virtio_rpmsg_bus.c
> >> index 905ac7910c98..71a64d2c7644 100644
> >> --- a/drivers/rpmsg/virtio_rpmsg_bus.c
> >> +++ b/drivers/rpmsg/virtio_rpmsg_bus.c
> >> @@ -929,6 +929,9 @@ static int rpmsg_probe(struct virtio_device *vdev)
> >> /* and half is dedicated for TX */
> >> vrp->sbufs = bufs_va + total_buf_space / 2;
> >>
> >> + /* From this point on, we can notify and get callbacks. */
> >> + virtio_device_ready(vdev);
> >> +
> >
> > Calling virtio_device_ready() here means that virtqueue_get_buf_ctx_split() can
> > potentially be called (by way of rpmsg_recv_done()), which will race with
> > virtqueue_add_inbuf(). If buffers in the virtqueue aren't available then
> > rpmsg_recv_done() will fail, potentially breaking remote processors' state
> > machines that don't expect their initial name service to fail when the "device"
> > has been marked as ready.
> >
> > What does make me curious though is that nobody on the remoteproc mailing list
> > has complained about commit 8b4ec69d7e09 breaking their environment... By now,
> > i.e. rc4, that should have happened. Anyone from TI, ST and Xilinx care to test this on
> > their rig?
>
> I tested on an STM32MP1 board using tag v5.19-rc4 (03c765b0e3b4).
> I confirm the issue!
>
> Concerning the solution, I share Mathieu's concern. This could break legacy.
> I ran a quick test and would suggest using __virtio_unbreak_device() instead,
> to unbreak the virtqueues without changing the init sequence.
>
> In this case the patch would be:
>
> + /*
> + * Unbreak the virtqueues to allow to add buffers before setting the vdev status
> + * to ready
> + */
> + __virtio_unbreak_device(vdev);
> +
>
> /* set up the receive buffers */
> for (i = 0; i < vrp->num_bufs / 2; i++) {
> struct scatterlist sg;
> void *cpu_addr = vrp->rbufs + i * vrp->buf_size;

This will indeed fix the problem. On the flip side the kernel
documentation for __virtio_unbreak_device() puzzles me...
It clearly states that it should be used for probing and restoring but
_not_ directly by the driver. Function rpmsg_probe() is part of
probing but also the entry point to a driver.

Michael and virtualisation folks, is this the right way to move forward?

>
> Regards,
> Arnaud
>
> >
> > Thanks,
> > Mathieu
> >
> >> /* set up the receive buffers */
> >> for (i = 0; i < vrp->num_bufs / 2; i++) {
> >> struct scatterlist sg;
> >> @@ -983,9 +986,6 @@ static int rpmsg_probe(struct virtio_device *vdev)
> >> */
> >> notify = virtqueue_kick_prepare(vrp->rvq);
> >>
> >> - /* From this point on, we can notify and get callbacks. */
> >> - virtio_device_ready(vdev);
> >> -
> >> /* tell the remote processor it can start sending messages */
> >> /*
> >> * this might be concurrent with callbacks, but we are only
> >> --
> >> 2.34.1
> >>

2022-06-30 19:26:36

by Michael S. Tsirkin

Subject: Re: [PATCH] rpmsg: virtio: Fix broken rpmsg_probe()

On Wed, Jun 08, 2022 at 10:43:34PM +0530, Anup Patel wrote:
> The rpmsg_probe() is broken at the moment because virtqueue_add_inbuf()
> fails due to both virtqueues (Rx and Tx) marked as broken by the
> __vring_new_virtqueue() function. To solve this, virtio_device_ready()
> (which unbreaks queues) should be called before virtqueue_add_inbuf().
>
> Fixes: 8b4ec69d7e09 ("virtio: harden vring IRQ")
> Signed-off-by: Anup Patel <[email protected]>


Yea, I've more or less reverted that. (There's VIRTIO_HARDEN_NOTIFICATION
that we'll use to try and develop this to something more reasonable).

> ---
> drivers/rpmsg/virtio_rpmsg_bus.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/rpmsg/virtio_rpmsg_bus.c b/drivers/rpmsg/virtio_rpmsg_bus.c
> index 905ac7910c98..71a64d2c7644 100644
> --- a/drivers/rpmsg/virtio_rpmsg_bus.c
> +++ b/drivers/rpmsg/virtio_rpmsg_bus.c
> @@ -929,6 +929,9 @@ static int rpmsg_probe(struct virtio_device *vdev)
> /* and half is dedicated for TX */
> vrp->sbufs = bufs_va + total_buf_space / 2;
>
> + /* From this point on, we can notify and get callbacks. */
> + virtio_device_ready(vdev);
> +
> /* set up the receive buffers */
> for (i = 0; i < vrp->num_bufs / 2; i++) {
> struct scatterlist sg;
> @@ -983,9 +986,6 @@ static int rpmsg_probe(struct virtio_device *vdev)
> */
> notify = virtqueue_kick_prepare(vrp->rvq);
>
> - /* From this point on, we can notify and get callbacks. */
> - virtio_device_ready(vdev);
> -
> /* tell the remote processor it can start sending messages */
> /*
> * this might be concurrent with callbacks, but we are only
> --
> 2.34.1
>
>

2022-06-30 19:42:46

by Michael S. Tsirkin

Subject: Re: [PATCH] rpmsg: virtio: Fix broken rpmsg_probe()

On Thu, Jun 30, 2022 at 11:51:30AM -0600, Mathieu Poirier wrote:
> + [email protected]
> + [email protected]
> + [email protected]
>
> On Thu, 30 Jun 2022 at 10:20, Arnaud POULIQUEN
> <[email protected]> wrote:
> >
> > Hi,
> >
> > On 6/29/22 19:43, Mathieu Poirier wrote:
> > > Hi Anup,
> > >
> > > On Wed, Jun 08, 2022 at 10:43:34PM +0530, Anup Patel wrote:
> > >> The rpmsg_probe() is broken at the moment because virtqueue_add_inbuf()
> > >> fails due to both virtqueues (Rx and Tx) marked as broken by the
> > >> __vring_new_virtqueue() function. To solve this, virtio_device_ready()
> > >> (which unbreaks queues) should be called before virtqueue_add_inbuf().
> > >>
> > >> Fixes: 8b4ec69d7e09 ("virtio: harden vring IRQ")
> > >> Signed-off-by: Anup Patel <[email protected]>
> > >> ---
> > >> drivers/rpmsg/virtio_rpmsg_bus.c | 6 +++---
> > >> 1 file changed, 3 insertions(+), 3 deletions(-)
> > >>
> > >> diff --git a/drivers/rpmsg/virtio_rpmsg_bus.c b/drivers/rpmsg/virtio_rpmsg_bus.c
> > >> index 905ac7910c98..71a64d2c7644 100644
> > >> --- a/drivers/rpmsg/virtio_rpmsg_bus.c
> > >> +++ b/drivers/rpmsg/virtio_rpmsg_bus.c
> > >> @@ -929,6 +929,9 @@ static int rpmsg_probe(struct virtio_device *vdev)
> > >> /* and half is dedicated for TX */
> > >> vrp->sbufs = bufs_va + total_buf_space / 2;
> > >>
> > >> + /* From this point on, we can notify and get callbacks. */
> > >> + virtio_device_ready(vdev);
> > >> +
> > >
> > > Calling virtio_device_ready() here means that virtqueue_get_buf_ctx_split() can
> > > potentially be called (by way of rpmsg_recv_done()), which will race with
> > > virtqueue_add_inbuf(). If buffers in the virtqueue aren't available then
> > > rpmsg_recv_done() will fail, potentially breaking remote processors' state
> > > machines that don't expect their initial name service to fail when the "device"
> > > has been marked as ready.
> > >
> > > What does make me curious though is that nobody on the remoteproc mailing list
> > > has complained about commit 8b4ec69d7e09 breaking their environment... By now,
> > > i.e. rc4, that should have happened. Anyone from TI, ST and Xilinx care to test this on
> > > their rig?
> >
> > I tested on an STM32MP1 board using tag v5.19-rc4 (03c765b0e3b4).
> > I confirm the issue!
> >
> > Concerning the solution, I share Mathieu's concern. This could break legacy.
> > I ran a quick test and would suggest using __virtio_unbreak_device() instead,
> > to unbreak the virtqueues without changing the init sequence.
> >
> > In this case the patch would be:
> >
> > + /*
> > + * Unbreak the virtqueues to allow to add buffers before setting the vdev status
> > + * to ready
> > + */
> > + __virtio_unbreak_device(vdev);
> > +
> >
> > /* set up the receive buffers */
> > for (i = 0; i < vrp->num_bufs / 2; i++) {
> > struct scatterlist sg;
> > void *cpu_addr = vrp->rbufs + i * vrp->buf_size;
>
> This will indeed fix the problem. On the flip side the kernel
> documentation for __virtio_unbreak_device() puzzles me...
> It clearly states that it should be used for probing and restoring but
> _not_ directly by the driver. Function rpmsg_probe() is part of
> probing but also the entry point to a driver.
>
> Michael and virtualisation folks, is this the right way to move forward?

I don't think it is, __virtio_unbreak_device is intended for core use.

> >
> > Regards,
> > Arnaud
> >
> > >
> > > Thanks,
> > > Mathieu
> > >
> > >> /* set up the receive buffers */
> > >> for (i = 0; i < vrp->num_bufs / 2; i++) {
> > >> struct scatterlist sg;
> > >> @@ -983,9 +986,6 @@ static int rpmsg_probe(struct virtio_device *vdev)
> > >> */
> > >> notify = virtqueue_kick_prepare(vrp->rvq);
> > >>
> > >> - /* From this point on, we can notify and get callbacks. */
> > >> - virtio_device_ready(vdev);
> > >> -
> > >> /* tell the remote processor it can start sending messages */
> > >> /*
> > >> * this might be concurrent with callbacks, but we are only
> > >> --
> > >> 2.34.1
> > >>

2022-06-30 19:50:12

by Michael S. Tsirkin

Subject: Re: [PATCH] rpmsg: virtio: Fix broken rpmsg_probe()

On Wed, Jun 29, 2022 at 11:43:18AM -0600, Mathieu Poirier wrote:
> Hi Anup,
>
> On Wed, Jun 08, 2022 at 10:43:34PM +0530, Anup Patel wrote:
> > The rpmsg_probe() is broken at the moment because virtqueue_add_inbuf()
> > fails due to both virtqueues (Rx and Tx) marked as broken by the
> > __vring_new_virtqueue() function. To solve this, virtio_device_ready()
> > (which unbreaks queues) should be called before virtqueue_add_inbuf().
> >
> > Fixes: 8b4ec69d7e09 ("virtio: harden vring IRQ")
> > Signed-off-by: Anup Patel <[email protected]>
> > ---
> > drivers/rpmsg/virtio_rpmsg_bus.c | 6 +++---
> > 1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/rpmsg/virtio_rpmsg_bus.c b/drivers/rpmsg/virtio_rpmsg_bus.c
> > index 905ac7910c98..71a64d2c7644 100644
> > --- a/drivers/rpmsg/virtio_rpmsg_bus.c
> > +++ b/drivers/rpmsg/virtio_rpmsg_bus.c
> > @@ -929,6 +929,9 @@ static int rpmsg_probe(struct virtio_device *vdev)
> > /* and half is dedicated for TX */
> > vrp->sbufs = bufs_va + total_buf_space / 2;
> >
> > + /* From this point on, we can notify and get callbacks. */
> > + virtio_device_ready(vdev);
> > +
>
> Calling virtio_device_ready() here means that virtqueue_get_buf_ctx_split() can
> potentially be called (by way of rpmsg_recv_done()), which will race with
> virtqueue_add_inbuf(). If buffers in the virtqueue aren't available then
> rpmsg_recv_done() will fail, potentially breaking remote processors' state
> machines that don't expect their initial name service to fail when the "device"
> has been marked as ready.

When you say available, I am guessing you really mean used.

With a non broken device you won't get a callback
until some buffers have been used.

Or, if no used buffers are present then you will get another
callback down the road.
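
A small user-space sketch of that behaviour, under stated assumptions: the
model_* names are hypothetical, the ring is a toy FIFO rather than a real
vring, and the callback is modelled as a drain loop that stops when no used
buffer is left. It is meant only to illustrate why an early callback need
not be harmful, not to describe what rpmsg_recv_done() actually does.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_RING 4

/* Toy FIFO of buffers the remote side has marked as used. */
struct model_rvq {
	void *used[MODEL_RING];
	size_t head;
	size_t count;
};

/* Remote side: hand a filled buffer back. Returns -1 if the ring is full. */
static int model_mark_used(struct model_rvq *vq, void *buf)
{
	assert(vq->count <= MODEL_RING);
	if (vq->count == MODEL_RING)
		return -1;
	vq->used[(vq->head + vq->count) % MODEL_RING] = buf;
	vq->count++;
	return 0;
}

/* Models virtqueue_get_buf(): NULL when nothing has been used yet. */
static void *model_get_buf(struct model_rvq *vq)
{
	void *buf;

	if (vq->count == 0)
		return NULL;
	buf = vq->used[vq->head];
	vq->head = (vq->head + 1) % MODEL_RING;
	vq->count--;
	return buf;
}

/* Models a receive callback: drain used buffers, report how many. */
static int model_recv_done(struct model_rvq *vq)
{
	int handled = 0;

	while (model_get_buf(vq) != NULL)
		handled++;
	return handled;
}
```

In this model a callback that fires before any buffer has been used handles
zero messages and returns; once the remote side marks buffers as used, the
next callback picks them up.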


>
> What does make me curious though is that nobody on the remoteproc mailing list
> has complained about commit 8b4ec69d7e09 breaking their environment... By now,
> i.e. rc4, that should have happened. Anyone from TI, ST and Xilinx care to test this on
> their rig?
>
> Thanks,
> Mathieu
>
> > /* set up the receive buffers */
> > for (i = 0; i < vrp->num_bufs / 2; i++) {
> > struct scatterlist sg;
> > @@ -983,9 +986,6 @@ static int rpmsg_probe(struct virtio_device *vdev)
> > */
> > notify = virtqueue_kick_prepare(vrp->rvq);
> >
> > - /* From this point on, we can notify and get callbacks. */
> > - virtio_device_ready(vdev);
> > -
> > /* tell the remote processor it can start sending messages */
> > /*
> > * this might be concurrent with callbacks, but we are only
> > --
> > 2.34.1
> >
>

2022-07-01 01:28:00

by Jason Wang

Subject: Re: [PATCH] rpmsg: virtio: Fix broken rpmsg_probe()

On Fri, Jul 1, 2022 at 3:20 AM Michael S. Tsirkin <[email protected]> wrote:
>
> On Thu, Jun 30, 2022 at 11:51:30AM -0600, Mathieu Poirier wrote:
> > + [email protected]
> > + [email protected]
> > + [email protected]
> >
> > On Thu, 30 Jun 2022 at 10:20, Arnaud POULIQUEN
> > <[email protected]> wrote:
> > >
> > > Hi,
> > >
> > > On 6/29/22 19:43, Mathieu Poirier wrote:
> > > > Hi Anup,
> > > >
> > > > On Wed, Jun 08, 2022 at 10:43:34PM +0530, Anup Patel wrote:
> > > >> The rpmsg_probe() is broken at the moment because virtqueue_add_inbuf()
> > > >> fails due to both virtqueues (Rx and Tx) marked as broken by the
> > > >> __vring_new_virtqueue() function. To solve this, virtio_device_ready()
> > > >> (which unbreaks queues) should be called before virtqueue_add_inbuf().
> > > >>
> > > >> Fixes: 8b4ec69d7e09 ("virtio: harden vring IRQ")
> > > >> Signed-off-by: Anup Patel <[email protected]>
> > > >> ---
> > > >> drivers/rpmsg/virtio_rpmsg_bus.c | 6 +++---
> > > >> 1 file changed, 3 insertions(+), 3 deletions(-)
> > > >>
> > > >> diff --git a/drivers/rpmsg/virtio_rpmsg_bus.c b/drivers/rpmsg/virtio_rpmsg_bus.c
> > > >> index 905ac7910c98..71a64d2c7644 100644
> > > >> --- a/drivers/rpmsg/virtio_rpmsg_bus.c
> > > >> +++ b/drivers/rpmsg/virtio_rpmsg_bus.c
> > > >> @@ -929,6 +929,9 @@ static int rpmsg_probe(struct virtio_device *vdev)
> > > >> /* and half is dedicated for TX */
> > > >> vrp->sbufs = bufs_va + total_buf_space / 2;
> > > >>
> > > >> + /* From this point on, we can notify and get callbacks. */
> > > >> + virtio_device_ready(vdev);
> > > >> +
> > > >
> > > > Calling virtio_device_ready() here means that virtqueue_get_buf_ctx_split() can
> > > > potentially be called (by way of rpmsg_recv_done()), which will race with
> > > > virtqueue_add_inbuf(). If buffers in the virtqueue aren't available then
> > > > rpmsg_recv_done() will fail, potentially breaking remote processors' state
> > > > machines that don't expect their initial name service to fail when the "device"
> > > > has been marked as ready.
> > > >
> > > > What does make me curious though is that nobody on the remoteproc mailing list
> > > > has complained about commit 8b4ec69d7e09 breaking their environment... By now,
> > > > i.e. rc4, that should have happened. Anyone from TI, ST and Xilinx care to test this on
> > > > their rig?
> > >
> > > I tested on an STM32MP1 board using tag v5.19-rc4 (03c765b0e3b4).
> > > I confirm the issue!
> > >
> > > Concerning the solution, I share Mathieu's concern. This could break legacy.
> > > I ran a quick test and would suggest using __virtio_unbreak_device() instead,
> > > to unbreak the virtqueues without changing the init sequence.
> > >
> > > In this case the patch would be:
> > >
> > > + /*
> > > + * Unbreak the virtqueues to allow to add buffers before setting the vdev status
> > > + * to ready
> > > + */
> > > + __virtio_unbreak_device(vdev);
> > > +
> > >
> > > /* set up the receive buffers */
> > > for (i = 0; i < vrp->num_bufs / 2; i++) {
> > > struct scatterlist sg;
> > > void *cpu_addr = vrp->rbufs + i * vrp->buf_size;
> >
> > This will indeed fix the problem. On the flip side the kernel
> > documentation for __virtio_unbreak_device() puzzles me...
> > It clearly states that it should be used for probing and restoring but
> > _not_ directly by the driver. Function rpmsg_probe() is part of
> > probing but also the entry point to a driver.
> >
> > Michael and virtualisation folks, is this the right way to move forward?
>
> I don't think it is, __virtio_unbreak_device is intended for core use.

Can we fill the rx after virtio_device_ready() in this case?

Btw, the driver sets DRIVER_OK after registering, so we can probably get an svq
kick before DRIVER_OK?

Thanks

>
> > >
> > > Regards,
> > > Arnaud
> > >
> > > >
> > > > Thanks,
> > > > Mathieu
> > > >
> > > >> /* set up the receive buffers */
> > > >> for (i = 0; i < vrp->num_bufs / 2; i++) {
> > > >> struct scatterlist sg;
> > > >> @@ -983,9 +986,6 @@ static int rpmsg_probe(struct virtio_device *vdev)
> > > >> */
> > > >> notify = virtqueue_kick_prepare(vrp->rvq);
> > > >>
> > > >> - /* From this point on, we can notify and get callbacks. */
> > > >> - virtio_device_ready(vdev);
> > > >> -
> > > >> /* tell the remote processor it can start sending messages */
> > > >> /*
> > > >> * this might be concurrent with callbacks, but we are only
> > > >> --
> > > >> 2.34.1
> > > >>
>

2022-07-01 06:40:45

by Michael S. Tsirkin

Subject: Re: [PATCH] rpmsg: virtio: Fix broken rpmsg_probe()

On Fri, Jul 01, 2022 at 09:22:15AM +0800, Jason Wang wrote:
> On Fri, Jul 1, 2022 at 3:20 AM Michael S. Tsirkin <[email protected]> wrote:
> >
> > On Thu, Jun 30, 2022 at 11:51:30AM -0600, Mathieu Poirier wrote:
> > > + [email protected]
> > > + [email protected]
> > > + [email protected]
> > >
> > > On Thu, 30 Jun 2022 at 10:20, Arnaud POULIQUEN
> > > <[email protected]> wrote:
> > > >
> > > > Hi,
> > > >
> > > > On 6/29/22 19:43, Mathieu Poirier wrote:
> > > > > Hi Anup,
> > > > >
> > > > > On Wed, Jun 08, 2022 at 10:43:34PM +0530, Anup Patel wrote:
> > > > >> The rpmsg_probe() is broken at the moment because virtqueue_add_inbuf()
> > > > >> fails due to both virtqueues (Rx and Tx) marked as broken by the
> > > > >> __vring_new_virtqueue() function. To solve this, virtio_device_ready()
> > > > >> (which unbreaks queues) should be called before virtqueue_add_inbuf().
> > > > >>
> > > > >> Fixes: 8b4ec69d7e09 ("virtio: harden vring IRQ")
> > > > >> Signed-off-by: Anup Patel <[email protected]>
> > > > >> ---
> > > > >> drivers/rpmsg/virtio_rpmsg_bus.c | 6 +++---
> > > > >> 1 file changed, 3 insertions(+), 3 deletions(-)
> > > > >>
> > > > >> diff --git a/drivers/rpmsg/virtio_rpmsg_bus.c b/drivers/rpmsg/virtio_rpmsg_bus.c
> > > > >> index 905ac7910c98..71a64d2c7644 100644
> > > > >> --- a/drivers/rpmsg/virtio_rpmsg_bus.c
> > > > >> +++ b/drivers/rpmsg/virtio_rpmsg_bus.c
> > > > >> @@ -929,6 +929,9 @@ static int rpmsg_probe(struct virtio_device *vdev)
> > > > >> /* and half is dedicated for TX */
> > > > >> vrp->sbufs = bufs_va + total_buf_space / 2;
> > > > >>
> > > > >> + /* From this point on, we can notify and get callbacks. */
> > > > >> + virtio_device_ready(vdev);
> > > > >> +
> > > > >
> > > > > Calling virtio_device_ready() here means that virtqueue_get_buf_ctx_split() can
> > > > > potentially be called (by way of rpmsg_recv_done()), which will race with
> > > > > virtqueue_add_inbuf(). If buffers in the virtqueue aren't available then
> > > > > rpmsg_recv_done() will fail, potentially breaking remote processors' state
> > > > > machines that don't expect their initial name service to fail when the "device"
> > > > > has been marked as ready.
> > > > >
> > > > > What does make me curious though is that nobody on the remoteproc mailing list
> > > > > has complained about commit 8b4ec69d7e09 breaking their environment... By now,
> > > > > i.e rc4, that should have happened. Anyone from TI, ST and Xilinx care to test this on
> > > > > their rig?
> > > >
> > > > I tested on STm32mp1 board using tag v5.19-rc4(03c765b0e3b4)
> > > > I confirm the issue!
> > > >
> > > > Concerning the solution, I share Mathieu's concern. This could break legacy.
> > > > I made a short test and I would suggest to use __virtio_unbreak_device instead, tounbreak the virtqueues without changing the init sequence.
> > > >
> > > > I this case the patch would be:
> > > >
> > > > + /*
> > > > + * Unbreak the virtqueues to allow to add buffers before setting the vdev status
> > > > + * to ready
> > > > + */
> > > > + __virtio_unbreak_device(vdev);
> > > > +
> > > >
> > > > /* set up the receive buffers */
> > > > for (i = 0; i < vrp->num_bufs / 2; i++) {
> > > > struct scatterlist sg;
> > > > void *cpu_addr = vrp->rbufs + i * vrp->buf_size;
> > >
> > > This will indeed fix the problem. On the flip side the kernel
> > > documentation for __virtio_unbreak_device() puzzles me...
> > > It clearly states that it should be used for probing and restoring but
> > > _not_ directly by the driver. Function rpmsg_probe() is part of
> > > probing but also the entry point to a driver.
> > >
> > > Michael and virtualisation folks, is this the right way to move forward?
> >
> > I don't think it is, __virtio_unbreak_device is intended for core use.
>
> Can we fill the rx after virtio_device_ready() in this case?
>
> Btw, the driver set driver ok after registering, we probably get a svq
> kick before DRIVER_OK?
>
> Thanks

Is this an ack for the original patch?

> >
> > > >
> > > > Regards,
> > > > Arnaud
> > > >
> > > > >
> > > > > Thanks,
> > > > > Mathieu
> > > > >
> > > > >> /* set up the receive buffers */
> > > > >> for (i = 0; i < vrp->num_bufs / 2; i++) {
> > > > >> struct scatterlist sg;
> > > > >> @@ -983,9 +986,6 @@ static int rpmsg_probe(struct virtio_device *vdev)
> > > > >> */
> > > > >> notify = virtqueue_kick_prepare(vrp->rvq);
> > > > >>
> > > > >> - /* From this point on, we can notify and get callbacks. */
> > > > >> - virtio_device_ready(vdev);
> > > > >> -
> > > > >> /* tell the remote processor it can start sending messages */
> > > > >> /*
> > > > >> * this might be concurrent with callbacks, but we are only
> > > > >> --
> > > > >> 2.34.1
> > > > >>
> >

2022-07-01 09:07:31

by Arnaud Pouliquen

[permalink] [raw]
Subject: Re: [PATCH] rpmsg: virtio: Fix broken rpmsg_probe()

Hello,

On 6/30/22 21:19, Michael S. Tsirkin wrote:
> On Wed, Jun 29, 2022 at 11:43:18AM -0600, Mathieu Poirier wrote:
>> Hi Anup,
>>
>> On Wed, Jun 08, 2022 at 10:43:34PM +0530, Anup Patel wrote:
>>> The rpmsg_probe() is broken at the moment because virtqueue_add_inbuf()
>>> fails due to both virtqueues (Rx and Tx) marked as broken by the
>>> __vring_new_virtqueue() function. To solve this, virtio_device_ready()
>>> (which unbreaks queues) should be called before virtqueue_add_inbuf().
>>>
>>> Fixes: 8b4ec69d7e09 ("virtio: harden vring IRQ")
>>> Signed-off-by: Anup Patel <[email protected]>
>>> ---
>>> drivers/rpmsg/virtio_rpmsg_bus.c | 6 +++---
>>> 1 file changed, 3 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/rpmsg/virtio_rpmsg_bus.c b/drivers/rpmsg/virtio_rpmsg_bus.c
>>> index 905ac7910c98..71a64d2c7644 100644
>>> --- a/drivers/rpmsg/virtio_rpmsg_bus.c
>>> +++ b/drivers/rpmsg/virtio_rpmsg_bus.c
>>> @@ -929,6 +929,9 @@ static int rpmsg_probe(struct virtio_device *vdev)
>>> /* and half is dedicated for TX */
>>> vrp->sbufs = bufs_va + total_buf_space / 2;
>>>
>>> + /* From this point on, we can notify and get callbacks. */
>>> + virtio_device_ready(vdev);
>>> +
>>
>> Calling virtio_device_ready() here means that virtqueue_get_buf_ctx_split() can
>> potentially be called (by way of rpmsg_recv_done()), which will race with
>> virtqueue_add_inbuf(). If buffers in the virtqueue aren't available then
>> rpmsg_recv_done() will fail, potentially breaking remote processors' state
>> machines that don't expect their initial name service to fail when the "device"
>> has been marked as ready.
>
> When you say available I am guessing you really need used.
>
> With a non broken device you won't get a callback
> until some buffers have been used.
>
> Or, if no used buffers are present then you will get another
> callback down the road.

In the current implementation, the Linux rpmsg_virtio driver allocates the
virtio buffers that the coprocessor's rpmsg virtio device uses for
transmission, and then updates the virtio device status in shared memory to
inform the coprocessor that it is ready for inter-processor communication.

So from the coprocessor's perspective, once the virtio device is ready
(status set to VIRTIO_CONFIG_S_DRIVER_OK), it can
start to get available buffers and send virtio buffers to Linux.

With the proposed patch, the status is set to VIRTIO_CONFIG_S_DRIVER_OK
while no buffers are available for the coprocessor to transmit.

I agree that, if the Linux rpmsg_virtio driver has not allocated the
buffers yet, the coprocessor will fail to get an available virtio buffer for
communication and so "just" has to wait until some buffers are available
in the virtqueue.

But this changes the behavior and can lead to an unexpected error case
for some legacy coprocessor firmware...
Should we take the risk that this legacy firmware is no longer compatible?


That said, regarding the virtio spec 1.1 chapter 3.1.1 [1], I also wonder
whether the introduction of the virtqueue broken flag is compliant with the
spec.
But I guess this is probably a matter of interpretation...

"
The driver MUST follow this sequence to initialize a device:
[...]
7. Perform device-specific setup, including discovery of virtqueues for
the device, optional per-bus setup, reading and possibly writing the
device’s virtio configuration space, and population of virtqueues.
8. Set the DRIVER_OK status bit. At this point the device is “live”.
"

My question is: what does "and population of virtqueues" mean in point 7?

In my interpretation, the call to virtqueue_add_inbuf() populates the
RX virtqueue.
That would mean that calling virtqueue_add_inbuf() before calling
virtio_device_ready() should be possible.
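To make the ordering concern concrete, here is a minimal userspace sketch. It is not kernel code: every model_* name is hypothetical and invented for illustration, and only the status bit values are taken from the spec. It records how many RX buffers the device side would find at the moment DRIVER_OK is set, for both orderings:

```c
#include <assert.h>
#include <stdint.h>

/* Status bit values as defined by the virtio spec; the MODEL_ names are local. */
#define MODEL_S_ACKNOWLEDGE 1u
#define MODEL_S_DRIVER      2u
#define MODEL_S_DRIVER_OK   4u
#define MODEL_S_FEATURES_OK 8u

/* Hypothetical: the two init orderings discussed in this thread. */
enum model_init_order {
	MODEL_POPULATE_THEN_READY,	/* spec 3.1.1: step 7, then step 8 */
	MODEL_READY_THEN_POPULATE,	/* ordering of the proposed patch */
};

struct model_dev {
	uint8_t status;
	unsigned int rx_avail;			/* buffers currently exposed */
	unsigned int rx_avail_at_driver_ok;	/* snapshot taken at DRIVER_OK */
};

static void model_populate_rx(struct model_dev *dev, unsigned int nbufs)
{
	dev->rx_avail += nbufs;
}

static void model_set_driver_ok(struct model_dev *dev)
{
	dev->status |= MODEL_S_DRIVER_OK;
	/* This is what a remote side polling the status would see first. */
	dev->rx_avail_at_driver_ok = dev->rx_avail;
}

/* Returns the number of RX buffers visible when the device goes "live". */
static unsigned int model_run_init(enum model_init_order order,
				   unsigned int nbufs)
{
	struct model_dev dev = {
		.status = MODEL_S_ACKNOWLEDGE | MODEL_S_DRIVER |
			  MODEL_S_FEATURES_OK,
	};

	if (order == MODEL_POPULATE_THEN_READY) {
		model_populate_rx(&dev, nbufs);
		model_set_driver_ok(&dev);
	} else {
		model_set_driver_ok(&dev);
		model_populate_rx(&dev, nbufs);
	}

	assert(dev.status & MODEL_S_DRIVER_OK);
	return dev.rx_avail_at_driver_ok;
}
```

In this model, model_run_init(MODEL_POPULATE_THEN_READY, n) reports n buffers at DRIVER_OK, while the other ordering reports none. Firmware that assumes "ready" implies "buffers available" only works with the first ordering.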

Thanks,
Arnaud

[1] https://docs.oasis-open.org/virtio/virtio/v1.1/csprd01/virtio-v1.1-csprd01.html#x1-920001


>
>
>>
>> What does make me curious though is that nobody on the remoteproc mailing list
>> has complained about commit 8b4ec69d7e09 breaking their environment... By now,
>> i.e rc4, that should have happened. Anyone from TI, ST and Xilinx care to test this on
>> their rig?
>>
>> Thanks,
>> Mathieu
>>
>>> /* set up the receive buffers */
>>> for (i = 0; i < vrp->num_bufs / 2; i++) {
>>> struct scatterlist sg;
>>> @@ -983,9 +986,6 @@ static int rpmsg_probe(struct virtio_device *vdev)
>>> */
>>> notify = virtqueue_kick_prepare(vrp->rvq);
>>>
>>> - /* From this point on, we can notify and get callbacks. */
>>> - virtio_device_ready(vdev);
>>> -
>>> /* tell the remote processor it can start sending messages */
>>> /*
>>> * this might be concurrent with callbacks, but we are only
>>> --
>>> 2.34.1
>>>
>>
>

2022-07-01 09:47:39

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH] rpmsg: virtio: Fix broken rpmsg_probe()

On Fri, Jul 01, 2022 at 11:00:28AM +0200, Arnaud POULIQUEN wrote:
> Hello,
>
> On 6/30/22 21:19, Michael S. Tsirkin wrote:
> > On Wed, Jun 29, 2022 at 11:43:18AM -0600, Mathieu Poirier wrote:
> >> Hi Anup,
> >>
> >> On Wed, Jun 08, 2022 at 10:43:34PM +0530, Anup Patel wrote:
> >>> The rpmsg_probe() is broken at the moment because virtqueue_add_inbuf()
> >>> fails due to both virtqueues (Rx and Tx) marked as broken by the
> >>> __vring_new_virtqueue() function. To solve this, virtio_device_ready()
> >>> (which unbreaks queues) should be called before virtqueue_add_inbuf().
> >>>
> >>> Fixes: 8b4ec69d7e09 ("virtio: harden vring IRQ")
> >>> Signed-off-by: Anup Patel <[email protected]>
> >>> ---
> >>> drivers/rpmsg/virtio_rpmsg_bus.c | 6 +++---
> >>> 1 file changed, 3 insertions(+), 3 deletions(-)
> >>>
> >>> diff --git a/drivers/rpmsg/virtio_rpmsg_bus.c b/drivers/rpmsg/virtio_rpmsg_bus.c
> >>> index 905ac7910c98..71a64d2c7644 100644
> >>> --- a/drivers/rpmsg/virtio_rpmsg_bus.c
> >>> +++ b/drivers/rpmsg/virtio_rpmsg_bus.c
> >>> @@ -929,6 +929,9 @@ static int rpmsg_probe(struct virtio_device *vdev)
> >>> /* and half is dedicated for TX */
> >>> vrp->sbufs = bufs_va + total_buf_space / 2;
> >>>
> >>> + /* From this point on, we can notify and get callbacks. */
> >>> + virtio_device_ready(vdev);
> >>> +
> >>
> >> Calling virtio_device_ready() here means that virtqueue_get_buf_ctx_split() can
> >> potentially be called (by way of rpmsg_recv_done()), which will race with
> >> virtqueue_add_inbuf(). If buffers in the virtqueue aren't available then
> >> rpmsg_recv_done() will fail, potentially breaking remote processors' state
> >> machines that don't expect their initial name service to fail when the "device"
> >> has been marked as ready.
> >
> > When you say available I am guessing you really need used.
> >
> > With a non broken device you won't get a callback
> > until some buffers have been used.
> >
> > Or, if no used buffers are present then you will get another
> > callback down the road.
>
> In current implementation the Linux rpmsg_virtio driver allocates the
> virtio buffers for the coprocessor rpmsg virtio device transmission and
> then updates the virtio device status in shared memory to inform the
> coprocessor that it is ready for inter-processor communication.
>
> So from coprocessor perspective, when the virtio device is ready
> (set to VIRTIO_CONFIG_S_DRIVER_OK), it can
> start to get available buffers and send virtio buffers to the Linux.
>
> With the patch proposed, the virtio is set to VIRTIO_CONFIG_S_DRIVER_OK
> while no buffer are available for the coprocessor transmission.
>
> I'm agree that, if the Linux rpmsg_virtio driver has not allocated the
> buffer, the coprocessor will fail to get available virtio buffer for
> communication and so has "just" to wait that some buffers are available
> in the virtqueue.
>
> But this change the behavior and can lead to an unexpected error case
> for some legacy coprocessor firmware...
> Should we take the risk that this legacy is no longer compatible?
>
>
> That said regarding the virtio spec 1.1 chapter 3.1.1 [1], I also wonder
> if the introduction of the virqueue broken flag is compliant with the
> spec?
> But i guess this is probably a matter of interpretation...
>
> "
> The driver MUST follow this sequence to initialize a device:
> [...]
> 7. Perform device-specific setup, including discovery of virtqueues for
> the device, optional per-bus setup, reading and possibly writing the
> device’s virtio configuration space, and population of virtqueues.
> 8. Set the DRIVER_OK status bit. At this point the device is “live”.
> "
>
> My question is what means in point 7. "and population of virtqueues"?
>
> In my interpretation the call of "virtqueue_add_inbuf()" populates the
> RX virtqueue.
> That would mean that calling virtqueue_add_inbuf before calling
> virtio_device_ready() should be possible.
>
> Thanks,
> Arnaud
>
> [1]https://docs.oasis-open.org/virtio/virtio/v1.1/csprd01/virtio-v1.1-csprd01.html#x1-920001

I think I agree. For example, the networking device uses "population" in this
sense:

It is generally a good idea to keep the receive virtqueue as
fully populated as possible: if it runs out, network performance
will suffer.



>
> >
> >
> >>
> >> What does make me curious though is that nobody on the remoteproc mailing list
> >> has complained about commit 8b4ec69d7e09 breaking their environment... By now,
> >> i.e rc4, that should have happened. Anyone from TI, ST and Xilinx care to test this on
> >> their rig?
> >>
> >> Thanks,
> >> Mathieu
> >>
> >>> /* set up the receive buffers */
> >>> for (i = 0; i < vrp->num_bufs / 2; i++) {
> >>> struct scatterlist sg;
> >>> @@ -983,9 +986,6 @@ static int rpmsg_probe(struct virtio_device *vdev)
> >>> */
> >>> notify = virtqueue_kick_prepare(vrp->rvq);
> >>>
> >>> - /* From this point on, we can notify and get callbacks. */
> >>> - virtio_device_ready(vdev);
> >>> -
> >>> /* tell the remote processor it can start sending messages */
> >>> /*
> >>> * this might be concurrent with callbacks, but we are only
> >>> --
> >>> 2.34.1
> >>>
> >>
> >

2022-07-04 04:59:23

by Jason Wang

[permalink] [raw]
Subject: Re: [PATCH] rpmsg: virtio: Fix broken rpmsg_probe()

On Fri, Jul 1, 2022 at 2:16 PM Michael S. Tsirkin <[email protected]> wrote:
>
> On Fri, Jul 01, 2022 at 09:22:15AM +0800, Jason Wang wrote:
> > On Fri, Jul 1, 2022 at 3:20 AM Michael S. Tsirkin <[email protected]> wrote:
> > >
> > > On Thu, Jun 30, 2022 at 11:51:30AM -0600, Mathieu Poirier wrote:
> > > > + [email protected]
> > > > + [email protected]
> > > > + [email protected]
> > > >
> > > > On Thu, 30 Jun 2022 at 10:20, Arnaud POULIQUEN
> > > > <[email protected]> wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > On 6/29/22 19:43, Mathieu Poirier wrote:
> > > > > > Hi Anup,
> > > > > >
> > > > > > On Wed, Jun 08, 2022 at 10:43:34PM +0530, Anup Patel wrote:
> > > > > >> The rpmsg_probe() is broken at the moment because virtqueue_add_inbuf()
> > > > > >> fails due to both virtqueues (Rx and Tx) marked as broken by the
> > > > > >> __vring_new_virtqueue() function. To solve this, virtio_device_ready()
> > > > > >> (which unbreaks queues) should be called before virtqueue_add_inbuf().
> > > > > >>
> > > > > >> Fixes: 8b4ec69d7e09 ("virtio: harden vring IRQ")
> > > > > >> Signed-off-by: Anup Patel <[email protected]>
> > > > > >> ---
> > > > > >> drivers/rpmsg/virtio_rpmsg_bus.c | 6 +++---
> > > > > >> 1 file changed, 3 insertions(+), 3 deletions(-)
> > > > > >>
> > > > > >> diff --git a/drivers/rpmsg/virtio_rpmsg_bus.c b/drivers/rpmsg/virtio_rpmsg_bus.c
> > > > > >> index 905ac7910c98..71a64d2c7644 100644
> > > > > >> --- a/drivers/rpmsg/virtio_rpmsg_bus.c
> > > > > >> +++ b/drivers/rpmsg/virtio_rpmsg_bus.c
> > > > > >> @@ -929,6 +929,9 @@ static int rpmsg_probe(struct virtio_device *vdev)
> > > > > >> /* and half is dedicated for TX */
> > > > > >> vrp->sbufs = bufs_va + total_buf_space / 2;
> > > > > >>
> > > > > >> + /* From this point on, we can notify and get callbacks. */
> > > > > >> + virtio_device_ready(vdev);
> > > > > >> +
> > > > > >
> > > > > > Calling virtio_device_ready() here means that virtqueue_get_buf_ctx_split() can
> > > > > > potentially be called (by way of rpmsg_recv_done()), which will race with
> > > > > > virtqueue_add_inbuf(). If buffers in the virtqueue aren't available then
> > > > > > rpmsg_recv_done() will fail, potentially breaking remote processors' state
> > > > > > machines that don't expect their initial name service to fail when the "device"
> > > > > > has been marked as ready.
> > > > > >
> > > > > > What does make me curious though is that nobody on the remoteproc mailing list
> > > > > > has complained about commit 8b4ec69d7e09 breaking their environment... By now,
> > > > > > i.e rc4, that should have happened. Anyone from TI, ST and Xilinx care to test this on
> > > > > > their rig?
> > > > >
> > > > > I tested on STm32mp1 board using tag v5.19-rc4(03c765b0e3b4)
> > > > > I confirm the issue!
> > > > >
> > > > > Concerning the solution, I share Mathieu's concern. This could break legacy.
> > > > > I made a short test and I would suggest to use __virtio_unbreak_device instead, tounbreak the virtqueues without changing the init sequence.
> > > > >
> > > > > I this case the patch would be:
> > > > >
> > > > > + /*
> > > > > + * Unbreak the virtqueues to allow to add buffers before setting the vdev status
> > > > > + * to ready
> > > > > + */
> > > > > + __virtio_unbreak_device(vdev);
> > > > > +
> > > > >
> > > > > /* set up the receive buffers */
> > > > > for (i = 0; i < vrp->num_bufs / 2; i++) {
> > > > > struct scatterlist sg;
> > > > > void *cpu_addr = vrp->rbufs + i * vrp->buf_size;
> > > >
> > > > This will indeed fix the problem. On the flip side the kernel
> > > > documentation for __virtio_unbreak_device() puzzles me...
> > > > It clearly states that it should be used for probing and restoring but
> > > > _not_ directly by the driver. Function rpmsg_probe() is part of
> > > > probing but also the entry point to a driver.
> > > >
> > > > Michael and virtualisation folks, is this the right way to move forward?
> > >
> > > I don't think it is, __virtio_unbreak_device is intended for core use.
> >
> > Can we fill the rx after virtio_device_ready() in this case?
> >
> > Btw, the driver set driver ok after registering, we probably get a svq
> > kick before DRIVER_OK?
> >
> > Thanks
>
> Is this an ack for the original patch?

Nope. I meant: instead of moving virtio_device_ready() a little bit
earlier, can we just move the rvq filling to after virtio_device_ready()?

Thanks

>
> > >
> > > > >
> > > > > Regards,
> > > > > Arnaud
> > > > >
> > > > > >
> > > > > > Thanks,
> > > > > > Mathieu
> > > > > >
> > > > > >> /* set up the receive buffers */
> > > > > >> for (i = 0; i < vrp->num_bufs / 2; i++) {
> > > > > >> struct scatterlist sg;
> > > > > >> @@ -983,9 +986,6 @@ static int rpmsg_probe(struct virtio_device *vdev)
> > > > > >> */
> > > > > >> notify = virtqueue_kick_prepare(vrp->rvq);
> > > > > >>
> > > > > >> - /* From this point on, we can notify and get callbacks. */
> > > > > >> - virtio_device_ready(vdev);
> > > > > >> -
> > > > > >> /* tell the remote processor it can start sending messages */
> > > > > >> /*
> > > > > >> * this might be concurrent with callbacks, but we are only
> > > > > >> --
> > > > > >> 2.34.1
> > > > > >>
> > >
>

2022-07-04 10:18:33

by Arnaud Pouliquen

[permalink] [raw]
Subject: Re: [PATCH] rpmsg: virtio: Fix broken rpmsg_probe()

Hello Jason,

On 7/4/22 06:35, Jason Wang wrote:
> On Fri, Jul 1, 2022 at 2:16 PM Michael S. Tsirkin <[email protected]> wrote:
>>
>> On Fri, Jul 01, 2022 at 09:22:15AM +0800, Jason Wang wrote:
>>> On Fri, Jul 1, 2022 at 3:20 AM Michael S. Tsirkin <[email protected]> wrote:
>>>>
>>>> On Thu, Jun 30, 2022 at 11:51:30AM -0600, Mathieu Poirier wrote:
>>>>> + [email protected]
>>>>> + [email protected]
>>>>> + [email protected]
>>>>>
>>>>> On Thu, 30 Jun 2022 at 10:20, Arnaud POULIQUEN
>>>>> <[email protected]> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> On 6/29/22 19:43, Mathieu Poirier wrote:
>>>>>>> Hi Anup,
>>>>>>>
>>>>>>> On Wed, Jun 08, 2022 at 10:43:34PM +0530, Anup Patel wrote:
>>>>>>>> The rpmsg_probe() is broken at the moment because virtqueue_add_inbuf()
>>>>>>>> fails due to both virtqueues (Rx and Tx) marked as broken by the
>>>>>>>> __vring_new_virtqueue() function. To solve this, virtio_device_ready()
>>>>>>>> (which unbreaks queues) should be called before virtqueue_add_inbuf().
>>>>>>>>
>>>>>>>> Fixes: 8b4ec69d7e09 ("virtio: harden vring IRQ")
>>>>>>>> Signed-off-by: Anup Patel <[email protected]>
>>>>>>>> ---
>>>>>>>> drivers/rpmsg/virtio_rpmsg_bus.c | 6 +++---
>>>>>>>> 1 file changed, 3 insertions(+), 3 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/rpmsg/virtio_rpmsg_bus.c b/drivers/rpmsg/virtio_rpmsg_bus.c
>>>>>>>> index 905ac7910c98..71a64d2c7644 100644
>>>>>>>> --- a/drivers/rpmsg/virtio_rpmsg_bus.c
>>>>>>>> +++ b/drivers/rpmsg/virtio_rpmsg_bus.c
>>>>>>>> @@ -929,6 +929,9 @@ static int rpmsg_probe(struct virtio_device *vdev)
>>>>>>>> /* and half is dedicated for TX */
>>>>>>>> vrp->sbufs = bufs_va + total_buf_space / 2;
>>>>>>>>
>>>>>>>> + /* From this point on, we can notify and get callbacks. */
>>>>>>>> + virtio_device_ready(vdev);
>>>>>>>> +
>>>>>>>
>>>>>>> Calling virtio_device_ready() here means that virtqueue_get_buf_ctx_split() can
>>>>>>> potentially be called (by way of rpmsg_recv_done()), which will race with
>>>>>>> virtqueue_add_inbuf(). If buffers in the virtqueue aren't available then
>>>>>>> rpmsg_recv_done() will fail, potentially breaking remote processors' state
>>>>>>> machines that don't expect their initial name service to fail when the "device"
>>>>>>> has been marked as ready.
>>>>>>>
>>>>>>> What does make me curious though is that nobody on the remoteproc mailing list
>>>>>>> has complained about commit 8b4ec69d7e09 breaking their environment... By now,
>>>>>>> i.e rc4, that should have happened. Anyone from TI, ST and Xilinx care to test this on
>>>>>>> their rig?
>>>>>>
>>>>>> I tested on STm32mp1 board using tag v5.19-rc4(03c765b0e3b4)
>>>>>> I confirm the issue!
>>>>>>
>>>>>> Concerning the solution, I share Mathieu's concern. This could break legacy.
>>>>>> I made a short test and I would suggest to use __virtio_unbreak_device instead, tounbreak the virtqueues without changing the init sequence.
>>>>>>
>>>>>> I this case the patch would be:
>>>>>>
>>>>>> + /*
>>>>>> + * Unbreak the virtqueues to allow to add buffers before setting the vdev status
>>>>>> + * to ready
>>>>>> + */
>>>>>> + __virtio_unbreak_device(vdev);
>>>>>> +
>>>>>>
>>>>>> /* set up the receive buffers */
>>>>>> for (i = 0; i < vrp->num_bufs / 2; i++) {
>>>>>> struct scatterlist sg;
>>>>>> void *cpu_addr = vrp->rbufs + i * vrp->buf_size;
>>>>>
>>>>> This will indeed fix the problem. On the flip side the kernel
>>>>> documentation for __virtio_unbreak_device() puzzles me...
>>>>> It clearly states that it should be used for probing and restoring but
>>>>> _not_ directly by the driver. Function rpmsg_probe() is part of
>>>>> probing but also the entry point to a driver.
>>>>>
>>>>> Michael and virtualisation folks, is this the right way to move forward?
>>>>
>>>> I don't think it is, __virtio_unbreak_device is intended for core use.
>>>
>>> Can we fill the rx after virtio_device_ready() in this case?
>>>
>>> Btw, the driver set driver ok after registering, we probably get a svq
>>> kick before DRIVER_OK?

By "registering", do you mean calling rpmsg_virtio_add_ctrl_dev() and
rpmsg_ns_register_device()?

rpmsg_ns_register_device() has to be called before. It has to be
probed so that it can handle the first message coming from the remote side
and create the associated local rpmsg device. It doesn't send any message.

The risk could be for the rpmsg_ctrl device. Registering it
after the virtio_device_ready(vdev) call could make sense...

>>>
>>> Thanks
>>
>> Is this an ack for the original patch?
>
> Nope, I meant, instead of moving virtio_device_ready() a little bit
> earlier, can we only move the rvq filling after virtio_device_ready().
>
> Thanks

Please find some concerns about this inversion here:
https://lore.kernel.org/lkml/[email protected]/

Regarding __virtio_unbreak_device(): its counterpart, virtio_break_device(),
is already used by some virtio drivers.
Would it make sense to also have a
virtio_unbreak_device() interface?


I do not fully understand the reason for the commit:
8b4ec69d7e09 ("virtio: harden vring IRQ", 2022-05-27)
So the following alternative is probably pretty naive:
Could virtqueue_disable_cb() be used as an alternative to the
vq->broken usage, allowing buffers to be registered while preventing virtqueue IRQs?

Thanks,
Arnaud

>
>>
>>>>
>>>>>>
>>>>>> Regards,
>>>>>> Arnaud
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Mathieu
>>>>>>>
>>>>>>>> /* set up the receive buffers */
>>>>>>>> for (i = 0; i < vrp->num_bufs / 2; i++) {
>>>>>>>> struct scatterlist sg;
>>>>>>>> @@ -983,9 +986,6 @@ static int rpmsg_probe(struct virtio_device *vdev)
>>>>>>>> */
>>>>>>>> notify = virtqueue_kick_prepare(vrp->rvq);
>>>>>>>>
>>>>>>>> - /* From this point on, we can notify and get callbacks. */
>>>>>>>> - virtio_device_ready(vdev);
>>>>>>>> -
>>>>>>>> /* tell the remote processor it can start sending messages */
>>>>>>>> /*
>>>>>>>> * this might be concurrent with callbacks, but we are only
>>>>>>>> --
>>>>>>>> 2.34.1
>>>>>>>>
>>>>
>>
>

2022-07-06 04:22:42

by Jason Wang

[permalink] [raw]
Subject: Re: [PATCH] rpmsg: virtio: Fix broken rpmsg_probe()

On Mon, Jul 4, 2022 at 5:45 PM Arnaud POULIQUEN
<[email protected]> wrote:
>
> Hello Jason,
>
> On 7/4/22 06:35, Jason Wang wrote:
> > On Fri, Jul 1, 2022 at 2:16 PM Michael S. Tsirkin <[email protected]> wrote:
> >>
> >> On Fri, Jul 01, 2022 at 09:22:15AM +0800, Jason Wang wrote:
> >>> On Fri, Jul 1, 2022 at 3:20 AM Michael S. Tsirkin <[email protected]> wrote:
> >>>>
> >>>> On Thu, Jun 30, 2022 at 11:51:30AM -0600, Mathieu Poirier wrote:
> >>>>> + [email protected]
> >>>>> + [email protected]
> >>>>> + [email protected]
> >>>>>
> >>>>> On Thu, 30 Jun 2022 at 10:20, Arnaud POULIQUEN
> >>>>> <[email protected]> wrote:
> >>>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> On 6/29/22 19:43, Mathieu Poirier wrote:
> >>>>>>> Hi Anup,
> >>>>>>>
> >>>>>>> On Wed, Jun 08, 2022 at 10:43:34PM +0530, Anup Patel wrote:
> >>>>>>>> The rpmsg_probe() is broken at the moment because virtqueue_add_inbuf()
> >>>>>>>> fails due to both virtqueues (Rx and Tx) marked as broken by the
> >>>>>>>> __vring_new_virtqueue() function. To solve this, virtio_device_ready()
> >>>>>>>> (which unbreaks queues) should be called before virtqueue_add_inbuf().
> >>>>>>>>
> >>>>>>>> Fixes: 8b4ec69d7e09 ("virtio: harden vring IRQ")
> >>>>>>>> Signed-off-by: Anup Patel <[email protected]>
> >>>>>>>> ---
> >>>>>>>> drivers/rpmsg/virtio_rpmsg_bus.c | 6 +++---
> >>>>>>>> 1 file changed, 3 insertions(+), 3 deletions(-)
> >>>>>>>>
> >>>>>>>> diff --git a/drivers/rpmsg/virtio_rpmsg_bus.c b/drivers/rpmsg/virtio_rpmsg_bus.c
> >>>>>>>> index 905ac7910c98..71a64d2c7644 100644
> >>>>>>>> --- a/drivers/rpmsg/virtio_rpmsg_bus.c
> >>>>>>>> +++ b/drivers/rpmsg/virtio_rpmsg_bus.c
> >>>>>>>> @@ -929,6 +929,9 @@ static int rpmsg_probe(struct virtio_device *vdev)
> >>>>>>>> /* and half is dedicated for TX */
> >>>>>>>> vrp->sbufs = bufs_va + total_buf_space / 2;
> >>>>>>>>
> >>>>>>>> + /* From this point on, we can notify and get callbacks. */
> >>>>>>>> + virtio_device_ready(vdev);
> >>>>>>>> +
> >>>>>>>
> >>>>>>> Calling virtio_device_ready() here means that virtqueue_get_buf_ctx_split() can
> >>>>>>> potentially be called (by way of rpmsg_recv_done()), which will race with
> >>>>>>> virtqueue_add_inbuf(). If buffers in the virtqueue aren't available then
> >>>>>>> rpmsg_recv_done() will fail, potentially breaking remote processors' state
> >>>>>>> machines that don't expect their initial name service to fail when the "device"
> >>>>>>> has been marked as ready.
> >>>>>>>
> >>>>>>> What does make me curious though is that nobody on the remoteproc mailing list
> >>>>>>> has complained about commit 8b4ec69d7e09 breaking their environment... By now,
> >>>>>>> i.e rc4, that should have happened. Anyone from TI, ST and Xilinx care to test this on
> >>>>>>> their rig?
> >>>>>>
> >>>>>> I tested on STm32mp1 board using tag v5.19-rc4(03c765b0e3b4)
> >>>>>> I confirm the issue!
> >>>>>>
> >>>>>> Concerning the solution, I share Mathieu's concern. This could break legacy.
> >>>>>> I made a short test and I would suggest to use __virtio_unbreak_device instead, tounbreak the virtqueues without changing the init sequence.
> >>>>>>
> >>>>>> I this case the patch would be:
> >>>>>>
> >>>>>> + /*
> >>>>>> + * Unbreak the virtqueues to allow to add buffers before setting the vdev status
> >>>>>> + * to ready
> >>>>>> + */
> >>>>>> + __virtio_unbreak_device(vdev);
> >>>>>> +
> >>>>>>
> >>>>>> /* set up the receive buffers */
> >>>>>> for (i = 0; i < vrp->num_bufs / 2; i++) {
> >>>>>> struct scatterlist sg;
> >>>>>> void *cpu_addr = vrp->rbufs + i * vrp->buf_size;
> >>>>>
> >>>>> This will indeed fix the problem. On the flip side the kernel
> >>>>> documentation for __virtio_unbreak_device() puzzles me...
> >>>>> It clearly states that it should be used for probing and restoring but
> >>>>> _not_ directly by the driver. Function rpmsg_probe() is part of
> >>>>> probing but also the entry point to a driver.
> >>>>>
> >>>>> Michael and virtualisation folks, is this the right way to move forward?
> >>>>
> >>>> I don't think it is, __virtio_unbreak_device is intended for core use.
> >>>
> >>> Can we fill the rx after virtio_device_ready() in this case?
> >>>
> >>> Btw, the driver set driver ok after registering, we probably get a svq
> >>> kick before DRIVER_OK?
>
> By "registering" you mean calling rpmsg_virtio_add_ctrl_dev and
> rpmsg_ns_register_device?

Yes.

>
> The rpmsg_ns_register_device has to be called before. Because it has to be
> probed to handle the first message coming from the remote side to create
> associated rpmsg local device.

I couldn't find the code that does this; maybe you can give me a hint.

> It doesn't send message.

I see that the function registers the device on the bus. I wonder if this
means the device could be probed and used by the driver before
virtio_device_ready().

>
> The risk could be for the rpmsg_ctrl device. Registering it
> after the virtio_device_ready(vdev) call could make sense...

I see.

>
> >>>
> >>> Thanks
> >>
> >> Is this an ack for the original patch?
> >
> > Nope, I meant, instead of moving virtio_device_ready() a little bit
> > earlier, can we only move the rvq filling after virtio_device_ready().
> >
> > Thanks
>
> Please find some concerns about this inversion here:
> https://lore.kernel.org/lkml/[email protected]/
>
> Regarding __virtio_unbreak_device. The pending virtio_break_device is
> used by some virtio driver.
> Could we consider that it makes sense to also have a
> virtio_unbreak_device interface?

We don't want to allow drivers to unbreak a device, since that makes it
easier to introduce bugs.

>
>
> I do not well understand the reason of the commit:
> 8b4ec69d7e09 ("virtio: harden vring IRQ", 2022-05-27)

It tries to prevent the virtqueue callbacks from being called before
virtio_device_ready(). This helps to prevent a malicious device from
attacking the driver.

But unfortunately, it breaks several drivers because:

1) some drivers have races in probe/remove
2) it reuses vq->broken, which may break drivers that call
virtqueue_add() before virtio_device_ready(), which is allowed by the
spec

There's a discussion about a better behavior that doesn't break the
existing drivers. And the IRQ hardening feature is marked as broken
now, so rpmsg should be fine without any extra effort.
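As a rough illustration of point 2, the interaction can be sketched in userspace. This is a simplified model under stated assumptions, not the kernel implementation: the model_* names are hypothetical, the "harden" parameter stands in for the IRQ hardening behavior, and the error codes are chosen to resemble those the add path returns.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a virtqueue; "broken" plays the role of vq->broken. */
struct model_vq {
	bool broken;
	unsigned int num_free;
};

/* With hardening on, a new queue starts out broken until the device is ready. */
static void model_new_virtqueue(struct model_vq *vq, unsigned int size,
				bool harden)
{
	assert(vq != NULL);
	vq->broken = harden;
	vq->num_free = size;
}

/* Adding a buffer is refused on a broken queue, as in the reported failure. */
static int model_add_inbuf(struct model_vq *vq)
{
	if (vq->broken)
		return -EIO;
	if (vq->num_free == 0)
		return -ENOSPC;
	vq->num_free--;
	return 0;
}

/* Returns true if the callback would run; a broken queue drops the interrupt. */
static bool model_interrupt(const struct model_vq *vq)
{
	return !vq->broken;
}

/* Models the "unbreak" side effect of marking the device ready. */
static void model_device_ready(struct model_vq *vq)
{
	vq->broken = false;
}
```

With harden set, model_add_inbuf() returns -EIO until model_device_ready() has run, which is the rpmsg_probe() failure reported at the start of this thread. The same flag is what keeps model_interrupt() from delivering early callbacks, so one flag ends up serving two purposes.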

> So following alternative is probably pretty naive:
> Is the use of virtqueue_disable_cb could be an alternative to the
> vq->broken usage allowing to register buffer while preventing virtqueue IRQ?

Probably not; there's no guarantee that the device will not send a
notification after virtqueue_disable_cb().
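A small userspace sketch of why the suppression is only a hint, assuming a split ring where the driver sets a "no interrupt" flag that the device reads at a time of its own choosing. The model_* names are hypothetical; the flag value matches VRING_AVAIL_F_NO_INTERRUPT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same value as VRING_AVAIL_F_NO_INTERRUPT; the MODEL_ name is local. */
#define MODEL_AVAIL_F_NO_INTERRUPT 1u

/* Driver-owned part of the ring that the device reads. */
struct model_ring {
	uint16_t avail_flags;
};

/* Device-side view: a snapshot that may be older than the ring contents. */
struct model_device {
	uint16_t cached_flags;
};

/* The device reads the flags whenever it wants; nothing synchronizes this. */
static void model_device_sample_flags(struct model_device *dev,
				      const struct model_ring *ring)
{
	dev->cached_flags = ring->avail_flags;
}

/* What disabling callbacks amounts to in this model: setting an advisory flag. */
static void model_driver_disable_cb(struct model_ring *ring)
{
	ring->avail_flags |= MODEL_AVAIL_F_NO_INTERRUPT;
}

/* The device decides based on its snapshot, not on the live ring. */
static bool model_device_would_interrupt(const struct model_device *dev)
{
	return !(dev->cached_flags & MODEL_AVAIL_F_NO_INTERRUPT);
}
```

If the device samples the flags before the driver sets the flag, model_device_would_interrupt() still returns true afterwards, so the driver has to tolerate that interrupt. A malicious device can of course ignore the flag entirely, which is the case the hardening is about.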

Thanks

>
> Thanks,
> Arnaud
>
> >
> >>
> >>>>
> >>>>>>
> >>>>>> Regards,
> >>>>>> Arnaud
> >>>>>>
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Mathieu
> >>>>>>>
> >>>>>>>> /* set up the receive buffers */
> >>>>>>>> for (i = 0; i < vrp->num_bufs / 2; i++) {
> >>>>>>>> struct scatterlist sg;
> >>>>>>>> @@ -983,9 +986,6 @@ static int rpmsg_probe(struct virtio_device *vdev)
> >>>>>>>> */
> >>>>>>>> notify = virtqueue_kick_prepare(vrp->rvq);
> >>>>>>>>
> >>>>>>>> - /* From this point on, we can notify and get callbacks. */
> >>>>>>>> - virtio_device_ready(vdev);
> >>>>>>>> -
> >>>>>>>> /* tell the remote processor it can start sending messages */
> >>>>>>>> /*
> >>>>>>>> * this might be concurrent with callbacks, but we are only
> >>>>>>>> --
> >>>>>>>> 2.34.1
> >>>>>>>>
> >>>>
> >>
> >
>

2022-07-06 06:59:11

by Arnaud Pouliquen

Subject: Re: [PATCH] rpmsg: virtio: Fix broken rpmsg_probe()



On 7/6/22 06:03, Jason Wang wrote:
> On Mon, Jul 4, 2022 at 5:45 PM Arnaud POULIQUEN
> <[email protected]> wrote:
>>
>> Hello Jason,
>>
>> On 7/4/22 06:35, Jason Wang wrote:
>>> On Fri, Jul 1, 2022 at 2:16 PM Michael S. Tsirkin <[email protected]> wrote:
>>>>
>>>> On Fri, Jul 01, 2022 at 09:22:15AM +0800, Jason Wang wrote:
>>>>> On Fri, Jul 1, 2022 at 3:20 AM Michael S. Tsirkin <[email protected]> wrote:
>>>>>>
>>>>>> On Thu, Jun 30, 2022 at 11:51:30AM -0600, Mathieu Poirier wrote:
>>>>>>> + [email protected]
>>>>>>> + [email protected]
>>>>>>> + [email protected]
>>>>>>>
>>>>>>> On Thu, 30 Jun 2022 at 10:20, Arnaud POULIQUEN
>>>>>>> <[email protected]> wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> On 6/29/22 19:43, Mathieu Poirier wrote:
>>>>>>>>> Hi Anup,
>>>>>>>>>
>>>>>>>>> On Wed, Jun 08, 2022 at 10:43:34PM +0530, Anup Patel wrote:
>>>>>>>>>> The rpmsg_probe() is broken at the moment because virtqueue_add_inbuf()
>>>>>>>>>> fails due to both virtqueues (Rx and Tx) marked as broken by the
>>>>>>>>>> __vring_new_virtqueue() function. To solve this, virtio_device_ready()
>>>>>>>>>> (which unbreaks queues) should be called before virtqueue_add_inbuf().
>>>>>>>>>>
>>>>>>>>>> Fixes: 8b4ec69d7e09 ("virtio: harden vring IRQ")
>>>>>>>>>> Signed-off-by: Anup Patel <[email protected]>
>>>>>>>>>> ---
>>>>>>>>>> drivers/rpmsg/virtio_rpmsg_bus.c | 6 +++---
>>>>>>>>>> 1 file changed, 3 insertions(+), 3 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/drivers/rpmsg/virtio_rpmsg_bus.c b/drivers/rpmsg/virtio_rpmsg_bus.c
>>>>>>>>>> index 905ac7910c98..71a64d2c7644 100644
>>>>>>>>>> --- a/drivers/rpmsg/virtio_rpmsg_bus.c
>>>>>>>>>> +++ b/drivers/rpmsg/virtio_rpmsg_bus.c
>>>>>>>>>> @@ -929,6 +929,9 @@ static int rpmsg_probe(struct virtio_device *vdev)
>>>>>>>>>> /* and half is dedicated for TX */
>>>>>>>>>> vrp->sbufs = bufs_va + total_buf_space / 2;
>>>>>>>>>>
>>>>>>>>>> + /* From this point on, we can notify and get callbacks. */
>>>>>>>>>> + virtio_device_ready(vdev);
>>>>>>>>>> +
>>>>>>>>>
>>>>>>>>> Calling virtio_device_ready() here means that virtqueue_get_buf_ctx_split() can
>>>>>>>>> potentially be called (by way of rpmsg_recv_done()), which will race with
>>>>>>>>> virtqueue_add_inbuf(). If buffers in the virtqueue aren't available then
>>>>>>>>> rpmsg_recv_done() will fail, potentially breaking remote processors' state
>>>>>>>>> machines that don't expect their initial name service to fail when the "device"
>>>>>>>>> has been marked as ready.
>>>>>>>>>
>>>>>>>>> What does make me curious though is that nobody on the remoteproc mailing list
>>>>>>>>> has complained about commit 8b4ec69d7e09 breaking their environment... By now,
>>>>>>>>> i.e rc4, that should have happened. Anyone from TI, ST and Xilinx care to test this on
>>>>>>>>> their rig?
>>>>>>>>
>>>>>>>> I tested on an STM32MP1 board using tag v5.19-rc4 (03c765b0e3b4)
>>>>>>>> I confirm the issue!
>>>>>>>>
>>>>>>>> Concerning the solution, I share Mathieu's concern. This could break legacy.
>>>>>>>> I made a short test and I would suggest using __virtio_unbreak_device instead, to unbreak the virtqueues without changing the init sequence.
>>>>>>>>
>>>>>>>> In this case the patch would be:
>>>>>>>>
>>>>>>>> + /*
>>>>>>>> + * Unbreak the virtqueues to allow to add buffers before setting the vdev status
>>>>>>>> + * to ready
>>>>>>>> + */
>>>>>>>> + __virtio_unbreak_device(vdev);
>>>>>>>> +
>>>>>>>>
>>>>>>>> /* set up the receive buffers */
>>>>>>>> for (i = 0; i < vrp->num_bufs / 2; i++) {
>>>>>>>> struct scatterlist sg;
>>>>>>>> void *cpu_addr = vrp->rbufs + i * vrp->buf_size;
>>>>>>>
>>>>>>> This will indeed fix the problem. On the flip side the kernel
>>>>>>> documentation for __virtio_unbreak_device() puzzles me...
>>>>>>> It clearly states that it should be used for probing and restoring but
>>>>>>> _not_ directly by the driver. Function rpmsg_probe() is part of
>>>>>>> probing but also the entry point to a driver.
>>>>>>>
>>>>>>> Michael and virtualisation folks, is this the right way to move forward?
>>>>>>
>>>>>> I don't think it is, __virtio_unbreak_device is intended for core use.
>>>>>
>>>>> Can we fill the rx after virtio_device_ready() in this case?
>>>>>
>>>>> Btw, the driver set driver ok after registering, we probably get a svq
>>>>> kick before DRIVER_OK?
>>
>> By "registering" you mean calling rpmsg_virtio_add_ctrl_dev and
>> rpmsg_ns_register_device?
>
> Yes.
>
>>
>> The rpmsg_ns_register_device has to be called before. Because it has to be
>> probed to handle the first message coming from the remote side to create
>> associated rpmsg local device.
>
> I couldn't find the code to do this, maybe you can give me some hint on this.

The rpmsg_ns driver is available here:
https://elixir.bootlin.com/linux/latest/source/drivers/rpmsg/rpmsg_ns.c

It is probed when rpmsg_ns_register_device() is called:
https://elixir.bootlin.com/linux/latest/source/drivers/rpmsg/virtio_rpmsg_bus.c#L974


>
>> It doesn't send message.
>
> I see the function register the device to the bus, I wonder if this
> means the device could be probed and used by the driver before
> virtio_device_ready().
>
>>
>> The risk could be for the rpmsg_ctrl device. Registering it
>> after the virtio_device_ready(vdev) call could make sense...
>
> I see.
>
>>
>>>>>
>>>>> Thanks
>>>>
>>>> Is this an ack for the original patch?
>>>
>>> Nope, I meant, instead of moving virtio_device_ready() a little bit
>>> earlier, can we only move the rvq filling after virtio_device_ready().
>>>
>>> Thanks
>>
>> Please find some concerns about this inversion here:
>> https://lore.kernel.org/lkml/[email protected]/
>>
>> Regarding __virtio_unbreak_device. The pending virtio_break_device is
>> used by some virtio driver.
>> Could we consider that it makes sense to also have a
>> virtio_unbreak_device interface?
>
> We don't want to allow the driver to unbreak a device since it's
> easier to have bugs.
>
>>
>>
>> I do not well understand the reason of the commit:
>> 8b4ec69d7e09 ("virtio: harden vring IRQ", 2022-05-27)
>
> It tries to forbid the virtqueue callbacks to be called before
> virtio_device_ready(). This helps to prevent the malicious device from
> attacking the driver.
>
> But unfortunately, it breaks several drivers because:
>
> 1) some drivers have races in probe/remove
> 2) it tries to reuse vq->broken, which may break drivers that call
> virtqueue_add() before virtio_device_ready(), which is allowed by the
> spec
>
> There's a discussion to have a better behavior that doesn't break the
> existing drivers. And the IRQ hardening feature is marked as broken
> now, so rpmsg should be fine without any extra effort.

Thanks for the explanations.
If the discussion is taking place in a mail thread, could you give me the reference?

Thanks,
Arnaud

>
>> So following alternative is probably pretty naive:
>> Is the use of virtqueue_disable_cb could be an alternative to the
>> vq->broken usage allowing to register buffer while preventing virtqueue IRQ?
>
> Probably not, there's no guarantee that the device will not send a
> notification after virtqueue_disable_cb().
>
> Thanks
>
>>
>> Thanks,
>> Arnaud
>>
>>>
>>>>
>>>>>>
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Arnaud
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Mathieu
>>>>>>>>>
>>>>>>>>>> /* set up the receive buffers */
>>>>>>>>>> for (i = 0; i < vrp->num_bufs / 2; i++) {
>>>>>>>>>> struct scatterlist sg;
>>>>>>>>>> @@ -983,9 +986,6 @@ static int rpmsg_probe(struct virtio_device *vdev)
>>>>>>>>>> */
>>>>>>>>>> notify = virtqueue_kick_prepare(vrp->rvq);
>>>>>>>>>>
>>>>>>>>>> - /* From this point on, we can notify and get callbacks. */
>>>>>>>>>> - virtio_device_ready(vdev);
>>>>>>>>>> -
>>>>>>>>>> /* tell the remote processor it can start sending messages */
>>>>>>>>>> /*
>>>>>>>>>> * this might be concurrent with callbacks, but we are only
>>>>>>>>>> --
>>>>>>>>>> 2.34.1
>>>>>>>>>>
>>>>>>
>>>>
>>>
>>
>

2022-07-08 06:42:32

by Jason Wang

Subject: Re: [PATCH] rpmsg: virtio: Fix broken rpmsg_probe()

On Wed, Jul 6, 2022 at 2:57 PM Arnaud POULIQUEN
<[email protected]> wrote:
>
>
>
> On 7/6/22 06:03, Jason Wang wrote:
> > On Mon, Jul 4, 2022 at 5:45 PM Arnaud POULIQUEN
> > <[email protected]> wrote:
> >>
> >> Hello Jason,
> >>
> >> On 7/4/22 06:35, Jason Wang wrote:
> >>> On Fri, Jul 1, 2022 at 2:16 PM Michael S. Tsirkin <[email protected]> wrote:
> >>>>
> >>>> On Fri, Jul 01, 2022 at 09:22:15AM +0800, Jason Wang wrote:
> >>>>> On Fri, Jul 1, 2022 at 3:20 AM Michael S. Tsirkin <[email protected]> wrote:
> >>>>>>
> >>>>>> On Thu, Jun 30, 2022 at 11:51:30AM -0600, Mathieu Poirier wrote:
> >>>>>>> + [email protected]
> >>>>>>> + [email protected]
> >>>>>>> + [email protected]
> >>>>>>>
> >>>>>>> On Thu, 30 Jun 2022 at 10:20, Arnaud POULIQUEN
> >>>>>>> <[email protected]> wrote:
> >>>>>>>>
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> On 6/29/22 19:43, Mathieu Poirier wrote:
> >>>>>>>>> Hi Anup,
> >>>>>>>>>
> >>>>>>>>> On Wed, Jun 08, 2022 at 10:43:34PM +0530, Anup Patel wrote:
> >>>>>>>>>> The rpmsg_probe() is broken at the moment because virtqueue_add_inbuf()
> >>>>>>>>>> fails due to both virtqueues (Rx and Tx) marked as broken by the
> >>>>>>>>>> __vring_new_virtqueue() function. To solve this, virtio_device_ready()
> >>>>>>>>>> (which unbreaks queues) should be called before virtqueue_add_inbuf().
> >>>>>>>>>>
> >>>>>>>>>> Fixes: 8b4ec69d7e09 ("virtio: harden vring IRQ")
> >>>>>>>>>> Signed-off-by: Anup Patel <[email protected]>
> >>>>>>>>>> ---
> >>>>>>>>>> drivers/rpmsg/virtio_rpmsg_bus.c | 6 +++---
> >>>>>>>>>> 1 file changed, 3 insertions(+), 3 deletions(-)
> >>>>>>>>>>
> >>>>>>>>>> diff --git a/drivers/rpmsg/virtio_rpmsg_bus.c b/drivers/rpmsg/virtio_rpmsg_bus.c
> >>>>>>>>>> index 905ac7910c98..71a64d2c7644 100644
> >>>>>>>>>> --- a/drivers/rpmsg/virtio_rpmsg_bus.c
> >>>>>>>>>> +++ b/drivers/rpmsg/virtio_rpmsg_bus.c
> >>>>>>>>>> @@ -929,6 +929,9 @@ static int rpmsg_probe(struct virtio_device *vdev)
> >>>>>>>>>> /* and half is dedicated for TX */
> >>>>>>>>>> vrp->sbufs = bufs_va + total_buf_space / 2;
> >>>>>>>>>>
> >>>>>>>>>> + /* From this point on, we can notify and get callbacks. */
> >>>>>>>>>> + virtio_device_ready(vdev);
> >>>>>>>>>> +
> >>>>>>>>>
> >>>>>>>>> Calling virtio_device_ready() here means that virtqueue_get_buf_ctx_split() can
> >>>>>>>>> potentially be called (by way of rpmsg_recv_done()), which will race with
> >>>>>>>>> virtqueue_add_inbuf(). If buffers in the virtqueue aren't available then
> >>>>>>>>> rpmsg_recv_done() will fail, potentially breaking remote processors' state
> >>>>>>>>> machines that don't expect their initial name service to fail when the "device"
> >>>>>>>>> has been marked as ready.
> >>>>>>>>>
> >>>>>>>>> What does make me curious though is that nobody on the remoteproc mailing list
> >>>>>>>>> has complained about commit 8b4ec69d7e09 breaking their environment... By now,
> >>>>>>>>> i.e rc4, that should have happened. Anyone from TI, ST and Xilinx care to test this on
> >>>>>>>>> their rig?
> >>>>>>>>
> >>>>>>>> I tested on an STM32MP1 board using tag v5.19-rc4 (03c765b0e3b4)
> >>>>>>>> I confirm the issue!
> >>>>>>>>
> >>>>>>>> Concerning the solution, I share Mathieu's concern. This could break legacy.
> >>>>>>>> I made a short test and I would suggest using __virtio_unbreak_device instead, to unbreak the virtqueues without changing the init sequence.
> >>>>>>>>
> >>>>>>>> In this case the patch would be:
> >>>>>>>>
> >>>>>>>> + /*
> >>>>>>>> + * Unbreak the virtqueues to allow to add buffers before setting the vdev status
> >>>>>>>> + * to ready
> >>>>>>>> + */
> >>>>>>>> + __virtio_unbreak_device(vdev);
> >>>>>>>> +
> >>>>>>>>
> >>>>>>>> /* set up the receive buffers */
> >>>>>>>> for (i = 0; i < vrp->num_bufs / 2; i++) {
> >>>>>>>> struct scatterlist sg;
> >>>>>>>> void *cpu_addr = vrp->rbufs + i * vrp->buf_size;
> >>>>>>>
> >>>>>>> This will indeed fix the problem. On the flip side the kernel
> >>>>>>> documentation for __virtio_unbreak_device() puzzles me...
> >>>>>>> It clearly states that it should be used for probing and restoring but
> >>>>>>> _not_ directly by the driver. Function rpmsg_probe() is part of
> >>>>>>> probing but also the entry point to a driver.
> >>>>>>>
> >>>>>>> Michael and virtualisation folks, is this the right way to move forward?
> >>>>>>
> >>>>>> I don't think it is, __virtio_unbreak_device is intended for core use.
> >>>>>
> >>>>> Can we fill the rx after virtio_device_ready() in this case?
> >>>>>
> >>>>> Btw, the driver set driver ok after registering, we probably get a svq
> >>>>> kick before DRIVER_OK?
> >>
> >> By "registering" you mean calling rpmsg_virtio_add_ctrl_dev and
> >> rpmsg_ns_register_device?
> >
> > Yes.
> >
> >>
> >> The rpmsg_ns_register_device has to be called before. Because it has to be
> >> probed to handle the first message coming from the remote side to create
> >> associated rpmsg local device.
> >
> > I couldn't find the code to do this, maybe you can give me some hint on this.
>
> The rpmsg_ns is available here :
> https://elixir.bootlin.com/linux/latest/source/drivers/rpmsg/rpmsg_ns.c
>
> It is probed on rpmsg_ns_register_device call.
> https://elixir.bootlin.com/linux/latest/source/drivers/rpmsg/virtio_rpmsg_bus.c#L974

Yes, but what I want to ask is: it looks to me like
rpmsg_ns_register_device() only creates an rpmsg device. Do you mean
that the rpmsg driver will handle the first message during its probe?

>
>
> >
> >> It doesn't send message.
> >
> > I see the function register the device to the bus, I wonder if this
> > means the device could be probed and used by the driver before
> > virtio_device_ready().
> >
> >>
> >> The risk could be for the rpmsg_ctrl device. Registering it
> >> after the virtio_device_ready(vdev) call could make sense...
> >
> > I see.
> >
> >>
> >>>>>
> >>>>> Thanks
> >>>>
> >>>> Is this an ack for the original patch?
> >>>
> >>> Nope, I meant, instead of moving virtio_device_ready() a little bit
> >>> earlier, can we only move the rvq filling after virtio_device_ready().
> >>>
> >>> Thanks
> >>
> >> Please find some concerns about this inversion here:
> >> https://lore.kernel.org/lkml/[email protected]/
> >>
> >> Regarding __virtio_unbreak_device. The pending virtio_break_device is
> >> used by some virtio driver.
> >> Could we consider that it makes sense to also have a
> >> virtio_unbreak_device interface?
> >
> > We don't want to allow the driver to unbreak a device since it's
> > easier to have bugs.
> >
> >>
> >>
> >> I do not well understand the reason of the commit:
> >> 8b4ec69d7e09 ("virtio: harden vring IRQ", 2022-05-27)
> >
> > It tries to forbid the virtqueue callbacks to be called before
> > virtio_device_ready(). This helps to prevent the malicious device from
> > attacking the driver.
> >
> > But unfortunately, it breaks several drivers because:
> >
> > 1) some drivers have races in probe/remove
> > 2) it tries to reuse vq->broken, which may break drivers that call
> > virtqueue_add() before virtio_device_ready(), which is allowed by the
> > spec
> >
> > There's a discussion to have a better behavior that doesn't break the
> > existing drivers. And the IRQ hardening feature is marked as broken
> > now, so rpmsg should be fine without any extra effort.
>
> Thanks for the explanations.
> If the discussions are in a mail thread could you give me the reference?

Here are the discussion and the commits:

https://lore.kernel.org/lkml/[email protected]/

https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git/commit/?h=linux-next&id=c346dae4f3fbce51bbd4f2ec5e8c6f9b91e93163
https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git/commit/?h=linux-next&id=6a9720576cd00d30722c5f755bd17d4cfa9df636

Thanks

>
> Thanks,
> Arnaud
>
> >
> >> So following alternative is probably pretty naive:
> >> Is the use of virtqueue_disable_cb could be an alternative to the
> >> vq->broken usage allowing to register buffer while preventing virtqueue IRQ?
> >
> > Probably not, there's no guarantee that the device will not send a
> > notification after virtqueue_disable_cb().
> >
> > Thanks
> >
> >>
> >> Thanks,
> >> Arnaud
> >>
> >>>
> >>>>
> >>>>>>
> >>>>>>>>
> >>>>>>>> Regards,
> >>>>>>>> Arnaud
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>> Mathieu
> >>>>>>>>>
> >>>>>>>>>> /* set up the receive buffers */
> >>>>>>>>>> for (i = 0; i < vrp->num_bufs / 2; i++) {
> >>>>>>>>>> struct scatterlist sg;
> >>>>>>>>>> @@ -983,9 +986,6 @@ static int rpmsg_probe(struct virtio_device *vdev)
> >>>>>>>>>> */
> >>>>>>>>>> notify = virtqueue_kick_prepare(vrp->rvq);
> >>>>>>>>>>
> >>>>>>>>>> - /* From this point on, we can notify and get callbacks. */
> >>>>>>>>>> - virtio_device_ready(vdev);
> >>>>>>>>>> -
> >>>>>>>>>> /* tell the remote processor it can start sending messages */
> >>>>>>>>>> /*
> >>>>>>>>>> * this might be concurrent with callbacks, but we are only
> >>>>>>>>>> --
> >>>>>>>>>> 2.34.1
> >>>>>>>>>>
> >>>>>>
> >>>>
> >>>
> >>
> >
>

2022-07-08 08:25:18

by Arnaud Pouliquen

Subject: Re: [PATCH] rpmsg: virtio: Fix broken rpmsg_probe()



On 7/8/22 08:19, Jason Wang wrote:
> On Wed, Jul 6, 2022 at 2:57 PM Arnaud POULIQUEN
> <[email protected]> wrote:
>>
>>
>>
>> On 7/6/22 06:03, Jason Wang wrote:
>>> On Mon, Jul 4, 2022 at 5:45 PM Arnaud POULIQUEN
>>> <[email protected]> wrote:
>>>>
>>>> Hello Jason,
>>>>
>>>> On 7/4/22 06:35, Jason Wang wrote:
>>>>> On Fri, Jul 1, 2022 at 2:16 PM Michael S. Tsirkin <[email protected]> wrote:
>>>>>>
>>>>>> On Fri, Jul 01, 2022 at 09:22:15AM +0800, Jason Wang wrote:
>>>>>>> On Fri, Jul 1, 2022 at 3:20 AM Michael S. Tsirkin <[email protected]> wrote:
>>>>>>>>
>>>>>>>> On Thu, Jun 30, 2022 at 11:51:30AM -0600, Mathieu Poirier wrote:
>>>>>>>>> + [email protected]
>>>>>>>>> + [email protected]
>>>>>>>>> + [email protected]
>>>>>>>>>
>>>>>>>>> On Thu, 30 Jun 2022 at 10:20, Arnaud POULIQUEN
>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> On 6/29/22 19:43, Mathieu Poirier wrote:
>>>>>>>>>>> Hi Anup,
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Jun 08, 2022 at 10:43:34PM +0530, Anup Patel wrote:
>>>>>>>>>>>> The rpmsg_probe() is broken at the moment because virtqueue_add_inbuf()
>>>>>>>>>>>> fails due to both virtqueues (Rx and Tx) marked as broken by the
>>>>>>>>>>>> __vring_new_virtqueue() function. To solve this, virtio_device_ready()
>>>>>>>>>>>> (which unbreaks queues) should be called before virtqueue_add_inbuf().
>>>>>>>>>>>>
>>>>>>>>>>>> Fixes: 8b4ec69d7e09 ("virtio: harden vring IRQ")
>>>>>>>>>>>> Signed-off-by: Anup Patel <[email protected]>
>>>>>>>>>>>> ---
>>>>>>>>>>>> drivers/rpmsg/virtio_rpmsg_bus.c | 6 +++---
>>>>>>>>>>>> 1 file changed, 3 insertions(+), 3 deletions(-)
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/drivers/rpmsg/virtio_rpmsg_bus.c b/drivers/rpmsg/virtio_rpmsg_bus.c
>>>>>>>>>>>> index 905ac7910c98..71a64d2c7644 100644
>>>>>>>>>>>> --- a/drivers/rpmsg/virtio_rpmsg_bus.c
>>>>>>>>>>>> +++ b/drivers/rpmsg/virtio_rpmsg_bus.c
>>>>>>>>>>>> @@ -929,6 +929,9 @@ static int rpmsg_probe(struct virtio_device *vdev)
>>>>>>>>>>>> /* and half is dedicated for TX */
>>>>>>>>>>>> vrp->sbufs = bufs_va + total_buf_space / 2;
>>>>>>>>>>>>
>>>>>>>>>>>> + /* From this point on, we can notify and get callbacks. */
>>>>>>>>>>>> + virtio_device_ready(vdev);
>>>>>>>>>>>> +
>>>>>>>>>>>
>>>>>>>>>>> Calling virtio_device_ready() here means that virtqueue_get_buf_ctx_split() can
>>>>>>>>>>> potentially be called (by way of rpmsg_recv_done()), which will race with
>>>>>>>>>>> virtqueue_add_inbuf(). If buffers in the virtqueue aren't available then
>>>>>>>>>>> rpmsg_recv_done() will fail, potentially breaking remote processors' state
>>>>>>>>>>> machines that don't expect their initial name service to fail when the "device"
>>>>>>>>>>> has been marked as ready.
>>>>>>>>>>>
>>>>>>>>>>> What does make me curious though is that nobody on the remoteproc mailing list
>>>>>>>>>>> has complained about commit 8b4ec69d7e09 breaking their environment... By now,
>>>>>>>>>>> i.e rc4, that should have happened. Anyone from TI, ST and Xilinx care to test this on
>>>>>>>>>>> their rig?
>>>>>>>>>>
>>>>>>>>>> I tested on an STM32MP1 board using tag v5.19-rc4 (03c765b0e3b4)
>>>>>>>>>> I confirm the issue!
>>>>>>>>>>
>>>>>>>>>> Concerning the solution, I share Mathieu's concern. This could break legacy.
>>>>>>>>>> I made a short test and I would suggest using __virtio_unbreak_device instead, to unbreak the virtqueues without changing the init sequence.
>>>>>>>>>>
>>>>>>>>>> In this case the patch would be:
>>>>>>>>>>
>>>>>>>>>> + /*
>>>>>>>>>> + * Unbreak the virtqueues to allow to add buffers before setting the vdev status
>>>>>>>>>> + * to ready
>>>>>>>>>> + */
>>>>>>>>>> + __virtio_unbreak_device(vdev);
>>>>>>>>>> +
>>>>>>>>>>
>>>>>>>>>> /* set up the receive buffers */
>>>>>>>>>> for (i = 0; i < vrp->num_bufs / 2; i++) {
>>>>>>>>>> struct scatterlist sg;
>>>>>>>>>> void *cpu_addr = vrp->rbufs + i * vrp->buf_size;
>>>>>>>>>
>>>>>>>>> This will indeed fix the problem. On the flip side the kernel
>>>>>>>>> documentation for __virtio_unbreak_device() puzzles me...
>>>>>>>>> It clearly states that it should be used for probing and restoring but
>>>>>>>>> _not_ directly by the driver. Function rpmsg_probe() is part of
>>>>>>>>> probing but also the entry point to a driver.
>>>>>>>>>
>>>>>>>>> Michael and virtualisation folks, is this the right way to move forward?
>>>>>>>>
>>>>>>>> I don't think it is, __virtio_unbreak_device is intended for core use.
>>>>>>>
>>>>>>> Can we fill the rx after virtio_device_ready() in this case?
>>>>>>>
>>>>>>> Btw, the driver set driver ok after registering, we probably get a svq
>>>>>>> kick before DRIVER_OK?
>>>>
>>>> By "registering" you mean calling rpmsg_virtio_add_ctrl_dev and
>>>> rpmsg_ns_register_device?
>>>
>>> Yes.
>>>
>>>>
>>>> The rpmsg_ns_register_device has to be called before. Because it has to be
>>>> probed to handle the first message coming from the remote side to create
>>>> associated rpmsg local device.
>>>
>>> I couldn't find the code to do this, maybe you can give me some hint on this.
>>
>> The rpmsg_ns is available here :
>> https://elixir.bootlin.com/linux/latest/source/drivers/rpmsg/rpmsg_ns.c
>>
>> It is probed on rpmsg_ns_register_device call.
>> https://elixir.bootlin.com/linux/latest/source/drivers/rpmsg/virtio_rpmsg_bus.c#L974
>
> Yes but what I want to ask is, it looks to me
> rpmsg_ns_register_device() only creates a rpmsg device. Do you mean
> the rpmsg driver that will handle the first message during its probe?

No, it is handled outside of its probe, in its callback. The callback is
called by virtio-rpmsg based on the rpmsg receiver address.

In detail:
The rpmsg virtio implementation has a mechanism to discover the
RPMsg services supported by the remote processor: the name service
announcement. For instance, for rpmsg_tty [1], the remote processor
sends an rpmsg service announcement message indicating that it supports
the "rpmsg-tty" service.
On the Linux side, rpmsg_ns receives the message and creates an rpmsg
channel, which leads to the creation of an rpmsg_tty device on the rpmsg bus.

If rpmsg_ns is not registered (so no rpmsg receiver address is
registered), then when the "ns announcement" is received, the message
is dropped and the service is not initialized.

[1]:https://elixir.bootlin.com/linux/v5.19-rc4/source/drivers/tty/rpmsg_tty.c
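
The dispatch described above can be sketched in user-space C as
follows. This is only a model under stated assumptions, not the
virtio_rpmsg_bus code: the toy_* names are hypothetical, the endpoint
table is a plain array rather than what the kernel uses, and
TOY_NS_ADDR is set to 53 to mirror the kernel's RPMSG_NS_ADDR. It shows
that an incoming message is routed by its receiver address, and that a
name service announcement arriving before the ns endpoint is registered
is dropped.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_NS_ADDR 53u /* assumed to mirror RPMSG_NS_ADDR */
#define TOY_MAX_EPT 8

struct toy_bus;

/* Receive callback attached to an endpoint (hypothetical signature). */
typedef void (*toy_rx_cb)(struct toy_bus *bus, const char *payload);

struct toy_ept {
	uint32_t addr; /* receiver address this endpoint listens on */
	toy_rx_cb cb;
};

struct toy_bus {
	struct toy_ept epts[TOY_MAX_EPT];
	size_t nr_epts;
	unsigned int dropped;  /* messages with no registered receiver */
	unsigned int channels; /* channels created from announcements */
};

/* Register an endpoint, as the ns device does when it is probed. */
int toy_register_ept(struct toy_bus *bus, uint32_t addr, toy_rx_cb cb)
{
	if (bus->nr_epts == TOY_MAX_EPT)
		return -1;
	bus->epts[bus->nr_epts].addr = addr;
	bus->epts[bus->nr_epts].cb = cb;
	bus->nr_epts++;
	return 0;
}

/* Name service callback: an announcement leads to a new channel. */
void toy_ns_cb(struct toy_bus *bus, const char *service)
{
	(void)service; /* the service name would select the driver */
	bus->channels++;
}

/* Route one incoming message by receiver address, or drop it. */
void toy_recv(struct toy_bus *bus, uint32_t dst, const char *payload)
{
	for (size_t i = 0; i < bus->nr_epts; i++) {
		if (bus->epts[i].addr == dst) {
			bus->epts[i].cb(bus, payload);
			return;
		}
	}
	bus->dropped++;
}
```

In this model, calling toy_recv() with TOY_NS_ADDR before
toy_register_ept() only increments the dropped counter, which is why
the ns device has to be registered before the remote side can send its
first announcement.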

>
>>
>>
>>>
>>>> It doesn't send message.
>>>
>>> I see the function register the device to the bus, I wonder if this
>>> means the device could be probed and used by the driver before
>>> virtio_device_ready().
>>>
>>>>
>>>> The risk could be for the rpmsg_ctrl device. Registering it
>>>> after the virtio_device_ready(vdev) call could make sense...
>>>
>>> I see.
>>>
>>>>
>>>>>>>
>>>>>>> Thanks
>>>>>>
>>>>>> Is this an ack for the original patch?
>>>>>
>>>>> Nope, I meant, instead of moving virtio_device_ready() a little bit
>>>>> earlier, can we only move the rvq filling after virtio_device_ready().
>>>>>
>>>>> Thanks
>>>>
>>>> Please find some concerns about this inversion here:
>>>> https://lore.kernel.org/lkml/[email protected]/
>>>>
>>>> Regarding __virtio_unbreak_device. The pending virtio_break_device is
>>>> used by some virtio driver.
>>>> Could we consider that it makes sense to also have a
>>>> virtio_unbreak_device interface?
>>>
>>> We don't want to allow the driver to unbreak a device since it's
>>> easier to have bugs.
>>>
>>>>
>>>>
>>>> I do not well understand the reason of the commit:
>>>> 8b4ec69d7e09 ("virtio: harden vring IRQ", 2022-05-27)
>>>
>>> It tries to forbid the virtqueue callbacks to be called before
>>> virtio_device_ready(). This helps to prevent the malicious device from
>>> attacking the driver.
>>>
>>> But unfortunately, it breaks several drivers because:
>>>
>>> 1) some drivers have races in probe/remove
>>> 2) it tries to reuse vq->broken, which may break drivers that call
>>> virtqueue_add() before virtio_device_ready(), which is allowed by the
>>> spec
>>>
>>> There's a discussion to have a better behavior that doesn't break the
>>> existing drivers. And the IRQ hardening feature is marked as broken
>>> now, so rpmsg should be fine without any extra effort.
>>
>> Thanks for the explanations.
>> If the discussions are in a mail thread could you give me the reference?
>
> Here're the discussions and commits:
>
> https://lore.kernel.org/lkml/[email protected]/
>
> https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git/commit/?h=linux-next&id=c346dae4f3fbce51bbd4f2ec5e8c6f9b91e93163
> https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git/commit/?h=linux-next&id=6a9720576cd00d30722c5f755bd17d4cfa9df636

Thanks for the links!
So no further update is planned in drivers/rpmsg/virtio_rpmsg_bus.c, if I understood correctly...

Thanks,
Arnaud

>
> Thanks
>
>>
>> Thanks,
>> Arnaud
>>
>>>
>>>> So following alternative is probably pretty naive:
>>>> Is the use of virtqueue_disable_cb could be an alternative to the
>>>> vq->broken usage allowing to register buffer while preventing virtqueue IRQ?
>>>
>>> Probably not, there's no guarantee that the device will not send a
>>> notification after virtqueue_disable_cb().
>>>
>>> Thanks
>>>
>>>>
>>>> Thanks,
>>>> Arnaud
>>>>
>>>>>
>>>>>>
>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Arnaud
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Mathieu
>>>>>>>>>>>
>>>>>>>>>>>> /* set up the receive buffers */
>>>>>>>>>>>> for (i = 0; i < vrp->num_bufs / 2; i++) {
>>>>>>>>>>>> struct scatterlist sg;
>>>>>>>>>>>> @@ -983,9 +986,6 @@ static int rpmsg_probe(struct virtio_device *vdev)
>>>>>>>>>>>> */
>>>>>>>>>>>> notify = virtqueue_kick_prepare(vrp->rvq);
>>>>>>>>>>>>
>>>>>>>>>>>> - /* From this point on, we can notify and get callbacks. */
>>>>>>>>>>>> - virtio_device_ready(vdev);
>>>>>>>>>>>> -
>>>>>>>>>>>> /* tell the remote processor it can start sending messages */
>>>>>>>>>>>> /*
>>>>>>>>>>>> * this might be concurrent with callbacks, but we are only
>>>>>>>>>>>> --
>>>>>>>>>>>> 2.34.1
>>>>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

2022-07-12 09:09:36

by Jason Wang

Subject: Re: [PATCH] rpmsg: virtio: Fix broken rpmsg_probe()

On Fri, Jul 8, 2022 at 4:01 PM Arnaud POULIQUEN
<[email protected]> wrote:
>
>
>
> On 7/8/22 08:19, Jason Wang wrote:
> > On Wed, Jul 6, 2022 at 2:57 PM Arnaud POULIQUEN
> > <[email protected]> wrote:
> >>
> >>
> >>
> >> On 7/6/22 06:03, Jason Wang wrote:
> >>> On Mon, Jul 4, 2022 at 5:45 PM Arnaud POULIQUEN
> >>> <[email protected]> wrote:
> >>>>
> >>>> Hello Jason,
> >>>>
> >>>> On 7/4/22 06:35, Jason Wang wrote:
> >>>>> On Fri, Jul 1, 2022 at 2:16 PM Michael S. Tsirkin <[email protected]> wrote:
> >>>>>>
> >>>>>> On Fri, Jul 01, 2022 at 09:22:15AM +0800, Jason Wang wrote:
> >>>>>>> On Fri, Jul 1, 2022 at 3:20 AM Michael S. Tsirkin <[email protected]> wrote:
> >>>>>>>>
> >>>>>>>> On Thu, Jun 30, 2022 at 11:51:30AM -0600, Mathieu Poirier wrote:
> >>>>>>>>> + [email protected]
> >>>>>>>>> + [email protected]
> >>>>>>>>> + [email protected]
> >>>>>>>>>
> >>>>>>>>> On Thu, 30 Jun 2022 at 10:20, Arnaud POULIQUEN
> >>>>>>>>> <[email protected]> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Hi,
> >>>>>>>>>>
> >>>>>>>>>> On 6/29/22 19:43, Mathieu Poirier wrote:
> >>>>>>>>>>> Hi Anup,
> >>>>>>>>>>>
> >>>>>>>>>>> On Wed, Jun 08, 2022 at 10:43:34PM +0530, Anup Patel wrote:
> >>>>>>>>>>>> The rpmsg_probe() is broken at the moment because virtqueue_add_inbuf()
> >>>>>>>>>>>> fails due to both virtqueues (Rx and Tx) marked as broken by the
> >>>>>>>>>>>> __vring_new_virtqueue() function. To solve this, virtio_device_ready()
> >>>>>>>>>>>> (which unbreaks queues) should be called before virtqueue_add_inbuf().
> >>>>>>>>>>>>
> >>>>>>>>>>>> Fixes: 8b4ec69d7e09 ("virtio: harden vring IRQ")
> >>>>>>>>>>>> Signed-off-by: Anup Patel <[email protected]>
> >>>>>>>>>>>> ---
> >>>>>>>>>>>> drivers/rpmsg/virtio_rpmsg_bus.c | 6 +++---
> >>>>>>>>>>>> 1 file changed, 3 insertions(+), 3 deletions(-)
> >>>>>>>>>>>>
> >>>>>>>>>>>> diff --git a/drivers/rpmsg/virtio_rpmsg_bus.c b/drivers/rpmsg/virtio_rpmsg_bus.c
> >>>>>>>>>>>> index 905ac7910c98..71a64d2c7644 100644
> >>>>>>>>>>>> --- a/drivers/rpmsg/virtio_rpmsg_bus.c
> >>>>>>>>>>>> +++ b/drivers/rpmsg/virtio_rpmsg_bus.c
> >>>>>>>>>>>> @@ -929,6 +929,9 @@ static int rpmsg_probe(struct virtio_device *vdev)
> >>>>>>>>>>>> /* and half is dedicated for TX */
> >>>>>>>>>>>> vrp->sbufs = bufs_va + total_buf_space / 2;
> >>>>>>>>>>>>
> >>>>>>>>>>>> + /* From this point on, we can notify and get callbacks. */
> >>>>>>>>>>>> + virtio_device_ready(vdev);
> >>>>>>>>>>>> +
> >>>>>>>>>>>
> >>>>>>>>>>> Calling virtio_device_ready() here means that virtqueue_get_buf_ctx_split() can
> >>>>>>>>>>> potentially be called (by way of rpmsg_recv_done()), which will race with
> >>>>>>>>>>> virtqueue_add_inbuf(). If buffers in the virtqueue aren't available then
> >>>>>>>>>>> rpmsg_recv_done() will fail, potentially breaking remote processors' state
> >>>>>>>>>>> machines that don't expect their initial name service to fail when the "device"
> >>>>>>>>>>> has been marked as ready.
> >>>>>>>>>>>
> >>>>>>>>>>> What does make me curious though is that nobody on the remoteproc mailing list
> >>>>>>>>>>> has complained about commit 8b4ec69d7e09 breaking their environment... By now,
> >>>>>>>>>>> i.e rc4, that should have happened. Anyone from TI, ST and Xilinx care to test this on
> >>>>>>>>>>> their rig?
> >>>>>>>>>>
> >>>>>>>>>> I tested on an STM32MP1 board using tag v5.19-rc4 (03c765b0e3b4)
> >>>>>>>>>> I confirm the issue!
> >>>>>>>>>>
> >>>>>>>>>> Concerning the solution, I share Mathieu's concern. This could break legacy.
> >>>>>>>>>> I made a short test and I would suggest using __virtio_unbreak_device instead, to unbreak the virtqueues without changing the init sequence.
> >>>>>>>>>>
> >>>>>>>>>> In this case the patch would be:
> >>>>>>>>>>
> >>>>>>>>>> + /*
> >>>>>>>>>> + * Unbreak the virtqueues to allow to add buffers before setting the vdev status
> >>>>>>>>>> + * to ready
> >>>>>>>>>> + */
> >>>>>>>>>> + __virtio_unbreak_device(vdev);
> >>>>>>>>>> +
> >>>>>>>>>>
> >>>>>>>>>> /* set up the receive buffers */
> >>>>>>>>>> for (i = 0; i < vrp->num_bufs / 2; i++) {
> >>>>>>>>>> struct scatterlist sg;
> >>>>>>>>>> void *cpu_addr = vrp->rbufs + i * vrp->buf_size;
> >>>>>>>>>
> >>>>>>>>> This will indeed fix the problem. On the flip side the kernel
> >>>>>>>>> documentation for __virtio_unbreak_device() puzzles me...
> >>>>>>>>> It clearly states that it should be used for probing and restoring but
> >>>>>>>>> _not_ directly by the driver. Function rpmsg_probe() is part of
> >>>>>>>>> probing but also the entry point to a driver.
> >>>>>>>>>
> >>>>>>>>> Michael and virtualisation folks, is this the right way to move forward?
> >>>>>>>>
> >>>>>>>> I don't think it is, __virtio_unbreak_device is intended for core use.
> >>>>>>>
> >>>>>>> Can we fill the rx after virtio_device_ready() in this case?
> >>>>>>>
> >>>>>>> Btw, the driver sets DRIVER_OK after registering, so we could probably
> >>>>>>> get an svq kick before DRIVER_OK?
> >>>>
> >>>> By "registering" you mean calling rpmsg_virtio_add_ctrl_dev and
> >>>> rpmsg_ns_register_device?
> >>>
> >>> Yes.
> >>>
> >>>>
> >>>> The rpmsg_ns_register_device has to be called before. Because it has to be
> >>>> probed to handle the first message coming from the remote side to create
> >>>> associated rpmsg local device.
> >>>
> >>> I couldn't find the code to do this, maybe you can give me some hint on this.
> >>
> >> The rpmsg_ns is available here:
> >> https://elixir.bootlin.com/linux/latest/source/drivers/rpmsg/rpmsg_ns.c
> >>
> >> It is probed on rpmsg_ns_register_device call.
> >> https://elixir.bootlin.com/linux/latest/source/drivers/rpmsg/virtio_rpmsg_bus.c#L974
> >
> > Yes, but what I want to ask is this: it looks to me like
> > rpmsg_ns_register_device() only creates an rpmsg device. Do you mean
> > the rpmsg driver that will handle the first message during its probe?
>
> No, it will be outside of its probe, in its callback. The callback is called
> by virtio-rpmsg based on the rpmsg receiver address.
>
> For the details:
> In rpmsg virtio implementation there is a mechanism to discover the
> RPMsg services supported by the remote processor: the name service
> announcement. For instance for the rpmsg_tty[1], the remote processor
> sends a rpmsg service announcement message indicating that it supports
> the "rpmsg-tty" service.
> On the Linux side, the rpmsg_ns receives the message and creates an rpmsg
> channel that leads to a rpmsg_tty device creation on the rpmsg bus.
>
> If the rpmsg_ns is not registered (so no rpmsg receiver address is
> registered), then when the "ns announcement" is received, the message
> is dropped and the service is not initialized.
>
> [1]:https://elixir.bootlin.com/linux/v5.19-rc4/source/drivers/tty/rpmsg_tty.c

Thanks, so if I understand correctly, there could be a race between
the virtio_device_ready() and the name service:

If the announcement came before DRIVER_OK, it might be dropped by the device.
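
For reference, the drop described in the quoted explanation above can be
modeled in plain userspace C. This is only a sketch and not kernel code: the
names model_ept, model_register_ept, model_dispatch and model_ns_cb are
hypothetical, and MODEL_NS_ADDR is assumed to mirror the kernel's name-service
address (RPMSG_NS_ADDR, 53). It shows only that a message whose destination
address has no registered endpoint never reaches a callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: mirrors the kernel's RPMSG_NS_ADDR (53). */
#define MODEL_NS_ADDR  53u
#define MODEL_MAX_EPTS 8

typedef void (*model_rx_cb)(const void *data, size_t len);

/* Hypothetical stand-in for an rpmsg endpoint: an address plus an rx callback. */
struct model_ept {
	uint32_t addr;
	model_rx_cb cb;
	bool used;
};

static struct model_ept model_epts[MODEL_MAX_EPTS];
static unsigned int model_ns_seen;

/* Stand-in for the name-service callback: just counts announcements. */
static void model_ns_cb(const void *data, size_t len)
{
	(void)data;
	(void)len;
	model_ns_seen++;
}

/* Register a receiver address. Returns false if the table is full. */
static bool model_register_ept(uint32_t addr, model_rx_cb cb)
{
	assert(cb != NULL);
	for (size_t i = 0; i < MODEL_MAX_EPTS; i++) {
		if (!model_epts[i].used) {
			model_epts[i].addr = addr;
			model_epts[i].cb = cb;
			model_epts[i].used = true;
			return true;
		}
	}
	return false;
}

/*
 * Deliver a message to the endpoint matching dst.
 * Returns false when there is no recipient, i.e. the message is dropped.
 */
static bool model_dispatch(uint32_t dst, const void *data, size_t len)
{
	for (size_t i = 0; i < MODEL_MAX_EPTS; i++) {
		if (model_epts[i].used && model_epts[i].addr == dst) {
			model_epts[i].cb(data, len);
			return true;
		}
	}
	return false;
}
```

In this model, dispatching to MODEL_NS_ADDR before model_register_ept() has
been called returns false and model_ns_seen stays at 0; the same dispatch
after registration invokes the callback.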

>
> >
> >>
> >>
> >>>
> >>>> It doesn't send message.
> >>>
> >>> I see the function registers the device on the bus. I wonder if this
> >>> means the device could be probed and used by the driver before
> >>> virtio_device_ready().
> >>>
> >>>>
> >>>> The risk could be for the rpmsg_ctrl device. Registering it
> >>>> after the virtio_device_ready(vdev) call could make sense...
> >>>
> >>> I see.
> >>>
> >>>>
> >>>>>>>
> >>>>>>> Thanks
> >>>>>>
> >>>>>> Is this an ack for the original patch?
> >>>>>
> >>>>> Nope, I meant: instead of moving virtio_device_ready() a little bit
> >>>>> earlier, can we only move the rvq filling after virtio_device_ready()?
> >>>>>
> >>>>> Thanks
> >>>>
> >>>> Please find some concerns about this inversion here:
> >>>> https://lore.kernel.org/lkml/[email protected]/
> >>>>
> >>>> Regarding __virtio_unbreak_device: the pending virtio_break_device is
> >>>> used by some virtio drivers.
> >>>> Could we consider that it makes sense to also have a
> >>>> virtio_unbreak_device interface?
> >>>
> >>> We don't want to allow the driver to unbreak a device since it's
> >>> easier to have bugs.
> >>>
> >>>>
> >>>>
> >>>> I do not fully understand the reason for the commit:
> >>>> 8b4ec69d7e09 ("virtio: harden vring IRQ", 2022-05-27)
> >>>
> >>> It tries to prevent the virtqueue callbacks from being called before
> >>> virtio_device_ready(). This helps to prevent a malicious device from
> >>> attacking the driver.
> >>>
> >>> But unfortunately, it breaks several drivers because:
> >>>
> >>> 1) some drivers have races in probe/remove
> >>> 2) it tries to reuse vq->broken, which may break drivers that call
> >>> virtqueue_add() before virtio_device_ready(), which is allowed by the
> >>> spec
> >>>
> >>> There's a discussion to have a better behavior that doesn't break the
> >>> existing drivers. And the IRQ hardening feature is marked as broken
> >>> now, so rpmsg should be fine without any extra effort.
> >>
> >> Thanks for the explanations.
> >> If the discussions are in a mail thread could you give me the reference?
> >
> > Here're the discussions and commits:
> >
> > https://lore.kernel.org/lkml/[email protected]/
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git/commit/?h=linux-next&id=c346dae4f3fbce51bbd4f2ec5e8c6f9b91e93163
> > https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git/commit/?h=linux-next&id=6a9720576cd00d30722c5f755bd17d4cfa9df636
>
> Thanks for the links!
> So no more updates are planned in drivers/rpmsg/virtio_rpmsg_bus.c, if I understood correctly...

Michael proposed allowing the callback after the vq kick. I think the
rpmsg callback is ready before it kicks the device. If this is true,
no more updates are needed.
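
To make the two orderings concrete, here is a minimal userspace sketch, not
kernel code and not the actual patch. It assumes one reading of the thread:
with the hardening, a queue starts "broken" and adding a buffer fails (the
model returns -EIO) until the device is marked ready; with the proposal,
buffers can be added at any time and callbacks are simply ignored until the
driver has kicked. The struct model_vq, its fields and all model_* helpers are
hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a virtqueue; only the state discussed here. */
struct model_vq {
	bool broken;            /* models vq->broken as reused by the hardening */
	bool kicked;            /* assumption: set once the driver has kicked */
	unsigned int num_bufs;  /* buffers successfully added */
	unsigned int cb_calls;  /* callbacks actually delivered */
};

/* Hardened behaviour: the queue starts broken until the device is ready. */
static void model_vq_init_hardened(struct model_vq *vq)
{
	*vq = (struct model_vq){ .broken = true };
}

/* Proposed behaviour (as modeled here): usable at once, callbacks gated on kick. */
static void model_vq_init_kick_gated(struct model_vq *vq)
{
	*vq = (struct model_vq){ .broken = false };
}

/* Models the effect of virtio_device_ready() on the queue: unbreak it. */
static void model_device_ready(struct model_vq *vq)
{
	vq->broken = false;
}

/* Adding a buffer fails while the queue is marked broken. */
static int model_add_inbuf(struct model_vq *vq)
{
	assert(vq != NULL);
	if (vq->broken)
		return -EIO;
	vq->num_bufs++;
	return 0;
}

static void model_kick(struct model_vq *vq)
{
	vq->kicked = true;
}

/* An interrupt only reaches the callback once the queue is usable and kicked. */
static bool model_interrupt(struct model_vq *vq)
{
	if (vq->broken || !vq->kicked)
		return false;
	vq->cb_calls++;
	return true;
}
```

In the hardened variant, model_add_inbuf() fails until model_device_ready() is
called, which is the failure reported for rpmsg_probe(). In the kick-gated
variant the buffers can be queued first, and an early interrupt is ignored
until model_kick() has run.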

But to be safe, I will cc you and all the other maintainers for the
patch of the above proposal.

Thanks

>
> Thanks,
> Arnaud
>
> >
> > Thanks
> >
> >>
> >> Thanks,
> >> Arnaud
> >>
> >>>
> >>>> So the following alternative is probably pretty naive:
> >>>> Could the use of virtqueue_disable_cb be an alternative to the
> >>>> vq->broken usage, allowing buffers to be registered while preventing virtqueue IRQs?
> >>>
> >>> Probably not, there's no guarantee that the device will not send
> >>> a notification after virtqueue_disable_cb().
> >>>
> >>> Thanks
> >>>
> >>>>
> >>>> Thanks,
> >>>> Arnaud
> >>>>
> >>>>>
> >>>>>>
> >>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Regards,
> >>>>>>>>>> Arnaud
> >>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks,
> >>>>>>>>>>> Mathieu
> >>>>>>>>>>>
> >>>>>>>>>>>> /* set up the receive buffers */
> >>>>>>>>>>>> for (i = 0; i < vrp->num_bufs / 2; i++) {
> >>>>>>>>>>>> struct scatterlist sg;
> >>>>>>>>>>>> @@ -983,9 +986,6 @@ static int rpmsg_probe(struct virtio_device *vdev)
> >>>>>>>>>>>> */
> >>>>>>>>>>>> notify = virtqueue_kick_prepare(vrp->rvq);
> >>>>>>>>>>>>
> >>>>>>>>>>>> - /* From this point on, we can notify and get callbacks. */
> >>>>>>>>>>>> - virtio_device_ready(vdev);
> >>>>>>>>>>>> -
> >>>>>>>>>>>> /* tell the remote processor it can start sending messages */
> >>>>>>>>>>>> /*
> >>>>>>>>>>>> * this might be concurrent with callbacks, but we are only
> >>>>>>>>>>>> --
> >>>>>>>>>>>> 2.34.1
> >>>>>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> >
>