2021-08-06 19:13:13

by Xianting Tian

[permalink] [raw]
Subject: [PATCH] mailbox: fix a UAF bug in msg_submit()

We met a UAF issue during our mailbox testing.

In synchronous mailbox, we use mbox_send_message() to send a message
and wait for completion. mbox_send_message() calls msg_submit() to
send the message for the first time, if timeout, it will send the
message in tx_tick() for the second time.

We assume message sending failed for both two times, then the message
will be still in the internal buffer of a chan(chan->msg_data[idx]).
It will be send again in the same way when mbox_send_message() is
called next time. But, at this time this message (chan->msg_data[idx])
may be a UAF pointer, as the message is passed to mailbox core by user.
User may free it after last calling of mbox_send_message() returned
or not. Who knows!!!

In this patch, if the first time sending timeout, we pass timeout
info(-ETIME) to msg_submit() when do the second time sending by
tx_tick(). If it still send failed (chan->mbox->ops->send_data()
returned non-zero value) in the second time, we will give up this
message(chan->msg_count--) sending. It doesn't matter, user can chose
to send it again.

Actually, the issue I described above doesn't exist if
'chan->mbox->ops->send_data()' always return 0. Because if it always
returns 0, we will always do 'chan->msg_count—' regardless of message
sending success or failure. We have such mailbox driver, for example,
hi6220_mbox_send_data() always return 0. But we still have mailbox
drivers, which don't always return 0, for example, flexrm_send_data()
of drivers/mailbox/bcm-flexrm-mailbox.c.

Signed-off-by: Xianting Tian <[email protected]>
---
drivers/mailbox/mailbox.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c
index 3e7d4b20a..3e010aafa 100644
--- a/drivers/mailbox/mailbox.c
+++ b/drivers/mailbox/mailbox.c
@@ -50,7 +50,7 @@ static int add_to_rbuf(struct mbox_chan *chan, void *mssg)
return idx;
}

-static void msg_submit(struct mbox_chan *chan)
+static void msg_submit(struct mbox_chan *chan, int last_submit)
{
unsigned count, idx;
unsigned long flags;
@@ -75,7 +75,7 @@ static void msg_submit(struct mbox_chan *chan)
chan->cl->tx_prepare(chan->cl, data);
/* Try to submit a message to the MBOX controller */
err = chan->mbox->ops->send_data(chan, data);
- if (!err) {
+ if (!err || last_submit == -ETIME) {
chan->active_req = data;
chan->msg_count--;
}
@@ -101,7 +101,7 @@ static void tx_tick(struct mbox_chan *chan, int r)
spin_unlock_irqrestore(&chan->lock, flags);

/* Submit next message */
- msg_submit(chan);
+ msg_submit(chan, r);

if (!mssg)
return;
@@ -260,7 +260,7 @@ int mbox_send_message(struct mbox_chan *chan, void *mssg)
return t;
}

- msg_submit(chan);
+ msg_submit(chan, 0);

if (chan->cl->tx_block) {
unsigned long wait;
--
2.17.1


2021-08-10 16:22:35

by Xianting Tian

[permalink] [raw]
Subject: Re: [PATCH] mailbox: fix a UAF bug in msg_submit()

Could I get the comments for the patch, thanks.


在 2021/8/6 下午8:15, Xianting Tian 写道:
> We met a UAF issue during our mailbox testing.
>
> In synchronous mailbox, we use mbox_send_message() to send a message
> and wait for completion. mbox_send_message() calls msg_submit() to
> send the message for the first time, if timeout, it will send the
> message in tx_tick() for the second time.
>
> We assume message sending failed for both two times, then the message
> will be still in the internal buffer of a chan(chan->msg_data[idx]).
> It will be send again in the same way when mbox_send_message() is
> called next time. But, at this time this message (chan->msg_data[idx])
> may be a UAF pointer, as the message is passed to mailbox core by user.
> User may free it after last calling of mbox_send_message() returned
> or not. Who knows!!!
>
> In this patch, if the first time sending timeout, we pass timeout
> info(-ETIME) to msg_submit() when do the second time sending by
> tx_tick(). If it still send failed (chan->mbox->ops->send_data()
> returned non-zero value) in the second time, we will give up this
> message(chan->msg_count--) sending. It doesn't matter, user can chose
> to send it again.
>
> Actually, the issue I described above doesn't exist if
> 'chan->mbox->ops->send_data()' always return 0. Because if it always
> returns 0, we will always do 'chan->msg_count—' regardless of message
> sending success or failure. We have such mailbox driver, for example,
> hi6220_mbox_send_data() always return 0. But we still have mailbox
> drivers, which don't always return 0, for example, flexrm_send_data()
> of drivers/mailbox/bcm-flexrm-mailbox.c.
>
> Signed-off-by: Xianting Tian <[email protected]>
> ---
> drivers/mailbox/mailbox.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c
> index 3e7d4b20a..3e010aafa 100644
> --- a/drivers/mailbox/mailbox.c
> +++ b/drivers/mailbox/mailbox.c
> @@ -50,7 +50,7 @@ static int add_to_rbuf(struct mbox_chan *chan, void *mssg)
> return idx;
> }
>
> -static void msg_submit(struct mbox_chan *chan)
> +static void msg_submit(struct mbox_chan *chan, int last_submit)
> {
> unsigned count, idx;
> unsigned long flags;
> @@ -75,7 +75,7 @@ static void msg_submit(struct mbox_chan *chan)
> chan->cl->tx_prepare(chan->cl, data);
> /* Try to submit a message to the MBOX controller */
> err = chan->mbox->ops->send_data(chan, data);
> - if (!err) {
> + if (!err || last_submit == -ETIME) {
> chan->active_req = data;
> chan->msg_count--;
> }
> @@ -101,7 +101,7 @@ static void tx_tick(struct mbox_chan *chan, int r)
> spin_unlock_irqrestore(&chan->lock, flags);
>
> /* Submit next message */
> - msg_submit(chan);
> + msg_submit(chan, r);
>
> if (!mssg)
> return;
> @@ -260,7 +260,7 @@ int mbox_send_message(struct mbox_chan *chan, void *mssg)
> return t;
> }
>
> - msg_submit(chan);
> + msg_submit(chan, 0);
>
> if (chan->cl->tx_block) {
> unsigned long wait;

2021-08-11 02:40:05

by Guo Ren

[permalink] [raw]
Subject: Re: [PATCH] mailbox: fix a UAF bug in msg_submit()

On Fri, Aug 6, 2021 at 8:15 PM Xianting Tian
<[email protected]> wrote:
>
> We met a UAF issue during our mailbox testing.
>
> In synchronous mailbox, we use mbox_send_message() to send a message
> and wait for completion. mbox_send_message() calls msg_submit() to
> send the message for the first time, if timeout, it will send the
> message in tx_tick() for the second time.
Yes, in chan->cl->tx_block will give a second chance to transmit the
msg in tx_tick. If chan->mbox->ops->send_data returned error again,
then mbox_send_message would return to the user's driver, and the user
driver would free the mssg which is still in chan->msg_data &
msg_count. Then it caused "Used After Free" problem.

The current second chance for chan->cl->tx_block is tricky and it's
hard for mailbox driver and user driver implementation in the way.
I recommend deleting the second chance mechanism totally to make the
code understandable. (If first failed, 2th & 3th & 4th would be still
failed.)


>
> We assume message sending failed for both two times, then the message
> will be still in the internal buffer of a chan(chan->msg_data[idx]).
> It will be send again in the same way when mbox_send_message() is
> called next time. But, at this time this message (chan->msg_data[idx])
> may be a UAF pointer, as the message is passed to mailbox core by user.
> User may free it after last calling of mbox_send_message() returned
> or not. Who knows!!!
>
> In this patch, if the first time sending timeout, we pass timeout
> info(-ETIME) to msg_submit() when do the second time sending by
> tx_tick(). If it still send failed (chan->mbox->ops->send_data()
> returned non-zero value) in the second time, we will give up this
> message(chan->msg_count--) sending. It doesn't matter, user can chose
> to send it again.
>
> Actually, the issue I described above doesn't exist if
> 'chan->mbox->ops->send_data()' always return 0. Because if it always
> returns 0, we will always do 'chan->msg_count—' regardless of message
> sending success or failure. We have such mailbox driver, for example,
> hi6220_mbox_send_data() always return 0. But we still have mailbox
> drivers, which don't always return 0, for example, flexrm_send_data()
> of drivers/mailbox/bcm-flexrm-mailbox.c.
>
> Signed-off-by: Xianting Tian <[email protected]>
> ---
> drivers/mailbox/mailbox.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c
> index 3e7d4b20a..3e010aafa 100644
> --- a/drivers/mailbox/mailbox.c
> +++ b/drivers/mailbox/mailbox.c
> @@ -50,7 +50,7 @@ static int add_to_rbuf(struct mbox_chan *chan, void *mssg)
> return idx;
> }
>
> -static void msg_submit(struct mbox_chan *chan)
> +static void msg_submit(struct mbox_chan *chan, int last_submit)
> {
> unsigned count, idx;
> unsigned long flags;
> @@ -75,7 +75,7 @@ static void msg_submit(struct mbox_chan *chan)
> chan->cl->tx_prepare(chan->cl, data);
> /* Try to submit a message to the MBOX controller */
> err = chan->mbox->ops->send_data(chan, data);
> - if (!err) {
> + if (!err || last_submit == -ETIME) {
> chan->active_req = data;
> chan->msg_count--;
> }
> @@ -101,7 +101,7 @@ static void tx_tick(struct mbox_chan *chan, int r)
> spin_unlock_irqrestore(&chan->lock, flags);
>
> /* Submit next message */
> - msg_submit(chan);
> + msg_submit(chan, r);
>
> if (!mssg)
> return;
> @@ -260,7 +260,7 @@ int mbox_send_message(struct mbox_chan *chan, void *mssg)
> return t;
> }
>
> - msg_submit(chan);
> + msg_submit(chan, 0);
>
> if (chan->cl->tx_block) {
> unsigned long wait;
> --
> 2.17.1
>


--
Best Regards
Guo Ren

ML: https://lore.kernel.org/linux-csky/

2021-08-17 03:42:44

by Xianting Tian

[permalink] [raw]
Subject: Re: [PATCH] mailbox: fix a UAF bug in msg_submit()


在 2021/8/11 上午10:37, Guo Ren 写道:
> On Fri, Aug 6, 2021 at 8:15 PM Xianting Tian
> <[email protected]> wrote:
>> We met a UAF issue during our mailbox testing.
>>
>> In synchronous mailbox, we use mbox_send_message() to send a message
>> and wait for completion. mbox_send_message() calls msg_submit() to
>> send the message for the first time, if timeout, it will send the
>> message in tx_tick() for the second time.
> Yes, in chan->cl->tx_block will give a second chance to transmit the
> msg in tx_tick. If chan->mbox->ops->send_data returned error again,
> then mbox_send_message would return to the user's driver, and the user
> driver would free the mssg which is still in chan->msg_data &
> msg_count. Then it caused "Used After Free" problem.
>
> The current second chance for chan->cl->tx_block is tricky and it's
> hard for mailbox driver and user driver implementation in the way.
> I recommend deleting the second chance mechanism totally to make the
> code understandable. (If first failed, 2th & 3th & 4th would be still
> failed.)
>
Thanks Guo Ren's comments,

Hi Jassi Brar,

Could you comment this patch, wthether it is a issue or not?  we look
forward to your reply.

>> We assume message sending failed for both two times, then the message
>> will be still in the internal buffer of a chan(chan->msg_data[idx]).
>> It will be send again in the same way when mbox_send_message() is
>> called next time. But, at this time this message (chan->msg_data[idx])
>> may be a UAF pointer, as the message is passed to mailbox core by user.
>> User may free it after last calling of mbox_send_message() returned
>> or not. Who knows!!!
>>
>> In this patch, if the first time sending timeout, we pass timeout
>> info(-ETIME) to msg_submit() when do the second time sending by
>> tx_tick(). If it still send failed (chan->mbox->ops->send_data()
>> returned non-zero value) in the second time, we will give up this
>> message(chan->msg_count--) sending. It doesn't matter, user can chose
>> to send it again.
>>
>> Actually, the issue I described above doesn't exist if
>> 'chan->mbox->ops->send_data()' always return 0. Because if it always
>> returns 0, we will always do 'chan->msg_count—' regardless of message
>> sending success or failure. We have such mailbox driver, for example,
>> hi6220_mbox_send_data() always return 0. But we still have mailbox
>> drivers, which don't always return 0, for example, flexrm_send_data()
>> of drivers/mailbox/bcm-flexrm-mailbox.c.
>>
>> Signed-off-by: Xianting Tian <[email protected]>
>> ---
>> drivers/mailbox/mailbox.c | 8 ++++----
>> 1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c
>> index 3e7d4b20a..3e010aafa 100644
>> --- a/drivers/mailbox/mailbox.c
>> +++ b/drivers/mailbox/mailbox.c
>> @@ -50,7 +50,7 @@ static int add_to_rbuf(struct mbox_chan *chan, void *mssg)
>> return idx;
>> }
>>
>> -static void msg_submit(struct mbox_chan *chan)
>> +static void msg_submit(struct mbox_chan *chan, int last_submit)
>> {
>> unsigned count, idx;
>> unsigned long flags;
>> @@ -75,7 +75,7 @@ static void msg_submit(struct mbox_chan *chan)
>> chan->cl->tx_prepare(chan->cl, data);
>> /* Try to submit a message to the MBOX controller */
>> err = chan->mbox->ops->send_data(chan, data);
>> - if (!err) {
>> + if (!err || last_submit == -ETIME) {
>> chan->active_req = data;
>> chan->msg_count--;
>> }
>> @@ -101,7 +101,7 @@ static void tx_tick(struct mbox_chan *chan, int r)
>> spin_unlock_irqrestore(&chan->lock, flags);
>>
>> /* Submit next message */
>> - msg_submit(chan);
>> + msg_submit(chan, r);
>>
>> if (!mssg)
>> return;
>> @@ -260,7 +260,7 @@ int mbox_send_message(struct mbox_chan *chan, void *mssg)
>> return t;
>> }
>>
>> - msg_submit(chan);
>> + msg_submit(chan, 0);
>>
>> if (chan->cl->tx_block) {
>> unsigned long wait;
>> --
>> 2.17.1
>>
>

2021-08-17 04:31:26

by Jassi Brar

[permalink] [raw]
Subject: Re: [PATCH] mailbox: fix a UAF bug in msg_submit()

On Fri, Aug 6, 2021 at 7:15 AM Xianting Tian
<[email protected]> wrote:
>
> We met a UAF issue during our mailbox testing.
>
> In synchronous mailbox, we use mbox_send_message() to send a message
> and wait for completion. mbox_send_message() calls msg_submit() to
> send the message for the first time, if timeout, it will send the
> message in tx_tick() for the second time.
>
Seems like your controller's .send_data() returns error. Can you
please explain why it does so? Because
send_data() only _accepts_ data for further transmission... which
should seldom be a problem.

thanks.

2021-08-17 05:42:21

by Xianting Tian

[permalink] [raw]
Subject: Re: [PATCH] mailbox: fix a UAF bug in msg_submit()


在 2021/8/17 下午12:29, Jassi Brar 写道:
> On Fri, Aug 6, 2021 at 7:15 AM Xianting Tian
> <[email protected]> wrote:
>> We met a UAF issue during our mailbox testing.
>>
>> In synchronous mailbox, we use mbox_send_message() to send a message
>> and wait for completion. mbox_send_message() calls msg_submit() to
>> send the message for the first time, if timeout, it will send the
>> message in tx_tick() for the second time.
>>
> Seems like your controller's .send_data() returns error. Can you
> please explain why it does so? Because
> send_data() only _accepts_ data for further transmission... which
> should seldom be a problem.


>
> thanks.

2021-08-17 06:01:38

by Xianting Tian

[permalink] [raw]
Subject: Re: [PATCH] mailbox: fix a UAF bug in msg_submit()


在 2021/8/17 下午12:29, Jassi Brar 写道:
> On Fri, Aug 6, 2021 at 7:15 AM Xianting Tian
> <[email protected]> wrote:
>> We met a UAF issue during our mailbox testing.
>>
>> In synchronous mailbox, we use mbox_send_message() to send a message
>> and wait for completion. mbox_send_message() calls msg_submit() to
>> send the message for the first time, if timeout, it will send the
>> message in tx_tick() for the second time.
>>
> Seems like your controller's .send_data() returns error. Can you
> please explain why it does so? Because
> send_data() only _accepts_ data for further transmission... which
> should seldom be a problem.

Thanks for the comments,

We developed virtio-mailbox for heterogeneous virtualization system.

virtio-mailbox is based on the mialbox framework.

In virtio framework, its send func 'virtqueue_add_outbuf()' may return
error, which caused .send_data() return error.  And also contains other
csenarios.

But I think mailbox framework shouldn't depend on .send_data() always
return OK,  as .send_data() is implemented by mailbox hardware
manufacturer, which is not controlled by mailbox framework itself.

You said 'seldom',  but it still possible we can meet such issue.  sucn
as flexrm_send_data() of drivers/mailbox/bcm-flexrm-mailbox.c.

I think mailbox framework should be work normaly no matter .send_data()
returns ok or not ok.  Do you think so? thanks

>
> thanks.

2021-08-17 08:02:47

by Xianting Tian

[permalink] [raw]
Subject: Re: [PATCH] mailbox: fix a UAF bug in msg_submit()


在 2021/8/17 下午1:58, Xianting TIan 写道:
>
> 在 2021/8/17 下午12:29, Jassi Brar 写道:
>> On Fri, Aug 6, 2021 at 7:15 AM Xianting Tian
>> <[email protected]> wrote:
>>> We met a UAF issue during our mailbox testing.
>>>
>>> In synchronous mailbox, we use mbox_send_message() to send a message
>>> and wait for completion. mbox_send_message() calls msg_submit() to
>>> send the message for the first time, if timeout, it will send the
>>> message in tx_tick() for the second time.
>>>
>> Seems like your controller's  .send_data() returns error. Can you
>> please explain why it does so? Because
>> send_data() only _accepts_ data for further transmission... which
>> should seldom be a problem.
>
> Thanks for the comments,
>
> We developed virtio-mailbox for heterogeneous virtualization system.
>
> virtio-mailbox is based on the mialbox framework.
>
> In virtio framework, its send func 'virtqueue_add_outbuf()' may return
> error, which caused .send_data() return error.  And also contains
> other csenarios.
>
> But I think mailbox framework shouldn't depend on .send_data() always
> return OK,  as .send_data() is implemented by mailbox hardware
> manufacturer, which is not controlled by mailbox framework itself.
>
> You said 'seldom',  but it still possible we can meet such issue. sucn
> as flexrm_send_data() of drivers/mailbox/bcm-flexrm-mailbox.c.
>
> I think mailbox framework should be work normaly no matter
> .send_data() returns ok or not ok.  Do you think so? thanks

Another solution is to ignore the return value of .send_data() in
msg_submit(),

change

        err = chan->mbox->ops->send_data(chan, data);
        if (!err) {
                chan->active_req = data;
                chan->msg_count--;
        }

to

        chan->mbox->ops->send_data(chan, data);
        chan->active_req = data;
        chan->msg_count--;

But the side effect of the solution is obvious, as if send failed in the
first time, it will have no chance to sent it in tx_tick() for the
second time. That is to say, no retry.

So I think the solution in this patch is better.

Looking forward to your further comments, thanks

>
>>
>> thanks.

2021-08-17 23:34:45

by Jassi Brar

[permalink] [raw]
Subject: Re: [PATCH] mailbox: fix a UAF bug in msg_submit()

On Tue, Aug 17, 2021 at 3:01 AM Xianting TIan
<[email protected]> wrote:
>
>
> 在 2021/8/17 下午1:58, Xianting TIan 写道:
> >
> > 在 2021/8/17 下午12:29, Jassi Brar 写道:
> >> On Fri, Aug 6, 2021 at 7:15 AM Xianting Tian
> >> <[email protected]> wrote:
> >>> We met a UAF issue during our mailbox testing.
> >>>
> >>> In synchronous mailbox, we use mbox_send_message() to send a message
> >>> and wait for completion. mbox_send_message() calls msg_submit() to
> >>> send the message for the first time, if timeout, it will send the
> >>> message in tx_tick() for the second time.
> >>>
> >> Seems like your controller's .send_data() returns error. Can you
> >> please explain why it does so? Because
> >> send_data() only _accepts_ data for further transmission... which
> >> should seldom be a problem.
> >
> > Thanks for the comments,
> >
> > We developed virtio-mailbox for heterogeneous virtualization system.
> >
> > virtio-mailbox is based on the mialbox framework.
> >
> > In virtio framework, its send func 'virtqueue_add_outbuf()' may return
> > error, which caused .send_data() return error. And also contains
> > other csenarios.
> >
> > But I think mailbox framework shouldn't depend on .send_data() always
> > return OK, as .send_data() is implemented by mailbox hardware
> > manufacturer, which is not controlled by mailbox framework itself.
> >
As I said, send_data() is basically "accepted for transfer", and not
"transferred fine".

> > You said 'seldom', but it still possible we can meet such issue. sucn
> > as flexrm_send_data() of drivers/mailbox/bcm-flexrm-mailbox.c.
> >
The api is used not just in synchronous mode.
And the flexrm driver relies on ACK method, not the synchronous one.

> > I think mailbox framework should be work normaly no matter
> > .send_data() returns ok or not ok. Do you think so? thanks
>
Normal is your controller driver should be ready after .startup() and
accepts the first message submitted to it.
If it does that, everything would work.

> Another solution is to ignore the return value of .send_data() in
> msg_submit(),
>
> change
>
> err = chan->mbox->ops->send_data(chan, data);
> if (!err) {
> chan->active_req = data;
> chan->msg_count--;
> }
>
> to
>
> chan->mbox->ops->send_data(chan, data);
> chan->active_req = data;
> chan->msg_count--;
>
Yes, I also have something like this in mind. Definitely not the hack
in your original post.

> But the side effect of the solution is obvious, as if send failed in the
> first time, it will have no chance to sent it in tx_tick() for the
> second time. That is to say, no retry.
>
The api doesn't retry. The .last_tx_done() is supposed to tell the
api when is it ok to send a message.

The following should work for you (though I believe your code needs
fixing), which anyway, should have been there.

diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c
index 3e7d4b20ab34..9824a51b82fa 100644
--- a/drivers/mailbox/mailbox.c
+++ b/drivers/mailbox/mailbox.c
@@ -75,10 +75,12 @@ static void msg_submit(struct mbox_chan *chan)
chan->cl->tx_prepare(chan->cl, data);
/* Try to submit a message to the MBOX controller */
err = chan->mbox->ops->send_data(chan, data);
- if (!err) {
+ if (!err)
chan->active_req = data;
- chan->msg_count--;
- }
+
+ /* done with another message */
+ chan->msg_count--;
+
exit:
spin_unlock_irqrestore(&chan->lock, flags);

2021-08-18 01:41:54

by Xianting Tian

[permalink] [raw]
Subject: Re: [PATCH] mailbox: fix a UAF bug in msg_submit()


在 2021/8/18 上午7:33, Jassi Brar 写道:
> On Tue, Aug 17, 2021 at 3:01 AM Xianting TIan
> <[email protected]> wrote:
>>
>> 在 2021/8/17 下午1:58, Xianting TIan 写道:
>>> 在 2021/8/17 下午12:29, Jassi Brar 写道:
>>>> On Fri, Aug 6, 2021 at 7:15 AM Xianting Tian
>>>> <[email protected]> wrote:
>>>>> We met a UAF issue during our mailbox testing.
>>>>>
>>>>> In synchronous mailbox, we use mbox_send_message() to send a message
>>>>> and wait for completion. mbox_send_message() calls msg_submit() to
>>>>> send the message for the first time, if timeout, it will send the
>>>>> message in tx_tick() for the second time.
>>>>>
>>>> Seems like your controller's .send_data() returns error. Can you
>>>> please explain why it does so? Because
>>>> send_data() only _accepts_ data for further transmission... which
>>>> should seldom be a problem.
>>> Thanks for the comments,
>>>
>>> We developed virtio-mailbox for heterogeneous virtualization system.
>>>
>>> virtio-mailbox is based on the mialbox framework.
>>>
>>> In virtio framework, its send func 'virtqueue_add_outbuf()' may return
>>> error, which caused .send_data() return error. And also contains
>>> other csenarios.
>>>
>>> But I think mailbox framework shouldn't depend on .send_data() always
>>> return OK, as .send_data() is implemented by mailbox hardware
>>> manufacturer, which is not controlled by mailbox framework itself.
>>>
> As I said, send_data() is basically "accepted for transfer", and not
> "transferred fine".
>
>>> You said 'seldom', but it still possible we can meet such issue. sucn
>>> as flexrm_send_data() of drivers/mailbox/bcm-flexrm-mailbox.c.
>>>
> The api is used not just in synchronous mode.
> And the flexrm driver relies on ACK method, not the synchronous one.
>
>>> I think mailbox framework should be work normaly no matter
>>> .send_data() returns ok or not ok. Do you think so? thanks
> Normal is your controller driver should be ready after .startup() and
> accepts the first message submitted to it.
> If it does that, everything would work.
>
>> Another solution is to ignore the return value of .send_data() in
>> msg_submit(),
>>
>> change
>>
>> err = chan->mbox->ops->send_data(chan, data);
>> if (!err) {
>> chan->active_req = data;
>> chan->msg_count--;
>> }
>>
>> to
>>
>> chan->mbox->ops->send_data(chan, data);
>> chan->active_req = data;
>> chan->msg_count--;
>>
> Yes, I also have something like this in mind. Definitely not the hack
> in your original post.
>
>> But the side effect of the solution is obvious, as if send failed in the
>> first time, it will have no chance to sent it in tx_tick() for the
>> second time. That is to say, no retry.
>>
> The api doesn't retry. The .last_tx_done() is supposed to tell the
> api when is it ok to send a message.
>
> The following should work for you (though I believe your code needs
> fixing), which anyway, should have been there.

thanks for the comment, so you apply below solution to kernel code?

I will use it as our solution.

>
> diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c
> index 3e7d4b20ab34..9824a51b82fa 100644
> --- a/drivers/mailbox/mailbox.c
> +++ b/drivers/mailbox/mailbox.c
> @@ -75,10 +75,12 @@ static void msg_submit(struct mbox_chan *chan)
> chan->cl->tx_prepare(chan->cl, data);
> /* Try to submit a message to the MBOX controller */
> err = chan->mbox->ops->send_data(chan, data);
> - if (!err) {
> + if (!err)
> chan->active_req = data;
> - chan->msg_count--;
> - }
> +
> + /* done with another message */
> + chan->msg_count--;
> +
> exit:
> spin_unlock_irqrestore(&chan->lock, flags);