2022-02-09 15:32:32

by Wen Gu

[permalink] [raw]
Subject: [PATCH net] net/smc: Avoid overwriting the copies of clcsock callback functions

The callback functions of clcsock will be saved and replaced during
the fallback. But if the fallback happens more than once, then the
copies of these callback functions will be overwritten incorrectly,
resulting in a loop call issue:

clcsk->sk_error_report
|- smc_fback_error_report() <------------------------------|
|- smc_fback_forward_wakeup() | (loop)
|- clcsock_callback() (incorrectly overwritten) |
|- smc->clcsk_error_report() ------------------|

So this patch fixes the issue by saving these function pointers only
once in the fallback and avoiding overwriting.

Reported-by: [email protected]
Fixes: 341adeec9ada ("net/smc: Forward wakeup to smc socket waitqueue after fallback")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Wen Gu <[email protected]>
---
net/smc/af_smc.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 8c89d0b..306d9e8c 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -667,14 +667,17 @@ static void smc_fback_error_report(struct sock *clcsk)
static int smc_switch_to_fallback(struct smc_sock *smc, int reason_code)
{
struct sock *clcsk;
+ int rc = 0;

mutex_lock(&smc->clcsock_release_lock);
if (!smc->clcsock) {
- mutex_unlock(&smc->clcsock_release_lock);
- return -EBADF;
+ rc = -EBADF;
+ goto out;
}
clcsk = smc->clcsock->sk;

+ if (smc->use_fallback)
+ goto out;
smc->use_fallback = true;
smc->fallback_rsn = reason_code;
smc_stat_fallback(smc);
@@ -702,8 +705,9 @@ static int smc_switch_to_fallback(struct smc_sock *smc, int reason_code)
smc->clcsock->sk->sk_user_data =
(void *)((uintptr_t)smc | SK_USER_DATA_NOCOPY);
}
+out:
mutex_unlock(&smc->clcsock_release_lock);
- return 0;
+ return rc;
}

/* fall back during connect */
--
1.8.3.1



2022-02-10 02:54:12

by Tony Lu

[permalink] [raw]
Subject: Re: [PATCH net] net/smc: Avoid overwriting the copies of clcsock callback functions

On Wed, Feb 09, 2022 at 10:10:53PM +0800, Wen Gu wrote:
> The callback functions of clcsock will be saved and replaced during
> the fallback. But if the fallback happens more than once, then the
> copies of these callback functions will be overwritten incorrectly,
> resulting in a loop call issue:
>
> clcsk->sk_error_report
> |- smc_fback_error_report() <------------------------------|
> |- smc_fback_forward_wakeup() | (loop)
> |- clcsock_callback() (incorrectly overwritten) |
> |- smc->clcsk_error_report() ------------------|
>
> So this patch fixes the issue by saving these function pointers only
> once in the fallback and avoiding overwriting.
>
> Reported-by: [email protected]
> Fixes: 341adeec9ada ("net/smc: Forward wakeup to smc socket waitqueue after fallback")
> Link: https://lore.kernel.org/r/[email protected]
> Signed-off-by: Wen Gu <[email protected]>
> ---
> net/smc/af_smc.c | 10 +++++++---
> 1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
> index 8c89d0b..306d9e8c 100644
> --- a/net/smc/af_smc.c
> +++ b/net/smc/af_smc.c
> @@ -667,14 +667,17 @@ static void smc_fback_error_report(struct sock *clcsk)
> static int smc_switch_to_fallback(struct smc_sock *smc, int reason_code)
> {
> struct sock *clcsk;
> + int rc = 0;
>
> mutex_lock(&smc->clcsock_release_lock);
> if (!smc->clcsock) {
> - mutex_unlock(&smc->clcsock_release_lock);
> - return -EBADF;
> + rc = -EBADF;
> + goto out;
> }
> clcsk = smc->clcsock->sk;
>
> + if (smc->use_fallback)
> + goto out;
> smc->use_fallback = true;

I am wondering that there is a potential racing. If ->use_fallback is
setted to true, but the rest of replacing process is on the way, others
who tested and passed ->use_fallback, they would get old value before
replacing.

Thanks,
Tony Lu

2022-02-10 09:30:42

by Wen Gu

[permalink] [raw]
Subject: Re: [PATCH net] net/smc: Avoid overwriting the copies of clcsock callback functions



On 2022/2/10 10:50 am, Tony Lu wrote:

> I am wondering that there is a potential racing. If ->use_fallback is
> setted to true, but the rest of replacing process is on the way, others
> who tested and passed ->use_fallback, they would get old value before
> replacing.
>

Thanks for your comments.

I understand your concern. But when I went through all the places that
check for smc->use_fallback, I haven't found the exact potential racing
point. Please point out if I missed something. Thank you.

In my humble opinion, most of the operations after smc->use_fallback check
have no direct relationship with what did in smc_switch_to_fallback() (the
replacement of clcsock callback functions), except for which in smc_sendmsg(),
smc_recvmsg() and smc_sendpage():

smc_sendmsg():

if (smc->use_fallback) {
rc = smc->clcsock->ops->sendmsg(smc->clcsock, msg, len);
}

smc_recvmsg():

if (smc->use_fallback) {
rc = smc->clcsock->ops->recvmsg(smc->clcsock, msg, len, flags);
}

smc_sendpage():

if (smc->use_fallback) {
rc = kernel_sendpage(smc->clcsock, page, offset,
size, flags);
}

If smc->use_fallback is set to true, but callback functions (sk_data_ready ...)
of clcsock haven't been replaced yet at this moment, there may be a racing as
you described.

But it won't happen, because fallback must already be done before sending and receiving.

What do you think about it?

Thanks,
Wen Gu


2022-02-11 02:46:59

by Tony Lu

[permalink] [raw]
Subject: Re: [PATCH net] net/smc: Avoid overwriting the copies of clcsock callback functions

On Thu, Feb 10, 2022 at 04:56:00PM +0800, Wen Gu wrote:
>
>
> On 2022/2/10 10:50 am, Tony Lu wrote:
>
> > I am wondering that there is a potential racing. If ->use_fallback is
> > setted to true, but the rest of replacing process is on the way, others
> > who tested and passed ->use_fallback, they would get old value before
> > replacing.
> >
>
> Thanks for your comments.
>
> I understand your concern. But when I went through all the places that
> check for smc->use_fallback, I haven't found the exact potential racing
> point. Please point out if I missed something. Thank you.
>
> In my humble opinion, most of the operations after smc->use_fallback check
> have no direct relationship with what did in smc_switch_to_fallback() (the
> replacement of clcsock callback functions), except for which in smc_sendmsg(),
> smc_recvmsg() and smc_sendpage():
>
> smc_sendmsg():
>
> if (smc->use_fallback) {
> rc = smc->clcsock->ops->sendmsg(smc->clcsock, msg, len);
> }
>
> smc_recvmsg():
>
> if (smc->use_fallback) {
> rc = smc->clcsock->ops->recvmsg(smc->clcsock, msg, len, flags);
> }
>
> smc_sendpage():
>
> if (smc->use_fallback) {
> rc = kernel_sendpage(smc->clcsock, page, offset,
> size, flags);
> }
>
> If smc->use_fallback is set to true, but callback functions (sk_data_ready ...)
> of clcsock haven't been replaced yet at this moment, there may be a racing as
> you described.
>
> But it won't happen, because fallback must already be done before sending and receiving.
>
> What do you think about it?
>
I am concerning about the non-blocking work in workqueue. If we can make
sure the order of fallback is determined, it would be safe. From your
analysis, I think it is safe for now.

Let's back to the patch, the original version of switch_to_fallback()
has a implicit reentrant semantics. This fixes should work, thanks.

Thanks for your detailed investigation.

Best regards,
Tony Lu

2022-02-11 13:37:56

by patchwork-bot+netdevbpf

[permalink] [raw]
Subject: Re: [PATCH net] net/smc: Avoid overwriting the copies of clcsock callback functions

Hello:

This patch was applied to netdev/net.git (master)
by David S. Miller <[email protected]>:

On Wed, 9 Feb 2022 22:10:53 +0800 you wrote:
> The callback functions of clcsock will be saved and replaced during
> the fallback. But if the fallback happens more than once, then the
> copies of these callback functions will be overwritten incorrectly,
> resulting in a loop call issue:
>
> clcsk->sk_error_report
> |- smc_fback_error_report() <------------------------------|
> |- smc_fback_forward_wakeup() | (loop)
> |- clcsock_callback() (incorrectly overwritten) |
> |- smc->clcsk_error_report() ------------------|
>
> [...]

Here is the summary with links:
- [net] net/smc: Avoid overwriting the copies of clcsock callback functions
https://git.kernel.org/netdev/net/c/1de9770d121e

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html