sk_msg_trim() tries to only update curr pointer if it falls into
the trimmed region. The logic, however, does not take into the
account pointer wrapping that sk_msg_iter_var_prev() does.
This means that when the message was trimmed completely, the new
curr pointer would have the value of MAX_MSG_FRAGS - 1, which is
neither smaller than any other value, nor would it actually be
correct.
Special case the trimming to 0 length a little bit.
This bug caused the TLS code to not copy all of the message, if
zero copy filled in fewer sg entries than memcopy would need.
Big thanks to Alexander Potapenko for the non-KMSAN reproducer.
Fixes: d829e9c4112b ("tls: convert to generic sk_msg interface")
Reported-by: [email protected]
Reported-by: [email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
---
Daniel, John, does this look okay?
CC: Eric Biggers <[email protected]>
CC: [email protected]
CC: [email protected]
CC: [email protected]
net/core/skmsg.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index cf390e0aa73d..c42c145216b1 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -276,7 +276,10 @@ void sk_msg_trim(struct sock *sk, struct sk_msg *msg, int len)
* However trimed data that has not yet been used in a copy op
* does not require an update.
*/
- if (msg->sg.curr >= i) {
+ if (!msg->sg.size) {
+ msg->sg.curr = 0;
+ msg->sg.copybreak = 0;
+ } else if (msg->sg.curr >= i) {
msg->sg.curr = i;
msg->sg.copybreak = msg->sg.data[i].length;
}
--
2.23.0
On Wed, 30 Oct 2019 09:05:42 -0700, Jakub Kicinski wrote:
> sk_msg_trim() tries to only update curr pointer if it falls into
> the trimmed region. The logic, however, does not take into the
> account pointer wrapping that sk_msg_iter_var_prev() does.
> This means that when the message was trimmed completely, the new
> curr pointer would have the value of MAX_MSG_FRAGS - 1, which is
> neither smaller than any other value, nor would it actually be
> correct.
>
> Special case the trimming to 0 length a little bit.
>
> This bug caused the TLS code to not copy all of the message, if
> zero copy filled in fewer sg entries than memcopy would need.
>
> Big thanks to Alexander Potapenko for the non-KMSAN reproducer.
>
> Fixes: d829e9c4112b ("tls: convert to generic sk_msg interface")
> Reported-by: [email protected]
> Reported-by: [email protected]
> Signed-off-by: Jakub Kicinski <[email protected]>
> ---
> Daniel, John, does this look okay?
>
> CC: Eric Biggers <[email protected]>
> CC: [email protected]
> CC: [email protected]
> CC: [email protected]
>
> net/core/skmsg.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
Daniel, John does this patch look reasonable? I must admit
the skmsg stuff in TLS scares me, it'd appreciate an ack.
Jakub Kicinski wrote:
> sk_msg_trim() tries to only update curr pointer if it falls into
> the trimmed region. The logic, however, does not take into the
> account pointer wrapping that sk_msg_iter_var_prev() does.
> This means that when the message was trimmed completely, the new
> curr pointer would have the value of MAX_MSG_FRAGS - 1, which is
> neither smaller than any other value, nor would it actually be
> correct.
>
> Special case the trimming to 0 length a little bit.
>
> This bug caused the TLS code to not copy all of the message, if
> zero copy filled in fewer sg entries than memcopy would need.
>
> Big thanks to Alexander Potapenko for the non-KMSAN reproducer.
>
> Fixes: d829e9c4112b ("tls: convert to generic sk_msg interface")
> Reported-by: [email protected]
> Reported-by: [email protected]
> Signed-off-by: Jakub Kicinski <[email protected]>
> ---
> Daniel, John, does this look okay?
Thanks for the second ping!
>
> CC: Eric Biggers <[email protected]>
> CC: [email protected]
> CC: [email protected]
> CC: [email protected]
>
> net/core/skmsg.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> index cf390e0aa73d..c42c145216b1 100644
> --- a/net/core/skmsg.c
> +++ b/net/core/skmsg.c
> @@ -276,7 +276,10 @@ void sk_msg_trim(struct sock *sk, struct sk_msg *msg, int len)
> * However trimed data that has not yet been used in a copy op
> * does not require an update.
> */
> - if (msg->sg.curr >= i) {
> + if (!msg->sg.size) {
> + msg->sg.curr = 0;
> + msg->sg.copybreak = 0;
> + } else if (msg->sg.curr >= i) {
> msg->sg.curr = i;
> msg->sg.copybreak = msg->sg.data[i].length;
> }
> --
Its actually not sufficient. We can't directly do comparisons against curr
like this. msg->sg is a ring buffer so we have to be careful for these
types of comparisons.
Examples hopefully help explian. Consider the case with a ring layout on
entering sk_msg_trim,
0 1 2 N = MAX_MSG_FRAGS
|_|_|_|...|_|_|_|...|_|_|_|_|....|_|_|
^ ^ ^
curr end start
Start trimming from end
0 1 2 N = MAX_MSG_FRAGS
|X|X|X|...|X|X|_|...|_|_|i|X|....|X|X|
^ ^ ^
curr end start
We trim backwards through ring with sk_msg_iter_var_prev(). And its
possible to end with the result of above where 'i' is greater than curr
and greater than start leaving scatterlist elements so size != 0.
i > curr && i > start && sg.size != 0
but we wont catch it with this condition
if (msg->sg.curr >= i)
So we won't reset curr and copybreak so we have a potential issue now
where curr is pointing at data that has been trimmed.
I'll put together a fix but the correct thing to do here is a proper
ring greater than op which is not what we have there. Although, your patch
is also really a good one to have because reseting curr = 0 and
copybreak = 0 when possible keeps the ring from being fragmented which
avoids chaining when we push scatterlists down to crypto layer. So for
your patch,
Acked-By: John Fastabend <[email protected]>
If it should go to net or net-next I think is probably up for debate
Nice catch!!! Can you send me the reproducer?
Thanks,
John
John Fastabend wrote:
> Jakub Kicinski wrote:
> > sk_msg_trim() tries to only update curr pointer if it falls into
> > the trimmed region. The logic, however, does not take into the
> > account pointer wrapping that sk_msg_iter_var_prev() does.
> > This means that when the message was trimmed completely, the new
> > curr pointer would have the value of MAX_MSG_FRAGS - 1, which is
> > neither smaller than any other value, nor would it actually be
> > correct.
> >
> > Special case the trimming to 0 length a little bit.
> >
> > This bug caused the TLS code to not copy all of the message, if
> > zero copy filled in fewer sg entries than memcopy would need.
> >
> > Big thanks to Alexander Potapenko for the non-KMSAN reproducer.
> >
> > Fixes: d829e9c4112b ("tls: convert to generic sk_msg interface")
> > Reported-by: [email protected]
> > Reported-by: [email protected]
> > Signed-off-by: Jakub Kicinski <[email protected]>
> > ---
> > Daniel, John, does this look okay?
>
> Thanks for the second ping!
>
> >
> > CC: Eric Biggers <[email protected]>
> > CC: [email protected]
> > CC: [email protected]
> > CC: [email protected]
> >
> > net/core/skmsg.c | 5 ++++-
> > 1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> > index cf390e0aa73d..c42c145216b1 100644
> > --- a/net/core/skmsg.c
> > +++ b/net/core/skmsg.c
> > @@ -276,7 +276,10 @@ void sk_msg_trim(struct sock *sk, struct sk_msg *msg, int len)
> > * However trimed data that has not yet been used in a copy op
> > * does not require an update.
> > */
> > - if (msg->sg.curr >= i) {
> > + if (!msg->sg.size) {
> > + msg->sg.curr = 0;
> > + msg->sg.copybreak = 0;
> > + } else if (msg->sg.curr >= i) {
> > msg->sg.curr = i;
> > msg->sg.copybreak = msg->sg.data[i].length;
> > }
> > --
>
>
> Its actually not sufficient. We can't directly do comparisons against curr
> like this. msg->sg is a ring buffer so we have to be careful for these
> types of comparisons.
>
> Examples hopefully help explian. Consider the case with a ring layout on
> entering sk_msg_trim,
Perhaps worth adding this case is only possible AFAIK with BPF manipulating
the ring to buffer/release data.
>
> 0 1 2 N = MAX_MSG_FRAGS
> |_|_|_|...|_|_|_|...|_|_|_|_|....|_|_|
> ^ ^ ^
> curr end start
>
> Start trimming from end
>
> 0 1 2 N = MAX_MSG_FRAGS
> |X|X|X|...|X|X|_|...|_|_|i|X|....|X|X|
> ^ ^ ^
> curr end start
>
> We trim backwards through ring with sk_msg_iter_var_prev(). And its
> possible to end with the result of above where 'i' is greater than curr
> and greater than start leaving scatterlist elements so size != 0.
>
> i > curr && i > start && sg.size != 0
>
> but we wont catch it with this condition
>
> if (msg->sg.curr >= i)
>
> So we won't reset curr and copybreak so we have a potential issue now
> where curr is pointing at data that has been trimmed.
>
> I'll put together a fix but the correct thing to do here is a proper
> ring greater than op which is not what we have there. Although, your patch
> is also really a good one to have because reseting curr = 0 and
> copybreak = 0 when possible keeps the ring from being fragmented which
> avoids chaining when we push scatterlists down to crypto layer. So for
> your patch,
>
> Acked-By: John Fastabend <[email protected]>
>
> If it should go to net or net-next I think is probably up for debate
>
> Nice catch!!! Can you send me the reproducer?
>
> Thanks,
> John
>
>
>
>
On Thu, 31 Oct 2019 15:05:53 -0700, John Fastabend wrote:
> Jakub Kicinski wrote:
> > sk_msg_trim() tries to only update curr pointer if it falls into
> > the trimmed region. The logic, however, does not take into the
> > account pointer wrapping that sk_msg_iter_var_prev() does.
> > This means that when the message was trimmed completely, the new
> > curr pointer would have the value of MAX_MSG_FRAGS - 1, which is
> > neither smaller than any other value, nor would it actually be
> > correct.
> >
> > Special case the trimming to 0 length a little bit.
> >
> > This bug caused the TLS code to not copy all of the message, if
> > zero copy filled in fewer sg entries than memcopy would need.
> >
> > Big thanks to Alexander Potapenko for the non-KMSAN reproducer.
> >
> > Fixes: d829e9c4112b ("tls: convert to generic sk_msg interface")
> > Reported-by: [email protected]
> > Reported-by: [email protected]
> > Signed-off-by: Jakub Kicinski <[email protected]>
> > ---
> > Daniel, John, does this look okay?
>
> Thanks for the second ping!
No problem, I was worried the patch got categorized as TLS and therefore
lower prio ;)
> > CC: Eric Biggers <[email protected]>
> > CC: [email protected]
> > CC: [email protected]
> > CC: [email protected]
> >
> > net/core/skmsg.c | 5 ++++-
> > 1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> > index cf390e0aa73d..c42c145216b1 100644
> > --- a/net/core/skmsg.c
> > +++ b/net/core/skmsg.c
> > @@ -276,7 +276,10 @@ void sk_msg_trim(struct sock *sk, struct sk_msg *msg, int len)
> > * However trimed data that has not yet been used in a copy op
> > * does not require an update.
> > */
> > - if (msg->sg.curr >= i) {
> > + if (!msg->sg.size) {
> > + msg->sg.curr = 0;
> > + msg->sg.copybreak = 0;
> > + } else if (msg->sg.curr >= i) {
> > msg->sg.curr = i;
> > msg->sg.copybreak = msg->sg.data[i].length;
> > }
> > --
>
>
> Its actually not sufficient. We can't directly do comparisons against curr
> like this. msg->sg is a ring buffer so we have to be careful for these
> types of comparisons.
>
> Examples hopefully help explian. Consider the case with a ring layout on
> entering sk_msg_trim,
>
> 0 1 2 N = MAX_MSG_FRAGS
> |_|_|_|...|_|_|_|...|_|_|_|_|....|_|_|
> ^ ^ ^
> curr end start
>
> Start trimming from end
>
> 0 1 2 N = MAX_MSG_FRAGS
> |X|X|X|...|X|X|_|...|_|_|i|X|....|X|X|
> ^ ^ ^
> curr end start
>
> We trim backwards through ring with sk_msg_iter_var_prev(). And its
> possible to end with the result of above where 'i' is greater than curr
> and greater than start leaving scatterlist elements so size != 0.
>
> i > curr && i > start && sg.size != 0
>
> but we wont catch it with this condition
>
> if (msg->sg.curr >= i)
>
> So we won't reset curr and copybreak so we have a potential issue now
> where curr is pointing at data that has been trimmed.
I see, that makes sense and explains some of the complexity!
Perhaps the simplest way to go would be to adjust the curr as we go
then? The comparison logic could get a little hairy. So like this:
diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index cf390e0aa73d..c2b0f9cb589c 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -261,25 +261,29 @@ void sk_msg_trim(struct sock *sk, struct sk_msg *msg, int len)
msg->sg.size = len;
while (msg->sg.data[i].length &&
trim >= msg->sg.data[i].length) {
+ bool move_curr = msg->sg.curr == i;
+
trim -= msg->sg.data[i].length;
sk_msg_free_elem(sk, msg, i, true);
sk_msg_iter_var_prev(i);
+ if (move_curr) {
+ msg->sg.curr = i;
+ msg->sg.copybreak = msg->sg.data[i].length;
+ }
if (!trim)
goto out;
}
msg->sg.data[i].length -= trim;
sk_mem_uncharge(sk, trim);
-out:
/* If we trim data before curr pointer update copybreak and current
* so that any future copy operations start at new copy location.
* However trimed data that has not yet been used in a copy op
* does not require an update.
*/
- if (msg->sg.curr >= i) {
- msg->sg.curr = i;
+ if (msg->sg.curr == i && msg->sg.copybreak > msg->sg.data[i].length)
msg->sg.copybreak = msg->sg.data[i].length;
- }
+out:
sk_msg_iter_var_next(i);
msg->sg.end = i;
}
> I'll put together a fix but the correct thing to do here is a proper
> ring greater than op which is not what we have there. Although, your patch
> is also really a good one to have because reseting curr = 0 and
> copybreak = 0 when possible keeps the ring from being fragmented which
> avoids chaining when we push scatterlists down to crypto layer. So for
> your patch,
>
> Acked-By: John Fastabend <[email protected]>
>
> If it should go to net or net-next I think is probably up for debate
>
> Nice catch!!! Can you send me the reproducer?
I was using the repro from the syzbot report:
https://syzkaller.appspot.com/bug?extid=6f50c99e8f6194bf363f
plus this hack from Alexander to avoid the need for KMSAN:
https://lkml.kernel.org/linux-crypto/CAG_fn=UGCoDk04tL2vB981JmXgo6+-RUPmrTa3dSsK5UbZaTjA@mail.gmail.com/
On Thu, 31 Oct 2019 15:24:44 -0700, Jakub Kicinski wrote:
> diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> index cf390e0aa73d..c2b0f9cb589c 100644
> --- a/net/core/skmsg.c
> +++ b/net/core/skmsg.c
> @@ -261,25 +261,29 @@ void sk_msg_trim(struct sock *sk, struct sk_msg *msg, int len)
> msg->sg.size = len;
> while (msg->sg.data[i].length &&
> trim >= msg->sg.data[i].length) {
> + bool move_curr = msg->sg.curr == i;
> +
> trim -= msg->sg.data[i].length;
> sk_msg_free_elem(sk, msg, i, true);
> sk_msg_iter_var_prev(i);
> + if (move_curr) {
> + msg->sg.curr = i;
> + msg->sg.copybreak = msg->sg.data[i].length;
> + }
> if (!trim)
> goto out;
> }
Thinking about this in between builds that is clearly nonsensical,
sorry. But I'd feel a little better if we merged a full fix instead of
just fixing the simple case for now :( Maybe I can produce a working
patch based on your description..
Jakub Kicinski wrote:
> On Thu, 31 Oct 2019 15:05:53 -0700, John Fastabend wrote:
> > Jakub Kicinski wrote:
> > > sk_msg_trim() tries to only update curr pointer if it falls into
> > > the trimmed region. The logic, however, does not take into the
> > > account pointer wrapping that sk_msg_iter_var_prev() does.
> > > This means that when the message was trimmed completely, the new
> > > curr pointer would have the value of MAX_MSG_FRAGS - 1, which is
> > > neither smaller than any other value, nor would it actually be
> > > correct.
> > >
> > > Special case the trimming to 0 length a little bit.
> > >
> > > This bug caused the TLS code to not copy all of the message, if
> > > zero copy filled in fewer sg entries than memcopy would need.
> > >
> > > Big thanks to Alexander Potapenko for the non-KMSAN reproducer.
> > >
> > > Fixes: d829e9c4112b ("tls: convert to generic sk_msg interface")
> > > Reported-by: [email protected]
> > > Reported-by: [email protected]
> > > Signed-off-by: Jakub Kicinski <[email protected]>
> > > ---
> > > Daniel, John, does this look okay?
> >
> > Thanks for the second ping!
>
> No problem, I was worried the patch got categorized as TLS and therefore
> lower prio ;)
Nope close to the top of the list here.
>
> > > CC: Eric Biggers <[email protected]>
> > > CC: [email protected]
> > > CC: [email protected]
> > > CC: [email protected]
> > >
> > > net/core/skmsg.c | 5 ++++-
> > > 1 file changed, 4 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> > > index cf390e0aa73d..c42c145216b1 100644
> > > --- a/net/core/skmsg.c
> > > +++ b/net/core/skmsg.c
> > > @@ -276,7 +276,10 @@ void sk_msg_trim(struct sock *sk, struct sk_msg *msg, int len)
> > > * However trimed data that has not yet been used in a copy op
> > > * does not require an update.
> > > */
> > > - if (msg->sg.curr >= i) {
> > > + if (!msg->sg.size) {
> > > + msg->sg.curr = 0;
> > > + msg->sg.copybreak = 0;
> > > + } else if (msg->sg.curr >= i) {
> > > msg->sg.curr = i;
> > > msg->sg.copybreak = msg->sg.data[i].length;
> > > }
> > > --
> >
> >
> > Its actually not sufficient. We can't directly do comparisons against curr
> > like this. msg->sg is a ring buffer so we have to be careful for these
> > types of comparisons.
> >
> > Examples hopefully help explian. Consider the case with a ring layout on
> > entering sk_msg_trim,
> >
> > 0 1 2 N = MAX_MSG_FRAGS
> > |_|_|_|...|_|_|_|...|_|_|_|_|....|_|_|
> > ^ ^ ^
> > curr end start
> >
> > Start trimming from end
> >
> > 0 1 2 N = MAX_MSG_FRAGS
> > |X|X|X|...|X|X|_|...|_|_|i|X|....|X|X|
> > ^ ^ ^
> > curr end start
> >
> > We trim backwards through ring with sk_msg_iter_var_prev(). And its
> > possible to end with the result of above where 'i' is greater than curr
> > and greater than start leaving scatterlist elements so size != 0.
> >
> > i > curr && i > start && sg.size != 0
> >
> > but we wont catch it with this condition
> >
> > if (msg->sg.curr >= i)
> >
> > So we won't reset curr and copybreak so we have a potential issue now
> > where curr is pointing at data that has been trimmed.
>
> I see, that makes sense and explains some of the complexity!
>
> Perhaps the simplest way to go would be to adjust the curr as we go
> then? The comparison logic could get a little hairy. So like this:
I don't think the comparison is too bad. Working it out live here. First
do a bit of case analysis, We have 3 pointers so there are 3!=6 possible
arrangements (permutations),
1. S,C,E 6. S,E,C
5. C,S,E 2. C,E,S
3. E,S,C 4. E,C,S
Case 1: Normal case start < curr < end
0 1 2 N = MAX_MSG_FRAGS
|_|_|_|...|_|_|_|...|_|_|_|_|....|_|_|
^ ^ ^
start curr end
if (start < end && i < curr)
curr = i;
Case 2: curr < end < start (in absolute index terms)
0 1 2 N = MAX_MSG_FRAGS
|_|_|_|...|_|_|_|...|_|_|_|_|....|_|_|
^ ^ ^
curr end start
if (end < start && (i < curr || i > start))
curr = i
Case 3: end < start < curr
0 1 2 N = MAX_MSG_FRAGS
|_|_|_|...|_|_|_|...|_|_|_|_|....|_|_|
^ ^ ^
end start curr
if (end < start && (i < curr)
curr = i;
Case 4: end < curr < start
0 1 2 N = MAX_MSG_FRAGS
|_|_|_|...|_|_|_|...|_|_|_|_|....|_|_|
^ ^ ^
end curr start
(nonsense curr would be invalid here it must be between the start and end)
Case 5: curr < start < end
0 1 2 N = MAX_MSG_FRAGS
|_|_|_|...|_|_|_|...|_|_|_|_|....|_|_|
^ ^ ^
curr start end
(nonsense curr would be invalid here it must be between the start and end)
Case 6: start < end < curr
0 1 2 N = MAX_MSG_FRAGS
|_|_|_|...|_|_|_|...|_|_|_|_|....|_|_|
^ ^ ^
start end curr
(nonsense curr would be invalid here it must be between the start and end)
So I think the following would suffice,
if (msg->sg.start < msg->sg.end && i < msg->sg.curr) {
msg->sg.curr = i;
msg->sg.copybreak = msg->sg.data[i].length;
} else if (msg->sg.end < msg->sg.start && (i < msg->sg.curr || i > msg->sg.start))
msg->sg.curr = i;
msg->sg.copybreak = msg->sg.data[i].length;
} else if (msg->sg.end < msg->sg.start && (i < msg->sg.curr) {
curr = i;
msg->sg.copybreak = msg->sg.data[i].length;
}
Finally fold the last two cases into one so we get
if (msg->sg.start < msg->sg.end && i < msg->sg.curr) {
msg->sg.curr = i;
msg->sg.copybreak = msg->sg.data[i].length;
} else if (msg->sg.end < msg->sg.start && (i < msg->sg.curr || i > msg->sg.start))
msg->sg.curr = i;
msg->sg.copybreak = msg->sg.data[i].length;
So not so bad. Put that in a helper in sk_msg.h and use it in trim. I can check
logic in the AM and draft a patch but I think that should be correct. Also will
need to audit to see if there are other cases this happens. In general I tried
to always use i == msg->sg.{start|end|curr} to avoid this.
Hopefully it wasn't too verbose above but figured it couldn't hurt.
.John
On Thu, 31 Oct 2019 21:44:45 -0700, John Fastabend wrote:
> Jakub Kicinski wrote:
> > On Thu, 31 Oct 2019 15:05:53 -0700, John Fastabend wrote:
> > > Jakub Kicinski wrote:
> > > > sk_msg_trim() tries to only update curr pointer if it falls into
> > > > the trimmed region. The logic, however, does not take into the
> > > > account pointer wrapping that sk_msg_iter_var_prev() does.
> > > > This means that when the message was trimmed completely, the new
> > > > curr pointer would have the value of MAX_MSG_FRAGS - 1, which is
> > > > neither smaller than any other value, nor would it actually be
> > > > correct.
> > > >
> > > > Special case the trimming to 0 length a little bit.
> > > >
> > > > This bug caused the TLS code to not copy all of the message, if
> > > > zero copy filled in fewer sg entries than memcopy would need.
> > > >
> > > > Big thanks to Alexander Potapenko for the non-KMSAN reproducer.
> > > >
> > > > Fixes: d829e9c4112b ("tls: convert to generic sk_msg interface")
> > > > Reported-by: [email protected]
> > > > Reported-by: [email protected]
> > > > Signed-off-by: Jakub Kicinski <[email protected]>
> > > > ---
> > > > Daniel, John, does this look okay?
> > >
> > > Thanks for the second ping!
> >
> > No problem, I was worried the patch got categorized as TLS and therefore
> > lower prio ;)
>
> Nope close to the top of the list here.
>
> >
> [...]
> [...]
> >
> > I see, that makes sense and explains some of the complexity!
> >
> > Perhaps the simplest way to go would be to adjust the curr as we go
> > then? The comparison logic could get a little hairy. So like this:
>
> I don't think the comparison is too bad. Working it out live here. First
> do a bit of case analysis, We have 3 pointers so there are 3!=6 possible
> arrangements (permutations),
>
> 1. S,C,E 6. S,E,C
> 5. C,S,E 2. C,E,S
> 3. E,S,C 4. E,C,S
>
>
> Case 1: Normal case start < curr < end
>
> 0 1 2 N = MAX_MSG_FRAGS
> |_|_|_|...|_|_|_|...|_|_|_|_|....|_|_|
> ^ ^ ^
> start curr end
>
> if (start < end && i < curr)
> curr = i;
>
>
> Case 2: curr < end < start (in absolute index terms)
>
> 0 1 2 N = MAX_MSG_FRAGS
> |_|_|_|...|_|_|_|...|_|_|_|_|....|_|_|
> ^ ^ ^
> curr end start
>
> if (end < start && (i < curr || i > start))
> curr = i
>
>
> Case 3: end < start < curr
>
> 0 1 2 N = MAX_MSG_FRAGS
> |_|_|_|...|_|_|_|...|_|_|_|_|....|_|_|
> ^ ^ ^
> end start curr
>
>
> if (end < start && (i < curr)
> curr = i;
>
>
> Case 4: end < curr < start
>
> 0 1 2 N = MAX_MSG_FRAGS
> |_|_|_|...|_|_|_|...|_|_|_|_|....|_|_|
> ^ ^ ^
> end curr start
>
> (nonsense curr would be invalid here it must be between the start and end)
>
> Case 5: curr < start < end
>
> 0 1 2 N = MAX_MSG_FRAGS
> |_|_|_|...|_|_|_|...|_|_|_|_|....|_|_|
> ^ ^ ^
> curr start end
>
> (nonsense curr would be invalid here it must be between the start and end)
>
> Case 6: start < end < curr
>
> 0 1 2 N = MAX_MSG_FRAGS
> |_|_|_|...|_|_|_|...|_|_|_|_|....|_|_|
> ^ ^ ^
> start end curr
>
> (nonsense curr would be invalid here it must be between the start and end)
>
> So I think the following would suffice,
>
>
> if (msg->sg.start < msg->sg.end && i < msg->sg.curr) {
> msg->sg.curr = i;
> msg->sg.copybreak = msg->sg.data[i].length;
> } else if (msg->sg.end < msg->sg.start && (i < msg->sg.curr || i > msg->sg.start))
> msg->sg.curr = i;
> msg->sg.copybreak = msg->sg.data[i].length;
> } else if (msg->sg.end < msg->sg.start && (i < msg->sg.curr) {
> curr = i;
> msg->sg.copybreak = msg->sg.data[i].length;
> }
>
> Finally fold the last two cases into one so we get
>
> if (msg->sg.start < msg->sg.end && i < msg->sg.curr) {
> msg->sg.curr = i;
> msg->sg.copybreak = msg->sg.data[i].length;
> } else if (msg->sg.end < msg->sg.start && (i < msg->sg.curr || i > msg->sg.start))
> msg->sg.curr = i;
> msg->sg.copybreak = msg->sg.data[i].length;
>
> So not so bad. Put that in a helper in sk_msg.h and use it in trim. I can check
> logic in the AM and draft a patch but I think that should be correct. Also will
> need to audit to see if there are other cases this happens. In general I tried
> to always use i == msg->sg.{start|end|curr} to avoid this.
I will look in depth tomorrow as well, the full/empty cases are a
little tricky to fold into general logic.
I came up with this before I got distracted Halloweening :)
diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
index e4b3fb4bb77c..ce7055259877 100644
--- a/include/linux/skmsg.h
+++ b/include/linux/skmsg.h
@@ -139,6 +139,11 @@ static inline void sk_msg_apply_bytes(struct sk_psock *psock, u32 bytes)
}
}
+static inline u32 sk_msg_iter_dist(u32 start, u32 end)
+{
+ return end >= start ? end - start : end + (MAX_MSG_FRAGS - start);
+}
+
#define sk_msg_iter_var_prev(var) \
do { \
if (var == 0) \
@@ -198,9 +203,7 @@ static inline u32 sk_msg_elem_used(const struct sk_msg *msg)
if (sk_msg_full(msg))
return MAX_MSG_FRAGS;
- return msg->sg.end >= msg->sg.start ?
- msg->sg.end - msg->sg.start :
- msg->sg.end + (MAX_MSG_FRAGS - msg->sg.start);
+ return sk_msg_iter_dist(msg->sg.start, msg->sg.end);
}
static inline struct scatterlist *sk_msg_elem(struct sk_msg *msg, int which)
diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index cf390e0aa73d..f6b4a70bafa9 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -270,18 +270,26 @@ void sk_msg_trim(struct sock *sk, struct sk_msg *msg, int len)
msg->sg.data[i].length -= trim;
sk_mem_uncharge(sk, trim);
+ if (msg->sg.curr == i && msg->sg.copybreak > msg->sg.data[i].length)
+ msg->sg.copybreak = msg->sg.data[i].length;
out:
+ sk_msg_iter_var_next(i);
+ msg->sg.end = i;
+
/* If we trim data before curr pointer update copybreak and current
* so that any future copy operations start at new copy location.
* However trimed data that has not yet been used in a copy op
* does not require an update.
*/
- if (msg->sg.curr >= i) {
+ if (!msg->sg.size) {
+ msg->sg.curr = 0;
+ msg->sg.copybreak = 0;
+ } else if (sk_msg_iter_dist(msg->sg.start, msg->sg.curr) >
+ sk_msg_iter_dist(msg->sg.end, msg->sg.curr)) {
+ sk_msg_iter_var_prev(i);
msg->sg.curr = i;
msg->sg.copybreak = msg->sg.data[i].length;
}
- sk_msg_iter_var_next(i);
- msg->sg.end = i;
}
EXPORT_SYMBOL_GPL(sk_msg_trim);
--
2.23.0
Jakub Kicinski wrote:
> On Thu, 31 Oct 2019 21:44:45 -0700, John Fastabend wrote:
> > Jakub Kicinski wrote:
> > > On Thu, 31 Oct 2019 15:05:53 -0700, John Fastabend wrote:
> > > > Jakub Kicinski wrote:
> > > > > sk_msg_trim() tries to only update curr pointer if it falls into
> > > > > the trimmed region. The logic, however, does not take into the
> > > > > account pointer wrapping that sk_msg_iter_var_prev() does.
> > > > > This means that when the message was trimmed completely, the new
> > > > > curr pointer would have the value of MAX_MSG_FRAGS - 1, which is
> > > > > neither smaller than any other value, nor would it actually be
> > > > > correct.
> > > > >
> > > > > Special case the trimming to 0 length a little bit.
> > > > >
> > > > > This bug caused the TLS code to not copy all of the message, if
> > > > > zero copy filled in fewer sg entries than memcopy would need.
> > > > >
> > > > > Big thanks to Alexander Potapenko for the non-KMSAN reproducer.
> > > > >
> > > > > Fixes: d829e9c4112b ("tls: convert to generic sk_msg interface")
> > > > > Reported-by: [email protected]
> > > > > Reported-by: [email protected]
> > > > > Signed-off-by: Jakub Kicinski <[email protected]>
> > > > > ---
> > > > > Daniel, John, does this look okay?
> > > >
> > > > Thanks for the second ping!
> > >
> > > No problem, I was worried the patch got categorized as TLS and therefore
> > > lower prio ;)
> >
> > Nope close to the top of the list here.
> >
> > >
> > [...]
> > [...]
> > >
> > > I see, that makes sense and explains some of the complexity!
> > >
> > > Perhaps the simplest way to go would be to adjust the curr as we go
> > > then? The comparison logic could get a little hairy. So like this:
> >
> > I don't think the comparison is too bad. Working it out live here. First
> > do a bit of case analysis, We have 3 pointers so there are 3!=6 possible
> > arrangements (permutations),
> >
> > 1. S,C,E 6. S,E,C
> > 5. C,S,E 2. C,E,S
> > 3. E,S,C 4. E,C,S
> >
> >
> > Case 1: Normal case start < curr < end
> >
> > 0 1 2 N = MAX_MSG_FRAGS
> > |_|_|_|...|_|_|_|...|_|_|_|_|....|_|_|
> > ^ ^ ^
> > start curr end
> >
> > if (start < end && i < curr)
> > curr = i;
> >
> >
> > Case 2: curr < end < start (in absolute index terms)
> >
> > 0 1 2 N = MAX_MSG_FRAGS
> > |_|_|_|...|_|_|_|...|_|_|_|_|....|_|_|
> > ^ ^ ^
> > curr end start
> >
> > if (end < start && (i < curr || i > start))
> > curr = i
> >
> >
> > Case 3: end < start < curr
> >
> > 0 1 2 N = MAX_MSG_FRAGS
> > |_|_|_|...|_|_|_|...|_|_|_|_|....|_|_|
> > ^ ^ ^
> > end start curr
> >
> >
> > if (end < start && (i < curr)
> > curr = i;
> >
> >
> > Case 4: end < curr < start
> >
> > 0 1 2 N = MAX_MSG_FRAGS
> > |_|_|_|...|_|_|_|...|_|_|_|_|....|_|_|
> > ^ ^ ^
> > end curr start
> >
> > (nonsense curr would be invalid here it must be between the start and end)
> >
> > Case 5: curr < start < end
> >
> > 0 1 2 N = MAX_MSG_FRAGS
> > |_|_|_|...|_|_|_|...|_|_|_|_|....|_|_|
> > ^ ^ ^
> > curr start end
> >
> > (nonsense curr would be invalid here it must be between the start and end)
> >
> > Case 6: start < end < curr
> >
> > 0 1 2 N = MAX_MSG_FRAGS
> > |_|_|_|...|_|_|_|...|_|_|_|_|....|_|_|
> > ^ ^ ^
> > start end curr
> >
> > (nonsense curr would be invalid here it must be between the start and end)
> >
> > So I think the following would suffice,
> >
> >
> > if (msg->sg.start < msg->sg.end && i < msg->sg.curr) {
> > msg->sg.curr = i;
> > msg->sg.copybreak = msg->sg.data[i].length;
> > } else if (msg->sg.end < msg->sg.start && (i < msg->sg.curr || i > msg->sg.start))
> > msg->sg.curr = i;
> > msg->sg.copybreak = msg->sg.data[i].length;
> > } else if (msg->sg.end < msg->sg.start && (i < msg->sg.curr) {
> > curr = i;
> > msg->sg.copybreak = msg->sg.data[i].length;
> > }
> >
> > Finally fold the last two cases into one so we get
> >
> > if (msg->sg.start < msg->sg.end && i < msg->sg.curr) {
> > msg->sg.curr = i;
> > msg->sg.copybreak = msg->sg.data[i].length;
> > } else if (msg->sg.end < msg->sg.start && (i < msg->sg.curr || i > msg->sg.start))
> > msg->sg.curr = i;
> > msg->sg.copybreak = msg->sg.data[i].length;
> >
> > So not so bad. Put that in a helper in sk_msg.h and use it in trim. I can check
> > logic in the AM and draft a patch but I think that should be correct. Also will
> > need to audit to see if there are other cases this happens. In general I tried
> > to always use i == msg->sg.{start|end|curr} to avoid this.
>
> I will look in depth tomorrow as well, the full/empty cases are a
> little tricky to fold into general logic.
>
> I came up with this before I got distracted Halloweening :)
Same here. Looking at the two cases from above.
if (msg->sg.start < msg->sg.end &&
i < msg->sg.curr) { // i <= msg->sg.curr
msg->sg.curr = i;
msg->sg.copybreak = msg->sg.data[i].length;
}
If we happen to trim the entire msg so size=0 then i==start
which should mean i < msg->sg.curr unless msg->sg.curr = msg->sg.start
so we should just use <=. In the second case.
else if (msg->sg.end < msg->sg.start &&
(i < msg->sg.curr || i > msg->sg.start)) { // i <= msg->sg.curr
msg->sg.curr = i;
msg->sg.copybreak = msg->sg.data[i].length;
}
If we trim the entire message here i == sg.start again. And same
thing use <= and we should catch case sg.tart = sg.curr.
In the full case we didn't trim anything so we shouldn't do any
manipulating of curr or copybreak.
>
> diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
> index e4b3fb4bb77c..ce7055259877 100644
> --- a/include/linux/skmsg.h
> +++ b/include/linux/skmsg.h
> @@ -139,6 +139,11 @@ static inline void sk_msg_apply_bytes(struct sk_psock *psock, u32 bytes)
> }
> }
>
> +static inline u32 sk_msg_iter_dist(u32 start, u32 end)
> +{
> + return end >= start ? end - start : end + (MAX_MSG_FRAGS - start);
> +}
> +
> #define sk_msg_iter_var_prev(var) \
> do { \
> if (var == 0) \
> @@ -198,9 +203,7 @@ static inline u32 sk_msg_elem_used(const struct sk_msg *msg)
> if (sk_msg_full(msg))
> return MAX_MSG_FRAGS;
>
> - return msg->sg.end >= msg->sg.start ?
> - msg->sg.end - msg->sg.start :
> - msg->sg.end + (MAX_MSG_FRAGS - msg->sg.start);
> + return sk_msg_iter_dist(msg->sg.start, msg->sg.end);
> }
>
> static inline struct scatterlist *sk_msg_elem(struct sk_msg *msg, int which)
> diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> index cf390e0aa73d..f6b4a70bafa9 100644
> --- a/net/core/skmsg.c
> +++ b/net/core/skmsg.c
> @@ -270,18 +270,26 @@ void sk_msg_trim(struct sock *sk, struct sk_msg *msg, int len)
>
> msg->sg.data[i].length -= trim;
> sk_mem_uncharge(sk, trim);
> + if (msg->sg.curr == i && msg->sg.copybreak > msg->sg.data[i].length)
> + msg->sg.copybreak = msg->sg.data[i].length;
> out:
> + sk_msg_iter_var_next(i);
> + msg->sg.end = i;
> +
> /* If we trim data before curr pointer update copybreak and current
> * so that any future copy operations start at new copy location.
> * However trimed data that has not yet been used in a copy op
> * does not require an update.
> */
> - if (msg->sg.curr >= i) {
> + if (!msg->sg.size) {
I do think its a bit nicer if we don't special case the size = 0 case. If
we get here and i != start then we would have extra bytes in the sg
items between the items (i, end) and nonzero size. If i == start then the
we sg.size = 0. I don't think there are any other cases.
> + msg->sg.curr = 0;
> + msg->sg.copybreak = 0;
> + } else if (sk_msg_iter_dist(msg->sg.start, msg->sg.curr) >
> + sk_msg_iter_dist(msg->sg.end, msg->sg.curr)) {
> + sk_msg_iter_var_prev(i);
I suspect with small update to dist logic the special case could also
be dropped here. But I have a preference for my example above at the
moment. Just getting coffee now so will think on it though.
FWIW I've not compiled my example.
> msg->sg.curr = i;
> msg->sg.copybreak = msg->sg.data[i].length;
> }
> - sk_msg_iter_var_next(i);
> - msg->sg.end = i;
> }
> EXPORT_SYMBOL_GPL(sk_msg_trim);
>
> --
> 2.23.0
On Fri, 01 Nov 2019 08:01:00 -0700, John Fastabend wrote:
> Jakub Kicinski wrote:
> > I will look in depth tomorrow as well, the full/empty cases are a
> > little tricky to fold into general logic.
> >
> > I came up with this before I got distracted Halloweening :)
>
> Same here. Looking at the two cases from above.
>
> if (msg->sg.start < msg->sg.end &&
> i < msg->sg.curr) { // i <= msg->sg.curr
> msg->sg.curr = i;
> msg->sg.copybreak = msg->sg.data[i].length;
> }
>
> If we happen to trim the entire msg so size=0 then i==start
> which should mean i < msg->sg.curr unless msg->sg.curr = msg->sg.start
> so we should just use <=. In the second case.
>
> else if (msg->sg.end < msg->sg.start &&
> (i < msg->sg.curr || i > msg->sg.start)) { // i <= msg->sg.curr
> msg->sg.curr = i;
> msg->sg.copybreak = msg->sg.data[i].length;
> }
>
> If we trim the entire message here i == sg.start again. And same
> thing use <= and we should catch case sg.tart = sg.curr.
>
> In the full case we didn't trim anything so we shouldn't do any
> manipulating of curr or copybreak.
Hm, don't we need to potentially move the copybreak back a little?
That's why I added this:
if (msg->sg.curr == i && msg->sg.copybreak > msg->sg.data[i].length)
msg->sg.copybreak = msg->sg.data[i].length;
To make sure that if we trimmed "a little bit" of the last SG but
didn't actually consume it the copybreak doesn't point after the length.
But perhaps that's not needed since sk_msg_memcopy_from_iter() special
cases the copybreak > length, anyway?
> > diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
> > index e4b3fb4bb77c..ce7055259877 100644
> > --- a/include/linux/skmsg.h
> > +++ b/include/linux/skmsg.h
> > @@ -139,6 +139,11 @@ static inline void sk_msg_apply_bytes(struct sk_psock *psock, u32 bytes)
> > }
> > }
> >
> > +static inline u32 sk_msg_iter_dist(u32 start, u32 end)
> > +{
> > + return end >= start ? end - start : end + (MAX_MSG_FRAGS - start);
> > +}
> > +
> > #define sk_msg_iter_var_prev(var) \
> > do { \
> > if (var == 0) \
> > @@ -198,9 +203,7 @@ static inline u32 sk_msg_elem_used(const struct sk_msg *msg)
> > if (sk_msg_full(msg))
> > return MAX_MSG_FRAGS;
> >
> > - return msg->sg.end >= msg->sg.start ?
> > - msg->sg.end - msg->sg.start :
> > - msg->sg.end + (MAX_MSG_FRAGS - msg->sg.start);
> > + return sk_msg_iter_dist(msg->sg.start, msg->sg.end);
> > }
> >
> > static inline struct scatterlist *sk_msg_elem(struct sk_msg *msg, int which)
> > diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> > index cf390e0aa73d..f6b4a70bafa9 100644
> > --- a/net/core/skmsg.c
> > +++ b/net/core/skmsg.c
> > @@ -270,18 +270,26 @@ void sk_msg_trim(struct sock *sk, struct sk_msg *msg, int len)
> >
> > msg->sg.data[i].length -= trim;
> > sk_mem_uncharge(sk, trim);
> > + if (msg->sg.curr == i && msg->sg.copybreak > msg->sg.data[i].length)
> > + msg->sg.copybreak = msg->sg.data[i].length;
> > out:
> > + sk_msg_iter_var_next(i);
> > + msg->sg.end = i;
> > +
> > /* If we trim data before curr pointer update copybreak and current
> > * so that any future copy operations start at new copy location.
> > * However trimed data that has not yet been used in a copy op
> > * does not require an update.
> > */
> > - if (msg->sg.curr >= i) {
> > + if (!msg->sg.size) {
>
> I do think its a bit nicer if we don't special case the size = 0 case. If
> we get here and i != start then we would have extra bytes in the sg
> items between the items (i, end) and nonzero size. If i == start then the
> we sg.size = 0. I don't think there are any other cases.
On an empty message i ended up before start, so we'd have to take the
wrapping into account, no? I couldn't come up with a way to handle
that, and the full case cleanly :S Perhaps there are some constraints
on the geometry that simplify it.
> > + msg->sg.curr = 0;
Ugh, this should say msg->sg.start, not 0.
> > + msg->sg.copybreak = 0;
> > + } else if (sk_msg_iter_dist(msg->sg.start, msg->sg.curr) >
> > + sk_msg_iter_dist(msg->sg.end, msg->sg.curr)) {
> > + sk_msg_iter_var_prev(i);
>
> I suspect with small update to dist logic the special case could also
> be dropped here. But I have a preference for my example above at the
> moment. Just getting coffee now so will think on it though.
Oka, I like the dist thing, I thought that's where you were going in
your first email :)
I need to do some more admin, and then I'll probably write a unit test
for this code (use space version).. So we can test either patch with it.
> FWIW I've not compiled my example.
>
> > msg->sg.curr = i;
> > msg->sg.copybreak = msg->sg.data[i].length;
> > }
> > - sk_msg_iter_var_next(i);
> > - msg->sg.end = i;
> > }
> > EXPORT_SYMBOL_GPL(sk_msg_trim);
On Fri, 1 Nov 2019 10:22:38 -0700, Jakub Kicinski wrote:
> > > + msg->sg.copybreak = 0;
> > > + } else if (sk_msg_iter_dist(msg->sg.start, msg->sg.curr) >
> > > + sk_msg_iter_dist(msg->sg.end, msg->sg.curr)) {
> > > + sk_msg_iter_var_prev(i);
> >
> > I suspect with small update to dist logic the special case could also
> > be dropped here. But I have a preference for my example above at the
> > moment. Just getting coffee now so will think on it though.
>
> Oka, I like the dist thing, I thought that's where you were going in
> your first email :)
>
> I need to do some more admin, and then I'll probably write a unit test
> for this code (use space version).. So we can test either patch with it.
Attaching my "unit test", you should be able to just replace
sk_msg_trim() with yours and re-run. That said my understanding of the
expected geometry of the buffer may not be correct :)
The patch I posted yesterday, with the small adjustment to set curr to
start on empty message passes that test, here it is again:
----->8-----
From 953df5bc0992e31a2c7863ea8b8e490ba7a07356 Mon Sep 17 00:00:00 2001
From: Jakub Kicinski <[email protected]>
Date: Tue, 29 Oct 2019 20:20:49 -0700
Subject: [PATCH net] net/tls: fix sk_msg trim on fallback to copy mode
sk_msg_trim() tries to only update curr pointer if it falls into
the trimmed region. The logic, however, does not take into the
account pointer wrapping that sk_msg_iter_var_prev() does nor
(as John points out) the fact that msg->sg is a ring buffer.
This means that when the message was trimmed completely, the new
curr pointer would have the value of MAX_MSG_FRAGS - 1, which is
neither smaller than any other value, nor would it actually be
correct.
Special case the trimming to 0 length a little bit and rework
the comparison between curr and end to take into account wrapping.
This bug caused the TLS code to not copy all of the message, if
zero copy filled in fewer sg entries than memcopy would need.
Big thanks to Alexander Potapenko for the non-KMSAN reproducer.
v2:
- take into account that msg->sg is a ring buffer (John).
Fixes: d829e9c4112b ("tls: convert to generic sk_msg interface")
Suggested-by: John Fastabend <[email protected]>
Reported-by: [email protected]
Reported-by: [email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
---
include/linux/skmsg.h | 9 ++++++---
net/core/skmsg.c | 20 +++++++++++++++-----
2 files changed, 21 insertions(+), 8 deletions(-)
diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
index e4b3fb4bb77c..ce7055259877 100644
--- a/include/linux/skmsg.h
+++ b/include/linux/skmsg.h
@@ -139,6 +139,11 @@ static inline void sk_msg_apply_bytes(struct sk_psock *psock, u32 bytes)
}
}
+static inline u32 sk_msg_iter_dist(u32 start, u32 end)
+{
+ return end >= start ? end - start : end + (MAX_MSG_FRAGS - start);
+}
+
#define sk_msg_iter_var_prev(var) \
do { \
if (var == 0) \
@@ -198,9 +203,7 @@ static inline u32 sk_msg_elem_used(const struct sk_msg *msg)
if (sk_msg_full(msg))
return MAX_MSG_FRAGS;
- return msg->sg.end >= msg->sg.start ?
- msg->sg.end - msg->sg.start :
- msg->sg.end + (MAX_MSG_FRAGS - msg->sg.start);
+ return sk_msg_iter_dist(msg->sg.start, msg->sg.end);
}
static inline struct scatterlist *sk_msg_elem(struct sk_msg *msg, int which)
diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index cf390e0aa73d..f87fde3a846c 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -270,18 +270,28 @@ void sk_msg_trim(struct sock *sk, struct sk_msg *msg, int len)
msg->sg.data[i].length -= trim;
sk_mem_uncharge(sk, trim);
+ /* Adjust copybreak if it falls into the trimmed part of last buf */
+ if (msg->sg.curr == i && msg->sg.copybreak > msg->sg.data[i].length)
+ msg->sg.copybreak = msg->sg.data[i].length;
out:
- /* If we trim data before curr pointer update copybreak and current
- * so that any future copy operations start at new copy location.
+ sk_msg_iter_var_next(i);
+ msg->sg.end = i;
+
+ /* If we trim data a full sg elem before curr pointer update
+ * copybreak and current so that any future copy operations
+ * start at new copy location.
* However trimed data that has not yet been used in a copy op
* does not require an update.
*/
- if (msg->sg.curr >= i) {
+ if (!msg->sg.size) {
+ msg->sg.curr = msg->sg.start;
+ msg->sg.copybreak = 0;
+ } else if (sk_msg_iter_dist(msg->sg.start, msg->sg.curr) >
+ sk_msg_iter_dist(msg->sg.end, msg->sg.curr)) {
+ sk_msg_iter_var_prev(i);
msg->sg.curr = i;
msg->sg.copybreak = msg->sg.data[i].length;
}
- sk_msg_iter_var_next(i);
- msg->sg.end = i;
}
EXPORT_SYMBOL_GPL(sk_msg_trim);
--
2.23.0
Jakub Kicinski wrote:
> On Fri, 1 Nov 2019 10:22:38 -0700, Jakub Kicinski wrote:
> > > > + msg->sg.copybreak = 0;
> > > > + } else if (sk_msg_iter_dist(msg->sg.start, msg->sg.curr) >
> > > > + sk_msg_iter_dist(msg->sg.end, msg->sg.curr)) {
> > > > + sk_msg_iter_var_prev(i);
> > >
> > > I suspect with small update to dist logic the special case could also
> > > be dropped here. But I have a preference for my example above at the
> > > moment. Just getting coffee now so will think on it though.
> >
> > Oka, I like the dist thing, I thought that's where you were going in
> > your first email :)
> >
> > I need to do some more admin, and then I'll probably write a unit test
> > for this code (use space version).. So we can test either patch with it.
>
> Attaching my "unit test", you should be able to just replace
> sk_msg_trim() with yours and re-run. That said my understanding of the
> expected geometry of the buffer may not be correct :)
>
> The patch I posted yesterday, with the small adjustment to set curr to
> start on empty message passes that test, here it is again:
>
> ----->8-----
>
> From 953df5bc0992e31a2c7863ea8b8e490ba7a07356 Mon Sep 17 00:00:00 2001
> From: Jakub Kicinski <[email protected]>
> Date: Tue, 29 Oct 2019 20:20:49 -0700
> Subject: [PATCH net] net/tls: fix sk_msg trim on fallback to copy mode
>
> sk_msg_trim() tries to only update curr pointer if it falls into
> the trimmed region. The logic, however, does not take into the
> account pointer wrapping that sk_msg_iter_var_prev() does nor
> (as John points out) the fact that msg->sg is a ring buffer.
>
> This means that when the message was trimmed completely, the new
> curr pointer would have the value of MAX_MSG_FRAGS - 1, which is
> neither smaller than any other value, nor would it actually be
> correct.
>
> Special case the trimming to 0 length a little bit and rework
> the comparison between curr and end to take into account wrapping.
>
> This bug caused the TLS code to not copy all of the message, if
> zero copy filled in fewer sg entries than memcopy would need.
>
> Big thanks to Alexander Potapenko for the non-KMSAN reproducer.
>
> v2:
> - take into account that msg->sg is a ring buffer (John).
>
> Fixes: d829e9c4112b ("tls: convert to generic sk_msg interface")
> Suggested-by: John Fastabend <[email protected]>
> Reported-by: [email protected]
> Reported-by: [email protected]
> Signed-off-by: Jakub Kicinski <[email protected]>
> ---
> include/linux/skmsg.h | 9 ++++++---
> net/core/skmsg.c | 20 +++++++++++++++-----
> 2 files changed, 21 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
> index e4b3fb4bb77c..ce7055259877 100644
> --- a/include/linux/skmsg.h
> +++ b/include/linux/skmsg.h
> @@ -139,6 +139,11 @@ static inline void sk_msg_apply_bytes(struct sk_psock *psock, u32 bytes)
> }
> }
>
> +static inline u32 sk_msg_iter_dist(u32 start, u32 end)
> +{
> + return end >= start ? end - start : end + (MAX_MSG_FRAGS - start);
> +}
> +
> #define sk_msg_iter_var_prev(var) \
> do { \
> if (var == 0) \
> @@ -198,9 +203,7 @@ static inline u32 sk_msg_elem_used(const struct sk_msg *msg)
> if (sk_msg_full(msg))
> return MAX_MSG_FRAGS;
>
> - return msg->sg.end >= msg->sg.start ?
> - msg->sg.end - msg->sg.start :
> - msg->sg.end + (MAX_MSG_FRAGS - msg->sg.start);
> + return sk_msg_iter_dist(msg->sg.start, msg->sg.end);
> }
I think its nice to pull this into a helper so I'm ok with also using the
dist below, except for one comment below.
>
> static inline struct scatterlist *sk_msg_elem(struct sk_msg *msg, int which)
> diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> index cf390e0aa73d..f87fde3a846c 100644
> --- a/net/core/skmsg.c
> +++ b/net/core/skmsg.c
> @@ -270,18 +270,28 @@ void sk_msg_trim(struct sock *sk, struct sk_msg *msg, int len)
>
> msg->sg.data[i].length -= trim;
> sk_mem_uncharge(sk, trim);
> + /* Adjust copybreak if it falls into the trimmed part of last buf */
> + if (msg->sg.curr == i && msg->sg.copybreak > msg->sg.data[i].length)
> + msg->sg.copybreak = msg->sg.data[i].length;
> out:
> - /* If we trim data before curr pointer update copybreak and current
> - * so that any future copy operations start at new copy location.
> + sk_msg_iter_var_next(i);
> + msg->sg.end = i;
> +
> + /* If we trim data a full sg elem before curr pointer update
> + * copybreak and current so that any future copy operations
> + * start at new copy location.
> * However trimed data that has not yet been used in a copy op
> * does not require an update.
> */
> - if (msg->sg.curr >= i) {
> + if (!msg->sg.size) {
> + msg->sg.curr = msg->sg.start;
> + msg->sg.copybreak = 0;
> + } else if (sk_msg_iter_dist(msg->sg.start, msg->sg.curr) >
> + sk_msg_iter_dist(msg->sg.end, msg->sg.curr)) {
I'm not seeing how this can work. Taking simple case with start < end
so normal geometry without wrapping. Let,
start = 1
curr = 3
end = 4
We could trim an index to get,
start = 1
curr = 3
i = 3
end = 4
Then after out: label this would push end up one,
start = 1
curr = 3
i = 3
end = 4
But dist(start,curr) = 2 and dist(end, curr) = 1 and we would set curr
to '3' but clear the copybreak? I think a better comparison would be,
if (sk_msg_iter_dist(msg->sg.start, i) <
sk_msg_iter_dist(msg->sg.start, msg->sg.curr)
To check if 'i' walked past curr so we can reset curr/copybreak?
> + sk_msg_iter_var_prev(i);
> msg->sg.curr = i;
> msg->sg.copybreak = msg->sg.data[i].length;
> }
> - sk_msg_iter_var_next(i);
> - msg->sg.end = i;
> }
> EXPORT_SYMBOL_GPL(sk_msg_trim);
>
> --
> 2.23.0
>
On Mon, 04 Nov 2019 10:56:52 -0800, John Fastabend wrote:
> Jakub Kicinski wrote:
> > diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> > index cf390e0aa73d..f87fde3a846c 100644
> > --- a/net/core/skmsg.c
> > +++ b/net/core/skmsg.c
> > @@ -270,18 +270,28 @@ void sk_msg_trim(struct sock *sk, struct sk_msg *msg, int len)
> >
> > msg->sg.data[i].length -= trim;
> > sk_mem_uncharge(sk, trim);
> > + /* Adjust copybreak if it falls into the trimmed part of last buf */
> > + if (msg->sg.curr == i && msg->sg.copybreak > msg->sg.data[i].length)
> > + msg->sg.copybreak = msg->sg.data[i].length;
> > out:
> > - /* If we trim data before curr pointer update copybreak and current
> > - * so that any future copy operations start at new copy location.
> > + sk_msg_iter_var_next(i);
> > + msg->sg.end = i;
> > +
> > + /* If we trim data a full sg elem before curr pointer update
> > + * copybreak and current so that any future copy operations
> > + * start at new copy location.
> > * However trimed data that has not yet been used in a copy op
> > * does not require an update.
> > */
> > - if (msg->sg.curr >= i) {
> > + if (!msg->sg.size) {
> > + msg->sg.curr = msg->sg.start;
> > + msg->sg.copybreak = 0;
> > + } else if (sk_msg_iter_dist(msg->sg.start, msg->sg.curr) >
> > + sk_msg_iter_dist(msg->sg.end, msg->sg.curr)) {
>
> I'm not seeing how this can work. Taking simple case with start < end
> so normal geometry without wrapping. Let,
>
> start = 1
> curr = 3
> end = 4
>
> We could trim an index to get,
>
> start = 1
> curr = 3
> i = 3
> end = 4
IOW like this?
test_one(/* start */ 1, /* curr */ 3, /* copybreak */ 150,
/* trim */ 500,
/* curr */ 3, /* copybreak */ 100, /* end */ 4,
/* data */ 200, 200, 200);
test #13 start:1 curr:3 end:4 cb:150 size: 600 0 200 200 200 0 OKAY
> Then after out: label this would push end up one,
>
> start = 1
> curr = 3
> i = 3
> end = 4
I moved the assignment to end before the curr adjustment, so 'i' is
equivalent to 'end' at this point.
> But dist(start,curr) = 2 and dist(end, curr) = 1 and we would set curr
> to '3' but clear the copybreak?
I don't think we'd fall into this condition ever, unless we moved end.
And in your example AFAIU we don't move end.
> I think a better comparison would be,
>
> if (sk_msg_iter_dist(msg->sg.start, i) <
> sk_msg_iter_dist(msg->sg.start, msg->sg.curr)
>
> To check if 'i' walked past curr so we can reset curr/copybreak?
Ack, this does read better!
Should we use <= here? If we dropped a full segment, should curr point
at the end of the last remaining segment or should it point at 0 in end?
Jakub Kicinski wrote:
> On Mon, 04 Nov 2019 10:56:52 -0800, John Fastabend wrote:
> > Jakub Kicinski wrote:
> > > diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> > > index cf390e0aa73d..f87fde3a846c 100644
> > > --- a/net/core/skmsg.c
> > > +++ b/net/core/skmsg.c
> > > @@ -270,18 +270,28 @@ void sk_msg_trim(struct sock *sk, struct sk_msg *msg, int len)
> > >
> > > msg->sg.data[i].length -= trim;
> > > sk_mem_uncharge(sk, trim);
> > > + /* Adjust copybreak if it falls into the trimmed part of last buf */
> > > + if (msg->sg.curr == i && msg->sg.copybreak > msg->sg.data[i].length)
> > > + msg->sg.copybreak = msg->sg.data[i].length;
> > > out:
> > > - /* If we trim data before curr pointer update copybreak and current
> > > - * so that any future copy operations start at new copy location.
> > > + sk_msg_iter_var_next(i);
> > > + msg->sg.end = i;
> > > +
> > > + /* If we trim data a full sg elem before curr pointer update
> > > + * copybreak and current so that any future copy operations
> > > + * start at new copy location.
> > > * However trimed data that has not yet been used in a copy op
> > > * does not require an update.
> > > */
> > > - if (msg->sg.curr >= i) {
> > > + if (!msg->sg.size) {
> > > + msg->sg.curr = msg->sg.start;
> > > + msg->sg.copybreak = 0;
> > > + } else if (sk_msg_iter_dist(msg->sg.start, msg->sg.curr) >
> > > + sk_msg_iter_dist(msg->sg.end, msg->sg.curr)) {
> >
> > I'm not seeing how this can work. Taking simple case with start < end
> > so normal geometry without wrapping. Let,
> >
> > start = 1
> > curr = 3
> > end = 4
> >
> > We could trim an index to get,
> >
> > start = 1
> > curr = 3
> > i = 3
> > end = 4
>
> IOW like this?
>
> test_one(/* start */ 1, /* curr */ 3, /* copybreak */ 150,
> /* trim */ 500,
> /* curr */ 3, /* copybreak */ 100, /* end */ 4,
> /* data */ 200, 200, 200);
>
> test #13 start:1 curr:3 end:4 cb:150 size: 600 0 200 200 200 0 OKAY
>
> > Then after out: label this would push end up one,
> >
> > start = 1
> > curr = 3
> > i = 3
> > end = 4
>
> I moved the assignment to end before the curr adjustment, so 'i' is
> equivalent to 'end' at this point.
right.
>
> > But dist(start,curr) = 2 and dist(end, curr) = 1 and we would set curr
> > to '3' but clear the copybreak?
>
> I don't think we'd fall into this condition ever, unless we moved end.
> And in your example AFAIU we don't move end.
>
> > I think a better comparison would be,
> >
> > if (sk_msg_iter_dist(msg->sg.start, i) <
> > sk_msg_iter_dist(msg->sg.start, msg->sg.curr)
> >
> > To check if 'i' walked past curr so we can reset curr/copybreak?
>
> Ack, this does read better!
Great.
>
> Should we use <= here? If we dropped a full segment, should curr point
> at the end of the last remaining segment or should it point at 0 in end?
Right it should be <=.
Full segment? If a segment is trimmed exactly then curr can point to the
previous segment with 'copybreak = sge->length' so next copy will see a
full buffer and advance curr. Or can leave curr on the trim'ed segment
with copybreak set to 0.
Both should be OK as long as copybreak is correct. Reviewing code now
to be sure we didn't take any shortcuts but that should be true else we
may have other bugs when working with BPF.