2018-08-09 22:41:28

by Dominique Martinet

[permalink] [raw]
Subject: [PATCH v0] strparser: remove any offset before parsing messages

Offset is not well handled by strparser users right now.

Out of the current strparser users, we have:
- tls, that handles offset properly in parse and rcv callbacks
- kcm, that handles offset in rcv but not in parse
- bpf sockmap, that does not seem to handle offset anywhere

Calling pskb_pull() on the skb before parsing ensures that the offset
will be 0 everywhere in practice unless the user modifies it themselves
like tls, as a workaround for the other two protocols.

This fixes a bug whilch can be exhibited by implementing a simpe kcm
parser that looks for the packet size in the first word of the packet,
and sending two such packets in a single write() call on the other side:
the second message will be cut at the length of the first message.
Since this is a stream protocol, all the following messages will also
be corrupt since it will start looking for the next offset at a wrong
position.

Signed-off-by: Dominique Martinet <[email protected]>
---

Discussions on the bug along with a (bad) reproducer can be found here:
http://lkml.kernel.org/m/20180803182830.GB29193@nautica
(now the problem is better understood though it's much simpler to send
two messages at once than to spam and wait for tcp aggregation to do it)


Two notes:
- I've marked this patch v0 as we could move the pskb_pull() up to
where strp.offset is set, and just always leave it at 0 in the strparser
code.
This will let applications that are fine dealing with a non-zero offset
deal with it as they seem fit (tls writes into the offset and full_len
fields behind the back of the stream parser), while still being safe for
kcm/sockmap

- Even with that modification I'm not totally happy with
single-handedly eating the offset for strparser users which could handle
it, but I'm not really familiar with the cost this really has in
practice...
A better fix would be to handle the offset properly in the callbacks,
but frankly at least for kcm I don't see how (maybe because I'm not
familiar with how bpf programs work)

Another idea I had would be to write flags when registering the protocol
e.g. strp->cb.flags & STRP_CAN_PARSE_WITH_OFFSET or something like that,
but without an idea of the cost of that pull I don't know if it's worth
doing.


Anyway, comments welcome.


net/strparser/strparser.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/net/strparser/strparser.c b/net/strparser/strparser.c
index 625acb27efcc..d7a3b81c3481 100644
--- a/net/strparser/strparser.c
+++ b/net/strparser/strparser.c
@@ -222,6 +222,16 @@ static int __strp_recv(read_descriptor_t *desc, struct sk_buff *orig_skb,
if (!stm->strp.full_len) {
ssize_t len;

+ /* Can only parse if there is no offset */
+ if (unlikely(stm->strp.offset)) {
+ if (!pskb_pull(skb, stm->strp.offset)) {
+ STRP_STATS_INCR(strp->stats.mem_fail);
+ strp_parser_err(strp, -ENOMEM, desc);
+ break;
+ }
+ stm->strp.offset = 0;
+ }
+
len = (*strp->cb.parse_msg)(strp, head);

if (!len) {
@@ -249,8 +259,7 @@ static int __strp_recv(read_descriptor_t *desc, struct sk_buff *orig_skb,
STRP_STATS_INCR(strp->stats.msg_too_big);
strp_parser_err(strp, -EMSGSIZE, desc);
break;
- } else if (len <= (ssize_t)head->len -
- skb->len - stm->strp.offset) {
+ } else if (len <= (ssize_t)head->len - skb->len) {
/* Length must be into new skb (and also
* greater than zero)
*/
--
2.17.1



2018-08-21 14:55:42

by Doron Roberts-Kedes

[permalink] [raw]
Subject: Re: [PATCH] strparser: remove any offset before parsing messages

On Tue, Aug 21, 2018 at 02:51:46PM +0200, Dominique Martinet wrote:
> Offset is not well handled by strparser users right now.
>
> Out of the current strparser users, we have:
> - tls, that handles offset properly in parse and rcv callbacks
> - kcm, that handles offset in rcv but not in parse
> - bpf sockmap, that does not seem to handle offset anywhere
>
> Calling pskb_pull() on new skb ensures that the offset will be 0
> everywhere in practice, and in particular for the parse function,
> unless the user modifies it themselves like tls does.
>
> This fixes a bug which can be exhibited by implementing a simple kcm
> parser that looks for the packet size in the first word of the packet,
> and sending two such packets in a single write() call on the other side:
> the second message will be cut at the length of the first message.
> Since this is a stream protocol, all the following messages will also
> be corrupt since it will start looking for the next offset at a wrong
> position.
>
> Signed-off-by: Dominique Martinet <[email protected]>
> ---

There are a few issues with this patch. First, it seems like you're
trying to fix bugs in users of strparser by changing an implementation
detail of strparser. Second, this implementation change can add malloc's
and copies where there were none before.

If strparser users do not handle non-zero offset properly, then that
doesn't motivate changing the implementation of strparser to copy
around data to accomodate those buggy users.

Why not submit a patch that handles offset properly in the code you
pointed out?

2018-08-21 15:47:23

by Dominique Martinet

[permalink] [raw]
Subject: [PATCH] strparser: remove any offset before parsing messages

Offset is not well handled by strparser users right now.

Out of the current strparser users, we have:
- tls, that handles offset properly in parse and rcv callbacks
- kcm, that handles offset in rcv but not in parse
- bpf sockmap, that does not seem to handle offset anywhere

Calling pskb_pull() on new skb ensures that the offset will be 0
everywhere in practice, and in particular for the parse function,
unless the user modifies it themselves like tls does.

This fixes a bug which can be exhibited by implementing a simple kcm
parser that looks for the packet size in the first word of the packet,
and sending two such packets in a single write() call on the other side:
the second message will be cut at the length of the first message.
Since this is a stream protocol, all the following messages will also
be corrupt since it will start looking for the next offset at a wrong
position.

Signed-off-by: Dominique Martinet <[email protected]>
---

I haven't had any comment on v0, so here is what I had planned as
refactoring anyawy, but I'd *really* like some opinion on this as a
whole...

Also note that compiling bpf programs with libbcc is currently broken in
linus' master, see the fix thread here:
https://lkml.kernel.org/r/[email protected]
You can likely just revert cafa0010cd51 ("Raise the minimum required gcc
version to 4.6") localy meanwhile, or base the patch off 4.18.

net/strparser/strparser.c | 15 ++++++++++++---
1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/net/strparser/strparser.c b/net/strparser/strparser.c
index da1a676860ca..d7fb30b1bcfc 100644
--- a/net/strparser/strparser.c
+++ b/net/strparser/strparser.c
@@ -201,7 +201,17 @@ static int __strp_recv(read_descriptor_t *desc, struct sk_buff *orig_skb,
strp->skb_nextp = NULL;
stm = _strp_msg(head);
memset(stm, 0, sizeof(*stm));
- stm->strp.offset = orig_offset + eaten;
+
+ /* Can only parse if there is no offset */
+ if (unlikely(orig_offset + eaten)) {
+ if (!pskb_pull(skb, orig_offset + eaten)) {
+ STRP_STATS_INCR(strp->stats.mem_fail);
+ strp_parser_err(strp, -ENOMEM, desc);
+ break;
+ }
+ orig_len -= eaten;
+ orig_offset = eaten = 0;
+ }
} else {
/* Unclone if we are appending to an skb that we
* already share a frag_list with.
@@ -253,8 +263,7 @@ static int __strp_recv(read_descriptor_t *desc, struct sk_buff *orig_skb,
STRP_STATS_INCR(strp->stats.msg_too_big);
strp_parser_err(strp, -EMSGSIZE, desc);
break;
- } else if (len <= (ssize_t)head->len -
- skb->len - stm->strp.offset) {
+ } else if (len <= (ssize_t)head->len - skb->len) {
/* Length must be into new skb (and also
* greater than zero)
*/
--
2.17.1


2018-08-21 19:40:38

by Dominique Martinet

[permalink] [raw]
Subject: Re: [PATCH] strparser: remove any offset before parsing messages

Doron Roberts-Kedes wrote on Tue, Aug 21, 2018:
> There are a few issues with this patch. First, it seems like you're
> trying to fix bugs in users of strparser by changing an implementation
> detail of strparser.

Yes, that's why I have been writing since the original discussion that I
do not like this fix, but as I said in the other thread and v0 of this
patch I do not know how to tell the bpf function to start with an offset
in the skb in e.g. kcm_parse_func_strparser

I could add the pull in that function, but that feels a bit wrong on a
separation level to me.

> Second, this implementation change can add malloc's and copies where
> there were none before.

Yes I agree this is more than suboptimal for tls, I've also said that.


> If strparser users do not handle non-zero offset properly, then that
> doesn't motivate changing the implementation of strparser to copy
> around data to accomodate those buggy users.
>
> Why not submit a patch that handles offset properly in the code you
> pointed out?

One of the solutions I had suggested was adding a flag at strparser
setup time to only do that pull for users which cannot handle offset,
but nobody seemed interested two weeks ago. I can still do that.

That's still suboptimal, but I don't have any better idea.
To properly fix the users, I'd really need help with how bpf works to
even know if passing an offset would be possible in the first place, as
I do not see how at this time.


Thanks,
--
Dominique Martinet

2018-08-21 21:26:04

by Doron Roberts-Kedes

[permalink] [raw]
Subject: Re: [PATCH] strparser: remove any offset before parsing messages

On Tue, Aug 21, 2018 at 09:36:55PM +0200, Dominique Martinet wrote:

> One of the solutions I had suggested was adding a flag at strparser
> setup time to only do that pull for users which cannot handle offset,
> but nobody seemed interested two weeks ago. I can still do that.

This seems overly complicated.

> That's still suboptimal, but I don't have any better idea.
> To properly fix the users, I'd really need help with how bpf works to
> even know if passing an offset would be possible in the first place, as
> I do not see how at this time.

Thanks for clarifying Dominique.
It seems like we mainly agree that the proposed patch is suboptimal for
existing clients of the library that use offset correctly (tls).

It also seems like you've identified that the proper fix is in bpf.
Regrettably, I cannot help you understand how bpf works because I'm not
familiar with that code.

As an aside, I would recommend reaching the netdev FAQ page:
https://www.kernel.org/doc/Documentation/networking/netdev-FAQ.txt

It contains helpful hints about how to format email subjects (specifying
net vs. net-next) and determining when trees are closed (currently
closed).

2018-08-21 23:08:43

by Dominique Martinet

[permalink] [raw]
Subject: Re: [PATCH] strparser: remove any offset before parsing messages

Doron Roberts-Kedes wrote on Tue, Aug 21, 2018:
> On Tue, Aug 21, 2018 at 09:36:55PM +0200, Dominique Martinet wrote:
> > One of the solutions I had suggested was adding a flag at strparser
> > setup time to only do that pull for users which cannot handle offset,
> > but nobody seemed interested two weeks ago. I can still do that.
>
> This seems overly complicated.

That sounds much simpler than adding a similar feature to bpf if it
doesn't already support it -- we already have an user-provided struct at
strparser init timer (strp->cb) in which it would be easy to add a flag
saying whether the callbacks support offset or not.

That's maybe three more lines than the current patch, which is also
pretty simple, I'm not sure what you're expecting from alternative
solutions to call that overly complicated...


> It seems like we mainly agree that the proposed patch is suboptimal for
> existing clients of the library that use offset correctly (tls).
>
> It also seems like you've identified that the proper fix is in bpf.

I don't think bpf itself needs to be changed here -- the offset is
stored in a strparser specific struct so short of such a skb_pull I
think we'd need to change the type of the bpf function, pass it it the
extra parameter, and make it a user visible change breaking the kcm
API... And I have no idea for sockmap but probably something similar.

I can't think of that as better than adding a flag to strparser.

(Also, note that pskb_pull will not copy any data or allocate memory
unless we're pulling past the end of the skb, which seems pretty
unlikely in that situation as we should have consumed any fully "eaten"
skb before getting to a new one here -- so in practice this patch just
adds a skb->data += offset with safety guards "just in case")


> As an aside, I would recommend reaching the netdev FAQ page:
> https://www.kernel.org/doc/Documentation/networking/netdev-FAQ.txt
>
> It contains helpful hints about how to format email subjects (specifying
> net vs. net-next) and determining when trees are closed (currently
> closed).

Thank you for pointing out to that document.
While this bug has been around from day one of kcm it is still a bug fix
so I'd rather let maintainers decide which tree it goes to.

I wasn't aware that net-next closes during the Linus merge window, but
if you want to split hairs, the FAQ says "do not send new net-next
content to netdev [while closed]" and this isn't new content as the v0
of the patch was mostly the same and sent before the 4.18 release, (and,
ironically, did not get any comment).
Anyway, sure, I'll wait until next week for a v2 if that matters.


Thanks,
--
Dominique

2018-08-21 23:51:45

by Doron Roberts-Kedes

[permalink] [raw]
Subject: Re: [PATCH] strparser: remove any offset before parsing messages

On Wed, Aug 22, 2018 at 12:51:13AM +0200, Dominique Martinet wrote:
> That's maybe three more lines than the current patch, which is also
> pretty simple, I'm not sure what you're expecting from alternative
> solutions to call that overly complicated...

The line count is not the source of the complexity. The undue complexity
is having strparser operate in two modes: one for clients that properly
use the API by respecting the value of offset, and another for clients
that do not.

> I don't think bpf itself needs to be changed here -- the offset is
> stored in a strparser specific struct so short of such a skb_pull I
> think we'd need to change the type of the bpf function, pass it it the
> extra parameter, and make it a user visible change breaking the kcm
> API... And I have no idea for sockmap but probably something similar.

I'm not sure I follow you here. Any rcv_msg callback implementation
receives an skb. Calling strp_msg() on the skb gives you the strp_msg
which has the offset value. Can you explain why passing an extra
parameter is necessary to get the offset?

> I can't think of that as better than adding a flag to strparser.
>
> (Also, note that pskb_pull will not copy any data or allocate memory
> unless we're pulling past the end of the skb, which seems pretty
> unlikely in that situation as we should have consumed any fully "eaten"
> skb before getting to a new one here -- so in practice this patch just
> adds a skb->data += offset with safety guards "just in case")

Yes, no data will be copied if the you don't pull beyond the linear
buffer. Adding overhead even in a small percentage of cases still
requires a good justification. In this particular case, I think a good
justification would be demonstrating that it is impractical for the
buggy strparser users you've pointed out to use the existing API and
respect the value of offset. You have indicated that you are not super
familiar with the bpf code, which is fine (I'm not either), but this
isn't a good reason to make a change to strparser instead of bpf.

2018-08-22 01:04:43

by Dominique Martinet

[permalink] [raw]
Subject: Re: [PATCH] strparser: remove any offset before parsing messages

Doron Roberts-Kedes wrote on Tue, Aug 21, 2018:
> On Wed, Aug 22, 2018 at 12:51:13AM +0200, Dominique Martinet wrote:
> > That's maybe three more lines than the current patch, which is also
> > pretty simple, I'm not sure what you're expecting from alternative
> > solutions to call that overly complicated...
>
> The line count is not the source of the complexity. The undue complexity
> is having strparser operate in two modes: one for clients that properly
> use the API by respecting the value of offset, and another for clients
> that do not.

I might sound stubborn at this point but that still does not sound
complex compared to the alternatives I can think of.


> > I don't think bpf itself needs to be changed here -- the offset is
> > stored in a strparser specific struct so short of such a skb_pull I
> > think we'd need to change the type of the bpf function, pass it it the
> > extra parameter, and make it a user visible change breaking the kcm
> > API... And I have no idea for sockmap but probably something similar.
>
> I'm not sure I follow you here. Any rcv_msg callback implementation
> receives an skb. Calling strp_msg() on the skb gives you the strp_msg
> which has the offset value. Can you explain why passing an extra
> parameter is necessary to get the offset?


Yes, the rcv_msg callback itself can get the offset easily, and it's not
that which needs an extra parameter but the bpf function kcm/sockmap are
calling which would need either an extra parameter or changing to get
that value themselves -- the later is probably not very difficult and
won't break the API, but at the same time pushes the responsability to
kcm/sockmap users who will get that wrong and waste time trying to
understand how kcm works like I did.

Breaking the API has the advantage that users will get an error telling
them to update their bpf program instead of randomly failing when tcp
aggregation happens.

For what it's worth, I don't think either are acceptable solutions, I'm
just stating what would a "fix in bpf" would be.


A "fix in kcm/sockmap" would be to pull just before calling the bpf
program, which could be a middle ground, but I think that's more
perverse than adding a flag to strparser for new users because we'd
basically be saying users should modify the skb that strparser users
under its feet, and there is no guarantee what would happen to the
strparser logic in that case -- it might work to pull in the parser
function but it might not work in rcv for all I know, or the next user
might think that since pull is ok some other operation on the skb is as
well...


> > (Also, note that pskb_pull will not copy any data or allocate memory
> > unless we're pulling past the end of the skb, which seems pretty
> > unlikely in that situation as we should have consumed any fully "eaten"
> > skb before getting to a new one here -- so in practice this patch just
> > adds a skb->data += offset with safety guards "just in case")
>
> Yes, no data will be copied if the you don't pull beyond the linear
> buffer. Adding overhead even in a small percentage of cases still
> requires a good justification.

As I wrote above, I think it should not be possible, so we're not
even talking about a small percentage here.
The reason I didn't use skb_pull (the head-only variant) is that I'd
rather have the overhead than a BUG() if I'm wrong on this...

> In this particular case, I think a good justification would be
> demonstrating that it is impractical for the buggy strparser users
> you've pointed out to use the existing API and respect the value of
> offset.

That's what the previous paragraph about changing the API intended to
do.

> You have indicated that you are not super familiar with the bpf code,
> which is fine (I'm not either), but this isn't a good reason to make a
> change to strparser instead of bpf.

I didn't even know kcm/strparser existed a few weeks ago, not being
familiar doesn't mean I didn't look at how it works.

I'd be glad to be proven wrong here but I really do not think there is a
possibility within the bpf framework to give a fake skb with an external
offset to the bpf program transparently.

Assuming there is, it would have to be more expensive than a pull (it'd
basically need to do a pull then restore the original data somehow)...

And if there isn't I certainly do not want to add one, not because I
don't want to mess with something I'm not too familiar with (that'd
actually be an argument to do it that way to me), but because I do not
think it belongs there; bpf should not need to know what a skb is or how
it works.


All being said I'm still convinced having two modes in strparser is the
simplest solution.

--
Dominique

2018-08-22 03:37:21

by Doron Roberts-Kedes

[permalink] [raw]
Subject: Re: [PATCH] strparser: remove any offset before parsing messages

On Wed, Aug 22, 2018 at 02:46:47AM +0200, Dominique Martinet wrote:
> Yes, the rcv_msg callback itself can get the offset easily, and it's not
> that which needs an extra parameter but the bpf function kcm/sockmap are
> calling which would need either an extra parameter or changing to get
> that value themselves.

Ah cool. Thanks for explaining.

> For what it's worth, I don't think either are acceptable solutions, I'm
> just stating what would a "fix in bpf" would be.

Agreed that the discussion should be about whether to fix it up in
strparser or sockmap. bpf seems inappropriate.

> strparser logic in that case -- it might work to pull in the parser
> function but it might not work in rcv for all I know, or the next user
> might think that since pull is ok some other operation on the skb is as
> well...

Just to make sure I understand, is it possible you meant to say that the
other way around? Surely the rcv callback can do whatever it wants with
the skb. Its the parse callback that may need to be a little more
careful with the skb.

For the parse case, why not just clone and pull?

> As I wrote above, I think it should not be possible, so we're not
> even talking about a small percentage here.
> The reason I didn't use skb_pull (the head-only variant) is that I'd
> rather have the overhead than a BUG() if I'm wrong on this...

A printk in that section when (orig_offset + eaten > skb_headlen(head))
confirms that this case is not uncommon or impossible. Would have to do
more work to see how many hundreds of times per second, but it is not a
philosophical concern.


2018-08-22 05:48:52

by Dominique Martinet

[permalink] [raw]
Subject: Re: [PATCH] strparser: remove any offset before parsing messages

Doron Roberts-Kedes wrote on Tue, Aug 21, 2018:
> > strparser logic in that case -- it might work to pull in the parser
> > function but it might not work in rcv for all I know, or the next user
> > might think that since pull is ok some other operation on the skb is as
> > well...
>
> Just to make sure I understand, is it possible you meant to say that the
> other way around? Surely the rcv callback can do whatever it wants with
> the skb. Its the parse callback that may need to be a little more
> careful with the skb.

Hmm, right, when we call the rcv callback we remove the skb from
strp->skb_head, and by the next iteration we have a new clone of the
orig_skb so it looks safe, but that's not something that's obvious at
first glance; it works because the skb was cloned at the start of the
loop.


> For the parse case, why not just clone and pull?

Hmm, if we clone I guess there is no side effect to fear, that could be
acceptable... It feels that we're just pushing more overhead on to
kcm/sockmap though; in strparser's __strp_recv we know we can pull
safely so doing it there feels simpler to me.

After testing the overhead doesn't seem to be bad though, it looks like
it's less than the noise level on my laptop on performance governor; the
only thing to pay attention to is that if we clone here I'll need to
also add another pull in sockmap's rcv function
(smap_read_sock_strparser) that doesn't handle offset either.

If we only pull just before parsing I think rcv is also guaranteed to
have no offset, it has to start with the same offset as the parsing
unless the user changed it, right?



Anyway there's progress, we're down to these two-ish choices if I
followed correctly:
- add a flag in strp_callbacks that says offsets are ok or not for
parsing, and just pull if it's set.
Now we're back to that I'd be moving the pull just before parsing like I
did in v0, as that's easier to follow.
or
- add a (clone?+)pull in kcm_rcv_strparser and smap_parse_func_strparser
(+ in smap_read_sock_strparser if clone)


(As a side note, I noticed this patch is bugged, orig_offset/eaten
shouldn't be reset to zero after the pull (and thus orig_len doesn't
need changing either); that's what I get for trying to "simplify"
something that was simpler in the v0 to me...)


> > As I wrote above, I think it should not be possible, so we're not
> > even talking about a small percentage here.
> > The reason I didn't use skb_pull (the head-only variant) is that I'd
> > rather have the overhead than a BUG() if I'm wrong on this...
>
> A printk in that section when (orig_offset + eaten > skb_headlen(head))
> confirms that this case is not uncommon or impossible. Would have to do
> more work to see how many hundreds of times per second, but it is not a
> philosophical concern.

Hmm, right, it does happen if I force two bigger packets in a single
write() on my reproducer; I guess my workload didn't exhibit that
behaviour with a 9p client...

I've tried measuring that overhead as well by writing a more complex bpf
program that would fetch the offset in the skb but for some reason I'm
reading a 0 offset when it's not zero... well, not like there's much
choice for this at this point anyway; I don't think we'll do this
without pull, I'll put that on background.

--
Dominique


2018-08-22 18:41:42

by Dave Watson

[permalink] [raw]
Subject: Re: [PATCH] strparser: remove any offset before parsing messages

On 08/22/18 07:47 AM, Dominique Martinet wrote:
> > > As I wrote above, I think it should not be possible, so we're not
> > > even talking about a small percentage here.
> > > The reason I didn't use skb_pull (the head-only variant) is that I'd
> > > rather have the overhead than a BUG() if I'm wrong on this...
> >
> > A printk in that section when (orig_offset + eaten > skb_headlen(head))
> > confirms that this case is not uncommon or impossible. Would have to do
> > more work to see how many hundreds of times per second, but it is not a
> > philosophical concern.
>
> Hmm, right, it does happen if I force two bigger packets in a single
> write() on my reproducer; I guess my workload didn't exhibit that
> behaviour with a 9p client...
>
> I've tried measuring that overhead as well by writing a more complex bpf
> program that would fetch the offset in the skb but for some reason I'm
> reading a 0 offset when it's not zero... well, not like there's much
> choice for this at this point anyway; I don't think we'll do this
> without pull, I'll put that on background.

For what it is worth we checked the offset in bpf, something
along the lines of

struct kcm_rx_msg { int full_len; int offset;};
static inline struct kcm_rx_msg *kcm_rx_msg(struct __sk_buff *skb)
{ return (struct kcm_rx_msg *)skb->cb;}

int decode_framing(struct __sk_buff *skb)
{ return load_word(skb, kcm_rx_msg(skb)->offset);}

Although it did puzzle me for a while figuring that out when I ran in
to it.

2018-08-23 01:06:18

by Dominique Martinet

[permalink] [raw]
Subject: Re: [PATCH] strparser: remove any offset before parsing messages

Dave Watson wrote on Wed, Aug 22, 2018:
> > I've tried measuring that overhead as well by writing a more complex bpf
> > program that would fetch the offset in the skb but for some reason I'm
> > reading a 0 offset when it's not zero... well, not like there's much
> > choice for this at this point anyway; I don't think we'll do this
> > without pull, I'll put that on background.
>
> For what it is worth we checked the offset in bpf, something
> along the lines of

Oh, thanks!

> > struct kcm_rx_msg { int full_len; int offset;};
> static inline struct kcm_rx_msg *kcm_rx_msg(struct __sk_buff *skb)
> { return (struct kcm_rx_msg *)skb->cb;}
>
> int decode_framing(struct __sk_buff *skb)
> { return load_word(skb, kcm_rx_msg(skb)->offset);}

So you're taking directly the address at skb->cb but the linux code has
this function:
static inline struct strp_msg *strp_msg(struct sk_buff *skb)
{
return (struct strp_msg *)((void *)skb->cb +
offsetof(struct qdisc_skb_cb, data));
}
and qdisc_skb_cb.data is another 8 bytes in, that would explain I had
different results (and now I'm trying your snippet it does work), but
I'll have to admit I fail to understand this....

Ok, so 'cb' in __sk_buff is 48 bytes in but 'cb' in sk_buff is 40 bytes
in -- I might just start getting annoyed over this, is there a reason
for the different offset?!


> Although it did puzzle me for a while figuring that out when I ran in
> to it.

Well, at least it means some people were aware of the problem and worked
around it in their own way -- what do you think of pulling instead?
I mean, we could just document that "really well" and provide the
get-offset function in some header that would be made include-able from
bpf.. But right now this isn't really the case.


FWIW now I have this version I also don't notice any performance change
with the pull on my example, it actually looks like the bpf load_word is
slightly slower than pull to access data that is not in the head, but
the noise level is pretty bad.


Thanks,
--
Dominique

2018-09-11 09:23:59

by Dominique Martinet

[permalink] [raw]
Subject: [PATCH v2] kcm: remove any offset before parsing messages

The current code assumes kcm users know they need to look for the
strparser offset within their bpf program, which is not documented
anywhere and examples laying around do not do.

The actual recv function does handle the offset well, so we can create a
temporary clone of the skb and pull that one up as required for parsing.

The pull itself has a cost if we are pulling beyond the head data,
measured to 2-3% latency in a noisy VM with a local client stressing
that path. The clone's impact seemed too small to measure.

This bug can be exhibited easily by implementing a "trivial" kcm parser
taking the first bytes as size, and on the client sending at least two
such packets in a single write().

Note that bpf sockmap has the same problem, both for parse and for recv,
so it would pulling twice or a real pull within the strparser logic if
anyone cares about that.

Signed-off-by: Dominique Martinet <[email protected]>
---

Hmm, while trying to benchmark this, I sometimes got hangs in
kcm_wait_data() for the last packet somehow?
The sender program was done (exited (zombie) so I assumed the sender
socket flushed), but the receiver was in kcm_wait_data in kcm_recvmsg
indicating it parsed a header but there was no skb to peek at?
But the sock is locked so this shouldn't be racy...

I can get it fairly often with this patch and small messages with an
offset, but I think it's just because the pull changes some timing - I
can't hit it with just the clone, and I can hit it with a pull without
clone as well.... And I don't see how pulling a cloned skb can impact
the original socket, but I'm a bit fuzzy on this.

(it's interesting that I didn't seem to hit this race when pulling in
strparser, that shouldn't be any different)


I'll look at that a bit more, but there have been no activity here for
a while and I don't have the energy to keep pushing in the strparser
direction, so take this patch more as a change of direction and a bit
as a 'bump' on the subject - I think it's important but I have too much
on my plate right now to keep pushing if there is no interest from the
authors.

net/kcm/kcmsock.c | 26 +++++++++++++++++++++++++-
1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/net/kcm/kcmsock.c b/net/kcm/kcmsock.c
index 571d824e4e24..36c438b95955 100644
--- a/net/kcm/kcmsock.c
+++ b/net/kcm/kcmsock.c
@@ -381,8 +381,32 @@ static int kcm_parse_func_strparser(struct strparser *strp, struct sk_buff *skb)
{
struct kcm_psock *psock = container_of(strp, struct kcm_psock, strp);
struct bpf_prog *prog = psock->bpf_prog;
+ struct sk_buff *clone_skb = NULL;
+ struct strp_msg *stm;
+ int rc;
+
+ stm = strp_msg(skb);
+ if (stm->offset) {
+ skb = clone_skb = skb_clone(skb, GFP_ATOMIC);
+ if (!clone_skb)
+ return -ENOMEM;
+
+ if (!pskb_pull(clone_skb, stm->offset)) {
+ rc = -ENOMEM;
+ goto out;
+ }
+
+ /* reset cloned skb's offset for bpf programs using it */
+ stm = strp_msg(clone_skb);
+ stm->offset = 0;
+ }
+
+ rc = (*prog->bpf_func)(skb, prog->insnsi);
+out:
+ if (clone_skb)
+ kfree_skb(clone_skb);

- return (*prog->bpf_func)(skb, prog->insnsi);
+ return rc;
}

static int kcm_read_sock_done(struct strparser *strp, int err)
--
2.17.1


2018-09-12 05:38:52

by Dominique Martinet

[permalink] [raw]
Subject: Re: [PATCH v2] kcm: remove any offset before parsing messages

Dominique Martinet wrote on Tue, Sep 11, 2018:
> Hmm, while trying to benchmark this, I sometimes got hangs in
> kcm_wait_data() for the last packet somehow?
> The sender program was done (exited (zombie) so I assumed the sender
> socket flushed), but the receiver was in kcm_wait_data in kcm_recvmsg
> indicating it parsed a header but there was no skb to peek at?
> But the sock is locked so this shouldn't be racy...
>
> I can get it fairly often with this patch and small messages with an
> offset, but I think it's just because the pull changes some timing - I
> can't hit it with just the clone, and I can hit it with a pull without
> clone as well.... And I don't see how pulling a cloned skb can impact
> the original socket, but I'm a bit fuzzy on this.

This is weird, I cannot reproduce at all without that pull, even if I
add another delay there instead of the pull, so it's not just timing...

I was confusing kcm_recvmsg and kcm_rcv_strparser but I still think
there is a race in kcm_wait_data between the peek and the wait. I have
confirmed on a hang that the sock's sk_receive_queue is not empty so
skb_peek would return something at this point, but it is waiting in
kcm_wait_data's sk_wait_data.. And no event would be coming at this
point since the sender is done.

kcm_wait_data does have the socket lock but the enqueuing counterpart
(kcm_queue_rcv_skb) is apparently called without lock most of the time
(at least, sk->sk_lock.owned is not set most of the times) ; but if I
add a lock_sock() here it can deadlock when a kfree_skb triggers
kcm_rcv_ready if that kfree_skb was done with the lock... ugh.

I thought the mux rx lock would be taken all the time but that appear
not to be the case either, and even if it were I don't see how to make
sk_wait_data with another lock in the first place.


So meh, I'm not sure why the pull messes up with this, I'm tempted to
say this is an old problem but I'm quite annoyed I do not seem to be
able to reproduce it.
I'm really out of time now so unless someone cares it'll wait for a few
more weeks.


(As an unrelated note, wrt to the patch, it might be nice to add a new
kcm socket option so users could say "my bpf program handles offsets,
don't bother with this")
--
Dominique

2018-09-18 01:45:25

by David Miller

[permalink] [raw]
Subject: Re: [PATCH v2] kcm: remove any offset before parsing messages

From: Dominique Martinet <[email protected]>
Date: Wed, 12 Sep 2018 07:36:42 +0200

> Dominique Martinet wrote on Tue, Sep 11, 2018:
>> Hmm, while trying to benchmark this, I sometimes got hangs in
>> kcm_wait_data() for the last packet somehow?
>> The sender program was done (exited (zombie) so I assumed the sender
>> socket flushed), but the receiver was in kcm_wait_data in kcm_recvmsg
>> indicating it parsed a header but there was no skb to peek at?
>> But the sock is locked so this shouldn't be racy...
>>
>> I can get it fairly often with this patch and small messages with an
>> offset, but I think it's just because the pull changes some timing - I
>> can't hit it with just the clone, and I can hit it with a pull without
>> clone as well.... And I don't see how pulling a cloned skb can impact
>> the original socket, but I'm a bit fuzzy on this.
>
> This is weird, I cannot reproduce at all without that pull, even if I
> add another delay there instead of the pull, so it's not just timing...

I really can't apply this patch until you resolve this.

It is weird, given your description, though...

2018-09-18 01:58:07

by Dominique Martinet

[permalink] [raw]
Subject: Re: [PATCH v2] kcm: remove any offset before parsing messages

David Miller wrote on Mon, Sep 17, 2018:
> From: Dominique Martinet <[email protected]>
> Date: Wed, 12 Sep 2018 07:36:42 +0200
>
> > Dominique Martinet wrote on Tue, Sep 11, 2018:
> >> Hmm, while trying to benchmark this, I sometimes got hangs in
> >> kcm_wait_data() for the last packet somehow?
> >> The sender program was done (exited (zombie) so I assumed the sender
> >> socket flushed), but the receiver was in kcm_wait_data in kcm_recvmsg
> >> indicating it parsed a header but there was no skb to peek at?
> >> But the sock is locked so this shouldn't be racy...
> >>
> >> I can get it fairly often with this patch and small messages with an
> >> offset, but I think it's just because the pull changes some timing - I
> >> can't hit it with just the clone, and I can hit it with a pull without
> >> clone as well.... And I don't see how pulling a cloned skb can impact
> >> the original socket, but I'm a bit fuzzy on this.
> >
> > This is weird, I cannot reproduce at all without that pull, even if I
> > add another delay there instead of the pull, so it's not just timing...
>
> I really can't apply this patch until you resolve this.
>
> It is weird, given your description, though...

Thanks for the reminder! I totally agree with you here and did not
expect this to be merged as it is (in retrospect, I probably should have
written something to that extent in the subject, "RFC"?)

I really don't have much time to give to that right now as I'm doing
this on my freetime, and the lack of reply has been rather demotivating
so it got pushed back a few times...
Given you did reply now I'll try to spend some time to figure that out
in the next couple of weeks but it might not make it for this cycle
depending on the number of rc we'll get and time you want this to soak
it -next.


(I can start by putting the pull back in netparser and try to reproduce,
it's really weird that I never got it to happen at the time...)

--
Dominique

2018-09-18 02:42:47

by David Miller

[permalink] [raw]
Subject: Re: [PATCH v2] kcm: remove any offset before parsing messages

From: Dominique Martinet <[email protected]>
Date: Tue, 18 Sep 2018 03:57:23 +0200

> Given you did reply now I'll try to spend some time to figure that out
> in the next couple of weeks but it might not make it for this cycle
> depending on the number of rc we'll get and time you want this to soak
> it -next.

Great.

Remind me, is there actually any way for the bpf programs run in this
situation to even _see_ strp_msg(skb)->offset at all?

There isn't right? And the alternate proposal was to add such a
facility, right?

Just trying to remember all of the context, maybe it's good
information to add to the commit message?

2018-09-18 02:46:38

by Dominique Martinet

[permalink] [raw]
Subject: Re: [PATCH v2] kcm: remove any offset before parsing messages

David Miller wrote on Mon, Sep 17, 2018:
> Remind me, is there actually any way for the bpf programs run in this
> situation to even _see_ strp_msg(skb)->offset at all?

No, they can see it, so it's possible to make a KCM program that works
right now if you are careful (I'm not sure why the offset within bpf is
different from the offset in the kernel though, it looks like the bpf
program skips the qos part of the control buffer)

> There isn't right? And the alternate proposal was to add such a
> facility, right?

The problem is that this isn't documented at all, and I could not find
any example doing that until Dave gave me one (I couldn't get it to work
because of the different offset).

The alternate proposal was to just document it, yes.

> Just trying to remember all of the context, maybe it's good
> information to add to the commit message?

Good idea, I'll add some more explanation there.

--
Dominique Martinet

2018-09-18 02:54:04

by David Miller

[permalink] [raw]
Subject: Re: [PATCH v2] kcm: remove any offset before parsing messages

From: Dominique Martinet <[email protected]>
Date: Tue, 18 Sep 2018 04:45:36 +0200

> David Miller wrote on Mon, Sep 17, 2018:
>> Remind me, is there actually any way for the bpf programs run in this
>> situation to even _see_ strp_msg(skb)->offset at all?
>
> No, they can see it, so it's possible to make a KCM program that works
> right now if you are careful (I'm not sure why the offset within bpf is
> different from the offset in the kernel though, it looks like the bpf
> program skips the qos part of the control buffer)

What helper is used in the BPF program to get this offset value?

(also good info to add to the commit message)

2018-09-18 03:00:29

by Dominique Martinet

[permalink] [raw]
Subject: Re: [PATCH v2] kcm: remove any offset before parsing messages

David Miller wrote on Mon, Sep 17, 2018:
> > No, they can see it, so it's possible to make a KCM program that works
> > right now if you are careful (I'm not sure why the offset within bpf is
> > different from the offset in the kernel though, it looks like the bpf
> > program skips the qos part of the control buffer)
>
> What helper is used in the BPF program to get this offset value?
>
> (also good info to add to the commit message)

Dave defined one himself ; for a simple protocol where the offset is in
the first four bytes of the message.

The whole bpf program could look like this:

------
struct kcm_rx_msg { int full_len; int offset; };
static inline struct kcm_rx_msg *kcm_rx_msg(struct __sk_buff *skb) {
return (struct kcm_rx_msg *)skb->cb;
}
int decode_framing(struct __sk_buff *skb) {
return load_word(skb, kcm_rx_msg(skb)->offset);
}
------

If we go towards documenting it, adding a helper would be useful yes;
buf if we pull that becomes unnecessary.
(I'll add the example program in the commit message anyway at your
suggestion)

--
Dominique Martinet

2018-10-31 02:59:50

by Dominique Martinet

[permalink] [raw]
Subject: Re: [PATCH v2] kcm: remove any offset before parsing messages

Dominique Martinet wrote on Tue, Sep 18, 2018:
> David Miller wrote on Mon, Sep 17, 2018:
> > From: Dominique Martinet <[email protected]>
> > Date: Wed, 12 Sep 2018 07:36:42 +0200
> > > Dominique Martinet wrote on Tue, Sep 11, 2018:
> > >> Hmm, while trying to benchmark this, I sometimes got hangs in
> > >> kcm_wait_data() for the last packet somehow?
> > >> The sender program was done (exited (zombie) so I assumed the sender
> > >> socket flushed), but the receiver was in kcm_wait_data in kcm_recvmsg
> > >> indicating it parsed a header but there was no skb to peek at?
> > >> But the sock is locked so this shouldn't be racy...
> > >>
> > >> I can get it fairly often with this patch and small messages with an
> > >> offset, but I think it's just because the pull changes some timing - I
> > >> can't hit it with just the clone, and I can hit it with a pull without
> > >> clone as well.... And I don't see how pulling a cloned skb can impact
> > >> the original socket, but I'm a bit fuzzy on this.
> > >
> > > This is weird, I cannot reproduce at all without that pull, even if I
> > > add another delay there instead of the pull, so it's not just timing...
> >
> > I really can't apply this patch until you resolve this.
> >
> > It is weird, given your description, though...
>
> Thanks for the reminder! I totally agree with you here and did not
> expect this to be merged as it is (in retrospect, I probably should have
> written something to that extent in the subject, "RFC"?)

Found the issue after some trouble reproducing on other VM, long story
short:
- I was blaming kcm_wait_data's sk_wait_data to wait while there was
something in sk->sk_receive_queue, but after adding a fake timeout and
some debug messages I can see the receive queue is empty.
However going back up from the kcm_sock to the kcm_mux to the kcm_psock,
there are things in the psock's socket's receive_queue... (If I'm
following the code correctly, that would be the underlying tcp socket)

- that psock's strparser contains some hints: the interrupted and
stopped bits are set. strp->interrupted looks like it's only set if
kcm_parse_msg returns something < 0. . .
And surely enough, the skb_pull returns NULL iff there's such a hang...!

I might be tempted to send a patch to strparser to add a pr_debug
message in strp_abort_strp...

Anyway, that probably explains I have no problem with bigger VM
(uselessly more memory available) or without KASAN (I guess there's
overhead?), but I'm sending at most 300k of data and the VM has a 1.5GB
of ram, so if there's an allocation failure there I think there's a
problem ! . . .

So, well, I'm not sure on the way forward. Adding a bpf helper and
document that kcm users should mind the offset?

Thanks,
--
Dominique

2019-02-15 02:28:15

by Tom Herbert

[permalink] [raw]
Subject: Re: [PATCH v2] kcm: remove any offset before parsing messages

On Thu, Feb 14, 2019 at 5:00 PM Dominique Martinet
<[email protected]> wrote:
>
> Dominique Martinet wrote on Wed, Oct 31, 2018:
> > Anyway, that probably explains I have no problem with bigger VM
> > (uselessly more memory available) or without KASAN (I guess there's
> > overhead?), but I'm sending at most 300k of data and the VM has a 1.5GB
> > of ram, so if there's an allocation failure there I think there's a
> > problem ! . . .
> >
> > So, well, I'm not sure on the way forward. Adding a bpf helper and
> > document that kcm users should mind the offset?
>
> bump on this - I had mostly forgotten about it but the nfs-ganesha
> community that could make use of KCM reminded me of my patch that's
> waiting for this.
>
> Summary for people coming back after four months:
> - kcm is great, but the bpf function that's supposed to be called for
> each packet does not automatically adjust the offset so that it can
> assume the skb starts with the packet it needs to look at
>
> - there is some workaround code that is far from obvious and
> undocumented, see the original thread[1]:
> [1] http://lkml.kernel.org/r/20180822183852.jnwlxnz54gbbf6po@davejwatson-mba.dhcp.thefacebook.com
>
> - my patch here tried to automatically pull the corresponding packet to
> the front, but apparently with KASAN can trigger out of memory
> behaviours on "small" VMs, so even if it doesn't seem to impact
> performance much without KASAN I don't think it's really ok to
> potentially hang the connection due to oom under severe conditions.
>
>
> The best alternative I see is adding a proper helper to get
> "kcm_rx_msg(skb)->offset" from bpf and document it so users aren't as
> lost as I have been; I'm not quite sure how/where to add such a helper
> though as I've barely looked at the bpf code until now, but should we go
> for that?

Dominique,

Thanks for looking into this.

I'd rather not complicate the bpf code for this. Can we just always do
an pskb_pull after skb_clone?

Tom

>
>
> (it really feels wrong to me that some random person who just started by
> trying to use kcm has to put this much effort to keep the ball rolling,
> if nobody cares about kcm I'm also open to have it removed completely
> despite the obvious performance gain I benchmarked for ganesha[2] ;
> barely maintained feature is worse than no feature)
>
> [2] https://review.gerrithub.io/c/ffilz/nfs-ganesha/+/421314
>
> Thanks,
> --
> Dominique

2019-02-15 02:28:36

by Dominique Martinet

[permalink] [raw]
Subject: Re: [PATCH v2] kcm: remove any offset before parsing messages

Tom Herbert wrote on Thu, Feb 14, 2019:
> > The best alternative I see is adding a proper helper to get
> > "kcm_rx_msg(skb)->offset" from bpf and document it so users aren't as
> > lost as I have been; I'm not quite sure how/where to add such a helper
> > though as I've barely looked at the bpf code until now, but should we go
> > for that?
>
> I'd rather not complicate the bpf code for this. Can we just always do
> an pskb_pull after skb_clone?

Which skb_clone are you thinking of?
If you're referring to the one in strparser, that was my very first
approach here[1], but Dordon shot it down saying that this is not an
strparser bug but a kcm bug since there are ways for users to properly
get the offset and use it -- and the ktls code does it right.

Frankly, my opinion still is that it'd be better in strparser because
there also is some bad use in bpf sockmap (they never look at the offset
either, while kcm uses it for rcv but not for parse), but looking at it
from an optimal performance point of view I agree the user can make
better decision than strparser so I understand where he comes from.


This second patch[2] (the current thread) now does an extra clone if
there is an offset, but the problem really isn't in the clone but the
pull itself that can fail and return NULL when there is memory pressure.
For some reason I hadn't been able to reproduce that behaviour with
strparser doing the pull, but I assume it would also be possible to hit
in extreme situations, I'm not sure...

So anyway, we basically have three choices that I can see:
- push harder on strparser and go back to my first patch ; it's simple
and makes using strparser easier/safer but has a small overhead for
ktls, which uses the current strparser implementation correctly.
I hadn't been able to get this to fail with KASAN last time I tried but
I assume it should still be possible somehow.

- the current patch, that I could only get to fail with KASAN, but does
complexify kcm a bit; this also does not fix bpf sockmap at all.
Would still require to fix the hang, so make strparser retry a few times
if strp->cb.parse_msg failed maybe? Or at least get the error back to
userspace somehow.

- my last suggestion to document the kcm behaviour, and if possible add
a bpf helper to make proper usage easier ; this will make kcm harder to
use on end users but it's better than not being documented and seeing
random unappropriate lengths being parsed.



[1] first strparser patch
http://lkml.kernel.org/r/[email protected]
[2] current thread's patch
http://lkml.kernel.org/r/[email protected]


Thanks,
--
Dominique

2019-02-15 02:28:39

by Dominique Martinet

[permalink] [raw]
Subject: Re: [PATCH v2] kcm: remove any offset before parsing messages

Dominique Martinet wrote on Wed, Oct 31, 2018:
> Anyway, that probably explains I have no problem with bigger VM
> (uselessly more memory available) or without KASAN (I guess there's
> overhead?), but I'm sending at most 300k of data and the VM has a 1.5GB
> of ram, so if there's an allocation failure there I think there's a
> problem ! . . .
>
> So, well, I'm not sure on the way forward. Adding a bpf helper and
> document that kcm users should mind the offset?

bump on this - I had mostly forgotten about it but the nfs-ganesha
community that could make use of KCM reminded me of my patch that's
waiting for this.

Summary for people coming back after four months:
- kcm is great, but the bpf function that's supposed to be called for
each packet does not automatically adjust the offset so that it can
assume the skb starts with the packet it needs to look at

- there is some workaround code that is far from obvious and
undocumented, see the original thread[1]:
[1] http://lkml.kernel.org/r/20180822183852.jnwlxnz54gbbf6po@davejwatson-mba.dhcp.thefacebook.com

- my patch here tried to automatically pull the corresponding packet to
the front, but apparently with KASAN can trigger out of memory
behaviours on "small" VMs, so even if it doesn't seem to impact
performance much without KASAN I don't think it's really ok to
potentially hang the connection due to oom under severe conditions.


The best alternative I see is adding a proper helper to get
"kcm_rx_msg(skb)->offset" from bpf and document it so users aren't as
lost as I have been; I'm not quite sure how/where to add such a helper
though as I've barely looked at the bpf code until now, but should we go
for that?


(it really feels wrong to me that some random person who just started by
trying to use kcm has to put this much effort to keep the ball rolling,
if nobody cares about kcm I'm also open to have it removed completely
despite the obvious performance gain I benchmarked for ganesha[2] ;
barely maintained feature is worse than no feature)

[2] https://review.gerrithub.io/c/ffilz/nfs-ganesha/+/421314

Thanks,
--
Dominique

2019-02-15 02:49:00

by Tom Herbert

[permalink] [raw]
Subject: Re: [PATCH v2] kcm: remove any offset before parsing messages

On Thu, Feb 14, 2019 at 5:57 PM Dominique Martinet
<[email protected]> wrote:
>
> Tom Herbert wrote on Thu, Feb 14, 2019:
> > > The best alternative I see is adding a proper helper to get
> > > "kcm_rx_msg(skb)->offset" from bpf and document it so users aren't as
> > > lost as I have been; I'm not quite sure how/where to add such a helper
> > > though as I've barely looked at the bpf code until now, but should we go
> > > for that?
> >
> > I'd rather not complicate the bpf code for this. Can we just always do
> > an pskb_pull after skb_clone?
>
> Which skb_clone are you thinking of?
> If you're referring to the one in strparser, that was my very first
> approach here[1], but Dordon shot it down saying that this is not an
> strparser bug but a kcm bug since there are ways for users to properly
> get the offset and use it -- and the ktls code does it right.
>
> Frankly, my opinion still is that it'd be better in strparser because
> there also is some bad use in bpf sockmap (they never look at the offset
> either, while kcm uses it for rcv but not for parse), but looking at it
> from an optimal performance point of view I agree the user can make
> better decision than strparser so I understand where he comes from.
>
>
> This second patch[2] (the current thread) now does an extra clone if
> there is an offset, but the problem really isn't in the clone but the
> pull itself that can fail and return NULL when there is memory pressure.
> For some reason I hadn't been able to reproduce that behaviour with
> strparser doing the pull, but I assume it would also be possible to hit
> in extreme situations, I'm not sure...
>
This option looks the best to me, at least as a way to fix the issue
without requiring a change to the API. If the pull fails, doesn't that
just mean that the parser fails? Is there some behavior with this
patch that is not being handled gracefully?

Thanks,
Tom

> So anyway, we basically have three choices that I can see:
> - push harder on strparser and go back to my first patch ; it's simple
> and makes using strparser easier/safer but has a small overhead for
> ktls, which uses the current strparser implementation correctly.
> I hadn't been able to get this to fail with KASAN last time I tried but
> I assume it should still be possible somehow.
>
> - the current patch, that I could only get to fail with KASAN, but does
> complexify kcm a bit; this also does not fix bpf sockmap at all.
> Would still require to fix the hang, so make strparser retry a few times
> if strp->cb.parse_msg failed maybe? Or at least get the error back to
> userspace somehow.
>
> - my last suggestion to document the kcm behaviour, and if possible add
> a bpf helper to make proper usage easier ; this will make kcm harder to
> use on end users but it's better than not being documented and seeing
> random unappropriate lengths being parsed.
>
>
>
> [1] first strparser patch
> http://lkml.kernel.org/r/[email protected]
> [2] current thread's patch
> http://lkml.kernel.org/r/[email protected]
>
>
> Thanks,
> --
> Dominique

2019-02-15 14:50:14

by Dominique Martinet

[permalink] [raw]
Subject: Re: [PATCH v2] kcm: remove any offset before parsing messages

Tom Herbert wrote on Thu, Feb 14, 2019:
> > This second patch[2] (the current thread) now does an extra clone if
> > there is an offset, but the problem really isn't in the clone but the
> > pull itself that can fail and return NULL when there is memory pressure.
> > For some reason I hadn't been able to reproduce that behaviour with
> > strparser doing the pull, but I assume it would also be possible to hit
> > in extreme situations, I'm not sure...
>
> This option looks the best to me, at least as a way to fix the issue
> without requiring a change to the API. If the pull fails, doesn't that
> just mean that the parser fails? Is there some behavior with this
> patch that is not being handled gracefully?

Yes, the parser fails with -ENOMEM ; that is not handled gracefully at
all: from a user point of view, the connection just hangs (recvmsg never
returns), without so much as a message in dmesg either.

It took me a while to figure out what failed exactly as I did indeed
expect strparser/kcm to handle that better, but ultimately as things
stand if the parser fails it calls strp_parser_err() with the error
which ends up in strp_abort_strp that should call
sk->sk_error_report(sk) but in kcm case sk is the csk and I guess
failing csk does not make a pending recv on the kcm sock to fail...

I'm not sure whether propagating the error to the upper socket is the
right thing to do, kcm is meant to be able to work with multiple
underlying sockets so I feel we must be cautious about that, but
strparser might be able to retry somehow.
This is what I said below:
> > [,,,]
> > - the current patch, that I could only get to fail with KASAN, but does
> > complexify kcm a bit; this also does not fix bpf sockmap at all.
> > Would still require to fix the hang, so make strparser retry a few times
> > if strp->cb.parse_msg failed maybe? Or at least get the error back to
> > userspace somehow.

Thanks,
--
Dominique

2019-02-15 14:51:51

by Tom Herbert

[permalink] [raw]
Subject: Re: [PATCH v2] kcm: remove any offset before parsing messages

On Thu, Feb 14, 2019 at 7:31 PM Dominique Martinet
<[email protected]> wrote:
>
> Tom Herbert wrote on Thu, Feb 14, 2019:
> > > This second patch[2] (the current thread) now does an extra clone if
> > > there is an offset, but the problem really isn't in the clone but the
> > > pull itself that can fail and return NULL when there is memory pressure.
> > > For some reason I hadn't been able to reproduce that behaviour with
> > > strparser doing the pull, but I assume it would also be possible to hit
> > > in extreme situations, I'm not sure...
> >
> > This option looks the best to me, at least as a way to fix the issue
> > without requiring a change to the API. If the pull fails, doesn't that
> > just mean that the parser fails? Is there some behavior with this
> > patch that is not being handled gracefully?
>
> Yes, the parser fails with -ENOMEM ; that is not handled gracefully at
> all: from a user point of view, the connection just hangs (recvmsg never
> returns), without so much as a message in dmesg either.
>
Dominique,

That's by design. Here is the description in kcm.txt:

"When a TCP socket is attached to a KCM multiplexor data ready
(POLLIN) and write space available (POLLOUT) events are handled by the
multiplexor. If there is a state change (disconnection) or other error
on a TCP socket, an error is posted on the TCP socket so that a
POLLERR event happens and KCM discontinues using the socket. When the
application gets the error notification for a TCP socket, it should
unattach the socket from KCM and then handle the error condition (the
typical response is to close the socket and create a new connection if
necessary)."

> It took me a while to figure out what failed exactly as I did indeed
> expect strparser/kcm to handle that better, but ultimately as things
> stand if the parser fails it calls strp_parser_err() with the error
> which ends up in strp_abort_strp that should call
> sk->sk_error_report(sk) but in kcm case sk is the csk and I guess
> failing csk does not make a pending recv on the kcm sock to fail...
>
> I'm not sure whether propagating the error to the upper socket is the
> right thing to do, kcm is meant to be able to work with multiple
> underlying sockets so I feel we must be cautious about that, but

Right, that's the motivation for the design.

> strparser might be able to retry somehow.

We could retry on -ENOMEM since it is potentially a transient error,
but generally for errors (like connection is simply broken) it seems
like it's working as intended. I suppose we could add a socket option
to fail the KCM socket when all attached lower sockets have failed,
but I not sure that would be a significant benefit for additional
complexity.

> This is what I said below:
> > > [,,,]
> > > - the current patch, that I could only get to fail with KASAN, but does
> > > complexify kcm a bit; this also does not fix bpf sockmap at all.
> > > Would still require to fix the hang, so make strparser retry a few times
> > > if strp->cb.parse_msg failed maybe? Or at least get the error back to
> > > userspace somehow.

The error should be getting to userspace via the TCP socket.

Tom

>
> Thanks,
> --
> Dominique

2019-02-15 14:56:50

by Dominique Martinet

[permalink] [raw]
Subject: Re: [PATCH v2] kcm: remove any offset before parsing messages

Tom Herbert wrote on Thu, Feb 14, 2019:
> On Thu, Feb 14, 2019 at 7:31 PM Dominique Martinet
> <[email protected]> wrote:
> > Yes, the parser fails with -ENOMEM ; that is not handled gracefully at
> > all: from a user point of view, the connection just hangs (recvmsg never
> > returns), without so much as a message in dmesg either.
>
> That's by design. Here is the description in kcm.txt:
>
> "When a TCP socket is attached to a KCM multiplexor data ready
> (POLLIN) and write space available (POLLOUT) events are handled by the
> multiplexor. If there is a state change (disconnection) or other error
> on a TCP socket, an error is posted on the TCP socket so that a
> POLLERR event happens and KCM discontinues using the socket. When the
> application gets the error notification for a TCP socket, it should
> unattach the socket from KCM and then handle the error condition (the
> typical response is to close the socket and create a new connection if
> necessary)."

Sigh, that's what I get for relying on pieces of code found on the
internet.

It does make "trivial" kcm sockets difficult to use though, the old
ganesha code I adapted to kcm was the horrible (naive?) kind spawning
one thread per connection and just waiting on read(), so making it just
create a kcm socket from the tcp one and wait on recvmsg() until an
error pops up does not seem an option.

That being said I'm a bit confused, I thought part of the point of kcm
was the multiplexing so a simple server could just keep attaching tcp
sockets to a single kcm socket and only have a single trivial loop
reading from the kcm socket ; but I guess there's no harm in also
looking for POLLERR on the tcp sockets... It would still need to close
them once clients disconnect I guess, this really only affects my naive
server.

> > strparser might be able to retry somehow.
>
> We could retry on -ENOMEM since it is potentially a transient error,

Yes, I think we should aim to retry on -ENOMEM; I agree all errors are
not meant to be retried on but there already are different cases based
on what the parse cb returned; but that can be a different patch.

> but generally for errors (like connection is simply broken) it seems
> like it's working as intended. I suppose we could add a socket option
> to fail the KCM socket when all attached lower sockets have failed,
> but I not sure that would be a significant benefit for additional
> complexity.

Right, I agree it's probably not worth doing, now I'm aware of this I
can probably motivate myself to change this old code to use epoll
properly.

I think it could make sense to have this option for simpler programs,
but as things stand I guess we can require kcm users to do this much,
thanks for the explanation, and sorry for having failed to notice it.



With all that said I guess my patch should work correctly then, I'll try
to find some time to check the error does come back up the tcp socket in
my reproducer but I have no reason to believe it doesn't.

I'd like to see some retry on ENOMEM before this is merged though, so
while I'm there I'll resend this with a second patch doing that
retry,.. I think just not setting strp->interrupted and not reporting
the error up might be enough? Will have to try either way.



Thanks for the feedback,
--
Dominique

2019-02-20 04:13:02

by Dominique Martinet

[permalink] [raw]
Subject: Re: [PATCH v2] kcm: remove any offset before parsing messages

Dominique Martinet wrote on Fri, Feb 15, 2019:
> With all that said I guess my patch should work correctly then, I'll try
> to find some time to check the error does come back up the tcp socket in
> my reproducer but I have no reason to believe it doesn't.

Ok, so I can confirm this part - the 'csock' does come back up with
POLLERR if the parse function returns ENOMEM in the current code.

It also comes back up with POLLERR when the remote side closes the
connection, which is expected, but I'm having a very hard time
understanding how an application is supposed to deal with these
POLLERR after having read the documentation and a bit of
experimentation.
I'm not sure how much it would matter for real life (if the other end
closes the connection most servers would not care about what they said
just before closing, but I can imagine some clients doing that in real
life e.g. a POST request they don't care if it succeeds or not)...
My test program works like this:
- client connects, sends 10k messages and close()s the socket
- server loops recving and close()s after 10k messages; it used to be
recvmsg() directly but it's now using poll then recvmsg.


When the client closes the socket, some messages are obviously still "in
flight", and the server will recv a POLLERR notification on the csock at
some point with many messages left.
The documentation says to unattach the csock when you get POLLER. If I
do that, the kcm socket will no longer give me any message, so all the
messages still in flight at the time are lost.

If I just ignore the csock like I used to, all the messages do come just
fine, but as said previously on a real error this will just make recvmsg
or the polling hang forever and I see no way to distinguish a "real"
error vs. a connection shut down from the remote side with data left in
the pipe.
I thought of checking POLLERR on csock and POLLIN not set on kcmsock,
but even that seems to happen fairly regularily - the kcm sock hasn't
been filled up, it's still reading from the csock.


On the other hand, checking POLLIN on the csock does say there is still
data left, so I know there is data left on the csock, but this is also
the case on a real error (e.g. if parser returns -ENOMEM)
... And this made me try to read from the csock after detaching it and I
can resume manual tcp parsing for a few messages until read() fails with
EPROTO ?! and I cannot seem to be able to get anything out of attaching
it back to kcm (for e.g. an ENOMEM error that was transient)...



I'm honestly not sure how the POLLERR notification mechanism works but I
think it would be much easier to use KCM if we could somehow delay that
error until KCM is done feeding from the csock (when netparser really
stops reading from it like on real error, e.g. abort callback maybe?)
I think it's fine if the csock is closed before the kcm sock message is
read, but we should not lose messages like this.



> I'd like to see some retry on ENOMEM before this is merged though, so
> while I'm there I'll resend this with a second patch doing that
> retry,.. I think just not setting strp->interrupted and not reporting
> the error up might be enough? Will have to try either way.

I also tried playing with that without much success.
I had assumed just not calling strp_parser_err() (which calls the
abort_parser cb) would be enough, eventually calling strp_start_timer()
like the !len case, but no can do.
With that said, returning 0 from the parse function also raises POLLERR
on the csock and hangs netparser, so things aren't that simple...


I could use a bit of help again.

Thanks,
--
Dominique

2019-02-20 16:20:26

by Tom Herbert

[permalink] [raw]
Subject: Re: [PATCH v2] kcm: remove any offset before parsing messages

On Tue, Feb 19, 2019 at 8:12 PM Dominique Martinet
<[email protected]> wrote:
>
> Dominique Martinet wrote on Fri, Feb 15, 2019:
> > With all that said I guess my patch should work correctly then, I'll try
> > to find some time to check the error does come back up the tcp socket in
> > my reproducer but I have no reason to believe it doesn't.
>
> Ok, so I can confirm this part - the 'csock' does come back up with
> POLLERR if the parse function returns ENOMEM in the current code.
>
Good.

> It also comes back up with POLLERR when the remote side closes the
> connection, which is expected, but I'm having a very hard time
> understanding how an application is supposed to deal with these
> POLLERR after having read the documentation and a bit of
> experimentation.
> I'm not sure how much it would matter for real life (if the other end
> closes the connection most servers would not care about what they said
> just before closing, but I can imagine some clients doing that in real
> life e.g. a POST request they don't care if it succeeds or not)...
> My test program works like this:
> - client connects, sends 10k messages and close()s the socket
> - server loops recving and close()s after 10k messages; it used to be
> recvmsg() directly but it's now using poll then recvmsg.
>
>
> When the client closes the socket, some messages are obviously still "in
> flight", and the server will recv a POLLERR notification on the csock at
> some point with many messages left.
> The documentation says to unattach the csock when you get POLLER. If I
> do that, the kcm socket will no longer give me any message, so all the
> messages still in flight at the time are lost.

So basically it sounds like you're interested in supporting TCP
connections that are half closed. I believe that the error in half
closed is EPIPE, so if the TCP socket returns that it can be ignored
and the socket can continue being attached and used to send data.

Another possibility is to add some linger semantics to an attached
socket. For instance, a large message might be sent so that part of
the messge is queued in TCP and part is queued in the KCM socket.
Unattach would probably break that message. We probably want to linger
option similar to SO_LINGER (or maybe just use the option on the TCP
socket) that means don't complete the detach until any message being
transmitted on the lower socket has been queued.
>
> If I just ignore the csock like I used to, all the messages do come just
> fine, but as said previously on a real error this will just make recvmsg
> or the polling hang forever and I see no way to distinguish a "real"
> error vs. a connection shut down from the remote side with data left in
> the pipe.
> I thought of checking POLLERR on csock and POLLIN not set on kcmsock,
> but even that seems to happen fairly regularily - the kcm sock hasn't
> been filled up, it's still reading from the csock.
>
>
> On the other hand, checking POLLIN on the csock does say there is still
> data left, so I know there is data left on the csock, but this is also
> the case on a real error (e.g. if parser returns -ENOMEM)
> ... And this made me try to read from the csock after detaching it and I
> can resume manual tcp parsing for a few messages until read() fails with
> EPROTO ?! and I cannot seem to be able to get anything out of attaching
> it back to kcm (for e.g. an ENOMEM error that was transient)...
>
>
>
> I'm honestly not sure how the POLLERR notification mechanism works but I
> think it would be much easier to use KCM if we could somehow delay that
> error until KCM is done feeding from the csock (when netparser really
> stops reading from it like on real error, e.g. abort callback maybe?)
> I think it's fine if the csock is closed before the kcm sock message is
> read, but we should not lose messages like this.

Sounds like linger semantics is needed then.

>
>
>
> > I'd like to see some retry on ENOMEM before this is merged though, so
> > while I'm there I'll resend this with a second patch doing that
> > retry,.. I think just not setting strp->interrupted and not reporting
> > the error up might be enough? Will have to try either way.
>
> I also tried playing with that without much success.
> I had assumed just not calling strp_parser_err() (which calls the
> abort_parser cb) would be enough, eventually calling strp_start_timer()
> like the !len case, but no can do.

I think you need to ignore the ENOMEM and have a timer or other
callback to retry the operation in the future.

> With that said, returning 0 from the parse function also raises POLLERR
> on the csock and hangs netparser, so things aren't that simple...

Can you point to where this is happening. If the parse_msg callback
returns 0 that is suppose to indicate that more bytes are needed.

>

>
> I could use a bit of help again.
>
> Thanks,
> --
> Dominique

2019-02-21 08:23:40

by Dominique Martinet

[permalink] [raw]
Subject: Re: [PATCH v2] kcm: remove any offset before parsing messages

Tom Herbert wrote on Wed, Feb 20, 2019:
> > When the client closes the socket, some messages are obviously still "in
> > flight", and the server will recv a POLLERR notification on the csock at
> > some point with many messages left.
> > The documentation says to unattach the csock when you get POLLER. If I
> > do that, the kcm socket will no longer give me any message, so all the
> > messages still in flight at the time are lost.
>
> So basically it sounds like you're interested in supporting TCP
> connections that are half closed. I believe that the error in half
> closed is EPIPE, so if the TCP socket returns that it can be ignored
> and the socket can continue being attached and used to send data.

Did you mean 'can continue being attached and used to receive data'?

I can confirm getsockopt with SO_ERROR gets me EPIPE, but I don't see
how to efficiently ignore EPIPE until POLLIN gets unset -- polling on
both the csock and kcm socket will do many needless wakeups on only the
csock from what I can see, so I'd need some holdoff timer or something.
I guess it's possible though.

> Another possibility is to add some linger semantics to an attached
> socket. For instance, a large message might be sent so that part of
> the messge is queued in TCP and part is queued in the KCM socket.
> Unattach would probably break that message. We probably want to linger
> option similar to SO_LINGER (or maybe just use the option on the TCP
> socket) that means don't complete the detach until any message being
> transmitted on the lower socket has been queued.

That would certainly work, even if non-obvious from a user standpoint.


> > > I'd like to see some retry on ENOMEM before this is merged though, so
> > > while I'm there I'll resend this with a second patch doing that
> > > retry,.. I think just not setting strp->interrupted and not reporting
> > > the error up might be enough? Will have to try either way.
> >
> > I also tried playing with that without much success.
> > I had assumed just not calling strp_parser_err() (which calls the
> > abort_parser cb) would be enough, eventually calling strp_start_timer()
> > like the !len case, but no can do.
>
> I think you need to ignore the ENOMEM and have a timer or other
> callback to retry the operation in the future.

Yes, that's what I had intended to try; basically just break out and
schedule timer as said above.

After a bit more debugging, this part works (__strp_recv() is called
again); but the next packet that is treated properly is rejected because
by the time __strp_recv() was called again a new skb was read and the
length isn't large enough to go all the way into the new packet, so this
test fails:
} else if (len <= (ssize_t)head->len -
skb->len - stm->strp.offset) {
/* Length must be into new skb (and also
* greater than zero)
*/
STRP_STATS_INCR(strp->stats.bad_hdr_len);
strp_parser_err(strp, -EPROTO, desc);

So I need to figure a way to say "call this function again without
reading more data" somehow, or make this check more lax e.g. accept any
len > 0 after a retry maybe...
Removing that branch altogether seems to work at least but I'm not sure
we'd want to?
(grmbl at this slow VM and strparser not being possible to enable as a
module, it takes ages to test)


> > With that said, returning 0 from the parse function also raises POLLERR
> > on the csock and hangs netparser, so things aren't that simple...
>
> Can you point to where this is happening. If the parse_msg callback
> returns 0 that is suppose to indicate that more bytes are needed.

I just blindly returned 0 "from time to time" in the kcm parser
function, but looking at above it was failing on the same check.
This somewhat makes sense for this one to fail here if a new packet was
read, no sane parser function should give a length smaller than what
they require to determine the length.


Thanks,
--
Dominique

2019-02-22 19:25:40

by Tom Herbert

[permalink] [raw]
Subject: Re: [PATCH v2] kcm: remove any offset before parsing messages

On Thu, Feb 21, 2019 at 12:22 AM Dominique Martinet
<[email protected]> wrote:
>
> Tom Herbert wrote on Wed, Feb 20, 2019:
> > > When the client closes the socket, some messages are obviously still "in
> > > flight", and the server will recv a POLLERR notification on the csock at
> > > some point with many messages left.
> > > The documentation says to unattach the csock when you get POLLER. If I
> > > do that, the kcm socket will no longer give me any message, so all the
> > > messages still in flight at the time are lost.
> >
> > So basically it sounds like you're interested in supporting TCP
> > connections that are half closed. I believe that the error in half
> > closed is EPIPE, so if the TCP socket returns that it can be ignored
> > and the socket can continue being attached and used to send data.
>
> Did you mean 'can continue being attached and used to receive data'?
>
No, I meant shutdown on receive side when FIN is receved. TX is still
allowed to drain an queued bytes. To support shutdown on the TX side
would require additional logic since we need to effectively detach the
transmit path but retain the receive path. I'm not sure this is a
compelling use case to support.

> I can confirm getsockopt with SO_ERROR gets me EPIPE, but I don't see
> how to efficiently ignore EPIPE until POLLIN gets unset -- polling on
> both the csock and kcm socket will do many needless wakeups on only the
> csock from what I can see, so I'd need some holdoff timer or something.
> I guess it's possible though.

We might need to clear the error somehow. May a read of zero bytes?

>
> > Another possibility is to add some linger semantics to an attached
> > socket. For instance, a large message might be sent so that part of
> > the messge is queued in TCP and part is queued in the KCM socket.
> > Unattach would probably break that message. We probably want to linger
> > option similar to SO_LINGER (or maybe just use the option on the TCP
> > socket) that means don't complete the detach until any message being
> > transmitted on the lower socket has been queued.
>
> That would certainly work, even if non-obvious from a user standpoint.
>
>
> > > > I'd like to see some retry on ENOMEM before this is merged though, so
> > > > while I'm there I'll resend this with a second patch doing that
> > > > retry,.. I think just not setting strp->interrupted and not reporting
> > > > the error up might be enough? Will have to try either way.
> > >
> > > I also tried playing with that without much success.
> > > I had assumed just not calling strp_parser_err() (which calls the
> > > abort_parser cb) would be enough, eventually calling strp_start_timer()
> > > like the !len case, but no can do.
> >
> > I think you need to ignore the ENOMEM and have a timer or other
> > callback to retry the operation in the future.
>
> Yes, that's what I had intended to try; basically just break out and
> schedule timer as said above.

You might want to look at some other systems, I don't recall if
there's a hook that can be used for when memory pressure is relieved.

>
> After a bit more debugging, this part works (__strp_recv() is called
> again); but the next packet that is treated properly is rejected because
> by the time __strp_recv() was called again a new skb was read and the
> length isn't large enough to go all the way into the new packet, so this
> test fails:
> } else if (len <= (ssize_t)head->len -
> skb->len - stm->strp.offset) {
> /* Length must be into new skb (and also
> * greater than zero)
> */
> STRP_STATS_INCR(strp->stats.bad_hdr_len);
> strp_parser_err(strp, -EPROTO, desc);
>
> So I need to figure a way to say "call this function again without
> reading more data" somehow, or make this check more lax e.g. accept any
> len > 0 after a retry maybe...
> Removing that branch altogether seems to work at least but I'm not sure
> we'd want to?

I like the check since it's conservative and covers the normal case.
Maybe just need some more logic?
> (grmbl at this slow VM and strparser not being possible to enable as a
> module, it takes ages to test)
>
>
> > > With that said, returning 0 from the parse function also raises POLLERR
> > > on the csock and hangs netparser, so things aren't that simple...
> >
> > Can you point to where this is happening. If the parse_msg callback
> > returns 0 that is suppose to indicate that more bytes are needed.
>
> I just blindly returned 0 "from time to time" in the kcm parser
> function, but looking at above it was failing on the same check.
> This somewhat makes sense for this one to fail here if a new packet was
> read, no sane parser function should give a length smaller than what
> they require to determine the length.
>
>
> Thanks,
> --
> Dominique

2019-02-22 20:28:57

by Dominique Martinet

[permalink] [raw]
Subject: Re: [PATCH v2] kcm: remove any offset before parsing messages

Tom Herbert wrote on Fri, Feb 22, 2019:
> > > So basically it sounds like you're interested in supporting TCP
> > > connections that are half closed. I believe that the error in half
> > > closed is EPIPE, so if the TCP socket returns that it can be ignored
> > > and the socket can continue being attached and used to send data.
> >
> > Did you mean 'can continue being attached and used to receive data'?
>
> No, I meant shutdown on receive side when FIN is receved. TX is still
> allowed to drain an queued bytes. To support shutdown on the TX side
> would require additional logic since we need to effectively detach the
> transmit path but retain the receive path. I'm not sure this is a
> compelling use case to support.

Hm, it must be a matter of how we see thing but from what I understand
it's exactly the other way around. The remote closed the connection, so
trying to send anything would just yield a RST, so TX doesn't make
sense.
On the other hand, anything that had been sent by the remote before the
FIN and is on the local side's memory should still be receivable.

When you think about it as a TCP stream it's really weird: data coming,
data coming, data coming, FIN received.
But in the networking stack that received FIN short-circuits all the
data that was left around and immediately raises an EPIPE error.

I don't see what makes this FIN packet so great that it should be
processed before the data; we should only see that EPIPE when we're
done reading the data before it or trying to send something.

I'll check tomorrow/next week but I'm pretty sure the packets before
that have been ack'd at a tcp level as well, so losing them in the
application level is really unexpected.

> > I can confirm getsockopt with SO_ERROR gets me EPIPE, but I don't see
> > how to efficiently ignore EPIPE until POLLIN gets unset -- polling on
> > both the csock and kcm socket will do many needless wakeups on only the
> > csock from what I can see, so I'd need some holdoff timer or something.
> > I guess it's possible though.
>
> We might need to clear the error somehow. May a read of zero bytes?

Can try.

> > After a bit more debugging, this part works (__strp_recv() is called
> > again); but the next packet that is treated properly is rejected because
> > by the time __strp_recv() was called again a new skb was read and the
> > length isn't large enough to go all the way into the new packet, so this
> > test fails:
> > } else if (len <= (ssize_t)head->len -
> > skb->len - stm->strp.offset) {
> > /* Length must be into new skb (and also
> > * greater than zero)
> > */
> > STRP_STATS_INCR(strp->stats.bad_hdr_len);
> > strp_parser_err(strp, -EPROTO, desc);
> >
> > So I need to figure a way to say "call this function again without
> > reading more data" somehow, or make this check more lax e.g. accept any
> > len > 0 after a retry maybe...
> > Removing that branch altogether seems to work at least but I'm not sure
> > we'd want to?
>
> I like the check since it's conservative and covers the normal case.
> Maybe just need some more logic?

I can add a "retrying" state and not fail here if we ewre retrying for
whatever reason perhaps...
But I'm starting to wonder how this would work if my client didn't keep
on sending data, I'll try to fail on the last client's packet and see
how __strp_recv is called again through the timer, with the same skb
perhaps?

--
Dominique

2019-02-22 21:02:38

by Tom Herbert

[permalink] [raw]
Subject: Re: [PATCH v2] kcm: remove any offset before parsing messages

On Fri, Feb 22, 2019 at 12:28 PM Dominique Martinet
<[email protected]> wrote:
>
> Tom Herbert wrote on Fri, Feb 22, 2019:
> > > > So basically it sounds like you're interested in supporting TCP
> > > > connections that are half closed. I believe that the error in half
> > > > closed is EPIPE, so if the TCP socket returns that it can be ignored
> > > > and the socket can continue being attached and used to send data.
> > >
> > > Did you mean 'can continue being attached and used to receive data'?
> >
> > No, I meant shutdown on receive side when FIN is receved. TX is still
> > allowed to drain an queued bytes. To support shutdown on the TX side
> > would require additional logic since we need to effectively detach the
> > transmit path but retain the receive path. I'm not sure this is a
> > compelling use case to support.
>
Dominque,

> Hm, it must be a matter of how we see thing but from what I understand
> it's exactly the other way around. The remote closed the connection, so
> trying to send anything would just yield a RST, so TX doesn't make
> sense.
> On the other hand, anything that had been sent by the remote before the
> FIN and is on the local side's memory should still be receivable.

That's right, any data sent before the FIN can be received. After the
FIN, there is not more data to received. But, the FIN doesn't say
anything about the reverse path. For instance, if the peer did
shutdown(SHUT_WR), then it sent the FIN so it no longer transmits on
the connection, but still can receive data.

>
> When you think about it as a TCP stream it's really weird: data coming,
> data coming, data coming, FIN received.
> But in the networking stack that received FIN short-circuits all the
> data that was left around and immediately raises an EPIPE error.
>
Right, it doesn't help that most people automatically assume a
received FIN means the connection is closed! (although in practice
that assumption works pretty well for most applications).

> I don't see what makes this FIN packet so great that it should be
> processed before the data; we should only see that EPIPE when we're
> done reading the data before it or trying to send something.

It's not supposed to be. If your seeing a problem, it might because
state_change is processed before the received data is drained. If
plain recvmsg were being used, zero bytes is on returned after all
outstanding received data on the socket has been read. We might need
some additional logic in KCM to handle this.

>
> I'll check tomorrow/next week but I'm pretty sure the packets before
> that have been ack'd at a tcp level as well, so losing them in the
> application level is really unexpected.
>
> > > I can confirm getsockopt with SO_ERROR gets me EPIPE, but I don't see
> > > how to efficiently ignore EPIPE until POLLIN gets unset -- polling on
> > > both the csock and kcm socket will do many needless wakeups on only the
> > > csock from what I can see, so I'd need some holdoff timer or something.
> > > I guess it's possible though.
> >
> > We might need to clear the error somehow. May a read of zero bytes?
>
> Can try.
>
> > > After a bit more debugging, this part works (__strp_recv() is called
> > > again); but the next packet that is treated properly is rejected because
> > > by the time __strp_recv() was called again a new skb was read and the
> > > length isn't large enough to go all the way into the new packet, so this
> > > test fails:
> > > } else if (len <= (ssize_t)head->len -
> > > skb->len - stm->strp.offset) {
> > > /* Length must be into new skb (and also
> > > * greater than zero)
> > > */
> > > STRP_STATS_INCR(strp->stats.bad_hdr_len);
> > > strp_parser_err(strp, -EPROTO, desc);
> > >
> > > So I need to figure a way to say "call this function again without
> > > reading more data" somehow, or make this check more lax e.g. accept any
> > > len > 0 after a retry maybe...
> > > Removing that branch altogether seems to work at least but I'm not sure
> > > we'd want to?
> >
> > I like the check since it's conservative and covers the normal case.
> > Maybe just need some more logic?
>
> I can add a "retrying" state and not fail here if we ewre retrying for
> whatever reason perhaps...
> But I'm starting to wonder how this would work if my client didn't keep
> on sending data, I'll try to fail on the last client's packet and see
> how __strp_recv is called again through the timer, with the same skb
> perhaps?

Right, a timer or some sort of aynchronous callbask is needed. Like I
said, I don't think dealing with memory failure like this is unique to
KCM.

Thanks again for looking into this, I think this is good progress!

Tom

>
> --
> Dominique