2017-06-01 14:00:57

by Cyril Hrubis

[permalink] [raw]
Subject: commit f5f99309 (sock: do not set sk_err in sock_dequeue_err_skb) has broken ping

Hi!
I've started to wonder why is ping eating 100% CPU shortly after I've
upgraded my machine to 4.10 and here is what I found:

The ping main_loop() sleeps in poll() on its socket, the poll() usually times
out, at least that's what strace suggets which causes ping to sleep for ~1s in
the kernel.

See ping source at:

https://github.com/iputils/iputils/blob/master/ping_common.c#L587

The poll() seems to start returning POLLERR immediatelly after poll() is called
on the socket in a case that connection has dropped for a short while. It seems to be easily reproducible with:

* Starting ping with some ip address i.e. ping 4.2.2.2
* Letting it ping for a minute or so
* Disconnection a WAN cable from your AP
* After a minute or so ping ends up bussy looping on
poll() that returns with POLLERR immediatelly
* After plugging the cable back the problem gets only
worse since we now spend 99% of the time bussy looping
on the poll() syscall
* And my CPU fan starts to scream loudly

I've bisected the problem to this commit:

commit f5f99309fa7481f59a500f0d08f3379cd6424c1f (HEAD, refs/bisect/bad)
Author: Soheil Hassas Yeganeh <[email protected]>
Date: Thu Nov 3 18:24:27 2016 -0400

sock: do not set sk_err in sock_dequeue_err_skb

--
Cyril Hrubis
[email protected]


2017-06-01 14:11:39

by Soheil Hassas Yeganeh

[permalink] [raw]
Subject: Re: commit f5f99309 (sock: do not set sk_err in sock_dequeue_err_skb) has broken ping

On Thu, Jun 1, 2017 at 10:00 AM, Cyril Hrubis <[email protected]> wrote:
> I've bisected the problem to this commit:
>
> commit f5f99309fa7481f59a500f0d08f3379cd6424c1f (HEAD, refs/bisect/bad)
> Author: Soheil Hassas Yeganeh <[email protected]>
> Date: Thu Nov 3 18:24:27 2016 -0400
>
> sock: do not set sk_err in sock_dequeue_err_skb

Hi Cyril,

I'm sorry for the problem, and thank you for the report.

Two questions:
1. Could you double check whether you have the following commit in your tree?

commit 83a1a1a70e87f676fbb6086b26b6ac7f7fdd107d
Author: Soheil Hassas Yeganeh <[email protected]>
Date: Wed Nov 30 14:01:08 2016 -0500
sock: reset sk_err for ICMP packets read from error queue

2. I've also have sent a fix to iputils on
https://github.com/iputils/iputils/pull/75. Would you be kind to try
that pull request as well?

Thanks,
Soheil

2017-06-01 14:31:48

by Cyril Hrubis

[permalink] [raw]
Subject: Re: commit f5f99309 (sock: do not set sk_err in sock_dequeue_err_skb) has broken ping

Hi!
> > I've bisected the problem to this commit:
> >
> > commit f5f99309fa7481f59a500f0d08f3379cd6424c1f (HEAD, refs/bisect/bad)
> > Author: Soheil Hassas Yeganeh <[email protected]>
> > Date: Thu Nov 3 18:24:27 2016 -0400
> >
> > sock: do not set sk_err in sock_dequeue_err_skb
>
> Hi Cyril,
>
> I'm sorry for the problem, and thank you for the report.
>
> Two questions:
> 1. Could you double check whether you have the following commit in your tree?
>
> commit 83a1a1a70e87f676fbb6086b26b6ac7f7fdd107d
> Author: Soheil Hassas Yeganeh <[email protected]>
> Date: Wed Nov 30 14:01:08 2016 -0500
> sock: reset sk_err for ICMP packets read from error queue

I've started bisecting on v4.11 and see the problem on v4.10 on another
machine, the patch should be there in both cases and the bug is easily
reproducible.

> 2. I've also have sent a fix to iputils on
> https://github.com/iputils/iputils/pull/75. Would you be kind to try
> that pull request as well?

That fixed the problem, you can add:

Tested-by: Cyril Hrubis <[email protected]>

--
Cyril Hrubis
[email protected]

2017-06-01 14:40:14

by Soheil Hassas Yeganeh

[permalink] [raw]
Subject: Re: commit f5f99309 (sock: do not set sk_err in sock_dequeue_err_skb) has broken ping

On Thu, Jun 1, 2017 at 10:31 AM, Cyril Hrubis <[email protected]> wrote:
> I've started bisecting on v4.11 and see the problem on v4.10 on another
> machine, the patch should be there in both cases and the bug is easily
> reproducible.

Thank you for the confirmation. Could you please try the following
patch to see if it fixes your issue?

>From 3ec438460425d127741b20f03f78644c9e441e8c Mon Sep 17 00:00:00 2001
From: Soheil Hassas Yeganeh <[email protected]>
Date: Thu, 1 Jun 2017 10:34:09 -0400
Subject: [PATCH net] sock: reset sk_err when the error queue is empty

Before f5f99309fa74 (sock: do not set sk_err in
sock_dequeue_err_skb), sk_err was reset to 0 upon reading
from the error queue when the error queue was empty.

Applications, most notably ping, are relying on this
behavior to reset sk_err.

Reset sk_err when there is no packet left on the
error queue.

Fixes: f5f99309fa74 (sock: do not set sk_err in sock_dequeue_err_skb)
Reported-by: Cyril Hrubis <[email protected]>
Signed-off-by: Soheil Hassas Yeganeh <[email protected]>
---
net/core/skbuff.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 346d3e85dfbc..5a726161f4e4 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3758,7 +3758,7 @@ struct sk_buff *sock_dequeue_err_skb(struct sock *sk)
icmp_next = is_icmp_err_skb(skb_next);
spin_unlock_irqrestore(&q->lock, flags);

- if (is_icmp_err_skb(skb) && !icmp_next)
+ if ((is_icmp_err_skb(skb) && !icmp_next) || !skb_next)
sk->sk_err = 0;

if (skb_next)
--
2.13.0.219.gdb65acc882-goog


>> 2. I've also have sent a fix to iputils on
>> https://github.com/iputils/iputils/pull/75. Would you be kind to try
>> that pull request as well?
>
> That fixed the problem, you can add:
>
> Tested-by: Cyril Hrubis <[email protected]>

Thank you for testing! Will do.

2017-06-01 15:10:10

by Cyril Hrubis

[permalink] [raw]
Subject: Re: commit f5f99309 (sock: do not set sk_err in sock_dequeue_err_skb) has broken ping

Hi!
> Thank you for the confirmation. Could you please try the following
> patch to see if it fixes your issue?

Does not seem to help, I still got the same bussy loop.

--
Cyril Hrubis
[email protected]

2017-06-01 15:15:56

by Soheil Hassas Yeganeh

[permalink] [raw]
Subject: Re: commit f5f99309 (sock: do not set sk_err in sock_dequeue_err_skb) has broken ping

On Thu, Jun 1, 2017 at 11:10 AM, Cyril Hrubis <[email protected]> wrote:
>> Thank you for the confirmation. Could you please try the following
>> patch to see if it fixes your issue?
>
> Does not seem to help, I still got the same bussy loop.

Thank you for trying the patch. Unfortunately, I can't reproduce on my
machines here.Would you humor me with another one? Thank you!

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 5a726161f4e4..49207298fcea 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3742,7 +3742,8 @@ EXPORT_SYMBOL(sock_queue_err_skb);
static bool is_icmp_err_skb(const struct sk_buff *skb)
{
return skb && (SKB_EXT_ERR(skb)->ee.ee_origin == SO_EE_ORIGIN_ICMP ||
- SKB_EXT_ERR(skb)->ee.ee_origin == SO_EE_ORIGIN_ICMP6);
+ SKB_EXT_ERR(skb)->ee.ee_origin == SO_EE_ORIGIN_ICMP6 ||
+ SKB_EXT_ERR(skb)->ee.ee_origin == SO_EE_ORIGIN_LOCAL);
}

struct sk_buff *sock_dequeue_err_skb(struct sock *sk)

2017-06-01 15:36:50

by Cyril Hrubis

[permalink] [raw]
Subject: Re: commit f5f99309 (sock: do not set sk_err in sock_dequeue_err_skb) has broken ping

Hi!
> >> Thank you for the confirmation. Could you please try the following
> >> patch to see if it fixes your issue?
> >
> > Does not seem to help, I still got the same bussy loop.
>
> Thank you for trying the patch. Unfortunately, I can't reproduce on my
> machines here.Would you humor me with another one? Thank you!

Still does not help.

I've modified the ping binary (on the top of you patch that reads the
error queue) to dump the error structure.

It seems to repeatedly produce (until I plug the cable back):

ee_errno = 113 ee_origin = 2 ee_type = 3 ee_code = 1 ee_info = 0 ee_data = 0

So we get EHOSTUNREACH on SO_EE_ORIGIN_ICMP.

--
Cyril Hrubis
[email protected]

2017-06-01 16:43:34

by Soheil Hassas Yeganeh

[permalink] [raw]
Subject: Re: commit f5f99309 (sock: do not set sk_err in sock_dequeue_err_skb) has broken ping

On Thu, Jun 1, 2017 at 11:36 AM, Cyril Hrubis <[email protected]> wrote:
> It seems to repeatedly produce (until I plug the cable back):
>
> ee_errno = 113 ee_origin = 2 ee_type = 3 ee_code = 1 ee_info = 0 ee_data = 0
>
> So we get EHOSTUNREACH on SO_EE_ORIGIN_ICMP.

Thank you very much! I have a wild guess that, when we
have a train of skbs on the error queue starting from a local error,
we will see this issue.

Ping (without my patch) considers EAGAIN on a normal read as an
indication that there is nothing on the error queue, but that's a
flawed assumption.

Would you mind trying another shot in the darkness please? Thanks!

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 5a726161f4e4..097152a03c74 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3742,7 +3742,8 @@ EXPORT_SYMBOL(sock_queue_err_skb);
static bool is_icmp_err_skb(const struct sk_buff *skb)
{
return skb && (SKB_EXT_ERR(skb)->ee.ee_origin == SO_EE_ORIGIN_ICMP ||
- SKB_EXT_ERR(skb)->ee.ee_origin == SO_EE_ORIGIN_ICMP6);
+ SKB_EXT_ERR(skb)->ee.ee_origin == SO_EE_ORIGIN_ICMP6 ||
+ SKB_EXT_ERR(skb)->ee.ee_origin == SO_EE_ORIGIN_LOCAL);
}

struct sk_buff *sock_dequeue_err_skb(struct sock *sk)
@@ -3751,14 +3752,19 @@ struct sk_buff *sock_dequeue_err_skb(struct sock *sk)
struct sk_buff *skb, *skb_next = NULL;
bool icmp_next = false;
unsigned long flags;
+ int err = 0;

spin_lock_irqsave(&q->lock, flags);
skb = __skb_dequeue(q);
- if (skb && (skb_next = skb_peek(q)))
+ if (skb && (skb_next = skb_peek(q))) {
icmp_next = is_icmp_err_skb(skb_next);
+ err = SKB_EXT_ERR(skb_next)->ee.ee_origin;
+ }
spin_unlock_irqrestore(&q->lock, flags);

- if (is_icmp_err_skb(skb) && !icmp_next)
+ if (icmp_next)
+ sk->sk_err = err;
+ else if (is_icmp_err_skb(skb) && !icmp_next)
sk->sk_err = 0;

if (skb_next)

2017-06-01 20:03:21

by Cyril Hrubis

[permalink] [raw]
Subject: Re: commit f5f99309 (sock: do not set sk_err in sock_dequeue_err_skb) has broken ping

Hi!
> Thank you very much! I have a wild guess that, when we
> have a train of skbs on the error queue starting from a local error,
> we will see this issue.
>
> Ping (without my patch) considers EAGAIN on a normal read as an
> indication that there is nothing on the error queue, but that's a
> flawed assumption.
>
> Would you mind trying another shot in the darkness please? Thanks!

This patch seems to fix the issue, I've tried several times and poll()
just timeouts, haven't seen a single POLLERR in the ping strace.

You can add my Tested-by: for this patch as well :-).

--
Cyril Hrubis
[email protected]