2014-09-10 22:13:21

by Duyck, Alexander H

Subject: [PATCH net-next 0/2] Address reference counting issues with sock_queue_err_skb

After looking over the code for skb_clone_sk, prompted by some comments from
Eric Dumazet, I have come to the conclusion that skb_clone_sk takes the
correct approach to handling sk_refcnt when creating a buffer that
is eventually meant to be returned to the socket via the sock_queue_err_skb
function.

However, upon review of other callers I found what I believe to be a
possible reference count issue in the path for handling "wifi ack" packets.
To address this I have applied the same logic that is currently in place, so
that sk_refcnt is forced to stay at least 1, or else we do not
provide an skb to return in the sk_error_queue.

---

Alexander Duyck (2):
skb: Add documentation for skb_clone_sk
mac80211: Resolve sk_refcnt/sk_wmem_alloc issue in wifi ack path


net/core/skbuff.c | 18 ++++++++++++++++++
net/mac80211/tx.c | 15 ++++-----------
2 files changed, 22 insertions(+), 11 deletions(-)

--


2014-09-11 15:53:37

by Johannes Berg

Subject: Re: [PATCH net-next 2/2] mac80211: Resolve sk_refcnt/sk_wmem_alloc issue in wifi ack path

On Thu, 2014-09-11 at 08:21 -0700, Alexander Duyck wrote:

[...]
> >> EXPORT_SYMBOL_GPL(skb_complete_wifi_ack);
> >
> > Here I'm not sure it matters *for this function*? Wouldn't it be freed
> > then in sock_put(), which has the same net effect on this function
> > overall? It doesn't use it after sock_queue_err_skb().
>
> The significant piece is that we are calling sock_put *after*. So if we
> are dropping the last reference the buffer is already in the
> sk_error_queue and will be purged when __sk_free is called.

Yeah, I understand. But that's more of a problem of sock_queue_err_skb()
rather than this function.

> > Seems like maybe this should be in sock_queue_err_skb() itself, since it
> > does the orphaning first and then looks at the socket. Or the
> > documentation for that function should state that it has to be held, but
> > there are plenty of callers?
>
> The problem is there are a number of cases where the sock_hold/put are
> not needed. For example, if we were to clone the skb and immediately
> send the clone up the sk_error_queue then we don't need it. We only
> need it if there is a risk that orphaning the buffer sent could
> potentially result in the destructor calling __sk_free.

Ok, that's reasonable. Maybe then you can add that to the documentation
of sock_queue_err_skb() - that it must (somehow) ensure the socket can't
go away while it's being called? That way this caller change would
become clearer IMHO.

> > So you're removing this part, but can't we really not reuse the clone_sk
> > copy? The difference is that it's charged, but that's fine for the
> > purposes here, no? Or am I misunderstanding that?

> The copy being held cannot really be used for transmit. The problem is
> that it is holding the wrong kind of reference.

Ok.

> The problem lies in the order in which things are released. The sock_put
> function will dec_and_test sk_refcnt; once it reaches 0 it will do a
> dec_and_test on sk_wmem_alloc to see if it should call __sk_free. Until
> sk_refcnt reaches 0, sk_wmem_alloc cannot reach 0. Once either of these drops
> to 0 we cannot bring the value back up from there. So if I were to
> transmit the clone then it could let the sk_refcnt drop to 0, in which
> case any calls to sock_hold are invalid.
>
> I would need to somehow hold the reference based on sk_wmem_alloc if we
> want to transmit the clone. Many of the hardware timestamping drivers
> seem to just clone the original skb, queue that clone onto the
> sk_error_queue, and then free the original after completing the call. I
> suppose we could change it to something like that, but you are still
> looking at possibly 2 clones in that case anyway.

Well, no need. I just had originally wanted to reuse the clone so under
these corner case conditions we didn't clone twice - no big deal, it
never happens anyway (that IDR thing should never actually run out of
space)

Anyway, thanks. I expect that, because of patch 1, davem will apply both
patches (and I'm going to be on vacation anyway), so

Acked-by: Johannes Berg <[email protected]>

for both patches.

Thanks!

johannes


2014-09-10 22:14:19

by Duyck, Alexander H

Subject: [PATCH net-next 2/2] mac80211: Resolve sk_refcnt/sk_wmem_alloc issue in wifi ack path

There is a possible issue with the use, or lack thereof, of sk_refcnt and
sk_wmem_alloc in the wifi ack status functionality.

Specifically, if a socket were to request acknowledgements, and the socket
were to have sk_refcnt drop to 0 resulting in it waiting on sk_wmem_alloc
to reach 0, it would be possible to have sock_queue_err_skb orphan the last
buffer, resulting in __sk_free being called on the socket. After this the
buffer is enqueued on sk_error_queue; however, the queue has already been
flushed, resulting in at least a memory leak, if not data corruption.

Signed-off-by: Alexander Duyck <[email protected]>
---
net/core/skbuff.c | 5 +++++
net/mac80211/tx.c | 15 ++++-----------
2 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index c9da77a..c8259ac 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3628,9 +3628,14 @@ void skb_complete_wifi_ack(struct sk_buff *skb, bool acked)
 	serr->ee.ee_errno = ENOMSG;
 	serr->ee.ee_origin = SO_EE_ORIGIN_TXSTATUS;

+	/* take a reference to prevent skb_orphan() from freeing the socket */
+	sock_hold(sk);
+
 	err = sock_queue_err_skb(sk, skb);
 	if (err)
 		kfree_skb(skb);
+
+	sock_put(sk);
 }
 EXPORT_SYMBOL_GPL(skb_complete_wifi_ack);

diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 925c39f..cf71414 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -2072,30 +2072,23 @@ netdev_tx_t ieee80211_subif_start_xmit(struct sk_buff *skb,

 	if (unlikely(!multicast && skb->sk &&
 		     skb_shinfo(skb)->tx_flags & SKBTX_WIFI_STATUS)) {
-		struct sk_buff *orig_skb = skb;
+		struct sk_buff *ack_skb = skb_clone_sk(skb);

-		skb = skb_clone(skb, GFP_ATOMIC);
-		if (skb) {
+		if (ack_skb) {
 			unsigned long flags;
 			int id;

 			spin_lock_irqsave(&local->ack_status_lock, flags);
-			id = idr_alloc(&local->ack_status_frames, orig_skb,
+			id = idr_alloc(&local->ack_status_frames, ack_skb,
 				       1, 0x10000, GFP_ATOMIC);
 			spin_unlock_irqrestore(&local->ack_status_lock, flags);

 			if (id >= 0) {
 				info_id = id;
 				info_flags |= IEEE80211_TX_CTL_REQ_TX_STATUS;
-			} else if (skb_shared(skb)) {
-				kfree_skb(orig_skb);
 			} else {
-				kfree_skb(skb);
-				skb = orig_skb;
+				kfree_skb(ack_skb);
 			}
-		} else {
-			/* couldn't clone -- lose tx status ... */
-			skb = orig_skb;
 		}
 	}



2014-09-12 21:51:54

by David Miller

Subject: Re: [PATCH net-next 0/2] Address reference counting issues with sock_queue_err_skb

From: Alexander Duyck <[email protected]>
Date: Wed, 10 Sep 2014 18:04:42 -0400

> After looking over the code for skb_clone_sk after some comments made by
> Eric Dumazet I have come to the conclusion that skb_clone_sk is taking the
> correct approach in how to handle the sk_refcnt when creating a buffer that
> is eventually meant to be returned to the socket via the sock_queue_err_skb
> function.
>
> However upon review of other callers I found what I believe to be a
> possible reference count issue in the path for handling "wifi ack" packets.
> To address this I have applied the same logic that is currently in place so
> that the sk_refcnt will be forced to stay at least 1, or we will not
> provide an skb to return in the sk_error_queue.

Series applied, thanks Alex.

2014-09-11 14:41:04

by Duyck, Alexander H

Subject: Re: [PATCH net-next 2/2] mac80211: Resolve sk_refcnt/sk_wmem_alloc issue in wifi ack path

On 09/11/2014 02:38 AM, Arend van Spriel wrote:
> On 09/11/14 09:06, Johannes Berg wrote:
>> On Wed, 2014-09-10 at 18:05 -0400, Alexander Duyck wrote:
>>> There is a possible issue with the use, or lack thereof of sk_refcnt and
>>> sk_wmem_alloc in the wifi ack status functionality.
>>>
>>> Specifically if a socket were to request acknowledgements, and the
>>> socket
>>> were to have sk_refcnt drop to 0 resulting in it waiting on
>>> sk_wmem_alloc
>>> to reach 0 it would be possible to have sock_queue_err_skb orphan the
>>> last
>>> buffer, resulting in __sk_free being called on the socket. After
>>> this the
>>> buffer is enqueued on sk_error_queue, however the queue has already been
>>> flushed resulting in at least a memory leak, if not a data corruption.
>>
>> Oh. Thanks :-)
>
> Hi Alexander,
>
> So why is this only an issue in the wifi ack path? The sock_queue_err_skb()
> documentation does not mention that the caller should hold a sock reference.
> This seems to be entirely an issue with the sock_queue_err_skb() function
> itself, so why not do sock_hold/sock_put within that function? Does it
> impose too much overhead?
>
> Regards,
> Arend

I considered it but there are a number of cases where this is not an issue.

For example in the tx timestamping path there is the software timestamp
case where the buffer is cloned and the clone is queued immediately onto
the sk_error_queue. In that case we still have a reference in the other
skb that is maintaining the socket.

So I thought it best to just address the cases where I know this could
be a problem. I had already addressed it in the timestamping for
hardware timestamps where we are doing something similar. So I thought
it would make sense to cover the other case that should have the same
problems.

Thanks,

Alex

2014-09-11 07:06:42

by Johannes Berg

Subject: Re: [PATCH net-next 2/2] mac80211: Resolve sk_refcnt/sk_wmem_alloc issue in wifi ack path

On Wed, 2014-09-10 at 18:05 -0400, Alexander Duyck wrote:
> There is a possible issue with the use, or lack thereof of sk_refcnt and
> sk_wmem_alloc in the wifi ack status functionality.
>
> Specifically if a socket were to request acknowledgements, and the socket
> were to have sk_refcnt drop to 0 resulting in it waiting on sk_wmem_alloc
> to reach 0 it would be possible to have sock_queue_err_skb orphan the last
> buffer, resulting in __sk_free being called on the socket. After this the
> buffer is enqueued on sk_error_queue, however the queue has already been
> flushed resulting in at least a memory leak, if not a data corruption.

Oh. Thanks :-)

> +	/* take a reference to prevent skb_orphan() from freeing the socket */
> +	sock_hold(sk);
> +
>  	err = sock_queue_err_skb(sk, skb);
>  	if (err)
>  		kfree_skb(skb);
> +
> +	sock_put(sk);
>  }
>  EXPORT_SYMBOL_GPL(skb_complete_wifi_ack);

Here I'm not sure it matters *for this function*? Wouldn't it be freed
then in sock_put(), which has the same net effect on this function
overall? It doesn't use it after sock_queue_err_skb().

Seems like maybe this should be in sock_queue_err_skb() itself, since it
does the orphaning first and then looks at the socket. Or the
documentation for that function should state that it has to be held, but
there are plenty of callers?

>  			spin_lock_irqsave(&local->ack_status_lock, flags);
> -			id = idr_alloc(&local->ack_status_frames, orig_skb,
> +			id = idr_alloc(&local->ack_status_frames, ack_skb,
>  				       1, 0x10000, GFP_ATOMIC);
>  			spin_unlock_irqrestore(&local->ack_status_lock, flags);
>
>  			if (id >= 0) {
>  				info_id = id;
>  				info_flags |= IEEE80211_TX_CTL_REQ_TX_STATUS;
> -			} else if (skb_shared(skb)) {
> -				kfree_skb(orig_skb);
>  			} else {
> -				kfree_skb(skb);
> -				skb = orig_skb;
> +				kfree_skb(ack_skb);
>  			}

So you're removing this part, but can't we really not reuse the clone_sk
copy? The difference is that it's charged, but that's fine for the
purposes here, no? Or am I misunderstanding that?

johannes


2014-09-11 09:38:28

by Arend van Spriel

Subject: Re: [PATCH net-next 2/2] mac80211: Resolve sk_refcnt/sk_wmem_alloc issue in wifi ack path

On 09/11/14 09:06, Johannes Berg wrote:
> On Wed, 2014-09-10 at 18:05 -0400, Alexander Duyck wrote:
>> There is a possible issue with the use, or lack thereof of sk_refcnt and
>> sk_wmem_alloc in the wifi ack status functionality.
>>
>> Specifically if a socket were to request acknowledgements, and the socket
>> were to have sk_refcnt drop to 0 resulting in it waiting on sk_wmem_alloc
>> to reach 0 it would be possible to have sock_queue_err_skb orphan the last
>> buffer, resulting in __sk_free being called on the socket. After this the
>> buffer is enqueued on sk_error_queue, however the queue has already been
>> flushed resulting in at least a memory leak, if not a data corruption.
>
> Oh. Thanks :-)

Hi Alexander,

So why is this only an issue in the wifi ack path? The sock_queue_err_skb()
documentation does not mention that the caller should hold a sock reference.
This seems to be entirely an issue with the sock_queue_err_skb() function
itself, so why not do sock_hold/sock_put within that function? Does it
impose too much overhead?

Regards,
Arend

2014-09-10 22:14:08

by Duyck, Alexander H

Subject: [PATCH net-next v2 1/2] skb: Add documentation for skb_clone_sk

This change adds documentation for the function skb_clone_sk. It is
meant to help clarify the purpose of the function for other developers.

Signed-off-by: Alexander Duyck <[email protected]>
---

v2: Updated comments to specifically call out need for sock_hold/sock_put

net/core/skbuff.c | 13 +++++++++++++
1 file changed, 13 insertions(+)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index a18dfb0..c9da77a 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3511,6 +3511,19 @@ struct sk_buff *sock_dequeue_err_skb(struct sock *sk)
 }
 EXPORT_SYMBOL(sock_dequeue_err_skb);

+/**
+ * skb_clone_sk - create clone of skb, and take reference to socket
+ * @skb: the skb to clone
+ *
+ * This function creates a clone of a buffer that holds a reference on
+ * sk_refcnt. Buffers created via this function are meant to be
+ * returned using sock_queue_err_skb, or freed via kfree_skb.
+ *
+ * When passing buffers allocated with this function to sock_queue_err_skb
+ * it is necessary to wrap the call with sock_hold/sock_put in order to
+ * prevent the socket from being released prior to being enqueued on
+ * the sk_error_queue.
+ */
 struct sk_buff *skb_clone_sk(struct sk_buff *skb)
 {
 	struct sock *sk = skb->sk;


2014-09-11 15:22:24

by Duyck, Alexander H

Subject: Re: [PATCH net-next 2/2] mac80211: Resolve sk_refcnt/sk_wmem_alloc issue in wifi ack path

On 09/11/2014 12:06 AM, Johannes Berg wrote:
> On Wed, 2014-09-10 at 18:05 -0400, Alexander Duyck wrote:
>> There is a possible issue with the use, or lack thereof of sk_refcnt and
>> sk_wmem_alloc in the wifi ack status functionality.
>>
>> Specifically if a socket were to request acknowledgements, and the socket
>> were to have sk_refcnt drop to 0 resulting in it waiting on sk_wmem_alloc
>> to reach 0 it would be possible to have sock_queue_err_skb orphan the last
>> buffer, resulting in __sk_free being called on the socket. After this the
>> buffer is enqueued on sk_error_queue, however the queue has already been
>> flushed resulting in at least a memory leak, if not a data corruption.
>
> Oh. Thanks :-)
>
>> +	/* take a reference to prevent skb_orphan() from freeing the socket */
>> +	sock_hold(sk);
>> +
>>  	err = sock_queue_err_skb(sk, skb);
>>  	if (err)
>>  		kfree_skb(skb);
>> +
>> +	sock_put(sk);
>>  }
>>  EXPORT_SYMBOL_GPL(skb_complete_wifi_ack);
>
> Here I'm not sure it matters *for this function*? Wouldn't it be freed
> then in sock_put(), which has the same net effect on this function
> overall? It doesn't use it after sock_queue_err_skb().

The significant piece is that we are calling sock_put *after*. So if we
are dropping the last reference the buffer is already in the
sk_error_queue and will be purged when __sk_free is called.
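
The ordering argument above can be sketched as a small userspace model. This is only an illustration under stated assumptions, not kernel code: struct toy_sock and every toy_* function are hypothetical stand-ins, the model assumes the buffer's own sk_refcnt reference is the last one left, and it ignores sk_wmem_alloc entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct sock; everything here is hypothetical. */
struct toy_sock {
	int sk_refcnt;      /* references taken with hold, dropped with put */
	int err_queue_len;  /* buffers currently on the error queue */
	int purged;         /* buffers released by the final free */
	int leaked;         /* buffers enqueued after the final free */
	bool freed;
};

/* Models the final free: whatever is on the error queue gets purged. */
static void toy_final_free(struct toy_sock *sk)
{
	sk->purged += sk->err_queue_len;
	sk->err_queue_len = 0;
	sk->freed = true;
}

static void toy_sock_hold(struct toy_sock *sk)
{
	assert(sk->sk_refcnt > 0);  /* a count that hit 0 cannot be revived */
	sk->sk_refcnt++;
}

static void toy_sock_put(struct toy_sock *sk)
{
	if (--sk->sk_refcnt == 0)
		toy_final_free(sk);
}

/* Models the queueing step: orphan the buffer first, then enqueue it. */
static void toy_queue_err(struct toy_sock *sk)
{
	toy_sock_put(sk);           /* orphan drops the buffer's own reference */
	if (sk->freed)
		sk->leaked++;       /* the queue was already purged */
	else
		sk->err_queue_len++;
}

/* Returns the number of buffers leaked by one completion. */
static int toy_complete_ack(bool wrap_with_hold)
{
	struct toy_sock sk = { .sk_refcnt = 1 };  /* only the buffer's reference */

	if (wrap_with_hold)
		toy_sock_hold(&sk);
	toy_queue_err(&sk);
	if (wrap_with_hold)
		toy_sock_put(&sk);  /* last put runs with the buffer queued */
	return sk.leaked;
}
```

In this model toy_complete_ack(true) returns 0 because the final put finds the buffer on the queue and purges it, while toy_complete_ack(false) returns 1 because the orphan step frees the socket before the enqueue.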

> Seems like maybe this should be in sock_queue_err_skb() itself, since it
> does the orphaning first and then looks at the socket. Or the
> documentation for that function should state that it has to be held, but
> there are plenty of callers?

The problem is there are a number of cases where the sock_hold/put are
not needed. For example, if we were to clone the skb and immediately
send the clone up the sk_error_queue then we don't need it. We only
need it if there is a risk that orphaning the buffer sent could
potentially result in the destructor calling __sk_free.

>>  			spin_lock_irqsave(&local->ack_status_lock, flags);
>> -			id = idr_alloc(&local->ack_status_frames, orig_skb,
>> +			id = idr_alloc(&local->ack_status_frames, ack_skb,
>>  				       1, 0x10000, GFP_ATOMIC);
>>  			spin_unlock_irqrestore(&local->ack_status_lock, flags);
>>
>>  			if (id >= 0) {
>>  				info_id = id;
>>  				info_flags |= IEEE80211_TX_CTL_REQ_TX_STATUS;
>> -			} else if (skb_shared(skb)) {
>> -				kfree_skb(orig_skb);
>>  			} else {
>> -				kfree_skb(skb);
>> -				skb = orig_skb;
>> +				kfree_skb(ack_skb);
>>  			}
>
> So you're removing this part, but can't we really not reuse the clone_sk
> copy? The difference is that it's charged, but that's fine for the
> purposes here, no? Or am I misunderstanding that?
>
> johannes

The copy being held cannot really be used for transmit. The problem is
that it is holding the wrong kind of reference.

The problem lies in the order in which things are released. The sock_put
function will dec_and_test sk_refcnt; once it reaches 0 it will do a
dec_and_test on sk_wmem_alloc to see if it should call __sk_free. Until
sk_refcnt reaches 0, sk_wmem_alloc cannot reach 0. Once either of these drops
to 0 we cannot bring the value back up from there. So if I were to
transmit the clone then it could let the sk_refcnt drop to 0, in which
case any calls to sock_hold are invalid.

I would need to somehow hold the reference based on sk_wmem_alloc if we
want to transmit the clone. Many of the hardware timestamping drivers
seem to just clone the original skb, queue that clone onto the
sk_error_queue, and then free the original after completing the call. I
suppose we could change it to something like that, but you are still
looking at possibly 2 clones in that case anyway.
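
The two-stage release described above can be modelled in userspace roughly as follows. This is a sketch under assumptions, not the kernel implementation: struct toy_sock and the toy_* helpers are hypothetical, and the model assumes sk_wmem_alloc starts with a bias of 1 that is only dropped once sk_refcnt reaches 0, which is what keeps it from reaching 0 earlier.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct sock; field names mirror the discussion only. */
struct toy_sock {
	int sk_refcnt;      /* general references (hold/put) */
	int sk_wmem_alloc;  /* assumed bias of 1 plus bytes charged by tx buffers */
	bool freed;         /* set where the final free would run */
};

static void toy_sock_init(struct toy_sock *sk)
{
	sk->sk_refcnt = 1;
	sk->sk_wmem_alloc = 1;  /* bias, dropped only when sk_refcnt hits 0 */
	sk->freed = false;
}

static void toy_sock_hold(struct toy_sock *sk)
{
	assert(sk->sk_refcnt > 0);  /* invalid once the count has reached 0 */
	sk->sk_refcnt++;
}

static void toy_sock_put(struct toy_sock *sk)
{
	if (--sk->sk_refcnt == 0) {
		/* second stage: drop the bias, free only if nothing is charged */
		if (--sk->sk_wmem_alloc == 0)
			sk->freed = true;
	}
}

/* A transmit buffer charging its size to the socket. */
static void toy_charge(struct toy_sock *sk, int truesize)
{
	sk->sk_wmem_alloc += truesize;
}

/* The buffer's destructor returning the charge; may be the final free. */
static void toy_uncharge(struct toy_sock *sk, int truesize)
{
	sk->sk_wmem_alloc -= truesize;
	if (sk->sk_wmem_alloc == 0)
		sk->freed = true;
}

/* True if the socket outlives the last put and dies with the last buffer. */
static bool toy_release_order_demo(void)
{
	struct toy_sock sk;
	bool alive_after_put;

	toy_sock_init(&sk);
	toy_charge(&sk, 256);   /* one tx buffer in flight */
	toy_sock_hold(&sk);     /* sk_refcnt 1 -> 2 */
	toy_sock_put(&sk);      /* sk_refcnt 2 -> 1 */
	toy_sock_put(&sk);      /* sk_refcnt 1 -> 0, bias dropped, 256 left */
	alive_after_put = !sk.freed;
	toy_uncharge(&sk, 256); /* sk_wmem_alloc 256 -> 0: final free */
	return alive_after_put && sk.freed;
}
```

After the last toy_sock_put the socket is kept alive only by the charged buffer, so a later toy_sock_hold would trip the assertion. That is the model's version of why a buffer that pins the socket through sk_wmem_alloc cannot be handed a fresh sk_refcnt reference afterwards.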

Thanks,

Alex