2022-07-11 07:04:14

by Hangyu Hua

[permalink] [raw]
Subject: [PATCH] net: 9p: fix possible refcount leak in p9_read_work() and recv_done()

A ref got in p9_tag_lookup needs to be put when functions enter the
error path.

Fix this by adding p9_req_put in error path.

Fixes: 728356dedeff ("9p: Add refcount to p9_req_t")
Signed-off-by: Hangyu Hua <[email protected]>
---
net/9p/trans_fd.c | 3 +++
net/9p/trans_rdma.c | 1 +
2 files changed, 4 insertions(+)

diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index 8f8f95e39b03..c4ccb7b9e1bf 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -343,6 +343,7 @@ static void p9_read_work(struct work_struct *work)
p9_debug(P9_DEBUG_ERROR,
"No recv fcall for tag %d (req %p), disconnecting!\n",
m->rc.tag, m->rreq);
+ p9_req_put(m->rreq);
m->rreq = NULL;
err = -EIO;
goto error;
@@ -372,6 +373,8 @@ static void p9_read_work(struct work_struct *work)
"Request tag %d errored out while we were reading the reply\n",
m->rc.tag);
err = -EIO;
+ p9_req_put(m->rreq);
+ m->rreq = NULL;
goto error;
}
spin_unlock(&m->client->lock);
diff --git a/net/9p/trans_rdma.c b/net/9p/trans_rdma.c
index 88e563826674..82b5d6894ee2 100644
--- a/net/9p/trans_rdma.c
+++ b/net/9p/trans_rdma.c
@@ -317,6 +317,7 @@ recv_done(struct ib_cq *cq, struct ib_wc *wc)
/* Check that we have not yet received a reply for this request.
*/
if (unlikely(req->rc.sdata)) {
+ p9_req_put(req);
pr_err("Duplicate reply for request %d", tag);
goto err_out;
}
--
2.25.1


2022-07-11 07:57:18

by Dominique Martinet

[permalink] [raw]
Subject: Re: [PATCH] net: 9p: fix possible refcount leak in p9_read_work() and recv_done()

Hangyu Hua wrote on Mon, Jul 11, 2022 at 02:59:07PM +0800:
> A ref got in p9_tag_lookup needs to be put when functions enter the
> error path.
>
> Fix this by adding p9_req_put in error path.

I wish it was that simple.

Did you actually observe a leak?

> diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
> index 8f8f95e39b03..c4ccb7b9e1bf 100644
> --- a/net/9p/trans_fd.c
> +++ b/net/9p/trans_fd.c
> @@ -343,6 +343,7 @@ static void p9_read_work(struct work_struct *work)
> p9_debug(P9_DEBUG_ERROR,
> "No recv fcall for tag %d (req %p), disconnecting!\n",
> m->rc.tag, m->rreq);
> + p9_req_put(m->rreq);
> m->rreq = NULL;
> err = -EIO;
> goto error;
> @@ -372,6 +373,8 @@ static void p9_read_work(struct work_struct *work)
> "Request tag %d errored out while we were reading the reply\n",
> m->rc.tag);
> err = -EIO;
> + p9_req_put(m->rreq);
> + m->rreq = NULL;
> goto error;
> }
> spin_unlock(&m->client->lock);


for tcp, we still have that request in m->req_list, so we should be
calling p9_client_cb which will do the p9_req_put in p9_conn_cancel.

If you do it here, you'll get a refcount overflow and use after free.

> diff --git a/net/9p/trans_rdma.c b/net/9p/trans_rdma.c
> index 88e563826674..82b5d6894ee2 100644
> --- a/net/9p/trans_rdma.c
> +++ b/net/9p/trans_rdma.c
> @@ -317,6 +317,7 @@ recv_done(struct ib_cq *cq, struct ib_wc *wc)
> /* Check that we have not yet received a reply for this request.
> */
> if (unlikely(req->rc.sdata)) {
> + p9_req_put(req);
> pr_err("Duplicate reply for request %d", tag);
> goto err_out;
> }

This one isn't as clear cut, I see that they put the client in a
FLUSHING state but nothing seems to acton on it... But if this happens
we're already in the use after free realm -- it means rc.sdata was
already set so the other thread could be calling p9_client_cb anytime if
it already hasn't, and yet another thread will then do the final ref put
and free this.
We shouldn't free this here as that would also be an overflow. The best
possible thing to do at this point is just to stop using that pointer.


If you actually run into a problem with these refcounts (should get a
warning on umount that something didn't get freed) then by all mean
let's look further into it, but please don't send such patches without
testing the error paths you're "fixing" -- I'm pretty sure a reproducer
to hit these paths would bark errors in dmesg as refcount has an
overflow check.

--
Dominique

2022-07-12 03:32:16

by Hangyu Hua

[permalink] [raw]
Subject: Re: [PATCH] net: 9p: fix possible refcount leak in p9_read_work() and recv_done()

On 2022/7/11 15:39, [email protected] wrote:
> Hangyu Hua wrote on Mon, Jul 11, 2022 at 02:59:07PM +0800:
>> A ref got in p9_tag_lookup needs to be put when functions enter the
>> error path.
>>
>> Fix this by adding p9_req_put in error path.
>
> I wish it was that simple.
>
> Did you actually observe a leak? >
>> diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
>> index 8f8f95e39b03..c4ccb7b9e1bf 100644
>> --- a/net/9p/trans_fd.c
>> +++ b/net/9p/trans_fd.c
>> @@ -343,6 +343,7 @@ static void p9_read_work(struct work_struct *work)
>> p9_debug(P9_DEBUG_ERROR,
>> "No recv fcall for tag %d (req %p), disconnecting!\n",
>> m->rc.tag, m->rreq);
>> + p9_req_put(m->rreq);
>> m->rreq = NULL;
>> err = -EIO;
>> goto error;
>> @@ -372,6 +373,8 @@ static void p9_read_work(struct work_struct *work)
>> "Request tag %d errored out while we were reading the reply\n",
>> m->rc.tag);
>> err = -EIO;
>> + p9_req_put(m->rreq);
>> + m->rreq = NULL;
>> goto error;
>> }
>> spin_unlock(&m->client->lock);
>
>
> for tcp, we still have that request in m->req_list, so we should be
> calling p9_client_cb which will do the p9_req_put in p9_conn_cancel.
>
> If you do it here, you'll get a refcount overflow and use after free.
>


That's a little weird. If you are right, the three return paths of this
function are inconsistent with the handling of refcount.

static void p9_read_work(struct work_struct *work)
{
...
if ((m->rreq) && (m->rc.offset == m->rc.capacity)) {
p9_debug(P9_DEBUG_TRANS, "got new packet\n");
m->rreq->rc.size = m->rc.offset;
spin_lock(&m->client->lock);
if (m->rreq->status == REQ_STATUS_SENT) {
list_del(&m->rreq->req_list);
p9_client_cb(m->client, m->rreq, REQ_STATUS_RCVD); <---- [1]
} else if (m->rreq->status == REQ_STATUS_FLSHD) {
/* Ignore replies associated with a cancelled request. */
p9_debug(P9_DEBUG_TRANS,
"Ignore replies associated with a cancelled request\n"); <---- [2]
} else {
spin_unlock(&m->client->lock);
p9_debug(P9_DEBUG_ERROR,
"Request tag %d errored out while we were reading the reply\n",
m->rc.tag);
err = -EIO;
goto error; <---- [3]
}
spin_unlock(&m->client->lock);
m->rc.sdata = NULL;
m->rc.offset = 0;
m->rc.capacity = 0;
p9_req_put(m->rreq); <---- [4]
m->rreq = NULL;
}
...
error:
p9_conn_cancel(m, err); <---- [5]
clear_bit(Rworksched, &m->wsched);
}

There are three return paths here, [1] and [2] and [3].

[1]: m->rreq will be put twice in [1] and [4]. And m->rreq will be
deleted from the m->req_list in [1].

[2]: m->rreq will be put in [4]. And m->rreq will not be deleted from
m->req_list.

[3]: m->rreq will be put in [5]. And m->rreq will be deleted from the
m->req_list in [5].

If p9_tag_lookup keep the refcount of req which is in m->req_list. There
will be a double put in return path [1] and a potential UAF in return
path [2]. And this also means a req in m->req_list without getting
refcount before p9_tag_lookup.

static void p9_write_work(struct work_struct *work)
{
...
list_move_tail(&req->req_list, &m->req_list);

m->wbuf = req->tc.sdata;
m->wsize = req->tc.size;
m->wpos = 0;
p9_req_get(req);
...
}

But if you check out p9_write_work, a refcount already get after
list_move_tail. We don't need to rely on p9_tag_lookup to keep a list's
refcount. Whatsmore, code comments in p9_tag_alloc also proves that the
refcount get by p9_tag_lookup is a temporary refcount.

So i still think there may be a refcount leak.

>> diff --git a/net/9p/trans_rdma.c b/net/9p/trans_rdma.c
>> index 88e563826674..82b5d6894ee2 100644
>> --- a/net/9p/trans_rdma.c
>> +++ b/net/9p/trans_rdma.c
>> @@ -317,6 +317,7 @@ recv_done(struct ib_cq *cq, struct ib_wc *wc)
>> /* Check that we have not yet received a reply for this request.
>> */
>> if (unlikely(req->rc.sdata)) {
>> + p9_req_put(req);
>> pr_err("Duplicate reply for request %d", tag);
>> goto err_out;
>> }
>
> This one isn't as clear cut, I see that they put the client in a
> FLUSHING state but nothing seems to acton on it... But if this happens
> we're already in the use after free realm -- it means rc.sdata was
> already set so the other thread could be calling p9_client_cb anytime if
> it already hasn't, and yet another thread will then do the final ref put
> and free this.
> We shouldn't free this here as that would also be an overflow. The best
> possible thing to do at this point is just to stop using that pointer.
>

But p9_tag_lookup have a lock inside. Doesn't this mean p9_tag_lookup
won't return a freed req? Otherwise we should fix the lock to avoid
falling into the use after free realm.

Thanks,
Hangyu
>
> If you actually run into a problem with these refcounts (should get a
> warning on umount that something didn't get freed) then by all mean
> let's look further into it, but please don't send such patches without
> testing the error paths you're "fixing" -- I'm pretty sure a reproducer
> to hit these paths would bark errors in dmesg as refcount has an
> overflow check.
>
> --
> Dominique

2022-07-12 06:57:37

by Dominique Martinet

[permalink] [raw]
Subject: Re: [PATCH] net: 9p: fix possible refcount leak in p9_read_work() and recv_done()

Hangyu Hua wrote on Tue, Jul 12, 2022 at 11:24:36AM +0800:
> That's a little weird. If you are right, the three return paths of this
> function are inconsistent with the handling of refcount.
>
> static void p9_read_work(struct work_struct *work)
> {
> ...
> if ((m->rreq) && (m->rc.offset == m->rc.capacity)) {
> p9_debug(P9_DEBUG_TRANS, "got new packet\n");
> m->rreq->rc.size = m->rc.offset;
> spin_lock(&m->client->lock);
> if (m->rreq->status == REQ_STATUS_SENT) {
> list_del(&m->rreq->req_list);
> p9_client_cb(m->client, m->rreq, REQ_STATUS_RCVD); <---- [1]
> } else if (m->rreq->status == REQ_STATUS_FLSHD) {
> /* Ignore replies associated with a cancelled request. */
> p9_debug(P9_DEBUG_TRANS,
> "Ignore replies associated with a cancelled request\n"); <---- [2]
> } else {
> spin_unlock(&m->client->lock);
> p9_debug(P9_DEBUG_ERROR,
> "Request tag %d errored out while we were reading the reply\n",
> m->rc.tag);
> err = -EIO;
> goto error; <---- [3]
> }
> spin_unlock(&m->client->lock);
> m->rc.sdata = NULL;
> m->rc.offset = 0;
> m->rc.capacity = 0;
> p9_req_put(m->rreq); <---- [4]
> m->rreq = NULL;
> }
> ...
> error:
> p9_conn_cancel(m, err); <---- [5]
> clear_bit(Rworksched, &m->wsched);
> }
>
> There are three return paths here, [1] and [2] and [3].
> [1]: m->rreq will be put twice in [1] and [4]. And m->rreq will be deleted
> from the m->req_list in [1].
>
> [2]: m->rreq will be put in [4]. And m->rreq will not be deleted from
> m->req_list.

when req status got put to FLUSHD the req was dropped from the list
already and put in p9_fd_cancel, so we shouldn't put it here.

> [3]: m->rreq will be put in [5]. And m->rreq will be deleted from the
> m->req_list in [5].

On this error case I really can't say anything: it depends on how the
req got in this state in the first place -- more precisely is it still
in req_list or not?

But even if it is and we leak it here, we return an error here, so the
connection will be marked as disconnected and won't be usable anymore.
The memory will be freed when the user umounts after that.

If we took the time to re-init the rreq->req_list everytime we could
check if it's empty (don't think we can rely on it being poisoned), but
I just don't think it's worth it: it's better to consume a bit more
memory until umount than to risk a UAF.

(note: while writing this I noticed p9_tag_cleanup() in
p9_client_destroy() only tracks requests still in the idr, so doesn't
work for requests that went through p9_tag_remove().
We don't need p9_tag_remove() anymore so I've just gotten rid of it and
we will catch these now)


> If p9_tag_lookup keep the refcount of req which is in m->req_list. There
> will be a double put in return path [1] and a potential UAF in return path
> [2]. And this also means a req in m->req_list without getting refcount
> before p9_tag_lookup.

That is the nominal path, we'd notice immediately if there are too many
puts there.
A request is initialized with two refs so that we can have one for the
transport ((a), for fd, "is the request tracked in a list?") and one for
the main thread ((b), p9_client_rpc which will put it at the end)
Then you get a third ref from p9_tag_lookup that I was forgetting about,
(c).

Going through [1] removes it from the list, and removes the associated
ref (a), then through p9_client_cb which removes ref (c) and wakes up
p9_client_rpc which takes the last ref (b), freeing the request.


Now you were correct on one of these error paths not described in your
last mail: we -are- missing a p9_req_ut in the "No recv fcall for tag
%d" error path shortly after p9_tag_lookup, for the ref obtained from
p9_tag_lookup itself -- feel free to resend a patch with just that one.
But once again the connection is now unusable and it'll be caught on
umount so it's not the end of the world...

(I'd appreciate if you base the new patch on top of
https://github.com/martinetd/linux/commits/9p-next )

>
> static void p9_write_work(struct work_struct *work)
> {
> ...
> list_move_tail(&req->req_list, &m->req_list);
>
> m->wbuf = req->tc.sdata;
> m->wsize = req->tc.size;
> m->wpos = 0;
> p9_req_get(req);
> ...
> }
>
> But if you check out p9_write_work, a refcount already get after
> list_move_tail. We don't need to rely on p9_tag_lookup to keep a list's
> refcount.

This refcount is because we are keeping a ref in m->wreq, and is freed
when m->wreq is set back to null when the packet is done writing a few
lines below (but possibly in another call of the function).

refs don't have to come from p9_tag_lookup, they're managing pointers
lifecycle: we're making a copy of the pointer, so we should increment
the refcount so another thread can't free the req under us. In this case
the p9_req_get() is under the trans fd m->client->lock where we got the
req from the list, so req can't be freed between its obtention from the
list and then; once the lock is dropped the req is protected by the ref.


> Whatsmore, code comments in p9_tag_alloc also proves that the
> refcount get by p9_tag_lookup is a temporary refcount.

comments don't prove anything, but yes I forgot p9_tag_alloc takes a ref
when I commented earlier, sorry.

> > This one isn't as clear cut, I see that they put the client in a
> > FLUSHING state but nothing seems to acton on it... But if this happens
> > we're already in the use after free realm -- it means rc.sdata was
> > already set so the other thread could be calling p9_client_cb anytime if
> > it already hasn't, and yet another thread will then do the final ref put
> > and free this.
> > We shouldn't free this here as that would also be an overflow. The best
> > possible thing to do at this point is just to stop using that pointer.
> >
>
> But p9_tag_lookup have a lock inside. Doesn't this mean p9_tag_lookup won't
> return a freed req? Otherwise we should fix the lock to avoid falling into
> the use after free realm.

Right, that falls into the p9_tag_lookup ref, I had implemented this
better than I thought I did...

I agree that one is also more correct to add, although I'd really want
to make some rdma setup and trigger a few errors to test.

--
Dominique

2022-07-12 08:45:26

by Hangyu Hua

[permalink] [raw]
Subject: Re: [PATCH] net: 9p: fix possible refcount leak in p9_read_work() and recv_done()

On 2022/7/12 14:06, [email protected] wrote:
> Hangyu Hua wrote on Tue, Jul 12, 2022 at 11:24:36AM +0800:
>> That's a little weird. If you are right, the three return paths of this
>> function are inconsistent with the handling of refcount.
>>
>> static void p9_read_work(struct work_struct *work)
>> {
>> ...
>> if ((m->rreq) && (m->rc.offset == m->rc.capacity)) {
>> p9_debug(P9_DEBUG_TRANS, "got new packet\n");
>> m->rreq->rc.size = m->rc.offset;
>> spin_lock(&m->client->lock);
>> if (m->rreq->status == REQ_STATUS_SENT) {
>> list_del(&m->rreq->req_list);
>> p9_client_cb(m->client, m->rreq, REQ_STATUS_RCVD); <---- [1]
>> } else if (m->rreq->status == REQ_STATUS_FLSHD) {
>> /* Ignore replies associated with a cancelled request. */
>> p9_debug(P9_DEBUG_TRANS,
>> "Ignore replies associated with a cancelled request\n"); <---- [2]
>> } else {
>> spin_unlock(&m->client->lock);
>> p9_debug(P9_DEBUG_ERROR,
>> "Request tag %d errored out while we were reading the reply\n",
>> m->rc.tag);
>> err = -EIO;
>> goto error; <---- [3]
>> }
>> spin_unlock(&m->client->lock);
>> m->rc.sdata = NULL;
>> m->rc.offset = 0;
>> m->rc.capacity = 0;
>> p9_req_put(m->rreq); <---- [4]
>> m->rreq = NULL;
>> }
>> ...
>> error:
>> p9_conn_cancel(m, err); <---- [5]
>> clear_bit(Rworksched, &m->wsched);
>> }
>>
>> There are three return paths here, [1] and [2] and [3].
>> [1]: m->rreq will be put twice in [1] and [4]. And m->rreq will be deleted
>> from the m->req_list in [1].
>>
>> [2]: m->rreq will be put in [4]. And m->rreq will not be deleted from
>> m->req_list.
>
> when req status got put to FLUSHD the req was dropped from the list
> already and put in p9_fd_cancel, so we shouldn't put it here.
>
>> [3]: m->rreq will be put in [5]. And m->rreq will be deleted from the
>> m->req_list in [5].
>
> On this error case I really can't say anything: it depends on how the
> req got in this state in the first place -- more precisely is it still
> in req_list or not?
>
> But even if it is and we leak it here, we return an error here, so the
> connection will be marked as disconnected and won't be usable anymore.
> The memory will be freed when the user umounts after that.
>
> If we took the time to re-init the rreq->req_list everytime we could
> check if it's empty (don't think we can rely on it being poisoned), but
> I just don't think it's worth it: it's better to consume a bit more
> memory until umount than to risk a UAF.
>
> (note: while writing this I noticed p9_tag_cleanup() in
> p9_client_destroy() only tracks requests still in the idr, so doesn't
> work for requests that went through p9_tag_remove().
> We don't need p9_tag_remove() anymore so I've just gotten rid of it and
> we will catch these now)
>
>
>> If p9_tag_lookup keep the refcount of req which is in m->req_list. There
>> will be a double put in return path [1] and a potential UAF in return path
>> [2]. And this also means a req in m->req_list without getting refcount
>> before p9_tag_lookup.
>
> That is the nominal path, we'd notice immediately if there are too many
> puts there.
> A request is initialized with two refs so that we can have one for the
> transport ((a), for fd, "is the request tracked in a list?") and one for
> the main thread ((b), p9_client_rpc which will put it at the end)
> Then you get a third ref from p9_tag_lookup that I was forgetting about,
> (c).
>
> Going through [1] removes it from the list, and removes the associated
> ref (a), then through p9_client_cb which removes ref (c) and wakes up
> p9_client_rpc which takes the last ref (b), freeing the request.
>

I think the normal path is right beacuase p9_tag_lookup and [4] keep the
balance of refcount. This just proves that error path may have refcount
leak. Beacause error path only put refcount once. In general, either [4]
of the normal path is redundant(like you said, this is easy to catch),
or the error path may have a refcount leak.

But you are right, it'll be caught on umount.

>
> Now you were correct on one of these error paths not described in your
> last mail: we -are- missing a p9_req_ut in the "No recv fcall for tag
> %d" error path shortly after p9_tag_lookup, for the ref obtained from
> p9_tag_lookup itself -- feel free to resend a patch with just that one.
> But once again the connection is now unusable and it'll be caught on
> umount so it's not the end of the world...
>
> (I'd appreciate if you base the new patch on top of
> https://github.com/martinetd/linux/commits/9p-next )
>

I see. I will make a new patch later.

>>
>> static void p9_write_work(struct work_struct *work)
>> {
>> ...
>> list_move_tail(&req->req_list, &m->req_list);
>>
>> m->wbuf = req->tc.sdata;
>> m->wsize = req->tc.size;
>> m->wpos = 0;
>> p9_req_get(req);
>> ...
>> }
>>
>> But if you check out p9_write_work, a refcount already get after
>> list_move_tail. We don't need to rely on p9_tag_lookup to keep a list's
>> refcount.
>
> This refcount is because we are keeping a ref in m->wreq, and is freed
> when m->wreq is set back to null when the packet is done writing a few
> lines below (but possibly in another call of the function).
>
> refs don't have to come from p9_tag_lookup, they're managing pointers
> lifecycle: we're making a copy of the pointer, so we should increment
> the refcount so another thread can't free the req under us. In this case
> the p9_req_get() is under the trans fd m->client->lock where we got the
> req from the list, so req can't be freed between its obtention from the
> list and then; once the lock is dropped the req is protected by the ref.
>
>

My fault. I misunderstood here.

Thanks,
Hangyu



>> Whatsmore, code comments in p9_tag_alloc also proves that the
>> refcount get by p9_tag_lookup is a temporary refcount.
>
> comments don't prove anything, but yes I forgot p9_tag_alloc takes a ref
> when I commented earlier, sorry.
>
>>> This one isn't as clear cut, I see that they put the client in a
>>> FLUSHING state but nothing seems to acton on it... But if this happens
>>> we're already in the use after free realm -- it means rc.sdata was
>>> already set so the other thread could be calling p9_client_cb anytime if
>>> it already hasn't, and yet another thread will then do the final ref put
>>> and free this.
>>> We shouldn't free this here as that would also be an overflow. The best
>>> possible thing to do at this point is just to stop using that pointer.
>>>
>>
>> But p9_tag_lookup have a lock inside. Doesn't this mean p9_tag_lookup won't
>> return a freed req? Otherwise we should fix the lock to avoid falling into
>> the use after free realm.
>
> Right, that falls into the p9_tag_lookup ref, I had implemented this
> better than I thought I did...
>
> I agree that one is also more correct to add, although I'd really want
> to make some rdma setup and trigger a few errors to test.
> > --
> Dominique