LinuxLists.cc - Small fixes/changes for RDS

2014-10-01 21:50:00

Subject: Small fixes/changes for RDS

Hi,

I got a report of one issue within RDS (after investigation it was a double
free), and I'm sending the fix (patch 3/3) which reporter said it works (no more
WARNING triggered on a specially instrumented kernel). The report/test was done
on a very old kernel (RHEL 5, 2.6.18 based with backports), but the problem the
patch handles still exists and should not change. Besides that, while
reviewing some of the code but being unable to reproduce with rds_tcp, I
noticed two small improvements/fixes which are in patches 1 and 2.

thanks,
--
[]'s
Herton

2014-10-01 21:50:04

by Herton R. Krzesinski

[permalink] [raw]

Subject: [PATCH 1/3] net/rds: call rds_conn_drop instead of open code it at rds_connect_complete

Signed-off-by: Herton R. Krzesinski <[email protected]>
---
net/rds/threads.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/rds/threads.c b/net/rds/threads.c
index 65eaefc..dc2402e 100644
--- a/net/rds/threads.c
+++ b/net/rds/threads.c
@@ -78,8 +78,7 @@ void rds_connect_complete(struct rds_connection *conn)
"current state is %d\n",
__func__,
atomic_read(&conn->c_state));
- atomic_set(&conn->c_state, RDS_CONN_ERROR);
- queue_work(rds_wq, &conn->c_down_w);
+ rds_conn_drop(conn);
return;
}

--
1.9.3

2014-10-01 21:50:11

by Herton R. Krzesinski

[permalink] [raw]

Subject: [PATCH 3/3] net/rds: fix possible double free on sock tear down

I got a report of a double free happening at RDS slab cache. One
suspicion was that may be somewhere we were doing a sock_hold/sock_put
on an already freed sock. Thus after providing a kernel with the
following change:

static inline void sock_hold(struct sock *sk)
{
- atomic_inc(&sk->sk_refcnt);
+ if (!atomic_inc_not_zero(&sk->sk_refcnt))
+ WARN(1, "Trying to hold sock already gone: %p (family: %hd)\n",
+ sk, sk->sk_family);
}

The warning successfuly triggered:

Trying to hold sock already gone: ffff81f6dda61280 (family: 21)
WARNING: at include/net/sock.h:350 sock_hold()
Call Trace:
<IRQ> [<ffffffff8adac135>] :rds:rds_send_remove_from_sock+0xf0/0x21b
[<ffffffff8adad35c>] :rds:rds_send_drop_acked+0xbf/0xcf
[<ffffffff8addf546>] :rds_rdma:rds_ib_recv_tasklet_fn+0x256/0x2dc
[<ffffffff8009899a>] tasklet_action+0x8f/0x12b
[<ffffffff800125a2>] __do_softirq+0x89/0x133
[<ffffffff8005f30c>] call_softirq+0x1c/0x28
[<ffffffff8006e644>] do_softirq+0x2c/0x7d
[<ffffffff8006e4d4>] do_IRQ+0xee/0xf7
[<ffffffff8005e625>] ret_from_intr+0x0/0xa
<EOI>

Looking at the call chain above, the only way I think this would be
possible is if somewhere we already released the same socket->sock which
is assigned to the rds_message at rds_send_remove_from_sock. Which seems
only possible to happen after the tear down done on rds_release.

rds_release properly calls rds_send_drop_to to drop the socket from any
rds_message, and some proper synchronization is in place to avoid race
with rds_send_drop_acked/rds_send_remove_from_sock. However, I still see
a very narrow window where it may be possible we touch a sock already
released: when rds_release races with rds_send_drop_acked, we check
RDS_MSG_ON_CONN to avoid cleanup on the same rds_message, but in this
specific case we don't clear rm->m_rs. In this case, it seems we could
then go on at rds_send_drop_to and after it returns, the sock is freed
by last sock_put on rds_release, with concurrently we being at
rds_send_remove_from_sock; then at some point in the loop at
rds_send_remove_from_sock we process an rds_message which didn't have
rm->m_rs unset for a freed sock, and a possible sock_hold on an sock
already gone at rds_release happens.

This hopefully address the described condition above and avoids a double
free on "second last" sock_put. In addition, I removed the comment about
socket destruction on top of rds_send_drop_acked: we call rds_send_drop_to
in rds_release and we should have things properly serialized there, thus
I can't see the comment being accurate there.

Signed-off-by: Herton R. Krzesinski <[email protected]>
---
net/rds/send.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/net/rds/send.c b/net/rds/send.c
index 2371816..0a64541 100644
--- a/net/rds/send.c
+++ b/net/rds/send.c
@@ -593,8 +593,11 @@ static void rds_send_remove_from_sock(struct list_head *messages, int status)
sock_put(rds_rs_to_sk(rs));
}
rs = rm->m_rs;
- sock_hold(rds_rs_to_sk(rs));
+ if (rs)
+ sock_hold(rds_rs_to_sk(rs));
}
+ if (!rs)
+ goto unlock_and_drop;
spin_lock(&rs->rs_lock);

if (test_and_clear_bit(RDS_MSG_ON_SOCK, &rm->m_flags)) {
@@ -638,9 +641,6 @@ unlock_and_drop:
* queue. This means that in the TCP case, the message may not have been
* assigned the m_ack_seq yet - but that's fine as long as tcp_is_acked
* checks the RDS_MSG_HAS_ACK_SEQ bit.
- *
- * XXX It's not clear to me how this is safely serialized with socket
- * destruction. Maybe it should bail if it sees SOCK_DEAD.
*/
void rds_send_drop_acked(struct rds_connection *conn, u64 ack,
is_acked_func is_acked)
@@ -711,6 +711,9 @@ void rds_send_drop_to(struct rds_sock *rs, struct sockaddr_in *dest)
*/
if (!test_and_clear_bit(RDS_MSG_ON_CONN, &rm->m_flags)) {
spin_unlock_irqrestore(&conn->c_lock, flags);
+ spin_lock_irqsave(&rm->m_rs_lock, flags);
+ rm->m_rs = NULL;
+ spin_unlock_irqrestore(&rm->m_rs_lock, flags);
continue;
}
list_del_init(&rm->m_conn_item);
--
1.9.3

2014-10-01 21:50:53

by Herton R. Krzesinski

[permalink] [raw]

Subject: [PATCH 2/3] net/rds: do proper house keeping if connection fails in rds_tcp_conn_connect

I see two problems if we consider the sock->ops->connect attempt to fail in
rds_tcp_conn_connect. The first issue is that for example we don't remove the
previously added rds_tcp_connection item to rds_tcp_tc_list at
rds_tcp_set_callbacks, which means that on a next reconnect attempt for the
same rds_connection, when rds_tcp_conn_connect is called we can again call
rds_tcp_set_callbacks, resulting in duplicated items on rds_tcp_tc_list,
leading to list corruption: to avoid this just make sure we call
properly rds_tcp_restore_callbacks before we exit. The second issue
is that we should also release the sock properly, by setting sock = NULL
only if we are returning without error.

Signed-off-by: Herton R. Krzesinski <[email protected]>
---
net/rds/tcp_connect.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/rds/tcp_connect.c b/net/rds/tcp_connect.c
index a65ee78..f9f564a 100644
--- a/net/rds/tcp_connect.c
+++ b/net/rds/tcp_connect.c
@@ -106,11 +106,14 @@ int rds_tcp_conn_connect(struct rds_connection *conn)
rds_tcp_set_callbacks(sock, conn);
ret = sock->ops->connect(sock, (struct sockaddr *)&dest, sizeof(dest),
O_NONBLOCK);
- sock = NULL;

rdsdebug("connect to address %pI4 returned %d\n", &conn->c_faddr, ret);
if (ret == -EINPROGRESS)
ret = 0;
+ if (ret == 0)
+ sock = NULL;
+ else
+ rds_tcp_restore_callbacks(sock, conn->c_transport_data);

out:
if (sock)
--
1.9.3

2014-10-03 19:50:57

by David Miller

[permalink] [raw]

Subject: Re: Small fixes/changes for RDS

From: "Herton R. Krzesinski" <[email protected]>
Date: Wed, 1 Oct 2014 18:49:51 -0300

> I got a report of one issue within RDS (after investigation it was a double
> free), and I'm sending the fix (patch 3/3) which reporter said it works (no more
> WARNING triggered on a specially instrumented kernel). The report/test was done
> on a very old kernel (RHEL 5, 2.6.18 based with backports), but the problem the
> patch handles still exists and should not change. Besides that, while
> reviewing some of the code but being unable to reproduce with rds_tcp, I
> noticed two small improvements/fixes which are in patches 1 and 2.

Series applied, thanks.