2017-12-20 03:13:11

by Yafang Shao

[permalink] [raw]
Subject: [PATCH v3 net-next 0/5] replace tcp_set_state tracepoint with inet_sock_set_state

According to the discussion in the mail thread
https://patchwork.kernel.org/patch/10099243/,
tcp_set_state tracepoint is renamed to inet_sock_set_state tracepoint and is
moved to include/trace/events/sock.h.

With this new tracepoint, we can trace AF_INET/AF_INET6 sock state transitions.
As there's only one single tracepoint for inet, so I didn't create a new trace
file named trace/events/inet_sock.h, and just place it in
include/trace/events/sock.h

Currently TCP/DCCP/SCTP state transitions are traced with this tracepoint.

- Why not more protocol ?
If we really think that anonter protocol should be traced, I will modify the
code to trace it.
I just want to make the code easy and not output useless information.

Steven Rostedt (VMware) (1):
tcp: Export to userspace the TCP state names for the trace events

Yafang Shao (4):
net: tracepoint: replace tcp_set_state tracepoint with
inet_sock_set_state tracepoint
net: sock: replace sk_state_load with inet_sk_state_load and remove
sk_state_store
net: tracepoint: using sock_set_state tracepoint to trace DCCP state
transition
net: tracepoint: using sock_set_state tracepoint to trace SCTP state
transition

include/net/inet_sock.h | 25 ++++++++++
include/net/sock.h | 25 ----------
include/trace/events/sock.h | 107 ++++++++++++++++++++++++++++++++++++++++
include/trace/events/tcp.h | 16 ------
net/dccp/proto.c | 2 +-
net/ipv4/af_inet.c | 14 ++++++
net/ipv4/inet_connection_sock.c | 8 +--
net/ipv4/inet_hashtables.c | 2 +-
net/ipv4/tcp.c | 10 ++--
net/ipv4/tcp_diag.c | 2 +-
net/ipv4/tcp_ipv4.c | 2 +-
net/ipv6/tcp_ipv6.c | 2 +-
net/sctp/endpointola.c | 2 +-
net/sctp/sm_sideeffect.c | 4 +-
net/sctp/socket.c | 12 ++---
15 files changed, 167 insertions(+), 66 deletions(-)

--
1.8.3.1


2017-12-20 03:13:21

by Yafang Shao

[permalink] [raw]
Subject: [PATCH v3 net-next 1/5] tcp: Export to userspace the TCP state names for the trace events

From: "Steven Rostedt (VMware)" <[email protected]>

The TCP trace events (specifically tcp_set_state), maps emums to symbol
names via __print_symbolic(). But this only works for reading trace events
from the tracefs trace files. If perf or trace-cmd were to record these
events, the event format file does not convert the enum names into numbers,
and you get something like:

__print_symbolic(REC->oldstate,
{ TCP_ESTABLISHED, "TCP_ESTABLISHED" },
{ TCP_SYN_SENT, "TCP_SYN_SENT" },
{ TCP_SYN_RECV, "TCP_SYN_RECV" },
{ TCP_FIN_WAIT1, "TCP_FIN_WAIT1" },
{ TCP_FIN_WAIT2, "TCP_FIN_WAIT2" },
{ TCP_TIME_WAIT, "TCP_TIME_WAIT" },
{ TCP_CLOSE, "TCP_CLOSE" },
{ TCP_CLOSE_WAIT, "TCP_CLOSE_WAIT" },
{ TCP_LAST_ACK, "TCP_LAST_ACK" },
{ TCP_LISTEN, "TCP_LISTEN" },
{ TCP_CLOSING, "TCP_CLOSING" },
{ TCP_NEW_SYN_RECV, "TCP_NEW_SYN_RECV" })

Where trace-cmd and perf do not know the values of those enums.

Use the TRACE_DEFINE_ENUM() macros that will have the trace events convert
the enum strings into their values at system boot. This will allow perf and
trace-cmd to see actual numbers and not enums:

__print_symbolic(REC->oldstate,
{ 1, "TCP_ESTABLISHED" },
{ 2, "TCP_SYN_SENT" },
{ 3, "TCP_SYN_RECV" },
{ 4, "TCP_FIN_WAIT1" },
{ 5, "TCP_FIN_WAIT2" },
{ 6, "TCP_TIME_WAIT" },
{ 7, "TCP_CLOSE" },
{ 8, "TCP_CLOSE_WAIT" },
{ 9, "TCP_LAST_ACK" },
{ 10, "TCP_LISTEN" },
{ 11, "TCP_CLOSING" },
{ 12, "TCP_NEW_SYN_RECV" })

Signed-off-by: Steven Rostedt (VMware) <[email protected]>
Acked-by: Song Liu <[email protected]>
Signed-off-by: Yafang Shao <[email protected]>
---
include/trace/events/tcp.h | 41 ++++++++++++++++++++++++++++-------------
1 file changed, 28 insertions(+), 13 deletions(-)

diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h
index 07cccca..ec52fb3 100644
--- a/include/trace/events/tcp.h
+++ b/include/trace/events/tcp.h
@@ -9,21 +9,36 @@
#include <linux/tracepoint.h>
#include <net/ipv6.h>

+#define tcp_state_names \
+ EM(TCP_ESTABLISHED) \
+ EM(TCP_SYN_SENT) \
+ EM(TCP_SYN_RECV) \
+ EM(TCP_FIN_WAIT1) \
+ EM(TCP_FIN_WAIT2) \
+ EM(TCP_TIME_WAIT) \
+ EM(TCP_CLOSE) \
+ EM(TCP_CLOSE_WAIT) \
+ EM(TCP_LAST_ACK) \
+ EM(TCP_LISTEN) \
+ EM(TCP_CLOSING) \
+ EMe(TCP_NEW_SYN_RECV) \
+
+/* enums need to be exported to user space */
+#undef EM
+#undef EMe
+#define EM(a) TRACE_DEFINE_ENUM(a);
+#define EMe(a) TRACE_DEFINE_ENUM(a);
+
+tcp_state_names
+
+#undef EM
+#undef EMe
+#define EM(a) tcp_state_name(a),
+#define EMe(a) tcp_state_name(a)
+
#define tcp_state_name(state) { state, #state }
#define show_tcp_state_name(val) \
- __print_symbolic(val, \
- tcp_state_name(TCP_ESTABLISHED), \
- tcp_state_name(TCP_SYN_SENT), \
- tcp_state_name(TCP_SYN_RECV), \
- tcp_state_name(TCP_FIN_WAIT1), \
- tcp_state_name(TCP_FIN_WAIT2), \
- tcp_state_name(TCP_TIME_WAIT), \
- tcp_state_name(TCP_CLOSE), \
- tcp_state_name(TCP_CLOSE_WAIT), \
- tcp_state_name(TCP_LAST_ACK), \
- tcp_state_name(TCP_LISTEN), \
- tcp_state_name(TCP_CLOSING), \
- tcp_state_name(TCP_NEW_SYN_RECV))
+ __print_symbolic(val, tcp_state_names)

/*
* tcp event with arguments sk and skb
--
1.8.3.1

2017-12-20 03:13:29

by Yafang Shao

[permalink] [raw]
Subject: [PATCH v3 net-next 2/5] net: tracepoint: replace tcp_set_state tracepoint with inet_sock_set_state tracepoint

As sk_state is a common field for struct sock, so the state
transition tracepoint should not be a TCP specific feature.
Currently it traces all AF_INET state transition, so I rename this
tracepoint to inet_sock_set_state tracepoint with some minor changes and move it
into trace/events/sock.h.
We dont need to create a file named trace/events/inet_sock.h for this one single
tracepoint.

Two helpers are introduced to trace sk_state transition
- void inet_sk_state_store(struct sock *sk, int newstate);
- void inet_sk_set_state(struct sock *sk, int state);
As trace header should not be included in other header files,
so they are defined in sock.c.

The protocol such as SCTP maybe compiled as a ko, hence export
inet_sk_set_state().

Signed-off-by: Yafang Shao <[email protected]>
---
include/net/inet_sock.h | 2 +
include/trace/events/sock.h | 107 ++++++++++++++++++++++++++++++++++++++++
include/trace/events/tcp.h | 31 ------------
net/ipv4/af_inet.c | 14 ++++++
net/ipv4/inet_connection_sock.c | 6 +--
net/ipv4/inet_hashtables.c | 2 +-
net/ipv4/tcp.c | 6 +--
7 files changed, 128 insertions(+), 40 deletions(-)

diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index 39efb96..a3431a4 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -290,6 +290,8 @@ static inline void inet_sk_copy_descendant(struct sock *sk_to,
#endif

int inet_sk_rebuild_header(struct sock *sk);
+void inet_sk_set_state(struct sock *sk, int state);
+void inet_sk_state_store(struct sock *sk, int newstate);

static inline unsigned int __inet_ehashfn(const __be32 laddr,
const __u16 lport,
diff --git a/include/trace/events/sock.h b/include/trace/events/sock.h
index ec4dade..3b9094a 100644
--- a/include/trace/events/sock.h
+++ b/include/trace/events/sock.h
@@ -6,7 +6,50 @@
#define _TRACE_SOCK_H

#include <net/sock.h>
+#include <net/ipv6.h>
#include <linux/tracepoint.h>
+#include <linux/ipv6.h>
+#include <linux/tcp.h>
+
+/* The protocol traced by sock_set_state */
+#define inet_protocol_names \
+ EM(IPPROTO_TCP) \
+ EM(IPPROTO_DCCP) \
+ EMe(IPPROTO_SCTP)
+
+#define tcp_state_names \
+ EM(TCP_ESTABLISHED) \
+ EM(TCP_SYN_SENT) \
+ EM(TCP_SYN_RECV) \
+ EM(TCP_FIN_WAIT1) \
+ EM(TCP_FIN_WAIT2) \
+ EM(TCP_TIME_WAIT) \
+ EM(TCP_CLOSE) \
+ EM(TCP_CLOSE_WAIT) \
+ EM(TCP_LAST_ACK) \
+ EM(TCP_LISTEN) \
+ EM(TCP_CLOSING) \
+ EMe(TCP_NEW_SYN_RECV)
+
+/* enums need to be exported to user space */
+#undef EM
+#undef EMe
+#define EM(a) TRACE_DEFINE_ENUM(a);
+#define EMe(a) TRACE_DEFINE_ENUM(a);
+
+inet_protocol_names
+tcp_state_names
+
+#undef EM
+#undef EMe
+#define EM(a) { a, #a },
+#define EMe(a) { a, #a }
+
+#define show_inet_protocol_name(val) \
+ __print_symbolic(val, inet_protocol_names)
+
+#define show_tcp_state_name(val) \
+ __print_symbolic(val, tcp_state_names)

TRACE_EVENT(sock_rcvqueue_full,

@@ -63,6 +106,70 @@
__entry->rmem_alloc)
);

+TRACE_EVENT(inet_sock_set_state,
+
+ TP_PROTO(const struct sock *sk, const int oldstate, const int newstate),
+
+ TP_ARGS(sk, oldstate, newstate),
+
+ TP_STRUCT__entry(
+ __field(const void *, skaddr)
+ __field(int, oldstate)
+ __field(int, newstate)
+ __field(__u16, sport)
+ __field(__u16, dport)
+ __field(__u8, protocol)
+ __array(__u8, saddr, 4)
+ __array(__u8, daddr, 4)
+ __array(__u8, saddr_v6, 16)
+ __array(__u8, daddr_v6, 16)
+ ),
+
+ TP_fast_assign(
+ struct inet_sock *inet = inet_sk(sk);
+ struct in6_addr *pin6;
+ __be32 *p32;
+
+ __entry->skaddr = sk;
+ __entry->oldstate = oldstate;
+ __entry->newstate = newstate;
+
+ __entry->protocol = sk->sk_protocol;
+ __entry->sport = ntohs(inet->inet_sport);
+ __entry->dport = ntohs(inet->inet_dport);
+
+ p32 = (__be32 *) __entry->saddr;
+ *p32 = inet->inet_saddr;
+
+ p32 = (__be32 *) __entry->daddr;
+ *p32 = inet->inet_daddr;
+
+#if IS_ENABLED(CONFIG_IPV6)
+ if (sk->sk_family == AF_INET6) {
+ pin6 = (struct in6_addr *)__entry->saddr_v6;
+ *pin6 = sk->sk_v6_rcv_saddr;
+ pin6 = (struct in6_addr *)__entry->daddr_v6;
+ *pin6 = sk->sk_v6_daddr;
+ } else
+#endif
+ {
+ pin6 = (struct in6_addr *)__entry->saddr_v6;
+ ipv6_addr_set_v4mapped(inet->inet_saddr, pin6);
+ pin6 = (struct in6_addr *)__entry->daddr_v6;
+ ipv6_addr_set_v4mapped(inet->inet_daddr, pin6);
+ }
+ ),
+
+ TP_printk("protocol=%s sport=%hu dport=%hu saddr=%pI4 daddr=%pI4"
+ "saddrv6=%pI6c daddrv6=%pI6c oldstate=%s newstate=%s",
+ show_inet_protocol_name(__entry->protocol),
+ __entry->sport, __entry->dport,
+ __entry->saddr, __entry->daddr,
+ __entry->saddr_v6, __entry->daddr_v6,
+ show_tcp_state_name(__entry->oldstate),
+ show_tcp_state_name(__entry->newstate))
+);
+
#endif /* _TRACE_SOCK_H */

/* This part must be outside protection */
diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h
index ec52fb3..8e88a16 100644
--- a/include/trace/events/tcp.h
+++ b/include/trace/events/tcp.h
@@ -9,37 +9,6 @@
#include <linux/tracepoint.h>
#include <net/ipv6.h>

-#define tcp_state_names \
- EM(TCP_ESTABLISHED) \
- EM(TCP_SYN_SENT) \
- EM(TCP_SYN_RECV) \
- EM(TCP_FIN_WAIT1) \
- EM(TCP_FIN_WAIT2) \
- EM(TCP_TIME_WAIT) \
- EM(TCP_CLOSE) \
- EM(TCP_CLOSE_WAIT) \
- EM(TCP_LAST_ACK) \
- EM(TCP_LISTEN) \
- EM(TCP_CLOSING) \
- EMe(TCP_NEW_SYN_RECV) \
-
-/* enums need to be exported to user space */
-#undef EM
-#undef EMe
-#define EM(a) TRACE_DEFINE_ENUM(a);
-#define EMe(a) TRACE_DEFINE_ENUM(a);
-
-tcp_state_names
-
-#undef EM
-#undef EMe
-#define EM(a) tcp_state_name(a),
-#define EMe(a) tcp_state_name(a)
-
-#define tcp_state_name(state) { state, #state }
-#define show_tcp_state_name(val) \
- __print_symbolic(val, tcp_state_names)
-
/*
* tcp event with arguments sk and skb
*
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index f00499a..bab98a4 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -121,6 +121,7 @@
#endif
#include <net/l3mdev.h>

+#include <trace/events/sock.h>

/* The inetsw table contains everything that inet_create needs to
* build a new socket.
@@ -1220,6 +1221,19 @@ int inet_sk_rebuild_header(struct sock *sk)
}
EXPORT_SYMBOL(inet_sk_rebuild_header);

+void inet_sk_set_state(struct sock *sk, int state)
+{
+ trace_inet_sock_set_state(sk, sk->sk_state, state);
+ sk->sk_state = state;
+}
+EXPORT_SYMBOL(inet_sk_set_state);
+
+void inet_sk_state_store(struct sock *sk, int newstate)
+{
+ trace_inet_sock_set_state(sk, sk->sk_state, newstate);
+ smp_store_release(&sk->sk_state, newstate);
+}
+
struct sk_buff *inet_gso_segment(struct sk_buff *skb,
netdev_features_t features)
{
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 4ca46dc..f460fc0 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -783,7 +783,7 @@ struct sock *inet_csk_clone_lock(const struct sock *sk,
if (newsk) {
struct inet_connection_sock *newicsk = inet_csk(newsk);

- newsk->sk_state = TCP_SYN_RECV;
+ inet_sk_set_state(newsk, TCP_SYN_RECV);
newicsk->icsk_bind_hash = NULL;

inet_sk(newsk)->inet_dport = inet_rsk(req)->ir_rmt_port;
@@ -877,7 +877,7 @@ int inet_csk_listen_start(struct sock *sk, int backlog)
* It is OK, because this socket enters to hash table only
* after validation is complete.
*/
- sk_state_store(sk, TCP_LISTEN);
+ inet_sk_state_store(sk, TCP_LISTEN);
if (!sk->sk_prot->get_port(sk, inet->inet_num)) {
inet->inet_sport = htons(inet->inet_num);

@@ -888,7 +888,7 @@ int inet_csk_listen_start(struct sock *sk, int backlog)
return 0;
}

- sk->sk_state = TCP_CLOSE;
+ inet_sk_set_state(sk, TCP_CLOSE);
return err;
}
EXPORT_SYMBOL_GPL(inet_csk_listen_start);
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index f6f5810..37b7da0 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -544,7 +544,7 @@ bool inet_ehash_nolisten(struct sock *sk, struct sock *osk)
sock_prot_inuse_add(sock_net(sk), sk->sk_prot, 1);
} else {
percpu_counter_inc(sk->sk_prot->orphan_count);
- sk->sk_state = TCP_CLOSE;
+ inet_sk_set_state(sk, TCP_CLOSE);
sock_set_flag(sk, SOCK_DEAD);
inet_csk_destroy_sock(sk);
}
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index c470fec..d408fb4 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -283,8 +283,6 @@
#include <asm/ioctls.h>
#include <net/busy_poll.h>

-#include <trace/events/tcp.h>
-
struct percpu_counter tcp_orphan_count;
EXPORT_SYMBOL_GPL(tcp_orphan_count);

@@ -2040,8 +2038,6 @@ void tcp_set_state(struct sock *sk, int state)
{
int oldstate = sk->sk_state;

- trace_tcp_set_state(sk, oldstate, state);
-
switch (state) {
case TCP_ESTABLISHED:
if (oldstate != TCP_ESTABLISHED)
@@ -2065,7 +2061,7 @@ void tcp_set_state(struct sock *sk, int state)
/* Change state AFTER socket is unhashed to avoid closed
* socket sitting in hash tables.
*/
- sk_state_store(sk, state);
+ inet_sk_state_store(sk, state);

#ifdef STATE_TRACE
SOCK_DEBUG(sk, "TCP sk=%p, State %s -> %s\n", sk, statename[oldstate], statename[state]);
--
1.8.3.1

2017-12-20 03:13:40

by Yafang Shao

[permalink] [raw]
Subject: [PATCH v3 net-next 5/5] net: tracepoint: using sock_set_state tracepoint to trace SCTP state transition

With changes in inet_ files, SCTP state transitions are traced with
inet_sock_set_state tracepoint.
As SCTP state names, i.e. SCTP_SS_CLOSED, SCTP_SS_ESTABLISHED,
have the same value with TCP state names. So the output info still print
the TCP state names, that makes the code easy.

Signed-off-by: Yafang Shao <[email protected]>
---
net/sctp/endpointola.c | 2 +-
net/sctp/sm_sideeffect.c | 4 ++--
net/sctp/socket.c | 12 ++++++------
3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/net/sctp/endpointola.c b/net/sctp/endpointola.c
index ee1e601..8b31468 100644
--- a/net/sctp/endpointola.c
+++ b/net/sctp/endpointola.c
@@ -232,7 +232,7 @@ void sctp_endpoint_free(struct sctp_endpoint *ep)
{
ep->base.dead = true;

- ep->base.sk->sk_state = SCTP_SS_CLOSED;
+ inet_sk_set_state(ep->base.sk, SCTP_SS_CLOSED);

/* Unlink this endpoint, so we can't find it again! */
sctp_unhash_endpoint(ep);
diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
index 8adde71..c0c3ec6 100644
--- a/net/sctp/sm_sideeffect.c
+++ b/net/sctp/sm_sideeffect.c
@@ -878,12 +878,12 @@ static void sctp_cmd_new_state(struct sctp_cmd_seq *cmds,
* successfully completed a connect() call.
*/
if (sctp_state(asoc, ESTABLISHED) && sctp_sstate(sk, CLOSED))
- sk->sk_state = SCTP_SS_ESTABLISHED;
+ inet_sk_set_state(sk, SCTP_SS_ESTABLISHED);

/* Set the RCV_SHUTDOWN flag when a SHUTDOWN is received. */
if (sctp_state(asoc, SHUTDOWN_RECEIVED) &&
sctp_sstate(sk, ESTABLISHED)) {
- sk->sk_state = SCTP_SS_CLOSING;
+ inet_sk_set_state(sk, SCTP_SS_CLOSING);
sk->sk_shutdown |= RCV_SHUTDOWN;
}
}
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 7eec0a0..59b5689 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -1544,7 +1544,7 @@ static void sctp_close(struct sock *sk, long timeout)

lock_sock_nested(sk, SINGLE_DEPTH_NESTING);
sk->sk_shutdown = SHUTDOWN_MASK;
- sk->sk_state = SCTP_SS_CLOSING;
+ inet_sk_set_state(sk, SCTP_SS_CLOSING);

ep = sctp_sk(sk)->ep;

@@ -4653,7 +4653,7 @@ static void sctp_shutdown(struct sock *sk, int how)
if (how & SEND_SHUTDOWN && !list_empty(&ep->asocs)) {
struct sctp_association *asoc;

- sk->sk_state = SCTP_SS_CLOSING;
+ inet_sk_set_state(sk, SCTP_SS_CLOSING);
asoc = list_entry(ep->asocs.next,
struct sctp_association, asocs);
sctp_primitive_SHUTDOWN(net, asoc, NULL);
@@ -7509,13 +7509,13 @@ static int sctp_listen_start(struct sock *sk, int backlog)
* sockets.
*
*/
- sk->sk_state = SCTP_SS_LISTENING;
+ inet_sk_set_state(sk, SCTP_SS_LISTENING);
if (!ep->base.bind_addr.port) {
if (sctp_autobind(sk))
return -EAGAIN;
} else {
if (sctp_get_port(sk, inet_sk(sk)->inet_num)) {
- sk->sk_state = SCTP_SS_CLOSED;
+ inet_sk_set_state(sk, SCTP_SS_CLOSED);
return -EADDRINUSE;
}
}
@@ -8538,10 +8538,10 @@ static void sctp_sock_migrate(struct sock *oldsk, struct sock *newsk,
* is called, set RCV_SHUTDOWN flag.
*/
if (sctp_state(assoc, CLOSED) && sctp_style(newsk, TCP)) {
- newsk->sk_state = SCTP_SS_CLOSED;
+ inet_sk_set_state(newsk, SCTP_SS_CLOSED);
newsk->sk_shutdown |= RCV_SHUTDOWN;
} else {
- newsk->sk_state = SCTP_SS_ESTABLISHED;
+ inet_sk_set_state(newsk, SCTP_SS_ESTABLISHED);
}

release_sock(newsk);
--
1.8.3.1

2017-12-20 03:13:37

by Yafang Shao

[permalink] [raw]
Subject: [PATCH v3 net-next 4/5] net: tracepoint: using sock_set_state tracepoint to trace DCCP state transition

With changes in inet_ files, DCCP state transitions are traced with
inet_sock_set_state tracepoint.

Signed-off-by: Yafang Shao <[email protected]>
---
net/dccp/proto.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/dccp/proto.c b/net/dccp/proto.c
index 9d43c1f..7a75a1d 100644
--- a/net/dccp/proto.c
+++ b/net/dccp/proto.c
@@ -110,7 +110,7 @@ void dccp_set_state(struct sock *sk, const int state)
/* Change state AFTER socket is unhashed to avoid closed
* socket sitting in hash tables.
*/
- sk->sk_state = state;
+ inet_sk_set_state(sk, state);
}

EXPORT_SYMBOL_GPL(dccp_set_state);
--
1.8.3.1

2017-12-20 03:14:27

by Yafang Shao

[permalink] [raw]
Subject: [PATCH v3 net-next 3/5] net: sock: replace sk_state_load with inet_sk_state_load and remove sk_state_store

sk_state_load is only used by AF_INET/AF_INET6, so rename it to
inet_sk_state_load and move it into inet_sock.h.

sk_state_store is removed as it is not used any more.

Signed-off-by: Yafang Shao <[email protected]>
---
include/net/inet_sock.h | 25 ++++++++++++++++++++++++-
include/net/sock.h | 25 -------------------------
net/ipv4/inet_connection_sock.c | 2 +-
net/ipv4/tcp.c | 4 ++--
net/ipv4/tcp_diag.c | 2 +-
net/ipv4/tcp_ipv4.c | 2 +-
net/ipv6/tcp_ipv6.c | 2 +-
7 files changed, 30 insertions(+), 32 deletions(-)

diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index a3431a4..0a671c3 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -290,9 +290,32 @@ static inline void inet_sk_copy_descendant(struct sock *sk_to,
#endif

int inet_sk_rebuild_header(struct sock *sk);
-void inet_sk_set_state(struct sock *sk, int state);
+
+/**
+ * inet_sk_state_load - read sk->sk_state for lockless contexts
+ * @sk: socket pointer
+ *
+ * Paired with inet_sk_state_store(). Used in places we don't hold socket lock:
+ * tcp_diag_get_info(), tcp_get_info(), tcp_poll(), get_tcp4_sock() ...
+ */
+static inline int inet_sk_state_load(const struct sock *sk)
+{
+ /* state change might impact lockless readers. */
+ return smp_load_acquire(&sk->sk_state);
+}
+
+/**
+ * inet_sk_state_store - update sk->sk_state
+ * @sk: socket pointer
+ * @newstate: new state
+ *
+ * Paired with inet_sk_state_load(). Should be used in contexts where
+ * state change might impact lockless readers.
+ */
void inet_sk_state_store(struct sock *sk, int newstate);

+void inet_sk_set_state(struct sock *sk, int state);
+
static inline unsigned int __inet_ehashfn(const __be32 laddr,
const __u16 lport,
const __be32 faddr,
diff --git a/include/net/sock.h b/include/net/sock.h
index 9a90472..4fd211b 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2332,31 +2332,6 @@ static inline bool sk_listener(const struct sock *sk)
return (1 << sk->sk_state) & (TCPF_LISTEN | TCPF_NEW_SYN_RECV);
}

-/**
- * sk_state_load - read sk->sk_state for lockless contexts
- * @sk: socket pointer
- *
- * Paired with sk_state_store(). Used in places we do not hold socket lock :
- * tcp_diag_get_info(), tcp_get_info(), tcp_poll(), get_tcp4_sock() ...
- */
-static inline int sk_state_load(const struct sock *sk)
-{
- return smp_load_acquire(&sk->sk_state);
-}
-
-/**
- * sk_state_store - update sk->sk_state
- * @sk: socket pointer
- * @newstate: new state
- *
- * Paired with sk_state_load(). Should be used in contexts where
- * state change might impact lockless readers.
- */
-static inline void sk_state_store(struct sock *sk, int newstate)
-{
- smp_store_release(&sk->sk_state, newstate);
-}
-
void sock_enable_timestamp(struct sock *sk, int flag);
int sock_get_timestamp(struct sock *, struct timeval __user *);
int sock_get_timestampns(struct sock *, struct timespec __user *);
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index f460fc0..12410ec 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -685,7 +685,7 @@ static void reqsk_timer_handler(struct timer_list *t)
int max_retries, thresh;
u8 defer_accept;

- if (sk_state_load(sk_listener) != TCP_LISTEN)
+ if (inet_sk_state_load(sk_listener) != TCP_LISTEN)
goto drop;

max_retries = icsk->icsk_syn_retries ? : net->ipv4.sysctl_tcp_synack_retries;
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index d408fb4..67d39b7 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -502,7 +502,7 @@ unsigned int tcp_poll(struct file *file, struct socket *sock, poll_table *wait)

sock_poll_wait(file, sk_sleep(sk), wait);

- state = sk_state_load(sk);
+ state = inet_sk_state_load(sk);
if (state == TCP_LISTEN)
return inet_csk_listen_poll(sk);

@@ -2916,7 +2916,7 @@ void tcp_get_info(struct sock *sk, struct tcp_info *info)
if (sk->sk_type != SOCK_STREAM)
return;

- info->tcpi_state = sk_state_load(sk);
+ info->tcpi_state = inet_sk_state_load(sk);

/* Report meaningful fields for all TCP states, including listeners */
rate = READ_ONCE(sk->sk_pacing_rate);
diff --git a/net/ipv4/tcp_diag.c b/net/ipv4/tcp_diag.c
index abbf0ed..81148f7 100644
--- a/net/ipv4/tcp_diag.c
+++ b/net/ipv4/tcp_diag.c
@@ -24,7 +24,7 @@ static void tcp_diag_get_info(struct sock *sk, struct inet_diag_msg *r,
{
struct tcp_info *info = _info;

- if (sk_state_load(sk) == TCP_LISTEN) {
+ if (inet_sk_state_load(sk) == TCP_LISTEN) {
r->idiag_rqueue = sk->sk_ack_backlog;
r->idiag_wqueue = sk->sk_max_ack_backlog;
} else if (sk->sk_type == SOCK_STREAM) {
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 77ea45d..67ef303 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -2281,7 +2281,7 @@ static void get_tcp4_sock(struct sock *sk, struct seq_file *f, int i)
timer_expires = jiffies;
}

- state = sk_state_load(sk);
+ state = inet_sk_state_load(sk);
if (state == TCP_LISTEN)
rx_queue = sk->sk_ack_backlog;
else
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 1f04ec0..af2b2a2 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1795,7 +1795,7 @@ static void get_tcp6_sock(struct seq_file *seq, struct sock *sp, int i)
timer_expires = jiffies;
}

- state = sk_state_load(sp);
+ state = inet_sk_state_load(sp);
if (state == TCP_LISTEN)
rx_queue = sp->sk_ack_backlog;
else
--
1.8.3.1

2017-12-20 19:00:44

by David Miller

[permalink] [raw]
Subject: Re: [PATCH v3 net-next 0/5] replace tcp_set_state tracepoint with inet_sock_set_state

From: Yafang Shao <[email protected]>
Date: Wed, 20 Dec 2017 11:12:49 +0800

> According to the discussion in the mail thread
> https://patchwork.kernel.org/patch/10099243/,
> tcp_set_state tracepoint is renamed to inet_sock_set_state tracepoint and is
> moved to include/trace/events/sock.h.
>
> With this new tracepoint, we can trace AF_INET/AF_INET6 sock state transitions.
> As there's only one single tracepoint for inet, so I didn't create a new trace
> file named trace/events/inet_sock.h, and just place it in
> include/trace/events/sock.h
>
> Currently TCP/DCCP/SCTP state transitions are traced with this tracepoint.
>
> - Why not more protocol ?
> If we really think that anonter protocol should be traced, I will modify the
> code to trace it.
> I just want to make the code easy and not output useless information.

Series applied, thanks.

2017-12-30 22:33:41

by Brendan Gregg

[permalink] [raw]
Subject: Re: [PATCH v3 net-next 2/5] net: tracepoint: replace tcp_set_state tracepoint with inet_sock_set_state tracepoint

On Tue, Dec 19, 2017 at 7:12 PM, Yafang Shao <[email protected]> wrote:
> As sk_state is a common field for struct sock, so the state
> transition tracepoint should not be a TCP specific feature.
> Currently it traces all AF_INET state transition, so I rename this
> tracepoint to inet_sock_set_state tracepoint with some minor changes and move it
> into trace/events/sock.h.

The tcp:tcp_set_state probe is tcp_set_state(), so it's only going to
fire for TCP sessions. It's not broken, and we could add a
sctp:sctp_set_state as well. Replacing tcp:tcp_set_state with
inet_sk_set_state is feeling like we might be baking too much
implementation detail into the tracepoint API.

If we must have inet_sk_set_state, then must we also delete tcp:tcp_set_state?

Brendan


> We dont need to create a file named trace/events/inet_sock.h for this one single
> tracepoint.
>
> Two helpers are introduced to trace sk_state transition
> - void inet_sk_state_store(struct sock *sk, int newstate);
> - void inet_sk_set_state(struct sock *sk, int state);
> As trace header should not be included in other header files,
> so they are defined in sock.c.
>
> The protocol such as SCTP maybe compiled as a ko, hence export
> inet_sk_set_state().
>[...]

2017-12-31 03:07:08

by Yafang Shao

[permalink] [raw]
Subject: Re: [PATCH v3 net-next 2/5] net: tracepoint: replace tcp_set_state tracepoint with inet_sock_set_state tracepoint

On Sun, Dec 31, 2017 at 6:33 AM, Brendan Gregg
<[email protected]> wrote:
> On Tue, Dec 19, 2017 at 7:12 PM, Yafang Shao <[email protected]> wrote:
>> As sk_state is a common field for struct sock, so the state
>> transition tracepoint should not be a TCP specific feature.
>> Currently it traces all AF_INET state transition, so I rename this
>> tracepoint to inet_sock_set_state tracepoint with some minor changes and move it
>> into trace/events/sock.h.
>
> The tcp:tcp_set_state probe is tcp_set_state(), so it's only going to
> fire for TCP sessions. It's not broken, and we could add a
> sctp:sctp_set_state as well. Replacing tcp:tcp_set_state with
> inet_sk_set_state is feeling like we might be baking too much
> implementation detail into the tracepoint API.
>
> If we must have inet_sk_set_state, then must we also delete tcp:tcp_set_state?
>

Hi Brendan,

The reason we have to make this change could be got from this mail
thread, https://patchwork.kernel.org/patch/10099243/ .

The original tcp:tcp_set_state probe doesn't traced all TCP state transitions.
There're some state transitions in inet_connection_sock.c and
inet_hashtables.c are missed.
So we have to place this probe into these two files to fix the issue.
But as inet_connection_sock.c and inet_hashtables.c are common files
for all IPv4 protocols, not only for TCP, so it is not proper to place
a tcp_ function in these two files.
That's why we decide to rename tcp:tcp_set_state probe to
sock:inet_sock_set_state.

Thanks
Yafang

2018-01-02 19:47:00

by Brendan Gregg

[permalink] [raw]
Subject: Re: [PATCH v3 net-next 2/5] net: tracepoint: replace tcp_set_state tracepoint with inet_sock_set_state tracepoint

On Sat, Dec 30, 2017 at 7:06 PM, Yafang Shao <[email protected]> wrote:
> On Sun, Dec 31, 2017 at 6:33 AM, Brendan Gregg
> <[email protected]> wrote:
>> On Tue, Dec 19, 2017 at 7:12 PM, Yafang Shao <[email protected]> wrote:
>>> As sk_state is a common field for struct sock, so the state
>>> transition tracepoint should not be a TCP specific feature.
>>> Currently it traces all AF_INET state transition, so I rename this
>>> tracepoint to inet_sock_set_state tracepoint with some minor changes and move it
>>> into trace/events/sock.h.
>>
>> The tcp:tcp_set_state probe is tcp_set_state(), so it's only going to
>> fire for TCP sessions. It's not broken, and we could add a
>> sctp:sctp_set_state as well. Replacing tcp:tcp_set_state with
>> inet_sk_set_state is feeling like we might be baking too much
>> implementation detail into the tracepoint API.
>>
>> If we must have inet_sk_set_state, then must we also delete tcp:tcp_set_state?
>>
>
> Hi Brendan,
>
> The reason we have to make this change could be got from this mail
> thread, https://patchwork.kernel.org/patch/10099243/ .
>
> The original tcp:tcp_set_state probe doesn't traced all TCP state transitions.
> There're some state transitions in inet_connection_sock.c and
> inet_hashtables.c are missed.
> So we have to place this probe into these two files to fix the issue.
> But as inet_connection_sock.c and inet_hashtables.c are common files
> for all IPv4 protocols, not only for TCP, so it is not proper to place
> a tcp_ function in these two files.
> That's why we decide to rename tcp:tcp_set_state probe to
> sock:inet_sock_set_state.

It kinda feels like we are fixing one exposing-implementation problem
(the missing state changes, which I'm happy to see fixed), by exposing
another (there's no tcp:tcp_set_state because we don't want to put tcp
functions in inet*.c files). Anyway...

If I'm to use sock:inet_sock_set_state for TCP tracing, I'd like
sk->sk_protocol exposed as a tracepoint argument so I can match on
IPPROTO_TCP. Otherwise I'll have to keep digging it out of (void
*)skaddr. (And if we're adding arguments, maybe consider sk_family as
well, to make it easier to see which address arguments to use).

Brendan

2018-01-02 19:52:11

by David Miller

[permalink] [raw]
Subject: Re: [PATCH v3 net-next 2/5] net: tracepoint: replace tcp_set_state tracepoint with inet_sock_set_state tracepoint

From: Brendan Gregg <[email protected]>
Date: Tue, 2 Jan 2018 11:46:26 -0800

> If I'm to use sock:inet_sock_set_state for TCP tracing, I'd like
> sk->sk_protocol exposed as a tracepoint argument so I can match on
> IPPROTO_TCP.

Agreed.

2018-01-03 01:49:38

by Yafang Shao

[permalink] [raw]
Subject: Re: [PATCH v3 net-next 2/5] net: tracepoint: replace tcp_set_state tracepoint with inet_sock_set_state tracepoint

On Wed, Jan 3, 2018 at 3:46 AM, Brendan Gregg <[email protected]> wrote:
> On Sat, Dec 30, 2017 at 7:06 PM, Yafang Shao <[email protected]> wrote:
>> On Sun, Dec 31, 2017 at 6:33 AM, Brendan Gregg
>> <[email protected]> wrote:
>>> On Tue, Dec 19, 2017 at 7:12 PM, Yafang Shao <[email protected]> wrote:
>>>> As sk_state is a common field for struct sock, so the state
>>>> transition tracepoint should not be a TCP specific feature.
>>>> Currently it traces all AF_INET state transition, so I rename this
>>>> tracepoint to inet_sock_set_state tracepoint with some minor changes and move it
>>>> into trace/events/sock.h.
>>>
>>> The tcp:tcp_set_state probe is tcp_set_state(), so it's only going to
>>> fire for TCP sessions. It's not broken, and we could add a
>>> sctp:sctp_set_state as well. Replacing tcp:tcp_set_state with
>>> inet_sk_set_state is feeling like we might be baking too much
>>> implementation detail into the tracepoint API.
>>>
>>> If we must have inet_sk_set_state, then must we also delete tcp:tcp_set_state?
>>>
>>
>> Hi Brendan,
>>
>> The reason we have to make this change could be got from this mail
>> thread, https://patchwork.kernel.org/patch/10099243/ .
>>
>> The original tcp:tcp_set_state probe doesn't traced all TCP state transitions.
>> There're some state transitions in inet_connection_sock.c and
>> inet_hashtables.c are missed.
>> So we have to place this probe into these two files to fix the issue.
>> But as inet_connection_sock.c and inet_hashtables.c are common files
>> for all IPv4 protocols, not only for TCP, so it is not proper to place
>> a tcp_ function in these two files.
>> That's why we decide to rename tcp:tcp_set_state probe to
>> sock:inet_sock_set_state.
>
> It kinda feels like we are fixing one exposing-implementation problem
> (the missing state changes, which I'm happy to see fixed), by exposing
> another (there's no tcp:tcp_set_state because we don't want to put tcp
> functions in inet*.c files). Anyway...
>
> If I'm to use sock:inet_sock_set_state for TCP tracing, I'd like
> sk->sk_protocol exposed as a tracepoint argument so I can match on
> IPPROTO_TCP. Otherwise I'll have to keep digging it out of (void
> *)skaddr. (And if we're adding arguments, maybe consider sk_family as
> well, to make it easier to see which address arguments to use).
>

I will improve it as your suggestion.

Thanks
Yafang