[NET/IPv4]: a lighter UDP-Lite (RFC 3828)
This is a revised RFC resubmission of the UDP-Lite code which, thanks
to suggestions by David Miller, is now drastically reduced in size:
``A fully functional UDP-Lite module in a mere 209 lines !''
I feel that not much more can be removed without obfuscating the code,
but I would like to challenge people on this list to look out for further
opportunities for integration and reduction.
I would further like to hear suggestions for a common naming scheme, now that
some of the UDP functions have been made generic and are shared between UDP
and UDP-Lite.
I will hold off on the UDP(-Lite)v6 part until feedback and comments have been
received; the v6 side will mirror the format of the v4 side.
To get a quick idea of what is happening, it is best to start with udplite.c,
since it also lists all the shared functions. This file is #included into
udp.c -- I wanted to keep functionally different blocks of code logically
separate, but could not see the need for separate compilation.
A detailed changelog is included below.
The code has been tested over several days on i686, i386-SMP, AMD, and sparc64
platforms, using various userland and kernel applications (multicast streaming,
DNS, socket programs, NFS client/server with different file sizes), and on
hardware with TX/RX UDP checksum offload (tg3).
The enclosed patch can be applied to Linus Torvalds' tree. Application code for testing is available at
http://www.erg.abdn.ac.uk/users/gerrit/udp-lite/files/udplite_linux.tar.gz
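For orientation, here is a minimal userland sketch of how an application might enable
partial checksum coverage. It is not part of the patch; the option names/values come from
the patch below (UDPLITE_SEND_CSCOV = 10, UDPLITE_RECV_CSCOV = 11), while the numeric
IPPROTO_UDPLITE/SOL_UDPLITE values are assumptions in case the system headers do not yet
provide them.

#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>

#ifndef IPPROTO_UDPLITE
#define IPPROTO_UDPLITE    136   /* assumed: IANA protocol number for UDP-Lite */
#endif
#ifndef SOL_UDPLITE
#define SOL_UDPLITE        136   /* assumed: socket option level for UDP-Lite  */
#endif
#ifndef UDPLITE_SEND_CSCOV
#define UDPLITE_SEND_CSCOV  10   /* as defined in include/net/udplite.h below  */
#define UDPLITE_RECV_CSCOV  11
#endif

int main(void)
{
        int val, fd = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDPLITE);

        if (fd < 0) {
                perror("socket");
                return 1;
        }
        val = 20;   /* sender: checksum only the first 20 octets of each datagram */
        if (setsockopt(fd, SOL_UDPLITE, UDPLITE_SEND_CSCOV, &val, sizeof(val)) < 0)
                perror("UDPLITE_SEND_CSCOV");

        val = 8;    /* receiver: refuse anything with coverage below the 8-byte header */
        if (setsockopt(fd, SOL_UDPLITE, UDPLITE_RECV_CSCOV, &val, sizeof(val)) < 0)
                perror("UDPLITE_RECV_CSCOV");

        /* ... sendto()/recvfrom() exactly as with ordinary UDP ... */
        return 0;
}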
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
Important things that need to be resolved
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
a) Naming scheme: Several functions are now generic, shared between UDP and UDP-Lite.
They have not been given new names so far. Which naming scheme should be used?
[e.g. `udpl_checksum_complete()' instead of `udp_checksum_complete()'?]
b) udp_v{4,6}_get_port(): as raised earlier, this function appears almost identical in
two places. There has been discussion and resubmission, but no final opinion yet.
Can people please decide whether the suggested integration is OK or not -- the single
get_port algorithm now has a total of four customers: udp4, udplite4, udp6, udplite6.
c) Code cosmetics: I have deferred any cosmetic changes for later, to minimize the patch
size. Eventually, however, I would like to tidy up the code, in particular add more
documentation to the structs and some of the (shared) functions. Suggestions?
d) Shared udp_hash_lock: Is it worth implementing separate rwlocks for UDP and UDP-Lite?
This would make the code quite a lot more complicated, and the shared lock only hurts in
the borderline case where many UDP applications compete at the same time with many
UDP-Lite applications. Would this result in a noticeable performance loss at all?
C h a n g e l o g
1/ Code integration.
The patch follows David's suggestions. In addition, the implementation was made
simpler by exploiting the new `pcflag' member which struct udp_sock now contains.
This flag is set only by UDP-Lite and thus uniquely distinguishes UDP-Lite sockets
from UDP sockets. On UDP sockets, pcflag will always be 0, since the structure is
zeroed out upon allocation.
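Condensed from the patch hunks below (an excerpt, not a compilable unit on its own),
the relevant pieces are:

/* New members of struct udp_sock (include/linux/udp.h): */
__u16 pcslen;               /* sender checksum coverage                */
__u16 pcrlen;               /* receiver minimum coverage               */
__u8  pcflag;               /* marks socket as UDP-Lite if > 0         */

#define IS_UDPLITE(__sk)  (udp_sk(__sk)->pcflag)

/* Only the UDP-Lite proto init function ever sets the flag: */
static inline int udplite_sk_init(struct sock *sk)
{
        udp_sk(sk)->pcflag = UDPLITE_BIT;
        return 0;
}

/* Shared code can then branch cheaply, e.g. in udp_push_pending_frames(): */
if (up->pcflag)
        udplite_csum_outgoing(sk, skb, up->len, cscov, fl->fl4_src, fl->fl4_dst);
else
        udp_csum_outgoing(sk, skb, up->len, fl->fl4_src, fl->fl4_dst);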
2/ No separate UDP-Lite header.
UDP-Lite does not really define a new header structure; rather, it re-interprets the
`len' field of the UDP header with different semantics. A separate `struct udplitehdr'
is therefore not really necessary and would hide the fact that 75% of the header fields
have exactly the same meaning. Thus UDP-Lite now also uses `struct udphdr'; the semantic
difference is taken care of by the code.
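To make the re-interpretation concrete: in UDP, uh->len carries the datagram length,
whereas in UDP-Lite it carries the checksum coverage. The following small, self-contained
user-space sketch (for illustration only) mirrors the receive-side logic of
udplite_checksum_init() from the patch below:

#include <stdio.h>
#include <stdint.h>

/* Returns the effective checksum coverage for a received UDP-Lite packet of
 * `len' octets whose length field carries `cscov', or 0 if the coverage value
 * violates RFC 3828 and the packet has to be discarded. */
static uint16_t udplite_coverage(uint16_t cscov, uint16_t len)
{
        if (cscov == 0)                 /* 0 means: cover the entire packet  */
                return len;
        if (cscov < 8 || cscov > len)   /* must at least cover the header,   */
                return 0;               /* and must not exceed packet length */
        return cscov;                   /* 8 <= cscov <= len: (partial) ok   */
}

int main(void)
{
        printf("%u\n", udplite_coverage( 0, 100));  /* -> 100 (full coverage) */
        printf("%u\n", udplite_coverage(20, 100));  /* ->  20 (partial)       */
        printf("%u\n", udplite_coverage( 4, 100));  /* ->   0 (illegal, drop) */
        return 0;
}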
3/ Code-sharing.
The following functions can now be shared due to reliance on common structures:
* udp_disconnect() (thanks to unified struct udp_sock )
* udp_v4_mcast_next() (thanks to unified struct udp_sock )
* udp_getsockopt() (thanks to unified struct udp_sock )
* do_udp_getsockopt() (thanks to unified struct udp_sock )
* compat_udp_getsockopt() (thanks to unified struct udp_sock )
* udp_encap_rcv() (thanks to unified struct udphdr )
* udp_ioctl() (thanks to unifying both structures)
The following functions have been turned into parameterised ones (a condensed sketch of
the resulting wrapper pattern follows this list):
* udp_v4_get_port() - parameterised as __udp_get_port()
* udp_v4_lookup() - parameterised as __udp_lookup()
* udp_err() - parameterised as __udp_err()
* udp_v4_mcast_deliver() - parameterised as __udp_mcast_deliver()
This was possible thanks to the common use of udp_v4_mcast_next() (see above).
* udp_lport_inuse() - parameterised as __udp_lport_inuse()
This function is unnecessary in net/udp.h ! See earlier patch / discussion
on udp_get_port() in net/ipv6/udp.c
* udp_rcv() - parameterised as __udp_common_rcv()
Functions shared between UDP and UDP-Lite by making them udplite-aware: in several cases
this was more straightforward and cleaner than adding even more functions.
* udp_checksum_complete() and
* __udp_checksum_complete()
The entire checksumming code has been revised and integrated; the UDP checksumming
code is essentially a subset of the UDP-Lite checksumming code, which is a good reason
to consolidate both. See also udp(lite)_checksum_init() and udp(lite)_csum_outgoing().
* udp_recvmsg()
Required minor modifications in 3 places, thanks to unified checksumming procedures.
* udp_poll()
The task which udp_poll() tackles is the same for UDP and UDP-Lite. The gain is
that a separate `struct proto_ops' is not necessary for UDP-Lite; the modification
is trivial thanks to the shared checksum routines.
* udp_sendmsg()
Required only 6 minor changes, removing 189 lines out of udplite.c.
* udp_push_pending_frames()
All checksumming code has been externalized into udp_csum_outgoing() and
udplite_csum_outgoing(), which are called as appropriate. This also makes it easier
to compare the UDP and UDP-Lite checksumming.
* udp_sendpage()
Modifications required one variable and two conditionals.
* udp_queue_rcv_skb()
Thanks to udp_encap_rcv() and unified checksum processing, the differences
essentially amount to updating the MIB statistics variables.
* udp_setsockopt() and
* compat_udp_setsockopt() and
* do_udp_setsockopt()
Setsockopt required three additional tests, to avoid setting UDP-Lite options
on UDP sockets.
The following functions have been merged with existing UDP functions:
* udp_check() - a one-line (non-inline) function which was called only once in v4/udp.c
and not at all in v6/udp.c; it furthermore took an unused udphdr argument.
* udp_v4_lookup_longway() - has been renamed to __udp_lookup(), and the locking code from
udp_v4_lookup() has been folded into it.
These are new functions:
* udp_csum_outgoing() - externalizes existing code into self-contained function
* udplite_csum_outgoing() - symmetrical to udp_csum_outgoing(), also fully self-contained
* udplite_checksum_init() - symmetrical to udp_checksum_init()
* udplite4_proc_init() - simply registers the udplite4-specific seq_afinfo
* udplite4_proc_exit() - symmetrical to udp4_proc_exit()
And, finally, there are functions which required no changes and can trivially be shared:
* udp_v4_hash/unhash()
* udp_flush_pending_frames()
* udp_destroy_sock()
* udp_close()
* udp_seq_open() (thanks to an updated udp_iter_state struct)
* udp_seq_start(), udp_seq_next()
* udp4_seq_show(), udp_seq_stop()
* udp_get_first(), udp_get_next()
* udp4_format_sock()
* udp_get_idx()
* udp_proc_register()
* udp_proc_unregister()
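To illustrate the parameterisation pattern referred to above, these are the two thin
wrappers around __udp_get_port() as they appear in the hunks below; callers and function
pointers remain unchanged, and the other __udp_*() helpers follow the same scheme:

static __inline__ int udp_v4_get_port(struct sock *sk, unsigned short snum)
{
        return __udp_get_port(sk, snum, udp_hash, &udp_port_rover);
}

static __inline__ int udplite_v4_get_port(struct sock *sk, unsigned short snum)
{
        return __udp_get_port(sk, snum, udplite_hash, &udplite_port_rover);
}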
4/ UDP counter bug.
The current UDP implementation increments InDatagrams when the packet is enqueued and
later increments InErrors when the datagram is dequeued due to a failed checksum
(udp_recvmsg(), udp_poll()). This behaviour is contrary to RFC 2013
(cf. http://bugzilla.kernel.org/show_bug.cgi?id=6660). This patch resolves the bug for
both UDP and UDP-Lite by decrementing InDatagrams whenever an enqueued datagram is
removed due to failed input processing, so that the counters reflect the datagrams
actually delivered to applications.
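Distilled from the hunks below, the accounting now works as follows:

/* udp_queue_rcv_skb(): count the datagram optimistically on enqueue */
UDP_INC_STATS_BH(UDP_MIB_INDATAGRAMS, up->pcflag);

/* udp_recvmsg()/udp_poll(): if the deferred checksum fails and the datagram
 * is discarded, take it back out of InDatagrams and charge InErrors instead,
 * so that InDatagrams only reflects datagrams actually delivered: */
UDP_DEC_STATS_BH(UDP_MIB_INDATAGRAMS, is_udplite);
UDP_INC_STATS_BH(UDP_MIB_INERRORS, is_udplite);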
5/ A MIB for UDP-Lite.
UDP-Lite, a `lighter' UDP, currently re-uses the existing SNMPv2 MIB for UDP (RFC 2013).
This is perfectly OK for the moment, since there has been no standardization effort for a
UDP-Lite MIB so far. Experimental UDP-Lite MIB patches will be made available for testing
and research purposes, and will be submitted to this list as soon as a standardization
effort can be discerned.
-- grrtrr
[NET/IPv4]: update for udp.c only, to match 2.6.18-rc4-mm3
This is an update only, as the previous patch can not cope
with recent changes to udp.c (all other files remain the same).
Up-to-date, complete patches can always be taken from
http://www.erg.abdn.ac.uk/users/gerrit/udp-lite/files/udplite_linux.tar.gz
Signed-off-by: Gerrit Renker <[email protected]>
---
udp.c | 606 ++++++++++++++++++++++++++++++++++++++++++++----------------------
1 file changed, 410 insertions(+), 196 deletions(-)
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 514c1e9..4ddd8e6 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -92,10 +92,8 @@ #include <linux/errno.h>
#include <linux/timer.h>
#include <linux/mm.h>
#include <linux/inet.h>
-#include <linux/ipv6.h>
#include <linux/netdevice.h>
#include <net/snmp.h>
-#include <net/ip.h>
#include <net/tcp_states.h>
#include <net/protocol.h>
#include <linux/skbuff.h>
@@ -121,7 +119,19 @@ DEFINE_RWLOCK(udp_hash_lock);
/* Shared by v4/v6 udp. */
int udp_port_rover;
-static int udp_v4_get_port(struct sock *sk, unsigned short snum)
+/* the extensions for UDP-Lite (RFC 3828) */
+#include "udplite.c"
+
+/**
+ * __udp_get_port - find an unbound UDP(-Lite) port
+ *
+ * @sk: udp_sock
+ * @snum: port number to look up
+ * @udptable: hash list table, must be of UDP_HTABLE_SIZE
+ * @port_rover: pointer to record of last unallocated port
+ */
+int __udp_get_port(struct sock *sk, unsigned short snum,
+ struct hlist_head udptable[], int *port_rover)
{
struct hlist_node *node;
struct sock *sk2;
@@ -131,16 +141,16 @@ static int udp_v4_get_port(struct sock *
if (snum == 0) {
int best_size_so_far, best, result, i;
- if (udp_port_rover > sysctl_local_port_range[1] ||
- udp_port_rover < sysctl_local_port_range[0])
- udp_port_rover = sysctl_local_port_range[0];
+ if (*port_rover > sysctl_local_port_range[1] ||
+ *port_rover < sysctl_local_port_range[0])
+ *port_rover = sysctl_local_port_range[0];
best_size_so_far = 32767;
- best = result = udp_port_rover;
+ best = result = *port_rover;
for (i = 0; i < UDP_HTABLE_SIZE; i++, result++) {
struct hlist_head *list;
int size;
- list = &udp_hash[result & (UDP_HTABLE_SIZE - 1)];
+ list = &udptable[result & (UDP_HTABLE_SIZE - 1)];
if (hlist_empty(list)) {
if (result > sysctl_local_port_range[1])
result = sysctl_local_port_range[0] +
@@ -162,16 +172,16 @@ static int udp_v4_get_port(struct sock *
result = sysctl_local_port_range[0]
+ ((result - sysctl_local_port_range[0]) &
(UDP_HTABLE_SIZE - 1));
- if (!udp_lport_inuse(result))
+ if (! __udp_lport_inuse(result, udptable))
break;
}
if (i >= (1 << 16) / UDP_HTABLE_SIZE)
goto fail;
gotit:
- udp_port_rover = snum = result;
+ *port_rover = snum = result;
} else {
sk_for_each(sk2, node,
- &udp_hash[snum & (UDP_HTABLE_SIZE - 1)]) {
+ &udptable[snum & (UDP_HTABLE_SIZE - 1)]) {
struct inet_sock *inet2 = inet_sk(sk2);
if (inet2->num == snum &&
@@ -189,7 +199,7 @@ gotit:
}
inet->num = snum;
if (sk_unhashed(sk)) {
- struct hlist_head *h = &udp_hash[snum & (UDP_HTABLE_SIZE - 1)];
+ struct hlist_head *h = &udptable[snum & (UDP_HTABLE_SIZE - 1)];
sk_add_node(sk, h);
sock_prot_inc_use(sk->sk_prot);
@@ -202,6 +212,11 @@ fail:
return 1;
}
+static __inline__ int udp_v4_get_port(struct sock *sk, unsigned short snum)
+{
+ return __udp_get_port(sk, snum, udp_hash, &udp_port_rover);
+}
+
static void udp_v4_hash(struct sock *sk)
{
BUG();
@@ -217,18 +232,24 @@ static void udp_v4_unhash(struct sock *s
write_unlock_bh(&udp_hash_lock);
}
-/* UDP is nearly always wildcards out the wazoo, it makes no sense to try
- * harder than this. -DaveM
+/**
+ * __udp_lookup - find UDP(-Lite) socket
+ *
+ * @udptable: hash list table, must be of UDP_HTABLE_SIZE
+ *
+ * UDP nearly always wildcards out the wazoo, it makes no sense to try
+ * harder than this. -DaveM
*/
-static struct sock *udp_v4_lookup_longway(u32 saddr, u16 sport,
- u32 daddr, u16 dport, int dif)
+struct sock *__udp_lookup(u32 saddr, u16 sport, u32 daddr, u16 dport, int dif,
+ struct hlist_head udptable[] )
{
struct sock *sk, *result = NULL;
struct hlist_node *node;
unsigned short hnum = ntohs(dport);
int badness = -1;
- sk_for_each(sk, node, &udp_hash[hnum & (UDP_HTABLE_SIZE - 1)]) {
+ read_lock(&udp_hash_lock);
+ sk_for_each(sk, node, &udptable[hnum & (UDP_HTABLE_SIZE - 1)]) {
struct inet_sock *inet = inet_sk(sk);
if (inet->num == hnum && !ipv6_only_sock(sk)) {
@@ -262,20 +283,17 @@ static struct sock *udp_v4_lookup_longwa
}
}
}
+ if (result)
+ sock_hold(result);
+ read_unlock(&udp_hash_lock);
+
return result;
}
static __inline__ struct sock *udp_v4_lookup(u32 saddr, u16 sport,
u32 daddr, u16 dport, int dif)
{
- struct sock *sk;
-
- read_lock(&udp_hash_lock);
- sk = udp_v4_lookup_longway(saddr, sport, daddr, dport, dif);
- if (sk)
- sock_hold(sk);
- read_unlock(&udp_hash_lock);
- return sk;
+ return __udp_lookup(saddr, sport, daddr, dport, dif, udp_hash);
}
static inline struct sock *udp_v4_mcast_next(struct sock *sk,
@@ -306,7 +324,11 @@ found:
return s;
}
-/*
+/**
+ * __udp_err - generic UDP/-Lite error routine
+ *
+ * @udptable: hash list table, must be of UDP_HTABLE_SIZE
+ *
* This routine is called by the ICMP module when it gets some
* sort of error condition. If err < 0 then the socket should
* be closed and the error returned to the user. If err > 0
@@ -316,8 +338,7 @@ found:
* header points to the first 8 bytes of the udp header. We need
* to find the appropriate port.
*/
-
-void udp_err(struct sk_buff *skb, u32 info)
+void __udp_err(struct sk_buff *skb, u32 info, struct hlist_head udptable[])
{
struct inet_sock *inet;
struct iphdr *iph = (struct iphdr*)skb->data;
@@ -328,7 +349,8 @@ void udp_err(struct sk_buff *skb, u32 in
int harderr;
int err;
- sk = udp_v4_lookup(iph->daddr, uh->dest, iph->saddr, uh->source, skb->dev->ifindex);
+ sk = __udp_lookup(iph->daddr, uh->dest, iph->saddr, uh->source,
+ skb->dev->ifindex, udptable );
if (sk == NULL) {
ICMP_INC_STATS_BH(ICMP_MIB_INERRORS);
return; /* No socket for error */
@@ -382,6 +404,11 @@ out:
sock_put(sk);
}
+__inline__ void udp_err(struct sk_buff *skb, u32 info)
+{
+ return __udp_err(skb, info, udp_hash);
+}
+
/*
* Throw away all pending data and cancel the corking. Socket is locked.
*/
@@ -396,33 +423,17 @@ static void udp_flush_pending_frames(str
}
}
-/*
- * Push out all pending data as one UDP datagram. Socket is locked.
- */
-static int udp_push_pending_frames(struct sock *sk, struct udp_sock *up)
+static void udp_csum_outgoing(struct sock *sk, struct sk_buff *skb,
+ int totlen, u32 src, u32 dst )
{
- struct inet_sock *inet = inet_sk(sk);
- struct flowi *fl = &inet->cork.fl;
- struct sk_buff *skb;
- struct udphdr *uh;
- int err = 0;
+ unsigned int csum = 0;
+ struct udphdr *uh = skb->h.uh;
- /* Grab the skbuff where UDP header space exists. */
- if ((skb = skb_peek(&sk->sk_write_queue)) == NULL)
- goto out;
-
- /*
- * Create a UDP header
- */
- uh = skb->h.uh;
- uh->source = fl->fl_ip_sport;
- uh->dest = fl->fl_ip_dport;
- uh->len = htons(up->len);
uh->check = 0;
if (sk->sk_no_check == UDP_CSUM_NOXMIT) {
skb->ip_summed = CHECKSUM_NONE;
- goto send;
+ return;
}
if (skb_queue_len(&sk->sk_write_queue) == 1) {
@@ -431,42 +442,95 @@ static int udp_push_pending_frames(struc
*/
if (skb->ip_summed == CHECKSUM_PARTIAL) {
skb->csum = offsetof(struct udphdr, check);
- uh->check = ~csum_tcpudp_magic(fl->fl4_src, fl->fl4_dst,
- up->len, IPPROTO_UDP, 0);
+ uh->check = ~csum_tcpudp_magic(src, dst, totlen,
+ IPPROTO_UDP, 0 );
+ return;
} else {
- skb->csum = csum_partial((char *)uh,
- sizeof(struct udphdr), skb->csum);
- uh->check = csum_tcpudp_magic(fl->fl4_src, fl->fl4_dst,
- up->len, IPPROTO_UDP, skb->csum);
- if (uh->check == 0)
- uh->check = -1;
+ csum = csum_partial(skb->h.raw, sizeof(struct udphdr),
+ skb->csum );
}
} else {
- unsigned int csum = 0;
/*
- * HW-checksum won't work as there are two or more
+ * HW-checksum won't work as there are two or more
* fragments on the socket so that all csums of sk_buffs
- * should be together.
+ * should be together
*/
if (skb->ip_summed == CHECKSUM_PARTIAL) {
- int offset = (unsigned char *)uh - skb->data;
+ int offset = skb->h.raw - skb->data;
skb->csum = skb_checksum(skb, offset, skb->len - offset, 0);
skb->ip_summed = CHECKSUM_NONE;
} else {
- skb->csum = csum_partial((char *)uh,
+ skb->csum = csum_partial(skb->h.raw,
sizeof(struct udphdr), skb->csum);
}
skb_queue_walk(&sk->sk_write_queue, skb) {
csum = csum_add(csum, skb->csum);
}
- uh->check = csum_tcpudp_magic(fl->fl4_src, fl->fl4_dst,
- up->len, IPPROTO_UDP, csum);
- if (uh->check == 0)
- uh->check = -1;
}
-send:
+
+ uh->check = csum_tcpudp_magic(src, dst, totlen, IPPROTO_UDP, csum);
+ if (uh->check == 0)
+ uh->check = -1;
+}
+
+/*
+ * Push out all pending data as one UDP/-Lite datagram. Socket is locked.
+ */
+static int udp_push_pending_frames(struct sock *sk, struct udp_sock *up)
+{
+ struct inet_sock *inet = inet_sk(sk);
+ struct flowi *fl = &inet->cork.fl;
+ struct sk_buff *skb;
+ struct udphdr *uh;
+ int err = 0;
+ u16 cscov = up->len;
+
+ /* Grab the skbuff where UDP header space exists. */
+ if ((skb = skb_peek(&sk->sk_write_queue)) == NULL)
+ goto out;
+
+ /*
+ * Create a UDP header
+ */
+ uh = skb->h.uh;
+ uh->source = fl->fl_ip_sport;
+ uh->dest = fl->fl_ip_dport;
+ uh->len = htons(up->len);
+
+ /*
+ * If sender has set `partial coverage' socket option on a
+ * UDP-Lite socket, adjust coverage length accordingly.
+ * All other cases default to traditional UDP checksum mode.
+ */
+ if (up->pcflag & UDPLITE_SEND_CC) {
+ if (up->pcslen < up->len) {
+ /* up->pcslen == 0 means that full coverage is required,
+ * partial coverage only if 0 < up->pcslen < up->len */
+ if (0 < up->pcslen) {
+ cscov = up->pcslen;
+ }
+ uh->len = htons(up->pcslen);
+ }
+ /*
+ * NOTE: Causes for the error case `up->pcslen > up->len':
+ * (i) Application error (will not be penalized).
+ * (ii) Payload too big for send buffer: data is split
+ * into several packets, each with its own header.
+ * In this case (e.g. last segment), coverage may
+ * exceed packet length.
+ * Since packets with coverage length > packet length are
+ * illegal, we fall back to the defaults here.
+ */
+ }
+
+ if(up->pcflag)
+ udplite_csum_outgoing(sk, skb, up->len, cscov,
+ fl->fl4_src, fl->fl4_dst);
+ else
+ udp_csum_outgoing(sk, skb, up->len, fl->fl4_src, fl->fl4_dst);
+
err = ip_push_pending_frames(sk);
out:
up->len = 0;
@@ -474,12 +538,11 @@ out:
return err;
}
-
-static unsigned short udp_check(struct udphdr *uh, int len, unsigned long saddr, unsigned long daddr, unsigned long base)
-{
- return(csum_tcpudp_magic(saddr, daddr, len, IPPROTO_UDP, base));
-}
-
+/**
+ * udp_sendmsg - generic UDP/-Lite send routine
+ *
+ * This function is udplite-aware and works for both protocols.
+ */
int udp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
size_t len)
{
@@ -493,8 +556,9 @@ int udp_sendmsg(struct kiocb *iocb, stru
u32 daddr, faddr, saddr;
u16 dport;
u8 tos;
- int err;
+ int err, is_udplite = up->pcflag;
int corkreq = up->corkflag || msg->msg_flags&MSG_MORE;
+ int (*getfrag)(void *, char *, int, int, int, struct sk_buff *);
if (len > 0xFFFF)
return -EMSGSIZE;
@@ -599,7 +663,7 @@ int udp_sendmsg(struct kiocb *iocb, stru
{ .daddr = faddr,
.saddr = saddr,
.tos = tos } },
- .proto = IPPROTO_UDP,
+ .proto = sk->sk_protocol,
.uli_u = { .ports =
{ .sport = inet->sport,
.dport = dport } } };
@@ -645,8 +709,9 @@ back_from_confirm:
do_append_data:
up->len += ulen;
- err = ip_append_data(sk, ip_generic_getfrag, msg->msg_iov, ulen,
- sizeof(struct udphdr), &ipc, rt,
+ getfrag = is_udplite? udplite_getfrag : ip_generic_getfrag;
+ err = ip_append_data(sk, getfrag, msg->msg_iov, ulen,
+ sizeof(struct udphdr), &ipc, rt,
corkreq ? msg->msg_flags|MSG_MORE : msg->msg_flags);
if (err)
udp_flush_pending_frames(sk);
@@ -659,7 +724,7 @@ out:
if (free)
kfree(ipc.opt);
if (!err) {
- UDP_INC_STATS_USER(UDP_MIB_OUTDATAGRAMS);
+ UDP_INC_STATS_USER(UDP_MIB_OUTDATAGRAMS, is_udplite);
return len;
}
/*
@@ -670,7 +735,7 @@ out:
* seems like overkill.
*/
if (err == -ENOBUFS || test_bit(SOCK_NOSPACE, &sk->sk_socket->flags)) {
- UDP_INC_STATS_USER(UDP_MIB_SNDBUFERRORS);
+ UDP_INC_STATS_USER(UDP_MIB_SNDBUFERRORS, is_udplite);
}
return err;
@@ -682,8 +747,13 @@ do_confirm:
goto out;
}
-static int udp_sendpage(struct sock *sk, struct page *page, int offset,
- size_t size, int flags)
+/**
+ * udp_sendpage - generic UDP/-Lite sendpage routine
+ *
+ * This function is udplite-aware and can be used on both sockets.
+ */
+int udp_sendpage(struct sock *sk, struct page *page, int offset,
+ size_t size, int flags)
{
struct udp_sock *up = udp_sk(sk);
int ret;
@@ -731,12 +801,12 @@ out:
}
/*
- * IOCTL requests applicable to the UDP protocol
+ * IOCTL requests applicable to the UDP(-Lite) protocol
*/
-
+
int udp_ioctl(struct sock *sk, int cmd, unsigned long arg)
{
- switch(cmd)
+ switch(cmd)
{
case SIOCOUTQ:
{
@@ -770,29 +840,21 @@ int udp_ioctl(struct sock *sk, int cmd,
return(0);
}
-static __inline__ int __udp_checksum_complete(struct sk_buff *skb)
-{
- return __skb_checksum_complete(skb);
-}
-
-static __inline__ int udp_checksum_complete(struct sk_buff *skb)
-{
- return skb->ip_summed != CHECKSUM_UNNECESSARY &&
- __udp_checksum_complete(skb);
-}
-/*
- * This should be easy, if there is something there we
- * return it, otherwise we block.
+/**
+ * udp_recvmsg - generic UDP/-Lite receive processing
+ *
+ * This routine is udplite-aware and works for both protocols.
+ * Principle: if there is something there we return it, otherwise we block.
*/
-static int udp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
- size_t len, int noblock, int flags, int *addr_len)
+int udp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
+ size_t len, int noblock, int flags, int *addr_len)
{
struct inet_sock *inet = inet_sk(sk);
struct sockaddr_in *sin = (struct sockaddr_in *)msg->msg_name;
struct sk_buff *skb;
- int copied, err;
+ int copied, err, copy_only, is_udplite = IS_UDPLITE(sk);
/*
* Check any passed addresses
@@ -814,21 +876,30 @@ try_again:
msg->msg_flags |= MSG_TRUNC;
}
- if (skb->ip_summed==CHECKSUM_UNNECESSARY) {
- err = skb_copy_datagram_iovec(skb, sizeof(struct udphdr), msg->msg_iov,
- copied);
- } else if (msg->msg_flags&MSG_TRUNC) {
+ /*
+ * Decide whether to checksum and/or copy data.
+ *
+ * UDP: checksum may have been computed in HW,
+ * (re-)compute it if message is truncated.
+ * UDP-Lite: always needs to checksum, no HW support.
+ */
+ copy_only = (skb->ip_summed == CHECKSUM_UNNECESSARY);
+
+ if (is_udplite || (!copy_only && msg->msg_flags&MSG_TRUNC)) {
if (__udp_checksum_complete(skb))
goto csum_copy_err;
- err = skb_copy_datagram_iovec(skb, sizeof(struct udphdr), msg->msg_iov,
- copied);
- } else {
- err = skb_copy_and_csum_datagram_iovec(skb, sizeof(struct udphdr), msg->msg_iov);
+ copy_only = 1;
+ }
+ if (copy_only)
+ err = skb_copy_datagram_iovec(skb, sizeof(struct udphdr),
+ msg->msg_iov, copied );
+ else {
+ err = skb_copy_and_csum_datagram_iovec(skb,
+ sizeof(struct udphdr), msg->msg_iov);
if (err == -EINVAL)
goto csum_copy_err;
}
-
if (err)
goto out_free;
@@ -855,7 +926,8 @@ out:
return err;
csum_copy_err:
- UDP_INC_STATS_BH(UDP_MIB_INERRORS);
+ UDP_DEC_STATS_BH(UDP_MIB_INDATAGRAMS, is_udplite);
+ UDP_INC_STATS_BH(UDP_MIB_INERRORS, is_udplite);
skb_kill_datagram(sk, skb, flags);
@@ -864,7 +936,6 @@ csum_copy_err:
goto try_again;
}
-
int udp_disconnect(struct sock *sk, int flags)
{
struct inet_sock *inet = inet_sk(sk);
@@ -892,8 +963,12 @@ static void udp_close(struct sock *sk, l
sk_common_release(sk);
}
-/* return:
- * 1 if the the UDP system should process it
+/**
+ * udp_encap_rcv - handle encapsulated packets
+ *
+ * This routine is udplite-aware and works on both sockets.
+ * return:
+ * 1 if the the UDP(-Lite) system should process it
* 0 if we should drop this packet
* -1 if it should get processed by xfrm4_rcv_encap
*/
@@ -980,7 +1055,11 @@ #else
#endif
}
-/* returns:
+/**
+ * udp_queue_rcv_skb - receive queue processing
+ *
+ * This routine is udplite-aware and works on both sockets.
+ * returns:
* -1: error
* 0: success
* >0: "udp encap" protocol resubmission
@@ -988,7 +1067,7 @@ #endif
* Note that in the success and error cases, the skb is assumed to
* have either been requeued or freed.
*/
-static int udp_queue_rcv_skb(struct sock * sk, struct sk_buff *skb)
+int udp_queue_rcv_skb(struct sock * sk, struct sk_buff *skb)
{
struct udp_sock *up = udp_sk(sk);
int rc;
@@ -996,10 +1075,8 @@ static int udp_queue_rcv_skb(struct sock
/*
* Charge it to the socket, dropping if the queue is full.
*/
- if (!xfrm4_policy_check(sk, XFRM_POLICY_IN, skb)) {
- kfree_skb(skb);
- return -1;
- }
+ if (!xfrm4_policy_check(sk, XFRM_POLICY_IN, skb))
+ goto drop;
nf_reset(skb);
if (up->encap_type) {
@@ -1010,7 +1087,7 @@ static int udp_queue_rcv_skb(struct sock
* If it's an encapsulateed packet, then pass it to the
* IPsec xfrm input and return the response
* appropriately. Otherwise, just fall through and
- * pass this up the UDP socket.
+ * pass this up the UDP/-Lite socket.
*/
int ret;
@@ -1023,47 +1100,94 @@ static int udp_queue_rcv_skb(struct sock
if (ret < 0) {
/* process the ESP packet */
ret = xfrm4_rcv_encap(skb, up->encap_type);
- UDP_INC_STATS_BH(UDP_MIB_INDATAGRAMS);
+ UDP_INC_STATS_BH(UDP_MIB_INDATAGRAMS, up->pcflag);
return -ret;
}
- /* FALLTHROUGH -- it's a UDP Packet */
+ /* FALLTHROUGH -- it's a UDP/-Lite Packet */
}
- if (sk->sk_filter && skb->ip_summed != CHECKSUM_UNNECESSARY) {
- if (__udp_checksum_complete(skb)) {
- UDP_INC_STATS_BH(UDP_MIB_INERRORS);
- kfree_skb(skb);
- return -1;
+ /*
+ * UDP-Lite specific tests, ignored on UDP sockets
+ */
+ if ((up->pcflag & UDPLITE_RECV_CC) && UDP_SKB_CB(skb)->partial_cov) {
+
+ /*
+ * MIB statistics other than incrementing the error count are
+ * disabled for the following two types of errors: these depend
+ * on the application settings, not on the functioning of the
+ * protocol stack as such.
+ *
+ *
+ * RFC 3828 here recommends (sec 3.3): "There should also be a
+ * way ... to ... at least let the receiving application block
+ * delivery of packets with coverage values less than a value
+ * provided by the application."
+ */
+ if (up->pcrlen == 0) { /* full coverage was set */
+ LIMIT_NETDEBUG(KERN_WARNING "UDPLITE: partial coverage "
+ "%d while full coverage %d requested\n",
+ UDP_SKB_CB(skb)->cscov, skb->len);
+ goto drop;
+ }
+ /* The next case involves violating the min. coverage requested
+ * by the receiver. This is subtle: if receiver wants x and x is
+ * greater than the buffersize/MTU then receiver will complain
+ * that it wants x while sender emits packets of smaller size y.
+ * Therefore the above ...()->partial_cov statement is essential.
+ */
+ if (UDP_SKB_CB(skb)->cscov < up->pcrlen) {
+ LIMIT_NETDEBUG(KERN_WARNING
+ "UDPLITE: coverage %d too small, need min %d\n",
+ UDP_SKB_CB(skb)->cscov, up->pcrlen);
+ goto drop;
}
+ }
+
+ if (sk->sk_filter && skb->ip_summed != CHECKSUM_UNNECESSARY) {
+ if (__udp_checksum_complete(skb))
+ goto drop;
skb->ip_summed = CHECKSUM_UNNECESSARY;
}
if ((rc = sock_queue_rcv_skb(sk,skb)) < 0) {
/* Note that an ENOMEM error is charged twice */
if (rc == -ENOMEM)
- UDP_INC_STATS_BH(UDP_MIB_RCVBUFERRORS);
- UDP_INC_STATS_BH(UDP_MIB_INERRORS);
- kfree_skb(skb);
- return -1;
+ UDP_INC_STATS_BH(UDP_MIB_RCVBUFERRORS, up->pcflag);
+ goto drop;
}
- UDP_INC_STATS_BH(UDP_MIB_INDATAGRAMS);
+
+ /*
+ * XXX Incrementing this counter when the datagram is later taken off
+ * the queue due to receive failure is problematic, cf.
+ * http://bugzilla.kernel.org/show_bug.cgi?id=6660
+ * This module counts correctly by decrementing InDatagrams whenever
+ * the datagram is popped off a queue without being actually delivered,
+ * see udp_recvmsg() and udp_poll().
+ */
+ UDP_INC_STATS_BH(UDP_MIB_INDATAGRAMS, up->pcflag);
return 0;
+
+drop:
+ UDP_INC_STATS_BH(UDP_MIB_INERRORS, up->pcflag);
+ kfree_skb(skb);
+ return -1;
}
-/*
- * Multicasts and broadcasts go to each listener.
+/**
+ * __udp_mcast_deliver - generic multicast delivery
*
+ * UDP(-Lite) multicasts and broadcasts go to each listener.
* Note: called only from the BH handler context,
* so we don't need to lock the hashes.
*/
-static int udp_v4_mcast_deliver(struct sk_buff *skb, struct udphdr *uh,
- u32 saddr, u32 daddr)
+int __udp_mcast_deliver(struct sk_buff *skb, struct udphdr *uh,
+ u32 saddr, u32 daddr, struct hlist_head udptable[])
{
struct sock *sk;
int dif;
read_lock(&udp_hash_lock);
- sk = sk_head(&udp_hash[ntohs(uh->dest) & (UDP_HTABLE_SIZE - 1)]);
+ sk = sk_head(&udptable[ntohs(uh->dest) & (UDP_HTABLE_SIZE - 1)]);
dif = skb->dev->ifindex;
sk = udp_v4_mcast_next(sk, uh->dest, daddr, uh->source, saddr, dif);
if (sk) {
@@ -1092,10 +1216,16 @@ static int udp_v4_mcast_deliver(struct s
return 0;
}
-/* Initialize UDP checksum. If exited with zero value (success),
- * CHECKSUM_UNNECESSARY means, that no more checks are required.
- * Otherwise, csum completion requires chacksumming packet body,
- * including udp header and folding it to skb->csum.
+static __inline__ int udp_v4_mcast_deliver(struct sk_buff *skb,
+ struct udphdr *uh, u32 saddr, u32 daddr)
+{
+ return __udp_mcast_deliver(skb, uh, saddr, daddr, udp_hash);
+}
+
+/* Initialize UDP checksum.
+ * CHECKSUM_UNNECESSARY means that no more checks are required.
+ * Otherwise, csum completion requires checksumming packet body,
+ * including udp header, and folding it to skb->csum.
*/
static void udp_checksum_init(struct sk_buff *skb, struct udphdr *uh,
unsigned short ulen, u32 saddr, u32 daddr)
@@ -1103,7 +1233,7 @@ static void udp_checksum_init(struct sk_
if (uh->check == 0) {
skb->ip_summed = CHECKSUM_UNNECESSARY;
} else if (skb->ip_summed == CHECKSUM_COMPLETE) {
- if (!udp_check(uh, ulen, saddr, daddr, skb->csum))
+ if (!csum_tcpudp_magic(saddr,daddr,ulen, IPPROTO_UDP, skb->csum))
skb->ip_summed = CHECKSUM_UNNECESSARY;
}
if (skb->ip_summed != CHECKSUM_UNNECESSARY)
@@ -1111,51 +1241,61 @@ static void udp_checksum_init(struct sk_
/* Probably, we should checksum udp header (it should be in cache
* in any case) and data in tiny packets (< rx copybreak).
*/
+
+ /* UDP = UDP-Lite with a non-partial checksum coverage */
+ UDP_SKB_CB(skb)->partial_cov = 0;
}
/*
- * All we need to do is get the socket, and then do a checksum.
+ * All we need to do is get the socket, and then do a checksum.
*/
-
-int udp_rcv(struct sk_buff *skb)
+int __udp_common_rcv(struct sk_buff *skb, int is_udplite)
{
struct sock *sk;
- struct udphdr *uh;
+ struct udphdr *uh = skb->h.uh;
unsigned short ulen;
struct rtable *rt = (struct rtable*)skb->dst;
u32 saddr = skb->nh.iph->saddr;
u32 daddr = skb->nh.iph->daddr;
int len = skb->len;
+ struct hlist_head *ht = is_udplite? udplite_hash : udp_hash;
/*
- * Validate the packet and the UDP length.
+ * Validate the packet.
*/
if (!pskb_may_pull(skb, sizeof(struct udphdr)))
- goto no_header;
-
- uh = skb->h.uh;
+ goto drop; /* No space for header. */
ulen = ntohs(uh->len);
- if (ulen > len || ulen < sizeof(*uh))
- goto short_packet;
+ if (! is_udplite) {
+ if (ulen > len || ulen < sizeof(*uh))
+ goto short_packet;
- if (pskb_trim_rcsum(skb, ulen))
- goto short_packet;
+ if (pskb_trim_rcsum(skb, ulen))
+ goto short_packet;
- udp_checksum_init(skb, uh, ulen, saddr, daddr);
+ udp_checksum_init(skb, uh, ulen, saddr, daddr);
+
+ } else { /* UDP-Lite: we must not trim here */
+ if (len < sizeof(*uh))
+ goto short_packet;
+
+ if (! udplite_checksum_init(skb, uh, len, saddr, daddr))
+ goto csum_error;
+ }
if(rt->rt_flags & (RTCF_BROADCAST|RTCF_MULTICAST))
- return udp_v4_mcast_deliver(skb, uh, saddr, daddr);
+ return __udp_mcast_deliver(skb, uh, saddr, daddr, ht);
- sk = udp_v4_lookup(saddr, uh->source, daddr, uh->dest, skb->dev->ifindex);
+ sk = __udp_lookup(saddr, uh->source, daddr, uh->dest, skb->dev->ifindex, ht);
if (sk != NULL) {
int ret = udp_queue_rcv_skb(sk, skb);
sock_put(sk);
/* a return value > 0 means to resubmit the input, but
- * it it wants the return to be -protocol, or 0
+ * it wants the return to be -protocol, or 0
*/
if (ret > 0)
return -ret;
@@ -1170,7 +1310,7 @@ int udp_rcv(struct sk_buff *skb)
if (udp_checksum_complete(skb))
goto csum_error;
- UDP_INC_STATS_BH(UDP_MIB_NOPORTS);
+ UDP_INC_STATS_BH(UDP_MIB_NOPORTS, is_udplite);
icmp_send(skb, ICMP_DEST_UNREACH, ICMP_PORT_UNREACH, 0);
/*
@@ -1181,35 +1321,39 @@ int udp_rcv(struct sk_buff *skb)
return(0);
short_packet:
- LIMIT_NETDEBUG(KERN_DEBUG "UDP: short packet: From %u.%u.%u.%u:%u %d/%d to %u.%u.%u.%u:%u\n",
+ LIMIT_NETDEBUG(KERN_DEBUG "UDP%s: short packet: From %u.%u.%u.%u:%u %d/%d to %u.%u.%u.%u:%u\n",
+ is_udplite? "-Lite" : "",
NIPQUAD(saddr),
ntohs(uh->source),
ulen,
len,
NIPQUAD(daddr),
ntohs(uh->dest));
-no_header:
- UDP_INC_STATS_BH(UDP_MIB_INERRORS);
- kfree_skb(skb);
- return(0);
+ goto drop;
csum_error:
- /*
- * RFC1122: OK. Discards the bad packet silently (as far as
- * the network is concerned, anyway) as per 4.1.3.4 (MUST).
+ /*
+ * RFC1122: OK. Discards the bad packet silently (as far as
+ * the network is concerned, anyway) as per 4.1.3.4 (MUST).
*/
- LIMIT_NETDEBUG(KERN_DEBUG "UDP: bad checksum. From %d.%d.%d.%d:%d to %d.%d.%d.%d:%d ulen %d\n",
+ LIMIT_NETDEBUG(KERN_DEBUG "UDP%s: bad checksum. From %d.%d.%d.%d:%d to %d.%d.%d.%d:%d ulen %d\n",
+ is_udplite? "-Lite" : "",
NIPQUAD(saddr),
ntohs(uh->source),
NIPQUAD(daddr),
ntohs(uh->dest),
ulen);
drop:
- UDP_INC_STATS_BH(UDP_MIB_INERRORS);
+ UDP_INC_STATS_BH(UDP_MIB_INERRORS, is_udplite);
kfree_skb(skb);
return(0);
}
+__inline__ int udp_rcv(struct sk_buff *skb)
+{
+ return __udp_common_rcv(skb, 0);
+}
+
static int udp_destroy_sock(struct sock *sk)
{
lock_sock(sk);
@@ -1219,7 +1363,7 @@ static int udp_destroy_sock(struct sock
}
/*
- * Socket option code for UDP
+ * Socket option code for UDP and UDP-Lite (shared).
*/
static int do_udp_setsockopt(struct sock *sk, int level, int optname,
char __user *optval, int optlen)
@@ -1259,6 +1403,32 @@ static int do_udp_setsockopt(struct sock
}
break;
+ /*
+ * UDP-Lite's partial checksum coverage (RFC 3828).
+ */
+ /* The sender sets actual checksum coverage length via this option.
+ * The case coverage > packet length is handled by send module. */
+ case UDPLITE_SEND_CSCOV:
+ if (!up->pcflag) /* Disable the option on UDP sockets */
+ return -ENOPROTOOPT;
+ if (val != 0 && val < 8) /* Illegal coverage: use default (8) */
+ val = 8;
+ up->pcslen = val;
+ up->pcflag |= UDPLITE_SEND_CC;
+ break;
+
+ /* The receiver specifies a minimum checksum coverage value. To make
+ * sense, this should be set to at least 8 (as done below). If zero is
+ * used, this again means full checksum coverage. */
+ case UDPLITE_RECV_CSCOV:
+ if (!up->pcflag) /* Disable the option on UDP sockets */
+ return -ENOPROTOOPT;
+ if (val != 0 && val < 8) /* Avoid silly minimal values. */
+ val = 8;
+ up->pcrlen = val;
+ up->pcflag |= UDPLITE_RECV_CC;
+ break;
+
default:
err = -ENOPROTOOPT;
break;
@@ -1267,21 +1437,21 @@ static int do_udp_setsockopt(struct sock
return err;
}
-static int udp_setsockopt(struct sock *sk, int level, int optname,
- char __user *optval, int optlen)
+int udp_setsockopt(struct sock *sk, int level, int optname,
+ char __user *optval, int optlen)
{
- if (level != SOL_UDP)
- return ip_setsockopt(sk, level, optname, optval, optlen);
- return do_udp_setsockopt(sk, level, optname, optval, optlen);
+ if (level == SOL_UDP || level == SOL_UDPLITE)
+ return do_udp_setsockopt(sk, level, optname, optval, optlen);
+ return ip_setsockopt(sk, level, optname, optval, optlen);
}
#ifdef CONFIG_COMPAT
-static int compat_udp_setsockopt(struct sock *sk, int level, int optname,
- char __user *optval, int optlen)
+int compat_udp_setsockopt(struct sock *sk, int level, int optname,
+ char __user *optval, int optlen)
{
- if (level != SOL_UDP)
- return compat_ip_setsockopt(sk, level, optname, optval, optlen);
- return do_udp_setsockopt(sk, level, optname, optval, optlen);
+ if (level == SOL_UDP || level == SOL_UDPLITE)
+ return do_udp_setsockopt(sk, level, optname, optval, optlen);
+ return compat_ip_setsockopt(sk, level, optname, optval, optlen);
}
#endif
@@ -1308,6 +1478,15 @@ static int do_udp_getsockopt(struct sock
val = up->encap_type;
break;
+ /* the following two always return 0 on UDP sockets */
+ case UDPLITE_SEND_CSCOV:
+ val = up->pcslen;
+ break;
+
+ case UDPLITE_RECV_CSCOV:
+ val = up->pcrlen;
+ break;
+
default:
return -ENOPROTOOPT;
};
@@ -1319,25 +1498,26 @@ static int do_udp_getsockopt(struct sock
return 0;
}
-static int udp_getsockopt(struct sock *sk, int level, int optname,
- char __user *optval, int __user *optlen)
+int udp_getsockopt(struct sock *sk, int level, int optname,
+ char __user *optval, int __user *optlen )
{
- if (level != SOL_UDP)
- return ip_getsockopt(sk, level, optname, optval, optlen);
- return do_udp_getsockopt(sk, level, optname, optval, optlen);
+ if (level == SOL_UDP || level == SOL_UDPLITE)
+ return do_udp_getsockopt(sk, level, optname, optval, optlen);
+ return ip_getsockopt(sk, level, optname, optval, optlen);
}
#ifdef CONFIG_COMPAT
-static int compat_udp_getsockopt(struct sock *sk, int level, int optname,
- char __user *optval, int __user *optlen)
+int compat_udp_getsockopt(struct sock *sk, int level, int optname,
+ char __user *optval, int __user *optlen)
{
- if (level != SOL_UDP)
- return compat_ip_getsockopt(sk, level, optname, optval, optlen);
- return do_udp_getsockopt(sk, level, optname, optval, optlen);
+ if (level == SOL_UDP || level == SOL_UDPLITE)
+ return do_udp_getsockopt(sk, level, optname, optval, optlen);
+ return compat_ip_getsockopt(sk, level, optname, optval, optlen);
}
#endif
+
/**
- * udp_poll - wait for a UDP event.
+ * udp_poll - wait for a UDP(-Lite) event.
* @file - file struct
* @sock - socket
* @wait - poll table
@@ -1348,11 +1528,14 @@ #endif
* then it could get return from select indicating data available
* but then block when reading it. Add special case code
* to work around these arguably broken applications.
+ *
+ * The routine is udplite-aware and works for both protocols.
*/
unsigned int udp_poll(struct file *file, struct socket *sock, poll_table *wait)
{
unsigned int mask = datagram_poll(file, sock, wait);
struct sock *sk = sock->sk;
+ int is_lite = IS_UDPLITE(sk);
/* Check for false positives due to checksum errors */
if ( (mask & POLLRDNORM) &&
@@ -1364,7 +1547,11 @@ unsigned int udp_poll(struct file *file,
spin_lock_bh(&rcvq->lock);
while ((skb = skb_peek(rcvq)) != NULL) {
if (udp_checksum_complete(skb)) {
- UDP_INC_STATS_BH(UDP_MIB_INERRORS);
+ /* The datagram has already been counted as
+ * InDatagram when earlier it was enqueued.
+ * Update count of really received datagrams. */
+ UDP_DEC_STATS_BH(UDP_MIB_INDATAGRAMS, is_lite);
+ UDP_INC_STATS_BH(UDP_MIB_INERRORS, is_lite);
__skb_unlink(skb, rcvq);
kfree_skb(skb);
} else {
@@ -1407,6 +1594,31 @@ #ifdef CONFIG_COMPAT
#endif
};
+struct proto udplite_prot = {
+ .name = "UDP-Lite",
+ .owner = THIS_MODULE,
+ .close = udp_close,
+ .connect = ip4_datagram_connect,
+ .disconnect = udp_disconnect,
+ .ioctl = udp_ioctl,
+ .init = udplite_sk_init,
+ .destroy = udp_destroy_sock,
+ .setsockopt = udp_setsockopt,
+ .getsockopt = udp_getsockopt,
+ .sendmsg = udp_sendmsg,
+ .recvmsg = udp_recvmsg,
+ .sendpage = udp_sendpage,
+ .backlog_rcv = udp_queue_rcv_skb,
+ .hash = udp_v4_hash,
+ .unhash = udp_v4_unhash,
+ .get_port = udplite_v4_get_port,
+ .obj_size = sizeof(struct udp_sock),
+#ifdef CONFIG_COMPAT
+ .compat_setsockopt = compat_udp_setsockopt,
+ .compat_getsockopt = compat_udp_getsockopt,
+#endif
+};
+
/* ------------------------------------------------------------------------ */
#ifdef CONFIG_PROC_FS
@@ -1417,7 +1629,7 @@ static struct sock *udp_get_first(struct
for (state->bucket = 0; state->bucket < UDP_HTABLE_SIZE; ++state->bucket) {
struct hlist_node *node;
- sk_for_each(sk, node, &udp_hash[state->bucket]) {
+ sk_for_each(sk, node, state->hashtable + state->bucket) {
if (sk->sk_family == state->family)
goto found;
}
@@ -1438,7 +1650,7 @@ try_again:
} while (sk && sk->sk_family != state->family);
if (!sk && ++state->bucket < UDP_HTABLE_SIZE) {
- sk = sk_head(&udp_hash[state->bucket]);
+ sk = sk_head(state->hashtable + state->bucket);
goto try_again;
}
return sk;
@@ -1488,6 +1700,7 @@ static int udp_seq_open(struct inode *in
if (!s)
goto out;
s->family = afinfo->family;
+ s->hashtable = afinfo->hashtable;
s->seq_ops.start = udp_seq_start;
s->seq_ops.next = udp_seq_next;
s->seq_ops.show = afinfo->seq_show;
@@ -1554,7 +1767,7 @@ static void udp4_format_sock(struct sock
atomic_read(&sp->sk_refcnt), sp);
}
-static int udp4_seq_show(struct seq_file *seq, void *v)
+int udp4_seq_show(struct seq_file *seq, void *v)
{
if (v == SEQ_START_TOKEN)
seq_printf(seq, "%-127s\n",
@@ -1577,6 +1790,7 @@ static struct udp_seq_afinfo udp4_seq_af
.owner = THIS_MODULE,
.name = "udp",
.family = AF_INET,
+ .hashtable = udp_hash,
.seq_show = udp4_seq_show,
.seq_fops = &udp4_seq_fops,
};
On 8/28/06, [email protected] <[email protected]> wrote:
> [NET/IPv4]: update for udp.c only, to match 2.6.18-rc4-mm3
>
> This is an update only, as the previous patch can not cope
> with recent changes to udp.c (all other files remain the same).
>
> Up-to-date, complete patches can always be taken from
> http://www.erg.abdn.ac.uk/users/gerrit/udp-lite/files/udplite_linux.tar.gz
>
> Signed-off-by: Gerrit Renker <[email protected]>
> ---
> udp.c | 606 ++++++++++++++++++++++++++++++++++++++++++++----------------------
> 1 file changed, 410 insertions(+), 196 deletions(-)
>
>
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index 514c1e9..4ddd8e6 100644
> @@ -731,12 +801,12 @@ out:
> }
>
> /*
> - * IOCTL requests applicable to the UDP protocol
> + * IOCTL requests applicable to the UDP(-Lite) protocol
> */
Avoid these changes to reduce patch file size, please
> -
> +
> int udp_ioctl(struct sock *sk, int cmd, unsigned long arg)
> {
> - switch(cmd)
> + switch(cmd)
Ditto
> -/*
> - * This should be easy, if there is something there we
> - * return it, otherwise we block.
> +/**
> + * udp_recvmsg - generic UDP/-Lite receive processing
> + *
> + * This routine is udplite-aware and works for both protocols.
> @@ -980,7 +1055,11 @@ #else
> #endif
> }
>
> -/* returns:
> +/**
> + * udp_queue_rcv_skb - receive queue processing
> + *
> + * This routine is udplite-aware and works on both sockets.
>
> if (up->encap_type) {
> @@ -1010,7 +1087,7 @@ static int udp_queue_rcv_skb(struct sock
> * If it's an encapsulateed packet, then pass it to the
> * IPsec xfrm input and return the response
> * appropriately. Otherwise, just fall through and
> - * pass this up the UDP socket.
> + * pass this up the UDP/-Lite socket.
> */
> - /* FALLTHROUGH -- it's a UDP Packet */
> + /* FALLTHROUGH -- it's a UDP/-Lite Packet */
> }
>
> /*
> - * All we need to do is get the socket, and then do a checksum.
> + * All we need to do is get the socket, and then do a checksum.
> */
> -
Huh, what was this one? trailing whitespace? Can you leave this for
another cset doing just the reformatting?
> @@ -1219,7 +1363,7 @@ static int udp_destroy_sock(struct sock
> }
>
> /*
> - * Socket option code for UDP
> + * Socket option code for UDP and UDP-Lite (shared).
> */
> #endif
> +
> /**
> - * udp_poll - wait for a UDP event.
> + * udp_poll - wait for a UDP(-Lite) event.
See next comment
> * @file - file struct
> * @sock - socket
> * @wait - poll table
> @@ -1348,11 +1528,14 @@ #endif
> * then it could get return from select indicating data available
> * but then block when reading it. Add special case code
> * to work around these arguably broken applications.
> + *
> + * The routine is udplite-aware and works for both protocols.
I guess these comments can go as well, as one can quickly realise the
functions handle UDP-Lite from all the "IS_UDPLITE(sk)" calls and
"is_{udp}lite" variables :-)
> */
> unsigned int udp_poll(struct file *file, struct socket *sock, poll_table *wait)
> {
> unsigned int mask = datagram_poll(file, sock, wait);
> struct sock *sk = sock->sk;
> + int is_lite = IS_UDPLITE(sk);
Regards,
- Arnaldo
Quoting Arnaldo Carvalho de Melo:
| Avoid these changes to reduce patch file size, please
I apologize for the bad patch format - I am revising the entire
patch to improve readability and will resend.
- Gerrit
On 8/28/06, [email protected] <[email protected]> wrote:
> Quoting Arnaldo Carvalho de Melo:
> | Avoid these changes to reduce patch file size, please
>
> I apologize for the bad patch format - I am revising the entire
> patch to improve readability and will resend.
No need for apologies and thanks for taking my suggestions into account.
- Arnaldo
[Net/IPv4]: REVISED UDP-Lite standalone support and shared UDP/-Lite socket structure.
This is in principle the same patch as posted earlier, with the difference that
all whitespace changes have been removed; in addition, statements have been re-ordered
to give a more readable patch.
Signed-off-by: Gerrit Renker <[email protected]>
---
include/linux/udp.h | 11 ++
include/net/udplite.h | 35 ++++++++
net/ipv4/udplite.c | 209 ++++++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 255 insertions(+)
diff --git a/include/linux/udp.h b/include/linux/udp.h
index 90223f0..1b7cf10 100644
--- a/include/linux/udp.h
+++ b/include/linux/udp.h
@@ -50,12 +50,23 @@ struct udp_sock {
* when the socket is uncorked.
*/
__u16 len; /* total length of pending frames */
+ /*
+ * Fields specific to UDP-Lite.
+ */
+ __u16 pcslen;
+ __u16 pcrlen;
+/* indicator bits used by pcflag: */
+#define UDPLITE_BIT 0x1 /* set by udplite proto init function */
+#define UDPLITE_SEND_CC 0x2 /* set via udplite setsockopt */
+#define UDPLITE_RECV_CC 0x4 /* set via udplite setsocktopt */
+ __u8 pcflag; /* marks socket as UDP-Lite if > 0 */
};
static inline struct udp_sock *udp_sk(const struct sock *sk)
{
return (struct udp_sock *)sk;
}
+#define IS_UDPLITE(__sk) (udp_sk(__sk)->pcflag)
#endif
diff --git a/net/ipv4/udplite.c b/net/ipv4/udplite.c
new file mode 100644
index 0000000..3911403
--- /dev/null
+++ b/net/ipv4/udplite.c
@@ -0,0 +1,209 @@
+/*
+ * UDPLITE An implementation of the UDP-Lite protocol (RFC 3828).
+ *
+ * Version: $Id: udplite.c,v 1.22 2006/08/22 13:01:52 gerrit Exp gerrit $
+ *
+ * Authors: Gerrit Renker <[email protected]>
+ *
+ * Changes:
+ * Fixes:
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+struct hlist_head udplite_hash[UDP_HTABLE_SIZE];
+int udplite_port_rover;
+DEFINE_SNMP_STAT(struct udp_mib, udplite_statistics) __read_mostly;
+
+/* these functions are called by UDP-Lite with protocol-specific parameters */
+static int __udp_get_port(struct sock *, unsigned short,
+ struct hlist_head *, int * );
+static struct sock *__udp_lookup(u32 , u16, u32, u16, int, struct hlist_head *);
+static int __udp_mcast_deliver(struct sk_buff *, struct udphdr *,
+ u32, u32, struct hlist_head * );
+static int __udp_common_rcv(struct sk_buff *, int is_udplite);
+static void __udp_err(struct sk_buff *, u32, struct hlist_head *);
+#ifdef CONFIG_PROC_FS
+static int udp4_seq_show(struct seq_file *, void *);
+#endif
+
+/*
+ * Designate sk as UDP-Lite socket
+ */
+static inline int udplite_sk_init(struct sock *sk)
+{
+ udp_sk(sk)->pcflag = UDPLITE_BIT;
+ return 0;
+}
+
+static __inline__ int udplite_v4_get_port(struct sock *sk, unsigned short snum)
+{
+ return __udp_get_port(sk, snum, udplite_hash, &udplite_port_rover);
+}
+
+static __inline__ struct sock *udplite_v4_lookup(u32 saddr, u16 sport,
+ u32 daddr, u16 dport, int dif)
+{
+ return __udp_lookup(saddr, sport, daddr, dport, dif, udplite_hash);
+}
+
+static __inline__ int udplite_v4_mcast_deliver(struct sk_buff *skb,
+ struct udphdr *uh, u32 saddr, u32 daddr)
+{
+ return __udp_mcast_deliver(skb, uh, saddr, daddr, udplite_hash);
+}
+
+__inline__ int udplite_rcv(struct sk_buff *skb)
+{
+ return __udp_common_rcv(skb, 1);
+}
+
+__inline__ void udplite_err(struct sk_buff *skb, u32 info)
+{
+ return __udp_err(skb, info, udplite_hash);
+}
+
+static int udplite_checksum_init(struct sk_buff *skb, struct udphdr *uh,
+ unsigned short len, u32 saddr, u32 daddr)
+{
+ u16 cscov;
+
+ /* In UDPv4 a zero checksum means that the transmitter generated no
+ * checksum. UDP-Lite (like IPv6) mandates checksums, hence packets
+ * with a zero checksum field are illegal. */
+ if (uh->check == 0) {
+ LIMIT_NETDEBUG(KERN_DEBUG "UDPLITE: zeroed csum field"
+ "(%d.%d.%d.%d:%d -> %d.%d.%d.%d:%d)\n", NIPQUAD(saddr),
+ ntohs(uh->source), NIPQUAD(daddr), ntohs(uh->dest) );
+ return 0;
+ }
+
+ UDP_SKB_CB(skb)->partial_cov = 0;
+ cscov = ntohs(uh->len);
+
+ if (cscov == 0) /* Indicates that full coverage is required. */
+ cscov = len;
+ else if (cscov < 8 || cscov > len) {
+ /*
+ * Coverage length violates RFC 3828: log and discard silently.
+ */
+ LIMIT_NETDEBUG(KERN_DEBUG "UDPLITE: bad csum coverage %d/%d "
+ "(%d.%d.%d.%d:%d -> %d.%d.%d.%d:%d)\n", cscov, len,
+ NIPQUAD(saddr), ntohs(uh->source),
+ NIPQUAD(daddr), ntohs(uh->dest) );
+ return 0;
+
+ } else if (cscov < len)
+ UDP_SKB_CB(skb)->partial_cov = 1;
+
+ UDP_SKB_CB(skb)->cscov = cscov;
+
+ /*
+ * Initialise pseudo-header for checksum computation.
+ *
+ * There is no known NIC manufacturer supporting UDP-Lite yet,
+ * hence ip_summed is always (re-)set to CHECKSUM_NONE.
+ */
+ skb->csum = csum_tcpudp_nofold(saddr, daddr, len, IPPROTO_UDPLITE, 0);
+ skb->ip_summed = CHECKSUM_NONE;
+
+ return 1;
+}
+
+static void udplite_csum_outgoing(struct sock *sk, struct sk_buff *skb,
+ int totlen, int cscov, u32 src, u32 dst)
+{
+ unsigned int csum = 0, len;
+ struct udphdr *uh = skb->h.uh;
+
+ uh->check = 0;
+
+ skb->ip_summed = CHECKSUM_NONE; /* no HW support for checksumming */
+
+ if (skb_queue_len(&sk->sk_write_queue) == 1) {
+ /*
+ * Only one fragment on the socket.
+ */
+ csum = skb_checksum(skb, skb->h.raw - skb->data, cscov, 0);
+
+ } else {
+ skb_queue_walk(&sk->sk_write_queue, skb) {
+ len = skb->tail - skb->h.raw;
+
+ skb->csum = skb_checksum(skb, skb->h.raw - skb->data,
+ (cscov > len)? len : cscov, 0);
+ csum = csum_add(csum, skb->csum);
+
+ if (cscov < len) /* Enough seen. */
+ break;
+ cscov -= len;
+ }
+ }
+
+ uh->check = csum_tcpudp_magic(src, dst, totlen, IPPROTO_UDPLITE, csum);
+ if (uh->check == 0)
+ uh->check = -1;
+}
+
+
+static struct net_protocol udplite_protocol = {
+ .handler = udplite_rcv,
+ .err_handler = udplite_err,
+ .no_policy = 1,
+};
+
+static struct inet_protosw udplite4_protosw = {
+ .type = SOCK_DGRAM,
+ .protocol = IPPROTO_UDPLITE,
+ .prot = &udplite_prot,
+ .ops = &inet_dgram_ops,
+ .capability = -1,
+ .no_check = 0, /* must checksum (RFC 3828) */
+ .flags = INET_PROTOSW_PERMANENT,
+};
+
+void __init udplite4_register(void)
+{
+ if (proto_register(&udplite_prot, 1))
+ goto out_register_err;
+
+ if (inet_add_protocol(&udplite_protocol, IPPROTO_UDPLITE) < 0)
+ goto out_unregister_proto;
+
+ inet_register_protosw(&udplite4_protosw);
+
+ return;
+
+out_unregister_proto:
+ proto_unregister(&udplite_prot);
+out_register_err:
+ printk(KERN_CRIT "udplite4_register: Cannot add UDP-Lite protocol\n");
+}
+
+#ifdef CONFIG_PROC_FS
+static struct file_operations udplite4_seq_fops;
+static struct udp_seq_afinfo udplite4_seq_afinfo = {
+ .owner = THIS_MODULE,
+ .name = "udplite",
+ .family = AF_INET,
+ .hashtable = udplite_hash,
+ .seq_show = udp4_seq_show,
+ .seq_fops = &udplite4_seq_fops,
+};
+
+__inline__ int __init udplite4_proc_init(void)
+{
+ return udp_proc_register(&udplite4_seq_afinfo);
+}
+
+__inline__ void udplite4_proc_exit(void)
+{
+ udp_proc_unregister(&udplite4_seq_afinfo);
+}
+#endif /* CONFIG_PROC_FS */
+
+EXPORT_SYMBOL(udplite_hash);
+EXPORT_SYMBOL(udplite_prot);
diff --git a/include/net/udplite.h b/include/net/udplite.h
new file mode 100644
index 0000000..1a32c42
--- /dev/null
+++ b/include/net/udplite.h
@@ -0,0 +1,35 @@
+/*
+ * Definitions for the UDP-Lite (RFC 3828) code.
+ */
+#ifndef _UDPLITE_H
+#define _UDPLITE_H
+
+/* UDP-Lite socket options */
+#define UDPLITE_SEND_CSCOV 10 /* sender partial coverage (as sent) */
+#define UDPLITE_RECV_CSCOV 11 /* receiver partial coverage (threshold ) */
+
+extern struct proto udplite_prot;
+extern struct hlist_head udplite_hash[UDP_HTABLE_SIZE];
+
+/* UDP-Lite does not have a standardized MIB yet, so we inherit from UDP */
+DECLARE_SNMP_STAT(struct udp_mib, udplite_statistics);
+
+/*
+ * Checksum computation is all in software, hence simpler getfrag.
+ */
+static __inline__ int udplite_getfrag(void *from, char *to, int offset,
+ int len, int odd, struct sk_buff *skb)
+{
+ return memcpy_fromiovecend(to, (struct iovec *) from, offset, len);
+}
+
+/*
+ * net/ipv4/udplite.c
+ */
+extern void udplite4_register(void);
+#ifdef CONFIG_PROC_FS
+extern int udplite4_proc_init(void);
+extern void udplite4_proc_exit(void);
+#endif
+
+#endif /* _UDPLITE_H */
[Net/IPv4]: REVISED Modifications to the UDP module and generic UDP/-Lite processing.
Signed-off-by: Gerrit Renker <[email protected]>
---
include/net/udp.h | 68 ++++++-
net/ipv4/udp.c | 489 ++++++++++++++++++++++++++++++++++++------------------
2 files changed, 395 insertions(+), 162 deletions(-)
diff --git a/include/net/udp.h b/include/net/udp.h
index 766fba1..77c5fb9 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -26,9 +26,48 @@ #include <linux/list.h>
#include <net/inet_sock.h>
#include <net/sock.h>
#include <net/snmp.h>
+#include <net/ip.h>
+#include <linux/ipv6.h>
#include <linux/seq_file.h>
#define UDP_HTABLE_SIZE 128
+#include <net/udplite.h>
+
+/**
+ * struct udp_skb_cb - UDP(-Lite) private variables
+ *
+ * @header: private variables used by IPv4/IPv6
+ * @cscov: checksum coverage length (UDP-Lite only)
+ * @partial_cov: if set indicates partial csum coverage
+ */
+struct udp_skb_cb {
+ union {
+ struct inet_skb_parm h4;
+#if defined(CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE)
+ struct inet6_skb_parm h6;
+#endif
+ } header;
+ __u16 cscov;
+ __u8 partial_cov;
+};
+#define UDP_SKB_CB(__skb) ((struct udp_skb_cb *)((__skb)->cb))
+
+/*
+ * Generic checksumming routines for UDP(-Lite) v4 and v6
+ */
+static inline u16 __udp_checksum_complete(struct sk_buff *skb)
+{
+ if (! UDP_SKB_CB(skb)->partial_cov)
+ return __skb_checksum_complete(skb);
+ return csum_fold(skb_checksum(skb, 0, UDP_SKB_CB(skb)->cscov,
+ skb->csum));
+}
+
+static __inline__ int udp_checksum_complete(struct sk_buff *skb)
+{
+ return skb->ip_summed != CHECKSUM_UNNECESSARY &&
+ __udp_checksum_complete(skb);
+}
/* udp.c: This needs to be shared by v4 and v6 because the lookup
* and hashing code needs to work with different AF's yet
@@ -39,16 +78,24 @@ extern rwlock_t udp_hash_lock;
extern int udp_port_rover;
-static inline int udp_lport_inuse(u16 num)
+/*
+ * XXX: since udp_v{4,6}_get_port use common code, these two functions
+ * will soon go
+ */
+static inline int __udp_lport_inuse(u16 num, struct hlist_head udptable[])
{
struct sock *sk;
struct hlist_node *node;
- sk_for_each(sk, node, &udp_hash[num & (UDP_HTABLE_SIZE - 1)])
+ sk_for_each(sk, node, &udptable[num & (UDP_HTABLE_SIZE - 1)])
if (inet_sk(sk)->num == num)
return 1;
return 0;
}
+static __inline__ int udp_lport_inuse(u16 num)
+{
+ return __udp_lport_inuse(num, udp_hash);
+}
/* Note: this must match 'valbool' in sock_setsockopt */
#define UDP_CSUM_NOXMIT 1
@@ -75,21 +122,32 @@ extern unsigned int udp_poll(struct file
poll_table *wait);
DECLARE_SNMP_STAT(struct udp_mib, udp_statistics);
-#define UDP_INC_STATS(field) SNMP_INC_STATS(udp_statistics, field)
-#define UDP_INC_STATS_BH(field) SNMP_INC_STATS_BH(udp_statistics, field)
-#define UDP_INC_STATS_USER(field) SNMP_INC_STATS_USER(udp_statistics, field)
+/*
+ * SNMP statistics for UDP and UDP-Lite
+ */
+#define UDP_INC_STATS_USER(field, is_udplite) do { \
+ if (is_udplite) SNMP_INC_STATS_USER(udplite_statistics, field); \
+ else SNMP_INC_STATS_USER(udp_statistics, field); } while(0)
+#define UDP_INC_STATS_BH(field, is_udplite) do { \
+ if (is_udplite) SNMP_INC_STATS_BH(udplite_statistics, field); \
+ else SNMP_INC_STATS_BH(udp_statistics, field); } while(0)
+#define UDP_DEC_STATS_BH(field, is_udplite) do { \
+ if (is_udplite) SNMP_DEC_STATS_BH(udplite_statistics, field); \
+ else SNMP_DEC_STATS_BH(udp_statistics, field); } while(0)
/* /proc */
struct udp_seq_afinfo {
struct module *owner;
char *name;
sa_family_t family;
+ struct hlist_head *hashtable;
int (*seq_show) (struct seq_file *m, void *v);
struct file_operations *seq_fops;
};
struct udp_iter_state {
sa_family_t family;
+ struct hlist_head *hashtable;
int bucket;
struct seq_operations seq_ops;
};
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 514c1e9..5ca0db3 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -92,10 +92,8 @@ #include <linux/errno.h>
#include <linux/timer.h>
#include <linux/mm.h>
#include <linux/inet.h>
-#include <linux/ipv6.h>
#include <linux/netdevice.h>
#include <net/snmp.h>
-#include <net/ip.h>
#include <net/tcp_states.h>
#include <net/protocol.h>
#include <linux/skbuff.h>
@@ -108,6 +106,8 @@ #include <net/route.h>
#include <net/inet_common.h>
#include <net/checksum.h>
#include <net/xfrm.h>
+/* the extensions for UDP-Lite (RFC 3828) */
+#include "udplite.c"
/*
* Snmp MIB for the UDP layer
@@ -121,7 +121,13 @@ DEFINE_RWLOCK(udp_hash_lock);
/* Shared by v4/v6 udp. */
int udp_port_rover;
-static int udp_v4_get_port(struct sock *sk, unsigned short snum)
+static __inline__ int udp_v4_get_port(struct sock *sk, unsigned short snum)
+{
+ return __udp_get_port(sk, snum, udp_hash, &udp_port_rover);
+}
+
+int __udp_get_port(struct sock *sk, unsigned short snum,
+ struct hlist_head udptable[], int *port_rover)
{
struct hlist_node *node;
struct sock *sk2;
@@ -131,16 +137,16 @@ static int udp_v4_get_port(struct sock *
if (snum == 0) {
int best_size_so_far, best, result, i;
- if (udp_port_rover > sysctl_local_port_range[1] ||
- udp_port_rover < sysctl_local_port_range[0])
- udp_port_rover = sysctl_local_port_range[0];
+ if (*port_rover > sysctl_local_port_range[1] ||
+ *port_rover < sysctl_local_port_range[0])
+ *port_rover = sysctl_local_port_range[0];
best_size_so_far = 32767;
- best = result = udp_port_rover;
+ best = result = *port_rover;
for (i = 0; i < UDP_HTABLE_SIZE; i++, result++) {
struct hlist_head *list;
int size;
- list = &udp_hash[result & (UDP_HTABLE_SIZE - 1)];
+ list = &udptable[result & (UDP_HTABLE_SIZE - 1)];
if (hlist_empty(list)) {
if (result > sysctl_local_port_range[1])
result = sysctl_local_port_range[0] +
@@ -162,16 +168,16 @@ static int udp_v4_get_port(struct sock *
result = sysctl_local_port_range[0]
+ ((result - sysctl_local_port_range[0]) &
(UDP_HTABLE_SIZE - 1));
- if (!udp_lport_inuse(result))
+ if (! __udp_lport_inuse(result, udptable))
break;
}
if (i >= (1 << 16) / UDP_HTABLE_SIZE)
goto fail;
gotit:
- udp_port_rover = snum = result;
+ *port_rover = snum = result;
} else {
sk_for_each(sk2, node,
- &udp_hash[snum & (UDP_HTABLE_SIZE - 1)]) {
+ &udptable[snum & (UDP_HTABLE_SIZE - 1)]) {
struct inet_sock *inet2 = inet_sk(sk2);
if (inet2->num == snum &&
@@ -189,7 +195,7 @@ gotit:
}
inet->num = snum;
if (sk_unhashed(sk)) {
- struct hlist_head *h = &udp_hash[snum & (UDP_HTABLE_SIZE - 1)];
+ struct hlist_head *h = &udptable[snum & (UDP_HTABLE_SIZE - 1)];
sk_add_node(sk, h);
sock_prot_inc_use(sk->sk_prot);
@@ -220,15 +226,22 @@ static void udp_v4_unhash(struct sock *s
/* UDP is nearly always wildcards out the wazoo, it makes no sense to try
* harder than this. -DaveM
*/
-static struct sock *udp_v4_lookup_longway(u32 saddr, u16 sport,
- u32 daddr, u16 dport, int dif)
+static __inline__ struct sock *udp_v4_lookup(u32 saddr, u16 sport,
+ u32 daddr, u16 dport, int dif)
+{
+ return __udp_lookup(saddr, sport, daddr, dport, dif, udp_hash);
+}
+
+struct sock *__udp_lookup(u32 saddr, u16 sport, u32 daddr, u16 dport, int dif,
+ struct hlist_head udptable[] )
{
struct sock *sk, *result = NULL;
struct hlist_node *node;
unsigned short hnum = ntohs(dport);
int badness = -1;
- sk_for_each(sk, node, &udp_hash[hnum & (UDP_HTABLE_SIZE - 1)]) {
+ read_lock(&udp_hash_lock);
+ sk_for_each(sk, node, &udptable[hnum & (UDP_HTABLE_SIZE - 1)]) {
struct inet_sock *inet = inet_sk(sk);
if (inet->num == hnum && !ipv6_only_sock(sk)) {
@@ -262,20 +275,11 @@ static struct sock *udp_v4_lookup_longwa
}
}
}
- return result;
-}
-
-static __inline__ struct sock *udp_v4_lookup(u32 saddr, u16 sport,
- u32 daddr, u16 dport, int dif)
-{
- struct sock *sk;
-
- read_lock(&udp_hash_lock);
- sk = udp_v4_lookup_longway(saddr, sport, daddr, dport, dif);
- if (sk)
- sock_hold(sk);
+ if (result)
+ sock_hold(result);
read_unlock(&udp_hash_lock);
- return sk;
+
+ return result;
}
static inline struct sock *udp_v4_mcast_next(struct sock *sk,
@@ -317,7 +321,12 @@ found:
* to find the appropriate port.
*/
-void udp_err(struct sk_buff *skb, u32 info)
+__inline__ void udp_err(struct sk_buff *skb, u32 info)
+{
+ return __udp_err(skb, info, udp_hash);
+}
+
+void __udp_err(struct sk_buff *skb, u32 info, struct hlist_head udptable[])
{
struct inet_sock *inet;
struct iphdr *iph = (struct iphdr*)skb->data;
@@ -328,7 +337,8 @@ void udp_err(struct sk_buff *skb, u32 in
int harderr;
int err;
- sk = udp_v4_lookup(iph->daddr, uh->dest, iph->saddr, uh->source, skb->dev->ifindex);
+ sk = __udp_lookup(iph->daddr, uh->dest, iph->saddr, uh->source,
+ skb->dev->ifindex, udptable );
if (sk == NULL) {
ICMP_INC_STATS_BH(ICMP_MIB_INERRORS);
return; /* No socket for error */
@@ -396,33 +406,17 @@ static void udp_flush_pending_frames(str
}
}
-/*
- * Push out all pending data as one UDP datagram. Socket is locked.
- */
-static int udp_push_pending_frames(struct sock *sk, struct udp_sock *up)
+static void udp_csum_outgoing(struct sock *sk, struct sk_buff *skb,
+ int totlen, u32 src, u32 dst )
{
- struct inet_sock *inet = inet_sk(sk);
- struct flowi *fl = &inet->cork.fl;
- struct sk_buff *skb;
- struct udphdr *uh;
- int err = 0;
-
- /* Grab the skbuff where UDP header space exists. */
- if ((skb = skb_peek(&sk->sk_write_queue)) == NULL)
- goto out;
+ unsigned int csum = 0;
+ struct udphdr *uh = skb->h.uh;
- /*
- * Create a UDP header
- */
- uh = skb->h.uh;
- uh->source = fl->fl_ip_sport;
- uh->dest = fl->fl_ip_dport;
- uh->len = htons(up->len);
uh->check = 0;
if (sk->sk_no_check == UDP_CSUM_NOXMIT) {
skb->ip_summed = CHECKSUM_NONE;
- goto send;
+ return;
}
if (skb_queue_len(&sk->sk_write_queue) == 1) {
@@ -431,42 +425,95 @@ static int udp_push_pending_frames(struc
*/
if (skb->ip_summed == CHECKSUM_PARTIAL) {
skb->csum = offsetof(struct udphdr, check);
- uh->check = ~csum_tcpudp_magic(fl->fl4_src, fl->fl4_dst,
- up->len, IPPROTO_UDP, 0);
- } else {
- skb->csum = csum_partial((char *)uh,
- sizeof(struct udphdr), skb->csum);
- uh->check = csum_tcpudp_magic(fl->fl4_src, fl->fl4_dst,
- up->len, IPPROTO_UDP, skb->csum);
- if (uh->check == 0)
- uh->check = -1;
- }
+ uh->check = ~csum_tcpudp_magic(src, dst,
+ totlen, IPPROTO_UDP, 0);
+ return;
+ } else {
+ csum = csum_partial(skb->h.raw,
+ sizeof(struct udphdr), skb->csum);
+ }
} else {
- unsigned int csum = 0;
/*
* HW-checksum won't work as there are two or more
* fragments on the socket so that all csums of sk_buffs
* should be together.
*/
if (skb->ip_summed == CHECKSUM_PARTIAL) {
- int offset = (unsigned char *)uh - skb->data;
+ int offset = skb->h.raw - skb->data;
skb->csum = skb_checksum(skb, offset, skb->len - offset, 0);
skb->ip_summed = CHECKSUM_NONE;
- } else {
- skb->csum = csum_partial((char *)uh,
- sizeof(struct udphdr), skb->csum);
- }
+ } else
+ skb->csum = csum_partial(skb->h.raw,
+ sizeof(struct udphdr), skb->csum);
skb_queue_walk(&sk->sk_write_queue, skb) {
csum = csum_add(csum, skb->csum);
}
- uh->check = csum_tcpudp_magic(fl->fl4_src, fl->fl4_dst,
- up->len, IPPROTO_UDP, csum);
- if (uh->check == 0)
- uh->check = -1;
}
-send:
+
+ uh->check = csum_tcpudp_magic(src, dst, totlen, IPPROTO_UDP, csum);
+ if (uh->check == 0)
+ uh->check = -1;
+}
+
+/*
+ * Push out all pending data as one UDP datagram. Socket is locked.
+ */
+static int udp_push_pending_frames(struct sock *sk, struct udp_sock *up)
+{
+ struct inet_sock *inet = inet_sk(sk);
+ struct flowi *fl = &inet->cork.fl;
+ struct sk_buff *skb;
+ struct udphdr *uh;
+ int err = 0;
+ u16 cscov = up->len;
+
+ /* Grab the skbuff where UDP header space exists. */
+ if ((skb = skb_peek(&sk->sk_write_queue)) == NULL)
+ goto out;
+
+ /*
+ * Create a UDP header
+ */
+ uh = skb->h.uh;
+ uh->source = fl->fl_ip_sport;
+ uh->dest = fl->fl_ip_dport;
+ uh->len = htons(up->len);
+
+ /*
+ * If sender has set `partial coverage' socket option on a
+ * UDP-Lite socket, adjust coverage length accordingly.
+ * All other cases default to traditional UDP checksum mode.
+ */
+ if (up->pcflag & UDPLITE_SEND_CC) {
+ if (up->pcslen < up->len) {
+ /* up->pcslen == 0 means that full coverage is required,
+ * partial coverage only if 0 < up->pcslen < up->len */
+ if (0 < up->pcslen) {
+ cscov = up->pcslen;
+ }
+ uh->len = htons(up->pcslen);
+ }
+ /*
+ * NOTE: Causes for the error case `up->pcslen > up->len':
+ * (i) Application error (will not be penalized).
+ * (ii) Payload too big for send buffer: data is split
+ * into several packets, each with its own header.
+ * In this case (e.g. last segment), coverage may
+ * exceed packet length.
+ * Since packets with coverage length > packet length are
+ * illegal, we fall back to the defaults here.
+ */
+ }
+
+ if(up->pcflag)
+ udplite_csum_outgoing(sk, skb, up->len, cscov,
+ fl->fl4_src, fl->fl4_dst);
+ else
+ udp_csum_outgoing(sk, skb, up->len, fl->fl4_src, fl->fl4_dst);
+
+
err = ip_push_pending_frames(sk);
out:
up->len = 0;
@@ -475,11 +522,6 @@ out:
}
-static unsigned short udp_check(struct udphdr *uh, int len, unsigned long saddr, unsigned long daddr, unsigned long base)
-{
- return(csum_tcpudp_magic(saddr, daddr, len, IPPROTO_UDP, base));
-}
-
int udp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
size_t len)
{
@@ -493,8 +535,9 @@ int udp_sendmsg(struct kiocb *iocb, stru
u32 daddr, faddr, saddr;
u16 dport;
u8 tos;
- int err;
+ int err, is_udplite = up->pcflag;
int corkreq = up->corkflag || msg->msg_flags&MSG_MORE;
+ int (*getfrag)(void *, char *, int, int, int, struct sk_buff *);
if (len > 0xFFFF)
return -EMSGSIZE;
@@ -599,7 +642,7 @@ int udp_sendmsg(struct kiocb *iocb, stru
{ .daddr = faddr,
.saddr = saddr,
.tos = tos } },
- .proto = IPPROTO_UDP,
+ .proto = sk->sk_protocol,
.uli_u = { .ports =
{ .sport = inet->sport,
.dport = dport } } };
@@ -645,7 +688,8 @@ back_from_confirm:
do_append_data:
up->len += ulen;
- err = ip_append_data(sk, ip_generic_getfrag, msg->msg_iov, ulen,
+ getfrag = is_udplite ? udplite_getfrag : ip_generic_getfrag;
+ err = ip_append_data(sk, getfrag, msg->msg_iov, ulen,
sizeof(struct udphdr), &ipc, rt,
corkreq ? msg->msg_flags|MSG_MORE : msg->msg_flags);
if (err)
@@ -659,7 +703,7 @@ out:
if (free)
kfree(ipc.opt);
if (!err) {
- UDP_INC_STATS_USER(UDP_MIB_OUTDATAGRAMS);
+ UDP_INC_STATS_USER(UDP_MIB_OUTDATAGRAMS, is_udplite);
return len;
}
/*
@@ -670,7 +714,7 @@ out:
* seems like overkill.
*/
if (err == -ENOBUFS || test_bit(SOCK_NOSPACE, &sk->sk_socket->flags)) {
- UDP_INC_STATS_USER(UDP_MIB_SNDBUFERRORS);
+ UDP_INC_STATS_USER(UDP_MIB_SNDBUFERRORS, is_udplite);
}
return err;
@@ -770,17 +814,6 @@ int udp_ioctl(struct sock *sk, int cmd,
return(0);
}
-static __inline__ int __udp_checksum_complete(struct sk_buff *skb)
-{
- return __skb_checksum_complete(skb);
-}
-
-static __inline__ int udp_checksum_complete(struct sk_buff *skb)
-{
- return skb->ip_summed != CHECKSUM_UNNECESSARY &&
- __udp_checksum_complete(skb);
-}
-
/*
* This should be easy, if there is something there we
* return it, otherwise we block.
@@ -792,7 +825,7 @@ static int udp_recvmsg(struct kiocb *ioc
struct inet_sock *inet = inet_sk(sk);
struct sockaddr_in *sin = (struct sockaddr_in *)msg->msg_name;
struct sk_buff *skb;
- int copied, err;
+ int copied, err, copy_only, is_udplite = IS_UDPLITE(sk);
/*
* Check any passed addresses
@@ -814,17 +847,26 @@ try_again:
msg->msg_flags |= MSG_TRUNC;
}
- if (skb->ip_summed==CHECKSUM_UNNECESSARY) {
- err = skb_copy_datagram_iovec(skb, sizeof(struct udphdr), msg->msg_iov,
- copied);
- } else if (msg->msg_flags&MSG_TRUNC) {
+ /*
+ * Decide whether to checksum and/or copy data.
+ *
+ * UDP: checksum may have been computed in HW,
+ * (re-)compute it if message is truncated.
+ * UDP-Lite: always needs to checksum, no HW support.
+ */
+ copy_only = (skb->ip_summed == CHECKSUM_UNNECESSARY);
+
+ if (is_udplite || (!copy_only && msg->msg_flags&MSG_TRUNC)) {
if (__udp_checksum_complete(skb))
goto csum_copy_err;
- err = skb_copy_datagram_iovec(skb, sizeof(struct udphdr), msg->msg_iov,
- copied);
- } else {
- err = skb_copy_and_csum_datagram_iovec(skb, sizeof(struct udphdr), msg->msg_iov);
+ copy_only = 1;
+ }
+ if (copy_only)
+ err = skb_copy_datagram_iovec(skb, sizeof(struct udphdr),
+ msg->msg_iov, copied );
+ else {
+ err = skb_copy_and_csum_datagram_iovec(skb, sizeof(struct udphdr), msg->msg_iov);
if (err == -EINVAL)
goto csum_copy_err;
}
@@ -855,7 +897,8 @@ out:
return err;
csum_copy_err:
- UDP_INC_STATS_BH(UDP_MIB_INERRORS);
+ UDP_INC_STATS_BH(UDP_MIB_INERRORS, is_udplite);
+ UDP_DEC_STATS_BH(UDP_MIB_INDATAGRAMS, is_udplite);
skb_kill_datagram(sk, skb, flags);
@@ -996,10 +1039,8 @@ static int udp_queue_rcv_skb(struct sock
/*
* Charge it to the socket, dropping if the queue is full.
*/
- if (!xfrm4_policy_check(sk, XFRM_POLICY_IN, skb)) {
- kfree_skb(skb);
- return -1;
- }
+ if (!xfrm4_policy_check(sk, XFRM_POLICY_IN, skb))
+ goto drop;
nf_reset(skb);
if (up->encap_type) {
@@ -1023,31 +1064,77 @@ static int udp_queue_rcv_skb(struct sock
if (ret < 0) {
/* process the ESP packet */
ret = xfrm4_rcv_encap(skb, up->encap_type);
- UDP_INC_STATS_BH(UDP_MIB_INDATAGRAMS);
+ UDP_INC_STATS_BH(UDP_MIB_INDATAGRAMS, up->pcflag);
return -ret;
}
/* FALLTHROUGH -- it's a UDP Packet */
}
if (sk->sk_filter && skb->ip_summed != CHECKSUM_UNNECESSARY) {
- if (__udp_checksum_complete(skb)) {
- UDP_INC_STATS_BH(UDP_MIB_INERRORS);
- kfree_skb(skb);
- return -1;
- }
+ if (__udp_checksum_complete(skb))
+ goto drop;
skb->ip_summed = CHECKSUM_UNNECESSARY;
}
+ /*
+ * UDP-Lite specific tests, ignored on UDP sockets
+ * XXX: may be better to do them before sk->sk_filter
+ */
+ if ((up->pcflag & UDPLITE_RECV_CC) && UDP_SKB_CB(skb)->partial_cov) {
+
+ /*
+ * MIB statistics other than incrementing the error count are
+ * disabled for the following two types of errors: these depend
+ * on the application settings, not on the functioning of the
+ * protocol stack as such.
+ *
+ *
+ * RFC 3828 here recommends (sec 3.3): "There should also be a
+ * way ... to ... at least let the receiving application block
+ * delivery of packets with coverage values less than a value
+ * provided by the application."
+ */
+ if (up->pcrlen == 0) { /* full coverage was set */
+ LIMIT_NETDEBUG(KERN_WARNING "UDPLITE: partial coverage "
+ "%d while full coverage %d requested\n",
+ UDP_SKB_CB(skb)->cscov, skb->len);
+ goto drop;
+ }
+ /* The next case involves violating the min. coverage requested
+ * by the receiver. This is subtle: if receiver wants x and x is
+ * greater than the buffersize/MTU then receiver will complain
+ * that it wants x while sender emits packets of smaller size y.
+ * Therefore the above ...()->partial_cov statement is essential.
+ */
+ if (UDP_SKB_CB(skb)->cscov < up->pcrlen) {
+ LIMIT_NETDEBUG(KERN_WARNING
+ "UDPLITE: coverage %d too small, need min %d\n",
+ UDP_SKB_CB(skb)->cscov, up->pcrlen);
+ goto drop;
+ }
+ }
+
if ((rc = sock_queue_rcv_skb(sk,skb)) < 0) {
/* Note that an ENOMEM error is charged twice */
if (rc == -ENOMEM)
- UDP_INC_STATS_BH(UDP_MIB_RCVBUFERRORS);
- UDP_INC_STATS_BH(UDP_MIB_INERRORS);
- kfree_skb(skb);
- return -1;
+ UDP_INC_STATS_BH(UDP_MIB_RCVBUFERRORS, up->pcflag);
+ goto drop;
}
- UDP_INC_STATS_BH(UDP_MIB_INDATAGRAMS);
+ /*
+ * XXX Incrementing this counter when the datagram is later taken off
+ * the queue due to receive failure is problematic, cf.
+ * http://bugzilla.kernel.org/show_bug.cgi?id=6660
+ * This module counts correctly by decrementing InDatagrams whenever
+ * the datagram is popped off a queue without being actually delivered,
+ * see udp_recvmsg() and udp_poll().
+ */
+ UDP_INC_STATS_BH(UDP_MIB_INDATAGRAMS, up->pcflag);
return 0;
+
+drop:
+ UDP_INC_STATS_BH(UDP_MIB_INERRORS, up->pcflag);
+ kfree_skb(skb);
+ return -1;
}
/*
@@ -1056,14 +1143,20 @@ static int udp_queue_rcv_skb(struct sock
* Note: called only from the BH handler context,
* so we don't need to lock the hashes.
*/
-static int udp_v4_mcast_deliver(struct sk_buff *skb, struct udphdr *uh,
- u32 saddr, u32 daddr)
+static __inline__ int udp_v4_mcast_deliver(struct sk_buff *skb,
+ struct udphdr *uh, u32 saddr, u32 daddr)
+{
+ return __udp_mcast_deliver(skb, uh, saddr, daddr, udp_hash);
+}
+
+int __udp_mcast_deliver(struct sk_buff *skb, struct udphdr *uh,
+ u32 saddr, u32 daddr, struct hlist_head udptable[])
{
struct sock *sk;
int dif;
read_lock(&udp_hash_lock);
- sk = sk_head(&udp_hash[ntohs(uh->dest) & (UDP_HTABLE_SIZE - 1)]);
+ sk = sk_head(&udptable[ntohs(uh->dest) & (UDP_HTABLE_SIZE - 1)]);
dif = skb->dev->ifindex;
sk = udp_v4_mcast_next(sk, uh->dest, daddr, uh->source, saddr, dif);
if (sk) {
@@ -1103,7 +1196,7 @@ static void udp_checksum_init(struct sk_
if (uh->check == 0) {
skb->ip_summed = CHECKSUM_UNNECESSARY;
} else if (skb->ip_summed == CHECKSUM_COMPLETE) {
- if (!udp_check(uh, ulen, saddr, daddr, skb->csum))
+ if (!csum_tcpudp_magic(saddr,daddr,ulen, IPPROTO_UDP, skb->csum))
skb->ip_summed = CHECKSUM_UNNECESSARY;
}
if (skb->ip_summed != CHECKSUM_UNNECESSARY)
@@ -1111,13 +1204,20 @@ static void udp_checksum_init(struct sk_
/* Probably, we should checksum udp header (it should be in cache
* in any case) and data in tiny packets (< rx copybreak).
*/
+
+ /* UDP = UDP-Lite with a non-partial checksum coverage */
+ UDP_SKB_CB(skb)->partial_cov = 0;
}
/*
* All we need to do is get the socket, and then do a checksum.
*/
-
-int udp_rcv(struct sk_buff *skb)
+__inline__ int udp_rcv(struct sk_buff *skb)
+{
+ return __udp_common_rcv(skb, 0);
+}
+
+int __udp_common_rcv(struct sk_buff *skb, int is_udplite)
{
struct sock *sk;
struct udphdr *uh;
@@ -1126,29 +1226,39 @@ int udp_rcv(struct sk_buff *skb)
u32 saddr = skb->nh.iph->saddr;
u32 daddr = skb->nh.iph->daddr;
int len = skb->len;
+ struct hlist_head *ht = is_udplite? udplite_hash : udp_hash;
/*
- * Validate the packet and the UDP length.
+ * Validate the packet.
*/
if (!pskb_may_pull(skb, sizeof(struct udphdr)))
- goto no_header;
+ goto drop; /* No space for header. */
uh = skb->h.uh;
ulen = ntohs(uh->len);
- if (ulen > len || ulen < sizeof(*uh))
- goto short_packet;
+ if (! is_udplite) {
+ if (ulen > len || ulen < sizeof(*uh))
+ goto short_packet;
+
+ if (pskb_trim_rcsum(skb, ulen))
+ goto short_packet;
- if (pskb_trim_rcsum(skb, ulen))
- goto short_packet;
+ udp_checksum_init(skb, uh, ulen, saddr, daddr);
- udp_checksum_init(skb, uh, ulen, saddr, daddr);
+ } else { /* UDP-Lite: we must not trim here */
+ if (len < sizeof(*uh))
+ goto short_packet;
+
+ if (! udplite_checksum_init(skb, uh, len, saddr, daddr))
+ goto csum_error;
+ }
if(rt->rt_flags & (RTCF_BROADCAST|RTCF_MULTICAST))
- return udp_v4_mcast_deliver(skb, uh, saddr, daddr);
+ return __udp_mcast_deliver(skb, uh, saddr, daddr, ht);
- sk = udp_v4_lookup(saddr, uh->source, daddr, uh->dest, skb->dev->ifindex);
+ sk = __udp_lookup(saddr, uh->source, daddr, uh->dest, skb->dev->ifindex, ht);
if (sk != NULL) {
int ret = udp_queue_rcv_skb(sk, skb);
@@ -1170,7 +1280,7 @@ int udp_rcv(struct sk_buff *skb)
if (udp_checksum_complete(skb))
goto csum_error;
- UDP_INC_STATS_BH(UDP_MIB_NOPORTS);
+ UDP_INC_STATS_BH(UDP_MIB_NOPORTS, is_udplite);
icmp_send(skb, ICMP_DEST_UNREACH, ICMP_PORT_UNREACH, 0);
/*
@@ -1181,31 +1291,30 @@ int udp_rcv(struct sk_buff *skb)
return(0);
short_packet:
- LIMIT_NETDEBUG(KERN_DEBUG "UDP: short packet: From %u.%u.%u.%u:%u %d/%d to %u.%u.%u.%u:%u\n",
+ LIMIT_NETDEBUG(KERN_DEBUG "UDP%s: short packet: From %u.%u.%u.%u:%u %d/%d to %u.%u.%u.%u:%u\n",
+ is_udplite? "-Lite" : "",
NIPQUAD(saddr),
ntohs(uh->source),
ulen,
len,
NIPQUAD(daddr),
ntohs(uh->dest));
-no_header:
- UDP_INC_STATS_BH(UDP_MIB_INERRORS);
- kfree_skb(skb);
- return(0);
+ goto drop;
csum_error:
/*
* RFC1122: OK. Discards the bad packet silently (as far as
* the network is concerned, anyway) as per 4.1.3.4 (MUST).
*/
- LIMIT_NETDEBUG(KERN_DEBUG "UDP: bad checksum. From %d.%d.%d.%d:%d to %d.%d.%d.%d:%d ulen %d\n",
+ LIMIT_NETDEBUG(KERN_DEBUG "UDP%s: bad checksum. From %d.%d.%d.%d:%d to %d.%d.%d.%d:%d ulen %d\n",
+ is_udplite? "-Lite" : "",
NIPQUAD(saddr),
ntohs(uh->source),
NIPQUAD(daddr),
ntohs(uh->dest),
ulen);
drop:
- UDP_INC_STATS_BH(UDP_MIB_INERRORS);
+ UDP_INC_STATS_BH(UDP_MIB_INERRORS, is_udplite);
kfree_skb(skb);
return(0);
}
@@ -1259,6 +1368,32 @@ static int do_udp_setsockopt(struct sock
}
break;
+ /*
+ * UDP-Lite's partial checksum coverage (RFC 3828).
+ */
+ /* The sender sets actual checksum coverage length via this option.
+ * The case coverage > packet length is handled by send module. */
+ case UDPLITE_SEND_CSCOV:
+ if (!up->pcflag) /* Disable the option on UDP sockets */
+ return -ENOPROTOOPT;
+ if (val != 0 && val < 8) /* Illegal coverage: use default (8) */
+ val = 8;
+ up->pcslen = val;
+ up->pcflag |= UDPLITE_SEND_CC;
+ break;
+
+ /* The receiver specifies a minimum checksum coverage value. To make
+ * sense, this should be set to at least 8 (as done below). If zero is
+ * used, this again means full checksum coverage. */
+ case UDPLITE_RECV_CSCOV:
+ if (!up->pcflag) /* Disable the option on UDP sockets */
+ return -ENOPROTOOPT;
+ if (val != 0 && val < 8) /* Avoid silly minimal values. */
+ val = 8;
+ up->pcrlen = val;
+ up->pcflag |= UDPLITE_RECV_CC;
+ break;
+
default:
err = -ENOPROTOOPT;
break;
@@ -1270,18 +1405,18 @@ static int do_udp_setsockopt(struct sock
static int udp_setsockopt(struct sock *sk, int level, int optname,
char __user *optval, int optlen)
{
- if (level != SOL_UDP)
- return ip_setsockopt(sk, level, optname, optval, optlen);
- return do_udp_setsockopt(sk, level, optname, optval, optlen);
+ if (level == SOL_UDP || level == SOL_UDPLITE)
+ return do_udp_setsockopt(sk, level, optname, optval, optlen);
+ return ip_setsockopt(sk, level, optname, optval, optlen);
}
#ifdef CONFIG_COMPAT
static int compat_udp_setsockopt(struct sock *sk, int level, int optname,
char __user *optval, int optlen)
{
- if (level != SOL_UDP)
- return compat_ip_setsockopt(sk, level, optname, optval, optlen);
- return do_udp_setsockopt(sk, level, optname, optval, optlen);
+ if (level == SOL_UDP || level == SOL_UDPLITE)
+ return do_udp_setsockopt(sk, level, optname, optval, optlen);
+ return compat_ip_setsockopt(sk, level, optname, optval, optlen);
}
#endif
@@ -1308,6 +1443,15 @@ static int do_udp_getsockopt(struct sock
val = up->encap_type;
break;
+ /* the following two always return 0 on UDP sockets */
+ case UDPLITE_SEND_CSCOV:
+ val = up->pcslen;
+ break;
+
+ case UDPLITE_RECV_CSCOV:
+ val = up->pcrlen;
+ break;
+
default:
return -ENOPROTOOPT;
};
@@ -1322,18 +1466,18 @@ static int do_udp_getsockopt(struct sock
static int udp_getsockopt(struct sock *sk, int level, int optname,
char __user *optval, int __user *optlen)
{
- if (level != SOL_UDP)
- return ip_getsockopt(sk, level, optname, optval, optlen);
- return do_udp_getsockopt(sk, level, optname, optval, optlen);
+ if (level == SOL_UDP || level == SOL_UDPLITE)
+ return do_udp_getsockopt(sk, level, optname, optval, optlen);
+ return ip_getsockopt(sk, level, optname, optval, optlen);
}
#ifdef CONFIG_COMPAT
static int compat_udp_getsockopt(struct sock *sk, int level, int optname,
char __user *optval, int __user *optlen)
{
- if (level != SOL_UDP)
- return compat_ip_getsockopt(sk, level, optname, optval, optlen);
- return do_udp_getsockopt(sk, level, optname, optval, optlen);
+ if (level == SOL_UDP || level == SOL_UDPLITE)
+ return do_udp_getsockopt(sk, level, optname, optval, optlen);
+ return compat_ip_getsockopt(sk, level, optname, optval, optlen);
}
#endif
/**
@@ -1353,6 +1497,7 @@ unsigned int udp_poll(struct file *file,
{
unsigned int mask = datagram_poll(file, sock, wait);
struct sock *sk = sock->sk;
+ int is_lite = IS_UDPLITE(sk);
/* Check for false positives due to checksum errors */
if ( (mask & POLLRDNORM) &&
@@ -1364,7 +1509,11 @@ unsigned int udp_poll(struct file *file,
spin_lock_bh(&rcvq->lock);
while ((skb = skb_peek(rcvq)) != NULL) {
if (udp_checksum_complete(skb)) {
- UDP_INC_STATS_BH(UDP_MIB_INERRORS);
+ /* The datagram has already been counted as
+ * InDatagram when earlier it was enqueued.
+ * Update count of really received datagrams. */
+ UDP_DEC_STATS_BH(UDP_MIB_INDATAGRAMS, is_lite);
+ UDP_INC_STATS_BH(UDP_MIB_INERRORS, is_lite);
__skb_unlink(skb, rcvq);
kfree_skb(skb);
} else {
@@ -1407,6 +1556,30 @@ #ifdef CONFIG_COMPAT
#endif
};
+struct proto udplite_prot = {
+ .name = "UDP-Lite",
+ .owner = THIS_MODULE,
+ .close = udp_close,
+ .connect = ip4_datagram_connect,
+ .disconnect = udp_disconnect,
+ .ioctl = udp_ioctl,
+ .init = udplite_sk_init,
+ .destroy = udp_destroy_sock,
+ .setsockopt = udp_setsockopt,
+ .getsockopt = udp_getsockopt,
+ .sendmsg = udp_sendmsg,
+ .recvmsg = udp_recvmsg,
+ .sendpage = udp_sendpage,
+ .backlog_rcv = udp_queue_rcv_skb,
+ .hash = udp_v4_hash,
+ .unhash = udp_v4_unhash,
+ .get_port = udplite_v4_get_port,
+ .obj_size = sizeof(struct udp_sock),
+#ifdef CONFIG_COMPAT
+ .compat_setsockopt = compat_udp_setsockopt,
+ .compat_getsockopt = compat_udp_getsockopt,
+#endif
+};
/* ------------------------------------------------------------------------ */
#ifdef CONFIG_PROC_FS
@@ -1417,7 +1590,7 @@ static struct sock *udp_get_first(struct
for (state->bucket = 0; state->bucket < UDP_HTABLE_SIZE; ++state->bucket) {
struct hlist_node *node;
- sk_for_each(sk, node, &udp_hash[state->bucket]) {
+ sk_for_each(sk, node, state->hashtable + state->bucket) {
if (sk->sk_family == state->family)
goto found;
}
@@ -1438,7 +1611,7 @@ try_again:
} while (sk && sk->sk_family != state->family);
if (!sk && ++state->bucket < UDP_HTABLE_SIZE) {
- sk = sk_head(&udp_hash[state->bucket]);
+ sk = sk_head(state->hashtable + state->bucket);
goto try_again;
}
return sk;
@@ -1488,6 +1661,7 @@ static int udp_seq_open(struct inode *in
if (!s)
goto out;
s->family = afinfo->family;
+ s->hashtable = afinfo->hashtable;
s->seq_ops.start = udp_seq_start;
s->seq_ops.next = udp_seq_next;
s->seq_ops.show = afinfo->seq_show;
@@ -1554,7 +1728,7 @@ static void udp4_format_sock(struct sock
atomic_read(&sp->sk_refcnt), sp);
}
-static int udp4_seq_show(struct seq_file *seq, void *v)
+int udp4_seq_show(struct seq_file *seq, void *v)
{
if (v == SEQ_START_TOKEN)
seq_printf(seq, "%-127s\n",
@@ -1577,6 +1751,7 @@ static struct udp_seq_afinfo udp4_seq_af
.owner = THIS_MODULE,
.name = "udp",
.family = AF_INET,
+ .hashtable = udp_hash,
.seq_show = udp4_seq_show,
.seq_fops = &udp4_seq_fops,
};
[Net/IPv4]: REVISED Miscellaneous changes which complete the
v4 support for UDP-Lite.
Signed-off-by: Gerrit Renker <[email protected]>
---
include/linux/in.h | 1 +
include/linux/socket.h | 1 +
include/net/snmp.h | 2 ++
include/net/xfrm.h | 2 ++
net/ipv4/af_inet.c | 15 ++++++++++++++-
net/ipv4/proc.c | 16 ++++++++++++++--
net/ipv6/udp.c | 1 +
7 files changed, 35 insertions(+), 3 deletions(-)
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 877e5b3..43faef2 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1223,10 +1223,14 @@ static int __init init_ipv4_mibs(void)
tcp_statistics[1] = alloc_percpu(struct tcp_mib);
udp_statistics[0] = alloc_percpu(struct udp_mib);
udp_statistics[1] = alloc_percpu(struct udp_mib);
+ udplite_statistics[0] = alloc_percpu(struct udp_mib);
+ udplite_statistics[1] = alloc_percpu(struct udp_mib);
+
if (!
(net_statistics[0] && net_statistics[1] && ip_statistics[0]
&& ip_statistics[1] && tcp_statistics[0] && tcp_statistics[1]
- && udp_statistics[0] && udp_statistics[1]))
+ && udp_statistics[0] && udp_statistics[1]
+ && udplite_statistics[0] && udplite_statistics[1] ) )
return -ENOMEM;
(void) tcp_mib_init();
@@ -1300,6 +1304,11 @@ #endif
inet_register_protosw(q);
/*
+ * Add UDP-Lite (RFC 3828)
+ */
+ udplite4_register();
+
+ /*
* Set the ARP module up
*/
@@ -1367,6 +1376,8 @@ static int __init ipv4_proc_init(void)
goto out_tcp;
if (udp4_proc_init())
goto out_udp;
+ if (udplite4_proc_init())
+ goto out_udplite;
if (fib_proc_init())
goto out_fib;
if (ip_misc_proc_init())
@@ -1376,6 +1387,8 @@ out:
out_misc:
fib_proc_exit();
out_fib:
+ udplite4_proc_exit();
+out_udplite:
udp4_proc_exit();
out_udp:
tcp4_proc_exit();
diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c
index 9c6cbe3..608fe34 100644
--- a/net/ipv4/proc.c
+++ b/net/ipv4/proc.c
@@ -66,9 +66,10 @@ static int sockstat_seq_show(struct seq_
tcp_death_row.tw_count, atomic_read(&tcp_sockets_allocated),
atomic_read(&tcp_memory_allocated));
seq_printf(seq, "UDP: inuse %d\n", fold_prot_inuse(&udp_prot));
+ seq_printf(seq, "UDPLITE: inuse %d\n", fold_prot_inuse(&udplite_prot));
seq_printf(seq, "RAW: inuse %d\n", fold_prot_inuse(&raw_prot));
- seq_printf(seq, "FRAG: inuse %d memory %d\n", ip_frag_nqueues,
- atomic_read(&ip_frag_mem));
+ seq_printf(seq, "FRAG: inuse %d memory %d\n", ip_frag_nqueues,
+ atomic_read(&ip_frag_mem));
return 0;
}
@@ -304,6 +305,17 @@ static int snmp_seq_show(struct seq_file
fold_field((void **) udp_statistics,
snmp4_udp_list[i].entry));
+ /* the UDP and UDP-Lite MIBs are the same */
+ seq_puts(seq, "\nUdpLite:");
+ for (i = 0; snmp4_udp_list[i].name != NULL; i++)
+ seq_printf(seq, " %s", snmp4_udp_list[i].name);
+
+ seq_puts(seq, "\nUdpLite:");
+ for (i = 0; snmp4_udp_list[i].name != NULL; i++)
+ seq_printf(seq, " %lu",
+ fold_field((void **) udplite_statistics,
+ snmp4_udp_list[i].entry) );
+
seq_putc(seq, '\n');
return 0;
}
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index b9cc55c..b72540b 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1073,6 +1073,7 @@ static struct udp_seq_afinfo udp6_seq_af
.owner = THIS_MODULE,
.name = "udp6",
.family = AF_INET6,
+ .hashtable = udp_hash,
.seq_show = udp6_seq_show,
.seq_fops = &udp6_seq_fops,
};
diff --git a/include/linux/in.h b/include/linux/in.h
index 94f557f..5ada82e 100644
--- a/include/linux/in.h
+++ b/include/linux/in.h
@@ -44,6 +44,7 @@ enum {
IPPROTO_COMP = 108, /* Compression Header protocol */
IPPROTO_SCTP = 132, /* Stream Control Transport Protocol */
+ IPPROTO_UDPLITE = 136, /* UDP-Lite (RFC 3828) */
IPPROTO_RAW = 255, /* Raw IP packets */
IPPROTO_MAX
diff --git a/include/linux/socket.h b/include/linux/socket.h
index 3614090..592b666 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -264,6 +264,7 @@ #define SOL_UDP 17
#define SOL_IPV6 41
#define SOL_ICMPV6 58
#define SOL_SCTP 132
+#define SOL_UDPLITE 136 /* UDP-Lite (RFC 3828) */
#define SOL_RAW 255
#define SOL_IPX 256
#define SOL_AX25 257
diff --git a/include/net/snmp.h b/include/net/snmp.h
index 464970e..34183aa 100644
--- a/include/net/snmp.h
+++ b/include/net/snmp.h
@@ -131,6 +131,8 @@ #define SNMP_INC_STATS(mib, field) \
(per_cpu_ptr(mib[!in_softirq()], raw_smp_processor_id())->mibs[field]++)
#define SNMP_DEC_STATS(mib, field) \
(per_cpu_ptr(mib[!in_softirq()], raw_smp_processor_id())->mibs[field]--)
+#define SNMP_DEC_STATS_BH(mib, field) \
+ (per_cpu_ptr(mib[0], raw_smp_processor_id())->mibs[field]--)
#define SNMP_ADD_STATS_BH(mib, field, addend) \
(per_cpu_ptr(mib[0], raw_smp_processor_id())->mibs[field] += addend)
#define SNMP_ADD_STATS_USER(mib, field, addend) \
diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 6df3ecb..7f9913e 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -467,6 +467,7 @@ u16 xfrm_flowi_sport(struct flowi *fl)
switch(fl->proto) {
case IPPROTO_TCP:
case IPPROTO_UDP:
+ case IPPROTO_UDPLITE:
case IPPROTO_SCTP:
port = fl->fl_ip_sport;
break;
@@ -492,6 +493,7 @@ u16 xfrm_flowi_dport(struct flowi *fl)
switch(fl->proto) {
case IPPROTO_TCP:
case IPPROTO_UDP:
+ case IPPROTO_UDPLITE:
case IPPROTO_SCTP:
port = fl->fl_ip_dport;
break;
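
The matching receiver-side sketch below asks the kernel to discard partially
covered datagrams whose coverage is below a minimum, which is what the
UDPLITE_RECV_CSCOV branch in udp_queue_rcv_skb() above implements. Again this
is only an illustration, not part of the patch, and the numeric constants are
assumptions taken from the headers in this series.

/*
 * UDP-Lite receiver sketch: accept only datagrams whose checksum coverage
 * is at least 20 bytes, or which use full coverage.
 */
#include <stdio.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/socket.h>

#ifndef IPPROTO_UDPLITE
#define IPPROTO_UDPLITE    136       /* assumed, see this patch            */
#endif
#ifndef SOL_UDPLITE
#define SOL_UDPLITE        136       /* assumed, see this patch            */
#endif
#ifndef UDPLITE_RECV_CSCOV
#define UDPLITE_RECV_CSCOV 11        /* assumed value from <net/udplite.h> */
#endif

int main(void)
{
        struct sockaddr_in local = { .sin_family      = AF_INET,
                                     .sin_port        = htons(9999),
                                     .sin_addr.s_addr = htonl(INADDR_ANY) };
        char buf[2048];
        int mincov = 20;     /* minimum acceptable coverage, 0 = full coverage */
        ssize_t n;
        int fd = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDPLITE);

        if (fd < 0) {
                perror("socket");
                return 1;
        }
        if (bind(fd, (struct sockaddr *)&local, sizeof(local)) < 0) {
                perror("bind");
                return 1;
        }
        /* Partially covered datagrams with coverage < mincov are dropped. */
        if (setsockopt(fd, SOL_UDPLITE, UDPLITE_RECV_CSCOV,
                       &mincov, sizeof(mincov)) < 0) {
                perror("setsockopt");
                return 1;
        }
        n = recvfrom(fd, buf, sizeof(buf), 0, NULL, NULL);
        if (n >= 0)
                printf("received %zd bytes\n", n);
        return 0;
}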
[email protected] wrote:
> [Net/IPv4]: REVISED Miscellaneous changes which complete the
> v4 support for UDP-Lite.
>
> --- a/include/net/xfrm.h
> +++ b/include/net/xfrm.h
> @@ -467,6 +467,7 @@ u16 xfrm_flowi_sport(struct flowi *fl)
> switch(fl->proto) {
> case IPPROTO_TCP:
> case IPPROTO_UDP:
> + case IPPROTO_UDPLITE:
> case IPPROTO_SCTP:
> port = fl->fl_ip_sport;
> break;
> @@ -492,6 +493,7 @@ u16 xfrm_flowi_dport(struct flowi *fl)
> switch(fl->proto) {
> case IPPROTO_TCP:
> case IPPROTO_UDP:
> + case IPPROTO_UDPLITE:
> case IPPROTO_SCTP:
> port = fl->fl_ip_dport;
> break;
You also need to adapt _decode_session[46] in xfrm[46]_policy.c for
IPsec. While you're at it you might consider adjusting xt_tcpudp,
xt_multiport, ipt_LOG and ip6t_LOG as well to get some basic
netfilter support. I'm going to take care of connection tracking
and NAT once this is in mainline.
Quoting Patrick McHardy:
| You also need to adapt _decode_session[46] in xfrm[46]_policy.c for
| IPsec. While you're at it you might consider adjusting xt_tcpudp,
| xt_multiport, ipt_LOG and ip6t_LOG as well to get some basic
| netfilter support. I'm going to take care of connection tracking
| and NAT once this is in mainline.
Many thanks for the helpful pointers - I will make sure these changes
are in the next version. I will wait a little while for further comments
before sending the updated patch set.
- Gerrit