2001-02-23 07:01:57

by David Miller

Subject: [UPDATE] zerocopy BETA 3


Usual spot:

ftp://ftp.kernel.org/pub/linux/kernel/people/davem/zerocopy-2.4.2-2.diff.gz

Changes since last installment:

1) More errors in TCP receive queue collapser are discovered and
fixed.
2) Several URG handling details on receive side are made more
consistent and sane.
3) Workaround for win2000/95 VJ header compression bugs is
implemented.
4) Update to latest 3c59x driver from Andrew, this should cure some
link type detection problems.
5) IP conntrack fix from Rusty.

Please test. To my knowledge, the only issues remaining now are the
gbit performance problems, which are being discussed by Pekka and
Alexey.

Later,
David S. Miller
[email protected]


2001-02-23 10:39:12

by Jan Rękorajski

Subject: Re: [UPDATE] zerocopy BETA 3

On Thu, 22 Feb 2001, David S. Miller wrote:

>
> Usual spot:
>
> ftp://ftp.kernel.org/pub/linux/kernel/people/davem/zerocopy-2.4.2-2.diff.gz
>
> Changes since last installment:
>
> 3) Workaround for win2000/95 VJ header compression bugs is
> implemented.

Could you please make a patch with this fix only? Or is it available
somewhere?

Jan
--
Jan Rękorajski | ALL SUSPECTS ARE GUILTY. PERIOD!
baggins<at>mimuw.edu.pl | OTHERWISE THEY WOULDN'T BE SUSPECTS, WOULD THEY?
BOFH, MANIAC | -- TROOPS by Kevin Rubio

2001-02-25 03:39:15

by Chris Wedgwood

Subject: Re: [UPDATE] zerocopy BETA 3

On Fri, Feb 23, 2001 at 11:42:49AM +0100, Jan Rekorajski wrote:

> Could you please make a patch with this fix only? Or is it
> available somewhere?

--- linux-2.4.2/include/net/ip.h Sun Feb 25 01:15:19 2001
+++ linux-2.4.2+zc-2/include/net/ip.h Sun Feb 25 01:53:52 2001
@@ -188,11 +188,16 @@

 extern void __ip_select_ident(struct iphdr *iph, struct dst_entry *dst);

-static inline void ip_select_ident(struct iphdr *iph, struct dst_entry *dst)
+static inline void ip_select_ident(struct iphdr *iph, struct dst_entry *dst, struct sock *sk)
 {
-	if (iph->frag_off&__constant_htons(IP_DF))
-		iph->id = 0;
-	else
+	if (iph->frag_off&__constant_htons(IP_DF)) {
+		/* This is only to work around buggy Windows95/2000
+		 * VJ compression implementations. If the ID field
+		 * does not change, they drop every other packet in
+		 * a TCP stream using header compression.
+		 */
+		iph->id = (sk ? sk->protinfo.af_inet.id++ : 0);
+	} else
 		__ip_select_ident(iph, dst);
 }


FWIW; I am still seeing _really_ bad throughput on a 10M ethernet
segment between 2.4.2+zc-2 and Windows98 SE. Nobody else has
complained so I guess it is something local (mii-tool for Windows
wouldn't be a bad idea), but if the above doesn't work for you I'd
be keen to know about it.



--cw
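
For anyone who wants to see the idea in the hunk above outside the kernel
tree, here is a minimal userspace sketch. It assumes the behaviour described
in the patch comment: packets with DF set used to go out with ID 0 every
time, and the fix gives each socket its own counter so the ID changes from
packet to packet. All sketch_* names are made up for illustration, the
header struct is cut down to two fields, and frag_off is kept in host byte
order here (the real code compares against __constant_htons(IP_DF)).

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_IP_DF 0x4000 /* "don't fragment" flag, host byte order in this sketch */

/* Hypothetical stand-in for the per-socket field the patch adds. */
struct sketch_sock {
	uint16_t id; /* ID counter for DF packets */
};

/* Hypothetical stand-in holding only the two header fields involved. */
struct sketch_iphdr {
	uint16_t id;
	uint16_t frag_off;
};

/* Stand-in for __ip_select_ident(); the kernel's real selection logic
 * is not reproduced here, a plain global counter is an assumption. */
static uint16_t sketch_global_id;

static void sketch_select_ident_slow(struct sketch_iphdr *iph)
{
	iph->id = ++sketch_global_id;
}

/* Mirrors the shape of the patched ip_select_ident(). */
static void sketch_select_ident(struct sketch_iphdr *iph, struct sketch_sock *sk)
{
	if (iph->frag_off & SKETCH_IP_DF) {
		/* DF set: take the next value of the socket's own counter,
		 * or fall back to 0 when there is no socket. */
		iph->id = sk ? sk->id++ : 0;
	} else {
		sketch_select_ident_slow(iph);
	}
}
```

Calling sketch_select_ident() twice on DF packets from the same socket
yields IDs 0 and 1, while a caller without a socket (NULL) still gets 0.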

2001-02-25 03:50:35

by Jan Rękorajski

Subject: Re: [UPDATE] zerocopy BETA 3

On Sun, 25 Feb 2001, Chris Wedgwood wrote:

> On Fri, Feb 23, 2001 at 11:42:49AM +0100, Jan Rekorajski wrote:
>
> Could you please make a patch with this fix only? Or is it
> available somewhere?
>
[cut incomplete patch ;)]

There are more changes, I hacked'em out of vger CVS:

diff -urN linux/include/net/ip.h linux.fixed/include/net/ip.h
--- linux/include/net/ip.h Thu Feb 22 01:10:38 2001
+++ linux.fixed/include/net/ip.h Fri Feb 23 14:40:40 2001
@@ -188,11 +188,16 @@

 extern void __ip_select_ident(struct iphdr *iph, struct dst_entry *dst);

-static inline void ip_select_ident(struct iphdr *iph, struct dst_entry *dst)
+static inline void ip_select_ident(struct iphdr *iph, struct dst_entry *dst, struct sock *sk)
 {
-	if (iph->frag_off&__constant_htons(IP_DF))
-		iph->id = 0;
-	else
+	if (iph->frag_off&__constant_htons(IP_DF)) {
+		/* This is only to work around buggy Windows95/2000
+		 * VJ compression implementations. If the ID field
+		 * does not change, they drop every other packet in
+		 * a TCP stream using header compression.
+		 */
+		iph->id = (sk ? sk->protinfo.af_inet.id++ : 0);
+	} else
 		__ip_select_ident(iph, dst);
 }

diff -urN linux/include/net/ipip.h linux.fixed/include/net/ipip.h
--- linux/include/net/ipip.h Sat Aug 5 03:18:49 2000
+++ linux.fixed/include/net/ipip.h Fri Feb 23 14:40:43 2001
@@ -30,7 +30,7 @@
 	int pkt_len = skb->len; \
 	\
 	iph->tot_len = htons(skb->len); \
-	ip_select_ident(iph, &rt->u.dst); \
+	ip_select_ident(iph, &rt->u.dst, NULL); \
 	ip_send_check(iph); \
 	\
 	err = NF_HOOK(PF_INET, NF_IP_LOCAL_OUT, skb, NULL, rt->u.dst.dev, do_ip_send); \
diff -urN linux/include/net/sock.h linux.fixed/include/net/sock.h
--- linux/include/net/sock.h Thu Feb 22 01:10:24 2001
+++ linux.fixed/include/net/sock.h Fri Feb 23 14:40:49 2001
@@ -204,6 +204,7 @@
 	__u8 mc_loop; /* Loopback */
 	unsigned recverr : 1,
 		 freebind : 1;
+	__u16 id; /* ID counter for DF pkts */
 	__u8 pmtudisc;
 	int mc_index; /* Multicast device index */
 	__u32 mc_addr;
diff -urN linux/net/ipv4/af_inet.c linux.fixed/net/ipv4/af_inet.c
--- linux/net/ipv4/af_inet.c Fri Dec 29 23:07:24 2000
+++ linux.fixed/net/ipv4/af_inet.c Fri Feb 23 14:40:34 2001
@@ -355,6 +355,8 @@
 	else
 		sk->protinfo.af_inet.pmtudisc = IP_PMTUDISC_WANT;

+	sk->protinfo.af_inet.id = 0;
+
 	sock_init_data(sock,sk);

 	sk->destruct = inet_sock_destruct;
diff -urN linux/net/ipv4/igmp.c linux.fixed/net/ipv4/igmp.c
--- linux/net/ipv4/igmp.c Tue Jan 9 19:54:57 2001
+++ linux.fixed/net/ipv4/igmp.c Fri Feb 23 14:40:38 2001
@@ -235,7 +235,7 @@
 	iph->saddr = rt->rt_src;
 	iph->protocol = IPPROTO_IGMP;
 	iph->tot_len = htons(IGMP_SIZE);
-	ip_select_ident(iph, &rt->u.dst);
+	ip_select_ident(iph, &rt->u.dst, NULL);
 	((u8*)&iph[1])[0] = IPOPT_RA;
 	((u8*)&iph[1])[1] = 4;
 	((u8*)&iph[1])[2] = 0;
diff -urN linux/net/ipv4/ip_output.c linux.fixed/net/ipv4/ip_output.c
--- linux/net/ipv4/ip_output.c Fri Oct 27 20:03:14 2000
+++ linux.fixed/net/ipv4/ip_output.c Fri Feb 23 14:54:17 2001
@@ -141,7 +141,7 @@
 	iph->saddr = rt->rt_src;
 	iph->protocol = sk->protocol;
 	iph->tot_len = htons(skb->len);
-	ip_select_ident(iph, &rt->u.dst);
+	ip_select_ident(iph, &rt->u.dst, sk);
 	skb->nh.iph = iph;

 	if (opt && opt->optlen) {
@@ -307,7 +307,7 @@
 	if (ip_dont_fragment(sk, &rt->u.dst))
 		iph->frag_off |= __constant_htons(IP_DF);

-	ip_select_ident(iph, &rt->u.dst);
+	ip_select_ident(iph, &rt->u.dst, sk);

 	/* Add an IP checksum. */
 	ip_send_check(iph);
@@ -328,7 +328,7 @@
 		kfree_skb(skb);
 		return -EMSGSIZE;
 	}
-	ip_select_ident(iph, &rt->u.dst);
+	ip_select_ident(iph, &rt->u.dst, sk);
 	return ip_fragment(skb, skb->dst->output);
 }

@@ -425,7 +425,7 @@
 	int err;
 	int offset, mf;
 	int mtu;
-	u16 id = 0;
+	u16 id;

 	int hh_len = (rt->u.dst.dev->hard_header_len + 15)&~15;
 	int nfrags=0;
@@ -495,6 +495,8 @@
 	 * Begin outputting the bytes.
 	 */

+	id = (sk ? sk->protinfo.af_inet.id++ : 0);
+
 	do {
 		char *data;
 		struct sk_buff * skb;
@@ -677,7 +679,7 @@
 		iph->tot_len = htons(length);
 		iph->frag_off = df;
 		iph->ttl=sk->protinfo.af_inet.mc_ttl;
-		ip_select_ident(iph, &rt->u.dst);
+		ip_select_ident(iph, &rt->u.dst, sk);
 		if (rt->rt_type != RTN_MULTICAST)
 			iph->ttl=sk->protinfo.af_inet.ttl;
 		iph->protocol=sk->protocol;
diff -urN linux/net/ipv4/ipmr.c linux.fixed/net/ipv4/ipmr.c
--- linux/net/ipv4/ipmr.c Wed Nov 29 06:53:45 2000
+++ linux.fixed/net/ipv4/ipmr.c Fri Feb 23 14:40:45 2001
@@ -1092,7 +1092,7 @@
 	iph->protocol = IPPROTO_IPIP;
 	iph->ihl = 5;
 	iph->tot_len = htons(skb->len);
-	ip_select_ident(iph, skb->dst);
+	ip_select_ident(iph, skb->dst, NULL);
 	ip_send_check(iph);

 	skb->h.ipiph = skb->nh.iph;
diff -urN linux/net/ipv4/raw.c linux.fixed/net/ipv4/raw.c
--- linux/net/ipv4/raw.c Fri Feb 9 20:29:44 2001
+++ linux.fixed/net/ipv4/raw.c Fri Feb 23 14:40:47 2001
@@ -296,7 +296,7 @@
 		 * ip_build_xmit clean (well less messy).
 		 */
 		if (!iph->id)
-			ip_select_ident(iph, rfh->dst);
+			ip_select_ident(iph, rfh->dst, NULL);
 		iph->check=ip_fast_csum((unsigned char *)iph, iph->ihl);
 	}
 	return 0;

> FWIW; I am still seeing _really_ bad throughput on a 10M ethernet
> segment between 2.4.2+zc-2 and Windows98 SE. Nobody else has
> complained so I guess it is something local (mii-tool for Windows
> wouldn't be a bad idea), but if the above doesn't work for you I'd
> be keen to know about it.

I haven't had the time to test it fully yet, but DaveM's quick and dirty patch
for this cured my problems.

Jan
--
Jan Rękorajski | ALL SUSPECTS ARE GUILTY. PERIOD!
baggins<at>mimuw.edu.pl | OTHERWISE THEY WOULDN'T BE SUSPECTS, WOULD THEY?
BOFH, MANIAC | -- TROOPS by Kevin Rubio

2001-02-26 05:31:55

by David Miller

Subject: Re: [UPDATE] zerocopy BETA 3


Chris Wedgwood writes:
> --- linux-2.4.2/include/net/ip.h Sun Feb 25 01:15:19 2001
> +++ linux-2.4.2+zc-2/include/net/ip.h Sun Feb 25 01:53:52 2001

You need the part that adds "id" to the sock struct too.
This won't build "as-is".

Besides, I'd like people to have to test the zerocopy stuff
for me, they'll get the ID fix if they do that :-)

Later,
David S. Miller
[email protected]

2001-02-26 22:37:00

by Michael Peddemors

Subject: Re: [UPDATE] zerocopy.. While working on ip.h stuff

While doing some work on some ip options stuff, I have noticed a bunch of
unused entries in linux/include/linux/ip.h

A few things.. why is ip.h not part of the linux/include/net hierarchy
rather than linux/include/linux?

Defined items that are not used anywhere in the source..
Can any of them be deleted now?
<see below>

Also, I was looking into some RFC 1812 stuff. (Thanks for nothing Dave :) and
was looking at 4.2.2.6 where it mentions that a router MUST implement the End
of Option List option.. Haven't figured out where that is implemented yet..

Also was trying to figure out some things.
I want to create a new ip_option for use in some DOS protection experiments.
I have a whole 40 bytes (+/-) to share... Now although I don't see anything
explicitly prohibiting the use of unused IP Header option space, I know that
it really was designed for use by the sending parties, and not routers in
between.. Has anyone seen any RFC that explicitly says I MUST NOT?


IPTOS_PREC_NETCONTROL
IPTOS_PREC_FLASHOVERRIDE
IPTOS_PREC_FLASH
IPTOS_PREC_IMMEDIATE
IPTOS_PREC_PRIORITY
IPTOS_PREC_ROUTINE
IPOPT_RESERVED1
IPOPT_RESERVED2
IPOPT_OPTVAL
IPOPT_OLEN
IPOPT_MINOFF
MAX_IPOPTLEN
IPOPT_EOL



> diff -urN linux/include/net/ip.h linux.fixed/include/net/ip.h
--------------------------------------------------------
Michael Peddemors - Senior Consultant
Unix Administration - WebSite Hosting
Network Services - Programming
Wizard Internet Services http://www.wizard.ca
Linux Support Specialist - http://www.linuxmagic.com
--------------------------------------------------------
(604) 589-0037 Beautiful British Columbia, Canada
--------------------------------------------------------

2001-02-26 23:23:31

by Andi Kleen

Subject: Re: [UPDATE] zerocopy.. While working on ip.h stuff

Michael Peddemors <[email protected]> writes:

> A few things.. why is ip.h not part of the linux/include/net hierarchy
> rather than linux/include/linux?

Because it needs to be user visible for raw sockets (include/linux is exported
to the user, include/net isn't).

> Defined items that are not used anywhere in the source..
> Can any of them be deleted now?

Nope. They can be useful for the user.

> Also, I was looking into some RFC 1812 stuff. (Thanks for nothing Dave :) and
> was looking at 4.2.2.6 where it mentions that a router MUST implement the End
> of Option List option.. Haven't figured out where that is implemented yet..

It is (see net/ipv4/ip_options.c:ip_options_compile())

> Also was trying to figure out some things.
> I want to create a new ip_option for use in some DOS protection experiments.
> I have a whole 40 bytes (+/-) to share... Now although I don't see anything
> explicitly prohibiting the use of unused IP Header option space, I know that
> it really was designed for use by the sending parties, and not routers in
> between.. Has anyone seen any RFC that explicitly says I MUST NOT?

Using IP options is strongly deprecated because it causes a lot of switches/routers
to go from hardware into software switch mode (-> it kills your gigabit routers)


> IPTOS_PREC_NETCONTROL
[...]
They are implemented, just only implicitly as an array index.


-Andi
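
To make the "array index" remark concrete: the precedence is the top three
bits of the TOS byte, so shifting it down gives a number from 0 to 7 that can
index a table directly, without any of the IPTOS_PREC_* names appearing in
the code. The sketch below is only an illustration under that assumption. I
have not checked which kernel table is meant, and the sketch_* names and the
name table are hypothetical.

```c
#include <stdint.h>

#define SKETCH_PREC_MASK 0xE0 /* top three bits of the TOS byte */

/* Hypothetical table indexed by precedence; entry 0 corresponds to
 * IPTOS_PREC_ROUTINE (0x00) and entry 7 to IPTOS_PREC_NETCONTROL (0xe0). */
static const char *const sketch_prec_name[8] = {
	"routine",
	"priority",
	"immediate",
	"flash",
	"flash override",
	"critic/ecp",
	"internetwork control",
	"network control",
};

/* Turn a TOS byte into an index 0..7; the low five bits are ignored. */
static unsigned int sketch_prec_index(uint8_t tos)
{
	return (unsigned int)(tos & SKETCH_PREC_MASK) >> 5;
}
```

With this, sketch_prec_name[sketch_prec_index(tos)] picks the entry for any
TOS byte, which is why the named constants can look unused in a grep.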

2001-02-26 23:30:03

by David Miller

Subject: Re: [UPDATE] zerocopy.. While working on ip.h stuff


Michael Peddemors writes:
> A few things.. why is ip.h not part of the linux/include/net hierarchy
> rather than linux/include/linux?

Exported to older userlands...

> Defined items that are not used anywhere in the source..
> Can any of them be deleted now?
> <see below>

So what, userland makes use of them :-)

> Also, I was looking into some RFC 1812 stuff. (Thanks for nothing Dave :) and
> was looking at 4.2.2.6 where it mentions that a router MUST implement the End
> of Option List option.. Haven't figured out where that is implemented yet..

egrep "IPOPT_END" net/ipv4/ip_options.c

You just aren't looking hard enough.

> Also was trying to figure out some things.
> I want to create a new ip_option for use in some DOS protection experiments.
> I have a whole 40 bytes (+/-) to share... Now although I don't see anything
> explicitly prohibiting the use of unused IP Header option space, I know that
> it really was designed for use by the sending parties, and not routers in
> between.. Has anyone seen any RFC that explicitly says I MUST NOT?

Not to my knowledge. Routers already change the time to live field,
so I see no reason why they can't do smart things with special IP
options either (besides efficiency concerns :-).

Later,
David S. Miller
[email protected]
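
For readers who have not looked at option parsing before, the sketch below
shows roughly what honouring End of Option List means when walking the
option bytes: type 0 stops the walk, type 1 (no-op) is a single byte, and
everything else carries a length byte. This is a simplified userspace
illustration, not the code in net/ipv4/ip_options.c, and the sketch_* names
are made up.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_IPOPT_END  0 /* End of Option List: one byte, stops parsing */
#define SKETCH_IPOPT_NOOP 1 /* No Operation: one byte of padding */

/* Walk the option area of an IP header.
 * Returns the number of type/length/value options seen before the end of
 * the area or an End of Option List byte, or -1 if an option is malformed. */
static int sketch_count_options(const uint8_t *opt, size_t len)
{
	size_t i = 0;
	int count = 0;

	while (i < len) {
		uint8_t type = opt[i];
		uint8_t optlen;

		if (type == SKETCH_IPOPT_END)
			break; /* anything after this is padding */
		if (type == SKETCH_IPOPT_NOOP) {
			i++;
			continue;
		}
		if (i + 1 >= len)
			return -1; /* no room for the length byte */
		optlen = opt[i + 1];
		if (optlen < 2 || optlen > len - i)
			return -1; /* length must cover type+len and fit */
		count++;
		i += optlen;
	}
	return count;
}
```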

2001-02-26 23:52:00

by Benjamin C.R. LaHaise

Subject: Re: [UPDATE] zerocopy.. While working on ip.h stuff

On Mon, 26 Feb 2001, David S. Miller wrote:

> Not to my knowledge. Routers already change the time to live field,
> so I see no reason why they can't do smart things with special IP
> options either (besides efficiency concerns :-).

A number of ISPs patch the MSS value to 1492 due to the ridiculous number
of PMTU black holes out on the net. Since the ip header fits in the cache
of some CPUs (like the P4), this is becoming a cheaper operation than ever
before.

-ben
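
As an illustration of the kind of in-flight rewrite described above, here is
a rough sketch of clamping the MSS option. Note that the MSS lives in the
TCP options of a SYN segment rather than in the IP header. The function
below only rewrites the option bytes and leaves out the TCP checksum fix-up
that a real middlebox would also need. The sketch_* names are hypothetical,
and the clamp value is a parameter so the 1492 figure from the message can
be passed in.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_TCPOPT_EOL 0 /* end of option list */
#define SKETCH_TCPOPT_NOP 1 /* one-byte padding */
#define SKETCH_TCPOPT_MSS 2 /* maximum segment size, total length 4 */

/* Walk the TCP option bytes and lower the MSS to max_mss if it is larger.
 * Returns 1 if the option was rewritten, 0 otherwise. */
static int sketch_clamp_mss(uint8_t *opt, size_t len, uint16_t max_mss)
{
	size_t i = 0;

	while (i < len) {
		uint8_t kind = opt[i];
		uint8_t optlen;

		if (kind == SKETCH_TCPOPT_EOL)
			break;
		if (kind == SKETCH_TCPOPT_NOP) {
			i++;
			continue;
		}
		if (i + 1 >= len)
			break; /* truncated option, leave the packet alone */
		optlen = opt[i + 1];
		if (optlen < 2 || optlen > len - i)
			break; /* malformed length, leave the packet alone */
		if (kind == SKETCH_TCPOPT_MSS && optlen == 4) {
			/* MSS value is a 16-bit big-endian number */
			uint16_t mss = (uint16_t)((opt[i + 2] << 8) | opt[i + 3]);

			if (mss <= max_mss)
				return 0;
			opt[i + 2] = (uint8_t)(max_mss >> 8);
			opt[i + 3] = (uint8_t)(max_mss & 0xff);
			return 1;
		}
		i += optlen;
	}
	return 0;
}
```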

2001-02-27 00:11:23

by David Miller

Subject: Re: [UPDATE] zerocopy.. While working on ip.h stuff


Benjamin C.R. LaHaise writes:
> Since the ip header fits in the cache of some CPUs (like the P4),
> this is becoming a cheaper operation than ever before.

At gigapacket rates, it becomes an issue. This guy is talking about
tinkering with new IP _options_, not just the header. So even if the
IP header itself fits totally in a cache line, the options afterwards
likely will not and thus require another cache miss.

Later,
David S. Miller
[email protected]
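
The arithmetic behind the extra cache miss can be checked with a small
helper. This is just a sketch: the function name is made up, and the line
sizes and start offsets you feed it are assumptions about the machine and
about where the header sits in the buffer, not figures from this thread.

```c
#include <stddef.h>

/* Number of cache lines touched by a region of `len` bytes that starts at
 * byte offset `start` from a line-aligned buffer, for a given line size. */
static size_t sketch_lines_touched(size_t start, size_t len, size_t line_size)
{
	if (len == 0)
		return 0;
	return (start + len - 1) / line_size - start / line_size + 1;
}
```

For example, with an assumed 32-byte line, a 20-byte header at offset 0
touches one line, while a 60-byte header (20 bytes plus 40 bytes of options)
touches two. Put the same 20-byte header after a 14-byte Ethernet header and
it already straddles two lines.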

2001-02-27 00:15:53

by Benjamin C.R. LaHaise

Subject: Re: [UPDATE] zerocopy.. While working on ip.h stuff

On Mon, 26 Feb 2001, David S. Miller wrote:

> At gigapacket rates, it becomes an issue. This guy is talking about
> tinkering with new IP _options_, not just the header. So even if the
> IP header itself fits totally in a cache line, the options afterwards
> likely will not and thus require another cache miss.

Hmmm, one way around this is to have the packet queue store things in
a linear array of pointers to data areas, then process things in
bursts, i.e.:

- find packet data areas for queued packets
- walk list doing prefetches of ip header and options
- then actually do the packet processing (save output for later)

That will require a number of new hooks for pipelining operations, though.
Just a thought.

-ben
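
The three steps above could look something like the following in userspace
C. This is only a sketch of the burst idea, not a proposal for the actual
hooks: the sketch_* names, the burst size and the assumed 32-byte line are
all made up, __builtin_prefetch is the GCC/Clang builtin, and the
"processing" step just reads the header length so there is something to
observe.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_BURST 16 /* hypothetical maximum packets per burst */
#define SKETCH_LINE  32 /* assumed cache line size in bytes */

/* One queued packet: pointer to the start of its IP header. */
struct sketch_pkt {
	const uint8_t *data;
	size_t len;
};

/* Process up to SKETCH_BURST packets in two passes.
 * out[i] receives the IP header length in bytes of packet i.
 * Returns the number of packets handled. */
static size_t sketch_process_burst(const struct sketch_pkt *pkts, size_t n,
				   unsigned int *out)
{
	size_t i;

	if (n > SKETCH_BURST)
		n = SKETCH_BURST;

	/* Pass 1: walk the array and start the loads for header and options. */
	for (i = 0; i < n; i++) {
		__builtin_prefetch(pkts[i].data, 0, 3);
		if (pkts[i].len > SKETCH_LINE)
			__builtin_prefetch(pkts[i].data + SKETCH_LINE, 0, 3);
	}

	/* Pass 2: do the real work; by now the data should be in cache. */
	for (i = 0; i < n; i++)
		out[i] = pkts[i].len ? (pkts[i].data[0] & 0x0fu) * 4u : 0u;

	return n;
}
```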

2001-02-27 02:14:20

by Michael Peddemors

Subject: Re: [UPDATE] zerocopy.. While working on ip.h stuff

On Mon, 26 Feb 2001, David S. Miller wrote:

> > Also, I was looking into some RFC 1812 stuff. (Thanks for nothing Dave
> > :) and was looking at 4.2.2.6 where it mentions that a router MUST
> > implement the End of Option List option.. Haven't figured out where
> > that is implemented yet..
>
> egrep "IPOPT_END" net/ipv4/ip_options.c
>
> You just aren't looking hard enough.

Was looking for IPOPT_EOL :) Forgot about its reference..


--
"Catch the magic of Linux...."
--------------------------------------------------------
Michael Peddemors - Senior Consultant
Unix Administration - WebSite Hosting
Network Services - Programming
Wizard Internet Services http://www.wizard.ca
Linux Support Specialist - http://www.linuxmagic.com
--------------------------------------------------------
(604) 589-0037 Beautiful British Columbia, Canada
--------------------------------------------------------

2001-02-27 02:31:43

by Michael Peddemors

Subject: Re: [UPDATE] zerocopy.. While working on ip.h stuff

On Mon, 26 Feb 2001, Benjamin C.R. LaHaise wrote:
> On Mon, 26 Feb 2001, David S. Miller wrote:
> > At gigapacket rates, it becomes an issue. This guy is talking about
> > tinkering with new IP _options_, not just the header. So even if the
> > IP header itself fits totally in a cache line, the options afterwards
> > likely will not and thus require another cache miss.

Yes, I expect to use the whole of the allowed size :)
So instead of the more common IP Header length of 20 bytes, I will be using
25-60 bytes for a header, (But so does source routing) and the router RFC
says that we should handle it...
Now, of course, you have raised the question of whether that would be handled
efficiently with the current kernel code..

> Hmmm, one way around this is to have the packet queue store things in
> a linear array of pointers to data areas, then process things in
> bursts, i.e.:
>
> - find packet data areas for queued packets
> - walk list doing prefetches of ip header and options
> - then actually do the packet processing (save output for later)
>
> That will require a number of new hooks for pipelining operations, though.
> Just a thought.
>
> -ben

--
"Catch the magic of Linux...."
--------------------------------------------------------
Michael Peddemors - Senior Consultant
Unix Administration - WebSite Hosting
Network Services - Programming
Wizard Internet Services http://www.wizard.ca
Linux Support Specialist - http://www.linuxmagic.com
--------------------------------------------------------
(604) 589-0037 Beautiful British Columbia, Canada
--------------------------------------------------------
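
On the header sizes mentioned above: the IP header length field counts
32-bit words, so the option area is padded to a multiple of 4 bytes and the
total header grows in 4-byte steps from 20 up to 60. A small sketch of that
arithmetic (the function name is made up for illustration):

```c
#include <stddef.h>

/* Total IP header length in bytes for a given number of option bytes.
 * Options are padded up to a multiple of 4; at most 40 bytes are allowed.
 * Returns -1 if the options do not fit. */
static int sketch_header_len(size_t opt_bytes)
{
	size_t padded;

	if (opt_bytes > 40)
		return -1;
	padded = (opt_bytes + 3) & ~(size_t)3;
	return (int)(20 + padded);
}
```

So a single option byte already costs a 24-byte header, and using the whole
40 bytes gives the 60-byte maximum.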