I recently moved one of my proxies (squid plus a compressing application)
from 2.6.21 to 2.6.22 and noticed a huge performance drop. I think this is
important, because it may cause serious regressions on other workloads such
as busy web servers.
After some analysis of different options I can give more exact numbers:
2.6.21 is able to process 500-550 requests/second and 15-20 Mbit/s of
traffic, and works great without any slowdown or instability.
2.6.22 is able to process only 250-300 requests/second and 8-10 Mbit/s of
traffic, and ssh and the console are "freezing" (there is a delay even when
typing characters).
Both proxies run on identical hardware (Sun Fire X4100) and configuration
(small LFS-like system on USB flash); only the kernel differs.
I tried disabling/enabling various options and optimisations - nothing
changed until I reached the SLUB/SLAB option.
I loaded the proxy configuration onto a Gentoo PC with 2.6.22 (then upgraded
it to 2.6.23-rc8) and saw the same effect.
Additionally, when the load reaches its maximum I notice a whole-system
slowdown; for example, ssh and scp take much more time to run, even if I
nice -n -5 them.
But even with 2.6.23-rc8 + SLAB I noticed the same "freezing" of ssh (and
surely it slows down other kinds of network traffic too), although much less
than with SLUB. In top I see ksoftirqd taking almost 100% (sometimes
ksoftirqd/0, sometimes ksoftirqd/1).
I also tried various tricks with the scheduler (/proc/sys/kernel/sched*),
but that didn't help either.
When it freezes it looks like:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
7 root 15 -5 0 0 0 R 64 0.0 2:47.48 ksoftirqd/1
5819 root 20 0 134m 130m 596 R 57 3.3 4:36.78 globax
5911 squid 20 0 1138m 1.1g 2124 R 26 28.9 2:24.87 squid
10 root 15 -5 0 0 0 S 1 0.0 0:01.86 events/1
6130 root 20 0 3960 2416 1592 S 0 0.1 0:08.02 oprofiled
Oprofile results:
That's oprofile with 2.6.23-rc8 + SLUB:
73918 21.5521 check_bytes
38361 11.1848 acpi_pm_read
14077 4.1044 init_object
13632 3.9747 ip_send_reply
8486 2.4742 __slab_alloc
7199 2.0990 nf_iterate
6718 1.9588 page_address
6716 1.9582 tcp_v4_rcv
6425 1.8733 __slab_free
5604 1.6339 on_freelist
That's oprofile with 2.6.23-rc8 + SLAB:
CPU: AMD64 processors, speed 2592.64 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit
mask of 0x00 (No unit mask) count 100000
samples % symbol name
138991 14.0627 acpi_pm_read
52401 5.3018 tcp_v4_rcv
48466 4.9037 nf_iterate
38043 3.8491 __slab_alloc
34155 3.4557 ip_send_reply
20963 2.1210 ip_rcv
19475 1.9704 csum_partial
19084 1.9309 kfree
17434 1.7639 ip_output
17278 1.7481 netif_receive_skb
15248 1.5428 nf_hook_slow
My .config is at http://www.nuclearcat.com/.config (SPARSEMEM is enabled
there; it doesn't make any noticeable difference).
Please CC me on replies, I am not on the list.
--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.
Could you try with SLUB but disabling CONFIG_SLUB_DEBUG ?
Hi
pi ~ # zcat /proc/config.gz |grep SLUB
# CONFIG_SLUB_DEBUG is not set
CONFIG_SLUB=y
21:27:20     CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle  intr/s
21:27:21     all    5.53   21.61    8.54    0.00    0.50    5.03    0.00   58.79 7147.52
21:27:22     all    6.40   15.76    8.87    0.00    0.49   20.20    0.00   48.28 6327.00
21:27:23     all    6.53   20.60   12.06    0.00    0.00   52.76    0.00    8.04  456.44
21:27:24     all    7.00   17.50   10.50    0.00    0.00   53.00    0.00   12.00  439.00
21:27:25     all    7.50   18.00   12.50    0.50    0.50   52.50    0.00    8.50  464.00
21:27:26     all    6.64   16.11   11.37    0.00    0.00   58.77    0.00    7.11  401.90
21:27:27     all    0.00    0.40    0.40    0.00    0.00   99.20    0.00    0.00  325.20
21:27:28     all    0.00    0.50    0.00    0.00    0.00   99.50    0.00    0.00  453.47
21:27:29     all    0.00    0.00    0.00    0.00    0.00  100.00    0.00    0.00  495.10
21:27:30     all    0.00    0.50    0.00    0.00    0.00   99.50    0.00    0.00  625.74
21:27:31     all    0.00    0.00    0.00    0.00    0.00  100.00    0.00    0.00  600.97
21:27:32     all    0.00    0.00    0.00    0.00    0.00  100.00    0.00    0.00  604.95
64 bytes from 127.0.0.1: icmp_seq=15 ttl=64 time=0.031 ms
64 bytes from 127.0.0.1: icmp_seq=16 ttl=64 time=229 ms
64 bytes from 127.0.0.1: icmp_seq=17 ttl=64 time=253 ms
64 bytes from 127.0.0.1: icmp_seq=18 ttl=64 time=257 ms
64 bytes from 127.0.0.1: icmp_seq=19 ttl=64 time=279 ms
64 bytes from 127.0.0.1: icmp_seq=20 ttl=64 time=246 ms
64 bytes from 127.0.0.1: icmp_seq=21 ttl=64 time=246 ms
64 bytes from 127.0.0.1: icmp_seq=22 ttl=64 time=234 ms
64 bytes from 127.0.0.1: icmp_seq=23 ttl=64 time=235 ms
The problem remains. This is another server.
On Sun, 30 Sep 2007 19:48:44 +0200, Eric Dumazet wrote
> Could you try with SLUB but disabling CONFIG_SLUB_DEBUG ?
--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.
Hi Denys, thanks for reporting (btw. please reply-to-all when replying
on lkml).
You say that SLAB is better than SLUB on an otherwise identical kernel,
but I didn't see if you quantified the actual numbers? It sounds like there
is still a regression with SLAB?
Hi Nick
There are no exact numbers for now; it is just instability.
I am bisecting the kernel now (from 2.6.21 to 2.6.22-rc1). I think it is
caused by some patch, and maybe not related to CFS. Anyway, let's see when I
finish the bisect. I was just wondering whether anyone else has similar
issues or some ideas in mind.
The only problem is that it is very difficult to track, and I have to do it
on a live server with thousands of users. The bug is easy to detect if
softirq in "mpstat 1" jumps up to 30+%: on 2.6.21 it always stays around
10%, while on the "buggy" version it jumps up to 100%. At night the load is
much lower, so I might miss the point by mistake: with the lower load it
only goes up to 30-50%, so on this hardware it is not very noticeable (which
is maybe why there has not been much feedback). I try to utilise the
hardware as much as possible, so I feel regressions much more.
Anyway, I should have news in 2-3 hours at most.
On Sun, 30 Sep 2007 14:25:37 +1000, Nick Piggin wrote
> Hi Denys, thanks for reporting (btw. please reply-to-all when
> replying on lkml).
>
> You say that SLAB is better than SLUB on an otherwise identical
> kernel, but I didn't see if you quantified the actual numbers? It
> sounds like there is still a regression with SLAB?
--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.
Hi
I got
pi linux-git # git bisect bad
Bisecting: 0 revisions left to test after this
[f85958151900f9d30fa5ff941b0ce71eaa45a7de] [NET]: random functions can use
nsec resolution instead of usec
I will double-check and try to revert this patch on 2.6.22.
But it seems "that's it".
--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.
Hi
I can confirm that reverting this patch on 2.6.23-rc8 fixes the problem.
Checked on 3 different servers.
softirq is no longer jumping to the top of "top", and mpstat also looks
stable, reasonable and nice.
pi linux-git # git bisect bad
Bisecting: 0 revisions left to test after this
[f85958151900f9d30fa5ff941b0ce71eaa45a7de] [NET]: random functions can use
nsec resolution instead of usec
--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.
Denys wrote:
> Hi
>
> I got
>
> pi linux-git # git bisect bad
> Bisecting: 0 revisions left to test after this
> [f85958151900f9d30fa5ff941b0ce71eaa45a7de] [NET]: random functions can use
> nsec resolution instead of usec
>
> I will double-check and try to revert this patch on 2.6.22
>
> But it seems "that's it".
Well... that's interesting...
No problem here on bigger servers, so I CC David Miller and netdev on this one.
AFAIK do_gettimeofday() and ktime_get_real() should use the same underlying
hardware functions on PC and no performance problem should happen here.
The relevant part of the patch:
@@ -1521,7 +1515,6 @@ __u32 secure_ip_id(__be32 daddr)
 __u32 secure_tcp_sequence_number(__be32 saddr, __be32 daddr,
 				 __be16 sport, __be16 dport)
 {
-	struct timeval tv;
 	__u32 seq;
 	__u32 hash[4];
 	struct keydata *keyptr = get_keyptr();
@@ -1543,12 +1536,11 @@ __u32 secure_tcp_sequence_number(__be32 saddr, __be32 daddr,
 	 * As close as possible to RFC 793, which
 	 * suggests using a 250 kHz clock.
 	 * Further reading shows this assumes 2 Mb/s networks.
-	 * For 10 Mb/s Ethernet, a 1 MHz clock is appropriate.
+	 * For 10 Gb/s Ethernet, a 1 GHz clock is appropriate.
 	 * That's funny, Linux has one built in! Use it!
 	 * (Networks are faster now - should this be increased?)
 	 */
-	do_gettimeofday(&tv);
-	seq += tv.tv_usec + tv.tv_sec * 1000000;
+	seq += ktime_get_real().tv64;
Thank you for doing this research.
A few more details about the hardware:
Sun Fire X4100 (AMD Opteron 252); the chipset looks like the AMD-8111/AMD-8131 chips.
There is no HPET detected, and by default acpi_pm is used, which seems more
CPU intensive (based on the oprofile results) than TSC. Choosing TSC via
/sys doesn't make much difference.
The workload is HTTP requests coming from customers (the noticeable slowdown
happens at >250 requests/second). The requests are routed to squid, marked
with a ToS at squid, and depending on the ToS routed via netfilter to the
required TCP port. At that TCP port a "satellite accelerator" intercepts the
connection (so it is on loopback, which means "double" TCP work) and
encapsulates the traffic into UDP. It also receives UDP (a stream of around
10-20 Mbit/s) and decapsulates it, and the same happens in reverse towards
the customer. In summary:
2x Opteron 2.4 GHz
Incoming TCP request rate 250-500 req/s
Incoming TCP bandwidth around 3-5 Mbit/s
Outgoing TCP bandwidth around 33-35 Mbit/s
Internally routed TCP also around this number (minus cached content)
Incoming UDP bandwidth around 20-25 Mbit/s
Outgoing UDP bandwidth around 500 Kbit/s
On a 2x dual-core Opteron 2.6 GHz (4 cores total) with a similar load I
cannot notice a big slowdown, so I think it is noticeable only when the
hardware is used near its limits. BUT I can see spikes of softirq in mpstat,
from the normal 7-8% up to 50-60%, although they don't cause any noticeable
slowdown of ssh or system operations. I would expect 2.6.21 to serve 600-700
req/s. Maybe there is some additional overhead in these calculations, or
improper IRQ locking? I am not a guru in such matters.
I guess anyone with a similar workload can check "mpstat 1" and see whether
there are spikes in soft% or not.
--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.
From: Eric Dumazet <[email protected]>
Date: Mon, 01 Oct 2007 07:59:12 +0200
> No problem here on bigger servers, so I CC David Miller and netdev
> on this one. AFAIK do_gettimeofday() and ktime_get_real() should
> use the same underlying hardware functions on PC and no performance
> problem should happen here.
One thing that jumps out at me is that on 32-bit (and to a certain
extent on 64-bit) there is a lot of stack accesses and missed
optimizations because all of the work occurs, and gets expanded,
inside of ktime_get_real().
The timespec_to_ktime() inside of there constructs the ktime_t return
value on the stack, then returns that as an aggregate to the caller.
That cannot be without some cost.
ktime_get_real() is definitely a candidate for inlining especially in
these kinds of cases where we'll happily get computations in local
registers instead of all of this on-stack nonsense. And in several
cases (if the caller only needs the tv_sec value, for example)
computations can be elided entirely.
It would be constructive to experiment and see if this is in fact part
of the problem.
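As a rough, stand-alone illustration of the aggregate-return pattern David
describes (a hypothetical example, not the actual kernel code), a small
struct built and returned by value may travel through memory via a hidden
pointer on some ABIs, while a plain scalar return normally stays in
registers:

#include <stdio.h>

/* Hypothetical stand-in for ktime_t: a tiny aggregate returned by value. */
struct ns_val {
	long long ns;
};

/* Aggregate return: depending on the ABI, the caller may pass a hidden
 * pointer and the callee builds the result on the stack. */
static struct ns_val make_aggregate(long long sec, long long nsec)
{
	struct ns_val v = { .ns = sec * 1000000000LL + nsec };
	return v;
}

/* Scalar return: the same value normally comes back in a register. */
static long long make_scalar(long long sec, long long nsec)
{
	return sec * 1000000000LL + nsec;
}

int main(void)
{
	struct ns_val a = make_aggregate(5, 250);

	printf("aggregate: %lld ns, scalar: %lld ns\n",
	       a.ns, make_scalar(5, 250));
	return 0;
}

Whether this overhead is actually measurable in secure_tcp_sequence_number()
is exactly what David suggests experimenting with.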
Well, I can play a bit more on the "live" servers. I now have a hot-swap
server with a full Gentoo install, where I can rebuild any kernel you want,
with any patch applied.
But it looks less like plain overhead: the load becomes very "spiky" rather
than just permanently higher. It is also not normal that the whole system
becomes unresponsive (for example, ping to 127.0.0.1 reaches 300 ms during
the periods when softirq usage jumps to 100%).
--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.
Could you try a pristine 2.6.22.9 with a patch to
secure_tcp_sequence_number() like this:
--- drivers/char/random.c.orig	2007-10-01 10:18:42.000000000 +0200
+++ drivers/char/random.c	2007-10-01 10:19:58.000000000 +0200
@@ -1554,7 +1554,7 @@
 	 * That's funny, Linux has one built in! Use it!
 	 * (Networks are faster now - should this be increased?)
 	 */
-	seq += ktime_get_real().tv64;
+	seq += ktime_get_real().tv64 / 1000;
 #if 0
 	printk("init_seq(%lx, %lx, %d, %d) = %d\n",
 	       saddr, daddr, sport, dport, seq);
Thank you
On 32-bit machines, replace the divide with a shift to avoid a linker error
(undefined reference to `__divdi3'):
	seq += ktime_get_real().tv64 >> 10;
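For context: the kernel is not linked against libgcc, so on 32-bit x86 a
plain 64-bit division by 1000 is compiled into a call to the missing
__divdi3 helper, which is the linker error mentioned above. A shift avoids
it entirely; when a real divide is wanted, the usual kernel idiom is
do_div(), which divides a u64 in place and returns the remainder. A purely
illustrative sketch of that alternative (my own, not a patch from this
thread):

/* Hypothetical helper, for illustration only: convert the nanosecond
 * ktime value to microseconds (the 2.6.21 granularity) without __divdi3. */
static __u32 isn_clock_usec(void)
{
	u64 ns = ktime_get_real().tv64;	/* nanoseconds */

	do_div(ns, NSEC_PER_USEC);	/* in-place divide by 1000 */
	return (__u32)ns;
}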
"Denys" <[email protected]> writes:
> The bug is easy to detect if softirq in "mpstat 1" jumps up to 30+%: on
> 2.6.21 it always stays around 10%, while on the "buggy" version it jumps
> up to 100%. With the lower load at night it only goes up to 30-50%, so on
> this hardware it is not very noticeable (which is maybe why there has not
> been much feedback). I try to utilise the hardware as much as possible, so
> I feel regressions much more.
You could use oprofile to find where the CPU time is spent
-Andi
I already did, and it didn't show a real failure point.
--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.
On Mon, Oct 01, 2007 at 01:30:52PM +0300, Denys wrote:
>
> I already did, and it didn't show a real failure point.
There was no difference between the profiles of the working kernel
and the highly loaded kernel? Perhaps you just have a legitimately higher load?
-Andi
Hi
No major differences.
Some functions are a bit higher, some a bit lower, but everything looks fine
on both kernels.
That's what I tried first - oprofile - and the only key point I was able to
catch is that in top, ksoftirqd/0 or /1 shows 100% CPU usage, and in mpstat
soft% also reaches 100%.
--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.
Then you likely just have higher load. What makes you think this
is a kernel problem?
-Andi
The fact that 2.6.21 works fine and 2.6.22 is MUCH worse on multiple servers
with a similar workload.
Plus, all of userspace becomes extremely unstable, and ping to 127.0.0.1
definitely must not reach 300 ms.
--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.
I am not able to compile the kernel with the patch:
drivers/built-in.o: In function `secure_tcp_sequence_number':
(.text+0x3ad02): undefined reference to `__divdi3'
make: *** [.tmp_vmlinux1] Error 1
--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.
Resending for the mailing lists (the original was discarded as spam because
of encoding issues).
Everything looks fine, for sure. Confirmed on a second server.
--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.
--- linux-2.6.22/drivers/char/random.c	2007-10-01 10:18:42.000000000 +0200
+++ linux-2.6.22-ed/drivers/char/random.c	2007-10-01 21:47:58.000000000 +0200
@@ -1550,11 +1550,13 @@ __u32 secure_tcp_sequence_number(__be32
 	 * As close as possible to RFC 793, which
 	 * suggests using a 250 kHz clock.
 	 * Further reading shows this assumes 2 Mb/s networks.
-	 * For 10 Gb/s Ethernet, a 1 GHz clock is appropriate.
-	 * That's funny, Linux has one built in! Use it!
-	 * (Networks are faster now - should this be increased?)
+	 * For 10 Mb/s Ethernet, a 1 MHz clock is appropriate.
+	 * For 10 Gb/s Ethernet, a 1 GHz clock should be ok, but
+	 * we also need to limit the resolution so that the u32 seq
+	 * overlaps less than one time per MSL (2 minutes).
+	 * Choosing a clock of 64 ns period is OK. (period of 274 s)
 	 */
-	seq += ktime_get_real().tv64;
+	seq += ktime_get_real().tv64 >> 6;
 #if 0
 	printk("init_seq(%lx, %lx, %d, %d) = %d\n",
 	       saddr, daddr, sport, dport, seq);
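As a quick sanity check of the "period of 274 s" figure in the comment
above, here is a small user-space calculation (my own illustration, not code
from the thread) of how long a 32-bit sequence-number clock takes to wrap at
different tick periods; the wrap period has to stay above the TCP MSL of
roughly two minutes:

#include <stdio.h>

/* Print how long 2^32 ticks take for a given tick period in nanoseconds. */
static void wrap_period(const char *name, double tick_ns)
{
	printf("%-30s wraps every %10.1f s\n",
	       name, 4294967296.0 * tick_ns / 1e9);
}

int main(void)
{
	wrap_period("1 ns   (tv64, 2.6.22)", 1.0);		/* ~4.3 s, below MSL   */
	wrap_period("64 ns  (tv64 >> 6)", 64.0);		/* ~274.9 s, above MSL */
	wrap_period("1 us   (2.6.21 usec clock)", 1000.0);	/* ~71.6 minutes       */
	return 0;
}

The unshifted nanosecond clock wraps the 32-bit value in about 4.3 seconds,
well inside the MSL, which is what the patch above guards against.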
From: Eric Dumazet <[email protected]>
Date: Mon, 01 Oct 2007 22:10:03 +0200
> So maybe the following patch is necessary...
>
> I believe IPV6 & DCCP are immune to this problem.
>
> Thanks again Denys for spotting this.
>
> Eric
>
> [PATCH] TCP : secure_tcp_sequence_number() should not use a too fast clock
>
> TCP V4 sequence numbers are 32bits, and RFC 793 assumed a 250 KHz clock.
> In order to follow network speed increase, we can use a faster clock, but
> we should limit this clock so that the delay between two rollovers is
> greater than MSL (TCP Maximum Segment Lifetime : 2 minutes)
>
> Choosing a 64 nsec clock should be OK, since the rollovers occur every
> 274 seconds.
>
> Problem spotted by Denys Fedoryshchenko
>
> Signed-off-by: Eric Dumazet <[email protected]>
Thanks a lot Eric for bringing closure to this.
I'll apply this and add a reference in the commit message to the
changeset that introduced this problem, since it might help
others who look at this.