2001-04-28 00:14:51

by Fabio Riccardi

Subject: X15 alpha release: as fast as TUX but in user space

Dear All,

I'd like to announce the first release of X15 Alpha 1, a _user space_
web server that is as fast as TUX.

On my Dell 4400 with 2G of RAM and 2 933MHz PIII and NetGear 2Gbit NICs
I achieve about 2500 SpecWeb99 connections, with both X15 and
TUX (actually X15 is slightly faster, some 20 connections more... ;)

Given the limitations of my experimental setup I'd like to ask if some
of you could help me test my software on some higher end machines.
I'm interested to see what happens on 4-8 processors in terms of
scalability etc.

You can download X15 Alpha 1 from here:
http://www.chromium.com/X15-Alpha-1.tgz

The README file in the tarball should contain sufficient information
to run the thing. I also included a support module for running the
SpecWeb benchmark.

TIA, ciao,

- Fabio

2001-04-28 00:42:28

by Aaron Lehmann

Subject: Re: X15 alpha release: as fast as TUX but in user space

On Fri, Apr 27, 2001 at 05:18:26PM -0700, Fabio Riccardi wrote:
> You can download X15 Alpha 1 from here:
> http://www.chromium.com/X15-Alpha-1.tgz

Where's the source?

2001-04-28 00:52:21

by David Miller

Subject: Re: X15 alpha release: as fast as TUX but in user space


Fabio Riccardi writes:
> On my Dell 4400 with 2G of RAM and 2 933MHz PIII and NetGear 2Gbit NICs
> I achieve about 2500 SpecWeb99 connections, with both X15 and
> TUX (actually X15 is slightly faster, some 20 connections more... ;)

What is the CPU utilization like in X15 vs. TUX during
these runs?

Later,
David S. Miller

2001-04-28 01:08:09

by Fabio Riccardi

Subject: Re: X15 alpha release: as fast as TUX but in user space

In both cases (X15 and TUX) the CPU utilization is 100%

There are no IO bottlenecks on disk or on the net side.

I think that the major bottleneck is the speed of RAM and the PCI bus
(wait cycles).

We are basically going at the speed of the hardware.

- Fabio

2001-04-28 08:44:48

by Ingo Molnar

Subject: Re: X15 alpha release: as fast as TUX but in user space


Fabio,

i noticed one weirdness in the Date-field handling of X15. X15 appears to
cache the Date field too, which is contrary to RFCs:

earth2:~> wget -s http://localhost/index.html -O - 2>/dev/null | grep Date
Date: Sat Apr 28 10:15:14 2001
earth2:~> date
Sat Apr 28 10:32:40 CEST 2001

ie. there is already a 15-minute difference between the 'origin date of
the reply' and the actual date of the reply. (i started X15 up 15 minutes
ago.)

per RFC 2616:
.............
The Date general-header field represents the date and time at which the
message was originated, [...]

Origin servers MUST include a Date header field in all responses, [...]
.............

i considered the caching of the Date field for TUX too, and avoided it
exactly due to this issue, to not violate this 'MUST' item in the RFC. It
can be reasonably expected from a web server to have a 1-second accurate
Date: field.
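
For illustration, a minimal C sketch of producing a fresh Date value in the RFC 1123 form that RFC 2616 prefers (e.g. "Sun, 06 Nov 1994 08:49:37 GMT"). The name format_http_date is made up for this example; it is not taken from X15 or TUX.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: format time t as an HTTP date such as
 * "Sun, 06 Nov 1994 08:49:37 GMT" into buf.
 * Returns the number of characters written, or 0 if buf is too small. */
size_t format_http_date(time_t t, char *buf, size_t size)
{
    struct tm tm;

    if (gmtime_r(&t, &tm) == NULL)      /* HTTP dates are always in GMT */
        return 0;
    /* %a and %b give English names in the default "C" locale */
    return strftime(buf, size, "%a, %d %b %Y %H:%M:%S GMT", &tm);
}
```

Calling this once per reply costs a time lookup plus a strftime(); the rest of the thread is about ways to avoid paying that on every request.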

the header-caching in X15 gives it an edge against TUX, obviously, but IMO
it's a questionable practice.

if caching of headers were allowed then we could do the obvious trick of
sendfile()ing complete web replies (first header, then body).

Ingo

2001-04-28 13:15:34

by Ville Herva

Subject: Re: X15 alpha release: as fast as TUX but in user space

On Sat, Apr 28, 2001 at 10:42:29AM +0200, you [Ingo Molnar] claimed:
>
> per RFC 2616:
> .............
> The Date general-header field represents the date and time at which the
> message was originated, [...]
>
> Origin servers MUST include a Date header field in all responses, [...]
> .............
>
> i considered the caching of the Date field for TUX too, and avoided it
> exactly due to this issue, to not violate this 'MUST' item in the RFC. It
> can be reasonably expected from a web server to have a 1-second accurate
> Date: field.
>
> the header-caching in X15 gives it an edge against TUX, obviously, but IMO
> it's a questionable practice.
>
> if caching of headers were allowed then we could do the obvious trick of
> sendfile()ing complete web replies (first header, then body).

Uhh, perhaps I'm stupid, but why not cache the date field and update the
field once every five seconds? Or even once a second?

I mean, at the rate of thousands of requests per second that should give you
some advantage over dynamically generating it -- especially if that's the
only thing hindering completely sendfile()'ing the answer.


-- v --

2001-04-28 13:23:07

by Ingo Molnar

Subject: Re: X15 alpha release: as fast as TUX but in user space


On Sat, 28 Apr 2001, Ville Herva wrote:

> Uhh, perhaps I'm stupid, but why not cache the date field and update
> the field once every five seconds? Or even once a second?

yes, that should work. but that means possibly updating thousands of (or
more) cached headers, which has some overhead ...

> I mean, at the rate of thousands of requests per second that should
> give you some advantage over dynamically generating it -- especially
> if that's the only thing hindering completely sendfile()'ing the
> answer.

well, the method i suggested was to use sendfile() twice: first the
(cached, or freshly constructed) headers put into a big file, then the
body itself (which is the original file, accessed via cached file
descriptors).

(splitting up the header and the body has the benefit of not dual-caching
the same webcontent. this is what TUX does too.)
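
A rough user-space sketch of the two-sendfile() idea, assuming the prebuilt headers live at some offset in one file and the body is a separate, already-open file. The names send_range and send_reply and the header-file layout are hypothetical; blocking descriptors are assumed and EINTR handling is left out.

```c
#include <sys/types.h>
#include <sys/sendfile.h>

/* Send len bytes of in_fd, starting at offset off, to out_fd.
 * Loops because sendfile() may transfer fewer bytes than requested.
 * Returns 0 on success, -1 on error or unexpected end of file. */
static int send_range(int out_fd, int in_fd, off_t off, size_t len)
{
    while (len > 0) {
        ssize_t n = sendfile(out_fd, in_fd, &off, len);
        if (n <= 0)
            return -1;
        len -= (size_t)n;           /* off was advanced by sendfile() */
    }
    return 0;
}

/* Hypothetical reply path: first the cached header, then the body. */
int send_reply(int sock, int hdr_fd, off_t hdr_off, size_t hdr_len,
               int body_fd, size_t body_len)
{
    if (send_range(sock, hdr_fd, hdr_off, hdr_len) < 0)
        return -1;
    return send_range(sock, body_fd, 0, body_len);
}
```

Because the header and the body come from different files, the body is cached only once in the page cache no matter how many header variants exist.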

Ingo

2001-04-28 13:26:17

by Ingo Molnar

Subject: Re: X15 alpha release: as fast as TUX but in user space


On Sat, 28 Apr 2001, Ville Herva wrote:

> Uhh, perhaps I'm stupid, but why not cache the date field and update
> the field once every five seconds? Or even once a second?

perhaps the best way would be to do this updating in the sending code
itself.

first there would be a 'current time thread', which updates a global
shared variable that shows the current time. (ie. no extra system-call is
needed to access current time.) If the header-sending code detects that
current time is not equal to the timestamp stored in the header itself,
then the header is reconstructed. Pretty simple.
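
One way this check could look in C, as a sketch only. The struct layout and the names time_tick, header_get and cached_header are invented for illustration, the once-a-second updater thread is not shown, and each cached header is assumed to be used by one thread at a time.

```c
#include <stdatomic.h>
#include <stdio.h>
#include <time.h>

/* Stored once a second by a dedicated "current time" thread (not shown),
 * so the sending path never has to make a system call to learn the time. */
static _Atomic time_t current_time;

void time_tick(time_t now) { atomic_store(&current_time, now); }

struct cached_header {
    time_t stamp;       /* second for which text was built */
    int    rebuilds;    /* counter kept only for demonstration */
    char   text[128];
};

/* Return the header text, rebuilding it only when the second has changed. */
const char *header_get(struct cached_header *h)
{
    time_t now = atomic_load(&current_time);

    if (h->stamp != now || h->text[0] == '\0') {
        struct tm tm;
        char date[40];

        gmtime_r(&now, &tm);
        strftime(date, sizeof date, "%a, %d %b %Y %H:%M:%S GMT", &tm);
        snprintf(h->text, sizeof h->text,
                 "HTTP/1.0 200 OK\r\nDate: %s\r\n\r\n", date);
        h->stamp = now;
        h->rebuilds++;
    }
    return h->text;
}
```

At thousands of requests per second for the same object, almost every call takes the cheap path: one load and one comparison.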

Ingo

2001-04-28 13:30:59

by Ville Herva

Subject: Re: X15 alpha release: as fast as TUX but in user space

On Sat, Apr 28, 2001 at 03:24:25PM +0200, you [Ingo Molnar] claimed:
>
> On Sat, 28 Apr 2001, Ville Herva wrote:
>
> > Uhh, perhaps I'm stupid, but why not cache the date field and update
> > the field once every five seconds? Or even once a second?
>
> perhaps the best way would be to do this updating in the sending code
> itself.
>
> first there would be a 'current time thread', which updates a global
> shared variable that shows the current time. (ie. no extra system-call is
> needed to access current time.) If the header-sending code detects that
> current time is not equal to the timestamp stored in the header itself,
> then the header is reconstructed. Pretty simple.

Yes, that vaguely resembles what I had in mind. Of course I had no idea
about the data structures Tux or X15 use internally, so I couldn't think it
through too thoroughly.


-- v --

2001-04-28 13:56:13

by Andi Kleen

Subject: Re: X15 alpha release: as fast as TUX but in user space

On Sat, Apr 28, 2001 at 04:30:30PM +0300, Ville Herva wrote:
> Yes, that vaguely resembles what I had in mind. Of course I had no idea
> about the data structures Tux or X15 use internally, so I couldn't think it
> through too thoroughly.

You can also just use the cycle counter directly in most modern CPUs. It can
be read with a single instruction.
In fact modern glibc will do it for you when you use
clock_gettime(CLOCK_PROCESS_CPUTIME_ID, ...)

-Andi

2001-04-29 21:21:45

by Fabio Riccardi

Subject: Re: X15 alpha release: as fast as TUX but in user space

I can disable header caching and see what happens, I'll add an option for this
in the next X15 release.

Nevertheless I don't know how interesting this is in real life, since on
the internet most static pages are cached on proxies. I agree that the
RFC asks for a date for the original response, but once the response is cached
what does this date mean?

- Fabio

2001-04-30 05:44:24

by dean gaudet

Subject: Re: X15 alpha release: as fast as TUX but in user space

On Sun, 29 Apr 2001, Fabio Riccardi wrote:

> I can disable header caching and see what happens, I'll add an option
> for this in the next X15 release.

heh, well to be honest, i'd put the (permanent) caching of the Date header
into the very slimy, benchmark-only trick category. (right up there
alongside running the HTTP and TCP stacks inside the NIC interrupt
handler, which folks have done to get even better scores.)

> Nevertheless I don't know how interesting this is in real life,
> since on the internet most static pages are cached on proxies. I agree
> that the RFC asks for a date for the original response, but once the
> response is cached what does this date mean?

the Date is always the time the response was generated on the origin
server. (there are exceptions for clockless servers, but such servers
have other limitations you wouldn't want -- notably they MUST NOT generate
Last-Modified.)

one example oddity you might experience with a non-increasing Date
surrounds Last-Modified and Date, see section 13.3.3. note that the rfc
indicates that if Last-Modified is less than 60 seconds earlier than Date
then Last-Modified is only a weak validator rather than a strong
validator. this would complicate range requests -- because weak
validators can't be used with range requests. if your server never
updates the Date after the first time it serves an object then you'd
potentially never get out of this 60 second window.
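
To make the 60-second window concrete, here is a deliberately simplified C sketch of just that one condition from section 13.3.3. The function name is made up, and the RFC attaches further conditions that are not modelled here.

```c
#include <stdbool.h>
#include <time.h>

/* Simplified check: Last-Modified may be treated as a strong validator
 * only if it is at least 60 seconds earlier than the Date value. */
bool last_modified_is_strong(time_t last_modified, time_t date)
{
    return difftime(date, last_modified) >= 60.0;
}
```

If a file was modified 10 seconds before its first request and the server then freezes Date at that first reply, every later reply still shows a 10-second gap, so the validator stays weak however long the file has actually been unchanged.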

(strong validators are guaranteed to change whenever the object changes...
and Last-Modified isn't strong until some time has passed -- consider NFS
mounted docroots, clock skew in the origin network, multiple file updates
within a second, etc.)

there are a bunch of other things that Date is used for, all of them are
related to caching heuristics and rules.

in theory you could claim that you're implementing a cache server rather
than an origin server... i dunno what the SPEC committee will think when
you try to submit results though :)

so way back when sendfile() was being created i brought up the Date issue
and pointed out that we needed more than a single "void *, size_t" to take
care of headers. eventually this discussion led to the creation of TCP_CORK --
so that a http server could use writev() to send a two element iov for the
headers -- one element with everything that doesn't need to change, the
next element with the Date.
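
A small sketch of the two-element iov, assuming the caller has already set TCP_CORK on the socket and follows up with sendfile() for the body. The name send_headers and the choice to put the Date line (plus the terminating blank line) last are assumptions made for this example.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Write the headers as two pieces in one call:
 *   fixed     - status line and headers that never change for this object
 *   date_line - "Date: ...\r\n" followed by the final "\r\n"
 * Returns what writev() returns; on a socket this can be a short write,
 * which a real server would have to handle. */
ssize_t send_headers(int fd, const char *fixed, const char *date_line)
{
    struct iovec iov[2];

    iov[0].iov_base = (void *)fixed;
    iov[0].iov_len  = strlen(fixed);
    iov[1].iov_base = (void *)date_line;
    iov[1].iov_len  = strlen(date_line);
    return writev(fd, iov, 2);
}
```

The fixed part can be cached per object forever, while the Date piece can be shared by every reply sent in the same second.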

i also kind of expected the high performance servers to rebuild a Date:
header every second for all of its threads to use. (of course it's not
that simple, you really want to keep a circular list of N dates... and
just assume that after N seconds no thread could still be accessing an old
Date.)
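
The circular list could be sketched roughly like this. DATE_SLOTS, date_ring_update and date_ring_get are hypothetical names, a single updater thread is assumed, and the memory-ordering details a real multi-threaded server would need are left out.

```c
#include <time.h>

/* N: assumed upper bound, in seconds, on how long any thread keeps using
 * a pointer it got from date_ring_get(). */
#define DATE_SLOTS 8

static char date_ring[DATE_SLOTS][48];

/* Called once per second by the single updater; fills the slot for now.
 * The slot being overwritten was last valid DATE_SLOTS seconds ago. */
void date_ring_update(time_t now)
{
    struct tm tm;
    char *slot = date_ring[(unsigned long)now % DATE_SLOTS];

    gmtime_r(&now, &tm);
    strftime(slot, sizeof date_ring[0],
             "Date: %a, %d %b %Y %H:%M:%S GMT\r\n", &tm);
}

/* Called by worker threads: returns the prebuilt header line for now. */
const char *date_ring_get(time_t now)
{
    return date_ring[(unsigned long)now % DATE_SLOTS];
}
```

A worker that picked up the line for second t can keep sending it while the updater fills the slots for t+1, t+2 and so on; only after DATE_SLOTS seconds is its slot reused.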

is this too slow for some reason? (does it play well with zero-copy?)

-dean

2001-04-30 06:40:11

by David Miller

Subject: Re: X15 alpha release: as fast as TUX but in user space


dean gaudet writes:
> is this too slow for some reason? (does it play well with zero-copy?)

His trick ends up with a minimal set of scatter gather entries.
That's the whole gain behind the trick he's doing.

If you do the TCP_CORK thing, what you end up with is a scatter gather
entry in the SKB for the header bits, then the page cache segments.

Even if we had the HP sendfile() interface iovec garbage, we would end
up with the same number of SKB iovec entries as for the TCP_CORK case
today.

What TUX basically does is build up the header by hand in a scribble
page it uses for header building, passes that to tcp_sendpage() with
MSG_MORE set, then it initiates the sendfile() part. The final effect
inside the networking is basically equivalent to using
TCP_CORK+sendfile() in userspace, the only difference being that:

1) the scratch page for the headers is maintained per-socket by TCP
2) the header is copied once from user to kernel
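
For comparison, the user-space TCP_CORK+sendfile() sequence might look like the sketch below. The function name send_corked is invented, a blocking TCP socket is assumed, and error handling is kept to the minimum.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Cork the socket, write the header, sendfile() the body, then uncork so
 * that header and first file data can leave in the same frames.
 * Returns 0 on success, -1 on any failure. */
int send_corked(int sock, const char *hdr, size_t hdr_len,
                int file_fd, size_t file_len)
{
    int on = 1, off = 0, rc = -1;
    off_t pos = 0;

    if (setsockopt(sock, IPPROTO_TCP, TCP_CORK, &on, sizeof on) < 0)
        return -1;

    /* This is the one user-to-kernel copy of the header. */
    if (write(sock, hdr, hdr_len) != (ssize_t)hdr_len)
        goto uncork;

    while (file_len > 0) {
        ssize_t n = sendfile(sock, file_fd, &pos, file_len);
        if (n <= 0)
            goto uncork;
        file_len -= (size_t)n;
    }
    rc = 0;

uncork:
    /* Clearing the option flushes whatever partial frame is still held. */
    if (setsockopt(sock, IPPROTO_TCP, TCP_CORK, &off, sizeof off) < 0)
        rc = -1;
    return rc;
}
```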

I would find it amusing to see what adding the header+file caching
trick to TUX would do to its results :-)

Later,
David S. Miller

2001-04-30 19:32:58

by Fabio Riccardi

Subject: Re: X15 alpha release: as fast as TUX but in user space

Ok I fixed it, the header date timestamp is updated with every request.

Performance doesn't seem to have suffered significantly (less than 1%).

You can find the new release at: http://www.chromium.com/X15-Alpha-2.tgz

BTW: Don't call me slime, I wasn't trying to cheat, I just didn't know that
the date stamp was required to be really up-to-date.

- Fabio

2001-04-30 21:48:02

by dean gaudet

Subject: Re: X15 alpha release: as fast as TUX but in user space

On Mon, 30 Apr 2001, Fabio Riccardi wrote:

> Ok I fixed it, the header date timestamp is updated with every request.
>
> Performance doesn't seem to have suffered significantly (less than 1%).

rad!

> BTW: Don't call me slime, I wasn't trying to cheat, I just didn't know that
> the date stamp was required to be really up-to-date.

sorry, i meant to put a smiley on there :)


On Sun, 29 Apr 2001, David S. Miller wrote:

> If you do the TCP_CORK thing, what you end up with is a scatter gather
> entry in the SKB for the header bits, then the page cache segments.

so then the NIC would be sent a 3 entry gather list -- 1 entry for TCP/IP
headers, 1 for HTTP headers, and 1 for the initial page cache segment?

are there any NICs which take only 2 entry lists? (boo hiss and curses
on such things if they exist!)

-dean

2001-04-30 21:53:42

by David Miller

Subject: Re: X15 alpha release: as fast as TUX but in user space


dean gaudet writes:
> On Sun, 29 Apr 2001, David S. Miller wrote:
>
> > If you do the TCP_CORK thing, what you end up with is a scatter gather
> > entry in the SKB for the header bits, then the page cache segments.
>
> so then the NIC would be sent a 3 entry gather list -- 1 entry for TCP/IP
> headers, 1 for HTTP headers, and 1 for the initial page cache segment?

Basically. It's weird because we could change tcp_sendmsg() to grab a
"little bit" of space in skb->data after the TCP headers area, but
that would screw all the memory allocation advantages carving up pages
gives us.

TCP used to be really rough on the memory subsystem, and in particular
going to a page carving scheme helped a lot in this area.

> are there any NICs which take only 2 entry lists? (boo hiss and curses
> on such things if they exist!)

Tulip I think falls into this category, I could be wrong. It has two
buffer pointers in the RX descriptor, but one might be able to chain
them.

Alexey added SG support to Tulip at some point, and I can probably dig
up the patch. It doesn't do hw csumming, though.

Later,
David S. Miller