2001-03-20 03:35:47

by Fabio Riccardi

Subject: user space web server accelerator support

Hi,

I've been working for a while on a user-space web server accelerator (as
opposed to a kernel space accelerator, like TUX). So far I've had very
promising results and I can achieve performance (spec) figures
comparable to those of TUX.

Although my implementation is entirely sitting in user space, I need
some cooperation from the kernel for efficiently forwarding network
connections from the accelerator to the full-fledged Apache server.

I've made a little kernel hack (mostly lifted out of the TUX and khttpd
code) to forward a live socket connection from an application to
another. I'd like to clean this up so that my users don't have to muck
with their kernel to get my accelerator to work.

Would it be a major heresy to ask for a new system call?

If so I could still hide my stuff in a kernel module and snatch an
unused kernel call for my private use (such as the one allotted for
tux). The problem with this is that the kernel only exposes the "right"
symbols to the modules if either khttpd or ipv6 is compiled as a module.

How could this be fixed?

TIA, ciao,

- Fabio



2001-03-20 03:56:17

by Fabio Riccardi

Subject: Re: user space web server accelerator support

How can Apache "grab" the file descriptor?

My understanding is that file descriptors are data structures private to
a process...

Am I missing something?

- Fabio

"David S. Miller" wrote:

> Fabio Riccardi writes:
> > How could this be fixed?
>
> Why not pass the filedescriptors to apache over a UNIX domain
> socket? I see no need for a new facility.
>
> Later,
> David S. Miller
> [email protected]

2001-03-20 03:52:27

by David Miller

Subject: Re: user space web server accelerator support


Fabio Riccardi writes:
> How could this be fixed?

Why not pass the filedescriptors to apache over a UNIX domain
socket? I see no need for a new facility.

Later,
David S. Miller
[email protected]

2001-03-20 04:05:17

by Fabio Riccardi

Subject: Re: user space web server accelerator support

Fantastic!

I was not aware of it, sorry... where can I find some doc?

- Fabio

"David S. Miller" wrote:

> Fabio Riccardi writes:
> > How can Apache "grab" the file descriptor?
> >
> > My understanding is that file descriptors are data structures private to
> > a process...
> >
> > Am I missing something?
>
> Unix sockets allow one process to "give" a file descriptor to
> another process via a facility called "file descriptor passing".
>
> Later,
> David S. Miller
> [email protected]

2001-03-20 04:00:37

by David Miller

Subject: Re: user space web server accelerator support


Fabio Riccardi writes:
> How can Apache "grab" the file descriptor?
>
> My understanding is that file descriptors are data structures private to
> a process...
>
> Am I missing something?

Unix sockets allow one process to "give" a file descriptor to
another process via a facility called "file descriptor passing".

Later,
David S. Miller
[email protected]

2001-03-20 13:25:17

by Erik Mouw

Subject: Re: user space web server accelerator support

On Mon, Mar 19, 2001 at 08:07:49PM -0800, Fabio Riccardi wrote:
> Fantastic!
>
> I was not aware of it, sorry... where can I find some doc?

W. Richard Stevens, "Advanced Programming in the UNIX Environment",
chapter 15.3.


Erik

> "David S. Miller" wrote:
>
> > Unix sockets allow one process to "give" a file descriptor to
> > another process via a facility called "file descriptor passing".

--
J.A.K. (Erik) Mouw, Information and Communication Theory Group, Department
of Electrical Engineering, Faculty of Information Technology and Systems,
Delft University of Technology, PO BOX 5031, 2600 GA Delft, The Netherlands
Phone: +31-15-2783635 Fax: +31-15-2781843 Email: [email protected]
WWW: http://www-ict.its.tudelft.nl/~erik/

2001-03-20 16:03:30

by Zach Brown

Subject: Re: user space web server accelerator support

> Fantastic!
>
> I was not aware of it, sorry... where can I find some doc?

There are some patches in the apache source rpms in
http://www.zabbo.net/phhttpd/ that show how apache can connect to
another daemon and get its incoming connection sockets from it.

phhttpd itself is pretty hairy code (don't ask :)), but the apache
changes are pretty straightforward.

--
zach

2001-03-23 03:51:33

by Fabio Riccardi

Subject: Re: user space web server accelerator support

Dave, Zach,

thanks for your help, I've implemented a file descriptor passing mechanism
very similar to Zach's and it worked.

The problem now is performance: fd passing is utterly slow!

On my system (a 1GHz Pentium III + 2G RAM) I can do 1300 SpecWeb99 with a
khttpd-like socket passing mechanism, while I only get something like 500 using
file descriptor passing. Indeed with fd passing I decrease Apache's
performance instead of increasing it!

I've checked my code several times and I don't believe that I have introduced
any specific bottleneck of my own (the code actually is quite trivial).

I've profiled the kernel and some interesting differences show:

With direct socket passing, 1300 SpecWeb load:

  9759 total                    0.0071
   902 handle_IRQ_event         7.5167
   256 skb_clone                0.6957
   256 do_tcp_sendpages         0.0954
   239 tcp_v4_rcv               0.1572
   238 schedule                 0.1766
   226 __kfree_skb              0.9741
   207 skb_release_data         1.7845
   204 tcp_transmit_skb         0.1541
   199 d_lookup                 0.6910
   190 path_walk                0.0973
   181 ip_output                0.6754
   168 fget                     2.2105
   165 do_softirq               1.1786
   158 do_generic_file_read     0.1287

With file descriptor passing, 500 SpecWeb load:

  8621 total                    0.0063
  7037 schedule                 5.2203
   462 handle_IRQ_event         3.8500
   188 __wake_up                0.9216
   114 unix_stream_data_wait    0.4191
    81 __switch_to              0.3750
    58 schedule_timeout         0.3718
    25 d_lookup                 0.0868
    20 skb_clone                0.0543
    19 path_walk                0.0097
    17 tcp_transmit_skb         0.0128
    17 do_tcp_sendpages         0.0063
    17 do_softirq               0.1214
    15 system_call              0.2679
    15 sys_rt_sigtimedwait      0.0207

Zach, have you ever noticed such a performance bottleneck in your phhttpd?

SpecWeb has about 30% of its load as dynamic requests, so the amount of
forwarding is definitely significant in my case. Some time ago I measured
khttpd's impact in socket passing and I found that it was negligible
(forwarding everything to Apache instead of having it directly listening on
the socket had an impact of a few percent).

My impression from a first look at the profiling data is that the kernel is
doing a very poor job of scheduling and is ping-ponging between processes...
as if it is not doing any buffering whatsoever and is doing a context switch
for every passed file descriptor.

Any thoughts?

- Fabio


2001-03-23 19:15:51

by Zach Brown

Subject: Re: user space web server accelerator support

> Zach, have you ever noticed such a performance bottleneck in your phhttpd?

yup, this is definitely something you don't want to be doing in the fast
path :)

> Any thoughts?

Sorry I don't remember the start of this thread, but I'll ask anyway;
have you looked at Ingo Molnar's Tux server? It's state of the art unix
serving, implemented in the linux kernel:

http://people.redhat.com/mingo/TUX-patches/

--
zach

2001-03-23 20:21:51

by Fabio Riccardi

Subject: kernel support for _user space_ web server accelerator

Ok, here it comes again,

I don't like the idea of having a web server in the kernel, I don't think it
belongs there.

Yes, I'm pretty familiar with TUX, and I believe that it is a fundamental
achievement in the study of web server performance. Nevertheless I think that
it is sitting in the wrong spot.

I'm building an alternative web server that is entirely in _user space_ and
that achieves the same level of performance as TUX. Presently I can match TUX
performance within 10-20%, and I still have quite a few improvements in my
pocket.

Nevertheless I need some minimal help from the kernel, like a FAST (and
secure?) mechanism for socket forwarding and a better (non-blocking on
files) sendfile interface.

For the time being I'm using a socket delivery mechanism similar to that of
TUX and khttpd, as I stated at the beginning of this thread. I don't like the
idea of patching the kernel, I don't believe that it is a viable distribution
mechanism and I'm trying to find a better way of adding the functionality that
I require as a kernel module.

Currently the "right" kernel network interfaces are exposed to the modules only
if khttpd or ipv6 is compiled as a module. Can we change this so that a
standard binary kernel (say, the one coming with a vanilla RedHat distribution
or similar) would expose the right stuff?

Would it make any sense to have a real system call doing this kind of stuff?

HELP! :)

TIA, ciao,

- Fabio

Zach Brown wrote:

> > Zach, have you ever noticed such a performance bottleneck in your phhttpd?
>
> yup, this is definitely something you don't want to be doing in the fast
> path :)
>
> > Any thoughts?
>
> Sorry I don't remember the start of this thread, but I'll ask anyway;
> have you looked at Ingo Molnar's Tux server? It's state of the art unix
> serving, implemented in the linux kernel:
>
> http://people.redhat.com/mingo/TUX-patches/
>
> --
> zach

2001-04-18 17:21:26

by Ingo Molnar

Subject: numbers?


On Fri, 23 Mar 2001, Fabio Riccardi wrote:

> I'm building an alternative web server that is entirely in _user
> space_ and that achieves the same level of performance as TUX.
> Presently I can match TUX performance within 10-20%, and I still have
> quite a few improvements in my pocket.

very interesting statement, which appears to be contradicted by numbers on
your website. Your website says you get a 1375 SPECweb99 connections
result on a dual 1 GHz, 4 GB, PIII system:

http://www.chromium.com/cr_hp.html

the best TUX 2.0 result published so far, on a very similar system (same
CPU speed, same amount of RAM, same number and type of network cards) is
3222 connections:

http://www.spec.org/osg/web99/results/res2001q2/web99-20010319-00100.html

the difference between 1375 and 3222 is quite substantial, TUX is 134%
faster (2.3 times the performance of your server). I'm sure a userspace
webserver can get quite close to TUX in simple static benchmarks (in fact
phttpd should be very close), but SPECweb99 is far from simple. When
saying you are 10-20% close to TUX, did you refer to SPECweb99 results?

Ingo

2001-04-20 19:32:12

by Fabio Riccardi

Subject: Re: numbers?

The current chromium server is based on Apache 1.3, and it inherits its threading
limitations.

Incidentally the same server running on a kernel with a multiqueue scheduler
achieves 1600 connections per second on the same machine, that was the original
reason for my message for a better scheduler.

In any case the chromium server is a substantially faster apache (more than a factor
of two on spec, possibly much more in real life), and given that Apache is the most
widespread server on the planet, having something that makes it faster is quite
handy. You don't need any further training for users, all standard modules work, we
even fixed the performance problems that afflicted Tomcat Jakarta. A convenient
solution.

Our forthcoming server (dubbed X15) is a completely new thing, but it still sits in
user space.

X15 is the server I was referring to and as far as I can measure I get very much the
same performance as TUX.

On a Dell 4400 (933 MHz PIII, 2G of RAM, 5 9G disks) I get 2450 connections/second.

On a Dell PowerEdge 1550/1000 the published TUX 2 result is 2765.

If you take into account the fact that the 1550 has a faster processor (1GHz) and a
more modern bus architecture (Serverworks HE with memory interleaving and a triple
PCI bus), the performance is roughly the same.

I'd love to try TUX and X15 head to head on the same hardware, indeed I've spent the
last two days trying to get TUX to run on my Dell 4400, but I wasn't very lucky.

The static pages work fine, the dynamic module gets executed, but for some reason it
fails to open the postlog file and to spawn the spec utility tasks at reset time.

I tried the latest TUX based on 2.4.2-ac26 from your home site at RedHat, and I've
used the latest TUX dynamic code that is published on the SPEC site
(Compaq-20010122-DL320-API.tar.gz).

Are they compatible with each other?

I'll try again today with the TUX that comes with the new RH 7.1

The PowerEdge 2500 for which the TUX result is 3225 (I think that this is the result
you were quoting) is a much more modern machine than the 4400, with a much higher
memory bandwidth, which could explain the performance difference (four NICs sucking
data at the same time require quite a bit of bandwidth).

I'll make an alpha release of X15 available for download by the end of next week, so
people will be able to test it independently.

- Fabio

Ingo Molnar wrote:

> On Fri, 23 Mar 2001, Fabio Riccardi wrote:
>
> > I'm building an alternative web server that is entirely in _user
> > space_ and that achieves the same level of performance as TUX.
> > Presently I can match TUX performance within 10-20%, and I still have
> > quite a few improvements in my pocket.
>
> very interesting statement, which appears to be contradicted by numbers on
> your website. Your website says you get a 1375 SPECweb99 connections
> result on a dual 1 GHz, 4 GB, PIII system:
>
> http://www.chromium.com/cr_hp.html
>
> the best TUX 2.0 result published so far, on a very similar system (same
> CPU speed, same amount of RAM, same number and type of network cards) is
> 3222 connections:
>
> http://www.spec.org/osg/web99/results/res2001q2/web99-20010319-00100.html
>
> the difference between 1375 and 3222 is quite substantial, TUX is 134%
> faster (2.3 times the performance of your server). I'm sure a userspace
> webserver can get quite close to TUX in simple static benchmarks (in fact
> phttpd should be very close), but SPECweb99 is far from simple. When
> saying you are 10-20% close to TUX, did you refer to SPECweb99 results?
>
> Ingo

2001-04-20 19:44:26

by Ingo Molnar

Subject: Re: numbers?


On Fri, 20 Apr 2001, Fabio Riccardi wrote:

> X15 is the server I was referring to and as far as I can measure I get
> very much the same performance as TUX.
>
> On a Dell 4400 (933 MHz PIII, 2G of RAM, 5 9G disks) I get 2450
> connections/second.

(the unit is not "connections/second" but "connections")

> On a Dell PowerEdge 1550/1000 the published TUX 2 result is 2765.
>
> If you take into account the fact that the 1550 has a faster processor
> (1GHz) and a more modern bus architecture (Serverworks HE with memory
> interleaving and a triple PCI bus), the performance is roughly the
> same.

the system was IO-limited (given that a ~9 GB fileset was running on a 2
GB RAM system), so CPU speed does not have a big impact. I'd say it makes no
sense to compare different systems.

> The static pages work fine, the dynamic module gets executed, but for
> some reason it fails to open the postlog file and to spawn the spec
> utility tasks at reset time.

the newest TUX code chroots into docroot, so you should either use "/" as
the docroot, or put /lib libraries into your docroot.

> I'll make an alpha release of X15 available for download by the end of
> next week, so people will be able to test it independently.

(will source code be available so we can see whether it's an apples to
apples thing?)

Ingo

2001-04-20 20:52:37

by Alan

Subject: Re: numbers?

> Incidentally the same server running on a kernel with a multiqueue scheduler
> achieves 1600 connections per second on the same machine, that was the original
> reason for my message for a better scheduler.

I get 2000 connections a second with a single threaded server called thttpd
on my setup. That's out of the box on 2.4.2ac with zero copy/sendfile.

I've never had occasion to frob with tux or specweb

2001-04-20 21:09:09

by Fabio Riccardi

Subject: Re: numbers?

Alan,

SPEC connections combine static (70%) and dynamic (30%) pages; the dynamic pages
use quite a bit of CPU (25%-30%), and the static page dataset is several (6-8)
gigabytes.

The chromium server is actually much faster than thttpd and it is a complete web
server.

- Fabio

Alan Cox wrote:

> > Incidentally the same server running on a kernel with a multiqueue scheduler
> > achieves 1600 connections per second on the same machine, that was the original
> > reason for my message for a better scheduler.
>
> I get 2000 connections a second with a single threaded server called thttpd
> on my setup. That's out of the box on 2.4.2ac with zero copy/sendfile.
>
> I've never had occasion to frob with tux or specweb

2001-04-20 21:19:53

by Fabio Riccardi

Subject: Re: numbers?

Ingo Molnar wrote:

> > On a Dell PowerEdge 1550/1000 the published TUX 2 result is 2765.
> >
> > If you take into account the fact that the 1550 has a faster processor
> > (1GHz) and a more modern bus architecture (Serverworks HE with memory
> > interleaving and a triple PCI bus), the performance is roughly the
> > same.
>
> the system was IO-limited (given that a ~9 GB fileset was running on a 2
> GB RAM system), so CPU speed does not have a big impact. I'd say it makes no
> sense to compare different systems.

From what I've seen the major impact comes from the disk IO bandwidth to
memory size ratio and from the PCI bus to memory bandwidth.

I agree that comparing different hardware architectures is a tricky business,
but you asked me to comment on some of the comparisons that you made...

> > The static pages work fine, the dynamic module gets executed, but for
> > some reason it fails to open the postlog file and to spawn the spec
> > utility tasks at reset time.
>
> the newest TUX code chroots into docroot, so you should either use "/" as
> the docroot, or put /lib libraries into your docroot.

Oh, the docs don't mention anything of that... I'll try to set the docroot as
you say

> > I'll make an alpha release of X15 available for download by the end of
> > next week, so people will be able to test it independently.
>
> (will source code be available so we can see whether it's an apples to
> apples thing?)

I'll release the source for the SPEC dynamic code dll, which indeed is just a
straight port of the TUX dynamic code from the SPEC site.

- Fabio


2001-04-21 04:44:52

by Ingo Molnar

Subject: Re: numbers?


On Fri, 20 Apr 2001, Fabio Riccardi wrote:

> I agree that comparing different hardware architectures is a tricky
> business, but you asked me to comment on some of the comparisons that
> you made...

well, those two systems looked similar enough. (same CPU speed and the
test is CPU limited in that case.) But i agree that the only sure
comparison is by testing on the same system.

Ingo