On Mon, Dec 10, 2001 at 11:18:26PM -0800, David S. Miller wrote:
> From: Arnaldo Carvalho de Melo <[email protected]>
> Date: Mon, 10 Dec 2001 23:08:10 -0200
>
>    David, if you don't like it, I'll happily switch to the big fat
>    union idea, but I think that this is cleaner and will save us from
>    having to patch sock.h every time a new net stack is added to the
>    kernel.
>
> I'm a little concerned about having to allocate two objects instead of
> one.
OK, this new patch doesn't allocate two objects, just one, as is done
today; more below.
> These things aren't like inodes. Inodes are cached and lookup
> read-multiple objects, whereas sockets are throw-away and recycled
> objects. Inode allocation performance therefore isn't that critical,
> but socket allocation performance is.
>
> Then we go back to the old problem of protocols that can be used to
> embed IP and thus having to keep track of parallel state for multiple
> protocols. I think your changes did not compromise that for what
> we currently support though.
>
> I still need to think about this some more....
>
> You know, actually, the protocols are the ones which call sk_alloc().
> So we could just extend sk_alloc() to take a kmem_cache_t argument.
Well, in this patch I added two new fields to net_proto_family: sk_cachep
and sk_size. The current protocols pass these as zero without any
modification, and net_families[proto]->sk_cachep then falls back to the
current sk_cachep slab cache; the protocols that do initialize the
sk_cachep and sk_size members of net_proto_family use their own private
slab cache, whose objsize == sizeof(struct sock) + sizeof(proto_opt).
With this I removed sk->tp_pinfo altogether and use the TCP_PINFO macro
as you suggested; net_pinfo is also gone, and sk->protinfo is now just a
void pointer.
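For reference, a minimal sketch of the extended structure (existing
fields elided; only sk_cachep and sk_size are new, and their exact
placement here is illustrative, not necessarily what the patch does):

struct net_proto_family {
	int		family;
	int		(*create)(struct socket *sock, int protocol);
	/* ... existing fields ... */
	/* New: per-protocol slab cache and its object size; protocols
	 * that leave these zero fall back to the generic sk_cachep. */
	kmem_cache_t	*sk_cachep;
	unsigned int	sk_size;
};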
> TCP could thus make a kmem_cache_t which is sizeof(struct sock)
> + sizeof(struct tcp_opt) and then set the TP_INFO pointer to "(sk +
> 1)".
>
> Oh yes, another overhead is all the extra dereferencing. To fight
> that we could make a macro that knows the above layout:
>
> #define TCP_PINFO(SK) ((struct tcp_opt *)((SK) + 1))
yes
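Something like this, I guess (a sketch; the init function name, cache
name and flags below are illustrative, not from the patch):

static kmem_cache_t *tcp_sk_cachep;

void __init tcp_sk_cache_init(void)
{
	/* One object holds the sock and its tcp_opt back to back,
	 * so TCP_PINFO(sk) is just "(sk + 1)". */
	tcp_sk_cachep = kmem_cache_create("tcp_sock",
					  sizeof(struct sock) +
					  sizeof(struct tcp_opt),
					  0, SLAB_HWCACHE_ALIGN,
					  NULL, NULL);
}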
> So I guess we could do things your way without any of the potential
> performance problems.
>
> It is going to be a while before I can apply something like this, I
> would like to help Jens+Linus get the new block stuff in shape first.
> This would obviously be a 2.5.x change too.
OK, this is just for comments; I'll make the modifications that people
agree on here.
The changes were rather minimal; i.e., TCP was already using this style:
int tcp_foo_function(struct sock *sk)
{
	struct tcp_opt *tp = &sk->tp_pinfo.af_tcp;
and in the patch it just changes it to:
struct tcp_opt *tp = TCP_PINFO(sk);
in most places.
I could make patches to give the IP_SK case a style similar to the current
tcp_opt usage (i.e., like the code excerpt above).
This message was sent using the patch 8)
All the protocols, khttpd, netfilter, etc., are already converted to this
new style, and the only thing that still has to be done is to remove things
like daddr, saddr, rcv_saddr, dport, sport and the other IPv4-specific
members of struct sock. Ah, another thing is to try and make rtnetlink call
sock_register prior to using sk_alloc, so that I can remove the check in
sk_alloc and net_proto_sk_size for net_families[family] being NULL.
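The rtnetlink bit amounts to something like this, I think (a sketch over
the existing 2.4 init code; the real ordering fix may look different):

void __init rtnetlink_init(void)
{
	/* Register the netlink family first, so that
	 * net_families[PF_NETLINK] (and thus its sk_cachep/sk_size)
	 * is valid by the time sk_alloc() runs... */
	sock_register(&netlink_family_ops);
	/* ...for the kernel rtnetlink socket created here: */
	rtnl = netlink_kernel_create(NETLINK_ROUTE, rtnetlink_rcv);
	if (rtnl == NULL)
		panic("rtnetlink_init: cannot initialize rtnetlink\n");
}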
Please let me know if this is something acceptable for 2.5.
The patch is still for 2.4.16, but it should apply cleanly to 2.5.1, I
think. Of course I'll make sure it works with 2.5.1 if it is considered OK.
I'll stop working on it for now, until further comments are made by the net
stack maintainers, and concentrate on a new task: doing the same thing for
struct inode, where, it seems, we won't even need the per-fs slab caches,
just a private void pointer 8)
Here is an example of what the slab caches look like:
[acme@rama2 acme]$ grep sock /proc/slabinfo
unix_sock 5 10 396 1 1 1 : 17 735 2 1 0
inet_sock 17 20 800 4 4 1 : 25 207 8 4 0
sock 0 0 332 0 0 1 : 0 0 0 0 0
[acme@rama2 acme]$
And without the patch, using 2.4.16-pre1
[acme@brinquedo linux]$ grep sock /proc/slabinfo
sock 76 84 1056 11 12 2 : 87 4730 19 7 0
With this we get memory savings and performance gains, in addition to the
much-needed (IMHO) cleanup of include/net/sock.h 8) (The fourth field in
the slabinfo output is the object size in bytes, so inet sockets drop from
1056 to 800 bytes each.) On big, busy servers these savings can reach over
one megabyte of kernel memory, please correct me if I'm wrong :) IPv6
sockets use about 980 bytes, so on a kernel with IPv6 compiled in, even as
a module, one gets savings even for the IPv4 sockets case.
Patch available at:
http://www.kernel.org/pub/linux/kernel/people/acme/v2.4/2.4.16/sock.cleanup-5.patch.bz2
A not-so-complete changelog, which may help in understanding this patch, is at:
http://www.kernel.org/pub/linux/kernel/people/acme/v2.4/2.4.16/patch-2.4.16.log
Waiting for comments and testers results,
TIA,
- Arnaldo
On Mon, Dec 17, 2001 at 10:51:34PM -0800, David S. Miller wrote:
> From: Arnaldo Carvalho de Melo <[email protected]>
> Date: Tue, 18 Dec 2001 03:35:52 -0200
>
> the only thing that still has to be done is to remove things
> like daddr, saddr, rcv_saddr, dport, sport and other ipv4 specific members
> of struct sock
>
> Actually, I'd like to keep the first couple of cache lines of struct
> sock the way they are :-( For hash lookups the identity + the hash next
> pointer fit perfectly in one cache line on nearly all platforms.
Fair enough.
> Which brings me to...
>
> Please let me know if this is something acceptable for 2.5.
>
> What kind of before/after effect do you see in lat_tcp/lat_connect
> (from lmbench) runs?
I'll check today; I concentrated on the cleanup part, trying not to harm
performance, by following the suggestions made for the first patch (i.e.,
just one allocation, etc.). I'll test it later today at the lab, UP and SMP
(4- and 8-way), and submit the results here.
Apart from possible performance problems, does the patch look OK?
- Arnaldo
On Mon, Dec 17, 2001 at 10:51:34PM -0800, David S. Miller wrote:
> Which brings me to...
>
> Please let me know if this is something acceptable for 2.5.
>
> What kind of before/after effect do you see in lat_tcp/lat_connect
> (from lmbench) runs?
Improvements on the lat_connect case? :)
2.4.16
TCP latency using 127.0.0.1: 119.3369 microseconds
TCP latency using 127.0.0.1: 118.9847 microseconds
TCP latency using 127.0.0.1: 118.5139 microseconds
TCP latency using 127.0.0.1: 119.1301 microseconds
TCP latency using 127.0.0.1: 118.6322 microseconds
TCP/IP connection cost to 127.0.0.1: 429.6667 microseconds
TCP/IP connection cost to 127.0.0.1: 430.7692 microseconds
TCP/IP connection cost to 127.0.0.1: 431.4615 microseconds
TCP/IP connection cost to 127.0.0.1: 430.3846 microseconds
TCP/IP connection cost to 127.0.0.1: 435.4615 microseconds
2.4.16-acme5
TCP latency using 127.0.0.1: 119.2639 microseconds
TCP latency using 127.0.0.1: 118.6068 microseconds
TCP latency using 127.0.0.1: 119.0443 microseconds
TCP latency using 127.0.0.1: 119.5683 microseconds
TCP latency using 127.0.0.1: 118.9556 microseconds
TCP/IP connection cost to 127.0.0.1: 408.3571 microseconds
TCP/IP connection cost to 127.0.0.1: 409.6429 microseconds
TCP/IP connection cost to 127.0.0.1: 410.6429 microseconds
TCP/IP connection cost to 127.0.0.1: 409.2143 microseconds
TCP/IP connection cost to 127.0.0.1: 414.8333 microseconds
More results are coming, this time for local connections on an 8-way box.
- Arnaldo
> From: Arnaldo Carvalho de Melo <[email protected]>
> Date: Tue, 18 Dec 2001 18:52:00 -0200
>
> On Mon, Dec 17, 2001 at 10:51:34PM -0800, David S. Miller wrote:
> > Which brings me to...
> >
> > Please let me know if this is something acceptable for 2.5.
> >
> > What kind of before/after effect do you see in lat_tcp/lat_connect
> > (from lmbench) runs?
> Improvements on the lat_connect case? :)

Great. I have no fundamental problems with your changes.
Now, when/if we move the hash-chain/identity members into
the IPv4 struct (there are some issues with this wrt. ipv6 btw) I will
be interested in seeing the same tests done :-)
OK, the patch for 2.5.1, without the bogus CVS $Id string hunks, is being
used on this machine now.
Available at:
http://www.kernel.org/pub/linux/kernel/people/acme/v2.5/2.5.1/sock.cleanup-2.5.1.patch.bz2
Ah, the lat_unix_connect results on a Pentium 300 MMX notebook:
2.5.1 + this patch
UNIX connection cost : 96.1749 microseconds
UNIX connection cost : 96.3361 microseconds
UNIX connection cost : 97.2310 microseconds
UNIX connection cost : 101.9180 microseconds
UNIX connection cost : 97.2461 microseconds
2.4.16 pristine
UNIX connection cost : 112.7034 microseconds
UNIX connection cost : 114.5494 microseconds
UNIX connection cost : 114.0923 microseconds
UNIX connection cost : 111.0959 microseconds
UNIX connection cost : 120.8419 microseconds
And about 100 KB of kernel memory is saved for AF_UNIX sockets on a basic
KDE session: the AF_UNIX struct sock is now about 400 bytes, versus about
1200 bytes on a pristine kernel, i.e., roughly 800 bytes saved on each of
the ~125 such sockets a session like that keeps open.
- Arnaldo
> From: Arnaldo Carvalho de Melo <[email protected]>
> Date: Thu, 20 Dec 2001 01:23:39 -0200
>
> Available at:
>
> http://www.kernel.org/pub/linux/kernel/people/acme/v2.5/2.5.1/sock.cleanup-2.5.1.patch.bz2

Looking pretty good. I have one improvement.
I'd rather you pass the "kmem_cache_t" directly into sk_alloc, use
NULL for "I don't have any extra private area".
And then, for the IP case lay it out like this:
struct sock
struct ip_opt
struct {tcp,raw4,...}_opt
And use different kmem_cache_t's for each protocol instead of
the same one for tcp, raw4, etc.
RAW/UDP sockets waste a lot of space with your current layout.
On Thu, Dec 20, 2001 at 12:21:26AM -0800, David S. Miller wrote:
> From: Arnaldo Carvalho de Melo <[email protected]>
> Date: Thu, 20 Dec 2001 01:23:39 -0200
>
> Available at:
>
> http://www.kernel.org/pub/linux/kernel/people/acme/v2.5/2.5.1/sock.cleanup-2.5.1.patch.bz2
>
> Looking pretty good. I have one improvement.
>
> I'd rather you pass the "kmem_cache_t" directly into sk_alloc, use
> NULL for "I don't have any extra private area".
Hmm, I did it via sock_register to avoid changing all the sk_alloc users,
but in the end all the protocols were changed anyway, so... OK, I'll do
that; at least it'll simplify the case of the rtnetlink socket being
allocated early in the boot process, before sock_register(rtnetlink) is
called.
> And then, for the IP case lay it out like this:
>
> struct sock
> struct ip_opt
> struct {tcp,raw4,...}_opt
>
> And use different kmem_cache_t's for each protocol instead of
> the same one for tcp, raw4, etc.
>
> RAW/UDP sockets waste a lot of space with your current layout.
*grin*
OK, OK, let's save more bytes 8) I'll look into this.
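Something along these lines, I suppose (macro and struct names are
illustrative, following the {tcp,raw4,...}_opt naming above; note that
TCP_PINFO changes to account for the ip_opt now sitting in between):

#define IP_PINFO(SK)	((struct ip_opt *)((SK) + 1))
#define TCP_PINFO(SK)	((struct tcp_opt *)(IP_PINFO(SK) + 1))
#define RAW4_PINFO(SK)	((struct raw4_opt *)(IP_PINFO(SK) + 1))

So the tcp_sock cache objsize becomes sizeof(struct sock) +
sizeof(struct ip_opt) + sizeof(struct tcp_opt), while raw4_sock and
udp_sock only pay for their own small _opt on top of sock + ip_opt.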
- Arnaldo
On Thu, Dec 20, 2001 at 12:21:26AM -0800, David S. Miller wrote:
> From: Arnaldo Carvalho de Melo <[email protected]>
> Date: Thu, 20 Dec 2001 01:23:39 -0200
>
> Available at:
>
> http://www.kernel.org/pub/linux/kernel/people/acme/v2.5/2.5.1/sock.cleanup-2.5.1.patch.bz2
>
> Looking pretty good. I have one improvement.
>
> I'd rather you pass the "kmem_cache_t" directly into sk_alloc, use
> NULL for "I don't have any extra private area".
>
> And then, for the IP case lay it out like this:
>
> struct sock
> struct ip_opt
> struct {tcp,raw4,...}_opt
>
> And use different kmem_cache_t's for each protocol instead of
> the same one for tcp, raw4, etc.
>
> RAW/UDP sockets waste a lot of space with your current layout.
Indeed it does, but the current setup in the stock kernel wastes even more ;)
Well, I did what you suggested, adding a slab parameter to sk_alloc. I also
overloaded zero_it, but its current behaviour is maintained, i.e., 0 ==
don't zero the newly allocated sock, 1 == zero it; the overloading is:
1 == sizeof(struct sock), > 1 == the objsize of the per-protocol slab cache.
For now I did this only for UDPv4 sockets and will do the others this
afternoon; this is the result so far:
[rama2 kernel-acme]$ grep sock /proc/slabinfo
unix_sock 7 20 400 1 2 1 : 17 572 2 0 0
udp_sock 6 10 372 1 1 1 : 7 31 1 0 0
tcp_sock 13 15 800 3 3 1 : 13 46 3 0 0
sock 0 0 336 0 0 1 : 0 0 0 0 0
Now UDP sockets use only 372 bytes, while in the stock kernel they use
1280 bytes when all the protocols are selected (as modules or statically
linked), and still more than 1 KB when just TCP/IPv4 is selected.
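The overloaded sk_alloc ends up looking roughly like this (a sketch;
details may differ from the actual patch):

struct sock *sk_alloc(int family, int priority, int zero_it,
		      kmem_cache_t *slab)
{
	struct sock *sk;

	if (slab == NULL)	/* "no extra private area" */
		slab = sk_cachep;
	sk = kmem_cache_alloc(slab, priority);
	if (sk != NULL && zero_it) {
		/* zero_it == 1 means just the plain struct sock;
		 * anything bigger is the per-protocol objsize. */
		memset(sk, 0, zero_it == 1 ? sizeof(struct sock)
					   : zero_it);
		sk->family = family;
	}
	return sk;
}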
More to come. Patch will be available later today.
- Arnaldo
On Fri, Dec 21, 2001 at 11:54:38AM -0200, Arnaldo Carvalho de Melo wrote:
> On Thu, Dec 20, 2001 at 12:21:26AM -0800, David S. Miller wrote:
> > I'd rather you pass the "kmem_cache_t" directly into sk_alloc, use
> > NULL for "I don't have any extra private area".
> >
> > And then, for the IP case lay it out like this:
> >
> > struct sock
> > struct ip_opt
> > struct {tcp,raw4,...}_opt
> >
> > And use different kmem_cache_t's for each protocol instead of
> > the same one for tcp, raw4, etc.
> >
> > RAW/UDP sockets waste a lot of space with your current layout.
Done, patch available at:
http://www.kernel.org/pub/linux/kernel/people/acme/v2.5/2.5.2-pre1/sock.cleanup-2.5.2-pre1.bz2
Current state of /proc/slabinfo:
[acme@rama2 acme]$ grep sock /proc/slabinfo
unix_sock 7 20 400 1 2 1 : 17 572 2 0 0
raw4_sock 0 10 376 0 1 1 : 1 3 1 0 0
udp_sock 6 10 372 1 1 1 : 7 31 1 0 0
tcp_sock 13 15 800 3 3 1 : 14 47 3 0 0
sock 0 0 336 0 0 1 : 0 0 0 0 0
TODO: do the same for IPv6, which for now has only one slab cache for all
its protocols.
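I imagine it mirroring the IPv4 layout (a guess at this point, not code
from the patch):

#define IPV6_PINFO(SK)	((struct ipv6_pinfo *)((SK) + 1))
#define TCP6_PINFO(SK)	((struct tcp_opt *)(IPV6_PINFO(SK) + 1))

i.e., tcp6_sock, udp6_sock and raw6_sock caches, each sized for
struct sock + struct ipv6_pinfo + their own protocol state.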
- Arnaldo