2001-10-13 18:26:16

by Andrea Arcangeli

Subject: 2.4.13pre2aa1

Only in 2.4.12aa1: 00_backout-2.4.11pre1-1
Only in 2.4.12aa1: 00_o_direct-2
Only in 2.4.13pre2aa1: 00_o_direct-3

Made a self-contained patch that applies cleanly to 2.4.13pre2, ready
for merging. Many thanks to Janet Morgan for finding another bug in the
blkdev O_DIRECT support.

Only in 2.4.12aa1: 00_cache-without-buffers-1
Only in 2.4.12aa1: 00_parport-fix-1

Already in mainline.

Only in 2.4.13pre2aa1: 00_files_struct_rcu-2.4.10-04-1

File locking read/write spinlock replaced with RCU.

Only in 2.4.13pre2aa1: 00_ordered-freeing-1

Free the pages so that they get allocated in physical order later.
It shouldn't matter, but I got reports of slower in-core (cache)
performance on some archs after a fresh boot, and of a speedup after the
freelist got randomized by load.
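
A minimal sketch of the ordering idea, assuming a plain LIFO free list (the real 2.4 allocator is a buddy allocator, and the patch itself is not shown here). All names below are hypothetical. Freeing a range highest-address-first means later allocations pop the pages in ascending physical order:

```c
#include <assert.h>
#include <stddef.h>

#define NPAGES 8

/* Toy LIFO free list of page frame numbers; NOT the kernel's buddy allocator. */
struct freelist {
    unsigned long pfn[NPAGES];
    size_t count;
};

/* Push one page frame number onto the free list. */
void free_page_toy(struct freelist *fl, unsigned long pfn)
{
    assert(fl->count < NPAGES);
    fl->pfn[fl->count++] = pfn;
}

/* Pop the most recently freed page frame number. */
unsigned long alloc_page_toy(struct freelist *fl)
{
    assert(fl->count > 0);
    return fl->pfn[--fl->count];
}

/* Free pfns [first, first + n) highest first, so that pops come out ascending. */
void free_range_ordered(struct freelist *fl, unsigned long first, size_t n)
{
    while (n-- > 0)
        free_page_toy(fl, first + n);
}
```

After `free_range_ordered(&fl, 100, 4)`, four calls to `alloc_page_toy()` hand out frames 100, 101, 102 and 103 in that order.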

Only in 2.4.13pre2aa1: 00_rcu-poll-1

RCU implementation based on Dipankar's latest patch against 2.4.10.
I changed it so that its only fast-path/reader-load cost is a
per-cpu counter increment in the scheduler and nothing else. It is not
arch dependent either. I also tried to optimize the UP case.
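
A single-threaded sketch of how a per-cpu counter bumped in the scheduler can be used to detect a grace period, assuming the updater snapshots the counters and then polls them. This only illustrates the technique; the struct and function names are hypothetical and are not taken from the patch:

```c
#include <assert.h>
#include <stdbool.h>

#define NCPUS 4

/* Hypothetical per-CPU context-switch counters. */
struct quiescent_state {
    unsigned long count[NCPUS];
};

/* The only reader-side cost: one increment when `cpu` passes through the scheduler. */
void note_context_switch(struct quiescent_state *qs, int cpu)
{
    assert(cpu >= 0 && cpu < NCPUS);
    qs->count[cpu]++;
}

/* Updater side: record the counters at the moment an update is queued. */
void snapshot_counters(const struct quiescent_state *qs, unsigned long snap[NCPUS])
{
    for (int i = 0; i < NCPUS; i++)
        snap[i] = qs->count[i];
}

/* The grace period is over once every CPU has scheduled since the snapshot. */
bool grace_period_elapsed(const struct quiescent_state *qs,
                          const unsigned long snap[NCPUS])
{
    for (int i = 0; i < NCPUS; i++)
        if (qs->count[i] == snap[i])
            return false;
    return true;
}
```

A real implementation would also need memory barriers and a way to force idle CPUs through the scheduler; both are left out of this sketch.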

Only in 2.4.12aa1: 00_rwsem-fair-22
Only in 2.4.13pre2aa1: 00_rwsem-fair-23

Rediffed.

Only in 2.4.12aa1: 00_rwsem-fair-22-recursive-4
Only in 2.4.13pre2aa1: 00_rwsem-fair-23-recursive-4

Renamed.

Only in 2.4.12aa1: 00_vm-2
Only in 2.4.13pre2aa1: 00_vm-3
Only in 2.4.13pre2aa1: 00_vm-3.1

Further vm changes, backed out the PG_wait_for_IO since it does not seem
to make a relevant difference. Should behave better under swap load.
In particular I'm probing the inactive list from shrink_cache now,
so that I get feedback on when it's time to swap before the shrinks
on the inactive list start failing.

Only in 2.4.12aa1: 10_compiler.h-1
Only in 2.4.13pre2aa1: 10_compiler.h-2

Rediffed.

Only in 2.4.13pre2aa1: 10_lvm-snapshot-check-1
Only in 2.4.12aa1: 10_lvm-snapshot-hardsectsize-1
Only in 2.4.13pre2aa1: 10_lvm-snapshot-hardsectsize-2

LVM updates from Chris.

Only in 2.4.12aa1: 10_numa-sched-11
Only in 2.4.13pre2aa1: 10_numa-sched-12

Rediffed.

Only in 2.4.12aa1: 50_uml-patch-2.4.11-1.bz2
Only in 2.4.13pre2aa1: 50_uml-patch-2.4.12-1-1.bz2

Latest patch from Jeff.

Only in 2.4.12aa1: 60_tux-2.4.10-ac10-E6.bz2
Only in 2.4.13pre2aa1: 60_tux-2.4.10-ac10-F5.bz2
Only in 2.4.12aa1: 62_tux-generic-file-read-1
Only in 2.4.13pre2aa1: 62_tux-generic-file-read-2

Latest update from Ingo.

Andrea


2001-10-15 17:31:09

by Martin Devera

Subject: new VM: what is classzone ?

Hello Andrea,

Could you please explain in a few words what 'classzone'
is supposed to be?
I'm trying to understand the new code, and knowing the idea
behind it could help more people. And it should not delay
your work too much.

devik


2001-10-15 18:11:16

by M. Edward Borasky

Subject: How many versions of VM are there?

Can I get some clarification on how many different versions of VM are
"kicking around?" I know there is the 2.2 version, the 2.4.x Linus Torvalds
version, and Rik van Riel's version which currently resides in the
2.4.x-acyy tree. Are there others? Which one is likely to be the "fittest
that survives?" And where might I find documentation on the tuning parameters?
The 2.2 version is of little interest, but the others are quite important.
--
[email protected] (M. Edward Borasky) http://www.aracnet.com/~znmeb
http://groups.yahoo.com/group/BoraskyResearchJournal
http://groups.yahoo.com/group/comp-finance
http://groups.yahoo.com/group/pdx-neuro-semantics

Americans for Gnu Control remind you: "No gnus is good news!"

2001-10-15 20:57:40

by Slo Mo Snail

Subject: Re: How many versions of VM are there?

As far as I know there are 2 different versions of VM:
2.2.x and 2.4.x-acyy: Rik van Riel's VM (but I'm not sure whether it's the
same VM)
2.4.x: Andrea's VM

You can find documentation in
Documentation/vm/*
Documentation/sysctl/vm.txt

but I think there's no documentation of Andrea's VM yet

Bye

> Can I get some clarification on how many different versions of VM are
> "kicking around?" I know there is the 2.2 version, the 2.4.x Linus Torvalds
> version, and Rik van Riel's version which currently resides in the
> 2.4.x-acyy tree. Are there others? Which one is likely to be the "fittest
> that survives?" And where might I find documentation on the tuning
> parameters? The 2.2 version is of little interest, but the others are quite
> important.

2001-10-15 21:15:12

by Mike Fedyk

Subject: Re: How many versions of VM are there?

On Mon, Oct 15, 2001 at 10:58:22PM +0200, Slo Mo Snail wrote:
> As far as I know there are 2 different versions of VM:
> 2.2.x and 2.4.x-acyy: Rik van Riel's VM (but I'm not sure whether it's the
> same VM)

No.

It's not the same VM. 2.2 had a VM change from Andrea at about 2.2.17-18,
don't know about before...

> 2.4.x: Andrea's VM
>

2.4.0-2.4.10pre10 = Rik's VM.

2.4.10pre11+ = Andrea's VM

2.4.12-ac has Rik's vm, and it looks like Alan will keep Rik's VM for a while.

> You find Documentation in
> Documentation/vm/*
> Documentation/sysctl/vm.txt
>

I think this is still out of date... There's a patch at Rik's site. Don't
know if it is accurate for Andrea's VM...

http://www.surriel.com/patches/

> but i think there's no Documentation of Andrea's VM yet
>

True. I don't know if Andrea is working on that yet... It looks like he's
trying to iron out his new VM...

Mike

2001-10-15 22:24:52

by Tim Moore

Subject: Re: How many versions of VM are there?

Mike Fedyk wrote:
>
> On Mon, Oct 15, 2001 at 10:58:22PM +0200, Slo Mo Snail wrote:
> > As far as I know there are 2 different versions of VM:
> > 2.2.x and 2.4.x-acyy: Rik van Riel's VM (but I'm not sure whether it's the
> > same VM)
>
> No.
>
> It's not the same VM. 2.2 had a VM change from Andrea at about 2.2.17-18,
> don't know about before...

2.2.19pre2
o	Drop the page aging for a moment to merge the Andrea VM
o	Merge Andrea's VM-global patch (Andrea Arcangeli)

rgds,
tim.
--

2001-10-15 22:29:32

by Mike Fedyk

Subject: Re: How many versions of VM are there?

On Mon, Oct 15, 2001 at 03:24:48PM -0700, Tim Moore wrote:
> Mike Fedyk wrote:
> >
> > On Mon, Oct 15, 2001 at 10:58:22PM +0200, Slo Mo Snail wrote:
> > > As far as I know there are 2 different versions of VM:
> > > 2.2.x and 2.4.x-acyy: Rik van Riel's VM (but I'm not sure whether it's the
> > > same VM)
> >
> > No.
> >
> > It's not the same VM. 2.2 had a VM change from Andrea at about 2.2.17-18,
> > don't know about before...
>
> 2.2.19pre2
> o Drop the page aging for a moment to merge the
> Andrea VM
> o Merge Andrea's VM-global patch (Andrea
> Arcangeli)
>

So 2.2 used page aging like 2.0 and 2.4 until 2.2.19pre2?

2001-10-15 22:39:04

by Gerhard Mack

Subject: Re: How many versions of VM are there?

On Mon, 15 Oct 2001, Mike Fedyk wrote:

> On Mon, Oct 15, 2001 at 03:24:48PM -0700, Tim Moore wrote:
> > Mike Fedyk wrote:
> > >
> > > On Mon, Oct 15, 2001 at 10:58:22PM +0200, Slo Mo Snail wrote:
> > > > As far as I know there are 2 different versions of VM:
> > > > 2.2.x and 2.4.x-acyy: Rik van Riel's VM (but I'm not sure whether it's the
> > > > same VM)
> > >
> > > No.
> > >
> > > It's not the same VM. 2.2 had a VM change from Andrea at about 2.2.17-18,
> > > don't know about before...
> >
> > 2.2.19pre2
> > o Drop the page aging for a moment to merge the
> > Andrea VM
> > o Merge Andrea's VM-global patch (Andrea
> > Arcangeli)
> >
>
> So 2.2 used page aging like 2.0 and 2.4 until 2.2.19pre2?

AFAIK there were some changes to the VM during the mid 2.1.x series.

Gerhard


--
Gerhard Mack

[email protected]

<>< As a computer I find your faith in technology amusing.

2001-10-16 10:16:39

by snpe

Subject: Re: How many versions of VM are there?

On Monday 15 October 2001 11:15 pm, Mike Fedyk wrote:

> http://www.surriel.com/patches/
>

Does this link work?

regards ,
peco

2001-10-16 10:39:05

by Kirill Ratkin

Subject: Very old kernel.

Hi. Does anybody know how to compile an old kernel? (I need
to compile version 2.0.35.) I run make config and make dep,
and I see an error during make dep. I traced
this problem to a bus error in the mkdep binary. I tried
taking the config scripts from a 2.4.x kernel and that works, but
when I tried to compile I saw many errors connected
with asm statements and function type prefixes (like
__constant_memcopy). I would rather not install an old
gcc and old binutils. Are there ways to compile an old
kernel with new dev tools?

Regards,



2001-10-16 11:03:42

by Christian Groessler

Subject: Re: Very old kernel.


On 10/16/2001 03:39:06 AM MST Kirill Ratkin wrote:
>
>Hi. Does anybody know how to compile an old kernel? (I need
>to compile version 2.0.35.) I run make config and make dep,
>and I see an error during make dep. I traced
>this problem to a bus error in the mkdep binary. I tried
>taking the config scripts from a 2.4.x kernel and that works, but
>when I tried to compile I saw many errors connected
>with asm statements and function type prefixes (like
>__constant_memcopy). I would rather not install an old
>gcc and old binutils. Are there ways to compile an old
>kernel with new dev tools?
>
>Regards,

I made the following change to mkdep.c of the 2.0.39 kernel
to get it to compile. It was some time ago; it was a problem
with mmap, iirc.

regards,
chris


--- mkdep.c.org Tue Oct 8 18:33:56 1996
+++ mkdep.c Wed Feb 7 15:24:36 2001
@@ -229,6 +229,8 @@
 	int pagesizem1 = getpagesize()-1;
 	int fd = open(filename, O_RDONLY);
 	struct stat st;
+// printf("pagesize: %d\n",pagesizem1);
+// exit (1);
 
 	if (fd < 0) {
 		perror("mkdep: open");
@@ -236,6 +238,7 @@
 	}
 	fstat(fd, &st);
 	mapsize = st.st_size + 2*sizeof(unsigned long);
+#if 0
 	mapsize = (mapsize+pagesizem1) & ~pagesizem1;
 	map = mmap(NULL, mapsize, PROT_READ, MAP_PRIVATE, fd, 0);
 	if (-1 == (long)map) {
@@ -243,9 +246,20 @@
 		close(fd);
 		return;
 	}
-	close(fd);
 	state_machine(map);
 	munmap(map, mapsize);
+#else
+	map = malloc(mapsize);
+	if (! map) {
+		perror("mkdep: malloc");
+		close(fd);
+		return;
+	}
+	read (fd,map,st.st_size);
+	state_machine(map);
+	free(map);
+#endif
+	close(fd);
 	if (hasdep)
 		puts(command);
 }
@@ -254,6 +268,7 @@
 {
 	int len;
 	char * hpath;
+return(0);
 
 	hpath = getenv("HPATH");
 	if (!hpath)
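
The malloc-and-read replacement for mmap in the patch above can also be written as a standalone helper. This is a sketch only: `read_whole_file` is a hypothetical name, not part of mkdep.c. It differs from the patch in two ways: it checks the result of read(), and it zeroes the padding after the file contents, which the mmap version gets for free from the tail of the mapped page.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read all of `filename` into a heap buffer followed by at least `pad`
 * zero bytes. Returns NULL on failure; the caller frees the result.
 */
char *read_whole_file(const char *filename, size_t pad, size_t *size_out)
{
    struct stat st;
    size_t want, done = 0;
    char *buf;
    int fd;

    assert(filename != NULL && size_out != NULL);

    fd = open(filename, O_RDONLY);
    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return NULL;
    }
    want = (size_t)st.st_size;

    /* calloc zeroes the padding; the extra byte avoids a zero-size request. */
    buf = calloc(want + pad + 1, 1);
    if (buf == NULL) {
        close(fd);
        return NULL;
    }

    /* read() may return short counts, so loop until done or EOF. */
    while (done < want) {
        ssize_t n = read(fd, buf + done, want - done);
        if (n < 0) {
            free(buf);
            close(fd);
            return NULL;
        }
        if (n == 0)
            break;
        done += (size_t)n;
    }
    close(fd);
    *size_out = done;
    return buf;
}
```

A caller in the style of the patch would pass `2 * sizeof(unsigned long)` as `pad`, hand the buffer to its parser, and then `free()` it.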



2001-10-16 12:45:35

by Martin Devera

Subject: sendto syscall is slow

Hello,

I'm developing a new qos discipline and use my own
measurement tool. It simply uses PF_PACKET and then
does sendto/recv, simulating various flows.
(I use both lo and eth0, where I short-connected the RX-TX
pins of a single ethernet card.)

I can't get beyond 25 000 packets per second. gprof:
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 35.67      5.39     5.39   498750     0.01     0.01  sendto
 26.67      9.42     4.03  1000826     0.00     0.00  poll
 19.06     12.30     2.88   498750     0.01     0.01  recv

Is there any faster way to push raw packets into the kernel? I need
to push the qos discipline to its limit but I can't because the send
syscall is the bottleneck.
Is it possible to tx multiple packets in a single call, or should
I extend the kernel myself for this testing purpose?
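
The ms/call columns in the profile above are rounded to 0.01 ms, so the per-call cost is easier to read off the self seconds and call counts. A small sketch of that arithmetic, with hypothetical helper names: for sendto, 5.39 s over 498750 calls is about 10.8 microseconds per call, which by itself caps a single process at roughly 92 000 sends per second before poll and recv are counted.

```c
#include <assert.h>

/* Average cost of one call in microseconds, from gprof's "self seconds" and "calls". */
double per_call_usec(double self_seconds, unsigned long calls)
{
    assert(calls > 0);
    return self_seconds * 1e6 / (double)calls;
}

/* Upper bound on packets per second if each packet costs `usec_per_packet`. */
double max_packets_per_sec(double usec_per_packet)
{
    assert(usec_per_packet > 0.0);
    return 1e6 / usec_per_packet;
}
```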

thanks, devik


2001-10-16 13:06:22

by Martin Dalecki

Subject: Re: sendto syscall is slow

Martin Devera wrote:
>
> Hello,
>
> I'm developing a new qos discipline and use my own
> measurement tool. It simply uses PF_PACKET and then
> does sendto/recv, simulating various flows.
> (I use both lo and eth0, where I short-connected the RX-TX
> pins of a single ethernet card.)
>
> I can't get beyond 25 000 packets per second. gprof:
> Each sample counts as 0.01 seconds.
>   %   cumulative   self              self     total
>  time   seconds   seconds    calls  ms/call  ms/call  name
>  35.67      5.39     5.39   498750     0.01     0.01  sendto
>  26.67      9.42     4.03  1000826     0.00     0.00  poll
>  19.06     12.30     2.88   498750     0.01     0.01  recv
>
> Is there any faster way to push raw packets into the kernel? I need
> to push the qos discipline to its limit but I can't because the send
> syscall is the bottleneck.
> Is it possible to tx multiple packets in a single call, or should
> I extend the kernel myself for this testing purpose?

Increase the HZ constant in the kernel, which determines the
scheduler frequency; apparently the BH handling is acting
as a low-pass filter for your signals here. However, please
beware of the many possible side effects this may have on your system.

2001-10-16 13:30:15

by David Weinehall

Subject: Re: Very old kernel.

On Tue, Oct 16, 2001 at 03:39:06AM -0700, Kirill Ratkin wrote:
> Hi. Does anybody know how to compile an old kernel? (I need
> to compile version 2.0.35.) I run make config and make dep,
> and I see an error during make dep. I traced
> this problem to a bus error in the mkdep binary. I tried
> taking the config scripts from a 2.4.x kernel and that works, but
> when I tried to compile I saw many errors connected
> with asm statements and function type prefixes (like
> __constant_memcopy). I would rather not install an old
> gcc and old binutils. Are there ways to compile an old
> kernel with new dev tools?

The __asm__ in v2.0.xx won't compile with too new binutils (unless you
use v2.0.40-pre[12], where I've fixed this), and a new gcc will
miscompile the x86 port at least.


/David
_ _
// David Weinehall <[email protected]> /> Northern lights wander \\
// Project MCA Linux hacker // Dance across the winter sky //
\> http://www.acc.umu.se/~tao/ </ Full colour fire </

2001-10-16 13:56:16

by Martin Devera

Subject: Re: sendto syscall is slow

> > I'm developing a new qos discipline and use my own
> > measurement tool. It simply uses PF_PACKET and then
> > does sendto/recv, simulating various flows.
> > (I use both lo and eth0, where I short-connected the RX-TX
> > pins of a single ethernet card.)
> >
> > I can't get beyond 25 000 packets per second. gprof:
> > Each sample counts as 0.01 seconds.
> >   %   cumulative   self              self     total
> >  time   seconds   seconds    calls  ms/call  ms/call  name
> >  35.67      5.39     5.39   498750     0.01     0.01  sendto
> >  26.67      9.42     4.03  1000826     0.00     0.00  poll
> >  19.06     12.30     2.88   498750     0.01     0.01  recv
> >[snip]
>
> Increase the HZ constant in the kernel, which determines the
> scheduler frequency; apparently the BH handling is acting
> as a low-pass filter for your signals here. However, please
> beware of the many possible side effects this may have on your system.

I did. The number of packets decreased:
 37.40      6.63     6.63   439050     0.02     0.02  sendto
 25.66     11.19     4.55   881028     0.01     0.01  poll
 20.19     14.77     3.58   439050     0.01     0.01  recv

Now it is about 23 000/sec, probably due to higher system overhead.
I don't think HZ could affect this case because the receive queue is
drained from softirq, which is run when the syscall returns to userspace.
So it should not be bound to scheduler timing (as I both send
and receive from a single process).

Martin

2001-10-16 15:59:09

by Francois Romieu

Subject: Re: sendto syscall is slow

Martin Devera <[email protected]> :
[sendto/recv profile]
> Is there any faster way to push raw packets into the kernel? I need
> to push the qos discipline to its limit but I can't because the send
> syscall is the bottleneck.
> Is it possible to tx multiple packets in a single call, or should
> I extend the kernel myself for this testing purpose?

Do you have the same profile for sendto when Rx/Tx isn't
short-connected?
You may consider polling for Tx/Rx completion in the Tx path at the
driver level. If your cpu isn't very powerful it will make
a difference.

--
Ueimor

2001-10-16 16:08:19

by Ravi Chamarti

Subject: Ref: zerocopy +netfilter performance problem.


Hi all.

I am kind of new to this forum and am not sure whether
to pose this question in linux-kernel or linux-net

I am using linux kernel 2.4.4 and have a performance
problem using the zerocopy code and the netfilter code in the
kernel.

I have been using zerocopy path through network stack
and things are going fine with that. Until I tried
enabling netfilter support in the kernel. The way I am
using zerocopy code is by passing kernel physical
pages directly to tcp_sendpage and letting network
code and NIC do the rest.


I enabled only the network packet filter option in order to
register my own nf_hook. I haven't enabled other
options like netfilter debugging/socket
filtering/conntrack/iptables/ipchains compat.
The idea is to register my hook and do a little work with
the packet header but not with the data. I do have one
small hook which just returns NF_ACCEPT and does nothing
with the packet.

I do not intend to use any netfilter hooks for the
zerocopy path, but would like to use netfilter
hooks for some non-zerocopy traffic.


What I see is that an skb with a frag list is getting copied
in the netfilter hook call (nf_hook_slow). (I guess the
skb_linearize routine is remapping all the physical pages
and copying them into a kernel buffer.)


My question is: is this copy required for
netfilter to work? Can we somehow get netfilter
to work such that the zerocopy path
passes the packet without any copy?

Thanks
Ravi Chamarti.



2001-10-16 16:15:11

by Martin Devera

Subject: Re: sendto syscall is slow

> Do you have the same profile for sendto when Rx/Tx isn't short
> connected ?

I have not. I have only one computer and one 100Mbit eth card
at home. But I've got the same results when I used loopback
driver.
But without loopback wires it goes like this:
 41.03      4.34     4.34   478200     0.01     0.01  sendto
 29.52      7.46     3.12   481389     0.01     0.01  poll
 14.70      9.02     1.55   478200     0.00     0.00  setsockopt

> You may consider polling for Tx/Rx completion in the Tx path at the
> driver level. If your cpu isn't too much powered it will make
> a difference.

Are you speaking about rewriting the nic driver? Like trying to drain
waiting packets from the nic's memory while enqueuing a new one?

IMHO the bottleneck will probably be in the send syscall (probably
syscall overhead).
I'm thinking about a hack which would allow me to send() a large
buffer; kernel code would break it into smaller ones and
dev_queue_xmit them at once.
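
A user-space sketch of how such a batched buffer could be laid out and walked, assuming a made-up framing of a 2-byte big-endian length before each packet. Nothing here is an existing kernel interface; `batch_append` and `batch_next` are hypothetical names, and the kernel side would call its transmit routine where this sketch merely returns each frame.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Append one frame as [2-byte big-endian length][payload]; 0 on success, -1 if it does not fit. */
int batch_append(unsigned char *buf, size_t cap, size_t *used,
                 const void *frame, size_t frame_len)
{
    assert(*used <= cap);
    if (frame_len > 0xffff || cap - *used < frame_len + 2)
        return -1;
    buf[*used] = (unsigned char)(frame_len >> 8);
    buf[*used + 1] = (unsigned char)(frame_len & 0xff);
    memcpy(buf + *used + 2, frame, frame_len);
    *used += frame_len + 2;
    return 0;
}

/* Step to the next frame: 1 and frame/frame_len set, 0 at end of buffer, -1 if truncated. */
int batch_next(const unsigned char *buf, size_t len, size_t *off,
               const unsigned char **frame, size_t *frame_len)
{
    size_t n;

    assert(*off <= len);
    if (*off == len)
        return 0;
    if (len - *off < 2)
        return -1;
    n = ((size_t)buf[*off] << 8) | buf[*off + 1];
    if (len - *off - 2 < n)
        return -1;
    *frame = buf + *off + 2;
    *frame_len = n;
    *off += n + 2;
    return 1;
}
```

The sender fills one buffer with `batch_append()` and issues a single send(); the receiver of that buffer loops on `batch_next()` until it returns 0, so the syscall cost is paid once per batch instead of once per packet.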

devik

2001-10-16 17:03:48

by Francois Romieu

Subject: Re: sendto syscall is slow

Martin Devera <[email protected]> :
[...]
> Are you speaking about rewriting the nic driver? Like trying to drain
> waiting packets from the nic's memory while enqueuing a new one?

Partly: simply disabling the Rx/Tx interrupt and checking for acks
in the buffer descriptors during hard_start_xmit. The profile for
loopback shows your problem is not here however. :o(

--
Ueimor

2001-10-16 17:10:58

by Martin Devera

Subject: Re: sendto syscall is slow

> Martin Devera <[email protected]> :
> [...]
> > Are you speaking about rewriting the nic driver? Like trying to drain
> > waiting packets from the nic's memory while enqueuing a new one?
>
> Partly: simply disabling the Rx/Tx interrupt and checking for acks
> in the buffer descriptors during hard_start_xmit. The profile for
> loopback shows your problem is not here however. :o(

I just found that PF_PACKET sockets can be mmapped to improve reads.
Only I can't find docs on how to use the functionality.
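
A minimal sketch of setting up the mmapped receive ring, assuming the PACKET_RX_RING socket option and struct tpacket_req from linux/if_packet.h and a kernel built with packet mmap support. The two helper names are hypothetical. The kernel is expected to reject the request unless the block size is a multiple of the page size, the frame size is a multiple of TPACKET_ALIGNMENT, and tp_frame_nr matches the number of frames that fit in all blocks.

```c
#include <assert.h>
#include <linux/if_packet.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/socket.h>

/* Fill in a ring request; the caller picks sizes so that a block is whole pages. */
void fill_ring_request(struct tpacket_req *req, unsigned int frame_size,
                       unsigned int frames_per_block, unsigned int block_nr)
{
    assert(frame_size % TPACKET_ALIGNMENT == 0);
    req->tp_frame_size = frame_size;
    req->tp_block_size = frame_size * frames_per_block;
    req->tp_block_nr = block_nr;
    req->tp_frame_nr = frames_per_block * block_nr;
}

/* Ask the kernel for an RX ring on packet socket `fd` and map it; MAP_FAILED on error. */
void *map_rx_ring(int fd, const struct tpacket_req *req)
{
    size_t len = (size_t)req->tp_block_size * req->tp_block_nr;

    if (setsockopt(fd, SOL_PACKET, PACKET_RX_RING, req, sizeof(*req)) < 0)
        return MAP_FAILED;
    return mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
}
```

The descriptor would come from `socket(PF_PACKET, SOCK_RAW, htons(ETH_P_ALL))`, which needs root. Each frame in the mapping starts with a struct tpacket_hdr whose tp_status field tells user space whether the frame holds a received packet.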

2001-10-18 18:41:11

by Ravi Chamarti

Subject: Re: Ref: zerocopy +netfilter performance problem.

Hi,

Thanks for your response Alexey. I appreciate it.

--- [email protected] wrote:
> Hello!
>
> > My question is: is this copy required for
> > netfilter to work? Can we somehow get netfilter
> > to work such that the zerocopy path
> > passes the packet without any copy?
>
> Yes & yes.
>
> Existing netfilter modules do not understand fragmented skbs,
> and as long as netfilter folks are too lazy even to move the check
> to the relevant modules, even smart hooks have to be harmed by this.

How many netfilter modules exist which do not
understand fragmented skbs and need to look at the
skb data?

Will the following approach work?

If the hook registration somehow shows interest only in the
header (by setting a flag, maybe in the nf_hook_ops
struct), then we can avoid the copy of the fragmented
skb's data; in all other cases, we copy the fragmented
skb's data to a kernel buffer. The side effect is that
a flag field is introduced into the nf_hook_ops struct,
which requires netfilter modules to be recompiled. Are there
any other side effects or better approaches?


regards
Ravi Chamarti



2001-10-18 18:48:01

by Alexey Kuznetsov

Subject: Re: Ref: zerocopy +netfilter performance problem.

Hello!

> How many netfilter modules exist which do not

All of them.

> If the hook registration somehow shows interest only in the
> header

All the headers except for the IP header can be split; at least
the defragmenter generates them.

So the flag should not say this but rather: "does it understand that the skb may not be linear?"

It will work of course.
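
A user-space model of the flag as worded above, assuming each hook declares whether it copes with non-linear skbs. The types are toy stand-ins, not the kernel's struct sk_buff or struct nf_hook_ops, and the flag name is hypothetical. The dispatcher copies the payload at most once, and only when it reaches a hook that needs linear data:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy packet: nr_frags > 0 means the payload still lives in page fragments. */
struct toy_skb {
    size_t nr_frags;
    unsigned int copies;   /* how many times the payload was linearized */
};

/* Toy hook registration carrying the hypothetical flag. */
struct toy_hook {
    bool nonlinear_ok;     /* hook understands that the skb may not be linear */
};

/* Run all hooks in order, linearizing only before a hook that needs it. */
void toy_run_hooks(struct toy_skb *skb, const struct toy_hook *hooks, size_t n)
{
    assert(skb != NULL);
    for (size_t i = 0; i < n; i++) {
        if (skb->nr_frags > 0 && !hooks[i].nonlinear_ok) {
            skb->nr_frags = 0;   /* stands in for skb_linearize() */
            skb->copies++;
        }
        /* the hook itself would run here and return a verdict */
    }
}
```

With only flag-aware hooks registered, a fragmented packet passes through uncopied; a single legacy hook in the chain brings the copy back, which matches the remark that every existing module would first have to be audited.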

Alexey