2002-10-14 20:21:21

by Eff Norwood

Subject: huge number of intr/s on large nfs server

Hi All,

I have a 2.4.18 kernel running on a dual 2.4Ghz Xeon platform using software
RAID 5 via IBM's EVMS and EXT3. The system is being used as an NFS server
and although local disk performance is excellent, NFS performance (over UDP
and TCP, vers 2 and 3 with multiple different client mount block sizes) is
poor to bad. Looking at mpstat while the system is under load shows the
%system to be quite high (94-96%) but most interestingly shows the number of
intr/s (context switches) to be 17-18K plus!

Since I was not sure what was causing all of these context switches, I
installed SGI kernprof and ran it during a 15 minute run. I used this
command to start kernprof: 'kernprof -r -d time -f 1000 -t pc -b -c all' and
this one to stop it: 'kernprof -e -i | sort -nr +2 | less >
big_csswitch.txt'

The output of this collection is located here (18Kb):

http://www.effrem.com/linux/kernel/dev/big_csswitch.txt

Most interesting to me is why in the top three results:

default_idle [c010542c]: 861190
_text_lock_inode [c015d031]: 141795
UNKNOWN_KERNEL [c01227f0]: 101532

that default_idle would be the highest value when the CPUs showed 94-96%
busy. Also interesting is what UNKNOWN_KERNEL is. ???

The server described above has 14 internal IDE disks configured as software
Raid 5 and connected to the network with one Syskonnect copper gigabit card.
I used 30 100 base-T connected clients all of which performed sequential
writes to one large 1.3TB volume on the file server. They were mounted
NFSv2, UDP, 8K r+w size for this run. I was able to achieve only 35MB/sec of
sustained NFS write throughput. Local disk performance (e.g. dd file) for
sustained writes is *much* higher. I am using knfsd with the latest 2.4.18
Neil Brown fixes from his site. Distribution is Debian 3.0 Woody Stable.
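In other words, each client mount looked roughly like this (the server and
export names here are just placeholders):

  mount -t nfs -o nfsvers=2,udp,rsize=8192,wsize=8192 server:/bigvol /mnt/bigvol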

Many thanks in advance for the insight,

Eff Norwood






2002-10-15 08:13:08

by Bogdan Costescu

Subject: Re: huge number of intr/s on large nfs server

On Mon, 14 Oct 2002, Eff Norwood wrote:

> Also interesting is what UNKNOWN_KERNEL is. ???

Do you have any binary-only modules in the running kernel ?

> The server described above has 14 internal IDE disks configured as software
> Raid 5 and connected to the network with one Syskonnect copper gigabit card.

Please confirm: you have at least 7 IDE controllers. If so, how are the
interrupts associated with them (look in /proc/interrupts)? Are there
shared interrupts? Is the NIC sharing an interrupt line with at least one
IDE controller? I believe the answer to the last question might be yes,
as this would explain why disk access is fast but mixed disk+network
access is slow; IDE drivers and NIC drivers don't mix well (see the
discussions of the "max_interrupt_work" parameter for most network drivers).
However, NFS might not be a good network test in this case, maybe ftp - or
anything TCP based - would have been better.
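A quick way to check is (the device names in the sample line are only
illustrative):

  cat /proc/interrupts
  # a line such as
  #   10:  1234567   IO-APIC-level  eth0, ide2
  # would mean the NIC shares IRQ 10 with an IDE controller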

> I used 30 100 base-T connected clients all of which performed sequential
> writes to one large 1.3TB volume on the file server. They were mounted
> NFSv2, UDP, 8K r+w size for this run.

Err, you have a GE NIC in the server and FE NICs in the clients. Running
NFS over UDP might not be the best solution in this case; try NFS over TCP.
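For example, on a client (server and export names are placeholders):

  mount -t nfs -o nfsvers=3,tcp,rsize=8192,wsize=8192 server:/bigvol /mnt/bigvol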

> I was able to achieve only 35MB/sec of sustained NFS write throughput.
> Local disk performance (e.g. dd file) for sustained writes is *much*
> higher.

I don't know how you tested this, but disk head movement might influence
the results a lot. For a single file (the "dd" case), the kernel will
probably try to create a contiguous file, of course limited by the free
regions of the disk; however, when writing to 30 files simultaneously,
the kernel might have to wait for the disk to write in different regions
for different files - I'm sure Daniel could explain this better than me -
so that write speed is much reduced. I think that a better disk test would
be to do all 30 writes simultaneously on the server itself with
"dd" or something similar, so that the network is not involved.

--
Bogdan Costescu

IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
E-mail: [email protected]





2002-10-15 14:02:11

by Heflin, Roger A.

Subject: RE: huge number of intr/s on large nfs server

You say 35MB/second and 17k context switches, so that is about
2k per context switch. You get at least 1 context switch for every
few packets sent out, and 1 context switch for each disk
io done (I believe); the more data you send out, the more context
switches/interrupts you will get. Things that appear to reduce
the numbers are larger packets (a 32k nfs wsize/rsize seems to reduce
the numbers compared to 8k); making anything else larger
may also reduce the total number. Software raid should
increase the number a bit, as more has to be taken care of by
the main cpu, whereas in hardware raid the parity writes
are invisible to the main cpu. Also, with hardware raid the
parity calcs are done on the hardware and not on the main cpu,
so this reduces the main cpu usage. With a hardware raid
setup I get performance numbers similar to what you are seeing,
only with more like 50% cpu usage on a single, slightly slower cpu.
With a SCSI FC 5+1 disk setup on a Mylex controller I am getting
about 25MB/second writes.

Just doing local IO will produce lots and lots of interrupts/context
switches.

When you did the local dd, did you make sure to break the cache?
Otherwise the results you get will be rather useless. I have
been finding that I can usually get about 1/2 of the local capacity
out over NFS when I break the cache; if you don't
break the cache you get very, very large results.
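For instance, writing a single file several times larger than the machine's
RAM and syncing at the end (path and size here are only an example) keeps
the page cache from inflating the number:

  dd if=/dev/zero of=/bigvol/cachebuster bs=1M count=10240
  sync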

Roger

> Message: 1
> From: "Eff Norwood" <[email protected]>
> To: <[email protected]>
> Cc: "Daniel Phillips" <[email protected]>
> Date: Mon, 14 Oct 2002 13:21:15 -0700
> Subject: [NFS] huge number of intr/s on large nfs server
>
> Hi All,
>
> I have a 2.4.18 kernel running on a dual 2.4Ghz Xeon platform using software
> RAID 5 via IBM's EVMS and EXT3. The system is being used as an NFS server
> and although local disk performance is excellent, NFS performance (over UDP
> and TCP, vers 2 and 3 with multiple different client mount block sizes) is
> poor to bad. Looking at mpstat while the system is under load shows the
> %system to be quite high (94-96%) but most interestingly shows the number of
> intr/s (context switches) to be 17-18K plus!
>
> Since I was not sure what was causing all of these context switches, I
> installed SGI kernprof and ran it during a 15 minute run. I used this
> command to start kernprof: 'kernprof -r -d time -f 1000 -t pc -b -c all' and
> this one to stop it: 'kernprof -e -i | sort -nr +2 | less >
> big_csswitch.txt'
>
> The output of this collection is located here (18Kb):
>
> http://www.effrem.com/linux/kernel/dev/big_csswitch.txt
>
> Most interesting to me is why in the top three results:
>
> default_idle [c010542c]: 861190
> _text_lock_inode [c015d031]: 141795
> UNKNOWN_KERNEL [c01227f0]: 101532
>
> that default_idle would be the highest value when the CPUs showed 94-96%
> busy. Also interesting is what UNKNOWN_KERNEL is. ???
>
> The server described above has 14 internal IDE disks configured as software
> Raid 5 and connected to the network with one Syskonnect copper gigabit card.
> I used 30 100 base-T connected clients all of which performed sequential
> writes to one large 1.3TB volume on the file server. They were mounted
> NFSv2, UDP, 8K r+w size for this run. I was able to achieve only 35MB/sec of
> sustained NFS write throughput. Local disk performance (e.g. dd file) for
> sustained writes is *much* higher. I am using knfsd with the latest 2.4.18
> Neil Brown fixes from his site. Distribution is Debian 3.0 Woody Stable.
>
> Many thanks in advance for the insight,
>
> Eff Norwood



2002-10-15 16:50:30

by Eff Norwood

Subject: RE: huge number of intr/s on large nfs server

> > Also interesting is what UNKNOWN_KERNEL is. ???
>
> Do you have any binary-only modules in the running kernel ?

Not that I am aware of.

> > The server described above has 14 internal IDE disks configured
> as software
> > Raid 5 and connected to the network with one Syskonnect copper
> gigabit card.
>
> Please confirm: you have at least 7 IDE controllers.

No. I have a 3ware card that turns 8 IDE disks into 8 SCSI disks for all
intents and purposes.

> If so, how are the
> interrupts associated with them (look in /proc/interrupts)? Are there
> shared interrupts? Is the NIC sharing an interrupt line with at
> least one
> IDE controller?

No, the 3ware card has its own interrupt as do each of the gigabit
interfaces. Each card (3ware, gigabit) is also on its own bus. The
SuperMicro MB I'm using has x4 PCI-X busses and one card is in each.

> However, NFS might not be a good network test in this case, maybe
> ftp - or
> anything TCP based - would have been better.

I agree but I need to use NFS.

> > I used 30 100 base-T connected clients all of which performed sequential
> > writes to one large 1.3TB volume on the file server. They were mounted
> > NFSv2, UDP, 8K r+w size for this run.
>
> Err, you have a GE NIC in the server and FE NICs in the clients.

Correct - connected through a Foundry switch.

> Running
> NFS over UDP might not be the best solution in this case; try NFS over TCP.

I tried this and it was *much* worse. NFS over TCP seems pretty broken right
now in terms of throughput. Certainly much worse than UDP.

> > I was able to achieve only 35MB/sec of sustained NFS write throughput.
> > Local disk performance (e.g. dd file) for sustained writes is *much*
> > higher.
>
> I think that a better disk
> test would
> be to try to do all the 30 writes simultaneously on the server
> itself with
> "dd" or something similar, so that the network is not involved.

This might get us better numbers, but I'm looking to fix the performance
over the network - not on local disk. Yes, my test was to have 30 individual
clients dd 10MB files over NFS to the server.

Thanks,

Eff





2002-10-15 17:03:12

by Bogdan Costescu

Subject: RE: huge number of intr/s on large nfs server

On Tue, 15 Oct 2002, Eff Norwood wrote:

> No, the 3ware card has its own interrupt as do each of the gigabit
> interfaces. Each card (3ware, gigabit) is also on its own bus. The
> SuperMicro MB I'm using has x4 PCI-X busses and one card is in each.

OK, the setup is completely different from the one I imagined.

> I tried this and it was *much* worse. NFS over TCP seems pretty broken right
> now in terms of throughput. Certainly much worse than UDP.

Well, somebody else from this list could help you find out why... just
make sure that you use the latest patches that were announced here.
However, you should make sure first that the network is in good condition.
I've already posted several messages on this topic, so please search some
archives of this list.

> This might get us better numbers, but I'm looking to fix the performance
> over the network - not on local disk. Yes, my test was to have 30 individual
> clients dd 10MB files over NFS to the server.

I'm sorry, but I think we don't understand each other here. You said
that you tested one dd on the server (so direct access to disk) versus 30
dd's on clients (so over the network). What I meant was to test 30 dd's
on the server (so again direct access to disk) - this is an intermediate
situation between the 2 that you mentioned and should show how much you
can get from your disks and how much the network is involved.
If this scattered write shows something close to 35 MB/s, then it's the
disks that limit the writes from the clients, and even with a dedicated
network card for each client you won't get better write performance over
NFS. However, if the scattered write shows something significantly better
than 35 MB/s, then either the network or the congestion mechanism of NFS
is not working right (or they both work right and this is the
maximum you can get from your setup!).

--
Bogdan Costescu

IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
E-mail: [email protected]




2002-10-15 21:34:22

by Andrew Theurer

Subject: Re: huge number of intr/s on large nfs server

On Monday 14 October 2002 3:21 pm, Eff Norwood wrote:
> Hi All,
>
> I have a 2.4.18 kernel running on a dual 2.4Ghz Xeon platform using
> software RAID 5 via IBM's EVMS and EXT3. The system is being used as an NFS
> server and although local disk performance is excellent, NFS performance
> (over UDP and TCP, vers 2 and 3 with multiple different client mount block
> sizes) is poor to bad. Looking at mpstat while the system is under load
> shows the %system to be quite high (94-96%) but most interestingly shows
> the number of intr/s (context switches) to be 17-18K plus!

Are you sure you want to use raid5? I thought there was a lot of overhead vs
mirroring. Look for evms_raid5d or something similar in top when you do a dd
test. Do you have enough disks to spare to do mirroring?

What mode did you use for ext3? Try data=writeback.
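That is just an ext3 mount option, e.g. (device and mount point below are
placeholders):

  mount -t ext3 -o data=writeback /dev/md0 /bigvol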

> Since I was not sure what was causing all of these context switches, I
> installed SGI kernprof and ran it during a 15 minute run. I used this
> command to start kernprof: 'kernprof -r -d time -f 1000 -t pc -b -c all'
> and this one to stop it: 'kernprof -e -i | sort -nr +2 | less >
> big_csswitch.txt'
>
> The output of this collection is located here (18Kb):
>
> http://www.effrem.com/linux/kernel/dev/big_csswitch.txt
>
> Most interesting to me is why in the top three results:
>
> default_idle [c010542c]: 861190
> _text_lock_inode [c015d031]: 141795
> UNKNOWN_KERNEL [c01227f0]: 101532
>
> that default_idle would be the highest value when the CPUs showed 94-96%
> busy. Also interesting is what UNKNOWN_KERNEL is. ???

This can be any module, binary only or not, for kernprof. It is most likely
your module for the GigE card. Can you check, and if possible, build it into
the kernel? Actually, can you build everything in the kernel, so kernprof
may see all functions?

Also, make sure you reset the kernel profile after starting the test, and stop
the profile before the test ends. Otherwise you are getting idle time
before/after the test.
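With the commands from your original mail, that ordering would look roughly
like:

  # start the clients writing first, then start/reset the profile:
  kernprof -r -d time -f 1000 -t pc -b -c all
  # ... let the run proceed ...
  # dump the profile before the clients finish:
  kernprof -e -i | sort -nr +2 > big_csswitch.txt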

> The server described above has 14 internal IDE disks configured as software
> Raid 5 and connected to the network with one Syskonnect copper gigabit
> card. I used 30 100 base-T connected clients all of which performed
> sequential writes to one large 1.3TB volume on the file server. They were
> mounted NFSv2, UDP, 8K r+w size for this run. I was able to achieve only
> 35MB/sec of sustained NFS write throughput. Local disk performance (e.g. dd
> file) for sustained writes is *much* higher. I am using knfsd with the
> latest 2.4.18 Neil Brown fixes from his site. Distribution is Debian 3.0
> Woody Stable.

Sorry but I forgot, maybe someone can tell me, are the nfs writes async or
not?
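For reference, that behaviour is set per export in /etc/exports; a minimal
sketch, with a placeholder path and client range:

  /bigvol  192.168.0.0/255.255.255.0(rw,async)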

There is a udp scaling problem, but it does not sound like you have hit that
(yet) if you are 95% CPU.

Are you running hyperthreading? This is a little off topic, but when the
first problem is fixed, you might want to try hyperthreading. I saw a 25%
improvement in netbench (samba) on a 2 way P4 Xeon.

Also, your local dd test simulated the clients, right? 30 dd's? What
throughput did you get?

-Andrew



2002-10-16 02:41:31

by Eff Norwood

Subject: RE: huge number of intr/s on large nfs server

Hi Roger,

> When you did the local dd you did make sure to break the cache?

You bet. I created 10GB files so as to be sure to break the 2GB cache. Still
*much* faster local than NFS no matter what proto/size/options.

Eff Norwood





2002-10-16 20:06:21

by Eff Norwood

Subject: RE: huge number of intr/s on large nfs server

> Are you sure you want to use raid5?

Yes. I don't have enough disks to do a mirror/stripe combo and still get the
space I want.

> What mode did you use for ext3? try data=writeback

For an NFS server, I think that's a bad idea, since writeback can allow old
data to show up in files after a crash. I used ordered.

> This can be any module, binary only or not, for kernprof. It is
> most likely
> your module for the GigE card. Can you check, and if possible,
> build it into
> the kernel?

I can't build the gigabit into the kernel because I need to pass the module
parameters. The rest I can and will build into the kernel.

> Also, make sure you reset the kernel profile after starting the
> test, and stop
> the profile before the test ends. Otherwise you are getting idle time
> before/after the test.

Got that ok.

> Are you running hyperthreading? This is a little off topic, but when the
> first problem is fixed, you might want to try hyperthreading. I
> saw a 25%
> improvement in netbench (samba) on a 2 way P4 Xeon.

No, I'm not. I thought I read on LKML that hyperthreading did nothing except
destabilize the system.

> Also, your local dd test simulated the clients, right? 30 dd's?
Right

> What
> throughput did you get?

About 105-110MB/sec total.

Thanks,

Eff





2002-10-16 22:51:02

by Donavan Pantke

[permalink] [raw]
Subject: Re: huge number of intr/s on large nfs server

On Wednesday 16 October 2002 16:06, Eff Norwood wrote:

> > This can be any module, binary only or not, for kernprof. It is
> > most likely
> > your module for the GigE card. Can you check, and if possible,
> > build it into
> > the kernel?
>
> I can't build the gigabit into the kernel because I need to pass the module
> parameters. The rest I can and will build into the kernel.
>


Umm... just a small interjection, but can't you just pass the parameters
through LILO or GRUB? I remember doing that for several Dual-NIC firewalls.

Donavan Pantke



2002-10-16 23:18:33

by Eff Norwood

Subject: RE: huge number of intr/s on large nfs server

> Umm... just a small interjection, but can't you just pass
> the parameters
> through LILO or GRUB?

That's an *excellent* suggestion. I had forgotten you could do that! So yes,
everything can be built into the kernel. Anyone know of a good howto on this
subject?

Thanks!

Eff





2002-10-16 23:28:18

by Donavan Pantke

Subject: Re: huge number of intr/s on large nfs server

On Wednesday 16 October 2002 19:18, Eff Norwood wrote:
> > Umm... just a small interjection, but can't you just pass
> > the parameters
> > through LILO or GRUB?
>
> That's an *excellent* suggestion. I had forgotten you could do that! So
> yes, everything can be built into the kernel. Anyone know of a good howto
> on this subject?

Umm... standard linuxdoc.org should have one, but real quick, in LILO it
involves this:

append="eth0=<module parameters> eth1=<module parameters>"
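For context, that line goes inside the image stanza of /etc/lilo.conf,
roughly like this (kernel image and root device are placeholders), and you
then re-run lilo:

  image=/boot/vmlinuz-2.4.19
      label=linux
      read-only
      root=/dev/sda1
      append="eth0=<module parameters> eth1=<module parameters>"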


Donavan



2002-10-17 02:29:01

by Benjamin LaHaise

Subject: Re: huge number of intr/s on large nfs server

What NIC are you using? High numbers of interrupts per second are not
necessarily the problem, but reducing them may be possible. Diagnosing
these kinds of performance issues is near impossible without a complete
description of the system, and the identity of the NIC/driver is
essential.

-ben
--
"Do you seek knowledge in time travel?"



2002-10-17 02:49:50

by Eff Norwood

Subject: RE: huge number of intr/s on large nfs server

> What NIC are you using? High numbers of interrupts per second are not
> necessarily the problem, but reducing it may be possible. Diagnosing
> these kinds of performance issues is near impossible without a complete
> description of the system, and the identity of the NIC/driver is
> essential.

Hi Ben,

This system in question is a SuperMicro P4DL6 MB with dual Xeon 2.4's and
2.0GB of memory. One of the PCI-X slots/busses has a 3ware 7850 IDE RAID
card that is *not* using its on card raid. Two more of the PCI-X
slots/busses have Syskonnect SK-9821 copper gigabit cards. I'm using eight IBM
Deskstar 120GB drives configured via IBM's EVMS 1.2.0 as software RAID5.
Distro is Debian 3.0 Woody stable. Kernel is 2.4.19 with all of Neil Brown's
latest KNFSD patches from his web site. Syskonnect and 3ware drivers are
those found in the 2.4.19 kernel.

Switch is a Foundry BigIron and clients are 30 x SuperMicro SuperServer
6010H with dual PIII 1.0Ghz, 1024MB of RAM, and Syskonnect SK-9821 copper
gigabit cards also running Debian 3.0 Stable with kernel 2.4.19 and all of
Trond's latest NFS client patches.

Thanks!

Eff





2002-10-17 11:10:42

by Alex Thiel

Subject: Re: huge number of intr/s on large nfs server

On Thursday 17 October 2002 04:49, Eff Norwood wrote:
>
> Switch is a Foundry BigIron and clients are 30 x SuperMicro SuperServer
> 6010H with dual PIII 1.0Ghz, 1024MB of RAM, and Syskonnect SK-9821 copper
> gigabit cards also running Debian 3.0 Stable with kernel 2.4.19 and all of
> Trond's latest NFS client patches.
>

Just to be sure: have you checked that the clients are actually only sending
at 100Mbit/s?

Alex




2002-10-17 16:44:36

by Eff Norwood

Subject: RE: huge number of intr/s on large nfs server

Hi Alex,

> Just to be sure: have you checked that the clients are actually
> only sending
> at 100Mbit/s?

Good point. Yes, I'm sure, as it's hard-set on the client and measured as
such at the switch.

Thanks,

Eff





2002-10-18 02:05:45

by Benjamin LaHaise

Subject: Re: huge number of intr/s on large nfs server

On Wed, Oct 16, 2002 at 07:49:38PM -0700, Eff Norwood wrote:
> This system in question is a SuperMicro P4DL6 MB with dual Xeon 2.4's and
> 2.0GB of memory. One of the PCI-X slots/busses has a 3ware 7850 IDE RAID

Can you try running a non-SMP kernel? At least one nfs server I'm running
does pretty well with a UP kernel, even though it is getting in the
neighbourhood of 10-20k interrupts per second. Note that it is using
async writes, and gets about 40MB/s with ext3 and is 100% busy when being
written to, with the bulk of the time spent computing raid checksum blocks,
copying data around, and in prune_icache.

With an SMP system, there is likely to be contention introduced by
interrupts floating back and forth between CPUs, while the single-client
writing case is really not helped by SMP in the current knfsd. Binding
various interrupts to specific CPUs may help things here, but part of the
problem lies with knfsd allowing multiple processes to back up in
generic_file_write, which serializes on the inode semaphore. This is pure
overhead that a UP system won't suffer from quite as dramatically. Also,
make sure your journal is on a separate disk.
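A rough sketch of the binding via /proc (the IRQ number is only an example;
check /proc/interrupts for the real one):

  # pin IRQ 24 (hypothetically the Syskonnect NIC) to CPU0
  echo 1 > /proc/irq/24/smp_affinity
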
Cheers,

-ben



2002-10-18 02:19:38

by Eff Norwood

[permalink] [raw]
Subject: RE: huge number of intr/s on large nfs server

Ben,

These are *excellent* suggestions! The idea of running UP had never crossed my
mind, but it all makes sense now that you suggest it. I'll try it over the
weekend when I have clients available and will publish the results in
detail. Is anyone out there working on a better NFS server for Linux, since
NFSv4 is soon to be out and everyone always seems to be ripping on Linux as
an NFS server? I'd love to know about any plans. One thought that crossed my
mind was to rip the NFS server out of *BSD and implement it in Linux. I have
no idea whether that actually makes sense, but I understand that the BSD NFS
server is pretty good stuff.

Thanks!

Eff



