2002-03-30 18:48:25

by Randy Hron

Subject: Linux 2.4.19-pre5

> This release has -aa writeout scheduling changes, which should improve IO
> performance (and interactivity under heavy write loads).

> _Please_ test that extensively looking for any kind of problems
> (performance, interactivity, etc).

2.4.19-pre5 shows a lot of improvement in the tests
I run. dbench 128 throughput up over 50%

dbench 128 processes
2.4.19-pre4 8.4 ****************
2.4.19-pre5 13.2 **************************

Tiobench sequential writes:
10-20% more throughput and latency is lower.

Tiobench Sequential reads
Down 7-8%.

Andrew Morton's read_latency2 patch improves tiobench
sequential reads and writes by 10-35% in the tests I've
run. More importantly, read_latency2 drops max latency
with 32-128 tiobench threads from 300-600+ seconds
down to 2-8 seconds. (2.4.19-pre5 is still unfair
to some read requests when threads >= 32)

I'm happy with pre5 and hope more chunks of -aa show
up in pre6. Maybe Andrew will update read_latency2 for
inclusion in pre6. :) It helps tiobench seq writes too.
dbench goes down a little though.

Max latency is the metric that stands out as "needs
improvement" and "fix exists".

tiobench seq reads 128 threads
                    MB/s  max latency
2.4.19-pre1aa1      6.98  661.3 seconds
2.4.19-pre1aa1rl    9.55    7.8 seconds

tiobench seq writes 32 threads
                    MB/s  max latency
2.4.19-pre1aa1     15.46   26.1 seconds
2.4.19-pre1aa1rl   17.31   18.0 seconds

The read latency issue exists on a 4-way Xeon
with 4GB RAM too. Max latency jumps to 270 seconds
with 32 tiobench threads, and is over 500 seconds when
threads >= 128. (latency in milliseconds below)

Sequential Reads
                Num                  Avg   Maximum    Lat%    Lat%
Kernel          Thr  Rate (CPU%) Latency   Latency     >2s    >10s
--------------- --- ----- ------ ------- --------- ------- -------
2.4.19-pre5       1 38.46 23.94%   0.302    111.14 0.00000 0.00000
2.4.19-pre5      32 30.24 21.69%   9.883 270391.48 0.01106 0.00915
2.4.19-pre5      64 30.08 21.67%  17.868 357219.21 0.01965 0.01807
2.4.19-pre5     128 30.40 22.77%  30.460 520607.27 0.02714 0.02569
2.4.19-pre5     256 29.07 21.96%  56.444 539381.86 0.05378 0.05197


The behemoth benchmark page:
http://home.earthlink.net/~rwhron/kernel/k6-2-475.html

--
Randy Hron


2002-03-30 19:51:06

by Andrew Morton

Subject: Re: Linux 2.4.19-pre5

[email protected] wrote:
>
> > This release has -aa writeout scheduling changes, which should improve IO
> > performance (and interactivity under heavy write loads).
>
> > _Please_ test that extensively looking for any kind of problems
> > (performance, interactivity, etc).
>
> 2.4.19-pre5 shows a lot of improvement in the tests
> I run. dbench 128 throughput up over 50%
>
> dbench 128 processes
> 2.4.19-pre4 8.4 ****************
> 2.4.19-pre5 13.2 **************************

dbench throughput is highly dependent upon the amount of memory
which you allow it to use. -pre5 is throttling writers based
on the amount of dirty buffers, not the amount of dirty+locked
buffers. Hence this change.

It's worth noting that balance_dirty() basically does this:

        if (dirty_memory > size-of-ZONE_NORMAL * ratio)
                write_stuff();

That's rather irrational, because most of the dirty buffers
will be in ZONE_HIGHMEM. So hmmmm. Probably we should go
across all zones and start writeout if any of them is getting
full of dirty data. Which may not make any difference....
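
As a purely illustrative sketch of that idea (hypothetical code, not a
real patch; zone_dirty_pages() is an assumed helper, since 2.4 keeps no
per-zone dirty count):

/*
 * Walk every zone and start writeout as soon as any single zone is
 * getting full of dirty data, instead of comparing global dirty
 * memory against the size of ZONE_NORMAL alone.
 */
#include <linux/mm.h>
#include <linux/mmzone.h>

static int any_zone_needs_writeout(unsigned int ratio)
{
        pg_data_t *pgdat;
        int i;

        for (pgdat = pgdat_list; pgdat; pgdat = pgdat->node_next) {
                for (i = 0; i < MAX_NR_ZONES; i++) {
                        zone_t *zone = pgdat->node_zones + i;

                        if (!zone->size)
                                continue;
                        /* this zone is mostly dirty -> start writeout */
                        if (zone_dirty_pages(zone) > zone->size * ratio / 100)
                                return 1;
                }
        }
        return 0;
}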

> Tiobench sequential writes:
> 10-20% more throughput and latency is lower.

The bdflush changes mean that we're doing more write-behind.
So possibly write throughput only *seems* to be better,
because more of it is happening after the measurement period
has ended. It depends whether tiobench is performing an
fsync, and is including that fsync time in its reporting.
It should be.
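
For what it's worth, the difference is easy to see with a trivial
userspace sketch that puts the fsync() inside the timed region
(standalone illustration, not tiobench itself; the file name and sizes
are made up):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

int main(void)
{
        char buf[4096];
        struct timeval t0, t1;
        double secs;
        long i, mb = 256;               /* total data written, in MB */
        int fd = open("writetest", O_WRONLY | O_CREAT | O_TRUNC, 0644);

        if (fd < 0)
                return 1;
        memset(buf, 0, sizeof(buf));
        gettimeofday(&t0, NULL);
        for (i = 0; i < mb * 256; i++)  /* 256 x 4k writes per MB */
                if (write(fd, buf, sizeof(buf)) != sizeof(buf))
                        return 1;
        fsync(fd);                      /* charge the flush to the run */
        gettimeofday(&t1, NULL);
        secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
        printf("%.2f MB/s including fsync\n", mb / secs);
        close(fd);
        return 0;
}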

> Tiobench Sequential reads
> Down 7-8%.

Dunno. I can't immediately think of anything in pre5
which would cause this.

> Andrew Morton's read_latency2 patch improves tiobench
> sequential reads and writes by 10-35% in the tests I've
> run. More importantly, read_latency2 drops max latency
> with 32-128 tiobench threads from 300-600+ seconds
> down to 2-8 seconds. (2.4.19-pre5 is still unfair
> to some read requests when threads >= 32)

These numbers are surprising. The get_request starvation
change should have smoothed things out. Perhaps there's
something else going on, or it's not working right. If
you could please send me all the details to reproduce this
I'll take a look. Thanks.

> I'm happy with pre5 and hope more chunks of -aa show
> up in pre6. Maybe Andrew will update read_latency2 for
> inclusion in pre6. :) It helps tiobench seq writes too.
> dbench goes down a little though.

http://www.zip.com.au/~akpm/linux/patches/2.4/2.4.19-pre5/


Nice testing report, BTW. As we discussed off-list, your
opinions, observations and summary are even more valuable than
columns of numbers :)

Have fun with that quad, but don't break it.

I'll get the rest of the -aa VM patches up at the above URL
soonish. I seem to have found a nutty workload which is returning
extremely occasional allocation failures for GFP_HIGHUSER
requests, which will deliver fatal SIGBUS at pagefault time.
There's plenty of swap available, so this is a snag.

-

2002-03-30 21:27:45

by Randy Hron

Subject: Re: Linux 2.4.19-pre5

> > run. More importantly, read_latency2 drops max latency
> > with 32-128 tiobench threads from 300-600+ seconds
> > down to 2-8 seconds. (2.4.19-pre5 is still unfair
> > to some read requests when threads >= 32)
>
> These numbers are surprising. The get_request starvation
> change should have smoothed things out. Perhaps there's
> something else going on, or it's not working right. If
> you could please send me all the details to reproduce this
> I'll take a look. Thanks.

There was an improvement (reduction) in max latency
during sequential _writes after get_request starvation
went in. Tiobench didn't show an improvement for seq _read
max latency though. read_latency2 makes the huge difference.

The sequential read max latency walls for the various trees look like:
tree                   # of threads
rmap                   128
ac                     128
marcelo                32
linus                  64
2.5-akpm-everything    >128
2.4 read_latency2      >128

I.E. tiobench with threads > the numbers above would probably
give the impression the machine was locked up or frozen if your
read request was the unlucky max. The average latencies are
generally reasonable. It's the max, and % of high latency
requests that vary most between the trees.

Using the updated tiobench.pl in
http://prdownloads.sourceforge.net/tiobench/tiobench-0.3.3.tar.gz
(actually - http://home.earthlink.net/~rwhron/kernel/tiobench2.pl
which is very similar to the one in tiobench-0.3.3)

On the 4 way 4GB box:
./tiobench.pl --size 8192 --numruns 3 --block 16384 --threads 1 \
--threads 32 --threads 64 --threads 128 --threads 256

On the k6-2 with 384M ram:
./tiobench.pl --size 2048 --numruns 3 --threads 8 --threads 16 \
--threads 32 --threads 64 --threads 128

> http://www.zip.com.au/~akpm/linux/patches/2.4/2.4.19-pre5/

Thanks for updating read_latency2! I'm trying it and your other
patches on 2.4.19-pre5. :)

2002-03-30 21:43:30

by Ed Sweetman

Subject: Re: Linux 2.4.19-pre5

On Sat, 2002-03-30 at 16:33, Randy Hron wrote:
> > > run. More importantly, read_latency2 drops max latency
> > > with 32-128 tiobench threads from 300-600+ seconds
> > > down to 2-8 seconds. (2.4.19-pre5 is still unfair
> > > to some read requests when threads >= 32)
> >
> > These numbers are surprising. The get_request starvation
> > change should have smoothed things out. Perhaps there's
> > something else going on, or it's not working right. If
> > you could please send me all the details to reproduce this
> > I'll take a look. Thanks.
>
> There was an improvement (reduction) in max latency
> during sequential _writes after get_request starvation
> went in. Tiobench didn't show an improvement for seq _read
> max latency though. read_latency2 makes the huge difference.
>
> The sequential read max latency walls for the various trees look like:
> tree                   # of threads
> rmap                   128
> ac                     128
> marcelo                32
> linus                  64
> 2.5-akpm-everything    >128
> 2.4 read_latency2      >128
>
> I.E. tiobench with threads > the numbers above would probably
> give the impression the machine was locked up or frozen if your
> read request was the unlucky max. The average latencies are
> generally reasonable. It's the max, and % of high latency

Is that to say an ac branch (which uses rmap) can do the 128 but is
non-responsive? I sent a couple of mails of my own preliminary runs, and
the feel I got when running the test was absolutely no effect on
responsiveness even as the load hit 110. Of course, this is with Riel's
preempt patch for 2.4.19-pre4-ac3. I guess I'll try with threads = 256
just to see if this frozen feeling occurs in preempt kernels as well.
You don't seem to test them anywhere on your own site.

2002-03-30 22:20:18

by Randy Hron

Subject: Re: Linux 2.4.19-pre5

> > The sequential read max latency walls for the various trees look like:
> > tree                   # of threads
> > rmap                   128
> > ac                     128
> > marcelo                32
> > linus                  64
> > 2.5-akpm-everything    >128
> > 2.4 read_latency2      >128
> >
> > read request was the unlucky max.

> Is that to say an ac branch (which uses rmap) can do the 128 but is
> non-responsive?

Thanks for testing! The more the merrier. :)

"Unlucky max" is the key phrase above. The average latency is okay
in general. It's the requests that are left waiting to get serviced
that may give the impression "it's locked up".

> the feel I got when running the test was absolutely no effect on
> responsiveness even as the load hit 110.

I see you did several runs, but the mail wrapped and it's hard to
read your results. With 644 MB RAM, I wouldn't expect you to
see the "big latency" phenomenon with these scenarios:

128 MB datafiles 128 threads
384 MB datafiles 1 thread
2048 MB datafiles 8 threads

> preempt patch for 2.4.19-pre4-ac3. I guess I'll try with threads = 256

That's a good number, and maybe use a 4096MB datafile based on your RAM.
If you are lucky, only the test's I/O's will win the "unlucky max".
If you're not, just be patient; the machine will survive.

The big latency wall moved from 32 to 128 tiobench threads in
2.4.19-pre9-ac3 after Alan put in rmap12e.

> just to see if this frozen feeling occurs in preempt kernels as well.

Yeah, that will be interesting.

> You don't seem to test them anywhere on your own site.

Most of the tests I run measure throughput. Tiobench has the nice
latency metric. When I tested preempt, it generally had lower
throughput. Low latency is important and I'm glad Robert,
Andrew, Ingo and all the others are improving the kernel in
that respect.

If you're curious, there is a 2.4.18-pre3 preempt/lockbreak
and low-latency page at
http://home.earthlink.net/~rwhron/kernel/pe.html

2002-03-30 23:49:27

by Ed Sweetman

Subject: Re: Linux 2.4.19-pre5

Due to Evolution's really cool way of wrapping my emails... I'll attach
the results.

In this test I wanted to see this lag. So I switched between virtual
desktops (I have 5) and used IRC (Eterm + epic). What I saw was that
lower-priority processes (meaning processes with bigger nice values)
would stop responding at one specific spot in the test, but all
processes at the same priority level would continue merrily. I'd say
about 5/6 of the way through the test is where the lower-priority
processes would stop responding for a couple of seconds. But they
revived pretty quickly; it only paused my typing for a couple of
moments. I failed to see any lag at all during the entire test on
likewise-prioritized processes. I wouldn't have even known I was
running the test if my CPU temp wasn't climbing so high and the load
wasn't 256 on procmeter3.


As for the throughput debate that always follows the preempt kernel: I
think it really depends on the kind of I/O you're doing, and on what
the program is trying to do with its I/O. Programs that want to
throttle and like to do that sequentially probably won't appreciate you
stopping them to read some other part of the disk for a bit, making
them go back to where they stopped. Almost no userland non-monolithic
database apps actually do something like that; none that I've come
into contact with. But you choose what works for your workload. If I'm
running an app that wants control over its own I/O, I run it with a
higher priority than other processes; that basically takes away the
"preemptiveness" factor and you get normal kernel performance
(theoretically). Seems logical.

Anyway, I digress; I meant to get into the latency aspect. Sequential
write latency is scary, but that's to be expected on ext3. Random read
latency is a concern though; it shouldn't be that high, although it was
high in all the tests I ran compared to the other three.


Attachments:
ed_tiotest.text (1.98 kB)

2002-03-31 06:53:49

by Andrew Morton

Subject: Re: Linux 2.4.19-pre5

Andrew Morton wrote:
>
> ...
> I'll get the rest of the -aa VM patches up at the above URL
> soonish.

http://www.zip.com.au/~akpm/linux/patches/2.4/2.4.19-pre5/aa1/

Rediffed, retested.

> I seem to have found a nutty workload which is returning
> extremely occasional allocation failures for GFP_HIGHUSER
> requests, which will deliver fatal SIGBUS at pagefault time.
> There's plenty of swap available, so this is a snag.

False alarm. My test app was not handling SIGBUS inside its SIGBUS
handler.

-

2002-03-31 12:36:52

by Randy Hron

Subject: Re: Linux 2.4.19-pre5

> In this test I wanted to see this lag.

Just to be clear on what the "max latency" number is, it's the I/O
request within tiobench that waited the longest. I.E. The process notes
the time before it's ready to make the request, then notes the time
after the request is fulfilled. With a 2048MB file and a 4096 byte
block, there may be over 500,000 requests.
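
A minimal sketch of that per-request timing, assuming it mirrors what
tiotest does (the real source is in the tiobench tarball; the names
here are made up):

#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

static double max_latency;      /* the "max latency" figure, in seconds */

/* Note the time just before issuing one read() request and just after
 * it is fulfilled; remember the worst case seen. */
ssize_t timed_read(int fd, void *buf, size_t len)
{
        struct timeval before, after;
        double latency;
        ssize_t ret;

        gettimeofday(&before, NULL);
        ret = read(fd, buf, len);
        gettimeofday(&after, NULL);

        latency = (after.tv_sec - before.tv_sec)
                  + (after.tv_usec - before.tv_usec) / 1e6;
        if (latency > max_latency)
                max_latency = latency;
        return ret;
}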

It's a relatively small number of requests that have the big latency
wait, so depending on the I/O requests your other applications make
during the test, a long wait may not be obvious, unless one of your
I/O's gets left at the end of the queue for a long time.

This is sometimes referred to as a "corner case".

The "# of threads" where the "big latency wall" manifests is
found by noting a dramatic change in the longest I/O latency.
This point varies between the kernel trees.

The "big latency phenomenon" has been in the 2.4 tree at least
since 2.4.17 which is the first kernel I have this measurement
for. It probably goes back much further.

read_latency2
-------------
I tested read_latency2 with 2.4.19-pre5. pre5 vanilla hits
a wall at 32 tiobench threads for sequential reads. With
read_latency2, the wall is around 128.

For random reads, pre5 hits a wall at 64 threads. With
read_latency2, the wall is not apparent even with 128 threads.

read_latency2 appears to reduce sequential write latency
too, but not as dramatically as in the read tests.

2002-03-31 20:06:16

by Ed Sweetman

Subject: Re: Linux 2.4.19-pre5

On Sun, 2002-03-31 at 07:42, Randy Hron wrote:
> > In this test I wanted to see this lag.
>
> Just to be clear on what the "max latency" number is, it's the I/O
> request within tiobench that waited the longest. I.E. The process notes
> the time before it's ready to make the request, then notes the time
> after the request is fulfilled. With a 2048MB file and a 4096 byte
> block, there may be over 500,000 requests.
>
> It's a relatively small number of requests that have the big latency
> wait, so depending on the I/O requests your other applications make
> during the test, a long wait may not be obvious, unless one of your
> I/O's gets left at the end of the queue for a long time.
>
> This is sometimes referred to as a "corner case".
>
> The "# of threads" where the "big latency wall" manifests is
> found by noting a dramatic change in the longest I/O latency.
> This point varies between the kernel trees.
>
> The "big latency phenomenon" has been in the 2.4 tree at least
> since 2.4.17 which is the first kernel I have this measurement
> for. It probably goes back much further.
>
> read_latency2
> -------------
> I tested read_latency2 with 2.4.19-pre5. pre5 vanilla hits
> a wall at 32 tiobench threads for sequential reads. With
> read_latency2, the wall is around 128.
>
> For random reads, pre5 hits a wall at 64 threads. With
> read_latency2, the wall is not apparent even with 128 threads.
>
> read_latency2 appears to reduce sequential write latency
> too, but not as dramatically as in the read tests.
>


I think what the preempt kernel shows is that the big latency phenomenon
does not manifest. Rather, if there is a latency spike, its effect does
not hurt the preempt kernel, because that latency is contained within
that process only; it will not affect other processes. At least that's
what I observed with tiobench. Do you have any tests specifically that
you'd like me to run to somehow show this wall? I understand that
there is a latency problem in the kernels ... but the factor that makes
the big difference is whether it is detrimental to things that are not
creating the problem. You said it makes the box look like it's halted
in your tests; I saw no such thing.

Heh, maybe I'm confused about your definition of the wall. I understand
the latency wall as the point where latency makes the computer
unresponsive to the user. If this is true, then the preempt kernel has
no wall as far as I can see; at normal priorities it is always
responsive to the user with tiobench.

If it's just the point where the fraction of requests with latency >2s
rises above some arbitrary number, then it really depends on the media
you're running the I/O test on.

Also, how is it possible to get >100% CPU efficiency, as shown in some
of my results for tiobench?

Perhaps you see this wall because of how the kernel decides what to swap
in and out of its swap space. If you keep your swapfile on the drive
you're testing on, you would create periods in which nothing responds
while the system tries swapping things back in, as well as slowing down
your test. I keep swap on a separate drive on a different controller.
Although, I get little swap activity; just a couple of KB during the
test after a few runs. OK, I'm out of ideas.

2002-03-31 23:11:02

by Randy Hron

Subject: Re: Linux 2.4.19-pre5

> the problem. You said it makes the box look like it's halted in your
> tests, I saw no such thing.

I haven't directly observed any box tightening up for
more than a few seconds. There have been a few reports
on lkml of things like that happening. Based on tiotest
results, I can see that if the I/O request you are waiting
for is one of those few that isn't serviced for dozens or
hundreds of seconds, you'll be annoyed.

The number of requests that take over 10 seconds is
often just 3 in 10,000. There may be only 1 request in
500,000 that takes 500 seconds to service. The chance
of your interactive I/O being the "longest" is small, unless
your interactive work is producing enough I/O to compete
with tiotest.

What I like about read_latency2 is that most latencies
are lower, and the highest latency is much lower.

> Heh, maybe I'm confused about your definition of the wall.

Sorry if that wasn't clear. "The wall" is the point where
the highest latency in the test skyrockets.

For instance, in recent ac kernels, the _highest_ latency for
sequential reads is less than 5 seconds at 64 threads in all
the ac's I've tested. At 128 threads, the _lowest_ max
latency figure is 200 seconds. So, I used the "wall" term
for 128 in the ac series.

Similarly, in the Marcelo tree, 1.7 seconds is the _highest_
latency with 16 threads in all the mainline kernels I've tested.
At 32 threads, 137 seconds is the _lowest_ maximum latency.
So I used the idea "wall = 32 threads" for 2.4 mainline.

The actual number of seconds will vary depending on the
hardware. I've observed the "skyrocketing" max latency
or "wall" phenomenon on several boxes though.

The max latency growth before and after "the wall" is similar
as threads increase. That is, max latency grows slowly, then
jumps enormously, then grows gradually again.

A little picture, not to scale:
m                                      x
a                              x
x                      x              *

l
a
t
e
n
c
y

s
e
q
                                      #
r
e ........ the wall ......................
a                             #
d                      #      *
s               x      *
         x*#    *#
threads   8     16     32     64     128

* = ac
x = mainline
# = read_latency2

The "not to scale" means the distance between points
above and below "the wall" is actually much greater.

If you feel like it, perhaps you could test ac with preempt
and 8, 16, 32, 64, 128, and 256 threads and see if it exhibits
a wall pattern or not.

Thanks for your interest.

2002-04-01 00:37:26

by Andrea Arcangeli

Subject: Re: Linux 2.4.19-pre5

On Sat, Mar 30, 2002 at 10:52:00PM -0800, Andrew Morton wrote:
> False alarm. My test app was not handling SIGBUS inside its SIGBUS
> handler.

Good :). BTW, SIGBUS should never indicate an OOM failure; SIGKILL is
always sent in such a case. If it had come out of a pagefault, it would
mean it was a MAP_SHARED access beyond the end of the file.
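
As a tiny standalone illustration of that last case (illustrative test,
not from the thread): touching a MAP_SHARED mapping on a page that lies
entirely beyond the end of the file gets a fatal SIGBUS at fault time.

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
        long pagesize = sysconf(_SC_PAGESIZE);
        int fd = open("tiny", O_RDWR | O_CREAT | O_TRUNC, 0644);
        char *p;

        if (fd < 0 || write(fd, "x", 1) != 1)   /* file is one byte long */
                return 1;
        p = mmap(NULL, 2 * pagesize, PROT_READ | PROT_WRITE,
                 MAP_SHARED, fd, 0);
        if (p == MAP_FAILED)
                return 1;
        p[0] = 'y';             /* fine: within the file's first page */
        p[pagesize] = 'z';      /* page wholly past EOF: SIGBUS here */
        printf("not reached\n");
        return 0;
}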

thanks,

Andrea

2002-04-01 01:24:29

by Andrea Arcangeli

Subject: Re: Linux 2.4.19-pre5

On Sat, Mar 30, 2002 at 11:49:06AM -0800, Andrew Morton wrote:
> That's rather irrational, because most of the dirty buffers
> will be in ZONE_HIGHMEM. So hmmmm. Probably we should go

You're right that it's not the best, but it's intentional and correct.
We have just one single balance_dirty(void) and it has to balance both
metadata and data. Data will be in highmem too, but metadata will be in
the normal zone for most filesystems (even ext[23], modulo direntries
for ext2), and that's why we have to consider only the normal zone as a
certain target of the allocation, to be sure not to overestimate for
metadata. OTOH, in particular to take full advantage of the
point-of-view watermarks, it would be really nicer to be able to say
whether we have to balance_dirty against the normal zone or against the
highmem zone. Currently we could overestimate a bit the amount of RAM
we can take from the normal zone with a highmem allocation (we look at
the high watermark from the "normal" point of view), but OTOH we always
underestimate the amount of potential highmem free, so the current way
is mostly OK for now (not a showstopper). To improve that bit we simply
need a kind of "zone" argument to the balance_dirty() API; once that is
done, the other changes should be a formality.
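
As a rough sketch of what such a zone argument might look like
(hypothetical code, not the actual implementation; the helpers marked
below are assumed):

/*
 * balance_dirty() taking the classzone being allocated from, so the
 * dirty threshold is computed against the right zone rather than
 * always against ZONE_NORMAL.
 */
#include <linux/mm.h>
#include <linux/mmzone.h>

void balance_dirty_zone(zone_t *classzone)
{
        unsigned long dirty = nr_dirty_buffer_pages();          /* assumed */
        unsigned long limit = zone_dirty_threshold(classzone);  /* assumed */

        if (dirty > limit)
                start_writeout();                               /* assumed */
}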

Andrea

2002-04-04 09:08:42

by Tom Holroyd

Subject: Re: Linux 2.4.19-pre5

AlphaPC 264DP 666 MHz (Tsunami, UP)
1GB RAM
gcc version 3.0.3

Running stuff as usual, reading large files, I can often get
very long mouse freezes when redrawing a certain window in X after
leaving it for a while. I never saw this behavior in 2.4.18-rc1,
which I ran for over 1 month doing the same stuff. vmstat doesn't
report swapping activity that I can see, just a window that should
refresh (no backing store) right away causes long (2~5 sec) freezes.


2002-04-04 20:33:20

by Marcelo Tosatti

Subject: Re: Linux 2.4.19-pre5


Could you please try to reproduce with 2.4.19-pre4 ?

Thanks

On Thu, 4 Apr 2002, Tom Holroyd wrote:

> AlphaPC 264DP 666 MHz (Tsunami, UP)
> 1GB RAM
> gcc version 3.0.3
>
> Running stuff as usual, reading large files, I can often get
> very long mouse freezes when redrawing a certain window in X after
> leaving it for a while. I never saw this behavior in 2.4.18-rc1,
> which I ran for over 1 month doing the same stuff. vmstat doesn't
> report swapping activity that I can see, just a window that should
> refresh (no backing store) right away causes long (2~5 sec) freezes.


2002-04-05 04:13:59

by Tom Holroyd

Subject: Re: Linux 2.4.19-pre5

On Thu, 4 Apr 2002, Marcelo Tosatti wrote:

> Could you please try to reproduce with 2.4.19-pre4 ?

OK, I could, so I searched back and -pre1 was OK. This behavior
showed up in -pre2. It seems to be related to the mm changes.
Unfortunately I don't know how to back those out safely to check that.

To repeat, I set up a window that has to be redrawn (no backing
store), then use ee (electric eyes) to scroll through 50 or so JPGs
then go back to redraw the aforementioned window. In -pre2 I get 5
sec freezes and no disk IO during the interval, so it seems like a
memory management thing.

Any tests I could do? A -pre2 patch without the mm changes?

> On Thu, 4 Apr 2002, Tom Holroyd wrote:
>
> > AlphaPC 264DP 666 MHz (Tsunami, UP)
> > 1GB RAM
> > gcc version 3.0.3
> > ... a window that should
> > refresh (no backing store) right away causes long (2~5 sec) freezes.

2002-04-16 14:49:22

by Andrea Arcangeli

Subject: Re: Linux 2.4.19-pre5

On Fri, Apr 05, 2002 at 01:13:30PM +0900, Tom Holroyd wrote:
> On Thu, 4 Apr 2002, Marcelo Tosatti wrote:
>
> > Could you please try to reproduce with 2.4.19-pre4 ?
>
> OK, I could, so I searched back and -pre1 was OK. This behavior
> showed up in -pre2. It seems to be related to the mm changes.
> Unfortunately I don't know how to back those out safely to check that.
>
> To repeat, I set up a window that has to be redrawn (no backing
> store), then use ee (electric eyes) to scroll through 50 or so JPGs
> then go back to redraw the aforementioned window. In -pre2 I get 5
> sec freezes and no disk IO during the interval, so it seems like a
> memory management thing.
>
> Any tests I could do? A -pre2 patch without the mm changes?

there are no mm diffs between pre1 and pre2. The first mm changes are in
pre5.

Andrea

2002-04-17 01:22:55

by Tom Holroyd

Subject: Re: Linux 2.4.19-pre5

On Tue, 16 Apr 2002, Andrea Arcangeli wrote:

> there are no mm diffs between pre1 and pre2. The first mm changes are in
> pre5.

I was referring to the changes in mm/, such as mm/filemap.c,
mm/mmap.c, mm/page_alloc.c, etc., all of which were rather extensive
in -pre2, and contained some 64 bit specific stuff.

As I mentioned to Marcelo earlier, I have been able to get ~5 sec
mouse freezes on my Alpha. Right now I'm running -pre7, and I just
reproduced it. The only way I know of to do it is run a certain
graphics program that's work-related, plot something, switch screens
(in the 'I have a 4x4 virtual desktop' sense), run a big filter job
that reads about 200-300 meg in, filters it, and writes it back out
(alternatively, I can scroll through a directory full of images, so
maybe it only has to have a few hundred meg read in), then switch
back to the graphics screen and hit "refresh". It does not work every
time, but I certainly notice it when it happens. Note that after I've
done it once, it's much harder to reproduce. Of course, that maybe
depends on the phase of the moon, because I reproduced it twice in a
row this morning, with profiling data the second time. Here's the
list of routines that got ticks (as reported by readprofile) during
the ~3 sec freeze I just caused:

1 alloc_skb
1 alpha_switch_to
1 cached_lookup
1 collect_sigign_sigcatch
1 copy_page
1 del_timer
1 dentry_open
1 do_no_page
1 do_page_fault
1 do_switch_stack
1 fput
1 generic_file_read
1 generic_file_write
1 get_gendisk
1 iput
1 link_path_walk
1 locks_remove_flock
1 poll_freewait
1 read_aux
1 remove_wait_queue
1 restore_all
1 schedule
1 scsi_dispatch_cmd
1 session_of_pgrp
1 sock_alloc_send_pskb
1 sys_close
1 sys_ioctl
1 sys_select
1 undo_switch_stack
1 unix_poll
1 unix_stream_sendmsg
1 vsprintf
2 clear_page
2 generic_file_readahead
2 number
2 sock_poll
2 tcp_poll
2 update_atime
3 __free_pages
3 add_wait_queue
3 kfree
3 vsnprintf
4 entSys
4 fget
5 do_select
5 handle_IRQ_event
7 __copy_user
7 sys_read
8 __divqu
10 __remqu
10 do_generic_file_read
18 entInt
19 keyboard_interrupt

I've done this quite a few times now, and add_wait_queue &
__free_pages are always there. (Well, and keyboard_interrupt, but
that's not it.) Other things are more random, but it's hard to judge
precisely (I'm not sure how the profiling works but I assume short
routines can be easily missed).

This is an AlphaPC 264DP 666 MHz (uniprocessor version, non-SMP
kernel). 1GB RAM. AHA-2940U/UW SCSI on the PCI bus along
with a Permedia 2 graphics card.