2004-04-04 05:23:02

by Andrea Arcangeli

Subject: 2.6.5-aa1

This fixes a tiny race in the recent mprotect merging code, here's the
intradiff for review, plus it merges some nice lowlatency improvement
from Takashi.

--- x/mm/mprotect.c.~1~ 2004-04-04 06:26:09.226033712 +0200
+++ x/mm/mprotect.c 2004-04-04 06:29:37.165422120 +0200
@@ -196,12 +196,18 @@ mprotect_attempt_merge(struct vm_area_st

 	/*
 	 * Otherwise extend it.
+	 * We need the anon_vma_lock only for "vma" since it's changing
+	 * vma->vm_start and vma->vm_pgoff. prev->vm_start and
+	 * prev->vm_pgoff are unchanged so the race on prev->vm_end
+	 * is controlled w/o explicit anon-vma locking.
 	 */
 	if (file)
 		down(i_shared_sem);
+	anon_vma_lock(vma);
 	__vma_modify(root, prev, prev->vm_start, end, prev->vm_pgoff);
 	__vma_modify(root, vma, end, vma->vm_end,
 		     vma->vm_pgoff + ((end - vma->vm_start) >> PAGE_SHIFT));
+	anon_vma_unlock(vma);
 	if (file)
 		up(i_shared_sem);
 	return 1;
@@ -264,6 +270,7 @@ mprotect_attempt_merge_final(struct vm_a

 	if (file)
 		down(i_shared_sem);
+	/* no need of anon_vma_lock for any "vm_end" extension */
 	__vma_modify(root, prev, prev->vm_start,
 		     next->vm_end, prev->vm_pgoff);


I haven't merged the ppc patch yet because I'm not really sure it's
necessary (how can it not oops in the first place, if that patch was
needed? OTOH that patch certainly cannot hurt either, but I'll wait for
feedback from the testing first). The only pending bug at the moment is
the gfp-no-compound related crash from Christoph on ppc showing
page->private corrupted. I currently doubt it's a bug in my changes,
though I cannot exclude it either. As soon as I get the results from the
three debugging patches I will know more about it. I definitely cannot
reproduce anything wrong here, and the gfp-no-compound fixed the last
swap-suspend related glitch plus it makes the interface with the drivers
more robust.

URL:

http://www.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.6/2.6.5-aa1.gz
http://www.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.6/2.6.5-aa1/

Changelog diff between 2.6.5-rc3-aa3 and 2.6.5-aa1:

Files 2.6.5-rc3-aa3/disable-cap-mlock and 2.6.5-aa1/disable-cap-mlock differ
Files 2.6.5-rc3-aa3/extraversion and 2.6.5-aa1/extraversion differ
Files 2.6.5-rc3-aa3/prio-tree.gz and 2.6.5-aa1/prio-tree.gz differ

Rediffed due to rejects.

Files 2.6.5-rc3-aa3/mprotect-vma-merging and 2.6.5-aa1/mprotect-vma-merging differ

Fixed race condition in mprotect: must hold the anon_vma_lock()
while moving either ->vm_start or ->vm_pgoff (extending vm_end
doesn't need it, since that race is controlled w/o explicit
locking).

Only in 2.6.5-aa1: unmap_vmas-lat

Don't treat no-preempt differently from -preempt w.r.t. worst case
latencies.

Only in 2.6.5-aa1: writeback-lat

Merged Takashi Iwai's lowlatency fixes adding missing schedule points,
greatly reducing the worst case latency with preempt disabled.


2004-04-04 21:07:44

by Marcus Hartig

Subject: Re: 2.6.5-aa1

> This fixes a tiny race in the recent mprotect merging code, here's the
> intradiff for review, plus it merges some nice lowlatency improvement
> from Takashi.

Runs fine here with my GNOME 2.6 desktop. Fast like Speedy Gonzales.
Good work.

But now with the vanilla 2.6.5 and/or -aa1 my favourite game Enemy
Territory quits with "signal 11". With 2.6.5-rc3 it runs stable for hours.

No change in the kernel config, all with preempt, no CONFIG_REGPARM for
nVidia binary drivers is set, or other changes. But only when I want to
access the net server game browser in ET to play online! Only then bumm!

With 2.6.5-rc3 all runs fine. Amusingly, hmmm?

Marcus

2004-04-04 22:59:51

by Jeff Sipek

Subject: Re: 2.6.5-aa1

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sunday 04 April 2004 17:05, Marcus Hartig wrote:
<snip>
> But now with the vanilla 2.6.5 and/or -aa1 my favourite game Enemy
> Territory quits with "signal 11". With 2.6.5-rc3 it runs stable for hours.
>
> No change in the kernel config, all with preempt, no CONFIG_REGPARM for
> nVidia binary drivers is set, or other changes. But only when I want to
> access the net server game browser in ET to play online! Only then bumm!

Same here (with vanilla 2.6.5, I didn't try -aa.)

> With 2.6.5-rc3 all runs fine. Amusingly, hmmm?

In 2.6.5-rc1 it works fine.

Jeff.

- --
Penguin : Linux version 2.6.2-rc2-net64 on an i686 machine (3932.16 BogoMips).
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

iD8DBQFAcJNhwFP0+seVj/4RAp6gAKCXL7rhnhWrlPLGHd+uHYNU1b+QggCcCb0n
ivXbW7pWxMXXEt+jlH8gEx0=
=tmuS
-----END PGP SIGNATURE-----

2004-04-05 00:20:27

by Andrea Arcangeli

Subject: Re: 2.6.5-aa1

On Sun, Apr 04, 2004 at 06:59:41PM -0400, Jeff Sipek wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On Sunday 04 April 2004 17:05, Marcus Hartig wrote:
> <snip>
> > But now with the vanilla 2.6.5 and/or -aa1 my favourite game Enemy
> > Territory quits with "signal 11". With 2.6.5-rc3 it runs stable for hours.
> >
> > No change in the kernel config, all with preempt, no CONFIG_REGPARM for
> > nVidia binary drivers is set, or other changes. But only when I want to
> > access the net server game browser in ET to play online! Only then bumm!
>
> Same here (with vanilla 2.6.5, I didn't try -aa.)

did you get an oops or just a sigsegv? (see dmesg) If you only got a
sigsegv can you try to keep the segfaulting process under "strace -o
/tmp/o -p <pid>" and report the last few syscalls before the segfault?
That should reduce the scope of the problem, I had a look at the
diff between rc3 and 2.6.5 final but I found nothing obvious that could
explain your problem (yet).

2004-04-05 02:19:13

by Jeff Sipek

Subject: Re: 2.6.5-aa1

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sunday 04 April 2004 20:20, Andrea Arcangeli wrote:
> did you get an oops or just a sigsegv? (see dmesg)

only sigsegv

> If you only got a
> sigsegv can you try to keep the segfaulting process under "strace -o
> /tmp/o -p <pid>" and report the last few syscalls before the segfault?

Sure. I started the process as:

artsdsp -m et

et is a shell script (created during the installation) that changes the
working directory and executes et.x86. Then, I attached strace to the actual
executable (et.x86.)

Here are last few lines of the output:

ioctl(68, 0xc0104629, 0xbfffda98) = 0
munmap(0x47e37000, 1056768) = 0
ioctl(68, 0xc0104629, 0xbfffda98) = 0
ioctl(70, 0xc0184633, 0xbfffdaa4) = 0
ioctl(68, 0xc0104629, 0xbfffdab4) = 0
ioctl(71, 0xc01046cf, 0xbfffda88) = 0
close(71) = 0
ioctl(68, 0xc0104629, 0xbfffdab4) = 0
ioctl(72, 0xc01046cf, 0xbfffda88) = 0
close(72) = 0
ioctl(68, 0xc0104629, 0xbfffdab4) = 0
ioctl(73, 0xc01046cf, 0xbfffda88) = 0
close(73) = 0
ioctl(68, 0xc0104629, 0xbfffdaa4) = 0
ioctl(68, 0xc0104629, 0xbfffdaa4) = 0
ioctl(68, 0xc0104629, 0xbfffdaa4) = 0
ioctl(68, 0xc0104629, 0xbfffda98) = 0
ioctl(68, 0xc0104629, 0xbfffda8c) = 0
ioctl(68, 0xc0104629, 0xbfffdaa4) = 0
ioctl(68, 0xc0104629, 0xbfffdaa4) = 0
ioctl(68, 0xc0104629, 0xbfffdaa4) = 0
munmap(0x4804b000, 4096) = 0
ioctl(68, 0xc0104629, 0xbfffdab4) = 0
ioctl(68, 0xc0104629, 0xbfffdab4) = 0
close(70) = 0
getpid() = 1987
munmap(0x46b9b000, 378720) = 0
munmap(0x46bf8000, 4933916) = 0
write(2, "Shutdown tty console\n", 21) = 21
ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig -icanon -echo ...}) = 0
ioctl(0, SNDCTL_TMR_STOP or TCSETSW, {B38400 opost isig icanon echo ...}) = 0
ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
munmap(0x4bbd5000, 1210664) = 0
munmap(0x4c029000, 46072) = 0
munmap(0x4c257000, 344064) = 0
munmap(0x4c035000, 2233160) = 0
munmap(0x4c2ab000, 1210664) = 0
munmap(0x4c3d3000, 46072) = 0
munmap(0x489bd000, 4096) = 0
exit_group(0) = ?

> That should reduce the scope of the problem, I had a look at the
> diff between rc3 and 2.6.5 final but I found nothing obvious that could
> explain your problem (yet).

I had to use artsdsp to run et, because it, just like everything else here,
hangs when it tries to open /dev/dsp. Even dd if=/dev/dsp of=/dev/null hangs.
Interestingly enough, xmms works without any problems via both alsa and oss
emulation.

Strace shows this:
open("/dev/dsp", O_RDWR
and waits forever. While lsof shows NO processes. (Note: the sound issue is
not new, I just tested 2.6.2-rc? and it was broken there too.)

Jeff.

- --
Keyboard not found!
Press F1 to enter Setup
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

iD8DBQFAcMIXwFP0+seVj/4RAsEoAKCxzeLvcxtmCIk5TDqiBQvFHAcJ4QCdH8yg
BqvMqoTQvcSEjPC453IJTOs=
=Fq9B
-----END PGP SIGNATURE-----

2004-04-05 05:45:36

by Marcus Hartig

Subject: Re: 2.6.5-aa1

Andrea Arcangeli wrote:

> did you get an oops or just a sigsegv? (see dmesg) If you only got a
> sigsegv can you try to keep the segfaulting process under "strace -o
> /tmp/o -p <pid>" and report the last few syscalls before the segfault?
> That should reduce the scope of the problem, I had a look at the
> diff between rc3 and 2.6.5 final but I found nothing obvious that could
> explain your problem (yet).

nForce2 board, nVidia bin driver 5341 with ALSA sound driver snd_intel8x0.
No sound daemon is running under my Fedora GNOME desktop. Online with a DSL
connection.
I've tested it again: no chance with 2.6.5, only with rc3; switching back
and it runs fine. ALSA changes? Or in the net code? To get sound in the
game I do this:

echo "et.x86 0 0 direct" > /proc/asound/card0/pcm0p/oss
echo "et.x86 0 0 disable" > /proc/asound/card0/pcm0c/oss


The related part (i hope) of strace:
-------------------------------------------
sched_yield() = 0
sched_yield() = 0
sched_yield() = 0
sched_yield() = 0
sched_yield() = 0
sched_yield() = 0
sched_yield() = 0
sched_yield() = 0
gettimeofday({1081141090, 664894}, NULL) = 0
gettimeofday({1081141090, 664917}, NULL) = 0
gettimeofday({1081141090, 664944}, NULL) = 0
ioctl(36, SNDCTL_DSP_GETOPTR, 0xbffff774) = 0
time([1081141090]) = 1081141090
gettimeofday({1081141090, 665080}, NULL) = 0
open("/home/marcus/.etwolf/etmain/etkey", O_RDONLY) = 39
close(39) = 0
open("/home/marcus/.etwolf/etmain/etkey", O_RDONLY) = 39
fstat64(39, {st_mode=S_IFREG|0754, st_size=67, ...}) = 0
mmap2(NULL, 131072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
0) = 0x54355000
read(39, "0000001002200311220003785725I\374CJ"..., 131072) = 67
read(39, "", 131072) = 0
close(39) = 0
munmap(0x54355000, 131072) = 0
time([1081141090]) = 1081141090
gettimeofday({1081141090, 665413}, NULL) = 0
time([1081141090]) = 1081141090
gettimeofday({1081141090, 665459}, NULL) = 0
time([1081141090]) = 1081141090
gettimeofday({1081141090, 665502}, NULL) = 0
ioctl(29, FIONREAD, [32]) = 0
read(29, "\6\0\24\1\310\t\34\0\217\0\0\0\2\0\340\1\0\0\0\0\376\377"...,
32) = 32
gettimeofday({1081141090, 665627}, NULL) = 0
ioctl(29, FIONREAD, [32]) = 0
read(29, "\6\0\24\1\320\t\34\0\217\0\0\0\2\0\340\1\0\0\0\0\377\377"...,
32) = 32
gettimeofday({1081141090, 665705}, NULL) = 0
ioctl(29, FIONREAD, [32]) = 0
read(29, "\6\0\24\1\330\t\34\0\217\0\0\0\2\0\340\1\0\0\0\0\376\377"...,
32) = 32
gettimeofday({1081141090, 665794}, NULL) = 0
ioctl(29, FIONREAD, [0]) = 0
read(0, 0xbfff768f, 1) = -1 EAGAIN (Resource temporarily
unavailable)
recvfrom(37, 0x93848c0, 32768, 0, 0xbfff76e0, 0xbfff76dc) = -1 EAGAIN
(Resource temporarily unavailable)
gettimeofday({1081141090, 665962}, NULL) = 0
gettimeofday({1081141090, 665982}, NULL) = 0
ioctl(29, FIONREAD, [0]) = 0
read(0, 0xbfff768f, 1) = -1 EAGAIN (Resource temporarily
unavailable)
recvfrom(37, 0x93848c0, 32768, 0, 0xbfff76e0, 0xbfff76dc) = -1 EAGAIN
(Resource temporarily unavailable)
gettimeofday({1081141090, 666104}, NULL) = 0
ioctl(29, FIONREAD, [0]) = 0
read(0, 0xbfff768f, 1) = -1 EAGAIN (Resource temporarily
unavailable)
recvfrom(37, 0x93848c0, 32768, 0, 0xbfff76e0, 0xbfff76dc) = -1 EAGAIN
(Resource temporarily unavailable)
gettimeofday({1081141090, 666206}, NULL) = 0
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
fstat64(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 3), ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
= 0x54355000
write(1, "Received signal 11, exiting...\n", 31) = 31
write(29, "\211\2\2\0\0\0\0\0+D\1\0", 12) = 12
read(29, "\1\2\26\1\0\0\0\0\22\0\300\1\0\0\0\0\0\0\0\0\34\0\0\0\300"...,
32) = 32
write(29, "i\2\3\0\2\0\1\0\4\0\1\1\33D\2\0\0\0\0\0 \0\2\0\0\0\0\0"..., 84)
= 84
read(29, "\6\0\32\1\337\t\34\0\217\0\0\0\2\0\340\1\0\0\0\0\0\2\200"...,
32) = 32
read(29, "\1\0\34\1\0\0\0\0\1\0\0\0\250\365\377\277\360\270\v\10"..., 32) = 32
munmap(0x5075b000, 266240) = 0
-------------------------------------------------------------------------

Thanks, I will test it later with preempt off and another driver.

Marcus

2004-04-05 07:03:34

by Jeff Sipek

Subject: Re: 2.6.5-aa1

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Monday 05 April 2004 01:39, Marcus Hartig wrote:
<snip>
> echo "et.x86 0 0 direct" > /proc/asound/card0/pcm0p/oss
> echo "et.x86 0 0 disable" > /proc/asound/card0/pcm0c/oss

I used only the first one of the two commands, and had to use artsdsp to get
sound. With both of those commands, I just got et running without arts and it
didn't sigsegv.

> The related part (i hope) of strace:
<snip>
>
> Thanks, I will test it later with preempt off and another driver.

Jeff.

- --
Real Programmers consider "what you see is what you get" to be just as
bad a concept in Text Editors as it is in women. No, the Real Programmer
wants a "you asked for it, you got it" text editor -- complicated,
cryptic, powerful, unforgiving, dangerous.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

iD8DBQFAcQS8wFP0+seVj/4RAhL6AJ90rSQMKtx9pSAvmDtmkgBtJgwXVwCglLbz
SepNHo5CLDfhFZV3Ic3YF3A=
=1NYt
-----END PGP SIGNATURE-----

2004-04-05 17:32:13

by Marcus Hartig

Subject: Re: 2.6.5-aa1

Martin Schlemmer wrote:

> Stupid question - where do you get the 5341 driver??

:-) http://www.nforcershq.com/forum/viewtopic.php?t=44256

Marcus


2004-04-05 18:33:59

by Marcus Hartig

Subject: Re: 2.6.5-aa1

Andrea Arcangeli wrote:

> That should reduce the scope of the problem, I had a look at the
> diff between rc3 and 2.6.5 final but I found nothing obvious that could
> explain your problem (yet).

It seems to be CONFIG_PREEMPT. I have compiled the 2.6.5-aa1 only without
it and ET runs now 30min without a signal11.


Marcus

2004-04-05 19:02:29

by Andrea Arcangeli

Subject: Re: 2.6.5-aa1

On Mon, Apr 05, 2004 at 08:31:29PM +0200, Marcus Hartig wrote:
> Andrea Arcangeli wrote:
>
> >That should reduce the scope of the problem, I had a look at the
> >diff between rc3 and 2.6.5 final but I found nothing obvious that could
> >explain your problem (yet).
>
> It seems to be CONFIG_PREEMPT. I have compiled the 2.6.5-aa1 only without
> it and ET runs now 30min without a signal11.

sounds good, probably a preempt bug in the alsa code or an rcu issue or
something like that. my tree has the most important fixes in the
writeback code from Takashi to provide the same lowlatency w/ or w/o
CONFIG_PREEMPT so you shouldn't notice much difference either way. It
was a good decision to leave preempt off for higher reliability too,
preempt isn't just a matter of spinlocks, sometimes you need explicit
preempt_disable to make it work right.

still it'd be nice to fix it purely as an exercise, exercises are
useful nevertheless ;).