2002-10-01 13:03:15

by Alessandro Suardi

Subject: Re: Shared memory shmat/dt not working well in 2.5.x

Zlatko Calusic wrote:
> Hi, Andrew, Hugh & others.
>
> Still having problems with Oracle on 2.5.x (it can't even be started),

[snip]

Just wanted to add that I can't provide further info about which
kernel broke it... updated map:

2.5.34 kernel okay, Oracle works
2.5.35 kernel doesn't compile
2.5.36 oops on linux kernel boot, frozen
2.5.37 oops on linux kernel boot, SysRQ works
2.5.38 kernel okay, Oracle OOMs
2.5.39 as 2.5.38
2.5.40 kernel.org down, no mirrors carrying it yet

My box is a Dell Latitude CPx750J, PIII CPU, 256MB RAM / 512MB swap
all on ext3fs, mounted rw,noatime except of course for /dev/shm
which is tmpfs. UP kernel, preempt is on, hugetlb is off.

As I told Andrew in private email, the Oracle shm segment is created,
the background processes forked but the SQL*Plus child which should
perform the database open after checking datafiles and obviously
attaching the shm segment (about 50MB of it) gets killed by OOM.

--alessandro

"everything dies, baby that's a fact
but maybe everything that dies someday comes back"
(Bruce Springsteen, "Atlantic City")

2002-10-01 13:03:20

by Hugh Dickins

Subject: [PATCH] Re: Shared memory shmat/dt not working well in 2.5.x

On Tue, 1 Oct 2002, Zlatko Calusic wrote:
>
> Still having problems with Oracle on 2.5.x (it can't even be started),
> I devoted some time trying to pinpoint where the problem is. Reading
> many traces of Oracle, and rebooting a dozen times, I finally found
> that the culprit is weird behaviour of shmat/shmdt functions in 2.5,
> when combined with mprotect() calls. I wrote a simple test app
> (attached) and I'm also appending output of it below (running on
> 2.4.19 & 2.5.39 kernels, see the difference).

Exemplary bug report! Many thanks for taking so much trouble to
reproduce the problem. Patch below (against 2.5.39) should fix it:
I'll send Linus and Andrew when I can get hold of a 2.5.40 tree.

Hugh

--- 2.5.39/mm/mmap.c Fri Sep 20 17:57:49 2002
+++ linux/mm/mmap.c Tue Oct 1 13:59:54 2002
@@ -1055,7 +1055,7 @@ int split_vma(struct mm_struct * mm, str
 	if (new_below) {
 		new->vm_end = addr;
 		vma->vm_start = addr;
-		vma->vm_pgoff += ((addr - vma->vm_start) >> PAGE_SHIFT);
+		vma->vm_pgoff += ((addr - new->vm_start) >> PAGE_SHIFT);
 	} else {
 		vma->vm_end = addr;
 		new->vm_start = addr;
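
The one-line change is easy to misread, so here is a small user-space model
of the arithmetic. It is only a sketch: struct vma_model, split_below() and
upper_pgoff_after_split() are made-up names, the 4 KiB page size is an
assumption, and the initial copy of the old area into the new one is inferred
from the fix reading new->vm_start. The point it illustrates is that
vma->vm_start has already been set to addr by the time the old line runs, so
the old expression always adds zero pages.

```c
#include <assert.h>

#define PAGE_SHIFT 12   /* assumption: 4 KiB pages */

/* Cut-down stand-in for the kernel's vm_area_struct (illustrative only). */
struct vma_model {
    unsigned long vm_start;   /* first byte of the mapping */
    unsigned long vm_end;     /* one past the last byte */
    unsigned long vm_pgoff;   /* offset into the backing object, in pages */
};

/*
 * Model of the new_below branch: *new takes the lower part and *vma keeps
 * the upper part. 'fixed' selects the patched line or the original one.
 */
void split_below(struct vma_model *vma, struct vma_model *new,
                 unsigned long addr, int fixed)
{
    assert(addr > vma->vm_start && addr < vma->vm_end);

    *new = *vma;              /* new starts as a copy of the old area */
    new->vm_end = addr;
    vma->vm_start = addr;

    if (fixed)
        /* new->vm_start still holds the original start address */
        vma->vm_pgoff += (addr - new->vm_start) >> PAGE_SHIFT;
    else
        /* vma->vm_start == addr already, so this adds 0 */
        vma->vm_pgoff += (addr - vma->vm_start) >> PAGE_SHIFT;
}

/* Split an 8-page area three pages in and return the upper part's vm_pgoff. */
unsigned long upper_pgoff_after_split(int fixed)
{
    struct vma_model vma = { 0x40000000UL, 0x40008000UL, 0 };
    struct vma_model lower;

    split_below(&vma, &lower, 0x40003000UL, fixed);
    return vma.vm_pgoff;
}
```

With these numbers upper_pgoff_after_split(0) returns 0 (the upper area
still claims to start at page 0 of the segment), while
upper_pgoff_after_split(1) returns 3, which matches its new start address.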

2002-10-01 13:24:09

by Alessandro Suardi

Subject: Re: [PATCH] Re: Shared memory shmat/dt not working well in 2.5.x

Hugh Dickins wrote:
> On Tue, 1 Oct 2002, Zlatko Calusic wrote:
>
>>Still having problems with Oracle on 2.5.x (it can't even be started),
>>I devoted some time trying to pinpoint where the problem is. Reading
>>many traces of Oracle, and rebooting a dozen times, I finally found
>>that the culprit is weird behaviour of shmat/shmdt functions in 2.5,
>>when combined with mprotect() calls. I wrote a simple test app
>>(attached) and I'm also appending output of it below (running on
>>2.4.19 & 2.5.39 kernels, see the difference).
>
>
> Exemplary bug report! Many thanks for taking so much trouble to
> reproduce the problem. Patch below (against 2.5.39) should fix it:
> I'll send Linus and Andrew when I can get hold of a 2.5.40 tree.
>
> Hugh
>
> --- 2.5.39/mm/mmap.c Fri Sep 20 17:57:49 2002
> +++ linux/mm/mmap.c Tue Oct 1 13:59:54 2002
> @@ -1055,7 +1055,7 @@ int split_vma(struct mm_struct * mm, str
>  	if (new_below) {
>  		new->vm_end = addr;
>  		vma->vm_start = addr;
> -		vma->vm_pgoff += ((addr - vma->vm_start) >> PAGE_SHIFT);
> +		vma->vm_pgoff += ((addr - new->vm_start) >> PAGE_SHIFT);
>  	} else {
>  		vma->vm_end = addr;
>  		new->vm_start = addr;

I'm glad to report that Oracle 9.2 is now able to start once again
on 2.5.x series :)

Thanks, cool work as always !

--alessandro

"everything dies, baby that's a fact
but maybe everything that dies someday comes back"
(Bruce Springsteen, "Atlantic City")

2002-10-01 13:32:09

by Zlatko Calusic

Subject: Re: [PATCH] Re: Shared memory shmat/dt not working well in 2.5.x

Hugh Dickins writes:

> On Tue, 1 Oct 2002, Zlatko Calusic wrote:
>>
>> Still having problems with Oracle on 2.5.x (it can't even be started),
>> I devoted some time trying to pinpoint where the problem is. Reading
>> many traces of Oracle, and rebooting a dozen times, I finally found
>> that the culprit is weird behaviour of shmat/shmdt functions in 2.5,
>> when combined with mprotect() calls. I wrote a simple test app
>> (attached) and I'm also appending output of it below (running on
>> 2.4.19 & 2.5.39 kernels, see the difference).
>
> Exemplary bug report! Many thanks for taking so much trouble to
> reproduce the problem. Patch below (against 2.5.39) should fix it:
> I'll send Linus and Andrew when I can get hold of a 2.5.40 tree.

Oh, dear sir, but you are the one who solved it eventually. :)
Anyway, I'm glad it worked well and I was helpful.

Looking forward to testing/benching Oracle with the patch applied (and
reporting further bugs in ext3, fsync() and friends ;)).

Keep up the good work!
--
Zlatko

2002-10-01 13:41:16

by Zlatko Calusic

Subject: Re: [PATCH] Re: Shared memory shmat/dt not working well in 2.5.x

Alessandro Suardi writes:

> Hugh Dickins wrote:
>> On Tue, 1 Oct 2002, Zlatko Calusic wrote:
>>
>>>Still having problems with Oracle on 2.5.x (it can't even be started),
>>>I devoted some time trying to pinpoint where the problem is. Reading
>>>many traces of Oracle, and rebooting a dozen times, I finally found
>>>that the culprit is weird behaviour of shmat/shmdt functions in 2.5,
>>>when combined with mprotect() calls. I wrote a simple test app
>>>(attached) and I'm also appending output of it below (running on
>>>2.4.19 & 2.5.39 kernels, see the difference).
>> Exemplary bug report! Many thanks for taking so much trouble to
>> reproduce the problem. Patch below (against 2.5.39) should fix it:
>> I'll send Linus and Andrew when I can get hold of a 2.5.40 tree.
>> Hugh
>
[snip]
>
> I'm glad to report that Oracle 9.2 is now able to start once again
> on 2.5.x series :)
>
> Thanks, cool work as always !

Was it a known problem for some time?

I haven't been testing 2.5.x series for some time, and also haven't
read linux-kernel list last few months, so I don't know exact history
of the bug. If you can enlighten me, I'm just curious... :)

I remember other more complicated bugs from the older 2.5.x kernels,
and now I'll test if they're solved in newer ones. I might need some
help if they still exist (could you lend me a hand if that's the
case?) as I was getting Oracle internal error - coredump - with only
one meaningful sentence (at least to me :)). Google was silent on the
case. :(

Regards,
--
Zlatko

2002-10-01 14:47:30

by Alessandro Suardi

Subject: Re: [PATCH] Re: Shared memory shmat/dt not working well in 2.5.x

Zlatko Calusic wrote:

>>I'm glad to report that Oracle 9.2 is now able to start once again
>> on 2.5.x series :)
>>
>>Thanks, cool work as always !
>
>
> Was it a known problem for some time?
>
> I haven't been testing 2.5.x series for some time, and also haven't
> read linux-kernel list last few months, so I don't know exact history
> of the bug. If you can enlighten me, I'm just curious... :)
>
> I remember other more complicated bugs from the older 2.5.x kernels,
> and now I'll test if they're solved in newer ones. I might need some
> help if they still exist (could you lend me a hand if that's the
> case?) as I was getting Oracle internal error - coredump - with only
> one meaningful sentence (at least to me :)). Google was silent on the
> case. :(

I reported the issue on l-k the other day:

http://www.uwsg.iu.edu/hypermail/linux/kernel/0209.3/1691.html

The more complicated bug you're talking about is the exec_mmap
change introduced in 2.5.19 and fixed a handful of versions
later, possibly .28, where PMON wouldn't start after 120"...
I guess :)


Ciao,

--alessandro

"everything dies, baby that's a fact
but maybe everything that dies someday comes back"
(Bruce Springsteen, "Atlantic City")

2002-10-01 14:53:47

by Zlatko Calusic

Subject: Re: [PATCH] Re: Shared memory shmat/dt not working well in 2.5.x

Alessandro Suardi writes:

> Zlatko Calusic wrote:
>
>>>I'm glad to report that Oracle 9.2 is now able to start once again
>>> on 2.5.x series :)
>>>
>>>Thanks, cool work as always !
>> Was it a known problem for some time?
>> I haven't been testing 2.5.x series for some time, and also haven't
>> read linux-kernel list last few months, so I don't know exact history
>> of the bug. If you can enlighten me, I'm just curious... :)
>> I remember other more complicated bugs from the older 2.5.x kernels,
>> and now I'll test if they're solved in newer ones. I might need some
>> help if they still exist (could you lend me a hand if that's the
>> case?) as I was getting Oracle internal error - coredump - with only
>> one meaningful sentence (at least to me :)). Google was silent on the
>> case. :(
>
> I reported the issue on l-k the other day:
>
> http://www.uwsg.iu.edu/hypermail/linux/kernel/0209.3/1691.html

I see. Same day I decided to dig deeper. :)

>
> The more complicated bug you're talking about is the exec_mmap
> change introduced in 2.5.19 and fixed a handful of versions
> later, possibly .28, where PMON wouldn't start after 120"...
> I guess :)

Great. Thanks for the useful info.

It looks like there's a chance I will do only the interesting
benchmarking part. :) I'm quite curious how Andrew's work in 2.5.x
will affect performance of Oracle database.

Thanks for everything.
--
Zlatko

2002-10-01 15:27:13

by Hugh Dickins

Subject: [PATCH] Oracle startup split_vma fix

Alessandro Suardi and Zlatko Calusic independently reported that
Oracle cannot start on recent 2.5: excellent research by Zlatko
quickly pointed to vm_pgoff buglet in the new split_vma.

Patch below against 2.5.40 or 2.5.40-mm1: please apply.

--- 2.5.40/mm/mmap.c Tue Oct 1 15:33:04 2002
+++ linux/mm/mmap.c Tue Oct 1 15:53:06 2002
@@ -1058,7 +1058,7 @@
 	if (new_below) {
 		new->vm_end = addr;
 		vma->vm_start = addr;
-		vma->vm_pgoff += ((addr - vma->vm_start) >> PAGE_SHIFT);
+		vma->vm_pgoff += ((addr - new->vm_start) >> PAGE_SHIFT);
 	} else {
 		vma->vm_end = addr;
 		new->vm_start = addr;
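
The test program Zlatko attached to his original report is not part of this
archive. What follows is only a rough sketch of the kind of sequence he
describes (shmat, an mprotect() call on part of the segment, then shmdt). The
function name shm_split_check(), the 8-page segment size and the checks are
assumptions, not his code, and no particular output on an affected kernel is
claimed.

```c
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/mman.h>
#include <sys/shm.h>
#include <unistd.h>

/*
 * Returns 0 if every step behaves as expected, -1 if the segment cannot be
 * created or attached at all, and 1 if a later step misbehaves.
 */
int shm_split_check(void)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t npages = 8;                      /* arbitrary small segment */
    size_t len = npages * (size_t)page;
    size_t i;
    int rc = 1;
    char *p;

    int id = shmget(IPC_PRIVATE, len, IPC_CREAT | 0600);
    if (id < 0)
        return -1;

    /* First attachment: stamp each page with its own index. */
    p = shmat(id, NULL, 0);
    if (p == (void *)-1) {
        shmctl(id, IPC_RMID, NULL);
        return -1;
    }
    for (i = 0; i < npages; i++)
        p[i * (size_t)page] = (char)i;
    if (shmdt(p) != 0)
        goto out;

    /* Second attachment: change protection on pages 2..3 only, which
     * forces the kernel to split the mapping into several areas. */
    p = shmat(id, NULL, 0);
    if (p == (void *)-1)
        goto out;
    if (mprotect(p + 2 * page, 2 * (size_t)page, PROT_READ) != 0) {
        shmdt(p);
        goto out;
    }

    /* Every page should still read back its own index... */
    rc = 0;
    for (i = 0; i < npages; i++)
        if (p[i * (size_t)page] != (char)i)
            rc = 1;

    /* ...and a single shmdt() should detach the whole attachment. */
    if (shmdt(p) != 0)
        rc = 1;

out:
    shmctl(id, IPC_RMID, NULL);             /* always remove the segment */
    return rc;
}
```

A caller would simply invoke shm_split_check() and report the return value.
Comparing /proc/self/maps before and after the shmdt() call is another way to
see whether pieces of the segment stay mapped.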

2002-10-02 18:40:37

by Zlatko Calusic

Subject: Re: [PATCH] Re: Shared memory shmat/dt not working well in 2.5.x

Alessandro Suardi writes:
> The more complicated bug you're talking about is the exec_mmap
> change introduced in 2.5.19 and fixed a handful of versions
> later, possibly .28, where PMON wouldn't start after 120"...
> I guess :)

Oh, well, if that one is really fixed, then I have another one. ;)

After some time up, a few selects and a few inserts, Oracle decided to die
(2.5.40 + Hugh's patch, SMP, Oracle 9.0.1.4 - works flawlessly on
2.4.19). I have a full coredump, but I don't know what to do with it
(if somebody wants it, just say). It seems benchmarking will
wait... :(


*** 2002-10-02 20:15:27.634
*** SESSION ID:(4.1) 2002-10-02 20:15:27.583
BH (0x0x60fee288) file#: 1 rdba: 0x004000c7 (1/199) class 1 ba: 0x0x60c9a000
set: 3, dbwrid: 0
hash: [53509d88,53509d88], lru: [60fee370,60fee220]
LRU flags:
ckptq: [NULL] fileq: [NULL]
st: XCURRENT, md: NULL, rsop: 0x(nil), tch: 1
L:[0x0.0.0] H:[0x0.0.0] R:[0x0.0.0]
*** 2002-10-02 20:15:27.634
ksedmp: internal or fatal error
ORA-00600: Message 600 not found; No message file for product=RDBMS, facility=ORA; arguments: [kcbkllrba_2]

...

--
Zlatko

2002-10-08 11:22:37

by Zlatko Calusic

Subject: Re: [PATCH] Re: Shared memory shmat/dt not working well in 2.5.x

Zlatko Calusic writes:

> Alessandro Suardi writes:
>> The more complicated bug you're talking about is the exec_mmap
>> change introduced in 2.5.19 and fixed a handful of versions
>> later, possibly .28, where PMON wouldn't start after 120"...
>> I guess :)
>
> Oh, well, if that one is really fixed, then I have another one. ;)
>

Hm, not anymore!

Thanks to you guys, 2.5.41 is flawless. It works under all the tests
that were failing before. Great work!

I did some benchmarks and it looks like 2.5 is a little bit slower. I
have two small perl+plsql applications for testing purposes,
"cucibench" benches how long it takes to parse cucitail POP daemon log
and put it into database (insert load). "mailproc" processes sendmail
log and does the same. mailproc is a little bit more complicated (it
also does updates). The results are as follows (numbers are
minutes:seconds it took to finish the task on Oracle 9.2.0.1):

| app       | 2.4.19 | 2.5.41 |
|-----------|--------|--------|
| cucibench | 03:17  | 03:38  |
| mailproc  | 02:12  | 02:30  |
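
To put a number on "a little bit slower", the timings in the table can be
converted to seconds and compared. The helper below is a minimal sketch. The
function names are illustrative and not from the thread, and the result is
expressed in tenths of a percent to keep the arithmetic in integers.

```c
#include <assert.h>
#include <stdio.h>

/* Convert an "MM:SS" string to seconds; returns -1 if it does not parse. */
int mmss_to_seconds(const char *s)
{
    int m, sec;

    if (sscanf(s, "%d:%d", &m, &sec) != 2 || m < 0 || sec < 0 || sec > 59)
        return -1;
    return m * 60 + sec;
}

/*
 * Slowdown of 'after' relative to 'before' in tenths of a percent,
 * rounded to nearest for non-negative differences (107 means 10.7%).
 */
int slowdown_permille(const char *before, const char *after)
{
    int b = mmss_to_seconds(before);
    int a = mmss_to_seconds(after);

    assert(b > 0 && a >= 0);
    return ((a - b) * 1000 + b / 2) / b;
}
```

For the table above, slowdown_permille("03:17", "03:38") gives 107 and
slowdown_permille("02:12", "02:30") gives 136, i.e. 2.5.41 took roughly
10.7% longer on cucibench and 13.6% longer on mailproc.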

I also observed that another application I use occasionally - LXR (Linux
source cross referencing tool) - takes much longer to generate xref
database (which is in Berkeley DB files). It works in three passes,
where the last one, when it dumps symbols into DB, is interesting. In
2.4 it finishes quickly (it uses 100% CPU, then occasionally syncs the
databases - heavy write traffic for a second - then continues), but
2.5 has problems with it (it gets stuck writing to disk all the time, CPU
usage is minimal and process progresses very slowly). Andrew, if
you're interested I can send you some numbers to describe the case
better.

Keep up the good work!
--
Zlatko

2002-10-08 11:33:30

by Duncan Sands

Subject: Re: [PATCH] Re: Shared memory shmat/dt not working well in 2.5.x

> I also observed that other application I use occasionally - LXR (Linux
> source cross referencing tool) - takes much longer to generate xref
> database (which is in Berkeley DB files). It works in three passes,
> where the last one, when it dumps symbols into DB, is interesting. In
> 2.4 it finishes quickly (it uses 100% CPU, then occasionally syncs the
> databases - heavy write traffic for a second - then continues), but
> 2.5 has problems with it (it gets stuck writing to disk all the time, CPU
> usage is minimal and process progresses very slowly). Andrew, if
> you're interested I can send you some numbers to describe the case
> better.

Hmmm, are you using ext3? Changes to the meaning of yield sometimes
make fsync go very slowly. This problem has been around since 2.5.28,
and hasn't yet been fixed (As for a fix, Andrew Morton said "I'll sit tight for
the while, see where sched_yield() behaviour ends up").

All the best,

Duncan.

2002-10-08 15:05:02

by Zlatko Calusic

Subject: Re: [PATCH] Re: Shared memory shmat/dt not working well in 2.5.x

Duncan Sands writes:

>> I also observed that other application I use occasionally - LXR (Linux
>> source cross referencing tool) - takes much longer to generate xref
>> database (which is in Berkeley DB files). It works in three passes,
>> where the last one, when it dumps symbols into DB, is interesting. In
>> 2.4 it finishes quickly (it uses 100% CPU, then occasionally syncs the
>> databases - heavy write traffic for a second - then continues), but
>> 2.5 has problems with it (it gets stuck writing to disk all the time, CPU
>> usage is minimal and process progresses very slowly). Andrew, if
>> you're interested I can send you some numbers to describe the case
>> better.
>
> Hmmm, are you using ext3? Changes to the meaning of yield sometimes
> make fsync go very slowly. This problem has been around since 2.5.28,
> and hasn't yet been fixed (As for a fix, Andrew Morton said "I'll sit tight for
> the while, see where sched_yield() behaviour ends up").
>

Yes, it's an ext3 partition, ordered mode. I don't have ext2 compiled
into kernel anymore. :)

Hm, if it's a problem with fsync() then that could explain slight
Oracle slowdown, too, as I think that Oracle is a heavy user of
fsync. But I don't know that for sure. I'll investigate further..

Regards,
--
Zlatko

2002-10-08 15:20:36

by Duncan Sands

Subject: Re: [PATCH] Re: Shared memory shmat/dt not working well in 2.5.x

> > Hmmm, are you using ext3? Changes to the meaning of yield sometimes
> > make fsync go very slowly. This problem has been around since 2.5.28,
> > and hasn't yet been fixed (As for a fix, Andrew Morton said "I'll sit
> > tight for the while, see where sched_yield() behaviour ends up").
>
> Yes, it's an ext3 partition, ordered mode. I don't have ext2 compiled
> into kernel anymore. :)
>
> Hm, if it's a problem with fsync() then that could explain slight
> Oracle slowdown, too, as I think that Oracle is a heavy user of
> fsync. But I don't know that for sure. I'll investigate further..

Andrew Morton made this suggestion to me:

>Please try replacing the yield() in fs/jbd/transaction.c
>with
>
> set_current_state(TASK_RUNNING);
> schedule();

and indeed it cured my problems.

All the best,

Duncan.