2002-11-29 23:30:46

by J.A. Magallon

Subject: [PATCHSET] Linux 2.4.20-jam0

Hi all...

New announcement of the -jam patches. While we all wait for a real
-aa1 from Andrea, here goes a -jam0 (i.e., not -jam1), with the -aa
patch ported. It runs fine on my box.

Additions since last release (see README for credits...):

- reverted the fast-pte part of -aa. Still have to try again
to see if it is more stable now.
- force-inline patch.
- 4M queue size for block writes
- P4 prefetching
- Orlov inode allocator for 2.4
- BProc 3.2.3

(BTW, I have been looking for Orlov for ext2 - does it exist? -
and htree for ext2/3. Any pointers?)

As always, get it at:

http://giga.cps.unizar.es/~magallon/linux/kernel/2.4.20-jam0.tar.gz
http://giga.cps.unizar.es/~magallon/linux/kernel/2.4.20-jam0/

Enjoy !!

--
J.A. Magallon <[email protected]> \ Software is like sex:
werewolf.able.es \ It's better when it's free
Mandrake Linux release 9.1 (Cooker) for i586
Linux 2.4.20-jam0 (gcc 3.2 (Mandrake Linux 9.1 3.2-4mdk))


2002-11-30 00:40:36

by Andrew Morton

Subject: Re: [PATCHSET] Linux 2.4.20-jam0

"J.A. Magallon" wrote:
>
> - Orlov inode allocator for 2.4

The Orlov allocator in 2.5 has caused a tremendous performance regression
in dbench-on-ext3/ordered-on-scsi.

I don't know why yet - I doubt if it's due to the allocator itself - more
likely an IO scheduling bug in ext3, or a bug in the 2.5 elevator.

There is no such regression on IDE - presumably write caching is covering
up the problem.

So that's something to watch out for.

(where did your Orlov patch come from? All the tabs are mangled)

You'll need to port this missing bit, which provides the `oldalloc'
and `orlov' mount options.


fs/ext3/super.c | 4 ++++
1 files changed, 4 insertions(+)

--- 25/fs/ext3/super.c~ext3-oldalloc Fri Nov 29 02:21:20 2002
+++ 25-akpm/fs/ext3/super.c Fri Nov 29 02:22:03 2002
@@ -662,6 +662,10 @@ static int parse_options (char * options
 				return 0;
 			sbi->s_resuid = v;
 		}
+		else if (!strcmp (this_char, "oldalloc"))
+			set_opt (sbi->s_mount_opt, OLDALLOC);
+		else if (!strcmp (this_char, "orlov"))
+			clear_opt (sbi->s_mount_opt, OLDALLOC);
 #ifdef CONFIG_JBD_DEBUG
 		else if (!strcmp (this_char, "ro-after")) {
 			unsigned long v;

_
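
For readers unfamiliar with how these two options behave, here is a
minimal user-space sketch of the same logic, not the kernel code itself.
It assumes a single flag bit: `oldalloc' sets it and `orlov' clears it,
so Orlov is what you get whenever the bit is unset. The sketch_ names and
the flag value are made up for illustration; the real set_opt/clear_opt
macros and the OLDALLOC constant live in the ext3 headers.

```c
#include <string.h>

/* Hypothetical flag value for this sketch only; the kernel defines
 * its own constant for the OLDALLOC mount flag. */
#define SKETCH_MOUNT_OLDALLOC 0x0002UL

/* Simplified user-space stand-ins for the kernel's set_opt/clear_opt. */
#define sketch_set_opt(o, flag)   ((o) |= (flag))
#define sketch_clear_opt(o, flag) ((o) &= ~(flag))

/*
 * Apply one option token to the mount flag word.
 * Returns 1 if the token was recognized, 0 otherwise.
 */
static int sketch_parse_alloc_option(const char *this_char,
                                     unsigned long *mount_opt)
{
    if (!strcmp(this_char, "oldalloc")) {
        /* Select the old inode allocator. */
        sketch_set_opt(*mount_opt, SKETCH_MOUNT_OLDALLOC);
        return 1;
    }
    if (!strcmp(this_char, "orlov")) {
        /* Clearing the flag falls back to the Orlov allocator. */
        sketch_clear_opt(*mount_opt, SKETCH_MOUNT_OLDALLOC);
        return 1;
    }
    return 0;
}
```

Because the two options toggle the same bit, whichever one appears last
in the option string wins in this sketch.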

2002-11-30 06:30:59

by Hu Gang

Subject: Re: [PATCHSET] Linux 2.4.20-jam0

On Sat, 30 Nov 2002 00:38:07 +0100
"J.A. Magallon" <[email protected]> wrote:

> - Orlov inode allocator for 2.4

- add Andrew Morton's super.c patch
- change the indentation to Linux standard.


--
- Hu Gang


Attachments:
2.4.20_orlov-indent (17.43 kB)

2002-11-30 14:38:18

by J.A. Magallon

Subject: Re: [PATCHSET] Linux 2.4.20-jam0


On 2002.11.30 Andrew Morton wrote:
>"J.A. Magallon" wrote:
>>
>> - Orlov inode allocator for 2.4
>
>The Orlov allocator in 2.5 has caused a tremendous performance regression
>in dbench-on-ext3/ordered-on-scsi.
>
>I don't know why yet - I doubt if it's due to the allocator itself - more
>likely an IO scheduling bug in ext3, or a bug in the 2.5 elevator.
>
>There is no such regression on IDE - presumably write caching is covering
>up the problem.
>

Is there any way I can test that ? I have all scsi drives and can
for example remount with 'orlov' or 'oldalloc'...

>So that's something to watch out for.
>
>(where did your Orlov patch come from? All the tabs are mangled)
>

See the other answer to previous message...

>You'll need to port this missing bit, which provides the `oldalloc'
>and `orlov' mount options.
>

Thanks, I will add it...
BTW, who names the options? Wouldn't it be more intuitive to add options
like 'ialloc_std' or 'ialloc_orlov'? Is it too late to change this?

TIA

--
J.A. Magallon <[email protected]> \ Software is like sex:
werewolf.able.es \ It's better when it's free
Mandrake Linux release 9.1 (Cooker) for i586
Linux 2.4.20-jam0 (gcc 3.2 (Mandrake Linux 9.1 3.2-4mdk))

2002-11-30 14:51:27

by J.A. Magallon

Subject: Re: [PATCHSET] Linux 2.4.20-jam0


On 2002.11.30 hugang wrote:
>On Sat, 30 Nov 2002 00:38:07 +0100
>"J.A. Magallon" <[email protected]> wrote:
>
>> - Orlov inode allocator for 2.4
>
>- add Andrew Morton's super.c patch
>- change the indentation to Linux standard.
>

Thanks, I will update it.
Just a note for further updates: could you make the patch from /usr/src,
instead of /usr/src/linux, so it can be applied with patch -p1?
It is the standard way...
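
To illustrate why the starting directory matters, here is a rough sketch
of what the -p option does to the file names in a patch: it drops the
given number of leading path components, treating a run of adjacent
slashes as one. This is not code from patch itself; the function name is
hypothetical, and returning NULL when there are too few components is
just a choice made for this sketch.

```c
#include <stddef.h>
#include <string.h>

/*
 * Strip n leading path components from path, in the spirit of
 * "patch -pN". Returns a pointer into path, or NULL if path has
 * fewer than n slash-separated leading components (sketch behaviour).
 */
static const char *sketch_strip_components(const char *path, int n)
{
    const char *p = path;

    while (n > 0) {
        const char *slash = strchr(p, '/');

        if (slash == NULL)
            return NULL;
        p = slash + 1;
        /* Adjacent slashes count as a single separator. */
        while (*p == '/')
            p++;
        n--;
    }
    return p;
}
```

A diff made from /usr/src would name something like
linux/fs/ext3/super.c, which -p1 reduces to fs/ext3/super.c. A diff made
inside /usr/src/linux names fs/ext3/super.c already, and -p1 would cut it
down to ext3/super.c, which is not what you want.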

TIA

--
J.A. Magallon <[email protected]> \ Software is like sex:
werewolf.able.es \ It's better when it's free
Mandrake Linux release 9.1 (Cooker) for i586
Linux 2.4.20-jam0 (gcc 3.2 (Mandrake Linux 9.1 3.2-4mdk))

2002-11-30 14:55:29

by J.A. Magallon

Subject: Re: [PATCHSET] Linux 2.4.20-jam0


On 2002.11.30 Sean Neakums wrote:
>commence J.A. Magallon quotation:
>
>> Thanks, I will add it...
>> BTW, who names the options? Wouldn't it be more intuitive to add options
>> like 'ialloc_std' or 'ialloc_orlov'? Is it too late to change this?
>
>There isn't exactly a whole lot of contention in the mount-options
>namespace. And neither orlov nor ialloc_orlov is in any way
>"intuitive". However, orlov is more guessable, to my mind, than
>ialloc_orlov.
>

Well, what I think would be more understandable when reading /etc/fstab
is something like 'inode_allocator=std' or 'inode_allocator=orlov' or
'inode_allocator=xxxxx' if something new appears.
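
A rough sketch of how such a key=value option could be parsed, just to
make the idea concrete. Nothing like this exists in the ext3 code; the
'inode_allocator=' key, the enum, and the function name are all
hypothetical.

```c
#include <string.h>

/* Hypothetical allocator choices for this sketch. */
enum sketch_ialloc {
    SKETCH_IALLOC_STD,
    SKETCH_IALLOC_ORLOV,
    SKETCH_IALLOC_UNKNOWN
};

/*
 * Parse one option token of the form "inode_allocator=<name>".
 * Anything else, including an unrecognized name, yields UNKNOWN.
 */
static enum sketch_ialloc sketch_parse_inode_allocator(const char *opt)
{
    static const char key[] = "inode_allocator=";
    size_t keylen = sizeof(key) - 1;

    if (strncmp(opt, key, keylen) != 0)
        return SKETCH_IALLOC_UNKNOWN;

    opt += keylen;  /* skip past the key to the value */
    if (!strcmp(opt, "std"))
        return SKETCH_IALLOC_STD;
    if (!strcmp(opt, "orlov"))
        return SKETCH_IALLOC_ORLOV;
    return SKETCH_IALLOC_UNKNOWN;
}
```

A new allocator would then only need one more value under the same key,
instead of yet another top-level option name.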

--
J.A. Magallon <[email protected]> \ Software is like sex:
werewolf.able.es \ It's better when it's free
Mandrake Linux release 9.1 (Cooker) for i586
Linux 2.4.20-jam0 (gcc 3.2 (Mandrake Linux 9.1 3.2-4mdk))

2002-11-30 14:51:18

by Sean Neakums

Subject: Re: [PATCHSET] Linux 2.4.20-jam0

commence J.A. Magallon quotation:

> Thanks, I will add it...
> BTW, who names the options? Wouldn't it be more intuitive to add options
> like 'ialloc_std' or 'ialloc_orlov'? Is it too late to change this?

There isn't exactly a whole lot of contention in the mount-options
namespace. And neither orlov nor ialloc_orlov is in any way
"intuitive". However, orlov is more guessable, to my mind, than
ialloc_orlov.

--
/ |
[|] Sean Neakums | Questions are a burden to others;
[|] <[email protected]> | answers a prison for oneself.
\ |

2002-11-30 17:02:49

by Mike Galbraith

Subject: Re: [PATCHSET] Linux 2.4.20-jam0

At 03:58 PM 11/30/2002 +0100, J.A. Magallon wrote:

>On 2002.11.30 hugang wrote:
> >On Sat, 30 Nov 2002 00:38:07 +0100
> >"J.A. Magallon" <[email protected]> wrote:
> >
> >> - Orlov inode allocator for 2.4
> >
 > >- add Andrew Morton's super.c patch
 > >- change the indentation to Linux standard.
 > >
>
>Thanks, I will update it.
>Just a note for further updates: could you make the patch from /usr/src,
>instead of /usr/src/linux, so it can be applied with patch -p1 ?
>It is the standard way...

Concur... easiest to read.

-Mike

2002-11-30 17:43:47

by Andrea Arcangeli

Subject: Re: [PATCHSET] Linux 2.4.20-jam0

On Sat, Nov 30, 2002 at 12:38:07AM +0100, J.A. Magallon wrote:
> - reverted the fast-pte part of -aa. Still have to try again
> to see if it is more stable now.

AFAIK this was reproduced by Srihari on nohighmem, so it must be that
somebody is calling free_pgd_fast on a pgd that cannot be re-used.
Can you try this patch on top of 2.4.20rc2aa1? (or jam0 after backing
out the fast-pte removal that would otherwise forbid the debugging check
to trigger)

--- 2.4.20rc2aa1/include/asm-i386/pgalloc.h.~1~ 2002-11-27 10:09:30.000000000 +0100
+++ 2.4.20rc2aa1/include/asm-i386/pgalloc.h 2002-11-30 18:43:29.000000000 +0100
@@ -97,6 +97,20 @@ static inline pgd_t *get_pgd_fast(void)
 
 static inline void free_pgd_fast(pgd_t *pgd)
 {
+	{
+		int i;
+		for (i = 0; i < USER_PTRS_PER_PGD; i++)
+			if (pgd_val(pgd[i])) {
+				printk("non zero idx %d\n", i);
+				BUG();
+			}
+		for (i = USER_PTRS_PER_PGD; i < PTRS_PER_PGD - USER_PTRS_PER_PGD -
+		     ((-VMALLOC_START + PGDIR_SIZE - 1) >> PGDIR_SHIFT); i++)
+			if (pgd_val(pgd[i]) != pgd_val(swapper_pg_dir[i])) {
+				printk("corrupted idx %d\n", i);
+				BUG();
+			}
+	}
 	*(unsigned long *)pgd = (unsigned long) pgd_quicklist;
 	pgd_quicklist = (unsigned long *) pgd;
 	pgtable_cache_size++;

The stack trace should tell us who is freeing an invalid pgd.
Without this check the crash happens in an innocent place and it's not
obvious why it breaks.
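
The idea behind the check can be modelled in user space roughly as
below: refuse to put a table on the free list unless its user slots are
empty and its remaining slots still match a reference table. This is
only a sketch under simplified assumptions. The table size, the slot
split, the reference contents, and all sketch_ names are invented, and
it compares every remaining slot, which is not the loop bound used in
the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define SKETCH_ENTRIES      8   /* hypothetical table size */
#define SKETCH_USER_ENTRIES 6   /* hypothetical: slots that must be empty */

/* Stands in for swapper_pg_dir: the reference for the shared slots. */
static uintptr_t sketch_template[SKETCH_ENTRIES] = {
    0, 0, 0, 0, 0, 0, 0x1000, 0x2000
};

/* Head of the free list of recycled tables. */
static uintptr_t *sketch_quicklist;

/* Return the index of the first bad slot, or -1 if the table is clean. */
static int sketch_check_table(const uintptr_t *tbl)
{
    int i;

    /* User slots must have been cleared before the table is freed. */
    for (i = 0; i < SKETCH_USER_ENTRIES; i++)
        if (tbl[i])
            return i;
    /* Shared slots must still match the reference table. */
    for (i = SKETCH_USER_ENTRIES; i < SKETCH_ENTRIES; i++)
        if (tbl[i] != sketch_template[i])
            return i;
    return -1;
}

/* Push tbl on the free list only if it passes the check. */
static int sketch_free_table_fast(uintptr_t *tbl)
{
    int bad = sketch_check_table(tbl);

    if (bad >= 0) {
        fprintf(stderr, "bad idx %d\n", bad);
        return -1;  /* the kernel patch calls BUG() at this point */
    }
    tbl[0] = (uintptr_t) sketch_quicklist;  /* link through slot 0 */
    sketch_quicklist = tbl;
    return 0;
}
```

Catching the bad table at free time is what makes the report point at
the caller, instead of at whoever happens to reuse the table later.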

Andrea

2002-11-30 22:08:27

by Andrew Morton

Subject: Re: [PATCHSET] Linux 2.4.20-jam0

"J.A. Magallon" wrote:
>
> On 2002.11.30 Andrew Morton wrote:
> >"J.A. Magallon" wrote:
> >>
> >> - Orlov inode allocator for 2.4
> >
> >The Orlov allocator in 2.5 has caused a tremendous performance regression
> >in dbench-on-ext3/ordered-on-scsi.
> >
> >I don't know why yet - I doubt if it's due to the allocator itself - more
> >likely an IO scheduling bug in ext3, or a bug in the 2.5 elevator.
> >
> >There is no such regression on IDE - presumably write caching is covering
> >up the problem.
> >
>
> Is there any way I can test that ? I have all scsi drives and can
> for example remount with 'orlov' or 'oldalloc'...

It is specific to SMP, and for some reason doesn't manifest with
IDE hardware.

See
http://sourceforge.net/mailarchive/forum.php?thread_id=1365460&forum_id=6379
for the analysis.

2002-11-30 23:29:25

by J.A. Magallon

Subject: Re: [PATCHSET] Linux 2.4.20-jam0


On 2002.11.30 Andrea Arcangeli wrote:
>On Sat, Nov 30, 2002 at 12:38:07AM +0100, J.A. Magallon wrote:
>> - reverted the fast-pte part of -aa. Still have to try again
>> to see if it is more stable now.
>
>AFAIK this was reproduced by Srihari on nohighmem, so it must be that
>somebody is calling free_pgd_fast on a pgd that cannot be re-used.
>Can you try this patch on top of 2.4.20rc2aa1? (or jam0 after backing
>out the fast-pte removal that would otherwise forbid the debugging check
>to trigger)
>

Yes, I will try. I hope I exercise the part of the kernel that triggers this.

--
J.A. Magallon <[email protected]> \ Software is like sex:
werewolf.able.es \ It's better when it's free
Mandrake Linux release 9.1 (Cooker) for i586
Linux 2.4.20-jam1 (gcc 3.2 (Mandrake Linux 9.1 3.2-4mdk))

2002-12-01 01:47:41

by J.A. Magallon

Subject: Re: [PATCHSET] Linux 2.4.20-jam0


On 2002.11.30 Andrea Arcangeli wrote:
>On Sat, Nov 30, 2002 at 12:38:07AM +0100, J.A. Magallon wrote:
>> - reverted the fast-pte part of -aa. Still have to try again
>> to see if it is more stable now.
>
>AFAIK this was reproduced by Srihari on nohighmem, so it must be that
>somebody is calling free_pgd_fast on a pgd that cannot be re-used.
>Can you try this patch on top of 2.4.20rc2aa1? (or jam0 after backing
>out the fast-pte removal that would otherwise forbid the debugging check
>to trigger)
>

I suppose this will be useless (tainted ;))
BTW, what does the symbol address mismatch mean?

ksymoops 2.4.8 on i686 2.4.20-jam1. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.4.20-jam1/ (default)
-m /boot/System.map-2.4.20-jam1 (default)

Warning: You did not tell me where to find symbol information. I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc. ksymoops -h explains the options.

Warning (compare_maps): mismatch on symbol __nvsym03120 , nvdriver says 692dac20, /lib/modules/2.4.20-jam1/video/nvdriver.o says 692d3560. Ignoring /lib/modules/2.4.20-jam1/video/nvdriver.o entry
Dec 1 02:35:57 werewolf kernel: Unable to handle kernel paging request at virtual address 47000be8
Dec 1 02:35:57 werewolf kernel: 4012060d
Dec 1 02:35:57 werewolf kernel: *pde = 070001e3
Dec 1 02:35:57 werewolf kernel: Oops: 0000 2.4.20-jam1 #4 SMP dom dic 1 00:44:09 CET 2002
Dec 1 02:35:57 werewolf kernel: CPU: 0
Dec 1 02:35:57 werewolf kernel: EIP: 0010:[dup_mmap+285/458] Tainted: P
Dec 1 02:35:57 werewolf kernel: EIP: 0010:[<4012060d>] Tainted: P
Using defaults from ksymoops -t elf32-i386 -a i386
Dec 1 02:35:57 werewolf kernel: EFLAGS: 00010202
Dec 1 02:35:57 werewolf kernel: eax: 42527780 ebx: 4387b6e0 ecx: 00000000 edx: 47000be0
Dec 1 02:35:57 werewolf kernel: esi: 467a5544 edi: 4387b724 ebp: 467a5500 esp: 4504bf28
Dec 1 02:35:57 werewolf kernel: ds: 0018 es: 0018 ss: 0018
Dec 1 02:35:57 werewolf kernel: Process rc (pid: 3543, stackpage=4504b000)
Dec 1 02:35:57 werewolf kernel: Stack: 419afeac 000001f0 46c62a80 4504a000 46c62a8c 42527820 4252783c 4252780c
Dec 1 02:35:57 werewolf kernel: 42527780 4011f65c 42527780 000001f0 fffffff4 5046e000 43273a64 42cd0aa4
Dec 1 02:35:57 werewolf kernel: 00000011 4011fdfb 00000011 5046e000 4504bf98 4504bf98 4504bfa8 00000000
Dec 1 02:35:57 werewolf kernel: Call Trace: [copy_mm+252/352] [do_fork+843/2272] [sys_fork+39/48] [system_call+51/56]
Dec 1 02:35:57 werewolf kernel: Call Trace: [<4011f65c>] [<4011fdfb>] [<40107d07>] [<40109777>]
Dec 1 02:35:57 werewolf kernel: Code: 8b 42 08 8b 48 08 f0 ff 42 14 f6 43 15 08 74 07 f0 ff 89 18


>>EIP; 4012060d <dup_mmap+11d/1ca> <=====

>>eax; 42527780 <[videodev].data.end+c8341/110c21>
>>ebx; 4387b6e0 <[8390].rodata.end+d00291/34bcc11>
>>edx; 47000be0 <[mii].text.end+c8e404/122d884>
>>esi; 467a5544 <[mii].text.end+432d68/122d884>
>>edi; 4387b724 <[8390].rodata.end+d002d5/34bcc11>
>>ebp; 467a5500 <[mii].text.end+432d24/122d884>
>>esp; 4504bf28 <[8390].rodata.end+24d0ad9/34bcc11>

Trace; 4011f65c <copy_mm+fc/160>
Trace; 4011fdfb <do_fork+34b/8e0>
Trace; 40107d07 <sys_fork+27/30>
Trace; 40109777 <system_call+33/38>

Code; 4012060d <dup_mmap+11d/1ca>
00000000 <_EIP>:
Code; 4012060d <dup_mmap+11d/1ca> <=====
0: 8b 42 08 mov 0x8(%edx),%eax <=====
Code; 40120610 <dup_mmap+120/1ca>
3: 8b 48 08 mov 0x8(%eax),%ecx
Code; 40120613 <dup_mmap+123/1ca>
6: f0 ff 42 14 lock incl 0x14(%edx)
Code; 40120617 <dup_mmap+127/1ca>
a: f6 43 15 08 testb $0x8,0x15(%ebx)
Code; 4012061b <dup_mmap+12b/1ca>
e: 74 07 je 17 <_EIP+0x17> 40120624 <dup_mmap+134/1ca>
Code; 4012061d <dup_mmap+12d/1ca>
10: f0 ff 89 18 00 00 00 lock decl 0x18(%ecx)


2 warnings issued. Results may not be reliable.
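
As a reading aid for the decode above (a small sketch, nothing ksymoops
itself runs): the faulting instruction is `mov 0x8(%edx),%eax`, so the
address touched is edx plus 8, and the `dup_mmap+11d` notation means that
EIP minus 0x11d is where the symbol starts. The helper names are
hypothetical; the numbers to feed them come from the log.

```c
#include <stdint.h>

/* Address accessed by a "disp(%reg)" operand: register value plus
 * displacement, in 32-bit arithmetic as on i386. */
static uint32_t sketch_effective_address(uint32_t base, uint32_t disp)
{
    return base + disp;
}

/* Start of a symbol, given an address inside it and the "+offset"
 * part of the symbol+offset/size notation. */
static uint32_t sketch_symbol_start(uint32_t addr, uint32_t offset)
{
    return addr - offset;
}
```

With edx = 47000be0 and displacement 8, the first helper gives 47000be8,
which matches the address in the "Unable to handle kernel paging request"
line. With EIP = 4012060d and offset 11d, the second gives 401204f0.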

--
J.A. Magallon <[email protected]> \ Software is like sex:
werewolf.able.es \ It's better when it's free
Mandrake Linux release 9.1 (Cooker) for i586
Linux 2.4.20-jam1 (gcc 3.2 (Mandrake Linux 9.1 3.2-4mdk))