2004-01-15 22:32:41

by Marcelo Tosatti

Subject: Linux 2.4.25-pre5


Hi,

Here is -pre5.

This version fixes the "memory filled with unfreeable inodes" problem,
which occurs on high-memory machines under some workloads. I believe that
was the last big problem the 2.4 VM had with highmem.

It contains a few other important fixes:
- fixes an SMP deadlock introduced during 2.4.23 (which could hang the
machine under heavy filesystem activity).
- fixes a memory allocation deadlock in USB

Amongst others.

Please test it!

Detailed changelog follows


Summary of changes from v2.4.25-pre4 to v2.4.25-pre5
============================================

<bjorn.helgaas:hp.com>:
o ia64 Configure.help update

<davej:redhat.com>:
o Add AGP support for Radeon IGP 345M

<jack:ucw.cz>:
o Fix ext3/quota deadlock

<khali:linux-fr.org>:
o i2c cleanups: Config.in
o i2c cleanup: saa7146.h should include i2c-old.h, not i2c.h
o i2c cleanup: i2c-core fixes

<len.brown:intel.com>:
o [ACPI] fix smpboot.c mis-merge http://bugzilla.kernel.org/show_bug.cgi?id=1706

<marcelo:logos.cnet>:
o Cset exclude: [email protected]|ChangeSet|20040109135735|05388
o Fix microcode update compilation error
o Fix Makefile typo

<moilanen:austin.ibm.com>:
o [PPC64] Improved NVRAM handling
o [PPC64] Buffer error log entries in NVRAM

<nitin.a.kamble:intel.com>:
o microcode update

<rtjohnso:eecs.berkeley.edu>:
o USB ioctl fixes (vicam.c, w9968cf.c)

<sfr:au1.ibm.com>:
o [PPC64] Fix a compile warning that becomes an error with gcc 3.4

<thomas:winischhofer.net>:
o SiS Framebuffer driver update

<xose:wanadoo.es>:
o ips SCSI driver update

Adrian Bunk:
o fix CONFIG_DS1742 Config.in entry
o remove REPORT_LUNS from cpqfcTSstructs.h
o disallow modular CONFIG_COMX

Alan Cox:
o Fix USB hangs
o Minimal fix for the R128 drivers

Bartlomiej Zolnierkiewicz:
o create /proc/ide/hdX/capacity only once

Ben Collins:
o [IEEE1394]: Fix bug in updating configrom

David Engebretsen:
o [PPC64] Distribute processing of hypervisor events over all processors

David Woodhouse:
o Fix SMP deadlock in __wait_on_freeing_inode() (introduced during 2.4.23)

Hugh Dickins:
o tmpfs readdir does not update dir atime

Paul Mackerras:
o [PPC64] Remove some unnecessary code from arch/ppc64/kernel/prom.c
o [PPC64] Make /dev/sda3 the default root device (rather than sda2)
o [PPC64] Add functions to update and manage flash ROM under Linux on pSeries
o [PPC64] Update defconfig and the example configs

Pete Zaitcev:
o Unhork ymfpci broken by hasty janitors

Rik van Riel:
o Reclaim inodes with highmem pages when low on memory

Tom Rini:
o PPC32: Add support for the CPCI-405 board
o PPC32: Fix cross-compilation from Solaris or Cygwin
o PPC32: s/CONFIG_SMC2_UART/CONFIG_8xx_SMC2/g to match the code


2004-01-15 23:09:02

by David Miller

Subject: Re: Linux 2.4.25-pre5

On Thu, 15 Jan 2004 21:12:37 -0200
Arnaldo Carvalho de Melo <[email protected]> wrote:

> Dave, haven't checked, but perhaps this cures it:
>
> <marcelo:logos.cnet>:
> o Cset exclude: [email protected]|ChangeSet|20040109135735|05388
> o Fix microcode update compilation error
> o Fix Makefile typo
> ^^^^^^^^^^^^^^^^^^^^
> ^^^^^^^^^^^^^^^^^^^^
> ^^^^^^^^^^^^^^^^^^^^

Don't think so, current bk://linux.bkbits.net/linux-2.4 that I can see
still has "UBLEVEL" at the beginning of line 3 of linux/Makefile

2004-01-15 22:55:42

by David Miller

Subject: Re: Linux 2.4.25-pre5

On Thu, 15 Jan 2004 18:19:40 -0200 (BRST)
Marcelo Tosatti <[email protected]> wrote:

> Here is -pre5.

If this is anything like the current 2.4.x BK tree, people will need this
in order to get a successful build:

--- Makefile.~1~ Thu Jan 15 12:13:10 2004
+++ Makefile Thu Jan 15 12:13:12 2004
@@ -1,6 +1,6 @@
 VERSION = 2
 PATCHLEVEL = 4
-UBLEVEL = 25
+SUBLEVEL = 25
 EXTRAVERSION = -pre5

 KERNELRELEASE=$(VERSION).$(PATCHLEVEL).$(SUBLEVEL)$(EXTRAVERSION)

2004-01-15 23:03:17

by Arnaldo Carvalho de Melo

Subject: Re: Linux 2.4.25-pre5

Em Thu, Jan 15, 2004 at 02:55:19PM -0800, David S. Miller escreveu:
> On Thu, 15 Jan 2004 18:19:40 -0200 (BRST)
> Marcelo Tosatti <[email protected]> wrote:
>
> > Here is -pre5.
>
> If this is anything like the current 2.4.x BK tree, people will need this
> in order to get a successful build:
>

Dave, haven't checked, but perhaps this cures it:

<marcelo:logos.cnet>:
o Cset exclude: [email protected]|ChangeSet|20040109135735|05388
o Fix microcode update compilation error
o Fix Makefile typo
^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^

2004-01-15 23:13:39

by Arnaldo Carvalho de Melo

Subject: Re: Linux 2.4.25-pre5

Em Thu, Jan 15, 2004 at 03:01:37PM -0800, David S. Miller escreveu:
> On Thu, 15 Jan 2004 21:12:37 -0200
> Arnaldo Carvalho de Melo <[email protected]> wrote:
>
> > Dave, haven't checked, but perhaps this cures it:
> >
> > <marcelo:logos.cnet>:
> > o Cset exclude: [email protected]|ChangeSet|20040109135735|05388
> > o Fix microcode update compilation error
> > o Fix Makefile typo
> > ^^^^^^^^^^^^^^^^^^^^
> > ^^^^^^^^^^^^^^^^^^^^
> > ^^^^^^^^^^^^^^^^^^^^
>
> Don't think so, current bk://linux.bkbits.net/linux-2.4 that I can see
> still has "UBLEVEL" at the beginning of line 3 of linux/Makefile

oops, Marcelo?

2004-01-15 23:42:54

by Marcelo Tosatti

Subject: Re: Linux 2.4.25-pre5



On Thu, 15 Jan 2004, David S. Miller wrote:

> On Thu, 15 Jan 2004 18:19:40 -0200 (BRST)
> Marcelo Tosatti <[email protected]> wrote:
>
> > Here is -pre5.
>
> If this is anything like the current 2.4.x BK tree, people will need this
> in order to get a successful build:
>
> --- Makefile.~1~ Thu Jan 15 12:13:10 2004
> +++ Makefile Thu Jan 15 12:13:12 2004
> @@ -1,6 +1,6 @@
> VERSION = 2
> PATCHLEVEL = 4
> -UBLEVEL = 25
> +SUBLEVEL = 25
> EXTRAVERSION = -pre5
>
> KERNELRELEASE=$(VERSION).$(PATCHLEVEL).$(SUBLEVEL)$(EXTRAVERSION)

I forgot to "bk push" (I never forget, you know).

Sir Woodhouse,

I just managed to crash a SMP box running dbench.

I guess this is related to your deadlock fix?

Unable to handle kernel NULL pointer dereference at virtual address 00000000
c011a673
*pde = 1efe0001
Oops: 0000
CPU: 6
EIP: 0010:[<c011a673>]

Not tainted
Using defaults from ksymoops -t elf32-i386-a i386
EFLAGS: 00010097
eax: ec4f81d4 ebx: df0f7dd0 ecx: 00000000 edx: 00000003
esi: cd09a1a0 edi: ec4f81d0 ebp: defe3e30 esp: defe3e18
ds: 0018 es: 0018 ss: 0018
Process sh (pid: 2325, stackpage=defe3000)
Stack:
00000001 00000282 00000003 ec4f8120 cd09a1a0 cd09e800 f6ef1002 c0163e24
ec4f8120 ec4f8120 defd9990 00000001 42132674 f5d426e0 cd09a1a0 f6efb483
cd09a1f3 c0166010 cd09e800 00001002 cd09a1a0 ffffffea 00000000 cd09c440

Call Trace: [<c0163e24>] [<c0166010>] [<c015620b>] [<c016400e>] [<c014c3a3>]
[<c014ccdf>] [<c014d16b>] [<c014d66d>] [<c0140ea4>] [<c014c02f>] [<c01411f4>]
[<c0108c83>]
Code: 8b 01 85 45 f0 74 69 31 d2 9c 5e fa f0 fe 0d 80 7c 3a c0 0f

>>EIP; c011a673 <__wake_up+33/c0> <=====
Trace; c0163e24 <proc_get_inode+64/140>
Trace; c0166010 <proc_lookup+c0/e0>
Trace; c015620b <d_alloc+1b/180>
Trace; c016400e <proc_root_lookup+2e/50>
Trace; c014c3a3 <real_lookup+73/100>
Trace; c014ccdf <link_path_walk+76f/a20>
Trace; c014d16b <path_lookup+1b/30>
Trace; c014d66d <open_namei+6d/640>
Trace; c0140ea4 <filp_open+34/60>
Trace; c014c02f <getname+5f/a0>
Trace; c01411f4 <sys_open+34/a0>
Trace; c0108c83 <system_call+33/40>
Code; c011a673 <__wake_up+33/c0>
00000000 <_EIP>:
Code; c011a673 <__wake_up+33/c0> <=====
0: 8b 01 mov (%ecx),%eax <=====
Code; c011a675 <__wake_up+35/c0>
2: 85 45 f0 test %eax,0xfffffff0(%ebp)
Code; c011a678 <__wake_up+38/c0>
5: 74 69 je 70 <_EIP+0x70> c011a6e3 <__wake_up+a3/c0>
Code; c011a67a <__wake_up+3a/c0>
7: 31 d2 xor %edx,%edx
Code; c011a67c <__wake_up+3c/c0>
9: 9c pushf
Code; c011a67d <__wake_up+3d/c0>
a: 5e pop %esi
Code; c011a67e <__wake_up+3e/c0>
b: fa cli
Code; c011a67f <__wake_up+3f/c0>
c: f0 fe 0d 80 7c 3a c0 lock decb 0xc03a7c80
Code; c011a686 <__wake_up+46/c0>
13: 0f 00 00 sldtl (%eax)


proc_get_inode() does not call __wake_up(), so I'm wondering what's going
on here.

0xc0163e17 <proc_get_inode+87>: mov 0x20(%edi),%eax
0xc0163e1a <proc_get_inode+90>: push %ebx
0xc0163e1b <proc_get_inode+91>: call *0x8(%eax)
0xc0163e1e <proc_get_inode+94>: push %ebx
0xc0163e1f <proc_get_inode+95>: call 0xc01584c0 <unlock_new_inode>
****> 0xc0163e24 <proc_get_inode+100>: pop %edi
0xc0163e25 <proc_get_inode+101>: pop %eax
0xc0163e26 <proc_get_inode+102>: test %ebx,%ebx

2004-01-16 07:45:24

by David Woodhouse

Subject: Re: Linux 2.4.25-pre5

On Thu, 2004-01-15 at 21:23 -0200, Marcelo Tosatti wrote:
> On Thu, 15 Jan 2004, David S. Miller wrote:
>
> > On Thu, 15 Jan 2004 18:19:40 -0200 (BRST)
> > Marcelo Tosatti <[email protected]> wrote:
> >
> > > Here is -pre5.
> >
> > If this is anything like the current 2.4.x BK tree, people will need this
> > in order to get a successful build:
> >
> > --- Makefile.~1~ Thu Jan 15 12:13:10 2004
> > +++ Makefile Thu Jan 15 12:13:12 2004
> > @@ -1,6 +1,6 @@
> > VERSION = 2
> > PATCHLEVEL = 4
> > -UBLEVEL = 25
> > +SUBLEVEL = 25
> > EXTRAVERSION = -pre5
> >
> > KERNELRELEASE=$(VERSION).$(PATCHLEVEL).$(SUBLEVEL)$(EXTRAVERSION)
>
> I forgot to "bk push" (I never forget, you know).
>
> Sir Woodhouse,
>
> I just managed to crash a SMP box running dbench.
>
> I guess this is related to your deadlock fix?

Oh bugger. It's all very well observing that it's OK to leave ourselves
on the wait queue because the inode is about to be freed..... but the
wait queue doesn't get reinitialised again when the inode is allocated
again from the same slab page. Someone _else_ inherits the stale wait
queue.

I hereby declare myself to be Today's Official Mr Fuck All Good.

===== fs/inode.c 1.48 vs edited =====
--- 1.48/fs/inode.c Wed Jan 14 20:51:18 2004
+++ edited/fs/inode.c Fri Jan 16 07:43:14 2004
@@ -96,6 +96,7 @@
 	if (inode) {
 		struct address_space * const mapping = &inode->i_data;

+		init_waitqueue_head(&inode->i_wait);
 		inode->i_sb = sb;
 		inode->i_dev = sb->s_dev;
 		inode->i_blkbits = sb->s_blocksize_bits;
@@ -147,7 +148,6 @@

 void __inode_init_once(struct inode *inode)
 {
-	init_waitqueue_head(&inode->i_wait);
 	INIT_LIST_HEAD(&inode->i_hash);
 	INIT_LIST_HEAD(&inode->i_data.clean_pages);
 	INIT_LIST_HEAD(&inode->i_data.dirty_pages);


--
dwmw2
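
The stale wait queue described above can be illustrated with a minimal
userspace sketch. It assumes a one-object "slab" whose constructor runs only
the first time the object is created; the struct layouts, the nr_waiters
counter and the reinit_on_alloc/reinit_on_free switches are hypothetical
stand-ins for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct wait_queue_head {
    int nr_waiters;                 /* entries still sitting on the queue */
};

struct inode {
    struct wait_queue_head i_wait;
    int i_ino;
};

/* A one-object "slab": the constructor runs only on first creation. */
static struct inode slab_object;
static bool slab_constructed;

static void init_waitqueue_head(struct wait_queue_head *q)
{
    q->nr_waiters = 0;
}

/* Models a slab constructor: called once per object, not once per allocation. */
static void inode_init_once(struct inode *inode)
{
    init_waitqueue_head(&inode->i_wait);
}

static struct inode *alloc_inode(bool reinit_on_alloc)
{
    struct inode *inode = &slab_object;

    if (!slab_constructed) {
        inode_init_once(inode);
        slab_constructed = true;
    }
    if (reinit_on_alloc)            /* models reinitialising at allocation time */
        init_waitqueue_head(&inode->i_wait);
    inode->i_ino = 0;
    return inode;
}

static void destroy_inode(struct inode *inode, bool reinit_on_free)
{
    if (reinit_on_free)             /* models reinitialising before freeing */
        init_waitqueue_head(&inode->i_wait);
    /* The object goes back to the slab; the constructor will not run again. */
}

/* Models a waiter that leaves its entry queued because the inode is being freed. */
static void wait_on_freeing_inode(struct inode *inode)
{
    inode->i_wait.nr_waiters++;
}

/* Returns how many stale waiters the next owner of the object inherits. */
static int stale_waiters_after_reuse(bool reinit_on_alloc, bool reinit_on_free)
{
    slab_constructed = false;

    struct inode *first = alloc_inode(reinit_on_alloc);
    wait_on_freeing_inode(first);
    destroy_inode(first, reinit_on_free);

    struct inode *second = alloc_inode(reinit_on_alloc);
    assert(second == first);        /* the same slab object is handed out again */
    return second->i_wait.nr_waiters;
}
```

Calling stale_waiters_after_reuse(false, false) models constructor-only
initialisation, where the next owner inherits the leftover entry. Passing true
for either flag models reinitialising the head at allocation or before
freeing, and the inherited count drops to zero.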


2004-01-16 08:45:22

by Andrew Morton

Subject: Re: Linux 2.4.25-pre5

David Woodhouse <[email protected]> wrote:
>
> ===== fs/inode.c 1.48 vs edited =====
> --- 1.48/fs/inode.c Wed Jan 14 20:51:18 2004
> +++ edited/fs/inode.c Fri Jan 16 07:43:14 2004
> @@ -96,6 +96,7 @@
> if (inode) {
> struct address_space * const mapping = &inode->i_data;
>
> + init_waitqueue_head(&inode->i_wait);
> inode->i_sb = sb;
> inode->i_dev = sb->s_dev;
> inode->i_blkbits = sb->s_blocksize_bits;
> @@ -147,7 +148,6 @@
>
> void __inode_init_once(struct inode *inode)
> {
> - init_waitqueue_head(&inode->i_wait);
> INIT_LIST_HEAD(&inode->i_hash);
> INIT_LIST_HEAD(&inode->i_data.clean_pages);
> INIT_LIST_HEAD(&inode->i_data.dirty_pages);

Really, the init_waitqueue_head() should be done prior to putting the inode
back into slab.

2004-01-16 08:58:21

by David Woodhouse

Subject: Re: Linux 2.4.25-pre5

On Fri, 2004-01-16 at 00:45 -0800, Andrew Morton wrote:
> Really, the init_waitqueue_head() should be done prior to putting the inode
> back into slab.

I had that version first but preferred doing it with all the other inode
initialisation in alloc_inode() rather than in destroy_inode(). If you
do it this way, you reinit even when you're about to discard the slab
pages. I don't care much though.

===== inode.c 1.48 vs edited =====
--- 1.48/fs/inode.c Wed Jan 14 20:51:18 2004
+++ edited/inode.c Fri Jan 16 08:56:14 2004
@@ -127,6 +127,10 @@
 {
 	if (inode_has_buffers(inode))
 		BUG();
+	/* Reinitialise the waitqueue head because __wait_on_freeing_inode()
+	   may have left stale entries on it which it can't remove (since
+	   it knows we're freeing the inode right now) */
+	init_waitqueue_head(&inode->i_wait);
 	if (inode->i_sb->s_op->destroy_inode)
 		inode->i_sb->s_op->destroy_inode(inode);
 	else

--
dwmw2


2004-01-19 15:15:46

by Geert Uytterhoeven

Subject: Re: Linux 2.4.25-pre5

On Thu, 15 Jan 2004, Marcelo Tosatti wrote:
> Summary of changes from v2.4.25-pre4 to v2.4.25-pre5
> ============================================
>
> Rik van Riel:
> o Reclaim inodes with highmem pages when low on memory

This introduces a warning when compiling fs/inode.c if CONFIG_HIGHMEM is not
set, since in that case avg_pages is defined but not used.
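
This kind of warning appears when a local variable is declared
unconditionally but only read inside a CONFIG_HIGHMEM block. A minimal sketch
of one way to avoid it, assuming a hypothetical helper: pages_to_reclaim() and
its arithmetic are illustrative only and are not the code in fs/inode.c.

```c
#include <stddef.h>

/* Hypothetical helper for illustration; not the kernel's function. */
unsigned long pages_to_reclaim(unsigned long nr_pages, unsigned long nr_inodes)
{
    unsigned long target = 0;

#ifdef CONFIG_HIGHMEM
    /*
     * Declared inside the guard, so builds without CONFIG_HIGHMEM never
     * see a variable that is defined but not used.
     */
    unsigned long avg_pages = nr_inodes ? nr_pages / nr_inodes : 0;

    target = avg_pages * 2;
#else
    /* Nothing to do without highmem; mark the parameters as deliberately unused. */
    (void)nr_pages;
    (void)nr_inodes;
#endif

    return target;
}
```

Keeping the declaration next to its only use means both configurations
compile without the warning, at the cost of a declaration in the middle of the
function body.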

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds