2008-02-22 14:32:23

by Unknown

Subject: PROBLEM: 2.4.36.1 hangs.

1) 2.4.36.1 hangs (probably during ext2_readdir())

2) 2.4.36.1 hangs during compilation of lighttpd-1.4.18.
It is probably during ext2_readdir() but I cannot confirm that..
Currently it only happens during compilation of lighttpd-1.4.18, always
in the same place.
Kernel 2.4.36 w/ same options (make oldconfig) works fine.

Last few lines of 'strace -f make' just before hang:
[pid 12817] open(".", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 3
[pid 12817] fstat64(3, {st_mode=S_IFDIR|0777, st_size=8192, ...}) = 0
[pid 12817] fcntl64(3, F_SETFD, FD_CLOEXEC) = 0
[pid 12817] getdents64(3, /* 114 entries */, 4096) = 4088
[pid 12817] getdents64(3, /* 63 entries */, 4096) = 2424
[pid 12817] getdents64(3,

3) ext2fs

4) None.

5) None. Kernel just hangs.

6)
cd /home/root/src/lighttpd-1.4.18
N=lighttpd
AP="/usr/local/$N"
EP="/usr/local"
ETC="/etc/$N"
./configure --prefix=$AP --bindir=$EP/bin --sbindir=$EP/sbin \
--libexecdir=$EP/libexec --mandir=$EP/man --sysconfdir=$ETC --disable-ipv6
make

7.1) Slackware-11.0
Linux darkstar 2.4.36.1 #2 Fri Feb 22 14:13:50 CET 2008 i686 pentium3 i386
GNU/Linux

Gnu C 3.4.6
Gnu make 3.81
binutils 2.15.92.0.2
util-linux 2.12r
mount 2.12r
modutils 2.4.27
e2fsprogs 1.38
Linux C Library 2.3.6
Dynamic linker (ldd) 2.3.6
Procps 3.2.7
Net-tools 1.60
Kbd 74:
Sh-utils 5.97
Modules Loaded

7.2)
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 8
model name : Pentium III (Coppermine)
stepping : 10
cpu MHz : 930.329
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov
pat pse36 mmx fxsr sse
bogomips : 1854.66

7.3) None
7.4)
# cat /proc/ioports:
0000-001f : dma1
0020-003f : pic1
0040-0043 : timer0
0050-0053 : timer1
0060-006f : keyboard
0080-008f : dma page reg
00a0-00bf : pic2
00c0-00df : dma2
00f0-00ff : fpu
01f0-01f7 : ide0
03c0-03df : vga+
03f6-03f6 : ide0
03f8-03ff : serial(auto)
0cf8-0cff : PCI conf1
1000-100f : Intel Corp. 82801AA IDE
1000-1007 : ide0
1008-100f : ide1
1400-141f : Intel Corp. 82801AA USB
1800-180f : Intel Corp. 82801AA SMBus
2400-243f : Intel Corp. 82557/8/9 [Ethernet Pro 100]
2400-243f : eepro100
f000-f07f : motherboard
f000-f003 : PM1a_EVT_BLK
f004-f005 : PM1a_CNT_BLK
f008-f00b : PM_TMR
f028-f02b : GPE0_BLK
f02c-f02f : GPE1_BLK
f100-f10f : motherboard
f180-f1bf : motherboard
f800-f81f : motherboard
f820-f82f : motherboard
fe00-fe00 : motherboard

# cat /proc/iomem
00000000-0009ebff : System RAM
0009ec00-0009ffff : reserved
000a0000-000bffff : Video RAM area
000c0000-000c7fff : Video ROM
000c8000-000c97ff : Extension ROM
000f0000-000fffff : System ROM
00100000-1feeffff : System RAM
00100000-0027773b : Kernel code
0027773c-002c52e3 : Kernel data
1fef0000-1fefefff : ACPI Tables
1feff000-1fefffff : ACPI Non-volatile Storage
1ff00000-1fffffff : reserved
f4000000-f407ffff : Intel Corp. 82810E DC-133 CGC [Chipset Graphics
Controller]
f4100000-f411ffff : Intel Corp. 82557/8/9 [Ethernet Pro 100]
f4120000-f4120fff : Intel Corp. 82557/8/9 [Ethernet Pro 100]
f4120000-f4120fff : eepro100
f8000000-fbffffff : Intel Corp. 82810E DC-133 CGC [Chipset Graphics
Controller]
ffb80000-ffbfffff : reserved
fff00000-ffffffff : reserved

7.5)
# lspci
00:00.0 Host bridge: Intel Corporation 82810E DC-133 GMCH [Graphics Memory
Controller Hub] (rev 03)
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR+ FastB2B-
Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort+ >SERR- <PERR-
Latency: 0

00:01.0 VGA compatible controller: Intel Corporation 82810E DC-133 CGC
[Chipset Graphics Controller] (rev 03) (prog-if 00 [VGA])
Subsystem: Siemens Nixdorf AG Unknown device 004a
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Latency: 0
Interrupt: pin A routed to IRQ 11
Region 0: Memory at f8000000 (32-bit, prefetchable) [size=64M]
Region 1: Memory at f4000000 (32-bit, non-prefetchable)
[size=512K]
Capabilities: [dc] Power Management version 1
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA
PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:1e.0 PCI bridge: Intel Corporation 82801AA PCI Bridge (rev 02) (prog-if
00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR+ FastB2B-
Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Latency: 0
Bus: primary=00, secondary=01, subordinate=01, sec-latency=64
I/O behind bridge: 00002000-00002fff
Memory behind bridge: f4100000-f41fffff
Prefetchable memory behind bridge: fff00000-000fffff
Secondary status: 66MHz- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort+ <SERR- <PERR-
BridgeCtl: Parity- SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B-

00:1f.0 ISA bridge: Intel Corporation 82801AA ISA Bridge (LPC) (rev 02)
Control: I/O+ Mem+ BusMaster+ SpecCycle+ MemWINV- VGASnoop-
ParErr- Stepping- SERR+ FastB2B-
Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Latency: 0

00:1f.1 IDE interface: Intel Corporation 82801AA IDE (rev 02) (prog-if 80
[Master])
Subsystem: Intel Corporation 82801AA IDE
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Latency: 0
Region 4: I/O ports at 1000 [size=16]

00:1f.2 USB Controller: Intel Corporation 82801AA USB (rev 02) (prog-if 00
[UHCI])
Subsystem: Intel Corporation 82801AA USB
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Latency: 0
Interrupt: pin D routed to IRQ 9
Region 4: I/O ports at 1400 [size=32]

00:1f.3 SMBus: Intel Corporation 82801AA SMBus (rev 02)
Subsystem: Intel Corporation 82801AA SMBus
Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Interrupt: pin B routed to IRQ 5
Region 4: I/O ports at 1800 [size=16]

01:08.0 Ethernet controller: Intel Corporation 82557/8/9 [Ethernet Pro
100] (rev 09)
Subsystem: Siemens Nixdorf AG Unknown device 004b
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop-
ParErr- Stepping- SERR+ FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Latency: 66 (2000ns min, 14000ns max), Cache Line Size: 32 bytes
Interrupt: pin A routed to IRQ 9
Region 0: Memory at f4120000 (32-bit, non-prefetchable) [size=4K]
Region 1: I/O ports at 2400 [size=64]
Region 2: Memory at f4100000 (32-bit, non-prefetchable)
[size=128K]
Expansion ROM at <unassigned> [disabled] [size=1M]
Capabilities: [dc] Power Management version 2
Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA
PME(D0+,D1+,D2+,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=2 PME-

7.6) None

7.7) Box has been tested via memtest86+
15 hrs of test running, no errors


2008-02-23 08:39:56

by Willy Tarreau

Subject: Re: PROBLEM: 2.4.36.1 hangs.

Hello,

first, thanks for your detailed report.

On Fri, Feb 22, 2008 at 03:08:30PM +0100, Unknown wrote:
> 1) 2.4.36.1 hangs (probably during ext2_readdir())
>
> 2) 2.4.36.1 hangs during compilation of lighttpd-1.4.18.
> It is probably during ext2_readdir() but I cannot confirm that..
> Currently it only happens during compilation of lighttpd-1.4.18, always
> in the same place.
> Kernel 2.4.36 w/ same options (make oldconfig) works fine.
>
> Last few lines of 'strace -f make' just before hang:
> [pid 12817] open(".", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 3
> [pid 12817] fstat64(3, {st_mode=S_IFDIR|0777, st_size=8192, ...}) = 0
> [pid 12817] fcntl64(3, F_SETFD, FD_CLOEXEC) = 0
> [pid 12817] getdents64(3, /* 114 entries */, 4096) = 4088
> [pid 12817] getdents64(3, /* 63 entries */, 4096) = 2424
> [pid 12817] getdents64(3,

OK, could you please try to revert the attached patch (patch -Rp1) to see if
this fixes the problem for you ? Please keep Dann and me in CC as it's not
easy to spot 2.4-related threads on LKML these days!

Thanks,
Willy

----

commit c30306fb287323591c854a0982d9fa5351859b45
Author: dann frazier <[email protected]>
Date: Mon Jan 21 17:13:06 2008 -0700

ext2_readdir() filp->f_pos fix

This is a 2.4 backport of a linux-2.6 change by Jan Blunck
(old-2.6-bkcvs commit 2196b4744393d4f6c06fc4d63b98556d05b90933)

Commit log from 2.6 follows.

[PATCH] ext2_readdir() filp->f_pos fix

If the whole directory is read, ext2_readdir() sets the f_pos to a multiple
of the page size (because of the conditions of the outer for loop). This
sets the wrong f_pos for directory inodes on ext2 partitions with a block
size differing from the page size.

Signed-off-by: dann frazier <[email protected]>

diff --git a/fs/ext2/dir.c b/fs/ext2/dir.c
index 58b76dd..b158e60 100644
--- a/fs/ext2/dir.c
+++ b/fs/ext2/dir.c
@@ -240,7 +240,7 @@ ext2_readdir (struct file * filp, void * dirent, filldir_t filldir)
loff_t pos = filp->f_pos;
struct inode *inode = filp->f_dentry->d_inode;
struct super_block *sb = inode->i_sb;
- unsigned offset = pos & ~PAGE_CACHE_MASK;
+ unsigned int offset = pos & ~PAGE_CACHE_MASK;
unsigned long n = pos >> PAGE_CACHE_SHIFT;
unsigned long npages = dir_pages(inode);
unsigned chunk_mask = ~(ext2_chunk_size(inode)-1);
@@ -258,8 +258,13 @@ ext2_readdir (struct file * filp, void * dirent, filldir_t filldir)
ext2_dirent *de;
struct page *page = ext2_get_page(inode, n);

- if (IS_ERR(page))
+ if (IS_ERR(page)) {
+ ext2_error(sb, __FUNCTION__,
+ "bad page in #%lu",
+ inode->i_ino);
+ filp->f_pos += PAGE_CACHE_SIZE - offset;
continue;
+ }
kaddr = page_address(page);
if (need_revalidate) {
offset = ext2_validate_entry(kaddr, offset, chunk_mask);
@@ -283,12 +288,12 @@ ext2_readdir (struct file * filp, void * dirent, filldir_t filldir)
ext2_put_page(page);
goto done;
}
+ filp->f_pos += le16_to_cpu(de->rec_len);
}
ext2_put_page(page);
}

done:
- filp->f_pos = (n << PAGE_CACHE_SHIFT) | offset;
filp->f_version = inode->i_version;
UPDATE_ATIME(inode);
return 0;

2008-02-24 18:29:03

by Willy Tarreau

Subject: Re: PROBLEM: 2.4.36.1 hangs.

On Sun, Feb 24, 2008 at 07:12:04PM +0100, Pascal Hambourg wrote:
> Hello,
>
> Willy Tarreau wrote :
> >
> >On Fri, Feb 22, 2008 at 03:08:30PM +0100, Unknown wrote:
> >
> >>2) 2.4.36.1 hangs during compilation of lighttpd-1.4.18.
> >> It is probably during ext2_readdir() but I cannot confirm that..
> >> Currently it only happens during compilation of lighttpd-1.4.18, always
> >> in the same place. [...]
>
> I experience the same problem when executing "ls /etc".
>
> >OK, could you please try to revert the attached patch (patch -Rp1) to see
> >if
> >this fixes the problem for you ? Please keep Dann and me in CC as it's not
> >easy to spot 2.4-related threads on LKML these days!
> >
> >----
> >
> >commit c30306fb287323591c854a0982d9fa5351859b45
> >Author: dann frazier <[email protected]>
> >Date: Mon Jan 21 17:13:06 2008 -0700
> >
> > ext2_readdir() filp->f_pos fix
>
> The patch didn't revert cleanly because of the two subsequent
> ext2-related patches, so I had to revert the three of them. It fixed the
> problem. Then I applied only the "ext2_readdir() filp->f_pos fix" patch
> again and the problem came back. HTH.

OK thanks very much Pascal.

I'm reverting the patch for the moment and will release 2.4.36.2 without
it.

Dann, it looks like the backport of the fix causes more trouble than it
attempts to fix :-/

(Un)fortunately I had no problem here, so I think it's not that easy to
reproduce the issue. If the fix is too hard to get right and the risk
of vulnerability very low, don't you think we should simply leave it
unfixed, at least until we really understand the nature of the problem ?

Thanks,
Willy

2008-02-24 18:47:37

by Pascal Hambourg

Subject: Re: PROBLEM: 2.4.36.1 hangs.

Hello,

Willy Tarreau wrote :
>
> On Fri, Feb 22, 2008 at 03:08:30PM +0100, Unknown wrote:
>
>>2) 2.4.36.1 hangs during compilation of lighttpd-1.4.18.
>> It is probably during ext2_readdir() but I cannot confirm that..
>> Currently it only happens during compilation of lighttpd-1.4.18, always
>> in the same place. [...]

I experience the same problem when executing "ls /etc".

> OK, could you please try to revert the attached patch (patch -Rp1) to see if
> this fixes the problem for you ? Please keep Dann and me in CC as it's not
> easy to spot 2.4-related threads on LKML these days!
>
> ----
>
> commit c30306fb287323591c854a0982d9fa5351859b45
> Author: dann frazier <[email protected]>
> Date: Mon Jan 21 17:13:06 2008 -0700
>
> ext2_readdir() filp->f_pos fix

The patch didn't revert cleanly because of the two subsequent
ext2-related patches, so I had to revert the three of them. It fixed the
problem. Then I applied only the "ext2_readdir() filp->f_pos fix" patch
again and the problem came back. HTH.

2008-02-26 07:47:15

by Glen Nakamura

Subject: Re: PROBLEM: 2.4.36.1 hangs.

Aloha,

The "ext2_readdir() filp->f_pos fix" patch looks weird...
Perhaps the "filp->f_pos += le16_to_cpu(de->rec_len);" line should be
outside of the if statement like the indentation implies?
As it is, filp->f_pos gets corrupted if de->inode is ever zero...
This could possibly explain why I had a few strange directory
entries until I checked the filesystem with:
e2fsck -D -F -f /dev/{ext2 partition}

- glen

Here is an updated (untested) patch:

--- linux-2.4.36.orig/fs/ext2/dir.c
+++ linux-2.4.36/fs/ext2/dir.c
@@ -240,7 +240,7 @@ ext2_readdir (struct file * filp, void *
loff_t pos = filp->f_pos;
struct inode *inode = filp->f_dentry->d_inode;
struct super_block *sb = inode->i_sb;
- unsigned offset = pos & ~PAGE_CACHE_MASK;
+ unsigned int offset = pos & ~PAGE_CACHE_MASK;
unsigned long n = pos >> PAGE_CACHE_SHIFT;
unsigned long npages = dir_pages(inode);
unsigned chunk_mask = ~(ext2_chunk_size(inode)-1);
@@ -258,8 +258,13 @@ ext2_readdir (struct file * filp, void *
ext2_dirent *de;
struct page *page = ext2_get_page(inode, n);

- if (IS_ERR(page))
+ if (IS_ERR(page)) {
+ ext2_error(sb, __FUNCTION__,
+ "bad page in #%lu",
+ inode->i_ino);
+ filp->f_pos += PAGE_CACHE_SIZE - offset;
continue;
+ }
kaddr = page_address(page);
if (need_revalidate) {
offset = ext2_validate_entry(kaddr, offset, chunk_mask);
@@ -267,7 +272,7 @@ ext2_readdir (struct file * filp, void *
}
de = (ext2_dirent *)(kaddr+offset);
limit = kaddr + PAGE_CACHE_SIZE - EXT2_DIR_REC_LEN(1);
- for ( ;(char*)de <= limit; de = ext2_next_entry(de))
+ for ( ;(char*)de <= limit; de = ext2_next_entry(de)) {
if (de->inode) {
int over;
unsigned char d_type = DT_UNKNOWN;
@@ -284,11 +289,12 @@ ext2_readdir (struct file * filp, void *
goto done;
}
}
+ filp->f_pos += le16_to_cpu(de->rec_len);
+ }
ext2_put_page(page);
}

done:
- filp->f_pos = (n << PAGE_CACHE_SHIFT) | offset;
filp->f_version = inode->i_version;
UPDATE_ATIME(inode);
return 0;

2008-02-26 08:16:41

by Willy Tarreau

Subject: Re: PROBLEM: 2.4.36.1 hangs.

On Mon, Feb 25, 2008 at 09:36:12PM -1000, Glen Nakamura wrote:
> Aloha,
>
> The "ext2_readdir() filp->f_pos fix" patch looks weird...
> Perhaps the "filp->f_pos += le16_to_cpu(de->rec_len);" line should be
> outside of the if statement like the indentation implies?

good catch! At least it's what is done in 2.6.

> As it is, filp->f_pos gets corrupted if de->inode is ever zero...
> This could possibly explain why I had a few strange directory
> entries until I checked the filesystem with:
> e2fsck -D -F -f /dev/{ext2 partition}
>
> - glen
>
> Here is an updated (untested) patch:

unfortunately, neither Dann nor me could reproduce the issue, so
we'll wait for some victims^Wvolunteers to give it a try.

BTW, I notice that 2.6 also has one extra chunk that 2.4 does not
have :

if (unlikely(need_revalidate)) {
+ if (offset) {
offset = ext2_validate_entry(kaddr, offset, chunk_mask);
+ filp->f_pos = (n<<PAGE_CACHE_SHIFT) + offset;
+ }
+ filp->f_version = inode->i_version;
need_revalidate = 0;
}

I have no idea whether this part is needed, we'd better ask Theo or Al
for some advices, as I'm not tempted by merging an uncertain patch when
it comes to filesystems.

Regards,
Willy

> --- linux-2.4.36.orig/fs/ext2/dir.c
> +++ linux-2.4.36/fs/ext2/dir.c
> @@ -240,7 +240,7 @@ ext2_readdir (struct file * filp, void *
> loff_t pos = filp->f_pos;
> struct inode *inode = filp->f_dentry->d_inode;
> struct super_block *sb = inode->i_sb;
> - unsigned offset = pos & ~PAGE_CACHE_MASK;
> + unsigned int offset = pos & ~PAGE_CACHE_MASK;
> unsigned long n = pos >> PAGE_CACHE_SHIFT;
> unsigned long npages = dir_pages(inode);
> unsigned chunk_mask = ~(ext2_chunk_size(inode)-1);
> @@ -258,8 +258,13 @@ ext2_readdir (struct file * filp, void *
> ext2_dirent *de;
> struct page *page = ext2_get_page(inode, n);
>
> - if (IS_ERR(page))
> + if (IS_ERR(page)) {
> + ext2_error(sb, __FUNCTION__,
> + "bad page in #%lu",
> + inode->i_ino);
> + filp->f_pos += PAGE_CACHE_SIZE - offset;
> continue;
> + }
> kaddr = page_address(page);
> if (need_revalidate) {
> offset = ext2_validate_entry(kaddr, offset, chunk_mask);
> @@ -267,7 +272,7 @@ ext2_readdir (struct file * filp, void *
> }
> de = (ext2_dirent *)(kaddr+offset);
> limit = kaddr + PAGE_CACHE_SIZE - EXT2_DIR_REC_LEN(1);
> - for ( ;(char*)de <= limit; de = ext2_next_entry(de))
> + for ( ;(char*)de <= limit; de = ext2_next_entry(de)) {
> if (de->inode) {
> int over;
> unsigned char d_type = DT_UNKNOWN;
> @@ -284,11 +289,12 @@ ext2_readdir (struct file * filp, void *
> goto done;
> }
> }
> + filp->f_pos += le16_to_cpu(de->rec_len);
> + }
> ext2_put_page(page);
> }
>
> done:
> - filp->f_pos = (n << PAGE_CACHE_SHIFT) | offset;
> filp->f_version = inode->i_version;
> UPDATE_ATIME(inode);
> return 0;

2008-02-26 10:14:13

by dann frazier

Subject: Re: PROBLEM: 2.4.36.1 hangs.

On Tue, Feb 26, 2008 at 09:16:25AM +0100, Willy Tarreau wrote:
> On Mon, Feb 25, 2008 at 09:36:12PM -1000, Glen Nakamura wrote:
> > Aloha,
> >
> > The "ext2_readdir() filp->f_pos fix" patch looks weird...
> > Perhaps the "filp->f_pos += le16_to_cpu(de->rec_len);" line should be
> > outside of the if statement like the indentation implies?
>
> good catch! At least it's what is done in 2.6.

Yes, that certainly looks like a bug.

> > As it is, filp->f_pos gets corrupted if de->inode is ever zero...
> > This could possibly explain why I had a few strange directory
> > entries until I checked the filesystem with:
> > e2fsck -D -F -f /dev/{ext2 partition}
> >
> > - glen
> >
> > Here is an updated (untested) patch:
>
> unfortunately, neither Dann nor me could reproduce the issue, so
> we'll wait for some victims^Wvolunteers to give it a try.

I'm now able to reliably reproduce it by creating/removing a chroot
(pbuilder create on a Debian system, though I'm sure a simpler test
exists). Correcting the le16_to_cpu placement as Glen described
fixes the issue for me.

> BTW, I notice that 2.6 also has one extra chunk that 2.4 does not
> have :
>
> if (unlikely(need_revalidate)) {
> + if (offset) {
> offset = ext2_validate_entry(kaddr, offset, chunk_mask);
> + filp->f_pos = (n<<PAGE_CACHE_SHIFT) + offset;
> + }
> + filp->f_version = inode->i_version;
> need_revalidate = 0;
> }
>
> I have no idea whether this part is needed, we'd better ask Theo or Al
> for some advices, as I'm not tempted by merging an uncertain patch when
> it comes to filesystems.

Looks like a test case may be available:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=2d7f2ea9c989853310c7f6e8be52cc090cc8e66b

--
dann frazier

2008-02-26 10:27:43

by Willy Tarreau

Subject: Re: PROBLEM: 2.4.36.1 hangs.

On Tue, Feb 26, 2008 at 03:13:46AM -0700, dann frazier wrote:
> I'm now able to reliably reproduce it by creating/removing a chroot
> (pbuilder create on a Debian system, though I'm sure a simpler test
> exists). Correcting the le16_to_cpu placement as Glen described
> fixes the issue for me.

OK that's excellent news.

> > BTW, I notice that 2.6 also has one extra chunk that 2.4 does not
> > have :
> >
> > if (unlikely(need_revalidate)) {
> > + if (offset) {
> > offset = ext2_validate_entry(kaddr, offset, chunk_mask);
> > + filp->f_pos = (n<<PAGE_CACHE_SHIFT) + offset;
> > + }
> > + filp->f_version = inode->i_version;
> > need_revalidate = 0;
> > }
> >
> > I have no idea whether this part is needed, we'd better ask Theo or Al
> > for some advices, as I'm not tempted by merging an uncertain patch when
> > it comes to filesystems.
>
> Looks like a test case may be available:
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=2d7f2ea9c989853310c7f6e8be52cc090cc8e66b

Dann, would you care to ping Al about the opportunity to merge this entire
patch, as well as Masoud for his test-case ? The patch looks like all that
remains different between 2.4 and 2.6, so I would bet we need it too.

Thanks,
Willy

2008-02-26 13:56:41

by Pascal Hambourg

Subject: Re: PROBLEM: 2.4.36.1 hangs.

dann frazier wrote:
>
> Correcting the le16_to_cpu placement as Glen described
> fixes the issue for me.

One of my boxes has at least six directories triggering the issue (I
must be very unlucky), and Glen's patch fixes it all. Thanks.