2008-02-21 11:36:16

by Haavard Skinnemoen

[permalink] [raw]
Subject: Re: Linux 2.6.24.atmel.1 MMC/SD

(Adding the ext2/ext3/ext4 list to Cc)

Note that the MMC/SD card driver in question, atmel-mci, is not in
mainline, and may be the real cause of this problem. But it looks like
there might be a potential problem in the ext3 code as well?

Haavard

On Thu, 21 Feb 2008 14:17:04 +0800
Hein_Tibosch <[email protected]> wrote:

> Hi James,
>
>
> I've had all kinds of problems with the SD-card hooked to an NGW100, just as John Voltz reported earlier:
>
> http://www.avr32linux.org/archives/kernel/2007-November/000421.html
> http://www.avr32linux.org/archives/kernel/2007-November/000425.html
>
> I debugged this problem and my conclusion is: using an SD-card may lead to both BUS-errors and a complete hanging of the system, with 2.6.23.atmel.5 as well as 2.6.24.atmel.1.
>
> Both the ext2 and ext3 drivers use this type of function to iterate through an array of directory entries:
>
> static inline ext2_dirent *ext2_next_entry(ext2_dirent *p)
> {
>     return (ext2_dirent *)((char *)p + le16_to_cpu(p->rec_len));
> }
>
> static inline struct ext3_dir_entry_2 *
> ext3_next_entry(struct ext3_dir_entry_2 *p)
> {
>     return (struct ext3_dir_entry_2 *)((char *)p +
>         ext3_rec_len_from_disk(p->rec_len));
> }
>
>
> Sometimes rec_len is checked for a zero value, and sometimes the entry is checked thoroughly for validity (as with ext2_check_page() or ext3_check_dir_entry()), but in other cases rec_len isn't checked at all! This is the case in e.g. fs/ext3/namei.c, function ext3_dx_find_entry(). This function has always been compiled in since 2.6.24 (CONFIG_EXT3_INDEX is not used anymore).
>
> I had a card on which rec_len turned out to be a small negative number in one place. When iterating, the code would either loop forever (until the WDT fired) or run into invalid memory (OOPS: BUS error).
>
> (Strange, though, that rec_len appeared to be negative: I had just done a "mkfs -t ext3" on Ubuntu. Could that be caused by the Atmel driver?)
>
> I don't yet feel qualified to submit a patch for this; I only fixed it for myself. Maybe someone can pick this up: a validity check should be made before any call to xxx_next_entry().
>
>
> Regards,
>
> Hein Tibosch (HeinBali at avr32linux)
>
>
>
> James Stewart wrote:
> Hi,
>
> I'm wondering if there are any known issues with booting from SD card on the ATNGW100 using this kernel. I get a bunch of ext2-looking errors and then a stack dump immediately after mounting VFS. 2.6.23.atmel.5 runs perfectly, however.
>
> This is just compiling using atngw100_defconfig.
>
> Thanks,
>
> James
>
> ------------------------------------------------
>
> _______________________________________________
> Kernel mailing list
> [email protected]
> http://duppen.flaskehals.net/cgi-bin/mailman/listinfo/kernel
>


2008-02-21 17:44:23

by Hein_Tibosch

[permalink] [raw]
Subject: Re: Linux 2.6.24.atmel.1 MMC/SD

My email got refused because it contained HTML. Here again:

Crashes in ext2 and ext3 filesystem:

John's dump shows that the PC was in ext2_find_entry when it crashed. I had
exactly the same type of OOPS with ext3_find_entry and found it was
actually in the static function ext3_dx_find_entry, which was inlined by
the compiler. There it got stuck because of an invalid rec_len, which
made the pointer decrease (wrap around) instead of increase.

And so, ext3 looks as vulnerable as ext2, because in both drivers this
iteration sometimes takes place without checking the validity of data
read from the SD-card.

Whatever data the Atmel driver delivers, I think it should be checked,
including for rec_len values that cause a wrap-around of the pointer. Now
that I check it more thoroughly, my NGW100 boots and runs perfectly
from an SD-card.
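
A minimal user-space sketch of the kind of check meant here, not the actual kernel patch. It assumes the usual 8-byte ext2/ext3 directory entry header (inode, rec_len, name_len, file_type) and a block smaller than 64 KiB; the names count_dir_entries and read_rec_len are made up for illustration. The walk refuses any rec_len that is shorter than the header, not a multiple of 4, or longer than what is left of the block, so the offset can only move forward and can never leave the block:

```c
#include <stddef.h>
#include <stdint.h>

/* inode (4) + rec_len (2) + name_len (1) + file_type (1) */
#define DIRENT_HEADER_LEN 8u

/* Decode the little-endian rec_len stored at byte offset 4 of an entry. */
static unsigned read_rec_len(const unsigned char *entry)
{
    return (unsigned)entry[4] | ((unsigned)entry[5] << 8);
}

/*
 * Walk all entries in one directory block.
 * Returns the number of entries, or -1 as soon as a rec_len is implausible.
 */
static int count_dir_entries(const unsigned char *block, size_t block_size)
{
    size_t off = 0;
    int count = 0;

    while (off < block_size) {
        unsigned rec_len;

        /* The fixed header itself must fit in what is left of the block. */
        if (block_size - off < DIRENT_HEADER_LEN)
            return -1;

        rec_len = read_rec_len(block + off);

        /* Too short for a header, misaligned, or running past the block. */
        if (rec_len < DIRENT_HEADER_LEN || rec_len % 4 != 0 ||
            rec_len > block_size - off)
            return -1;

        off += rec_len;  /* always advances by at least 8 bytes */
        count++;
    }
    return count;
}
```

This is deliberately less thorough than ext3_check_dir_entry(); for example, it does not compare rec_len against name_len.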


Hein Tibosch



John Voltz wrote:
> I believe this might be what he is talking about. I had to reformat
> my SD card to ext3 to use 2.6.23/24
>
> John
>
> Oops: Unhandled exception in kernel mode, sig: 7 [#1]
> PREEMPT FRAME_POINTER chip: 0x01f:0x1e82 rev 2
> Modules linked in: snd_pcm_oss snd_mixer_oss snd_atmel_ac97 snd_ac97_codec snd_pcm snd_timer snd soundcore snd_page_alloc ac97_bus
> PC is at ext2_find_entry+0x9c/0x16c


2008-03-04 19:43:04

by Hein_Tibosch

[permalink] [raw]
Subject: Ext2 - ext3 unstable under 2.6.24: now solved (?)

Could someone please check the following?

The ext2 and ext3 filesystems of 2.6.24 show many Oopses and hangs.
After debugging I found the following common cause:

In a new 2.6.24 function an unwanted sign-extension takes place in:

fs/ext2/dir.c

static inline unsigned ext2_rec_len_from_disk(__le16 dlen)
{
    unsigned len = le16_to_cpu(dlen);

    if (len == EXT2_MAX_REC_LEN)
        return 1 << 16;
    return len;
}

include/ext3_fs.h :

static inline unsigned ext3_rec_len_from_disk(__le16 dlen)
{
    unsigned len = le16_to_cpu(dlen);

    if (len == EXT3_MAX_REC_LEN)
        return 1 << 16;
    return len;
}

The on-disk bytes 00 A0 (0xA000 as a little-endian 16-bit value) will be returned as 0xFFFFA000 !!
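
For illustration: 0xFFFFA000 is exactly what results when the 16-bit value 0xA000 is widened to 32 bits as a signed quantity instead of an unsigned one. The sketch below only reproduces that arithmetic in portable C. It is an assumption that something equivalent happens inside the AVR32 le16_to_cpu() path; the thread does not establish where the signed step comes from, and both function names are made up:

```c
#include <stdint.h>

/*
 * Hypothetical faulty path: the byte-swapped value passes through a
 * signed 16-bit type before being widened.
 */
static uint32_t widen_via_signed(uint16_t swapped)
{
    /* With gcc, 0xA000 becomes -24576 here (bit 15 is the sign bit). */
    int16_t as_signed = (int16_t)swapped;

    /* Widening a negative value replicates the sign bit upwards. */
    return (uint32_t)as_signed;
}

/* Intended path: the value stays unsigned, so it is zero-extended. */
static uint32_t widen_via_unsigned(uint16_t swapped)
{
    return (uint32_t)swapped;
}
```

Only values with bit 15 set are affected, which would fit the observation that most directory entries (with small rec_len values) are read correctly.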

Much of the code that iterates through dirents uses the above functions to
determine the start of the next dirent (ext2_dirent, ext3_dir_entry_2).
See fs/ext2/dir.c and fs/ext3/namei.c

As a test I replaced "le16_to_cpu()" by a simple:

static inline unsigned my_le16_to_cpu(__le16 value)
{
    return ((value & 0x00FF) << 8) | ((value & 0xFF00) >> 8);
}

It showed no more "negative" rec_len values, which were causing the crashes,
and both ext2 and ext3 now run stably.
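
Note that the test replacement above swaps the two bytes unconditionally, so it is only a valid conversion on a big-endian CPU such as the AP7000. A host-independent way to get the same result is to assemble the value from the individual bytes, using unsigned arithmetic throughout so that no sign extension can occur. This is a sketch only; le16_from_bytes is a made-up name and not a kernel helper:

```c
#include <stdint.h>

/*
 * Decode a little-endian 16-bit value from two bytes in memory.
 * p[0] is the low byte and p[1] the high byte, regardless of host
 * byte order.
 */
static uint16_t le16_from_bytes(const unsigned char *p)
{
    return (uint16_t)((unsigned)p[0] | ((unsigned)p[1] << 8));
}
```

Assigning the result to an unsigned variable then zero-extends it, so the bytes 00 A0 yield 0xA000 rather than 0xFFFFA000.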

Compiler: gcc version 4.1.2 (Ubuntu 4.1.2-0ubuntu4)
Kernel: 2.6.24.atmel.1
Platform: Atmel AP7000 CPU, compiling with "ARCH=avr32
CROSS_COMPILE=avr32-linux-"


Hein Tibosch




2008-03-05 00:22:54

by Andreas Dilger

[permalink] [raw]
Subject: Re: Ext2 - ext3 unstable under 2.6.24: now solved (?)

On Mar 05, 2008 03:42 +0800, Hein_Tibosch wrote:
> Could someone please check the following?
>
> The ext2 and ext3 filesystems of 2.6.24 show many Oops and hangups. After
> debugging I found the following common cause:
>
> In a new 2.6.24 function an unwanted sign-extension takes place in:
>
> fs/ext2/dir.c
>
> static inline unsigned ext2_rec_len_from_disk(__le16 dlen)
> {
>     unsigned len = le16_to_cpu(dlen);
>
>     if (len == EXT2_MAX_REC_LEN)
>         return 1 << 16;
>     return len;
> }
>
> include/ext3_fs.h :
>
> static inline unsigned ext3_rec_len_from_disk(__le16 dlen)
> {
>     unsigned len = le16_to_cpu(dlen);
>
>     if (len == EXT3_MAX_REC_LEN)
>         return 1 << 16;
>     return len;
> }
>
> 00A0 will be returned as 0xFFFFA000 !!

Presumably this is a big-endian architecture? It would appear to be a bug
in the le16_to_cpu() code rather than the functions above, since they are
always using an unsigned variable.

I suppose it would be possible to mask off the returned value, but this
seems like it is fixing the problem at the wrong level:

return (len & 0xffffU);
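
Spelled out as a stand-alone sketch, the masking workaround would look roughly like this. The function takes the (possibly sign-extended) result of le16_to_cpu() as a plain unsigned so that it compiles outside the kernel. SKETCH_MAX_REC_LEN is assumed to match the kernel's EXT3_MAX_REC_LEN of (1 << 16) - 1, and both names are made up for this example:

```c
/* Assumed to equal EXT3_MAX_REC_LEN / EXT2_MAX_REC_LEN in 2.6.24. */
#define SKETCH_MAX_REC_LEN ((1u << 16) - 1)

/*
 * Same logic as ext3_rec_len_from_disk(), but the upper 16 bits of
 * the converted value are discarded first, so a sign-extended input
 * such as 0xFFFFA000 is reduced to 0xA000 before it is used.
 */
static unsigned rec_len_masked(unsigned converted)
{
    unsigned len = converted & 0xffffU;

    if (len == SKETCH_MAX_REC_LEN)
        return 1u << 16;
    return len;
}
```

This only hides the symptom in the ext2/ext3 code; any other caller of le16_to_cpu() would still see the bad value.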


Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.