LinuxLists.cc - Crash (ext3 ) during 2.6.29-rc6 boot

Subject: Re: Crash (ext3 ) during 2.6.29-rc6 boot

Andrew Morton wrote:
> hm, I wonder what could have caused that - we haven't altered
> fs/ext3/xattr.c in ages.
>
> What is the most recent kernel version you know of which didn't do
> this? Bear in mind that this crash might be triggered by the
> current contents of the filesystem, so if possible, please test
> some other kernel versions on that disk.
>
I am trying to boot a vanilla kernel on this machine for the first
time. Haven't tried any other kernels. Will give it a try.

> It looks like we died in ext3_xattr_block_get():
>
> memcpy(buffer, bh->b_data + le16_to_cpu(entry->e_value_offs),
> size);
>
> Perhaps entry->e_value_offs is no good. I wonder if the filesystem is
> corrupted and this snuck through the defenses.
>
> I also wonder if there is enough info in that trace for a ppc person to
> be able to determine whether the faulting address is in the source or
> destination of the memcpy() (please)?
>
Some more information if this could be of any help.

0:mon> di 0xc000000000039574
c000000000039574 e9240008 ld r9,8(r4)
c000000000039578 409d0010 ble cr7,c000000000039588 # .memcpy+0x88/0x244
c00000000003957c 79290002 rotldi r9,r9,32
c000000000039580 91230000 stw r9,0(r3)
c000000000039584 38630004 addi r3,r3,4
c000000000039588 409e0010 bne cr7,c000000000039598 # .memcpy+0x98/0x244
c00000000003958c 79298000 rotldi r9,r9,16
c000000000039590 b1230000 sth r9,0(r3)
c000000000039594 38630002 addi r3,r3,2
c000000000039598 409f000c bns cr7,c0000000000395a4 # .memcpy+0xa4/0x244
c00000000003959c 79294000 rotldi r9,r9,8
c0000000000395a0 99230000 stb r9,0(r3)
c0000000000395a4 e8610030 ld r3,48(r1)
c0000000000395a8 4e800020 blr
c0000000000395ac 78a6e8c2 rldicl r6,r5,61,3
c0000000000395b0 38a5fff0 addi r5,r5,-16
0:mon> r
R00 = 000000000000e40f R16 = 00000000100edbc8
R01 = c00000003e59b3e0 R17 = 00000000100b0000
R02 = c0000000009c2110 R18 = 0000000000000005
R03 = c000000044bc90e0 R19 = 00000000fff0d7a8
R04 = c000000039cffff4 R20 = 00000000fff0d708
R05 = 0000000000000003 R21 = 00000000000000ff
R06 = 0000000000000000 R22 = 0000000000000006
R07 = 0000000000000001 R23 = c00000000079ab49
R08 = 723a7573725f743a R24 = c0000000372fe2a8
R09 = 3a6f626a6563745f R25 = c000000044bc90c8
R10 = c00000003b250968 R26 = c0000000372fe240
R11 = c000000000039500 R27 = c0000000372fe3b0
R12 = d00000000244c590 R28 = c0000000372c5280
R13 = c000000000a53480 R29 = 000000000000001b
R14 = 00000000100d0000 R30 = d0000000024654d0
R15 = 0000000000000000 R31 = ffffffffffffffde
pc = c000000000039574 .memcpy+0x74/0x244
lr = d00000000244916c .ext3_xattr_get+0x288/0x2f4 [ext3]
msr = 8000000000009032 cr = 4400844b
ctr = 0000000000000000 xer = 0000000000000001 trap = 300
dar = c000000039d00000 dsisr = 40000000
0:mon>

The other thing I noticed was that this machine was running
a kernel with selinux enabled. I turned off selinux and there
were no issues during bootup. It was a clean boot.

Thanks
-Sachin

--

---------------------------------
Sachin Sant
IBM Linux Technology Center
India Systems and Technology Labs
Bangalore, India
---------------------------------

2009-02-23 10:57:45

Subject: Re: Crash (ext3 ) during 2.6.29-rc6 boot

Paul Mackerras wrote:
> It appears to have faulted on a load, implicating the source. The
> address being referenced (0xc00000003f380000) doesn't look
> outlandish. I wonder if this kernel has CONFIG_DEBUG_PAGEALLOC turned
> on, and what page size is selected?
Yes CONFIG_DEBUG_PAGEALLOC is enabled and the page size is 64K.

CONFIG_DEBUG_PAGEALLOC=y
CONFIG_PPC_64K_PAGES=y

Thanks
-Sachin


2009-02-23 15:51:18

by Jan Kara

Subject: Re: Crash (ext3 ) during 2.6.29-rc6 boot

> Andrew Morton writes:
>
> > It looks like we died in ext3_xattr_block_get():
> >
> > memcpy(buffer, bh->b_data + le16_to_cpu(entry->e_value_offs),
> > size);
> >
> > Perhaps entry->e_value_offs is no good. I wonder if the filesystem is
> > corrupted and this snuck through the defenses.
> >
> > I also wonder if there is enough info in that trace for a ppc person to
> > be able to determine whether the faulting address is in the source or
> > destination of the memcpy() (please)?
>
> It appears to have faulted on a load, implicating the source. The
> address being referenced (0xc00000003f380000) doesn't look
> outlandish. I wonder if this kernel has CONFIG_DEBUG_PAGEALLOC turned
> on, and what page size is selected?
Hmm, OK. But then I'm not sure how that can happen. Obviously, memcpy
somehow got beyond end of the page referenced by bh->b_data. So it means
that le16_to_cpu(entry->e_value_offs) + size > page_size. But
ext3_xattr_find_entry() calls ext3_xattr_check_entry() which in
particular checks whether e_value_offs + e_value_size isn't greater than
bh->b_size. So I see no way how memcpy can get beyond end of the page.
Sachin, is the problem reproducible? If yes, can you send us contents
of the page just before the faulting address (i.e., for current fault it
would be 0xc00000003f370000-0xc00000003f37ffff). As far as I can
remember powerpc monitor could dump it.
BTW, I suppose you use 4KB blocksize on the filesystem, right?
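
The bounds check described above can be pictured with a small C sketch. This is not the code from fs/ext3/xattr.c: the struct xattr_value_ref and the function value_fits_in_block() are hypothetical names, and the two fields are assumed to be already converted to CPU byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two entry fields discussed above.
 * The real on-disk entry has more members; values here are assumed
 * to be in CPU byte order already. */
struct xattr_value_ref {
    uint16_t value_offs;  /* offset of the value inside the block */
    uint32_t value_size;  /* length of the value in bytes */
};

/* Returns 1 when [value_offs, value_offs + value_size) lies inside a
 * block of block_size bytes, 0 otherwise. */
int value_fits_in_block(const struct xattr_value_ref *e, size_t block_size)
{
    assert(e != NULL);
    /* Widen before adding so the sum cannot wrap around. */
    uint64_t end = (uint64_t)e->value_offs + e->value_size;
    return end <= block_size;
}
```

If a check of this shape passes, a copy of exactly value_size bytes stays inside the block, which is why an overrun has to come from somewhere other than the offsets themselves.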

Honza
--
Jan Kara <[email protected]>
SuSE CR Labs

2009-02-24 06:38:47

Subject: Re: Crash (ext3 ) during 2.6.29-rc6 boot

Jan Kara wrote:
> Hmm, OK. But then I'm not sure how that can happen. Obviously, memcpy
> somehow got beyond end of the page referenced by bh->b_data. So it means
> that le16_to_cpu(entry->e_value_offs) + size > page_size. But
> ext3_xattr_find_entry() calls ext3_xattr_check_entry() which in
> particular checks whether e_value_offs + e_value_size isn't greater than
> bh->b_size. So I see no way how memcpy can get beyond end of the page.
> Sachin, is the problem reproducible? If yes, can you send us contents
>
Yes, I am able to recreate this problem easily. As I had mentioned, if the
earlier kernel is booted with selinux enabled and then 2.6.29-rc6 is booted,
I get this crash. But if I specify selinux=0 at command line, 2.6.29-rc6 boots
without any problem.

> of the page just before the faulting address (i.e., for current fault it
> would be 0xc00000003f370000-0xc00000003f37ffff). As far as I can
> remember powerpc monitor could dump it.
>
Here is the page dump. This time it crashed while accessing address
0xc00000002d670000.

Unable to handle kernel paging request for data at address 0xc00000002d670000
Faulting instruction address: 0xc000000000039574
cpu 0x1: Vector: 300 (Data Access) at [c00000004288b0b0]
pc: c000000000039574: .memcpy+0x74/0x244
lr: c0000000001b497c: .ext3_xattr_get+0x288/0x2f4
sp: c00000004288b330
msr: 8000000000009032

1:mon> d 0xc00000002d660000
............................... <SNIP> ...............................

c00000002d66efd0 0000000000000000 0000000000000000 |................|
c00000002d66efe0 0000000000000000 0000000000000000 |................|
c00000002d66eff0 0000000000000000 0000000000000000 |................|
c00000002d66f000 000002ea00040000 01000000e200d20a |................|
c00000002d66f010 0000000000000000 0000000000000000 |................|
c00000002d66f020 0706e40f00000000 1b000000e200d20a |................|
c00000002d66f030 73656c696e757800 0000000000000000 |selinux.........|
c00000002d66f040 0000000000000000 0000000000000000 |................|
c00000002d66f050 0000000000000000 0000000000000000 |................|
c00000002d66f060 0000000000000000 0000000000000000 |................|

............................... <SNIP> ...............................

c00000002d66ff60 0000000000000000 0000000000000000 |................|
c00000002d66ff70 0000000000000000 0000000000000000 |................|
c00000002d66ff80 0000000000000000 0000000000000000 |................|
c00000002d66ff90 0000000000000000 0000000000000000 |................|
c00000002d66ffa0 0000000000000000 0000000000000000 |................|
c00000002d66ffb0 0000000000000000 0000000000000000 |................|
c00000002d66ffc0 0000000000000000 0000000000000000 |................|
c00000002d66ffd0 0000000000000000 0000000000000000 |................|
c00000002d66ffe0 0000000073797374 656d5f753a6f626a |....system_u:obj|
c00000002d66fff0 6563745f723a7573 725f743a73300000 |ect_r:usr_t:s0..|
c00000002d670000 **************** **************** | |
1:mon> r
R00 = 000000000000e40f R16 = 000000000000005d
R01 = c00000004288b330 R17 = 0000000000000000
R02 = c0000000009f59b8 R18 = 00000000fffbfe9e
R03 = c000000044aa34a0 R19 = 0000000010042638
R04 = c00000002d66fff4 R20 = 0000000010041610
R05 = 0000000000000003 R21 = 00000000000000ff
R06 = 0000000000000000 R22 = 0000000000000006
R07 = 0000000000000001 R23 = c0000000007d27c1
R08 = 723a7573725f743a R24 = c00000002c0cd758
R09 = 3a6f626a6563745f R25 = c000000044aa3488
R10 = c00000000017b43c R26 = c00000002c0cd6f0
R11 = c00000002d66f020 R27 = c00000002c0cd860
R12 = d0000000023c14b0 R28 = c00000002c0b0840
R13 = c000000000a93680 R29 = 000000000000001b
R14 = 00000000000041ed R30 = c0000000009880b0
R15 = 0000000010040000 R31 = ffffffffffffffde
pc = c000000000039574 .memcpy+0x74/0x244
lr = c0000000001b497c .ext3_xattr_get+0x288/0x2f4
msr = 8000000000009032 cr = 4400044b
ctr = 0000000000000000 xer = 0000000020000001 trap = 300
dar = c00000002d670000 dsisr = 40000000
1:mon> zr

> BTW, I suppose you use 4KB blocksize on the filesystem, right?
>
Yes.

dumpe2fs /dev/sda3 | grep -i "block size"
dumpe2fs 1.39 (29-May-2006)
Block size: 4096

Thanks
-Sachin


2009-02-24 15:51:27

by Jan Kara

Subject: Re: Crash (ext3 ) during 2.6.29-rc6 boot

Hello,

On Tue 24-02-09 12:08:37, Sachin P. Sant wrote:
> Jan Kara wrote:
>> Hmm, OK. But then I'm not sure how that can happen. Obviously, memcpy
>> somehow got beyond end of the page referenced by bh->b_data. So it means
>> that le16_to_cpu(entry->e_value_offs) + size > page_size. But
>> ext3_xattr_find_entry() calls ext3_xattr_check_entry() which in
>> particular checks whether e_value_offs + e_value_size isn't greater than
>> bh->b_size. So I see no way how memcpy can get beyond end of the page.
>> Sachin, is the problem reproducible? If yes, can you send us contents
>>
> Yes, i am able to recreate this problem easily. As i had mentioned if the
> earlier kernel is booted with selinux enabled and then 2.6.29-rc6 is booted
> i get this crash. But if i specify selinux=0 at command line, 2.6.29-rc6 boots
> without any problem.
>
>> of the page just before the faulting address (i.e., for current fault it
>> would be 0xc00000003f370000-0xc00000003f37ffff). As far as I can
>> remember powerpc monitor could dump it.
>>
> Here is the page dump. This time it crashed while accessing address
> 0xc00000002d670000.
Thanks for the dump.

> Unable to handle kernel paging request for data at address 0xc00000002d670000
> Faulting instruction address: 0xc000000000039574
> cpu 0x1: Vector: 300 (Data Access) at [c00000004288b0b0]
> pc: c000000000039574: .memcpy+0x74/0x244
> lr: c0000000001b497c: .ext3_xattr_get+0x288/0x2f4
> sp: c00000004288b330
> msr: 8000000000009032
>
> 1:mon> d 0xc00000002d660000
> ............................... <SNIP> ...............................
>
> c00000002d66efd0 0000000000000000 0000000000000000 |................|
> c00000002d66efe0 0000000000000000 0000000000000000 |................|
> c00000002d66eff0 0000000000000000 0000000000000000 |................|
> c00000002d66f000 000002ea00040000 01000000e200d20a |................|
> c00000002d66f010 0000000000000000 0000000000000000 |................|
> c00000002d66f020 0706e40f00000000 1b000000e200d20a |................|
> c00000002d66f030 73656c696e757800 0000000000000000 |selinux.........|
> c00000002d66f040 0000000000000000 0000000000000000 |................|
> c00000002d66f050 0000000000000000 0000000000000000 |................|
> c00000002d66f060 0000000000000000 0000000000000000 |................|
>
> ............................... <SNIP> ...............................
>
> c00000002d66ff60 0000000000000000 0000000000000000 |................|
> c00000002d66ff70 0000000000000000 0000000000000000 |................|
> c00000002d66ff80 0000000000000000 0000000000000000 |................|
> c00000002d66ff90 0000000000000000 0000000000000000 |................|
> c00000002d66ffa0 0000000000000000 0000000000000000 |................|
> c00000002d66ffb0 0000000000000000 0000000000000000 |................|
> c00000002d66ffc0 0000000000000000 0000000000000000 |................|
> c00000002d66ffd0 0000000000000000 0000000000000000 |................|
> c00000002d66ffe0 0000000073797374 656d5f753a6f626a |....system_u:obj|
> c00000002d66fff0 6563745f723a7573 725f743a73300000 |ect_r:usr_t:s0..|
> c00000002d670000 **************** **************** | |
> 1:mon> r
> R00 = 000000000000e40f R16 = 000000000000005d
> R01 = c00000004288b330 R17 = 0000000000000000
> R02 = c0000000009f59b8 R18 = 00000000fffbfe9e
> R03 = c000000044aa34a0 R19 = 0000000010042638
> R04 = c00000002d66fff4 R20 = 0000000010041610
> R05 = 0000000000000003 R21 = 00000000000000ff
> R06 = 0000000000000000 R22 = 0000000000000006
> R07 = 0000000000000001 R23 = c0000000007d27c1
> R08 = 723a7573725f743a R24 = c00000002c0cd758
> R09 = 3a6f626a6563745f R25 = c000000044aa3488
> R10 = c00000000017b43c R26 = c00000002c0cd6f0
> R11 = c00000002d66f020 R27 = c00000002c0cd860
> R12 = d0000000023c14b0 R28 = c00000002c0b0840
> R13 = c000000000a93680 R29 = 000000000000001b
> R14 = 00000000000041ed R30 = c0000000009880b0
> R15 = 0000000010040000 R31 = ffffffffffffffde
> pc = c000000000039574 .memcpy+0x74/0x244
> lr = c0000000001b497c .ext3_xattr_get+0x288/0x2f4
> msr = 8000000000009032 cr = 4400044b
> ctr = 0000000000000000 xer = 0000000020000001 trap = 300
> dar = c00000002d670000 dsisr = 40000000
> 1:mon> zr
>
>> BTW, I suppose you use 4KB blocksize on the filesystem, right?
>>
> Yes.
>
> dumpe2fs /dev/sda3 | grep -i "block size"
> dumpe2fs 1.39 (29-May-2006)
> Block size: 4096
OK. The xattr block causing the oops is completely correct. To me it seems
more like some problem in powerpc memcpy() (I saw some changes went
into it at the end of December) - we call it to copy 27 bytes from
address 0xc00000002d66ffe4 (so the copy ends one byte before the end of the page).
Could some of the powerpc guys have a look whether this could be the case?
I'm not quite fluent in the powerpc assembly so it would take me ages ;).
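
The address arithmetic behind this can be checked with a short C sketch. It assumes the 64K page size reported earlier in the thread (CONFIG_PPC_64K_PAGES=y); the names PAGE_SIZE_64K, page_of() and access_crosses_page() are made up for illustration and are not kernel helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Page size taken from the report (CONFIG_PPC_64K_PAGES=y). */
#define PAGE_SIZE_64K 0x10000ULL

/* Index of the page that contains addr. */
uint64_t page_of(uint64_t addr)
{
    return addr / PAGE_SIZE_64K;
}

/* Returns 1 if an access of len bytes starting at addr touches more
 * than one page, 0 otherwise. */
int access_crosses_page(uint64_t addr, uint64_t len)
{
    assert(len > 0);
    return page_of(addr) != page_of(addr + len - 1);
}
```

With these helpers, the requested copy of 27 bytes from 0xc00000002d66ffe4 has its last byte at 0xc00000002d66fffe and stays on one page. An 8-byte load at r4 + 8, with r4 = 0xc00000002d66fff4 as in the register dump, starts at 0xc00000002d66fffc and runs onto the next page, whose first byte is the reported dar value 0xc00000002d670000.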

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2009-02-24 16:14:06

by Jan Kara

Subject: Re: Crash (ext3 ) during 2.6.29-rc6 boot

> Andrew Morton wrote:
> >hm, I wonder what could have caused that - we haven't altered
> >fs/ext3/xattr.c in ages.
> >
> >What is the most recent kernel version you know of which didn't do
> >this? Bear in mind that this crash might be triggered by the
> >current contents of the filesystem, so if possible, please test
> >some other kernel versions on that disk.
> >
> I am trying to boot a vanilla kernel on this machine for the first
> time. Haven't tried any other kernels. Will give it a try.
>
> >It looks like we died in ext3_xattr_block_get():
> >
> > memcpy(buffer, bh->b_data + le16_to_cpu(entry->e_value_offs),
> > size);
> >
> >Perhaps entry->e_value_offs is no good. I wonder if the filesystem is
> >corrupted and this snuck through the defenses.
> >
> >I also wonder if there is enough info in that trace for a ppc person to
> >be able to determine whether the faulting address is in the source or
> >destination of the memcpy() (please)?
> >
> Some more information if this could be of any help.
>
> 0:mon> di 0xc000000000039574
> c000000000039574 e9240008 ld r9,8(r4)
> c000000000039578 409d0010 ble cr7,c000000000039588 # .memcpy+0x88/0x244
> c00000000003957c 79290002 rotldi r9,r9,32
> c000000000039580 91230000 stw r9,0(r3)
> c000000000039584 38630004 addi r3,r3,4
> c000000000039588 409e0010 bne cr7,c000000000039598 # .memcpy+0x98/0x244
> c00000000003958c 79298000 rotldi r9,r9,16
> c000000000039590 b1230000 sth r9,0(r3)
> c000000000039594 38630002 addi r3,r3,2
> c000000000039598 409f000c bns cr7,c0000000000395a4 # .memcpy+0xa4/0x244
> c00000000003959c 79294000 rotldi r9,r9,8
> c0000000000395a0 99230000 stb r9,0(r3)
> c0000000000395a4 e8610030 ld r3,48(r1)
> c0000000000395a8 4e800020 blr
> c0000000000395ac 78a6e8c2 rldicl r6,r5,61,3
> c0000000000395b0 38a5fff0 addi r5,r5,-16
> 0:mon> r
> R00 = 000000000000e40f R16 = 00000000100edbc8
> R01 = c00000003e59b3e0 R17 = 00000000100b0000
> R02 = c0000000009c2110 R18 = 0000000000000005
> R03 = c000000044bc90e0 R19 = 00000000fff0d7a8
> R04 = c000000039cffff4 R20 = 00000000fff0d708
> R05 = 0000000000000003 R21 = 00000000000000ff
> R06 = 0000000000000000 R22 = 0000000000000006
> R07 = 0000000000000001 R23 = c00000000079ab49
> R08 = 723a7573725f743a R24 = c0000000372fe2a8
> R09 = 3a6f626a6563745f R25 = c000000044bc90c8
> R10 = c00000003b250968 R26 = c0000000372fe240
> R11 = c000000000039500 R27 = c0000000372fe3b0
> R12 = d00000000244c590 R28 = c0000000372c5280
> R13 = c000000000a53480 R29 = 000000000000001b
> R14 = 00000000100d0000 R30 = d0000000024654d0
> R15 = 0000000000000000 R31 = ffffffffffffffde
> pc = c000000000039574 .memcpy+0x74/0x244
> lr = d00000000244916c .ext3_xattr_get+0x288/0x2f4 [ext3]
> msr = 8000000000009032 cr = 4400844b
> ctr = 0000000000000000 xer = 0000000000000001 trap = 300
> dar = c000000039d00000 dsisr = 40000000
> 0:mon>
Yes, this makes me even more suspicious that memcpy() on powerpc could
be at fault. The instruction (ld r9,8(r4)) loads the last 8 bytes to copy,
but in our case it should load only 3 bytes, because the remaining 5
bytes are outside the range we specified, and so the larger load can cause
a page fault...
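
As a rough illustration of the alternative, here is a C sketch of a tail copy that never reads past the requested range. It is only an analogue of the idea, not the powerpc assembly; copy_tail() is a hypothetical name, and the fixed-size memcpy() calls stand in for 4-, 2- and 1-byte load/store pairs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the last n (< 8) bytes with one 4-, 2- and 1-byte move for each
 * bit set in n, so that no byte at or beyond src + n is ever read. */
void *copy_tail(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(n < 8);
    if (n & 4) { memcpy(d, s, 4); d += 4; s += 4; }
    if (n & 2) { memcpy(d, s, 2); d += 2; s += 2; }
    if (n & 1) { *d = *s; }
    return dst;
}
```

For the 3 remaining bytes in this report, such a routine would do one 2-byte and one 1-byte move instead of a single 8-byte load.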

Honza
--
Jan Kara <[email protected]>
SuSE CR Labs

2009-02-24 18:02:02

Subject: Re: Crash (ext3 ) during 2.6.29-rc6 boot

On Mon, 23 Feb 2009, Paul Mackerras wrote:
> Andrew Morton writes:
> > It looks like we died in ext3_xattr_block_get():
> >
> > memcpy(buffer, bh->b_data + le16_to_cpu(entry->e_value_offs),
> > size);
> >
> > Perhaps entry->e_value_offs is no good. I wonder if the filesystem is
> > corrupted and this snuck through the defenses.
> >
> > I also wonder if there is enough info in that trace for a ppc person to
> > be able to determine whether the faulting address is in the source or
> > destination of the memcpy() (please)?
>
> It appears to have faulted on a load, implicating the source. The
> address being referenced (0xc00000003f380000) doesn't look
> outlandish. I wonder if this kernel has CONFIG_DEBUG_PAGEALLOC turned
> on, and what page size is selected?

I'm seeing a similar thing on PS3, but not in ext3. During early userspace
setup (udevd), it crashes accessing a 0xc00* address in:

| NIP setup+0x20/0x130
| LR copy_user_page+0x18/0x6c
| Call trace:
| do_wp_page+0x5b4/0x89c
| do_page_fault+0x3a8/0x58c
| handle_page_fault+0x20/0x5c

I have CONFIG_DEBUG_PAGEALLOC=y. If I disable it, the system boots fine.

If needed, I can probably bisect this tomorrow. It definitely didn't happen in
2.6.29-rc5.

With kind regards,

Geert Uytterhoeven
Software Architect

Sony Techsoft Centre Europe
The Corporate Village · Da Vincilaan 7-D1 · B-1935 Zaventem · Belgium

Phone: +32 (0)2 700 8453
Fax: +32 (0)2 700 8622
E-mail: [email protected]
Internet: http://www.sony-europe.com/

A division of Sony Europe (Belgium) N.V.
VAT BE 0413.825.160 · RPR Brussels
Fortis · BIC GEBABEBB · IBAN BE41293037680010

2009-02-25 01:19:23

Subject: Re: Crash (ext3 ) during 2.6.29-rc6 boot

On Wed, 25 Feb 2009 02:51:20 am Jan Kara wrote:
> Hello,
>
> On Tue 24-02-09 12:08:37, Sachin P. Sant wrote:
> > Jan Kara wrote:
> >> Hmm, OK. But then I'm not sure how that can happen. Obviously, memcpy
> >> somehow got beyond end of the page referenced by bh->b_data. So it means
> >> that le16_to_cpu(entry->e_value_offs) + size > page_size. But
> >> ext3_xattr_find_entry() calls ext3_xattr_check_entry() which in
> >> particular checks whether e_value_offs + e_value_size isn't greater than
> >> bh->b_size. So I see no way how memcpy can get beyond end of the page.
> >> Sachin, is the problem reproducible? If yes, can you send us contents
> >>
> > Yes, i am able to recreate this problem easily. As i had mentioned if the
> > earlier kernel is booted with selinux enabled and then 2.6.29-rc6 is booted
> > i get this crash. But if i specify selinux=0 at command line, 2.6.29-rc6 boots
> > without any problem.
> >
> >> of the page just before the faulting address (i.e., for current fault it
> >> would be 0xc00000003f370000-0xc00000003f37ffff). As far as I can
> >> remember powerpc monitor could dump it.
> >>
> > Here is the page dump. This time it crashed while accessing address
> > 0xc00000002d670000.
> Thanks for the dump.
>
> > Unable to handle kernel paging request for data at address 0xc00000002d670000
> > Faulting instruction address: 0xc000000000039574
> > cpu 0x1: Vector: 300 (Data Access) at [c00000004288b0b0]
> > pc: c000000000039574: .memcpy+0x74/0x244
> > lr: c0000000001b497c: .ext3_xattr_get+0x288/0x2f4
> > sp: c00000004288b330
> > msr: 8000000000009032
> >
> > 1:mon> d 0xc00000002d660000
> > ............................... <SNIP> ...............................
> >
> > c00000002d66efd0 0000000000000000 0000000000000000 |................|
> > c00000002d66efe0 0000000000000000 0000000000000000 |................|
> > c00000002d66eff0 0000000000000000 0000000000000000 |................|
> > c00000002d66f000 000002ea00040000 01000000e200d20a |................|
> > c00000002d66f010 0000000000000000 0000000000000000 |................|
> > c00000002d66f020 0706e40f00000000 1b000000e200d20a |................|
> > c00000002d66f030 73656c696e757800 0000000000000000 |selinux.........|
> > c00000002d66f040 0000000000000000 0000000000000000 |................|
> > c00000002d66f050 0000000000000000 0000000000000000 |................|
> > c00000002d66f060 0000000000000000 0000000000000000 |................|
> >
> > ............................... <SNIP> ...............................
> >
> > c00000002d66ff60 0000000000000000 0000000000000000 |................|
> > c00000002d66ff70 0000000000000000 0000000000000000 |................|
> > c00000002d66ff80 0000000000000000 0000000000000000 |................|
> > c00000002d66ff90 0000000000000000 0000000000000000 |................|
> > c00000002d66ffa0 0000000000000000 0000000000000000 |................|
> > c00000002d66ffb0 0000000000000000 0000000000000000 |................|
> > c00000002d66ffc0 0000000000000000 0000000000000000 |................|
> > c00000002d66ffd0 0000000000000000 0000000000000000 |................|
> > c00000002d66ffe0 0000000073797374 656d5f753a6f626a |....system_u:obj|
> > c00000002d66fff0 6563745f723a7573 725f743a73300000 |ect_r:usr_t:s0..|
> > c00000002d670000 **************** **************** | |
> > 1:mon> r
> > R00 = 000000000000e40f R16 = 000000000000005d
> > R01 = c00000004288b330 R17 = 0000000000000000
> > R02 = c0000000009f59b8 R18 = 00000000fffbfe9e
> > R03 = c000000044aa34a0 R19 = 0000000010042638
> > R04 = c00000002d66fff4 R20 = 0000000010041610
> > R05 = 0000000000000003 R21 = 00000000000000ff
> > R06 = 0000000000000000 R22 = 0000000000000006
> > R07 = 0000000000000001 R23 = c0000000007d27c1
> > R08 = 723a7573725f743a R24 = c00000002c0cd758
> > R09 = 3a6f626a6563745f R25 = c000000044aa3488
> > R10 = c00000000017b43c R26 = c00000002c0cd6f0
> > R11 = c00000002d66f020 R27 = c00000002c0cd860
> > R12 = d0000000023c14b0 R28 = c00000002c0b0840
> > R13 = c000000000a93680 R29 = 000000000000001b
> > R14 = 00000000000041ed R30 = c0000000009880b0
> > R15 = 0000000010040000 R31 = ffffffffffffffde
> > pc = c000000000039574 .memcpy+0x74/0x244
> > lr = c0000000001b497c .ext3_xattr_get+0x288/0x2f4
> > msr = 8000000000009032 cr = 4400044b
> > ctr = 0000000000000000 xer = 0000000020000001 trap = 300
> > dar = c00000002d670000 dsisr = 40000000
> > 1:mon> zr
> >
> >> BTW, I suppose you use 4KB blocksize on the filesystem, right?
> >>
> > Yes.
> >
> > dumpe2fs /dev/sda3 | grep -i "block size"
> > dumpe2fs 1.39 (29-May-2006)
> > Block size: 4096
> OK. The xattr block causing the oops is completely correct. To me it seems
> more like some problem in powerpc memcpy() (I saw some changes went
> into it at the end of December) - we call it to copy 27 bytes from
> address 0xc00000002d66ffe4 (so the copy ends one byte before the end of the page).
> Could some of the powerpc guys have a look whether this could be the case?
> I'm not quite fluent in the powerpc assembly so it would take me ages ;).

You're right - it's a problem with the 64bit powerpc memcpy(). And the brown
paper bag is all mine (commit 25d6e2d7c58ddc4a3b614fc5381591c0cfe66556). On
Power6 and Cell we're doing a load double that goes beyond the source size
we were given to copy. I'll see if I can find a nice way of fixing this up,
if not then I'll ask Ben to revert.

Sorry about the goose chase!

Mark

2009-02-25 01:26:06

Subject: Re: Crash (ext3 ) during 2.6.29-rc6 boot

On Wed, 25 Feb 2009 05:01:59 am Geert Uytterhoeven wrote:
> On Mon, 23 Feb 2009, Paul Mackerras wrote:
> > Andrew Morton writes:
> > > It looks like we died in ext3_xattr_block_get():
> > >
> > > memcpy(buffer, bh->b_data + le16_to_cpu(entry->e_value_offs),
> > > size);
> > >
> > > Perhaps entry->e_value_offs is no good. I wonder if the filesystem is
> > > corrupted and this snuck through the defenses.
> > >
> > > I also wonder if there is enough info in that trace for a ppc person to
> > > be able to determine whether the faulting address is in the source or
> > > destination of the memcpy() (please)?
> >
> > It appears to have faulted on a load, implicating the source. The
> > address being referenced (0xc00000003f380000) doesn't look
> > outlandish. I wonder if this kernel has CONFIG_DEBUG_PAGEALLOC turned
> > on, and what page size is selected?
>
> I'm seeing a similar thing on PS3, but not in ext3. During early userspace
> setup (udevd), it crashes accessing a 0xc00* address in:
>
> | NIP setup+0x20/0x130
> | LR copy_user_page+0x18/0x6c
> | Call trace:
> | do_wp_page+0x5b4/0x89c
> | do_page_fault+0x3a8/0x58c
> | handle_page_fault+0x20/0x5c
>
> I have CONFIG_DEBUG_PAGEALLOC=y. If I disable it, the system boots fine.
>
> If needed, I can probably bisect this tomorrow. It definitely didn't happen in
> 2.6.29-rc5.

No need to bisect - it was 25d6e2d7c58ddc4a3b614fc5381591c0cfe66556, my
commit that "optimised" 64bit memcpy() for Power6 and Cell.

The bug was in -rc1, but if your copies were 8-byte aligned with respect
to the source the problem wouldn't have been seen... Could this have
been why you didn't see it in -rc5?

I'll work on a fix now.

Thanks!

Mark

2009-02-25 06:51:25

Subject: Re: Crash (ext3 ) during 2.6.29-rc6 boot

On Tue, 24 Feb 2009 05:38:37 pm Sachin P. Sant wrote:
> Jan Kara wrote:
> > Hmm, OK. But then I'm not sure how that can happen. Obviously, memcpy
> > somehow got beyond end of the page referenced by bh->b_data. So it means
> > that le16_to_cpu(entry->e_value_offs) + size > page_size. But
> > ext3_xattr_find_entry() calls ext3_xattr_check_entry() which in
> > particular checks whether e_value_offs + e_value_size isn't greater than
> > bh->b_size. So I see no way how memcpy can get beyond end of the page.
> > Sachin, is the problem reproducible? If yes, can you send us contents
> >
> Yes, i am able to recreate this problem easily. As i had mentioned if the
> earlier kernel is booted with selinux enabled and then 2.6.29-rc6 is booted
> i get this crash. But if i specify selinux=0 at command line, 2.6.29-rc6 boots
> without any problem.

Hi Sachin and Geert,

Does the patch below fix the problems you're seeing? If it does I'll send
a properly written up and formatted patch to linuxppc-dev (as well as
another one to fix the same problem in copy_tofrom_user()).

Thanks and sorry again!

Mark

---
arch/powerpc/lib/memcpy_64.S | 26 ++++++++++++++++++++------
1 file changed, 20 insertions(+), 6 deletions(-)

Index: upstream/arch/powerpc/lib/memcpy_64.S
===================================================================
--- upstream.orig/arch/powerpc/lib/memcpy_64.S
+++ upstream/arch/powerpc/lib/memcpy_64.S
@@ -53,18 +53,19 @@ END_FTR_SECTION_IFCLR(CPU_FTR_UNALIGNED_
 3:	std	r8,8(r3)
 	beq	3f
 	addi	r3,r3,16
-	ld	r9,8(r4)
 .Ldo_tail:
 	bf	cr7*4+1,1f
-	rotldi	r9,r9,32
+	lwz	r9,8(r4)
+	addi	r4,r4,4
 	stw	r9,0(r3)
 	addi	r3,r3,4
 1:	bf	cr7*4+2,2f
-	rotldi	r9,r9,16
+	lhz	r9,8(r4)
+	addi	r4,r4,2
 	sth	r9,0(r3)
 	addi	r3,r3,2
 2:	bf	cr7*4+3,3f
-	rotldi	r9,r9,8
+	lbz	r9,8(r4)
 	stb	r9,0(r3)
 3:	ld	r3,48(r1)	/* return dest pointer */
 	blr
@@ -133,11 +134,24 @@ END_FTR_SECTION_IFCLR(CPU_FTR_UNALIGNED_
 	cmpwi	cr1,r5,8
 	addi	r3,r3,32
 	sld	r9,r9,r10
-	ble	cr1,.Ldo_tail
+	ble	cr1,6f
 	ld	r0,8(r4)
 	srd	r7,r0,r11
 	or	r9,r7,r9
-	b	.Ldo_tail
+6:
+	bf	cr7*4+1,1f
+	rotldi	r9,r9,32
+	stw	r9,0(r3)
+	addi	r3,r3,4
+1:	bf	cr7*4+2,2f
+	rotldi	r9,r9,16
+	sth	r9,0(r3)
+	addi	r3,r3,2
+2:	bf	cr7*4+3,3f
+	rotldi	r9,r9,8
+	stb	r9,0(r3)
+3:	ld	r3,48(r1)	/* return dest pointer */
+	blr

 .Ldst_unaligned:
 	PPC_MTOCRF	0x01,r6		# put #bytes to 8B bdry into cr7

2009-02-25 09:50:50

Subject: Re: Crash (ext3 ) during 2.6.29-rc6 boot

On Wed, 25 Feb 2009, Mark Nelson wrote:
> On Tue, 24 Feb 2009 05:38:37 pm Sachin P. Sant wrote:
> > Jan Kara wrote:
> > > Hmm, OK. But then I'm not sure how that can happen. Obviously, memcpy
> > > somehow got beyond end of the page referenced by bh->b_data. So it means
> > > that le16_to_cpu(entry->e_value_offs) + size > page_size. But
> > > ext3_xattr_find_entry() calls ext3_xattr_check_entry() which in
> > > particular checks whether e_value_offs + e_value_size isn't greater than
> > > bh->b_size. So I see no way how memcpy can get beyond end of the page.
> > > Sachin, is the problem reproducible? If yes, can you send us contents
> > >
> > Yes, i am able to recreate this problem easily. As i had mentioned if the
> > earlier kernel is booted with selinux enabled and then 2.6.29-rc6 is booted
> > i get this crash. But if i specify selinux=0 at command line, 2.6.29-rc6 boots
> > without any problem.
>
> Hi Sachin and Geert,
>
> Does the patch below fix the problems you're seeing? If it does I'll send
> a properly written up and formatted patch to linuxppc-dev (as well as
> another one to fix the same problem in copy_tofrom_user()).

Unfortunately not, now it crashes while accessing the memory pointed to by
GPR16, in

NIP: copy_page_range+0x608/0x628
LR: dup_mm+0x2e4/0x428
Trace: debug_table+0xcc70/0x1afe0 (unreliable)
dup_mm+0x2e4/0x428
copy_process+0x86c/0xf9c
do_fork+0x188/0x39c
sys_clone+0x58/0x70
ppc_clone+0x8/0xc

However, after reverting 25d6e2d7c58ddc4a3b614fc5381591c0cfe66556, I still see
similar problems as above (crash in copy_page_range()).
Which makes me think that
1. Your new patch fixes the problem introduced by 25d6e2d7,
2. There's still another issue besides the one introduced by 25d6e2d7.

With kind regards,

Geert Uytterhoeven
Software Architect


2009-02-25 10:51:00

Subject: Re: Crash (ext3 ) during 2.6.29-rc6 boot

On Wed, 25 Feb 2009, Mark Nelson wrote:
> On Wed, 25 Feb 2009 05:01:59 am Geert Uytterhoeven wrote:
> > On Mon, 23 Feb 2009, Paul Mackerras wrote:
> > > Andrew Morton writes:
> > > > It looks like we died in ext3_xattr_block_get():
> > > >
> > > > memcpy(buffer, bh->b_data + le16_to_cpu(entry->e_value_offs),
> > > > size);
> > > >
> > > > Perhaps entry->e_value_offs is no good. I wonder if the filesystem is
> > > > corrupted and this snuck through the defenses.
> > > >
> > > > I also wonder if there is enough info in that trace for a ppc person to
> > > > be able to determine whether the faulting address is in the source or
> > > > destination of the memcpy() (please)?
> > >
> > > It appears to have faulted on a load, implicating the source. The
> > > address being referenced (0xc00000003f380000) doesn't look
> > > outlandish. I wonder if this kernel has CONFIG_DEBUG_PAGEALLOC turned
> > > on, and what page size is selected?
> >
> > I'm seeing a similar thing on PS3, but not in ext3. During early userspace
> > setup (udevd), it crashes accessing a 0xc00* address in:
> >
> > | NIP setup+0x20/0x130
> > | LR copy_user_page+0x18/0x6c
> > | Call trace:
> > | do_wp_page+0x5b4/0x89c
> > | do_page_fault+0x3a8/0x58c
> > | handle_page_fault+0x20/0x5c
> >
> > I have CONFIG_DEBUG_PAGEALLOC=y. If I disable it, the system boots fine.
> >
> > If needed, I can probably bisect this tomorrow. It definitely didn't happen in
> > 2.6.29-rc5.
>
> No need to bisect - it was 25d6e2d7c58ddc4a3b614fc5381591c0cfe66556, my
> commit that "optimised" 64bit memcpy() for Power6 and Cell.
>
> The bug was in -rc1, but if your copies were 8-byte aligned with respect
> to the source the problem wouldn't have been seen... Could this have
> been why you didn't see it in -rc5?

Hmm... I just started seeing it on older kernels (-rc5+), too...

With kind regards,

Geert Uytterhoeven

2009-02-25 11:08:30

Subject: Re: Crash (ext3 ) during 2.6.29-rc6 boot

Mark Nelson wrote:
> Hi Sachin and Geert,
>
> Does the patch below fix the problems you're seeing? If it does I'll send
> a properly written up and formatted patch to linuxppc-dev (as well as
> another one to fix the same problem in copy_tofrom_user()).
>
This patch fixes the issue on my side. I tried booting the system a few times
and every single time it came up clean.

Thanks
-Sachin

--

---------------------------------
Sachin Sant
IBM Linux Technology Center
India Systems and Technology Labs
Bangalore, India
---------------------------------

2009-02-25 12:08:48

Subject: Re: Crash (ext3 ) during 2.6.29-rc6 boot

On Wed, 25 Feb 2009 08:50:46 pm Geert Uytterhoeven wrote:
> On Wed, 25 Feb 2009, Mark Nelson wrote:
> > On Tue, 24 Feb 2009 05:38:37 pm Sachin P. Sant wrote:
> > > Jan Kara wrote:
> > > > Hmm, OK. But then I'm not sure how that can happen. Obviously, memcpy
> > > > somehow got beyond end of the page referenced by bh->b_data. So it means
> > > > that le16_to_cpu(entry->e_value_offs) + size > page_size. But
> > > > ext3_xattr_find_entry() calls ext3_xattr_check_entry() which in
> > > > particular checks whether e_value_offs + e_value_size isn't greater than
> > > > bh->b_size. So I see no way how memcpy can get beyond end of the page.
> > > > Sachin, is the problem reproducible? If yes, can you send us contents
> > > >
> > > Yes, i am able to recreate this problem easily. As i had mentioned if the
> > > earlier kernel is booted with selinux enabled and then 2.6.29-rc6 is booted
> > > i get this crash. But if i specify selinux=0 at command line, 2.6.29-rc6 boots
> > > without any problem.
> >
> > Hi Sachin and Geert,
> >
> > Does the patch below fix the problems you're seeing? If it does I'll send
> > a properly written up and formatted patch to linuxppc-dev (as well as
> > another one to fix the same problem in copy_tofrom_user()).
>
> Unfortunately not, now it crashes while accessing the memory pointed to by
> GPR16, in
>
> NIP: copy_page_range+0x608/0x628
> LR: dup_mm+0x2e4/0x428
> Trace: debug_table+0xcc70/0x1afe0 (unreliable)
> dup_mm+0x2e4/0x428
> copy_process+0x86c/0xf9c
> do_fork+0x188/0x39c
> sys_clone+0x58/0x70
> ppc_clone+0x8/0xc
>
> However, after reverting 25d6e2d7c58ddc4a3b614fc5381591c0cfe66556, I still see
> similar problems as above (crash in copy_page_range()).
> Which makes me think that
> 1. Your new patch fixes the problem introduced by 25d6e2d7,
> 2. There's still another issue than the one introduced by 25d6e2d7.

Does the following patch fix the errors you're seeing? (it applies the
same fix as the previous patch but this time to copy_tofrom_user, which
I updated in a4e22f02f5b6518c1484faea1f88d81802b9feac)

Thanks!

Mark

---
arch/powerpc/lib/copyuser_64.S | 38 +++++++++++++++++++++++++++++++-------
1 file changed, 31 insertions(+), 7 deletions(-)

Index: upstream/arch/powerpc/lib/copyuser_64.S
===================================================================
--- upstream.orig/arch/powerpc/lib/copyuser_64.S
+++ upstream/arch/powerpc/lib/copyuser_64.S
@@ -62,18 +62,19 @@ END_FTR_SECTION_IFCLR(CPU_FTR_UNALIGNED_
72: std r8,8(r3)
beq+ 3f
addi r3,r3,16
-23: ld r9,8(r4)
.Ldo_tail:
bf cr7*4+1,1f
- rotldi r9,r9,32
+23: lwz r9,8(r4)
+ addi r4,r4,4
73: stw r9,0(r3)
addi r3,r3,4
1: bf cr7*4+2,2f
- rotldi r9,r9,16
+44: lhz r9,8(r4)
+ addi r4,r4,2
74: sth r9,0(r3)
addi r3,r3,2
2: bf cr7*4+3,3f
- rotldi r9,r9,8
+45: lbz r9,8(r4)
75: stb r9,0(r3)
3: li r3,0
blr
@@ -141,11 +142,24 @@ END_FTR_SECTION_IFCLR(CPU_FTR_UNALIGNED_
6: cmpwi cr1,r5,8
addi r3,r3,32
sld r9,r9,r10
- ble cr1,.Ldo_tail
+ ble cr1,7f
34: ld r0,8(r4)
srd r7,r0,r11
or r9,r7,r9
- b .Ldo_tail
+7:
+ bf cr7*4+1,1f
+ rotldi r9,r9,32
+94: stw r9,0(r3)
+ addi r3,r3,4
+1: bf cr7*4+2,2f
+ rotldi r9,r9,16
+95: sth r9,0(r3)
+ addi r3,r3,2
+2: bf cr7*4+3,3f
+ rotldi r9,r9,8
+96: stb r9,0(r3)
+3: li r3,0
+ blr

.Ldst_unaligned:
PPC_MTOCRF 0x01,r6 /* put #bytes to 8B bdry into cr7 */
@@ -218,7 +232,6 @@ END_FTR_SECTION_IFCLR(CPU_FTR_UNALIGNED_
121:
132:
addi r3,r3,8
-123:
134:
135:
138:
@@ -226,6 +239,9 @@ END_FTR_SECTION_IFCLR(CPU_FTR_UNALIGNED_
140:
141:
142:
+123:
+144:
+145:

/*
* here we have had a fault on a load and r3 points to the first
@@ -309,6 +325,9 @@ END_FTR_SECTION_IFCLR(CPU_FTR_UNALIGNED_
187:
188:
189:
+194:
+195:
+196:
1:
ld r6,-24(r1)
ld r5,-8(r1)
@@ -329,7 +348,9 @@ END_FTR_SECTION_IFCLR(CPU_FTR_UNALIGNED_
.llong 72b,172b
.llong 23b,123b
.llong 73b,173b
+ .llong 44b,144b
.llong 74b,174b
+ .llong 45b,145b
.llong 75b,175b
.llong 24b,124b
.llong 25b,125b
@@ -347,6 +368,9 @@ END_FTR_SECTION_IFCLR(CPU_FTR_UNALIGNED_
.llong 79b,179b
.llong 80b,180b
.llong 34b,134b
+ .llong 94b,194b
+ .llong 95b,195b
+ .llong 96b,196b
.llong 35b,135b
.llong 81b,181b
.llong 36b,136b

2009-02-25 12:12:18

Subject: Re: Crash (ext3 ) during 2.6.29-rc6 boot

On Wed, 25 Feb 2009 10:08:22 pm Sachin P. Sant wrote:
> Mark Nelson wrote:
> > Hi Sachin and Geert,
> >
> > Does the patch below fix the problems you're seeing? If it does I'll send
> > a properly written up and formatted patch to linuxppc-dev (as well as
> > another one to fix the same problem in copy_tofrom_user()).
> >
> This patch fixes the issue on my side. I tried booting the system a few times
> and every single time it came up clean.

Good to hear. Thanks for testing, Sachin!

Mark

2009-02-25 13:31:23

Subject: Re: Crash (ext3 ) during 2.6.29-rc6 boot

On Wed, 25 Feb 2009, Mark Nelson wrote:
> On Wed, 25 Feb 2009 08:50:46 pm Geert Uytterhoeven wrote:
> > On Wed, 25 Feb 2009, Mark Nelson wrote:
> > > On Tue, 24 Feb 2009 05:38:37 pm Sachin P. Sant wrote:
> > > > Jan Kara wrote:
> > > > > Hmm, OK. But then I'm not sure how that can happen. Obviously, memcpy
> > > > > somehow got beyond end of the page referenced by bh->b_data. So it means
> > > > > that le16_to_cpu(entry->e_value_offs) + size > page_size. But
> > > > > ext3_xattr_find_entry() calls ext3_xattr_check_entry() which in
> > > > > particular checks whether e_value_offs + e_value_size isn't greater than
> > > > > bh->b_size. So I see no way how memcpy can get beyond end of the page.
> > > > > Sachin, is the problem reproducible? If yes, can you send us contents
> > > > >
> > > > Yes, i am able to recreate this problem easily. As i had mentioned if the
> > > > earlier kernel is booted with selinux enabled and then 2.6.29-rc6 is booted
> > > > i get this crash. But if i specify selinux=0 at command line, 2.6.29-rc6 boots
> > > > without any problem.
> > >
> > > Hi Sachin and Geert,
> > >
> > > Does the patch below fix the problems you're seeing? If it does I'll send
> > > a properly written up and formatted patch to linuxppc-dev (as well as
> > > another one to fix the same problem in copy_tofrom_user()).
> >
> > Unfortunately not, now it crashes while accessing the memory pointed to by
> > GPR16, in
> >
> > NIP: copy_page_range+0x608/0x628
> > LR: dup_mm+0x2e4/0x428
> > Trace: debug_table+0xcc70/0x1afe0 (unreliable)
> > dup_mm+0x2e4/0x428
> > copy_process+0x86c/0xf9c
> > do_fork+0x188/0x39c
> > sys_clone+0x58/0x70
> > ppc_clone+0x8/0xc
> >
> > However, after reverting 25d6e2d7c58ddc4a3b614fc5381591c0cfe66556, I still see
> > similar problems as above (crash in copy_page_range()).
> > Which makes me think that
> > 1. Your new patch fixes the problem introduced by 25d6e2d7,
> > 2. There's still another issue than the one introduced by 25d6e2d7.
>
> Does the following patch fix the errors you're seeing? (it applies the
> same fix as the previous patch but this time to copy_tofrom_user, which
> I updated in a4e22f02f5b6518c1484faea1f88d81802b9feac)

Thanks, but I still get crashes in copy_page_range().

With kind regards,

Geert Uytterhoeven

2009-02-25 22:44:13

Subject: Re: Crash (ext3 ) during 2.6.29-rc6 boot

On Thu, 26 Feb 2009 12:31:20 am Geert Uytterhoeven wrote:
> On Wed, 25 Feb 2009, Mark Nelson wrote:
> > On Wed, 25 Feb 2009 08:50:46 pm Geert Uytterhoeven wrote:
> > > On Wed, 25 Feb 2009, Mark Nelson wrote:
> > > > On Tue, 24 Feb 2009 05:38:37 pm Sachin P. Sant wrote:
> > > > > Jan Kara wrote:
> > > > > > Hmm, OK. But then I'm not sure how that can happen. Obviously, memcpy
> > > > > > somehow got beyond end of the page referenced by bh->b_data. So it means
> > > > > > that le16_to_cpu(entry->e_value_offs) + size > page_size. But
> > > > > > ext3_xattr_find_entry() calls ext3_xattr_check_entry() which in
> > > > > > particular checks whether e_value_offs + e_value_size isn't greater than
> > > > > > bh->b_size. So I see no way how memcpy can get beyond end of the page.
> > > > > > Sachin, is the problem reproducible? If yes, can you send us contents
> > > > > >
> > > > > Yes, i am able to recreate this problem easily. As i had mentioned if the
> > > > > earlier kernel is booted with selinux enabled and then 2.6.29-rc6 is booted
> > > > > i get this crash. But if i specify selinux=0 at command line, 2.6.29-rc6 boots
> > > > > without any problem.
> > > >
> > > > Hi Sachin and Geert,
> > > >
> > > > Does the patch below fix the problems you're seeing? If it does I'll send
> > > > a properly written up and formatted patch to linuxppc-dev (as well as
> > > > another one to fix the same problem in copy_tofrom_user()).
> > >
> > > Unfortunately not, now it crashes while accessing the memory pointed to by
> > > GPR16, in
> > >
> > > NIP: copy_page_range+0x608/0x628
> > > LR: dup_mm+0x2e4/0x428
> > > Trace: debug_table+0xcc70/0x1afe0 (unreliable)
> > > dup_mm+0x2e4/0x428
> > > copy_process+0x86c/0xf9c
> > > do_fork+0x188/0x39c
> > > sys_clone+0x58/0x70
> > > ppc_clone+0x8/0xc
> > >
> > > However, after reverting 25d6e2d7c58ddc4a3b614fc5381591c0cfe66556, I still see
> > > similar problems as above (crash in copy_page_range()).
> > > Which makes me think that
> > > 1. Your new patch fixes the problem introduced by 25d6e2d7,
> > > 2. There's still another issue than the one introduced by 25d6e2d7.
> >
> > Does the following patch fix the errors you're seeing? (it applies the
> > same fix as the previous patch but this time to copy_tofrom_user, which
> > I updated in a4e22f02f5b6518c1484faea1f88d81802b9feac)
>
> Thanks, but I still get crashes in copy_page_range().
>

Hmmm... I'm out of ideas for the moment, but thanks for testing anyway!

Mark

2009-02-25 23:19:02

Subject: Re: Crash (ext3 ) during 2.6.29-rc6 boot

On Thu, 26 Feb 2009 09:45:41 am Mark Nelson wrote:
> On Thu, 26 Feb 2009 12:31:20 am Geert Uytterhoeven wrote:
> > On Wed, 25 Feb 2009, Mark Nelson wrote:
> > > On Wed, 25 Feb 2009 08:50:46 pm Geert Uytterhoeven wrote:
> > > > On Wed, 25 Feb 2009, Mark Nelson wrote:
> > > > > On Tue, 24 Feb 2009 05:38:37 pm Sachin P. Sant wrote:
> > > > > > Jan Kara wrote:
> > > > > > > Hmm, OK. But then I'm not sure how that can happen. Obviously, memcpy
> > > > > > > somehow got beyond end of the page referenced by bh->b_data. So it means
> > > > > > > that le16_to_cpu(entry->e_value_offs) + size > page_size. But
> > > > > > > ext3_xattr_find_entry() calls ext3_xattr_check_entry() which in
> > > > > > > particular checks whether e_value_offs + e_value_size isn't greater than
> > > > > > > bh->b_size. So I see no way how memcpy can get beyond end of the page.
> > > > > > > Sachin, is the problem reproducible? If yes, can you send us contents
> > > > > > >
> > > > > > Yes, i am able to recreate this problem easily. As i had mentioned if the
> > > > > > earlier kernel is booted with selinux enabled and then 2.6.29-rc6 is booted
> > > > > > i get this crash. But if i specify selinux=0 at command line, 2.6.29-rc6 boots
> > > > > > without any problem.
> > > > >
> > > > > Hi Sachin and Geert,
> > > > >
> > > > > Does the patch below fix the problems you're seeing? If it does I'll send
> > > > > a properly written up and formatted patch to linuxppc-dev (as well as
> > > > > another one to fix the same problem in copy_tofrom_user()).
> > > >
> > > > Unfortunately not, now it crashes while accessing the memory pointed to by
> > > > GPR16, in
> > > >
> > > > NIP: copy_page_range+0x608/0x628
> > > > LR: dup_mm+0x2e4/0x428
> > > > Trace: debug_table+0xcc70/0x1afe0 (unreliable)
> > > > dup_mm+0x2e4/0x428
> > > > copy_process+0x86c/0xf9c
> > > > do_fork+0x188/0x39c
> > > > sys_clone+0x58/0x70
> > > > ppc_clone+0x8/0xc
> > > >
> > > > However, after reverting 25d6e2d7c58ddc4a3b614fc5381591c0cfe66556, I still see
> > > > similar problems as above (crash in copy_page_range()).
> > > > Which makes me think that
> > > > 1. Your new patch fixes the problem introduced by 25d6e2d7,
> > > > 2. There's still another issue than the one introduced by 25d6e2d7.
> > >
> > > Does the following patch fix the errors you're seeing? (it applies the
> > > same fix as the previous patch but this time to copy_tofrom_user, which
> > > I updated in a4e22f02f5b6518c1484faea1f88d81802b9feac)
> >
> > Thanks, but I still get crashes in copy_page_range().
> >
>
> Hmmm... I'm out of ideas for the moment, but thanks for testing anyway!
>
> Mark

If you revert both 25d6e2d7c58ddc4a3b614fc5381591c0cfe66556 and
a4e22f02f5b6518c1484faea1f88d81802b9feac, does it help? You could also
try to revert 57dda6ef5bd5b9e60410477ad29e654097e2cca1 just in case I
need to keep wearing the brown paper bag for a bit longer :)

Thanks!

Mark

2009-02-25 23:25:19

Subject: [PATCH] powerpc: Fix 64bit memcpy() regression

This fixes a regression introduced by commit
25d6e2d7c58ddc4a3b614fc5381591c0cfe66556 ("powerpc: Update 64bit memcpy()
using CPU_FTR_UNALIGNED_LD_STD").

This commit allowed CPUs that have the CPU_FTR_UNALIGNED_LD_STD CPU
feature bit present to do the memcpy() with unaligned load doubles. But,
along with this came a bug where our final load double would read bytes
beyond a page boundary and into the next (unmapped) page. This was caught
by enabling CONFIG_DEBUG_PAGEALLOC.
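
To make the failure condition concrete, here is a minimal C sketch (not
kernel code) of when a full 8-byte tail load touches a page that the bytes
actually needed do not reach. The helper name tail_load_crosses_page and the
4 KiB PAGE_SIZE are assumptions made for illustration only; the kernel's
page size depends on its configuration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 4096u /* assumed for illustration; configurable in the kernel */

/*
 * Returns true if an 8-byte load starting at addr reads into a page that
 * the `needed` bytes (1..7) starting at addr do not reach.
 */
static bool tail_load_crosses_page(uintptr_t addr, unsigned needed)
{
    assert(needed >= 1 && needed <= 7);
    uintptr_t last_needed = addr + needed - 1; /* last byte we must copy */
    uintptr_t last_loaded = addr + 7;          /* last byte an 8-byte load reads */
    return (last_needed / PAGE_SIZE) != (last_loaded / PAGE_SIZE);
}
```

With an 8-byte-aligned source address the helper always returns false,
because an aligned doubleword never straddles a page boundary. Only
unaligned sources near the end of a page can trip the condition.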

The fix was to read only the number of bytes that we need to store rather
than reading a full 8-byte doubleword and storing only a portion of that.

In order to minimise the amount of existing code touched we use the
original do_tail for the src_unaligned case.
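
As a rough C rendering of the idea behind the new tail handling (a sketch
under assumptions, not a translation of the assembly), the remaining bytes
can be copied with 4-, 2- and 1-byte accesses selected by the low bits of
the remaining count, loosely mirroring the bf cr7*4+N branches in the patch.
The function name copy_tail is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the final n (< 8) bytes using only 4-, 2- and 1-byte accesses,
 * so that no load touches a byte at or beyond src + n.
 */
static void copy_tail(unsigned char *dst, const unsigned char *src, size_t n)
{
    assert(n < 8);
    if (n & 4) {            /* a 4-byte chunk remains */
        memcpy(dst, src, 4);
        dst += 4;
        src += 4;
    }
    if (n & 2) {            /* a 2-byte chunk remains */
        memcpy(dst, src, 2);
        dst += 2;
        src += 2;
    }
    if (n & 1)              /* a single byte remains */
        *dst = *src;
}
```

The trade-off is up to three small loads instead of one doubleword load,
in exchange for never reading past the end of the source buffer.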

Below is an example of the regression, as reported by Sachin Sant:

Unable to handle kernel paging request for data at address 0xc00000003f380000
Faulting instruction address: 0xc000000000039574
cpu 0x1: Vector: 300 (Data Access) at [c00000003baf3020]
pc: c000000000039574: .memcpy+0x74/0x244
lr: d00000000244916c: .ext3_xattr_get+0x288/0x2f4 [ext3]
sp: c00000003baf32a0
msr: 8000000000009032
dar: c00000003f380000
dsisr: 40000000
current = 0xc00000003e54b010
paca = 0xc000000000a53680
pid = 1840, comm = readahead
enter ? for help
[link register ] d00000000244916c .ext3_xattr_get+0x288/0x2f4 [ext3]
[c00000003baf32a0] d000000002449104 .ext3_xattr_get+0x220/0x2f4 [ext3] (unreliable)
[c00000003baf3390] d00000000244a6e8 .ext3_xattr_security_get+0x40/0x5c [ext3]
[c00000003baf3400] c000000000148154 .generic_getxattr+0x74/0x9c
[c00000003baf34a0] c000000000333400 .inode_doinit_with_dentry+0x1c4/0x678
[c00000003baf3560] c00000000032c6b0 .security_d_instantiate+0x50/0x68
[c00000003baf35e0] c00000000013c818 .d_instantiate+0x78/0x9c
[c00000003baf3680] c00000000013ced0 .d_splice_alias+0xf0/0x120
[c00000003baf3720] d00000000243e05c .ext3_lookup+0xec/0x134 [ext3]
[c00000003baf37c0] c000000000131e74 .do_lookup+0x110/0x260
[c00000003baf3880] c000000000134ed0 .__link_path_walk+0xa98/0x1010
[c00000003baf3970] c0000000001354a0 .path_walk+0x58/0xc4
[c00000003baf3a20] c000000000135720 .do_path_lookup+0x138/0x1e4
[c00000003baf3ad0] c00000000013645c .path_lookup_open+0x6c/0xc8
[c00000003baf3b70] c000000000136780 .do_filp_open+0xcc/0x874
[c00000003baf3d10] c0000000001251e0 .do_sys_open+0x80/0x140
[c00000003baf3dc0] c00000000016aaec .compat_sys_open+0x24/0x38
[c00000003baf3e30] c00000000000855c syscall_exit+0x0/0x40
--- Exception: c01 (System Call) at 000000000ff0ef18
SP (ffc6f4b0) is in userspace
1:mon>

Signed-off-by: Mark Nelson <[email protected]>
Reported-by: Sachin Sant <[email protected]>
Tested-by: Sachin Sant <[email protected]>
---
arch/powerpc/lib/memcpy_64.S | 26 ++++++++++++++++++++------
1 file changed, 20 insertions(+), 6 deletions(-)

Index: upstream/arch/powerpc/lib/memcpy_64.S
===================================================================
--- upstream.orig/arch/powerpc/lib/memcpy_64.S
+++ upstream/arch/powerpc/lib/memcpy_64.S
@@ -53,18 +53,19 @@ END_FTR_SECTION_IFCLR(CPU_FTR_UNALIGNED_
3: std r8,8(r3)
beq 3f
addi r3,r3,16
- ld r9,8(r4)
.Ldo_tail:
bf cr7*4+1,1f
- rotldi r9,r9,32
+ lwz r9,8(r4)
+ addi r4,r4,4
stw r9,0(r3)
addi r3,r3,4
1: bf cr7*4+2,2f
- rotldi r9,r9,16
+ lhz r9,8(r4)
+ addi r4,r4,2
sth r9,0(r3)
addi r3,r3,2
2: bf cr7*4+3,3f
- rotldi r9,r9,8
+ lbz r9,8(r4)
stb r9,0(r3)
3: ld r3,48(r1) /* return dest pointer */
blr
@@ -133,11 +134,24 @@ END_FTR_SECTION_IFCLR(CPU_FTR_UNALIGNED_
cmpwi cr1,r5,8
addi r3,r3,32
sld r9,r9,r10
- ble cr1,.Ldo_tail
+ ble cr1,6f
ld r0,8(r4)
srd r7,r0,r11
or r9,r7,r9
- b .Ldo_tail
+6:
+ bf cr7*4+1,1f
+ rotldi r9,r9,32
+ stw r9,0(r3)
+ addi r3,r3,4
+1: bf cr7*4+2,2f
+ rotldi r9,r9,16
+ sth r9,0(r3)
+ addi r3,r3,2
+2: bf cr7*4+3,3f
+ rotldi r9,r9,8
+ stb r9,0(r3)
+3: ld r3,48(r1) /* return dest pointer */
+ blr

.Ldst_unaligned:
PPC_MTOCRF 0x01,r6 # put #bytes to 8B bdry into cr7

2009-02-25 23:44:51

Subject: [PATCH] powerpc: Fix 64bit __copy_tofrom_user() regression

This fixes a regression introduced by commit
a4e22f02f5b6518c1484faea1f88d81802b9feac ("powerpc: Update 64bit
__copy_tofrom_user() using CPU_FTR_UNALIGNED_LD_STD").

The same bug that existed in the 64bit memcpy() also exists here so fix
it here too. The fix is the same as that applied to memcpy() with the
addition of fixes for the exception handling code required for
__copy_tofrom_user().

This stops us reading beyond the end of the source region we were told
to copy.
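
The .llong lines added to the table in the patch below (for example
.llong 44b,144b) pair each load or store that may fault with the label of
its fixup code. As a simplified C sketch of that pairing, with hypothetical
names ex_entry and find_fixup and a plain linear search (the real kernel
lookup differs in detail):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, simplified exception-table entry: if the instruction at
 * `insn` faults, execution resumes at `fixup`.
 */
struct ex_entry {
    uintptr_t insn;
    uintptr_t fixup;
};

/* Linear search for illustration; returns 0 when no entry matches. */
static uintptr_t find_fixup(const struct ex_entry *tbl, size_t n,
                            uintptr_t fault_pc)
{
    assert(tbl != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (tbl[i].insn == fault_pc)
            return tbl[i].fixup;
    return 0;
}
```

Every new load or store introduced into the copy path needs its own entry,
which is why the patch adds labels 44, 45, 94, 95 and 96 together with their
1xx fixup counterparts.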

Signed-off-by: Mark Nelson <[email protected]>
---
arch/powerpc/lib/copyuser_64.S | 38 +++++++++++++++++++++++++++++++-------
1 file changed, 31 insertions(+), 7 deletions(-)

Index: upstream/arch/powerpc/lib/copyuser_64.S
===================================================================
--- upstream.orig/arch/powerpc/lib/copyuser_64.S
+++ upstream/arch/powerpc/lib/copyuser_64.S
@@ -62,18 +62,19 @@ END_FTR_SECTION_IFCLR(CPU_FTR_UNALIGNED_
72: std r8,8(r3)
beq+ 3f
addi r3,r3,16
-23: ld r9,8(r4)
.Ldo_tail:
bf cr7*4+1,1f
- rotldi r9,r9,32
+23: lwz r9,8(r4)
+ addi r4,r4,4
73: stw r9,0(r3)
addi r3,r3,4
1: bf cr7*4+2,2f
- rotldi r9,r9,16
+44: lhz r9,8(r4)
+ addi r4,r4,2
74: sth r9,0(r3)
addi r3,r3,2
2: bf cr7*4+3,3f
- rotldi r9,r9,8
+45: lbz r9,8(r4)
75: stb r9,0(r3)
3: li r3,0
blr
@@ -141,11 +142,24 @@ END_FTR_SECTION_IFCLR(CPU_FTR_UNALIGNED_
6: cmpwi cr1,r5,8
addi r3,r3,32
sld r9,r9,r10
- ble cr1,.Ldo_tail
+ ble cr1,7f
34: ld r0,8(r4)
srd r7,r0,r11
or r9,r7,r9
- b .Ldo_tail
+7:
+ bf cr7*4+1,1f
+ rotldi r9,r9,32
+94: stw r9,0(r3)
+ addi r3,r3,4
+1: bf cr7*4+2,2f
+ rotldi r9,r9,16
+95: sth r9,0(r3)
+ addi r3,r3,2
+2: bf cr7*4+3,3f
+ rotldi r9,r9,8
+96: stb r9,0(r3)
+3: li r3,0
+ blr

.Ldst_unaligned:
PPC_MTOCRF 0x01,r6 /* put #bytes to 8B bdry into cr7 */
@@ -218,7 +232,6 @@ END_FTR_SECTION_IFCLR(CPU_FTR_UNALIGNED_
121:
132:
addi r3,r3,8
-123:
134:
135:
138:
@@ -226,6 +239,9 @@ END_FTR_SECTION_IFCLR(CPU_FTR_UNALIGNED_
140:
141:
142:
+123:
+144:
+145:

/*
* here we have had a fault on a load and r3 points to the first
@@ -309,6 +325,9 @@ END_FTR_SECTION_IFCLR(CPU_FTR_UNALIGNED_
187:
188:
189:
+194:
+195:
+196:
1:
ld r6,-24(r1)
ld r5,-8(r1)
@@ -329,7 +348,9 @@ END_FTR_SECTION_IFCLR(CPU_FTR_UNALIGNED_
.llong 72b,172b
.llong 23b,123b
.llong 73b,173b
+ .llong 44b,144b
.llong 74b,174b
+ .llong 45b,145b
.llong 75b,175b
.llong 24b,124b
.llong 25b,125b
@@ -347,6 +368,9 @@ END_FTR_SECTION_IFCLR(CPU_FTR_UNALIGNED_
.llong 79b,179b
.llong 80b,180b
.llong 34b,134b
+ .llong 94b,194b
+ .llong 95b,195b
+ .llong 96b,196b
.llong 35b,135b
.llong 81b,181b
.llong 36b,136b

2009-02-26 17:40:34