LinuxLists.cc - Oops with corrupted NTFS image

2008-07-06 17:22:03

Subject: Oops with corrupted NTFS image

Hi,

when mounting a corrupted ntfs image with errors=recover i get the following oops:

[ 129.615116] NTFS-fs error (device loop0): check_mft_mirror(): $MFT and $MFTMirr (record 0) do not match. Run ntfsfix or chkdsk.
[ 129.627531] NTFS-fs error (device loop0): load_system_files(): $MFTMirr does not match $MFT. Mounting read-only. Run ntfsfix and/or chkdsk.
[ 129.639941] NTFS-fs error (device loop0): map_mft_record_page(): Mft record 0xa is corrupt. Run chkdsk.
[ 129.640212] NTFS-fs error (device loop0): map_mft_record(): Failed with error code 5.
[ 129.640388] NTFS-fs error (device loop0): ntfs_read_locked_inode(): Failed with error code -5. Marking corrupt inode 0xa as bad. Run chkdsk.
[ 129.640623] NTFS-fs error (device loop0): load_and_init_upcase(): Failed to load $UpCase from the volume. Using default.
[ 129.656112] BUG: unable to handle kernel paging request at c982e558
[ 129.656372] IP: [<c030ad5c>] ntfs_read_locked_inode+0x16c/0x1520
[ 129.656656] Oops: 0000 [#1] PREEMPT DEBUG_PAGEALLOC
[ 129.656999] Modules linked in: [last unloaded: rcutorture]
[ 129.657014]
[ 129.657014] Pid: 4676, comm: mount Not tainted (2.6.26-rc8-00290-gb8a0b6c #3)
[ 129.657014] EIP: 0060:[<c030ad5c>] EFLAGS: 00010296 CPU: 0
[ 129.657014] EIP is at ntfs_read_locked_inode+0x16c/0x1520
[ 129.657014] EAX: c982e550 EBX: 00000000 ECX: c9821000 EDX: c9821038
[ 129.657014] ESI: 00000000 EDI: 00000000 EBP: c902ab50 ESP: c902aadc
[ 129.657014] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
[ 129.657014] Process mount (pid: 4676, ti=c902a000 task=c9053f80 task.ti=c902a000)
[ 129.657014] Stack: 00000000 00000000 00000000 00000000 00000000 c9026000 c0cd58fc cbffad84
[ 129.657014] c902ab14 c0976870 c0976870 c0976860 cb6e3a40 c9821000 c982e550 c9026000
[ 129.657014] 00000000 c902ab2c c0755667 cbffac00 c902ab50 c01a22f8 c902ab60 00000001
[ 129.657014] Call Trace:
[ 129.657014] [<c0755667>] ? _spin_unlock+0x27/0x50
[ 129.657014] [<c01a22f8>] ? iget5_locked+0x138/0x160
[ 129.657014] [<c030cc80>] ? ntfs_test_inode+0x0/0x70
[ 129.657014] [<c030c165>] ? ntfs_iget+0x55/0x80
[ 129.657014] [<c030ccf0>] ? ntfs_init_locked_inode+0x0/0xf0
[ 129.657014] [<c031861b>] ? load_and_init_attrdef+0x1b/0x270
[ 129.657014] [<c014678f>] ? lockdep_init_map+0x2f/0x480
[ 129.657014] [<c03021ad>] ? __ntfs_error+0x8d/0xd0
[ 129.657014] [<c0318c16>] ? load_system_files+0x3a6/0x1c50
[ 129.657014] [<c031cd4d>] ? generate_default_upcase+0x2d/0x110
[ 129.657014] [<c031cd4d>] ? generate_default_upcase+0x2d/0x110
[ 129.657014] [<c031cd4d>] ? generate_default_upcase+0x2d/0x110
[ 129.657014] [<c031bf6f>] ? ntfs_fill_super+0xccf/0x1420
[ 129.657014] [<c0190b5f>] ? get_sb_bdev+0xef/0x120
[ 129.657014] [<c01a5afd>] copy_mount_options+0x38/0x140
[ 129.657014] [<c01973ae>] ? getname+0x9e/0xd0
[ 129.657014] [<c01a6f7f>] ? sys_mount+0x6f/0xb0
[ 129.657014] [<c0103d79>] ? sysenter_past_esp+0x6a/0xb1
[ 129.657014] =======================
[ 129.657014] Code: 0c 89 5c 24 04 c7 04 24 00 00 00 00 e8 ae 2b ff ff 85 c0 89 45 cc 0f 85 26 0c 00 00 8b 45 c8 8b 50 04 0f b7 42 14 01 d0 89 45 c4 <8b> 58 08 8b 70 0c 81 c3 00 80 c1 2a 81 d6 21 4e 62 fe 89 d8 89
[ 129.657014] EIP: [<c030ad5c>] ntfs_read_locked_inode+0x16c/0x1520 SS:ESP 0068:c902aadc
[ 129.657014] ---[ end trace 9b1c4369310afb44 ]---

which corresponds to:

(gdb) l *(ntfs_read_locked_inode+0x16c)
0xc030ad5c is in ntfs_read_locked_inode (fs/ntfs/time.h:95).
90 u64 t = (u64)(sle64_to_cpu(time) - NTFS_TIME_OFFSET);
91 /*
92 * Convert the time to 1-second intervals and the remainder to
93 * 1-nano-second intervals.
94 */
95 ts.tv_nsec = do_div(t, 10000000) * 100;
96 ts.tv_sec = t;
97 return ts;
98 }
99

The image is available at http://www.cccmz.de/~snakebyte/ntfs.33.img.bz2
I can also reproduce this with -rc9 and without having run rcutortue
before.

Greetings, Eric

2008-07-07 08:34:21

by Anton Altaparmakov

[permalink] [raw]

Subject: Re: [Linux-NTFS-Dev] Oops with corrupted NTFS image

Hi,

Thanks for the report. Would you be able to try the below change to
your kernel and repeat?

On 6 Jul 2008, at 18:21, Eric Sesterhenn wrote:
> 0xc030ad5c is in ntfs_read_locked_inode (fs/ntfs/time.h:95).
> 90 u64 t = (u64)(sle64_to_cpu(time) - NTFS_TIME_OFFSET);
> 91 /*
> 92 * Convert the time to 1-second intervals and the remainder to
> 93 * 1-nano-second intervals.
> 94 */
> 95 ts.tv_nsec = do_div(t, 10000000) * 100;
> 96 ts.tv_sec = t;
> 97 return ts;
> 98 }

Please replace the ts.tv_nsec=... and ts.tv_sec=... with:

if (t) {
ts.tv_nsec = do_div(t, 10000000) * 100;
ts.tv_sec = t;
} else {
ts.tv_nsec = 0;
ts.tv_sec = 0;
}

Thanks a lot in advance!

Best regards,

Anton
--
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer, http://www.linux-ntfs.org/

2008-07-07 12:25:03

by Eric Sesterhenn

[permalink] [raw]

Subject: Re: [Linux-NTFS-Dev] Oops with corrupted NTFS image

hi,

* Anton Altaparmakov ([email protected]) wrote:
>
> Thanks for the report. Would you be able to try the below change to your
> kernel and repeat?
>
> On 6 Jul 2008, at 18:21, Eric Sesterhenn wrote:
>> 0xc030ad5c is in ntfs_read_locked_inode (fs/ntfs/time.h:95).
>> 90 u64 t = (u64)(sle64_to_cpu(time) - NTFS_TIME_OFFSET);
>> 91 /*
>> 92 * Convert the time to 1-second intervals and the remainder to
>> 93 * 1-nano-second intervals.
>> 94 */
>> 95 ts.tv_nsec = do_div(t, 10000000) * 100;
>> 96 ts.tv_sec = t;
>> 97 return ts;
>> 98 }
>
> Please replace the ts.tv_nsec=... and ts.tv_sec=... with:
>
> if (t) {
> ts.tv_nsec = do_div(t, 10000000) * 100;
> ts.tv_sec = t;
> } else {
> ts.tv_nsec = 0;
> ts.tv_sec = 0;
> }
>
> Thanks a lot in advance!

I did this and still get the oops, the strange thing is, that i now
get the bug in line 90

[ 76.583958] BUG: unable to handle kernel paging request at c6798558
[ 76.584019] IP: [<c030adbc>] ntfs_read_locked_inode+0x16c/0x1570
[ 76.584019] *pde = 090b1163 *pte = 06798160
[ 76.584019] Oops: 0000 [#1] PREEMPT DEBUG_PAGEALLOC
[ 76.584019] Modules linked in:
[ 76.584019]
[ 76.584019] Pid: 4488, comm: mount Not tainted (2.6.26-rc9-00005-g1b40a89-dirty #1)
[ 76.584019] EIP: 0060:[<c030adbc>] EFLAGS: 00010296 CPU: 0
[ 76.584019] EIP is at ntfs_read_locked_inode+0x16c/0x1570
[ 76.584019] EAX: c6798550 EBX: 00000000 ECX: c678b000 EDX: c678b038
[ 76.584019] ESI: 00000000 EDI: 00000000 EBP: c5c2db50 ESP: c5c2dadc
[ 76.584019] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
[ 76.584019] Process mount (pid: 4488, ti=c5c2d000 task=c5c12fa0 task.ti=c5c2d000)
[ 76.584019] Stack: 00000000 00000000 00000000 00000000 00000000 c8bda000 c0cd58fc c7132d84
[ 76.584019] c5c2db14 c0976870 c0976870 c0976860 c8bed180 c678b000 c6798550 c8bda000
[ 76.584019] 00000000 c5c2db2c c0755737 c7132c00 c5c2db50 c01a2308 c5c2db60 00000001
[ 76.584019] Call Trace:
[ 76.584019] [<c0755737>] ? _spin_unlock+0x27/0x50
[ 76.584019] [<c01a2308>] ? iget5_locked+0x138/0x160
[ 76.584019] [<c030cd30>] ? ntfs_test_inode+0x0/0x70
[ 76.584020] [<c030c215>] ? ntfs_iget+0x55/0x80
[ 76.584020] [<c030cda0>] ? ntfs_init_locked_inode+0x0/0xf0
[ 76.584020] [<c03186cb>] ? load_and_init_attrdef+0x1b/0x270
[ 76.584020] [<c014678f>] ? lockdep_init_map+0x2f/0x480
[ 76.584020] [<c030219a>] ? __ntfs_error+0x1a/0xd0
[ 76.584020] [<c0318cc6>] ? load_system_files+0x3a6/0x1c50
[ 76.584020] [<c031cdfd>] ? generate_default_upcase+0x2d/0x110
[ 76.584020] [<c031cdfd>] ? generate_default_upcase+0x2d/0x110
[ 76.584020] [<c031cdfd>] ? generate_default_upcase+0x2d/0x110
[ 76.584020] [<c031c01f>] ? ntfs_fill_super+0xccf/0x1420
[ 76.584020] [<c0753a28>] ? mutex_unlock+0x8/0x10
[ 76.584020] [<c0190b6f>] ? get_sb_bdev+0xef/0x120
[ 76.584020] [<c01a5b0d>] ? alloc_vfsmnt+0xdd/0x120
[ 76.584020] [<c01a5b0d>] ? alloc_vfsmnt+0xdd/0x120
[ 76.584020] [<c0317402>] ? ntfs_get_sb+0x22/0x30
[ 76.584020] [<c031b350>] ? ntfs_fill_super+0x0/0x1420
[ 76.584020] [<c01906ba>] ? vfs_kern_mount+0x3a/0x90
[ 76.584020] [<c0190769>] ? do_kern_mount+0x39/0xd0
[ 76.584020] [<c01a6d45>] ? do_new_mount+0x65/0x90
[ 76.584020] [<c01a6eca>] ? do_mount+0x15a/0x1b0
[ 76.584020] [<c0189d95>] ? kmem_cache_alloc+0x95/0xc0
[ 76.584020] [<c016deeb>] ? __get_free_pages+0x1b/0x30
[ 76.584020] [<c01a4bd8>] ? copy_mount_options+0x38/0x140
[ 76.584020] [<c01973be>] ? getname+0x9e/0xd0
[ 76.584020] [<c01a6f8f>] ? sys_mount+0x6f/0xb0
[ 76.584020] [<c0103d79>] ? sysenter_past_esp+0x6a/0xb1
[ 76.584020] =======================
[ 76.584020] Code: 0c 89 5c 24 04 c7 04 24 00 00 00 00 e8 ae 2b ff ff 85 c0 89 45 cc 0f 85 76 0c 00 00 8b 45 c8 8b 50 04 0f b7 42 14 01 d0 89 45 c4 <8b> 58 08 8b 70 0c 81 c3 00 80 c1 2a 81 d6 21 4e 62 fe 89 f1 09
[ 76.584020] EIP: [<c030adbc>] ntfs_read_locked_inode+0x16c/0x1570 SS:ESP 0068:c5c2dadc
[ 76.584020] ---[ end trace f67599227dc2e4cc ]---

(gdb) l *(ntfs_read_locked_inode+0x16c)
0xc030adbc is in ntfs_read_locked_inode (fs/ntfs/time.h:90).
85 static inline struct timespec ntfs2utc(const sle64 time)
86 {
87 struct timespec ts;
88
89 /* Subtract the NTFS time offset. */
90 u64 t = (u64)(sle64_to_cpu(time) - NTFS_TIME_OFFSET);
91 /*
92 * Convert the time to 1-second intervals and the remainder to
93 * 1-nano-second intervals.
94 */
(gdb) quit

Not sure why this happens. I checked out a fresh git tree to
make sure my tree isnt broken or something. Might gcc be bogus
or the debug information and the bug happens in reality somewhere else?

root@whiterabbit:/usr/src/linux# gcc -v
Using built-in specs.
Target: i486-linux-gnu
Configured with: ../src/configure -v --enable-languages=c,c++,fortran,objc,obj-c++,treelang --prefix=/usr --enable-shared --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --enable-nls --with-gxx-include-dir=/usr/include/c++/4.2 --program-suffix=-4.2 --enable-clocale=gnu --enable-libstdcxx-debug --enable-objc-gc --enable-mpfr --enable-targets=all --enable-checking=release --build=i486-linux-gnu --host=i486-linux-gnu --target=i486-linux-gnu
Thread model: posix
gcc version 4.2.3 (Ubuntu 4.2.3-2ubuntu7)

Greetings, Eric

2008-07-07 13:04:09

by Vegard Nossum

[permalink] [raw]

Subject: Re: [Linux-NTFS-Dev] Oops with corrupted NTFS image

On Mon, Jul 7, 2008 at 2:24 PM, Eric Sesterhenn <[email protected]> wrote:
> (gdb) l *(ntfs_read_locked_inode+0x16c)
> 0xc030adbc is in ntfs_read_locked_inode (fs/ntfs/time.h:90).
> 85 static inline struct timespec ntfs2utc(const sle64 time)
> 86 {
> 87 struct timespec ts;
> 88
> 89 /* Subtract the NTFS time offset. */
> 90 u64 t = (u64)(sle64_to_cpu(time) - NTFS_TIME_OFFSET);
> 91 /*
> 92 * Convert the time to 1-second intervals and the remainder to
> 93 * 1-nano-second intervals.
> 94 */
> (gdb) quit
>
> Not sure why this happens. I checked out a fresh git tree to
> make sure my tree isnt broken or something. Might gcc be bogus
> or the debug information and the bug happens in reality somewhere else?

Are you sure you didn't recompile/relink vmlinux after getting the
error? If not, maybe it's gdb which gets confused (somehow) by the
inlining.

Your 'Code:' line decodes to these instructions:

0: 8b 58 08 mov 0x8(%eax),%ebx
3: 8b 70 0c mov 0xc(%eax),%esi

And I find this in my own compiled vmlinux at:

c025bcc1: 8b 58 08 mov 0x8(%eax),%ebx
c025bcc4: 8b 70 0c mov 0xc(%eax),%esi

which is at...

$ addr2line -e vmlinux -i c025bcc1
fs/ntfs/inode.c:670

which is...

vi->i_mtime = ntfs2utc(si->last_data_change_time);

which is probably what is causing the NULL pointer dereference.

Vegard

--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036

2008-07-07 13:21:46

by Carl-Daniel Hailfinger

[permalink] [raw]

Subject: Re: [Linux-NTFS-Dev] Oops with corrupted NTFS image

On 07.07.2008 10:07, Anton Altaparmakov wrote:
> Hi,
>
> Thanks for the report. Would you be able to try the below change to
> your kernel and repeat?
>
> On 6 Jul 2008, at 18:21, Eric Sesterhenn wrote:
>
>> 0xc030ad5c is in ntfs_read_locked_inode (fs/ntfs/time.h:95).
>> 90 u64 t = (u64)(sle64_to_cpu(time) - NTFS_TIME_OFFSET);
>> 91 /*
>> 92 * Convert the time to 1-second intervals and the remainder to
>> 93 * 1-nano-second intervals.
>> 94 */
>> 95 ts.tv_nsec = do_div(t, 10000000) * 100;
>> 96 ts.tv_sec = t;
>> 97 return ts;
>> 98 }
>>
>
> Please replace the ts.tv_nsec=... and ts.tv_sec=... with:
>
> if (t) {
> ts.tv_nsec = do_div(t, 10000000) * 100;
> ts.tv_sec = t;
> } else {
> ts.tv_nsec = 0;
> ts.tv_sec = 0;
> }
>

AFAICS it is not the division that fails (unless a division itself can
cause an access to invalid memory), but either the write to ts or the
read from t. Dependign on compiler optimizations, the real root cause
may be related to an access to time.

Regards,
Carl-Daniel

--
http://www.hailfinger.org/

2008-07-07 13:27:17

by Carl-Daniel Hailfinger

[permalink] [raw]

Subject: Re: [Linux-NTFS-Dev] Oops with corrupted NTFS image

On 07.07.2008 15:03, Vegard Nossum wrote:
> On Mon, Jul 7, 2008 at 2:24 PM, Eric Sesterhenn <[email protected]> wrote:
>
>> (gdb) l *(ntfs_read_locked_inode+0x16c)
>> 0xc030adbc is in ntfs_read_locked_inode (fs/ntfs/time.h:90).
>> 85 static inline struct timespec ntfs2utc(const sle64 time)
>> 86 {
>> 87 struct timespec ts;
>> 88
>> 89 /* Subtract the NTFS time offset. */
>> 90 u64 t = (u64)(sle64_to_cpu(time) - NTFS_TIME_OFFSET);
>> 91 /*
>> 92 * Convert the time to 1-second intervals and the remainder to
>> 93 * 1-nano-second intervals.
>> 94 */
>> (gdb) quit
>>
>> Not sure why this happens. I checked out a fresh git tree to
>> make sure my tree isnt broken or something. Might gcc be bogus
>> or the debug information and the bug happens in reality somewhere else?
>>

As I pointed out in my other mail, gcc compiler optimizations may have
caused a slightly off location being printed.
My suspicions about time were correct.

> Are you sure you didn't recompile/relink vmlinux after getting the
> error? If not, maybe it's gdb which gets confused (somehow) by the
> inlining.
>
> Your 'Code:' line decodes to these instructions:
>
> 0: 8b 58 08 mov 0x8(%eax),%ebx
> 3: 8b 70 0c mov 0xc(%eax),%esi
>
> And I find this in my own compiled vmlinux at:
>
> c025bcc1: 8b 58 08 mov 0x8(%eax),%ebx
> c025bcc4: 8b 70 0c mov 0xc(%eax),%esi
>
> which is at...
>
> $ addr2line -e vmlinux -i c025bcc1
> fs/ntfs/inode.c:670
>
> which is...
>
> vi->i_mtime = ntfs2utc(si->last_data_change_time);
>
> which is probably what is causing the NULL pointer dereference.
>

The problem of this theory is that there is no NULL pointer dereference,
unless you meant "invalid memory access".

Regards,
Carl-Daniel

--
http://www.hailfinger.org/

2008-07-07 13:33:55

by Vegard Nossum

[permalink] [raw]

Subject: Re: [Linux-NTFS-Dev] Oops with corrupted NTFS image

On Mon, Jul 7, 2008 at 3:26 PM, Carl-Daniel Hailfinger
<[email protected]> wrote:
>> which is...
>>
>> vi->i_mtime = ntfs2utc(si->last_data_change_time);
>>
>> which is probably what is causing the NULL pointer dereference.
>>
>
> The problem of this theory is that there is no NULL pointer dereference,
> unless you meant "invalid memory access".

Oops, yeah. I skipped a line, but page fault nonetheless ;-)

The rest of the theory should still hold. The pointer being
dereferenced is held in %eax, which is c6798550. This fits with the
reported address:

[ 76.583958] BUG: unable to handle kernel paging request at c6798558
...
[ 76.584019] EAX: c6798550 EBX: 00000000 ECX: c678b000 EDX: c678b038

Thanks for the correction.

Vegard

--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036