2008-06-12 10:42:46

by Konstantin Kletschke

Subject: XFS internal error xfs_trans_cancel at line 1163 of file fs/xfs/xfs_trans.c

Hi!

This morning my home server went down twice (with a reboot in between):

Jun 12 07:23:40 zappa Filesystem "sda7": XFS internal error
xfs_trans_cancel at line 1163 of file fs/xfs/xfs_trans.c. Caller
0xffffffff802fa8f5
Jun 12 07:23:40 zappa Pid: 2379, comm: procmail Not tainted
2.6.25-gentoo-r4 #3
Jun 12 07:23:40 zappa
Jun 12 07:23:40 zappa Call Trace:
Jun 12 07:23:40 zappa [<ffffffff802fa8f5>]
Jun 12 07:23:40 zappa [<ffffffff802f4d71>]
Jun 12 07:23:40 zappa [<ffffffff802fa8f5>]
Jun 12 07:23:40 zappa [<ffffffff803038bb>]
Jun 12 07:23:40 zappa [<ffffffff8025f1e1>]
Jun 12 07:23:40 zappa [<ffffffff80261899>]
Jun 12 07:23:40 zappa [<ffffffff8025ae00>]
Jun 12 07:23:40 zappa [<ffffffff80257010>]
Jun 12 07:23:40 zappa [<ffffffff80256d92>]
Jun 12 07:23:40 zappa [<ffffffff80257077>]
Jun 12 07:23:40 zappa [<ffffffff8020ad3b>]
Jun 12 07:23:40 zappa
Jun 12 07:23:40 zappa xfs_force_shutdown(sda7,0x8) called from line
1164 of file fs/xfs/xfs_trans.c. Return address = 0xffffffff802f4d8a
Jun 12 07:23:40 zappa Filesystem "sda7": Corruption of in-memory data
detected. Shutting down filesystem: sda7
Jun 12 07:23:40 zappa Please umount the filesystem, and rectify the problem(s)

Jun 12 08:15:58 zappa Filesystem "sda7": XFS internal error
xfs_trans_cancel at line 1163 of file fs/xfs/xfs_trans.c. Caller
0xffffffff802fa8f5
Jun 12 08:15:58 zappa Pid: 2161, comm: procmail Not tainted
2.6.25-gentoo-r4 #3
Jun 12 08:15:58 zappa
Jun 12 08:15:58 zappa Call Trace:
Jun 12 08:15:58 zappa [<ffffffff802fa8f5>]
Jun 12 08:15:58 zappa [<ffffffff802f4d71>]
Jun 12 08:15:58 zappa [<ffffffff802fa8f5>]
Jun 12 08:15:58 zappa [<ffffffff803038bb>]
Jun 12 08:15:58 zappa [<ffffffff8025f1e1>]
Jun 12 08:15:58 zappa [<ffffffff80261899>]
Jun 12 08:15:58 zappa [<ffffffff8025ae00>]
Jun 12 08:15:58 zappa [<ffffffff80257010>]
Jun 12 08:15:58 zappa [<ffffffff80256d92>]
Jun 12 08:15:58 zappa [<ffffffff80257077>]
Jun 12 08:15:58 zappa [<ffffffff8020ad3b>]
Jun 12 08:15:58 zappa
Jun 12 08:15:58 zappa xfs_force_shutdown(sda7,0x8) called from line
1164 of file fs/xfs/xfs_trans.c. Return address = 0xffffffff802f4d8a
Jun 12 08:15:58 zappa Filesystem "sda7": Corruption of in-memory data
detected. Shutting down filesystem: sda7
Jun 12 08:15:58 zappa Please umount the filesystem, and rectify the problem(s)


The partition sda7 is my /home directory, located on a 750GB SATA hard
disk; the kernel is vanilla 2.6.25. The fs is 100GB in size and 12%
full of many small files (IMAP mail spool).

I investigated the system: everything else behaves normally, and I
found no errors in syslog or from smartctl indicating a sector
read/write error or anything else.
Sadly I have lost my ssh connection until this evening, but xfs_check
printed a line like "xxx-count is 1 but counted 0 in ag17" after I was
told to remount the filesystem once to replay the log (which worked).
Backups are finished; I was about to run xfs_repair...

Is this something I should worry about, or something for the XFS folks?

Kind Regards, Konsti


----------------------------------------------------------------
This message was sent using IMP, the Internet Messaging Program.


2008-06-12 14:34:14

by Oliver Pinter

Subject: Re: XFS internal error xfs_trans_cancel at line 1163 of file fs/xfs/xfs_trans.c

add CC's

On 6/12/08, [email protected] <[email protected]> wrote:
> Hi!
>
> Today morning my server at home bailed out two times (reboot between):
>
> [...]


--
Thanks,
Oliver

2008-06-12 22:24:40

by Eric Sandeen

Subject: Re: [xfs-masters] Re: XFS internal error xfs_trans_cancel at line 1163 of file fs/xfs/xfs_trans.c

Oliver Pinter wrote:
> add CC's
>
> On 6/12/08, [email protected] <[email protected]> wrote:
>> Hi!
>>
>> Today morning my server at home bailed out two times (reboot between):
>>
>> Jun 12 07:23:40 zappa Filesystem "sda7": XFS internal error
>> xfs_trans_cancel at line 1163 of file fs/xfs/xfs_trans.c. Caller
>> 0xffffffff802fa8f5
>> Jun 12 07:23:40 zappa Pid: 2379, comm: procmail Not tainted
>> 2.6.25-gentoo-r4 #3
>> Jun 12 07:23:40 zappa
>> Jun 12 07:23:40 zappa Call Trace:
>> Jun 12 07:23:40 zappa [<ffffffff802fa8f5>]
>> Jun 12 07:23:40 zappa [<ffffffff802f4d71>]
>> Jun 12 07:23:40 zappa [<ffffffff802fa8f5>]
>> Jun 12 07:23:40 zappa [<ffffffff803038bb>]
>> Jun 12 07:23:40 zappa [<ffffffff8025f1e1>]
>> Jun 12 07:23:40 zappa [<ffffffff80261899>]
>> Jun 12 07:23:40 zappa [<ffffffff8025ae00>]
>> Jun 12 07:23:40 zappa [<ffffffff80257010>]
>> Jun 12 07:23:40 zappa [<ffffffff80256d92>]
>> Jun 12 07:23:40 zappa [<ffffffff80257077>]
>> Jun 12 07:23:40 zappa [<ffffffff8020ad3b>]

ksymoops output, please? It's hard to divine what those addresses might be.

-Eric

2008-06-12 22:27:32

by Miquel van Smoorenburg

Subject: Re: XFS internal error xfs_trans_cancel at line 1163 of file fs/xfs/xfs_trans.c

On Thu, 2008-06-12 at 16:33 +0200, Oliver Pinter wrote:
> add CC's
>
> On 6/12/08, [email protected] <[email protected]> wrote:
> > Hi!
> >
> > Today morning my server at home bailed out two times (reboot between):

> > Jun 12 07:23:40 zappa
> > Jun 12 07:23:40 zappa xfs_force_shutdown(sda7,0x8) called from line
> > 1164 of file fs/xfs/xfs_trans.c. Return address = 0xffffffff802f4d8a
> > Jun 12 07:23:40 zappa Filesystem "sda7": Corruption of in-memory data
> > detected. Shutting down filesystem: sda7
> > Jun 12 07:23:40 zappa Please umount the filesystem, and rectify the
> > problem(s)

Hmm, interesting. I'm seeing the same thing on one of my servers since I
upgraded from 2.6.ancient (14 or so) to 2.6.25, while XFS otherwise has
been very stable for me over the years:

Linux transit5.news.xs4all.nl 2.6.25.6 #1 SMP Wed Jun 11 10:59:10 CEST
2008 x86_64 GNU/Linux

Filesystem "sda4": XFS internal error xfs_trans_cancel at line 1163 of
file fs/xfs/xfs_trans.c. Caller 0xffffffff880f1315
Pid: 3402, comm: diablo Not tainted 2.6.25.6 #1

Call Trace:
[<ffffffff880f1315>] :xfs:xfs_create+0x1e5/0x520
[<ffffffff880ea4a6>] :xfs:xfs_trans_cancel+0x126/0x150
[<ffffffff880f1315>] :xfs:xfs_create+0x1e5/0x520
[<ffffffff880fcb79>] :xfs:xfs_vn_mknod+0x1d9/0x320
[<ffffffff8028811c>] vfs_create+0xac/0xf0
[<ffffffff8028b6bd>] open_namei+0x61d/0x6c0
[<ffffffff802448c0>] autoremove_wake_function+0x0/0x30
[<ffffffff8027da0c>] do_filp_open+0x1c/0x50
[<ffffffff8027d6e9>] get_unused_fd_flags+0x79/0x120
[<ffffffff8027da9a>] do_sys_open+0x5a/0xf0
[<ffffffff8020b2bb>] system_call_after_swapgs+0x7b/0x80

xfs_force_shutdown(sda4,0x8) called from line 1164 of file
fs/xfs/xfs_trans.c. Return address = 0xffffffff880ea4bf
Filesystem "sda4": Corruption of in-memory data detected. Shutting down
filesystem: sda4
Please umount the filesystem, and rectify the problem(s)

After a reboot, xfs_repair didn't find anything wrong with the fs. It
has happened 3 times over the last few days already.

FS is on a local SCSI raid (dpt_i2o), not SATA.

Mike.

2008-06-13 03:08:58

by Dave Chinner

Subject: Re: XFS internal error xfs_trans_cancel at line 1163 of file fs/xfs/xfs_trans.c

On Fri, Jun 13, 2008 at 12:27:09AM +0200, Miquel van Smoorenburg wrote:
> On Thu, 2008-06-12 at 16:33 +0200, Oliver Pinter wrote:
> > add CC's
> >
> > On 6/12/08, [email protected] <[email protected]> wrote:
> > > Hi!
> > >
> > > Today morning my server at home bailed out two times (reboot between):
>
> > > Jun 12 07:23:40 zappa
> > > Jun 12 07:23:40 zappa xfs_force_shutdown(sda7,0x8) called from line
> > > 1164 of file fs/xfs/xfs_trans.c. Return address = 0xffffffff802f4d8a
> > > Jun 12 07:23:40 zappa Filesystem "sda7": Corruption of in-memory data
> > > detected. Shutting down filesystem: sda7
> > > Jun 12 07:23:40 zappa Please umount the filesystem, and rectify the
> > > problem(s)
>
> Hmm, interesting. I'm seeing the same thing on one of my servers since I
> upgraded from 2.6.ancient (14 or so) to 2.6.25, while XFS otherwise has
> been very stable for me over the years:
>
> Linux transit5.news.xs4all.nl 2.6.25.6 #1 SMP Wed Jun 11 10:59:10 CEST
> 2008 x86_64 GNU/Linux
>
> Filesystem "sda4": XFS internal error xfs_trans_cancel at line 1163 of
> file fs/xfs/xfs_trans.c. Caller 0xffffffff880f1315
> Pid: 3402, comm: diablo Not tainted 2.6.25.6 #1

This commit in 2.6.26 will probably fix it.

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=75de2a91c98a6f486f261c1367fe59f5583e15a3

Cheers,

Dave.
--
Dave Chinner
[email protected]

2008-06-13 07:16:14

by Konstantin Kletschke

Subject: Re: [xfs-masters] Re: XFS internal error xfs_trans_cancel at line 1163 of file fs/xfs/xfs_trans.c

Here is the ksymoops output:

~/ > ksymoops -m /boot/System.map xfs_break1.txt
ksymoops 2.4.11 on x86_64 2.6.25-gentoo-r4. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.6.25-gentoo-r4/ (default)
-m /boot/System.map (specified)

Error (regular_file): read_ksyms stat /proc/ksyms failed
ksymoops: No such file or directory
No modules in ksyms, skipping objects
No ksyms, skipping lsmod
Jun 12 07:23:40 zappa Pid: 2379, comm: procmail Not tainted 2.6.25-gentoo-r4 #3
Jun 12 07:23:40 zappa Call Trace:
Jun 12 07:23:40 zappa [<ffffffff802fa8f5>]
Jun 12 07:23:40 zappa [<ffffffff802f4d71>]
Jun 12 07:23:40 zappa [<ffffffff802fa8f5>]
Jun 12 07:23:40 zappa [<ffffffff803038bb>]
Jun 12 07:23:40 zappa [<ffffffff8025f1e1>]
Jun 12 07:23:40 zappa [<ffffffff80261899>]
Jun 12 07:23:40 zappa [<ffffffff8025ae00>]
Jun 12 07:23:40 zappa [<ffffffff80257010>]
Jun 12 07:23:40 zappa [<ffffffff80256d92>]
Jun 12 07:23:40 zappa [<ffffffff80257077>]
Jun 12 07:23:40 zappa [<ffffffff8020ad3b>]
Warning (Oops_read): Code line not seen, dumping what data is available


Trace; ffffffff802fa8f5 <xfs_create+44f/493>
Trace; ffffffff802f4d71 <xfs_trans_cancel+5c/f4>
Trace; ffffffff802fa8f5 <xfs_create+44f/493>
Trace; ffffffff803038bb <xfs_vn_mknod+15f/249>
Trace; ffffffff8025f1e1 <vfs_create+75/ba>
Trace; ffffffff80261899 <open_namei+19d/608>
Trace; ffffffff8025ae00 <vfs_lstat_fd+18/47>
Trace; ffffffff80257010 <do_filp_open+1c/3d>
Trace; ffffffff80256d92 <get_unused_fd_flags+6d/100>
Trace; ffffffff80257077 <do_sys_open+46/ca>
Trace; ffffffff8020ad3b <system_call_after_swapgs+7b/80>


1 warning and 1 error issued. Results may not be reliable.





~/ > ksymoops -m /boot/System.map xfs_break2.txt
ksymoops 2.4.11 on x86_64 2.6.25-gentoo-r4. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.6.25-gentoo-r4/ (default)
-m /boot/System.map (specified)

Error (regular_file): read_ksyms stat /proc/ksyms failed
ksymoops: No such file or directory
No modules in ksyms, skipping objects
No ksyms, skipping lsmod
Jun 12 08:15:58 zappa Pid: 2161, comm: procmail Not tainted 2.6.25-gentoo-r4 #3
Jun 12 08:15:58 zappa Call Trace:
Jun 12 08:15:58 zappa [<ffffffff802fa8f5>]
Jun 12 08:15:58 zappa [<ffffffff802f4d71>]
Jun 12 08:15:58 zappa [<ffffffff802fa8f5>]
Jun 12 08:15:58 zappa [<ffffffff803038bb>]
Jun 12 08:15:58 zappa [<ffffffff8025f1e1>]
Jun 12 08:15:58 zappa [<ffffffff80261899>]
Jun 12 08:15:58 zappa [<ffffffff8025ae00>]
Jun 12 08:15:58 zappa [<ffffffff80257010>]
Jun 12 08:15:58 zappa [<ffffffff80256d92>]
Jun 12 08:15:58 zappa [<ffffffff80257077>]
Jun 12 08:15:58 zappa [<ffffffff8020ad3b>]
Warning (Oops_read): Code line not seen, dumping what data is available


Trace; ffffffff802fa8f5 <xfs_create+44f/493>
Trace; ffffffff802f4d71 <xfs_trans_cancel+5c/f4>
Trace; ffffffff802fa8f5 <xfs_create+44f/493>
Trace; ffffffff803038bb <xfs_vn_mknod+15f/249>
Trace; ffffffff8025f1e1 <vfs_create+75/ba>
Trace; ffffffff80261899 <open_namei+19d/608>
Trace; ffffffff8025ae00 <vfs_lstat_fd+18/47>
Trace; ffffffff80257010 <do_filp_open+1c/3d>
Trace; ffffffff80256d92 <get_unused_fd_flags+6d/100>
Trace; ffffffff80257077 <do_sys_open+46/ca>
Trace; ffffffff8020ad3b <system_call_after_swapgs+7b/80>


1 warning and 1 error issued. Results may not be reliable.
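For readers unfamiliar with ksymoops: each `Trace;` line turns a raw return address into `symbol+offset/size`, roughly by finding the last System.map symbol at or below the address and measuring the gap to the next symbol. (The `/proc/ksyms` error is expected on 2.6 kernels, which provide `/proc/kallsyms` instead; the symbols here come from the System.map passed with `-m`.) The Python sketch below illustrates that lookup. It is a simplified model, not ksymoops itself; `SAMPLE_MAP` is a hypothetical excerpt whose addresses were back-computed from the trace above, and `next_symbol_a`/`next_symbol_b` are placeholder names.

```python
import bisect

# Hypothetical System.map excerpt; addresses derived from the trace above,
# the "next_symbol_*" names are placeholders, not real kernel symbols.
SAMPLE_MAP = """\
ffffffff802f4d15 T xfs_trans_cancel
ffffffff802f4e09 T next_symbol_a
ffffffff802fa4a6 T xfs_create
ffffffff802fa939 T next_symbol_b
"""


def load_system_map(text):
    """Parse '<hex address> <type> <name>' lines into a sorted (addr, name) list."""
    entries = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        entries.append((int(parts[0], 16), parts[2]))
    entries.sort()
    return entries


def resolve(entries, addr):
    """Return 'name+offset/size' in hex, or None if addr is not covered.

    The containing symbol is the last one starting at or below addr; its
    size is approximated as the distance to the next symbol in the map.
    """
    starts = [start for start, _ in entries]
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0 or i + 1 >= len(entries):
        return None  # before the first symbol or past the last one
    start, name = entries[i]
    size = entries[i + 1][0] - start
    return f"{name}+{addr - start:x}/{size:x}"


if __name__ == "__main__":
    entries = load_system_map(SAMPLE_MAP)
    for addr in (0xffffffff802fa8f5, 0xffffffff802f4d71):
        # prints e.g. "ffffffff802fa8f5 xfs_create+44f/493"
        print(f"{addr:x} {resolve(entries, addr)}")
```

Because the size is only "distance to the next map entry", a System.map that does not match the running kernel silently produces plausible but wrong names, which is one reason ksymoops warns that results may not be reliable.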


Regards, Konsti


PS: The line xfs_check printed was: "agi_freecount 1, counted 0 in ag17".
After running xfs_repair the message vanished; the repair left no files in
lost+found and seems to have fixed this. The filesystem is up and running
now (for a while :-) ).
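The xfs_check message compares two counts for allocation group 17: the free-inode count recorded in the AG's inode header (AGI), and the number of free inodes actually found by walking the inode btree, where each record covers one 64-inode chunk and carries a bitmap of free inodes. The Python sketch below models that comparison in simplified form; `InobtRecord`, `count_free_inodes` and `check_agi_freecount` are hypothetical names, and the real tools check considerably more than this.

```python
from dataclasses import dataclass

INODES_PER_CHUNK = 64  # XFS allocates inodes in chunks of 64


@dataclass
class InobtRecord:
    """Simplified inode btree record: one per allocated 64-inode chunk."""
    startino: int   # first inode number in the chunk (AG-relative)
    free_mask: int  # bit i set means inode startino + i is free


def count_free_inodes(records):
    """Count free inodes by walking every chunk record and counting set bits."""
    total = 0
    for rec in records:
        mask = rec.free_mask & ((1 << INODES_PER_CHUNK) - 1)
        total += bin(mask).count("1")
    return total


def check_agi_freecount(agno, agi_freecount, records):
    """Return a message (worded like the report above) on mismatch, else None."""
    counted = count_free_inodes(records)
    if counted != agi_freecount:
        return f"agi_freecount {agi_freecount}, counted {counted} in ag{agno}"
    return None


if __name__ == "__main__":
    # Header claims one free inode, but the only chunk has every inode in use.
    print(check_agi_freecount(17, 1, [InobtRecord(startino=128, free_mask=0)]))
```

A header count that is higher than what the btree walk finds fits Konsti's symptom: the allocator believes a free inode exists, goes looking for it during a create, and fails part-way through the transaction.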

--
GPG KeyID EF62FCEF
Fingerprint: 13C9 B16B 9844 EC15 CC2E A080 1E69 3FDA EF62 FCEF

2008-06-13 07:24:18

by Konstantin Kletschke

Subject: Re: XFS internal error xfs_trans_cancel at line 1163 of file fs/xfs/xfs_trans.c

Am 2008-06-13 13:08 +1000 schrieb Dave Chinner:

> This commit in 2.6.26 will probably fix it.
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=75de2a91c98a6f486f261c1367fe59f5583e15a3

Well, the description says it avoids shutting down the fs at ENOSPC
when that is not really necessary. Does that apply here, where my fs is
/dev/sda7 120G 12G 108G 10% /home
with inode usage of
/dev/sda7 125001728 1310022 123691706 2% /home

? Perhaps; I am not experienced in inspecting filesystems...

Konsti


2008-06-13 11:29:07

by Miquel van Smoorenburg

Subject: Re: XFS internal error xfs_trans_cancel at line 1163 of file fs/xfs/xfs_trans.c

On Fri, 2008-06-13 at 13:08 +1000, Dave Chinner wrote:
> On Fri, Jun 13, 2008 at 12:27:09AM +0200, Miquel van Smoorenburg wrote:
> > On Thu, 2008-06-12 at 16:33 +0200, Oliver Pinter wrote:
> > > add CC's
> > >
> > > On 6/12/08, [email protected] <[email protected]> wrote:
> > > > Hi!
> > > >
> > > > Today morning my server at home bailed out two times (reboot between):
> >
> > > > Jun 12 07:23:40 zappa
> > > > Jun 12 07:23:40 zappa xfs_force_shutdown(sda7,0x8) called from line
> > > > 1164 of file fs/xfs/xfs_trans.c. Return address = 0xffffffff802f4d8a
> > > > Jun 12 07:23:40 zappa Filesystem "sda7": Corruption of in-memory data
> > > > detected. Shutting down filesystem: sda7
> > > > Jun 12 07:23:40 zappa Please umount the filesystem, and rectify the
> > > > problem(s)
> >
> > Hmm, interesting. I'm seeing the same thing on one of my servers since I
> > upgraded from 2.6.ancient (14 or so) to 2.6.25, while XFS otherwise has
> > been very stable for me over the years:
> >
> > Linux transit5.news.xs4all.nl 2.6.25.6 #1 SMP Wed Jun 11 10:59:10 CEST
> > 2008 x86_64 GNU/Linux
> >
> > Filesystem "sda4": XFS internal error xfs_trans_cancel at line 1163 of
> > file fs/xfs/xfs_trans.c. Caller 0xffffffff880f1315
> > Pid: 3402, comm: diablo Not tainted 2.6.25.6 #1
>
> This commit in 2.6.26 will probably fix it.
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=75de2a91c98a6f486f261c1367fe59f5583e15a3

"At ENOSPC, we can get a filesystem shutdown due to a cancelling a dirty
transaction in xfs_mkdir or xfs_create."

But:

$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda4 15G 5.1G 9.6G 35% /news

The filesystem is only 35% used. It might have hit 100% at some point in
the recent past, though (a few reboots ago).

I've applied the patch to 2.6.25.6 just in case, I'll let it run over
the weekend to see what happens.

Mike.

2008-06-13 15:36:17

by Dave Chinner

Subject: Re: [xfs-masters] Re: XFS internal error xfs_trans_cancel at line 1163 of file fs/xfs/xfs_trans.c

On Fri, Jun 13, 2008 at 09:24:05AM +0200, Konstantin Kletschke wrote:
> Am 2008-06-13 13:08 +1000 schrieb Dave Chinner:
>
> > This commit in 2.6.26 will probably fix it.
> >
> > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=75de2a91c98a6f486f261c1367fe59f5583e15a3
>
> Well, the description points out solving issues regarding dealing with
> ENOSPC shutting the fs not down when it is not really necessary. Does
> this count here, where my fs is
> /dev/sda7 120G 12G 108G 10% /home
> with inode usage of
> /dev/sda7 125001728 1310022 123691706 2% /home

Perhaps you've fragmented free space, which can lead to this
problem. Inodes require contiguous free space to be allocated.
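To make the fragmentation point concrete: XFS allocates inodes in chunks of 64, so when no existing chunk has a free inode, creating a file needs a new chunk's worth of contiguous blocks, even if `df` shows plenty of free space in total. The Python sketch below is a simplified model, not XFS's allocator; the 256-byte inode size, 4 KiB block size and the alignment value are assumptions chosen for illustration, and `chunk_blocks`/`fits_inode_chunk` are hypothetical helpers.

```python
def chunk_blocks(inode_size=256, block_size=4096, inodes_per_chunk=64):
    """Blocks needed to hold one inode chunk, rounded up to whole blocks.

    The default sizes are assumptions for illustration only.
    """
    chunk_bytes = inode_size * inodes_per_chunk
    return max(1, -(-chunk_bytes // block_size))  # ceiling division


def fits_inode_chunk(free_extents, need, align):
    """Return True if any free extent (start, length), in blocks, contains
    `need` contiguous blocks starting at a multiple of `align`."""
    for start, length in free_extents:
        aligned_start = -(-start // align) * align  # round start up to alignment
        if aligned_start + need <= start + length:
            return True
    return False


if __name__ == "__main__":
    need = chunk_blocks()  # 4 blocks with the assumed sizes
    # Hypothetical free-space map: 1000 separate 3-block fragments.
    fragmented = [(i * 8 + 1, 3) for i in range(1000)]
    print(sum(length for _, length in fragmented))      # 3000 blocks free in total
    print(fits_inode_chunk(fragmented, need, align=4))  # False: no fragment fits
    # Adding a single 16-block free extent is enough.
    print(fits_inode_chunk(fragmented + [(8192, 16)], need, align=4))  # True
```

In this toy map 3000 blocks are free, yet no new inode chunk can be placed, which is the kind of situation in which a create can hit ENOSPC while `df` and `df -i` still report free space.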

Please search the mail list archive for this error to find more
about triaging the cause (i.e. the thread that led up to finding
the above problem).

Cheers,

Dave.

2008-06-16 15:50:29

by Miquel van Smoorenburg

Subject: Re: XFS internal error xfs_trans_cancel at line 1163 of file fs/xfs/xfs_trans.c

On Fri, 2008-06-13 at 13:28 +0200, Miquel van Smoorenburg wrote:
> On Fri, 2008-06-13 at 13:08 +1000, Dave Chinner wrote:
> > On Fri, Jun 13, 2008 at 12:27:09AM +0200, Miquel van Smoorenburg wrote:
>
> > > Linux transit5.news.xs4all.nl 2.6.25.6 #1 SMP Wed Jun 11 10:59:10 CEST
> > > 2008 x86_64 GNU/Linux
> > >
> > > Filesystem "sda4": XFS internal error xfs_trans_cancel at line 1163 of
> > > file fs/xfs/xfs_trans.c. Caller 0xffffffff880f1315
> > > Pid: 3402, comm: diablo Not tainted 2.6.25.6 #1
> >
> > This commit in 2.6.26 will probably fix it.
> >
> > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=75de2a91c98a6f486f261c1367fe59f5583e15a3
>
> "At ENOSPC, we can get a filesystem shutdown due to a cancelling a dirty
> transaction in xfs_mkdir or xfs_create."
>
> But The filesystem is only used for 35%. It might have hit 100% somewhere in
> the recent past though (a few reboots ago).
> I've applied the patch to 2.6.25.6 just in case, I'll let it run over
> the weekend to see what happens.

Well, the box has been up for 3 days now. When this problem first
appeared it only stayed up for a day max, so I'm reasonably positive
it's fixed.

The patch applies cleanly to 2.6.25.6 - perhaps it should go into
-stable.

Thanks,

Mike.

2008-06-17 05:31:53

by Dave Chinner

Subject: Re: XFS internal error xfs_trans_cancel at line 1163 of file fs/xfs/xfs_trans.c

On Mon, Jun 16, 2008 at 05:50:06PM +0200, Miquel van Smoorenburg wrote:
> On Fri, 2008-06-13 at 13:28 +0200, Miquel van Smoorenburg wrote:
> > On Fri, 2008-06-13 at 13:08 +1000, Dave Chinner wrote:
> > > On Fri, Jun 13, 2008 at 12:27:09AM +0200, Miquel van Smoorenburg wrote:
> >
> > > > Linux transit5.news.xs4all.nl 2.6.25.6 #1 SMP Wed Jun 11 10:59:10 CEST
> > > > 2008 x86_64 GNU/Linux
> > > >
> > > > Filesystem "sda4": XFS internal error xfs_trans_cancel at line 1163 of
> > > > file fs/xfs/xfs_trans.c. Caller 0xffffffff880f1315
> > > > Pid: 3402, comm: diablo Not tainted 2.6.25.6 #1
> > >
> > > This commit in 2.6.26 will probably fix it.
> > >
> > > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=75de2a91c98a6f486f261c1367fe59f5583e15a3
> >
> > "At ENOSPC, we can get a filesystem shutdown due to a cancelling a dirty
> > transaction in xfs_mkdir or xfs_create."
> >
> > But The filesystem is only used for 35%. It might have hit 100% somewhere in
> > the recent past though (a few reboots ago).
> > I've applied the patch to 2.6.25.6 just in case, I'll let it run over
> > the weekend to see what happens.
>
> Well, the box has been up for 3 days now. When this problem first
> appeared it only stayed up for a day max, so I'm reasonably positive
> it's fixed.

Cool.

> The patch applies cleanly to 2.6.25.6 - perhaps it should go into
> -stable.

For a bug that's been around for more than 3 years and reported by
only a handful of ppl? I'd prefer not to....

Cheers,

Dave.

2008-06-20 05:27:35

by Konstantin Kletschke

Subject: Re: XFS internal error xfs_trans_cancel at line 1163 of file fs/xfs/xfs_trans.c

Am 2008-06-13 13:08 +1000 schrieb Dave Chinner:

> This commit in 2.6.26 will probably fix it.
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=75de2a91c98a6f486f261c1367fe59f5583e15a3

If I am correct, this fix is included in 2.6.26-rc6, but I still get this:

Filesystem "sda6": XFS internal error xfs_trans_cancel at line 1163 of file fs/xfs/xfs_trans.c. Caller 0xffffffff802ffa42
Pid: 7393, comm: emerge Not tainted 2.6.26-rc6 #1

Call Trace:
[<ffffffff802ffa42>]
[<ffffffff802fa35a>]
[<ffffffff802ffa42>]
[<ffffffff80308a29>]
[<ffffffff80262799>]
[<ffffffff8026500a>]
[<ffffffff8025a0fa>]
[<ffffffff8020afab>]

xfs_force_shutdown(sda6,0x8) called from line 1164 of file fs/xfs/xfs_trans.c. Return address = 0xffffffff802fa373
Filesystem "sda6": Corruption of in-memory data detected. Shutting down filesystem: sda6
Please umount the filesystem, and rectify the problem(s)
Filesystem "sda6": xfs_log_force: error 5 returned.
Filesystem "sda6": xfs_log_force: error 5 returned.
zsh: Input/output error: /var/mail/root


root@zappa:~/ > ksymoops -m /usr/src/linux-2.6.26-rc6/System.map xfs-2.6.26.txt
ksymoops 2.4.11 on x86_64 2.6.26-rc6. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.6.26-rc6/ (default)
-m /usr/src/linux-2.6.26-rc6/System.map (specified)

Error (regular_file): read_ksyms stat /proc/ksyms failed
ksymoops: No such file or directory
No modules in ksyms, skipping objects
No ksyms, skipping lsmod
Pid: 7393, comm: emerge Not tainted 2.6.26-rc6 #1
Call Trace:
[<ffffffff802ffa42>]
[<ffffffff802fa35a>]
[<ffffffff802ffa42>]
[<ffffffff80308a29>]
[<ffffffff80262799>]
[<ffffffff8026500a>]
[<ffffffff8025a0fa>]
[<ffffffff8020afab>]
Warning (Oops_read): Code line not seen, dumping what data is available


Trace; ffffffff802ffa42 <xfs_create+41d/462>
Trace; ffffffff802fa35a <xfs_trans_cancel+56/ee>
Trace; ffffffff802ffa42 <xfs_create+41d/462>
Trace; ffffffff80308a29 <xfs_vn_mknod+148/229>
Trace; ffffffff80262799 <vfs_create+75/ba>
Trace; ffffffff8026500a <do_filp_open+1dc/7c5>
Trace; ffffffff8025a0fa <do_sys_open+4a/f1>
Trace; ffffffff8020afab <system_call_after_swapgs+7b/80>


1 warning and 1 error issued. Results may not be reliable.
zsh: Input/output error: /var/mail/root



2008-06-20 05:45:26

by Konstantin Kletschke

Subject: Re: XFS internal error xfs_trans_cancel at line 1163 of file fs/xfs/xfs_trans.c

It happened again. The previous trace looked weird; this one looks like the old one:

ksymoops 2.4.11 on x86_64 2.6.26-rc6. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.6.26-rc6/ (default)
-m /usr/src/linux-2.6.26-rc6/System.map (specified)

Error (regular_file): read_ksyms stat /proc/ksyms failed
ksymoops: No such file or directory
No modules in ksyms, skipping objects
No ksyms, skipping lsmod
Pid: 2137, comm: fetchnews Not tainted 2.6.26-rc6 #1
Call Trace:
[<ffffffff802ffa42>]
[<ffffffff802fa35a>]
[<ffffffff802ffa42>]
[<ffffffff80308a29>]
[<ffffffff80262799>]
[<ffffffff8026500a>]
[<ffffffff80223506>]
[<ffffffff8023116c>]
[<ffffffff8025a0fa>]
[<ffffffff8020afab>]
Warning (Oops_read): Code line not seen, dumping what data is available


Trace; ffffffff802ffa42 <xfs_create+41d/462>
Trace; ffffffff802fa35a <xfs_trans_cancel+56/ee>
Trace; ffffffff802ffa42 <xfs_create+41d/462>
Trace; ffffffff80308a29 <xfs_vn_mknod+148/229>
Trace; ffffffff80262799 <vfs_create+75/ba>
Trace; ffffffff8026500a <do_filp_open+1dc/7c5>
Trace; ffffffff80223506 <ns_to_timeval+9/27>
Trace; ffffffff8023116c <__remove_hrtimer+6b/78>
Trace; ffffffff8025a0fa <do_sys_open+4a/f1>
Trace; ffffffff8020afab <system_call_after_swapgs+7b/80>


1 warning and 1 error issued. Results may not be reliable.



2008-06-22 22:54:17

by Dave Chinner

Subject: Re: XFS internal error xfs_trans_cancel at line 1163 of file fs/xfs/xfs_trans.c

On Fri, Jun 20, 2008 at 07:27:17AM +0200, Konstantin Kletschke wrote:
> Am 2008-06-13 13:08 +1000 schrieb Dave Chinner:
>
> > This commit in 2.6.26 will probably fix it.
> >
> > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=75de2a91c98a6f486f261c1367fe59f5583e15a3
>
> If I am correct, this fix is included in 2.6.26-rc6:

Yes, so if you're seeing it again then there's a different problem.

Please provide a pointer to a xfs_metadump image of the filesystem
and the steps to reproduce the error from the image so we can
get to the bottom of it...

Cheers,

Dave.

2008-06-23 07:58:18

by Konstantin Kletschke

Subject: Re: XFS internal error xfs_trans_cancel at line 1163 of file fs/xfs/xfs_trans.c

Am 2008-06-23 08:53 +1000 schrieb Dave Chinner:

> > If I am correct, this fix is included in 2.6.26-rc6:
>
> Yes, so if you're seeing it again then there's a different problem.

There seems to be something else too, yes.

> Please provide a pointer to a xfs_metadump image of the filesystem
> and the steps to reproduce the error from the image so we can
> get to the bottom of it...

Well, I accidentally formatted the partition. I needed my /var partition
urgently and tried the following: from a rescue system (Gentoo amd64 live
CD version 2007.0, which IIRC ships 2.6.19) I tarred up the whole of /var,
ran "rm -fr /var/*" and tried to untar it back.

While this worked for my /home (where I had also seen the error),
restoring the files onto /var produced several oopses in dmesg, and
userspace complained about an error in a (directory?) structure that
could not be accessed or initialized. I took photos of the screen, but
accidentally lost them from my camera, and since I was running a rescue
system, nothing of this was saved to a file.

I formatted the partition then :-(

I could only try to reproduce it on my home partition... Does anybody
have an idea how this could be done, or a suspicion of what triggers
this error?

Kind Regards, Konsti

