BTW, after last resume from disk fs was corrupted but fsck managed to
fix this error. So I think severity of this issue should be raised.
On 11/27/2011 11:10 PM, Alex wrote:
> Hi guys,
>
> I'm using linux kernel 3.2-rc3. Since bugzilla.kernel.org is in down
> I'm writing this email. After resuming from s2disk I get following error:
> Nov 27 19:42:38 alex kernel: [ 3827.422675] ------------[ cut here
> ]------------
> Nov 27 19:42:38 alex kernel: [ 3827.422683] WARNING: at fs/inode.c:884
> unlock_new_inode+0x34/0x59()
> Nov 27 19:42:38 alex kernel: [ 3827.422686] Hardware name: 4170CTO
> Nov 27 19:42:38 alex kernel: [ 3827.422688] Modules linked in: vfat
> fat usb_storage uas cpufreq_stats rfcomm lockd 8021q garp stp llc
> fcoe libfcoe libfc scsi_transport_fc scsi_tgt bnep ip6t_REJECT
> nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables nf_conntr
> ack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack hdaps(O)
> snd_hda_codec_hdmi snd_hda_codec_conexant virtio_net kvm_intel kvm
> snd_hda_int
> el snd_hda_codec btusb snd_hwdep snd_seq snd_seq_device snd_pcm
> snd_timer arc4 iwlwifi mac80211 cfg80211 wmi e1000e uvcvideo videodev
> media v4l2_compat_ioctl32 bluetooth uinput pcspkr joydev
> thinkpad_acpi snd snd_page_alloc soundcore i2c_i801 iTCO_wdt
> iTCO_vendor_su
> pport rfkill binfmt_misc microcode sunrpc tp_smapi(O) thinkpad_ec(O)
> sdhci_pci sdhci mmc_core [last unloaded: scsi_wait_scan]
> Nov 27 19:42:38 alex kernel: [ 3827.422738] Pid: 968, comm: abrtd
> Tainted: G W O 3.2.0-0.rc3.git0.1.fc16.x86_64 #1
> Nov 27 19:42:38 alex kernel: [ 3827.422740] Call Trace:
> Nov 27 19:42:38 alex kernel: [ 3827.422744] [<ffffffff810589c2>]
> warn_slowpath_common+0x83/0x9b
> Nov 27 19:42:38 alex kernel: [ 3827.422747] [<ffffffff810589f4>]
> warn_slowpath_null+0x1a/0x1c
> Nov 27 19:42:38 alex kernel: [ 3827.422750] [<ffffffff8113de44>]
> unlock_new_inode+0x34/0x59
> Nov 27 19:42:38 alex kernel: [ 3827.422754] [<ffffffff8118b6b8>]
> ext4_new_inode+0xbc6/0xc5b
> Nov 27 19:42:38 alex kernel: [ 3827.422758] [<ffffffff81196ca1>]
> ext4_symlink+0x14d/0x2f4
> Nov 27 19:42:38 alex kernel: [ 3827.422762] [<ffffffff8113592b>]
> vfs_symlink+0x54/0x74
> Nov 27 19:42:38 alex kernel: [ 3827.422766] [<ffffffff811356a1>] ?
> user_path_create+0x4d/0x57
> Nov 27 19:42:38 alex kernel: [ 3827.422769] [<ffffffff811371b8>]
> sys_symlinkat+0x6b/0xb2
> Nov 27 19:42:38 alex kernel: [ 3827.422773] [<ffffffff81137215>]
> sys_symlink+0x16/0x18
> Nov 27 19:42:38 alex kernel: [ 3827.422777] [<ffffffff81526042>]
> system_call_fastpath+0x16/0x1b
> Nov 27 19:42:38 alex kernel: [ 3827.422780] ---[ end trace
> 9866b4286b488fd4 ]---
>
> Kernel config is attached. Please let me know if I can help you with
> anything to fix this issue
>
> Best regards,
> Alex
On Sun, Nov 27, 2011 at 11:24:03PM +0300, Alex wrote:
> BTW, after last resume from disk fs was corrupted but fsck managed
> to fix this error. So I think severity of this issue should be
> raised.
Can you reproduce this reliably? What was running at the time of the s2disk?
What appears to be going on is that insert_inode_locked() is failing
at fs/ext4/ialloc.c:887, probably because there's another inode with
that inode number already on the superblock's hash list. The error
codepath if insert_inode_locked() fail is incorrect; it's going to
fail_drop, which tries dropping the inode's dquot (but we haven't
calle ddquot_initialize)inode) yet) and calls unlock_new_inode(), but
I_NEW hasn't been set because insert_inode_locked().
So the warning is easy to fix; we just need to have it jump to fail
instead of fail_drop. But the bigger issue is why did
insert_inode_locked() failed in the first place.
Did this error happen *right* after the system resumed, or did some
amount of time pass before the warning triggered? This could have
happened because the in-memory (or possibly on-disk) copy of the inode
allocation bitmap has gotten corrupted, for example.
What was the nature of the file system corruption which e2fsck decided
that it need to correct?
Regards,
- Ted
Hi Ted,
Unfortunately my laptop died today and I can't retest this issue. I'll
provide more information once and if I repair it.
Thanks,
Alex
On 28.11.2011 00:34, Ted Ts'o wrote:
> On Sun, Nov 27, 2011 at 11:24:03PM +0300, Alex wrote:
>> BTW, after last resume from disk fs was corrupted but fsck managed
>> to fix this error. So I think severity of this issue should be
>> raised.
> Can you reproduce this reliably? What was running at the time of the s2disk?
>
> What appears to be going on is that insert_inode_locked() is failing
> at fs/ext4/ialloc.c:887, probably because there's another inode with
> that inode number already on the superblock's hash list. The error
> codepath if insert_inode_locked() fail is incorrect; it's going to
> fail_drop, which tries dropping the inode's dquot (but we haven't
> calle ddquot_initialize)inode) yet) and calls unlock_new_inode(), but
> I_NEW hasn't been set because insert_inode_locked().
>
> So the warning is easy to fix; we just need to have it jump to fail
> instead of fail_drop. But the bigger issue is why did
> insert_inode_locked() failed in the first place.
>
> Did this error happen *right* after the system resumed, or did some
> amount of time pass before the warning triggered? This could have
> happened because the in-memory (or possibly on-disk) copy of the inode
> allocation bitmap has gotten corrupted, for example.
>
> What was the nature of the file system corruption which e2fsck decided
> that it need to correct?
>
> Regards,
>
> - Ted
On 11/27/11 3:34 PM, Ted Ts'o wrote:
> On Sun, Nov 27, 2011 at 11:24:03PM +0300, Alex wrote:
>> BTW, after last resume from disk fs was corrupted but fsck managed
>> to fix this error. So I think severity of this issue should be
>> raised.
>
> Can you reproduce this reliably? What was running at the time of the s2disk?
>
> What appears to be going on is that insert_inode_locked() is failing
> at fs/ext4/ialloc.c:887, probably because there's another inode with
> that inode number already on the superblock's hash list. The error
> codepath if insert_inode_locked() fail is incorrect; it's going to
> fail_drop, which tries dropping the inode's dquot (but we haven't
> calle ddquot_initialize)inode) yet) and calls unlock_new_inode(), but
> I_NEW hasn't been set because insert_inode_locked().
OK; this looks to be the result of:
commit 250df6ed274d767da844a5d9f05720b804240197
Author: Dave Chinner <[email protected]>
Date: Tue Mar 22 22:23:36 2011 +1100
fs: protect inode->i_state with inode->i_lock
(went in on 2.6.39)
because before that, insert_inode_locked() used to unconditionally do:
- inode->i_state |= I_NEW;
but that's gone now. Now if the function fails it'll return the
inode w/o I_NEW set.
ext2/3/4, jffs2, and jfs all call unlock_new_inode() on insert_inode_locked()
failure, and all would warn on this path.
I'm still not clear on what's causing insert_inode_locked() to fail,
but it used to be harmless (or at least silent) before.
I suppose it makes most sense to fix all callers to not clear I_NEW
on failure, unless it's too icky; it does seem weird to have I_NEW set
if we return with failure.
-Eric
> So the warning is easy to fix; we just need to have it jump to fail
> instead of fail_drop. But the bigger issue is why did
> insert_inode_locked() failed in the first place.
>
> Did this error happen *right* after the system resumed, or did some
> amount of time pass before the warning triggered? This could have
> happened because the in-memory (or possibly on-disk) copy of the inode
> allocation bitmap has gotten corrupted, for example.
>
> What was the nature of the file system corruption which e2fsck decided
> that it need to correct?
>
> Regards,
>
> - Ted
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
On 12/5/11 2:54 PM, Eric Sandeen wrote:
> On 11/27/11 3:34 PM, Ted Ts'o wrote:
>> On Sun, Nov 27, 2011 at 11:24:03PM +0300, Alex wrote:
>>> BTW, after last resume from disk fs was corrupted but fsck managed
>>> to fix this error. So I think severity of this issue should be
>>> raised.
>>
>> Can you reproduce this reliably? What was running at the time of the s2disk?
>>
>> What appears to be going on is that insert_inode_locked() is failing
>> at fs/ext4/ialloc.c:887, probably because there's another inode with
>> that inode number already on the superblock's hash list. The error
>> codepath if insert_inode_locked() fail is incorrect; it's going to
>> fail_drop, which tries dropping the inode's dquot (but we haven't
>> calle ddquot_initialize)inode) yet) and calls unlock_new_inode(), but
>> I_NEW hasn't been set because insert_inode_locked().
>
> OK; this looks to be the result of:
>
> commit 250df6ed274d767da844a5d9f05720b804240197
> Author: Dave Chinner <[email protected]>
> Date: Tue Mar 22 22:23:36 2011 +1100
>
> fs: protect inode->i_state with inode->i_lock
>
> (went in on 2.6.39)
>
> because before that, insert_inode_locked() used to unconditionally do:
>
> - inode->i_state |= I_NEW;
>
> but that's gone now. Now if the function fails it'll return the
> inode w/o I_NEW set.
>
> ext2/3/4, jffs2, and jfs all call unlock_new_inode() on insert_inode_locked()
> failure, and all would warn on this path.
>
> I'm still not clear on what's causing insert_inode_locked() to fail,
> but it used to be harmless (or at least silent) before.
>
> I suppose it makes most sense to fix all callers to not clear I_NEW
> on failure, unless it's too icky; it does seem weird to have I_NEW set
> if we return with failure.
>
> -Eric
>
OTOH Al thought it would be reasonable to set I_NEW on failure as well,
and then we wouldn't have to touch the callers.
-Eric
>
>
>> So the warning is easy to fix; we just need to have it jump to fail
>> instead of fail_drop. But the bigger issue is why did
>> insert_inode_locked() failed in the first place.
>>
>> Did this error happen *right* after the system resumed, or did some
>> amount of time pass before the warning triggered? This could have
>> happened because the in-memory (or possibly on-disk) copy of the inode
>> allocation bitmap has gotten corrupted, for example.
>>
>> What was the nature of the file system corruption which e2fsck decided
>> that it need to correct?
>>
>> Regards,
>>
>> - Ted
>> --
\