2017-09-18 10:44:36

by RAJESH DASARI

Subject: File system corruption after reboot, when rootfs is mounted over nfs

Hi,

Could someone please help me with the issue below?

I have booted MIPS-based hardware with a Linux image (4.4.36 kernel)
loaded over TFTP and the rootfs mounted over NFS, by passing the
nfsroot command-line option to the kernel.

The rootfs is mounted at / in my hardware environment:

192.168.113.254:/rootfs / type nfs
(rw,relatime,vers=3,rsize=32768,wsize=32768,namlen=255,hard,nolock,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.113.254,mountvers=3,mountproto=tcp,local_lock=all,addr=192.168.113.254)

I have a hard disk that I am mounting on /mnt; the /mnt directory
itself is part of the NFS root.

e2fsck -f -y /dev/sda1 -> disk is clean with no errors.

mount -t ext2 /dev/sda1 /mnt (mount was successful, mounting ext2 using ext4)
touch /mnt/test.log -> this command is failing.
umount /mnt
reboot

When I execute the above commands in a loop, the file system on
/dev/sda1 gets corrupted. The touch command fails with:

ext4_find_dest_de:1809: inode #2: block 771: comm touch: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=4278190080, rec_len=54507, name_len=229
touch: cannot touch /mnt/test.log: Structure needs cleaning

I can always reproduce this issue when the rootfs is mounted over NFS;
when I boot from the hard disk, I do not see it.
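
The numbers in that error are easier to interpret in hexadecimal. The following is a minimal Python sketch, assuming the name=value field layout shown in the message above; the regular expression and the function name decode_fields are illustrative and not part of any kernel or e2fsprogs tool.

```python
import re

# Error text as quoted in the report above (whitespace normalized).
MESSAGE = ("bad entry in directory: rec_len % 4 != 0 - offset=0(0), "
           "inode=4278190080, rec_len=54507, name_len=229")

def decode_fields(message):
    """Return {field: (decimal, hex string)} for every name=number pair."""
    fields = {}
    for name, value in re.findall(r"(\w+)=(\d+)", message):
        fields[name] = (int(value), hex(int(value)))
    return fields

fields = decode_fields(MESSAGE)
for name, (dec, hx) in fields.items():
    print(f"{name}: {dec} = {hx}")

# The kernel complained that rec_len is not a multiple of 4; show the remainder.
print("rec_len % 4 =", fields["rec_len"][0] % 4)
```

The inode value 4278190080 is exactly 0xff000000, and rec_len leaves a remainder of 3 when divided by 4. A bit pattern that regular may hint at garbage data in the block rather than a logically inconsistent directory, although the message alone cannot prove that.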



Could someone please help me understand what could cause this ext4
file system corruption?

Thanks,
Rajesh Dasari.


2017-09-19 13:27:34

by Theodore Ts'o

Subject: Re: File system corruption after reboot, when rootfs is mounted over nfs

On Mon, Sep 18, 2017 at 04:14:35PM +0530, RAJESH DASARI wrote:
>
> Could some one please help me with the below issue.
>
> I have booted a mips based hardware with linux (4.4.36 kernel
> )image(over tftp) and rootfs over nfs by passing nfsroot command line
> option to the kernel.
>
> rootfs is mounted under / in my hardware environment.
>
> 192.168.113.254:/rootfs / type nfs
> (rw,relatime,vers=3,rsize=32768,wsize=32768,namlen=255,hard,nolock,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.113.254,mountvers=3,mountproto=tcp,local_lock=all,addr=192.168.113.254)
>
> I have a hard disk and i am mounting it on /mnt , this /mnt directory
> is part of nfsroot.
>
> e2fsck -f -y /dev/sda1 -> disk is clean with no errors.
>
> mount -t ext2 /dev/sda1 /mnt (mount was successful ,mounting ext2 using ext4)
> touch /mnt/test.log -> this command is failing.
> umount /mnt
> reboot
>
> when i was executing the above commands in a loop i see that /dev/sda1
> file system is getting corrupted.

The above commands include running e2fsck? Then it sounds like there
is some kind of device driver bug.

What if you include an e2fsck -f -y /dev/sda1 after the umount? Can
you capture the output from that e2fsck run?

> I am able to reproduce
> this issue always when i boot rootfs over nfs , if i boot from hard
> disk , i am not noticing the issue .

Is it exactly the same kernel in both cases?

More detailed logs would certainly be helpful. There's not enough
detail in your description to do anything other than guess, since
we're not mind readers....

- Ted

2017-09-21 04:11:58

by RAJESH DASARI

Subject: Re: File system corruption after reboot, when rootfs is mounted over nfs

Could someone please respond to my query.

Thanks,
Rajesh Dasari.


On Mon, Sep 18, 2017 at 4:14 PM, RAJESH DASARI <[email protected]> wrote:
> Hi ,
>
> Could some one please help me with the below issue.
>
> I have booted a mips based hardware with linux (4.4.36 kernel
> )image(over tftp) and rootfs over nfs by passing nfsroot command line
> option to the kernel.
>
> rootfs is mounted under / in my hardware environment.
>
> 192.168.113.254:/rootfs / type nfs
> (rw,relatime,vers=3,rsize=32768,wsize=32768,namlen=255,hard,nolock,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.113.254,mountvers=3,mountproto=tcp,local_lock=all,addr=192.168.113.254)
>
> I have a hard disk and i am mounting it on /mnt , this /mnt directory
> is part of nfsroot.
>
> e2fsck -f -y /dev/sda1 -> disk is clean with no errors.
>
> mount -t ext2 /dev/sda1 /mnt (mount was successful ,mounting ext2 using ext4)
> touch /mnt/test.log -> this command is failing.
> umount /mnt
> reboot
>
> when i was executing the above commands in a loop i see that /dev/sda1
> file system is getting corrupted. touch command is failing with
> "ext4_find_dest_de:1809: inode #2: block 771: comm touch: bad entry in
> directory: rec_len %4 !=0 -offset =0(0),
> inode=4278190080,rec_len=54507,name_len=229 touch : cannot touch
> /mnt/test.log structure needs clearing" error. I am able to reproduce
> this issue always when i boot rootfs over nfs , if i boot from hard
> disk , i am not noticing the issue .
>
>
>
> Could someone please help me on this , what could cause the ext4 file
> system corruption.
>
> Thanks,
> Rajesh Dasari.

2017-09-21 08:24:41

by Lukas Czerner

Subject: Re: File system corruption after reboot, when rootfs is mounted over nfs

On Thu, Sep 21, 2017 at 09:41:57AM +0530, RAJESH DASARI wrote:
> Could someone please respond to my query.

Hi,

Ted already did, with some questions.

https://www.spinics.net/lists/linux-ext4/msg58327.html

-Lukas

>
> Thanks,
> Rajesh Dasari.
>
>
> On Mon, Sep 18, 2017 at 4:14 PM, RAJESH DASARI <[email protected]> wrote:
> > Hi ,
> >
> > Could some one please help me with the below issue.
> >
> > I have booted a mips based hardware with linux (4.4.36 kernel
> > )image(over tftp) and rootfs over nfs by passing nfsroot command line
> > option to the kernel.
> >
> > rootfs is mounted under / in my hardware environment.
> >
> > 192.168.113.254:/rootfs / type nfs
> > (rw,relatime,vers=3,rsize=32768,wsize=32768,namlen=255,hard,nolock,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.113.254,mountvers=3,mountproto=tcp,local_lock=all,addr=192.168.113.254)
> >
> > I have a hard disk and i am mounting it on /mnt , this /mnt directory
> > is part of nfsroot.
> >
> > e2fsck -f -y /dev/sda1 -> disk is clean with no errors.
> >
> > mount -t ext2 /dev/sda1 /mnt (mount was successful ,mounting ext2 using ext4)
> > touch /mnt/test.log -> this command is failing.
> > umount /mnt
> > reboot
> >
> > when i was executing the above commands in a loop i see that /dev/sda1
> > file system is getting corrupted. touch command is failing with
> > "ext4_find_dest_de:1809: inode #2: block 771: comm touch: bad entry in
> > directory: rec_len %4 !=0 -offset =0(0),
> > inode=4278190080,rec_len=54507,name_len=229 touch : cannot touch
> > /mnt/test.log structure needs clearing" error. I am able to reproduce
> > this issue always when i boot rootfs over nfs , if i boot from hard
> > disk , i am not noticing the issue .
> >
> >
> >
> > Could someone please help me on this , what could cause the ext4 file
> > system corruption.
> >
> > Thanks,
> > Rajesh Dasari.

2017-09-25 05:23:31

by RAJESH DASARI

Subject: Re: File system corruption after reboot, when rootfs is mounted over nfs

Thanks,
Rajesh Dasari.


On Tue, Sep 19, 2017 at 6:55 PM, Theodore Ts'o <[email protected]> wrote:
> On Mon, Sep 18, 2017 at 04:14:35PM +0530, RAJESH DASARI wrote:
>>
>> Could some one please help me with the below issue.
>>
>> I have booted a mips based hardware with linux (4.4.36 kernel
>> )image(over tftp) and rootfs over nfs by passing nfsroot command line
>> option to the kernel.
>>
>> rootfs is mounted under / in my hardware environment.
>>
>> 192.168.113.254:/rootfs / type nfs
>> (rw,relatime,vers=3,rsize=32768,wsize=32768,namlen=255,hard,nolock,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.113.254,mountvers=3,mountproto=tcp,local_lock=all,addr=192.168.113.254)
>>
>> I have a hard disk and i am mounting it on /mnt , this /mnt directory
>> is part of nfsroot.
>>
>> e2fsck -f -y /dev/sda1 -> disk is clean with no errors.
>>
>> mount -t ext2 /dev/sda1 /mnt (mount was successful ,mounting ext2 using ext4)
>> touch /mnt/test.log -> this command is failing.
>> umount /mnt
>> reboot
>>
>> when i was executing the above commands in a loop i see that /dev/sda1
>> file system is getting corrupted.
>
> The above commands include running e2fsck? Then it sounds like there
> is some kind of device driver bug.
yeah.

>
> What if you include an e2fsck -f -y /dev/sda1 after the umount? Can
> you capture the output from that e2fsck run?
>
>> I am able to reproduce
>> this issue always when i boot rootfs over nfs , if i boot from hard
>> disk , i am not noticing the issue .
>
> Is it exactly the same kernel in both cases?
>
> More detailed logs would certainly be helpful. There's not enough
> detail in your description to do anything other than guess, since
> we're not mind readers....
>
> - Ted

2017-09-25 05:35:25

by RAJESH DASARI

Subject: Re: File system corruption after reboot, when rootfs is mounted over nfs

Thanks for your reply, Ted. Please find my responses inline, and
let me know if any other logs are needed.

Thanks,
Rajesh Dasari.


On Mon, Sep 25, 2017 at 10:53 AM, RAJESH DASARI <[email protected]> wrote:
> Thanks,
> Rajesh Dasari.
>
>
> On Tue, Sep 19, 2017 at 6:55 PM, Theodore Ts'o <[email protected]> wrote:
>> On Mon, Sep 18, 2017 at 04:14:35PM +0530, RAJESH DASARI wrote:
>>>
>>> Could some one please help me with the below issue.
>>>
>>> I have booted a mips based hardware with linux (4.4.36 kernel
>>> )image(over tftp) and rootfs over nfs by passing nfsroot command line
>>> option to the kernel.
>>>
>>> rootfs is mounted under / in my hardware environment.
>>>
>>> 192.168.113.254:/rootfs / type nfs
>>> (rw,relatime,vers=3,rsize=32768,wsize=32768,namlen=255,hard,nolock,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.113.254,mountvers=3,mountproto=tcp,local_lock=all,addr=192.168.113.254)
>>>
>>> I have a hard disk and i am mounting it on /mnt , this /mnt directory
>>> is part of nfsroot.
>>>
>>> e2fsck -f -y /dev/sda1 -> disk is clean with no errors.
>>>
>>> mount -t ext2 /dev/sda1 /mnt (mount was successful ,mounting ext2 using ext4)
>>> touch /mnt/test.log -> this command is failing.
>>> umount /mnt
>>> reboot
>>>
>>> when i was executing the above commands in a loop i see that /dev/sda1
>>> file system is getting corrupted.
>>
>> The above commands include running e2fsck? Then it sounds like there
>> is some kind of device driver bug.
Yes. The e2fsck command was included as well, and all of the above
commands were executed in a loop.
>
>>
>> What if you include an e2fsck -f -y /dev/sda1 after the umount? Can
>> you capture the output from that e2fsck run?
I tried running e2fsck after the umount as well, and I still see the
issue. Here is the output of e2fsck after the umount:


e2fsck -f -y /dev/sda1
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/sda1: 12/262144 files (0.0% non-contiguous), 18510/1048576 blocks

The e2fsck return code was 0. After running e2fsck, I rebooted the
node, and when I then mounted the hard disk, the mount failed with the
error below.

[ 87.685184] EXT4-fs (sda1): mounting ext2 file system using the ext4 subsystem
[ 87.694393] EXT4-fs error (device sda1): ext4_iget:4337: inode #2: comm mount: bad extended attribute block 4278190080
[ 87.707920] EXT4-fs (sda1): get root inode failed
[ 87.712639] EXT4-fs (sda1): mount failed
mount: mounting /dev/sda1 on /mnt failed: Structure needs cleaning

>>
>>> I am able to reproduce
>>> this issue always when i boot rootfs over nfs , if i boot from hard
>>> disk , i am not noticing the issue .
>>
>> Is it exactly the same kernel in both cases?
The kernel version, the steps executed, and the hardware environment
are all the same in both cases.
>>
>> More detailed logs would certainly be helpful. There's not enough
>> detail in your description to do anything other than guess, since
>> we're not mind readers....
>>
>> - Ted

2017-09-25 06:48:41

by RAJESH DASARI

Subject: Re: File system corruption after reboot, when rootfs is mounted over nfs

Added some more information.

Thanks,
Rajesh.

On Mon, Sep 25, 2017 at 11:05 AM, RAJESH DASARI <[email protected]> wrote:
> Thanks Ted for your reply , Please find my response in line and
> please let me know if any other logs are needed.
>
> Thanks,
> Rajesh Dasari.
>
>
> On Mon, Sep 25, 2017 at 10:53 AM, RAJESH DASARI <[email protected]> wrote:
>> Thanks,
>> Rajesh Dasari.
>>
>>
>> On Tue, Sep 19, 2017 at 6:55 PM, Theodore Ts'o <[email protected]> wrote:
>>> On Mon, Sep 18, 2017 at 04:14:35PM +0530, RAJESH DASARI wrote:
>>>>
>>>> Could some one please help me with the below issue.
>>>>
>>>> I have booted a mips based hardware with linux (4.4.36 kernel
>>>> )image(over tftp) and rootfs over nfs by passing nfsroot command line
>>>> option to the kernel.
>>>>
>>>> rootfs is mounted under / in my hardware environment.
>>>>
>>>> 192.168.113.254:/rootfs / type nfs
>>>> (rw,relatime,vers=3,rsize=32768,wsize=32768,namlen=255,hard,nolock,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.113.254,mountvers=3,mountproto=tcp,local_lock=all,addr=192.168.113.254)
>>>>
>>>> I have a hard disk and i am mounting it on /mnt , this /mnt directory
>>>> is part of nfsroot.
>>>>
>>>> e2fsck -f -y /dev/sda1 -> disk is clean with no errors.
>>>>
>>>> mount -t ext2 /dev/sda1 /mnt (mount was successful ,mounting ext2 using ext4)
>>>> touch /mnt/test.log -> this command is failing.
>>>> umount /mnt
>>>> reboot
>>>>
>>>> when i was executing the above commands in a loop i see that /dev/sda1
>>>> file system is getting corrupted.
>>>
>>> The above commands include running e2fsck? Then it sounds like there
>>> is some kind of device driver bug.
> yeah. e2fsck command also included and all the above commands
> were executed in a loop.
>>
>>>
>>> What if you include an e2fsck -f -y /dev/sda1 after the umount? Can
>>> you capture the output from that e2fsck run?

I tried running e2fsck after the umount as well, and I still see the
issue. Here is the output of e2fsck after the umount:

e2fsck -f -y /dev/sda1
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/sda1: 12/262144 files (0.0% non-contiguous), 18510/1048576 blocks

The e2fsck return code was 0. After running e2fsck, I rebooted the
node and ran e2fsck again; its output was the same as in the previous
run and it reported no errors. But when I then mounted the hard disk,
the mount failed with the error below.

[ 87.685184] EXT4-fs (sda1): mounting ext2 file system using the ext4 subsystem
[ 87.694393] EXT4-fs error (device sda1): ext4_iget:4337: inode #2: comm mount: bad extended attribute block 4278190080
[ 87.707920] EXT4-fs (sda1): get root inode failed
[ 87.712639] EXT4-fs (sda1): mount failed
mount: mounting /dev/sda1 on /mnt failed: Structure needs cleaning
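
For reference, the e2fsck(8) man page documents the exit status as a sum of condition bits, so a return code of 0 means the check found nothing to report. The sketch below decodes such a status in Python; the names E2FSCK_STATUS_BITS, describe_status, and check_device are illustrative helpers, and check_device is only defined here, not run, because it needs root and an unmounted device.

```python
import subprocess

# Exit-status bits as listed in the e2fsck(8) man page; the status is their sum.
E2FSCK_STATUS_BITS = {
    1: "file system errors corrected",
    2: "file system errors corrected, system should be rebooted",
    4: "file system errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "e2fsck canceled by user request",
    128: "shared library error",
}

def describe_status(status):
    """Translate an e2fsck exit status into a list of readable conditions."""
    if status == 0:
        return ["no errors"]
    return [text for bit, text in E2FSCK_STATUS_BITS.items() if status & bit]

def check_device(device):
    """Run a forced, read-only check and decode its exit status."""
    result = subprocess.run(["e2fsck", "-f", "-n", device])
    return describe_status(result.returncode)

print(describe_status(0))
print(describe_status(4))
```

Logging the decoded status for every iteration of a test loop makes it easier to see on which pass, if any, e2fsck first notices a problem.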

The steps below were executed in a loop 100 times. I see the issue
after around 40 iterations.
1) e2fsck -y -n /dev/sda1
2) mount -t ext2 /dev/sda1 /mnt
3) touch /mnt/test.log
4) umount /mnt
5) e2fsck -y -n /dev/sda1
6) reboot
I mounted the hard disk as ext2 because the partition was formatted
as ext2.

I also formatted the hard disk as ext4 and repeated the above steps
(this time mounting it as ext4). I still see the issue, but with a
different error; the mount command failed with:
EXT4-fs (sda1): no journal found

The following ext4 kernel config options were used to build the kernel:

CONFIG_EXT4_FS=y
CONFIG_EXT4_USE_FOR_EXT2=y
CONFIG_EXT4_FS_POSIX_ACL=y
CONFIG_EXT4_FS_SECURITY=y

As I mentioned in my previous mail, when I boot the node from the
HDD (both the kernel and the rootfs come from the HDD) and execute
steps 1 to 6 in a loop 100 times, I do not see any issue. I see this
issue consistently when I boot the node over the network (the kernel
is loaded over TFTP and the rootfs is mounted over NFS).

>
>
>
>>>
>>>> I am able to reproduce
>>>> this issue always when i boot rootfs over nfs , if i boot from hard
>>>> disk , i am not noticing the issue .
>>>
>>> Is it exactly the same kernel in both cases?
The kernel version, the steps executed, and the hardware environment
are all the same in both cases.
>>>
>>> More detailed logs would certainly be helpful. There's not enough
>>> detail in your description to do anything other than guess, since
>>> we're not mind readers....
>>>
>>> - Ted

2017-09-25 14:23:34

by Theodore Ts'o

Subject: Re: File system corruption after reboot, when rootfs is mounted over nfs

On Mon, Sep 25, 2017 at 12:18:40PM +0530, RAJESH DASARI wrote:
> the below steps were executed in a loop for 100 times. I am seeing the
> issue after executing for around 40 iterations.
> 1) e2fsck -y -n /dev/sda1

Are you really using "e2fsck -y -n"? That makes no sense.

The options "-f -y" would make sense; as would "-f -n". I'm going to
assume it really was "-f -y" given your other comments, but being
precise is also really important. You also didn't mention what
version of e2fsck/e2fsprogs you are using, which might be useful, but
given this is a flaky bug, it really doesn't matter, since it's almost
certainly a hardware problem or a device driver problem.
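
The e2fsck(8) man page states that only one of -p/-a, -n, or -y may be specified, which is why "-y -n" is contradictory. A minimal Python sketch of that rule follows; the helper name conflicting_modes is hypothetical and the flag parsing is deliberately simplified.

```python
# Answer-mode flags of e2fsck; -a is the older spelling of -p.
ANSWER_MODES = {"p": "-p", "a": "-p", "n": "-n", "y": "-y"}

def conflicting_modes(args):
    """Return the sorted answer-mode flags if more than one is requested, else []."""
    seen = set()
    for arg in args:
        # Simplification: every "-xyz" word is treated as bundled single-letter
        # flags, and options that take a value are not modeled.
        if arg.startswith("-") and not arg.startswith("--"):
            for letter in arg[1:]:
                if letter in ANSWER_MODES:
                    seen.add(ANSWER_MODES[letter])
    return sorted(seen) if len(seen) > 1 else []

print(conflicting_modes(["-y", "-n", "/dev/sda1"]))  # the combination from the report
print(conflicting_modes(["-f", "-y", "/dev/sda1"]))  # force the check, answer yes
print(conflicting_modes(["-f", "-n", "/dev/sda1"]))  # force the check, read-only
```

Only the first call reports a conflict; "-f -y" and "-f -n" each select a single answer mode.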

> 2) mount -t ext2 /dev/sda1 /mnt
> 3) touch /mnt/test.log
> 4) umount /mnt
> 5) e2fsck -y -n /dev/sda1
> 6)reboot
> i tried to mount the hard disk as ext2 as my hard disk partition was
> formatted as ext2 device.

I would suggest working through a differential diagnosis. For
example, replace the hard drive with another hard drive, or move
your hard drive to another system and see if you can reproduce the
problem there. Simply reseating the cables might help.
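
One software-side check that fits the same differential approach, offered here as an assumption rather than something prescribed in this thread, is to hash the raw device contents after the umount and again after the reboot, before anything mounts it. If the digests differ, the data changed while no file system was mounted, which points below the file system layer. The Python sketch uses a hypothetical helper, region_digest, and demonstrates it on a temporary file standing in for the device.

```python
import hashlib
import os
import tempfile

def region_digest(path, length, chunk_size=1 << 20):
    """SHA-256 of the first `length` bytes of `path` (block device or image file)."""
    digest = hashlib.sha256()
    remaining = length
    with open(path, "rb") as f:
        while remaining > 0:
            chunk = f.read(min(chunk_size, remaining))
            if not chunk:  # reached end of file or device early
                break
            digest.update(chunk)
            remaining -= len(chunk)
    return digest.hexdigest()

# Demonstration on a temporary file; on real hardware the path would be the
# partition (for example /dev/sda1) and reading it would need root.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00" * 4096 + b"\xff" * 4096)
    demo_path = tmp.name

before = region_digest(demo_path, 8192)
after = region_digest(demo_path, 8192)
os.unlink(demo_path)
print("unchanged" if before == after else "CHANGED")
```

On the target, the first digest would need to be stored somewhere that survives the reboot, such as the NFS root. The comparison is only meaningful between the umount and the next mount, because mounting itself updates the superblock.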

But this is not a software problem, but clearly a hardware problem,
and so you're probably better off finding someone local who can
help you walk through debugging a hardware problem.

Cheers,

- Ted