2016-04-26 22:08:30

by Nikhilesh Reddy

Subject: Emergency remount readonly and EFBIG errors when unlinking files on 3.18 android kernel

Hi

As you know, Android uses an emergency remount instead of something
like "umount -a" in its shutdown/reboot path.

https://android.googlesource.com/platform/system/core/+/master/libcutils/android_reboot.c#132

I have seen a strange issue that sometimes occurs when there is a large
number of writes to an ext4 file system and an adb reboot is issued
(triggering an emergency read-only remount and a reboot).

The issue doesn't happen all the writer processes are killed before the
emergency remount

And on disk we see that one of the files being written to has an incorrect
ext4_inode->i_blocks_lo (which is less than the size of the file by
something like 2k).

When unlinking this file, the VFS inode->i_blocks underflows and we end up
with EFBIG if EXT4_FEATURE_RO_COMPAT_HUGE_FILE is not enabled in the
superblock.

Is this a known issue?

I am still trying to figure out why we have an incorrect i_blocks_lo on
the disk.

Running fsck on the partition does fix the issue, but I am trying to
figure out why this happens and how to fix it.

I would appreciate any pointers in the right direction and any help
you can give me.

--
Thanks
Nikhilesh Reddy

Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.


2016-04-27 03:07:39

by Theodore Ts'o

Subject: Re: Emergency remount readonly and EFBIG errors when unlinking files on 3.18 android kernel

On Tue, Apr 26, 2016 at 03:08:27PM -0700, Nikhilesh Reddy wrote:
> As you know, Android uses an emergency remount instead of something like
> "umount -a" in its shutdown/reboot path.

No, I didn't know that. (And I wish I didn't. Yelch, what an ugly
hack.)

> The issue doesn't happen all the writer processes are killed before the
> emergency remount

Is there a missing "if", as in "if all the writer processes..."?

Note that an emergency remount is very much an emergency. So we don't
do a graceful shutdown of any pending writes. (Normally we would
return EBUSY if there is anything that would prevent a clean remount.)
In the emergency remount path, we bypass all of these checks.

> And on disk we see that one of the files being written to has an incorrect
> ext4_inode->i_blocks_lo (which is less than the size of the file by
> something like 2k).
>
> When unlinking this file, the VFS inode->i_blocks underflows and we end up
> with EFBIG if EXT4_FEATURE_RO_COMPAT_HUGE_FILE is not enabled in the
> superblock.
>
> Is this a known issue?

No, this isn't a known issue. I've never seen anything like this, but
all of the tests we do assume a forced poweroff, which we simulate
using dm-flakey. We do *not* test the blunt-force trauma which is
inflicted on the file system structures by an
emergency remount.

Off by 2k really doesn't make sense. I could see if it was off by 4k,
but 2k is really weird.

> I would appreciate any pointers in the right direction and any help
> you can give me.

Well, what I'd do is create a new ioctl interface which simulates an
emergency ro on just the one device, and try to create a reliable
repro. Eventually we'll want to add some tests for this in xfstests.

- Ted


2016-04-27 03:25:22

by Dave Chinner

Subject: Re: Emergency remount readonly and EFBIG errors when unlinking files on 3.18 android kernel

On Tue, Apr 26, 2016 at 11:07:36PM -0400, Theodore Ts'o wrote:
> On Tue, Apr 26, 2016 at 03:08:27PM -0700, Nikhilesh Reddy wrote:
> > I would appreciate any pointers in the right direction and any help
> > you can give me.
>
> Well, what I'd do is create a new ioctl interface which simulates an
> emergency ro on just the one device, and try to create a reliable
> repro. Eventually we'll want to add some tests for this in xfstests.

That's pretty much XFS_IOC_GOINGDOWN, controlled by xfs_io -c shutdown.
I'd suggest that adding a new flag:

#define XFS_FSOP_GOING_FLAGS_EMERG_REMOUNT 0x4

would be in line with its existing usage in xfstests for forcing
different shutdown conditions on filesystems....

Cheers,

Dave.
--
Dave Chinner

2016-04-27 17:56:37

by Nikhilesh Reddy

Subject: Re: Emergency remount readonly and EFBIG errors when unlinking files on 3.18 android kernel

Hi

Thanks for your reply.
Yes, the sentence below should have the *if*; sorry about the typo.
The issue doesn't happen if all the writer processes/threads are killed
before the emergency remount.

>
> Note that an emergency remount is very much an emergency. So we don't
> do a graceful shutdown of any pending writes. (Normally we would
> return EBUSY if there is anything that would prevent a clean remount.)
> In the emergency remount path, we bypass all of these checks.
>
>> And on disk we see that one of the files being written to has an incorrect
>> ext4_inode->i_blocks_lo (which is less than the size of the file by
>> something like 2k).
>>
>> When unlinking this file, the VFS inode->i_blocks underflows and we end up
>> with EFBIG if EXT4_FEATURE_RO_COMPAT_HUGE_FILE is not enabled in the
>> superblock.
>>
>> Is this a known issue?
>
> No, this isn't a known issue. I've never seen anything like this, but
> all of the tests we do assume a forced poweroff, which we simulate
> using dm-flakey. We do *not* test the blunt-force trauma which is
> inflicted on the file system structures by an
> emergency remount.
>
> Off by 2k really doesn't make sense. I could see if it was off by 4k,
> but 2k is really weird.

Just to clarify: when I say off by 2k, I mean the i_blocks count, not
the actual file size (which would be 2k * 512 bytes, if I am not wrong).

For example, fsck reports:
Pass 1: Checking inodes, blocks, and sizes
Inode XXX, i_blocks is 854024, should be 856072.

>
>> I would appreciate any pointers in the right direction and any help
>> you can give me.
>
> Well, what I'd do is create a new ioctl interface which simulates an
> emergency ro on just the one device, and try to create a reliable
> repro. Eventually we'll want to add some tests for this in xfstests.
>

Thanks so much for your suggestion.
I will try to see if I can reliably reproduce the issue after
implementing the ioctl as you suggested.
I have had some issues getting xfstests to run on the device, which I
have been meaning to work on; maybe this is the time to do so.


--
Thanks
Nikhilesh Reddy

2016-05-02 00:59:44

by Daeho Jeong

Subject: Re: Emergency remount readonly and EFBIG errors when unlinking files on 3.18 android kernel

Hi,

This looks like the problem we dug into, and we just submitted a patch
to resolve it. Please refer to the email titled "[PATCH] ext4: guarantee
already started handles to successfully finish while ro remounting" on the
mailing list.

I am not sure whether your problem and ours are exactly the same. But if you
apply our patch and the problem is resolved, please let us know.

Best Regards.