Hi
As you know Android uses emergency remount instead of doing something
like "umount -a" in its shutdown/reboot path.
https://android.googlesource.com/platform/system/core/+/master/libcutils/android_reboot.c#132
I have seen a strange issue that sometimes occurs when there are a large
number of writes to an ext4 file system and an adb reboot is issued
(triggering an emergency read-only remount and a reboot).
The issue doesn't happen all the writer processes are killed before the
emergency remount.
On disk we see that one of the files being written to has an incorrect
ext4_inode->i_blocks_lo (which is less than the size of the file by
something like 2k).
When unlinking this file, the VFS inode->i_blocks underflows and we end up
with EFBIG if EXT4_FEATURE_RO_COMPAT_HUGE_FILE is not enabled in the
superblock.
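To illustrate the failure mode described above, here is a simplified
user-space model. It is a sketch, not kernel code: the function names are
hypothetical, and it only assumes that the block count is an unsigned
64-bit number of 512-byte units and that, without huge_file, only the
32-bit i_blocks_lo field can hold it.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Simplified user-space model, not kernel code. All names are
 * hypothetical; only the 32-bit limit and EFBIG mirror the description.
 */

/*
 * Sectors (512-byte units) are subtracted as blocks are freed on unlink.
 * Unsigned arithmetic wraps around if more is freed than was counted.
 */
static uint64_t model_free_sectors(uint64_t i_blocks, uint64_t freed)
{
	return i_blocks - freed;
}

/*
 * Writing the count back: without huge_file only the 32-bit i_blocks_lo
 * field is usable, so a larger value is rejected with -EFBIG.
 */
static int model_blocks_set(uint64_t i_blocks, bool huge_file)
{
	if (i_blocks <= UINT32_MAX)
		return 0;
	if (!huge_file)
		return -EFBIG;
	return 0; /* huge_file layouts are not modelled here */
}
```

If the stored count is too small by 2048 units, freeing everything the
file really owns wraps the counter to a value just below 2^64, and
model_blocks_set() then returns -EFBIG when huge_file is off.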
Is this a known issue?
I am still trying to figure out why we have an incorrect i_blocks_lo on
the disk.
Running fsck on the partition does fix the issue, but I am trying to
figure out why this would happen and how to fix it.
I would appreciate it if you could point me in the right direction, and
any help you can give me.
--
Thanks
Nikhilesh Reddy
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.
On Tue, Apr 26, 2016 at 03:08:27PM -0700, Nikhilesh Reddy wrote:
> As you know Android uses emergency remount instead of doing something like
> "umount -a" in its shutdown/reboot path.
No, I didn't know that. (And I wish I didn't. Yelch, what an ugly
hack.)
> The issue doesn't happen all the writer processes are killed before the
> emergency remount
Is there a missing "if", as in "if all the writer processes".... ?
Note that an emergency remount is very much an emergency. So we don't
do a graceful shutdown of any pending writes. (Normally we would
return EBUSY if there were anything that would prevent a clean remount.)
In the emergency remount path, we bypass all of these checks.
> And on disk we see that one of the files being written to has incorrect
> ext4_inode->i_blocks_lo (which is less than the size of the file by
> something like 2k)
>
> When unlinking this file the vfs inode->i_blocks underflows and we end up
> with EFBIG if EXT4_FEATURE_RO_COMPAT_HUGE_FILE is not enabled in the
> superblock.
>
> Is this a known issue?
No, this isn't a known issue. I've never seen anything like this, but
all of the tests we do assume a forced poweroff, which we simulate
using dm-flakey. We do *not* test the blunt-force trauma that is
inflicted on the file system structures by an emergency remount.
Off by 2k really doesn't make sense. I could see if it was off by 4k,
but 2k is really weird.
> I would appreciate if you could point me in the right direction and any help
> you can give me.
Well, what I'd do is create a new ioctl interface which simulates an
emergency ro on just the one device, and try to create a reliable
repro. Eventually we'll want to add some tests for this in xfstests.
- Ted
On Tue, Apr 26, 2016 at 11:07:36PM -0400, Theodore Ts'o wrote:
> On Tue, Apr 26, 2016 at 03:08:27PM -0700, Nikhilesh Reddy wrote:
> > I would appreciate if you could point me in the right direction and any help
> > you can give me.
>
> Well, what I'd do is create a new ioctl interface which simulates an
> emergency ro on just the one device, and try to create a reliable
> repro. Eventually we'll want to add some tests for this in xfstests.
That's pretty much XFS_IOC_GOINGDOWN, controlled by xfs_io -c shutdown.
I'd suggest that adding a new flag:
#define XFS_FSOP_GOING_FLAGS_EMERG_REMOUNT 0x4
would be in line with its existing usage in xfstests for forcing
different shutdown conditions on filesystems....
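From user space, a test could drive such an ioctl roughly as follows.
This is only a sketch under assumptions: the ioctl number and the first
three flag values are copied from the XFS headers as I remember them (check
your tree), the 0x4 flag is just the proposal above and exists in no
kernel, and fs_goingdown() is a hypothetical helper name. A filesystem
that does not implement the ioctl will simply fail the call.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Assumed to match xfs_fs.h; verify against your kernel headers. */
#ifndef XFS_IOC_GOINGDOWN
#define XFS_IOC_GOINGDOWN _IOR('X', 125, uint32_t)
#endif
#define XFS_FSOP_GOING_FLAGS_DEFAULT       0x0 /* going down */
#define XFS_FSOP_GOING_FLAGS_LOGFLUSH      0x1 /* flush log but not data */
#define XFS_FSOP_GOING_FLAGS_NOLOGFLUSH    0x2 /* don't flush log nor data */
/* Proposed in this thread only; not implemented anywhere. */
#define XFS_FSOP_GOING_FLAGS_EMERG_REMOUNT 0x4

/*
 * Hypothetical helper: open the mount point and ask the filesystem to
 * go down with the given flags. Returns 0 on success, -1 on failure
 * with errno set by open() or ioctl().
 */
static int fs_goingdown(const char *mntpoint, uint32_t flags)
{
	int fd = open(mntpoint, O_RDONLY);
	int ret, saved_errno;

	if (fd < 0)
		return -1;

	ret = ioctl(fd, XFS_IOC_GOINGDOWN, &flags);
	saved_errno = errno;	/* close() may clobber errno */
	close(fd);
	errno = saved_errno;
	return ret;
}
```

A repro script would start the writers, call
fs_goingdown(mnt, XFS_FSOP_GOING_FLAGS_EMERG_REMOUNT), then unmount and
run fsck on the device.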
Cheers,
Dave.
--
Dave Chinner
Hi
Thanks for your reply.
Yes, the sentence below should have the *if*; sorry about the typo:
The issue doesn't happen if all the writer processes/threads are killed
before the emergency remount.
>
> Note that an emergency remount is very much an emergency. So we don't
> do a graceful shutdown of any pending writes. (Normally we would
> return EBUSY if there were anything that would prevent a clean remount.)
> In the emergency remount path, we bypass all of these checks.
>
>> And on disk we see that one of the files being written to has incorrect
>> ext4_inode->i_blocks_lo (which is less than the size of the file by
>> something like 2k)
>>
>> When unlinking this file the vfs inode->i_blocks underflows and we end up
>> with EFBIG if EXT4_FEATURE_RO_COMPAT_HUGE_FILE is not enabled in the
>> superblock.
>>
>> Is this a known issue?
>
> No, this isn't a known issue. I've never seen anything like this, but
> all of the tests we do assume a forced poweroff, which we simulate
> using dm-flakey. We do *not* test the blunt-force trauma that is
> inflicted on the file system structures by an emergency remount.
>
> Off by 2k really doesn't make sense. I could see if it was off by 4k,
> but 2k is really weird.
Just to clarify: when I say off by 2k, I mean the i_blocks count, not
the actual file size (which would be off by 2k * 512 bytes, if I am not wrong).
For example, we see fsck report:
Pass 1: Checking inodes, blocks, and sizes
Inode XXX, i_blocks is 854024, should be 856072.
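To spell out the arithmetic on those two numbers, here is a quick sketch.
The helper names are made up for illustration, i_blocks is taken to be in
512-byte units as described above, and the 4 KiB file system block size is
an assumption.

```c
#include <stdint.h>

#define SECTOR_SIZE   512u	/* i_blocks is counted in 512-byte units */
#define FS_BLOCK_SIZE 4096u	/* assumption: 4 KiB ext4 block size */

/* How many 512-byte units the on-disk count is short of the fsck value. */
static uint64_t missing_sectors(uint64_t on_disk, uint64_t expected)
{
	return expected - on_disk;
}

/* Convert a count of 512-byte units to bytes. */
static uint64_t sectors_to_bytes(uint64_t sectors)
{
	return sectors * SECTOR_SIZE;
}

/* Convert bytes to whole file system blocks. */
static uint64_t bytes_to_fs_blocks(uint64_t bytes)
{
	return bytes / FS_BLOCK_SIZE;
}
```

With the fsck values, missing_sectors(854024, 856072) is 2048, which is
1048576 bytes (1 MiB), or 256 blocks of 4 KiB.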
>
>> I would appreciate if you could point me in the right direction and any help
>> you can give me.
>
> Well, what I'd do is create a new ioctl interface which simulates an
> emergency ro on just the one device, and try to create a reliable
> repro. Eventually we'll want to add some tests for this in xfstests.
>
Thanks so much for your suggestion.
I will try to see if I can reliably reproduce the issue after
implementing the ioctl as you suggested.
I have had some issues getting xfstests to run on the device, which I
have been meaning to work on; maybe this is the time to do so.
--
Thanks
Nikhilesh Reddy
Hi,
This looks like the problem we have been digging into, and we just submitted
a patch to resolve it. Please refer to the email titled "[PATCH] ext4: guarantee
already started handles to successfully finish while ro remounting" on the mailing
list.
I am not sure whether your problem and ours are exactly the same. But if you
apply our patch and the problem is resolved, please let us know.
Best Regards.