2012-08-29 09:22:23

by Ashish Sangwan

Subject: query about truncate and orphan list

I have a query about the orphan list and truncate.
Currently these steps are performed in ext4_ext_truncate():
a) Start a journal handle.
b) Add the inode to the orphan list.
c) Update i_disksize and mark the inode dirty.
d) Perform the actual truncate.
e) Remove the inode from the orphan list.
f) Stop the handle.

If the system crashes during step d), will i_disksize actually be updated on disk?
AFAIK i_disksize might be updated in the journal but not at its
original location, because the transaction is not committed yet.

If this is the case, then what is the use of restarting the truncate
operation while processing the orphan inode list?

PS: There is also a function, ext4_ext_truncate_extend_restart(), which may
commit the current transaction, in which case i_disksize would be updated,
but I am assuming there are enough free journal blocks.

Thanks,
Ashish


2012-08-29 13:18:00

by Jan Kara

Subject: Re: query about truncate and orphan list

On Wed 29-08-12 14:52:22, Ashish Sangwan wrote:
> I have a query about the orphan list and truncate.
> Currently these steps are performed in ext4_ext_truncate():
> a) Start a journal handle.
> b) Add the inode to the orphan list.
> c) Update i_disksize and mark the inode dirty.
> d) Perform the actual truncate.
> e) Remove the inode from the orphan list.
> f) Stop the handle.
>
> If the system crashes during step d), will i_disksize actually be updated on disk?
> AFAIK i_disksize might be updated in the journal but not at its
> original location, because the transaction is not committed yet.
Yes, that can happen.

> If this is the case, then what is the use of restarting the truncate
> operation while processing the orphan inode list?
Because it can be on disk (no one guarantees when a transaction commits),
and in that case we have to perform the truncate because some of the blocks
might already have been freed by a transaction which has also committed.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR
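
Jan's answer can be illustrated with a small toy model. This is only a
sketch, not ext4 or jbd2 code: the class ToyFS, the function
truncate_then_crash() and the commit_after parameter are hypothetical
names made up for the illustration, sizes are counted in whole blocks,
and a "commit" is modelled as copying the in-memory state to disk in one
step. It shows the two outcomes discussed above: if nothing committed
before the crash, the truncate simply never happened; if a commit landed
partway through step d), orphan processing has to finish the truncate.

```python
import copy


class ToyFS:
    """Toy model only: one inode, metadata in a dict, sizes counted in blocks."""

    def __init__(self, nblocks):
        # `disk` survives a crash; `running` is the uncommitted transaction's view.
        self.disk = {"blocks": set(range(nblocks)), "i_disksize": nblocks, "orphan": False}
        self.running = copy.deepcopy(self.disk)

    def commit(self):
        # A commit makes every change made so far durable, all at once.
        self.disk = copy.deepcopy(self.running)

    def crash_and_recover(self):
        # Uncommitted changes are lost; then the orphan list is processed.
        self.running = copy.deepcopy(self.disk)
        if self.running["orphan"]:
            size = self.running["i_disksize"]
            self.running["blocks"] = {b for b in self.running["blocks"] if b < size}
            self.running["orphan"] = False
            self.commit()


def truncate_then_crash(fs, new_size, commit_after=None):
    """Run steps b) to d), crash halfway through d), then recover.

    commit_after (hypothetical knob): number of freed blocks after which the
    journal happens to commit; None means nothing commits before the crash.
    """
    fs.running["orphan"] = True              # b) add inode to orphan list
    fs.running["i_disksize"] = new_size      # c) update i_disksize
    doomed = sorted(b for b in fs.running["blocks"] if b >= new_size)
    for n, blk in enumerate(doomed[: len(doomed) // 2], start=1):
        fs.running["blocks"].discard(blk)    # d) free blocks one at a time
        if n == commit_after:
            fs.commit()
    fs.crash_and_recover()
    return fs.disk


# Nothing committed before the crash: the file is untouched.
print(truncate_then_crash(ToyFS(10), 2))
# A commit landed mid-truncate: recovery finishes freeing the blocks.
print(truncate_then_crash(ToyFS(10), 2, commit_after=2))
```

In the second call the commit makes the orphan entry, the new i_disksize
and the first two freed blocks durable together. Without the orphan
replay, the toy disk would be left with blocks 4 to 9 still allocated
beyond i_disksize.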

2012-08-30 09:24:34

by Ashish Sangwan

Subject: Re: query about truncate and orphan list

On Wed, Aug 29, 2012 at 6:47 PM, Jan Kara <[email protected]> wrote:
> On Wed 29-08-12 14:52:22, Ashish Sangwan wrote:
>> I have a query about the orphan list and truncate.
>> Currently these steps are performed in ext4_ext_truncate():
>> a) Start a journal handle.
>> b) Add the inode to the orphan list.
>> c) Update i_disksize and mark the inode dirty.
>> d) Perform the actual truncate.
>> e) Remove the inode from the orphan list.
>> f) Stop the handle.
>>
>> If the system crashes during step d), will i_disksize actually be updated on disk?
>> AFAIK i_disksize might be updated in the journal but not at its
>> original location, because the transaction is not committed yet.
> Yes, that can happen.
>
OK.
To test it, I inserted a 10-second sleep between d) and e).
I created a 10MB file on a new ext4 partition and tried to truncate it to 10KB.
After waiting for 10 seconds, I unplugged the device.
On remount, the inode was not present on the orphan list.
No matter how many times I repeated this operation, the result was the same.

After that I inserted a call to ext4_journal_restart() between steps c)
and d) and repeated the above operation.
This time the inode was present on the orphan list and i_disksize was
correctly updated (10KB) too.
Is this the correct thing to do?

Pardon me if my question seems silly. I am just trying to learn the basics
of ext4 journalling.

Thanks,
Ashish
>> If this is the case, then what is the use of restarting the truncate
>> operation while processing the orphan inode list?
> Because it can be on disk (no one guarantees when a transaction commits),
> and in that case we have to perform the truncate because some of the blocks
> might already have been freed by a transaction which has also committed.
>
> Honza
> --
> Jan Kara <[email protected]>
> SUSE Labs, CR

2012-08-30 09:54:17

by Jan Kara

Subject: Re: query about truncate and orphan list

On Thu 30-08-12 14:54:33, Ashish Sangwan wrote:
> On Wed, Aug 29, 2012 at 6:47 PM, Jan Kara <[email protected]> wrote:
> > On Wed 29-08-12 14:52:22, Ashish Sangwan wrote:
> >> I have a query about the orphan list and truncate.
> >> Currently these steps are performed in ext4_ext_truncate():
> >> a) Start a journal handle.
> >> b) Add the inode to the orphan list.
> >> c) Update i_disksize and mark the inode dirty.
> >> d) Perform the actual truncate.
> >> e) Remove the inode from the orphan list.
> >> f) Stop the handle.
> >>
> >> If the system crashes during step d), will i_disksize actually be updated on disk?
> >> AFAIK i_disksize might be updated in the journal but not at its
> >> original location, because the transaction is not committed yet.
> > Yes, that can happen.
> >
> OK.
> To test it, I inserted a 10-second sleep between d) and e).
> I created a 10MB file on a new ext4 partition and tried to truncate it to 10KB.
> After waiting for 10 seconds, I unplugged the device.
> On remount, the inode was not present on the orphan list.
> No matter how many times I repeated this operation, the result was the same.
Well, this is likely because the inode is small and thus the whole truncate
operation fits in a single transaction. So your sleep just held the
transaction containing the add-to-orphan operation open. Try creating a big
fragmented file, like:
  # Write every other 4 KiB block so the file consists of many small extents.
  for ((i = 0; i < 5000; i++)); do
      dd if=/dev/urandom of=file bs=4096 count=1 conv=notrunc seek=$((i*2))
  done
  # Flush everything to disk before running the experiment.
  sync

Then try your above experiment and you should see inode on orphan list.

> After that I inserted a call to ext4_journal_restart() between steps c)
> and d) and repeated the above operation.
> This time the inode was present on the orphan list and i_disksize was
> correctly updated (10KB) too.
> Is this the correct thing to do?
Yes. That's the expected result.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR
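
Jan's remark that a small truncate fits in a single transaction, while a
big fragmented file does not, can be sketched with a rough counting
model. Everything here is an assumption made for illustration: the
function handle_restarts() is hypothetical, each freed extent is charged
exactly one journal credit, and the budget of 64 credits per handle is an
arbitrary number. Real ext4 computes credits quite differently; only the
trend is meant to carry over.

```python
def handle_restarts(extents_to_free, credits_per_handle):
    """Count how often a toy handle runs out of credits and must restart.

    Assumption: freeing one extent costs exactly one credit.
    """
    restarts = 0
    credits = credits_per_handle
    for _ in range(extents_to_free):
        if credits == 0:
            # Restart the handle; the transaction used so far is free to commit.
            restarts += 1
            credits = credits_per_handle
        credits -= 1
    return restarts


# A contiguous 10MB file: a single extent to trim, so no restart is needed.
small = handle_restarts(extents_to_free=1, credits_per_handle=64)
# The fragmented file from the dd loop: about 5000 one-block extents.
fragmented = handle_restarts(extents_to_free=5000, credits_per_handle=64)
print(small, fragmented)
```

With no restart, the add-to-orphan step and the whole truncate stay in
one open transaction, so a crash during the sleep loses all of it. Each
restart in the fragmented case is a point where the earlier steps,
including the orphan entry, may already have reached the disk.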