2006-11-22 21:55:09

by Avantika Mathur LTC

Subject: Ext4 devel interlock meeting minutes (Nov. 22 2006)


Ext4 Developer Interlock Call: 11/22/06 minutes

Attendees: Mingming Cao, Eric Sandeen, Suparna Bhattacharya, Takashi Sato, Jean-Pierre Dion, Valérie Clément, Avantika Mathur

- Online Defrag:
-- Last week we determined that ioctl was the preferred
interface for the online defrag.
-- Eric will send out an overview of the steps completed by XFS for file defragmentation.
-- Last week Eric said that in the XFS defrag implementation, you can't defrag a file that is open/being written to. This was an incorrect assumption. There will be more explanation in his mail.

- Preallocation:
-- We need to decide which interface to use in the implementation
* ioctl: simple solution; preferred method for defrag and used in reservation
* posix_fallocate: writes zero to the first byte of every block; probably preferred solution
* ftruncate: consistency across platforms may be an issue
-- Possible parameters for preallocation would be the file descriptor, the number of blocks, and the offset within the file where blocks will be preallocated (if defrag and preallocation are implemented together)
-- Suparna will send an email detailing all of the options, and some thoughts on implementation details to the list.
-- Eric will send details on XFS's preallocation implementation to the list
-- Multiple block allocation patches will be used in preallocation implementation
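
As an illustration of the parameters discussed above (file descriptor, number of blocks, offset), here is a minimal Python sketch that goes through the existing posix_fallocate call. The helper name preallocate and the 4096-byte default block size are assumptions made for the example; they are not an interface anyone has proposed for ext4.

```python
import os
import tempfile


def preallocate(fd, offset, nblocks, block_size=4096):
    """Reserve nblocks * block_size bytes starting at offset.

    Hypothetical helper: it only wraps os.posix_fallocate(), which takes
    a byte offset and a byte length rather than a block count.
    """
    length = nblocks * block_size
    os.posix_fallocate(fd, offset, length)
    return length


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    try:
        reserved = preallocate(fd, 0, 4)
        print("reserved bytes:", reserved)
        print("file size now: ", os.fstat(fd).st_size)
    finally:
        os.close(fd)
        os.unlink(path)
```

Whether the space is reserved natively or by the C library writing into each block depends on the filesystem and library underneath.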

- Features we'd like to see in Ext4 before code freeze:
-- online defragmentation
-- multiple block allocation
-- big block group: Valérie has ported this to 2.6.19-rc6 and will send patches to the list tomorrow, and put updated e2fsprogs patches on the Bull web site.
-- large inode default for ext4
-- large file support: Current max file size is 2 TB. Increase the limit to 16 TB by changing i_block (patches have been sent by Takashi), then increase to more than 16 TB. Changing the i_block to break the 2 TB limit may cause some applications to break.
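
The 2 TB and 16 TB figures can be reproduced with a little arithmetic. The sketch below assumes the limits come from a 32-bit counter, counted first in 512-byte units and then in 4 KiB filesystem blocks; the minutes do not spell this out, so treat the unit sizes as an assumption.

```python
TIB = 1 << 40  # bytes in one tebibyte


def max_file_bytes(counter_bits, unit_bytes):
    """Largest size expressible with a counter of the given width and unit."""
    return (1 << counter_bits) * unit_bytes


if __name__ == "__main__":
    # 32-bit counter of 512-byte units (assumed current layout)
    print(max_file_bytes(32, 512) // TIB)   # prints 2
    # 32-bit counter of 4096-byte blocks (assumed after the change)
    print(max_file_bytes(32, 4096) // TIB)  # prints 16
```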

- We would like to document all features for Ext4 and sort dependencies and priorities on the mailing list.


2006-11-23 01:13:31

by Andreas Dilger

Subject: Re: Ext4 devel interlock meeting minutes (Nov. 22 2006)

On Nov 22, 2006 13:53 -0800, Avantika Mathur wrote:
> Attendees: Mingming Cao, Eric Sandeen, Suparna Bhattacharya, Takashi Sato, Jean-Pierre Dion, Valérie Clément, Avantika Mathur

Sorry for missing recent calls, it has been very busy here and I was sick,
so extra sleep won over getting up early for the concall :-/.

> - Online Defrag:
> -- Last week we determined that ioctl was the preferred
> interface for the online defrag.
> -- Eric will send out an overview of the steps completed by XFS for file defragmentation.
> -- Last week Eric said that in the XFS defrag implementation, you can't defrag a file that is open/being written to. This was an incorrect assumption. There will be more explanation in his mail.

This is also something CFS is interested in and looking to work on. We
would be focussing on defragmentation of extent-mapped files.

> - Preallocation:
> -- We need to decide which interface to use in the implementation
> * ioctl: simple solution; preferred method for defrag and used in reservation
> * posix_fallocate: writes zero to the first byte of every block; probably preferred solution
> * ftruncate: consistency across platforms may be an issue

And it also has compatibility issues because e.g. dd will truncate the file
to the new size and then start writing, when using "seek=NNN":

dd if=/dev/zero of=/tmp/foo bs=4k count=1 seek=100

open("/dev/null", O_RDONLY|O_LARGEFILE) = 0
open("/tmp/foo", O_RDWR|O_CREAT|O_LARGEFILE, 0666) = 1
fstat64(1, {st_mode=S_IFREG|0664, st_size=0, ...}) = 0
ftruncate64(1, 409600) = 0

We don't necessarily want a common tool like dd to start allocating gobs
of disk space when trying to make a sparse file.
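
To make the concern concrete, here is a small Python sketch of what a tool does today when it extends a file with ftruncate: the file size grows, but on filesystems that support sparse files few or no blocks are allocated. The helper name extend_with_ftruncate is made up for this example, and st_blocks is assumed to be in 512-byte units, as it is on Linux.

```python
import os
import tempfile


def extend_with_ftruncate(path, size):
    """Extend path to size bytes with ftruncate.

    Returns (apparent_size, allocated_bytes). allocated_bytes is derived
    from st_blocks, assumed here to count 512-byte units.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o666)
    try:
        os.ftruncate(fd, size)
        st = os.fstat(fd)
    finally:
        os.close(fd)
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        apparent, allocated = extend_with_ftruncate(
            os.path.join(tmp, "foo"), 100 * 4096)
        print("apparent size:  ", apparent)
        print("allocated bytes:", allocated)
```

If ftruncate were redefined to preallocate, the second number would jump to roughly the first for every such caller.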

> - Features we'd like to see in Ext4 before code freeze:
> -- online defragmentation

Since online defragmentation doesn't require a format change, it likely
doesn't need to affect the ext4 code freeze. This is doubly true because
it is a relatively new topic and there is likely significant upstream
opposition by various individuals regardless of how we want to implement it.

> -- big block group: Valérie has ported this to 2.6.19-rc6 and will send patches to the list tomorrow, and put updated e2fsprogs patches on the Bull web site.
> -- large inode default for ext4

Was inode version proposed for inclusion?

> -- large file support: Current max file size is 2 TB. Increase the limit to 16 TB by changing i_block (patches have been sent by Takashi), then increase to more than 16 TB. Changing the i_block to break the 2 TB limit may cause some applications to break.

FYI, there are already on-disk format changes for this in the e2fsprogs
mercurial repo.


Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

2006-11-23 03:33:22

by Eric Sandeen

Subject: Re: Ext4 devel interlock meeting minutes (Nov. 22 2006)

Avantika Mathur wrote:
> Ext4 Developer Interlock Call: 11/22/06 minutes
>
> Attendees: Mingming Cao, Eric Sandeen, Suparna Bhattacharya, Takashi Sato, Jean-Pierre Dion, Valérie Clément, Avantika Mathur
>
> - Online Defrag:
> -- Last week we determined that ioctl was the preferred
> interface for the online defrag.
> -- Eric will send out an overview of the steps completed by XFS for file defragmentation.
> -- Last week Eric said that in the XFS defrag implementation, you can't defrag a file that is open/being written to. This was an incorrect assumption. There will be more explanation in his mail.
>
> - Preallocation:
> -- We need to decide which interface to use in the implementation
> * ioctl: simple solution; preferred method for defrag and used in reservation

And would also be useful for testing, at least. If reiserfs or jfs or
xfs has existing interfaces for preallocation that are at all useful,
perhaps we could even piggyback on them for now, in the same way that
other filesystems picked up ext[23]'s chattr interfaces?

> * posix_fallocate: writes zero to the first byte of every block; probably preferred solution

For the record, this is only how glibc does it today; it certainly
doesn't have to be done this way. Ideally, a kernel syscall which allows
filesystems to be smart would be best. Coordinating glibc & kernel
changes will be tricky & time consuming, though... but it's a noble goal.
I'd like to see it happen; I'll chat with Ulrich when I get a chance
and see what he thinks.
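
For anyone unfamiliar with the "write a byte into every block" approach, here is a rough Python sketch of the idea. It is not glibc's code; the function name, the 4096-byte default block size, and the read-then-write-back trick for preserving existing data are all assumptions made for illustration, and the read/write pair is not safe against concurrent writers.

```python
import os
import tempfile


def fallocate_by_writing(fd, offset, length, block_size=4096):
    """Force allocation of [offset, offset + length) by touching each block.

    Illustrative only. Writes one byte per block: the existing byte if
    there is one, a zero byte otherwise. fd must be open for read/write.
    Returns the number of blocks touched.
    """
    end = offset + length
    pos = offset
    touched = 0
    while pos < end:
        existing = os.pread(fd, 1, pos)       # b"" when pos is past EOF
        os.pwrite(fd, existing or b"\0", pos)
        touched += 1
        pos = (pos // block_size + 1) * block_size  # start of next block
    if os.fstat(fd).st_size < end:
        os.ftruncate(fd, end)                 # make the size cover the range
    return touched


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    try:
        print("blocks touched:", fallocate_by_writing(fd, 0, 4 * 4096))
    finally:
        os.close(fd)
        os.unlink(path)
```

The cost is one write per block with no help from the filesystem, which is why a real syscall would be preferable.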

> * ftruncate: consistency across platforms may be an issue

I agree w/ Andreas on this, I don't see how ftruncate can be used w/o
making holey files impossible.

-Eric