The following changes since commit c7788792a5e7b0d5d7f96d0766b4cb6112d47d75:
Linux 3.10-rc2 (2013-05-20 14:37:38 -0700)
are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git tags/ext4_for_linus
for you to fetch changes up to 6ae06ff51eab5dcbbf959b05ce0f11003a305ba5:
ext4: optimize starting extent in ext4_ext_rm_leaf() (2013-07-01 08:12:41 -0400)
----------------------------------------------------------------
Lots of bug fixes, cleanups and optimizations. In the bug fixes
category, of note is a fix for on-line resizing file systems where the
block size is smaller than the page size (i.e., file systems with 1k
blocks on x86, or more interestingly file systems with 4k blocks on
Power or ia64 systems.)
In the cleanup category, ext4's punch hole implementation was
significantly improved by Lukas Czerner, and now supports bigalloc
file systems. In addition, Jan Kara significantly cleaned up the
write submission code path. We also improved error checking and added
a few sanity checks.
In the optimizations category, two major optimizations deserve
mention. The first is that ext4_writepages() is now used for
nodelalloc and ext3 compatibility mode. This allows writes to be
submitted much more efficiently as a single bio request, instead of
being sent as individual 4k writes into the block layer (which then
relied on the elevator code to coalesce the requests in the block
queue). Secondly, the extent cache shrink mechanism, which was
introduced in 3.9, no longer has a scalability bottleneck caused by the
i_es_lru spinlock. Other optimizations include some changes to reduce
CPU usage and to avoid issuing empty commits unnecessarily.
----------------------------------------------------------------
Al Viro (1):
ext3,ext4: don't mess with dir_file->f_pos in htree_dirblock_to_tree()
Alexey Khoroshilov (1):
ext4: implement error handling of ext4_mb_new_preallocation()
Ashish Sangwan (2):
ext4: pass inode pointer instead of file pointer to punch hole
ext4: optimize starting extent in ext4_ext_rm_leaf()
Darrick J. Wong (1):
jbd2: fix block tag checksum verification brokenness
Dmitry Monakhov (3):
jbd2: optimize jbd2_journal_force_commit
ext4: fix data integrity for ext4_sync_fs
ext4: Fix fsync error handling after filesystem abort
Jan Kara (30):
ext4: fix data offset overflow on 32-bit archs in ext4_inline_data_fiemap()
ext4: fix overflows in SEEK_HOLE, SEEK_DATA implementations
ext4: fix data offset overflow in ext4_xattr_fiemap() on 32-bit archs
ext4: fix overflow when counting used blocks on 32-bit architectures
ext4: use io_end for multiple bios
jbd2: don't create journal_head for temporary journal buffers
jbd2: remove journal_head from descriptor buffers
jbd2: refine waiting for shadow buffers
jbd2: remove outdated comment
jbd2: cleanup needed free block estimates when starting a transaction
jbd2: fix race in t_outstanding_credits update in jbd2_journal_extend()
jbd2: remove unused waitqueues
jbd2: transaction reservation support
ext4: provide wrappers for transaction reservation calls
ext4: stop messing with nr_to_write in ext4_da_writepages()
ext4: deprecate max_writeback_mb_bump sysfs attribute
ext4: improve writepage credit estimate for files with indirect blocks
ext4: better estimate credits needed for ext4_da_writepages()
ext4: restructure writeback path
ext4: remove buffer_uninit handling
ext4: use transaction reservation for extent conversion in ext4_end_io
ext4: split extent conversion lists to reserved & unreserved parts
ext4: defer clearing of PageWriteback after extent conversion
ext4: protect extent conversion after DIO with i_dio_count
ext4: remove wait for unwritten extent conversion from ext4_truncate()
ext4: use generic_file_fsync() in ext4_file_fsync() in nojournal mode
ext4: remove i_mutex from ext4_file_sync()
ext4: Remove wait for unwritten extents in ext4_ind_direct_IO()
ext4: don't wait for extent conversion in ext4_punch_hole()
ext4: remove ext4_ioend_wait()
Jie Liu (1):
ext4: return FIEMAP_EXTENT_UNKNOWN for delalloc extents
Joe Perches (1):
ext4: reduce object size when !CONFIG_PRINTK
Jon Ernst (1):
ext4: delete unused variables
Lukas Czerner (20):
mm: change invalidatepage prototype to accept length
jbd2: change jbd2_journal_invalidatepage to accept length
ext4: use ->invalidatepage() length argument
jbd: change journal_invalidatepage() to accept length
xfs: use ->invalidatepage() length argument
ocfs2: use ->invalidatepage() length argument
ceph: use ->invalidatepage() length argument
gfs2: use ->invalidatepage() length argument
reiserfs: use ->invalidatepage() length argument
mm: teach truncate_inode_pages_range() to handle non page aligned ranges
Revert "ext4: remove no longer used functions in inode.c"
ext4: Call ext4_jbd2_file_inode() after zeroing block
Revert "ext4: fix fsx truncate failure"
ext4: truncate_inode_pages() in orphan cleanup path
ext4: use ext4_zero_partial_blocks in punch_hole
ext4: remove unused discard_partial_page_buffers
ext4: remove unused code from ext4_remove_blocks()
ext4: update ext4_ext_remove_space trace point
ext4: make punch hole code path work with bigalloc
ext4: only zero partial blocks in ext4_zero_partial_blocks()
Maarten ter Huurne (1):
ext4: fix corruption when online resizing a fs with 1K block size
Paul Gortmaker (6):
jbd2: relocate assert after state lock in journal_commit_transaction()
jbd2: drop checkpoint mutex when waiting in __jbd2_log_wait_for_space()
jbd2: fix duplicate debug label for phase 2
jbd/jbd2: relocate bit_spinlock header to jbd_common
jbd2: use a single printk for jbd_debug()
jbd2: remove debug dependency on debug_fs and update Kconfig help text
Paul Taysom (1):
ext4: suppress ext4 orphan messages on mount
Theodore Ts'o (13):
ext4: add check to io_submit_init_bio
ext4: verify group number in verify_group_input() before using it
ext4: add sanity check to ext4_get_group_info()
ext4: optimize test_root()
ext4: use ext4_da_writepages() for all modes
ext4: add cond_resched() to ext4_free_blocks() & ext4_mb_regular_allocator()
ext4: don't use EXT4_FREE_BLOCKS_FORGET unnecessarily
jbd2: move superblock checksum calculation to jbd2_write_superblock()
ext4: check error return from ext4_write_inline_data_end()
jbd2: fix theoretical race in jbd2__journal_restart
ext4: fix up error handling for mpage_map_and_submit_extent()
ext4: translate flag bits to strings in tracepoints
jbd2: invalidate handle if jbd2_journal_restart() fails
Zheng Liu (2):
jbd2: use kmem_cache_zalloc for allocating journal head
ext4: improve extent cache shrink mechanism to avoid to burn CPU time
boxi liu (1):
ext4: improve free space calculation for inline_data
jon ernst (1):
ext4: delete unnecessary C statements
Documentation/filesystems/Locking | 6 +-
Documentation/filesystems/vfs.txt | 20 +-
fs/9p/vfs_addr.c | 5 +-
fs/afs/file.c | 10 +-
fs/btrfs/disk-io.c | 3 +-
fs/btrfs/extent_io.c | 2 +-
fs/btrfs/inode.c | 3 +-
fs/buffer.c | 21 +-
fs/ceph/addr.c | 15 +-
fs/cifs/file.c | 5 +-
fs/exofs/inode.c | 6 +-
fs/ext3/inode.c | 9 +-
fs/ext3/namei.c | 7 +-
fs/ext4/balloc.c | 14 +-
fs/ext4/ext4.h | 187 ++++---
fs/ext4/ext4_jbd2.c | 58 ++-
fs/ext4/ext4_jbd2.h | 29 +-
fs/ext4/extents.c | 193 +++++---
fs/ext4/extents_status.c | 75 ++-
fs/ext4/extents_status.h | 5 +-
fs/ext4/file.c | 14 +-
fs/ext4/fsync.c | 52 +-
fs/ext4/ialloc.c | 3 +-
fs/ext4/indirect.c | 40 +-
fs/ext4/inline.c | 4 +-
fs/ext4/inode.c | 1775 ++++++++++++++++++++++++++++++------------------------------------
fs/ext4/mballoc.c | 21 +-
fs/ext4/move_extent.c | 3 -
fs/ext4/namei.c | 7 +-
fs/ext4/page-io.c | 325 ++++++------
fs/ext4/resize.c | 24 +-
fs/ext4/super.c | 155 ++++--
fs/f2fs/data.c | 3 +-
fs/f2fs/node.c | 3 +-
fs/gfs2/aops.c | 17 +-
fs/jbd/transaction.c | 19 +-
fs/jbd2/Kconfig | 6 +-
fs/jbd2/checkpoint.c | 22 +-
fs/jbd2/commit.c | 184 +++----
fs/jbd2/journal.c | 166 ++++---
fs/jbd2/recovery.c | 11 +-
fs/jbd2/revoke.c | 49 +-
fs/jbd2/transaction.c | 526 ++++++++++++--------
fs/jfs/jfs_metapage.c | 5 +-
fs/logfs/file.c | 3 +-
fs/logfs/segment.c | 3 +-
fs/nfs/file.c | 8 +-
fs/ntfs/aops.c | 2 +-
fs/ocfs2/aops.c | 5 +-
fs/reiserfs/inode.c | 12 +-
fs/ubifs/file.c | 5 +-
fs/xfs/xfs_aops.c | 14 +-
fs/xfs/xfs_trace.h | 15 +-
include/linux/buffer_head.h | 3 +-
include/linux/fs.h | 2 +-
include/linux/jbd.h | 28 +-
include/linux/jbd2.h | 175 ++++---
include/linux/jbd_common.h | 26 +-
include/linux/mm.h | 3 +-
include/trace/events/ext3.h | 12 +-
include/trace/events/ext4.h | 304 ++++++++----
mm/readahead.c | 2 +-
mm/truncate.c | 117 +++--
63 files changed, 2664 insertions(+), 2182 deletions(-)
Hmm I'm getting this compiler warning:
fs/ext4/inode.c: In function ‘ext4_writepages’:
fs/ext4/inode.c:2219:6: warning: ‘err’ may be used uninitialized in
this function [-Wmaybe-uninitialized]
and I think the compiler is right to warn. The 'err' variable is set
inside a while() and an if() statement, and it is not at all obvious
that those codepaths are always taken.
Maybe that "map->m_len" is always guaranteed to be nonzero, and the
"while()" statement could be a "do { } while()" one. But if so, make
it so, don't write code as if it might never be executed, when the
return value seems to *depend* on it being executed.
Or just initialize the variable correctly.
This warning may not be new to this pull, I just happened to notice it now.
Linus
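The shape of code behind this kind of warning can be reproduced outside the kernel. What follows is a minimal user-space sketch, not the code from fs/ext4/inode.c: the names submit_chunk() and submit_all_while() and the failing chunk number are hypothetical and exist only for illustration. It shows the "just initialize the variable" option, which makes the zero-iteration case well defined.

```c
#include <stddef.h>

/* Hypothetical stand-in for the per-iteration work: 0 on success,
 * a negative value on failure. Chunk 13 is made to fail on purpose. */
int submit_chunk(size_t chunk)
{
	return chunk == 13 ? -5 : 0;
}

/*
 * If map_len is 0 the loop body never runs. Without the "= 0" below,
 * 'err' would then be returned without ever having been assigned,
 * which is the situation -Wmaybe-uninitialized complains about.
 */
int submit_all_while(size_t start, size_t map_len)
{
	int err = 0;

	while (map_len) {
		err = submit_chunk(start);
		if (err < 0)
			break;
		start++;
		map_len--;
	}
	return err;
}
```

Whether initializing or restructuring the loop is the better fix depends on whether a zero-length call is actually possible, which is the question raised above.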
On Tue 02-07-13 10:18:32, Linus Torvalds wrote:
> Hmm I'm getting this compiler warning:
>
> fs/ext4/inode.c: In function ‘ext4_writepages’:
> fs/ext4/inode.c:2219:6: warning: ‘err’ may be used uninitialized in
> this function [-Wmaybe-uninitialized]
>
> and I think the compiler is right to warn. The 'err' variable is set
> inside a while() and an if() statement, and it is not at all obvious
> that those codepaths are always taken.
>
> Maybe that "map->m_len" is always guaranteed to be nonzero, and the
> "while()" statement could be a "do { } while()" one. But if so, make
> it so, don't write code as if it might never be executed, when the
> return value seems to *depend* on it being executed.
That's caused by my patches (only for certain gcc versions). map->m_len
is guaranteed to be > 0 in the first iteration (the function is called from
under if (map->m_len > 0)). I thought Ted silenced that warning but
apparently he did not. The cleanest fix is likely to make a do-while loop
from that one. I'll send Ted a patch for that.
Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR
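The do-while form suggested here can be sketched in user space as follows. This is an illustration under stated assumptions, not the patch that was sent: write_one_block() and write_extent() are hypothetical names, and the assert() stands in for the caller-side guarantee that the length is non-zero on entry.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-block operation: block 7 is made to fail. */
int write_one_block(size_t blk)
{
	return blk == 7 ? -5 : 0;
}

/*
 * The caller guarantees len > 0, so the body always runs at least once
 * and 'err' is assigned on every path that reaches the final return.
 * Writing the loop as do-while makes that guarantee visible to both
 * the reader and the compiler.
 */
int write_extent(size_t first, size_t len)
{
	int err;

	assert(len > 0);	/* the contract, stated explicitly */
	do {
		err = write_one_block(first++);
		if (err < 0)
			return err;
	} while (--len);
	return err;
}
```

The design point is that the loop construct now matches the precondition: code that must run at least once is written so that it cannot be skipped.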
On Tue, 2 Jul 2013 20:04:47 +0200 Jan Kara <[email protected]> wrote:
>
> On Tue 02-07-13 10:18:32, Linus Torvalds wrote:
> > Hmm I'm getting this compiler warning:
> >
> > fs/ext4/inode.c: In function ‘ext4_writepages’:
> > fs/ext4/inode.c:2219:6: warning: ‘err’ may be used uninitialized in
> > this function [-Wmaybe-uninitialized]
> >
> > and I think the compiler is right to warn. The 'err' variable is set
> > inside a while() and an if() statement, and it is not at all obvious
> > that those codepaths are always taken.
> >
> > Maybe that "map->m_len" is always guaranteed to be nonzero, and the
> > "while()" statement could be a "do { } while()" one. But if so, make
> > it so, don't write code as if it might never be executed, when the
> > return value seems to *depend* on it being executed.
> That's caused by my patches (only for certain gcc versions). map->m_len
> is guaranteed to be > 0 in the first iteration (the function is called from
> under if (map->m_len > 0)). I thought Ted silenced that warning but
> apparently he did not. The cleanest fix is likely to make a do-while loop
> from that one. I'll send Ted a patch for that.
I did report that warning about 4 weeks ago ... and provided a fix that
was way over the top (but pointed out another problem that was fixed).
--
Cheers,
Stephen Rothwell [email protected]
Hi Linus,
On Mon, 01 Jul 2013 09:55:55 -0400 Theodore Ts'o <[email protected]> wrote:
>
>
> The following changes since commit c7788792a5e7b0d5d7f96d0766b4cb6112d47d75:
>
> Linux 3.10-rc2 (2013-05-20 14:37:38 -0700)
>
> are available in the git repository at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git tags/ext4_for_linus
The merge of the ext4 tree and the staging tree needs the following fixup
(at least). It is not a biggie, as that code in staging is actually
disabled at the moment.
From: Stephen Rothwell <[email protected]>
Date: Tue, 4 Jun 2013 14:41:00 +1000
Subject: [PATCH] staging/lustre: fix for invalidatepage() API change
Signed-off-by: Stephen Rothwell <[email protected]>
---
drivers/staging/lustre/lustre/include/linux/lustre_patchless_compat.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/staging/lustre/lustre/include/linux/lustre_patchless_compat.h b/drivers/staging/lustre/lustre/include/linux/lustre_patchless_compat.h
index f050808..67c4644 100644
--- a/drivers/staging/lustre/lustre/include/linux/lustre_patchless_compat.h
+++ b/drivers/staging/lustre/lustre/include/linux/lustre_patchless_compat.h
@@ -53,7 +53,7 @@ truncate_complete_page(struct address_space *mapping, struct page *page)
return;
if (PagePrivate(page))
- page->mapping->a_ops->invalidatepage(page, 0);
+ page->mapping->a_ops->invalidatepage(page, 0, PAGE_CACHE_SIZE);
cancel_dirty_page(page, PAGE_SIZE);
ClearPageMappedToDisk(page);
diff --git a/drivers/staging/lustre/lustre/llite/rw26.c b/drivers/staging/lustre/lustre/llite/rw26.c
index 27e4e64..f1a1c5f 100644
--- a/drivers/staging/lustre/lustre/llite/rw26.c
+++ b/drivers/staging/lustre/lustre/llite/rw26.c
@@ -72,7 +72,8 @@
* aligned truncate). Lustre leaves partially truncated page in the cache,
* relying on struct inode::i_size to limit further accesses.
*/
-static void ll_invalidatepage(struct page *vmpage, unsigned long offset)
+static void ll_invalidatepage(struct page *vmpage, unsigned int offset,
+ unsigned int length)
{
struct inode *inode;
struct lu_env *env;
@@ -89,7 +90,7 @@ static void ll_invalidatepage(struct page *vmpage, unsigned long offset)
* below because they are run with page locked and all our io is
* happening with locked page too
*/
- if (offset == 0) {
+ if (offset == 0 && length == PAGE_CACHE_SIZE) {
env = cl_env_get(&refcheck);
if (!IS_ERR(env)) {
inode = vmpage->mapping->host;
--
Cheers,
Stephen Rothwell [email protected]
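The reason the patch above changes the "offset == 0" test can be shown with a small user-space sketch. Once the callback takes a length, an offset of zero no longer implies that the whole page is being invalidated. The names whole_page_old(), whole_page_new() and SKETCH_PAGE_SIZE are hypothetical and the 4096-byte page size is an assumption; none of this is kernel code.

```c
#include <stdbool.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size for this sketch */

/* Old convention: only an offset was passed, and everything from that
 * offset to the end of the page was invalidated, so 0 meant "all". */
bool whole_page_old(unsigned long offset)
{
	return offset == 0;
}

/* New convention: a range (offset, length) is passed, so the whole
 * page is covered only when the range starts at 0 and spans it all. */
bool whole_page_new(unsigned int offset, unsigned int length)
{
	return offset == 0 && length == SKETCH_PAGE_SIZE;
}
```

With the range form, a call such as (0, 512) describes a partial invalidation at the start of the page, which the old single-argument check would have mistaken for a full one.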
On Tue, Jul 2, 2013 at 4:45 PM, Stephen Rothwell <[email protected]> wrote:
>
> The merge of the ext4 tree and the staging tree needs the following fixup
> (at least). It is not a biggie, as that code in staging is actually
> disabled at the moment.
Argh. I merged both of these before this email came in, so now my
merge doesn't have this.
That said, as you mention, lustre is currently disabled due to bigger
build issues, so I guess it's not a big deal.
Who is going to be the main Lustre maintainer? I'm assuming (based on
the commits I see) it's Andreas Dilger?
I'm really not convinced this whole Lustre thing was correctly
handled. Merging it into stable and yet being in such bad shape that
it isn't enabled even there? I just dunno. But I have the turd in my
tree now, let's hope it gets fixed up.
Linus
On Tue, Jul 02, 2013 at 05:02:21PM -0700, Linus Torvalds wrote:
> On Tue, Jul 2, 2013 at 4:45 PM, Stephen Rothwell <[email protected]> wrote:
> >
> > The merge of the ext4 tree and the staging tree needs the following fixup
> > (at least). It is not a biggie, as that code in staging is actually
> > disabled at the moment.
>
> Argh. I merged both of these before this email came in, so now my
> merge doesn't have this.
>
> That said, as you mention, lustre is currently disabled due to bigger
> build issues, so I guess it's not a big deal.
>
> Who is going to be the main Lustre maintainer? I'm assuming (based on
> the commits I see) it's Andreas Dilger?
>
> I'm really not convinced this whole Lustre thing was correctly
> handled. Merging it into stable and yet being in such bad shape that
> it isn't enabled even there? I just dunno. But I have the turd in my
> tree now, let's hope it gets fixed up.
It's in "staging", not "stable" :)
And yeah, I thought it built properly, as it was running on lots of
systems for years, but it turns out, no one ever tested non-x86 builds
of the thing, and linux-next just choked all over it, which is why it is
disabled from the build for 3.11. We will get it all building properly
for 3.12, so no rush on these types of ext4 patches, Andreas and the
other Lustre developers will get it resolved.
thanks,
greg k-h
On Tue, Jul 2, 2013 at 5:54 PM, Greg KH <[email protected]> wrote:
> On Tue, Jul 02, 2013 at 05:02:21PM -0700, Linus Torvalds wrote:
>>
>> I'm really not convinced this whole Lustre thing was correctly
>> handled. Merging it into stable and yet being in such bad shape that
>> it isn't enabled even there? I just dunno. But I have the turd in my
>> tree now, let's hope it gets fixed up.
>
> It's in "staging", not "stable" :)
Yes. But what was the reason to actually merge it even there? And once
it gets merged, disabling it again rather than fixing the problems it
has?
This is a filesystem that Intel apparently wants to push. I think it
would have been a better idea to push back a bit and say "at least
clean it up a bit first". It's not like Intel is one of the clueless
companies that couldn't have done so and need help from the community.
As it is, it got merged, and apparently things are going *backwards*
rather than forwards because it doesn't even show up in the
allmodconfig builds I do.
Linus
On Tue, Jul 02, 2013 at 05:58:15PM -0700, Linus Torvalds wrote:
> On Tue, Jul 2, 2013 at 5:54 PM, Greg KH <[email protected]> wrote:
> > On Tue, Jul 02, 2013 at 05:02:21PM -0700, Linus Torvalds wrote:
> >>
> >> I'm really not convinced this whole Lustre thing was correctly
> >> handled. Merging it into stable and yet being in such bad shape that
> >> it isn't enabled even there? I just dunno. But I have the turd in my
> >> tree now, let's hope it gets fixed up.
> >
> > It's in "staging", not "stable" :)
>
> Yes. But what was the reason to actually merge it even there? And once
> it gets merged, disabling it again rather than fixing the problems it
> has?
The problems turned out to be too big, too late in the merge cycle for
me to be able to take them (they still aren't even done, as I don't have
a working set of patches yet.) So I just disabled it from the build to
give Andreas and team time to get it working properly.
I could have just removed it, but I thought I would give them a chance.
> This is a filesystem that Intel apparently wants to push. I think it
> would have been a better idea to push back a bit and say "at least
> clean it up a bit first". It's not like Intel is one of the clueless
> companies that couldn't have done so and need help from the community.
For this filesystem, it seems that they don't have any resources to do
this work and are relying on the community to help out. Which is odd,
but big companies are strange some times...
thanks,
greg k-h
On Tue, Jul 02, 2013 at 06:01:11PM -0700, Greg KH wrote:
> On Tue, Jul 02, 2013 at 05:58:15PM -0700, Linus Torvalds wrote:
> > On Tue, Jul 2, 2013 at 5:54 PM, Greg KH <[email protected]> wrote:
> > > On Tue, Jul 02, 2013 at 05:02:21PM -0700, Linus Torvalds wrote:
> > >>
> > >> I'm really not convinced this whole Lustre thing was correctly
> > >> handled. Merging it into stable and yet being in such bad shape that
> > >> it isn't enabled even there? I just dunno. But I have the turd in my
> > >> tree now, let's hope it gets fixed up.
> > >
> > > It's in "staging", not "stable" :)
> >
> > Yes. But what was the reason to actually merge it even there? And once
> > it gets merged, disabling it again rather than fixing the problems it
> > has?
>
> The problems turned out to be too big, too late in the merge cycle for
> me to be able to take them (they still aren't even done, as I don't have
> a working set of patches yet.) So I just disabled it from the build to
> give Andreas and team time to get it working properly.
>
> I could have just removed it, but I thought I would give them a chance.
>
> > This is a filesystem that Intel apparently wants to push. I think it
> > would have been a better idea to push back a bit and say "at least
> > clean it up a bit first". It's not like Intel is one of the clueless
> > companies that couldn't have done so and need help from the community.
>
> For this filesystem, it seems that they don't have any resources to do
> this work and are relying on the community to help out. Which is odd,
> but big companies are strange some times...
Didn't we learn this lesson already with POHMELFS? i.e. that dumping
filesystem code in staging on the assumption "the community" will
fix it up when nobody in "the community" uses or can even test that
filesystem is a broken development model....
Cheers,
Dave.
--
Dave Chinner
[email protected]
On Tue, Jul 02, 2013 at 08:04:47PM +0200, Jan Kara wrote:
> That's caused by my patches (only for certain gcc versions). map->m_len
> is guaranteed to be > 0 in the first iteration (the function is called from
> under if (map->m_len > 0)). I thought Ted silenced that warning but
> apparently he did not. The cleanest fix is likely to make a do-while loop
> from that one. I'll send Ted a patch for that.
Sorry, I fixed it with a code simplification (I thought the while loop
wasn't needed at all), but when you explained why in fact we still
needed it, I dropped my commit, and forgot that this also dropped the
fix which silenced the warning. I'll grab your patch and include it
with any other fixes that we might need for this merge window.
Linus, are you in a hurry to get this fixed up? It's a false warning
on gcc's part, so it's not actually causing any actual problems. If
not, I can wait a week or so to see if there are any other bug
fixes that are required and push the fix to you towards the end of the
merge window.
- Ted
On Wed, Jul 3, 2013 at 4:35 AM, Theodore Ts'o <[email protected]> wrote:
>
> Linus, are you in a hurry to get this fixed up? I
No, I just hate seeing warnings in my (simplified) standard build. I'm
used to them in the "allmodconfig" builds I do, but prefer to not see
them when I just build a localized kernel.
But it's not critical. Just as long as I know it will get fixed, I'm happy.
Linus
On Wed, Jul 03, 2013 at 01:29:41PM +1000, Dave Chinner wrote:
> On Tue, Jul 02, 2013 at 06:01:11PM -0700, Greg KH wrote:
> > On Tue, Jul 02, 2013 at 05:58:15PM -0700, Linus Torvalds wrote:
> > > On Tue, Jul 2, 2013 at 5:54 PM, Greg KH <[email protected]> wrote:
> > > > On Tue, Jul 02, 2013 at 05:02:21PM -0700, Linus Torvalds wrote:
> > > >>
> > > >> I'm really not convinced this whole Lustre thing was correctly
> > > >> handled. Merging it into stable and yet being in such bad shape that
> > > >> it isn't enabled even there? I just dunno. But I have the turd in my
> > > >> tree now, let's hope it gets fixed up.
> > > >
> > > > It's in "staging", not "stable" :)
> > >
> > > Yes. But what was the reason to actually merge it even there? And once
> > > it gets merged, disabling it again rather than fixing the problems it
> > > has?
> >
> > The problems turned out to be too big, too late in the merge cycle for
> > me to be able to take them (they still aren't even done, as I don't have
> > a working set of patches yet.) So I just disabled it from the build to
> > give Andreas and team time to get it working properly.
> >
> > I could have just removed it, but I thought I would give them a chance.
> >
> > > This is a filesystem that Intel apparently wants to push. I think it
> > > would have been a better idea to push back a bit and say "at least
> > > clean it up a bit first". It's not like Intel is one of the clueless
> > > companies that couldn't have done so and need help from the community.
> >
> > For this filesystem, it seems that they don't have any resources to do
> > this work and are relying on the community to help out. Which is odd,
> > but big companies are strange some times...
>
> Didn't we learn this lesson already with POHMELFS? i.e. that dumping
> filesystem code in staging on the assumption "the community" will
> fix it up when nobody in "the community" uses or can even test that
> filesystem is a broken development model....
They (Intel) have said that they will continue to clean up this code in
the tree, until it is in good enough shape to be merged into fs/
properly. If they ever stop helping out, I will end up dropping it from
the tree, just like I did for pohmelfs, so don't worry about it
lingering around abandoned.
thanks,
greg k-h
On 2013/03/07 12:12 PM, "Greg KH" <[email protected]> wrote:
>On Wed, Jul 03, 2013 at 01:29:41PM +1000, Dave Chinner wrote:
>> On Tue, Jul 02, 2013 at 06:01:11PM -0700, Greg KH wrote:
>> > On Tue, Jul 02, 2013 at 05:58:15PM -0700, Linus Torvalds wrote:
>> > > On Tue, Jul 2, 2013 at 5:54 PM, Greg KH <[email protected]> wrote:
>> > > > On Tue, Jul 02, 2013 at 05:02:21PM -0700, Linus Torvalds wrote:
>> > > >>
>> > > >> I'm really not convinced this whole Lustre thing was correctly
>> > > >> handled. Merging it into stable and yet being in such bad shape that
>> > > >> it isn't enabled even there? I just dunno. But I have the turd in my
>> > > >> tree now, let's hope it gets fixed up.
>> > > >
>> > > > It's in "staging", not "stable" :)
>> > >
>> > > Yes. But what was the reason to actually merge it even there? And once
>> > > it gets merged, disabling it again rather than fixing the problems it
>> > > has?
>> >
>> > The problems turned out to be too big, too late in the merge cycle for
>> > me to be able to take them (they still aren't even done, as I don't have
>> > a working set of patches yet.) So I just disabled it from the build to
>> > give Andreas and team time to get it working properly.
In our defence, the code has been working fine for years, but only on vendor
kernels, so we are playing catch-up to the mainline kernel, and hit a bunch of
snags when merging into -next.
Also, all of the configure checks have been removed from the version submitted
to the kernel, so this caused some breakage on platforms that Lustre actually
runs on regularly (e.g. PPC). On the flip side, nobody ever uses Lustre on S390
or 32-bit clients, so it is no surprise that there were problems there.
>> > I could have just removed it, but I thought I would give them a chance.
Thanks. The code is just too big to get it ready for inclusion in one piece,
and the only way that we can make it acceptable for mainline kernel inclusion
is through -staging and incrementally cleaning it up.
>> > > This is a filesystem that Intel apparently wants to push. I think it
>> > > would have been a better idea to push back a bit and say "at least
>> > > clean it up a bit first". It's not like Intel is one of the clueless
>> > > companies that couldn't have done so and need help from the community.
Well, it's been around for 10 years, and is pretty much the standard filesystem
in HPC. While we are part of Intel now, there is still only a limited number of
people working on it, and we don't have free rein to focus on getting it into
the kernel. We still have customers to support and bugs to fix and features to
develop for the next huge systems (1B cores writing 300TB/s to 1EB fs in 2018).
At the same time, there is enough demand in the workgroup/department/university
scale that it makes sense to try and get it into mainline.
It isn't that we didn't want to get it into the kernel previously, but -staging
didn't always exist and we don't have enough resources at one time to rewrite
all of the code. Thanks to Peng Tao and EMC this is finally happening. This
isn't "volunteer community" effort, there are dedicated resources working on it.
>> > For this filesystem, it seems that they don't have any resources to do
>> > this work and are relying on the community to help out. Which is odd,
>> > but big companies are strange some times...
>>
>> Didn't we learn this lesson already with POHMELFS? i.e. that dumping
>> filesystem code in staging on the assumption "the community" will
>> fix it up when nobody in "the community" uses or can even test that
>> filesystem is a broken development model....
>
>They (Intel) have said that they will continue to clean up this code in
>the tree, until it is in good enough shape to be merged into fs/
>properly. If they ever stop helping out, I will end up dropping it from
>the tree, just like I did for pohmelfs, so don't worry about it
>lingering around abandoned.
Right, we are going to continue working on cleaning the code at a steady pace
until it is ready to move to fs/. I don't expect Al or Dave or Christoph to
spend their time (or make their eyes bleed) with the current state of the code.
It has already undergone some significant cleanup, but needs a bunch more still.
To be honest, I expect it will be in -staging for a year or so, but that is
fine with me since we've been working on it for 10+ years already and we only
have so much capacity for changing/testing the code for the kernel while keeping
all of the existing sites in working condition.
Cheers, Andreas
--
Andreas Dilger
Lustre Software Architect
Intel High Performance Data Division
On Wed, Jul 03, 2013 at 06:40:40PM +0000, Dilger, Andreas wrote:
> On 2013/03/07 12:12 PM, "Greg KH" <[email protected]> wrote:
> >On Wed, Jul 03, 2013 at 01:29:41PM +1000, Dave Chinner wrote:
> >> On Tue, Jul 02, 2013 at 06:01:11PM -0700, Greg KH wrote:
> >> > For this filesystem, it seems that they don't have any
> >> > resources to do this work and are relying on the community to
> >> > help out. Which is odd, but big companies are strange some
> >> > times...
> >>
> >> Didn't we learn this lesson already with POHMELFS? i.e. that
> >> dumping filesystem code in staging on the assumption "the
> >> community" will fix it up when nobody in "the community" uses
> >> or can even test that filesystem is a broken development
> >> model....
> >
> >They (Intel) have said that they will continue to clean up this
> >code in the tree, until it is in good enough shape to be merged
> >into fs/ properly. If they ever stop helping out, I will end up
> >dropping it from the tree, just like I did for pohmelfs, so don't
> >worry about it lingering around abandoned.
>
> Right, we are going to continue working on cleaning the code at a
> steady pace until it is ready to move to fs/. I don't expect Al
> or Dave or Christoph to spend their time (or make their eyes
> bleed) with the current state of the code. It has already
> undergone some significant cleanup, but needs a bunch more still.
The issue I'm more concerned about is more to do with what happens
when we do something that affects all filesystems, such as API
changes. Then we are forced to look at it and to try to work out
what the hell is going on. This was the real problem with pohmelfs
being in staging - it had an unreasonably high maintenance overhead
compared to other filesystems.
This was mainly because pohmelfs was effectively dumped in staging
and then left unmaintained - if there's consistent effort put into
the Lustre code to clean it up, maintain it and help with API
transitions, then I have nothing to object to and I'll just crawl
back into my box. ;)
Cheers,
Dave.
--
Dave Chinner
[email protected]
On Thu, Jul 04, 2013 at 09:54:44AM +1000, Dave Chinner wrote:
> On Wed, Jul 03, 2013 at 06:40:40PM +0000, Dilger, Andreas wrote:
> > On 2013/03/07 12:12 PM, "Greg KH" <[email protected]> wrote:
> > >On Wed, Jul 03, 2013 at 01:29:41PM +1000, Dave Chinner wrote:
> > >> On Tue, Jul 02, 2013 at 06:01:11PM -0700, Greg KH wrote:
> > >> > For this filesystem, it seems that they don't have any
> > >> > resources to do this work and are relying on the community to
> > >> > help out. Which is odd, but big companies are strange some
> > >> > times...
> > >>
> > >> Didn't we learn this lesson already with POHMELFS? i.e. that
> > >> dumping filesystem code in staging on the assumption "the
> > >> community" will fix it up when nobody in "the community" uses
> > >> or can even test that filesystem is a broken development
> > >> model....
> > >
> > >They (Intel) have said that they will continue to clean up this
> > >code in the tree, until it is in good enough shape to be merged
> > >into fs/ properly. If they ever stop helping out, I will end up
> > >dropping it from the tree, just like I did for pohmelfs, so don't
> > >worry about it lingering around abandoned.
> >
> > Right, we are going to continue working on cleaning the code at a
> > steady pace until it is ready to move to fs/. I don't expect Al
> > or Dave or Christoph to spend their time (or make their eyes
> > bleed) with the current state of the code. It has already
> > undergone some significant cleanup, but needs a bunch more still.
>
> The issue I'm more concerned about is more to do with what happens
> when we do something that affects all filesystems, such as API
> changes. Then we are forced to look at it and to try to work out
> what the hell is going on. This was the real problem with pohmelfs
> being in staging - it had an unreasonably high maintenance overhead
> compared to other filesystems.
>
> This was mainly because pohmelfs was effectively dumped in staging
> and then left unmaintained - if there's consistent effort put into
> the Lustre code to clean it up, maintain it and help with API
> transitions, then I have nothing to object to and I'll just crawl
> back into my box. ;)
Again, as with all code in the drivers/staging/ area, if you change any
kernel apis, you are not responsible for fixing up the staging drivers,
that's my job. A heads-up is always nice, but again, not required. You
should be able to just safely ignore them if you don't want to ever look
at them.
thanks,
greg k-h
Digging up an email thread from 2013...
On Wed, Jul 03, 2013 at 01:29:41PM +1000, Dave Chinner wrote:
> On Tue, Jul 02, 2013 at 06:01:11PM -0700, Greg KH wrote:
> > On Tue, Jul 02, 2013 at 05:58:15PM -0700, Linus Torvalds wrote:
> > > On Tue, Jul 2, 2013 at 5:54 PM, Greg KH <[email protected]> wrote:
> > > > On Tue, Jul 02, 2013 at 05:02:21PM -0700, Linus Torvalds wrote:
> > > >>
> > > >> I'm really not convinced this whole Lustre thing was correctly
> > > >> handled. Merging it into stable and yet being in such bad shape that
> > > >> it isn't enabled even there? I just dunno. But I have the turd in my
> > > >> tree now, let's hope it gets fixed up.
> > > >
> > > > It's in "staging", not "stable" :)
> > >
> > > Yes. But what was the reason to actually merge it even there? And once
> > > it gets merged, disabling it again rather than fixing the problems it
> > > has?
> >
> > The problems turned out to be too big, too late in the merge cycle for
> > me to be able to take them (they still aren't even done, as I don't have
> > a working set of patches yet.) So I just disabled it from the build to
> > give Andreas and team time to get it working properly.
> >
> > I could have just removed it, but I thought I would give them a chance.
> >
> > > This is a filesystem that Intel apparently wants to push. I think it
> > > would have been a better idea to push back a bit and say "at least
> > > clean it up a bit first". It's not like Intel is one of the clueless
> > > companies that couldn't have done so and need help from the community.
> >
> > For this filesystem, it seems that they don't have any resources to do
> > this work and are relying on the community to help out. Which is odd,
> > but big companies are strange some times...
>
> Didn't we learn this lesson already with POHMELFS? i.e. that dumping
> filesystem code in staging on the assumption "the community" will
> fix it up when nobody in "the community" uses or can even test that
> filesystem is a broken development model....
Dave, and Linus, you were totally right here. Sorry for not listening
to you before, my fault. The lustre developers never got their act
together and probably by this being in staging, it only prolonged the
agony of everyone involved.
greg k-h