2010-03-02 18:18:50

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH 00/28] Ext4 patch queue for the 2.6.34 merge window.

This is what I have queued up to send to Linus for the 2.6.34 merge
window. It's in the ext4 patch queue and in the ext4 tree (everything
between origin and next). I'm currently doing final testing and plan to
ship it to Linus in the next day or two.

If you think something is missing or should be dropped for this list, or
see something wrong, please let me know ASAP. Thanks!!

- Ted

Curt Wohlgemuth (1):
ext4: Fix BUG_ON at fs/buffer.c:652 in no journal mode

Dmitry Monakhov (7):
ext4: mount flags manipulation cleanup
ext4: trivial quota cleanup
ext4: deprecate obsoleted mount options
ext4: fix error handling in migrate
ext4: explicitly remove inode from orphan list after failed direct io
ext4: Handle non empty on-disk orphan link
ext4: Fix ext4_quota_write cross block boundary behaviour

Eric Sandeen (3):
ext4: fix async i/o writes beyond 4GB to a sparse file
ext4: Fix optional-arg mount options
ext4: move __func__ into a macro for ext4_warning, ext4_error

Frank Mayhar (1):
ext4: Convert BUG_ON checks to use ext4_error() instead

Jiaying Zhang (1):
ext4: Add flag to files with blocks intentionally past EOF

Leonard Michlmayr (1):
ext4: correctly calculate number of blocks for fiemap

Roel Kluin (1):
ext4: add missing error checking to ext4_expand_extra_isize_ea()

Tao Ma (1):
ext4: Fix fencepost error in chosing choosing group vs file
preallocation.

Theodore Ts'o (9):
ext4: Add block validity check when truncating indirect block mapped
inodes
ext4: Add new tracepoint for jbd2_cleanup_journal_tail
ext4: Add new tracepoints to debug delayed allocation space functions
ext4: Use slab allocator for sub-page sized allocations
ext4: Use bitops to read/modify EXT4_I(inode)->i_state
ext4: Reserve INCOMPAT_EA_INODE and INCOMPAT_DIRDATA feature
codepoints
ext4: mechanical change on dio get_block code in prepare for it to be
used by buffer write
ext4: use ext4_get_block_write in buffer write
ext4: Use direct_IO_no_locking in ext4 dio read.

Toshiyuki Okajima (1):
ext4: make "offset" consistent in ext4_check_dir_entry()

dingdinghua (2):
jbd2: delay discarding buffers in journal_unmap_buffer
jbd2: clean up an assertion in jbd2_journal_commit_transaction()

fs/ext4/balloc.c | 29 +--
fs/ext4/dir.c | 12 +-
fs/ext4/ext4.h | 103 ++++++++---
fs/ext4/ext4_jbd2.c | 4 +-
fs/ext4/ext4_jbd2.h | 24 +++
fs/ext4/extents.c | 250 ++++++++++++++++++++-------
fs/ext4/file.c | 4 +-
fs/ext4/fsync.c | 2 +-
fs/ext4/ialloc.c | 32 ++--
fs/ext4/inode.c | 407 +++++++++++++++++++++++++++++--------------
fs/ext4/ioctl.c | 9 +
fs/ext4/mballoc.c | 34 ++--
fs/ext4/migrate.c | 35 ++--
fs/ext4/move_extent.c | 12 +-
fs/ext4/namei.c | 63 ++++----
fs/ext4/resize.c | 102 ++++-------
fs/ext4/super.c | 336 ++++++++++++++++++++++-------------
fs/ext4/xattr.c | 56 +++---
fs/jbd2/checkpoint.c | 1 +
fs/jbd2/commit.c | 13 +-
fs/jbd2/journal.c | 132 ++++++++++++++
fs/jbd2/transaction.c | 43 ++++--
include/linux/jbd2.h | 11 +-
include/trace/events/ext4.h | 101 +++++++++++
include/trace/events/jbd2.h | 28 +++
25 files changed, 1253 insertions(+), 590 deletions(-)



2010-03-02 18:18:50

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH 08/28] ext4: Reserve INCOMPAT_EA_INODE and INCOMPAT_DIRDATA feature codepoints

Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/ext4/ext4.h | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index ffe1334..61cf3b3 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -284,6 +284,7 @@ struct flex_groups {
#define EXT4_TOPDIR_FL 0x00020000 /* Top of directory hierarchies*/
#define EXT4_HUGE_FILE_FL 0x00040000 /* Set to each huge file */
#define EXT4_EXTENTS_FL 0x00080000 /* Inode uses extents */
+#define EXT4_EA_INODE_FL 0x00200000 /* Inode used for large EA */
#define EXT4_RESERVED_FL 0x80000000 /* reserved for ext4 lib */

#define EXT4_FL_USER_VISIBLE 0x000BDFFF /* User visible flags */
@@ -1144,6 +1145,8 @@ static inline void ext4_clear_inode_state(struct inode *inode, int bit)
#define EXT4_FEATURE_INCOMPAT_64BIT 0x0080
#define EXT4_FEATURE_INCOMPAT_MMP 0x0100
#define EXT4_FEATURE_INCOMPAT_FLEX_BG 0x0200
+#define EXT4_FEATURE_INCOMPAT_EA_INODE 0x0400 /* EA in inode */
+#define EXT4_FEATURE_INCOMPAT_DIRDATA 0x1000 /* data in dirent */

#define EXT4_FEATURE_COMPAT_SUPP EXT2_FEATURE_COMPAT_EXT_ATTR
#define EXT4_FEATURE_INCOMPAT_SUPP (EXT4_FEATURE_INCOMPAT_FILETYPE| \
--
1.6.6.1.1.g974db.dirty


2010-03-02 18:18:50

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH 04/28] ext4: Add new tracepoint for jbd2_cleanup_journal_tail

Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/jbd2/checkpoint.c | 1 +
include/trace/events/jbd2.h | 28 ++++++++++++++++++++++++++++
2 files changed, 29 insertions(+), 0 deletions(-)

diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
index 8868493..30beb11 100644
--- a/fs/jbd2/checkpoint.c
+++ b/fs/jbd2/checkpoint.c
@@ -507,6 +507,7 @@ int jbd2_cleanup_journal_tail(journal_t *journal)
if (blocknr < journal->j_tail)
freed = freed + journal->j_last - journal->j_first;

+ trace_jbd2_cleanup_journal_tail(journal, first_tid, blocknr, freed);
jbd_debug(1,
"Cleaning journal tail from %d to %d (offset %lu), "
"freeing %lu\n",
diff --git a/include/trace/events/jbd2.h b/include/trace/events/jbd2.h
index 96b370a..bf16545 100644
--- a/include/trace/events/jbd2.h
+++ b/include/trace/events/jbd2.h
@@ -199,6 +199,34 @@ TRACE_EVENT(jbd2_checkpoint_stats,
__entry->forced_to_close, __entry->written, __entry->dropped)
);

+TRACE_EVENT(jbd2_cleanup_journal_tail,
+
+ TP_PROTO(journal_t *journal, tid_t first_tid,
+ unsigned long block_nr, unsigned long freed),
+
+ TP_ARGS(journal, first_tid, block_nr, freed),
+
+ TP_STRUCT__entry(
+ __field( dev_t, dev )
+ __field( tid_t, tail_sequence )
+ __field( tid_t, first_tid )
+ __field(unsigned long, block_nr )
+ __field(unsigned long, freed )
+ ),
+
+ TP_fast_assign(
+ __entry->dev = journal->j_fs_dev->bd_dev;
+ __entry->tail_sequence = journal->j_tail_sequence;
+ __entry->first_tid = first_tid;
+ __entry->block_nr = block_nr;
+ __entry->freed = freed;
+ ),
+
+ TP_printk("dev %s from %u to %u offset %lu freed %lu",
+ jbd2_dev_to_name(__entry->dev), __entry->tail_sequence,
+ __entry->first_tid, __entry->block_nr, __entry->freed)
+);
+
#endif /* _TRACE_JBD2_H */

/* This part must be outside protection */
--
1.6.6.1.1.g974db.dirty


2010-03-02 18:18:50

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH 01/28] ext4: fix async i/o writes beyond 4GB to a sparse file

From: Eric Sandeen <[email protected]>

The "offset" member in ext4_io_end holds bytes, not blocks, so
ext4_lblk_t is wrong - and too small (u32).

This caused the async i/o writes to sparse files beyond 4GB to fail
when they wrapped around to 0.

Also fix up the type of arguments to ext4_convert_unwritten_extents(),
it gets ssize_t from ext4_end_aio_dio_nolock() and
ext4_ext_direct_IO().

Reported-by: Giel de Nijs <[email protected]>
Signed-off-by: Eric Sandeen <[email protected]>
---
fs/ext4/ext4.h | 6 +++---
fs/ext4/extents.c | 2 +-
fs/ext4/inode.c | 2 +-
3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 874d169..602d5ad 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -139,8 +139,8 @@ typedef struct ext4_io_end {
struct inode *inode; /* file being written to */
unsigned int flag; /* unwritten or not */
int error; /* I/O error code */
- ext4_lblk_t offset; /* offset in the file */
- size_t size; /* size of the extent */
+ loff_t offset; /* offset in the file */
+ ssize_t size; /* size of the extent */
struct work_struct work; /* data work queue */
} ext4_io_end_t;

@@ -1744,7 +1744,7 @@ extern void ext4_ext_release(struct super_block *);
extern long ext4_fallocate(struct inode *inode, int mode, loff_t offset,
loff_t len);
extern int ext4_convert_unwritten_extents(struct inode *inode, loff_t offset,
- loff_t len);
+ ssize_t len);
extern int ext4_get_blocks(handle_t *handle, struct inode *inode,
sector_t block, unsigned int max_blocks,
struct buffer_head *bh, int flags);
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 765a482..c568779 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3603,7 +3603,7 @@ retry:
* Returns 0 on success.
*/
int ext4_convert_unwritten_extents(struct inode *inode, loff_t offset,
- loff_t len)
+ ssize_t len)
{
handle_t *handle;
ext4_lblk_t block;
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index e119524..2059c34 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3551,7 +3551,7 @@ static int ext4_end_aio_dio_nolock(ext4_io_end_t *io)
{
struct inode *inode = io->inode;
loff_t offset = io->offset;
- size_t size = io->size;
+ ssize_t size = io->size;
int ret = 0;

ext4_debug("end_aio_dio_onlock: io 0x%p from inode %lu,list->next 0x%p,"
--
1.6.6.1.1.g974db.dirty


2010-03-02 18:18:54

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH 05/28] ext4: Add new tracepoints to debug delayed allocation space functions

Add tracepoints for ext4_da_reserve_space(),
ext4_da_update_reserve_space(), and ext4_da_release_space().

Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/ext4/inode.c | 2 +
include/trace/events/ext4.h | 101 +++++++++++++++++++++++++++++++++++++++++++
2 files changed, 103 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 3e8afd9..1a3d7b2 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1061,6 +1061,7 @@ void ext4_da_update_reserve_space(struct inode *inode,
int mdb_free = 0, allocated_meta_blocks = 0;

spin_lock(&ei->i_block_reservation_lock);
+ trace_ext4_da_update_reserve_space(inode, used);
if (unlikely(used > ei->i_reserved_data_blocks)) {
ext4_msg(inode->i_sb, KERN_NOTICE, "%s: ino %lu, used %d "
"with only %d reserved data blocks\n",
@@ -1846,6 +1847,7 @@ repeat:
spin_lock(&ei->i_block_reservation_lock);
md_reserved = ei->i_reserved_meta_blocks;
md_needed = ext4_calc_metadata_amount(inode, lblock);
+ trace_ext4_da_reserve_space(inode, md_needed);
spin_unlock(&ei->i_block_reservation_lock);

/*
diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
index d0b6cd3..2aa6aa3 100644
--- a/include/trace/events/ext4.h
+++ b/include/trace/events/ext4.h
@@ -874,6 +874,107 @@ TRACE_EVENT(ext4_forget,
__entry->mode, __entry->is_metadata, __entry->block)
);

+TRACE_EVENT(ext4_da_update_reserve_space,
+ TP_PROTO(struct inode *inode, int used_blocks),
+
+ TP_ARGS(inode, used_blocks),
+
+ TP_STRUCT__entry(
+ __field( dev_t, dev )
+ __field( ino_t, ino )
+ __field( umode_t, mode )
+ __field( __u64, i_blocks )
+ __field( int, used_blocks )
+ __field( int, reserved_data_blocks )
+ __field( int, reserved_meta_blocks )
+ __field( int, allocated_meta_blocks )
+ ),
+
+ TP_fast_assign(
+ __entry->dev = inode->i_sb->s_dev;
+ __entry->ino = inode->i_ino;
+ __entry->mode = inode->i_mode;
+ __entry->i_blocks = inode->i_blocks;
+ __entry->used_blocks = used_blocks;
+ __entry->reserved_data_blocks = EXT4_I(inode)->i_reserved_data_blocks;
+ __entry->reserved_meta_blocks = EXT4_I(inode)->i_reserved_meta_blocks;
+ __entry->allocated_meta_blocks = EXT4_I(inode)->i_allocated_meta_blocks;
+ ),
+
+ TP_printk("dev %s ino %lu mode 0%o i_blocks %llu used_blocks %d reserved_data_blocks %d reserved_meta_blocks %d allocated_meta_blocks %d",
+ jbd2_dev_to_name(__entry->dev), (unsigned long) __entry->ino,
+ __entry->mode, (unsigned long long) __entry->i_blocks,
+ __entry->used_blocks, __entry->reserved_data_blocks,
+ __entry->reserved_meta_blocks, __entry->allocated_meta_blocks)
+);
+
+TRACE_EVENT(ext4_da_reserve_space,
+ TP_PROTO(struct inode *inode, int md_needed),
+
+ TP_ARGS(inode, md_needed),
+
+ TP_STRUCT__entry(
+ __field( dev_t, dev )
+ __field( ino_t, ino )
+ __field( umode_t, mode )
+ __field( __u64, i_blocks )
+ __field( int, md_needed )
+ __field( int, reserved_data_blocks )
+ __field( int, reserved_meta_blocks )
+ ),
+
+ TP_fast_assign(
+ __entry->dev = inode->i_sb->s_dev;
+ __entry->ino = inode->i_ino;
+ __entry->mode = inode->i_mode;
+ __entry->i_blocks = inode->i_blocks;
+ __entry->md_needed = md_needed;
+ __entry->reserved_data_blocks = EXT4_I(inode)->i_reserved_data_blocks;
+ __entry->reserved_meta_blocks = EXT4_I(inode)->i_reserved_meta_blocks;
+ ),
+
+ TP_printk("dev %s ino %lu mode 0%o i_blocks %llu md_needed %d reserved_data_blocks %d reserved_meta_blocks %d",
+ jbd2_dev_to_name(__entry->dev), (unsigned long) __entry->ino,
+ __entry->mode, (unsigned long long) __entry->i_blocks,
+ __entry->md_needed, __entry->reserved_data_blocks,
+ __entry->reserved_meta_blocks)
+);
+
+TRACE_EVENT(ext4_da_release_space,
+ TP_PROTO(struct inode *inode, int freed_blocks),
+
+ TP_ARGS(inode, freed_blocks),
+
+ TP_STRUCT__entry(
+ __field( dev_t, dev )
+ __field( ino_t, ino )
+ __field( umode_t, mode )
+ __field( __u64, i_blocks )
+ __field( int, freed_blocks )
+ __field( int, reserved_data_blocks )
+ __field( int, reserved_meta_blocks )
+ __field( int, allocated_meta_blocks )
+ ),
+
+ TP_fast_assign(
+ __entry->dev = inode->i_sb->s_dev;
+ __entry->ino = inode->i_ino;
+ __entry->mode = inode->i_mode;
+ __entry->i_blocks = inode->i_blocks;
+ __entry->freed_blocks = freed_blocks;
+ __entry->reserved_data_blocks = EXT4_I(inode)->i_reserved_data_blocks;
+ __entry->reserved_meta_blocks = EXT4_I(inode)->i_reserved_meta_blocks;
+ __entry->allocated_meta_blocks = EXT4_I(inode)->i_allocated_meta_blocks;
+ ),
+
+ TP_printk("dev %s ino %lu mode 0%o i_blocks %llu freed_blocks %d reserved_data_blocks %d reserved_meta_blocks %d allocated_meta_blocks %d",
+ jbd2_dev_to_name(__entry->dev), (unsigned long) __entry->ino,
+ __entry->mode, (unsigned long long) __entry->i_blocks,
+ __entry->freed_blocks, __entry->reserved_data_blocks,
+ __entry->reserved_meta_blocks, __entry->allocated_meta_blocks)
+);
+
+
#endif /* _TRACE_EXT4_H */

/* This part must be outside protection */
--
1.6.6.1.1.g974db.dirty


2010-03-02 18:18:54

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH 28/28] ext4: Fix ext4_quota_write cross block boundary behaviour

From: Dmitry Monakhov <[email protected]>

We always assume what dquot update result in changes in one data block
But ext4_quota_write() function may handle cross block boundary writes
In fact if this ever happen it will result in incorrect journal
credits reservation, and later a BUG_ON. As soon this never happen
the boundary cross loop is NOOP. In order to make things straight
let's remove this loop and assert cross boundary condition.

Signed-off-by: Dmitry Monakhov <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/ext4/super.c | 69 +++++++++++++++++++++++++++----------------------------
1 files changed, 34 insertions(+), 35 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 6c8e92c..1cbd0ad 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -4004,9 +4004,7 @@ static ssize_t ext4_quota_write(struct super_block *sb, int type,
ext4_lblk_t blk = off >> EXT4_BLOCK_SIZE_BITS(sb);
int err = 0;
int offset = off & (sb->s_blocksize - 1);
- int tocopy;
int journal_quota = EXT4_SB(sb)->s_qf_names[type] != NULL;
- size_t towrite = len;
struct buffer_head *bh;
handle_t *handle = journal_current_handle();

@@ -4016,52 +4014,53 @@ static ssize_t ext4_quota_write(struct super_block *sb, int type,
(unsigned long long)off, (unsigned long long)len);
return -EIO;
}
+ /*
+ * Since we account only one data block in transaction credits,
+ * then it is impossible to cross a block boundary.
+ */
+ if (sb->s_blocksize - offset < len) {
+ ext4_msg(sb, KERN_WARNING, "Quota write (off=%llu, len=%llu)"
+ " cancelled because not block aligned",
+ (unsigned long long)off, (unsigned long long)len);
+ return -EIO;
+ }
+
mutex_lock_nested(&inode->i_mutex, I_MUTEX_QUOTA);
- while (towrite > 0) {
- tocopy = sb->s_blocksize - offset < towrite ?
- sb->s_blocksize - offset : towrite;
- bh = ext4_bread(handle, inode, blk, 1, &err);
- if (!bh)
+ bh = ext4_bread(handle, inode, blk, 1, &err);
+ if (!bh)
+ goto out;
+ if (journal_quota) {
+ err = ext4_journal_get_write_access(handle, bh);
+ if (err) {
+ brelse(bh);
goto out;
- if (journal_quota) {
- err = ext4_journal_get_write_access(handle, bh);
- if (err) {
- brelse(bh);
- goto out;
- }
}
- lock_buffer(bh);
- memcpy(bh->b_data+offset, data, tocopy);
- flush_dcache_page(bh->b_page);
- unlock_buffer(bh);
- if (journal_quota)
- err = ext4_handle_dirty_metadata(handle, NULL, bh);
- else {
- /* Always do at least ordered writes for quotas */
- err = ext4_jbd2_file_inode(handle, inode);
- mark_buffer_dirty(bh);
- }
- brelse(bh);
- if (err)
- goto out;
- offset = 0;
- towrite -= tocopy;
- data += tocopy;
- blk++;
}
+ lock_buffer(bh);
+ memcpy(bh->b_data+offset, data, len);
+ flush_dcache_page(bh->b_page);
+ unlock_buffer(bh);
+ if (journal_quota)
+ err = ext4_handle_dirty_metadata(handle, NULL, bh);
+ else {
+ /* Always do at least ordered writes for quotas */
+ err = ext4_jbd2_file_inode(handle, inode);
+ mark_buffer_dirty(bh);
+ }
+ brelse(bh);
out:
- if (len == towrite) {
+ if (err) {
mutex_unlock(&inode->i_mutex);
return err;
}
- if (inode->i_size < off+len-towrite) {
- i_size_write(inode, off+len-towrite);
+ if (inode->i_size < off + len) {
+ i_size_write(inode, off + len);
EXT4_I(inode)->i_disksize = inode->i_size;
}
inode->i_mtime = inode->i_ctime = CURRENT_TIME;
ext4_mark_inode_dirty(handle, inode);
mutex_unlock(&inode->i_mutex);
- return len - towrite;
+ return len;
}

#endif
--
1.6.6.1.1.g974db.dirty


2010-03-02 18:18:54

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH 07/28] ext4: Use bitops to read/modify EXT4_I(inode)->i_state

At several places we modify EXT4_I(inode)->i_state without holding
i_mutex (ext4_release_file, ext4_bmap, ext4_journalled_writepage,
ext4_do_update_inode, ...). These modifications are racy and we can
lose updates to i_state. So convert handling of i_state to use bitops
which are atomic.

Cc: Jan Kara <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/ext4/ext4.h | 41 +++++++++++++++++++++++++++++------------
fs/ext4/extents.c | 8 ++++----
fs/ext4/file.c | 4 ++--
fs/ext4/ialloc.c | 3 ++-
fs/ext4/inode.c | 38 ++++++++++++++++++++------------------
fs/ext4/migrate.c | 6 +++---
fs/ext4/xattr.c | 22 +++++++++++-----------
7 files changed, 71 insertions(+), 51 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 307ecd1..ffe1334 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -313,17 +313,6 @@ static inline __u32 ext4_mask_flags(umode_t mode, __u32 flags)
return flags & EXT4_OTHER_FLMASK;
}

-/*
- * Inode dynamic state flags
- */
-#define EXT4_STATE_JDATA 0x00000001 /* journaled data exists */
-#define EXT4_STATE_NEW 0x00000002 /* inode is newly created */
-#define EXT4_STATE_XATTR 0x00000004 /* has in-inode xattrs */
-#define EXT4_STATE_NO_EXPAND 0x00000008 /* No space for expansion */
-#define EXT4_STATE_DA_ALLOC_CLOSE 0x00000010 /* Alloc DA blks on close */
-#define EXT4_STATE_EXT_MIGRATE 0x00000020 /* Inode is migrating */
-#define EXT4_STATE_DIO_UNWRITTEN 0x00000040 /* need convert on dio done*/
-
/* Used to pass group descriptor data when online resize is done */
struct ext4_new_group_input {
__u32 group; /* Group number for this data */
@@ -631,7 +620,7 @@ struct ext4_inode_info {
* near to their parent directory's inode.
*/
ext4_group_t i_block_group;
- __u32 i_state; /* Dynamic state flags for ext4 */
+ unsigned long i_state_flags; /* Dynamic state flags */

ext4_lblk_t i_dir_start_lookup;
#ifdef CONFIG_EXT4_FS_XATTR
@@ -1051,6 +1040,34 @@ static inline int ext4_valid_inum(struct super_block *sb, unsigned long ino)
(ino >= EXT4_FIRST_INO(sb) &&
ino <= le32_to_cpu(EXT4_SB(sb)->s_es->s_inodes_count));
}
+
+/*
+ * Inode dynamic state flags
+ */
+enum {
+ EXT4_STATE_JDATA, /* journaled data exists */
+ EXT4_STATE_NEW, /* inode is newly created */
+ EXT4_STATE_XATTR, /* has in-inode xattrs */
+ EXT4_STATE_NO_EXPAND, /* No space for expansion */
+ EXT4_STATE_DA_ALLOC_CLOSE, /* Alloc DA blks on close */
+ EXT4_STATE_EXT_MIGRATE, /* Inode is migrating */
+ EXT4_STATE_DIO_UNWRITTEN, /* need convert on dio done*/
+};
+
+static inline int ext4_test_inode_state(struct inode *inode, int bit)
+{
+ return test_bit(bit, &EXT4_I(inode)->i_state_flags);
+}
+
+static inline void ext4_set_inode_state(struct inode *inode, int bit)
+{
+ set_bit(bit, &EXT4_I(inode)->i_state_flags);
+}
+
+static inline void ext4_clear_inode_state(struct inode *inode, int bit)
+{
+ clear_bit(bit, &EXT4_I(inode)->i_state_flags);
+}
#else
/* Assume that user mode programs are passing in an ext4fs superblock, not
* a kernel struct super_block. This will allow us to call the feature-test
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index c568779..5461615 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3076,7 +3076,7 @@ ext4_ext_handle_uninitialized_extents(handle_t *handle, struct inode *inode,
if (io)
io->flag = DIO_AIO_UNWRITTEN;
else
- EXT4_I(inode)->i_state |= EXT4_STATE_DIO_UNWRITTEN;
+ ext4_set_inode_state(inode, EXT4_STATE_DIO_UNWRITTEN);
goto out;
}
/* async DIO end_io complete, convert the filled extent to written */
@@ -3362,8 +3362,8 @@ int ext4_ext_get_blocks(handle_t *handle, struct inode *inode,
if (io)
io->flag = DIO_AIO_UNWRITTEN;
else
- EXT4_I(inode)->i_state |=
- EXT4_STATE_DIO_UNWRITTEN;;
+ ext4_set_inode_state(inode,
+ EXT4_STATE_DIO_UNWRITTEN);
}
}
err = ext4_ext_insert_extent(handle, inode, path, &newex, flags);
@@ -3739,7 +3739,7 @@ static int ext4_xattr_fiemap(struct inode *inode,
int error = 0;

/* in-inode? */
- if (EXT4_I(inode)->i_state & EXT4_STATE_XATTR) {
+ if (ext4_test_inode_state(inode, EXT4_STATE_XATTR)) {
struct ext4_iloc iloc;
int offset; /* offset of xattr in inode */

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 9630583..f6071ce 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -35,9 +35,9 @@
*/
static int ext4_release_file(struct inode *inode, struct file *filp)
{
- if (EXT4_I(inode)->i_state & EXT4_STATE_DA_ALLOC_CLOSE) {
+ if (ext4_test_inode_state(inode, EXT4_STATE_DA_ALLOC_CLOSE)) {
ext4_alloc_da_blocks(inode);
- EXT4_I(inode)->i_state &= ~EXT4_STATE_DA_ALLOC_CLOSE;
+ ext4_clear_inode_state(inode, EXT4_STATE_DA_ALLOC_CLOSE);
}
/* if we are the last writer on the inode, drop the block reservation */
if ((filp->f_mode & FMODE_WRITE) &&
diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index f3624ea..2fab5ad 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -1029,7 +1029,8 @@ got:
inode->i_generation = sbi->s_next_generation++;
spin_unlock(&sbi->s_next_gen_lock);

- ei->i_state = EXT4_STATE_NEW;
+ ei->i_state_flags = 0;
+ ext4_set_inode_state(inode, EXT4_STATE_NEW);

ei->i_extra_isize = EXT4_SB(sb)->s_want_extra_isize;

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 1a3d7b2..3e53011 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1307,7 +1307,7 @@ int ext4_get_blocks(handle_t *handle, struct inode *inode, sector_t block,
* i_data's format changing. Force the migrate
* to fail by clearing migrate flags
*/
- EXT4_I(inode)->i_state &= ~EXT4_STATE_EXT_MIGRATE;
+ ext4_clear_inode_state(inode, EXT4_STATE_EXT_MIGRATE);
}

/*
@@ -1794,7 +1794,7 @@ static int ext4_journalled_write_end(struct file *file,
new_i_size = pos + copied;
if (new_i_size > inode->i_size)
i_size_write(inode, pos+copied);
- EXT4_I(inode)->i_state |= EXT4_STATE_JDATA;
+ ext4_set_inode_state(inode, EXT4_STATE_JDATA);
if (new_i_size > EXT4_I(inode)->i_disksize) {
ext4_update_i_disksize(inode, new_i_size);
ret2 = ext4_mark_inode_dirty(handle, inode);
@@ -2632,7 +2632,7 @@ static int __ext4_journalled_writepage(struct page *page,
ret = err;

walk_page_buffers(handle, page_bufs, 0, len, NULL, bput_one);
- EXT4_I(inode)->i_state |= EXT4_STATE_JDATA;
+ ext4_set_inode_state(inode, EXT4_STATE_JDATA);
out:
return ret;
}
@@ -3303,7 +3303,8 @@ static sector_t ext4_bmap(struct address_space *mapping, sector_t block)
filemap_write_and_wait(mapping);
}

- if (EXT4_JOURNAL(inode) && EXT4_I(inode)->i_state & EXT4_STATE_JDATA) {
+ if (EXT4_JOURNAL(inode) &&
+ ext4_test_inode_state(inode, EXT4_STATE_JDATA)) {
/*
* This is a REALLY heavyweight approach, but the use of
* bmap on dirty files is expected to be extremely rare:
@@ -3322,7 +3323,7 @@ static sector_t ext4_bmap(struct address_space *mapping, sector_t block)
* everything they get.
*/

- EXT4_I(inode)->i_state &= ~EXT4_STATE_JDATA;
+ ext4_clear_inode_state(inode, EXT4_STATE_JDATA);
journal = EXT4_JOURNAL(inode);
jbd2_journal_lock_updates(journal);
err = jbd2_journal_flush(journal);
@@ -3790,8 +3791,8 @@ static ssize_t ext4_ext_direct_IO(int rw, struct kiocb *iocb,
if (ret != -EIOCBQUEUED && ret <= 0 && iocb->private) {
ext4_free_io_end(iocb->private);
iocb->private = NULL;
- } else if (ret > 0 && (EXT4_I(inode)->i_state &
- EXT4_STATE_DIO_UNWRITTEN)) {
+ } else if (ret > 0 && ext4_test_inode_state(inode,
+ EXT4_STATE_DIO_UNWRITTEN)) {
int err;
/*
* for non AIO case, since the IO is already
@@ -3801,7 +3802,7 @@ static ssize_t ext4_ext_direct_IO(int rw, struct kiocb *iocb,
offset, ret);
if (err < 0)
ret = err;
- EXT4_I(inode)->i_state &= ~EXT4_STATE_DIO_UNWRITTEN;
+ ext4_clear_inode_state(inode, EXT4_STATE_DIO_UNWRITTEN);
}
return ret;
}
@@ -4457,7 +4458,7 @@ void ext4_truncate(struct inode *inode)
return;

if (inode->i_size == 0 && !test_opt(inode->i_sb, NO_AUTO_DA_ALLOC))
- ei->i_state |= EXT4_STATE_DA_ALLOC_CLOSE;
+ ext4_set_inode_state(inode, EXT4_STATE_DA_ALLOC_CLOSE);

if (EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL) {
ext4_ext_truncate(inode);
@@ -4743,7 +4744,7 @@ int ext4_get_inode_loc(struct inode *inode, struct ext4_iloc *iloc)
{
/* We have all inode data except xattrs in memory here. */
return __ext4_get_inode_loc(inode, iloc,
- !(EXT4_I(inode)->i_state & EXT4_STATE_XATTR));
+ !ext4_test_inode_state(inode, EXT4_STATE_XATTR));
}

void ext4_set_inode_flags(struct inode *inode)
@@ -4837,7 +4838,7 @@ struct inode *ext4_iget(struct super_block *sb, unsigned long ino)
}
inode->i_nlink = le16_to_cpu(raw_inode->i_links_count);

- ei->i_state = 0;
+ ei->i_state_flags = 0;
ei->i_dir_start_lookup = 0;
ei->i_dtime = le32_to_cpu(raw_inode->i_dtime);
/* We now have enough fields to check if the inode was active or not.
@@ -4920,7 +4921,7 @@ struct inode *ext4_iget(struct super_block *sb, unsigned long ino)
EXT4_GOOD_OLD_INODE_SIZE +
ei->i_extra_isize;
if (*magic == cpu_to_le32(EXT4_XATTR_MAGIC))
- ei->i_state |= EXT4_STATE_XATTR;
+ ext4_set_inode_state(inode, EXT4_STATE_XATTR);
}
} else
ei->i_extra_isize = 0;
@@ -5060,7 +5061,7 @@ static int ext4_do_update_inode(handle_t *handle,

/* For fields not not tracking in the in-memory inode,
* initialise them to zero for new inodes. */
- if (ei->i_state & EXT4_STATE_NEW)
+ if (ext4_test_inode_state(inode, EXT4_STATE_NEW))
memset(raw_inode, 0, EXT4_SB(inode->i_sb)->s_inode_size);

ext4_get_inode_flags(ei);
@@ -5156,7 +5157,7 @@ static int ext4_do_update_inode(handle_t *handle,
rc = ext4_handle_dirty_metadata(handle, inode, bh);
if (!err)
err = rc;
- ei->i_state &= ~EXT4_STATE_NEW;
+ ext4_clear_inode_state(inode, EXT4_STATE_NEW);

ext4_update_inode_fsync_trans(handle, inode, 0);
out_brelse:
@@ -5580,8 +5581,8 @@ static int ext4_expand_extra_isize(struct inode *inode,
entry = IFIRST(header);

/* No extended attributes present */
- if (!(EXT4_I(inode)->i_state & EXT4_STATE_XATTR) ||
- header->h_magic != cpu_to_le32(EXT4_XATTR_MAGIC)) {
+ if (!ext4_test_inode_state(inode, EXT4_STATE_XATTR) ||
+ header->h_magic != cpu_to_le32(EXT4_XATTR_MAGIC)) {
memset((void *)raw_inode + EXT4_GOOD_OLD_INODE_SIZE, 0,
new_extra_isize);
EXT4_I(inode)->i_extra_isize = new_extra_isize;
@@ -5625,7 +5626,7 @@ int ext4_mark_inode_dirty(handle_t *handle, struct inode *inode)
err = ext4_reserve_inode_write(handle, inode, &iloc);
if (ext4_handle_valid(handle) &&
EXT4_I(inode)->i_extra_isize < sbi->s_want_extra_isize &&
- !(EXT4_I(inode)->i_state & EXT4_STATE_NO_EXPAND)) {
+ !ext4_test_inode_state(inode, EXT4_STATE_NO_EXPAND)) {
/*
* We need extra buffer credits since we may write into EA block
* with this same handle. If journal_extend fails, then it will
@@ -5639,7 +5640,8 @@ int ext4_mark_inode_dirty(handle_t *handle, struct inode *inode)
sbi->s_want_extra_isize,
iloc, handle);
if (ret) {
- EXT4_I(inode)->i_state |= EXT4_STATE_NO_EXPAND;
+ ext4_set_inode_state(inode,
+ EXT4_STATE_NO_EXPAND);
if (mnt_count !=
le16_to_cpu(sbi->s_es->s_mnt_count)) {
ext4_warning(inode->i_sb, __func__,
diff --git a/fs/ext4/migrate.c b/fs/ext4/migrate.c
index 8141581..46a4101 100644
--- a/fs/ext4/migrate.c
+++ b/fs/ext4/migrate.c
@@ -365,12 +365,12 @@ static int ext4_ext_swap_inode_data(handle_t *handle, struct inode *inode,
* happened after we started the migrate. We need to
* fail the migrate
*/
- if (!(EXT4_I(inode)->i_state & EXT4_STATE_EXT_MIGRATE)) {
+ if (!ext4_test_inode_state(inode, EXT4_STATE_EXT_MIGRATE)) {
retval = -EAGAIN;
up_write(&EXT4_I(inode)->i_data_sem);
goto err_out;
} else
- EXT4_I(inode)->i_state &= ~EXT4_STATE_EXT_MIGRATE;
+ ext4_clear_inode_state(inode, EXT4_STATE_EXT_MIGRATE);
/*
* We have the extent map build with the tmp inode.
* Now copy the i_data across
@@ -533,7 +533,7 @@ int ext4_ext_migrate(struct inode *inode)
* allocation.
*/
down_read((&EXT4_I(inode)->i_data_sem));
- EXT4_I(inode)->i_state |= EXT4_STATE_EXT_MIGRATE;
+ ext4_set_inode_state(inode, EXT4_STATE_EXT_MIGRATE);
up_read((&EXT4_I(inode)->i_data_sem));

handle = ext4_journal_start(inode, 1);
diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
index f3a2f7e..c619a7e 100644
--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -267,7 +267,7 @@ ext4_xattr_ibody_get(struct inode *inode, int name_index, const char *name,
void *end;
int error;

- if (!(EXT4_I(inode)->i_state & EXT4_STATE_XATTR))
+ if (!ext4_test_inode_state(inode, EXT4_STATE_XATTR))
return -ENODATA;
error = ext4_get_inode_loc(inode, &iloc);
if (error)
@@ -396,7 +396,7 @@ ext4_xattr_ibody_list(struct dentry *dentry, char *buffer, size_t buffer_size)
void *end;
int error;

- if (!(EXT4_I(inode)->i_state & EXT4_STATE_XATTR))
+ if (!ext4_test_inode_state(inode, EXT4_STATE_XATTR))
return 0;
error = ext4_get_inode_loc(inode, &iloc);
if (error)
@@ -908,7 +908,7 @@ ext4_xattr_ibody_find(struct inode *inode, struct ext4_xattr_info *i,
is->s.base = is->s.first = IFIRST(header);
is->s.here = is->s.first;
is->s.end = (void *)raw_inode + EXT4_SB(inode->i_sb)->s_inode_size;
- if (EXT4_I(inode)->i_state & EXT4_STATE_XATTR) {
+ if (ext4_test_inode_state(inode, EXT4_STATE_XATTR)) {
error = ext4_xattr_check_names(IFIRST(header), is->s.end);
if (error)
return error;
@@ -940,10 +940,10 @@ ext4_xattr_ibody_set(handle_t *handle, struct inode *inode,
header = IHDR(inode, ext4_raw_inode(&is->iloc));
if (!IS_LAST_ENTRY(s->first)) {
header->h_magic = cpu_to_le32(EXT4_XATTR_MAGIC);
- EXT4_I(inode)->i_state |= EXT4_STATE_XATTR;
+ ext4_set_inode_state(inode, EXT4_STATE_XATTR);
} else {
header->h_magic = cpu_to_le32(0);
- EXT4_I(inode)->i_state &= ~EXT4_STATE_XATTR;
+ ext4_clear_inode_state(inode, EXT4_STATE_XATTR);
}
return 0;
}
@@ -986,8 +986,8 @@ ext4_xattr_set_handle(handle_t *handle, struct inode *inode, int name_index,
if (strlen(name) > 255)
return -ERANGE;
down_write(&EXT4_I(inode)->xattr_sem);
- no_expand = EXT4_I(inode)->i_state & EXT4_STATE_NO_EXPAND;
- EXT4_I(inode)->i_state |= EXT4_STATE_NO_EXPAND;
+ no_expand = ext4_test_inode_state(inode, EXT4_STATE_NO_EXPAND);
+ ext4_set_inode_state(inode, EXT4_STATE_NO_EXPAND);

error = ext4_get_inode_loc(inode, &is.iloc);
if (error)
@@ -997,10 +997,10 @@ ext4_xattr_set_handle(handle_t *handle, struct inode *inode, int name_index,
if (error)
goto cleanup;

- if (EXT4_I(inode)->i_state & EXT4_STATE_NEW) {
+ if (ext4_test_inode_state(inode, EXT4_STATE_NEW)) {
struct ext4_inode *raw_inode = ext4_raw_inode(&is.iloc);
memset(raw_inode, 0, EXT4_SB(inode->i_sb)->s_inode_size);
- EXT4_I(inode)->i_state &= ~EXT4_STATE_NEW;
+ ext4_clear_inode_state(inode, EXT4_STATE_NEW);
}

error = ext4_xattr_ibody_find(inode, &i, &is);
@@ -1052,7 +1052,7 @@ ext4_xattr_set_handle(handle_t *handle, struct inode *inode, int name_index,
ext4_xattr_update_super_block(handle, inode->i_sb);
inode->i_ctime = ext4_current_time(inode);
if (!value)
- EXT4_I(inode)->i_state &= ~EXT4_STATE_NO_EXPAND;
+ ext4_clear_inode_state(inode, EXT4_STATE_NO_EXPAND);
error = ext4_mark_iloc_dirty(handle, inode, &is.iloc);
/*
* The bh is consumed by ext4_mark_iloc_dirty, even with
@@ -1067,7 +1067,7 @@ cleanup:
brelse(is.iloc.bh);
brelse(bs.bh);
if (no_expand == 0)
- EXT4_I(inode)->i_state &= ~EXT4_STATE_NO_EXPAND;
+ ext4_clear_inode_state(inode, EXT4_STATE_NO_EXPAND);
up_write(&EXT4_I(inode)->xattr_sem);
return error;
}
--
1.6.6.1.1.g974db.dirty


2010-03-02 18:18:51

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH 02/28] ext4: Fix optional-arg mount options

From: Eric Sandeen <[email protected]>

We have 2 mount options, "barrier" and "auto_da_alloc" which may or
may not take a 1/0 argument. This causes the ext4 superblock mount
code to subtract uninitialized pointers and pass the result to
kmalloc, which results in very noisy failures.

Per Ted's suggestion, initialize the args struct so that
we know whether match_token() found an argument for the
option, and skip match_int() if not.

Also, return error (0) from parse_options if we thought
we found an argument, but match_int() Fails.

Reported-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Eric Sandeen <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/ext4/super.c | 23 +++++++++++++++--------
1 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 735c20d..68a55df 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1229,6 +1229,11 @@ static int parse_options(char *options, struct super_block *sb,
if (!*p)
continue;

+ /*
+ * Initialize args struct so we know whether arg was
+ * found; some options take optional arguments.
+ */
+ args[0].to = args[0].from = 0;
token = match_token(p, tokens, args);
switch (token) {
case Opt_bsd_df:
@@ -1518,10 +1523,11 @@ set_qf_format:
clear_opt(sbi->s_mount_opt, BARRIER);
break;
case Opt_barrier:
- if (match_int(&args[0], &option)) {
- set_opt(sbi->s_mount_opt, BARRIER);
- break;
- }
+ if (args[0].from) {
+ if (match_int(&args[0], &option))
+ return 0;
+ } else
+ option = 1; /* No argument, default to 1 */
if (option)
set_opt(sbi->s_mount_opt, BARRIER);
else
@@ -1594,10 +1600,11 @@ set_qf_format:
set_opt(sbi->s_mount_opt,NO_AUTO_DA_ALLOC);
break;
case Opt_auto_da_alloc:
- if (match_int(&args[0], &option)) {
- clear_opt(sbi->s_mount_opt, NO_AUTO_DA_ALLOC);
- break;
- }
+ if (args[0].from) {
+ if (match_int(&args[0], &option))
+ return 0;
+ } else
+ option = 1; /* No argument, default to 1 */
if (option)
clear_opt(sbi->s_mount_opt, NO_AUTO_DA_ALLOC);
else
--
1.6.6.1.1.g974db.dirty


2010-03-02 18:18:54

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH 27/28] ext4: Convert BUG_ON checks to use ext4_error() instead

From: Frank Mayhar <[email protected]>

Convert a bunch of BUG_ONs to emit a ext4_error() message and return
EIO. This is a first pass and most notably does _not_ cover
mballoc.c, which is a morass of void functions.

Signed-off-by: Frank Mayhar <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/ext4/ext4.h | 10 +++
fs/ext4/extents.c | 189 +++++++++++++++++++++++++++++++++++++++++------------
fs/ext4/inode.c | 18 +++++-
fs/ext4/super.c | 36 ++++++++++
4 files changed, 209 insertions(+), 44 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 79c067e..350c3a2 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -53,6 +53,12 @@
#define ext4_debug(f, a...) do {} while (0)
#endif

+#define EXT4_ERROR_INODE(inode, fmt, a...) \
+ ext4_error_inode(__func__, (inode), (fmt), ## a);
+
+#define EXT4_ERROR_FILE(file, fmt, a...) \
+ ext4_error_file(__func__, (file), (fmt), ## a);
+
/* data type for block offset of block group */
typedef int ext4_grpblk_t;

@@ -1492,6 +1498,10 @@ extern int ext4_group_extend(struct super_block *sb,
extern void __ext4_error(struct super_block *, const char *, const char *, ...)
__attribute__ ((format (printf, 3, 4)));
#define ext4_error(sb, message...) __ext4_error(sb, __func__, ## message)
+extern void ext4_error_inode(const char *, struct inode *, const char *, ...)
+ __attribute__ ((format (printf, 3, 4)));
+extern void ext4_error_file(const char *, struct file *, const char *, ...)
+ __attribute__ ((format (printf, 3, 4)));
extern void __ext4_std_error(struct super_block *, const char *, int);
extern void ext4_abort(struct super_block *, const char *, const char *, ...)
__attribute__ ((format (printf, 3, 4)));
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 202667b..45dad87 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -703,7 +703,12 @@ ext4_ext_find_extent(struct inode *inode, ext4_lblk_t block,
}
eh = ext_block_hdr(bh);
ppos++;
- BUG_ON(ppos > depth);
+ if (unlikely(ppos > depth)) {
+ put_bh(bh);
+ EXT4_ERROR_INODE(inode,
+ "ppos %d > depth %d", ppos, depth);
+ goto err;
+ }
path[ppos].p_bh = bh;
path[ppos].p_hdr = eh;
i--;
@@ -749,7 +754,12 @@ int ext4_ext_insert_index(handle_t *handle, struct inode *inode,
if (err)
return err;

- BUG_ON(logical == le32_to_cpu(curp->p_idx->ei_block));
+ if (unlikely(logical == le32_to_cpu(curp->p_idx->ei_block))) {
+ EXT4_ERROR_INODE(inode,
+ "logical %d == ei_block %d!",
+ logical, le32_to_cpu(curp->p_idx->ei_block));
+ return -EIO;
+ }
len = EXT_MAX_INDEX(curp->p_hdr) - curp->p_idx;
if (logical > le32_to_cpu(curp->p_idx->ei_block)) {
/* insert after */
@@ -779,9 +789,17 @@ int ext4_ext_insert_index(handle_t *handle, struct inode *inode,
ext4_idx_store_pblock(ix, ptr);
le16_add_cpu(&curp->p_hdr->eh_entries, 1);

- BUG_ON(le16_to_cpu(curp->p_hdr->eh_entries)
- > le16_to_cpu(curp->p_hdr->eh_max));
- BUG_ON(ix > EXT_LAST_INDEX(curp->p_hdr));
+ if (unlikely(le16_to_cpu(curp->p_hdr->eh_entries)
+ > le16_to_cpu(curp->p_hdr->eh_max))) {
+ EXT4_ERROR_INODE(inode,
+ "logical %d == ei_block %d!",
+ logical, le32_to_cpu(curp->p_idx->ei_block));
+ return -EIO;
+ }
+ if (unlikely(ix > EXT_LAST_INDEX(curp->p_hdr))) {
+ EXT4_ERROR_INODE(inode, "ix > EXT_LAST_INDEX!");
+ return -EIO;
+ }

err = ext4_ext_dirty(handle, inode, curp);
ext4_std_error(inode->i_sb, err);
@@ -819,7 +837,10 @@ static int ext4_ext_split(handle_t *handle, struct inode *inode,

/* if current leaf will be split, then we should use
* border from split point */
- BUG_ON(path[depth].p_ext > EXT_MAX_EXTENT(path[depth].p_hdr));
+ if (unlikely(path[depth].p_ext > EXT_MAX_EXTENT(path[depth].p_hdr))) {
+ EXT4_ERROR_INODE(inode, "p_ext > EXT_MAX_EXTENT!");
+ return -EIO;
+ }
if (path[depth].p_ext != EXT_MAX_EXTENT(path[depth].p_hdr)) {
border = path[depth].p_ext[1].ee_block;
ext_debug("leaf will be split."
@@ -860,7 +881,11 @@ static int ext4_ext_split(handle_t *handle, struct inode *inode,

/* initialize new leaf */
newblock = ablocks[--a];
- BUG_ON(newblock == 0);
+ if (unlikely(newblock == 0)) {
+ EXT4_ERROR_INODE(inode, "newblock == 0!");
+ err = -EIO;
+ goto cleanup;
+ }
bh = sb_getblk(inode->i_sb, newblock);
if (!bh) {
err = -EIO;
@@ -880,7 +905,14 @@ static int ext4_ext_split(handle_t *handle, struct inode *inode,
ex = EXT_FIRST_EXTENT(neh);

/* move remainder of path[depth] to the new leaf */
- BUG_ON(path[depth].p_hdr->eh_entries != path[depth].p_hdr->eh_max);
+ if (unlikely(path[depth].p_hdr->eh_entries !=
+ path[depth].p_hdr->eh_max)) {
+ EXT4_ERROR_INODE(inode, "eh_entries %d != eh_max %d!",
+ path[depth].p_hdr->eh_entries,
+ path[depth].p_hdr->eh_max);
+ err = -EIO;
+ goto cleanup;
+ }
/* start copy from next extent */
/* TODO: we could do it by single memmove */
m = 0;
@@ -927,7 +959,11 @@ static int ext4_ext_split(handle_t *handle, struct inode *inode,

/* create intermediate indexes */
k = depth - at - 1;
- BUG_ON(k < 0);
+ if (unlikely(k < 0)) {
+ EXT4_ERROR_INODE(inode, "k %d < 0!", k);
+ err = -EIO;
+ goto cleanup;
+ }
if (k)
ext_debug("create %d intermediate indices\n", k);
/* insert new index into current index block */
@@ -964,8 +1000,14 @@ static int ext4_ext_split(handle_t *handle, struct inode *inode,

ext_debug("cur 0x%p, last 0x%p\n", path[i].p_idx,
EXT_MAX_INDEX(path[i].p_hdr));
- BUG_ON(EXT_MAX_INDEX(path[i].p_hdr) !=
- EXT_LAST_INDEX(path[i].p_hdr));
+ if (unlikely(EXT_MAX_INDEX(path[i].p_hdr) !=
+ EXT_LAST_INDEX(path[i].p_hdr))) {
+ EXT4_ERROR_INODE(inode,
+ "EXT_MAX_INDEX != EXT_LAST_INDEX ee_block %d!",
+ le32_to_cpu(path[i].p_ext->ee_block));
+ err = -EIO;
+ goto cleanup;
+ }
while (path[i].p_idx <= EXT_MAX_INDEX(path[i].p_hdr)) {
ext_debug("%d: move %d:%llu in new index %llu\n", i,
le32_to_cpu(path[i].p_idx->ei_block),
@@ -1203,7 +1245,10 @@ ext4_ext_search_left(struct inode *inode, struct ext4_ext_path *path,
struct ext4_extent *ex;
int depth, ee_len;

- BUG_ON(path == NULL);
+ if (unlikely(path == NULL)) {
+ EXT4_ERROR_INODE(inode, "path == NULL *logical %d!", *logical);
+ return -EIO;
+ }
depth = path->p_depth;
*phys = 0;

@@ -1217,15 +1262,33 @@ ext4_ext_search_left(struct inode *inode, struct ext4_ext_path *path,
ex = path[depth].p_ext;
ee_len = ext4_ext_get_actual_len(ex);
if (*logical < le32_to_cpu(ex->ee_block)) {
- BUG_ON(EXT_FIRST_EXTENT(path[depth].p_hdr) != ex);
+ if (unlikely(EXT_FIRST_EXTENT(path[depth].p_hdr) != ex)) {
+ EXT4_ERROR_INODE(inode,
+ "EXT_FIRST_EXTENT != ex *logical %d ee_block %d!",
+ *logical, le32_to_cpu(ex->ee_block));
+ return -EIO;
+ }
while (--depth >= 0) {
ix = path[depth].p_idx;
- BUG_ON(ix != EXT_FIRST_INDEX(path[depth].p_hdr));
+ if (unlikely(ix != EXT_FIRST_INDEX(path[depth].p_hdr))) {
+ EXT4_ERROR_INODE(inode,
+ "ix (%d) != EXT_FIRST_INDEX (%d) (depth %d)!",
+ ix != NULL ? ix->ei_block : 0,
+ EXT_FIRST_INDEX(path[depth].p_hdr) != NULL ?
+ EXT_FIRST_INDEX(path[depth].p_hdr)->ei_block : 0,
+ depth);
+ return -EIO;
+ }
}
return 0;
}

- BUG_ON(*logical < (le32_to_cpu(ex->ee_block) + ee_len));
+ if (unlikely(*logical < (le32_to_cpu(ex->ee_block) + ee_len))) {
+ EXT4_ERROR_INODE(inode,
+ "logical %d < ee_block %d + ee_len %d!",
+ *logical, le32_to_cpu(ex->ee_block), ee_len);
+ return -EIO;
+ }

*logical = le32_to_cpu(ex->ee_block) + ee_len - 1;
*phys = ext_pblock(ex) + ee_len - 1;
@@ -1251,7 +1314,10 @@ ext4_ext_search_right(struct inode *inode, struct ext4_ext_path *path,
int depth; /* Note, NOT eh_depth; depth from top of tree */
int ee_len;

- BUG_ON(path == NULL);
+ if (unlikely(path == NULL)) {
+ EXT4_ERROR_INODE(inode, "path == NULL *logical %d!", *logical);
+ return -EIO;
+ }
depth = path->p_depth;
*phys = 0;

@@ -1265,17 +1331,32 @@ ext4_ext_search_right(struct inode *inode, struct ext4_ext_path *path,
ex = path[depth].p_ext;
ee_len = ext4_ext_get_actual_len(ex);
if (*logical < le32_to_cpu(ex->ee_block)) {
- BUG_ON(EXT_FIRST_EXTENT(path[depth].p_hdr) != ex);
+ if (unlikely(EXT_FIRST_EXTENT(path[depth].p_hdr) != ex)) {
+ EXT4_ERROR_INODE(inode,
+ "first_extent(path[%d].p_hdr) != ex",
+ depth);
+ return -EIO;
+ }
while (--depth >= 0) {
ix = path[depth].p_idx;
- BUG_ON(ix != EXT_FIRST_INDEX(path[depth].p_hdr));
+ if (unlikely(ix != EXT_FIRST_INDEX(path[depth].p_hdr))) {
+ EXT4_ERROR_INODE(inode,
+ "ix != EXT_FIRST_INDEX *logical %d!",
+ *logical);
+ return -EIO;
+ }
}
*logical = le32_to_cpu(ex->ee_block);
*phys = ext_pblock(ex);
return 0;
}

- BUG_ON(*logical < (le32_to_cpu(ex->ee_block) + ee_len));
+ if (unlikely(*logical < (le32_to_cpu(ex->ee_block) + ee_len))) {
+ EXT4_ERROR_INODE(inode,
+ "logical %d < ee_block %d + ee_len %d!",
+ *logical, le32_to_cpu(ex->ee_block), ee_len);
+ return -EIO;
+ }

if (ex != EXT_LAST_EXTENT(path[depth].p_hdr)) {
/* next allocated block in this leaf */
@@ -1414,8 +1495,12 @@ static int ext4_ext_correct_indexes(handle_t *handle, struct inode *inode,

eh = path[depth].p_hdr;
ex = path[depth].p_ext;
- BUG_ON(ex == NULL);
- BUG_ON(eh == NULL);
+
+ if (unlikely(ex == NULL || eh == NULL)) {
+ EXT4_ERROR_INODE(inode,
+ "ex %p == NULL or eh %p == NULL", ex, eh);
+ return -EIO;
+ }

if (depth == 0) {
/* there is no tree at all */
@@ -1613,10 +1698,16 @@ int ext4_ext_insert_extent(handle_t *handle, struct inode *inode,
ext4_lblk_t next;
unsigned uninitialized = 0;

- BUG_ON(ext4_ext_get_actual_len(newext) == 0);
+ if (unlikely(ext4_ext_get_actual_len(newext) == 0)) {
+ EXT4_ERROR_INODE(inode, "ext4_ext_get_actual_len(newext) == 0");
+ return -EIO;
+ }
depth = ext_depth(inode);
ex = path[depth].p_ext;
- BUG_ON(path[depth].p_hdr == NULL);
+ if (unlikely(path[depth].p_hdr == NULL)) {
+ EXT4_ERROR_INODE(inode, "path[%d].p_hdr == NULL", depth);
+ return -EIO;
+ }

/* try to insert block into found extent and return */
if (ex && !(flag & EXT4_GET_BLOCKS_PRE_IO)
@@ -1788,7 +1879,11 @@ int ext4_ext_walk_space(struct inode *inode, ext4_lblk_t block,
}

depth = ext_depth(inode);
- BUG_ON(path[depth].p_hdr == NULL);
+ if (unlikely(path[depth].p_hdr == NULL)) {
+ EXT4_ERROR_INODE(inode, "path[%d].p_hdr == NULL", depth);
+ err = -EIO;
+ break;
+ }
ex = path[depth].p_ext;
next = ext4_ext_next_allocated_block(path);

@@ -1839,7 +1934,11 @@ int ext4_ext_walk_space(struct inode *inode, ext4_lblk_t block,
cbex.ec_type = EXT4_EXT_CACHE_EXTENT;
}

- BUG_ON(cbex.ec_len == 0);
+ if (unlikely(cbex.ec_len == 0)) {
+ EXT4_ERROR_INODE(inode, "cbex.ec_len == 0");
+ err = -EIO;
+ break;
+ }
err = func(inode, path, &cbex, ex, cbdata);
ext4_ext_drop_refs(path);

@@ -1982,7 +2081,10 @@ static int ext4_ext_rm_idx(handle_t *handle, struct inode *inode,
/* free index block */
path--;
leaf = idx_pblock(path->p_idx);
- BUG_ON(path->p_hdr->eh_entries == 0);
+ if (unlikely(path->p_hdr->eh_entries == 0)) {
+ EXT4_ERROR_INODE(inode, "path->p_hdr->eh_entries == 0");
+ return -EIO;
+ }
err = ext4_ext_get_access(handle, inode, path);
if (err)
return err;
@@ -2120,8 +2222,10 @@ ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
if (!path[depth].p_hdr)
path[depth].p_hdr = ext_block_hdr(path[depth].p_bh);
eh = path[depth].p_hdr;
- BUG_ON(eh == NULL);
-
+ if (unlikely(path[depth].p_hdr == NULL)) {
+ EXT4_ERROR_INODE(inode, "path[%d].p_hdr == NULL", depth);
+ return -EIO;
+ }
/* find where to start removing */
ex = EXT_LAST_EXTENT(eh);

@@ -3240,10 +3344,10 @@ int ext4_ext_get_blocks(handle_t *handle, struct inode *inode,
* this situation is possible, though, _during_ tree modification;
* this is why assert can't be put in ext4_ext_find_extent()
*/
- if (path[depth].p_ext == NULL && depth != 0) {
- ext4_error(inode->i_sb, "bad extent address "
- "inode: %lu, iblock: %d, depth: %d",
- inode->i_ino, iblock, depth);
+ if (unlikely(path[depth].p_ext == NULL && depth != 0)) {
+ EXT4_ERROR_INODE(inode, "bad extent address "
+ "iblock: %d, depth: %d pblock %lld",
+ iblock, depth, path[depth].p_block);
err = -EIO;
goto out2;
}
@@ -3371,16 +3475,17 @@ int ext4_ext_get_blocks(handle_t *handle, struct inode *inode,
}

if (unlikely(EXT4_I(inode)->i_flags & EXT4_EOFBLOCKS_FL)) {
- if (eh->eh_entries) {
- last_ex = EXT_LAST_EXTENT(eh);
- if (iblock + ar.len > le32_to_cpu(last_ex->ee_block)
- + ext4_ext_get_actual_len(last_ex))
- EXT4_I(inode)->i_flags &= ~EXT4_EOFBLOCKS_FL;
- } else {
- WARN_ON(eh->eh_entries == 0);
- ext4_error(inode->i_sb, __func__,
- "inode#%lu, eh->eh_entries = 0!", inode->i_ino);
- }
+ if (unlikely(!eh->eh_entries)) {
+ EXT4_ERROR_INODE(inode,
+ "eh->eh_entries == 0 ee_block %d",
+ ex->ee_block);
+ err = -EIO;
+ goto out2;
+ }
+ last_ex = EXT_LAST_EXTENT(eh);
+ if (iblock + ar.len > le32_to_cpu(last_ex->ee_block)
+ + ext4_ext_get_actual_len(last_ex))
+ EXT4_I(inode)->i_flags &= ~EXT4_EOFBLOCKS_FL;
}
err = ext4_ext_insert_extent(handle, inode, path, &newex, flags);
if (err) {
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 576bbe2..2c48fbe 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -607,7 +607,14 @@ static int ext4_alloc_blocks(handle_t *handle, struct inode *inode,
if (*err)
goto failed_out;

- BUG_ON(current_block + count > EXT4_MAX_BLOCK_FILE_PHYS);
+ if (unlikely(current_block + count > EXT4_MAX_BLOCK_FILE_PHYS)) {
+ EXT4_ERROR_INODE(inode,
+ "current_block %llu + count %lu > %d!",
+ current_block, count,
+ EXT4_MAX_BLOCK_FILE_PHYS);
+ *err = -EIO;
+ goto failed_out;
+ }

target -= count;
/* allocate blocks for indirect blocks */
@@ -643,7 +650,14 @@ static int ext4_alloc_blocks(handle_t *handle, struct inode *inode,
ar.flags = EXT4_MB_HINT_DATA;

current_block = ext4_mb_new_blocks(handle, &ar, err);
- BUG_ON(current_block + ar.len > EXT4_MAX_BLOCK_FILE_PHYS);
+ if (unlikely(current_block + ar.len > EXT4_MAX_BLOCK_FILE_PHYS)) {
+ EXT4_ERROR_INODE(inode,
+ "current_block %llu + ar.len %d > %d!",
+ current_block, ar.len,
+ EXT4_MAX_BLOCK_FILE_PHYS);
+ *err = -EIO;
+ goto failed_out;
+ }

if (*err && (target == blks)) {
/*
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 200d257..6c8e92c 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -347,6 +347,42 @@ void __ext4_error(struct super_block *sb, const char *function,
ext4_handle_error(sb);
}

+void ext4_error_inode(const char *function, struct inode *inode,
+ const char *fmt, ...)
+{
+ va_list args;
+
+ va_start(args, fmt);
+ printk(KERN_CRIT "EXT4-fs error (device %s): %s: inode #%lu: (comm %s) ",
+ inode->i_sb->s_id, function, inode->i_ino, current->comm);
+ vprintk(fmt, args);
+ printk("\n");
+ va_end(args);
+
+ ext4_handle_error(inode->i_sb);
+}
+
+void ext4_error_file(const char *function, struct file *file,
+ const char *fmt, ...)
+{
+ va_list args;
+ struct inode *inode = file->f_dentry->d_inode;
+ char pathname[80], *path;
+
+ va_start(args, fmt);
+ path = d_path(&(file->f_path), pathname, sizeof(pathname));
+ if (!path)
+ path = "(unknown)";
+ printk(KERN_CRIT
+ "EXT4-fs error (device %s): %s: inode #%lu (comm %s path %s): ",
+ inode->i_sb->s_id, function, inode->i_ino, current->comm, path);
+ vprintk(fmt, args);
+ printk("\n");
+ va_end(args);
+
+ ext4_handle_error(inode->i_sb);
+}
+
static const char *ext4_decode_error(struct super_block *sb, int errno,
char nbuf[16])
{
--
1.6.6.1.1.g974db.dirty


2010-03-02 18:18:54

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH 22/28] ext4: Handle non empty on-disk orphan link

From: Dmitry Monakhov <[email protected]>

In case of truncate errors we explicitly remove inode from in-core
orphan list via orphan_del(NULL, inode) without modifying the on-disk list.

But later on, the same inode may be inserted in the orphan list again
which will result the on-disk linked list getting corrupted. If inode
i_dtime contains valid value, then skip on-disk list modification.

Signed-off-by: Dmitry Monakhov <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]t.edu>
---
fs/ext4/namei.c | 8 ++++++++
1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index a13948f..608d21f 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -2013,6 +2013,13 @@ int ext4_orphan_add(handle_t *handle, struct inode *inode)
err = ext4_reserve_inode_write(handle, inode, &iloc);
if (err)
goto out_unlock;
+ /*
+ * Due to previous errors inode may be already a part of on-disk
+ * orphan list. If so skip on-disk list modification.
+ */
+ if (NEXT_ORPHAN(inode) && NEXT_ORPHAN(inode) <=
+ (le32_to_cpu(EXT4_SB(sb)->s_es->s_inodes_count)))
+ goto mem_insert;

/* Insert this inode at the head of the on-disk orphan list... */
NEXT_ORPHAN(inode) = le32_to_cpu(EXT4_SB(sb)->s_es->s_last_orphan);
@@ -2030,6 +2037,7 @@ int ext4_orphan_add(handle_t *handle, struct inode *inode)
*
* This is safe: on error we're going to ignore the orphan list
* anyway on the next recovery. */
+mem_insert:
if (!err)
list_add(&EXT4_I(inode)->i_orphan, &EXT4_SB(sb)->s_orphan);

--
1.6.6.1.1.g974db.dirty


2010-03-02 18:18:54

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH 03/28] ext4: Add block validity check when truncating indirect block mapped inodes

Add checks to ext4_free_branches() to make sure a block number found
in an indirect block are valid before trying to free it. If a bad
block number is found, stop freeing the indirect block immediately,
since the file system is corrupt and we will need to run fsck anyway.
This also avoids spamming the logs, and specifically avoids
driver-level "attempt to access beyond end of device" errors obscure
what is really going on.

If you get *really*, *really*, *really* unlucky, without this patch, a
supposed indirect block containing garbage might contain a reference
to a primary block group descriptor, in which case
ext4_free_branches() could end up zero'ing out a block group
descriptor block, and if then one of the block bitmaps for a block
group described by that bg descriptor block is not in memory, and is
read in by ext4_read_block_bitmap(). This function calls
ext4_valid_block_bitmap(), which assumes that bg_inode_table() was
validated at mount time and hasn't been modified since. Since this
assumption is no longer valid, it's possible for the value
(ext4_inode_table(sb, desc) - group_first_block) to go negative, which
will cause ext4_find_next_zero_bit() to trigger a kernel GPF.

Addresses-Google-Bug: #2220436

Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/ext4/ext4.h | 1 +
fs/ext4/inode.c | 39 ++++++++++++++++++++++++++++++---------
fs/ext4/mballoc.c | 7 ++++---
3 files changed, 35 insertions(+), 12 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 602d5ad..307ecd1 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -377,6 +377,7 @@ struct ext4_new_group_data {
*/
#define EXT4_FREE_BLOCKS_METADATA 0x0001
#define EXT4_FREE_BLOCKS_FORGET 0x0002
+#define EXT4_FREE_BLOCKS_VALIDATED 0x0004

/*
* ioctl commands
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 2059c34..3e8afd9 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4130,18 +4130,27 @@ no_top:
* We release `count' blocks on disk, but (last - first) may be greater
* than `count' because there can be holes in there.
*/
-static void ext4_clear_blocks(handle_t *handle, struct inode *inode,
- struct buffer_head *bh,
- ext4_fsblk_t block_to_free,
- unsigned long count, __le32 *first,
- __le32 *last)
+static int ext4_clear_blocks(handle_t *handle, struct inode *inode,
+ struct buffer_head *bh,
+ ext4_fsblk_t block_to_free,
+ unsigned long count, __le32 *first,
+ __le32 *last)
{
__le32 *p;
- int flags = EXT4_FREE_BLOCKS_FORGET;
+ int flags = EXT4_FREE_BLOCKS_FORGET | EXT4_FREE_BLOCKS_VALIDATED;

if (S_ISDIR(inode->i_mode) || S_ISLNK(inode->i_mode))
flags |= EXT4_FREE_BLOCKS_METADATA;

+ if (!ext4_data_block_valid(EXT4_SB(inode->i_sb), block_to_free,
+ count)) {
+ ext4_error(inode->i_sb, __func__, "inode #%lu: "
+ "attempt to clear blocks %llu len %lu, invalid",
+ inode->i_ino, (unsigned long long) block_to_free,
+ count);
+ return 1;
+ }
+
if (try_to_extend_transaction(handle, inode)) {
if (bh) {
BUFFER_TRACE(bh, "call ext4_handle_dirty_metadata");
@@ -4160,6 +4169,7 @@ static void ext4_clear_blocks(handle_t *handle, struct inode *inode,
*p = 0;

ext4_free_blocks(handle, inode, 0, block_to_free, count, flags);
+ return 0;
}

/**
@@ -4215,9 +4225,10 @@ static void ext4_free_data(handle_t *handle, struct inode *inode,
} else if (nr == block_to_free + count) {
count++;
} else {
- ext4_clear_blocks(handle, inode, this_bh,
- block_to_free,
- count, block_to_free_p, p);
+ if (ext4_clear_blocks(handle, inode, this_bh,
+ block_to_free, count,
+ block_to_free_p, p))
+ break;
block_to_free = nr;
block_to_free_p = p;
count = 1;
@@ -4281,6 +4292,16 @@ static void ext4_free_branches(handle_t *handle, struct inode *inode,
if (!nr)
continue; /* A hole */

+ if (!ext4_data_block_valid(EXT4_SB(inode->i_sb),
+ nr, 1)) {
+ ext4_error(inode->i_sb, __func__,
+ "indirect mapped block in inode "
+ "#%lu invalid (level %d, blk #%lu)",
+ inode->i_ino, depth,
+ (unsigned long) nr);
+ break;
+ }
+
/* Go read the buffer for the next level down */
bh = sb_bread(inode->i_sb, nr);

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index d34afad..d129c10 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -4476,10 +4476,11 @@ void ext4_free_blocks(handle_t *handle, struct inode *inode,

sbi = EXT4_SB(sb);
es = EXT4_SB(sb)->s_es;
- if (!ext4_data_block_valid(sbi, block, count)) {
+ if (!(flags & EXT4_FREE_BLOCKS_VALIDATED) &&
+ !ext4_data_block_valid(sbi, block, count)) {
ext4_error(sb, __func__,
- "Freeing blocks not in datazone - "
- "block = %llu, count = %lu", block, count);
+ "Freeing blocks not in datazone - "
+ "block = %llu, count = %lu", block, count);
goto error_return;
}

--
1.6.6.1.1.g974db.dirty


2010-03-02 18:18:53

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH 09/28] ext4: move __func__ into a macro for ext4_warning, ext4_error

From: Eric Sandeen <[email protected]>

Just a pet peeve of mine; we had a mishash of calls with either __func__
or "function_name" and the latter tends to get out of sync.

I think it's easier to just hide the __func__ in a macro, and it'll
be consistent from then on.

Signed-off-by: Eric Sandeen <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/ext4/balloc.c | 29 +++++---------
fs/ext4/dir.c | 4 +-
fs/ext4/ext4.h | 7 ++-
fs/ext4/ext4_jbd2.c | 2 +-
fs/ext4/extents.c | 9 ++--
fs/ext4/ialloc.c | 27 ++++---------
fs/ext4/inode.c | 43 +++++++++------------
fs/ext4/mballoc.c | 27 +++++--------
fs/ext4/move_extent.c | 12 +++---
fs/ext4/namei.c | 51 ++++++++++--------------
fs/ext4/resize.c | 102 ++++++++++++++++++------------------------------
fs/ext4/super.c | 11 ++---
fs/ext4/xattr.c | 32 +++++++---------
13 files changed, 146 insertions(+), 210 deletions(-)

diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
index 22bc743..720061a 100644
--- a/fs/ext4/balloc.c
+++ b/fs/ext4/balloc.c
@@ -97,8 +97,8 @@ unsigned ext4_init_block_bitmap(struct super_block *sb, struct buffer_head *bh,
/* If checksum is bad mark all blocks used to prevent allocation
* essentially implementing a per-group read-only flag. */
if (!ext4_group_desc_csum_verify(sbi, block_group, gdp)) {
- ext4_error(sb, __func__,
- "Checksum bad for group %u", block_group);
+ ext4_error(sb, "Checksum bad for group %u",
+ block_group);
ext4_free_blks_set(sb, gdp, 0);
ext4_free_inodes_set(sb, gdp, 0);
ext4_itable_unused_set(sb, gdp, 0);
@@ -210,10 +210,8 @@ struct ext4_group_desc * ext4_get_group_desc(struct super_block *sb,
struct ext4_sb_info *sbi = EXT4_SB(sb);

if (block_group >= ngroups) {
- ext4_error(sb, "ext4_get_group_desc",
- "block_group >= groups_count - "
- "block_group = %u, groups_count = %u",
- block_group, ngroups);
+ ext4_error(sb, "block_group >= groups_count - block_group = %u,"
+ " groups_count = %u", block_group, ngroups);

return NULL;
}
@@ -221,8 +219,7 @@ struct ext4_group_desc * ext4_get_group_desc(struct super_block *sb,
group_desc = block_group >> EXT4_DESC_PER_BLOCK_BITS(sb);
offset = block_group & (EXT4_DESC_PER_BLOCK(sb) - 1);
if (!sbi->s_group_desc[group_desc]) {
- ext4_error(sb, "ext4_get_group_desc",
- "Group descriptor not loaded - "
+ ext4_error(sb, "Group descriptor not loaded - "
"block_group = %u, group_desc = %u, desc = %u",
block_group, group_desc, offset);
return NULL;
@@ -282,9 +279,7 @@ static int ext4_valid_block_bitmap(struct super_block *sb,
return 1;

err_out:
- ext4_error(sb, __func__,
- "Invalid block bitmap - "
- "block_group = %d, block = %llu",
+ ext4_error(sb, "Invalid block bitmap - block_group = %d, block = %llu",
block_group, bitmap_blk);
return 0;
}
@@ -311,8 +306,7 @@ ext4_read_block_bitmap(struct super_block *sb, ext4_group_t block_group)
bitmap_blk = ext4_block_bitmap(sb, desc);
bh = sb_getblk(sb, bitmap_blk);
if (unlikely(!bh)) {
- ext4_error(sb, __func__,
- "Cannot read block bitmap - "
+ ext4_error(sb, "Cannot read block bitmap - "
"block_group = %u, block_bitmap = %llu",
block_group, bitmap_blk);
return NULL;
@@ -354,8 +348,7 @@ ext4_read_block_bitmap(struct super_block *sb, ext4_group_t block_group)
set_bitmap_uptodate(bh);
if (bh_submit_read(bh) < 0) {
put_bh(bh);
- ext4_error(sb, __func__,
- "Cannot read block bitmap - "
+ ext4_error(sb, "Cannot read block bitmap - "
"block_group = %u, block_bitmap = %llu",
block_group, bitmap_blk);
return NULL;
@@ -419,8 +412,7 @@ void ext4_add_groupblocks(handle_t *handle, struct super_block *sb,
in_range(block, ext4_inode_table(sb, desc), sbi->s_itb_per_group) ||
in_range(block + count - 1, ext4_inode_table(sb, desc),
sbi->s_itb_per_group)) {
- ext4_error(sb, __func__,
- "Adding blocks in system zones - "
+ ext4_error(sb, "Adding blocks in system zones - "
"Block = %llu, count = %lu",
block, count);
goto error_return;
@@ -453,8 +445,7 @@ void ext4_add_groupblocks(handle_t *handle, struct super_block *sb,
BUFFER_TRACE(bitmap_bh, "clear bit");
if (!ext4_clear_bit_atomic(ext4_group_lock_ptr(sb, block_group),
bit + i, bitmap_bh->b_data)) {
- ext4_error(sb, __func__,
- "bit already cleared for block %llu",
+ ext4_error(sb, "bit already cleared for block %llu",
(ext4_fsblk_t)(block + i));
BUFFER_TRACE(bitmap_bh, "bit already cleared");
} else {
diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c
index 9dc9316..0e0bef3 100644
--- a/fs/ext4/dir.c
+++ b/fs/ext4/dir.c
@@ -83,7 +83,7 @@ int ext4_check_dir_entry(const char *function, struct inode *dir,
error_msg = "inode out of bounds";

if (error_msg != NULL)
- ext4_error(dir->i_sb, function,
+ __ext4_error(dir->i_sb, function,
"bad entry in directory #%lu: %s - "
"offset=%u, inode=%u, rec_len=%d, name_len=%d",
dir->i_ino, error_msg, offset,
@@ -150,7 +150,7 @@ static int ext4_readdir(struct file *filp,
*/
if (!bh) {
if (!dir_has_error) {
- ext4_error(sb, __func__, "directory #%lu "
+ ext4_error(sb, "directory #%lu "
"contains a hole at offset %Lu",
inode->i_ino,
(unsigned long long) filp->f_pos);
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 61cf3b3..509437f 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1486,13 +1486,16 @@ extern int ext4_group_extend(struct super_block *sb,
ext4_fsblk_t n_blocks_count);

/* super.c */
-extern void ext4_error(struct super_block *, const char *, const char *, ...)
+extern void __ext4_error(struct super_block *, const char *, const char *, ...)
__attribute__ ((format (printf, 3, 4)));
+#define ext4_error(sb, message...) __ext4_error(sb, __func__, ## message)
extern void __ext4_std_error(struct super_block *, const char *, int);
extern void ext4_abort(struct super_block *, const char *, const char *, ...)
__attribute__ ((format (printf, 3, 4)));
-extern void ext4_warning(struct super_block *, const char *, const char *, ...)
+extern void __ext4_warning(struct super_block *, const char *,
+ const char *, ...)
__attribute__ ((format (printf, 3, 4)));
+#define ext4_warning(sb, message...) __ext4_warning(sb, __func__, ## message)
extern void ext4_msg(struct super_block *, const char *, const char *, ...)
__attribute__ ((format (printf, 3, 4)));
extern void ext4_grp_locked_error(struct super_block *, ext4_group_t,
diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c
index b57e5c7..2f407c4 100644
--- a/fs/ext4/ext4_jbd2.c
+++ b/fs/ext4/ext4_jbd2.c
@@ -132,7 +132,7 @@ int __ext4_handle_dirty_metadata(const char *where, handle_t *handle,
if (inode && inode_needs_sync(inode)) {
sync_dirty_buffer(bh);
if (buffer_req(bh) && !buffer_uptodate(bh)) {
- ext4_error(inode->i_sb, __func__,
+ ext4_error(inode->i_sb,
"IO error syncing inode, "
"inode=%lu, block=%llu",
inode->i_ino,
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 5461615..bd80891 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -440,7 +440,7 @@ static int __ext4_ext_check(const char *function, struct inode *inode,
return 0;

corrupted:
- ext4_error(inode->i_sb, function,
+ __ext4_error(inode->i_sb, function,
"bad header/extent in inode #%lu: %s - magic %x, "
"entries %u, max %u(%u), depth %u(%u)",
inode->i_ino, error_msg, le16_to_cpu(eh->eh_magic),
@@ -1538,8 +1538,9 @@ int ext4_ext_try_to_merge(struct inode *inode,
merge_done = 1;
WARN_ON(eh->eh_entries == 0);
if (!eh->eh_entries)
- ext4_error(inode->i_sb, "ext4_ext_try_to_merge",
- "inode#%lu, eh->eh_entries = 0!", inode->i_ino);
+ ext4_error(inode->i_sb,
+ "inode#%lu, eh->eh_entries = 0!",
+ inode->i_ino);
}

return merge_done;
@@ -3238,7 +3239,7 @@ int ext4_ext_get_blocks(handle_t *handle, struct inode *inode,
* this is why assert can't be put in ext4_ext_find_extent()
*/
if (path[depth].p_ext == NULL && depth != 0) {
- ext4_error(inode->i_sb, __func__, "bad extent address "
+ ext4_error(inode->i_sb, "bad extent address "
"inode: %lu, iblock: %d, depth: %d",
inode->i_ino, iblock, depth);
err = -EIO;
diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index 2fab5ad..e4aaf61 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -76,8 +76,7 @@ unsigned ext4_init_inode_bitmap(struct super_block *sb, struct buffer_head *bh,
/* If checksum is bad mark all blocks and inodes use to prevent
* allocation, essentially implementing a per-group read-only flag. */
if (!ext4_group_desc_csum_verify(sbi, block_group, gdp)) {
- ext4_error(sb, __func__, "Checksum bad for group %u",
- block_group);
+ ext4_error(sb, "Checksum bad for group %u", block_group);
ext4_free_blks_set(sb, gdp, 0);
ext4_free_inodes_set(sb, gdp, 0);
ext4_itable_unused_set(sb, gdp, 0);
@@ -111,8 +110,7 @@ ext4_read_inode_bitmap(struct super_block *sb, ext4_group_t block_group)
bitmap_blk = ext4_inode_bitmap(sb, desc);
bh = sb_getblk(sb, bitmap_blk);
if (unlikely(!bh)) {
- ext4_error(sb, __func__,
- "Cannot read inode bitmap - "
+ ext4_error(sb, "Cannot read inode bitmap - "
"block_group = %u, inode_bitmap = %llu",
block_group, bitmap_blk);
return NULL;
@@ -153,8 +151,7 @@ ext4_read_inode_bitmap(struct super_block *sb, ext4_group_t block_group)
set_bitmap_uptodate(bh);
if (bh_submit_read(bh) < 0) {
put_bh(bh);
- ext4_error(sb, __func__,
- "Cannot read inode bitmap - "
+ ext4_error(sb, "Cannot read inode bitmap - "
"block_group = %u, inode_bitmap = %llu",
block_group, bitmap_blk);
return NULL;
@@ -229,8 +226,7 @@ void ext4_free_inode(handle_t *handle, struct inode *inode)

es = EXT4_SB(sb)->s_es;
if (ino < EXT4_FIRST_INO(sb) || ino > le32_to_cpu(es->s_inodes_count)) {
- ext4_error(sb, "ext4_free_inode",
- "reserved or nonexistent inode %lu", ino);
+ ext4_error(sb, "reserved or nonexistent inode %lu", ino);
goto error_return;
}
block_group = (ino - 1) / EXT4_INODES_PER_GROUP(sb);
@@ -248,8 +244,7 @@ void ext4_free_inode(handle_t *handle, struct inode *inode)
cleared = ext4_clear_bit_atomic(ext4_group_lock_ptr(sb, block_group),
bit, bitmap_bh->b_data);
if (!cleared)
- ext4_error(sb, "ext4_free_inode",
- "bit already cleared for inode %lu", ino);
+ ext4_error(sb, "bit already cleared for inode %lu", ino);
else {
gdp = ext4_get_group_desc(sb, block_group, &bh2);

@@ -736,8 +731,7 @@ static int ext4_claim_inode(struct super_block *sb,
if ((group == 0 && ino < EXT4_FIRST_INO(sb)) ||
ino > EXT4_INODES_PER_GROUP(sb)) {
ext4_unlock_group(sb, group);
- ext4_error(sb, __func__,
- "reserved inode or inode > inodes count - "
+ ext4_error(sb, "reserved inode or inode > inodes count - "
"block_group = %u, inode=%lu", group,
ino + group * EXT4_INODES_PER_GROUP(sb));
return 1;
@@ -1099,8 +1093,7 @@ struct inode *ext4_orphan_get(struct super_block *sb, unsigned long ino)

/* Error cases - e2fsck has already cleaned up for us */
if (ino > max_ino) {
- ext4_warning(sb, __func__,
- "bad orphan ino %lu! e2fsck was run?", ino);
+ ext4_warning(sb, "bad orphan ino %lu! e2fsck was run?", ino);
goto error;
}

@@ -1108,8 +1101,7 @@ struct inode *ext4_orphan_get(struct super_block *sb, unsigned long ino)
bit = (ino - 1) % EXT4_INODES_PER_GROUP(sb);
bitmap_bh = ext4_read_inode_bitmap(sb, block_group);
if (!bitmap_bh) {
- ext4_warning(sb, __func__,
- "inode bitmap error for orphan %lu", ino);
+ ext4_warning(sb, "inode bitmap error for orphan %lu", ino);
goto error;
}

@@ -1141,8 +1133,7 @@ iget_failed:
err = PTR_ERR(inode);
inode = NULL;
bad_orphan:
- ext4_warning(sb, __func__,
- "bad orphan inode %lu! e2fsck was run?", ino);
+ ext4_warning(sb, "bad orphan inode %lu! e2fsck was run?", ino);
printk(KERN_NOTICE "ext4_test_bit(bit=%d, block=%llu) = %d\n",
bit, (unsigned long long)bitmap_bh->b_blocknr,
ext4_test_bit(bit, bitmap_bh->b_data));
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 3e53011..536067b 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -194,7 +194,7 @@ void ext4_delete_inode(struct inode *inode)
inode->i_size = 0;
err = ext4_mark_inode_dirty(handle, inode);
if (err) {
- ext4_warning(inode->i_sb, __func__,
+ ext4_warning(inode->i_sb,
"couldn't mark inode dirty (err %d)", err);
goto stop_handle;
}
@@ -212,7 +212,7 @@ void ext4_delete_inode(struct inode *inode)
if (err > 0)
err = ext4_journal_restart(handle, 3);
if (err != 0) {
- ext4_warning(inode->i_sb, __func__,
+ ext4_warning(inode->i_sb,
"couldn't extend journal (err %d)", err);
stop_handle:
ext4_journal_stop(handle);
@@ -323,8 +323,7 @@ static int ext4_block_to_path(struct inode *inode,
offsets[n++] = i_block & (ptrs - 1);
final = ptrs;
} else {
- ext4_warning(inode->i_sb, "ext4_block_to_path",
- "block %lu > max in inode %lu",
+ ext4_warning(inode->i_sb, "block %lu > max in inode %lu",
i_block + direct_blocks +
indirect_blocks + double_blocks, inode->i_ino);
}
@@ -344,7 +343,7 @@ static int __ext4_check_blockref(const char *function, struct inode *inode,
if (blk &&
unlikely(!ext4_data_block_valid(EXT4_SB(inode->i_sb),
blk, 1))) {
- ext4_error(inode->i_sb, function,
+ __ext4_error(inode->i_sb, function,
"invalid block reference %u "
"in inode #%lu", blk, inode->i_ino);
return -EIO;
@@ -1125,7 +1124,7 @@ static int check_block_validity(struct inode *inode, const char *msg,
sector_t logical, sector_t phys, int len)
{
if (!ext4_data_block_valid(EXT4_SB(inode->i_sb), phys, len)) {
- ext4_error(inode->i_sb, msg,
+ __ext4_error(inode->i_sb, msg,
"inode #%lu logical block %llu mapped to %llu "
"(size %d)", inode->i_ino,
(unsigned long long) logical,
@@ -4147,7 +4146,7 @@ static int ext4_clear_blocks(handle_t *handle, struct inode *inode,

if (!ext4_data_block_valid(EXT4_SB(inode->i_sb), block_to_free,
count)) {
- ext4_error(inode->i_sb, __func__, "inode #%lu: "
+ ext4_error(inode->i_sb, "inode #%lu: "
"attempt to clear blocks %llu len %lu, invalid",
inode->i_ino, (unsigned long long) block_to_free,
count);
@@ -4255,7 +4254,7 @@ static void ext4_free_data(handle_t *handle, struct inode *inode,
if ((EXT4_JOURNAL(inode) == NULL) || bh2jh(this_bh))
ext4_handle_dirty_metadata(handle, inode, this_bh);
else
- ext4_error(inode->i_sb, __func__,
+ ext4_error(inode->i_sb,
"circular indirect block detected, "
"inode=%lu, block=%llu",
inode->i_ino,
@@ -4297,7 +4296,7 @@ static void ext4_free_branches(handle_t *handle, struct inode *inode,

if (!ext4_data_block_valid(EXT4_SB(inode->i_sb),
nr, 1)) {
- ext4_error(inode->i_sb, __func__,
+ ext4_error(inode->i_sb,
"indirect mapped block in inode "
"#%lu invalid (level %d, blk #%lu)",
inode->i_ino, depth,
@@ -4313,7 +4312,7 @@ static void ext4_free_branches(handle_t *handle, struct inode *inode,
* (should be rare).
*/
if (!bh) {
- ext4_error(inode->i_sb, "ext4_free_branches",
+ ext4_error(inode->i_sb,
"Read failure, inode=%lu, block=%llu",
inode->i_ino, nr);
continue;
@@ -4628,9 +4627,8 @@ static int __ext4_get_inode_loc(struct inode *inode,

bh = sb_getblk(sb, block);
if (!bh) {
- ext4_error(sb, "ext4_get_inode_loc", "unable to read "
- "inode block - inode=%lu, block=%llu",
- inode->i_ino, block);
+ ext4_error(sb, "unable to read inode block - "
+ "inode=%lu, block=%llu", inode->i_ino, block);
return -EIO;
}
if (!buffer_uptodate(bh)) {
@@ -4728,9 +4726,8 @@ make_io:
submit_bh(READ_META, bh);
wait_on_buffer(bh);
if (!buffer_uptodate(bh)) {
- ext4_error(sb, __func__,
- "unable to read inode block - inode=%lu, "
- "block=%llu", inode->i_ino, block);
+ ext4_error(sb, "unable to read inode block - inode=%lu,"
+ " block=%llu", inode->i_ino, block);
brelse(bh);
return -EIO;
}
@@ -4941,8 +4938,7 @@ struct inode *ext4_iget(struct super_block *sb, unsigned long ino)
ret = 0;
if (ei->i_file_acl &&
!ext4_data_block_valid(EXT4_SB(sb), ei->i_file_acl, 1)) {
- ext4_error(sb, __func__,
- "bad extended attribute block %llu in inode #%lu",
+ ext4_error(sb, "bad extended attribute block %llu inode #%lu",
ei->i_file_acl, inode->i_ino);
ret = -EIO;
goto bad_inode;
@@ -4988,8 +4984,7 @@ struct inode *ext4_iget(struct super_block *sb, unsigned long ino)
new_decode_dev(le32_to_cpu(raw_inode->i_block[1])));
} else {
ret = -EIO;
- ext4_error(inode->i_sb, __func__,
- "bogus i_mode (%o) for inode=%lu",
+ ext4_error(inode->i_sb, "bogus i_mode (%o) for inode=%lu",
inode->i_mode, inode->i_ino);
goto bad_inode;
}
@@ -5228,10 +5223,8 @@ int ext4_write_inode(struct inode *inode, int wait)
if (wait)
sync_dirty_buffer(iloc.bh);
if (buffer_req(iloc.bh) && !buffer_uptodate(iloc.bh)) {
- ext4_error(inode->i_sb, __func__,
- "IO error syncing inode, "
- "inode=%lu, block=%llu",
- inode->i_ino,
+ ext4_error(inode->i_sb, "IO error syncing inode, "
+ "inode=%lu, block=%llu", inode->i_ino,
(unsigned long long)iloc.bh->b_blocknr);
err = -EIO;
}
@@ -5644,7 +5637,7 @@ int ext4_mark_inode_dirty(handle_t *handle, struct inode *inode)
EXT4_STATE_NO_EXPAND);
if (mnt_count !=
le16_to_cpu(sbi->s_es->s_mnt_count)) {
- ext4_warning(inode->i_sb, __func__,
+ ext4_warning(inode->i_sb,
"Unable to expand inode %lu. Delete"
" some EAs or run e2fsck.",
inode->i_ino);
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index d129c10..415e11f 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -2709,8 +2709,7 @@ ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac,

len = ac->ac_b_ex.fe_len;
if (!ext4_data_block_valid(sbi, block, len)) {
- ext4_error(sb, __func__,
- "Allocating blocks %llu-%llu which overlap "
+ ext4_error(sb, "Allocating blocks %llu-%llu which overlap "
"fs metadata\n", block, block+len);
/* File system mounted not to panic on error
* Fix the bitmap and repeat the block allocation
@@ -3623,15 +3622,13 @@ ext4_mb_discard_group_preallocations(struct super_block *sb,

bitmap_bh = ext4_read_block_bitmap(sb, group);
if (bitmap_bh == NULL) {
- ext4_error(sb, __func__, "Error in reading block "
- "bitmap for %u", group);
+ ext4_error(sb, "Error reading block bitmap for %u", group);
return 0;
}

err = ext4_mb_load_buddy(sb, group, &e4b);
if (err) {
- ext4_error(sb, __func__, "Error in loading buddy "
- "information for %u", group);
+ ext4_error(sb, "Error loading buddy information for %u", group);
put_bh(bitmap_bh);
return 0;
}
@@ -3804,15 +3801,15 @@ repeat:

err = ext4_mb_load_buddy(sb, group, &e4b);
if (err) {
- ext4_error(sb, __func__, "Error in loading buddy "
- "information for %u", group);
+ ext4_error(sb, "Error loading buddy information for %u",
+ group);
continue;
}

bitmap_bh = ext4_read_block_bitmap(sb, group);
if (bitmap_bh == NULL) {
- ext4_error(sb, __func__, "Error in reading block "
- "bitmap for %u", group);
+ ext4_error(sb, "Error reading block bitmap for %u",
+ group);
ext4_mb_release_desc(&e4b);
continue;
}
@@ -4077,8 +4074,8 @@ ext4_mb_discard_lg_preallocations(struct super_block *sb,

ext4_get_group_no_and_offset(sb, pa->pa_pstart, &group, NULL);
if (ext4_mb_load_buddy(sb, group, &e4b)) {
- ext4_error(sb, __func__, "Error in loading buddy "
- "information for %u", group);
+ ext4_error(sb, "Error loading buddy information for %u",
+ group);
continue;
}
ext4_lock_group(sb, group);
@@ -4478,8 +4475,7 @@ void ext4_free_blocks(handle_t *handle, struct inode *inode,
es = EXT4_SB(sb)->s_es;
if (!(flags & EXT4_FREE_BLOCKS_VALIDATED) &&
!ext4_data_block_valid(sbi, block, count)) {
- ext4_error(sb, __func__,
- "Freeing blocks not in datazone - "
+ ext4_error(sb, "Freeing blocks not in datazone - "
"block = %llu, count = %lu", block, count);
goto error_return;
}
@@ -4548,8 +4544,7 @@ do_more:
in_range(block + count - 1, ext4_inode_table(sb, gdp),
EXT4_SB(sb)->s_itb_per_group)) {

- ext4_error(sb, __func__,
- "Freeing blocks in system zone - "
+ ext4_error(sb, "Freeing blocks in system zone - "
"Block = %llu, count = %lu", block, count);
/* err = 0. ext4_std_error should be a no op */
goto error_return;
diff --git a/fs/ext4/move_extent.c b/fs/ext4/move_extent.c
index 82c415b..1654eb8 100644
--- a/fs/ext4/move_extent.c
+++ b/fs/ext4/move_extent.c
@@ -152,12 +152,12 @@ mext_check_null_inode(struct inode *inode1, struct inode *inode2,
int ret = 0;

if (inode1 == NULL) {
- ext4_error(inode2->i_sb, function,
+ __ext4_error(inode2->i_sb, function,
"Both inodes should not be NULL: "
"inode1 NULL inode2 %lu", inode2->i_ino);
ret = -EIO;
} else if (inode2 == NULL) {
- ext4_error(inode1->i_sb, function,
+ __ext4_error(inode1->i_sb, function,
"Both inodes should not be NULL: "
"inode1 %lu inode2 NULL", inode1->i_ino);
ret = -EIO;
@@ -526,7 +526,7 @@ mext_leaf_block(handle_t *handle, struct inode *orig_inode,
* new_ext |-------|
*/
if (le32_to_cpu(oext->ee_block) + oext_alen - 1 < new_ext_end) {
- ext4_error(orig_inode->i_sb, __func__,
+ ext4_error(orig_inode->i_sb,
"new_ext_end(%u) should be less than or equal to "
"oext->ee_block(%u) + oext_alen(%d) - 1",
new_ext_end, le32_to_cpu(oext->ee_block),
@@ -689,12 +689,12 @@ mext_replace_branches(handle_t *handle, struct inode *orig_inode,
while (1) {
/* The extent for donor must be found. */
if (!dext) {
- ext4_error(donor_inode->i_sb, __func__,
+ ext4_error(donor_inode->i_sb,
"The extent for donor must be found");
*err = -EIO;
goto out;
} else if (donor_off != le32_to_cpu(tmp_dext.ee_block)) {
- ext4_error(donor_inode->i_sb, __func__,
+ ext4_error(donor_inode->i_sb,
"Donor offset(%u) and the first block of donor "
"extent(%u) should be equal",
donor_off,
@@ -1351,7 +1351,7 @@ ext4_move_extents(struct file *o_filp, struct file *d_filp,
if (ret1 < 0)
break;
if (*moved_len > len) {
- ext4_error(orig_inode->i_sb, __func__,
+ ext4_error(orig_inode->i_sb,
"We replaced blocks too much! "
"sum of replaced: %llu requested: %llu",
*moved_len, len);
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 17a17e1..bd2dc0b 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -383,8 +383,7 @@ dx_probe(const struct qstr *d_name, struct inode *dir,
if (root->info.hash_version != DX_HASH_TEA &&
root->info.hash_version != DX_HASH_HALF_MD4 &&
root->info.hash_version != DX_HASH_LEGACY) {
- ext4_warning(dir->i_sb, __func__,
- "Unrecognised inode hash code %d",
+ ext4_warning(dir->i_sb, "Unrecognised inode hash code %d",
root->info.hash_version);
brelse(bh);
*err = ERR_BAD_DX_DIR;
@@ -399,8 +398,7 @@ dx_probe(const struct qstr *d_name, struct inode *dir,
hash = hinfo->hash;

if (root->info.unused_flags & 1) {
- ext4_warning(dir->i_sb, __func__,
- "Unimplemented inode hash flags: %#06x",
+ ext4_warning(dir->i_sb, "Unimplemented inode hash flags: %#06x",
root->info.unused_flags);
brelse(bh);
*err = ERR_BAD_DX_DIR;
@@ -408,8 +406,7 @@ dx_probe(const struct qstr *d_name, struct inode *dir,
}

if ((indirect = root->info.indirect_levels) > 1) {
- ext4_warning(dir->i_sb, __func__,
- "Unimplemented inode hash depth: %#06x",
+ ext4_warning(dir->i_sb, "Unimplemented inode hash depth: %#06x",
root->info.indirect_levels);
brelse(bh);
*err = ERR_BAD_DX_DIR;
@@ -421,8 +418,7 @@ dx_probe(const struct qstr *d_name, struct inode *dir,

if (dx_get_limit(entries) != dx_root_limit(dir,
root->info.info_length)) {
- ext4_warning(dir->i_sb, __func__,
- "dx entry: limit != root limit");
+ ext4_warning(dir->i_sb, "dx entry: limit != root limit");
brelse(bh);
*err = ERR_BAD_DX_DIR;
goto fail;
@@ -433,7 +429,7 @@ dx_probe(const struct qstr *d_name, struct inode *dir,
{
count = dx_get_count(entries);
if (!count || count > dx_get_limit(entries)) {
- ext4_warning(dir->i_sb, __func__,
+ ext4_warning(dir->i_sb,
"dx entry: no count or count > limit");
brelse(bh);
*err = ERR_BAD_DX_DIR;
@@ -478,7 +474,7 @@ dx_probe(const struct qstr *d_name, struct inode *dir,
goto fail2;
at = entries = ((struct dx_node *) bh->b_data)->entries;
if (dx_get_limit(entries) != dx_node_limit (dir)) {
- ext4_warning(dir->i_sb, __func__,
+ ext4_warning(dir->i_sb,
"dx entry: limit != node limit");
brelse(bh);
*err = ERR_BAD_DX_DIR;
@@ -494,7 +490,7 @@ fail2:
}
fail:
if (*err == ERR_BAD_DX_DIR)
- ext4_warning(dir->i_sb, __func__,
+ ext4_warning(dir->i_sb,
"Corrupt dir inode %ld, running e2fsck is "
"recommended.", dir->i_ino);
return NULL;
@@ -947,9 +943,8 @@ restart:
wait_on_buffer(bh);
if (!buffer_uptodate(bh)) {
/* read error, skip block & hope for the best */
- ext4_error(sb, __func__, "reading directory #%lu "
- "offset %lu", dir->i_ino,
- (unsigned long)block);
+ ext4_error(sb, "reading directory #%lu offset %lu",
+ dir->i_ino, (unsigned long)block);
brelse(bh);
goto next;
}
@@ -1041,7 +1036,7 @@ static struct buffer_head * ext4_dx_find_entry(struct inode *dir, const struct q
retval = ext4_htree_next_block(dir, hash, frame,
frames, NULL);
if (retval < 0) {
- ext4_warning(sb, __func__,
+ ext4_warning(sb,
"error reading index page in directory #%lu",
dir->i_ino);
*err = retval;
@@ -1071,14 +1066,13 @@ static struct dentry *ext4_lookup(struct inode *dir, struct dentry *dentry, stru
__u32 ino = le32_to_cpu(de->inode);
brelse(bh);
if (!ext4_valid_inum(dir->i_sb, ino)) {
- ext4_error(dir->i_sb, "ext4_lookup",
- "bad inode number: %u", ino);
+ ext4_error(dir->i_sb, "bad inode number: %u", ino);
return ERR_PTR(-EIO);
}
inode = ext4_iget(dir->i_sb, ino);
if (unlikely(IS_ERR(inode))) {
if (PTR_ERR(inode) == -ESTALE) {
- ext4_error(dir->i_sb, __func__,
+ ext4_error(dir->i_sb,
"deleted inode referenced: %u",
ino);
return ERR_PTR(-EIO);
@@ -1110,7 +1104,7 @@ struct dentry *ext4_get_parent(struct dentry *child)
brelse(bh);

if (!ext4_valid_inum(child->d_inode->i_sb, ino)) {
- ext4_error(child->d_inode->i_sb, "ext4_get_parent",
+ ext4_error(child->d_inode->i_sb,
"bad inode number: %u", ino);
return ERR_PTR(-EIO);
}
@@ -1410,7 +1404,7 @@ static int make_indexed_dir(handle_t *handle, struct dentry *dentry,
de = (struct ext4_dir_entry_2 *)((char *)fde +
ext4_rec_len_from_disk(fde->rec_len, blocksize));
if ((char *) de >= (((char *) root) + blocksize)) {
- ext4_error(dir->i_sb, __func__,
+ ext4_error(dir->i_sb,
"invalid rec_len for '..' in inode %lu",
dir->i_ino);
brelse(bh);
@@ -1575,8 +1569,7 @@ static int ext4_dx_add_entry(handle_t *handle, struct dentry *dentry,

if (levels && (dx_get_count(frames->entries) ==
dx_get_limit(frames->entries))) {
- ext4_warning(sb, __func__,
- "Directory index full!");
+ ext4_warning(sb, "Directory index full!");
err = -ENOSPC;
goto cleanup;
}
@@ -1916,11 +1909,11 @@ static int empty_dir(struct inode *inode)
if (inode->i_size < EXT4_DIR_REC_LEN(1) + EXT4_DIR_REC_LEN(2) ||
!(bh = ext4_bread(NULL, inode, 0, 0, &err))) {
if (err)
- ext4_error(inode->i_sb, __func__,
+ ext4_error(inode->i_sb,
"error %d reading directory #%lu offset 0",
err, inode->i_ino);
else
- ext4_warning(inode->i_sb, __func__,
+ ext4_warning(inode->i_sb,
"bad directory (dir #%lu) - no data block",
inode->i_ino);
return 1;
@@ -1931,7 +1924,7 @@ static int empty_dir(struct inode *inode)
!le32_to_cpu(de1->inode) ||
strcmp(".", de->name) ||
strcmp("..", de1->name)) {
- ext4_warning(inode->i_sb, "empty_dir",
+ ext4_warning(inode->i_sb,
"bad directory (dir #%lu) - no `.' or `..'",
inode->i_ino);
brelse(bh);
@@ -1949,7 +1942,7 @@ static int empty_dir(struct inode *inode)
offset >> EXT4_BLOCK_SIZE_BITS(sb), 0, &err);
if (!bh) {
if (err)
- ext4_error(sb, __func__,
+ ext4_error(sb,
"error %d reading directory"
" #%lu offset %u",
err, inode->i_ino, offset);
@@ -2163,7 +2156,7 @@ static int ext4_rmdir(struct inode *dir, struct dentry *dentry)
if (retval)
goto end_rmdir;
if (!EXT4_DIR_LINK_EMPTY(inode))
- ext4_warning(inode->i_sb, "ext4_rmdir",
+ ext4_warning(inode->i_sb,
"empty directory has too many links (%d)",
inode->i_nlink);
inode->i_version++;
@@ -2215,7 +2208,7 @@ static int ext4_unlink(struct inode *dir, struct dentry *dentry)
goto end_unlink;

if (!inode->i_nlink) {
- ext4_warning(inode->i_sb, "ext4_unlink",
+ ext4_warning(inode->i_sb,
"Deleting nonexistent file (%lu), %d",
inode->i_ino, inode->i_nlink);
inode->i_nlink = 1;
@@ -2462,7 +2455,7 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
}
}
if (retval) {
- ext4_warning(old_dir->i_sb, "ext4_rename",
+ ext4_warning(old_dir->i_sb,
"Deleting old file (%lu), %d, error=%d",
old_dir->i_ino, old_dir->i_nlink, retval);
}
diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index 3b2c554..5692c48 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -48,65 +48,54 @@ static int verify_group_input(struct super_block *sb,

ext4_get_group_no_and_offset(sb, start, NULL, &offset);
if (group != sbi->s_groups_count)
- ext4_warning(sb, __func__,
- "Cannot add at group %u (only %u groups)",
+ ext4_warning(sb, "Cannot add at group %u (only %u groups)",
input->group, sbi->s_groups_count);
else if (offset != 0)
- ext4_warning(sb, __func__, "Last group not full");
+ ext4_warning(sb, "Last group not full");
else if (input->reserved_blocks > input->blocks_count / 5)
- ext4_warning(sb, __func__, "Reserved blocks too high (%u)",
+ ext4_warning(sb, "Reserved blocks too high (%u)",
input->reserved_blocks);
else if (free_blocks_count < 0)
- ext4_warning(sb, __func__, "Bad blocks count %u",
+ ext4_warning(sb, "Bad blocks count %u",
input->blocks_count);
else if (!(bh = sb_bread(sb, end - 1)))
- ext4_warning(sb, __func__,
- "Cannot read last block (%llu)",
+ ext4_warning(sb, "Cannot read last block (%llu)",
end - 1);
else if (outside(input->block_bitmap, start, end))
- ext4_warning(sb, __func__,
- "Block bitmap not in group (block %llu)",
+ ext4_warning(sb, "Block bitmap not in group (block %llu)",
(unsigned long long)input->block_bitmap);
else if (outside(input->inode_bitmap, start, end))
- ext4_warning(sb, __func__,
- "Inode bitmap not in group (block %llu)",
+ ext4_warning(sb, "Inode bitmap not in group (block %llu)",
(unsigned long long)input->inode_bitmap);
else if (outside(input->inode_table, start, end) ||
outside(itend - 1, start, end))
- ext4_warning(sb, __func__,
- "Inode table not in group (blocks %llu-%llu)",
+ ext4_warning(sb, "Inode table not in group (blocks %llu-%llu)",
(unsigned long long)input->inode_table, itend - 1);
else if (input->inode_bitmap == input->block_bitmap)
- ext4_warning(sb, __func__,
- "Block bitmap same as inode bitmap (%llu)",
+ ext4_warning(sb, "Block bitmap same as inode bitmap (%llu)",
(unsigned long long)input->block_bitmap);
else if (inside(input->block_bitmap, input->inode_table, itend))
- ext4_warning(sb, __func__,
- "Block bitmap (%llu) in inode table (%llu-%llu)",
+ ext4_warning(sb, "Block bitmap (%llu) in inode table "
+ "(%llu-%llu)",
(unsigned long long)input->block_bitmap,
(unsigned long long)input->inode_table, itend - 1);
else if (inside(input->inode_bitmap, input->inode_table, itend))
- ext4_warning(sb, __func__,
- "Inode bitmap (%llu) in inode table (%llu-%llu)",
+ ext4_warning(sb, "Inode bitmap (%llu) in inode table "
+ "(%llu-%llu)",
(unsigned long long)input->inode_bitmap,
(unsigned long long)input->inode_table, itend - 1);
else if (inside(input->block_bitmap, start, metaend))
- ext4_warning(sb, __func__,
- "Block bitmap (%llu) in GDT table"
- " (%llu-%llu)",
+ ext4_warning(sb, "Block bitmap (%llu) in GDT table (%llu-%llu)",
(unsigned long long)input->block_bitmap,
start, metaend - 1);
else if (inside(input->inode_bitmap, start, metaend))
- ext4_warning(sb, __func__,
- "Inode bitmap (%llu) in GDT table"
- " (%llu-%llu)",
+ ext4_warning(sb, "Inode bitmap (%llu) in GDT table (%llu-%llu)",
(unsigned long long)input->inode_bitmap,
start, metaend - 1);
else if (inside(input->inode_table, start, metaend) ||
inside(itend - 1, start, metaend))
- ext4_warning(sb, __func__,
- "Inode table (%llu-%llu) overlaps"
- "GDT table (%llu-%llu)",
+ ext4_warning(sb, "Inode table (%llu-%llu) overlaps GDT table "
+ "(%llu-%llu)",
(unsigned long long)input->inode_table,
itend - 1, start, metaend - 1);
else
@@ -364,8 +353,7 @@ static int verify_reserved_gdb(struct super_block *sb,
while ((grp = ext4_list_backups(sb, &three, &five, &seven)) < end) {
if (le32_to_cpu(*p++) !=
grp * EXT4_BLOCKS_PER_GROUP(sb) + blk){
- ext4_warning(sb, __func__,
- "reserved GDT %llu"
+ ext4_warning(sb, "reserved GDT %llu"
" missing grp %d (%llu)",
blk, grp,
grp *
@@ -420,8 +408,7 @@ static int add_new_gdb(handle_t *handle, struct inode *inode,
*/
if (EXT4_SB(sb)->s_sbh->b_blocknr !=
le32_to_cpu(EXT4_SB(sb)->s_es->s_first_data_block)) {
- ext4_warning(sb, __func__,
- "won't resize using backup superblock at %llu",
+ ext4_warning(sb, "won't resize using backup superblock at %llu",
(unsigned long long)EXT4_SB(sb)->s_sbh->b_blocknr);
return -EPERM;
}
@@ -444,8 +431,7 @@ static int add_new_gdb(handle_t *handle, struct inode *inode,

data = (__le32 *)dind->b_data;
if (le32_to_cpu(data[gdb_num % EXT4_ADDR_PER_BLOCK(sb)]) != gdblock) {
- ext4_warning(sb, __func__,
- "new group %u GDT block %llu not reserved",
+ ext4_warning(sb, "new group %u GDT block %llu not reserved",
input->group, gdblock);
err = -EINVAL;
goto exit_dind;
@@ -468,7 +454,7 @@ static int add_new_gdb(handle_t *handle, struct inode *inode,
GFP_NOFS);
if (!n_group_desc) {
err = -ENOMEM;
- ext4_warning(sb, __func__,
+ ext4_warning(sb,
"not enough memory for %lu groups", gdb_num + 1);
goto exit_inode;
}
@@ -567,8 +553,7 @@ static int reserve_backup_gdb(handle_t *handle, struct inode *inode,
/* Get each reserved primary GDT block and verify it holds backups */
for (res = 0; res < reserved_gdb; res++, blk++) {
if (le32_to_cpu(*data) != blk) {
- ext4_warning(sb, __func__,
- "reserved block %llu"
+ ext4_warning(sb, "reserved block %llu"
" not at offset %ld",
blk,
(long)(data - (__le32 *)dind->b_data));
@@ -713,8 +698,7 @@ static void update_backups(struct super_block *sb,
*/
exit_err:
if (err) {
- ext4_warning(sb, __func__,
- "can't update backup for group %u (err %d), "
+ ext4_warning(sb, "can't update backup for group %u (err %d), "
"forcing fsck on next reboot", group, err);
sbi->s_mount_state &= ~EXT4_VALID_FS;
sbi->s_es->s_state &= cpu_to_le16(~EXT4_VALID_FS);
@@ -753,20 +737,19 @@ int ext4_group_add(struct super_block *sb, struct ext4_new_group_data *input)

if (gdb_off == 0 && !EXT4_HAS_RO_COMPAT_FEATURE(sb,
EXT4_FEATURE_RO_COMPAT_SPARSE_SUPER)) {
- ext4_warning(sb, __func__,
- "Can't resize non-sparse filesystem further");
+ ext4_warning(sb, "Can't resize non-sparse filesystem further");
return -EPERM;
}

if (ext4_blocks_count(es) + input->blocks_count <
ext4_blocks_count(es)) {
- ext4_warning(sb, __func__, "blocks_count overflow");
+ ext4_warning(sb, "blocks_count overflow");
return -EINVAL;
}

if (le32_to_cpu(es->s_inodes_count) + EXT4_INODES_PER_GROUP(sb) <
le32_to_cpu(es->s_inodes_count)) {
- ext4_warning(sb, __func__, "inodes_count overflow");
+ ext4_warning(sb, "inodes_count overflow");
return -EINVAL;
}

@@ -774,14 +757,13 @@ int ext4_group_add(struct super_block *sb, struct ext4_new_group_data *input)
if (!EXT4_HAS_COMPAT_FEATURE(sb,
EXT4_FEATURE_COMPAT_RESIZE_INODE)
|| !le16_to_cpu(es->s_reserved_gdt_blocks)) {
- ext4_warning(sb, __func__,
+ ext4_warning(sb,
"No reserved GDT blocks, can't resize");
return -EPERM;
}
inode = ext4_iget(sb, EXT4_RESIZE_INO);
if (IS_ERR(inode)) {
- ext4_warning(sb, __func__,
- "Error opening resize inode");
+ ext4_warning(sb, "Error opening resize inode");
return PTR_ERR(inode);
}
}
@@ -810,8 +792,7 @@ int ext4_group_add(struct super_block *sb, struct ext4_new_group_data *input)

mutex_lock(&sbi->s_resize_lock);
if (input->group != sbi->s_groups_count) {
- ext4_warning(sb, __func__,
- "multiple resizers run on filesystem!");
+ ext4_warning(sb, "multiple resizers run on filesystem!");
err = -EBUSY;
goto exit_journal;
}
@@ -997,13 +978,12 @@ int ext4_group_extend(struct super_block *sb, struct ext4_super_block *es,
" too large to resize to %llu blocks safely\n",
sb->s_id, n_blocks_count);
if (sizeof(sector_t) < 8)
- ext4_warning(sb, __func__, "CONFIG_LBDAF not enabled");
+ ext4_warning(sb, "CONFIG_LBDAF not enabled");
return -EINVAL;
}

if (n_blocks_count < o_blocks_count) {
- ext4_warning(sb, __func__,
- "can't shrink FS - resize aborted");
+ ext4_warning(sb, "can't shrink FS - resize aborted");
return -EBUSY;
}

@@ -1011,15 +991,14 @@ int ext4_group_extend(struct super_block *sb, struct ext4_super_block *es,
ext4_get_group_no_and_offset(sb, o_blocks_count, &group, &last);

if (last == 0) {
- ext4_warning(sb, __func__,
- "need to use ext2online to resize further");
+ ext4_warning(sb, "need to use ext2online to resize further");
return -EPERM;
}

add = EXT4_BLOCKS_PER_GROUP(sb) - last;

if (o_blocks_count + add < o_blocks_count) {
- ext4_warning(sb, __func__, "blocks_count overflow");
+ ext4_warning(sb, "blocks_count overflow");
return -EINVAL;
}

@@ -1027,16 +1006,13 @@ int ext4_group_extend(struct super_block *sb, struct ext4_super_block *es,
add = n_blocks_count - o_blocks_count;

if (o_blocks_count + add < n_blocks_count)
- ext4_warning(sb, __func__,
- "will only finish group (%llu"
- " blocks, %u new)",
+ ext4_warning(sb, "will only finish group (%llu blocks, %u new)",
o_blocks_count + add, add);

/* See if the device is actually as big as what was requested */
bh = sb_bread(sb, o_blocks_count + add - 1);
if (!bh) {
- ext4_warning(sb, __func__,
- "can't read last block, resize aborted");
+ ext4_warning(sb, "can't read last block, resize aborted");
return -ENOSPC;
}
brelse(bh);
@@ -1047,14 +1023,13 @@ int ext4_group_extend(struct super_block *sb, struct ext4_super_block *es,
handle = ext4_journal_start_sb(sb, 3);
if (IS_ERR(handle)) {
err = PTR_ERR(handle);
- ext4_warning(sb, __func__, "error %d on journal start", err);
+ ext4_warning(sb, "error %d on journal start", err);
goto exit_put;
}

mutex_lock(&EXT4_SB(sb)->s_resize_lock);
if (o_blocks_count != ext4_blocks_count(es)) {
- ext4_warning(sb, __func__,
- "multiple resizers run on filesystem!");
+ ext4_warning(sb, "multiple resizers run on filesystem!");
mutex_unlock(&EXT4_SB(sb)->s_resize_lock);
ext4_journal_stop(handle);
err = -EBUSY;
@@ -1063,8 +1038,7 @@ int ext4_group_extend(struct super_block *sb, struct ext4_super_block *es,

if ((err = ext4_journal_get_write_access(handle,
EXT4_SB(sb)->s_sbh))) {
- ext4_warning(sb, __func__,
- "error %d on journal write access", err);
+ ext4_warning(sb, "error %d on journal write access", err);
mutex_unlock(&EXT4_SB(sb)->s_resize_lock);
ext4_journal_stop(handle);
goto exit_put;
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 68a55df..1c85bb6 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -333,7 +333,7 @@ static void ext4_handle_error(struct super_block *sb)
sb->s_id);
}

-void ext4_error(struct super_block *sb, const char *function,
+void __ext4_error(struct super_block *sb, const char *function,
const char *fmt, ...)
{
va_list args;
@@ -450,7 +450,7 @@ void ext4_msg (struct super_block * sb, const char *prefix,
va_end(args);
}

-void ext4_warning(struct super_block *sb, const char *function,
+void __ext4_warning(struct super_block *sb, const char *function,
const char *fmt, ...)
{
va_list args;
@@ -507,7 +507,7 @@ void ext4_update_dynamic_rev(struct super_block *sb)
if (le32_to_cpu(es->s_rev_level) > EXT4_GOOD_OLD_REV)
return;

- ext4_warning(sb, __func__,
+ ext4_warning(sb,
"updating to rev %d because of new feature flag, "
"running e2fsck is recommended",
EXT4_DYNAMIC_REV);
@@ -3367,10 +3367,9 @@ static void ext4_clear_journal_err(struct super_block *sb,
char nbuf[16];

errstr = ext4_decode_error(sb, j_errno, nbuf);
- ext4_warning(sb, __func__, "Filesystem error recorded "
+ ext4_warning(sb, "Filesystem error recorded "
"from previous mount: %s", errstr);
- ext4_warning(sb, __func__, "Marking fs in need of "
- "filesystem check.");
+ ext4_warning(sb, "Marking fs in need of filesystem check.");

EXT4_SB(sb)->s_mount_state |= EXT4_ERROR_FS;
es->s_state |= cpu_to_le16(EXT4_ERROR_FS);
diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
index c619a7e..627c98a 100644
--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -227,7 +227,8 @@ ext4_xattr_block_get(struct inode *inode, int name_index, const char *name,
ea_bdebug(bh, "b_count=%d, refcount=%d",
atomic_read(&(bh->b_count)), le32_to_cpu(BHDR(bh)->h_refcount));
if (ext4_xattr_check_block(bh)) {
-bad_block: ext4_error(inode->i_sb, __func__,
+bad_block:
+ ext4_error(inode->i_sb,
"inode %lu: bad block %llu", inode->i_ino,
EXT4_I(inode)->i_file_acl);
error = -EIO;
@@ -371,7 +372,7 @@ ext4_xattr_block_list(struct dentry *dentry, char *buffer, size_t buffer_size)
ea_bdebug(bh, "b_count=%d, refcount=%d",
atomic_read(&(bh->b_count)), le32_to_cpu(BHDR(bh)->h_refcount));
if (ext4_xattr_check_block(bh)) {
- ext4_error(inode->i_sb, __func__,
+ ext4_error(inode->i_sb,
"inode %lu: bad block %llu", inode->i_ino,
EXT4_I(inode)->i_file_acl);
error = -EIO;
@@ -665,9 +666,8 @@ ext4_xattr_block_find(struct inode *inode, struct ext4_xattr_info *i,
atomic_read(&(bs->bh->b_count)),
le32_to_cpu(BHDR(bs->bh)->h_refcount));
if (ext4_xattr_check_block(bs->bh)) {
- ext4_error(sb, __func__,
- "inode %lu: bad block %llu", inode->i_ino,
- EXT4_I(inode)->i_file_acl);
+ ext4_error(sb, "inode %lu: bad block %llu",
+ inode->i_ino, EXT4_I(inode)->i_file_acl);
error = -EIO;
goto cleanup;
}
@@ -880,9 +880,8 @@ cleanup_dquot:
goto cleanup;

bad_block:
- ext4_error(inode->i_sb, __func__,
- "inode %lu: bad block %llu", inode->i_ino,
- EXT4_I(inode)->i_file_acl);
+ ext4_error(inode->i_sb, "inode %lu: bad block %llu",
+ inode->i_ino, EXT4_I(inode)->i_file_acl);
goto cleanup;

#undef header
@@ -1195,9 +1194,8 @@ retry:
if (!bh)
goto cleanup;
if (ext4_xattr_check_block(bh)) {
- ext4_error(inode->i_sb, __func__,
- "inode %lu: bad block %llu", inode->i_ino,
- EXT4_I(inode)->i_file_acl);
+ ext4_error(inode->i_sb, "inode %lu: bad block %llu",
+ inode->i_ino, EXT4_I(inode)->i_file_acl);
error = -EIO;
goto cleanup;
}
@@ -1372,16 +1370,14 @@ ext4_xattr_delete_inode(handle_t *handle, struct inode *inode)
goto cleanup;
bh = sb_bread(inode->i_sb, EXT4_I(inode)->i_file_acl);
if (!bh) {
- ext4_error(inode->i_sb, __func__,
- "inode %lu: block %llu read error", inode->i_ino,
- EXT4_I(inode)->i_file_acl);
+ ext4_error(inode->i_sb, "inode %lu: block %llu read error",
+ inode->i_ino, EXT4_I(inode)->i_file_acl);
goto cleanup;
}
if (BHDR(bh)->h_magic != cpu_to_le32(EXT4_XATTR_MAGIC) ||
BHDR(bh)->h_blocks != cpu_to_le32(1)) {
- ext4_error(inode->i_sb, __func__,
- "inode %lu: bad block %llu", inode->i_ino,
- EXT4_I(inode)->i_file_acl);
+ ext4_error(inode->i_sb, "inode %lu: bad block %llu",
+ inode->i_ino, EXT4_I(inode)->i_file_acl);
goto cleanup;
}
ext4_xattr_release_block(handle, inode, bh);
@@ -1506,7 +1502,7 @@ again:
}
bh = sb_bread(inode->i_sb, ce->e_block);
if (!bh) {
- ext4_error(inode->i_sb, __func__,
+ ext4_error(inode->i_sb,
"inode %lu: block %lu read error",
inode->i_ino, (unsigned long) ce->e_block);
} else if (le32_to_cpu(BHDR(bh)->h_refcount) >=
--
1.6.6.1.1.g974db.dirty


2010-03-02 18:18:55

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH 24/28] ext4: mechanical change on dio get_block code in prepare for it to be used by buffer write

Renaming the dio block allocation flags, variables, and functions
introduced in Mingming's "Direct IO for holes and fallocate"
patches so that they can be used by ext4 buffer write as well.
Also changed the related function comments accordingly to cover
both direct write and buffer wirte cases.

Signed-off-by: Jiaying Zhang <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/ext4/ext4.h | 18 ++++++------
fs/ext4/extents.c | 24 +++++++-------
fs/ext4/fsync.c | 2 +-
fs/ext4/inode.c | 84 ++++++++++++++++++++++++-----------------------------
fs/ext4/super.c | 2 +-
5 files changed, 61 insertions(+), 69 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 74664ca..c831a58 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -133,7 +133,7 @@ struct mpage_da_data {
int pages_written;
int retval;
};
-#define DIO_AIO_UNWRITTEN 0x1
+#define EXT4_IO_UNWRITTEN 0x1
typedef struct ext4_io_end {
struct list_head list; /* per-file finished AIO list */
struct inode *inode; /* file being written to */
@@ -355,13 +355,13 @@ struct ext4_new_group_data {
/* caller is from the direct IO path, request to creation of an
unitialized extents if not allocated, split the uninitialized
extent if blocks has been preallocated already*/
-#define EXT4_GET_BLOCKS_DIO 0x0008
+#define EXT4_GET_BLOCKS_PRE_IO 0x0008
#define EXT4_GET_BLOCKS_CONVERT 0x0010
-#define EXT4_GET_BLOCKS_DIO_CREATE_EXT (EXT4_GET_BLOCKS_DIO|\
+#define EXT4_GET_BLOCKS_IO_CREATE_EXT (EXT4_GET_BLOCKS_PRE_IO|\
EXT4_GET_BLOCKS_CREATE_UNINIT_EXT)
- /* Convert extent to initialized after direct IO complete */
-#define EXT4_GET_BLOCKS_DIO_CONVERT_EXT (EXT4_GET_BLOCKS_CONVERT|\
- EXT4_GET_BLOCKS_DIO_CREATE_EXT)
+ /* Convert extent to initialized after IO complete */
+#define EXT4_GET_BLOCKS_IO_CONVERT_EXT (EXT4_GET_BLOCKS_CONVERT|\
+ EXT4_GET_BLOCKS_IO_CREATE_EXT)

/*
* Flags used by ext4_free_blocks
@@ -700,8 +700,8 @@ struct ext4_inode_info {
qsize_t i_reserved_quota;
#endif

- /* completed async DIOs that might need unwritten extents handling */
- struct list_head i_aio_dio_complete_list;
+ /* completed IOs that might need unwritten extents handling */
+ struct list_head i_completed_io_list;
/* current io_end structure for async DIO write*/
ext4_io_end_t *cur_aio_dio;

@@ -1461,7 +1461,7 @@ extern int ext4_block_truncate_page(handle_t *handle,
struct address_space *mapping, loff_t from);
extern int ext4_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf);
extern qsize_t *ext4_get_reserved_space(struct inode *inode);
-extern int flush_aio_dio_completed_IO(struct inode *inode);
+extern int flush_completed_IO(struct inode *inode);
extern void ext4_da_update_reserve_space(struct inode *inode,
int used, int quota_claim);
/* ioctl.c */
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index e0e2009..f6ea4a2 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -1619,7 +1619,7 @@ int ext4_ext_insert_extent(handle_t *handle, struct inode *inode,
BUG_ON(path[depth].p_hdr == NULL);

/* try to insert block into found extent and return */
- if (ex && (flag != EXT4_GET_BLOCKS_DIO_CREATE_EXT)
+ if (ex && (flag != EXT4_GET_BLOCKS_PRE_IO)
&& ext4_can_extents_be_merged(inode, ex, newext)) {
ext_debug("append [%d]%d block to %d:[%d]%d (from %llu)\n",
ext4_ext_is_uninitialized(newext),
@@ -1740,7 +1740,7 @@ has_space:

merge:
/* try to merge extents to the right */
- if (flag != EXT4_GET_BLOCKS_DIO_CREATE_EXT)
+ if (flag != EXT4_GET_BLOCKS_PRE_IO)
ext4_ext_try_to_merge(inode, path, nearex);

/* try to merge extents to the left */
@@ -2984,7 +2984,7 @@ fix_extent_len:
ext4_ext_dirty(handle, inode, path + depth);
return err;
}
-static int ext4_convert_unwritten_extents_dio(handle_t *handle,
+static int ext4_convert_unwritten_extents_endio(handle_t *handle,
struct inode *inode,
struct ext4_ext_path *path)
{
@@ -3064,8 +3064,8 @@ ext4_ext_handle_uninitialized_extents(handle_t *handle, struct inode *inode,
flags, allocated);
ext4_ext_show_leaf(inode, path);

- /* DIO get_block() before submit the IO, split the extent */
- if (flags == EXT4_GET_BLOCKS_DIO_CREATE_EXT) {
+ /* get_block() before submit the IO, split the extent */
+ if (flags == EXT4_GET_BLOCKS_PRE_IO) {
ret = ext4_split_unwritten_extents(handle,
inode, path, iblock,
max_blocks, flags);
@@ -3075,14 +3075,14 @@ ext4_ext_handle_uninitialized_extents(handle_t *handle, struct inode *inode,
* completed
*/
if (io)
- io->flag = DIO_AIO_UNWRITTEN;
+ io->flag = EXT4_IO_UNWRITTEN;
else
ext4_set_inode_state(inode, EXT4_STATE_DIO_UNWRITTEN);
goto out;
}
- /* async DIO end_io complete, convert the filled extent to written */
- if (flags == EXT4_GET_BLOCKS_DIO_CONVERT_EXT) {
- ret = ext4_convert_unwritten_extents_dio(handle, inode,
+ /* IO end_io complete, convert the filled extent to written */
+ if (flags == EXT4_GET_BLOCKS_CONVERT) {
+ ret = ext4_convert_unwritten_extents_endio(handle, inode,
path);
if (ret >= 0)
ext4_update_inode_fsync_trans(handle, inode, 1);
@@ -3359,9 +3359,9 @@ int ext4_ext_get_blocks(handle_t *handle, struct inode *inode,
* For non asycn direct IO case, flag the inode state
* that we need to perform convertion when IO is done.
*/
- if (flags == EXT4_GET_BLOCKS_DIO_CREATE_EXT) {
+ if (flags == EXT4_GET_BLOCKS_PRE_IO) {
if (io)
- io->flag = DIO_AIO_UNWRITTEN;
+ io->flag = EXT4_IO_UNWRITTEN;
else
ext4_set_inode_state(inode,
EXT4_STATE_DIO_UNWRITTEN);
@@ -3656,7 +3656,7 @@ int ext4_convert_unwritten_extents(struct inode *inode, loff_t offset,
map_bh.b_state = 0;
ret = ext4_get_blocks(handle, inode, block,
max_blocks, &map_bh,
- EXT4_GET_BLOCKS_DIO_CONVERT_EXT);
+ EXT4_GET_BLOCKS_IO_CONVERT_EXT);
if (ret <= 0) {
WARN_ON(ret <= 0);
printk(KERN_ERR "%s: ext4_ext_get_blocks "
diff --git a/fs/ext4/fsync.c b/fs/ext4/fsync.c
index 98bd140..0d0c323 100644
--- a/fs/ext4/fsync.c
+++ b/fs/ext4/fsync.c
@@ -63,7 +63,7 @@ int ext4_sync_file(struct file *file, struct dentry *dentry, int datasync)
if (inode->i_sb->s_flags & MS_RDONLY)
return 0;

- ret = flush_aio_dio_completed_IO(inode);
+ ret = flush_completed_IO(inode);
if (ret < 0)
return ret;

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 427f469..28f116b 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3468,7 +3468,7 @@ out:
return ret;
}

-static int ext4_get_block_dio_write(struct inode *inode, sector_t iblock,
+static int ext4_get_block_write(struct inode *inode, sector_t iblock,
struct buffer_head *bh_result, int create)
{
handle_t *handle = NULL;
@@ -3476,28 +3476,14 @@ static int ext4_get_block_dio_write(struct inode *inode, sector_t iblock,
unsigned max_blocks = bh_result->b_size >> inode->i_blkbits;
int dio_credits;

- ext4_debug("ext4_get_block_dio_write: inode %lu, create flag %d\n",
+ ext4_debug("ext4_get_block_write: inode %lu, create flag %d\n",
inode->i_ino, create);
/*
- * DIO VFS code passes create = 0 flag for write to
- * the middle of file. It does this to avoid block
- * allocation for holes, to prevent expose stale data
- * out when there is parallel buffered read (which does
- * not hold the i_mutex lock) while direct IO write has
- * not completed. DIO request on holes finally falls back
- * to buffered IO for this reason.
- *
- * For ext4 extent based file, since we support fallocate,
- * new allocated extent as uninitialized, for holes, we
- * could fallocate blocks for holes, thus parallel
- * buffered IO read will zero out the page when read on
- * a hole while parallel DIO write to the hole has not completed.
- *
- * when we come here, we know it's a direct IO write to
- * to the middle of file (<i_size)
- * so it's safe to override the create flag from VFS.
+ * ext4_get_block in prepare for a DIO write or buffer write.
+ * We allocate an uinitialized extent if blocks haven't been allocated.
+ * The extent will be converted to initialized after IO complete.
*/
- create = EXT4_GET_BLOCKS_DIO_CREATE_EXT;
+ create = EXT4_GET_BLOCKS_IO_CREATE_EXT;

if (max_blocks > DIO_MAX_BLOCKS)
max_blocks = DIO_MAX_BLOCKS;
@@ -3524,19 +3510,20 @@ static void ext4_free_io_end(ext4_io_end_t *io)
iput(io->inode);
kfree(io);
}
-static void dump_aio_dio_list(struct inode * inode)
+
+static void dump_completed_IO(struct inode * inode)
{
#ifdef EXT4_DEBUG
struct list_head *cur, *before, *after;
ext4_io_end_t *io, *io0, *io1;

- if (list_empty(&EXT4_I(inode)->i_aio_dio_complete_list)){
- ext4_debug("inode %lu aio dio list is empty\n", inode->i_ino);
+ if (list_empty(&EXT4_I(inode)->i_completed_io_list)){
+ ext4_debug("inode %lu completed_io list is empty\n", inode->i_ino);
return;
}

- ext4_debug("Dump inode %lu aio_dio_completed_IO list \n", inode->i_ino);
- list_for_each_entry(io, &EXT4_I(inode)->i_aio_dio_complete_list, list){
+ ext4_debug("Dump inode %lu completed_io list \n", inode->i_ino);
+ list_for_each_entry(io, &EXT4_I(inode)->i_completed_io_list, list){
cur = &io->list;
before = cur->prev;
io0 = container_of(before, ext4_io_end_t, list);
@@ -3552,21 +3539,21 @@ static void dump_aio_dio_list(struct inode * inode)
/*
* check a range of space and convert unwritten extents to written.
*/
-static int ext4_end_aio_dio_nolock(ext4_io_end_t *io)
+static int ext4_end_io_nolock(ext4_io_end_t *io)
{
struct inode *inode = io->inode;
loff_t offset = io->offset;
ssize_t size = io->size;
int ret = 0;

- ext4_debug("end_aio_dio_onlock: io 0x%p from inode %lu,list->next 0x%p,"
+ ext4_debug("ext4_end_io_nolock: io 0x%p from inode %lu,list->next 0x%p,"
"list->prev 0x%p\n",
io, inode->i_ino, io->list.next, io->list.prev);

if (list_empty(&io->list))
return ret;

- if (io->flag != DIO_AIO_UNWRITTEN)
+ if (io->flag != EXT4_IO_UNWRITTEN)
return ret;

if (offset + size <= i_size_read(inode))
@@ -3584,17 +3571,18 @@ static int ext4_end_aio_dio_nolock(ext4_io_end_t *io)
io->flag = 0;
return ret;
}
+
/*
* work on completed aio dio IO, to convert unwritten extents to extents
*/
-static void ext4_end_aio_dio_work(struct work_struct *work)
+static void ext4_end_io_work(struct work_struct *work)
{
ext4_io_end_t *io = container_of(work, ext4_io_end_t, work);
struct inode *inode = io->inode;
int ret = 0;

mutex_lock(&inode->i_mutex);
- ret = ext4_end_aio_dio_nolock(io);
+ ret = ext4_end_io_nolock(io);
if (ret >= 0) {
if (!list_empty(&io->list))
list_del_init(&io->list);
@@ -3602,32 +3590,35 @@ static void ext4_end_aio_dio_work(struct work_struct *work)
}
mutex_unlock(&inode->i_mutex);
}
+
/*
* This function is called from ext4_sync_file().
*
- * When AIO DIO IO is completed, the work to convert unwritten
- * extents to written is queued on workqueue but may not get immediately
+ * When IO is completed, the work to convert unwritten extents to
+ * written is queued on workqueue but may not get immediately
* scheduled. When fsync is called, we need to ensure the
* conversion is complete before fsync returns.
- * The inode keeps track of a list of completed AIO from DIO path
- * that might needs to do the conversion. This function walks through
- * the list and convert the related unwritten extents to written.
+ * The inode keeps track of a list of pending/completed IO that
+ * might needs to do the conversion. This function walks through
+ * the list and convert the related unwritten extents for completed IO
+ * to written.
+ * The function return the number of pending IOs on success.
*/
-int flush_aio_dio_completed_IO(struct inode *inode)
+int flush_completed_IO(struct inode *inode)
{
ext4_io_end_t *io;
int ret = 0;
int ret2 = 0;

- if (list_empty(&EXT4_I(inode)->i_aio_dio_complete_list))
+ if (list_empty(&EXT4_I(inode)->i_completed_io_list))
return ret;

- dump_aio_dio_list(inode);
- while (!list_empty(&EXT4_I(inode)->i_aio_dio_complete_list)){
- io = list_entry(EXT4_I(inode)->i_aio_dio_complete_list.next,
+ dump_completed_IO(inode);
+ while (!list_empty(&EXT4_I(inode)->i_completed_io_list)){
+ io = list_entry(EXT4_I(inode)->i_completed_io_list.next,
ext4_io_end_t, list);
/*
- * Calling ext4_end_aio_dio_nolock() to convert completed
+ * Calling ext4_end_io_nolock() to convert completed
* IO to written.
*
* When ext4_sync_file() is called, run_queue() may already
@@ -3640,7 +3631,7 @@ int flush_aio_dio_completed_IO(struct inode *inode)
* avoid double converting from both fsync and background work
* queue work.
*/
- ret = ext4_end_aio_dio_nolock(io);
+ ret = ext4_end_io_nolock(io);
if (ret < 0)
ret2 = ret;
else
@@ -3662,7 +3653,7 @@ static ext4_io_end_t *ext4_init_io_end (struct inode *inode)
io->offset = 0;
io->size = 0;
io->error = 0;
- INIT_WORK(&io->work, ext4_end_aio_dio_work);
+ INIT_WORK(&io->work, ext4_end_io_work);
INIT_LIST_HEAD(&io->list);
}

@@ -3685,7 +3676,7 @@ static void ext4_end_io_dio(struct kiocb *iocb, loff_t offset,
size);

/* if not aio dio with unwritten extents, just free io and return */
- if (io_end->flag != DIO_AIO_UNWRITTEN){
+ if (io_end->flag != EXT4_IO_UNWRITTEN){
ext4_free_io_end(io_end);
iocb->private = NULL;
return;
@@ -3700,9 +3691,10 @@ static void ext4_end_io_dio(struct kiocb *iocb, loff_t offset,

/* Add the io_end to per-inode completed aio dio list*/
list_add_tail(&io_end->list,
- &EXT4_I(io_end->inode)->i_aio_dio_complete_list);
+ &EXT4_I(io_end->inode)->i_completed_io_list);
iocb->private = NULL;
}
+
/*
* For ext4 extent files, ext4 will do direct-io write to holes,
* preallocated extents, and those write extend the file, no need to
@@ -3772,7 +3764,7 @@ static ssize_t ext4_ext_direct_IO(int rw, struct kiocb *iocb,
ret = blockdev_direct_IO(rw, iocb, inode,
inode->i_sb->s_bdev, iov,
offset, nr_segs,
- ext4_get_block_dio_write,
+ ext4_get_block_write,
ext4_end_io_dio);
if (iocb->private)
EXT4_I(inode)->cur_aio_dio = NULL;
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index c832508..dc7a97e 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -708,7 +708,7 @@ static struct inode *ext4_alloc_inode(struct super_block *sb)
#ifdef CONFIG_QUOTA
ei->i_reserved_quota = 0;
#endif
- INIT_LIST_HEAD(&ei->i_aio_dio_complete_list);
+ INIT_LIST_HEAD(&ei->i_completed_io_list);
ei->cur_aio_dio = NULL;
ei->i_sync_tid = 0;
ei->i_datasync_tid = 0;
--
1.6.6.1.1.g974db.dirty


2010-03-02 18:18:55

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH 25/28] ext4: use ext4_get_block_write in buffer write

Allocate uninitialized extent before ext4 buffer write and
convert the extent to initialized after io completes.
The purpose is to make sure an extent can only be marked
initialized after it has been written with new data so
we can safely drop the i_mutex lock in ext4 DIO read without
exposing stale data. This helps to improve multi-thread DIO
read performance on high-speed disks.

Skip the nobh and data=journal mount cases to make things simple for now.

Signed-off-by: Jiaying Zhang <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/ext4/ext4.h | 14 ++++-
fs/ext4/ext4_jbd2.h | 24 +++++++
fs/ext4/extents.c | 22 ++++---
fs/ext4/inode.c | 166 +++++++++++++++++++++++++++++++++++++++++----------
fs/ext4/super.c | 31 ++++++++-
5 files changed, 209 insertions(+), 48 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index c831a58..79c067e 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -138,7 +138,7 @@ typedef struct ext4_io_end {
struct list_head list; /* per-file finished AIO list */
struct inode *inode; /* file being written to */
unsigned int flag; /* unwritten or not */
- int error; /* I/O error code */
+ struct page *page; /* page struct for buffer write */
loff_t offset; /* offset in the file */
ssize_t size; /* size of the extent */
struct work_struct work; /* data work queue */
@@ -361,7 +361,7 @@ struct ext4_new_group_data {
EXT4_GET_BLOCKS_CREATE_UNINIT_EXT)
/* Convert extent to initialized after IO complete */
#define EXT4_GET_BLOCKS_IO_CONVERT_EXT (EXT4_GET_BLOCKS_CONVERT|\
- EXT4_GET_BLOCKS_IO_CREATE_EXT)
+ EXT4_GET_BLOCKS_CREATE_UNINIT_EXT)

/*
* Flags used by ext4_free_blocks
@@ -702,6 +702,7 @@ struct ext4_inode_info {

/* completed IOs that might need unwritten extents handling */
struct list_head i_completed_io_list;
+ spinlock_t i_completed_io_lock;
/* current io_end structure for async DIO write*/
ext4_io_end_t *cur_aio_dio;

@@ -752,6 +753,7 @@ struct ext4_inode_info {
#define EXT4_MOUNT_QUOTA 0x80000 /* Some quota option set */
#define EXT4_MOUNT_USRQUOTA 0x100000 /* "old" user quota */
#define EXT4_MOUNT_GRPQUOTA 0x200000 /* "old" group quota */
+#define EXT4_MOUNT_DIOREAD_NOLOCK 0x400000 /* Enable support for dio read nolocking */
#define EXT4_MOUNT_JOURNAL_CHECKSUM 0x800000 /* Journal checksums */
#define EXT4_MOUNT_JOURNAL_ASYNC_COMMIT 0x1000000 /* Journal Async Commit */
#define EXT4_MOUNT_I_VERSION 0x2000000 /* i_version support */
@@ -1781,6 +1783,14 @@ extern int ext4_move_extents(struct file *o_filp, struct file *d_filp,
__u64 len, __u64 *moved_len);


+/* BH_Uninit flag: blocks are allocated but uninitialized on disk */
+enum ext4_state_bits {
+ BH_Uninit /* blocks are allocated but uninitialized on disk */
+ = BH_JBDPrivateStart,
+};
+
+BUFFER_FNS(Uninit, uninit)
+
/*
* Add new method to test wether block and inode bitmaps are properly
* initialized. With uninit_bg reading the block from disk is not enough
diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
index 05eca81..b79ad51 100644
--- a/fs/ext4/ext4_jbd2.h
+++ b/fs/ext4/ext4_jbd2.h
@@ -304,4 +304,28 @@ static inline int ext4_should_writeback_data(struct inode *inode)
return 0;
}

+/*
+ * This function controls whether or not we should try to go down the
+ * dioread_nolock code paths, which makes it safe to avoid taking
+ * i_mutex for direct I/O reads. This only works for extent-based
+ * files, and it doesn't work for nobh or if data journaling is
+ * enabled, since the dioread_nolock code uses b_private to pass
+ * information back to the I/O completion handler, and this conflicts
+ * with the jbd's use of b_private.
+ */
+static inline int ext4_should_dioread_nolock(struct inode *inode)
+{
+ if (!test_opt(inode->i_sb, DIOREAD_NOLOCK))
+ return 0;
+ if (test_opt(inode->i_sb, NOBH))
+ return 0;
+ if (!S_ISREG(inode->i_mode))
+ return 0;
+ if (!(EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL))
+ return 0;
+ if (ext4_should_journal_data(inode))
+ return 0;
+ return 1;
+}
+
#endif /* _EXT4_JBD2_H */
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index f6ea4a2..202667b 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -1619,7 +1619,7 @@ int ext4_ext_insert_extent(handle_t *handle, struct inode *inode,
BUG_ON(path[depth].p_hdr == NULL);

/* try to insert block into found extent and return */
- if (ex && (flag != EXT4_GET_BLOCKS_PRE_IO)
+ if (ex && !(flag & EXT4_GET_BLOCKS_PRE_IO)
&& ext4_can_extents_be_merged(inode, ex, newext)) {
ext_debug("append [%d]%d block to %d:[%d]%d (from %llu)\n",
ext4_ext_is_uninitialized(newext),
@@ -1740,7 +1740,7 @@ has_space:

merge:
/* try to merge extents to the right */
- if (flag != EXT4_GET_BLOCKS_PRE_IO)
+ if (!(flag & EXT4_GET_BLOCKS_PRE_IO))
ext4_ext_try_to_merge(inode, path, nearex);

/* try to merge extents to the left */
@@ -3065,7 +3065,7 @@ ext4_ext_handle_uninitialized_extents(handle_t *handle, struct inode *inode,
ext4_ext_show_leaf(inode, path);

/* get_block() before submit the IO, split the extent */
- if (flags == EXT4_GET_BLOCKS_PRE_IO) {
+ if ((flags & EXT4_GET_BLOCKS_PRE_IO)) {
ret = ext4_split_unwritten_extents(handle,
inode, path, iblock,
max_blocks, flags);
@@ -3078,10 +3078,12 @@ ext4_ext_handle_uninitialized_extents(handle_t *handle, struct inode *inode,
io->flag = EXT4_IO_UNWRITTEN;
else
ext4_set_inode_state(inode, EXT4_STATE_DIO_UNWRITTEN);
+ if (ext4_should_dioread_nolock(inode))
+ set_buffer_uninit(bh_result);
goto out;
}
/* IO end_io complete, convert the filled extent to written */
- if (flags == EXT4_GET_BLOCKS_CONVERT) {
+ if ((flags & EXT4_GET_BLOCKS_CONVERT)) {
ret = ext4_convert_unwritten_extents_endio(handle, inode,
path);
if (ret >= 0)
@@ -3351,21 +3353,21 @@ int ext4_ext_get_blocks(handle_t *handle, struct inode *inode,
if (flags & EXT4_GET_BLOCKS_UNINIT_EXT){
ext4_ext_mark_uninitialized(&newex);
/*
- * io_end structure was created for every async
- * direct IO write to the middle of the file.
- * To avoid unecessary convertion for every aio dio rewrite
- * to the mid of file, here we flag the IO that is really
- * need the convertion.
+ * io_end structure was created for every IO write to an
+ * uninitialized extent. To avoid unecessary conversion,
+ * here we flag the IO that really needs the conversion.
* For non asycn direct IO case, flag the inode state
* that we need to perform convertion when IO is done.
*/
- if (flags == EXT4_GET_BLOCKS_PRE_IO) {
+ if ((flags & EXT4_GET_BLOCKS_PRE_IO)) {
if (io)
io->flag = EXT4_IO_UNWRITTEN;
else
ext4_set_inode_state(inode,
EXT4_STATE_DIO_UNWRITTEN);
}
+ if (ext4_should_dioread_nolock(inode))
+ set_buffer_uninit(bh_result);
}

if (unlikely(EXT4_I(inode)->i_flags & EXT4_EOFBLOCKS_FL)) {
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 28f116b..ae7adf2 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -38,6 +38,7 @@
#include <linux/uio.h>
#include <linux/bio.h>
#include <linux/workqueue.h>
+#include <linux/kernel.h>

#include "ext4_jbd2.h"
#include "xattr.h"
@@ -1534,6 +1535,8 @@ static void ext4_truncate_failed_write(struct inode *inode)
ext4_truncate(inode);
}

+static int ext4_get_block_write(struct inode *inode, sector_t iblock,
+ struct buffer_head *bh_result, int create);
static int ext4_write_begin(struct file *file, struct address_space *mapping,
loff_t pos, unsigned len, unsigned flags,
struct page **pagep, void **fsdata)
@@ -1575,8 +1578,12 @@ retry:
}
*pagep = page;

- ret = block_write_begin(file, mapping, pos, len, flags, pagep, fsdata,
- ext4_get_block);
+ if (ext4_should_dioread_nolock(inode))
+ ret = block_write_begin(file, mapping, pos, len, flags, pagep,
+ fsdata, ext4_get_block_write);
+ else
+ ret = block_write_begin(file, mapping, pos, len, flags, pagep,
+ fsdata, ext4_get_block);

if (!ret && ext4_should_journal_data(inode)) {
ret = walk_page_buffers(handle, page_buffers(page),
@@ -2092,6 +2099,8 @@ static void mpage_put_bnr_to_bhs(struct mpage_da_data *mpd, sector_t logical,
} else if (buffer_mapped(bh))
BUG_ON(bh->b_blocknr != pblock);

+ if (buffer_uninit(exbh))
+ set_buffer_uninit(bh);
cur_logical++;
pblock++;
} while ((bh = bh->b_this_page) != head);
@@ -2221,6 +2230,8 @@ static int mpage_da_map_blocks(struct mpage_da_data *mpd)
*/
new.b_state = 0;
get_blocks_flags = EXT4_GET_BLOCKS_CREATE;
+ if (ext4_should_dioread_nolock(mpd->inode))
+ get_blocks_flags |= EXT4_GET_BLOCKS_IO_CREATE_EXT;
if (mpd->b_state & (1 << BH_Delay))
get_blocks_flags |= EXT4_GET_BLOCKS_DELALLOC_RESERVE;

@@ -2636,6 +2647,9 @@ out:
return ret;
}

+static int ext4_set_bh_endio(struct buffer_head *bh, struct inode *inode);
+static void ext4_end_io_buffer_write(struct buffer_head *bh, int uptodate);
+
/*
* Note that we don't need to start a transaction unless we're journaling data
* because we should have holes filled from ext4_page_mkwrite(). We even don't
@@ -2683,7 +2697,7 @@ static int ext4_writepage(struct page *page,
int ret = 0;
loff_t size;
unsigned int len;
- struct buffer_head *page_bufs;
+ struct buffer_head *page_bufs = NULL;
struct inode *inode = page->mapping->host;

trace_ext4_writepage(inode, page);
@@ -2759,7 +2773,11 @@ static int ext4_writepage(struct page *page,

if (test_opt(inode->i_sb, NOBH) && ext4_should_writeback_data(inode))
ret = nobh_writepage(page, noalloc_get_block_write, wbc);
- else
+ else if (page_bufs && buffer_uninit(page_bufs)) {
+ ext4_set_bh_endio(page_bufs, inode);
+ ret = block_write_full_page_endio(page, noalloc_get_block_write,
+ wbc, ext4_end_io_buffer_write);
+ } else
ret = block_write_full_page(page, noalloc_get_block_write,
wbc);

@@ -3471,10 +3489,11 @@ out:
static int ext4_get_block_write(struct inode *inode, sector_t iblock,
struct buffer_head *bh_result, int create)
{
- handle_t *handle = NULL;
+ handle_t *handle = ext4_journal_current_handle();
int ret = 0;
unsigned max_blocks = bh_result->b_size >> inode->i_blkbits;
int dio_credits;
+ int started = 0;

ext4_debug("ext4_get_block_write: inode %lu, create flag %d\n",
inode->i_ino, create);
@@ -3485,21 +3504,26 @@ static int ext4_get_block_write(struct inode *inode, sector_t iblock,
*/
create = EXT4_GET_BLOCKS_IO_CREATE_EXT;

- if (max_blocks > DIO_MAX_BLOCKS)
- max_blocks = DIO_MAX_BLOCKS;
- dio_credits = ext4_chunk_trans_blocks(inode, max_blocks);
- handle = ext4_journal_start(inode, dio_credits);
- if (IS_ERR(handle)) {
- ret = PTR_ERR(handle);
- goto out;
+ if (!handle) {
+ if (max_blocks > DIO_MAX_BLOCKS)
+ max_blocks = DIO_MAX_BLOCKS;
+ dio_credits = ext4_chunk_trans_blocks(inode, max_blocks);
+ handle = ext4_journal_start(inode, dio_credits);
+ if (IS_ERR(handle)) {
+ ret = PTR_ERR(handle);
+ goto out;
+ }
+ started = 1;
}
+
ret = ext4_get_blocks(handle, inode, iblock, max_blocks, bh_result,
create);
if (ret > 0) {
bh_result->b_size = (ret << inode->i_blkbits);
ret = 0;
}
- ext4_journal_stop(handle);
+ if (started)
+ ext4_journal_stop(handle);
out:
return ret;
}
@@ -3507,6 +3531,8 @@ out:
static void ext4_free_io_end(ext4_io_end_t *io)
{
BUG_ON(!io);
+ if (io->page)
+ put_page(io->page);
iput(io->inode);
kfree(io);
}
@@ -3516,6 +3542,7 @@ static void dump_completed_IO(struct inode * inode)
#ifdef EXT4_DEBUG
struct list_head *cur, *before, *after;
ext4_io_end_t *io, *io0, *io1;
+ unsigned long flags;

if (list_empty(&EXT4_I(inode)->i_completed_io_list)){
ext4_debug("inode %lu completed_io list is empty\n", inode->i_ino);
@@ -3523,6 +3550,7 @@ static void dump_completed_IO(struct inode * inode)
}

ext4_debug("Dump inode %lu completed_io list \n", inode->i_ino);
+ spin_lock_irqsave(&EXT4_I(inode)->i_completed_io_lock, flags);
list_for_each_entry(io, &EXT4_I(inode)->i_completed_io_list, list){
cur = &io->list;
before = cur->prev;
@@ -3533,6 +3561,7 @@ static void dump_completed_IO(struct inode * inode)
ext4_debug("io 0x%p from inode %lu,prev 0x%p,next 0x%p\n",
io, inode->i_ino, io0, io1);
}
+ spin_unlock_irqrestore(&EXT4_I(inode)->i_completed_io_lock, flags);
#endif
}

@@ -3556,9 +3585,7 @@ static int ext4_end_io_nolock(ext4_io_end_t *io)
if (io->flag != EXT4_IO_UNWRITTEN)
return ret;

- if (offset + size <= i_size_read(inode))
- ret = ext4_convert_unwritten_extents(inode, offset, size);
-
+ ret = ext4_convert_unwritten_extents(inode, offset, size);
if (ret < 0) {
printk(KERN_EMERG "%s: failed to convert unwritten"
"extents to written extents, error is %d"
@@ -3577,18 +3604,25 @@ static int ext4_end_io_nolock(ext4_io_end_t *io)
*/
static void ext4_end_io_work(struct work_struct *work)
{
- ext4_io_end_t *io = container_of(work, ext4_io_end_t, work);
- struct inode *inode = io->inode;
- int ret = 0;
+ ext4_io_end_t *io = container_of(work, ext4_io_end_t, work);
+ struct inode *inode = io->inode;
+ struct ext4_inode_info *ei = EXT4_I(inode);
+ unsigned long flags;
+ int ret;

mutex_lock(&inode->i_mutex);
ret = ext4_end_io_nolock(io);
- if (ret >= 0) {
- if (!list_empty(&io->list))
- list_del_init(&io->list);
- ext4_free_io_end(io);
+ if (ret < 0) {
+ mutex_unlock(&inode->i_mutex);
+ return;
}
+
+ spin_lock_irqsave(&ei->i_completed_io_lock, flags);
+ if (!list_empty(&io->list))
+ list_del_init(&io->list);
+ spin_unlock_irqrestore(&ei->i_completed_io_lock, flags);
mutex_unlock(&inode->i_mutex);
+ ext4_free_io_end(io);
}

/*
@@ -3607,15 +3641,18 @@ static void ext4_end_io_work(struct work_struct *work)
int flush_completed_IO(struct inode *inode)
{
ext4_io_end_t *io;
+ struct ext4_inode_info *ei = EXT4_I(inode);
+ unsigned long flags;
int ret = 0;
int ret2 = 0;

- if (list_empty(&EXT4_I(inode)->i_completed_io_list))
+ if (list_empty(&ei->i_completed_io_list))
return ret;

dump_completed_IO(inode);
- while (!list_empty(&EXT4_I(inode)->i_completed_io_list)){
- io = list_entry(EXT4_I(inode)->i_completed_io_list.next,
+ spin_lock_irqsave(&ei->i_completed_io_lock, flags);
+ while (!list_empty(&ei->i_completed_io_list)){
+ io = list_entry(ei->i_completed_io_list.next,
ext4_io_end_t, list);
/*
* Calling ext4_end_io_nolock() to convert completed
@@ -3631,20 +3668,23 @@ int flush_completed_IO(struct inode *inode)
* avoid double converting from both fsync and background work
* queue work.
*/
+ spin_unlock_irqrestore(&ei->i_completed_io_lock, flags);
ret = ext4_end_io_nolock(io);
+ spin_lock_irqsave(&ei->i_completed_io_lock, flags);
if (ret < 0)
ret2 = ret;
else
list_del_init(&io->list);
}
+ spin_unlock_irqrestore(&ei->i_completed_io_lock, flags);
return (ret2 < 0) ? ret2 : 0;
}

-static ext4_io_end_t *ext4_init_io_end (struct inode *inode)
+static ext4_io_end_t *ext4_init_io_end (struct inode *inode, gfp_t flags)
{
ext4_io_end_t *io = NULL;

- io = kmalloc(sizeof(*io), GFP_NOFS);
+ io = kmalloc(sizeof(*io), flags);

if (io) {
igrab(inode);
@@ -3652,7 +3692,7 @@ static ext4_io_end_t *ext4_init_io_end (struct inode *inode)
io->flag = 0;
io->offset = 0;
io->size = 0;
- io->error = 0;
+ io->page = NULL;
INIT_WORK(&io->work, ext4_end_io_work);
INIT_LIST_HEAD(&io->list);
}
@@ -3665,6 +3705,8 @@ static void ext4_end_io_dio(struct kiocb *iocb, loff_t offset,
{
ext4_io_end_t *io_end = iocb->private;
struct workqueue_struct *wq;
+ unsigned long flags;
+ struct ext4_inode_info *ei;

/* if not async direct IO or dio with 0 bytes write, just return */
if (!io_end || !size)
@@ -3684,17 +3726,77 @@ static void ext4_end_io_dio(struct kiocb *iocb, loff_t offset,

io_end->offset = offset;
io_end->size = size;
+ io_end->flag = EXT4_IO_UNWRITTEN;
wq = EXT4_SB(io_end->inode->i_sb)->dio_unwritten_wq;

/* queue the work to convert unwritten extents to written */
queue_work(wq, &io_end->work);

/* Add the io_end to per-inode completed aio dio list*/
- list_add_tail(&io_end->list,
- &EXT4_I(io_end->inode)->i_completed_io_list);
+ ei = EXT4_I(io_end->inode);
+ spin_lock_irqsave(&ei->i_completed_io_lock, flags);
+ list_add_tail(&io_end->list, &ei->i_completed_io_list);
+ spin_unlock_irqrestore(&ei->i_completed_io_lock, flags);
iocb->private = NULL;
}

+static void ext4_end_io_buffer_write(struct buffer_head *bh, int uptodate)
+{
+ ext4_io_end_t *io_end = bh->b_private;
+ struct workqueue_struct *wq;
+ struct inode *inode;
+ unsigned long flags;
+
+ if (!io_end)
+ goto out;
+ io_end->flag = EXT4_IO_UNWRITTEN;
+ inode = io_end->inode;
+
+ /* Add the io_end to per-inode completed io list*/
+ spin_lock_irqsave(&EXT4_I(inode)->i_completed_io_lock, flags);
+ list_add_tail(&io_end->list, &EXT4_I(inode)->i_completed_io_list);
+ spin_unlock_irqrestore(&EXT4_I(inode)->i_completed_io_lock, flags);
+
+ wq = EXT4_SB(inode->i_sb)->dio_unwritten_wq;
+ /* queue the work to convert unwritten extents to written */
+ queue_work(wq, &io_end->work);
+out:
+ bh->b_private = NULL;
+ bh->b_end_io = NULL;
+ clear_buffer_uninit(bh);
+ end_buffer_async_write(bh, uptodate);
+}
+
+static int ext4_set_bh_endio(struct buffer_head *bh, struct inode *inode)
+{
+ ext4_io_end_t *io_end;
+ struct page *page = bh->b_page;
+ loff_t offset = (sector_t)page->index << PAGE_CACHE_SHIFT;
+ size_t size = bh->b_size;
+
+retry:
+ io_end = ext4_init_io_end(inode, GFP_ATOMIC);
+ if (!io_end) {
+ if (printk_ratelimit())
+ printk(KERN_WARNING "%s: allocation fail\n", __func__);
+ schedule();
+ goto retry;
+ }
+ io_end->offset = offset;
+ io_end->size = size;
+ /*
+ * We need to hold a reference to the page to make sure it
+ * doesn't get evicted before ext4_end_io_work() has a chance
+ * to convert the extent from written to unwritten.
+ */
+ io_end->page = page;
+ get_page(io_end->page);
+
+ bh->b_private = io_end;
+ bh->b_end_io = ext4_end_io_buffer_write;
+ return 0;
+}
+
/*
* For ext4 extent files, ext4 will do direct-io write to holes,
* preallocated extents, and those write extend the file, no need to
@@ -3748,7 +3850,7 @@ static ssize_t ext4_ext_direct_IO(int rw, struct kiocb *iocb,
iocb->private = NULL;
EXT4_I(inode)->cur_aio_dio = NULL;
if (!is_sync_kiocb(iocb)) {
- iocb->private = ext4_init_io_end(inode);
+ iocb->private = ext4_init_io_end(inode, GFP_NOFS);
if (!iocb->private)
return -ENOMEM;
/*
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index dc7a97e..200d257 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -709,6 +709,7 @@ static struct inode *ext4_alloc_inode(struct super_block *sb)
ei->i_reserved_quota = 0;
#endif
INIT_LIST_HEAD(&ei->i_completed_io_list);
+ spin_lock_init(&ei->i_completed_io_lock);
ei->cur_aio_dio = NULL;
ei->i_sync_tid = 0;
ei->i_datasync_tid = 0;
@@ -926,6 +927,9 @@ static int ext4_show_options(struct seq_file *seq, struct vfsmount *vfs)
if (test_opt(sb, NOLOAD))
seq_puts(seq, ",norecovery");

+ if (test_opt(sb, DIOREAD_NOLOCK))
+ seq_puts(seq, ",dioread_nolock");
+
ext4_show_quota_options(seq, sb);

return 0;
@@ -1109,6 +1113,7 @@ enum {
Opt_stripe, Opt_delalloc, Opt_nodelalloc,
Opt_block_validity, Opt_noblock_validity,
Opt_inode_readahead_blks, Opt_journal_ioprio,
+ Opt_dioread_nolock, Opt_dioread_lock,
Opt_discard, Opt_nodiscard,
};

@@ -1176,6 +1181,8 @@ static const match_table_t tokens = {
{Opt_auto_da_alloc, "auto_da_alloc=%u"},
{Opt_auto_da_alloc, "auto_da_alloc"},
{Opt_noauto_da_alloc, "noauto_da_alloc"},
+ {Opt_dioread_nolock, "dioread_nolock"},
+ {Opt_dioread_lock, "dioread_lock"},
{Opt_discard, "discard"},
{Opt_nodiscard, "nodiscard"},
{Opt_err, NULL},
@@ -1640,6 +1647,12 @@ set_qf_format:
case Opt_nodiscard:
clear_opt(sbi->s_mount_opt, DISCARD);
break;
+ case Opt_dioread_nolock:
+ set_opt(sbi->s_mount_opt, DIOREAD_NOLOCK);
+ break;
+ case Opt_dioread_lock:
+ clear_opt(sbi->s_mount_opt, DIOREAD_NOLOCK);
+ break;
default:
ext4_msg(sb, KERN_ERR,
"Unrecognized mount option \"%s\" "
@@ -2795,7 +2808,7 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
EXT4_HAS_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_RECOVER)) {
ext4_msg(sb, KERN_ERR, "required journal recovery "
"suppressed and not mounted read-only");
- goto failed_mount4;
+ goto failed_mount_wq;
} else {
clear_opt(sbi->s_mount_opt, DATA_FLAGS);
set_opt(sbi->s_mount_opt, WRITEBACK_DATA);
@@ -2808,7 +2821,7 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
!jbd2_journal_set_features(EXT4_SB(sb)->s_journal, 0, 0,
JBD2_FEATURE_INCOMPAT_64BIT)) {
ext4_msg(sb, KERN_ERR, "Failed to set 64-bit journal feature");
- goto failed_mount4;
+ goto failed_mount_wq;
}

if (test_opt(sb, JOURNAL_ASYNC_COMMIT)) {
@@ -2847,7 +2860,7 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
(sbi->s_journal, 0, 0, JBD2_FEATURE_INCOMPAT_REVOKE)) {
ext4_msg(sb, KERN_ERR, "Journal does not support "
"requested data journaling mode");
- goto failed_mount4;
+ goto failed_mount_wq;
}
default:
break;
@@ -2855,13 +2868,17 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
set_task_ioprio(sbi->s_journal->j_task, journal_ioprio);

no_journal:
-
if (test_opt(sb, NOBH)) {
if (!(test_opt(sb, DATA_FLAGS) == EXT4_MOUNT_WRITEBACK_DATA)) {
ext4_msg(sb, KERN_WARNING, "Ignoring nobh option - "
"its supported only with writeback mode");
clear_opt(sbi->s_mount_opt, NOBH);
}
+ if (test_opt(sb, DIOREAD_NOLOCK)) {
+ ext4_msg(sb, KERN_WARNING, "dioread_nolock option is "
+ "not supported with nobh mode");
+ goto failed_mount_wq;
+ }
}
EXT4_SB(sb)->dio_unwritten_wq = create_workqueue("ext4-dio-unwritten");
if (!EXT4_SB(sb)->dio_unwritten_wq) {
@@ -2926,6 +2943,12 @@ no_journal:
"requested data journaling mode");
clear_opt(sbi->s_mount_opt, DELALLOC);
}
+ if (test_opt(sb, DIOREAD_NOLOCK) &&
+ (test_opt(sb, DATA_FLAGS) == EXT4_MOUNT_JOURNAL_DATA)) {
+ ext4_msg(sb, KERN_WARNING, "Ignoring dioread_nolock option - "
+ "requested data journaling mode");
+ clear_opt(sbi->s_mount_opt, DIOREAD_NOLOCK);
+ }

err = ext4_setup_system_zone(sb);
if (err) {
--
1.6.6.1.1.g974db.dirty


2010-03-02 18:18:56

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH 21/28] ext4: explicitly remove inode from orphan list after failed direct io

From: Dmitry Monakhov <[email protected]>

Otherwise non-empty orphan list will be triggered on umount.

Signed-off-by: Dmitry Monakhov <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/ext4/inode.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index edb7edc..427f469 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3438,6 +3438,9 @@ retry:
* but cannot extend i_size. Bail out and pretend
* the write failed... */
ret = PTR_ERR(handle);
+ if (inode->i_nlink)
+ ext4_orphan_del(NULL, inode);
+
goto out;
}
if (inode->i_nlink)
--
1.6.6.1.1.g974db.dirty


2010-03-02 18:18:56

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH 19/28] ext4: deprecate obsoleted mount options

From: Dmitry Monakhov <[email protected]>

Declare following list of mount options as deprecated:
- bsddf, miniddf
- grpid, bsdgroups, nogrpid, sysvgroups

Declare following list of default mount options as deprecated:
- bsdgroups

Signed-off-by: Dmitry Monakhov <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/ext4/super.c | 14 +++++++++++++-
1 files changed, 13 insertions(+), 1 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 79937e9..c832508 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1205,6 +1205,8 @@ static ext4_fsblk_t get_sb_block(void **data)
}

#define DEFAULT_JOURNAL_IOPRIO (IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, 3))
+static char deprecated_msg[] = "Mount option \"%s\" will be removed by %s\n"
+ "Contact [email protected] if you think we should keep it.\n";

#ifdef CONFIG_QUOTA
static int set_qf_name(struct super_block *sb, int qtype, substring_t *args)
@@ -1294,16 +1296,23 @@ static int parse_options(char *options, struct super_block *sb,
token = match_token(p, tokens, args);
switch (token) {
case Opt_bsd_df:
+ ext4_msg(sb, KERN_WARNING, deprecated_msg, p, "2.6.38");
clear_opt(sbi->s_mount_opt, MINIX_DF);
break;
case Opt_minix_df:
+ ext4_msg(sb, KERN_WARNING, deprecated_msg, p, "2.6.38");
set_opt(sbi->s_mount_opt, MINIX_DF);
+
break;
case Opt_grpid:
+ ext4_msg(sb, KERN_WARNING, deprecated_msg, p, "2.6.38");
set_opt(sbi->s_mount_opt, GRPID);
+
break;
case Opt_nogrpid:
+ ext4_msg(sb, KERN_WARNING, deprecated_msg, p, "2.6.38");
clear_opt(sbi->s_mount_opt, GRPID);
+
break;
case Opt_resuid:
if (match_int(&args[0], &option))
@@ -2449,8 +2458,11 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
def_mount_opts = le32_to_cpu(es->s_default_mount_opts);
if (def_mount_opts & EXT4_DEFM_DEBUG)
set_opt(sbi->s_mount_opt, DEBUG);
- if (def_mount_opts & EXT4_DEFM_BSDGROUPS)
+ if (def_mount_opts & EXT4_DEFM_BSDGROUPS) {
+ ext4_msg(sb, KERN_WARNING, deprecated_msg, "bsdgroups",
+ "2.6.38");
set_opt(sbi->s_mount_opt, GRPID);
+ }
if (def_mount_opts & EXT4_DEFM_UID16)
set_opt(sbi->s_mount_opt, NO_UID32);
#ifdef CONFIG_EXT4_FS_XATTR
--
1.6.6.1.1.g974db.dirty


2010-03-02 18:18:59

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH 13/28] ext4: Fix BUG_ON at fs/buffer.c:652 in no journal mode

From: Curt Wohlgemuth <[email protected]>

Calls to ext4_handle_dirty_metadata should only pass in an inode
pointer for inode-specific metadata, and not for shared metadata
blocks such as inode table blocks, block group descriptors, the
superblock, etc.

The BUG_ON can get tripped when updating a special device (such as a
block device) that is opened (so that i_mapping is set in
fs/block_dev.c) and the file system is mounted in no journal mode.

Addresses-Google-Bug: #2404870

Signed-off-by: Curt Wohlgemuth <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/ext4/ext4_jbd2.c | 2 +-
fs/ext4/ialloc.c | 2 +-
fs/ext4/inode.c | 6 +++---
fs/ext4/namei.c | 4 ++--
4 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c
index 2f407c4..53d2764 100644
--- a/fs/ext4/ext4_jbd2.c
+++ b/fs/ext4/ext4_jbd2.c
@@ -125,7 +125,7 @@ int __ext4_handle_dirty_metadata(const char *where, handle_t *handle,
ext4_journal_abort_handle(where, __func__, bh,
handle, err);
} else {
- if (inode && bh)
+ if (inode)
mark_buffer_dirty_inode(bh, inode);
else
mark_buffer_dirty(bh);
diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index e4aaf61..004c9da 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -898,7 +898,7 @@ repeat_in_this_group:
BUFFER_TRACE(inode_bitmap_bh,
"call ext4_handle_dirty_metadata");
err = ext4_handle_dirty_metadata(handle,
- inode,
+ NULL,
inode_bitmap_bh);
if (err)
goto fail;
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 536067b..ecac8c5 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5120,7 +5120,7 @@ static int ext4_do_update_inode(handle_t *handle,
EXT4_FEATURE_RO_COMPAT_LARGE_FILE);
sb->s_dirt = 1;
ext4_handle_sync(handle);
- err = ext4_handle_dirty_metadata(handle, inode,
+ err = ext4_handle_dirty_metadata(handle, NULL,
EXT4_SB(sb)->s_sbh);
}
}
@@ -5149,7 +5149,7 @@ static int ext4_do_update_inode(handle_t *handle,
}

BUFFER_TRACE(bh, "call ext4_handle_dirty_metadata");
- rc = ext4_handle_dirty_metadata(handle, inode, bh);
+ rc = ext4_handle_dirty_metadata(handle, NULL, bh);
if (!err)
err = rc;
ext4_clear_inode_state(inode, EXT4_STATE_NEW);
@@ -5701,7 +5701,7 @@ static int ext4_pin_inode(handle_t *handle, struct inode *inode)
err = jbd2_journal_get_write_access(handle, iloc.bh);
if (!err)
err = ext4_handle_dirty_metadata(handle,
- inode,
+ NULL,
iloc.bh);
brelse(iloc.bh);
}
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index bd2dc0b..a13948f 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -2017,7 +2017,7 @@ int ext4_orphan_add(handle_t *handle, struct inode *inode)
/* Insert this inode at the head of the on-disk orphan list... */
NEXT_ORPHAN(inode) = le32_to_cpu(EXT4_SB(sb)->s_es->s_last_orphan);
EXT4_SB(sb)->s_es->s_last_orphan = cpu_to_le32(inode->i_ino);
- err = ext4_handle_dirty_metadata(handle, inode, EXT4_SB(sb)->s_sbh);
+ err = ext4_handle_dirty_metadata(handle, NULL, EXT4_SB(sb)->s_sbh);
rc = ext4_mark_iloc_dirty(handle, inode, &iloc);
if (!err)
err = rc;
@@ -2089,7 +2089,7 @@ int ext4_orphan_del(handle_t *handle, struct inode *inode)
if (err)
goto out_brelse;
sbi->s_es->s_last_orphan = cpu_to_le32(ino_next);
- err = ext4_handle_dirty_metadata(handle, inode, sbi->s_sbh);
+ err = ext4_handle_dirty_metadata(handle, NULL, sbi->s_sbh);
} else {
struct ext4_iloc iloc2;
struct inode *i_prev =
--
1.6.6.1.1.g974db.dirty


2010-03-02 18:18:56

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH 18/28] ext4: Fix fencepost error in chosing choosing group vs file preallocation.

From: Tao Ma <[email protected]>

The ext4 multiblock allocator decides whether to use group or file
preallocation based on the file size. When the file size reaches
s_mb_stream_request (default is 16 blocks), it changes to use a
file-specific preallocation. This is cool, but it has a tiny problem.

See a simple script:
mkfs.ext4 -b 1024 /dev/sda8 1000000
mount -t ext4 -o nodelalloc /dev/sda8 /mnt/ext4
for((i=0;i<5;i++))
do
cat /mnt/4096>>/mnt/ext4/a #4096 is a file with 4096 characters.
cat /mnt/4096>>/mnt/ext4/b
done
debuge4fs -R 'stat a' /dev/sda8|grep BLOCKS -A 1

And you get
BLOCKS:
(0-14):8705-8719, (15):2356, (16-19):8465-8468

So there are 3 extents, a bit strange for the lonely 15th logical
block. As we write to the 16 blocks, we choose file preallocation in
ext4_mb_group_or_file, but in ext4_mb_normalize_request, we meet with
the 16*1024 range, so no preallocation will be carried. file b then
reserves the space after '2356', so when when write 16, we start from
another part.

This patch just change the check in ext4_mb_group_or_file, so
that for the lonely 15 we will still use group preallocation.
After the patch, we will get:
debuge4fs -R 'stat a' /dev/sda8|grep BLOCKS -A 1
BLOCKS:
(0-15):8705-8720, (16-19):8465-8468

Looks more sane. Thanks.

Signed-off-by: Tao Ma <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/ext4/mballoc.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 415e11f..72d5ceb 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -3935,7 +3935,7 @@ static void ext4_mb_group_or_file(struct ext4_allocation_context *ac)

/* don't use group allocation for large files */
size = max(size, isize);
- if (size >= sbi->s_mb_stream_request) {
+ if (size > sbi->s_mb_stream_request) {
ac->ac_flags |= EXT4_MB_STREAM_ALLOC;
return;
}
--
1.6.6.1.1.g974db.dirty


2010-03-02 18:18:58

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH 17/28] jbd2: clean up an assertion in jbd2_journal_commit_transaction()

From: dingdinghua <[email protected]>

commit_transaction has the same value as journal->j_running_transaction,
so we can simplify the assert statement.

Signed-off-by: dingdinghua <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/jbd2/commit.c | 3 +--
1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index 3ee211e..671da7f 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -883,8 +883,7 @@ restart_loop:
spin_unlock(&journal->j_list_lock);
bh = jh2bh(jh);
jbd_lock_bh_state(bh);
- J_ASSERT_JH(jh, jh->b_transaction == commit_transaction ||
- jh->b_transaction == journal->j_running_transaction);
+ J_ASSERT_JH(jh, jh->b_transaction == commit_transaction);

/*
* If there is undo-protected committed data against
--
1.6.6.1.1.g974db.dirty


2010-03-02 18:18:58

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH 12/28] jbd2: delay discarding buffers in journal_unmap_buffer

From: dingdinghua <[email protected]>

Delay discarding buffers in journal_unmap_buffer until
we know that "add to orphan" operation has definitely been
committed, otherwise the log space of committing transation
may be freed and reused before truncate get committed, updates
may get lost if crash happens.

Signed-off-by: dingdinghua <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/jbd2/commit.c | 10 +++++-----
fs/jbd2/transaction.c | 43 +++++++++++++++++++++++++++++++------------
2 files changed, 36 insertions(+), 17 deletions(-)

diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index 1bc74b6..3ee211e 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -930,12 +930,12 @@ restart_loop:
/* A buffer which has been freed while still being
* journaled by a previous transaction may end up still
* being dirty here, but we want to avoid writing back
- * that buffer in the future now that the last use has
- * been committed. That's not only a performance gain,
- * it also stops aliasing problems if the buffer is left
- * behind for writeback and gets reallocated for another
+ * that buffer in the future after the "add to orphan"
+ * operation been committed, That's not only a performance
+ * gain, it also stops aliasing problems if the buffer is
+ * left behind for writeback and gets reallocated for another
* use in a different page. */
- if (buffer_freed(bh)) {
+ if (buffer_freed(bh) && !jh->b_next_transaction) {
clear_buffer_freed(bh);
clear_buffer_jbddirty(bh);
}
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index a051270..bfc70f5 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -1727,6 +1727,21 @@ static int journal_unmap_buffer(journal_t *journal, struct buffer_head *bh)
if (!jh)
goto zap_buffer_no_jh;

+ /*
+ * We cannot remove the buffer from checkpoint lists until the
+ * transaction adding inode to orphan list (let's call it T)
+ * is committed. Otherwise if the transaction changing the
+ * buffer would be cleaned from the journal before T is
+ * committed, a crash will cause that the correct contents of
+ * the buffer will be lost. On the other hand we have to
+ * clear the buffer dirty bit at latest at the moment when the
+ * transaction marking the buffer as freed in the filesystem
+ * structures is committed because from that moment on the
+ * buffer can be reallocated and used by a different page.
+ * Since the block hasn't been freed yet but the inode has
+ * already been added to orphan list, it is safe for us to add
+ * the buffer to BJ_Forget list of the newest transaction.
+ */
transaction = jh->b_transaction;
if (transaction == NULL) {
/* First case: not on any transaction. If it
@@ -1783,16 +1798,15 @@ static int journal_unmap_buffer(journal_t *journal, struct buffer_head *bh)
} else if (transaction == journal->j_committing_transaction) {
JBUFFER_TRACE(jh, "on committing transaction");
/*
- * If it is committing, we simply cannot touch it. We
- * can remove it's next_transaction pointer from the
- * running transaction if that is set, but nothing
- * else. */
+ * The buffer is committing, we simply cannot touch
+ * it. So we just set j_next_transaction to the
+ * running transaction (if there is one) and mark
+ * buffer as freed so that commit code knows it should
+ * clear dirty bits when it is done with the buffer.
+ */
set_buffer_freed(bh);
- if (jh->b_next_transaction) {
- J_ASSERT(jh->b_next_transaction ==
- journal->j_running_transaction);
- jh->b_next_transaction = NULL;
- }
+ if (journal->j_running_transaction && buffer_jbddirty(bh))
+ jh->b_next_transaction = journal->j_running_transaction;
jbd2_journal_put_journal_head(jh);
spin_unlock(&journal->j_list_lock);
jbd_unlock_bh_state(bh);
@@ -1969,7 +1983,7 @@ void jbd2_journal_file_buffer(struct journal_head *jh,
*/
void __jbd2_journal_refile_buffer(struct journal_head *jh)
{
- int was_dirty;
+ int was_dirty, jlist;
struct buffer_head *bh = jh2bh(jh);

J_ASSERT_JH(jh, jbd_is_locked_bh_state(bh));
@@ -1991,8 +2005,13 @@ void __jbd2_journal_refile_buffer(struct journal_head *jh)
__jbd2_journal_temp_unlink_buffer(jh);
jh->b_transaction = jh->b_next_transaction;
jh->b_next_transaction = NULL;
- __jbd2_journal_file_buffer(jh, jh->b_transaction,
- jh->b_modified ? BJ_Metadata : BJ_Reserved);
+ if (buffer_freed(bh))
+ jlist = BJ_Forget;
+ else if (jh->b_modified)
+ jlist = BJ_Metadata;
+ else
+ jlist = BJ_Reserved;
+ __jbd2_journal_file_buffer(jh, jh->b_transaction, jlist);
J_ASSERT_JH(jh, jh->b_transaction->t_state == T_RUNNING);

if (was_dirty)
--
1.6.6.1.1.g974db.dirty


2010-03-02 18:18:58

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH 16/28] ext4: trivial quota cleanup

From: Dmitry Monakhov <[email protected]>

The patch is aimed to reorganize and simplify quota code a bit.
Quota code is itself complex enough, but we can make it more readable
in some places:
- Move quota option parsing to separate functions.
- Simplify old-quota and journaled-quota mix check.

Signed-off-by: Dmitry Monakhov <[email protected]>
Acked-by: Jan Kara <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/ext4/super.c | 123 +++++++++++++++++++++++++++++++------------------------
1 files changed, 69 insertions(+), 54 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 7e8b1b4..79937e9 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1206,6 +1206,64 @@ static ext4_fsblk_t get_sb_block(void **data)

#define DEFAULT_JOURNAL_IOPRIO (IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, 3))

+#ifdef CONFIG_QUOTA
+static int set_qf_name(struct super_block *sb, int qtype, substring_t *args)
+{
+ struct ext4_sb_info *sbi = EXT4_SB(sb);
+ char *qname;
+
+ if (sb_any_quota_loaded(sb) &&
+ !sbi->s_qf_names[qtype]) {
+ ext4_msg(sb, KERN_ERR,
+ "Cannot change journaled "
+ "quota options when quota turned on");
+ return 0;
+ }
+ qname = match_strdup(args);
+ if (!qname) {
+ ext4_msg(sb, KERN_ERR,
+ "Not enough memory for storing quotafile name");
+ return 0;
+ }
+ if (sbi->s_qf_names[qtype] &&
+ strcmp(sbi->s_qf_names[qtype], qname)) {
+ ext4_msg(sb, KERN_ERR,
+ "%s quota file already specified", QTYPE2NAME(qtype));
+ kfree(qname);
+ return 0;
+ }
+ sbi->s_qf_names[qtype] = qname;
+ if (strchr(sbi->s_qf_names[qtype], '/')) {
+ ext4_msg(sb, KERN_ERR,
+ "quotafile must be on filesystem root");
+ kfree(sbi->s_qf_names[qtype]);
+ sbi->s_qf_names[qtype] = NULL;
+ return 0;
+ }
+ set_opt(sbi->s_mount_opt, QUOTA);
+ return 1;
+}
+
+static int clear_qf_name(struct super_block *sb, int qtype)
+{
+
+ struct ext4_sb_info *sbi = EXT4_SB(sb);
+
+ if (sb_any_quota_loaded(sb) &&
+ sbi->s_qf_names[qtype]) {
+ ext4_msg(sb, KERN_ERR, "Cannot change journaled quota options"
+ " when quota turned on");
+ return 0;
+ }
+ /*
+ * The space will be released later when all options are confirmed
+ * to be correct
+ */
+ sbi->s_qf_names[qtype] = NULL;
+ return 1;
+}
+#endif
+
static int parse_options(char *options, struct super_block *sb,
unsigned long *journal_devnum,
unsigned int *journal_ioprio,
@@ -1217,8 +1275,7 @@ static int parse_options(char *options, struct super_block *sb,
int data_opt = 0;
int option;
#ifdef CONFIG_QUOTA
- int qtype, qfmt;
- char *qname;
+ int qfmt;
#endif

if (!options)
@@ -1401,63 +1458,22 @@ static int parse_options(char *options, struct super_block *sb,
break;
#ifdef CONFIG_QUOTA
case Opt_usrjquota:
- qtype = USRQUOTA;
- goto set_qf_name;
- case Opt_grpjquota:
- qtype = GRPQUOTA;
-set_qf_name:
- if (sb_any_quota_loaded(sb) &&
- !sbi->s_qf_names[qtype]) {
- ext4_msg(sb, KERN_ERR,
- "Cannot change journaled "
- "quota options when quota turned on");
+ if (!set_qf_name(sb, USRQUOTA, &args[0]))
return 0;
- }
- qname = match_strdup(&args[0]);
- if (!qname) {
- ext4_msg(sb, KERN_ERR,
- "Not enough memory for "
- "storing quotafile name");
- return 0;
- }
- if (sbi->s_qf_names[qtype] &&
- strcmp(sbi->s_qf_names[qtype], qname)) {
- ext4_msg(sb, KERN_ERR,
- "%s quota file already "
- "specified", QTYPE2NAME(qtype));
- kfree(qname);
- return 0;
- }
- sbi->s_qf_names[qtype] = qname;
- if (strchr(sbi->s_qf_names[qtype], '/')) {
- ext4_msg(sb, KERN_ERR,
- "quotafile must be on "
- "filesystem root");
- kfree(sbi->s_qf_names[qtype]);
- sbi->s_qf_names[qtype] = NULL;
+ break;
+ case Opt_grpjquota:
+ if (!set_qf_name(sb, GRPQUOTA, &args[0]))
return 0;
- }
- set_opt(sbi->s_mount_opt, QUOTA);
break;
case Opt_offusrjquota:
- qtype = USRQUOTA;
- goto clear_qf_name;
+ if (!clear_qf_name(sb, USRQUOTA))
+ return 0;
+ break;
case Opt_offgrpjquota:
- qtype = GRPQUOTA;
-clear_qf_name:
- if (sb_any_quota_loaded(sb) &&
- sbi->s_qf_names[qtype]) {
- ext4_msg(sb, KERN_ERR, "Cannot change "
- "journaled quota options when "
- "quota turned on");
+ if (!clear_qf_name(sb, GRPQUOTA))
return 0;
- }
- /*
- * The space will be released later when all options
- * are confirmed to be correct
- */
- sbi->s_qf_names[qtype] = NULL;
break;
+
case Opt_jqfmt_vfsold:
qfmt = QFMT_VFS_OLD;
goto set_qf_format;
@@ -1630,8 +1646,7 @@ set_qf_format:
if (test_opt(sb, GRPQUOTA) && sbi->s_qf_names[GRPQUOTA])
clear_opt(sbi->s_mount_opt, GRPQUOTA);

- if ((sbi->s_qf_names[USRQUOTA] && test_opt(sb, GRPQUOTA)) ||
- (sbi->s_qf_names[GRPQUOTA] && test_opt(sb, USRQUOTA))) {
+ if (test_opt(sb, GRPQUOTA) || test_opt(sb, USRQUOTA)) {
ext4_msg(sb, KERN_ERR, "old and new quota "
"format mixing");
return 0;
--
1.6.6.1.1.g974db.dirty


2010-03-02 18:18:58

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH 20/28] ext4: fix error handling in migrate

From: Dmitry Monakhov <[email protected]>

Set i_nlink to zero for temporary inode from very beginning.
otherwise we may fail to start new journal handle and this
inode will be unreferenced but with i_nlink == 1
Since we hold inode reference it can not be pruned.

Also add missed journal_start retval check.

Signed-off-by: Dmitry Monakhov <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/ext4/migrate.c | 29 ++++++++++++++---------------
1 files changed, 14 insertions(+), 15 deletions(-)

diff --git a/fs/ext4/migrate.c b/fs/ext4/migrate.c
index 46a4101..8b87bd0 100644
--- a/fs/ext4/migrate.c
+++ b/fs/ext4/migrate.c
@@ -503,14 +503,10 @@ int ext4_ext_migrate(struct inode *inode)
}
i_size_write(tmp_inode, i_size_read(inode));
/*
- * We don't want the inode to be reclaimed
- * if we got interrupted in between. We have
- * this tmp inode carrying reference to the
- * data blocks of the original file. We set
- * the i_nlink to zero at the last stage after
- * switching the original file to extent format
+ * Set the i_nlink to zero so it will be deleted later
+ * when we drop inode reference.
*/
- tmp_inode->i_nlink = 1;
+ tmp_inode->i_nlink = 0;

ext4_ext_tree_init(handle, tmp_inode);
ext4_orphan_add(handle, tmp_inode);
@@ -537,6 +533,16 @@ int ext4_ext_migrate(struct inode *inode)
up_read((&EXT4_I(inode)->i_data_sem));

handle = ext4_journal_start(inode, 1);
+ if (IS_ERR(handle)) {
+ /*
+ * It is impossible to update on-disk structures without
+ * a handle, so just rollback in-core changes and live other
+ * work to orphan_list_cleanup()
+ */
+ ext4_orphan_del(NULL, tmp_inode);
+ retval = PTR_ERR(handle);
+ goto out;
+ }

ei = EXT4_I(inode);
i_data = ei->i_data;
@@ -618,15 +624,8 @@ err_out:

/* Reset the extent details */
ext4_ext_tree_init(handle, tmp_inode);
-
- /*
- * Set the i_nlink to zero so that
- * generic_drop_inode really deletes the
- * inode
- */
- tmp_inode->i_nlink = 0;

2010-03-02 18:18:58

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH 11/28] ext4: correctly calculate number of blocks for fiemap

From: Leonard Michlmayr <[email protected]>

ext4_fiemap() rounds the length of the requested range down to
blocksize, which is is not the true number of blocks that cover the
requested region. This problem is especially impressive if the user
requests only the first byte of a file: not a single extent will be
reported.

We fix this by calculating the last block of the region and then
subtract to find the number of blocks in the extents.

Signed-off-by: Leonard Michlmayr <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/ext4/extents.c | 6 ++++--
1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index bd80891..bc9860f 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3768,7 +3768,6 @@ int ext4_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
__u64 start, __u64 len)
{
ext4_lblk_t start_blk;
- ext4_lblk_t len_blks;
int error = 0;

/* fallback to generic here if not in extents fmt */
@@ -3782,8 +3781,11 @@ int ext4_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
if (fieinfo->fi_flags & FIEMAP_FLAG_XATTR) {
error = ext4_xattr_fiemap(inode, fieinfo);
} else {
+ ext4_lblk_t last_blk, len_blks;
+
start_blk = start >> inode->i_sb->s_blocksize_bits;
- len_blks = len >> inode->i_sb->s_blocksize_bits;
+ last_blk = (start + len - 1) >> inode->i_sb->s_blocksize_bits;
+ len_blks = last_blk - start_blk + 1;

/*
* Walk the extent tree gathering extent information.
--
1.6.6.1.1.g974db.dirty


2010-03-02 18:18:56

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH 14/28] ext4: Add flag to files with blocks intentionally past EOF

From: Jiaying Zhang <[email protected]>

fallocate() may potentially instantiate blocks past EOF, depending
on the flags used when it is called.

e2fsck currently has a test for blocks past i_size, and it
sometimes trips up - noticeably on xfstests 013 which runs fsstress.

This patch from Jiayang does fix it up - it (along with
e2fsprogs updates and other patches recently from Aneesh) has
survived many fsstress runs in a row.

Signed-off-by: Eric Sandeen <[email protected]>
Signed-off-by: Jiaying Zhang <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/ext4/ext4.h | 5 +++--
fs/ext4/extents.c | 22 +++++++++++++++++++++-
fs/ext4/inode.c | 9 ++++++++-
fs/ext4/ioctl.c | 9 +++++++++
4 files changed, 41 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 509437f..74664ca 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -285,10 +285,11 @@ struct flex_groups {
#define EXT4_HUGE_FILE_FL 0x00040000 /* Set to each huge file */
#define EXT4_EXTENTS_FL 0x00080000 /* Inode uses extents */
#define EXT4_EA_INODE_FL 0x00200000 /* Inode used for large EA */
+#define EXT4_EOFBLOCKS_FL 0x00400000 /* Blocks allocated beyond EOF */
#define EXT4_RESERVED_FL 0x80000000 /* reserved for ext4 lib */

-#define EXT4_FL_USER_VISIBLE 0x000BDFFF /* User visible flags */
-#define EXT4_FL_USER_MODIFIABLE 0x000B80FF /* User modifiable flags */
+#define EXT4_FL_USER_VISIBLE 0x004BDFFF /* User visible flags */
+#define EXT4_FL_USER_MODIFIABLE 0x004B80FF /* User modifiable flags */

/* Flags that should be inherited by new inodes from their parent. */
#define EXT4_FL_INHERITED (EXT4_SECRM_FL | EXT4_UNRM_FL | EXT4_COMPR_FL |\
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index bc9860f..e0e2009 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3186,7 +3186,7 @@ int ext4_ext_get_blocks(handle_t *handle, struct inode *inode,
{
struct ext4_ext_path *path = NULL;
struct ext4_extent_header *eh;
- struct ext4_extent newex, *ex;
+ struct ext4_extent newex, *ex, *last_ex;
ext4_fsblk_t newblock;
int err = 0, depth, ret, cache_type;
unsigned int allocated = 0;
@@ -3367,6 +3367,19 @@ int ext4_ext_get_blocks(handle_t *handle, struct inode *inode,
EXT4_STATE_DIO_UNWRITTEN);
}
}
+
+ if (unlikely(EXT4_I(inode)->i_flags & EXT4_EOFBLOCKS_FL)) {
+ if (eh->eh_entries) {
+ last_ex = EXT_LAST_EXTENT(eh);
+ if (iblock + ar.len > le32_to_cpu(last_ex->ee_block)
+ + ext4_ext_get_actual_len(last_ex))
+ EXT4_I(inode)->i_flags &= ~EXT4_EOFBLOCKS_FL;
+ } else {
+ WARN_ON(eh->eh_entries == 0);
+ ext4_error(inode->i_sb, __func__,
+ "inode#%lu, eh->eh_entries = 0!", inode->i_ino);
+ }
+ }
err = ext4_ext_insert_extent(handle, inode, path, &newex, flags);
if (err) {
/* free data blocks we just allocated */
@@ -3500,6 +3513,13 @@ static void ext4_falloc_update_inode(struct inode *inode,
i_size_write(inode, new_size);
if (new_size > EXT4_I(inode)->i_disksize)
ext4_update_i_disksize(inode, new_size);
+ } else {
+ /*
+ * Mark that we allocate beyond EOF so the subsequent truncate
+ * can proceed even if the new size is the same as i_size.
+ */
+ if (new_size > i_size_read(inode))
+ EXT4_I(inode)->i_flags |= EXT4_EOFBLOCKS_FL;
}

}
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index ecac8c5..edb7edc 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4456,6 +4456,8 @@ void ext4_truncate(struct inode *inode)
if (!ext4_can_truncate(inode))
return;

+ EXT4_I(inode)->i_flags &= ~EXT4_EOFBLOCKS_FL;
+
if (inode->i_size == 0 && !test_opt(inode->i_sb, NO_AUTO_DA_ALLOC))
ext4_set_inode_state(inode, EXT4_STATE_DA_ALLOC_CLOSE);

@@ -5305,7 +5307,9 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
}

if (S_ISREG(inode->i_mode) &&
- attr->ia_valid & ATTR_SIZE && attr->ia_size < inode->i_size) {
+ attr->ia_valid & ATTR_SIZE &&
+ (attr->ia_size < inode->i_size ||
+ (EXT4_I(inode)->i_flags & EXT4_EOFBLOCKS_FL))) {
handle_t *handle;

handle = ext4_journal_start(inode, 3);
@@ -5336,6 +5340,9 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
goto err_out;
}
}
+ /* ext4_truncate will clear the flag */
+ if ((EXT4_I(inode)->i_flags & EXT4_EOFBLOCKS_FL))
+ ext4_truncate(inode);
}

rc = inode_setattr(inode, attr);
diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index b63d193..2220feb 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -92,6 +92,15 @@ long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
flags &= ~EXT4_EXTENTS_FL;
}

+ if (flags & EXT4_EOFBLOCKS_FL) {
+ /* we don't support adding EOFBLOCKS flag */
+ if (!(oldflags & EXT4_EOFBLOCKS_FL)) {
+ err = -EOPNOTSUPP;
+ goto flags_out;
+ }
+ } else if (oldflags & EXT4_EOFBLOCKS_FL)
+ ext4_truncate(inode);
+
handle = ext4_journal_start(inode, 1);
if (IS_ERR(handle)) {
err = PTR_ERR(handle);
--
1.6.6.1.1.g974db.dirty


2010-03-02 18:18:59

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH 15/28] ext4: mount flags manipulation cleanup

From: Dmitry Monakhov <[email protected]>

Replace intermediate EXT4_MOUNT_XXX flags manipulation to
corresponding macro.

Signed-off-by: Dmitry Monakhov <[email protected]>
Acked-by: Jan Kara <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/ext4/super.c | 31 +++++++++++++------------------
1 files changed, 13 insertions(+), 18 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 1c85bb6..7e8b1b4 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -796,10 +796,10 @@ static inline void ext4_show_quota_options(struct seq_file *seq,
if (sbi->s_qf_names[GRPQUOTA])
seq_printf(seq, ",grpjquota=%s", sbi->s_qf_names[GRPQUOTA]);

- if (sbi->s_mount_opt & EXT4_MOUNT_USRQUOTA)
+ if (test_opt(sb, USRQUOTA))
seq_puts(seq, ",usrquota");

- if (sbi->s_mount_opt & EXT4_MOUNT_GRPQUOTA)
+ if (test_opt(sb, GRPQUOTA))
seq_puts(seq, ",grpquota");
#endif
}
@@ -1383,14 +1383,13 @@ static int parse_options(char *options, struct super_block *sb,
data_opt = EXT4_MOUNT_WRITEBACK_DATA;
datacheck:
if (is_remount) {
- if ((sbi->s_mount_opt & EXT4_MOUNT_DATA_FLAGS)
- != data_opt) {
+ if (test_opt(sb, DATA_FLAGS) != data_opt) {
ext4_msg(sb, KERN_ERR,
"Cannot change data mode on remount");
return 0;
}
} else {
- sbi->s_mount_opt &= ~EXT4_MOUNT_DATA_FLAGS;
+ clear_opt(sbi->s_mount_opt, DATA_FLAGS);
sbi->s_mount_opt |= data_opt;
}
break;
@@ -1625,18 +1624,14 @@ set_qf_format:
}
#ifdef CONFIG_QUOTA
if (sbi->s_qf_names[USRQUOTA] || sbi->s_qf_names[GRPQUOTA]) {
- if ((sbi->s_mount_opt & EXT4_MOUNT_USRQUOTA) &&
- sbi->s_qf_names[USRQUOTA])
+ if (test_opt(sb, USRQUOTA) && sbi->s_qf_names[USRQUOTA])
clear_opt(sbi->s_mount_opt, USRQUOTA);

- if ((sbi->s_mount_opt & EXT4_MOUNT_GRPQUOTA) &&
- sbi->s_qf_names[GRPQUOTA])
+ if (test_opt(sb, GRPQUOTA) && sbi->s_qf_names[GRPQUOTA])
clear_opt(sbi->s_mount_opt, GRPQUOTA);

- if ((sbi->s_qf_names[USRQUOTA] &&
- (sbi->s_mount_opt & EXT4_MOUNT_GRPQUOTA)) ||
- (sbi->s_qf_names[GRPQUOTA] &&
- (sbi->s_mount_opt & EXT4_MOUNT_USRQUOTA))) {
+ if ((sbi->s_qf_names[USRQUOTA] && test_opt(sb, GRPQUOTA)) ||
+ (sbi->s_qf_names[GRPQUOTA] && test_opt(sb, USRQUOTA))) {
ext4_msg(sb, KERN_ERR, "old and new quota "
"format mixing");
return 0;
@@ -2452,11 +2447,11 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
set_opt(sbi->s_mount_opt, POSIX_ACL);
#endif
if ((def_mount_opts & EXT4_DEFM_JMODE) == EXT4_DEFM_JMODE_DATA)
- sbi->s_mount_opt |= EXT4_MOUNT_JOURNAL_DATA;
+ set_opt(sbi->s_mount_opt, JOURNAL_DATA);
else if ((def_mount_opts & EXT4_DEFM_JMODE) == EXT4_DEFM_JMODE_ORDERED)
- sbi->s_mount_opt |= EXT4_MOUNT_ORDERED_DATA;
+ set_opt(sbi->s_mount_opt, ORDERED_DATA);
else if ((def_mount_opts & EXT4_DEFM_JMODE) == EXT4_DEFM_JMODE_WBACK)
- sbi->s_mount_opt |= EXT4_MOUNT_WRITEBACK_DATA;
+ set_opt(sbi->s_mount_opt, WRITEBACK_DATA);

if (le16_to_cpu(sbi->s_es->s_errors) == EXT4_ERRORS_PANIC)
set_opt(sbi->s_mount_opt, ERRORS_PANIC);
@@ -2484,7 +2479,7 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
goto failed_mount;

sb->s_flags = (sb->s_flags & ~MS_POSIXACL) |
- ((sbi->s_mount_opt & EXT4_MOUNT_POSIX_ACL) ? MS_POSIXACL : 0);
+ (test_opt(sb, POSIX_ACL) ? MS_POSIXACL : 0);

if (le32_to_cpu(es->s_rev_level) == EXT4_GOOD_OLD_REV &&
(EXT4_HAS_COMPAT_FEATURE(sb, ~0U) ||
@@ -3520,7 +3515,7 @@ static int ext4_remount(struct super_block *sb, int *flags, char *data)
ext4_abort(sb, __func__, "Abort forced by user");

sb->s_flags = (sb->s_flags & ~MS_POSIXACL) |
- ((sbi->s_mount_opt & EXT4_MOUNT_POSIX_ACL) ? MS_POSIXACL : 0);
+ (test_opt(sb, POSIX_ACL) ? MS_POSIXACL : 0);

es = sbi->s_es;

--
1.6.6.1.1.g974db.dirty


2010-03-02 18:18:59

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH 10/28] ext4: add missing error checking to ext4_expand_extra_isize_ea()

From: Roel Kluin <[email protected]>

Signed-off-by: Roel Kluin <[email protected]>
---
fs/ext4/xattr.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
index 627c98a..efc16a4 100644
--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -1300,6 +1300,8 @@ retry:

/* Remove the chosen entry from the inode */
error = ext4_xattr_ibody_set(handle, inode, &i, is);
+ if (error)
+ goto cleanup;

entry = IFIRST(header);
if (entry_size + EXT4_XATTR_SIZE(size) >= new_extra_isize)
--
1.6.6.1.1.g974db.dirty


2010-03-02 18:18:56

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH 23/28] ext4: make "offset" consistent in ext4_check_dir_entry()

From: Toshiyuki Okajima <[email protected]>

The callers of ext4_check_dir_entry() usually pass in the "file
offset" (ext4_readdir, htree_dirblock_to_tree, search_dirblock,
ext4_dx_find_entry, empty_dir), but a few callers (add_dirent_to_buf,
ext4_delete_entry) only pass in the buffer offset.

To accomodate those last two (which would be hard to fix otherwise),
this patch changes ext4_check_dir_entry() to print the physical block
number and the relative offset as well as the passed-in offset.

Signed-off-by: Toshiyuki Okajima <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/ext4/dir.c | 8 +++++---
1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c
index 0e0bef3..29857dd 100644
--- a/fs/ext4/dir.c
+++ b/fs/ext4/dir.c
@@ -84,9 +84,11 @@ int ext4_check_dir_entry(const char *function, struct inode *dir,

if (error_msg != NULL)
__ext4_error(dir->i_sb, function,
- "bad entry in directory #%lu: %s - "
- "offset=%u, inode=%u, rec_len=%d, name_len=%d",
- dir->i_ino, error_msg, offset,
+ "bad entry in directory #%lu: %s - block=%llu"
+ "offset=%u(%u), inode=%u, rec_len=%d, name_len=%d",
+ dir->i_ino, error_msg,
+ (unsigned long long) bh->b_blocknr,
+ (unsigned) (offset%bh->b_size), offset,
le32_to_cpu(de->inode),
rlen, de->name_len);
return error_msg == NULL ? 1 : 0;
--
1.6.6.1.1.g974db.dirty


2010-03-02 18:18:55

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH 06/28] ext4: Use slab allocator for sub-page sized allocations

Now that the SLUB seems to be fixed so that it respects the requested
alignment, use kmem_cache_alloc() to allocator if the block size of
the buffer heads to be allocated is less than the page size.
Previously, we were using 16k page on a Power system for each buffer,
even when the file system was using 1k or 4k block size.

Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/jbd2/journal.c | 132 ++++++++++++++++++++++++++++++++++++++++++++++++++
include/linux/jbd2.h | 11 +---
2 files changed, 134 insertions(+), 9 deletions(-)

diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index ac0d027..c03d4dc 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -39,6 +39,8 @@
#include <linux/seq_file.h>
#include <linux/math64.h>
#include <linux/hash.h>
+#include <linux/log2.h>
+#include <linux/vmalloc.h>

#define CREATE_TRACE_POINTS
#include <trace/events/jbd2.h>
@@ -93,6 +95,7 @@ EXPORT_SYMBOL(jbd2_journal_begin_ordered_truncate);

static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *);
static void __journal_abort_soft (journal_t *journal, int errno);
+static int jbd2_journal_create_slab(size_t slab_size);

/*
* Helper function used to manage commit timeouts
@@ -1248,6 +1251,13 @@ int jbd2_journal_load(journal_t *journal)
}
}

+ /*
+ * Create a slab for this blocksize
+ */
+ err = jbd2_journal_create_slab(be32_to_cpu(sb->s_blocksize));
+ if (err)
+ return err;
+
/* Let the recovery code check whether it needs to recover any
* data from the journal. */
if (jbd2_journal_recover(journal))
@@ -1807,6 +1817,127 @@ size_t journal_tag_bytes(journal_t *journal)
}

/*
+ * JBD memory management
+ *
+ * These functions are used to allocate block-sized chunks of memory
+ * used for making copies of buffer_head data. Very often it will be
+ * page-sized chunks of data, but sometimes it will be in
+ * sub-page-size chunks. (For example, 16k pages on Power systems
+ * with a 4k block file system.) For blocks smaller than a page, we
+ * use a SLAB allocator. There are slab caches for each block size,
+ * which are allocated at mount time, if necessary, and we only free
+ * (all of) the slab caches when/if the jbd2 module is unloaded. For
+ * this reason we don't need to a mutex to protect access to
+ * jbd2_slab[] allocating or releasing memory; only in
+ * jbd2_journal_create_slab().
+ */
+#define JBD2_MAX_SLABS 8
+static struct kmem_cache *jbd2_slab[JBD2_MAX_SLABS];
+static DECLARE_MUTEX(jbd2_slab_create_sem);
+
+static const char *jbd2_slab_names[JBD2_MAX_SLABS] = {
+ "jbd2_1k", "jbd2_2k", "jbd2_4k", "jbd2_8k",
+ "jbd2_16k", "jbd2_32k", "jbd2_64k", "jbd2_128k"
+};
+
+
+static void jbd2_journal_destroy_slabs(void)
+{
+ int i;
+
+ for (i = 0; i < JBD2_MAX_SLABS; i++) {
+ if (jbd2_slab[i])
+ kmem_cache_destroy(jbd2_slab[i]);
+ jbd2_slab[i] = NULL;
+ }
+}
+
+static int jbd2_journal_create_slab(size_t size)
+{
+ int i = order_base_2(size) - 10;
+ size_t slab_size;
+
+ if (size == PAGE_SIZE)
+ return 0;
+
+ if (i >= JBD2_MAX_SLABS)
+ return -EINVAL;
+
+ if (unlikely(i < 0))
+ i = 0;
+ down(&jbd2_slab_create_sem);
+ if (jbd2_slab[i]) {
+ up(&jbd2_slab_create_sem);
+ return 0; /* Already created */
+ }
+
+ slab_size = 1 << (i+10);
+ jbd2_slab[i] = kmem_cache_create(jbd2_slab_names[i], slab_size,
+ slab_size, 0, NULL);
+ up(&jbd2_slab_create_sem);
+ if (!jbd2_slab[i]) {
+ printk(KERN_EMERG "JBD2: no memory for jbd2_slab cache\n");
+ return -ENOMEM;
+ }
+ return 0;
+}
+
+static struct kmem_cache *get_slab(size_t size)
+{
+ int i = order_base_2(size) - 10;
+
+ BUG_ON(i >= JBD2_MAX_SLABS);
+ if (unlikely(i < 0))
+ i = 0;
+ BUG_ON(jbd2_slab[i] == 0);
+ return jbd2_slab[i];
+}
+
+void *jbd2_alloc(size_t size, gfp_t flags)
+{
+ void *ptr;
+
+ BUG_ON(size & (size-1)); /* Must be a power of 2 */
+
+ flags |= __GFP_REPEAT;
+ if (size == PAGE_SIZE)
+ ptr = (void *)__get_free_pages(flags, 0);
+ else if (size > PAGE_SIZE) {
+ int order = get_order(size);
+
+ if (order < 3)
+ ptr = (void *)__get_free_pages(flags, order);
+ else
+ ptr = vmalloc(size);
+ } else
+ ptr = kmem_cache_alloc(get_slab(size), flags);
+
+ /* Check alignment; SLUB has gotten this wrong in the past,
+ * and this can lead to user data corruption! */
+ BUG_ON(((unsigned long) ptr) & (size-1));
+
+ return ptr;
+}
+
+void jbd2_free(void *ptr, size_t size)
+{
+ if (size == PAGE_SIZE) {
+ free_pages((unsigned long)ptr, 0);
+ return;
+ }
+ if (size > PAGE_SIZE) {
+ int order = get_order(size);
+
+ if (order < 3)
+ free_pages((unsigned long)ptr, order);
+ else
+ vfree(ptr);
+ return;
+ }
+ kmem_cache_free(get_slab(size), ptr);
+};
+
+/*
* Journal_head storage management
*/
static struct kmem_cache *jbd2_journal_head_cache;
@@ -2204,6 +2335,7 @@ static void jbd2_journal_destroy_caches(void)
jbd2_journal_destroy_revoke_caches();
jbd2_journal_destroy_jbd2_journal_head_cache();
jbd2_journal_destroy_handle_cache();
+ jbd2_journal_destroy_slabs();
}

static int __init journal_init(void)
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index 638ce45..8ada2a1 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -69,15 +69,8 @@ extern u8 jbd2_journal_enable_debug;
#define jbd_debug(f, a...) /**/
#endif

-static inline void *jbd2_alloc(size_t size, gfp_t flags)
-{
- return (void *)__get_free_pages(flags, get_order(size));
-}

2010-03-02 18:18:54

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH 26/28] ext4: Use direct_IO_no_locking in ext4 dio read.

Signed-off-by: Jiaying Zhang <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/ext4/inode.c | 9 ++++++++-
1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index ae7adf2..576bbe2 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3440,7 +3440,14 @@ static ssize_t ext4_ind_direct_IO(int rw, struct kiocb *iocb,
}

retry:
- ret = blockdev_direct_IO(rw, iocb, inode, inode->i_sb->s_bdev, iov,
+ if (rw == READ && ext4_should_dioread_nolock(inode))
+ ret = blockdev_direct_IO_no_locking(rw, iocb, inode,
+ inode->i_sb->s_bdev, iov,
+ offset, nr_segs,
+ ext4_get_block, NULL);
+ else
+ ret = blockdev_direct_IO(rw, iocb, inode,
+ inode->i_sb->s_bdev, iov,
offset, nr_segs,
ext4_get_block, NULL);
if (ret == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries))
--
1.6.6.1.1.g974db.dirty


2010-03-02 18:26:19

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 00/28] Ext4 patch queue for the 2.6.34 merge window.

On Tue, Mar 02, 2010 at 01:18:17PM -0500, Theodore Ts'o wrote:
> This is what I have queued up to send to Linus for the 2.6.34 merge
> window. It's in the ext4 patch queue and in the ext4 tree (everything
> between origin and next). I'm currently doing final testing and plan to
> ship it to Linus in the next day or two.
>
> If you think something is missing or should be dropped for this list, or
> see something wrong, please let me know ASAP. Thanks!!
>
> - Ted
>
> Theodore Ts'o (9):
> ext4: mechanical change on dio get_block code in prepare for it to be
> used by buffer write
> ext4: use ext4_get_block_write in buffer write
> ext4: Use direct_IO_no_locking in ext4 dio read.

This should be properly credited to Jiaying Zhang. Oops. I'll fix
this before pushing...

- Ted

2010-03-03 07:48:05

by Akira Fujita

[permalink] [raw]
Subject: Re: [PATCH 11/28] ext4: correctly calculate number of blocks for fiemap

Hi,

> From: Leonard Michlmayr<[email protected]>
>
> ext4_fiemap() rounds the length of the requested range down to
> blocksize, which is is not the true number of blocks that cover the
> requested region. This problem is especially impressive if the user
> requests only the first byte of a file: not a single extent will be
> reported.

<snip>

> start_blk = start>> inode->i_sb->s_blocksize_bits;
> - len_blks = len>> inode->i_sb->s_blocksize_bits;
> + last_blk = (start + len - 1)>> inode->i_sb->s_blocksize_bits;
> + len_blks = last_blk - start_blk + 1;

In 1KB block size, the overflow occurs at above line.
Since last_blk is set 0xffffffff when len is equal to s_maxbytes.
Therefore ext4_fiemap() can not get correct extent information
with 0 length. How about adding this change?

Signed-off-by: Akira Fujita <[email protected]>
---
extents.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- linux-2.6.33-rc8-a/fs/ext4/extents.c 2010-03-03 14:53:49.000000000 +0900
+++ linux-2.6.33-rc8-b/fs/ext4/extents.c 2010-03-03 15:31:57.000000000 +0900
@@ -3912,7 +3912,8 @@ int ext4_fiemap(struct inode *inode, str

start_blk = start >> inode->i_sb->s_blocksize_bits;
last_blk = (start + len - 1) >> inode->i_sb->s_blocksize_bits;
- len_blks = last_blk - start_blk + 1;
+ len_blks = (loff_t)last_blk - start_blk + 1 < EXT_MAX_BLOCK ?
+ last_blk - start_blk + 1 : EXT_MAX_BLOCK;

/*
* Walk the extent tree gathering extent information.


(2010/03/03 3:18), Theodore Ts'o wrote:
> From: Leonard Michlmayr<[email protected]>
>
> ext4_fiemap() rounds the length of the requested range down to
> blocksize, which is is not the true number of blocks that cover the
> requested region. This problem is especially impressive if the user
> requests only the first byte of a file: not a single extent will be
> reported.
>
> We fix this by calculating the last block of the region and then
> subtract to find the number of blocks in the extents.
>
> Signed-off-by: Leonard Michlmayr<[email protected]>
> Signed-off-by: "Theodore Ts'o"<[email protected]>
> ---
> fs/ext4/extents.c | 6 ++++--
> 1 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index bd80891..bc9860f 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -3768,7 +3768,6 @@ int ext4_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
> __u64 start, __u64 len)
> {
> ext4_lblk_t start_blk;
> - ext4_lblk_t len_blks;
> int error = 0;
>
> /* fallback to generic here if not in extents fmt */
> @@ -3782,8 +3781,11 @@ int ext4_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
> if (fieinfo->fi_flags& FIEMAP_FLAG_XATTR) {
> error = ext4_xattr_fiemap(inode, fieinfo);
> } else {
> + ext4_lblk_t last_blk, len_blks;
> +
> start_blk = start>> inode->i_sb->s_blocksize_bits;
> - len_blks = len>> inode->i_sb->s_blocksize_bits;
> + last_blk = (start + len - 1)>> inode->i_sb->s_blocksize_bits;
> + len_blks = last_blk - start_blk + 1;
>
> /*
> * Walk the extent tree gathering extent information.

--
Akira Fujita <[email protected]>

The First Fundamental Software Development Group,
Software Development Division,
NEC Software Tohoku, Ltd.

[email protected]@rs.jp.nec.com:

2010-03-03 08:34:34

by Leonard Michlmayr

[permalink] [raw]
Subject: Re: [PATCH 11/28] ext4: correctly calculate number of blocks for fiemap

Hi

I think that the patch is valid as it is.

<snip>
> > start_blk = start>> inode->i_sb->s_blocksize_bits;
> > - len_blks = len>> inode->i_sb->s_blocksize_bits;
> > + last_blk = (start + len - 1)>> inode->i_sb->s_blocksize_bits;
> > + len_blks = last_blk - start_blk + 1;
>
> In 1KB block size, the overflow occurs at above line.
> Since last_blk is set 0xffffffff when len is equal to s_maxbytes.
> Therefore ext4_fiemap() can not get correct extent information
> with 0 length. How about adding this change?
>

I have considered this possibility, but len == 0 is an invalid request
and would be sorted out by fs/ioctl.c:fiemap_check_ranges. If you want
to make sure that everything is correct even if len == 0 slips through
consider Andreas Dilger's solution which does not introduce a new branch
to the code:

(Here end_blk is one more than the last block)

end_blk = (start + len + inode->i_sb->s_blocksize - 1) >> inode->i_sb->s_blocksize_bits;
len_blks = end_blk - start_blk;

I think we don't need to copy the functionality of fiemap_check_ranges.
At least if there a no danger that len == 0 will be allowed in the
specification in future?

> Signed-off-by: Akira Fujita <[email protected]>
> ---
> extents.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
> --- linux-2.6.33-rc8-a/fs/ext4/extents.c 2010-03-03 14:53:49.000000000 +0900
> +++ linux-2.6.33-rc8-b/fs/ext4/extents.c 2010-03-03 15:31:57.000000000 +0900
> @@ -3912,7 +3912,8 @@ int ext4_fiemap(struct inode *inode, str
>
> start_blk = start >> inode->i_sb->s_blocksize_bits;
> last_blk = (start + len - 1) >> inode->i_sb->s_blocksize_bits;
> - len_blks = last_blk - start_blk + 1;
> + len_blks = (loff_t)last_blk - start_blk + 1 < EXT_MAX_BLOCK ?
> + last_blk - start_blk + 1 : EXT_MAX_BLOCK;
>
> /*
> * Walk the extent tree gathering extent information.
>
>
> (2010/03/03 3:18), Theodore Ts'o wrote:
> > From: Leonard Michlmayr<[email protected]>
> >
> > ext4_fiemap() rounds the length of the requested range down to
> > blocksize, which is is not the true number of blocks that cover the
> > requested region. This problem is especially impressive if the user
> > requests only the first byte of a file: not a single extent will be
> > reported.
> >
> > We fix this by calculating the last block of the region and then
> > subtract to find the number of blocks in the extents.
> >
> > Signed-off-by: Leonard Michlmayr<[email protected]>
> > Signed-off-by: "Theodore Ts'o"<[email protected]>
> > ---
> > fs/ext4/extents.c | 6 ++++--
> > 1 files changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> > index bd80891..bc9860f 100644
> > --- a/fs/ext4/extents.c
> > +++ b/fs/ext4/extents.c
> > @@ -3768,7 +3768,6 @@ int ext4_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
> > __u64 start, __u64 len)
> > {
> > ext4_lblk_t start_blk;
> > - ext4_lblk_t len_blks;
> > int error = 0;
> >
> > /* fallback to generic here if not in extents fmt */
> > @@ -3782,8 +3781,11 @@ int ext4_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
> > if (fieinfo->fi_flags& FIEMAP_FLAG_XATTR) {
> > error = ext4_xattr_fiemap(inode, fieinfo);
> > } else {
> > + ext4_lblk_t last_blk, len_blks;
> > +
> > start_blk = start>> inode->i_sb->s_blocksize_bits;
> > - len_blks = len>> inode->i_sb->s_blocksize_bits;
> > + last_blk = (start + len - 1)>> inode->i_sb->s_blocksize_bits;
> > + len_blks = last_blk - start_blk + 1;
> >
> > /*
> > * Walk the extent tree gathering extent information.
>





2010-03-03 17:52:25

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 11/28] ext4: correctly calculate number of blocks for fiemap

On Wed, Mar 03, 2010 at 04:47:28PM +0900, Akira Fujita wrote:
>
> In 1KB block size, the overflow occurs at above line.
> Since last_blk is set 0xffffffff when len is equal to s_maxbytes.
> Therefore ext4_fiemap() can not get correct extent information
> with 0 length. How about adding this change?

What do you think _is_ the correct thing to do when length is 0? As
Leonard has already pointed out, fiemap_check_ranges() already filters
out the length=0 case.

- Ted

2010-03-04 05:41:54

by Akira Fujita

[permalink] [raw]
Subject: Re: [PATCH 11/28] ext4: correctly calculate number of blocks for fiemap


(2010/03/04 2:52), [email protected] wrote:
> On Wed, Mar 03, 2010 at 04:47:28PM +0900, Akira Fujita wrote:
>>
>> In 1KB block size, the overflow occurs at above line.
>> Since last_blk is set 0xffffffff when len is equal to s_maxbytes.
>> Therefore ext4_fiemap() can not get correct extent information
>> with 0 length. How about adding this change?
>
> What do you think _is_ the correct thing to do when length is 0? As
> Leonard has already pointed out, fiemap_check_ranges() already filters
> out the length=0 case.

/mnt/mp1/file1
ext logical physical expected length flags
0 0 49153 8192
1 8192 77825 57344 2048 eof

For example, we do filefrag command for a above file (file1).
FS_IOC_FIEMAP tries to get whole extents information of file1,
so the output has to be 2 extents.
In this case, fm_length (requested block length) is passed
from the user-space to the kernel-space, as follows:

<user-space>
filefrag:
fiemap->fm_start(0) fiemap->fm_length(~0ULL)

<kernel-space>
fs/ioctl.c ioctl_fimap():

filemap_check_ranges():
len(~0ULL)
new_len(4398046511103 = s_maxbytes) <--- Because 'len > s_maxbytes'

fs/ext4/extents.c ext4_fiemap():
last_blk = start(0) + len(4398046511103) - 1 >> s_blocksize_bits(11)
= 4294967295 (0xFFFFFFFF)
len_blks = 4294967295 + 1 (0xFFFFFFFF + 0x00000001)
= 4294967296 (0x100000000) <--- _OVERFLOW!!_

ext4_ext_walk_space():
num = 0

This overflow leads to incorrect output like the below,
even though 2 extents exist.

[[email protected] akira]# filefrag -v /mnt/mp1/file1
Filesystem type is: ef53
File size of /mnt/mp1/file1 is 10485760 (10240 blocks, blocksize 1024)
ext logical physical expected length flags
/mnt/mp1/file6: 1 extent found

Regards,
Akira Fujita

2010-03-04 21:51:46

by Leonard Michlmayr

[permalink] [raw]
Subject: Re: [PATCH 11/28] ext4: correctly calculate number of blocks for fiemap

Akira Fujita:
> <kernel-space>
> fs/ioctl.c ioctl_fimap():
>
> filemap_check_ranges():
> len(~0ULL)
> new_len(4398046511103 = s_maxbytes) <--- Because 'len > s_maxbytes'
>
> fs/ext4/extents.c ext4_fiemap():
> last_blk = start(0) + len(4398046511103) - 1 >> s_blocksize_bits(11)
> = 4294967295 (0xFFFFFFFF)
> len_blks = 4294967295 + 1 (0xFFFFFFFF + 0x00000001)
> = 4294967296 (0x100000000) <--- _OVERFLOW!!_
>
> ext4_ext_walk_space():
> num = 0
>
> This overflow leads to incorrect output like the below,
> even though 2 extents exist.
>

Thank you for pointing this out. I had not checked s_maxbytes.
Appearently s_maxbytes can be 1<<(32 + s_blocksize_bits) - 1. Therefore
the number of blocks in a file cannot be stored in a 32bit integer.

I have a patch that should fix it for fiemap. I have just compiled it
and I will do some testing and double checking tomorrow. I will send a
separate email with the patch.

regards
Leonard



2010-03-04 22:08:10

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 11/28] ext4: correctly calculate number of blocks for fiemap

On Thu, Mar 04, 2010 at 02:40:22PM +0900, Akira Fujita wrote:
> <user-space>
> filefrag:
> fiemap->fm_start(0) fiemap->fm_length(~0ULL)
>
> <kernel-space>
> fs/ioctl.c ioctl_fimap():
>
> filemap_check_ranges():
> len(~0ULL)
> new_len(4398046511103 = s_maxbytes) <--- Because 'len > s_maxbytes'
>
> fs/ext4/extents.c ext4_fiemap():
> last_blk = start(0) + len(4398046511103) - 1 >> s_blocksize_bits(11)
> = 4294967295 (0xFFFFFFFF)
> len_blks = 4294967295 + 1 (0xFFFFFFFF + 0x00000001)
> = 4294967296 (0x100000000) <--- _OVERFLOW!!_
>
> ext4_ext_walk_space():
> num = 0
>
> This overflow leads to incorrect output like the below,
> even though 2 extents exist.

Akira-san,

Thank you for your clear explanation; you're absolutely correct. I've
replaced the patch with the following, which I think is a bit clearer.
I've tested to make sure it does the right thing, including a number
of corner cases.

Regards,

- Ted

commit aca92ff6f57c000d1b4523e383c8bd6b8269b8b1
Author: Leonard Michlmayr <[email protected]>
Date: Thu Mar 4 17:07:28 2010 -0500

ext4: correctly calculate number of blocks for fiemap

ext4_fiemap() rounds the length of the requested range down to
blocksize, which is is not the true number of blocks that cover the
requested region. This problem is especially impressive if the user
requests only the first byte of a file: not a single extent will be
reported.

We fix this by calculating the last block of the region and then
subtract to find the number of blocks in the extents.

Signed-off-by: Leonard Michlmayr <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index bd80891..7d54850 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3768,7 +3768,6 @@ int ext4_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
__u64 start, __u64 len)
{
ext4_lblk_t start_blk;
- ext4_lblk_t len_blks;
int error = 0;

/* fallback to generic here if not in extents fmt */
@@ -3782,8 +3781,14 @@ int ext4_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
if (fieinfo->fi_flags & FIEMAP_FLAG_XATTR) {
error = ext4_xattr_fiemap(inode, fieinfo);
} else {
+ ext4_lblk_t len_blks;
+ __u64 last_blk;
+
start_blk = start >> inode->i_sb->s_blocksize_bits;
- len_blks = len >> inode->i_sb->s_blocksize_bits;
+ last_blk = (start + len - 1) >> inode->i_sb->s_blocksize_bits;
+ if (last_blk >= EXT_MAX_BLOCK)
+ last_blk = EXT_MAX_BLOCK-1;
+ len_blks = ((ext4_lblk_t) last_blk) - start_blk + 1;

/*
* Walk the extent tree gathering extent information.

2010-03-04 22:28:39

by Leonard Michlmayr

[permalink] [raw]
Subject: [incomplete PATCH] ext4: avoid overflow in fiemap

A __u32 cannot hold the maximum number of blocks in a file. This may
lead to an overflow if a fiemap request has length = s_maxbytes.

I still get a segfault, I have to go through it again. This is just for
your information. I will work on it tomorrow.

The patch is to be applied after/at the end of the ext4 patch queue.
Appearently only fiemap uses ext4_ext_walk_space(). Therefore I changed
the semantics of the ext4_ext_walk_space() call. I hope this does not
introduce a new problem.

In the case that 1<<32 blocks are requested, but there are no allocated
extents, I return two gaps because, the length of a gap in the
ext4_ext_cache is a __u32 again.

Regards
Leonard

Signed-off-by: Leonard Michlmayr <[email protected]>

diff -puar linux-2.6.33-ted/fs/ext4/extents.c linux-2.6.33-lm/fs/ext4/extents.c
--- linux-2.6.33-ted/fs/ext4/extents.c 2010-03-04 21:08:00.000000000 +0100
+++ linux-2.6.33-lm/fs/ext4/extents.c 2010-03-04 22:50:54.000000000 +0100
@@ -1853,21 +1853,21 @@ cleanup:
}

int ext4_ext_walk_space(struct inode *inode, ext4_lblk_t block,
- ext4_lblk_t num, ext_prepare_callback func,
+ ext4_lblk_t last, ext_prepare_callback func,
void *cbdata)
{
struct ext4_ext_path *path = NULL;
struct ext4_ext_cache cbex;
struct ext4_extent *ex;
ext4_lblk_t next, start = 0, end = 0;
- ext4_lblk_t last = block + num;
int depth, exists, err = 0;

BUG_ON(func == NULL);
BUG_ON(inode == NULL);

- while (block < last && block != EXT_MAX_BLOCK) {
- num = last - block;
+ while (block <= last && block != EXT_MAX_BLOCK) {
+ /* num is one less than the number of blocks */
+ ext4_lblk_t num = last - block;
/* find extent for this block */
down_read(&EXT4_I(inode)->i_data_sem);
path = ext4_ext_find_extent(inode, block, path);
@@ -1893,10 +1893,12 @@ int ext4_ext_walk_space(struct inode *in
* all requested space */
start = block;
end = block + num;
+ if (end - start >= EXT_MAX_BLOCK)
+ end = EXT_MAX_BLOCK - 1;
} else if (le32_to_cpu(ex->ee_block) > block) {
/* need to allocate space before found extent */
start = block;
- end = le32_to_cpu(ex->ee_block);
+ end = le32_to_cpu(ex->ee_block) - 1;
if (block + num < end)
end = block + num;
} else if (block >= le32_to_cpu(ex->ee_block)
@@ -1904,8 +1906,11 @@ int ext4_ext_walk_space(struct inode *in
/* need to allocate space after found extent */
start = block;
end = block + num;
+ if (end - start >= EXT_MAX_BLOCK)
+ end = EXT_MAX_BLOCK - 1;
+ BUG_ON(next == 0);
if (end >= next)
- end = next;
+ end = next - 1;
} else if (block >= le32_to_cpu(ex->ee_block)) {
/*
* some part of requested space is covered
@@ -1913,7 +1918,7 @@ int ext4_ext_walk_space(struct inode *in
*/
start = block;
end = le32_to_cpu(ex->ee_block)
- + ext4_ext_get_actual_len(ex);
+ + (ext4_ext_get_actual_len(ex) - 1);
if (block + num < end)
end = block + num;
exists = 1;
@@ -1924,7 +1929,7 @@ int ext4_ext_walk_space(struct inode *in

if (!exists) {
cbex.ec_block = start;
- cbex.ec_len = end - start;
+ cbex.ec_len = end - start + 1;
cbex.ec_start = 0;
cbex.ec_type = EXT4_EXT_CACHE_GAP;
} else {
@@ -3894,7 +3899,6 @@ static int ext4_xattr_fiemap(struct inod
int ext4_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
__u64 start, __u64 len)
{
- ext4_lblk_t start_blk;
int error = 0;

/* fallback to generic here if not in extents fmt */
@@ -3908,17 +3912,16 @@ int ext4_fiemap(struct inode *inode, str
if (fieinfo->fi_flags & FIEMAP_FLAG_XATTR) {
error = ext4_xattr_fiemap(inode, fieinfo);
} else {
- ext4_lblk_t last_blk, len_blks;
+ ext4_lblk_t start_blk, last_blk;

start_blk = start >> inode->i_sb->s_blocksize_bits;
- last_blk = (start + len - 1) >> inode->i_sb->s_blocksize_bits;
- len_blks = last_blk - start_blk + 1;
+ last_blk = (len - 1 + start) >> inode->i_sb->s_blocksize_bits;

/*
* Walk the extent tree gathering extent information.
* ext4_ext_fiemap_cb will push extents back to user.
*/
- error = ext4_ext_walk_space(inode, start_blk, len_blks,
+ error = ext4_ext_walk_space(inode, start_blk, last_blk,
ext4_ext_fiemap_cb, fieinfo);
}





2010-03-04 23:48:05

by Eric Sandeen

[permalink] [raw]
Subject: Re: [PATCH 11/28] ext4: correctly calculate number of blocks for fiemap

[email protected] wrote:
> On Thu, Mar 04, 2010 at 02:40:22PM +0900, Akira Fujita wrote:
>> <user-space>
>> filefrag:
>> fiemap->fm_start(0) fiemap->fm_length(~0ULL)
>>
>> <kernel-space>
>> fs/ioctl.c ioctl_fimap():
>>
>> filemap_check_ranges():
>> len(~0ULL)
>> new_len(4398046511103 = s_maxbytes) <--- Because 'len > s_maxbytes'
>>
>> fs/ext4/extents.c ext4_fiemap():
>> last_blk = start(0) + len(4398046511103) - 1 >> s_blocksize_bits(11)
>> = 4294967295 (0xFFFFFFFF)
>> len_blks = 4294967295 + 1 (0xFFFFFFFF + 0x00000001)
>> = 4294967296 (0x100000000) <--- _OVERFLOW!!_
>>
>> ext4_ext_walk_space():
>> num = 0
>>
>> This overflow leads to incorrect output like the below,
>> even though 2 extents exist.
>
> Akira-san,
>
> Thank you for your clear explanation; you're absolutely correct. I've
> replaced the patch with the following, which I think is a bit clearer.
> I've tested to make sure it does the right thing, including a number
> of corner cases.

Ted, I think those testcases you used would be great xfstests candidates,
hint hint. :)

-Eric

> Regards,
>
> - Ted
>
> commit aca92ff6f57c000d1b4523e383c8bd6b8269b8b1
> Author: Leonard Michlmayr <[email protected]>
> Date: Thu Mar 4 17:07:28 2010 -0500
>
> ext4: correctly calculate number of blocks for fiemap
>
> ext4_fiemap() rounds the length of the requested range down to
> blocksize, which is is not the true number of blocks that cover the
> requested region. This problem is especially impressive if the user
> requests only the first byte of a file: not a single extent will be
> reported.
>
> We fix this by calculating the last block of the region and then
> subtract to find the number of blocks in the extents.
>
> Signed-off-by: Leonard Michlmayr <[email protected]>
> Signed-off-by: "Theodore Ts'o" <[email protected]>
>
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index bd80891..7d54850 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -3768,7 +3768,6 @@ int ext4_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
> __u64 start, __u64 len)
> {
> ext4_lblk_t start_blk;
> - ext4_lblk_t len_blks;
> int error = 0;
>
> /* fallback to generic here if not in extents fmt */
> @@ -3782,8 +3781,14 @@ int ext4_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
> if (fieinfo->fi_flags & FIEMAP_FLAG_XATTR) {
> error = ext4_xattr_fiemap(inode, fieinfo);
> } else {
> + ext4_lblk_t len_blks;
> + __u64 last_blk;
> +
> start_blk = start >> inode->i_sb->s_blocksize_bits;
> - len_blks = len >> inode->i_sb->s_blocksize_bits;
> + last_blk = (start + len - 1) >> inode->i_sb->s_blocksize_bits;
> + if (last_blk >= EXT_MAX_BLOCK)
> + last_blk = EXT_MAX_BLOCK-1;
> + len_blks = ((ext4_lblk_t) last_blk) - start_blk + 1;
>
> /*
> * Walk the extent tree gathering extent information.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html


2010-03-04 23:39:01

by Eric Sandeen

[permalink] [raw]
Subject: Re: [PATCH 11/28] ext4: correctly calculate number of blocks for fiemap

Leonard Michlmayr wrote:
> Akira Fujita:
>> <kernel-space>
>> fs/ioctl.c ioctl_fimap():
>>
>> filemap_check_ranges():
>> len(~0ULL)
>> new_len(4398046511103 = s_maxbytes) <--- Because 'len > s_maxbytes'
>>
>> fs/ext4/extents.c ext4_fiemap():
>> last_blk = start(0) + len(4398046511103) - 1 >> s_blocksize_bits(11)
>> = 4294967295 (0xFFFFFFFF)
>> len_blks = 4294967295 + 1 (0xFFFFFFFF + 0x00000001)
>> = 4294967296 (0x100000000) <--- _OVERFLOW!!_
>>
>> ext4_ext_walk_space():
>> num = 0
>>
>> This overflow leads to incorrect output like the below,
>> even though 2 extents exist.
>>
>
> Thank you for pointing this out. I had not checked s_maxbytes.
> Appearently s_maxbytes can be 1<<(32 + s_blocksize_bits) - 1. Therefore
> the number of blocks in a file cannot be stored in a 32bit integer.

For extent-based files it had better be....

struct ext4_extent {
__le32 ee_block; /* first logical block extent covers */
__le16 ee_len; /* number of blocks covered by extent */

The start block can't be more than 32 bits; this essentially limits
the file size / maximum logical block to 2^32 blocks right?

s_maxbytes comes out to 17592186044415

2^32 4k blocks is 17592186044416 bytes, or max byte offset 17592186044415

What am I missing? (confusion between max byte count and max
byte offset, perhaps?)

-Eric

> I have a patch that should fix it for fiemap. I have just compiled it
> and I will do some testing and double checking tomorrow. I will send a
> separate email with the patch.
>
> regards
> Leonard
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html


2010-03-05 05:58:36

by Akira Fujita

[permalink] [raw]
Subject: Re: [PATCH 11/28] ext4: correctly calculate number of blocks for fiemap

Hi Ted,

> commit aca92ff6f57c000d1b4523e383c8bd6b8269b8b1
> Author: Leonard Michlmayr<[email protected]>
> Date: Thu Mar 4 17:07:28 2010 -0500
>
> ext4: correctly calculate number of blocks for fiemap
>
> ext4_fiemap() rounds the length of the requested range down to
> blocksize, which is is not the true number of blocks that cover the
> requested region. This problem is especially impressive if the user
> requests only the first byte of a file: not a single extent will be
> reported.
>
> We fix this by calculating the last block of the region and then
> subtract to find the number of blocks in the extents.
>
> Signed-off-by: Leonard Michlmayr<[email protected]>
> Signed-off-by: "Theodore Ts'o"<[email protected]>
>
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index bd80891..7d54850 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -3768,7 +3768,6 @@ int ext4_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
> __u64 start, __u64 len)
> {
> ext4_lblk_t start_blk;
> - ext4_lblk_t len_blks;
> int error = 0;
>
> /* fallback to generic here if not in extents fmt */
> @@ -3782,8 +3781,14 @@ int ext4_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
> if (fieinfo->fi_flags& FIEMAP_FLAG_XATTR) {
> error = ext4_xattr_fiemap(inode, fieinfo);
> } else {
> + ext4_lblk_t len_blks;
> + __u64 last_blk;
> +
> start_blk = start>> inode->i_sb->s_blocksize_bits;
> - len_blks = len>> inode->i_sb->s_blocksize_bits;
> + last_blk = (start + len - 1)>> inode->i_sb->s_blocksize_bits;
> + if (last_blk>= EXT_MAX_BLOCK)
> + last_blk = EXT_MAX_BLOCK-1;

last_blk = EXT_MAX_BLOCK - 1;

You seem to have already pushed this patch to Linus, though.
There is a coding style thing.
Anyway, the patch works fine for me.
Thanks for your work. :-)

Tested-by: Akira Fujita <[email protected]>

Regards,
Akira Fujita

> + len_blks = ((ext4_lblk_t) last_blk) - start_blk + 1;
>
> /*
> * Walk the extent tree gathering extent information.
>

2010-03-05 16:47:01

by Leonard Michlmayr

[permalink] [raw]
Subject: Re: [PATCH 11/28] ext4: correctly calculate number of blocks for fiemap

2010/3/5 Eric Sandeen <[email protected]>
>
> > Thank you for pointing this out. I had not checked s_maxbytes.
> > Appearently s_maxbytes can be 1<<(32 + s_blocksize_bits) - 1. Therefore
> > the number of blocks in a file cannot be stored in a 32bit integer.
>
> For extent-based files it had better be....
>
> struct ext4_extent {
>        __le32  ee_block;       /* first logical block extent covers */
>        __le16  ee_len;         /* number of blocks covered by extent */
>
> The start block can't be more than 32 bits; this essentially limits
> the file size / maximum logical block to 2^32 blocks right?
>
> s_maxbytes comes out to 17592186044415
>
> 2^32 4k blocks is 17592186044416 bytes, or max byte offset 17592186044415
>
> What am I missing?  (confusion between max byte count and max
> byte offset, perhaps?)
>

Yes. The block offset has the maximum value 1<<32 - 1.
However the number of blocks may be 1<<32.

2010-03-06 13:03:57

by Aneesh Kumar K.V

[permalink] [raw]
Subject: Re: [PATCH 27/28] ext4: Convert BUG_ON checks to use ext4_error() instead

On Tue, 2 Mar 2010 13:18:44 -0500, "Theodore Ts'o" <[email protected]> wrote:
> From: Frank Mayhar <[email protected]>
>
> Convert a bunch of BUG_ONs to emit a ext4_error() message and return
> EIO. This is a first pass and most notably does _not_ cover
> mballoc.c, which is a morass of void functions.
>
> Signed-off-by: Frank Mayhar <[email protected]>
> Signed-off-by: "Theodore Ts'o" <[email protected]>
> ---
> fs/ext4/ext4.h | 10 +++
> fs/ext4/extents.c | 189 +++++++++++++++++++++++++++++++++++++++++------------
> fs/ext4/inode.c | 18 +++++-
> fs/ext4/super.c | 36 ++++++++++
> 4 files changed, 209 insertions(+), 44 deletions(-)
>
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 79c067e..350c3a2 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -53,6 +53,12 @@
> #define ext4_debug(f, a...) do {} while (0)
> #endif
>
> +#define EXT4_ERROR_INODE(inode, fmt, a...) \
> + ext4_error_inode(__func__, (inode), (fmt), ## a);
> +
> +#define EXT4_ERROR_FILE(file, fmt, a...) \
> + ext4_error_file(__func__, (file), (fmt), ## a);
> +


I guess ext4_error and ext4_warning are now not taking __func__. So we
to be consistant with thos i this this can be in lower case eventhough
they are #defines.


> /* data type for block offset of block group */
> typedef int ext4_grpblk_t;
>
> @@ -1492,6 +1498,10 @@ extern int ext4_group_extend(struct super_block *sb,
> extern void __ext4_error(struct super_block *, const char *, const char *, ...)
> __attribute__ ((format (printf, 3, 4)));
> #define ext4_error(sb, message...) __ext4_error(sb, __func__, ## message)
> +extern void ext4_error_inode(const char *, struct inode *, const char *, ...)
> + __attribute__ ((format (printf, 3, 4)));
> +extern void ext4_error_file(const char *, struct file *, const char *, ...)
> + __attribute__ ((format (printf, 3, 4)));
> extern void __ext4_std_error(struct super_block *, const char *, int);
> extern void ext4_abort(struct super_block *, const char *, const char *, ...)
> __attribute__ ((format (printf, 3, 4)));
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index 202667b..45dad87 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -703,7 +703,12 @@ ext4_ext_find_extent(struct inode *inode, ext4_lblk_t block,
> }
> eh = ext_block_hdr(bh);
> ppos++;
> - BUG_ON(ppos > depth);
> + if (unlikely(ppos > depth)) {
> + put_bh(bh);
> + EXT4_ERROR_INODE(inode,
> + "ppos %d > depth %d", ppos, depth);
> + goto err;
> + }
> path[ppos].p_bh = bh;
> path[ppos].p_hdr = eh;
> i--;
> @@ -749,7 +754,12 @@ int ext4_ext_insert_index(handle_t *handle, struct inode *inode,
> if (err)
> return err;
>
> - BUG_ON(logical == le32_to_cpu(curp->p_idx->ei_block));
> + if (unlikely(logical == le32_to_cpu(curp->p_idx->ei_block))) {
> + EXT4_ERROR_INODE(inode,
> + "logical %d == ei_block %d!",
> + logical, le32_to_cpu(curp->p_idx->ei_block));
> + return -EIO;
> + }
> len = EXT_MAX_INDEX(curp->p_hdr) - curp->p_idx;
> if (logical > le32_to_cpu(curp->p_idx->ei_block)) {
> /* insert after */
> @@ -779,9 +789,17 @@ int ext4_ext_insert_index(handle_t *handle, struct inode *inode,
> ext4_idx_store_pblock(ix, ptr);
> le16_add_cpu(&curp->p_hdr->eh_entries, 1);
>
> - BUG_ON(le16_to_cpu(curp->p_hdr->eh_entries)
> - > le16_to_cpu(curp->p_hdr->eh_max));
> - BUG_ON(ix > EXT_LAST_INDEX(curp->p_hdr));
> + if (unlikely(le16_to_cpu(curp->p_hdr->eh_entries)
> + > le16_to_cpu(curp->p_hdr->eh_max))) {
> + EXT4_ERROR_INODE(inode,
> + "logical %d == ei_block %d!",
> + logical, le32_to_cpu(curp->p_idx->ei_block));
> + return -EIO;
> + }
> + if (unlikely(ix > EXT_LAST_INDEX(curp->p_hdr))) {
> + EXT4_ERROR_INODE(inode, "ix > EXT_LAST_INDEX!");
> + return -EIO;
> + }
>
> err = ext4_ext_dirty(handle, inode, curp);
> ext4_std_error(inode->i_sb, err);
> @@ -819,7 +837,10 @@ static int ext4_ext_split(handle_t *handle, struct inode *inode,
>
> /* if current leaf will be split, then we should use
> * border from split point */
> - BUG_ON(path[depth].p_ext > EXT_MAX_EXTENT(path[depth].p_hdr));
> + if (unlikely(path[depth].p_ext > EXT_MAX_EXTENT(path[depth].p_hdr))) {
> + EXT4_ERROR_INODE(inode, "p_ext > EXT_MAX_EXTENT!");
> + return -EIO;
> + }
> if (path[depth].p_ext != EXT_MAX_EXTENT(path[depth].p_hdr)) {
> border = path[depth].p_ext[1].ee_block;
> ext_debug("leaf will be split."
> @@ -860,7 +881,11 @@ static int ext4_ext_split(handle_t *handle, struct inode *inode,
>
> /* initialize new leaf */
> newblock = ablocks[--a];
> - BUG_ON(newblock == 0);
> + if (unlikely(newblock == 0)) {
> + EXT4_ERROR_INODE(inode, "newblock == 0!");
> + err = -EIO;
> + goto cleanup;
> + }
> bh = sb_getblk(inode->i_sb, newblock);
> if (!bh) {
> err = -EIO;
> @@ -880,7 +905,14 @@ static int ext4_ext_split(handle_t *handle, struct inode *inode,
> ex = EXT_FIRST_EXTENT(neh);
>
> /* move remainder of path[depth] to the new leaf */
> - BUG_ON(path[depth].p_hdr->eh_entries != path[depth].p_hdr->eh_max);
> + if (unlikely(path[depth].p_hdr->eh_entries !=
> + path[depth].p_hdr->eh_max)) {
> + EXT4_ERROR_INODE(inode, "eh_entries %d != eh_max %d!",
> + path[depth].p_hdr->eh_entries,
> + path[depth].p_hdr->eh_max);
> + err = -EIO;
> + goto cleanup;
> + }
> /* start copy from next extent */
> /* TODO: we could do it by single memmove */
> m = 0;
> @@ -927,7 +959,11 @@ static int ext4_ext_split(handle_t *handle, struct inode *inode,
>
> /* create intermediate indexes */
> k = depth - at - 1;
> - BUG_ON(k < 0);
> + if (unlikely(k < 0)) {
> + EXT4_ERROR_INODE(inode, "k %d < 0!", k);
> + err = -EIO;
> + goto cleanup;
> + }
> if (k)
> ext_debug("create %d intermediate indices\n", k);
> /* insert new index into current index block */
> @@ -964,8 +1000,14 @@ static int ext4_ext_split(handle_t *handle, struct inode *inode,
>
> ext_debug("cur 0x%p, last 0x%p\n", path[i].p_idx,
> EXT_MAX_INDEX(path[i].p_hdr));
> - BUG_ON(EXT_MAX_INDEX(path[i].p_hdr) !=
> - EXT_LAST_INDEX(path[i].p_hdr));
> + if (unlikely(EXT_MAX_INDEX(path[i].p_hdr) !=
> + EXT_LAST_INDEX(path[i].p_hdr))) {
> + EXT4_ERROR_INODE(inode,
> + "EXT_MAX_INDEX != EXT_LAST_INDEX ee_block %d!",
> + le32_to_cpu(path[i].p_ext->ee_block));
> + err = -EIO;
> + goto cleanup;
> + }
> while (path[i].p_idx <= EXT_MAX_INDEX(path[i].p_hdr)) {
> ext_debug("%d: move %d:%llu in new index %llu\n", i,
> le32_to_cpu(path[i].p_idx->ei_block),
> @@ -1203,7 +1245,10 @@ ext4_ext_search_left(struct inode *inode, struct ext4_ext_path *path,
> struct ext4_extent *ex;
> int depth, ee_len;
>
> - BUG_ON(path == NULL);
> + if (unlikely(path == NULL)) {
> + EXT4_ERROR_INODE(inode, "path == NULL *logical %d!", *logical);
> + return -EIO;
> + }
> depth = path->p_depth;
> *phys = 0;
>
> @@ -1217,15 +1262,33 @@ ext4_ext_search_left(struct inode *inode, struct ext4_ext_path *path,
> ex = path[depth].p_ext;
> ee_len = ext4_ext_get_actual_len(ex);
> if (*logical < le32_to_cpu(ex->ee_block)) {
> - BUG_ON(EXT_FIRST_EXTENT(path[depth].p_hdr) != ex);
> + if (unlikely(EXT_FIRST_EXTENT(path[depth].p_hdr) != ex)) {
> + EXT4_ERROR_INODE(inode,
> + "EXT_FIRST_EXTENT != ex *logical %d ee_block %d!",
> + *logical, le32_to_cpu(ex->ee_block));
> + return -EIO;
> + }
> while (--depth >= 0) {
> ix = path[depth].p_idx;
> - BUG_ON(ix != EXT_FIRST_INDEX(path[depth].p_hdr));
> + if (unlikely(ix != EXT_FIRST_INDEX(path[depth].p_hdr))) {
> + EXT4_ERROR_INODE(inode,
> + "ix (%d) != EXT_FIRST_INDEX (%d) (depth %d)!",
> + ix != NULL ? ix->ei_block : 0,
> + EXT_FIRST_INDEX(path[depth].p_hdr) != NULL ?
> + EXT_FIRST_INDEX(path[depth].p_hdr)->ei_block : 0,
> + depth);
> + return -EIO;
> + }
> }
> return 0;
> }
>
> - BUG_ON(*logical < (le32_to_cpu(ex->ee_block) + ee_len));
> + if (unlikely(*logical < (le32_to_cpu(ex->ee_block) + ee_len))) {
> + EXT4_ERROR_INODE(inode,
> + "logical %d < ee_block %d + ee_len %d!",
> + *logical, le32_to_cpu(ex->ee_block), ee_len);
> + return -EIO;
> + }
>
> *logical = le32_to_cpu(ex->ee_block) + ee_len - 1;
> *phys = ext_pblock(ex) + ee_len - 1;
> @@ -1251,7 +1314,10 @@ ext4_ext_search_right(struct inode *inode, struct ext4_ext_path *path,
> int depth; /* Note, NOT eh_depth; depth from top of tree */
> int ee_len;
>
> - BUG_ON(path == NULL);
> + if (unlikely(path == NULL)) {
> + EXT4_ERROR_INODE(inode, "path == NULL *logical %d!", *logical);
> + return -EIO;
> + }
> depth = path->p_depth;
> *phys = 0;
>
> @@ -1265,17 +1331,32 @@ ext4_ext_search_right(struct inode *inode, struct ext4_ext_path *path,
> ex = path[depth].p_ext;
> ee_len = ext4_ext_get_actual_len(ex);
> if (*logical < le32_to_cpu(ex->ee_block)) {
> - BUG_ON(EXT_FIRST_EXTENT(path[depth].p_hdr) != ex);
> + if (unlikely(EXT_FIRST_EXTENT(path[depth].p_hdr) != ex)) {
> + EXT4_ERROR_INODE(inode,
> + "first_extent(path[%d].p_hdr) != ex",
> + depth);
> + return -EIO;
> + }
> while (--depth >= 0) {
> ix = path[depth].p_idx;
> - BUG_ON(ix != EXT_FIRST_INDEX(path[depth].p_hdr));
> + if (unlikely(ix != EXT_FIRST_INDEX(path[depth].p_hdr))) {
> + EXT4_ERROR_INODE(inode,
> + "ix != EXT_FIRST_INDEX *logical %d!",
> + *logical);
> + return -EIO;
> + }
> }
> *logical = le32_to_cpu(ex->ee_block);
> *phys = ext_pblock(ex);
> return 0;
> }
>
> - BUG_ON(*logical < (le32_to_cpu(ex->ee_block) + ee_len));
> + if (unlikely(*logical < (le32_to_cpu(ex->ee_block) + ee_len))) {
> + EXT4_ERROR_INODE(inode,
> + "logical %d < ee_block %d + ee_len %d!",
> + *logical, le32_to_cpu(ex->ee_block), ee_len);
> + return -EIO;
> + }
>
> if (ex != EXT_LAST_EXTENT(path[depth].p_hdr)) {
> /* next allocated block in this leaf */
> @@ -1414,8 +1495,12 @@ static int ext4_ext_correct_indexes(handle_t *handle, struct inode *inode,
>
> eh = path[depth].p_hdr;
> ex = path[depth].p_ext;
> - BUG_ON(ex == NULL);
> - BUG_ON(eh == NULL);
> +
> + if (unlikely(ex == NULL || eh == NULL)) {
> + EXT4_ERROR_INODE(inode,
> + "ex %p == NULL or eh %p == NULL", ex, eh);
> + return -EIO;
> + }
>
> if (depth == 0) {
> /* there is no tree at all */
> @@ -1613,10 +1698,16 @@ int ext4_ext_insert_extent(handle_t *handle, struct inode *inode,
> ext4_lblk_t next;
> unsigned uninitialized = 0;
>
> - BUG_ON(ext4_ext_get_actual_len(newext) == 0);
> + if (unlikely(ext4_ext_get_actual_len(newext) == 0)) {
> + EXT4_ERROR_INODE(inode, "ext4_ext_get_actual_len(newext) == 0");
> + return -EIO;
> + }
> depth = ext_depth(inode);
> ex = path[depth].p_ext;
> - BUG_ON(path[depth].p_hdr == NULL);
> + if (unlikely(path[depth].p_hdr == NULL)) {
> + EXT4_ERROR_INODE(inode, "path[%d].p_hdr == NULL", depth);
> + return -EIO;
> + }
>
> /* try to insert block into found extent and return */
> if (ex && !(flag & EXT4_GET_BLOCKS_PRE_IO)
> @@ -1788,7 +1879,11 @@ int ext4_ext_walk_space(struct inode *inode, ext4_lblk_t block,
> }
>
> depth = ext_depth(inode);
> - BUG_ON(path[depth].p_hdr == NULL);
> + if (unlikely(path[depth].p_hdr == NULL)) {
> + EXT4_ERROR_INODE(inode, "path[%d].p_hdr == NULL", depth);
> + err = -EIO;
> + break;
> + }
> ex = path[depth].p_ext;
> next = ext4_ext_next_allocated_block(path);
>
> @@ -1839,7 +1934,11 @@ int ext4_ext_walk_space(struct inode *inode, ext4_lblk_t block,
> cbex.ec_type = EXT4_EXT_CACHE_EXTENT;
> }
>
> - BUG_ON(cbex.ec_len == 0);
> + if (unlikely(cbex.ec_len == 0)) {
> + EXT4_ERROR_INODE(inode, "cbex.ec_len == 0");
> + err = -EIO;
> + break;
> + }
> err = func(inode, path, &cbex, ex, cbdata);
> ext4_ext_drop_refs(path);
>
> @@ -1982,7 +2081,10 @@ static int ext4_ext_rm_idx(handle_t *handle, struct inode *inode,
> /* free index block */
> path--;
> leaf = idx_pblock(path->p_idx);
> - BUG_ON(path->p_hdr->eh_entries == 0);
> + if (unlikely(path->p_hdr->eh_entries == 0)) {
> + EXT4_ERROR_INODE(inode, "path->p_hdr->eh_entries == 0");
> + return -EIO;
> + }
> err = ext4_ext_get_access(handle, inode, path);
> if (err)
> return err;
> @@ -2120,8 +2222,10 @@ ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
> if (!path[depth].p_hdr)
> path[depth].p_hdr = ext_block_hdr(path[depth].p_bh);
> eh = path[depth].p_hdr;
> - BUG_ON(eh == NULL);
> -
> + if (unlikely(path[depth].p_hdr == NULL)) {
> + EXT4_ERROR_INODE(inode, "path[%d].p_hdr == NULL", depth);
> + return -EIO;
> + }
> /* find where to start removing */
> ex = EXT_LAST_EXTENT(eh);
>
> @@ -3240,10 +3344,10 @@ int ext4_ext_get_blocks(handle_t *handle, struct inode *inode,
> * this situation is possible, though, _during_ tree modification;
> * this is why assert can't be put in ext4_ext_find_extent()
> */
> - if (path[depth].p_ext == NULL && depth != 0) {
> - ext4_error(inode->i_sb, "bad extent address "
> - "inode: %lu, iblock: %d, depth: %d",
> - inode->i_ino, iblock, depth);
> + if (unlikely(path[depth].p_ext == NULL && depth != 0)) {
> + EXT4_ERROR_INODE(inode, "bad extent address "
> + "iblock: %d, depth: %d pblock %lld",
> + iblock, depth, path[depth].p_block);
> err = -EIO;
> goto out2;
> }
> @@ -3371,16 +3475,17 @@ int ext4_ext_get_blocks(handle_t *handle, struct inode *inode,
> }
>
> if (unlikely(EXT4_I(inode)->i_flags & EXT4_EOFBLOCKS_FL)) {
> - if (eh->eh_entries) {
> - last_ex = EXT_LAST_EXTENT(eh);
> - if (iblock + ar.len > le32_to_cpu(last_ex->ee_block)
> - + ext4_ext_get_actual_len(last_ex))
> - EXT4_I(inode)->i_flags &= ~EXT4_EOFBLOCKS_FL;
> - } else {
> - WARN_ON(eh->eh_entries == 0);
> - ext4_error(inode->i_sb, __func__,
> - "inode#%lu, eh->eh_entries = 0!", inode->i_ino);
> - }
> + if (unlikely(!eh->eh_entries)) {
> + EXT4_ERROR_INODE(inode,
> + "eh->eh_entries == 0 ee_block %d",
> + ex->ee_block);
> + err = -EIO;
> + goto out2;
> + }
> + last_ex = EXT_LAST_EXTENT(eh);
> + if (iblock + ar.len > le32_to_cpu(last_ex->ee_block)
> + + ext4_ext_get_actual_len(last_ex))
> + EXT4_I(inode)->i_flags &= ~EXT4_EOFBLOCKS_FL;
> }
> err = ext4_ext_insert_extent(handle, inode, path, &newex, flags);
> if (err) {
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 576bbe2..2c48fbe 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -607,7 +607,14 @@ static int ext4_alloc_blocks(handle_t *handle, struct inode *inode,
> if (*err)
> goto failed_out;
>
> - BUG_ON(current_block + count > EXT4_MAX_BLOCK_FILE_PHYS);
> + if (unlikely(current_block + count > EXT4_MAX_BLOCK_FILE_PHYS)) {
> + EXT4_ERROR_INODE(inode,
> + "current_block %llu + count %lu > %d!",
> + current_block, count,
> + EXT4_MAX_BLOCK_FILE_PHYS);
> + *err = -EIO;
> + goto failed_out;
> + }
>
> target -= count;
> /* allocate blocks for indirect blocks */
> @@ -643,7 +650,14 @@ static int ext4_alloc_blocks(handle_t *handle, struct inode *inode,
> ar.flags = EXT4_MB_HINT_DATA;
>
> current_block = ext4_mb_new_blocks(handle, &ar, err);
> - BUG_ON(current_block + ar.len > EXT4_MAX_BLOCK_FILE_PHYS);
> + if (unlikely(current_block + ar.len > EXT4_MAX_BLOCK_FILE_PHYS)) {
> + EXT4_ERROR_INODE(inode,
> + "current_block %llu + ar.len %d > %d!",
> + current_block, ar.len,
> + EXT4_MAX_BLOCK_FILE_PHYS);
> + *err = -EIO;
> + goto failed_out;
> + }
>
> if (*err && (target == blks)) {
> /*
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 200d257..6c8e92c 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -347,6 +347,42 @@ void __ext4_error(struct super_block *sb, const char *function,
> ext4_handle_error(sb);
> }
>
> +void ext4_error_inode(const char *function, struct inode *inode,
> + const char *fmt, ...)
> +{
> + va_list args;
> +
> + va_start(args, fmt);
> + printk(KERN_CRIT "EXT4-fs error (device %s): %s: inode #%lu: (comm %s) ",
> + inode->i_sb->s_id, function, inode->i_ino, current->comm);
> + vprintk(fmt, args);
> + printk("\n");
> + va_end(args);
> +
> + ext4_handle_error(inode->i_sb);
> +}
> +
> +void ext4_error_file(const char *function, struct file *file,
> + const char *fmt, ...)
> +{
> + va_list args;
> + struct inode *inode = file->f_dentry->d_inode;
> + char pathname[80], *path;
> +
> + va_start(args, fmt);
> + path = d_path(&(file->f_path), pathname, sizeof(pathname));
> + if (!path)
> + path = "(unknown)";
> + printk(KERN_CRIT
> + "EXT4-fs error (device %s): %s: inode #%lu (comm %s path %s): ",
> + inode->i_sb->s_id, function, inode->i_ino, current->comm, path);
> + vprintk(fmt, args);
> + printk("\n");
> + va_end(args);
> +
> + ext4_handle_error(inode->i_sb);
> +}


-aneesh

2010-03-07 02:45:35

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 27/28] ext4: Convert BUG_ON checks to use ext4_error() instead

On Sat, Mar 06, 2010 at 06:33:48PM +0530, Aneesh Kumar K. V wrote:
>
> I guess ext4_error and ext4_warning are now not taking __func__. So we
> to be consistant with thos i this this can be in lower case even though
> they are #defines.

Or maybe we should rename them be EXT4_ERROR and EXT4_WARNING. What
do other people think?

- Ted

2010-03-07 03:38:24

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [incomplete PATCH] ext4: avoid overflow in fiemap

On Thu, Mar 04, 2010 at 11:28:35PM +0100, Leonard Michlmayr wrote:
> A __u32 cannot hold the maximum number of blocks in a file. This may
> lead to an overflow if a fiemap request has length = s_maxbytes.
>
> I still get a segfault, I have to go through it again. This is just for
> your information. I will work on it tomorrow.

Check and see if it's still a problem with what I ultimately ended up
pushing to Linus. Yes, the way I chose to fix things means that we'll
never do more than 2**32-1 blocks at a time with fiemap, but in
practice there's a limit to the number of extents that can be returned
in a single fiemap call anyway, so in practice this shouldn't be a
problem.

- Ted

2010-03-07 05:51:46

by Eric Sandeen

[permalink] [raw]
Subject: Re: [PATCH 27/28] ext4: Convert BUG_ON checks to use ext4_error() instead

[email protected] wrote:
> On Sat, Mar 06, 2010 at 06:33:48PM +0530, Aneesh Kumar K. V wrote:
>> I guess ext4_error and ext4_warning are now not taking __func__. So we
>> to be consistant with thos i this this can be in lower case even though
>> they are #defines.
>
> Or maybe we should rename them be EXT4_ERROR and EXT4_WARNING. What
> do other people think?
>
> - Ted


As the originator of the lower-case ext4_error and ext4_warning macros,
I don't much care.

I find the lower case more readable, but if the expectation in fs/extN is
that ALL_MACROS_SHALL_BE_UPPER_CASE, that's fine by me.

-Eric

2010-03-07 17:36:22

by Andreas Dilger

[permalink] [raw]
Subject: Re: [PATCH 27/28] ext4: Convert BUG_ON checks to use ext4_error() instead

On 2010-03-06, at 22:51, Eric Sandeen wrote:
> [email protected] wrote:
>> On Sat, Mar 06, 2010 at 06:33:48PM +0530, Aneesh Kumar K. V wrote:
>>> I guess ext4_error and ext4_warning are now not taking __func__.
>>> So we
>>> to be consistant with thos i this this can be in lower case even
>>> though
>>> they are #defines.
>>
>> Or maybe we should rename them be EXT4_ERROR and EXT4_WARNING. What
>> do other people think?
>
> As the originator of the lower-case ext4_error and ext4_warning
> macros,
> I don't much care.
>
> I find the lower case more readable, but if the expectation in fs/
> extN is
> that ALL_MACROS_SHALL_BE_UPPER_CASE, that's fine by me.


I think it would make for a lot of code churn without any rel value.
We have lots of lower-case wrappers around functions for things like
this already (e.g. ext4_journal_start(), etc.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.


2010-03-08 17:46:16

by Frank Mayhar

[permalink] [raw]
Subject: Re: [PATCH 27/28] ext4: Convert BUG_ON checks to use ext4_error() instead

On Sat, 2010-03-06 at 21:45 -0500, [email protected] wrote:
> On Sat, Mar 06, 2010 at 06:33:48PM +0530, Aneesh Kumar K. V wrote:
> >
> > I guess ext4_error and ext4_warning are now not taking __func__. So we
> > to be consistant with thos i this this can be in lower case even though
> > they are #defines.
>
> Or maybe we should rename them be EXT4_ERROR and EXT4_WARNING. What
> do other people think?

I guess I'm joining the apathetic majority when I say that it doesn't
matter to me at all. Keeping them as-is is fine.
--
Frank Mayhar <[email protected]>
Google, Inc.