2008-12-12 05:21:40

by Ryusuke Konishi

[permalink] [raw]
Subject: [PATCH mmotm 0/5] nilfs2 filesystem updates

This is a patch set to update nilfs2 filesystem in mmotm (applicable
to mmotm-2008-Dec-12).

The following bugfix and changes which follow comments offered in the
previous submission:

* fix problems around nilfs2 ioctl which may cause system memory shortage.
* avoid double error handlings brought by nilfs_transaction_end() by
separating nilfs_transaction_abort() function.
* insert description of gcinode in the source file.
* add a maintainer entry of nilfs2 to the MAINTAINERS file (to apply as
needed).

Regards,
Ryusuke Konishi
---

MAINTAINERS | 7 ++++
fs/nilfs2/gcinode.c | 21 +++++++++++++-
fs/nilfs2/inode.c | 23 +++++++++------
fs/nilfs2/ioctl.c | 75 ++++++++++++++++++++++++++++++-------------------
fs/nilfs2/mdt.c | 5 ++-
fs/nilfs2/namei.c | 74 +++++++++++++++++++++++++++++++++---------------
fs/nilfs2/nilfs.h | 3 +-
fs/nilfs2/segment.c | 43 ++++++++++++++++++----------
fs/nilfs2/super.c | 11 +------
fs/nilfs2/the_nilfs.h | 4 +-
10 files changed, 175 insertions(+), 91 deletions(-)


2008-12-12 05:21:59

by Ryusuke Konishi

[permalink] [raw]
Subject: [PATCH mmotm 1/5] nilfs2: fix problems of memory allocation in ioctl

The current memory copy function of nilfs2 ioctl has following
problems:

(1) It tries to allocate 128KB size of memory even for small objects.

(2) Though the function repeatedly tries large memory allocations
while reducing the size, GFP_NOWAIT flag is not specified.
This increases the possibility of system memory shortage.

(3) During the retries of (2), verbose warnings are printed
because _GFP_NOWARN flag is not used for the kmalloc calls.

This will fix these problems.

Signed-off-by: Ryusuke Konishi <[email protected]>
---
fs/nilfs2/ioctl.c | 17 ++++++++++++-----
1 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/fs/nilfs2/ioctl.c b/fs/nilfs2/ioctl.c
index 35ba60e..5ba6e4e 100644
--- a/fs/nilfs2/ioctl.c
+++ b/fs/nilfs2/ioctl.c
@@ -51,13 +51,20 @@ static int nilfs_ioctl_wrap_copy(struct the_nilfs *nilfs,
if (argv->v_nmembs == 0)
return 0;

- for (ksize = KMALLOC_SIZE_MAX; ksize >= KMALLOC_SIZE_MIN; ksize /= 2) {
- buf = kmalloc(ksize, GFP_NOFS);
- if (buf != NULL)
- break;
+ ksize = min_t(unsigned long, argv->v_nmembs * argv->v_size,
+ KMALLOC_SIZE_MAX);
+ while (ksize >= KMALLOC_SIZE_MIN) {
+ buf = kmalloc(ksize, GFP_NOWAIT | __GFP_NOWARN);
+ if (buf)
+ goto allocated;
+ ksize >>= 1;
}
- if (ksize < KMALLOC_SIZE_MIN)
+ ksize = max_t(size_t, ksize, argv->v_size);
+ buf = kmalloc(ksize, GFP_NOFS);
+ if (unlikely(!buf))
return -ENOMEM;
+
+ allocated:
maxmembs = ksize / argv->v_size;

ret = 0;
--
1.5.6.5

2008-12-12 05:22:22

by Ryusuke Konishi

[permalink] [raw]
Subject: [PATCH mmotm 5/5] nilfs2: add maintainer

Signed-off-by: Ryusuke Konishi <[email protected]>
---
MAINTAINERS | 7 +++++++
1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 63d5587..2bf4e81 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3151,6 +3151,13 @@ M: [email protected]
L: [email protected]
S: Maintained

+NILFS2 FILESYSTEM
+P: KONISHI Ryusuke
+M: [email protected]
+L: [email protected]
+W: http://www.nilfs.org/en/
+S: Supported
+
NINJA SCSI-3 / NINJA SCSI-32Bi (16bit/CardBus) PCMCIA SCSI HOST ADAPTER DRIVER
P: YOKOTA Hiroshi
M: [email protected]
--
1.5.6.5

2008-12-12 05:22:37

by Ryusuke Konishi

[permalink] [raw]
Subject: [PATCH mmotm 2/5] nilfs2: cleanup nilfs_clear_inode

This will remove the following unnecessary locks and cleanup code in
nilfs_clear_inode():

- unnecessary protection using nilfs_transaction_begin() and
nilfs_transaction_end().

- cleanup code of i_dirty list field which is never chained
when this function is called.

- spinlock used when releasing i_bh field.

Signed-off-by: Ryusuke Konishi <[email protected]>
---
fs/nilfs2/super.c | 11 +----------
1 files changed, 1 insertions(+), 10 deletions(-)

diff --git a/fs/nilfs2/super.c b/fs/nilfs2/super.c
index 046368b..6d5ca65 100644
--- a/fs/nilfs2/super.c
+++ b/fs/nilfs2/super.c
@@ -184,8 +184,6 @@ static inline void nilfs_destroy_inode_cache(void)
static void nilfs_clear_inode(struct inode *inode)
{
struct nilfs_inode_info *ii = NILFS_I(inode);
- struct nilfs_transaction_info ti;
- struct nilfs_sb_info *sbi = NILFS_SB(inode->i_sb);

#ifdef CONFIG_NILFS_POSIX_ACL
if (ii->i_acl && ii->i_acl != NILFS_ACL_NOT_CACHED) {
@@ -200,21 +198,14 @@ static void nilfs_clear_inode(struct inode *inode)
/*
* Free resources allocated in nilfs_read_inode(), here.
*/
- nilfs_transaction_begin(inode->i_sb, &ti, 0);
-
- spin_lock(&sbi->s_inode_lock);
- if (!list_empty(&ii->i_dirty))
- list_del_init(&ii->i_dirty);
+ BUG_ON(!list_empty(&ii->i_dirty));
brelse(ii->i_bh);
ii->i_bh = NULL;
- spin_unlock(&sbi->s_inode_lock);

if (test_bit(NILFS_I_BMAP, &ii->i_state))
nilfs_bmap_clear(ii->i_bmap);

nilfs_btnode_cache_clear(&ii->i_btnode_cache);
-
- nilfs_transaction_end(inode->i_sb, 0);
}

/**
--
1.5.6.5

2008-12-12 05:22:52

by Ryusuke Konishi

[permalink] [raw]
Subject: [PATCH mmotm 4/5] nilfs2: insert explanations in gcinode file

The file gcinode.c gives buffer cache functions for on-disk blocks
moved in garbage collection. Joern Engel has suggested inserting its
explanations in the source file (Message-ID:
<[email protected]> and
<[email protected]>).

This follows the comment.

Cc: Joern Engel <[email protected]>
Signed-off-by: Ryusuke Konishi <[email protected]>
---
fs/nilfs2/gcinode.c | 21 ++++++++++++++++++++-
1 files changed, 20 insertions(+), 1 deletions(-)

diff --git a/fs/nilfs2/gcinode.c b/fs/nilfs2/gcinode.c
index 0013952..77615aa 100644
--- a/fs/nilfs2/gcinode.c
+++ b/fs/nilfs2/gcinode.c
@@ -1,5 +1,5 @@
/*
- * gcinode.c - NILFS memory inode for GC
+ * gcinode.c - dummy inodes to buffer blocks for garbage collection
*
* Copyright (C) 2005-2008 Nippon Telegraph and Telephone Corporation.
*
@@ -22,6 +22,25 @@
* Revised by Ryusuke Konishi <[email protected]>.
*
*/
+/*
+ * This file adds the cache of on-disk blocks to be moved in garbage
+ * collection. The disk blocks are held with dummy inodes (called
+ * gcinodes), and this file provides lookup function of the dummy
+ * inodes and their buffer read function.
+ *
+ * Since NILFS2 keeps up multiple checkpoints/snapshots accross GC, it
+ * has to treat blocks that belong to a same file but have different
+ * checkpoint numbers. To avoid interference among generations, dummy
+ * inodes are managed separatly from actual inodes, and their lookup
+ * function (nilfs_gc_iget) is designed to be specified with a
+ * checkpoint number argument as well as an inode number.
+ *
+ * Buffers and pages held by the dummy inodes will be released each
+ * time after they are copied to a new log. Dirty blocks made on the
+ * current generation and the blocks to be moved by GC never overlap
+ * because the dirty blocks make a new generation; they rather must be
+ * written individually.
+ */

#include <linux/buffer_head.h>
#include <linux/mpage.h>
--
1.5.6.5

2008-12-12 05:23:17

by Ryusuke Konishi

[permalink] [raw]
Subject: [PATCH mmotm 3/5] nilfs2: avoid double error caused by nilfs_transaction_end

Pekka Enberg pointed out that double error handlings found after
nilfs_transaction_end() can be avoided by separating abort operation:

OK, I don't understand this. The only way nilfs_transaction_end() can
fail is if we have NILFS_TI_SYNC set and we fail to construct the
segment. But why do we want to construct a segment if we don't commit?

I guess what I'm asking is why don't we have a separate
nilfs_transaction_abort() function that can't fail for the erroneous
case to avoid this double error value tracking thing?

This does the separation and renames nilfs_transaction_end() to
nilfs_transaction_commit() for clarification.

Since, some calls of these functions were used just for exclusion
control against the segment constructor, they are replaced with
semaphore operations.

Cc: Pekka Enberg <[email protected]>
Signed-off-by: Ryusuke Konishi <[email protected]>
---
fs/nilfs2/inode.c | 23 +++++++++------
fs/nilfs2/ioctl.c | 58 ++++++++++++++++++++++----------------
fs/nilfs2/mdt.c | 5 ++-
fs/nilfs2/namei.c | 74 +++++++++++++++++++++++++++++++++---------------
fs/nilfs2/nilfs.h | 3 +-
fs/nilfs2/segment.c | 43 ++++++++++++++++++----------
fs/nilfs2/the_nilfs.h | 4 +-
7 files changed, 135 insertions(+), 75 deletions(-)

diff --git a/fs/nilfs2/inode.c b/fs/nilfs2/inode.c
index b4697d9..e29e329 100644
--- a/fs/nilfs2/inode.c
+++ b/fs/nilfs2/inode.c
@@ -76,7 +76,6 @@ int nilfs_get_block(struct inode *inode, sector_t blkoff,
goto out;
err = nilfs_bmap_insert(ii->i_bmap, (unsigned long)blkoff,
(unsigned long)bh_result);
- nilfs_transaction_end(inode->i_sb, !err);
if (unlikely(err != 0)) {
if (err == -EEXIST) {
/*
@@ -99,8 +98,10 @@ int nilfs_get_block(struct inode *inode, sector_t blkoff,
inode->i_ino);
err = -EIO;
}
+ nilfs_transaction_abort(inode->i_sb);
goto out;
}
+ nilfs_transaction_commit(inode->i_sb); /* never fails */
/* Error handling should be detailed */
set_buffer_new(bh_result);
map_bh(bh_result, inode->i_sb, 0); /* dbn must be changed
@@ -196,7 +197,7 @@ static int nilfs_write_begin(struct file *file, struct address_space *mapping,
err = block_write_begin(file, mapping, pos, len, flags, pagep,
fsdata, nilfs_get_block);
if (unlikely(err))
- nilfs_transaction_end(inode->i_sb, 0);
+ nilfs_transaction_abort(inode->i_sb);
return err;
}

@@ -214,7 +215,7 @@ static int nilfs_write_end(struct file *file, struct address_space *mapping,
copied = generic_write_end(file, mapping, pos, len, copied, page,
fsdata);
nilfs_set_file_dirty(NILFS_SB(inode->i_sb), inode, nr_dirty);
- err = nilfs_transaction_end(inode->i_sb, 1);
+ err = nilfs_transaction_commit(inode->i_sb);
return err ? : copied;
}

@@ -639,7 +640,7 @@ void nilfs_truncate(struct inode *inode)
nilfs_set_transaction_flag(NILFS_TI_SYNC);

nilfs_set_file_dirty(NILFS_SB(sb), inode, 0);
- nilfs_transaction_end(sb, 1);
+ nilfs_transaction_commit(sb);
/* May construct a logical segment and may fail in sync mode.
But truncate has no return value. */
}
@@ -667,7 +668,7 @@ void nilfs_delete_inode(struct inode *inode)
/* nilfs_free_inode() marks inode buffer dirty */
if (IS_SYNC(inode))
nilfs_set_transaction_flag(NILFS_TI_SYNC);
- nilfs_transaction_end(sb, 1);
+ nilfs_transaction_commit(sb);
/* May construct a logical segment and may fail in sync mode.
But delete_inode has no return value. */
}
@@ -677,7 +678,7 @@ int nilfs_setattr(struct dentry *dentry, struct iattr *iattr)
struct nilfs_transaction_info ti;
struct inode *inode = dentry->d_inode;
struct super_block *sb = inode->i_sb;
- int err, err2;
+ int err;

err = inode_change_ok(inode, iattr);
if (err)
@@ -689,8 +690,12 @@ int nilfs_setattr(struct dentry *dentry, struct iattr *iattr)
err = inode_setattr(inode, iattr);
if (!err && (iattr->ia_valid & ATTR_MODE))
err = nilfs_acl_chmod(inode);
- err2 = nilfs_transaction_end(sb, 1);
- return err ? : err2;
+ if (likely(!err))
+ err = nilfs_transaction_commit(sb);
+ else
+ nilfs_transaction_abort(sb);
+
+ return err;
}

int nilfs_load_inode_block(struct nilfs_sb_info *sbi, struct inode *inode,
@@ -815,5 +820,5 @@ void nilfs_dirty_inode(struct inode *inode)
nilfs_transaction_begin(inode->i_sb, &ti, 0);
if (likely(inode->i_ino != NILFS_SKETCH_INO))
nilfs_mark_inode_dirty(inode);
- nilfs_transaction_end(inode->i_sb, 1); /* never fails */
+ nilfs_transaction_commit(inode->i_sb); /* never fails */
}
diff --git a/fs/nilfs2/ioctl.c b/fs/nilfs2/ioctl.c
index 5ba6e4e..216bb30 100644
--- a/fs/nilfs2/ioctl.c
+++ b/fs/nilfs2/ioctl.c
@@ -116,7 +116,11 @@ static int nilfs_ioctl_change_cpmode(struct inode *inode, struct file *filp,
nilfs_transaction_begin(inode->i_sb, &ti, 0);
ret = nilfs_cpfile_change_cpmode(
cpfile, cpmode.cm_cno, cpmode.cm_mode);
- nilfs_transaction_end(inode->i_sb, !ret);
+ if (unlikely(ret < 0)) {
+ nilfs_transaction_abort(inode->i_sb);
+ return ret;
+ }
+ nilfs_transaction_commit(inode->i_sb); /* never fails */
return ret;
}

@@ -136,7 +140,11 @@ nilfs_ioctl_delete_checkpoint(struct inode *inode, struct file *filp,

nilfs_transaction_begin(inode->i_sb, &ti, 0);
ret = nilfs_cpfile_delete_checkpoint(cpfile, cno);
- nilfs_transaction_end(inode->i_sb, !ret);
+ if (unlikely(ret < 0)) {
+ nilfs_transaction_abort(inode->i_sb);
+ return ret;
+ }
+ nilfs_transaction_commit(inode->i_sb); /* never fails */
return ret;
}

@@ -153,16 +161,17 @@ static int nilfs_ioctl_get_cpinfo(struct inode *inode, struct file *filp,
{
struct the_nilfs *nilfs = NILFS_SB(inode->i_sb)->s_nilfs;
struct nilfs_argv argv;
- struct nilfs_transaction_info ti;
int ret;

if (copy_from_user(&argv, argp, sizeof(argv)))
return -EFAULT;

- nilfs_transaction_begin(inode->i_sb, &ti, 0);
+ down_read(&nilfs->ns_segctor_sem);
ret = nilfs_ioctl_wrap_copy(nilfs, &argv, _IOC_DIR(cmd),
nilfs_ioctl_do_get_cpinfo);
- nilfs_transaction_end(inode->i_sb, 0);
+ up_read(&nilfs->ns_segctor_sem);
+ if (ret < 0)
+ return ret;

if (copy_to_user(argp, &argv, sizeof(argv)))
ret = -EFAULT;
@@ -172,14 +181,13 @@ static int nilfs_ioctl_get_cpinfo(struct inode *inode, struct file *filp,
static int nilfs_ioctl_get_cpstat(struct inode *inode, struct file *filp,
unsigned int cmd, void __user *argp)
{
- struct inode *cpfile = NILFS_SB(inode->i_sb)->s_nilfs->ns_cpfile;
+ struct the_nilfs *nilfs = NILFS_SB(inode->i_sb)->s_nilfs;
struct nilfs_cpstat cpstat;
- struct nilfs_transaction_info ti;
int ret;

- nilfs_transaction_begin(inode->i_sb, &ti, 0);
- ret = nilfs_cpfile_get_stat(cpfile, &cpstat);
- nilfs_transaction_end(inode->i_sb, 0);
+ down_read(&nilfs->ns_segctor_sem);
+ ret = nilfs_cpfile_get_stat(nilfs->ns_cpfile, &cpstat);
+ up_read(&nilfs->ns_segctor_sem);
if (ret < 0)
return ret;

@@ -200,16 +208,17 @@ static int nilfs_ioctl_get_suinfo(struct inode *inode, struct file *filp,
{
struct the_nilfs *nilfs = NILFS_SB(inode->i_sb)->s_nilfs;
struct nilfs_argv argv;
- struct nilfs_transaction_info ti;
int ret;

if (copy_from_user(&argv, argp, sizeof(argv)))
return -EFAULT;

- nilfs_transaction_begin(inode->i_sb, &ti, 0);
+ down_read(&nilfs->ns_segctor_sem);
ret = nilfs_ioctl_wrap_copy(nilfs, &argv, _IOC_DIR(cmd),
nilfs_ioctl_do_get_suinfo);
- nilfs_transaction_end(inode->i_sb, 0);
+ up_read(&nilfs->ns_segctor_sem);
+ if (ret < 0)
+ return ret;

if (copy_to_user(argp, &argv, sizeof(argv)))
ret = -EFAULT;
@@ -219,14 +228,13 @@ static int nilfs_ioctl_get_suinfo(struct inode *inode, struct file *filp,
static int nilfs_ioctl_get_sustat(struct inode *inode, struct file *filp,
unsigned int cmd, void __user *argp)
{
- struct inode *sufile = NILFS_SB(inode->i_sb)->s_nilfs->ns_sufile;
+ struct the_nilfs *nilfs = NILFS_SB(inode->i_sb)->s_nilfs;
struct nilfs_sustat sustat;
- struct nilfs_transaction_info ti;
int ret;

- nilfs_transaction_begin(inode->i_sb, &ti, 0);
- ret = nilfs_sufile_get_stat(sufile, &sustat);
- nilfs_transaction_end(inode->i_sb, 0);
+ down_read(&nilfs->ns_segctor_sem);
+ ret = nilfs_sufile_get_stat(nilfs->ns_sufile, &sustat);
+ up_read(&nilfs->ns_segctor_sem);
if (ret < 0)
return ret;

@@ -247,16 +255,17 @@ static int nilfs_ioctl_get_vinfo(struct inode *inode, struct file *filp,
{
struct the_nilfs *nilfs = NILFS_SB(inode->i_sb)->s_nilfs;
struct nilfs_argv argv;
- struct nilfs_transaction_info ti;
int ret;

if (copy_from_user(&argv, argp, sizeof(argv)))
return -EFAULT;

- nilfs_transaction_begin(inode->i_sb, &ti, 0);
+ down_read(&nilfs->ns_segctor_sem);
ret = nilfs_ioctl_wrap_copy(nilfs, &argv, _IOC_DIR(cmd),
nilfs_ioctl_do_get_vinfo);
- nilfs_transaction_end(inode->i_sb, 0);
+ up_read(&nilfs->ns_segctor_sem);
+ if (ret < 0)
+ return ret;

if (copy_to_user(argp, &argv, sizeof(argv)))
ret = -EFAULT;
@@ -291,16 +300,17 @@ static int nilfs_ioctl_get_bdescs(struct inode *inode, struct file *filp,
{
struct the_nilfs *nilfs = NILFS_SB(inode->i_sb)->s_nilfs;
struct nilfs_argv argv;
- struct nilfs_transaction_info ti;
int ret;

if (copy_from_user(&argv, argp, sizeof(argv)))
return -EFAULT;

- nilfs_transaction_begin(inode->i_sb, &ti, 0);
+ down_read(&nilfs->ns_segctor_sem);
ret = nilfs_ioctl_wrap_copy(nilfs, &argv, _IOC_DIR(cmd),
nilfs_ioctl_do_get_bdescs);
- nilfs_transaction_end(inode->i_sb, 0);
+ up_read(&nilfs->ns_segctor_sem);
+ if (ret < 0)
+ return ret;

if (copy_to_user(argp, &argv, sizeof(argv)))
ret = -EFAULT;
diff --git a/fs/nilfs2/mdt.c b/fs/nilfs2/mdt.c
index 6ab8475..e0a632b 100644
--- a/fs/nilfs2/mdt.c
+++ b/fs/nilfs2/mdt.c
@@ -123,7 +123,10 @@ static int nilfs_mdt_create_block(struct inode *inode, unsigned long block,
brelse(bh);

failed_unlock:
- nilfs_transaction_end(sb, !err);
+ if (likely(!err))
+ err = nilfs_transaction_commit(sb);
+ else
+ nilfs_transaction_abort(sb);
if (writer)
nilfs_put_writer(nilfs);
out:
diff --git a/fs/nilfs2/namei.c b/fs/nilfs2/namei.c
index 95d1b29..df70dad 100644
--- a/fs/nilfs2/namei.c
+++ b/fs/nilfs2/namei.c
@@ -109,7 +109,7 @@ static int nilfs_create(struct inode *dir, struct dentry *dentry, int mode,
{
struct inode *inode;
struct nilfs_transaction_info ti;
- int err, err2;
+ int err;

err = nilfs_transaction_begin(dir->i_sb, &ti, 1);
if (err)
@@ -123,8 +123,12 @@ static int nilfs_create(struct inode *dir, struct dentry *dentry, int mode,
mark_inode_dirty(inode);
err = nilfs_add_nondir(dentry, inode);
}
- err2 = nilfs_transaction_end(dir->i_sb, !err);
- return err ? : err2;
+ if (!err)
+ err = nilfs_transaction_commit(dir->i_sb);
+ else
+ nilfs_transaction_abort(dir->i_sb);
+
+ return err;
}

static int
@@ -132,7 +136,7 @@ nilfs_mknod(struct inode *dir, struct dentry *dentry, int mode, dev_t rdev)
{
struct inode *inode;
struct nilfs_transaction_info ti;
- int err, err2;
+ int err;

if (!new_valid_dev(rdev))
return -EINVAL;
@@ -147,8 +151,12 @@ nilfs_mknod(struct inode *dir, struct dentry *dentry, int mode, dev_t rdev)
mark_inode_dirty(inode);
err = nilfs_add_nondir(dentry, inode);
}
- err2 = nilfs_transaction_end(dir->i_sb, !err);
- return err ? : err2;
+ if (!err)
+ err = nilfs_transaction_commit(dir->i_sb);
+ else
+ nilfs_transaction_abort(dir->i_sb);
+
+ return err;
}

static int nilfs_symlink(struct inode *dir, struct dentry *dentry,
@@ -158,7 +166,7 @@ static int nilfs_symlink(struct inode *dir, struct dentry *dentry,
struct super_block *sb = dir->i_sb;
unsigned l = strlen(symname)+1;
struct inode *inode;
- int err, err2;
+ int err;

if (l > sb->s_blocksize)
return -ENAMETOOLONG;
@@ -184,8 +192,12 @@ static int nilfs_symlink(struct inode *dir, struct dentry *dentry,

err = nilfs_add_nondir(dentry, inode);
out:
- err2 = nilfs_transaction_end(dir->i_sb, !err);
- return err ? : err2;
+ if (!err)
+ err = nilfs_transaction_commit(dir->i_sb);
+ else
+ nilfs_transaction_abort(dir->i_sb);
+
+ return err;

out_fail:
inode_dec_link_count(inode);
@@ -198,7 +210,7 @@ static int nilfs_link(struct dentry *old_dentry, struct inode *dir,
{
struct inode *inode = old_dentry->d_inode;
struct nilfs_transaction_info ti;
- int err, err2;
+ int err;

if (inode->i_nlink >= NILFS_LINK_MAX)
return -EMLINK;
@@ -212,15 +224,19 @@ static int nilfs_link(struct dentry *old_dentry, struct inode *dir,
atomic_inc(&inode->i_count);

err = nilfs_add_nondir(dentry, inode);
- err2 = nilfs_transaction_end(dir->i_sb, !err);
- return err ? : err2;
+ if (!err)
+ err = nilfs_transaction_commit(dir->i_sb);
+ else
+ nilfs_transaction_abort(dir->i_sb);
+
+ return err;
}

static int nilfs_mkdir(struct inode *dir, struct dentry *dentry, int mode)
{
struct inode *inode;
struct nilfs_transaction_info ti;
- int err, err2;
+ int err;

if (dir->i_nlink >= NILFS_LINK_MAX)
return -EMLINK;
@@ -252,8 +268,12 @@ static int nilfs_mkdir(struct inode *dir, struct dentry *dentry, int mode)

d_instantiate(dentry, inode);
out:
- err2 = nilfs_transaction_end(dir->i_sb, !err);
- return err ? : err2;
+ if (!err)
+ err = nilfs_transaction_commit(dir->i_sb);
+ else
+ nilfs_transaction_abort(dir->i_sb);
+
+ return err;

out_fail:
inode_dec_link_count(inode);
@@ -270,7 +290,7 @@ static int nilfs_unlink(struct inode *dir, struct dentry *dentry)
struct nilfs_dir_entry *de;
struct page *page;
struct nilfs_transaction_info ti;
- int err, err2;
+ int err;

err = nilfs_transaction_begin(dir->i_sb, &ti, 0);
if (err)
@@ -300,15 +320,19 @@ static int nilfs_unlink(struct inode *dir, struct dentry *dentry)
inode_dec_link_count(inode);
err = 0;
out:
- err2 = nilfs_transaction_end(dir->i_sb, !err);
- return err ? : err2;
+ if (!err)
+ err = nilfs_transaction_commit(dir->i_sb);
+ else
+ nilfs_transaction_abort(dir->i_sb);
+
+ return err;
}

static int nilfs_rmdir(struct inode *dir, struct dentry *dentry)
{
struct inode *inode = dentry->d_inode;
struct nilfs_transaction_info ti;
- int err, err2;
+ int err;

err = nilfs_transaction_begin(dir->i_sb, &ti, 0);
if (err)
@@ -323,8 +347,12 @@ static int nilfs_rmdir(struct inode *dir, struct dentry *dentry)
inode_dec_link_count(dir);
}
}
- err2 = nilfs_transaction_end(dir->i_sb, !err);
- return err ? : err2;
+ if (!err)
+ err = nilfs_transaction_commit(dir->i_sb);
+ else
+ nilfs_transaction_abort(dir->i_sb);
+
+ return err;
}

static int nilfs_rename(struct inode *old_dir, struct dentry *old_dentry,
@@ -404,7 +432,7 @@ static int nilfs_rename(struct inode *old_dir, struct dentry *old_dentry,
inode_dec_link_count(old_dir);
}

- err = nilfs_transaction_end(old_dir->i_sb, 1);
+ err = nilfs_transaction_commit(old_dir->i_sb);
return err;

out_dir:
@@ -416,7 +444,7 @@ out_old:
kunmap(old_page);
page_cache_release(old_page);
out:
- nilfs_transaction_end(old_dir->i_sb, 0);
+ nilfs_transaction_abort(old_dir->i_sb);
return err;
}

diff --git a/fs/nilfs2/nilfs.h b/fs/nilfs2/nilfs.h
index c33b8db..30c5341 100644
--- a/fs/nilfs2/nilfs.h
+++ b/fs/nilfs2/nilfs.h
@@ -166,7 +166,8 @@ struct nilfs_transaction_info {

int nilfs_transaction_begin(struct super_block *,
struct nilfs_transaction_info *, int);
-int nilfs_transaction_end(struct super_block *, int);
+int nilfs_transaction_commit(struct super_block *);
+void nilfs_transaction_abort(struct super_block *);

static inline void nilfs_set_transaction_flag(unsigned int flag)
{
diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
index 2b6e44b..182fb26 100644
--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -163,8 +163,8 @@ static int nilfs_prepare_segment_lock(struct nilfs_transaction_info *ti)
else {
/*
* If journal_info field is occupied by other FS,
- * we save it and restore on nilfs_transaction_end().
- * But this should never happen.
+ * it is saved and will be restored on
+ * nilfs_transaction_commit().
*/
printk(KERN_WARNING
"NILFS warning: journal info from a different "
@@ -195,7 +195,7 @@ static int nilfs_prepare_segment_lock(struct nilfs_transaction_info *ti)
*
* nilfs_transaction_begin() acquires a reader/writer semaphore, called
* the segment semaphore, to make a segment construction and write tasks
- * exclusive. The function is used with nilfs_transaction_end() in pairs.
+ * exclusive. The function is used with nilfs_transaction_commit() in pairs.
* The region enclosed by these two functions can be nested. To avoid a
* deadlock, the semaphore is only acquired or released in the outermost call.
*
@@ -212,8 +212,6 @@ static int nilfs_prepare_segment_lock(struct nilfs_transaction_info *ti)
*
* %-ENOMEM - Insufficient memory available.
*
- * %-ERESTARTSYS - Interrupted
- *
* %-ENOSPC - No space left on device
*/
int nilfs_transaction_begin(struct super_block *sb,
@@ -248,16 +246,17 @@ int nilfs_transaction_begin(struct super_block *sb,
}

/**
- * nilfs_transaction_end - end indivisible file operations.
+ * nilfs_transaction_commit - commit indivisible file operations.
* @sb: super block
- * @commit: commit flag (0 for no change)
*
- * nilfs_transaction_end() releases the read semaphore which is
- * acquired by nilfs_transaction_begin(). Its releasing is only done
- * in outermost call of this function. If the nilfs_transaction_info
- * was allocated dynamically, it is given back to a slab cache.
+ * nilfs_transaction_commit() releases the read semaphore which is
+ * acquired by nilfs_transaction_begin(). This is only performed
+ * in outermost call of this function. If a commit flag is set,
+ * nilfs_transaction_commit() sets a timer to start the segment
+ * constructor. If a sync flag is set, it starts construction
+ * directly.
*/
-int nilfs_transaction_end(struct super_block *sb, int commit)
+int nilfs_transaction_commit(struct super_block *sb)
{
struct nilfs_transaction_info *ti = current->journal_info;
struct nilfs_sb_info *sbi;
@@ -265,9 +264,7 @@ int nilfs_transaction_end(struct super_block *sb, int commit)
int err = 0;

BUG_ON(ti == NULL || ti->ti_magic != NILFS_TI_MAGIC);
-
- if (commit)
- ti->ti_flags |= NILFS_TI_COMMIT;
+ ti->ti_flags |= NILFS_TI_COMMIT;
if (ti->ti_count > 0) {
ti->ti_count--;
return 0;
@@ -291,6 +288,22 @@ int nilfs_transaction_end(struct super_block *sb, int commit)
return err;
}

+void nilfs_transaction_abort(struct super_block *sb)
+{
+ struct nilfs_transaction_info *ti = current->journal_info;
+
+ BUG_ON(ti == NULL || ti->ti_magic != NILFS_TI_MAGIC);
+ if (ti->ti_count > 0) {
+ ti->ti_count--;
+ return;
+ }
+ up_read(&NILFS_SB(sb)->s_nilfs->ns_segctor_sem);
+
+ current->journal_info = ti->ti_save;
+ if (ti->ti_flags & NILFS_TI_DYNAMIC_ALLOC)
+ kmem_cache_free(nilfs_transaction_cachep, ti);
+}
+
void nilfs_relax_pressure_in_lock(struct super_block *sb)
{
struct nilfs_sb_info *sbi = NILFS_SB(sb);
diff --git a/fs/nilfs2/the_nilfs.h b/fs/nilfs2/the_nilfs.h
index dee8d83..9cd3c11 100644
--- a/fs/nilfs2/the_nilfs.h
+++ b/fs/nilfs2/the_nilfs.h
@@ -112,8 +112,8 @@ struct the_nilfs {
/*
* Following fields are dedicated to a writable FS-instance.
* Except for the period seeking checkpoint, code outside the segment
- * constructor must lock a segment semaphore with transaction_begin()
- * and transaction_end(), when accessing these fields.
+ * constructor must lock a segment semaphore while accessing these
+ * fields.
* The writable FS-instance is sole during a lifetime of the_nilfs.
*/
u64 ns_seg_seq;
--
1.5.6.5

2008-12-12 07:50:43

by Pekka Enberg

[permalink] [raw]
Subject: Re: [PATCH mmotm 3/5] nilfs2: avoid double error caused by nilfs_transaction_end

Hi Ryusuke,

On Fri, Dec 12, 2008 at 7:16 AM, Ryusuke Konishi
<[email protected]> wrote:
> Pekka Enberg pointed out that double error handlings found after
> nilfs_transaction_end() can be avoided by separating abort operation:
>
> OK, I don't understand this. The only way nilfs_transaction_end() can
> fail is if we have NILFS_TI_SYNC set and we fail to construct the
> segment. But why do we want to construct a segment if we don't commit?
>
> I guess what I'm asking is why don't we have a separate
> nilfs_transaction_abort() function that can't fail for the erroneous
> case to avoid this double error value tracking thing?
>
> This does the separation and renames nilfs_transaction_end() to
> nilfs_transaction_commit() for clarification.
>
> Since, some calls of these functions were used just for exclusion
> control against the segment constructor, they are replaced with
> semaphore operations.
>
> Cc: Pekka Enberg <[email protected]>
> Signed-off-by: Ryusuke Konishi <[email protected]>

Nice cleanup!

Acked-by: Pekka Enberg <[email protected]>

2008-12-12 13:47:47

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH mmotm 1/5] nilfs2: fix problems of memory allocation in ioctl

Ryusuke Konishi <[email protected]> writes:

> The current memory copy function of nilfs2 ioctl has following
> problems:

Wouldn't it be easier/faster/more reliable to just use vmalloc for
those allocations? The only reason to use kmalloc for such large
allocations would be if you want to do direct DMA (but even there are
ways to do this using virtual allocations and scatter-gather)

-Andi
>

--
[email protected]

2008-12-12 18:48:35

by Ryusuke Konishi

[permalink] [raw]
Subject: Re: [PATCH mmotm 1/5] nilfs2: fix problems of memory allocation in ioctl

Hi,
On Fri, 12 Dec 2008 14:47:58 +0100, Andi Kleen <[email protected]> wrote:
> Ryusuke Konishi <[email protected]> writes:
>
> > The current memory copy function of nilfs2 ioctl has following
> > problems:
>
> Wouldn't it be easier/faster/more reliable to just use vmalloc for
> those allocations? The only reason to use kmalloc for such large
> allocations would be if you want to do direct DMA (but even there are
> ways to do this using virtual allocations and scatter-gather)
>
> -Andi

Thanks. Using vmalloc seems worthy of consideration.
I'll try to rewrite and test it to confirm whether it can make the
code simpler and robust enough.

By the way, is there any recommended way to exchange such a large
amount of data items between user space and kernel space?

In the current interface, each data item is copied twice: one is to
the allocated memory from user space (via copy_from_user), and another
is to on-memory structures or to buffers/pages from the allocated
memory.

This looks somehow inefficient.

Regards,
Ryusuke

2008-12-12 20:12:33

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH mmotm 1/5] nilfs2: fix problems of memory allocation in ioctl

> By the way, is there any recommended way to exchange such a large
> amount of data items between user space and kernel space?
>
> In the current interface, each data item is copied twice: one is to
> the allocated memory from user space (via copy_from_user), and another

For such large copies it is better to use multiple smaller (e.g. 4K)
copy user, that gives better real time preempt latencies. Each cfu has a
cond_resched(), but only one, not multiple times in the inner loop.

> is to on-memory structures or to buffers/pages from the allocated
> memory.

It depends how performance critical it is.

One way for example is to grab the user pages using get_user_pages()
and then reference those pages directly using kmap().
But you would be at the mercy of the user process not modifying in
parallel then. Normally it is safer to work from copies in kernel
space to avoid races. As long as it doesn't happen too often a few
copies are also usually not a problem. I wouldn't worry about them
unless you see them prominently in profiler logs.

-Andi


--
[email protected]

2008-12-13 08:29:39

by Ryusuke Konishi

[permalink] [raw]
Subject: Re: [PATCH mmotm 1/5] nilfs2: fix problems of memory allocation in ioctl

On Fri, 12 Dec 2008 21:24:11 +0100, Andi Kleen <[email protected]> wrote:
> > In the current interface, each data item is copied twice: one is to
> > the allocated memory from user space (via copy_from_user), and another
>
> For such large copies it is better to use multiple smaller (e.g. 4K)
> copy user, that gives better real time preempt latencies. Each cfu has a
> cond_resched(), but only one, not multiple times in the inner loop.

For the function in question, the buffer memory can be divided into a
smaller size (at least to 4K bytes) since the buffer is repeatedly
used for small objects, where the copy_from_user (and a copy_to_user)
is only once in each cycle.

So, just reducing the allocation size of the buffer seems good; it is
likely able to avoid both large preempt latencies and large memory
allocation, which also can leave off the use of vmalloc.

> > is to on-memory structures or to buffers/pages from the allocated
> > memory.
>
> It depends how performance critical it is.
>
> One way for example is to grab the user pages using get_user_pages()
> and then reference those pages directly using kmap().
> But you would be at the mercy of the user process not modifying in
> parallel then. Normally it is safer to work from copies in kernel
> space to avoid races. As long as it doesn't happen too often a few
> copies are also usually not a problem. I wouldn't worry about them
> unless you see them prominently in profiler logs.
>
> -Andi

I got it. If need arises, then I'll recall get_user_pages(). At
present, there is likely no need to do like that.

Thank you for the informative advises.

With regards,
Ryusuke

2008-12-14 12:09:39

by Ryusuke Konishi

[permalink] [raw]
Subject: Re: [PATCH mmotm 1/5] nilfs2: fix problems of memory allocation in ioctl

This is the revised patch for fixing the following problems of a
memory copy function in nilfs2 ioctl.

(1) It tries to allocate 128KB size of memory even for small objects.

(2) Though the function repeatedly tries large memory allocations
while reducing the size, GFP_NOWAIT flag is not specified.
This increases the possibility of system memory shortage.

(3) During the retries of (2), verbose warnings are printed
because _GFP_NOWARN flag is not used for the kmalloc calls.

The first patch was still doing large allocations by kmalloc which are
repeatedly tried while reducing the size.

Andi Kleen has pointed out that just using vmalloc would be
easy/faster/more reliable, and he also told me that using
copy_from_user for large memory is not good from the viewpoint of
preempt latency:

On Fri, 12 Dec 2008 21:24:11 +0100, Andi Kleen <[email protected]> wrote:
> > In the current interface, each data item is copied twice: one is to
> > the allocated memory from user space (via copy_from_user), and another
>
> For such large copies it is better to use multiple smaller (e.g. 4K)
> copy user, that gives better real time preempt latencies. Each cfu has a
> cond_resched(), but only one, not multiple times in the inner loop.

For the function in question, the size of buffer memory can be reduced
since the buffer is repeatedly used for a number of small objects. On
the other hand, it may incur large preempt latencies for larger buffer
because a copy_from_user (and a copy_to_user) was applied only once
each cycle.

So, this revision avoids the latency issue as well as fixes the
original problems merely by reducing allocation size of the buffer.

Cc: Andi Kleen <[email protected]>
Signed-off-by: Ryusuke Konishi <[email protected]>
---
fs/nilfs2/ioctl.c | 12 ++++--------
1 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/fs/nilfs2/ioctl.c b/fs/nilfs2/ioctl.c
index 35ba60e..23378c3 100644
--- a/fs/nilfs2/ioctl.c
+++ b/fs/nilfs2/ioctl.c
@@ -34,8 +34,7 @@
#include "dat.h"


-#define KMALLOC_SIZE_MIN 4096 /* 4KB */
-#define KMALLOC_SIZE_MAX 131072 /* 128 KB */
+#define NILFS_IOCTL_KMALLOC_SIZE 8192 /* 8KB */

static int nilfs_ioctl_wrap_copy(struct the_nilfs *nilfs,
struct nilfs_argv *argv, int dir,
@@ -51,12 +50,9 @@ static int nilfs_ioctl_wrap_copy(struct the_nilfs *nilfs,
if (argv->v_nmembs == 0)
return 0;

- for (ksize = KMALLOC_SIZE_MAX; ksize >= KMALLOC_SIZE_MIN; ksize /= 2) {
- buf = kmalloc(ksize, GFP_NOFS);
- if (buf != NULL)
- break;
- }
- if (ksize < KMALLOC_SIZE_MIN)
+ ksize = max_t(size_t, NILFS_IOCTL_KMALLOC_SIZE, argv->v_size);
+ buf = kmalloc(ksize, GFP_NOFS);
+ if (unlikely(!buf))
return -ENOMEM;
maxmembs = ksize / argv->v_size;

--
1.5.6.5

2008-12-14 15:01:40

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH mmotm 1/5] nilfs2: fix problems of memory allocation in ioctl

> -#define KMALLOC_SIZE_MIN 4096 /* 4KB */
> -#define KMALLOC_SIZE_MAX 131072 /* 128 KB */
> +#define NILFS_IOCTL_KMALLOC_SIZE 8192 /* 8KB */

Better would be if you could go to PAGE_SIZE. order 0 allocations
are typically the fastest / least likely to stall.

Also in this case it's a good idea to use __get_free_pages()
directly, kmalloc tends to be become less efficient at larger
sizes.

-Andi

2008-12-15 06:59:04

by Ryusuke Konishi

[permalink] [raw]
Subject: Re: [PATCH mmotm 1/5] nilfs2: fix problems of memory allocation in ioctl

On Sun, 14 Dec 2008 16:13:27 +0100, Andi Kleen <[email protected]> wrote:
> > -#define KMALLOC_SIZE_MIN 4096 /* 4KB */
> > -#define KMALLOC_SIZE_MAX 131072 /* 128 KB */
> > +#define NILFS_IOCTL_KMALLOC_SIZE 8192 /* 8KB */
>
> Better would be if you could go to PAGE_SIZE. order 0 allocations
> are typically the fastest / least likely to stall.
>
> Also in this case it's a good idea to use __get_free_pages()
> directly, kmalloc tends to be become less efficient at larger
> sizes.

Thanks again. I've rewritten the patch to use __get_free_pages().

Regards,
Ryusuke Konishi

---
nilfs2: fix problems of memory allocation in ioctl 3

This is another patch for fixing the following problems of a memory
copy function in nilfs2 ioctl:

(1) It tries to allocate 128KB size of memory even for small objects.

(2) Though the function repeatedly tries large memory allocations
while reducing the size, GFP_NOWAIT flag is not specified.
This increases the possibility of system memory shortage.

(3) During the retries of (2), verbose warnings are printed
because _GFP_NOWARN flag is not used for the kmalloc calls.

The first patch was still doing large allocations by kmalloc which are
repeatedly tried while reducing the size.

Andi Kleen told me that using copy_from_user for large memory is not
good from the viewpoint of preempt latency:

On Fri, 12 Dec 2008 21:24:11 +0100, Andi Kleen <[email protected]> wrote:
> > In the current interface, each data item is copied twice: one is to
> > the allocated memory from user space (via copy_from_user), and another
>
> For such large copies it is better to use multiple smaller (e.g. 4K)
> copy user, that gives better real time preempt latencies. Each cfu has a
> cond_resched(), but only one, not multiple times in the inner loop.

He also advised me that:

On Sun, 14 Dec 2008 16:13:27 +0100, Andi Kleen <[email protected]> wrote:
> Better would be if you could go to PAGE_SIZE. order 0 allocations
> are typically the fastest / least likely to stall.
>
> Also in this case it's a good idea to use __get_free_pages()
> directly, kmalloc tends to be become less efficient at larger
> sizes.

For the function in question, the size of buffer memory can be reduced
since the buffer is repeatedly used for a number of small objects. On
the other hand, it may incur large preempt latencies for larger buffer
because a copy_from_user (and a copy_to_user) was applied only once
each cycle.

With that, this revision uses the order 0 allocations with
__get_free_pages() to fix the original problems.

Cc: Andi Kleen <[email protected]>
Signed-off-by: Ryusuke Konishi <[email protected]>
---
fs/nilfs2/ioctl.c | 20 ++++++++------------
1 files changed, 8 insertions(+), 12 deletions(-)

diff --git a/fs/nilfs2/ioctl.c b/fs/nilfs2/ioctl.c
index 35ba60e..02e91e1 100644
--- a/fs/nilfs2/ioctl.c
+++ b/fs/nilfs2/ioctl.c
@@ -34,9 +34,6 @@
#include "dat.h"


-#define KMALLOC_SIZE_MIN 4096 /* 4KB */
-#define KMALLOC_SIZE_MAX 131072 /* 128 KB */
-
static int nilfs_ioctl_wrap_copy(struct the_nilfs *nilfs,
struct nilfs_argv *argv, int dir,
ssize_t (*dofunc)(struct the_nilfs *,
@@ -44,21 +41,20 @@ static int nilfs_ioctl_wrap_copy(struct the_nilfs *nilfs,
void *, size_t, size_t))
{
void *buf;
- size_t ksize, maxmembs, total, n;
+ size_t maxmembs, total, n;
ssize_t nr;
int ret, i;

if (argv->v_nmembs == 0)
return 0;

- for (ksize = KMALLOC_SIZE_MAX; ksize >= KMALLOC_SIZE_MIN; ksize /= 2) {
- buf = kmalloc(ksize, GFP_NOFS);
- if (buf != NULL)
- break;
- }
- if (ksize < KMALLOC_SIZE_MIN)
+ if (argv->v_size > PAGE_SIZE)
+ return -EINVAL;
+
+ buf = (void *)__get_free_pages(GFP_NOFS, 0);
+ if (unlikely(!buf))
return -ENOMEM;
- maxmembs = ksize / argv->v_size;
+ maxmembs = PAGE_SIZE / argv->v_size;

ret = 0;
total = 0;
@@ -89,7 +85,7 @@ static int nilfs_ioctl_wrap_copy(struct the_nilfs *nilfs,
}
argv->v_nmembs = total;

- kfree(buf);
+ free_pages((unsigned long)buf, 0);
return ret;
}

--
1.5.6.5

2008-12-15 10:05:55

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH mmotm 1/5] nilfs2: fix problems of memory allocation in ioctl

On Mon, Dec 15, 2008 at 03:58:40PM +0900, Ryusuke Konishi wrote:
> On Sun, 14 Dec 2008 16:13:27 +0100, Andi Kleen <[email protected]> wrote:
> > > -#define KMALLOC_SIZE_MIN 4096 /* 4KB */
> > > -#define KMALLOC_SIZE_MAX 131072 /* 128 KB */
> > > +#define NILFS_IOCTL_KMALLOC_SIZE 8192 /* 8KB */
> >
> > Better would be if you could go to PAGE_SIZE. order 0 allocations
> > are typically the fastest / least likely to stall.
> >
> > Also in this case it's a good idea to use __get_free_pages()
> > directly, kmalloc tends to be become less efficient at larger
> > sizes.
>
> Thanks again. I've rewritten the patch to use __get_free_pages().

Thanks looks good now.

-Andi