2014-02-03 16:55:08

by Ryusuke Konishi

Subject: [PATCH 0/4] nilfs2 updates

Hi Andrew,

Please queue this patchset for the next merge window.

Andreas's three patches add a new ioctl that helps reduce the
garbage collection overhead of nilfs. The original cover letter
can be seen at:

[1] http://marc.info/?l=linux-nilfs&m=139081676816008

The userland counterpart is not yet finished, but I think this kernel
patchset is ready for merging.

My additional patch independently updates the nilfs2 file system
entries in the MAINTAINERS file.

Thanks in advance,
Ryusuke Konishi
--
Andreas Rohner (3):
nilfs2: add struct nilfs_suinfo_update and flags
nilfs2: add nilfs_sufile_set_suinfo to update segment usage
nilfs2: implementation of NILFS_IOCTL_SET_SUINFO ioctl

Ryusuke Konishi (1):
nilfs2: update MAINTAINERS file entries

MAINTAINERS | 4 +-
fs/nilfs2/ioctl.c | 92 +++++++++++++++++++++++++++++++
fs/nilfs2/sufile.c | 131 +++++++++++++++++++++++++++++++++++++++++++++
fs/nilfs2/sufile.h | 1 +
include/linux/nilfs2_fs.h | 44 +++++++++++++++
5 files changed, 270 insertions(+), 2 deletions(-)


2014-02-03 16:55:10

by Ryusuke Konishi

Subject: [PATCH 1/4] nilfs2: update MAINTAINERS file entries

Update git repository entry of nilfs2 file system and maintainer's
email description.

Signed-off-by: Ryusuke Konishi <[email protected]>
---
MAINTAINERS | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index b2cf5cf..342caaa 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6085,10 +6085,10 @@ F: include/uapi/linux/nfs*
F: include/uapi/linux/sunrpc/

NILFS2 FILESYSTEM
-M: KONISHI Ryusuke <[email protected]>
+M: Ryusuke Konishi <[email protected]>
L: [email protected]
W: http://www.nilfs.org/en/
-T: git git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2.git
+T: git git://github.com/konis/nilfs2.git
S: Supported
F: Documentation/filesystems/nilfs2.txt
F: fs/nilfs2/
--
1.7.9.3

2014-02-03 16:55:21

by Ryusuke Konishi

Subject: [PATCH 4/4] nilfs2: implementation of NILFS_IOCTL_SET_SUINFO ioctl

From: Andreas Rohner <[email protected]>

With this ioctl the segment usage entries in the SUFILE can be
updated from userspace.

This is useful, because it allows the userspace GC to modify and update
segment usage entries for specific segments, which enables it to avoid
unnecessary write operations.

If a segment needs to be cleaned, but there is no or very little
reclaimable space in it, the cleaning operation basically degrades to
a useless moving operation. In the end the only thing that changes is
the location of the data and a timestamp in the segment usage
information. With this ioctl the GC can skip the cleaning and update
the segment usage entries directly instead.

This is basically a shortcut to cleaning the segment. It is still
necessary to read the segment summary information, but the writing of
the live blocks can be skipped if it's not worth it.

Signed-off-by: Andreas Rohner <[email protected]>
Signed-off-by: Ryusuke Konishi <[email protected]>
---
fs/nilfs2/ioctl.c | 92 +++++++++++++++++++++++++++++++++++++++++++++
include/linux/nilfs2_fs.h | 2 +
2 files changed, 94 insertions(+)

diff --git a/fs/nilfs2/ioctl.c b/fs/nilfs2/ioctl.c
index 2b34021..c19a231 100644
--- a/fs/nilfs2/ioctl.c
+++ b/fs/nilfs2/ioctl.c
@@ -1163,6 +1163,95 @@ static int nilfs_ioctl_get_info(struct inode *inode, struct file *filp,
 	return ret;
 }

+/**
+ * nilfs_ioctl_set_suinfo - set segment usage info
+ * @inode: inode object
+ * @filp: file object
+ * @cmd: ioctl's request code
+ * @argp: pointer on argument from userspace
+ *
+ * Description: Expects an array of nilfs_suinfo_update structures
+ * encapsulated in nilfs_argv and updates the segment usage info
+ * according to the flags in nilfs_suinfo_update.
+ *
+ * Return Value: On success, 0 is returned. On error, one of the
+ * following negative error codes is returned.
+ *
+ * %-EPERM - Not enough permissions
+ *
+ * %-EFAULT - Error copying input data
+ *
+ * %-EIO - I/O error.
+ *
+ * %-ENOMEM - Insufficient amount of memory available.
+ *
+ * %-EINVAL - Invalid values in input (segment number, flags or nblocks)
+ */
+static int nilfs_ioctl_set_suinfo(struct inode *inode, struct file *filp,
+				unsigned int cmd, void __user *argp)
+{
+	struct the_nilfs *nilfs = inode->i_sb->s_fs_info;
+	struct nilfs_transaction_info ti;
+	struct nilfs_argv argv;
+	size_t len;
+	void __user *base;
+	void *kbuf;
+	int ret;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	ret = mnt_want_write_file(filp);
+	if (ret)
+		return ret;
+
+	ret = -EFAULT;
+	if (copy_from_user(&argv, argp, sizeof(argv)))
+		goto out;
+
+	ret = -EINVAL;
+	if (argv.v_size < sizeof(struct nilfs_suinfo_update))
+		goto out;
+
+	if (argv.v_nmembs > nilfs->ns_nsegments)
+		goto out;
+
+	if (argv.v_nmembs >= UINT_MAX / argv.v_size)
+		goto out;
+
+	len = argv.v_size * argv.v_nmembs;
+	if (!len) {
+		ret = 0;
+		goto out;
+	}
+
+	base = (void __user *)(unsigned long)argv.v_base;
+	kbuf = vmalloc(len);
+	if (!kbuf) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	if (copy_from_user(kbuf, base, len)) {
+		ret = -EFAULT;
+		goto out_free;
+	}
+
+	nilfs_transaction_begin(inode->i_sb, &ti, 0);
+	ret = nilfs_sufile_set_suinfo(nilfs->ns_sufile, kbuf, argv.v_size,
+				      argv.v_nmembs);
+	if (unlikely(ret < 0))
+		nilfs_transaction_abort(inode->i_sb);
+	else
+		nilfs_transaction_commit(inode->i_sb); /* never fails */
+
+out_free:
+	vfree(kbuf);
+out:
+	mnt_drop_write_file(filp);
+	return ret;
+}
+
 long nilfs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 {
 	struct inode *inode = file_inode(filp);
@@ -1189,6 +1278,8 @@ long nilfs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 		return nilfs_ioctl_get_info(inode, filp, cmd, argp,
 					    sizeof(struct nilfs_suinfo),
 					    nilfs_ioctl_do_get_suinfo);
+	case NILFS_IOCTL_SET_SUINFO:
+		return nilfs_ioctl_set_suinfo(inode, filp, cmd, argp);
 	case NILFS_IOCTL_GET_SUSTAT:
 		return nilfs_ioctl_get_sustat(inode, filp, cmd, argp);
 	case NILFS_IOCTL_GET_VINFO:
@@ -1228,6 +1319,7 @@ long nilfs_compat_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 	case NILFS_IOCTL_GET_CPINFO:
 	case NILFS_IOCTL_GET_CPSTAT:
 	case NILFS_IOCTL_GET_SUINFO:
+	case NILFS_IOCTL_SET_SUINFO:
 	case NILFS_IOCTL_GET_SUSTAT:
 	case NILFS_IOCTL_GET_VINFO:
 	case NILFS_IOCTL_GET_BDESCS:
diff --git a/include/linux/nilfs2_fs.h b/include/linux/nilfs2_fs.h
index 2526578..1fb465f 100644
--- a/include/linux/nilfs2_fs.h
+++ b/include/linux/nilfs2_fs.h
@@ -905,5 +905,7 @@ struct nilfs_bdesc {
 	_IOW(NILFS_IOCTL_IDENT, 0x8B, __u64)
 #define NILFS_IOCTL_SET_ALLOC_RANGE \
 	_IOW(NILFS_IOCTL_IDENT, 0x8C, __u64[2])
+#define NILFS_IOCTL_SET_SUINFO \
+	_IOW(NILFS_IOCTL_IDENT, 0x8D, struct nilfs_argv)

 #endif /* _LINUX_NILFS_FS_H */
--
1.7.9.3
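
Editor's sketch (not part of the patch): the argument checks that
nilfs_ioctl_set_suinfo() performs before allocating the kernel buffer
can be modeled in plain userspace C. The names argv_sketch and
check_argv are hypothetical, and the struct is only a stand-in for the
two struct nilfs_argv fields used here. min_size plays the role of
sizeof(struct nilfs_suinfo_update) and nsegments the role of
nilfs->ns_nsegments.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the fields of struct nilfs_argv used here. */
struct argv_sketch {
	uint32_t v_nmembs;	/* number of array members */
	uint16_t v_size;	/* size of one member in bytes */
};

/*
 * Returns 0 and stores the total byte length in *len if the request
 * passes the same three checks as the patch, or -EINVAL otherwise.
 */
static int check_argv(const struct argv_sketch *argv, size_t min_size,
		      uint64_t nsegments, size_t *len)
{
	assert(argv != NULL && len != NULL && min_size > 0);

	/* Each member must be at least as large as the expected struct. */
	if (argv->v_size < min_size)
		return -EINVAL;

	/* There cannot be more updates than segments. */
	if (argv->v_nmembs > nsegments)
		return -EINVAL;

	/* Reject counts whose product with v_size could overflow. */
	if (argv->v_nmembs >= UINT_MAX / argv->v_size)
		return -EINVAL;

	*len = (size_t)argv->v_size * argv->v_nmembs;
	return 0;
}
```

Because v_size is checked against min_size first, the later division
never sees a zero divisor. A zero v_nmembs yields a zero length, which
the ioctl treats as a successful no-op.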

2014-02-03 16:55:19

by Ryusuke Konishi

Subject: [PATCH 2/4] nilfs2: add struct nilfs_suinfo_update and flags

From: Andreas Rohner <[email protected]>

This patch adds the nilfs_suinfo_update structure, which contains the
information needed to update one segment usage entry. The flags
specify which fields need to be updated.

Signed-off-by: Andreas Rohner <[email protected]>
Signed-off-by: Ryusuke Konishi <[email protected]>
---
include/linux/nilfs2_fs.h | 42 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 42 insertions(+)

diff --git a/include/linux/nilfs2_fs.h b/include/linux/nilfs2_fs.h
index 9875576..2526578 100644
--- a/include/linux/nilfs2_fs.h
+++ b/include/linux/nilfs2_fs.h
@@ -710,6 +710,48 @@ static inline int nilfs_suinfo_clean(const struct nilfs_suinfo *si)
 }

 /* ioctl */
+/**
+ * nilfs_suinfo_update - segment usage information update
+ * @sup_segnum: segment number
+ * @sup_flags: flags for which fields are active in sup_sui
+ * @sup_reserved: reserved necessary for alignment
+ * @sup_sui: segment usage information
+ */
+struct nilfs_suinfo_update {
+	__u64 sup_segnum;
+	__u32 sup_flags;
+	__u32 sup_reserved;
+	struct nilfs_suinfo sup_sui;
+};
+
+enum {
+	NILFS_SUINFO_UPDATE_LASTMOD,
+	NILFS_SUINFO_UPDATE_NBLOCKS,
+	NILFS_SUINFO_UPDATE_FLAGS,
+	__NR_NILFS_SUINFO_UPDATE_FIELDS,
+};
+
+#define NILFS_SUINFO_UPDATE_FNS(flag, name) \
+static inline void \
+nilfs_suinfo_update_set_##name(struct nilfs_suinfo_update *sup) \
+{ \
+	sup->sup_flags |= 1UL << NILFS_SUINFO_UPDATE_##flag; \
+} \
+static inline void \
+nilfs_suinfo_update_clear_##name(struct nilfs_suinfo_update *sup) \
+{ \
+	sup->sup_flags &= ~(1UL << NILFS_SUINFO_UPDATE_##flag); \
+} \
+static inline int \
+nilfs_suinfo_update_##name(const struct nilfs_suinfo_update *sup) \
+{ \
+	return !!(sup->sup_flags & (1UL << NILFS_SUINFO_UPDATE_##flag));\
+}
+
+NILFS_SUINFO_UPDATE_FNS(LASTMOD, lastmod)
+NILFS_SUINFO_UPDATE_FNS(NBLOCKS, nblocks)
+NILFS_SUINFO_UPDATE_FNS(FLAGS, flags)
+
 enum {
 	NILFS_CHECKPOINT,
 	NILFS_SNAPSHOT,
--
1.7.9.3
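
Editor's sketch (not part of the patch): how a caller might fill in one
update entry that touches only the timestamp. The *_sketch types are
local mirrors written for this example; the layout of the inner struct
is an assumption based on struct nilfs_suinfo, and make_lastmod_update
and update_selected are hypothetical helpers that do by hand what the
generated nilfs_suinfo_update_set_lastmod() and
nilfs_suinfo_update_lastmod() functions do.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirrors of the kernel structures, for illustration only. */
struct suinfo_sketch {
	uint64_t sui_lastmod;
	uint32_t sui_nblocks;
	uint32_t sui_flags;
};

struct suinfo_update_sketch {
	uint64_t sup_segnum;
	uint32_t sup_flags;
	uint32_t sup_reserved;
	struct suinfo_sketch sup_sui;
};

/* Bit positions, in the same order as the enum in the patch. */
enum {
	SKETCH_UPDATE_LASTMOD,
	SKETCH_UPDATE_NBLOCKS,
	SKETCH_UPDATE_FLAGS,
	SKETCH_NR_UPDATE_FIELDS,
};

/* Build an update that refreshes only the last-modified timestamp. */
static struct suinfo_update_sketch
make_lastmod_update(uint64_t segnum, uint64_t now)
{
	struct suinfo_update_sketch sup;

	memset(&sup, 0, sizeof(sup));	/* also zeroes sup_reserved */
	sup.sup_segnum = segnum;
	sup.sup_sui.sui_lastmod = now;
	sup.sup_flags |= UINT32_C(1) << SKETCH_UPDATE_LASTMOD;
	return sup;
}

/* Returns 1 if the given field index is selected in sup_flags, else 0. */
static int update_selected(const struct suinfo_update_sketch *sup, int field)
{
	assert(field >= 0 && field < SKETCH_NR_UPDATE_FIELDS);
	return !!(sup->sup_flags & (UINT32_C(1) << field));
}
```

Fields whose bit is not set are left untouched by the kernel side, so
the zeroed sui_nblocks and sui_flags in this entry would be ignored.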

2014-02-03 16:55:17

by Ryusuke Konishi

Subject: [PATCH 3/4] nilfs2: add nilfs_sufile_set_suinfo to update segment usage

From: Andreas Rohner <[email protected]>

This patch introduces the nilfs_sufile_set_suinfo function, which
expects an array of nilfs_suinfo_update structures and updates the
segment usage information accordingly.

This is basically a helper function for the newly introduced
NILFS_IOCTL_SET_SUINFO ioctl.

Signed-off-by: Andreas Rohner <[email protected]>
Signed-off-by: Ryusuke Konishi <[email protected]>
---
fs/nilfs2/sufile.c | 131 ++++++++++++++++++++++++++++++++++++++++++++++++++++
fs/nilfs2/sufile.h | 1 +
2 files changed, 132 insertions(+)

diff --git a/fs/nilfs2/sufile.c b/fs/nilfs2/sufile.c
index 3127e9f..c37b5f0 100644
--- a/fs/nilfs2/sufile.c
+++ b/fs/nilfs2/sufile.c
@@ -870,6 +870,137 @@ ssize_t nilfs_sufile_get_suinfo(struct inode *sufile, __u64 segnum, void *buf,
 }

 /**
+ * nilfs_sufile_set_suinfo - sets segment usage info
+ * @sufile: inode of segment usage file
+ * @buf: array of suinfo_update
+ * @supsz: byte size of suinfo_update
+ * @nsup: size of suinfo_update array
+ *
+ * Description: Takes an array of nilfs_suinfo_update structs and updates
+ * segment usage accordingly. Only the fields indicated by the sup_flags
+ * are updated.
+ *
+ * Return Value: On success, 0 is returned. On error, one of the
+ * following negative error codes is returned.
+ *
+ * %-EIO - I/O error.
+ *
+ * %-ENOMEM - Insufficient amount of memory available.
+ *
+ * %-EINVAL - Invalid values in input (segment number, flags or nblocks)
+ */
+ssize_t nilfs_sufile_set_suinfo(struct inode *sufile, void *buf,
+				unsigned supsz, size_t nsup)
+{
+	struct the_nilfs *nilfs = sufile->i_sb->s_fs_info;
+	struct buffer_head *header_bh, *bh;
+	struct nilfs_suinfo_update *sup, *supend = buf + supsz * nsup;
+	struct nilfs_segment_usage *su;
+	void *kaddr;
+	unsigned long blkoff, prev_blkoff;
+	int cleansi, cleansu, dirtysi, dirtysu;
+	long ncleaned = 0, ndirtied = 0;
+	int ret = 0;
+
+	if (unlikely(nsup == 0))
+		return ret;
+
+	for (sup = buf; sup < supend; sup = (void *)sup + supsz) {
+		if (sup->sup_segnum >= nilfs->ns_nsegments
+			|| (sup->sup_flags &
+				(~0UL << __NR_NILFS_SUINFO_UPDATE_FIELDS))
+			|| (nilfs_suinfo_update_nblocks(sup) &&
+				sup->sup_sui.sui_nblocks >
+				nilfs->ns_blocks_per_segment))
+			return -EINVAL;
+	}
+
+	down_write(&NILFS_MDT(sufile)->mi_sem);
+
+	ret = nilfs_sufile_get_header_block(sufile, &header_bh);
+	if (ret < 0)
+		goto out_sem;
+
+	sup = buf;
+	blkoff = nilfs_sufile_get_blkoff(sufile, sup->sup_segnum);
+	ret = nilfs_mdt_get_block(sufile, blkoff, 1, NULL, &bh);
+	if (ret < 0)
+		goto out_header;
+
+	for (;;) {
+		kaddr = kmap_atomic(bh->b_page);
+		su = nilfs_sufile_block_get_segment_usage(
+			sufile, sup->sup_segnum, bh, kaddr);
+
+		if (nilfs_suinfo_update_lastmod(sup))
+			su->su_lastmod = cpu_to_le64(sup->sup_sui.sui_lastmod);
+
+		if (nilfs_suinfo_update_nblocks(sup))
+			su->su_nblocks = cpu_to_le32(sup->sup_sui.sui_nblocks);
+
+		if (nilfs_suinfo_update_flags(sup)) {
+			/*
+			 * Active flag is a virtual flag projected by running
+			 * nilfs kernel code - drop it not to write it to
+			 * disk.
+			 */
+			sup->sup_sui.sui_flags &=
+					~(1UL << NILFS_SEGMENT_USAGE_ACTIVE);
+
+			cleansi = nilfs_suinfo_clean(&sup->sup_sui);
+			cleansu = nilfs_segment_usage_clean(su);
+			dirtysi = nilfs_suinfo_dirty(&sup->sup_sui);
+			dirtysu = nilfs_segment_usage_dirty(su);
+
+			if (cleansi && !cleansu)
+				++ncleaned;
+			else if (!cleansi && cleansu)
+				--ncleaned;
+
+			if (dirtysi && !dirtysu)
+				++ndirtied;
+			else if (!dirtysi && dirtysu)
+				--ndirtied;
+
+			su->su_flags = cpu_to_le32(sup->sup_sui.sui_flags);
+		}
+
+		kunmap_atomic(kaddr);
+
+		sup = (void *)sup + supsz;
+		if (sup >= supend)
+			break;
+
+		prev_blkoff = blkoff;
+		blkoff = nilfs_sufile_get_blkoff(sufile, sup->sup_segnum);
+		if (blkoff == prev_blkoff)
+			continue;
+
+		/* get different block */
+		mark_buffer_dirty(bh);
+		brelse(bh);
+		ret = nilfs_mdt_get_block(sufile, blkoff, 1, NULL, &bh);
+		if (unlikely(ret < 0))
+			goto out_mark;
+	}
+	mark_buffer_dirty(bh);
+	brelse(bh);
+
+ out_mark:
+	if (ncleaned || ndirtied) {
+		nilfs_sufile_mod_counter(header_bh, (u64)ncleaned,
+					 (u64)ndirtied);
+		NILFS_SUI(sufile)->ncleansegs += ncleaned;
+	}
+	nilfs_mdt_mark_dirty(sufile);
+ out_header:
+	brelse(header_bh);
+ out_sem:
+	up_write(&NILFS_MDT(sufile)->mi_sem);
+	return ret;
+}
+
+/**
  * nilfs_sufile_read - read or get sufile inode
  * @sb: super block instance
  * @susize: size of a segment usage entry
diff --git a/fs/nilfs2/sufile.h b/fs/nilfs2/sufile.h
index e84bc5b..366003c 100644
--- a/fs/nilfs2/sufile.h
+++ b/fs/nilfs2/sufile.h
@@ -44,6 +44,7 @@ int nilfs_sufile_set_segment_usage(struct inode *sufile, __u64 segnum,
 int nilfs_sufile_get_stat(struct inode *, struct nilfs_sustat *);
 ssize_t nilfs_sufile_get_suinfo(struct inode *, __u64, void *, unsigned,
 				size_t);
+ssize_t nilfs_sufile_set_suinfo(struct inode *, void *, unsigned , size_t);

 int nilfs_sufile_updatev(struct inode *, __u64 *, size_t, int, size_t *,
 			 void (*dofunc)(struct inode *, __u64,
--
1.7.9.3

2014-02-03 21:38:22

by Andrew Morton

Subject: Re: [PATCH 3/4] nilfs2: add nilfs_sufile_set_suinfo to update segment usage

On Tue, 4 Feb 2014 01:50:43 +0900 Ryusuke Konishi <[email protected]> wrote:

> From: Andreas Rohner <[email protected]>
>
> This patch introduces the nilfs_sufile_set_suinfo function, which
> expects an array of nilfs_suinfo_update structures and updates the
> segment usage information accordingly.
>
> This is basically a helper function for the newly introduced
> NILFS_IOCTL_SET_SUINFO ioctl.
>
> ..
>
> --- a/fs/nilfs2/sufile.c
> +++ b/fs/nilfs2/sufile.c
> @@ -870,6 +870,137 @@ ssize_t nilfs_sufile_get_suinfo(struct inode *sufile, __u64 segnum, void *buf,
> }
>
> /**
> + * nilfs_sufile_set_suinfo - sets segment usage info
> + * @sufile: inode of segment usage file
> + * @buf: array of suinfo_update
> + * @supsz: byte size of suinfo_update
> + * @nsup: size of suinfo_update array
> + *
> + * Description: Takes an array of nilfs_suinfo_update structs and updates
> + * segment usage accordingly. Only the fields indicated by the sup_flags
> + * are updated.
> + *
> + * Return Value: On success, 0 is returned. On error, one of the
> + * following negative error codes is returned.
> + *
> + * %-EIO - I/O error.
> + *
> + * %-ENOMEM - Insufficient amount of memory available.
> + *
> + * %-EINVAL - Invalid values in input (segment number, flags or nblocks)
> + */
> +ssize_t nilfs_sufile_set_suinfo(struct inode *sufile, void *buf,
> + unsigned supsz, size_t nsup)
> +{
> + struct the_nilfs *nilfs = sufile->i_sb->s_fs_info;
> + struct buffer_head *header_bh, *bh;
> + struct nilfs_suinfo_update *sup, *supend = buf + supsz * nsup;
> + struct nilfs_segment_usage *su;
> + void *kaddr;
> + unsigned long blkoff, prev_blkoff;
> + int cleansi, cleansu, dirtysi, dirtysu;
> + long ncleaned = 0, ndirtied = 0;
> + int ret = 0;
> +
> + if (unlikely(nsup == 0))
> + return ret;
> +
> + for (sup = buf; sup < supend; sup = (void *)sup + supsz) {
> + if (sup->sup_segnum >= nilfs->ns_nsegments
> + || (sup->sup_flags &
> + (~0UL << __NR_NILFS_SUINFO_UPDATE_FIELDS))
> + || (nilfs_suinfo_update_nblocks(sup) &&
> + sup->sup_sui.sui_nblocks >
> + nilfs->ns_blocks_per_segment))
> + return -EINVAL;
> + }
> +
> + down_write(&NILFS_MDT(sufile)->mi_sem);
> +
> + ret = nilfs_sufile_get_header_block(sufile, &header_bh);
> + if (ret < 0)
> + goto out_sem;
> +
> + sup = buf;
> + blkoff = nilfs_sufile_get_blkoff(sufile, sup->sup_segnum);
> + ret = nilfs_mdt_get_block(sufile, blkoff, 1, NULL, &bh);
> + if (ret < 0)
> + goto out_header;
> +
> + for (;;) {
> + kaddr = kmap_atomic(bh->b_page);

Can this buffer_head really be in highmem?

> + su = nilfs_sufile_block_get_segment_usage(
> + sufile, sup->sup_segnum, bh, kaddr);

Returns an address within the kmapped page. I really hope
nilfs_sufile_block_get_segment_usage() cannot return an address outside
that page - it appears to do quite a lot of unchecked arithmetic which
is dependent on stuff which was read from the disk. What if that was
interfered with or otherwise corrupted?

> + if (nilfs_suinfo_update_lastmod(sup))
> + su->su_lastmod = cpu_to_le64(sup->sup_sui.sui_lastmod);
> +
> + if (nilfs_suinfo_update_nblocks(sup))
> + su->su_nblocks = cpu_to_le32(sup->sup_sui.sui_nblocks);
> +
> + if (nilfs_suinfo_update_flags(sup)) {
> + /*
> + * Active flag is a virtual flag projected by running
> + * nilfs kernel code - drop it not to write it to
> + * disk.
> + */
> + sup->sup_sui.sui_flags &=
> + ~(1UL << NILFS_SEGMENT_USAGE_ACTIVE);
> +
> + cleansi = nilfs_suinfo_clean(&sup->sup_sui);
> + cleansu = nilfs_segment_usage_clean(su);
> + dirtysi = nilfs_suinfo_dirty(&sup->sup_sui);
> + dirtysu = nilfs_segment_usage_dirty(su);
> +
> + if (cleansi && !cleansu)
> + ++ncleaned;
> + else if (!cleansi && cleansu)
> + --ncleaned;
> +
> + if (dirtysi && !dirtysu)
> + ++ndirtied;
> + else if (!dirtysi && dirtysu)
> + --ndirtied;
> +
> + su->su_flags = cpu_to_le32(sup->sup_sui.sui_flags);
> + }
> +
> + kunmap_atomic(kaddr);

flush_dcache_page()? Can the page be mapped by userspace?

> + sup = (void *)sup + supsz;
> + if (sup >= supend)
> + break;
> +
> + prev_blkoff = blkoff;
> + blkoff = nilfs_sufile_get_blkoff(sufile, sup->sup_segnum);
> + if (blkoff == prev_blkoff)
> + continue;
> +
> + /* get different block */
> + mark_buffer_dirty(bh);
> + brelse(bh);

put_bh() will suffice - we know bh != NULL.

> + ret = nilfs_mdt_get_block(sufile, blkoff, 1, NULL, &bh);
> + if (unlikely(ret < 0))
> + goto out_mark;
> + }
> + mark_buffer_dirty(bh);
> + brelse(bh);

ditto

> + out_mark:
> + if (ncleaned || ndirtied) {
> + nilfs_sufile_mod_counter(header_bh, (u64)ncleaned,
> + (u64)ndirtied);
> + NILFS_SUI(sufile)->ncleansegs += ncleaned;
> + }
> + nilfs_mdt_mark_dirty(sufile);
> + out_header:
> + brelse(header_bh);
> + out_sem:
> + up_write(&NILFS_MDT(sufile)->mi_sem);
> + return ret;
> +}
> +
> +/**

2014-02-03 21:41:05

by Andrew Morton

Subject: Re: [PATCH 4/4] nilfs2: implementation of NILFS_IOCTL_SET_SUINFO ioctl

On Tue, 4 Feb 2014 01:50:44 +0900 Ryusuke Konishi <[email protected]> wrote:

> With this ioctl the segment usage entries in the SUFILE can be
> updated from userspace.
>
> This is useful, because it allows the userspace GC to modify and update
> segment usage entries for specific segments, which enables it to avoid
> unnecessary write operations.
>
> If a segment needs to be cleaned, but there is no or very little
> reclaimable space in it, the cleaning operation basically degrades to
> a useless moving operation. In the end the only thing that changes is
> the location of the data and a timestamp in the segment usage
> information. With this ioctl the GC can skip the cleaning and update
> the segment usage entries directly instead.
>
> This is basically a shortcut to cleaning the segment. It is still
> necessary to read the segment summary information, but the writing of
> the live blocks can be skipped if it's not worth it.

Documentation/filesystems/nilfs2.txt should be updated to document the
new ioctl.

While we're in there, please check that the ioctl documentation is
otherwise complete and up-to-date. These things have a tendency to
bitrot.

2014-02-04 01:48:12

by Ryusuke Konishi

Subject: Re: [PATCH 4/4] nilfs2: implementation of NILFS_IOCTL_SET_SUINFO ioctl

On Mon, 3 Feb 2014 13:41:01 -0800, Andrew Morton wrote:
> On Tue, 4 Feb 2014 01:50:44 +0900 Ryusuke Konishi <[email protected]> wrote:
>
>> With this ioctl the segment usage entries in the SUFILE can be
>> updated from userspace.
>>
>> This is useful, because it allows the userspace GC to modify and update
>> segment usage entries for specific segments, which enables it to avoid
>> unnecessary write operations.
>>
>> If a segment needs to be cleaned, but there is no or very little
>> reclaimable space in it, the cleaning operation basically degrades to
>> a useless moving operation. In the end the only thing that changes is
>> the location of the data and a timestamp in the segment usage
>> information. With this ioctl the GC can skip the cleaning and update
>> the segment usage entries directly instead.
>>
>> This is basically a shortcut to cleaning the segment. It is still
>> necessary to read the segment summary information, but the writing of
>> the live blocks can be skipped if it's not worth it.
>
> Documentation/filesystems/nilfs2.txt should be updated to document the
> new ioctl.
>
> While we're in there, please check that the ioctl documentation is
> otherwise complete and up-to-date. These things have a tendency to
> bitrot.

Got it. I missed the recent effort by Vyacheslav, which added a
description of every ioctl to the document file.

I'll send a patch for this soon.

Thanks,
Ryusuke Konishi

2014-02-04 16:42:07

by Ryusuke Konishi

Subject: Re: [PATCH 3/4] nilfs2: add nilfs_sufile_set_suinfo to update segment usage

On Mon, 3 Feb 2014 13:38:18 -0800, Andrew Morton wrote:
> On Tue, 4 Feb 2014 01:50:43 +0900 Ryusuke Konishi <[email protected]> wrote:
>
>> From: Andreas Rohner <[email protected]>
>>
>> This patch introduces the nilfs_sufile_set_suinfo function, which
>> expects an array of nilfs_suinfo_update structures and updates the
>> segment usage information accordingly.
>>
>> This is basically a helper function for the newly introduced
>> NILFS_IOCTL_SET_SUINFO ioctl.
>>
>> ..
>>
>> --- a/fs/nilfs2/sufile.c
>> +++ b/fs/nilfs2/sufile.c
>> @@ -870,6 +870,137 @@ ssize_t nilfs_sufile_get_suinfo(struct inode *sufile, __u64 segnum, void *buf,
>> }
>>
>> /**
>> + * nilfs_sufile_set_suinfo - sets segment usage info
>> + * @sufile: inode of segment usage file
>> + * @buf: array of suinfo_update
>> + * @supsz: byte size of suinfo_update
>> + * @nsup: size of suinfo_update array
>> + *
>> + * Description: Takes an array of nilfs_suinfo_update structs and updates
>> + * segment usage accordingly. Only the fields indicated by the sup_flags
>> + * are updated.
>> + *
>> + * Return Value: On success, 0 is returned. On error, one of the
>> + * following negative error codes is returned.
>> + *
>> + * %-EIO - I/O error.
>> + *
>> + * %-ENOMEM - Insufficient amount of memory available.
>> + *
>> + * %-EINVAL - Invalid values in input (segment number, flags or nblocks)
>> + */
>> +ssize_t nilfs_sufile_set_suinfo(struct inode *sufile, void *buf,
>> + unsigned supsz, size_t nsup)
>> +{
>> + struct the_nilfs *nilfs = sufile->i_sb->s_fs_info;
>> + struct buffer_head *header_bh, *bh;
>> + struct nilfs_suinfo_update *sup, *supend = buf + supsz * nsup;
>> + struct nilfs_segment_usage *su;
>> + void *kaddr;
>> + unsigned long blkoff, prev_blkoff;
>> + int cleansi, cleansu, dirtysi, dirtysu;
>> + long ncleaned = 0, ndirtied = 0;
>> + int ret = 0;
>> +
>> + if (unlikely(nsup == 0))
>> + return ret;
>> +
>> + for (sup = buf; sup < supend; sup = (void *)sup + supsz) {
>> + if (sup->sup_segnum >= nilfs->ns_nsegments
>> + || (sup->sup_flags &
>> + (~0UL << __NR_NILFS_SUINFO_UPDATE_FIELDS))
>> + || (nilfs_suinfo_update_nblocks(sup) &&
>> + sup->sup_sui.sui_nblocks >
>> + nilfs->ns_blocks_per_segment))
>> + return -EINVAL;
>> + }
>> +
>> + down_write(&NILFS_MDT(sufile)->mi_sem);
>> +
>> + ret = nilfs_sufile_get_header_block(sufile, &header_bh);
>> + if (ret < 0)
>> + goto out_sem;
>> +
>> + sup = buf;
>> + blkoff = nilfs_sufile_get_blkoff(sufile, sup->sup_segnum);
>> + ret = nilfs_mdt_get_block(sufile, blkoff, 1, NULL, &bh);
>> + if (ret < 0)
>> + goto out_header;
>> +
>> + for (;;) {
>> + kaddr = kmap_atomic(bh->b_page);
>
> Can this buffer_head really be in highmem?

Yes, data blocks of metadata files can be allocated in highmem. This
buffer head is one of them.

>> + su = nilfs_sufile_block_get_segment_usage(
>> + sufile, sup->sup_segnum, bh, kaddr);
>
> Returns an address within the kmapped page. I really hope
> nilfs_sufile_block_get_segment_usage() cannot return an address outside
> that page - it appears to do quite a lot of unchecked arithmetic which
> is dependent on stuff which was read from the disk. What if that was
> interfered with or otherwise corrupted?

That's right. Several range checks appear to be needed, for instance,
for segment usage size, checkpoint size, dat entry size, and inode
size. I will try to add these missing checks.

>> + if (nilfs_suinfo_update_lastmod(sup))
>> + su->su_lastmod = cpu_to_le64(sup->sup_sui.sui_lastmod);
>> +
>> + if (nilfs_suinfo_update_nblocks(sup))
>> + su->su_nblocks = cpu_to_le32(sup->sup_sui.sui_nblocks);
>> +
>> + if (nilfs_suinfo_update_flags(sup)) {
>> + /*
>> + * Active flag is a virtual flag projected by running
>> + * nilfs kernel code - drop it not to write it to
>> + * disk.
>> + */
>> + sup->sup_sui.sui_flags &=
>> + ~(1UL << NILFS_SEGMENT_USAGE_ACTIVE);
>> +
>> + cleansi = nilfs_suinfo_clean(&sup->sup_sui);
>> + cleansu = nilfs_segment_usage_clean(su);
>> + dirtysi = nilfs_suinfo_dirty(&sup->sup_sui);
>> + dirtysu = nilfs_segment_usage_dirty(su);
>> +
>> + if (cleansi && !cleansu)
>> + ++ncleaned;
>> + else if (!cleansi && cleansu)
>> + --ncleaned;
>> +
>> + if (dirtysi && !dirtysu)
>> + ++ndirtied;
>> + else if (!dirtysi && dirtysu)
>> + --ndirtied;
>> +
>> + su->su_flags = cpu_to_le32(sup->sup_sui.sui_flags);
>> + }
>> +
>> + kunmap_atomic(kaddr);
>
> flush_dcache_page()? Can the page be mapped by userspace?

This page is never mapped to userspace, so flush_dcache_page() looks
unnecessary here.

>> + sup = (void *)sup + supsz;
>> + if (sup >= supend)
>> + break;
>> +
>> + prev_blkoff = blkoff;
>> + blkoff = nilfs_sufile_get_blkoff(sufile, sup->sup_segnum);
>> + if (blkoff == prev_blkoff)
>> + continue;
>> +
>> + /* get different block */
>> + mark_buffer_dirty(bh);
>> + brelse(bh);
>
> put_bh() will suffice - we know bh != NULL.

Agreed. I will fix it later.

>> + ret = nilfs_mdt_get_block(sufile, blkoff, 1, NULL, &bh);
>> + if (unlikely(ret < 0))
>> + goto out_mark;
>> + }
>> + mark_buffer_dirty(bh);
>> + brelse(bh);
>
> ditto

Thank you for your review and comments.

Regards,
Ryusuke Konishi


>> + out_mark:
>> + if (ncleaned || ndirtied) {
>> + nilfs_sufile_mod_counter(header_bh, (u64)ncleaned,
>> + (u64)ndirtied);
>> + NILFS_SUI(sufile)->ncleansegs += ncleaned;
>> + }
>> + nilfs_mdt_mark_dirty(sufile);
>> + out_header:
>> + brelse(header_bh);
>> + out_sem:
>> + up_write(&NILFS_MDT(sufile)->mi_sem);
>> + return ret;
>> +}
>> +
>> +/**
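
Editor's sketch (not part of the thread): one possible shape for the
kind of range check discussed above, written as standalone C. It is
purely illustrative and does not claim to be the check that nilfs2
later adopted. The function name entry_fits_in_block and all four
parameters are hypothetical; entry_size stands for a size field read
from disk, such as the segment usage size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Returns 1 if entry number `index`, of `entry_size` bytes, in a block
 * whose entries start at byte `first_offset`, lies entirely inside a
 * block of `block_size` bytes. Returns 0 otherwise.
 */
static int entry_fits_in_block(size_t block_size, size_t first_offset,
			       size_t entry_size, uint64_t index)
{
	assert(block_size > 0);

	/* A zero or misplaced on-disk size would make the layout bogus. */
	if (entry_size == 0 || first_offset > block_size)
		return 0;

	/* Divide instead of multiplying so the check cannot overflow. */
	return index < (block_size - first_offset) / entry_size;
}
```

With a check of this kind in front of the pointer arithmetic, a
corrupted size field would be rejected instead of producing an address
outside the mapped page.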

2014-02-04 17:23:51

by Ryusuke Konishi

Subject: Re: [PATCH 3/4] nilfs2: add nilfs_sufile_set_suinfo to update segment usage

On Wed, 05 Feb 2014 01:41:37 +0900 (JST), Ryusuke Konishi wrote:
> On Mon, 3 Feb 2014 13:38:18 -0800, Andrew Morton wrote:
>> On Tue, 4 Feb 2014 01:50:43 +0900 Ryusuke Konishi <[email protected]> wrote:
>>
>>> From: Andreas Rohner <[email protected]>
>>>
>>> This patch introduces the nilfs_sufile_set_suinfo function, which
>>> expects an array of nilfs_suinfo_update structures and updates the
>>> segment usage information accordingly.
>>>
>>> This is basically a helper function for the newly introduced
>>> NILFS_IOCTL_SET_SUINFO ioctl.
>>>
>>> ..
>>> + sup = (void *)sup + supsz;
>>> + if (sup >= supend)
>>> + break;
>>> +
>>> + prev_blkoff = blkoff;
>>> + blkoff = nilfs_sufile_get_blkoff(sufile, sup->sup_segnum);
>>> + if (blkoff == prev_blkoff)
>>> + continue;
>>> +
>>> + /* get different block */
>>> + mark_buffer_dirty(bh);
>>> + brelse(bh);
>>
>> put_bh() will suffice - we know bh != NULL.
>
> Agreed. I will fix it later.

Here is the fix.

Ryusuke Konishi
--
From: Ryusuke Konishi <[email protected]>
Subject: [PATCH] nilfs2: add nilfs_sufile_set_suinfo to update segment usage
fix

Use put_bh() instead of brelse() because we know bh != NULL.

Signed-off-by: Ryusuke Konishi <[email protected]>
---
fs/nilfs2/sufile.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/nilfs2/sufile.c b/fs/nilfs2/sufile.c
index c37b5f0..5628b99 100644
--- a/fs/nilfs2/sufile.c
+++ b/fs/nilfs2/sufile.c
@@ -978,13 +978,13 @@ ssize_t nilfs_sufile_set_suinfo(struct inode *sufile, void *buf,

 		/* get different block */
 		mark_buffer_dirty(bh);
-		brelse(bh);
+		put_bh(bh);
 		ret = nilfs_mdt_get_block(sufile, blkoff, 1, NULL, &bh);
 		if (unlikely(ret < 0))
 			goto out_mark;
 	}
 	mark_buffer_dirty(bh);
-	brelse(bh);
+	put_bh(bh);

  out_mark:
 	if (ncleaned || ndirtied) {
@@ -994,7 +994,7 @@ ssize_t nilfs_sufile_set_suinfo(struct inode *sufile, void *buf,
 	}
 	nilfs_mdt_mark_dirty(sufile);
  out_header:
-	brelse(header_bh);
+	put_bh(header_bh);
  out_sem:
 	up_write(&NILFS_MDT(sufile)->mi_sem);
 	return ret;
--
1.7.9.3