2014-10-16 10:05:55

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [PATCH v11 08/21] dax,ext2: Replace ext2_clear_xip_target with dax_clear_blocks

On 25-Sep-2014 04:33:25 PM, Matthew Wilcox wrote:
> This is practically generic code; other filesystems will want to call
> it from other places, but there's nothing ext2-specific about it.
>
> Make it a little more generic by allowing it to take a count of the number
> of bytes to zero rather than fixing it to a single page. Thanks to Dave
> Hansen for suggesting that I need to call cond_resched() if zeroing more
> than one page.
>
> Signed-off-by: Matthew Wilcox <[email protected]>
> ---
> fs/dax.c | 35 +++++++++++++++++++++++++++++++++++
> fs/ext2/inode.c | 8 +++++---
> fs/ext2/xip.c | 14 --------------
> fs/ext2/xip.h | 3 ---
> include/linux/fs.h | 6 ++++++
> 5 files changed, 46 insertions(+), 20 deletions(-)
>
> diff --git a/fs/dax.c b/fs/dax.c
> index 108c68e..02e226f 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -20,8 +20,43 @@
> #include <linux/fs.h>
> #include <linux/genhd.h>
> #include <linux/mutex.h>
> +#include <linux/sched.h>
> #include <linux/uio.h>
>
> +int dax_clear_blocks(struct inode *inode, sector_t block, long size)
> +{
> + struct block_device *bdev = inode->i_sb->s_bdev;
> + sector_t sector = block << (inode->i_blkbits - 9);

Is there a define e.g. SECTOR_SHIFT rather than using this hardcoded "9"
value ?

> +
> + might_sleep();
> + do {
> + void *addr;
> + unsigned long pfn;
> + long count;
> +
> + count = bdev_direct_access(bdev, sector, &addr, &pfn, size);
> + if (count < 0)
> + return count;
> + while (count > 0) {
> + unsigned pgsz = PAGE_SIZE - offset_in_page(addr);

unsigned -> unsigned int

add a newline between variable declaration and following code.

> + if (pgsz > count)
> + pgsz = count;
> + if (pgsz < PAGE_SIZE)
> + memset(addr, 0, pgsz);
> + else
> + clear_page(addr);
> + addr += pgsz;
> + size -= pgsz;
> + count -= pgsz;
> + sector += pgsz / 512;

Also wondering about those 512 constants everywhere in the code
(including prior patches). Perhaps it calls for a SECTOR_SIZE define ?

> + cond_resched();
> + }
> + } while (size);

Just to stay on the safe side, can we do while (size > 0) ? Just in case
an unforeseen issue makes size negative, and gets us in a very long loop.

Thanks,

Mathieu

> +
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(dax_clear_blocks);
> +
> static long dax_get_addr(struct buffer_head *bh, void **addr, unsigned blkbits)
> {
> unsigned long pfn;
> diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
> index 3ccd5fd..52978b8 100644
> --- a/fs/ext2/inode.c
> +++ b/fs/ext2/inode.c
> @@ -733,10 +733,12 @@ static int ext2_get_blocks(struct inode *inode,
>
> if (IS_DAX(inode)) {
> /*
> - * we need to clear the block
> + * block must be initialised before we put it in the tree
> + * so that it's not found by another thread before it's
> + * initialised
> */
> - err = ext2_clear_xip_target (inode,
> - le32_to_cpu(chain[depth-1].key));
> + err = dax_clear_blocks(inode, le32_to_cpu(chain[depth-1].key),
> + 1 << inode->i_blkbits);
> if (err) {
> mutex_unlock(&ei->truncate_mutex);
> goto cleanup;
> diff --git a/fs/ext2/xip.c b/fs/ext2/xip.c
> index bbc5fec..8cfca3a 100644
> --- a/fs/ext2/xip.c
> +++ b/fs/ext2/xip.c
> @@ -42,20 +42,6 @@ __ext2_get_block(struct inode *inode, pgoff_t pgoff, int create,
> return rc;
> }
>
> -int
> -ext2_clear_xip_target(struct inode *inode, sector_t block)
> -{
> - void *kaddr;
> - unsigned long pfn;
> - long size;
> -
> - size = __inode_direct_access(inode, block, &kaddr, &pfn, PAGE_SIZE);
> - if (size < 0)
> - return size;
> - clear_page(kaddr);
> - return 0;
> -}
> -
> void ext2_xip_verify_sb(struct super_block *sb)
> {
> struct ext2_sb_info *sbi = EXT2_SB(sb);
> diff --git a/fs/ext2/xip.h b/fs/ext2/xip.h
> index 29be737..b2592f2 100644
> --- a/fs/ext2/xip.h
> +++ b/fs/ext2/xip.h
> @@ -7,8 +7,6 @@
>
> #ifdef CONFIG_EXT2_FS_XIP
> extern void ext2_xip_verify_sb (struct super_block *);
> -extern int ext2_clear_xip_target (struct inode *, sector_t);
> -
> static inline int ext2_use_xip (struct super_block *sb)
> {
> struct ext2_sb_info *sbi = EXT2_SB(sb);
> @@ -19,6 +17,5 @@ int ext2_get_xip_mem(struct address_space *, pgoff_t, int,
> #else
> #define ext2_xip_verify_sb(sb) do { } while (0)
> #define ext2_use_xip(sb) 0
> -#define ext2_clear_xip_target(inode, chain) 0
> #define ext2_get_xip_mem NULL
> #endif
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 45839e8..c04d371 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2490,11 +2490,17 @@ extern int generic_file_open(struct inode * inode, struct file * filp);
> extern int nonseekable_open(struct inode * inode, struct file * filp);
>
> #ifdef CONFIG_FS_XIP
> +int dax_clear_blocks(struct inode *, sector_t block, long size);
> extern int xip_file_mmap(struct file * file, struct vm_area_struct * vma);
> extern int xip_truncate_page(struct address_space *mapping, loff_t from);
> ssize_t dax_do_io(int rw, struct kiocb *, struct inode *, struct iov_iter *,
> loff_t, get_block_t, dio_iodone_t, int flags);
> #else
> +static inline int dax_clear_blocks(struct inode *i, sector_t blk, long sz)
> +{
> + return 0;
> +}
> +
> static inline int xip_truncate_page(struct address_space *mapping, loff_t from)
> {
> return 0;
> --
> 2.1.0
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to [email protected]. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"[email protected]"> [email protected] </a>
>
>

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
Key fingerprint: 2A0B 4ED9 15F2 D3FA 45F5 B162 1728 0A97 8118 6ACF


2014-10-17 07:03:09

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH v11 08/21] dax,ext2: Replace ext2_clear_xip_target with dax_clear_blocks

On Thu, Oct 16, 2014 at 12:05:25PM +0200, Mathieu Desnoyers wrote:
> > +int dax_clear_blocks(struct inode *inode, sector_t block, long size)
> > +{
> > + struct block_device *bdev = inode->i_sb->s_bdev;
> > + sector_t sector = block << (inode->i_blkbits - 9);
>
> Is there a define e.g. SECTOR_SHIFT rather than using this hardcoded "9"
> value ?

Yeah ... in half a dozen drivers, so introducing them globally spews
warnings about redefining macros. The '9' and '512' are sprinkled all
over the storage parts of the kernel, it's a complete flustercluck that
I wasn't about to try to unscrew.

> > + while (count > 0) {
> > + unsigned pgsz = PAGE_SIZE - offset_in_page(addr);
>
> unsigned -> unsigned int

Any particular reason? Omitting it in some places helps stay within
the 80-column limit without sacrificing readability.

> > + }
> > + } while (size);
>
> Just to stay on the safe side, can we do while (size > 0) ? Just in case
> an unforeseen issue makes size negative, and gets us in a very long loop.

If size < 0, we should BUG, because that means we've zeroed more than
we were asked to do, which is data corruption.

There's probably some other hardening we should do for this loop.
For example, if 'count' is < 512, it can go into an infinite loop.

do {
void *addr;
unsigned long pfn;
long count;

count = bdev_direct_access(bdev, sector, &addr, &pfn, size);
if (count < 0)
return count;
while (count > 0) {
unsigned pgsz = PAGE_SIZE - offset_in_page(addr);
if (pgsz > count)
pgsz = count;
if (pgsz < PAGE_SIZE)
memset(addr, 0, pgsz);
else
clear_page(addr);
addr += pgsz;
size -= pgsz;
count -= pgsz;
BUG_ON(pgsz & 511);
sector += pgsz / 512;
cond_resched();
}
BUG_ON(size < 0);
} while (size);

I think that should do the job ... ?

2014-10-17 15:46:06

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [PATCH v11 08/21] dax,ext2: Replace ext2_clear_xip_target with dax_clear_blocks

----- Original Message -----
> From: "Matthew Wilcox" <[email protected]>
> To: "Mathieu Desnoyers" <[email protected]>
> Cc: "Matthew Wilcox" <[email protected]>, [email protected], [email protected],
> [email protected]
> Sent: Thursday, October 16, 2014 11:22:34 PM
> Subject: Re: [PATCH v11 08/21] dax,ext2: Replace ext2_clear_xip_target with dax_clear_blocks
>
> On Thu, Oct 16, 2014 at 12:05:25PM +0200, Mathieu Desnoyers wrote:
> > > +int dax_clear_blocks(struct inode *inode, sector_t block, long size)
> > > +{
> > > + struct block_device *bdev = inode->i_sb->s_bdev;
> > > + sector_t sector = block << (inode->i_blkbits - 9);
> >
> > Is there a define e.g. SECTOR_SHIFT rather than using this hardcoded "9"
> > value ?
>
> Yeah ... in half a dozen drivers, so introducing them globally spews
> warnings about redefining macros. The '9' and '512' are sprinkled all
> over the storage parts of the kernel, it's a complete flustercluck that
> I wasn't about to try to unscrew.

Fair enough.

>
> > > + while (count > 0) {
> > > + unsigned pgsz = PAGE_SIZE - offset_in_page(addr);
> >
> > unsigned -> unsigned int
>
> Any particular reason? Omitting it in some places helps stay within
> the 80-column limit without sacrificing readability.

It looks like FS code often uses "unsigned", so I'm not too concerned.

It's just that I'm used to the Linux core kernel style, which tend to
use "unsigned int".

>
> > > + }
> > > + } while (size);
> >
> > Just to stay on the safe side, can we do while (size > 0) ? Just in case
> > an unforeseen issue makes size negative, and gets us in a very long loop.
>
> If size < 0, we should BUG, because that means we've zeroed more than
> we were asked to do, which is data corruption.
>
> There's probably some other hardening we should do for this loop.
> For example, if 'count' is < 512, it can go into an infinite loop.
>
> do {
> void *addr;
> unsigned long pfn;
> long count;
>
> count = bdev_direct_access(bdev, sector, &addr, &pfn, size);
> if (count < 0)
> return count;
> while (count > 0) {
> unsigned pgsz = PAGE_SIZE - offset_in_page(addr);
> if (pgsz > count)
> pgsz = count;
> if (pgsz < PAGE_SIZE)
> memset(addr, 0, pgsz);
> else
> clear_page(addr);
> addr += pgsz;
> size -= pgsz;
> count -= pgsz;
> BUG_ON(pgsz & 511);
> sector += pgsz / 512;
> cond_resched();
> }
> BUG_ON(size < 0);
> } while (size);
>
> I think that should do the job ... ?
>

Yep. I love defensive programming, especially for filesystems. :)

Thanks,

Mathieu


--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com