2014-10-16 12:38:45

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [PATCH v11 19/21] dax: Add dax_zero_page_range

On 25-Sep-2014 04:33:36 PM, Matthew Wilcox wrote:
> This new function allows us to support hole-punch for DAX files by zeroing
> a partial page, as opposed to the dax_truncate_page() function which can
> only truncate to the end of the page. Reimplement dax_truncate_page() to
> call dax_zero_page_range().
>
> Signed-off-by: Matthew Wilcox <[email protected]>
> [ported to 3.13-rc2]
> Signed-off-by: Ross Zwisler <[email protected]>
> ---
> Documentation/filesystems/dax.txt | 1 +
> fs/dax.c | 36 +++++++++++++++++++++++++++++++-----
> include/linux/fs.h | 7 +++++++
> 3 files changed, 39 insertions(+), 5 deletions(-)
>
> diff --git a/Documentation/filesystems/dax.txt b/Documentation/filesystems/dax.txt
> index 635adaa..ebcd97f 100644
> --- a/Documentation/filesystems/dax.txt
> +++ b/Documentation/filesystems/dax.txt
> @@ -62,6 +62,7 @@ Filesystem support consists of
> for fault and page_mkwrite (which should probably call dax_fault() and
> dax_mkwrite(), passing the appropriate get_block() callback)
> - calling dax_truncate_page() instead of block_truncate_page() for DAX files
> +- calling dax_zero_page_range() instead of zero_user() for DAX files
> - ensuring that there is sufficient locking between reads, writes,
> truncates and page faults
>
> diff --git a/fs/dax.c b/fs/dax.c
> index 6801be7..91b7561 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -462,13 +462,16 @@ int dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
> EXPORT_SYMBOL_GPL(dax_fault);
>
> /**
> - * dax_truncate_page - handle a partial page being truncated in a DAX file
> + * dax_zero_page_range - zero a range within a page of a DAX file
> * @inode: The file being truncated
> * @from: The file offset that is being truncated to
> + * @length: The number of bytes to zero
> * @get_block: The filesystem method used to translate file offsets to blocks
> *
> - * Similar to block_truncate_page(), this function can be called by a
> - * filesystem when it is truncating an DAX file to handle the partial page.
> + * This function can be called by a filesystem when it is zeroing part of a
> + * page in a DAX file. This is intended for hole-punch operations. If
> + * you are truncating a file, the helper function dax_truncate_page() may be
> + * more convenient.
> *
> * We work in terms of PAGE_CACHE_SIZE here for commonality with
> * block_truncate_page(), but we could go down to PAGE_SIZE if the filesystem
> @@ -476,17 +479,18 @@ EXPORT_SYMBOL_GPL(dax_fault);
> * block size is smaller than PAGE_SIZE, we have to zero the rest of the page
> * since the file might be mmaped.
> */
> -int dax_truncate_page(struct inode *inode, loff_t from, get_block_t get_block)
> +int dax_zero_page_range(struct inode *inode, loff_t from, unsigned length,

nit: unsigned -> unsigned int ?

Do we want a unsigned int or unsigned long here ?

> + get_block_t get_block)
> {
> struct buffer_head bh;
> pgoff_t index = from >> PAGE_CACHE_SHIFT;
> unsigned offset = from & (PAGE_CACHE_SIZE-1);
> - unsigned length = PAGE_CACHE_ALIGN(from) - from;
> int err;
>
> /* Block boundary? Nothing to do */
> if (!length)
> return 0;
> + BUG_ON((offset + length) > PAGE_CACHE_SIZE);

Isn't it a bit extreme to BUG_ON this condition ? We could return an
error to the caller, and perhaps WARN_ON_ONCE(), but BUG_ON() appears to
be slightly too strong here.

>
> memset(&bh, 0, sizeof(bh));
> bh.b_size = PAGE_CACHE_SIZE;
> @@ -503,4 +507,26 @@ int dax_truncate_page(struct inode *inode, loff_t from, get_block_t get_block)
>
> return 0;
> }
> +EXPORT_SYMBOL_GPL(dax_zero_page_range);
> +
> +/**
> + * dax_truncate_page - handle a partial page being truncated in a DAX file
> + * @inode: The file being truncated
> + * @from: The file offset that is being truncated to
> + * @get_block: The filesystem method used to translate file offsets to blocks
> + *
> + * Similar to block_truncate_page(), this function can be called by a
> + * filesystem when it is truncating an DAX file to handle the partial page.

an DAX -> a DAX

> + *
> + * We work in terms of PAGE_CACHE_SIZE here for commonality with
> + * block_truncate_page(), but we could go down to PAGE_SIZE if the filesystem
> + * took care of disposing of the unnecessary blocks. Even if the filesystem
> + * block size is smaller than PAGE_SIZE, we have to zero the rest of the page
> + * since the file might be mmaped.
> + */
> +int dax_truncate_page(struct inode *inode, loff_t from, get_block_t get_block)
> +{
> + unsigned length = PAGE_CACHE_ALIGN(from) - from;

unsigned -> unsigned int.

Same question about "unsigned long" as above.

> + return dax_zero_page_range(inode, from, length, get_block);
> +}
> EXPORT_SYMBOL_GPL(dax_truncate_page);
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index e6b48cc..105d0f0 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2490,6 +2490,7 @@ extern int nonseekable_open(struct inode * inode, struct file * filp);
>
> #ifdef CONFIG_FS_DAX
> int dax_clear_blocks(struct inode *, sector_t block, long size);
> +int dax_zero_page_range(struct inode *, loff_t from, unsigned len, get_block_t);
> int dax_truncate_page(struct inode *, loff_t from, get_block_t);
> ssize_t dax_do_io(int rw, struct kiocb *, struct inode *, struct iov_iter *,
> loff_t, get_block_t, dio_iodone_t, int flags);
> @@ -2506,6 +2507,12 @@ static inline int dax_truncate_page(struct inode *i, loff_t frm, get_block_t gb)
> return 0;
> }
>
> +static inline int dax_zero_page_range(struct inode *i, loff_t frm,
> + unsigned len, get_block_t gb)
> +{
> + return 0;

Should we return 0 or -ENOSYS here ?

Thanks,

Mathieu

> +}
> +
> static inline ssize_t dax_do_io(int rw, struct kiocb *iocb,
> struct inode *inode, struct iov_iter *iter, loff_t pos,
> get_block_t get_block, dio_iodone_t end_io, int flags)
> --
> 2.1.0
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to [email protected]. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"[email protected]"> [email protected] </a>
>
>

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
Key fingerprint: 2A0B 4ED9 15F2 D3FA 45F5 B162 1728 0A97 8118 6ACF


2014-10-17 07:03:11

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH v11 19/21] dax: Add dax_zero_page_range

On Thu, Oct 16, 2014 at 02:38:24PM +0200, Mathieu Desnoyers wrote:
> > +int dax_zero_page_range(struct inode *inode, loff_t from, unsigned length,
>
> nit: unsigned -> unsigned int ?
>
> Do we want a unsigned int or unsigned long here ?

It's supposed to be for a fragment of a page, so until we see a machine
with PAGE_SIZE > 4GB, we're good to use an unsigned int.

> > if (!length)
> > return 0;
> > + BUG_ON((offset + length) > PAGE_CACHE_SIZE);
>
> Isn't it a bit extreme to BUG_ON this condition ? We could return an
> error to the caller, and perhaps WARN_ON_ONCE(), but BUG_ON() appears to
> be slightly too strong here.

Dave Chinner asked for it :-) The filesystem is supposed to be doing
this clamping (until the last version, I had this function doing the
clamping, and I was told off for "leaving landmines lying around".

> > +static inline int dax_zero_page_range(struct inode *i, loff_t frm,
> > + unsigned len, get_block_t gb)
> > +{
> > + return 0;
>
> Should we return 0 or -ENOSYS here ?

I kind of wonder if we shouldn't just declare the function. It's called
like this:

if (IS_DAX(inode))
return dax_zero_page_range(inode, from, length, ext4_get_block);
return __ext4_block_zero_page_range(handle, mapping, from, length);

and if CONFIG_DAX is not set, IS_DAX evaluates to 0 at compile time, so
the compiler will optimise out the call to dax_zero_page_range() anyway.

2014-10-17 15:49:50

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [PATCH v11 19/21] dax: Add dax_zero_page_range

----- Original Message -----
> From: "Matthew Wilcox" <[email protected]>
> To: "Mathieu Desnoyers" <[email protected]>
> Cc: "Matthew Wilcox" <[email protected]>, [email protected], [email protected],
> [email protected], "Ross Zwisler" <[email protected]>
> Sent: Friday, October 17, 2014 12:01:26 AM
> Subject: Re: [PATCH v11 19/21] dax: Add dax_zero_page_range
>
> On Thu, Oct 16, 2014 at 02:38:24PM +0200, Mathieu Desnoyers wrote:
> > > +int dax_zero_page_range(struct inode *inode, loff_t from, unsigned
> > > length,
> >
> > nit: unsigned -> unsigned int ?
> >
> > Do we want a unsigned int or unsigned long here ?
>
> It's supposed to be for a fragment of a page, so until we see a machine
> with PAGE_SIZE > 4GB, we're good to use an unsigned int.

OK

>
> > > if (!length)
> > > return 0;
> > > + BUG_ON((offset + length) > PAGE_CACHE_SIZE);
> >
> > Isn't it a bit extreme to BUG_ON this condition ? We could return an
> > error to the caller, and perhaps WARN_ON_ONCE(), but BUG_ON() appears to
> > be slightly too strong here.
>
> Dave Chinner asked for it :-) The filesystem is supposed to be doing
> this clamping (until the last version, I had this function doing the
> clamping, and I was told off for "leaving landmines lying around".

Makes sense,

>
> > > +static inline int dax_zero_page_range(struct inode *i, loff_t frm,
> > > + unsigned len, get_block_t gb)
> > > +{
> > > + return 0;
> >
> > Should we return 0 or -ENOSYS here ?
>
> I kind of wonder if we shouldn't just declare the function. It's called
> like this:
>
> if (IS_DAX(inode))
> return dax_zero_page_range(inode, from, length,
> ext4_get_block);
> return __ext4_block_zero_page_range(handle, mapping, from, length);
>
> and if CONFIG_DAX is not set, IS_DAX evaluates to 0 at compile time, so
> the compiler will optimise out the call to dax_zero_page_range() anyway.

I strongly prefer to implement "unimplemented stub" as static inlines
rather than defining to 0, because the compiler can check that the types
passed to the function are valid, even in the #else configuration which
uses the stubs.

The only reason why I have not pointed this out for some of your other
patches was because it was clear that the local style of those files was
to define stubbed functions as 0. But I still dislike it.

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

2014-10-18 20:36:35

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH v11 19/21] dax: Add dax_zero_page_range

On Fri, Oct 17, 2014 at 03:49:39PM +0000, Mathieu Desnoyers wrote:
> > I kind of wonder if we shouldn't just declare the function. It's called
> > like this:
> >
> > if (IS_DAX(inode))
> > return dax_zero_page_range(inode, from, length,
> > ext4_get_block);
> > return __ext4_block_zero_page_range(handle, mapping, from, length);
> >
> > and if CONFIG_DAX is not set, IS_DAX evaluates to 0 at compile time, so
> > the compiler will optimise out the call to dax_zero_page_range() anyway.
>
> I strongly prefer to implement "unimplemented stub" as static inlines
> rather than defining to 0, because the compiler can check that the types
> passed to the function are valid, even in the #else configuration which
> uses the stubs.

I think my explanation was unclear. This is what I meant:

+++ b/include/linux/fs.h
@@ -2473,7 +2473,6 @@ extern loff_t fixed_size_llseek(struct file *file, loff_t
offset,
extern int generic_file_open(struct inode * inode, struct file * filp);
extern int nonseekable_open(struct inode * inode, struct file * filp);

-#ifdef CONFIG_FS_DAX
int dax_clear_blocks(struct inode *, sector_t block, long size);
int dax_zero_page_range(struct inode *, loff_t from, unsigned len, get_block_t)
;
int dax_truncate_page(struct inode *, loff_t from, get_block_t);
#define dax_mkwrite(vma, vmf, gb) dax_fault(vma, vmf, gb)
-#else
-static inline int dax_clear_blocks(struct inode *i, sector_t blk, long sz)
-{
- return 0;
-}
-
-static inline int dax_truncate_page(struct inode *i, loff_t frm, get_block_t gb)
-{
- return 0;
-}
-
-static inline int dax_zero_page_range(struct inode *i, loff_t frm,
- unsigned len, get_block_t gb)
-{
- return 0;
-}
-
-static inline ssize_t dax_do_io(int rw, struct kiocb *iocb,
- struct inode *inode, struct iov_iter *iter, loff_t pos,
- get_block_t get_block, dio_iodone_t end_io, int flags)
-{
- return -ENOTTY;
-}
-#endif

#ifdef CONFIG_BLOCK
typedef void (dio_submit_t)(int rw, struct bio *bio, struct inode *inode,


So after the preprocessor has run, the compiler will see:

if (0)
return dax_zero_page_range(inode, from, length, ext4_get_block);

and it will still do type checking on the call, even though it will eliminate
the call.

I think what you're really complaining about is that the argument to
IS_DAX() is not checked for being an inode.

We could solve that this way:

#ifdef CONFIG_FS_DAX
#define S_DAX 8192
#else
#define S_DAX 0
#endif
...
#define IS_DAX(inode) ((inode)->i_flags & S_DAX)

After preprocessing, the compiler than sees:

if (((inode)->i_flags & 0))
return dax_zero_page_range(inode, from, length, ext4_get_block);

and successfully deduces that the condition evaluates to 0, and still
elide the reference to dax_zero_page_range (checked with 'nm').

2014-10-18 21:16:33

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [PATCH v11 19/21] dax: Add dax_zero_page_range

----- Original Message -----
> From: "Matthew Wilcox" <[email protected]>
> To: "Mathieu Desnoyers" <[email protected]>
> Cc: "Matthew Wilcox" <[email protected]>, "Matthew Wilcox" <[email protected]>,
> [email protected], [email protected], [email protected], "Ross Zwisler"
> <[email protected]>
> Sent: Saturday, October 18, 2014 7:41:00 PM
> Subject: Re: [PATCH v11 19/21] dax: Add dax_zero_page_range
>
> On Fri, Oct 17, 2014 at 03:49:39PM +0000, Mathieu Desnoyers wrote:
> > > I kind of wonder if we shouldn't just declare the function. It's called
> > > like this:
> > >
> > > if (IS_DAX(inode))
> > > return dax_zero_page_range(inode, from, length,
> > > ext4_get_block);
> > > return __ext4_block_zero_page_range(handle, mapping, from,
> > > length);
> > >
> > > and if CONFIG_DAX is not set, IS_DAX evaluates to 0 at compile time, so
> > > the compiler will optimise out the call to dax_zero_page_range() anyway.
> >
> > I strongly prefer to implement "unimplemented stub" as static inlines
> > rather than defining to 0, because the compiler can check that the types
> > passed to the function are valid, even in the #else configuration which
> > uses the stubs.
>
> I think my explanation was unclear. This is what I meant:
>
> +++ b/include/linux/fs.h
> @@ -2473,7 +2473,6 @@ extern loff_t fixed_size_llseek(struct file *file,
> loff_t
> offset,
> extern int generic_file_open(struct inode * inode, struct file * filp);
> extern int nonseekable_open(struct inode * inode, struct file * filp);
>
> -#ifdef CONFIG_FS_DAX
> int dax_clear_blocks(struct inode *, sector_t block, long size);
> int dax_zero_page_range(struct inode *, loff_t from, unsigned len,
> get_block_t)
> ;
> int dax_truncate_page(struct inode *, loff_t from, get_block_t);
> #define dax_mkwrite(vma, vmf, gb) dax_fault(vma, vmf, gb)
> -#else
> -static inline int dax_clear_blocks(struct inode *i, sector_t blk, long sz)
> -{
> - return 0;
> -}
> -
> -static inline int dax_truncate_page(struct inode *i, loff_t frm, get_block_t
> gb)
> -{
> - return 0;
> -}
> -
> -static inline int dax_zero_page_range(struct inode *i, loff_t frm,
> - unsigned len, get_block_t gb)
> -{
> - return 0;
> -}
> -
> -static inline ssize_t dax_do_io(int rw, struct kiocb *iocb,
> - struct inode *inode, struct iov_iter *iter, loff_t pos,
> - get_block_t get_block, dio_iodone_t end_io, int flags)
> -{
> - return -ENOTTY;
> -}
> -#endif
>
> #ifdef CONFIG_BLOCK
> typedef void (dio_submit_t)(int rw, struct bio *bio, struct inode *inode,
>
>
> So after the preprocessor has run, the compiler will see:
>
> if (0)
> return dax_zero_page_range(inode, from, length, ext4_get_block);
>
> and it will still do type checking on the call, even though it will eliminate
> the call.
>

Indeed, since Linux is always compiled in O2 or Os, it will work.

> I think what you're really complaining about is that the argument to
> IS_DAX() is not checked for being an inode.
>
> We could solve that this way:
>
> #ifdef CONFIG_FS_DAX
> #define S_DAX 8192
> #else
> #define S_DAX 0
> #endif
> ...
> #define IS_DAX(inode) ((inode)->i_flags & S_DAX)
>
> After preprocessing, the compiler than sees:
>
> if (((inode)->i_flags & 0))
> return dax_zero_page_range(inode, from, length, ext4_get_block);
>
> and successfully deduces that the condition evaluates to 0, and still
> elide the reference to dax_zero_page_range (checked with 'nm').

Sounds good,

Thanks,

Mathieu


--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com