fscrypt_is_bounce_folio() is the equivalent of fscrypt_is_bounce_page()
and fscrypt_pagecache_folio() is the equivalent of fscrypt_pagecache_page().

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
include/linux/fscrypt.h | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)

diff --git a/include/linux/fscrypt.h b/include/linux/fscrypt.h
index 4f5f8a651213..c2c07d36fb3a 100644
--- a/include/linux/fscrypt.h
+++ b/include/linux/fscrypt.h
@@ -273,6 +273,16 @@ static inline struct page *fscrypt_pagecache_page(struct page *bounce_page)
return (struct page *)page_private(bounce_page);
}

+static inline bool fscrypt_is_bounce_folio(struct folio *folio)
+{
+ return folio->mapping == NULL;
+}
+
+static inline struct folio *fscrypt_pagecache_folio(struct folio *bounce_folio)
+{
+ return bounce_folio->private;
+}
+
void fscrypt_free_bounce_page(struct page *bounce_page);

/* policy.c */
@@ -448,6 +458,17 @@ static inline struct page *fscrypt_pagecache_page(struct page *bounce_page)
return ERR_PTR(-EINVAL);
}

+static inline bool fscrypt_is_bounce_folio(struct folio *folio)
+{
+ return false;
+}
+
+static inline struct folio *fscrypt_pagecache_folio(struct folio *bounce_folio)
+{
+ WARN_ON_ONCE(1);
+ return ERR_PTR(-EINVAL);
+}
+
static inline void fscrypt_free_bounce_page(struct page *bounce_page)
{
}
--
2.35.1
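
For context, a minimal sketch (not part of the patch) of how a writeback
completion path might use the new helpers, modeled on the page-based
pattern in ext4_finish_bio(); the locals here are illustrative:

	struct folio *io_folio = folio;

	/* A bounce folio has no ->mapping; the pagecache folio it
	 * shadows is reachable through ->private. */
	if (fscrypt_is_bounce_folio(io_folio))
		folio = fscrypt_pagecache_folio(io_folio);

	if (io_folio != folio)
		fscrypt_free_bounce_page(&io_folio->page);
	folio_end_writeback(folio);
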
On Thu, Jan 26, 2023 at 08:23:46PM +0000, Matthew Wilcox (Oracle) wrote:
> [...]
>
> +static inline bool fscrypt_is_bounce_folio(struct folio *folio)
> +{
> + return folio->mapping == NULL;
> +}
> +
> +static inline struct folio *fscrypt_pagecache_folio(struct folio *bounce_folio)
> +{
> + return bounce_folio->private;
> +}

ext4_bio_write_folio() is still doing:

	bounce_page = fscrypt_encrypt_pagecache_blocks(&folio->page, ...);

Should it be creating a "bounce folio" instead, or is that not in the scope of
this patchset?
- Eric

On Thu, Jan 26, 2023 at 07:02:14PM -0800, Eric Biggers wrote:
> [...]
>
> ext4_bio_write_folio() is still doing:
>
> bounce_page = fscrypt_encrypt_pagecache_blocks(&folio->page, ...);
>
> Should it be creating a "bounce folio" instead, or is that not in the scope of
> this patchset?

It's out of scope for _this_ patchset. I think it's a patchset that
could come either before or after, and is needed to support large folios
with ext4. The biggest problem with doing that conversion is that
bounce pages are allocated from a mempool which obviously only allocates
order-0 folios. I don't know what to do about that. Have a mempool
for each order of folio that the filesystem supports? Try to allocate
folios without a mempool and then split the folio if allocation fails?
Have a mempool containing PMD-order pages and split them ourselves if
we need to allocate from the mempool?

Nothing's really standing out to me as the perfect answer. There are
probably other alternatives.

On Fri, Jan 27, 2023 at 04:13:37PM +0000, Matthew Wilcox wrote:
> [...] The biggest problem with doing that conversion is that
> bounce pages are allocated from a mempool which obviously only allocates
> order-0 folios. I don't know what to do about that. [...]

Would it be possible to keep using bounce *pages* all the time, even when the
pagecache contains large folios?
- Eric

On Fri, Jan 27, 2023 at 08:21:46AM -0800, Eric Biggers wrote:
> [...]
>
> Would it be possible to keep using bounce *pages* all the time, even when the
> pagecache contains large folios?

I _think_ so. Probably the best solution is to attempt to allocate an
order-N folio (with GFP_NOWAIT?) and then fall back to allocating 2^N
pages from the mempool. It'll require some surgery to ext4_finish_bio()
as well as ext4_bio_write_folio(), fscrypt_encrypt_pagecache_blocks()
and fscrypt_free_bounce_page(), but I think it's doable. I'll try
to whip something up.
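
A rough sketch of that fallback (hypothetical code; fscrypt_bounce_page_pool
is fscrypt's existing internal mempool of order-0 pages, and the chunk
bookkeeping is elided):

	struct folio *bounce_folio;
	unsigned int i;

	/* Opportunistic: don't sleep or retry for a large bounce folio. */
	bounce_folio = folio_alloc(GFP_NOWAIT | __GFP_NOWARN, order);
	if (!bounce_folio) {
		/* Fall back to 2^order order-0 pages from the mempool,
		 * encrypting the pagecache folio in page-sized chunks. */
		for (i = 0; i < (1U << order); i++) {
			struct page *bounce_page =
				mempool_alloc(fscrypt_bounce_page_pool,
					      GFP_NOFS);
			/* ... encrypt chunk i into bounce_page and add it
			 * to the writeback bio ... */
		}
	}
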
"Matthew Wilcox (Oracle)" <[email protected]> writes:
> [...]

Straightforward conversion. IIUC, even after this patchset we haven't
killed fscrypt_is_bounce_page() and fscrypt_pagecache_page(), because
there are other users of these in f2fs and fscrypt.

Looks good to me. Please feel free to add -
Reviewed-by: Ritesh Harjani (IBM) <[email protected]>
-ritesh

On Fri, Jan 27, 2023 at 04:13:37PM +0000, Matthew Wilcox wrote:
> [...]

Hmm.... should we have some kind of check in case a large folio is
passed to these fscrypt functions? (e.g., some kind of BUG_ON, or
WARN_ON?)

Or do we just rely on people remembering that when we start trying to
support large folios for ext4, it will probably have to be the easy
cases first (e.g., no fscrypt, no fsverity, block size == page size)?
- Ted

On Tue, Mar 14, 2023 at 06:05:51PM -0400, Theodore Ts'o wrote:
> [...]
>
> Hmm.... should we have some kind of check in case a large folio is
> passed to these fscrypt functions? (e.g., some kind of BUG_ON, or
> WARN_ON?)
>
> Or do we just rely on people remembering that when we start trying to
> support large folios for ext4, it will probably have to be the easy
> cases first (e.g., no fscrypt, no fsverity, block size == page size)?
>

I think large folio support for fscrypt and fsverity is not that far away. I
already made the following changes in 6.3:

	51e4e3153ebc ("fscrypt: support decrypting data from large folios")
	5d0f0e57ed90 ("fsverity: support verifying data from large folios")

AFAICT, absent actual testing of course, the only major thing that's still
needed is that fscrypt_encrypt_pagecache_blocks() needs to support large folios.
I'm not sure how it should work, exactly. Matthew gave a couple options.
Another option is to just continue to use bounce *pages*, and keep track of all
the bounce pages for each folio.

We could certainly make fscrypt_encrypt_pagecache_blocks() WARN when given a
large folio for now, if we aren't going to update it properly anytime soon.
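
A minimal sketch of such a guard at the top of
fscrypt_encrypt_pagecache_blocks() (the error value is just a placeholder):

	if (WARN_ON_ONCE(folio_test_large(page_folio(page))))
		return ERR_PTR(-EOPNOTSUPP);
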
By the way, fscrypt_encrypt_pagecache_blocks() is only used by the fs-layer file
contents encryption, not inline encryption. Even without changing it, we could
support large folios on encrypted files when inline encryption is being used.

(A smaller thing, which I think I missed in "fsverity: support verifying data
from large folios", is that fsverity_verify_bio() still uses
bio_first_page_all(bio)->mapping->host to get the bio's inode. Perhaps there
needs to be a page_folio() in there for the ->mapping to be valid?)
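
A one-line sketch of that fix, using the existing bio_first_folio_all()
helper (which wraps page_folio(bio_first_page_all())):

	struct inode *inode = bio_first_folio_all(bio)->mapping->host;
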
- Eric

On Tue, Mar 14, 2023 at 04:12:39PM -0700, Eric Biggers wrote:
>
> I think large folio support for fscrypt and fsverity is not that far away. I
> already made the following changes in 6.3:
>
> 51e4e3153ebc ("fscrypt: support decrypting data from large folios")
> 5d0f0e57ed90 ("fsverity: support verifying data from large folios")

Cool! I was thinking that fscrypt and fsverity might end up lagging
as far as the large folio support was concerned, but I'm glad that
this might not be the case.

> AFAICT, absent actual testing of course, the only major thing that's still
> needed is that fscrypt_encrypt_pagecache_blocks() needs to support large folios.
> I'm not sure how it should work, exactly. Matthew gave a couple options.
> Another option is to just continue to use bounce *pages*, and keep track of all
> the bounce pages for each folio.

We don't have to solve that right away; it is possible to support
reads of large folios, but not writes. If someone reads in a 128k
folio, and then modifies a 4k page in the middle of the folio, we
could just split up the 128k folio and then write out only the single
4k page that was modified. (It might very well be that in that case,
we *want* to break up the folio anyway, to avoid the write
amplification problem.)

In any case, I suspect the way we would first support large folios
for ext4 is to use iomap for buffered I/O --- but only for file
systems where page size == block size, with no fscrypt, no fsverity,
no data=journal, and only for buffered reads. And for buffered
writes, we'll break apart the folio and then use the existing
ext4_writepages() code path.

We can then gradually start relying on iomap and using large folios
for additional scenarios, both on the read side and, eventually, on
the write side. I suspect we'll want a way of enabling and disabling
large folios in a fine-grained manner, as well as potentially
proactively breaking up large folios in page_mkwrite (so that a
random 4k page modification doesn't get amplified into the entire
contents of a large folio needing to be written back).

- Ted