Andrew, I'm very sorry, this is so late: I thought we had already
tested 5.12-rc's mm/filemap changes earlier, but running xfstests
on 32-bit huge tmpfs last weekend revealed a hang (fixed in 1/2);
then, looking closer at the test results, I found that SEEK_HOLE/SEEK_DATA
discrepancies I'd previously assumed benign (surprises there being
unsurprising when huge pages get used) were in fact indicating
regressions in the new seek_hole_data implementation (fixed in 2/2).
This was complicated by xfstests' seek_sanity_test needing some
adjustments to work correctly on huge tmpfs; those are not yet
submitted because I've more to do there. A seek_sanity combo patch
is attached, to allow anyone
here to verify the fixes on generic 308 285 286 436 445 448 490 539.
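(For anyone unfamiliar with xfstests: assuming a stock setup, those
should be runnable with "./check generic/308 generic/285 generic/286
generic/436 generic/445 generic/448 generic/490 generic/539".)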
Up to you and Matthew whether these are rushed last minute into
5.12, or held over until the merge window, adding "Cc: stable"s.
1/2 mm/filemap: fix find_lock_entries hang on 32-bit THP
2/2 mm/filemap: fix mapping_seek_hole_data on THP & 32-bit
mm/filemap.c | 33 ++++++++++++++++++++-------------
1 file changed, 20 insertions(+), 13 deletions(-)
Thanks,
Hugh
No problem on 64-bit, or without huge pages, but xfstests generic/308
hung uninterruptibly on 32-bit huge tmpfs. Since 4.13's 0cc3b0ec23ce
("Clarify (and fix) MAX_LFS_FILESIZE macros"), MAX_LFS_FILESIZE is
only a PAGE_SIZE away from wrapping 32-bit xa_index to 0, so the new
find_lock_entries() has to be extra careful when handling a THP.
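For illustration only, a minimal userspace sketch of the wrap (not
kernel code: uint32_t stands in for 32-bit unsigned long, assuming
4K pages and 2M THPs):

	#include <assert.h>
	#include <stdint.h>

	int main(void)
	{
		uint32_t nr_pages = 512;		/* thp_nr_pages() of a 2M THP */
		uint32_t index = UINT32_MAX - 511;	/* final THP below MAX_LFS_FILESIZE */
		uint32_t next = index + nr_pages;	/* wraps to 0: loop restarts, hangs */

		/* the check added below: a wrapped index is less than nr_pages */
		assert(next < nr_pages);
		return 0;
	}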
Fixes: 5c211ba29deb ("mm: add and use find_lock_entries")
Signed-off-by: Hugh Dickins <[email protected]>
---
mm/filemap.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
--- 5.12-rc8/mm/filemap.c	2021-02-26 19:42:39.812156085 -0800
+++ linux/mm/filemap.c	2021-04-20 23:20:20.509464440 -0700
@@ -1969,8 +1969,14 @@ unlock:
 put:
 		put_page(page);
 next:
-		if (!xa_is_value(page) && PageTransHuge(page))
-			xas_set(&xas, page->index + thp_nr_pages(page));
+		if (!xa_is_value(page) && PageTransHuge(page)) {
+			unsigned int nr_pages = thp_nr_pages(page);
+
+			/* Final THP may cross MAX_LFS_FILESIZE on 32-bit */
+			xas_set(&xas, page->index + nr_pages);
+			if (xas.xa_index < nr_pages)
+				break;
+		}
 	}
 	rcu_read_unlock();
 
No problem on 64-bit without huge pages, but xfstests generic/285
and other SEEK_HOLE/SEEK_DATA tests have regressed on huge tmpfs,
and on 32-bit architectures, with the new mapping_seek_hole_data().
Several different bugs turned out to need fixing.
u64 casts added to stop unfortunate sign-extension when shifting
(and let's use shifts throughout, rather than mixed with * and /).
Use round_up() when advancing pos, to stop assuming that pos was
already THP-aligned when advancing it by THP-size. (But I believe
this use of round_up() assumes that any THP must be THP-aligned:
true while tmpfs enforces that alignment, and is the only fs with
FS_THP_SUPPORT; but might need to be generalized in the future?
If I try to generalize it right now, I'm sure to get it wrong!)
Use xas_set() when iterating away from a THP, so that xa_index stays
in synch with start, instead of drifting away to return bogus offset.
Check start against end to avoid wrapping 32-bit xa_index to 0 (and
to handle these additional cases, seek_data or not, it's easier to
break the loop than goto: so rearrange exit from the function).
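To illustrate the round_up() change with numbers (a sketch, assuming
a THP-aligned 2M THP and the kernel's power-of-two round_up()):

	#include <assert.h>

	#define round_up(x, y)	((((x) - 1) | ((y) - 1)) + 1)	/* power-of-2 y */

	int main(void)
	{
		long long pos = 0x201000;	/* seek began inside the THP at [2M, 4M) */
		long long seek_size = 0x200000;	/* seek_page_size(): 2M */

		/* advance to the THP's end boundary, not just pos + 2M */
		assert(round_up(pos + 1, seek_size) == 0x400000);
		/* the old "pos += seek_page_size(...)" yielded 0x401000 here */
		return 0;
	}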
Fixes: 41139aa4c3a3 ("mm/filemap: add mapping_seek_hole_data")
Signed-off-by: Hugh Dickins <[email protected]>
---
mm/filemap.c | 23 ++++++++++++-----------
1 file changed, 12 insertions(+), 11 deletions(-)
--- 5.12-rc8/mm/filemap.c	2021-02-26 19:42:39.812156085 -0800
+++ linux/mm/filemap.c	2021-04-20 23:20:20.509464440 -0700
@@ -2671,8 +2671,8 @@ unsigned int seek_page_size(struct xa_st
 loff_t mapping_seek_hole_data(struct address_space *mapping, loff_t start,
 		loff_t end, int whence)
 {
-	XA_STATE(xas, &mapping->i_pages, start >> PAGE_SHIFT);
-	pgoff_t max = (end - 1) / PAGE_SIZE;
+	XA_STATE(xas, &mapping->i_pages, (u64)start >> PAGE_SHIFT);
+	pgoff_t max = (u64)(end - 1) >> PAGE_SHIFT;
 	bool seek_data = (whence == SEEK_DATA);
 	struct page *page;
 
@@ -2681,7 +2681,8 @@ loff_t mapping_seek_hole_data(struct add
 
 	rcu_read_lock();
 	while ((page = find_get_entry(&xas, max, XA_PRESENT))) {
-		loff_t pos = xas.xa_index * PAGE_SIZE;
+		loff_t pos = (u64)xas.xa_index << PAGE_SHIFT;
+		unsigned int seek_size;
 
 		if (start < pos) {
 			if (!seek_data)
@@ -2689,25 +2690,25 @@ loff_t mapping_seek_hole_data(struct add
 			start = pos;
 		}
 
-		pos += seek_page_size(&xas, page);
+		seek_size = seek_page_size(&xas, page);
+		pos = round_up((u64)pos + 1, seek_size);
 		start = page_seek_hole_data(&xas, mapping, page, start, pos,
 				seek_data);
 		if (start < pos)
 			goto unlock;
+		if (start >= end)
+			break;
+		if (seek_size > PAGE_SIZE)
+			xas_set(&xas, (u64)pos >> PAGE_SHIFT);
 		if (!xa_is_value(page))
 			put_page(page);
 	}
-	rcu_read_unlock();
-
 	if (seek_data)
-		return -ENXIO;
-	goto out;
-
+		start = -ENXIO;
 unlock:
 	rcu_read_unlock();
-	if (!xa_is_value(page))
+	if (page && !xa_is_value(page))
 		put_page(page);
-out:
 	if (start > end)
 		return end;
 	return start;
On Wed, Apr 21, 2021 at 05:39:14PM -0700, Hugh Dickins wrote:
> No problem on 64-bit without huge pages, but xfstests generic/285
> and other SEEK_HOLE/SEEK_DATA tests have regressed on huge tmpfs,
> and on 32-bit architectures, with the new mapping_seek_hole_data().
> Several different bugs turned out to need fixing.
>
> u64 casts added to stop unfortunate sign-extension when shifting
> (and let's use shifts throughout, rather than mixed with * and /).
That confuses me. loff_t is a signed long long, but it can't be negative
(... right?) So how does casting it to a u64 before dividing by
PAGE_SIZE help?
> Use round_up() when advancing pos, to stop assuming that pos was
> already THP-aligned when advancing it by THP-size. (But I believe
> this use of round_up() assumes that any THP must be THP-aligned:
> true while tmpfs enforces that alignment, and is the only fs with
> FS_THP_SUPPORT; but might need to be generalized in the future?
> If I try to generalize it right now, I'm sure to get it wrong!)
No generalisation needed in future. Folios must be naturally aligned
within a file.
> @@ -2681,7 +2681,8 @@ loff_t mapping_seek_hole_data(struct add
>  
>  	rcu_read_lock();
>  	while ((page = find_get_entry(&xas, max, XA_PRESENT))) {
> -		loff_t pos = xas.xa_index * PAGE_SIZE;
> +		loff_t pos = (u64)xas.xa_index << PAGE_SHIFT;
> +		unsigned int seek_size;
I've been preferring size_t for 'number of bytes in a page' because
I'm sure somebody is going to want a page larger than 2GB in the next
ten years.
On Wed, Apr 21, 2021 at 05:37:33PM -0700, Hugh Dickins wrote:
> -		if (!xa_is_value(page) && PageTransHuge(page))
> -			xas_set(&xas, page->index + thp_nr_pages(page));
> +		if (!xa_is_value(page) && PageTransHuge(page)) {
> +			unsigned int nr_pages = thp_nr_pages(page);
> +
> +			/* Final THP may cross MAX_LFS_FILESIZE on 32-bit */
> +			xas_set(&xas, page->index + nr_pages);
> +			if (xas.xa_index < nr_pages)
> +				break;
> +		}
Aargh. We really need to get the multi-index support in; this works
perfectly when the xas_set() hack isn't needed any more.
On Thu, 22 Apr 2021, Matthew Wilcox wrote:
> On Wed, Apr 21, 2021 at 05:39:14PM -0700, Hugh Dickins wrote:
> > No problem on 64-bit without huge pages, but xfstests generic/285
> > and other SEEK_HOLE/SEEK_DATA tests have regressed on huge tmpfs,
> > and on 32-bit architectures, with the new mapping_seek_hole_data().
> > Several different bugs turned out to need fixing.
> >
> > u64 casts added to stop unfortunate sign-extension when shifting
> > (and let's use shifts throughout, rather than mixed with * and /).
>
> That confuses me. loff_t is a signed long long, but it can't be negative
> (... right?) So how does casting it to a u64 before dividing by
> PAGE_SIZE help?
That is a good question. Sprinkling u64s was the first thing I tried,
and I'd swear that it made a good difference at the time; but perhaps
that was all down to just the one on xas.xa_index << PAGE_SHIFT. Or
is it possible that one of the other bugs led to a negative loff_t,
and the casts got better behaviour out of that? Doubtful.
What I certainly recall from yesterday was leaving out one (which?)
of the casts as unnecessary, and wasting quite a bit of time until I
put it back in. Did I really choose precisely the only one necessary?
Taking most of them out did give me good quick runs just now: I'll
go over them again and try full runs on all machines. You'll think me
crazy, but yesterday's experience leaves me reluctant to change without
full testing - but agree it's not good to leave ignorant magic in.
>
> > Use round_up() when advancing pos, to stop assuming that pos was
> > already THP-aligned when advancing it by THP-size. (But I believe
> > this use of round_up() assumes that any THP must be THP-aligned:
> > true while tmpfs enforces that alignment, and is the only fs with
> > FS_THP_SUPPORT; but might need to be generalized in the future?
> > If I try to generalize it right now, I'm sure to get it wrong!)
>
> No generalisation needed in future. Folios must be naturally aligned
> within a file.
Thanks for the info: I did search around in your various patch series
from last October, and failed to find a decider there: I imagined that
when you started on compound pages for more efficient I/O, there would
be no necessity to align them (whereas huge pmd mappings of shared
files make the alignment important). Anyway, assuming natural alignment
is easiest - but it's remarkable how few places need to rely on it.
>
> > @@ -2681,7 +2681,8 @@ loff_t mapping_seek_hole_data(struct add
> >  
> >  	rcu_read_lock();
> >  	while ((page = find_get_entry(&xas, max, XA_PRESENT))) {
> > -		loff_t pos = xas.xa_index * PAGE_SIZE;
> > +		loff_t pos = (u64)xas.xa_index << PAGE_SHIFT;
> > +		unsigned int seek_size;
>
> I've been preferring size_t for 'number of bytes in a page' because
> I'm sure somebody is going to want a page larger than 2GB in the next
> ten years.
Ah, there I was simply following what the author of seek_page_size()
had chosen, and I think that's the right thing to do in today's tree:
let's see who that author was... hmm, someone called Matthew Wilcox :)
On Wed, 21 Apr 2021, Hugh Dickins wrote:
> On Thu, 22 Apr 2021, Matthew Wilcox wrote:
> > On Wed, Apr 21, 2021 at 05:39:14PM -0700, Hugh Dickins wrote:
> > > No problem on 64-bit without huge pages, but xfstests generic/285
> > > and other SEEK_HOLE/SEEK_DATA tests have regressed on huge tmpfs,
> > > and on 32-bit architectures, with the new mapping_seek_hole_data().
> > > Several different bugs turned out to need fixing.
> > >
> > > u64 casts added to stop unfortunate sign-extension when shifting
> > > (and let's use shifts throughout, rather than mixed with * and /).
> >
> > That confuses me. loff_t is a signed long long, but it can't be negative
> > (... right?) So how does casting it to a u64 before dividing by
> > PAGE_SIZE help?
>
> That is a good question. Sprinkling u64s was the first thing I tried,
> and I'd swear that it made a good difference at the time; but perhaps
> that was all down to just the one on xas.xa_index << PAGE_SHIFT. Or
> is it possible that one of the other bugs led to a negative loff_t,
> and the casts got better behaviour out of that? Doubtful.
>
> What I certainly recall from yesterday was leaving out one (which?)
> of the casts as unnecessary, and wasting quite a bit of time until I
> put it back in. Did I really choose precisely the only one necessary?
>
> Taking most of them out did give me good quick runs just now: I'll
> go over them again and try full runs on all machines. You'll think me
> crazy, but yesterday's experience leaves me reluctant to change without
> full testing - but agree it's not good to leave ignorant magic in.
And you'll be unsurprised to hear that the test runs went fine,
with all but one of those u64 casts removed. And I did locate the
version of filemap.c where I'd left out one "unnecessary" cast:
I had indeed chosen to remove the only one that's necessary.
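To spell out why that one matters (a sketch: unsigned int stands in
for 32-bit unsigned long, assuming 4K pages, so PAGE_SHIFT is 12):

	#include <assert.h>

	int main(void)
	{
		unsigned int xa_index = 0x100000;	/* page index at the 4GB mark */
		long long bad = xa_index << 12;		/* shifted in 32 bits: wraps to 0 */
		long long good = (unsigned long long)xa_index << 12;	/* widened first */

		assert(bad == 0);
		assert(good == 0x100000000LL);		/* the correct loff_t position */
		return 0;
	}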
v2 coming up now, thanks,
Hugh
No problem on 64-bit without huge pages, but xfstests generic/285
and other SEEK_HOLE/SEEK_DATA tests have regressed on huge tmpfs,
and on 32-bit architectures, with the new mapping_seek_hole_data().
Several different bugs turned out to need fixing.
u64 cast to stop losing bits when converting unsigned long to loff_t
(and let's use shifts throughout, rather than mixed with * and /).
Use round_up() when advancing pos, to stop assuming that pos was
already THP-aligned when advancing it by THP-size. (This use of
round_up() assumes that any THP has THP-aligned index: true at present
and true going forward, but could be recoded to avoid the assumption.)
Use xas_set() when iterating away from a THP, so that xa_index stays
in synch with start, instead of drifting away to return bogus offset.
Check start against end to avoid wrapping 32-bit xa_index to 0 (and
to handle these additional cases, seek_data or not, it's easier to
break the loop than goto: so rearrange exit from the function).
Fixes: 41139aa4c3a3 ("mm/filemap: add mapping_seek_hole_data")
Signed-off-by: Hugh Dickins <[email protected]>
---
v2: Removed all but one of v1's u64 casts, as suggested by Matthew.
Updated commit message on u64 cast and THP alignment, per Matthew.
Andrew, I'd have just sent a -fix.patch to remove the unnecessary u64s,
but need to reword the commit message: so please replace yesterday's
mm-filemap-fix-mapping_seek_hole_data-on-thp-32-bit.patch
by this one - thanks.
mm/filemap.c | 21 +++++++++++----------
1 file changed, 11 insertions(+), 10 deletions(-)
--- 5.12-rc8/mm/filemap.c	2021-02-26 19:42:39.812156085 -0800
+++ linux/mm/filemap.c	2021-04-21 22:58:03.699655576 -0700
@@ -2672,7 +2672,7 @@ loff_t mapping_seek_hole_data(struct add
 		loff_t end, int whence)
 {
 	XA_STATE(xas, &mapping->i_pages, start >> PAGE_SHIFT);
-	pgoff_t max = (end - 1) / PAGE_SIZE;
+	pgoff_t max = (end - 1) >> PAGE_SHIFT;
 	bool seek_data = (whence == SEEK_DATA);
 	struct page *page;
 
@@ -2681,7 +2681,8 @@ loff_t mapping_seek_hole_data(struct add
 
 	rcu_read_lock();
 	while ((page = find_get_entry(&xas, max, XA_PRESENT))) {
-		loff_t pos = xas.xa_index * PAGE_SIZE;
+		loff_t pos = (u64)xas.xa_index << PAGE_SHIFT;
+		unsigned int seek_size;
 
 		if (start < pos) {
 			if (!seek_data)
@@ -2689,25 +2690,25 @@ loff_t mapping_seek_hole_data(struct add
 			start = pos;
 		}
 
-		pos += seek_page_size(&xas, page);
+		seek_size = seek_page_size(&xas, page);
+		pos = round_up(pos + 1, seek_size);
 		start = page_seek_hole_data(&xas, mapping, page, start, pos,
 				seek_data);
 		if (start < pos)
 			goto unlock;
+		if (start >= end)
+			break;
+		if (seek_size > PAGE_SIZE)
+			xas_set(&xas, pos >> PAGE_SHIFT);
 		if (!xa_is_value(page))
 			put_page(page);
 	}
-	rcu_read_unlock();
-
 	if (seek_data)
-		return -ENXIO;
-	goto out;
-
+		start = -ENXIO;
 unlock:
 	rcu_read_unlock();
-	if (!xa_is_value(page))
+	if (page && !xa_is_value(page))
 		put_page(page);
-out:
 	if (start > end)
 		return end;
 	return start;
On Thu, 22 Apr 2021 13:48:57 -0700 (PDT) Hugh Dickins <[email protected]> wrote:
> Andrew, I'd have just sent a -fix.patch to remove the unnecessary u64s,
> but need to reword the commit message: so please replace yesterday's
> mm-filemap-fix-mapping_seek_hole_data-on-thp-32-bit.patch
> by this one - thanks.
Actually, I routinely update the base patch's changelog when queueing a -fix.
On Thu, 22 Apr 2021, Andrew Morton wrote:
> On Thu, 22 Apr 2021 13:48:57 -0700 (PDT) Hugh Dickins <[email protected]> wrote:
>
> > Andrew, I'd have just sent a -fix.patch to remove the unnecessary u64s,
> > but need to reword the commit message: so please replace yesterday's
> > mm-filemap-fix-mapping_seek_hole_data-on-thp-32-bit.patch
> > by this one - thanks.
>
> Actually, I routinely update the base patch's changelog when queueing a -fix.
And thank you for that, but if there's time, I think we would still
prefer the final commit message to include corrections where Matthew
enlightened me (that "sign-extension" claim came from my confusion):
-u64 casts added to stop unfortunate sign-extension when shifting (and
-let's use shifts throughout, rather than mixed with * and /).
-
-Use round_up() when advancing pos, to stop assuming that pos was already
-THP-aligned when advancing it by THP-size. (But I believe this use of
-round_up() assumes that any THP must be THP-aligned: true while tmpfs
-enforces that alignment, and is the only fs with FS_THP_SUPPORT; but might
-need to be generalized in the future? If I try to generalize it right
-now, I'm sure to get it wrong!)
+u64 cast to stop losing bits when converting unsigned long to loff_t
+(and let's use shifts throughout, rather than mixed with * and /).
+
+Use round_up() when advancing pos, to stop assuming that pos was
+already THP-aligned when advancing it by THP-size. (This use of
+round_up() assumes that any THP has THP-aligned index: true at present
+and true going forward, but could be recoded to avoid the assumption.)
Thanks,
Hugh
On Fri, 23 Apr 2021 10:22:51 -0700 (PDT) Hugh Dickins <[email protected]> wrote:
> On Thu, 22 Apr 2021, Andrew Morton wrote:
> > On Thu, 22 Apr 2021 13:48:57 -0700 (PDT) Hugh Dickins <[email protected]> wrote:
> >
> > > Andrew, I'd have just sent a -fix.patch to remove the unnecessary u64s,
> > > but need to reword the commit message: so please replace yesterday's
> > > mm-filemap-fix-mapping_seek_hole_data-on-thp-32-bit.patch
> > > by this one - thanks.
> >
> > Actually, I routinely update the base patch's changelog when queueing a -fix.
>
> And thank you for that, but if there's time, I think we would still
> prefer the final commit message to include corrections where Matthew
> enlightened me (that "sign-extension" claim came from my confusion):
That's my point. When I merge a -v2 as a -fix, I replace the v1
patch's changelog with v2's changelog so everything works out after
folding.
On Fri, 23 Apr 2021, Andrew Morton wrote:
> On Fri, 23 Apr 2021 10:22:51 -0700 (PDT) Hugh Dickins <[email protected]> wrote:
> > On Thu, 22 Apr 2021, Andrew Morton wrote:
> > > On Thu, 22 Apr 2021 13:48:57 -0700 (PDT) Hugh Dickins <[email protected]> wrote:
> > >
> > > > Andrew, I'd have just sent a -fix.patch to remove the unnecessary u64s,
> > > > but need to reword the commit message: so please replace yesterday's
> > > > mm-filemap-fix-mapping_seek_hole_data-on-thp-32-bit.patch
> > > > by this one - thanks.
> > >
> > > Actually, I routinely update the base patch's changelog when queueing a -fix.
> >
> > And thank you for that, but if there's time, I think we would still
> > prefer the final commit message to include corrections where Matthew
> > enlightened me (that "sign-extension" claim came from my confusion):
>
> That's my point. When I merge a -v2 as a -fix, I replace the v1
> patch's changelog with v2's changelog so everything works out after
> folding.
Oh, great, thanks: that was not clear to me, I feared you meant adding
"v2: remove unneeded u64 casts" to v1 when merging.
Hugh