2019-09-17 18:13:03

by Darrick J. Wong

[permalink] [raw]
Subject: [GIT PULL] iomap: new code for 5.4

Hi Linus,

Please pull this series containing all the new iomap code for 5.4. The
biggest change here is porting some of XFS' writeback code to iomap so
that we can share it with other filesystems; and making some adjustments
to the iomap directio code in preparation for other filesystems starting
to use it. In 5.5 we hope to finish converting XFS to iomap and to
start converting a few other filesystems.

The branch merges cleanly against this morning's HEAD and survived a
couple of weeks' worth of xfstests. The merge was completely
straightforward, so please let me know if you run into anything weird.

--D

The following changes since commit 609488bc979f99f805f34e9a32c1e3b71179d10b:

Linux 5.3-rc2 (2019-07-28 12:47:02 -0700)

are available in the Git repository at:

git://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git tags/iomap-5.4-merge-4

for you to fetch changes up to 68494b8e248fe8a7b6e9f88edd9a87661760ddb9:

iomap: move the iomap_dio_rw ->end_io callback into a structure (2019-09-03 08:28:22 -0700)

----------------------------------------------------------------
New code for 5.4:
- Port the XFS writeback code to iomap with the eventual goal of
converting XFS to use it.
- Clean up a few odds and ends in xfs writeback and convert the xfs
ioend code to use list_pop and friends.
- Report both io errors and short io results to the directio endio
handler.
- Allow directio callers to pass an ops structure to iomap_dio_rw.

----------------------------------------------------------------
Andreas Gruenbacher (1):
iomap: Fix trivial typo

Christoph Hellwig (9):
list.h: add list_pop and list_pop_entry helpers
iomap: copy the xfs writeback code to iomap.c
iomap: add tracing for the address space operations
iomap: warn on inline maps in iomap_writepage_map
xfs: set IOMAP_F_NEW more carefully
iomap: zero newly allocated mapped blocks
xfs: initialize iomap->flags in xfs_bmbt_to_iomap
xfs: refactor the ioend merging code
iomap: move the iomap_dio_rw ->end_io callback into a structure

Matthew Bobrowski (1):
iomap: split size and error for iomap_dio_rw ->end_io

Randy Dunlap (1):
tracing: fix iomap.h build warnings

fs/iomap/buffered-io.c | 575 ++++++++++++++++++++++++++++++++++++++++++-
fs/iomap/direct-io.c | 24 +-
fs/xfs/xfs_aops.c | 70 +++---
fs/xfs/xfs_file.c | 14 +-
fs/xfs/xfs_iomap.c | 35 ++-
fs/xfs/xfs_iomap.h | 2 +-
fs/xfs/xfs_pnfs.c | 2 +-
include/linux/iomap.h | 53 +++-
include/linux/list.h | 33 +++
include/trace/events/iomap.h | 87 +++++++
10 files changed, 824 insertions(+), 71 deletions(-)
create mode 100644 include/trace/events/iomap.h


2019-09-19 01:38:16

by Linus Torvalds

[permalink] [raw]
Subject: Re: [GIT PULL] iomap: new code for 5.4

On Tue, Sep 17, 2019 at 8:21 AM Darrick J. Wong <[email protected]> wrote:
>
> Please pull this series containing all the new iomap code for 5.4.

So looking at the non-iomap parts of it, I react to the new "list_pop() code.

In particular, this:

struct list_head *pos = READ_ONCE(list->next);

is crazy to begin with..

It seems to have come from "list_empty()", but the difference is that
it actually makes sense to check for emptiness of a list outside
whatever lock that protects the list. It can be one of those very
useful optimizations where you don't even bother taking the lock if
you can optimistically check that the list is empty.

But the same is _not_ true of an operation like "list_pop()". By
definition, the list you pop something off has to be stable, so the
READ_ONCE() makes no sense here.

Anyway, if that was the only issue, I wouldn't care. But looking
closer, the whole thing is just completely wrong.

All the users seem to do some version of this:

struct list_head tmp;

list_replace_init(&ioend->io_list, &tmp);
iomap_finish_ioend(ioend, error);
while ((ioend = list_pop_entry(&tmp, struct iomap_ioend, io_list)))
iomap_finish_ioend(ioend, error);

which is completely wrong and pointless.

Why would anybody use that odd "list_pop()" thing in a loop, when what
it really seems to just want is that bog-standard
"list_for_each_entry_safe()"

struct list_head tmp;
struct iomap_ioend *next;

list_replace_init(&ioend->io_list, &tmp);
iomap_finish_ioend(ioend, error);
list_for_each_entry_safe(struct iomap_ioend, next, &tmp, io_list)
iomap_finish_ioend(ioend, error);

which is not only the common pattern, it's more efficient and doesn't
pointlessly re-write the list for each entry, it just walks it (and
the "_safe()" part is because it looks up the next entry early, so
that the entry that it's walking can be deleted).

So I pulled it. But then after looking at it, I unpulled it again
because I don't want to see this kind of insanity in one of THE MOST
CORE header files we have in the whole kernel.

If xfs and iomap want to think they are "popping" a list, they can do
so. In the privacy of your own home, you can do stupid and pointless
things.

But no, we don't pollute core kernel code with those stupid and
pointless things.

Linus

2019-09-19 05:41:38

by Darrick J. Wong

[permalink] [raw]
Subject: Re: [GIT PULL] iomap: new code for 5.4

On Wed, Sep 18, 2019 at 06:31:29PM -0700, Linus Torvalds wrote:
> On Tue, Sep 17, 2019 at 8:21 AM Darrick J. Wong <[email protected]> wrote:
> >
> > Please pull this series containing all the new iomap code for 5.4.
>
> So looking at the non-iomap parts of it, I react to the new "list_pop() code.
>
> In particular, this:
>
> struct list_head *pos = READ_ONCE(list->next);
>
> is crazy to begin with..
>
> It seems to have come from "list_empty()", but the difference is that
> it actually makes sense to check for emptiness of a list outside
> whatever lock that protects the list. It can be one of those very
> useful optimizations where you don't even bother taking the lock if
> you can optimistically check that the list is empty.
>
> But the same is _not_ true of an operation like "list_pop()". By
> definition, the list you pop something off has to be stable, so the
> READ_ONCE() makes no sense here.
>
> Anyway, if that was the only issue, I wouldn't care. But looking
> closer, the whole thing is just completely wrong.
>
> All the users seem to do some version of this:
>
> struct list_head tmp;
>
> list_replace_init(&ioend->io_list, &tmp);
> iomap_finish_ioend(ioend, error);
> while ((ioend = list_pop_entry(&tmp, struct iomap_ioend, io_list)))
> iomap_finish_ioend(ioend, error);
>
> which is completely wrong and pointless.
>
> Why would anybody use that odd "list_pop()" thing in a loop, when what
> it really seems to just want is that bog-standard
> "list_for_each_entry_safe()"
>
> struct list_head tmp;
> struct iomap_ioend *next;
>
> list_replace_init(&ioend->io_list, &tmp);
> iomap_finish_ioend(ioend, error);
> list_for_each_entry_safe(struct iomap_ioend, next, &tmp, io_list)
> iomap_finish_ioend(ioend, error);
>
> which is not only the common pattern, it's more efficient and doesn't
> pointlessly re-write the list for each entry, it just walks it (and
> the "_safe()" part is because it looks up the next entry early, so
> that the entry that it's walking can be deleted).
>
> So I pulled it. But then after looking at it, I unpulled it again
> because I don't want to see this kind of insanity in one of THE MOST
> CORE header files we have in the whole kernel.
>
> If xfs and iomap want to think they are "popping" a list, they can do
> so. In the privacy of your own home, you can do stupid and pointless
> things.
>
> But no, we don't pollute core kernel code with those stupid and
> pointless things.

Ok, thanks for the feedback. TBH I'd wondered if list_pop was really
necessary, but as it didn't seem to harm anything I let it go.

Anyway, how should I proceed now? Christoph? :D

I propose the following (assuming Linus isn't cranky enough to refuse
the entire iomap patchpile forever):

Delete patch 1 and 9 from the series, and amend patch 2 as such:

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 051b8ec326ba..558d09bc5024 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1156,10 +1156,11 @@ void
iomap_finish_ioends(struct iomap_ioend *ioend, int error)
{
struct list_head tmp;
+ struct iomap_ioend *next;

list_replace_init(&ioend->io_list, &tmp);
iomap_finish_ioend(ioend, error);
- while ((ioend = list_pop_entry(&tmp, struct iomap_ioend, io_list)))
+ list_for_each_entry_safe(ioend, next, &tmp, io_list)
iomap_finish_ioend(ioend, error);
}
EXPORT_SYMBOL_GPL(iomap_finish_ioends);

Does that sound ok? It's been running through xfstests for a couple of
hours now and hasn't let any smoke out...

--D

>
> Linus
>

2019-09-19 07:11:18

by Linus Torvalds

[permalink] [raw]
Subject: Re: [GIT PULL] iomap: new code for 5.4

On Wed, Sep 18, 2019 at 6:31 PM Linus Torvalds
<[email protected]> wrote:
>
> Why would anybody use that odd "list_pop()" thing in a loop, when what
> it really seems to just want is that bog-standard
> "list_for_each_entry_safe()"

Side note: I do agree that the list_for_each_entry_safe() thing isn't
exactly beautiful, particularly since you need that extra variable for
the temporary "next" pointer.

It's one of the C++ features I'd really like to use in the kernel -
the whole "declare new variable in a for (;;) statement" thing.

In fact, it made it into C - it's there in C99 - but we still use
"-std=gnu89" because of other problems with the c99 updates.

Anyway, I *would* be interested in cleaning up
list_for_each_entry_safe() if somebody has the energy and figures out
what we could do to get the c99 behavior without the breakage from
other sources.

For some background: the reason we use "gnu89" is because we use the
GNU extension with type cast initializers quite a bit, ie things like

#define __RAW_SPIN_LOCK_UNLOCKED(lockname) \
(raw_spinlock_t) __RAW_SPIN_LOCK_INITIALIZER(lockname)

and that broke in c99 and gnu99, which considers those compound
literals and you can no longer use them as initializers.

See

https://lore.kernel.org/lkml/[email protected]/

for some of the historical discussion about this. It really _is_ sad,
because variable declarations inside for-loops are very useful, and
would have the potential to make some of our "for_each_xyz()" macros a
lot prettier (and easier to use too).

So our list_for_each_entry_safe() thing isn't perfect, but that's no
reason to try to then make up completely new things.

Linus

2019-09-19 18:46:00

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [GIT PULL] iomap: new code for 5.4

On Wed, Sep 18, 2019 at 06:31:29PM -0700, Linus Torvalds wrote:
> It seems to have come from "list_empty()", but the difference is that
> it actually makes sense to check for emptiness of a list outside
> whatever lock that protects the list. It can be one of those very
> useful optimizations where you don't even bother taking the lock if
> you can optimistically check that the list is empty.
>
> But the same is _not_ true of an operation like "list_pop()". By
> definition, the list you pop something off has to be stable, so the
> READ_ONCE() makes no sense here.

Indeed.

> Anyway, if that was the only issue, I wouldn't care. But looking
> closer, the whole thing is just completely wrong.
>
> All the users seem to do some version of this:
>
> struct list_head tmp;
>
> list_replace_init(&ioend->io_list, &tmp);
> iomap_finish_ioend(ioend, error);
> while ((ioend = list_pop_entry(&tmp, struct iomap_ioend, io_list)))
> iomap_finish_ioend(ioend, error);
>
> which is completely wrong and pointless.
>
> Why would anybody use that odd "list_pop()" thing in a loop, when what
> it really seems to just want is that bog-standard
> "list_for_each_entry_safe()"
>
> struct list_head tmp;
> struct iomap_ioend *next;
>
> list_replace_init(&ioend->io_list, &tmp);
> iomap_finish_ioend(ioend, error);
> list_for_each_entry_safe(struct iomap_ioend, next, &tmp, io_list)
> iomap_finish_ioend(ioend, error);
>
> which is not only the common pattern, it's more efficient and doesn't
> pointlessly re-write the list for each entry, it just walks it (and
> the "_safe()" part is because it looks up the next entry early, so
> that the entry that it's walking can be deleted).

That might be true for the current two cases that operate on a temporary
local list, but in general we have lots of cases where we operate on
lists that are not just local and where have to delete all the entries.

Sure, we could somehow let them dangle and then just do a INIT_LIST_HEAD
on the list later, but that is just asking for trouble down the road
when people actually use list_empty in the functions called in the loop.

2019-09-19 18:49:02

by Linus Torvalds

[permalink] [raw]
Subject: Re: [GIT PULL] iomap: new code for 5.4

On Thu, Sep 19, 2019 at 10:03 AM Linus Torvalds
<[email protected]> wrote:
>
> So inside of xfs, that "pop ioend from the list" model really may make
> perfect sense, and you could just do
>
> static inline struct xfs_ioend *xfs_pop_ioend(struct list_head *head)

Note that as usual, my code in emails is written in the MUA, entirely
untested, and may be completely broken.

So take it just as a "maybe something like this works for you".

Linus

2019-09-19 18:49:06

by Linus Torvalds

[permalink] [raw]
Subject: Re: [GIT PULL] iomap: new code for 5.4

On Wed, Sep 18, 2019 at 8:45 PM Darrick J. Wong <[email protected]> wrote:
>
> I propose the following (assuming Linus isn't cranky enough to refuse
> the entire iomap patchpile forever):

Since I don't think the code was _wrong_, it was just that I didn't
want to introduce a new pattern for something we already have, I'd be
ok with just a fix patch on top too.

And if the xfs people really want to use the "pop" model, that's fine
too - just make it specific to just the iomap_ioend type, which is all
you seem to really want to pop, and make the typing (and naming) be
about that particular thing.

As mentioned, I don't think that whole "while (pop)" model is _wrong_
per se. But I also don't like proliferating new random list users in
general, just because some subsystem has some internal preference.
See?

And I notice that one of the users actually does keep the list around
and modifies it, ie this one:

while ((ioend = list_pop_entry(&tmp, struct xfs_ioend, io_list))) {
xfs_ioend_try_merge(ioend, &tmp);
xfs_end_ioend(ioend);
}

actually seems to _work_ on the remaining part of the list in that
xfs_ioend_try_merge() function.

So inside of xfs, that "pop ioend from the list" model really may make
perfect sense, and you could just do

static inline struct xfs_ioend *xfs_pop_ioend(struct list_head *head)
{
struct xfs_ioend *ioend = list_first_entry_or_null(head,
struct xfs_ioend *, io_list);
if (ioend)
list_del(&ioend->io_list);
return ioend;
}

and I don't think it's wrong.

It's just that we've survived three decades without that list_pop(),
and I don't want to make our header files even bigger and more complex
unless we actually have multiple real users.

Linus

2019-09-20 01:28:18

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [GIT PULL] iomap: new code for 5.4

On Thu, Sep 19, 2019 at 10:03:30AM -0700, Linus Torvalds wrote:
> It's just that we've survived three decades without that list_pop(),
> and I don't want to make our header files even bigger and more complex
> unless we actually have multiple real users.

I personally surived with it, but really hated writing the open coded
list_for_each_entry + list_del or while list_first_entry_or_null +
┐ist_del when I could have a nice primitive for it. I finally decided
to just add that helper..

2019-09-20 02:20:14

by Linus Torvalds

[permalink] [raw]
Subject: Re: [GIT PULL] iomap: new code for 5.4

On Thu, Sep 19, 2019 at 10:07 AM Christoph Hellwig <[email protected]> wrote:
>
> I personally surived with it, but really hated writing the open coded
> list_for_each_entry + list_del or while list_first_entry_or_null +
> ┐ist_del when I could have a nice primitive for it. I finally decided
> to just add that helper..

You realize that the list helper is even mis-named?

Generally a "pop()" should pop the last entry, not the first one. Or
course, which one is last and which one is first is entirely up to you
since you can add at the tail or the beginning. So even the name is
ambiguous.

So I really think the whole thing was badly implemented (and yes, part
of that was the whole implementation thing).

Doing list_del() by hand also means that you can actually decide if
you want list_del_init() or not. So it's

But again - keep that helper if you want it. Just don't put a badly
implemented and badly named "helper" it in a global header file that
_everybody_ has to look at.

Linus