2016-03-16 04:17:26

by Al Viro

[permalink] [raw]
Subject: [WTF] utterly tasteless ABI in hfi1 (around ->write()/->write_iter())

Folks, we'd discussed that kind of crap already; why, in name of
everything unholy, is that kind of garbage brought back in a new driver?

Having both ->write() and ->write_iter() *AND* having entirely
unrelated interpretation of user input on those two on the same device
is bogus, with the capital Stop That Shit Right Now.

As it is, write(fd, p, len) and writev(fd, &(struct iovec){p,len}, 1)
are interpreted in absolutely unrelated ways. As in, entirely different
set of commands. Moreover, aio IOCB_CMD_PWRITE matches writev(), not write().

What's going on? Sure, ipathfs is a precious snowflake with the same
kind of braindamage. Back when it had been brought to your attention
(along with the fact that this piece of idiocy happens to be a file on
filesystem of your own, under your full control and no need whatsofuckingever
to multiplex completely unrelated sets of commands on the same file with
"was it write(2) or writev(2)?" used as a selector) the answer had been
basically "it doesn't have to make sense, it's Special". Now it turns out
that it has grown an equally special sibling. With the idiocy directly
exposed as userland ABI. Could we please fix that thing before it's cast in
stone?

Take a look at drivers/staging/rdma/hfi1/file_ops.c in -next and
compare hfi1_write_iter() with hfi1_file_write(). Folks, this ABI is too
ugly to live, let alone to be allowed breeding.

It's also brittle as hell - trivial massage around fs/read_write.c
and fs/aio.c is quite capable of breaking that shit. Arguably, IOCB_CMD_PWRITE
and IOCB_CMD_PWRITEV both triggering your writev() semantics is an example of
just such breakage. Sigh...


2016-03-16 04:34:07

by Linus Torvalds

[permalink] [raw]
Subject: Re: [WTF] utterly tasteless ABI in hfi1 (around ->write()/->write_iter())

On Tue, Mar 15, 2016 at 9:17 PM, Al Viro <[email protected]> wrote:
>
> Take a look at drivers/staging/rdma/hfi1/file_ops.c in -next and
> compare hfi1_write_iter() with hfi1_file_write(). Folks, this ABI is too
> ugly to live, let alone to be allowed breeding.
>
> It's also brittle as hell - trivial massage around fs/read_write.c
> and fs/aio.c is quite capable of breaking that shit. Arguably, IOCB_CMD_PWRITE
> and IOCB_CMD_PWRITEV both triggering your writev() semantics is an example of
> just such breakage. Sigh...

We could just decide that if a file descriptor has both ->write and
->write_iter entities, we always pick ->write_iter in the vfs layer.

That way it's always consistent.

Simple ordering change in __vfs_write()..

We can switch is back later, but make sure it hits a release or two.
Or at least a few rc's, to flush out any problems.

Anybody who thinks that they can have different semantics for write()
and writev() is just completely broken.

Linus

2016-03-16 15:46:36

by Doug Ledford

[permalink] [raw]
Subject: Re: [WTF] utterly tasteless ABI in hfi1 (around ->write()/->write_iter())

On 03/16/2016 12:17 AM, Al Viro wrote:
> Folks, we'd discussed that kind of crap already; why, in name of
> everything unholy, is that kind of garbage brought back in a new driver?

No doubt because in large part the hfi1 driver copy and pastes
significant portions of the qib driver. Likewise, libpsm2 (which works
with the hfi1 devices) is largely a copy and paste job from open-psm
(which works with qib/ipath devices).

> Having both ->write() and ->write_iter() *AND* having entirely
> unrelated interpretation of user input on those two on the same device
> is bogus, with the capital Stop That Shit Right Now.
>
> As it is, write(fd, p, len) and writev(fd, &(struct iovec){p,len}, 1)
> are interpreted in absolutely unrelated ways. As in, entirely different
> set of commands. Moreover, aio IOCB_CMD_PWRITE matches writev(), not write().
>
> What's going on? Sure, ipathfs is a precious snowflake with the same
> kind of braindamage. Back when it had been brought to your attention
> (along with the fact that this piece of idiocy happens to be a file on
> filesystem of your own, under your full control and no need whatsofuckingever
> to multiplex completely unrelated sets of commands on the same file with
> "was it write(2) or writev(2)?" used as a selector) the answer had been
> basically "it doesn't have to make sense, it's Special".

I can't speak for Mike, but I never said "It's Special". I said it's a
driver internal thing with only one consumer and the kernel driver and
the user space consumer are a matched pair. If this were a general API
for use by any old program I would agree with you, but since it isn't, I
wasn't that concerned about whether it got fixed. If it broke, Intel
had both pieces and could fix it. And with that in mind I said "ince
this is an internal driver interface that only Intel uses, I'm not
inclined to force them to rewrite their driver and their library just
because their particular usage took you off guard."

I should point out that the fragility is not so rampant as you make it
sound. The qib driver was added in 2010 and had this interface then
(modulo your changes in 4961772560d2). It was a copy from the ipath
driver at the time, so in truth had been around much longer even than 2010.

> Now it turns out
> that it has grown an equally special sibling. With the idiocy directly
> exposed as userland ABI. Could we please fix that thing before it's cast in
> stone?

If we want to maintain back compatibility, then the qib driver has to
maintain this interface. We could possibly do a new one as well, but we
can't remove this one.

For the hfi1 driver (and OPA in general), we do have the ability to do a
new API. But, going back to what I said before, I just don't care that
much. It's Intel internal stuff as far as I'm concerned. If they do
something fragile and it breaks, then that's all on their hands.
They've gotten away with it for over 10 years so I can see why they
probably aren't that concerned about it themselves.

> Take a look at drivers/staging/rdma/hfi1/file_ops.c in -next and
> compare hfi1_write_iter() with hfi1_file_write(). Folks, this ABI is too
> ugly to live, let alone to be allowed breeding.
>
> It's also brittle as hell - trivial massage around fs/read_write.c
> and fs/aio.c is quite capable of breaking that shit. Arguably, IOCB_CMD_PWRITE
> and IOCB_CMD_PWRITEV both triggering your writev() semantics is an example of
> just such breakage. Sigh...

I'm sure you could break it intentionally if you wanted to. Not that I
think you *should*, but I'm sure you *could*.

Intel: For your own sake's you might want to consider doing something
simple such as using two files, one for regular commands and one for
SDMA commands, and modifying both the hfi1 and libpsm2 code bases. I
don't care, and I wouldn't force you to do it, but I've made my argument
to Al and he appears to be running it up the tree, so it might be
easiest to capitulate.

--
Doug Ledford <[email protected]>
GPG KeyID: 0E572FDD



Attachments:
signature.asc (884.00 B)
OpenPGP digital signature

2016-03-16 16:02:58

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [WTF] utterly tasteless ABI in hfi1 (around ->write()/->write_iter())

On Wed, Mar 16, 2016 at 11:46:31AM -0400, Doug Ledford wrote:
> I can't speak for Mike, but I never said "It's Special". I said it's a
> driver internal thing with only one consumer and the kernel driver and
> the user space consumer are a matched pair. If this were a general API
> for use by any old program I would agree with you, but since it isn't, I
> wasn't that concerned about whether it got fixed. If it broke, Intel
> had both pieces and could fix it. And with that in mind I said "ince
> this is an internal driver interface that only Intel uses, I'm not
> inclined to force them to rewrite their driver and their library just
> because their particular usage took you off guard."

The thing which is really scary about the "we own both pieces, so we
can be sloppy with the interface design" is there is the question of
whether there are any security issues with such an interface. Maybe
the assumption is that only root will get to access the interface, but
as we're seeing with user namespaces, very often these assumptions are
getting upended.

So certainly if I were a bad guy working at the NSA^H^H^H^H KGB trying
to find a zero-day that I would keep in my agency's back pocket,
interfaces designed with this kind of attitude would be the first
place I would look.

> For the hfi1 driver (and OPA in general), we do have the ability to do a
> new API. But, going back to what I said before, I just don't care that
> much. It's Intel internal stuff as far as I'm concerned. If they do
> something fragile and it breaks, then that's all on their hands.

Except if there's a security vulernability, then it's on our
collective heads as kernel developers. Unless we want to tell people,
"don't use intel hardware, it's written by device driver authors who
are sloppy"....

- Ted

2016-03-16 16:36:49

by Linus Torvalds

[permalink] [raw]
Subject: Re: [WTF] utterly tasteless ABI in hfi1 (around ->write()/->write_iter())

On Wed, Mar 16, 2016 at 8:46 AM, Doug Ledford <[email protected]> wrote:
>
> If we want to maintain back compatibility, then the qib driver has to
> maintain this interface. We could possibly do a new one as well, but we
> can't remove this one.

We've broken more important driver ABI's before - all the nasty X stuff.

Now, the X people did learn their lesson, and it hasn't happened
lately (thank Gods!), but quite frankly, some shit-for-brains
hardware-specific config interface for a rdma device that basically
nobody uses is a _lot_ less important than X ever was.

So I don't care one whit if we break it, and it's not the kind of
backwards compatibility the kernel should worry about. There are
exactly zero regular users of this interface. I assume that people who
use this thing are *so* deeply technical that they can take care of
themselves. And it really is a completely broken interface.

I might be proven wrong, and somebody's dear old grandma ends up
complaining about a new kernel breaking her configuration, and in that
case we'd have to revert anything that causes that breakage. But I
suspect I'm not wrong.

Linus

2016-03-16 17:07:09

by Al Viro

[permalink] [raw]
Subject: Re: [WTF] utterly tasteless ABI in hfi1 (around ->write()/->write_iter())

On Wed, Mar 16, 2016 at 09:36:46AM -0700, Linus Torvalds wrote:
> On Wed, Mar 16, 2016 at 8:46 AM, Doug Ledford <[email protected]> wrote:
> >
> > If we want to maintain back compatibility, then the qib driver has to
> > maintain this interface. We could possibly do a new one as well, but we
> > can't remove this one.
>
> We've broken more important driver ABI's before - all the nasty X stuff.
>
> Now, the X people did learn their lesson, and it hasn't happened
> lately (thank Gods!), but quite frankly, some shit-for-brains
> hardware-specific config interface for a rdma device that basically
> nobody uses is a _lot_ less important than X ever was.
>
> So I don't care one whit if we break it, and it's not the kind of
> backwards compatibility the kernel should worry about. There are
> exactly zero regular users of this interface. I assume that people who
> use this thing are *so* deeply technical that they can take care of
> themselves. And it really is a completely broken interface.
>
> I might be proven wrong, and somebody's dear old grandma ends up
> complaining about a new kernel breaking her configuration, and in that
> case we'd have to revert anything that causes that breakage. But I
> suspect I'm not wrong.

Let's start with "do saner interface for hfi1 if you want it in", then
duplicate it for ipathfs (using the saner userland side already tested on
fixed hfi1), then we make sure nobody is even tempted to repeat that
ugliness more or less along the lines of what you'd proposed in
fs/read_write.c, but might as well add a sanity check in do_dentry_open(),
where we already have
if ((f->f_mode & FMODE_WRITE) &&
likely(f->f_op->write || f->f_op->write_iter))
f->f_mode |= FMODE_CAN_WRITE;
- the check for both methods being present could go there conveniently enough;
"use new_sync_write() as ->write" is spelled as "have NULL ->write and non-NULL
->write_iter" these days, so having both is very rare; it's only
/dev/zero, /dev/null and these two perversions. And while writes to /dev/null
are very important (for email handling, if nothing else), I suspect that we
won't be seriously hurt by having it do the extra work new_sync_write() would
involve... Or we could special-case that in the check (move write_null(),
aka return the third argument into fs/libfs.c and in unlikely case of having
both ->write and ->write_iter check if ->write is that one).

2016-03-16 22:52:27

by Dennis Dalessandro

[permalink] [raw]
Subject: RE: [WTF] utterly tasteless ABI in hfi1 (around ->write()/->write_iter())

Just wanted to re-send so it gets picked up by the lists. I was using a different
server than usual and vger did not like it. Hopefully this attempt fares better.
Sorry for the extra noise.

--

On Wed, Mar 16, 2016 at 11:46:31AM -0400, Doug Ledford wrote:
>Intel: For your own sake's you might want to consider doing something
>simple such as using two files, one for regular commands and one for
>SDMA commands, and modifying both the hfi1 and libpsm2 code bases. I
>don't care, and I wouldn't force you to do it, but I've made my argument
>to Al and he appears to be running it up the tree, so it might be
>easiest to capitulate.

This should be doable. We could have multiple files and have each implement only
the appropriate handler, write() for control and writev() for data. Another
option we are considering is combine the struct used for regular commands and
the one for SDMA commands into a union in a structure that includes some magic
value that tells us how to interpret the payload. This way whether it's write()
or writev() won't matter we'll know how to handle it. We'll start with fixing
hfi1 first and go from there.

-Denny