2013-10-09 00:15:51

by Al Viro

[permalink] [raw]
Subject: [RFC][PATCH 10/13] make dump_emit() use vfs_write() instead of banging at ->f_op->write directly


... and deal with short writes properly - the output might be to pipe, after
all; as it is, e.g. no-MMU case of elf_fdpic coredump can write a whole lot
more than a page worth of data at one call.

Signed-off-by: Al Viro <[email protected]>
---
fs/coredump.c | 17 ++++++++++++-----
1 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/fs/coredump.c b/fs/coredump.c
index 319f973..478ebad 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -696,13 +696,20 @@ EXPORT_SYMBOL(dump_write);
int dump_emit(struct coredump_params *cprm, const void *addr, int nr)
{
struct file *file = cprm->file;
- if (dump_interrupted() || !access_ok(VERIFY_READ, addr, nr))
- return 0;
+ loff_t pos = file->f_pos;
+ ssize_t n;
if (cprm->written + nr > cprm->limit)
return 0;
- if (file->f_op->write(file, addr, nr, &file->f_pos) != nr)
- return 0;
- cprm->written += nr;
+ while (nr) {
+ if (dump_interrupted())
+ return 0;
+ n = vfs_write(file, addr, nr, &pos);
+ if (n < 0)
+ return 0;
+ file->f_pos = pos;
+ cprm->written += n;
+ nr -= n;
+ }
return 1;
}
EXPORT_SYMBOL(dump_emit);
--
1.7.2.5


2013-10-09 00:52:45

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC][PATCH 10/13] make dump_emit() use vfs_write() instead of banging at ->f_op->write directly

On Tue, Oct 8, 2013 at 5:15 PM, Al Viro <[email protected]> wrote:
>
> ... and deal with short writes properly

.. except you don't.

> + while (nr) {
> + if (dump_interrupted())
> + return 0;
> + n = vfs_write(file, addr, nr, &pos);
> + if (n < 0)
> + return 0;
> + file->f_pos = pos;
> + cprm->written += n;
> + nr -= n;
> + }

Please handle 'n == 0' too. Maybe it never happens (ie you get EPIPE
or ENOSPC), but write returning zero is actually possible and a valid
return value and traditional for "end of media". Looping forever is
not a good idea.

Linus

2013-10-09 01:18:37

by Al Viro

[permalink] [raw]
Subject: Re: [RFC][PATCH 10/13] make dump_emit() use vfs_write() instead of banging at ->f_op->write directly

On Tue, Oct 08, 2013 at 05:52:42PM -0700, Linus Torvalds wrote:
> On Tue, Oct 8, 2013 at 5:15 PM, Al Viro <[email protected]> wrote:
> >
> > ... and deal with short writes properly
>
> .. except you don't.
>
> > + while (nr) {
> > + if (dump_interrupted())
> > + return 0;
> > + n = vfs_write(file, addr, nr, &pos);
> > + if (n < 0)
> > + return 0;
> > + file->f_pos = pos;
> > + cprm->written += n;
> > + nr -= n;
> > + }
>
> Please handle 'n == 0' too. Maybe it never happens (ie you get EPIPE
> or ENOSPC), but write returning zero is actually possible and a valid
> return value and traditional for "end of media". Looping forever is
> not a good idea.

Point, but I would argue that we should yell very loud if we get 0 from
vfs_write() for non-zero size. I'm not sure if POSIX allows write(2)
to return that, but a lot of userland code won't be expecting that and
won't be able to cope...

2013-10-09 01:20:39

by Al Viro

[permalink] [raw]
Subject: Re: [RFC][PATCH 10/13] make dump_emit() use vfs_write() instead of banging at ->f_op->write directly

On Wed, Oct 09, 2013 at 02:18:33AM +0100, Al Viro wrote:

> Point, but I would argue that we should yell very loud if we get 0 from
> vfs_write() for non-zero size. I'm not sure if POSIX allows write(2)
> to return that, but a lot of userland code won't be expecting that and
> won't be able to cope...

PS: I agree that we should abort that loop if we get nr == 0, of course,
but I believe that we should report the pathname of the offending file,
at the very least.

2013-10-09 01:38:50

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC][PATCH 10/13] make dump_emit() use vfs_write() instead of banging at ->f_op->write directly

On Tue, Oct 8, 2013 at 6:18 PM, Al Viro <[email protected]> wrote:
>
> Point, but I would argue that we should yell very loud if we get 0 from
> vfs_write() for non-zero size. I'm not sure if POSIX allows write(2)
> to return that, but a lot of userland code won't be expecting that and
> won't be able to cope...

Actually POSIX very much allows zero returns. O_NDELAY is mentioned as
a possible cause, in addition to zero-sized writes themselves, of
course.

Also, writing to (but not past) the end of a block device returns 0
for "end of device", iirc.

Linus

2013-10-09 02:06:28

by Al Viro

[permalink] [raw]
Subject: Re: [RFC][PATCH 10/13] make dump_emit() use vfs_write() instead of banging at ->f_op->write directly

On Tue, Oct 08, 2013 at 06:38:47PM -0700, Linus Torvalds wrote:
> On Tue, Oct 8, 2013 at 6:18 PM, Al Viro <[email protected]> wrote:
> >
> > Point, but I would argue that we should yell very loud if we get 0 from
> > vfs_write() for non-zero size. I'm not sure if POSIX allows write(2)
> > to return that, but a lot of userland code won't be expecting that and
> > won't be able to cope...
>
> Actually POSIX very much allows zero returns. O_NDELAY is mentioned as
> a possible cause, in addition to zero-sized writes themselves, of
> course.

Umm... What it says is "If some data can be written without blocking the
thread, write() shall write what it can and return the number of bytes
written. Otherwise, it shall return -1 and set errno to EAGAIN." For
sockets EWOULDBLOCK is also allowed as a possible errno value. I hadn't
dug through the streams-related part, but we don't have that mess anyway.

> Also, writing to (but not past) the end of a block device returns 0
> for "end of device", iirc.

What do you mean? If the starting position is below the end of device,
we get a non-zero length write, not exceeding the end. If it's at
the end of device, we get -ENOSPC. It's out of scope for POSIX, but
Linux is definitely acting that way...

2013-10-09 02:27:19

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC][PATCH 10/13] make dump_emit() use vfs_write() instead of banging at ->f_op->write directly

On Tue, Oct 8, 2013 at 7:06 PM, Al Viro <[email protected]> wrote:
> On Tue, Oct 08, 2013 at 06:38:47PM -0700, Linus Torvalds wrote:
>> On Tue, Oct 8, 2013 at 6:18 PM, Al Viro <[email protected]> wrote:
>> >
>> > Point, but I would argue that we should yell very loud if we get 0 from
>> > vfs_write() for non-zero size. I'm not sure if POSIX allows write(2)
>> > to return that, but a lot of userland code won't be expecting that and
>> > won't be able to cope...
>>
>> Actually POSIX very much allows zero returns. O_NDELAY is mentioned as
>> a possible cause, in addition to zero-sized writes themselves, of
>> course.
>
> Umm... What it says is "If some data can be written without blocking the
> thread, write() shall write what it can and return the number of bytes
> written. Otherwise, it shall return -1 and set errno to EAGAIN."

Look closer.

".. most historical implementations return zero (with the O_NDELAY
flag set, which is the historical predecessor of O_NONBLOCK .."

>> Also, writing to (but not past) the end of a block device returns 0
>> for "end of device", iirc.
>
> What do you mean? If the starting position is below the end of device,
> we get a non-zero length write, not exceeding the end. If it's at
> the end of device, we get -ENOSPC. It's out of scope for POSIX, but
> Linux is definitely acting that way...

Hmm. I'm pretty sure I've seen zero returns for EOF somewhere..

Linus