2000-11-04 22:16:41

by David Wragg

[permalink] [raw]
Subject: What protects f_pos?

Since f_pos of struct file is a loff_t, on 32-bit architectures it
needs a lock to make accesses atomic (or some more sophisticated form
of protection). But looking in 2.4.0-test10, there doesn't seem to be
any such lock.

The llseek op is called with the Big Kernel Lock, but unlike in 2.2,
the read and write ops are called without any locks held, and so
generic_file_{read|write} make unprotected accesses to f_pos (through
their ppos argument).

Is this something for Ted's todo list?

David


2000-11-11 22:54:21

by Theodore Ts'o

[permalink] [raw]
Subject: Re: What protects f_pos?

From: David Wragg <[email protected]>
Date: 04 Nov 2000 22:16:18 +0000

Since f_pos of struct file is a loff_t, on 32-bit architectures it
needs a lock to make accesses atomic (or some more sophisticated form
of protection). But looking in 2.4.0-test10, there doesn't seem to be
any such lock.

The llseek op is called with the Big Kernel Lock, but unlike in 2.2,
the read and write ops are called without any locks held, and so
generic_file_{read|write} make unprotected accesses to f_pos (through
their ppos argument).

This looks like it's a bug to me.... although if you have multiple
threads hitting a file descriptor at the same time, you're pretty much
asking for trouble.

- Ted

2000-11-12 01:56:58

by David Wragg

[permalink] [raw]
Subject: Re: What protects f_pos?

[email protected] writes:
> This looks like it's a bug to me.... although if you have multiple
> threads hitting a file descriptor at the same time, you're pretty much
> asking for trouble.

Yes, I haven't been able to come up with an example that might trigger
this that wasn't dubious to begin with. I'll raise this again at a
convenient time during 2.5.

David

2000-11-12 22:28:24

by David Schwartz

[permalink] [raw]
Subject: RE: What protects f_pos?


> [email protected] writes:
> > This looks like it's a bug to me.... although if you have multiple
> > threads hitting a file descriptor at the same time, you're pretty much
> > asking for trouble.
>
> Yes, I haven't been able to come up with an example that might trigger
> this that wasn't dubious to begin with. I'll raise this again at a
> convenient time during 2.5.
>
> David

Suppose you had a multithreaded program that used a configuration file with
a group of fixed-length blocks indicating what work it had to do. Each
thread read a block from the file and then did the work. One might think
that there's no need to protect the file descriptor with a mutex.

DS

2000-11-13 15:23:37

by David Wragg

[permalink] [raw]
Subject: Re: What protects f_pos?

"David Schwartz" <[email protected]> writes:
> Suppose you had a multithreaded program that used a
> configuration file with a group of fixed-length blocks indicating what
> work it had to do. Each thread read a block from the file and then did
> the work. One might think that there's no need to protect the file
> descriptor with a mutex.

I don't think that this will work, due to a separate non-atomicity
issue with f_pos. The generic file read and write implementations do
not atomically update f_pos. They read f_pos to determine the file
offset to use, then manipulate the page cache (possibly sleeping on
I/O), and only then set f_pos to the appropriate updated value. So
the example you suggest, with two threads, could do something like:

Thread 1 Thread 2

sys_read(fd, buf1, len)

off = file->f_pos sys_read(fd, buf2, len)

read to buf1 off = file->f_pos

file->f_pos = off + len read to buf2

file->f_pos = off + len


So both threads read the same block, and f_pos only gets incremented
once.

(Pipes and sockets are a different matter, of course.)

2.2 has the same issue, since although the BKL is held, it will get
dropped if one of the threads sleeps on I/O. (Earlier Linux versions
might well have the same issue, but I don't have the source around to
check.)

POSIX doesn't seem to bar this behaviour. From 6.4.1.2:

On a regular file or other file capable of seeking, read() shall
start at a position in the file given by the file offset
associated with fildes. Before successful return from read(), the
file offset shall be incremented by the number of bytes actually
read.

Which is exactly what Linux does. I can't find text anywhere else in
POSIX.1 that strengthens that condition for the case of multiple
processes/threads reading from the same file. I'll try to find out
what the Austin Group has to say about this.


David

2000-11-13 15:46:00

by Richard B. Johnson

[permalink] [raw]
Subject: Re: What protects f_pos?

On 13 Nov 2000, David Wragg wrote:

> "David Schwartz" <[email protected]> writes:
[SNIPPED....]


>
> So both threads read the same block, and f_pos only gets incremented
> once.
>
> (Pipes and sockets are a different matter, of course.)
>
> 2.2 has the same issue, since although the BKL is held, it will get
> dropped if one of the threads sleeps on I/O. (Earlier Linux versions
> might well have the same issue, but I don't have the source around to
> check.)
>
> POSIX doesn't seem to bar this behaviour. From 6.4.1.2:
>
> On a regular file or other file capable of seeking, read() shall
> start at a position in the file given by the file offset
> associated with fildes. Before successful return from read(), the
> file offset shall be incremented by the number of bytes actually
> read.
>
> Which is exactly what Linux does. I can't find text anywhere else in
> POSIX.1 that strengthens that condition for the case of multiple
> processes/threads reading from the same file. I'll try to find out
> what the Austin Group has to say about this.
>

Anything that uses f_pos has a problem with multiple readers or multiple
writers. That's one of the problems that file locking is supposed to
handle. An application has got to know how to access a file when it can
be changed by another.

There is a more far-reaching problem that sooner or later will get us.
That's the built-in problem with lseek. Lseek is supposed to return -1
if it failed, with errno being set appropriately. This, in principle,
does not prevent seeking to the last byte of the file as long as errno
is checked by the caller if the function returns -1.

Linux, internally, uses "-errno" as a return value to show an
error. So what happens if we `lseek` to -EBADF or -EFAULT, etc.

An otherwise empty lseek routine in a module, that does nothing except
return the input value, will show errors once the offsets corresponding
to negative errno values are reached.


Cheers,
Dick Johnson

Penguin : Linux version 2.4.0 on an i686 machine (799.54 BogoMips).

"Memory is like gasoline. You use it up when you are running. Of
course you get it all back when you reboot..."; Actual explanation
obtained from the Micro$oft help desk.