LinuxLists.cc - [2.4.19] read(2) and page aligned buffers

2002-11-06 18:07:29

Subject: [2.4.19] read(2) and page aligned buffers

(synopsis: read() is returning a short read of
a disk file without setting errno when passed
a malloc()ed buffer that is not page-aligned, while
working correctly when passed a buffer of the
same size directly allocated with mmap()).

[I suppose the page alignment thing is just a guess,
and I should really malloc() an oversized buffer,
step through it to the next page boundary, and
pass that to read() to see if the problem goes
away. But I just thought of that a couple of
sentences ago, so here is the description anyway.]

I have a function that opens a file O_RDONLY,
mmap()s it PROT_READ, MAP_SHARED, then passes it to
a checksum function, and finally munmap()s it.

This is working fine.

In a fit of magnanimity, I decided to make an OPM-friendly
version that doesn't use mmap(). So I replaced the
mmap() with malloc()+read() (with the usual error
checking, retry on EINTR||EAGAIN, incremental read to
incremented buffer offsets with decremented counts,
not incorporating -1 returns into the accumulated
read count, yada yada.)

After making this change, the function randomly
returns error from the read() wrapper. Delving
into it with gdb, I discovered that read() returns
a short count with errno == 0, the wrapper loops
and tries to read the rest of the file to the
offset into the buffer past what it already read,
read() returns 0 with errno still == 0, and of
course the wrapper decides that it must be at
EOF (read() == 0 && errno == 0) and returns.

I wondered if it was a page fault race (unlikely,
since it should have the same problem with an
mmap() buffer), but "memset(buf, 0, size);"
in between the malloc()and the read()does not fix it.

One thing I did notice: the buffer address when
read fails is 0x....08, which is a normal malloc()
alignment and really shouldn't bother read() (which
AFAIK doesn't need an aligned buffer in the first
place.)

Lately it has been running 11 bytes short when it
happens. Last night, in a slightly different version
of the same code, it was running 4 bytes short when
it happened. It isn't every file, but then malloc()
doesn't come up with start addresses at the same
page offset every time, either.

(gdb-5.2 found __libc_read in libg.a (2.2.5)
impenetrable, so this was as deep as I went into it.)

(gcc-2.95.3, binutils-2129009, -O2 -march=k6, cpu
is k6-II/450, SIS-5513/530, generally stable mb that
compiles kernels and X without hiccups).

[Intuition: it's not the lack of page alignment
generically, but rather an exactly 8-byte offset
from beginning of page that is the source of the problem.]

HTH,

Clayton Weaver
<mailto: [email protected]>

--
_______________________________________________
Sign-up for your own FREE Personalized E-mail at Mail.com
http://www.mail.com/?sr=signup

Single & ready to mingle? lavalife.com: Where singles click. Free to Search!
http://www.lavalife.com/mailcom.epl?a=2116

2002-11-06 19:32:16

by Adam Kropelin

[permalink] [raw]

Subject: Re: [2.4.19] read(2) and page aligned buffers

On Wed, Nov 06, 2002 at 01:13:57PM -0500, Clayton Weaver wrote:
> a short count with errno == 0, the wrapper loops
> and tries to read the rest of the file to the
> offset into the buffer past what it already read,
> read() returns 0 with errno still == 0, and of
> course the wrapper decides that it must be at
> EOF (read() == 0 && errno == 0) and returns.

This isn't necessarily the cause of your problem, but your description
here smells an awful lot like classic errno abuse. errno is only valid
when read() returns -1. The check you cite in your last sentence is
illegal.

If read() returns 0, you're done. You're at EOF. If you're not actually
at EOF then *that* is a bug.

--Adam

2002-11-07 06:47:51

by Clayton Weaver

[permalink] [raw]

Subject: Re: [2.4.19] read(2) and page aligned buffers

(It's not an alignment issue. Still broken read()ing whole file
into page-aligned malloc() buffers and MAP_PRIVATE|MAP_ANONYMOUS mmap()
buffers, while not broken mmap()ing the file directly.)

Ok, bizarrely enough (or not), I couldn't reproduce it with a
test program that isolates the functions that return error
in my program in a program of their own that just reads a
dir, open()s files, malloc()s space, read()s the file into the space,
free()s the buffer, and close()s the file. I even incorporated
the sha1 checksum functions from textutils that I use in the actual
program seeing the error into the test program and ran the buffers
through them, still without reproducing the error. I used one of the dirs
where I actually see the errors with malloc() + read() that I originally
reported.

But the test program isn't significantly different from the would-be
production code that sees the error. I've posted it below in case
it is informative.

(preliminaries: the file size is ok, it's an lstat() st_size that
mmap(...MAP_SHARED...) has no issues with. The open() in the
caller of gethash() is merely

fd = open(fdata->name, O_RDONLY);
if (fd >= 0) {

followed by an fcntl() read lock, etc. Eventually it gets down to

if (gethash(fd, fdata->size, fdata->name,
REC_CHKSUM(*current)) == NULL) {
return -1;
}

Here is gethash(), the version that sees the error from read():

/*****
* gethash():
*
* args: const int fd open file descriptor for file to checksum
* const off_t len length of file
* const char * pathname for error reporting
* void * outsum where to put checksum
*
* returns: void * (outsum) or NULL
*
* Notes: Takes a file and size as args, and returns the address of
* outsum with the checksum in the first sizeof(checksum)
* bytes (outsum can point to the tail of an f_rec or to some
* other buffer with sufficient space). sizeof(checksum)
* is 20 bytes for sha1 checksums.
*
* Calls sha1 code lifted from gnu sha.c in gnu textutils-2.0.21.
*****/

/* assume that the appropriate #includes are up here */

void * gethash(const int, const off_t, const char *, void *);

/* extern prototypes; see headers */

void * gethash(const int fd, const off_t len, const char * pathname,
void * outsum)
{
const char * funcname = "gethash()";
struct sha_ctx bchk_ctx;
void * filebuf;

#ifndef NDEBUG
if (!len || !pathname || !outsum) {
bchk_err(funcname, nullarg);
return NULL;
}
#endif

sha_init_ctx(&bchk_ctx);

filebuf = xmalloc((size_t) len);
if (!filebuf) {
return NULL;
}

if (wrap_read(fd, filebuf, (size_t) len) != (ssize_t) len) {
rpt_syserr(funcname, read_err, pathname, NULL);
return NULL;
}

/* (from the version that works fine)
filebuf = wrap_mmap((size_t) len, PROT_READ, (int) fd);
if (filebuf == NULL) {
rpt_syserr(funcname, mmap_err, pathname, NULL);
return NULL;
}
*/

/* sha_process_bytes() does its own % 64 check and calls
sha_process_block() internally to process chunks that are
a multiple of 64 bytes before handling any remainder < 64 */

sha_process_bytes(filebuf, (size_t) len, &bchk_ctx);
(void) sha_finish_ctx(&bchk_ctx, outsum);
free(filebuf);
/* (void) munmap(filebuf, (size_t) len); */

return outsum;
}

Here is wrap_read() (assume that the appropriate #includes are there):

/* read() wrapper with incremental retry on interrupt */

ssize_t wrap_read(int fd, void * buf, size_t count)
{
ssize_t retval;
ssize_t tmpret;

#ifndef NDEBUG
const char * funcname = "wrap_read()";
if (!buf || !count || fd < 0) {
bchk_err(funcname, nullarg);
return -1;
}
#endif

retval = 0;
errno = 0;
do {
tmpret = read(fd, ((char *) buf + retval), count - retval);
if (tmpret + retval == (ssize_t) count) {
return (ssize_t) count;
}
else {
switch (tmpret) {
case -1:
switch(errno) {
case EINTR:
case EAGAIN:
break;

default: /* real error */
return retval;
break;
}
break;

case 0:
return retval;
break;

default: /* partial read */
switch(errno) {
case EINTR: /* interrupted by signal */
case EAGAIN: /* O_NONBLOCK ? */
retval += tmpret;
break;

default: /* real error */
return (tmpret + retval);
break;
}
break;
}
}
} while (retval < count);

return retval;
}

And that's bloody it.

?

(I thought, "stack", but what's downstream? 2 sha1 calls that
don't seem to do anything evil in the test program that attempted
to isolate the error and free(). Doesn't add up.)

Regards,

Clayton Weaver
<mailto: [email protected]>

--
_______________________________________________
Sign-up for your own FREE Personalized E-mail at Mail.com
http://www.mail.com/?sr=signup

Single & ready to mingle? lavalife.com: Where singles click. Free to Search!
http://www.lavalife.com/mailcom.epl?a=2116

2002-11-07 13:37:12

by Adam Kropelin

[permalink] [raw]

Subject: Re: [2.4.19] read(2) and page aligned buffers

On Thu, Nov 07, 2002 at 01:54:21AM -0500, Clayton Weaver wrote:
> Here is wrap_read() (assume that the appropriate #includes are there):

Your code is broken, and for exactly the same reason I told you
yesterday.

> default: /* partial read */
> switch(errno) {
> case EINTR: /* interrupted by signal */
> case EAGAIN: /* O_NONBLOCK ? */
> retval += tmpret;
> break;

I repeat: Checking errno when read() has returned something other than
-1 is ILLEGAL. Period. This check is triggering early, thus giving your
supposed early EOF.

This is not even remotely a kernel issue.

--Adam