2005-03-21 05:46:26

by Ben Pfaff

Subject: [CHECKER] ext3 bug in ftruncate() with O_SYNC?

Hi. We're doing some checking on Linux file systems and found
what appears to be a bug in the Linux 2.6.11 implementation of
ext3: when ftruncate shrinks a file, using a file descriptor
opened with O_SYNC, the file size is not updated synchronously.
I've appended a test program that illustrates the problem. After
this program runs, the file system shows a file with length 1031,
not 4 as would be expected if the ftruncate completed
synchronously. (If I insert an fsync before closing the file,
the file length is correct.)

Does this look like a bug to you guys?

Thanks,

Ben.

----------------------------------------------------------------------

#include <sys/types.h>
#include <sys/stat.h>
#include <sys/ioctl.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>
#include <assert.h>

/* Abort with the errno message if a call returned a negative value. */
#define CHECK(ret) do { if ((ret) < 0) { perror(0); assert(0); } } while (0)

/* Format a shell command, echo it to stderr, and run it. */
int systemf(const char *fmt, ...)
{
    static char cmd[1024];

    va_list ap;
    va_start(ap, fmt);
    vsnprintf(cmd, sizeof cmd, fmt, ap);
    va_end(ap);

    fprintf(stderr, "running cmd \"%s\"\n", cmd);
    return system(cmd);
}

int main(int argc, char *argv[])
{
    int ret, fd;

    /* Create a fresh ext3 file system and mount it with a very long
       commit interval, so nothing reaches disk via periodic commits. */
    systemf("umount /dev/sbd0");
    systemf("sbin/mkfs.ext3 -F -j -b 1024 /dev/sbd0 2048");
    systemf("sbin/e2fsck.shared -y /dev/sbd0");
    systemf("mount -t ext3 /dev/sbd0 /mnt/sbd0 -o commit=65535");
    systemf("umount /dev/sbd0");
    systemf("mount -t ext3 /dev/sbd0 /mnt/sbd0 -o commit=65535");

    /* Write, extend to 1031 bytes, write at the tail, then shrink to 4. */
    fd = open("/mnt/sbd0/0001", O_CREAT | O_RDWR | O_SYNC, 0700);
    CHECK(fd);
    ret = write(fd, "foobar", 6);
    CHECK(ret);
    ret = ftruncate(fd, 1031);
    CHECK(ret);
    ret = pwrite(fd, "bazzle", 6, 1031 - 6);
    CHECK(ret);
    ret = ftruncate(fd, 4);
    CHECK(ret);
    ret = close(fd);
    CHECK(ret);

    /* Simulate a crash: either snapshot the disk or reboot without syncing. */
#if 0
    {
#include "../sbd/sbd.h"
        int sbd_fd = open("/dev/sbd0", O_RDONLY);
        ret = ioctl(sbd_fd, SBD_COPY_DISK, 1);
        CHECK(ret);
        close(sbd_fd);
    }
#else
    systemf("reboot -f -n");
#endif
    return 0;
}



--
Ben Pfaff
email: [email protected]
web: http://benpfaff.org


2005-03-22 04:01:40

by Andrew Morton

Subject: Re: [CHECKER] ext3 bug in ftruncate() with O_SYNC?

Ben Pfaff <[email protected]> wrote:
>
> Hi. We're doing some checking on Linux file systems and found
> what appears to be a bug in the Linux 2.6.11 implementation of
> ext3: when ftruncate shrinks a file, using a file descriptor
> opened with O_SYNC, the file size is not updated synchronously.
> I've appended a test program that illustrates the problem. After
> this program runs, the file system shows a file with length 1031,
> not 4 as would be expected if the ftruncate completed
> synchronously. (If I insert an fsync before closing the file,
> the file length is correct.)
>
> Does this look like a bug to you guys?
>

The spec says "Write I/O operations on the file descriptor shall complete
as defined by synchronized I/O file integrity completion".

Is ftruncate a "write I/O operation"? No. Should ftruncate() honour
O_SYNC? I'd say so, yes. After all, an extending ftruncate is a
sort-of-write.

Unfortunately Linux doesn't pass the file* down to the
filesystem's ->truncate() handler, so the fs doesn't know that the
ftruncated fd was opened O_SYNC. I don't think _any_ Linux filesystems get
this right.



2005-03-23 21:10:09

by Stephen C. Tweedie

Subject: Re: [CHECKER] ext3 bug in ftruncate() with O_SYNC?

Hi,

On Tue, 2005-03-22 at 03:51, Andrew Morton wrote:

> The spec says "Write I/O operations on the file descriptor shall complete
> as defined by synchronized I/O file integrity completion".
>
> Is ftruncate a "write I/O operation"? No.

SUS seems to be pretty clear on this. The syscall descriptions for
write(2) and pwrite(2) explicitly describe O_SYNC as requiring
synchronized I/O file integrity completion. ftruncate() has no such
requirement.

It would certainly be a reasonable thing to do, but I don't think it
strictly counts as a bug that we're not honouring O_SYNC here.

--Stephen