Hello,
I'm seeing a race condition on Linux 2.6 that rather reproducibly
causes GCC bootstrap failures on current mainline.
The problem is spurious rebuilds of some auto-generated files
in the gcc/ directory while building the Ada libraries.
For example, we have the following chain of dependencies:
genconditions generates insn-conditions.c, which is compiled
into insn-conditions.o, which is archived into libbackend.a,
which is linked into the compiler binaries (cc1, cc1plus, ...).
Now, insn-conditions.c is a rather small file, and compiling it
into insn-conditions.o usually takes less than a second. So,
it may well happen that these two files end up with time
stamps that differ only in the sub-second part. (Note that
with Linux 2.4, file time stamps didn't actually have a
sub-second part; this is a new 2.6 feature.)
Unfortunately, while Linux now has sub-second time stamps,
those cannot actually be *stored* on disk when using the ext3
file system (because the on-disk format has no room for them). This
has the rather curious effect that while the inode is still
held in Linux's inode cache, the time stamp will retain its
sub-second component, but once the inode has been dropped from the
inode cache and later re-read from disk, the sub-second part
will be truncated to zero.
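For concreteness, here is a minimal user-space sketch of where the
sub-second part shows up (illustrative code only; it assumes glibc
exposes the st_mtim member under _GNU_SOURCE):

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/stat.h>

int main(int argc, char **argv)
{
    struct stat st;

    if (argc != 2) {
        fprintf(stderr, "usage: %s file\n", argv[0]);
        return 1;
    }
    if (stat(argv[1], &st) != 0) {
        perror("stat");
        return 1;
    }
    /* While the inode is cached, tv_nsec may be nonzero on ext3;
       once the inode has been evicted and re-read from disk, it
       reads back as 0, which is the effect described above. */
    printf("%s: mtime = %ld.%09ld\n", argv[1],
           (long)st.st_mtime, (long)st.st_mtim.tv_nsec);
    return 0;
}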
Now, if this truncation happens to the insn-conditions.o file,
but not the insn-conditions.c file, the former will jump from
being just slightly newer than the latter to being just slightly
*older* than the latter. For some reason, this tends to occur
rather reliably during a gcc bootstrap on my system.
Now, as long as the main 'make' is running, this has no adverse
effect (apparently because make remembers it has already built
the insn-conditions.o file from insn-conditions.c). However,
once the libada library is built, it performs a recursive make
invocation that once again checks dependencies in the master
gcc build directory, including the dependencies on gnat1.
At this point, make will re-check the file time stamps and
decide it needs to rebuild insn-conditions.o. This in turn
triggers a rebuild of libbackend.a and all compiler binaries.
This is bad for two reasons. First, at this point in time various
macros required to build the main gcc components aren't in fact
set up correctly, and thus the file will be rebuilt using the
host compiler and not the newly built stage3 gcc.
More importantly, when using parallel make to do the bootstrap,
at this point some *other* library, e.g. libstdc++ or libjava,
will be built at the same time, using the cc1plus or jc1 binaries
that the libada make has just decided it needs to rebuild.
While these binaries are being rebuilt, they will be in a deleted
or inconsistent state for a certain period of time. During this
period, attempts to start compiles for libstdc++ or libjava
components will fail, causing the whole bootstrap to abort.
The appended makefile hack ensures that for generated files,
the .c and .o file time stamps differ by at least a second,
which makes the problem disappear. This allows the bootstrap
to succeed again on my system.
However, I'd say that this should probably be fixed in the kernel,
e.g. by not reporting high-precision time stamps in the first
place if the file system cannot store them ...
Bye,
Ulrich
Index: gcc/Makefile.in
===================================================================
RCS file: /cvs/gcc/gcc/gcc/Makefile.in,v
retrieving revision 1.1266
diff -c -p -r1.1266 Makefile.in
*** gcc/Makefile.in 24 Mar 2004 18:03:15 -0000 1.1266
--- gcc/Makefile.in 30 Mar 2004 23:46:11 -0000
*************** POD2MAN = pod2man --center="GNU" --relea
*** 247,253 ****
# Some versions of `touch' (such as the version on Solaris 2.8)
# do not correctly set the timestamp due to buggy versions of `utime'
# in the kernel. So, we use `echo' instead.
! STAMP = echo timestamp >
# Make sure the $(MAKE) variable is defined.
@SET_MAKE@
--- 247,253 ----
# Some versions of `touch' (such as the version on Solaris 2.8)
# do not correctly set the timestamp due to buggy versions of `utime'
# in the kernel. So, we use `echo' instead.
! STAMP = sleep 1 >
# Make sure the $(MAKE) variable is defined.
@SET_MAKE@
--
Dr. Ulrich Weigand
[email protected]
On Thu, 1 Apr 2004 21:28:20 +0200 (CEST)
Ulrich Weigand <[email protected]> wrote:
> However, I'd say that this should probably be fixed in the kernel,
> e.g. by not reporting high-precision time stamps in the first
> place if the file system cannot store them ...
Interesting. We discussed the case as a theoretical possibility when
the patch was merged, but it seemed too unlikely to make it worth
complicating the first version.
The solution from back then I actually liked best was to just round
up to the next second instead of rounding down when going from 1s
resolution to ns.
-Andi
e.g. like this for ext3 (untested). Does that fix your problem?
diff -u linux-2.6.5rc3-work/fs/ext3/inode.c-o linux-2.6.5rc3-work/fs/ext3/inode.c
--- linux-2.6.5rc3-work/fs/ext3/inode.c-o 2004-04-01 22:07:43.000000000 +0200
+++ linux-2.6.5rc3-work/fs/ext3/inode.c 2004-04-01 22:08:49.000000000 +0200
@@ -2624,9 +2624,11 @@
}
raw_inode->i_links_count = cpu_to_le16(inode->i_nlink);
raw_inode->i_size = cpu_to_le32(ei->i_disksize);
- raw_inode->i_atime = cpu_to_le32(inode->i_atime.tv_sec);
- raw_inode->i_ctime = cpu_to_le32(inode->i_ctime.tv_sec);
- raw_inode->i_mtime = cpu_to_le32(inode->i_mtime.tv_sec);
+ /* round up because we cannot store nanoseconds. This avoids
+ the time jumping back when the inode is loaded again. */
+ raw_inode->i_atime = cpu_to_le32(inode->i_atime.tv_sec + 1);
+ raw_inode->i_ctime = cpu_to_le32(inode->i_ctime.tv_sec + 1);
+ raw_inode->i_mtime = cpu_to_le32(inode->i_mtime.tv_sec + 1);
raw_inode->i_blocks = cpu_to_le32(inode->i_blocks);
raw_inode->i_dtime = cpu_to_le32(ei->i_dtime);
raw_inode->i_flags = cpu_to_le32(ei->i_flags);
On Thu, Apr 01, 2004 at 10:09:57PM +0200, Andi Kleen wrote:
> On Thu, 1 Apr 2004 21:28:20 +0200 (CEST)
> Ulrich Weigand <[email protected]> wrote:
>
> > However, I'd say that this should probably be fixed in the kernel,
> > e.g. by not reporting high-precision time stamps in the first
> > place if the file system cannot store them ...
>
> Interesting. We discussed the case as a theoretical possibility when
> the patch was merged, but it seemed too unlikely to make it worth
> complicating the first version.
>
> The solution from back then I actually liked best was to just round
> up to the next second instead of rounding down when going from 1s
> resolution to ns.
>
> -Andi
>
> e.g. like this for ext3 (untested). Does that fix your problem?
(I haven't tested anything but...) why should this fix it? Ulrich's
problem happens when the .o file is flushed from the cache, and then
stat'd; it now appears to be older than the .c file. With a change to
round up instead, if the .c file is flushed from the cache before the
.o, the .c will still suddenly appear to be newer than the .o.
--
Daniel Jacobowitz
MontaVista Software Debian GNU/Linux Developer
On Thu, 1 Apr 2004 15:39:23 -0500
Daniel Jacobowitz <[email protected]> wrote:
> >
>
> (I haven't tested anything but...) why should this fix it? Ulrich's
> problem happens when the .o file is flushed from the cache, and then
> stat'd; it now appears to be older than the .c file. With a change to
> round up instead, if the .c file is flushed from the cache before the
> .o, the .c will still suddenly appear to be newer than the .o.
That is what he wants, I think. It's logically just like taking a bit longer.
-Andi
Andi Kleen writes:
> The solution from back then I actually liked best was to just round
> up to the next second instead of rounding down when going from 1s
> resolution to ns.
(My understanding of kernel internals is way rusty, so if I am
talking nonsense here, just hit me with a cluestick or ignore
me or something).
Ummm, I think that will just fail in the converse way if
insn-conditions.o is retained in the inode cache while
insn-conditions.c is dropped from the cache?
That is, consider these time-stamps:
insn-conditions.c SSSS.NS1
insn-conditions.o SSSS.NS2
Where SSSS is the same value on both files, and NS2 > NS1.
According to Ulrich, the current problem is that insn-conditions.o
is dropped from the cache, so NS2 becomes 0, and insn-conditions.o
becomes older than insn-conditions.c.
With your patch, suppose that insn-conditions.c is dropped from the
cache, while insn-conditions.o remains. Then the timestamps will be:
insn-conditions.c SSSS+1.0
insn-conditions.o SSSS.NS2
We lose again, insn-conditions.c has become newer than insn-conditions.o.
insn-conditions.c is a generated file so I think this can actually
happen during a gcc build.
Michael C
>
> On Thu, 1 Apr 2004 15:39:23 -0500
> Daniel Jacobowitz <[email protected]> wrote:
> > >
> >
> > (I haven't tested anything but...) why should this fix it? Ulrich's
> > problem happens when the .o file is flushed from the cache, and then
> > stat'd; it now appears to be older than the .c file. With a change to
> > round up instead, if the .c file is flushed from the cache before the
> > .o, the .c will still suddenly appear to be newer than the .o.
>
> That is what he wants, I think. It's logically just like taking a bit longer.
Not really. I have two files generated in sequence:
.c at 08:12:23.1
.o at 08:12:23.2
As long as the .o stays more recent than the .c, everything is fine.
With the current code, this breaks if the .o file is flushed and
re-read, while the .c file stays in cache:
.c at 08:12:23.1
.o at 08:12:23.0 (oops, suddenly in need of rebuild)
With your suggested change, it will break if the .c file is flushed
while the .o file stays in cache (as Daniel pointed out):
.c at 08:12:24.0
.o at 08:12:23.1 (oops, suddenly in need of rebuild)
Both lead to the same type of problem.
Bye,
Ulrich
--
Dr. Ulrich Weigand
[email protected]
On Thu, Apr 01, 2004 at 09:28:20PM +0200, Ulrich Weigand wrote:
> Hello,
>
> I'm seeing a race condition on Linux 2.6 that rather reproducibly
> causes GCC bootstrap failures on current mainline.
We saw lots of parallel build problems when using a 2.6 kernel on
an older distribution. The problems went away when we used 'make'
built with a new version of glibc.
Janis
Janis Johnson wrote:
> On Thu, Apr 01, 2004 at 09:28:20PM +0200, Ulrich Weigand wrote:
> > Hello,
> >
> > I'm seeing a race condition on Linux 2.6 that rather reproducibly
> > causes GCC bootstrap failures on current mainline.
>
> We saw lots of parallel build problems when using a 2.6 kernel on
> an older distribution. The problems went away when we used 'make'
> built with a new version of glibc.
I'm using pretty recent versions of everything: a glibc 2.3.3 as of
2004-03-11, binutils 2.15.90.0.1, make 3.80, and a 2.6.4+ kernel.
Bye,
Ulrich
--
Dr. Ulrich Weigand
[email protected]
On Thu, 1 Apr 2004 23:01:39 +0200 (CEST)
Ulrich Weigand <[email protected]> wrote:
> Both lead to the same type of problem.
Indeed. My change would only provide a lighter guarantee - time never going backwards.
-Andi
[ Linux 2.6 losing the nanoseconds from a file timestamp ]
There are two different failure modes, but in most cases only one
results in a real problem.
Case 1: make falsely thinks that the .o is younger than the .c. It
decides not to rebuild the .o, resulting in a bad build.
Case 2: make falsely thinks that the .c is younger than the .o. It
recompiles the .c file, even though it didn't have to. Harmless.
So if we can make the bad situation look like a tie, and always rebuild
in the case of a tie, we will obtain valid builds, sometimes with
an extra compilation or two.
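A sketch of that policy (illustrative, not actual make source; it
assumes whole-second stored resolution):

#include <stdbool.h>
#include <time.h>

/* Rebuild when the source is newer *or the same age* as the target:
   a tie may hide a source that is really newer by a sub-second
   amount the filesystem could not store. */
static bool needs_rebuild(time_t target_mtime, time_t source_mtime)
{
    return source_mtime >= target_mtime;
}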
Joe Buck <[email protected]> wrote:
> Case 2: make falsely thinks that the .c is younger than the .o. It
> recompiles the .c file, even though it didn't have to. Harmless.
The OP explained how this can be harmful in the case of parallel
builds - the .o file is not updated atomically, so while one part of
the build is (unnecessarily) updating it, another part will fail to
find it.
paul
Joe Buck wrote:
> Case 2: make falsely thinks that the .c is younger than the .o. It
> recompiles the .c file, even though it didn't have to. Harmless.
*Not* harmless, in fact this is exactly what breaks my bootstrap.
Think about what happens when cc1 is 'harmlessly' rebuilt just
while, in a parallel make, that very same cc1 binary is being used
to run a compile ...
Bye,
Ulrich
--
Dr. Ulrich Weigand
[email protected]
On Fri, Apr 02, 2004 at 12:48:55AM +0200, Ulrich Weigand wrote:
> Joe Buck wrote:
>
> > Case 2: make falsely thinks that the .c is younger than the .o. It
> > recompiles the .c file, even though it didn't have to. Harmless.
>
> *Not* harmless, in fact this is exactly what breaks my bootstrap.
>
> Think about what happens when cc1 is 'harmlessly' rebuilt just
> while, in a parallel make, that very same cc1 binary is being used
> to run a compile ...
Well, then, you have a problem: how to handle ties? Consider a system
with no extra precision, and one-second time resolution. You see an .o
file and a .c file that are the same age. Rebuild or not? If we don't
and are wrong, the .o file is bad. If we do and are wrong, the target
will be modified while some process is trying to use it. Somehow we
have to figure out how to do the makefile so that neither problem can
occur.
In your particular case, for example, if the command to rebuild cc1 builds
the new version in a different place, then does an mv, the rebuild *will*
be harmless.
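A sketch of that pattern, with illustrative file names; rename(2)
replaces the target atomically, so no reader ever sees a
half-written binary:

#include <stdio.h>

int install_cc1(void)
{
    /* The build first writes the fresh compiler to a temporary
       name, then renames it over the old one in a single step. */
    if (rename("cc1.new", "cc1") != 0) {
        perror("rename");
        return -1;
    }
    /* A process that already exec'ed the old cc1 keeps running it;
       new invocations see only the complete new binary. */
    return 0;
}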
Andi Kleen wrote:
> > However, I'd say that this should probably be fixed in the kernel,
> > e.g. by not reporting high-precision time stamps in the first
> > place if the file system cannot store them ...
>
> Interesting. We discussed the case as a theoretical possibility when
> the patch was merged, but it seemed too unlikely to make it worth
> complicating the first version.
>
> The solution from back then I actually liked best was to just round
> up to the next second instead of rounding down when going from 1s
> resolution to ns.
Files spontaneously getting newer is also a problem, although the
consequence is usually less severe than spontaneously getting older.
> - raw_inode->i_atime = cpu_to_le32(inode->i_atime.tv_sec);
> - raw_inode->i_ctime = cpu_to_le32(inode->i_ctime.tv_sec);
> - raw_inode->i_mtime = cpu_to_le32(inode->i_mtime.tv_sec);
> + /* round up because we cannot store nanoseconds. This avoids
> + the time jumping back when the inode is loaded again. */
> + raw_inode->i_atime = cpu_to_le32(inode->i_atime.tv_sec + 1);
> + raw_inode->i_ctime = cpu_to_le32(inode->i_ctime.tv_sec + 1);
> + raw_inode->i_mtime = cpu_to_le32(inode->i_mtime.tv_sec + 1);
The patch always increments the stored seconds by one. If an inode
is read, dirtied, then stored, the seconds fields will all be
incremented by 1 every time that happens, won't they? I.e. every
change to atime followed by a flush will increment the seconds
fields of ctime and mtime as well, won't it?
To round up the time properly, I think you need to change the code
that reads the inode, so that newly read inodes get a tv_nsec value of
999999999, and leave the writing code alone.
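A toy model of that creep (plain user-space C, not kernel code,
with made-up numbers):

#include <stdio.h>

int main(void)
{
    long on_disk_sec = 1000;   /* stored mtime, whole seconds only */
    for (int cycle = 1; cycle <= 3; cycle++) {
        long in_memory_sec = on_disk_sec;  /* inode read from disk */
        /* ... an atime update dirties the inode; mtime itself is
           never touched ... */
        on_disk_sec = in_memory_sec + 1;   /* "+1" on every write-out */
        printf("after flush %d: mtime = %ld\n", cycle, on_disk_sec);
    }
    return 0;   /* prints 1001, 1002, 1003: one second per cycle */
}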
-- Jamie
On Thu, Apr 01, 2004 at 03:58:45PM -0800, Joe Buck wrote:
> On Fri, Apr 02, 2004 at 12:48:55AM +0200, Ulrich Weigand wrote:
> > Joe Buck wrote:
> >
> > > Case 2: make falsely thinks that the .c is younger than the .o. It
> > > recompiles the .c file, even though it didn't have to. Harmless.
> >
> > *Not* harmless, in fact this is exactly what breaks my bootstrap.
> >
> > Think about what happens when cc1 is 'harmlessly' rebuilt just
> > while, in a parallel make, that very same cc1 binary is being used
> > to run a compile ...
>
> Well, then, you have a problem: how to handle ties? Consider a system
> with no extra precision, and one-second time resolution. You see an .o
> file and a .c file that are the same age. Rebuild or not? If we don't
> and are wrong, the .o file is bad. If we do and are wrong, the target
> will be modified while some process is trying to use it. Somehow we
> have to figure out how to do the makefile so that neither problem can
> occur.
Don't rebuild. If you have makefiles set up such that the .o and _then_ the
.c can be rebuilt by parallel threads, you've got lots more wrong
already! And it's a reasonable assumption that the entire build, test,
notice something is wrong, fix, rebuild cycle takes longer than the
granularity of your timestamps.
> In your particular case, for example, if the command to rebuild cc1 builds
> the new version in a different place, then does an mv, the rebuild *will*
> be harmless.
No, that's not right - read what Ulrich wrote originally about the
wrong options being in scope (exported by make) at this point ->
bootstrap miscompares.
--
Daniel Jacobowitz
MontaVista Software Debian GNU/Linux Developer
On Thu, Apr 01, 2004 at 01:13:48PM -0800, Janis Johnson wrote:
> We saw lots of parallel build problems when using a 2.6 kernel on
> an older distribution. The problems went away when we used 'make'
> built with a new version of glibc.
Yes, that was a different bug, powerpc glibc specific.
http://sources.redhat.com/ml/libc-alpha/2004-02/msg00037.html
--
Alan Modra
IBM OzLabs - Linux Technology Centre
On Thu, Apr 01, 2004 at 10:09:57PM +0200, Andi Kleen wrote:
> just round up to the next second instead of rounding down when going
> from 1s resolution to ns.
Please don't do that. Longstanding tradition in timestamp code is to
truncate toward minus infinity when converting from a
higher-resolution timestamp to a lower-resolution timestamp. This
is consistent behavior, and is easy to explain: let's stick to it as a
uniform practice.
There are two basic principles here. First, ordinary files should not
change spontaneously: hence a file's timestamp should not change
merely because its inode is no longer cached. Second, a file's
timestamp should never be "in the future": hence one should never
round such timestamps up.
The only way I can see to satisfy these two principles is to truncate
the timestamp right away, when it is first put into the inode cache.
That way, the copy in main memory equals what will be put onto disk.
This is the approach taken by other operating systems like Solaris,
and it explains why parallel GCC builds won't have this problem on
these other systems.
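A sketch of such truncate-on-entry logic (illustrative only, not
actual kernel code; 'gran' stands for the filesystem's stored
granularity in nanoseconds, e.g. 1000000000 for ext3 or 2000000000
for FAT):

#include <time.h>

/* Clamp a timestamp to the filesystem's granularity the moment it
   is generated, so the cached copy always equals what will reach
   the disk. */
struct timespec fs_trunc(struct timespec ts, unsigned long gran)
{
    if (gran >= 1000000000UL) {
        /* coarse granularities are whole multiples of a second */
        ts.tv_sec -= ts.tv_sec % (time_t)(gran / 1000000000UL);
        ts.tv_nsec = 0;
    } else if (gran > 1) {
        ts.tv_nsec -= ts.tv_nsec % (long)gran;
    }
    return ts;
}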
Switching subjects slightly, in
<http://mail.gnu.org/archive/html/bug-coreutils/2004-03/msg00095.html>
I recently contributed code to coreutils that fixes some bugs with "cp
--update" and "mv --update" when files are copied from
high-resolution-timestamp file systems to low-resolution-timestamp
file systems. This code dynamically determines the timestamp
resolution of a file system by examining (and possibly mutating) its
timestamps. The current Linux+ext3 behavior (which I did not know
about) breaks this code, because it can cause "cp" to falsely think
that ext3 has nanosecond-resolution timestamps.
How long has the current Linux+ext3 behavior been in place? If it's
widespread, I'll probably have to think about adding a workaround to
coreutils. Does the behavior affect all Linux filesystems, or just
ext3?
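A rough sketch of the probing idea (the real coreutils code is
considerably more careful; this assumes glibc's st_mtim member):
set a timestamp with a known sub-second part, read it back, and let
whatever survives bound the stored resolution. The Linux+ext3
behavior defeats exactly this, because the read-back reports the
cached nanoseconds until the inode is evicted:

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/stat.h>
#include <sys/time.h>

/* Returns a claimed mtime resolution in nanoseconds, or -1. */
long probe_mtime_resolution(const char *path)
{
    struct timeval tv[2] = {
        { .tv_sec = 1000000000, .tv_usec = 1 },  /* atime */
        { .tv_sec = 1000000000, .tv_usec = 1 },  /* mtime, 1us set */
    };
    struct stat st;
    if (utimes(path, tv) != 0 || stat(path, &st) != 0)
        return -1;
    /* On ext3 this reports microsecond survival (1000 ns) even
       though only whole seconds are stored, until the inode is
       dropped from the cache. */
    return st.st_mtim.tv_nsec != 0 ? 1000 : 1000000000;
}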
I'll CC: this message to bug-coreutils to give them a heads-up.
Andi Kleen <[email protected]> wrote:
>
> The solution from back then I actually liked best was to just round
> up to the next second instead of rounding down when going from 1s
> resolution to ns.
>
> -Andi
>
> e.g. like this for ext3 (untested). Does that fix your problem?
>
> diff -u linux-2.6.5rc3-work/fs/ext3/inode.c-o linux-2.6.5rc3-work/fs/ext3/inode.c
> --- linux-2.6.5rc3-work/fs/ext3/inode.c-o 2004-04-01 22:07:43.000000000 +0200
> +++ linux-2.6.5rc3-work/fs/ext3/inode.c 2004-04-01 22:08:49.000000000 +0200
> @@ -2624,9 +2624,11 @@
> }
> raw_inode->i_links_count = cpu_to_le16(inode->i_nlink);
> raw_inode->i_size = cpu_to_le32(ei->i_disksize);
> - raw_inode->i_atime = cpu_to_le32(inode->i_atime.tv_sec);
> - raw_inode->i_ctime = cpu_to_le32(inode->i_ctime.tv_sec);
> - raw_inode->i_mtime = cpu_to_le32(inode->i_mtime.tv_sec);
> + /* round up because we cannot store nanoseconds. This avoids
> + the time jumping back when the inode is loaded again. */
> + raw_inode->i_atime = cpu_to_le32(inode->i_atime.tv_sec + 1);
> + raw_inode->i_ctime = cpu_to_le32(inode->i_ctime.tv_sec + 1);
> + raw_inode->i_mtime = cpu_to_le32(inode->i_mtime.tv_sec + 1);
> raw_inode->i_blocks = cpu_to_le32(inode->i_blocks);
> raw_inode->i_dtime = cpu_to_le32(ei->i_dtime);
> raw_inode->i_flags = cpu_to_le32(ei->i_flags);
I think this will cause the inode timestamps to keep on creeping forwards.
How about in ext3_read_inode() you do:
inode->i_atime.tv_sec = le32_to_cpu(raw_inode->i_atime);
inode->i_ctime.tv_sec = le32_to_cpu(raw_inode->i_ctime);
inode->i_mtime.tv_sec = le32_to_cpu(raw_inode->i_mtime);
- inode->i_atime.tv_nsec = inode->i_ctime.tv_nsec = inode->i_mtime.tv_nsec = 0;
+ inode->i_atime.tv_nsec = inode->i_ctime.tv_nsec = inode->i_mtime.tv_nsec = 999999999;
?
It still has problems, but I think they're smaller ones.
Paul Eggert wrote:
> There are two basic principles here. First, ordinary files should not
> change spontaneously: hence a file's timestamp should not change
> merely because its inode is no longer cached. Second, a file's
> timestamp should never be "in the future": hence one should never
> round such timestamps up.
We can resolve the second requirement (but not the first) in a
different way, by adjusting the timestamp when the inode is re-read
from disk.
When re-reading an inode, rounding the time up is done by setting the
tv_nsec field to 999999999.
If the on-disk timestamp is "now", i.e. the current second on a
filesystem with 1-second resolution, then we can avoid setting the
timestamp to a
future time by setting the tv_nsec field to the current wall time's
nanosecond value. There is no need to round the time up any more than that.
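In other words, a sketch of that rule (illustrative, not proposed
kernel code):

#include <time.h>

/* Fill in tv_nsec for a timestamp re-read from a whole-seconds
   filesystem: round up within the second, except when the stored
   second is the current one, in which case use the wall clock's
   nanoseconds so the result is never in the future. */
struct timespec fill_nsec(time_t stored_sec, struct timespec now)
{
    struct timespec ts = { .tv_sec = stored_sec, .tv_nsec = 999999999 };
    if (stored_sec == now.tv_sec)
        ts.tv_nsec = now.tv_nsec;
    return ts;
}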
However, spontaneous mtime changes are not polite. So I broadly agree
with the principle of:
> The only way I can see to satisfy these two principles is to truncate
> the timestamp right away, when it is first put into the inode cache.
> That way, the copy in main memory equals what will be put onto disk.
> This is the approach taken by other operating systems like Solaris,
> and it explains why parallel GCC builds won't have this problem on
> these other systems.
> How long has the current Linux+ext3 behavior been in place? If it's
> widespread, I'll probably have to think about adding a workaround to
> coreutils. Does the behavior affect all Linux filesystems, or just
> ext3?
All Linux filesystems - the nanoseconds field is retained on in-memory
inodes by the generic VFS code. The stored resolution varies among
filesystems, with the coarsest being 2 seconds (FAT), and some do
store nanoseconds. AFAIK there is no way to determine the stored
resolution using file operations alone.
This behaviour was established in 2.5.48, 18th November 2002.
The behaviour might not be restricted to Linux, because non-Linux NFS
clients may be connected to a Linux NFS server which has this behaviour.
-- Jamie
>>>>> "Jamie" == Jamie Lokier <[email protected]> writes:
Jamie> When re-reading an inode, rounding the time up is done by
Jamie> setting the tv_nsec field to 999999999.
Jamie> If the on-disk timestamp is "now", i.e. the current second on a
Jamie> filesystem with 1-second resolution, then we can avoid setting the
Jamie> timestamp to a future time by setting the tv_nsec field to the
Jamie> current wall time's nanosecond value. There is no need to
Jamie> round the time up any more than that.
Given the time it takes to compare the file's timestamp with the
current time before choosing 999999999 or "now" for the tv_nsec field,
is it a reasonable shortcut to just always use now's nsec value?
Obviously it is not *that* many cycles to do the compare, but we are
talking about a nanoseconds field, and the current tv_sec could
increment during the compare....
-JimC
Ulrich Weigand wrote:
> Hello,
>
> I'm seeing a race condition on Linux 2.6 that rather reproducibly
> causes GCC bootstrap failures on current mainline.
Ho hum.
I knew this was going to cause problems:
http://www.ussg.iu.edu/hypermail/linux/kernel/0110.1/0017.html
Pádraig.
Jamie Lokier <[email protected]> writes:
> All Linux filesystems - the nanoseconds field is retained on in-memory
> inodes by the generic VFS code.
It's OK to do that, so long as 'stat' and 'fstat' truncate the
user-visible time stamps to the resolution of the filesystem. This
shouldn't cost much.
> AFAIK there is no way to determine the stored resolution using file
> operations alone.
Would it be easy to add one? For example, we might extend pathconf so
that pathconf(filename, _PC_MTIME_DELTA) returns the file system's
mtime stamp resolution in nanoseconds.
I write "mtime" because I understand that some Microsoft file systems
use different resolutions for mtime versus ctime versus atime, and
mtime resolution is all that I need for now. Also, the NFSv3 protocol
supports a delta quantity that tells the NFS client the mtime
resolution on the NFS server, so if you assume NFSv3 or better the
time stamp resolution is known for remote servers too.
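Usage would presumably look like this; _PC_MTIME_DELTA is the
hypothetical constant being proposed, not an existing one:

#include <stdio.h>
#include <unistd.h>

void show_mtime_resolution(const char *path)
{
#ifdef _PC_MTIME_DELTA
    long res_ns = pathconf(path, _PC_MTIME_DELTA);
    if (res_ns > 0)
        printf("%s: mtime resolution = %ld ns\n", path, res_ns);
#else
    (void)path;   /* the proposed extension is not available here */
#endif
}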
> This behaviour was established in 2.5.48, 18th November 2002.
> The behaviour might not be restricted to Linux, because non-Linux NFS
> clients may be connected to a Linux NFS server which has this behaviour.
Ouch. Then it sounds like there's no easy workaround for existing
systems. Still it'd be nice to fix the bug for future systems.
Paul Eggert wrote:
> > AFAIK there is no way to determine the stored resolution using file
> > operations alone.
>
> Would it be easy to add one? For example, we might extend pathconf so
> that pathconf(filename, _PC_MTIME_DELTA) returns the file system's
> mtime stamp resolution in nanoseconds.
pathconf() and fpathconf() are the obvious POSIXy interfaces for it.
Other possibilities are getxattr(), lgetxattr() and fgetxattr().
The only thing I don't like is that some caching algorithms will need
to make 2 system calls for each file being checked, instead of 1. I
see no way around that, though. At least the attribute approach would
allow all three (different) delta values to be read in one call (listxattr).
Is there a de facto standard interface used by another OS for this?
> I write "mtime" because I understand that some Microsoft file systems
> use different resolutions for mtime versus ctime versus atime, and
> mtime resolution is all that I need for now.
I didn't know that, thanks.
> Also, the NFSv3 protocol supports a delta quantity that tells the
> NFS client the mtime resolution on the NFS server, so if you assume
> NFSv3 or better the time stamp resolution is known for remote
> servers too.
Nice!
-- Jamie
On Apr 1, 2004, Ulrich Weigand <[email protected]> wrote:
> - STAMP = echo timestamp >
> + STAMP = sleep 1 >
I don't think this will fix the problem, at least not portably.
sleep 1 > filename
will truncate filename before sleep starts, modifying its timestamp at
that point, and leave it unchanged afterwards. Some systems might
update the timestamp again when the file truncated&opened for writing
is closed, but I don't think this is required. Worse yet: some
systems don't support empty files, and will error out because sleep 1
produced no output. Also, since some filesystems don't have 1-second
granularity, you should probably use `sleep 2' instead.
A more portable way to spell it would be:
STAMP = sleep 2; echo timestamp >
or, in order to make $(STAMP) usable in the middle of &&/|| sequences:
STAMP = { sleep 2; echo timestamp; } >
--
Alexandre Oliva http://www.ic.unicamp.br/~oliva/
Red Hat Compiler Engineer aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist oliva@{lsd.ic.unicamp.br, gnu.org}
Jamie Lokier <[email protected]> writes:
> pathconf() and fpathconf() are the obvious POSIXy interfaces for it.
> Other possibilities are getxattr(), lgetxattr() and fgetxattr().
I didn't know about getxattr etc. They would work too.
> > The only thing I don't like is that some caching algorithms will need
> to make 2 system calls for each file being checked, instead of 1.
Do you mean for mtime versus atime (versus ctime)? Yes, in that case
getxattr etc. would be a better choice.
How hard would this be to do? (Is it something you can do? :-)
Coreutils CVS assumes that the time stamp resolution is the same for
all files within the same file system. Is this a safe assumption
under Linux? I now worry that some NFS implementations might violate
that assumption, if a remote host is exporting several native file
systems, with different native resolutions, to the local host under a
single mount point. On the other hand, NFSv3 and NFSv4 clearly state
that the time stamp resolution is a per-filesystem concept, so perhaps
we should just consider that to be a buggy NFS server configuration.
> Is there a de facto standard interface used by another OS for this?
Not as far as I know, no.
Paul Eggert wrote:
> Jamie Lokier <[email protected]> writes:
> > The only thing I don't like is that some caching algorithms will need
> > to make 2 system calls for each file being checked, instead of 1.
>
> Do you mean for mtime versus atime (versus ctime)? Yes, in that case
> getxattr etc. would be a better choice.
No, I mean that they currently call fstat(). In future they'd need to
call fstat()+getxattr().
If it's a per-filesystem quantity, then of course fstat() is all they
need. So that would be great.
> Coreutils CVS assumes that the time stamp resolution is the same for
> all files within the same file system. Is this a safe assumption
> under Linux? I now worry that some NFS implementations might violate
> that assumption, if a remote host is exporting several native file
> systems, with different native resolutions, to the local host under a
> single mount point. On the other hand, NFSv3 and NFSv4 clearly state
> that the time stamp resolution is a per-filesystem concept, so perhaps
> we should just consider that to be a buggy NFS server configuration.
Buggy, perhaps, but quite useful sometimes. It's not a problem if
we're clear that it's a per-filesystem quantity. That kind of NFS
server configuration can advertise the coarsest timestamp resolution
of all the underlying filesystems.
NFS is not the only remote filesystem, and some like Samba can
certainly span multiple underlying filesystems without violating any
specification.
With that in mind, we'd need to be clear that the resolution actually
stored may exceed the resolution advertised. I don't know whether
that breaks coreutils' assumption.
-- Jamie
Jamie Lokier <[email protected]> writes:
>> Do you mean for mtime versus atime (versus ctime)? Yes, in that case
>> getxattr etc. would be a better choice.
>
> No, I mean that they currently call fstat(). In future they'd need to
> call fstat()+getxattr().
Coreutils currently assumes that the time stamp resolution is a
per-filesystem quantity, and it keeps track of all the filesystems
that it's seen, so the number of extra calls to getxattr for this
purpose would be quite small. This is all assuming that other
programs aren't mutating the file system mounts while 'cp' is running,
but that assumption is already hardwired in several other places.
> With that in mind, we'd need to be clear that the resolution actually
> stored may exceed the resolution advertised. I don't know whether
> that breaks coreutils' assumption.
I think it'd be good enough for coreutils.
What's the next step to get this sort of thing running? (I haven't
had much luck getting my (rare) Linux patches accepted....)
PS. While we're on the subject, I'd like to add a utimens system
call, which behaves like utime/utimes except that it specifies a
nanosecond-resolution time stamp. This will allow programs like 'cp
-p', 'mv', and 'tar' to copy timestamps correctly; currently they lose
the low order part of the time stamps when copying.
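The loss is easy to see in a self-contained illustration (this
shows today's utimes() limitation, not the proposed utimens()
itself): a nanosecond timestamp squeezed through struct timeval
keeps only microseconds:

#include <stdio.h>
#include <sys/time.h>
#include <time.h>

int main(void)
{
    struct timespec src = { .tv_sec = 1080860543, .tv_nsec = 123456789 };
    /* What 'cp -p' is forced to pass through utimes() today: */
    struct timeval tv = { .tv_sec = src.tv_sec,
                          .tv_usec = src.tv_nsec / 1000 };
    printf("source mtime: %ld.%09ld\n", (long)src.tv_sec, (long)src.tv_nsec);
    printf("via utimes(): %ld.%06ld000\n",
           (long)tv.tv_sec, (long)tv.tv_usec);
    return 0;
}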
On Fri, Apr 02, 2004 at 02:14:11AM +0100, Jamie Lokier wrote:
> However, spontaneous mtime changes are not polite. So I broadly agree
> with the principle of:
>
> Paul Eggert wrote:
> > The only way I can see to satisfy these two principles is to truncate
> > the timestamp right away, when it is first put into the inode cache.
> > That way, the copy in main memory equals what will be put onto disk.
> > This is the approach taken by other operating systems like Solaris,
> > and it explains why parallel GCC builds won't have this problem on
> > these other systems.
So is there any chance in the world that this behavior could be
implemented? None of the alternatives work, and we now know that the
problem bites. (I can't even guess how much time Ulrich wasted
diagnosing it.)
> This behaviour was established in 2.5.48, 18th November 2002.
And shown to be broken in October.
Andrew
On Thu, 1 April 2004 16:37:15 -0800, Andrew Morton wrote:
>
> I think this will cause the inode timestamps to keep on creeping forwards.
>
> How about in ext3_read_inode() you do:
>
> inode->i_atime.tv_sec = le32_to_cpu(raw_inode->i_atime);
> inode->i_ctime.tv_sec = le32_to_cpu(raw_inode->i_ctime);
> inode->i_mtime.tv_sec = le32_to_cpu(raw_inode->i_mtime);
> - inode->i_atime.tv_nsec = inode->i_ctime.tv_nsec = inode->i_mtime.tv_nsec = 0;
> + inode->i_atime.tv_nsec = inode->i_ctime.tv_nsec = inode->i_mtime.tv_nsec = 999999999;
Coming in way too late, how about changing the other end? Each
filesystem provides a new function that transforms high resolution
time into whatever the filesystem can store. If the function is NULL,
we use a sane default like above.
- inode->i_atime.tv_nsec = inode->i_ctime.tv_nsec = inode->i_mtime.tv_nsec = 0;
If the user never sees the high resolution in the first place, we
don't need to play guessing games later, after data has been flushed
from the page cache.
Jörn
--
The competent programmer is fully aware of the strictly limited size of
his own skull; therefore he approaches the programming task in full
humility, and among other things he avoids clever tricks like the plague.
-- Edsger W. Dijkstra