2010-07-02 07:42:11

by Jiri Olsa

[permalink] [raw]
Subject: [PATCH] time/fs - file's time race with vgettimeofday

hi,

there's a race among calling gettimeofday(2) and a file's time
updates. Following test program expose the race.

run it in the while loop
while [ 1 ]; do ./test1 || break; done

--- SNIP ---
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>

int main (void)
{
struct stat st;
struct timeval tv;

unlink("./file");

gettimeofday(&tv, NULL);

if (-1 == creat("./file", O_RDWR)) {
perror("creat");
return -1;
}

if (stat("./file", &st) != 0) {
perror("stat");
return -1;
}

printf("USER gtod: %ld\n", (long)tv.tv_sec);
printf("USER file: %ld.%09u\n",
(long) st.st_mtime,
(unsigned) st.st_mtim.tv_nsec);

return tv.tv_sec <= st.st_mtime ? 0 : -1;
}
--- SNIP ---


The point is that the stat call returns time that
sometime precedes time from gettimeofday.

The reason follows.

- inode's time is initialized by CURRENT_TIME_SEC macro,
which returns tv_sec portion of xtime variable
- the xtime is updated by update_wall_time being called
each tick (not that often for NO_HZ)
- vgettimeofday reads the actuall clocksource tick
and computes the time

Thus while the inode's time is based on the xtime update
by the update_wall_time function, the vgettimeofday computes
the time correctly each time it's called.

Thus the race is triggered within 2 update_wall_time updates,
when in between the gettimeofday and creat calls happened.



ticks CPU update_wall_time executed
-------------------------------------------------------------------------------
t1
update 1
(xtime is computed based on tick t1)


t2

| gettimeofday |
| (returns time based on tick 2) |
| |
| creat |
| (set time based on tick 1) |


update 2
(xtime is computed based on tick t2)
t3
-------------------------------------------------------------------------------



This issue was described in the BZ 244697

Time goes backward - gettimeofday() vs. rename()
https://bugzilla.redhat.com/show_bug.cgi?id=244697


It's not just issue of the creat but few others like rename.


The following patch will prevent the race by adding the
CURRENT_TIME_SEC_REAL macro, which will return seconds from
the getnstimeofday call, ensuring it's computed on current tick.
It fixes the 'creat' case for ext4.


I'm not sure how much trouble is having this race unfixed compared
to the performance impact the fix might have. Maybe there're
better ways to fix this.


thanks for any ideas,
jirka



Signed-off-by: Jiri Olsa <[email protected]>
---
diff --git a/fs/ext2/ialloc.c b/fs/ext2/ialloc.c
index 938dbc7..7a0a2fc 100644
--- a/fs/ext2/ialloc.c
+++ b/fs/ext2/ialloc.c
@@ -558,7 +558,7 @@ got:

inode->i_ino = ino;
inode->i_blocks = 0;
- inode->i_mtime = inode->i_atime = inode->i_ctime = CURRENT_TIME_SEC;
+ inode->i_mtime = inode->i_atime = inode->i_ctime = CURRENT_TIME_SEC_REAL;
memset(ei->i_data, 0, sizeof(ei->i_data));
ei->i_flags =
ext2_mask_flags(mode, EXT2_I(dir)->i_flags & EXT2_FL_INHERITED);
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 19a4de5..2c2925c 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1157,7 +1157,7 @@ static inline struct ext4_inode_info *EXT4_I(struct inode *inode)
static inline struct timespec ext4_current_time(struct inode *inode)
{
return (inode->i_sb->s_time_gran < NSEC_PER_SEC) ?
- current_fs_time(inode->i_sb) : CURRENT_TIME_SEC;
+ current_fs_time(inode->i_sb) : CURRENT_TIME_SEC_REAL;
}

static inline int ext4_valid_inum(struct super_block *sb, unsigned long ino)
diff --git a/include/linux/time.h b/include/linux/time.h
index ea3559f..f390687 100644
--- a/include/linux/time.h
+++ b/include/linux/time.h
@@ -109,12 +109,14 @@ void timekeeping_init(void);
extern int timekeeping_suspended;

unsigned long get_seconds(void);
+unsigned long get_seconds_real(void);
struct timespec current_kernel_time(void);
struct timespec __current_kernel_time(void); /* does not hold xtime_lock */
struct timespec get_monotonic_coarse(void);

#define CURRENT_TIME (current_kernel_time())
#define CURRENT_TIME_SEC ((struct timespec) { get_seconds(), 0 })
+#define CURRENT_TIME_SEC_REAL ((struct timespec) { get_seconds_real(), 0 })

/* Some architectures do not supply their own clocksource.
* This is mainly the case in architectures that get their
diff --git a/kernel/time.c b/kernel/time.c
index 848b1c2..ce10dae 100644
--- a/kernel/time.c
+++ b/kernel/time.c
@@ -227,7 +227,8 @@ SYSCALL_DEFINE1(adjtimex, struct timex __user *, txc_p)
*/
struct timespec current_fs_time(struct super_block *sb)
{
- struct timespec now = current_kernel_time();
+ struct timespec now;
+ getnstimeofday(&now);
return timespec_trunc(now, sb->s_time_gran);
}
EXPORT_SYMBOL(current_fs_time);
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index caf8d4d..7ebfe23 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -897,6 +897,14 @@ unsigned long get_seconds(void)
}
EXPORT_SYMBOL(get_seconds);

+unsigned long get_seconds_real(void)
+{
+ struct timespec ts;
+ getnstimeofday(&ts);
+ return ts.tv_sec;
+}
+EXPORT_SYMBOL(get_seconds_real);
+
struct timespec __current_kernel_time(void)
{
return xtime;


2010-07-02 16:16:19

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] time/fs - file's time race with vgettimeofday

In no way I can review this patch, but I am curious and have the questions.
Also, I think it makes sense to cc the fs/ developers, I've added Al.

On 07/02, Jiri Olsa wrote:
>
> there's a race among calling gettimeofday(2) and a file's time
> updates. Following test program expose the race.
>
> run it in the while loop
> while [ 1 ]; do ./test1 || break; done
>
> --- SNIP ---
> #include <stdio.h>
> #include <stdlib.h>
> #include <fcntl.h>
>
> int main (void)
> {
> struct stat st;
> struct timeval tv;
>
> unlink("./file");
>
> gettimeofday(&tv, NULL);
>
> if (-1 == creat("./file", O_RDWR)) {
> perror("creat");
> return -1;
> }
>
> if (stat("./file", &st) != 0) {
> perror("stat");
> return -1;
> }
>
> printf("USER gtod: %ld\n", (long)tv.tv_sec);
> printf("USER file: %ld.%09u\n",
> (long) st.st_mtime,
> (unsigned) st.st_mtim.tv_nsec);
>
> return tv.tv_sec <= st.st_mtime ? 0 : -1;
> }

Interesting. To the point, I actually compiled this code and yes,
it triggers the problem on ext3 ;)

> The following patch will prevent the race by adding the
> CURRENT_TIME_SEC_REAL macro, which will return seconds from
> the getnstimeofday call, ensuring it's computed on current tick.
> It fixes the 'creat' case for ext4.

What about other filesystems? Perhaps it makes sense to change
CURRENT_TIME_SEC instead of adding CURRENT_TIME_SEC_REAL?

Once again, I am asking. It is not that I suggest to change your patch.



But there is something I can't understand.

We have

#define CURRENT_TIME_SEC ((struct timespec) { get_seconds(), 0 })

and your patch adds

#define CURRENT_TIME_SEC_REAL ((struct timespec) { get_seconds_real(), 0 })

which fixes the problem for ext4.

But, get_seconds() is also used by sys_time(), and we should have the
same problem with another trivial test-case:

#include <stdio.h>
#include <sys/time.h>
#include <time.h>

int main(void)
{
struct timeval tv;
int sec;

do {
gettimeofday(&tv, NULL);
sec = time(NULL);
} while (tv.tv_sec <= sec);

printf("gtod: %ld.%06ld\n", tv.tv_sec, tv.tv_usec);
printf("time: %d.000000\n", sec);
return 0;
}

However, this test-case can't trigger the problem. Confused.

Oleg.

>
> I'm not sure how much trouble is having this race unfixed compared
> to the performance impact the fix might have. Maybe there're
> better ways to fix this.
>
>
> thanks for any ideas,
> jirka
>
>
>
> Signed-off-by: Jiri Olsa <[email protected]>
> ---
> diff --git a/fs/ext2/ialloc.c b/fs/ext2/ialloc.c
> index 938dbc7..7a0a2fc 100644
> --- a/fs/ext2/ialloc.c
> +++ b/fs/ext2/ialloc.c
> @@ -558,7 +558,7 @@ got:
>
> inode->i_ino = ino;
> inode->i_blocks = 0;
> - inode->i_mtime = inode->i_atime = inode->i_ctime = CURRENT_TIME_SEC;
> + inode->i_mtime = inode->i_atime = inode->i_ctime = CURRENT_TIME_SEC_REAL;
> memset(ei->i_data, 0, sizeof(ei->i_data));
> ei->i_flags =
> ext2_mask_flags(mode, EXT2_I(dir)->i_flags & EXT2_FL_INHERITED);
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 19a4de5..2c2925c 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -1157,7 +1157,7 @@ static inline struct ext4_inode_info *EXT4_I(struct inode *inode)
> static inline struct timespec ext4_current_time(struct inode *inode)
> {
> return (inode->i_sb->s_time_gran < NSEC_PER_SEC) ?
> - current_fs_time(inode->i_sb) : CURRENT_TIME_SEC;
> + current_fs_time(inode->i_sb) : CURRENT_TIME_SEC_REAL;
> }
>
> static inline int ext4_valid_inum(struct super_block *sb, unsigned long ino)
> diff --git a/include/linux/time.h b/include/linux/time.h
> index ea3559f..f390687 100644
> --- a/include/linux/time.h
> +++ b/include/linux/time.h
> @@ -109,12 +109,14 @@ void timekeeping_init(void);
> extern int timekeeping_suspended;
>
> unsigned long get_seconds(void);
> +unsigned long get_seconds_real(void);
> struct timespec current_kernel_time(void);
> struct timespec __current_kernel_time(void); /* does not hold xtime_lock */
> struct timespec get_monotonic_coarse(void);
>
> #define CURRENT_TIME (current_kernel_time())
> #define CURRENT_TIME_SEC ((struct timespec) { get_seconds(), 0 })
> +#define CURRENT_TIME_SEC_REAL ((struct timespec) { get_seconds_real(), 0 })
>
> /* Some architectures do not supply their own clocksource.
> * This is mainly the case in architectures that get their
> diff --git a/kernel/time.c b/kernel/time.c
> index 848b1c2..ce10dae 100644
> --- a/kernel/time.c
> +++ b/kernel/time.c
> @@ -227,7 +227,8 @@ SYSCALL_DEFINE1(adjtimex, struct timex __user *, txc_p)
> */
> struct timespec current_fs_time(struct super_block *sb)
> {
> - struct timespec now = current_kernel_time();
> + struct timespec now;
> + getnstimeofday(&now);
> return timespec_trunc(now, sb->s_time_gran);
> }
> EXPORT_SYMBOL(current_fs_time);
> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> index caf8d4d..7ebfe23 100644
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -897,6 +897,14 @@ unsigned long get_seconds(void)
> }
> EXPORT_SYMBOL(get_seconds);
>
> +unsigned long get_seconds_real(void)
> +{
> + struct timespec ts;
> + getnstimeofday(&ts);
> + return ts.tv_sec;
> +}
> +EXPORT_SYMBOL(get_seconds_real);
> +
> struct timespec __current_kernel_time(void)
> {
> return xtime;

2010-07-02 16:22:28

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] time/fs - file's time race with vgettimeofday

On 07/02, Oleg Nesterov wrote:
>
> But, get_seconds() is also used by sys_time(), and we should have the
> same problem with another trivial test-case:
>
> #include <stdio.h>
> #include <sys/time.h>
> #include <time.h>
>
> int main(void)
> {
> struct timeval tv;
> int sec;
>
> do {
> gettimeofday(&tv, NULL);
> sec = time(NULL);
> } while (tv.tv_sec <= sec);
>
> printf("gtod: %ld.%06ld\n", tv.tv_sec, tv.tv_usec);
> printf("time: %d.000000\n", sec);
> return 0;
> }
>
> However, this test-case can't trigger the problem. Confused.

Aha. libc's time() doesn't use sys_time(), it uses __vsyscall(1)
vtime()->do_vgettimeofday().

This one quickly triggers the "time goes backward" case.

#include <stdio.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>

int main(void)
{
struct timeval tv;
int sec;

do {
gettimeofday(&tv, NULL);
sec = syscall(__NR_time, NULL);
} while (tv.tv_sec <= sec);

printf("gtod: %ld.%06ld\n", tv.tv_sec, tv.tv_usec);
printf("time: %d.000000\n", sec);
return 0;
}

Not that I think this "problem" should be fixed, just curious.

Oleg.

2010-07-02 23:58:45

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH] time/fs - file's time race with vgettimeofday

On Fri, Jul 02, 2010 at 06:14:22PM +0200, Oleg Nesterov wrote:
> In no way I can review this patch, but I am curious and have the questions.
> Also, I think it makes sense to cc the fs/ developers, I've added Al.

thanks

>
> On 07/02, Jiri Olsa wrote:
> >
> > there's a race among calling gettimeofday(2) and a file's time
> > updates. Following test program expose the race.
> >
> > run it in the while loop
> > while [ 1 ]; do ./test1 || break; done
> >
> > --- SNIP ---
> > #include <stdio.h>
> > #include <stdlib.h>
> > #include <fcntl.h>
> >
> > int main (void)
> > {
> > struct stat st;
> > struct timeval tv;
> >
> > unlink("./file");
> >
> > gettimeofday(&tv, NULL);
> >
> > if (-1 == creat("./file", O_RDWR)) {
> > perror("creat");
> > return -1;
> > }
> >
> > if (stat("./file", &st) != 0) {
> > perror("stat");
> > return -1;
> > }
> >
> > printf("USER gtod: %ld\n", (long)tv.tv_sec);
> > printf("USER file: %ld.%09u\n",
> > (long) st.st_mtime,
> > (unsigned) st.st_mtim.tv_nsec);
> >
> > return tv.tv_sec <= st.st_mtime ? 0 : -1;
> > }
>
> Interesting. To the point, I actually compiled this code and yes,
> it triggers the problem on ext3 ;)
>
> > The following patch will prevent the race by adding the
> > CURRENT_TIME_SEC_REAL macro, which will return seconds from
> > the getnstimeofday call, ensuring it's computed on current tick.
> > It fixes the 'creat' case for ext4.
>
> What about other filesystems? Perhaps it makes sense to change
> CURRENT_TIME_SEC instead of adding CURRENT_TIME_SEC_REAL?
>
> Once again, I am asking. It is not that I suggest to change your patch.

well, the patch is more or less to prove the problem exists and
it could be fixed :) also I'm not sure that all the places using
CURRENT_TIME_SEC suffer the same issue..

I'm currenty looking to the code and trying to come up with better
solution.. any ideas are welcome :)


jirka

2010-07-06 23:03:33

by john stultz

[permalink] [raw]
Subject: Re: [PATCH] time/fs - file's time race with vgettimeofday

On Fri, Jul 2, 2010 at 12:41 AM, Jiri Olsa <[email protected]> wrote:
> hi,
>
> there's a race among calling gettimeofday(2) and a file's time
> updates. ?Following test program expose the race.
>
> run it in the while loop
> ? ? ? ?while [ 1 ]; do ./test1 || break; done
>
> --- SNIP ---
> #include <stdio.h>
> #include <stdlib.h>
> #include <fcntl.h>
>
> int main (void)
> {
> ? ? ? ?struct stat st;
> ? ? ? ?struct timeval tv;
>
> ? ? ? ?unlink("./file");
>
> ? ? ? ?gettimeofday(&tv, NULL);
>
> ? ? ? ?if (-1 == creat("./file", O_RDWR)) {
> ? ? ? ? ? ? ? ?perror("creat");
> ? ? ? ? ? ? ? ?return -1;
> ? ? ? ?}
>
> ? ? ? ?if (stat("./file", &st) != 0) {
> ? ? ? ? ? ? ? ?perror("stat");
> ? ? ? ? ? ? ? ?return -1;
> ? ? ? ?}
>
> ? ? ? ?printf("USER gtod: %ld\n", (long)tv.tv_sec);
> ? ? ? ?printf("USER file: %ld.%09u\n",
> ? ? ? ? ? ? ? ? ? ? ? ?(long) st.st_mtime,
> ? ? ? ? ? ? ? ? ? ? ? ?(unsigned) st.st_mtim.tv_nsec);
>
> ? ? ? ?return tv.tv_sec <= st.st_mtime ? 0 : -1;
> }
> --- SNIP ---
>
>
> The point is that the stat call returns time that
> sometime precedes time from gettimeofday.
>
> The reason follows.
>
> ? ? ? ?- inode's time is initialized by CURRENT_TIME_SEC macro,
> ? ? ? ? ?which returns tv_sec portion of xtime variable
> ? ? ? ?- the xtime is updated by update_wall_time being called
> ? ? ? ? ?each tick (not that often for NO_HZ)
> ? ? ? ?- vgettimeofday reads the actuall clocksource tick
> ? ? ? ? ?and computes the time
>
> Thus while the inode's time is based on the xtime update
> by the update_wall_time function, the vgettimeofday computes
> the time correctly each time it's called.
>
> Thus the race is triggered within 2 update_wall_time updates,
> when in between the gettimeofday and creat calls happened.
>
>
>
> ticks ? ? ? ? ? ? ? ? ? CPU ? ? ? ? ? ? ? ? ? update_wall_time executed
> -------------------------------------------------------------------------------
> ?t1
> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?update 1
> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? (xtime is computed based on tick t1)
>
>
> ?t2
>
> ? ? ? | ? ? ? ? gettimeofday ? ? ? ? ? |
> ? ? ? | (returns time based on tick 2) |
> ? ? ? | ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?|
> ? ? ? | ? ? ? ? ? ?creat ? ? ? ? ? ? ? |
> ? ? ? | ? (set time based on tick 1) ? |
>
>
> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?update 2
> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? (xtime is computed based on tick t2)
> ?t3
> -------------------------------------------------------------------------------
>
>
>
> This issue was described in the BZ 244697
>
> ? ? ? ?Time goes backward - gettimeofday() vs. rename()
> ? ? ? ?https://bugzilla.redhat.com/show_bug.cgi?id=244697
>
>
> It's not just issue of the creat but few others like rename.
>
>
> The following patch will prevent the race by adding the
> CURRENT_TIME_SEC_REAL macro, which will return seconds from
> the getnstimeofday call, ensuring it's computed on current tick.
> It fixes the 'creat' case for ext4.
>
>
> I'm not sure how much trouble is having this race unfixed compared
> to the performance impact the fix might have. Maybe there're
> better ways to fix this.

I do worry that your patch will have too much of a performance hit. I
think the right fix would be in vtime().

Test patch to follow shortly.

thanks
-john

2010-07-06 23:11:47

by john stultz

[permalink] [raw]
Subject: Re: [PATCH] time/fs - file's time race with vgettimeofday

On Tue, 2010-07-06 at 16:03 -0700, john stultz wrote:
> On Fri, Jul 2, 2010 at 12:41 AM, Jiri Olsa <[email protected]> wrote:
> > This issue was described in the BZ 244697
> >
> > Time goes backward - gettimeofday() vs. rename()
> > https://bugzilla.redhat.com/show_bug.cgi?id=244697
> >
> >
> > It's not just issue of the creat but few others like rename.
> >
> >
> > The following patch will prevent the race by adding the
> > CURRENT_TIME_SEC_REAL macro, which will return seconds from
> > the getnstimeofday call, ensuring it's computed on current tick.
> > It fixes the 'creat' case for ext4.
> >
> >
> > I'm not sure how much trouble is having this race unfixed compared
> > to the performance impact the fix might have. Maybe there're
> > better ways to fix this.
>
> I do worry that your patch will have too much of a performance hit. I
> think the right fix would be in vtime().
>
> Test patch to follow shortly.

So the following (untested) patch _should_ resolve this in mainline on
x86. Please let me know if it works for you.

However, for older kernels, this patch won't be sufficient, as it
depends on update_vsyscall getting the time at the last tick in its
wall_time, and kernels that don't have the logarithmic accumulation code
& remove xtime_cache patches can't be fixed so easily.

Once we get this upstream, let me know if you have any questions about
how to backport this to older kernels.

thanks
-john



[PATCH] x86: Fix vtime/file timestamp inconsistencies

Due to vtime calling vgettimeofday(), its possible that an application
could call time();create("stuff",O_RDRW); only to see the file's
creation timestamp to be before the value returned by time.

The proper fix is to make vtime use the same time value as
current_kernel_time() (which is exported via update_vsyscall) instead of
vgettime().


Signed-off-by: John Stultz <[email protected]>



diff --git a/arch/x86/kernel/vsyscall_64.c b/arch/x86/kernel/vsyscall_64.c
index 1c0c6ab..ce9a4fa 100644
--- a/arch/x86/kernel/vsyscall_64.c
+++ b/arch/x86/kernel/vsyscall_64.c
@@ -169,13 +169,18 @@ int __vsyscall(0) vgettimeofday(struct timeval * tv, struct timezone * tz)
* unlikely */
time_t __vsyscall(1) vtime(time_t *t)
{
- struct timeval tv;
+ unsigned seq;
time_t result;
if (unlikely(!__vsyscall_gtod_data.sysctl_enabled))
return time_syscall(t);

- vgettimeofday(&tv, NULL);
- result = tv.tv_sec;
+ do {
+ seq = read_seqbegin(&__vsyscall_gtod_data.lock);
+
+ result = vsyscall_gtod_data.wall_time_sec;
+
+ } while (read_seqretry(&__vsyscall_gtod_data.lock, seq));
+
if (t)
*t = result;
return result;

2010-07-07 10:47:37

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH] time/fs - file's time race with vgettimeofday

On Tue, Jul 06, 2010 at 04:11:40PM -0700, john stultz wrote:
> On Tue, 2010-07-06 at 16:03 -0700, john stultz wrote:
> > On Fri, Jul 2, 2010 at 12:41 AM, Jiri Olsa <[email protected]> wrote:
> > > This issue was described in the BZ 244697
> > >
> > > Time goes backward - gettimeofday() vs. rename()
> > > https://bugzilla.redhat.com/show_bug.cgi?id=244697
> > >
> > >
> > > It's not just issue of the creat but few others like rename.
> > >
> > >
> > > The following patch will prevent the race by adding the
> > > CURRENT_TIME_SEC_REAL macro, which will return seconds from
> > > the getnstimeofday call, ensuring it's computed on current tick.
> > > It fixes the 'creat' case for ext4.
> > >
> > >
> > > I'm not sure how much trouble is having this race unfixed compared
> > > to the performance impact the fix might have. Maybe there're
> > > better ways to fix this.
> >
> > I do worry that your patch will have too much of a performance hit. I
> > think the right fix would be in vtime().
> >
> > Test patch to follow shortly.
>
> So the following (untested) patch _should_ resolve this in mainline on
> x86. Please let me know if it works for you.
>
> However, for older kernels, this patch won't be sufficient, as it
> depends on update_vsyscall getting the time at the last tick in its
> wall_time, and kernels that don't have the logarithmic accumulation code
> & remove xtime_cache patches can't be fixed so easily.
>
> Once we get this upstream, let me know if you have any questions about
> how to backport this to older kernels.
>
> thanks
> -john
>
>
>
> [PATCH] x86: Fix vtime/file timestamp inconsistencies
>
> Due to vtime calling vgettimeofday(), its possible that an application
> could call time();create("stuff",O_RDRW); only to see the file's
> creation timestamp to be before the value returned by time.
>
> The proper fix is to make vtime use the same time value as
> current_kernel_time() (which is exported via update_vsyscall) instead of
> vgettime().

hm, this would be solution for the time() call.

But I think the issue still stays for gettimeofday and clock_gettime
(CLOCK_REALTIME/CLOCK_MONOTONIC) because they call vread from
clocksource to get the real time.

Thats where I'm not sure if this race is that "bad", compared either
to performance hit or inaccuracy of user time calls.. which are possible
solutions..


jirka

>
>
> Signed-off-by: John Stultz <[email protected]>
>
>
>
> diff --git a/arch/x86/kernel/vsyscall_64.c b/arch/x86/kernel/vsyscall_64.c
> index 1c0c6ab..ce9a4fa 100644
> --- a/arch/x86/kernel/vsyscall_64.c
> +++ b/arch/x86/kernel/vsyscall_64.c
> @@ -169,13 +169,18 @@ int __vsyscall(0) vgettimeofday(struct timeval * tv, struct timezone * tz)
> * unlikely */
> time_t __vsyscall(1) vtime(time_t *t)
> {
> - struct timeval tv;
> + unsigned seq;
> time_t result;
> if (unlikely(!__vsyscall_gtod_data.sysctl_enabled))
> return time_syscall(t);
>
> - vgettimeofday(&tv, NULL);
> - result = tv.tv_sec;
> + do {
> + seq = read_seqbegin(&__vsyscall_gtod_data.lock);
> +
> + result = vsyscall_gtod_data.wall_time_sec;
> +
> + } while (read_seqretry(&__vsyscall_gtod_data.lock, seq));
> +
> if (t)
> *t = result;
> return result;
>
>

2010-07-07 16:21:15

by john stultz

[permalink] [raw]
Subject: Re: [PATCH] time/fs - file's time race with vgettimeofday

On Wed, 2010-07-07 at 12:47 +0200, Jiri Olsa wrote:
> On Tue, Jul 06, 2010 at 04:11:40PM -0700, john stultz wrote:
> > [PATCH] x86: Fix vtime/file timestamp inconsistencies
> >
> > Due to vtime calling vgettimeofday(), its possible that an application
> > could call time();create("stuff",O_RDRW); only to see the file's
> > creation timestamp to be before the value returned by time.
> >
> > The proper fix is to make vtime use the same time value as
> > current_kernel_time() (which is exported via update_vsyscall) instead of
> > vgettime().
>
> hm, this would be solution for the time() call.
>
> But I think the issue still stays for gettimeofday and clock_gettime
> (CLOCK_REALTIME/CLOCK_MONOTONIC) because they call vread from
> clocksource to get the real time.
>
> Thats where I'm not sure if this race is that "bad", compared either
> to performance hit or inaccuracy of user time calls.. which are possible
> solutions..

Right, so as long as the file timestamps are tick-granular (like other
tick-granular interfaces: current_kernel_time(), time(),
CLOCK_REALTIME_COARSE) you will have the possibility of inconsistencies
against the clocksource resolution interfaces (gettimeofday(),
CLOCK_REALTIME, etc).

But that is to be expected as a constraint of the granularity. So I
don't really see this as an issue.

Folks may want to increase the granularity of filesystem timestamps, but
that will come at the possibly very expensive cost of reading the
clocksource hardware (which can have different access latencies between
architectures and even machines of the same arch). I suspect its not
worth it.

The concerning issue here that you pointed out are the inconsistencies
could be seen between vsyscall time() and time() (or filesystem
timestamps). That is a problem, and my patch should resolve that one.

thanks
-john





2010-07-07 17:11:19

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH] time/fs - file's time race with vgettimeofday

On Tue, Jul 06, 2010 at 04:11:40PM -0700, john stultz wrote:
> On Tue, 2010-07-06 at 16:03 -0700, john stultz wrote:
> > On Fri, Jul 2, 2010 at 12:41 AM, Jiri Olsa <[email protected]> wrote:
> > > This issue was described in the BZ 244697
> > >
> > > Time goes backward - gettimeofday() vs. rename()
> > > https://bugzilla.redhat.com/show_bug.cgi?id=244697
> > >
> > >
> > > It's not just issue of the creat but few others like rename.
> > >
> > >
> > > The following patch will prevent the race by adding the
> > > CURRENT_TIME_SEC_REAL macro, which will return seconds from
> > > the getnstimeofday call, ensuring it's computed on current tick.
> > > It fixes the 'creat' case for ext4.
> > >
> > >
> > > I'm not sure how much trouble is having this race unfixed compared
> > > to the performance impact the fix might have. Maybe there're
> > > better ways to fix this.
> >
> > I do worry that your patch will have too much of a performance hit. I
> > think the right fix would be in vtime().
> >
> > Test patch to follow shortly.
>
> So the following (untested) patch _should_ resolve this in mainline on
> x86. Please let me know if it works for you.
>
> However, for older kernels, this patch won't be sufficient, as it
> depends on update_vsyscall getting the time at the last tick in its
> wall_time, and kernels that don't have the logarithmic accumulation code
> & remove xtime_cache patches can't be fixed so easily.
>
> Once we get this upstream, let me know if you have any questions about
> how to backport this to older kernels.
>
> thanks
> -john
>
>
>
> [PATCH] x86: Fix vtime/file timestamp inconsistencies
>
> Due to vtime calling vgettimeofday(), its possible that an application
> could call time();create("stuff",O_RDRW); only to see the file's
> creation timestamp to be before the value returned by time.
>
> The proper fix is to make vtime use the same time value as
> current_kernel_time() (which is exported via update_vsyscall) instead of
> vgettime().
>
>
> Signed-off-by: John Stultz <[email protected]>
>
>
>
> diff --git a/arch/x86/kernel/vsyscall_64.c b/arch/x86/kernel/vsyscall_64.c
> index 1c0c6ab..ce9a4fa 100644
> --- a/arch/x86/kernel/vsyscall_64.c
> +++ b/arch/x86/kernel/vsyscall_64.c
> @@ -169,13 +169,18 @@ int __vsyscall(0) vgettimeofday(struct timeval * tv, struct timezone * tz)
> * unlikely */
> time_t __vsyscall(1) vtime(time_t *t)
> {
> - struct timeval tv;
> + unsigned seq;
> time_t result;
> if (unlikely(!__vsyscall_gtod_data.sysctl_enabled))
> return time_syscall(t);
>
> - vgettimeofday(&tv, NULL);
> - result = tv.tv_sec;
> + do {
> + seq = read_seqbegin(&__vsyscall_gtod_data.lock);
> +
> + result = vsyscall_gtod_data.wall_time_sec;
> +
> + } while (read_seqretry(&__vsyscall_gtod_data.lock, seq));
> +
> if (t)
> *t = result;
> return result;
>
>

I think there's a typo in using "vsyscall_gtod_data" inside
the vtime call it should be "__vsyscall_gtod_data" intead.

attaching changed patch

wbr,
jirka

---
[PATCH] x86: Fix vtime/file timestamp inconsistencies

Due to vtime calling vgettimeofday(), its possible that an application
could call time();create("stuff",O_RDRW); only to see the file's
creation timestamp to be before the value returned by time.

The proper fix is to make vtime use the same time value as
current_kernel_time() (which is exported via update_vsyscall) instead of
vgettime().


Signed-off-by: John Stultz <[email protected]>
Signed-off-by: Jiri Olsa <[email protected]>
---
diff --git a/arch/x86/kernel/vsyscall_64.c b/arch/x86/kernel/vsyscall_64.c
index 1c0c6ab..dce0c3c 100644
--- a/arch/x86/kernel/vsyscall_64.c
+++ b/arch/x86/kernel/vsyscall_64.c
@@ -169,13 +169,18 @@ int __vsyscall(0) vgettimeofday(struct timeval * tv, struct timezone * tz)
* unlikely */
time_t __vsyscall(1) vtime(time_t *t)
{
- struct timeval tv;
+ unsigned seq;
time_t result;
if (unlikely(!__vsyscall_gtod_data.sysctl_enabled))
return time_syscall(t);

- vgettimeofday(&tv, NULL);
- result = tv.tv_sec;
+ do {
+ seq = read_seqbegin(&__vsyscall_gtod_data.lock);
+
+ result = __vsyscall_gtod_data.wall_time_sec;
+
+ } while (read_seqretry(&__vsyscall_gtod_data.lock, seq));
+
if (t)
*t = result;
return result;

2010-07-07 17:21:06

by john stultz

[permalink] [raw]
Subject: Re: [PATCH] time/fs - file's time race with vgettimeofday

On Wed, 2010-07-07 at 19:11 +0200, Jiri Olsa wrote:
> On Tue, Jul 06, 2010 at 04:11:40PM -0700, john stultz wrote:
> > - vgettimeofday(&tv, NULL);
> > - result = tv.tv_sec;
> > + do {
> > + seq = read_seqbegin(&__vsyscall_gtod_data.lock);
> > +
> > + result = vsyscall_gtod_data.wall_time_sec;
> > +
> > + } while (read_seqretry(&__vsyscall_gtod_data.lock, seq));
> > +
> > if (t)
> > *t = result;
> > return result;
> >
> >
>
> I think there's a typo in using "vsyscall_gtod_data" inside
> the vtime call it should be "__vsyscall_gtod_data" intead.

Quite right! Thanks for catching it!
-john