2023-05-01 14:18:59

by kernel test robot

Subject: [jlayton:ctime] [ext4] ff9aaf58e8: ltp.statx06.fail


Hello,

kernel test robot noticed "ltp.statx06.fail" on:

commit: ff9aaf58e816635c454fbe9e9ece94b0eee6f0b1 ("ext4: convert to multigrain timestamps")
https://git.kernel.org/cgit/linux/kernel/git/jlayton/linux.git ctime

in testcase: ltp
version: ltp-x86_64-14c1f76-1_20230429
with the following parameters:

disk: 1HDD
fs: xfs
test: syscalls-04

test-description: The LTP testsuite contains a collection of tools for testing the Linux kernel and related features.
test-url: http://linux-test-project.github.io/


compiler: gcc-11
test machine: 4 threads Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz (Skylake) with 32G memory

(please refer to the attached dmesg/kmsg for the entire log/backtrace)




If you fix the issue, kindly add the following tag
| Reported-by: kernel test robot <[email protected]>
| Link: https://lore.kernel.org/oe-lkp/[email protected]



<<<test_start>>>
tag=statx06 stime=1682919030
cmdline="statx06"
contacts=""
analysis=exit
<<<test_output>>>
tst_device.c:96: TINFO: Found free device 0 '/dev/loop0'
tst_test.c:1093: TINFO: Formatting /dev/loop0 with ext4 opts='-I 256' extra opts=''
mke2fs 1.46.6-rc1 (12-Sep-2022)
tst_test.c:1558: TINFO: Timeout per run is 0h 02m 30s
statx06.c:136: TFAIL: Birth time < before time
statx06.c:138: TFAIL: Modified time > after_time
statx06.c:136: TFAIL: Access time < before time
statx06.c:136: TFAIL: Change time < before time

Summary:
passed 0
failed 4
broken 0
skipped 0
warnings 0
incrementing stop
<<<execution_status>>>
initiation_status="ok"
duration=1 termination_type=exited termination_id=1 corefile=no
cutime=0 cstime=5
<<<test_end>>>
INFO: ltp-pan reported some tests FAIL
LTP Version: 20230127-165-gbd512e733

###############################################################

Done executing testcases.
LTP Version: 20230127-165-gbd512e733
###############################################################




To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
sudo bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
sudo bin/lkp run generated-yaml-file

# if you come across any failure that blocks the test,
# please remove ~/.lkp and /lkp dir to run from a clean state.



--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests



Attachments:
config-6.3.0-00004-gff9aaf58e816 (163.19 kB)
job-script (5.80 kB)
dmesg.xz (79.11 kB)
ltp (235.70 kB)
job.yaml (5.12 kB)
reproduce (334.00 B)

2023-05-01 16:05:41

by Jeff Layton

Subject: Re: [jlayton:ctime] [ext4] ff9aaf58e8: ltp.statx06.fail

On Mon, 2023-05-01 at 22:09 +0800, kernel test robot wrote:
> Hello,
>
> kernel test robot noticed "ltp.statx06.fail" on:
>
> commit: ff9aaf58e816635c454fbe9e9ece94b0eee6f0b1 ("ext4: convert to multigrain timestamps")
> https://git.kernel.org/cgit/linux/kernel/git/jlayton/linux.git ctime
>
[...]
> statx06.c:136: TFAIL: Birth time < before time
> statx06.c:138: TFAIL: Modified time > after_time
> statx06.c:136: TFAIL: Access time < before time
> statx06.c:136: TFAIL: Change time < before time
>
> Summary:
> passed 0
> failed 4
> broken 0
> skipped 0
> warnings 0
[...]

(adding linux-fsdevel and a few other folks who have shown interest in
the multigrain ctime patches)

I haven't posted the ext4 patch for this yet since I'm still testing it,
but this is an interesting test result. The upshot is that we probably
won't be able to pass this testcase without modifying it if we go with
multigrain ctimes.

The test does this:

SAFE_CLOCK_GETTIME(CLOCK_REALTIME_COARSE, &before_time);
clock_wait_tick();
tc->operation();
clock_wait_tick();
SAFE_CLOCK_GETTIME(CLOCK_REALTIME_COARSE, &after_time);

...and with that, I usually end up with a before_time and after_time
that are only 1ns apart, since my machine reports a 1ns clock
granularity.
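
(For anyone following along: this tiny standalone program, which is not
part of the LTP test, just prints what clock_getres() reports for the
two clocks involved. With high-resolution timers enabled, CLOCK_REALTIME
reports 1ns.)

#include <stdio.h>
#include <time.h>

int main(void)
{
	struct timespec res;

	clock_getres(CLOCK_REALTIME_COARSE, &res);
	printf("CLOCK_REALTIME_COARSE: %ld.%09lds\n",
	       (long)res.tv_sec, res.tv_nsec);

	clock_getres(CLOCK_REALTIME, &res);
	printf("CLOCK_REALTIME:        %ld.%09lds\n",
	       (long)res.tv_sec, res.tv_nsec);
	return 0;
}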

The first problem is that the coarse-grained timestamps represent the
lower bound of what time could end up in the inode. With multigrain
ctimes, we can end up grabbing a fine-grained timestamp to store in the
inode that is later than either coarse-grained time the test fetched.
(That's the "Modified time > after_time" failure above: the fine-grained
value stored in the inode can postdate the coarse-grained after_time.)

That's easy enough to fix -- grab a coarse-grained time for "before" and
a fine-grained time for "after".

clock_getres(), however, reports a 1ns resolution (which the clock
genuinely has). With multigrain ctimes, we no longer have that
granularity at the filesystem level: it's effectively 2ns now, since
we need the lowest bit of tv_nsec for the flag.
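
To make that concrete, here is a rough userspace model of the scheme
(the flag name and helpers are invented for illustration; this is not
the actual patch):

#include <stdio.h>

#define CTIME_FLAG 1L	/* assumed: low bit of tv_nsec carries internal state */

static long encode_nsec(long nsec, int flag)
{
	/* storing rounds tv_nsec down to a multiple of 2ns */
	return (nsec & ~CTIME_FLAG) | (flag ? CTIME_FLAG : 0);
}

static long decode_nsec(long nsec)
{
	/* reporting to userland masks the flag back off */
	return nsec & ~CTIME_FLAG;
}

int main(void)
{
	long stored = encode_nsec(123456789L, 1);

	/* 123456789 is odd, so the visible value is 123456788: 1ns is lost */
	printf("stored=%ld visible=%ld\n", stored, decode_nsec(stored));
	return 0;
}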

The following patch to the testcase fixes it for me, but I'm not sure if
it'll be acceptable. Maybe we need some way to indicate to userland that
multigrain timestamps are in effect, for "applications" like this that
care about such things?
--
Jeff Layton <[email protected]>

diff --git a/testcases/kernel/syscalls/statx/statx06.c b/testcases/kernel/syscalls/statx/statx06.c
index ce82b905b..1f5367583 100644
--- a/testcases/kernel/syscalls/statx/statx06.c
+++ b/testcases/kernel/syscalls/statx/statx06.c
@@ -107,9 +107,11 @@ static void test_statx(unsigned int test_nr)
 
 	SAFE_CLOCK_GETTIME(CLOCK_REALTIME_COARSE, &before_time);
 	clock_wait_tick();
+	clock_wait_tick();
 	tc->operation();
 	clock_wait_tick();
-	SAFE_CLOCK_GETTIME(CLOCK_REALTIME_COARSE, &after_time);
+	clock_wait_tick();
+	SAFE_CLOCK_GETTIME(CLOCK_REALTIME, &after_time);
 
 	TEST(statx(AT_FDCWD, TEST_FILE, 0, STATX_ALL, &buff));
 	if (TST_RET != 0) {

2023-05-02 10:17:39

by Jeff Layton

Subject: Re: [jlayton:ctime] [ext4] ff9aaf58e8: ltp.statx06.fail

On Tue, 2023-05-02 at 10:39 +1000, Dave Chinner wrote:
> On Mon, May 01, 2023 at 12:05:17PM -0400, Jeff Layton wrote:
> > On Mon, 2023-05-01 at 22:09 +0800, kernel test robot wrote:
[...]
> > clock_getres(), however, reports a 1ns resolution (which the clock
> > genuinely has). With multigrain ctimes, we no longer have that
> > granularity at the filesystem level: it's effectively 2ns now, since
> > we need the lowest bit of tv_nsec for the flag.
>
> Why are you even using the low bit for this? Nanosecond resolution
> only uses 30 bits, leaving the upper two bits of a 32-bit tv_nsec
> field available for internal status bits. As long as we mask out the
> internal bits when reading the VFS timestamp tv_nsec field, then
> we don't need to change the timestamp resolution, right?
>

Yeah, that should work. Let me give that a shot on the next pass.
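
Something along these lines, I take it (a quick userspace sketch; the
names are invented, not the real VFS code):

#include <stdio.h>
#include <stdint.h>

#define TS_STATUS_MASK	(3U << 30)	/* the two spare high bits */
#define TS_QUERIED	(1U << 31)	/* hypothetical internal status flag */

static uint32_t ts_nsec(uint32_t raw)
{
	/* reading the VFS timestamp masks out the internal bits */
	return raw & ~TS_STATUS_MASK;
}

int main(void)
{
	/* 999999999 < 2^30, so the flag never collides with valid nsec values */
	uint32_t raw = 999999999U | TS_QUERIED;

	printf("%u\n", ts_nsec(raw));	/* prints 999999999: full 1ns resolution */
	return 0;
}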

Thanks,
--
Jeff Layton <[email protected]>