On Mon, Jan 25, 2021 at 3:48 PM Naresh Kamboju
<[email protected]> wrote:
>
> LTP syscalls madvise08 test case failed on all devices from
> Linux next 20210118 to till day.
> strace log attached to this email and link provided below.
>
> BAD: next-20210118
> GOOD: next-20210115
>
> This failure is easily reproducible on Linux next tag 20210118 above.
>
> tst_test.c:1250: TINFO: Timeout per run is 0h 15m 00s
> madvise08.c:73: TINFO: Temporary core pattern is
> '/scratch/ltp-2nftQzNI1K/HclFMH/dump-%p'
> madvise08.c:112: TINFO: Dump file should be dump-10109
> madvise08.c:196: TPASS: madvise(..., MADV_DONTDUMP)
> madvise08.c:112: TINFO: Dump file should be dump-10110
> madvise08.c:203: TFAIL: No sequence in dump after MADV_DODUMP.
>
> strace log,
> https://lkft.validation.linaro.org/scheduler/job/2184866#L1257
Ok, so in this part of the log,
[pid 485] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_DUMPED,
si_pid=487, si_uid=0, si_status=SIGABRT, si_utime=0, si_stime=0} ---
[pid 485] write(2, \"madvise08.c:117: \33[1;34mTINFO: \33\"...,
64madvise08.c:117: [1;34mTINFO: [0mDump file should be dump-487
) = 64
[pid 485] access(\"dump-487\", F_OK) = 0
[pid 485] openat(AT_FDCWD, \"dump-487\", O_RDONLY) = 3
[pid 485] read(3,
\"\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\4\0>\0\1\0\0\0\0\0\0\0\0\0\0\0\"...,
1024) = 1024
[pid 485] read(3,
\"\0\320\3\0\0\0\0\0\0\340\375\24\304\177\0\0\0\0\0\0\0\0\0\0\0\20\0\0\0\0\0\0\"...,
1024) = 292
[pid 485] read(3, \"\", 1024) = 0
[pid 485] close(3) = 0
[pid 485] write(2, \"madvise08.c:208: \33[1;31mTFAIL: \33\"...,
74madvise08.c:208: [1;31mTFAIL: [0mNo sequence in dump after
MADV_DODUMP.
it seems that the data that was requested to be dumped with MADV_DODUMP is
indeed completely absent.
There was exactly one commit that got merged between next-20210115 and
next-20120118
related to core dumps: 8a3cc755b138 ("coredump: don't bother with
do_truncate()").
Adding Al Viro to Cc for that.
Naresh, could you try reverting that patch?
Arnd
Hi Arnd,
On Mon, 25 Jan 2021 at 20:41, Arnd Bergmann <[email protected]> wrote:
>
> On Mon, Jan 25, 2021 at 3:48 PM Naresh Kamboju
> <[email protected]> wrote:
> >
> > LTP syscalls madvise08 test case failed on all devices from
> > Linux next 20210118 to till day.
> > strace log attached to this email and link provided below.
> >
> > BAD: next-20210118
> > GOOD: next-20210115
> >
> > This failure is easily reproducible on Linux next tag 20210118 above.
> >
> > tst_test.c:1250: TINFO: Timeout per run is 0h 15m 00s
> > madvise08.c:73: TINFO: Temporary core pattern is
> > '/scratch/ltp-2nftQzNI1K/HclFMH/dump-%p'
> > madvise08.c:112: TINFO: Dump file should be dump-10109
> > madvise08.c:196: TPASS: madvise(..., MADV_DONTDUMP)
> > madvise08.c:112: TINFO: Dump file should be dump-10110
> > madvise08.c:203: TFAIL: No sequence in dump after MADV_DODUMP.
> >
> > strace log,
> > https://lkft.validation.linaro.org/scheduler/job/2184866#L1257
>
> Ok, so in this part of the log,
>
> [pid 485] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_DUMPED,
> si_pid=487, si_uid=0, si_status=SIGABRT, si_utime=0, si_stime=0} ---
> [pid 485] write(2, \"madvise08.c:117: \33[1;34mTINFO: \33\"...,
> 64madvise08.c:117: [1;34mTINFO: [0mDump file should be dump-487
> ) = 64
> [pid 485] access(\"dump-487\", F_OK) = 0
> [pid 485] openat(AT_FDCWD, \"dump-487\", O_RDONLY) = 3
> [pid 485] read(3,
> \"\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\4\0>\0\1\0\0\0\0\0\0\0\0\0\0\0\"...,
> 1024) = 1024
> [pid 485] read(3,
> \"\0\320\3\0\0\0\0\0\0\340\375\24\304\177\0\0\0\0\0\0\0\0\0\0\0\20\0\0\0\0\0\0\"...,
> 1024) = 292
> [pid 485] read(3, \"\", 1024) = 0
> [pid 485] close(3) = 0
> [pid 485] write(2, \"madvise08.c:208: \33[1;31mTFAIL: \33\"...,
> 74madvise08.c:208: [1;31mTFAIL: [0mNo sequence in dump after
> MADV_DODUMP.
>
> it seems that the data that was requested to be dumped with MADV_DODUMP is
> indeed completely absent.
>
> There was exactly one commit that got merged between next-20210115 and
> next-20120118
> related to core dumps: 8a3cc755b138 ("coredump: don't bother with
> do_truncate()").
> Adding Al Viro to Cc for that.
>
> Naresh, could you try reverting that patch?
This suspecting commit reverted and tested and the test case PASS.
commit 8a3cc755b13808eba74846dfd1033fcbc21f9a65
Author: Al Viro <[email protected]>
Date: Sun Mar 8 09:16:37 2020 -0400
coredump: don't bother with do_truncate()
have dump_skip() just remember how much needs to be skipped,
leave actual seeks/writing zeroes to the next dump_emit()
or the end of coredump output, whichever comes first.
And instead of playing with do_truncate() in the end, just
write one NUL at the end of the last gap (if any).
Signed-off-by: Al Viro <[email protected]>
fs/binfmt_elf.c | 1 -
fs/coredump.c | 56 +++++++++++++++++++++++++++---------------------
include/linux/binfmts.h | 1 +
include/linux/coredump.h | 1 -
Test case output link,
https://lkft.validation.linaro.org/scheduler/job/2184975#L1369
https://lkft.validation.linaro.org/scheduler/job/2184972#L1358
- Naresh
On Mon, Jan 25, 2021 at 10:01:48PM +0530, Naresh Kamboju wrote:
> commit 8a3cc755b13808eba74846dfd1033fcbc21f9a65
> Author: Al Viro <[email protected]>
> Date: Sun Mar 8 09:16:37 2020 -0400
>
> coredump: don't bother with do_truncate()
>
> have dump_skip() just remember how much needs to be skipped,
> leave actual seeks/writing zeroes to the next dump_emit()
> or the end of coredump output, whichever comes first.
> And instead of playing with do_truncate() in the end, just
> write one NUL at the end of the last gap (if any).
>
> Signed-off-by: Al Viro <[email protected]>
>
> fs/binfmt_elf.c | 1 -
> fs/coredump.c | 56 +++++++++++++++++++++++++++---------------------
> include/linux/binfmts.h | 1 +
> include/linux/coredump.h | 1 -
>
>
> Test case output link,
> https://lkft.validation.linaro.org/scheduler/job/2184975#L1369
> https://lkft.validation.linaro.org/scheduler/job/2184972#L1358
I think I see what's going on. Could you check if the following
fixes your reproducer?
diff --git a/arch/powerpc/platforms/cell/spufs/coredump.c b/arch/powerpc/platforms/cell/spufs/coredump.c
index 60b5583e9eaf..1a587618015c 100644
--- a/arch/powerpc/platforms/cell/spufs/coredump.c
+++ b/arch/powerpc/platforms/cell/spufs/coredump.c
@@ -149,8 +149,7 @@ static int spufs_arch_write_note(struct spu_context *ctx, int i,
return -EIO;
}
- if (!dump_skip(cprm, roundup(cprm->pos - ret + sz, 4) - cprm->pos))
- return -EIO;
+ dump_skip_to(cprm, roundup(cprm->pos - ret + sz, 4));
return 0;
}
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 66137d5dca5f..474a3c7dd5ce 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -2267,8 +2267,7 @@ static int elf_core_dump(struct coredump_params *cprm)
goto end_coredump;
/* Align to page */
- if (!dump_skip(cprm, dataoff - cprm->pos))
- goto end_coredump;
+ dump_skip_to(cprm, dataoff);
for (i = 0; i < vma_count; i++) {
struct core_vma_metadata *meta = vma_meta + i;
diff --git a/fs/binfmt_elf_fdpic.c b/fs/binfmt_elf_fdpic.c
index be4062b8ba75..01f043971644 100644
--- a/fs/binfmt_elf_fdpic.c
+++ b/fs/binfmt_elf_fdpic.c
@@ -1640,8 +1640,7 @@ static int elf_fdpic_core_dump(struct coredump_params *cprm)
goto end_coredump;
}
- if (!dump_skip(cprm, dataoff - cprm->pos))
- goto end_coredump;
+ dump_skip_to(cprm, dataoff);
if (!elf_fdpic_dump_segments(cprm, vma_meta, vma_count))
goto end_coredump;
diff --git a/fs/coredump.c b/fs/coredump.c
index 27a93f724251..bfac7422cd14 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -894,7 +894,14 @@ int dump_emit(struct coredump_params *cprm, const void *addr, int nr)
}
EXPORT_SYMBOL(dump_emit);
-int dump_skip(struct coredump_params *cprm, size_t nr)
+void dump_skip_to(struct coredump_params *cprm, unsigned long pos)
+{
+ cprm->to_skip = pos - cprm->pos;
+ return 0;
+}
+EXPORT_SYMBOL(dump_skip_to);
+
+void dump_skip(struct coredump_params *cprm, size_t nr)
{
cprm->to_skip += nr;
return 0;
@@ -925,11 +932,11 @@ int dump_user_range(struct coredump_params *cprm, unsigned long start,
stop = !dump_emit(cprm, kaddr, PAGE_SIZE);
kunmap(page);
put_page(page);
+ if (stop)
+ return 0;
} else {
- stop = !dump_skip(cprm, PAGE_SIZE);
+ dump_skip(cprm, PAGE_SIZE);
}
- if (stop)
- return 0;
}
return 1;
}
@@ -940,7 +947,9 @@ int dump_align(struct coredump_params *cprm, int align)
unsigned mod = (cprm->pos + cprm->to_skip) & (align - 1);
if (align & (align - 1))
return 0;
- return mod ? dump_skip(cprm, align - mod) : 1;
+ if (mod)
+ cprm->to_skip += align - mod;
+ return 1;
}
EXPORT_SYMBOL(dump_align);
diff --git a/include/linux/coredump.h b/include/linux/coredump.h
index 806d7f87d16a..78fcd776b185 100644
--- a/include/linux/coredump.h
+++ b/include/linux/coredump.h
@@ -23,7 +23,8 @@ extern unsigned int core_pipe_limit;
* functions to write out all the necessary info.
*/
struct coredump_params;
-extern int dump_skip(struct coredump_params *cprm, size_t nr);
+extern void dump_skip_to(struct coredump_params *cprm, unsigned long to);
+extern void dump_skip(struct coredump_params *cprm, size_t nr);
extern int dump_emit(struct coredump_params *cprm, const void *addr, int nr);
extern int dump_align(struct coredump_params *cprm, int align);
int dump_user_range(struct coredump_params *cprm, unsigned long start,
Hi Al Viro,
On Mon, 15 Feb 2021 at 01:25, Al Viro <[email protected]> wrote:
>
> On Mon, Jan 25, 2021 at 10:01:48PM +0530, Naresh Kamboju wrote:
> > commit 8a3cc755b13808eba74846dfd1033fcbc21f9a65
> > Author: Al Viro <[email protected]>
> > Date: Sun Mar 8 09:16:37 2020 -0400
> >
> > coredump: don't bother with do_truncate()
> >
> > have dump_skip() just remember how much needs to be skipped,
> > leave actual seeks/writing zeroes to the next dump_emit()
> > or the end of coredump output, whichever comes first.
> > And instead of playing with do_truncate() in the end, just
> > write one NUL at the end of the last gap (if any).
> >
> > Signed-off-by: Al Viro <[email protected]>
> >
> > fs/binfmt_elf.c | 1 -
> > fs/coredump.c | 56 +++++++++++++++++++++++++++---------------------
> > include/linux/binfmts.h | 1 +
> > include/linux/coredump.h | 1 -
> >
> >
> > Test case output link,
> > https://lkft.validation.linaro.org/scheduler/job/2184975#L1369
> > https://lkft.validation.linaro.org/scheduler/job/2184972#L1358
>
> I think I see what's going on. Could you check if the following
> fixes your reproducer?
>
> diff --git a/arch/powerpc/platforms/cell/spufs/coredump.c b/arch/powerpc/platforms/cell/spufs/coredump.c
> index 60b5583e9eaf..1a587618015c 100644
> --- a/arch/powerpc/platforms/cell/spufs/coredump.c
> +++ b/arch/powerpc/platforms/cell/spufs/coredump.c
> @@ -149,8 +149,7 @@ static int spufs_arch_write_note(struct spu_context *ctx, int i,
> return -EIO;
> }
>
> - if (!dump_skip(cprm, roundup(cprm->pos - ret + sz, 4) - cprm->pos))
> - return -EIO;
> + dump_skip_to(cprm, roundup(cprm->pos - ret + sz, 4));
> return 0;
> }
>
> diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
> index 66137d5dca5f..474a3c7dd5ce 100644
> --- a/fs/binfmt_elf.c
> +++ b/fs/binfmt_elf.c
> @@ -2267,8 +2267,7 @@ static int elf_core_dump(struct coredump_params *cprm)
> goto end_coredump;
>
> /* Align to page */
> - if (!dump_skip(cprm, dataoff - cprm->pos))
> - goto end_coredump;
> + dump_skip_to(cprm, dataoff);
>
> for (i = 0; i < vma_count; i++) {
> struct core_vma_metadata *meta = vma_meta + i;
> diff --git a/fs/binfmt_elf_fdpic.c b/fs/binfmt_elf_fdpic.c
> index be4062b8ba75..01f043971644 100644
> --- a/fs/binfmt_elf_fdpic.c
> +++ b/fs/binfmt_elf_fdpic.c
> @@ -1640,8 +1640,7 @@ static int elf_fdpic_core_dump(struct coredump_params *cprm)
> goto end_coredump;
> }
>
> - if (!dump_skip(cprm, dataoff - cprm->pos))
> - goto end_coredump;
> + dump_skip_to(cprm, dataoff);
>
> if (!elf_fdpic_dump_segments(cprm, vma_meta, vma_count))
> goto end_coredump;
> diff --git a/fs/coredump.c b/fs/coredump.c
> index 27a93f724251..bfac7422cd14 100644
> --- a/fs/coredump.c
> +++ b/fs/coredump.c
> @@ -894,7 +894,14 @@ int dump_emit(struct coredump_params *cprm, const void *addr, int nr)
> }
> EXPORT_SYMBOL(dump_emit);
>
> -int dump_skip(struct coredump_params *cprm, size_t nr)
> +void dump_skip_to(struct coredump_params *cprm, unsigned long pos)
> +{
> + cprm->to_skip = pos - cprm->pos;
> + return 0;
fs/coredump.c:903:9: error: 'return' with a value, in function
returning void [-Werror=return-type]
903 | return 0;
| ^
Build failed due to above error.
build log link,
https://gitlab.com/Linaro/lkft/users/naresh.kamboju/linux/-/jobs/1029838856#L383
- Naresh
On Mon, Feb 15, 2021 at 02:11:15PM +0530, Naresh Kamboju wrote:
> fs/coredump.c:903:9: error: 'return' with a value, in function
> returning void [-Werror=return-type]
> 903 | return 0;
> | ^
>
> Build failed due to above error.
FWIW, here the test results in
Running tests.......
<<<test_start>>>
tag=madvise08 stime=1613398818
cmdline="madvise08"
contacts=""
analysis=exit
<<<test_output>>>
incrementing stop
tst_test.c:1250: TINFO: Timeout per run is 0h 05m 00s
madvise08.c:78: TINFO: Temporary core pattern is '/tmp/ltp-tgvQ3Lz1UZ/B6lwy6/dump-%p'
madvise08.c:117: TINFO: Dump file should be dump-2276
madvise08.c:201: TPASS: madvise(..., MADV_DONTDUMP)
madvise08.c:117: TINFO: Dump file should be dump-2277
madvise08.c:205: TPASS: madvise(..., MADV_DODUMP)
Summary:
passed 2
failed 0
skipped 0
warnings 0
<<<execution_status>>>
initiation_status="ok"
duration=0 termination_type=exited termination_id=0 corefile=no
cutime=0 cstime=0
<<<test_end>>>
(built without -Werror=return-type, so I'd missed the warnings)
Anyway, I've folded the fix (with those stray return 0 removed, of course)
into #work.coredump and #for-next; works here. Could you test either
branch (in git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git)?
On Mon, 15 Feb 2021 at 20:05, Al Viro <[email protected]> wrote:
>
> On Mon, Feb 15, 2021 at 02:11:15PM +0530, Naresh Kamboju wrote:
>
> > fs/coredump.c:903:9: error: 'return' with a value, in function
> > returning void [-Werror=return-type]
> > 903 | return 0;
> > | ^
> >
> > Build failed due to above error.
>
> FWIW, here the test results in
> Running tests.......
> <<<test_start>>>
> tag=madvise08 stime=1613398818
> cmdline="madvise08"
> contacts=""
> analysis=exit
> <<<test_output>>>
> incrementing stop
> tst_test.c:1250: TINFO: Timeout per run is 0h 05m 00s
> madvise08.c:78: TINFO: Temporary core pattern is '/tmp/ltp-tgvQ3Lz1UZ/B6lwy6/dump-%p'
> madvise08.c:117: TINFO: Dump file should be dump-2276
> madvise08.c:201: TPASS: madvise(..., MADV_DONTDUMP)
> madvise08.c:117: TINFO: Dump file should be dump-2277
> madvise08.c:205: TPASS: madvise(..., MADV_DODUMP)
>
> Summary:
> passed 2
> failed 0
> skipped 0
> warnings 0
> <<<execution_status>>>
> initiation_status="ok"
> duration=0 termination_type=exited termination_id=0 corefile=no
> cutime=0 cstime=0
> <<<test_end>>>
>
> (built without -Werror=return-type, so I'd missed the warnings)
>
> Anyway, I've folded the fix (with those stray return 0 removed, of course)
> into #work.coredump and #for-next; works here. Could you test either
> branch (in git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git)?
By using Linaro tuxsuite [1] I have built your tree and tested on x86_64 and
the reported test PASS now.
LTP syscalls [2] and LTP fs [3] tested on x86_64 and both PASS.
Tested-by: Naresh Kamboju <[email protected]>
[1] https://gitlab.com/Linaro/tuxsuite
[2] https://lkft.validation.linaro.org/results/2278012
[3] https://lkft.validation.linaro.org/results/2280979
- Naresh