Do you see this LTP prot_hsymlinks failure on linux next 20190823 on
x86_64 and i386 devices?
test output log,
useradd: failure while writing changes to /etc/passwd
useradd: /home/hsym was created, but could not be removed
userdel: user 'hsym' does not exist
prot_hsymlinks 1 TBROK : prot_hsymlinks.c:325: Failed to run
cmd: useradd hsym
prot_hsymlinks 2 TBROK : prot_hsymlinks.c:325: Remaining cases broken
prot_hsymlinks 3 TBROK : prot_hsymlinks.c:325: Failed to run
cmd: userdel -r hsym
prot_hsymlinks 4 TBROK : tst_sig.c:234: unexpected signal
SIGIOT/SIGABRT(6) received (pid = 8324).
prot_hsymlinks 5 TBROK : tst_sig.c:234: unexpected signal
SIGIOT/SIGABRT(6) received (pid = 8324).
prot_hsymlinks 6 TBROK : tst_sig.c:234: unexpected signal
SIGIOT/SIGABRT(6) received (pid = 8324).
prot_hsymlinks 7 TBROK : tst_sig.c:234: unexpected signal
SIGIOT/SIGABRT(6) received (pid = 8324).
prot_hsymlinks 8 TBROK : tst_sig.c:234: unexpected signal
SIGIOT/SIGABRT(6) received (pid = 8324).
prot_hsymlinks 9 TBROK : tst_sig.c:234: unexpected signal
SIGIOT/SIGABRT(6) received (pid = 8324).
prot_hsymlinks 10 TBROK : tst_sig.c:234: unexpected signal
SIGIOT/SIGABRT(6) received (pid = 8324).
Full test log,
https://qa-reports.linaro.org/lkft/linux-next-oe/build/next-20190823/testrun/886412/log
Linux version:
Linux version 5.3.0-rc5-next-20190823 (oe-user@oe-host) (gcc version
7.3.0 (GCC)) #1 SMP Fri Aug 23 09:35:54 UTC 2019
steps to reproduce:
cd /opt/ltp
./runltp -s prot_hsymlinks
metadata:
metadata:
git branch: master
git repo: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
git commit: 9733a7c62c66722bcfdb1a6fe4d35c497312d59a
git describe: next-20190823
make_kernelversion: 5.3.0-rc5
kernel-config:
http://snapshots.linaro.org/openembedded/lkft/lkft/sumo/intel-corei7-64/lkft/linux-next/591/config
build-location:
http://snapshots.linaro.org/openembedded/lkft/lkft/sumo/intel-corei7-64/lkft/linux-next/591
toolchain: x86_64-linaro-linux 7.%
series: lkft
ltp-syscalls-tests__url: git://github.com/linux-test-project/ltp.git
ltp-syscalls-tests__version: '20190517'
Best regards
Naresh Kamboju
Hi!
> Do you see this LTP prot_hsymlinks failure on linux next 20190823 on
> x86_64 and i386 devices?
>
> test output log,
> useradd: failure while writing changes to /etc/passwd
> useradd: /home/hsym was created, but could not be removed
This looks like an unrelated problem, failure to write to /etc/passwd
probably means that filesystem is full or some problem happend and how
is remounted RO.
I do not see the kernel messages from this job anywhere at the job
pages, is it stored somewhere?
--
Cyril Hrubis
[email protected]
----- Original Message -----
> Hi!
> > Do you see this LTP prot_hsymlinks failure on linux next 20190823 on
> > x86_64 and i386 devices?
> >
> > test output log,
> > useradd: failure while writing changes to /etc/passwd
> > useradd: /home/hsym was created, but could not be removed
>
> This looks like an unrelated problem, failure to write to /etc/passwd
> probably means that filesystem is full or some problem happend and how
> is remounted RO.
In Naresh' example, root is on NFS:
root=/dev/nfs rw nfsroot=10.66.16.123:/var/lib/lava/dispatcher/tmp/886412/extract-nfsrootfs-tyuevoxm,tcp,hard,intr
10.66.16.123:/var/lib/lava/dispatcher/tmp/886412/extract-nfsrootfs-tyuevoxm on / type nfs (rw,relatime,vers=2,rsize=4096,wsize=4096,namlen=255,hard,nolock,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.66.16.123,mountvers=1,mountproto=tcp,local_lock=all,addr=10.66.16.123)
devtmpfs on /dev type devtmpfs (rw,relatime,size=3977640k,nr_inodes=994410,mode=755)
Following message repeats couple times in logs:
NFS: Server wrote zero bytes, expected XXX
Naresh, can you check if there are any errors on NFS server side?
Maybe run NFS cthon against that server with client running next-20190822 and next-20190823.
>
> I do not see the kernel messages from this job anywhere at the job
> pages, is it stored somewhere?
It appears to be mixed in same log file:
https://qa-reports.linaro.org/lkft/linux-next-oe/build/next-20190823/testrun/886412/log
Hi Jan and Cyril,
On Mon, 26 Aug 2019 at 16:35, Jan Stancek <[email protected]> wrote:
>
>
>
> ----- Original Message -----
> > Hi!
> > > Do you see this LTP prot_hsymlinks failure on linux next 20190823 on
> > > x86_64 and i386 devices?
> > >
> > > test output log,
> > > useradd: failure while writing changes to /etc/passwd
> > > useradd: /home/hsym was created, but could not be removed
> >
> > This looks like an unrelated problem, failure to write to /etc/passwd
> > probably means that filesystem is full or some problem happend and how
> > is remounted RO.
>
> In Naresh' example, root is on NFS:
> root=/dev/nfs rw nfsroot=10.66.16.123:/var/lib/lava/dispatcher/tmp/886412/extract-nfsrootfs-tyuevoxm,tcp,hard,intr
Right !
root is mounted on NFS.
>
> 10.66.16.123:/var/lib/lava/dispatcher/tmp/886412/extract-nfsrootfs-tyuevoxm on / type nfs (rw,relatime,vers=2,rsize=4096,wsize=4096,namlen=255,hard,nolock,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.66.16.123,mountvers=1,mountproto=tcp,local_lock=all,addr=10.66.16.123)
> devtmpfs on /dev type devtmpfs (rw,relatime,size=3977640k,nr_inodes=994410,mode=755)
>
> Following message repeats couple times in logs:
> NFS: Server wrote zero bytes, expected XXX
>
> Naresh, can you check if there are any errors on NFS server side?
I have re-tested the failed tests on next-20190822 and all get pass
which is also
using same NFS server [1] [2].
> Maybe run NFS cthon against that server with client running next-20190822 and next-20190823.
Thanks for the pointers.
I will setup and run NFS cthon on next-20190822 and next-20190823.
>
> >
> > I do not see the kernel messages from this job anywhere at the job
> > pages, is it stored somewhere?
>
> It appears to be mixed in same log file:
> https://qa-reports.linaro.org/lkft/linux-next-oe/build/next-20190823/testrun/886412/log
For the record the following tests failed on linux -next-20190823 on x86_64
and i386. The filesystem is mounted on NFS and tests are using
locally mounted hard drive ( with -d /scratch ).
Juno-r2 device filesystem mounted on NFS and did not see these errors
and test getting pass on -next-20190823.
These failures are reproducible all time on next-20190823 kernel on x86_64
and i386 device with root mounted on NFS [3] [4] [5] [6].
I will git bisect to find out which is bad commit.
prot_hsymlinks: [3]
------------------
useradd: failure while writing changes to /etc/passwd
useradd: /home/hsym was created, but could not be removed
userdel: user 'hsym' does not exist
prot_hsymlinks 1 TBROK : prot_hsymlinks.c:325: Failed to run
cmd: useradd hsym
prot_hsymlinks 2 TBROK : prot_hsymlinks.c:325: Remaining cases broken
prot_hsymlinks 3 TBROK : prot_hsymlinks.c:325: Failed to run
cmd: userdel -r hsym
prot_hsymlinks 4 TBROK : tst_sig.c:234: unexpected signal
SIGIOT/SIGABRT(6) received (pid = 8324).
logrotate01: [4]
-------------
compressing log with: /bin/gzip
error: error creating temp state file /var/lib/logrotate.status.tmp:
Input/output error
logrotate01 1 TFAIL : ltpapicmd.c:154: Test #1: logrotate
command exited with 1 return code. Output:
sem_unlink_2-2: [5]
------------------
make[3]: Entering directory
'/opt/ltp/testcases/open_posix_testsuite/conformance/interfaces/sem_unlink'
cat: write error: Input/output error
conformance/interfaces/sem_unlink/sem_unlink_2-2: execution: FAILED
syslog{01 ...10} [6]
-------------------
cp: failed to close '/etc/syslog.conf.ltpback': Input/output error
syslog01 1 TBROK : ltpapicmd.c:188: failed to backup /etc/syslog.conf
cp: failed to close '/etc/syslog.conf.ltpback': Input/output error
syslog02 1 TBROK : ltpapicmd.c:188: failed to backup /etc/syslog.conf
...
cp: failed to close '/etc/syslog.conf.ltpback': Input/output error
syslog10 1 TBROK : ltpapicmd.c:188: failed to backup /etc/syslog.conf
ref:
PASS on 20190222:
[1] https://lkft.validation.linaro.org/scheduler/job/890446#L1232
[2] https://lkft.validation.linaro.org/scheduler/job/890454
FAILED on 20190823:
[3] https://lkft.validation.linaro.org/scheduler/job/890404#L1245
[4] https://lkft.validation.linaro.org/scheduler/job/886408#L2544
[5] https://lkft.validation.linaro.org/scheduler/job/886409#L3088
[6] https://lkft.validation.linaro.org/scheduler/job/890400#L1234
- Naresh
----- Original Message -----
> Hi Jan and Cyril,
>
> On Mon, 26 Aug 2019 at 16:35, Jan Stancek <[email protected]> wrote:
> >
> >
> >
> > ----- Original Message -----
> > > Hi!
> > > > Do you see this LTP prot_hsymlinks failure on linux next 20190823 on
> > > > x86_64 and i386 devices?
> > > >
> > > > test output log,
> > > > useradd: failure while writing changes to /etc/passwd
> > > > useradd: /home/hsym was created, but could not be removed
> > >
> > > This looks like an unrelated problem, failure to write to /etc/passwd
> > > probably means that filesystem is full or some problem happend and how
> > > is remounted RO.
> >
> > In Naresh' example, root is on NFS:
> > root=/dev/nfs rw
> > nfsroot=10.66.16.123:/var/lib/lava/dispatcher/tmp/886412/extract-nfsrootfs-tyuevoxm,tcp,hard,intr
>
> Right !
> root is mounted on NFS.
>
> >
> > 10.66.16.123:/var/lib/lava/dispatcher/tmp/886412/extract-nfsrootfs-tyuevoxm
> > on / type nfs
> > (rw,relatime,vers=2,rsize=4096,wsize=4096,namlen=255,hard,nolock,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.66.16.123,mountvers=1,mountproto=tcp,local_lock=all,addr=10.66.16.123)
> > devtmpfs on /dev type devtmpfs
> > (rw,relatime,size=3977640k,nr_inodes=994410,mode=755)
> >
> > Following message repeats couple times in logs:
> > NFS: Server wrote zero bytes, expected XXX
> >
> > Naresh, can you check if there are any errors on NFS server side?
>
> I have re-tested the failed tests on next-20190822 and all get pass
> which is also
> using same NFS server [1] [2].
Thanks, that suggests some client side change between next-20190822 and next-20190823
might introduced it.
>
> > Maybe run NFS cthon against that server with client running next-20190822
> > and next-20190823.
>
> Thanks for the pointers.
> I will setup and run NFS cthon on next-20190822 and next-20190823.
I'll try to reproduce too.
>
> >
> > >
> > > I do not see the kernel messages from this job anywhere at the job
> > > pages, is it stored somewhere?
> >
> > It appears to be mixed in same log file:
> > https://qa-reports.linaro.org/lkft/linux-next-oe/build/next-20190823/testrun/886412/log
>
> For the record the following tests failed on linux -next-20190823 on x86_64
> and i386. The filesystem is mounted on NFS and tests are using
> locally mounted hard drive ( with -d /scratch ).
>
> Juno-r2 device filesystem mounted on NFS and did not see these errors
> and test getting pass on -next-20190823.
>
> These failures are reproducible all time on next-20190823 kernel on x86_64
> and i386 device with root mounted on NFS [3] [4] [5] [6].
>
> I will git bisect to find out which is bad commit.
>
> prot_hsymlinks: [3]
> ------------------
> useradd: failure while writing changes to /etc/passwd
> useradd: /home/hsym was created, but could not be removed
> userdel: user 'hsym' does not exist
> prot_hsymlinks 1 TBROK : prot_hsymlinks.c:325: Failed to run
> cmd: useradd hsym
> prot_hsymlinks 2 TBROK : prot_hsymlinks.c:325: Remaining cases broken
> prot_hsymlinks 3 TBROK : prot_hsymlinks.c:325: Failed to run
> cmd: userdel -r hsym
> prot_hsymlinks 4 TBROK : tst_sig.c:234: unexpected signal
> SIGIOT/SIGABRT(6) received (pid = 8324).
>
> logrotate01: [4]
> -------------
> compressing log with: /bin/gzip
> error: error creating temp state file /var/lib/logrotate.status.tmp:
> Input/output error
> logrotate01 1 TFAIL : ltpapicmd.c:154: Test #1: logrotate
> command exited with 1 return code. Output:
>
> sem_unlink_2-2: [5]
> ------------------
> make[3]: Entering directory
> '/opt/ltp/testcases/open_posix_testsuite/conformance/interfaces/sem_unlink'
> cat: write error: Input/output error
> conformance/interfaces/sem_unlink/sem_unlink_2-2: execution: FAILED
>
> syslog{01 ...10} [6]
> -------------------
> cp: failed to close '/etc/syslog.conf.ltpback': Input/output error
> syslog01 1 TBROK : ltpapicmd.c:188: failed to backup /etc/syslog.conf
>
> cp: failed to close '/etc/syslog.conf.ltpback': Input/output error
> syslog02 1 TBROK : ltpapicmd.c:188: failed to backup /etc/syslog.conf
>
> ...
> cp: failed to close '/etc/syslog.conf.ltpback': Input/output error
> syslog10 1 TBROK : ltpapicmd.c:188: failed to backup /etc/syslog.conf
>
> ref:
> PASS on 20190222:
> [1] https://lkft.validation.linaro.org/scheduler/job/890446#L1232
> [2] https://lkft.validation.linaro.org/scheduler/job/890454
>
> FAILED on 20190823:
> [3] https://lkft.validation.linaro.org/scheduler/job/890404#L1245
> [4] https://lkft.validation.linaro.org/scheduler/job/886408#L2544
> [5] https://lkft.validation.linaro.org/scheduler/job/886409#L3088
> [6] https://lkft.validation.linaro.org/scheduler/job/890400#L1234
>
> - Naresh
>
On Mon, 2019-08-26 at 10:38 -0400, Jan Stancek wrote:
> ----- Original Message -----
> > Hi Jan and Cyril,
> >
> > On Mon, 26 Aug 2019 at 16:35, Jan Stancek <[email protected]>
> > wrote:
> > >
> > >
> > > ----- Original Message -----
> > > > Hi!
> > > > > Do you see this LTP prot_hsymlinks failure on linux next
> > > > > 20190823 on
> > > > > x86_64 and i386 devices?
> > > > >
> > > > > test output log,
> > > > > useradd: failure while writing changes to /etc/passwd
> > > > > useradd: /home/hsym was created, but could not be removed
> > > >
> > > > This looks like an unrelated problem, failure to write to
> > > > /etc/passwd
> > > > probably means that filesystem is full or some problem happend
> > > > and how
> > > > is remounted RO.
> > >
> > > In Naresh' example, root is on NFS:
> > > root=/dev/nfs rw
> > >
> > > nfsroot=10.66.16.123:/var/lib/lava/dispatcher/tmp/886412/extract-
> > > nfsrootfs-tyuevoxm,tcp,hard,intr
> >
> > Right !
> > root is mounted on NFS.
> >
> > > 10.66.16.123:/var/lib/lava/dispatcher/tmp/886412/extract-
> > > nfsrootfs-tyuevoxm
> > > on / type nfs
> > > (rw,relatime,vers=2,rsize=4096,wsize=4096,namlen=255,hard,nolock,
> > > proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.66.16.123,moun
> > > tvers=1,mountproto=tcp,local_lock=all,addr=10.66.16.123)
> > > devtmpfs on /dev type devtmpfs
> > > (rw,relatime,size=3977640k,nr_inodes=994410,mode=755)
> > >
The only thing I can think of that might cause an EIO on NFSv2 would be
this patch
http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=commitdiff;h=627d48e597ec5993c4abb3b81dc75e554a07c7c0
assuming that a bind-related error is leaking through.
I'd suggest something like the following to fix it up:
8<---------------------------------------
From 1e9336ac5363914dfcc1f49bf091409edbf36f8d Mon Sep 17 00:00:00 2001
From: Trond Myklebust <[email protected]>
Date: Mon, 26 Aug 2019 11:44:04 -0400
Subject: [PATCH] fixup! SUNRPC: Don't handle errors if the bind/connect
succeeded
Signed-off-by: Trond Myklebust <[email protected]>
---
net/sunrpc/clnt.c | 15 +++++++++++----
1 file changed, 11 insertions(+), 4 deletions(-)
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index f13ec73c8299..a07b516e503a 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -1980,9 +1980,11 @@ call_bind_status(struct rpc_task *task)
dprint_status(task);
trace_rpc_bind_status(task);
- if (task->tk_status >= 0 || xprt_bound(xprt)) {
- task->tk_action = call_connect;
- return;
+ if (task->tk_status >= 0)
+ goto out_next;
+ if (xprt_bound(xprt)) {
+ task->tk_status = 0;
+ goto out_next;
}
switch (task->tk_status) {
@@ -2045,6 +2047,9 @@ call_bind_status(struct rpc_task *task)
rpc_call_rpcerror(task, status);
return;
+out_next:
+ task->tk_action = call_connect;
+ return;
retry_timeout:
task->tk_status = 0;
task->tk_action = call_bind;
@@ -2107,8 +2112,10 @@ call_connect_status(struct rpc_task *task)
clnt->cl_stats->netreconn++;
goto out_next;
}
- if (xprt_connected(xprt))
+ if (xprt_connected(xprt)) {
+ task->tk_status = 0;
goto out_next;
+ }
task->tk_status = 0;
switch (status) {
--
2.21.0
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]
----- Original Message -----
> On Mon, 2019-08-26 at 10:38 -0400, Jan Stancek wrote:
> > ----- Original Message -----
> > > Hi Jan and Cyril,
> > >
> > > On Mon, 26 Aug 2019 at 16:35, Jan Stancek <[email protected]>
> > > wrote:
> > > >
> > > >
> > > > ----- Original Message -----
> > > > > Hi!
> > > > > > Do you see this LTP prot_hsymlinks failure on linux next
> > > > > > 20190823 on
> > > > > > x86_64 and i386 devices?
> > > > > >
> > > > > > test output log,
> > > > > > useradd: failure while writing changes to /etc/passwd
> > > > > > useradd: /home/hsym was created, but could not be removed
> > > > >
> > > > > This looks like an unrelated problem, failure to write to
> > > > > /etc/passwd
> > > > > probably means that filesystem is full or some problem happend
> > > > > and how
> > > > > is remounted RO.
> > > >
> > > > In Naresh' example, root is on NFS:
> > > > root=/dev/nfs rw
> > > >
> > > > nfsroot=10.66.16.123:/var/lib/lava/dispatcher/tmp/886412/extract-
> > > > nfsrootfs-tyuevoxm,tcp,hard,intr
> > >
> > > Right !
> > > root is mounted on NFS.
> > >
> > > > 10.66.16.123:/var/lib/lava/dispatcher/tmp/886412/extract-
> > > > nfsrootfs-tyuevoxm
> > > > on / type nfs
> > > > (rw,relatime,vers=2,rsize=4096,wsize=4096,namlen=255,hard,nolock,
> > > > proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.66.16.123,moun
> > > > tvers=1,mountproto=tcp,local_lock=all,addr=10.66.16.123)
> > > > devtmpfs on /dev type devtmpfs
> > > > (rw,relatime,size=3977640k,nr_inodes=994410,mode=755)
> > > >
>
> The only thing I can think of that might cause an EIO on NFSv2 would be
> this patch
> http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=commitdiff;h=627d48e597ec5993c4abb3b81dc75e554a07c7c0
> assuming that a bind-related error is leaking through.
>
> I'd suggest something like the following to fix it up:
No change with that patch,
but following one fixes it for me:
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index 20b3717cd7ca..56cefa0ab804 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -590,7 +590,7 @@ static void nfs_pgio_rpcsetup(struct nfs_pgio_header *hdr,
}
hdr->res.fattr = &hdr->fattr;
- hdr->res.count = 0;
+ hdr->res.count = count;
hdr->res.eof = 0;
hdr->res.verf = &hdr->verf;
nfs_fattr_init(&hdr->fattr);
which is functionally revert of "NFS: Fix initialisation of I/O result struct in nfs_pgio_rpcsetup".
This hunk caught my eye, could res.eof == 0 explain those I/O errors?
/* Emulate the eof flag, which isn't normally needed in NFSv2
* as it is guaranteed to always return the file attributes
*/
if (hdr->args.offset + hdr->res.count >= hdr->res.fattr->size)
hdr->res.eof = 1;
On Mon, 2019-08-26 at 19:12 -0400, Jan Stancek wrote:
> ----- Original Message -----
> > On Mon, 2019-08-26 at 10:38 -0400, Jan Stancek wrote:
> > > ----- Original Message -----
> > > > Hi Jan and Cyril,
> > > >
> > > > On Mon, 26 Aug 2019 at 16:35, Jan Stancek <[email protected]>
> > > > wrote:
> > > > >
> > > > > ----- Original Message -----
> > > > > > Hi!
> > > > > > > Do you see this LTP prot_hsymlinks failure on linux next
> > > > > > > 20190823 on
> > > > > > > x86_64 and i386 devices?
> > > > > > >
> > > > > > > test output log,
> > > > > > > useradd: failure while writing changes to /etc/passwd
> > > > > > > useradd: /home/hsym was created, but could not be removed
> > > > > >
> > > > > > This looks like an unrelated problem, failure to write to
> > > > > > /etc/passwd
> > > > > > probably means that filesystem is full or some problem
> > > > > > happend
> > > > > > and how
> > > > > > is remounted RO.
> > > > >
> > > > > In Naresh' example, root is on NFS:
> > > > > root=/dev/nfs rw
> > > > >
> > > > > nfsroot=10.66.16.123:/var/lib/lava/dispatcher/tmp/886412/extr
> > > > > act-
> > > > > nfsrootfs-tyuevoxm,tcp,hard,intr
> > > >
> > > > Right !
> > > > root is mounted on NFS.
> > > >
> > > > > 10.66.16.123:/var/lib/lava/dispatcher/tmp/886412/extract-
> > > > > nfsrootfs-tyuevoxm
> > > > > on / type nfs
> > > > > (rw,relatime,vers=2,rsize=4096,wsize=4096,namlen=255,hard,nol
> > > > > ock,
> > > > > proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.66.16.123,
> > > > > moun
> > > > > tvers=1,mountproto=tcp,local_lock=all,addr=10.66.16.123)
> > > > > devtmpfs on /dev type devtmpfs
> > > > > (rw,relatime,size=3977640k,nr_inodes=994410,mode=755)
> > > > >
> >
> > The only thing I can think of that might cause an EIO on NFSv2
> > would be
> > this patch
> > http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=commitdiff;h=627d48e597ec5993c4abb3b81dc75e554a07c7c0
> > assuming that a bind-related error is leaking through.
> >
> > I'd suggest something like the following to fix it up:
>
> No change with that patch,
> but following one fixes it for me:
>
> diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
> index 20b3717cd7ca..56cefa0ab804 100644
> --- a/fs/nfs/pagelist.c
> +++ b/fs/nfs/pagelist.c
> @@ -590,7 +590,7 @@ static void nfs_pgio_rpcsetup(struct
> nfs_pgio_header *hdr,
> }
>
> hdr->res.fattr = &hdr->fattr;
> - hdr->res.count = 0;
> + hdr->res.count = count;
> hdr->res.eof = 0;
> hdr->res.verf = &hdr->verf;
> nfs_fattr_init(&hdr->fattr);
>
> which is functionally revert of "NFS: Fix initialisation of I/O
> result struct in nfs_pgio_rpcsetup".
>
> This hunk caught my eye, could res.eof == 0 explain those I/O errors?
Interesting hypothesis. It could if res.count ends up being 0. So does
the following also fix the problem?
8<----------------------------------------
From b5bc0812350e94f8c9331174d22f24692411aef9 Mon Sep 17 00:00:00 2001
From: Trond Myklebust <[email protected]>
Date: Mon, 26 Aug 2019 20:41:16 -0400
Subject: [PATCH] NFSv2: Fix eof handling
If we received a reply from the server with a zero length read and
no error, then that implies we are at eof.
Signed-off-by: Trond Myklebust <[email protected]>
---
fs/nfs/proc.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/nfs/proc.c b/fs/nfs/proc.c
index 5552fa8b6e12..5919878549d2 100644
--- a/fs/nfs/proc.c
+++ b/fs/nfs/proc.c
@@ -594,7 +594,8 @@ static int nfs_read_done(struct rpc_task *task, struct nfs_pgio_header *hdr)
/* Emulate the eof flag, which isn't normally needed in NFSv2
* as it is guaranteed to always return the file attributes
*/
- if (hdr->args.offset + hdr->res.count >= hdr->res.fattr->size)
+ if (hdr->res.count == 0 && hdr->args.count > 0 ||
+ hdr->args.offset + hdr->res.count >= hdr->res.fattr->size)
hdr->res.eof = 1;
}
return 0;
--
2.21.0
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]
On Tue, 27 Aug 2019 at 04:42, Jan Stancek <[email protected]> wrote:
>
>
> ----- Original Message -----
> > On Mon, 2019-08-26 at 10:38 -0400, Jan Stancek wrote:
>
> No change with that patch,
Same for me.
> but following one fixes it for me:
Works for me.
Thanks for the fix patch.
>
> diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
> index 20b3717cd7ca..56cefa0ab804 100644
> --- a/fs/nfs/pagelist.c
> +++ b/fs/nfs/pagelist.c
> @@ -590,7 +590,7 @@ static void nfs_pgio_rpcsetup(struct nfs_pgio_header *hdr,
> }
>
> hdr->res.fattr = &hdr->fattr;
> - hdr->res.count = 0;
> + hdr->res.count = count;
> hdr->res.eof = 0;
> hdr->res.verf = &hdr->verf;
> nfs_fattr_init(&hdr->fattr);
>
> which is functionally revert of "NFS: Fix initialisation of I/O result struct in nfs_pgio_rpcsetup".
>
> This hunk caught my eye, could res.eof == 0 explain those I/O errors?
> /* Emulate the eof flag, which isn't normally needed in NFSv2
> * as it is guaranteed to always return the file attributes
> */
> if (hdr->args.offset + hdr->res.count >= hdr->res.fattr->size)
> hdr->res.eof = 1;
- Naresh
----- Original Message -----
> On Mon, 2019-08-26 at 19:12 -0400, Jan Stancek wrote:
> > ----- Original Message -----
> > > On Mon, 2019-08-26 at 10:38 -0400, Jan Stancek wrote:
> > > > ----- Original Message -----
> > > > > Hi Jan and Cyril,
> > > > >
> > > > > On Mon, 26 Aug 2019 at 16:35, Jan Stancek <[email protected]>
> > > > > wrote:
> > > > > >
> > > > > > ----- Original Message -----
> > > > > > > Hi!
> > > > > > > > Do you see this LTP prot_hsymlinks failure on linux next
> > > > > > > > 20190823 on
> > > > > > > > x86_64 and i386 devices?
> > > > > > > >
> > > > > > > > test output log,
> > > > > > > > useradd: failure while writing changes to /etc/passwd
> > > > > > > > useradd: /home/hsym was created, but could not be removed
> > > > > > >
> > > > > > > This looks like an unrelated problem, failure to write to
> > > > > > > /etc/passwd
> > > > > > > probably means that filesystem is full or some problem
> > > > > > > happend
> > > > > > > and how
> > > > > > > is remounted RO.
> > > > > >
> > > > > > In Naresh' example, root is on NFS:
> > > > > > root=/dev/nfs rw
> > > > > >
> > > > > > nfsroot=10.66.16.123:/var/lib/lava/dispatcher/tmp/886412/extr
> > > > > > act-
> > > > > > nfsrootfs-tyuevoxm,tcp,hard,intr
> > > > >
> > > > > Right !
> > > > > root is mounted on NFS.
> > > > >
> > > > > > 10.66.16.123:/var/lib/lava/dispatcher/tmp/886412/extract-
> > > > > > nfsrootfs-tyuevoxm
> > > > > > on / type nfs
> > > > > > (rw,relatime,vers=2,rsize=4096,wsize=4096,namlen=255,hard,nol
> > > > > > ock,
> > > > > > proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.66.16.123,
> > > > > > moun
> > > > > > tvers=1,mountproto=tcp,local_lock=all,addr=10.66.16.123)
> > > > > > devtmpfs on /dev type devtmpfs
> > > > > > (rw,relatime,size=3977640k,nr_inodes=994410,mode=755)
> > > > > >
> > >
> > > The only thing I can think of that might cause an EIO on NFSv2
> > > would be
> > > this patch
> > > http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=commitdiff;h=627d48e597ec5993c4abb3b81dc75e554a07c7c0
> > > assuming that a bind-related error is leaking through.
> > >
> > > I'd suggest something like the following to fix it up:
> >
> > No change with that patch,
> > but following one fixes it for me:
> >
> > diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
> > index 20b3717cd7ca..56cefa0ab804 100644
> > --- a/fs/nfs/pagelist.c
> > +++ b/fs/nfs/pagelist.c
> > @@ -590,7 +590,7 @@ static void nfs_pgio_rpcsetup(struct
> > nfs_pgio_header *hdr,
> > }
> >
> > hdr->res.fattr = &hdr->fattr;
> > - hdr->res.count = 0;
> > + hdr->res.count = count;
> > hdr->res.eof = 0;
> > hdr->res.verf = &hdr->verf;
> > nfs_fattr_init(&hdr->fattr);
> >
> > which is functionally revert of "NFS: Fix initialisation of I/O
> > result struct in nfs_pgio_rpcsetup".
> >
> > This hunk caught my eye, could res.eof == 0 explain those I/O errors?
>
> Interesting hypothesis. It could if res.count ends up being 0. So does
> the following also fix the problem?
It didn't fix it.
That theory is probably not correct for this case, since EIO I see appears
to originate from write and nfs_writeback_result(). This function also
produces message we saw in logs from Naresh.
I can't find where/how is resp->count updated on WRITE reply in NFSv2.
Issue also goes away with patch below, though I can't speak about its correctness:
NFS version Type Test Return code
nfsvers=2 tcp -b:base 0
nfsvers=2 tcp -g:general 0
nfsvers=2 tcp -s:special 0
nfsvers=2 tcp -l:lock 0
Total time: 141
diff --git a/fs/nfs/nfs2xdr.c b/fs/nfs/nfs2xdr.c
index cbc17a203248..4913c6da270b 100644
--- a/fs/nfs/nfs2xdr.c
+++ b/fs/nfs/nfs2xdr.c
@@ -897,6 +897,16 @@ static int nfs2_xdr_dec_writeres(struct rpc_rqst *req, struct xdr_stream *xdr,
void *data)
{
struct nfs_pgio_res *result = data;
+ struct rpc_task *rq_task = req->rq_task;
+
+ if (rq_task) {
+ struct nfs_pgio_args *args = rq_task->tk_msg.rpc_argp;
+
+ if (args) {
+ result->count = args->count;
+ }
+ }
/* All NFSv2 writes are "file sync" writes */
result->verf->committed = NFS_FILE_SYNC;
On Tue, 2019-08-27 at 06:25 -0400, Jan Stancek wrote:
> That theory is probably not correct for this case, since EIO I see
> appears
> to originate from write and nfs_writeback_result(). This function
> also
> produces message we saw in logs from Naresh.
>
> I can't find where/how is resp->count updated on WRITE reply in
> NFSv2.
> Issue also goes away with patch below, though I can't speak about its
> correctness:
>
> NFS version Type Test Return code
> nfsvers=2 tcp -b:base 0
> nfsvers=2 tcp -g:general 0
> nfsvers=2 tcp -s:special 0
> nfsvers=2 tcp -l:lock 0
> Total time: 141
>
> diff --git a/fs/nfs/nfs2xdr.c b/fs/nfs/nfs2xdr.c
> index cbc17a203248..4913c6da270b 100644
> --- a/fs/nfs/nfs2xdr.c
> +++ b/fs/nfs/nfs2xdr.c
> @@ -897,6 +897,16 @@ static int nfs2_xdr_dec_writeres(struct rpc_rqst
> *req, struct xdr_stream *xdr,
> void *data)
> {
> struct nfs_pgio_res *result = data;
> + struct rpc_task *rq_task = req->rq_task;
> +
> + if (rq_task) {
> + struct nfs_pgio_args *args = rq_task-
> >tk_msg.rpc_argp;
> +
> + if (args) {
> + result->count = args->count;
> + }
> + }
>
> /* All NFSv2 writes are "file sync" writes */
> result->verf->committed = NFS_FILE_SYNC;
Thanks! I've moved the above to nfs_write_done() so that we do it only
on success (see
http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=commitdiff;h=3ba5688da709dd0f7d917029c206bc1848a6ae74
)
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]
----- Original Message -----
> On Tue, 2019-08-27 at 06:25 -0400, Jan Stancek wrote:
> > That theory is probably not correct for this case, since EIO I see
> > appears
> > to originate from write and nfs_writeback_result(). This function
> > also
> > produces message we saw in logs from Naresh.
> >
> > I can't find where/how is resp->count updated on WRITE reply in
> > NFSv2.
> > Issue also goes away with patch below, though I can't speak about its
> > correctness:
> >
> > NFS version Type Test Return code
> > nfsvers=2 tcp -b:base 0
> > nfsvers=2 tcp -g:general 0
> > nfsvers=2 tcp -s:special 0
> > nfsvers=2 tcp -l:lock 0
> > Total time: 141
> >
> > diff --git a/fs/nfs/nfs2xdr.c b/fs/nfs/nfs2xdr.c
> > index cbc17a203248..4913c6da270b 100644
> > --- a/fs/nfs/nfs2xdr.c
> > +++ b/fs/nfs/nfs2xdr.c
> > @@ -897,6 +897,16 @@ static int nfs2_xdr_dec_writeres(struct rpc_rqst
> > *req, struct xdr_stream *xdr,
> > void *data)
> > {
> > struct nfs_pgio_res *result = data;
> > + struct rpc_task *rq_task = req->rq_task;
> > +
> > + if (rq_task) {
> > + struct nfs_pgio_args *args = rq_task-
> > >tk_msg.rpc_argp;
> > +
> > + if (args) {
> > + result->count = args->count;
> > + }
> > + }
> >
> > /* All NFSv2 writes are "file sync" writes */
> > result->verf->committed = NFS_FILE_SYNC;
>
> Thanks! I've moved the above to nfs_write_done() so that we do it only
> on success (see
> http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=commitdiff;h=3ba5688da709dd0f7d917029c206bc1848a6ae74
> )
Thanks, retested with 3ba5688da, all PASS:
NFS version Type Test Return code
nfsvers=2 tcp -b:base 0
nfsvers=2 tcp -g:general 0
nfsvers=2 tcp -s:special 0
nfsvers=2 tcp -l:lock 0
NFS version Type Test Return code
nfsvers=3 tcp -b:base 0
nfsvers=3 tcp -g:general 0
nfsvers=3 tcp -s:special 0
nfsvers=3 tcp -l:lock 0
nfsvers=3 tcp6 -b:base 0
nfsvers=3 tcp6 -g:general 0
nfsvers=3 tcp6 -s:special 0
nfsvers=3 tcp6 -l:lock 0
NFS version Type Test Return code
nfsvers=4 tcp -b:base 0
nfsvers=4 tcp -g:general 0
nfsvers=4 tcp -s:special 0
nfsvers=4 tcp -l:lock 0
nfsvers=4 tcp6 -b:base 0
nfsvers=4 tcp6 -g:general 0
nfsvers=4 tcp6 -s:special 0
nfsvers=4 tcp6 -l:lock 0
Feel free to add also:
Reported-by: Naresh Kamboju <[email protected]>
Tested-by: Jan Stancek <[email protected]>