2021-06-15 16:48:00

by Thiago Rafael Becker

[permalink] [raw]
Subject: [PATCH] cifs: retry lookup and readdir when EAGAIN is returned.

According to the investigation performed by Jacob Shivers at Red Hat,
cifs_lookup and cifs_readdir leak EAGAIN when the user session is
deleted on the server. Fix this issue by implementing a retry with
limits, as is implemented in cifs_revalidate_dentry_attr.

Reproducer based on the work by Jacob Shivers:

~~~
$ cat readdir-cifs-test.sh
#!/bin/bash

# Install and configure powershell and sshd on the windows
# server as descibed in
# https://docs.microsoft.com/en-us/windows-server/administration/openssh/openssh_overview
# This script uses expect(1)

USER=dude
SERVER=192.168.0.2
RPATH=root
PASS='password'

function debug_funcs {
for line in $@ ; do
echo "func $line +p" > /sys/kernel/debug/dynamic_debug/control
done
}

function setup {
echo 1 > /proc/fs/cifs/cifsFYI
debug_funcs wait_for_compound_request \
smb2_query_dir_first cifs_readdir \
compound_send_recv cifs_reconnect_tcon \
generic_ip_connect cifs_reconnect \
smb2_reconnect_server smb2_reconnect \
cifs_readv_from_socket cifs_readv_receive
tcpdump -i eth0 -w cifs.pcap host 192.168.2.182 & sleep 5
dmesg -C
}

function test_call {
if [[ $1 == 1 ]] ; then
tracer="strace -tt -f -s 4096 -o trace-$(date -Iseconds).txt"
fi
# Change the command here to anything apropriate
$tracer ls $2 > /dev/null
res=$?
if [[ $1 == 1 ]] ; then
if [[ $res == 0 ]] ; then
1>&2 echo success
else
1>&2 echo "failure ($res)"
fi
fi
}

mountpoint /mnt > /dev/null || mount -t cifs -o username=$USER,pass=$PASS //$SERVER/$RPATH /mnt

test_call 0 /mnt/

/usr/bin/expect << EOF
set timeout 60

spawn ssh $USER@$SERVER

expect "yes/no" {
send "yes\r"
expect "*?assword" { send "$PASS\r" }
} "*?assword" { send "$PASS\r" }

expect ">" { send "powershell close-smbsession -force\r" }
expect ">" { send "exit\r" }
expect eof
EOF

sysctl -w vm.drop_caches=2 > /dev/null
sysctl -w vm.drop_caches=2 > /dev/null

setup

test_call 1 /mnt/
~~~

Signed-off-by: Thiago Rafael Becker <[email protected]>
---
fs/cifs/dir.c | 4 ++++
fs/cifs/smb2ops.c | 5 +++++
2 files changed, 9 insertions(+)

diff --git a/fs/cifs/dir.c b/fs/cifs/dir.c
index 6bcd3e8f7cda..7c641f9a3dac 100644
--- a/fs/cifs/dir.c
+++ b/fs/cifs/dir.c
@@ -630,6 +630,7 @@ cifs_lookup(struct inode *parent_dir_inode, struct dentry *direntry,
struct inode *newInode = NULL;
const char *full_path;
void *page;
+ int retry_count = 0;

xid = get_xid();

@@ -673,6 +674,7 @@ cifs_lookup(struct inode *parent_dir_inode, struct dentry *direntry,
cifs_dbg(FYI, "Full path: %s inode = 0x%p\n",
full_path, d_inode(direntry));

+again:
if (pTcon->posix_extensions)
rc = smb311_posix_get_inode_info(&newInode, full_path, parent_dir_inode->i_sb, xid);
else if (pTcon->unix_ext) {
@@ -687,6 +689,8 @@ cifs_lookup(struct inode *parent_dir_inode, struct dentry *direntry,
/* since paths are not looked up by component - the parent
directories are presumed to be good here */
renew_parental_timestamps(direntry);
+ } else if (rc == -EAGAIN && retry_count++ < 10) {
+ goto again;
} else if (rc == -ENOENT) {
cifs_set_time(direntry, jiffies);
newInode = NULL;
diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c
index 21ef51d338e0..d241e6af8fe4 100644
--- a/fs/cifs/smb2ops.c
+++ b/fs/cifs/smb2ops.c
@@ -2325,6 +2325,7 @@ smb2_query_dir_first(const unsigned int xid, struct cifs_tcon *tcon,
struct smb2_query_directory_rsp *qd_rsp = NULL;
struct smb2_create_rsp *op_rsp = NULL;
struct TCP_Server_Info *server = cifs_pick_channel(tcon->ses);
+ int retry_count = 0;

utf16_path = cifs_convert_path_to_utf16(path, cifs_sb);
if (!utf16_path)
@@ -2372,10 +2373,14 @@ smb2_query_dir_first(const unsigned int xid, struct cifs_tcon *tcon,

smb2_set_related(&rqst[1]);

+again:
rc = compound_send_recv(xid, tcon->ses, server,
flags, 2, rqst,
resp_buftype, rsp_iov);

+ if (rc == -EAGAIN && retry_count++ < 10)
+ goto again;
+
/* If the open failed there is nothing to do */
op_rsp = (struct smb2_create_rsp *)rsp_iov[0].iov_base;
if (op_rsp == NULL || op_rsp->sync_hdr.Status != STATUS_SUCCESS) {
--
2.31.1


2021-06-16 06:10:24

by ronnie sahlberg

[permalink] [raw]
Subject: Re: [PATCH] cifs: retry lookup and readdir when EAGAIN is returned.

Looks good to me.

Acked-by: Ronnie Sahlberg <[email protected]>

On Wed, Jun 16, 2021 at 2:44 AM Thiago Rafael Becker <[email protected]> wrote:
>
> According to the investigation performed by Jacob Shivers at Red Hat,
> cifs_lookup and cifs_readdir leak EAGAIN when the user session is
> deleted on the server. Fix this issue by implementing a retry with
> limits, as is implemented in cifs_revalidate_dentry_attr.
>
> Reproducer based on the work by Jacob Shivers:
>
> ~~~
> $ cat readdir-cifs-test.sh
> #!/bin/bash
>
> # Install and configure powershell and sshd on the windows
> # server as descibed in
> # https://docs.microsoft.com/en-us/windows-server/administration/openssh/openssh_overview
> # This script uses expect(1)
>
> USER=dude
> SERVER=192.168.0.2
> RPATH=root
> PASS='password'
>
> function debug_funcs {
> for line in $@ ; do
> echo "func $line +p" > /sys/kernel/debug/dynamic_debug/control
> done
> }
>
> function setup {
> echo 1 > /proc/fs/cifs/cifsFYI
> debug_funcs wait_for_compound_request \
> smb2_query_dir_first cifs_readdir \
> compound_send_recv cifs_reconnect_tcon \
> generic_ip_connect cifs_reconnect \
> smb2_reconnect_server smb2_reconnect \
> cifs_readv_from_socket cifs_readv_receive
> tcpdump -i eth0 -w cifs.pcap host 192.168.2.182 & sleep 5
> dmesg -C
> }
>
> function test_call {
> if [[ $1 == 1 ]] ; then
> tracer="strace -tt -f -s 4096 -o trace-$(date -Iseconds).txt"
> fi
> # Change the command here to anything apropriate
> $tracer ls $2 > /dev/null
> res=$?
> if [[ $1 == 1 ]] ; then
> if [[ $res == 0 ]] ; then
> 1>&2 echo success
> else
> 1>&2 echo "failure ($res)"
> fi
> fi
> }
>
> mountpoint /mnt > /dev/null || mount -t cifs -o username=$USER,pass=$PASS //$SERVER/$RPATH /mnt
>
> test_call 0 /mnt/
>
> /usr/bin/expect << EOF
> set timeout 60
>
> spawn ssh $USER@$SERVER
>
> expect "yes/no" {
> send "yes\r"
> expect "*?assword" { send "$PASS\r" }
> } "*?assword" { send "$PASS\r" }
>
> expect ">" { send "powershell close-smbsession -force\r" }
> expect ">" { send "exit\r" }
> expect eof
> EOF
>
> sysctl -w vm.drop_caches=2 > /dev/null
> sysctl -w vm.drop_caches=2 > /dev/null
>
> setup
>
> test_call 1 /mnt/
> ~~~
>
> Signed-off-by: Thiago Rafael Becker <[email protected]>
> ---
> fs/cifs/dir.c | 4 ++++
> fs/cifs/smb2ops.c | 5 +++++
> 2 files changed, 9 insertions(+)
>
> diff --git a/fs/cifs/dir.c b/fs/cifs/dir.c
> index 6bcd3e8f7cda..7c641f9a3dac 100644
> --- a/fs/cifs/dir.c
> +++ b/fs/cifs/dir.c
> @@ -630,6 +630,7 @@ cifs_lookup(struct inode *parent_dir_inode, struct dentry *direntry,
> struct inode *newInode = NULL;
> const char *full_path;
> void *page;
> + int retry_count = 0;
>
> xid = get_xid();
>
> @@ -673,6 +674,7 @@ cifs_lookup(struct inode *parent_dir_inode, struct dentry *direntry,
> cifs_dbg(FYI, "Full path: %s inode = 0x%p\n",
> full_path, d_inode(direntry));
>
> +again:
> if (pTcon->posix_extensions)
> rc = smb311_posix_get_inode_info(&newInode, full_path, parent_dir_inode->i_sb, xid);
> else if (pTcon->unix_ext) {
> @@ -687,6 +689,8 @@ cifs_lookup(struct inode *parent_dir_inode, struct dentry *direntry,
> /* since paths are not looked up by component - the parent
> directories are presumed to be good here */
> renew_parental_timestamps(direntry);
> + } else if (rc == -EAGAIN && retry_count++ < 10) {
> + goto again;
> } else if (rc == -ENOENT) {
> cifs_set_time(direntry, jiffies);
> newInode = NULL;
> diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c
> index 21ef51d338e0..d241e6af8fe4 100644
> --- a/fs/cifs/smb2ops.c
> +++ b/fs/cifs/smb2ops.c
> @@ -2325,6 +2325,7 @@ smb2_query_dir_first(const unsigned int xid, struct cifs_tcon *tcon,
> struct smb2_query_directory_rsp *qd_rsp = NULL;
> struct smb2_create_rsp *op_rsp = NULL;
> struct TCP_Server_Info *server = cifs_pick_channel(tcon->ses);
> + int retry_count = 0;
>
> utf16_path = cifs_convert_path_to_utf16(path, cifs_sb);
> if (!utf16_path)
> @@ -2372,10 +2373,14 @@ smb2_query_dir_first(const unsigned int xid, struct cifs_tcon *tcon,
>
> smb2_set_related(&rqst[1]);
>
> +again:
> rc = compound_send_recv(xid, tcon->ses, server,
> flags, 2, rqst,
> resp_buftype, rsp_iov);
>
> + if (rc == -EAGAIN && retry_count++ < 10)
> + goto again;
> +
> /* If the open failed there is nothing to do */
> op_rsp = (struct smb2_create_rsp *)rsp_iov[0].iov_base;
> if (op_rsp == NULL || op_rsp->sync_hdr.Status != STATUS_SUCCESS) {
> --
> 2.31.1
>

2021-06-16 10:09:29

by Aurélien Aptel

[permalink] [raw]
Subject: Re: [PATCH] cifs: retry lookup and readdir when EAGAIN is returned.


Thanks for the patch and reproducer Thiago.

ronnie sahlberg <[email protected]> writes:
> Looks good to me.

I guess it looks ok as a quick fix for the issue at hand but:
- should we check for more error codes (use is_retryable_error()?)
- all syscalls will need something similar, the session can be closed at
any point
* either we add a loop everywhere
* or we change cifs_send_receive to retry (most?) calls
* might be worth looking at what nfs does here
- Should this be done for both soft (default) and hard mounts? I guess
for hard we would retry indefinitely

Cheers,
--
Aurélien Aptel / SUSE Labs Samba Team
GPG: 1839 CB5F 9F5B FB9B AA97 8C99 03C8 A49B 521B D5D3
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg, DE
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah HRB 247165 (AG München)

2021-06-16 22:55:06

by Steve French

[permalink] [raw]
Subject: Re: [PATCH] cifs: retry lookup and readdir when EAGAIN is returned.

tentatively merged into cifs-2.6.git for-next pending testing and any
more review feedback

On Wed, Jun 16, 2021 at 1:09 AM ronnie sahlberg
<[email protected]> wrote:
>
> Looks good to me.
>
> Acked-by: Ronnie Sahlberg <[email protected]>
>
> On Wed, Jun 16, 2021 at 2:44 AM Thiago Rafael Becker <[email protected]> wrote:
> >
> > According to the investigation performed by Jacob Shivers at Red Hat,
> > cifs_lookup and cifs_readdir leak EAGAIN when the user session is
> > deleted on the server. Fix this issue by implementing a retry with
> > limits, as is implemented in cifs_revalidate_dentry_attr.
> >
> > Reproducer based on the work by Jacob Shivers:
> >
> > ~~~
> > $ cat readdir-cifs-test.sh
> > #!/bin/bash
> >
> > # Install and configure powershell and sshd on the windows
> > # server as descibed in
> > # https://docs.microsoft.com/en-us/windows-server/administration/openssh/openssh_overview
> > # This script uses expect(1)
> >
> > USER=dude
> > SERVER=192.168.0.2
> > RPATH=root
> > PASS='password'
> >
> > function debug_funcs {
> > for line in $@ ; do
> > echo "func $line +p" > /sys/kernel/debug/dynamic_debug/control
> > done
> > }
> >
> > function setup {
> > echo 1 > /proc/fs/cifs/cifsFYI
> > debug_funcs wait_for_compound_request \
> > smb2_query_dir_first cifs_readdir \
> > compound_send_recv cifs_reconnect_tcon \
> > generic_ip_connect cifs_reconnect \
> > smb2_reconnect_server smb2_reconnect \
> > cifs_readv_from_socket cifs_readv_receive
> > tcpdump -i eth0 -w cifs.pcap host 192.168.2.182 & sleep 5
> > dmesg -C
> > }
> >
> > function test_call {
> > if [[ $1 == 1 ]] ; then
> > tracer="strace -tt -f -s 4096 -o trace-$(date -Iseconds).txt"
> > fi
> > # Change the command here to anything apropriate
> > $tracer ls $2 > /dev/null
> > res=$?
> > if [[ $1 == 1 ]] ; then
> > if [[ $res == 0 ]] ; then
> > 1>&2 echo success
> > else
> > 1>&2 echo "failure ($res)"
> > fi
> > fi
> > }
> >
> > mountpoint /mnt > /dev/null || mount -t cifs -o username=$USER,pass=$PASS //$SERVER/$RPATH /mnt
> >
> > test_call 0 /mnt/
> >
> > /usr/bin/expect << EOF
> > set timeout 60
> >
> > spawn ssh $USER@$SERVER
> >
> > expect "yes/no" {
> > send "yes\r"
> > expect "*?assword" { send "$PASS\r" }
> > } "*?assword" { send "$PASS\r" }
> >
> > expect ">" { send "powershell close-smbsession -force\r" }
> > expect ">" { send "exit\r" }
> > expect eof
> > EOF
> >
> > sysctl -w vm.drop_caches=2 > /dev/null
> > sysctl -w vm.drop_caches=2 > /dev/null
> >
> > setup
> >
> > test_call 1 /mnt/
> > ~~~
> >
> > Signed-off-by: Thiago Rafael Becker <[email protected]>
> > ---
> > fs/cifs/dir.c | 4 ++++
> > fs/cifs/smb2ops.c | 5 +++++
> > 2 files changed, 9 insertions(+)
> >
> > diff --git a/fs/cifs/dir.c b/fs/cifs/dir.c
> > index 6bcd3e8f7cda..7c641f9a3dac 100644
> > --- a/fs/cifs/dir.c
> > +++ b/fs/cifs/dir.c
> > @@ -630,6 +630,7 @@ cifs_lookup(struct inode *parent_dir_inode, struct dentry *direntry,
> > struct inode *newInode = NULL;
> > const char *full_path;
> > void *page;
> > + int retry_count = 0;
> >
> > xid = get_xid();
> >
> > @@ -673,6 +674,7 @@ cifs_lookup(struct inode *parent_dir_inode, struct dentry *direntry,
> > cifs_dbg(FYI, "Full path: %s inode = 0x%p\n",
> > full_path, d_inode(direntry));
> >
> > +again:
> > if (pTcon->posix_extensions)
> > rc = smb311_posix_get_inode_info(&newInode, full_path, parent_dir_inode->i_sb, xid);
> > else if (pTcon->unix_ext) {
> > @@ -687,6 +689,8 @@ cifs_lookup(struct inode *parent_dir_inode, struct dentry *direntry,
> > /* since paths are not looked up by component - the parent
> > directories are presumed to be good here */
> > renew_parental_timestamps(direntry);
> > + } else if (rc == -EAGAIN && retry_count++ < 10) {
> > + goto again;
> > } else if (rc == -ENOENT) {
> > cifs_set_time(direntry, jiffies);
> > newInode = NULL;
> > diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c
> > index 21ef51d338e0..d241e6af8fe4 100644
> > --- a/fs/cifs/smb2ops.c
> > +++ b/fs/cifs/smb2ops.c
> > @@ -2325,6 +2325,7 @@ smb2_query_dir_first(const unsigned int xid, struct cifs_tcon *tcon,
> > struct smb2_query_directory_rsp *qd_rsp = NULL;
> > struct smb2_create_rsp *op_rsp = NULL;
> > struct TCP_Server_Info *server = cifs_pick_channel(tcon->ses);
> > + int retry_count = 0;
> >
> > utf16_path = cifs_convert_path_to_utf16(path, cifs_sb);
> > if (!utf16_path)
> > @@ -2372,10 +2373,14 @@ smb2_query_dir_first(const unsigned int xid, struct cifs_tcon *tcon,
> >
> > smb2_set_related(&rqst[1]);
> >
> > +again:
> > rc = compound_send_recv(xid, tcon->ses, server,
> > flags, 2, rqst,
> > resp_buftype, rsp_iov);
> >
> > + if (rc == -EAGAIN && retry_count++ < 10)
> > + goto again;
> > +
> > /* If the open failed there is nothing to do */
> > op_rsp = (struct smb2_create_rsp *)rsp_iov[0].iov_base;
> > if (op_rsp == NULL || op_rsp->sync_hdr.Status != STATUS_SUCCESS) {
> > --
> > 2.31.1
> >



--
Thanks,

Steve

2021-06-16 23:30:17

by Thiago Rafael Becker

[permalink] [raw]
Subject: Re: [PATCH] cifs: retry lookup and readdir when EAGAIN is returned.

Thanks for the comments, I'm working on them.

On Wed, Jun 16, 2021 at 12:07:50PM +0200, Aur?lien Aptel wrote:

> I guess it looks ok as a quick fix for the issue at hand but:
> - should we check for more error codes (use is_retryable_error()?)
> - all syscalls will need something similar, the session can be closed at
> any point
> * either we add a loop everywhere
> * or we change cifs_send_receive to retry (most?) calls
> * might be worth looking at what nfs does here

Some syscall can return EAGAIN, so we would need to check which
operation is retryable and if the error is retryable. I don't know if
all Linux syscalls mappable to smb2 operations in the wire, but it may
be worth mapping.

NFS requeues the calls, and fails after a configurable timeout
for soft mounts, or issues an error message and requeues the request
for hard mounts (retrans and timeo mount options).

> - Should this be done for both soft (default) and hard mounts? I guess
> for hard we would retry indefinitely

Good point, but the correct option is to retry on hard mounts until the
operation succeeds.

NFS hard mounts create issues like being unable to do a clean shutdown on
a server failure, because oustanding I/O blocks it. The NFS has atempted
to fix this by adding options to kill all outstanding I/O, but I'm not
current on the efforts/issues in this front.

So this would create the same issue with outstanding mounts on cifs
shares, so a similar solution to NFS may be adapted/created in the future.

One thing that I forgot to mention in the patch is that this uncovered a
bug in compuond requests, where wait_for_compund_request will block the
readdir request with EDEADLK, if the share needs reconnect. I'm looking
into this patch, and will submmit it separatelly, as this looks like a
corner case uncovered by the patch and specific conditions in the tests.

> Cheers,
Best regards,
Thiago