2020-10-29 19:08:06

by Dai Ngo

[permalink] [raw]
Subject: [PATCH 0/2] NFSD: Fix use-after-free warning when doing inter-server copy

Observed use-after-free messages in /var/log/messages of destination
server when doing inter-server copy. These come from 2 different places
in the code, one from the nfsd4_cleanup_inter_ssc when nfsd_file_put
is called for the source file and the other from nfs4_put_copy when
it's called from nfsd4_cb_offload_release.

Fixed by removing the call to nfsd_file_put; the object is not allocated
by nfsd_file_alloc, and by initializing refcount for nfsd4_copy in
nfsd4_do_async_copy.

fs/nfsd/nfs4proc.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)



2020-10-29 19:08:57

by Dai Ngo

[permalink] [raw]
Subject: [PATCH 1/2] NFSD: Fix use-after-free warning when doing inter-server copy

The source file nfsd_file is not constructed the same as other
nfsd_file's via nfsd_file_alloc. nfsd_file_put should not be
called to free the object; nfsd_file_put is not the inverse of
kzalloc, instead kfree is called by nfsd4_do_async_copy when done.

Fixes: ce0887ac96d3 ("NFSD add nfs4 inter ssc to nfsd4_copy")
Signed-off-by: Dai Ngo <[email protected]>
---
fs/nfsd/nfs4proc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index ad2fa1a8e7ad..9c43cad7e408 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1299,7 +1299,7 @@ nfsd4_cleanup_inter_ssc(struct vfsmount *ss_mnt, struct nfsd_file *src,
struct nfsd_file *dst)
{
nfs42_ssc_close(src->nf_file);
- nfsd_file_put(src);
+ /* 'src' is freed by nfsd4_do_async_copy */
nfsd_file_put(dst);
mntput(ss_mnt);
}
--
2.20.1.1226.g1595ea5.dirty

2020-10-29 19:11:20

by Dai Ngo

[permalink] [raw]
Subject: [PATCH 2/2] NFSD: fix missing refcount in nfsd4_copy by nfsd4_do_async_copy

Need to initialize nfsd4_copy's refcount to 1 to avoid use-after-free
warning when nfs4_put_copy is called from nfsd4_cb_offload_release.

Fixes: ce0887ac96d3 ("NFSD add nfs4 inter ssc to nfsd4_copy")
Signed-off-by: Dai Ngo <[email protected]>
---
fs/nfsd/nfs4proc.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 9c43cad7e408..e83b21778816 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1486,6 +1486,7 @@ static int nfsd4_do_async_copy(void *data)
cb_copy = kzalloc(sizeof(struct nfsd4_copy), GFP_KERNEL);
if (!cb_copy)
goto out;
+ refcount_set(&cb_copy->refcount, 1);
memcpy(&cb_copy->cp_res, &copy->cp_res, sizeof(copy->cp_res));
cb_copy->cp_clp = copy->cp_clp;
cb_copy->nfserr = copy->nfserr;
--
2.20.1.1226.g1595ea5.dirty

2020-11-05 22:26:11

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 0/2] NFSD: Fix use-after-free warning when doing inter-server copy

Applying for 5.10, thanks!

--b.

On Thu, Oct 29, 2020 at 03:07:14PM -0400, Dai Ngo wrote:
> Observed use-after-free messages in /var/log/messages of destination
> server when doing inter-server copy. These come from 2 different places
> in the code, one from the nfsd4_cleanup_inter_ssc when nfsd_file_put
> is called for the source file and the other from nfs4_put_copy when
> it's called from nfsd4_cb_offload_release.
>
> Fixed by removing the call to nfsd_file_put; the object is not allocated
> by nfsd_file_alloc, and by initializing refcount for nfsd4_copy in
> nfsd4_do_async_copy.
>
> fs/nfsd/nfs4proc.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>

2021-02-20 00:34:40

by Olga Kornievskaia

[permalink] [raw]
Subject: Re: [PATCH 1/2] NFSD: Fix use-after-free warning when doing inter-server copy

Hi Dai (Bruce),

This patch is what broke the mount that's now left behind between the
source server and the destination server. We are no longer dropping
the necessary reference on the mount to go away. I haven't been paying
as much attention as I should have been to the changes. The original
code called fput(src) so a simple refcount of the file. Then things
got complicated and moved to nfsd_file_put(). So I don't understand
complexity. But we need to do some kind of put to decrement the needed
reference on the superblock. Bruce any ideas? Can we go back to
fput()?

On Thu, Oct 29, 2020 at 3:08 PM Dai Ngo <[email protected]> wrote:
>
> The source file nfsd_file is not constructed the same as other
> nfsd_file's via nfsd_file_alloc. nfsd_file_put should not be
> called to free the object; nfsd_file_put is not the inverse of
> kzalloc, instead kfree is called by nfsd4_do_async_copy when done.
>
> Fixes: ce0887ac96d3 ("NFSD add nfs4 inter ssc to nfsd4_copy")
> Signed-off-by: Dai Ngo <[email protected]>
> ---
> fs/nfsd/nfs4proc.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index ad2fa1a8e7ad..9c43cad7e408 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
> @@ -1299,7 +1299,7 @@ nfsd4_cleanup_inter_ssc(struct vfsmount *ss_mnt, struct nfsd_file *src,
> struct nfsd_file *dst)
> {
> nfs42_ssc_close(src->nf_file);
> - nfsd_file_put(src);
> + /* 'src' is freed by nfsd4_do_async_copy */
> nfsd_file_put(dst);
> mntput(ss_mnt);
> }
> --
> 2.20.1.1226.g1595ea5.dirty
>

2021-02-20 01:12:15

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 1/2] NFSD: Fix use-after-free warning when doing inter-server copy

Dai, do you have a copy of the original use-after-free warning?

--b.

On Fri, Feb 19, 2021 at 07:18:53PM -0500, Olga Kornievskaia wrote:
> Hi Dai (Bruce),
>
> This patch is what broke the mount that's now left behind between the
> source server and the destination server. We are no longer dropping
> the necessary reference on the mount to go away. I haven't been paying
> as much attention as I should have been to the changes. The original
> code called fput(src) so a simple refcount of the file. Then things
> got complicated and moved to nfsd_file_put(). So I don't understand
> complexity. But we need to do some kind of put to decrement the needed
> reference on the superblock. Bruce any ideas? Can we go back to
> fput()?
>
> On Thu, Oct 29, 2020 at 3:08 PM Dai Ngo <[email protected]> wrote:
> >
> > The source file nfsd_file is not constructed the same as other
> > nfsd_file's via nfsd_file_alloc. nfsd_file_put should not be
> > called to free the object; nfsd_file_put is not the inverse of
> > kzalloc, instead kfree is called by nfsd4_do_async_copy when done.
> >
> > Fixes: ce0887ac96d3 ("NFSD add nfs4 inter ssc to nfsd4_copy")
> > Signed-off-by: Dai Ngo <[email protected]>
> > ---
> > fs/nfsd/nfs4proc.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> > index ad2fa1a8e7ad..9c43cad7e408 100644
> > --- a/fs/nfsd/nfs4proc.c
> > +++ b/fs/nfsd/nfs4proc.c
> > @@ -1299,7 +1299,7 @@ nfsd4_cleanup_inter_ssc(struct vfsmount *ss_mnt, struct nfsd_file *src,
> > struct nfsd_file *dst)
> > {
> > nfs42_ssc_close(src->nf_file);
> > - nfsd_file_put(src);
> > + /* 'src' is freed by nfsd4_do_async_copy */
> > nfsd_file_put(dst);
> > mntput(ss_mnt);
> > }
> > --
> > 2.20.1.1226.g1595ea5.dirty
> >

2021-02-20 01:16:51

by Dai Ngo

[permalink] [raw]
Subject: Re: [PATCH 1/2] NFSD: Fix use-after-free warning when doing inter-server copy

Here is the original warning messages:

Oct  1 23:49:42 nfsvmf24 kernel: ------------[ cut here ]------------
Oct  1 23:49:42 nfsvmf24 kernel: refcount_t: underflow; use-after-free.
Oct  1 23:49:42 nfsvmf24 kernel: WARNING: CPU: 0 PID: 5791 at
lib/refcount.c:28 refcount_warn_saturate+0xae/0xf0
Oct  1 23:49:42 nfsvmf24 kernel: Modules linked in: cts rpcsec_gss_krb5
xt_REDIRECT xt_nat ip6table_nat ip6_tables iptable_nat nf_nat
nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill btrfs blake2b_generic
xor zstd_compress raid6_pq sb_edac intel_powerclamp crct10dif_pclmul
crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd pcspkr
glue_helper i2c_piix4 video sg ip_tables xfs libcrc32c sd_mod t10_pi
ahci libahci libata e1000 crc32c_intel serio_raw dm_mirror
dm_region_hash dm_log dm_mod
Oct  1 23:49:42 nfsvmf24 kernel: CPU: 0 PID: 5791 Comm: copy thread Not
tainted 5.9.0-rc5+ #4
Oct  1 23:49:42 nfsvmf24 kernel: Hardware name: innotek GmbH
VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
Oct  1 23:49:42 nfsvmf24 kernel: RIP: 0010:refcount_warn_saturate+0xae/0xf0
Oct  1 23:49:42 nfsvmf24 kernel: Code: c9 82 31 01 01 e8 17 b9 b6 ff 0f
0b 5d c3 80 3d b6 82 31 01 00 75 91 48 c7 c7 d0 d4 3a ac c6 05 a6 82 31
01 01 e8 f7 b8 b6 ff <0f> 0b 5d c3 80 3d 94 82 31 01 00 0f 85 6d ff ff
ff 48 c7 c7 28 d5
Oct  1 23:49:42 nfsvmf24 kernel: RSP: 0018:ffff9f71c0527e68 EFLAGS: 00010286
Oct  1 23:49:42 nfsvmf24 kernel: RAX: 0000000000000000 RBX:
0000000000000010 RCX: 0000000000000027
Oct  1 23:49:42 nfsvmf24 kernel: RDX: 0000000000000027 RSI:
0000000000000086 RDI: ffff88ed57c18c48
Oct  1 23:49:42 nfsvmf24 kernel: RBP: ffff9f71c0527e68 R08:
ffff88ed57c18c40 R09: 0000000000000004
Oct  1 23:49:42 nfsvmf24 kernel: R10: 0000000000000000 R11:
0000000000000001 R12: ffff88ed54cedcc0
Oct  1 23:49:42 nfsvmf24 kernel: R13: 0000000000000000 R14:
ffff88ed4a4a7000 R15: ffff88ed4e93f3e0
Oct  1 23:49:42 nfsvmf24 kernel: FS:  0000000000000000(0000)
GS:ffff88ed57c00000(0000) knlGS:0000000000000000
Oct  1 23:49:42 nfsvmf24 kernel: CS:  0010 DS: 0000 ES: 0000 CR0:
0000000080050033
Oct  1 23:49:42 nfsvmf24 kernel: CR2: 0000000000c84018 CR3:
0000000216194000 CR4: 00000000000406f0
Oct  1 23:49:42 nfsvmf24 kernel: Call Trace:
Oct  1 23:49:42 nfsvmf24 kernel: nfsd_file_put_noref+0x8f/0xa0
Oct  1 23:49:42 nfsvmf24 kernel: nfsd_file_put+0x3e/0x90
Oct  1 23:49:42 nfsvmf24 kernel: nfsd4_do_copy+0xf0/0x150
Oct  1 23:49:42 nfsvmf24 kernel: nfsd4_do_async_copy+0x84/0x200
Oct  1 23:49:42 nfsvmf24 kernel: kthread+0x114/0x150
Oct  1 23:49:42 nfsvmf24 kernel: ? nfsd4_copy+0x4e0/0x4e0
Oct  1 23:49:42 nfsvmf24 kernel: ? kthread_park+0x90/0x90
Oct  1 23:49:42 nfsvmf24 kernel: ret_from_fork+0x22/0x30

-Dai

On 2/19/21 5:09 PM, J. Bruce Fields wrote:
> Dai, do you have a copy of the original use-after-free warning?
>
> --b.
>
> On Fri, Feb 19, 2021 at 07:18:53PM -0500, Olga Kornievskaia wrote:
>> Hi Dai (Bruce),
>>
>> This patch is what broke the mount that's now left behind between the
>> source server and the destination server. We are no longer dropping
>> the necessary reference on the mount to go away. I haven't been paying
>> as much attention as I should have been to the changes. The original
>> code called fput(src) so a simple refcount of the file. Then things
>> got complicated and moved to nfsd_file_put(). So I don't understand
>> complexity. But we need to do some kind of put to decrement the needed
>> reference on the superblock. Bruce any ideas? Can we go back to
>> fput()?
>>
>> On Thu, Oct 29, 2020 at 3:08 PM Dai Ngo <[email protected]> wrote:
>>> The source file nfsd_file is not constructed the same as other
>>> nfsd_file's via nfsd_file_alloc. nfsd_file_put should not be
>>> called to free the object; nfsd_file_put is not the inverse of
>>> kzalloc, instead kfree is called by nfsd4_do_async_copy when done.
>>>
>>> Fixes: ce0887ac96d3 ("NFSD add nfs4 inter ssc to nfsd4_copy")
>>> Signed-off-by: Dai Ngo <[email protected]>
>>> ---
>>> fs/nfsd/nfs4proc.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
>>> index ad2fa1a8e7ad..9c43cad7e408 100644
>>> --- a/fs/nfsd/nfs4proc.c
>>> +++ b/fs/nfsd/nfs4proc.c
>>> @@ -1299,7 +1299,7 @@ nfsd4_cleanup_inter_ssc(struct vfsmount *ss_mnt, struct nfsd_file *src,
>>> struct nfsd_file *dst)
>>> {
>>> nfs42_ssc_close(src->nf_file);
>>> - nfsd_file_put(src);
>>> + /* 'src' is freed by nfsd4_do_async_copy */
>>> nfsd_file_put(dst);
>>> mntput(ss_mnt);
>>> }
>>> --
>>> 2.20.1.1226.g1595ea5.dirty
>>>

2021-02-20 01:34:19

by Dai Ngo

[permalink] [raw]
Subject: Re: [PATCH 1/2] NFSD: Fix use-after-free warning when doing inter-server copy

Hi Olga and Bruce,

If this is the cause why we don't drop the mount after the copy
then I can restore the patch and look into this problem. Unfortunately,
all my test machines are down for maintenance until Sunday/Monday.

-Dai

On 2/19/21 5:09 PM, J. Bruce Fields wrote:
> Dai, do you have a copy of the original use-after-free warning?
>
> --b.
>
> On Fri, Feb 19, 2021 at 07:18:53PM -0500, Olga Kornievskaia wrote:
>> Hi Dai (Bruce),
>>
>> This patch is what broke the mount that's now left behind between the
>> source server and the destination server. We are no longer dropping
>> the necessary reference on the mount to go away. I haven't been paying
>> as much attention as I should have been to the changes. The original
>> code called fput(src) so a simple refcount of the file. Then things
>> got complicated and moved to nfsd_file_put(). So I don't understand
>> complexity. But we need to do some kind of put to decrement the needed
>> reference on the superblock. Bruce any ideas? Can we go back to
>> fput()?
>>
>> On Thu, Oct 29, 2020 at 3:08 PM Dai Ngo <[email protected]> wrote:
>>> The source file nfsd_file is not constructed the same as other
>>> nfsd_file's via nfsd_file_alloc. nfsd_file_put should not be
>>> called to free the object; nfsd_file_put is not the inverse of
>>> kzalloc, instead kfree is called by nfsd4_do_async_copy when done.
>>>
>>> Fixes: ce0887ac96d3 ("NFSD add nfs4 inter ssc to nfsd4_copy")
>>> Signed-off-by: Dai Ngo <[email protected]>
>>> ---
>>> fs/nfsd/nfs4proc.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
>>> index ad2fa1a8e7ad..9c43cad7e408 100644
>>> --- a/fs/nfsd/nfs4proc.c
>>> +++ b/fs/nfsd/nfs4proc.c
>>> @@ -1299,7 +1299,7 @@ nfsd4_cleanup_inter_ssc(struct vfsmount *ss_mnt, struct nfsd_file *src,
>>> struct nfsd_file *dst)
>>> {
>>> nfs42_ssc_close(src->nf_file);
>>> - nfsd_file_put(src);
>>> + /* 'src' is freed by nfsd4_do_async_copy */
>>> nfsd_file_put(dst);
>>> mntput(ss_mnt);
>>> }
>>> --
>>> 2.20.1.1226.g1595ea5.dirty
>>>

2021-02-20 03:22:46

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 1/2] NFSD: Fix use-after-free warning when doing inter-server copy

On Fri, Feb 19, 2021 at 05:31:58PM -0800, [email protected] wrote:
> If this is the cause why we don't drop the mount after the copy
> then I can restore the patch and look into this problem. Unfortunately,
> all my test machines are down for maintenance until Sunday/Monday.

I think we can take some time to figure out what's actually going on
here before reverting anything.

--b.

>
> -Dai
>
> On 2/19/21 5:09 PM, J. Bruce Fields wrote:
> >Dai, do you have a copy of the original use-after-free warning?
> >
> >--b.
> >
> >On Fri, Feb 19, 2021 at 07:18:53PM -0500, Olga Kornievskaia wrote:
> >>Hi Dai (Bruce),
> >>
> >>This patch is what broke the mount that's now left behind between the
> >>source server and the destination server. We are no longer dropping
> >>the necessary reference on the mount to go away. I haven't been paying
> >>as much attention as I should have been to the changes. The original
> >>code called fput(src) so a simple refcount of the file. Then things
> >>got complicated and moved to nfsd_file_put(). So I don't understand
> >>complexity. But we need to do some kind of put to decrement the needed
> >>reference on the superblock. Bruce any ideas? Can we go back to
> >>fput()?
> >>
> >>On Thu, Oct 29, 2020 at 3:08 PM Dai Ngo <[email protected]> wrote:
> >>>The source file nfsd_file is not constructed the same as other
> >>>nfsd_file's via nfsd_file_alloc. nfsd_file_put should not be
> >>>called to free the object; nfsd_file_put is not the inverse of
> >>>kzalloc, instead kfree is called by nfsd4_do_async_copy when done.
> >>>
> >>>Fixes: ce0887ac96d3 ("NFSD add nfs4 inter ssc to nfsd4_copy")
> >>>Signed-off-by: Dai Ngo <[email protected]>
> >>>---
> >>> fs/nfsd/nfs4proc.c | 2 +-
> >>> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>>
> >>>diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> >>>index ad2fa1a8e7ad..9c43cad7e408 100644
> >>>--- a/fs/nfsd/nfs4proc.c
> >>>+++ b/fs/nfsd/nfs4proc.c
> >>>@@ -1299,7 +1299,7 @@ nfsd4_cleanup_inter_ssc(struct vfsmount *ss_mnt, struct nfsd_file *src,
> >>> struct nfsd_file *dst)
> >>> {
> >>> nfs42_ssc_close(src->nf_file);
> >>>- nfsd_file_put(src);
> >>>+ /* 'src' is freed by nfsd4_do_async_copy */
> >>> nfsd_file_put(dst);
> >>> mntput(ss_mnt);
> >>> }
> >>>--
> >>>2.20.1.1226.g1595ea5.dirty
> >>>

2021-02-20 03:43:56

by Dai Ngo

[permalink] [raw]
Subject: Re: [PATCH 1/2] NFSD: Fix use-after-free warning when doing inter-server copy


On 2/19/21 7:20 PM, J. Bruce Fields wrote:
> On Fri, Feb 19, 2021 at 05:31:58PM -0800, [email protected] wrote:
>> If this is the cause why we don't drop the mount after the copy
>> then I can restore the patch and look into this problem. Unfortunately,
>> all my test machines are down for maintenance until Sunday/Monday.
> I think we can take some time to figure out what's actually going on
> here before reverting anything.

Thanks Bruce, I will look into this.

-Dai

>
> --b.
>
>> -Dai
>>
>> On 2/19/21 5:09 PM, J. Bruce Fields wrote:
>>> Dai, do you have a copy of the original use-after-free warning?
>>>
>>> --b.
>>>
>>> On Fri, Feb 19, 2021 at 07:18:53PM -0500, Olga Kornievskaia wrote:
>>>> Hi Dai (Bruce),
>>>>
>>>> This patch is what broke the mount that's now left behind between the
>>>> source server and the destination server. We are no longer dropping
>>>> the necessary reference on the mount to go away. I haven't been paying
>>>> as much attention as I should have been to the changes. The original
>>>> code called fput(src) so a simple refcount of the file. Then things
>>>> got complicated and moved to nfsd_file_put(). So I don't understand
>>>> complexity. But we need to do some kind of put to decrement the needed
>>>> reference on the superblock. Bruce any ideas? Can we go back to
>>>> fput()?
>>>>
>>>> On Thu, Oct 29, 2020 at 3:08 PM Dai Ngo <[email protected]> wrote:
>>>>> The source file nfsd_file is not constructed the same as other
>>>>> nfsd_file's via nfsd_file_alloc. nfsd_file_put should not be
>>>>> called to free the object; nfsd_file_put is not the inverse of
>>>>> kzalloc, instead kfree is called by nfsd4_do_async_copy when done.
>>>>>
>>>>> Fixes: ce0887ac96d3 ("NFSD add nfs4 inter ssc to nfsd4_copy")
>>>>> Signed-off-by: Dai Ngo <[email protected]>
>>>>> ---
>>>>> fs/nfsd/nfs4proc.c | 2 +-
>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
>>>>> index ad2fa1a8e7ad..9c43cad7e408 100644
>>>>> --- a/fs/nfsd/nfs4proc.c
>>>>> +++ b/fs/nfsd/nfs4proc.c
>>>>> @@ -1299,7 +1299,7 @@ nfsd4_cleanup_inter_ssc(struct vfsmount *ss_mnt, struct nfsd_file *src,
>>>>> struct nfsd_file *dst)
>>>>> {
>>>>> nfs42_ssc_close(src->nf_file);
>>>>> - nfsd_file_put(src);
>>>>> + /* 'src' is freed by nfsd4_do_async_copy */
>>>>> nfsd_file_put(dst);
>>>>> mntput(ss_mnt);
>>>>> }
>>>>> --
>>>>> 2.20.1.1226.g1595ea5.dirty
>>>>>

2021-02-20 14:10:19

by Olga Kornievskaia

[permalink] [raw]
Subject: Re: [PATCH 1/2] NFSD: Fix use-after-free warning when doing inter-server copy

On Fri, Feb 19, 2021 at 10:21 PM J. Bruce Fields <[email protected]> wrote:
>
> On Fri, Feb 19, 2021 at 05:31:58PM -0800, [email protected] wrote:
> > If this is the cause why we don't drop the mount after the copy
> > then I can restore the patch and look into this problem. Unfortunately,
> > all my test machines are down for maintenance until Sunday/Monday.
>
> I think we can take some time to figure out what's actually going on
> here before reverting anything.

Yes I agree. We need to fix the use-after-free and also make sure that
reference will go away. I'm assuming to reproduce the use-after-free I
need to run with kazan, is that what you did Dai?

>
> --b.
>
> >
> > -Dai
> >
> > On 2/19/21 5:09 PM, J. Bruce Fields wrote:
> > >Dai, do you have a copy of the original use-after-free warning?
> > >
> > >--b.
> > >
> > >On Fri, Feb 19, 2021 at 07:18:53PM -0500, Olga Kornievskaia wrote:
> > >>Hi Dai (Bruce),
> > >>
> > >>This patch is what broke the mount that's now left behind between the
> > >>source server and the destination server. We are no longer dropping
> > >>the necessary reference on the mount to go away. I haven't been paying
> > >>as much attention as I should have been to the changes. The original
> > >>code called fput(src) so a simple refcount of the file. Then things
> > >>got complicated and moved to nfsd_file_put(). So I don't understand
> > >>complexity. But we need to do some kind of put to decrement the needed
> > >>reference on the superblock. Bruce any ideas? Can we go back to
> > >>fput()?
> > >>
> > >>On Thu, Oct 29, 2020 at 3:08 PM Dai Ngo <[email protected]> wrote:
> > >>>The source file nfsd_file is not constructed the same as other
> > >>>nfsd_file's via nfsd_file_alloc. nfsd_file_put should not be
> > >>>called to free the object; nfsd_file_put is not the inverse of
> > >>>kzalloc, instead kfree is called by nfsd4_do_async_copy when done.
> > >>>
> > >>>Fixes: ce0887ac96d3 ("NFSD add nfs4 inter ssc to nfsd4_copy")
> > >>>Signed-off-by: Dai Ngo <[email protected]>
> > >>>---
> > >>> fs/nfsd/nfs4proc.c | 2 +-
> > >>> 1 file changed, 1 insertion(+), 1 deletion(-)
> > >>>
> > >>>diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> > >>>index ad2fa1a8e7ad..9c43cad7e408 100644
> > >>>--- a/fs/nfsd/nfs4proc.c
> > >>>+++ b/fs/nfsd/nfs4proc.c
> > >>>@@ -1299,7 +1299,7 @@ nfsd4_cleanup_inter_ssc(struct vfsmount *ss_mnt, struct nfsd_file *src,
> > >>> struct nfsd_file *dst)
> > >>> {
> > >>> nfs42_ssc_close(src->nf_file);
> > >>>- nfsd_file_put(src);
> > >>>+ /* 'src' is freed by nfsd4_do_async_copy */
> > >>> nfsd_file_put(dst);
> > >>> mntput(ss_mnt);
> > >>> }
> > >>>--
> > >>>2.20.1.1226.g1595ea5.dirty
> > >>>

2021-02-21 05:12:27

by Dai Ngo

[permalink] [raw]
Subject: Re: [PATCH 1/2] NFSD: Fix use-after-free warning when doing inter-server copy

On 2/20/21 6:08 AM, Olga Kornievskaia wrote:
> On Fri, Feb 19, 2021 at 10:21 PM J. Bruce Fields <[email protected]> wrote:
>> On Fri, Feb 19, 2021 at 05:31:58PM -0800, [email protected] wrote:
>>> If this is the cause why we don't drop the mount after the copy
>>> then I can restore the patch and look into this problem. Unfortunately,
>>> all my test machines are down for maintenance until Sunday/Monday.
>> I think we can take some time to figure out what's actually going on
>> here before reverting anything.
> Yes I agree. We need to fix the use-after-free and also make sure that
> reference will go away. I'm assuming to reproduce the use-after-free I
> need to run with kazan, is that what you did Dai?

I did not use Kasan. I just did an inter-server copy and this message
showed up in /var/log/messages.

-Dai

>
>> --b.
>>
>>> -Dai
>>>
>>> On 2/19/21 5:09 PM, J. Bruce Fields wrote:
>>>> Dai, do you have a copy of the original use-after-free warning?
>>>>
>>>> --b.
>>>>
>>>> On Fri, Feb 19, 2021 at 07:18:53PM -0500, Olga Kornievskaia wrote:
>>>>> Hi Dai (Bruce),
>>>>>
>>>>> This patch is what broke the mount that's now left behind between the
>>>>> source server and the destination server. We are no longer dropping
>>>>> the necessary reference on the mount to go away. I haven't been paying
>>>>> as much attention as I should have been to the changes. The original
>>>>> code called fput(src) so a simple refcount of the file. Then things
>>>>> got complicated and moved to nfsd_file_put(). So I don't understand
>>>>> complexity. But we need to do some kind of put to decrement the needed
>>>>> reference on the superblock. Bruce any ideas? Can we go back to
>>>>> fput()?
>>>>>
>>>>> On Thu, Oct 29, 2020 at 3:08 PM Dai Ngo <[email protected]> wrote:
>>>>>> The source file nfsd_file is not constructed the same as other
>>>>>> nfsd_file's via nfsd_file_alloc. nfsd_file_put should not be
>>>>>> called to free the object; nfsd_file_put is not the inverse of
>>>>>> kzalloc, instead kfree is called by nfsd4_do_async_copy when done.
>>>>>>
>>>>>> Fixes: ce0887ac96d3 ("NFSD add nfs4 inter ssc to nfsd4_copy")
>>>>>> Signed-off-by: Dai Ngo <[email protected]>
>>>>>> ---
>>>>>> fs/nfsd/nfs4proc.c | 2 +-
>>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
>>>>>> index ad2fa1a8e7ad..9c43cad7e408 100644
>>>>>> --- a/fs/nfsd/nfs4proc.c
>>>>>> +++ b/fs/nfsd/nfs4proc.c
>>>>>> @@ -1299,7 +1299,7 @@ nfsd4_cleanup_inter_ssc(struct vfsmount *ss_mnt, struct nfsd_file *src,
>>>>>> struct nfsd_file *dst)
>>>>>> {
>>>>>> nfs42_ssc_close(src->nf_file);
>>>>>> - nfsd_file_put(src);
>>>>>> + /* 'src' is freed by nfsd4_do_async_copy */
>>>>>> nfsd_file_put(dst);
>>>>>> mntput(ss_mnt);
>>>>>> }
>>>>>> --
>>>>>> 2.20.1.1226.g1595ea5.dirty
>>>>>>

2021-02-22 18:35:54

by Dai Ngo

[permalink] [raw]
Subject: Re: [PATCH 1/2] NFSD: Fix use-after-free warning when doing inter-server copy


On 2/20/21 8:16 PM, [email protected] wrote:
> On 2/20/21 6:08 AM, Olga Kornievskaia wrote:
>> On Fri, Feb 19, 2021 at 10:21 PM J. Bruce Fields
>> <[email protected]> wrote:
>>> On Fri, Feb 19, 2021 at 05:31:58PM -0800, [email protected] wrote:
>>>> If this is the cause why we don't drop the mount after the copy
>>>> then I can restore the patch and look into this problem.
>>>> Unfortunately,
>>>> all my test machines are down for maintenance until Sunday/Monday.
>>> I think we can take some time to figure out what's actually going on
>>> here before reverting anything.
>> Yes I agree. We need to fix the use-after-free and also make sure that
>> reference will go away.

I reverted the patch, verified the warning message is back:

Feb 22 10:07:45 nfsvmf24 kernel: ------------[ cut here ]------------
Feb 22 10:07:45 nfsvmf24 kernel: refcount_t: underflow; use-after-free.

then did a inter-server copy and waited for more than 20 mins and
the destination server still maintains the session with the source
server. It must be some other references that prevent the mount
to go away.

-Dai

>> I'm assuming to reproduce the use-after-free I
>> need to run with kazan, is that what you did Dai?
>
> I did not use Kasan. I just did an inter-server copy and this message
> showed up in /var/log/messages.
>
> -Dai
>
>>
>>> --b.
>>>
>>>> -Dai
>>>>
>>>> On 2/19/21 5:09 PM, J. Bruce Fields wrote:
>>>>> Dai, do you have a copy of the original use-after-free warning?
>>>>>
>>>>> --b.
>>>>>
>>>>> On Fri, Feb 19, 2021 at 07:18:53PM -0500, Olga Kornievskaia wrote:
>>>>>> Hi Dai (Bruce),
>>>>>>
>>>>>> This patch is what broke the mount that's now left behind between
>>>>>> the
>>>>>> source server and the destination server. We are no longer dropping
>>>>>> the necessary reference on the mount to go away. I haven't been
>>>>>> paying
>>>>>> as much attention as I should have been to the changes. The original
>>>>>> code called fput(src) so a simple refcount of the file. Then things
>>>>>> got complicated and moved to nfsd_file_put(). So I don't understand
>>>>>> complexity. But we need to do some kind of put to decrement the
>>>>>> needed
>>>>>> reference on the superblock. Bruce any ideas? Can we go back to
>>>>>> fput()?
>>>>>>
>>>>>> On Thu, Oct 29, 2020 at 3:08 PM Dai Ngo <[email protected]> wrote:
>>>>>>> The source file nfsd_file is not constructed the same as other
>>>>>>> nfsd_file's via nfsd_file_alloc. nfsd_file_put should not be
>>>>>>> called to free the object; nfsd_file_put is not the inverse of
>>>>>>> kzalloc, instead kfree is called by nfsd4_do_async_copy when done.
>>>>>>>
>>>>>>> Fixes: ce0887ac96d3 ("NFSD add nfs4 inter ssc to nfsd4_copy")
>>>>>>> Signed-off-by: Dai Ngo <[email protected]>
>>>>>>> ---
>>>>>>>   fs/nfsd/nfs4proc.c | 2 +-
>>>>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
>>>>>>> index ad2fa1a8e7ad..9c43cad7e408 100644
>>>>>>> --- a/fs/nfsd/nfs4proc.c
>>>>>>> +++ b/fs/nfsd/nfs4proc.c
>>>>>>> @@ -1299,7 +1299,7 @@ nfsd4_cleanup_inter_ssc(struct vfsmount
>>>>>>> *ss_mnt, struct nfsd_file *src,
>>>>>>>                          struct nfsd_file *dst)
>>>>>>>   {
>>>>>>>          nfs42_ssc_close(src->nf_file);
>>>>>>> -       nfsd_file_put(src);
>>>>>>> +       /* 'src' is freed by nfsd4_do_async_copy */
>>>>>>>          nfsd_file_put(dst);
>>>>>>>          mntput(ss_mnt);
>>>>>>>   }
>>>>>>> --
>>>>>>> 2.20.1.1226.g1595ea5.dirty
>>>>>>>

2021-02-22 21:48:09

by Dai Ngo

[permalink] [raw]
Subject: Re: [PATCH 1/2] NFSD: Fix use-after-free warning when doing inter-server copy


On 2/22/21 10:34 AM, [email protected] wrote:
>
> On 2/20/21 8:16 PM, [email protected] wrote:
>> On 2/20/21 6:08 AM, Olga Kornievskaia wrote:
>>> On Fri, Feb 19, 2021 at 10:21 PM J. Bruce Fields
>>> <[email protected]> wrote:
>>>> On Fri, Feb 19, 2021 at 05:31:58PM -0800, [email protected] wrote:
>>>>> If this is the cause why we don't drop the mount after the copy
>>>>> then I can restore the patch and look into this problem.
>>>>> Unfortunately,
>>>>> all my test machines are down for maintenance until Sunday/Monday.
>>>> I think we can take some time to figure out what's actually going on
>>>> here before reverting anything.
>>> Yes I agree. We need to fix the use-after-free and also make sure that
>>> reference will go away.
>
> I reverted the patch, verified the warning message is back:
>
> Feb 22 10:07:45 nfsvmf24 kernel: ------------[ cut here ]------------
> Feb 22 10:07:45 nfsvmf24 kernel: refcount_t: underflow; use-after-free.
>
> then did a inter-server copy and waited for more than 20 mins and
> the destination server still maintains the session with the source
> server.  It must be some other references that prevent the mount
> to go away.

This change fixed the unmount after inter-server copy problem:

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 8d6d2678abad..87687cd18938 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1304,7 +1304,7 @@ nfsd4_cleanup_inter_ssc(struct vfsmount *ss_mnt, struct nfsd_file *src,
struct nfsd_file *dst)
{
nfs42_ssc_close(src->nf_file);
- /* 'src' is freed by nfsd4_do_async_copy */
+ nfsd_file_put(src);
nfsd_file_put(dst);
mntput(ss_mnt);
}
@@ -1472,14 +1472,12 @@ static int nfsd4_do_async_copy(void *data)
copy->nf_src = kzalloc(sizeof(struct nfsd_file), GFP_KERNEL);
if (!copy->nf_src) {
copy->nfserr = nfserr_serverfault;
- nfsd4_interssc_disconnect(copy->ss_mnt);
goto do_callback;
}
copy->nf_src->nf_file = nfs42_ssc_open(copy->ss_mnt, &copy->c_fh,
&copy->stateid);
if (IS_ERR(copy->nf_src->nf_file)) {
copy->nfserr = nfserr_offload_denied;
- nfsd4_interssc_disconnect(copy->ss_mnt);
goto do_callback;
}
}
@@ -1498,6 +1496,7 @@ static int nfsd4_do_async_copy(void *data)
&nfsd4_cb_offload_ops, NFSPROC4_CLNT_CB_OFFLOAD);
nfsd4_run_cb(&cb_copy->cp_cb);
out:
+ nfsd4_interssc_disconnect(copy->ss_mnt);
if (!copy->cp_intra)
kfree(copy->nf_src);
cleanup_async_copy(copy);

But there is something new. I tried inter-server copy twice.
First time I can verify from tshark capture that a session was
created and destroy, along with all the NFS ops. On 2nd try,
I can see any NFS ops from the tshark capture because all data
are encrypted by TLS/SSLv2. This behavior seems to repeat.
I will look into it but I think this is a separate issue.

-Dai

>
> -Dai
>
>>> I'm assuming to reproduce the use-after-free I
>>> need to run with kazan, is that what you did Dai?
>>
>> I did not use Kasan. I just did an inter-server copy and this message
>> showed up in /var/log/messages.
>>
>> -Dai
>>
>>>
>>>> --b.
>>>>
>>>>> -Dai
>>>>>
>>>>> On 2/19/21 5:09 PM, J. Bruce Fields wrote:
>>>>>> Dai, do you have a copy of the original use-after-free warning?
>>>>>>
>>>>>> --b.
>>>>>>
>>>>>> On Fri, Feb 19, 2021 at 07:18:53PM -0500, Olga Kornievskaia wrote:
>>>>>>> Hi Dai (Bruce),
>>>>>>>
>>>>>>> This patch is what broke the mount that's now left behind
>>>>>>> between the
>>>>>>> source server and the destination server. We are no longer dropping
>>>>>>> the necessary reference on the mount to go away. I haven't been
>>>>>>> paying
>>>>>>> as much attention as I should have been to the changes. The
>>>>>>> original
>>>>>>> code called fput(src) so a simple refcount of the file. Then things
>>>>>>> got complicated and moved to nfsd_file_put(). So I don't understand
>>>>>>> complexity. But we need to do some kind of put to decrement the
>>>>>>> needed
>>>>>>> reference on the superblock. Bruce any ideas? Can we go back to
>>>>>>> fput()?
>>>>>>>
>>>>>>> On Thu, Oct 29, 2020 at 3:08 PM Dai Ngo <[email protected]> wrote:
>>>>>>>> The source file nfsd_file is not constructed the same as other
>>>>>>>> nfsd_file's via nfsd_file_alloc. nfsd_file_put should not be
>>>>>>>> called to free the object; nfsd_file_put is not the inverse of
>>>>>>>> kzalloc, instead kfree is called by nfsd4_do_async_copy when done.
>>>>>>>>
>>>>>>>> Fixes: ce0887ac96d3 ("NFSD add nfs4 inter ssc to nfsd4_copy")
>>>>>>>> Signed-off-by: Dai Ngo <[email protected]>
>>>>>>>> ---
>>>>>>>>   fs/nfsd/nfs4proc.c | 2 +-
>>>>>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>>>
>>>>>>>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
>>>>>>>> index ad2fa1a8e7ad..9c43cad7e408 100644
>>>>>>>> --- a/fs/nfsd/nfs4proc.c
>>>>>>>> +++ b/fs/nfsd/nfs4proc.c
>>>>>>>> @@ -1299,7 +1299,7 @@ nfsd4_cleanup_inter_ssc(struct vfsmount
>>>>>>>> *ss_mnt, struct nfsd_file *src,
>>>>>>>>                          struct nfsd_file *dst)
>>>>>>>>   {
>>>>>>>>          nfs42_ssc_close(src->nf_file);
>>>>>>>> -       nfsd_file_put(src);
>>>>>>>> +       /* 'src' is freed by nfsd4_do_async_copy */
>>>>>>>>          nfsd_file_put(dst);
>>>>>>>>          mntput(ss_mnt);
>>>>>>>>   }
>>>>>>>> --
>>>>>>>> 2.20.1.1226.g1595ea5.dirty
>>>>>>>>

2021-02-22 22:03:55

by Dai Ngo

[permalink] [raw]
Subject: Re: [PATCH 1/2] NFSD: Fix use-after-free warning when doing inter-server copy


On 2/22/21 1:46 PM, [email protected] wrote:
>
> On 2/22/21 10:34 AM, [email protected] wrote:
>>
>> On 2/20/21 8:16 PM, [email protected] wrote:
>>> On 2/20/21 6:08 AM, Olga Kornievskaia wrote:
>>>> On Fri, Feb 19, 2021 at 10:21 PM J. Bruce Fields
>>>> <[email protected]> wrote:
>>>>> On Fri, Feb 19, 2021 at 05:31:58PM -0800, [email protected] wrote:
>>>>>> If this is the cause why we don't drop the mount after the copy
>>>>>> then I can restore the patch and look into this problem.
>>>>>> Unfortunately,
>>>>>> all my test machines are down for maintenance until Sunday/Monday.
>>>>> I think we can take some time to figure out what's actually going on
>>>>> here before reverting anything.
>>>> Yes I agree. We need to fix the use-after-free and also make sure that
>>>> reference will go away.
>>
>> I reverted the patch, verified the warning message is back:
>>
>> Feb 22 10:07:45 nfsvmf24 kernel: ------------[ cut here ]------------
>> Feb 22 10:07:45 nfsvmf24 kernel: refcount_t: underflow; use-after-free.
>>
>> then did a inter-server copy and waited for more than 20 mins and
>> the destination server still maintains the session with the source
>> server.  It must be some other references that prevent the mount
>> to go away.
>
> This change fixed the unmount after inter-server copy problem:
>
> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index 8d6d2678abad..87687cd18938 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
> @@ -1304,7 +1304,7 @@ nfsd4_cleanup_inter_ssc(struct vfsmount *ss_mnt,
> struct nfsd_file *src,
>                         struct nfsd_file *dst)
>  {
>         nfs42_ssc_close(src->nf_file);
> -       /* 'src' is freed by nfsd4_do_async_copy */
> +       nfsd_file_put(src);
>         nfsd_file_put(dst);
>         mntput(ss_mnt);
>  }
> @@ -1472,14 +1472,12 @@ static int nfsd4_do_async_copy(void *data)
>                 copy->nf_src = kzalloc(sizeof(struct nfsd_file),
> GFP_KERNEL);
>                 if (!copy->nf_src) {
>                         copy->nfserr = nfserr_serverfault;
> - nfsd4_interssc_disconnect(copy->ss_mnt);
>                         goto do_callback;
>                 }
>                 copy->nf_src->nf_file = nfs42_ssc_open(copy->ss_mnt,
> &copy->c_fh,
> &copy->stateid);
>                 if (IS_ERR(copy->nf_src->nf_file)) {
>                         copy->nfserr = nfserr_offload_denied;
> - nfsd4_interssc_disconnect(copy->ss_mnt);
>                         goto do_callback;
>                 }
>         }
> @@ -1498,6 +1496,7 @@ static int nfsd4_do_async_copy(void *data)
>                         &nfsd4_cb_offload_ops, NFSPROC4_CLNT_CB_OFFLOAD);
>         nfsd4_run_cb(&cb_copy->cp_cb);
>  out:
> +       nfsd4_interssc_disconnect(copy->ss_mnt);
>         if (!copy->cp_intra)
>                 kfree(copy->nf_src);
>         cleanup_async_copy(copy);
>
> But there is something new. I tried inter-server copy twice.
> First time I can verify from tshark capture that a session was
> created and destroy, along with all the NFS ops. On 2nd try,
> I can

cannot

> see any NFS ops from the tshark capture because all data
> are encrypted by TLS/SSLv2. This behavior seems to repeat.
> I will look into it but I think this is a separate issue.
>
> -Dai
>
>>
>> -Dai
>>
>>>> I'm assuming to reproduce the use-after-free I
>>>> need to run with kazan, is that what you did Dai?
>>>
>>> I did not use Kasan. I just did an inter-server copy and this message
>>> showed up in /var/log/messages.
>>>
>>> -Dai
>>>
>>>>
>>>>> --b.
>>>>>
>>>>>> -Dai
>>>>>>
>>>>>> On 2/19/21 5:09 PM, J. Bruce Fields wrote:
>>>>>>> Dai, do you have a copy of the original use-after-free warning?
>>>>>>>
>>>>>>> --b.
>>>>>>>
>>>>>>> On Fri, Feb 19, 2021 at 07:18:53PM -0500, Olga Kornievskaia wrote:
>>>>>>>> Hi Dai (Bruce),
>>>>>>>>
>>>>>>>> This patch is what broke the mount that's now left behind
>>>>>>>> between the
>>>>>>>> source server and the destination server. We are no longer
>>>>>>>> dropping
>>>>>>>> the necessary reference on the mount to go away. I haven't been
>>>>>>>> paying
>>>>>>>> as much attention as I should have been to the changes. The
>>>>>>>> original
>>>>>>>> code called fput(src) so a simple refcount of the file. Then
>>>>>>>> things
>>>>>>>> got complicated and moved to nfsd_file_put(). So I don't
>>>>>>>> understand
>>>>>>>> complexity. But we need to do some kind of put to decrement the
>>>>>>>> needed
>>>>>>>> reference on the superblock. Bruce any ideas? Can we go back to
>>>>>>>> fput()?
>>>>>>>>
>>>>>>>> On Thu, Oct 29, 2020 at 3:08 PM Dai Ngo <[email protected]>
>>>>>>>> wrote:
>>>>>>>>> The source file nfsd_file is not constructed the same as other
>>>>>>>>> nfsd_file's via nfsd_file_alloc. nfsd_file_put should not be
>>>>>>>>> called to free the object; nfsd_file_put is not the inverse of
>>>>>>>>> kzalloc, instead kfree is called by nfsd4_do_async_copy when
>>>>>>>>> done.
>>>>>>>>>
>>>>>>>>> Fixes: ce0887ac96d3 ("NFSD add nfs4 inter ssc to nfsd4_copy")
>>>>>>>>> Signed-off-by: Dai Ngo <[email protected]>
>>>>>>>>> ---
>>>>>>>>>   fs/nfsd/nfs4proc.c | 2 +-
>>>>>>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>>>>
>>>>>>>>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
>>>>>>>>> index ad2fa1a8e7ad..9c43cad7e408 100644
>>>>>>>>> --- a/fs/nfsd/nfs4proc.c
>>>>>>>>> +++ b/fs/nfsd/nfs4proc.c
>>>>>>>>> @@ -1299,7 +1299,7 @@ nfsd4_cleanup_inter_ssc(struct vfsmount
>>>>>>>>> *ss_mnt, struct nfsd_file *src,
>>>>>>>>>                          struct nfsd_file *dst)
>>>>>>>>>   {
>>>>>>>>>          nfs42_ssc_close(src->nf_file);
>>>>>>>>> -       nfsd_file_put(src);
>>>>>>>>> +       /* 'src' is freed by nfsd4_do_async_copy */
>>>>>>>>>          nfsd_file_put(dst);
>>>>>>>>>          mntput(ss_mnt);
>>>>>>>>>   }
>>>>>>>>> --
>>>>>>>>> 2.20.1.1226.g1595ea5.dirty
>>>>>>>>>

2021-02-22 22:11:47

by Dai Ngo

[permalink] [raw]
Subject: Re: [PATCH 1/2] NFSD: Fix use-after-free warning when doing inter-server copy


On 2/22/21 2:01 PM, [email protected] wrote:
>
> On 2/22/21 1:46 PM, [email protected] wrote:
>>
>> On 2/22/21 10:34 AM, [email protected] wrote:
>>>
>>> On 2/20/21 8:16 PM, [email protected] wrote:
>>>> On 2/20/21 6:08 AM, Olga Kornievskaia wrote:
>>>>> On Fri, Feb 19, 2021 at 10:21 PM J. Bruce Fields
>>>>> <[email protected]> wrote:
>>>>>> On Fri, Feb 19, 2021 at 05:31:58PM -0800, [email protected] wrote:
>>>>>>> If this is the cause why we don't drop the mount after the copy
>>>>>>> then I can restore the patch and look into this problem.
>>>>>>> Unfortunately,
>>>>>>> all my test machines are down for maintenance until Sunday/Monday.
>>>>>> I think we can take some time to figure out what's actually going on
>>>>>> here before reverting anything.
>>>>> Yes I agree. We need to fix the use-after-free and also make sure
>>>>> that
>>>>> reference will go away.
>>>
>>> I reverted the patch, verified the warning message is back:
>>>
>>> Feb 22 10:07:45 nfsvmf24 kernel: ------------[ cut here ]------------
>>> Feb 22 10:07:45 nfsvmf24 kernel: refcount_t: underflow; use-after-free.
>>>
>>> then did a inter-server copy and waited for more than 20 mins and
>>> the destination server still maintains the session with the source
>>> server.  It must be some other references that prevent the mount
>>> to go away.
>>
>> This change fixed the unmount after inter-server copy problem:
>>
>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
>> index 8d6d2678abad..87687cd18938 100644
>> --- a/fs/nfsd/nfs4proc.c
>> +++ b/fs/nfsd/nfs4proc.c
>> @@ -1304,7 +1304,7 @@ nfsd4_cleanup_inter_ssc(struct vfsmount
>> *ss_mnt, struct nfsd_file *src,
>>                         struct nfsd_file *dst)
>>  {
>>         nfs42_ssc_close(src->nf_file);
>> -       /* 'src' is freed by nfsd4_do_async_copy */
>> +       nfsd_file_put(src);
>>         nfsd_file_put(dst);
>>         mntput(ss_mnt);
>>  }

This change is not need. It's left over from my testing to
reproduce the warning messages. Only the change in
nfsd4_do_async_copy is needed for the unmount problem.

-Dai

>> @@ -1472,14 +1472,12 @@ static int nfsd4_do_async_copy(void *data)
>>                 copy->nf_src = kzalloc(sizeof(struct nfsd_file),
>> GFP_KERNEL);
>>                 if (!copy->nf_src) {
>>                         copy->nfserr = nfserr_serverfault;
>> - nfsd4_interssc_disconnect(copy->ss_mnt);
>>                         goto do_callback;
>>                 }
>>                 copy->nf_src->nf_file = nfs42_ssc_open(copy->ss_mnt,
>> &copy->c_fh,
>> &copy->stateid);
>>                 if (IS_ERR(copy->nf_src->nf_file)) {
>>                         copy->nfserr = nfserr_offload_denied;
>> - nfsd4_interssc_disconnect(copy->ss_mnt);
>>                         goto do_callback;
>>                 }
>>         }
>> @@ -1498,6 +1496,7 @@ static int nfsd4_do_async_copy(void *data)
>>                         &nfsd4_cb_offload_ops,
>> NFSPROC4_CLNT_CB_OFFLOAD);
>>         nfsd4_run_cb(&cb_copy->cp_cb);
>>  out:
>> +       nfsd4_interssc_disconnect(copy->ss_mnt);
>>         if (!copy->cp_intra)
>>                 kfree(copy->nf_src);
>>         cleanup_async_copy(copy);
>>
>> But there is something new. I tried inter-server copy twice.
>> First time I can verify from tshark capture that a session was
>> created and destroy, along with all the NFS ops. On 2nd try,
>> I can
>
> cannot
>
>> see any NFS ops from the tshark capture because all data
>> are encrypted by TLS/SSLv2. This behavior seems to repeat.
>> I will look into it but I think this is a separate issue.
>>
>> -Dai
>>
>>>
>>> -Dai
>>>
>>>>> I'm assuming to reproduce the use-after-free I
>>>>> need to run with kazan, is that what you did Dai?
>>>>
>>>> I did not use Kasan. I just did an inter-server copy and this message
>>>> showed up in /var/log/messages.
>>>>
>>>> -Dai
>>>>
>>>>>
>>>>>> --b.
>>>>>>
>>>>>>> -Dai
>>>>>>>
>>>>>>> On 2/19/21 5:09 PM, J. Bruce Fields wrote:
>>>>>>>> Dai, do you have a copy of the original use-after-free warning?
>>>>>>>>
>>>>>>>> --b.
>>>>>>>>
>>>>>>>> On Fri, Feb 19, 2021 at 07:18:53PM -0500, Olga Kornievskaia wrote:
>>>>>>>>> Hi Dai (Bruce),
>>>>>>>>>
>>>>>>>>> This patch is what broke the mount that's now left behind
>>>>>>>>> between the
>>>>>>>>> source server and the destination server. We are no longer
>>>>>>>>> dropping
>>>>>>>>> the necessary reference on the mount to go away. I haven't
>>>>>>>>> been paying
>>>>>>>>> as much attention as I should have been to the changes. The
>>>>>>>>> original
>>>>>>>>> code called fput(src) so a simple refcount of the file. Then
>>>>>>>>> things
>>>>>>>>> got complicated and moved to nfsd_file_put(). So I don't
>>>>>>>>> understand
>>>>>>>>> complexity. But we need to do some kind of put to decrement
>>>>>>>>> the needed
>>>>>>>>> reference on the superblock. Bruce any ideas? Can we go back to
>>>>>>>>> fput()?
>>>>>>>>>
>>>>>>>>> On Thu, Oct 29, 2020 at 3:08 PM Dai Ngo <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>> The source file nfsd_file is not constructed the same as other
>>>>>>>>>> nfsd_file's via nfsd_file_alloc. nfsd_file_put should not be
>>>>>>>>>> called to free the object; nfsd_file_put is not the inverse of
>>>>>>>>>> kzalloc, instead kfree is called by nfsd4_do_async_copy when
>>>>>>>>>> done.
>>>>>>>>>>
>>>>>>>>>> Fixes: ce0887ac96d3 ("NFSD add nfs4 inter ssc to nfsd4_copy")
>>>>>>>>>> Signed-off-by: Dai Ngo <[email protected]>
>>>>>>>>>> ---
>>>>>>>>>>   fs/nfsd/nfs4proc.c | 2 +-
>>>>>>>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
>>>>>>>>>> index ad2fa1a8e7ad..9c43cad7e408 100644
>>>>>>>>>> --- a/fs/nfsd/nfs4proc.c
>>>>>>>>>> +++ b/fs/nfsd/nfs4proc.c
>>>>>>>>>> @@ -1299,7 +1299,7 @@ nfsd4_cleanup_inter_ssc(struct vfsmount
>>>>>>>>>> *ss_mnt, struct nfsd_file *src,
>>>>>>>>>>                          struct nfsd_file *dst)
>>>>>>>>>>   {
>>>>>>>>>>          nfs42_ssc_close(src->nf_file);
>>>>>>>>>> -       nfsd_file_put(src);
>>>>>>>>>> +       /* 'src' is freed by nfsd4_do_async_copy */
>>>>>>>>>>          nfsd_file_put(dst);
>>>>>>>>>>          mntput(ss_mnt);
>>>>>>>>>>   }
>>>>>>>>>> --
>>>>>>>>>> 2.20.1.1226.g1595ea5.dirty
>>>>>>>>>>

2021-02-25 03:58:56

by Olga Kornievskaia

[permalink] [raw]
Subject: Re: [PATCH 1/2] NFSD: Fix use-after-free warning when doing inter-server copy

On Mon, Feb 22, 2021 at 5:09 PM <[email protected]> wrote:
>
>
> On 2/22/21 2:01 PM, [email protected] wrote:
> >
> > On 2/22/21 1:46 PM, [email protected] wrote:
> >>
> >> On 2/22/21 10:34 AM, [email protected] wrote:
> >>>
> >>> On 2/20/21 8:16 PM, [email protected] wrote:
> >>>> On 2/20/21 6:08 AM, Olga Kornievskaia wrote:
> >>>>> On Fri, Feb 19, 2021 at 10:21 PM J. Bruce Fields
> >>>>> <[email protected]> wrote:
> >>>>>> On Fri, Feb 19, 2021 at 05:31:58PM -0800, [email protected] wrote:
> >>>>>>> If this is the cause why we don't drop the mount after the copy
> >>>>>>> then I can restore the patch and look into this problem.
> >>>>>>> Unfortunately,
> >>>>>>> all my test machines are down for maintenance until Sunday/Monday.
> >>>>>> I think we can take some time to figure out what's actually going on
> >>>>>> here before reverting anything.
> >>>>> Yes I agree. We need to fix the use-after-free and also make sure
> >>>>> that
> >>>>> reference will go away.
> >>>
> >>> I reverted the patch, verified the warning message is back:
> >>>
> >>> Feb 22 10:07:45 nfsvmf24 kernel: ------------[ cut here ]------------
> >>> Feb 22 10:07:45 nfsvmf24 kernel: refcount_t: underflow; use-after-free.
> >>>
> >>> then did a inter-server copy and waited for more than 20 mins and
> >>> the destination server still maintains the session with the source
> >>> server. It must be some other references that prevent the mount
> >>> to go away.
> >>
> >> This change fixed the unmount after inter-server copy problem:
> >>
> >> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> >> index 8d6d2678abad..87687cd18938 100644
> >> --- a/fs/nfsd/nfs4proc.c
> >> +++ b/fs/nfsd/nfs4proc.c
> >> @@ -1304,7 +1304,7 @@ nfsd4_cleanup_inter_ssc(struct vfsmount
> >> *ss_mnt, struct nfsd_file *src,
> >> struct nfsd_file *dst)
> >> {
> >> nfs42_ssc_close(src->nf_file);
> >> - /* 'src' is freed by nfsd4_do_async_copy */
> >> + nfsd_file_put(src);
> >> nfsd_file_put(dst);
> >> mntput(ss_mnt);
> >> }
>
> This change is not need. It's left over from my testing to
> reproduce the warning messages. Only the change in
> nfsd4_do_async_copy is needed for the unmount problem.
>
> -Dai
>
> >> @@ -1472,14 +1472,12 @@ static int nfsd4_do_async_copy(void *data)
> >> copy->nf_src = kzalloc(sizeof(struct nfsd_file),
> >> GFP_KERNEL);
> >> if (!copy->nf_src) {
> >> copy->nfserr = nfserr_serverfault;
> >> - nfsd4_interssc_disconnect(copy->ss_mnt);
> >> goto do_callback;
> >> }
> >> copy->nf_src->nf_file = nfs42_ssc_open(copy->ss_mnt,
> >> &copy->c_fh,
> >> &copy->stateid);
> >> if (IS_ERR(copy->nf_src->nf_file)) {
> >> copy->nfserr = nfserr_offload_denied;
> >> - nfsd4_interssc_disconnect(copy->ss_mnt);
> >> goto do_callback;
> >> }
> >> }
> >> @@ -1498,6 +1496,7 @@ static int nfsd4_do_async_copy(void *data)
> >> &nfsd4_cb_offload_ops,
> >> NFSPROC4_CLNT_CB_OFFLOAD);
> >> nfsd4_run_cb(&cb_copy->cp_cb);
> >> out:
> >> + nfsd4_interssc_disconnect(copy->ss_mnt);
> >> if (!copy->cp_intra)
> >> kfree(copy->nf_src);
> >> cleanup_async_copy(copy);
> >>
> >> But there is something new. I tried inter-server copy twice.
> >> First time I can verify from tshark capture that a session was
> >> created and destroy, along with all the NFS ops. On 2nd try,
> >> I can

Hi Dai/Bruce,

While I believe the fix works (as in the mount goes away), I'm not
comfortable with this fix as I believe we will be leaking resources.
Server calls nfs42_ssc_open() which creates a legit file pointer (yes
it takes a reference on superblock but it also allocates memory for
"file" structure). Normally a file structure requires doing an fput()
which if I look thru the code does a bunch of things before in the end
calling mntput(). While we free the copy->nf_src in
nfs4_do_asyn_copy(), the copy->nf_src->nf_file was allocated
separately and would have been freed from calling fput() on it.

So I guess it's not correct to do kfree(copy->nf_src) in teh
nfs4_do_async_copy() I think that's probably where use-after-free
comes in. We need to keep it until the cleanup. I'm thinking perhaps
call fput(copy->nf_src->nf_file) first and then free it? Just an idea
but I haven't tested it.

2021-02-25 06:04:59

by Dai Ngo

[permalink] [raw]
Subject: Re: [PATCH 1/2] NFSD: Fix use-after-free warning when doing inter-server copy

Hi Olga and Bruce,

On 2/24/21 2:35 PM, Olga Kornievskaia wrote:
> On Mon, Feb 22, 2021 at 5:09 PM <[email protected]> wrote:
>>
>> On 2/22/21 2:01 PM, [email protected] wrote:
>>> On 2/22/21 1:46 PM, [email protected] wrote:
>>>> On 2/22/21 10:34 AM, [email protected] wrote:
>>>>> On 2/20/21 8:16 PM, [email protected] wrote:
>>>>>> On 2/20/21 6:08 AM, Olga Kornievskaia wrote:
>>>>>>> On Fri, Feb 19, 2021 at 10:21 PM J. Bruce Fields
>>>>>>> <[email protected]> wrote:
>>>>>>>> On Fri, Feb 19, 2021 at 05:31:58PM -0800, [email protected] wrote:
>>>>>>>>> If this is the cause why we don't drop the mount after the copy
>>>>>>>>> then I can restore the patch and look into this problem.
>>>>>>>>> Unfortunately,
>>>>>>>>> all my test machines are down for maintenance until Sunday/Monday.
>>>>>>>> I think we can take some time to figure out what's actually going on
>>>>>>>> here before reverting anything.
>>>>>>> Yes I agree. We need to fix the use-after-free and also make sure
>>>>>>> that
>>>>>>> reference will go away.
>>>>> I reverted the patch, verified the warning message is back:
>>>>>
>>>>> Feb 22 10:07:45 nfsvmf24 kernel: ------------[ cut here ]------------
>>>>> Feb 22 10:07:45 nfsvmf24 kernel: refcount_t: underflow; use-after-free.
>>>>>
>>>>> then did a inter-server copy and waited for more than 20 mins and
>>>>> the destination server still maintains the session with the source
>>>>> server. It must be some other references that prevent the mount
>>>>> to go away.
>>>> This change fixed the unmount after inter-server copy problem:
>>>>
>>>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
>>>> index 8d6d2678abad..87687cd18938 100644
>>>> --- a/fs/nfsd/nfs4proc.c
>>>> +++ b/fs/nfsd/nfs4proc.c
>>>> @@ -1304,7 +1304,7 @@ nfsd4_cleanup_inter_ssc(struct vfsmount
>>>> *ss_mnt, struct nfsd_file *src,
>>>> struct nfsd_file *dst)
>>>> {
>>>> nfs42_ssc_close(src->nf_file);
>>>> - /* 'src' is freed by nfsd4_do_async_copy */
>>>> + nfsd_file_put(src);
>>>> nfsd_file_put(dst);
>>>> mntput(ss_mnt);
>>>> }
>> This change is not need. It's left over from my testing to
>> reproduce the warning messages. Only the change in
>> nfsd4_do_async_copy is needed for the unmount problem.
>>
>> -Dai
>>
>>>> @@ -1472,14 +1472,12 @@ static int nfsd4_do_async_copy(void *data)
>>>> copy->nf_src = kzalloc(sizeof(struct nfsd_file),
>>>> GFP_KERNEL);
>>>> if (!copy->nf_src) {
>>>> copy->nfserr = nfserr_serverfault;
>>>> - nfsd4_interssc_disconnect(copy->ss_mnt);
>>>> goto do_callback;
>>>> }
>>>> copy->nf_src->nf_file = nfs42_ssc_open(copy->ss_mnt,
>>>> &copy->c_fh,
>>>> &copy->stateid);
>>>> if (IS_ERR(copy->nf_src->nf_file)) {
>>>> copy->nfserr = nfserr_offload_denied;
>>>> - nfsd4_interssc_disconnect(copy->ss_mnt);
>>>> goto do_callback;
>>>> }
>>>> }
>>>> @@ -1498,6 +1496,7 @@ static int nfsd4_do_async_copy(void *data)
>>>> &nfsd4_cb_offload_ops,
>>>> NFSPROC4_CLNT_CB_OFFLOAD);
>>>> nfsd4_run_cb(&cb_copy->cp_cb);
>>>> out:
>>>> + nfsd4_interssc_disconnect(copy->ss_mnt);
>>>> if (!copy->cp_intra)
>>>> kfree(copy->nf_src);
>>>> cleanup_async_copy(copy);
>>>>
>>>> But there is something new. I tried inter-server copy twice.
>>>> First time I can verify from tshark capture that a session was
>>>> created and destroy, along with all the NFS ops. On 2nd try,
>>>> I can
> Hi Dai/Bruce,
>
> While I believe the fix works (as in the mount goes away), I'm not
> comfortable with this fix as I believe we will be leaking resources.
> Server calls nfs42_ssc_open() which creates a legit file pointer (yes
> it takes a reference on superblock but it also allocates memory for
> "file" structure). Normally a file structure requires doing an fput()
> which if I look thru the code does a bunch of things before in the end
> calling mntput(). While we free the copy->nf_src in
> nfs4_do_asyn_copy(), the copy->nf_src->nf_file was allocated
> separately and would have been freed from calling fput() on it.
>
> So I guess it's not correct to do kfree(copy->nf_src) in teh
> nfs4_do_async_copy() I think that's probably where use-after-free
> comes in. We need to keep it until the cleanup. I'm thinking perhaps
> call fput(copy->nf_src->nf_file) first and then free it? Just an idea
> but I haven't tested it.

I think the unmount can be treated separately and the fix seems
to be valid for this issue. I think kfree(copy->nf_src) is called
after nfsd4_cleanup_inter_ssc so it's not the reason for the
'use-after-free' warning. A quick look at nfsd4_do_async_copy
I see nf_src was kzalloc'ed but no ref count added to it.

I will review the whole cleanup part and report back.

-Dai

2021-02-25 20:51:37

by Dai Ngo

[permalink] [raw]
Subject: Re: [PATCH 1/2] NFSD: Fix use-after-free warning when doing inter-server copy


On 2/24/21 6:26 PM, [email protected] wrote:
> Hi Olga and Bruce,
>
> On 2/24/21 2:35 PM, Olga Kornievskaia wrote:
>> On Mon, Feb 22, 2021 at 5:09 PM <[email protected]> wrote:
>>>
>>> On 2/22/21 2:01 PM, [email protected] wrote:
>>>> On 2/22/21 1:46 PM, [email protected] wrote:
>>>>> On 2/22/21 10:34 AM, [email protected] wrote:
>>>>>> On 2/20/21 8:16 PM, [email protected] wrote:
>>>>>>> On 2/20/21 6:08 AM, Olga Kornievskaia wrote:
>>>>>>>> On Fri, Feb 19, 2021 at 10:21 PM J. Bruce Fields
>>>>>>>> <[email protected]> wrote:
>>>>>>>>> On Fri, Feb 19, 2021 at 05:31:58PM -0800, [email protected]
>>>>>>>>> wrote:
>>>>>>>>>> If this is the cause why we don't drop the mount after the copy
>>>>>>>>>> then I can restore the patch and look into this problem.
>>>>>>>>>> Unfortunately,
>>>>>>>>>> all my test machines are down for maintenance until
>>>>>>>>>> Sunday/Monday.
>>>>>>>>> I think we can take some time to figure out what's actually
>>>>>>>>> going on
>>>>>>>>> here before reverting anything.
>>>>>>>> Yes I agree. We need to fix the use-after-free and also make sure
>>>>>>>> that
>>>>>>>> reference will go away.
>>>>>> I reverted the patch, verified the warning message is back:
>>>>>>
>>>>>> Feb 22 10:07:45 nfsvmf24 kernel: ------------[ cut here
>>>>>> ]------------
>>>>>> Feb 22 10:07:45 nfsvmf24 kernel: refcount_t: underflow;
>>>>>> use-after-free.
>>>>>>
>>>>>> then did a inter-server copy and waited for more than 20 mins and
>>>>>> the destination server still maintains the session with the source
>>>>>> server.  It must be some other references that prevent the mount
>>>>>> to go away.
>>>>> This change fixed the unmount after inter-server copy problem:
>>>>>
>>>>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
>>>>> index 8d6d2678abad..87687cd18938 100644
>>>>> --- a/fs/nfsd/nfs4proc.c
>>>>> +++ b/fs/nfsd/nfs4proc.c
>>>>> @@ -1304,7 +1304,7 @@ nfsd4_cleanup_inter_ssc(struct vfsmount
>>>>> *ss_mnt, struct nfsd_file *src,
>>>>>                          struct nfsd_file *dst)
>>>>>   {
>>>>>          nfs42_ssc_close(src->nf_file);
>>>>> -       /* 'src' is freed by nfsd4_do_async_copy */
>>>>> +       nfsd_file_put(src);
>>>>>          nfsd_file_put(dst);
>>>>>          mntput(ss_mnt);
>>>>>   }
>>> This change is not need. It's left over from my testing to
>>> reproduce the warning messages. Only the change in
>>> nfsd4_do_async_copy is needed for the unmount problem.
>>>
>>> -Dai
>>>
>>>>> @@ -1472,14 +1472,12 @@ static int nfsd4_do_async_copy(void *data)
>>>>>                  copy->nf_src = kzalloc(sizeof(struct nfsd_file),
>>>>> GFP_KERNEL);
>>>>>                  if (!copy->nf_src) {
>>>>>                          copy->nfserr = nfserr_serverfault;
>>>>> - nfsd4_interssc_disconnect(copy->ss_mnt);
>>>>>                          goto do_callback;
>>>>>                  }
>>>>>                  copy->nf_src->nf_file = nfs42_ssc_open(copy->ss_mnt,
>>>>> &copy->c_fh,
>>>>> &copy->stateid);
>>>>>                  if (IS_ERR(copy->nf_src->nf_file)) {
>>>>>                          copy->nfserr = nfserr_offload_denied;
>>>>> - nfsd4_interssc_disconnect(copy->ss_mnt);
>>>>>                          goto do_callback;
>>>>>                  }
>>>>>          }
>>>>> @@ -1498,6 +1496,7 @@ static int nfsd4_do_async_copy(void *data)
>>>>>                          &nfsd4_cb_offload_ops,
>>>>> NFSPROC4_CLNT_CB_OFFLOAD);
>>>>>          nfsd4_run_cb(&cb_copy->cp_cb);
>>>>>   out:
>>>>> +       nfsd4_interssc_disconnect(copy->ss_mnt);
>>>>>          if (!copy->cp_intra)
>>>>>                  kfree(copy->nf_src);
>>>>>          cleanup_async_copy(copy);
>>>>>
>>>>> But there is something new. I tried inter-server copy twice.
>>>>> First time I can verify from tshark capture that a session was
>>>>> created and destroy, along with all the NFS ops. On 2nd try,
>>>>> I can
>> Hi Dai/Bruce,
>>
>> While I believe the fix works (as in the mount goes away), I'm not
>> comfortable with this fix as I believe we will be leaking resources.
>> Server calls nfs42_ssc_open() which creates a legit file pointer (yes
>> it takes a reference on superblock but it also allocates memory for
>> "file" structure). Normally a file structure requires doing an fput()
>> which if I look thru the code does a bunch of things before in the end
>> calling mntput(). While we free the copy->nf_src in
>> nfs4_do_asyn_copy(), the copy->nf_src->nf_file was allocated
>> separately and would have been freed from calling fput() on it.
>>
>> So I guess it's not correct to do kfree(copy->nf_src) in teh
>> nfs4_do_async_copy() I think that's probably where use-after-free
>> comes in. We need to keep it until the cleanup. I'm thinking perhaps
>> call fput(copy->nf_src->nf_file) first and then free it? Just an idea
>> but I haven't tested it.
>
> I think the unmount can be treated separately and the fix seems
> to be valid for this issue. I think kfree(copy->nf_src) is called
> after nfsd4_cleanup_inter_ssc so it's not the reason for the
> 'use-after-free' warning. A quick look at nfsd4_do_async_copy
> I see nf_src was kzalloc'ed but no ref count added to it.
>
> I will review the whole cleanup part and report back.

I think this would fix the resource leak problem:

diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c
index 57b3821d975a..742fc128fdc8 100644
--- a/fs/nfs/nfs4file.c
+++ b/fs/nfs/nfs4file.c
@@ -405,6 +405,11 @@ static void __nfs42_ssc_close(struct file *filep)
struct nfs_open_context *ctx = nfs_file_open_context(filep);

ctx->state->flags = 0;
+ if (!filep)
+ return;
+ get_file(filep);
+ filp_close(filep, NULL);
+ fput(filep);
}

static const struct nfs4_ssc_client_ops nfs4_ssc_clnt_ops_tbl = {

-Dai

2021-03-01 18:21:04

by Chuck Lever

[permalink] [raw]
Subject: Re: [PATCH 1/2] NFSD: Fix use-after-free warning when doing inter-server copy



> On Feb 25, 2021, at 1:58 PM, [email protected] wrote:
>
>
> On 2/24/21 6:26 PM, [email protected] wrote:
>> Hi Olga and Bruce,
>>
>> On 2/24/21 2:35 PM, Olga Kornievskaia wrote:
>>> On Mon, Feb 22, 2021 at 5:09 PM <[email protected]> wrote:
>>>>
>>>> On 2/22/21 2:01 PM, [email protected] wrote:
>>>>> On 2/22/21 1:46 PM, [email protected] wrote:
>>>>>> On 2/22/21 10:34 AM, [email protected] wrote:
>>>>>>> On 2/20/21 8:16 PM, [email protected] wrote:
>>>>>>>> On 2/20/21 6:08 AM, Olga Kornievskaia wrote:
>>>>>>>>> On Fri, Feb 19, 2021 at 10:21 PM J. Bruce Fields
>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>> On Fri, Feb 19, 2021 at 05:31:58PM -0800, [email protected] wrote:
>>>>>>>>>>> If this is the cause why we don't drop the mount after the copy
>>>>>>>>>>> then I can restore the patch and look into this problem.
>>>>>>>>>>> Unfortunately,
>>>>>>>>>>> all my test machines are down for maintenance until Sunday/Monday.
>>>>>>>>>> I think we can take some time to figure out what's actually going on
>>>>>>>>>> here before reverting anything.
>>>>>>>>> Yes I agree. We need to fix the use-after-free and also make sure
>>>>>>>>> that
>>>>>>>>> reference will go away.
>>>>>>> I reverted the patch, verified the warning message is back:
>>>>>>>
>>>>>>> Feb 22 10:07:45 nfsvmf24 kernel: ------------[ cut here ]------------
>>>>>>> Feb 22 10:07:45 nfsvmf24 kernel: refcount_t: underflow; use-after-free.
>>>>>>>
>>>>>>> then did a inter-server copy and waited for more than 20 mins and
>>>>>>> the destination server still maintains the session with the source
>>>>>>> server. It must be some other references that prevent the mount
>>>>>>> to go away.
>>>>>> This change fixed the unmount after inter-server copy problem:
>>>>>>
>>>>>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
>>>>>> index 8d6d2678abad..87687cd18938 100644
>>>>>> --- a/fs/nfsd/nfs4proc.c
>>>>>> +++ b/fs/nfsd/nfs4proc.c
>>>>>> @@ -1304,7 +1304,7 @@ nfsd4_cleanup_inter_ssc(struct vfsmount
>>>>>> *ss_mnt, struct nfsd_file *src,
>>>>>> struct nfsd_file *dst)
>>>>>> {
>>>>>> nfs42_ssc_close(src->nf_file);
>>>>>> - /* 'src' is freed by nfsd4_do_async_copy */
>>>>>> + nfsd_file_put(src);
>>>>>> nfsd_file_put(dst);
>>>>>> mntput(ss_mnt);
>>>>>> }
>>>> This change is not need. It's left over from my testing to
>>>> reproduce the warning messages. Only the change in
>>>> nfsd4_do_async_copy is needed for the unmount problem.
>>>>
>>>> -Dai
>>>>
>>>>>> @@ -1472,14 +1472,12 @@ static int nfsd4_do_async_copy(void *data)
>>>>>> copy->nf_src = kzalloc(sizeof(struct nfsd_file),
>>>>>> GFP_KERNEL);
>>>>>> if (!copy->nf_src) {
>>>>>> copy->nfserr = nfserr_serverfault;
>>>>>> - nfsd4_interssc_disconnect(copy->ss_mnt);
>>>>>> goto do_callback;
>>>>>> }
>>>>>> copy->nf_src->nf_file = nfs42_ssc_open(copy->ss_mnt,
>>>>>> &copy->c_fh,
>>>>>> &copy->stateid);
>>>>>> if (IS_ERR(copy->nf_src->nf_file)) {
>>>>>> copy->nfserr = nfserr_offload_denied;
>>>>>> - nfsd4_interssc_disconnect(copy->ss_mnt);
>>>>>> goto do_callback;
>>>>>> }
>>>>>> }
>>>>>> @@ -1498,6 +1496,7 @@ static int nfsd4_do_async_copy(void *data)
>>>>>> &nfsd4_cb_offload_ops,
>>>>>> NFSPROC4_CLNT_CB_OFFLOAD);
>>>>>> nfsd4_run_cb(&cb_copy->cp_cb);
>>>>>> out:
>>>>>> + nfsd4_interssc_disconnect(copy->ss_mnt);
>>>>>> if (!copy->cp_intra)
>>>>>> kfree(copy->nf_src);
>>>>>> cleanup_async_copy(copy);
>>>>>>
>>>>>> But there is something new. I tried inter-server copy twice.
>>>>>> First time I can verify from tshark capture that a session was
>>>>>> created and destroy, along with all the NFS ops. On 2nd try,
>>>>>> I can
>>> Hi Dai/Bruce,
>>>
>>> While I believe the fix works (as in the mount goes away), I'm not
>>> comfortable with this fix as I believe we will be leaking resources.
>>> Server calls nfs42_ssc_open() which creates a legit file pointer (yes
>>> it takes a reference on superblock but it also allocates memory for
>>> "file" structure). Normally a file structure requires doing an fput()
>>> which if I look thru the code does a bunch of things before in the end
>>> calling mntput(). While we free the copy->nf_src in
>>> nfs4_do_asyn_copy(), the copy->nf_src->nf_file was allocated
>>> separately and would have been freed from calling fput() on it.
>>>
>>> So I guess it's not correct to do kfree(copy->nf_src) in teh
>>> nfs4_do_async_copy() I think that's probably where use-after-free
>>> comes in. We need to keep it until the cleanup. I'm thinking perhaps
>>> call fput(copy->nf_src->nf_file) first and then free it? Just an idea
>>> but I haven't tested it.
>>
>> I think the unmount can be treated separately and the fix seems
>> to be valid for this issue. I think kfree(copy->nf_src) is called
>> after nfsd4_cleanup_inter_ssc so it's not the reason for the
>> 'use-after-free' warning. A quick look at nfsd4_do_async_copy
>> I see nf_src was kzalloc'ed but no ref count added to it.
>>
>> I will review the whole cleanup part and report back.
>
> I think this would fix the resource leak problem:
>
> diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c
> index 57b3821d975a..742fc128fdc8 100644
> --- a/fs/nfs/nfs4file.c
> +++ b/fs/nfs/nfs4file.c
> @@ -405,6 +405,11 @@ static void __nfs42_ssc_close(struct file *filep)
> struct nfs_open_context *ctx = nfs_file_open_context(filep);
> ctx->state->flags = 0;
> + if (!filep)
> + return;
> + get_file(filep);
> + filp_close(filep, NULL);
> + fput(filep);
> }
> static const struct nfs4_ssc_client_ops nfs4_ssc_clnt_ops_tbl = {
>
> -Dai

I'm not clear on the final outcome. Is a code change needed?
If so, can someone post a patch (with a full description and
Signed-off-by) or point me to one that's been posted already?

Thanks!

--
Chuck Lever