2013-10-09 20:48:35

by Mkrtchyan, Tigran

Subject: DoS with NFSv4.1 client


Hi,

last night we got a DoS attack from one of our NFS clients.
A farm node, which was accessing data with pNFS,
went mad and tried to kill the dCache NFS server. As usual,
this happened overnight, and we were not able to
capture network traffic or bump the debug level.

The symptoms are:

the client starts to bombard the MDS with OPEN requests. Since we see
state being created on the server side, the requests were processed by
the server. Nevertheless, for some reason, the client did not like the
replies. Here is the output of mountstats:

OPEN:
17087065 ops (99%) 1 retrans (0%) 0 major timeouts
avg bytes sent per op: 356 avg bytes received per op: 455
backlog wait: 0.014707 RTT: 4.535704 total execute time: 4.574094 (milliseconds)
CLOSE:
290 ops (0%) 0 retrans (0%) 0 major timeouts
avg bytes sent per op: 247 avg bytes received per op: 173
backlog wait: 308.827586 RTT: 1748.479310 total execute time: 2057.365517 (milliseconds)


As you can see, there is quite a big difference between the number of OPEN and CLOSE requests.
We can see the same picture on the server side as well:

NFSServerV41 Stats: average±stderr(ns) min(ns) max(ns) Samples
DESTROY_SESSION 26056±4511.89 13000 97000 17
OPEN 1197297± 0.00 816000 31924558000 54398533
RESTOREFH 0± 0.00 0 25018778000 54398533
SEQUENCE 1000± 0.00 1000 26066722000 55601046
LOOKUP 4607959± 0.00 375000 26977455000 32118
GETDEVICEINFO 13158±100.88 4000 655000 11378
CLOSE 16236211± 0.00 5000 21021819000 20420
LAYOUTGET 271736361± 0.00 10003000 68414723000 21095

The last column is the number of requests.

This is with RHEL 6.4 as the client. Looking at the code,
I can see a loop in nfs4proc.c#nfs4_do_open() which could be
the cause of the problem. Nevertheless, I can't
find any reason why this loop turned into an 'infinite' one.
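
For illustration, the loop has roughly this shape (a simplified sketch
on my part, not the actual kernel code; the open_ctx type and the
try_open() helper are stand-ins for the real state machine):

#include <errno.h>

struct open_ctx;                     /* stand-in for the real open context */
int try_open(struct open_ctx *ctx);  /* stand-in for one OPEN round trip   */

int do_open_sketch(struct open_ctx *ctx)
{
        int status;

        do {
                /* each pass sends a fresh OPEN to the MDS */
                status = try_open(ctx);
        } while (status == -EAGAIN); /* "recoverable" errors retry forever */

        return status;
}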

In the end our server ran out of memory, and we returned
NFSERR_SERVERFAULT to the client. This triggered the client to
re-establish the session, and all open stateids were
invalidated and cleaned up.

I am still trying to reproduce this behavior (on client
and server) and any hint is welcome.

Tigran.


2013-10-10 14:35:26

by Weston Andros Adamson

Subject: Re: DoS with NFSv4.1 client

Well, it'd be nice not to loop forever, but my question remains: is this due to a server bug (the DS not knowing about the new stateid from the MDS)?

-dros

On Oct 10, 2013, at 10:14 AM, Weston Andros Adamson <[email protected]> wrote:

> So is this a server bug? It seems like the client is behaving correctly...
>
> -dros
>
> On Oct 10, 2013, at 5:56 AM, "Mkrtchyan, Tigran" <[email protected]> wrote:
>
>>
>>
>> Today we were 'lucky' enough to hit this situation during the day.
>> Here is what happens:
>>
>> The client sends an OPEN and gets an open stateid.
>> This is followed by LAYOUTGET ... and READ to the DS.
>> At some point, the DS returns BAD_STATEID.
>> This triggers the client to issue a new OPEN and use the
>> new open stateid with the READ request to the DS. As the new
>> stateid is not known to the DS, it keeps returning
>> BAD_STATEID, and this becomes an infinite loop.
>>
>> Regards,
>> Tigran.


2013-10-10 14:14:30

by Weston Andros Adamson

Subject: Re: DoS with NFSv4.1 client

So is this a server bug? It seems like the client is behaving correctly...

-dros

On Oct 10, 2013, at 5:56 AM, "Mkrtchyan, Tigran" <[email protected]> wrote:

>
>
> Today we were 'lucky' enough to hit this situation during the day.
> Here is what happens:
>
> The client sends an OPEN and gets an open stateid.
> This is followed by LAYOUTGET ... and READ to the DS.
> At some point, the DS returns BAD_STATEID.
> This triggers the client to issue a new OPEN and use the
> new open stateid with the READ request to the DS. As the new
> stateid is not known to the DS, it keeps returning
> BAD_STATEID, and this becomes an infinite loop.
>
> Regards,
> Tigran.


2013-10-10 14:48:54

by Mkrtchyan, Tigran

Subject: Re: DoS with NFSv4.1 client



----- Original Message -----
> From: "Weston Andros Adamson" <[email protected]>
> To: "Tigran Mkrtchyan" <[email protected]>
> Cc: "<[email protected]>" <[email protected]>, "Andy Adamson" <[email protected]>, "Steve
> Dickson" <[email protected]>
> Sent: Thursday, October 10, 2013 4:35:25 PM
> Subject: Re: DoS with NFSv4.1 client
>
> Well, it'd be nice not to loop forever, but my question remains: is this
> due to a server bug (the DS not knowing about the new stateid from the MDS)?
>

Up to now, we have pushed the open stateid to the DS only on LAYOUTGET.
This has to be changed, as the behaviour is not spec compliant.
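
To make the failure mode concrete, here is a toy model of that behaviour
(illustrative C only, not dCache code; stateids are reduced to plain
integers). The DS only knows the stateid that was current at LAYOUTGET
time, so every stateid minted by a later OPEN gets rejected:

#include <stdio.h>

#define NFS4_OK             0
#define NFS4ERR_BAD_STATEID 10025         /* value from RFC 5661 */

static int ds_known_stateid;              /* what the MDS pushed at LAYOUTGET */

static int open_at_mds(void)              { static int next; return ++next; }
static void layoutget_at_mds(int stateid) { ds_known_stateid = stateid; }
static int read_at_ds(int stateid)
{
        return stateid == ds_known_stateid ? NFS4_OK : NFS4ERR_BAD_STATEID;
}

int main(void)
{
        int stateid = open_at_mds();
        layoutget_at_mds(stateid);        /* DS learns stateid 1, and only 1 */

        stateid = open_at_mds();          /* client re-OPENs: stateid 2 is
                                             never pushed to the DS */
        for (int i = 0; i < 3; i++) {     /* bounded here, not in real life */
                if (read_at_ds(stateid) != NFS4ERR_BAD_STATEID)
                        break;
                printf("READ with stateid %d -> BAD_STATEID, re-OPEN\n", stateid);
                stateid = open_at_mds();  /* another stateid the DS never sees */
        }
        return 0;
}

Running it prints the BAD_STATEID/re-OPEN line three times; the real
client has no such bound, which is the loop we observed.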

Tigran.


2013-10-10 15:39:11

by Adamson, Andy

Subject: Re: DoS with NFSv4.1 client


On Oct 10, 2013, at 11:11 AM, "Mkrtchyan, Tigran" <[email protected]>
wrote:

>
>
> This is probably a question for the IETF working group, but anyway:
> if my layout has the 'return-on-close' flag set and the open stateid
> is not valid any more, should the client expect the layout to still be valid?

Here is my take:

The layout stateid is constructed from the first open stateid when pNFS I/O is tried on that file. Once the LAYOUTGET returns successfully, the layout stateid is independent of the open stateid used to construct it.
So if that open stateid, or another open stateid, goes bad, the layout stateid is still valid.

WRT return-on-close: an invalid open stateid means there is no CLOSE until the open stateid is recovered (CLAIM_PREVIOUS) and the CLOSE call has a valid stateid. No CLOSE on an invalid stateid means no return-on-close for that stateid, which means the layout is still valid until the CLOSE that uses the recovered open stateid.
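
To restate that compactly (just an illustrative sketch of the reasoning,
not client or server code; the two structs are made up for this example):

#include <stdbool.h>

struct open_state { bool stateid_valid; };
struct layout     { bool return_on_close; bool returned; };

/* The layout stays usable until it has actually been returned. */
bool layout_still_valid(const struct layout *lo)
{
        return !lo->returned;
}

/* Return-on-close only fires on a successful CLOSE, and a CLOSE needs a
 * valid (possibly recovered) open stateid; with a bad open stateid the
 * layout therefore stays valid. */
bool try_close(const struct open_state *os, struct layout *lo)
{
        if (!os->stateid_valid)
                return false;            /* no CLOSE until state is recovered */
        if (lo->return_on_close)
                lo->returned = true;     /* implicit LAYOUTRETURN at CLOSE */
        return true;
}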

-->Andy




2013-10-10 15:11:56

by Mkrtchyan, Tigran

Subject: Re: DoS with NFSv4.1 client



This is probably a question for the IETF working group, but anyway:
if my layout has the 'return-on-close' flag set and the open stateid
is not valid any more, should the client expect the layout to still be valid?

Tigran.


2013-10-10 09:56:58

by Mkrtchyan, Tigran

Subject: Re: DoS with NFSv4.1 client



Today we were 'lucky' enough to hit this situation during the day.
Here is what happens:

The client sends an OPEN and gets an open stateid.
This is followed by LAYOUTGET ... and READ to the DS.
At some point, the DS returns BAD_STATEID.
This triggers the client to issue a new OPEN and use the
new open stateid with the READ request to the DS. As the new
stateid is not known to the DS, it keeps returning
BAD_STATEID, and this becomes an infinite loop.

Regards,
Tigran.


