2022-07-01 18:01:00

by Mkrtchyan, Tigran

Subject: Per user rate limiter


Hi NFS folks,

recently we got a kind of DDoS from one of our users: 5k jobs were aggressively
reading a handful of files. Of course we have overload protection;
however, such a large number of requests from a single user didn't give other
users a chance to perform any IO. As we extensively use pNFS, such user behavior
makes some DSes unavailable to other users.

To address this issue, we are looking at some kind of per-user-principal
rate limiter. Each user gets a share of the IO, and if there are no requests
from other users, a single user can get it all. Not an ideal solution, of
course, but a good starting point.

So, the question is: how do we tell the aggressive user to back off? Delaying the
response will block all other requests from the same host, including those of other
users. Returning NFS4ERR_DELAY has the same effect (this is what we do now).
NFSv4.1 session slots are client-wide, thus any per-client-id increase or decrease
will either give more slots to the aggressive user or take them away from all
other users as well.

Are there any developments in the direction of per-client (cgroups or namespaces)
timeout/error handling? Is there an NFS-client-friendly solution, better than
returning NFS4ERR_DELAY?

Thanks in advance,
Tigran.


Attachments:
smime.p7s (2.16 kB)
S/MIME Cryptographic Signature

2022-07-01 18:28:11

by Trond Myklebust

Subject: Re: Per user rate limiter

On Fri, 2022-07-01 at 19:58 +0200, Mkrtchyan, Tigran wrote:
>
> Hi NFS folks,
>
> recently we got a kind of DDoS from one of our users: 5k jobs were
> aggressively reading a handful of files. Of course we have overload
> protection; however, such a large number of requests from a single
> user didn't give other users a chance to perform any IO. As we
> extensively use pNFS, such user behavior makes some DSes unavailable
> to other users.
>
> To address this issue, we are looking at some kind of
> per-user-principal rate limiter. Each user gets a share of the IO,
> and if there are no requests from other users, a single user can get
> it all. Not an ideal solution, of course, but a good starting point.
>
> So, the question is: how do we tell the aggressive user to back off?
> Delaying the response will block all other requests from the same
> host, including those of other users. Returning NFS4ERR_DELAY has the
> same effect (this is what we do now). NFSv4.1 session slots are
> client-wide, thus any per-client-id increase or decrease will either
> give more slots to the aggressive user or take them away from all
> other users as well.
>
> Are there any developments in the direction of per-client (cgroups or
> namespaces) timeout/error handling? Is there an NFS-client-friendly
> solution, better than returning NFS4ERR_DELAY?
>

Here are a few suggestions:

1) Recall the layout from the offending client
2) Define QoS policies for the connections using the kernel Traffic
Control mechanisms
3) Use mirroring/replication to allow read access to the same files
through multiple data servers.
4) Use NFS re-exporting in order to reduce the load on the data
servers.
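
Suggestion 2 might look roughly like the following on the server. This is a
minimal sketch only: the interface name, rates, subnet, and class ids are all
assumptions, not a tested policy.

```shell
# Sketch: cap bulk NFS traffic from a batch-farm subnet with HTB while
# leaving all other clients unrestricted. Values are illustrative.
DEV=eth0

# Root HTB qdisc; unclassified traffic falls into class 1:10 (near line rate).
tc qdisc add dev $DEV root handle 1: htb default 10
tc class add dev $DEV parent 1: classid 1:1 htb rate 10gbit
tc class add dev $DEV parent 1:1 classid 1:10 htb rate 9gbit ceil 10gbit

# Class for the aggressive clients, hard-capped at 1gbit.
tc class add dev $DEV parent 1:1 classid 1:20 htb rate 1gbit ceil 1gbit

# Steer traffic destined to the farm subnet into the capped class.
tc filter add dev $DEV parent 1: protocol ip prio 1 \
    u32 match ip dst 10.20.0.0/16 flowid 1:20
```

Because HTB borrows unused bandwidth between siblings, the capped class only
bites when the link is actually contended.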

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2022-07-01 21:54:50

by Daire Byrne

Subject: Re: Per user rate limiter

On Fri, 1 Jul 2022 at 19:23, Trond Myklebust <[email protected]> wrote:
> 2) Define QoS policies for the connections using the kernel Traffic

If it helps, we use HTB qdisc/classes on our Linux NFS servers to
optionally limit the total egress and ingress (ifb) bandwidth to/from
our renderfarm.

User workstations are exempt from these limits so always get full speed.

We can do this fairly easily because our network is well defined and
split into subnet ranges, so filtering by subnet allows us to
differentiate between host classes (farm, workstations, etc.).

Strictly speaking, it's a bit more complicated in that we only apply
limits and change them dynamically based on the "load" of the server
and how well it is keeping up with demand. This is just a bash script
running in a loop looking at the state, scaling the HTB limits and
applying filters.

Our goal is to always ensure that staff have a good experience on their
interactive desktops, and we'll happily slow batch farm jobs to keep it
that way.

It is basically a low-pass filter that limits server load spikes.
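
A control loop of that shape could be sketched as below. The load metric,
thresholds, class ids, and rates are assumptions for illustration, not Daire's
actual script.

```shell
# Sketch of a load-driven HTB limiter: lower the farm class's rate/ceiling
# when the server's load average is high, restore it when it recovers.
DEV=eth0
while sleep 10; do
    # First field of /proc/loadavg is the 1-minute load average.
    load=$(cut -d' ' -f1 /proc/loadavg)
    if [ "${load%.*}" -ge 8 ]; then
        # Server is struggling: throttle the batch-farm class harder.
        tc class change dev $DEV parent 1:1 classid 1:20 \
            htb rate 500mbit ceil 500mbit
    else
        # Server is keeping up: give the farm its normal allowance.
        tc class change dev $DEV parent 1:1 classid 1:20 \
            htb rate 1gbit ceil 1gbit
    fi
done
```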

To do something similar by user or process, you could run your jobs in
a cgroup and have them mark the packets, which the server could then use
to filter. But I think this only works for client writes to the
server, as you have no way to mark and act on the egress packets leaving
the server?
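
The cgroup marking mentioned above would roughly be the following (cgroup v1
net_cls; paths, the classid, and `$JOB_PID` are assumptions). As noted, this
classifies only traffic the job's processes send, not the server's replies.

```shell
# Sketch: tag packets from a job's processes via the net_cls controller,
# then classify them on that host with tc's cgroup filter.
mkdir -p /sys/fs/cgroup/net_cls/batchjobs

# net_cls.classid 0x00010020 corresponds to tc class 1:20 (major:minor in hex).
echo 0x00010020 > /sys/fs/cgroup/net_cls/batchjobs/net_cls.classid

# Move the job's process into the cgroup ($JOB_PID is illustrative).
echo "$JOB_PID" > /sys/fs/cgroup/net_cls/batchjobs/cgroup.procs

# Match the stored classid with the cgroup filter on the egress qdisc.
tc filter add dev eth0 parent 1: protocol ip prio 2 handle 1: cgroup
```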

Daire

2022-07-05 08:52:28

by Mkrtchyan, Tigran

Subject: Re: Per user rate limiter

Hi Daire, hi Trond,

We will try to apply your suggestions.

Thanks for the help,
Tigran.

----- Original Message -----
> From: "Daire Byrne" <[email protected]>
> To: "Trond Myklebust" <[email protected]>
> Cc: "Tigran Mkrtchyan" <[email protected]>, "linux-nfs" <[email protected]>
> Sent: Friday, 1 July, 2022 23:51:51
> Subject: Re: Per user rate limiter

> On Fri, 1 Jul 2022 at 19:23, Trond Myklebust <[email protected]> wrote:
>> 2) Define QoS policies for the connections using the kernel Traffic
>
> If it helps, we use HTB qdisc/classes on our Linux NFS servers to
> optionally limit the total egress and ingress (ifb) bandwidth to/from
> our renderfarm.
>
> User workstations are exempt from these limits so always get full speed.
>
> We can do this fairly easily because our network is well defined and
> split into subnet ranges, so filtering by subnet allows us to
> differentiate between host classes (farm, workstations, etc.).
>
> Strictly speaking, it's a bit more complicated in that we only apply
> limits and change them dynamically based on the "load" of the server
> and how well it is keeping up with demand. This is just a bash script
> running in a loop looking at the state, scaling the HTB limits and
> applying filters.
>
> Our goal is to always ensure that staff have a good experience on their
> interactive desktops, and we'll happily slow batch farm jobs to keep it
> that way.
>
> It is basically a low-pass filter that limits server load spikes.
>
> To do something similar by user or process, you could run your jobs in
> a cgroup and have them mark the packets, which the server could then use
> to filter. But I think this only works for client writes to the
> server, as you have no way to mark and act on the egress packets leaving
> the server?
>
> Daire


Attachments:
smime.p7s (2.16 kB)
S/MIME Cryptographic Signature