2021-04-16 23:10:52

by Keyu Man

Subject: PROBLEM: DoS Attack on Fragment Cache

Hi,

My name is Keyu Man. We are a group of researchers from the University
of California, Riverside; Zhiyun Qian is my advisor. We found that the
code that processes IPv4/IPv6 fragments can potentially lead to DoS
attacks. Specifically, after the latest kernel receives an IPv4
fragment, it tries to fit it into a queue by calling the function

struct inet_frag_queue *inet_frag_find(struct fqdir *fqdir, void
*key) in net/ipv4/inet_fragment.c.

However, this function first checks whether the existing fragment
memory exceeds fqdir->high_thresh. If it does, the fragment is dropped
regardless of whether it belongs to a new queue or an existing
queue.

An attacker can fill the cache with fragments that will never be
assembled (i.e., send only the first fragment, with a new IPID each
time) to exceed the threshold, so that all future incoming fragmented
IPv4 traffic is blocked and dropped. Since there is no GC mechanism,
the victim host has to wait 30s for the fragments to expire before it
can receive incoming fragments normally again.

In practice, given the 4MB fragment cache, the attacker only needs
to send 1766 fragments to exhaust the cache and DoS the victim for
30s, so the cost is pretty low. Besides, IPv6 is also affected,
since the issue resides in the inet part.
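
The behaviour described above can be sketched with a toy model. This is not kernel code: the class and its names are hypothetical, and the per-queue memory charge (QUEUE_COST) is an assumption picked so that the numbers line up with the 1766 figure; the real accounting depends on skb truesize and queue overhead.

```python
from dataclasses import dataclass, field

HIGH_THRESH = 4 * 1024 * 1024  # 4 MiB cache limit, as quoted above
FRAG_TIMEOUT = 30.0            # seconds before an incomplete queue expires
QUEUE_COST = 2376              # ASSUMED bytes charged per one-fragment queue


@dataclass
class FragCache:
    """Toy model of the admission check (hypothetical, not kernel code)."""
    queues: dict = field(default_factory=dict)  # key -> expiry time

    def mem(self) -> int:
        # Memory currently charged to incomplete queues.
        return len(self.queues) * QUEUE_COST

    def expire(self, now: float) -> None:
        # Queues are only released when their timer runs out (no GC).
        self.queues = {k: t for k, t in self.queues.items() if t > now}

    def receive(self, key, now: float) -> bool:
        """Return True if the fragment is accepted, False if dropped."""
        self.expire(now)
        if self.mem() > HIGH_THRESH:
            # Dropped even if `key` already has a queue waiting.
            return False
        self.queues.setdefault(key, now + FRAG_TIMEOUT)
        return True
```

Under these assumptions, once enough first fragments with fresh IDs have been accepted, every later fragment is refused until the 30s timers fire, including fragments that would complete an existing queue.
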
This issue was introduced in commit
648700f76b03b7e8149d13cc2bdb3355035258a9 (inet: frags: use rhashtables
for reassembly units), which removed fqdir->low_thresh and the GC
worker as well. We would kindly request bringing the GC worker back to
the kernel to prevent these DoS attacks.

Looking forward to hearing from you.

Thanks,

Keyu Man




2021-04-17 00:34:32

by David Ahern

Subject: Re: PROBLEM: DoS Attack on Fragment Cache

[ cc author of 648700f76b03b7e8149d13cc2bdb3355035258a9 ]

On 4/16/21 3:58 PM, Keyu Man wrote:
> Hi,
>
>  
>
>     My name is Keyu Man. We are a group of researchers from University
> of California, Riverside. Zhiyun Qian is my advisor. We found the code
> in processing IPv4/IPv6 fragments will potentially lead to DoS Attacks.
> Specifically, after the latest kernel receives an IPv4 fragment, it will
> try to fit it into a queue by calling function
>
>  
>
>     struct inet_frag_queue *inet_frag_find(struct fqdir *fqdir, void
> *key) in net/ipv4/inet_fragment.c.
>
>  
>
>     However, this function will first check if the existing fragment
> memory exceeds the fqdir->high_thresh. If it exceeds, then drop the
> fragment regardless whether it belongs to a new queue or an existing queue.
>
>     Chances are that an attacker can fill the cache with fragments that
> will never be assembled (i.e., only sends the first fragment with new
> IPIDs every time) to exceed the threshold so that all future incoming
> fragmented IPv4 traffic would be blocked and dropped. Since there is no
> GC mechanism, the victim host has to wait for 30s when the fragments are
> expired to continue receive incoming fragments normally.
>
>     In practice, given the 4MB fragment cache, the attacker only needs
> to send 1766 fragments to exhaust the cache and DoS the victim for 30s,
> whose cost is pretty low. Besides, IPv6 would also be affected since the
> issue resides in inet part.
>
> This issue is introduced in commit
> 648700f76b03b7e8149d13cc2bdb3355035258a9 (inet: frags: use rhashtables
> for reassembly units) which removes fqdir->low_thresh, and GC worker as
> well. We would gently request to bring GC worker back to the kernel to
> prevent the DoS attacks.
>
> Looking forward to hear from you
>
>  
>
>     Thanks,
>
> Keyu Man
>

2021-04-17 04:48:08

by Eric Dumazet

Subject: Re: PROBLEM: DoS Attack on Fragment Cache

On Sat, Apr 17, 2021 at 2:31 AM David Ahern <[email protected]> wrote:
>
> [ cc author of 648700f76b03b7e8149d13cc2bdb3355035258a9 ]



I think this has been discussed already. There is no strategy that
makes IP reassembly units immune to DDOS attacks.

We added rb-tree and sysctls to let admins choose to use GB of RAM if
they really care.



>
> On 4/16/21 3:58 PM, Keyu Man wrote:
> > Hi,
> >
> >
> >
> > My name is Keyu Man. We are a group of researchers from University
> > of California, Riverside. Zhiyun Qian is my advisor. We found the code
> > in processing IPv4/IPv6 fragments will potentially lead to DoS Attacks.
> > Specifically, after the latest kernel receives an IPv4 fragment, it will
> > try to fit it into a queue by calling function
> >
> >
> >
> > struct inet_frag_queue *inet_frag_find(struct fqdir *fqdir, void
> > *key) in net/ipv4/inet_fragment.c.
> >
> >
> >
> > However, this function will first check if the existing fragment
> > memory exceeds the fqdir->high_thresh. If it exceeds, then drop the
> > fragment regardless whether it belongs to a new queue or an existing queue.
> >
> > Chances are that an attacker can fill the cache with fragments that
> > will never be assembled (i.e., only sends the first fragment with new
> > IPIDs every time) to exceed the threshold so that all future incoming
> > fragmented IPv4 traffic would be blocked and dropped. Since there is no
> > GC mechanism, the victim host has to wait for 30s when the fragments are
> > expired to continue receive incoming fragments normally.
> >
> > In practice, given the 4MB fragment cache, the attacker only needs
> > to send 1766 fragments to exhaust the cache and DoS the victim for 30s,
> > whose cost is pretty low. Besides, IPv6 would also be affected since the
> > issue resides in inet part.
> >
> > This issue is introduced in commit
> > 648700f76b03b7e8149d13cc2bdb3355035258a9 (inet: frags: use rhashtables
> > for reassembly units) which removes fqdir->low_thresh, and GC worker as
> > well. We would gently request to bring GC worker back to the kernel to
> > prevent the DoS attacks.
> >
> > Looking forward to hear from you
> >
> >
> >
> > Thanks,
> >
> > Keyu Man
> >
>

2021-04-17 07:30:40

by Willy Tarreau

Subject: Re: PROBLEM: DoS Attack on Fragment Cache

On Sat, Apr 17, 2021 at 06:44:40AM +0200, Eric Dumazet wrote:
> On Sat, Apr 17, 2021 at 2:31 AM David Ahern <[email protected]> wrote:
> >
> > [ cc author of 648700f76b03b7e8149d13cc2bdb3355035258a9 ]
>
> I think this has been discussed already. There is no strategy that
> makes IP reassembly units immune to DDOS attacks.

Having tried to deal with this in the past as well, I agree with
this conclusion, which is also another good example of why fragments
should really be avoided as much as possible over hostile networks.

However, I also found that randomly dropping previous entries is the
approach that seems to give legitimate traffic the best statistical
chance of still working under attack (albeit really poorly,
considering that any lost fragment requires retransmission of the
whole series). In this case the chance for a packet to be successfully
reassembled would vary proportionally to the inverse of its number of
fragments, which reasonably limits the impact of attacks (without being
an ultimate solution, of course).

> We added rb-tree and sysctls to let admins choose to use GB of RAM if
> they really care.

I agree that for those who care, the real solution is to make sure they
can store all the traffic they receive during a reassembly period.
Legitimate traffic mostly reassembles quickly so keeping 1 second of
traffic at 10 Gbps is only 1.25 GB of RAM after all...
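
The 1.25 GB figure is just link rate times buffering window, converted from bits to bytes. A minimal sketch of that arithmetic (the function name is only for illustration, and decimal units are assumed):

```python
def reassembly_buffer_bytes(link_bps: float, window_s: float) -> float:
    """Bytes needed to hold everything a link can deliver in window_s seconds."""
    return link_bps * window_s / 8  # 8 bits per byte


# 10 Gbps for 1 second, expressed in GB (10**9 bytes)
print(reassembly_buffer_bytes(10e9, 1.0) / 1e9)
```
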

Willy

2021-04-17 07:52:27

by Willy Tarreau

Subject: Re: PROBLEM: DoS Attack on Fragment Cache

On Sat, Apr 17, 2021 at 12:42:39AM -0700, Keyu Man wrote:
> How about at least allow the existing queue to finish? Currently a tiny new
> fragment would potentially invalid all previous fragments by letting them
> timeout without allowing the fragments to come in to finish the assembly.

Because this is exactly the principle of how attacks are built: reserve
resources claiming that you'll send everything so that others can't make
use of the resources that are reserved to you. The best solution precisely
is *not* to wait for anyone to finish, hence *not* to reserve valuable
resources that are unusable by others.

Willy

2021-04-18 01:37:11

by Matt Corallo

Subject: Re: PROBLEM: DoS Attack on Fragment Cache

See also "[PATCH] Reduce IP_FRAG_TIME fragment-reassembly timeout to 1s, from 30s" (and the two resends of it). Given
the size of the default cache (4MB) and the time that it takes before we flush the cache (30 seconds), you only need
about 1Mbps of fragments to hit this issue. While DoS attacks are concerning, it's also incredibly practical to hit
this issue in normal non-adversarial conditions (and I do).

Matt

On 4/17/21 03:50, Willy Tarreau wrote:
> On Sat, Apr 17, 2021 at 12:42:39AM -0700, Keyu Man wrote:
>> How about at least allow the existing queue to finish? Currently a tiny new
>> fragment would potentially invalid all previous fragments by letting them
>> timeout without allowing the fragments to come in to finish the assembly.
>
> Because this is exactly the principle of how attacks are built: reserve
> resources claiming that you'll send everything so that others can't make
> use of the resources that are reserved to you. The best solution precisely
> is *not* to wait for anyone to finish, hence *not* to reserve valuable
> resources that are unusuable by others.
>
> Willy
>

2021-04-18 01:41:11

by Keyu Man

Subject: Re: PROBLEM: DoS Attack on Fragment Cache

Willy's words make sense to me, and I agree that existing fragments
should be evicted when a new one comes in and the cache is full.
Though the attacker can still leverage this to flush the victim's
cache, as mentioned previously, fragments are likely to be assembled
in a very short time, so it would be hard to launch the attack
(evicting the legit fragment before it's assembled requires a
large packet sending rate). And this seems better than the existing
solution (drop all incoming fragments when full).

Keyu

On Sat, Apr 17, 2021 at 6:30 PM Matt Corallo
<[email protected]> wrote:
>
> See-also "[PATCH] Reduce IP_FRAG_TIME fragment-reassembly timeout to 1s, from 30s" (and the two resends of it) - given
> the size of the default cache (4MB) and the time that it takes before we flush the cache (30 seconds) you only need
> about 1Mbps of fragments to hit this issue. While DoS attacks are concerning, its also incredibly practical (and I do)
> hit this issue in normal non-adversarial conditions.
>
> Matt
>
> On 4/17/21 03:50, Willy Tarreau wrote:
> > On Sat, Apr 17, 2021 at 12:42:39AM -0700, Keyu Man wrote:
> >> How about at least allow the existing queue to finish? Currently a tiny new
> >> fragment would potentially invalid all previous fragments by letting them
> >> timeout without allowing the fragments to come in to finish the assembly.
> >
> > Because this is exactly the principle of how attacks are built: reserve
> > resources claiming that you'll send everything so that others can't make
> > use of the resources that are reserved to you. The best solution precisely
> > is *not* to wait for anyone to finish, hence *not* to reserve valuable
> > resources that are unusuable by others.
> >
> > Willy
> >

2021-04-18 02:45:22

by Matt Corallo

Subject: Re: PROBLEM: DoS Attack on Fragment Cache

Sure, there are better ways to handle the reassembly cache overflowing, but that is pretty unrelated to the fact that
waiting 30 full seconds for a fragment to come in doesn't really make sense in today's networks (the 30 second delay
that is used today appears to even be higher than RFC 791 suggested in 1981!). You get a lot more bang for your buck if
you don't wait around so long (or we could restructure things to kick out the oldest fragments, but that is a lot more
work, and probably extra indexes that just aren't worth it).

Matt

On 4/17/21 21:38, Keyu Man wrote:
> Willy's words make sense to me and I agree that the existing fragments
> should be evicted when the new one comes in and the cache is full.
> Though the attacker can still leverage this to flush the victim's
> cache, as mentioned previously, since fragments are likely to be
> assembled in a very short time, it would be hard to launch the
> attack(evicting the legit fragment before it's assembled requires a
> large packet sending rate). And this seems better than the existing
> solution (drop all incoming fragments when full).
>
> Keyu
>
> On Sat, Apr 17, 2021 at 6:30 PM Matt Corallo
> <[email protected]> wrote:
>>
>> See-also "[PATCH] Reduce IP_FRAG_TIME fragment-reassembly timeout to 1s, from 30s" (and the two resends of it) - given
>> the size of the default cache (4MB) and the time that it takes before we flush the cache (30 seconds) you only need
>> about 1Mbps of fragments to hit this issue. While DoS attacks are concerning, its also incredibly practical (and I do)
>> hit this issue in normal non-adversarial conditions.
>>
>> Matt
>>
>> On 4/17/21 03:50, Willy Tarreau wrote:
>>> On Sat, Apr 17, 2021 at 12:42:39AM -0700, Keyu Man wrote:
>>>> How about at least allow the existing queue to finish? Currently a tiny new
>>>> fragment would potentially invalid all previous fragments by letting them
>>>> timeout without allowing the fragments to come in to finish the assembly.
>>>
>>> Because this is exactly the principle of how attacks are built: reserve
>>> resources claiming that you'll send everything so that others can't make
>>> use of the resources that are reserved to you. The best solution precisely
>>> is *not* to wait for anyone to finish, hence *not* to reserve valuable
>>> resources that are unusuable by others.
>>>
>>> Willy
>>>

2021-04-18 04:55:15

by Willy Tarreau

Subject: Re: PROBLEM: DoS Attack on Fragment Cache

On Sat, Apr 17, 2021 at 10:26:30PM -0400, Matt Corallo wrote:
> Sure, there are better ways to handle the reassembly cache overflowing, but
> that is pretty unrelated to the fact that waiting 30 full seconds for a
> fragment to come in doesn't really make sense in today's networks (the 30
> second delay that is used today appears to even be higher than RFC 791
> suggested in 1981!).

Not exactly actually, because you forget the TTL here. With most hosts
sending an initial TTL around 64, after crossing 10-15 hops it's still
around 50, so that would result in ~50 seconds by default, even according
to the 40-year-old RFC 791. The 15s there was the absolute minimum. While
I do agree that we shouldn't keep them that long nowadays, we can't go
too low without risking breaking some slow transmission stacks (SLIP/PPP
over modems for example). In addition, even cutting that in 3 will remain
trivially DoSable.
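
To make the TTL point concrete: the example reassembly procedure in RFC 791 starts the timer at a lower bound (15 seconds is the recommended setting) and raises it to the TTL of an arriving fragment when that is larger. A rough sketch of that rule follows; the function name is hypothetical, and the sketch ignores the fact that the real timer also counts down between fragments.

```python
RFC791_TIMER_LOWER_BOUND = 15  # seconds, the recommended initial setting


def reassembly_timer(fragment_ttls, lower_bound=RFC791_TIMER_LOWER_BOUND):
    """Timer value after seeing fragments with the given TTLs.

    The timer is never lowered by a fragment, only raised (max rule).
    """
    timer = lower_bound
    for ttl in fragment_ttls:
        timer = max(timer, ttl)
    return timer


# Initial TTL 64 minus roughly 14 hops leaves a TTL of 50 on arrival.
print(reassembly_timer([64 - 14]))
```
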

> You get a lot more bang for your buck if you don't wait
> around so long (or we could restructure things to kick out the oldest
> fragments, but that is a lot more work, and probably extra indexes that just
> aren't worth it).

Kicking out oldest ones is a bad approach in a system made only of
independent elements, because it tends to result in a lot of damage once
all of them behave similarly. I.e. if you need to kick out an old entry
in valid traffic, it's because you do need to wait that long, and if all
datagrams need to wait that long, then new datagrams systematically
prevent the oldest one from being reassembled, and none gets reassembled.
With a random approach at least your success ratio converges towards 1/e
(i.e. 36%) which is better.
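
One way to see where 1/e comes from, under a simplifying assumption that is mine and not stated above: with N slots and one uniformly random eviction per new arrival, an entry that has to sit through N arrivals survives each of them with probability 1 - 1/N. A small sketch:

```python
import math


def random_eviction_survival(slots: int) -> float:
    """Probability that one entry survives `slots` uniformly random evictions.

    ASSUMPTION: the entry must outlast as many evictions as there are slots,
    and each eviction picks any of the `slots` entries with equal probability.
    """
    return (1.0 - 1.0 / slots) ** slots


for n in (10, 100, 10_000):
    print(n, random_eviction_survival(n))
print("1/e =", math.exp(-1))
```

The values approach 1/e (a little under 0.37) from below as the number of slots grows. Under the same load, evicting the oldest entry first would always remove the datagram closest to completion, which is the failure mode described above.
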

Willy

2021-04-18 14:37:31

by Matt Corallo

Subject: Re: PROBLEM: DoS Attack on Fragment Cache

Should the default, though, be so low? If someone is still using an old modem they can crank up the sysctl; it does seem
like such things are pretty rare these days :). It's rather trivial to, without any kind of attack, hit 1Mbps of lost
fragments in today's networks, at which point all fragments are dropped. After all, I submitted the patch to "scratch my
own itch" :).

Matt

On 4/18/21 00:39, Willy Tarreau wrote:
> I do agree that we shouldn't keep them that long nowadays, we can't go
> too low without risking to break some slow transmission stacks (SLIP/PPP
> over modems for example).

2021-04-19 09:44:36

by Eric Dumazet

Subject: Re: PROBLEM: DoS Attack on Fragment Cache

On Sun, Apr 18, 2021 at 4:31 PM Matt Corallo
<[email protected]> wrote:
>
> Should the default, though, be so low? If someone is still using a old modem they can crank up the sysctl, it does seem
> like such things are pretty rare these days :). Its rather trivial to, without any kind of attack, hit 1Mbps of lost
> fragments in today's networks, at which point all fragments are dropped. After all, I submitted the patch to "scratch my
> own itch" :).

Again, even if you increase the values by 1000x, it is trivial for an
attacker to use all the memory you allowed.

And allowing a significant portion of memory to be eaten like that
might cause OOM on hosts where jobs are consuming all physical memory.

It is a sysctl, I changed things so that one could really reserve/use
16GB of memory if she/he is desperate about frags.
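
For anyone who wants to check what a host is currently configured with, here is a small sketch that reads the fragment sysctls through /proc. The helper name is hypothetical; the paths are the usual locations of the IPv4/IPv6 fragment sysctls on Linux, and any entry that cannot be read (for example on a non-Linux machine) is reported as None.

```python
from pathlib import Path

# Fragment reassembly sysctls: memory limit in bytes and timeout in seconds.
FRAG_SYSCTLS = {
    "net.ipv4.ipfrag_high_thresh": "/proc/sys/net/ipv4/ipfrag_high_thresh",
    "net.ipv4.ipfrag_time": "/proc/sys/net/ipv4/ipfrag_time",
    "net.ipv6.ip6frag_high_thresh": "/proc/sys/net/ipv6/ip6frag_high_thresh",
    "net.ipv6.ip6frag_time": "/proc/sys/net/ipv6/ip6frag_time",
}


def read_frag_sysctls(paths=FRAG_SYSCTLS):
    """Return {sysctl name: int value, or None if it could not be read}."""
    values = {}
    for name, path in paths.items():
        try:
            values[name] = int(Path(path).read_text().strip())
        except (OSError, ValueError):
            values[name] = None
    return values


for name, value in read_frag_sysctls().items():
    print(f"{name} = {value}")
```
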

>
> Matt
>
> On 4/18/21 00:39, Willy Tarreau wrote:
> > I do agree that we shouldn't keep them that long nowadays, we can't go
> > too low without risking to break some slow transmission stacks (SLIP/PPP
> > over modems for example).

2021-04-19 17:25:17

by Matt Corallo

Subject: Re: PROBLEM: DoS Attack on Fragment Cache

Note that there are two completely separate sysctls here - the timeout on fragments, and the amount of memory available
for fragment reassembly. You have to multiply them together to reach the "Mbps of lost or deliberately-lost fragments
before we start dropping all future fragments". See the calculation in the description of the patch I mentioned above
for exact details, but turning the time down to 1s already gives you 32Mbps, and you can tune the memory usage
separately (e.g. 128MB, really 256MB between v4 and v6, would give you 1Gbps of "lost" fragments).
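
A quick sketch of that multiplication, for reference. It assumes binary units (MiB and Mibit) and that every byte of fragment counts once against the memory limit; the real per-fragment accounting has extra overhead, so these are upper bounds, not measurements. The function name is only for illustration.

```python
MIB = 1024 * 1024


def fill_rate_mibit_per_s(cache_bytes: int, timeout_s: float) -> float:
    """Sustained rate of never-completed fragments that keeps the cache full.

    A cache of cache_bytes that only empties every timeout_s seconds stays
    full once lost fragments arrive at cache_bytes * 8 / timeout_s bits/s.
    """
    return cache_bytes * 8 / timeout_s / MIB


for mem_mib, timeout in ((4, 30), (4, 1), (128, 1)):
    rate = fill_rate_mibit_per_s(mem_mib * MIB, timeout)
    print(f"{mem_mib} MiB, {timeout}s timeout: {rate:.2f} Mibit/s")
```

With these assumptions the three cases come out at roughly 1, 32 and 1024 Mibit/s, matching the 1Mbps, 32Mbps and 1Gbps figures quoted in this thread.
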

It's true, an attacker can use a lot of memory in that case, but 128MiB isn't actually something that rises to the level
of "trivial for an attacker to use all the memory you allowed" or "cause OOM".

I only chimed in on this thread to note that this isn't just a theoretical attack concern, however - this is a
real-world non-attack-scenario issue that's pretty trivial to hit. Just losing 1Mbps of traffic on a modern residential
internet connection is pretty doable; make that flow mostly frags and suddenly your VPN drops out for 30 seconds at a
time just because.

I agree with others here that actually solving the DoS issue isn't trivial, but making it less absurdly trivial to have
30 second dropouts of your VPN connection would also be a nice change.

Matt

On 4/19/21 05:43, Eric Dumazet wrote:
> On Sun, Apr 18, 2021 at 4:31 PM Matt Corallo
> <[email protected]> wrote:
>>
>> Should the default, though, be so low? If someone is still using a old modem they can crank up the sysctl, it does seem
>> like such things are pretty rare these days :). Its rather trivial to, without any kind of attack, hit 1Mbps of lost
>> fragments in today's networks, at which point all fragments are dropped. After all, I submitted the patch to "scratch my
>> own itch" :).
>
> Again, even if you increase the values by 1000x, it is trivial for an
> attacker to use all the memory you allowed.
>
> And allowing a significant portion of memory to be eaten like that
> might cause OOM on hosts where jobs are consuming all physical memory.
>
> It is a sysctl, I changed things so that one could really reserve/use
> 16GB of memory if she/he is desperate about frags.
>
>>
>> Matt
>>
>> On 4/18/21 00:39, Willy Tarreau wrote:
>>> I do agree that we shouldn't keep them that long nowadays, we can't go
>>> too low without risking to break some slow transmission stacks (SLIP/PPP
>>> over modems for example).