Hi Mel Gorman,
I may have found a problem in pfmemalloc skb handling in
net/core/dev.c. I see there are "if" conditions checking for
"sk_memalloc_socks() && skb_pfmemalloc(skb)", and when the condition
is true, the skb is handled specially as a pfmemalloc skb, otherwise
it is handled as a normal skb.
However, if "sk_memalloc_socks()" is false and "skb_pfmemalloc(skb)"
is true, the skb is still handled as a normal skb. Is this correct?
This might happen if "sk_memalloc_socks()" was originally true and has
just turned into false before the check. Can this happen?
I found the original commit that added the "if" conditions:
commit b4b9e3558508 ("netvm: set PF_MEMALLOC as appropriate during SKB
processing")
The commit message clearly indicates pfmemalloc skbs shouldn't be
delivered to taps (or protocols that don't support pfmemalloc skbs).
However, if they are incorrectly handled as normal skbs, they could be
delivered to those places.
I'm not sure if my understanding is correct. Could you please help? Thank you!
On Thu, Apr 08, 2021 at 11:52:01AM -0700, Xie He wrote:
> Hi Mel Gorman,
>
> I may have found a problem in pfmemalloc skb handling in
> net/core/dev.c. I see there are "if" conditions checking for
> "sk_memalloc_socks() && skb_pfmemalloc(skb)", and when the condition
> is true, the skb is handled specially as a pfmemalloc skb, otherwise
> it is handled as a normal skb.
>
> However, if "sk_memalloc_socks()" is false and "skb_pfmemalloc(skb)"
> is true, the skb is still handled as a normal skb. Is this correct?
Under what circumstances do you expect sk_memalloc_socks() to be false
and skb_pfmemalloc() to be true that would cause a problem?
--
Mel Gorman
SUSE Labs
On Fri, Apr 9, 2021 at 12:30 AM Mel Gorman <[email protected]> wrote:
>
> Under what circumstances do you expect sk_memalloc_socks() to be false
> and skb_pfmemalloc() to be true that would cause a problem?
For example, if at the time the skb is allocated,
"sk_memalloc_socks()" was true, then the skb might be allocated as a
pfmemalloc skb. However, if after this skb is allocated and before
this skb reaches "__netif_receive_skb", "sk_memalloc_socks()" has
changed from "true" to "false", then "__netif_receive_skb" will see
"sk_memalloc_socks()" being false and "skb_pfmemalloc(skb)" being
true.
This is a problem because this would cause a pfmemalloc skb to be
delivered to "taps" and protocols that don't support pfmemalloc skbs.
On Fri, Apr 09, 2021 at 01:33:24AM -0700, Xie He wrote:
> On Fri, Apr 9, 2021 at 12:30 AM Mel Gorman <[email protected]> wrote:
> >
> > Under what circumstances do you expect sk_memalloc_socks() to be false
> > and skb_pfmemalloc() to be true that would cause a problem?
>
> For example, if at the time the skb is allocated,
> "sk_memalloc_socks()" was true, then the skb might be allocated as a
> pfmemalloc skb. However, if after this skb is allocated and before
> this skb reaches "__netif_receive_skb", "sk_memalloc_socks()" has
> changed from "true" to "false", then "__netif_receive_skb" will see
> "sk_memalloc_socks()" being false and "skb_pfmemalloc(skb)" being
> true.
>
> This is a problem because this would cause a pfmemalloc skb to be
> delivered to "taps" and protocols that don't support pfmemalloc skbs.
That would imply that the tap was communicating with a swap device to
allocate a pfmemalloc skb which shouldn't happen. Furthermore, it would
require the swap device to be deactivated while pfmemalloc skbs still
existed. Have you encountered this problem?
--
Mel Gorman
SUSE Labs
On Fri, Apr 9, 2021 at 1:44 AM Mel Gorman <[email protected]> wrote:
>
> That would imply that the tap was communicating with a swap device to
> allocate a pfmemalloc skb which shouldn't happen. Furthermore, it would
> require the swap device to be deactivated while pfmemalloc skbs still
> existed. Have you encountered this problem?
I'm not a user of swap devices or pfmemalloc skbs. I just want to make
sure the protocols that I'm developing (not IP or IPv6) won't get
pfmemalloc skbs when receiving, because those protocols cannot handle
them.
According to the code, it seems always possible to get a pfmemalloc
skb when a network driver calls "__netdev_alloc_skb". The skb will
then be queued in per-CPU backlog queues when the driver calls
"netif_rx". There seems to be nothing preventing "sk_memalloc_socks()"
from becoming "false" after the skb is allocated and before it is
handled by "__netif_receive_skb".
Do you mean that at the time "sk_memalloc_socks()" changes from "true"
to "false", there would be no in-flight skbs currently being received,
and all network communications have been paused?
On Fri, Apr 09, 2021 at 02:14:12AM -0700, Xie He wrote:
> On Fri, Apr 9, 2021 at 1:44 AM Mel Gorman <[email protected]> wrote:
> >
> > That would imply that the tap was communicating with a swap device to
> > allocate a pfmemalloc skb which shouldn't happen. Furthermore, it would
> > require the swap device to be deactivated while pfmemalloc skbs still
> > existed. Have you encountered this problem?
>
> I'm not a user of swap devices or pfmemalloc skbs. I just want to make
> sure the protocols that I'm developing (not IP or IPv6) won't get
> pfmemalloc skbs when receiving, because those protocols cannot handle
> them.
>
> According to the code, it seems always possible to get a pfmemalloc
> skb when a network driver calls "__netdev_alloc_skb". The skb will
> then be queued in per-CPU backlog queues when the driver calls
> "netif_rx". There seems to be nothing preventing "sk_memalloc_socks()"
> from becoming "false" after the skb is allocated and before it is
> handled by "__netif_receive_skb".
>
> Do you mean that at the time "sk_memalloc_socks()" changes from "true"
> to "false", there would be no in-flight skbs currently being received,
> and all network communications have been paused?
Not all network communication, but communication with swap devices
should have stopped once sk_memalloc_socks is false.
--
Mel Gorman
SUSE Labs
On Fri, Apr 9, 2021 at 3:04 AM Eric Dumazet <[email protected]> wrote:
>
> Note that pfmemalloc skbs are normally dropped in sk_filter_trim_cap()
>
> Simply make sure your protocol use it.
It seems "sk_filter_trim_cap" needs an "struct sock" argument. Some of
my protocols act like a middle layer to another protocol and don't
have any "struct sock".
Also, I think this is a problem in net/core/dev.c, there are a lot of
old protocols that are not aware of pfmemalloc skbs. I don't think
it's a good idea to fix them one by one.
On Fri, Apr 9, 2021 at 2:58 AM Mel Gorman <[email protected]> wrote:
>
> On Fri, Apr 09, 2021 at 02:14:12AM -0700, Xie He wrote:
> >
> > Do you mean that at the time "sk_memalloc_socks()" changes from "true"
> > to "false", there would be no in-flight skbs currently being received,
> > and all network communications have been paused?
>
> Not all network communication, but communication with swap devices
> should have stopped once sk_memalloc_socks is false.
But all incoming network traffic can be allocated as pfmemalloc skbs,
regardless whether or not it is related to swap devices. My protocols
don't need and cannot handle pfmemalloc skbs, therefore I want to make
sure my protocols never receive pfmemalloc skbs. The current code
doesn't seem to guarantee this.
On 4/9/21 11:14 AM, Xie He wrote:
> On Fri, Apr 9, 2021 at 1:44 AM Mel Gorman <[email protected]> wrote:
>>
>> That would imply that the tap was communicating with a swap device to
>> allocate a pfmemalloc skb which shouldn't happen. Furthermore, it would
>> require the swap device to be deactivated while pfmemalloc skbs still
>> existed. Have you encountered this problem?
>
> I'm not a user of swap devices or pfmemalloc skbs. I just want to make
> sure the protocols that I'm developing (not IP or IPv6) won't get
> pfmemalloc skbs when receiving, because those protocols cannot handle
> them.
>
> According to the code, it seems always possible to get a pfmemalloc
> skb when a network driver calls "__netdev_alloc_skb". The skb will
> then be queued in per-CPU backlog queues when the driver calls
> "netif_rx". There seems to be nothing preventing "sk_memalloc_socks()"
> from becoming "false" after the skb is allocated and before it is
> handled by "__netif_receive_skb".
>
> Do you mean that at the time "sk_memalloc_socks()" changes from "true"
> to "false", there would be no in-flight skbs currently being received,
> and all network communications have been paused?
>
Note that pfmemalloc skbs are normally dropped in sk_filter_trim_cap()
Simply make sure your protocol use it.
On 4/9/21 12:14 PM, Xie He wrote:
> On Fri, Apr 9, 2021 at 3:04 AM Eric Dumazet <[email protected]> wrote:
>>
>> Note that pfmemalloc skbs are normally dropped in sk_filter_trim_cap()
>>
>> Simply make sure your protocol use it.
>
> It seems "sk_filter_trim_cap" needs an "struct sock" argument. Some of
> my protocols act like a middle layer to another protocol and don't
> have any "struct sock".
Then simply copy the needed logic.
>
> Also, I think this is a problem in net/core/dev.c, there are a lot of
> old protocols that are not aware of pfmemalloc skbs. I don't think
> it's a good idea to fix them one by one.
>
I think you are mistaken.
There is no problem in net/core/dev.c really, it uses
skb_pfmemalloc_protocol()
pfmemalloc is best effort really.
If a layer store packets in many long living queues, it has to drop pfmemalloc packets,
unless these packets are used for swapping.
On Fri, Apr 9, 2021 at 4:50 AM Eric Dumazet <[email protected]> wrote:
>
> On 4/9/21 12:14 PM, Xie He wrote:
>
> Then simply copy the needed logic.
No, there's no such thing as "sockets" in some of the protocols. There
is simply no way to copy "the needed logic".
> > Also, I think this is a problem in net/core/dev.c, there are a lot of
> > old protocols that are not aware of pfmemalloc skbs. I don't think
> > it's a good idea to fix them one by one.
> >
>
> I think you are mistaken.
>
> There is no problem in net/core/dev.c really, it uses
> skb_pfmemalloc_protocol()
This is exactly what I'm talking about. "skb_pfmemalloc_protocol"
cannot guarantee pfmemalloc skbs are not delivered to unrelated
protocols, because "__netif_receive_skb" will sometimes treat
pfmemalloc skbs as normal skbs.
> pfmemalloc is best effort really.
>
> If a layer store packets in many long living queues, it has to drop pfmemalloc packets,
> unless these packets are used for swapping.
Yes, the code of "net/core/dev.c" has exactly this problem. It doesn't
drop pfmemalloc skbs in some situations, and instead deliver them to
unrelated protocols, which clearly have nothing to do with swapping.
I'm not sure if you understand what I'm saying. Please look at the
code of "__netif_receive_skb" and see what will happen when
"sk_memalloc_socks()" is false and "skb_pfmemalloc(skb)" is true.
On Fri, Apr 9, 2021 at 12:12 PM Xie He <[email protected]> wrote:
>
> This is exactly what I'm talking about. "skb_pfmemalloc_protocol"
> cannot guarantee pfmemalloc skbs are not delivered to unrelated
> protocols, because "__netif_receive_skb" will sometimes treat
> pfmemalloc skbs as normal skbs.
> I'm not sure if you understand what I'm saying. Please look at the
> code of "__netif_receive_skb" and see what will happen when
> "sk_memalloc_socks()" is false and "skb_pfmemalloc(skb)" is true.
Do you see the problem now? Just think what happens when
"skb_pfmemalloc(skb)" is true and "sk_memalloc_socks()" has just
changed to "false", and whether in this case "skb_pfmemalloc_protocol"
still takes any effect.