2000-10-28 23:11:30

by Marcelo Tosatti

[permalink] [raw]
Subject: tcp_do_sendmsg() allocation still broken ?


David,

tcp_do_sendmsg() still allocates using GFP_KERNEL when it can't, it seems:

int tcp_do_sendmsg(struct sock *sk, struct msghdr *msg)
{
...
TCP_SKB_CB(skb)->seq = tp->write_seq;
TCP_SKB_CB(skb)->end_seq = TCP_SKB_CB(skb)->seq +
copy;

/* This advances tp->write_seq for us. */
tcp_send_skb(sk, skb, queue_it);
^^^^^^^^^^^^

}
}
sk->err = 0;
...
}


Now what we can find at tcp_send_skb? :)


void tcp_send_skb(struct sock *sk, struct sk_buff *skb, int force_queue,
unsigned cur_mss) {

...

if (!force_queue && tp->send_head == NULL && tcp_snd_test(tp, skb,
cur_mss, 1)) {
/* Send it out now. */
TCP_SKB_CB(skb)->when = tcp_time_stamp;
if (tcp_transmit_skb(sk, skb_clone(skb, GFP_KERNEL)) == 0) {
^^^^^^^^^^^^
tp->snd_nxt = TCP_SKB_CB(skb)->end_seq;

...





2000-10-29 01:03:38

by Andi Kleen

[permalink] [raw]
Subject: Re: tcp_do_sendmsg() allocation still broken ?

On Sat, Oct 28, 2000 at 07:12:44PM -0200, Marcelo Tosatti wrote:
>
> David,
>
> tcp_do_sendmsg() still allocates using GFP_KERNEL when it can't, it seems:

tcp_do_sendmsg() should only be called from process context, because it can
sleep for other reasons anyways.

If someone calls it from interrupt context it needs to be fixed.


-Andi

2000-10-29 01:12:20

by Marcelo Tosatti

[permalink] [raw]
Subject: Re: tcp_do_sendmsg() allocation still broken ?



On Sun, 29 Oct 2000, Andi Kleen wrote:

> On Sat, Oct 28, 2000 at 07:12:44PM -0200, Marcelo Tosatti wrote:
> >
> > David,
> >
> > tcp_do_sendmsg() still allocates using GFP_KERNEL when it can't, it seems:
>
> tcp_do_sendmsg() should only be called from process context, because it can
> sleep for other reasons anyways.
>
> If someone calls it from interrupt context it needs to be fixed.

Andi,

Think about nbd.

It allocates memory inside its request function, and since we do write
throttling in try_to_free_pages() we can end up with this deadlock in low
memory conditions:

nbd_do_request -> nbd_send_req -> nbd_xmit -> sock_sendmsg -> ... ->
tcp_do_sendmsg -> tcp_send_skb -> skb_copy -> alloc_skb ->
kmalloc(GFP_KERNEL) -> kmem_cache_grow -> get_free_pages ->
try_to_free_pages -> shrink_mmap -> try_to_free_buffers ->
sync_page_buffers -> ll_rw_block -> make_request -> add_request ->
nbd_do_request -> nbd_send_req -> ...

until there are no more free requests on the request queue and writer
processes stuck on __get_request_wait.



2000-10-29 01:17:02

by Andi Kleen

[permalink] [raw]
Subject: Re: tcp_do_sendmsg() allocation still broken ?

On Sat, Oct 28, 2000 at 09:13:27PM -0200, Marcelo Tosatti wrote:
>
>
> On Sun, 29 Oct 2000, Andi Kleen wrote:
>
> > On Sat, Oct 28, 2000 at 07:12:44PM -0200, Marcelo Tosatti wrote:
> > >
> > > David,
> > >
> > > tcp_do_sendmsg() still allocates using GFP_KERNEL when it can't, it seems:
> >
> > tcp_do_sendmsg() should only be called from process context, because it can
> > sleep for other reasons anyways.
> >
> > If someone calls it from interrupt context it needs to be fixed.
>
> Andi,
>
> Think about nbd.

Making tcp_do_sendmsg use GFP_ATOMIC would make it too unreliable for other
situations. Penalizing the whole system just for nbd is not a good idea.

>
> It allocates memory inside its request function, and since we do write
> throttling in try_to_free_pages() we can end up with this deadlock in low
> memory conditions:
>
> nbd_do_request -> nbd_send_req -> nbd_xmit -> sock_sendmsg -> ... ->
> tcp_do_sendmsg -> tcp_send_skb -> skb_copy -> alloc_skb ->
> kmalloc(GFP_KERNEL) -> kmem_cache_grow -> get_free_pages ->
> try_to_free_pages -> shrink_mmap -> try_to_free_buffers ->
> sync_page_buffers -> ll_rw_block -> make_request -> add_request ->
> nbd_do_request -> nbd_send_req -> ...
>
> until there are no more free requests on the request queue and writer
> processes stuck on __get_request_wait.

Looks like a job for PF_MEMALLOC and making GFP_KERNEL fail earlier.

-Andi

2000-10-29 01:23:53

by Marcelo Tosatti

[permalink] [raw]
Subject: Re: tcp_do_sendmsg() allocation still broken ?


On Sun, 29 Oct 2000, Andi Kleen wrote:

> On Sat, Oct 28, 2000 at 09:13:27PM -0200, Marcelo Tosatti wrote:
> >
> >
> > On Sun, 29 Oct 2000, Andi Kleen wrote:
> >
> > > On Sat, Oct 28, 2000 at 07:12:44PM -0200, Marcelo Tosatti wrote:
> > > >
> > > > David,
> > > >
> > > > tcp_do_sendmsg() still allocates using GFP_KERNEL when it can't, it seems:
> > >
> > > tcp_do_sendmsg() should only be called from process context, because it can
> > > sleep for other reasons anyways.
> > >
> > > If someone calls it from interrupt context it needs to be fixed.
> >
> > Andi,
> >
> > Think about nbd.
>
> Making tcp_do_sendmsg use GFP_ATOMIC would make it too unreliable for other
> situations. Penalizing the whole system just for nbd is not a good idea.

I agree that using GFP_ATOMIC is bad there.

I've fixed 2.2 with GFP_BUFFER.

>
> >
> > It allocates memory inside its request function, and since we do write
> > throttling in try_to_free_pages() we can end up with this deadlock in low
> > memory conditions:
> >
> > nbd_do_request -> nbd_send_req -> nbd_xmit -> sock_sendmsg -> ... ->
> > tcp_do_sendmsg -> tcp_send_skb -> skb_copy -> alloc_skb ->
> > kmalloc(GFP_KERNEL) -> kmem_cache_grow -> get_free_pages ->
> > try_to_free_pages -> shrink_mmap -> try_to_free_buffers ->
> > sync_page_buffers -> ll_rw_block -> make_request -> add_request ->
> > nbd_do_request -> nbd_send_req -> ...
> >
> > until there are no more free requests on the request queue and writer
> > processes stuck on __get_request_wait.
>
> Looks like a job for PF_MEMALLOC and making GFP_KERNEL fail earlier.

IMHO GFP_BUFFER is ok.




2000-10-29 01:47:07

by Rik van Riel

[permalink] [raw]
Subject: Re: tcp_do_sendmsg() allocation still broken ?

On Sun, 29 Oct 2000, Andi Kleen wrote:
> On Sat, Oct 28, 2000 at 07:12:44PM -0200, Marcelo Tosatti wrote:
> >
> > tcp_do_sendmsg() still allocates using GFP_KERNEL when it can't, it seems:
>
> tcp_do_sendmsg() should only be called from process context,
> because it can sleep for other reasons anyways.

Andi, if sk->allocation is there to be ignored, why
don't you remove that element from the structure?

> If someone calls it from interrupt context it needs to be fixed.

Think about nbd.
Think about swap-over-network.
Think about the reasons why sk->allocation exists.

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
-- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/ http://www.surriel.com/

2000-10-29 01:48:37

by David Miller

[permalink] [raw]
Subject: Re: tcp_do_sendmsg() allocation still broken ?

Date: Sun, 29 Oct 2000 02:03:11 +0100
From: Andi Kleen <[email protected]>

tcp_do_sendmsg() should only be called from process context,
because it can sleep for other reasons anyways.

If someone calls it from interrupt context it needs to be fixed.

He is calling it from process context, he needs GFP_ATOMIC for
other reasons (VM deadlocks when using NBD).

Later,
David S. Miller
[email protected]

2000-10-29 01:49:38

by David Miller

[permalink] [raw]
Subject: Re: tcp_do_sendmsg() allocation still broken ?

Date: Sun, 29 Oct 2000 02:16:42 +0100
From: Andi Kleen <[email protected]>

On Sat, Oct 28, 2000 at 09:13:27PM -0200, Marcelo Tosatti wrote:
> Think about nbd.

Making tcp_do_sendmsg use GFP_ATOMIC would make it too unreliable for other
situations. Penalizing the whole system just for nbd is not a good idea.

Andi, sk->allocation not GFP_ATOMIC.

This is such an old topic Andi, I am surprised you forget all these details
so quickly.

Later,
David S. Miller
[email protected]

2000-10-29 07:16:18

by Andi Kleen

[permalink] [raw]
Subject: Re: tcp_do_sendmsg() allocation still broken ?

On Sat, Oct 28, 2000 at 11:46:37PM -0200, Rik van Riel wrote:
> Think about swap-over-network.

swap-over-network is unlikely to ever work properly without a preallocated
skb pool, no matter if you use sk->allocation or not.

-Andi

2000-10-29 11:36:10

by Alan

[permalink] [raw]
Subject: Re: tcp_do_sendmsg() allocation still broken ?

> > Think about nbd.
>
> Making tcp_do_sendmsg use GFP_ATOMIC would make it too unreliable for other
> situations. Penalizing the whole system just for nbd is not a good idea.

Nobody is saying that. tcp_do_sendmsg should honour the existing
sk->allocation flag