2009-12-01 16:19:00

by Chris Friesen

Subject: seeing strange values for tcp sk_rmem_alloc


I'm hoping someone might be able to explain some odd behaviour that I'm
seeing.

Some of our developers wanted to be able to see how much of their rx
socket buffer space was in use, so I added the following to sock_ioctl()


case SIOCGSKRMEMALLOC:
{
	int tmp;

	err = -EINVAL;
	if (!sock->sk)
		break;
	tmp = atomic_read(&sock->sk->sk_rmem_alloc);
	/* copy_to_user() returns the number of bytes left uncopied,
	 * not an errno, so map failure to -EFAULT */
	err = copy_to_user(argp, &tmp, sizeof(tmp)) ? -EFAULT : 0;
	break;
}

To validate it, I wrote a testcase that opened a tcp socket, then looped
sending 2k of data at a time to it and calling the above ioctl to check
the sk_rmem_alloc value (without ever reading from the socket).

The results were odd--I've copied them below. Can anyone explain how I
can send 20K of data while sk_rmem_alloc still only shows 4.8K used, then
it suddenly jumps by a lot on the next packet to something that more
closely reflects reality, and then repeats that pattern? Is there some
additional buffering happening somewhere in the TCP stack?

Thanks,

Chris

used: 2424
used: 4848
used: 4848
used: 4848
used: 4848
used: 4848
used: 4848
used: 4848
used: 4848
used: 4848
used: 23696
used: 23696
used: 23696
used: 23696
used: 23696
used: 23696
used: 23696
used: 23696
used: 23696
used: 42544
used: 42544
used: 42544
used: 42544
used: 42544
used: 42544
used: 42544
used: 42544
used: 42544
used: 61392
used: 61392
used: 61392
used: 61392
used: 61392
used: 61392
used: 61392
used: 61392
used: 61392
used: 80240
used: 80240
used: 80240
used: 80240
used: 80240
used: 80240
used: 80240
used: 80240
used: 80240
used: 80240
used: 80240
used: 80240
used: 80240
used: 80240
used: 80240
used: 80240
used: 80240
used: 80240
used: 80240
used: 80240
used: 80240


2009-12-01 16:22:00

by Chris Friesen

Subject: Re: seeing strange values for tcp sk_rmem_alloc


I forgot to mention that this was on 2.6.27. I haven't tried it on
current git.

Chris

2009-12-01 16:58:38

by Eric Dumazet

Subject: Re: seeing strange values for tcp sk_rmem_alloc

Chris Friesen a écrit :
> I'm hoping someone might be able to explain some odd behaviour that I'm
> seeing.
>
> Some of our developers wanted to be able to see how much of their rx
> socket buffer space was in use, so I added the following to sock_ioctl()
>
>
> case SIOCGSKRMEMALLOC:
> {
> int tmp;
> err = -EINVAL;
> if(!sock->sk)
> break;
> tmp = atomic_read(&sock->sk->sk_rmem_alloc);
> err = copy_to_user(argp, &tmp, sizeof(tmp));
> break;
> }
>
> To validate it, I wrote a testcase that opened a tcp socket, then looped
> sending 2k of data at a time to it and calling the above ioctl to check
> the sk_rmem_alloc value (without ever reading from the socket).
>
> The results were odd--I've copied them below. Can anyone explain how I
> can send 20K of data but sk_rmem_alloc still only shows 4.8K used, then
> it suddenly jumps by a lot on the next packet to something that more
> reflects reality, then repeats that pattern again? Is there some
> additional buffering happening somewhere in the TCP stack?
>

I'm wondering why you think sk_rmem_alloc is about the TX side.

It's used in the RX path; "rmem" means receive memory.

You can send 1 GB of data and sk_rmem_alloc doesn't change, if your
TCP stream is unidirectional.

sk_rmem_alloc grows when skbs are queued into the receive queue;
it shrinks when the application reads from that receive queue.



2009-12-01 17:31:17

by Chris Friesen

Subject: Re: seeing strange values for tcp sk_rmem_alloc

On 12/01/2009 10:58 AM, Eric Dumazet wrote:

> I'm wondering why you think sk_rmem_alloc is about the TX side.
> It's used in the RX path; "rmem" means receive memory.

Yep, I realize this.

> You can send 1 GB of data and sk_rmem_alloc doesn't change, if your
> TCP stream is unidirectional.
>
> sk_rmem_alloc grows when skbs are queued into the receive queue;
> it shrinks when the application reads from that receive queue.

I realize this. I sent the data from a socket to itself. It could just
as easily be done with two tcp sockets. The important thing is that I
control both the tx and rx sides, so I know how much data should be
present in the rx queue at any point in time.

The part that surprised me was that I could send multiple chunks of data
without sk_rmem_alloc changing on the socket to which the data was being
sent. Then it would jump up by a large amount (up to 20K) all at once.

I'm starting to suspect that the discrepancy might have something to do
with the skb_copy_datagram_iovec() call in tcp_data_queue(), and with
skb_set_owner_r() only being called if "eaten" is <= 0. This could be
totally off-base, though.

Chris

2009-12-01 17:52:12

by Eric Dumazet

Subject: Re: seeing strange values for tcp sk_rmem_alloc

Chris Friesen a écrit :

> I realize this. I sent the data from a socket to itself. It could just
> as easily be done with two tcp sockets. The important thing is that I
> control both the tx and rx sides, so I know how much data should be
> present in the rx queue at any point in time.
>
> The part that surprised me was that I could send multiple chunks of data
> without sk_rmem_alloc changing on the socket to which the data was being
> sent. Then it would jump up by a large amount (up to 20K) all at once.
>
> I'm starting to suspect that the discrepancy might have something to do
> with the skb_copy_datagram_iovec() call in tcp_data_queue(), and how
> skb_set_owner_r() is only called if "eaten" is <= 0. This could be
> totally off-base though.
>

If you don't read() your socket, then skb_copy_datagram_iovec() is not
called.

But be careful of the sender's TCP stack: transmission might be delayed
a bit, because it waits for the receiver to open its window (slow start).

You probably need something like

while (1) {
	send(fd1, buffer, 2Kbytes);
	sleep(2); // let tcp stack flush its write buffers
	display_sk_rmem_alloc(fd2);
}

2009-12-03 16:58:25

by Chris Friesen

Subject: Re: seeing strange values for tcp sk_rmem_alloc

On 12/01/2009 11:52 AM, Eric Dumazet wrote:

> But be careful of the sender's TCP stack: transmission might be delayed
> a bit, because it waits for the receiver to open its window (slow start)
>
> You probably need something like
>
> while (1) {
>     send(fd1, buffer, 2Kbytes);
>     sleep(2); // let tcp stack flush its write buffers
>     display_sk_rmem_alloc(fd2);
> }

Ah, that makes a difference. But the results (see below) still look
odd. For this test, /proc/sys/net/core/rmem_default is 118784. For
some reason sk_rmem_alloc gets bumped by roughly 16KB when I only send
2KB of data, and it drops back down again every 6 packets.

Chris



used: 16848
used: 33696
used: 50544
used: 67392
used: 84240
used: 101088
used: 30736
used: 47584
used: 64432
used: 81280
used: 98128
used: 114976
used: 44624
used: 61472
used: 78320
used: 95168
used: 112016
used: 128864
used: 58512
used: 75360
used: 92208
used: 109056
used: 125904
used: 142752
used: 72400
used: 89248
used: 106096
used: 122944
used: 139792
used: 156640
used: 86288
used: 103136
used: 119984
used: 136832
used: 153680
used: 170528
used: 100176
used: 117024
used: 133872
used: 150720
used: 167568
used: 184416
used: 114064
used: 130912
used: 147760
used: 164608
used: 181456
used: 198304
used: 127952
used: 144800
used: 161648
used: 178496
used: 195344
used: 212192
used: 141840
used: 158688
used: 175536
used: 192384
used: 209232
used: 226080
used: 155728
used: 172576
used: 189424
used: 206272
used: 223120
used: 239968
used: 169616
used: 186464
used: 203312
used: 220160
used: 237008
used: 253856
used: 183504
used: 200352
used: 217200
used: 234048
used: 250896
used: 267744
used: 197392
used: 214240
used: 231088
used: 247936
used: 264784
used: 281632
used: 211280
used: 228128
used: 244976

2009-12-03 17:04:25

by Eric Dumazet

Subject: Re: seeing strange values for tcp sk_rmem_alloc

Chris Friesen a écrit :
> On 12/01/2009 11:52 AM, Eric Dumazet wrote:
>
>> But be careful of the sender's TCP stack: transmission might be delayed
>> a bit, because it waits for the receiver to open its window (slow start)
>>
>> You probably need something like
>>
>> while (1) {
>>     send(fd1, buffer, 2Kbytes);
>>     sleep(2); // let tcp stack flush its write buffers
>>     display_sk_rmem_alloc(fd2);
>> }
>
> Ah, that makes a difference. But the results (see below) still look
> odd. For this test, /proc/sys/net/core/rmem_default is 118784. For
> some reason sk_rmem_alloc gets bumped by 16KB when I only send 2KB of
> data, and it drops back down again every 6 packets.
>
> Chris

Might it be because you use the loopback device? ;)

ifconfig lo | grep MTU
UP LOOPBACK RUNNING MTU:16436 Metric:1

After a while (when hitting the rcvbuf limit), the tcp stack performs skb collapsing to reduce RAM usage.

2009-12-03 21:43:16

by Chris Friesen

Subject: Re: seeing strange values for tcp sk_rmem_alloc

On 12/03/2009 11:04 AM, Eric Dumazet wrote:
> Chris Friesen a écrit :

>> Ah, that makes a difference. But the results (see below) still
>> look odd. For this test, /proc/sys/net/core/rmem_default is
>> 118784. For some reason sk_rmem_alloc gets bumped by 16KB when I
>> only send 2KB of data, and it drops back down again every 6
>> packets.

>
> Might be because you use loopback device ? ;)
>
> ifconfig lo | grep MTU
> UP LOOPBACK RUNNING MTU:16436 Metric:1
>
> After a while (when hitting rcvbuf limit), tcp stack performs skb
> collapses, to reduce ram usage.

Looks like this is indeed the case; changing the loopback MTU to 8K
makes it increase in 8K increments. This is quite unexpected, since for
UDP it only increases by the actual amount of data being sent rather
than by the size of the full MTU.

As it stands, it looks like sk_rmem_alloc isn't very useful. Can you
point me to something that more closely reflects the actual space used
by the tcp socket? I'd like something that increases monotonically with
received data, such that once it exceeds the configured size the tcp
stack will start dropping packets. Does such a thing exist, or is the
tcp stack just too complicated to easily obtain this sort of information?

Thanks for your help,

Chris