2003-09-08 00:10:09

by Sven Köhler

Subject: [blockdevices/NBD] huge read/write-operations are splitted by the kernel

Hi,

I discussed a problem with the NBD protocol with Pavel Machek. The problem
I see is that there is no maximum for the length field in the requests
that the NBD kernel module sends to the NBD server. This length
field is simply the length of the read/write operation that the kernel
delegates to the block device implementation.
I ran some tests like
dd if=/dev/nbd/0 of=/dev/null bs=10M
and our NBD server implementation printed the length field of each
request. There was a very regular pattern like
0x1fc00 (127KB)
0x00400 (1KB)
0x1fc00
0x00400
...
Can anybody explain that to me?
(Why the small 1KB requests? But that's not important.)

I also tested
dd if=/dev/nbd/0 of=/dev/null bs=1
which means that the device is read in chunks of 1 byte.
The result was the same: 127KB, 1KB, 127KB, 1KB...

I guess the caching layer sits in between and divides the huge 10MB
requests into smaller 127KB ones, and also merges the small 1-byte
requests by using read-ahead.
Perhaps you could tell me how I can turn off caching. Then I will test
again without the cache.

What I want to know is whether any part of the kernel
guarantees that a read/write request will not be bigger than a certain
value. If there is no such upper limit, NBD itself would need to
split requests up, which might become a complicated task. This needs to
be done because it can be very difficult for the NBD server to
handle huge lengths, and one huge request will block all other pending
small ones due to limitations of the NBD protocol.

Thx
Sven
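
The splitting described in the message above can be sketched in a few lines.
This is only an illustration, not NBD code: the function name split_request
and the parameter max_len are hypothetical, and the 128KB limit in the usage
note is an assumed value.

```python
def split_request(offset, length, max_len):
    """Yield (offset, length) pieces, each at most max_len bytes long.

    Hypothetical helper: models cutting one large read/write request
    into several smaller ones that cover the same byte range.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    end = offset + length
    while offset < end:
        piece = min(max_len, end - offset)
        yield offset, piece
        offset += piece


if __name__ == "__main__":
    # A 10MB request cut into pieces of at most 128KB (assumed limit).
    pieces = list(split_request(0, 10 * 1024 * 1024, 128 * 1024))
    print(len(pieces), hex(pieces[0][1]))
```

With an upper limit enforced before the request reaches the server, no
single request can hold up the pending small ones for long.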



2003-09-08 08:58:36

by Jens Axboe

Subject: Re: [blockdevices/NBD] huge read/write-operations are splitted by the kernel

On Mon, Sep 08 2003, Sven Köhler wrote:
> Hi,
>
> I discussed a problem with the NBD protocol with Pavel Machek. The problem
> I see is that there is no maximum for the length field in the requests
> that the NBD kernel module sends to the NBD server. This length
> field is simply the length of the read/write operation that the kernel
> delegates to the block device implementation.
> I ran some tests like
> dd if=/dev/nbd/0 of=/dev/null bs=10M
> and our NBD server implementation printed the length field of each
> request. There was a very regular pattern like
> 0x1fc00 (127KB)
> 0x00400 (1KB)
> 0x1fc00
> 0x00400
> ...
> Can anybody explain that to me?
> (Why the small 1KB requests? But that's not important.)
>
> I also tested
> dd if=/dev/nbd/0 of=/dev/null bs=1
> which means that the device is read in chunks of 1 byte.
> The result was the same: 127KB, 1KB, 127KB, 1KB...
>
> I guess the caching layer sits in between and divides the huge 10MB
> requests into smaller 127KB ones, and also merges the small 1-byte
> requests by using read-ahead.
> Perhaps you could tell me how I can turn off caching. Then I will test
> again without the cache.
>
> What I want to know is whether any part of the kernel
> guarantees that a read/write request will not be bigger than a certain
> value. If there is no such upper limit, NBD itself would need to
> split requests up, which might become a complicated task. This needs to
> be done because it can be very difficult for the NBD server to
> handle huge lengths, and one huge request will block all other pending
> small ones due to limitations of the NBD protocol.

You'll probably find that if you bump the max_sectors count of your
drive to 256 from 255 (that is the default if you haven't set it), then
you'll see 128KB chunks all the time.

See max_sectors[] array.
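
The arithmetic behind the 127KB/1KB pattern can be sketched as follows. The
thread only gives the 255-sector limit and the observed request sizes; the
512-byte sector size, the 1KB block size, and the 128KB read-ahead window
used here are assumptions chosen because they reproduce the observed numbers,
and request_sizes is a hypothetical helper, not a kernel function.

```python
SECTOR_SIZE = 512  # bytes per sector (assumed)


def request_sizes(window_bytes, max_sectors, block_size):
    """Split one window of I/O into requests limited by max_sectors.

    The per-request limit is rounded down to a whole number of blocks,
    which models a request that can only carry complete blocks.
    """
    limit = (max_sectors * SECTOR_SIZE // block_size) * block_size
    if limit <= 0:
        raise ValueError("max_sectors too small for this block size")
    sizes = []
    remaining = window_bytes
    while remaining > 0:
        piece = min(limit, remaining)
        sizes.append(piece)
        remaining -= piece
    return sizes


if __name__ == "__main__":
    window = 128 * 1024  # assumed read-ahead window
    for max_sectors in (255, 256):
        sizes = request_sizes(window, max_sectors, block_size=1024)
        print(max_sectors, [hex(s) for s in sizes])
```

Under these assumptions, 255 sectors round down to 127 whole 1KB blocks
(0x1fc00), leaving a 1KB remainder (0x400) in each 128KB window, while 256
sectors cover the window in a single 0x20000 request.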

--
Jens Axboe

2003-09-08 12:50:16

by Sven Köhler

Subject: Re: [blockdevices/NBD] huge read/write-operations are splitted by the kernel

> You'll probably find that if you bump the max_sectors count of your
> drive to 256 from 255 (that is the default if you haven't set it), then
> you'll see 128KB chunks all the time.

Why is 255 the default? It seems to be an inefficient value. Perhaps
NBD itself should set it to 256.

> See max_sectors[] array.

Well, I found the declaration, but I can't figure out how to set the values
in it.


2003-09-08 13:27:57

by Sven Köhler

Subject: Re: [blockdevices/NBD] huge read/write-operations are splitted by the kernel

> You'll probably find that if you bump the max_sectors count of your
> drive to 256 from 255 (that is the default if you haven't set it), then
> you'll see 128KB chunks all the time.
>
> See max_sectors[] array.

To make it clear:
the kernel will never read or write more sectors at once than specified
in the max_sectors array (where every device has its own value), right?


2003-09-08 14:34:00

by Jens Axboe

Subject: Re: [blockdevices/NBD] huge read/write-operations are splitted by the kernel

On Mon, Sep 08 2003, Sven Köhler wrote:
> >You'll probably find that if you bump the max_sectors count of your
> >drive to 256 from 255 (that is the default if you haven't set it), then
> >you'll see 128KB chunks all the time.
> >
> >See max_sectors[] array.
>
> To make it clear:
> the kernel will never read or write more sectors at once than specified
> in the max_sectors array (where every device has its own value), right?

Correct

--
Jens Axboe

2003-09-08 14:33:27

by Jens Axboe

Subject: Re: [blockdevices/NBD] huge read/write-operations are splitted by the kernel

On Mon, Sep 08 2003, Sven Köhler wrote:
> >You'll probably find that if you bump the max_sectors count of your
> >drive to 256 from 255 (that is the default if you haven't set it), then
> >you'll see 128KB chunks all the time.
>
> Why is 255 the default? It seems to be an inefficient value. Perhaps
> NBD itself should set it to 256.

To avoid 8-bit wraparounds, basically. I'm not sure that is still a valid
concern; you are free to compile your kernel with it set to 256. 2.6 uses
256 by default.
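
A minimal illustration of the wraparound concern, assuming the sector count
ends up in an 8-bit field somewhere along the path (the thread does not say
which field; to_8bit is a hypothetical helper):

```python
def to_8bit(count):
    """Keep only the low 8 bits, as an 8-bit field would."""
    return count & 0xFF


if __name__ == "__main__":
    for sectors in (255, 256):
        print(sectors, "->", to_8bit(sectors))
```

255 fits in 8 bits unchanged, while 256 wraps to 0, which is why 255 is the
safe choice when such a field may be involved.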

> >See max_sectors[] array.
>
> Well, I found the declaration, but I can't figure out how to set the values
> in it.

You can grep for other examples in the kernel, I would imagine?

--
Jens Axboe