2005-11-12 00:22:40

by Jeff V. Merkey

Subject: 2.6.9 reporting 1 Gigabyte/second throughput on bio's, timer skew possible?


I am running one of our 3U appliances with dual 3Ware 9500 Series
controllers. The unit is an online demo system, accessible to the
public from the internet via SSH, for Solera Networks Linux appliance
demos running the DSFS file system:

(ncurses)
demo.soleranetworks.com
Account: demo
password: demo

(text ncurses)
demo.soleranetworks.com
Account: demo-text
password: demo

I have allocated 393,216 bio buffers, which I maintain statically in a
chain, and am running the DSFS file system with 3 x gigabit links fully
saturated. Meta-data increases the write rate to 720 MB/second on dual
9500 controllers with 8 drives each (16 7200 RPM drives in total). I am
seeing some congestion and bursting on the bio chains as they are
submitted. I am not aware of anyone pushing 2.6 to these limits at
present with this type of architecture. I have split the kernel address
space 3GB/1GB (3 kernel, 1 user space) in order to create enough memory
to run this file system with 2GB of cache.
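
As a rough sanity check on the figures above, the arithmetic can be
sketched as follows. This is only an illustration: it assumes decimal
megabytes, treats each gigabit link at raw line rate (ignoring Ethernet
framing overhead), and reads the 720 MB/second figure as the total write
rate after meta-data is added. The variable names are made up for the
example.

```python
# Back-of-the-envelope figures from the message; all names are illustrative.
LINK_BITS_PER_S = 1_000_000_000      # one gigabit link at raw line rate
LINKS = 3
CONTROLLERS = 2
DRIVES = 16
WRITE_MB_S = 720                     # reported write rate including meta-data

# Capture rate in decimal MB/s if all three links are saturated.
capture_mb_s = LINKS * LINK_BITS_PER_S / 8 / 1e6

# How the reported write rate spreads over the hardware.
per_controller_mb_s = WRITE_MB_S / CONTROLLERS
per_drive_mb_s = WRITE_MB_S / DRIVES

# Ratio of bytes written to bytes captured (an interpretation, not a
# figure stated in the message).
write_to_capture_ratio = WRITE_MB_S / capture_mb_s

print(capture_mb_s, per_controller_mb_s, per_drive_mb_s, write_to_capture_ratio)
```

Under these assumptions the links deliver at most 375 MB/second, and
each of the 16 drives would need to sustain about 45 MB/second.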

DSFS dynamically generates HTML status files from within the file
system. When the system gets somewhat behind, I am seeing bursts > 1
GB/second, which exceeds the theoretical limit of the bus. I have a
timer function that runs every second and profiles the I/O throughput
created by DSFS with bio submissions and captured packets. I am asking
if there is clock skew at these data rates with use of the timer
functions. The system appears to be sustaining 1GB/Second throughput on
dual controllers. I have verified through data rates the system is
sustaining 800 megabytes/second with these 1GB/S bursts. I am curious
if there is potentially timer skew at these higher rates since I am
having a hard time accepting that I can push 1GB/S through a bus rated
at only 850 MB/S for DMA based transfers. The unit is accessible by
the general public, since it's a demo unit and we are unconcerned about
folks getting on the system. Folks are welcome to look, and if anyone
has any thoughts on this, please let me know. I am concerned that the
timer functions are not always ending on second boundaries, which would
explain the higher reported numbers. Windows 2003 does not approach
these performance numbers, BTW, so Linux appears to win on raw
performance for vertical file system apps.
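
To illustrate the timer concern, here is a minimal sketch (in Python
rather than kernel C, purely for readability) of how a sampling timer
that fires late inflates a rate that assumes a fixed one-second
interval, and how dividing by the measured elapsed time avoids that. The
function name and the 250 ms delay are hypothetical, not taken from
DSFS.

```python
def throughput_mb_per_s(bytes_delta, t_prev, t_now):
    """Rate over the interval that actually elapsed, not a nominal 1 s."""
    elapsed = t_now - t_prev
    if elapsed <= 0:
        raise ValueError("timestamps must be strictly increasing")
    return bytes_delta / elapsed / 1e6

# Hypothetical case: the device really moves 800 MB/s, but the timer
# callback runs 250 ms late, so it sees 1.25 s worth of I/O.
bytes_seen = 800_000_000 * 1.25

naive = bytes_seen / 1.0 / 1e6                              # assumes exactly 1 s
corrected = throughput_mb_per_s(bytes_seen, 10.0, 11.25)    # uses real elapsed time

print(naive, corrected)
```

In a real sampler the two timestamps would come from a monotonic clock
read inside the callback (in Python, `time.monotonic()`), so the result
does not depend on the timer firing exactly on a second boundary.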

The DSFS file system mounted at /var/ftp can be viewed at:
ftp://demo.soleranetworks.com/

Stats pages generated from dsfs:

capture stats:
ftp://demo.soleranetworks.com/stats/capture.html
storage stats:
ftp://demo.soleranetworks.com/stats/storage.html
dsfs cache stats:
ftp://demo.soleranetworks.com/stats/cache.html
network interface stats:
ftp://demo.soleranetworks.com/stats/network.html
virtual network interface maps:
ftp://demo.soleranetworks.com/stats/virtual.html

Jeff


2005-11-12 09:50:53

by Jens Axboe

Subject: Re: 2.6.9 reporting 1 Gigabyte/second throughput on bio's, timer skew possible?

On Fri, Nov 11 2005, Jeff V. Merkey wrote:
> I have allocated 393,216 bio buffers I statically maintain in a chain
> and am running the dsfs file system with 3 x gigabit links fully
> saturated. meta-data
> increases the write sizes to 720 MB/Second on dual 9500 controllers with
> 8 drives each (total of 16) 7200 RPM Drives. I am seeing some
> congestion and bursting on the bio chains as they are submitted. I am
> not aware of anyone pushing 2.6 to these limits at present with this
> type of architecture. I have split

16 disks on 2 controllers? I'm 100% sure there are lots of people
pushing 2.6 much further than that! I wouldn't even call that a big
setup.

> DSFS dynamically generates html status files from within the file
> system. When the system gets somewhat behind, I am seeing bursts > 1
> GB/Second which exceeds the theoretical limit of the bus. I have a
> timer function that runs every second and profiles the I/O throughput
> created by DSFS with bio submissions and captured packets. I am asking
> if there is clock skew at these data rates with use of the timer
> functions. The system appears to be sustaining 1GB/Second throughput on
> dual controllers. I have verified through data rates the system is
> sustaining 800 megabytes/second with these 1GB/S bursts. I am curious
> if there is potentially timer skew at these higher rates since I am
> having a hard time accepting that I can push 1GB/S through a bus rated
> at only 850 MB/S for DMA based transfers. The unit is accessible by

Note that the Linux io stats accounting in 2.6.9 accounts queued io, not
io completions. So it's quite possible to have burst rates > bus speeds
for async io. 2.6.15-rc1 changes this.
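
The effect can be sketched with a toy model (not kernel code): bytes are
counted either when they are queued or when the device completes them,
and the device can only drain a fixed amount per second. The function
name, the per-second submission figures, and the 800 MB/s device rate
are all made up for illustration.

```python
def per_second_rates(submitted_mb, device_mb_per_s):
    """Toy model of queue-time vs completion-time accounting.

    submitted_mb[i] is the amount queued during second i; the device
    completes at most device_mb_per_s each second. Returns two lists:
    what a counter sees at queue time and what it sees at completion.
    """
    backlog = 0
    queued, completed = [], []
    for amount in submitted_mb:
        backlog += amount
        done = min(backlog, device_mb_per_s)
        backlog -= done
        queued.append(amount)
        completed.append(done)
    # Keep ticking until the remaining backlog has drained.
    while backlog > 0:
        done = min(backlog, device_mb_per_s)
        backlog -= done
        queued.append(0)
        completed.append(done)
    return queued, completed

# Hypothetical bursty submitter against a device limited to 800 MB/s.
queued, completed = per_second_rates([600, 1000, 800, 600], 800)
print(queued)     # queue-time accounting shows a 1000 MB/s burst
print(completed)  # completion-time accounting never exceeds 800 MB/s
```

Both counters total the same number of bytes; only the per-second
picture differs, which is why a burst above the bus limit can show up
when io is accounted at queue time.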

--
Jens Axboe

2005-11-12 11:17:25

by Jeffrey V. Merkey

Subject: Re: 2.6.9 reporting 1 Gigabyte/second throughput on bio's, timer skew possible?

Jens Axboe wrote:

>On Fri, Nov 11 2005, Jeff V. Merkey wrote:
>
>
>>I have allocated 393,216 bio buffers I statically maintain in a chain
>>and am running the dsfs file system with 3 x gigabit links fully
>>saturated. meta-data
>>increases the write sizes to 720 MB/Second on dual 9500 controllers with
>>8 drives each (total of 16) 7200 RPM Drives. I am seeing some
>>congestion and bursting on the bio chains as they are submitted.
>>


>16 disks on 2 controllers? I'm 100% sure there are lots of people
>pushing 2.6 much further than that! I wouldn't even call that a big
>setup.
>
>
Probably not for this type of application.

>
>
>>DSFS dynamically generates html status files from within the file
>>system. When the system gets somewhat behind, I am seeing bursts > 1
>>GB/Second which exceeds the theoretical limit of the bus. I have a
>>timer function that runs every second and profiles the I/O throughput
>>created by DSFS with bio submissions and captured packets. I am asking
>>if there is clock skew at these data rates with use of the timer
>>functions. The system appears to be sustaining 1GB/Second throughput on
>>dual controllers. I have verified through data rates the system is
>>sustaining 800 megabytes/second with these 1GB/S bursts. I am curious
>>if there is potentially timer skew at these higher rates since I am
>>having a hard time accepting that I can push 1GB/S through a bus rated
>>at only 850 MB/S for DMA based transfers. The unit is accessible by
>>
>>
>
>Note that the linux io stats accounting in 2.6.9 accounts queued io, not
>io completions. So it's quite possible to have burst rates > bus speeds
>for async io. 2.6.15-rc1 changes this.
>
>
>
So you are willing to log into the unit and validate these numbers? I
would like someone other than me to validate that I am seeing these
rates.

Jeff


2005-11-13 19:35:23

by Jens Axboe

Subject: Re: 2.6.9 reporting 1 Gigabyte/second throughput on bio's, timer skew possible?

On Sat, Nov 12 2005, Jeff V. Merkey wrote:
> Jens Axboe wrote:
>
> >On Fri, Nov 11 2005, Jeff V. Merkey wrote:
> >
> >
> >>I have allocated 393,216 bio buffers I statically maintain in a chain
> >>and am running the dsfs file system with 3 x gigabit links fully
> >>saturated. meta-data
> >>increases the write sizes to 720 MB/Second on dual 9500 controllers with
> >>8 drives each (total of 16) 7200 RPM Drives. I am seeing some
> >>congestion and bursting on the bio chains as they are submitted.
> >>
>
>
> >16 disks on 2 controllers? I'm 100% sure there are lots of people
> >pushing 2.6 much further than that! I wouldn't even call that a big
> >setup.
> >
> >
> Probably not for this type of application.
>
> >
> >
> >>DSFS dynamically generates html status files from within the file
> >>system. When the system gets somewhat behind, I am seeing bursts > 1
> >>GB/Second which exceeds the theoretical limit of the bus. I have a
> >>timer function that runs every second and profiles the I/O throughput
> >>created by DSFS with bio submissions and captured packets. I am asking
> >>if there is clock skew at these data rates with use of the timer
> >>functions. The system appears to be sustaining 1GB/Second throughput on
> >>dual controllers. I have verified through data rates the system is
> >>sustaining 800 megabytes/second with these 1GB/S bursts. I am curious
> >>if there is potentially timer skew at these higher rates since I am
> >>having a hard time accepting that I can push 1GB/S through a bus rated
> >>at only 850 MB/S for DMA based transfers. The unit is accessible by
> >>
> >>
> >
> >Note that the linux io stats accounting in 2.6.9 accounts queued io, not
> >io completions. So it's quite possible to have burst rates > bus speeds
> >for async io. 2.6.15-rc1 changes this.
> >
> >
> >
> So you are willing to log into the unit and validate these numbers? I
> would like for
> someone other than me to validate I am seeing these rates.

If you average the bandwidth over a time long enough to eliminate the
bursty queueing rates, your average rate should drop to what the
hardware can actually do. Or dig out the patch from 2.6.15-rc1 for
ll_rw_blk.c and apply it to 2.6.9. You can find it here:

http://kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=d72d904a5367ad4ca3f2c9a2ce8c3a68f0b28bf0;hp=d83c671fb7023f69a9582e622d01525054f23b66
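
The averaging suggestion can be sketched as a simple sliding-window
mean over per-second samples. This is only an illustration: the class
name, the window length, and the sample values are hypothetical, not
measurements from the demo unit.

```python
from collections import deque

class WindowedRate:
    """Mean of the most recent `window` per-second rate samples."""

    def __init__(self, window):
        # deque with maxlen drops the oldest sample automatically.
        self.samples = deque(maxlen=window)

    def add(self, mb_per_s):
        self.samples.append(mb_per_s)
        return sum(self.samples) / len(self.samples)

# Hypothetical queue-time samples that burst to 1000 MB/s.
samples = [600, 1000, 800, 600, 1000, 800, 600, 1000]

avg = WindowedRate(window=8)
smoothed = [avg.add(s) for s in samples]

print(max(samples), smoothed[-1])
```

With these made-up samples the peak reading is 1000 MB/s, while the
eight-second average settles at 800 MB/s, closer to what the hardware
is described as sustaining.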

--
Jens Axboe

2005-11-13 21:27:06

by Jeff V. Merkey

Subject: Re: 2.6.9 reporting 1 Gigabyte/second throughput on bio's, timer skew possible?

Jens Axboe wrote:

>On Sat, Nov 12 2005, Jeff V. Merkey wrote:
>
>
>>Jens Axboe wrote:
>>
>>
>>
>>>On Fri, Nov 11 2005, Jeff V. Merkey wrote:
>>>
>>>
>>>
>>>
>>>>I have allocated 393,216 bio buffers I statically maintain in a chain
>>>>and am running the dsfs file system with 3 x gigabit links fully
>>>>saturated. meta-data
>>>>increases the write sizes to 720 MB/Second on dual 9500 controllers with
>>>>8 drives each (total of 16) 7200 RPM Drives. I am seeing some
>>>>congestion and bursting on the bio chains as they are submitted.
>>>>
>>>>
>>>>
>>
>>
>>>16 disks on 2 controllers? I'm 100% sure there are lots of people
>>>pushing 2.6 much further than that! I wouldn't even call that a big
>>>setup.
>>>
>>>
>>>
>>>
>>Probably not for this type of application.
>>
>>
>>
>>>
>>>
>>>>DSFS dynamically generates html status files from within the file
>>>>system. When the system gets somewhat behind, I am seeing bursts > 1
>>>>GB/Second which exceeds the theoretical limit of the bus. I have a
>>>>timer function that runs every second and profiles the I/O throughput
>>>>created by DSFS with bio submissions and captured packets. I am asking
>>>>if there is clock skew at these data rates with use of the timer
>>>>functions. The system appears to be sustaining 1GB/Second throughput on
>>>>dual controllers. I have verified through data rates the system is
>>>>sustaining 800 megabytes/second with these 1GB/S bursts. I am curious
>>>>if there is potentially timer skew at these higher rates since I am
>>>>having a hard time accepting that I can push 1GB/S through a bus rated
>>>>at only 850 MB/S for DMA based transfers. The unit is accessible by
>>>>
>>>>
>>>>
>>>>
>>>Note that the linux io stats accounting in 2.6.9 accounts queued io, not
>>>io completions. So it's quite possible to have burst rates > bus speeds
>>>for async io. 2.6.15-rc1 changes this.
>>>
>>>
>>>
>>>
>>>
>>So you are willing to log into the unit and validate these numbers? I
>>would like for
>>someone other than me to validate I am seeing these rates.
>>
>>
>
>If you average the bandwidth over a time long enough to eliminate the
>bursty queueing rates, your average rate should drop to what the
>hardware can actually do. Or dig out the patch from 2.6.15-rc1 for
>ll_rw_blk.c and apply it to 2.6.9, find it here:
>
>http://kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=d72d904a5367ad4ca3f2c9a2ce8c3a68f0b28bf0;hp=d83c671fb7023f69a9582e622d01525054f23b66
>
>
>
Jens,

Thanks. I'll dig out the patch. I am measuring the rates on the back end,
and they are running at 720-800 MB/S, apart from what's being reported from
the bio submission. At any rate, I have to say the bio performance is
stunning in comparison to Windows 2003.

Jeff