Date: Wed, 20 Apr 2005 20:24:51 +0200
From: Andreas Hirstius
To: jmerkey, linux-kernel@vger.kernel.org
Subject: Re: Serious performance degradation on a RAID with kernel 2.6.10-bk7 and later

I was curious whether your patch would change the write rate, because I see
only ~550MB/s (continuous), which is about a factor of two below the
capabilities of the disks.

... and got this behaviour (with and without my other patch), with a single
"dd if=/dev/zero of=testxx bs=65536 count=150000 &" or several of them in
parallel on an XFS fs.

"vmstat 1" output:
 r b swpd free buff cache si so bi bo in cs us sy wa id
 0 0 0 28416 37888 15778368 0 0 0 0 8485 3043 0 0 0 100
 6 0 0 22144 37952 15785920 0 0 0 12356 7695 2029 0 61 0 39
 7 0 0 20864 38016 15785856 0 0 324 1722240 8046 4159 0 100 0 0
 7 0 0 20864 38016 15784768 0 0 0 1261440 8391 5222 0 100 0 0
 7 0 0 25984 38016 15781504 0 0 0 2003456 8372 5038 0 100 0 0
 0 6 0 22784 38016 15781504 0 0 0 2826624 8397 8423 0 93 7 0
 0 0 0 21632 38016 15783680 0 0 0 147840 8572 12114 0 9 17 74
 0 0 0 21632 38016 15783680 0 0 0 52 8586 5185 0 0 0 100
 0 0 0 21632 38016 15783680 0 0 0 0 8588 5412 0 0 0 100
 0 0 0 21632 38016 15783680 0 0 0 0 8580 5372 0 0 0 100
 0 0 0 21632 38016 15783680 0 0 0 0 7840 5590 0 0 0 100
 0 0 0 21632 38016 15783680 0 0 0 0 8587 5321 0 0 0 100
 0 0 0 21632 38016 15783680 0 0 0 0 8569 5575 0 0 0 100
 0 0 0 21632 38016 15783680 0 0 0 0 8550 5157 0 0 0 100
 0 0 0 21632 38016 15783680 0 0 0 0 7963 5640 0 0 0 100
 0 0 0 21632 38016 15783680 0 0 0 32 8583 4434 0 0 0 100
 7 0 0 20800 38016 15784768 0 0 0 7424 8404 3638 0 15 0 85
 8 0 0 20864 38016 15786944 0 0 0 688768 7357 3221 0 100 0 0
 8 0 0 20736 28544 15794240 0 0 0 1978560 8376 4897 0 100 0 0
 7 0 0 22208 20736 15798784 0 0 0 1385088 8367 4984 0 100 0 0
 6 0 0 22144 6848 15812672 0 0 56 1291904 8377 4815 0 100 0 0
 0 0 0 50240 6848 15809408 0 0 304 3136 8556 5088 1 26 0 74
 0 0 0 50304 6848 15809408 0 0 0 0 8572 5181 0 0 0 100

The average rate here is again pretty close to 550MB/s; it just writes the
blocks in "bursts"...

Andreas
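(A quick sanity check on the "bursts" observation, not part of the original
mail: averaging the bo column over the 23 one-second samples above, assuming
vmstat's usual 1 KiB block unit, gives about 566 MiB/s, consistent with the
~550MB/s figure. A minimal C sketch of that arithmetic, with the bo values
copied from the table:)

/* Quick check, not part of the original mail: average the bursty "bo"
 * column from the 23 vmstat samples above.  vmstat reports bo once per
 * second; the 1 KiB block unit is an assumption here.
 */
#include <stdio.h>

int main(void)
{
        /* bo values copied from the vmstat output above */
        static const unsigned long bo[] = {
                0, 12356, 1722240, 1261440, 2003456, 2826624, 147840, 52,
                0, 0, 0, 0, 0, 0, 0, 32, 7424, 688768, 1978560, 1385088,
                1291904, 3136, 0,
        };
        const unsigned int n = sizeof(bo) / sizeof(bo[0]);
        unsigned long long sum = 0;

        for (unsigned int i = 0; i < n; i++)
                sum += bo[i];

        /* KiB summed over n one-second samples -> MiB/s average */
        printf("average write rate: %.0f MiB/s over %u s\n",
               (double)sum / 1024.0 / n, n);
        return 0;
}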
jmerkey wrote:
> For 3Ware, you need to change the queue depths, and you will see
> dramatically improved performance. 3Ware can take requests a lot faster
> than Linux pushes them out. Try changing this instead, so you won't be
> going to sleep all the time waiting on the read/write request queues to
> get "unstarved".
>
> /linux/include/linux/blkdev.h
>
> //#define BLKDEV_MIN_RQ   4
> //#define BLKDEV_MAX_RQ   128     /* Default maximum */
> #define BLKDEV_MIN_RQ   4096
> #define BLKDEV_MAX_RQ   8192    /* Default maximum */
>
> Jeff
>
> Andreas Hirstius wrote:
>
>> Hi,
>>
>> We have an rx4640 with 3x 3Ware 9500 SATA controllers and 24x WD740GD
>> HDDs in a software RAID0 configuration (using md).
>> With kernel 2.6.11 the read performance on the md is reduced by a
>> factor of 20 (!!) compared to previous kernels.
>> The write rate to the md doesn't change!! (it actually improves a bit).
>>
>> The configs for the kernels are basically identical.
>>
>> Here is some vmstat output:
>>
>> kernel 2.6.9: ~1GB/s read
>> procs       memory       swap       io       system       cpu
>>  r b swpd free buff cache si so bi bo in cs us sy wa id
>>  1 1 0 12672 6592 15914112 0 0 1081344 56 15719 1583 0 11 14 74
>>  1 0 0 12672 6592 15915200 0 0 1130496 0 15996 1626 0 11 14 74
>>  0 1 0 12672 6592 15914112 0 0 1081344 0 15891 1570 0 11 14 74
>>  0 1 0 12480 6592 15914112 0 0 1081344 0 15855 1537 0 11 14 74
>>  1 0 0 12416 6592 15914112 0 0 1130496 0 16006 1586 0 12 14 74
>>
>> kernel 2.6.11: ~55MB/s read
>> procs       memory       swap       io       system       cpu
>>  r b swpd free buff cache si so bi bo in cs us sy wa id
>>  1 1 0 24448 37568 15905984 0 0 56934 0 5166 1862 0 1 24 75
>>  0 1 0 20672 37568 15909248 0 0 57280 0 5168 1871 0 1 24 75
>>  0 1 0 22848 37568 15907072 0 0 57306 0 5173 1874 0 1 24 75
>>  0 1 0 25664 37568 15903808 0 0 57190 0 5171 1870 0 1 24 75
>>  0 1 0 21952 37568 15908160 0 0 57267 0 5168 1871 0 1 24 75
>>
>> Because the filesystem might have an impact on the measurement, "dd"
>> on /dev/md0 was used to get information about the performance. This
>> also makes it possible to test with block sizes larger than the page
>> size. And it appears that the performance with kernel 2.6.11 is closely
>> related to the block size.
>> For example, if the block size is exactly a multiple (>2) of the page
>> size, the performance is back to ~1.1GB/s.
>> The general behaviour is a bit more complicated:
>> 1. bs <= 1.5 * ps : ~27-57MB/s (differs with ps)
>> 2. bs > 1.5 * ps && bs < 2 * ps : rate increases to max. rate
>> 3. bs = n * ps ; (n >= 2) : ~1.1GB/s (== max. rate)
>> 4. bs > n * ps && bs < ~(n+0.5) * ps ; (n > 2) : ~27-70MB/s (differs with ps)
>> 5. bs > ~(n+0.5) * ps && bs < (n+1) * ps ; (n > 2) : increasing rate in
>>    several, more or less, distinct steps (e.g. 1/3 of max. rate and then
>>    2/3 of max. rate for 64k pages)
>>
>> I've tested all four possible page sizes on Itanium (4k, 8k, 16k and
>> 64k) and the pattern is always the same!!
>>
>> With kernel 2.6.9 (any kernel before 2.6.10-bk6) the read rate is
>> always at ~1.1GB/s, independent of the block size.
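(To make the block-size pattern above concrete, here is a small userspace
sketch, not kernel code and not part of the original thread, that maps a
stream of read()s of block size bs onto page indices using the same
index/last_index arithmetic as the mm/filemap.c lines patched further down.
The 16 KiB page size is only one of the sizes tested above. When bs is a
multiple of the page size every request starts on a page boundary; with
bs = 1.5 * ps every other request starts in the middle of a page and
re-reads a page the previous request already touched.)

/* Userspace illustration, not kernel code: how a stream of read()s with
 * block size bs maps onto page-cache page indices.  index/last_index use
 * the same arithmetic as the mm/filemap.c lines patched below; the 16 KiB
 * page size is just an example.
 */
#include <stdio.h>

#define PAGE_SHIFT 14                    /* 16 KiB pages, as an example */
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

static void map_reads(unsigned long bs)
{
        unsigned long ppos = 0;

        printf("bs = %lu bytes (%.2f pages)\n", bs, (double)bs / PAGE_SIZE);
        for (int i = 0; i < 4; i++) {
                unsigned long index      = ppos >> PAGE_SHIFT;
                unsigned long last_index = (ppos + bs + PAGE_SIZE - 1) >> PAGE_SHIFT;

                printf("  read %d: pages %lu..%lu (%lu pages)\n",
                       i, index, last_index - 1, last_index - index);
                ppos += bs;
        }
}

int main(void)
{
        map_reads(2 * PAGE_SIZE);        /* bs an exact page multiple */
        map_reads(3 * PAGE_SIZE / 2);    /* bs = 1.5 * page size      */
        return 0;
}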
>>
>> This simple patch solves the problem, but I have no idea of possible
>> side-effects ...
>>
>> --- linux-2.6.12-rc2_orig/mm/filemap.c  2005-04-04 18:40:05.000000000 +0200
>> +++ linux-2.6.12-rc2/mm/filemap.c       2005-04-20 10:27:42.000000000 +0200
>> @@ -719,7 +719,7 @@
>>      index = *ppos >> PAGE_CACHE_SHIFT;
>>      next_index = index;
>>      prev_index = ra.prev_page;
>> -    last_index = (*ppos + desc->count + PAGE_CACHE_SIZE-1) >> PAGE_CACHE_SHIFT;
>> +    last_index = (*ppos + desc->count + PAGE_CACHE_SIZE) >> PAGE_CACHE_SHIFT;
>>      offset = *ppos & ~PAGE_CACHE_MASK;
>>
>>      isize = i_size_read(inode);
>> --- linux-2.6.12-rc2_orig/mm/readahead.c  2005-04-04 18:40:05.000000000 +0200
>> +++ linux-2.6.12-rc2/mm/readahead.c       2005-04-20 18:37:04.000000000 +0200
>> @@ -70,7 +70,7 @@
>>   */
>>  static unsigned long get_init_ra_size(unsigned long size, unsigned long max)
>>  {
>> -    unsigned long newsize = roundup_pow_of_two(size);
>> +    unsigned long newsize = size;
>>
>>      if (newsize <= max / 64)
>>              newsize = newsize * newsize;
>>
>> In order to keep this mail short, I've created a webpage that contains
>> all the detailed information and some plots:
>> http://www.cern.ch/openlab-debugging/raid
>>
>> Regards,
>>
>> Andreas Hirstius
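(For reference, a small userspace comparison, again not kernel code, of the
two last_index expressions from the mm/filemap.c hunk above. They agree
except when *ppos + desc->count ends exactly on a page boundary, i.e. for
page-aligned, page-multiple reads, where the patched form requests one extra
page.)

/* Userspace comparison, not kernel code: the original and patched
 * last_index expressions from the mm/filemap.c hunk above.  They differ
 * only when ppos + count ends exactly on a page boundary, where the
 * patched form reaches one page further.
 */
#include <stdio.h>

#define PAGE_CACHE_SHIFT 14UL                   /* e.g. 16 KiB pages */
#define PAGE_CACHE_SIZE  (1UL << PAGE_CACHE_SHIFT)

int main(void)
{
        unsigned long ppos = 0;                 /* first read, start of file */
        const unsigned long counts[] = {
                PAGE_CACHE_SIZE / 2,            /* sub-page request          */
                PAGE_CACHE_SIZE,                /* exactly one page          */
                3 * PAGE_CACHE_SIZE / 2,        /* 1.5 pages                 */
                4 * PAGE_CACHE_SIZE,            /* page-multiple block size  */
        };

        for (unsigned int i = 0; i < sizeof(counts) / sizeof(counts[0]); i++) {
                unsigned long count   = counts[i];
                unsigned long orig    = (ppos + count + PAGE_CACHE_SIZE - 1)
                                                >> PAGE_CACHE_SHIFT;
                unsigned long patched = (ppos + count + PAGE_CACHE_SIZE)
                                                >> PAGE_CACHE_SHIFT;

                printf("count = %6lu bytes: last_index %lu (orig) vs %lu (patched)\n",
                       count, orig, patched);
        }
        return 0;
}

(Adjust PAGE_CACHE_SHIFT to 12, 13, 14 or 16 to match the 4k/8k/16k/64k page
sizes tested in the mail.)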