Message-ID: <4CFFBA7D.6060802@psi5.com>
Date: Wed, 08 Dec 2010 18:03:57 +0100
From: Christian Brandt <brandtc@psi5.com>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.2.12) Gecko/20101027 Lightning/1.0b2 Thunderbird/3.1.6
MIME-Version: 1.0
To: linux-kernel@vger.kernel.org
Subject: swap storage alignment and stride size
Content-Type: text/plain; charset=ISO-8859-15
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 1878
Lines: 55

Preamble:

Hi fellow Linux tamers, the following question has bounced around for
some days on local lists and newsgroups without conclusion and was
escalated upstream several times, so here we are...

We are discussing semi-professional storage systems, e.g. ext4 on LUKS
on LVM on RAID on GPT partitions on 4k sector harddrives or 512k sector
SSDs. Usually every level benefits a lot from aligning its data to the
underlying sector/stride/chunk size, e.g. ext4 with a 128k stripe size
will run a lot better on a well aligned 64k stride RAID5.
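
To make the ext4 example concrete, here is a small sketch of how the
stride and stripe width (counted in filesystem blocks) follow from the
RAID geometry. The 4 KiB filesystem block size, the 3-disk RAID5 and the
function name are our assumptions for illustration only:

```python
# Sketch: derive ext4 stride and stripe width (in filesystem blocks)
# from the RAID geometry.
# Assumptions (not stated in the text above): 4 KiB filesystem blocks
# and a 3-disk RAID5, i.e. 2 data disks plus 1 parity disk per stripe.

def ext4_raid_geometry(chunk_bytes, total_disks, parity_disks=1,
                       fs_block_bytes=4096):
    """Return (stride, stripe_width), both in filesystem blocks."""
    if chunk_bytes % fs_block_bytes:
        raise ValueError("chunk size must be a multiple of the fs block size")
    data_disks = total_disks - parity_disks
    if data_disks < 1:
        raise ValueError("need at least one data disk")
    stride = chunk_bytes // fs_block_bytes   # fs blocks per RAID chunk
    stripe_width = stride * data_disks       # fs blocks per full data stripe
    return stride, stripe_width


stride, stripe_width = ext4_raid_geometry(64 * 1024, total_disks=3)
print(stride, stripe_width)  # 16 32, i.e. 64k chunks and a 128k stripe
```

With these assumed numbers the filesystem can place allocations on
128k boundaries, which is what we mean by "well aligned".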

In other words, partition tables, LVM, RAID, LUKS and filesystems know
how to handle and benefit from aligned larger chunks.

In detail:

As far as we can tell from reading mm/swapfile.c, Linux is only
concerned with the CPU page size and does not know anything about the
underlying chunk/sector/stride sizes and alignment.
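
For reference, the sizes we are talking about are the ones the block
layer exports per device under /sys/block/<disk>/queue/. A minimal
sketch for reading them follows; the helper name and the sysfs_root
parameter are ours, and the device name in the usage note is only an
example:

```python
import os

# Queue limits exported by the block layer for each disk.
TOPOLOGY_ATTRS = (
    "logical_block_size",
    "physical_block_size",
    "minimum_io_size",
    "optimal_io_size",
)


def read_io_topology(disk, sysfs_root="/sys/block"):
    """Return a dict of I/O topology values (bytes) for `disk`.

    `disk` is a whole-device name such as "sda" or "md0". Attributes
    that are missing or unreadable are reported as None.
    """
    topology = {}
    for attr in TOPOLOGY_ATTRS:
        path = os.path.join(sysfs_root, disk, "queue", attr)
        try:
            with open(path) as f:
                topology[attr] = int(f.read().strip())
        except (OSError, ValueError):
            topology[attr] = None
    return topology
```

Calling read_io_topology("md0") on a RAID box would typically show the
chunk size as minimum_io_size and the full stripe as optimal_io_size,
but that depends on the driver filling these values in.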

Therefore we think every small 1/2/4/8 KiB page-sized write access leads
to a read-modify-write cycle for the whole chunk, taking more than twice
as long as simply writing the whole chunk at once.
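
To illustrate what we suspect, here is a deliberately simplified model
that counts device I/Os for a single write on RAID5. It assumes a plain
read-modify-write for every partially written stripe and ignores stripe
caches, request merging and reconstruct-writes, so the real md driver
may well behave differently. The function name and the 64k chunk,
2 data disk geometry are our assumptions:

```python
def raid5_write_ios(offset, length, chunk_bytes, data_disks):
    """Return (reads, writes) in chunk-level device I/Os for one write.

    Simplified model: a stripe that is completely overwritten needs no
    reads (data chunks plus one parity chunk are written). A partially
    written stripe is handled as read-modify-write: read old data and
    old parity, then write new data and new parity.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    stripe_bytes = chunk_bytes * data_disks
    end = offset + length
    reads = writes = 0
    first_stripe = offset // stripe_bytes
    last_stripe = (end - 1) // stripe_bytes
    for stripe in range(first_stripe, last_stripe + 1):
        s_start = stripe * stripe_bytes
        s_end = s_start + stripe_bytes
        lo = max(offset, s_start)   # part of the write inside this stripe
        hi = min(end, s_end)
        if lo == s_start and hi == s_end:
            writes += data_disks + 1            # full stripe write
        else:
            touched = (hi - 1) // chunk_bytes - lo // chunk_bytes + 1
            reads += touched + 1                # old data + old parity
            writes += touched + 1               # new data + new parity
    return reads, writes


CHUNK = 64 * 1024
print(raid5_write_ios(0, 4096, CHUNK, 2))       # (2, 2): one 4 KiB page
print(raid5_write_ios(0, 2 * CHUNK, CHUNK, 2))  # (0, 3): aligned full stripe
```

In this model a single page-sized swap write costs four I/Os, two of
them reads that have to complete first, while an aligned full-stripe
write of 32 pages costs three I/Os and no reads at all.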

Questions:

Is this the right place to ask?

Does or could Linux swapping make use of aligned chunks?

And if so, how?

If not, would it be an improvement?

Will this effect be mostly compensated by the block elevator?

Does it make any sense to change the mkswap page size to the chunk size?
We think those are two totally different beasts and should be kept
separate.

Is Linux already aware of chunk sizes within swap?

If so, how is it set up and controlled by the administrator?

-- 
Christian Brandt

 life is short and in most cases it ends with death
 but my tombstone will carry the hiscore