Message-ID: <4786C578.3090302@tmr.com>
Date: Thu, 10 Jan 2008 20:25:12 -0500
From: Bill Davidsen <davidsen@tmr.com>
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.8) Gecko/20061105 SeaMonkey/1.0.6
MIME-Version: 1.0
To: Jens Axboe <jens.axboe@oracle.com>
CC: linux-kernel@vger.kernel.org, chris.mason@oracle.com,
       linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH][RFC] fast file mapping for loop
References: <20080109085231.GE6650@kernel.dk>
In-Reply-To: <20080109085231.GE6650@kernel.dk>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 4586
Lines: 90

Jens Axboe wrote:
> Hi,
> 
> loop.c currently uses the page cache interface to do IO to file backed
> devices. This works reasonably well for simple things, like mapping an
> iso9660 file for direct mount and other read-only workloads. Writing is
> somewhat problematic, as anyone who has really used this feature can
> attest to - it tends to confuse the vm (hello kswapd) since it break
> dirty accounting and behaves very erratically on writeout. Did I mention
> that it's pretty slow as well, for both reads and writes?
> 
Since you are looking for comments, I'll mention a loop-related behavior 
I've been seeing and see if it gets comments or is useful, since it can 
be used to tickle bad behavior on demand.

I have an 6GB sparse file, which I mount with cryptoloop and populate as 
an ext3 filesystem (more later on why). I then copy ~5.8GB of data to 
the filesystem, which is unmounted to be burnt to a DVD. Before it's 
burned the "dvdisaster" application is used to add some ECC information 
to the end, and make an image which fits on a DVD-DL. Media will be 
burned and distributed to multiple locations.

The problem:

When copying with rsync, the copy runs at ~25MB/s for a while, then 
falls into a pattern of bursts of 25MB/s followed by 10-15 sec of iowait 
with no disk activity. So I tried doing the copy by cpio
   find . -depth | cpio -pdm /mnt/loop
which shows exactly the same behavior. Then, for no good reason I tried
   find . -depth | cpio -pBdm /mnt/loop
and the copy ran at 25MB/s for the whole data set.

I was able to see similar results with a pure loop mount, I only mention 
the crypto for accuracy. Because many of these have been shipped over 
the last two years and new loop code would only be useful in this case 
if it were compatible so old data sets could be read.

> It also behaves differently than a real drive. For writes, completions
> are done once they hit page cache. Since loop queues bio's async and
> hands them off to a thread, you can have a huge backlog of stuff to do.
> It's hard to attempt to guarentee data safety for file systems on top of
> loop without making it even slower than it currently is.
> 
> Back when loop was only used for iso9660 mounting and other simple
> things, this didn't matter. Now it's often used in xen (and others)
> setups where we do care about performance AND writing. So the below is a
> attempt at speeding up loop and making it behave like a real device.
> It's a somewhat quick hack and is still missing one piece to be
> complete, but I'll throw it out there for people to play with and
> comment on.
> 
> So how does it work? Instead of punting IO to a thread and passing it
> through the page cache, we instead attempt to send the IO directly to the
> filesystem block that it maps to. loop maintains a prio tree of known
> extents in the file (populated lazily on demand, as needed). Advantages
> of this approach:
> 
> - It's fast, loop will basically work at device speed.
> - It's fast, loop it doesn't put a huge amount of system load on the
>   system when busy. When I did comparison tests on my notebook with an
>   external drive, running a simple tiobench on the current in-kernel
>   loop with a sparse file backing rendered the notebook basically
>   unusable while the test was ongoing. The remapper version had no more
>   impact than it did when used directly on the external drive.
> - It behaves like a real block device.
> - It's easy to support IO barriers, which is needed to ensure safety
>   especially in virtualized setups.
> 
> Disadvantages:
> 
> - The file block mappings must not change while loop is using the file.
>   This means that we have to ensure exclusive access to the file and
>   this is the bit that is currently missing in the implementation. It
>   would be nice if we could just do this via open(), ideas welcome...
> - It'll tie down a bit of memory for the prio tree. This is GREATLY
>   offset by the reduced page cache foot print though.
> - It cannot be used with the loop encryption stuff. dm-crypt should be
>   used instead, on top of loop (which, I think, is even the recommended
>   way to do this today, so not a big deal).
> 

-- 
Bill Davidsen <davidsen@tmr.com>
   "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/