Message-ID: <3D0A75A4.AB34AC59@zip.com.au>
Date: Fri, 14 Jun 2002 16:00:52 -0700
From: Andrew Morton <akpm@zip.com.au>
MIME-Version: 1.0
To: "Adam J. Richter" <adam@yggdrasil.com>
CC: axboe@suse.de, linux-kernel@vger.kernel.org
Subject: Re: bio_chain: proposed solution for bio_alloc failure and large IO 
 simplification
In-Reply-To: <200206141652.JAA26744@adam.yggdrasil.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org

"Adam J. Richter" wrote:
> 
>         That would be pretty nice too, but that lacks three potential
> advantages of my bio_chain appoach, in order of importance:
> 
>                 1. bio_chain is avoids memory allocation failures,
>                 2. bio_chain can start the first IO slightly sooner,
>                 3. bio_chain makes bio submitters slightly simpler.
> 
>         The disadvantage of my single I/O vector bio_chain apprach is
> that it results in another procedure call per vector.  We also already
> have code that builds multiple IO vectors in ll_rw_kio and the mpage.c
> routines.
> 
> ...

I have not yet seen a BIO allocation failure in testing.  This
would indicate that either the BIO pool is too large, or I'm 
running the wrong tests.  Either way, I don't think we have
demonstrated any otherwise-unsolvable problems with BIO allocation.

What bugs me about your approach is this:  I have been gradually
nudging the kernel's IO paths away from enormously-long per-page
call paths and per-page locking into the direction of a sequence
of short little loops, each of which does the same stuff against
a bunch of pages.

This is by no means finished.  Ultimately, I want to go into the
page allocator and snip off 16 pages in a single acquisition of
the zone lock.  Put these into the pagecache in a single acquisition
of the mapping lock.  Add them to the LRU with a single acquisition
of the pagemap_lru_lock.  Slot them all into a single BIO and send
them all off with a single submit_bio().  So the common-case unit of I/O
in the kernel will not be a page - it will be a great chunk of pages.

Everything is pretty much in place to do this now.  The main piece
which is missing is the gang page allocator (Hi, Bill).

It'll be damn fast, and nicely scalable.  It's all about reducing the
L1 cache footprint.  Making best use of data when it is in cache.
Making best use of locks once they have been acquired.  If it is
done right, it'll be almost as fast as 64k PAGE_CACHE_SIZE, with
none of its disadvantages.

In this context, bio_chain() is regression, because we're back
into doing stuff once-per-page, and longer per-page call graphs.
I'd rather not have to do it if it can be avoided.

-
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/