On Tue, May 25 2010, Michael Kerrisk wrote:
> On Mon, May 24, 2010 at 7:56 PM, Jens Axboe <[email protected]> wrote:
> > On Mon, May 24 2010, Michael Kerrisk wrote:
> >> On Mon, May 24, 2010 at 7:35 PM, Jens Axboe <[email protected]> wrote:
> >> > On Mon, May 24 2010, Michael Kerrisk wrote:
> >> >> > Right, that looks like a thinko.
> >> >> >
> >> >> > I'll submit a patch changing it to bytes and the agreed API and fix this
> >> >> > -Eerror. Thanks for your comments and suggestions!
> >> >>
> >> >> Thanks. And of course you are welcome. (Please CC linux-api@vger on
> >> >> this patch (and all patches that change the API/ABI).)
> >> >
> >> > The first change is this:
> >> >
> >> > http://git.kernel.dk/?p=linux-2.6-block.git;a=commit;h=0191f8697bbdfefcd36e7b8dc3eeddfe82893e4b
> >> >
> >> > and the one dealing with the pages vs bytes API is this:
> >> >
> >> > http://git.kernel.dk/?p=linux-2.6-block.git;a=commit;h=b9598db3401282bb27b4aef77e3eee12015f7f29
> >> >
> >> > Not tested yet, will do so before sending in of course.
> >>
> >> Eyeballing it quickly, these changes look right.
> >
> > Good, thanks.
> >
> >> Do you have some test programs you can make available?
> >
> > Actually I don't, I test it by modifying fio's splice engine to set/get
> > the pipe size and test the resulting transfers.
>
> Two more questions: is the rationale for this feature written up
> somewhere? I could not find it. Is it primarily intended for
> splice/vmsplice/tee, with the effect for pipe(2) being a side effect?

Yes, it's primarily for splice, where the 64KB size can sometimes become
a limiting factor because of the pipe mutex locking/unlocking.

> Also, the minimum size of the buffer is 2 pages. Why is it not 1?
> (Notwithstanding Linus's assertion, a buffer size of 1 page did give
> us POSIX compliance in kernels before 2.6.10.)

I'll defer to Linus on that; I remember some emails on that part from
way back when. As far as I can tell, POSIX wants atomic writes of "less
than a page size", which would make more sense as "of a page size and
less". And since it should not be a page size from either side on a
uni-directional pipe, then 1 page seems enough for that guarantee at
least.

--
Jens Axboe
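
For reference, a minimal sketch of the set/get interface discussed above, with the size given in bytes. It assumes the fcntl commands are named F_SETPIPE_SZ and F_GETPIPE_SZ, as they are in mainline Linux; the helper name pipe_resize() is made up for illustration, and the fallback constants only matter if the libc headers do not expose the commands.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Fallback values (F_LINUX_SPECIFIC_BASE + 7 and + 8) for headers that hide them. */
#ifndef F_SETPIPE_SZ
#define F_SETPIPE_SZ 1031
#endif
#ifndef F_GETPIPE_SZ
#define F_GETPIPE_SZ 1032
#endif

/*
 * Hypothetical helper: ask the kernel to resize the pipe behind fd to at
 * least `bytes` bytes. Returns the capacity the kernel reports afterwards,
 * or -1 on error. The kernel may round the request up.
 */
long pipe_resize(int fd, int bytes)
{
    assert(fd >= 0);
    if (fcntl(fd, F_SETPIPE_SZ, bytes) < 0)
        return -1;
    return fcntl(fd, F_GETPIPE_SZ);
}
```

Calling pipe_resize(fds[1], 128 * 1024) on a freshly created pipe should report a capacity of at least 128KB, provided the system-wide pipe size limit allows it.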
On Tue, 1 Jun 2010, Jens Axboe wrote:
>
> > Also, the minimum size of the buffer is 2 pages. Why is it not 1?
> > (Notwithstanding Linus's assertion, a buffer size of 1 page did give
> > us POSIX compliance in kernels before 2.6.10.)
>
> I'll defer to Linus on that, I remember some emails on that part from
> way back when. As far as I can tell, POSIX wants atomic writes of "less
> than a page size", which would make more sense as "of a page size and
> less". And since it should not be a page size from either side on a
> uni-directional pipe, then 1 page seems enough for that guarantee at
> least.

Hmm. You guys may well be right that a single slot is sufficient. It still
gives us PIPE_BUF worth of data for writing atomically. I had this memory
that we needed two because of the merging logic (we have that special case
for re-using the previous page, so that we don't waste memory for
lots of small writes), but looking at the code there is no reason at all
for me to have thought so.

So I don't know why I thought we needed the extra slot, and a single slot
(if anybody really wants slow writes) looks to be fine.
Linus
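
The PIPE_BUF guarantee mentioned above can be observed from user space. A small sketch, assuming a non-blocking write end: POSIX says a write of at most PIPE_BUF bytes to a pipe either goes in whole or fails with EAGAIN, never partially. Both helper names are invented for illustration, and the 512-byte fallback is only the POSIX minimum, used if the headers do not define PIPE_BUF.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <unistd.h>

#ifndef PIPE_BUF
#define PIPE_BUF 512 /* POSIX minimum, used only if <limits.h> hides it */
#endif

/* Create a pipe with both ends non-blocking; returns 0 on success, -1 on error. */
int nonblocking_pipe(int fds[2])
{
    if (pipe(fds) < 0)
        return -1;
    for (int i = 0; i < 2; i++) {
        int fl = fcntl(fds[i], F_GETFL);
        if (fl < 0 || fcntl(fds[i], F_SETFL, fl | O_NONBLOCK) < 0)
            return -1;
    }
    return 0;
}

/*
 * Write len <= PIPE_BUF bytes to the non-blocking pipe end wfd.
 * Returns 1 if the write was all-or-nothing (everything written, or
 * nothing written and errno == EAGAIN), 0 for any other outcome.
 */
int write_all_or_nothing(int wfd, const char *buf, size_t len)
{
    assert(len <= PIPE_BUF);
    ssize_t n = write(wfd, buf, len);
    if (n == (ssize_t)len)
        return 1;
    if (n < 0 && errno == EAGAIN)
        return 1;
    return 0;
}
```

Repeatedly writing PIPE_BUF - 1 bytes until the pipe is full should never produce a short write, whatever number of slots the pipe has.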
> -----Original Message-----
> From: [email protected] [mailto:linux-kernel-
> [email protected]] On Behalf Of Linus Torvalds
> Sent: June 01, 2010 11:22 AM
>
> On Tue, 1 Jun 2010, Jens Axboe wrote:
> >
> > > Also, the minimum size of the buffer is 2 pages. Why is it not 1?
> > > (Notwithstanding Linus's assertion, a buffer size of 1 page did
> > > give us POSIX compliance in kernels before 2.6.10.)
> >
> > I'll defer to Linus on that, I remember some emails on that part from
> > way back when. As far as I can tell, POSIX wants atomic writes of
> > "less than a page size", which would make more sense as "of a page
> > size and less". And since it should not be a page size from either
> > side on a uni-directional pipe, then 1 page seems enough for that
> > guarantee at least.
>
> Hmm. You guys may well be right that a single slot is sufficient. It
> still gives us PIPE_BUF worth of data for writing atomically. I had this
> memory that we needed two because of the merging logic (we have that
> special case for re-using the previous page, so that we don't waste
> memory for lots of small writes), but looking at the code there is no
> reason at all for me to have thought so.
>
> So I don't know why I thought we needed the extra slot, and a single
> slot (if anybody really wants slow writes) looks to be fine.
>
Ok, I have a really dumb/basic question. The reason we are letting users
grow pipe->buffers is to decrease the number of splice calls. This
implies the user has fcntl'd (when he/she wants performance). Can we not
have an option where we don't have to 'alloc pipe->buffers' worth of pages
every single time? As an example, look at 'default_file_splice_read'. Is
it possible to enhance the existing functionality by defining a new cmd
and a flag (in struct pipe_xxx etc.) and allowing a user to control that?
Something like 'fcntl->F_SETPIPE_SZ_AND_LOCK_PIPE_PAGES'? Does this make
sense?
regards
Chetan Loke
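
To make the usage pattern behind the question concrete, here is a rough sketch of a splice-based copy that first grows the pipe so that each splice() call can move more data. It only uses the existing interface (F_SETPIPE_SZ as in mainline Linux, plus splice(2)); it does not implement the proposed "lock pipe pages" command, and the helper name splice_copy() is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Fallback value (F_LINUX_SPECIFIC_BASE + 7) for headers that hide it. */
#ifndef F_SETPIPE_SZ
#define F_SETPIPE_SZ 1031
#endif

/*
 * Hypothetical helper: copy everything from in_fd to out_fd through a
 * pipe that we try to grow to pipe_bytes. Returns the number of bytes
 * copied, or -1 on error.
 */
long long splice_copy(int in_fd, int out_fd, int pipe_bytes)
{
    int p[2];
    long long total = 0;

    assert(pipe_bytes > 0);
    if (pipe(p) < 0)
        return -1;

    /* Best effort: if the resize fails we continue with the default size. */
    (void)fcntl(p[1], F_SETPIPE_SZ, pipe_bytes);

    for (;;) {
        /* Fill the pipe from the input; at most one pipe's worth per call. */
        ssize_t in = splice(in_fd, NULL, p[1], NULL,
                            (size_t)pipe_bytes, SPLICE_F_MOVE);
        if (in < 0) {
            total = -1;
            break;
        }
        if (in == 0)
            break; /* end of input */

        /* Drain everything we just queued before refilling the pipe. */
        while (in > 0) {
            ssize_t out = splice(p[0], NULL, out_fd, NULL,
                                 (size_t)in, SPLICE_F_MOVE);
            if (out <= 0) {
                total = -1;
                goto done;
            }
            in -= out;
            total += out;
        }
    }
done:
    close(p[0]);
    close(p[1]);
    return total;
}
```

With the default 64KB pipe, a 1MB file needs at least 16 fill/drain rounds; with pipe_bytes set to 1MB the same copy can, in the best case, complete in a single round.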