I'd just like to take the chance also to ask about a VM/FS meetup some
time around kernel summit (maybe take a bit of time during UKUUG or so).
I was thinking about trying to arrange a proper mini summit thing, but
it's a bit difficult and we could talk this year about doing it for
subsequent years. If there is a bit of interest, we could probably find
a small room somewhere this year on pretty short notice or do it as a
BOF or something.
I don't want to do it in the VM summit, because that kind of alienates
the filesystem guys. What I want to talk about is anything and everything
that the VM can do better to help the fs and vice versa. I'd like to
stay away from memory management where not too applicable to the fs.
A few things I'd like to talk about are:
- the address space operations APIs, and their page based nature. I think
it would be nice to generally move toward offset,length based ones as
much as possible because it should give more efficiency and flexibility
in the filesystem.
- write_begin API if it is still an issue by that date. Hope not :)
- truncate races
- fsblock if it hasn't been shot down by then
- how to make complex API changes without having to fix most things
yourself.
Anyway, if you will be in the area and are interested, let me know (off
list) and we can work out time and place.
Thanks,
Nick
On Sun, Jun 24, 2007 at 06:23:45AM +0200, Nick Piggin wrote:
> I'd just like to take the chance also to ask about a VM/FS meetup some
> time around kernel summit (maybe take a bit of time during UKUUG or so).
I won't be around until a day or two before KS, so I'd prefer to have it
after KS if possible.
> I don't want to do it in the VM summit, because that kind of alienates
> the filesystem guys. What I want to talk about is anything and everything
> that the VM can do better to help the fs and vice versa. I'd like to
> stay away from memory management where not too applicable to the fs.
As more of a filesystem person I wouldn't mind it being attached to a VM
conf. In the worst case we'll just rename it VM/FS conference. When and
where is it scheduled?
> - the address space operations APIs, and their page based nature. I think
> it would be nice to generally move toward offset,length based ones as
> much as possible because it should give more efficiency and flexibility
> in the filesystem.
>
> - write_begin API if it is still an issue by that date. Hope not :)
>
> - truncate races
>
> - fsblock if it hasn't been shot down by then
Don't forget high order pagecache please.
> - how to make complex API changes without having to fix most things
> yourself.
More issues:
- aio once again
- refactoring the dio code to separate locking down the user VM from doing
the actual page-based I/O. I've seen valid requests for kernel-initiated
direct I/O from a few real-world Linux users.
- generic code for delayed allocation and writeout using efficient
multi-page allocator calls. I'll hopefully have an example (lifted from XFS
code) by then.
- what to do about reads/writes from kernelspace. Currently we have
some places (loop mostly) calling directly into ->prepare_write /
->commit_write, which is completely wrong from the layering perspective
and a locking nightmare for distributed or generally more complex
filesystems. And we have a lot of places using set_fs/set_ds and
calling into ->write. The first category could probably be covered
by using the splice infrastructure, but for the latter we'd want
something more optimal and less hacky, especially given all the overhead
related to avoiding deadlocks involving the user address space in the
generic write path. Maybe it's time for generic_file_kernel_write?
> > I'd just like to take the chance also to ask about a VM/FS meetup some
> > time around kernel summit (maybe take a bit of time during UKUUG or so).
Yeah, I'd be interested.
> More issues:
- chris mason's patches to normalize buffered and direct locking
- z
> A few things I'd like to talk about are:
>
> - the address space operations APIs, and their page based nature. I think
> it would be nice to generally move toward offset,length based ones as
> much as possible because it should give more efficiency and flexibility
> in the filesystem.
>
> - write_begin API if it is still an issue by that date. Hope not :)
>
> - truncate races
>
> - fsblock if it hasn't been shot down by then
>
> - how to make complex API changes without having to fix most things
> yourself.
I'd like to add:
-revamping filemap_xip.c
-memory mappable swap file (I'm not sure if this one is appropriate
for the proposed meeting)
Christoph Hellwig wrote:
> On Sun, Jun 24, 2007 at 06:23:45AM +0200, Nick Piggin wrote:
>
>>I'd just like to take the chance also to ask about a VM/FS meetup some
>>time around kernel summit (maybe take a bit of time during UKUUG or so).
>
>
> I won't be around until a day or two before KS, so I'd prefer to have it
> after KS if possible.
I'd like to see you there, so I hope we can find a date that most
people are happy with. I'll try to start working that out after we
have a rough idea of who's interested.
>>I don't want to do it in the VM summit, because that kind of alienates
>>the filesystem guys. What I want to talk about is anything and everything
>>that the VM can do better to help the fs and vice versa. I'd like to
>>stay away from memory management where not too applicable to the fs.
>
>
> As more of a filesystem person I wouldn't mind it being attached to a VM
> conf. In the worst case we'll just rename it VM/FS conference. When and
> where is it scheduled?
I'll just cc Martin; however, I think the VM conference is pretty short
on filesystem people. I'd also like to avoid a lot of VM topics and
hopefully have enough time for a topic of interest or so from each fs
maintainer who has something to talk about.
But I'm open to ideas that will make it work better. FWIW, Anton has
offered to try arranging conference facilities at the university, so
I think we should be covered there.
>>- the address space operations APIs, and their page based nature. I think
>> it would be nice to generally move toward offset,length based ones as
>> much as possible because it should give more efficiency and flexibility
>> in the filesystem.
>>
>>- write_begin API if it is still an issue by that date. Hope not :)
>>
>>- truncate races
>>
>>- fsblock if it hasn't been shot down by then
>
>
> Don't forget high order pagecache please.
Leaving my opinion of higher order pagecache aside, this _may_ be an
example of something that doesn't need a lot of attention, because it
should be fairly uncontroversial from a filesystem's POV? (e.g. it is
more relevant to memory management and possibly the block layer).
OTOH if it is discussed in the context of "large blocks in the buffer
layer is crap because we can do it with higher order pagecache", then
that might be interesting :)
Anyway, I won't say no to any proposal, so keep the ideas coming. We
can talk about whatever we find interesting on the day.
>>- how to make complex API changes without having to fix most things
>> yourself.
>
>
> More issues:
Thanks Christoph, sounds good.
--
SUSE Labs, Novell Inc.
On Jun 26, 2007 12:35 +1000, Nick Piggin wrote:
> Leaving my opinion of higher order pagecache aside, this _may_ be an
> example of something that doesn't need a lot of attention, because it
> should be fairly uncontroversial from a filesystem's POV? (e.g. it is
> more relevant to memory management and possibly the block layer).
> OTOH if it is discussed in the context of "large blocks in the buffer
> layer is crap because we can do it with higher order pagecache", then
> that might be interesting :)
FWIW, being able to have a large (8-64kB) blocksize would be great for
ext2/3/4. We'd sort of been betting on this by limiting the on-disk
extent format to 48-bit physical block numbers, and to have 2 patches
to implement this in as many weeks is excellent.
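A quick worked calculation of why the block size matters under that limit, assuming the 48-bit physical block number is the only constraint (other on-disk limits are ignored here), with B the block size in bytes:

```latex
\begin{aligned}
S_{\max} &= 2^{48} \cdot B \\
B = 4\,\mathrm{KiB} = 2^{12}\,\mathrm{B}: \quad S_{\max} &= 2^{60}\,\mathrm{B} = 1\,\mathrm{EiB} \\
B = 64\,\mathrm{KiB} = 2^{16}\,\mathrm{B}: \quad S_{\max} &= 2^{64}\,\mathrm{B} = 16\,\mathrm{EiB}
\end{aligned}
```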
To me the mechanism doesn't matter, whether through fsblock or high-order
PAGE_SIZE. I'll let the rest of you duke it out as long as at least one
of them makes it into the kernel.
Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
On Mon, Jun 25, 2007 at 05:08:02PM -0700, Jared Hulbert wrote:
> -memory mappable swap file (I'm not sure if this one is appropriate
> for the proposed meeting)
Please explain what this is supposed to mean.
On Tue, Jun 26, 2007 at 12:35:09PM +1000, Nick Piggin wrote:
> Christoph Hellwig wrote:
> >On Sun, Jun 24, 2007 at 06:23:45AM +0200, Nick Piggin wrote:
> >
> >>I'd just like to take the chance also to ask about a VM/FS meetup some
> >>time around kernel summit (maybe take a bit of time during UKUUG or so).
> >
> >
> >I won't be around until a day or two before KS, so I'd prefer to have it
> >after KS if possible.
>
> I'd like to see you there, so I hope we can find a date that most
> people are happy with. I'll try to start working that out after we
> have a rough idea of who's interested.
I'm game, but won't be staying past the end of KS (I'll arrive Sept 2nd
or so though). Given debates so far, it probably makes sense to talk
about things at KS too.
-chris
On 6/25/07, Christoph Hellwig wrote:
> On Mon, Jun 25, 2007 at 05:08:02PM -0700, Jared Hulbert wrote:
> > -memory mappable swap file (I'm not sure if this one is appropriate
> > for the proposed meeting)
>
> Please explain what this is supposed to mean.
Suppose you have a large array of non-volatile, semi-writeable memory,
such as high-speed NOR flash or one of the similar emerging technologies,
in a system. It would be useful to use that memory as an extension of
RAM. One way you could do that is to allow pages to be swapped
out to this memory. Once there, these pages could be read directly,
but would require a COW procedure on a write access. The reason I
think this may be a vm/fs topic is that the hardware makes writing to
this memory efficiently a non-trivial operation that requires
management, just like a filesystem. Also, it seems to me that there are
probably overlaps between this topic and the recent filemap_xip.c
discussions.
On Tue, Jun 26, 2007 at 12:35:09PM +1000, Nick Piggin wrote:
> I'd like to see you there, so I hope we can find a date that most
> people are happy with. I'll try to start working that out after we
> have a rough idea of who's interested.
Do we have any data preferences yet?
On Tue, Jun 26, 2007 at 10:07:24AM -0700, Jared Hulbert wrote:
> If you have a large array of a non-volatile semi-writeable memory such
> as a highspeed NOR Flash or some of the similar emerging technologies
> in a system. It would be useful to use that memory as an extension of
> RAM. One of the ways you could do that is allow pages to be swapped
> out to this memory. Once there these pages could be read directly,
> but would require a COW procedure on a write access. The reason why I
> think this may be a vm/fs topic is that the hardware makes writing to
> this memory efficiently a non-trivial operation that requires
> management just like a filesystem. Also it seems to me that there are
> probably overlaps between this topic and the recent filemap_xip.c
> discussions.
So what you mean is "swap on flash"? Definitely sounds like an
interesting topic, although I'm not too sure it's all that
filesystem-related.
On Sat, Jun 30, 2007 at 06:02:44AM -0400, Peter Chubb wrote:
> You need either a block translation layer, or a (swap) filesystem that
> understands flash peculiarities in order to make such a thing work.
> The standard Linux swap format will not work.
Yes, it basically needs an FTL (flash translation layer).
>>>>> "Christoph" == Christoph Hellwig writes:
Christoph> On Tue, Jun 26, 2007 at 10:07:24AM -0700, Jared Hulbert
Christoph> wrote:
>> If you have a large array of a non-volatile semi-writeable memory
>> such as a highspeed NOR Flash or some of the similar emerging
>> technologies in a system. It would be useful to use that memory as
>> an extension of RAM. One of the ways you could do that is allow
>> pages to be swapped out to this memory. Once there these pages
>> could be read directly, but would require a COW procedure on a
>> write access. The reason why I think this may be a vm/fs topic is
>> that the hardware makes writing to this memory efficiently a
>> non-trivial operation that requires management just like a
>> filesystem. Also it seems to me that there are probably overlaps
>> between this topic and the recent filemap_xip.c discussions.
Christoph> So what you mean is "swap on flash"? Definitely sounds
Christoph> like an interesting topic, although I'm not too sure it's
Christoph> all that filesystem-related.
You need either a block translation layer, or a (swap) filesystem that
understands flash peculiarities in order to make such a thing work.
The standard Linux swap format will not work.
--
Dr Peter Chubb http://www.gelato.unsw.edu.au peterc AT gelato.unsw.edu.au
http://www.ertos.nicta.com.au ERTOS within National ICT Australia
Christoph Hellwig wrote:
> On Tue, Jun 26, 2007 at 12:35:09PM +1000, Nick Piggin wrote:
>
>> I'd like to see you there, so I hope we can find a date that most
>> people are happy with. I'll try to start working that out after we
>> have a rough idea of who's interested.
>>
>
> Do we have any data preferences yet?
>
You mean date?
VM is arranged for the 3rd, IIRC Kernel summit doesn't
start until the 5th, so there's a gap on the 4th if you want
to sort out the fs stuff then? Not 100% sure on the dates.
M.
Peter Chubb wrote:
> >>>>> "Christoph" == Christoph Hellwig writes:
>
> Christoph> On Tue, Jun 26, 2007 at 10:07:24AM -0700, Jared Hulbert
> Christoph> wrote:
> >> If you have a large array of a non-volatile semi-writeable memory
> >> such as a highspeed NOR Flash or some of the similar emerging
> >> technologies in a system. It would be useful to use that memory as
> >> an extension of RAM. One of the ways you could do that is allow
> >> pages to be swapped out to this memory. Once there these pages
> >> could be read directly, but would require a COW procedure on a
> >> write access. The reason why I think this may be a vm/fs topic is
> >> that the hardware makes writing to this memory efficiently a
> >> non-trivial operation that requires management just like a
> >> filesystem. Also it seems to me that there are probably overlaps
> >> between this topic and the recent filemap_xip.c discussions.
>
> Christoph> So what you mean is "swap on flash"? Definitely sounds
> Christoph> like an interesting topic, although I'm not too sure it's
> Christoph> all that filesystem-related.
I wouldn't want to call it swap, as this carries with it block-io
connotations. It's really mmap on flash.
> You need either a block translation layer,
Are you suggesting to go through the block layer to reach the flash?
> or a (swap) filesystem that
> understands flash peculiarities in order to make such a thing work.
> The standard Linux swap format will not work.
Correct.
BTW, you may want to have a look at my "[RFC] VM: I have a dream..." thread.
Here is an excerpt:
"What's more, there is no more swap.
Apps are executed inplace, as if already loaded.
Physical RAM is used to cache slower storage RAM, much the same as the CPU
cache RAM caches slower physical RAM."
The thread ended with this conclusion:
Alan Cox wrote:
> On Iau, 2006-02-02 at 21:59 +0300, Al Boldi wrote:
> > So w/ 1GB RAM, no swap, and 1TB disk mmap'd, could this mmap'd space be
> > added to the total memory available to the OS, as is done w/ swap?
>
> Yes in theory. It would be harder to manage.
>
> > And if that's possible, why not replace swap w/ mmap'd disk-space?
>
> Swap is just somewhere to stick data that isn't file backed, you could
> build a swapless mmap based OS but it wouldn't be quite the same as
> Unix/Linux are.
Thanks!
--
Al