I'd like to get a first round of review on my AXFS filesystem. This is a simple
read-only compressed filesystem like Squashfs and cramfs. AXFS is special
because it also allows for execute-in-place of your applications. It is a major
improvement over the cramfs XIP patches that have been floating around for ages.
The biggest improvement is in the way AXFS allows for each page to be XIP or
not. First, a user collects information about which pages are accessed on a
compressed image for each mmap()ed region from /proc/axfs/volume0. That
'profile' is used as an input to the image builder. The resulting image has
only the relevant pages uncompressed and XIP. The result is smaller memory
sizes and faster launches.
See http://axfs.sourceforge.net for more info.
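To make the profile-driven build a little more concrete, here is a rough sketch of the decision the image builder makes for each page. It is only an illustration: the `page_store` enum, the flat `accessed[]` array, and `plan_pages()` are names invented for this note, not the mkfs.axfs code or the on-media format.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-page storage choice; not the on-media AXFS encoding. */
enum page_store { PAGE_COMPRESSED = 0, PAGE_XIP = 1 };

/*
 * Mark page i as XIP when the profile recorded an access to it, and
 * as compressed otherwise. Returns the number of pages marked XIP.
 */
size_t plan_pages(const unsigned char *accessed, size_t npages,
                  enum page_store *plan)
{
    size_t i, xip = 0;

    assert(accessed != NULL && plan != NULL);
    for (i = 0; i < npages; i++) {
        plan[i] = accessed[i] ? PAGE_XIP : PAGE_COMPRESSED;
        if (plan[i] == PAGE_XIP)
            xip++;
    }
    return xip;
}
```

Only the pages counted in the return value would be stored uncompressed; everything else stays compressed, which is where the space saving comes from.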
 fs/Kconfig                |   21 +
 fs/Makefile               |    1
 fs/axfs/Makefile          |    7
 fs/axfs/axfs_bdev.c       |  158 ++++++++
 fs/axfs/axfs_inode.c      |  490 ++++++++++++++++++++++++++
 fs/axfs/axfs_mtd.c        |  233 ++++++++++++
 fs/axfs/axfs_profiling.c  |  594 +++++++++++++++++++++++++++++++
 fs/axfs/axfs_super.c      |  866 ++++++++++++++++++++++++++++++++++++++++++++++
 fs/axfs/axfs_uml.c        |   47 ++
 fs/axfs/axfs_uncompress.c |   97 +++++
 include/linux/axfs.h      |  358 +++++++++++++++++++
 11 files changed, 2872 insertions(+)
Jared, nice work!
I've also read your paper from the Linux Symposium
(http://ols.fedoraproject.org/OLS/Reprints-2008/hulbert-reprint.pdf)
A few questions:
- how does this benchmark compare with cramfs and squashfs in a NAND-only system?
(or is it just not a good plan to use this with NAND-only? Of course
I won't get XIP with NAND, I understand that.)
- would axfs be suitable as a filesystem on a ram disk?
Background for the last question is that if you do not have the memory
to retain all pages uncompressed (as you would with ramfs), this could
be a nice intermediate format.
Furthermore compared to ramfs, a filesystem on a ramdisk does not need
the initialisation during startup (decompressing the cpio file,
creating the files, copying the data), so when it comes to boot times
a filesystem on a ramdisk (e.g. axfs) could be a better choice.
Appreciate your feedback.
Frans.
On Wed, Aug 20, 2008 at 10:44:36PM -0700, Jared Hulbert wrote:
> I'd like to get a first round of review on my AXFS filesystem. This is a simple
> read-only compressed filesystem like Squashfs and cramfs. AXFS is special
> because it also allows for execute-in-place of your applications. It is a major
> improvement over the cramfs XIP patches that have been floating around for ages.
> The biggest improvement is in the way AXFS allows for each page to be XIP or
> not. First, a user collects information about which pages are accessed on a
> compressed image for each mmap()ed region from /proc/axfs/volume0. That
> 'profile' is used as an input to the image builder. The resulting image has
> only the relevant pages uncompressed and XIP. The result is smaller memory
> sizes and faster launches.
FWIW, I'm not sure it's a good idea to name this new filesystem AXFS.
People are almost certainly going to confuse it with XFS despite
the filesystems being aimed at diametrically opposed ends of the
storage spectrum. ;)
Cheers,
Dave.
--
Dave Chinner
[email protected]
Jared Hulbert wrote:
> I'd like to get a first round of review on my AXFS filesystem.
I like the general approach of it. It's much more flexible than the
ext2 extension I've done, and the possibility to select XIP vs.
compression per page is really really neat. I can imagine that people
will prefer this over the ext2 implementation on s390. It is unclear
to me how the "secondary block device" thing is supposed to work.
Could you elaborate a bit on that?
On Thursday 21 August 2008 20:25, Carsten Otte wrote:
> Jared Hulbert wrote:
> > I'd like to get a first round of review on my AXFS filesystem.
>
> I like the general approach of it. It's much more flexible than the
> ext2 extension I've done, and the possibility to select XIP vs.
> compression per page is really really neat. I can imagine that people
> will prefer this over the ext2 implementation on s390. It is unclear
> to me how the "secondary block device" thing is supposed to work.
> Could you elaborate a bit on that?
Agreed. I haven't had a good look through it yet, but at a glance it
looks pretty neat. The VM side of things looks pretty reasonable
(I fear XIP faulting might have another race or two, but that's a
core mm issue rather than filesystem specific).
Jared Hulbert wrote:
> The biggest improvement is in the way AXFS allows for each page to be XIP or
> not. First, a user collects information about which pages are accessed on a
> compressed image for each mmap()ed region from /proc/axfs/volume0. That
> 'profile' is used as an input to the image builder. The resulting image has
> only the relevant pages uncompressed and XIP. The result is smaller memory
> sizes and faster launches.
Sounds great, really nice idea.
How does it fare with no MMU? Can the profiler and image builder lay
out the XIP pages in such a way that no-MMU mmaps can map those regions?
No complaint if not, it would be a nice bonus though.
-- Jamie
On Thursday 21 August 2008, Nick Piggin wrote:
> On Thursday 21 August 2008 20:25, Carsten Otte wrote:
> > Jared Hulbert wrote:
> > > I'd like to get a first round of review on my AXFS filesystem.
> >
> > I like the general approach of it. It's much more flexible than the
> > ext2 extension I've done, and the possibility to select XIP vs.
> > compression per page is really really neat. I can imagine that people
> > will prefer this over the ext2 implementation on s390. It is unclear
> > to me how the "secondary block device" thing is supposed to work.
> > Could you elaborate a bit on that?
>
> Agreed. I haven't had a good look through it yet, but at a glance it
> looks pretty neat. The VM side of things looks pretty reasonable
> (I fear XIP faulting might have another race or two, but that's a
> core mm issue rather than filesystem specific).
Yes, I also like the file system, I guess this is 2.6.28 material and
you should have it added to linux-next when you have addressed the
comments so far.
One thing that would be really nice is if you could add fake-write
support in the way that I proposed for cramfs a few months ago.
This would make axfs much more interesting for another set of
users, and keep cramfs a really simple example file system.
Arnd <><
> How does it fare with no MMU? Can the profiler and image builder lay
> out the XIP pages in such a way that no-MMU mmaps can map those regions?
>
> No complaint if not, it would be a nice bonus though.
Sorry. I don't believe it will work on no-MMU as is. That said you
_could_ tweak the mkfs tool to lay mmap()'ed regions down contiguously
but then if you mmap() an unprofiled region, well that would be bad.
I suppose you could make axfs_mmap smart enough to handle that. I
guess the cleanest way would be to just make files lay down
contiguously, you lose some of the space saving but it would work.
I'm not planning to get to this anytime soon. But I'd be willing to merge
patches. Can anybody convince me offline that working on no-MMU
makes financial sense for my employer? This is getting to be a common
question. How many noMMU users are out there and why are you so
interested?
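The contiguity requirement discussed above can be sketched as a check over a file's page placements: a no-MMU mmap() could point straight at the flash only if every page in the requested range is stored uncompressed and at consecutive offsets. This is a user-space model only; `page_loc`, `SKETCH_PAGE_SIZE`, and `range_is_mappable()` are hypothetical names, not AXFS structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumed page size for this sketch */

/* Hypothetical record of where one file page lives in the image. */
struct page_loc {
    bool     xip;     /* stored uncompressed, directly addressable */
    uint64_t offset;  /* byte offset of the page within the image  */
};

/*
 * Return true when pages [first, first + count) are all XIP and laid
 * down back to back, so one linear mapping could cover the range.
 */
bool range_is_mappable(const struct page_loc *pages,
                       size_t first, size_t count)
{
    size_t i;

    assert(pages != NULL);
    if (count == 0)
        return false;
    for (i = first; i < first + count; i++) {
        if (!pages[i].xip)
            return false;
        if (i > first &&
            pages[i].offset != pages[i - 1].offset + SKETCH_PAGE_SIZE)
            return false;
    }
    return true;
}
```

A range that fails this check is the "unprofiled region" case: it would need either a copy into RAM or a refusal from the mmap path.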
> One thing that would be really nice is if you could add fake-write
> support in the way that I proposed for cramfs a few months ago.
> This would make axfs much more interesting for another set of
> users, and keep cramfs a really simple example file system.
Did that get merged?
> Agreed. I haven't had a good look through it yet, but at a glance it
> looks pretty neat. The VM side of things looks pretty reasonable
> (I fear XIP faulting might have another race or two, but that's a
> core mm issue rather than filesystem specific).
How might I design a test to flush those bugs out? We haven't seen any.
On Thursday 21 August 2008, Jared Hulbert wrote:
> > One thing that would be really nice is if you could add fake-write
> > support in the way that I proposed for cramfs a few months ago.
> > This would make axfs much more interesting for another set of
> > users, and keep cramfs a really simple example file system.
>
> Did that get merged?
No, there were a few remaining issues that I never found the time
to work on.
Arnd <><
> FWIW, I'm not sure it's a good idea to name this new filesystem AXFS.
> People are almost certainly going to confuse it with XFS despite
> the filesystems being aimed at diametrically opposed ends of the
> storage spectrum. ;)
In principle I think you are right. AXFS and XFS are similar names
and it could lead to confusion. I think XFS should change its name to
prevent confusion. I think within 2 years AXFS will be used in orders of
magnitude more machines anyway. ;)
About opposite end of the spectrum... Carsten just said AXFS might be
nice for s390, so I'm not sure how true that is.
I'm kind of attached to the name now.
> I like the general approach of it. It's much more flexible than the ext2
> extension I've done, and the possibility to select XIP vs. compression per
> page is really really neat. I can imagine that people will prefer this over
> the ext2 implementation on s390. It is unclear to me how the "secondary
> block device" thing is supposed to work. Could you elaborate a bit on that?
First off we don't yet support direct_access(), but I am planning on that soon.
Sure. For a system that has, say, a NOR flash and a NAND or an embedded
MMC, one can split a filesystem image such that only the XIP parts of
the image are on the NOR while the compressed bits are on the NAND /
eMMC. The NOR part is accessed as directly addressable memory, while
the NAND would use mtd->read() and the eMMC would use block device
access APIs. In this case I would call this NAND or eMMC the
"secondary device" because the primary device is the NOR.
Assuming my NOR was at /dev/mtd2 and my NAND at /dev/mtd5, I would
call the following to mount such a system:
mount -t axfs -o second_dev=/dev/mtd5 /dev/mtd2 /mnt/axfs
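For anyone mounting from C rather than from a shell, the same thing can be expressed through mount(2). This is only a sketch: the `second_dev=` option string is taken from the command above, while `axfs_mount_opts()` and `mount_axfs_split()` are helper names made up here, and MS_RDONLY is passed on the assumption that a read-only filesystem would be mounted that way.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>

/*
 * Build the filesystem-specific option string, e.g.
 * "second_dev=/dev/mtd5". Returns 0 on success, -1 if the device
 * path is empty or buf is too small to hold the result.
 */
int axfs_mount_opts(char *buf, size_t len, const char *second_dev)
{
    int n;

    assert(buf != NULL && second_dev != NULL);
    if (strlen(second_dev) == 0)
        return -1;
    n = snprintf(buf, len, "second_dev=%s", second_dev);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Mount the primary (NOR) plus secondary (NAND/eMMC) device at target. */
int mount_axfs_split(const char *primary, const char *secondary,
                     const char *target)
{
    char opts[128];

    if (axfs_mount_opts(opts, sizeof(opts), secondary) != 0)
        return -1;
    /* Needs CAP_SYS_ADMIN; returns -1 and sets errno on failure. */
    return mount(primary, target, "axfs", MS_RDONLY, opts);
}
```

With these helpers, `mount_axfs_split("/dev/mtd2", "/dev/mtd5", "/mnt/axfs")` would correspond to the command line above.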
Hello Jared,
On Thu, Aug 21, 2008 at 4:19 PM, Jared Hulbert <[email protected]> wrote:
>> FWIW, I'm not sure it's a good idea to name this new filesystem AXFS.
>> People are almost certainly going to confuse it with XFS despite
>
People that care about their filesystem choice know their choices.
People that don't care, well they don't care.
Maybe AXIPFS would be the close alternative.
One question on the use-case profiling and subsequent image rebuild:
What if the use-case did not cover all cases of XIP use?
If a compressed page is attempted to be executed, will the filesystem
fall back to decompression to RAM and execution from RAM, or will this
result in a faulty system?
The design choices look real good. Congrats on the achievement.
Regards,
--
Leon
> What if the use-case did not cover all cases of XIP use?
>
> If a compressed page is attempted to be executed, will the filesystem
> fall back to decompression to RAM and execution from RAM, or will this
> result in a faulty system?
No this will not result in a faulty system. It is perfectly
acceptable to have all pages in a file XIP, no pages in a file XIP,
and anywhere in between.
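Put differently, every page access is a per-page dispatch. The following is a user-space model of that idea, not the AXFS fault path: the `page_desc` layout is invented, the page size is shrunk for readability, and the "decompressor" is a plain copy standing in for the real zlib step.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 16u  /* tiny page size to keep the model readable */

/* Hypothetical descriptor: how one page of a file is stored. */
struct page_desc {
    int                  is_xip;  /* nonzero: usable in place           */
    const unsigned char *data;    /* XIP bytes, or "compressed" payload */
};

/* Stand-in for real decompression: this model just copies the bytes. */
void fake_decompress(unsigned char *dst, const unsigned char *src)
{
    memcpy(dst, src, SKETCH_PAGE_SIZE);
}

/*
 * Resolve a page access. XIP pages are returned in place; any other
 * page is expanded into the caller's RAM buffer, so an unprofiled
 * page still works, it just costs RAM and a decompression.
 */
const unsigned char *get_page(const struct page_desc *pd,
                              unsigned char *ram_buf)
{
    assert(pd != NULL && ram_buf != NULL);
    if (pd->is_xip)
        return pd->data;
    fake_decompress(ram_buf, pd->data);
    return ram_buf;
}
```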
> The design choices look real good. Congrats on the achievement.
thanks!
On Thu, 21 Aug 2008, Leon Woestenberg wrote:
> On Thu, Aug 21, 2008 at 4:19 PM, Jared Hulbert <[email protected]> wrote:
> >> FWIW, I'm not sure it's a good idea to name this new filesystem AXFS.
> >> People are almost certainly going to confuse it with XFS despite
> >
> People that care about their filesystem choice know their choices.
> People that don't care, well they don't care.
>
> Maybe AXIPFS would be the close alternative.
It seems to be useful for non-XIP, too...
> One question on the use-case profiling and subsequent image rebuild:
>
> What if the use-case did not cover all cases of XIP use?
>
> If a compressed page is attempted to be executed, will the filesystem
> fall back to decompression to RAM and execution from RAM, or will this
> result in a faulty system?
>
> The design choices look real good. Congrats on the achievement.
You probably want to read the paper at
http://ols.fedoraproject.org/OLS/Reprints-2008/hulbert-reprint.pdf
BTW, I regret now not having attended the OLS presentation, because there was
so much emphasis on `XIP' in the description :-)
Fortunately it's been recorded:
http://free-electrons.com/community/videos/conferences/
so I'm gonna watch it right now...
With kind regards,
Geert Uytterhoeven
Software Architect
Sony Techsoft Centre Europe
The Corporate Village · Da Vincilaan 7-D1 · B-1935 Zaventem · Belgium
Phone: +32 (0)2 700 8453
Fax: +32 (0)2 700 8622
E-mail: [email protected]
Internet: http://www.sony-europe.com/
A division of Sony Europe (Belgium) N.V.
VAT BE 0413.825.160 · RPR Brussels
Fortis · BIC GEBABEBB · IBAN BE41293037680010
> Jared, nice work!
Thanks.
> A few questions:
I meant to address these before. Sorry.
> - how does this benchmark compare with cramfs and squashfs in a NAND-only system?
> (or is it just not a good plan to use this with NAND-only? Of course
> I won't get XIP with NAND, I understand that.)
I don't know, I'm interested to find out. I just haven't benchmarked that.
Actually it should work very well as a NAND-only fs. Also you do get
something like XIP with NAND. If you boot an XIP AXFS image on NAND
or a blkdev it will copy that XIP region into RAM and "XIP" it from
there. I think this will make it very good for LiveCDs. Though we
just (minutes ago) realized our testing of that feature was flawed, so
no guarantees.
> - would axfs be suitable as a filesystem on a ram disk?
It could be. I plan on implementing support for brd. That might work nicely.
Jamie Lokier wrote:
> Jared Hulbert wrote:
>> The biggest improvement is in the way AXFS allows for each page to be XIP or
>> not. First, a user collects information about which pages are accessed on a
>> compressed image for each mmap()ed region from /proc/axfs/volume0. That
>> 'profile' is used as an input to the image builder. The resulting image has
>> only the relevant pages uncompressed and XIP. The result is smaller memory
>> sizes and faster launches.
>
> Sounds great, really nice idea.
>
> How does it fare with no MMU? Can the profiler and image builder lay
> out the XIP pages in such a way that no-MMU mmaps can map those regions?
The key for XIP on noMMU would be the ability to store a
file as one complete contiguous chunk. Can AXFS do this?
Regards
Greg
------------------------------------------------------------------------
Greg Ungerer -- Chief Software Dude EMAIL: [email protected]
Secure Computing Corporation PHONE: +61 7 3435 2888
825 Stanley St, FAX: +61 7 3891 3630
Woolloongabba, QLD, 4102, Australia WEB: http://www.SnapGear.com
Hi Jared,
Jared Hulbert wrote:
>> How does it fare with no MMU? Can the profiler and image builder lay
>> out the XIP pages in such a way that no-MMU mmaps can map those regions?
>>
>> No complaint if not, it would be a nice bonus though.
>
> Sorry. I don't believe it will work on no-MMU as is. That said you
> _could_ tweak the mkfs tool to lay mmap()'ed regions down contiguously
> but then if you mmap() an unprofiled region, well that would be bad.
> I suppose you could make axfs_mmap smart enough to handle that. I
> guess the cleanest way would be to just make files lay down
> contiguously, you lose some of the space saving but it would work.
That would be enough I think. If you could manually select
which files are contiguous-and-uncompressed that would be
useful for some too here.
> I'm not planning to get to this anytime soon. But I'd be willing to merge
> patches. Can anybody convince me offline that working on no-MMU
> makes financial sense for my employer? This is getting to be a common
> question. How many noMMU users are out there and why are you so
> interested?
One of those unknown factors, how many are there?
Who knows, pretty much impossible to tell.
One thing for sure is that many people who do non-MMU setups
are interested in XIP to get the space savings. These are very
often small devices with very constrained RAM and flash. (For
whatever it is worth single NOR flash only boards are common in
these smaller form factors :-)
Regards
Greg
On Friday 22 August 2008 05:32, Jared Hulbert wrote:
> > Jared, nice work!
> > - would axfs be suitable as a filesystem on a ram disk?
>
> It could be. I plan on implementing support for brd. That might work
> nicely.
I was going to take a look at this too. With any luck, little effort
should be required, as it looks like you have the block device
support in place?
This filesystem actually should in theory work fairly well with brd,
because then we wouldn't have to bring the data over into pagecache
for frequently used pages but we can retain the compressed storage
for the infrequently used stuff.
I say in theory because I don't know of any serious users (except
kernel testing) of brd :)
On Friday 22 August 2008 00:13, Jared Hulbert wrote:
> > Agreed. I haven't had a good look through it yet, but at a glance it
> > looks pretty neat. The VM side of things looks pretty reasonable
> > (I fear XIP faulting might have another race or two, but that's a
> > core mm issue rather than filesystem specific).
>
> How might I design a test to flush those bugs out? We haven't seen any.
Not quite sure yet. I just fixed a couple of easy ones, but there
could be some more lurking. Don't be too worried about it yet, I
was just musing to myself there really :)
> That would be enough I think. If you could manually select
> which files are contiguous-and-uncompressed that would be
> useful for some too here.
So.... If you don't have an MMU when do you call ->fault? Does the
noMMU code just loop through ->fault()ing all the pages in an mmap()?
> One thing for sure is that many people who do non-MMU setups
> are interested in XIP to get the space savings. These are very
> often small devices with very constrained RAM and flash. (For
> whatever it is worth single NOR flash only boards are common in
> these smaller form factors :-)
True.
Hi Jared,
On Wed, 20 Aug 2008, Jared Hulbert wrote:
> I'd like to get a first round of review on my AXFS filesystem. This is a simple
> See http://axfs.sourceforge.net for more info.
The version in SVN seems to be slightly older than the one you submitted?
Which platform(s) do you use for testing?
I gave AxFS a try on PS3 (ppc64, always use big-endian 64-bit for testing new
code ;-).
When mounting the image, I got the crash below:
| attempt to access beyond end of device
| loop0: rw=0, want=4920, limit=4912
| Unable to handle kernel paging request for data at address 0x00000028
| Faulting instruction address: 0xd000000000037988
| Oops: Kernel access of bad area, sig: 11 [#1]
| SMP NR_CPUS=2 PS3
| Modules linked in: axfs zlib_inflate nfsd exportfs dm_crypt dm_mod sg joydev evdev
| NIP: d000000000037988 LR: d000000000037974 CTR: 0000000000000000
| REGS: c00000000c1e3240 TRAP: 0300 Not tainted (2.6.27-rc4-dirty)
| MSR: 8000000000008032 <EE,IR,DR> CR: 24044482 XER: 20000000
| DAR: 0000000000000028, DSISR: 0000000040000000
| TASK = c0000000068d4e40[1744] 'mount' THREAD: c00000000c1e0000 CPU: 0
| GPR00: d000000000037974 c00000000c1e34c0 d000000000043f30 c00000000c1e36a0
| GPR04: 000000000000013e 000000000000013e c00000000c1e2eb0 0000000000000002
| GPR08: c00000000058de80 0000000000000001 c0000000068d4e40 c00000000c1e34c0
| GPR12: 8000000000008032 c000000000671300 0000000010020000 00000000ff80bec1
| GPR16: 0000000010023dc8 0000000010023db8 00000000ff80bed1 0000000010023e00
| GPR20: 0000000000000001 0000000010023e38 c00000000c1e36a0 c00000000c1d5000
| GPR24: 0000000000000000 0000000000000004 0000000000266000 0000000000000000
| GPR28: 0000000000001000 0000000000000004 d0000000000438e0 c00000000c1e34c0
| NIP [d000000000037988] .axfs_copy_block+0xa0/0x144 [axfs]
| LR [d000000000037974] .axfs_copy_block+0x8c/0x144 [axfs]
| Call Trace:
| [c00000000c1e34c0] [d000000000037974] .axfs_copy_block+0x8c/0x144 [axfs] (unreliable)
| [c00000000c1e3580] [d000000000035f20] .axfs_copy_metadata+0x154/0x1cc [axfs]
| [c00000000c1e3630] [d000000000035fd8] .axfs_verify_eofs_magic+0x40/0xa0 [axfs]
| [c00000000c1e36c0] [d000000000036678] .axfs_fill_super+0x3c0/0x7c8 [axfs]
| [c00000000c1e3780] [c0000000000d1670] .get_sb_bdev+0x154/0x1ec
| [c00000000c1e3860] [d000000000037a94] .axfs_get_sb_bdev+0x34/0x6c [axfs]
| [c00000000c1e38f0] [d000000000035d0c] .axfs_get_sb+0x320/0x394 [axfs]
| [c00000000c1e3a00] [c0000000000d1318] .vfs_kern_mount+0x88/0x108
| [c00000000c1e3ab0] [c0000000000d143c] .do_kern_mount+0x68/0x148
| [c00000000c1e3b60] [c0000000000f0a10] .do_new_mount+0x90/0xf4
| [c00000000c1e3c10] [c0000000000f0c5c] .do_mount+0x1e8/0x23c
| [c00000000c1e3d60] [c000000000114778] .compat_sys_mount+0x21c/0x2ac
| [c00000000c1e3e30] [c0000000000074dc] syscall_exit+0x0/0x40
| Instruction dump:
| 3b600000 409e0084 48000090 80b7001c e87701d0 7c84e392 48000799 e8410028
| 2fbb0000 7c781b78 7f3de040 7ec3b378 <e8980028> 409e002c 7f3dcb78 7c1ae392
| ---[ end trace 7f5bc7e7ad0c4386 ]---
When mounting (also on PS3) an image created on ia32, I get a different crash:
| axfs: wrong magic
^^^^^^^^^^^^^^^^^
| Unable to handle kernel paging request for data at address 0x000003a8
| Faulting instruction address: 0xd0000000000355f0
| Oops: Kernel access of bad area, sig: 11 [#1]
| SMP NR_CPUS=2 PS3
| Modules linked in: axfs zlib_inflate nfsd exportfs dm_crypt dm_mod sg joydev evdev
| NIP: d0000000000355f0 LR: c0000000000d1250 CTR: d0000000000355d0
| REGS: c00000000c0b73d0 TRAP: 0300 Not tainted (2.6.27-rc4-dirty)
| MSR: 8000000000008032 <EE,IR,DR> CR: 24044482 XER: 00000000
| DAR: 00000000000003a8, DSISR: 0000000040000000
| TASK = c000000006814b40[1745] 'mount' THREAD: c00000000c0b4000 CPU: 1
| GPR00: c0000000000d1250 c00000000c0b7650 d000000000043f30 c00000000652f800
| GPR04: c00000000652f8b8 c000000006815480 0000000000000002 c000000006815480
| GPR08: c000000006815480 0000000000000000 00000000000001ea 0000000000000000
| GPR12: d000000000037e68 c000000000671500 0000000010020000 00000000ffc18eee
| GPR16: 0000000010023d98 0000000010023d88 00000000ffc18efe 0000000010023db0
| GPR20: 0000000000000001 0000000010023dc8 c00000000634f280 c0000000065b5000
| GPR24: fffffffffffff000 d00000000003bd38 0000000000000000 d00000000003b278
| GPR28: c00000000652f800 c00000000652f800 c0000000005d66e8 c00000000c0b7650
| NIP [d0000000000355f0] .axfs_kill_super+0x20/0x9c [axfs]
| LR [c0000000000d1250] .deactivate_super+0xd4/0x114
| Call Trace:
| [c00000000c0b7650] [c0000000003d2c48] .down_write+0x5c/0xb8 (unreliable)
| [c00000000c0b76e0] [c0000000000d1250] .deactivate_super+0xd4/0x114
| [c00000000c0b7780] [c0000000000d1690] .get_sb_bdev+0x174/0x1ec
| [c00000000c0b7860] [d000000000037a94] .axfs_get_sb_bdev+0x34/0x6c [axfs]
| [c00000000c0b78f0] [d000000000035d0c] .axfs_get_sb+0x320/0x394 [axfs]
| [c00000000c0b7a00] [c0000000000d1318] .vfs_kern_mount+0x88/0x108
| [c00000000c0b7ab0] [c0000000000d143c] .do_kern_mount+0x68/0x148
| [c00000000c0b7b60] [c0000000000f0a10] .do_new_mount+0x90/0xf4
| [c00000000c0b7c10] [c0000000000f0c5c] .do_mount+0x1e8/0x23c
| [c00000000c0b7d60] [c000000000114778] .compat_sys_mount+0x21c/0x2ac
| [c00000000c0b7e30] [c0000000000074dc] syscall_exit+0x0/0x40
| Instruction dump:
| f9240030 ebebfff0 7d615b78 4e800020 f821ff71 7c0802a6 fba10078 7c7d1b78
| fbe10088 7c3f0b78 f80100a0 e9230470 <e80903a8> 2fa00000 409e0034 e80301d8
| ---[ end trace c19667cc5b6821ab ]---
So I guess some parts are not yet 64-bit or endian clean.
With kind regards,
Geert Uytterhoeven
On Friday 22 August 2008, Geert Uytterhoeven wrote:
> I gave AxFS a try on PS3 (ppc64, always use big-endian 64-bit for testing new
> code ;-).
> When mounting the image, I got the crash below:
>
> | attempt to access beyond end of device
> | loop0: rw=0, want=4920, limit=4912
> | Unable to handle kernel paging request for data at address 0x00000028
Offset 0x28 is buffer_head->b_data, so it seems like sb_bread returns NULL,
which it does for out of range block numbers. I guess axfs_copy_block
should check for that condition, as it can happen on malicious file system
images.
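The kind of check meant here can be shown with a small user-space model. Everything in it is a stand-in: `fake_dev` and `fake_bread()` imitate a block device whose read returns NULL past the end, and `copy_block_checked()` is a hypothetical name, not the actual axfs_copy_block.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLK_SIZE 8u  /* tiny block size for the model */

/* Hypothetical in-memory "device" standing in for a block device. */
struct fake_dev {
    const unsigned char *base;
    size_t               nblocks;
};

/* Like sb_bread() in spirit: NULL for a block past the end. */
const unsigned char *fake_bread(const struct fake_dev *dev, size_t blk)
{
    if (blk >= dev->nblocks)
        return NULL;
    return dev->base + blk * BLK_SIZE;
}

/*
 * Copy one block into dst. Returns 0 on success, -1 when the read
 * fails, instead of dereferencing the NULL result.
 */
int copy_block_checked(const struct fake_dev *dev, size_t blk,
                       unsigned char *dst)
{
    const unsigned char *src;

    assert(dev != NULL && dst != NULL);
    src = fake_bread(dev, blk);
    if (src == NULL)
        return -1;  /* kernel code would return an errno such as -EIO */
    memcpy(dst, src, BLK_SIZE);
    return 0;
}
```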
I agree that this is likely to be caused by an endianness bug.
A good help for finding endianness bugs is to use __be64-like data types
everywhere and test with sparse -D__CHECK_ENDIAN__.
Arnd
> The version in SVN seems to be slightly older than the one you submitted?
Oops. Okay I must have neglected to sync at the very end. Thanks.
I forgot, there is also a git repo at
git://git.infradead.org/users/jehulber/axfs.git
> Which platform(s) do you use for testing?
ARM, x86
> I gave AxFS a try on PS3 (ppc64, always use big-endian 64-bit for testing new
> code ;-).
Smart. Hmmm, if only I had a PS3....
> When mounting the image, I got the crash below:
>
> | attempt to access beyond end of device
> | loop0: rw=0, want=4920, limit=4912
> | [c00000000c1e34c0] [d000000000037974] .axfs_copy_block+0x8c/0x144 [axfs] (unreliable)
> | [c00000000c1e3580] [d000000000035f20] .axfs_copy_metadata+0x154/0x1cc [axfs]
> | [c00000000c1e3630] [d000000000035fd8] .axfs_verify_eofs_magic+0x40/0xa0 [axfs]
> | [c00000000c1e36c0] [d000000000036678] .axfs_fill_super+0x3c0/0x7c8 [axfs]
> | [c00000000c1e3780] [c0000000000d1670] .get_sb_bdev+0x154/0x1ec
> | [c00000000c1e3860] [d000000000037a94] .axfs_get_sb_bdev+0x34/0x6c [axfs]
> | [c00000000c1e38f0] [d000000000035d0c] .axfs_get_sb+0x320/0x394 [axfs]
> | [c00000000c1e3a00] [c0000000000d1318] .vfs_kern_mount+0x88/0x108
> | [c00000000c1e3ab0] [c0000000000d143c] .do_kern_mount+0x68/0x148
> | [c00000000c1e3b60] [c0000000000f0a10] .do_new_mount+0x90/0xf4
> | [c00000000c1e3c10] [c0000000000f0c5c] .do_mount+0x1e8/0x23c
> | [c00000000c1e3d60] [c000000000114778] .compat_sys_mount+0x21c/0x2ac
> | [c00000000c1e3e30] [c0000000000074dc] syscall_exit+0x0/0x40
Yeah we've had this problem before. I'm not so sure this is an endian
bug, though it is likely.
> When mounting (also on PS3) an image created on ia32, I get a different crash:
>
> | axfs: wrong magic
>
> So I guess some parts are not yet 64-bit or endian clean.
Can you run mkfs.axfs on the same trivial directory on both ia32 and
PPC64 and then get me the resulting images?
Greg Ungerer wrote:
>
> Jamie Lokier wrote:
> >Jared Hulbert wrote:
> >>The biggest improvement is in the way AXFS allows for each page to be XIP
> >>or
> >>not. First, a user collects information about which pages are accessed
> >>on a
> >>compressed image for each mmap()ed region from /proc/axfs/volume0. That
> >>'profile' is used as an input to the image builder. The resulting image
> >>has
> >>only the relevant pages uncompressed and XIP. The result is smaller
> >>memory
> >>sizes and faster launches.
> >
> >Sounds great, really nice idea.
> >
> >How does it fare with no MMU? Can the profiler and image builder lay
> >out the XIP pages in such a way that no-MMU mmaps can map those regions?
>
> The key for XIP on noMMU would be the ability to store a
> file as one complete contiguous chunk. Can AXFS do this?
Or more generally, the mmap'd parts of a file.
XIP doesn't mmap the whole file, it just maps the code and rodata.
The data segment is copied.
AXFS's magic for keeping parts of the file uncompressed, but parts
compressed, would be good for this - both for space saving, and also
because decompressing compressed data from NOR is faster than reading
uncompressed data.
-- Jamie
Greg Ungerer wrote:
> One thing for sure is that many people who do non-MMU setups
> are interested in XIP to get the space savings. These are very
> often small devices with very constrained RAM and flash. (For
> whatever it is worth single NOR flash only boards are common in
> these smaller form factors :-)
I'm using XIP on a device with 32MB RAM. The reason I use it is
_partly_ to save RAM, partly because programs start about 10 times
faster (reading NOR flash is slow and I keep the XIP region in RAM)
and partly because it reduces memory fragmentation.
-- Jamie
On Fri, Aug 22, 2008 at 11:13 AM, Jamie Lokier <[email protected]> wrote:
> Greg Ungerer wrote:
>> One thing for sure is that many people who do non-MMU setups
>> are interested in XIP to get the space savings. These are very
>> often small devices with very constrained RAM and flash. (For
>> whatever it is worth single NOR flash only boards are common in
>> these smaller form factors :-)
>
> I'm using XIP on a device with 32MB RAM. The reason I use it is
> _partly_ to save RAM, partly because programs start about 10 times
> faster (reading NOR flash is slow and I keep the XIP region in RAM)
What kind of NOR are you using? That is not what I measure with fast
synchronous burst NORs.
Jared Hulbert wrote:
> On Fri, Aug 22, 2008 at 11:13 AM, Jamie Lokier <[email protected]> wrote:
> > Greg Ungerer wrote:
> >> One thing for sure is that many people who do non-MMU setups
> >> are interested in XIP to get the space savings. These are very
> >> often small devices with very constrained RAM and flash. (For
> >> whatever it is worth single NOR flash only boards are common in
> >> these smaller form factors :-)
> >
> > I'm using XIP on a device with 32MB RAM. The reason I use it is
> > _partly_ to save RAM, partly because programs start about 10 times
> > faster (reading NOR flash is slow and I keep the XIP region in RAM)
>
> What kind of NOR are you using? That is not what I measure with fast
> synchronous burst NORs.
I think the "fast" in "fast synchronous" gives it away :-)
I'm using Spansion MirrorBit S29GL128N, which reads at about 0.6 MByte/s.
Not because they're good, but because that's what the board I'm coding
for has on it. I presume they were cheap and familiar to the board
designers. (There is 32MB of RAM to play with after all.)
So starting a sequence of Busybox processes from a shell script is noticeably
slow if it reads from NOR each time.
Oh, and it's a 166MHz ARM, so it's quite capable of decompressing
faster than the NOR can deliver.
-- Jamie
Jamie Lokier wrote:
> Jared Hulbert wrote:
> > What kind of NOR are you using? That is not what I measure with fast
> > synchronous burst NORs.
>
> I think the "fast" in "fast synchronous" gives it away :-)
>
> I'm using Spansion MirrorBit S29GL128N, which reads at about 0.6 MByte/s.
By the way, what speeds do you get on fast synchronous burst NORs - and
which chips are those?
Thanks,
-- Jamie
Geert Uytterhoeven wrote:
> I gave AxFS a try on PS3 (ppc64, always use big-endian 64-bit for testing new
> code ;-).
> When mounting the image, I got the crash below:
>
> | attempt to access beyond end of device
> | loop0: rw=0, want=4920, limit=4912
> | Unable to handle kernel paging request for data at address 0x00000028
> | Faulting instruction address: 0xd000000000037988
> | Oops: Kernel access of bad area, sig: 11 [#1]
> | SMP NR_CPUS=2 PS3
>
> When mounting (also on PS3) an image created on ia32, I get a different crash:
>
> | axfs: wrong magic
> ^^^^^^^^^^^^^^^^^
> | Unable to handle kernel paging request for data at address 0x000003a8
> | Faulting instruction address: 0xd0000000000355f0
> | Oops: Kernel access of bad area, sig: 11 [#1]
> | SMP NR_CPUS=2 PS3
Geert,
Thanks for giving it a spin, especially on a platform as different from
ours as the PS3.
Before I dig more into what happened, I was wondering if you could tell
me a bit more about your environment, particularly how you supplied the
filesystem to the kernel and your mount commandline (also, if you used a
boot commandline, what it was.)
My first guess would be a ppc64 compiled UML session, but I'd like to be
a bit more sure.
Will Marone
Hi Jared,
Jared Hulbert wrote:
>> That would be enough I think. If you could manually select
>> which files are contiguous-and-uncompressed that would be
>> useful for some too here.
>
> So.... If you don't have an MMU when do you call ->fault? Does the
> noMMU code just loop through ->fault()ing all the pages in an mmap()?
Sort of. It actually just uses a single ->read to bring in
the entire file contents. There are a few limitations on the use
of mmap() for non-mmu. Documentation/nommu-mmap.txt gives
more details. With no MMU it does rely on being able to kmalloc()
a single RAM region big enough to hold the entire file.
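As a user-space analogy only (this is not the kernel's no-MMU code), the behaviour described above amounts to allocating one contiguous buffer for the whole file and filling it with a read. The function name `load_whole_file` is hypothetical, and `malloc()` stands in for `kmalloc()`:

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Allocate one contiguous buffer big enough for the whole file and fill it
 * by reading. Returns the buffer (caller frees it) or NULL on failure;
 * *len_out receives the file size on success.
 */
static void *load_whole_file(const char *path, size_t *len_out)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return NULL;

    /* Find the file size so that a single allocation can hold all of it. */
    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return NULL; }
    long size = ftell(f);
    if (size < 0 || fseek(f, 0, SEEK_SET) != 0) { fclose(f); return NULL; }

    /* The one big allocation: this is the step that can fail or cause
     * memory pressure when free memory is fragmented. */
    void *buf = malloc(size > 0 ? (size_t)size : 1);
    if (!buf) { fclose(f); return NULL; }

    if (fread(buf, 1, (size_t)size, f) != (size_t)size) {
        free(buf);
        fclose(f);
        return NULL;
    }
    fclose(f);
    *len_out = (size_t)size;
    return buf;
}
```

The sketch makes the cost visible: the allocation scales with the file size, not with the part of the file that is actually touched.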
>> One thing for sure is that many people who do non-MMU setups
>> are interested in XIP to get the space savings. These are very
>> often small devices with very constrained RAM and flash. (For
>> whatever it is worth single NOR flash only boards are common in
>> these smaller form factors :-)
>
> True.
Regards
Greg
------------------------------------------------------------------------
Greg Ungerer -- Chief Software Dude EMAIL: [email protected]
SnapGear -- a Secure Computing Company PHONE: +61 7 3435 2888
825 Stanley St, FAX: +61 7 3891 3630
Woolloongabba, QLD, 4102, Australia WEB: http://www.SnapGear.com
On Fri, 22 Aug 2008, Will Marone wrote:
> Geert Uytterhoeven wrote:
> > I gave AxFS a try on PS3 (ppc64, always use big-endian 64-bit for testing
> > new
> > code ;-).
> > When mounting the image, I got the crash below:
> >
> > | attempt to access beyond end of device
> > | loop0: rw=0, want=4920, limit=4912
> > | Unable to handle kernel paging request for data at address 0x00000028
> > | Faulting instruction address: 0xd000000000037988
> > | Oops: Kernel access of bad area, sig: 11 [#1]
> > | SMP NR_CPUS=2 PS3
> >
> > When mounting (also on PS3) an image created on ia32, I get a different
> > crash:
> >
> > | axfs: wrong magic
> > ^^^^^^^^^^^^^^^^^
> > | Unable to handle kernel paging request for data at address 0x000003a8
> > | Faulting instruction address: 0xd0000000000355f0
> > | Oops: Kernel access of bad area, sig: 11 [#1]
> > | SMP NR_CPUS=2 PS
> Geert,
>
> Thanks for giving it a spin, especially on a platform as different from ours
> as the PS3.
>
> Before I dig more into what happened, I was wondering if you could tell me a
> bit more
> about your environment, particularly how you supplied the filesystem to the
> kernel and
> your mount commandline (also, if you used a boot commandline, what it was.)
>
> My first guess would be a ppc64 compiled UML session, but I'd like to be a bit
> more sure.
Nope, I just built axfs as a module and insmoded it. After that:
    mount image.axfs /mnt -o loop -t axfs
So nothing fancy.
With kind regards,
Geert Uytterhoeven
Software Architect
Sony Techsoft Centre Europe
The Corporate Village · Da Vincilaan 7-D1 · B-1935 Zaventem · Belgium
Phone: +32 (0)2 700 8453
Fax: +32 (0)2 700 8622
E-mail: [email protected]
Internet: http://www.sony-europe.com/
A division of Sony Europe (Belgium) N.V.
VAT BE 0413.825.160 · RPR Brussels
Fortis · BIC GEBABEBB · IBAN BE41293037680010
On Fri, 22 Aug 2008, Jared Hulbert wrote:
> > The version in SVN seems to be slightly older than the one you submitted?
>
> Oops. Okay I must have neglected to sync at the very end. Thanks.
>
> I forgot, there is also a git repo at
>
> git://git.infradead.org/users/jehulber/axfs.git
>
> > Which platform(s) do you use for testing?
>
> ARM, x86
Ah, little endian.
From your good relationship with the s390 developers, I had hoped you would
have done some tests on s390 ;-)
> > I gave AxFS a try on PS3 (ppc64, always use big-endian 64-bit for testing new
> > code ;-).
>
> Smart. Hmmm, If only I had a PS3....
>
> > When mounting the image, I got the crash below:
>
> > When mounting (also on PS3) an image created on ia32, I get a different crash:
> >
> > | axfs: wrong magic
> >
> > So I guess some parts are not yet 64-bit or endian clean.
>
> Can you run mkfs.axfs on the same trivial directory on both ia32 and
> PPC64 and then get me the resulting images?
I'll send them by private email.
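For readers following along: a "wrong magic" failure on an image built on a host of the other byte order is the classic symptom of reading an on-media field through a native-endian load. The sketch below shows the usual endian-clean alternative in plain C. The thread does not state which byte order AXFS uses on media, so big-endian is an assumption here, and `SKETCH_MAGIC` is a made-up value, not the real AXFS magic. In kernel code the same job would normally be done with helpers such as `be32_to_cpu()`.

```c
#include <stdint.h>

/* Hypothetical magic value ("AXFS" in ASCII); NOT the real AXFS magic. */
#define SKETCH_MAGIC 0x41584653u

/* Decode a 32-bit big-endian field byte by byte. The result is the same on
 * little-endian and big-endian hosts, and on 32-bit and 64-bit ones. */
static uint32_t get_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Returns 1 if the first four bytes of the buffer hold the expected magic. */
static int magic_ok(const unsigned char *superblock)
{
    return get_be32(superblock) == SKETCH_MAGIC;
}
```

Because the value is assembled from individual bytes, an image written on ia32 and one written on ppc64 would be checked identically, provided mkfs writes the field in the same fixed order on both.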
With kind regards,
Geert Uytterhoeven
Geert Uytterhoeven wrote:
> From your good relationship with the s390 developers, I had hoped you would
> have done some tests on s390 ;-)
Haha, we let you sort out the endianness issues first and then take the
easy path :-). We haven't tried it so far.
Greg Ungerer wrote:
> Sort of. It actually just uses a single ->read to bring in
> the entire file contents. There is a few limitations on the use
> of mmap() for non-mmu. Documentation/nommu-mmap.txt gives
> more details. With no MMU it does rely on being able to kmalloc()
> a single RAM region big enough to hold the entire file.
That's unfortunate. If you're using FDPIC-ELF or BFLT-XIP, you really want
to kmalloc() one region for code (i.e. mmap not the whole file), and
a separate one for data. Asking for a single larger region sometimes
creates much higher memory pressure while kmalloc() attempts to
defragment by evicting everything.
But that's fiddly to do right in general.
The natural thing for AXFS to do to support no-MMU FDPIC-ELF or
BFLT-XIP is to store the code segment uncompressed and contiguous, and
the data segment however the filesystem prefers. The profiling
information needed to work out where these are is readily available from
the mmap() calls, which are always the same when an executable is run.
-- Jamie
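A minimal sketch of the idea above: turn a list of recorded mmap() calls into a per-page "store uncompressed for XIP" map. The record layout, the 4096-byte page size and the function name are all hypothetical. This is not the AXFS profile format or the mkfs.axfs logic, just one way the selection could be expressed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumed page size */

/* Hypothetical profile entry: one recorded mmap() of a file. */
struct mmap_record {
    uint64_t offset;     /* file offset of the mapping, in bytes */
    uint64_t length;     /* length of the mapping, in bytes */
    int executable;      /* nonzero if the mapping was executable */
};

/*
 * Set xip[p] = 1 for every file page covered by an executable mapping and
 * 0 for all other pages. Returns the number of pages marked.
 */
static size_t mark_xip_pages(const struct mmap_record *recs, size_t nrecs,
                             unsigned char *xip, size_t npages)
{
    size_t marked = 0;

    memset(xip, 0, npages);
    for (size_t i = 0; i < nrecs; i++) {
        if (!recs[i].executable || recs[i].length == 0)
            continue;   /* data mappings stay compressed in this sketch */

        uint64_t first = recs[i].offset / SKETCH_PAGE_SIZE;
        uint64_t last = (recs[i].offset + recs[i].length - 1) / SKETCH_PAGE_SIZE;

        for (uint64_t p = first; p <= last && p < npages; p++) {
            if (!xip[p]) {
                xip[p] = 1;
                marked++;
            }
        }
    }
    return marked;
}
```

For a no-MMU target the image builder would additionally have to lay the marked pages out contiguously, which this sketch does not attempt.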
On Fri, 2008-08-22 at 09:51 -0700, Jared Hulbert wrote:
> Can you run mkfs.axfs on the same trivial directory on both ia32 and
> PPC64 and then get me the resulting images?
git.infradead.org is a big-endian box, and I know you have an account
there...
--
David Woodhouse Open Source Technology Centre
[email protected] Intel Corporation
Hi Jamie,
Jamie Lokier wrote:
> Greg Ungerer wrote:
>> Sort of. It actually just uses a single ->read to bring in
>> the entire file contents. There is a few limitations on the use
>> of mmap() for non-mmu. Documentation/nommu-mmap.txt gives
>> more details. With no MMU it does rely on being able to kmalloc()
>> a single RAM region big enough to hold the entire file.
>
> That's unfortunate, if you're using FDPIC-ELF or BFLT-XIP, you really want
> to kmalloc() one region for code (i.e. mmap not the whole file), and
> one separate for data.
That is what the BFLT loader does. For the XIP case it mmap()s
the text directly from the file, and then mmap()s a second region
for the data/bss (reading the data into that region).
I was referring to general mmap() of a file case above, not
the exec path.
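The two-region approach described above can be sketched in user space with POSIX calls: map the text straight from the file, then create a second, separate region for data and read the data into it. The layout used here (text at offset 0, data at a caller-supplied offset) is a simplification and not the real bFLT format. The struct and function names are hypothetical, and the text is mapped read-only rather than executable to keep the example harmless.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS and pread() under strict modes */
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result type: the two independent regions. */
struct two_regions {
    void *text;   /* mapped directly from the file */
    void *data;   /* separate private memory, filled by reading */
};

/*
 * Map text_len bytes of text from offset 0 of fd, then allocate a second
 * region of data_len bytes and read the data segment (at data_off) into it.
 * Returns 0 on success, -1 on failure with nothing left mapped.
 */
static int map_text_and_data(int fd, size_t text_len, off_t data_off,
                             size_t data_len, struct two_regions *out)
{
    out->text = mmap(NULL, text_len, PROT_READ, MAP_PRIVATE, fd, 0);
    if (out->text == MAP_FAILED)
        return -1;

    out->data = mmap(NULL, data_len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (out->data == MAP_FAILED) {
        munmap(out->text, text_len);
        return -1;
    }

    /* Fill the data region by reading, as opposed to mapping it. */
    if (pread(fd, out->data, data_len, data_off) != (ssize_t)data_len) {
        munmap(out->data, data_len);
        munmap(out->text, text_len);
        return -1;
    }
    return 0;
}
```

The point of the split is that neither allocation has to be as large as the whole file, and on an XIP-capable filesystem the text mapping need not consume RAM at all.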
> Asking for a single larger region sometimes
> creates much higher memory pressure while kmalloc() attempts to
> defragment by evicting everything.
Sure.
> But that's fiddly to do right in general.
>
> The natural thing for AXFS to do to support no-MMU FDPIC-ELF or
> BFLT-XIP is store the code segment uncompressed and contiguous, and
> the data segment however the filesystem prefers, and the profiling
> information to work out where these are is readily available from the
> mmap() calls, which are always the same when an executable is run.
Yep.
Regards
Greg
On Fri, 22 Aug 2008, Jared Hulbert wrote:
> > The version in SVN seems to be slightly older than the one you submitted?
>
> Oops. Okay I must have neglected to sync at the very end. Thanks.
>
> I forgot, there is also a git repo at
>
> git://git.infradead.org/users/jehulber/axfs.git
>
> > Which platform(s) do you use for testing?
>
> ARM, x86
>
> > I gave AxFS a try on PS3 (ppc64, always use big-endian 64-bit for testing new
> > code ;-).
>
> Smart. Hmmm, If only I had a PS3....
I heard you got one? ;-)
> > When mounting the image, I got the crash below:
> >
> > | attempt to access beyond end of device
> > | loop0: rw=0, want=4920, limit=4912
Interestingly, it also doesn't work on UserModeLinux (x86, 32-bit):
| attempt to access beyond end of device
| loop0: rw=0, want=24, limit=16
|
| EIP: 0073:[<0811ec67>] CPU: 0 Not tainted ESP: 007b:19515c38 EFLAGS: 00210212
| Not tainted
| EAX: 00000000 EBX: 00001000 ECX: 19484aa0 EDX: 190d9f0c
| ESI: 195ee000 EDI: 19515cd0 EBP: 19515c6c DS: 007b ES: 007b
| 08247af0: [<08069ba3>] show_regs+0xb4/0xb9
| 08247b1c: [<080591ee>] segv+0x222/0x23a
| 08247bbc: [<08059296>] segv_handler+0x90/0x9a
| 08247c68: [<080649b8>] sig_handler_common+0x63/0x72
| 08247ce0: [<08064cac>] sig_handler+0x31/0x3d
| 08247cec: [<08064c0b>] handle_signal+0x4c/0x7a
| 08247d0c: [<08066327>] hard_handler+0xf/0x14
| 08247d1c: [<005c0420>] 0x5c0420
|
| Kernel panic - not syncing: Kernel mode fault at addr 0x14, ip 0x811ec67
|
| EIP: 0073:[<4010a44e>] CPU: 0 Not tainted ESP: 007b:bfa323a0 EFLAGS: 00200246
| Not tainted
| EAX: ffffffda EBX: 080595f8 ECX: 080595c8 EDX: 080595d8
| ESI: c0ed0000 EDI: 00000000 EBP: bfa323d8 DS: 007b ES: 007b
| 08247a5c: [<08069ba3>] show_regs+0xb4/0xb9
| 08247a88: [<08059462>] panic_exit+0x25/0x3b
| 08247a9c: [<08083642>] notifier_call_chain+0x27/0x4c
| 08247ac4: [<0808367e>] __atomic_notifier_call_chain+0x17/0x19
| 08247ad4: [<08083695>] atomic_notifier_call_chain+0x15/0x17
| 08247af0: [<0806fd87>] panic+0x52/0xd8
| 08247b10: [<080591fc>] segv+0x230/0x23a
| 08247bbc: [<08059296>] segv_handler+0x90/0x9a
| 08247c68: [<080649b8>] sig_handler_common+0x63/0x72
| 08247ce0: [<08064cac>] sig_handler+0x31/0x3d
| 08247cec: [<08064c0b>] handle_signal+0x4c/0x7a
| 08247d0c: [<08066327>] hard_handler+0xf/0x14
| 08247d1c: [<005c0420>] 0x5c0420
Commandline is `mount image.axfs /mnt -o loop -t axfs'.
Is there something wrong with the axfs version you submitted, or with
mkfs.axfs?
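A note on reading the "attempt to access beyond end of device" lines: `want` and `limit` are counted in 512-byte sectors. In both logs in this thread (want=4920, limit=4912 and want=24, limit=16) the difference is 8 sectors, i.e. 4096 bytes. That is consistent with a 4 KiB read that starts exactly at the end of the image, though the thread does not establish the actual cause. The helper names below are hypothetical and only illustrate the arithmetic and the kind of bounds check involved.

```c
#include <stdint.h>

#define SECTOR_SIZE 512u   /* unit used by the want= and limit= values */

/* First sector past the end of a read of `len` bytes at byte `offset`. */
static uint64_t end_sector(uint64_t offset, uint64_t len)
{
    return (offset + len + SECTOR_SIZE - 1) / SECTOR_SIZE;
}

/* Returns 1 if the read stays inside a device of `limit` sectors. */
static int read_fits(uint64_t offset, uint64_t len, uint64_t limit)
{
    return len > 0 && end_sector(offset, len) <= limit;
}
```

For the UML case, a 16-sector device is 8192 bytes long; a 4096-byte read at byte offset 8192 ends at sector 24 and is rejected, matching want=24, limit=16.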
With kind regards,
Geert Uytterhoeven