A long time ago, when I was a kid, I had a dream. It went like this:
I am waking up in the twenty-first century and start my computer.
After completing the boot sequence, I start top to find that my memory is
equal to the total disk capacity. What's more, there is no more swap.
Apps are executed in place, as if already loaded.
Physical RAM is used to cache slower storage RAM, much the same as the CPU
cache RAM caches slower physical RAM.
When I woke up, I was really looking forward to the new century.
Sadly, the current way of dealing with memory can at best be described
as schizophrenic, the reason again being that we are still running in
last-century mode.
Wouldn't it be nice to take advantage of today's 64-bit archs and TB drives,
and run a more modern way of life w/o this memory/storage split personality?
All comments, other than "dream on", are most welcome!
Thanks!
--
Al
Al Boldi wrote:
> Apps are executed inplace, as if already loaded.
> Physical RAM is used to cache slower storage RAM, much the same as the CPU
> cache RAM caches slower physical RAM.
Linux and most other OSes have done that since.. oh, 20 years at least?
It's called "demand paging". The RAM is simply a cache of the
executable file on disk. The complicated-looking page fault mechanism
that you see is simply the cache management logic. In what way is
your vision different from demand paging?
> my memory is equal to total disk-capacity. What's more, there is no
> more swap. [...] Physical RAM is used to cache slower storage RAM,
> much the same as the CPU cache RAM caches slower physical RAM.
Windows has had that since, oh, Windows 95?
It's called "on-demand swap space", or making all the disk's free
space be usable for paging. The physical RAM is simply a cache of the
virtual "storage RAM". In what way is your vision different from
on-demand swap?
> Sadly, the current way of dealing with memory can at best only be described
> as schizophrenic. Again the reason being, that we are still running in the
> last-century mode.
>
> Wouldn't it be nice to take advantage of todays 64bit archs and TB drives,
> and run a more modern way of life w/o this memory/storage split personality?
In what way does your vision _behave_ any differently than what we have?
In my mind, "physical RAM is used to cache slower storage RAM" behaves
the same as demand paging, even if the terminology is different. The
code I guess you're referring to in the kernel, to handle paging to
storage, is simply one kernel's method of implementing that kind of cache.
It's not clear from anything you said how the computer in your dream
would behave any differently to the ones we've got now.
Can you describe that difference, if there is one?
Is it just an implementation idea, where the kernel does less of the
page caching logic and some bit of hardware does more of it
automatically? Given how little time is taken in kernel to do that,
and how complex the logic has to be for efficient caching decisions
between RAM and storage, it seems likely that any simple hardware
solution would behave the same, but slower.
-- Jamie
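For illustration, here is a minimal user-space sketch of the demand paging Jamie describes, assuming an ordinary file named ./somefile: mmap() only records the mapping, and each page is read from disk the first time it is touched.

/* Minimal demand-paging illustration from user space.  The file name
 * is an assumption; any ordinary file will do. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
	int fd = open("./somefile", O_RDONLY);
	struct stat st;

	if (fd < 0 || fstat(fd, &st) < 0) {
		perror("open/fstat");
		return 1;
	}

	/* No I/O happens here; the kernel only records the mapping. */
	unsigned char *p = mmap(NULL, st.st_size, PROT_READ,
				MAP_PRIVATE, fd, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/* Each first touch of a page faults it in from the file --
	 * the "RAM as a cache of the file on disk" behaviour. */
	long sum = 0;
	for (off_t i = 0; i < st.st_size; i += 4096)
		sum += p[i];

	printf("touched %ld pages, sum %ld\n",
	       (long)((st.st_size + 4095) / 4096), sum);
	munmap(p, st.st_size);
	close(fd);
	return 0;
}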
Al Boldi wrote:
>A long time ago, when i was a kid, I had dream. It went like this:
>
>I am waking up in the twenty-first century and start my computer.
>After completing the boot sequence, I start top to find that my memory is
>equal to total disk-capacity. What's more, there is no more swap.
>Apps are executed inplace, as if already loaded.
>Physical RAM is used to cache slower storage RAM, much the same as the CPU
>cache RAM caches slower physical RAM.
>
>
>
I'm sure you can find a 4GB disk on ebay.
>When I woke up, I was really looking forward for the new century.
>
>Sadly, the current way of dealing with memory can at best only be described
>as schizophrenic. Again the reason being, that we are still running in the
>last-century mode.
>
>Wouldn't it be nice to take advantage of todays 64bit archs and TB drives,
>and run a more modern way of life w/o this memory/storage split personality?
>
>
Perhaps you'd be interested in single-level store architectures, where
no distinction is made between memory and storage. IBM uses it in one
(or maybe more) of their systems. A particularly interesting example is
http://www.eros-os.org.
--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
On So 21-01-06 21:08:41, Al Boldi wrote:
> A long time ago, when i was a kid, I had dream. It went like this:
>
> I am waking up in the twenty-first century and start my computer.
> After completing the boot sequence, I start top to find that my memory is
> equal to total disk-capacity. What's more, there is no more swap.
> Apps are executed inplace, as if already loaded.
> Physical RAM is used to cache slower storage RAM, much the same as the CPU
> cache RAM caches slower physical RAM.
...and then you try to execute mozilla in place, and your dream slowly
turns into nightmare, as letters start to appear, pixel by pixel...
[swap is backing store for anonymous memory. Think about it. You need
swap as long as you support malloc. You could always provide filename
with malloc, but hey, that starts to look like IBM mainframe. Plus
ability to power-cycle the machine and *have* it boot (not continue
where it left off) is a lifesaver.]
Pavel
--
Thanks, Sharp!
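Pavel's aside about providing a filename with malloc() is roughly what a file-backed mmap(MAP_SHARED) already gives you. A minimal sketch, with a made-up path and size, just to show the shape of the idea:

/* "malloc with a filename": dirty pages go back to the named file via
 * normal page-cache writeback instead of to swap.  Path and size are
 * illustrative assumptions. */
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static void *file_backed_alloc(const char *path, size_t size)
{
	int fd = open(path, O_RDWR | O_CREAT, 0600);
	void *p;

	if (fd < 0)
		return NULL;
	if (ftruncate(fd, size) < 0) {	/* reserve the backing store */
		close(fd);
		return NULL;
	}
	p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd);			/* the mapping keeps the file alive */
	return p == MAP_FAILED ? NULL : p;
}

int main(void)
{
	char *buf = file_backed_alloc("/tmp/backing-demo", 1 << 20);

	if (!buf)
		return 1;
	strcpy(buf, "dirty pages here end up in the file, not in swap");
	munmap(buf, 1 << 20);
	return 0;
}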
On Sat, Jan 21, 2006 at 09:08:41PM +0300, Al Boldi wrote:
> A long time ago, when i was a kid, I had dream. It went like this:
>
> I am waking up in the twenty-first century and start my computer.
> After completing the boot sequence, I start top to find that my memory is
> equal to total disk-capacity. What's more, there is no more swap.
> Apps are executed inplace, as if already loaded.
> Physical RAM is used to cache slower storage RAM, much the same as the CPU
> cache RAM caches slower physical RAM.
>
> When I woke up, I was really looking forward for the new century.
>
> Sadly, the current way of dealing with memory can at best only be described
> as schizophrenic. Again the reason being, that we are still running in the
> last-century mode.
>
> Wouldn't it be nice to take advantage of todays 64bit archs and TB drives,
> and run a more modern way of life w/o this memory/storage split personality?
>
> All comments, other than "dream on", are most welcome!
Unfortunately, with Linux/Unix you are only going to get a "dream on".
Look at IBM's AS/400 with OS/400. It does that sort of thing. I am
sure there are others.
How do you handle a hot-plug device that is being brought online for
something like a backup? Assume it would then be removed when the backup
is complete. In your dream world, anon-pages could be written to the
device. As the backup proceeds and the disk fills, would those pages
be migrated to a different backing store, any processes which wrote
to the device killed, or would the backup be forced to abort/move to a
different device? When the administrator then goes to remove the volume,
do they need to wait for all those pages to be migrated to a different
backing store?
Now assume a flash device gets added to the system. Would we allow
paging to happen to that device? If so, how does the administrator or
user control the added wear-and-tear on the device?
Now consider a system that has 8K drives (don't laugh, I know of one
system with more). How do we keep track of the pages on those devices?
Right now, there is enough information in the space of a pte (64 bits
on ia64, not sure about other archs) to locate that page on the device.
For your proposal to work, we need a better way to track pages when they
are on backing store. It would need to be nearly unlimited in number
of devices and device offset.
On one large machine I know of, the MTBF for at least one of the drives
in the system is around 3 hours. With your proposal, would we reboot
every time some part of disk fails to be available or would we need
to keep track of where those pages are and kill all user processes on
that device? Imagine the amount of kernel memory required to track all
those pages of anonymous memory. You would end up with a situation where
adding a disk to the system would force you to consume some substantial
portion of kernel memory. Would we allow kernel pages to be migrated to
backing store as well? If so, how would we handle a failure of backing
devices with kernel pages? Would users accept that the longest one of
their jobs can run without being terminated is around 3 hours?
Your simple world introduces a level of complexity to the kernel which
is nearly unmanageable. Basically, you are asking the system to intuit
your desires. The swap device/file scheme allows an administrator to
control some aspects of their system while giving the kernel developer
a reasonable number of variables to work with. That, at least to me,
does not sound schizophrenic, but rather very reasonable.
Sorry for raining on your parade,
Robin Holt
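To make Robin's pte-space argument concrete, here is a toy encoding that packs a device index and a page offset into one 64-bit word. It is not the kernel's actual swp_entry_t layout (real entries also lose bits to flags), just the arithmetic that bounds the number of devices and the offset range:

/* Toy swap-entry encoding: a device index plus a page offset packed
 * into 64 bits.  With 7 device bits you top out at 128 swap devices,
 * which is how an "8K drives" system stops fitting. */
#include <stdint.h>
#include <stdio.h>

#define DEV_BITS	7
#define OFFSET_BITS	(64 - DEV_BITS)
#define MAX_DEVICES	(1ULL << DEV_BITS)
#define MAX_OFFSET	(1ULL << OFFSET_BITS)

static uint64_t make_entry(unsigned dev, uint64_t page_offset)
{
	return ((uint64_t)dev << OFFSET_BITS) | (page_offset & (MAX_OFFSET - 1));
}

static unsigned entry_dev(uint64_t e)     { return e >> OFFSET_BITS; }
static uint64_t entry_offset(uint64_t e)  { return e & (MAX_OFFSET - 1); }

int main(void)
{
	uint64_t e = make_entry(5, 123456);

	printf("devices: %llu, pages per device: %llu\n",
	       (unsigned long long)MAX_DEVICES,
	       (unsigned long long)MAX_OFFSET);
	printf("entry %#llx -> dev %u, offset %llu\n",
	       (unsigned long long)e, entry_dev(e),
	       (unsigned long long)entry_offset(e));
	return 0;
}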
On 1/21/06, Al Boldi <[email protected]> wrote:
> A long time ago, when i was a kid, I had dream. It went like this:
[snip]
FWIW, Mac OS X is one step closer to your vision than the typical
Linux distribution: It has a directory for swapfiles -- /var/vm -- and
it creates new swapfiles there as needed. (It used to be that each
swapfile would be 80MB, but the iMac next to me just has a single 64MB
swapfile, so maybe Mac OS 10.4 does something different now.)
--On January 22, 2006 11:55:37 AM -0800 "Barry K. Nathan"
<[email protected]> wrote:
> On 1/21/06, Al Boldi <[email protected]> wrote:
>> A long time ago, when i was a kid, I had dream. It went like this:
> [snip]
>
> FWIW, Mac OS X is one step closer to your vision than the typical
> Linux distribution: It has a directory for swapfiles -- /var/vm -- and
> it creates new swapfiles there as needed. (It used to be that each
> swapfile would be 80MB, but the iMac next to me just has a single 64MB
> swapfile, so maybe Mac OS 10.4 does something different now.)
/var/vm/swap*
64M swapfile0
64M swapfile1
128M swapfile2
256M swapfile3
512M swapfile4
512M swapfile5
1.5G total
However, only the first 5 are in use; the 6th just represents the peak swap
usage on this machine. This is on 10.4.
On Sunday 22 January 2006 23:23, Michael Loftis wrote:
> --On January 22, 2006 11:55:37 AM -0800 "Barry K. Nathan"
>
> <[email protected]> wrote:
> > On 1/21/06, Al Boldi <[email protected]> wrote:
> >> A long time ago, when i was a kid, I had dream. It went like this:
> >
> > [snip]
> >
> > FWIW, Mac OS X is one step closer to your vision than the typical
> > Linux distribution: It has a directory for swapfiles -- /var/vm -- and
> > it creates new swapfiles there as needed. (It used to be that each
> > swapfile would be 80MB, but the iMac next to me just has a single 64MB
> > swapfile, so maybe Mac OS 10.4 does something different now.)
Just as a curiosity... does anyone have any guesses as to the runtime
performance cost of hosting one or more swap files (which thanks to on demand
creation and growth are presumably built of blocks scattered around the disk)
versus having one or more simple contiguous swap partitions?
I think it's probably a given that swap partitions are better; I'm just
curious how much better they might actually be.
Cheers,
Chase
On 1/22/06, Chase Venters <[email protected]> wrote:
> Just as a curiosity... does anyone have any guesses as to the runtime
> performance cost of hosting one or more swap files (which thanks to on demand
> creation and growth are presumably built of blocks scattered around the disk)
> versus having one or more simple contiguous swap partitions?
>
> I think it's probably a given that swap partitions are better; I'm just
> curious how much better they might actually be.
If you google "mac os x swap partition", you'll find benchmarks from
several years ago. (Although, those benchmarks are with a partition
dedicated to the dynamically created swap files. It does more or less
ensure that the files are contiguous though.) Mac OS X was *much* more
of a dog back then, in terms of performance, so I don't know how
relevant those benchmarks are nowadays, but it might be a starting
point for answering your question.
--
-Barry K. Nathan <[email protected]>
Chase Venters wrote:
> Just as a curiosity... does anyone have any guesses as to the
> runtime performance cost of hosting one or more swap files (which
> thanks to on demand creation and growth are presumably built of
> blocks scattered around the disk) versus having one or more simple
> contiguous swap partitions?
> I think it's probably a given that swap partitions are better; I'm just
> curious how much better they might actually be.
When programs must access files in addition to swapping, and that
includes demand-paged executable files, swap files have the
_potential_ to be faster because they provide opportunities to use the
disk nearer the files which are being accessed. This is more so if
all of the filesystem's free space is available for swapping. A swap
partition in this scenario forces the disk head to move back and forth
between the swap partition and the filesystem.
-- Jamie
On 1/22/06, Michael Loftis <[email protected]> wrote:
>
> > FWIW, Mac OS X is one step closer to your vision than the typical
> > Linux distribution: It has a directory for swapfiles -- /var/vm -- and
> > it creates new swapfiles there as needed. (It used to be that each
> > swapfile would be 80MB, but the iMac next to me just has a single 64MB
> > swapfile, so maybe Mac OS 10.4 does something different now.)
> /var/vm/swap*
> 64M swapfile0
> 64M swapfile1
> 128M swapfile2
> 256M swapfile3
> 512M swapfile4
> 512M swapfile5
> 1.5G total
>
Linux also supports multiple swap files. But these are more
beneficial if there is more than one disk in the system, so that I/O
can be done in parallel. These swap files may be activated at run time
based on some criteria.
Regards
Ram Gupta
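As a sketch of the multiple-swap-areas point Ram makes, swapon(2) lets you activate several areas at the same priority, which the kernel then uses round-robin across disks; the paths below are made up, the areas must already have been prepared with mkswap(8), and this needs root:

/* Activate two swap areas, one per disk, at equal priority so pages
 * are allocated from them round-robin (the parallel I/O case). */
#include <stdio.h>
#include <sys/swap.h>

static int add_swap(const char *path, int prio)
{
	int flags = SWAP_FLAG_PREFER |
		    ((prio << SWAP_FLAG_PRIO_SHIFT) & SWAP_FLAG_PRIO_MASK);

	if (swapon(path, flags) < 0) {
		perror(path);
		return -1;
	}
	return 0;
}

int main(void)
{
	/* Equal priority => round-robin allocation across both disks. */
	add_swap("/disk1/swapfile", 5);
	add_swap("/disk2/swapfile", 5);
	return 0;
}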
El Mon, 23 Jan 2006 09:05:41 -0600,
Ram Gupta <[email protected]> escribi?:
> Linux also supports multiple swap files . But these are more
There's in fact a "dynamic swap" tool which apparently
does what Mac OS X does: http://dynswapd.sourceforge.net/
However, I doubt the approach is really useful. If you need that much
swap space, you're going well beyond the capabilities of the machine.
In fact, I bet that most of the cases of machines needing too much
memory will be because of bugs in the programs, and OOM'ing would be
a better solution.
On Mon, 23 Jan 2006, Diego Calleja wrote:
> El Mon, 23 Jan 2006 09:05:41 -0600,
> Ram Gupta <[email protected]> escribi?:
>
>> Linux also supports multiple swap files . But these are more
>
> There're in fact a "dynamic swap" tool which apparently
> does what mac os x do: http://dynswapd.sourceforge.net/
>
> However, I doubt the approach is really useful. If you need that much
> swap space, you're going well beyond the capabilities of the machine.
> In fact, I bet that most of the cases of machines needing too much
> memory will be because of bugs in the programs and OOM'ing would be
> a better solution.
You have roughly 2 GB of dynamic address-space available to each
task (stuff that's not the kernel and not the runtime libraries).
You can easily have 500 tasks, even RedHat out-of-the-box creates
about 60 tasks. That's 1,000 GB of potential swap-space required
to support this. This is not beyond the capabilities of a 32-bit
machine with a fast front-side bus and fast I/O (like wide SCSI).
Some persons tend to forget that 32-bit address space is available
to every user, some is shared, some is not. A reasonable rule-of-
thumb is to provide enough swap-space to duplicate the address-
space of every potential task.
Cheers,
Dick Johnson
Penguin : Linux version 2.6.13.4 on an i686 machine (5589.54 BogoMips).
Warning : 98.36% of all statistics are fiction.
linux-os (Dick Johnson) wrote:
> On Mon, 23 Jan 2006, Diego Calleja wrote:
> > However, I doubt the approach is really useful. If you need that much
> > swap space, you're going well beyond the capabilities of the machine.
> > In fact, I bet that most of the cases of machines needing too much
> > memory will be because of bugs in the programs and OOM'ing would be
> > a better solution.
>
> You have roughly 2 GB of dynamic address-space avaliable to each
> task (stuff that's not the kernel and not the runtime libraries).
> You can easily have 500 tasks, even RedHat out-of-the-box creates
> about 60 tasks. That's 1,000 GB of potential swap-space required
> to support this.
And how many machines is it useful to use that much swap-space on?
> This is not beyond the capabilites of a 32-bit
> machine with a fast front-side bus and fast I/O (like wide SCSI).
Anything but the most expensively RAM-equipped machine would be stuck
in a useless swap-storm, if it's got 1000GB of active swap space
and only a relatively tiny amount of physical RAM (e.g. 16GB). The
same is true if only, say, 10% of the swap space is in active use.
Wide SCSI isn't fast enough to make that useful.
I think that was the point Diego was making: you can use that much
swap space, but by the time you do, whatever task you hoped to
accomplish won't get anywhere due to the swap-storm.
> Some persons tend to forget that 32-bit address space is available
> to every user, some is shared, some is not. A reasonable rule-of-
> thumb is to provide enough swap-space to duplicate the address-
> space of every potential task.
I think that's a ridiculous rule of thumb. Not least because (a) even
the biggest drive available (e.g. 1TB) doesn't provide that much
swap-space, and (b) if you're actively using only a tiny fraction of
that, your machine has already become uselessly slow - even root
logins and command prompts don't work under those conditions.
-- Jamie
Robin Holt wrote:
> On Sat, Jan 21, 2006 at 09:08:41PM +0300, Al Boldi wrote:
> >
> > Wouldn't it be nice to take advantage of todays 64bit archs and TB
> > drives, and run a more modern way of life w/o this memory/storage split
> > personality?
>
> Your simple world introduces a level of complexity to the kernel which
> is nearly unmanageable. Basically, you are asking the system to intuit
> your desires. The swap device/file scheme allows an administrator to
> control some aspects of their system while giving the kernel developer
> a reasonable number of variables to work with. That, at least to me,
> does not sound schizophrenic, but rather very reasonable.
>
> Sorry for raining on your parade,
Thanks for your detailed response, it rather felt like a fresh breeze.
Really, I was thinking more of a step-by-step rather than an all-or-none
approach: something that would involve tmpfs merged with swap, mapped into
a linear address space limited by the arch's bits, and everything else
connected as an archive.
The idea here is to run inside swap instead of using it as an addon.
In effect running inside memory cached by physical RAM.
Wouldn't something like this at least represent a simple starting point?
Thanks!
--
Al
On Mon, 23 Jan 2006 21:03:06 +0300, Al Boldi said:
> The idea here is to run inside swap instead of using it as an addon.
> In effect running inside memory cached by physical RAM.
>
> Wouldn't something like this at least represent a simple starting point?
We *already* treat RAM as a cache for the swap space and other backing store
(for instance, paging in executable code from a file), if you're looking at
it from the 30,000 foot fly-over...
However, it quickly digresses from a "simple starting point" when you try to
get decent performance out of it, even when people are doing things that tend
to make your algorithm fold up. A machine with a gigabyte of memory has on the
order of a quarter million 4K pages - which page are you going to move out to
swap to make room? And if you guess wrong, multiple processes will stall as
the system starts to thrash. (In fact, "thrashing" is just a short way of
saying "consistently guessing wrong as to which pages will be needed soon"....)
But hey, if you've got a new page replacement algorithm that performs better,
feel free to post the code.. ;)
Example of why it's a pain in the butt:
A process does a "read(foo, &buffer, 65536);". buffer is declared as 16
contiguous 4K pages, none of which are currently in memory. How many pages do
you have to read in, and at what point do you issue the I/O? (hint - work this
problem for a device that's likely to return 64K of data, and again for a
device that has a high chance of only returning 2K of data.....)
But yeah, other than all the cruft like that, it's simple. :)
On Mon, Jan 23, 2006 at 01:40:46PM -0500, [email protected] wrote:
> A process does a "read(foo, &buffer, 65536);". buffer is declared as 16
> contiguous 4K pages, none of which are currently in memory. How many pages do
> you have to read in, and at what point do you issue the I/O? (hint - work this
> problem for a device that's likely to return 64K of data, and again for a
> device that has a high chance of only returning 2K of data.....)
Actually, that is something that the vm could optimize out of the picture
entirely -- it is a question of whether it is worth the added complexity
to handle such a case. copy_to_user already takes a slow path when it hits
the page fault (we do a lookup on the exception handler already) and could
test if an entire page is being overwritten, and if so proceed to destroy
the old mapping and use a fresh page from ram.
That said, for the swap case, it probably happens so rarely that the extra
code isn't worth it. glibc is already using mmap() in place of read() for
quite a few apps, so I'm not sure how much low hanging fruit there is left.
If someone has an app that's read() heavy, it is probably easier to convert
it to mmap() -- the exception being pipes and sockets which can't. We need
numbers. =-)
-ben
--
"Ladies and gentlemen, I'm sorry to interrupt, but the police are here
and they've asked us to stop the party." Don't Email: <[email protected]>.
On Mon, 23 Jan 2006 14:26:06 EST, Benjamin LaHaise said:
> Actually, that is something that the vm could optimize out of the picture
> entirely -- it is a question of whether it is worth the added complexity
> to handle such a case. copy_to_user already takes a slow path when it hits
> the page fault (we do a lookup on the exception handler already) and could
> test if an entire page is being overwritten, and if so proceed to destroy
> the old mapping and use a fresh page from ram.
That was my point - it's easy till you start trying to get actual performance
out of it by optimizing stuff like that. ;)
>Perhaps you'd be interested in single-level store architectures, where
>no distinction is made between memory and storage. IBM uses it in one
>(or maybe more) of their systems.
It's the IBM Eserver I Series, nee System/38 (A.D. 1980), aka AS/400.
It was expected at one time to be the next generation of computer
architecture, but it turned out that the computing world had matured to
the point that it was more important to be backward compatible than to
push frontiers.
The single 128 bit address space addresses every byte of information in
the system. The underlying system keeps the majority of it on disk, and
the logic that loads stuff into electronic memory when it has to be there
is below the level that any ordinary program would see, much like the
logic in an IA32 CPU that loads stuff into processor cache. It's worth
noting that nowhere in an I Series machine is a layer that looks like a
CPU Linux runs on; it's designed for single level storage from the gates
on up through the operating system.
I found Al's dream rather vague, which explains why several people
inferred different ideas from it (and then beat them down). It sort of
sounds like single level storage, but also like virtual memory and like
mmap. I assume it's actually supposed to be something different from all
those.
I personally have set my sights further down the road: I want an address
space that addresses every byte of information in the universe, not just
"in" a computer system. And the infrastructure should move it around
among various media for optimal access without me worrying about it.
--
Bryan Henderson IBM Almaden Research Center
San Jose CA Filesystems
>>>>> "Jamie" == Jamie Lokier <[email protected]> writes:
Jamie> Chase Venters wrote:
>> Just as a curiosity... does anyone have any guesses as to the
>> runtime performance cost of hosting one or more swap files (which
>> thanks to on demand creation and growth are presumably built of
>> blocks scattered around the disk) versus having one or more simple
>> contiguous swap partitions?
>> I think it's probably a given that swap partitions are better; I'm
>> just curious how much better they might actually be.
Jamie> When programs must access files in addition to swapping, and
Jamie> that includes demand-paged executable files, swap files have
Jamie> the _potential_ to be faster because they provide opportunities
Jamie> to use the disk nearer the files which are being accessed.
If you can, put your swap on a different spindle...
Actually, the original poster's `dream' looked a lot like a
single-address-space operating system, such as Mungi (
http://www.cse.unsw.edu.au/~disy/Mungi/ )
--
Dr Peter Chubb http://www.gelato.unsw.edu.au peterc AT gelato.unsw.edu.au
http://www.ertos.nicta.com.au ERTOS within National ICT Australia
--On January 23, 2006 9:05:41 AM -0600 Ram Gupta <[email protected]>
wrote:
>
> Linux also supports multiple swap files . But these are more
> beneficial if there are more than one disk in the system so that i/o
> can be done in parallel. These swap files may be activated at run time
> based on some criteria.
You missed the point. The kernel in OS X maintains creation and use of
these files automatically. The point wasn't 'oh wow, multiple files'; it was
that it creates them on the fly. I just posted back with the apparent new
method that's being used. I'm not sure if the 512MB number continues or if
the next file will be 1GB or another 512MB, or whether memory size affects it
or not.
I'm sure developer.apple.com or apple darwin pages have the information
somewhere.
On Po 23-01-06 21:03:06, Al Boldi wrote:
> Robin Holt wrote:
> > On Sat, Jan 21, 2006 at 09:08:41PM +0300, Al Boldi wrote:
> > >
> > > Wouldn't it be nice to take advantage of todays 64bit archs and TB
> > > drives, and run a more modern way of life w/o this memory/storage split
> > > personality?
> >
> > Your simple world introduces a level of complexity to the kernel which
> > is nearly unmanageable. Basically, you are asking the system to intuit
> > your desires. The swap device/file scheme allows an administrator to
> > control some aspects of their system while giving the kernel developer
> > a reasonable number of variables to work with. That, at least to me,
> > does not sound schizophrenic, but rather very reasonable.
> >
> > Sorry for raining on your parade,
>
> Thanks for your detailed response, it rather felt like a fresh breeze.
>
> Really, I was more thinking about a step by step rather than an all or none
> approach. Something that would involve tmpfs merged with swap mapped into
> linear address space limited by arch bits, and everything else connected as
> archive.
>
> The idea here is to run inside swap instead of using it as an addon.
> In effect running inside memory cached by physical RAM.
And what if you do not want to run inside swap? For example, because your
machine has only RAM? This will not fly.
Having dreams is nice, but please avoid sharing them unless they come
with patches attached.
Pavel
--
Thanks, Sharp!
Michael Loftis writes:
>
>
> --On January 23, 2006 9:05:41 AM -0600 Ram Gupta <[email protected]>
> wrote:
>
> >
> > Linux also supports multiple swap files . But these are more
> > beneficial if there are more than one disk in the system so that i/o
> > can be done in parallel. These swap files may be activated at run time
> > based on some criteria.
>
> You missed the point. The kernel in OS X maintains creation and use of
> these files automatically. The point wasn't oh wow multiple files' it was
> that it creates them on the fly. I just posted back with the apparent new
This can be done in Linux from user-space: write a script that monitors
free swap space (grep SwapFree /proc/meminfo), and adds/removes new swap
files err... on-the-fly, or --even better-- just-in-time.
The unique feature that Mac OS X VM does have, on the other hand, is
that it keeps profiles of access patterns of applications, and stores
them in files associated with executables. This allows it to quickly
pre-fault the necessary pages during application startup (and this is
what makes OS X boot so fast).
Nikita.
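A rough sketch of the user-space monitor Nikita describes, polling SwapFree in /proc/meminfo; the 64MB threshold, and the idea of then running mkswap(8) on a new file and calling swapon(2), are purely illustrative:

#include <stdio.h>
#include <unistd.h>

static long swap_free_kb(void)
{
	FILE *f = fopen("/proc/meminfo", "r");
	char line[128];
	long kb = -1;

	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f))
		if (sscanf(line, "SwapFree: %ld kB", &kb) == 1)
			break;
	fclose(f);
	return kb;
}

int main(void)
{
	for (;;) {
		long kb = swap_free_kb();

		/* Threshold is arbitrary; a real daemon would now run
		 * mkswap(8) on a new file and swapon(2) it. */
		if (kb >= 0 && kb < 64 * 1024)
			fprintf(stderr, "swap low: %ld kB free\n", kb);
		sleep(10);
	}
	return 0;
}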
On 1/23/06, Michael Loftis <[email protected]> wrote:
> You missed the point. The kernel in OS X maintains creation and use of
> these files automatically. The point wasn't oh wow multiple files' it was
> that it creates them on the fly. I just posted back with the apparent new
> method that's being used. I'm not sure if the 512MB number continues or if
> the next file will be 1Gb or another 512M. Or of memory size affects it or
> not.
>
> I'm sure developer.apple.com or apple darwin pages have the information
> somewhere.
>
What do you mean by automatically? As I understand it, there is no such
thing. If there is a task, it has to be done by someone. Something is
'automatic' from the application's point of view because the kernel takes
care of it. So if creation and use of swap files is done automatically,
then who does it? Is it done by hardware?
ML> You missed the point. The kernel in OS X maintains creation and use of
ML> these files automatically. The point wasn't oh wow multiple files' it was
ML> that it creates them on the fly. I just posted back with the apparent new
ML> method that's being used. I'm not sure if the 512MB number continues or if
ML> the next file will be 1Gb or another 512M. Or of memory size affects it or
ML> not.
Not in kernel but userspace, seems like Linux:
http://developer.apple.com/documentation/Darwin/Reference/ManPages/man8/dynamic_pager.8.html
The dynamic_pager daemon manages a pool of external swap files which the
kernel uses to support demand paging. This pool is expanded with new
swap files as load on the system increases, and contracted when the
swapping resources are no longer needed. The dynamic_pager daemon also
provides a notification service for those applications which wish to
receive notices when the external paging pool expands or contracts.
--
Meelis Roos
On 1/23/06, Nikita Danilov <[email protected]> wrote:
>
> The unique feature that Mac OS X VM does have, on the other hand, is
> that it keeps profiles of access patterns of applications, and stores
> then in files, associated with executables. This allows to quickly
> pre-fault necessary pages during application startup (and this makes OSX
> boot so fast).
This feature is interesting, though I am not sure about the fast-boot
part of OS X, as at boot time these applications are all being started
for the first time, so there is no access pattern yet. They still have
to be demand paged. But yes, later accesses may be faster.
Thanks
Ram gupta
El Tue, 24 Jan 2006 08:36:50 -0600,
Ram Gupta <[email protected]> escribi?:
> This feature is interesting though I am not sure about the fast boot
> part of OSX.
> as at boot time these applications are all started first time. So
> there were no access pattern as yet. They still have to be demand
> paged. But yes later accesses may be faster.
The stats are saved on disk (at least on Windows). You don't really
care about "later accesses" when everything is already in cache;
this is supposed to speed up cold-cache startup. I don't know
if Mac OS X does it for every app; the Darwin code I saw was
only for the startup of the system, not for every app, but maybe that
part was in another module.
Linux is the one desktop lacking something like this; both Windows
and Mac OS X have things like this. I've wondered for a long time
whether it's worth it and whether it could improve things in Linux. The
prefault part is easy once you get the data. The hard part is to get
the statistics: I wonder if mincore(), /proc/$PID/maps
and the recently posted /proc/$PID/pmap and all the statistics
the kernel can provide today are enough, or whether something
more complex is necessary.
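For the data-gathering side Diego wonders about, mincore(2) can report which pages of a mapping are resident; a sketch, with a made-up library path standing in for whatever a real profiler would pull out of /proc/$PID/maps:

/* Record which pages of a mapped file are resident after warm-up.
 * The path is illustrative only. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
	const char *path = "/usr/lib/libsomething.so";	/* made up */
	int fd = open(path, O_RDONLY);
	struct stat st;

	if (fd < 0 || fstat(fd, &st) < 0) {
		perror(path);
		return 1;
	}

	size_t pagesz = sysconf(_SC_PAGESIZE);
	size_t pages = (st.st_size + pagesz - 1) / pagesz;
	void *map = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
	unsigned char *vec = malloc(pages);

	if (map == MAP_FAILED || !vec || mincore(map, st.st_size, vec) < 0) {
		perror("mmap/mincore");
		return 1;
	}

	/* Emit the resident page numbers; saved to a file, this is the
	 * "access profile" that could be prefaulted on the next start. */
	for (size_t i = 0; i < pages; i++)
		if (vec[i] & 1)
			printf("%s page %zu resident\n", path, i);
	return 0;
}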
Ram Gupta writes:
> On 1/23/06, Nikita Danilov <[email protected]> wrote:
>
> >
> > The unique feature that Mac OS X VM does have, on the other hand, is
> > that it keeps profiles of access patterns of applications, and stores
> > then in files, associated with executables. This allows to quickly
> > pre-fault necessary pages during application startup (and this makes OSX
> > boot so fast).
>
> This feature is interesting though I am not sure about the fast boot
> part of OSX.
> as at boot time these applications are all started first time. So
> there were no access pattern as yet. They still have to be demand
That's the point: information about access patterns is stored in the
file. So the next time the application is started (e.g., during boot),
the kernel reads that file and pre-faults the pages.
> paged. But yes later accesses may be faster.
>
> Thanks
> Ram gupta
Nikita.
linux-os \(Dick Johnson\) <[email protected]> wrote:
> On Mon, 23 Jan 2006, Diego Calleja wrote:
[...]
> > However, I doubt the approach is really useful. If you need that much
> > swap space, you're going well beyond the capabilities of the machine.
> > In fact, I bet that most of the cases of machines needing too much
> > memory will be because of bugs in the programs and OOM'ing would be
> > a better solution.
Good rule of thumb: If you run into swap, add RAM. Swap is /extremely/ slow
memory, however fast you make it go. RAM is not expensive anymore...
> You have roughly 2 GB of dynamic address-space avaliable to each
> task (stuff that's not the kernel and not the runtime libraries).
Right. But your average task is far from that size, and most of it resides
in shared libraries and (perhaps shared) executables, and is perhaps even
COW shared with other tasks.
> You can easily have 500 tasks,
Even thousands.
> even RedHat out-of-the-box creates
> about 60 tasks. That's 1,000 GB of potential swap-space required
> to support this.
But you really never do. That is the point.
--
Dr. Horst H. von Brand User #22616 counter.li.org
Departamento de Informatica Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria +56 32 654239
Casilla 110-V, Valparaiso, Chile Fax: +56 32 797513
>Linux is the one desktop lacking something like this, both windows
>and max os x have things like this. I've wondered for long time if
>it's worth of it and if it could improve things in linux. The
>prefault part is easy once you get the data. The hard part is to get
>the statistics:
If you focus on the system startup speed problem, the stats are quite a
bit simpler. If you can take a snapshot of every mmap page in memory
shortly after startup (and verify that no page frames were stolen during
startup) and save that, you could just prefault all those pages in, in a
single sweep, at the next boot.
--
Bryan Henderson IBM Almaden Research Center
San Jose CA Filesystems
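The prefault sweep Bryan describes might look roughly like this, assuming a saved profile of (path, offset, length) ranges in a made-up format; posix_fadvise(POSIX_FADV_WILLNEED) starts a non-blocking read of each range into the page cache:

#define _FILE_OFFSET_BITS 64
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	/* One "path offset length" triple per line; format is made up. */
	FILE *profile = fopen("/var/lib/prefault/profile", "r");
	char path[4096];
	long long off, len;

	if (!profile)
		return 1;
	while (fscanf(profile, "%4095s %lld %lld", path, &off, &len) == 3) {
		int fd = open(path, O_RDONLY);

		if (fd < 0)
			continue;
		/* Kicks off a non-blocking read into the page cache. */
		posix_fadvise(fd, off, len, POSIX_FADV_WILLNEED);
		close(fd);
	}
	fclose(profile);
	return 0;
}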
Horst von Brand wrote:
> Good rule of thumb: If you run into swap, add RAM. Swap is /extremely/ slow
> memory, however fast you make it go. RAM is not expensive anymore...
Actually, RAM is expensive if you've reached the limits of your
machine and have to buy a new machine to get more RAM.
That's exactly the situation I've reached with my laptop. It's
extremely annoying.
-- Jamie
On Mon, 2006-01-23 at 23:08 -0300, Horst von Brand wrote:
[...]
> Good rule of thumb: If you run into swap, add RAM. Swap is /extremely/ slow
> memory, however fast you make it go. RAM is not expensive anymore...
- Except on laptops where you usually can't add *any* RAM. And if you
can, it is *much much* more expensive than on "normal" PCs.
- Except if you - for whatever reason - have to throw out smaller RAMs
to get larger (and much more expensive) RAMs into it.
- Except (as someone else mentioned) you have already equipped your main
board to the max.
> > You have roughly 2 GB of dynamic address-space avaliable to each
> > task (stuff that's not the kernel and not the runtime libraries).
>
> Right. But your average task is far from that size, and most of it resides
> in shared libraries and (perhaps shared) executables, and is perhaps even
> COW shared with other tasks.
>
> > You can easily have 500 tasks,
>
> Even thousands.
>
> > even RedHat out-of-the-box creates
> > about 60 tasks. That's 1,000 GB of potential swap-space required
> > to support this.
And after login (on XFCE + a few standard tools in my case) > 200.
> But you really never do. That is the point.
ACK. X, evolution and Mozilla family (to name standard apps) are the
exceptions to this rule.
Bernd
--
Firmix Software GmbH http://www.firmix.at/
mobil: +43 664 4416156 fax: +43 1 7890849-55
Embedded Linux Development and Services
On Wed, 2006-01-25 at 10:23 +0100, Bernd Petrovitsch wrote:
>
> ACK. X, evolution and Mozilla family (to name standard apps) are the
> exceptions to this rule.
If you decrease RLIMIT_STACK from the default 8MB to 256KB or 512KB you
will reduce the footprint of multithreaded apps like evolution by tens
or hundreds of MB, as glibc sets the thread stack size to RLIMIT_STACK
by default.
Lee
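What Lee suggests can also be done inside an application, by giving threads an explicit stack size instead of inheriting the RLIMIT_STACK-derived glibc default; a sketch, with 256KB as an arbitrary example value (build with -lpthread):

#include <pthread.h>
#include <stdio.h>

static void *worker(void *arg)
{
	/* ... application work ... */
	return NULL;
}

int main(void)
{
	pthread_attr_t attr;
	pthread_t tid;

	pthread_attr_init(&attr);
	pthread_attr_setstacksize(&attr, 256 * 1024);	/* per-thread cap */

	if (pthread_create(&tid, &attr, worker, NULL) != 0) {
		fprintf(stderr, "pthread_create failed\n");
		return 1;
	}
	pthread_join(tid, NULL);
	pthread_attr_destroy(&attr);
	return 0;
}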
Lee Revell wrote:
> On Wed, 2006-01-25 at 10:23 +0100, Bernd Petrovitsch wrote:
> >
> > ACK. X, evolution and Mozilla family (to name standard apps) are the
> > exceptions to this rule.
>
> If you decrease RLIMIT_STACK from the default 8MB to 256KB or 512KB you
> will reduce the footprint of multithreaded apps like evolution by tens
> or hundreds of MB, as glibc sets the thread stack size to RLIMIT_STACK
> by default.
That should make no difference to the real memory usage. Stack pages
which aren't used don't take up RAM, and don't count in RSS.
-- Jamie
Bernd Petrovitsch wrote:
> ACK. X, evolution and Mozilla family (to name standard apps) are the
> exceptions to this rule.
Mozilla / Firefox / Opera in particular. 300MB is not funny on a
laptop which cannot be expanded beyond 192MB. Are there any usable
graphical _small_ web browsers around? Usable meaning actually works
on real web sites with fancy features.
-- Jamie
On Wed, 2006-01-25 at 15:05 +0000, Jamie Lokier wrote:
> Bernd Petrovitsch wrote:
> > ACK. X, evolution and Mozilla family (to name standard apps) are the
> > exceptions to this rule.
>
> Mozilla / Firefox / Opera in particular. 300MB is not funny on a
> laptop which cannot be expanded beyond 192MB. Are there any usable
It is also not funny on 512M if you have other apps running.
> graphical _small_ web browsers around? Usable meaning actually works
> on real web sites with fancy features.
None that I'm aware of:
- dillo doesn't know CSS and/or Javascript.
- epiphany is the Gnome standard browser - so it probably plays in the
memory hog league.
- konqueror is KDE's default browser. I've never really used it.
- ____________________________________________________
Bernd
--
Firmix Software GmbH http://www.firmix.at/
mobil: +43 664 4416156 fax: +43 1 7890849-55
Embedded Linux Development and Services
El Wed, 25 Jan 2006 15:05:16 +0000,
Jamie Lokier <[email protected]> escribi?:
> Mozilla / Firefox / Opera in particular. 300MB is not funny on a
> laptop which cannot be expanded beyond 192MB. Are there any usable
> graphical _small_ web browsers around? Usable meaning actually works
> on real web sites with fancy features.
Opera is probably the best browser when it comes to "features per byte
of memory used", so if that isn't useful... there's the Minimo web browser
(http://www.mozilla.org/projects/minimo/). It's supposed to be designed
for mobile devices, but it may be usable on normal
computers.
The X server itself doesn't eat too much memory. On my box (Radeon
9200SE graphics card) the X server only eats 11 MB of RAM - not too
much, in my opinion, for a 20-year-old code project which, according
to the X developers, has many areas where it could be cleaned up.
The X server will grow in size because applications store their
images in the X server. And the X server is supposed to be
network-transparent, so apps send the X server the data, not
a "reference to the data" (i.e. a path to a file), so (I think) the
file cannot be mmap'ed to share it in memory: there are still
some apps (or so I've heard) which send an image to the server
and keep a private copy in their own address space, so the memory
needed to store those images is *doubled* (GNOME used to keep
*three* copies of the background image: one in nautilus, another
in gnome-settings-daemon and another in the X server, and
gnome-terminal keeps another copy when transparency is enabled).
Also, fontconfig allocates ~100 KB of memory per program launched.
There are patches to fix that by creating an mmap'able cache which is
shared between all applications; they have been merged in the
development version. I think there are many low-hanging fruits at
all levels; the problem is not just Mozilla & friends.
Diego Calleja wrote:
> Opera is probably the best browser when it comes to "features per byte
> of memory used"
Really? If I'm making real use of it, maybe visiting a few hundred pages a
day, and opening 20 tabs, I find I have to kill it every few days, to
reclaim the memory it's hogging, when its resident size exceeds my RAM
size and it starts chugging.
> Also, fontconfig allocates ~100 KB of memory per program launched.
> There're patches to fix that by creating a mmap'able cache which is
> shared between all the applications which has been merged in the
> development version. I think there're many low-hanging fruits at
> all levels, the problem is not just mozilla & friends
100kB per program, even for 100 programs, is nothing compared to a
browser's 300MB footprint. Now, some of that 300MB is permanently
swapped out for the first few days of running. Libraries and such.
Which is relevant to this thread: swap is useful, just so you can swap
out completely unused parts of programs. (The parts which could be
optimised away in principle).
-- Jamie
Bryan Henderson wrote:
> >Perhaps you'd be interested in single-level store architectures, where
> >no distinction is made between memory and storage. IBM uses it in one
> >(or maybe more) of their systems.
>
> It's the IBM Eserver I Series, nee System/38 (A.D. 1980), aka AS/400.
>
> It was expected at one time to be the next generation of computer
> architecture, but it turned out that the computing world had matured to
> the point that it was more important to be backward compatible than to
> push frontiers.
>
> The single 128 bit address space addresses every byte of information in
> the system. The underlying system keeps the majority of it on disk, and
> the logic that loads stuff into electronic memory when it has to be there
> is below the level that any ordinary program would see, much like the
> logic in an IA32 CPU that loads stuff into processor cache. It's worth
> noting that nowhere in an I Series machine is a layer that looks like a
> CPU Linux runs on; it's designed for single level storage from the gates
> on up through the operating system.
>
> I found Al's dream rather vague, which explains why several people
> inferred different ideas from it (and then beat them down). It sort of
> sounds like single level storage, but also like virtual memory and like
> mmap. I assume it's actually supposed to be something different from all
> those.
Not really different, but rather an attempt to use hardware in a
native/direct fashion w/o running in circles. But first let's look at the
reasons that led the industry to this mem/disk personality split.
Consider these archs:

  bits    space
     8    256
    16    64K
    32    4G
    64    16GG    = 16MT
   128    256GGGG = 256TTT

It follows that with:
 - 8 and 16 bits you are forced to split
 - 32 is in between
 - 64 is more than enough for most purposes
 - 128 is astronomical for most purposes
So we have a situation right now that imposes a legacy solution on hardware
that is really screaming (64+) to be taken advantage of. This does not mean
that we have to blow things out of proportion and reinvent the wheel, but
instead revert the workaround that was necessary in the past (-32).
If reverted properly, things should be completely transparent to user-space
and definitely faster, lots faster, especially under load. Think about it.
Thanks!
--
Al
On 23 Jan 2006, Diego Calleja wrote:
> El Mon, 23 Jan 2006 09:05:41 -0600,
> Ram Gupta <[email protected]> escribi?:
>
>> Linux also supports multiple swap files . But these are more
>
> There're in fact a "dynamic swap" tool which apparently
> does what mac os x do: http://dynswapd.sourceforge.net/
>
> However, I doubt the approach is really useful. If you need that much
> swap space, you're going well beyond the capabilities of the machine.
Well, to some extent it depends on your access patterns. The backup
program I use (`dar') is an enormous memory hog: it happily eats 5Gb on
my main fileserver (an UltraSPARC, so compiling it 64-bit does away with
address space sizing problems). That machine has only 512Mb RAM, so
you'd expect the thing would be swapping to death; but the backup
program's locality of reference is sufficiently good that it doesn't
swap much at all (and that in one tight lump at the end).
--
`Everyone has skeletons in the closet. The US has the skeletons
driving living folks into the closet.' --- Rebecca Ore
On Wed, 2006-01-25 at 15:02 +0000, Jamie Lokier wrote:
> Lee Revell wrote:
> > On Wed, 2006-01-25 at 10:23 +0100, Bernd Petrovitsch wrote:
> > >
> > > ACK. X, evolution and Mozilla family (to name standard apps) are the
> > > exceptions to this rule.
> >
> > If you decrease RLIMIT_STACK from the default 8MB to 256KB or 512KB you
> > will reduce the footprint of multithreaded apps like evolution by tens
> > or hundreds of MB, as glibc sets the thread stack size to RLIMIT_STACK
> > by default.
>
> That should make no difference to the real memory usage. Stack pages
> which aren't used don't take up RAM, and don't count in RSS.
It still seems like not allocating memory that the application will
never use could enable the VM to make better decisions. Also not
wasting 7.5MB per thread for the stack should make tracking down actual
bloat in the libraries easier.
Lee
On Wed, 2006-01-25 at 15:05 +0000, Jamie Lokier wrote:
> Bernd Petrovitsch wrote:
> > ACK. X, evolution and Mozilla family (to name standard apps) are the
> > exceptions to this rule.
>
> Mozilla / Firefox / Opera in particular. 300MB is not funny on a
> laptop which cannot be expanded beyond 192MB. Are there any usable
> graphical _small_ web browsers around? Usable meaning actually works
> on real web sites with fancy features.
"Small" and "fancy features" are not compatible.
That's the problem with the term "usable" - to developers it means
"supports the basic core functionality of a web browser" while to users
it means "supports every bell and whistle that I get on Windows".
Lee
On 1/23/06, Bryan Henderson <[email protected]> wrote:
> >Perhaps you'd be interested in single-level store architectures, where
> >no distinction is made between memory and storage. IBM uses it in one
> >(or maybe more) of their systems.
Are there any Linux file systems that work by mmapping the entire
drive and using the paging system to do the read/writes? With 64 bits
there's enough address space to do that now. How does this perform
compared to a traditional block based scheme?
With the IBM 128b address space aren't the devices vulnerable to an
errant program spraying garbage into the address space? Is it better
to map each device into its own address space?
--
Jon Smirl
[email protected]
El Wed, 25 Jan 2006 18:28:34 -0500,
Lee Revell <[email protected]> escribi?:
> > Mozilla / Firefox / Opera in particular. 300MB is not funny on a
> > laptop which cannot be expanded beyond 192MB. Are there any usable
> > graphical _small_ web browsers around? Usable meaning actually works
> > on real web sites with fancy features.
>
> "Small" and "fancy features" are not compatible.
>
> That's the problem with the term "usable" - to developers it means
> "supports the basic core functionality of a web browser" while to users
> it means "supports every bell and whistle that I get on Windows".
That'd be an interesting philosophical (and somewhat offtopic) flamewar:
is it theoretically possible to write an operating system with bells and
whistles for a computer with 200 MB of RAM? 200 MB is really a lot of
RAM... I'm really surprised at how easy it is to write a program that eats
a dozen MB of RAM just by showing a window and a few buttons.
In my perfect world, a superhero (say, Linus ;) would analyze and
redesign the whole software stack and fix it. IMO some parts
of a complete GNU/Linux system have been accumulating fat over
time, e.g. Plan 9's network abstraction could make it possible to
kill tons of networking code from lots of apps...
On Thu, 2006-01-26 at 05:01 +0000, Jamie Lokier wrote:
> Lee Revell wrote:
> > > Mozilla / Firefox / Opera in particular. 300MB is not funny on a
> > > laptop which cannot be expanded beyond 192MB. Are there any usable
> > > graphical _small_ web browsers around? Usable meaning actually works
> > > on real web sites with fancy features.
> >
> > "Small" and "fancy features" are not compatible.
> >
> > That's the problem with the term "usable" - to developers it means
> > "supports the basic core functionality of a web browser" while to users
> > it means "supports every bell and whistle that I get on Windows".
>
> As both a developer and user, all I want is a web browser that works
> with the sites I visit, and performs reasonably well on my laptop.
>
> I know there are fast algorithms for layout, for running scripts and
> updating trees, and the memory usage doesn't have to be anywhere near
> as much as it is.
>
> So it's reasonable to ask if anyone has written a fast browser that
> works with current popular sites in fits in under 256MB after a few
> days use.
>
> Unfortunately, the response seems to be no, nobody has. I guess it's
> a big job and there isn't the interest and resourcing to do it.
>
What's wrong with Firefox?
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
rlrevell 6423 6.7 16.7 167676 73804 ? Sl Jan25 79:41 /usr/lib/firefox/firefox-bin -a firefox
73MB is not bad.
Obviously if you open 20 tabs, it will take a lot more memory, as it's
going to have to cache all the rendered pages.
Lee
Lee Revell wrote:
> > Mozilla / Firefox / Opera in particular. 300MB is not funny on a
> > laptop which cannot be expanded beyond 192MB. Are there any usable
> > graphical _small_ web browsers around? Usable meaning actually works
> > on real web sites with fancy features.
>
> "Small" and "fancy features" are not compatible.
>
> That's the problem with the term "usable" - to developers it means
> "supports the basic core functionality of a web browser" while to users
> it means "supports every bell and whistle that I get on Windows".
As both a developer and user, all I want is a web browser that works
with the sites I visit, and performs reasonably well on my laptop.
I know there are fast algorithms for layout, for running scripts and
updating trees, and the memory usage doesn't have to be anywhere near
as much as it is.
So it's reasonable to ask if anyone has written a fast browser that
works with current popular sites and fits in under 256MB after a few
days' use.
Unfortunately, the response seems to be no, nobody has. I guess it's
a big job and there isn't the interest and resourcing to do it.
-- Jamie
On Thu, 2006-01-26 at 00:11 -0500, Lee Revell wrote:
> What's wrong with Firefox?
>
> USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
> rlrevell 6423 6.7 16.7 167676 73804 ? Sl Jan25 79:41 /usr/lib/firefox/firefox-bin -a firefox
>
> 73MB is not bad.
>
> Obviously if you open 20 tabs, it will take a lot more memory, as it's
> going to have to cache all the rendered pages.
I had a recent bad experience that I believe was due to a bug in
adblock. Upgrading to the most recent version of adblock fixed a memory
leak that made firefox unusable after a while.
Shaggy
--
David Kleikamp
IBM Linux Technology Center
On Thursday 26 January 2006 00:27, Nix wrote:
> On 23 Jan 2006, Diego Calleja wrote:
> > El Mon, 23 Jan 2006 09:05:41 -0600,
> > Ram Gupta <[email protected]> escribi?:
> >
> >> Linux also supports multiple swap files . But these are more
> >
> > There're in fact a "dynamic swap" tool which apparently
> > does what mac os x do: http://dynswapd.sourceforge.net/
> >
> > However, I doubt the approach is really useful. If you need that much
> > swap space, you're going well beyond the capabilities of the machine.
>
> Well, to some extent it depends on your access patterns. The backup
> program I use (`dar') is an enormous memory hog: it happily eats 5Gb on
> my main fileserver (an UltraSPARC, so compiling it 64-bit does away with
> address space sizing problems). That machine has only 512Mb RAM, so
> you'd expect the thing would be swapping to death; but the backup
> program's locality of reference is sufficiently good that it doesn't
> swap much at all (and that in one tight lump at the end).
Totally insane proggie.
--
vda
On Thu, 26 Jan 2006, Denis Vlasenko announced authoritatively:
> On Thursday 26 January 2006 00:27, Nix wrote:
>> Well, to some extent it depends on your access patterns. The backup
>> program I use (`dar') is an enormous memory hog: it happily eats 5Gb on
>> my main fileserver (an UltraSPARC, so compiling it 64-bit does away with
>> address space sizing problems). That machine has only 512Mb RAM, so
>> you'd expect the thing would be swapping to death; but the backup
>> program's locality of reference is sufficiently good that it doesn't
>> swap much at all (and that in one tight lump at the end).
>
> Totally insane proggie.
For incremental backups, it has to work out which files have been added
or removed across the whole disk; whether it stores this in temporary
files or in memory, if there's more file metadata than fits in physical
RAM, it'll be disk-bound working that out at the end no matter what you
do. And avoiding temporary files means you don't have problems with
those (growing) files landing in the backup.
(Now some of its design decisions, like the decision to represent things
like the sizes of files with a custom `infinint' class with a size of
something like 64 bytes, probably were insane. At least you can change
it at configure-time to use long longs instead, vastly reducing memory
usage to the mere 5Gb mentioned in that post...)
(Lovely feature set, shame about the memory hit.)
--
`Everyone has skeletons in the closet. The US has the skeletons
driving living folks into the closet.' --- Rebecca Ore
>> Opera is probably the best browser when it comes to "features per byte
>> of memory used"
>
>Really? If I'm making use it, maybe visiting a few hundred pages a
>day, and opening 20 tabs, I find I have to kill it every few days, to
>reclaim the memory it's hogging, when its resident size exceeds my RAM
>size and it starts chugging.
That matches my experience, though it does crash enough on its own that I
often don't have to kill it. I also use an rlimit (64MiB) to make the
system kill it automatically before it gets too big, and an automatic
restarter. Opera is, thankfully, very good at bouncing back to exactly
where it was when it died (minus the leaked memory).
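(For anyone wanting to copy that setup: a minimal sketch of such an
rlimit-plus-restarter wrapper, in C. The 64MiB cap, the "opera" command
name and the 2-second back-off are placeholders, not the poster's actual
script; and with RLIMIT_AS the browser dies from failed allocations
rather than being killed outright.)

/* Sketch: cap the child's address space, restart it when it dies. */
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
        for (;;) {
                pid_t pid = fork();
                if (pid < 0) {
                        perror("fork");
                        return 1;
                }
                if (pid == 0) {
                        /* Limit total address space to 64 MiB; further
                           malloc/mmap calls fail beyond that, so the
                           browser falls over instead of dragging the
                           whole machine into swap. */
                        struct rlimit rl = { 64UL << 20, 64UL << 20 };
                        if (setrlimit(RLIMIT_AS, &rl) < 0)
                                perror("setrlimit");
                        execlp("opera", "opera", (char *)NULL);
                        perror("execlp");
                        _exit(127);
                }
                waitpid(pid, NULL, 0);  /* block until it exits or dies */
                sleep(2);               /* crude back-off before restart */
        }
}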
But allowing for that extra operational procedure, I'd still say it has
the most features per byte, and if you don't count the ability to work with
certain websites as a feature, I think it probably has the most features
absolutely as well.
>[explanation of memory/disk split]
>...
>So we have a situation right now that imposes a legacy solution on
>hardware that is really screaming (64+) to be taken advantage of.
Put that way, you seem to be describing exactly single level storage as
seen in an IBM Eserver I Series (fka AS/400, nee System/38).
So we know it works, but also that people don't seem to care much for it
(because in 35 years, it hasn't taken over the world - we got to today's
machines with 64 bit address spaces for other reasons).
--
Bryan Henderson IBM Almaden Research Center
San Jose CA Filesystems
>Are there any Linux file systems that work by mmapping the entire
>drive and using the paging system to do the read/writes? With 64 bits
>there's enough address space to do that now. How does this perform
>compared to a traditional block based scheme?
They pretty much all do that. A filesystem driver doesn't actually map
the whole drive into memory addresses all at once and generate page faults
by referencing memory -- instead, it generates the page faults explicitly,
which it can do more efficiently, and sets up the mappings in smaller
pieces as needed (also more efficient). But the code that reads the pages
into the file cache and cleans dirty file cache pages out to the disk is
the same paging code that responds to page faults on malloc'ed pages and
writes such pages out to swap space when their page frames are needed for
other things.
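(For what literally mapping a whole drive looks like from user space, a
minimal sketch -- assuming a 64-bit machine, read permission on the
device node (typically root), and a drive at /dev/sda of at least a
couple of GB; device path and offset are placeholders. The kernel still
fills the mapping in by demand paging, exactly as described above; this
just moves the "map it all at once" part into the application.)

/* Sketch: byte-address an entire block device through one mmap(). */
#include <sys/mman.h>
#include <sys/ioctl.h>
#include <fcntl.h>
#include <linux/fs.h>           /* BLKGETSIZE64 */
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
        const char *dev = "/dev/sda";           /* placeholder device */
        int fd = open(dev, O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        uint64_t size;
        if (ioctl(fd, BLKGETSIZE64, &size) < 0) { perror("ioctl"); return 1; }

        /* One linear mapping of the whole drive: the 64-bit address
           space makes the reservation cheap; pages are only read in
           when they are faulted on. */
        unsigned char *disk = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
        if (disk == MAP_FAILED) { perror("mmap"); return 1; }

        /* "Reading the disk" is now just dereferencing an address;
           the page fault handler does the actual I/O. */
        printf("byte at offset 1GB: %u\n", disk[1024ULL * 1024 * 1024]);

        munmap(disk, size);
        close(fd);
        return 0;
}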
>With the IBM 128b address space aren't the devices vulnerable to an
>errant program spraying garbage into the address space? Is it better
>to map each device into its own address space?
Partitioning your storage space along device lines and making someone who
wants to store something identify a device for it is a pretty primitive
way of limiting errant programs. Something like Linux disk quota and
rlimit (ulimit) is more appropriate to the task, and systems that gather
all their disk storage (even if separate from main memory) into a single
automated pool do have such quota systems.
--
Bryan Henderson IBM Almaden Research Center
San Jose CA Filesystems
Bryan Henderson wrote:
> >[explanation of memory/disk split]
> >...
> >So we have a situation right now that imposes a legacy solution on
> >hardware that is really screaming (64+) to be taken advantage of.
>
> Put that way, you seem to be describing exactly single level storage as
> seen in an IBM Eserver I Series (fka AS/400, nee System/38).
To some extent.
> So we know it works, but also that people don't seem to care much for it
People didn't care, because the AS/400 was based on a proprietary solution.
I remember a client being forced to dump an AS/400 due to astronomical
maintenance costs.
With todays generically mass-produced 64bit archs, what's not to care about a
cost-effective system that provides direct mapped access into linear address
space?
Thanks!
--
Al
>> So we know it [single level storage] works, but also that people don't
>> seem to care much for it
>
>People didn't care, because the AS/400 was based on a proprietary
>solution.
I don't know what a "proprietary solution" is, but what we had was a
complete demonstration of the value of single level storage, in commercial
use and everything, and other computer makers (and other business units
of IBM) stuck with their memory/disk split personality. For 25 years,
lots of computer makers developed lots of new computer architectures and
they all (practically speaking) had the memory/disk split. There has to
be a lesson in that.
>With todays generically mass-produced 64bit archs, what's not to care
>about a cost-effective system that provides direct mapped access into
>linear address space?
I don't know; I'm sure it's complicated. But unless the stumbling block
since 1980 has been that it was too hard to get/make a CPU with a 64 bit
address space, I don't see what's different today.
--
Bryan Henderson IBM Almaden Research Center
San Jose CA Filesystems
Bryan Henderson wrote:
> >> So we know it [single level storage] works, but also that people don't
> >> seem to care much for it
>
> > People didn't care, because the AS/400 was based on a proprietary
> > solution.
>
> I don't know what a "proprietary solution" is, but what we had was a
> complete demonstration of the value of single level storage, in commercial
> use and everything, and other computer makers (and other business units
> of IBM) stuck with their memory/disk split personality. For 25 years,
> lots of computer makers developed lots of new computer architectures and
> they all (practically speaking) had the memory/disk split. There has to
> be a lesson in that.
Sure there is a lesson here. People have a tendency to resist change, even
though they know the current way is faulty.
> > With todays generically mass-produced 64bit archs, what's not to care
> > about a cost-effective system that provides direct mapped access into
> > linear address space?
>
> I don't know; I'm sure it's complicated.
Why would you think that the shortest path between two points is complicated,
when you have the ability to fly?
> But unless the stumbling block
> since 1980 has been that it was too hard to get/make a CPU with a 64 bit
> address space, I don't see what's different today.
You are hitting the nail right on its head here.
Nothing moves the masses like mass-production.
So with 64bits widely available now, and to let Linux spread its wings and
really fly, how could tmpfs merged w/ swap be tweaked to provide direct
mapped access into this linear address space?
Thanks!
--
Al
On Jan 30, 2006, at 08:21, Al Boldi wrote:
> Bryan Henderson wrote:
>>>> So we know it [single level storage] works, but also that people
>>>> don't seem to care much for it
>>
>>> People didn't care, because the AS/400 was based on a proprietary
>>> solution.
>>
>> I don't know what a "proprietary solution" is, but what we had was
>> a complete demonstration of the value of single level storage, in
>> commercial use and everything, and other computer makers (and
>> other business units of IBM) stuck with their memory/disk split
>> personality. For 25 years, lots of computer makers developed lots
>> of new computer architectures and they all (practically speaking)
>> had the memory/disk split. There has to be a lesson in that.
>
> Sure there is lesson here. People have a tendency to resist
> change, even though they know the current way is faulty.
Is it necessarily faulty? It seems to me that the current way works
pretty well so far, and unless you can prove a really strong point
the other way, there's no point in changing. You have to remember
that change introduces bugs which then have to be located and removed
again, so change is not necessarily cheap.
>>> With todays generically mass-produced 64bit archs, what's not to
>>> care about a cost-effective system that provides direct mapped
>>> access into linear address space?
>>
>> I don't know; I'm sure it's complicated.
>
> Why would you think that the shortest path between two points is
> complicated, when you have the ability to fly?
Bad analogy. This is totally irrelevant to the rest of the discussion.
>> But unless the stumbling block since 1980 has been that it was too
>> hard to get/make a CPU with a 64 bit address space, I don't see
>> what's different today.
>
> You are hitting the nail right on it's head here. Nothing moves the
> masses like mass-production.
Uhh, no, you misread his argument: If there were other reasons that
this was not done in the past than lack of 64-bit CPUS, then this is
probably still not practical/feasible/desirable.
Cheers,
Kyle Moffett
--
There is no way to make Linux robust with unreliable memory
subsystems, sorry. It would be like trying to make a human more
robust with an unreliable O2 supply. Memory just has to work.
-- Andi Kleen
>> > With todays generically mass-produced 64bit archs, what's not to care
>> > about a cost-effective system that provides direct mapped access into
>> > linear address space?
>>
>> I don't know; I'm sure it's complicated.
>
>Why would you think that the shortest path between two points is
>complicated,
I can see that my statement could be read a different way from what I
meant. I meant I'm sure that the reason people don't care about single
level storage is complicated. (Ergo I haven't tried, so far, to argue for
or against it but just to point out some history).
Kyle Moffett wrote:
> On Jan 30, 2006, at 08:21, Al Boldi wrote:
> > Bryan Henderson wrote:
> >>>> So we know it [single level storage] works, but also that people
> >>>> don't seem to care much for it
> >>>
> >>> People didn't care, because the AS/400 was based on a proprietary
> >>> solution.
> >>
> >> I don't know what a "proprietary solution" is, but what we had was
> >> a complete demonstration of the value of single level storage, in
> >> commercial use and everything, and other computer makers (and
> >> other business units of IBM) stuck with their memory/disk split
> >> personality. For 25 years, lots of computer makers developed lots
> >> of new computer architectures and they all (practically speaking)
> >> had the memory/disk split. There has to be a lesson in that.
> >
> > Sure there is lesson here. People have a tendency to resist
> > change, even though they know the current way is faulty.
>
> Is it necessarily faulty? It seems to me that the current way works
> pretty well so far, and unless you can prove a really strong point
> the other way, there's no point in changing. You have to remember
> that change introduces bugs which then have to be located and removed
> again, so change is not necessarily cheap.
Faulty, because we are currently running a legacy solution to workaround an
8,16,(32) arch bits address space limitation, which does not exist in
64bits+ archs for most purposes.
Trying to defend the current way would be similar to rejecting the move from
16bit to 32bit. Do you remember that time? One of the arguments used was:
the current way works pretty well so far.
The advice here would be: wake up and smell the coffee.
There is a lot to gain, for one there is no more swapping w/ all its related
side-effects. You're dealing with memory only. You can also run your fs
inside memory, like tmpfs, which is definitely faster. And there may be
lots of other advantages, due to the simplified architecture applied.
> >>> With todays generically mass-produced 64bit archs, what's not to
> >>> care about a cost-effective system that provides direct mapped
> >>> access into linear address space?
> >>
> >> I don't know; I'm sure it's complicated.
> >
> > Why would you think that the shortest path between two points is
> > complicated, when you have the ability to fly?
>
> Bad analogy.
If you didn't understand its meaning: the shortest path means accessing
hw w/o running workarounds; using 64bits+ to fly over past limitations.
> >> But unless the stumbling block since 1980 has been that it was too
> >> hard to get/make a CPU with a 64 bit address space, I don't see
> >> what's different today.
> >
> > You are hitting the nail right on it's head here. Nothing moves the
> > masses like mass-production.
>
> Uhh, no, you misread his argument: If there were other reasons that
> this was not done in the past than lack of 64-bit CPUS, then this is
> probably still not practical/feasible/desirable.
Uhh?
The point here is: Even if there were 64bit archs available in the past, this
did not mean that moving into native 64bits would be commercially viable,
due to its unavailability on the mass-market.
So with 64bits widely available now, and to let Linux spread its wings and
really fly, how could tmpfs merged w/ swap be tweaked to provide direct
mapped access into this linear address space?
Thanks!
--
Al
BTW, unless you have a patch or something to propose, let's take this
off-list, it's getting kind of OT now.
On Jan 31, 2006, at 10:56, Al Boldi wrote:
> Kyle Moffett wrote:
>> Is it necessarily faulty? It seems to me that the current way
>> works pretty well so far, and unless you can prove a really strong
>> point the other way, there's no point in changing. You have to
>> remember that change introduces bugs which then have to be located
>> and removed again, so change is not necessarily cheap.
>
> Faulty, because we are currently running a legacy solution to
> workaround an 8,16,(32) arch bits address space limitation, which
> does not exist in 64bits+ archs for most purposes.
There are a lot of reasons for paging, only _one_ of them is/was to
deal with too-small address spaces. Other reasons are that sometimes
you really _do_ want a nonlinear mapping of data/files/libs/etc. It
also allows easy remapping of IO space or video RAM into application
address spaces, etc. If you have a direct linear mapping from
storage into RAM, common non-linear mappings become _extremely_
complex and CPU-intensive.
Besides, you never did address the issue of large changes causing
large bugs. Any large change needs to have advantages proportional
to the bugs it will cause, and you have not yet proven this case.
> Trying to defend the current way would be similar to rejecting the
> move from 16bit to 32bit. Do you remember that time? One of the
> arguments used was: the current way works pretty well so far.
Arbitrary analogies do not prove things. Can you cite examples that
clearly indicate how paged-memory is to direct-linear-mapping as 16-
bit processors are to 32-bit processors?
> There is a lot to gain, for one there is no more swapping w/ all
> its related side-effects.
This is *NOT* true. When you have more data than RAM, you have to
put data on disk, which means swapping, regardless of the method in
which it is done.
> You're dealing with memory only. You can also run your fs inside
> memory, like tmpfs, which is definitely faster.
Not on Linux. We have a whole unique dcache system precisely so that
a frequently accessed filesystem _is_ as fast as tmpfs (Unless you're
writing and syncing a lot, in which case you still need to wait for
disk hardware to commit data).
> And there may be lots of other advantages, due to the simplified
> architecture applied.
Can you describe in detail your "simplified architecture"?? I can't
see any significant complexity advantages over the standard paging
model that Linux has.
>>> Why would you think that the shortest path between two points is
>>> complicated, when you have the ability to fly?
>>
>> Bad analogy.
>
> If you didn't understand it's meaning. The shortest path meaning
> accessing hw w/o running workarounds; using 64bits+ to fly over
> past limitations.
This makes *NO* technical sense and is uselessly vague. Applying
vague indirect analogies to technical topics is a fruitless
endeavor. Please provide technical points and reasons why it _is_
indeed shorter/better/faster, and then you can still leave out the
analogy because the technical argument is sufficient.
>>>> But unless the stumbling block since 1980 has been that it was too
>>>> hard to get/make a CPU with a 64 bit address space, I don't see
>>>> what's different today.
>>>
>>> You are hitting the nail right on it's head here. Nothing moves the
>>> masses like mass-production.
>>
>> Uhh, no, you misread his argument: If there were other reasons that
>> this was not done in the past than lack of 64-bit CPUS, then this is
>> probably still not practical/feasible/desirable.
>
> Uhh?
> The point here is: Even if there were 64bit archs available in the
> past, this did not mean that moving into native 64bits would be
> commercially viable, due to its unavailability on the mass-market.
Are you even reading these messages?
1) IF the ONLY reason this was not done before is that 64-bit archs
were hard to get, then you are right.
2) IF there were OTHER reasons, then you are not correct.
This is the argument. You keep discussing how 64-bit archs were not
easily available before and are now, and I AGREE, but that is NOT
RELEVANT to the point he made. Can you prove that there are no other
disadvantages to a linear-mapped model?
Cheers,
Kyle Moffett
--
Q: Why do programmers confuse Halloween and Christmas?
A: Because OCT 31 == DEC 25.
On Tue, Jan 31, 2006 at 06:56:17PM +0300, Al Boldi wrote:
> Faulty, because we are currently running a legacy solution to workaround an
> 8,16,(32) arch bits address space limitation, which does not exist in
> 64bits+ archs for most purposes.
>
> Trying to defend the current way would be similar to rejecting the move from
> 16bit to 32bit. Do you remember that time? One of the arguments used was:
> the current way works pretty well so far.
>
> The advice here would be: wake up and smell the coffee.
>
> There is a lot to gain, for one there is no more swapping w/ all its related
> side-effects. You're dealing with memory only. You can also run your fs
> inside memory, like tmpfs, which is definitely faster. And there may be
> lots of other advantages, due to the simplified architecture applied.
Of course there is swapping. The CPU only executes things from physical
memory, so at some point you have to load stuff from disk to physical
memory. That seems amazingly much like the definition of swapping too.
Sometimes you call it loading. Not much difference really. If
something else is occupying physical memory so there isn't room, it has
to be put somewhere, which if it is just caching some physical disk
space, you just dump it, but if it is some giant chunk of data you are
currently generating, then it needs to go to some other place that
handles temporary data that doesn't already have a place in the
filesystem. Unless you have infinite physical memory, at some point you
will have to move temporary data from physical memory to somewhere else.
That is swapping no matter how you view the system's address space.
Making it be called something else doesn't change the facts.
Applications don't currently care if they are swapped to disk or in
physical memory. That is handled by the OS and is transparent to the
application.
> If you didn't understand it's meaning. The shortest path meaning accessing
> hw w/o running workarounds; using 64bits+ to fly over past limitations.
The OS still has to map the address space to where it physically exists.
Mapping all disk space into the address space may actually be a lot less
efficient than using the filesystem interface for a block device.
> Uhh?
> The point here is: Even if there were 64bit archs available in the past, this
> did not mean that moving into native 64bits would be commercially viable,
> due to its unavailability on the mass-market.
>
> So with 64bits widely available now, and to let Linux spread its wings and
> really fly, how could tmpfs merged w/ swap be tweaked to provide direct
> mapped access into this linear address space?
Applications can mmap files if they want to. Your idea seems likely to
make the OS much more complex, and waste a lot of resources on mapping
disk space to the address space, and from the applications point of view
it doesn't seem to make any difference at all. It might be a fun idea
for some academic research OS somewhere to go work out the kinks and see
if it has any efficiency at all in real use. Given Linux runs on lots
of architectures, trying to make it work completely differently on 64bit
systems doesn't make that much sense really, especially when there is no
apparent benefit to the change.
Len Sorensen
Al Boldi wrote:
> There is a lot to gain, for one there is no more swapping w/ all its related
> side-effects. You're dealing with memory only.
I'm sorry, I think I don't understand. My weakness. Can you please explain?
Presumably you will want access to more data than you have RAM,
because RAM is still limited to a few GB these days, whereas a typical
personal data store is a few 100s of GB.
64-bit architecture doesn't change this mismatch. So how do you
propose to avoid swapping to/from a disk, with all the time delays and
I/O scheduling algorithms that needs?
-- Jamie
>1) IF the ONLY reason this was not done before is that 64-bit archs
>were hard to get, then you are right.
>
>2) IF there were OTHER reasons, then you are not correct.
>
>This is the argument. You keep discussing how 64-bit archs were not
>easily available before and are now, and I AGREE, but that is NOT
>RELEVANT to the point he made.
As I remember it, my argument was that single level storage was known and
practical for 25 years and people did not flock to it, therefore they must
not see it as useful. So if 64 bit processors were not available enough
during that time, that blows away my argument, because people might have
liked the idea but just couldn't afford the necessary address width. It
doesn't matter if there were other reasons to shun the technology; all it
takes is one. And if 64 bit processors are more available today, that
might tip the balance in favor of making the change away from multilevel
storage.
But I don't really buy that 64 bit processors weren't available until
recently. I think they weren't produced in commodity fashion because
people didn't have a need for them. They saw what you can do with 128 bit
addresses (i.e. single level storage) in the IBM I Series line, but
weren't impressed. People added lots of other new technology to the
mainstream CPU lines, but not additional address bits. Not until they
wanted to address more than 4G of main memory at a time did they see any
reason to make 64 bit processors in volume.
Ergo, I do think it was something bigger that made the industry stick with
traditional multilevel storage all these years.
--
Bryan Henderson IBM Almaden Research Center
San Jose CA Filesystems
On 1/31/06, Al Boldi <[email protected]> wrote:
> Faulty, because we are currently running a legacy solution to workaround an
> 8,16,(32) arch bits address space limitation, which does not exist in
> 64bits+ archs for most purposes.
In the early 1990's (and maybe even the mid 90's), the typical hard
disk's storage could theoretically be byte-addressed using 32-bit
addresses -- just as (if I understand you correctly) you are arguing
that today's hard disks can be byte-addressed using 64-bit addresses.
If this was going to be practical ever (on commodity hardware anyway),
I would have expected someone to try it on a 32-bit PC or Mac when
hard drives were in the 100MB-3GB range... That suggests to me that
there's a more fundamental reason (i.e. other than lack of address
space) that caused people to stick with the current scheme.
[snip]
> There is a lot to gain, for one there is no more swapping w/ all its related
> side-effects. You're dealing with memory only. You can also run your fs
> inside memory, like tmpfs, which is definitely faster. And there may be
> lots of other advantages, due to the simplified architecture applied.
tmpfs isn't "definitely faster". Remember those benchmarks where Linux
ext2 beat Solaris tmpfs?
http://www.tux.org/lkml/#s9-12
Also, the only way I see where "there is no more swapping" and
"[y]ou're dealing with memory only" is if the disk *becomes* main
memory, and main memory becomes an L3 (or L4) cache for the CPU [and
as a consequence, main memory also becomes the main form of long-term
storage]. Is that what you're proposing?
If so, then it actually makes *less* sense to me than before -- with
your scheme, you've reduced the speed of main memory by 100x or more,
then you try to compensate with a huge cache. IOW, you've reduced the
speed of *main* memory to (more or less) the speed of today's swap!
Suddenly it doesn't sound so good anymore...
--
-Barry K. Nathan <[email protected]>
On Wednesday 01 February 2006 04:06, Barry K. Nathan wrote:
>
> Also, the only way I see where "there is no more swapping" and
> "[y]ou're dealing with memory only" is if the disk *becomes* main
> memory, and main memory becomes an L3 (or L4) cache for the CPU [and
> as a consequence, main memory also becomes the main form of long-term
> storage]. Is that what you're proposing?
>
In the not-too-distant future, there is likely to be a ram/disk price
inversion; ram becomes cheaper/mb than disk. At that point, we'll be buying
hardware based on "how much disk can I afford to provide power-off backup of
my ram?" rather than "how much ram can I afford?"
At that point, things will change.
Maybe, then, everything _will_ be in ram (with the kernel intelligently
writing out pages to the disk in the background, in case of power failure and
ready for a shutdown). Disk reads only ever occur during a power-on
population of ram.
Blue skies....
Andrew Walrond
Thanks for your detailed responses!
Kyle Moffett wrote:
> BTW, unless you have a patch or something to propose, let's take this
> off-list, it's getting kind of OT now.
No patches yet, but even if there were, would they get accepted?
> On Jan 31, 2006, at 10:56, Al Boldi wrote:
> > Kyle Moffett wrote:
> >> Is it necessarily faulty? It seems to me that the current way
> >> works pretty well so far, and unless you can prove a really strong
> >> point the other way, there's no point in changing. You have to
> >> remember that change introduces bugs which then have to be located
> >> and removed again, so change is not necessarily cheap.
> >
> > Faulty, because we are currently running a legacy solution to
> > workaround an 8,16,(32) arch bits address space limitation, which
> > does not exist in 64bits+ archs for most purposes.
>
> There are a lot of reasons for paging, only _one_ of them is/was to
> deal with too-small address spaces. Other reasons are that sometimes
> you really _do_ want a nonlinear mapping of data/files/libs/etc. It
> also allows easy remapping of IO space or video RAM into application
> address spaces, etc. If you have a direct linear mapping from
> storage into RAM, common non-linear mappings become _extremely_
> complex and CPU-intensive.
>
> Besides, you never did address the issue of large changes causing
> large bugs. Any large change needs to have advantages proportional
> to the bugs it will cause, and you have not yet proven this case.
How could reverting a workaround introduce large bugs?
> > Trying to defend the current way would be similar to rejecting the
> > move from 16bit to 32bit. Do you remember that time? One of the
> > arguments used was: the current way works pretty well so far.
>
> Arbitrary analogies do not prove things.
Analogies are there to make a long story short.
> Can you cite examples that
> clearly indicate how paged-memory is to direct-linear-mapping as 16-
> bit processors are to 32-bit processors?
I mentioned this in a previous message.
> > There is a lot to gain, for one there is no more swapping w/ all
> > its related side-effects.
>
> This is *NOT* true. When you have more data than RAM, you have to
> put data on disk, which means swapping, regardless of the method in
> which it is done.
>
> > You're dealing with memory only. You can also run your fs inside
> > memory, like tmpfs, which is definitely faster.
>
> Not on Linux. We have a whole unique dcache system precisely so that
> a frequently accessed filesystem _is_ as fast as tmpfs (Unless you're
> writing and syncing a lot, in which case you still need to wait for
> disk hardware to commit data).
This is true, and may very well explain why dcache is so CPU intensive.
> > And there may be lots of other advantages, due to the simplified
> > architecture applied.
>
> Can you describe in detail your "simplified architecture"?? I can't
> see any significant complexity advantages over the standard paging
> model that Linux has.
>
> >>> Why would you think that the shortest path between two points is
> >>> complicated, when you have the ability to fly?
> >>
> >> Bad analogy.
> >
> > If you didn't understand it's meaning. The shortest path meaning
> > accessing hw w/o running workarounds; using 64bits+ to fly over
> > past limitations.
>
> This makes *NO* technical sense and is uselessly vague. Applying
> vague indirect analogies to technical topics is a fruitless
> endeavor. Please provide technical points and reasons why it _is_
> indeed shorter/better/faster, and then you can still leave out the
> analogy because the technical argument is sufficient.
>
> >>>> But unless the stumbling block since 1980 has been that it was too
> >>>> hard to get/make a CPU with a 64 bit address space, I don't see
> >>>> what's different today.
> >>>
> >>> You are hitting the nail right on it's head here. Nothing moves the
> >>> masses like mass-production.
> >>
> >> Uhh, no, you misread his argument: If there were other reasons that
> >> this was not done in the past than lack of 64-bit CPUS, then this is
> >> probably still not practical/feasible/desirable.
> >
> > Uhh?
> > The point here is: Even if there were 64bit archs available in the
> > past, this did not mean that moving into native 64bits would be
> > commercially viable, due to its unavailability on the mass-market.
>
> Are you even reading these messages?
Bryan Henderson wrote:
> >1) IF the ONLY reason this was not done before is that 64-bit archs
> >were hard to get, then you are right.
> >
> >2) IF there were OTHER reasons, then you are not correct.
> >
> >This is the argument. You keep discussing how 64-bit archs were not
> >easily available before and are now, and I AGREE, but that is NOT
> >RELEVANT to the point he made.
>
> As I remember it, my argument was that single level storage was known and
> practical for 25 years and people did not flock to it, therefore they must
> not see it as useful. So if 64 bit processors were not available enough
> during that time, that blows away my argument, because people might have
> liked the idea but just couldn't afford the necessary address width. It
> doesn't matter if there were other reasons to shun the technology; all it
> takes is one. And if 64 bit processors are more available today, that
> might tip the balance in favor of making the change away from multilevel
> storage.
Thanks for clarifying this!
> But I don't really buy that 64 bit processors weren't available until
> recently. I think they weren't produced in commodity fashion because
> people didn't have a need for them. They saw what you can do with 128 bit
> addresses (i.e. single level storage) in the IBM I Series line, but
> weren't impressed. People added lots of other new technology to the
> mainstream CPU lines, but not additional address bits. Not until they
> wanted to address more than 4G of main memory at a time did they see any
> reason to make 64 bit processors in volume.
True, so with 64bits giving roughly 16 million TB of address space, what
reason would there be to stick with a swapped memory model?
Jamie Lokier wrote:
> Al Boldi wrote:
> > There is a lot to gain, for one there is no more swapping w/ all its
> > related side-effects. You're dealing with memory only.
>
> I'm sorry, I think I don't understand. My weakness. Can you please
> explain?
>
> Presumably you will want access to more data than you have RAM,
> because RAM is still limited to a few GB these days, whereas a typical
> personal data store is a few 100s of GB.
>
> 64-bit architecture doesn't change this mismatch. So how do you
> propose to avoid swapping to/from a disk, with all the time delays and
> I/O scheduling algorithms that needs?
This is exactly what a linear-mapped memory model avoids.
Everything is already mapped into memory/disk.
Lennart Sorensen wrote:
> Of course there is swapping. The CPU only executes things from physical
> memory, so at some point you have to load stuff from disk to physical
> memory. That seems amazingly much like the definition of swapping too.
> Sometimes you call it loading. Not much difference really. If
> something else is occupying physical memory so there isn't room, it has
> to be put somewhere, which if it is just caching some physical disk
> space, you just dump it, but if it is some giant chunk of data you are
> currently generating, then it needs to go to some other place that
> handles temporary data that doesn't already have a place in the
> filesystem. Unless you have infinite physical memory, at some point you
> will have to move temporary data from physical memory to somewhere else.
> That is swapping no matter how you view the system's address space.
> Making it be called something else doesn't change the facts.
Would you call reading and writing to memory/disk swapping?
> Applications don't currently care if they are swapped to disk or in
> physical memory. That is handled by the OS and is transparent to the
> application.
Yes, a linear-mapped memory model extends this transparency to the OS.
> > If you didn't understand it's meaning. The shortest path meaning
> > accessing hw w/o running workarounds; using 64bits+ to fly over past
> > limitations.
>
> The OS still has to map the address space to where it physically exists.
> Mapping all disk space into the address space may actually be a lot less
> efficient than using the filesystem interface for a block device.
Did you try tmpfs?
> > Uhh?
> > The point here is: Even if there were 64bit archs available in the past,
> > this did not mean that moving into native 64bits would be commercially
> > viable, due to its unavailability on the mass-market.
> >
> > So with 64bits widely available now, and to let Linux spread its wings
> > and really fly, how could tmpfs merged w/ swap be tweaked to provide
> > direct mapped access into this linear address space?
>
> Applications can mmap files if they want to. Your idea seems likely to
> make the OS much more complex, and waste a lot of resources on mapping
> disk space to the address space, and from the applications point of view
> it doesn't seem to make any difference at all. It might be a fun idea
> for some academic research OS somewhere to go work out the kinks and see
> if it has any efficiency at all in real use. Given Linux runs on lots
> of architectures, trying to make it work completely differently on 64bit
> systems doesn't make that much sense really, especially when there is no
> apparent benefit to the change.
Arch bits have nothing to do with a linear-mapped memory model; they only
limit its usefulness. So with 8,16,(32) bits this linear-mapped model isn't
really viable because of its address-space limit. But with a 64bit+ arch
the limits are wide enough to make a linear-mapped model viable. A 32bit
arch is in between, so for some a 4GB limit may be acceptable.
Barry K. Nathan wrote:
> On 1/31/06, Al Boldi <[email protected]> wrote:
> > Faulty, because we are currently running a legacy solution to workaround
> > an 8,16,(32) arch bits address space limitation, which does not exist in
> > 64bits+ archs for most purposes.
>
> In the early 1990's (and maybe even the mid 90's), the typical hard
> disk's storage could theoretically be byte-addressed using 32-bit
> addresses -- just as (if I understand you correctly) you are arguing
> that today's hard disks can be byte-addressed using 64-bit addresses.
>
> If this was going to be practical ever (on commodity hardware anyway),
> I would have expected someone to try it on a 32-bit PC or Mac when
> hard drives were in the 100MB-3GB range... That suggests to me that
> there's a more fundamental reason (i.e. other than lack of address
> space) that caused people to stick with the current scheme.
32bits is in brackets - 8,16,(32) - to highlight that it's an in-between.
> tmpfs isn't "definitely faster". Remember those benchmarks where Linux
> ext2 beat Solaris tmpfs?
Linux tmpfs is faster because it can short-circuit dcache, in effect doing an
o_sync. It slows down when swapping kicks in.
> Also, the only way I see where "there is no more swapping" and
> "[y]ou're dealing with memory only" is if the disk *becomes* main
> memory, and main memory becomes an L3 (or L4) cache for the CPU [and
> as a consequence, main memory also becomes the main form of long-term
> storage]. Is that what you're proposing?
In the long-term yes, maybe even move it into hardware. But for the
short-term there is no need to blow things out of proportion; a simple
tweaking of tmpfs merged w/ swap may do the trick quickly and easily.
> If so, then it actually makes *less* sense to me than before -- with
> your scheme, you've reduced the speed of main memory by 100x or more,
> then you try to compensate with a huge cache. IOW, you've reduced the
> speed of *main* memory to (more or less) the speed of today's swap!
> Suddenly it doesn't sound so good anymore...
There really isn't anything new here; we do swap and access the fs on disk
and compensate with a huge dcache now. All this idea implies is removing
certain barriers that could not easily be passed before, thus moving swap and
fs into main memory.
Can you see how removing barriers would aid performance?
Thanks!
--
Al
Al Boldi wrote:
> > Presumably you will want access to more data than you have RAM,
> > because RAM is still limited to a few GB these days, whereas a typical
> > personal data store is a few 100s of GB.
> >
> > 64-bit architecture doesn't change this mismatch. So how do you
> > propose to avoid swapping to/from a disk, with all the time delays and
> > I/O scheduling algorithms that needs?
>
> This is exactly what a linear-mapped memory model avoids.
> Everything is already mapped into memory/disk.
Having everything mapped to memory/disk *does not* avoid time delays
and I/O scheduling. At some level, whether it's software or hardware,
something has to schedule the I/O to disk because there isn't enough RAM.
How do you propose to avoid those delays?
In my terminology, I/O of pages between disk and memory is called
swapping. (Or paging, or loading, or virtual memory I/O...)
Perhaps you have a different terminology?
> Would you call reading and writing to memory/disk swapping?
Yes, if it involves the disk and heuristic paging decisions. Whether
that's handled by software or hardware.
> > Applications don't currently care if they are swapped to disk or in
> > physical memory. That is handled by the OS and is transparent to the
> > application.
>
> Yes, a linear-mapped memory model extends this transparency to the OS.
Yes, that is possible. It's slow in practice because that
transparency comes at the cost of page faults (when the OS accesses
that linear-mapped memory), which is slow on the kinds of CPU we are
talking about - i.e. commodity 64-bit chips.
> > > If you didn't understand it's meaning. The shortest path meaning
> > > accessing hw w/o running workarounds; using 64bits+ to fly over past
> > > limitations.
> >
> > The OS still has to map the address space to where it physically exists.
> > Mapping all disk space into the address space may actually be a lot less
> > efficient than using the filesystem interface for a block device.
>
> Did you try tmpfs?
Actually, mmap() to read a tmpfs file can be slower than just calling
read(), for some access patterns. It's because page faults, which are
used to map the file, can be slower than copying data. However,
copying uses more memory. Today we leave it to the application to
decide which method to use.
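(A rough way to see that trade-off, assuming a modest-sized file that is
already in the page cache -- e.g. on tmpfs under /dev/shm. The file name
is a placeholder, the numbers will swing a lot with access pattern, page
size and CPU, and older glibc needs -lrt for clock_gettime.)

/* Sketch: read() with a copy vs. mmap() with page faults, same file. */
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

static double now(void)
{
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(int argc, char **argv)
{
        const char *path = argc > 1 ? argv[1] : "/dev/shm/testfile";
        int fd = open(path, O_RDONLY);
        struct stat st;
        if (fd < 0 || fstat(fd, &st) < 0) { perror(path); return 1; }
        size_t size = st.st_size;

        /* Method 1: read() -- one system call, data copied into a
           private buffer. */
        char *buf = malloc(size);
        if (!buf) { perror("malloc"); return 1; }
        double t0 = now();
        ssize_t n = pread(fd, buf, size, 0);
        printf("read : %zd bytes, %.6fs\n", n, now() - t0);

        /* Method 2: mmap() -- no copy, but setting up the mapping costs
           a page fault per page as the data is touched. */
        t0 = now();
        char *map = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
        if (map == MAP_FAILED) { perror("mmap"); return 1; }
        unsigned long sum = 0;
        for (size_t i = 0; i < size; i += 4096)
                sum += (unsigned char)map[i];
        printf("mmap : touched %zu pages, %.6fs (sum %lu)\n",
               size / 4096, now() - t0, sum);
        return 0;
}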
> > If so, then it actually makes *less* sense to me than before -- with
> > your scheme, you've reduced the speed of main memory by 100x or more,
> > then you try to compensate with a huge cache. IOW, you've reduced the
> > speed of *main* memory to (more or less) the speed of today's swap!
> > Suddenly it doesn't sound so good anymore...
>
> There really isn't anything new here; we do swap and access the fs on disk
> and compensate with a huge dcache now. All this idea implies, is to remove
> certain barriers that could not be easily passed before, thus move swap and
> fs into main memory.
>
> Can you see how removing barriers would aid performance?
I suspect that, despite possibly simplifying code, removing those
barriers would make it run slower.
If I understand your scheme, you're suggesting the kernel accesses
disks, filesystems, etc. by simply reading and writing somewhere in
the 64-bit address space.
At some level, that will involve page faults to move data between RAM and disk.
Those page faults are relatively slow - governed by the CPU's page
fault mechanism. Probably slower than what the kernel does now:
testing flags and indirecting through "struct page *".
However, do feel free to try out your idea. If it is actually notably
faster, or if it makes no difference to speed but makes a lot of code
simpler, well then surely it will be interesting.
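(One crude user-space way to look at the page-fault side of that: touch
a fresh anonymous mapping once, so every page faults, then run the
identical loop again with the pages already mapped, and compare times
and getrusage() minor-fault counts. A sketch only, with a made-up 256MB
size; it says nothing about the cost of the kernel's own struct-page
bookkeeping.)

/* Sketch: cost of first-touch page faults vs. already-mapped access. */
#include <sys/mman.h>
#include <sys/resource.h>
#include <stddef.h>
#include <stdio.h>
#include <time.h>

static double now(void)
{
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec / 1e9;
}

static long minflt(void)
{
        struct rusage ru;
        getrusage(RUSAGE_SELF, &ru);
        return ru.ru_minflt;
}

int main(void)
{
        const size_t size = 256UL << 20;        /* 256MB anonymous mapping */
        char *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        /* First pass: every touch of a new page goes through the CPU's
           page fault mechanism. */
        long f0 = minflt();
        double t0 = now();
        for (size_t i = 0; i < size; i += 4096)
                p[i] = 1;
        printf("first pass:  %.4fs, %ld minor faults\n",
               now() - t0, minflt() - f0);

        /* Second pass: same loop, pages already mapped, so (almost) no
           faults -- the difference is roughly the fault overhead. */
        f0 = minflt();
        t0 = now();
        for (size_t i = 0; i < size; i += 4096)
                p[i] = 2;
        printf("second pass: %.4fs, %ld minor faults\n",
               now() - t0, minflt() - f0);
        return 0;
}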
-- Jamie
On Wed, Feb 01, 2006 at 09:51:08AM +0000, Andrew Walrond wrote:
> In the not-too-distant future, there is likely to be a ram/disk price
> inversion; ram becomes cheaper/mb than disk. At that point, we'll be buying
> hardware based on "how much disk can I afford to provide power-off backup of
> my ram?" rather than "how much ram can I afford?"
Hmm...
I recently bought a 250GB HD for my machine for $112, which is $0.50/GB
or $0.0005/MB. I bought 512MB of ram for $55, which is $0.10/MB. The ram
cost 200 times more per MB than the disk space.
In 1992 I got a 245MB HD for a new machine for $500 as far as I recall,
which was $2/MB. I got 16MB ram for $800, which was $50/MB. The ram
cost 25 times more than the disk space.
So just what kind of price trend are you looking at that will let you
get ram cheaper than disk space any time soon? There has never been
such a trend yet as far as I know. Maybe you have better data than me.
My experience shows the other direction. Both memory and disk space are
much cheaper than they used to be, but the disk space has reduced in
price much faster than memory.
> At that point, things will change.
Sure, except I don't believe it will ever happen.
> Maybe, then, everything _will_ be in ram (with the kernel will intelligently
> write out pages to the disk in the background, incase of power failure and
> ready for a shutdown). Disk reads only ever occur during a power-on
> population of ram.
Len Sorensen
On Wednesday 01 February 2006 17:51, Lennart Sorensen wrote:
>
> So just what kind of price trend are you looking at that will let you
> get ram cheaper than disk space any time soon? There has never been
> such a trend yet as far as I know. Maybe you have better data than me.
> My experience shows the other direction. Both memory and disk space are
> much cheaper than they used to be, but the disk space has reduced in
> price much faster than memory.
>
I cannot disagree with the obvious trend to date, but rather than argue the
many reasons why ram prices are artificially high right now, instead just
grab a stick of ram in your left hand, and the heavy lump of precision
engineered metal that is a hard drive in your right, and see if you can
convince yourself that the one on the right will still be ahead of the curve
in another 14 years.
Maybe it will. Drop me a mail in 2020 and I'll shout you dinner if you're
right ;)
Andrew
On Wed, Feb 01, 2006 at 06:21:12PM +0000, Andrew Walrond wrote:
> I cannot disagree with the obvious trend to date, but rather than argue the
> many reasons why ram prices are artificially high right now, instead just
> grab a stick of ram in your left hand, and the heavy lump of precision
> engineered metal that is a hard drive in your right, and see if you can
> convince yourself that the one on the right will still be ahead of the curve
> in another 14 years.
A metal case with a small circuit board, and some magnetic material splattered
(very precisely) on a disk doesn't seem like as much work as trying to
fit over 10^12 transistors onto dies fitting in the same space. Making
wafers for memory isn't free, and higher densities take work to
develop. I am not sure what the current density for ram is in terms of
bits per area. I am sure it is a lot less than what a harddisk manages
with magnetic material. I am amazed either one works.
> Maybe it will. Drop me a mail in 2020 and I'll shout you dinner if you're
> right ;)
We will see. :)
Len Sorensen
Jamie Lokier wrote:
> If I understand your scheme, you're suggesting the kernel accesses
> disks, filesystems, etc. by simply reading and writing somewhere in
> the 64-bit address space.
>
> At some level, that will involve page faults to move data between RAM and
> disk.
>
> Those page faults are relatively slow - governed by the CPU's page
> fault mechanism. Probably slower than what the kernel does now:
> testing flags and indirecting through "struct page *".
Is there a way to benchmark this difference?
Thanks!
--
Al
On Maw, 2006-01-31 at 18:56 +0300, Al Boldi wrote:
> So with 64bits widely available now, and to let Linux spread its wings and
> really fly, how could tmpfs merged w/ swap be tweaked to provide direct
> mapped access into this linear address space?
Why bother. You can already create a private large file and mmap it if
you want to do this, and you will get better performance than being
smeared around swap with everyone else.
Currently swap means your data is mixed in with other stuff. Swap could
do preallocation of each vma when running in limited overcommit modes,
and it would run a lot faster if you did, but you would pay a lot in
flexibility and efficiency, as well as needing a lot more swap.
Far better to let applications wanting to work this way do it
themselves. Just mmap and the cache balancing and pager will do the rest
for you.
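(A minimal sketch of what Alan is suggesting: a big sparse file, mmap'd
shared, used as the application's own backing store. The path and the
8GB size are placeholders; it assumes a 64-bit machine, and a real
application would layer its own allocator on top and think about
msync()/fsync() if it cares whether the data survives.)

/* Sketch: use a large sparse file as directly-mapped backing store. */
#include <sys/mman.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
        const size_t size = 1ULL << 33;         /* 8GB of address space */
        int fd = open("/var/tmp/bigmap", O_RDWR | O_CREAT, 0600);
        if (fd < 0) { perror("open"); return 1; }

        /* Sparse file: no blocks are allocated until pages are dirtied. */
        if (ftruncate(fd, size) < 0) { perror("ftruncate"); return 1; }

        char *mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, 0);
        if (mem == MAP_FAILED) { perror("mmap"); return 1; }

        /* The application just uses it as memory; the kernel's pager
           moves pages between RAM and this file as needed, instead of
           mixing the data in with everyone else's swap. */
        strcpy(mem + (1ULL << 32), "hello from the 4GB mark");
        printf("%s\n", mem + (1ULL << 32));

        munmap(mem, size);
        close(fd);
        return 0;
}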
Alan Cox wrote:
> On Maw, 2006-01-31 at 18:56 +0300, Al Boldi wrote:
> > So with 64bits widely available now, and to let Linux spread its wings
> > and really fly, how could tmpfs merged w/ swap be tweaked to provide
> > direct mapped access into this linear address space?
>
> Why bother. You can already create a private large file and mmap it if
> you want to do this, and you will get better performance than being
> smeared around swap with everyone else.
>
> Currently swap means your data is mixed in with other stuff. Swap could
> do preallocation of each vma when running in limited overcommit modes
> and it would run a lot faster if you did but you would pay a lot in
> flexibility and efficiency, as well as needing a lot more swap.
>
> Far better to let applications wanting to work this way do it
> themselves. Just mmap and the cache balancing and pager will do the rest
> for you.
So w/ 1GB RAM, no swap, and 1TB disk mmap'd, could this mmap'd space be added
to the total memory available to the OS, as is done w/ swap?
And if that's possible, why not replace swap w/ mmap'd disk-space?
Thanks!
--
Al
>So w/ 1GB RAM, no swap, and 1TB disk mmap'd, could this mmap'd space be
>added to the total memory available to the OS, as is done w/ swap?
Yes.
>And if that's possible, why not replace swap w/ mmap'd disk-space?
Because mmapped disk space has a permanent mapping of address to disk
location. That's how the earliest virtual memory systems worked, but we
moved beyond that to what we have now (what we've been calling swapping),
where the mapping gets established at the last possible moment, which
means we can go a lot faster. E.g. when the OS needs to steal 10 page
frames used for malloc pages which are scattered across the virtual
address space, it could write all those pages out in a single cluster
wherever a disk head happens to be at the moment.
Also, given that we use multiple address spaces (my shell and your shell
both have an Address 0, but they're different pages), there'd be a giant
allocation problem in assigning a contiguous area of disk to each address
space.
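(Roughly, in code terms -- a sketch with error handling omitted and a
hypothetical file name: both calls below hand the application "memory",
but every page of the file-backed mapping has its disk location fixed at
mmap() time, while a page of the anonymous mapping only gets a swap slot
if and when the kernel decides to evict it, which is what allows the
late, clustered writes described above.)

/* Sketch: file-backed vs. anonymous memory (error handling omitted). */
#include <sys/mman.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdio.h>

int main(void)
{
        size_t size = 1UL << 30;        /* 1GB of each, as an example */

        /* File-backed: page N of this mapping is permanently page N of
           "datafile" (a made-up name), so writeback always goes to that
           fixed disk location. */
        int fd = open("datafile", O_RDWR);
        char *filemem = mmap(NULL, size, PROT_READ | PROT_WRITE,
                             MAP_SHARED, fd, 0);

        /* Anonymous: no disk location exists yet.  If memory gets tight,
           the kernel assigns swap slots only at eviction time, and can
           cluster whichever pages it is stealing into one contiguous
           write, as described above. */
        char *anonmem = mmap(NULL, size, PROT_READ | PROT_WRITE,
                             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        printf("file-backed at %p, anonymous at %p\n",
               (void *)filemem, (void *)anonmem);
        return 0;
}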
--
Bryan Henderson IBM Almaden Research Center
San Jose CA Filesystems
On Iau, 2006-02-02 at 21:59 +0300, Al Boldi wrote:
> So w/ 1GB RAM, no swap, and 1TB disk mmap'd, could this mmap'd space be added
> to the total memory available to the OS, as is done w/ swap?
Yes in theory. It would be harder to manage.
> And if that's possible, why not replace swap w/ mmap'd disk-space?
Swap is just somewhere to stick data that isn't file-backed. You could
build a swapless mmap-based OS, but it wouldn't be quite the same as
Unix/Linux are.