Hi all :))
I read a while ago that, no matter how much RAM you have, adding a
swap area will improve performance a lot. So I tested it.
I created a swap area twice as large as my RAM size (just an
arbitrary size), that is 1G. I've tested with lower sizes too. My RAM
is never filled (well, I haven't seen it filled, at least) since I
always work on console, no X and things like those. Even compiling
two or three kernels at a time doesn't consume my RAM. What I try to
explain is that the swap is not really needed in my machine, since
the memory is not prone to be filled.
Well, I haven't noticed any change in performance, and the swap
area is *never* used. That contradicts what I've read: that no
matter your free RAM size, a bit of swap is always used. That is not
my case, definitely.
So my question is: should I use a swap area to improve
performance (or whatever else), or should I use those precious bytes
to improve my porn collection }:))? Seriously: I don't understand
how the swap works, I don't know if the swap area is used only when
RAM is exhausted or when the free RAM goes low beyond some point,
etc... I've read (well, skimmed) the kernel archives about swap
and they haven't enlightened me O:))
Thanks a lot :)
Raúl
(There _must_ be a good document on this somewhere, but I didn't find it.
Besides, I'm by far not the best person to explain this, but I believe the
VM gurus have better things to do than to explain this yet again...)
On Sat, Jul 27, 2002 at 02:22:20PM +0200, you [DervishD] wrote:
>
> I read a while ago that, no matter how much RAM you have, adding a
> swap area will improve performance a lot. So I tested it.
Well, no. I don't know where you read it, but that's wrong.
Adding swap can only improve things if the freed physical memory can be
used for something useful. Useful uses include (obviously) the active
programs (executable code and data). The rest of the memory can be used
for disk cache. This can help tremendously, since RAM is ~1000 times
faster than hard disk.
Where swap helps performance is when you can swap _inactive_ (parts of)
programs out, and use the freed memory for disk cache.
Where swap differs from adding physical memory is that if/when the inactive
programs become active, you need to swap them in, which takes time.
Obviously, if you are not using large parts of the disk actively, adding
disk cache will not help. Once active program pages and active disk
blocks are in RAM, the performance is in theory optimal.
Of course, "active" is not unambiguous. You can shovel less active
program and disk pages into memory, but the gain goes down quickly.
> I created a swap area twice as large as my RAM size (just an
> arbitrary size), that is 1G. I've tested with lower sizes too. My RAM
> is never filled (well, I haven't seen it filled, at least) since I
> always work on console, no X and things like those. Even compiling
> two or three kernels at a time don't consume my RAM. What I try to
> explain is that the swap is not really needed in my machine, since
> the memory is not prone to be filled.
So you have 512MB of RAM? All the programs (without X) will fit there
easily. You'll still have plenty for disk cache.
> Well, I haven't notice any change in performance, and the swap
> area is *never* used.
If it is never used, it doesn't help.
BUT: if something unexpected happens - a program goes out of control and
eats heaps of memory - the swap can save you.
> That contradicts what I've read about that, no matter your free RAM size,
> a bit of swap is always used. That is not my case, definitely.
You almost always have inactive programs that could be swapped out. The
freed memory could be used for disk cache. So you could gain _something_.
Hope this helps.
-- v --
[email protected]
On Sat, 27 Jul 2002, Ville Herva wrote:
> cache. This can help tremendously, since RAM is ~1000 times faster than
> harddisk.
Much more.
The latency difference seems to be on the order of 100000 times.
It is the latency we care about because that determines how long
the CPU cannot do anything useful but has to wait.
Rik
--
Bravely reimplemented by the knights who say "NIH".
http://www.surriel.com/ http://distro.conectiva.com/
Hi Ville :)
>> I read a time ago that, no matter the RAM you have, adding a
>> swap-area will improve performance a lot. So I tested.
>Well, no. I don't know where you read it, but that's wrong.
I don't remember clearly. Maybe at Linux Gazette or someplace
like that. Moreover, maybe I took the phrase out of context.
>Where swap helps performance is when you can swap _inactive_ (parts of)
>programs out, and use the freed memory for disk cache.
Yes, that makes sense, obviously. My question is more: when will an
inactive page be swapped out? Only when there is no more RAM
left? When free RAM goes below some point? How to configure it?
>> the memory is not prone to be filled.
>So you have 512MB of RAM? All the programs (without X) will fit
>there easily. You'll still have plenty for disk cache.
Except when I'm compiling something large, the memory is almost
entirely free. I have a lot of memory and therefore a lot of cache, so
when I develop, things go really fast. For example, I use gcc, make and
binutils (and an editor) most of the time. Well, thanks to the disk
cache, the first time they are run is the only disk access...
Moreover, sometimes I use ram disks.
>BUT: if something unexpected happens - a programs goes out of
>control and eats heaps of memory - the swap can save you.
But in such a case, high are the chances of the program crashing
due to a memory error if there is no swap. I really don't understand
why swap may save me in this case O:)) Maybe the swapping in and out
will make that process slower, and I'll have some spare CPU to be able
to kill the program?
>Hope this helps.
Yes, thank you :))
Raúl
On Sat, Jul 27, 2002 at 06:11:27PM +0200, you [DervishD] wrote:
>
> >Where swap helps perfomance is when you can swap _inactive_ (parts of)
> >programs out, and use the freed memory for disk cache.
>
> Yes, that makes sense, obviously. My question is more: when will an
> inactive page be swapped out? Only when there is no more RAM
> left?
No, it is smarter than that. The exact algorithms are not obvious - even the
linux VM gurus don't quite agree on them :) If you really want to know how
it works, browse at http://www.linux-mm.org - there you can find many
documents on it and plenty of good links.
> How to configure it?
Through the tunables in /proc/sys/vm/.
You can find some explanation for these at the beginning of
/usr/src/linux/mm/vmscan.c etc. (as of 2.4.19rc3). I don't know if
there's better documentation somewhere.
If you use -ac, recent 2.5 or a vendor kernel, you may find yourself
with Rik van Riel's VM implementation. It may have better documentation
- in a different place.
> Except when I'm compiling something large, the memory is almost
> entirely free. I have a lot of memory for having a lot of cache, so
> when I develope things go real fast. For example, I use gcc, make and
> binutils (and an editor) most of the time. Well, thanks to the disk
> cache, the first time they are run is the only disk access...
Yes, that's exactly where disk cache will help you.
> But in such a case, high are the chances of the program crashing
> due to a memory error if there is no swap. I really don't understand
> why swap may save me in this case O:)) Maybe the swapping in and out
> will make that process slower, and I'll have some spare CPU to be able
> to kill the program?
Well, there can be more than one process allocating memory. Your shell
is then competing with all of them to get memory. Swap is no magic
bullet in this case either - it just adds more leeway for you.
Rik van Riel wrote:
> The latency difference seems to be on the order of 100000 times. It is
> the latency we care about because that determines how long the CPU cannot
> do anything useful but has to wait.
I stand corrected - I wrote that without thinking.
-- v --
[email protected]
On 27 July 2002 12:42, Ville Herva wrote:
> > I created a swap area twice as large as my RAM size (just an
> > arbitrary size), that is 1G. I've tested with lower sizes too. My RAM
> > is never filled (well, I haven't seen it filled, at least) since I
> > always work on console, no X and things like those. Even compiling
> > two or three kernels at a time don't consume my RAM. What I try to
> > explain is that the swap is not really needed in my machine, since
> > the memory is not prone to be filled.
>
> So you have 512MB of RAM? All the programs (without X) will fit there
> easily. You'll still have plenty for disk cache.
With today's software I'd say you probably need swap if you have
less than 256M of RAM and use X. You _definitely_ need it if you have
less than 128M.
X regularly uses 50+ megs, and Mozilla and OpenOffice are big
leaky beasts too. Hopes for improvement are dim.
Really, we have to fight software bloat instead of adding tons of RAM
and swap, but sadly quite a number of vital desktop software packages
are bloated.
I am enormously grateful to all the kernel developers for a Linux kernel
which is:
Memory: 124644k/129536k available
(1403k kernel code, 4436k reserved, 403k data, 152k init, 0k highmem)
Only 1.5 megs of code, 0.5 megs of data!
--
vda
>Much more.
>
>The latency difference seems to be on the order of 100000 times.
>It is the latency we care about because that determines how long
>the CPU cannot do anything useful but has to wait.
>
>Rik
And if you look at the ratio between the access time of RAM, which is in
the low nanoseconds (1 * 10^-9) (data and address must be present for at
least the rated number of ns to guarantee a successful read or write),
and compare it to the seek + rotational delay of a discrete spindle,
which is in the low milliseconds (1 * 10^-3), that puts you at a ratio
of about 1000000.
regards,
--Buddy
On Sat, 27 Jul 2002, Buddy Lumpkin wrote:
> >Much more.
> >
> >The latency difference seems to be on the order of 100000 times.
> >It is the latency we care about because that determines how long
> >the CPU cannot do anything useful but has to wait.
>
> And if you look at the ratio between the access time of RAM which is
> in the low nanoseconds (1 * 10^-9) ... and compare it to the seek +
> rotational delay of a discrete spindle which is in the low
> milliseconds (1 * 10^-3) that puts you at a ratio of about 1000000.
Indeed.
Now imagine one in every million memory accesses results in
a major page fault ... your computer would run at 1/2 speed.
The difference between a 99.999% hit rate and 99.9999% hit
rate becomes rather important with these latency ratios ;)
regards,
Rik
--
Bravely reimplemented by the knights who say "NIH".
http://www.surriel.com/ http://distro.conectiva.com/
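Rik's back-of-the-envelope claims above can be checked numerically. A
small sketch (the latency figures are illustrative round numbers, not
measurements):

```python
# Effective memory access time in the presence of major page faults.
# Round numbers: ~10 ns per RAM access, ~10 ms per disk seek, giving
# the ~1,000,000x latency ratio discussed in the thread.
t_ram = 10e-9     # seconds per RAM access
t_disk = 10e-3    # seconds per major page fault (seek + rotation)

def effective_access(miss_rate):
    """Average access time when a fraction miss_rate of accesses fault."""
    return t_ram + miss_rate * t_disk

# One fault per million accesses: the machine runs at about half speed.
slowdown_rare = effective_access(1e-6) / t_ram    # ~2x
# A 99.999% hit rate (one fault per 100,000 accesses): ~11x slower.
slowdown_often = effective_access(1e-5) / t_ram   # ~11x
print(slowdown_rare, slowdown_often)
```

This is exactly why the difference between a 99.999% and a 99.9999% hit
rate matters so much at these latency ratios.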
On Sat, 2002-07-27 at 19:02, Denis Vlasenko wrote:
> On 27 July 2002 12:42, Ville Herva wrote:
> > > I created a swap area twice as large as my RAM size (just an
> > > arbitrary size), that is 1G. I've tested with lower sizes too. My RAM
> > > is never filled (well, I haven't seen it filled, at least) since I
> > > always work on console, no X and things like those. Even compiling
> > > two or three kernels at a time don't consume my RAM. What I try to
> > > explain is that the swap is not really needed in my machine, since
> > > the memory is not prone to be filled.
> >
> > So you have 512MB of RAM? All the programs (without X) will fit there
> > easily. You'll still have plenty for disk cache.
>
> With today's software I'd say you probably need swap if you have
> less than 256M of RAM and use X. You _definitely_ need it if you have less
> than 128M.
You really must think beyond the desktop as well. With large servers
running many databases, or a single large database, you will inherently
use swap. Maybe not much, but it will get used.
On a P4 Xeon 1MB L3 server with 8GB RAM, I've got 4GB swap configured,
and use about 2GB of that with 4 Oracle instances running. The largest
instance is ~700GB, whereas the other 4 are ~30GB each.
In this scenario you have a large SHMMAX defined (4GB in this case), or
50% of available RAM. As Oracle, Java, and other bits are used on the
system, threading or not, most of the available RAM will eventually get
used. On a box like this with 2.4.19-rc1, the breakdown is ~2% free RAM,
95% cached, and 3% active.
Swap is ~50% used right now. So regardless of how much RAM you have, you
will swap some, somewhere.
> X regularly uses 50+ megs, and Mozilla and OpenOffice are big
> leaky beasts too. Hopes for improvement are dim.
>
> Really, we have to fight software bloat instead of adding tons of RAM
> and swap, but sadly we have quite a number of vital desktop software
> packages overbloated.
This is another scenario as well. On my 1.333GHz Athlon-C I've got 512MB
RAM. My swap total is 265064 kB, but my swap free is 263600 kB. I'm not
swapping much right now, but I also just rebooted last night. Either
way, MemFree is 19256 kB. After running some video applications or
ogg123 or something like that, swap usage will typically go up to ~40 to
~100 MB. Thankfully the -aa tree reclaims this very well, and it will
usually go back down to nearly 0 kB used... or, like it is now, 2-3 MB
used.
> I am enormously grateful for all kernel developers for Linux kernel
> which is:
>
> Memory: 124644k/129536k available
> (1403k kernel code, 4436k reserved, 403k data, 152k init, 0k highmem)
>
> Only 1.5 megs of code, 0.5 megs of data!
> --
> vda
> -
--
Austin Gonyou <[email protected]>
>You really must think beyond the desktop as well. With large servers
>running many databases, or a single large database, you will inherently
>use swap. Maybe not much, but it will get used.
>On a P4 Xeon 1MB L3 server with 8GB RAM, I've got 4GB swap configured,
>and use about 2GB of that with 4 Oracle instances running. The largest
>instance is ~700GB, whereas the other 4 are ~30GB each.
>In this scenario you have a large SHMMAX defined (4GB in this case), or
>50% of available RAM. As Oracle, Java, and other bits are used on the
>system, threading or not, most of the available RAM will eventually get
>used. On a box like this with 2.4.19-rc1, the breakdown is ~2% free
>RAM, 95% cached, and 3% active.
>Swap is ~50% used right now. So regardless of how much RAM you have,
>you will swap some, somewhere.
I thought Linux worked more like Solaris, where it doesn't use any swap
(AT ALL) until it has to... At least, I hope Linux works this way.
I manage a couple of Sun E10K domains (currently 20 procs, 20GB RAM)
running Oracle instances that are in excess of 1.5TB (for a single
instance; this is considered very large for OLTP use of an Oracle
instance), and we rarely see any use of our swap devices.
Solaris uses a two-handed clock algorithm that traverses the PTEs for
all pages in the system.
The first hand resets the MMU reference and modified bits for a given
page, and the second hand checks whether the reference or modified bit
has been flipped since the first hand reset it.
If not, it ( > Solaris 2.5.1 with priority paging, or Solaris 8 ) checks
whether the page is a filesystem page and, if so, flushes it to its
backing store (the filesystem); if it's an anonymous page then it may or
may not send that page to swap (depending on how memory-deprived the
system is; there are several watermarks that trigger different
behavior).
Pages that are mapped MAP_PRIVATE and executable are skipped over until
the system is seriously deprived of physical memory.
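The two-handed scan described above can be sketched as a toy simulation
(heavily simplified and hypothetical: the real scanner walks PTEs,
honors watermarks, and treats filesystem, anonymous and executable
pages differently):

```python
# Toy sketch of a two-handed clock scan; all names are invented.
class Page:
    def __init__(self, name):
        self.name = name
        self.referenced = True   # assume recently touched at start

def two_handed_scan(pages, handspread, touched):
    """Front hand clears reference bits; the back hand, trailing by
    handspread positions, reclaims pages whose bit is still clear.
    `touched` names pages re-referenced between the two hands."""
    reclaimed = []
    n = len(pages)
    for front in range(n + handspread):
        if front < n:
            pages[front].referenced = False      # hand 1: clear the bit
            if pages[front].name in touched:
                pages[front].referenced = True   # the page got used again
        back = front - handspread
        if 0 <= back < n and not pages[back].referenced:
            reclaimed.append(pages[back].name)   # hand 2: still unreferenced
    return reclaimed

pages = [Page(f"p{i}") for i in range(6)]
reclaimed = two_handed_scan(pages, handspread=2, touched={"p1", "p4"})
# Only pages nobody touched between the two hands get reclaimed.
```

The handspread between the hands is what gives recently used pages a
chance to prove they are still active before being reclaimed.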
I believe Linux does something similar (even though the implementations
probably look completely different)...
The point to make here is that this mechanism doesn't even kick in until
free physical memory on the system drops to a low watermark (in Solaris
it could be cachefree or lotsfree, depending on the version and whether
it's using priority paging).
You're not gonna have *anything* on the swap device until you have
reached one of these watermarks (1/64 of physical memory free in
Solaris 8).
Solaris recently added an option to vmstat to look at paging statistics.
It nicely separates out executable, anonymous and filesystem page
in/outs. Here's some sample output from one of the domains mentioned
above:
# uname -a
SunOS <hostname removed> 5.8 Generic_108528-07 sun4u sparc
SUNW,Ultra-Enterprise-10000
# prtconf | head
System Configuration: Sun Microsystems sun4u
Memory size: 20480 Megabytes
System Peripherals (Software Nodes):
SUNW,Ultra-Enterprise-10000
packages (driver not attached)
terminal-emulator (driver not attached)
deblocker (driver not attached)
obp-tftp (driver not attached)
disk-label (driver not attached)
# vmstat -p 2
     memory            page            executable    anonymous    filesystem
   swap     free   re   mf    fr  de sr  epi epo epf  api apo apf  fpi  fpo fpf
 80330680 10599432 2613 14569 352 0  8   144 3   3    16  38  38   1294 382 310
 79949440 11044024 3    234   0   0  0   0   0   0    0   0   0    0    0   0
 79946224 11041472 0    286   0   0  0   0   0   0    0   0   0    0    0   0
 79940672 11037392 0    227   0   0  0   0   0   0    0   0   0    0    0   0
 79939440 11035592 0    0     0   0  0   0   0   0    0   0   0    0    0   0
 79936296 11031664 12   577   28  0  0   0   0   0    0   0   0    0    28  28
 79934240 11030512 51   249   0   0  0   0   0   0    0   0   0    0    0   0
 79931920 11029176 18   227   172 0  0   0   0   0    0   0   0    0    172 172
 79930704 11028264 0    58    0   0  0   0   0   0    0   0   0    0    0   0
 79926096 11025032 3    205   0   0  0   0   0   0    0   0   0    0    0   0
 79927752 11024472 57   691   0   0  0   0   0   0    0   0   0    0    0   0
 79922632 11019248 0    223   0   0  0   0   0   0    0   0   0    0    0   0
 79920776 11017768 98   984   0   0  0   0   0   0    0   0   0    0    0   0
So as you can see here, using the process described above, the system
hit a point where it was low on physical memory, turned the scanner on
(the name for the clock algorithm described above), and found some
filesystem pages that had not been used recently to flush to their
backing store. Nothing went to the swap device.
Now there is a little sitting on the swap device here, but this system
has been up for several days. What I'm stressing here is that it doesn't
normally use the swap devices at all.
When you size your DB you have control over how much memory is used for
buffer caches and sort area (for hash joins, etc.). You should be able
to size your instances so that they only occasionally hit the swap
device. Now if you can't afford the right amount of hardware to stay off
the swap device, then that's another story, but what you imply above is
that you always use the swap device when running large DBs, and that
just doesn't make any sense to me.
--Buddy
On Sat, 2002-07-27 at 23:22, Buddy Lumpkin wrote:
> I thought linux worked more like Solaris where it didn't use any swap (AT
> ALL) until it has to... At least, I hope linux works this way.
I'd be surprised if Solaris did something that dumb.
You want to push out old long unaccessed pages of code to make room for
more cached disk blocks from files.
On 28 Jul 2002, Alan Cox wrote:
> On Sat, 2002-07-27 at 23:22, Buddy Lumpkin wrote:
> > I thought linux worked more like Solaris where it didn't use any swap (AT
> > ALL) until it has to... At least, I hope linux works this way.
>
> I'd be suprised if Solaris did something that dumb.
>
> You want to push out old long unaccessed pages of code to make room for
> more cached disk blocks from files.
AFAIK they quietly removed priority paging from Solaris 8,
somewhat embarrassing considering the publicity at its
introduction with Solaris 7, but no more embarrassing than
the regular VM rewrites Linux undergoes ;/
Now if only VM were a well-understood area and we could just
implement something known to work ... OTOH, that would take
away all the fun ;)
cheers,
Rik
--
Bravely reimplemented by the knights who say "NIH".
http://www.surriel.com/ http://distro.conectiva.com/
OK, first off, let's not turn this into a *ix vs. *ix type discussion; I
didn't mean to imply Solaris is superior.
But that's exactly what it does. Solaris doesn't move *anything* to swap
until you reach a watermark called cachefree (which may or may not be
equal to lotsfree).
Why would you want to push *anything* to swap until you have to?
Dirty filesystem pages have to be flushed to disk, it's just a question of
when. Why on earth would I ever decide to move anonymous pages for any
process to disk if I can flush dirty pages to their backing store and put
non-dirty filesystem pages back on the freelist?
Like I said, it's very rare for my systems to do any I/O to swap at all.
And it's pretty relative what "long unaccessed" means...
--Buddy
It's still there, but they say not to enable it...
There was a memory leak in the segmap, and the only way pages would get
freed was by the scanner waking up (in the manner explained earlier).
Richard McDougall wrote an article about this called "the paging storm".
They implemented a new system that still tends to give preference to
filesystem pages called a cyclical page cache. I haven't seen any
whitepapers on how it works though.
I wouldn't call it embarrassing though; they still recommend that you
always run it on pre-Solaris 8 systems because it tends to improve
performance quite a bit on systems where there is a lot of filesystem
I/O.
--Buddy
On Sat, 2002-07-27 at 23:39, Buddy Lumpkin wrote:
> Why would you want to push *anything* to swap until you have to?
To reduce the amount of disk access.
> Dirty filesystem pages have to be flushed to disk, it's just a question of
Clean ones do not. Dirty ones are also copied to disk but remain in
memory for reread events. They may also be deleted before being written.
> and it's pretty relative what "long unaccessed" means ..
In the Linux case the page cache basically doesn't discriminate much
about what a page is (and it may be several things at once - cache,
executing code and file data), just its access history.
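Eviction driven purely by access history is essentially LRU. A minimal
sketch (illustrative only; the real kernel uses LRU approximations over
page lists, not a strict ordered cache like this):

```python
from collections import OrderedDict

class PageCache:
    """Minimal LRU sketch: eviction looks only at access history,
    not at what kind of page it is (cache, code, file data)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()

    def access(self, page):
        """Touch a page; return the evicted page, if any."""
        if page in self.pages:
            self.pages.move_to_end(page)   # now the most recently used
            return None
        evicted = None
        if len(self.pages) >= self.capacity:
            evicted, _ = self.pages.popitem(last=False)  # least recently used
        self.pages[page] = True
        return evicted

cache = PageCache(capacity=2)
cache.access("a")
cache.access("b")
cache.access("a")            # "a" becomes most recently used
victim = cache.access("c")   # cache full: least recent page ("b") goes
```

Whether "b" was file data or program text never entered the decision;
only its access history did.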
On 28 Jul 2002, Alan Cox wrote:
> On Sat, 2002-07-27 at 23:39, Buddy Lumpkin wrote:
> > Why would you want to push *anything* to swap until you have to?
>
> To reduce the amount of disk access
> > and it's pretty relative what "long unaccessed" means ..
>
> In the Linux case the page cache is basically not discriminating too
> much about what page is (and it may be several things at once - cache,
> executing code and file data) just its access history.
There is a case to be made for evicting the page cache with more
priority than process memory ...
... but frequently accessed page cache memory should definitely
stay in RAM, while unaccessed process memory should be evicted.
I'll make a quick patch for this (for recent 2.5) today.
regards,
Rik
--
Bravely reimplemented by the knights who say "NIH".
http://www.surriel.com/ http://distro.conectiva.com/
>> Why would you want to push *anything* to swap until you have to?
>To reduce the amount of disk access
So in Solaris, the scanner is going to wake up eventually as long as
you're doing filesystem I/O; it's just a question of how long it takes
to reach lotsfree.
During that time it caches anything and everything in physical memory.
Are you implying that it should be looking for pages to swap out this
whole time, to free up more space for filesystem and executable pages,
purely based on LRU? Have you done testing to prove that this is a
better approach than setting a threshold for when to wake up the LRU
mechanism?
>> Dirty filesystem pages have to be flushed to disk, it's just a question of
>Clean ones do not. Dirty ones are also copied to disk but remain in
>memory for reread events. They may also be deleted before being written.
Solaris keeps dirty pages after they have been flushed to their backing
store; it's just that, when the system has to choose something to flush,
it prefers filesystem pages over anonymous and executable ones. What's
wrong with that?
>> and it's pretty relative what "long unaccessed" means ..
>In the Linux case the page cache is basically not discriminating too
>much about what page is (and it may be several things at once - cache,
>executing code and file data) just its access history.
Interesting ...
On Sun, 2002-07-28 at 00:01, Buddy Lumpkin wrote:
> Are you implying that it should be looking for pages to swap out this whole
> time to free up more space for filesystem and executable pages purely based
> on lru? Have you done testing to prove that this is a better approach than
> setting a threshold of when to wake up the lru mechanism?
Not all the time - when there is pressure to find more pages.
> Solaris keeps dirty pages after they have been flushed to their backing
> store, it's just when the system has to choose something to flush that it
> preferences filesystem over anonymous and executable, what's wrong with
> that?
Many of its pages are both filesystem and executable. Solaris shares
read-only pages between the caches and the mappings into process spaces.
I can understand favouring flushing mapped files because swap is
generally slower than restoring a file-backed mapping.
On Sun, 2002-07-28 at 00:01, Buddy Lumpkin wrote:
>> Are you implying that it should be looking for pages to swap out this
>> whole time to free up more space for filesystem and executable pages
>> purely based on lru? Have you done testing to prove that this is a
>> better approach than setting a threshold of when to wake up the lru
>> mechanism?
>Not all the time - when there is pressure to find more pages.
OK, then it turns out that we agree: Solaris does "the right thing" in
this respect.
>> Solaris keeps dirty pages after they have been flushed to their backing
>> store, it's just when the system has to choose something to flush that it
>> preferences filesystem over anonymous and executable, what's wrong with
>> that?
>Many of its pages are both file system and executable. Solaris shares
>read-only pages between the caches and the mappings into process spaces.
>I can understand favouring flushing mapped files because swap is
>generally slower than restoring a file backed mapping
Right, Solaris shares everything!
It maps text, data, etc. MAP_PRIVATE (COW) and uses the segmap to unify
access between read(), write() and mmapped MAP_SHARED pages, so that
only a single page exists for any single filesystem page. You can see
proof of this by writing a little C program and then, while it's
running, modifying it (say its output changes from "this is a test" to
"this xx a xxxx") by overwriting the file with mmap MAP_SHARED type
access or with write(); the program's output will change on the fly
while it's running.
When I differentiate between filesystem pages and executable pages, I
should have been specific in saying that executable pages almost always
have a named file as their backing store, but for this discussion
"executable" means that the page has the execute bit set.
For priority paging they actually warn that regular files that are
mmapped will be incorrectly treated like executables if they have the
execute bits set. This probably happens quite often, so it's not
perfect.
--Buddy
On Sunday 28 July 2002 00:49, Buddy Lumpkin wrote:
> They implemented a new system that still tends to give preference to
> filesystem pages called a cyclical page cache. I haven't seen any
> whitepapers on how it works though.
This is fairly informative:
http://www.princeton.edu/~unix/Solaris/troubleshoot/ram.html
--
Daniel
On Sat, Jul 27, 2002 at 03:39:41PM -0700, you [Buddy Lumpkin] wrote:
>
> Why would you want to push *anything* to swap until you have to?
If you have idle I/O time on your hands, you can choose to back up some
dirty anonymous pages to the swap device. This way, when pages really
need to get freed, you can just drop them (just like you would drop
clean file-backed pages). This obviously eliminates a great deal of
latency (somebody said something about a "swap storm"), because the
write happened beforehand.
There's nothing wrong with swap being in use (and the pages may still be
in memory). If you have swap, it makes sense to use it. What doesn't
make sense is to waste time waiting for paging to happen.
-- v --
[email protected]
On Sat, Jul 27, 2002 at 03:39:41PM -0700, you [Buddy Lumpkin] wrote:
>>
>> Why would you want to push *anything* to swap until you have to?
>If you have idle io time in your hands, you can choose to back up some
>dirty anonymous pages to the swap device. This way, when pages really
>needs to get freed, you can just drop the pages (just like you would
>drop clean file backed pages.) This obviously eliminates a great
>latency (somebody said something about a "swap storm"), because the
>write happened beforehand.
>There's nothing wrong with the swap being in use (and the pages may
>still be in memory). If you have swap, it makes sense to use it. What
>doesn't make sense is to waste time waiting for paging to happen.
In Solaris you don't even need to define a swap device at all.
If you're sure that you will never reach lotsfree (for that matter,
nothing stops you from setting lotsfree, desfree and minfree to whatever
values you want), Solaris will happily run without a swap device even
defined.
Once you reach the lotsfree watermark it's a whole different story; then
it makes perfect sense to queue up writes to the swap device and write
them out to swap in a sensible way, as you point out above. But when I
made the comment above, I was referring to a system that is not low on
memory.
Regards,
--Buddy
On Sun, Jul 28, 2002 at 12:59:13AM -0700, you [Buddy Lumpkin] wrote:
>
> In Solaris you don't even need to define a swap device at all.
> If you're sure that you will never reach lotsfree (for that matter,
> nothing stops you from setting lotsfree, desfree and minfree to
> whatever values you want) Solaris will happily run without a swap
> device even defined.
You don't have to have a swap device in Linux either (AFAIK it has never
been mandatory). Linux will run without swap just as you would expect.
> Once you reach the lotsfree watermark it's a whole different story, then it
> makes perfect sense to queue up writes to the swap device and write
> them out to swap in a sensible way as you point out above, but when I made
> the comment above, I was referring to a system that is not low on memory.
Obviously, if you have heaps of free memory, Linux will not usually
touch swap either. The exact point where swap starts to get used is of
course a matter of debate (and depends on the VM implementation and
tunables in Linux). But the point is: even if there is no immediate
memory pressure, backing pages to swap is no crime.
-- v --
[email protected]
Hi Ville :)
>> Yes, that makes sense, obviously. My question is more: when an
>> inactive page will be swapped out? Only when there is no more RAM
>> left?
>No, it is smarter than that.
Well, I supposed it ;) Then I will reduce my swap area size
(since it's mostly unused) and will go with it :)
>> How to configure it?
>Through the tunables in /proc/sys/vm/.
Ok, thanks :))
Ra?l
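For the record, those tunables are just small files under /proc/sys/vm. A
sketch that dumps whatever such a directory contains -- demonstrated on a
temporary directory with fabricated names and values, since the real set
varies between kernel versions:

```python
import os
import tempfile

def dump_tunables(vmdir):
    """Return {name: contents} for every regular file in a sysctl-style dir."""
    out = {}
    for name in sorted(os.listdir(vmdir)):
        path = os.path.join(vmdir, name)
        if os.path.isfile(path):
            with open(path) as f:
                out[name] = f.read().strip()
    return out

# Demo on a fake directory; point vmdir at /proc/sys/vm on a real system.
demo = tempfile.mkdtemp()
for name, value in [("page-cluster", "3"), ("overcommit_memory", "0")]:
    with open(os.path.join(demo, name), "w") as f:
        f.write(value + "\n")
print(dump_tunables(demo))  # {'overcommit_memory': '0', 'page-cluster': '3'}
```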
On Sun, 28 Jul 2002, Ville Herva wrote:
> If you have swap, it makes sense to use it. What doesn't make
> sense is to waste time waiting for paging to happen.
Unless of course you're running on battery power...
Rik
--
Bravely reimplemented by the knights who say "NIH".
http://www.surriel.com/ http://distro.conectiva.com/
On Sun, Jul 28, 2002 at 11:11:57AM -0300, you [Rik van Riel] wrote:
> On Sun, 28 Jul 2002, Ville Herva wrote:
>
> > If you have swap, it makes sense to use it. What doesn't make
> > sense is to waste time waiting for paging to happen.
>
> Unless of course you're running on battery power...
Well, that is of course an entirely different (and I might call it special)
condition.
In theory you could still write anonymous pages to swap device, and then
have the swap disk spun down/go to powersave state. The swap space is still
in use, but you take the "do not spin up/use the disk, please" requirement
in consideration by not dropping the swap-backed in-memory pages. Once the
swap disk is spun up, you restore the normal operating mode and take
advantage of the still swap-backed anonymous pages (provided of course they
haven't been dirtied in between).
That of course is only theoretical speculation. Surely no OS goes that far.
As I understand it, Linux doesn't even have a mechanism to avoid swapping
when power needs to be saved?
-- v --
[email protected]
On Sat, 2002-07-27 at 17:22, Buddy Lumpkin wrote:
> >You really must think beyond the desktop as well. With large servers
> >running many databases, or a single large database, you will inherently
> >use swap. Maybe not much, but it will get used.
> >On a P4 Xeon 1MB L3 server with 8GB ram, I've got 4GB swap configured,
> >and use about 2 of that with 4 Oracle instances running. The largest
> >instance is ~700GB, whereas the 4 others are ~30GB ea.
>
> >In this scenario you have a large SHMMAX defined (4GB in this case), or
> >50% available RAM. As Oracle, Java, and other bits are used in the
> > system threading or not, most of the entirety of the available ram will
> >eventually get used. The available to cache ratio on a box like this
> >with 2.4.19-rc1 is ~2% free ram, and 95% cached, and 3% active.
> >Swap is ~50% right now. So regardless of how much ram you have, you will
> >swap some, somewhere.
>
>
> I thought linux worked more like Solaris, where it doesn't use any swap (AT
> ALL) until it has to... At least, I hope linux works this way.
Let me preface that: we are migrating from E4500s to x86. Our production
boxen use > 12GB of swap each most of the time, and have 8GB of RAM each.
We do a lot of monitoring of our DB because of the data types we're using.
Essentially not OLTP, but batch loading and historical data mining.
> I manage a couple of Sun E10K domains (currently 20 procs, 20GB ram) running
> Oracle instances that are in excess
> of 1.5TB (for a single instance, this is considered very large for OLTP
> based usage of an Oracle instance)
> and we rarely see any use of our swap devices.
>
...See above...
> I believe Linux does something similar (even though the implementations
> probably look completely different)...
It could, but Linux does act very differently in this respect, and is
usually more efficient about it.
> The point to make here is that this mechanism doesn't even kick in until
> free physical memory on the system drops
> to a low watermark (in Solaris it could be cachefree or lotsfree depending
> on version and whether it's using priority paging).
> You're not gonna have *anything* on the swap device until you have reached
> one of these watermarks (1/64 physical memory free in Solaris 8).
>
> Solaris recently added an option to vmstat to look at paging statistics. It
> nicely separates out executable, anonymous and filesystem page-ins/outs.
> Here's some sample output from one of the domains mentioned above:
>
> # uname -a
> SunOS <hostname removed> 5.8 Generic_108528-07 sun4u sparc
> SUNW,Ultra-Enterprise-10000
> # prtconf | head
> System Configuration: Sun Microsystems sun4u
> Memory size: 20480 Megabytes
> System Peripherals (Software Nodes):
>
> SUNW,Ultra-Enterprise-10000
> packages (driver not attached)
> terminal-emulator (driver not attached)
> deblocker (driver not attached)
> obp-tftp (driver not attached)
> disk-label (driver not attached)
>
> # vmstat -p 2
>     memory             page           executable   anonymous    filesystem
>   swap     free    re    mf  fr de sr epi epo epf api apo apf  fpi  fpo  fpf
> 80330680 10599432 2613 14569 352 0  8 144   3   3  16  38  38 1294  382  310
> 79949440 11044024    3   234   0 0  0   0   0   0   0   0   0    0    0    0
> 79946224 11041472    0   286   0 0  0   0   0   0   0   0   0    0    0    0
> 79940672 11037392    0   227   0 0  0   0   0   0   0   0   0    0    0    0
> 79939440 11035592    0     0   0 0  0   0   0   0   0   0   0    0    0    0
> 79936296 11031664   12   577  28 0  0   0   0   0   0   0   0    0   28   28
> 79934240 11030512   51   249   0 0  0   0   0   0   0   0   0    0    0    0
> 79931920 11029176   18   227 172 0  0   0   0   0   0   0   0    0  172  172
> 79930704 11028264    0    58   0 0  0   0   0   0   0   0   0    0    0    0
> 79926096 11025032    3   205   0 0  0   0   0   0   0   0   0    0    0    0
> 79927752 11024472   57   691   0 0  0   0   0   0   0   0   0    0    0    0
> 79922632 11019248    0   223   0 0  0   0   0   0   0   0   0    0    0    0
> 79920776 11017768   98   984   0 0  0   0   0   0   0   0   0    0    0    0
Yeah...I know. We page our asses off...on Solaris. :-D
> Nothing went to the swap device.
I think it's application and type related.
> Now there is a little sitting on the swap device here, but this system has
> been up for several days. What I'm stressing here is that it doesn't
> normally use the swap devices at all.
Our boxen were up for > 60 days before our latest maintenance window,
and yes, we're at ~12GB again now.
> When you size your DB you have control over how much memory is used for
> buffer caches and sort area (for hash joins, etc.). You should be able to
> size your instances so that they only occasionally hit the swap device. Now
> if you can't afford the correct amount of hardware to stay off the swap
> device, then that's another story,
Not so much about large DBs, but large DBs with lots of different things
around them. Let's put it more into perspective: our systems don't swap
until oracle starts doing stuff. It could easily be schema and package
related; there are a *few* table scans we *must* do in our application.
Until the code is changed in the future, that will not change. We also have
about 50-100 sqlplus operations going 100% of the time, in addition to our
loading connections.
We've tweaked, re-tweaked, etc, over and over and over, had countless
people from oracle come to take a crack, and actually, this is as good
as it gets for us ATM. :-D At least on Linux we're only using 2GB.
> but what you imply above is that you always use the swap device when
> running large DB's, and that just doesn't make any sense to me.
I'm not implying that at all, but I can see how you might think so. What I
was really implying is that regardless of whether you have a lot of memory
or not, it's how you use that memory in relation to how much you have that
will determine if you swap or not. In a perfect world, it might be safe to
say "I don't need swap at all, ever", but I can't say that that's something
that has become normal practice with production desktops and servers.
Almost everything can use a little bit of swap, regardless of how much
ram you might have.
> --Buddy
--
Austin Gonyou <[email protected]>
On Sat, Jul 27, 2002 at 03:39:41PM -0700, you [Buddy Lumpkin] wrote:
>>
>> Why would you want to push *anything* to swap until you have to?
>If you have idle io time in your hands, you can choose to back up some
dirty
>anonymous pages to the swap device. This way, when pages really needs to
get
>freed, you can just drop the pages (just like you would drop clean file
>backed pages.) This obviously eliminates a great latency (somebody said
>something about a "swap storm"), because the write happened beforehand.
>There's nothing wrong with the swap being in use (and the pages may still
be
>in memory). If you have swap, it makes sense to use it. What doesn't make
>sense is to waste time waiting for paging to happen.
This just flat out doesn't make sense to me ...
The system I showed stats on earlier has been up for 57 days. Periodically
file system I/O pushes freemem below lotsfree and wakes up the scanner. The
scanner wakes up and finds some filesystem pages that haven't been
referenced or modified in a really long, long time and frees a few of them,
then it goes back to sleep. This keeps a ton of pages in RAM strictly for
caching value (although dirty pages are flushed periodically, they are kept
in RAM too). Then when a shared mapping to a file occurs, or a file is
opened and accessed with read or write, it can use the page fault mechanism
(minor fault) to retrieve those pages (using vnode + offset of the page) as
opposed to going to disk.
By the looks of it, on one or more rare occasions, it must have pushed some
anonymous pages to the swap devices, and there they sit, pretty much doing
nothing. But that's the nice thing about it ... Why would I want I/O going
all the time in anticipation of a memory shortage that will rarely happen,
or might not happen at all! If I understand you correctly, you're imagining
all of the up-front work you could be doing in anticipation of the crawling
system that could benefit from pages already pushed to the swap device, but
that would only be one case.
If I'm willing to spend the money for tons of RAM I shouldn't have to incur
the overhead of going out to the swap device at all unless I truly get short
on memory. Don't just assume that it's inevitable that I will have to swap
at some point.
And when you refer to idle I/O time, do you mean I/O to the swap device(s)
or all I/O on the system (I/O to all disks, network, etc.)?
--Buddy
On Sun, 2002-07-28 at 14:48, Buddy Lumpkin wrote:
>
> On Sat, Jul 27, 2002 at 03:39:41PM -0700, you [Buddy Lumpkin] wrote:
> >>
> >> Why would you want to push *anything* to swap until you have to?
>
> >If you have idle io time in your hands, you can choose to back up some dirty
> >anonymous pages to the swap device. This way, when pages really need to get
> >freed, you can just drop the pages (just like you would drop clean file
> >backed pages.) This obviously eliminates a great latency (somebody said
> >something about a "swap storm"), because the write happened beforehand.
>
> >There's nothing wrong with the swap being in use (and the pages may still be
> >in memory). If you have swap, it makes sense to use it. What doesn't make
> >sense is to waste time waiting for paging to happen.
>
>
> This just flat out doesn't make sense to me ...
>
> The system I showed stats on earlier has been up for 57 days. Periodically
> file system I/O pushes freemem below lotsfree and wakes up the scanner. The
> scanner wakes up and finds some filesystem pages that haven't been
> referenced or modified in a really long, long time and frees a few of them,
> then it goes back to sleep. This keeps a ton of pages in RAM strictly for
> caching value (although dirty pages are flushed periodically, they are kept
> in RAM too). Then when a shared mapping to a file occurs, or a file is
> opened and accessed with read or write, it can use the page fault mechanism
> (minor fault) to retrieve those pages (using vnode + offset of the page) as
> opposed to going to disk.
>
> By the looks of it, on one or more rare occasions, it must have pushed some
> anonymous pages to the swap devices, and there they sit, pretty much doing
> nothing. But that's the nice thing about it ... Why would I want I/O going
> all the time in anticipation of a memory shortage that will rarely happen,
> or might not happen at all! If I understand you correctly, you're imagining
> all of the up-front work you could be doing in anticipation of the crawling
> system that could benefit from pages already pushed to the swap device, but
> that would only be one case.
>
> If I'm willing to spend the money for tons of RAM I shouldn't have to incur
> the overhead of going out to the swap device at all unless I truly get
> short on memory. Don't just assume that it's inevitable that I will have to
> swap at some point.
>
> And when you refer to idle I/O time, do you mean I/O to the swap device(s)
> or all I/O on the system (I/O to all disks, network, etc.)?
>
> --Buddy
If you bother to do any real tests you'd see that linux will swap when
nothing is going on and this doesn't hinder anything. This overhead
you're imagining doesn't occur, because overhead only exists when you're
trying to do something. There is no drawback to how linux puts pages
into swap except what Rik van Riel said about battery-powered boxes.
Even so, I believe that's just a /proc tunable fix. Otherwise there is
a situation where it's advantageous to do what linux does and none where
it isn't.
It's not like your swap device is growing and using more swap means less
space for programs. You have X amount of swap space.. like other
people have said, might as well make use of it if you can. Who cares
if that situation may not occur, it's not detrimental to anything.
On 28 Jul 2002, Ed Sweetman wrote:
> If you bother to do any real tests you'd see that linux will swap when
> nothing is going on and this doesn't hinder anything.
Linux only puts pages in swap when it's low on free physical memory.
regards,
Rik
--
Bravely reimplemented by the knights who say "NIH".
http://www.surriel.com/ http://distro.conectiva.com/
On Sun, 2002-07-28 at 15:29, Rik van Riel wrote:
> On 28 Jul 2002, Ed Sweetman wrote:
>
> > If you bother to do any real tests you'd see that linux will swap when
> > nothing is going on and this doesn't hinder anything.
>
> Linux only puts pages in swap when it's low on free physical memory.
Perhaps, but linux considers disk cache as "in use" memory and most
people would consider it free memory that's just temporarily being taken
advantage of "in case". Linux will still swap even if 60% of ram is
filesystem cache. I don't have a problem with it, was just stating some
real observations.
> regards,
>
> Rik
> --
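Ed's observation -- that the cache is counted as "in use" -- is exactly what
the classic procps `free` output shows. A sketch of the interpretation, with
fabricated numbers (on kernels of this era, memory that is effectively
reclaimable is roughly free + buffers + cached):

```python
# Sketch: interpret `free`-style figures (kB). The sample values are
# invented for illustration, not taken from a real machine.
def effectively_free(total, used, free, buffers, cached):
    """Approximate reclaimable memory: free plus buffer/page cache."""
    return free + buffers + cached

sample = dict(total=262144, used=253952, free=8192, buffers=20480, cached=157286)
print(effectively_free(**sample))  # 185958 kB, though `free` reports only 8192
```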
On Sun, Jul 28, 2002 at 11:48:51AM -0700, you [Buddy Lumpkin] wrote:
>
> If I'm willing to spend the money for tons of RAM I shouldn't have to incur
> the overhead of going out to the swap device at all unless I truly get
> short on memory. Don't just assume that it's inevitable that I will have
> to swap at some point.
I don't get it. Why do you insist swap device must not be touched unless the
system is suffering severe memory shortage? If the anonymous pages are only
written out under dire shortage, you'll have to wait longer for memory to
get freed. If you never face the shortage - well, then, you don't. That's
it, no harm done swap-backing the pages. And remember, the fact that
something is written to swap doesn't mean it couldn't still exist in memory.
Why do you want the swap device not to be touched when there's
nothing else going on? I mean, if you don't want the system to use the swap
device, don't configure one.
-- v --
[email protected]
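The latency argument here can be made concrete with a toy model: if dirty
anonymous pages were written to swap during idle time, a later reclaim only
has to drop them; otherwise each one costs a write on the critical path. The
unit cost below is invented, not a measurement:

```python
# Toy model of the eager-writeback argument. WRITE_COST is an invented unit
# cost for writing one page to the swap device; dropping a clean page that is
# already swap-backed costs nothing.
WRITE_COST = 1

def reclaim_cost(dirty_anon_pages, precleaned):
    """Pages pre-written during idle I/O are clean and free to drop; the rest
    must be written out at reclaim time, while the system waits."""
    must_write = max(0, dirty_anon_pages - precleaned)
    return must_write * WRITE_COST

# Reclaiming 1000 anonymous pages under sudden memory pressure:
print(reclaim_cost(1000, precleaned=0))    # 1000: the full "swap storm"
print(reclaim_cost(1000, precleaned=800))  # 200: most pages just dropped
```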
On Sun, 2002-07-28 at 15:29, Rik van Riel wrote:
>> On 28 Jul 2002, Ed Sweetman wrote:
>>
>> > If you bother to do any real tests you'd see that linux will swap when
>> > nothing is going on and this doesn't hinder anything.
>>
>> Linux only puts pages in swap when it's low on free physical memory.
>Perhaps, but linux considers disk cache as "in use" memory and most
>people would consider it free memory that's just temporarily being taken
>advantage of "in case". Linux will still swap even if 60% of ram is
>filesystem cache. I don't have a problem with it, was just stating some
>real observations.
I don't remember anyone implying that pages in memory that are backed by a
named file on a filesystem are "free memory" in Linux or Solaris.
If you thought you read this you should traverse the thread again.
The discussion was centered around whether it would "add value" to prefer
filesystem pages over anonymous and executable pages when you reach the
point where you have to start looking for pages to reclaim because of a
physical memory shortage.
By all means, Solaris will swap pages and eventually entire processes if it
needs to, it just tries to grab the oldest filesystem pages first. If that's
not working (the memory shortage is still getting worse even though the
scanner is running) it will reach the next watermark, which changes the
behavior of the scanner.
Another example of this kind of behavior is how the scanner in Solaris skips
over extensively shared libraries. The scanner looks at the share reference
count for each page, and if the page is shared by more than a certain number
of processes, then it is skipped during the page scan operation.
Regards,
--Buddy
On Sat, Jul 27, 2002 at 02:22:20PM +0200, DervishD wrote:
>
> So my question is: should I use a swap-area for improving
> performance (or whatever else), or should I use those precious bytes
> to improving my porn collection }:))?
I'll make you a deal: I won't talk about my porn collection on l-k if
you won't either.
Unless you _want_ to hear about my Playgirl archive... :)
-VAL
Ville Herva <[email protected]> writes:
> > How to configure it?
>
> Through the tunables in /proc/sys/vm/.
By the way, speaking of /proc/sys, could we decide on either hyphens,
or underscores, but not both?
# ls /proc/sys/vm
bdflush max_map_count min-readahead page-cluster
kswapd max-readahead overcommit_memory pagetable_cache
(this is 2.4.19-rc3)
I'd submit a patch except the asbestos underwear is in the wash
today. (IOW, I don't know which would be preferred... I suspect
underscores.)
ian
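Ian's point is easy to see mechanically. Splitting the 2.4.19-rc3 names he
lists by separator style:

```python
# The names below are exactly those listed from /proc/sys/vm in 2.4.19-rc3.
names = ["bdflush", "max_map_count", "min-readahead", "page-cluster",
         "kswapd", "max-readahead", "overcommit_memory", "pagetable_cache"]

hyphens = sorted(n for n in names if "-" in n)
unders = sorted(n for n in names if "_" in n)
print(hyphens)  # ['max-readahead', 'min-readahead', 'page-cluster']
print(unders)   # ['max_map_count', 'overcommit_memory', 'pagetable_cache']
```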
On Sun, Jul 28, 2002 at 12:40:11AM +0100, Alan Cox wrote:
> On Sat, 2002-07-27 at 23:22, Buddy Lumpkin wrote:
> > I thought linux worked more like Solaris where it didn't use any swap (AT
> > ALL) until it has to... At least, I hope linux works this way.
>
> I'd be suprised if Solaris did something that dumb.
>
> You want to push out old long unaccessed pages of code to make room for
> more cached disk blocks from files.
... unless the disk blocks are coming in due to a sequential stream
that's much larger than memory, in which case paging out user data to
expand the buffer cache is an exercise in futility that makes the system
behave sluggishly long AFTER the stream is done streaming through.
I see this behavior every morning after the nightly backup is done
(pulling in about 20 GB of data on a 256MB machine) -- my window manager
and browser are absurdly sluggish for about 20 seconds.
-andy
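Andy's backup scenario is the classic weakness of pure LRU caching: one large
sequential pass evicts a small but hot working set. A toy LRU page cache (all
sizes invented, far smaller than real ones) shows the effect:

```python
from collections import OrderedDict

class LRUCache:
    """Minimal page-cache model: fixed capacity in 'pages', LRU eviction."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()

    def touch(self, page):
        # Move the page to the most-recently-used end, evicting the LRU
        # page if the cache overflows.
        self.pages.pop(page, None)
        self.pages[page] = True
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)

cache = LRUCache(capacity=100)
working_set = [("app", i) for i in range(20)]
for p in working_set:
    cache.touch(p)

# One nightly-backup-style sequential stream, far larger than the cache:
for i in range(10_000):
    cache.touch(("backup", i))

survivors = [p for p in working_set if p in cache.pages]
print(len(survivors))  # 0: the hot pages were all evicted by the stream
```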