2001-11-18 21:13:12

by Justin Piszcz

Subject: Swap

It is amazing that I could run all of that stuff, because:

When I have swap on, and if I run all of those programs, 200-400MB of
swap is used.



2001-11-18 21:26:15

by James A Sutherland

Subject: Re: Swap

On Sunday 18 November 2001 9:12 pm, war wrote:
> It is amazing that I could run all of that stuff, because:
>
> When I have swap on, and if I run all of those programs, 200-400MB of
> swap is used.

Yep. There's a reason for that: the kernel is *ALWAYS* able to swap pages out
to disk - even without "swap space". Disabling swapspace simply forces the
kernel to swap out more code, since it cannot swap out any data.

(This is why you can still get "disk thrashing" without any swap - in fact,
it's more likely in this case than it is with some swap added - you are just
forcing your binaries to take more of the swapping load instead.)


So: with swapspace, the kernel swaps out a few hundred Mb of unused data, to
make room for more code. Without it, the kernel is forced to swap out code
pages instead. The big news here is...?


James.

2001-11-18 21:29:34

by Justin Piszcz

Subject: Re: Swap

Well, without the swap, everything seems to be about 100% more responsive when
I execute any task.
I see how it works now.

James A Sutherland wrote:

> On Sunday 18 November 2001 9:12 pm, war wrote:
> > It is amazing that I could run all of that stuff, because:
> >
> > When I have swap on, and if I run all of those programs, 200-400MB of
> > swap is used.
>
> Yep. There's a reason for that: the kernel is *ALWAYS* able to swap pages out
> to disk - even without "swap space". Disabling swapspace simply forces the
> kernel to swap out more code, since it cannot swap out any data.
>
> (This is why you can still get "disk thrashing" without any swap - in fact,
> it's more likely in this case than it is with some swap added - you are just
> forcing your binaries to take more of the swapping load instead.)
>
> So: with swapspace, the kernel swaps out a few hundred Mb of unused data, to
> make room for more code. Without it, the kernel is forced to swap out code
> pages instead. The big news here is...?
>
> James.

2001-11-18 21:39:38

by FD Cami

Subject: Re: Swap


I don't understand why it should be better with swap then... I mean,
my comp seems to run so much faster (it doesn't take time to switch
from one app to another, i mean) *without* swap.
And I see no benefits to having an active swap, other than making my
hard drive work harder.

comp is PIII933/512MB on ATA100
kernel is 2.4.14 with XFS patch.

François


war wrote:

> Well, without the swap, everything seems to be about 100% more responsive when
> I execute any task.
> I see how it works now.
>
> James A Sutherland wrote:
>
>
>>On Sunday 18 November 2001 9:12 pm, war wrote:
>>
>>>It is amazing that I could run all of that stuff, because:
>>>
>>>When I have swap on, and if I run all of those programs, 200-400MB of
>>>swap is used.
>>>
>>Yep. There's a reason for that: the kernel is *ALWAYS* able to swap pages out
>>to disk - even without "swap space". Disabling swapspace simply forces the
>>kernel to swap out more code, since it cannot swap out any data.
>>
>>(This is why you can still get "disk thrashing" without any swap - in fact,
>>it's more likely in this case than it is with some swap added - you are just
>>forcing your binaries to take more of the swapping load instead.)
>>
>>So: with swapspace, the kernel swaps out a few hundred Mb of unused data, to
>>make room for more code. Without it, the kernel is forced to swap out code
>>pages instead. The big news here is...?
>>
>>James.
>>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
>



2001-11-18 21:46:08

by Justin Piszcz

Subject: Re: Swap

I completely agree with you.
p3/866/1024MB here.
Everything seems much faster; and I can run 512 processes of varying memory
"weights" without a hitch.

François Cami wrote:

> I don't understand why it should be better with swap then... I mean,
> my comp seems to run so much faster (it doesn't take time to switch
> from one app to another, i mean) *without* swap.
> And I see no benefits to having an active swap, other than making my
> hard drive work harder.
>
> comp is PIII933/512MB on ATA100
> kernel is 2.4.14 with XFS patch.
>
> François
>
> war wrote:
>
> > Well, without the swap, everything seems to be about 100% more responsive when
> > I execute any task.
> > I see how it works now.
> >
> > James A Sutherland wrote:
> >
> >
> >>On Sunday 18 November 2001 9:12 pm, war wrote:
> >>
> >>>It is amazing that I could run all of that stuff, because:
> >>>
> >>>When I have swap on, and if I run all of those programs, 200-400MB of
> >>>swap is used.
> >>>
> >>Yep. There's a reason for that: the kernel is *ALWAYS* able to swap pages out
> >>to disk - even without "swap space". Disabling swapspace simply forces the
> >>kernel to swap out more code, since it cannot swap out any data.
> >>
> >>(This is why you can still get "disk thrashing" without any swap - in fact,
> >>it's more likely in this case than it is with some swap added - you are just
> >>forcing your binaries to take more of the swapping load instead.)
> >>
> >>So: with swapspace, the kernel swaps out a few hundred Mb of unused data, to
> >>make room for more code. Without it, the kernel is forced to swap out code
> >>pages instead. The big news here is...?
> >>
> >>James.
> >>

2001-11-18 22:06:11

by J.A. Magallon

Subject: Re: Swap


On 20011118 James A Sutherland wrote:
>On Sunday 18 November 2001 9:12 pm, war wrote:
>> It is amazing that I could run all of that stuff, because:
>>
>> When I have swap on, and if I run all of those programs, 200-400MB of
>> swap is used.
>
>Yep. There's a reason for that: the kernel is *ALWAYS* able to swap pages out
>to disk - even without "swap space". Disabling swapspace simply forces the
>kernel to swap out more code, since it cannot swap out any data.
>

Sure ??? Where ?? What disk space uses it to swap pages to ?

>(This is why you can still get "disk thrashing" without any swap - in fact,
>it's more likely in this case than it is with some swap added - you are just
>forcing your binaries to take more of the swapping load instead.)
>

You get thrashing because you don't have anything cached. So you can get to a point
(fill all your space with apps and data) where each file read is _REALLY_ a
disk read, not just a transfer from cache (which is what usually happens).

>
>So: with swapspace, the kernel swaps out a few hundred Mb of unused data, to
>make room for more code. Without it, the kernel is forced to swap out code
>pages instead. The big news here is...?
>

You swap out pages, not data or code. Kernel does not care if the page contains
code or data. Try (on a swap enabled box) this: open mozilla or staroffice (a
big gui app), let it open and don't use it, fill your ram with other apps and
try to pull down a menu from mozilla. It has an unusual delay, the time to get
mozilla CODE pages back from swap. That is why a system with no swap is more
responsive.

Yes, a box without swap runs faster, but only if you *don't do anything* with it. The test
shown in previous mails had a ton of apps open *doing nothing*. Try to do
a grep several times on the kernel source tree, for example, in that scenario.
Or a kernel build. They will be dog slow (on every try). Try the same on
a box with swap: the second time, many things are cached and it flies.
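
The caching argument above can be sketched with a toy model. This is only an
illustration, not kernel code: the page counts are made-up numbers, and
second_pass_disk_reads is a hypothetical helper that treats the page cache as
a plain LRU list. It also assumes the simplest possible case, where idle
application pages either all stay in RAM (no swap) or can all be moved out
(with swap), and it ignores that file-backed code pages can be dropped even
without swap.

```python
from collections import OrderedDict

def second_pass_disk_reads(cache_pages, file_pages):
    """Read pages 0..file_pages-1 twice through an LRU cache with
    cache_pages slots; return the disk reads needed on the second pass."""
    cache = OrderedDict()
    misses_per_pass = []
    for _ in range(2):
        misses = 0
        for page in range(file_pages):
            if page in cache:
                cache.move_to_end(page)        # hit: mark as most recently used
            else:
                misses += 1                    # miss: a real disk read
                cache[page] = True
                if len(cache) > cache_pages:
                    cache.popitem(last=False)  # evict the least recently used page
        misses_per_pass.append(misses)
    return misses_per_pass[1]

# Hypothetical box: 1000 pages of RAM, 900 held by idle applications,
# and a source tree that occupies 500 pages.
RAM, IDLE_APPS, TREE = 1000, 900, 500
no_swap = second_pass_disk_reads(RAM - IDLE_APPS, TREE)  # idle pages stay in RAM
with_swap = second_pass_disk_reads(RAM, TREE)            # idle pages can go to swap
print(no_swap, with_swap)  # 500 0
```

With only 100 pages left for cache, a sequential pass over 500 pages evicts
every page before it is needed again, so the second grep hits the disk for
everything. The real VM is far less clear-cut than this.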

--
J.A. Magallon # Let the source be with you...
mailto:[email protected]
Mandrake Linux release 8.2 (Cooker) for i586
Linux werewolf 2.4.15-pre6-beo #1 SMP Sun Nov 18 10:25:01 CET 2001 i686

2001-11-18 22:18:29

by FD Cami

Subject: Re: Swap

J.A. Magallon wrote:


> Yes, a box without swap runs faster, but only if you *don't do anything* with it. The test
> shown in previous mails had a ton of apps open *doing nothing*. Try to do
> a grep several times on the kernel source tree, for example, in that scenario.
> Or a kernel build. They will be dog slow (on every try). Try the same on
> a box with swap: the second time, many things are cached and it flies.

I tend to both agree and disagree with you.


fact :
I don't use more than 350MB of my 512MB for apps (and that's a
worst case scenario), and I guess that 150MB of RAM is enough
for caching, at least in my case.

I agree that a box that uses 99% of its RAM for apps
will be dog slow ; but I simply have to disagree with
swapping apps I *use* when 66% of my RAM is free.
Have you tried pulling openoffice from swap to RAM ?
If doing a second [and third, and so on] grep on the kernel source
tree is dog slow without swap, pulling openoffice from swap is
*snail* slow.

François

2001-11-18 22:28:21

by Dan Maas

Subject: Re: Swap

> >Yep. There's a reason for that: the kernel is *ALWAYS*
> >able to swap pages out to disk - even without "swap space".
> >Disabling swapspace simply forces the kernel to swap out
> >more code, since it cannot swap out any data.
>
> Sure ??? Where ?? What disk space uses it to swap pages to ?

The executables and binaries on your regular filesystems... Even with no
swap space, the kernel can "page out" (i.e. drop from memory) read-only file
mappings, since they can always be reloaded from disk if needed.

In other words, there is still a big difference between running without swap
space, and having every program do an mlockall() (which *really* forces all
pages to be permanently resident in RAM).
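
As a minimal sketch of the kind of mapping meant here, assuming Python 3 on a
Unix-like system: the snippet below maps a file read-only and reads it through
the mapping. The helper name read_via_mapping and the temporary file are
illustrative only. The pages behind such a mapping are clean copies of file
contents, which is why the kernel can drop them and fault them back in from
the file; the code cannot observe that happening, it only shows the mapping
itself.

```python
import mmap
import os
import tempfile

def read_via_mapping(path):
    """Map a file read-only and return its contents through the mapping."""
    with open(path, "rb") as f:
        # ACCESS_READ gives a read-only mapping: its pages can never be dirtied.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return m[:]

# Stand-in for an executable or library on a regular filesystem.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 8192)
    path = tmp.name

data = read_via_mapping(path)
os.unlink(path)
print(len(data))  # 8192
```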

Still, it puzzles me why a system with no swap space would appear to be more
responsive than one with swap (assuming their working sets are quite a bit
smaller than total amount of RAM)... Can you do a controlled test somehow,
to rule out any sort of placebo effect?

Regards,
Dan

2001-11-18 22:38:51

by Charles Marslett

Subject: Re: Swap

"J.A. Magallon" wrote:
> On 20011118 James A Sutherland wrote:
> >On Sunday 18 November 2001 9:12 pm, war wrote:
> >> It is amazing that I could run all of that stuff, because:
> >>
> >> When I have swap on, and if I run all of those programs, 200-400MB of
> >> swap is used.
> >
> >Yep. There's a reason for that: the kernel is *ALWAYS* able to swap pages out
> >to disk - even without "swap space". Disabling swapspace simply forces the
> >kernel to swap out more code, since it cannot swap out any data.
>
> Sure ??? Where ?? What disk space uses it to swap pages to ?

The code is "swapped" to the original file it was loaded from. You just
free up the pages for someone else to use until you get a page fault in that
task, then reload it from the original executable. That may have something
to do with the fact that he gets better performance without a swap file allocated,
since code swaps never write, only read (half as much disk I/O). I could see
some workloads that essentially use every bit of data all the time, and swapping
code only is an optimization. Nothing I've ever profiled worked that way,
though. And I thought even in this case the system would tend to swap code
in preference to dirty data (I have to go back and look at the code to say
for sure, though).

> >(This is why you can still get "disk thrashing" without any swap - in fact,
> >it's more likely in this case than it is with some swap added - you are just
> >forcing your binaries to take more of the swapping load instead.)
> >
>
> You get thrashing because you don't have anything cached. So you can get to a point
> (fill all your space with apps and data) where each file read is _REALLY_ a
> disk read, not just a transfer from cache (which is what usually happens).

But that would never run faster than enabling the cache (unless the cache code
was competitive with Microsoft's).

> >So: with swapspace, the kernel swaps out a few hundred Mb of unused data, to
> >make room for more code. Without it, the kernel is forced to swap out code
> >pages instead. The big news here is...?
> >
>
> You swap out pages, not data or code. Kernel does not care if the page contains
> code or data. Try (on a swap enabled box) this: open mozilla or staroffice (a
> big gui app), let it open and don't use it, fill your ram with other apps and
> try to pull down a menu from mozilla. It has an unusual delay, the time to get
> mozilla CODE pages back from swap. That is why a system with no swap is more
> responsive.

Check out the thread entitled "Executing binaries on new filesystem" -- they talk
quite a bit about mmap() and how it is used in loading code space. You are right
about swapping out pages, not data or code, but the pages are written only if
they are dirty. A page that is not dirty does not need to be written to be swapped
out, and code pages are almost never dirty (I think). So they can be "swapped" out
without any place to write them, unlike data pages that have anything but zeros
in them.
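
The clean/dirty distinction can be put into a toy model. This is a
deliberately simplified sketch, not the 2.4 VM: Page and reclaim_action are
hypothetical names, and the model only knows two properties of a page. It
treats a dirty file-backed page as belonging to a shared mapping, and any
page that is not file-backed as data that would need swap.

```python
from dataclasses import dataclass

@dataclass
class Page:
    file_backed: bool  # mapped from a file, e.g. executable text
    dirty: bool        # modified since it was read in or allocated

def reclaim_action(page, have_swap):
    """Return what a simplified VM must do before it can free this page."""
    if page.file_backed and not page.dirty:
        return "drop"            # nothing to write; reread from the file on fault
    if page.file_backed and page.dirty:
        return "write to file"   # shared mapping: flush to the file, then drop
    # Not file-backed: the only place to put the contents is swap.
    return "write to swap" if have_swap else "keep in RAM"

code_page = Page(file_backed=True, dirty=False)
data_page = Page(file_backed=False, dirty=True)
for have_swap in (True, False):
    print(have_swap, reclaim_action(code_page, have_swap),
          reclaim_action(data_page, have_swap))
```

In this model the code page can be freed either way, while the data page can
only leave RAM when swap is available, which matches the argument above.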

> Yes, a box without swap runs faster, but only if you *don't do anything* with it. The test
> shown in previous mails had a ton of apps open *doing nothing*. Try to do
> a grep several times on the kernel source tree, for example, in that scenario.
> Or a kernel build. They will be dog slow (on every try). Try the same on
> a box with swap: the second time, many things are cached and it flies.

Ah! This may well be the explanation for the apparent performance boost. Now I'm
really interested in digging into the paging algorithms....

> --
> J.A. Magallon # Let the source be with you...
> mailto:[email protected]
> Mandrake Linux release 8.2 (Cooker) for i586
> Linux werewolf 2.4.15-pre6-beo #1 SMP Sun Nov 18 10:25:01 CET 2001 i686

--Charles

2001-11-18 22:41:21

by FD Cami

Subject: Re: Swap

Dan Maas wrote:


> Still, it puzzles me why a system with no swap space would appear to be more
> responsive than one with swap (assuming their working sets are quite a bit
> smaller than total amount of RAM)... Can you do a controlled test somehow,
> to rule out any sort of placebo effect?

It's pretty simple... Try putting as many progs as you can into RAM
(but less than total RAM size) when you have RAM+swap.
Switching from one prog to another now takes time, because if you need
to go e.g. from mozilla to openoffice, and openoffice has
been swapped, it'll take ages.

Another good example is launching X and a few heavy X apps, going back
to console, doing a few things, like compiling different kernel trees.
If you have swap, the X + X apps will be swapped. going back to X will
take ages, because all that data + code has to be moved out to RAM to
cache the data in the two kernel trees.
If you don't have swap, maybe one, or both of the two kernel trees
will end up being not cached into main memory, depending on how much
RAM left you have. but going back to X will take 1 second instead of 20,
and thus the system will be more responsive.

It depends clearly on the situation you're in. I believe running with
swap is beneficial when your memory load is more than 75% of total
RAM, and less so when you have a few hundred megs of RAM left with all
useful apps loaded into RAM (which is not too unlikely these days,
due to the low price of SD/DDR RAM).

François

2001-11-18 22:54:42

by J.A. Magallon

Subject: Re: Swap


On 20011118 Charles Marslett wrote:
>"J.A. Magallon" wrote:
>> On 20011118 James A Sutherland wrote:
>> >On Sunday 18 November 2001 9:12 pm, war wrote:
>> >> It is amazing that I could run all of that stuff, because:
>> >>
>> >> When I have swap on, and if I run all of those programs, 200-400MB of
>> >> swap is used.
>> >
>> >Yep. There's a reason for that: the kernel is *ALWAYS* able to swap pages out
>> >to disk - even without "swap space". Disabling swapspace simply forces the
>> >kernel to swap out more code, since it cannot swap out any data.
>>
>> Sure ??? Where ?? What disk space uses it to swap pages to ?
>
>The code is "swapped" to the original file it was loaded from. You just
>free up the pages for someone else to use until you get a page fault in that
>task, then reload it from the original executable. That may have something
>to do with the fact that he gets better performance without a swap file allocated,
>since code swaps never write, only read (half as much disk I/O). I could see

Yup, I missed mmapped pages. You can drop them and reread, yes.

--
J.A. Magallon # Let the source be with you...
mailto:[email protected]
Mandrake Linux release 8.2 (Cooker) for i586
Linux werewolf 2.4.15-pre6-beo #1 SMP Sun Nov 18 10:25:01 CET 2001 i686

2001-11-18 23:36:29

by Bernd Eckenfels

Subject: Re: Swap

In article <[email protected]> you wrote:
>>Yep. There's a reason for that: the kernel is *ALWAYS* able to swap pages out
>>to disk - even without "swap space". Disabling swapspace simply forces the
>>kernel to swap out more code, since it cannot swap out any data.
>>

> Sure ??? Where ?? What disk space uses it to swap pages to ?

It does not swap code pages out. It simply forgets them and reloads ("page
them in") them when needed.

Greetings
Bernd

2001-11-19 07:08:35

by Erik Gustavsson

Subject: Re: Swap

I agree... After a while it always seems that 80% or more of my RAM is
used for cache and buffers while my open, but not currently used apps
get pushed onto disk. Then when I decide to switch to that mozilla
window or emacs session I have to wait for it to be loaded from disk
again. Also considering the kind of disk activity this box has, the data
in the cache is mostly the last few hours' MP3s, in other words utterly
useless as that data will not be used again. I'd rather my apps stayed
in RAM...

Is there a way to limit the size of the cache?
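
Measuring the cache share is at least straightforward. The sketch below
parses text in the "Name: value kB" layout of /proc/meminfo and reports how
much of RAM is held by cache and buffers. The helper names and the sample
numbers are made up for illustration; on a real box the text would come from
reading /proc/meminfo, and this only observes the cache, it does not limit it.

```python
def parse_meminfo(text):
    """Parse 'Name:   12345 kB' lines into a dict of values in kB."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            info[name.strip()] = int(fields[0])
    return info

def cache_share(info):
    """Fraction of total RAM used by page cache plus buffers."""
    return (info["Cached"] + info["Buffers"]) / info["MemTotal"]

# Made-up sample; real input: open("/proc/meminfo").read()
SAMPLE = """\
MemTotal:  1000000 kB
MemFree:     50000 kB
Buffers:    100000 kB
Cached:     700000 kB
SwapTotal:  500000 kB
SwapFree:   300000 kB
"""

info = parse_meminfo(SAMPLE)
print(cache_share(info))                     # 0.8
print(info["SwapTotal"] - info["SwapFree"])  # 200000 kB of swap in use
```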

/cyr

>
> I don't understand why it should be better with swap then... I mean,
> my comp seems to run so much faster (it doesn't take time to switch
> from one app to another, i mean) *without* swap.
> And I see no benefits to having an active swap, other than making my
> hard drive work harder.
>
> comp is PIII933/512MB on ATA100
> kernel is 2.4.14 with XFS patch.
>
> François
>
>
> war wrote:
>
> > Well, without the swap, everything seems to be about 100% more responsive when
> > I execute any task.
> > I see how it works now.
> >
> > James A Sutherland wrote:
> >
> >
> >>On Sunday 18 November 2001 9:12 pm, war wrote:
> >>
> >>>It is amazing that I could run all of that stuff, because:
> >>>
> >>>When I have swap on, and if I run all of those programs, 200-400MB of
> >>>swap is used.
> >>>
> >>Yep. There's a reason for that: the kernel is *ALWAYS* able to swap pages out
> >>to disk - even without "swap space". Disabling swapspace simply forces the
> >>kernel to swap out more code, since it cannot swap out any data.
> >>
> >>(This is why you can still get "disk thrashing" without any swap - in fact,
> >>it's more likely in this case than it is with some swap added - you are just
> >>forcing your binaries to take more of the swapping load instead.)
> >>
> >>So: with swapspace, the kernel swaps out a few hundred Mb of unused data, to
> >>make room for more code. Without it, the kernel is forced to swap out code
> >>pages instead. The big news here is...?
> >>
> >>James.
> >>
--
-----------------------------------------------------------------------
Holly: Purple alert! Purple alert!
Lister: What's a purple alert?
Holly: Well, it's like not as bad as a red alert, but a bit worse
than a blue alert -- sort of a mauve alert.

2001-11-19 09:18:31

by James A Sutherland

Subject: Re: Swap

On Sunday 18 November 2001 10:43 pm, François Cami wrote:
> Dan Maas wrote:
> > Still, it puzzles me why a system with no swap space would appear to be
> > more responsive than one with swap (assuming their working sets are quite
> > a bit smaller than total amount of RAM)... Can you do a controlled test
> > somehow, to rule out any sort of placebo effect?
>
> It's pretty simple... Try putting as many progs as you can into RAM
> (but less than total RAM size) when you have RAM+swap.
> Switching from one prog to another now takes time, because if you need
> to go e.g. from mozilla to openoffice, and openoffice has
> been swapped, it'll take ages.

Except that openoffice and mozilla can be swapped out in BOTH cases: the
kernel can discard mapped pages and reread as needed, whether you have a swap
partition or not.

> Another good example is launching X and a few heavy X apps, going back
> to console, doing a few things, like compiling different kernel trees.
> If you have swap, the X + X apps will be swapped. going back to X will
> take ages, because all that data + code has to be moved out to RAM to
> cache the data in the two kernel trees.

Whereas without swapspace, only the read-only mapped pages can be swapped out.

> If you don't have swap, maybe one, or both of the two kernel trees
> will end up being not cached into main memory, depending on how much
> RAM left you have. but going back to X will take 1 second instead of 20,
> and thus the system will be more responsive.

You're trading throughput for responsiveness, here: you save 19 seconds
switching to/from X, but walking through the two kernel trees will be slowed
down by more than that amount... By most metrics, keeping X+apps in memory
and forcing your kernel tree accesses to hit the disk is the WRONG strategy.

(Making X mlock() some or all of itself into RAM might make sense here,
perhaps?)

> It depends clearly on the situation you're in. I believe running with
> swap is beneficial when your memory load is more than 75% of total
> RAM, and less so when you have a few hundred megs of RAM left with all
> useful apps loaded into RAM (which is not too unlikely these days,
> due to the low price of SD/DDR RAM).

Provided the VM is doing its job properly, adding swap will always be a net
win for efficiency: the kernel is able to dump unused pages to make more room
for others. Of course, you tend to "feel" the response times to interactive
events, rather than the overall throughput, so a change which slows the
system down but makes it more "responsive" to mouse clicks etc feels like a
net win...


James.

2001-11-19 10:03:44

by Tim Connors

Subject: Re: Swap

On Sun, 18 Nov 2001, [ISO-8859-15] François Cami wrote:

> Dan Maas wrote:
>
>
> > Still, it puzzles me why a system with no swap space would appear to be more
> > responsive than one with swap (assuming their working sets are quite a bit
> > smaller than total amount of RAM)... Can you do a controlled test somehow,
> > to rule out any sort of placebo effect?
>
> It's pretty simple... Try putting as many progs as you can into RAM
> (but less than total RAM size) when you have RAM+swap.
> Switching from one prog to another now takes time, because if you need
> to go e.g. from mozilla to openoffice, and openoffice has
> been swapped, it'll take ages.
>
> Another good example is launching X and a few heavy X apps, going back
> to console, doing a few things, like compiling different kernel trees.
> If you have swap, the X + X apps will be swapped. going back to X will
> take ages, because all that data + code has to be moved out to RAM to
> cache the data in the two kernel trees.
> If you don't have swap, maybe one, or both of the two kernel trees
> will end up being not cached into main memory, depending on how much
> RAM left you have. but going back to X will take 1 second instead of 20,
> and thus the system will be more responsive.
>
> It depends clearly on the situation you're in. I believe running with
> swap is beneficial when your memory load is more than 75% of total
> RAM, and less so when you have a few hundred megs of RAM left with all
> useful apps loaded into RAM (which is not too unlikely these days,
> due to the low price of SD/DDR RAM).

A perfect example of why a system _needs_ tuning knobs - this view of
Linus's that we need a self tuning system is idiotic, because some of us
don't care how long a kernel compile takes (or even how long it takes to
serve a couple of web pages per hour), but _do_ care about the general
system responsiveness. The system cannot predict what *I*, the user, want
out of it. Hence we need /proc interfaces to the VM that say this is a
compiling machine, or this is a desktop machine.....
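
For reference, kernels from the 2.6 series onward expose a knob of roughly
this kind as /proc/sys/vm/swappiness; it does not exist in the 2.4 kernels
discussed in this thread. Higher values make the kernel more willing to swap
out application pages rather than shrink the cache. The sketch below only
reads such a file. The helper name read_vm_knob is hypothetical, and the
temporary file stands in for the real path so the example runs anywhere.

```python
import os
import tempfile

def read_vm_knob(path="/proc/sys/vm/swappiness"):
    """Return the integer stored in a /proc/sys/vm style file, or None
    if the file is missing or does not hold a plain integer."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None

# Stand-in file with an example value, so no real /proc is needed.
with tempfile.NamedTemporaryFile("w", delete=False) as tmp:
    tmp.write("60\n")
demo = read_vm_knob(tmp.name)
os.unlink(tmp.name)
print(demo)            # 60
print(read_vm_knob())  # the live value, or None where the knob is absent
```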

--
TimC -- http://www.physics.usyd.edu.au/~tcon/

cat ~/.signature
Passing cosmic ray (core dumped)

2001-11-19 10:29:14

by Dan Maas

Subject: Re: Swap

> > If you don't have swap, maybe one, or both of the two
> > kernel trees will end up being not cached into main
> > memory, depending on how much RAM left you have. but going
> > back to X will take 1 second instead of 20,
> > and thus the system will be more responsive.

> A perfect example of why a system _needs_ tuning knobs - this view of
> Linus's that we need a self tuning system is idiotic, because some of us
> don't care how long a kernel compile takes (or even how long it takes to
> serve a couple of web pages per hour), but _do_ care about the general
> system responsiveness.

For what it's worth, I heartily agree...

Linus et al might very well say "if you care so much about keeping X in RAM,
just mlock() it." This is certainly worth a shot. (though I'd much prefer a
configurable 'weight' or 'stickiness' for file mappings vs. cached buffers).

Of course this sort of second-order tuning mechanism is a lot less important
than having a VM that doesn't crash or suck badly for common loads =)...
(not that the VM has been bad at all lately; I haven't had any problems
since 2.4.9-ac10 or 2.4.14, knock on wood...)

Regards,
Dan

2001-11-19 10:52:16

by Remco Post

Subject: Re: Swap


--8<--

> Except that openoffice and mozilla can be swapped out in BOTH cases: the
> kernel can discard mapped pages and reread as needed, whether you have a swap
> partition or not.
>
No, they can't: without swap, nothing can be SWAPPED out. The code pages can be
paged out (discarded), but no SWAPPING takes place.


> Whereas without swapspace, only the read-only mapped pages can be swapped out.

Again, pages do not get swapped out, only applications can get swapped out.
Swapping is by definition the process of removing all pages used by one
application from RAM, and moving ALL pages to swap.


> Provided the VM is doing its job properly, adding swap will always be a net
> win for efficiency: the kernel is able to dump unused pages to make more room
> for others. Of course, you tend to "feel" the response times to interactive
> events, rather than the overall throughput, so a change which slows the
> system down but makes it more "responsive" to mouse clicks etc feels like a
> net win...
>
>
> James.

With any properly sized system, it will NEVER SWAP. Paging is a completely
different thing. A little paging is not a problem. Up to 70 pagescans/s on
occasion is quite acceptable. If paging activity grows above that, you may
have a real problem. I don't know about the current VM, but with most unixes
when you hit this mark, the system actually starts swapping, and your
responsiveness goes down the drain....


--
Met vriendelijke groeten,

Remco Post

SARA - Stichting Academisch Rekencentrum Amsterdam
High Performance Computing Tel. +31 20 592 8008 Fax. +31 20 668 3167

"I really didn't foresee the Internet. But then, neither did the computer
industry. Not that that tells us very much of course - the computer industry
didn't even foresee that the century was going to end." -- Douglas Adams


2001-11-19 13:33:44

by James A Sutherland

Subject: Re: Swap

On Monday 19 November 2001 10:51 am, Remco Post wrote:
> --8<--
>
> > Except that openoffice and mozilla can be swapped out in BOTH cases: the
> > kernel can discard mapped pages and reread as needed, whether you have a
> > swap partition or not.
>
> No they can't without swap, nothing can be SWAPPED out. The code pages can
> be paged out (discarded), but no SWAPPING takes place.

OK, s/swapped/paged/.

> > Whereas without swapspace, only the read-only mapped pages can be swapped
> > out.
>
> Again, pages do not get swapped out, only applications can get swapped out.
> Swapping is by definition the process of removing all pages used by one
> application from RAM, and moving ALL pages to swap.

So in effect, Linux never ever swaps. At all. Under any circumstances. (Using
your interpretation of the word). Which does raise the question of WTF that
"swap space" is for, and why it's really used for "paging"...

> > Provided the VM is doing its job properly, adding swap will always be a
> > net win for efficiency: the kernel is able to dump unused pages to make
> > more room for others. Of course, you tend to "feel" the response times to
> > interactive events, rather than the overall throughput, so a change which
> > slows the system down but makes it more "responsive" to mouse clicks etc
> > feels like a net win...
>
> With any properly sized system, it will NEVER SWAP. Paging is a completely
> different thing. A little paging is not a problem. Up to 70 pagescans/s on
> occasion is quite acceptable. If paging activity grows above that, you may
> have a real problem. I don't know about the current VM, but with most
> unixes when you hit this mark, the system actually starts swapping, and
> your responsiveness goes down the drain....

By your definition, Linux does not swap, ever. It only "pages". This is what
I was referring to as swapping, since this involves the SWAPspace/partition,
rather than PAGEfile :)


James.

2001-11-19 13:46:56

by Remco Post

Subject: Re: Swap

> On Monday 19 November 2001 10:51 am, Remco Post wrote:
> > --8<--
> >
> > > Except that openoffice and mozilla can be swapped out in BOTH cases: the
> > > kernel can discard mapped pages and reread as needed, whether you have a
> > > swap partition or not.
> >
> > No they can't without swap, nothing can be SWAPPED out. The code pages can
> > be paged out (discarded), but no SWAPPING takes place.
>
> OK, s/swapped/paged/.
>
> > > Whereas without swapspace, only the read-only mapped pages can be swapped
> > > out.
> >
> > Again, pages do not get swapped out, only applications can get swapped out.
> > Swapping is by definition the process of removing all pages used by one
> > application from RAM, and moving ALL pages to swap.
>
> So in effect, Linux never ever swaps. At all. Under any circumstances. (Using
> your interpretation of the word). Which does raise the question of WTF that
> "swap space" is for, and why it's really used for "paging"...
>
Linux does swap (I guess). Swapping is a very extreme measure: "I need memory
now, and the paging algorithm does not work any more". This is quite rare, but
a few runaway netscape processes can easily cause it....


> > > Provided the VM is doing its job properly, adding swap will always be a
> > > net win for efficiency: the kernel is able to dump unused pages to make
> > > more room for others. Of course, you tend to "feel" the response times to
> > > interactive events, rather than the overall throughput, so a change which
> > > slows the system down but makes it more "responsive" to mouse clicks etc
> > > feels like a net win...
> >
> > With any properly sized system, it will NEVER SWAP. Paging is a completely
> > different thing. A little paging is not a problem. Up to 70 pagescans/s on
> > occasion is quite acceptable. If paging activity grows above that, you may
> > have a real problem. I don't know about the current VM, but with most
> > unixes when you hit this mark, the system actually starts swapping, and
> > your responsiveness goes down the drain....
>
> By your definition, Linux does not swap, ever. It only "pages". This is what
> I was referring to as swapping, since this involves the SWAPspace/partition,
> rather than PAGEfile :)
>
>
> James.
>

It is quite a common mistake. When discussing the VM, it is important to make
the distinction. In the old days (about the time when I was born ;) swapping
was the only thing Unixes ever did; paging is quite a recent invention. As
you'd expect, this is why you have a swapspace that is now also used for
paging. As a test, you could quite simply build an application that uses so
much memory (not only malloc it, but also USE it) that your system will start
swapping. Try using any interactive application after that, and you'll feel
why you really don't want a system to swap...
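
A minimal sketch of such a test program, in Python rather than C. The function
name touch_memory and the sizes are assumptions made up for illustration. The
point it shows is the "USE it" part: every page of the allocation gets written
to, so the kernel cannot get away with merely reserving address space.

```python
import mmap

def touch_memory(n_bytes, page_size=mmap.PAGESIZE):
    """Allocate n_bytes and write one byte into every page.

    Returns the buffer (so it stays alive) and the number of pages touched.
    """
    buf = bytearray(n_bytes)               # the allocation itself
    touched = 0
    for offset in range(0, n_bytes, page_size):
        buf[offset] = 1                    # writing forces the page to be backed by real memory
        touched += 1
    return buf, touched
```

Calling touch_memory(64 * 1024 * 1024) is harmless on most machines. Pushing
the size past physical RAM is what provokes the behaviour described above, and
it will make the box very unpleasant to use, so only try that on a machine you
can afford to reboot.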



--
Met vriendelijke groeten,

Remco Post

SARA - Stichting Academisch Rekencentrum Amsterdam
High Performance Computing Tel. +31 20 592 8008 Fax. +31 20 668 3167

"I really didn't foresee the Internet. But then, neither did the computer
industry. Not that that tells us very much of course - the computer industry
didn't even foresee that the century was going to end." -- Douglas Adams


2001-11-19 16:36:42

by Jesse Pollard

Subject: Re: Swap

James A Sutherland <[email protected]>:
> On Monday 19 November 2001 10:51 am, Remco Post wrote:
> > --8<--
> >
> > > Except that openoffice and mozilla can be swapped out in BOTH cases: the
> > > kernel can discard mapped pages and reread as needed, whether you have a
> > > swap partition or not.
> >
> > No they can't without swap, nothing can be SWAPPED out. The code pages can
> > be paged out (discarded), but no SWAPPING takes place.
>
> OK, s/swapped/paged/.
>
> > > Whereas without swapspace, only the read-only mapped pages can be swapped
> > > out.
> >
> > Again, pages do not get swapped out, only applications can get swapped out.
> > Swapping is per definition the process of removing all pages used by one
> > application from RAM, and moving ALL pages to swap.
>
> So in effect, Linux never ever swaps. At all. Under any circumstances. (Using
> your interpretation of the word). Which does raise the question of WTF that
> "swap space" is for, and why it's really used for "paging"...

Linux doesn't - but some UNIX systems do swap. This is when the kernel pages
out the process header, page tables, process kernel stack ...

At this point the process is in a state equivalent to that of a system
that only does "swapping".

The swap space is used when more physical memory is required than is available
for user data. The modified pages of user data are written to the swap space
and the physical page re-used for another purpose. Effectively "swapping" the
use of the page... :-)

> > > Provided the VM is doing its job properly, adding swap will always be a
> > > net win for efficiency: the kernel is able to dump unused pages to make
> > > more room for others. Of course, you tend to "feel" the response times to
> > > interactive events, rather than the overall throughput, so a change which
> > > slows the system down but makes it more "responsive" to mouse clicks etc
> > > feels like a net win...
> >
> > With any properly sized system, it will NEVER SWAP. Paging is a completely
> > different thing. A little paging is not a problem. Up to 70 pagescans/s on
> > occasion is quite acceptable. If paging activity grows above that, you may
> > have a real problem. I don't know about the current VM, but with most
> > unixes when you hit this mark, the system actually starts swapping, and
> > your responsiveness goes down the drain....
>
> By your definition, Linux does not swap, ever. It only "pages". This is what
> I was referring to as swapping, since this involves the SWAPspace/partition,
> rather than PAGEfile :)

The problem is determining "properly sized system". Second - ALL Linux systems
will page in (or swap in) executables, if only at the start of execution
(easiest/fastest way to load the program... mmap is quick, even if it does
blur the distinction between process pages and I/O cache)

Linux uses RAM+SWAP for virtual memory operation, and swaps pages used for
data out to the "swap space" so that different "swapped pages" can be loaded
back into physical memory. Since this is effectively hidden from most activity
(and measures), it becomes easy to oversubscribe memory, causing thrashing
(lots of page activity for little gain), where a system with mixed paging +
swapping (page out entire processes and disable scheduling them) CAN make
significant progress.

The other use of RAM is for data caching. It is usually faster to keep file data
loaded into RAM for use by programs. Runtime libraries are frequently where
the majority of CPU time is spent - instead of waiting for data to be
transferred to RAM for use, Linux tries to "read ahead", accomplishing more
throughput that way by not forcing the active process to wait for the data.

The tricky part is determining the balance between the data cache, and process
memory.

The systems that use combined paging + swapping use a variety of measures
to decide what should be paged or swapped. Some characteristics used by these
systems are:

1. number of page faults/sec (swap if > watermark - reduces thrashing)
2. time elapsed since last completed I/O (if greater than some watermark, swap
- makes more RAM available)
3. idle processes (wait time > watermark, swap - discard executable pages,
swap out data pages)
4. batch processes (operate at a lower priority - swap non-interactive
processes - makes more RAM available)
5. High memory requirements (reduce resident set size; which invokes item 1)
6. Users priority (swap lower priority processes - make more RAM available)

Of course the sys admin must have control over all of the watermarks and/or
resource allocations. These are more characteristics of a general computation
or batch system than they are of a single user workstation, which is where
Linux started.
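
As a rough sketch of how such watermarks might combine, here is a toy
decision function in Python. Everything in it is hypothetical: the names
Watermarks, Proc and swap_reasons, the fields, and all threshold values are
invented for illustration and are not taken from any real kernel. Item 5 is
not modelled separately, since it feeds back into item 1.

```python
from dataclasses import dataclass

@dataclass
class Watermarks:
    # All thresholds are made-up values for illustration.
    max_faults_per_sec: float = 100.0   # item 1
    max_secs_since_io: float = 30.0     # item 2
    max_wait_secs: float = 60.0         # item 3
    min_priority: int = 10              # item 6

@dataclass
class Proc:
    name: str
    faults_per_sec: float
    secs_since_io: float
    wait_secs: float
    is_batch: bool                      # item 4
    priority: int                       # higher means more important in this sketch

def swap_reasons(proc, wm):
    """Return the list of reasons (possibly empty) for swapping out proc."""
    reasons = []
    if proc.faults_per_sec > wm.max_faults_per_sec:
        reasons.append("page-fault rate")
    if proc.secs_since_io > wm.max_secs_since_io:
        reasons.append("no recent I/O")
    if proc.wait_secs > wm.max_wait_secs:
        reasons.append("idle")
    if proc.is_batch:
        reasons.append("batch")
    if proc.priority < wm.min_priority:
        reasons.append("low priority")
    return reasons
```

A sys admin's control over the policy then amounts to editing the Watermarks
values; an interactive process that stays under all of them is never selected.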

Hope I've helped clear up some things.

-------------------------------------------------------------------------
Jesse I Pollard, II
Email: [email protected]

Any opinions expressed are solely my own.

2001-11-19 16:59:24

by Rik van Riel

Subject: Re: Swap

On Mon, 19 Nov 2001, Remco Post wrote:

> Linux does swap (I guess), swapping is a very extreme measure, "I need
> memory now, and the paging algorithm does not work any more", this is
> quite rare, but a few runaway netscape processes can easily cause
> this....

Guess again. Linux doesn't have load control implemented ...

Rik
--
DMCA, SSSCA, W3C? Who cares? http://thefreeworld.net/

http://www.surriel.com/ http://distro.conectiva.com/

2001-11-19 18:32:07

by Eric W. Biederman

Subject: Re: Swap

Erik Gustavsson <[email protected]> writes:

> I agree... After a while it always seems that 80% or more of my RAM is
> used for cache and buffers while my open, but not currently used apps
> get pushed onto disk. Then when I decide to switch to that mozilla
> window or emacs session I have to wait for it to be loaded from disk
> again. Also considering the kind of disk activity this box has, the data
> in the cache is mostly the last few hour's MP3's, in other words utterly
> useless as that data will not be used again. I'd rather my apps stayed
> in RAM...

>
> Is there a way to limit the size of the cache?

Reasonable. It looks like the use once heuristics are failing for your
mp3 files. Find out why that is happening and they should push the
rest of your system into swap.

Eric

2001-11-19 18:43:39

by Rik van Riel

Subject: Re: Swap

On 19 Nov 2001, Eric W. Biederman wrote:

> > Is there a way to limit the size of the cache?
>
> Reasonable. It looks like the use once heuristics are failing for your
> mp3 files. Find out why that is happening and they should push the
> rest of your system into swap.

I bet they're getting mmap()d, like all mp3 programs seem to do ;)

Rik
--
DMCA, SSSCA, W3C? Who cares? http://thefreeworld.net/

http://www.surriel.com/ http://distro.conectiva.com/

2001-11-19 19:12:37

by James A Sutherland

Subject: Re: Swap

On Monday 19 November 2001 6:12 pm, Eric W. Biederman wrote:
> Erik Gustavsson <[email protected]> writes:
> > I agree... After a while it always seems that 80% or more of my RAM is
> > used for cache and buffers while my open, but not currently used apps
> > get pushed onto disk. Then when I decide to switch to that mozilla
> > window or emacs session I have to wait for it to be loaded from disk
> > again. Also considering the kind of disk activity this box has, the data
> > in the cache is mostly the last few hour's MP3's, in other words utterly
> > useless as that data will not be used again. I'd rather my apps stayed
> > in RAM...
> >
> >
> > Is there a way to limit the size of the cache?
>
> Reasonable. It looks like the use once heuristics are failing for your
> mp3 files. Find out why that is happening and they should push the
> rest of your system into swap.

Getting clobbered by the mp3 player accessing the ID3 tag? That way, at least
part of the file is used twice, so use-ONCE won't matter...


James.

Subject: Re: Swap



--On Monday, 19 November, 2001 2:58 PM -0200 Rik van Riel
<[email protected]> wrote:

> Guess again. Linux doesn't have load control implemented ...

Out of interest, is received wisdom that this is a good/bad
thing?

--
Alex Bligh

2001-11-19 21:18:37

by Rik van Riel

Subject: Re: Swap

On Mon, 19 Nov 2001, Alex Bligh - linux-kernel wrote:
> --On Monday, 19 November, 2001 2:58 PM -0200 Rik van Riel
> <[email protected]> wrote:
>
> > Guess again. Linux doesn't have load control implemented ...
>
> Out of interest, is received wisdom that this is a good/bad
> thing?

Load control is a good thing since it means the box
gets slower in a controlled way instead of running
fine one minute and horribly falling over the next
minute.

I'm certainly planning to implement some load control
measures for 2.5.

regards,

Rik
--
DMCA, SSSCA, W3C? Who cares? http://thefreeworld.net/

http://www.surriel.com/ http://distro.conectiva.com/

Subject: Re: Swap

Rik,

--On Monday, 19 November, 2001 7:17 PM -0200 Rik van Riel
<[email protected]> wrote:

>> Out of interest, is received wisdom that this is a good/bad
>> thing?
>
> Load control is a good thing since it means the box
> gets slower in a controlled way instead of running
> fine one minute and horribly falling over the next
> minute.
>
> I'm certainly planning to implement some load control
> measures for 2.5.

OK another potentially dumb question on this:

I had previously (mis?)understood load control to mean (say)
clustering page out requests to pages from specific
processes, then altering the scheduler to avoid scheduling these
processes for extended periods of time, then moving onto the next
set of processes to victimize, and so forth; i.e. increasing
scheduler granularity to cope with increased average virtual
memory access times by decreasing VM footprint used per second.

The original poster seemed to be talking about the old-UNIX
definition of swapping, which, if I remember right, was releasing
/all/ clean pages for an app (I guess this has already been done
by the time we want to do this) and paging /all/ dirty pages
& freeing the memory there and then.

I'd have thought swapping was a pretty coarsely-grained
form of load control (and difficult with shared mem etc.);
do you believe there is a requirement to implement (old UNIX)
swapping per-se, or merely to intelligently tweak the scheduler
to cope better with high VM system loads? [the absence of the
former was what I was suggesting might have been considered
a good thing]

--
Alex Bligh

2001-11-20 03:09:16

by Eric W. Biederman

Subject: Re: Swap

Rik van Riel <[email protected]> writes:

> On 19 Nov 2001, Eric W. Biederman wrote:
>
> > > Is there a way to limit the size of the cache?
> >
> > Reasonable. It looks like the use once heuristics are failing for your
> > mp3 files. Find out why that is happening and they should push the
> > rest of your system into swap.
>
> I bet they're getting mmap()d, like all mp3 programs seem to do ;)

That would probably do it. Though it is puzzling why after the file
is munmapped its pages aren't recycled.

Eric

2001-11-20 03:06:56

by Eric W. Biederman

Subject: Re: Swap

James A Sutherland <[email protected]> writes:

> On Monday 19 November 2001 6:12 pm, Eric W. Biederman wrote:
> > Erik Gustavsson <[email protected]> writes:
> > > I agree... After a while it always seems that 80% or more of my RAM is
> > > used for cache and buffers while my open, but not currently used apps
> > > get pushed onto disk. Then when I decide to switch to that mozilla
> > > window or emacs session I have to wait for it to be loaded from disk
> > > again. Also considering the kind of disk activity this box has, the data
> > > in the cache is mostly the last few hour's MP3's, in other words utterly
> > > useless as that data will not be used again. I'd rather my apps stayed
> > > in RAM...
> > >
> > >
> > > Is there a way to limit the size of the cache?
> >
> > Reasonable. It looks like the use once heuristics are failing for your
> > mp3 files. Find out why that is happening and they should push the
> > rest of your system into swap.
>
> Getting clobbered by the mp3 player accessing the ID3 tag? That way, at least
> part of the file is used twice, so use-ONCE won't matter...

For that page perhaps. But that is only 4K. That doesn't explain the rest
of it. use-once is per page.

Eric

2001-11-20 03:34:23

by Ryan Cumming

Subject: Re: Swap

On November 19, 2001 18:49, Eric W. Biederman wrote:
> That would probably do it. Though it is puzzling why after the file
> is munmapped its pages aren't recycled.

Because they're part of the page cache now, and won't be recycled until newer
pages 'push' them out of memory. I for one would be very annoyed if an mmap()'s
associated cache was dropped immediately on munmap, there are many instances
in which it would be reused immediately after (think running two GTK+ apps in
a row, and having to reload libgtk from disk each time).

-Ryan

2001-11-20 09:16:23

by James A Sutherland

Subject: Re: Swap

On Tuesday 20 November 2001 2:47 am, Eric W. Biederman wrote:
> James A Sutherland <[email protected]> writes:
> > On Monday 19 November 2001 6:12 pm, Eric W. Biederman wrote:
> > > Erik Gustavsson <[email protected]> writes:
> > > > I agree... After a while it always seems that 80% or more of my RAM
> > > > is used for cache and buffers while my open, but not currently used
> > > > apps get pushed onto disk. Then when I decide to switch to that
> > > > mozilla window or emacs session I have to wait for it to be loaded
> > > > from disk again. Also considering the kind of disk activity this box
> > > > has, the data in the cache is mostly the last few hour's MP3's, in
> > > > other words utterly useless as that data will not be used again. I'd
> > > > rather my apps stayed in RAM...
> > > >
> > > >
> > > > Is there a way to limit the size of the cache?
> > >
> > > Reasonable. It looks like the use once heuristics are failing for your
> > > mp3 files. Find out why that is happening and they should push the
> > > rest of your system into swap.
> >
> > Getting clobbered by the mp3 player accessing the ID3 tag? That way, at
> > least part of the file is used twice, so use-ONCE won't matter...
>
> For that page perhaps. But that is only 4K. That doesn't explain the rest
> of it. use-once is per page.

True - the ID3 can be in several different places, so that could account for
a couple of pages, but mp3 players certainly DON'T read the whole file in
before playing...

Does the mp3 player in question try to pre-read the pages using one
process/thread, before the actual player thread reaches them? How far apart
would two accesses need to be to disable read-once?


James.

2001-11-20 11:42:32

by Rik van Riel

Subject: Re: Swap

On 19 Nov 2001, Eric W. Biederman wrote:

> That would probably do it. Though it is puzzling why after the file
> is munmapped its pages aren't recycled.

Not really. Use-once doesn't work for pages which are
or have been mmap()d, but later use of the cache will
not be able to put pressure on the pages which have
been taken out of the use-once loop.

Use-once as we have it now is fundamentally unbalanced,
I can't see a way of ever getting that thing to work
nicely.

regards,

Rik
--
DMCA, SSSCA, W3C? Who cares? http://thefreeworld.net/

http://www.surriel.com/ http://distro.conectiva.com/

2001-11-20 11:44:42

by Rik van Riel

Subject: Re: Swap

On Mon, 19 Nov 2001, Ryan Cumming wrote:
> On November 19, 2001 18:49, Eric W. Biederman wrote:
> > That would probably do it. Though it is puzzling why after the file
> > is munmapped its pages aren't recycled.
>
> Because they're part of the page cache now, and won't be recycled
> until newer pages 'push' them out of memory.

Newer pages cannot push them out of memory, due to use-once.
That is, unless those newer pages also get mmap()d or if they
get accessed really often.
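
To make the imbalance concrete, here is a toy two-list cache in Python. It is
only a model of the idea as described in this thread, not the 2.4 code: the
class name UseOnceCache and its methods are invented, pages touched once sit
on an "inactive" list, a second touch promotes them to "active", and eviction
always takes from the inactive list first. The sketch deliberately has no
mechanism for demoting active pages, which is the behaviour being complained
about.

```python
from collections import OrderedDict

class UseOnceCache:
    """Toy model: pages are promoted on their second access, never demoted."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.inactive = OrderedDict()   # pages seen once, oldest first
        self.active = OrderedDict()     # pages seen at least twice

    def access(self, page):
        if page in self.active:
            self.active.move_to_end(page)
        elif page in self.inactive:
            del self.inactive[page]
            self.active[page] = True    # second touch: leaves the use-once loop
        else:
            self.inactive[page] = True
            self._evict()

    def _evict(self):
        while len(self.inactive) + len(self.active) > self.capacity:
            if self.inactive:
                self.inactive.popitem(last=False)   # oldest use-once page goes first
            else:
                self.active.popitem(last=False)
```

With a capacity of 4, touching pages "a" and "b" twice and then streaming a
hundred new pages through once each leaves "a" and "b" on the active list:
the stream only ever evicts other streamed pages.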

Rik
--
DMCA, SSSCA, W3C? Who cares? http://thefreeworld.net/

http://www.surriel.com/ http://distro.conectiva.com/

2001-11-20 14:52:19

by J.A. Magallon

Subject: Re: Swap


On 20011119 James A Sutherland wrote:
>On Monday 19 November 2001 10:51 am, Remco Post wrote:
>> --8<--
>>
>> > Except that openoffice and mozilla can be swapped out in BOTH cases: the
>> > kernel can discard mapped pages and reread as needed, whether you have a
>> > swap partition or not.
>>
>> No they can't without swap, nothing can be SWAPPED out. The code pages can
>> be paged out (discarded), but no SWAPPING takes place.
>
>OK, s/swapped/paged/.
>

Not so OK.

AFAIK, that is all a question of names. All is the same. Old systems
like MacOS do SWAP, because when they send something to disk they send the
whole app with its data space to disk. Linux does not send a whole app to
disk, but individual pages, so it does SWAP AT PAGE LEVEL, or paging. When
a page is deleted for one executable (because we can re-read it from on-disk
binary), it is discarded, not paged out. A page is paged-out if it is written
to disk.
So _swapping_ and _paging_ are the same, but with different granularity.

(of course, flame and correct me if I'm wrong...)
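
The discard / page-out distinction can be sketched as a small decision
function. This is a simplification under stated assumptions: the names
Action, Page and reclaim_action are hypothetical, and a page is described by
a single flag saying whether an identical copy already exists on disk (as for
unmodified code pages from a binary).

```python
from dataclasses import dataclass
from enum import Enum

class Action(Enum):
    DISCARD = "discard"      # drop the page; re-read it from its file later
    PAGE_OUT = "page out"    # write the page to swap space, then free it
    KEEP = "keep"            # nowhere to put it, so it must stay in RAM

@dataclass
class Page:
    clean_copy_on_disk: bool  # True for unmodified pages of an on-disk binary

def reclaim_action(page, have_swap):
    """Decide what can be done with a page when memory is needed."""
    if page.clean_copy_on_disk:
        return Action.DISCARD
    if have_swap:
        return Action.PAGE_OUT
    return Action.KEEP
```

Without swap space, reclaim_action(Page(False), have_swap=False) returns
Action.KEEP, so only the discardable pages are left to take the load.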

>> > Whereas without swapspace, only the read-only mapped pages can be swapped
>> > out.
>>

They are not swapped-out, just discarded to be re-read.

>
>By your definition, Linux does not swap, ever. It only "pages". This is what
>I was referring to as swapping, since this involves the SWAPspace/partition,
>rather than PAGEfile :)
>

It is the same. You can page-out (because Linux never does swap, in the sense
of sending a whole app to disk), to a specially formatted partition or to
a file. If you are going to be pedantic, Linux really uses _page_partitions_
and _page_files_, instead of swap-partitions and swap-files.

BTW, there is software for the Mac that changes the swap algorithm from app
level to page level and they called it "RamDoubler", and people still think
it's magic...

--
J.A. Magallon # Let the source be with you...
mailto:[email protected]
Mandrake Linux release 8.2 (Cooker) for i586
Linux werewolf 2.4.15-pre6-beo #1 SMP Sun Nov 18 10:25:01 CET 2001 i686

2001-11-20 16:02:05

by Wolfgang Rohdewald

Subject: Re: Swap

On Tuesday 20 November 2001 15:51, J.A. Magallon wrote:
> When a page is deleted for one executable (because we can re-read it from
> on-disk binary), it is discarded, not paged out.

What happens if the on-disk binary has changed since loading the program?

2001-11-20 16:07:05

by Remco Post

Subject: Re: Swap

> On Tuesday 20 November 2001 15:51, J.A. Magallon wrote:
> > When a page is deleted for one executable (because we can re-read it from
> > on-disk binary), it is discarded, not paged out.
>
> What happens if the on-disk binary has changed since loading the program?
>
The application usually crashes, but in theory it may run with just some
'strange' behaviour. (Don't worry, apps usually just crash ;)


--
Met vriendelijke groeten,

Remco Post

SARA - Stichting Academisch Rekencentrum Amsterdam
High Performance Computing Tel. +31 20 592 8008 Fax. +31 20 668 3167

"I really didn't foresee the Internet. But then, neither did the computer
industry. Not that that tells us very much of course - the computer industry
didn't even foresee that the century was going to end." -- Douglas Adams


2001-11-20 16:14:05

by Nick LeRoy

Subject: Re: Swap

On Tuesday 20 November 2001 10:01, Wolfgang Rohdewald wrote:
> On Tuesday 20 November 2001 15:51, J.A. Magallon wrote:
> > When a page is deleted for one executable (because we can re-read it from
> > on-disk binary), it is discarded, not paged out.
>
> What happens if the on-disk binary has changed since loading the program?

In general, you can't... You get an ETXTBSY 'text file busy' error. If you
try to do this over NFS (where the system can't stop you), the running image
will almost certainly crash if it tries to page in text.

-Nick

2001-11-20 16:21:16

by Richard B. Johnson

Subject: Re: Swap

On Tue, 20 Nov 2001, Wolfgang Rohdewald wrote:

> On Tuesday 20 November 2001 15:51, J.A. Magallon wrote:
> > When a page is deleted for one executable (because we can re-read it from
> > on-disk binary), it is discarded, not paged out.
>
> What happens if the on-disk binary has changed since loading the program?
> -

It can't. That's the reason for `install` and other methods of changing
executable files (mv exe-file exe-file.old ; cp newfile exe-file).
The currently open, and possibly mapped file can be re-named, but it
can't be overwritten.
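
A small sketch of why the mv-then-cp approach is safe on a local POSIX
filesystem. It uses an ordinary open file handle as a stand-in for the
kernel's mapping of a running binary, which is an assumption of the example;
the function name replace_while_open is made up. After the rename, the old
handle still refers to the old file, and the copy creates a new one under the
original name.

```python
import os
import tempfile

def replace_while_open(directory):
    path = os.path.join(directory, "exe-file")
    with open(path, "wb") as f:
        f.write(b"old contents")

    running = open(path, "rb")        # stand-in for the running program's view of the file
    os.rename(path, path + ".old")    # mv exe-file exe-file.old
    with open(path, "wb") as f:       # cp newfile exe-file (creates a new file)
        f.write(b"new contents")

    seen_by_running = running.read()  # still the old data
    running.close()
    with open(path, "rb") as f:
        seen_by_new = f.read()        # what a freshly started program would get
    return seen_by_running, seen_by_new

with tempfile.TemporaryDirectory() as d:
    old, new = replace_while_open(d)
    print(old, new)
```

This relies on POSIX rename semantics; it says nothing about what happens over
NFS.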


Cheers,
Dick Johnson

Penguin : Linux version 2.4.1 on an i686 machine (799.53 BogoMips).

I was going to compile a list of innovations that could be
attributed to Microsoft. Once I realized that Ctrl-Alt-Del
was handled in the BIOS, I found that there aren't any.


2001-11-20 17:12:07

by Chris Friesen

Subject: Re: Swap

"Richard B. Johnson" wrote:
>
> On Tue, 20 Nov 2001, Wolfgang Rohdewald wrote:
>
> > On Tuesday 20 November 2001 15:51, J.A. Magallon wrote:
> > > When a page is deleted for one executable (because we can re-read it from
> > > on-disk binary), it is discarded, not paged out.
> >
> > What happens if the on-disk binary has changed since loading the program?
> > -
>
> It can't. That's the reason for `install` and other methods of changing
> executable files (mv exe-file exe-file.old ; cp newfile exe-file).
> The currently open, and possibly mapped file can be re-named, but it
> can't be overwritten.

Actually, with NFS (and probably others) it can. Suppose I change the file on
the server, and it's swapped out on a client that has it mounted. When it swaps
back in, it can get the new information.

Chris

--
Chris Friesen | MailStop: 043/33/F10
Nortel Networks | work: (613) 765-0557
3500 Carling Avenue | fax: (613) 765-2986
Nepean, ON K2H 8E9 Canada | email: [email protected]

2001-11-20 17:54:01

by Richard B. Johnson

Subject: Re: Swap

On Tue, 20 Nov 2001, Christopher Friesen wrote:

> "Richard B. Johnson" wrote:
> >
> > On Tue, 20 Nov 2001, Wolfgang Rohdewald wrote:
> >
> > > On Tuesday 20 November 2001 15:51, J.A. Magallon wrote:
> > > > When a page is deleted for one executable (because we can re-read it from
> > > > on-disk binary), it is discarded, not paged out.
> > >
> > > What happens if the on-disk binary has changed since loading the program?
> > > -
> >
> > It can't. That's the reason for `install` and other methods of changing
> > executable files (mv exe-file exe-file.old ; cp newfile exe-file).
> > The currently open, and possibly mapped file can be re-named, but it
> > can't be overwritten.
>
> Actually, with NFS (and probably others) it can. Suppose I change the file on
> the server, and it's swapped out on a client that has it mounted. When it swaps
> back in, it can get the new information.
>
> Chris

I note that NFS files don't currently return ETXTBSY, but this is a bug.
It is 'known' to the OS that the NFS mounted file-system is busy because
you can't unmount the file-system while an executable is running. If
you can trash it (as you can on Linux), it is surely a bug.

Alan explained a few years ago that NFS was "stateless". Nevertheless
it is still a bug.

Cheers,
Dick Johnson

Penguin : Linux version 2.4.1 on an i686 machine (799.53 BogoMips).

I was going to compile a list of innovations that could be
attributed to Microsoft. Once I realized that Ctrl-Alt-Del
was handled in the BIOS, I found that there aren't any.


2001-11-20 18:07:47

by Wolfgang Rohdewald

Subject: Re: Swap

On Tuesday 20 November 2001 18:14, Christopher Friesen wrote:
> "Richard B. Johnson" wrote:
> > On Tue, 20 Nov 2001, Wolfgang Rohdewald wrote:
> > > On Tuesday 20 November 2001 15:51, J.A. Magallon wrote:
> > > > When a page is deleted for one executable (because we can re-read it
> > > > from on-disk binary), it is discarded, not paged out.
> > >
> > > What happens if the on-disk binary has changed since loading the
> > > program? -
> >
> > It can't. That's the reason for `install` and other methods of changing
> > executable files (mv exe-file exe-file.old ; cp newfile exe-file).
> > The currently open, and possibly mapped file can be re-named, but it
> > can't be overwritten.
>
> Actually, with NFS (and probably others) it can. Suppose I change the file
> on the server, and it's swapped out on a client that has it mounted. When
> it swaps back in, it can get the new information.

I am quite sure this is also possible if the binary is emulated by the linux-abi
modules like my old SCO binaries. I just cannot check right now because I did
not yet get linux-abi working with 2.4.15-pre7 (worked with 2.4.15-pre4, but
pre4 had a seemingly VM related OOPS when starting VMware3 which is gone with pre7)

2001-11-20 18:43:55

by Nick LeRoy

Subject: Re: Swap

<snip>
> I note that NFS files don't currently return ETXTBSY, but this is a bug.
> It is 'known' to the OS that the NFS mounted file-system is busy because
> you can't unmount the file-system while an executable is running. If
> you can trash it (as you can on Linux), it is surely a bug.
>
> Alan explained a few years ago that NFS was "stateless". Nevertheless
> it is still a bug.

Correct me if I'm wrong, but I think that it's more a bug in the NFS protocol
than in the Linux (or Solaris, etc) NFS implementation. The problem is that
NFS itself just doesn't pass that information along. The NFS server has no
idea that the 'text' file is being executed, so it doesn't know that it
should "return" ETXTBSY.

Now, this might be different in NFS v3, but I'm pretty sure that this applies
for v2, at least.

-Nick

2001-11-20 20:58:48

by Mike Fedyk

Subject: Re: Swap

On Tue, Nov 20, 2001 at 03:51:43PM +0100, J.A. Magallon wrote:
> BTW, there is software for the Mac that changes the swap algorithm from app
> level to page level and they called it "RamDoubler", and people still think
> it's magic...
>

Ahh, so that's what it does, in addition to compression...

2001-11-20 21:12:51

by Steffen Persvold

Subject: Re: Swap

Christopher Friesen wrote:
>
> "Richard B. Johnson" wrote:
> >
> > On Tue, 20 Nov 2001, Wolfgang Rohdewald wrote:
> >
> > > On Tuesday 20 November 2001 15:51, J.A. Magallon wrote:
> > > > When a page is deleted for one executable (because we can re-read it from
> > > > on-disk binary), it is discarded, not paged out.
> > >
> > > What happens if the on-disk binary has changed since loading the program?
> > > -
> >
> > It can't. That's the reason for `install` and other methods of changing
> > executable files (mv exe-file exe-file.old ; cp newfile exe-file).
> > The currently open, and possibly mapped file can be re-named, but it
> > can't be overwritten.
>
> Actually, with NFS (and probably others) it can. Suppose I change the file on
> the server, and it's swapped out on a client that has it mounted. When it swaps
> back in, it can get the new information.
>

This sounds really dangerous... What about shared libraries ??

Regards,
--
Steffen Persvold | Scalable Linux Systems | Try out the world's best
mailto:[email protected] | http://www.scali.com | performing MPI implementation:
Tel: (+47) 2262 8950 | Olaf Helsets vei 6 | - ScaMPI 1.12.2 -
Fax: (+47) 2262 8951 | N0621 Oslo, NORWAY | >300MBytes/s and <4uS latency

2001-11-20 21:19:13

by Mike Fedyk

Subject: Re: Swap

On Tue, Nov 20, 2001 at 10:05:37PM +0100, Steffen Persvold wrote:
> Christopher Friesen wrote:
> >
> > "Richard B. Johnson" wrote:
> > >
> > > On Tue, 20 Nov 2001, Wolfgang Rohdewald wrote:
> > >
> > > > On Tuesday 20 November 2001 15:51, J.A. Magallon wrote:
> > > > > When a page is deleted for one executable (because we can re-read it from
> > > > > on-disk binary), it is discarded, not paged out.
> > > >
> > > > What happens if the on-disk binary has changed since loading the program?
> > > > -
> > >
> > > It can't. That's the reason for `install` and other methods of changing
> > > executable files (mv exe-file exe-file.old ; cp newfile exe-file).
> > > The currently open, and possibly mapped file can be re-named, but it
> > > can't be overwritten.
> >
> > Actually, with NFS (and probably others) it can. Suppose I change the file on
> > the server, and it's swapped out on a client that has it mounted. When it swaps
> > back in, it can get the new information.
> >
>
> This sounds really dangerous... What about shared libraries ??
>

IIRC (if wrong flame...)

When you delete an open file, the entry is removed from the directory, but
the file itself is not freed until it is closed. This is a standard UNIX semantic.
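
A quick way to see this, as a sketch assuming a local POSIX filesystem (the
function name read_after_unlink is made up for the example): the name
disappears at once, while the data stays readable through the handle that was
already open.

```python
import os
import tempfile

def read_after_unlink():
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "w+b") as f:
        f.write(b"still here")
        f.flush()
        os.unlink(path)                   # directory entry is gone immediately
        gone = not os.path.exists(path)
        f.seek(0)
        data = f.read()                   # contents remain reachable via the open handle
    return gone, data                     # storage is released once the handle is closed
```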

Now, if you have a set of processes with shared memory, and one closes, and
another is created to replace it, the new process will get the new libraries,
or even a new version of the process. This could/will bring down the entire
set of processes.

Apps like samba come to mind...

Mike

2001-11-20 21:36:26

by Nick LeRoy

Subject: Re: Swap

On Tuesday 20 November 2001 15:05, Steffen Persvold wrote:
> Christopher Friesen wrote:
> > "Richard B. Johnson" wrote:
> > > On Tue, 20 Nov 2001, Wolfgang Rohdewald wrote:
> > > > On Tuesday 20 November 2001 15:51, J.A. Magallon wrote:
> > > > > When a page is deleted for one executable (because we can re-read
> > > > > it from on-disk binary), it is discarded, not paged out.
> > > >
> > > > What happens if the on-disk binary has changed since loading the
> > > > program? -
> > >
> > > It can't. That's the reason for `install` and other methods of changing
> > > executable files (mv exe-file exe-file.old ; cp newfile exe-file).
> > > The currently open, and possibly mapped file can be re-named, but it
> > > can't be overwritten.
> >
> > Actually, with NFS (and probably others) it can. Suppose I change the
> > file on the server, and it's swapped out on a client that has it mounted.
> > When it swaps back in, it can get the new information.
>
> This sounds really dangerous... What about shared libraries ??

It is. Usually it ends with a loud 'boom': the process crashes & burns.

-Nick

2001-11-20 21:39:56

by Dan Maas

[permalink] [raw]
Subject: Re: Swap

> I bet they're getting mmap()d, like all mp3 programs seem to do

Just a note here - I see far fewer buffer underruns and more consistent
read-ahead/drop-behind behavior (i.e. no paging of other programs) when
using plain read(), as opposed to mmap(). This is in a video playback
program that pumps 3.6MB/sec!

MP3 datarates are less than 50KB/sec, so I don't really see why they stand
to benefit from mmap()... With mmap() you pay the extra cost of setting
up/tearing down the mapping, and the kernel->user copy is virtually
insignificant anyway (you already are paying for a single copy plus cache
pollution when moving the data from filesystem buffer to sound card DMA
buffer, so a second copy isn't a big deal)...

Regards,
Dan
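
For readers who want to compare the two approaches themselves, here is a rough sketch of both streaming styles. The function names and the 64KB chunk size are assumptions for illustration, not anything taken from the playback program in the post.

```python
import mmap
import os

CHUNK = 64 * 1024  # hypothetical buffer size, not a figure from the thread


def stream_with_read(path, chunk=CHUNK):
    """Yield the file in fixed-size chunks using plain read() calls."""
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            block = os.read(fd, chunk)
            if not block:                # empty result means end of file
                break
            yield block
    finally:
        os.close(fd)


def stream_with_mmap(path, chunk=CHUNK):
    """Yield the same bytes by slicing one read-only mapping of the whole file."""
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        if size == 0:
            return                       # an empty file cannot be mapped
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            for off in range(0, size, chunk):
                yield mm[off:off + chunk]   # slicing faults pages in on demand
```

Both generators produce the same bytes; what differs is how the kernel is asked for them, which is the part the thread is arguing about.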

2001-11-20 21:43:36

by Richard B. Johnson

[permalink] [raw]
Subject: Re: Swap

On Tue, 20 Nov 2001, Mike Fedyk wrote:

> On Tue, Nov 20, 2001 at 10:05:37PM +0100, Steffen Persvold wrote:
> > Christopher Friesen wrote:
> > >
> > > "Richard B. Johnson" wrote:
> > > >
> > > > On Tue, 20 Nov 2001, Wolfgang Rohdewald wrote:
> > > >
> > > > > On Tuesday 20 November 2001 15:51, J.A. Magallon wrote:
> > > > > > When a page is deleted for one executable (because we can re-read it from
> > > > > > on-disk binary), it is discarded, not paged out.
> > > > >
> > > > > What happens if the on-disk binary has changed since loading the program?
> > > > > -
> > > >
> > > > It can't. That's the reason for `install` and other methods of changing
> > > > > executable files (mv exe-file exe-file.old ; cp newfile exe-file).
> > > > The currently open, and possibly mapped file can be re-named, but it
> > > > can't be overwritten.
> > >
> > > Actually, with NFS (and probably others) it can. Suppose I change the file on
> > > the server, and it's swapped out on a client that has it mounted. When it swaps
> > > back in, it can get the new information.
> > >
> >
> > This sounds really dangerous... What about shared libraries ??
> >
>
> IIRC (if wrong flame...)
>
> When you delete an open file, the entry is removed from the directory, but
> not unlinked until the file is closed. This is a standard UNIX semantic.
>
> Now, if you have a set of processes with shared memory, and one closes, and
> another is created to replace, the new process will get the new libraries,
> or even new version of the process. This could/will bring down the entire
> set of processes.
>
> Apps like samba come to mind...
>
> Mike

If the file is local, everything is fine. A file won't actually
be deleted until the last access is closed. However, the long-standing
problem with NFS is that it's `phony`. Basically, we send a message
to a server that says "Give me a directory listing...". The server
does the `opendir()` etc., and returns the results. If I want to
open a file on the server, the server has no knowledge of the `open`.
The client's software just emulated a file-system open(). When the
client wants to read data from a server's file, it sends a message;
"Gimmie data from file xxx, offset x, length y.". The server responds
with that data. To get that data, the server did an open/lseek/read/close.

So, as far as the server is concerned, that file is closed. Somebody
else (with privilege) can delete the file and replace it. The client,
the one that got the data for an executable, doesn't even know it.

This is 'nice' for the server, it doesn't have the overhead of maintaining
a file-system state. That's why servers are supposed to be read-only.
However, somebody has got to write the stuff to the file-system that's
going to (eventually) be read-only. Beware when such access occurs.

Cheers,
Dick Johnson

Penguin : Linux version 2.4.1 on an i686 machine (799.53 BogoMips).

I was going to compile a list of innovations that could be
attributed to Microsoft. Once I realized that Ctrl-Alt-Del
was handled in the BIOS, I found that there aren't any.
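
The open/lseek/read/close pattern described in this message can be modelled locally. This is a toy sketch under loud assumptions: `stateless_read` and `demo` are invented names, and a real NFS server addresses files by opaque file handle rather than by path, so only the "nothing is remembered between requests" aspect is modelled.

```python
import os
import tempfile


def stateless_read(path: str, offset: int, length: int) -> bytes:
    """Serve one request as the post describes: open, seek, read, close."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.lseek(fd, offset, os.SEEK_SET)
        return os.read(fd, length)
    finally:
        os.close(fd)                     # no state survives between requests


def demo() -> tuple:
    """Rewrite the file between two requests; the 'client' sees mixed versions."""
    directory = tempfile.mkdtemp()
    path = os.path.join(directory, "exe-file")
    with open(path, "wb") as f:
        f.write(b"AAAAAAAA")
    first = stateless_read(path, 0, 4)
    with open(path, "wb") as f:          # server side: file rewritten in place
        f.write(b"BBBBBBBB")
    second = stateless_read(path, 4, 4)  # same name and offset scheme, new data
    os.unlink(path)
    os.rmdir(directory)
    return first, second
```

Because the file is closed between requests, nothing stops the rewrite, and the two halves the caller receives come from different versions of the file.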


2001-11-20 21:44:46

by Mike Fedyk

[permalink] [raw]
Subject: Re: Swap

On Tue, Nov 20, 2001 at 03:33:28PM -0600, Nick LeRoy wrote:
> On Tuesday 20 November 2001 15:18, Mike Fedyk wrote:
> > On Tue, Nov 20, 2001 at 10:05:37PM +0100, Steffen Persvold wrote:
> > > Christopher Friesen wrote:
> > > > "Richard B. Johnson" wrote:
> > > > > On Tue, 20 Nov 2001, Wolfgang Rohdewald wrote:
> > > > > > On Tuesday 20 November 2001 15:51, J.A. Magallon wrote:
> > > > > > > When a page is deleted for one executable (because we can re-read
> > > > > > > it from on-disk binary), it is discarded, not paged out.
> > > > > >
> > > > > > What happens if the on-disk binary has changed since loading the
> > > > > > program? -
> > > > >
> > > > > It can't. That's the reason for `install` and other methods of
> > > > > > changing executable files (mv exe-file exe-file.old ; cp newfile
> > > > > exe-file). The currently open, and possibly mapped file can be
> > > > > re-named, but it can't be overwritten.
> > > >
> > > > Actually, with NFS (and probably others) it can. Suppose I change the
> > > > file on the server, and it's swapped out on a client that has it
> > > > mounted. When it swaps back in, it can get the new information.
> > >
> > > This sounds really dangerous... What about shared libraries ??
> >
> > IIRC (if wrong flame...)
> >
> > When you delete an open file, the entry is removed from the directory, but
> > not unlinked until the file is closed. This is a standard UNIX semantic.
> >
> > Now, if you have a set of processes with shared memory, and one closes, and
> > another is created to replace, the new process will get the new libraries,
> > or even new version of the process. This could/will bring down the entire
> > set of processes.
> >
> > Apps like samba come to mind...
>
> *Any* time that you write to an executing executable, all bets are off. The
> most likely outcome is a big ol' crash & burn. With a local FS, Unix
> prevents you from shooting yourself in the foot, but with NFS, fire away...
> I've done it. It *does* let you, but...
>
> Solution: Don't do that. Shut them all down, on all clients, upgrade the
> binaries, then restart the processes on the clients.
>
> As far as the scenario that you've described, I *think* that it would
> actually work. When the new process is fork()ed, it gets a copy of the file
> descriptors from its parent, so the file is still open to it. If it then
> exec()s, the new image no longer has any real ties to its parent (at least,
> not that are relevant to this).
>

What about processes with shared memory such as samba 2.0?

2001-11-20 21:49:17

by Nick LeRoy

[permalink] [raw]
Subject: Re: Swap

On Tuesday 20 November 2001 15:18, Mike Fedyk wrote:
> On Tue, Nov 20, 2001 at 10:05:37PM +0100, Steffen Persvold wrote:
> > Christopher Friesen wrote:
> > > "Richard B. Johnson" wrote:
> > > > On Tue, 20 Nov 2001, Wolfgang Rohdewald wrote:
> > > > > On Tuesday 20 November 2001 15:51, J.A. Magallon wrote:
> > > > > > When a page is deleted for one executable (because we can re-read
> > > > > > it from on-disk binary), it is discarded, not paged out.
> > > > >
> > > > > What happens if the on-disk binary has changed since loading the
> > > > > program? -
> > > >
> > > > It can't. That's the reason for `install` and other methods of
> > > > > changing executable files (mv exe-file exe-file.old ; cp newfile
> > > > exe-file). The currently open, and possibly mapped file can be
> > > > re-named, but it can't be overwritten.
> > >
> > > Actually, with NFS (and probably others) it can. Suppose I change the
> > > file on the server, and it's swapped out on a client that has it
> > > mounted. When it swaps back in, it can get the new information.
> >
> > This sounds really dangerous... What about shared libraries ??
>
> IIRC (if wrong flame...)
>
> When you delete an open file, the entry is removed from the directory, but
> not unlinked until the file is closed. This is a standard UNIX semantic.
>
> Now, if you have a set of processes with shared memory, and one closes, and
> another is created to replace, the new process will get the new libraries,
> or even new version of the process. This could/will bring down the entire
> set of processes.
>
> Apps like samba come to mind...

*Any* time that you write to an executing executable, all bets are off. The
most likely outcome is a big ol' crash & burn. With a local FS, Unix
prevents you from shooting yourself in the foot, but with NFS, fire away...
I've done it. It *does* let you, but...

Solution: Don't do that. Shut them all down, on all clients, upgrade the
binaries, then restart the processes on the clients.

As far as the scenario that you've described, I *think* that it would
actually work. When the new process is fork()ed, it gets a copy of the file
descriptors from its parent, so the file is still open to it. If it then
exec()s, the new image no longer has any real ties to its parent (at least,
not that are relevant to this).

If it's created via clone(), then, once again, it's got its parent's
descriptors still open, so no problem.

I think the real problems only exist over NFS and NFS-like scenarios.

Did I miss something here, or am I actually correct? I was correct once,
let's see... Ooops. That was a mistake too.

-Nick
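
The fork() case can be checked on a local filesystem with a small experiment. This is a sketch, assuming a POSIX system with os.fork(); the function name is hypothetical.

```python
import os
import tempfile


def child_reads_inherited_fd(payload: bytes) -> bytes:
    """Unlink an open file, fork, and let the child read it via the copied fd."""
    fd, path = tempfile.mkstemp()
    os.write(fd, payload)
    os.unlink(path)                      # only open descriptors keep the file alive
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:                         # child: fd was duplicated by fork()
        try:
            os.close(r)
            os.write(w, os.pread(fd, len(payload), 0))
        finally:
            os._exit(0)                  # never fall back into the parent's code
    os.close(w)
    os.close(fd)                         # the child's copy still holds the file
    with os.fdopen(r, "rb") as pipe:
        data = pipe.read()               # EOF arrives once the child exits
    os.waitpid(pid, 0)
    return data
```

The child reads the old contents even though the name is gone and the parent has closed its own descriptor, which matches the "it should just work" expectation for local files.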

2001-11-20 21:51:26

by Mike Fedyk

[permalink] [raw]
Subject: NFS, Paging & Installing [was: Re: Swap]

On Tue, Nov 20, 2001 at 04:43:01PM -0500, Richard B. Johnson wrote:
> On Tue, 20 Nov 2001, Mike Fedyk wrote:
> > IIRC (if wrong flame...)
> >
> > When you delete an open file, the entry is removed from the directory, but
> > not unlinked until the file is closed. This is a standard UNIX semantic.
> >
> > Now, if you have a set of processes with shared memory, and one closes, and
> > another is created to replace, the new process will get the new libraries,
> > or even new version of the process. This could/will bring down the entire
> > set of processes.
> >
> > Apps like samba come to mind...
> >
> > Mike
>
> If the file is local, everything is fine. A file won't actually
> be deleted until the last access is closed. However, the long-standing
> problem with NFS is that it's `phony`. Basically, we send a message
> to a server that says "Give me a directory listing...". The server
> does the `opendir()` etc., and returns the results. If I want to
> open a file on the server, the server has no knowledge of the `open`.
> The client's software just emulated a file-system open(). When the
> client wants to read data from a server's file, it sends a message;
> "Gimmie data from file xxx, offset x, length y.". The server responds
> with that data. To get that data, the server did an open/lseek/read/close.
>
> So, as far as the server is concerned, that file is closed. Somebody
> else (with privilege) can delete the file and replace it. The client,
> the one that got the data for an executable, doesn't even know it.
>
> This is 'nice' for the server, it doesn't have the overhead of maintaining
> a file-system state. That's why servers are supposed to be read-only.
> However, somebody has got to write the stuff to the file-system that's
> going to (eventually) be read-only. Beware when such access occurs.
>

Do any newer versions of NFS fix the stateless server problem?

If not, are there any drop-in (at least for Linux) replacements that do keep
state on the server?

SMB is out because it doesn't propagate the Unix uid/gid.

Stripped-down (auth-wise) AFS?

Intermezzo?

2001-11-20 22:05:46

by Rik van Riel

[permalink] [raw]
Subject: Re: Swap

On Tue, 20 Nov 2001, Dan Maas wrote:

> > I bet they're getting mmap()d, like all mp3 programs seem to do
>
> Just a note here - I see far fewer buffer underruns and more consistent
> read-ahead/drop-behind behavior (i.e. no paging of other programs) when
> using plain read(), as opposed to mmap().

Consider this a VM bug, mmap() really should be more efficient.

regards,

Rik
--
Shortwave goes a long way: irc.starchat.net #swl

http://www.surriel.com/ http://distro.conectiva.com/

2001-11-20 22:11:56

by David Miller

[permalink] [raw]
Subject: Re: Swap

From: Rik van Riel <[email protected]>
Date: Tue, 20 Nov 2001 20:05:05 -0200 (BRST)

Consider this a VM bug, mmap() really should be more efficient.

read() is always going to be faster until mmap() can
use large page mappings for the user. This is why
mmap() is slower.

Even if the whole thing is cached in memory, read() will
always be faster.

2001-11-20 22:20:27

by Rik van Riel

[permalink] [raw]
Subject: Re: Swap

On Tue, 20 Nov 2001, David S. Miller wrote:
> From: Rik van Riel <[email protected]>
> Date: Tue, 20 Nov 2001 20:05:05 -0200 (BRST)
>
> Consider this a VM bug, mmap() really should be more efficient.
>
> read() is always going to be faster until mmap() can
> use large page mappings for the user. This is why
> mmap() is slower.

Uhhhh, read his original mail. When using mmap() he had
problems with the VM doing bad page replacement, while
read() was smooth.

Rik
--
Shortwave goes a long way: irc.starchat.net #swl

http://www.surriel.com/ http://distro.conectiva.com/

2001-11-20 22:24:46

by Andrew Morton

[permalink] [raw]
Subject: Re: Swap

"David S. Miller" wrote:
>
> From: Rik van Riel <[email protected]>
> Date: Tue, 20 Nov 2001 20:05:05 -0200 (BRST)
>
> Consider this a VM bug, mmap() really should be more efficient.
>
> read() is always going to be faster until mmap() can
> use large page mappings for the user. This is why
> mmap() is slower.
>
> Even if the whole thing is cached in memory, read() will
> always be faster.

Could you please explain further? What's more expensive
than the copy?

2001-11-20 22:47:38

by Dan Maas

[permalink] [raw]
Subject: Re: Swap

> Uhhhh, read his original mail. When using mmap() he had
> problems with the VM doing bad page replacement, while
> read() was smooth.

I should add that I did experiment with madvise(MADV_SEQUENTIAL) on the
mapping, and with madvise(MADV_WILLNEED) on pages about to be needed. These
had no effect. What *did* help with underruns was pre-touching each page in
a large block (120KB), before sending that block to the output device. At
that point I thought the mmap() code was getting to be more complicated than
it was worth so I just dropped back to read()...

The other day I recorded and played a seven-minute animation (1.6GB) on my
512MB machine, with only 240KB of buffering. Much to my surprise and
delight, there were no underruns, and the large sequential passes through
the file hadn't pushed anything else out of RAM.

BTW, all of the above pertains to one large mapping of the entire file to be
played. I didn't try mmap()/munmap() on a sliding window... I seem to
remember an MP3 player doing that. (I just tried looking at XMMS and
Freeamp - I *think* they are using read(), but strace seems to do bad things
with threaded programs, argh...)

Regards,
Dan

2001-11-20 22:59:20

by Dan Maas

[permalink] [raw]
Subject: Re: Swap

> This is 'nice' for the server, it doesn't have the overhead of maintaining
> a file-system state. That's why servers are supposed to be read-only.
> However, somebody has got to write the stuff to the file-system that's
> going to (eventually) be read-only. Beware when such access occurs.

But NFS still allows atomic rename() right? Isn't it considered essential to
write the new executable or library under a different name, and then
atomically rename() over the old one? If you write() directly into the
executable, you will get what you deserve...

Regards,
Dan

2001-11-20 23:02:29

by David Miller

[permalink] [raw]
Subject: Re: Swap

From: Andrew Morton <[email protected]>
Date: Tue, 20 Nov 2001 14:23:38 -0800

Could you please explain further? What's more expensive
than the copy?

TLB misses add to the cost, and this overhead is more than
"noise".

The Apache guys were playing with using mmap() for page contents
and it was always slower than read() into a static buffer.

2001-11-20 23:06:50

by Andrew Morton

[permalink] [raw]
Subject: Re: Swap

Dan Maas wrote:
>
> > Uhhhh, read his original mail. When using mmap() he had
> > problems with the VM doing bad page replacement, while
> > read() was smooth.
>
> I should add that I did experiment with madvise(MADV_SEQUENTIAL) on the
> mapping, and with madvise(MADV_WILLNEED) on pages about to be needed. These
> had no effect. What *did* help with underruns was pre-touching each page in
> a large block (120KB), before sending that block to the output device. At
> that point I thought the mmap() code was getting to be more complicated than
> it was worth so I just dropped back to read()...

There's a new system call, sys_readahead() which does what you want.

It would be nice to make the pagein code smarter though.
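
Python's standard library does not wrap readahead() directly, so the sketch below swaps in a related but different call: posix_fadvise() with POSIX_FADV_WILLNEED, which also asks the kernel to start pulling a byte range into the page cache. The helper name is hypothetical, and whether the hint has any effect depends on the kernel and filesystem.

```python
import os


def hint_willneed(fd: int, offset: int, length: int) -> bool:
    """Hint that [offset, offset + length) of fd will be needed soon.

    Returns False on platforms where os.posix_fadvise is unavailable.
    """
    if not hasattr(os, "posix_fadvise"):
        return False
    os.posix_fadvise(fd, offset, length, os.POSIX_FADV_WILLNEED)
    return True
```

A player could issue the hint for the next block while the current one is being written out, instead of touching pages by hand.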


2001-11-20 23:07:59

by Nick LeRoy

[permalink] [raw]
Subject: Re: Swap

On Tuesday 20 November 2001 15:44, Mike Fedyk wrote:

<SNIP>

> > *Any* time that you write to an executing executable, all bets are off.
> > The most likely outcome is a big ol' crash & burn. With a local FS, Unix
> > prevents you from shooting yourself in the foot, but with NFS, fire
> > away... I've done it. It *does* let you, but...
> >
> > Solution: Don't do that. Shut them all down, on all clients, upgrade
> > the binaries, then restart the processes on the clients.
> >
> > As far as the scenario that you've described, I *think* that it would
> > actually work. When the new process is fork()ed, it gets a copy of the
> > file descriptors from its parent, so the file is still open to it. If
> > it then exec()s, the new image no longer has any real ties to its parent
> > (at least, not that are relevant to this).
>
> What about processes with shared memory such as samba 2.0?

fork()ed processes are *identical* to their parents except for the return
value from fork(). They have the same shared memory handles, file
descriptors, etc. The kernel "knows" that there's an extra copy of each, and
updates its link counts, etc.

Actually, the real point is that it'll still be the old executable running
with the old libraries, until you shut down the whole group. Each of the
processes is "linked" to the original file, so the new version will never
run 'til the whole group is restarted.

It should just work. I can't think of any reason why it shouldn't.

-Nick

2001-11-20 23:18:19

by Trond Myklebust

[permalink] [raw]
Subject: Re: Swap

>>>>> " " == Dan Maas <[email protected]> writes:

> But NFS still allows atomic rename() right? Isn't it considered
> essential to write the new executable or library under a
> different name, and then atomically rename() over the old one?
> If you write() directly into the executable, you will get what
> you deserve...

Atomic rename works fine, on NFS, so if you just rename the old
library, you're quite safe. The bugs start to surface if you:

a) Reuse the old library's inode by doing something along the
lines of open("lib.so",O_TRUNC|O_WRONLY).
or
b) erase the old library.

Cheers,
Trond
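
The difference between the safe rename and case (a) can be seen on a local filesystem. This is a sketch only: both function names are made up, and a real installer would also set ownership and permissions on the new file.

```python
import os
import tempfile


def install_by_rename(target: str, new_bytes: bytes) -> None:
    """Write the new version under a temporary name, then rename() over the target."""
    directory = os.path.dirname(os.path.abspath(target))
    fd, tmp = tempfile.mkstemp(dir=directory)   # same directory, same filesystem
    with os.fdopen(fd, "wb") as f:
        f.write(new_bytes)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, target)              # atomic rename; the old inode is untouched


def install_by_truncate(target: str, new_bytes: bytes) -> None:
    """The risky variant from case (a): reuse the old inode, rewriting in place."""
    with open(target, "wb") as f:        # opens with O_WRONLY | O_CREAT | O_TRUNC
        f.write(new_bytes)
```

With the rename, a descriptor opened before the install keeps reading the old contents; with the truncate, that same descriptor suddenly sees the new bytes, which is what bites a process that has the library mapped.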

2001-11-20 23:23:39

by Luigi Genoni

[permalink] [raw]
Subject: Re: Swap



On Tue, 20 Nov 2001, Richard B. Johnson wrote:

> On Tue, 20 Nov 2001, Christopher Friesen wrote:
>
> > "Richard B. Johnson" wrote:
> > >
> > > On Tue, 20 Nov 2001, Wolfgang Rohdewald wrote:
> > >
> > > > On Tuesday 20 November 2001 15:51, J.A. Magallon wrote:
> > > > > When a page is deleted for one executable (because we can re-read it from
> > > > > on-disk binary), it is discarded, not paged out.
> > > >
> > > > What happens if the on-disk binary has changed since loading the program?
> > > > -
> > >
> > > It can't. That's the reason for `install` and other methods of changing
> > > > executable files (mv exe-file exe-file.old ; cp newfile exe-file).
> > > The currently open, and possibly mapped file can be re-named, but it
> > > can't be overwritten.
> >
> > Actually, with NFS (and probably others) it can. Suppose I change the file on
> > the server, and it's swapped out on a client that has it mounted. When it swaps
> > back in, it can get the new information.
> >
> > Chris
>
> I note that NFS files don't currently return ETXTBSY, but this is a bug.
> It is 'known' to the OS that the NFS mounted file-system is busy because
> you can't unmount the file-system while an executable is running. If
> you can trash it (as you can on Linux), it is surely a bug.
>
In most cases, the process on the client simply dies...



2001-11-20 23:36:29

by Rik van Riel

[permalink] [raw]
Subject: Re: Swap

On Tue, 20 Nov 2001, David S. Miller wrote:
> From: Andrew Morton <[email protected]>
> Date: Tue, 20 Nov 2001 14:23:38 -0800
>
> Could you please explain further? What's more expensive
> than the copy?
>
> TLB misses add to the cost, and this overhead is more than
> "noise".

Well, this could have something to do with the fact
that our page fault handler only maps in _1_ page at
a time, so we're trapping into the pagefault handler
every 4kB...

Rik
--
Shortwave goes a long way: irc.starchat.net #swl

http://www.surriel.com/ http://distro.conectiva.com/
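
To put a number on "trapping into the pagefault handler every 4kB", here is a back-of-the-envelope calculation. Pairing it with the 3.6MB/sec video stream from earlier in the thread is an assumption made for illustration, as is treating 1MB as 2^20 bytes.

```python
PAGE_SIZE = 4096  # bytes; the "4kB" from the post


def faults_per_second(bytes_per_second: float, page_size: int = PAGE_SIZE) -> float:
    """Page faults per second if every page of a sequential stream faults once."""
    return bytes_per_second / page_size


# Assumed workload: the 3.6MB/sec stream mentioned earlier, with 1MB = 2**20 bytes.
video_rate = 3.6 * 1024 * 1024
video_faults = faults_per_second(video_rate)
```

Under those assumptions the stream takes a little over 900 faults per second when nothing is pre-faulted. Mapping several pages per fault would divide that figure accordingly.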

2001-11-20 23:40:39

by David Miller

[permalink] [raw]
Subject: Re: Swap

From: Rik van Riel <[email protected]>
Date: Tue, 20 Nov 2001 21:35:40 -0200 (BRST)

On Tue, 20 Nov 2001, David S. Miller wrote:
> TLB misses add to the cost, and this overhead is more than
> "noise".

Well, this could have something to do with the fact
that our page fault handler only maps in _1_ page at
a time, so we're trapping into the pagefault handler
every 4kB...

The Apache folks were keeping it mapped across requests,
so even if it was "primed" (ie. pre-faulted), a read() into
a static buffer was still significantly faster.

2001-11-21 00:20:01

by Rik van Riel

[permalink] [raw]
Subject: Re: Swap

On Tue, 20 Nov 2001, David S. Miller wrote:
> From: Rik van Riel <[email protected]>
> Date: Tue, 20 Nov 2001 21:35:40 -0200 (BRST)
>
> On Tue, 20 Nov 2001, David S. Miller wrote:
> > TLB misses add to the cost, and this overhead is more than
> > "noise".

> The Apache folks were keeping it mapped across requests,
> so even if it was "primed" (ie. pre-faulted), a read() into
> a static buffer was still significantly faster.

Interesting. I wonder how read() and mmap() compare when the
data is in highmem pages and we're facing a kmap()/kunmap()
for read() ...

regards,

Rik
--
Shortwave goes a long way: irc.starchat.net #swl

http://www.surriel.com/ http://distro.conectiva.com/

2001-11-21 00:22:11

by David Miller

[permalink] [raw]
Subject: Re: Swap

From: Rik van Riel <[email protected]>
Date: Tue, 20 Nov 2001 22:19:26 -0200 (BRST)

On Tue, 20 Nov 2001, David S. Miller wrote:
> The Apache folks were keeping it mapped across requests,
> so even if it was "primed" (ie. pre-faulted), a read() into
> a static buffer was still significantly faster.

Interesting. I wonder how read() and mmap() compare when the
data is in highmem pages and we're facing a kmap()/kunmap()
for read() ...

Probably, the performance drops for read() to be equivalent,
or slightly below, mmap() performance. That would be my guess.

Franks a lot,
David S. Miller
[email protected]

2001-11-21 01:30:11

by Horst von Brand

[permalink] [raw]
Subject: Re: NFS, Paging & Installing [was: Re: Swap]

Mike Fedyk <[email protected]> said:
> Do any newer versions of NFS fix the stateless server problem?

This is an _extremely_ hard problem: The server has to know somehow what
the client thinks the state is... and either one (or both) may have been
rebooted in between without the other one knowing.
--
Horst von Brand [email protected]
Casilla 9G, Vin~a del Mar, Chile +56 32 672616

2001-11-21 01:45:03

by Håvard Kvålen

[permalink] [raw]
Subject: Re: Swap


> (I just tried looking at XMMS and Freeamp - I *think* they are using
> read(), but strace seems to do bad things with threaded programs,
> argh...)

You are right about XMMS, it uses read(). I'm not sure about Freeamp.

--
Håvard Kvålen

2001-11-21 01:47:53

by Mike Fedyk

[permalink] [raw]
Subject: Re: NFS, Paging & Installing [was: Re: Swap]

On Tue, Nov 20, 2001 at 10:22:58PM -0300, Horst von Brand wrote:
> Mike Fedyk <[email protected]> said:
> > Do any newer versions of NFS fix the stateless server problem?
>
> This is an _extremely_ hard problem: The server has to know somehow what
> the client thinks the state is... and either one (or both) may have been
> rebooted in between without the other one knowing.

Yep, but there are currently protocols (SMB) that do that, but not
necessarily in a unix way.

Are there any that do this now with linux? Locking over the network just
like it is locally?

2001-11-21 04:25:07

by Andreas Dilger

[permalink] [raw]
Subject: Re: Swap

On Nov 21, 2001 02:45 +0100, Håvard Kvålen wrote:
> > (I just tried looking at XMMS and Freeamp - I *think* they are using
> > read(), but strace seems to do bad things with threaded programs,
> > argh...)
>
> You are right about XMMS, it uses read(). I'm not sure about Freeamp.

When I was hacking on mpg123, it was using mmap by default unless it was
unable to mmap the file (e.g. stdin) where it uses read. You could turn
this off at compile time, so it only uses read. I found that to work
better on low memory machines.

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/

2001-11-21 10:18:36

by Helge Hafting

[permalink] [raw]
Subject: Re: Swap

Nick LeRoy wrote:

> > Alan explained a few years ago that NFS was "stateless". Nevertheless
> > it is still a bug.
>
> Correct me if I'm wrong, but I think that it's more a bug in the NFS protocol
> than in the Linux (or Solaris, etc) NFS implementation. The problem is that
> NFS itself just doesn't pass that information along. The NFS server has no
> idea that the 'text' file is being executed, so it doesn't know that it
> should "return" ETXTBSY.
>
> Now, this might be different in NFS v3, but I'm pretty sure that this applies
> for v2, at least.

Consider the above mentioned statelessness. You can't get what you
want as long as you want a stateless server - it is simply impossible.

Your client can be tweaked so that you can't write via NFS to a
file executing on the same host - but nothing can prevent another
client from writing to that file - because the server is stateless.

A stateless server doesn't actually know whether a file is
opened by anyone. The good part of this is that the server
may crash and reboot, and the client will only see a delay.
Open files will still work as soon as the server comes back up.
No state was lost in the crash, because there was no
state at all. But then you can't block writes, because
you don't know that someone is executing the file.

It is not a design bug - it is a design tradeoff. A stateful
server might work if you have years of uptime or at least
no unplanned downtime. But such implementations tend to force
clients to remount if the server ever goes down. That may
be really annoying if you're accessing lots of servers.

Helge Hafting

2001-11-21 10:55:42

by Trond Myklebust

[permalink] [raw]
Subject: Re: NFS, Paging & Installing [was: Re: Swap]

>>>>> " " == Mike Fedyk <[email protected]> writes:

> On Tue, Nov 20, 2001 at 10:22:58PM -0300, Horst von Brand
> wrote:
>> Mike Fedyk <[email protected]> said:
>> > Do any newer versions of NFS fix the stateless server
>> > problem?
>>
>> This is an _extremely_ hard problem: The server has to know
>> somehow what the client thinks the state is... and either one
>> (or both) may have been rebooted in between without the other
>> one knowing.

> Yep, but there are currently protocols (SMB) that do that, but
> not necessarily in a unix way.

<Cough, choke>

Exactly how, pray tell, does SMB cope with recovering the full state
info after client/server crashes?

Cheers,
Trond

2001-11-21 11:09:21

by Alan

[permalink] [raw]
Subject: Re: Swap

> It is not a design bug - it is a design tradeoff. A stateful
> server might work if you have years of uptime or at least
> no unplanned downtime. But such implementations tend to force
> clients to remount if the server ever goes down. That may
> be really annoying if you're accessing lots of servers.

NFS is at best "imitation stateless". You can do good stateful servers that
recover across both client and server machine failure. You can do far better
with them than with NFS - it's just a bit harder.

Alan

2001-11-21 16:45:10

by Remco Post

[permalink] [raw]
Subject: Re: Swap

> On Tue, 20 Nov 2001, Christopher Friesen wrote:
>
> > "Richard B. Johnson" wrote:
> > >
> > > On Tue, 20 Nov 2001, Wolfgang Rohdewald wrote:
> > >
> > > > On Tuesday 20 November 2001 15:51, J.A. Magallon wrote:
> > > > > When a page is deleted for one executable (because we can re-read it from
> > > > > on-disk binary), it is discarded, not paged out.
> > > >
> > > > What happens if the on-disk binary has changed since loading the program?
> > > > -
> > >
> > > It can't. That's the reason for `install` and other methods of changing
> > > > executable files (mv exe-file exe-file.old ; cp newfile exe-file).
> > > The currently open, and possibly mapped file can be re-named, but it
> > > can't be overwritten.
> >
> > Actually, with NFS (and probably others) it can. Suppose I change the file on
> > the server, and it's swapped out on a client that has it mounted. When it swaps
> > back in, it can get the new information.
> >
> > Chris
>
> I note that NFS files don't currently return ETXTBSY, but this is a bug.
> It is 'known' to the OS that the NFS mounted file-system is busy because
> you can't unmount the file-system while an executable is running. If
> you can trash it (as you can on Linux), it is surely a bug.
>
> Alan explained a few years ago that NFS was "stateless". Nevertheless
> it is still a bug.
>
> Cheers,
> Dick Johnson
>

The client OS knows the fs is busy; the server does not. So from the server
side, I can change a file, unmount parts of the exported fs (NFS does not see
fs boundaries), or even mount a completely different fs on the exported fs,
breaking the NFS client and the NFS server. Been there, done that. Yes, this
is not user-friendly, but then again, NFS is not the best networked filesystem
in the world, nor was it designed to be handled by non-administrators (and I
think it shouldn't have to be).


--
Met vriendelijke groeten,

Remco Post

SARA - Stichting Academisch Rekencentrum Amsterdam
High Performance Computing Tel. +31 20 592 8008 Fax. +31 20 668 3167

"I really didn't foresee the Internet. But then, neither did the computer
industry. Not that that tells us very much of course - the computer industry
didn't even foresee that the century was going to end." -- Douglas Adams



2001-11-21 16:49:00

by Remco Post

[permalink] [raw]
Subject: Re: Swap

> Christopher Friesen wrote:
> >
> > "Richard B. Johnson" wrote:
> > >
> > > On Tue, 20 Nov 2001, Wolfgang Rohdewald wrote:
> > >
> > > > On Tuesday 20 November 2001 15:51, J.A. Magallon wrote:
> > > > > When a page is deleted for one executable (because we can re-read it from
> > > > > on-disk binary), it is discarded, not paged out.
> > > >
> > > > What happens if the on-disk binary has changed since loading the program?
> > > > -
> > >
> > > It can't. That's the reason for `install` and other methods of changing
> > > > executable files (mv exe-file exe-file.old ; cp newfile exe-file).
> > > The currently open, and possibly mapped file can be re-named, but it
> > > can't be overwritten.
> >
> > Actually, with NFS (and probably others) it can. Suppose I change the file on
> > the server, and it's swapped out on a client that has it mounted. When it swaps
> > back in, it can get the new information.
> >
>
> This sounds really dangerous... What about shared libraries ??
>

Same problem. This is why most Unix distros tell you to reboot after each
applied patch and each OS upgrade: just to be sure that every process using
mmapped files or demand-paged binaries is restarted.
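
A minimal sketch of the difference between the two update styles discussed
above, assuming a local Unix filesystem and Python 3. A read-only mmap of an
ordinary file stands in for a running program's file-backed pages (so the
kernel's "text file busy" protection for real executables does not come into
play here); the file name and contents are made up for illustration.

```python
import mmap
import os
import tempfile


def map_readonly(path):
    # Stand-in for a running program whose pages are backed by the file.
    with open(path, "rb") as f:
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


def compare_update_strategies():
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "exe-file")
        with open(path, "wb") as f:
            f.write(b"OLD-CODE")

        # Strategy 1: stage the new version, then rename it over the old
        # name. The name now refers to a new inode; the existing mapping
        # keeps the old inode (and its contents) alive until it is closed.
        old_mapping = map_readonly(path)
        staged = path + ".new"
        with open(staged, "wb") as f:
            f.write(b"NEW-CODE")
        os.replace(staged, path)
        seen_after_rename = bytes(old_mapping[:8])
        old_mapping.close()

        # Strategy 2: overwrite the same inode in place. A shared mapping
        # of that inode can observe the new bytes, which is the hazard
        # described for files changed behind a client's back.
        new_mapping = map_readonly(path)
        with open(path, "r+b") as f:
            f.write(b"BAD-CODE")
        seen_after_overwrite = bytes(new_mapping[:8])
        new_mapping.close()

        with open(path, "rb") as f:
            on_disk = f.read()
    return seen_after_rename, seen_after_overwrite, on_disk


if __name__ == "__main__":
    print(compare_update_strategies())
```

The rename case is why `install` and "mv old; cp new" are safe for processes
that are already running: they keep reading the old inode. What the in-place
case returns depends on the platform's page cache behaviour, so it is shown
without an expected value.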


--
Kind regards,

Remco Post



2001-11-21 16:54:10

by Remco Post

[permalink] [raw]
Subject: Re: Swap

> On Tue, Nov 20, 2001 at 03:33:28PM -0600, Nick LeRoy wrote:
> > On Tuesday 20 November 2001 15:18, Mike Fedyk wrote:
> > > On Tue, Nov 20, 2001 at 10:05:37PM +0100, Steffen Persvold wrote:
> > > > Christopher Friesen wrote:
> > > > > "Richard B. Johnson" wrote:
> > > > > > On Tue, 20 Nov 2001, Wolfgang Rohdewald wrote:
> > > > > > > On Tuesday 20 November 2001 15:51, J.A. Magallon wrote:
> > > > > > > > When a page is deleted for one executable (because we can re-read
> > > > > > > > it from on-disk binary), it is discarded, not paged out.
> > > > > > >
> > > > > > > What happens if the on-disk binary has changed since loading the
> > > > > > > program? -
> > > > > >
> > > > > > It can't. That's the reason for `install` and other methods of
> > > > > > changing executable files (mv exe-file exe-file.old ; cp newfile
> > > > > > exe-file). The currently open, and possibly mapped file can be
> > > > > > re-named, but it can't be overwritten.
> > > > >
> > > > > Actually, with NFS (and probably others) it can. Suppose I change the
> > > > > file on the server, and it's swapped out on a client that has it
> > > > > mounted. When it swaps back in, it can get the new information.
> > > >
> > > > This sounds really dangerous... What about shared libraries ??
> > >
> > > IIRC (if wrong flame...)
> > >
> > > When you delete an open file, the entry is removed from the directory, but
> > > not unlinked until the file is closed. This is a standard UNIX semantic.
> > >
> > > Now, if you have a set of processes with shared memory, and one closes, and
> > > another is created to replace, the new process will get the new libraries,
> > > or even new version of the process. This could/will bring down the entire
> > > set of processes.
> > >
> > > Apps like samba come to mind...
> >
> > *Any* time that you write to an executing executable, all bets are off. The
> > most likely outcome is a big 'ol crash & burn. With a local FS, Unix
> > prevents you from shooting yourself in the foot, but with NFS, fire away..
> > I've done it. It *does* let you, but...
> >
> > Solution: Don't do that. Shut them all down, on all clients, upgrade the
> > binaries, then restart the processes on the clients.
> >
> > As far as the scenario that you've described, I *think* that it would
> > actually work. When the new process is fork()ed, it gets a copy of the file
> > descriptors from its parent, so the file is still open to it. If it then
> > exec()s, the new image no longer has any real ties to its parent (at least,
> > not that are relevant to this).
> >
>
> What about processes with shared memory such as samba 2.0?


Cool, isn't it? Thinking of 1000 ways to crash apps. As long as the meaning of
the bits and bytes in the shm segment does not change with a newer version of
the app, you're safe. Upgrading in single-user mode makes things a lot safer
(yes, I too usually like to live dangerously...).
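
One way an application could guard against the layout of a shared segment
changing between versions is to tag the segment with a layout version and
refuse to attach on a mismatch. The sketch below is only an illustration of
that idea, not how samba or any other program does it: the magic value, the
header format and the function names are all hypothetical, and an anonymous
mmap stands in for a real shm segment.

```python
import mmap
import struct

# Hypothetical header at the start of the segment: magic + layout version.
HEADER = struct.Struct("<4sI")
MAGIC = b"SHM0"
LAYOUT_VERSION = 2


def init_segment(segment, version=LAYOUT_VERSION):
    # Written once by whichever process creates the segment.
    HEADER.pack_into(segment, 0, MAGIC, version)


def attach(segment, expected=LAYOUT_VERSION):
    # Called by every process before it interprets the rest of the segment.
    magic, version = HEADER.unpack_from(segment, 0)
    if magic != MAGIC:
        raise ValueError("segment does not carry the expected magic")
    if version != expected:
        raise RuntimeError(
            "segment layout v%d, this binary expects v%d" % (version, expected)
        )
    return version


if __name__ == "__main__":
    segment = mmap.mmap(-1, 4096)      # anonymous shared mapping
    init_segment(segment, version=1)   # created by the "old" binary
    try:
        attach(segment)                # the "new" binary expects version 2
    except RuntimeError as exc:
        print("refusing to attach:", exc)
```

A new binary that fails this check can exit cleanly instead of misreading the
old layout and taking the remaining processes down with it.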


--
Kind regards,

Remco Post



2001-11-22 05:17:22

by Bernd Eckenfels

[permalink] [raw]
Subject: Re: NFS, Paging & Installing [was: Re: Swap]

In article <[email protected]> you wrote:
> Exactly how, pray tell, does SMB cope with recovering the full state
> info after client/server crashes?

Not doing that is the better solution.

Greetings
Bernd

2001-11-22 12:20:20

by Trond Myklebust

[permalink] [raw]
Subject: Re: NFS, Paging & Installing [was: Re: Swap]

>>>>> " " == Bernd Eckenfels <[email protected]> writes:

> In article <[email protected]> you wrote:
>> Exactly how, pray tell, does SMB cope with recovering the full
>> state info after client/server crashes?

> Not doing that is the better solution.

...and is why stateless filesystems are the norm. The claim that SMB
was different wasn't mine.

Cheers,
Trond

2001-11-23 19:34:26

by Mike Fedyk

[permalink] [raw]
Subject: Re: NFS, Paging & Installing [was: Re: Swap]

On Wed, Nov 21, 2001 at 11:55:07AM +0100, Trond Myklebust wrote:
> >>>>> " " == Mike Fedyk <[email protected]> writes:
>
> > On Tue, Nov 20, 2001 at 10:22:58PM -0300, Horst von Brand
> > wrote:
> >> Mike Fedyk <[email protected]> said:
> >> > Do any newer versions of NFS fix the stateless server
> >> > problem?
> >>
> >> This is an _extremely_ hard problem: The server has to know
> >> somehow what the client thinks the state is... and either one
> >> (or both) may have been rebooted in between without the other
> >> one knowing.
>
> > Yep, but there are currently protocols (SMB) that do that, but
> > not necessarily in a unix way.
>
> <Cough, choke>
>
> Exactly how, pray tell, does SMB cope with recovering the full state
> info after client/server crashes?
>

No, I wasn't claiming that SMB will recover from a server crash gracefully.
If your SMB server goes down for whatever reason (with samba, an upgrade being
more likely than a crash...), any open file connections are hosed.

I was just stating that there are network FSes that are stateful, and they
work well as long as the server stays up.

As stated by Alan, you can make a stateful Net FS that deals gracefully with
crash recovery, it's just harder.

Also, SMB deals with crashed clients pretty well most of the time by
querying the client with the write lock to see if it's still there...

Mike

2001-11-26 21:55:02

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [Linux-abi-devel] Re: Swap

On Tue, Nov 20, 2001 at 06:58:03PM +0100, Wolfgang Rohdewald wrote:
> I am quite sure this is also possible if the binary is emulated by
> the linux-abi modules like my old SCO binaries.

Linux-ABI mmaps binaries if they are page-aligned; otherwise they
are read completely at startup. Note that Linux-ABI uses the normal
binfmt_elf for foreign ELF binaries, so the above applies only
to COFF and x.out (Microsoft x.out) binaries.
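
The "mmap if page-aligned, otherwise read it all" decision can be sketched in
user space as follows. This is not the Linux-ABI loader code, only an
illustration of the same choice; `load_segment` and `demo` are hypothetical
names, and the file is a scratch file rather than a real COFF or x.out binary.

```python
import mmap
import os
import tempfile


def load_segment(path, offset, length):
    """Map the segment if its file offset is aligned, else read a copy."""
    with open(path, "rb") as f:
        if offset % mmap.ALLOCATIONGRANULARITY == 0:
            # Aligned: pages can be faulted in from the file on demand.
            mapped = mmap.mmap(f.fileno(), length,
                               access=mmap.ACCESS_READ, offset=offset)
            return "mapped", mapped
        # Unaligned: the data has to be copied into memory up front.
        f.seek(offset)
        return "read", f.read(length)


def demo():
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "image")
        with open(path, "wb") as f:
            f.write(b"\0" * (2 * mmap.ALLOCATIONGRANULARITY))
        kind_aligned, mapped = load_segment(path, 0, 512)
        kind_unaligned, _ = load_segment(path, 100, 512)
        mapped.close()
    return kind_aligned, kind_unaligned


if __name__ == "__main__":
    print(demo())
```

Only the mapped case stays tied to the on-disk file after startup, so the
question of what happens when the file changes applies to it and not to the
fully read case.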

Christoph

--
Of course it doesn't work. We've performed a software upgrade.