I am somewhat confused about how much swap space you can have with a
2.4 series kernel. If I read the mkswap man page, I get the impression
that I could have up to 8x2GB of swap space for a total of 16 GB, but
reading the RedHat reference guide, it says 2GB maximum.
I presume 2.5 kernels have much higher limits?
--
Colin Paul Adams
Preston Lancashire
> I am somewhat confused about how much swap space you can have with a 2.4
> series kernel. If I read the mkswap man page, I get the impression that I
> could have up to 8x2GB of swap space for a total of 16 GB, but reading the
> RedHat reference guide, it says 2GB maximum.
>
> I presume 2.5 kernels have much higher limits?
> --
Hi,
2.5 limits are the same as 2.4.recent AFAIK, but swapfiles in 2.5 work as
well (fast) as swap partitions if I recall Andrew Morton's comments
correctly.
From http://www.xenotime.net/linux/doc/swap-mini-howto.txt:
3. Swap space limits
Linux 2.4.10 and later, and Linux 2.5 support any combination of swap
files or swap devices to a maximum number of 32 of them. Prior to Linux
2.4.10, the limit was any combination of 8 swap files or swap devices. On
x86 architecture systems, each of these swap areas has a limit of 2 GiB.
~Randy
"Randy.Dunlap" <[email protected]> wrote:
>
> 3. Swap space limits
>
> Linux 2.4.10 and later, and Linux 2.5 support any combination of swap
> files or swap devices to a maximum number of 32 of them. Prior to Linux
> 2.4.10, the limit was any combination of 8 swap files or swap devices. On
> x86 architecture systems, each of these swap areas has a limit of 2 GiB.
The limit is now 16 swapfiles/devices, because one pte bit got
stolen for nonlinear VMA pte's.
I'm not sure where the 2G limit comes from?
On Sat, Jun 07, 2003 at 01:24:32PM -0700, Andrew Morton wrote:
> The limit is now 16 swapfiles/devices, because one pte bit got
> stolen for nonlinear VMA pte's.
> I'm not sure where the 2G limit comes from?
i386 has:
#define __swp_type(x) (((x).val >> 1) & 0x1f)
#define __swp_offset(x) ((x).val >> 8)
#define __swp_entry(type, offset) ((swp_entry_t) { ((type) << 1) | ((offset) << 8) })
#define __pte_to_swp_entry(pte) ((swp_entry_t) { (pte).pte_low })
#define __swp_entry_to_pte(x) ((pte_t) { (x).val })
These limits could be slightly relaxed by the kernel with some slightly
more complex bit twiddlings to recover up to 6 bits of the lower byte
of a non-present PTE for 64 swapfiles. The limitation on size seems to
be in userspace. It appears the kernel has 24 bits for offsets in 4KB
units, for up to something approaching 64GB swapfiles. Andi Kleen tells
me newer distributions have fixed the mkswap(8) userspace limitation.
So non-PAE x86 should be able to do 4TB of aggregate swapspace modulo
vmallocspace and/or ZONE_NORMAL exhaustion from swap maps. Also, PAE
should be able to do 64TB of aggregate swapspace (modulo vmallocspace)
since it has an additional 4 bits usage for page offsets. But I didn't
audit intensively, so some silly limits may be lurking in dark corners.
-- wli
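The offset arithmetic in the message above can be checked with a small sketch. It reuses the quoted i386 macros as they stand and models pte_low as a 32-bit word. The helpers make_entry and max_swap_area_bytes and the constant SWP_OFFSET_BITS are hypothetical names introduced here for illustration and are not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Models the 32-bit pte_low word of a non-present i386 PTE. */
typedef struct { uint32_t val; } swp_entry_t;

/* The i386 macros quoted in the thread, unchanged apart from whitespace. */
#define __swp_type(x)             (((x).val >> 1) & 0x1f)
#define __swp_offset(x)           ((x).val >> 8)
#define __swp_entry(type, offset) ((swp_entry_t) { ((type) << 1) | ((offset) << 8) })

#define SWP_OFFSET_BITS 24  /* assumption: 32-bit word minus the low byte */
#define PAGE_SHIFT      12  /* 4KB pages */

/* Hypothetical helper: pack a swap area index and a page offset. */
static swp_entry_t make_entry(uint32_t type, uint32_t offset)
{
    assert(type < 32);                         /* 5 type bits */
    assert(offset < (1u << SWP_OFFSET_BITS));  /* 24 offset bits */
    return __swp_entry(type, offset);
}

/* Hypothetical helper: bytes addressable within one swap area. */
static uint64_t max_swap_area_bytes(void)
{
    /* 2^24 pages of 2^12 bytes each is 2^36 bytes, i.e. 64GB. */
    return ((uint64_t)1 << SWP_OFFSET_BITS) << PAGE_SHIFT;
}
```

Packing type 3 and offset 0x123456 with make_entry and reading them back through __swp_type and __swp_offset returns the same values and leaves bit 0 (the present bit) clear, which is the property a non-present PTE needs.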
Followup to: <33435.4.64.196.31.1055008200.squirrel@http://www.osdl.org>
By author: "Randy.Dunlap" <[email protected]>
In newsgroup: linux.dev.kernel
>
> From http://www.xenotime.net/linux/doc/swap-mini-howto.txt:
>
> 3. Swap space limits
>
> Linux 2.4.10 and later, and Linux 2.5 support any combination of swap
> files or swap devices to a maximum number of 32 of them. Prior to Linux
> 2.4.10, the limit was any combination of 8 swap files or swap devices. On
> x86 architecture systems, each of these swap areas has a limit of 2 GiB.
>
2 GiB is getting a bit tight, especially with tmpfs, just like the
previous limits of 16 MiB and 128 MiB were getting tight at various
points, and it's annoying to have to make multiple partitions.
tmpfs is a good thing -- in my experience even if it is stored
primarily on disk it is much faster for temp files than any other
filesystem, simply because it never has to worry about consistency.
This means it's entirely reasonable to have a "farm" machine with a
40 GiB tmpfs used for everything except the OS itself.
-hpa
--
<[email protected]> at work, <[email protected]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
Architectures needed: ia64 m68k mips64 ppc ppc64 s390 s390x sh v850 x86-64
Followup to: <33435.4.64.196.31.1055008200.squirrel@http://www.osdl.org>
By author: "Randy.Dunlap" <[email protected]>
In newsgroup: linux.dev.kernel
>> Linux 2.4.10 and later, and Linux 2.5 support any combination of swap
>> files or swap devices to a maximum number of 32 of them. Prior to Linux
>> 2.4.10, the limit was any combination of 8 swap files or swap devices. On
>> x86 architecture systems, each of these swap areas has a limit of 2 GiB.
On Sat, Jun 07, 2003 at 02:43:54PM -0700, H. Peter Anvin wrote:
> 2 GiB is getting a bit tight, especially with tmpfs, just like the
> previous limits of 16 MiB and 128 MiB were getting tight at various
> points, and it's annoying to have to make multiple partitions.
> tmpfs is a good thing -- in my experience even if it is stored
> primarily on disk it is much faster for temp files than any other
> filesystem, simply because it never has to worry about consistency.
> This means it's entirely reasonable to have a "farm" machine with a
> 40 GiB tmpfs used for everything except the OS itself.
The 2GB limit is 100% userspace; distros are already shipping the
mkswap(8) fixes (both RH & UL anyway).
-- wli
William Lee Irwin III wrote:
>
> The 2GB limit is 100% userspace; distros are already shipping the
> mkswap(8) fixes (both RH & UL anyway).
>
Presumably it means they have defined a new swap format and have changed
swapon(8) as well. These changes should be rolled back into util-linux if
they aren't already.
-hpa
On 7 Jun 2003, Colin Paul Adams wrote:
> I am somewhat confused about how much swap space you can have with a
> 2.4 series kernel. If I read the mkswap man page, I get the impression
> that I could have up to 8x2GB of swap space for a total of 16 GB, but
> reading the RedHat reference guide, it says 2GB maximum.
That piece of documentation is out of date. I'm using a
20 GB swap partition on one of my test systems, with a
2.4 kernel.
William Lee Irwin III wrote:
>> The 2GB limit is 100% userspace; distros are already shipping the
>> mkswap(8) fixes (both RH & UL anyway).
On Sat, Jun 07, 2003 at 03:18:45PM -0700, H. Peter Anvin wrote:
> Presumably it means they have defined a new swap format and have changed
> swapon(8) as well. These changes should be rolled back into util-linux if
> they aren't already.
The swap format (or at least the header) doesn't appear to depend on
byte offsets that I can tell, so I don't see any need for it to change.
I'm not entirely sure what they've done for swapon(8) or mkswap(8),
though I could probably bang out an equivalent or fish out their
patches if pressed.
-- wli
On Sat, Jun 07, 2003 at 01:50:46PM -0700, William Lee Irwin III wrote:
> These limits could be slightly relaxed by the kernel with some slightly
> more complex bit twiddlings to recover up to 6 bits of the lower byte
> of a non-present PTE for 64 swapfiles. The limitation on size seems to
> be in userspace. It appears the kernel has 24 bits for offsets in 4KB
> units, for up to something approaching 64GB swapfiles. Andi Kleen tells
> me newer distributions have fixed the mkswap(8) userspace limitation.
> So non-PAE x86 should be able to do 4TB of aggregate swapspace modulo
> vmallocspace and/or ZONE_NORMAL exhaustion from swap maps. Also, PAE
> should be able to do 64TB of aggregate swapspace (modulo vmallocspace)
> since it has an additional 4 bits usage for page offsets. But I didn't
> audit intensively, so some silly limits may be lurking in dark corners.
Santamarta on #kn tested the following patch to allow up to 64
swapfiles.
diff -prauN linux-2.5.70/include/asm-i386/pgtable.h swap-2.5.70/include/asm-i386/pgtable.h
--- linux-2.5.70/include/asm-i386/pgtable.h Thu May 1 19:15:41 2003
+++ swap-2.5.70/include/asm-i386/pgtable.h Sat Jun 7 16:47:04 2003
@@ -106,6 +106,7 @@
#define _PAGE_BIT_PCD 4
#define _PAGE_BIT_ACCESSED 5
#define _PAGE_BIT_DIRTY 6
+#define _PAGE_BIT_FILE 6
#define _PAGE_BIT_PSE 7 /* 4 MB (or 2MB) page, Pentium+, if present.. */
#define _PAGE_BIT_GLOBAL 8 /* Global TLB entry PPro+ */
@@ -320,12 +321,38 @@
*/
#define update_mmu_cache(vma,address,pte) do { } while (0)
-/* Encode and de-code a swap entry */
-#define __swp_type(x) (((x).val >> 1) & 0x1f)
+/*
+ * Encode and de-code a swap entry
+ * PAE could use more swapspace if swp_entry_t were wider, as there
+ * is an additional word in PTE's with 4 bits available. The benefit
+ * of extending it for such is, however, questionable.
+ */
+
#define __swp_offset(x) ((x).val >> 8)
-#define __swp_entry(type, offset) ((swp_entry_t) { ((type) << 1) | ((offset) << 8) })
#define __pte_to_swp_entry(pte) ((swp_entry_t) { (pte).pte_low })
#define __swp_entry_to_pte(x) ((pte_t) { (x).val })
+
+/*
+ * Bit 0 is reserved for present/non-present, and _PAGE_BIT_FILE is
+ * reserved for non-present PTE's representing file pages.
+ */
+#define __swp_type(entry) \
+({ \
+ unsigned long __val__ = (entry).val; \
+ (__val__ & (_PAGE_FILE - 1)) >> 1 \
+ | (__val__ & (_PAGE_FILE << 1)) >> 2; \
+})
+
+#define __swp_entry(type, offset) \
+({ \
+ unsigned long __type__ = type; \
+ (swp_entry_t) \
+ { (offset) << 8 \
+ | ((__type__ << 1) & (_PAGE_FILE - 1)) \
+ | ((__type__ << 2) & (_PAGE_FILE << 1)) }; \
+})
+
+#define MAX_SWAPFILES_SHIFT _PAGE_BIT_FILE
#endif /* !__ASSEMBLY__ */
diff -prauN linux-2.5.70/include/linux/swap.h swap-2.5.70/include/linux/swap.h
--- linux-2.5.70/include/linux/swap.h Wed May 7 21:19:58 2003
+++ swap-2.5.70/include/linux/swap.h Sat Jun 7 15:09:28 2003
@@ -10,6 +10,7 @@
#include <linux/sched.h>
#include <asm/atomic.h>
#include <asm/page.h>
+#include <asm/pgtable.h>
#define SWAP_FLAG_PREFER 0x8000 /* set if swap priority specified */
#define SWAP_FLAG_PRIO_MASK 0x7fff
@@ -25,10 +26,14 @@
* be swapped to. The swap type and the offset into that swap type are
* encoded into pte's and into pgoff_t's in the swapcache. Using five bits
* for the type means that the maximum number of swapcache pages is 27 bits
- * on 32-bit-pgoff_t architectures. And that assumes that the architecture packs
- * the type/offset into the pte as 5/27 as well.
+ * on 32-bit-pgoff_t architectures. And that assumes that the
+ * architecture packs the type/offset into the pte as 5/27 as well.
+ * Architectures can override this by simply defining MAX_SWAPFILES_SHIFT
+ * in appropriate headers.
*/
+#ifndef MAX_SWAPFILES_SHIFT
#define MAX_SWAPFILES_SHIFT 5
+#endif
#define MAX_SWAPFILES (1 << MAX_SWAPFILES_SHIFT)
/*
William Lee Irwin III <[email protected]> wrote:
>
> Santamarta on #kn tested the following patch to allow up to 64
> swapfiles.
Seems hardly worth the extra arithmetic given that the 2G limit
is actually bogus?
I just did mkswap/swapon of a 52G partition. That used 26MB of lowmem for
the swap map btw.
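The 26MB figure is consistent with a swap map that keeps one 16-bit use count per 4KB swap page. That layout is an assumption here (it matches swap_map being an array of unsigned short, as far as I recall for kernels of this era), and swap_map_bytes is a hypothetical helper, not a kernel function. A rough sketch of the arithmetic:

```c
#include <stdint.h>

#define PAGE_SHIFT 12  /* 4KB pages, as on i386 */

/*
 * Hypothetical helper: bytes of swap map needed to track a swap area,
 * assuming one 16-bit use count per swap page.
 */
static uint64_t swap_map_bytes(uint64_t swap_bytes)
{
    uint64_t pages = swap_bytes >> PAGE_SHIFT;  /* number of swap pages */
    return pages * sizeof(uint16_t);            /* 2 bytes per page */
}
```

For a 52GB area this gives 13,631,488 pages and 27,262,976 bytes of map, which is 26MB. By the same arithmetic a full 64GB area would need 32MB.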
On Sat, Jun 07, 2003 at 06:28:43PM -0700, Andrew Morton wrote:
> Seems hardly worth the extra arithmetic given that the 2G limit
> is actually bogus?
> I just did mkswap/swapon of a 52G partition. That used 26MB of lowmem for
> the swap map btw.
It's not clear precisely who or what would benefit from it; however,
the decreased maximum of 32 swapfiles on i386 is a regression vs.
2.4.x's limit of 64, in whatever sense something no one cares about is
actually a regression (in principle they could have merely not spoken
up about it).
In other words, if someone feels itchy because the number went down
from 2.4.x, here it is. If not, I'm fine with leaving it be.
-- wli
P.S.
If desired, I can also send in the code to utilize the extra bits on
PAE, or turn things into a config option, or whatever. Joe Blow random
VM hacker at your service etc.
On Sat, Jun 07, 2003 at 06:28:43PM -0700, Andrew Morton wrote:
>> Seems hardly worth the extra arithmetic given that the 2G limit
>> is actually bogus?
>> I just did mkswap/swapon of a 52G partition. That used 26MB of lowmem for
>> the swap map btw.
On Sat, Jun 07, 2003 at 06:38:27PM -0700, William Lee Irwin III wrote:
> It's not clear precisely who or what would benefit from it; however,
> the decreased maximum of 32 swapfiles on i386 is a regression vs.
> 2.4.x's limit of 64, in whatever sense something no one cares about is
> actually a regression (in principle they could have merely not spoken
> up about it).
> In other words, if someone feels itchy because the number went down
> from 2.4.x, here it is. If not, I'm fine with leaving it be.
I went and worked out why it's wrong (a _PAGE_PROTNONE clash). Whatever
you do, don't apply it.
-- wli
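One plausible reading of the clash, sketched under assumptions: the patch places the sixth type bit at _PAGE_FILE << 1, and if _PAGE_FILE is 0x040 and _PAGE_PROTNONE is 0x080 on i386 (values assumed here, please check asm-i386/pgtable.h), that bit is the one pte_present() also tests. The functions patched_swp_entry and looks_present are hypothetical stand-ins written for this illustration.

```c
#include <stdint.h>

/* Assumed i386 bit values for 2.5-era kernels; verify against the headers. */
#define _PAGE_PRESENT  0x001u
#define _PAGE_FILE     0x040u  /* bit 6, _PAGE_BIT_FILE in the patch */
#define _PAGE_PROTNONE 0x080u  /* bit 7 */

/* The patch's __swp_entry, rewritten as a function over a 32-bit word. */
static uint32_t patched_swp_entry(uint32_t type, uint32_t offset)
{
    return (offset << 8)
         | ((type << 1) & (_PAGE_FILE - 1))    /* type bits 0-4 -> pte bits 1-5 */
         | ((type << 2) & (_PAGE_FILE << 1));  /* type bit 5   -> pte bit 7   */
}

/* Stand-in for pte_present(): either bit marks the PTE as present. */
static int looks_present(uint32_t pte_low)
{
    return (pte_low & (_PAGE_PRESENT | _PAGE_PROTNONE)) != 0;
}
```

Under these assumptions, swap types 0 through 31 still encode as non-present, but any type of 32 or above sets bit 7, so the swap entry would be mistaken for a present PROT_NONE mapping.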
> The 2GB limit is 100% userspace; distros are already shipping the
> mkswap(8) fixes (both RH & UL anyway).
If I recall things correctly: at some point in time
the kernel would reject a swapon on a swapspace that was
larger than it could handle (instead of just using the
initial part).
That is why mkswap contains a lot of very ugly code
that compares the size with the maximum certain kernels
will accept for swapon.
I have not checked recently what the present situation is.
Andries
On Sat, 7 Jun 2003 18:28:46 -0400 (EDT) Rik van Riel <[email protected]> wrote:
| On 7 Jun 2003, Colin Paul Adams wrote:
|
| > I am somewhat confused about how much swap space you can have with a
| > 2.4 series kernel. If I read the mkswap man page, I get the impression
| > that I could have up to 8x2GB of swap space for a total of 16 GB, but
| > reading the RedHat reference guide, it says 2GB maximum.
|
| That piece of documentation is out of date. I'm using a
| 20 GB swap partition on one of my test systems, with a
| 2.4 kernel.
So do we know what the 2.4.current and 2.5.current limits are?
You have used a 20 GB swap partition on 2.4.recent.
Andrew has used (tested) a 52 GB partition on some unmentioned
kernel.
Thanks,
--
~Randy
On Tue, Jun 10, 2003 at 12:00:39PM -0700, Randy.Dunlap wrote:
> So do we know what the 2.4.current and 2.5.current limits are?
> You have used a 20 GB swap partition on 2.4.recent.
> Andrew has used (tested) a 52 GB partition on some unmentioned
> kernel.
I apologize for failing to do a proper wrap-up. AIUI, we have:
(1) both 2.4.x and 2.5.x kernels support swapspaces of up to 64GB in size
(2) 2.4.x supports 64 swapspaces and 2.5.x supports 32 (not reparable)
(3) mkswap(8) needs fixes for creating swapspaces larger than 2GB merged
back to util-linux; aeb (util-linux maintainer) has publicly
requested the code be sent back to him for merging, presumably
with some evidence of its correctness. One of the several distro
people who are maintaining such patches against mkswap(8) is
going to send that in.
-- wli
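The aggregate figures implied by this wrap-up can be worked out in a few lines. This is a sketch assuming 4KB pages and the 24-bit page offset discussed earlier in the thread. aggregate_swap_bytes is a hypothetical helper, and the 2TB figure for 2.5.x is derived here rather than stated by anyone in the thread.

```c
#include <stdint.h>

#define PAGE_SHIFT 12  /* 4KB pages */

/*
 * Hypothetical helper: total swap addressable with nr_areas swap areas,
 * each limited to 2^offset_bits pages.
 */
static uint64_t aggregate_swap_bytes(unsigned nr_areas, unsigned offset_bits)
{
    uint64_t per_area = ((uint64_t)1 << offset_bits) << PAGE_SHIFT;
    return (uint64_t)nr_areas * per_area;
}
```

With 24 offset bits, 64 areas (2.4.x) come to 4TB and 32 areas (2.5.x) to 2TB. With the 4 extra offset bits mentioned for PAE, 64 areas would come to 64TB.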