2008-03-05 09:06:33

by Pavel Machek

Subject: 2.6.25-rc3: 34TB vmalloc total -- overflow in /proc/meminfo?

Hi!

leet:~ # uname -a
Linux leet 2.6.25-rc3 #189 SMP Mon Mar 3 13:16:59 CET 2008 x86_64
x86_64 x86_64
GNU/Linux
leet:~ #

32-bit distro on 64-bit kernel.

leet:~ # cat /proc/meminfo
MemTotal: 4055780 kB
MemFree: 3972400 kB
Buffers: 4892 kB
Cached: 29844 kB
SwapCached: 0 kB
Active: 23140 kB
Inactive: 20800 kB
SwapTotal: 2104472 kB
SwapFree: 2104472 kB
Dirty: 1300 kB
Writeback: 0 kB
AnonPages: 9152 kB
Mapped: 8684 kB
Slab: 18336 kB
SReclaimable: 7448 kB
SUnreclaim: 10888 kB
PageTables: 676 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
CommitLimit: 4132360 kB
Committed_AS: 27684 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 18112 kB
VmallocChunk: 34359720115 kB
leet:~ # /etc/init.d/gpm start

Linux version 2.6.25-rc3 (pavel@amd) (gcc version 4.1.3 20071209
(prerelease) (Debian 4.1.2-18)) #189 SMP Mon Mar 3 13:16:59 CET 2008
Command line: root=/dev/sda2 vga=6 resume=/dev/sda1 splash=silent
nosmp no_console_suspend 3
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009d800 (usable)
BIOS-e820: 000000000009d800 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000d2000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 00000000bfed0000 (usable)
BIOS-e820: 00000000bfed0000 - 00000000bfee2000 (ACPI data)
BIOS-e820: 00000000bfee2000 - 00000000bfeee000 (ACPI NVS)
BIOS-e820: 00000000bfeee000 - 00000000c0000000 (reserved)
BIOS-e820: 00000000fec00000 - 00000000fec03000 (reserved)
BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
BIOS-e820: 00000000fff80000 - 0000000100000000 (reserved)
BIOS-e820: 0000000100000000 - 0000000140000000 (usable)
Entering add_active_range(0, 0, 157) 0 entries of 3200 used
Entering add_active_range(0, 256, 786128) 1 entries of 3200 used
Entering add_active_range(0, 1048576, 1310720) 2 entries of 3200 used
end_pfn_map = 1310720
DMI present.
ACPI: RSDP 000F8010, 0024 (r2 PTLTD )
ACPI: XSDT BFEDBF8C, 0064 (r1 BRCM Anaheim 6040000 PTL 2000001)
ACPI: FACP BFEDC064, 00F4 (r3 BRCM EXPLOSN 6040000 MSFT 2000001)
ACPI Warning (tbfadt-0442): Optional field "Pm2ControlBlock" has zero
address or length: 0000000000000000/C [20070126]
ACPI: DSDT BFEDC158, 4777 (r2 AMD Anaheim 6040000 MSFT 2000002)
ACPI: FACS BFEEDFC0, 0040
ACPI: TCPA BFEE08CF, 0032 (r1 BRCM Anaheim 6040000 PTL 20000001)
ACPI: SRAT BFEE0901, 0128 (r1 AMD HAMMER 6040000 AMD 1)
ACPI: SSDT BFEE0A29, 143C (r1 AMD POWERNOW 6040000 AMD 1)
ACPI: HPET BFEE1E65, 0038 (r1 BRCM Anaheim 6040000 BRCM 2000001)
ACPI: SSDT BFEE1E9D, 0049 (r1 BRCM PRT0 6040000 BRCM 2000001)
ACPI: SPCR BFEE1EE6, 0050 (r1 PTLTD $UCRTBL$ 6040000 PTL 1)
ACPI: APIC BFEE1F36, 00CA (r1 BRCM Anaheim 6040000 PTL 2000001)
SRAT: PXM 0 -> APIC 0 -> Node 0
SRAT: PXM 0 -> APIC 1 -> Node 0
SRAT: PXM 0 -> APIC 2 -> Node 0
SRAT: PXM 0 -> APIC 3 -> Node 0
SRAT: PXM 1 -> APIC 4 -> Node 1
SRAT: PXM 1 -> APIC 5 -> Node 1
SRAT: PXM 1 -> APIC 6 -> Node 1
SRAT: PXM 1 -> APIC 7 -> Node 1
SRAT: Node 0 PXM 0 0-a0000
Entering add_active_range(0, 0, 157) 0 entries of 3200 used
SRAT: Node 0 PXM 0 0-c0000000
Entering add_active_range(0, 0, 157) 1 entries of 3200 used
Entering add_active_range(0, 256, 786128) 1 entries of 3200 used
SRAT: Node 0 PXM 0 0-140000000
Entering add_active_range(0, 0, 157) 2 entries of 3200 used
Entering add_active_range(0, 256, 786128) 2 entries of 3200 used
Entering add_active_range(0, 1048576, 1310720) 2 entries of 3200 used
NUMA: Using 63 for the hash shift.
Bootmem setup node 0 0000000000000000-0000000140000000
NODE_DATA [000000000000e000 - 0000000000014fff]
bootmap [0000000000015000


(If I try to suspend to disk, I get a rather nasty oops; goes away
with mem=3G. Related?)

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


2008-03-05 09:37:27

by Pavel Machek

Subject: Re: 2.6.25-rc3: 34TB vmalloc total -- overflow in /proc/meminfo?

Hi!

> (If I try to suspend to disk, I get a rather nasty oops; goes away
> with mem=3G. Related?)

The nasty oops is in swsusp_save+0x298 .

rdx=0xffff810000001008
rdi=0xffff81000c001008
rdi=0x0000000131613000

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2008-03-05 09:39:41

by Pavel Machek

Subject: Re: 2.6.25-rc3: 34TB vmalloc total -- overflow in /proc/meminfo?

Hi!

> > (If I try to suspend to disk, I get a rather nasty oops; goes away
> > with mem=3G. Related?)
>
> The nasty oops is in swsusp_save+0x298 .
>
> rdx=0xffff810000001008
> rdi=0xffff81000c001008
> rdi=0x0000000131613000

Rafael, this seems to be similar to some problem you were trying to
solve... something with numa... I could not find it in
bugzilla.kernel.org... do you remember details by chance?
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2008-03-05 19:49:38

by Christian Kujau

Subject: Re: 2.6.25-rc3: 34TB vmalloc total -- overflow in /proc/meminfo?

On Wed, 5 Mar 2008, Pavel Machek wrote:
> CommitLimit: 4132360 kB
> Committed_AS: 27684 kB
> VmallocTotal: 34359738367 kB
> VmallocUsed: 18112 kB
> VmallocChunk: 34359720115 kB

out of curiosity: yesterday I saw a box[0] with ~4 TB Committed_AS:

CommitLimit: 3085152 kB
Committed_AS: 4281048084 kB
VmallocTotal: 118776 kB
VmallocUsed: 13772 kB
VmallocChunk: 103880 kB

Since it's a rather old kernel (2.6.19.2), I just want to know: could this
be related to what you've seen, or is this completely different (and
Committed_AS is just this high because some st00pid app has allocated this
much memory but never freed it)?

Thanks,
Christian.

[0] amd64, 32bit kernel, 32bit userland, 4GB RAM
--
BOFH excuse #101:

Collapsed Backbone

2008-03-05 21:12:48

by Hugh Dickins

Subject: Re: 2.6.25-rc3: 34TB vmalloc total -- overflow in /proc/meminfo?

On Wed, 5 Mar 2008, Christian Kujau wrote:
> On Wed, 5 Mar 2008, Pavel Machek wrote:
> > CommitLimit: 4132360 kB
> > Committed_AS: 27684 kB
> > VmallocTotal: 34359738367 kB
> > VmallocUsed: 18112 kB
> > VmallocChunk: 34359720115 kB

I don't see what Pavel's issue is with this: it's simply a fact that
with a 64-bit kernel, we've lots of virtual address space to spare
for vmalloc. What would be surprising is for VmallocUsed to get up
as high as that.

>
> out of curiosity: yesterday I've seen a box[0] with ~4 TB Committed_AS:
>
> CommitLimit: 3085152 kB
> Committed_AS: 4281048084 kB
> VmallocTotal: 118776 kB
> VmallocUsed: 13772 kB
> VmallocChunk: 103880 kB
>
> Since it's a rather old kernel (2.6.19.2), I just want to know: could this be
> related to what you've seen or this completely different

Completely different and much more interesting.

> (and Committed_AS is
> just this high because some st00pid app has allocated this much memory but not
> freed again)?

Unlikely. Offhand I'm not quite sure that's impossible, but it's far
more likely that we've a kernel bug and vm_committed_space has wrapped
negative.

Ancient as your kernel is, I don't notice anything in the ChangeLogs
since then to say we've fixed a bug of that kind since 2.6.19.
Any idea how to reproduce this? Are you using HugePages at all?
(It's particularly easy for us to get into a muddle over them,
though historically I think mremap has proved most difficult for
Committed_AS accounting).

Thanks,
Hugh

>
> Thanks,
> Christian.
>
> [0] amd64, 32bit kernel, 32bit userland, 4GB RAM

2008-03-05 21:21:38

by Pavel Machek

Subject: Re: 2.6.25-rc3: 34TB vmalloc total -- overflow in /proc/meminfo?

Hi!

> > > CommitLimit: 4132360 kB
> > > Committed_AS: 27684 kB
> > > VmallocTotal: 34359738367 kB
> > > VmallocUsed: 18112 kB
> > > VmallocChunk: 34359720115 kB
>
> I don't see what Pavel's issue is with this: it's simply a fact that
> with a 64-bit kernel, we've lots of virtual address space to spare
> for vmalloc. What would be surprising is for VmallocUsed to get up
> as high as that.

Hmm... ok, I see. I thought "clearly this overflowed somewhere", and I
was wrong; it is the expected result.

Still... what is 34TB of vmalloc space good for when we can only ever
allocate 4GB (because that is how much physical memory we have)? To
prevent fragmentation?

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2008-03-05 21:37:56

by Hugh Dickins

Subject: Re: 2.6.25-rc3: 34TB vmalloc total -- overflow in /proc/meminfo?

On Wed, 5 Mar 2008, Pavel Machek wrote:
> > > > CommitLimit: 4132360 kB
> > > > Committed_AS: 27684 kB
> > > > VmallocTotal: 34359738367 kB
> > > > VmallocUsed: 18112 kB
> > > > VmallocChunk: 34359720115 kB
> >
> > I don't see what Pavel's issue is with this: it's simply a fact that
> > with a 64-bit kernel, we've lots of virtual address space to spare
> > for vmalloc. What would be surprising is for VmallocUsed to get up
> > as high as that.
>
> Hmm... ok, I see, I thought "clearly this overflowed somewhere", and I

The (mis)alignment does make it look that way,
but no, it's not an overflow in this case.

> was wrong, it is expected result.
>
> Still.... what is 34TB of vmalloc space good for when we can only ever
> allocate 4GB (because that is how much physical memory we have?)? To
> prevent fragmentation?

Well, what else would you want to use that space for? If there were
a compelling reason to tune it according to how much physical memory
you have (and you're right, that we want a good surplus of address
space so as to avoid silly limitations by fragmentation), I guess
that could have been done. But why bother if there's no reason?

It's a hard life, there's just too much room to spare in 64-bit ;)

Hugh

2008-03-05 22:11:46

by Christian Kujau

Subject: Re: 2.6.25-rc3: 34TB vmalloc total -- overflow in /proc/meminfo?

On Wed, 5 Mar 2008, Hugh Dickins wrote:
> I don't see what Pavel's issue is with this: it's simply a fact that
> with a 64-bit kernel, we've lots of virtual address space to spare
> for vmalloc. What would be surprising is for VmallocUsed to get up
> as high as that.

OK, thanks for the clarification.

> Completely different and much more interesting.

Well, if it's "interesting"...here are some more details from the box:

http://nerdbynature.de/bits/2.6.19.2/

> Unlikely. Offhand I'm not quite sure that's impossible, but it's far
> more likely that we've a kernel bug and vm_committed_space has wrapped
> negative.

Huh. When I first saw this I thought "kernel bug" too, but after reading
the documentation for Committed_AS I thought it was just userspace related...

> Ancient as your kernel is, I don't notice anything in the ChangeLogs
> since then to say we've fixed a bug of that kind since 2.6.19.
> Any idea how to reproduce this?

Well, the box is running fine and since it's a production machine I don't
intend to reboot the box very often. And since it's really an old kernel
(for lkml discussion, that is) I don't intend to debug this one further.
I really was only curious if this was userspace related (some app
overcommitting) or some kernel weirdness.

> Are you using HugePages at all?

I have:

# CONFIG_HUGETLBFS is not set
# CONFIG_HUGETLB_PAGE is not set

...was this what you meant?

Thanks,
Christian.
--
BOFH excuse #340:

Well fix that in the next (upgrade, update, patch release, service pack).

2008-03-05 22:14:23

by Rafael J. Wysocki

Subject: Re: 2.6.25-rc3: 34TB vmalloc total -- overflow in /proc/meminfo?

On Wednesday, 5 of March 2008, Pavel Machek wrote:
> Hi!
>
> > > (If I try to suspend to disk, I get a rather nasty oops; goes away
> > > with mem=3G. Related?)
> >
> > The nasty oops is in swsusp_save+0x298 .
> >
> > rdx=0xffff810000001008
> > rdi=0xffff81000c001008
> > rdi=0x0000000131613000
>
> Rafael, this seems to be similar to some problem you were trying to
> solve... something with numa... I could not find it in
> bugzilla.kernel.org... do you remember details by chance?

http://bugzilla.kernel.org/show_bug.cgi?id=9966

[Just have a look at the list of regressions from 2.6.24. ;-)]

In fact, I didn't even try to solve it myself, but asked some knowledgeable
people (CCed) for advice. No one responded, unfortunately ...

Thanks,
Rafael

2008-03-05 22:32:04

by Andi Kleen

Subject: Re: 2.6.25-rc3: 34TB vmalloc total -- overflow in /proc/meminfo?


Just commenting on the subject. The 34TB are not an over/underflow. x86-64
simply has that much address space reserved for vmalloc. It doesn't mean,
of course, that that much could actually be allocated in real memory.

-Andi

2008-03-05 23:23:06

by Hugh Dickins

Subject: Re: 2.6.25-rc3: 34TB vmalloc total -- overflow in /proc/meminfo?

On Wed, 5 Mar 2008, Christian Kujau wrote:
>
> Well, if it's "interesting"...here are some more details from the box:
>
> http://nerdbynature.de/bits/2.6.19.2/

Thanks for putting that together; but I didn't find any clues.

>
> > Unlikely. Offhand I'm not quite sure that's impossible, but it's far
> > more likely that we've a kernel bug and vm_committed_space has wrapped
> > negative.
>
> Huh. When I first saw this I thought "kernel bug" too, but then read the
> documentation to Committed_AS I thought it's just userspace related...

It's pretty sure to be userspace related i.e. kernel bug triggered by
particular userspace usage; and understandably, the bits you put there
don't tell me much about what userspace has been up to. (And don't
worry, I'm not expecting you to tell me more! I can't think of
anything useful to ask about it - it's not a question of what mix
of apps you have running there, it's a matter of what system calls,
especially mmaps, they make - I'm not expecting traces of that.)

>
> > Ancient as your kernel is, I don't notice anything in the ChangeLogs
> > since then to say we've fixed a bug of that kind since 2.6.19.
> > Any idea how to reproduce this?
>
> Well, the box is running fine and since it's a production machine I don't
> intend to reboot the box very often.

Absolutely. You mentioned it because it's useful for us to know there's
such an issue about, to keep our eyes open: thank you for doing so.
No need for you to go any further out of your way on it.

> And since it's really an old kernel (for
> lkml discussion, that is) I don't intend to debug this one further. I really
> was only curious if this was userspace related (some app overcommitting) or
> some kernel weirdness.
>
> > Are you using HugePages at all?
>
> I have:
>
> # CONFIG_HUGETLBFS is not set
> # CONFIG_HUGETLB_PAGE is not set
>
> ...was this, what you meant?

Right, I was forgetting that the various "HugePage" lines of /proc/meminfo
don't even appear when those are configured off, so my question was more
obscure than I'd intended. You're not using HugePages at all, so we
can rule out that line of inquiry - that's helpful, thanks.

I'll keep my eyes open.

Hugh

2008-03-06 11:14:35

by Ingo Molnar

Subject: Re: 2.6.25-rc3: 34TB vmalloc total -- overflow in /proc/meminfo?


* Andi Kleen <[email protected]> wrote:

> Just commenting on the subject. The 34TB are not an over/underflow.
> x86-64 simply has so much address space reserved for vmalloc. It
> doesn't mean of course that that much could be actually allocated in
> real memory.

btw., the exact amount of available vmalloc space on 64-bit x86 is 32 TB
(32768 GB), or 0x0000200000000000 in hex. (this is still only 0.0002% of
the complete 64-bit address space [25% of the 128 TB 64-bit kernel
address space], so we've got plenty of room)
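As a quick sanity check of those figures against Pavel's meminfo (the
VMALLOC_START/VMALLOC_END boundary values below are assumed from the
x86_64 memory-map documentation of that era):

```python
# The 34359738367 kB VmallocTotal is simply the size of the vmalloc window
# in the x86_64 virtual memory map (boundary values assumed from the
# 2.6.2x-era Documentation/x86_64/mm.txt).
VMALLOC_START = 0xffffc20000000000
VMALLOC_END   = 0xffffe1ffffffffff

span = VMALLOC_END - VMALLOC_START  # 2**45 - 1 bytes, i.e. just under 32 TB
print(span // 1024)       # 34359738367 -- the VmallocTotal kB figure
print(span == 2**45 - 1)  # True
```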

but the first fundamental limit we'll hit on 64-bit is the 32-bit offset
limit of binaries - this affects kernel modules, the kernel image, etc.
We won't hit that anytime soon, but we'll eventually hit it. (user-space
will be the first, i guess)

Ingo

2008-03-06 11:30:52

by Andi Kleen

Subject: Re: 2.6.25-rc3: 34TB vmalloc total -- overflow in /proc/meminfo?


> but the first fundamental limit we'll hit on 64-bit is the 32-bit offset

31-bit, to be pedantic.

> limit of binaries - this affects kernel modules, the kernel image, etc.

If that ever happens, just -fPIC mode would need to be supported, plus a
proper PLT for the references between modules and the kernel. It would
complicate the module loader slightly, but not too much.

> We won't hit that anytime soon, but we'll eventually hit it. (user-space
> will be the first, i guess)

I recently submitted a patch to fix the 2GB limit for user space
binaries (missing O_LARGEFILE). I think it made it into .25.

Newer gcc/binutils support the large code model, so you could actually
try to generate binaries that big :-) e.g. some of the RTL-to-C compilers
seem to generate huge code, so it might actually be useful.

Also of course you can always split the executable into ~2GB shared libraries.

-Andi

2008-03-06 12:00:14

by Pavel Machek

Subject: Re: 2.6.25-rc3: 34TB vmalloc total -- overflow in /proc/meminfo?

Hi!
> > > > (If I try to suspend to disk, I get a rather nasty oops; goes away
> > > > with mem=3G. Related?)
> > >
> > > The nasty oops is in swsusp_save+0x298 .
> > >
> > > rdx=0xffff810000001008
> > > rdi=0xffff81000c001008
> > > rdi=0x0000000131613000
> >
> > Rafael, this seems to be similar to some problem you were trying to
> > solve... something with numa... I could not find it in
> > bugzilla.kernel.org... do you remember details by chance?
>
> http://bugzilla.kernel.org/show_bug.cgi?id=9966
>
> [Just have a look at the list of regressions from 2.6.24. ;-)]

I searched that one; that's how I discovered those two
duplicates... unfortunately, my problem is different.

> In fact, I didn't even try to solve it myself, but asked some knowledgeable
> people (CCed) for advice. No one responded, unfortunately ...

Pavel

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2008-03-06 21:07:26

by Ingo Molnar

Subject: Re: 2.6.25-rc3: 34TB vmalloc total -- overflow in /proc/meminfo?


* Andi Kleen <[email protected]> wrote:

> > but the first fundamental limit we'll hit on 64-bit is the 32-bit
> > offset
>
> 31-bit, to be pedantic.

yeah.

> [...]
>
> Newer gcc/binutils support the large code model so you could actually
> try to generate binaries that big :-) e.g. some of the rtl-to-C
> compilers seem to generate huge code so it might be actually useful.

The largest kernel image i've had so far was slightly above 40MB, so at
least in the kernel we are not there yet ;-)

Do you have any experience with how much of a size difference there is
when binaries are built for the large code model? I'd expect something in
the neighborhood of 5% for an image with a structure similar to the
kernel's, but maybe it's more.

> Also of course you can always split the executable into ~2GB shared
> libraries.

... which brings back happy memories of DOS extenders ;-)

Ingo