2003-11-11 17:21:14

by Erik Jacobson

Subject: 2.6 /proc/interrupts fails on systems with many CPUs

Howdy.

On systems with lots of processors (512 for example), catting /proc/interrupts
fails with a "not enough memory" error.

This was observed in 2.6.0-test8

I tracked this down to this in proc_misc.c:

static int interrupts_open(struct inode *inode, struct file *file)
{
	unsigned size = 4096 * (1 + num_online_cpus() / 8);
	char *buf = kmalloc(size, GFP_KERNEL);

The kmalloc fails here.
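For 512 online CPUs that works out to 4096 * (1 + 512 / 8) = 266240 bytes,
a contiguous 260K buffer, well past the 128K that kmalloc() will hand out.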

I'm looking for suggestions on how to fix this. I came up with one fix
that seems to work OK for ia64. I have attached it to this message.
I'm looking for advice on what should be proposed for the real fix.

Thanks!

--
Erik Jacobson - Linux System Software - Silicon Graphics - Eagan, Minnesota


Attachments:
foo.bar (3.18 kB)

2003-11-11 17:36:55

by Martin J. Bligh

Subject: Re: 2.6 /proc/interrupts fails on systems with many CPUs

> On systems with lots of processors (512 for example), catting /proc/interrupts
> fails with a "not enough memory" error.
>
> This was observed in 2.6.0-test8
>
> I tracked this down to this in proc_misc.c:
>
> static int interrupts_open(struct inode *inode, struct file *file)
> {
> unsigned size = 4096 * (1 + num_online_cpus() / 8);
> char *buf = kmalloc(size, GFP_KERNEL);
>
> The kmalloc fails here.
>
> I'm looking for suggestions on how to fix this. I came up with one fix
> that seems to work OK for ia64. I have attached it to this message.
> I'm looking for advice on what should be proposed for the real fix.

I think it'd make more sense to only use vmalloc when it's explicitly
too big for kmalloc - or simply switch on num_online_cpus > 100 or
whatever a sensible cutoff is (ie nobody but you would ever see this ;-))
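
ie something like this (untested, just to illustrate - the release path
would then need the same size check to pick kfree()/vfree()):

	if (size <= 128 * 1024)		/* what kmalloc() can actually give */
		buf = kmalloc(size, GFP_KERNEL);
	else
		buf = vmalloc(size);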

M.

2003-11-11 17:33:48

by Randy.Dunlap

Subject: Re: 2.6 /proc/interrupts fails on systems with many CPUs

On Tue, 11 Nov 2003 11:21:01 -0600 Erik Jacobson <[email protected]> wrote:

| Howdy.
|
| On systems with lots of processors (512 for example), catting /proc/interrupts
| fails with a "not enough memory" error.
|
| This was observed in 2.6.0-test8
|
| I tracked this down to this in proc_misc.c:
|
| static int interrupts_open(struct inode *inode, struct file *file)
| {
| unsigned size = 4096 * (1 + num_online_cpus() / 8);
| char *buf = kmalloc(size, GFP_KERNEL);
|
| The kmalloc fails here.
|
| I'm looking for suggestions on how to fix this. I came up with one fix
| that seems to work OK for ia64. I have attached it to this message.
| I'm looking for advice on what should be proposed for the real fix.

An alternative is to limit 'size' to a maximum of 128 KB (or whatever
the max. kmalloc() on ia64 is) and continue to use kmalloc().
Does that work for you?

Another alternative is to convert show_interrupts to use a seq_file
iterator.

--
~Randy
MOTD: Always include version info.

2003-11-11 17:51:39

by Robert Love

Subject: Re: 2.6 /proc/interrupts fails on systems with many CPUs

On Tue, 2003-11-11 at 12:21, Erik Jacobson wrote:

> static int interrupts_open(struct inode *inode, struct file *file)
> {
> unsigned size = 4096 * (1 + num_online_cpus() / 8);
> char *buf = kmalloc(size, GFP_KERNEL);
>
> The kmalloc fails here.

Ew.

We should probably use seq_file here, although vmalloc() should not
hurt.

Why __vmalloc() over vmalloc(), though? Eh, I do not even know what
__vmalloc() is?? ;)

Robert Love


2003-11-11 18:17:57

by Linus Torvalds

Subject: Re: 2.6 /proc/interrupts fails on systems with many CPUs


On Tue, 11 Nov 2003, Erik Jacobson wrote:
>
> I'm looking for suggestions on how to fix this. I came up with one fix
> that seems to work OK for ia64. I have attached it to this message.
> I'm looking for advice on what should be proposed for the real fix.

This is not the real fix.

Allowing people to use up vmalloc() space by opening the /proc files would
be a major DoS attack. Not worth it.

Instead, just make /proc/interrupts use the proper _sequence_ things, so
that instead of trying to print out everything in one go, you have the
"s_next()" thing to print them out one at a time. The seqfile interfaces
will then do the right thing with blocking/caching, and you only need a
single page.

Al - do we have some good documentation of how to use the seq-file
interface?

In the meantime, without documentation, the best place to look is just at
other examples. One such example would be the kernel/kallsyms.c case: see
how it does s_start/s_show/s_next/s_stop (or /proc/slabinfo, or vmstat, or
any number of them).
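
Roughly, the open routine would just do seq_open() with a set of
seq_operations whose iterator walks the irq numbers - an untested sketch
(needs <linux/seq_file.h>; show_one_irq() is a made-up name for whatever
the reworked, still arch-specific per-line printing ends up being):

	static void *int_start(struct seq_file *f, loff_t *pos)
	{
		/* index 0 is the CPU header line, 1..NR_IRQS are the irqs */
		return (*pos <= NR_IRQS) ? pos : NULL;
	}

	static void *int_next(struct seq_file *f, void *v, loff_t *pos)
	{
		(*pos)++;
		return (*pos <= NR_IRQS) ? pos : NULL;
	}

	static void int_stop(struct seq_file *f, void *v)
	{
		/* nothing to tear down */
	}

	static int int_show(struct seq_file *f, void *v)
	{
		/* print the header or a single irq line per call */
		return show_one_irq(f, *(loff_t *)v);
	}

	static struct seq_operations int_seq_ops = {
		.start	= int_start,
		.next	= int_next,
		.stop	= int_stop,
		.show	= int_show,
	};

	static int interrupts_open(struct inode *inode, struct file *file)
	{
		return seq_open(file, &int_seq_ops);
	}

The file_operations then just point at seq_read/seq_lseek/seq_release, and
the single page that seq_file allocates is all the memory that ever gets
used.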

Linus

2003-11-11 18:22:49

by Al Viro

Subject: Re: 2.6 /proc/interrupts fails on systems with many CPUs

On Tue, Nov 11, 2003 at 10:17:47AM -0800, Linus Torvalds wrote:

> Al - do we have some good documentation of how to use the seq-file
> interface?

There was a text on LWN and it's OK as a starting point. I'll try to
put together something compact, but somebody will have to go through
the result and turn it into proper English...

2003-11-11 18:16:12

by Manfred Spraul

Subject: Re: 2.6 /proc/interrupts fails on systems with many CPUs

Erik wrote:

> /* don't ask for more than the kmalloc() max size, currently 128 KB */
> if (size > 128 * 1024)
> size = 128 * 1024;
>- buf = kmalloc(size, GFP_KERNEL);
>+ buf = __vmalloc(size, GFP_KERNEL, PAGE_KERNEL);
>
>
kmalloc needs a contiguous memory block. Thus after a long runtime,
large kmalloc calls can fail due to fragmentation. I'd prefer if you
switch to vmalloc after 2*PAGE_SIZE.

Or switch to a line based seq file iterator, then you wouldn't need the
huge buffer.
--
Manfred

2003-11-11 18:24:57

by Linus Torvalds

Subject: Re: 2.6 /proc/interrupts fails on systems with many CPUs


On Tue, 11 Nov 2003, Martin J. Bligh wrote:
>
> I think it'd make more sense to only use vmalloc when it's explicitly
> too big for kmalloc - or simply switch on num_online_cpus > 100 or
> whatever a sensible cutoff is (ie nobody but you would ever see this ;-))

No, please please please don't do these things.

vmalloc() is NOT SOMETHING YOU SHOULD EVER USE! It's only valid for when
you _need_ a big array, and you don't have any choice. It's slow, and it's
a very restricted resource: it's a global resource that is literally
restricted to a few tens of megabytes. It should be _very_ carefully used.

There are basically no valid new uses of it. There's a few valid legacy
users (I think the file descriptor array), and there are some drivers that
use it (which is crap, but drivers are drivers), and it's _really_ valid
only for modules. Nothing else.

Basically: if you think you need more memory than a kmalloc() can give,
you need to re-organize your data structures. To either not need a big
area, or to be able to allocate it in chunks.

Linus

2003-11-11 18:36:50

by Linus Torvalds

Subject: Re: 2.6 /proc/interrupts fails on systems with many CPUs


On Tue, 11 Nov 2003, Martin J. Bligh wrote:
>
> OK, I was actually trying to avoid the use of vmalloc, instead of the
> unconditional conversion to vmalloc, which is what the original patch did ;-)

Yes, I realize that, but it's a case of the old

"I'm totally faithful to my husband - I never sleep with other men when
he is around"

joke.

Basically, if it's wrong to use, it's wrong to use even occasionally. In
fact, having two different code-paths just makes the code worse.

Yes, I realize that sometimes you have to do it that way, and it might be
the simplest way to fix something. In this case, though, the cost and
fragility of a generic interface is not worth it, since the problem isn't
actually in the generic code at all..

Linus

2003-11-11 18:37:24

by Randy.Dunlap

Subject: Re: 2.6 /proc/interrupts fails on systems with many CPUs

On Tue, 11 Nov 2003 10:17:47 -0800 (PST) Linus Torvalds <[email protected]> wrote:

|
| On Tue, 11 Nov 2003, Erik Jacobson wrote:
| >
| > I'm looking for suggestions on how to fix this. I came up with one fix
| > that seems to work OK for ia64. I have attached it to this message.
| > I'm looking for advice on what should be proposed for the real fix.
|
| This is not the real fix.
|
| Allowing people to use up vmalloc() space by opening the /proc files would
| be a major DoS attack. Not worth it.
|
| Instead, just make /proc/interrupts use the proper _sequence_ things, so
| that instead of trying to print out everything in one go, you have the
| "s_next()" thing to print them out one at a time. The seqfile interfaces
| will then do the right thing with blocking/caching, and you only need a
| single page.
|
| Al - do we have some good documentation of how to use the seq-file
| interface?

See http://lwn.net/Articles/22355/
and http://www.xenotime.net/linux/doc/seq_file_howto.txt

| In the meantime, without documentation, the best place to look is just at
| other examples. One such example would be the kernel/kallsyms.c case: see
| how it does s_start/s_show/s_next/s_stop (or /proc/slabinfo, or vmstat, or
| any number of them).


--
~Randy
MOTD: Always include version info.

2003-11-11 18:32:08

by Martin J. Bligh

Subject: Re: 2.6 /proc/interrupts fails on systems with many CPUs

>> I think it'd make more sense to only use vmalloc when it's explicitly
>> too big for kmalloc - or simply switch on num_online_cpus > 100 or
>> whatever a sensible cutoff is (ie nobody but you would ever see this ;-))
>
> No, please please please don't do these things.
>
> vmalloc() is NOT SOMETHING YOU SHOULD EVER USE! It's only valid for when
> you _need_ a big array, and you don't have any choice. It's slow, and it's
> a very restricted resource: it's a global resource that is literally
> restricted to a few tens of megabytes. It should be _very_ carefully used.
>
> There are basically no valid new uses of it. There's a few valid legacy
> users (I think the file descriptor array), and there are some drivers that
> use it (which is crap, but drivers are drivers), and it's _really_ valid
> only for modules. Nothing else.
>
> Basically: if you think you need more memory than a kmalloc() can give,
> you need to re-organize your data structures. To either not need a big
> area, or to be able to allocate it in chunks.

OK, I was actually trying to avoid the use of vmalloc, instead of the
unconditional conversion to vmalloc, which is what the original patch did ;-)

But you are, of course, correct - in this case, it should be easy to use
the seq_file stuff to do it in smaller chunks, and use a smaller buffer.

M.

2003-11-11 19:24:37

by Anton Blanchard

Subject: Re: 2.6 /proc/interrupts fails on systems with many CPUs


> On systems with lots of processors (512 for example), catting
> /proc/interrupts fails with a "not enough memory" error.

FYI we are seeing this on ppc64 too (fewer cpus but probably more
interrupt sources).

Anton

2003-11-11 19:55:37

by Brown, Len

Subject: Re: 2.6 /proc/interrupts fails on systems with many CPUs

Speaking of /proc/interrupts...

Does anybody else find hw_interrupt_type.typename lacking?

"XT-PIC" doesn't tell us if it is level or edge triggered, high or low
polarity.

"IO-APIC-edge" and "IO-APIC-level" don't tell us if it is high or low
polarity.

Having this info easily available in /proc/interrupts would make
recognizing and diagnosing interrupt configuration issues much easier.

thanks,
-Len


2003-11-11 20:15:28

by Jonathan Corbet

Subject: Re: 2.6 /proc/interrupts fails on systems with many CPUs

> > Al - do we have some good documentation of how to use the seq-file
> > interface?
>
> There was a text on LWN and it's OK as a starting point.

That would be http://lwn.net/Articles/22355/.

If you'd like a version packaged up for Documentation/, say the word and I
can do that.

jon

Jonathan Corbet
Executive editor, LWN.net
[email protected]

2003-11-11 20:19:07

by Anton Blanchard

Subject: Re: 2.6 /proc/interrupts fails on systems with many CPUs


> There are basically no valid new uses of it. There's a few valid legacy
> users (I think the file descriptor array), and there are some drivers that
> use it (which is crap, but drivers are drivers), and it's _really_ valid
> only for modules. Nothing else.

The IPC code is doing ugly things too:

void* ipc_alloc(int size)
{
	void* out;
	if(size > PAGE_SIZE)
		out = vmalloc(size);
	else
		out = kmalloc(size, GFP_KERNEL);
	return out;
}

Anton

2003-11-11 22:16:22

by Martin J. Bligh

Subject: Re: 2.6 /proc/interrupts fails on systems with many CPUs

>> There are basically no valid new uses of it. There's a few valid legacy
>> users (I think the file descriptor array), and there are some drivers that
>> use it (which is crap, but drivers are drivers), and it's _really_ valid
>> only for modules. Nothing else.
>
> The IPC code is doing ugly things too:
>
> void* ipc_alloc(int size)
> {
> void* out;
> if(size > PAGE_SIZE)
> out = vmalloc(size);
> else
> out = kmalloc(size, GFP_KERNEL);
> return out;
> }

That seems particularly .... odd ... as PAGE_SIZE isn't anywhere near the
breakpoint. Worst case (and I know I'll get yelled at for this, but I'll
get another amusing analogy out of Linus ;-)) we should just call kmalloc
and if it fails, then try vmalloc ...
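
ie something like this (untested - and ipc_free() could then no longer
rely on the size check alone to pick kfree() or vfree(), so the caller
would have to remember which allocator actually succeeded):

	void* ipc_alloc(int size)
	{
		void* out;

		out = kmalloc(size, GFP_KERNEL);
		if (!out)
			out = vmalloc(size);
		return out;
	}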

M.

2003-11-11 22:33:25

by Zwane Mwaikambo

Subject: Re: 2.6 /proc/interrupts fails on systems with many CPUs

On Tue, 11 Nov 2003, Martin J. Bligh wrote:

> That seems particularly .... odd ... as PAGE_SIZE isn't anywhere near the
> breakpoint. Worst case (and I know I'll get yelled at for this, but I'll
> get another amusing analogy out of Linus ;-)) we should just call kmalloc
> and if it fails, then try vmalloc ...

You're Cruisin' for a bruisin' ;)

I wonder if it ever does exceed 128k anyway...

static int semctl_main(...)
{
	int nsems = sma->sem_nsems;
	...
	sem_io = ipc_alloc(sizeof(ushort)*nsems);
	...
}
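
(sizeof(ushort) is 2, so a single set would need 64k semaphores to get
past 128k - the default SEMMSL limit is only 250, though it's tunable
via /proc/sys/kernel/sem.)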

2003-11-11 23:37:56

by Erlend Aasland

[permalink] [raw]
Subject: Re: 2.6 /proc/interrupts fails on systems with many CPUs

Hi,

This is a bit OT in this thread but...
Doesn't the information in /proc/interrupts really belong somewhere in sysfs?

Regards
Erlend Aasland

2003-11-12 02:09:55

by Martin J. Bligh

Subject: Re: 2.6 /proc/interrupts fails on systems with many CPUs

> This is a bit OT in this thread but...
> Doesn't the information in /proc/interrupts really belong somewhere in sysfs?

Only if you want to open 10 billion files to get the data every time.

M.

2004-01-27 04:12:57

by Roland Dreier

Subject: radeonfb problems with 2.6.2-rc2

I just decided to give radeonfb a try with a 2.6 kernel (2.6.2-rc2 to
be specific). I have a Radeon 9000 Pro with two 1280x1024 LCDs
connected, one to the DVI connector and one to the CRT connector.
When it boots, I get a glimpse of the penguin logo for a moment, and
then the screen goes blank and stays blank. If I start the Radeon
XFree86 server then both screens come back to life and work fine.

I get the following in my boot log:

Linux version 2.6.2-rc2 (root@gold) (gcc version 2.95.4 20011002 (Debian prerelease)) #1 Mon Jan 26 11:41:46 PST 2004

[...]

radeonfb_pci_register BEGIN
radeonfb: ref_clk=2700, ref_div=12, xclk=27500 from BIOS
radeonfb: probed DDR SGRAM 65536k videoram
radeon_get_moninfo: bios 4 scratch = a00000a
radeonfb: panel ID string: ?h??
radeonfb: detected DFP panel size from BIOS: 1x0
radeonfb: detected DFP panel size from registers: 1280x1024
radeonfb: ATI Radeon 9000 If DDR SGRAM 64 MB
radeonfb: DVI port DFP monitor connected
radeonfb: CRT port CRT monitor connected
radeonfb_pci_register END

[...]

hStart = 680, hEnd = 960, hTotal = 1056
vStart = 482, vEnd = 501, vTotal = 522
h_total_disp = 0x4f0083 hsync_strt_wid = 0xa302a2
v_total_disp = 0x1df0209 vsync_strt_wid = 0x9301e1
post div = 0x8
fb_div = 0x59
ppll_div_3 = 0x30059
ron = 828, roff = 19684
vclk_freq = 2503, per = 703
Console: switching to colour frame buffer device 80x30

I've seen some radeonfb patches floating around, but my impression was
that 2.6.2-rc2 should work. If anyone has something for me to try,
I'd be happy to give it a spin. Otherwise I guess I'll try to figure
out what that debugging output is telling me.

Thanks,
Roland