2004-04-30 02:05:41

by Art Haas

Subject: Problem with recent changes to fs/dcache.c

Hi.

I run Linux on a SparcStation SS20 in addition to a PC, and found that
none of the 2.6.6-rc kernels would boot. After trying the latest -rc3
kernel and seeing it fail as well, I started debugging. Adding a few
printk() statements pointed to the vfs_caches_init() function in
fs/dcache.c. The 1.69->1.70 changes to this file added a call to
nr_free_pages() and used the result to adjust the global mempages
variable. This change caused the boot failures.

The printk() statements I'd added showed that vfs_caches_init() was
being called with 'mempages' set to 46073. The nr_free_pages() call
returned 127661, so subtracting that value from mempages should have
gone negative; because the variable is unsigned, it wrapped around and
mempages became enormous. Things then got stuck a bit further down, in
the inode_init() call, having seemingly survived the dcache_init() call
only because of values wrapping around.
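
The wraparound described above can be reproduced in isolation. The following is a minimal sketch, not the kernel's code: the helper names are hypothetical, and uint32_t stands in for unsigned long on the assumption that it is 32 bits wide on this machine. It uses only the two values reported in this message.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not taken from fs/dcache.c: models an unguarded
 * "mempages -= reserve" on a 32-bit unsigned value (assumed width of
 * unsigned long on 32-bit sparc).
 */
static uint32_t subtract_unchecked(uint32_t mempages, uint32_t reserve)
{
	/* Unsigned arithmetic wraps modulo 2^32 when reserve > mempages. */
	return mempages - reserve;
}

/*
 * Hypothetical guard: leave mempages untouched when the subtraction
 * would underflow. This is one possible defensive check, not a fix
 * proposed anywhere in this thread.
 */
static uint32_t subtract_checked(uint32_t mempages, uint32_t reserve)
{
	if (reserve >= mempages)
		return mempages;
	return mempages - reserve;
}
```

With the reported values, subtract_unchecked(46073, 127661) yields 4294885708 (that is, 2^32 - 81588), while subtract_checked(46073, 127661) returns 46073 unchanged.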

I commented out the 'mempages -= reserve;' line in the file, and the
boot continued. Unfortunately I then hit a kernel trap when mounting
the hard drives, so other problems still need to be looked at.

The possibility of nr_free_pages() being larger than mempages looks like
a latent bug that was tripped. If not, then another bug in the Sparc
port may be responsible for the values being passed to these functions.
The memory-management gurus can take a peek and see what they find.

Art Haas
--
Man once surrendering his reason, has no remaining guard against absurdities
the most monstrous, and like a ship without rudder, is the sport of every wind.

-Thomas Jefferson to James Smith, 1822


2004-04-30 03:40:15

by Andrew Morton

Subject: Re: Problem with recent changes to fs/dcache.c

"Art Haas" <[email protected]> wrote:
>
> I run linux on a SparcStation SS20 in addition to a PC, and found that
> none of the 2.6.6-rc kernels would boot. After trying the latest -rc3
> kernel and seeing it fail also, my debugging quest began. Adding a few
> printk() statements pointed the problem to be in fs/dcache.c in the
> vfs_caches_init() function. The 1.69->1.70 changes to this file added in
> a call to nr_free_pages() and used that result to adjust the global
> mempages variable. This change caused the boot failures.
>
> The printk() statements I'd added showed that vfs_caches_init() was
> being called with 'mempages' set to 46073. The nr_free_pages() call
> returned 127661, and this value being subtracted from mempages went
> negative, but the value is unsigned, so mempages became enormous. Things
> ended up getting stuck in the inode_init() call down a bit, having
> seemingly survived the dcache_init() call only because of values
> wrapping around.
>
> I commented out the 'mempages -= reserve;' line in the file, and the
> boot continued along. Unfortunately I encounter a kernel trap when
> mounting the hard drives, so there are other problems still needing to
> be looked at.
>
> The possibility of nr_free_pages() being larger than mempages looks like
> a silent bug that was tripped. If not, then another bug in the Sparc
> port may be responsible for values being used in these functions. The
> memory-management gurus can take a peek and see what they find.

Yes, something's bust in the sparc port's calculation of num_physpages.
Clearly it should be larger than nr_free_pages().

2004-04-30 20:48:02

by Art Haas

Subject: Re: Problem with recent changes to fs/dcache.c

On Thu, Apr 29, 2004 at 08:39:01PM -0700, Andrew Morton wrote:
> "Art Haas" <[email protected]> wrote:
> >
> > [ ... snip boot problem report on Sparc with 2.6.6-rc ... ]
> >
> > The possibility of nr_free_pages() being larger than mempages looks like
> > a silent bug that was tripped. If not, then another bug in the Sparc
> > port may be responsible for values being used in these functions. The
> > memory-management gurus can take a peek and see what they find.
>
> Yes, something's bust in the sparc port's calculation of num_physpages.
> Clearly it should be larger than nr_free_pages().

I'm still trying to debug this, so I've cloned a 2.6.5 tree and added
some printk() calls to see what it reports. Notice that 'num_physpages'
is 45829, so the value of 46073 I was getting in the 2.6.6-rc3 bootup
is very similar, which suggests that the problem _might_ be with the
nr_free_pages() call, which leads down into the mmzone code. Here's the
top of the 'dmesg' output:

.......
Boot time fixup v1.6. 4/Mar/98 Jakub Jelinek ([email protected]).
Patching kernel for srmmu[TI Viking/MXCC]/iommu
319MB HIGHMEM available.
On node 0 totalpages: 130409
DMA zone: 48666 pages, LIFO batch:11
Normal zone: 0 pages, LIFO batch:1
HighMem zone: 81743 pages, LIFO batch:16
Power off control detected.
Built 1 zonelists
Kernel command line: root=/dev/sda1
PID hash table entries: 2048 (order 11: 16384 bytes)
Console: colour dummy device 80x25
calling mem_init()
Memory: 509676k available (1352k kernel code, 312k data, 116k init,
326972k high mem) [f0000000,1ff4f000]
num_physpages: 45829
Calibrating delay loop... 59.80 BogoMIPS
calling fork_init(45829)
calling vfs_caches_init(45829)
vfs_caches_init(): 45829 mempages
...

Could one or two of the VM gurus mail me offlist with some suggestions
for debugging this? I'm still more than a bit lost wandering through the
code trying to find where the various values are set and which
functions set them.

Art Haas
--
Man once surrendering his reason, has no remaining guard against absurdities
the most monstrous, and like a ship without rudder, is the sport of every wind.

-Thomas Jefferson to James Smith, 1822

2004-04-30 21:39:33

by Andrew Morton

Subject: Re: Problem with recent changes to fs/dcache.c

"Art Haas" <[email protected]> wrote:
>
> I'm still trying to debug this, so I've cloned a 2.6.5 tree and added
> some printk() bits to see what it reported. Here's the top of the
> 'dmesg' output. Notice the 'num_physpages' is 45829, so the value I was
> getting in the 2.6.6-rc3 bootup of 46073 is very similar, which suggests
> that the problem _might_ be with the nr_free_pages() call - which leads
> down into the mmzone code. Here's the dmesg output:
>
> .......
> Boot time fixup v1.6. 4/Mar/98 Jakub Jelinek ([email protected]).
> Patching kernel for srmmu[TI Viking/MXCC]/iommu
> 319MB HIGHMEM available.
> On node 0 totalpages: 130409
> DMA zone: 48666 pages, LIFO batch:11
> Normal zone: 0 pages, LIFO batch:1
> HighMem zone: 81743 pages, LIFO batch:16

130409 pages.

> Power off control detected.
> Built 1 zonelists
> Kernel command line: root=/dev/sda1
> PID hash table entries: 2048 (order 11: 16384 bytes)
> Console: colour dummy device 80x25
> calling mem_init()
> Memory: 509676k available (1352k kernel code, 312k data, 116k init,
> 326972k high mem) [f0000000,1ff4f000]
> num_physpages: 45829

That's wrong.

I do think that num_physpages is ripe for removal - we have a number of
ways of calculating much the same thing in generic code, and probably all
users could be changed to use something else anyway.

But short-term we're stuck with it, and there's a bug somewhere in
arch/sparc/'s calculation of this number.
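
The mismatch can be checked from the quoted dmesg figures alone. The sketch below is only arithmetic on those numbers, under one assumption: a 4 KiB page size. The log is at least consistent with that, since 326972k of high memory divided by 4 gives exactly the 81743 pages of the HighMem zone. The helper names are hypothetical and nothing here is kernel code.

```c
#include <stdint.h>

#define PAGE_KB 4u	/* assumption: 4 KiB pages */

/* Convert a size in KiB, as printed on the "Memory:" line, to pages. */
static uint32_t kb_to_pages(uint32_t kb)
{
	return kb / PAGE_KB;
}

/* Sum the per-zone page counts printed under "On node 0 totalpages". */
static uint32_t zone_total(uint32_t dma, uint32_t normal, uint32_t highmem)
{
	return dma + normal + highmem;
}

/*
 * Sanity check implied by the discussion: the count of physical pages
 * should not be smaller than the count of available pages.
 * Returns 1 if the pair is plausible, 0 otherwise.
 */
static int physpages_plausible(uint32_t num_physpages, uint32_t avail_pages)
{
	return num_physpages >= avail_pages;
}
```

With the values from the log, zone_total(48666, 0, 81743) is 130409, matching the reported totalpages, and kb_to_pages(509676) is 127419. Both are far above the reported num_physpages of 45829, so physpages_plausible(45829, 127419) returns 0.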