2002-04-11 01:23:55

by Randy.Dunlap

[permalink] [raw]
Subject: [patch-2.5.8-pre] swapinfo accounting


In and around 2.5.7 & 2.5.8-pre2, with a highly stressful
VM workload (akpm's bash-shared-mapping and usemem),
causing lots of swap activity, I see some odd swap info
in /proc/meminfo.

Machine is 8-proc with 16 GB of RAM and 16 GB of swap (1 partition
and 7 files, each 2 GB).

Examples of /proc/meminfo (excerpts):

Before swapping:
MemTotal/Free: 16312504 kB/651340 kB
SwapTotal/Free: 16777088 kB/16777088 kB

After swapping:
MemTotal/Free: 16312504 kB/5092 kB
SwapTotal/Free: 16913332 kB/16777088 kB

Notice how SwapTotal grows and SwapFree never changes.
This isn't correct, is it?

It looks to me like mm/swapfile.c::si_swapinfo()
shouldn't be adding nr_to_be_unused to total_swap_pages
or nr_swap_pages for return in val->freeswap and
val->totalswap.

I.e., total_swap_pages is correct for SwapTotal and
nr_swap_pages is correct for SwapFree...except for
bad blocks accounting (?).
(nr_to_be_unused === nr_swap_pages_in_use)
(nr_swap_pages === nr_free_swap_pages)

And the for-j loop to count nr_to_be_unused is overhead...
Maybe, except for bad blocks accounting.

Here are some samples from the end of si_swapinfo():
swapinfo: nr_swap_pages: 3405953, total_swap_pages: 4194272,
nr_to_be_unused: 788319
swapinfo: nr_swap_pages: 3405537, total_swap_pages: 4194272,
nr_to_be_unused: 788735
swapinfo: nr_swap_pages: 3402977, total_swap_pages: 4194272,
nr_to_be_unused: 791295

Anyway, here is the patch that makes it better.
Not perfect, due to possibility of bad blocks, but better
than it was. Comments?


--- linux-258-pre2/mm/swapfile.c.SI Wed Apr 10 17:50:34 2002
+++ linux-258-pre2/mm/swapfile.c Wed Apr 10 18:09:46 2002
@@ -1107,8 +1107,8 @@
}
}
}
- val->freeswap = nr_swap_pages + nr_to_be_unused;
- val->totalswap = total_swap_pages + nr_to_be_unused;
+ val->freeswap = nr_swap_pages;
+ val->totalswap = total_swap_pages;
swap_list_unlock();
}


--
~Randy


2002-04-11 01:44:18

by Mike Fedyk

[permalink] [raw]
Subject: Re: [patch-2.5.8-pre] swapinfo accounting

On Wed, Apr 10, 2002 at 06:20:55PM -0700, Randy.Dunlap wrote:
> Anyway, here is the patch that makes it better.
> Not perfect, due to possibility of bad blocks, but better
> than it was. Comments?
>
>
> --- linux-258-pre2/mm/swapfile.c.SI Wed Apr 10 17:50:34 2002
> +++ linux-258-pre2/mm/swapfile.c Wed Apr 10 18:09:46 2002
> @@ -1107,8 +1107,8 @@
> }
> }
> }
> - val->freeswap = nr_swap_pages + nr_to_be_unused;
> - val->totalswap = total_swap_pages + nr_to_be_unused;
> + val->freeswap = nr_swap_pages;
> + val->totalswap = total_swap_pages;
> swap_list_unlock();
> }

Shouldn't that be s/+/-/ instead? If this is badblocks accounting, it
should probably be subtracted instead of added.

2002-04-11 01:50:48

by Andrew Morton

[permalink] [raw]
Subject: Re: [patch-2.5.8-pre] swapinfo accounting

"Randy.Dunlap" wrote:
>
> It looks to me like mm/swapfile.c::si_swapinfo()
> shouldn't be adding nr_to_be_unused to total_swap_pages
> or nr_swap_pages for return in val->freeswap and
> val->totalswap.

whee, an si_swapinfo() maintainer.

Your function sucks :) I'm spending 15 CPU-seconds
in there during a kernel build. The problem appears
to be that a fix from 2.4 hasn't been propagated
forward.

2.4 has:

if (swap_info[i].flags != SWP_USED)

and 2.5 has:

if (!(swap_info[i].flags & SWP_USED))

and I think the 2.4 version will fix the accounting
problem you're seeing?

(I haven't checked whather it's the _right_ fix, but
it looks like it'll make it go away?)

-

2002-04-11 03:52:42

by Randy.Dunlap

[permalink] [raw]
Subject: Re: [patch-2.5.8-pre] swapinfo accounting

On Wed, 10 Apr 2002, Andrew Morton wrote:

| "Randy.Dunlap" wrote:
| >
| > It looks to me like mm/swapfile.c::si_swapinfo()
| > shouldn't be adding nr_to_be_unused to total_swap_pages
| > or nr_swap_pages for return in val->freeswap and
| > val->totalswap.
|
| whee, an si_swapinfo() maintainer.

whoops. I can't argue for its efficiency, can I?

| Your function sucks :) I'm spending 15 CPU-seconds
| in there during a kernel build. The problem appears
| to be that a fix from 2.4 hasn't been propagated
| forward.

Bah!! :( Where is a version-forwarder for this stuff? 8;)
and will it be dropped...

Your latter 2 sentences aren't related -- right?
On the 15 CPU-seconds in si_swapfile(): yes, I think
there is room for some improvement there.

| 2.4 has:
|
| if (swap_info[i].flags != SWP_USED)
|
| and 2.5 has:
|
| if (!(swap_info[i].flags & SWP_USED))
|
| and I think the 2.4 version will fix the accounting
| problem you're seeing?

OK, I'll try it Thurs.

| (I haven't checked whather it's the _right_ fix, but
| it looks like it'll make it go away?)

We'll see about that.

Thanks-
--
~Randy

2002-04-11 03:57:15

by Randy.Dunlap

[permalink] [raw]
Subject: Re: [patch-2.5.8-pre] swapinfo accounting

On Wed, 10 Apr 2002, Mike Fedyk wrote:

| On Wed, Apr 10, 2002 at 06:20:55PM -0700, Randy.Dunlap wrote:
| > Anyway, here is the patch that makes it better.
| > Not perfect, due to possibility of bad blocks, but better
| > than it was. Comments?
| >
| >
| > --- linux-258-pre2/mm/swapfile.c.SI Wed Apr 10 17:50:34 2002
| > +++ linux-258-pre2/mm/swapfile.c Wed Apr 10 18:09:46 2002
| > @@ -1107,8 +1107,8 @@
| > }
| > }
| > }
| > - val->freeswap = nr_swap_pages + nr_to_be_unused;
| > - val->totalswap = total_swap_pages + nr_to_be_unused;
| > + val->freeswap = nr_swap_pages;
| > + val->totalswap = total_swap_pages;
| > swap_list_unlock();
| > }
|
| Shouldn't that be s/+/-/ instead? If this is badblocks accounting, it
| should probably be subtracted instead of added.

Hi-
This isn't badblocks accounting. Sorry for the
confusion...

[and yes, I should have looked in one of the other
kernel branches that are floating around...]

--
~Randy

2002-04-11 15:13:48

by Hugh Dickins

[permalink] [raw]
Subject: Re: [patch-2.5.8-pre] swapinfo accounting

On Wed, 10 Apr 2002, Andrew Morton wrote:
> "Randy.Dunlap" wrote:
> >
> > It looks to me like mm/swapfile.c::si_swapinfo()
> > shouldn't be adding nr_to_be_unused to total_swap_pages
> > or nr_swap_pages for return in val->freeswap and
> > val->totalswap.

Good observation, thanks Randy, but wrong fix.

> whee, an si_swapinfo() maintainer.

That might be me? I rewrote that code for 2.4.10
(when you called attention to si_swapinfo slowness).

> Your function sucks :) I'm spending 15 CPU-seconds
> in there during a kernel build. The problem appears
> to be that a fix from 2.4 hasn't been propagated
> forward.

Not a new fix needing propagation: it was right in 2.5.0.

> 2.4 has:
>
> if (swap_info[i].flags != SWP_USED)
>
> and 2.5 has:
>
> if (!(swap_info[i].flags & SWP_USED))

It was mistakenly changed to this in 2.5.4, when an
additional SWP_BLOCKDEV flag was added (gone in 2.5.6).

> and I think the 2.4 version will fix the accounting
> problem you're seeing?

Yes, it does.

> (I haven't checked whather it's the _right_ fix, but
> it looks like it'll make it go away?)

It is the right fix; but since that condition was misunderstood,
and some other flag might be added in future, the safer patch
would be this more explicit one (which I'll now send to Linus
with a briefer description).

But I hope nobody backports this to 2.4, where it would be
wrong: 2.5.4 confusingly changed the nature of SWP_WRITEOK.

Hugh

--- 2.5.8-pre3/mm/swapfile.c Mon Mar 11 12:30:56 2002
+++ linux/mm/swapfile.c Thu Apr 11 15:26:51 2002
@@ -1095,7 +1095,8 @@
swap_list_lock();
for (i = 0; i < nr_swapfiles; i++) {
unsigned int j;
- if (!(swap_info[i].flags & SWP_USED))
+ if (!(swap_info[i].flags & SWP_USED) ||
+ (swap_info[i].flags & SWP_WRITEOK))
continue;
for (j = 0; j < swap_info[i].max; ++j) {
switch (swap_info[i].swap_map[j]) {

2002-04-11 17:21:07

by Randy.Dunlap

[permalink] [raw]
Subject: Re: [patch-2.5.8-pre] swapinfo accounting

On Thu, 11 Apr 2002, Hugh Dickins wrote:

| On Wed, 10 Apr 2002, Andrew Morton wrote:
| > "Randy.Dunlap" wrote:
| > >
| > > It looks to me like mm/swapfile.c::si_swapinfo()
| > > shouldn't be adding nr_to_be_unused to total_swap_pages
| > > or nr_swap_pages for return in val->freeswap and
| > > val->totalswap.
|
| Good observation, thanks Randy, but wrong fix.

Thanks for the patch. I confirm that it is giving me
numbers that look sane.

BTW, the patch was for discussion, so it worked.
"a la akpm" :)

~Randy

| > whee, an si_swapinfo() maintainer.
|
| That might be me? I rewrote that code for 2.4.10
| (when you called attention to si_swapinfo slowness).
|
...
|
| It is the right fix; but since that condition was misunderstood,
| and some other flag might be added in future, the safer patch
| would be this more explicit one (which I'll now send to Linus
| with a briefer description).
|
| But I hope nobody backports this to 2.4, where it would be
| wrong: 2.5.4 confusingly changed the nature of SWP_WRITEOK.
|
| Hugh
|
| --- 2.5.8-pre3/mm/swapfile.c Mon Mar 11 12:30:56 2002
| +++ linux/mm/swapfile.c Thu Apr 11 15:26:51 2002
| @@ -1095,7 +1095,8 @@
| swap_list_lock();
| for (i = 0; i < nr_swapfiles; i++) {
| unsigned int j;
| - if (!(swap_info[i].flags & SWP_USED))
| + if (!(swap_info[i].flags & SWP_USED) ||
| + (swap_info[i].flags & SWP_WRITEOK))
| continue;
| for (j = 0; j < swap_info[i].max; ++j) {
| switch (swap_info[i].swap_map[j]) {