2001-02-01 07:03:40

by David Ford

[permalink] [raw]
Subject: VM brokenness, possibly related to reiserfs

(Chris, changing JOURNAL_MAX_BATCH from 900 to 100 didn't affect
anything).

Ok, having approached this slightly more intelligently here are [better]
results.

The dumps are large so they are located at http://stuph.org/VM/. Here's
the story. I boot and startx, I load xmms and netscape to eat away
memory. When free buffers/cache falls below 7M the system stalls and
the only recovery is sysrq-E or reboot. At the moment of stall the disk
will grind continuously for about 25 to 30 minutes then go silent. At
this point in time the only recovery is reboot, sysrq-E won't work.

If I move the mouse or type a key within 30 seconds of this incident,
that user input will take about 5 minutes to register. After that
initial minute, nothing more will happen.

Kernel 2.4.1, with reiserfs, devfs, no patches applied.

"klog-X" are basically the same thing but I'm running top, syslogd, and
klogd with -20 priority. I didn't note anything out of the ordinary in
top. These are snapshots where I've managed to murder processes and
restart the problem without rebooting.

In the second instance, I had my finger on the kill button and managed
to kill netscape and recover partially. However the system was heavily
loaded even after the kill.

I have xmms in STOPped state so it's just waiting.

kswapd is taking 12.2% of the CPU according to ps, and kapm-idled is
taking 26.9%. bdflush is taking 2.7%, X 3.5%, all others are nominal.
The system load was hovering at 1.00 for a few minutes then dropped to
zero. However scrolling text in an rxvt is slow enough to watch blocks
move. Running "ps aux" takes nearly one third of a second for total
time. Total number of processes is ~40.

Jan 31 22:31:51 nifty kernel: kapm-idled S CBF77F94 4124 3
1 (L-TLB) 4 2
Jan 31 22:31:51 nifty kernel: Call Trace: [schedule_timeout+115/148]
[process_timeout+0/72] [apm_mainloop+221/256] [apm+668/692]
[kernel_thread+31/56] [kernel_thread+40/56]

Jan 31 22:31:51 nifty kernel: kswapd S CBF75FAC 5704 4
1 (L-TLB) 5 3
Jan 31 22:31:51 nifty kernel: Call Trace: [schedule_timeout+115/148]
[process_timeout+0/72] [interruptible_sleep_on_timeout+66/92]
[kswapd+213/244] [kernel_thread+40/56]

Jan 31 22:31:52 nifty kernel: bdflush S CBF70000 5912 6
1 (L-TLB) 7 5
Jan 31 22:31:52 nifty kernel: Call Trace: [bdflush+206/216]
[kernel_thread+40/56]


In the fourth snapshot, I have put xmms in STOP state again inside the
memory shortage, memory is at 4800 free buffers/cache and 1592 free mem.

As I entered this shortage period I started a 'ps -eo ... > file' to try
and record data there. This is the only disk activity happening. Load
is ~4.00. I have now killed the ps.

Load has dropped significantly and I have tolerable but quite laggy user
input responsiveness now.

Memory is currently 4900/1588 like above. Load is about 2.00 and will
continue dropping if I don't do anything. Any processes I exec which
need to be loaded from disk take several seconds. I.e. 'uptime' takes
about 4 seconds to execute.

Snapshot #5 will be the last one and I will reboot. Once memory is
freed from xmms (back to 150megs free), everything is peachy.


-d

--
There is a natural aristocracy among men. The grounds of this are virtue and talents. Thomas Jefferson
The good thing about standards is that there are so many to choose from. Andrew S. Tanenbaum




2001-02-01 12:39:47

by Ed Tomlinson

[permalink] [raw]
Subject: Re: VM brokenness, possibly related to reiserfs

Hi,

Gather this is with no swap space allocated... And the question is why does
the oom handler not get triggered?

Ed Tomlinson

David Ford wrote:

> (Chris, changing JOURNAL_MAX_BATCH from 900 to 100 didn't affect
> anything).
>
> Ok, having approached this slightly more intelligently here are [better]
> results.
>
> The dumps are large so they are located at http://stuph.org/VM/. Here's
> the story. I boot and startx, I load xmms and netscape to eat away
> memory. When free buffers/cache falls below 7M the system stalls and
> the only recovery is sysrq-E or reboot. At the moment of stall the disk
> will grind continuously for about 25 to 30 minutes then go silent. At
> this point in time the only recovery is reboot, sysrq-E won't work.
>
> If I move the mouse or type a key within 30 seconds of this incident,
> that user input will take about 5 minutes to register. After that
> initial minute, nothing more will happen.
>
> Kernel 2.4.1, with reiserfs, devfs, no patches applied.
>
> "klog-X" are basically the same thing but I'm running top, syslogd, and
> klogd with -20 priority. I didn't note anything out of the ordinary in
> top. These are snapshots where I've managed to murder processes and
> restart the problem without rebooting.
>
> In the second instance, I had my finger on the kill button and managed
> to kill netscape and recover partially. However the system was heavily
> loaded even after the kill.
>
> I have xmms in STOPped state so it's just waiting.
>
> kswapd is taking 12.2% of the CPU according to ps, and kapm-idled is
> taking 26.9%. bdflush is taking 2.7%, X 3.5%, all others are nominal.
> The system load was hovering at 1.00 for a few minutes then dropped to
> zero. However scrolling text in an rxvt is slow enough to watch blocks
> move. Running "ps aux" takes nearly one third of a second for total
> time. Total number of processes is ~40.
>
> Jan 31 22:31:51 nifty kernel: kapm-idled S CBF77F94 4124 3
> 1 (L-TLB) 4 2
> Jan 31 22:31:51 nifty kernel: Call Trace: [schedule_timeout+115/148]
> [process_timeout+0/72] [apm_mainloop+221/256] [apm+668/692]
> [kernel_thread+31/56] [kernel_thread+40/56]
>
> Jan 31 22:31:51 nifty kernel: kswapd S CBF75FAC 5704 4
> 1 (L-TLB) 5 3
> Jan 31 22:31:51 nifty kernel: Call Trace: [schedule_timeout+115/148]
> [process_timeout+0/72] [interruptible_sleep_on_timeout+66/92]
> [kswapd+213/244] [kernel_thread+40/56]
>
> Jan 31 22:31:52 nifty kernel: bdflush S CBF70000 5912 6
> 1 (L-TLB) 7 5
> Jan 31 22:31:52 nifty kernel: Call Trace: [bdflush+206/216]
> [kernel_thread+40/56]
>
>
> In the fourth snapshot, I have put xmms in STOP state again inside the
> memory shortage, memory is at 4800 free buffers/cache and 1592 free mem.
>
> As I entered this shortage period I started a 'ps -eo ... > file' to try
> and record data there. This is the only disk activity happening. Load
> is ~4.00. I have now killed the ps.
>
> Load has dropped significantly and I have tolerable but quite laggy user
> input responsiveness now.
>
> Memory is currently 4900/1588 like above. Load is about 2.00 and will
> continue dropping if I don't do anything. Any processes I exec which
> need to be loaded from disk take several seconds. I.e. 'uptime' takes
> about 4 seconds to execute.
>
> Snapshot #5 will be the last one and I will reboot. Once memory is
> freed from xmms (back to 150megs free), everything is peachy.
>
>
> -d
>
> --
> There is a natural aristocracy among men. The grounds of this are virtue
and talents.
> Thomas Jefferson The good thing about standards is that there are so many
to choose
> from. Andrew S. Tanenbaum
>
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> Please read the FAQ at http://www.tux.org/lkml/
>

2001-02-01 12:43:48

by David Ford

[permalink] [raw]
Subject: Re: VM brokenness, possibly related to reiserfs

Correct, the point of the matter is to find stress points. It will do the exact
same thing when it reaches the end of swap. I suspect a relation to reiserfs
fighting for buffers perhaps. This fight occurs a few megs before the OOM
routine trips.

-d

Ed Tomlinson wrote:

> Hi,
>
> Gather this is with no swap space allocated... And the question is why does
> the oom handler not get triggered?
>
> Ed Tomlinson
>
> David Ford wrote:
>
> > (Chris, changing JOURNAL_MAX_BATCH from 900 to 100 didn't affect
> > anything).
> >
> > Ok, having approached this slightly more intelligently here are [better]
> > results.
> >
> > The dumps are large so they are located at http://stuph.org/VM/. Here's
> > the story. I boot and startx, I load xmms and netscape to eat away
> > memory. When free buffers/cache falls below 7M the system stalls and
> > the only recovery is sysrq-E or reboot. At the moment of stall the disk
> > will grind continuously for about 25 to 30 minutes then go silent. At
> > this point in time the only recovery is reboot, sysrq-E won't work.
> >
> > If I move the mouse or type a key within 30 seconds of this incident,
> > that user input will take about 5 minutes to register. After that
> > initial minute, nothing more will happen.
> >
> > Kernel 2.4.1, with reiserfs, devfs, no patches applied.
> >
> > "klog-X" are basically the same thing but I'm running top, syslogd, and
> > klogd with -20 priority. I didn't note anything out of the ordinary in
> > top. These are snapshots where I've managed to murder processes and
> > restart the problem without rebooting.
> >
> > In the second instance, I had my finger on the kill button and managed
> > to kill netscape and recover partially. However the system was heavily
> > loaded even after the kill.
> >
> > I have xmms in STOPped state so it's just waiting.
> >
> > kswapd is taking 12.2% of the CPU according to ps, and kapm-idled is
> > taking 26.9%. bdflush is taking 2.7%, X 3.5%, all others are nominal.
> > The system load was hovering at 1.00 for a few minutes then dropped to
> > zero. However scrolling text in an rxvt is slow enough to watch blocks
> > move. Running "ps aux" takes nearly one third of a second for total
> > time. Total number of processes is ~40.
> >
> > Jan 31 22:31:51 nifty kernel: kapm-idled S CBF77F94 4124 3
> > 1 (L-TLB) 4 2
> > Jan 31 22:31:51 nifty kernel: Call Trace: [schedule_timeout+115/148]
> > [process_timeout+0/72] [apm_mainloop+221/256] [apm+668/692]
> > [kernel_thread+31/56] [kernel_thread+40/56]
> >
> > Jan 31 22:31:51 nifty kernel: kswapd S CBF75FAC 5704 4
> > 1 (L-TLB) 5 3
> > Jan 31 22:31:51 nifty kernel: Call Trace: [schedule_timeout+115/148]
> > [process_timeout+0/72] [interruptible_sleep_on_timeout+66/92]
> > [kswapd+213/244] [kernel_thread+40/56]
> >
> > Jan 31 22:31:52 nifty kernel: bdflush S CBF70000 5912 6
> > 1 (L-TLB) 7 5
> > Jan 31 22:31:52 nifty kernel: Call Trace: [bdflush+206/216]
> > [kernel_thread+40/56]
> >
> >
> > In the fourth snapshot, I have put xmms in STOP state again inside the
> > memory shortage, memory is at 4800 free buffers/cache and 1592 free mem.
> >
> > As I entered this shortage period I started a 'ps -eo ... > file' to try
> > and record data there. This is the only disk activity happening. Load
> > is ~4.00. I have now killed the ps.
> >
> > Load has dropped significantly and I have tolerable but quite laggy user
> > input responsiveness now.
> >
> > Memory is currently 4900/1588 like above. Load is about 2.00 and will
> > continue dropping if I don't do anything. Any processes I exec which
> > need to be loaded from disk take several seconds. I.e. 'uptime' takes
> > about 4 seconds to execute.
> >
> > Snapshot #5 will be the last one and I will reboot. Once memory is
> > freed from xmms (back to 150megs free), everything is peachy.
> >
> >
> > -d
> >
> > --
> > There is a natural aristocracy among men. The grounds of this are virtue
> and talents.
> > Thomas Jefferson The good thing about standards is that there are so many
> to choose
> > from. Andrew S. Tanenbaum
> >
> >
> >
> > -
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to [email protected]
> > Please read the FAQ at http://www.tux.org/lkml/
> >

--
There is a natural aristocracy among men. The grounds of this are virtue and talents. Thomas Jefferson
The good thing about standards is that there are so many to choose from. Andrew S. Tanenbaum



2001-02-01 16:16:28

by Chris Mason

[permalink] [raw]
Subject: Re: VM brokenness, possibly related to reiserfs



On Wednesday, January 31, 2001 11:02:46 PM -0800 David Ford <[email protected]> wrote:

> (Chris, changing JOURNAL_MAX_BATCH from 900 to 100 didn't affect
> anything).
>
> Ok, having approached this slightly more intelligently here are [better]
> results.
>
> The dumps are large so they are located at http://stuph.org/VM/. Here's
> the story.

Sorry, can't seem to resolve stuph.org. What is kreiserfsd doing during when the system is waiting for more ram? With JOURNAL_MAX_BATCH set to 100, kreiserfsd will end up responsible for sending log blocks/metadata to disk and freeing the pinned buffers.

-chris

2001-02-01 16:17:48

by Rik van Riel

[permalink] [raw]
Subject: Re: VM brokenness, possibly related to reiserfs

On Wed, 31 Jan 2001, David Ford wrote:

> The dumps are large so they are located at http://stuph.org/VM/.

I can't seem to resolve this domain ...

> Here's the story. I boot and startx, I load xmms and netscape
> to eat away memory. When free buffers/cache falls below 7M the
> system stalls and the only recovery is sysrq-E or reboot. At
> the moment of stall the disk will grind continuously for about
> 25 to 30 minutes then go silent. At this point in time the only
> recovery is reboot, sysrq-E won't work.

> Kernel 2.4.1, with reiserfs, devfs, no patches applied.

Between 2.4.0 and 2.4.1, a few VM changes were made by
Linus and Marcelo. These are good changes, but they seem
to need a little bit of tuning in related code to get
system behaviour right again.

I'm working on these changes and should have a (few)
rebalancing patches out soon to fix the performance
problem.

About the system hanging completely, I wonder if it goes
away by pressing sysrq-S (sync all disks). If it does,
maybe Reiserfs was blocking all the pages in the inactive
list from being written because one of the active pages
(not a replacement candidate) needed to be written out
first? Or does the Reiserfs ->writepage() function handle
this?

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/ http://distro.conectiva.com.br/

2001-02-01 16:30:40

by Rik van Riel

[permalink] [raw]
Subject: Re: VM brokenness, possibly related to reiserfs

On Thu, 1 Feb 2001, David Ford wrote:

> Correct, the point of the matter is to find stress points. It
> will do the exact same thing when it reaches the end of swap.
> I suspect a relation to reiserfs fighting for buffers perhaps.
> This fight occurs a few megs before the OOM routine trips.

Ah, so this is the problem. I've already heard that the
OOM killer is no longer triggered for some reason, I'll
enter the bug in the Linux-MM bugzilla so I don't forget
it and I'll fix it ASAP.

ObBugzillaURL:
http://www.linux-mm.org/bugzilla.shtml

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/ http://distro.conectiva.com.br/

2001-02-01 16:32:30

by Chris Mason

[permalink] [raw]
Subject: Re: [reiserfs-list] Re: VM brokenness, possibly related to reiserfs



On Thursday, February 01, 2001 02:16:43 PM -0200 Rik van Riel <[email protected]> wrote:

>
> About the system hanging completely, I wonder if it goes
> away by pressing sysrq-S (sync all disks). If it does,
> maybe Reiserfs was blocking all the pages in the inactive
> list from being written because one of the active pages
> (not a replacement candidate) needed to be written out
> first? Or does the Reiserfs ->writepage() function handle
> this?
>

In most cases, the reiserfs writepage func is the same as block_write_full_page. The only difference should come when a packed tail has been mmap'd.

Since JOURNAL_MAX_BATCH was at 100, the log should have only pinned 400k. More blocks could be pinned, but kreiserfsd should be in the process of flushing those.

I've been trying out a few things to send memory pressure down to reiserfs, but they have mostly been based on the code to make fs/buffer.c use writepage for flushing. I should have done something simple first, I'll start on that now.

-chris

2001-02-01 23:06:02

by David Ford

[permalink] [raw]
Subject: Re: [reiserfs-list] Re: VM brokenness, possibly related to reiserfs

My apologies...my internic data isn't updated, http://208.179.0.18/VM/

-d


--
There is a natural aristocracy among men. The grounds of this are virtue and talents. Thomas Jefferson
The good thing about standards is that there are so many to choose from. Andrew S. Tanenbaum



2001-02-01 23:43:11

by David Ford

[permalink] [raw]
Subject: Re: [reiserfs-list] Re: VM brokenness, possibly related to reiserfs

Chris Mason wrote:

> Sorry, can't seem to resolve stuph.org. What is kreiserfsd doing during when the system is waiting for more ram? With JOURNAL_MAX_BATCH set to 100, kreiserfsd will end up responsible for sending log blocks/metadata to disk and freeing the pinned buffers.
>
> -chris

(http://208.179.0.18/VM/)

[schedule_timeout+115/148] [process_timeout+0/72] [interruptible_sleep_on_timeout+66/92] [reiserfs_journal_commit_thread+173/224] [kernel_thread+40/56]

-d

--
There is a natural aristocracy among men. The grounds of this are virtue and talents. Thomas Jefferson
The good thing about standards is that there are so many to choose from. Andrew S. Tanenbaum



2001-02-04 00:45:39

by David Ford

[permalink] [raw]
Subject: Re: [reiserfs-list] Re: VM brokenness, possibly related to reiserfs

Chris Mason wrote:

>
> On Thursday, February 01, 2001 02:16:43 PM -0200 Rik van Riel <[email protected]> wrote:
>
>> About the system hanging completely, I wonder if it goes
>> away by pressing sysrq-S (sync all disks). If it does,
>> maybe Reiserfs was blocking all the pages in the inactive
>> list from being written because one of the active pages
>> (not a replacement candidate) needed to be written out
>> first? Or does the Reiserfs ->writepage() function handle
>> this?
>

In answer to Rik's question, no, sysrq-S doesn't fix it. The sound of
the disk grind changes momentarily then it resumes.

> In most cases, the reiserfs writepage func is the same as block_write_full_page. The only difference should come when a packed tail has been mmap'd.
>
> Since JOURNAL_MAX_BATCH was at 100, the log should have only pinned 400k. More blocks could be pinned, but kreiserfsd should be in the process of flushing those.
>
> I've been trying out a few things to send memory pressure down to reiserfs, but they have mostly been based on the code to make fs/buffer.c use writepage for flushing. I should have done something simple first, I'll start on that now.
>
> -chris


http://stuph.org/VM/ is back up, no thanks to the network solutions.
The files listed there have all the relevant backtraces.

-d