Hi!
I'm having awful interactivity problems. While a linguistic application
(slm from nltools.sf.net) is running, the machine is unusable. I can still
read text in most, but I can't log in, can't run links, can't... For
minutes.
slm does a lot of computation over a ~250MB dataset, but during the stall
the disk was not active.
Pavel
--
When do you have a heart between your knees?
[Johanka's followup: and *two* hearts?]
Pavel Machek <[email protected]> wrote:
>
> Hi!
>
> I'm having awful interactivity problems. While a linguistic application
> (slm from nltools.sf.net) is running, the machine is unusable. I can still
> read text in most, but I can't log in, can't run links, can't... For
> minutes.
>
> slm does a lot of computation over a ~250MB dataset, but during the stall
> the disk was not active.
Oh Pavel, this is more a whinge than a bug report. You know better ;)
- How much memory does the machine have?
- UP/SMP/preempt?
- What do vmstat and top say?
- Did it happen in 2.5.64? 2.5.63? 2.4.20?
- Does it get better if you renice stuff?
- What steps should others take to reproduce it?
etc, etc, etc.
Hi!
> > I'm having awful interactivity problems. While a linguistic application
> > (slm from nltools.sf.net) is running, the machine is unusable. I can still
> > read text in most, but I can't log in, can't run links, can't... For
> > minutes.
> >
> > slm does a lot of computation over a ~250MB dataset, but during the stall
> > the disk was not active.
>
> Oh Pavel, this is more a whinge than a bug report. You know better
> ;)
Yes, it is a whine; it is *so* horrible that I thought everyone must see
it.
Looks like a scheduler problem to me. The disk LED is not lit.
vmstat (part):
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo    in    cs  us  sy  id  wa
 6  1  15352   3316    408  54164    0    0   256     0  1039   100  99   1   0   0
 6  1  15352   3316    408  54168    0    0     0     0  1033    99 100   0   0   0
 6  1  15352   4212    408  53132    0    0   256     0  1040   109  98   2   0   0
 6  1  15352   4044    408  53388    0    0   256     0  1051   112  99   1   0   0
 6  1  15352   3708    408  53704    0    0   312     0  1045   108  99   1   0   0
 6  1  15352   3484    408  53968    0    0   256     0  1045   130  96   4   0   0
 6  1  15352   4276    392  53012    0    0   444     0  1088   131  95   5   0   0
 6  1  15352   3828    392  53528    0    0   512    72  1034   114 100   0   0   0
 6  1  15352   3828    392  53536    0    0     0     0  1057    90  99   1   0   0
 6  1  15352   3604    392  53792    0    0   256     0  1052    87  88  12   0   0
 6  1  15352   3324    392  54052    0    0   256     0  1040    95  88  12   0   0
 6  1  15352   5340    360  52044    0    0   256     0  1031    90  93   7   0   0
 8  1  15352   5164    360  52304    0    0   256     8  1036    99  98   2   0   0
 4  1  15352   4884    360  52564    0    0   256     0  1031   118  98   2   0   0
 4  1  15352   4660    360  52828    0    0   256     0  1210   215  97   3   0   0
Login takes > 30 seconds at this point.
> - How much memory does the machine have?
256MB
> - UP/SMP/preempt?
UP
> - What do vmstat and top say?
top is hung, too.
> - Did it happen in 2.5.64? 2.5.63? 2.4.20?
Definitely not there with 2.4, and I do not think I have seen it with 2.5.64.
> - Does it get better if you renice stuff?
Will try.
> - What steps should others take to reproduce it?
Quite hard to reproduce; those linguistics tools are not really fun to
use.
> etc, etc, etc.
Pavel
--
Horseback riding is like software...
...vgf orggre jura vgf serr.
On Mon, Mar 24, 2003 at 05:19:36PM -0800, Andrew Morton wrote:
> Pavel Machek <[email protected]> wrote:
> > I'm having awful interactivity problems. While a linguistic application
> > (slm from nltools.sf.net) is running, the machine is unusable. I can still
> > read text in most, but I can't log in, can't run links, can't... For
> > minutes.
> >
> > slm does a lot of computation over a ~250MB dataset, but during the stall
> > the disk was not active.
>
> Oh Pavel, this is more a whinge than a bug report. You know better ;)
If he's seeing what I'm seeing, then I can give my own answers to this. I
get freeze-ups, lost keystrokes and eventual shutdown of the laptop. I can
reproduce it pretty much at will, be it a compilation of a piece of s/w,
the kernel, mozilla loading pages or whatnot.
> - How much memory does the machine have?
256
> - UP/SMP/preempt?
UP with and without preempt presents the same issues.
> - What do vmstat and top say?
My box never survived long enough for me to be able to look.
> - Did it happen in 2.5.64? 2.5.63? 2.4.20?
Never had this under 2.4. Not sure which version now, as I went to 2.5
with this laptop as soon as the freeze came and I could back it up.
As for 2.5, I can definitely say the sluggishness and freezes happen with
2.5.63+ (about to compile .66 and try it), but .63 does not turn the
laptop off for me and .64 takes a wee bit more punishment before dying
on me.
> - Does it get better if you renice stuff?
From memory, I niced (to level 19) a compile of mplayer that was killing
my laptop, and it survived.
> - What steps should others take to reproduce it?
Not too sure. For me it's 'when compiling don't even think of looking at
an input device the wrong way or BOOM'.
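The renice experiment above is normally done with renice(1). For anyone who wants to script it, here is a minimal Python sketch, assuming Python 3.3+ on Linux, where os.setpriority() and os.getpriority() wrap setpriority(2)/getpriority(2). The helper name renice_pid is hypothetical, and the demo renices a throwaway sleeping child rather than a real compile. An unprivileged user can normally only raise a nice value (lower the priority); going below 0, e.g. to -19, normally needs root.

```python
import os
import subprocess
import sys


def renice_pid(pid: int, niceness: int) -> int:
    """Hypothetical helper: set the absolute nice value of `pid`, return it.

    Wraps setpriority(2)/getpriority(2). Without root, the new value can
    normally only be higher (nicer) than the current one.
    """
    os.setpriority(os.PRIO_PROCESS, pid, niceness)
    return os.getpriority(os.PRIO_PROCESS, pid)


if __name__ == "__main__":
    # Stand-in for a long-running compile: a child Python that just sleeps.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
    try:
        print("nice value now:", renice_pid(child.pid, 19))
    finally:
        child.kill()
        child.wait()
```

Setting an absolute value (rather than an increment) mirrors what "niced to level 19" describes.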
--
"Other countries of course, bear the same risk. But there's no doubt his
hatred is mainly directed at us. After all this is the guy who tried to
kill my dad."
- George W. Bush Jr, Leader of the United States Regime
September 26, 2002 (from a political fundraiser in Houston, Texas)
Hi!
> > > I'm having awful interactivity problems. While a linguistic application
> > > (slm from nltools.sf.net) is running, the machine is unusable. I can still
> > > read text in most, but I can't log in, can't run links, can't... For
> > > minutes.
> > >
> > > slm does a lot of computation over a ~250MB dataset, but during the stall
> > > the disk was not active.
> >
> > Oh Pavel, this is more a whinge than a bug report. You know better ;)
>
> If he's seeing what I'm seeing, then I can give my own answers to this. I
> get freeze-ups, lost keystrokes and eventual shutdown of the laptop. I can
> reproduce it pretty much at will, be it a compilation of a piece of s/w,
> the kernel, mozilla loading pages or whatnot.
Actually, this looks like an unrelated problem. I'm not getting lost
keystrokes, and the machine recovers after the linguistics computation
finishes.
Pavel
On Tue, Mar 25, 2003 at 01:00:35AM +0100, Pavel Machek wrote:
> > > Oh Pavel, this is more a whinge than a bug report. You know better ;)
> >
> > If he's seeing what I'm seeing, then I can give my own answers to this. I
> > get freeze-ups, lost keystrokes and eventual shutdown of the laptop. I can
> > reproduce it pretty much at will, be it a compilation of a piece of s/w,
> > the kernel, mozilla loading pages or whatnot.
>
> Actually, this looks like an unrelated problem. I'm not getting lost
> keystrokes, and the machine recovers after the linguistics computation
> finishes.
Maybe it's just more severe for me?
Same here. After its strenuous exercise, the machine goes back to normal.
I also have a similar problem when running setiathome with priority 1.
All _running_ applications remain interactive, but anything requiring disk
access, in particular starting a new program, causes things to block.
I've so far only been in X when this happens: the mouse still moves, remote
applications update their windows fine and there are no lost keystrokes;
the problem appears when I run something locally that requires disk IO.
Renicing setiathome to -19 causes the problem to vanish.
Dragging an xterm around seems to help it recover from a hang too.
> - How much memory does the machine have?
256
> - UP/SMP/preempt?
preempt
>
> - What do vmstat and top say?
3 1 2216 21008 27464 69920 0 0 0 0 1024 241 100 0 0 0
3 1 2216 20176 27464 69920 0 0 0 0 1024 249 100 0 0 0
3 1 2216 20176 27464 69920 0 0 0 0 1024 252 99 1 0 0
3 1 2216 21060 27464 69920 0 0 0 0 1025 236 100 0 0 0
> - Did it happen in 2.5.64? 2.5.63? 2.4.20?
2.5.64
> - Does it get better if you renice stuff?
Yes
> - What steps should others take to reproduce it?
Running setiathome with priority one seems to do it for me.
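Since setiathome is essentially a CPU-bound job at nice 1, a rough stand-in for trying this reproduction without installing it is a busy loop at the same nice level. A minimal Python sketch, assuming Linux and Python 3; cpu_hog is a hypothetical name, and this only recreates the "one CPU hog at nice 1" condition. It is not known to trigger the stall reported here.

```python
import os
import time
from typing import Tuple


def cpu_hog(seconds: float, nice_increment: int = 1) -> Tuple[int, int]:
    """Hypothetical stand-in for setiathome: spin on the CPU at lowered priority.

    os.nice() adds `nice_increment` to this process's current nice value
    (0 -> 1 from a default shell) and returns the new value.
    Returns (loop iterations, nice value used).
    """
    new_nice = os.nice(nice_increment)
    deadline = time.monotonic() + seconds
    iterations = 0
    while time.monotonic() < deadline:
        iterations += 1  # pure computation: no disk or network I/O
    return iterations, new_nice
```

Calling cpu_hog(600) from an interactive python3 session keeps one CPU busy for ten minutes at nice 1; meanwhile, start new programs and watch vmstat 1 in another terminal.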
George
On Mon, Mar 24, 2003 at 06:12:50PM -0800, Andrew Morton wrote:
> > Same here. After its strenuous exercise, the machine goes back to normal.
>
> Have you fiddled with all your power mgmt and bios options?
> Disabled acpi and apm? Tried an absolutely bare-bones kernel?
Here's the fun so far. Standard build with 2.5.66 (config-big attachment):
With the kernel and mplayer compiling, mozilla trying to load 40 or so
web pages at the same time, and a 30MB file being copied over NFS, the
system started severely fscking up. Just moving the mouse around caused
severe jerkiness in the pointer's movement, and eventually I believe it
seriously started to lose data from the mouse, as it began acting as if
I were pressing the mouse buttons in extremely rapid succession while I
was only moving it. top was registering a load of about 4.3, and I was
about to check /proc/interrupts and vmstat when the system hung
completely. I couldn't so much as ping it.
Also, the X screen saver activated a few seconds after some keyboard and
mouse activity, instead of many minutes after the last keyboard or mouse
activity. I also managed to get the following logged from the resulting
erratic mouse behaviour:
Mar 25 17:18:25 theirongiant kernel: psmouse.c: Lost synchronization, throwing 1 bytes away.
Mar 25 17:18:27 theirongiant kernel: psmouse.c: Lost synchronization, throwing 2 bytes away.
Mar 25 17:18:40 theirongiant kernel: psmouse.c: Lost synchronization, throwing 1 bytes away.
Mar 25 17:18:42 theirongiant kernel: psmouse.c: Lost synchronization, throwing 2 bytes away.
Mar 25 17:18:47 theirongiant kernel: psmouse.c: Lost synchronization, throwing 2 bytes away.
Mar 25 17:18:49 theirongiant kernel: psmouse.c: Lost synchronization, throwing 1 bytes away.
I power-cycled the laptop and rebooted into the same kernel. ext3
journal recovery was slow and at one stage seemed to hang on one of
the partitions. It didn't, and moved on to the next, but then it did hang
and I had to turn the laptop off and back on, as ctrl-alt-delete did not
work.
I rebooted into 2.5.63, recovered my journals and booted back into 2.5.66
to type this letter to you. Meanwhile I was recompiling the kernel to get
rid of some options I did not need. While typing, I suffered severe
keystroke loss (I could not even type the kernel's version number), which
led to what looked like a freeze, and then, as soon as I touched the
mouse, the laptop shut down. Throughout all this I was pinging the box on
the local LAN. The pings were erratic, ranging from 0.2ms up to 7200ms,
and followed a pattern: they would hit 0.2ms, then jump to a high value
such as 2400ms or 7200ms, then drop by 1000ms per second until they hit
0.2ms, and then jump again.
While 2.5.63 has severe interactivity problems, it does not crash or lose
keystrokes, and pinging the box during a kernel compile gives
'reasonable' results of 0.2ms to 0.3ms.
Same config but without anything in the Power Management menu selected:
Similar results.
A stripped down, minimal config (config-small attachment):
I couldn't really duplicate the problem. There were some interactivity
issues but that's more or less it.
Using Dave Hansen's report_lost_ticks patch did not yield anything with
the first config (the big one), except for some results during kernel
initialisation (once during input driver init and once waaay early in
the boot due to ACPI; I can post these if need be).
Seeing as the stripped-down, minimal config basically worked, I
created another that was smaller than the initial one but still useful
to me (I don't really want to run 2.5.63 if I can help it, as its
interactivity issues are far worse than 2.5.66's (when it's not killing
my laptop :)). This is the .config-medium attachment.
The results for this config, while compiling the kernel, compiling
mplayer, getting mozilla to load about 40 websites in 3 windows and
constantly copying a 30MB file from the HD to an NFS server, are the same
as for the minimal config (config-small): interactivity issues are
present, but I got bored trying to type stuff and move the mouse around
in order to get the kernel to spew up on me.
I've removed all commented entries from the config files and diffed
the medium config against the big one so that the differences can easily
be seen. This is the config-diff attachment.
I'm going to stop recompiling the kernel and crashing my laptop now
(I've got 8 versions of .66, and god knows how many times I've run tests
:). I'm sure all the power-downs and power-ups aren't good for it, but
if anyone wants me to try a new kernel with new config options to see
if it resumes being nasty, I'll happily do that. I'm just not clueful
enough to pick the right options myself without lots of trial and error,
and I'm kinda tired. :) Also, if you have a patch you wish me to try,
chuck it at me and I'll see if it helps.
I do hope this helps. As usual, if you need more info or any help, please
holler or something. It sucks doing a bug report (especially if you've
put a fair bit of work into it) and getting nothing but silence back. :/
(oh yeah... I hope davej and alan don't mind being CCed - Andrew said
they'd be good at figuring out stuff like this so I put the email addies
into the CC line)
Hello,
I've tried 2.5.66 and have the same interactivity trouble.
Preempt kernel, Crusoe TM5800 processor, 256MB of RAM.
"setiathome -verbose" running at its default priority (1) in an xterm
still seems to trigger it. Applications that were already running seem
to stay interactive. New processes seem to hang: I had ls in an xterm
hang halfway through a listing. After a bit of time and moving windows
around (my theory being that sending expose events to wake up other apps
might help them unhang), the system responds again. Other applications,
such as rox filer, xv, mplayer and gimp, have all hung during load or just
after. Mplayer showed a few frames of video and stopped; once restarted,
though, it plays fine. The same goes for the other applications: after
being loaded once, they tend to work the second time.
Killing seti brings everything back to normal, instantly.
Here is a top listing during a hang:
top - 22:16:18 up 16 min,  6 users,  load average: 2.13, 1.74, 1.08
Tasks:  47 total,   4 running,  42 sleeping,   0 stopped,   1 zombie
Cpu(s):   1.0% user,   1.3% system,  97.7% nice,   0.0% idle,   0.0% IO-wait
Mem:    240208k total,   131052k used,   109156k free,     4624k buffers
Swap:   257032k total,        0k used,   257032k free,    75092k cached

  PID USER     PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
  173 zed      17   1 15904  14m  352 R 98.3  6.0  11:06.11 setiathome
  230 zed      15   0  1856  984 1740 R  0.7  0.4   0:00.38 top
  116 root     15   0 21620  15m 7944 S  0.3  6.6   0:51.21 X
  157 zed      15   0  4072 2292 2860 S  0.3  1.0   0:14.39 ssh
  161 root     15   0  4756 2448 4468 S  0.3  1.0   0:00.80 xterm
    1 root     15   0   480  232  456 S  0.0  0.1   0:05.10 init
    2 root     34  19     0    0    0 R  0.0  0.0   0:00.01 ksoftirqd/0
    3 root      5 -10     0    0    0 S  0.0  0.0   0:00.08 events/0
    4 root     15   0     0    0    0 S  0.0  0.0   0:00.00 kapmd
    5 root     25   0     0    0    0 S  0.0  0.0   0:00.00 pdflush
    6 root     15   0     0    0    0 S  0.0  0.0   0:00.06 pdflush
    7 root     25   0     0    0    0 S  0.0  0.0   0:00.00 kswapd0
Here is a vmstat 1 listing:
(mplayer was launched but not appearing)
3 0 0 99640 4712 83300 0 0 0 0 1060 76 96 4 0 0
3 0 0 99952 4712 83300 0 0 0 0 1001 17 99 1 0 0
3 0 0 99120 4712 83300 0 0 0 0 1063 70 97 3 0 0
3 0 0 100004 4712 83300 0 0 0 0 1001 10 100 0 0 0
5 0 0 99016 4712 83300 0 0 0 0 1082 107 97 3 0 0
3 0 0 99900 4712 83300 0 0 0 0 1253 411 96 4 0 0
3 0 0 99008 4712 83300 0 0 0 0 1139 195 96 4 0 0
3 0 0 99892 4712 83300 0 0 0 0 1008 63 100 0 0 0
(killed seti and mplayer shows itself)
0 0 0 111236 4716 83552 0 0 140 0 1043 523 48 7 9 36
1 0 0 111080 4716 83680 0 0 128 0 1273 1302 28 6 66 0
1 0 0 111088 4716 83680 0 0 0 0 1151 1044 27 5 68 0
0 0 0 110984 4716 83808 0 0 128 0 1127 1113 25 3 72 0
Hope this helps,
George
> Can you please test 2.5.66? Some things were fixed.
>
> Thanks.