2003-11-26 17:12:21

by Gene Heskett

Subject: amanda vs 2.6

Greetings everybody, scsi folks in particular;

I don't know if I've got a bad, mis-set link that's affecting the
build of amanda, or ???

Preconditions:
tape drive has magazine, magazine ejected and reloaded, but all tapes
still reside in the magazine carrier.

Under a 2.4.22-pre10 boot, a run of amcheck will load the last loaded
slot of the magazine and proceed to search the magazine for the next,
correct tape.

Under a 2.6.0test9 or 10 boot, it loads a tape, but then gets a signal
11 and exits. A re-run from that point, where it does have a loaded
tape from the first run, proceeds 100% normally in searching the
magazine.

Preset the magazine again and run it with gdb, and get this only if
booted to a 2.6 kernel:

[amanda@coyote DailySet1]$ gdb /usr/local/libexec/chg-scsi
GNU gdb Red Hat Linux (5.2.1-4)
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and
you are
welcome to change it and/or distribute copies of it under certain
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for
details.
This GDB was configured as "i386-redhat-linux"...
(gdb) run -info
Starting program: /usr/local/libexec/chg-scsi -info
0 4 1 0

Program exited normally.
(gdb) bt
No stack.
(gdb) run -slot 1
Starting program: /usr/local/libexec/chg-scsi -slot 1

[several minutes elapse]

Program received signal SIGSEGV, Segmentation fault.
0x4011f453 in strchr () from /lib/libc.so.6
(gdb) bt
#0 0x4011f453 in strchr () from /lib/libc.so.6
#1 0x00000001 in ?? ()
#2 0x40034899 in tape_open (filename=0x0, mode=0, mask=2) at
tapeio.c:540
#3 0x40034e4f in tape_rewind (devname=0x0) at tapeio.c:734
#4 0x0804f372 in GenericRewind (DeviceFD=2) at
scsi-changer-driver.c:2583
#5 0x0804d637 in Tape_Ready (fd=2, wait_time=63) at
scsi-changer-driver.c:1459
#6 0x0804b2a2 in main (argc=3, argv=0xbffff5c4) at chg-scsi.c:1581
#7 0x400c154d in __libc_start_main () from /lib/libc.so.6
(gdb) quit
The program is running. Exit anyway? (y or n) y
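
The backtrace above shows tape_open() and tape_rewind() both being handed a null pointer (filename=0x0, devname=0x0), and strchr() on a null pointer will fault. As a rough sketch only, a helper like the one below can pick such arguments out of a pasted gdb backtrace. The names null_pointer_args and FRAME_RE are made up for illustration, and the regex is an assumption that only covers the frame layout seen in this session.

```python
import re

# Matches gdb frame lines such as:
#   #2 0x40034899 in tape_open (filename=0x0, mode=0, mask=2) at tapeio.c:540
# (hypothetical helper; only handles the frame layout shown in this thread)
FRAME_RE = re.compile(r"#(\d+)\s+(?:0x[0-9a-f]+\s+in\s+)?(\S+)\s+\(([^)]*)\)")


def null_pointer_args(backtrace: str):
    """Return (frame number, function, argument) for each argument printed as 0x0."""
    # Mail clients wrap long gdb lines, so collapse all whitespace first.
    text = " ".join(backtrace.split())
    hits = []
    for number, func, args in FRAME_RE.findall(text):
        for arg in args.split(","):
            name, _, value = arg.strip().partition("=")
            if value == "0x0":
                hits.append((int(number), func, name))
    return hits


SAMPLE = """\
#2 0x40034899 in tape_open (filename=0x0, mode=0, mask=2) at
tapeio.c:540
#3 0x40034e4f in tape_rewind (devname=0x0) at tapeio.c:734
#4 0x0804f372 in GenericRewind (DeviceFD=2) at
scsi-changer-driver.c:2583
"""

for frame in null_pointer_args(SAMPLE):
    print(frame)
```

This only flags where a null pointer was passed; it says nothing about why the changer code ended up with no device name under 2.6.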

Additional data: /usr/src/linux-2.6 points at the 2.6.0-test10 src
tree, but /usr/src/linux points at the 2.4.22-pre10 src tree. I have
NDI if this has any effects, good or bad, on the configure and build
of amanda, which is itself the latest 2.4.4p1 snapshot dated
20031120.

And ANAICT, nothing else amanda related is affected. Bug, feature,
misconfiguration? Hints?

--
Cheers, Gene
AMD K6-III@500mhz 320M
Athlon1600XP@1400mhz 512M
99.27% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attornies please note, additions to this message
by Gene Heskett are:
Copyright 2003 by Maurice Eugene Heskett, all rights reserved.


2003-11-26 17:19:33

by William Lee Irwin III

Subject: Re: amanda vs 2.6

On Wed, Nov 26, 2003 at 12:12:10PM -0500, Gene Heskett wrote:
> Greetings everybody, scsi folks in particular;
> I don't know if I've got a bad, miss-set link thats effecting the
> build of amanda, or ???
> Preconditions:
> tape drive has magazine, magazine ejected and reloaded, but all tapes
> still reside in the magazine carrier.
> Under a 2.4.22-pre10 boot, a run of amcheck will load the last loaded
> slot of the magazine and proceed to search the magazine for the next,
> correct tape.
> Under a 2.6.0test9 or 10 boot, it loads a tape, but then gets a signal
> 11 and exits. A re-run from that point, where it does have a loaded
> tape from the first run, proceeds 100% normally in searching the
> magazine.
> Preset the magazine again and run it with gdb, and get this only if
> booted to a 2.6 kernel:

Please retry with:
echo 1 > /proc/sys/vm/overcommit_memory
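
For reference, vm.overcommit_memory is documented as taking three values: 0 (heuristic overcommit, the default), 1 (always overcommit) and 2 (strict accounting). A small sketch for reporting the current setting before and after a test like this one follows. describe_overcommit and OVERCOMMIT_MODES are illustrative names, not part of any kernel or amanda tooling.

```python
from pathlib import Path

# Meanings of vm.overcommit_memory as described in the kernel's sysctl docs.
OVERCOMMIT_MODES = {
    0: "heuristic overcommit (default)",
    1: "always overcommit, never refuse an allocation",
    2: "strict accounting, refuse allocations beyond the commit limit",
}


def describe_overcommit(raw: str) -> str:
    """Turn the raw file contents (e.g. '1\\n') into a readable description."""
    mode = int(raw.strip())
    return f"{mode}: {OVERCOMMIT_MODES.get(mode, 'unknown mode')}"


# Only touch /proc when it is actually there (i.e. on a Linux box).
proc_file = Path("/proc/sys/vm/overcommit_memory")
if proc_file.exists():
    print(describe_overcommit(proc_file.read_text()))
```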


-- wli

2003-11-26 19:15:56

by Gene Heskett

Subject: Re: amanda vs 2.6

On Wednesday 26 November 2003 12:19, William Lee Irwin III wrote:
>echo 1 > /proc/sys/vm/overcommit_memory

Unforch, this seems to have fubared the system, and I will have to
reboot as I cannot (it hangs) do an "su amanda" after executing
this.

--
Cheers, Gene
AMD K6-III@500mhz 320M
Athlon1600XP@1400mhz 512M
99.27% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attornies please note, additions to this message
by Gene Heskett are:
Copyright 2003 by Maurice Eugene Heskett, all rights reserved.

2003-11-26 19:31:08

by William Lee Irwin III

Subject: Re: amanda vs 2.6

On Wednesday 26 November 2003 12:19, William Lee Irwin III wrote:
>> echo 1 > /proc/sys/vm/overcommit_memory

On Wed, Nov 26, 2003 at 02:15:52PM -0500, Gene Heskett wrote:
> Unforch, this seems to have fubared the system, and I will have to
> reboot as I cannot (it hangs) do an "su amanda" after executeing
> this.

Sounds like trouble. Are there any external signs of what's going on?
e.g. is the disk thrashing?


-- wli

2003-11-26 19:43:46

by Gene Heskett

Subject: Re: amanda vs 2.6

On Wednesday 26 November 2003 14:30, William Lee Irwin III wrote:
>On Wednesday 26 November 2003 12:19, William Lee Irwin III wrote:
>>> echo 1 > /proc/sys/vm/overcommit_memory
>
>On Wed, Nov 26, 2003 at 02:15:52PM -0500, Gene Heskett wrote:
>> Unforch, this seems to have fubared the system, and I will have to
>> reboot as I cannot (it hangs) do an "su amanda" after executeing
>> this.

I forgot in case you aren't fam with amanda, much of it will not run
as any user but its own user, in this case amanda. Security etc.

>Sounds like trouble. Are there any external signs of what's going
> on? e.g. is the disk thrashing?

No, it just hangs forever on the su command, never coming back.
Everything else I tried, which wasn't much, seemed to keep on working,
as I sent that message with that hung su process in another shell in
another window. I'm an idiot, normally running as root...

I've rebooted, not knowing if an echo 0 to that variable would fix it
or not. I see after the reboot the default value is 0 now.

--
Cheers, Gene
AMD K6-III@500mhz 320M
Athlon1600XP@1400mhz 512M
99.27% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attornies please note, additions to this message
by Gene Heskett are:
Copyright 2003 by Maurice Eugene Heskett, all rights reserved.

2003-11-26 19:50:57

by William Lee Irwin III

Subject: Re: amanda vs 2.6

On Wednesday 26 November 2003 14:30, William Lee Irwin III wrote:
>> Sounds like trouble. Are there any external signs of what's going
>> on? e.g. is the disk thrashing?

On Wed, Nov 26, 2003 at 02:43:43PM -0500, Gene Heskett wrote:
> No, it just hangs forever on the su command, never coming back.
> everything else I tried, which wasn't much, seemed to keep on working
> as I sent that message with that hung su process in another shell on
> another window. I'm an idiot, normally running as root...
> I've rebooted, not knowing if an echo 0 to that variable would fix it
> or not, I see after the reboot the default value is 0 now.

Okay, then we need to figure out what the hung process was doing.
Can you find its pid and check /proc/$PID/wchan?
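
A rough sketch of the same check done in one go, for anyone who would rather not chase pids by hand: it walks /proc, matches processes by command name from /proc/<pid>/stat, and reads each one's wchan. parse_stat and wchan_by_name are hypothetical helper names, and the proc_root parameter exists only so the function can be pointed at a copy of /proc.

```python
from pathlib import Path


def parse_stat(stat_line: str):
    """Split a /proc/<pid>/stat line into (pid, command, state, parent pid)."""
    # The command sits in parentheses and may contain spaces or ')',
    # so split on the last ')' instead of on whitespace.
    head, _, tail = stat_line.rpartition(")")
    pid, _, comm = head.partition(" (")
    fields = tail.split()
    return int(pid), comm, fields[0], int(fields[1])


def wchan_by_name(name: str, proc_root: str = "/proc"):
    """Return {pid: wchan} for every process whose command name equals `name`."""
    result = {}
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue
        try:
            pid, comm, _state, _ppid = parse_stat((entry / "stat").read_text())
            if comm == name:
                # wchan carries no trailing newline, which is why a plain
                # `cat` leaves the shell prompt glued to the symbol name.
                result[pid] = (entry / "wchan").read_text().strip()
        except (OSError, ValueError, IndexError):
            continue  # process exited, or the file was unreadable or odd
    return result


if Path("/proc").is_dir():
    print(wchan_by_name("su"))
```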


-- wli

2003-11-26 20:05:15

by Linus Torvalds

Subject: Re: amanda vs 2.6



On Wed, 26 Nov 2003, William Lee Irwin III wrote:

> On Wed, Nov 26, 2003 at 02:43:43PM -0500, Gene Heskett wrote:
> > No, it just hangs forever on the su command, never coming back.
> > everything else I tried, which wasn't much, seemed to keep on working
> > as I sent that message with that hung su process in another shell on
> > another window. I'm an idiot, normally running as root...
> > I've rebooted, not knowing if an echo 0 to that variable would fix it
> > or not, I see after the reboot the default value is 0 now.
>
> Okay, then we need to figure out what the hung process was doing.
> Can you find its pid and check /proc/$PID/wchan?

I've seen this before, and I'll bet you 5c (yeah, I'm cheap) that it's
trying to log to syslogd.

And syslogd is stopped for some reason - either a bug, a mistaken SIGSTOP,
or simply because the console has been stopped with a simple ^S.

That won't stop "su" working immediately - programs can still log to
syslogd until the logging socket buffer fills up. Which can be _damn_
frustrating to find (I haven't seen this behaviour lately, but I remember
being perplexed like hell a long time ago).

Linus

2003-11-26 20:08:04

by William Lee Irwin III

Subject: Re: amanda vs 2.6

On Wed, 26 Nov 2003, William Lee Irwin III wrote:
>> Okay, then we need to figure out what the hung process was doing.
>> Can you find its pid and check /proc/$PID/wchan?

On Wed, Nov 26, 2003 at 12:04:56PM -0800, Linus Torvalds wrote:
> I've seen this before, and I'll bet you 5c (yeah, I'm cheap) that it's
> trying to log to syslogd.
> And syslogd is stopped for some reason - either a bug, a mistaken SIGSTOP,
> or simply because the console has been stopped with a simple ^S.
> That won't stop "su" working immediately - programs can still log to
> syslogd until the logging socket buffer fills up. Which can be _damn_
> frsutrating to find (I haven't seen this behaviour lately, but I remember
> being perplexed like hell a long time ago).

That'll do it. Gene, could you check on syslogd too, then?


-- wli

2003-11-26 20:23:37

by Gene Heskett

Subject: Re: amanda vs 2.6

On Wednesday 26 November 2003 14:50, William Lee Irwin III wrote:
>On Wednesday 26 November 2003 14:30, William Lee Irwin III wrote:
>>> Sounds like trouble. Are there any external signs of what's going
>>> on? e.g. is the disk thrashing?
>
>On Wed, Nov 26, 2003 at 02:43:43PM -0500, Gene Heskett wrote:
>> No, it just hangs forever on the su command, never coming back.
>> everything else I tried, which wasn't much, seemed to keep on
>> working as I sent that message with that hung su process in
>> another shell on another window. I'm an idiot, normally running
>> as root... I've rebooted, not knowing if an echo 0 to that
>> variable would fix it or not, I see after the reboot the default
>> value is 0 now.
>
>Okay, then we need to figure out what the hung process was doing.
>Can you find its pid and check /proc/$PID/wchan?
>
Ok, repeat, su is PID 1843, so:
[root@coyote root]# ps -ea|grep su
1843 pts/1 00:00:00 su
[root@coyote root]# cat /proc/1843/wchan
sys_wait4[root@coyote root]#

Unforch, echoing a 0 to that variable doesn't fix it, reboot time
again.

Do you need my .config?

>
>-- wli

--
Cheers, Gene
AMD K6-III@500mhz 320M
Athlon1600XP@1400mhz 512M
99.27% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attornies please note, additions to this message
by Gene Heskett are:
Copyright 2003 by Maurice Eugene Heskett, all rights reserved.

2003-11-26 20:32:40

by William Lee Irwin III

Subject: Re: amanda vs 2.6

On Wednesday 26 November 2003 14:50, William Lee Irwin III wrote:
>> Okay, then we need to figure out what the hung process was doing.
>> Can you find its pid and check /proc/$PID/wchan?

On Wed, Nov 26, 2003 at 03:23:33PM -0500, Gene Heskett wrote:
> Ok, repeat, us is PID 1843, so:
> [root@coyote root]# ps -ea|grep su
> 1843 pts/1 00:00:00 su
> [root@coyote root]# cat /proc/1843/wchan
> sys_wait4[root@coyote root]#
> Unforch, echoing a 0 to that variable doesn't fix it, reboot time
> again.
> Do you need my .config?

su had apparently spawned something and is waiting on it in the
wchan you showed. Could you find the shell it spawned as an amanda
user and syslogd (as per Linus' suggestion) also?


-- wli

2003-11-26 20:40:42

by Gene Heskett

Subject: Re: amanda vs 2.6

On Wednesday 26 November 2003 15:04, Linus Torvalds wrote:
>On Wed, 26 Nov 2003, William Lee Irwin III wrote:
>> On Wed, Nov 26, 2003 at 02:43:43PM -0500, Gene Heskett wrote:
>> > No, it just hangs forever on the su command, never coming back.
>> > everything else I tried, which wasn't much, seemed to keep on
>> > working as I sent that message with that hung su process in
>> > another shell on another window. I'm an idiot, normally running
>> > as root... I've rebooted, not knowing if an echo 0 to that
>> > variable would fix it or not, I see after the reboot the default
>> > value is 0 now.
>>
>> Okay, then we need to figure out what the hung process was doing.
>> Can you find its pid and check /proc/$PID/wchan?
>
>I've seen this before, and I'll bet you 5c (yeah, I'm cheap) that
> it's trying to log to syslogd.
>
>And syslogd is stopped for some reason - either a bug, a mistaken
> SIGSTOP, or simply because the console has been stopped with a
> simple ^S.
>
>That won't stop "su" working immediately - programs can still log to
>syslogd until the logging socket buffer fills up. Which can be
> _damn_ frsutrating to find (I haven't seen this behaviour lately,
> but I remember being perplexed like hell a long time ago).
>
> Linus

Ok, then, this is not what I'm seeing. The last su amanda was done on
an almost virgin shell, having only executed that echo 1 to the
overcommit_memory var in /proc.

I tried it from other shells too, no difference, su is locked, cannot
even respond to a ctrl-c or ctrl-d. But the close button will close
the shell window just fine.

Also, echoing a 0 to that variable doesn't fix it; only a reboot fixes
it. Right now:
[root@coyote root]# ps -ea |grep syslogd
406 ? 00:00:00 syslogd
[root@coyote root]# echo 1 > /proc/sys/vm/overcommit_memory

which didn't kill syslogd:
[root@coyote root]# ps -ea |grep syslogd
406 ? 00:00:00 syslogd

Ok, I'll play this game, as long as I know the rules, I have now done
an "su amanda" in two different shells, without any problems, and a
"cat /proc/sys/vm/overcommit_memory" returns this:
[amanda@coyote root]$ cat /proc/sys/vm/overcommit_memory
1
[amanda@coyote root]$

The two previous boots to the same kernel, 2.6.0-test10 using the
default scheduler, hung the su command and failed at exactly this
same point.

I think I need a new rulebook...



--
Cheers, Gene
AMD K6-III@500mhz 320M
Athlon1600XP@1400mhz 512M
99.27% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attornies please note, additions to this message
by Gene Heskett are:
Copyright 2003 by Maurice Eugene Heskett, all rights reserved.

2003-11-26 20:48:11

by Gene Heskett

Subject: Re: amanda vs 2.6

On Wednesday 26 November 2003 15:32, William Lee Irwin III wrote:
>On Wednesday 26 November 2003 14:50, William Lee Irwin III wrote:
>>> Okay, then we need to figure out what the hung process was doing.
>>> Can you find its pid and check /proc/$PID/wchan?
>
>On Wed, Nov 26, 2003 at 03:23:33PM -0500, Gene Heskett wrote:
>> Ok, repeat, us is PID 1843, so:
>> [root@coyote root]# ps -ea|grep su
>> 1843 pts/1 00:00:00 su
>> [root@coyote root]# cat /proc/1843/wchan
>> sys_wait4[root@coyote root]#
>> Unforch, echoing a 0 to that variable doesn't fix it, reboot time
>> again.
>> Do you need my .config?
>
>su had apparently spawned something and is waiting on it in the
>wchan you showed. Could you find the shell it spawned as an amanda
>user and syslogd (as per Linus' suggestion) also?

I need a bottle of aspirin, no change, but this boot, it's working. Go
figure. FWIW, syslogd is running just fine, or is for this boot
anyway... At this point I don't even have a SWAG about what's going
on.

Besides, that was a shell I typed that into, and I don't believe it
actually spawned the user's shell. No visible indicator that I could
see.

>-- wli

--
Cheers, Gene
AMD K6-III@500mhz 320M
Athlon1600XP@1400mhz 512M
99.27% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attornies please note, additions to this message
by Gene Heskett are:
Copyright 2003 by Maurice Eugene Heskett, all rights reserved.

2003-11-26 20:42:31

by Gene Heskett

Subject: Re: amanda vs 2.6

On Wednesday 26 November 2003 15:07, William Lee Irwin III wrote:
>On Wed, 26 Nov 2003, William Lee Irwin III wrote:
>>> Okay, then we need to figure out what the hung process was doing.
>>> Can you find its pid and check /proc/$PID/wchan?
>
>On Wed, Nov 26, 2003 at 12:04:56PM -0800, Linus Torvalds wrote:
>> I've seen this before, and I'll bet you 5c (yeah, I'm cheap) that
>> it's trying to log to syslogd.
>> And syslogd is stopped for some reason - either a bug, a mistaken
>> SIGSTOP, or simply because the console has been stopped with a
>> simple ^S. That won't stop "su" working immediately - programs can
>> still log to syslogd until the logging socket buffer fills up.
>> Which can be _damn_ frsutrating to find (I haven't seen this
>> behaviour lately, but I remember being perplexed like hell a long
>> time ago).
>
>That'll do it. Gene, could you check on syslogd too, then?
>
See my reply to Linus.
>
>-- wli

--
Cheers, Gene
AMD K6-III@500mhz 320M
Athlon1600XP@1400mhz 512M
99.27% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attornies please note, additions to this message
by Gene Heskett are:
Copyright 2003 by Maurice Eugene Heskett, all rights reserved.

2003-11-26 20:56:54

by Chris Adams

Subject: Re: amanda vs 2.6

Once upon a time, Linus Torvalds <[email protected]> wrote:
>And syslogd is stopped for some reason - either a bug, a mistaken SIGSTOP,
>or simply because the console has been stopped with a simple ^S.

It can also happen if there is a problem with DNS; syslogd tries to do a
DNS lookup to get the hostname to put in the record and can hang on that
if the DNS server is busy, hung, down, unreachable, etc.

_Really_ annoying when you are trying to log in to the DNS server to fix
a problem with DNS!
--
Chris Adams <[email protected]>
Systems and Network Administrator - HiWAAY Internet Services
I don't speak for anybody but myself - that's enough trouble.

2003-11-26 21:35:01

by Diego Calleja García

Subject: Re: amanda vs 2.6

On Wed, 26 Nov 2003 12:04:56 -0800 (PST), Linus Torvalds <[email protected]> wrote:

> I've seen this before, and I'll bet you 5c (yeah, I'm cheap) that it's
> trying to log to syslogd.
>
> And syslogd is stopped for some reason - either a bug, a mistaken SIGSTOP,
> or simply because the console has been stopped with a simple ^S.
>
> That won't stop "su" working immediately - programs can still log to
> syslogd until the logging socket buffer fills up. Which can be _damn_
> frsutrating to find (I haven't seen this behaviour lately, but I remember
> being perplexed like hell a long time ago).

I've seen this too. I could fix it with "sysrq + s". I always thought
it was a bug in syslogd. I haven't seen it in a while...

Diego Calleja

2003-11-27 08:41:29

by Nick Piggin

Subject: Re: amanda vs 2.6



Linus Torvalds wrote:

>
>On Wed, 26 Nov 2003, William Lee Irwin III wrote:
>
>
>>On Wed, Nov 26, 2003 at 02:43:43PM -0500, Gene Heskett wrote:
>>
>>>No, it just hangs forever on the su command, never coming back.
>>>everything else I tried, which wasn't much, seemed to keep on working
>>>as I sent that message with that hung su process in another shell on
>>>another window. I'm an idiot, normally running as root...
>>>I've rebooted, not knowing if an echo 0 to that variable would fix it
>>>or not, I see after the reboot the default value is 0 now.
>>>
>>Okay, then we need to figure out what the hung process was doing.
>>Can you find its pid and check /proc/$PID/wchan?
>>
>
>I've seen this before, and I'll bet you 5c (yeah, I'm cheap) that it's
>trying to log to syslogd.
>
>And syslogd is stopped for some reason - either a bug, a mistaken SIGSTOP,
>or simply because the console has been stopped with a simple ^S.
>
>That won't stop "su" working immediately - programs can still log to
>syslogd until the logging socket buffer fills up. Which can be _damn_
>frsutrating to find (I haven't seen this behaviour lately, but I remember
>being perplexed like hell a long time ago).
>

Same problem here. Been seeing them now and again for quite a while.
I have syslogd and klogd sleeping in do_syslog. cron and login are
sleeping in schedule_timeout. A sysrq+T gets things going again but
unfortunately the interesting state probably wasn't captured. I have
the /proc/*/wchan and sysrq+t trace if anyone is interested.

I'll try any suggestions of what I should look at when I hit it again.

Nick


2003-11-27 10:05:56

by Gene Heskett

Subject: Re: amanda vs 2.6

On Thursday 27 November 2003 03:41, Nick Piggin wrote:
>Linus Torvalds wrote:
>>On Wed, 26 Nov 2003, William Lee Irwin III wrote:
>>>On Wed, Nov 26, 2003 at 02:43:43PM -0500, Gene Heskett wrote:
>>>>No, it just hangs forever on the su command, never coming back.
>>>>everything else I tried, which wasn't much, seemed to keep on
>>>> working as I sent that message with that hung su process in
>>>> another shell on another window. I'm an idiot, normally running
>>>> as root... I've rebooted, not knowing if an echo 0 to that
>>>> variable would fix it or not, I see after the reboot the default
>>>> value is 0 now.
>>>
>>>Okay, then we need to figure out what the hung process was doing.
>>>Can you find its pid and check /proc/$PID/wchan?
>>
>>I've seen this before, and I'll bet you 5c (yeah, I'm cheap) that
>> it's trying to log to syslogd.
>>
>>And syslogd is stopped for some reason - either a bug, a mistaken
>> SIGSTOP, or simply because the console has been stopped with a
>> simple ^S.
>>
>>That won't stop "su" working immediately - programs can still log
>> to syslogd until the logging socket buffer fills up. Which can be
>> _damn_ frsutrating to find (I haven't seen this behaviour lately,
>> but I remember being perplexed like hell a long time ago).
>
>Same problem here. Been seeing them now and again for quite a while
>I have syslogd and klogd sleeping in do_syslog. cron and login are
>sleeping in schedule_timeout. A sysrq+T gets things going again but
>unfortunately the interesting state probably wasn't captured. I have
>the /proc/*/wchan and sysrq+t trace if anyone is interested.
>
>I'll try any suggestions of what I should look at when I hit it
> again.

User experience report, Nick.

Around midnight last night, having left
/proc/sys/vm/overcommit_memory=1, I tried to build 2.6.0-test11.
The machine got plumb spastic, taking nearly 10 minutes to unpack the
tarball and copy the configs etc, and another 5 just to run the last
command in my script, 'make xconfig'. That's my buildit26 script,
which normally runs in maybe 2 minutes plus whatever browsing time I
waste in that xconfig. I'm talking mouse locked up for several
seconds at a time. Using the anticipatory scheduler.

cd'ing into linux-2.6, and editing 1 character in my makeit script
took another 2 minutes, then running the script, about a 12 minute
job as it oversees the fully installed kernel, took about 17 minutes.
With it running, it was quite sluggish vim'ing /boot/grub/grub.conf to
add the new kernel and save it.

I got rebooted with about 2 minutes to spare before amanda was due to
run. The machine is now normal again since the reboot set that back
to 0. I've tried to set it back to zero by hand, but once the
machine turns into an arthritic dog because it's set to 1, then a
reboot seems to be the only recovery.

To me, setting this "overcommit_memory" bit to non-zero seems to
trigger something other than what it was designed to do.

The kde utils kpm and ksysguard also do not show enough cpu usage in
the process list, with the sum totals of both usage columns often
being below 25%. The graphical displays however seem to be ok. Both
of those were of course built while running a 2.4 kernel so I'd
expect to see some mismatch when they are interrogating a 2.6
kernel.

My $0.02, but performance like that would scare a new user right back
to winderz.

Around here, it's Thanksgiving Day, and we traditionally eat way too
much turkey (or something like that :) And then complain about the
weight we've gained of course...

--
Cheers, Gene
AMD K6-III@500mhz 320M
Athlon1600XP@1400mhz 512M
99.27% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attornies please note, additions to this message
by Gene Heskett are:
Copyright 2003 by Maurice Eugene Heskett, all rights reserved.

2003-11-27 13:39:45

by William Lee Irwin III

Subject: Re: amanda vs 2.6

On Thu, Nov 27, 2003 at 05:05:50AM -0500, Gene Heskett wrote:
> My $0.02, but performance like that would scare a new user right back
> to winderz.
> Around here, its thanksgiving day, and we traditionally eat way too
> much turkey (or something like that :) And then complain about the
> weight we've gained of course...

This isn't a performance problem. This is a bug. It vaguely sounds like
a missed wakeup or missing setting of TIF_NEED_RESCHED, but could be a
number of other things too.

(The missing setting of TIF_NEED_RESCHED theory is right if it's
possible to clean up after it by ignoring need_resched() in the
scheduler and always rescheduling.)


-- wli

2003-11-27 17:16:45

by Gene Heskett

Subject: Re: amanda vs 2.6

On Thursday 27 November 2003 08:39, William Lee Irwin III wrote:
>On Thu, Nov 27, 2003 at 05:05:50AM -0500, Gene Heskett wrote:
>> My $0.02, but performance like that would scare a new user right
>> back to winderz.
>> Around here, its thanksgiving day, and we traditionally eat way
>> too much turkey (or something like that :) And then complain
>> about the weight we've gained of course...
>
>This isn't a performance problem. This is a bug. It vaguely sounds
> like a missed wakeup or missing setting of TIF_NEED_RESCHED, but
> could be a number of other things too.
>
>(The missing setting of TIF_NEED_RESCHED theory is right if it's
>possible to clean up after it by ignoring need_resched() in the
>scheduler and always rescheduling.)

Well, running 2.6.0-test11, I just discovered I'm back to being unable
to 'su amanda' again. It worked the first time, but I got rejected
from unpacking the latest amanda-2.4.4p1-20031126.tar.gz due to a
lack of permissions, so I exited, chowned the archive to what it was
supposed to be, but cannot now do another su amanda in order to start
the install of this latest snapshot.

The process just hangs, never coming back to a prompt. I never had
any troubles with that using test9, so I guess it's reboot time
again.

However, IMO this is a major problem, and needs fixed before 2.6.0.

--
Cheers, Gene
AMD K6-III@500mhz 320M
Athlon1600XP@1400mhz 512M
99.27% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attornies please note, additions to this message
by Gene Heskett are:
Copyright 2003 by Maurice Eugene Heskett, all rights reserved.

2003-11-27 17:56:03

by Gene Heskett

Subject: Re: amanda vs 2.6

On Thursday 27 November 2003 12:16, Gene Heskett wrote:
>On Thursday 27 November 2003 08:39, William Lee Irwin III wrote:
>>On Thu, Nov 27, 2003 at 05:05:50AM -0500, Gene Heskett wrote:
>>> My $0.02, but performance like that would scare a new user right
>>> back to winderz.
>>> Around here, its thanksgiving day, and we traditionally eat way
>>> too much turkey (or something like that :) And then complain
>>> about the weight we've gained of course...
>>
>>This isn't a performance problem. This is a bug. It vaguely sounds
>> like a missed wakeup or missing setting of TIF_NEED_RESCHED, but
>> could be a number of other things too.
>>
>>(The missing setting of TIF_NEED_RESCHED theory is right if it's
>>possible to clean up after it by ignoring need_resched() in the
>>scheduler and always rescheduling.)
>
>Well, running 2.6.0-test11, I just discovered I'm back to being
> unable to 'su amanda' again. It worked the first time, but I got
> rejected frorm unpacking the lastest amanda-2.4.4p1-20031126.tar.gz
> due to a lack of permissions, so I exited, chowned the archive to
> what it was supposed to be, but cannot now do another su amanda in
> order to start the install of this latest snapshot.
>
>The process just hangs, never comeing back to a prompt. I never had
>any troubles with that useing test9, so I guess its reboot time
>again.
>
>However, IMO this is a major problem, and needs fixed before 2.6.0.

Rebooted to 2.6.0-test10, deadline scheduler now, and have managed to
do an 'su amanda' at least twice without any hangs.

Three times now, no problems. 4 times, exited the last one with a
ctrl-d instead of an exit string, and now the 5th time is hung. Is
ctrl-d no longer a valid shell exit option? Finding the su PID, and
catting /proc/PID/wchan returns this just as it did yesterday:

[root@coyote root]# ps -ea |grep su
26658 pts/1 00:00:00 su
[root@coyote root]# cat /proc/26658/wchan
sys_wait4[root@coyote root]#

Comment on schedulers, deadline seems to leave me with the snappiest
machine response, with cfq a close second. The default anticipatory
just doesn't have the right 'feel' to it.

Also, setiathome only did 3 units yesterday, and it normally does 4 to
5. With the overcommit_memory non-zeroed, the machine was an
arthritic, stuttering as it barked, spastic dog.

Or, any cat could have caught that mouse...

--
Cheers, Gene
AMD K6-III@500mhz 320M
Athlon1600XP@1400mhz 512M
99.27% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attornies please note, additions to this message
by Gene Heskett are:
Copyright 2003 by Maurice Eugene Heskett, all rights reserved.

2003-11-27 18:05:18

by Gene Heskett

Subject: Re: amanda vs 2.6

On Thursday 27 November 2003 12:55, Gene Heskett wrote:
>On Thursday 27 November 2003 12:16, Gene Heskett wrote:
>>On Thursday 27 November 2003 08:39, William Lee Irwin III wrote:
>>>On Thu, Nov 27, 2003 at 05:05:50AM -0500, Gene Heskett wrote:
>>>> My $0.02, but performance like that would scare a new user right
>>>> back to winderz.
>>>> Around here, its thanksgiving day, and we traditionally eat way
>>>> too much turkey (or something like that :) And then complain
>>>> about the weight we've gained of course...
>>>
>>>This isn't a performance problem. This is a bug. It vaguely sounds
>>> like a missed wakeup or missing setting of TIF_NEED_RESCHED, but
>>> could be a number of other things too.
>>>
>>>(The missing setting of TIF_NEED_RESCHED theory is right if it's
>>>possible to clean up after it by ignoring need_resched() in the
>>>scheduler and always rescheduling.)
>>
>>Well, running 2.6.0-test11, I just discovered I'm back to being
>> unable to 'su amanda' again. It worked the first time, but I got
>> rejected frorm unpacking the lastest
>> amanda-2.4.4p1-20031126.tar.gz due to a lack of permissions, so I
>> exited, chowned the archive to what it was supposed to be, but
>> cannot now do another su amanda in order to start the install of
>> this latest snapshot.
>>
>>The process just hangs, never comeing back to a prompt. I never
>> had any troubles with that useing test9, so I guess its reboot
>> time again.
>>
>>However, IMO this is a major problem, and needs fixed before 2.6.0.
>
>Rebooted to 2.6.0-test10, deadline scheduler now, and have managed
> to do an 'su amanda' at least twice without any hangs.
>
>Three times now, no problems. 4 times, exited the last one with a
>ctrl-d instead of an exit string, and now the 5th time is hung. Is
>ctrl-d no longer a valid shell exit option? Finding the su PID, and
>catting /proc/PID/wchan returns this just as it did yesterday:
>
>[root@coyote root]# ps -ea |grep su
>26658 pts/1 00:00:00 su
>[root@coyote root]# cat /proc/26658/wchan
>sys_wait4[root@coyote root]#

Then I killed that su process, which got me back my prompt in that
shell. I have since done 4 runs of 'su amanda -c "amcheck
DailySet1"' without any problem. Now can I just plain 'su amanda'?

No, that's hung again. Somebody wanna pass me the excedrin, I've got
headache #947...

>Comment on schedulers, deadline seems to leave me with the snappiest
>machine response, with cfq a close second. The default anticipatory
>just doesn't have the right 'feel' to it.
>
>Also, setiathome only did 3 units yesterday, and it normally does 4
> to 5. With the overcommit_memory non-zeroed, the machine was an
> arthritic, stuttering as it barked, spastic dog.
>
>Or, any cat could have caught that mouse...

--
Cheers, Gene
AMD K6-III@500mhz 320M
Athlon1600XP@1400mhz 512M
99.27% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attornies please note, additions to this message
by Gene Heskett are:
Copyright 2003 by Maurice Eugene Heskett, all rights reserved.

2003-11-29 17:14:40

by Gene Heskett

Subject: Re: amanda vs 2.6

On Thursday 27 November 2003 08:39, William Lee Irwin III wrote:
>On Thu, Nov 27, 2003 at 05:05:50AM -0500, Gene Heskett wrote:
>> My $0.02, but performance like that would scare a new user right
>> back to winderz.
>> Around here, its thanksgiving day, and we traditionally eat way
>> too much turkey (or something like that :) And then complain
>> about the weight we've gained of course...
>
>This isn't a performance problem. This is a bug. It vaguely sounds
> like a missed wakeup or missing setting of TIF_NEED_RESCHED, but
> could be a number of other things too.
>
>(The missing setting of TIF_NEED_RESCHED theory is right if it's
>possible to clean up after it by ignoring need_resched() in the
>scheduler and always rescheduling.)
>
>
>-- wli

Another data point about this still-unsolved problem:

The number of times I can do an 'su amanda', then exit, and redo it
seems to be somewhat random. In one test I managed to get to the 4th su
before it hung. I turned on the linux normal security stuff in the
.config, rebuilt and rebooted. It had been off previously because
this machine is behind a firewall.

That time I only got one free ride, it hung on the next attempt. So I
left it hung, and put the ksysguard highlight line on the hung su
process, then put ksysguard into the tree mode. The last item in the
branch was an 'stty', which was reported to be 'stopped'. I killed
it. I got my prompt back, as the user amanda, confirmed by a whoami.
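
The same hunt can be sketched without ksysguard: build a table of processes from /proc/<pid>/stat, then walk down from the hung su looking for anything in state 'T' (stopped). read_process_table and stopped_descendants are hypothetical names, and pid 26658 in the usage note is simply the su pid from the earlier hang, not necessarily this one.

```python
from pathlib import Path


def read_process_table(proc_root: str = "/proc"):
    """Map pid -> (command, state, parent pid), read from /proc/<pid>/stat."""
    table = {}
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue
        try:
            line = (entry / "stat").read_text()
            # Command is in parentheses and may contain ')': split on the last one.
            head, _, tail = line.rpartition(")")
            fields = tail.split()
            comm = head.partition(" (")[2]
            table[int(entry.name)] = (comm, fields[0], int(fields[1]))
        except (OSError, ValueError, IndexError):
            continue  # process went away or stat was unreadable
    return table


def stopped_descendants(table, root_pid: int):
    """Return [(pid, command)] for descendants of root_pid in state 'T'."""
    children = {}
    for pid, (_comm, _state, ppid) in table.items():
        children.setdefault(ppid, []).append(pid)
    found = []
    queue = list(children.get(root_pid, []))
    while queue:  # breadth-first walk down the process tree
        pid = queue.pop(0)
        comm, state, _ppid = table[pid]
        if state == "T":
            found.append((pid, comm))
        queue.extend(children.get(pid, []))
    return found
```

On a live system this would be called as stopped_descendants(read_process_table(), 26658), with the pid of whichever su is hung at the time.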

Another data point that might be a clue is that there appears to be no
such restriction if the 'su amanda -c "command"' syntax is used. The
only place that hangs is if I try to do an amcheck after refilling
the tape robot's magazine. Under 2.4.22 it will load the last tape
slot and resume the search thru the magazine for the right tape.
Under 2.6.0-test-whatever, I'm getting a signal 11 from the chg-scsi
script after a long delay, but it does load from the last loaded
tapeslot in the magazine. If I simply up-arrow and repeat, it works
as the first pass did load the tape just fine.

I think these are really two separate problems.

--
Cheers, Gene
AMD K6-III@500mhz 320M
Athlon1600XP@1400mhz 512M
99.27% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attornies please note, additions to this message
by Gene Heskett are:
Copyright 2003 by Maurice Eugene Heskett, all rights reserved.