2005-02-02 18:24:03

by Richard B. Johnson

Subject: Joe User DOS kills Linux-2.6.10


When I compile and run the following program:

#include <stdio.h>
#include <unistd.h>	/* declares pause() */
int main(int x, char **y)
{
	pause();	/* sleep until a signal arrives */
	return 0;
}
... as:

./xxx `yes`

... the following occurs after about 30 seconds (your mileage
may vary):

Additional sense: Peripheral device write fault
end_request: I/O error, dev sdb, sector 34605780
SCSI error : <0 0 1 0> return code = 0x8000002
Info fld=0x2100101, Deferred sdb: sense key Medium Error
Additional sense: Peripheral device write fault
end_request: I/O error, dev sdb, sector 34603748
SCSI error : <0 0 1 0> return code = 0x8000002
Info fld=0x2100103, Deferred sdb: sense key Medium Error
Additional sense: Peripheral device write fault
end_request: I/O error, dev sdb, sector 34606804
SCSI error : <0 0 1 0> return code = 0x8000002
Info fld=0x213d5cd, Deferred sdb: sense key Medium Error
Additional sense: Peripheral device write fault
end_request: I/O error, dev sdb, sector 33943668
SCSI error : <0 0 1 0> return code = 0x8000002
Info fld=0x213d5ce, Deferred sdb: sense key Medium Error
Additional sense: Peripheral device write fault
end_request: I/O error, dev sdb, sector 33943676
SCSI error : <0 0 1 0> return code = 0x8000002
Info fld=0x213d5cf, Deferred sdb: sense key Medium Error
Additional sense: Peripheral device write fault
end_request: I/O error, dev sdb, sector 33943684
SCSI error : <0 0 1 0> return code = 0x8000002
Info fld=0x213d5d0, Deferred sdb: sense key Medium Error
Additional sense: Peripheral device write fault
end_request: I/O error, dev sdb, sector 33943692
SCSI error : <0 0 1 0> return code = 0x8000002
Info fld=0x2149672, Deferred sdb: sense key Medium Error
Additional sense: Peripheral device write fault
end_request: I/O error, dev sdb, sector 9437375
Buffer I/O error on device sdb1, logical block 1179664
lost page write due to I/O error on sdb1
SCSI error : <0 0 1 0> return code = 0x8000002
Info fld=0x2149673, Deferred sdb: sense key Medium Error
Additional sense: Peripheral device write fault
end_request: I/O error, dev sdb, sector 34903668
SCSI error : <0 0 1 0> return code = 0x8000002
Info fld=0x214967c, Current sdb: sense key Medium Error
Additional sense: Peripheral device write fault
end_request: I/O error, dev sdb, sector 34903676

This device, /dev/sdb1, is one of the mounted file systems.
It is not being accessed. The root filesystem is on
an IDE drive (/proc/mounts):

rootfs / rootfs rw 0 0
/dev/root.old /initrd ext2 rw 0 0
/dev/root / ext3 rw 0 0
/proc /proc proc rw,nodiratime 0 0
/sys /sys sysfs rw 0 0
none /dev/pts devpts rw 0 0
none /dev/shm tmpfs rw 0 0
/dev/sdb1 /home/project ext2 rw 0 0
/dev/sda1 /dos/drive_C msdos rw,nodiratime,fmask=0022,dmask=0022 0 0
/dev/sda5 /dos/drive_D msdos rw,nodiratime,fmask=0022,dmask=0022 0 0
sunrpc /var/lib/nfs/rpc_pipefs rpc_pipefs rw 0 0

This continues until the system is too sick even to be rebooted
from the console. It requires the reset switch.

It looks like the command-line argument is probably overflowing
something in the kernel, resulting in unrelated problems.

Cheers,
Dick Johnson
Penguin : Linux version 2.6.10 on an i686 machine (5537.79 BogoMips).
Notice : All mail here is now cached for review by Dictator Bush.
98.36% of all statistics are fiction.


2005-02-02 18:36:17

by linux-os (Dick Johnson)

Subject: Re: Joe User DOS kills Linux-2.6.10


Additional information:
My swap space is on the same disk, on /dev/sdb2. It appears that
swap is being written beyond the end of the SCSI device, and the
device doesn't like it. Also, on a subsequent reboot, the
signature in the swap area had been destroyed, so swapon
rejected it. I needed to run `mkswap` again.

On Wed, 2 Feb 2005, linux-os wrote:

>
> When I compile and run the following program:
>
> #include <stdio.h>
> int main(int x, char **y)
> {
> pause();
> }
> ... as:
>
> ./xxx `yes`
>
> ... the following occurs after about 30 seconds (your mileage
> may vary):
>
> Additional sense: Peripheral device write fault
> end_request: I/O error, dev sdb, sector 34605780
> SCSI error : <0 0 1 0> return code = 0x8000002
> Info fld=0x2100101, Deferred sdb: sense key Medium Error
> Additional sense: Peripheral device write fault
> end_request: I/O error, dev sdb, sector 34603748
> SCSI error : <0 0 1 0> return code = 0x8000002
> Info fld=0x2100103, Deferred sdb: sense key Medium Error
> Additional sense: Peripheral device write fault
> end_request: I/O error, dev sdb, sector 34606804
> SCSI error : <0 0 1 0> return code = 0x8000002
> Info fld=0x213d5cd, Deferred sdb: sense key Medium Error
> Additional sense: Peripheral device write fault
> end_request: I/O error, dev sdb, sector 33943668
> SCSI error : <0 0 1 0> return code = 0x8000002
> Info fld=0x213d5ce, Deferred sdb: sense key Medium Error
> Additional sense: Peripheral device write fault
> end_request: I/O error, dev sdb, sector 33943676
> SCSI error : <0 0 1 0> return code = 0x8000002
> Info fld=0x213d5cf, Deferred sdb: sense key Medium Error
> Additional sense: Peripheral device write fault
> end_request: I/O error, dev sdb, sector 33943684
> SCSI error : <0 0 1 0> return code = 0x8000002
> Info fld=0x213d5d0, Deferred sdb: sense key Medium Error
> Additional sense: Peripheral device write fault
> end_request: I/O error, dev sdb, sector 33943692
> SCSI error : <0 0 1 0> return code = 0x8000002
> Info fld=0x2149672, Deferred sdb: sense key Medium Error
> Additional sense: Peripheral device write fault
> end_request: I/O error, dev sdb, sector 9437375
> Buffer I/O error on device sdb1, logical block 1179664
> lost page write due to I/O error on sdb1
> SCSI error : <0 0 1 0> return code = 0x8000002
> Info fld=0x2149673, Deferred sdb: sense key Medium Error
> Additional sense: Peripheral device write fault
> end_request: I/O error, dev sdb, sector 34903668
> SCSI error : <0 0 1 0> return code = 0x8000002
> Info fld=0x214967c, Current sdb: sense key Medium Error
> Additional sense: Peripheral device write fault
> end_request: I/O error, dev sdb, sector 34903676
>
> This device, /dev/sdb1 is one of the mounted file-systems.
> It is not being accessed. The root filesystem is on
> an IDE drive (/proc/mounts):
>
> rootfs / rootfs rw 0 0
> /dev/root.old /initrd ext2 rw 0 0
> /dev/root / ext3 rw 0 0
> /proc /proc proc rw,nodiratime 0 0
> /sys /sys sysfs rw 0 0
> none /dev/pts devpts rw 0 0
> none /dev/shm tmpfs rw 0 0
> /dev/sdb1 /home/project ext2 rw 0 0
> /dev/sda1 /dos/drive_C msdos rw,nodiratime,fmask=0022,dmask=0022 0 0
> /dev/sda5 /dos/drive_D msdos rw,nodiratime,fmask=0022,dmask=0022 0 0
> sunrpc /var/lib/nfs/rpc_pipefs rpc_pipefs rw 0 0
>
> This continues until the system is too sick to even be re-booted
> from the console. It requires the reset switch.
>
> It looks like the command-line argument is probably overflowing
> something in the kernel, resulting in non-related problems.
>
> Cheers,
> Dick Johnson
> Penguin : Linux version 2.6.10 on an i686 machine (5537.79 BogoMips).
> Notice : All mail here is now cached for review by Dictator Bush.
> 98.36% of all statistics are fiction.

Cheers,
Dick Johnson
Penguin : Linux version 2.6.10 on an i686 machine (5537.79 BogoMips).
Notice : All mail here is now cached for review by Dictator Bush.
98.36% of all statistics are fiction.

2005-02-02 18:58:23

by Andreas Schwab

Subject: Re: Joe User DOS kills Linux-2.6.10

linux-os <[email protected]> writes:

> When I compile and run the following program:
>
> #include <stdio.h>
> int main(int x, char **y)
> {
> pause();
> }
> ... as:
>
> ./xxx `yes`

This is roughly equivalent to this:

#include <stdlib.h>
int main(void) { while (1) malloc(1); }

Andreas.

--
Andreas Schwab, SuSE Labs, [email protected]
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."

2005-02-02 18:58:23

by Rik van Riel

Subject: Re: Joe User DOS kills Linux-2.6.10

On Wed, 2 Feb 2005, linux-os wrote:

> When I compile and run the following program:

> ./xxx `yes`

It looks like the program itself doesn't matter, since it's
bash that's eating up memory like crazy, until the point where
it is OOM killed.

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
32191 riel 18 0 436m 126m 312 R 45.9 86.9 0:13.37 bash
32222 riel 15 0 3276 148 124 S 39.0 0.1 0:09.79 yes

> Additional sense: Peripheral device write fault
> end_request: I/O error, dev sdb, sector 34605780
> SCSI error : <0 0 1 0> return code = 0x8000002
> Info fld=0x2100101, Deferred sdb: sense key Medium Error

Looks like your SCSI disk has some problems; you may want
to try running 'badblocks' on the swap partition to verify
that. The VM doesn't appear to have a problem with your
test program in my quick runs here.

ObLKML: I was running the test inside Xen, and that seemed
to hold up fine too ;)

--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan

2005-02-03 00:36:03

by Andries Brouwer

Subject: Re: Joe User DOS kills Linux-2.6.10

On Wed, Feb 02, 2005 at 01:23:43PM -0500, linux-os wrote:
>
> When I compile and run the following program:
>
> #include <stdio.h>
> int main(int x, char **y)
> {
> pause();
> }
> ... as:
>
> ./xxx `yes`
>
> ... the following occurs after about 30 seconds (your mileage
> may vary):
>
> Additional sense: Peripheral device write fault
> end_request: I/O error, dev sdb, sector 34605780
> SCSI error : <0 0 1 0> return code = 0x8000002
> Info fld=0x2100101, Deferred sdb: sense key Medium Error
> Additional sense: Peripheral device write fault
> end_request: I/O error, dev sdb, sector 34603748
> SCSI error : <0 0 1 0> return code = 0x8000002
> Info fld=0x2100103, Deferred sdb: sense key Medium Error

When I run "sleep `yes`" under bash, all of swap space is filled,
and then bash says "realloc error ..." and exits.

No kernel problem, no bad bash problem.

If you do not run vm overcommit mode 2 then probably your bash
will be killed by the OOM killer, and if you are unlucky some
other stuff might be killed as well.

Concerning the SCSI errors, looks like you might have disk problems.
Bad blocks in your swap space. Recheck the disk.

Andries

2005-02-03 12:28:53

by linux-os (Dick Johnson)

Subject: Re: Joe User DOS kills Linux-2.6.10

On Thu, 3 Feb 2005, Andries Brouwer wrote:

> On Wed, Feb 02, 2005 at 01:23:43PM -0500, linux-os wrote:
>>
>> When I compile and run the following program:
>>
>> #include <stdio.h>
>> int main(int x, char **y)
>> {
>> pause();
>> }
>> ... as:
>>
>> ./xxx `yes`
>>
>> ... the following occurs after about 30 seconds (your mileage
>> may vary):
>>
>> Additional sense: Peripheral device write fault
>> end_request: I/O error, dev sdb, sector 34605780
>> SCSI error : <0 0 1 0> return code = 0x8000002
>> Info fld=0x2100101, Deferred sdb: sense key Medium Error
>> Additional sense: Peripheral device write fault
>> end_request: I/O error, dev sdb, sector 34603748
>> SCSI error : <0 0 1 0> return code = 0x8000002
>> Info fld=0x2100103, Deferred sdb: sense key Medium Error
>
> When I run "sleep `yes`" under bash, all of swap space is filled,
> and then bash says "realloc error ..." and exits.
>
> No kernel problem, no bad bash problem.
>
> If you do not run vm overcommit mode 2 then probably your bash
> will be killed by the OOM killer, and if you are unlucky some
> other stuff might be killed as well.
>
> Concerning the SCSI errors, looks like you might have disk problems.
> Bad blocks in your swap space. Recheck the disk.
>
> Andries
>

I ran badblocks (all night). There were none. It's a SCSI disk
and it requires chunks of DMA RAM for each write. The machine
just croaks when it gets low on RAM and tries to write to
SCSI swap which requires RAM.


Cheers,
Dick Johnson
Penguin : Linux version 2.6.10 on an i686 machine (5537.79 BogoMips).
Notice : All mail here is now cached for review by Dictator Bush.
98.36% of all statistics are fiction.

2005-02-03 14:45:44

by Andries Brouwer

Subject: Re: Joe User DOS kills Linux-2.6.10

On Thu, Feb 03, 2005 at 07:28:50AM -0500, linux-os wrote:

> I ran badblocks (all night). There were none. It's a SCSI disk
> and it requires chunks of DMA RAM for each write. The machine
> just croaks when it gets low on RAM and tries to write to
> SCSI swap which requires RAM.

In some other post you said that you were writing past the
end of the partition or disk.

If the disk is fine and you have reproducible errors,
then the first things to check are whether your partition table
is correct, whether your swap signature is correct, and whether
the total size of the disk is recognized correctly at boot time.

2005-02-03 14:48:24

by linux-os (Dick Johnson)

Subject: Re: Joe User DOS kills Linux-2.6.10

On Thu, 3 Feb 2005, Andries Brouwer wrote:

> On Thu, Feb 03, 2005 at 07:28:50AM -0500, linux-os wrote:
>
>> I ran badblocks (all night). There were none. It's a SCSI disk
>> and it requires chunks of DMA RAM for each write. The machine
>> just croaks when it gets low on RAM and tries to write to
>> SCSI swap which requires RAM.
>
> In some other post you said that you were writing past the
> end of the partition or disk.
>
> If the disk is fine and you have reproducible errors
> then the first thing to check is whether your partition table
> is correct, whether your swap signature is correct, whether
> the total size of the disk is recognized correctly at boot time.
>

I just executed `mkswap` on both of my swap partitions. The
original swap partitions were created using very early tools.
I will now try to see if I get the same error, but I can't
do it now because I need a "work-break".


Cheers,
Dick Johnson
Penguin : Linux version 2.6.10 on an i686 machine (5537.79 BogoMips).
Notice : All mail here is now cached for review by Dictator Bush.
98.36% of all statistics are fiction.