2008-02-23 11:30:44

by Nikola Ciprich

Subject: arcmsr & areca-1660 - strange behaviour under heavy load

Hi,

I've found a strange problem, either in the arcmsr driver or maybe in
the areca-1660 card itself.
When a system running from a RAID of SAS disks attached to an areca-1660 card
gets under heavy I/O load, it becomes unusable after some time. I can
reproduce this 100% of the time, although it needs quite specific conditions:
it can be reproduced on a 2x quad-core machine, and RAM has to be limited to
~192MB to cause heavy paging.
The only thing needed to trigger the problem is to start a loop doing kernel
compilation using make -j 8, which loads the system heavily because of the
lack of memory. After a few correct compile runs the system gets into a
state where all programs, including the basic ones (ls, cp, ...), start
crashing. dmesg (when it works) doesn't show anything strange.
After a reboot, the system is OK again.
I have tested this on different motherboards, with different CPUs and RAM (all
properly tested with memtest), with two different areca cards and
different drives. I can't reproduce the problem on the same hardware when
using a different RAID card (e.g. adaptec). All test systems were properly
cooled.
I have tried all available areca firmware versions, two different distributions
(oracle linux and centos), and kernels ranging from the distribution ones to the latest Git snapshot.
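
The reproduction loop described above could be sketched roughly as follows. This is only an illustration, not the exact script used: count_good_runs is a hypothetical helper name, and the make invocation in the comment assumes it is run from a kernel source tree on the memory-limited machine.

```sh
# Sketch: repeat a command until it first fails and report how many
# runs succeeded. count_good_runs is a hypothetical helper name.
count_good_runs() {
    n=0
    while "$@" > /dev/null 2>&1; do
        n=$((n + 1))
    done
    echo "$n"
}

# Intended use (assumed), from a kernel source tree:
#   count_good_runs sh -c 'make clean && make -j 8'
```

The printed count gives a rough measure of how long the system survives, which makes it easier to compare kernels, firmware versions and cards.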
Could somebody please give me some hints on how to hunt this problem?
Areca support doesn't seem to be very interested in the problem :-(
Thanks a lot in advance
BR
nik

-------------------------------------
Nikola CIPRICH
LinuxBox.cz, s.r.o.
28. rijna 168, 709 01 Ostrava

tel.: +420 596 603 142
fax: +420 596 621 273
mobil: +420 777 093 799

http://www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: [email protected]
-------------------------------------


2008-02-25 00:14:05

by Andrew Morton

Subject: Re: arcmsr & areca-1660 - strange behaviour under heavy load

On Sat, 23 Feb 2008 12:20:12 +0100 (CET) Nikola Ciprich <[email protected]> wrote:

> Hi,
>
> I've found strange problem either in arcmsr driver, or maybe in
> areca-1660 card...
> When system on SAS discs RAID connected to areca-1660 card
> gets under heavy I/O load, it gets unusable after some time. I can 100% reproduce
> this, although it needs quite specific conditions:
> It can be reproduced on 2x quad core machine, RAM has to be limited to
> ~192MB to cause heavy paging.
> Only thing needed to cause the problem is to start loop doing kernel
> compilation using make -j 8 - this loads the system heavily, because of
> lack of memory. After few correct compile runs the system gets into
> state when all programs including the basic ones (ls, cp, ..) start
> crashing... dmesg (when it works) doesn't say anything strange...
> After reboot, the system is OK again.
> I have tested it on different motherboards, with different CPUs, RAMs(all
> were properly tested with memtest), with two different areca cards and
> different drives. I can't reproduce the problem on same hardware when
> using different RAID card (ie adaptec). All testing systems were properly
> cooled..
> I have tried all available areca firmwares, two different distributions
> (oracle linux, and centos), and kernels ranging from distribution ones, to last GIT snapshot.
> Could somebody please give me some hints on how to hunt this problem?
> Areca support doesn't seem to be very interested in the problem :-(

(cc's added)

Please get the machine into this state of memory exhaustion then take
copies of the output of the following, and send them via reply-to-all to
this email:

- cat /proc/meminfo

- cat /proc/slabinfo

- dmesg -c > /dev/null ; echo m > /proc/sysrq-trigger ; dmesg -c

Thanks.
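
For repeated captures, the three commands above could be wrapped in a small helper along these lines. This is a sketch under assumptions: collect_mem_state and the PROC_ROOT override are hypothetical names introduced here, and the output file names are only a suggestion. Writing to sysrq-trigger and running dmesg -c normally require root.

```sh
# Sketch: capture the three requested diagnostics into one directory.
# collect_mem_state and PROC_ROOT are hypothetical, for illustration only.
collect_mem_state() {
    outdir=$1
    proc=${PROC_ROOT:-/proc}
    mkdir -p "$outdir" || return 1

    cat "$proc/meminfo" > "$outdir/meminfo.txt" || return 1

    # slabinfo may be readable by root only; warn instead of aborting.
    cat "$proc/slabinfo" > "$outdir/slabinfo.txt" 2>/dev/null ||
        echo "warning: could not read $proc/slabinfo" >&2

    # SysRq-m asks the kernel to log its current memory state.
    # Clear the log first so the capture holds only that dump.
    if [ -w "$proc/sysrq-trigger" ]; then
        dmesg -c > /dev/null
        echo m > "$proc/sysrq-trigger"
        dmesg -c > "$outdir/memory.txt"
    fi
    return 0
}

# Intended use (assumed), as root while the machine is in the bad state:
#   collect_mem_state /tmp/arcmsr-dump
```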

2008-02-26 09:36:15

by Nikola Ciprich

Subject: Re: arcmsr & areca-1660 - strange behaviour under heavy load

Hi Andrew,

thanks a lot for the reply. I'm attaching the requested information.
Please let me know if you need any more information or testing;
I'll be glad to help.
BR
nik

On Sun, 24 Feb 2008, Andrew Morton wrote:

>> Areca support doesn't seem to be very interested in the problem :-(
>
> (cc's added)
>
> Please get the machine into this state of memory exhaustion then take
> copies of the output of the following, and send them via reply-to-all to
> this email:
>
> - cat /proc/meminfo
>
> - cat /proc/slabinfo
>
> - dmesg -c > /dev/null ; echo m > /proc/sysrq-trigger ; dmesg -c
>
> Thanks.


Attachments:
meminfo.txt (777.00 B)
slabinfo.txt (15.84 kB)
memory.txt (2.76 kB)

2008-02-26 10:30:47

by NickCheng

Subject: RE: arcmsr & areca-1660 - strange behaviour under heavy load

Hi Nikola,
As I said, we will test this at our site.
Our support team will help you resolve the issue.
Sorry for the inconvenience.


2008-02-26 17:45:19

by Andrew Morton

Subject: Re: arcmsr & areca-1660 - strange behaviour under heavy load

On Tue, 26 Feb 2008 10:35:31 +0100 (CET) Nikola Ciprich <[email protected]> wrote:

> Hi
>
> On Sun, 24 Feb 2008, Andrew Morton wrote:
>
> Hi Andrew,
> thanks a lot for reply, I'm attaching requested information.
> please let me know if You need more information/testing, whatever.
> I'll be glad to help.
> BR
> nik
>
> >> Areca support doesn't seem to be very interested in the problem :-(
> >
> > (cc's added)
> >
> > Please get the machine into this state of memory exhaustion then take
> > copies of the output of the following, and send them via reply-to-all to
> > this email:
> >
> > - cat /proc/meminfo
> >
> > - cat /proc/slabinfo
> >
> > - dmesg -c > /dev/null ; echo m > /proc/sysrq-trigger ; dmesg -c
> >
> > Thanks.

Alas, that all looks OK to me.

You never get any out-of-memory messages, and no oom-killing messages?

Possibly what is happening here is that in this low-memory condition, some
of the driver's internal memory-allocation attempts are failing, and the
driver isn't correctly handling this. This is a rare situation which may
well not have been hit in anyone else's testing.

I expect that the Areca engineers will be able to reproduce this with a
suitably small "mem=" kernel boot option. If not, they could perhaps
investigate the kernel's fault-injection framework, which permits
simulation of page allocation failures.
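
As a rough sketch of the fault-injection route: on a kernel built with CONFIG_FAULT_INJECTION, CONFIG_FAIL_PAGE_ALLOC and CONFIG_FAULT_INJECTION_DEBUG_FS, page-allocation failures can be configured through files in the fail_page_alloc debugfs directory. The helper name, the default path (which assumes debugfs is mounted at /sys/kernel/debug) and the chosen values below are assumptions for illustration only.

```sh
# Sketch: make a fraction of page allocations fail via fault injection.
# enable_page_alloc_faults is a hypothetical helper; values are examples.
enable_page_alloc_faults() {
    dir=${1:-/sys/kernel/debug/fail_page_alloc}
    if [ ! -d "$dir" ]; then
        echo "no $dir (fault injection not built in, or debugfs not mounted?)" >&2
        return 1
    fi
    printf '%s\n' 10 > "$dir/probability"  # fail about 10% of eligible allocations
    printf '%s\n' 1  > "$dir/interval"     # consider every allocation
    printf '%s\n' -1 > "$dir/times"        # no limit on the number of failures
    printf '%s\n' 1  > "$dir/verbose"      # log each injected failure
}
```

Running the I/O load against the arcmsr driver with this enabled should exercise its allocation-failure paths without having to starve the whole machine of memory.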

2008-02-26 19:30:12

by Nikola Ciprich

Subject: Re: arcmsr & areca-1660 - strange behaviour under heavy load

Hi Andrew,
no. Right now I have the machine in the weird state: swap (3GB) is empty,
and so is the bigger part of RAM (~100MB free), yet gcc crashes even when
trying to compile a C program with an empty main function. So it doesn't seem
to be a problem with memory exhaustion.
Hopefully the Areca guys will be able to find out what is going on. But
anyway, if you have any other idea what I should check or try, please let
me know, as I have to admit that I'd really like to hunt it down myself
(and yes, there is some vanity on my side here :))
Thanks a lot once more,
cheers
nik




2008-02-27 00:37:52

by Zan Lynx

Subject: Re: arcmsr & areca-1660 - strange behaviour under heavy load


On Tue, 2008-02-26 at 20:29 +0100, Nikola Ciprich wrote:
> Hi Andrew,
> no, right now I have the machine in the weird state, swap is empty (3GB),
> and so is bigger part of RAM (~100MB free), and the gcc crashes even when
> trying to compile c program with empty main function. so it doesn't seem
> to be problem with memory exhaustion.

Maybe memory fragmentation? Perhaps the driver tries to allocate a
large block of memory and cannot find a contiguous block of the right
size.

Maybe the driver developers used different kernel .config options than
you are using.

Try increasing the value in /proc/sys/vm/min_free_kbytes.

Try switching between the SLAB and SLUB allocators, try booting with
kernelcore=512M to enable the Movable memory zone, or try 64-bit vs
32-bit kernels.
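
A quick way to test the fragmentation theory might look like the sketch below. It sums the free blocks of a given order and above from /proc/buddyinfo (where the columns after the zone name are free-block counts for orders 0, 1, 2, ...), and raises min_free_kbytes. Both helper names are hypothetical, and doubling the current value is an arbitrary starting point rather than a recommendation.

```sh
# Sketch: count free blocks of at least 2^min_order pages across all zones.
# free_high_order_blocks is a hypothetical helper name.
free_high_order_blocks() {
    min_order=$1
    file=${2:-/proc/buddyinfo}
    # Fields: "Node" "N," "zone" NAME, then counts for order 0, 1, 2, ...
    awk -v o="$min_order" '
        { for (i = 5 + o; i <= NF; i++) n += $i }
        END { print n + 0 }
    ' "$file"
}

# Sketch: double vm.min_free_kbytes (needs root on a real system).
raise_min_free_kbytes() {
    f=${1:-/proc/sys/vm/min_free_kbytes}
    cur=$(cat "$f") || return 1
    printf '%s\n' $((cur * 2)) > "$f"
}
```

If the count for higher orders drops to zero while plenty of memory is still free, fragmentation becomes a more plausible explanation.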
--
Zan Lynx <[email protected]>


Attachments:
signature.asc (189.00 B)
This is a digitally signed message part

2008-02-27 01:53:46

by NickCheng

Subject: RE: arcmsr & areca-1660 - strange behaviour under heavy load

Hi Nikola,
Please keep [email protected] in the loop.
I am sure Kevin from Areca support has taken over your case.
If you like, please send him your configuration and the steps you perform, so
that both sides are in sync.
Thank you for your patience, and sorry for the inconvenience.
