Hello,
I am writing to LKML to get some help with a Linux kernel related problem.
We recently got a machine with new hardware from Intel. The CPU is an Intel
Core2 Duo E6850 3GHz with 2GB of RAM. The motherboard is an Intel DG33BU with
the G33 chipset.
After a long fight with kernel crashes in different places, we figured out
that if multicore is disabled in the BIOS, everything is OK and the machine
runs fine: no kernel crashes, no problems, but with one core only.
This small table may help explain:
Cores - kernel - state
2 - nonsmp or smp - crash
1 - smp or nonsmp - ok
All crashes have been different (swapper, rcu, irq, init, ...), or we just got
an internal gcc compiler error while compiling kernel/glibc/... and the machine
froze.
Can somebody please advise what to do to identify the problem more precisely
(debug kernel options?)?
Our impression: ICH9 & ICH9R support in the kernel is bad... sorry to say.
lspci:
00:00.0 Host bridge: Intel Corporation DRAM Controller (rev 02)
00:02.0 VGA compatible controller: Intel Corporation Integrated Graphics Controller (rev 02)
00:03.0 Communication controller: Intel Corporation MEI Controller (rev 02)
00:1a.0 USB Controller: Intel Corporation USB UHCI Controller #4 (rev 02)
00:1a.1 USB Controller: Intel Corporation USB UHCI Controller #5 (rev 02)
00:1a.2 USB Controller: Intel Corporation USB UHCI Controller #6 (rev 02)
00:1a.7 USB Controller: Intel Corporation USB2 EHCI Controller #2 (rev 02)
00:1c.0 PCI bridge: Intel Corporation PCI Express Port 1 (rev 02)
00:1c.1 PCI bridge: Intel Corporation PCI Express Port 2 (rev 02)
00:1c.2 PCI bridge: Intel Corporation PCI Express Port 3 (rev 02)
00:1c.3 PCI bridge: Intel Corporation PCI Express Port 4 (rev 02)
00:1c.4 PCI bridge: Intel Corporation PCI Express Port 5 (rev 02)
00:1d.0 USB Controller: Intel Corporation USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation USB UHCI Controller #3 (rev 02)
00:1d.7 USB Controller: Intel Corporation USB2 EHCI Controller #1 (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 92)
00:1f.0 ISA bridge: Intel Corporation LPC Interface Controller (rev 02)
00:1f.2 IDE interface: Intel Corporation 4 port SATA IDE Controller (rev 02)
00:1f.3 SMBus: Intel Corporation SMBus Controller (rev 02)
00:1f.5 IDE interface: Intel Corporation 2 port SATA IDE Controller (rev 02)
02:00.0 IDE interface: Marvell Technology Group Ltd. 88SE6101 single-port PATA133 interface (rev b1)
06:01.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
Thanks for any kind of help...
Best regards,
Pavol Cvengros
--
-----------[ Signature ]---------
Name: Pavol Cvengros
Company: Prime Interactive, Ltd.
E-mail: [email protected]
Web: http://www.primeinteractive.net
Personal web: http://orpheus.grass.sk
On Tue, 4 Dec 2007 13:31:24 +0100
Pavol Cvengros <[email protected]> wrote:
> Hello,
>
> I am trying LKML to get some help on one linux kernel related problem.
> Lately we got a machine with new HW from Intel. CPU is Intel Core2
> Duo E6850 3GHz with 2GB of RAM. Motherboard is Intel DG33BU with G33
> chipset.
>
> After long fight with kernel crashes on different things, we figured
> out that if the multicore is disabled in bios, everything is ok and
> machine is running good. No kernel crashes no problems, but with one
> core only.
>
To be honest, this looks like a thermal issue.
Are all your fans running?
Is there anything in the case blocking airflow?
--
If you want to reach me at my work email, use [email protected]
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
On Tuesday 04 December 2007 15:52:32 you wrote:
> to be honest, this looks like a thermal issue
> are all your fans running?
> is there anything in the case blocking air flow?
Tested... everything is OK: CPU temperature is OK, fans are OK.
--
Pavol Cvengros
On 12/4/07, Pavol Cvengros <[email protected]> wrote:
> Thanks for any kind of help...
kernel version?
--
Thanks,
Oliver
On Tuesday 04 December 2007 16:10:40 Olivér Pintér wrote:
> kernel version?
Tested with:
vanilla 2.6.23.9
gentoo-2.6.23-r2
--
Pavol Cvengros
I have a P35 machine too (my workstation), and it's running OK. The
problem is with this board and the Intel G33 Express chipset.
I am going to test vanilla 2.6.22 kernels and the latest development
kernel as well in the next few days.
Zid Null wrote:
> I'm using ICH9 on this machine currently, works fine. (p35 chipset though)
Pavol Cvengros wrote:
> Our impression - ICH9 & ICH9R support in kernel is bad... sorry to say..
>
I have seen unusual memory behavior under heavy load. In the cases I saw,
it was heavy DMA load from multiple SCSI controllers, and in one case FFT
on the CPU combined with heavy network load with gigE. Have you run
memtest on this hardware? Just a thought, but I see people running Linux
on that chipset, if not that particular board.
It's a cheap test even if it shows nothing. Of course it could be a CPU
cache issue in that one CPU, although that's unlikely.
--
Bill Davidsen <[email protected]>
"We have more to fear from the bungling of the incompetent than from
the machinations of the wicked." - from Slashdot
On Thursday 06 December 2007 21:15:53 Bill Davidsen wrote:
> I have seen unusual memory behavior under heavy load, in the cases I saw
> it was heavy DMA load from multiple SCSI controllers, and one case with
> FFT on the CPU and heavy network load with gigE. Have you run memtest on
> this hardware? Just a thought, but I see people running Linux on that
> chipset, if not that particular board.
>
> A cheap test even if it shows nothing. Of course it could be a CPU cache
> issue in that one CPU, although that's unlikely.
Yes, memtest ran all its tests without problems. The weird thing is
that all the kernel crashes we have seen were different (as stated in the
original mail)...
--
Pavol Cvengros
Pavol Cvengros wrote:
> Yes, memtest ran all its tests without problems. The weird thing is
> that all the kernel crashes we have seen were different (as stated in the
> original mail)...
>
>
The problem with memtest, unless I underestimate it, is that it doesn't
use all cores and siblings, so it doesn't quite load the memory system
the way regular usage would. Needless to say, if this does turn out to
be a memory loading issue, I don't know of any tools to really test it. I
fall back on part swapping, but that only helps if it's the memory DIMM
itself.
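To make the point about loading memory from every core at once a bit more
concrete, here is a minimal sketch of a userspace pattern test that runs one
worker per core. It is not an existing tool and no substitute for memtest:
the names fill_and_verify and run_parallel, the 0x55/0xAA patterns, and the
buffer sizes are all assumptions chosen for illustration. A userspace process
only touches whatever virtual memory it is handed, and the CPU caches absorb
part of the traffic, so a clean run proves little.

```python
import multiprocessing as mp
import os


def fill_and_verify(job):
    """Write a byte pattern into a buffer, read it back, count mismatches.

    job is a (size_bytes, pattern, passes) tuple; all three are illustrative
    parameters, not values taken from any real tool.
    """
    size_bytes, pattern, passes = job
    inverse = pattern ^ 0xFF
    mismatches = 0
    for _ in range(passes):
        # Write phase: fill the whole buffer with the pattern.
        buf = bytearray([pattern]) * size_bytes
        # Read phase: every byte should still equal the pattern.
        mismatches += size_bytes - buf.count(pattern)
        # Repeat with the inverted pattern so each bit is tested both ways.
        buf[:] = bytes([inverse]) * size_bytes
        mismatches += size_bytes - buf.count(inverse)
    return mismatches


def run_parallel(workers=None, mib_per_worker=64, passes=4):
    """Run one fill_and_verify job per worker process, in parallel."""
    workers = workers or os.cpu_count() or 1
    size_bytes = mib_per_worker * 1024 * 1024
    # Alternate patterns so neighbouring workers write different data.
    jobs = [(size_bytes, 0x55 if i % 2 == 0 else 0xAA, passes)
            for i in range(workers)]
    with mp.Pool(workers) as pool:
        return sum(pool.map(fill_and_verify, jobs))
```

Calling run_parallel() from a script on the affected machine with both cores
enabled returns the total number of mismatched bytes; a nonzero result would
suggest a hardware problem rather than a kernel one. On platforms that start
worker processes by spawning, the call needs to sit under an
if __name__ == "__main__": guard.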
--
Bill Davidsen <[email protected]>
"Woe unto the statesman who makes war without a reason that will still
be valid when the war is over..." Otto von Bismark
Bill Davidsen wrote:
> The problem with memtest, unless I underestimate it, is that it
> doesn't use all cores and siblings, so it doesn't quite load the memory
> system the way regular usage would. Needless to say, if this does turn
> out to be a memory loading issue I don't know of any tools to really
> test it. I fall back on part swapping, but that only helps if it's the
> memory DIMM itself.
>
Right now that machine has 2 x 1GB DDR2 800MHz... Do you think I
should test the machine with only one DDR module? (I hope to put 4GB in
it altogether.)
Pavol Cvengros wrote:
> right now that machine has 2 x 1GB DDR2 - 800MHz.... do you think I
> should test the machine with only one DDR? (I hope to put there 4GB
> all together)
>
Well, odd memory problems are rare. Did you look for a BIOS update? It
could be that the chipset isn't being set up properly, which would explain
why it might work differently with another BIOS. But if there's nothing
else to try, it won't hurt to see if it works differently with only one
DDR module.
--
Bill Davidsen <[email protected]>
Bill Davidsen wrote:
> Well, odd memory problems are rare, did you look for a BIOS update? It
> could be that the chipset isn't being set properly, and would explain
> why it might work differently with another BIOS. But if there's
> nothing else to try, it won't hurt to see if it works differently with
> only one DDR.
>
Tested both the original BIOS and the latest BIOS; neither works... I will
try the latest kernels and just one DDR module.