2005-09-19 01:01:23

by Maurice Volaski

Subject: Re: Segfaults in mkdir under high load. Software or hardware?

At 6:00 AM -0500 9/18/05,
[email protected] wrote:
> >
>> I have been seeing a similar thing:
>>
>> ./current:Sep 17 18:00:01 [kernel] mkdir[7696]: segfault at
>> 0000000000000000 rip 000000000040184d rsp 00007fffff826350 error 4
>>
>> I'm using the plain 2.6.13 (from gentoo vanilla sources), though it
>> was compiled with
>> gcc version 3.4.4 (Gentoo 3.4.4-r1, ssp-3.4.4-1.0, pie-8.7.8)
>
>x86_64 ? If so see http://bugzilla.kernel.org/show_bug.cgi?id=4851

Dual Opteron, and this looks like my issue. It recommends echo 0 >
/proc/sys/kernel/randomize_va_space but that has not stopped it from
happening, so I'll probably wait for the patch to get merged.
--

Maurice Volaski, [email protected]
Computing Support, Rose F. Kennedy Center
Albert Einstein College of Medicine of Yeshiva University
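
For anyone checking the workaround mentioned above on their own machine: the setting lives in /proc/sys/kernel/randomize_va_space, and a value of 0 means address-space randomization is off. Below is a minimal C sketch that reads the value back. The helper names parse_randomize_va_space and read_randomize_va_space are hypothetical and do not come from this thread or from the kernel.

```c
#include <stdio.h>

/*
 * Parse the text read from the sysctl file.
 * Returns the numeric setting, or -1 if no non-negative integer is found.
 */
int parse_randomize_va_space(const char *text)
{
	int value;

	if (sscanf(text, "%d", &value) != 1 || value < 0)
		return -1;
	return value;
}

/*
 * Read the current setting from the given path, normally
 * /proc/sys/kernel/randomize_va_space.
 * Returns the setting, or -1 if the file cannot be read or parsed.
 */
int read_randomize_va_space(const char *path)
{
	char buf[32];
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return parse_randomize_va_space(buf);
}
```

A return of 0 from read_randomize_va_space("/proc/sys/kernel/randomize_va_space") would confirm that the echo took effect. As reported above, that alone did not stop the segfaults here.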


2005-09-19 19:24:38

by Bongani Hlope

Subject: Re: Segfaults in mkdir under high load. Software or hardware?

On Monday 19 September 2005 03:03, Maurice Volaski wrote:
> At 6:00 AM -0500 9/18/05,
>
> [email protected] wrote:
> >> I have been seeing a similar thing:
> >>
> >> ./current:Sep 17 18:00:01 [kernel] mkdir[7696]: segfault at
> >> 0000000000000000 rip 000000000040184d rsp 00007fffff826350 error 4
> >>
> >> I'm using the plain 2.6.13 (from gentoo vanilla sources), though it
> >> was compiled with
> >> gcc version 3.4.4 (Gentoo 3.4.4-r1, ssp-3.4.4-1.0, pie-8.7.8)
> >
> >x86_64 ? If so see http://bugzilla.kernel.org/show_bug.cgi?id=4851
>
> Dual Opteron, and this looks like my issue. It recommends echo 0 >
> /proc/sys/kernel/randomize_va_space but that has not stopped it from
> happening, so I'll probably wait for the patch to get merged.

Linus has a patch for that, which you might try. Look at
http://bugzilla.kernel.org/show_bug.cgi?id=4851 for more details on this bug.

--- arch/x86_64/kernel/setup.c.orig 2005-09-18 07:34:36.000000000 +0200
+++ arch/x86_64/kernel/setup.c 2005-09-18 07:37:25.000000000 +0200
@@ -793,10 +793,23 @@ static void __init amd_detect_cmp(struct
 #endif
 }
 
+#define HWCR 0xc0010015
+
 static int __init init_amd(struct cpuinfo_x86 *c)
 {
 	int r;
 	int level;
+#if CONFIG_SMP
+	unsigned long value;
+	// Disable TLB flush filter by setting HWCR.FFDIS:
+	// bit 6 of msr C001_0015
+	//
+	// Errata 63 for SH-B3 steppings
+	// Errata 122 for all(?) steppings
+	rdmsrl(HWCR, value);
+	value |= 1 << 6;
+	wrmsrl(HWCR, value);
+#endif
 
 	/* Bit 31 in normal CPUID used for nonstandard 3DNow ID;
 	   3DNow is IDd by bit 31 in extended CPUID (1*32+31) anyway */
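
To check from user space whether the patch above took effect, one option is to read HWCR back through the msr character device and test bit 6. The following is a rough sketch only. It assumes the msr driver is loaded, that /dev/cpu/N/msr exists, and that the program runs as root. The function names (hwcr_ffdis_is_set, hwcr_set_ffdis, read_hwcr) are made up for illustration and are not kernel APIs.

```c
#define _XOPEN_SOURCE 700	/* for pread() */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* MSR number and bit position as given in the patch. */
#define HWCR_MSR   0xc0010015u
#define HWCR_FFDIS (1ull << 6)

/* Return 1 if FFDIS (bit 6) is set in a raw HWCR value, else 0. */
int hwcr_ffdis_is_set(uint64_t hwcr)
{
	return (hwcr & HWCR_FFDIS) != 0;
}

/* Return the value with FFDIS set, mirroring "value |= 1 << 6". */
uint64_t hwcr_set_ffdis(uint64_t hwcr)
{
	return hwcr | HWCR_FFDIS;
}

/*
 * Read HWCR on one CPU via /dev/cpu/<cpu>/msr.
 * Returns 0 on success, -1 if the device cannot be opened or read.
 */
int read_hwcr(int cpu, uint64_t *out)
{
	char path[64];
	ssize_t n;
	int fd;

	snprintf(path, sizeof(path), "/dev/cpu/%d/msr", cpu);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	/* The msr device takes the file offset as the MSR number. */
	n = pread(fd, out, sizeof(*out), (off_t)HWCR_MSR);
	close(fd);
	return n == (ssize_t)sizeof(*out) ? 0 : -1;
}
```

On an SMP box, calling read_hwcr() for each CPU number and passing the result to hwcr_ffdis_is_set() should return 1 everywhere once the patched kernel has booted, since init_amd() runs per CPU.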