Marcelo,
This patch enables a workaround for multi-node NUMA systems that are
experiencing gettimeofday returning "old" time values. Because the nodes
in these systems are frequently driven by different crystals, the CPUs
vary slightly in frequency, causing the TSCs to drift apart. Thus it is
possible for gettimeofday to return a time value behind one
already seen on another cpu. This patch allows people compiling w/
'Multi-node NUMA support' to pass "notsc" or "bad-tsc" as boot
parameters. "notsc" disables rdtsc calls and forces the kernel to use
the PIT for gettimeofday calculations (as normally expected w/ i386
compiled kernels). "bad-tsc" also forces the kernel to use the PIT for
gettimeofday, but does not disable TSC access.
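To make the drift and the two flags concrete, here is a rough userspace model. It is not code from the patch: the struct, the function names, and the frequencies are all made up for illustration, and the PIT is assumed to be a single exact time source shared by every node.

```c
#include <stdint.h>

/* Illustrative model only: these names are hypothetical, not from the patch. */
struct boot_flags {
	int notsc;	/* "notsc": no rdtsc at all, PIT used for gettimeofday */
	int bad_tsc;	/* "bad-tsc": PIT used for gettimeofday, rdtsc still accessible */
};

/* Would gettimeofday be derived from the TSC under these flags? */
static int gettimeofday_uses_tsc(struct boot_flags f)
{
	return !f.notsc && !f.bad_tsc;
}

/* Is rdtsc itself still permitted under these flags? */
static int rdtsc_allowed(struct boot_flags f)
{
	return !f.notsc;
}

/*
 * Elapsed microseconds a CPU reports from its TSC after real_usecs of wall
 * time, when its crystal really runs at actual_khz but cycles are converted
 * using a single calibrated_khz value.
 */
static uint64_t tsc_reported_usecs(uint64_t real_usecs, uint64_t actual_khz,
				   uint64_t calibrated_khz)
{
	uint64_t cycles = real_usecs * actual_khz / 1000;

	return cycles * 1000 / calibrated_khz;
}

/* Elapsed time as gettimeofday would see it; the PIT is modeled as exact. */
static uint64_t reported_usecs(struct boot_flags f, uint64_t real_usecs,
			       uint64_t actual_khz, uint64_t calibrated_khz)
{
	if (gettimeofday_uses_tsc(f))
		return tsc_reported_usecs(real_usecs, actual_khz, calibrated_khz);
	return real_usecs;
}
```

In this model, with neither flag set, 10 seconds of wall time on a 1,000,000 kHz node and on a 999,950 kHz node (both converted at 1,000,000 kHz) come out as 10000000 and 9999500 usecs, so a process hopping to the slower node could see a time 500 usecs in the past. With either flag set both nodes report 10000000, and only "notsc" makes rdtsc_allowed() return 0.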
Thanks to Matt Wilson for suggestions on this revision.
Comments, suggestions and flames welcome.
thanks
-john
>> What's the difference between CONFIG_X86_NUMA and CONFIG_MULTIQUAD?
>>
>> If CONFIG_X86_NUMA is for numaq boxens please use CONFIG_X86_NUMAQ as
>> in pat's patch.
>
> Well, at the moment CONFIG_MULTIQUAD ~= numaq specific stuff + generic
> x86 numa stuff, so James and Martin are starting to break the
> generic stuff out of MULTIQUAD and put the NUMAQ specific stuff under
> X86_NUMAQ.
The current differentiation is a mess, which is my fault.
CONFIG_MULTIQUAD started off life as clustered apic mode,
and grew from there to be a catchall for the NUMA-Q machine.
I can't help but think this is a bad idea, with the benefit
of hindsight. So we'll try to convert everything to more
meaningful config options, where the top level "machine type"
config option turns on the support features that machine needs.
I know the following looks a little verbose, but it's fairly
straightforward, and is a lot more logical than the current,
errm ... mess.
I'll submit a patch to convert the existing code over unless
someone screams pretty loudly (and suggests a better idea ;-))
M.
PS. Looking at the below again, we probably ought to rename
the remaining CONFIG_MULTIQUAD to CONFIG_CLUSTERED_APIC or
something ... but I'm sure I'll get lynched in the morning
for that one.
> We're all trying to move to something close to:
>
> bool 'Multi-node NUMA system support' CONFIG_X86_NUMA
> if [ "$CONFIG_X86_NUMA" = "y" ]; then
> #Platform Choices
> bool 'Multiquad (IBM/Sequent) NUMAQ support' CONFIG_X86_NUMAQ
> if [ "$CONFIG_X86_NUMAQ" = "y" ]; then
> define_bool CONFIG_MULTIQUAD y
> define_bool CONFIG_X86_TSC_DISABLE y
> fi
> bool 'IBM x440 Summit support' CONFIG_X86_SUMMIT_NUMA
> if [ "$CONFIG_X86_SUMMIT_NUMA" = "y" ]; then
> define_bool CONFIG_X86_TSC_DISABLE y
> fi
> # Common NUMA Features
> if [ "$CONFIG_X86_NUMAQ" = "y" -o "$CONFIG_X86_SUMMIT_NUMA" = "y" ]; then
> bool 'Numa Memory Allocation Support' CONFIG_NUMA
> if [ "$CONFIG_NUMA" = "y" ]; then
> define_bool CONFIG_DISCONTIGMEM y
> define_bool CONFIG_HAVE_ARCH_BOOTMEM_NODE y
> fi
> #[XXX - future]
> #bool 'NUMA API support' CONFIG_WHATEVER
> #bool 'Enable NUMA Scheduler' CONFIG_WHATEVER
> fi
> fi
>
>> else
>> - bool 'Multiquad NUMA system' CONFIG_MULTIQUAD
>> + bool 'Multi-node NUMA system support' CONFIG_X86_NUMA
>> + if [ "$CONFIG_X86_NUMA" = "y" ]; then
>> + bool 'Multiquad (IBM/Sequent) NUMAQ support' CONFIG_MULTIQUAD
>> + fi
>> fi
diff -Nru a/Documentation/Configure.help b/Documentation/Configure.help
--- a/Documentation/Configure.help Mon Aug 5 15:41:40 2002
+++ b/Documentation/Configure.help Mon Aug 5 15:41:40 2002
@@ -233,7 +233,21 @@
network and embedded applications. For more information see the
Axis Communication site, <http://developer.axis.com/>.
-Multiquad support for NUMA systems
+Multi-node support for NUMA systems
What's the difference between CONFIG_X86_NUMA and CONFIG_MULTIQUAD?
If CONFIG_X86_NUMA is for numaq boxens please use CONFIG_X86_NUMAQ as
in pat's patch.
else
- bool 'Multiquad NUMA system' CONFIG_MULTIQUAD
+ bool 'Multi-node NUMA system support' CONFIG_X86_NUMA
+ if [ "$CONFIG_X86_NUMA" = "y" ]; then
+ bool 'Multiquad (IBM/Sequent) NUMAQ support' CONFIG_MULTIQUAD
+ fi
fi
config.in files have three-space indents.
+
+if [ "$CONFIG_X86_HAS_TSC" = "y" ]; then
+ if [ "$CONFIG_X86_NUMA" = "y" ]; then
+ define_bool CONFIG_X86_TSC n
+ else
+ define_bool CONFIG_X86_TSC y
+ fi
+fi
+
+ if(!bad_tsc){
+ use_tsc = 1;
+ x86_udelay_tsc = 1;
+ #ifndef do_gettimeoffset
+ do_gettimeoffset = do_fast_gettimeoffset;
+ #endif
+ }
you want to read Documentation/CodingStyle, don't you?
On Mon, 2002-08-05 at 16:21, Christoph Hellwig wrote:
> -Multiquad support for NUMA systems
> +Multi-node support for NUMA systems
>
> What's the difference between CONFIG_X86_NUMA and CONFIG_MULTIQUAD?
>
> If CONFIG_X86_NUMA is for numaq boxens please use CONFIG_X86_NUMAQ as
> in pat's patch.
Well, at the moment CONFIG_MULTIQUAD ~= numaq specific stuff + generic
x86 numa stuff, so James and Martin are starting to break the
generic stuff out of MULTIQUAD and put the NUMAQ specific stuff under
X86_NUMAQ.
We're all trying to move to something close to:
bool 'Multi-node NUMA system support' CONFIG_X86_NUMA
if [ "$CONFIG_X86_NUMA" = "y" ]; then
#Platform Choices
bool 'Multiquad (IBM/Sequent) NUMAQ support' CONFIG_X86_NUMAQ
if [ "$CONFIG_X86_NUMAQ" = "y" ]; then
define_bool CONFIG_MULTIQUAD y
define_bool CONFIG_X86_TSC_DISABLE y
fi
bool 'IBM x440 Summit support' CONFIG_X86_SUMMIT_NUMA
if [ "$CONFIG_X86_SUMMIT_NUMA" = "y" ]; then
define_bool CONFIG_X86_TSC_DISABLE y
fi
# Common NUMA Features
if [ "$CONFIG_X86_NUMAQ" = "y" -o "$CONFIG_X86_SUMMIT_NUMA" = "y" ]; then
bool 'Numa Memory Allocation Support' CONFIG_NUMA
if [ "$CONFIG_NUMA" = "y" ]; then
define_bool CONFIG_DISCONTIGMEM y
define_bool CONFIG_HAVE_ARCH_BOOTMEM_NODE y
fi
#[XXX - future]
#bool 'NUMA API support' CONFIG_WHATEVER
#bool 'Enable NUMA Scheduler' CONFIG_WHATEVER
fi
fi
> else
> - bool 'Multiquad NUMA system' CONFIG_MULTIQUAD
> + bool 'Multi-node NUMA system support' CONFIG_X86_NUMA
> + if [ "$CONFIG_X86_NUMA" = "y" ]; then
> + bool 'Multiquad (IBM/Sequent) NUMAQ support' CONFIG_MULTIQUAD
> + fi
> fi
>
> config.in files have three-space indents.
ah, thanks. fixed and attached.
> + if(!bad_tsc){
> + use_tsc = 1;
> + x86_udelay_tsc = 1;
> + #ifndef do_gettimeoffset
> + do_gettimeoffset = do_fast_gettimeoffset;
> + #endif
> + }
>
> you want to read Documentation/CodingStyle, don't you?
Always a good read :) Although, apart from the #ifndef inside a function
(which I'm really just moving, not adding to the code), I'm not sure I
see the violation in the above (enlightenment is welcome, in
whatever form it might take :).
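My best guess at what is meant, and it is only a guess: the examples in CodingStyle put a space between "if" and the parenthesis and another before the opening brace, and preprocessor lines conventionally start in column 0. A sketch of the hunk laid out that way follows. The declarations and the enable_tsc_timing() wrapper are placeholders added so the fragment compiles on its own; they are not the kernel's definitions.

```c
/* Placeholder declarations so the fragment compiles standalone. */
static int bad_tsc;
static int use_tsc;
static int x86_udelay_tsc;

static unsigned long do_fast_gettimeoffset(void)
{
	return 0;
}

static unsigned long (*do_gettimeoffset)(void);

/* Hypothetical wrapper; in the patch this hunk sits inside existing code. */
static void enable_tsc_timing(void)
{
	if (!bad_tsc) {
		use_tsc = 1;
		x86_udelay_tsc = 1;
#ifndef do_gettimeoffset
		do_gettimeoffset = do_fast_gettimeoffset;
#endif
	}
}
```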
thanks for the feedback
-john