Hi,
I installed Red Hat 7.0, and I can find linux-2.2.16 in /usr/src.
These are the steps I followed to install kernel 2.4:
#cd /usr/src
#rm -r linux
#rm -rf linux-2.2.16
#tar -xvf linux-2.4.0-test9.tar
#cd /usr/src
#ls
linux
redhat
#mv linux linux-2.4.0-test9
#ln -s linux-2.4.0-test9 linux
#ls
linux->linux-2.4.0-test9
linux-2.4.0-test9
redhat
#cd /usr/src/linux
#make xconfig
I just saved and exited without changing the configuration.
#make dep
#make clean
#make bzImage
#make modules
#make modules_install
I find that System.map is mapped to 2.4.0, i.e. a new
System-2.4.0-test9.map is created.
#cd /boot
#ls
#vmlinuz->vmlinuz-2.4.0-test9
#cd /usr/src/linux/arch/i386/boot
#cp bzImage /boot/vmlinuz
#vi /etc/lilo.conf
changed image from image = /boot/vmlinuz-2.2.16-22
to
image = /boot/vmlinuz-2.4.0-test9
#lilo
linux
dos
When I boot linux, the system hangs after these messages:
loading linux......
uncompressing linux, booting linux kernel OK.
The system hangs here.
Please let me know where I went wrong.
With regards,
Anil
Anil kumar wrote:
> #cd /usr/src/linux/arch/i386/boot
> #cp bzImage /boot/vmlinuz
>
> #vi /etc/lilo.conf
> changed image from image = /boot/vmlinuz-2.2.16-22
How about making this line match the name of the copied kernel? You
copied it as /boot/vmlinuz, so this should be image = /boot/vmlinuz.
Jeff
> to
> image = /boot/vmlinuz-2.4.0-test9
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> Please read the FAQ at http://www.tux.org/lkml/
On Tue, 7 Nov 2000, Anil kumar wrote:
> The system hangs after messages:
> loading linux......
> uncompressing linux, booting linux kernel OK.
>
> The System hangs here.
>
> Please let me know where I am wrong
Hi Anil,
The only serious mistake you made was using the test9 kernel when test11-pre1
(or at least test10) was available. So, redo everything you have done with
test11-pre1, and if you still cannot boot, send a message to this list
with details such as your CPUs, motherboard, etc.
Regards,
Tigran
On Tue, 7 Nov 2000, Tigran Aivazian wrote:
> Hi Anil,
>
> The only serious mistake you made was using the test9 kernel when test11-pre1
> (or at least test10) was available. So, redo everything you have done with
> test11-pre1, and if you still cannot boot, send a message to this list
> with details such as your CPUs, motherboard, etc.
Have you chosen the right cpu type in the configuration?
/Martin
So how come NetWare and NT can detect this at run time, and we have to
use a .config option to specify it? Come on guys.....
Jeff
On Tue, 07 Nov 2000 23:32:46 Jeff V. Merkey wrote:
>
> So how come NetWare and NT can detect this at run time, and we have to
> use a .config option to specify it? Come on guys.....
>
If you can get NT to boot on a 486, perhaps that shows that NT does not use
any optimization... so it does not worry about what it is running on, and just
prints the name...
--
Juan Antonio Magallon Lacarta #> cd /pub
mailto:[email protected] #> more beer
I don't know about you, but I like having the option to cut out code from
my kernel that will never get used for a particular cpu arch.....;o)
Or was that just a troll ;o)
----- Forwarded by Bruce Holzrichter/US/Infinium Software on 11/07/2000 05:46 PM -----
From: "Jeff V. Merkey" <[email protected]>
Sent by: linux-kernel-owner@vger.kernel.org
To: Martin Josefsson <[email protected]>
cc: Tigran Aivazian <[email protected]>, Anil kumar <[email protected]>, [email protected]
Date: 11/07/2000 05:32 PM
Subject: Re: Installing kernel 2.4
So how come NetWare and NT can detect this at run time, and we have to
use a .config option to specify it? Come on guys.....
Jeff
On Tue, 7 Nov 2000, Jeff V. Merkey wrote:
> So how come NetWare and NT can detect this at run time, and we have to
> use a .config option to specify it? Come on guys.....
Then run a kernel compiled for i386 and suffer the poorer code quality
that comes with not using newer instructions and including the
workarounds for ancient hardware.
-ben
"Jeff V. Merkey" wrote:
> So how come NetWare and NT can detect this at run time, and we have to
> use a .config option to specify it? Come on guys.....
Linux detects this as well.
However, this is not about detection but about optimizations.
Optimizations for, e.g., a Xeon could keep a K6-2 from booting!
It should probably default to something safe like 386, though...
jjs
There are tests for all this in the feature flags for Intel and
non-Intel CPUs like AMD -- including MTRR settings. All of this could
be dynamic. Here's some code that does this, and it's similar to
NetWare. It detects CPU type, feature flags, special instructions,
etc. All of this on x86 could be dynamically detected.
:-)
;*************************************************************************
;
; check current processor type and state
;
;*************************************************************************
public DetectProcessorInformation
DetectProcessorInformation proc near
mov ax, cs
mov ds, ax
mov es, ax
pushf
call get_cpuid
call get_fpuid
call print
popf
ret
DetectProcessorInformation endp
get_cpuid proc near
check_8086:
pushf
pop ax
mov cx, ax
and ax, 0fffh
push ax
popf
pushf
pop ax
and ax, 0f000h
cmp ax, 0f000h ; flag bits 12-15 are always set on an 8086
mov CPU_TYPE, 0 ; 8086 detected
je end_get_cpuid
check_80286:
or cx, 0f000h
push cx
popf
pushf
pop ax
and ax, 0f000h ; flag bits 12-15 are always clear on 80286 in real mode
mov CPU_TYPE, 2 ; 80286 processor
jz end_get_cpuid
check_80386:
mov bx, sp
and sp, not 3
OPND32
pushf
OPND32
pop ax
OPND32
mov cx, ax
OPND32 35h, 40000h
OPND32
push ax
OPND32
popf
OPND32
pushf
OPND32
pop ax
OPND32
xor ax, cx ; AC bit won't toggle, 80386 detected
mov sp, bx
mov CPU_TYPE, 3 ; 80386 detected
jz end_get_cpuid
and sp, not 3
OPND32
push cx
OPND32
popf
mov sp, bx ; restore stack
check_80486:
mov CPU_TYPE, 4 ; default to 80486
OPND32
mov ax, cx
OPND32 35h, 200000h ; xor ID bit
OPND32
push ax
OPND32
popf
OPND32
pushf
OPND32
pop ax
OPND32
xor ax, cx ; cant toggle ID bit
je end_get_cpuid
check_vendor:
mov ID_FLAG, 1
OPND32
xor ax, ax
CPUID
OPND32
mov word ptr VENDOR_ID, bx
OPND32
mov word ptr VENDOR_ID[+4], dx
OPND32
mov word ptr VENDOR_ID[+8], cx
mov si, offset VENDOR_ID
mov di, offset intel_id
mov cx, length intel_id
compare:
repe cmpsb
or cx, cx
jnz end_get_cpuid
intel_processor:
mov INTEL_PROC, 1
cpuid_data:
OPND32
cmp ax, 1
jl end_get_cpuid
OPND32
xor ax, ax
OPND32
inc ax
CPUID
mov byte ptr ds:STEPPING, al
and STEPPING, STEPPING_MASK
and al, MODEL_MASK
shr al, MODEL_SHIFT
mov byte ptr ds:CPU_MODEL, al
and ax, FAMILY_MASK
shr ax, FAMILY_SHIFT
mov byte ptr ds:CPU_TYPE, al
mov dword ptr FEATURE_FLAGS, edx
end_get_cpuid:
ret
get_cpuid endp
get_fpuid proc near
fninit
mov word ptr ds:FP_STATUS, 5a5ah
fnstsw word ptr ds:FP_STATUS
mov ax, word ptr ds:FP_STATUS
cmp al, 0
mov FPU_TYPE, 0
jne end_get_fpuid
check_control_word:
fnstcw word ptr ds:FP_STATUS
mov ax, word ptr ds:FP_STATUS
and ax, 103fh
cmp ax, 3fh
mov FPU_TYPE, 0
jne end_get_fpuid
mov FPU_TYPE, 1
check_infinity:
cmp CPU_TYPE, 3
jne end_get_fpuid
fld1
fldz
fdiv
fld st
fchs
fcompp
fstsw word ptr ds:FP_STATUS
mov ax, word ptr ds:FP_STATUS
mov FPU_TYPE, 2
sahf
jz end_get_fpuid
mov FPU_TYPE, 3
end_get_fpuid:
ret
get_fpuid endp
print proc near
cmp ID_FLAG, 1
je print_cpuid_data
if (VERBOSE)
mov dx, offset id_msg
call OutputMessage
endif
print_86:
cmp CPU_TYPE, 0
jne print_286
if (VERBOSE)
mov dx, offset c8086
call OutputMessage
endif
cmp FPU_TYPE, 0
je end_print
if (VERBOSE)
mov dx, offset fp_8087
call OutputMessage
endif
jmp end_print
print_286:
cmp CPU_TYPE, 2
jne print_386
if (VERBOSE)
mov dx, offset c286
call OutputMessage
endif
cmp FPU_TYPE, 0
je end_print
if (VERBOSE)
mov dx, offset fp_80287
call OutputMessage
endif
jmp end_print
print_386:
cmp CPU_TYPE, 3
jne print_486
if (VERBOSE)
mov dx, offset c386
call OutputMessage
endif
cmp FPU_TYPE, 0
je end_print
cmp FPU_TYPE, 2
jne print_387
if (VERBOSE)
mov dx, offset fp_80287
call OutputMessage
endif
jmp end_print
print_387:
if (VERBOSE)
mov dx, offset fp_80387
call OutputMessage
endif
jmp end_print
print_486:
cmp FPU_TYPE, 0
je print_Intel486sx
if (VERBOSE)
mov dx, offset c486
call OutputMessage
endif
jmp end_print
print_Intel486sx:
if (VERBOSE)
mov dx, offset c486nfp
call OutputMessage
endif
jmp end_print
print_cpuid_data:
cmp_vendor:
cmp INTEL_PROC, 1
jne not_GenuineIntel
cmp CPU_TYPE, 4
jne check_Pentium
if (VERBOSE)
mov dx, offset Intel486_msg
call OutputMessage
endif
jmp print_family
check_Pentium:
cmp CPU_TYPE, 5
jne check_PentiumPro
if (VERBOSE)
mov dx, offset Pentium_msg
call OutputMessage
endif
jmp print_family
check_PentiumPro:
cmp CPU_TYPE, 6
jne print_features
if (VERBOSE)
mov dx, offset PentiumPro_msg
call OutputMessage
endif
print_family:
IF VERBOSE
mov dx, offset family_msg
call OutputMessage
ENDIF
mov al, byte ptr ds:CPU_TYPE
mov byte ptr dataCR, al
add byte ptr dataCR, 30h
IF VERBOSE
mov dx, offset dataCR
call OutputMessage
ENDIF
print_model:
IF VERBOSE
mov dx, offset model_msg
call OutputMessage
ENDIF
mov al, byte ptr ds:CPU_MODEL
mov byte ptr dataCR, al
add byte ptr dataCR, 30h
IF VERBOSE
mov dx, offset dataCR
call OutputMessage
ENDIF
print_features:
mov ax, word ptr ds:FEATURE_FLAGS
and ax, FPU_FLAG
jz check_mce
if (VERBOSE)
mov dx, offset fpu_msg
call OutputMessage
ENDIF
check_mce:
mov ax, word ptr ds:FEATURE_FLAGS
and ax, MCE_FLAG
jz check_CMPXCHG8B ; no MCE: move on to the next feature test
IF VERBOSE
mov dx, offset mce_msg
call OutputMessage
ENDIF
check_CMPXCHG8B:
mov ax, word ptr ds:FEATURE_FLAGS
and ax, CMPXCHG8B_FLAG
jz check_io_break ; no CMPXCHG8B: move on to the next feature test
IF VERBOSE
mov dx, offset cmp_msg
call OutputMessage
ENDIF
check_io_break:
mov ax, word ptr ds:FEATURE_FLAGS
test ax, 4
jz check_4MB_paging
IF VERBOSE
mov dx, offset io_break_msg
call OutputMessage
ENDIF
; Enable Debugging Extensions bit in CR4
CR4_TO_ECX
or ecx, 08h
ECX_TO_CR4
if (VERBOSE)
mov dx, offset io_break_enable
call OutputMessage
endif
check_4MB_paging:
mov ax, word ptr ds:FEATURE_FLAGS
test ax, 08h
jz check_PageExtend
IF VERBOSE
mov dx, offset page_4MB_msg
call OutputMessage
ENDIF
; Enable page size extension bit in CR4
CR4_TO_ECX
or ecx, 10h
ECX_TO_CR4
if (VERBOSE)
mov dx, offset p4mb_enable
call OutputMessage
endif
check_PageExtend:
mov ax, word ptr ds:FEATURE_FLAGS
test ax, 40h
jz check_wc
;; DEBUG DEBUG DEBUG !!!
; Enable page address extension bit in CR4
;; CR4_TO_ECX
;; or ecx, 20h
;; ECX_TO_CR4
check_wc:
mov dx, word ptr ds:FEATURE_FLAGS
test dx, 1000h ; MTRR support flag
jz end_print
if (VERBOSE)
mov dx, offset wc_enable
call OutputMessage
endif
jmp end_print
not_GenuineIntel:
if (VERBOSE)
mov dx, offset not_Intel
call OutputMessage
endif
end_print:
ret
print endp
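The two decoding steps in the listing above (splitting the CPUID leaf 1 signature into stepping, model and family, then testing individual feature bits) can be sketched in C as follows. This is only an illustration, not part of the posted code: the names decode_signature, has_feature and the FEAT_* constants are hypothetical, the register values are assumed to have been read already, and the extended family/model fields of later CPUs are ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Basic fields of the CPUID leaf 1 EAX signature. */
struct cpu_signature {
    unsigned stepping; /* bits 0-3 */
    unsigned model;    /* bits 4-7 */
    unsigned family;   /* bits 8-11 */
};

/* CPUID leaf 1 EDX feature bits that the listing tests (names are ours). */
enum cpu_feature {
    FEAT_FPU  = 1u << 0,  /* on-chip FPU */
    FEAT_DE   = 1u << 2,  /* debugging extensions (the "test ax, 4" case) */
    FEAT_PSE  = 1u << 3,  /* 4MB pages (08h) */
    FEAT_PAE  = 1u << 6,  /* physical address extension (40h) */
    FEAT_MCE  = 1u << 7,  /* machine check exception */
    FEAT_CX8  = 1u << 8,  /* CMPXCHG8B */
    FEAT_MTRR = 1u << 12  /* MTRRs (1000h) */
};

/* Ties the constant back to the literal used in the assembly. */
static_assert(FEAT_MTRR == 0x1000, "MTRR flag must match the 1000h test");

/* Split the signature the same way the STEPPING/MODEL/FAMILY masks do. */
struct cpu_signature decode_signature(uint32_t eax)
{
    struct cpu_signature sig;

    sig.stepping = eax & 0xFu;
    sig.model    = (eax >> 4) & 0xFu;
    sig.family   = (eax >> 8) & 0xFu;
    return sig;
}

/* Return nonzero if the given feature bit is set in EDX. */
int has_feature(uint32_t edx, enum cpu_feature feature)
{
    return (edx & (uint32_t)feature) != 0;
}
```

For instance, a signature of 0x683 decodes to family 6, model 8, stepping 3.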
> There are tests for all this in the feature flags for Intel and
> non-Intel CPUs like AMD -- including MTRR settings. All of this could
> be dynamic. Here's some code that does this, and it's similar to
> NetWare. It detects CPU type, feature flags, special instructions,
> etc. All of this on x86 could be dynamically detected.
Detecting the CPU isn't the issue (we already do all this); it's what to
do when you've figured out what the CPU is. Show me code that can
dynamically adjust the alignment of the routines/variables/structs
dependent upon cacheline size.
regards,
Davej.
--
| Dave Jones <[email protected]> http://www.suse.de/~davej
| SuSE Labs
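To make the alignment point concrete, here is a minimal C sketch of why cache-line alignment is normally fixed at build time. Everything in it is an assumption for illustration: CACHE_LINE_BYTES stands in for a configuration option, 32 is just an example value, and struct per_cpu_counter is a made-up type. It uses C11 _Alignas rather than any kernel macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical build-time setting, standing in for a CPU-type config option. */
#ifndef CACHE_LINE_BYTES
#define CACHE_LINE_BYTES 32
#endif

/*
 * The compiler bakes the alignment into the type: member offsets, padding
 * and sizeof are all decided here, before the target CPU is ever probed.
 */
struct per_cpu_counter {
    _Alignas(CACHE_LINE_BYTES) unsigned long hits;
    unsigned long misses;
};

/* Each array element must start on its own cache line. */
static_assert(sizeof(struct per_cpu_counter) % CACHE_LINE_BYTES == 0,
              "element size must be a multiple of the line size");

/* Distance in bytes between consecutive array elements. */
size_t counter_stride(void)
{
    return sizeof(struct per_cpu_counter);
}

/* Return nonzero if p sits on a cache-line boundary. */
int is_line_aligned(const void *p)
{
    return ((uintptr_t)p % CACHE_LINE_BYTES) == 0;
}
```

Changing the line size means changing sizeof and every array stride derived from it, which is why a value detected at boot cannot simply be plugged in afterwards.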
Jeff, the problem is not detecting the CPU type at runtime; the problem is
trying to re-compile the code to take advantage of that CPU at runtime.
Depending on what CPU you have, the kernel (and compiler) can use different
instructions/optimizations/etc. If you want to do this at boot you have two
options:
1. re-compile the kernel
2. change all the CPU-specific places from inline code to function calls
into a table that gets changed at boot to point at the correct calls.
Doing #2 will cost you so much performance that you would be better off
just compiling for a 386 and not going through the autodetect hassle in
the first place.
David Lang
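Option 2 above can be sketched in C roughly as follows. This is an illustrative toy, not kernel code: copy_generic, copy_tuned, struct cpu_ops, boot_ops and select_cpu_ops are all hypothetical names, the "tuned" routine simply defers to memcpy, and the family >= 6 rule is an arbitrary example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Signature shared by every copy routine in the table. */
typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Baseline routine: a plain byte loop that any x86 CPU could run. */
void *copy_generic(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(dst != NULL && src != NULL);
    while (n--)
        *d++ = *s++;
    return dst;
}

/* Stand-in for a CPU-tuned routine; here it simply defers to memcpy. */
void *copy_tuned(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

/* The table, holding safe defaults until detection has run. */
struct cpu_ops {
    copy_fn copy;
};
struct cpu_ops boot_ops = { copy_generic };

/* Called once at boot, after the CPU family is known. */
void select_cpu_ops(unsigned family)
{
    boot_ops.copy = (family >= 6) ? copy_tuned : copy_generic;
}
```

Every caller then writes boot_ops.copy(dst, src, n), an indirect call the compiler cannot inline; that per-call overhead is the cost being weighed against building for one CPU type.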
On Tue, 7 Nov 2000, Jeff V. Merkey wrote:
> Date: Tue, 07 Nov 2000 16:10:58 -0700
> From: Jeff V. Merkey <[email protected]>
> To: [email protected]
> Cc: Martin Josefsson <[email protected]>,
> Tigran Aivazian <[email protected]>, Anil kumar <[email protected]>,
> [email protected]
> Subject: Re: Installing kernel 2.4
>
>
> There are tests for all this in the feature flags for Intel and
> non-Intel CPUs like AMD -- including MTRR settings. All of this could
> be dynamic. Here's some code that does this, and it's similar to
> NetWare. It detects CPU type, feature flags, special instructions,
> etc. All of this on x86 could be dynamically detected.
[email protected] wrote:
> Detecting the CPU isn't the issue (we already do all this), it's what to
> do when you've figured out what the CPU is. Show me code that can
> dynamically adjust the alignment of the routines/variables/structs
> dependent upon cacheline size.
If the compiler always aligned all functions and data on 16-byte
boundaries (as in NetWare) for all i386 code, it would run a lot faster.
Cache line alignment could be an option in the loader... after all, it's
the loader that locates data in memory. If Linux were PE based, relocation
logic would be a snap with this model (like NT).
Jeff
David Lang wrote:
>
> Jeff, the problem is not detecting the CPU type at runtime, the problem is
> trying to re-compile the code to take advantage of that CPU at runtime.
>
> depending on what CPU you have the kernel (and compiler) can use different
> commands/optimizations/etc, if you want to do this on boot you have two
> options.
>
> 1. re-compile the kernel
>
> 2. change all the CPU specific places from inline code to function calls
> into a table that get changed at boot to point at the correct calls.
The macros would be a problem. Some of the options, like MTRR, should
be auto-detected.
Jeff
>
> doing #2 will cost you so much performance that you would be better off
> just compiling for a 386 and not going through the autodetect hassle in
> the first place.
>
> David Lang
>
> On Tue, 7 Nov 2000, Jeff V. Merkey wrote:
>
> > Date: Tue, 07 Nov 2000 16:10:58 -0700
> > From: Jeff V. Merkey <[email protected]>
> > To: [email protected]
> > Cc: Martin Josefsson <[email protected]>,
> > Tigran Aivazian <[email protected]>, Anil kumar <[email protected]>,
> > [email protected]
> > Subject: Re: Installing kernel 2.4
> >
> >
> > There are tests for all this in the feature flags for intel and
> > non-intel CPUs like AMD -- including MTRR settings. All of this could
> > be dynamic. Here's some code that does this, and it's similiar to
> > NetWare. It detexts CPU type, feature flags, special instructions,
> > etc. All of this on x86 could be dynamically detected.
> >
> > :-)
> >
> > ;*************************************************************************
> > ;
> > ; check current processor type and state
> > ;
> > ;*************************************************************************
> >
> > public DetectProcessorInformation
> > DetectProcessorInformation proc near
> >
> > mov ax, cs
> > mov ds, ax
> > mov es, ax
> >
> > pushf
> > call get_cpuid
> > call get_fpuid
> > call print
> > popf
> > ret
> >
> > DetectProcessorInformation endp
> >
> > get_cpuid proc near
> >
> > check_8086:
> >
> > pushf
> > pop ax
> > mov cx, ax
> > and ax, 0fffh
> > push ax
> > popf
> > pushf
> > pop ax
> > and ax, 0f000h
> > cmp ax, 0f000h ; flag bits 12-15 are always set on an 8086
> > mov CPU_TYPE, 0 ; 8086 detected
> > je end_get_cpuid
> >
> > check_80286:
> > or cx, 0f000h
> > push cx
> > popf
> > pushf
> > pop ax
> > and ax, 0f000h ; flag bits 12-15 are always clear on 80286 in
> > real mode
> > mov CPU_TYPE, 2 ; 80286 processor
> > jz end_get_cpuid
> >
> > check_80386:
> > mov bx, sp
> > and sp, not 3
> > OPND32
> > pushf
> > OPND32
> > pop ax
> > OPND32
> > mov cx, ax
> > OPND32 35h, 40000h
> > OPND32
> > push ax
> > OPND32
> > popf
> > OPND32
> > pushf
> > OPND32
> > pop ax
> > OPND32
> > xor ax, cx ; AC bit won't toggle, 80386 detected
> > mov sp, bx
> > mov CPU_TYPE, 3 ; 80386 detected
> > jz end_get_cpuid
> >
> > and sp, not 3
> > OPND32
> > push cx
> > OPND32
> > popf
> > mov sp, bx ; restore stack
> >
> >
> > check_80486:
> > mov CPU_TYPE, 4 ; default to 80486
> >
> > OPND32
> > mov ax, cx
> > OPND32 35h, 200000h ; xor ID bit
> > OPND32
> > push ax
> > OPND32
> > popf
> > OPND32
> > pushf
> > OPND32
> > pop ax
> > OPND32
> > xor ax, cx ; cant toggle ID bit
> > je end_get_cpuid
> >
> >
> > check_vendor:
> > mov ID_FLAG, 1
> > OPND32
> > xor ax, ax
> > CPUID
> > OPND32
> > mov word ptr VENDOR_ID, bx
> > OPND32
> > mov word ptr VENDOR_ID[+4], dx
> > OPND32
> > mov word ptr VENDOR_ID[+8], cx
> > mov si, offset VENDOR_ID
> > mov di, offset intel_id
> > mov cx, length intel_id
> >
> > compare:
> > repe cmpsb
> > or cx, cx
> > jnz end_get_cpuid
> >
> > intel_processor:
> > mov INTEL_PROC, 1
> >
> > cpuid_data:
> > OPND32
> > cmp ax, 1
> >
> > jl end_get_cpuid
> > OPND32
> > xor ax, ax
> > OPND32
> > inc ax
> > CPUID
> > mov byte ptr ds:STEPPING, al
> > and STEPPING, STEPPING_MASK
> >
> > and al, MODEL_MASK
> > shr al, MODEL_SHIFT
> > mov byte ptr ds:CPU_MODEL, al
> >
> > and ax, FAMILY_MASK
> > shr ax, FAMILY_SHIFT
> > mov byte ptr ds:CPU_TYPE, al
> >
> > mov dword ptr FEATURE_FLAGS, edx
> >
> > end_get_cpuid:
> > ret
> >
> > get_cpuid endp
> >
> >
> > get_fpuid proc near
> >
> > fninit
> > mov word ptr ds:FP_STATUS, 5a5ah
> >
> > fnstsw word ptr ds:FP_STATUS
> > mov ax, word ptr ds:FP_STATUS
> > cmp al, 0
> >
> > mov FPU_TYPE, 0
> > jne end_get_fpuid
> >
> > check_control_word:
> > fnstcw word ptr ds:FP_STATUS
> > mov ax, word ptr ds:FP_STATUS
> > and ax, 103fh
> > cmp ax, 3fh
> >
> > mov FPU_TYPE, 0
> > jne end_get_fpuid
> > mov FPU_TYPE, 1
> >
> >
> > check_infinity:
> > cmp CPU_TYPE, 3
> > jne end_get_fpuid
> > fld1
> > fldz
> > fdiv
> > fld st
> > fchs
> > fcompp
> > fstsw word ptr ds:FP_STATUS
> > mov ax, word ptr ds:FP_STATUS
> > mov FPU_TYPE, 2
> >
> > sahf
> > jz end_get_fpuid
> > mov FPU_TYPE, 3
> > end_get_fpuid:
> > ret
> > get_fpuid endp
> >
> >
> >
> > print proc near
> > cmp ID_FLAG, 1
> > je print_cpuid_data
> >
> > if (VERBOSE)
> > mov dx, offset id_msg
> > call OutputMessage
> > endif
> >
> > print_86:
> > cmp CPU_TYPE, 0
> > jne print_286
> >
> > if (VERBOSE)
> > mov dx, offset c8086
> > call OutputMessage
> > endif
> > cmp FPU_TYPE, 0
> > je end_print
> >
> > if (VERBOSE)
> > mov dx, offset fp_8087
> > call OutputMessage
> > endif
> > jmp end_print
> >
> > print_286:
> > cmp CPU_TYPE, 2
> > jne print_386
> > if (VERBOSE)
> > mov dx, offset c286
> > call OutputMessage
> > endif
> > cmp FPU_TYPE, 0
> > je end_print
> > if (VERBOSE)
> > mov dx, offset fp_80287
> > call OutputMessage
> > endif
> > jmp end_print
> >
> > print_386:
> > cmp CPU_TYPE, 3
> > jne print_486
> > if (VERBOSE)
> > mov dx, offset c386
> > call OutputMessage
> > endif
> > cmp FPU_TYPE, 0
> > je end_print
> > cmp FPU_TYPE, 2
> > jne print_387
> > if (VERBOSE)
> > mov dx, offset fp_80287
> > call OutputMessage
> > endif
> > jmp end_print
> >
> > print_387:
> > if (VERBOSE)
> > mov dx, offset fp_80387
> > call OutputMessage
> > endif
> > jmp end_print
> >
> > print_486:
> > cmp FPU_TYPE, 0
> > je print_Intel486sx
> > if (VERBOSE)
> > mov dx, offset c486
> > call OutputMessage
> > endif
> > jmp end_print
> >
> > print_Intel486sx:
> > if (VERBOSE)
> > mov dx, offset c486nfp
> > call OutputMessage
> > endif
> > jmp end_print
> >
> > print_cpuid_data:
> >
> > cmp_vendor:
> > cmp INTEL_PROC, 1
> > jne not_GenuineIntel
> >
> > cmp CPU_TYPE, 4
> > jne check_Pentium
> > if (VERBOSE)
> > mov dx, offset Intel486_msg
> > call OutputMessage
> > endif
> > jmp print_family
> >
> > check_Pentium:
> > cmp CPU_TYPE, 5
> > jne check_PentiumPro
> > if (VERBOSE)
> > mov dx, offset Pentium_msg
> > call OutputMessage
> > endif
> > jmp print_family
> >
> > check_PentiumPro:
> > cmp CPU_TYPE, 6
> > jne print_features
> > if (VERBOSE)
> > mov dx, offset PentiumPro_msg
> > call OutputMessage
> > endif
> >
> > print_family:
> >
> > IF VERBOSE
> > mov dx, offset family_msg
> > call OutputMessage
> > ENDIF
> >
> > mov al, byte ptr ds:CPU_TYPE
> > mov byte ptr dataCR, al
> > add byte ptr dataCR, 30h
> >
> > IF VERBOSE
> > mov dx, offset dataCR
> > call OutputMessage
> > ENDIF
> >
> > print_model:
> >
> > IF VERBOSE
> > mov dx, offset model_msg
> > call OutputMessage
> > ENDIF
> >
> > mov al, byte ptr ds:CPU_MODEL
> > mov byte ptr dataCR, al
> > add byte ptr dataCR, 30h
> >
> > IF VERBOSE
> > mov dx, offset dataCR
> > call OutputMessage
> > ENDIF
> >
> > print_features:
> > mov ax, word ptr ds:FEATURE_FLAGS
> > and ax, FPU_FLAG
> > jz check_mce
> >
> > if (VERBOSE)
> > mov dx, offset fpu_msg
> > call OutputMessage
> > ENDIF
> >
> > check_mce:
> > mov ax, word ptr ds:FEATURE_FLAGS
> > and ax, MCE_FLAG
> > jz check_wc
> >
> > IF VERBOSE
> > mov dx, offset mce_msg
> > call OutputMessage
> > ENDIF
> >
> > check_CMPXCHG8B:
> > mov ax, word ptr ds:FEATURE_FLAGS
> > and ax, CMPXCHG8B_FLAG
> > jz check_4MB_paging
> >
> > IF VERBOSE
> > mov dx, offset cmp_msg
> > call OutputMessage
> > ENDIF
> >
> > check_io_break:
> > mov ax, word ptr ds:FEATURE_FLAGS
> > test ax, 4
> > jz check_4MB_paging
> >
> > IF VERBOSE
> > mov dx, offset io_break_msg
> > call OutputMessage
> > ENDIF
> >
> > ; Enable Debugging Extensions bit in CR4
> > CR4_TO_ECX
> > or ecx, 08h
> > ECX_TO_CR4
> >
> > if (VERBOSE)
> > mov dx, offset io_break_enable
> > call OutputMessage
> > endif
> >
> > check_4MB_paging:
> > mov ax, word ptr ds:FEATURE_FLAGS
> > test ax, 08h
> > jz check_PageExtend
> >
> > IF VERBOSE
> > mov dx, offset page_4MB_msg
> > call OutputMessage
> > ENDIF
> >
> > ; Enable page size extension bit in CR4
> > CR4_TO_ECX
> > or ecx, 10h
> > ECX_TO_CR4
> >
> > if (VERBOSE)
> > mov dx, offset p4mb_enable
> > call OutputMessage
> > endif
> >
> > check_PageExtend:
> > mov ax, word ptr ds:FEATURE_FLAGS
> > test ax, 40h
> > jz check_wc
> >
> > ;; DEBUG DEBUG DEBUG !!!
> > ; Enable page address extension bit in CR4
> >
> > ;; CR4_TO_ECX
> > ;; or ecx, 20h
> > ;; ECX_TO_CR4
> >
> > check_wc:
> > mov dx, word ptr ds:FEATURE_FLAGS
> > test dx, 1000h ; MTRR support flag
> > jz end_print
> >
> > if (VERBOSE)
> > mov dx, offset wc_enable
> > call OutputMessage
> > endif
> > jmp end_print
> >
> > not_GenuineIntel:
> > if (VERBOSE)
> > mov dx, offset not_Intel
> > call OutputMessage
> > endif
> >
> > end_print:
> > ret
> > print endp
> >
> >
> > [email protected] wrote:
> > >
> > > On Tue, 7 Nov 2000, Jeff V. Merkey wrote:
> > >
> > > > So how come NetWare and NT can detect this at run time, and we have to
> > > > use a .config option to specifiy it? Come on guys.....
> > >
> > > Then run a kernel compiled for i386 and suffer the poorer code quality
> > > that comes with not using newer instructions and including the
> > > workarounds for ancient hardware.
> > >
> > > -ben
> > >
> > > -
> > > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > > the body of a message to [email protected]
> > > Please read the FAQ at http://www.tux.org/lkml/
> >
"Jeff V. Merkey" wrote:
>
> [email protected] wrote:
> >
> > > There are tests for all this in the feature flags for intel and
> > > non-intel CPUs like AMD -- including MTRR settings. All of this could
> > > be dynamic. Here's some code that does this, and it's similar to
> > > NetWare. It detects CPU type, feature flags, special instructions,
> > > etc. All of this on x86 could be dynamically detected.
> >
> > Detecting the CPU isn't the issue (we already do all this), it's what to
> > do when you've figured out what the CPU is. Show me code that can
> > dynamically adjust the alignment of the routines/variables/structs
> > dependent upon cacheline size.
ftp.timpanogas.org/manos/manos0817.tar.gz
Look in the PE loader -- Microsoft's PE loader can do this since
everything is RVA based. If you want to take the loader and put it in
Linux, be my guest. You can even combine multiple i86 segments all
compiled under different options (or architectures) and bundle them into
a single executable file -- not something gcc can do today -- even with
DLL. This code is almost identical to the PE loader used in NT -- with
one exception, I omit the fs:_THREAD_DLS startup code...
8)
Jeff
>
> If the compiler always aligned all functions and data on 16 byte
> boundaries (NetWare) for all i386 code, it would run a lot faster.
> Cache line alignment could be an option in the loader .... after all,
> it's the loader that locates data in memory. If Linux were PE based,
> relocation logic would be a snap with this model (like NT).
>
> Jeff
>
> >
> > regards,
> >
> > Davej.
> >
> > --
> > | Dave Jones <[email protected]> http://www.suse.de/~davej
> > | SuSE Labs
On Tue, 7 Nov 2000, David Lang wrote:
> depending on what CPU you have the kernel (and compiler) can use different
> commands/optimizations/etc, if you want to do this on boot you have two
> options.
Wouldn't it be possible to compile the parts of the kernel needed to
uncompress and to detect the cpu with lower optimizations and then abort
with an error message?
"Error: Kernel needs a PIII" sounds much better than just stopping dead.
c'ya
sven
--
The Internet treats censorship as a routing problem, and routes around it.
(John Gilmore on http://www.cygnus.com/~gnu/)
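The early check Sven describes can be sketched in a few lines of C. This is only an illustration, assuming the detection code has already read the CPUID feature word (the FEATURE_FLAGS value in the assembly posted earlier) and that the build records which bits the kernel requires. The names check_cpu, feat_names and the FEAT_* macros are hypothetical; the bit positions follow the standard CPUID.1:EDX layout.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* CPUID.1:EDX feature bits (standard layout on Intel and AMD parts). */
#define FEAT_FPU   (1u << 0)
#define FEAT_CX8   (1u << 8)
#define FEAT_MTRR  (1u << 12)
#define FEAT_CMOV  (1u << 15)
#define FEAT_MMX   (1u << 23)
#define FEAT_SSE   (1u << 25)

struct feat_name { uint32_t bit; const char *name; };

/* Hypothetical table used only to print readable diagnostics. */
const struct feat_name feat_names[] = {
    { FEAT_FPU, "fpu" },   { FEAT_CX8, "cx8" }, { FEAT_MTRR, "mtrr" },
    { FEAT_CMOV, "cmov" }, { FEAT_MMX, "mmx" }, { FEAT_SSE, "sse" },
};

/*
 * Compare the detected feature word against the bits the kernel was
 * built to require. Returns 0 and leaves buf empty if the CPU is
 * sufficient; otherwise writes a diagnostic into buf and returns -1.
 */
int check_cpu(uint32_t detected, uint32_t required, char *buf, size_t len)
{
    uint32_t missing = required & ~detected;
    size_t i, used;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    if (missing == 0)
        return 0;

    used = (size_t)snprintf(buf, len, "Error: kernel needs CPU features:");
    for (i = 0; i < sizeof(feat_names) / sizeof(feat_names[0]); i++) {
        /* Stop appending once the buffer is full. */
        if ((missing & feat_names[i].bit) && used < len)
            used += (size_t)snprintf(buf + used, len - used, " %s",
                                     feat_names[i].name);
    }
    return -1;
}
```

The point of the thread applies here too: this routine, and whatever prints its message, would have to be built for the lowest common denominator (i386) so that it can run on the very CPUs it is meant to reject.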
Jeff Merkey wrote:
> There are tests for all this in the feature flags for intel and
> non-intel CPUs like AMD -- including MTRR settings. All of this could
> be dynamic. Here's some code that does this, and it's similar to
> NetWare. It detects CPU type, feature flags, special instructions,
> etc. All of this on x86 could be dynamically detected.
Jeff, I think you miss the point that 100% dynamic detection comes with
a penalty over the current system.
Using CONFIG_M586 enables us to compile with Pentium-specific
instructions, and eliminate any code specific to 386's or 486's. This
includes inlining Pentium-specific code into drivers and the core kernel
where possible, for the maximum possible performance. Your scheme
doesn't work because of all the inlined code, nor does it support
maximum performance code on all processors without massive code bloat...
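The trade-off described above can be made concrete with a small C sketch. It is illustrative only: CONFIG_M586 is the option named in this thread, but the csum_* helpers and csum_select are hypothetical stand-ins for any primitive that has a generic and a CPU-specific version.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Generic version: byte-at-a-time sum, runs on any CPU. */
static inline uint32_t csum_generic(const uint8_t *p, size_t n)
{
    uint32_t s = 0;
    while (n--)
        s += *p++;
    return s;
}

/* Stand-in for a version tuned for a newer CPU (here: simply unrolled). */
static inline uint32_t csum_unrolled(const uint8_t *p, size_t n)
{
    uint32_t s = 0;
    for (; n >= 4; n -= 4, p += 4)
        s += (uint32_t)p[0] + p[1] + p[2] + p[3];
    while (n--)
        s += *p++;
    return s;
}

/*
 * Compile-time selection: the chosen body is inlined at every call site
 * and the other version can be dropped from the image entirely.
 */
#ifdef CONFIG_M586
#define csum(p, n) csum_unrolled((p), (n))
#else
#define csum(p, n) csum_generic((p), (n))
#endif

/*
 * Run-time selection: both versions stay in the image, and each use
 * becomes an indirect call that the compiler cannot inline.
 */
typedef uint32_t (*csum_fn)(const uint8_t *, size_t);

csum_fn csum_select(int cpu_family)
{
    return cpu_family >= 5 ? csum_unrolled : csum_generic;
}
```

With the macro form the cost is paid once, at build time. With the function-pointer form every caller pays for the indirection, and both bodies are carried in the image, which is the code bloat mentioned above.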
You do bring up a good point though. Users compile their own kernels to
get the advantages I describe above. Vendors, on the other hand, must
compile one-size-fits-all generic kernels. Your expertise and
assistance would definitely benefit this case.
One change I would like to make in 2.5.x along these lines -- the Alpha
AXP port allows one to define either CONFIG_ALPHA_GENERIC -- support all
processors/machines -- or CONFIG_ALPHA_$MYMACHINE. It would be nice to
follow that model for x86 too. Currently, when I select CONFIG_M586, I
get code for 686, etc. There is no way to simply say "Pentium and
nothing else".
Jeff
--
Jeff Garzik | "When I do this, my computer freezes."
Building 1024 | -user
MandrakeSoft | "Don't do that."
| -level 1
Sven Koch wrote:
>
> On Tue, 7 Nov 2000, David Lang wrote:
>
> > depending on what CPU you have the kernel (and compiler) can use different
> > commands/optimizations/etc, if you want to do this on boot you have two
> > options.
>
> Wouldn't it be possible to compile the parts of the kernel needed to
> uncompress and to detect the cpu with lower optimizations and then abort
> with an error message?
>
> "Error: Kernel needs a PIII" sounds much better than just stopping dead.
I agree... maybe we can solve this simply by giving the CPU detection
module the -march=i386 flag hardcoded, or editing the bootstrap, or
something like that...
Jeff
--
Jeff Garzik | "When I do this, my computer freezes."
Building 1024 | -user
MandrakeSoft | "Don't do that."
| -level 1
"Jeff V. Merkey" wrote:
>
> "Jeff V. Merkey" wrote:
> >
> > [email protected] wrote:
> > >
> > > > There are tests for all this in the feature flags for intel and
> > > > non-intel CPUs like AMD -- including MTRR settings. All of this could
> > > > be dynamic. Here's some code that does this, and it's similar to
> > > > NetWare. It detects CPU type, feature flags, special instructions,
> > > > etc. All of this on x86 could be dynamically detected.
> > >
> > > Detecting the CPU isn't the issue (we already do all this), it's what to
> > > do when you've figured out what the CPU is. Show me code that can
> > > dynamically adjust the alignment of the routines/variables/structs
> > > dependent upon cacheline size.
>
> ftp.timpanogas.org/manos/manos0817.tar.gz
>
> Look in the PE loader -- Microsoft's PE loader can do this since
> everything is RVA based. If you want to take the loader and put it in
> Linux, be my guest. You can even combine multiple i86 segments all
> compiled under different options (or architectures) and bundle them into
> a single executable file -- not something gcc can do today -- even with
> DLL. This code is almost identical to the PE loader used in NT -- with
> one exception, I omit the fs:_THREAD_DLS startup code...
>
> 8)
>
> Jeff
>
> >
> > If the compiler always aligned all functions and data on 16 byte
> > boundaries (NetWare) for all i386 code, it would run a lot faster.
> > Cache line alignment could be an option in the loader .... after all,
> > it's the loader that locates data in memory. If Linux were PE based,
> > relocation logic would be a snap with this model (like NT).
Also, init.386 has an x86 real mode PE loader as well that could easily
be used to load Linux as a DLL instead of a coff binary. Then you could
combine several executable segments of differing optimizations and
select the correct one at load time. I do this now. It's pretty easy
to do with PE ...
Jeff
> >
> > Jeff
> >
> > >
> > > regards,
> > >
> > > Davej.
> > >
> > > --
> > > | Dave Jones <[email protected]> http://www.suse.de/~davej
> > > | SuSE Labs
"Jeff V. Merkey" wrote:
> If the compiler always aligned all functions and data on 16 byte
> boundaries (NetWare) for all i386 code, it would run a lot faster.
Are you saying that it isn't? Have you looked at gcc-generated assembly
from a recent 2.2.x or 2.4.x kernel?
2.2.x build command line, note use of "...align...":
/usr/bin/kgcc -D__KERNEL__ -I/spare/cvs/linux_2_2/include -Wall
-Wstrict-prototypes -O2 -fomit-frame-pointer -fno-strict-aliasing
-D__SMP__ -pipe -fno-strength-reduce -m486 -malign-loops=2
-malign-jumps=2 -malign-functions=2 -DCPU=686 -c -o extable.o
extable.c
2.4.x, note "preferred-stack-boundary" and generated asm code...
gcc -D__KERNEL__ -I/spare/cvs/linux_2_4/include -Wall
-Wstrict-prototypes -O2 -fomit-frame-pointer -fno-strict-aliasing -pipe
-mpreferred-stack-boundary=2 -march=i686 -DMODULE -DMODVERSIONS -include
/spare/cvs/linux_2_4/include/linux/modversions.h -c -o emd.o emd.c
Jeff
--
Jeff Garzik | "When I do this, my computer freezes."
Building 1024 | -user
MandrakeSoft | "Don't do that."
| -level 1
Jeff Garzik wrote:
>
> Jeff Merkey wrote:
> > There are tests for all this in the feature flags for intel and
> > non-intel CPUs like AMD -- including MTRR settings. All of this could
> > be dynamic. Here's some code that does this, and it's similar to
> > NetWare. It detects CPU type, feature flags, special instructions,
> > etc. All of this on x86 could be dynamically detected.
>
> Jeff, I think you miss the point that 100% dynamic detection comes with
> a penalty over the current system.
>
> Using CONFIG_M586 enables us to compile with Pentium-specific
> instructions, and eliminate any code specific to 386's or 486's. This
> includes inlining Pentium-specific code into drivers and the core kernel
> where possible, for the maximum possible performance. Your scheme
> doesn't work because of all the inlined code, nor does it support
> maximum performance code on all processors without massive code bloat...
>
> You do bring up a good point though. Users compile their own kernels to
> get the advantages I describe above. Vendors, on the other hand, must
> compile one-size-fits-all generic kernels. Your expertise and
> assistance would definitely benefit this case.
>
We need a format that allows multiple executable segments to be combined
in a single executable, and a loader with enough smarts to grab the
right one based on architecture. Two options:
1. Extend gcc to support this, or rearrange Linux into segments based on
code type.
2. Use PE.
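Whichever container format is used, the loader-side choice reduces to a table lookup. The following C sketch assumes a hypothetical per-image table in which each variant records the lowest CPU family able to run it; struct code_variant and pick_variant are illustrative names, not part of PE, ELF, or any existing loader.

```c
#include <assert.h>
#include <stddef.h>

/* One entry per bundled build of the same code (hypothetical layout). */
struct code_variant {
    int         min_family;  /* lowest CPU family able to run this variant */
    const char *name;        /* stand-in for a segment offset and length */
};

/*
 * Pick the most specific variant the detected CPU can run: the entry
 * with the highest min_family that does not exceed cpu_family.
 * Returns NULL when no variant is usable.
 */
const struct code_variant *pick_variant(const struct code_variant *v,
                                        size_t n, int cpu_family)
{
    const struct code_variant *best = NULL;
    size_t i;

    for (i = 0; i < n; i++) {
        if (v[i].min_family > cpu_family)
            continue;  /* built for a newer CPU than the one we have */
        if (best == NULL || v[i].min_family > best->min_family)
            best = &v[i];
    }
    return best;
}
```

The lookup itself is trivial. The cost is in the image, which has to carry every variant listed in the table.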
Jeff
> One change I would like to make in 2.5.x along these lines -- the Alpha
> AXP port allows one to define either CONFIG_ALPHA_GENERIC -- support all
> processors/machines -- or CONFIG_ALPHA_$MYMACHINE. It would be nice to
> follow that model for x86 too. Currently, when I select CONFIG_M586, I
> get code for 686, etc. There is no way to simply say "Pentium and
> nothing else".
>
> Jeff
>
> --
> Jeff Garzik | "When I do this, my computer freezes."
> Building 1024 | -user
> MandrakeSoft | "Don't do that."
> | -level 1
Jeff, the kernel image is already pretty large. If you try to take what
are basically independent kernel images and put them in one file, you will
very quickly end up with something that is too large to use.
As an example, a kernel for a boot floppy needs to be <1.4MB compressed,
and it's not uncommon for it to be >800K compressed as it is. How do you
fit even two of these on a disk?
Remember, it's not just the start of the file that varies based on cacheline
size, it's the positioning of code and data throughout the kernel image.
David Lang
On Tue, 7 Nov 2000, Jeff V. Merkey wrote:
> Date: Tue, 07 Nov 2000 16:47:08 -0700
> From: Jeff V. Merkey <[email protected]>
> To: [email protected], Linux Kernel Mailing List <[email protected]>
> Subject: Re: Installing kernel 2.4
>
>
>
> "Jeff V. Merkey" wrote:
> >
> > [email protected] wrote:
> > >
> > > > There are tests for all this in the feature flags for intel and
> > > > non-intel CPUs like AMD -- including MTRR settings. All of this could
> > > > be dynamic. Here's some code that does this, and it's similar to
> > > > NetWare. It detects CPU type, feature flags, special instructions,
> > > > etc. All of this on x86 could be dynamically detected.
> > >
> > > Detecting the CPU isn't the issue (we already do all this), it's what to
> > > do when you've figured out what the CPU is. Show me code that can
> > > dynamically adjust the alignment of the routines/variables/structs
> > > dependent upon cacheline size.
>
> ftp.timpanogas.org/manos/manos0817.tar.gz
>
> Look in the PE loader -- Microsoft's PE loader can do this since
> everything is RVA based. If you want to take the loader and put it in
> Linux, be my guest. You can even combine multiple i86 segments all
> compiled under different options (or architectures) and bundle them into
> a single executable file -- not something gcc can do today -- even with
> DLL. This code is almost identical to the PE loader used in NT -- with
> one exception, I omit the fs:_THREAD_DLS startup code...
>
> 8)
>
> Jeff
>
>
> >
> > If the compiler always aligned all functions and data on 16 byte
> > boundaries (NetWare) for all i386 code, it would run a lot faster.
> > Cache line alignment could be an option in the loader .... after all,
> > it's the loader that locates data in memory. If Linux were PE based,
> > relocation logic would be a snap with this model (like NT).
> >
> > Jeff
> >
> > >
> > > regards,
> > >
> > > Davej.
> > >
> > > --
> > > | Dave Jones <[email protected]> http://www.suse.de/~davej
> > > | SuSE Labs
Code is. Data isn't. Gcc packs data into the segment like sardines in
a can (NT code does too). 16 byte align this as well. NetWare 16 byte
aligns everything with an align 16 directive in the data segments of
assembler modules.
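For C data, the equivalent of an assembler "align 16" directive can be requested per object. A minimal sketch, assuming a C11 compiler (GCC also accepts __attribute__((aligned(16))) for the same purpose); the object names and the is_aligned helper are hypothetical.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

struct counters {
    uint32_t hits;
    uint32_t misses;
};

/* Without alignas these would only get their natural (4- or 1-byte) alignment. */
alignas(16) struct counters hot_counters;
alignas(16) unsigned char scratch[40];

/* Returns nonzero if p sits on the given power-of-two boundary. */
int is_aligned(const void *p, uintptr_t boundary)
{
    assert(boundary != 0 && (boundary & (boundary - 1)) == 0);
    return ((uintptr_t)p & (boundary - 1)) == 0;
}
```

Note that this fixes the alignment at compile time. It does not answer the earlier question of adapting the layout to the cache line size found at boot.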
Jeff
Jeff Garzik wrote:
>
> "Jeff V. Merkey" wrote:
> > If the compiler always aligned all functions and data on 16 byte
> > boundaries (NetWare) for all i386 code, it would run a lot faster.
>
> Are you saying that it isn't? Have you looked at gcc-generated assembly
> from a recent 2.2.x or 2.4.x kernel?
>
> 2.2.x build command line, note use of "...align...":
> /usr/bin/kgcc -D__KERNEL__ -I/spare/cvs/linux_2_2/include -Wall
> -Wstrict-prototypes -O2 -fomit-frame-pointer -fno-strict-aliasing
> -D__SMP__ -pipe -fno-strength-reduce -m486 -malign-loops=2
> -malign-jumps=2 -malign-functions=2 -DCPU=686 -c -o extable.o
> extable.c
>
> 2.4.x, note "preferred-stack-boundary" and generated asm code...
> gcc -D__KERNEL__ -I/spare/cvs/linux_2_4/include -Wall
> -Wstrict-prototypes -O2 -fomit-frame-pointer -fno-strict-aliasing -pipe
> -mpreferred-stack-boundary=2 -march=i686 -DMODULE -DMODVERSIONS -include
> /spare/cvs/linux_2_4/include/linux/modversions.h -c -o emd.o emd.c
>
> Jeff
>
> --
> Jeff Garzik | "When I do this, my computer freezes."
> Building 1024 | -user
> MandrakeSoft | "Don't do that."
> | -level 1
"Jeff V. Merkey" wrote:
> We need a format that allows multiple executable segments to be combined
> in a single executable, and a loader with enough smarts to grab the
> right one based on architecture. Two options:
>
> 1. Extend gcc to support this, or rearrange Linux into segments based on
> code type.
> 2. Use PE.
The kernel isn't going non-ELF. Too painful, for dubious advantages,
namely:
The current gcc toolchain already supports what you suggest.
I understand that some people have even put some thought into a
bootloader that dynamically links your kernel on bootup, so this idea
isn't new. It's a good idea though.
Jeff
--
Jeff Garzik | "When I do this, my computer freezes."
Building 1024 | -user
MandrakeSoft | "Don't do that."
| -level 1
David Lang wrote:
>
> Jeff, the kernel image is already pretty large. If you try to take what
> are basically independent kernel images and put them in one file, you will
> very quickly end up with something that is too large to use.
>
> As an example, a kernel for a boot floppy needs to be <1.4MB compressed,
> and it's not uncommon for it to be >800K compressed as it is. How do you
> fit even two of these on a disk?
>
> Remember, it's not just the start of the file that varies based on cacheline
> size, it's the positioning of code and data throughout the kernel image.
>
Understood. I will go off, give this some thought and study, and respond
later once I have a proposal on the best way to do this. In NetWare,
we had indirections in the code all over the place. NT just makes huge
and fat programs (NTOSKRNL.EXE is absolutely huge).
Jeff
> David Lang
>
> On Tue, 7 Nov 2000, Jeff V. Merkey wrote:
>
> > Date: Tue, 07 Nov 2000 16:47:08 -0700
> > From: Jeff V. Merkey <[email protected]>
> > To: [email protected], Linux Kernel Mailing List <[email protected]>
> > Subject: Re: Installing kernel 2.4
> >
> >
> >
> > "Jeff V. Merkey" wrote:
> > >
> > > [email protected] wrote:
> > > >
> > > > > There are tests for all this in the feature flags for intel and
> > > > > non-intel CPUs like AMD -- including MTRR settings. All of this could
> > > > > be dynamic. Here's some code that does this, and it's similar to
> > > > > NetWare. It detects CPU type, feature flags, special instructions,
> > > > > etc. All of this on x86 could be dynamically detected.
> > > >
> > > > Detecting the CPU isn't the issue (we already do all this), it's what to
> > > > do when you've figured out what the CPU is. Show me code that can
> > > > dynamically adjust the alignment of the routines/variables/structs
> > > > dependent upon cacheline size.
> >
> > ftp.timpanogas.org/manos/manos0817.tar.gz
> >
> > Look in the PE loader -- Microsoft's PE loader can do this since
> > everything is RVA based. If you want to take the loader and put it in
> > Linux, be my guest. You can even combine multiple i86 segments all
> > compiled under different options (or architectures) and bundle them into
> > a single executable file -- not something gcc can do today -- even with
> > DLL. This code is almost identical to the PE loader used in NT -- with
> > one exception, I omit the fs:_THREAD_DLS startup code...
> >
> > 8)
> >
> > Jeff
> >
> >
> > >
> > > If the compiler always aligned all functions and data on 16 byte
> > > boundaries (NetWare) for all i386 code, it would run a lot faster.
> > > Cache line alignment could be an option in the loader .... after all,
> > > it's the loader that locates data in memory. If Linux were PE based,
> > > relocation logic would be a snap with this model (like NT).
> > >
> > > Jeff
> > >
> > > >
> > > > regards,
> > > >
> > > > Davej.
> > > >
> > > > --
> > > > | Dave Jones <[email protected]> http://www.suse.de/~davej
> > > > | SuSE Labs
Jeff Garzik wrote:
>
> "Jeff V. Merkey" wrote:
> > We need a format that allows multiple executable segments to be combined
> > in a single executable, and a loader with enough smarts to grab the
> > right one based on architecture. Two options:
> >
> > 1. Extend gcc to support this, or rearrange Linux into segments based on
> > code type.
> > 2. Use PE.
>
> The kernel isn't going non-ELF. Too painful, for dubious advantages,
> namely:
>
Perhaps we should extend ELF. After all, where Linux goes, gcc
follows....
Jeff
> The current gcc toolchain already supports what you suggest.
>
> I understand that some people have even put some thought into a
> bootloader that dynamically links your kernel on bootup, so this idea
> isn't new. It's a good idea though.
>
> Jeff
>
> --
> Jeff Garzik | "When I do this, my computer freezes."
> Building 1024 | -user
> MandrakeSoft | "Don't do that."
> | -level 1
It seems to me that kernel/cpu matching can be broken into two relatively
simple parts.
1 - Put a cpu "signature" in the kernel image indicating cpu requirements; and
2 - Have the bootloader (lilo) detect cpu type and match it against the cpu
"signature".
The bootloader would then load the kernel, or could give an informative
diagnostic.
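The two-part scheme above might look like this on the bootloader side. Everything in this C sketch is an assumption made for illustration: the "CPUS" magic, the struct cpu_signature layout, its placement at the start of the buffer handed to match_signature, and the result codes. Neither lilo nor the kernel image format defines such a record.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CPUSIG_MAGIC "CPUS"   /* hypothetical marker, 4 bytes, no NUL */

/* Hypothetical record the kernel build would embed in the image. */
struct cpu_signature {
    char     magic[4];
    uint8_t  min_family;          /* e.g. 6 for a kernel built for i686 */
    uint8_t  reserved[3];
    uint32_t required_features;   /* CPUID.1:EDX bits the kernel relies on */
};

enum sig_result {
    SIG_OK,                /* CPU satisfies the signature */
    SIG_ABSENT,            /* no signature found: boot as before */
    SIG_FAMILY_TOO_OLD,
    SIG_MISSING_FEATURES
};

/* Bootloader side: compare the embedded record with the detected CPU. */
enum sig_result match_signature(const void *image, size_t image_len,
                                int cpu_family, uint32_t cpu_features)
{
    struct cpu_signature sig;

    if (image_len < sizeof sig)
        return SIG_ABSENT;
    memcpy(&sig, image, sizeof sig);   /* avoid unaligned access */
    if (memcmp(sig.magic, CPUSIG_MAGIC, 4) != 0)
        return SIG_ABSENT;
    if (cpu_family < sig.min_family)
        return SIG_FAMILY_TOO_OLD;
    if ((sig.required_features & ~cpu_features) != 0)
        return SIG_MISSING_FEATURES;
    return SIG_OK;
}
```

Treating a missing signature as "boot as before" would keep older kernels loadable by a bootloader that has learned the check.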
David
At 06:59 PM 11/7/00, Jeff Garzik wrote:
>Sven Koch wrote:
> >
> > On Tue, 7 Nov 2000, David Lang wrote:
> >
> > > depending on what CPU you have the kernel (and compiler) can use different
> > > commands/optimizations/etc, if you want to do this on boot you have two
> > > options.
> >
> > Wouldn't it be possible to compile the parts of the kernel needed to
> > uncompress and to detect the cpu with lower optimizations and then abort
> > with an error message?
> >
> > "Error: Kernel needs a PIII" sounds much better than just stopping dead.
>
>I agree... maybe we can solve this simply by giving the CPU detection
>module the -march=i386 flag hardcoded, or editing the bootstrap, or
>something like that...
>
> Jeff
--------------------------------------------------------
David Relson Osage Software Systems, Inc.
[email protected] 514 W. Keech Ave.
http://www.osagesoftware.com Ann Arbor, MI 48103
voice: 734.821.8800 fax: 734.821.8800
David Relson wrote:
>
> It seems to me that kernel/cpu matching can be broken into two relatively
> simple parts.
>
> 1 - Put a cpu "signature" in the kernel image indicating cpu requirements; and
> 2 - Have the bootloader (lilo) detect cpu type and match it against the cpu
> "signature".
>
> The bootloader would then load the kernel, or could give an informative
> diagnostic.
>
The PE model uses flags to identify CPU type and capabilities, and creates
a table in the RVA section header that describes all the segments, i.e.
#define IMAGE_SIZEOF_FILE_HEADER 20
#define IMAGE_FILE_MACHINE_UNKNOWN 0
#define IMAGE_FILE_MACHINE_I860 0x14d
#define IMAGE_FILE_MACHINE_I386 0x14c
#define IMAGE_FILE_MACHINE_R3000 0x162
#define IMAGE_FILE_MACHINE_R4000 0x166
#define IMAGE_FILE_MACHINE_R10000 0x168
#define IMAGE_FILE_MACHINE_ALPHA 0x184
#define IMAGE_FILE_MACHINE_POWERPC 0x1F0
typedef struct _IMAGE_DATA_DIRECTORY
{
DWORD VirtualAddress;
DWORD Size;
} IMAGE_DATA_DIRECTORY,*PIMAGE_DATA_DIRECTORY;
#define IMAGE_NUMBEROF_DIRECTORY_ENTRIES 16
/* Optional coff header - used by NT to provide additional information. */
typedef struct _IMAGE_OPTIONAL_HEADER
{
/*
* Standard fields.
*/
WORD Magic;
BYTE MajorLinkerVersion;
BYTE MinorLinkerVersion;
DWORD SizeOfCode;
DWORD SizeOfInitializedData;
DWORD SizeOfUninitializedData;
DWORD AddressOfEntryPoint;
DWORD BaseOfCode;
DWORD BaseOfData;
/*
* NT additional fields.
*/
DWORD ImageBase;
DWORD SectionAlignment;
DWORD FileAlignment;
WORD MajorOperatingSystemVersion;
WORD MinorOperatingSystemVersion;
WORD MajorImageVersion;
WORD MinorImageVersion;
WORD MajorSubsystemVersion;
WORD MinorSubsystemVersion;
DWORD Reserved1;
DWORD SizeOfImage;
DWORD SizeOfHeaders;
DWORD CheckSum;
WORD Subsystem;
WORD DllCharacteristics;
DWORD SizeOfStackReserve;
DWORD SizeOfStackCommit;
DWORD SizeOfHeapReserve;
DWORD SizeOfHeapCommit;
DWORD LoaderFlags;
DWORD NumberOfRvaAndSizes;
IMAGE_DATA_DIRECTORY DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES];
} IMAGE_OPTIONAL_HEADER,*PIMAGE_OPTIONAL_HEADER;
/* These are indexes into the DataDirectory array */
#define IMAGE_FILE_EXPORT_DIRECTORY 0
#define IMAGE_FILE_IMPORT_DIRECTORY 1
#define IMAGE_FILE_RESOURCE_DIRECTORY 2
#define IMAGE_FILE_EXCEPTION_DIRECTORY 3
#define IMAGE_FILE_SECURITY_DIRECTORY 4
#define IMAGE_FILE_BASE_RELOCATION_TABLE 5
#define IMAGE_FILE_DEBUG_DIRECTORY 6
#define IMAGE_FILE_DESCRIPTION_STRING 7
#define IMAGE_FILE_MACHINE_VALUE 8 /* Mips */
#define IMAGE_FILE_THREAD_LOCAL_STORAGE 9
#define IMAGE_FILE_CALLBACK_DIRECTORY 10
/* Directory Entries, indices into the DataDirectory array */
#define IMAGE_DIRECTORY_ENTRY_EXPORT 0
#define IMAGE_DIRECTORY_ENTRY_IMPORT 1
#define IMAGE_DIRECTORY_ENTRY_RESOURCE 2
#define IMAGE_DIRECTORY_ENTRY_EXCEPTION 3
#define IMAGE_DIRECTORY_ENTRY_SECURITY 4
#define IMAGE_DIRECTORY_ENTRY_BASERELOC 5
#define IMAGE_DIRECTORY_ENTRY_DEBUG 6
#define IMAGE_DIRECTORY_ENTRY_COPYRIGHT 7
#define IMAGE_DIRECTORY_ENTRY_GLOBALPTR 8 /* (MIPS GP) */
#define IMAGE_DIRECTORY_ENTRY_TLS 9
#define IMAGE_DIRECTORY_ENTRY_LOAD_CONFIG 10
#define IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT 11
#define IMAGE_DIRECTORY_ENTRY_IAT 12 /* Import Address Table */
/* Subsystem Values */
#define IMAGE_SUBSYSTEM_UNKNOWN 0
#define IMAGE_SUBSYSTEM_NATIVE 1
#define IMAGE_SUBSYSTEM_WINDOWS_GUI 2 /* Windows GUI subsystem */
#define IMAGE_SUBSYSTEM_WINDOWS_CUI 3 /* Windows character subsystem*/
#define IMAGE_SUBSYSTEM_OS2_CUI 5
#define IMAGE_SUBSYSTEM_POSIX_CUI 7
typedef struct _IMAGE_NT_HEADERS {
DWORD Signature;
IMAGE_FILE_HEADER FileHeader;
IMAGE_OPTIONAL_HEADER OptionalHeader;
} IMAGE_NT_HEADERS,*PIMAGE_NT_HEADERS;
/* Section header format */
#define IMAGE_SIZEOF_SHORT_NAME 8
typedef struct _IMAGE_SECTION_HEADER {
BYTE Name[IMAGE_SIZEOF_SHORT_NAME];
union {
DWORD PhysicalAddress;
DWORD VirtualSize;
} Misc;
DWORD VirtualAddress;
DWORD SizeOfRawData;
DWORD PointerToRawData;
DWORD PointerToRelocations;
DWORD PointerToLinenumbers;
WORD NumberOfRelocations;
WORD NumberOfLinenumbers;
DWORD Characteristics;
} IMAGE_SECTION_HEADER,*PIMAGE_SECTION_HEADER;
#define IMAGE_SIZEOF_SECTION_HEADER 40
/* These defines are for the Characteristics bitfield. */
/* #define IMAGE_SCN_TYPE_REG 0x00000000 - Reserved */
/* #define IMAGE_SCN_TYPE_DSECT 0x00000001 - Reserved */
/* #define IMAGE_SCN_TYPE_NOLOAD 0x00000002 - Reserved */
/* #define IMAGE_SCN_TYPE_GROUP 0x00000004 - Reserved */
/* #define IMAGE_SCN_TYPE_NO_PAD 0x00000008 - Reserved */
/* #define IMAGE_SCN_TYPE_COPY 0x00000010 - Reserved */
#define IMAGE_SCN_CNT_CODE 0x00000020
#define IMAGE_SCN_CNT_INITIALIZED_DATA 0x00000040
#define IMAGE_SCN_CNT_UNINITIALIZED_DATA 0x00000080
#define IMAGE_SCN_LNK_OTHER 0x00000100
#define IMAGE_SCN_LNK_INFO 0x00000200
#define IMAGE_SCN_LNK_OVERLAY 0x00000400
#define IMAGE_SCN_LNK_REMOVE 0x00000800
#define IMAGE_SCN_LNK_COMDAT 0x00001000
/* 0x00002000 - Reserved */
/* #define IMAGE_SCN_MEM_PROTECTED 0x00004000 - Obsolete */
#define IMAGE_SCN_MEM_FARDATA 0x00008000
/* #define IMAGE_SCN_MEM_SYSHEAP 0x00010000 - Obsolete */
#define IMAGE_SCN_MEM_PURGEABLE 0x00020000
#define IMAGE_SCN_MEM_16BIT 0x00020000
#define IMAGE_SCN_MEM_LOCKED 0x00040000
#define IMAGE_SCN_MEM_PRELOAD 0x00080000
#define IMAGE_SCN_ALIGN_1BYTES 0x00100000
#define IMAGE_SCN_ALIGN_2BYTES 0x00200000
#define IMAGE_SCN_ALIGN_4BYTES 0x00300000
#define IMAGE_SCN_ALIGN_8BYTES 0x00400000
#define IMAGE_SCN_ALIGN_16BYTES 0x00500000 /* Default */
#define IMAGE_SCN_ALIGN_32BYTES 0x00600000
#define IMAGE_SCN_ALIGN_64BYTES 0x00700000
/* 0x00800000 - Unused */
#define IMAGE_SCN_LNK_NRELOC_OVFL 0x01000000
#define IMAGE_SCN_MEM_DISCARDABLE 0x02000000
#define IMAGE_SCN_MEM_NOT_CACHED 0x04000000
#define IMAGE_SCN_MEM_NOT_PAGED 0x08000000
#define IMAGE_SCN_MEM_SHARED 0x10000000
#define IMAGE_SCN_MEM_EXECUTE 0x20000000
#define IMAGE_SCN_MEM_READ 0x40000000
#define IMAGE_SCN_MEM_WRITE 0x80000000
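The IMAGE_SCN_ALIGN_* values above are not independent bits. They are a small code n stored in bits 20-23 of Characteristics, with alignment = 2^(n-1) bytes, so a loader can recover the requested section alignment with a shift. A minimal sketch; SCN_ALIGN_MASK, SCN_ALIGN_SHIFT and section_alignment are local names chosen for this example.

```c
#include <assert.h>
#include <stdint.h>

#define SCN_ALIGN_MASK   0x00F00000u   /* bits 20-23 of Characteristics */
#define SCN_ALIGN_SHIFT  20

/*
 * Decode the alignment requested by a section header.
 * Returns the alignment in bytes, or 0 if the field is empty.
 */
uint32_t section_alignment(uint32_t characteristics)
{
    uint32_t code = (characteristics & SCN_ALIGN_MASK) >> SCN_ALIGN_SHIFT;

    return code ? (1u << (code - 1)) : 0;
}
```

For example, a section marked IMAGE_SCN_ALIGN_16BYTES (0x00500000) decodes to code 5 and therefore 1 << 4 = 16 bytes.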
/* Import name entry */
typedef struct _IMAGE_IMPORT_BY_NAME {
WORD Hint;
BYTE Name[1];
} IMAGE_IMPORT_BY_NAME,*PIMAGE_IMPORT_BY_NAME;
/* Import thunk */
typedef struct _IMAGE_THUNK_DATA {
union
{
LPBYTE ForwarderString;
LPDWORD Function;
DWORD Ordinal;
PIMAGE_IMPORT_BY_NAME AddressOfData;
} u1;
} IMAGE_THUNK_DATA,*PIMAGE_THUNK_DATA;
/* Import module directory */
typedef struct _IMAGE_IMPORT_DESCRIPTOR {
union
{
DWORD Characteristics; /* 0 for terminating null import descriptor */
PIMAGE_THUNK_DATA OriginalFirstThunk; /* RVA to original unbound IAT */
} u;
DWORD TimeDateStamp; /* 0 if not bound,
* -1 if bound, and real date\time stamp
* in IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT
* (new BIND)
* otherwise date/time stamp of DLL bound to
* (Old BIND)
*/
DWORD ForwarderChain; /* -1 if no forwarders */
DWORD Name;
/* RVA to IAT (if bound this IAT has actual addresses) */
PIMAGE_THUNK_DATA FirstThunk;
} IMAGE_IMPORT_DESCRIPTOR,*PIMAGE_IMPORT_DESCRIPTOR;
#define IMAGE_ORDINAL_FLAG 0x80000000
#define IMAGE_SNAP_BY_ORDINAL(Ordinal) (((Ordinal) & IMAGE_ORDINAL_FLAG) != 0)
#define IMAGE_ORDINAL(Ordinal) ((Ordinal) & 0xffff)
/* Export module directory */
typedef struct _IMAGE_EXPORT_DIRECTORY {
DWORD Characteristics;
DWORD TimeDateStamp;
WORD MajorVersion;
WORD MinorVersion;
DWORD Name;
DWORD Base;
DWORD NumberOfFunctions;
DWORD NumberOfNames;
LPDWORD *AddressOfFunctions;
LPDWORD *AddressOfNames;
LPWORD *AddressOfNameOrdinals;
/* u_char ModuleName[1]; */
} IMAGE_EXPORT_DIRECTORY,*PIMAGE_EXPORT_DIRECTORY;
/*
* Resource directory stuff
*/
typedef struct _IMAGE_RESOURCE_DIRECTORY {
DWORD Characteristics;
DWORD TimeDateStamp;
WORD MajorVersion;
WORD MinorVersion;
WORD NumberOfNamedEntries;
WORD NumberOfIdEntries;
/* IMAGE_RESOURCE_DIRECTORY_ENTRY DirectoryEntries[]; */
} IMAGE_RESOURCE_DIRECTORY,*PIMAGE_RESOURCE_DIRECTORY;
#define IMAGE_RESOURCE_NAME_IS_STRING 0x80000000
#define IMAGE_RESOURCE_DATA_IS_DIRECTORY 0x80000000
typedef struct _IMAGE_RESOURCE_DIRECTORY_ENTRY {
union {
struct {
// DWORD NameOffset:31;
// DWORD NameIsString:1;
// fix jmerkey!
DWORD NameOffset;
} s;
DWORD Name;
WORD Id;
} u1;
union {
DWORD OffsetToData;
struct {
// DWORD OffsetToDirectory:31;
// DWORD DataIsDirectory:1;
// fix jmerkey!
//
DWORD OffsetToDirectory;
} s;
} u2;
} IMAGE_RESOURCE_DIRECTORY_ENTRY,*PIMAGE_RESOURCE_DIRECTORY_ENTRY;
typedef struct tagImportDirectory {
DWORD RVAFunctionNameList;
DWORD UseLess1;
DWORD UseLess2;
DWORD RVAModuleName;
DWORD RVAFunctionAddressList;
} IMAGE_IMPORT_MODULE_DIRECTORY, *PIMAGE_IMPORT_MODULE_DIRECTORY;
typedef struct _IMAGE_RESOURCE_DIRECTORY_STRING {
WORD Length;
CHAR NameString[1];
} IMAGE_RESOURCE_DIRECTORY_STRING,*PIMAGE_RESOURCE_DIRECTORY_STRING;
typedef struct _IMAGE_RESOURCE_DIR_STRING_U {
WORD Length;
WCHAR NameString[1];
} IMAGE_RESOURCE_DIR_STRING_U,*PIMAGE_RESOURCE_DIR_STRING_U;
typedef struct _IMAGE_RESOURCE_DATA_ENTRY {
DWORD OffsetToData;
DWORD Size;
DWORD CodePage;
DWORD Reserved;
} IMAGE_RESOURCE_DATA_ENTRY,*PIMAGE_RESOURCE_DATA_ENTRY;
typedef struct _IMAGE_BASE_RELOCATION
{
DWORD VirtualAddress;
DWORD SizeOfBlock;
WORD TypeOffset[1];
} IMAGE_BASE_RELOCATION,*PIMAGE_BASE_RELOCATION;
typedef struct _IMAGE_LOAD_CONFIG_DIRECTORY {
DWORD Characteristics;
DWORD TimeDateStamp;
WORD MajorVersion;
WORD MinorVersion;
DWORD GlobalFlagsClear;
DWORD GlobalFlagsSet;
DWORD CriticalSectionDefaultTimeout;
DWORD DeCommitFreeBlockThreshold;
DWORD DeCommitTotalFreeThreshold;
LPVOID LockPrefixTable;
DWORD MaximumAllocationSize;
DWORD VirtualMemoryThreshold;
DWORD ProcessHeapFlags;
DWORD Reserved[4];
} IMAGE_LOAD_CONFIG_DIRECTORY,*PIMAGE_LOAD_CONFIG_DIRECTORY;
typedef VOID (*PIMAGE_TLS_CALLBACK)(
LPVOID DllHandle,DWORD Reason,LPVOID Reserved
);
typedef struct _IMAGE_TLS_DIRECTORY {
DWORD StartAddressOfRawData;
DWORD EndAddressOfRawData;
LPDWORD AddressOfIndex;
PIMAGE_TLS_CALLBACK *AddressOfCallBacks;
DWORD SizeOfZeroFill;
DWORD Characteristics;
} IMAGE_TLS_DIRECTORY,*PIMAGE_TLS_DIRECTORY;
/*
* The IMAGE_DEBUG_DIRECTORY data directory points to an array of
* these structures.
*/
typedef struct _IMAGE_DEBUG_DIRECTORY {
DWORD Characteristics;
DWORD TimeDateStamp;
WORD MajorVersion;
WORD MinorVersion;
DWORD Type;
DWORD SizeOfData;
DWORD AddressOfRawData;
DWORD PointerToRawData;
} IMAGE_DEBUG_DIRECTORY,*PIMAGE_DEBUG_DIRECTORY;
/*
* The type field above can take these (plus a few other
* irrelevant) values.
*/
#define IMAGE_DEBUG_TYPE_UNKNOWN 0
#define IMAGE_DEBUG_TYPE_COFF 1
#define IMAGE_DEBUG_TYPE_CODEVIEW 2
#define IMAGE_DEBUG_TYPE_FPO 3
#define IMAGE_DEBUG_TYPE_MISC 4
#define IMAGE_DEBUG_TYPE_EXCEPTION 5
#define IMAGE_DEBUG_TYPE_FIXUP 6
#define IMAGE_DEBUG_TYPE_OMAP_TO_SRC 7
#define IMAGE_DEBUG_TYPE_OMAP_FROM_SRC 8
#define IMAGE_REL_BASED_ABSOLUTE 0
#define IMAGE_REL_BASED_HIGH 1
#define IMAGE_REL_BASED_LOW 2
#define IMAGE_REL_BASED_HIGHLOW 3
#define IMAGE_REL_BASED_HIGHADJ 4
#define IMAGE_REL_BASED_MIPS_JMPADDR 5
/*
* This is the structure that appears at the very start of a .DBG file.
*/
typedef struct _IMAGE_SEPARATE_DEBUG_HEADER {
WORD Signature;
WORD Flags;
WORD Machine;
WORD Characteristics;
DWORD TimeDateStamp;
DWORD CheckSum;
DWORD ImageBase;
DWORD SizeOfImage;
DWORD NumberOfSections;
DWORD ExportedNamesSize;
DWORD DebugDirectorySize;
DWORD Reserved[3];
} IMAGE_SEPARATE_DEBUG_HEADER,*PIMAGE_SEPARATE_DEBUG_HEADER;
#define IMAGE_SEPARATE_DEBUG_SIGNATURE 0x4944
#define IMAGE_LIBRARY_PROCESS_INIT 1
#define IMAGE_LIBRARY_PROCESS_TERM 8
#define IMAGE_LIBRARY_THREAD_INIT 4
#define IMAGE_LIBRARY_THREAD_TERM 2
#define IMAGE_LOADER_FLAGS_BREAK_ON_LOAD 1
#define IMAGE_LOADER_FLAGS_DEBUG_ON_LOAD 2
#define IMAGE_SYM_UNDEFINED (short) 0
#define IMAGE_SYM_ABSOLUTE (short) -1
#define IMAGE_SYM_DEBUG (short) -2
//
// Type (derived) values.
//
#define IMAGE_SYM_DTYPE_NULL 0 // no derived type.
#define IMAGE_SYM_DTYPE_POINTER 1 // pointer.
#define IMAGE_SYM_DTYPE_FUNCTION 2 // function.
#define IMAGE_SYM_DTYPE_ARRAY 3 // array.
//
// Storage classes.
//
#define IMAGE_SYM_CLASS_END_OF_FUNCTION (BYTE)-1
#define IMAGE_SYM_CLASS_NULL 0x0000
#define IMAGE_SYM_CLASS_AUTOMATIC 0x0001
#define IMAGE_SYM_CLASS_EXTERNAL 0x0002
#define IMAGE_SYM_CLASS_STATIC 0x0003
#define IMAGE_SYM_CLASS_REGISTER 0x0004
#define IMAGE_SYM_CLASS_EXTERNAL_DEF 0x0005
#define IMAGE_SYM_CLASS_LABEL 0x0006
#define IMAGE_SYM_CLASS_UNDEFINED_LABEL 0x0007
#define IMAGE_SYM_CLASS_MEMBER_OF_STRUCT 0x0008
#define IMAGE_SYM_CLASS_ARGUMENT 0x0009
#define IMAGE_SYM_CLASS_STRUCT_TAG 0x000A
#define IMAGE_SYM_CLASS_MEMBER_OF_UNION 0x000B
#define IMAGE_SYM_CLASS_UNION_TAG 0x000C
#define IMAGE_SYM_CLASS_TYPE_DEFINITION 0x000D
#define IMAGE_SYM_CLASS_UNDEFINED_STATIC 0x000E
#define IMAGE_SYM_CLASS_ENUM_TAG 0x000F
#define IMAGE_SYM_CLASS_MEMBER_OF_ENUM 0x0010
#define IMAGE_SYM_CLASS_REGISTER_PARAM 0x0011
#define IMAGE_SYM_CLASS_BIT_FIELD 0x0012
#define IMAGE_SYM_CLASS_FAR_EXTERNAL 0x0044
#define IMAGE_SYM_CLASS_BLOCK 0x0064
#define IMAGE_SYM_CLASS_FUNCTION 0x0065
#define IMAGE_SYM_CLASS_END_OF_STRUCT 0x0066
#define IMAGE_SYM_CLASS_FILE 0x0067
// new
#define IMAGE_SYM_CLASS_SECTION 0x0068
#define IMAGE_SYM_CLASS_WEAK_EXTERNAL 0x0069
As you can see, theirs is quite complete and well thought out. Thanks
to Mr. Bill Gates for the wonderful source code license and the freedom
to use residuals in MANOS for whatever we please.
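For anyone who has not walked these structures before, here is a minimal sketch of how a loader might consume the IMAGE_BASE_RELOCATION blocks declared above. The name apply_relocs and the flat-buffer layout are assumptions made for illustration; this is not the MANOS or Microsoft loader code. It handles only the HIGHLOW and ABSOLUTE types and assumes a little-endian host, as on x86.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IMAGE_REL_BASED_ABSOLUTE 0
#define IMAGE_REL_BASED_HIGHLOW  3

/* Same layout as IMAGE_BASE_RELOCATION above, minus the TypeOffset tail. */
typedef struct {
    uint32_t VirtualAddress;   /* RVA of the page this block patches */
    uint32_t SizeOfBlock;      /* header plus all 16-bit entries, in bytes */
} reloc_block_hdr;

/*
 * Hypothetical helper: apply every relocation block in relocs[0..relocs_len)
 * to an image already copied to `image`. delta is the actual load address
 * minus the preferred ImageBase. Returns the number of fixups applied, or
 * -1 if the table is malformed or uses a type this sketch does not handle.
 */
static int apply_relocs(uint8_t *image, size_t image_len,
                        const uint8_t *relocs, size_t relocs_len,
                        uint32_t delta)
{
    size_t pos = 0;
    int applied = 0;

    assert(image != NULL && relocs != NULL);
    while (pos + sizeof(reloc_block_hdr) <= relocs_len) {
        reloc_block_hdr hdr;
        memcpy(&hdr, relocs + pos, sizeof hdr);
        if (hdr.SizeOfBlock < sizeof hdr || hdr.SizeOfBlock > relocs_len - pos)
            return -1;

        size_t count = (hdr.SizeOfBlock - sizeof hdr) / sizeof(uint16_t);
        for (size_t i = 0; i < count; i++) {
            uint16_t entry;
            memcpy(&entry, relocs + pos + sizeof hdr + i * 2, sizeof entry);
            unsigned type = entry >> 12;            /* high 4 bits: type */
            uint32_t rva = hdr.VirtualAddress + (entry & 0x0fff);

            if (type == IMAGE_REL_BASED_ABSOLUTE)
                continue;                           /* padding, no fixup */
            if (type != IMAGE_REL_BASED_HIGHLOW || (size_t)rva + 4 > image_len)
                return -1;

            uint32_t value;
            memcpy(&value, image + rva, sizeof value);
            value += delta;                         /* rebase the address */
            memcpy(image + rva, &value, sizeof value);
            applied++;
        }
        pos += hdr.SizeOfBlock;
    }
    return applied;
}
```

Everything is expressed as an RVA plus a delta, so the loader never needs to know where the linker originally placed the image.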
Extend ELF to use the ELF-LPE format (a new term for what we will invent
- the ELF Linux Portable Executable Format). This would also allow us
to combine Sparc, Alpha, x86 segments into a single executable so the
customer won't have to recompile the kernel every time they want to
change something. SMP, non-SMP could be handled the same as well.
Jeff
> David
>
> At 06:59 PM 11/7/00, Jeff Garzik wrote:
> >Sven Koch wrote:
> > >
> > > On Tue, 7 Nov 2000, David Lang wrote:
> > >
> > > > depending on what CPU you have the kernel (and compiler) can use
> > different
> > > > commands/opmizations/etc, if you want to do this on boot you have two
> > > > options.
> > >
> > > Wouldn't it be possible to compile the parts of the kernel needed to
> > > uncompress and to detect the cpu with lower optimizations and then abort
> > > with an error message?
> > >
> > > "Error: Kernel needs a PIII" sounds much better than just stoping dead.
> >
> >I agree... maybe we can solve this simply by giving the CPU detection
> >module the -march=i386 flag hardcoded, or editing the bootstrap, or
> >something like that...
> >
> > Jeff
>
> --------------------------------------------------------
> David Relson Osage Software Systems, Inc.
> [email protected] 514 W. Keech Ave.
> http://www.osagesoftware.com Ann Arbor, MI 48103
> voice: 734.821.8800 fax: 734.821.8800
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> Please read the FAQ at http://www.tux.org/lkml/
> There are tests for all this in the feature flags for intel and
> non-intel CPUs like AMD -- including MTRR settings. All of this could
> be dynamic. Here's some code that does this, and it's similiar to
> NetWare. It detexts CPU type, feature flags, special instructions,
> etc. All of this on x86 could be dynamically detected.
Detection isn't the issue, it's optimisations. Our 386 kernel build is the
detect-everything, run-on-anything one.
> mov sp, bx
> mov CPU_TYPE, 3 ; 80386 detected
> jz end_get_cpuid
This is wrong btw. You don't check for Cyrix with CPUID disabled, or
the NexGen, or pre-CPUID Cyrix...
> check_CMPXCHG8B:
> mov ax, word ptr ds:FEATURE_FLAGS
> and ax, CMPXCHG8B_FLAG
> jz check_4MB_paging
This needs a few other bits of interesting checking for non-Intel chips.
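For reference, the feature tests in the quoted assembly correspond to single bits of the EDX value returned by CPUID function 1. Below is a minimal C sketch of the same checks. It takes the flags word as a parameter so it can run anywhere; the FEATURE_* names and cpu_has are made up for illustration, and the extra checks for non-Intel parts mentioned above are not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions in EDX after CPUID with EAX = 1. */
#define FEATURE_FPU   (1u << 0)
#define FEATURE_PSE   (1u << 3)    /* 4MB pages */
#define FEATURE_TSC   (1u << 4)
#define FEATURE_CX8   (1u << 8)    /* CMPXCHG8B */
#define FEATURE_MTRR  (1u << 12)

/* Hypothetical helper: non-zero if the given single feature bit is set. */
static int cpu_has(uint32_t edx, uint32_t flag)
{
    assert(flag != 0 && (flag & (flag - 1)) == 0);  /* exactly one bit */
    return (edx & flag) != 0;
}
```

A caller would fetch EDX once at boot and then ask, for example, cpu_has(edx, FEATURE_CX8) before using CMPXCHG8B.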
Jeff Merkey wrote:
> The PE model uses flags to identify CPU type and capbilities
So does ELF.
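As a rough illustration of that point: every ELF file header carries an e_machine field naming the target architecture (and an e_flags field for processor-specific flags), so a loader can accept or reject an image before running any of it. The sketch below reads e_machine from a raw buffer. The function name elf_machine is an assumption, and it only handles little-endian files.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EM_SPARC 2
#define EM_386   3
#define EM_PPC   20

/*
 * Return the e_machine value of an ELF header held in buf, or -1 if the
 * buffer is too short, is not ELF, or is not little-endian.
 */
static int elf_machine(const uint8_t *buf, size_t len)
{
    assert(buf != NULL);
    if (len < 20)
        return -1;
    if (buf[0] != 0x7f || buf[1] != 'E' || buf[2] != 'L' || buf[3] != 'F')
        return -1;                       /* bad magic */
    if (buf[5] != 1)
        return -1;                       /* EI_DATA: 1 = little-endian */
    /* e_ident is 16 bytes, e_type is 2 bytes, then e_machine. */
    return buf[18] | (buf[19] << 8);
}
```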
--
Jeff Garzik | "When I do this, my computer freezes."
Building 1024 | -user
MandrakeSoft | "Don't do that."
| -level 1
Alan Cox wrote:
>
> > There are tests for all this in the feature flags for intel and
> > non-intel CPUs like AMD -- including MTRR settings. All of this could
> > be dynamic. Here's some code that does this, and it's similiar to
> > NetWare. It detexts CPU type, feature flags, special instructions,
> > etc. All of this on x86 could be dynamically detected.
>
> Detection isnt the issue, its optimisations. Our 386 kernel build is the
> detect all run on any one.
>
> > mov sp, bx
> > mov CPU_TYPE, 3 ; 80386 detected
> > jz end_get_cpuid
>
> This is wrong btw. You don;t check for Cyrix with CPUID disabled or
> the NexGen or pre CPUID Cyrix...
Thanks Alan, I'll fix immediately.
Jeff
>
> > check_CMPXCHG8B:
> > mov ax, word ptr ds:FEATURE_FLAGS
> > and ax, CMPXCHG8B_FLAG
> > jz check_4MB_paging
>
> This needs a few other bits of interesting checking for non intel chips
I'll grab the code in linux and port.
8)
Jeff
Jeff Garzik wrote:
>
> Jeff Merkey wrote:
> > The PE model uses flags to identify CPU type and capbilities
>
> So does ELF.
Jeff,
Can we also combine multiple segments from different processors, or is it
a one-sy two-sy kind of affair? If so, we're there; it just becomes a
linking option.
I am building the RPM for Ute Linux with 2.4.0-10 and right now I have
to do what Red Hat does: create the /config "directory from hell"
with two dozen .config files and build a separate RPM for each kernel
(i.e. .i586, .i686). It's too convoluted and customers hate it.
8)
Jeff
>
> --
> Jeff Garzik | "When I do this, my computer freezes."
> Building 1024 | -user
> MandrakeSoft | "Don't do that."
> | -level 1
> We need a format that allow multiple executable segments to be combined
> in a single executable and the loader have enough smarts to grab the
> right one based on architecture. two options:
ELF can do that just fine
> I'll grab the code in linux and port.
You are welcome
Make sure you get a pretty current 2.2.x tree however. The ultra deep magic
for detecting NexGen processors is recent. It took a long time before I found
someone who knew how it worked 8)
Alan Cox wrote:
>
> > I'll grab the code in linux and port.
>
> You are welcome
>
> Make sure you get a pretty current 2.2.x tree however. The ultra deep magic
> for detecting NexGen processors is recent. It took a long time before I found
> someone who knew how it worked 8)
I'll get on it. Alan, if ELF can do this now, it would be a good idea to
do this with the multiple images. Sounds like it's just a link option
and a few more smarts in lilo and the boot loader to make it work.
8)
Jeff
Jeff Garzik wrote:
>
> "Jeff V. Merkey" wrote:
> > We need a format that allow multiple executable segments to be combined
> > in a single executable and the loader have enough smarts to grab the
> > right one based on architecture. two options:
> >
> > 1. extend gcc to support this or rearragne linux into segments based on
> > code type
> > 2. Use PE.
>
> The kernel isn't going non-ELF. Too painful, for dubious advantages,
> namely:
>
> The current gcc toolchain already supports what you suggest.
>
> I understand that some people have even put some thought into a
> bootloader that dynamically links your kernel on bootup, so this idea
> isn't new. It's a good idea though.
>
Yes, I have been working on it on and off for a while ("off" due to
various professional and personal issues taking higher priority for some
time...)
-hpa
--
<[email protected]> at work, <[email protected]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt
There is a variation of #2 that is often good enough, based on some research
work done at (among other places) the Oregon Graduate Center. I don't have
the references handy, but you might want to look for papers on "sandboxing"
authored there.
The basic idea is similar to the one used by many 'recompile on the fly'
systems, and involves marking the code in such a way that even inline pieces
can be replaced on the fly. Very useful for things like system specific
memcpy implementations.
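To make the baseline concrete, here is a minimal sketch of plain option #2 as quoted below: a function pointer chosen once at boot, with every caller going through it. All names are hypothetical. The on-the-fly replacement described above would patch the call sites themselves to avoid the indirect call, and that part is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Lowest common denominator: works on anything. */
static void *copy_bytewise(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}

/* Stand-in for a CPU-specific routine; here it simply defers to libc. */
static void *copy_libc(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

enum cpu_kind { CPU_GENERIC_386, CPU_FAST_COPY };

/* Set once at "boot"; every later call pays one indirect jump. */
static copy_fn kernel_copy = copy_bytewise;

static void select_copy(enum cpu_kind cpu)
{
    kernel_copy = (cpu == CPU_FAST_COPY) ? copy_libc : copy_bytewise;
    assert(kernel_copy != NULL);
}
```

The cost David describes is visible here: a routine that used to be inlined becomes a call through a pointer on every use.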
Marty
> -----Original Message-----
> From: David Lang [mailto:[email protected]]
> Sent: Tuesday, November 07, 2000 4:11 PM
> To: Jeff V. Merkey
> Cc: [email protected]; Martin Josefsson; Tigran Aivazian; Anil kumar;
> [email protected]
> Subject: Re: Installing kernel 2.4
>
>
> Jeff, the problem is not detecting the CPU type at runtime,
> the problem is
> trying to re-compile the code to take advantage of that CPU
> at runtime.
>
> depending on what CPU you have the kernel (and compiler) can
> use different
> commands/opmizations/etc, if you want to do this on boot you have two
> options.
>
> 1. re-compile the kernel
>
> 2. change all the CPU specific places from inline code to
> function calls
> into a table that get changed at boot to point at the correct calls.
>
> doing #2 will cost you so much performance that you would be
> better off
> just compiling for a 386 and not going through the autodetect
> hassle in
> the first place.
>
> David Lang
>
There's been a bunch of related work done at the Oregon Graduate Institute
by Calton Pu and others. See
http://www.cse.ogi.edu/DISC/projects/synthetix/publications.html for a list
of papers.
> -----Original Message-----
> From: [email protected] [mailto:[email protected]]
> Sent: Tuesday, November 07, 2000 3:25 PM
> To: Linux Kernel Mailing List
> Cc: [email protected]
> Subject: Re: Installing kernel 2.4
>
>
>
> > There are tests for all this in the feature flags for intel and
> > non-intel CPUs like AMD -- including MTRR settings. All of
> this could
> > be dynamic. Here's some code that does this, and it's similiar to
> > NetWare. It detexts CPU type, feature flags, special instructions,
> > etc. All of this on x86 could be dynamically detected.
>
> Detecting the CPU isn't the issue (we already do all this),
> it's what to
> do when you've figured out what the CPU is. Show me code that can
> dynamically adjust the alignment of the routines/variables/structs
> dependant upon cacheline size.
>
> regards,
>
> Davej.
>
> --
> | Dave Jones <[email protected]> http://www.suse.de/~davej
> | SuSE Labs
>
On Tue, 7 Nov 2000, Jeff V. Merkey wrote:
> If the compiler always aligned all functions and data on 16 byte
> boundries (NetWare) for all i386 code, it would run a lot faster.
Except on architectures where 16 byte alignment isn't optimal.
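A small sketch of why the alignment choice ripples through the whole image: rounding each object up to a different boundary moves every object after it. The helper names and the object sizes in the usage note are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round addr up to the next multiple of align (align must be 2^k). */
static uint32_t align_up(uint32_t addr, uint32_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

/* Place n objects back to back, each on an `align` boundary, and return
 * the offset just past the last one. */
static uint32_t layout_end(const uint32_t *sizes, size_t n, uint32_t align)
{
    uint32_t pos = 0;
    for (size_t i = 0; i < n; i++)
        pos = align_up(pos, align) + sizes[i];
    return pos;
}
```

With objects of 10, 20 and 40 bytes, the layout ends at offset 88 under 16 byte alignment and at 104 under 32 byte alignment, so every address after the first object differs between the two.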
> Cache line alignment could be an option in the loader .... after all,
> it's hte loader that locates data in memory. If Linux were PE based,
> relocation logic would be a snap with this model (like NT).
Are you suggesting multiple files of differing alignments packed into
a single kernel image, and have the loader select the correct one at
runtime ? I really hope I've misinterpreted your intention.
regards,
Davej.
--
| Dave Jones <[email protected]> http://www.suse.de/~davej
| SuSE Labs
On Tue, 7 Nov 2000, Jeff V. Merkey wrote:
> > > Detecting the CPU isn't the issue (we already do all this), it's what to
> > > do when you've figured out what the CPU is. Show me code that can
> > > dynamically adjust the alignment of the routines/variables/structs
> > > dependant upon cacheline size.
>
> ftp.timpanogas.org/manos/manos0817.tar.gz
>
> Look in the PE loader
The last time I looked at your code, I stopped reading after I got
to a comment mentioning trade secrets, and intellectual property.
> -- Microsoft's PE loader can do this since everything is RVA based.
> If you want to take the loader and put it in Linux, be my guest.
Why ??
> You can even combine mutiple i86 segments all compiled under different
> options (or architectures) and bundle them into a single executable file
There is nothing stopping us from doing that now, we just choose not to,
as it would result in a ridiculously oversized kernel. Even if the loader
threw away the non-used segments, I don't think anyone can justify an
on-disk kernel image containing mostly code they never execute.
regards,
Davej.
--
| Dave Jones <[email protected]> http://www.suse.de/~davej
| SuSE Labs
On Tue, 7 Nov 2000, Jeff V. Merkey wrote:
(Please forgive this snippage making Jeff look less literate
than he is, even after several beers.)
> We need a format that allow
[..]
> the right one based on architecture.
Oh, we already have that. It's called source code.
Matthew.
On Tue, 7 Nov 2000, Jeff V. Merkey wrote:
> > remember it's not just the start of the file that varies based on cachline
> > size, it's the positioning of code and data thoughout the kernel image.
> Understood. I will go off and give some thought and study and respond
> later after I have a proposal on the best way to do this. In NetWare,
> we had indirections in the code all over the place. NT just make huge
> and fat programs (NTKRNLOS.DLL is absolutely huge).
I'm glad you realise this. The NetWare method you mention above sounds
overcomplicated for the desired end result, and the NT method just sounds
like a gross hack.
The current 'compile for the arch you intend to run on' is right now,
the simplest, cleanest way to do this.
If you manage to pull something off in MANOS or whatever other OS,
to prove all this otherwise (without resorting to ugly hacks like the
above), great for you, I (and I assume others) would like to hear
about it.
regards,
Davej.
--
| Dave Jones <[email protected]> http://www.suse.de/~davej
| SuSE Labs
On Wed, Nov 08, 2000 at 03:25:56AM +0000, [email protected] wrote:
> On Tue, 7 Nov 2000, Jeff V. Merkey wrote:
>
> > If the compiler always aligned all functions and data on 16 byte
> > boundries (NetWare) for all i386 code, it would run a lot faster.
>
> Except on architectures where 16 byte alignment isn't optimal.
>
> > Cache line alignment could be an option in the loader .... after all,
> > it's hte loader that locates data in memory. If Linux were PE based,
> > relocation logic would be a snap with this model (like NT).
>
> Are you suggesting multiple files of differing alignments packed into
> a single kernel image, and have the loader select the correct one at
> runtime ? I really hope I've misinterpreted your intention.
Or more practically, a smart loader that could select a kernel image
based on arch and auto-detect to load the correct image. I don't really
think it matters much what mechanism is used.
What makes more sense is to pack multiple segments for different
processor architectures into a single executable package, and have the
loader pick the right one (the NT model). It could be used for
SMP and non-SMP images, though, as well as i386, i586, i686, etc.
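A minimal sketch of what the selection step could look like, assuming a made-up container with a table of (arch, SMP) entries at the front. None of these names or fields come from PE, ELF, or any existing loader.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum arch { ARCH_I386, ARCH_I586, ARCH_I686 };

/* Hypothetical directory entry: one per kernel image in the package. */
struct image_entry {
    enum arch arch;
    int       smp;       /* non-zero for an SMP build */
    uint32_t  offset;    /* where the image starts in the package */
    uint32_t  size;      /* image length in bytes */
};

/*
 * Return the entry that matches arch and smp exactly. If there is none,
 * fall back to the uniprocessor i386 image, or NULL if that is missing too.
 */
static const struct image_entry *
pick_image(const struct image_entry *tab, size_t n, enum arch arch, int smp)
{
    const struct image_entry *fallback = NULL;

    assert(tab != NULL);
    for (size_t i = 0; i < n; i++) {
        if (tab[i].arch == arch && tab[i].smp == smp)
            return &tab[i];
        if (tab[i].arch == ARCH_I386 && !tab[i].smp)
            fallback = &tab[i];
    }
    return fallback;
}
```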
Jeff
>
> regards,
>
> Davej.
>
> --
> | Dave Jones <[email protected]> http://www.suse.de/~davej
> | SuSE Labs
On Tue, 7 Nov 2000, Marty Fouts wrote:
> There's been a bunch of related work done at the Oregon Graduate Institute
> by Calton Pu and others. See
> http://www.cse.ogi.edu/DISC/projects/synthetix/publications.html for a list
> of papers.
The only paper that immediately caught my eye of relevance was the one
on dynamic optimization techniques, which is what I assume you
were referring to.
It's interesting stuff, but I think it'd be a cold day in hell before
Linus accepts a dynamic recompiler in kernel space. :)
regards,
davej.
--
| Dave Jones <[email protected]> http://www.suse.de/~davej
| SuSE Labs
On Wed, Nov 08, 2000 at 03:39:39AM +0000, [email protected] wrote:
> On Tue, 7 Nov 2000, Jeff V. Merkey wrote:
>
> > > remember it's not just the start of the file that varies based on cachline
> > > size, it's the positioning of code and data thoughout the kernel image.
> > Understood. I will go off and give some thought and study and respond
> > later after I have a proposal on the best way to do this. In NetWare,
> > we had indirections in the code all over the place. NT just make huge
> > and fat programs (NTKRNLOS.DLL is absolutely huge).
>
> I'm glad you realise this. The Netware method you mention above sounds
> over complicated for the desired end result, and the NT method just sounds
> like a gross hack.
>
> The current 'compile for the arch you intend to run on' is right now,
> the simplest, cleanest way to do this.
>
> If you manage to pull something off in MANOS or whatever other OS,
> to prove all this otherwise (without resorting to ugly hacks like the
> above), great for you, I (and I assume others) would like to hear
> about it.
You're way out in the weeds. What started this thread was a customer who
ended up loading the wrong arch on a system and hanging. I have to
post a kernel RPM for our release, and it's onerous to make customers
recompile kernels all the time and be guinea pigs for arch ports.
They just want it to boot, and run with the same level of ease of use
and stability they get with NT and NetWare and other stuff they are used
to. This is an easy choice from where I'm sitting.
Jeff
> Davej.
>
> --
> | Dave Jones <[email protected]> http://www.suse.de/~davej
> | SuSE Labs
On Tue, 7 Nov 2000, Jeff V. Merkey wrote:
> What makes more sense is to pack multiple segments for different
> processor architecures into a single executable package, and have the
> loader pick the right one (the NT model). It could be used for
> SMP and non-SMP images, though, as well as i386, i586, i686, etc.
Jeff, in x86 alone, there are 13 different compile targets (2.4 tree),
soon to be more when Cyrix III & Pentium IV get added.
Although it doesn't make sense on all of these, it's possible to
compile any of them with SMP support too.
That's 30 different combinations once those two are in.
Suggesting to put all these into one file isn't a bad idea,
it's bordering on insanity. What do you hope to achieve by doing
this, apart from the end user not having to choose a custom kernel
for their architecture ? Much better to have several kernels built
separately for each arch, and have the user pick which one
(or even have the distro installer autodetect) at install time,
as SuSE, Red Hat, Mandrake, and several other distros are now doing.
Everything all in one may be the way NT does it, but that does not
mean it's a good idea. In fact it's anything but a good idea.
Please don't try to bring the braindamages of NT to Linux, it
just isn't meant to happen.
regards,
Davej.
--
| Dave Jones <[email protected]> http://www.suse.de/~davej
| SuSE Labs
On Tue, 7 Nov 2000, Jeff V. Merkey wrote:
> Your way out in the weeds. What started this thread was a customer who
> ended up loading the wrong arch on a system and hanging. I have to
> post a kernel RPM for our release, and it's onerous to make customers
> recompile kernels all the time and be guinea pigs for arch ports.
> They just want it to boot, and run with the same level of ease of use
> and stability they get with NT and NetWare and other stuff they are used
> to. This is an easy choice from where I'm sitting.
So you're complaining that as a vendor you have to ship multiple kernels?
The point remains the same.
The only time I recall recently where a kernel hasn't booted was when the
AMD Athlon appeared, and the MTRR code needed fixing.
There wasn't a lot anyone could have done, without seeing documentation
(which iirc wasn't available at the time).
The reason NT & NetWare probably loaded fine is that they don't set
the MTRRs themselves, but rely on third party utilities to do this
for them after they've booted.
All other recent cases of non-booting that I've seen have been a
case of user error: miscompiling for the wrong target.
As a vendor, you don't worry about this as you ship binary kernels,
and $enduser never needs to see a source tree.
davej.
--
| Dave Jones <[email protected]> http://www.suse.de/~davej
| SuSE Labs
On Tue, Nov 07, 2000 at 06:18:09PM -0800, H. Peter Anvin wrote:
> Jeff Garzik wrote:
> >
> > "Jeff V. Merkey" wrote:
> > > We need a format that allow multiple executable segments to be combined
> > > in a single executable and the loader have enough smarts to grab the
> > > right one based on architecture. two options:
> > >
> > > 1. extend gcc to support this or rearragne linux into segments based on
> > > code type
> > > 2. Use PE.
> >
> > The kernel isn't going non-ELF. Too painful, for dubious advantages,
> > namely:
> >
> > The current gcc toolchain already supports what you suggest.
> >
> > I understand that some people have even put some thought into a
> > bootloader that dynamically links your kernel on bootup, so this idea
> > isn't new. It's a good idea though.
> >
>
> Yes, I have been working on it on and off for a while ("off" due to
> various professional and personal issues taking higher priority for some
> time...)
Keep truckin' H. Peter, this is something that's needed.
:-)
Jeff
>
> -hpa
>
> --
> <[email protected]> at work, <[email protected]> in private!
> "Unix gives you enough rope to shoot yourself in the foot."
> http://www.zytor.com/~hpa/puzzle.txt
"Jeff V. Merkey" <[email protected]> said:
[...]
> Your way out in the weeds. What started this thread was a customer who
> ended up loading the wrong arch on a system and hanging. I have to
> post a kernel RPM for our release, and it's onerous to make customers
> recompile kernels all the time and be guinea pigs for arch ports.
I'd prefer to be a guinea pig for one of 3 or 4 generic kernels distributed
in binary than of one of the hundreds of possibilities of patching a kernel
together at boot, plus the (presumably rather complex and fragile)
machinery to do so *before* the kernel is booted, thank you very much.
Plus I'm getting pissed off by how long a boot takes as it stands today...
> They just want it to boot, and run with the same level of ease of use
> and stability they get with NT and NetWare and other stuff they are used
> to. This is an easy choice from where I'm sitting.
Easy: i386. Or i486 (I very much doubt your customers run on less, and this
should be generic enough).
--
Dr. Horst H. von Brand mailto:[email protected]
Departamento de Informatica Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria +56 32 654239
Casilla 110-V, Valparaiso, Chile Fax: +56 32 797513
>
> On Wed, Nov 08, 2000 at 03:25:56AM +0000, [email protected] wrote:
> > On Tue, 7 Nov 2000, Jeff V. Merkey wrote:
> >
> > > If the compiler always aligned all functions and data on 16 byte
> > > boundries (NetWare) for all i386 code, it would run a lot faster.
> >
> > Except on architectures where 16 byte alignment isn't optimal.
> >
> > > Cache line alignment could be an option in the loader .... after all,
> > > it's hte loader that locates data in memory. If Linux were PE based,
> > > relocation logic would be a snap with this model (like NT).
> >
> > Are you suggesting multiple files of differing alignments packed into
> > a single kernel image, and have the loader select the correct one at
> > runtime ? I really hope I've misinterpreted your intention.
>
> Or more practically, a smart loader than could select a kernel image
> based on arch and auto-detect to load the correct image. I don't really
> think it matters much what mechanism is used.
>
> What makes more sense is to pack multiple segments for different
> processor architecures into a single executable package, and have the
> loader pick the right one (the NT model). It could be used for
> SMP and non-SMP images, though, as well as i386, i586, i686, etc.
Sure... and it will also be able to boot on Alpha/Sparc/PPC... :)
The best approach is to have the installer (the person) select the primary
architecture from a CD. There will NOT be a single boot loader that will
work for all systems. At best, there will have to be one per CPU family,
but more likely, one per BIOS structure. This is the only thing that can
determine the primary boot.
The primary boot can then determine which CPU type (starting with the
smallest common CPU), and set flags for a kernel (minimal kernel) load.
During the startup of THAT kernel, the selection of a target RPM can
be made that would install a kernel for the specific architecture. After
a (minimal?) system install, a reboot would be necessary.
It actually seems like it would be simpler to use the minimal kernel
to rebuild the kernel for the local architecture. MUCH less work.
This still requires a CPU family selection by the person doing the install.
Nothing will get around that.
-------------------------------------------------------------------------
Jesse I Pollard, II
Email: [email protected]
Any opinions expressed are solely my own.
>
> On Wed, Nov 08, 2000 at 03:25:56AM +0000, [email protected] wrote:
> > On Tue, 7 Nov 2000, Jeff V. Merkey wrote:
> >
> > > If the compiler always aligned all functions and data on 16 byte
> > > boundries (NetWare) for all i386 code, it would run a lot faster.
> >
> > Except on architectures where 16 byte alignment isn't optimal.
> >
> > > Cache line alignment could be an option in the loader .... after all,
> > > it's hte loader that locates data in memory. If Linux were PE based,
> > > relocation logic would be a snap with this model (like NT).
> >
> > Are you suggesting multiple files of differing alignments packed into
> > a single kernel image, and have the loader select the correct one at
> > runtime ? I really hope I've misinterpreted your intention.
>
> Or more practically, a smart loader than could select a kernel image
> based on arch and auto-detect to load the correct image. I don't really
> think it matters much what mechanism is used.
>
> What makes more sense is to pack multiple segments for different
> processor architecures into a single executable package, and have the
> loader pick the right one (the NT model). It could be used for
> SMP and non-SMP images, though, as well as i386, i586, i686, etc.
And this would fit on my 1.4MB floppy so I can boot my hard-driveless
firewalling system, correct?
On Wed, 8 Nov 2000 [email protected] wrote:
> > On Wed, Nov 08, 2000 at 03:25:56AM +0000, [email protected] wrote:
> > > On Tue, 7 Nov 2000, Jeff V. Merkey wrote:
> > > > If the compiler always aligned all functions and data on 16 byte
> > > > boundries (NetWare) for all i386 code, it would run a lot faster.
> > >
> > > Except on architectures where 16 byte alignment isn't optimal.
> > >
> > > > Cache line alignment could be an option in the loader .... after all,
> > > > it's hte loader that locates data in memory. If Linux were PE based,
> > > > relocation logic would be a snap with this model (like NT).
> > >
> > > Are you suggesting multiple files of differing alignments packed into
> > > a single kernel image, and have the loader select the correct one at
> > > runtime ? I really hope I've misinterpreted your intention.
> >
> > Or more practically, a smart loader than could select a kernel image
> > based on arch and auto-detect to load the correct image. I don't really
> > think it matters much what mechanism is used.
> >
> > What makes more sense is to pack multiple segments for different
> > processor architecures into a single executable package, and have the
> > loader pick the right one (the NT model). It could be used for
> > SMP and non-SMP images, though, as well as i386, i586, i686, etc.
> And this would fit on my 1.4bm floppy so I can boot my hard driveless
> firewalling system, correct?
Your mailer is misattributing people. I didn't say that, my comments were
the ones you've attributed to Jeff.
regards,
davej.
--
| Dave Jones <[email protected]> http://www.suse.de/~davej
| SuSE Labs
Horst von Brand <[email protected]> writes:
> I'd prefer to be a guinea pig for one of 3 or 4 generic kernels distributed
> in binary than of one of the hundreds of possibilities of patching a kernel
> together at boot, plus the (presumamby rather complex and fragile)
> machinery to do so *before* the kernel is booted, thank you very much.
>
> Plus I'm getting pissed off by how long a boot takes as it stands today...
Just for reference, I can boot from power-on to login prompt in 12 seconds.
With Linux. The big change is nuking the BIOS....
> > They just want it to boot, and run with the same level of ease of use
> > and stability they get with NT and NetWare and other stuff they are used
> > to. This is an easy choice from where I'm sitting.
>
> Easy: i386. Or i486 (I very much doubt your customers run on less, and this
> should be geneic enough).
It's also possible to do a two-stage boot: stage 1 is an i386 kernel, stage 2
the specific kernel for the machine.... This adds about a second to the
whole boot process.
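A minimal sketch of the decision the stage 1 kernel would have to make, assuming it has already read the CPUID family number and that 0 means CPUID is unavailable. The image names are placeholders, not files any distribution actually ships.

```c
#include <assert.h>
#include <string.h>

/*
 * Map a CPUID family number to the stage 2 kernel image to load.
 * Anything unrecognised falls back to the i386 build, which runs anywhere.
 */
static const char *stage2_kernel(int cpuid_family)
{
    assert(cpuid_family >= 0);
    if (cpuid_family >= 6)
        return "vmlinuz-i686";      /* P6 family and later */
    if (cpuid_family == 5)
        return "vmlinuz-i586";      /* Pentium class */
    if (cpuid_family == 4)
        return "vmlinuz-i486";
    return "vmlinuz-i386";          /* no CPUID, or unknown */
}
```

Stage 1 stays a plain i386 build, so a wrong guess here costs some performance rather than a hang.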
Eric
On Wed, 08 Nov 2000, Horst von Brand wrote:
> "Jeff V. Merkey" <[email protected]> said:
>
> [...]
>
> > Your way out in the weeds. What started this thread was a customer who
> > ended up loading the wrong arch on a system and hanging. I have to
> > post a kernel RPM for our release, and it's onerous to make customers
> > recompile kernels all the time and be guinea pigs for arch ports.
>
> I'd prefer to be a guinea pig for one of 3 or 4 generic kernels distributed
> in binary than of one of the hundreds of possibilities of patching a kernel
> together at boot, plus the (presumably rather complex and fragile)
> machinery to do so *before* the kernel is booted, thank you very much.
Hmm... some mechanism for selecting the appropriate *module* might be nice,
after boot...
> Plus I'm getting pissed off by how long a boot takes as it stands today...
Yep: slowing down boot times is not an attractive idea.
> > They just want it to boot, and run with the same level of ease of use
> > and stability they get with NT and NetWare and other stuff they are used
> > to. This is an easy choice from where I'm sitting.
>
> Easy: i386. Or i486 (I very much doubt your customers run on less, and this
> should be generic enough).
I think there are better options. Jeff could, for example, *optimise* for
Pentium II/III, without using PII specific instructions, in the main kernel,
then have multiple target binaries for modules.
James.
But, here the customer did run the configure code (he said he did not
change anything). Isn't this where the machine should be diagnosed and
the right options chosen? Need a way to say it is a cross build, but
that shouldn't be too hard.
My $.02 worth.
George
"James A. Sutherland" wrote:
>
> On Wed, 08 Nov 2000, Horst von Brand wrote:
> > "Jeff V. Merkey" <[email protected]> said:
> >
> > [...]
> >
> > > You're way out in the weeds. What started this thread was a customer who
> > > ended up loading the wrong arch on a system and hanging. I have to
> > > post a kernel RPM for our release, and it's onerous to make customers
> > > recompile kernels all the time and be guinea pigs for arch ports.
> >
> > I'd prefer to be a guinea pig for one of 3 or 4 generic kernels distributed
> > in binary than of one of the hundreds of possibilities of patching a kernel
> > together at boot, plus the (presumably rather complex and fragile)
> > machinery to do so *before* the kernel is booted, thank you very much.
>
> Hmm... some mechanism for selecting the appropriate *module* might be nice,
> after boot...
>
> > Plus I'm getting pissed off by how long a boot takes as it stands today...
>
> Yep: slowing down boot times is not an attractive idea.
>
> > > They just want it to boot, and run with the same level of ease of use
> > > and stability they get with NT and NetWare and other stuff they are used
> > > to. This is an easy choice from where I'm sitting.
> >
> > Easy: i386. Or i486 (I very much doubt your customers run on less, and this
> > should be generic enough).
>
> I think there are better options. Jeff could, for example, *optimise* for
> Pentium II/III, without using PII specific instructions, in the main kernel,
> then have multiple target binaries for modules.
>
> James.
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> Please read the FAQ at http://www.tux.org/lkml/
On Wed, Nov 08, 2000 at 07:43:29AM -0600, Jesse Pollard wrote:
> --------- Received message begins Here ---------
>
> >
> > On Wed, Nov 08, 2000 at 03:25:56AM +0000, [email protected] wrote:
> > > On Tue, 7 Nov 2000, Jeff V. Merkey wrote:
> > >
> > > > If the compiler always aligned all functions and data on 16 byte
> > > > boundaries (NetWare) for all i386 code, it would run a lot faster.
> > >
> > > Except on architectures where 16 byte alignment isn't optimal.
> > >
> > > > Cache line alignment could be an option in the loader .... after all,
> > > > it's the loader that locates data in memory. If Linux were PE based,
> > > > relocation logic would be a snap with this model (like NT).
> > >
> > > Are you suggesting multiple files of differing alignments packed into
> > > a single kernel image, and have the loader select the correct one at
> > > runtime ? I really hope I've misinterpreted your intention.
> >
> > Or more practically, a smart loader that could select a kernel image
> > based on arch and auto-detect to load the correct image. I don't really
> > think it matters much what mechanism is used.
> >
> > What makes more sense is to pack multiple segments for different
> > processor architectures into a single executable package, and have the
> > loader pick the right one (the NT model). It could be used for
> > SMP and non-SMP images, though, as well as i386, i586, i686, etc.
>
> Sure.. and it will also be able to boot on Alpha/Sparc/PPC....:)
>
> The best is to have the installer (person) to select the primary
> architecture from a CD. There will NOT be a single boot loader that will
> work for all systems. At best, there will have to be one per CPU family,
> but more likely, one per BIOS structure. This is the only thing that can
> determine the primary boot.
>
> The primary boot can then determine which CPU type (starting with the
> smallest common CPU), and set flags for a kernel (minimal kernel) load.
> During the startup of THAT kernel then the selection of target RPM can
> be made that would install a kernel for the specific architecture. After
> a (minimal?) system install, a reboot would be necessary.
>
> It actually seems like it would be simpler to use the minimal kernel
> to rebuild the kernel for the local architecture. MUCH less work.
> This still requires a CPU family selection by the person doing the install.
> Nothing will get around that.
I am hesitant to jump in since hpa is working on something like this. I
think I would like to wait and see what he puts out. If he would like
for me in my spare time to help him with it, I think I'd love to.
Jeff
>
> -------------------------------------------------------------------------
> Jesse I Pollard, II
> Email: [email protected]
>
> Any opinions expressed are solely my own.
On Wed, Nov 08, 2000 at 08:49:15AM -0500, [email protected] wrote:
>
> >
> > On Wed, Nov 08, 2000 at 03:25:56AM +0000, [email protected] wrote:
> > > On Tue, 7 Nov 2000, Jeff V. Merkey wrote:
> > >
> > > > If the compiler always aligned all functions and data on 16 byte
> > > > boundaries (NetWare) for all i386 code, it would run a lot faster.
> > >
> > > Except on architectures where 16 byte alignment isn't optimal.
> > >
> > > > Cache line alignment could be an option in the loader .... after all,
> > > > it's the loader that locates data in memory. If Linux were PE based,
> > > > relocation logic would be a snap with this model (like NT).
> > >
> > > Are you suggesting multiple files of differing alignments packed into
> > > a single kernel image, and have the loader select the correct one at
> > > runtime ? I really hope I've misinterpreted your intention.
> >
> > Or more practically, a smart loader that could select a kernel image
> > based on arch and auto-detect to load the correct image. I don't really
> > think it matters much what mechanism is used.
> >
> > What makes more sense is to pack multiple segments for different
> > processor architectures into a single executable package, and have the
> > loader pick the right one (the NT model). It could be used for
> > SMP and non-SMP images, though, as well as i386, i586, i686, etc.
>
>
> And this would fit on my 1.44MB floppy so I can boot my hard driveless
> firewalling system, correct?
Hard disks (20GB) are about $100.00 these days. CD-ROM drives are even
cheaper. A smart loader will certainly fit on a floppy.
Jeff
It might be convenient to have a completely unoptimized 386 kernel. While
this would obviously be non-optimal in all cases, it would be compatible
with everything and probably faster on non-386 than a 386-optimized
kernel. Of course, the gains are probably not worth the time it would take
to write one, as I would hope that most Linux users are willing to compile
their own kernels...
--
This message has been brought to you by the letter alpha and the number pi.
David Feuer
[email protected]
On Wed, 08 Nov 2000, George Anzinger wrote:
> But, here the customer did run the configure code (he said he did not
> change anything). Isn't this where the machine should be diagnosed and
> the right options chosen? Need a way to say it is a cross build, but
> that shouldn't be too hard.
Why default to incompatibility?! If the user explicitly says "I really do want
a kernel which only works on this specific machine as it is now, and I want it
to break otherwise", fine. Don't make it a default!
BTW: Has anyone benchmarked the different optimizations - i.e. how much
difference does optimizing for a Pentium make when running on a PII? More to
the point, how about optimizing non-exclusively for a Pentium, so the code
still runs on earlier CPUs?
James.
"James A. Sutherland" wrote:
>
> On Wed, 08 Nov 2000, George Anzinger wrote:
> > But, here the customer did run the configure code (he said he did not
> > change anything). Isn't this where the machine should be diagnosed and
> > the right options chosen? Need a way to say it is a cross build, but
> > that shouldn't be too hard.
>
> Why default to incompatibility?! If the user explicitly says "I really do want
> a kernel which only works on this specific machine as it is now, and I want it
> to break otherwise", fine. Don't make it a default!
I could go along with this. The user, however, had the default break,
and, to my knowledge, there are no tools to diagnose the current (or any
other) machine anywhere in the kernel. Maybe it is time to do such a
tool with exports that the configure programs could use as defaults. My
thought is that the tool could run independently on the target system
(be it local or otherwise) with the results fed back to configure.
(Oops, corollary to the rule that "The squeaking wheel gets the grease."
is "S/he who complains most about the squeaking gets to do the
greasing." I better keep quiet :)
>
> BTW: Has anyone benchmarked the different optimizations - i.e. how much
> difference does optimizing for a Pentium make when running on a PII? More to
> the point, how about optimizing non-exclusively for a Pentium, so the code
> still runs on earlier CPUs?
>
> James.
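To make the "diagnose the machine and feed the result back to configure"
idea concrete, here is a minimal sketch of what such a stand-alone tool
might look like. It is an illustration only: the two helper functions are
hypothetical, the mapping from the "cpu family" field of /proc/cpuinfo to a
processor option is a deliberate simplification, and it assumes the
CONFIG_M386 / CONFIG_M486 / CONFIG_M586 / CONFIG_M686 choices offered by
the x86 configuration.

```python
def parse_cpuinfo(text):
    """Parse the first processor block of /proc/cpuinfo-style text."""
    info = {}
    for line in text.splitlines():
        if not line.strip():
            if info:
                break  # a blank line ends the first processor block
            continue
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def suggest_processor_option(info):
    """Map the x86 'cpu family' field to a processor-type config option.

    Simplified assumption: only the family number is considered, and a
    missing field falls back to the most compatible choice (i386).
    """
    family = int(info.get("cpu family", "3"))
    if family >= 6:
        return "CONFIG_M686"
    if family == 5:
        return "CONFIG_M586"
    if family == 4:
        return "CONFIG_M486"
    return "CONFIG_M386"
```

Because the parser takes text rather than opening a file itself, the tool
could be run on a remote target and its output passed back to the build
machine, for example by feeding it the saved contents of the target's
/proc/cpuinfo. The default remains the most compatible option, in line with
the "don't default to incompatibility" argument above.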
On Wed, 08 Nov 2000, George Anzinger wrote:
> "James A. Sutherland" wrote:
> >
> > On Wed, 08 Nov 2000, George Anzinger wrote:
> > > But, here the customer did run the configure code (he said he did not
> > > change anything). Isn't this where the machine should be diagnosed and
> > > the right options chosen? Need a way to say it is a cross build, but
> > > that shouldn't be too hard.
> >
> > Why default to incompatibility?! If the user explicitly says "I really do want
> > a kernel which only works on this specific machine as it is now, and I want it
> > to break otherwise", fine. Don't make it a default!
>
> I could go along with this. The user, however, had the default break,
> and, to my knowledge, there are no tools to diagnose the current (or any
> other) machine anywhere in the kernel. Maybe it is time to do such a
> tool with exports that the configure programs could use as defaults. My
> thought is that the tool could run independently on the target system
> (be it local or otherwise) with the results fed back to configure.
I think a default whereby the kernel built will run on any Linux-capable
machine of that architecture would be sensible - so if I grab the 2.4.0t10
tarball and build it now, with no changes, I'll be able to boot the kernel on
any x86 machine.
> (Oops, corollary to the rule that "The squeaking wheel gets the grease."
> is "S/he who complains most about the squeaking gets to do the
> greasing." I better keep quiet :)
I'm still not convinced the wheel IS squeaking - anyone got those benchmarks??
James.
[email protected] said:
> I think a default whereby the kernel built will run on any
> Linux-capable machine of that architecture would be sensible - so if I
> grab the 2.4.0t10 tarball and build it now, with no changes, I'll be
> able to boot the kernel on any x86 machine.
I have four machines on my desk at the moment. The workstation is a dual
P-III. I suppose I agree that it might be nice if the kernel for that also
worked on the embedded 386 board. But it'd also be nice if it worked on the
Alpha and the SH boards which are also on my desk. How about putting the
whole lot into a single kernel image? It's the logical extension of what's
being suggested.
--
dwmw2
> I would like to see some features added to ELF. Resource binding support
> would be nice, i.e. bitmaps used internally by GUI apps and such, so
> that they can be shared between processes if they are in a shared lib,
You can do shared mappings of almost anything anyway. In fact most of the
shared library loading is done in user space via mmap.
There are good reasons for not putting resources into the program itself too,
one of which is customisability.
On Thu, 09 Nov 2000, David Woodhouse wrote:
> [email protected] said:
> > I think a default whereby the kernel built will run on any
> > Linux-capable machine of that architecture would be sensible - so if I
> > grab the 2.4.0t10 tarball and build it now, with no changes, I'll be
> > able to boot the kernel on any x86 machine.
>
> I have four machines on my desk at the moment. The workstation is a dual
> P-III. I suppose I agree that it might be nice if the kernel for that also
> worked on the embedded 386 board. But it'd also be nice if it worked on the
> Alpha and the SH boards which are also on my desk. How about putting the
> whole lot into a single kernel image? It's the logical extension of what's
> being suggested.
No. In the x86 case, it is a question of "do we deliberately restrict this
kernel to running only on a Pentium II in order to make it x% faster". My
suggestion does not duplicate any code, or (with a few exceptions) include any
redundant code for any platform (maths emulation, e.g., would be an exception).
Yours duplicates code, rather than just not optimising it as aggressively.
James.
"Jeff V. Merkey" wrote:
> > The kernel isn't going non-ELF. Too painful, for dubious advantages,
> > namely:
> >
>
> perhaps we should extend ELF. After all, where linux goes, gcc
> follows....
I would like to see some features added to ELF. Resource binding support
would be nice, i.e. bitmaps used internally by GUI apps and such, so
that they can be shared between processes if they are in a shared lib,
and so that the app can reload faster if the resources are cached. I
suspect this is what allows netscape to restart in < 2 sec under Windows
or OS/2, versus ~5 sec under Linux on the same system.
Executable signing or at least a CRC (optional, of course) would be nice
too. Version strings would be helpful in some cases as well (like bad
programs that don't/can't support the -v|-V|--version options, or for
automated retrieval of the information)
Sorry if these features are supported already.
--
Mark McClelland
[email protected]
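As a small illustration of the optional CRC idea above, the following
sketch checks that a file starts with the ELF magic bytes and that its
CRC-32 matches an expected value. It is only a sketch under stated
assumptions: the expected checksum is supplied by the caller (for instance
kept alongside the file) rather than read from any ELF field, and the
helper names are hypothetical. A CRC catches accidental corruption; it is
not a substitute for the executable signing also mentioned above.

```python
import zlib

# The first four bytes of every ELF file.
ELF_MAGIC = b"\x7fELF"


def is_elf(data):
    """Return True if the byte string starts with the ELF magic number."""
    return data[:4] == ELF_MAGIC


def crc32_of(data):
    """Return the CRC-32 of the data as an unsigned 32-bit integer."""
    return zlib.crc32(data) & 0xFFFFFFFF


def check_executable(data, expected_crc):
    """Accept the data only if it looks like ELF and the CRC matches.

    expected_crc is assumed to be stored outside the file by the caller.
    """
    return is_elf(data) and crc32_of(data) == expected_crc
```

A loader or package tool could call check_executable() on the bytes of a
binary before running it and refuse to proceed on a mismatch.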