I've been regularly building kernels in the testXX series, and
they have been coming out ~ 600K; test10-final and test11-pre1:
-rw-r--r-- 1 root root 610503 Oct 31 18:39
vmlinuz-t10
-rw-r--r-- 1 root root 610568 Nov 7 20:26
vmlinuz-t11p01
test11-pre2 comes out ~ 900K:
-rw-r--r-- 1 root root 926345 Nov 10 10:16
vmlinuz-t11p02
and is thus unusable.
I believe I am following all the same steps, nothing new, make
dep bzImage modules modules_install.
Bob L.
Followup to: <[email protected]>
By author: Robert Lynch <[email protected]>
In newsgroup: linux.dev.kernel
>
> I've been regularly building kernels in the testXX series, and
> they have been coming out ~ 600K; test10-final and test11-pre1:
>
> -rw-r--r-- 1 root root 610503 Oct 31 18:39
> vmlinuz-t10
> -rw-r--r-- 1 root root 610568 Nov 7 20:26
> vmlinuz-t11p01
>
> test11-pre2 comes out ~ 900K:
>
> -rw-r--r-- 1 root root 926345 Nov 10 10:16
> vmlinuz-t11p02
>
> and is thus unusable.
>
> I believe I am following all the same steps, nothing new, make
> dep bzImage modules modules_install.
>
Different compile options?
Why is a 900K kernel unusable?
-hpa
--
<[email protected]> at work, <[email protected]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt
On 10 Nov 2000, H. Peter Anvin wrote:
>Different compile options?
>
>Why is a 900K kernel unusable?
>
> -hpa
My guess would be that it's not actually bzipping the kernel. I'd run make
bzImage again and make sure it is bzipping it.
On x86 machines there is a size limitation on booting. Though I thought
it was 1024K as the max, 900K should be fine.
William Tiemann
[email protected]
http://www.OpenPGP.Net
Max Inux wrote:
>
> On 10 Nov 2000, H. Peter Anvin wrote:
> >Different compile options?
> >
> >Why is a 900K kernel unusable?
> >
> > -hpa
>
> My guess would be it not actually bzipping the kernel. Id run make
> bzImage again and making sure it is bzipping it.
>
gzip, actually. I can verify here "make bzImage" does the expected thing
and it looks normal-sized to me.
>
> On x86 machines there is a size limitation on booting. Though I thought
> it was 1024K as the max, 900K should be fine.
>
No, there isn't. There used to be, but it has been fixed.
-hpa
--
<[email protected]> at work, <[email protected]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt
"H. Peter Anvin" <[email protected]> writes:
> > On x86 machines there is a size limitation on booting. Though I thought
> > it was 1024K as the max, 900K should be fine.
> No, there isn't. There used to be, but it has been fixed.
The main problem is for us distributions: we want to fit this on a
disk with a couple of modules for our installation process.
--
MandrakeSoft Inc http://www.chmouel.org
--Chmouel
[Robert Lynch]
> I've been regularly building kernels in the testXX series, and
> they have been coming out ~ 600K; test10-final and test11-pre1:
>
> -rw-r--r-- 1 root root 610503 Oct 31 18:39 vmlinuz-t10
> -rw-r--r-- 1 root root 610568 Nov 7 20:26 vmlinuz-t11p01
>
> test11-pre2 comes out ~ 900K:
>
> -rw-r--r-- 1 root root 926345 Nov 10 10:16 vmlinuz-t11p02
Track it down yourself:
1) The sizes of your two 'vmlinux' files: do they differ wildly as well?
2a) If no, check the make logs between the vmlinux link line and bzImage
creation. Compare the two and note any significant differences.
2b) If yes, write a perl script to compute symbol sizes from each
System.map file. (Symbol size == address of next symbol minus
address of this symbol.) Sort numerically, then compare old vs new
for symbols that have grown a lot, or large new symbols.
Peter
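[Peter's step 2b is easy to sketch. A minimal Python version of the recipe (the thread suggests perl; the System.map lines in the usage below are synthetic examples, not real kernel symbols):]

```python
# Sketch of Peter's recipe: derive symbol sizes from System.map lines
# ("address type name"; size = address of next symbol minus address of
# this symbol), then rank symbols that grew between two builds.
def symbol_sizes(lines):
    """Parse 'address type name' lines; return {name: size} in address order."""
    syms = []
    for line in lines:
        parts = line.split()
        if len(parts) >= 3:
            syms.append((int(parts[0], 16), parts[2]))
    syms.sort()
    sizes = {}
    # The last symbol has no successor, so its size is unknown and skipped.
    for (addr, name), (next_addr, _) in zip(syms, syms[1:]):
        sizes[name] = next_addr - addr
    return sizes

def growers(old_lines, new_lines, threshold=0):
    """Return (delta, name) pairs, biggest growers (and new symbols) first."""
    old, new = symbol_sizes(old_lines), symbol_sizes(new_lines)
    deltas = [(new[n] - old.get(n, 0), n) for n in new]
    return sorted((d, n) for d, n in deltas if d > threshold)[::-1]
```

Usage: `growers(open("System.map.old").readlines(), open("System.map.new").readlines())`.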
>gzip, actually. I can verify here "make bzImage" does the expected thing
>and it looks normal-sized to me.
I believe there is zImage (gzip) and bzImage (bzip2). (Or is it compress
vs gzip, but then why bzImage vs gzImage?)
>> On x86 machines there is a size limitation on booting. Though I thought
>> it was 1024K as the max, 900K should be fine.
>>
>
>No, there isn't. There used to be, but it has been fixed.
Ok then, I was on crank, and apparently so is he =)
William Tiemann
<[email protected]>
http://www.OpenPGP.Net
On Sat, Nov 11, 2000 at 03:27:36AM -0800, Max Inux wrote:
> >gzip, actually. I can verify here "make bzImage" does the expected thing
> >and it looks normal-sized to me.
>
> I believe there is zImage (gzip) and bzImage (bzip2). (Or is it compress
> vs gzip, but then why bzImage vs gzImage?)
IMHO bzImage means something like 'big zImage'; it uses the same compression
but a different loader. IIRC bzImage became necessary when the (uncompressed)
kernel grew above 1MB.
Jan
Mike Harris corrected me, which matches the other replies: bzImage = Big
zImage, removing the 640K limitation. I have not upgraded to 2.4.0-test11-pre2
from test10; when I do I will see if I get similar results.
Sorry,
William Tiemann
<[email protected]>
http://www.OpenPGP.Net
On Fri, 10 Nov 2000, H. Peter Anvin wrote:
> >
> > On x86 machines there is a size limitation on booting. Though I thought
> > it was 1024K as the max, 900K should be fine.
> >
>
> No, there isn't. There used to be, but it has been fixed.
>
Are you sure? I thought the fix was to build 2 page tables for 0-8M
instead of 1 page table for 0-4M. So, we still cannot boot a bzImage more
than 2.5M which roughly corresponds to 8M. Is this incorrect? Are you
saying I should be able to boot a bzImage corresponding to an ELF object
vmlinux of 4G or more?
I tried it and it failed (a few weeks ago), so at least reasonably recently
what you are saying was not true. I will now check whether it has since become
true.
Regards,
Tigran
On Sat, 11 Nov 2000, Tigran Aivazian wrote:
> On Fri, 10 Nov 2000, H. Peter Anvin wrote:
> > >
> > > On x86 machines there is a size limitation on booting. Though I thought
> > > it was 1024K as the max, 900K should be fine.
> > >
> >
> > No, there isn't. There used to be, but it has been fixed.
> >
>
> Are you sure? I thought the fix was to build 2 page tables for 0-8M
> instead of 1 page table for 0-4M. So, we still cannot boot a bzImage more
> than 2.5M which roughly corresponds to 8M. Is this incorrect? Are you
> saying I should be able to boot a bzImage corresponding to an ELF object
> vmlinux of 4G or more?
>
> I tried it and it failed (a few weeks ago) so at least reasonably recently
> what you are saying was not true. I will now check if it suddenly became
> true now.
Just to clarify -- I always eat words on the first round -- of course I
know that there is no limit at 1M, that is obvious -- what I do _not_ know
is whether there is a limit at 2.5M -- this is non-obvious and requires proof.
Regards,
Tigran
May I recommend a read of Documentation/i386/boot.txt; it explains exactly
what is done:
Protocol 2.02: (Kernel 2.4.0-test3-pre3) New command line protocol.
Lower the conventional memory ceiling. No overwrite
of the traditional setup area, thus making booting
safe for systems which use the EBDA from SMM or 32-bit
BIOS entry points. zImage deprecated but still
supported.
2.01 may have had the issue you speak of; it looks like this fixes it.
William Tiemann
<[email protected]>
http://www.OpenPGP.Net
On Sat, 11 Nov 2000, Max Inux wrote:
> >gzip, actually. I can verify here "make bzImage" does the expected thing
> >and it looks normal-sized to me.
>
> I believe there is zImage (gzip) and bzImage (bzip2). (Or is it compress
> vs gzip, but then why bzImage vs gzImage?)
Neither. They are both compressed the same way (gzip, IIRC) - the difference is
in how they are loaded. bzImage (= BIG zImage) has a loader which can handle
>1Mb RAM; zImage has to be loaded into normal DOS memory, so it has a size
limitation.
> >> On x86 machines there is a size limitation on booting. Though I thought
> >> it was 1024K as the max, 900K should be fine.
> >>
> >
> >No, there isn't. There used to be, but it has been fixed.
>
> Ok then, I was on crank, and apparently so is he =)
ROFL! What is this "crank" stuff, BTW - some sort of auto lubricant, or ...?
James.
On Fri, Nov 10, 2000 at 11:47:50PM -0600, Peter Samuelson wrote:
> 2b) If yes, write a perl script to compute symbol sizes from each
> System.map file. (Symbol size == address of next symbol minus
> address of this symbol.) Sort numerically, then compare old vs new
> for symbols that have grown a lot, or large new symbols.
No need to write one: ftp.firstfloor.org:/pub/ak/perl/bloat-o-meter
-Andi
On Sat, Nov 11, 2000 at 11:36:00AM +0000, Tigran Aivazian wrote:
> Are you sure? I thought the fix was to build 2 page tables for 0-8M
Paging is disabled at that point.
Andrea
On Sat, 11 Nov 2000, Andrea Arcangeli wrote:
> On Sat, Nov 11, 2000 at 11:36:00AM +0000, Tigran Aivazian wrote:
> > Are you sure? I thought the fix was to build 2 page tables for 0-8M
>
> Paging is disabled at that point.
>
Yes, Andrea, I know that paging is disabled at the point of loading the
image but I was talking about the inability to boot (boot == complete
booting, i.e. at least reach start_kernel()) a kernel with very large
.data or .bss segments because of various reasons -- one of which,
probably, is the inadequacy of those pg0 and pg1 page tables set up in
head.S
So, what is still a bit unclear is -- if the only way to create a huge
bzImage is by having huge .text or .data or .bss, what is the combination
of the limits? I.e. which limit do we hit first -- the one on bzImage
(which Peter says is infinite?) or the ones on .text/.data/.bss (and what
exactly are they?)? See my question now?
Regards,
Tigran
On Sat, Nov 11, 2000 at 03:30:36PM +0100,
Andi Kleen <[email protected]> wrote:
>
> On Fri, Nov 10, 2000 at 11:47:50PM -0600, Peter Samuelson wrote:
> > 2b) If yes, write a perl script to compute symbol sizes from each
> > System.map file. (Symbol size == address of next symbol minus
> > address of this symbol.) Sort numerically, then compare old vs new
> > for symbols that have grown a lot, or large new symbols.
>
> No need to write one: ftp.firstfloor.org:/pub/ak/perl/bloat-o-meter
Would be good if you added a notice under which licence you put all
these nice perl scripts... Are they GPL'ed? Under a BSD-style licence?
Something else?
> -Andi
CU,
Thomas
--
Thomas Köhler Email: [email protected] | LCARS - Linux
<>< WWW: http://jeanluc-picard.de | for Computers
IRC: jeanluc | on All Real
PGP public key available from Homepage! | Starships
> Max Inux wrote:
> > On x86 machines there is a size limitation on booting. Though I thought
> > it was 1024K as the max, 900K should be fine.
>
> No, there isn't. There used to be, but it has been fixed.
>
> -hpa
Except with the simple boot loader. You cannot boot a kernel >=1024KB directly
from floppy...
Andrzej
On Sat, Nov 11, 2000 at 02:51:21PM +0000, Tigran Aivazian wrote:
> Yes, Andrea, I know that paging is disabled at the point of loading the
> image but I was talking about the inability to boot (boot == complete
> booting, i.e. at least reach start_kernel()) a kernel with very large
> .data or .bss segments because of various reasons -- one of which,
> probably,is the inadequacy of those pg0 and pg1 page tables set up in
> head.S
Ah ok, I thought you were talking about bootloader.
About the initial pagetable setup on the i386 port there's certainly a 3M limit
on the size of the kernel image, but it's trivial to enlarge it. BTW, exactly
for that kernel-size-limit reason, in x86-64 I defined a 40Mbyte mapping where
we currently have a 4M mapping, and that's even simpler to enlarge since
they're 2M PAE-like pagetables.
Basically, as long as the kernel can get loaded in memory correctly we have
no problem :)
> (which Peter says is infinite?) or the ones on .text/.data/.bss (and what
> exactly are they?)? See my question now?
We sure hit the 3M limit on the .bss clearing right now.
Andrea
On Sat, 11 Nov 2000, Andrea Arcangeli wrote:
> On Sat, Nov 11, 2000 at 02:51:21PM +0000, Tigran Aivazian wrote:
> > Yes, Andrea, I know that paging is disabled at the point of loading the
> > image but I was talking about the inability to boot (boot == complete
> > booting, i.e. at least reach start_kernel()) a kernel with very large
> > .data or .bss segments because of various reasons -- one of which,
> > probably,is the inadequacy of those pg0 and pg1 page tables set up in
> > head.S
>
> Ah ok, I thought you were talking about bootloader.
>
> About the initial pagetable setup on i386 port there's certainly a 3M limit on
> the size of the kernel image, but it's trivial to enlarge it. BTW, exactly for
> that kernel size limit reasons in x86-64 I defined a 40Mbyte mapping where we
> currently have a 4M mapping and that's even simpler to enlarge since they're 2M
> PAE like pagetables.
>
> Basically as far as the kernel can get loaded in memory correctly we have
> no problem :)
>
> > (which Peter says is infinite?) or the ones on .text/.data/.bss (and what
> > exactly are they?)? See my question now?
>
> We sure hit the 3M limit on the .bss clearing right now.
>
I understand and agree with what you say except the number 4M. It is not
4M but 8M, imho. See arch/i386/kernel/head.S
/*
* The page tables are initialized to only 8MB here - the final page
* tables are set up later depending on memory size.
*/
.org 0x2000
ENTRY(pg0)
.org 0x3000
ENTRY(pg1)
/*
* empty_zero_page must immediately follow the page tables ! (The
* initialization loop counts until empty_zero_page)
*/
.org 0x4000
ENTRY(empty_zero_page)
(the comment next to pg0 in asm/pgtable.h is misleading, whilst the
comment above paging_init() is plain wrong -- I sent a patch to Linus
yesterday but perhaps "wrong comment" is not a critical 2.4 issue :)
Regards,
Tigran
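[The head.S excerpt implies the 8MB figure directly; a quick arithmetic sketch using the standard i386 paging constants, not kernel code:]

```python
# Arithmetic behind the head.S excerpt: on i386, one page table is a 4 KB
# page holding 1024 4-byte PTEs, and each PTE maps a 4 KB page, so one
# table covers 4 MB. pg0 and pg1 together therefore map the first 8 MB.
PAGE_SIZE = 4096            # bytes per page
PTES_PER_TABLE = 4096 // 4  # 4-byte entries in a 4 KB page table

per_table = PTES_PER_TABLE * PAGE_SIZE  # coverage of one table: 4 MB
mapped = 2 * per_table                  # pg0 + pg1 -> 8 MB
print(mapped // (1024 * 1024), "MB")    # -> 8 MB
```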
Andrzej Krzysztofowicz wrote:
> Except the simple boot loader. You cannot boot kernel >=1024KB directly
> from floppy...
That doesn't really matter much though... You have proceeded beyond the
'simple' case. :)
You can always use a tiny bootloader like hpa's syslinux. I am
currently typing on a kernel booted from a standard 3 1/2" floppy:
> [jgarzik@rum linux_2_4]$ make bzImage
> [...]
> System is 1612 kB
> [jgarzik@rum g]$ dmesg|less
> [...]
> Memory: 124388k/131060k available (2876k kernel code, 6284k reserved, 367k data, 448k init, 0k highmem)
(...with /dev/fd0u1722, 1.44M floppies become 1.722M floppies...)
--
Jeff Garzik |
Building 1024 | Would you like a Twinkie?
MandrakeSoft |
Peter Samuelson wrote:
> [Robert Lynch] wrote:
> > I've been regularly building kernels in the testXX series, and
> > they have been coming out ~ 600K; test10-final and test11-pre1:
> >
> > -rw-r--r-- 1 root root 610503 Oct 31 18:39 vmlinuz-t10
> > -rw-r--r-- 1 root root 610568 Nov 7 20:26 vmlinuz-t11p01
> >
> > test11-pre2 comes out ~ 900K:
> >
> > -rw-r--r-- 1 root root 926345 Nov 10 10:16 vmlinuz-t11p02
>
> Track it down yourself:
>
> 1) The sizes of your two 'vmlinux' files: do they differ wildly as well?
Wildly; compare test11-pre1 and test11-pre2 sizes:
-rwxr-xr-x 1 root root 1789457 Nov 7 20:26
vmlinux-t11p01
-rwxr-xr-x 1 root root 2625016 Nov 10 10:15
vmlinux-t11p02
> 2a) If no, check the make logs between the vmlinux link line and bzImage
> creation. Compare the two and note any significant differences.
>
> 2b) If yes, write a perl script to compute symbol sizes from each
> System.map file. (Symbol size == address of next symbol minus
> address of this symbol.) Sort numerically, then compare old vs new
> for symbols that have grown a lot, or large new symbols.
>
> Peter
Whence Andi Kleen chipped in:
> No need to write one: ftp.firstfloor.org:/pub/ak/perl/bloat-o-meter
>
> -Andi
Running:
perl bloat-o-meter /boot/vmlinux-t11p01 /boot/vmlinux-t11p02 >
/tmp/bloat.out
looking at the output, the large positive changes seem to be
(doing it by eye, might have skipped and/or missed something):
Symbol Old size New size Delta Change (%)
slabinfo_write_proc 8 340 332 +4150.0
show_buffers 24 368 344 +1433.3
sys_nfsservctl 80 1060 980 +1225.0
dump_extended_fpu 8 84 76 +950.00
get_fpregs 36 372 336 +933.33
schedule_tail 16 144 128 +800.00
set_fpregs 36 272 236 +655.56
tty_release 16 108 92 +575.00
ext2_write_inode 20 108 88 +440.00
...
I have suppressed my momentary urge to post the whole thing, so
as not to arouse the legendary ire of this list. :)
Bob L.
On Sat, Nov 11, 2000 at 10:03:35AM -0800, Robert Lynch wrote:
> sys_nfsservctl 80 1060 980 +1225.0
> dump_extended_fpu 8 84 76 +950.00
> get_fpregs 36 372 336 +933.33
> schedule_tail 16 144 128 +800.00
> set_fpregs 36 272 236 +655.56
> tty_release 16 108 92 +575.00
> ext2_write_inode 20 108 88 +440.00
> ...
>
> I have surpressed my momentary urge to post the whole thing, so
> as not to arouse the legendary ire of this list. :)
Ordering by byte delta is more useful than by Change to get the real
pigs, because Change gives high values even for relatively small changes
(like 8 -> 84).
Also note that some of the output is bogus due to inaccurate nm output
(bloat-o-meter relies on nm).
-Andi
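[Andi's re-ranking is a one-liner; a sketch using numbers already posted in this thread (only three rows shown, purely as an example):]

```python
# Rank symbols by absolute byte delta rather than percent change, so big
# absolute growers (like stext_lock, later in the thread) outrank small
# symbols with huge percentages.
rows = [  # (symbol, old_size, new_size) -- figures from the posted table
    ("slabinfo_write_proc", 8, 340),
    ("sys_nfsservctl", 80, 1060),
    ("stext_lock", 4344, 29395),
]
by_delta = sorted(rows, key=lambda r: r[2] - r[1], reverse=True)
print(by_delta[0][0])  # -> stext_lock
```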
On Sat, Nov 11, 2000 at 04:46:09PM +0000, Tigran Aivazian wrote:
> I understand and agree with what you say except the number 4M. It is not
> 4M but 8M, imho. See arch/i386/kernel/head.S
You're reading 2.4.x, I was reading 2.2.x.
Andrea
Andi Kleen wrote:
>
> On Sat, Nov 11, 2000 at 10:03:35AM -0800, Robert Lynch wrote:
> > sys_nfsservctl 80 1060 980 +1225.0
> > dump_extended_fpu 8 84 76 +950.00
> > get_fpregs 36 372 336 +933.33
> > schedule_tail 16 144 128 +800.00
> > set_fpregs 36 272 236 +655.56
> > tty_release 16 108 92 +575.00
> > ext2_write_inode 20 108 88 +440.00
> > ...
> >
> > I have surpressed my momentary urge to post the whole thing, so
> > as not to arouse the legendary ire of this list. :)
>
> Ordering by byte delta is more useful than by Change to get the real
> pigs, because Change gives high values even for relatively small changes
> (like 8 -> 84)
>
> Also note that some of the output is bogus due to inaccurate nm output
> (bloat-o-meter relies on nm)
>
> -Andi
Yer right, here's a biggie I missed:
stext_lock 4344 29395 25051 +576.68
Bob L.
--
Robert Lynch-Berkeley CA [email protected]
Max Inux wrote:
>
> >gzip, actually. I can verify here "make bzImage" does the expected thing
> >and it looks normal-sized to me.
>
> I believe there is zImage (gzip) and bzImage (bzip2). (Or is it compress
> vs gzip, but then why bzImage vs gzImage?)
>
b is "big". They are both gzip compressed. zImage has a size limit
which bzImage doesn't. zImage is pretty much obsolete.
-hpa
--
<[email protected]> at work, <[email protected]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt
Tigran Aivazian wrote:
>
> On Fri, 10 Nov 2000, H. Peter Anvin wrote:
> > >
> > > On x86 machines there is a size limitation on booting. Though I thought
> > > it was 1024K as the max, 900K should be fine.
> > >
> >
> > No, there isn't. There used to be, but it has been fixed.
> >
>
> Are you sure? I thought the fix was to build 2 page tables for 0-8M
> instead of 1 page table for 0-4M. So, we still cannot boot a bzImage more
> than 2.5M which roughly corresponds to 8M. Is this incorrect? Are you
> saying I should be able to boot a bzImage corresponding to an ELF object
> vmlinux of 4G or more?
>
> I tried it and it failed (a few weeks ago) so at least reasonably recently
> what you are saying was not true. I will now check if it suddenly became
> true now.
>
That wasn't the fix in question (there was a 1 MB *compressed* limit for
a while), but you're right, for now the limit is 8 MB *uncompressed.*
-hpa
--
<[email protected]> at work, <[email protected]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt
Tigran Aivazian <[email protected]> writes:
> On Sat, 11 Nov 2000, Andrea Arcangeli wrote:
>
> > On Sat, Nov 11, 2000 at 02:51:21PM +0000, Tigran Aivazian wrote:
> > > Yes, Andrea, I know that paging is disabled at the point of loading the
> > > image but I was talking about the inability to boot (boot == complete
> > > booting, i.e. at least reach start_kernel()) a kernel with very large
> > > .data or .bss segments because of various reasons -- one of which,
> > > probably,is the inadequacy of those pg0 and pg1 page tables set up in
> > > head.S
> >
> > Ah ok, I thought you were talking about bootloader.
> >
> > About the initial pagetable setup on i386 port there's certainly a 3M limit on
> > the size of the kernel image, but it's trivial to enlarge it. BTW, exactly for
> > that kernel size limit reasons in x86-64 I defined a 40Mbyte mapping where we
> > currently have a 4M mapping and that's even simpler to enlarge since they're 2M
> > PAE like pagetables.
> >
> > > (which Peter says is infinite?) or the ones on .text/.data/.bss (and what
> > > exactly are they?)? See my question now?
> >
> > We sure hit the 3M limit on the .bss clearing right now.
> >
With respect to .bss issues, we should clear it before we set up page tables.
That way we have no hard limit short of 4GB.
We also do stupid things like set segment registers before setting up
a GDT. Yes we set them in setup.S but it is still a stupid non-obvious
dependency. We can do it in setup.S.
Eric
On Sat, Nov 11, 2000 at 10:57:20AM -0800, Robert Lynch wrote:
> Andi Kleen wrote:
> >
> > On Sat, Nov 11, 2000 at 10:03:35AM -0800, Robert Lynch wrote:
> > > sys_nfsservctl 80 1060 980 +1225.0
> > > dump_extended_fpu 8 84 76 +950.00
> > > get_fpregs 36 372 336 +933.33
> > > schedule_tail 16 144 128 +800.00
> > > set_fpregs 36 272 236 +655.56
> > > tty_release 16 108 92 +575.00
> > > ext2_write_inode 20 108 88 +440.00
> > > ...
> > >
> > > I have surpressed my momentary urge to post the whole thing, so
> > > as not to arouse the legendary ire of this list. :)
> >
> > Ordering by byte delta is more useful than by Change to get the real
> > pigs, because Change gives high values even for relatively small changes
> > (like 8 -> 84)
> >
> > Also note that some of the output is bogus due to inaccurate nm output
> > (bloat-o-meter relies on nm)
> >
> > -Andi
>
> Yer right, here's a biggie I missed:
That is the slow path of the spinlocks needed for fine-grained SMP
locking. Not really surprising that it bloated a bit, given all
the locking work that went into 2.4.
From looking at my UP configuration (where vmlinux's text segment has bloated
by about 500K between 2.2 and 2.4) there are no obvious big pigs, just lots of
small stuff that adds together.
-Andi
On Sat, Nov 11, 2000 at 12:35:46PM -0700, Eric W. Biederman wrote:
> With respect to .bss issues we should clear it before we set up page tables.
We could sure do that but that's a minor win since we still need a
large mapping (more than 1 pagetable) for the bootmem allocator. (and we need
at least 1 pagetable setup as ident mapping at 0x100000 for the instruction
where we enable paging)
> We also do stupid things like set segment registers before setting up
> a GDT. Yes we set them in setup.S but it is still a stupid non-obvious
^^^^
I think you meant: we set "it" (gdt_48) up.
> dependency. We we can do it in setup.S
I removed that dependency in x86-64.
Andrea
Andrea Arcangeli <[email protected]> writes:
> On Sat, Nov 11, 2000 at 12:35:46PM -0700, Eric W. Biederman wrote:
> > With respect to .bss issues we should clear it before we set up page tables.
>
> We could sure do that but that's a minor win since we still need a
> large mapping (more than 1 pagetable) for the bootmem allocator. (and we need
> at least 1 pagetable setup as ident mapping at 0x100000 for the instruction
> where we enable paging)
>
> > We also do stupid things like set segment registers before setting up
> > a GDT. Yes we set them in setup.S but it is still a stupid non-obvious
> ^^^^
> I think you meant: we set "it" (gdt_48) up.
I was thinking segment descriptors.
>
> > dependency. We we can do it in setup.S
>
> I removed that dependency in x86-64.
Maniacal cackle....
x86-64 doesn't load the segment registers at all before use.
This is BAD BAD BAD!!!!!!!
I can tell you don't have real hardware. The non-obviousness
of correct operation tripped you up as well.
So while you load the gdt before you set a segment register later,
which is good, the more important part was still missed.
O.k., on Monday I'll dig up my patch that clears this up.
Eric
On Sun, Nov 12, 2000 at 06:14:36AM -0700, Eric W. Biederman wrote:
> x86-64 doesn't load the segment registers at all before use.
Yes, before switching to 64bit long mode we never do any data access. We do a
stack access to clear eflags only while we still run in legacy mode with paging
disabled, so we only rely on ss being valid when the bootloader jumps to
0x100000 to execute the head.S code (and no longer on the gdt_48 layout).
> I can tell you don't have real hardware. The non obviousness
Current code definitely works fine on the simnow simulator, so if current code
shouldn't work because it's buggy, then at least the simulator is surely buggy
as well (and that isn't going to be the case, as its behaviour is in full sync
with the specs as far as I can see).
> So while you load the gdt before you set a segment register later,
> which is good the more important part was still missed.
Sorry but I don't see the missing part. Are you sure you're not missing this
part of the x86-64 specs?
Data and Stack Segments:
In 64-bit mode, the contents of the ES, DS, and SS segment registers
are ignored. All fields (base, limit, and attribute) in the
corresponding segment descriptor registers (hidden part) are also
ignored.
Address calculations in 64-bit mode that reference the ES, DS, or SS
segments, are treated as if the segment base is zero. Rather than
perform limit checks, the processor instead checks that all
virtual-address references are in canonical form.
You'll find the above at the top of page 42 of the specs.
Basically in 64bit long mode only CS matters and basically only to specify
CS.L=1 and CS.D=0.
The only subtle case is during iret where we need a valid data segment for some
subtle reason (but that's unrelated to head.S that instead only needs to
switch to 64bit mode and jump into head64.c where we do the rest of the work
like clearing bss in C). Infact we need only 1 32bit compatibility mode data
segment in the gdt.
> O.k. on monday I'll dig up my patch and that clears this up.
Sure, go ahead if you weren't missing that basic part of the long mode specs.
Thanks.
Andrea
On Sun, Nov 12, 2000 at 04:37:05PM +0100, Andrea Arcangeli wrote:
> > I can tell you don't have real hardware. The non obviousness
>
> Current code definitely works fine on the simnow simulator so if current code
> shouldn't work because it's buggy then at least the simulator is sure buggy as
> well (and that isn't going to be the case as its behaviour is in full sync with
> the specs as far I can see).
The current simulator seems to be buggy in that it checks the SS,DS segments
that were pushed as part of the interrupt stack on iretd (which it
IMHO shouldn't according to the spec). We currently need to have a valid kernel
DS to make the interrupts work.
-Andi
On Sat, Nov 11, 2000 at 12:09:41PM -0800, H. Peter Anvin wrote:
> a while), but you're right, for now the limit is 8 MB *uncompressed.*
s/8/7/ (kernel starts at 1M)
Andrea
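[Andrea's s/8/7/ correction in numbers (a sketch; the 4M and 8M mapping sizes are the figures discussed earlier in the thread for 2.2.x and 2.4.x respectively):]

```python
# The kernel is loaded at 1 MB (0x100000), so the usable budget for the
# uncompressed image is the initial mapping size minus that load address.
MB = 1024 * 1024
LOAD_ADDR = 1 * MB  # kernel starts at 1 MB

def image_budget(mapping_mb):
    """Max uncompressed kernel image for a given initial mapping size."""
    return mapping_mb * MB - LOAD_ADDR

print(image_budget(4) // MB, "MB")  # 2.2.x: 4 MB mapping -> 3 MB image
print(image_budget(8) // MB, "MB")  # 2.4.x: 8 MB mapping -> 7 MB image
```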
On Sun, Nov 12, 2000 at 04:44:17PM +0100, Andi Kleen wrote:
> The current simulator seems to be buggy in that it checks the SS,DS segments
>that were pushed as part of the interrupt stack on iretd [..]
That's the first thing I thought too, indeed 8), but it may be because at
iret time the CPU doesn't know if it will have to return to compatibility mode
or not. If we feed ss/ds with a 32bit compatibility mode data segment, all
should work right as well (this should be verified on the simulator though).
Andrea
Andrea Arcangeli <[email protected]> writes:
> On Sun, Nov 12, 2000 at 06:14:36AM -0700, Eric W. Biederman wrote:
> > x86-64 doesn't load the segment registers at all before use.
>
> Yes, before switching to 64bit long mode we never do any data access. We do a
> stack access to clear eflags only while we still run in legacy mode with paging
> disabled and so we only rely on ss to be valid when the bootloader jumps at
> 0x100000 for executing the head.S code (and not anymore on the gdt_48 layout).
Nope, you rely on cs & ds as well. cs is just a duh -- the code's running,
so it must be valid. But ds is needed for lgdt.
> > I can tell you don't have real hardware. The non obviousness
I need to retract this a bit. You are still building a compressed image,
and the code in the boot/compressed/head.S remains unchanged and loads
segment registers, so it works by luck. If you didn't build a
compressed image you would be in trouble.
> Current code definitely works fine on the simnow simulator so if current code
> shouldn't work because it's buggy then at least the simulator is sure buggy as
> well (and that isn't going to be the case as its behaviour is in full sync with
> the specs as far I can see).
Add a target for a noncompressed image and then build. It should be
interesting to watch.
>
> > So while you load the gdt before you set a segment register later,
> > which is good the more important part was still missed.
>
> Sorry but I don't see the missing part. Are you sure you're not missing this
> part of the x86-64 specs?
Nope because what I was complaining about is in 32 bit mode. :)
> Data and Stack Segments:
>
> In 64-bit mode, the contents of the ES, DS, and SS segment registers
> are ignored. All fields (base, limit, and attribute) in the
> corresponding segment descriptor registers (hidden part) are also
> ignored.
Hmm. I'll have to look and see if FS & GS are also ignored.
> Address calculations in 64-bit mode that reference the ES, DS, or SS
> segments, are treated as if the segment base is zero. Rather than
> perform limit checks, the processor instead checks that all
> virtual-address references are in canonical form.
Cool I like this bit. The segments are finally dead.
> > O.k. on monday I'll dig up my patch and that clears this up.
>
> Sure, go ahead if you weren't missing that basic part of the long mode specs.
> Thanks.
Nope. Though I suspect we should do the switch to 64bit mode in
setup.S and not have these issues pollute head.S at all.
Eric
Andrea Arcangeli <[email protected]> writes:
> On Sun, Nov 12, 2000 at 06:14:36AM -0700, Eric W. Biederman wrote:
> > x86-64 doesn't load the segment registers at all before use.
>
> Yes, before switching to 64bit long mode we never do any data access. We do a
> stack access to clear eflags only while we still run in legacy mode with paging
> disabled and so we only rely on ss to be valid when the bootloader jumps at
> 0x100000 for executing the head.S code (and not anymore on the gdt_48 layout).
Actually it just occurred to me that this stack access is buggy. You haven't
set up a stack yet. Only boot/compressed/head.S did, and that location isn't
safe to use.
Eric
[This is quite a bizarre discussion, but I'll answer anyways. I am not exactly
sure what your point is]
On Sun, Nov 12, 2000 at 11:57:15AM -0700, Eric W. Biederman wrote:
>
> > > I can tell you don't have real hardware. The non obviousness
>
> I need to retract this a bit. You are still building a compressed image,
> and the code in the boot/compressed/head.S remains unchanged and loads
> segment registers, so it works by luck. If you didn't build a
> compressed image you would be in trouble.
boot/compressed/head.S does run in 32bit legacy mode, where you of course
need segment registers. After you get into long mode, segments are only
needed to jump between 32/64bit code segments and for the data segment
of the 32bit emulation (plus the iretd bug currently, which I hope will be
fixed in final hardware).
Also note that boot/compressed/* currently does not even link, because the
x86-64 toolchain cannot generate relocated 32bit code ATM (the linker chokes
on the 32bit relocations). The tests we did so far used a precompiled
relocated binary compressed/{head,misc}.o from an IA32 build.
> > In 64-bit mode, the contents of the ES, DS, and SS segment registers
> > are ignored. All fields (base, limit, and attribute) in the
> > corresponding segment descriptor registers (hidden part) are also
> > ignored.
>
> Hmm. I'll have to look and see if FS & GS are also ignored.
They are not; to fully use them you need privileged MSRs.
Their limit is ignored.
> > Sure, go ahead if you weren't missing that basic part of the long mode specs.
> > Thanks.
>
> Nope. Though I suspect we should do the switch to 64bit mode in
> setup.S and not have these issues pollute head.S at all.
I see no advantage in doing it there instead of in head.S
-Andi
On Sun, Nov 12, 2000 at 11:57:15AM -0700, Eric W. Biederman wrote:
> Nope you rely on cs & ds as well. cs is just a duh the codes running
> so it must be valid. But ds is needed for lgdt.
Right. ds just needs to be valid, as do cs and ss (for obvious reasons I
didn't even mention that cs needs to be valid).
i386 instead depends on the selectors in desc.h being the same as the ones
used by the decompression code; we only need valid ones instead. I think
relying on the decompression code to provide sane segment selectors isn't as
ugly as being dependent on its own private gdt layout.
If I don't want to rely on ds being valid to do the lgdt, then I need to
rely on even more stuff from the decompression code, as i386 in fact does,
see? I just prefer to require the decompression code to provide a sane ds/ss
(and cs). I know the decompression code returns valid ds/ss, and I think the
current requirement is the cleaner one; I don't see any problem in doing that.
> I need to retract this a bit. You are still building a compressed image,
Sure, non-compressed images aren't supported (I'm not sure it's even worth
supporting uncompressed images in the long run, as those machines will have
_enough_ memory to decompress the kernel; even my dragonball based PDA has,
btw :). And anyway, at this point in time what we really care about is that a
bzImage boots from floppy, and that works just fine (definitely not by luck).
> Nope. Though I suspect we should do the switch to 64bit mode in
> setup.S and not have these issues pollute head.S at all.
The only point of head.S is to switch to 64bit with a minimal pagetable setup
in place and then to jump into kernel virtual address space to run 64bit C
code. If we did the switch to 64bit in setup.S then we would have to change
the decompression code, and the decompression code would no longer work with
other x86 bootloaders whose "setup.S" hasn't been changed. So I'd consider it
a very bad thing.
Also the current head.S has much less pollution than the i386 one IMHO, as
it's been rewritten to do only the minimal stuff in asm, and a new head64.c
has been created to fix up all the rest in C so that it's readable and
maintainable; I hope you enjoy this too ;).
Andrea
On Sun, Nov 12, 2000 at 12:20:19PM -0700, Eric W. Biederman wrote:
> Actually it just occurred to me that this stack access is buggy. You haven't
> set up a stack yet. [..]
Yes, ss and esp are inherited from the decompression code right now.
> [..] Only the boot/compressed/head.S did and that location
> isn't safe to use.
It's 0x90000 here and that's safe. However I see it should have been something
like 0x1037a0 instead, and that would overwrite 4 bytes of the decompressed
image; not sure why it happens to be safe right now, hmm.
As for ss, I'd still depend on the decompression code to give back a sane
segmented environment.
Andrea
BTW, the checks after line 153 in linux/arch/i386/boot/tools/build.c
reflect all those limitations.
- Werner
--
_________________________________________________________________________
/ Werner Almesberger, ICA, EPFL, CH [email protected] /
/_IN_N_032__Tel_+41_21_693_6621__Fax_+41_21_693_6610_____________________/
Andi Kleen <[email protected]> writes:
> [This is quite a bizarre discussion, but I'll answer anyways. I am not exactly
> sure what your point is]
Let me step aside a second and explain where I'm coming from. As a spin-off
of the work of the LinuxBIOS project, I have implemented a system call that
provides exec functionality at the kernel level, essentially allowing you to
warm boot Linux from Linux. To get this to work no BIOS calls are involved,
so I'm not using setup.S. This also has the interesting side effect of
allowing a boot loader to be written that will work on all Linux platforms
(I have currently just begun my port to alpha).
In the process of the above I have learned quite a bit about how the current
boot loader works, and I eventually want to convert Linux to not need wrapper
code to use my bootloader.
Booting vmlinux is fun :)
> On Sun, Nov 12, 2000 at 11:57:15AM -0700, Eric W. Biederman wrote:
> >
> > > > I can tell you don't have real hardware. The non obviousness
> >
> > I need to retract this a bit. You are still building a compressed image,
> > and the code in the boot/compressed/head.S remains unchanged and loads
> > segment registers, so it works by luck. If you didn't build a
> > compressed image you would be in trouble.
>
> boot/compressed/head.S does run in 32bit legacy mode, where you of course
> need segment registers. After you got into long mode segments are only
> needed to jump between 32/64bit code segments and and for a the data segment
> of the 32bit emulation (+ the iretd bug currently which I hope will be fixed
> in final hardware)
>
> Also note that boot/compressed/* currently does not even link, because the
> x86-64 toolchain cannot generate relocated 32bit code ATM (the linker chokes
> on the 32bit relocations) The tests we did so far used a precompiled
> relocated binary compressed/{head,misc}.o from a IA32 build.
...
> > > Sure, go ahead if you weren't missing that basic part of the long mode specs.
> > > Thanks.
> >
> > Nope. Though I suspect we should do the switch to 64bit mode in
> > setup.S and not have these issues pollute head.S at all.
>
> I see no advantage in doing it there instead of in head.S
After reading through the long mode specs I now agree. If you could
be in long mode with the mmu disabled that would be a different story,
but you can't and it isn't.
I was thinking of symmetry with the x86 and how much easier everything
is if you only use one processor mode for the initial bootstrap. No
need for super assemblers etc. Oh well.
On x86 there are some real advantages to moving the segment loads into
setup.S from the various head.S's and they still apply (although to a
lesser extent) to x86-64. This causes less code confusion.
For my kexec stuff I now need to think really hard about how I want to
handle x86-64. What I was thinking would work well in general is to
start the processor in its native/optimal mode with the mmu disabled.
With x86-64 I can't do this, unfortunately :(
Eric