FYI, I walked Chris through using DOS DEBUG via private email; below are the results after the INT 15, using the instruction sequence I posted on the mailing list. Notice that HIMEM was not loaded.
----------------------------------------
> From: [email protected]
> To: [email protected]
> Subject: Re: 2.6.38.3 and 2.6.39-rc4 hangs after "Booting the kernel" on quad Pentium Pro system
> Date: Sat, 23 Apr 2011 18:55:44 +1000
>
> On Sat, 23 Apr 2011 06:28:26 PM Yuhong Bao wrote:
>
>> Actually, execute with G instead, and do it without HIMEM.SYS
>> loaded.
>
> I couldn't see any HIMEM.SYS on the FreeDOS floppy (including
> looking for hidden files) so I'm presuming it's not there. There
> was a HIMEM.EXE so I've renamed it to NOHIMEM.OFF in the hope
> that it won't get executed, even in the safe mode I was using.
>
> Using G instead of A I now get:
>
> -G
> Unexpected breakpoint interrupt.
> AX=3C00 BX=0F00 CX=0000 DX=0000 SP=FFFE BP=0000 SI=0000 DI=0000
> DS=2890 ES=2890 SS=2890 CS=2890 IP=0106 NV UP DI PL NZ NA PE NC
> 2890:0106 0000 ADD [BX+SI], AL D:0F00=00
> -
>
> cheers,
> Chris
> --
> Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
>
> This email may come with a PGP signature as a file. Do not panic.
> For more info see: http://en.wikipedia.org/wiki/OpenPGP
-
On Saturday, 23.04.2011, at 03:47 -0700, Yuhong Bao wrote:
> FYI, I walked Chris through using DOS DEBUG via private email; below are the results after the INT 15, using the instruction sequence I posted on the mailing list. Notice that HIMEM was not loaded.
>
> ----------------------------------------
> > From: [email protected]
> > To: [email protected]
> > Subject: Re: 2.6.38.3 and 2.6.39-rc4 hangs after "Booting the kernel" on quad Pentium Pro system
> > Date: Sat, 23 Apr 2011 18:55:44 +1000
> >
> > On Sat, 23 Apr 2011 06:28:26 PM Yuhong Bao wrote:
> >
> >> Actually, execute with G instead, and do it without HIMEM.SYS
> >> loaded.
> >
> > I couldn't see any HIMEM.SYS on the FreeDOS floppy (including
> > looking for hidden files) so I'm presuming it's not there. There
> > was a HIMEM.EXE so I've renamed it to NOHIMEM.OFF in the hope
> > that it won't get executed, even in the safe mode I was using.
> >
> > Using G instead of A I now get:
> >
> > -G
> > Unexpected breakpoint interrupt.
> > AX=3C00 BX=0F00 CX=0000 DX=0000 SP=FFFE BP=0000 SI=0000 DI=0000
> > DS=2890 ES=2890 SS=2890 CS=2890 IP=0106 NV UP DI PL NZ NA PE NC
> > 2890:0106 0000 ADD [BX+SI], AL D:0F00=00
> > -
> >
So your BIOS seems to report the size in AX/BX. The code in
arch/x86/boot/memory.c moves the returned sizes from CX/DX into AX/BX when
CX or DX is not zero.

Could you try to change the line:

	} else if (oreg.ax == 15*1024) {
		boot_params.alt_mem_k = (oreg.dx << 6) + oreg.ax;

to

	} else if (oreg.ax == 15*1024) {
		boot_params.alt_mem_k = (oreg.bx << 6) + oreg.ax;

That should fix your misdetection.

The assembler code in arch/i386/boot/setup.S seemed to move AX/BX into
CX/DX when CX and(!) DX were zero, and then used CX/DX to calculate the
memory size.

PS: gitk --follow arch/x86/boot/memory.c seems to react strangely...
On 04/23/2011 06:52 AM, Thomas Meyer wrote:
>
> So your BIOS seems to report the size in AX/BX. The code in
> arch/x86/boot/memory.c moves the returned sizes from CX/DX into AX/BX when
> CX or DX is not zero.
>
> Could you try to change the line:
>
> } else if (oreg.ax == 15*1024) {
> boot_params.alt_mem_k = (oreg.dx<< 6) + oreg.ax;
>
> to
> } else if (oreg.ax == 15*1024) {
> boot_params.alt_mem_k = (oreg.bx<< 6) + oreg.ax;
>
> That should fix your misdetection.
>
> The assembler code in arch/i386/boot/setup.S seemed to move AX/BX into
> CX/DX when CX and(!) DX were zero, and then used CX/DX to calculate the
> memory size.
>
> PS: gitk --follow arch/x86/boot/memory.c seems to react strangely...
>
Ah yes, this should have been bx; ax/bx and cx/dx form pairs (unlike
the normal x86 convention of DX:AX and BX:CX forming pairs), and it
doesn't make sense to mix and match them.
-hpa
Hi Thomas,
On Sat, 23 Apr 2011 11:52:27 PM Thomas Meyer wrote:
> Could you try to change the line:
>
> } else if (oreg.ax == 15*1024) {
> boot_params.alt_mem_k = (oreg.dx << 6) + oreg.ax;
>
> to
> } else if (oreg.ax == 15*1024) {
> boot_params.alt_mem_k = (oreg.bx << 6) + oreg.ax;
>
> That should fix your misdetection.
Took me a while to spot the D->B change - good catch!
I can confirm the allnoconfig now no longer hangs and proceeds
to panic about not being able to mount the rootfs, which is good.
I'll build a more functional kernel now and check it boots OK.
All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
On Sat, 23 Apr 2011 11:52:27 PM Thomas Meyer wrote:
> That should fix your misdetection.
A fuller kernel does proceed further than before, before running
into issues where both the e100 and aic7xxx drivers complain about
not being able to find their IRQs and that this might be due to
a buggy MP table.
I presume this is unrelated to the initial problem?
cheers,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
On Sun, 24 Apr 2011 10:16:22 AM Chris Samuel wrote:
> A fuller kernel does proceed further than before, before running
> into issues where both the e100 and aic7xxx drivers complain about
> not being able to find their IRQs and that this might be due to
> a buggy MP table.
Booting with noapic nolapic removes the IRQ warning but the
system still times out trying to mount the root filesystem and
reboots itself.
I'll be away for a day or two now.
All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
On Sat, Apr 23, 2011 at 5:22 PM, Chris Samuel <[email protected]> wrote:
> On Sun, 24 Apr 2011 10:16:22 AM Chris Samuel wrote:
>
>> A fuller kernel does proceed further than before, before running
>> into issues where both the e100 and aic7xxx drivers complain about
>> not being able to find their IRQs and that this might be due to
>> a buggy MP table.
>
> Booting with noapic nolapic removes the IRQ warning but the
> system still times out trying to mount the root filesystem and
> reboots itself.
You may need your .config to have
CONFIG_SMP
CONFIG_PCI
CONFIG_X86_MPPARSE
set to use mptable from BIOS.
Thanks
Yinghai
Hello Yinghai,
On Sun, 24 Apr 2011 06:46:03 PM Yinghai Lu wrote:
> You may need your .config to have
> CONFIG_SMP
> CONFIG_PCI
> CONFIG_X86_MPPARSE
> set to use mptable from BIOS.
Thanks for that - I already have all 3 of those set I'm afraid.
All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
Commit-ID: 39b68976ac653cfdc7f872a293e8b7928de2dcc6
Gitweb: http://git.kernel.org/tip/39b68976ac653cfdc7f872a293e8b7928de2dcc6
Author: H. Peter Anvin <[email protected]>
AuthorDate: Mon, 25 Apr 2011 14:52:37 -0700
Committer: H. Peter Anvin <[email protected]>
CommitDate: Mon, 25 Apr 2011 14:52:37 -0700
x86, setup: When probing memory with e801, use ax/bx as a pair

When we use BIOS function e801 to probe memory, we should use ax/bx
(or cx/dx) as a pair, not mix and match. This was a typo during the
translation from assembly code, and breaks at least one set of
machines in the field (which return cx = dx = 0).

Reported-and-tested-by: Chris Samuel <[email protected]>
Fix-proposed-by: Thomas Meyer <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
---
arch/x86/boot/memory.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/arch/x86/boot/memory.c b/arch/x86/boot/memory.c
index cae3feb..db75d07 100644
--- a/arch/x86/boot/memory.c
+++ b/arch/x86/boot/memory.c
@@ -91,7 +91,7 @@ static int detect_memory_e801(void)
 	if (oreg.ax > 15*1024) {
 		return -1;	/* Bogus! */
 	} else if (oreg.ax == 15*1024) {
-		boot_params.alt_mem_k = (oreg.dx << 6) + oreg.ax;
+		boot_params.alt_mem_k = (oreg.bx << 6) + oreg.ax;
 	} else {
 		/*
 		 * This ignores memory above 16MB if we have a memory
On Tue, 26 Apr 2011 09:25:02 AM tip-bot for H. Peter Anvin wrote:
> x86, setup: When probing memory with e801, use ax/bx as a pair
ACK. Thanks!
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
* Chris Samuel <[email protected]> wrote:
> On Tue, 26 Apr 2011 09:25:02 AM tip-bot for H. Peter Anvin wrote:
>
> > x86, setup: When probing memory with e801, use ax/bx as a pair
>
> ACK. Thanks!
Nice!
Just asking, does the box now boot to user-space, or are there other
regressions left?
Thanks,
Ingo
Hi Ingo,
On 26/04/11 18:12, Ingo Molnar wrote:
> Just asking, does the box now boot to user-space, or are there other
> regressions left?
Well I now have to boot the box with "noapic" otherwise I hit
IRQ problems in the e100 and aic7xxx drivers, they report:
can't find IRQ for PCI INT A; probably buggy MP table
Even with "noapic" the user space seems to try and load the
software RAID before the SCSI driver resulting in it timing
out locating the root filesystem and resetting the box. I
suspect that may be an initramfs issue though..
cheers,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
On Tue, 26 Apr 2011 06:12:37 PM Ingo Molnar wrote:
> Just asking, does the box now boot to user-space, or are there
> other regressions left?
Final update on this - I can now boot all the way to userspace
successfully using an unpatched 2.6.39-rc6 with the parameters:
noapic scsi_mod.scan=sync
So I'm really happy, thanks so much to everyone for their help!
All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
* Chris Samuel <[email protected]> wrote:
> On Tue, 26 Apr 2011 06:12:37 PM Ingo Molnar wrote:
>
> > Just asking, does the box now boot to user-space, or are there
> > other regressions left?
>
> Final update on this - I can now boot all the way to userspace
> successfully using an unpatched 2.6.39-rc6 with the parameters:
>
> noapic scsi_mod.scan=sync
>
> So I'm really happy, thanks so much to everyone for their help!
Thanks for the update!
Note that the noapic and the scsi-sync-scan boot parameters suggest that
there are two regressions remaining.
The stock kernel won't boot - you can work around it via boot options but most
users won't be able to do that.
Thanks,
Ingo
On Thu, 5 May 2011 10:10:59 PM Ingo Molnar wrote:
> Thanks for the update!
My pleasure.
> Note that the noapic and the scsi-sync-scan boot parameters suggest
> that there are two regressions remaining.
Understood, I would guess that the SCSI one would map to the
introduction of the async scsi scanning patch in 2.6.19-rc2
and its enablement in later Ubuntu kernels.
No idea on the APIC one but I'm happy to try and bisect both
cases if you'd like me to try ?
> The stock kernel won't boot - you can work around it via boot
> options but most users won't be able to do that.
Indeed - though at the moment you can't even install a current
Debian release as the boot loader on the install CD locks the
box up. :-(
cheers,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
* Chris Samuel <[email protected]> wrote:
> On Thu, 5 May 2011 10:10:59 PM Ingo Molnar wrote:
>
> > Thanks for the update!
>
> My pleasure.
>
> > Note that the noapic and the scsi-sync-scan boot parameters suggest
> > that there are two regressions remaining.
>
> Understood, I would guess that the SCSI one would map to the
> introduction of the async scsi scanning patch in 2.6.19-rc2
> and its enablement in later Ubuntu kernels.
Yeah. Async SCSI scanning was not supposed to break any existing setup.
> No idea on the APIC one but I'm happy to try and bisect both
> cases if you'd like me to try ?
I could definitely do something about the APIC regression if you managed to narrow
down the commit range (a specific guilty commit would be fantastic of course).
If the regression got introduced after the e801 regression you'll need to run:

    git cherry-pick 39b68976ac65

at every bisection step that needs that fix - but still bisect as if that extra
commit was not there. (Bisection will throw away that temporary cherry-picked
tree, so you will have to re-pick the commit again and again.)
Note that during bisection the current tree might jump in and out of regions
that need this fix, so be prepared to have to do the cherry-picking at random
places. You can attempt the cherry-pick at every step and you will get a
conflict and it will not succeed if the fix is not needed. You can throw away
the conflicting state via 'git reset --hard'.
> > The stock kernel won't boot - you can work around it via boot
> > options but most users won't be able to do that.
>
> Indeed - though at the moment you can't even install a current
> Debian release as the boot loader on the install CD locks the
> box up. :-(
Is that hang due to one of these 3 regressions - or is it a fourth regression
perhaps?
While the installed base of your hardware is small, I think such old-hardware
testing is still very valuable feedback to us: it gives us a feel for how
corrosive our development process is to long-term (10+ years) stability.
Thanks,
Ingo
On Fri, 6 May 2011 10:04:52 PM Ingo Molnar wrote:
> * Chris Samuel <[email protected]> wrote:
>
> > Understood, I would guess that the SCSI one would map to the
> > introduction of the async scsi scanning patch in 2.6.19-rc2
> > and its enablement in later Ubuntu kernels.
>
> Yeah. Async SCSI scanning was not supposed to break any existing
> setup.
Well that'd be about my luck at the moment. ;-)
> > No idea on the APIC one but I'm happy to try and bisect both
> > cases if you'd like me to try ?
>
> I could definitely do something about the APIC regression if
> you managed to narrow down the commit range (a specific guilty commit
> would be fantastic of course). If the regression got introduced
> after the e801 regression you'll need to run:
>
> git cherry-pick 39b68976ac65
>
> at every bisection step that needs that fix - but still bisect as
> if that extra commit was not there. (bisection will throw away
> that cherry-picking temporary tree so you will have to re-pick the
> commit again and again)
That's great, will try and see what I can do. It might take a little
bit of time due to work commitments prior to a (planned) trip to
hospital next Thursday.
> Note that during bisection the current tree might jump in and out
> of regions that need this fix, so be prepared to have to do the
> cherry-picking at random places. You can attempt the cherry-pick
> at every step and you will get a conflict and it will not succeed
> if the fix is not needed. You can throw away the conflicting state
> via 'git reset --hard'.
Understood, thanks!
> > Indeed - though at the moment you can't even install a current
> > Debian release as the boot loader on the install CD locks the
> > box up. :-(
>
> Is that hang due to one of these 3 regressions - or is it a fourth
> regression perhaps?
This is before it boots the kernel, so I'd guess something in
whatever they are using for the Squeeze install CD - perhaps
grub2 now ?
> While the installed base of your hardware is small, I think such
> old-hardware testing is still very valuable feedback to us: it
> gives us a feel for how corrosive our development process is to
> long-term (10+ years) stability.
Great, as long as this is more useful than just fixing my problems!
cheers,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
> x86, setup: When probing memory with e801, use ax/bx as a pair
I think it is time to push this patch into stable branches too.