2022-05-02 22:38:02

by Jason A. Donenfeld

Subject: Re: Changes in kernel 5.18-rc1 leads to crashes in VirtualBox Virtual Machines

Hi Larry,

On Mon, May 2, 2022 at 4:55 AM Larry Finger <[email protected]> wrote:
> On 5/1/22 20:05, Jason A. Donenfeld wrote:
> > Hi Larry,
> >
> > On Mon, May 02, 2022 at 02:11:13AM +0200, Jason A. Donenfeld wrote:
> >> Hey again,
> >>
> >> I just installed VirtualBox on top of 5.18-rc4, and then I made a new VM
> >> with a fresh install of OpenSUSE, and everything is fine. No issues at
> >> all.
> >>
> >> So you're going to have to provide more information.
> >>
> >> Jason
> >
> > With still no more information provided from you, I've gone scouring and
> > found your much more informative bug report here:
> > https://www.virtualbox.org/ticket/20914 along with a larger log here
> > https://www.virtualbox.org/attachment/ticket/20914/Windows%2010%20Clone-2022-04-24-20-55-56.log
> >
> > Why would you not have sent me all this information right away? Surely
> > you know how to report bugs. If you're going to concern me with the
> > possibility that I've broken something, at least give me enough detail
> > to be able to do something. Otherwise it's pure frustration.
> >
> > Anyway, it's still too little information, but I could extract the
> > Windows build from that log file, pull down ntoskrnl.exe and hope it
> > roughly matches, and then go to work in IDA Pro trying to figure out
> > what's going on at ntoskrnl.exe+3f7d50, and if I managed to grab the
> > right build -- which I more than likely did not -- then that's a `mov
> > byte ptr gs:853h, 0` in KiInterruptDispatch, which seems entirely
> > unrelated to the change you mentioned.
> >
> > So I think it'd be a good moment for you to show your bisect logs so we
> > can be certain we're after the right thing.
>
> LKML removed from cc due to large files.
>
> Yes, I do know how to report bugs. If you remember my first E-mail, I was just
> looking for some suggestions on how using rdrand and rdseed could conflict with
> your changes. I'm sorry that you think I'm wasting your time.
>
> Where did you get your copy of VirtualBox? Perhaps they have some fixes that I
> do not know about.

I patched
<https://dev.gentoo.org/~polynomial-c/virtualbox/vbox-kernel-module-src-6.1.34.tar.xz>
using <https://xn--4db.cc/AtB1jwli>.

> My bisect logs are gone. I will need to recreate them and I should have them
> tomorrow. I do have my paper log to create the bisect. I will have it for you
> tomorrow.
>
> I ran the VM again and got a slightly different result. The kernel exception was
> at ntoskrnl.exe+458647. The mini dump is attached. The ntosknl.exe is available
> at https:/lwfinger.com/download/ntosknl.exe.gz.

You spelled your URL wrong in two places. Had to guess how to fix it.
Please spend more time with your bug reports. This is already more
painful than it should be.

From looking at the minidump you sent, I don't see how this is related
to the RNG. Maybe something else is wrong with your VirtualBox, and
you're just experiencing a 5.17->5.18 transition. The VirtualBox team
themselves said they haven't released the modules for 5.18 yet.
Then on top of that, maybe you're bisecting wrong.

Anyway, from that minidump...

PROCESS_NAME: svchost.exe
STACK_TEXT:
ffff8603`177407f8 fffff806`30464647 : 00000000`0000001e ffffffff`c0000005 fffff806`3062797c 00000000`00000000 : nt!KeBugCheckEx
ffff8603`17740800 fffff806`30415dac : 00000000`00001000 ffff8603`177410a0 ffff8000`00000000 00000000`00000000 : nt!KiDispatchException+0x17c287
ffff8603`17740ec0 fffff806`30411f43 : 00000000`00000001 ffffa20d`a3e00340 00000000`00000060 00000000`00000000 : nt!KiExceptionDispatch+0x12c
ffff8603`177410a0 fffff806`3062797c : 00000000`000000c8 fffff806`30248da4 00000000`00000000 00000000`00000001 : nt!KiPageFault+0x443
ffff8603`17741230 fffff806`3064606e : 00000000`00000000 ffffdd8e`e4fe9970 00000000`00000000 00000000`00000000 : nt!MiPfPrepareReadList+0x4c
ffff8603`17741320 fffff806`30645de4 : ffffa20d`ac52dcc0 00000000`00000000 00000000`00000000 ffffdd8e`e4fe9970 : nt!MmPrefetchPagesEx+0x96
ffff8603`17741390 fffff806`3064b349 : 00000000`00000000 ffff8603`00000000 ffffa20d`00000000 00000000`00000006 : nt!PfpPrefetchFilesTrickle+0x2a8
ffff8603`17741480 fffff806`3064bb6e : ffffa20d`abf59000 ffffa20d`abf59000 ffff8603`177416a0 00000000`00000000 : nt!PfpPrefetchRequestPerform+0x299
ffff8603`177415f0 fffff806`30651679 : 00000000`00000001 fffff806`302c0c01 ffffdd8e`e9e81760 ffffa20d`abf59000 : nt!PfpPrefetchRequest+0x132
ffff8603`17741670 fffff806`3065050d : ffffdd8e`00000000 00000000`00000000 00000000`1d16c86a 00000000`1d16c801 : nt!PfSetSuperfetchInformation+0x155
ffff8603`17741770 fffff806`304156b5 : 00000000`00000000 00000000`00000000 ffff8603`17741b80 00000000`00000000 : nt!NtSetSystemInformation+0x9bd
ffff8603`17741b00 00007fff`5b9b0274 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x25
00000075`ba37f9c8 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x00007fff`5b9b0274
SYMBOL_NAME: nt!MiPfPrepareReadList+4c
MODULE_NAME: nt
IMAGE_VERSION: 10.0.19041.1682
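
For anyone reading the trace by hand, here is a rough sketch of how the KeBugCheckEx frame above can be decoded. It assumes the four values between the colons are the first four arguments to KeBugCheckEx (the bugcheck code followed by its first three parameters), which is how the debugger normally lays out this frame. The helper name and the two lookup tables are made up for illustration and only cover the values that appear in this dump.

```python
# The KeBugCheckEx frame copied from the STACK_TEXT above.
FRAME = ("ffff8603`177407f8 fffff806`30464647 : 00000000`0000001e "
         "ffffffff`c0000005 fffff806`3062797c 00000000`00000000 : nt!KeBugCheckEx")

# Illustrative tables: only the values seen in this dump.
BUGCHECK_NAMES = {0x1E: "KMODE_EXCEPTION_NOT_HANDLED"}
NTSTATUS_NAMES = {0xC0000005: "STATUS_ACCESS_VIOLATION"}


def parse_frame(line):
    """Split one frame into (child_sp, ret_addr, args, symbol)."""
    head, args, symbol = [part.strip() for part in line.split(" : ")]

    def to_int(text):
        # The debugger prints 64-bit values as high`low; drop the backtick.
        return int(text.replace("`", ""), 16)

    child_sp, ret_addr = (to_int(x) for x in head.split())
    return child_sp, ret_addr, [to_int(a) for a in args.split()], symbol


child_sp, ret_addr, args, symbol = parse_frame(FRAME)
bugcheck_code = args[0]
exception_code = args[1] & 0xFFFFFFFF  # NTSTATUS is 32 bits, shown sign-extended
exception_addr = args[2]               # address of the faulting instruction

print(symbol, BUGCHECK_NAMES.get(bugcheck_code, hex(bugcheck_code)))
print("exception:", NTSTATUS_NAMES.get(exception_code, hex(exception_code)))
print("at:", hex(exception_addr))
```

The exception address decoded here is the same value as the return address recorded in the nt!KiPageFault frame, i.e. the instruction inside nt!MiPfPrepareReadList that faulted.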

Loading up the kernel image, we see:

PAGE:000000014061B946 mov r13, rcx
[...]
PAGE:000000014061B96F mov rax, [r13+0]
[...]
PAGE:000000014061B97C mov rdx, [rax+28h]

So it dereferences the first argument of MiPfPrepareReadList(), and then
dereferences offset 0x28 of that, and crashes there. Looks like the same
thing happens in your other traces, judging by the bugcheck data showing
offset 0x28 in those too.
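
As a sanity check on the addresses, here is a minimal sketch of the arithmetic that ties the runtime addresses in the trace to the ones shown in the disassembly. It rests on two assumptions that are not stated anywhere in this thread: that the reported "ntoskrnl.exe+458647" is the return address seen in the KiDispatchException frame, and that the disassembler loaded the image at the usual 64-bit PE preferred base of 0x140000000. All variable names are made up for illustration.

```python
# Values copied from the report and the STACK_TEXT above.
REPORTED_RVA = 0x458647                  # "ntoskrnl.exe+458647"
RET_ADDR_RUNTIME = 0xFFFFF80630464647    # assumed to be the same location
FAULT_RUNTIME = 0xFFFFF8063062797C       # nt!MiPfPrepareReadList+0x4c
SYMBOL_OFFSET = 0x4C

# Assumption: preferred image base used in the disassembly listing.
DISASM_IMAGE_BASE = 0x140000000

# Where the kernel image was actually loaded in the guest.
load_base = RET_ADDR_RUNTIME - REPORTED_RVA

# Offset of the faulting instruction inside the image, then the same
# instruction as it appears in the static disassembly.
fault_rva = FAULT_RUNTIME - load_base
fault_disasm = DISASM_IMAGE_BASE + fault_rva
func_start_disasm = fault_disasm - SYMBOL_OFFSET

print("load base:        ", hex(load_base))
print("fault RVA:        ", hex(fault_rva))
print("fault in listing: ", hex(fault_disasm))
print("function start:   ", hex(func_start_disasm))
```

Under those assumptions the load base comes out page-aligned and the faulting address maps to 0x14061B97C, which is the `mov rdx, [rax+28h]` line in the listing. That agreement is what makes the two assumptions look reasonable, but it is not proof.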

Anyway, until I can see that bisect log, this is beginning to smell like
a big waste of time.

Jason