LinuxLists.cc - vm86.c audit_syscall_exit() call trashes registers

2007-08-14 19:09:01

Subject: vm86.c audit_syscall_exit() call trashes registers

Please Cc: any replies, as we are not subscribed to linux-kernel.
Thanks.

Somewhere around 2.6.16.12 a call to audit_syscall_exit was added to
vm86.c:

static void do_sys_vm86(struct kernel_vm86_struct *info, struct task_struct *tsk)
{
struct tss_struct *tss;
+ long eax;
/*
* make sure the vm86() system call doesn't try to do anything silly
*/
@@ -305,13 +307,19 @@ static void do_sys_vm86(struct kernel_vm
tsk->thread.screen_bitmap = info->screen_bitmap;
if (info->flags & VM86_SCREEN_BITMAP)
mark_screen_rdonly(tsk->mm);
+ __asm__ __volatile__("xorl %eax,%eax; movl %eax,%fs; movl %eax,%gs\n\t");
+ __asm__ __volatile__("movl %%eax, %0\n" :"=r"(eax));
+
+ /*call audit_syscall_exit since we do not exit via the normal paths */
+ if (unlikely(current->audit_context))
+ audit_syscall_exit(current, AUDITSC_RESULT(eax), eax);
+
__asm__ __volatile__(
- "xorl %%eax,%%eax; movl %%eax,%%fs; movl %%eax,%%gs\n\t"
"movl %0,%%esp\n\t"
"movl %1,%%ebp\n\t"
"jmp resume_userspace"
: /* no outputs */
- :"r" (&info->regs), "r" (task_thread_info(tsk)) : "ax");
+ :"r" (&info->regs), "r" (task_thread_info(tsk)));
/* we never return here */
}

This appears to have caused intermittent data corruption of the
results of the vm86() call that the X server uses to get EDID data
from the monitor via the VESA BIOS. After removing the
audit_syscall_exit() call, the problems mentioned in these bugzillas
disappear:

Fetch of EDID 128 byte buffer by X server through vm86 INT 10 call is flaky.
http://bugzilla.kernel.org/show_bug.cgi?id=8633

RHEL 5 fails to get EDID data from monitor and sets low resolution
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=236416

If I'm reading correctly, it appears that the code above trashes the
%fs and %gs registers, or otherwise doesn't leave them at zero before
returning from the system call as the old code did. Is this a correct
analysis? How should this be fixed?

Thanks.

-Chuck

2007-08-14 19:48:18

by Andi Kleen

[permalink] [raw]

Subject: Re: vm86.c audit_syscall_exit() call trashes registers

Chuck Anderson <[email protected]> writes:
>
> If I'm reading correctly, it appears that the code above trashes the
> %fs and %gs registers, or otherwise doesn't leave them at zero before
> returning from the system call as the old code did. Is this a correct
> analysis?

The kernel runs with defined fs -- saved and set at system call entry/exit --
and shouldn't touch gs (except on a context switch, but then it should
be set back when you get scheduled again)

It's in theory possible that something went wrong with the gs saving
for the vm86 path, but this changed long 2.6.16. But I assume
when you just remove the call in 2.6.16 it already works? If yes
it cannot be that (2.6.16 didn't use either fs or gs in the kernel)

> How should this be fixed?

The problem first needs to be fully understood. Do you have more
details on the corruption?

One suspicious thing is that the audit code does mutex_lock(&tty_mutex)
and could sleep there. It's a long shot, but does the problem go
away when you comment that out? [such a patch is incorrect in theory,
but should be unlikely enough to crash for a quick test]

But actually sleeping should be ok here and a preemptible kernel could do
it anyways.

-Andi

2007-08-14 21:14:59

by William Cattey

[permalink] [raw]

Subject: Re: vm86.c audit_syscall_exit() call trashes registers

The corruption originally looked like a race condition.

Sometimes the EDID buffer would be all zeros.
Sometimes it would contain partial data, and then the rest of the
buffer filled with zeros.
The amount of data transferred into the buffer before going to all
zeros is non-deterministic.

When we put a known value in each byte of the buffer before making
the vm86 call, the known data would always be overwritten either with
EDID data or zeros.

-Bill

----

William Cattey
Linux Platform Coordinator
MIT Information Services & Technology

W92-176, 617-253-0140, [email protected]
http://web.mit.edu/wdc/www/

On Aug 14, 2007, at 4:42 PM, Andi Kleen wrote:

> Chuck Anderson <[email protected]> writes:
>>
>> If I'm reading correctly, it appears that the code above trashes the
>> %fs and %gs registers, or otherwise doesn't leave them at zero before
>> returning from the system call as the old code did. Is this a
>> correct
>> analysis?
>
> The kernel runs with defined fs -- saved and set at system call
> entry/exit --
> and shouldn't touch gs (except on a context switch, but then it should
> be set back when you get scheduled again)
>
> It's in theory possible that something went wrong with the gs saving
> for the vm86 path, but this changed long 2.6.16. But I assume
> when you just remove the call in 2.6.16 it already works? If yes
> it cannot be that (2.6.16 didn't use either fs or gs in the kernel)
>
>> How should this be fixed?
>
> The problem first needs to be fully understood. Do you have more
> details on the corruption?
>
> One suspicious thing is that the audit code does mutex_lock
> (&tty_mutex)
> and could sleep there. It's a long shot, but does the problem go
> away when you comment that out? [such a patch is incorrect in theory,
> but should be unlikely enough to crash for a quick test]
>
> But actually sleeping should be ok here and a preemptible kernel
> could do
> it anyways.
>
> -Andi

2007-08-14 21:52:58

by Andi Kleen

[permalink] [raw]

Subject: Re: vm86.c audit_syscall_exit() call trashes registers

On Tue, Aug 14, 2007 at 04:52:54PM -0400, William Cattey wrote:
> The corruption originally looked like a race condition.
>
> Sometimes the EDID buffer would be all zeros.
> Sometimes it would contain partial data, and then the rest of the
> buffer filled with zeros.
> The amount of data transferred into the buffer before going to all
> zeros is non-deterministic.
>
> When we put a known value in each byte of the buffer before making
> the vm86 call, the known data would always be overwritten either with
> EDID data or zeros.

Hmm, that might be consistent with something going wrong with sleeping.
Was the system under high load? Perhaps something else can thrash
some real mode state when you sleep. On the other hand vm86 in user
mode can schedule anyways, so it might have already happened.

But I think the mutex was actually added post 2.6.16 so if you saw
it in 2.6.16 already it might have been something else.

Also when audit is not enabled (did you have it enabled?) the audit
function doesn't do very much?

If you can reliably reproduce it one way might be to comment
out more and more of the audit code until you find who causes
the corruption (that might cause some corrupted audit data,
but that should be fine for testing)

-Andi

2007-08-14 21:57:39

by William Cattey

[permalink] [raw]

Subject: Re: vm86.c audit_syscall_exit() call trashes registers

The system was otherwise completely idle. The only active task was
starting the X server.

The failure is 100% reproducible on my test system.

We have not run a lot of different kernels per se. We ran 2.6.9, and
it was fine. When we ran RHEL 5, it came with 2.6.18. All we really
did was rebuild 2.6.18 with that chunk of code removed, and the
problem went away. Mind you, when that chunk of code was removed,
there were a ton of errors about multiply freed audit blocks. But at
least the X server EDID transfer was successful.

As far as enabling/disabling the audit functionality: I'm clueless
about it. I think RHEL turned it on by default, but I don't know how
to turn it on or off myself.

I will also note that the small stand-alone utility read_edid never
failed. It was only when vm86 was called from inside of the X
server. So perhaps there's a race condition with memory not being
where it's expected to be when a large app calls out to real mode?

-Bill

----

William Cattey
Linux Platform Coordinator
MIT Information Services & Technology

W92-176, 617-253-0140, [email protected]
http://web.mit.edu/wdc/www/

On Aug 14, 2007, at 5:28 PM, Andi Kleen wrote:

> On Tue, Aug 14, 2007 at 04:52:54PM -0400, William Cattey wrote:
>> The corruption originally looked like a race condition.
>>
>> Sometimes the EDID buffer would be all zeros.
>> Sometimes it would contain partial data, and then the rest of the
>> buffer filled with zeros.
>> The amount of data transferred into the buffer before going to all
>> zeros is non-deterministic.
>>
>> When we put a known value in each byte of the buffer before making
>> the vm86 call, the known data would always be overwritten either with
>> EDID data or zeros.
>
> Hmm, that might be consistent with something going wrong with
> sleeping.
> Was the system under high load? Perhaps something else can thrash
> some real mode state when you sleep. On the other hand vm86 in user
> mode can schedule anyways, so it might have already happened.
>
> But I think the mutex was actually added post 2.6.16 so if you saw
> it in 2.6.16 already it might have been something else.
>
> Also when audit is not enabled (did you have it enabled?) the audit
> function doesn't do very much?
>
> If you can reliably reproduce it one way might be to comment
> out more and more of the audit code until you find who causes
> the corruption (that might cause some corrupted audit data,
> but that should be fine for testing)
>
> -Andi

2007-09-25 23:54:48

by William Cattey

[permalink] [raw]

Subject: Re: vm86.c audit_syscall_exit() call trashes registers

At this point Chuck and I are trying to decide what will get us the
best testing with the least effort. (We keep planning to test with a
stock kernel, but last time that was on our TODO list, the project
languished for a month.)

Do you have a specific stock kernel revision to recommend we try on
our test system currently running Red Hat's 2.6.18? Are we right to
presume it would be 2.6.18-0 to demonstrate failure and 2.6.20.0 to
attempt to demonstrate success?

Are you the Andi Kleen who signed off on Fitzhardinge's patch, and if
so do you have some insight for Chuck and I about what pieces are
required to test and see if the bug really got fixed with his cleanup?

I'd feel a lot more confident we were on the right track if I could
just correctly patch Fitzhardinge's cleanup into the test setup I
have now.

-Bill

----

William Cattey
Linux Platform Coordinator
MIT Information Services & Technology

N42-040M, 617-253-0140, [email protected]
http://web.mit.edu/wdc/www/

On Aug 14, 2007, at 6:19 PM, Andi Kleen wrote:

> On Tue, Aug 14, 2007 at 06:14:40PM -0400, William Cattey wrote:
>> In fact, I had begun the process of assuring myself that I could
>> indeed take a main line kernel, and install it in place of a Red Hat
>> Kernel.
>
> Should normally work. Sometimes there are incompatibilities
> with udev -- it is safest to just compile in the drivers you
> need to avoid that -- but likely it would work even without
> that on a modern distro.
>
>> Shall I check back with you after we've re-run the test after putting
>> the stock kernel on the machine? It will be a couple weeks because
>> they're moving my office on Thursday, I'm away on vacation the
>> following week, and I'm still unsure of the procedure to follow to
>> build a totally stock totally mainline kernel and put it into place
>> instead of what Red Hat has made.
>
> Better report to the list again in case others want to chime in.
> But you can put me into cc because I would need to merge
> any resulting patch anyways.
>
> -Andi

Attachments:

patch-2.6.20-vm86-audit-syscall-exit.patch (981.00 B)

2007-09-29 00:58:24

by Jeremy Fitzhardinge

[permalink] [raw]

Subject: Re: vm86.c audit_syscall_exit() call trashes registers

William Cattey wrote:
> Andi,
>
> Sorry to have taken so long to take another step with this problem.
> Once my customers had a work-around, other priorities crowded out this
> project. Today Chuck and I did a little more work. We'd heard that a
> more recent kernel alleged to fix this stuff. Doing some digging, we
> came across this:

Hi,

I cleaned up some completely bogus code in vm86.c. I don't have any
context for your mail though: is my change causing you problems, or
does it fix something for you?

Thanks,
J

>
> (http://www.kernel.org/pub/linux/kernel/v2.6/ChangeLog-2.6.20)
>
> commit 49d26b6eaa8e970c8cf6e299e6ccba2474191bf5
> Author: Jeremy Fitzhardinge <[email protected]>
> Date: Thu Dec 7 02:14:03 2006 +0100
>
> [PATCH] i386: Update sys_vm86 to cope with changed pt_regs and %gs
> usage
>
> sys_vm86 uses a struct kernel_vm86_regs, which is identical to
> pt_regs, but
> adds an extra space for all the segment registers. Previously
> this structure
> was completely independent, so changes in pt_regs had to be
> reflected in
> kernel_vm86_regs. This changes just embeds pt_regs in
> kernel_vm86_regs, and
> makes the appropriate changes to vm86.c to deal with the new naming.
>
> Also, since %gs is dealt with differently in the kernel, this
> change adjusts
> vm86.c to reflect this.
>
> While making these changes, I also cleaned up some frankly bizarre
> code which
> was added when auditing was added to sys_vm86.
>
> Signed-off-by: Jeremy Fitzhardinge <[email protected]>
> Signed-off-by: Andi Kleen <[email protected]>
> Cc: Chuck Ebbert <[email protected]>
> Cc: Zachary Amsden <[email protected]>
> Cc: Jan Beulich <[email protected]>
> Cc: Andi Kleen <[email protected]>
> Cc: Al Viro <[email protected]>
> Cc: Jason Baron <[email protected]>
> Cc: Chris Wright <[email protected]>
> Signed-off-by: Andrew Morton <[email protected]>
>
> Chuck and I took a stab at extracting what we thought was the relevant
> change to the audit_syscall_exit code, but we must have gotten it
> wrong. The EDID transfer always comes up zeros with our extract of
> Fitzhardinge's patch.
>
>
> At this point Chuck and I are trying to decide what will get us the
> best testing with the least effort. (We keep planning to test with a
> stock kernel, but last time that was on our TODO list, the project
> languished for a month.)
>
> Do you have a specific stock kernel revision to recommend we try on
> our test system currently running Red Hat's 2.6.18? Are we right to
> presume it would be 2.6.18-0 to demonstrate failure and 2.6.20.0 to
> attempt to demonstrate success?
>
> Are you the Andi Kleen who signed off on Fitzhardinge's patch, and if
> so do you have some insight for Chuck and I about what pieces are
> required to test and see if the bug really got fixed with his cleanup?
>
> I'd feel a lot more confident we were on the right track if I could
> just correctly patch Fitzhardinge's cleanup into the test setup I have
> now.
>
> -Bill
>
> ----
>
> William Cattey
> Linux Platform Coordinator
> MIT Information Services & Technology
>
> N42-040M, 617-253-0140, [email protected]
> http://web.mit.edu/wdc/www/
>
>
> On Aug 14, 2007, at 6:19 PM, Andi Kleen wrote:
>
>> On Tue, Aug 14, 2007 at 06:14:40PM -0400, William Cattey wrote:
>>> In fact, I had begun the process of assuring myself that I could
>>> indeed take a main line kernel, and install it in place of a Red Hat
>>> Kernel.
>>
>> Should normally work. Sometimes there are incompatibilities
>> with udev -- it is safest to just compile in the drivers you
>> need to avoid that -- but likely it would work even without
>> that on a modern distro.
>>
>>> Shall I check back with you after we've re-run the test after putting
>>> the stock kernel on the machine? It will be a couple weeks because
>>> they're moving my office on Thursday, I'm away on vacation the
>>> following week, and I'm still unsure of the procedure to follow to
>>> build a totally stock totally mainline kernel and put it into place
>>> instead of what Red Hat has made.
>>
>> Better report to the list again in case others want to chime in.
>> But you can put me into cc because I would need to merge
>> any resulting patch anyways.
>>
>> -Andi
>

2007-09-29 01:14:26

by William Cattey

[permalink] [raw]

Subject: Re: vm86.c audit_syscall_exit() call trashes registers

Your fix seems to have remedied a problem we are having with EDID
fetches through vm86.c. At the present moment, we're trying to
understand your cleanup so as to back port it to an earlier rev of
the kernel (2.6.18).

3 questions for you:

1. Are we correct in understanding that your cleanup only touched
vm86.c and vm86.h?

2. Do you remember your changes well enough from back when you made
them in December 2006 to be able to point out the changes solely made
to the audit calls?

3. Does correct operation of vm86.c in the 2.6 kernel require all of
your changes, or just the subset that affects the audit calls?

Thanks in advance,

-Bill Cattey

----

William Cattey
Linux Platform Coordinator
MIT Information Services & Technology

N42-040M, 617-253-0140, [email protected]
http://web.mit.edu/wdc/www/

On Sep 28, 2007, at 8:58 PM, Jeremy Fitzhardinge wrote:

> William Cattey wrote:
>> Andi,
>>
>> Sorry to have taken so long to take another step with this problem.
>> Once my customers had a work-around, other priorities crowded out
>> this
>> project. Today Chuck and I did a little more work. We'd heard
>> that a
>> more recent kernel alleged to fix this stuff. Doing some digging, we
>> came across this:
>
> Hi,
>
> I cleaned up some completely bogus code in vm86.c. I don't have any
> context for your mail though: is my change causing you problems, or
> does it fix something for you?
>
> Thanks,
> J
>
>>
>> (http://www.kernel.org/pub/linux/kernel/v2.6/ChangeLog-2.6.20)
>>
>> commit 49d26b6eaa8e970c8cf6e299e6ccba2474191bf5
>> Author: Jeremy Fitzhardinge <[email protected]>
>> Date: Thu Dec 7 02:14:03 2006 +0100
>>
>> [PATCH] i386: Update sys_vm86 to cope with changed pt_regs and
>> %gs
>> usage
>>
>> sys_vm86 uses a struct kernel_vm86_regs, which is identical to
>> pt_regs, but
>> adds an extra space for all the segment registers. Previously
>> this structure
>> was completely independent, so changes in pt_regs had to be
>> reflected in
>> kernel_vm86_regs. This changes just embeds pt_regs in
>> kernel_vm86_regs, and
>> makes the appropriate changes to vm86.c to deal with the new
>> naming.
>>
>> Also, since %gs is dealt with differently in the kernel, this
>> change adjusts
>> vm86.c to reflect this.
>>
>> While making these changes, I also cleaned up some frankly
>> bizarre
>> code which
>> was added when auditing was added to sys_vm86.
>>
>> Signed-off-by: Jeremy Fitzhardinge <[email protected]>
>> Signed-off-by: Andi Kleen <[email protected]>
>> Cc: Chuck Ebbert <[email protected]>
>> Cc: Zachary Amsden <[email protected]>
>> Cc: Jan Beulich <[email protected]>
>> Cc: Andi Kleen <[email protected]>
>> Cc: Al Viro <[email protected]>
>> Cc: Jason Baron <[email protected]>
>> Cc: Chris Wright <[email protected]>
>> Signed-off-by: Andrew Morton <[email protected]>
>>
>> Chuck and I took a stab at extracting what we thought was the
>> relevant
>> change to the audit_syscall_exit code, but we must have gotten it
>> wrong. The EDID transfer always comes up zeros with our extract of
>> Fitzhardinge's patch.
>>
>>
>> At this point Chuck and I are trying to decide what will get us the
>> best testing with the least effort. (We keep planning to test with a
>> stock kernel, but last time that was on our TODO list, the project
>> languished for a month.)
>>
>> Do you have a specific stock kernel revision to recommend we try on
>> our test system currently running Red Hat's 2.6.18? Are we right to
>> presume it would be 2.6.18-0 to demonstrate failure and 2.6.20.0 to
>> attempt to demonstrate success?
>>
>> Are you the Andi Kleen who signed off on Fitzhardinge's patch, and if
>> so do you have some insight for Chuck and I about what pieces are
>> required to test and see if the bug really got fixed with his
>> cleanup?
>>
>> I'd feel a lot more confident we were on the right track if I could
>> just correctly patch Fitzhardinge's cleanup into the test setup I
>> have
>> now.
>>
>> -Bill
>>
>> ----
>>
>> William Cattey
>> Linux Platform Coordinator
>> MIT Information Services & Technology
>>
>> N42-040M, 617-253-0140, [email protected]
>> http://web.mit.edu/wdc/www/
>>
>>
>> On Aug 14, 2007, at 6:19 PM, Andi Kleen wrote:
>>
>>> On Tue, Aug 14, 2007 at 06:14:40PM -0400, William Cattey wrote:
>>>> In fact, I had begun the process of assuring myself that I could
>>>> indeed take a main line kernel, and install it in place of a Red
>>>> Hat
>>>> Kernel.
>>>
>>> Should normally work. Sometimes there are incompatibilities
>>> with udev -- it is safest to just compile in the drivers you
>>> need to avoid that -- but likely it would work even without
>>> that on a modern distro.
>>>
>>>> Shall I check back with you after we've re-run the test after
>>>> putting
>>>> the stock kernel on the machine? It will be a couple weeks because
>>>> they're moving my office on Thursday, I'm away on vacation the
>>>> following week, and I'm still unsure of the procedure to follow to
>>>> build a totally stock totally mainline kernel and put it into place
>>>> instead of what Red Hat has made.
>>>
>>> Better report to the list again in case others want to chime in.
>>> But you can put me into cc because I would need to merge
>>> any resulting patch anyways.
>>>
>>> -Andi
>>
>

2007-09-29 06:06:50

by Jeremy Fitzhardinge

[permalink] [raw]

Subject: Re: vm86.c audit_syscall_exit() call trashes registers

William Cattey wrote:
> Your fix seems to have remedied a problem we are having with EDID
> fetches through vm86.c. At the present moment, we're trying to
> understand your cleanup so as to back port it to an earlier rev of the
> kernel (2.6.18).
>
> 3 questions for you:
>
> 1. Are we correct in understanding that your cleanup only touched
> vm86.c and vm86.h?
>
> 2. Do you remember your changes well enough from back when you made
> them in December 2006 to be able to point out the changes solely made
> to the audit calls?
>
> 3. Does correct operation of vm86.c in the 2.6 kernel require all of
> your changes, or just the subset that affects the audit calls?

It was only a small part of the patch. I think it was basically this
hunk (hand-edited, so this won't apply directly):

@@ -306,19 +334,18 @@ static void do_sys_vm86(struct kernel_vm
tsk->thread.screen_bitmap = info->screen_bitmap;
if (info->flags & VM86_SCREEN_BITMAP)
mark_screen_rdonly(tsk->mm);
__asm__ __volatile__("xorl %eax,%eax; movl %eax,%fs; movl %eax,%gs\n\t");
- __asm__ __volatile__("movl %%eax, %0\n" :"=r"(eax));

/*call audit_syscall_exit since we do not exit via the normal paths */
if (unlikely(current->audit_context))
- audit_syscall_exit(AUDITSC_RESULT(eax), eax);
+ audit_syscall_exit(AUDITSC_RESULT(0), 0);

This is certainly a bogus piece of code, and it could result in more or
less random values of eax being passed to audit_syscall_exit(). But I
don't know if it will have any bearing on your EDID problem; the rest of
the patch is related to the introduction of using %gs as the base for
the per-processor data area, and shouldn't cause any functional change
to sys_vm86(), but its possible I fixed some other bug in the process.

J

2007-09-29 06:09:55

by Jeremy Fitzhardinge

[permalink] [raw]

Subject: Re: vm86.c audit_syscall_exit() call trashes registers

Jeremy Fitzhardinge wrote:
> @@ -306,19 +334,18 @@ static void do_sys_vm86(struct kernel_vm
> tsk->thread.screen_bitmap = info->screen_bitmap;
> if (info->flags & VM86_SCREEN_BITMAP)
> mark_screen_rdonly(tsk->mm);
> __asm__ __volatile__("xorl %eax,%eax; movl %eax,%fs; movl %eax,%gs\n\t");
>

Oh, this line is also clearly bogus, since it clobbers %eax without
telling the compiler. The minimal change would be something like:

asm volatile("mov %0, %%fs; mov %0, %%gs" : : "r" (0));

J

2007-10-01 22:31:35

by William Cattey

[permalink] [raw]

Subject: Re: vm86.c audit_syscall_exit() call trashes registers

-Bill

----

William Cattey
Linux Platform Coordinator
MIT Information Services & Technology

N42-040M, 617-253-0140, [email protected]
http://web.mit.edu/wdc/www/

On Sep 29, 2007, at 2:09 AM, Jeremy Fitzhardinge wrote:

> Jeremy Fitzhardinge wrote:
>> @@ -306,19 +334,18 @@ static void do_sys_vm86(struct kernel_vm
>> tsk->thread.screen_bitmap = info->screen_bitmap;
>> if (info->flags & VM86_SCREEN_BITMAP)
>> mark_screen_rdonly(tsk->mm);
>> __asm__ __volatile__("xorl %eax,%eax; movl %eax,%fs; movl %
>> eax,%gs\n\t");
>>
>
> Oh, this line is also clearly bogus, since it clobbers %eax without
> telling the compiler. The minimal change would be something like:
>
> asm volatile("mov %0, %%fs; mov %0, %%gs" : : "r" (0));
>
>
> J

Attachments:

linux-2.6-correct-vm86-audit-foonly.patch (771.00 B)

2007-10-01 23:50:10

by Jeremy Fitzhardinge

[permalink] [raw]

Subject: Re: vm86.c audit_syscall_exit() call trashes registers

William Cattey wrote:
> Thanks very much for responding.
>
> From your two replies, I crafted the attached patch.
> Alas, the EDID transfer comes up all zeros.
> I see two possible causes of this behavior:
>
> 1. I misunderstood how you intended the file to be modified.
> 2. The fix for my bug is NOT in correcting the audit call, but instead
> from some other fix, perhaps from the other aspect that you worked on.
>
> I know that when I surrounded the audit_syscall_exit with #if 0 I got
> correct EDID fetches, so the most likely cause is #1, that I didn't
> correctly incorporate your understanding of correct operation of the
> call to audit_syscall_exit.
>
> Here's my patch. Where did I screw up?

Not sure, it looks OK to me. I guess it could be something else that got
fixed in that change; I'll have another look.

Can you explain again what works and what doesn't? Do you have auditing
enabled, or does it break things anyway?

J

2007-10-02 16:45:16

by Chuck Ebbert

[permalink] [raw]

Subject: Re: vm86.c audit_syscall_exit() call trashes registers

On 09/25/2007 07:38 PM, William Cattey wrote:
>
> I'd feel a lot more confident we were on the right track if I could just
> correctly patch Fitzhardinge's cleanup into the test setup I have now.
>

I think you need to zero both registers if you're using 2.6.16, and force
%eax as the source so it doesn't choose %ebp?

--- a/arch/i386/kernel/vm86.c
+++ b/arch/i386/kernel/vm86.c
@@ -306,19 +334,19 @@ static void do_sys_vm86(struct kernel_vm86_struct *info, struct task_struct *tsk
tsk->thread.screen_bitmap = info->screen_bitmap;
if (info->flags & VM86_SCREEN_BITMAP)
mark_screen_rdonly(tsk->mm);
- __asm__ __volatile__("xorl %eax,%eax; movl %eax,%fs; movl %eax,%gs\n\t");
- __asm__ __volatile__("movl %%eax, %0\n" :"=r"(eax));

/*call audit_syscall_exit since we do not exit via the normal paths */
if (unlikely(current->audit_context))
- audit_syscall_exit(AUDITSC_RESULT(eax), eax);
+ audit_syscall_exit(AUDITSC_RESULT(0), 0);

__asm__ __volatile__(
"movl %0,%%esp\n\t"
"movl %1,%%ebp\n\t"
+ "mov %2, %%fs\n\t"
+ "mov %2, %%gs\n\t"
"jmp resume_userspace"
: /* no outputs */
- :"r" (&info->regs), "r" (task_thread_info(tsk)));
+ :"r" (&info->regs), "r" (task_thread_info(tsk)), "a" (0));
/* we never return here */
}

2007-10-04 23:59:30

by William Cattey

[permalink] [raw]

Subject: Re: vm86.c audit_syscall_exit() call trashes registers

Thanks very much for thinking about this and providing a revised
candidate patch.

Sadly, the effect of the patch is the same as the most recent
candidate patch from Jeremy Fitzhardinge: The EDID transfer still
comes up all zeros.

This is very perplexing to me. If I take the code that appears in
2.6.18's vm86.c, and simply put #if 0 around the call to
audit_syscall_exit I get good data.

If this is indeed a correct minimal correction to the
audit_syscall_exit code, then perhaps there's some other condition
being exercised. I guess my next step is to take the whole pt_regs
patch (commit 49d26b6eaa8e970c8cf6e299e6ccba2474191bf5) from
kernel.org and see if that has a beneficial effect.

-Bill

----

William Cattey
Linux Platform Coordinator
MIT Information Services & Technology

N42-040M, 617-253-0140, [email protected]
http://web.mit.edu/wdc/www/

On Oct 2, 2007, at 12:44 PM, Chuck Ebbert wrote:

> On 09/25/2007 07:38 PM, William Cattey wrote:
>>
>> I'd feel a lot more confident we were on the right track if I
>> could just
>> correctly patch Fitzhardinge's cleanup into the test setup I have
>> now.
>>
>
> I think you need to zero both registers if you're using 2.6.16, and
> force
> %eax as the source so it doesn't choose %ebp?
>
> --- a/arch/i386/kernel/vm86.c
> +++ b/arch/i386/kernel/vm86.c
> @@ -306,19 +334,19 @@ static void do_sys_vm86(struct
> kernel_vm86_struct *info, struct task_struct *tsk
> tsk->thread.screen_bitmap = info->screen_bitmap;
> if (info->flags & VM86_SCREEN_BITMAP)
> mark_screen_rdonly(tsk->mm);
> - __asm__ __volatile__("xorl %eax,%eax; movl %eax,%fs; movl %eax,%gs
> \n\t");
> - __asm__ __volatile__("movl %%eax, %0\n" :"=r"(eax));
>
> /*call audit_syscall_exit since we do not exit via the normal
> paths */
> if (unlikely(current->audit_context))
> - audit_syscall_exit(AUDITSC_RESULT(eax), eax);
> + audit_syscall_exit(AUDITSC_RESULT(0), 0);
>
> __asm__ __volatile__(
> "movl %0,%%esp\n\t"
> "movl %1,%%ebp\n\t"
> + "mov %2, %%fs\n\t"
> + "mov %2, %%gs\n\t"
> "jmp resume_userspace"
> : /* no outputs */
> - :"r" (&info->regs), "r" (task_thread_info(tsk)));
> + :"r" (&info->regs), "r" (task_thread_info(tsk)), "a" (0));
> /* we never return here */
> }
>

2007-10-05 00:10:45

by Chuck Ebbert

[permalink] [raw]

Subject: Re: vm86.c audit_syscall_exit() call trashes registers

On 10/04/2007 07:58 PM, William Cattey wrote:
>
> Sadly, the effect of the patch is the same as the most recent candidate
> patch from Jeremy Fitzhardinge: The EDID transfer still comes up all
> zeros.
>

I think maybe a better question is: why does read_edid still work?
The X server might be making some invalid assumption about system
state. Comparing the code the two programs use could provide some clues.