2012-02-06 08:33:16

by Indan Zupancic

Subject: Re: Compat 32-bit syscall entry from 64-bit task!?

On Wed, January 18, 2012 21:26, Linus Torvalds wrote:
> Added Peter to the cc, since this is now about some x86-specific
> things. Ingo was already cc'd earlier.
>
> On Wed, Jan 18, 2012 at 11:31 AM, Linus Torvalds wrote:
>>
>> Using the high bits of 'eflags' might work. Hopefully nobody tests
>> that. IOW, something like the attached might work. It just sets bit#32
>> in eflags if the system call is a compat call.
>
> So that description was bogus, it was what my original patch did, but
> not the one I actually sent out (Peter - you can find it on lkml,
> although the description below is probably sufficient for you to
> understand what it does, or the obvious nature of the attached patch
> for strace).
>
> The one I sent out *unconditionally* sets one bit in the high bits of
> the returned value of the eflags register from ptrace(), very much on
> purpose. That way you can unambiguously see whether it's an old kernel
> (bits clear) or a new kernel that supports the feature. On a new
> kernel, bit #32 of eflags will be set for a native 64-bit system call,
> and bit #33 will be set for a compat system call.
>
> And some testing says that it works. In particular, I have a patch to
> strace-4.6 that is able to correctly decode my mixed-case binary that
> uses both the compat system call and the native system calls from
> 64-bit long mode. Also, it looks like gdb ignores the high bits of
> eflags, since it "knows" that eflags is just a 32-bit register even in
> 64-bit mode, so the fact that we set some random bits in there doesn't
> end up being noisy for at least one debugger.
>
> HOWEVER. I'm not going to guarantee that this is the right approach.
> It seems to work, and it clearly gives people real information, but
> whether this is the best way to do things or not is open.

It seems that just using eflags is a lot simpler than the alternatives,
let's just go for it.

>
> The reason I picked 'eflags' was that it
>
> (a) was easy from an implementation standpoint, since we already have
> to handle reading of eflags specially in ptrace (we have to fake out
> the resume bit)
>
> (b) it "kind of" makes sense to make high bits be "system flags",
> with low bits being "cpu flags", so it fits at least *some* kind of
> conceptual model.
>
> (c) the other sane places to put it (high bits of CS and/or ORIG_AX)
> were being used and compared as 64-bit values at least by strace.
> Whether eflags works for all users, I have no idea, but generally you
> would never compare eflags for one particular value - you might check
> individual bits in eflags, but hopefully setting a few new bits should
> not be something that any legacy user would ever really notice.
>
> So there are reasons to think that my patch is sane, but...
>
> Here's the strace patch, so people can look. I didn't even test it on
> an old kernel, but the fallback case to the old behavior looks
> trivial.
>
> Comments?

I propose using bits somewhere in the middle of the upper half. If new
flags are ever added by Intel or AMD, they will use the lower bits. If
anyone else ever adds flags, they most likely add them to the top (VIA).
So the middle seems the safest spot as far as long-term maintenance goes.

The below version does that, but instead of setting one of the two bits,
it always sets bit 50 for newer kernels and sets bit 51 if it's a compat
system call. I find this version more readable and after compilation it's
also a couple of bytes smaller compared to Linus' original version.
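
For illustration, a tracer could decode the two bits roughly as follows. This is only a sketch of the scheme proposed in this mail: the macro, enum and function names are made up for the example, and the bit positions (50 and 51) are the ones from the patch at the end of this mail.

```c
#include <stdint.h>

/* Bit positions as proposed in this mail; all names here are hypothetical. */
#define PROPOSED_FLAG_VALID   (UINT64_C(1) << 50)  /* always set by a kernel carrying the patch */
#define PROPOSED_FLAG_COMPAT  (UINT64_C(1) << 51)  /* set only for a compat system call */

enum syscall_abi { ABI_UNKNOWN, ABI_NATIVE64, ABI_COMPAT32 };

/* Classify the system call path from the eflags value returned by ptrace. */
enum syscall_abi classify_syscall_abi(uint64_t eflags)
{
    if (!(eflags & PROPOSED_FLAG_VALID))
        return ABI_UNKNOWN;  /* old kernel: the tracer has to fall back to guessing */
    return (eflags & PROPOSED_FLAG_COMPAT) ? ABI_COMPAT32 : ABI_NATIVE64;
}
```

On x86-64 the input would typically be the eflags field of struct user_regs_struct as filled in by PTRACE_GETREGS.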

Should we make sure that the top 32 bits are zero, in case any weird
hardware does set our bits?

Greetings,

Indan

---

diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 5026738..a7fda48 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -353,6 +353,7 @@ static int set_segment_reg(struct task_struct *task,

 static unsigned long get_flags(struct task_struct *task)
 {
+	int bit = 50;
 	unsigned long retval = task_pt_regs(task)->flags;

 	/*
@@ -360,8 +361,11 @@ static unsigned long get_flags(struct task_struct *task)
 	 */
 	if (test_tsk_thread_flag(task, TIF_FORCED_TF))
 		retval &= ~X86_EFLAGS_TF;
-
-	return retval;
+#ifdef CONFIG_IA32_EMULATION
+	if (task_thread_info(task)->status & TS_COMPAT)
+		retval |= (1ul << 51);
+#endif
+	return retval | (1ul << bit);
 }

 static int set_flags(struct task_struct *task, unsigned long value)


2012-02-06 17:06:19

by H. Peter Anvin

Subject: Re: Compat 32-bit syscall entry from 64-bit task!?

On 02/06/2012 12:32 AM, Indan Zupancic wrote:
>
> It seems that just using eflags is a lot simpler than the alternatives,
> let's just go for it.
>
>
> I propose using bits somewhere in the middle of the upper half. If new
> flags are ever added by Intel or AMD, they will use the lower bits. If
> anyone else ever adds flags, they most likely add them to the top (VIA).
> So the middle seems the safest spot as far as long-term maintenance goes.
>
> The below version does that, but instead of setting one of the two bits,
> it always sets bit 50 for newer kernels and sets bit 51 if it's a compat
> system call. I find this version more readable and after compilation it's
> also a couple of bytes smaller compared to Linus' original version.
>
> Should we make sure that the top 32 bits are zero, in case any weird
> hardware does set our bits?
>

[Adding H.J. Lu, since he has run into some of these requirements before]

NAK in the extreme.

We have not heard back from the architecture people on this, and I will
NAK this unless that happens.

Furthermore, you're picking bits that do not work for 32 bits, EVEN
THOUGH WE HAVE A SIMILAR PROBLEM ON 32 BITS; I outlined it for you and
you chose to ignore it.

Finally, I think we actually are going to need a fair number of bits in
the end. All of this points to using a new regset designed for
extension in the first place.

As far as I can tell, we need at least the following information:

- If the CPU is currently in 32- or 64-bit mode.
- If we are currently inside a system call, and if so if it was entered
via:
- SYSCALL64
- INT 80
- SYSCALL32
- SYSENTER

The reason we need this information is because for the various 32-bit
entry points we do some very ugly swizzling of registers, which
matters to a ptrace client which wants to modify system call
arguments.
- If the process was started as a 64-bit process, i386 process or x32
process.

This adds up to a minimum of six bits already (and at least two bits on
i386), and that's just a start.
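
One way to arrive at those figures is to count the states in the list above. The sketch below does that and shows what the payload of a new regset might look like. The struct, its size field and every name in it are hypothetical and only meant to illustrate the idea; nothing here is an existing kernel interface.

```c
#include <stdint.h>

enum cpu_mode      { MODE_32BIT = 0, MODE_64BIT };           /* 2 states */
enum syscall_entry { ENTRY_NONE = 0, ENTRY_SYSCALL64,
                     ENTRY_INT80, ENTRY_SYSCALL32,
                     ENTRY_SYSENTER };                       /* 5 states */
enum process_abi   { PROC_X86_64 = 0, PROC_I386, PROC_X32 }; /* 3 states */

/* Hypothetical regset payload; a size field lets later fields be appended. */
struct x86_trace_state {
    uint32_t size;           /* number of bytes the kernel filled in */
    uint8_t  cpu_mode;       /* enum cpu_mode */
    uint8_t  syscall_entry;  /* enum syscall_entry */
    uint8_t  process_abi;    /* enum process_abi */
    uint8_t  reserved;
};

/* Smallest number of bits that can hold `states` distinct values. */
unsigned bits_for(unsigned states)
{
    unsigned bits = 0;
    while ((1u << bits) < states)
        bits++;
    return bits;
}

/* 1 bit (mode) + 3 bits (entry) + 2 bits (process type). */
unsigned minimum_bits(void)
{
    return bits_for(2) + bits_for(5) + bits_for(3);
}
```

Under the same counting, a pure i386 kernel has three syscall states (none, INT 80, SYSENTER), which needs two bits.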

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2012-02-07 01:53:02

by Indan Zupancic

Subject: Re: Compat 32-bit syscall entry from 64-bit task!?

On Mon, February 6, 2012 18:02, H. Peter Anvin wrote:
> On 02/06/2012 12:32 AM, Indan Zupancic wrote:
>>
>> It seems that just using eflags is a lot simpler than the alternatives,
>> let's just go for it.
>>
>>
>> I propose using bits somewhere in the middle of the upper half. If new
>> flags are ever added by Intel or AMD, they will use the lower bits. If
>> anyone else ever adds flags, they most likely add them to the top (VIA).
>> So the middle seems the safest spot as far as long-term maintenance goes.
>>
>> The below version does that, but instead of setting one of the two bits,
>> it always sets bit 50 for newer kernels and sets bit 51 if it's a compat
>> system call. I find this version more readable and after compilation it's
>> also a couple of bytes smaller compared to Linus' original version.
>>
>> Should we make sure that the top 32 bits are zero, in case any weird
>> hardware does set our bits?
>>
>
> [Adding H.J. Lu, since he has run into some of these requirements before]
>
> NAK in the extreme.
>
> We have not heard back from the architecture people on this, and I will
> NAK this unless that happens.
>
> Furthermore, you're picking bits that do not work for 32 bits, EVEN
> THOUGH WE HAVE A SIMILAR PROBLEM ON 32 BITS; I outlined it for you and
> you chose to ignore it.

Sorry, I missed that. I looked up that email and you indeed did, though
you didn't give any details about what the problems are.

> Finally, I think we actually are going to need a fair number of bits in
> the end. All of this points to using a new regset designed for
> extension in the first place.
>
> As far as I can tell, we need at least the following information:
>
> - If the CPU is currently in 32- or 64-bit mode.

What is the best way to find that out at the kernel side? Add a function
that checks cs and returns the correct answer? But in the kernel path the
CPU is always in 64-bit mode, so I suppose you want to know what mode the
tracee was in?

> - If we are currently inside a system call, and if so if it was entered
> via:
> - SYSCALL64
> - INT 80
> - SYSCALL32
> - SYSENTER
>
> The reason we need this information is because for the various 32-bit
> entry points we do some very ugly swizzling of registers, which
> matters to a ptrace client which wants to modify system call
> arguments.

But isn't the swizzling done in such way that all this is hidden from
ptrace clients (and the rest of the kernel)? Why would a ptrace client
need to know the details of the 32-bit entry call?

The ptrace client can always modify the same registers, as system calls
always use the same registers too. No unexpected behaviour happens as
far as I can tell from looking at the code, at least not in the syscall
entry path.

E.g. ENTRY(ia32_cstar_target) in ia32entry.S does:

movq %rbp,RCX-ARGOFFSET(%rsp) /* this lies slightly to ptrace */

This hides that, for SYSCALL32, arg2 comes in ebp instead of rcx. Same for arg6.

(I actually can't find a SYSCALL32 entry in entry_32.S, am I blind or
was it too slow until the 64-bit Athlons showed up?)

A pure 32-bit kernel is compiled with:

#define asmlinkage CPP_ASMLINKAGE __attribute__((regparm(0)))

So all arguments are passed on the stack and those arguments can be
directly modified by ptrace. For compat kernels the arguments are
reloaded after ptrace and before the actual system call is done.

> - If the process was started as a 64-bit process, i386 process or x32
> process.

Can't that be figured out by looking at the AUXV data? Either via /proc
or PTRACE_GETREGSET + NT_AUXV. And as this can't change, there is no
need to pass it on all the time.

> This adds up to a minimum of six bits already (and at least two bits on
> i386), and that's just a start.

I'm not convinced that there is any real problem; it seems only one extra
bit for the task CPU mode would be needed, so three bits in total.

Greetings,

Indan

2012-02-09 00:24:01

by H. Peter Anvin

Subject: Re: Compat 32-bit syscall entry from 64-bit task!?

On 02/06/2012 05:52 PM, Indan Zupancic wrote:
>>
>> - If the CPU is currently in 32- or 64-bit mode.
>
> What is the best way to find that out at the kernel side? Add a function
> that checks cs and returns the correct answer? But in the kernel path the
> CPU is always in 64-bit mode, so I suppose you want to know what mode the
> tracee was in?
>

You need to look at the CS descriptor.

>> - If we are currently inside a system call, and if so if it was entered
>> via:
>> - SYSCALL64
>> - INT 80
>> - SYSCALL32
>> - SYSENTER
>>
>> The reason we need this information is because for the various 32-bit
>> entry points we do some very ugly swizzling of registers, which
>> matters to a ptrace client which wants to modify system call
>> arguments.
>
> But isn't the swizzling done in such way that all this is hidden from
> ptrace clients (and the rest of the kernel)? Why would a ptrace client
> need to know the details of the 32-bit entry call?
>
> The ptrace client can always modify the same registers, as system calls
> always use the same registers too. No unexpected behaviour happens as
> far as I can tell from looking at the code, at least not in the syscall
> entry path.

The simple stuff works, but once you want to do things like change the
arguments and/or move the execution point, things get unswizzled in
uncontrolled ways. There are bug reports related to that (I would have
to dig them up) and they aren't really fixable in any sane way right now.

> A pure 32-bit kernel is compiled with:
>
> #define asmlinkage CPP_ASMLINKAGE __attribute__((regparm(0)))

... which we'd like to get rid of ...

> So all arguments are passed on the stack and those arguments can be
> directly modified by ptrace. For compat kernels the arguments are
> reloaded after ptrace and before the actual system call is done.

>> - If the process was started as a 64-bit process, i386 process or x32
>> process.
>
> Can't that be figured out by looking at the AUXV data? Either via /proc
> or PTRACE_GETREGSET + NT_AUXV. And as this can't change, there is no
> need to pass it on all the time.

I'll look at the auxv stuff.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2012-02-09 04:20:55

by Indan Zupancic

Subject: Re: Compat 32-bit syscall entry from 64-bit task!?

On Thu, February 9, 2012 01:19, H. Peter Anvin wrote:
> On 02/06/2012 05:52 PM, Indan Zupancic wrote:
>>>
>>> - If the CPU is currently in 32- or 64-bit mode.
>>
>> What is the best way to find that out at the kernel side? Add a function
>> that checks cs and returns the correct answer? But in the kernel path the
>> CPU is always in 64-bit mode, so I suppose you want to know what mode the
>> tracee was in?
>>
>
> You need to look at the CS descriptor.

CS is already available to user space, but any other value than 0x23 or 0x33
will confuse user space, as that is all they know about. Apparently Xen uses
different values, but if those are static then user space can check for them
separately. But if the values change dynamically then some other way may be
needed.
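
As a rough sketch of what user space does today with the two known values, something like the following is enough. The function and enum names are invented for the example, and any selector other than 0x23 or 0x33 is simply reported as unknown.

```c
#include <stdint.h>

/* The two user code segment selectors mentioned above. */
#define USER32_CS 0x23
#define USER64_CS 0x33

enum tracee_mode { TRACEE_MODE_UNKNOWN, TRACEE_MODE_32, TRACEE_MODE_64 };

/* Guess the tracee's CPU mode from the CS value reported by ptrace. */
enum tracee_mode mode_from_cs(uint64_t cs)
{
    switch (cs) {
    case USER64_CS:
        return TRACEE_MODE_64;
    case USER32_CS:
        return TRACEE_MODE_32;
    default:
        return TRACEE_MODE_UNKNOWN;  /* any selector user space does not know about */
    }
}
```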

But does it make much sense to pass the CPU mode of user space if that mode
can be changed at any moment? I don't think it really does. Can you give an
example of how that info can be used by a ptracer?

>
>>> - If we are currently inside a system call, and if so if it was entered
>>> via:
>>> - SYSCALL64
>>> - INT 80
>>> - SYSCALL32
>>> - SYSENTER
>>>
>>> The reason we need this information is because for the various 32-bit
>>> entry points we do some very ugly swizzling of registers, which
>>> matters to a ptrace client which wants to modify system call
>>> arguments.
>>
>> But isn't the swizzling done in such way that all this is hidden from
>> ptrace clients (and the rest of the kernel)? Why would a ptrace client
>> need to know the details of the 32-bit entry call?
>>
>> The ptrace client can always modify the same registers, as system calls
>> always use the same registers too. No unexpected behaviour happens as
>> far as I can tell from looking at the code, at least not in the syscall
>> entry path.
>
> The simple stuff works, but once you want to do things like change the
> arguments and/or move the execution point, things get unswizzled in
> uncontrolled ways.

I do both and haven't encountered any problems.

I can't find any unswizzling happening in the return path though. So
from a ptracer's point of view it all looks the same after a system
call, no matter how it was entered. Except for IP perhaps, but that's
handled in the vDSO.

> There are bug reports related to that (I would have
> to dig them up) and they aren't really fixable in any sane way right now.

I don't see any problems in the code.

The only confusion I can think of is someone following the register values
across a system call instruction. Then the swizzling may be unexpected.
But if they do that they could check how the syscall was entered and
compensate for that. (I can't think of any requirement why this would
need to be race-free.)

>> A pure 32-bit kernel is compiled with:
>>
>> #define asmlinkage CPP_ASMLINKAGE __attribute__((regparm(0)))
>
> ... which we'd like to get rid of ...

If you do get rid of it, then you have to reload the registers after
ptrace, just like currently happens on x86_64 kernels. So regparm(0)
isn't a requirement, I only explained why reloading the registers
isn't needed for pure 32-bit.

Greetings,

Indan

2012-02-09 04:32:19

by H. Peter Anvin

Subject: Re: Compat 32-bit syscall entry from 64-bit task!?

On 02/08/2012 08:20 PM, Indan Zupancic wrote:
>
> CS is already available to user space, but any other value than 0x23 or 0x33
> will confuse user space, as that is all they know about. Apparently Xen uses
> different values, but if those are static then user space can check for them
> separately. But if the values change dynamically then some other way may be
> needed.
>
> But does it make much sense to pass the CPU mode of user space if that mode
> can be changed at any moment? I don't think it really does. Can you give an
> example of how that info can be used by a ptracer?
>

Uh... you could make THAT argument about ANY register state!

I believe H.J. can fill you in about the usage.

>
> The only confusion I can think of is someone following the register values
> across a system call instruction. Then the swizzling may be unexpected.
> But if they do that they could check how the syscall was entered and
> compensate for that. (I can't think of any requirement why this would
> need to be race-free.)
>

You'd have to know how you'd entered, which right now you don't have any
way to know.

-hpa

2012-02-09 06:03:47

by Indan Zupancic

Subject: Re: Compat 32-bit syscall entry from 64-bit task!?

On Thu, February 9, 2012 05:29, H. Peter Anvin wrote:
> On 02/08/2012 08:20 PM, Indan Zupancic wrote:
>>
>> CS is already available to user space, but any other value than 0x23 or 0x33
>> will confuse user space, as that is all they know about. Apparently Xen uses
>> different values, but if those are static then user space can check for them
>> separately. But if the values change dynamically then some other way may be
>> needed.
>>
>> But does it make much sense to pass the CPU mode of user space if that mode
>> can be changed at any moment? I don't think it really does. Can you give an
>> example of how that info can be used by a ptracer?
>>
>
> Uh... you could make THAT argument about ANY register state!

Well, when the tracee is in a system call, it can't change registers,
and their values determine the system call number and arguments. That
information is stable for the current system call. And as a ptracer
can't determine if the 32 or 64-bit syscall entry path was taken in
a race-free way, it makes sense to provide that extra info.

But the same is not true for the user space CPU mode, that can change
at any time without the tracer getting a notification, except if it is
single stepping (which I forgot about).

Would it be useful to know the CPU mode when single stepping or otherwise?

I'm asking because I don't see a need for it, but if someone else does
it's better to add it now together with the syscall mode bit. Unlike the
system call mode, the CPU mode can be checked via CS. The question is
if that works well enough or if the values are dynamic enough that it's
better to pass the info explicitly instead.

Unlike the syscall mode info, figuring out the mode from CS isn't trivial
when it can change dynamically. Then all places that use non-standard CS
values need to be changed to provide the mode somehow.

> I believe H.J. can fill you in about the usage.

That would be great.

>>
>> The only confusion I can think of is someone following the register values
>> across a system call instruction. Then the swizzling may be unexpected.
>> But if they do that they could check how the syscall was entered and
>> compensate for that. (I can't think of any requirement why this would
>> need to be race-free.)
>>
>
> You'd have to know how you'd entered, which right now you don't have any
> way to know.

You can check the syscall instruction itself, either before it's executed
or afterwards by checking the IP. Though that's trickier, because the
kernel points the IP to just after int80 for a sysenter call, so you have
to check if there's a sysenter nearby too.
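
A minimal sketch of the "afterwards" check, assuming the tracer has already read the code bytes that end at the reported IP (for example with PTRACE_PEEKTEXT). The names are made up for the example; the opcodes are the standard encodings of int $0x80 (CD 80), syscall (0F 05) and sysenter (0F 34).

```c
#include <stddef.h>
#include <stdint.h>

enum entry_insn { INSN_UNKNOWN, INSN_INT80, INSN_SYSCALL, INSN_SYSENTER };

/* Classify the two opcode bytes that end at the reported IP.
 * `code` holds `len` bytes of tracee text, the last one just before IP. */
enum entry_insn classify_entry_insn(const uint8_t *code, size_t len)
{
    const uint8_t *p;

    if (len < 2)
        return INSN_UNKNOWN;
    p = code + len - 2;
    if (p[0] == 0xCD && p[1] == 0x80)
        return INSN_INT80;
    if (p[0] == 0x0F && p[1] == 0x05)
        return INSN_SYSCALL;
    if (p[0] == 0x0F && p[1] == 0x34)
        return INSN_SYSENTER;
    return INSN_UNKNOWN;
}
```

Because of the IP adjustment described above, an INSN_INT80 result is ambiguous on its own: the tracer would still have to look for a sysenter in the bytes shortly before it.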

You can also figure out what the entry instruction was by comparing the
register values with the expected ones and deducing it that way.

But the kernel is actually changing the registers, so why hide that?

I mean, once user space is aware that the kernel may do swizzling, is there
any actual problem left? Because this sounds like user space was trying to
be clever, but got it wrong. E.g. it knew the kernel was entered not via
int80, but then got confused because of the swizzling.

Greetings,

Indan

2012-02-09 14:53:23

by H. Peter Anvin

Subject: Re: Compat 32-bit syscall entry from 64-bit task!?

On 02/08/2012 10:03 PM, Indan Zupancic wrote:
>
> You can check the syscall instruction itself, either before it's executed
> or afterwards by checking the IP. Though that's trickier, because the
> kernel points the IP to just after int80 for a sysenter call, so you have
> to check if there's a sysenter nearby too.
>

No, that's a total nightmare. FAIL.

> But the kernel is actually changing the registers, so why hide that?
>
> I mean, once user space is aware that the kernel may do swizzling, is there
> any actual problem left? Because this sounds like user space was trying to
> be clever, but got it wrong. E.g. it knew the kernel was entered not via
> int80, but then got confused because of the swizzling.

It would be great if we didn't have an existing compatibility problem.
As it is we can't get rid of it easily.

-hpa

2012-02-09 16:00:57

by H.J. Lu

Subject: Re: Compat 32-bit syscall entry from 64-bit task!?

On Wed, Feb 8, 2012 at 8:29 PM, H. Peter Anvin wrote:
> On 02/08/2012 08:20 PM, Indan Zupancic wrote:
>>
>> CS is already available to user space, but any other value than 0x23 or 0x33
>> will confuse user space, as that is all they know about. Apparently Xen uses
>> different values, but if those are static then user space can check for them
>> separately. But if the values change dynamically then some other way may be
>> needed.
>>
>> But does it make much sense to pass the CPU mode of user space if that mode
>> can be changed at any moment? I don't think it really does. Can you give an
>> example of how that info can be used by a ptracer?
>>
>
> Uh... you could make THAT argument about ANY register state!
>
> I believe H.J. can fill you in about the usage.
>

GDB uses CS value to tell ia32 process from x86-64 process.
At minimum, we need a bit in CS for GDB. But any changes
will break old GDB.

H.J.

2012-02-10 01:10:33

by Indan Zupancic

Subject: Re: Compat 32-bit syscall entry from 64-bit task!?

On Thu, February 9, 2012 17:00, H.J. Lu wrote:
> GDB uses CS value to tell ia32 process from x86-64 process.

Are there any cases when this doesn't work? Someone said Xen can
have different CS values, but looking at the source it seems it's
using the same ones, at least with a Linux hypervisor. So perhaps
it was KVM. Looking at the header it seems paravirtualisation uses
different cs values. On the upside, it seems we can just use that
user_64bit_mode() to know whether it is 32 or 64 bit mode, so
adding a bit telling the process mode is easier than I thought.

Currently there is a need to tell if the 32 or 64-bit syscall
path is being taken, which is independent of the process mode.

> At minimum, we need a bit in CS for GDB. But any changes
> will break old GDB.

Would adding bits to the upper 32 bits of rflags break GDB?

Do you also need a way to know whether the kernel was entered via
int 0x80, SYSCALL32/64 or SYSENTER?

Greetings,

Indan

2012-02-10 01:19:08

by H. Peter Anvin

Subject: Re: Compat 32-bit syscall entry from 64-bit task!?

On 02/09/2012 05:09 PM, Indan Zupancic wrote:
> On Thu, February 9, 2012 17:00, H.J. Lu wrote:
>> GDB uses CS value to tell ia32 process from x86-64 process.
>
> Are there any cases when this doesn't work? Someone said Xen can
> have different CS values, but looking at the source it seems it's
> using the same ones, at least with a Linux hypervisor. So perhaps
> it was KVM. Looking at the header it seems paravirtualisation uses
> different cs values. On the upside, it seems we can just use that
> user_64bit_mode() to know whether it is 32 or 64 bit mode, so
> adding a bit telling the process mode is easier than I thought.
>
> Currently there is a need to tell if the 32 or 64-bit syscall
> path is being taken, which is independent of the process mode.
>

There are definitely cases where the current reliance on magic CS values
doesn't work; never mind the fact that it's just broken.

>> At minimum, we need a bit in CS for GDB. But any changes
>> will break old GDB.
>
> Would adding bits to the upper 32 bits of rflags break GDB?

It doesn't work for i386, never mind that this is reserved hardware
state and we don't have an OK at this time to redeclare them available.

> Do you also need a way to know whether the kernel was entered via
> int 0x80, SYSCALL32/64 or SYSENTER?

gdb, probably not. That came from another user (pin, I think, but I'm
not sure.)

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2012-02-10 02:30:13

by Indan Zupancic

Subject: Re: Compat 32-bit syscall entry from 64-bit task!?

On Fri, February 10, 2012 02:15, H. Peter Anvin wrote:
> On 02/09/2012 05:09 PM, Indan Zupancic wrote:
>> On Thu, February 9, 2012 17:00, H.J. Lu wrote:
>>> GDB uses CS value to tell ia32 process from x86-64 process.
>>
>> Are there any cases when this doesn't work? Someone said Xen can
>> have different CS values, but looking at the source it seems it's
>> using the same ones, at least with a Linux hypervisor. So perhaps
>> it was KVM. Looking at the header it seems paravirtualisation uses
>> different cs values. On the upside, it seems we can just use that
>> user_64bit_mode() to know whether it is 32 or 64 bit mode, so
>> adding a bit telling the process mode is easier than I thought.
>>
>> Currently there is a need to tell if the 32 or 64-bit syscall
>> path is being taken, which is independent of the process mode.
>>
>
> There are definitely cases where the current reliance on magic CS values
> doesn't work; never mind the fact that it's just broken.

It's only broken because it doesn't work sometimes. ;-)

>>> At minimum, we need a bit in CS for GDB. But any changes
>>> will break old GDB.
>>
>> Would adding bits to the upper 32 bits of rflags break GDB?
>
> It doesn't work for i386, never mind that this is reserved hardware
> state and we don't have an OK at this time to redeclare them available.

It doesn't need to work for i386 because it's practically impossible
to ptrace a 64-bit task with a 32-bit ptracer.

An alternative would be to use some of the bits in the lower half.

E.g. bits 1, 3, 5 and 15 are reserved and very unlikely to be ever
used for anything, because they can use plenty of bits at the top.
Problem would be that we can't be sure that they are always zero.
If they are, they're safe to use.

The VIF and VIP flags can also be stolen as they're always zero
outside of vm86 mode (which can't be ptraced AFAIK). So we could
set VIF or VIP to tell if we stole bits 1, 3, 5 and/or 15. That
would give us 6 bits in total, and the only confusing thing might
be VIF or VIP set for user space. But anyone counting on those
being zero seems unlikely, and even more unlikely for the reserved
bits, as they are intermixed with unpredictable bits. We could use
VM too, but that might be too confusing, while VIF or VIP without
VM set make no sense.

Perhaps using VIF or VIP to tell whether the other bits are valid
is a good idea anyway, as it can never clash because they are well
defined already and always zero for non-VM mode.

With the current rate of adding flags it will take forever before
any of this might break. And if that happens, we just move to other
bits and user space needs to check those first. Or if the flags
aren't useful for userspace, hide them and keep using them for the
kernel.

>> Do you also need a way to know whether the kernel was entered via
>> int 0x80, SYSCALL32/64 or SYSENTER?
>
> gdb, probably not. That came from another user (pin, I think, but I'm
> not sure.)

Could you find out? Because I have a hard time thinking of any good
reason why anyone would want to know this specifically.

If this info is added it can replace the bit saying if it's 32 or 64-bit
syscall path. So one bit for enabling all this, 2 bits for the syscall
entry instruction (with SYSCALL64 being 0 as an easy check for the 64-bit
path) and one bit for user space mode. This would end up being 4 bits in
total, unless I forgot anything.
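
To make the four-bit layout concrete, here is one possible encoding. The bit positions inside the field and all names are assumptions chosen for the example; only the split (one enable bit, two entry bits with SYSCALL64 as 0, one mode bit) comes from the paragraph above.

```c
#include <stdint.h>

/* Entry instruction, with SYSCALL64 as 0 so the 64-bit path is an easy check. */
enum entry_kind { VIA_SYSCALL64 = 0, VIA_INT80 = 1, VIA_SYSCALL32 = 2, VIA_SYSENTER = 3 };

#define INFO_VALID        0x1u                        /* kernel provides the bits below */
#define INFO_ENTRY_SHIFT  1
#define INFO_ENTRY_MASK   (0x3u << INFO_ENTRY_SHIFT)  /* two bits: enum entry_kind */
#define INFO_USER_64BIT   0x8u                        /* user space was in 64-bit mode */

/* Kernel side: build the 4-bit field. */
uint8_t pack_info(enum entry_kind entry, int user_64bit)
{
    return (uint8_t)(INFO_VALID |
                     ((unsigned)entry << INFO_ENTRY_SHIFT) |
                     (user_64bit ? INFO_USER_64BIT : 0u));
}

/* Tracer side: was the 64-bit syscall path taken? */
int info_is_64bit_syscall_path(uint8_t info)
{
    return (info & INFO_VALID) && (info & INFO_ENTRY_MASK) == 0;
}
```

A tracer that only cares about the 32 versus 64-bit syscall path then needs a single test, and it gets 0 on kernels that never set the enable bit.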

Only downside of adding the entry instruction info would be that more
work in the entry-specific code is needed. The code wouldn't be confined
to a small ptrace-specific part anymore.

Greetings,

Indan

2012-02-10 02:51:01

by H. Peter Anvin

Subject: Re: Compat 32-bit syscall entry from 64-bit task!?

On 02/09/2012 06:29 PM, Indan Zupancic wrote:
>
> It's only broken because it doesn't work sometimes. ;-)
>

I really hope you realize how idiotic that sounds.

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.