Hi guys, I wonder how easily the include/uapi/* is being
changed these days.
The patch:
http://lkml.kernel.org/r/405594361340a2ec32f8e2b115c142df0e180d8e.1426193719.git.luto@kernel.org
breaks dosemu (and perhaps everyone else who used
to restore the segregs by hands). And the fix involves
both autoconf magic and run-time magic, so it is not even
trivial.
I realize this patch may be good to have in general, but
breaking userspace without a single warning is a bit
discouraging. Seems like the old "we don't break userspace"
rule have gone.
On Tue, Aug 11, 2015 at 5:17 PM, Stas Sergeev <[email protected]> wrote:
> Hi guys, I wonder how easily the include/uapi/* is being
> changed these days.
> The patch:
> http://lkml.kernel.org/r/405594361340a2ec32f8e2b115c142df0e180d8e.1426193719.git.luto@kernel.org
> breaks dosemu (and perhaps everyone else who used
> to restore the segregs by hands). And the fix involves
> both autoconf magic and run-time magic, so it is not even
> trivial.
> I realize this patch may be good to have in general, but
> breaking userspace without a single warning is a bit
> discouraging. Seems like the old "we don't break userspace"
> rule have gone.
I didn't anticipate any breakage. I could have been wrong.
Do you know what the actual breakage is? I'm curious how this ever
worked for DOSEMU, given that, before this patch, it appeared to be
impossible to return to any nonstandard SS from a 64-bit signal
handler. FWIW, DOSEMU seems to work for me on recent kernels.
We might still be able to require a new sigcontext flag to be set and
to forcibly return to __USER_DS if the flag is set regardless of the
ss value in sigcontext when sigreturn is called, if that is indeed the
problem with DOSEMU. But I'm not actually sure that that's the
problem.
In fact, DOSEMU contains this:
/* set up a frame to get back to DPMI via iret. The kernel does not save
%ss, and the SYSCALL instruction in sigreturn() destroys it.
IRET pops off everything in 64-bit mode even if the privilege
does not change which is nice, but clobbers the high 48 bits
of rsp if the DPMI client uses a 16-bit stack which is not so
nice (see EMUfailure.txt). Setting %rsp to 0x100000000 so that
bits 16-31 are zero works around this problem, as DPMI code
can't see bits 32-63 anyway.
*/
So, if DOSEMU were to realize that both sigreturnissues it's
complaining about are fixed in recent kernels, it could sigreturn
directly back to any state. (It can't *directly* IRET back to a
16-bit state with properly controlled RSP, but it can do it via
sigreturn.)
I don't actually see any code in DOSEMU that generates a sigcontext
from scratch (as opposed to copying one and modifying it), so I'm not
entirely sure what the problem is.
--Andy
--
Andy Lutomirski
AMA Capital Management, LLC
12.08.2015 03:38, Andy Lutomirski пишет:
> On Tue, Aug 11, 2015 at 5:17 PM, Stas Sergeev <[email protected]> wrote:
>> Hi guys, I wonder how easily the include/uapi/* is being
>> changed these days.
>> The patch:
>> http://lkml.kernel.org/r/405594361340a2ec32f8e2b115c142df0e180d8e.1426193719.git.luto@kernel.org
>> breaks dosemu (and perhaps everyone else who used
>> to restore the segregs by hands). And the fix involves
>> both autoconf magic and run-time magic, so it is not even
>> trivial.
>> I realize this patch may be good to have in general, but
>> breaking userspace without a single warning is a bit
>> discouraging. Seems like the old "we don't break userspace"
>> rule have gone.
> I didn't anticipate any breakage. I could have been wrong.
You changed include/uapi/*, which is obviously an asking
for problems. I applied the following changes to my local
git tree to get dosemu working again:
https://github.com/stsp/dosemu2/commit/48b2a13a49a9fe1a456cd77df6b9a1feec675a01
https://github.com/stsp/dosemu2/commit/7898ac60d5e569964127d6cc48f592caecd20b81
> Do you know what the actual breakage is? I'm curious how this ever
> worked for DOSEMU, given that, before this patch, it appeared to be
> impossible to return to any nonstandard SS from a 64-bit signal
> handler.
This is not the point.
What dosemu wants is to simply save the DOS SS somewhere.
After your patch, it saves the Linux SS instead, then crashes.
> FWIW, DOSEMU seems to work for me on recent kernels.
Do you have any protected mode DOS program to test?
I'll send you one in a private e-mail just about now.
> We might still be able to require a new sigcontext flag to be set and
> to forcibly return to __USER_DS if the flag is set regardless of the
> ss value in sigcontext when sigreturn is called, if that is indeed the
> problem with DOSEMU. But I'm not actually sure that that's the
> problem.
Well, the flag would be an ideal solution in an ideal world,
but in our world I don't know the current relevance of dosemu,
and whether or not it worth a new flag to add.
> In fact, DOSEMU contains this:
>
> /* set up a frame to get back to DPMI via iret. The kernel does not save
> %ss, and the SYSCALL instruction in sigreturn() destroys it.
>
> IRET pops off everything in 64-bit mode even if the privilege
> does not change which is nice, but clobbers the high 48 bits
> of rsp if the DPMI client uses a 16-bit stack which is not so
> nice (see EMUfailure.txt). Setting %rsp to 0x100000000 so that
> bits 16-31 are zero works around this problem, as DPMI code
> can't see bits 32-63 anyway.
> */
>
> So, if DOSEMU were to realize that both sigreturnissues it's
> complaining about are fixed in recent kernels, it could sigreturn
> directly back to any state.
Good, but have you added any flag for dosemu to even know
it can do this? Unless I am mistaken, you didn't. So the fix you
suggest, is not easy to detect and make portable with the older
kernels. Any suggestions?
> I don't actually see any code in DOSEMU that generates a sigcontext
It doesn't. But it manually pops the kernel-generated
sigcontext, see dpmisel.S:DPMI_direct_transfer.
This part didn't broke though, so no need to look there in fact.
> from scratch (as opposed to copying one and modifying it), so I'm not
> entirely sure what the problem is.
Now you have the changes I did to get it working, and I'll
also mail you a simple test-case. Let me know if you need
something else.
The more important question is whether we ignore dosemu
or some actions should be taken. Since you changed uapi/*,
my initial guess was that you opt for ignoring it, but maybe
this was not the point.
On Wed, Aug 12, 2015 at 1:02 AM, Stas Sergeev <[email protected]> wrote:
> 12.08.2015 03:38, Andy Lutomirski пишет:
>>
>> On Tue, Aug 11, 2015 at 5:17 PM, Stas Sergeev <[email protected]> wrote:
>>>
>>> Hi guys, I wonder how easily the include/uapi/* is being
>>> changed these days.
>>> The patch:
>>>
>>> http://lkml.kernel.org/r/405594361340a2ec32f8e2b115c142df0e180d8e.1426193719.git.luto@kernel.org
>>> breaks dosemu (and perhaps everyone else who used
>>> to restore the segregs by hands). And the fix involves
>>> both autoconf magic and run-time magic, so it is not even
>>> trivial.
>>> I realize this patch may be good to have in general, but
>>> breaking userspace without a single warning is a bit
>>> discouraging. Seems like the old "we don't break userspace"
>>> rule have gone.
>>
>> I didn't anticipate any breakage. I could have been wrong.
>
> You changed include/uapi/*, which is obviously an asking
> for problems. I applied the following changes to my local
> git tree to get dosemu working again:
To be fair, I renamed a field that used to be padding. The UAPI has
to change on occasion -- it's just not supposed to break things.
> https://github.com/stsp/dosemu2/commit/48b2a13a49a9fe1a456cd77df6b9a1feec675a01
Maybe I'm still missing something, but this seems like it should be
unnecessary. What goes wrong without it?
The new ss field serves two purposes: it stores the old ss (dosemu
needs that on new kernels and would benefit in general) and it stores
the new post-sigreturn ss (dosemu doesn't currently have any use for
that because of the iret trampoline trick).
But maybe you're doing this to make the next patch work.
> https://github.com/stsp/dosemu2/commit/7898ac60d5e569964127d6cc48f592caecd20b81
So the problem is that dosemu was actually hacking around the old
buggy behavior and thus relying on it. Grr.
>> We might still be able to require a new sigcontext flag to be set and
>> to forcibly return to __USER_DS if the flag is set regardless of the
>> ss value in sigcontext when sigreturn is called, if that is indeed the
>> problem with DOSEMU. But I'm not actually sure that that's the
>> problem.
>
> Well, the flag would be an ideal solution in an ideal world,
> but in our world I don't know the current relevance of dosemu,
> and whether or not it worth a new flag to add.
It wouldn't even help here, because the breakage isn't caused by
incompatible sigcontext formats -- it's caused by dosemu's reliance on
ss being preserved across signal delivery (even if it wasn't preserved
on the way back).
>
>> In fact, DOSEMU contains this:
>>
>> /* set up a frame to get back to DPMI via iret. The kernel does not
>> save
>> %ss, and the SYSCALL instruction in sigreturn() destroys it.
>>
>> IRET pops off everything in 64-bit mode even if the privilege
>> does not change which is nice, but clobbers the high 48 bits
>> of rsp if the DPMI client uses a 16-bit stack which is not so
>> nice (see EMUfailure.txt). Setting %rsp to 0x100000000 so that
>> bits 16-31 are zero works around this problem, as DPMI code
>> can't see bits 32-63 anyway.
>> */
>>
>> So, if DOSEMU were to realize that both sigreturnissues it's
>> complaining about are fixed in recent kernels, it could sigreturn
>> directly back to any state.
>
> Good, but have you added any flag for dosemu to even know
> it can do this? Unless I am mistaken, you didn't. So the fix you
> suggest, is not easy to detect and make portable with the older
> kernels. Any suggestions?
>
You could probe for it directly: raise a signal, change the saved ss
and see what's in ss after sigreturn.
Let me see if I can come up with a clean kernel fix.
--Andy
12.08.2015 19:19, Andy Lutomirski пишет:
> On Wed, Aug 12, 2015 at 1:02 AM, Stas Sergeev <[email protected]> wrote:
>> 12.08.2015 03:38, Andy Lutomirski пишет:
>>> On Tue, Aug 11, 2015 at 5:17 PM, Stas Sergeev <[email protected]> wrote:
>>>> Hi guys, I wonder how easily the include/uapi/* is being
>>>> changed these days.
>>>> The patch:
>>>>
>>>> http://lkml.kernel.org/r/405594361340a2ec32f8e2b115c142df0e180d8e.1426193719.git.luto@kernel.org
>>>> breaks dosemu (and perhaps everyone else who used
>>>> to restore the segregs by hands). And the fix involves
>>>> both autoconf magic and run-time magic, so it is not even
>>>> trivial.
>>>> I realize this patch may be good to have in general, but
>>>> breaking userspace without a single warning is a bit
>>>> discouraging. Seems like the old "we don't break userspace"
>>>> rule have gone.
>>> I didn't anticipate any breakage. I could have been wrong.
>> You changed include/uapi/*, which is obviously an asking
>> for problems. I applied the following changes to my local
>> git tree to get dosemu working again:
> To be fair, I renamed a field that used to be padding. The UAPI has
> to change on occasion -- it's just not supposed to break things.
>
>> https://github.com/stsp/dosemu2/commit/48b2a13a49a9fe1a456cd77df6b9a1feec675a01
> Maybe I'm still missing something, but this seems like it should be
> unnecessary. What goes wrong without it?
Without it, dosemu stores and fetches the ss value
elsewhere. It could use any place for it, be it even a global var.
But with your patch, dosemu _needs_ to use the sigcontext.ss,
because that's where the kernel now puts it.
As a result, dosemu had to be changed to use sigcontext.ss
to load the ss from. Sounds good? Not! :)
The reality is that you'll have to work with the old headers,
that still have no sigcontext.ss, and so you'd need to access
ss via __pad0 in pretty much 100% of real-life setups.
If there is such a need to touch uapi from time to time
(and I understand this is the case), then perhaps you should
invent some versioning or whatever, to save people from
surprises.
Yes, you took the field that was used for padding.
But this doesn't help, because this is not a new functionality.
The existing programs now _need_ to use your new field
for what they did in the past without it. So it is nearly the
same as renaming any of the existing widely used fields:
people will need the autoconf hacks to probe its existence.
>> https://github.com/stsp/dosemu2/commit/7898ac60d5e569964127d6cc48f592caecd20b81
> So the problem is that dosemu was actually hacking around the old
> buggy behavior and thus relying on it. Grr.
What else it could do? :(
Please note: I am not trying to ask you to revert the patch.
I realize it is needed, I realize dosemu did bad things without
it. But the way it is done is a bit annoying: sudden breakage,
no run-time checks or versionings, and even the build-time
hacks needed.
>>> We might still be able to require a new sigcontext flag to be set and
>>> to forcibly return to __USER_DS if the flag is set regardless of the
>>> ss value in sigcontext when sigreturn is called, if that is indeed the
>>> problem with DOSEMU. But I'm not actually sure that that's the
>>> problem.
>> Well, the flag would be an ideal solution in an ideal world,
>> but in our world I don't know the current relevance of dosemu,
>> and whether or not it worth a new flag to add.
> It wouldn't even help here, because the breakage isn't caused by
> incompatible sigcontext formats -- it's caused by dosemu's reliance on
> ss being preserved across signal delivery
I thought you mean some per-process flag that would
preserve the backward-compatibility for the unaware apps.
I probably got you wrong.
> (even if it wasn't preserved
> on the way back).
How so?
IIRC sometime fs/gs were restored, but I have no evidence
the ss was. Could you clarify?
>>> In fact, DOSEMU contains this:
>>>
>>> /* set up a frame to get back to DPMI via iret. The kernel does not
>>> save
>>> %ss, and the SYSCALL instruction in sigreturn() destroys it.
>>>
>>> IRET pops off everything in 64-bit mode even if the privilege
>>> does not change which is nice, but clobbers the high 48 bits
>>> of rsp if the DPMI client uses a 16-bit stack which is not so
>>> nice (see EMUfailure.txt). Setting %rsp to 0x100000000 so that
>>> bits 16-31 are zero works around this problem, as DPMI code
>>> can't see bits 32-63 anyway.
>>> */
>>>
>>> So, if DOSEMU were to realize that both sigreturnissues it's
>>> complaining about are fixed in recent kernels, it could sigreturn
>>> directly back to any state.
>> Good, but have you added any flag for dosemu to even know
>> it can do this? Unless I am mistaken, you didn't. So the fix you
>> suggest, is not easy to detect and make portable with the older
>> kernels. Any suggestions?
>>
> You could probe for it directly: raise a signal, change the saved ss
> and see what's in ss after sigreturn.
Umm, nope.
> Let me see if I can come up with a clean kernel fix.
The check for proper sigreturn would be good.
On Wed, Aug 12, 2015 at 10:00 AM, Stas Sergeev <[email protected]> wrote:
> 12.08.2015 19:19, Andy Lutomirski пишет:
>>
>> Maybe I'm still missing something, but this seems like it should be
>> unnecessary. What goes wrong without it?
>
> Without it, dosemu stores and fetches the ss value
> elsewhere. It could use any place for it, be it even a global var.
> But with your patch, dosemu _needs_ to use the sigcontext.ss,
> because that's where the kernel now puts it.
> As a result, dosemu had to be changed to use sigcontext.ss
> to load the ss from. Sounds good? Not! :)
> The reality is that you'll have to work with the old headers,
> that still have no sigcontext.ss, and so you'd need to access
> ss via __pad0 in pretty much 100% of real-life setups.
> If there is such a need to touch uapi from time to time
> (and I understand this is the case), then perhaps you should
> invent some versioning or whatever, to save people from
> surprises.
> Yes, you took the field that was used for padding.
> But this doesn't help, because this is not a new functionality.
> The existing programs now _need_ to use your new field
> for what they did in the past without it. So it is nearly the
> same as renaming any of the existing widely used fields:
> people will need the autoconf hacks to probe its existence.
>
Yeah, I see the problem here.
>>>
>>> https://github.com/stsp/dosemu2/commit/7898ac60d5e569964127d6cc48f592caecd20b81
>>
>> So the problem is that dosemu was actually hacking around the old
>> buggy behavior and thus relying on it. Grr.
>
> What else it could do? :(
Going back in time? Ask the kernel to fix the issue. At this point,
it's a bit late for that.
>> It wouldn't even help here, because the breakage isn't caused by
>> incompatible sigcontext formats -- it's caused by dosemu's reliance on
>> ss being preserved across signal delivery
>
> I thought you mean some per-process flag that would
> preserve the backward-compatibility for the unaware apps.
> I probably got you wrong.
No, I meant a flag in sigcontext indicating which format was used.
>
>> (even if it wasn't preserved
>> on the way back).
>
> How so?
> IIRC sometime fs/gs were restored, but I have no evidence
> the ss was. Could you clarify?
ss was never restored by sigreturn AFAIK. I don't think that fs and
gs are, but I think they might have been a long long time ago (before
git).
>>>
>>> Good, but have you added any flag for dosemu to even know
>>> it can do this? Unless I am mistaken, you didn't. So the fix you
>>> suggest, is not easy to detect and make portable with the older
>>> kernels. Any suggestions?
>>>
>> You could probe for it directly: raise a signal, change the saved ss
>> and see what's in ss after sigreturn.
>
> Umm, nope.
Why not? The safest general way to detect new features is to try to use them.
>
>> Let me see if I can come up with a clean kernel fix.
>
> The check for proper sigreturn would be good.
I still don't see how sigreturn matters here. It's signal *delivery*
that's the problem.
I'm thinking of having signal delivery zap ss only if the old ss looks
bogus instead of zapping it unconditionally. IOW, instead of setting
regs->ss = __USER_DS unconditionally, we'd do larl on the old regs->ss
and keep it if it's DPL 3 RW data (exp-down or otherwise) and present.
I'll have to check the precise rules in both the SDM and APM. The
idea is that we don't want IRET to fail during signal delivery, which
can happen due to a bad sigreturn or a race against modify_ldt.
--Andy
12.08.2015 21:25, Andy Lutomirski пишет:
>>>> https://github.com/stsp/dosemu2/commit/7898ac60d5e569964127d6cc48f592caecd20b81
>>> So the problem is that dosemu was actually hacking around the old
>>> buggy behavior and thus relying on it. Grr.
>> What else it could do? :(
> Going back in time? Ask the kernel to fix the issue.
Like this?
http://www.x86-64.org/pipermail/discuss/2007-May/009913.html
And this:
http://www.x86-64.org/pipermail/discuss/2007-May/009923.html
>>>> Good, but have you added any flag for dosemu to even know
>>>> it can do this? Unless I am mistaken, you didn't. So the fix you
>>>> suggest, is not easy to detect and make portable with the older
>>>> kernels. Any suggestions?
>>>>
>>> You could probe for it directly: raise a signal, change the saved ss
>>> and see what's in ss after sigreturn.
>> Umm, nope.
> Why not? The safest general way to detect new features is to try to use them.
But this is just too many ugly code for nothing.
Since it is not very urgent to use sigreturn() instead of
iret, I guess I'll better wait for an API addition that will
let the check possible.
>>> Let me see if I can come up with a clean kernel fix.
>> The check for proper sigreturn would be good.
> I still don't see how sigreturn matters here. It's signal *delivery*
> that's the problem.
But the delivery can be easily checked with "if (ss & 4)".
What remains is just a sigreturn instead of iret.
> I'm thinking of having signal delivery zap ss only if the old ss looks
> bogus instead of zapping it unconditionally. IOW, instead of setting
> regs->ss = __USER_DS unconditionally, we'd do larl on the old regs->ss
> and keep it if it's DPL 3 RW data (exp-down or otherwise) and present.
I am not sure how good is this.
Yes, may help for a backward-compatibility.
But OTOH the 32bit kernel saves _all_ registers, including
ss, which is IMHO the right thing to do in general.
So as long as the things are already "broken", I wonder if the
new hacks are worth the troubles.
Please also see here:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66631#c15
Not saving fs is a pita!
> I'll have to check the precise rules in both the SDM and APM. The
> idea is that we don't want IRET to fail during signal delivery, which
> can happen due to a bad sigreturn or a race against modify_ldt.
Well, this is a "very basic" idea, so to say.
The fact that segregs are not restored, have much more
consequences, and since now you already broke things,
I wonder if something can be finally fixed for good...
What alternatives do we have? Can we do something
really brave, introduce a new sigaction flag perhaps, that
will just restore all segregs for new apps, and none - for
old apps? I mean the above gcc bugzilla ticket in particular -
very annoying one...
On Wed, Aug 12, 2015 at 11:55 AM, Stas Sergeev <[email protected]> wrote:
> 12.08.2015 21:25, Andy Lutomirski пишет:
>>>>>
>>>>>
>>>>> https://github.com/stsp/dosemu2/commit/7898ac60d5e569964127d6cc48f592caecd20b81
>>>>
>>>> So the problem is that dosemu was actually hacking around the old
>>>> buggy behavior and thus relying on it. Grr.
>>>
>>> What else it could do? :(
>>
>> Going back in time? Ask the kernel to fix the issue.
>
> Like this?
> http://www.x86-64.org/pipermail/discuss/2007-May/009913.html
> And this:
> http://www.x86-64.org/pipermail/discuss/2007-May/009923.html
I apologize on behalf of the upstream kernel in 2007. :-/ I wasn't
really involved at that point.
>
>>>>> Good, but have you added any flag for dosemu to even know
>>>>> it can do this? Unless I am mistaken, you didn't. So the fix you
>>>>> suggest, is not easy to detect and make portable with the older
>>>>> kernels. Any suggestions?
>>>>>
>>>> You could probe for it directly: raise a signal, change the saved ss
>>>> and see what's in ss after sigreturn.
>>>
>>> Umm, nope.
>>
>> Why not? The safest general way to detect new features is to try to use
>> them.
>
> But this is just too many ugly code for nothing.
> Since it is not very urgent to use sigreturn() instead of
> iret, I guess I'll better wait for an API addition that will
> let the check possible.
I'll ponder it. You'll get a ~300 cycle speedup if you switch over
(IRET is really slow), but I doubt that matters much.
>
>>>> Let me see if I can come up with a clean kernel fix.
>>>
>>> The check for proper sigreturn would be good.
>>
>> I still don't see how sigreturn matters here. It's signal *delivery*
>> that's the problem.
>
> But the delivery can be easily checked with "if (ss & 4)".
> What remains is just a sigreturn instead of iret.
>
>> I'm thinking of having signal delivery zap ss only if the old ss looks
>> bogus instead of zapping it unconditionally. IOW, instead of setting
>> regs->ss = __USER_DS unconditionally, we'd do larl on the old regs->ss
>> and keep it if it's DPL 3 RW data (exp-down or otherwise) and present.
>
> I am not sure how good is this.
> Yes, may help for a backward-compatibility.
> But OTOH the 32bit kernel saves _all_ registers, including
> ss, which is IMHO the right thing to do in general.
I agree. So does x32.
Are you planning on merging your patches into upstream DOSEMU?
> So as long as the things are already "broken", I wonder if the
> new hacks are worth the troubles.
> Please also see here:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66631#c15
> Not saving fs is a pita!
>
That is an incredible can of worms, and it'll only get worse when
WRFSBASE, etc are enabled in Linux. I don't think I like the current
kernel behavior, but at least it's fully functional, unlike the old SS
behavior. (Except that FS, GS, and their bases aren't context
switched correctly right now if you pound on them hard enough.)
>> I'll have to check the precise rules in both the SDM and APM. The
>> idea is that we don't want IRET to fail during signal delivery, which
>> can happen due to a bad sigreturn or a race against modify_ldt.
>
> Well, this is a "very basic" idea, so to say.
> The fact that segregs are not restored, have much more
> consequences, and since now you already broke things,
> I wonder if something can be finally fixed for good...
>
> What alternatives do we have? Can we do something
> really brave, introduce a new sigaction flag perhaps, that
> will just restore all segregs for new apps, and none - for
> old apps? I mean the above gcc bugzilla ticket in particular -
> very annoying one...
We might need to do that.
Here's a nasty case:
void sighandler(...) {
switch_userspace_thread();
}
Suppose that switch_userspace_thread() changes fs. Now what? On
current kernels, it stays switched. If we change this, it won't stay
switched. Even ignoring old ABI, it's not really clear to me what the
right thing to do is.
--Andy
12.08.2015 22:20, Andy Lutomirski пишет:
> On Wed, Aug 12, 2015 at 11:55 AM, Stas Sergeev <[email protected]> wrote:
>> 12.08.2015 21:25, Andy Lutomirski пишет:
>>>>>>
>>>>>> https://github.com/stsp/dosemu2/commit/7898ac60d5e569964127d6cc48f592caecd20b81
>>>>> So the problem is that dosemu was actually hacking around the old
>>>>> buggy behavior and thus relying on it. Grr.
>>>> What else it could do? :(
>>> Going back in time? Ask the kernel to fix the issue.
>> Like this?
>> http://www.x86-64.org/pipermail/discuss/2007-May/009913.html
>> And this:
>> http://www.x86-64.org/pipermail/discuss/2007-May/009923.html
> I apologize on behalf of the upstream kernel in 2007. :-/ I wasn't
> really involved at that point.
:)
>>>>> Let me see if I can come up with a clean kernel fix.
>>>> The check for proper sigreturn would be good.
>>> I still don't see how sigreturn matters here. It's signal *delivery*
>>> that's the problem.
>> But the delivery can be easily checked with "if (ss & 4)".
>> What remains is just a sigreturn instead of iret.
>>
>>> I'm thinking of having signal delivery zap ss only if the old ss looks
>>> bogus instead of zapping it unconditionally. IOW, instead of setting
>>> regs->ss = __USER_DS unconditionally, we'd do larl on the old regs->ss
>>> and keep it if it's DPL 3 RW data (exp-down or otherwise) and present.
>> I am not sure how good is this.
>> Yes, may help for a backward-compatibility.
>> But OTOH the 32bit kernel saves _all_ registers, including
>> ss, which is IMHO the right thing to do in general.
> I agree. So does x32.
>
> Are you planning on merging your patches into upstream DOSEMU?
Unlikely.
My git tree counts ~3000 patches already, so it is an
entirely separate project, which just happen to share
some code with dosemu (and under different license).
But we can discuss a binary-compatible fix, so that the
original dosemu can work too.
>>> I'll have to check the precise rules in both the SDM and APM. The
>>> idea is that we don't want IRET to fail during signal delivery, which
>>> can happen due to a bad sigreturn or a race against modify_ldt.
>> Well, this is a "very basic" idea, so to say.
>> The fact that segregs are not restored, have much more
>> consequences, and since now you already broke things,
>> I wonder if something can be finally fixed for good...
>>
>> What alternatives do we have? Can we do something
>> really brave, introduce a new sigaction flag perhaps, that
>> will just restore all segregs for new apps, and none - for
>> old apps? I mean the above gcc bugzilla ticket in particular -
>> very annoying one...
> We might need to do that.
>
> Here's a nasty case:
>
> void sighandler(...) {
> switch_userspace_thread();
> }
>
> Suppose that switch_userspace_thread() changes fs. Now what? On
The crash - see the gcc ticket in the prev e-mail.
If fs on function entry differs from fs on exit, the gcc
stack protector will terminate the program.
So such code will not exist, except maybe in some
asm form...
> current kernels, it stays switched. If we change this, it won't stay
> switched. Even ignoring old ABI, it's not really clear to me what the
> right thing to do is.
There can be the following cases:
- switch_userspace_thread() switches fs to non-zero selector
- switch_userspace_thread() switches the fs base via syscall
- switch_userspace_thread() switches fs in sigcontext
- switch_userspace_thread() switches fs_base in sigcontext (???)
What exactly case do you have in mind?
I'd say, the way x86_32 is doing things - is good, but the
bases... perhaps, in ideal world, they should be a part of
the sigcontext as well?
On Wed, Aug 12, 2015 at 12:55 PM, Stas Sergeev <[email protected]> wrote:
> 12.08.2015 22:20, Andy Lutomirski пишет:
>> current kernels, it stays switched. If we change this, it won't stay
>> switched. Even ignoring old ABI, it's not really clear to me what the
>> right thing to do is.
>
> There can be the following cases:
> - switch_userspace_thread() switches fs to non-zero selector
> - switch_userspace_thread() switches the fs base via syscall
> - switch_userspace_thread() switches fs in sigcontext
> - switch_userspace_thread() switches fs_base in sigcontext (???)
> What exactly case do you have in mind?
> I'd say, the way x86_32 is doing things - is good, but the
> bases... perhaps, in ideal world, they should be a part of
> the sigcontext as well?
Any of the above. What do you want the kernel to do, and how does the
kernel know you want to do that? The kernel has to pick *some*
semantics here.
--
Andy Lutomirski
AMA Capital Management, LLC
12.08.2015 23:01, Andy Lutomirski пишет:
> On Wed, Aug 12, 2015 at 12:55 PM, Stas Sergeev <[email protected]> wrote:
>> 12.08.2015 22:20, Andy Lutomirski пишет:
>>> current kernels, it stays switched. If we change this, it won't stay
>>> switched. Even ignoring old ABI, it's not really clear to me what the
>>> right thing to do is.
>> There can be the following cases:
>> - switch_userspace_thread() switches fs to non-zero selector
>> - switch_userspace_thread() switches the fs base via syscall
>> - switch_userspace_thread() switches fs in sigcontext
>> - switch_userspace_thread() switches fs_base in sigcontext (???)
>> What exactly case do you have in mind?
>> I'd say, the way x86_32 is doing things - is good, but the
>> bases... perhaps, in ideal world, they should be a part of
>> the sigcontext as well?
> Any of the above. What do you want the kernel to do, and how does the
> kernel know you want to do that? The kernel has to pick *some*
> semantics here.
Assuming the bases are made the part of a sigcontext,
I'd say there would be no ambiguities remained at all:
whatever you change in a sigcontext, will be "applied" by
the sigreturn(). Whatever you put in the registers
(either segregs or MSRs), is valid until sigreturn(), then
forgotten forever.
The mess only comes in when some things are part of
sigcontext and some are not. But if you have _all_ things
accessable in sigcontext, then the user has a way of expressing
his needs very clearly: he'll either touch sigcontext or direct
values, depending on what he need.
Is this right?
On Wed, Aug 12, 2015 at 1:14 PM, Stas Sergeev <[email protected]> wrote:
> 12.08.2015 23:01, Andy Lutomirski пишет:
>>
>> On Wed, Aug 12, 2015 at 12:55 PM, Stas Sergeev <[email protected]> wrote:
>>>
>>> 12.08.2015 22:20, Andy Lutomirski пишет:
>>>>
>>>> current kernels, it stays switched. If we change this, it won't stay
>>>> switched. Even ignoring old ABI, it's not really clear to me what the
>>>> right thing to do is.
>>>
>>> There can be the following cases:
>>> - switch_userspace_thread() switches fs to non-zero selector
>>> - switch_userspace_thread() switches the fs base via syscall
>>> - switch_userspace_thread() switches fs in sigcontext
>>> - switch_userspace_thread() switches fs_base in sigcontext (???)
>>> What exactly case do you have in mind?
>>> I'd say, the way x86_32 is doing things - is good, but the
>>> bases... perhaps, in ideal world, they should be a part of
>>> the sigcontext as well?
>>
>> Any of the above. What do you want the kernel to do, and how does the
>> kernel know you want to do that? The kernel has to pick *some*
>> semantics here.
>
> Assuming the bases are made the part of a sigcontext,
> I'd say there would be no ambiguities remained at all:
> whatever you change in a sigcontext, will be "applied" by
> the sigreturn(). Whatever you put in the registers
> (either segregs or MSRs), is valid until sigreturn(), then
> forgotten forever.
> The mess only comes in when some things are part of
> sigcontext and some are not. But if you have _all_ things
> accessable in sigcontext, then the user has a way of expressing
> his needs very clearly: he'll either touch sigcontext or direct
> values, depending on what he need.
>
> Is this right?
Maybe, except that doing this might break existing code (Wine and Java
come to mind). I'm not really sure.
Anyway, can you give this and its parent a try:
https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commit/?h=x86/sigcontext&id=83a08d8c3f43c5524ffc0d88c0eff747716696f5
If they fix the problem for you, I'll improve the test cases and send
them to -stable.
--Andy
--
Andy Lutomirski
AMA Capital Management, LLC
12.08.2015 23:28, Andy Lutomirski пишет:
> On Wed, Aug 12, 2015 at 1:14 PM, Stas Sergeev <[email protected]> wrote:
>> 12.08.2015 23:01, Andy Lutomirski пишет:
>>> On Wed, Aug 12, 2015 at 12:55 PM, Stas Sergeev <[email protected]> wrote:
>>>> 12.08.2015 22:20, Andy Lutomirski пишет:
>>>>> current kernels, it stays switched. If we change this, it won't stay
>>>>> switched. Even ignoring old ABI, it's not really clear to me what the
>>>>> right thing to do is.
>>>> There can be the following cases:
>>>> - switch_userspace_thread() switches fs to non-zero selector
>>>> - switch_userspace_thread() switches the fs base via syscall
>>>> - switch_userspace_thread() switches fs in sigcontext
>>>> - switch_userspace_thread() switches fs_base in sigcontext (???)
>>>> What exactly case do you have in mind?
>>>> I'd say, the way x86_32 is doing things - is good, but the
>>>> bases... perhaps, in ideal world, they should be a part of
>>>> the sigcontext as well?
>>> Any of the above. What do you want the kernel to do, and how does the
>>> kernel know you want to do that? The kernel has to pick *some*
>>> semantics here.
>> Assuming the bases are made the part of a sigcontext,
>> I'd say there would be no ambiguities remained at all:
>> whatever you change in a sigcontext, will be "applied" by
>> the sigreturn(). Whatever you put in the registers
>> (either segregs or MSRs), is valid until sigreturn(), then
>> forgotten forever.
>> The mess only comes in when some things are part of
>> sigcontext and some are not. But if you have _all_ things
>> accessable in sigcontext, then the user has a way of expressing
>> his needs very clearly: he'll either touch sigcontext or direct
>> values, depending on what he need.
>>
>> Is this right?
> Maybe, except that doing this might break existing code (Wine and Java
> come to mind). I'm not really sure.
Yes, but that's why I was talking about some new
flag. Maybe a new sigaction() flag? Or something else that
will allow the user to request explicitly the new handling
where the things are all switched by the kernel. Then
the old programs that don't use that flag, will remain
unaffected. I realize this may be a lot of work... But please
note that there will be no more a chance like this one,
when things are already badly broken. :)
> Anyway, can you give this and its parent a try:
>
> https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commit/?h=x86/sigcontext&id=83a08d8c3f43c5524ffc0d88c0eff747716696f5
>
> If they fix the problem for you, I'll improve the test cases and send
> them to -stable.
:(
Doesn't look pretty at all.
Of course I'll test it if you can't think of any alternative,
but do you really think explicitly requesting a new interface
will not be possible, and we'll have to live with work-arounds
and new problems like in the gcc tracker popping up once in
a while?
On Wed, Aug 12, 2015 at 1:45 PM, Stas Sergeev <[email protected]> wrote:
> 12.08.2015 23:28, Andy Lutomirski пишет:
>
>> On Wed, Aug 12, 2015 at 1:14 PM, Stas Sergeev <[email protected]> wrote:
>>>
>>> 12.08.2015 23:01, Andy Lutomirski пишет:
>>>>
>>>> On Wed, Aug 12, 2015 at 12:55 PM, Stas Sergeev <[email protected]> wrote:
>>>>>
>>>>> 12.08.2015 22:20, Andy Lutomirski пишет:
>>>>>>
>>>>>> current kernels, it stays switched. If we change this, it won't stay
>>>>>> switched. Even ignoring old ABI, it's not really clear to me what the
>>>>>> right thing to do is.
>>>>>
>>>>> There can be the following cases:
>>>>> - switch_userspace_thread() switches fs to non-zero selector
>>>>> - switch_userspace_thread() switches the fs base via syscall
>>>>> - switch_userspace_thread() switches fs in sigcontext
>>>>> - switch_userspace_thread() switches fs_base in sigcontext (???)
>>>>> What exactly case do you have in mind?
>>>>> I'd say, the way x86_32 is doing things - is good, but the
>>>>> bases... perhaps, in ideal world, they should be a part of
>>>>> the sigcontext as well?
>>>>
>>>> Any of the above. What do you want the kernel to do, and how does the
>>>> kernel know you want to do that? The kernel has to pick *some*
>>>> semantics here.
>>>
>>> Assuming the bases are made the part of a sigcontext,
>>> I'd say there would be no ambiguities remained at all:
>>> whatever you change in a sigcontext, will be "applied" by
>>> the sigreturn(). Whatever you put in the registers
>>> (either segregs or MSRs), is valid until sigreturn(), then
>>> forgotten forever.
>>> The mess only comes in when some things are part of
>>> sigcontext and some are not. But if you have _all_ things
>>> accessable in sigcontext, then the user has a way of expressing
>>> his needs very clearly: he'll either touch sigcontext or direct
>>> values, depending on what he need.
>>>
>>> Is this right?
>>
>> Maybe, except that doing this might break existing code (Wine and Java
>> come to mind). I'm not really sure.
>
> Yes, but that's why I was talking about some new
> flag. Maybe a new sigaction() flag? Or something else that
> will allow the user to request explicitly the new handling
> where the things are all switched by the kernel. Then
> the old programs that don't use that flag, will remain
> unaffected. I realize this may be a lot of work... But please
> note that there will be no more a chance like this one,
> when things are already badly broken. :)
I think that, with my patch, we get the best of both worlds. We keep
the old behavior in cases where it would work, and we switch to the
new behavior in cases where the old behavior would result in killing
the task.
>
>> Anyway, can you give this and its parent a try:
>>
>>
>> https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commit/?h=x86/sigcontext&id=83a08d8c3f43c5524ffc0d88c0eff747716696f5
>>
>> If they fix the problem for you, I'll improve the test cases and send
>> them to -stable.
>
> :(
> Doesn't look pretty at all.
> Of course I'll test it if you can't think of any alternative,
> but do you really think explicitly requesting a new interface
> will not be possible, and we'll have to live with work-arounds
> and new problems like in the gcc tracker popping up once in
> a while?
I think that explicitly asking for the new behavior would just make it uglier.
--Andy
12.08.2015 23:47, Andy Lutomirski пишет:
> On Wed, Aug 12, 2015 at 1:45 PM, Stas Sergeev <[email protected]> wrote:
>> 12.08.2015 23:28, Andy Lutomirski пишет:
>>
>>> On Wed, Aug 12, 2015 at 1:14 PM, Stas Sergeev <[email protected]> wrote:
>>>> 12.08.2015 23:01, Andy Lutomirski пишет:
>>>>> On Wed, Aug 12, 2015 at 12:55 PM, Stas Sergeev <[email protected]> wrote:
>>>>>> 12.08.2015 22:20, Andy Lutomirski пишет:
>>>>>>> current kernels, it stays switched. If we change this, it won't stay
>>>>>>> switched. Even ignoring old ABI, it's not really clear to me what the
>>>>>>> right thing to do is.
>>>>>> There can be the following cases:
>>>>>> - switch_userspace_thread() switches fs to non-zero selector
>>>>>> - switch_userspace_thread() switches the fs base via syscall
>>>>>> - switch_userspace_thread() switches fs in sigcontext
>>>>>> - switch_userspace_thread() switches fs_base in sigcontext (???)
>>>>>> What exactly case do you have in mind?
>>>>>> I'd say, the way x86_32 is doing things - is good, but the
>>>>>> bases... perhaps, in ideal world, they should be a part of
>>>>>> the sigcontext as well?
>>>>> Any of the above. What do you want the kernel to do, and how does the
>>>>> kernel know you want to do that? The kernel has to pick *some*
>>>>> semantics here.
>>>> Assuming the bases are made the part of a sigcontext,
>>>> I'd say there would be no ambiguities remained at all:
>>>> whatever you change in a sigcontext, will be "applied" by
>>>> the sigreturn(). Whatever you put in the registers
>>>> (either segregs or MSRs), is valid until sigreturn(), then
>>>> forgotten forever.
>>>> The mess only comes in when some things are part of
>>>> sigcontext and some are not. But if you have _all_ things
>>>> accessable in sigcontext, then the user has a way of expressing
>>>> his needs very clearly: he'll either touch sigcontext or direct
>>>> values, depending on what he need.
>>>>
>>>> Is this right?
>>> Maybe, except that doing this might break existing code (Wine and Java
>>> come to mind). I'm not really sure.
>> Yes, but that's why I was talking about some new
>> flag. Maybe a new sigaction() flag? Or something else that
>> will allow the user to request explicitly the new handling
>> where the things are all switched by the kernel. Then
>> the old programs that don't use that flag, will remain
>> unaffected. I realize this may be a lot of work... But please
>> note that there will be no more a chance like this one,
>> when things are already badly broken. :)
> I think that, with my patch, we get the best of both worlds. We keep
> the old behavior in cases where it would work, and we switch to the
> new behavior in cases where the old behavior would result in killing
> the task.
But I mean also fs/TLS.
There is a chance now to fix things for good, all at once. :)
With such an ss patch applied to stable, there will be no more
such a chance ever. What's your opinion on the possibility of
fixing the TLS problem?
Also I am not sure about the sigreturn()'s detection: is it
a subject of the subsequent patch, or you dropped an idea?
On Wed, Aug 12, 2015 at 1:55 PM, Stas Sergeev <[email protected]> wrote:
> 12.08.2015 23:47, Andy Lutomirski пишет:
>
>> On Wed, Aug 12, 2015 at 1:45 PM, Stas Sergeev <[email protected]> wrote:
>>>
>>> 12.08.2015 23:28, Andy Lutomirski пишет:
>>>
>>>> On Wed, Aug 12, 2015 at 1:14 PM, Stas Sergeev <[email protected]> wrote:
>>>>>
>>>>> 12.08.2015 23:01, Andy Lutomirski пишет:
>>>>>>
>>>>>> On Wed, Aug 12, 2015 at 12:55 PM, Stas Sergeev <[email protected]> wrote:
>>>>>>>
>>>>>>> 12.08.2015 22:20, Andy Lutomirski пишет:
>>>>>>>>
>>>>>>>> current kernels, it stays switched. If we change this, it won't
>>>>>>>> stay
>>>>>>>> switched. Even ignoring old ABI, it's not really clear to me what
>>>>>>>> the
>>>>>>>> right thing to do is.
>>>>>>>
>>>>>>> There can be the following cases:
>>>>>>> - switch_userspace_thread() switches fs to non-zero selector
>>>>>>> - switch_userspace_thread() switches the fs base via syscall
>>>>>>> - switch_userspace_thread() switches fs in sigcontext
>>>>>>> - switch_userspace_thread() switches fs_base in sigcontext (???)
>>>>>>> What exactly case do you have in mind?
>>>>>>> I'd say, the way x86_32 is doing things - is good, but the
>>>>>>> bases... perhaps, in ideal world, they should be a part of
>>>>>>> the sigcontext as well?
>>>>>>
>>>>>> Any of the above. What do you want the kernel to do, and how does the
>>>>>> kernel know you want to do that? The kernel has to pick *some*
>>>>>> semantics here.
>>>>>
>>>>> Assuming the bases are made the part of a sigcontext,
>>>>> I'd say there would be no ambiguities remained at all:
>>>>> whatever you change in a sigcontext, will be "applied" by
>>>>> the sigreturn(). Whatever you put in the registers
>>>>> (either segregs or MSRs), is valid until sigreturn(), then
>>>>> forgotten forever.
>>>>> The mess only comes in when some things are part of
>>>>> sigcontext and some are not. But if you have _all_ things
>>>>> accessable in sigcontext, then the user has a way of expressing
>>>>> his needs very clearly: he'll either touch sigcontext or direct
>>>>> values, depending on what he need.
>>>>>
>>>>> Is this right?
>>>>
>>>> Maybe, except that doing this might break existing code (Wine and Java
>>>> come to mind). I'm not really sure.
>>>
>>> Yes, but that's why I was talking about some new
>>> flag. Maybe a new sigaction() flag? Or something else that
>>> will allow the user to request explicitly the new handling
>>> where the things are all switched by the kernel. Then
>>> the old programs that don't use that flag, will remain
>>> unaffected. I realize this may be a lot of work... But please
>>> note that there will be no more a chance like this one,
>>> when things are already badly broken. :)
>>
>> I think that, with my patch, we get the best of both worlds. We keep
>> the old behavior in cases where it would work, and we switch to the
>> new behavior in cases where the old behavior would result in killing
>> the task.
>
> But I mean also fs/TLS.
> There is a chance now to fix things for good, all at once. :)
> With such an ss patch applied to stable, there will be no more
> such a chance ever. What's your opinion on the possibility of
> fixing the TLS problem?
> Also I am not sure about the sigreturn()'s detection: is it
> a subject of the subsequent patch, or you dropped an idea?
I think these things shouldn't be conflated. If we can fix it
transparently (i.e. if my patch works), then I think we should do
something like my patch.
--Andy
--
Andy Lutomirski
AMA Capital Management, LLC
13.08.2015 00:37, Andy Lutomirski пишет:
> On Wed, Aug 12, 2015 at 1:55 PM, Stas Sergeev <[email protected]> wrote:
>> 12.08.2015 23:47, Andy Lutomirski пишет:
>>
>>> On Wed, Aug 12, 2015 at 1:45 PM, Stas Sergeev <[email protected]> wrote:
>>>> 12.08.2015 23:28, Andy Lutomirski пишет:
>>>>
>>>>> On Wed, Aug 12, 2015 at 1:14 PM, Stas Sergeev <[email protected]> wrote:
>>>>>> 12.08.2015 23:01, Andy Lutomirski пишет:
>>>>>>> On Wed, Aug 12, 2015 at 12:55 PM, Stas Sergeev <[email protected]> wrote:
>>>>>>>> 12.08.2015 22:20, Andy Lutomirski пишет:
>>>>>>>>> current kernels, it stays switched. If we change this, it won't
>>>>>>>>> stay
>>>>>>>>> switched. Even ignoring old ABI, it's not really clear to me what
>>>>>>>>> the
>>>>>>>>> right thing to do is.
>>>>>>>> There can be the following cases:
>>>>>>>> - switch_userspace_thread() switches fs to non-zero selector
>>>>>>>> - switch_userspace_thread() switches the fs base via syscall
>>>>>>>> - switch_userspace_thread() switches fs in sigcontext
>>>>>>>> - switch_userspace_thread() switches fs_base in sigcontext (???)
>>>>>>>> What exactly case do you have in mind?
>>>>>>>> I'd say, the way x86_32 is doing things - is good, but the
>>>>>>>> bases... perhaps, in ideal world, they should be a part of
>>>>>>>> the sigcontext as well?
>>>>>>> Any of the above. What do you want the kernel to do, and how does the
>>>>>>> kernel know you want to do that? The kernel has to pick *some*
>>>>>>> semantics here.
>>>>>> Assuming the bases are made the part of a sigcontext,
>>>>>> I'd say there would be no ambiguities remained at all:
>>>>>> whatever you change in a sigcontext, will be "applied" by
>>>>>> the sigreturn(). Whatever you put in the registers
>>>>>> (either segregs or MSRs), is valid until sigreturn(), then
>>>>>> forgotten forever.
>>>>>> The mess only comes in when some things are part of
>>>>>> sigcontext and some are not. But if you have _all_ things
>>>>>> accessable in sigcontext, then the user has a way of expressing
>>>>>> his needs very clearly: he'll either touch sigcontext or direct
>>>>>> values, depending on what he need.
>>>>>>
>>>>>> Is this right?
>>>>> Maybe, except that doing this might break existing code (Wine and Java
>>>>> come to mind). I'm not really sure.
>>>> Yes, but that's why I was talking about some new
>>>> flag. Maybe a new sigaction() flag? Or something else that
>>>> will allow the user to request explicitly the new handling
>>>> where the things are all switched by the kernel. Then
>>>> the old programs that don't use that flag, will remain
>>>> unaffected. I realize this may be a lot of work... But please
>>>> note that there will be no more a chance like this one,
>>>> when things are already badly broken. :)
>>> I think that, with my patch, we get the best of both worlds. We keep
>>> the old behavior in cases where it would work, and we switch to the
>>> new behavior in cases where the old behavior would result in killing
>>> the task.
>> But I mean also fs/TLS.
>> There is a chance now to fix things for good, all at once. :)
>> With such an ss patch applied to stable, there will be no more
>> such a chance ever. What's your opinion on the possibility of
>> fixing the TLS problem?
>> Also I am not sure about the sigreturn()'s detection: is it
>> a subject of the subsequent patch, or you dropped an idea?
> I think these things shouldn't be conflated. If we can fix it
> transparently (i.e. if my patch works), then I think we should do
> something like my patch.
OK.
I'll try to test the patch tomorrow, but I think the sigreturn()'s
capability detection is still needed to easily replace the iret trampoline
in userspace (without generating a signal and testing by hands).
Can of course be done with a run-time kernel version check...
On Wed, Aug 12, 2015 at 2:50 PM, Stas Sergeev <[email protected]> wrote:
> 13.08.2015 00:37, Andy Lutomirski пишет:
>
>> On Wed, Aug 12, 2015 at 1:55 PM, Stas Sergeev <[email protected]> wrote:
>>>
>>> 12.08.2015 23:47, Andy Lutomirski пишет:
>>>
>>>> On Wed, Aug 12, 2015 at 1:45 PM, Stas Sergeev <[email protected]> wrote:
>>>>>
>>>>> 12.08.2015 23:28, Andy Lutomirski пишет:
>>>>>
>>>>>> On Wed, Aug 12, 2015 at 1:14 PM, Stas Sergeev <[email protected]> wrote:
>>>>>>>
>>>>>>> 12.08.2015 23:01, Andy Lutomirski пишет:
>>>>>>>>
>>>>>>>> On Wed, Aug 12, 2015 at 12:55 PM, Stas Sergeev <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> 12.08.2015 22:20, Andy Lutomirski пишет:
>>>>>>>>>>
>>>>>>>>>> current kernels, it stays switched. If we change this, it won't
>>>>>>>>>> stay
>>>>>>>>>> switched. Even ignoring old ABI, it's not really clear to me what
>>>>>>>>>> the
>>>>>>>>>> right thing to do is.
>>>>>>>>>
>>>>>>>>> There can be the following cases:
>>>>>>>>> - switch_userspace_thread() switches fs to non-zero selector
>>>>>>>>> - switch_userspace_thread() switches the fs base via syscall
>>>>>>>>> - switch_userspace_thread() switches fs in sigcontext
>>>>>>>>> - switch_userspace_thread() switches fs_base in sigcontext (???)
>>>>>>>>> What exactly case do you have in mind?
>>>>>>>>> I'd say, the way x86_32 is doing things - is good, but the
>>>>>>>>> bases... perhaps, in ideal world, they should be a part of
>>>>>>>>> the sigcontext as well?
>>>>>>>>
>>>>>>>> Any of the above. What do you want the kernel to do, and how does
>>>>>>>> the
>>>>>>>> kernel know you want to do that? The kernel has to pick *some*
>>>>>>>> semantics here.
>>>>>>>
>>>>>>> Assuming the bases are made the part of a sigcontext,
>>>>>>> I'd say there would be no ambiguities remained at all:
>>>>>>> whatever you change in a sigcontext, will be "applied" by
>>>>>>> the sigreturn(). Whatever you put in the registers
>>>>>>> (either segregs or MSRs), is valid until sigreturn(), then
>>>>>>> forgotten forever.
>>>>>>> The mess only comes in when some things are part of
>>>>>>> sigcontext and some are not. But if you have _all_ things
>>>>>>> accessable in sigcontext, then the user has a way of expressing
>>>>>>> his needs very clearly: he'll either touch sigcontext or direct
>>>>>>> values, depending on what he need.
>>>>>>>
>>>>>>> Is this right?
>>>>>>
>>>>>> Maybe, except that doing this might break existing code (Wine and Java
>>>>>> come to mind). I'm not really sure.
>>>>>
>>>>> Yes, but that's why I was talking about some new
>>>>> flag. Maybe a new sigaction() flag? Or something else that
>>>>> will allow the user to request explicitly the new handling
>>>>> where the things are all switched by the kernel. Then
>>>>> the old programs that don't use that flag, will remain
>>>>> unaffected. I realize this may be a lot of work... But please
>>>>> note that there will be no more a chance like this one,
>>>>> when things are already badly broken. :)
>>>>
>>>> I think that, with my patch, we get the best of both worlds. We keep
>>>> the old behavior in cases where it would work, and we switch to the
>>>> new behavior in cases where the old behavior would result in killing
>>>> the task.
>>>
>>> But I mean also fs/TLS.
>>> There is a chance now to fix things for good, all at once. :)
>>> With such an ss patch applied to stable, there will be no more
>>> such a chance ever. What's your opinion on the possibility of
>>> fixing the TLS problem?
>>> Also I am not sure about the sigreturn()'s detection: is it
>>> a subject of the subsequent patch, or you dropped an idea?
>>
>> I think these things shouldn't be conflated. If we can fix it
>> transparently (i.e. if my patch works), then I think we should do
>> something like my patch.
>
> OK.
> I'll try to test the patch tomorrow, but I think the sigreturn()'s
> capability detection is still needed to easily replace the iret trampoline
> in userspace (without generating a signal and testing by hands).
> Can of course be done with a run-time kernel version check...
That feature is so specialized that I think you should just probe it.
void foo(...) {
sigcontext->ss = 7;
}
modify_ldt(initialize descriptor 0);
sigaction(SIGUSR1, foo, SA_SIGINFO);
if (ss == 7)
yay;
Fortunately, all kernels that restore ss also have espfix64, so you
don't need to worry about esp[31:16] corruption on those kernels
either.
I suppose we could add a new uc_flag to indicate that ss is saved and
restored, though. Ingo, hpa: any thoughts on that? There will always
be some kernel versions that save and restore ss but don't set the
flag, though.
--Andy
--
Andy Lutomirski
AMA Capital Management, LLC
* Andy Lutomirski <[email protected]> wrote:
> > OK.
> > I'll try to test the patch tomorrow, but I think the sigreturn()'s
> > capability detection is still needed to easily replace the iret trampoline
> > in userspace (without generating a signal and testing by hands).
> > Can of course be done with a run-time kernel version check...
>
> That feature is so specialized that I think you should just probe it.
>
> void foo(...) {
> sigcontext->ss = 7;
> }
>
> modify_ldt(initialize descriptor 0);
> sigaction(SIGUSR1, foo, SA_SIGINFO);
> if (ss == 7)
> yay;
>
> Fortunately, all kernels that restore ss also have espfix64, so you
> don't need to worry about esp[31:16] corruption on those kernels
> either.
>
> I suppose we could add a new uc_flag to indicate that ss is saved and restored,
> though. Ingo, hpa: any thoughts on that? There will always be some kernel
> versions that save and restore ss but don't set the flag, though.
So this new flag would essentially be a 'the ss save/restore bug is fixed for
sure' flag, not covering old kernels that happen to have the correct behavior,
right?
Could you please map out the range of kernel versions involved - which ones:
- 'never do the right thing'
- 'do the right thing sometimes'
- 'do the right thing always, but by accident'
- 'do the right thing always and intentionally'
?
I'd hate to complicate a legacy ABI any more. My gut feeling is to let apps either
assume that the kernel works right, or probe the actual behavior. Adding the flag
just makes it easy to screw certain kernel versions that would still work fine if
the app used actual probing. So I don't see the flag as an improvement.
If your patch fixes the regression that would be a good first step.
Please also send the fix in email instead of just pointing to a Git tree, so that
people can comment on it.
Thanks,
Ingo
13.08.2015 11:39, Ingo Molnar пишет:
> * Andy Lutomirski <[email protected]> wrote:
>
>>> OK.
>>> I'll try to test the patch tomorrow, but I think the sigreturn()'s
>>> capability detection is still needed to easily replace the iret trampoline
>>> in userspace (without generating a signal and testing by hands).
>>> Can of course be done with a run-time kernel version check...
>> That feature is so specialized that I think you should just probe it.
>>
>> void foo(...) {
>> sigcontext->ss = 7;
>> }
>>
>> modify_ldt(initialize descriptor 0);
>> sigaction(SIGUSR1, foo, SA_SIGINFO);
>> if (ss == 7)
>> yay;
>>
>> Fortunately, all kernels that restore ss also have espfix64, so you
>> don't need to worry about esp[31:16] corruption on those kernels
>> either.
>>
>> I suppose we could add a new uc_flag to indicate that ss is saved and restored,
>> though. Ingo, hpa: any thoughts on that? There will always be some kernel
>> versions that save and restore ss but don't set the flag, though.
> So this new flag would essentially be a 'the ss save/restore bug is fixed for
> sure' flag, not covering old kernels that happen to have the correct behavior,
> right?
>
> Could you please map out the range of kernel versions involved - which ones:
"me too"
I'll just hard-code that version info into an app - testing version
is the same as testing the flag.
> - 'never do the right thing'
> - 'do the right thing sometimes'
> - 'do the right thing always, but by accident'
> - 'do the right thing always and intentionally'
>
> ?
>
> I'd hate to complicate a legacy ABI any more. My gut feeling is to let apps either
> assume that the kernel works right, or probe the actual behavior.
The problem is that current apps assume the kernel works _wrong_,
here's the reference:
http://www.x86-64.org/pipermail/discuss/2007-May/009913.html
http://www.x86-64.org/pipermail/discuss/2007-May/009923.html
And as such, they break.
Details earlier in that thread.
> Adding the flag
> just makes it easy to screw certain kernel versions that would still work fine if
> the app used actual probing. So I don't see the flag as an improvement.
>
> If your patch fixes the regression that would be a good first step.
Unfortunately, the currently proposed patch puts the kernel
in the category 'do the right thing sometimes' from the current
'do the right thing always and intentionally'. This helps the legacy
apps to work though...
I am not saying this patch should not be applied, but I'd like
to discuss the alternatives a bit (although Andy already put it clear
that this is likely the final decision, so I am just putting a newly CCed
people into a course of events).
Possibilities:
1. leave things as they are and fix dosemu instead (this is against
the kernel policy to not break userspace, but I wonder how relevant
dosemu is today)
2. Apply Andy's patch that does tricky sanity tests with lar, and
if they pass - do not restore the right ss value.
3. Admitting the wider problem, for example TLS is also not restored:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66631
and fix everything at once, probably adding a new sigaction()
flag to allow an apps to request the behaviour they need (keeping
the old behaviour by default).
Of course all of the above have pros and cons.
3 is a lot of work, both 1 and 2 have patches ready.
> Please also send the fix in email instead of just pointing to a Git tree, so that
> people can comment on it.
A reference for newly CCed people:
https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commit/?h=x86/sigcontext&id=83a08d8c3f43c5524ffc0d88c0eff747716696f5
Not the prettiest possible approach, but at least should work...
13.08.2015 01:00, Andy Lutomirski пишет:
> On Wed, Aug 12, 2015 at 2:50 PM, Stas Sergeev <[email protected]> wrote:
>> 13.08.2015 00:37, Andy Lutomirski пишет:
>>
>>> On Wed, Aug 12, 2015 at 1:55 PM, Stas Sergeev <[email protected]> wrote:
>>>> 12.08.2015 23:47, Andy Lutomirski пишет:
>>>>
>>>>> On Wed, Aug 12, 2015 at 1:45 PM, Stas Sergeev <[email protected]> wrote:
>>>>>> 12.08.2015 23:28, Andy Lutomirski пишет:
>>>>>>
>>>>>>> On Wed, Aug 12, 2015 at 1:14 PM, Stas Sergeev <[email protected]> wrote:
>>>>>>>> 12.08.2015 23:01, Andy Lutomirski пишет:
>>>>>>>>> On Wed, Aug 12, 2015 at 12:55 PM, Stas Sergeev <[email protected]> wrote:
>>>>>>>>>> 12.08.2015 22:20, Andy Lutomirski пишет:
>>>>>>>>>>> current kernels, it stays switched. If we change this, it won't
>>>>>>>>>>> stay
>>>>>>>>>>> switched. Even ignoring old ABI, it's not really clear to me what
>>>>>>>>>>> the
>>>>>>>>>>> right thing to do is.
>>>>>>>>>> There can be the following cases:
>>>>>>>>>> - switch_userspace_thread() switches fs to non-zero selector
>>>>>>>>>> - switch_userspace_thread() switches the fs base via syscall
>>>>>>>>>> - switch_userspace_thread() switches fs in sigcontext
>>>>>>>>>> - switch_userspace_thread() switches fs_base in sigcontext (???)
>>>>>>>>>> What exactly case do you have in mind?
>>>>>>>>>> I'd say, the way x86_32 is doing things - is good, but the
>>>>>>>>>> bases... perhaps, in ideal world, they should be a part of
>>>>>>>>>> the sigcontext as well?
>>>>>>>>> Any of the above. What do you want the kernel to do, and how does
>>>>>>>>> the
>>>>>>>>> kernel know you want to do that? The kernel has to pick *some*
>>>>>>>>> semantics here.
>>>>>>>> Assuming the bases are made the part of a sigcontext,
>>>>>>>> I'd say there would be no ambiguities remained at all:
>>>>>>>> whatever you change in a sigcontext, will be "applied" by
>>>>>>>> the sigreturn(). Whatever you put in the registers
>>>>>>>> (either segregs or MSRs), is valid until sigreturn(), then
>>>>>>>> forgotten forever.
>>>>>>>> The mess only comes in when some things are part of
>>>>>>>> sigcontext and some are not. But if you have _all_ things
>>>>>>>> accessable in sigcontext, then the user has a way of expressing
>>>>>>>> his needs very clearly: he'll either touch sigcontext or direct
>>>>>>>> values, depending on what he need.
>>>>>>>>
>>>>>>>> Is this right?
>>>>>>> Maybe, except that doing this might break existing code (Wine and Java
>>>>>>> come to mind). I'm not really sure.
>>>>>> Yes, but that's why I was talking about some new
>>>>>> flag. Maybe a new sigaction() flag? Or something else that
>>>>>> will allow the user to request explicitly the new handling
>>>>>> where the things are all switched by the kernel. Then
>>>>>> the old programs that don't use that flag, will remain
>>>>>> unaffected. I realize this may be a lot of work... But please
>>>>>> note that there will be no more a chance like this one,
>>>>>> when things are already badly broken. :)
>>>>> I think that, with my patch, we get the best of both worlds. We keep
>>>>> the old behavior in cases where it would work, and we switch to the
>>>>> new behavior in cases where the old behavior would result in killing
>>>>> the task.
>>>> But I mean also fs/TLS.
>>>> There is a chance now to fix things for good, all at once. :)
>>>> With such an ss patch applied to stable, there will be no more
>>>> such a chance ever. What's your opinion on the possibility of
>>>> fixing the TLS problem?
>>>> Also I am not sure about the sigreturn()'s detection: is it
>>>> a subject of the subsequent patch, or you dropped an idea?
>>> I think these things shouldn't be conflated. If we can fix it
>>> transparently (i.e. if my patch works), then I think we should do
>>> something like my patch.
>> OK.
>> I'll try to test the patch tomorrow, but I think the sigreturn()'s
>> capability detection is still needed to easily replace the iret trampoline
>> in userspace (without generating a signal and testing by hands).
>> Can of course be done with a run-time kernel version check...
> That feature is so specialized that I think you should just probe it.
>
> void foo(...) {
> sigcontext->ss = 7;
> }
>
> modify_ldt(initialize descriptor 0);
> sigaction(SIGUSR1, foo, SA_SIGINFO);
> if (ss == 7)
> yay;
>
> Fortunately, all kernels that restore ss also have espfix64, so you
> don't need to worry about esp[31:16] corruption on those kernels
> either.
Unfortunately, this doesn't help.
I made a simple patch that checks the kernel version:
https://github.com/stsp/dosemu2/commit/098413ef8de98972ca795e078351ae9f3cc07ffe
but iret is still used when switching to DOS code from dosemu
code, rather than from signal handler. So espfix64 doesn't help
much.
I guess the only real solution to this would be to "rewrite" dosemu
so that the DOS code is run in a clone(CLONE_VM) separate process...
13.08.2015 11:39, Ingo Molnar пишет:
> * Andy Lutomirski <[email protected]> wrote:
>
>>> OK.
>>> I'll try to test the patch tomorrow, but I think the sigreturn()'s
>>> capability detection is still needed to easily replace the iret trampoline
>>> in userspace (without generating a signal and testing by hands).
>>> Can of course be done with a run-time kernel version check...
>> That feature is so specialized that I think you should just probe it.
>>
>> void foo(...) {
>> sigcontext->ss = 7;
>> }
>>
>> modify_ldt(initialize descriptor 0);
>> sigaction(SIGUSR1, foo, SA_SIGINFO);
>> if (ss == 7)
>> yay;
>>
>> Fortunately, all kernels that restore ss also have espfix64, so you
>> don't need to worry about esp[31:16] corruption on those kernels
>> either.
>>
>> I suppose we could add a new uc_flag to indicate that ss is saved and restored,
>> though. Ingo, hpa: any thoughts on that? There will always be some kernel
>> versions that save and restore ss but don't set the flag, though.
> So this new flag would essentially be a 'the ss save/restore bug is fixed for
> sure' flag, not covering old kernels that happen to have the correct behavior,
> right?
>
> Could you please map out the range of kernel versions involved - which ones:
>
> - 'never do the right thing'
> - 'do the right thing sometimes'
> - 'do the right thing always, but by accident'
> - 'do the right thing always and intentionally'
>
> ?
>
> I'd hate to complicate a legacy ABI any more. My gut feeling is to let apps either
> assume that the kernel works right, or probe the actual behavior. Adding the flag
> just makes it easy to screw certain kernel versions that would still work fine if
> the app used actual probing. So I don't see the flag as an improvement.
>
> If your patch fixes the regression that would be a good first step.
I've tested the patch.
It doesn't fix the problem.
It allows dosemu to save the ss the old way, but,
because dosemu doesn't save it to the sigreturn()'s-expected
place (sigcontext.__pad0), it crashes on sigreturn().
So the problem can't be fixed this way --> NACK to the patch.
I may be unavailable for further testings till next week.
On Thu, Aug 13, 2015 at 5:44 AM, Stas Sergeev <[email protected]> wrote:
> 13.08.2015 11:39, Ingo Molnar пишет:
>>
>> * Andy Lutomirski <[email protected]> wrote:
>>
>>
>>>> OK.
>>>> I'll try to test the patch tomorrow, but I think the sigreturn()'s
>>>> capability detection is still needed to easily replace the iret
>>>> trampoline
>>>> in userspace (without generating a signal and testing by hands).
>>>> Can of course be done with a run-time kernel version check...
>>>
>>> That feature is so specialized that I think you should just probe it.
>>>
>>> void foo(...) {
>>> sigcontext->ss = 7;
>>> }
>>>
>>> modify_ldt(initialize descriptor 0);
>>> sigaction(SIGUSR1, foo, SA_SIGINFO);
>>> if (ss == 7)
>>> yay;
>>>
>>> Fortunately, all kernels that restore ss also have espfix64, so you
>>> don't need to worry about esp[31:16] corruption on those kernels
>>> either.
>>>
>>> I suppose we could add a new uc_flag to indicate that ss is saved and
>>> restored,
>>> though. Ingo, hpa: any thoughts on that? There will always be some
>>> kernel
>>> versions that save and restore ss but don't set the flag, though.
>>
>> So this new flag would essentially be a 'the ss save/restore bug is fixed
>> for
>> sure' flag, not covering old kernels that happen to have the correct
>> behavior,
>> right?
>>
>> Could you please map out the range of kernel versions involved - which
>> ones:
>>
>> - 'never do the right thing'
>> - 'do the right thing sometimes'
>> - 'do the right thing always, but by accident'
>> - 'do the right thing always and intentionally'
>>
>> ?
>>
>> I'd hate to complicate a legacy ABI any more. My gut feeling is to let
>> apps either
>> assume that the kernel works right, or probe the actual behavior. Adding
>> the flag
>> just makes it easy to screw certain kernel versions that would still work
>> fine if
>> the app used actual probing. So I don't see the flag as an improvement.
>>
>> If your patch fixes the regression that would be a good first step.
>
> I've tested the patch.
> It doesn't fix the problem.
> It allows dosemu to save the ss the old way, but,
> because dosemu doesn't save it to the sigreturn()'s-expected
> place (sigcontext.__pad0), it crashes on sigreturn().
>
> So the problem can't be fixed this way --> NACK to the patch.
>
> I may be unavailable for further testings till next week.
I'm still fighting with getting DOSEMU to run at all in my VM.
I must be missing something. What ends up in ss/__pad0? Wouldn't it
contain whatever signal delivery put there (i.e. some valid ss value)?
--Andy
13.08.2015 17:58, Andy Lutomirski пишет:
> On Thu, Aug 13, 2015 at 5:44 AM, Stas Sergeev <[email protected]> wrote:
>> 13.08.2015 11:39, Ingo Molnar пишет:
>>> * Andy Lutomirski <[email protected]> wrote:
>>>
>>>
>>>>> OK.
>>>>> I'll try to test the patch tomorrow, but I think the sigreturn()'s
>>>>> capability detection is still needed to easily replace the iret
>>>>> trampoline
>>>>> in userspace (without generating a signal and testing by hands).
>>>>> Can of course be done with a run-time kernel version check...
>>>> That feature is so specialized that I think you should just probe it.
>>>>
>>>> void foo(...) {
>>>> sigcontext->ss = 7;
>>>> }
>>>>
>>>> modify_ldt(initialize descriptor 0);
>>>> sigaction(SIGUSR1, foo, SA_SIGINFO);
>>>> if (ss == 7)
>>>> yay;
>>>>
>>>> Fortunately, all kernels that restore ss also have espfix64, so you
>>>> don't need to worry about esp[31:16] corruption on those kernels
>>>> either.
>>>>
>>>> I suppose we could add a new uc_flag to indicate that ss is saved and
>>>> restored,
>>>> though. Ingo, hpa: any thoughts on that? There will always be some
>>>> kernel
>>>> versions that save and restore ss but don't set the flag, though.
>>> So this new flag would essentially be a 'the ss save/restore bug is fixed
>>> for
>>> sure' flag, not covering old kernels that happen to have the correct
>>> behavior,
>>> right?
>>>
>>> Could you please map out the range of kernel versions involved - which
>>> ones:
>>>
>>> - 'never do the right thing'
>>> - 'do the right thing sometimes'
>>> - 'do the right thing always, but by accident'
>>> - 'do the right thing always and intentionally'
>>>
>>> ?
>>>
>>> I'd hate to complicate a legacy ABI any more. My gut feeling is to let
>>> apps either
>>> assume that the kernel works right, or probe the actual behavior. Adding
>>> the flag
>>> just makes it easy to screw certain kernel versions that would still work
>>> fine if
>>> the app used actual probing. So I don't see the flag as an improvement.
>>>
>>> If your patch fixes the regression that would be a good first step.
>> I've tested the patch.
>> It doesn't fix the problem.
>> It allows dosemu to save the ss the old way, but,
>> because dosemu doesn't save it to the sigreturn()'s-expected
>> place (sigcontext.__pad0), it crashes on sigreturn().
>>
>> So the problem can't be fixed this way --> NACK to the patch.
>>
>> I may be unavailable for further testings till next week.
> I'm still fighting with getting DOSEMU to run at all in my VM.
>
> I must be missing something. What ends up in ss/__pad0? Wouldn't it
> contain whatever signal delivery put there (i.e. some valid ss value)?
The crash happens when DOS program terminates.
At that point dosemu subverts the execution flow by
replacing segregs and cs/ip ss/sp in sigcontext with its own.
But __pad0 still has DOS SS, which crash because (presumably)
the DOS LDT have been just removed.
On Tue, Aug 11, 2015 at 5:17 PM, Stas Sergeev <[email protected]> wrote:
>
> I realize this patch may be good to have in general, but
> breaking userspace without a single warning is a bit
> discouraging. Seems like the old "we don't break userspace"
> rule have gone.
That rule hasn't gone anywhere.
Does a plain revert just fix everything? Because if so, that's the
right thing to do, and we can just re-visit this later.
I don't understand why Andy and Ingo are even discussing this. What
the f*ck, guys?
Stas, can you verify that this actually fixes it? There's two
different versions here: one that reverts *just* that one commit, and
one that reverts the fs/gs changes too. Can you test them both?
Linus
On Thu, Aug 13, 2015 at 8:22 AM, Stas Sergeev <[email protected]> wrote:
> 13.08.2015 17:58, Andy Lutomirski пишет:
>
>> On Thu, Aug 13, 2015 at 5:44 AM, Stas Sergeev <[email protected]> wrote:
>>>
>>> 13.08.2015 11:39, Ingo Molnar пишет:
>>>>
>>>> * Andy Lutomirski <[email protected]> wrote:
>>>>
>>>>
>>>>>> OK.
>>>>>> I'll try to test the patch tomorrow, but I think the sigreturn()'s
>>>>>> capability detection is still needed to easily replace the iret
>>>>>> trampoline
>>>>>> in userspace (without generating a signal and testing by hands).
>>>>>> Can of course be done with a run-time kernel version check...
>>>>>
>>>>> That feature is so specialized that I think you should just probe it.
>>>>>
>>>>> void foo(...) {
>>>>> sigcontext->ss = 7;
>>>>> }
>>>>>
>>>>> modify_ldt(initialize descriptor 0);
>>>>> sigaction(SIGUSR1, foo, SA_SIGINFO);
>>>>> if (ss == 7)
>>>>> yay;
>>>>>
>>>>> Fortunately, all kernels that restore ss also have espfix64, so you
>>>>> don't need to worry about esp[31:16] corruption on those kernels
>>>>> either.
>>>>>
>>>>> I suppose we could add a new uc_flag to indicate that ss is saved and
>>>>> restored,
>>>>> though. Ingo, hpa: any thoughts on that? There will always be some
>>>>> kernel
>>>>> versions that save and restore ss but don't set the flag, though.
>>>>
>>>> So this new flag would essentially be a 'the ss save/restore bug is
>>>> fixed
>>>> for
>>>> sure' flag, not covering old kernels that happen to have the correct
>>>> behavior,
>>>> right?
>>>>
>>>> Could you please map out the range of kernel versions involved - which
>>>> ones:
>>>>
>>>> - 'never do the right thing'
>>>> - 'do the right thing sometimes'
>>>> - 'do the right thing always, but by accident'
>>>> - 'do the right thing always and intentionally'
>>>>
>>>> ?
>>>>
>>>> I'd hate to complicate a legacy ABI any more. My gut feeling is to let
>>>> apps either
>>>> assume that the kernel works right, or probe the actual behavior. Adding
>>>> the flag
>>>> just makes it easy to screw certain kernel versions that would still
>>>> work
>>>> fine if
>>>> the app used actual probing. So I don't see the flag as an improvement.
>>>>
>>>> If your patch fixes the regression that would be a good first step.
>>>
>>> I've tested the patch.
>>> It doesn't fix the problem.
>>> It allows dosemu to save the ss the old way, but,
>>> because dosemu doesn't save it to the sigreturn()'s-expected
>>> place (sigcontext.__pad0), it crashes on sigreturn().
>>>
>>> So the problem can't be fixed this way --> NACK to the patch.
>>>
>>> I may be unavailable for further testings till next week.
>>
>> I'm still fighting with getting DOSEMU to run at all in my VM.
>>
>> I must be missing something. What ends up in ss/__pad0? Wouldn't it
>> contain whatever signal delivery put there (i.e. some valid ss value)?
>
> The crash happens when DOS program terminates.
> At that point dosemu subverts the execution flow by
> replacing segregs and cs/ip ss/sp in sigcontext with its own.
> But __pad0 still has DOS SS, which crash because (presumably)
> the DOS LDT have been just removed.
That's unfortunate.
I don't really know what to do about this. My stupid heuristic for
signal delivery seems unlikely to cause problems, but I'm not coming
up with a great heuristic for detecting when a program that *modifies*
sigcontext hasn't set all the fields. Even adding a flag won't really
help here, since DOSEMU won't know to manipulate the flag.
Ingo, here's the situation, assuming I remember the versions right:
v4.0 and before: If we try to deliver a signal while SS is bad, we
fail and the process dies. If SS is good but nonstandard, we end up
in the signal handler with whatever SS value was loaded when the
signal was sent. We do *not* put SS anywhere in the sigcontext, so
the only way for a program to figure out what SS was is to look at the
HW state before making any syscalls. We also don't even try to
restore SS, so SS is unconditionally set to __USER_DS, necessitating
nasty workarounds (and breaking all kinds of test cases).
v4.1 and current -linus: We always set SS to __USER_DS when delivering
a signal. We save the old SS in the sigcontext and restore it, just
like 32-bit signals.
My patch: We leave SS alone when delivering a signal, unless it's
invalid, in which case we replace it with __USER_DS. We still save
the old SS in the sigcontext and restore it on return.
Apparently the remaining regression is that DOSEMU doesn't realize
that SS is saved so, when it tries to return to full 64-bit mode after
a signal that hit in 16-bit mode, it fails because it's invalidated
the old SS descriptor in the mean time.
So... what do we do about it? We could revert the whole mess. We
could tell everyone to fix their DOSEMU, which violates policy and is
especially annoying given how much effort we've put into keeping
16-bit mode fully functional lately. We could add yet more heuristics
and teach sigreturn to ignore the saved SS value in sigcontext if the
saved CS is 64-bit and the saved SS is unusable.
--Andy
--
Andy Lutomirski
AMA Capital Management, LLC
On Thu, Aug 13, 2015 at 8:37 AM, Linus Torvalds
<[email protected]> wrote:
> On Tue, Aug 11, 2015 at 5:17 PM, Stas Sergeev <[email protected]> wrote:
>>
>> I realize this patch may be good to have in general, but
>> breaking userspace without a single warning is a bit
>> discouraging. Seems like the old "we don't break userspace"
>> rule have gone.
>
> That rule hasn't gone anywhere.
>
> Does a plain revert just fix everything? Because if so, that's the
> right thing to do, and we can just re-visit this later.
>
> I don't understand why Andy and Ingo are even discussing this. What
> the f*ck, guys?
>
I'm trying to fix it without reverting. If that doesn't work, then we
revert. Yesterday, I thought I had a reasonably clean fix, but it
turned out that it only solved half of the problem.
If we revert, I think I need to check what will break due to the
revert. I need to check at least Wine, and we'll have to do something
about all the selftests that will start failing. I also need to check
CRIU, and IIRC CRIU has started using the new sigcontext SS in new
versions.
And, damnit, those selftests are *useful*. They've smoked out all
kinds of problems. That's part of the reason I'd prefer not to revert
if there's a better option.
--Andy
13.08.2015 18:38, Andy Lutomirski пишет:
> On Thu, Aug 13, 2015 at 8:22 AM, Stas Sergeev <[email protected]> wrote:
>> 13.08.2015 17:58, Andy Lutomirski пишет:
>>
>>> On Thu, Aug 13, 2015 at 5:44 AM, Stas Sergeev <[email protected]> wrote:
>>>> 13.08.2015 11:39, Ingo Molnar пишет:
>>>>> * Andy Lutomirski <[email protected]> wrote:
>>>>>
>>>>>
>>>>>>> OK.
>>>>>>> I'll try to test the patch tomorrow, but I think the sigreturn()'s
>>>>>>> capability detection is still needed to easily replace the iret
>>>>>>> trampoline
>>>>>>> in userspace (without generating a signal and testing by hands).
>>>>>>> Can of course be done with a run-time kernel version check...
>>>>>> That feature is so specialized that I think you should just probe it.
>>>>>>
>>>>>> void foo(...) {
>>>>>> sigcontext->ss = 7;
>>>>>> }
>>>>>>
>>>>>> modify_ldt(initialize descriptor 0);
>>>>>> sigaction(SIGUSR1, foo, SA_SIGINFO);
>>>>>> if (ss == 7)
>>>>>> yay;
>>>>>>
>>>>>> Fortunately, all kernels that restore ss also have espfix64, so you
>>>>>> don't need to worry about esp[31:16] corruption on those kernels
>>>>>> either.
>>>>>>
>>>>>> I suppose we could add a new uc_flag to indicate that ss is saved and
>>>>>> restored,
>>>>>> though. Ingo, hpa: any thoughts on that? There will always be some
>>>>>> kernel
>>>>>> versions that save and restore ss but don't set the flag, though.
>>>>> So this new flag would essentially be a 'the ss save/restore bug is
>>>>> fixed
>>>>> for
>>>>> sure' flag, not covering old kernels that happen to have the correct
>>>>> behavior,
>>>>> right?
>>>>>
>>>>> Could you please map out the range of kernel versions involved - which
>>>>> ones:
>>>>>
>>>>> - 'never do the right thing'
>>>>> - 'do the right thing sometimes'
>>>>> - 'do the right thing always, but by accident'
>>>>> - 'do the right thing always and intentionally'
>>>>>
>>>>> ?
>>>>>
>>>>> I'd hate to complicate a legacy ABI any more. My gut feeling is to let
>>>>> apps either
>>>>> assume that the kernel works right, or probe the actual behavior. Adding
>>>>> the flag
>>>>> just makes it easy to screw certain kernel versions that would still
>>>>> work
>>>>> fine if
>>>>> the app used actual probing. So I don't see the flag as an improvement.
>>>>>
>>>>> If your patch fixes the regression that would be a good first step.
>>>> I've tested the patch.
>>>> It doesn't fix the problem.
>>>> It allows dosemu to save the ss the old way, but,
>>>> because dosemu doesn't save it to the sigreturn()'s-expected
>>>> place (sigcontext.__pad0), it crashes on sigreturn().
>>>>
>>>> So the problem can't be fixed this way --> NACK to the patch.
>>>>
>>>> I may be unavailable for further testings till next week.
>>> I'm still fighting with getting DOSEMU to run at all in my VM.
>>>
>>> I must be missing something. What ends up in ss/__pad0? Wouldn't it
>>> contain whatever signal delivery put there (i.e. some valid ss value)?
>> The crash happens when DOS program terminates.
>> At that point dosemu subverts the execution flow by
>> replacing segregs and cs/ip ss/sp in sigcontext with its own.
>> But __pad0 still has DOS SS, which crash because (presumably)
>> the DOS LDT have been just removed.
> That's unfortunate.
>
> I don't really know what to do about this. My stupid heuristic for
> signal delivery seems unlikely to cause problems, but I'm not coming
> up with a great heuristic for detecting when a program that *modifies*
> sigcontext hasn't set all the fields. Even adding a flag won't really
> help here, since DOSEMU won't know to manipulate the flag.
>
> Ingo, here's the situation, assuming I remember the versions right:
>
> v4.0 and before: If we try to deliver a signal while SS is bad, we
> fail and the process dies. If SS is good but nonstandard, we end up
> in the signal handler with whatever SS value was loaded when the
> signal was sent. We do *not* put SS anywhere in the sigcontext, so
> the only way for a program to figure out what SS was is to look at the
> HW state before making any syscalls. We also don't even try to
> restore SS, so SS is unconditionally set to __USER_DS, necessitating
> nasty workarounds (and breaking all kinds of test cases).
>
> v4.1 and current -linus: We always set SS to __USER_DS when delivering
> a signal. We save the old SS in the sigcontext and restore it, just
> like 32-bit signals.
>
> My patch: We leave SS alone when delivering a signal, unless it's
> invalid, in which case we replace it with __USER_DS. We still save
> the old SS in the sigcontext and restore it on return.
>
> Apparently the remaining regression is that DOSEMU doesn't realize
> that SS is saved so, when it tries to return to full 64-bit mode after
> a signal that hit in 16-bit mode, it fails because it's invalidated
> the old SS descriptor in the mean time.
>
>
> So... what do we do about it? We could revert the whole mess. We
> could tell everyone to fix their DOSEMU, which violates policy and is
> especially annoying given how much effort we've put into keeping
> 16-bit mode fully functional lately. We could add yet more heuristics
> and teach sigreturn to ignore the saved SS value in sigcontext if the
> saved CS is 64-bit and the saved SS is unusable.
Andy, why do you constantly ignore the proposal to make
new behaviour explicitly controlable? You don't have to agree
with it, but you could at least comment on that possibility
and/or mention it with the ones you listed above.
On Thu, Aug 13, 2015 at 9:03 AM, Stas Sergeev <[email protected]> wrote:
> 13.08.2015 18:38, Andy Lutomirski пишет:
>>
>>
>> So... what do we do about it? We could revert the whole mess. We
>> could tell everyone to fix their DOSEMU, which violates policy and is
>> especially annoying given how much effort we've put into keeping
>> 16-bit mode fully functional lately. We could add yet more heuristics
>> and teach sigreturn to ignore the saved SS value in sigcontext if the
>> saved CS is 64-bit and the saved SS is unusable.
>
> Andy, why do you constantly ignore the proposal to make
> new behaviour explicitly controlable? You don't have to agree
> with it, but you could at least comment on that possibility
> and/or mention it with the ones you listed above.
I'm not sure what the proposal is exactly.
We could add a new uc_flags flag. If set, it means that
sigcontext->ss is valid and should be used by sigreturn. If clear,
then we ignore sigcontext->ss and just restore __USER_DS.
The problem is that, by itself, this won't fix old DOSEMU. We somehow
need to either detect that something funny is going on or just leave
the flag clear by default.
We could do this: always save SS to sigcontext->ss, but only restore
sigcontext->ss if userspace explicitly sets the flag before sigreturn.
If we do that, we'd need to also add my patch to preserve the actual
HW SS selector if possible so that old DOSEMU knows what SS to program
into its trampoline.
This at least lets *new* DOSEMU set the flag and get the improved
behavior. I still don't know what effect it'll have on Wine and CRIU.
Stas, is that what you were thinking, or were you thinking of something else?
--Andy
On Thu, Aug 13, 2015 at 8:43 AM, Andy Lutomirski <[email protected]> wrote:
>
> I'm trying to fix it without reverting. If that doesn't work, then we
> revert. Yesterday, I thought I had a reasonably clean fix, but it
> turned out that it only solved half of the problem.
The thing is, I actually think that the current situation is crazy.
Especially given that we don't restore any of the other segment
registers on x86-64 (except CS, of course)
So how about this "alternate" minimal patch instead. The difference is:
- we actually leave the
regs->ss = __USER_DS;
in __setup_rt_frame, to guarantee that when we take a signal, we do
take it with a valid SS
- but it removes all the other games with SS (and treats it exactly
the same as FS/GS).
So now we don't play games with the actual sigcontext, and
hopefully dosemu is happier.
Hmm? That actually makes the code look better, and doesn't
re-introduce that annoying CONFIG_X86_32 case (because it now does it
in obviously the same place as fs/gs).
So the code is cleaner, and closer to what we used to do.
Stas, can you test this one too? I, like Luto, don't actually have a
dosemu test-case.
Linus
13.08.2015 19:09, Andy Lutomirski пишет:
> On Thu, Aug 13, 2015 at 9:03 AM, Stas Sergeev <[email protected]> wrote:
>> 13.08.2015 18:38, Andy Lutomirski пишет:
>>>
>>> So... what do we do about it? We could revert the whole mess. We
>>> could tell everyone to fix their DOSEMU, which violates policy and is
>>> especially annoying given how much effort we've put into keeping
>>> 16-bit mode fully functional lately. We could add yet more heuristics
>>> and teach sigreturn to ignore the saved SS value in sigcontext if the
>>> saved CS is 64-bit and the saved SS is unusable.
>> Andy, why do you constantly ignore the proposal to make
>> new behaviour explicitly controlable? You don't have to agree
>> with it, but you could at least comment on that possibility
>> and/or mention it with the ones you listed above.
> I'm not sure what the proposal is exactly.
>
> We could add a new uc_flags flag. If set, it means that
> sigcontext->ss is valid and should be used by sigreturn. If clear,
> then we ignore sigcontext->ss and just restore __USER_DS.
>
> The problem is that, by itself, this won't fix old DOSEMU. We somehow
> need to either detect that something funny is going on or just leave
> the flag clear by default.
>
> We could do this: always save SS to sigcontext->ss, but only restore
> sigcontext->ss if userspace explicitly sets the flag before sigreturn.
> If we do that, we'd need to also add my patch to preserve the actual
> HW SS selector if possible so that old DOSEMU knows what SS to program
> into its trampoline.
>
> This at least lets *new* DOSEMU set the flag and get the improved
> behavior. I still don't know what effect it'll have on Wine and CRIU.
>
> Stas, is that what you were thinking, or were you thinking of something else?
Not quite.
I mean the flag that will control not only sigreturn, but
the signal delivery as well. This may probably be a sigaction()
flag or some other. If not set - ss is ignored by both signal
delivery and sigreturn(). If set - ss is saved/restored (and in
the future - also fs/gs).
Is such a flag possible?
On Thu, Aug 13, 2015 at 9:19 AM, Linus Torvalds
<[email protected]> wrote:
> On Thu, Aug 13, 2015 at 8:43 AM, Andy Lutomirski <[email protected]> wrote:
>>
>> I'm trying to fix it without reverting. If that doesn't work, then we
>> revert. Yesterday, I thought I had a reasonably clean fix, but it
>> turned out that it only solved half of the problem.
>
> The thing is, I actually think that the current situation is crazy.
>
> Especially given that we don't restore any of the other segment
> registers on x86-64 (except CS, of course)
>
> So how about this "alternate" minimal patch instead. The difference is:
>
> - we actually leave the
>
> regs->ss = __USER_DS;
>
> in __setup_rt_frame, to guarantee that when we take a signal, we do
> take it with a valid SS
That by itself is enough to break DOSEMU. I think we may be stuck
with my hack to only replace regs->ss if the old one was invalid.
>
> - but it removes all the other games with SS (and treats it exactly
> the same as FS/GS).
>
> So now we don't play games with the actual sigcontext, and
> hopefully dosemu is happier.
You mean that we always set ss to __USER_DS on sigreturn? The problem
is that user code *can't* program SS when calling sigreturn because
the SYSCALL instruction zaps it.
I'll try to implement something.
If this regression were new in 4.2-rc, then I'd say revert first and
ask questions later, but the regression is in 4.1 as well :(
--Andy
On Thu, Aug 13, 2015 at 9:20 AM, Stas Sergeev <[email protected]> wrote:
> 13.08.2015 19:09, Andy Lutomirski пишет:
>
>> On Thu, Aug 13, 2015 at 9:03 AM, Stas Sergeev <[email protected]> wrote:
>>>
>>> 13.08.2015 18:38, Andy Lutomirski пишет:
>>>>
>>>>
>>>> So... what do we do about it? We could revert the whole mess. We
>>>> could tell everyone to fix their DOSEMU, which violates policy and is
>>>> especially annoying given how much effort we've put into keeping
>>>> 16-bit mode fully functional lately. We could add yet more heuristics
>>>> and teach sigreturn to ignore the saved SS value in sigcontext if the
>>>> saved CS is 64-bit and the saved SS is unusable.
>>>
>>> Andy, why do you constantly ignore the proposal to make
>>> new behaviour explicitly controlable? You don't have to agree
>>> with it, but you could at least comment on that possibility
>>> and/or mention it with the ones you listed above.
>>
>> I'm not sure what the proposal is exactly.
>>
>> We could add a new uc_flags flag. If set, it means that
>> sigcontext->ss is valid and should be used by sigreturn. If clear,
>> then we ignore sigcontext->ss and just restore __USER_DS.
>>
>> The problem is that, by itself, this won't fix old DOSEMU. We somehow
>> need to either detect that something funny is going on or just leave
>> the flag clear by default.
>>
>> We could do this: always save SS to sigcontext->ss, but only restore
>> sigcontext->ss if userspace explicitly sets the flag before sigreturn.
>> If we do that, we'd need to also add my patch to preserve the actual
>> HW SS selector if possible so that old DOSEMU knows what SS to program
>> into its trampoline.
>>
>> This at least lets *new* DOSEMU set the flag and get the improved
>> behavior. I still don't know what effect it'll have on Wine and CRIU.
>>
>> Stas, is that what you were thinking, or were you thinking of something
>> else?
>
> Not quite.
> I mean the flag that will control not only sigreturn, but
> the signal delivery as well. This may probably be a sigaction()
> flag or some other. If not set - ss is ignored by both signal
> delivery and sigreturn(). If set - ss is saved/restored (and in
> the future - also fs/gs).
> Is such a flag possible?
Maybe. I think I'm more nervous about adding new flags in sigaction
than I am in uc_flags.
--Andy
On Thu, Aug 13, 2015 at 9:23 AM, Andy Lutomirski <[email protected]> wrote:
> On Thu, Aug 13, 2015 at 9:19 AM, Linus Torvalds
> <[email protected]> wrote:
>>
>> So how about this "alternate" minimal patch instead. The difference is:
>>
>> - we actually leave the
>>
>> regs->ss = __USER_DS;
>>
>> in __setup_rt_frame, to guarantee that when we take a signal, we do
>> take it with a valid SS
>
> That by itself is enough to break DOSEMU. I think we may be stuck
> with my hack to only replace regs->ss if the old one was invalid.
Are you sure? From the description by Stas, the problem is literally
the *restoring* action of the sigcontext, and trying to restore a SS
value that is no longer valid.
"The crash happens when DOS program terminates.
At that point dosemu subverts the execution flow by
replacing segregs and cs/ip ss/sp in sigcontext with its own.
But __pad0 still has DOS SS, which crash because (presumably)
the DOS LDT have been just removed"
and my "truly-minimal" patch removes all of the sigcontext games.
> You mean that we always set ss to __USER_DS on sigreturn?
No. We never touch SS at sigreturn time at all. Only when entering the
signal *handler* do we reset things to a known state. The signal
handler can do anything it wants, and sigreturn won't touch it (which
will obviously _leave_ it as __USER_DS, but avoids the problem with
sigreturn trying to load an SS that is no longer valid)
> If this regression were new in 4.2-rc, then I'd say revert first and
> ask questions later, but the regression is in 4.1 as well :(
Big deal. That's why we have the "cc stable". Distributions that ship
with 4.1 are still fairly few (but it's a LTS release so it will grow)
but they all pick up stable kernels.
And even if they temporarily have a broken situation, it's still
better to make sure that broken situation gets fixed, rather than say
"oh well, too late to do anything about it now".
Linus
13.08.2015 19:24, Andy Lutomirski пишет:
> On Thu, Aug 13, 2015 at 9:20 AM, Stas Sergeev <[email protected]> wrote:
>> 13.08.2015 19:09, Andy Lutomirski пишет:
>>
>>> On Thu, Aug 13, 2015 at 9:03 AM, Stas Sergeev <[email protected]> wrote:
>>>> 13.08.2015 18:38, Andy Lutomirski пишет:
>>>>>
>>>>> So... what do we do about it? We could revert the whole mess. We
>>>>> could tell everyone to fix their DOSEMU, which violates policy and is
>>>>> especially annoying given how much effort we've put into keeping
>>>>> 16-bit mode fully functional lately. We could add yet more heuristics
>>>>> and teach sigreturn to ignore the saved SS value in sigcontext if the
>>>>> saved CS is 64-bit and the saved SS is unusable.
>>>> Andy, why do you constantly ignore the proposal to make
>>>> new behaviour explicitly controlable? You don't have to agree
>>>> with it, but you could at least comment on that possibility
>>>> and/or mention it with the ones you listed above.
>>> I'm not sure what the proposal is exactly.
>>>
>>> We could add a new uc_flags flag. If set, it means that
>>> sigcontext->ss is valid and should be used by sigreturn. If clear,
>>> then we ignore sigcontext->ss and just restore __USER_DS.
>>>
>>> The problem is that, by itself, this won't fix old DOSEMU. We somehow
>>> need to either detect that something funny is going on or just leave
>>> the flag clear by default.
>>>
>>> We could do this: always save SS to sigcontext->ss, but only restore
>>> sigcontext->ss if userspace explicitly sets the flag before sigreturn.
>>> If we do that, we'd need to also add my patch to preserve the actual
>>> HW SS selector if possible so that old DOSEMU knows what SS to program
>>> into its trampoline.
>>>
>>> This at least lets *new* DOSEMU set the flag and get the improved
>>> behavior. I still don't know what effect it'll have on Wine and CRIU.
>>>
>>> Stas, is that what you were thinking, or were you thinking of something
>>> else?
>> Not quite.
>> I mean the flag that will control not only sigreturn, but
>> the signal delivery as well. This may probably be a sigaction()
>> flag or some other. If not set - ss is ignored by both signal
>> delivery and sigreturn(). If set - ss is saved/restored (and in
>> the future - also fs/gs).
>> Is such a flag possible?
> Maybe. I think I'm more nervous about adding new flags in sigaction
> than I am in uc_flags.
Isn't uc_flags read-only for the user?
I look into setup_rt_frame
<http://lxr.free-electrons.com/ident?v=2.4.37;i=setup_rt_frame>() and see
---
/* Create the ucontext. */
err |= __put_user(0, &frame->uc.uc_flags);
---
so it doesn't look like the flag that user can use to _request_
something from the kernel. And I am talking about exactly
the flag to request the new behaviour, as only that can remove
the regression completely without patching dosemu.
On Thu, Aug 13, 2015 at 9:38 AM, Stas Sergeev <[email protected]> wrote:
> 13.08.2015 19:24, Andy Lutomirski пишет:
>
>> On Thu, Aug 13, 2015 at 9:20 AM, Stas Sergeev <[email protected]> wrote:
>>>
>>> 13.08.2015 19:09, Andy Lutomirski пишет:
>>>
>>>> On Thu, Aug 13, 2015 at 9:03 AM, Stas Sergeev <[email protected]> wrote:
>>>>>
>>>>> 13.08.2015 18:38, Andy Lutomirski пишет:
>>>>>>
>>>>>>
>>>>>> So... what do we do about it? We could revert the whole mess. We
>>>>>> could tell everyone to fix their DOSEMU, which violates policy and is
>>>>>> especially annoying given how much effort we've put into keeping
>>>>>> 16-bit mode fully functional lately. We could add yet more heuristics
>>>>>> and teach sigreturn to ignore the saved SS value in sigcontext if the
>>>>>> saved CS is 64-bit and the saved SS is unusable.
>>>>>
>>>>> Andy, why do you constantly ignore the proposal to make
>>>>> new behaviour explicitly controlable? You don't have to agree
>>>>> with it, but you could at least comment on that possibility
>>>>> and/or mention it with the ones you listed above.
>>>>
>>>> I'm not sure what the proposal is exactly.
>>>>
>>>> We could add a new uc_flags flag. If set, it means that
>>>> sigcontext->ss is valid and should be used by sigreturn. If clear,
>>>> then we ignore sigcontext->ss and just restore __USER_DS.
>>>>
>>>> The problem is that, by itself, this won't fix old DOSEMU. We somehow
>>>> need to either detect that something funny is going on or just leave
>>>> the flag clear by default.
>>>>
>>>> We could do this: always save SS to sigcontext->ss, but only restore
>>>> sigcontext->ss if userspace explicitly sets the flag before sigreturn.
>>>> If we do that, we'd need to also add my patch to preserve the actual
>>>> HW SS selector if possible so that old DOSEMU knows what SS to program
>>>> into its trampoline.
>>>>
>>>> This at least lets *new* DOSEMU set the flag and get the improved
>>>> behavior. I still don't know what effect it'll have on Wine and CRIU.
>>>>
>>>> Stas, is that what you were thinking, or were you thinking of something
>>>> else?
>>>
>>> Not quite.
>>> I mean the flag that will control not only sigreturn, but
>>> the signal delivery as well. This may probably be a sigaction()
>>> flag or some other. If not set - ss is ignored by both signal
>>> delivery and sigreturn(). If set - ss is saved/restored (and in
>>> the future - also fs/gs).
>>> Is such a flag possible?
>>
>> Maybe. I think I'm more nervous about adding new flags in sigaction
>> than I am in uc_flags.
>
> Isn't uc_flags read-only for the user?
> I look into setup_rt_frame
> <http://lxr.free-electrons.com/ident?v=2.4.37;i=setup_rt_frame>() and see
> ---
> /* Create the ucontext. */
> err |= __put_user(0, &frame->uc.uc_flags);
> ---
> so it doesn't look like the flag that user can use to _request_
> something from the kernel. And I am talking about exactly
> the flag to request the new behaviour, as only that can remove
> the regression completely without patching dosemu.
User code could rewrite it in the signal handler to request something.
--Andy
On Thu, Aug 13, 2015 at 9:34 AM, Linus Torvalds
<[email protected]> wrote:
>
> Are you sure? From the description by Stas, the problem is literally
> the *restoring* action of the sigcontext, and trying to restore a SS
> value that is no longer valid.
>
> "The crash happens when DOS program terminates.
> At that point dosemu subverts the execution flow by
> replacing segregs and cs/ip ss/sp in sigcontext with its own.
> But __pad0 still has DOS SS, which crash because (presumably)
> the DOS LDT have been just removed"
Side note: if this is the main issue, and the problem is the "iret"
faulting when trying to restore SS (and causing an unexpected SIGSEGV
that dosemu crashes on), then an alternate model might be to keep the
save/restore SS code, but do a "VERW" on the SS descriptor in
restore_sigcontext(), and silently just replacing it with __USER_DS if
that fails.
Linus
On Thu, Aug 13, 2015 at 9:43 AM, Linus Torvalds
<[email protected]> wrote:
> On Thu, Aug 13, 2015 at 9:34 AM, Linus Torvalds
> <[email protected]> wrote:
>>
>> Are you sure? From the description by Stas, the problem is literally
>> the *restoring* action of the sigcontext, and trying to restore a SS
>> value that is no longer valid.
>>
>> "The crash happens when DOS program terminates.
>> At that point dosemu subverts the execution flow by
>> replacing segregs and cs/ip ss/sp in sigcontext with its own.
>> But __pad0 still has DOS SS, which crash because (presumably)
>> the DOS LDT have been just removed"
>
> Side note: if this is the main issue, and the problem is the "iret"
> faulting when trying to restore SS (and causing an unexpected SIGSEGV
> that dosemu crashes on), then an alternate model might be to keep the
> save/restore SS code, but do a "VERW" on the SS descriptor in
> restore_sigcontext(), and silently just replacing it with __USER_DS if
> that fails.
>
I have a patch to do approximately that (using LAR instead of VERW to
rule out DPL < 3.
I'm 90% sure that the regs->ss = __USER_DS thing is a problem. Read
farther up in the thread.
I'm leaning toward saying we should revert, cc: stable, and fix it
better for 4.3.
--Andy
13.08.2015 19:42, Andy Lutomirski пишет:
> On Thu, Aug 13, 2015 at 9:38 AM, Stas Sergeev <[email protected]> wrote:
>> 13.08.2015 19:24, Andy Lutomirski пишет:
>>
>>> On Thu, Aug 13, 2015 at 9:20 AM, Stas Sergeev <[email protected]> wrote:
>>>> 13.08.2015 19:09, Andy Lutomirski пишет:
>>>>
>>>>> On Thu, Aug 13, 2015 at 9:03 AM, Stas Sergeev <[email protected]> wrote:
>>>>>> 13.08.2015 18:38, Andy Lutomirski пишет:
>>>>>>>
>>>>>>> So... what do we do about it? We could revert the whole mess. We
>>>>>>> could tell everyone to fix their DOSEMU, which violates policy and is
>>>>>>> especially annoying given how much effort we've put into keeping
>>>>>>> 16-bit mode fully functional lately. We could add yet more heuristics
>>>>>>> and teach sigreturn to ignore the saved SS value in sigcontext if the
>>>>>>> saved CS is 64-bit and the saved SS is unusable.
>>>>>> Andy, why do you constantly ignore the proposal to make
>>>>>> new behaviour explicitly controlable? You don't have to agree
>>>>>> with it, but you could at least comment on that possibility
>>>>>> and/or mention it with the ones you listed above.
>>>>> I'm not sure what the proposal is exactly.
>>>>>
>>>>> We could add a new uc_flags flag. If set, it means that
>>>>> sigcontext->ss is valid and should be used by sigreturn. If clear,
>>>>> then we ignore sigcontext->ss and just restore __USER_DS.
>>>>>
>>>>> The problem is that, by itself, this won't fix old DOSEMU. We somehow
>>>>> need to either detect that something funny is going on or just leave
>>>>> the flag clear by default.
>>>>>
>>>>> We could do this: always save SS to sigcontext->ss, but only restore
>>>>> sigcontext->ss if userspace explicitly sets the flag before sigreturn.
>>>>> If we do that, we'd need to also add my patch to preserve the actual
>>>>> HW SS selector if possible so that old DOSEMU knows what SS to program
>>>>> into its trampoline.
>>>>>
>>>>> This at least lets *new* DOSEMU set the flag and get the improved
>>>>> behavior. I still don't know what effect it'll have on Wine and CRIU.
>>>>>
>>>>> Stas, is that what you were thinking, or were you thinking of something
>>>>> else?
>>>> Not quite.
>>>> I mean the flag that will control not only sigreturn, but
>>>> the signal delivery as well. This may probably be a sigaction()
>>>> flag or some other. If not set - ss is ignored by both signal
>>>> delivery and sigreturn(). If set - ss is saved/restored (and in
>>>> the future - also fs/gs).
>>>> Is such a flag possible?
>>> Maybe. I think I'm more nervous about adding new flags in sigaction
>>> than I am in uc_flags.
>> Isn't uc_flags read-only for the user?
>> I look into setup_rt_frame
>> <http://lxr.free-electrons.com/ident?v=2.4.37;i=setup_rt_frame>() and see
>> ---
>> /* Create the ucontext. */
>> err |= __put_user(0, &frame->uc.uc_flags);
>> ---
>> so it doesn't look like the flag that user can use to _request_
>> something from the kernel. And I am talking about exactly
>> the flag to request the new behaviour, as only that can remove
>> the regression completely without patching dosemu.
> User code could rewrite it in the signal handler to request something.
But that's too late to affect the signal _delivery_ anyhow, no?
Any idea about the flag that can control both delivery and return?
On Thu, Aug 13, 2015 at 9:48 AM, Stas Sergeev <[email protected]> wrote:
> 13.08.2015 19:42, Andy Lutomirski пишет:
>
>> On Thu, Aug 13, 2015 at 9:38 AM, Stas Sergeev <[email protected]> wrote:
>>>
>>> 13.08.2015 19:24, Andy Lutomirski пишет:
>>>
>>>> On Thu, Aug 13, 2015 at 9:20 AM, Stas Sergeev <[email protected]> wrote:
>>>>>
>>>>> 13.08.2015 19:09, Andy Lutomirski пишет:
>>>>>
>>>>>> On Thu, Aug 13, 2015 at 9:03 AM, Stas Sergeev <[email protected]> wrote:
>>>>>>>
>>>>>>> 13.08.2015 18:38, Andy Lutomirski пишет:
>>>>>>>>
>>>>>>>>
>>>>>>>> So... what do we do about it? We could revert the whole mess. We
>>>>>>>> could tell everyone to fix their DOSEMU, which violates policy and
>>>>>>>> is
>>>>>>>> especially annoying given how much effort we've put into keeping
>>>>>>>> 16-bit mode fully functional lately. We could add yet more
>>>>>>>> heuristics
>>>>>>>> and teach sigreturn to ignore the saved SS value in sigcontext if
>>>>>>>> the
>>>>>>>> saved CS is 64-bit and the saved SS is unusable.
>>>>>>>
>>>>>>> Andy, why do you constantly ignore the proposal to make
>>>>>>> new behaviour explicitly controlable? You don't have to agree
>>>>>>> with it, but you could at least comment on that possibility
>>>>>>> and/or mention it with the ones you listed above.
>>>>>>
>>>>>> I'm not sure what the proposal is exactly.
>>>>>>
>>>>>> We could add a new uc_flags flag. If set, it means that
>>>>>> sigcontext->ss is valid and should be used by sigreturn. If clear,
>>>>>> then we ignore sigcontext->ss and just restore __USER_DS.
>>>>>>
>>>>>> The problem is that, by itself, this won't fix old DOSEMU. We somehow
>>>>>> need to either detect that something funny is going on or just leave
>>>>>> the flag clear by default.
>>>>>>
>>>>>> We could do this: always save SS to sigcontext->ss, but only restore
>>>>>> sigcontext->ss if userspace explicitly sets the flag before sigreturn.
>>>>>> If we do that, we'd need to also add my patch to preserve the actual
>>>>>> HW SS selector if possible so that old DOSEMU knows what SS to program
>>>>>> into its trampoline.
>>>>>>
>>>>>> This at least lets *new* DOSEMU set the flag and get the improved
>>>>>> behavior. I still don't know what effect it'll have on Wine and CRIU.
>>>>>>
>>>>>> Stas, is that what you were thinking, or were you thinking of
>>>>>> something
>>>>>> else?
>>>>>
>>>>> Not quite.
>>>>> I mean the flag that will control not only sigreturn, but
>>>>> the signal delivery as well. This may probably be a sigaction()
>>>>> flag or some other. If not set - ss is ignored by both signal
>>>>> delivery and sigreturn(). If set - ss is saved/restored (and in
>>>>> the future - also fs/gs).
>>>>> Is such a flag possible?
>>>>
>>>> Maybe. I think I'm more nervous about adding new flags in sigaction
>>>> than I am in uc_flags.
>>>
>>> Isn't uc_flags read-only for the user?
>>> I look into setup_rt_frame
>>> <http://lxr.free-electrons.com/ident?v=2.4.37;i=setup_rt_frame>() and see
>>> ---
>>> /* Create the ucontext. */
>>> err |= __put_user(0, &frame->uc.uc_flags);
>>> ---
>>> so it doesn't look like the flag that user can use to _request_
>>> something from the kernel. And I am talking about exactly
>>> the flag to request the new behaviour, as only that can remove
>>> the regression completely without patching dosemu.
>>
>> User code could rewrite it in the signal handler to request something.
>
> But that's too late to affect the signal _delivery_ anyhow, no?
> Any idea about the flag that can control both delivery and return?
I think my LAR patch should cover the signal delivery part.
--Andy
On Thu, Aug 13, 2015 at 11:43 AM, Andy Lutomirski <[email protected]> wrote:
> On Thu, Aug 13, 2015 at 8:37 AM, Linus Torvalds
> <[email protected]> wrote:
>> On Tue, Aug 11, 2015 at 5:17 PM, Stas Sergeev <[email protected]> wrote:
>>>
>>> I realize this patch may be good to have in general, but
>>> breaking userspace without a single warning is a bit
>>> discouraging. Seems like the old "we don't break userspace"
>>> rule have gone.
>>
>> That rule hasn't gone anywhere.
>>
>> Does a plain revert just fix everything? Because if so, that's the
>> right thing to do, and we can just re-visit this later.
>>
>> I don't understand why Andy and Ingo are even discussing this. What
>> the f*ck, guys?
>>
>
> I'm trying to fix it without reverting. If that doesn't work, then we
> revert. Yesterday, I thought I had a reasonably clean fix, but it
> turned out that it only solved half of the problem.
>
> If we revert, I think I need to check what will break due to the
> revert. I need to check at least Wine, and we'll have to do something
> about all the selftests that will start failing. I also need to check
> CRIU, and IIRC CRIU has started using the new sigcontext SS in new
> versions.
I don't think Wine will be a problem, at least how it is currently set
up. 16-bit support is only in the 32-bit build. The 64-bit build
only supports Win64 apps, and will call the 32-bit version (installed
in parallel) to run 32 and 16-bit apps.
--
Brian Gerst
13.08.2015 19:59, Andy Lutomirski пишет:
> On Thu, Aug 13, 2015 at 9:48 AM, Stas Sergeev <[email protected]> wrote:
>> 13.08.2015 19:42, Andy Lutomirski пишет:
>>
>>> On Thu, Aug 13, 2015 at 9:38 AM, Stas Sergeev <[email protected]> wrote:
>>>> 13.08.2015 19:24, Andy Lutomirski пишет:
>>>>
>>>>> On Thu, Aug 13, 2015 at 9:20 AM, Stas Sergeev <[email protected]> wrote:
>>>>>> 13.08.2015 19:09, Andy Lutomirski пишет:
>>>>>>
>>>>>>> On Thu, Aug 13, 2015 at 9:03 AM, Stas Sergeev <[email protected]> wrote:
>>>>>>>> 13.08.2015 18:38, Andy Lutomirski пишет:
>>>>>>>>>
>>>>>>>>> So... what do we do about it? We could revert the whole mess. We
>>>>>>>>> could tell everyone to fix their DOSEMU, which violates policy and
>>>>>>>>> is
>>>>>>>>> especially annoying given how much effort we've put into keeping
>>>>>>>>> 16-bit mode fully functional lately. We could add yet more
>>>>>>>>> heuristics
>>>>>>>>> and teach sigreturn to ignore the saved SS value in sigcontext if
>>>>>>>>> the
>>>>>>>>> saved CS is 64-bit and the saved SS is unusable.
>>>>>>>> Andy, why do you constantly ignore the proposal to make
>>>>>>>> new behaviour explicitly controlable? You don't have to agree
>>>>>>>> with it, but you could at least comment on that possibility
>>>>>>>> and/or mention it with the ones you listed above.
>>>>>>> I'm not sure what the proposal is exactly.
>>>>>>>
>>>>>>> We could add a new uc_flags flag. If set, it means that
>>>>>>> sigcontext->ss is valid and should be used by sigreturn. If clear,
>>>>>>> then we ignore sigcontext->ss and just restore __USER_DS.
>>>>>>>
>>>>>>> The problem is that, by itself, this won't fix old DOSEMU. We somehow
>>>>>>> need to either detect that something funny is going on or just leave
>>>>>>> the flag clear by default.
>>>>>>>
>>>>>>> We could do this: always save SS to sigcontext->ss, but only restore
>>>>>>> sigcontext->ss if userspace explicitly sets the flag before sigreturn.
>>>>>>> If we do that, we'd need to also add my patch to preserve the actual
>>>>>>> HW SS selector if possible so that old DOSEMU knows what SS to program
>>>>>>> into its trampoline.
>>>>>>>
>>>>>>> This at least lets *new* DOSEMU set the flag and get the improved
>>>>>>> behavior. I still don't know what effect it'll have on Wine and CRIU.
>>>>>>>
>>>>>>> Stas, is that what you were thinking, or were you thinking of
>>>>>>> something
>>>>>>> else?
>>>>>> Not quite.
>>>>>> I mean the flag that will control not only sigreturn, but
>>>>>> the signal delivery as well. This may probably be a sigaction()
>>>>>> flag or some other. If not set - ss is ignored by both signal
>>>>>> delivery and sigreturn(). If set - ss is saved/restored (and in
>>>>>> the future - also fs/gs).
>>>>>> Is such a flag possible?
>>>>> Maybe. I think I'm more nervous about adding new flags in sigaction
>>>>> than I am in uc_flags.
>>>> Isn't uc_flags read-only for the user?
>>>> I look into setup_rt_frame
>>>> <http://lxr.free-electrons.com/ident?v=2.4.37;i=setup_rt_frame>() and see
>>>> ---
>>>> /* Create the ucontext. */
>>>> err |= __put_user(0, &frame->uc.uc_flags);
>>>> ---
>>>> so it doesn't look like the flag that user can use to _request_
>>>> something from the kernel. And I am talking about exactly
>>>> the flag to request the new behaviour, as only that can remove
>>>> the regression completely without patching dosemu.
>>> User code could rewrite it in the signal handler to request something.
>> But that's too late to affect the signal _delivery_ anyhow, no?
>> Any idea about the flag that can control both delivery and return?
> I think my LAR patch should cover the signal delivery part.
Ah, I see your point now.
But that's not what I mean, as it doesn't cover fs/gs, which
is what Linus is looking to revert now too (I am building the
testing kernels now).
So you obviously don't want the flag that will control all 3
things together without any lar heuristics, but I don't understand why...
Yes, your heuristic+uc_flag may work, but IMHO far from
perfection and TLS problem is not covered. I can test such
a patch but I don't understand why you don't want the flag
that will just control all things together.
On Thu, Aug 13, 2015 at 10:13 AM, Stas Sergeev <[email protected]> wrote:
> 13.08.2015 19:59, Andy Lutomirski пишет:
>
>> On Thu, Aug 13, 2015 at 9:48 AM, Stas Sergeev <[email protected]> wrote:
>>>
>>> 13.08.2015 19:42, Andy Lutomirski пишет:
>>>
>>>> On Thu, Aug 13, 2015 at 9:38 AM, Stas Sergeev <[email protected]> wrote:
>>>>>
>>>>> 13.08.2015 19:24, Andy Lutomirski пишет:
>>>>>
>>>>>> On Thu, Aug 13, 2015 at 9:20 AM, Stas Sergeev <[email protected]> wrote:
>>>>>>>
>>>>>>> 13.08.2015 19:09, Andy Lutomirski пишет:
>>>>>>>
>>>>>>>> On Thu, Aug 13, 2015 at 9:03 AM, Stas Sergeev <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> 13.08.2015 18:38, Andy Lutomirski пишет:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> So... what do we do about it? We could revert the whole mess.
>>>>>>>>>> We
>>>>>>>>>> could tell everyone to fix their DOSEMU, which violates policy and
>>>>>>>>>> is
>>>>>>>>>> especially annoying given how much effort we've put into keeping
>>>>>>>>>> 16-bit mode fully functional lately. We could add yet more
>>>>>>>>>> heuristics
>>>>>>>>>> and teach sigreturn to ignore the saved SS value in sigcontext if
>>>>>>>>>> the
>>>>>>>>>> saved CS is 64-bit and the saved SS is unusable.
>>>>>>>>>
>>>>>>>>> Andy, why do you constantly ignore the proposal to make
>>>>>>>>> new behaviour explicitly controlable? You don't have to agree
>>>>>>>>> with it, but you could at least comment on that possibility
>>>>>>>>> and/or mention it with the ones you listed above.
>>>>>>>>
>>>>>>>> I'm not sure what the proposal is exactly.
>>>>>>>>
>>>>>>>> We could add a new uc_flags flag. If set, it means that
>>>>>>>> sigcontext->ss is valid and should be used by sigreturn. If clear,
>>>>>>>> then we ignore sigcontext->ss and just restore __USER_DS.
>>>>>>>>
>>>>>>>> The problem is that, by itself, this won't fix old DOSEMU. We
>>>>>>>> somehow
>>>>>>>> need to either detect that something funny is going on or just leave
>>>>>>>> the flag clear by default.
>>>>>>>>
>>>>>>>> We could do this: always save SS to sigcontext->ss, but only restore
>>>>>>>> sigcontext->ss if userspace explicitly sets the flag before
>>>>>>>> sigreturn.
>>>>>>>> If we do that, we'd need to also add my patch to preserve the actual
>>>>>>>> HW SS selector if possible so that old DOSEMU knows what SS to
>>>>>>>> program
>>>>>>>> into its trampoline.
>>>>>>>>
>>>>>>>> This at least lets *new* DOSEMU set the flag and get the improved
>>>>>>>> behavior. I still don't know what effect it'll have on Wine and
>>>>>>>> CRIU.
>>>>>>>>
>>>>>>>> Stas, is that what you were thinking, or were you thinking of
>>>>>>>> something
>>>>>>>> else?
>>>>>>>
>>>>>>> Not quite.
>>>>>>> I mean the flag that will control not only sigreturn, but
>>>>>>> the signal delivery as well. This may probably be a sigaction()
>>>>>>> flag or some other. If not set - ss is ignored by both signal
>>>>>>> delivery and sigreturn(). If set - ss is saved/restored (and in
>>>>>>> the future - also fs/gs).
>>>>>>> Is such a flag possible?
>>>>>>
>>>>>> Maybe. I think I'm more nervous about adding new flags in sigaction
>>>>>> than I am in uc_flags.
>>>>>
>>>>> Isn't uc_flags read-only for the user?
>>>>> I look into setup_rt_frame
>>>>> <http://lxr.free-electrons.com/ident?v=2.4.37;i=setup_rt_frame>() and
>>>>> see
>>>>> ---
>>>>> /* Create the ucontext. */
>>>>> err |= __put_user(0, &frame->uc.uc_flags);
>>>>> ---
>>>>> so it doesn't look like the flag that user can use to _request_
>>>>> something from the kernel. And I am talking about exactly
>>>>> the flag to request the new behaviour, as only that can remove
>>>>> the regression completely without patching dosemu.
>>>>
>>>> User code could rewrite it in the signal handler to request something.
>>>
>>> But that's too late to affect the signal _delivery_ anyhow, no?
>>> Any idea about the flag that can control both delivery and return?
>>
>> I think my LAR patch should cover the signal delivery part.
>
> Ah, I see your point now.
> But that's not what I mean, as it doesn't cover fs/gs, which
> is what Linus is looking to revert now too (I am building the
> testing kernels now).
> So you obviously don't want the flag that will control all 3
> things together without any lar heuristics, but I don't understand why...
> Yes, your heuristic+uc_flag may work, but IMHO far from
> perfection and TLS problem is not covered. I can test such
> a patch but I don't understand why you don't want the flag
> that will just control all things together.
The fs/gs patch doesn't change anything, so there's nothing to
control. It just renamed fields that did nothing. (It turns out they
did something back before arch_prctl existed, but there's only a
narrow range of kernels like that, and I'm not at all convinced that
those kernels are ABI-compatible with modern kernels at all. This is
all pre-git.)
Sure, it might make sense to change TLS behavior in signals at some
point, but I don't think we're there yet. We need to deal with
fsgsbase first, and that's a *huge* can of worms.
--Andy
--
Andy Lutomirski
AMA Capital Management, LLC
13.08.2015 18:37, Linus Torvalds пишет:
> On Tue, Aug 11, 2015 at 5:17 PM, Stas Sergeev <[email protected]> wrote:
>> I realize this patch may be good to have in general, but
>> breaking userspace without a single warning is a bit
>> discouraging. Seems like the old "we don't break userspace"
>> rule have gone.
> That rule hasn't gone anywhere.
>
> Does a plain revert just fix everything? Because if so, that's the
> right thing to do, and we can just re-visit this later.
>
> I don't understand why Andy and Ingo are even discussing this. What
> the f*ck, guys?
>
> Stas, can you verify that this actually fixes it? There's two
> different versions here: one that reverts *just* that one commit, and
> one that reverts the fs/gs changes too. Can you test them both?
Hello Linus, I verified that patch-minimal.diff is enough
to fix the problem, BUT! dosemu is in fact using the .fs and
.gs fields of sigcontext as a placeholders. Why the minimal
patch alone helps is simply because the kernel headers
installed in a system do not yet represent the newer kernel
developments and have the .fs and .gs fields in.
So, to allow the pre-compiled dosemu binary to run, patch-minimal
is enough. To allow re-compiling it, you'd need patch.diff.
Although I guess compilation is not the point, and the
fact that the pre-compiled binary works, is enough.
Now my PC is dying of overheat, I am struggling to even
boot it. I am not sure I'll test the "really-minimal" patch,
but it will unlikely to help because dosemu expects unmodified
ss in a sighandler, which "really-minimal" patch doesn't seem
to give.
13.08.2015 20:17, Andy Lutomirski пишет:
> On Thu, Aug 13, 2015 at 10:13 AM, Stas Sergeev <[email protected]> wrote:
>
>> Ah, I see your point now.
>> But that's not what I mean, as it doesn't cover fs/gs, which
>> is what Linus is looking to revert now too (I am building the
>> testing kernels now).
>> So you obviously don't want the flag that will control all 3
>> things together without any lar heuristics, but I don't understand why...
>> Yes, your heuristic+uc_flag may work, but IMHO far from
>> perfection and TLS problem is not covered. I can test such
>> a patch but I don't understand why you don't want the flag
>> that will just control all things together.
> The fs/gs patch doesn't change anything, so there's nothing to
> control. It just renamed fields that did nothing. (It turns out they
> did something back before arch_prctl existed, but there's only a
> narrow range of kernels like that, and I'm not at all convinced that
> those kernels are ABI-compatible with modern kernels at all. This is
> all pre-git.)
The problem is that dosemu existed back then too.
It still uses these fields as a place-holders. Well, this is a
compile-time breakage only, so perhaps not as important
as the run-time one, but still, you broke it in yet another way.
> Sure, it might make sense to change TLS behavior in signals at some
> point, but I don't think we're there yet. We need to deal with
> fsgsbase first, and that's a *huge* can of worms.
My point is not when to fix TLS or how.
But you can get the flag ready, for now controlling only SS
and fixing the regression, but it will define the course of the
further developments. When the time will come, it will cover
also TLS, but why not to get such a flag ready now, without
yet fixing TLS?
On Thu, Aug 13, 2015 at 11:00 AM, Stas Sergeev <[email protected]> wrote:
> 13.08.2015 20:17, Andy Lutomirski пишет:
>>
>> On Thu, Aug 13, 2015 at 10:13 AM, Stas Sergeev <[email protected]> wrote:
>>
>>> Ah, I see your point now.
>>> But that's not what I mean, as it doesn't cover fs/gs, which
>>> is what Linus is looking to revert now too (I am building the
>>> testing kernels now).
>>> So you obviously don't want the flag that will control all 3
>>> things together without any lar heuristics, but I don't understand why...
>>> Yes, your heuristic+uc_flag may work, but IMHO far from
>>> perfection and TLS problem is not covered. I can test such
>>> a patch but I don't understand why you don't want the flag
>>> that will just control all things together.
>>
>> The fs/gs patch doesn't change anything, so there's nothing to
>> control. It just renamed fields that did nothing. (It turns out they
>> did something back before arch_prctl existed, but there's only a
>> narrow range of kernels like that, and I'm not at all convinced that
>> those kernels are ABI-compatible with modern kernels at all. This is
>> all pre-git.)
>
> The problem is that dosemu existed back then too.
> It still uses these fields as a place-holders. Well, this is a
> compile-time breakage only, so perhaps not as important
> as the run-time one, but still, you broke it in yet another way.
Great. What exactly is DOSEMU sticking in those fields? Are we now
stuck ignoring the contents in sigreturn because DOSEMU coopts them
for its own purposes?
>
>> Sure, it might make sense to change TLS behavior in signals at some
>> point, but I don't think we're there yet. We need to deal with
>> fsgsbase first, and that's a *huge* can of worms.
>
> My point is not when to fix TLS or how.
> But you can get the flag ready, for now controlling only SS
> and fixing the regression, but it will define the course of the
> further developments. When the time will come, it will cover
> also TLS, but why not to get such a flag ready now, without
> yet fixing TLS?
I think that if we create a flag to change semantics, we shouldn't
introduce the flag and make it look like it works without actually
changing the semantics.
--Andy
--
Andy Lutomirski
AMA Capital Management, LLC
13.08.2015 21:05, Andy Lutomirski пишет:
> On Thu, Aug 13, 2015 at 11:00 AM, Stas Sergeev <[email protected]> wrote:
>> 13.08.2015 20:17, Andy Lutomirski пишет:
>>> On Thu, Aug 13, 2015 at 10:13 AM, Stas Sergeev <[email protected]> wrote:
>>>
>>>> Ah, I see your point now.
>>>> But that's not what I mean, as it doesn't cover fs/gs, which
>>>> is what Linus is looking to revert now too (I am building the
>>>> testing kernels now).
>>>> So you obviously don't want the flag that will control all 3
>>>> things together without any lar heuristics, but I don't understand why...
>>>> Yes, your heuristic+uc_flag may work, but IMHO far from
>>>> perfection and TLS problem is not covered. I can test such
>>>> a patch but I don't understand why you don't want the flag
>>>> that will just control all things together.
>>> The fs/gs patch doesn't change anything, so there's nothing to
>>> control. It just renamed fields that did nothing. (It turns out they
>>> did something back before arch_prctl existed, but there's only a
>>> narrow range of kernels like that, and I'm not at all convinced that
>>> those kernels are ABI-compatible with modern kernels at all. This is
>>> all pre-git.)
>> The problem is that dosemu existed back then too.
>> It still uses these fields as a place-holders. Well, this is a
>> compile-time breakage only, so perhaps not as important
>> as the run-time one, but still, you broke it in yet another way.
> Great. What exactly is DOSEMU sticking in those fields?
FS and GS of course. Saves by hands in a sighandler.
> Are we now
> stuck ignoring the contents in sigreturn because DOSEMU coopts them
> for its own purposes?
No, its just that these fields _were_ valid in times, and so,
when kernel stopped using them, dosemu authors opted
not to change their locations but just save things by hands.
What else do you suppose could they do?
Well, compile-time breakage is another thing, it can safely
be ignored on your side I guess.
Something like this should do:
https://github.com/stsp/dosemu2/commit/4e16d83b9c7b1f9155ec0c0c125fe6f755eb719c
>>> Sure, it might make sense to change TLS behavior in signals at some
>>> point, but I don't think we're there yet. We need to deal with
>>> fsgsbase first, and that's a *huge* can of worms.
>> My point is not when to fix TLS or how.
>> But you can get the flag ready, for now controlling only SS
>> and fixing the regression, but it will define the course of the
>> further developments. When the time will come, it will cover
>> also TLS, but why not to get such a flag ready now, without
>> yet fixing TLS?
> I think that if we create a flag to change semantics, we shouldn't
> introduce the flag and make it look like it works without actually
> changing the semantics.
It is more about selecting the right field for such a flag.
You can select the right field now, and introduce some flag
to it, like SIG_SAVE_SS or whatever. This will fix a regression.
Then, when the TLS time will code, you'll just add SIG_SAVE_FS
flag to the same field, so that they can be ORed.
On Thu, Aug 13, 2015 at 11:19 AM, Stas Sergeev <[email protected]> wrote:
> It is more about selecting the right field for such a flag.
> You can select the right field now, and introduce some flag
> to it, like SIG_SAVE_SS or whatever. This will fix a regression.
> Then, when the TLS time will code, you'll just add SIG_SAVE_FS
> flag to the same field, so that they can be ORed.
Oh.
I think the field is obvious: uc_flags. I also thing that all of the
saving should happen automatically, since I still don't see how
anything will break if the kernel starts saving more things. It's the
restore part that's problematic.
--Andy
--
Andy Lutomirski
AMA Capital Management, LLC
On Thu, Aug 13, 2015 at 10:51 AM, Stas Sergeev <[email protected]> wrote:
>
> Hello Linus, I verified that patch-minimal.diff is enough
> to fix the problem, BUT! dosemu is in fact using the .fs and
> .gs fields of sigcontext as a placeholders. Why the minimal
> patch alone helps is simply because the kernel headers
> installed in a system do not yet represent the newer kernel
> developments and have the .fs and .gs fields in.
Ok. So I'm inclined to do the bigger revert, just to fix the compile
issue. It would be crazy to force some silly autoconf script for
random header info.
Of course, we could also just let the uabi version go out of sync with
the internal version, but that sounds even worse. I think we'd be
better off just leaving the old names, and just having big comments
about this..
Andy? I agree that we should strive to improve in this area, and this
should be worked on, but that would seem to be a 4.3 issue (and mark
that too for stable once it actually works). No?
Linus
13.08.2015 21:25, Andy Lutomirski пишет:
> On Thu, Aug 13, 2015 at 11:19 AM, Stas Sergeev <[email protected]> wrote:
>
>> It is more about selecting the right field for such a flag.
>> You can select the right field now, and introduce some flag
>> to it, like SIG_SAVE_SS or whatever. This will fix a regression.
>> Then, when the TLS time will code, you'll just add SIG_SAVE_FS
>> flag to the same field, so that they can be ORed.
> Oh.
>
> I think the field is obvious: uc_flags.
But Andy, I don't understand...
If we are talking about the field that will in the future also
cover TLS, how can this be uc_flags? By using uc_flags,
how will you control the restoring of FS on signal delivery?
> I also thing that all of the
> saving should happen automatically, since I still don't see how
> anything will break if the kernel starts saving more things. It's the
> restore part that's problematic.
OK, so SIG_RESTORE_SS and SIG_RESTORE_FS.
How can those be the part of uc_flags, if we want to
control the restoring _on a signal delivery_?
On Thu, Aug 13, 2015 at 11:35 AM, Linus Torvalds
<[email protected]> wrote:
> On Thu, Aug 13, 2015 at 10:51 AM, Stas Sergeev <[email protected]> wrote:
>>
>> Hello Linus, I verified that patch-minimal.diff is enough
>> to fix the problem, BUT! dosemu is in fact using the .fs and
>> .gs fields of sigcontext as a placeholders. Why the minimal
>> patch alone helps is simply because the kernel headers
>> installed in a system do not yet represent the newer kernel
>> developments and have the .fs and .gs fields in.
>
> Ok. So I'm inclined to do the bigger revert, just to fix the compile
> issue. It would be crazy to force some silly autoconf script for
> random header info.
>
> Of course, we could also just let the uabi version go out of sync with
> the internal version, but that sounds even worse. I think we'd be
> better off just leaving the old names, and just having big comments
> about this..
>
> Andy? I agree that we should strive to improve in this area, and this
> should be worked on, but that would seem to be a 4.3 issue (and mark
> that too for stable once it actually works). No?
Yeah, probably makes sense, but one of us should explicitly test CRIU
(both new and old versions) on the result. CRIU does interesting
things involving sigcontext and protocol buffers.
We can do something a bit silly wrt UABI in the long run:
union { unsigned long __pad0; unsigned long ss; };
I have half-written patches to do better, but I don't think they're
4.2 material. I'm already vastly over my quota for late-arriving 4.2
things.
Stas: I think uc_flags is okay. We don't currently read it during
sigreturn, but I see no reason that we can't start reading it.
--Andy
13.08.2015 21:35, Linus Torvalds пишет:
> On Thu, Aug 13, 2015 at 10:51 AM, Stas Sergeev <[email protected]> wrote:
>> Hello Linus, I verified that patch-minimal.diff is enough
>> to fix the problem, BUT! dosemu is in fact using the .fs and
>> .gs fields of sigcontext as a placeholders. Why the minimal
>> patch alone helps is simply because the kernel headers
>> installed in a system do not yet represent the newer kernel
>> developments and have the .fs and .gs fields in.
> Ok. So I'm inclined to do the bigger revert, just to fix the compile
> issue. It would be crazy to force some silly autoconf script for
> random header info.
But OTOH these fields already lost their meaning.
It may make sense to force people to stop using them,
in case you ever want to re-use them again in the future.
From what Andy says, it seems there are the distant plans
to start restoring FS again. If people still use sigcontext.fs
by that time, you'll get problems. If you force everyone to
stop using them - you'll be safe.
Also, at least in the past, resolving the compile-time problems
was up to the distributions: they always provided the "sanitized up"
version of kernel headers. Not sure what the current policy is...
In fact, here in Fedora-22, I have
/usr/include/asm/sigcontext.h
that is straight from the kernel, but signal.h is instead using
a "sanitized up" version in
/usr/include/bits/sigcontext.h
so the userspace compiles fine.
In fact, I think the "silly autoconf script" you mentioned above,
should indeed be reverted, and instead I should use
sigcontext.reserved1[8] array to store FS/GS? Is this
safer against ever re-using this space? Not sure...
On Thu, Aug 13, 2015 at 11:57 AM, Stas Sergeev <[email protected]> wrote:
> 13.08.2015 21:35, Linus Torvalds пишет:
>>
>> On Thu, Aug 13, 2015 at 10:51 AM, Stas Sergeev <[email protected]> wrote:
>>>
>>> Hello Linus, I verified that patch-minimal.diff is enough
>>> to fix the problem, BUT! dosemu is in fact using the .fs and
>>> .gs fields of sigcontext as a placeholders. Why the minimal
>>> patch alone helps is simply because the kernel headers
>>> installed in a system do not yet represent the newer kernel
>>> developments and have the .fs and .gs fields in.
>>
>> Ok. So I'm inclined to do the bigger revert, just to fix the compile
>> issue. It would be crazy to force some silly autoconf script for
>> random header info.
>
> But OTOH these fields already lost their meaning.
> It may make sense to force people to stop using them,
> in case you ever want to re-use them again in the future.
> From what Andy says, it seems there are the distant plans
> to start restoring FS again. If people still use sigcontext.fs
> by that time, you'll get problems. If you force everyone to
> stop using them - you'll be safe.
There are distant plans to think about restoring them, at least. But
it's not just FS -- it's FSBASE as well, and that's not going to fit
in the same slot. And we still don't get to break old DOSEMU versions
(knowingly, anyway).
>
> In fact, I think the "silly autoconf script" you mentioned above,
> should indeed be reverted, and instead I should use
> sigcontext.reserved1[8] array to store FS/GS? Is this
> safer against ever re-using this space? Not sure...
Honestly, I'd just save it somewhere outside sigcontext. If it's
application data, treat it as such. OTOH, if you've already been
saving it in the old FS/GS slots, I see no great reason to change it.
--Andy
--
Andy Lutomirski
AMA Capital Management, LLC
13.08.2015 21:41, Andy Lutomirski пишет:
> Stas: I think uc_flags is okay. We don't currently read it during
> sigreturn, but I see no reason that we can't start reading it.
Andy, we definitely have some communication discontinuity here. :)
The point is not sigreturn. If we are talking about the flags that
will in the future control also TLS, how would you limit it to sigreturn()?
It should control the restoring of FS _on signal delivery_, not only
on sigreturn()! So how uc_flags can be used for this at all?
13.08.2015 22:01, Andy Lutomirski пишет:
> On Thu, Aug 13, 2015 at 11:57 AM, Stas Sergeev <[email protected]> wrote:
>> 13.08.2015 21:35, Linus Torvalds пишет:
>>> On Thu, Aug 13, 2015 at 10:51 AM, Stas Sergeev <[email protected]> wrote:
>>>> Hello Linus, I verified that patch-minimal.diff is enough
>>>> to fix the problem, BUT! dosemu is in fact using the .fs and
>>>> .gs fields of sigcontext as a placeholders. Why the minimal
>>>> patch alone helps is simply because the kernel headers
>>>> installed in a system do not yet represent the newer kernel
>>>> developments and have the .fs and .gs fields in.
>>> Ok. So I'm inclined to do the bigger revert, just to fix the compile
>>> issue. It would be crazy to force some silly autoconf script for
>>> random header info.
>> But OTOH these fields already lost their meaning.
>> It may make sense to force people to stop using them,
>> in case you ever want to re-use them again in the future.
>> From what Andy says, it seems there are the distant plans
>> to start restoring FS again. If people still use sigcontext.fs
>> by that time, you'll get problems. If you force everyone to
>> stop using them - you'll be safe.
> There are distant plans to think about restoring them, at least. But
> it's not just FS -- it's FSBASE as well, and that's not going to fit
> in the same slot. And we still don't get to break old DOSEMU versions
> (knowingly, anyway).
>
>> In fact, I think the "silly autoconf script" you mentioned above,
>> should indeed be reverted, and instead I should use
>> sigcontext.reserved1[8] array to store FS/GS? Is this
>> safer against ever re-using this space? Not sure...
> Honestly, I'd just save it somewhere outside sigcontext. If it's
> application data, treat it as such. OTOH, if you've already been
> saving it in the old FS/GS slots, I see no great reason to change it.
OK, as you promise not to re-use the same slots in the future,
I won't change it. Thanks.
As for the compilation failure - I am surprised you even care.
I thought the "we don't break userspace" covers only run-time,
not compile-time. Oh well.
On Thu, Aug 13, 2015 at 12:13 PM, Stas Sergeev <[email protected]> wrote:
>
> As for the compilation failure - I am surprised you even care.
> I thought the "we don't break userspace" covers only run-time,
> not compile-time. Oh well.
I definitely care.
Compile issues may be slightly lower on my radar, but the basic rule
should be that upgrading a kernel shouldn't cause problems.
The only exception is for stuff that gets very intimate with the
kernel itself - ie things like external modules etc. Those we don't
really try to cater to, or care about.
Things that actually include internal kernel headers tend to have just
themselves to blame and historically we really didn't care very much
(long long ago we had issues with ext3fs-tools etc until those kinds
of things just ended up making their own copies).
But the point of the uabi headers was that even _that_ is supposed to
actually work, at least for the limited case of those headers.. So if
a uabi header change causes problems, we really *should* care, because
otherwise, what was the point of that whole uabi rework, really?
(And no, we've not always succeeded in that "shouldn't cause problems"
space. So I'm not claiming we have a perfect track record, but I do
claim that I at least care very deeply)
Linus
On Aug 13, 2015 12:05 PM, "Stas Sergeev" <[email protected]> wrote:
>
> 13.08.2015 21:41, Andy Lutomirski пишет:
>
>> Stas: I think uc_flags is okay. We don't currently read it during
>> sigreturn, but I see no reason that we can't start reading it.
>
> Andy, we definitely have some communication discontinuity here. :)
> The point is not sigreturn. If we are talking about the flags that
> will in the future control also TLS, how would you limit it to sigreturn()?
> It should control the restoring of FS _on signal delivery_, not only
> on sigreturn()! So how uc_flags can be used for this at all?
Ah, you want it restored on signal delivery. What would it be
restored to? ISTM that can be done easily enough in user code, so
maybe we should leave it to user code.
--Andy
On Thu, Aug 13, 2015 at 11:41 AM, Andy Lutomirski <[email protected]> wrote:
> On Thu, Aug 13, 2015 at 11:35 AM, Linus Torvalds
> <[email protected]> wrote:
>>
>> Ok. So I'm inclined to do the bigger revert, just to fix the compile
>> issue. It would be crazy to force some silly autoconf script for
>> random header info.
>
> Yeah, probably makes sense, but one of us should explicitly test CRIU
> (both new and old versions) on the result. CRIU does interesting
> things involving sigcontext and protocol buffers.
I've reverted it in the -git tree, I'm adding Pavel and Cyrill to the
cc, to see if they can check the impact on CRIU..
Pavel/Cyrill? Current top-of-tree (but that will change) commit
ed596cde9425 ("Revert x86 sigcontext cleanups").
Linus
13.08.2015 22:37, Linus Torvalds пишет:
> On Thu, Aug 13, 2015 at 12:13 PM, Stas Sergeev <[email protected]> wrote:
>> As for the compilation failure - I am surprised you even care.
>> I thought the "we don't break userspace" covers only run-time,
>> not compile-time. Oh well.
> I definitely care.
>
> Compile issues may be slightly lower on my radar, but the basic rule
> should be that upgrading a kernel shouldn't cause problems.
It doesn't: fedora provides a "sanitized up" version of sigcontext.h
in /usr/include/bits, which comes from glibc-headers-2.21-7.fc22.x86_64.
So it seems the "sanitized up" headers come from glibc, which
means all other distros would have that too.
> The only exception is for stuff that gets very intimate with the
> kernel itself - ie things like external modules etc. Those we don't
> really try to cater to, or care about.
>
> Things that actually include internal kernel headers tend to have just
> themselves to blame and historically we really didn't care very much
> (long long ago we had issues with ext3fs-tools etc until those kinds
> of things just ended up making their own copies).
>
> But the point of the uabi headers was that even _that_ is supposed to
> actually work, at least for the limited case of those headers.. So if
> a uabi header change causes problems, we really *should* care, because
> otherwise, what was the point of that whole uabi rework, really?
I thought this is mainly to help glibc and distributors to
provide a sanitized-up versions.
If it was for something else, then this "something else" didn't
reach the user PCs yet. :) Fedora-22 still has the sanitized
version in the glibc-headers package, so no compilation break
because of your change was ever actually observed.
On Thu, Aug 13, 2015 at 12:59 PM, Stas Sergeev <[email protected]> wrote:
>
> It doesn't: fedora provides a "sanitized up" version of sigcontext.h
> in /usr/include/bits, which comes from glibc-headers-2.21-7.fc22.x86_64.
> So it seems the "sanitized up" headers come from glibc, which
> means all other distros would have that too.
Yes. Except the whole point of uapi was that in the long term the
glibc people should just be able to pick up the kernel ones directly.
And while that may involve some further editing, it sure as hell
shouldn't involve "oh, I know, this got renamed, so now I have to
rename it back". That would be crazy.
Linus
On Thu, Aug 13, 2015 at 12:53:03PM -0700, Linus Torvalds wrote:
> On Thu, Aug 13, 2015 at 11:41 AM, Andy Lutomirski <[email protected]> wrote:
> > On Thu, Aug 13, 2015 at 11:35 AM, Linus Torvalds
> > <[email protected]> wrote:
> >>
> >> Ok. So I'm inclined to do the bigger revert, just to fix the compile
> >> issue. It would be crazy to force some silly autoconf script for
> >> random header info.
> >
> > Yeah, probably makes sense, but one of us should explicitly test CRIU
> > (both new and old versions) on the result. CRIU does interesting
> > things involving sigcontext and protocol buffers.
>
> I've reverted it in the -git tree, I'm adding Pavel and Cyrill to the
> cc, to see if they can check the impact on CRIU..
>
> Pavel/Cyrill? Current top-of-tree (but that will change) commit
> ed596cde9425 ("Revert x86 sigcontext cleanups").
If only I'm not missin something obvious this should not hurt us.
But I gonna build test kernel and check to be sure tomorrow, ok?
13.08.2015 22:49, Andy Lutomirski пишет:
> On Aug 13, 2015 12:05 PM, "Stas Sergeev" <[email protected]> wrote:
>> 13.08.2015 21:41, Andy Lutomirski пишет:
>>
>>> Stas: I think uc_flags is okay. We don't currently read it during
>>> sigreturn, but I see no reason that we can't start reading it.
>> Andy, we definitely have some communication discontinuity here. :)
>> The point is not sigreturn. If we are talking about the flags that
>> will in the future control also TLS, how would you limit it to sigreturn()?
>> It should control the restoring of FS _on signal delivery_, not only
>> on sigreturn()! So how uc_flags can be used for this at all?
> Ah, you want it restored on signal delivery. What would it be
> restored to?
Null descriptor and TLS base in MSR I guess, no?
> ISTM that can be done easily enough in user code, so
> maybe we should leave it to user code.
But it is actually not.
gcc relies of fs pointing to TLS on the function prolog, so
the asm signal handlers again?
And there are just too many trickery for an asm handler.
Should it do the syscall to set fs base via MSR? And to what
value? Why do you think the user should mess with all this
pain? It is just much easier to do on a kernel side, is it not?
And IMHO this is the kernel's responsibility to adhere to the
ABI constraints when entering the signal handler, and the
ABI says fs should point to TLS.
On Thu, Aug 13, 2015 at 1:08 PM, Cyrill Gorcunov <[email protected]> wrote:
>
> If only I'm not missin something obvious this should not hurt us.
> But I gonna build test kernel and check to be sure tomorrow, ok?
Thanks,
Linus
On 08/13/15 13:09, Linus Torvalds wrote:
> On Thu, Aug 13, 2015 at 1:08 PM, Cyrill Gorcunov <[email protected]> wrote:
>> If only I'm not missin something obvious this should not hurt us.
>> But I gonna build test kernel and check to be sure tomorrow, ok?
> Thanks,
>
> Linus
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
I am curious about what's supposed to happen normally on signal delivery.
Is SS a register that's supposed to be preserved like EIP/RIP and CS
when a signal is delivered?
On Thu, Aug 13, 2015 at 2:42 PM, Raymond Jennings <[email protected]> wrote:
>
> I am curious about what's supposed to happen normally on signal delivery.
>
> Is SS a register that's supposed to be preserved like EIP/RIP and CS when a
> signal is delivered?
What exactly does "supposed" mean?
On x86-64, we traditionally haven't touched SS, because it doesn't
really matter in 64-bit long mode. And apparently dosemu depended on
that behavior.
So clearly, we're not "supposed" to save/restore it. Because reality
matters a hell of a lot more than any theoretical arguments.
Linus
On 08/13/15 14:46, Linus Torvalds wrote:
> On Thu, Aug 13, 2015 at 2:42 PM, Raymond Jennings <[email protected]> wrote:
>> I am curious about what's supposed to happen normally on signal delivery.
>>
>> Is SS a register that's supposed to be preserved like EIP/RIP and CS when a
>> signal is delivered?
> What exactly does "supposed" mean?
Basically, when a process/thread receives a signal, what happens to its
registers?
> So clearly, we're not "supposed" to save/restore it. Because reality
> matters a hell of a lot more than any theoretical arguments.
So it still counts as a regression if the kernel pulls the rug out from
under someone that was relying on undocumented or buggy behavior?
14.08.2015 00:46, Linus Torvalds пишет:
> On Thu, Aug 13, 2015 at 2:42 PM, Raymond Jennings <[email protected]> wrote:
>> I am curious about what's supposed to happen normally on signal delivery.
>>
>> Is SS a register that's supposed to be preserved like EIP/RIP and CS when a
>> signal is delivered?
> What exactly does "supposed" mean?
>
> On x86-64, we traditionally haven't touched SS, because it doesn't
> really matter in 64-bit long mode. And apparently dosemu depended on
> that behavior.
>
> So clearly, we're not "supposed" to save/restore it. Because reality
> matters a hell of a lot more than any theoretical arguments.
Unless you introduce some clever flag to explicitly request its restoring.
There is another problem as well which is that gcc assumes
FS base to point to TLS at function prolog. Since FS is not
restored too, the only suggestion I get is to write a sighandlers
in asm... I wonder if someone really should write a sighandler in
asm to restore FS base manually with a syscall.
So I think the reality is asking for a new flag. :)
14.08.2015 01:01, Raymond Jennings пишет:
>
>
> On 08/13/15 14:46, Linus Torvalds wrote:
>> On Thu, Aug 13, 2015 at 2:42 PM, Raymond Jennings
>> <[email protected]> wrote:
>>> I am curious about what's supposed to happen normally on signal
>>> delivery.
>>>
>>> Is SS a register that's supposed to be preserved like EIP/RIP and CS
>>> when a
>>> signal is delivered?
>> What exactly does "supposed" mean?
> Basically, when a process/thread receives a signal, what happens to
> its registers?
>> So clearly, we're not "supposed" to save/restore it. Because reality
>> matters a hell of a lot more than any theoretical arguments.
> So it still counts as a regression if the kernel pulls the rug out
> from under someone that was relying on undocumented or buggy behavior?
>
You probably want to read the whole thread...
Or start from here:
https://lkml.org/lkml/2015/8/12/800
On Thu, Aug 13, 2015 at 3:02 PM, Stas Sergeev <[email protected]> wrote:
> 14.08.2015 00:46, Linus Torvalds пишет:
>>
>> On Thu, Aug 13, 2015 at 2:42 PM, Raymond Jennings <[email protected]>
>> wrote:
>>>
>>> I am curious about what's supposed to happen normally on signal delivery.
>>>
>>> Is SS a register that's supposed to be preserved like EIP/RIP and CS when
>>> a
>>> signal is delivered?
>>
>> What exactly does "supposed" mean?
>>
>> On x86-64, we traditionally haven't touched SS, because it doesn't
>> really matter in 64-bit long mode. And apparently dosemu depended on
>> that behavior.
>>
>> So clearly, we're not "supposed" to save/restore it. Because reality
>> matters a hell of a lot more than any theoretical arguments.
>
> Unless you introduce some clever flag to explicitly request its restoring.
> There is another problem as well which is that gcc assumes
> FS base to point to TLS at function prolog. Since FS is not
> restored too, the only suggestion I get is to write a sighandlers
> in asm... I wonder if someone really should write a sighandler in
> asm to restore FS base manually with a syscall.
> So I think the reality is asking for a new flag. :)
I still don't see how this hypothetical flag would work.
The relevant task state consists of FS and the hidden FS base
register. If you build with -fstack-protector, GCC really wants the
FS base register to point to the right place. So you can't change it
in the middle of a C function (because the prologue and epilogue will
fail to match).
Now suppose you set some magic flag and jump (via sigreturn,
trampoline, whatever) into DOS code. The DOS code loads 0x7 into FS
and then gets #GP. You land in a signal handler. As far as the
kernel's concerned, the FS base register is whatever the base of LDT
entry 0 is. What else is the kernel supposed to shove in there?
I think that making this work fully in the kernel would require a
full-blown FS equivalent of sigaltstack, and that seems like overkill.
Now, if it turns out that Oracle or whoever wants to write a JVM that
uses WRFSBASE but still handles signals using standard C signal
handlers, they're going to have the same problem. They, too, could
fix it in libc, or someone could come up with an in-kernel mechanism
that gets enabled at the same time as WRFSBASE and is careful not to
break existing code.
I really think that we want to have a single, coherent change whenever
WRFSBASE and WRGSBASE get enabled that covers all the bases. None of
the patch sets so far have done that. But this cuts both ways: I
don't think we should change FS and GS behavior in signal handlers
until we are confident that we know what's happening with the FSGSBASE
instructions.
--Andy
14.08.2015 01:11, Andy Lutomirski пишет:
> On Thu, Aug 13, 2015 at 3:02 PM, Stas Sergeev <[email protected]> wrote:
>> 14.08.2015 00:46, Linus Torvalds пишет:
>>> On Thu, Aug 13, 2015 at 2:42 PM, Raymond Jennings <[email protected]>
>>> wrote:
>>>> I am curious about what's supposed to happen normally on signal delivery.
>>>>
>>>> Is SS a register that's supposed to be preserved like EIP/RIP and CS when
>>>> a
>>>> signal is delivered?
>>> What exactly does "supposed" mean?
>>>
>>> On x86-64, we traditionally haven't touched SS, because it doesn't
>>> really matter in 64-bit long mode. And apparently dosemu depended on
>>> that behavior.
>>>
>>> So clearly, we're not "supposed" to save/restore it. Because reality
>>> matters a hell of a lot more than any theoretical arguments.
>> Unless you introduce some clever flag to explicitly request its restoring.
>> There is another problem as well which is that gcc assumes
>> FS base to point to TLS at function prolog. Since FS is not
>> restored too, the only suggestion I get is to write a sighandlers
>> in asm... I wonder if someone really should write a sighandler in
>> asm to restore FS base manually with a syscall.
>> So I think the reality is asking for a new flag. :)
> I still don't see how this hypothetical flag would work.
>
> The relevant task state consists of FS and the hidden FS base
> register. If you build with -fstack-protector, GCC really wants the
> FS base register to point to the right place. So you can't change it
> in the middle of a C function (because the prologue and epilogue will
> fail to match).
>
> Now suppose you set some magic flag and jump (via sigreturn,
> trampoline, whatever) into DOS code. The DOS code loads 0x7 into FS
> and then gets #GP. You land in a signal handler. As far as the
> kernel's concerned, the FS base register is whatever the base of LDT
> entry 0 is. What else is the kernel supposed to shove in there?
The same as what happens when you do in userspace:
---
asm ("mov $0,%%fs\n");
prctl(ARCH_SET_FS, my_tls_base);
---
This was the trick I did before gcc started to use FS in prolog,
now I have to do this in asm.
But how simpler for the kernel is to do the same?
> I think that making this work fully in the kernel would require a
> full-blown FS equivalent of sigaltstack, and that seems like overkill.
Setting selector and base is what you call an "equivalent of sigaltstack"?
On Thu, Aug 13, 2015 at 3:25 PM, Stas Sergeev <[email protected]> wrote:
> 14.08.2015 01:11, Andy Lutomirski пишет:
>
>> Now suppose you set some magic flag and jump (via sigreturn,
>> trampoline, whatever) into DOS code. The DOS code loads 0x7 into FS
>> and then gets #GP. You land in a signal handler. As far as the
>> kernel's concerned, the FS base register is whatever the base of LDT
>> entry 0 is. What else is the kernel supposed to shove in there?
>
> The same as what happens when you do in userspace:
> ---
> asm ("mov $0,%%fs\n");
> prctl(ARCH_SET_FS, my_tls_base);
> ---
>
> This was the trick I did before gcc started to use FS in prolog,
> now I have to do this in asm.
> But how simpler for the kernel is to do the same?
>
>> I think that making this work fully in the kernel would require a
>> full-blown FS equivalent of sigaltstack, and that seems like overkill.
>
> Setting selector and base is what you call an "equivalent of sigaltstack"?
Yes. sigaltstack says "hey, kernel! here's my SP for signal
handling." I think we'd need something similar to tell the kernel
what my_tls_base is. Using the most recent thing passed to
ARCH_SET_FS is no good because WRFSBASE systems might not use
ARCH_SET_FS, and we can't break DOSEMU on Ivy Bridge and newer as soon
as we enable WRFSBASE.
--Andy
14.08.2015 01:29, Andy Lutomirski пишет:
> On Thu, Aug 13, 2015 at 3:25 PM, Stas Sergeev <[email protected]> wrote:
>> 14.08.2015 01:11, Andy Lutomirski пишет:
>>
>>> Now suppose you set some magic flag and jump (via sigreturn,
>>> trampoline, whatever) into DOS code. The DOS code loads 0x7 into FS
>>> and then gets #GP. You land in a signal handler. As far as the
>>> kernel's concerned, the FS base register is whatever the base of LDT
>>> entry 0 is. What else is the kernel supposed to shove in there?
>> The same as what happens when you do in userspace:
>> ---
>> asm ("mov $0,%%fs\n");
>> prctl(ARCH_SET_FS, my_tls_base);
>> ---
>>
>> This was the trick I did before gcc started to use FS in prolog,
>> now I have to do this in asm.
>> But how simpler for the kernel is to do the same?
>>
>>> I think that making this work fully in the kernel would require a
>>> full-blown FS equivalent of sigaltstack, and that seems like overkill.
>> Setting selector and base is what you call an "equivalent of sigaltstack"?
> Yes. sigaltstack says "hey, kernel! here's my SP for signal
> handling." I think we'd need something similar to tell the kernel
> what my_tls_base is. Using the most recent thing passed to
> ARCH_SET_FS is no good because WRFSBASE systems might not use
> ARCH_SET_FS, and we can't break DOSEMU on Ivy Bridge and newer as soon
> as we enable WRFSBASE.
If someone uses WRFSBASE and wants things to be preserved
in a sighandler, he'll just not set the aforementioned flag. No regression.
Whoever wants to use that flag properly, will not use WRFSBASE,
and will use ARCH_SET_FS or set_thread_area().
What exactly breakage do you have in mind?
On Thu, Aug 13, 2015 at 3:51 PM, Stas Sergeev <[email protected]> wrote:
> 14.08.2015 01:29, Andy Lutomirski пишет:
>>
>> On Thu, Aug 13, 2015 at 3:25 PM, Stas Sergeev <[email protected]> wrote:
>>>
>>> 14.08.2015 01:11, Andy Lutomirski пишет:
>>>
>>>> Now suppose you set some magic flag and jump (via sigreturn,
>>>> trampoline, whatever) into DOS code. The DOS code loads 0x7 into FS
>>>> and then gets #GP. You land in a signal handler. As far as the
>>>> kernel's concerned, the FS base register is whatever the base of LDT
>>>> entry 0 is. What else is the kernel supposed to shove in there?
>>>
>>> The same as what happens when you do in userspace:
>>> ---
>>> asm ("mov $0,%%fs\n");
>>> prctl(ARCH_SET_FS, my_tls_base);
>>> ---
>>>
>>> This was the trick I did before gcc started to use FS in prolog,
>>> now I have to do this in asm.
>>> But how simpler for the kernel is to do the same?
>>>
>>>> I think that making this work fully in the kernel would require a
>>>> full-blown FS equivalent of sigaltstack, and that seems like overkill.
>>>
>>> Setting selector and base is what you call an "equivalent of
>>> sigaltstack"?
>>
>> Yes. sigaltstack says "hey, kernel! here's my SP for signal
>> handling." I think we'd need something similar to tell the kernel
>> what my_tls_base is. Using the most recent thing passed to
>> ARCH_SET_FS is no good because WRFSBASE systems might not use
>> ARCH_SET_FS, and we can't break DOSEMU on Ivy Bridge and newer as soon
>> as we enable WRFSBASE.
>
> If someone uses WRFSBASE and wants things to be preserved
> in a sighandler, he'll just not set the aforementioned flag. No regression.
> Whoever wants to use that flag properly, will not use WRFSBASE,
> and will use ARCH_SET_FS or set_thread_area().
> What exactly breakage do you have in mind?
DOSEMU, when you set that flag, WRFSBASE gets enabled, and glibc's
threading library starts using WRFSBASE instead of arch_prctl.
--Andy
On Thu, Aug 13, 2015 at 3:01 PM, Raymond Jennings <[email protected]> wrote:
>
> So it still counts as a regression if the kernel pulls the rug out from
> under someone that was relying on undocumented or buggy behavior?
Absolutely. There are no excuses for regressions. If the code was
badly written and left itself open to user space "misusing" it, it's
our problem. Especially if the code was badly written to begin with,
and user space worked around our bad code.
We don't then "fix" code and blame user space for doing bad things.
In particular, we don't cop out and say "hey, you didn't follow the
docs" (whether they existed or not) or "you didn't do what we meant
you to do".
The _only_ thing that matters is that something broke.
Linus
14.08.2015 02:00, Andy Lutomirski пишет:
> On Thu, Aug 13, 2015 at 3:51 PM, Stas Sergeev <[email protected]> wrote:
>> 14.08.2015 01:29, Andy Lutomirski пишет:
>>> On Thu, Aug 13, 2015 at 3:25 PM, Stas Sergeev <[email protected]> wrote:
>>>> 14.08.2015 01:11, Andy Lutomirski пишет:
>>>>
>>>>> Now suppose you set some magic flag and jump (via sigreturn,
>>>>> trampoline, whatever) into DOS code. The DOS code loads 0x7 into FS
>>>>> and then gets #GP. You land in a signal handler. As far as the
>>>>> kernel's concerned, the FS base register is whatever the base of LDT
>>>>> entry 0 is. What else is the kernel supposed to shove in there?
>>>> The same as what happens when you do in userspace:
>>>> ---
>>>> asm ("mov $0,%%fs\n");
>>>> prctl(ARCH_SET_FS, my_tls_base);
>>>> ---
>>>>
>>>> This was the trick I did before gcc started to use FS in prolog,
>>>> now I have to do this in asm.
>>>> But how simpler for the kernel is to do the same?
>>>>
>>>>> I think that making this work fully in the kernel would require a
>>>>> full-blown FS equivalent of sigaltstack, and that seems like overkill.
>>>> Setting selector and base is what you call an "equivalent of
>>>> sigaltstack"?
>>> Yes. sigaltstack says "hey, kernel! here's my SP for signal
>>> handling." I think we'd need something similar to tell the kernel
>>> what my_tls_base is. Using the most recent thing passed to
>>> ARCH_SET_FS is no good because WRFSBASE systems might not use
>>> ARCH_SET_FS, and we can't break DOSEMU on Ivy Bridge and newer as soon
>>> as we enable WRFSBASE.
>> If someone uses WRFSBASE and wants things to be preserved
>> in a sighandler, he'll just not set the aforementioned flag. No regression.
>> Whoever wants to use that flag properly, will not use WRFSBASE,
>> and will use ARCH_SET_FS or set_thread_area().
>> What exactly breakage do you have in mind?
> DOSEMU, when you set that flag, WRFSBASE gets enabled, and glibc's
> threading library starts using WRFSBASE instead of arch_prctl.
Whoever wants to use the new flag, will need to call
prctl(ARCH_SET_FS, needed_tls) or the like, before altering FS.
"needed_tls" will be the current TLS in most cases.
This will make kernel to know what TLS should be restored in
sighandler. Yes, that's resembles sigaltstack to some degree,
but is simple? Nothing special on kernel side, and not a big
deal for the user to call prctl().
On Thu, Aug 13, 2015 at 4:05 PM, Linus Torvalds
<[email protected]> wrote:
>
> The _only_ thing that matters is that something broke.
To clarify: things like test programs etc don't matter. Real
applications, used by real users. That's what regressions cover. If
you have a workflow that isn't just some random kernel test thing, and
you depend on it, and we break it, the kernel is supposed to fix it.
There are some (very few) exceptions.
If it's a security issue, we may not be able to "fix" it, because
other concerns can obviously take precedence.
Also, sometimes the reports come in way too late - if you were running
some stable distro kernel for several years, and updated, and notice a
change that happened four years ago and modern applications now rely
on the _new_ behavior, we may not be able to fix the regression any
more.
But no, "it was an unintentional kernel bug and clearly just stupid
crap code, and we fixed it and now the kernel is much better and
cleaner" is not a valid reason for regressions. We'll go back to the
stupid and crap code if necessary, however much that may annoy us.
For an example of the kind of things we may have to do, see commits
64f371bc3107 autofs: make the autofsv5 packet file descriptor use
a packetized pipe
9883035ae7ed pipes: add a "packetized pipe" mode for writing
and just wonder at the insanity. That's the kinds of things that
happen when one application had actively worked around a bug in
compatibility handling, and then trying to "fix" that bug just caused
another application to break instead.
Linus
On 08/13/15 16:18, Linus Torvalds wrote:
> On Thu, Aug 13, 2015 at 4:05 PM, Linus Torvalds
> <[email protected]> wrote:
>> The _only_ thing that matters is that something broke.
> To clarify: things like test programs etc don't matter. Real
> applications, used by real users. That's what regressions cover. If
> you have a workflow that isn't just some random kernel test thing, and
> you depend on it, and we break it, the kernel is supposed to fix it.
>
> There are some (very few) exceptions.
>
> If it's a security issue, we may not be able to "fix" it, because
> other concerns can obviously take precedence.
>
> Also, sometimes the reports come in way too late - if you were running
> some stable distro kernel for several years, and updated, and notice a
> change that happened four years ago and modern applications now rely
> on the _new_ behavior, we may not be able to fix the regression any
> more.
>
> But no, "it was an unintentional kernel bug and clearly just stupid
> crap code, and we fixed it and now the kernel is much better and
> cleaner" is not a valid reason for regressions. We'll go back to the
> stupid and crap code if necessary, however much that may annoy us.
>
> For an example of the kind of things we may have to do, see commits
>
> 64f371bc3107 autofs: make the autofsv5 packet file descriptor use
> a packetized pipe
> 9883035ae7ed pipes: add a "packetized pipe" mode for writing
>
> and just wonder at the insanity. That's the kinds of things that
> happen when one application had actively worked around a bug in
> compatibility handling, and then trying to "fix" that bug just caused
> another application to break instead.
>
> Linus
Is there a way to temporally confine the bad crap code just to the
applications that depend on it, or does a userspace app latching onto
bad behavior effectively lock down the abi for the future?
I know that some features in the kernel get deprecated over a process
(devfs for example) once userspace is given an alternative...would there
be a process like that to deal with userspace code that is pinning a
piece of crap in the kernel?
14.08.2015 02:18, Linus Torvalds пишет:
> On Thu, Aug 13, 2015 at 4:05 PM, Linus Torvalds
> <[email protected]> wrote:
>> The _only_ thing that matters is that something broke.
> To clarify: things like test programs etc don't matter. Real
> applications, used by real users. That's what regressions cover. If
> you have a workflow that isn't just some random kernel test thing, and
> you depend on it, and we break it, the kernel is supposed to fix it.
>
> There are some (very few) exceptions.
>
> If it's a security issue, we may not be able to "fix" it, because
> other concerns can obviously take precedence.
>
> Also, sometimes the reports come in way too late - if you were running
> some stable distro kernel for several years, and updated, and notice a
> change that happened four years ago and modern applications now rely
> on the _new_ behavior, we may not be able to fix the regression any
> more.
>
> But no, "it was an unintentional kernel bug and clearly just stupid
> crap code, and we fixed it and now the kernel is much better and
> cleaner" is not a valid reason for regressions. We'll go back to the
> stupid and crap code if necessary, however much that may annoy us.
IMHO at least such ocasions should be listed somewhere,
and the software authors should be asked to fix their code.
Then you'll be able to re-visit the problem later.
It may be unreasonable to carry the compatibility hacks forever
if the software that needed it, released the fix 10 years ago.
In fact, in the cases I can remember, the kernel patches
were never reverted, see this for instance:
https://lkml.org/lkml/2005/3/26/21
And there were many other breakages too, for example when
kernel started to use top-down memory allocations. These
were because of the poor code in dosemu, and dosemu was
asked to fix the code. I guess the policy to never break userspace
was not existing back then. Or there is some margin below
which the code quality is considered not worth the troubles. :)
14.08.2015 02:00, Andy Lutomirski пишет:
> On Thu, Aug 13, 2015 at 3:51 PM, Stas Sergeev <[email protected]> wrote:
>> 14.08.2015 01:29, Andy Lutomirski пишет:
>>> On Thu, Aug 13, 2015 at 3:25 PM, Stas Sergeev <[email protected]> wrote:
>>>> 14.08.2015 01:11, Andy Lutomirski пишет:
>>>>
>>>>> Now suppose you set some magic flag and jump (via sigreturn,
>>>>> trampoline, whatever) into DOS code. The DOS code loads 0x7 into FS
>>>>> and then gets #GP. You land in a signal handler. As far as the
>>>>> kernel's concerned, the FS base register is whatever the base of LDT
>>>>> entry 0 is. What else is the kernel supposed to shove in there?
>>>> The same as what happens when you do in userspace:
>>>> ---
>>>> asm ("mov $0,%%fs\n");
>>>> prctl(ARCH_SET_FS, my_tls_base);
>>>> ---
>>>>
>>>> This was the trick I did before gcc started to use FS in prolog,
>>>> now I have to do this in asm.
>>>> But how simpler for the kernel is to do the same?
>>>>
>>>>> I think that making this work fully in the kernel would require a
>>>>> full-blown FS equivalent of sigaltstack, and that seems like overkill.
>>>> Setting selector and base is what you call an "equivalent of
>>>> sigaltstack"?
>>> Yes. sigaltstack says "hey, kernel! here's my SP for signal
>>> handling." I think we'd need something similar to tell the kernel
>>> what my_tls_base is. Using the most recent thing passed to
>>> ARCH_SET_FS is no good because WRFSBASE systems might not use
>>> ARCH_SET_FS, and we can't break DOSEMU on Ivy Bridge and newer as soon
>>> as we enable WRFSBASE.
>> If someone uses WRFSBASE and wants things to be preserved
>> in a sighandler, he'll just not set the aforementioned flag. No regression.
>> Whoever wants to use that flag properly, will not use WRFSBASE,
>> and will use ARCH_SET_FS or set_thread_area().
>> What exactly breakage do you have in mind?
> DOSEMU, when you set that flag, WRFSBASE gets enabled, and glibc's
> threading library starts using WRFSBASE instead of arch_prctl.
Hmm, how about the following:
prctl(ARCH_SET_SIGNAL_FS, my_tls)
If my_tls==NULL - use current fsbase (including one of WRFSBASE).
If my_tls==(void)-1 - don't restore.
Can this work?
On Thu, Aug 13, 2015 at 4:43 PM, Stas Sergeev <[email protected]> wrote:
> In fact, in the cases I can remember, the kernel patches
> were never reverted, see this for instance:
> https://lkml.org/lkml/2005/3/26/21
> And there were many other breakages too, for example when
> kernel started to use top-down memory allocations. These
> were because of the poor code in dosemu, and dosemu was
> asked to fix the code. I guess the policy to never break userspace
> was not existing back then. Or there is some margin below
> which the code quality is considered not worth the troubles. :)
Back in 2005 we may well not have been as strict about regressions as
we are now, no. The strict policy of no regressions actually
originally started mainly wrt suspend/resume issues, where the "fix
one machine, break another" kind of back-and-forth caused endless
problems, and meant that we didn't actually necessarily make any
forward progress, just moving a problem around.
(That said, I'd like to think that we've _always_ tried very hard to
not break stuff, it just wasn't necessarily the kind of very explicit
and hard rule that it is these days).
Also, there are certainly developers who push back on fixing the
regressions they caused. That is in fact the cause of some of my more
memorable explosions. But if I'm not brought into the discussion, and
the push-back happens on a mailing list, I may not even be aware of
the regression and the fact that a developer isn't willing to fix it.
I've also seen other projects be more willing to work around problems
like this than I personally would consider to be a good idea. For
example, we had some Wine regressions that broke a steam game or two,
and it was debugged on the steam and wine lists, and took a longish
time to actually even get to the attention of kernel people, because
the developers were actually willing to try to do an update and fix
their application to not do the thing that caused problems. Which I
certainly appreciate, but at the same time that doesn't necessarily
help the user who sits there with a game that just didn't work. So
even if an app is getting updated eventually, the kernel breakage
should be fixed.
So if some patch causes problems, and the author of the patch doesn't
acknowledge the problem or even if the application developer says "we
can work around that", always feel free to escalate the issue and
bring in upper maintainers.
That said, it *does* depend a bit on just how core the area is.
Sometimes it just gets too painful to fix things, and while the "no
regressions" rule is pretty damn close to the strictest rule we have,
there are never any completely absolute rules. As mentioned, there are
exceptions to the regression rule too, and sometimes the result might
just end up being "very few people are actually affected, and there's
a sufficiently good reason for the regression that we'll just cop
out".
But that should really be a very rare occurrence. We used to have
quite a lot of those kinds of issues with the GPU layer (resulting in
flag-days with the X server etc), and it really causes huge problems.
So there are no absolutes. But regressions are a serious problem, and
need to be treated as such. I will not _guarantee_ that we always fix
them, but I would hope we do our best.
Linus
On Thu, Aug 13, 2015 at 5:00 PM, Stas Sergeev <[email protected]> wrote:
> 14.08.2015 02:00, Andy Lutomirski пишет:
>
>> On Thu, Aug 13, 2015 at 3:51 PM, Stas Sergeev <[email protected]> wrote:
>>>
>>> 14.08.2015 01:29, Andy Lutomirski пишет:
>>>>
>>>> On Thu, Aug 13, 2015 at 3:25 PM, Stas Sergeev <[email protected]> wrote:
>>>>>
>>>>> 14.08.2015 01:11, Andy Lutomirski пишет:
>>>>>
>>>>>> Now suppose you set some magic flag and jump (via sigreturn,
>>>>>> trampoline, whatever) into DOS code. The DOS code loads 0x7 into FS
>>>>>> and then gets #GP. You land in a signal handler. As far as the
>>>>>> kernel's concerned, the FS base register is whatever the base of LDT
>>>>>> entry 0 is. What else is the kernel supposed to shove in there?
>>>>>
>>>>> The same as what happens when you do in userspace:
>>>>> ---
>>>>> asm ("mov $0,%%fs\n");
>>>>> prctl(ARCH_SET_FS, my_tls_base);
>>>>> ---
>>>>>
>>>>> This was the trick I did before gcc started to use FS in prolog,
>>>>> now I have to do this in asm.
>>>>> But how simpler for the kernel is to do the same?
>>>>>
>>>>>> I think that making this work fully in the kernel would require a
>>>>>> full-blown FS equivalent of sigaltstack, and that seems like overkill.
>>>>>
>>>>> Setting selector and base is what you call an "equivalent of
>>>>> sigaltstack"?
>>>>
>>>> Yes. sigaltstack says "hey, kernel! here's my SP for signal
>>>> handling." I think we'd need something similar to tell the kernel
>>>> what my_tls_base is. Using the most recent thing passed to
>>>> ARCH_SET_FS is no good because WRFSBASE systems might not use
>>>> ARCH_SET_FS, and we can't break DOSEMU on Ivy Bridge and newer as soon
>>>> as we enable WRFSBASE.
>>>
>>> If someone uses WRFSBASE and wants things to be preserved
>>> in a sighandler, he'll just not set the aforementioned flag. No
>>> regression.
>>> Whoever wants to use that flag properly, will not use WRFSBASE,
>>> and will use ARCH_SET_FS or set_thread_area().
>>> What exactly breakage do you have in mind?
>>
>> DOSEMU, when you set that flag, WRFSBASE gets enabled, and glibc's
>> threading library starts using WRFSBASE instead of arch_prctl.
>
> Hmm, how about the following:
>
> prctl(ARCH_SET_SIGNAL_FS, my_tls)
> If my_tls==NULL - use current fsbase (including one of WRFSBASE).
> If my_tls==(void)-1 - don't restore.
>
> Can this work?
Certainly, but why? ISTM user code should do this itself with a
little bit of asm unless there's a good reason it wouldn't work. A
good reason it wouldn't work is that high-performance applications
need this and an extra syscall is too slow, but IMO that would need
evidence.
--Andy
--
Andy Lutomirski
AMA Capital Management, LLC
On Thu, Aug 13, 2015 at 5:00 PM, Stas Sergeev <[email protected]> wrote:
> 14.08.2015 02:00, Andy Lutomirski пишет:
>>
>> DOSEMU, when you set that flag, WRFSBASE gets enabled, and glibc's
>> threading library starts using WRFSBASE instead of arch_prctl.
>
> Hmm, how about the following:
>
> prctl(ARCH_SET_SIGNAL_FS, my_tls)
> If my_tls==NULL - use current fsbase (including one of WRFSBASE).
> If my_tls==(void)-1 - don't restore.
>
> Can this work?
I'm really inclined to wonder whether we need the change and such a flag at all.
Basically, no _normal_ application will ever play with segments at all
on x86-64. So our current behavior of not touching any segments at all
for signal handling would seem to be the right thing to do - because
it handles all the sane cases optimally.
And applications that *do* play with segments very much know they do
so, and we already put the onus on *them* to save/restore segments.
That's how dosemu clearly works today.
So why not just keep to that policy? It has worked fairly well so
far. Only when we tried to change that policy did we hit these
problems, because existing applications obviously already live with
what we do (or rather, what we _don't_ do) right now...
Linus
14.08.2015 03:05, Andy Lutomirski пишет:
> On Thu, Aug 13, 2015 at 5:00 PM, Stas Sergeev <[email protected]> wrote:
>> 14.08.2015 02:00, Andy Lutomirski пишет:
>>
>>> On Thu, Aug 13, 2015 at 3:51 PM, Stas Sergeev <[email protected]> wrote:
>>>> 14.08.2015 01:29, Andy Lutomirski пишет:
>>>>> On Thu, Aug 13, 2015 at 3:25 PM, Stas Sergeev <[email protected]> wrote:
>>>>>> 14.08.2015 01:11, Andy Lutomirski пишет:
>>>>>>
>>>>>>> Now suppose you set some magic flag and jump (via sigreturn,
>>>>>>> trampoline, whatever) into DOS code. The DOS code loads 0x7 into FS
>>>>>>> and then gets #GP. You land in a signal handler. As far as the
>>>>>>> kernel's concerned, the FS base register is whatever the base of LDT
>>>>>>> entry 0 is. What else is the kernel supposed to shove in there?
>>>>>> The same as what happens when you do in userspace:
>>>>>> ---
>>>>>> asm ("mov $0,%%fs\n");
>>>>>> prctl(ARCH_SET_FS, my_tls_base);
>>>>>> ---
>>>>>>
>>>>>> This was the trick I did before gcc started to use FS in prolog,
>>>>>> now I have to do this in asm.
>>>>>> But how simpler for the kernel is to do the same?
>>>>>>
>>>>>>> I think that making this work fully in the kernel would require a
>>>>>>> full-blown FS equivalent of sigaltstack, and that seems like overkill.
>>>>>> Setting selector and base is what you call an "equivalent of
>>>>>> sigaltstack"?
>>>>> Yes. sigaltstack says "hey, kernel! here's my SP for signal
>>>>> handling." I think we'd need something similar to tell the kernel
>>>>> what my_tls_base is. Using the most recent thing passed to
>>>>> ARCH_SET_FS is no good because WRFSBASE systems might not use
>>>>> ARCH_SET_FS, and we can't break DOSEMU on Ivy Bridge and newer as soon
>>>>> as we enable WRFSBASE.
>>>> If someone uses WRFSBASE and wants things to be preserved
>>>> in a sighandler, he'll just not set the aforementioned flag. No
>>>> regression.
>>>> Whoever wants to use that flag properly, will not use WRFSBASE,
>>>> and will use ARCH_SET_FS or set_thread_area().
>>>> What exactly breakage do you have in mind?
>>> DOSEMU, when you set that flag, WRFSBASE gets enabled, and glibc's
>>> threading library starts using WRFSBASE instead of arch_prctl.
>> Hmm, how about the following:
>>
>> prctl(ARCH_SET_SIGNAL_FS, my_tls)
>> If my_tls==NULL - use current fsbase (including one of WRFSBASE).
>> If my_tls==(void)-1 - don't restore.
>>
>> Can this work?
> Certainly, but why?
For example because you can as well do:
prctl(ARCH_SET_SIGNAL_SS, 0)
which will mean "restore ss in sighandler to its current value",
and this will fix the regression right here right now, without
any lar heuristic, and will keep the correct behaviour of always
restoring ss for those who need that.
> ISTM user code should do this itself with a
> little bit of asm unless there's a good reason it wouldn't work.
Any example of that asm?
I can even code up the ARCH_SET_SIGNAL_SS patch if you want,
it seems absolutely trivial, much simpler than the aforementioned asm,
much simpler than the patches you propose.
So my question is more like "why not", rather than "why".
Just because it is simple and clean, IMHO.
On Thu, Aug 13, 2015 at 5:08 PM, Linus Torvalds
<[email protected]> wrote:
> On Thu, Aug 13, 2015 at 5:00 PM, Stas Sergeev <[email protected]> wrote:
>> 14.08.2015 02:00, Andy Lutomirski пишет:
>>>
>>> DOSEMU, when you set that flag, WRFSBASE gets enabled, and glibc's
>>> threading library starts using WRFSBASE instead of arch_prctl.
>>
>> Hmm, how about the following:
>>
>> prctl(ARCH_SET_SIGNAL_FS, my_tls)
>> If my_tls==NULL - use current fsbase (including one of WRFSBASE).
>> If my_tls==(void)-1 - don't restore.
>>
>> Can this work?
>
> I'm really inclined to wonder whether we need the change and such a flag at all.
>
> Basically, no _normal_ application will ever play with segments at all
> on x86-64. So our current behavior of not touching any segments at all
> for signal handling would seem to be the right thing to do - because
> it handles all the sane cases optimally.
>
> And applications that *do* play with segments very much know they do
> so, and we already put the onus on *them* to save/restore segments.
> That's how dosemu clearly works today.
I agree for all but CS and SS, which are special. CS is fine already.
The way that DOSEMU works around SS it is hideous: it just gives up on
sigreturn working and fixes the segments with IRET. (Also, if DOSEMU
ever wants to get ESP[31:16] right, it *can't*: only the kernel can
usefully do espfix64, and DOSEMU can't get the kernel to return from
64-bit code to 16-bit code, because we zap SS. DOSEMU fudges it by
forcibly zeroing ESP[31:16]), but that's not a full solution and I
wouldn't be surprised if something breaks as a result.
So yes, it mostly works. It also sucks, and it makes it extremely
unpleasant for any other program to do this. I'd argue that keeping
things like the sigreturn_64 test working is quite valuable, because
it's a royal PITA to exercise this code cleanly without proper control
of SS. Unfortunately, making it hard for sigreturn_64 to exercise
this stuff doesn't mean that the bad guys can't do it, because they'll
use malformed ELF files, or ptrace, or stupid modify_ldt races (except
I hopefully fixed those), or x32, or compat, or some other hack I
haven't thought of yet, and they'll hit the same bugs. I've lost
count of the number of bugs of various severities that the sigreturn
tests have shaken out *and* caught in patches that were emailed in.
We could say "forget sigreturn_64, use sigreturn_32 instead", but that
can't exercise !IA32_EMULATION kernels, and that code is a bit
different.
(BadIRET, for example, is nominally unreachable from 64-bit code
without modify_ldt and the sigreturn fix, except that I think know of
one really nasty way to do it. I have no intention of implementing
that, so keeping selftests working nicely makes me *much* more
confident.)
Obviously, if we reintroduce SS restoration, we need to do it much
more carefully. My RFC patches are an attempt to do that, but it
needs a lot of care to make sure all the bases are covered.
--Andy
On Thu, Aug 13, 2015 at 5:17 PM, Stas Sergeev <[email protected]> wrote:
>
> For example because you can as well do:
> prctl(ARCH_SET_SIGNAL_SS, 0)
> which will mean "restore ss in sighandler to its current value",
I really think a prctl() is the wrong thing to do.
If you want a signal handler to save/restore segments, I think it
should be a SA_xyz flag to sigaction() (the way we have SA_RESTART
etc). And off by default because of the obvious compatibility issues.
Linus
On Thu, Aug 13, 2015 at 5:24 PM, Andy Lutomirski <[email protected]> wrote:
>
> So yes, it mostly works. It also sucks, and it makes it extremely
> unpleasant for any other program to do this.
Well, I'd argue that
(a) we don't really _want_ any other programs to do that
(b) but yeah, we might want to make it easier to do cleanly and
explicitly for dosemu (and any other possible programs that do want to
do it)
so if this is for just things like dosemu and for test applications
etc, let's just introduce something like SA_RESTORE_SS for those to
explicitly opt in to "I want this signal handler to save and restore
SS".
That's how we have handled these kinds of things in the past (ie
SA_SIGINFO extends the stack frame with siginfo, sa_restorer says that
user space will use its own sigrestore function, etc etc).
Yeah, the SA_xyzzy flags aren't exactly _common_, but it's how we've
handled these kinds of compat issues in the past (SA_ONESHOT, SA_MASK,
and SA_NOCLDWAIT are obviously about the traditional BSD/SysV
differences, and SA_RESTORER is a Linux internal compatibility thing
etc)
Linus
14.08.2015 03:27, Linus Torvalds пишет:
> On Thu, Aug 13, 2015 at 5:17 PM, Stas Sergeev <[email protected]> wrote:
>> For example because you can as well do:
>> prctl(ARCH_SET_SIGNAL_SS, 0)
>> which will mean "restore ss in sighandler to its current value",
> I really think a prctl() is the wrong thing to do.
>
> If you want a signal handler to save/restore segments, I think it
> should be a SA_xyz flag to sigaction() (the way we have SA_RESTART
Yes, I was proposing the new sigaction() flag in this thread
already too. But at the end, prctl() looks better to me because
it allows to pass the TLS value to use when restoring FS.
The thing is that I am trying to find the similar treatment for
both the SS and FS problems. If you don't think they need a
similar treatment, then perhaps the Andy's patch is enough.
> etc). And off by default because of the obvious compatibility issues.
Of course.
So, what we have right now (in the latest Andy's patch) is:
1. lar heuristics
2. new uc_flags flag
What it solves: dosemu's regression.
What prctl() can give:
- fix to dosemu's regression
- fix to the TLS problem in the future
- no hack and heuristics
With SA_xyz you can only solve the SS problem, so it is
probably not any better than the uc_flags things coded
up by Andy.
On Thu, Aug 13, 2015 at 5:50 PM, Stas Sergeev <[email protected]> wrote:
> 14.08.2015 03:27, Linus Torvalds пишет:
>>
>> On Thu, Aug 13, 2015 at 5:17 PM, Stas Sergeev <[email protected]> wrote:
>>>
>>> For example because you can as well do:
>>> prctl(ARCH_SET_SIGNAL_SS, 0)
>>> which will mean "restore ss in sighandler to its current value",
>>
>> I really think a prctl() is the wrong thing to do.
>>
>> If you want a signal handler to save/restore segments, I think it
>> should be a SA_xyz flag to sigaction() (the way we have SA_RESTART
>
> Yes, I was proposing the new sigaction() flag in this thread
> already too. But at the end, prctl() looks better to me because
> it allows to pass the TLS value to use when restoring FS.
> The thing is that I am trying to find the similar treatment for
> both the SS and FS problems. If you don't think they need a
> similar treatment, then perhaps the Andy's patch is enough.
>
>> etc). And off by default because of the obvious compatibility issues.
>
> Of course.
>
> So, what we have right now (in the latest Andy's patch) is:
> 1. lar heuristics
> 2. new uc_flags flag
>
> What it solves: dosemu's regression.
>
> What prctl() can give:
> - fix to dosemu's regression
> - fix to the TLS problem in the future
> - no hack and heuristics
>
> With SA_xyz you can only solve the SS problem, so it is
> probably not any better than the uc_flags things coded
> up by Andy.
I'm leaning slightly toward LAR heuristic + SA_SAVE_SS. Using a
sigaction flag is a bit less weird than using uc_flags. It's also
kind of nice that it's more composable -- you can install a SIGUSR1
handler that's just normal code and set SA_SAVE_SS and it'll work,
whereas with uc_flags you need to explicitly twiddle uc_flags in the
handler.
Unfortunately, I don't think we were clever enough to allow this to be
probed easily -- we silently ignore unrecognized sa_flags bits.
--Andy
--
Andy Lutomirski
AMA Capital Management, LLC
14.08.2015 04:21, Andy Lutomirski пишет:
> On Thu, Aug 13, 2015 at 5:50 PM, Stas Sergeev <[email protected]> wrote:
>> 14.08.2015 03:27, Linus Torvalds пишет:
>>> On Thu, Aug 13, 2015 at 5:17 PM, Stas Sergeev <[email protected]> wrote:
>>>> For example because you can as well do:
>>>> prctl(ARCH_SET_SIGNAL_SS, 0)
>>>> which will mean "restore ss in sighandler to its current value",
>>> I really think a prctl() is the wrong thing to do.
>>>
>>> If you want a signal handler to save/restore segments, I think it
>>> should be a SA_xyz flag to sigaction() (the way we have SA_RESTART
>> Yes, I was proposing the new sigaction() flag in this thread
>> already too. But at the end, prctl() looks better to me because
>> it allows to pass the TLS value to use when restoring FS.
>> The thing is that I am trying to find the similar treatment for
>> both the SS and FS problems. If you don't think they need a
>> similar treatment, then perhaps the Andy's patch is enough.
>>
>>> etc). And off by default because of the obvious compatibility issues.
>> Of course.
>>
>> So, what we have right now (in the latest Andy's patch) is:
>> 1. lar heuristics
>> 2. new uc_flags flag
>>
>> What it solves: dosemu's regression.
>>
>> What prctl() can give:
>> - fix to dosemu's regression
>> - fix to the TLS problem in the future
>> - no hack and heuristics
>>
>> With SA_xyz you can only solve the SS problem, so it is
>> probably not any better than the uc_flags things coded
>> up by Andy.
> I'm leaning slightly toward LAR heuristic + SA_SAVE_SS.
Stop right here, doesn't the SA_xyz allow to avoid the
lar heuristic? Why would you still need the lar heuristic then?
Just call it SA_RESTORE_SS instead of SA_SAVE_SS, and
the lar heuristic is gone.
> Unfortunately, I don't think we were clever enough to allow this to be
> probed easily -- we silently ignore unrecognized sa_flags bits.
Big deal, check the kversion. :)
Unforunately, in my eyes SA_xyz doesn't help with FS, so
whether it is better than uc_flags or not, is not what I care
about.
On Thu, Aug 13, 2015 at 6:32 PM, Stas Sergeev <[email protected]> wrote:
> 14.08.2015 04:21, Andy Lutomirski пишет:
>
>> On Thu, Aug 13, 2015 at 5:50 PM, Stas Sergeev <[email protected]> wrote:
>>>
>>> 14.08.2015 03:27, Linus Torvalds пишет:
>>>>
>>>> On Thu, Aug 13, 2015 at 5:17 PM, Stas Sergeev <[email protected]> wrote:
>>>>>
>>>>> For example because you can as well do:
>>>>> prctl(ARCH_SET_SIGNAL_SS, 0)
>>>>> which will mean "restore ss in sighandler to its current value",
>>>>
>>>> I really think a prctl() is the wrong thing to do.
>>>>
>>>> If you want a signal handler to save/restore segments, I think it
>>>> should be a SA_xyz flag to sigaction() (the way we have SA_RESTART
>>>
>>> Yes, I was proposing the new sigaction() flag in this thread
>>> already too. But at the end, prctl() looks better to me because
>>> it allows to pass the TLS value to use when restoring FS.
>>> The thing is that I am trying to find the similar treatment for
>>> both the SS and FS problems. If you don't think they need a
>>> similar treatment, then perhaps the Andy's patch is enough.
>>>
>>>> etc). And off by default because of the obvious compatibility issues.
>>>
>>> Of course.
>>>
>>> So, what we have right now (in the latest Andy's patch) is:
>>> 1. lar heuristics
>>> 2. new uc_flags flag
>>>
>>> What it solves: dosemu's regression.
>>>
>>> What prctl() can give:
>>> - fix to dosemu's regression
>>> - fix to the TLS problem in the future
>>> - no hack and heuristics
>>>
>>> With SA_xyz you can only solve the SS problem, so it is
>>> probably not any better than the uc_flags things coded
>>> up by Andy.
>>
>> I'm leaning slightly toward LAR heuristic + SA_SAVE_SS.
>
> Stop right here, doesn't the SA_xyz allow to avoid the
> lar heuristic? Why would you still need the lar heuristic then?
> Just call it SA_RESTORE_SS instead of SA_SAVE_SS, and
> the lar heuristic is gone.
The LAR heuristic is about five lines of code, and it makes signal
delivery more reliable. Sure, we could gate the "regs->ss =
__USER_DS" line on a flag, but why?
>
>> Unfortunately, I don't think we were clever enough to allow this to be
>> probed easily -- we silently ignore unrecognized sa_flags bits.
>
> Big deal, check the kversion. :)
Not so good. For example, if you made your DOSEMU patch to use the
saved SS check the version, then the backported revert would break
you.
--Andy
14.08.2015 04:37, Andy Lutomirski пишет:
> On Thu, Aug 13, 2015 at 6:32 PM, Stas Sergeev <[email protected]> wrote:
>> 14.08.2015 04:21, Andy Lutomirski пишет:
>>
>>> On Thu, Aug 13, 2015 at 5:50 PM, Stas Sergeev <[email protected]> wrote:
>>>> 14.08.2015 03:27, Linus Torvalds пишет:
>>>>> On Thu, Aug 13, 2015 at 5:17 PM, Stas Sergeev <[email protected]> wrote:
>>>>>> For example because you can as well do:
>>>>>> prctl(ARCH_SET_SIGNAL_SS, 0)
>>>>>> which will mean "restore ss in sighandler to its current value",
>>>>> I really think a prctl() is the wrong thing to do.
>>>>>
>>>>> If you want a signal handler to save/restore segments, I think it
>>>>> should be a SA_xyz flag to sigaction() (the way we have SA_RESTART
>>>> Yes, I was proposing the new sigaction() flag in this thread
>>>> already too. But at the end, prctl() looks better to me because
>>>> it allows to pass the TLS value to use when restoring FS.
>>>> The thing is that I am trying to find the similar treatment for
>>>> both the SS and FS problems. If you don't think they need a
>>>> similar treatment, then perhaps the Andy's patch is enough.
>>>>
>>>>> etc). And off by default because of the obvious compatibility issues.
>>>> Of course.
>>>>
>>>> So, what we have right now (in the latest Andy's patch) is:
>>>> 1. lar heuristics
>>>> 2. new uc_flags flag
>>>>
>>>> What it solves: dosemu's regression.
>>>>
>>>> What prctl() can give:
>>>> - fix to dosemu's regression
>>>> - fix to the TLS problem in the future
>>>> - no hack and heuristics
>>>>
>>>> With SA_xyz you can only solve the SS problem, so it is
>>>> probably not any better than the uc_flags things coded
>>>> up by Andy.
>>> I'm leaning slightly toward LAR heuristic + SA_SAVE_SS.
>> Stop right here, doesn't the SA_xyz allow to avoid the
>> lar heuristic? Why would you still need the lar heuristic then?
>> Just call it SA_RESTORE_SS instead of SA_SAVE_SS, and
>> the lar heuristic is gone.
> The LAR heuristic is about five lines of code, and it makes signal
> delivery more reliable. Sure, we could gate the "regs->ss =
> __USER_DS" line on a flag, but why?
Speed?
I'll let others decide on that.
My vote is to no heuristic if possible, and keeping the FS
problem in mind if possible (obviously, both are possible).
>>> Unfortunately, I don't think we were clever enough to allow this to be
>>> probed easily -- we silently ignore unrecognized sa_flags bits.
>> Big deal, check the kversion. :)
> Not so good. For example, if you made your DOSEMU patch to use the
> saved SS check the version, then the backported revert would break
> you.
I would be scared to imagine the program that probes
for everything, adding more and more probes with years,
instead of just saying "you need kernel version at least x.y.z".
If some functionality is reverted, another kversion check
can be added. This all is not new: dosemu's protected mode
implementation doesn't work on many kernels selectively.
IIRC it didn't work on 3.14 because the 16bit LDTs were
disallowed, and other versions had problems too. It is easier
to ban such kernels by versions instead of adding a run-time
probes, because you don't know beforehand what will break
in the next kernel to add a probe in advance, and when you
react to an existing breakage, it already doesn't matter.
The possibly of reverting a new functionality is similar to
getting a breakage in an existing functionality (or even have
lower probability), so I see no reason to choose different
treatments for these two.
But of course every author has own habits.
On Thu, Aug 13, 2015 at 01:09:47PM -0700, Linus Torvalds wrote:
> On Thu, Aug 13, 2015 at 1:08 PM, Cyrill Gorcunov <[email protected]> wrote:
> >
> > If only I'm not missin something obvious this should not hurt us.
> > But I gonna build test kernel and check to be sure tomorrow, ok?
Managed to test it. And criu works fine with the revert as expected.
Actually it's because of commit c6f2062935c8 Oleg made us a patch:
| commit 07dcf0dbb6ff97c255bc6b06569255a9479bccdd
| Author: Oleg Nesterov <[email protected]>
| Date: Thu Mar 19 19:14:00 2015 +0300
|
| restore/x86: restore_gpregs() needs to initialize ->ss as well
|
| Before the recent "x86_64,signal: Fix SS handling for signals delivered
| to 64-bit programs" kernel patch, sigreturn paths forgot to restore ->ss
| after return from the signal handler.
|
| Now that the kernel was fixed, restore_gpregs() has to initialize ->ss
| too, it is no longer ignored.
| ...
|
| +++ b/arch/x86/include/asm/restorer.h
| @@ -53,7 +53,7 @@ struct rt_sigcontext {
| unsigned short cs;
| unsigned short gs;
| unsigned short fs;
| - unsigned short __pad0;
| + unsigned short ss;
| unsigned long err;
| unsigned long trapno;
| unsigned long oldmask;
IOW we've been not setting up __pad0 which became ss
inside the kernel (in result we've been passing 0 here,
which caused the problem).
fwiw, we declare that new criu versions may require new
kernels to work but never promised that new criu gonna
be compatible with old kernels. That said if something
get changed inside sigcontext structure in future
we may update criu code as well.
On Thu, Aug 13, 2015 at 08:43:24AM -0700, Andy Lutomirski wrote:
...
> >
> > That rule hasn't gone anywhere.
> >
> > Does a plain revert just fix everything? Because if so, that's the
> > right thing to do, and we can just re-visit this later.
> >
> > I don't understand why Andy and Ingo are even discussing this. What
> > the f*ck, guys?
> >
>
> I'm trying to fix it without reverting. If that doesn't work, then we
> revert. Yesterday, I thought I had a reasonably clean fix, but it
> turned out that it only solved half of the problem.
>
> If we revert, I think I need to check what will break due to the
> revert. I need to check at least Wine, and we'll have to do something
> about all the selftests that will start failing. I also need to check
> CRIU, and IIRC CRIU has started using the new sigcontext SS in new
> versions.
Yes, we've tuned up our sigcontext structure and put ss into the place
where previously __pad0 were. After the revert the kernel simply ignores
this field again. But we never did any "weird" gaming over segment registers
for testing purposes, neither we note any application (in containers) which
does some weird things like dosemu.
>
> And, damnit, those selftests are *useful*. They've smoked out all
> kinds of problems. That's part of the reason I'd prefer not to revert
> if there's a better option.
On 08/14/2015 10:22 AM, Cyrill Gorcunov wrote:
> On Thu, Aug 13, 2015 at 01:09:47PM -0700, Linus Torvalds wrote:
>> On Thu, Aug 13, 2015 at 1:08 PM, Cyrill Gorcunov <[email protected]> wrote:
>>>
>>> If only I'm not missin something obvious this should not hurt us.
>>> But I gonna build test kernel and check to be sure tomorrow, ok?
>
> Managed to test it. And criu works fine with the revert as expected.
> Actually it's because of commit c6f2062935c8 Oleg made us a patch:
>
> | commit 07dcf0dbb6ff97c255bc6b06569255a9479bccdd
> | Author: Oleg Nesterov <[email protected]>
> | Date: Thu Mar 19 19:14:00 2015 +0300
> |
> | restore/x86: restore_gpregs() needs to initialize ->ss as well
> |
> | Before the recent "x86_64,signal: Fix SS handling for signals delivered
> | to 64-bit programs" kernel patch, sigreturn paths forgot to restore ->ss
> | after return from the signal handler.
> |
> | Now that the kernel was fixed, restore_gpregs() has to initialize ->ss
> | too, it is no longer ignored.
> | ...
> |
> | +++ b/arch/x86/include/asm/restorer.h
> | @@ -53,7 +53,7 @@ struct rt_sigcontext {
> | unsigned short cs;
> | unsigned short gs;
> | unsigned short fs;
> | - unsigned short __pad0;
> | + unsigned short ss;
> | unsigned long err;
> | unsigned long trapno;
> | unsigned long oldmask;
>
> IOW we've been not setting up __pad0 which became ss
> inside the kernel (in result we've been passing 0 here,
> which caused the problem).
>
> fwiw, we declare that new criu versions may require new
> kernels to work but never promised that new criu gonna
> be compatible with old kernels.
We did. Kernel 3.11 is still declared as supported (modulo new stuff
we added after 1.0 might not work, but basic sigframe mgmt is not one
of those "new" things).
> That said if something
> get changed inside sigcontext structure in future
> we may update criu code as well.
> .
>
On Fri, Aug 14, 2015 at 01:02:14PM +0300, Pavel Emelyanov wrote:
...
> > IOW we've been not setting up __pad0 which became ss
> > inside the kernel (in result we've been passing 0 here,
> > which caused the problem).
> >
> > fwiw, we declare that new criu versions may require new
> > kernels to work but never promised that new criu gonna
> > be compatible with old kernels.
>
> We did. Kernel 3.11 is still declared as supported (modulo new stuff
> we added after 1.0 might not work, but basic sigframe mgmt is not one
> of those "new" things).
This won't last forever. And I think it's pretty normal to require
a new criu version for new kernel.
Cyrill
14.08.2015 04:37, Andy Lutomirski пишет:
> On Thu, Aug 13, 2015 at 6:32 PM, Stas Sergeev <[email protected]> wrote:
>> 14.08.2015 04:21, Andy Lutomirski пишет:
>>
>>> On Thu, Aug 13, 2015 at 5:50 PM, Stas Sergeev <[email protected]> wrote:
>>>> 14.08.2015 03:27, Linus Torvalds пишет:
>>>>> On Thu, Aug 13, 2015 at 5:17 PM, Stas Sergeev <[email protected]> wrote:
>>>>>> For example because you can as well do:
>>>>>> prctl(ARCH_SET_SIGNAL_SS, 0)
>>>>>> which will mean "restore ss in sighandler to its current value",
>>>>> I really think a prctl() is the wrong thing to do.
>>>>>
>>>>> If you want a signal handler to save/restore segments, I think it
>>>>> should be a SA_xyz flag to sigaction() (the way we have SA_RESTART
>>>> Yes, I was proposing the new sigaction() flag in this thread
>>>> already too. But at the end, prctl() looks better to me because
>>>> it allows to pass the TLS value to use when restoring FS.
>>>> The thing is that I am trying to find the similar treatment for
>>>> both the SS and FS problems. If you don't think they need a
>>>> similar treatment, then perhaps the Andy's patch is enough.
>>>>
>>>>> etc). And off by default because of the obvious compatibility issues.
>>>> Of course.
>>>>
>>>> So, what we have right now (in the latest Andy's patch) is:
>>>> 1. lar heuristics
>>>> 2. new uc_flags flag
>>>>
>>>> What it solves: dosemu's regression.
>>>>
>>>> What prctl() can give:
>>>> - fix to dosemu's regression
>>>> - fix to the TLS problem in the future
>>>> - no hack and heuristics
>>>>
>>>> With SA_xyz you can only solve the SS problem, so it is
>>>> probably not any better than the uc_flags things coded
>>>> up by Andy.
>>> I'm leaning slightly toward LAR heuristic + SA_SAVE_SS.
>> Stop right here, doesn't the SA_xyz allow to avoid the
>> lar heuristic? Why would you still need the lar heuristic then?
>> Just call it SA_RESTORE_SS instead of SA_SAVE_SS, and
>> the lar heuristic is gone.
> The LAR heuristic is about five lines of code, and it makes signal
> delivery more reliable.
Why more reliable? In what case?
> Sure, we could gate the "regs->ss =
> __USER_DS" line on a flag, but why?
A few things I can think of why:
- nested signals (usual for dosemu)
- using siglongjmp() to return to dosemu (rather than to DOS code)
Both cases look very scare when using SS from just freed LDT entry.
How would you even justify and changelog the patch that adds a lar
heuristic code that no one uses or wants? Since SA_hyz flag allows
you to do without, why not to just keep things safe and simple?
13.08.2015 20:00, Brian Gerst пишет:
> On Thu, Aug 13, 2015 at 11:43 AM, Andy Lutomirski <[email protected]> wrote:
>> On Thu, Aug 13, 2015 at 8:37 AM, Linus Torvalds
>> <[email protected]> wrote:
>>> On Tue, Aug 11, 2015 at 5:17 PM, Stas Sergeev <[email protected]> wrote:
>>>> I realize this patch may be good to have in general, but
>>>> breaking userspace without a single warning is a bit
>>>> discouraging. Seems like the old "we don't break userspace"
>>>> rule have gone.
>>> That rule hasn't gone anywhere.
>>>
>>> Does a plain revert just fix everything? Because if so, that's the
>>> right thing to do, and we can just re-visit this later.
>>>
>>> I don't understand why Andy and Ingo are even discussing this. What
>>> the f*ck, guys?
>>>
>> I'm trying to fix it without reverting. If that doesn't work, then we
>> revert. Yesterday, I thought I had a reasonably clean fix, but it
>> turned out that it only solved half of the problem.
>>
>> If we revert, I think I need to check what will break due to the
>> revert. I need to check at least Wine, and we'll have to do something
>> about all the selftests that will start failing. I also need to check
>> CRIU, and IIRC CRIU has started using the new sigcontext SS in new
>> versions.
> I don't think Wine will be a problem, at least how it is currently set
> up. 16-bit support is only in the 32-bit build. The 64-bit build
> only supports Win64 apps, and will call the 32-bit version (installed
> in parallel) to run 32 and 16-bit apps.
Is this also because of the lack of the proper 32/16bit support in
a 64bit kernels? If so, dosemu's work-arounds do not look like the
too bad thing compared to that. :)
13.08.2015 23:07, Linus Torvalds пишет:
> On Thu, Aug 13, 2015 at 12:59 PM, Stas Sergeev <[email protected]> wrote:
>> It doesn't: fedora provides a "sanitized up" version of sigcontext.h
>> in /usr/include/bits, which comes from glibc-headers-2.21-7.fc22.x86_64.
>> So it seems the "sanitized up" headers come from glibc, which
>> means all other distros would have that too.
> Yes. Except the whole point of uapi was that in the long term the
> glibc people should just be able to pick up the kernel ones directly.
>
> And while that may involve some further editing, it sure as hell
> shouldn't involve "oh, I know, this got renamed, so now I have to
> rename it back". That would be crazy.
Of course they shouldn't rename back, but the breakage in
this case will be seen only to the upstream authors and the
distributors who compile the package, but never to the end-user.
They'll just quickly release the update without troubling the user.
This way kernel could force the uapi changes without too
much troubles, and, for example, when it stopped to use
.fs/.fs fields for FS/GS, it could actually reasonably do that.
On Mon, Aug 17, 2015 at 11:29 PM, Stas Sergeev <[email protected]> wrote:
> 13.08.2015 20:00, Brian Gerst пишет:
>
>> On Thu, Aug 13, 2015 at 11:43 AM, Andy Lutomirski <[email protected]>
>> wrote:
>>>
>>> On Thu, Aug 13, 2015 at 8:37 AM, Linus Torvalds
>>> <[email protected]> wrote:
>>>>
>>>> On Tue, Aug 11, 2015 at 5:17 PM, Stas Sergeev <[email protected]> wrote:
>>>>>
>>>>> I realize this patch may be good to have in general, but
>>>>> breaking userspace without a single warning is a bit
>>>>> discouraging. Seems like the old "we don't break userspace"
>>>>> rule have gone.
>>>>
>>>> That rule hasn't gone anywhere.
>>>>
>>>> Does a plain revert just fix everything? Because if so, that's the
>>>> right thing to do, and we can just re-visit this later.
>>>>
>>>> I don't understand why Andy and Ingo are even discussing this. What
>>>> the f*ck, guys?
>>>>
>>> I'm trying to fix it without reverting. If that doesn't work, then we
>>> revert. Yesterday, I thought I had a reasonably clean fix, but it
>>> turned out that it only solved half of the problem.
>>>
>>> If we revert, I think I need to check what will break due to the
>>> revert. I need to check at least Wine, and we'll have to do something
>>> about all the selftests that will start failing. I also need to check
>>> CRIU, and IIRC CRIU has started using the new sigcontext SS in new
>>> versions.
>>
>> I don't think Wine will be a problem, at least how it is currently set
>> up. 16-bit support is only in the 32-bit build. The 64-bit build
>> only supports Win64 apps, and will call the 32-bit version (installed
>> in parallel) to run 32 and 16-bit apps.
>
> Is this also because of the lack of the proper 32/16bit support in
> a 64bit kernels? If so, dosemu's work-arounds do not look like the
> too bad thing compared to that. :)
What do you mean lack of proper 32/16 bit support?
--Andy
On Mon, Aug 17, 2015 at 11:19 PM, Stas Sergeev <[email protected]> wrote:
> 14.08.2015 04:37, Andy Lutomirski пишет:
>
>> On Thu, Aug 13, 2015 at 6:32 PM, Stas Sergeev <[email protected]> wrote:
>>>
>>> 14.08.2015 04:21, Andy Lutomirski пишет:
>>>
>>>> On Thu, Aug 13, 2015 at 5:50 PM, Stas Sergeev <[email protected]> wrote:
>>>>>
>>>>> 14.08.2015 03:27, Linus Torvalds пишет:
>>>>>>
>>>>>> On Thu, Aug 13, 2015 at 5:17 PM, Stas Sergeev <[email protected]> wrote:
>>>>>>>
>>>>>>> For example because you can as well do:
>>>>>>> prctl(ARCH_SET_SIGNAL_SS, 0)
>>>>>>> which will mean "restore ss in sighandler to its current value",
>>>>>>
>>>>>> I really think a prctl() is the wrong thing to do.
>>>>>>
>>>>>> If you want a signal handler to save/restore segments, I think it
>>>>>> should be a SA_xyz flag to sigaction() (the way we have SA_RESTART
>>>>>
>>>>> Yes, I was proposing the new sigaction() flag in this thread
>>>>> already too. But at the end, prctl() looks better to me because
>>>>> it allows to pass the TLS value to use when restoring FS.
>>>>> The thing is that I am trying to find the similar treatment for
>>>>> both the SS and FS problems. If you don't think they need a
>>>>> similar treatment, then perhaps the Andy's patch is enough.
>>>>>
>>>>>> etc). And off by default because of the obvious compatibility issues.
>>>>>
>>>>> Of course.
>>>>>
>>>>> So, what we have right now (in the latest Andy's patch) is:
>>>>> 1. lar heuristics
>>>>> 2. new uc_flags flag
>>>>>
>>>>> What it solves: dosemu's regression.
>>>>>
>>>>> What prctl() can give:
>>>>> - fix to dosemu's regression
>>>>> - fix to the TLS problem in the future
>>>>> - no hack and heuristics
>>>>>
>>>>> With SA_xyz you can only solve the SS problem, so it is
>>>>> probably not any better than the uc_flags things coded
>>>>> up by Andy.
>>>>
>>>> I'm leaning slightly toward LAR heuristic + SA_SAVE_SS.
>>>
>>> Stop right here, doesn't the SA_xyz allow to avoid the
>>> lar heuristic? Why would you still need the lar heuristic then?
>>> Just call it SA_RESTORE_SS instead of SA_SAVE_SS, and
>>> the lar heuristic is gone.
>>
>> The LAR heuristic is about five lines of code, and it makes signal
>> delivery more reliable.
>
> Why more reliable? In what case?
>
>> Sure, we could gate the "regs->ss =
>> __USER_DS" line on a flag, but why?
>
> A few things I can think of why:
> - nested signals (usual for dosemu)
What's the issue with nested signals?
> - using siglongjmp() to return to dosemu (rather than to DOS code)
> Both cases look very scare when using SS from just freed LDT entry.
> How would you even justify and changelog the patch that adds a lar
> heuristic code that no one uses or wants? Since SA_hyz flag allows
> you to do without, why not to just keep things safe and simple?
The LAR heuristic is just for compatibility.
ISTM what DOSEMU should want (on new kernels, anyway) is the ability
to save and restore SS just like any other register, which is what my
patch did. The issue is that it broke old DOSEMU. I want to find a
way to keep old DOSEMU working while making things work better for new
code that's aware of new behavior. That means we want some way
(opt-in or magically compatible with old DOSEMU) to get SS saved and
restored.
Incidentally, I tried implementing the sigaction flag approach. I
think it's no good. When we return from a signal, there's no concept
of sigaction -- it's just sigreturn. Sigreturn can't look up the
sigaction flags -- what if the signal handler calls sigaction itself.
So we either need a per-task flag, a per-sighand flag, or a sigcontext
flag indicating what we should do.
(Yes, I suspect we really might want some way to get FS, GS, and their
bases saved and restored, but I still think we should do that
separately.)
--
Andy Lutomirski
AMA Capital Management, LLC
19.08.2015 01:47, Andy Lutomirski пишет:
> On Mon, Aug 17, 2015 at 11:19 PM, Stas Sergeev <[email protected]> wrote:
>> 14.08.2015 04:37, Andy Lutomirski пишет:
>>
>>> On Thu, Aug 13, 2015 at 6:32 PM, Stas Sergeev <[email protected]> wrote:
>>>>
>>>> 14.08.2015 04:21, Andy Lutomirski пишет:
>>>>
>>>>> On Thu, Aug 13, 2015 at 5:50 PM, Stas Sergeev <[email protected]> wrote:
>>>>>>
>>>>>> 14.08.2015 03:27, Linus Torvalds пишет:
>>>>>>>
>>>>>>> On Thu, Aug 13, 2015 at 5:17 PM, Stas Sergeev <[email protected]> wrote:
>>>>>>>>
>>>>>>>> For example because you can as well do:
>>>>>>>> prctl(ARCH_SET_SIGNAL_SS, 0)
>>>>>>>> which will mean "restore ss in sighandler to its current value",
>>>>>>>
>>>>>>> I really think a prctl() is the wrong thing to do.
>>>>>>>
>>>>>>> If you want a signal handler to save/restore segments, I think it
>>>>>>> should be a SA_xyz flag to sigaction() (the way we have SA_RESTART
>>>>>>
>>>>>> Yes, I was proposing the new sigaction() flag in this thread
>>>>>> already too. But at the end, prctl() looks better to me because
>>>>>> it allows to pass the TLS value to use when restoring FS.
>>>>>> The thing is that I am trying to find the similar treatment for
>>>>>> both the SS and FS problems. If you don't think they need a
>>>>>> similar treatment, then perhaps the Andy's patch is enough.
>>>>>>
>>>>>>> etc). And off by default because of the obvious compatibility issues.
>>>>>>
>>>>>> Of course.
>>>>>>
>>>>>> So, what we have right now (in the latest Andy's patch) is:
>>>>>> 1. lar heuristics
>>>>>> 2. new uc_flags flag
>>>>>>
>>>>>> What it solves: dosemu's regression.
>>>>>>
>>>>>> What prctl() can give:
>>>>>> - fix to dosemu's regression
>>>>>> - fix to the TLS problem in the future
>>>>>> - no hack and heuristics
>>>>>>
>>>>>> With SA_xyz you can only solve the SS problem, so it is
>>>>>> probably not any better than the uc_flags things coded
>>>>>> up by Andy.
>>>>>
>>>>> I'm leaning slightly toward LAR heuristic + SA_SAVE_SS.
>>>>
>>>> Stop right here, doesn't the SA_xyz allow to avoid the
>>>> lar heuristic? Why would you still need the lar heuristic then?
>>>> Just call it SA_RESTORE_SS instead of SA_SAVE_SS, and
>>>> the lar heuristic is gone.
>>>
>>> The LAR heuristic is about five lines of code, and it makes signal
>>> delivery more reliable.
>>
>> Why more reliable? In what case?
>>
>>> Sure, we could gate the "regs->ss =
>>> __USER_DS" line on a flag, but why?
>>
>> A few things I can think of why:
>> - nested signals (usual for dosemu)
>
> What's the issue with nested signals?
If nested signal is async and SS is from LDT just freed, then
the nested signal will silently change the SS value. So if you
are saving it somewhere, you'll save it by luck.
Now this is unlikely to happen, as the async signals in dosemu
are all blocked inside any sighandler. So the siglongjmp() case
is more expressive: if dosemu jumped via siglongjmp() and as such
unblocked the async signals, it will likely want to save its
registers before doing a new switch. Now, since SS is invalid
and will be therefore changed by a sighandler, what it will save
depends on a luck.
Not that dosemu uses siglongjmp() right now, but I am just asking
to please not implement the unreliable interfaces _if possible_.
>> - using siglongjmp() to return to dosemu (rather than to DOS code)
>> Both cases look very scare when using SS from just freed LDT entry.
>> How would you even justify and changelog the patch that adds a lar
>> heuristic code that no one uses or wants? Since SA_hyz flag allows
>> you to do without, why not to just keep things safe and simple?
>
> The LAR heuristic is just for compatibility.
>
> ISTM what DOSEMU should want (on new kernels, anyway) is the ability
> to save and restore SS just like any other register, which is what my
> patch did. The issue is that it broke old DOSEMU. I want to find a
> way to keep old DOSEMU working while making things work better for new
> code that's aware of new behavior. That means we want some way
> (opt-in or magically compatible with old DOSEMU) to get SS saved and
> restored.
>
> Incidentally, I tried implementing the sigaction flag approach. I
> think it's no good. When we return from a signal, there's no concept
> of sigaction -- it's just sigreturn. Sigreturn can't look up the
> sigaction flags -- what if the signal handler calls sigaction itself.
How about the SA_hyz flag that does the following:
- Saves SS into sigcontext
- Forces SS to USER_DS on signal delivery
- Sets the uc_flags flag for sigreturn() to take care of the rest.
You'll have both the control on every bit of action, and a simple
detection logic: if SA_hyz didn't set the uc flag - it didn't work.
You can even employ your lar heuristic here for the case when the
aforementioned SA_hyz is not set. But please, please not when it is
set! In fact, I wonder if you had in mind exactly that: using the
lar heuristic only if the SA_hyz is not set. If so - I misunderstood.
Just please don't add it when it is set.
> So we either need a per-task flag, a per-sighand flag, or a sigcontext
> flag indicating what we should do.
>
> (Yes, I suspect we really might want some way to get FS, GS, and their
> bases saved and restored, but I still think we should do that
> separately.)
In fact, I have already convinced myself that SA_hyz can take
care of both cases. :) Maybe you'll just need to extend the struct sigaction
to pass the TLS address, but this doesn't look absolutely impossible...
19.08.2015 01:42, Andy Lutomirski пишет:
> On Mon, Aug 17, 2015 at 11:29 PM, Stas Sergeev <[email protected]> wrote:
>> 13.08.2015 20:00, Brian Gerst пишет:
>>
>>> On Thu, Aug 13, 2015 at 11:43 AM, Andy Lutomirski <[email protected]>
>>> wrote:
>>>>
>>>> On Thu, Aug 13, 2015 at 8:37 AM, Linus Torvalds
>>>> <[email protected]> wrote:
>>>>>
>>>>> On Tue, Aug 11, 2015 at 5:17 PM, Stas Sergeev <[email protected]> wrote:
>>>>>>
>>>>>> I realize this patch may be good to have in general, but
>>>>>> breaking userspace without a single warning is a bit
>>>>>> discouraging. Seems like the old "we don't break userspace"
>>>>>> rule have gone.
>>>>>
>>>>> That rule hasn't gone anywhere.
>>>>>
>>>>> Does a plain revert just fix everything? Because if so, that's the
>>>>> right thing to do, and we can just re-visit this later.
>>>>>
>>>>> I don't understand why Andy and Ingo are even discussing this. What
>>>>> the f*ck, guys?
>>>>>
>>>> I'm trying to fix it without reverting. If that doesn't work, then we
>>>> revert. Yesterday, I thought I had a reasonably clean fix, but it
>>>> turned out that it only solved half of the problem.
>>>>
>>>> If we revert, I think I need to check what will break due to the
>>>> revert. I need to check at least Wine, and we'll have to do something
>>>> about all the selftests that will start failing. I also need to check
>>>> CRIU, and IIRC CRIU has started using the new sigcontext SS in new
>>>> versions.
>>>
>>> I don't think Wine will be a problem, at least how it is currently set
>>> up. 16-bit support is only in the 32-bit build. The 64-bit build
>>> only supports Win64 apps, and will call the 32-bit version (installed
>>> in parallel) to run 32 and 16-bit apps.
>>
>> Is this also because of the lack of the proper 32/16bit support in
>> a 64bit kernels? If so, dosemu's work-arounds do not look like the
>> too bad thing compared to that. :)
>
> What do you mean lack of proper 32/16 bit support?
At least the following:
1. vm86().
There was a patch:
http://v86-64.sourceforge.net/
Afaik rejected by Andi Kleen (likely for a good reason - too complex).
There is some kvm-based alternative which IIRC was called by dosemu authors
as "too slow", and so they started to use a jit-compiler. Wine have started
to use dosbox for the DOS progs AFAIK. So both projects have a work-arounds
to this limitation with which they are happy, and so it probably not worth
the re-visiting.
2. espfix64.
Its there since 3.16, but dosemu have lots of work-arounds in its code.
The iret trampoline, for example, uses the carefully aligned stack page,
where the high word of ESP is zero.
Another part of the work-around is in a sighandler to decode the
instruction to figure out what register caused a fault (corrupted esp
value usually goes into ebp first, then to other regs) and zero out
the high word of that, plus the high word of esp. There are also other
bits of the work-around spread around the dosemu code, and I am surprised
it actually even works!
3. SS problem. Was fixed in some versions of 4.1; not fixed any more. ;)
dosemu did a glorious iret work-around.
4. FS problem.
Worked around by autoconf checks to ban some gcc options, plus some
special care when accessing thread-local vars in a sighandler.
While your suggestion is to write an asm handlers, to the date I don't
think anyone did that. It is easier to work-around it by other means.
Maybe if you show an example of such handler, the things will change,
but it is simpler to just wait for a kernel fix IMHO.
This is what I called a 32/16bit support, and in fact, when I installed
dosemu on a 64bit machine, started win31 and it just worked, I
immediately wrote my regards to Bart Oldeman, so much I was impressed -
I thought it is absolutely impossible to make this whole mess working
reliably.
I guess wine authors just were not as brave and decided to wait for
the kernel functionality in place.
On Wed, Aug 19, 2015 at 3:10 AM, Stas Sergeev <[email protected]> wrote:
> 19.08.2015 01:42, Andy Lutomirski пишет:
>> What do you mean lack of proper 32/16 bit support?
> At least the following:
>
> 1. vm86().
> There was a patch:
> http://v86-64.sourceforge.net/
> Afaik rejected by Andi Kleen (likely for a good reason - too complex).
> There is some kvm-based alternative which IIRC was called by dosemu authors
> as "too slow", and so they started to use a jit-compiler. Wine have started
> to use dosbox for the DOS progs AFAIK. So both projects have a work-arounds
> to this limitation with which they are happy, and so it probably not worth
> the re-visiting.
>
Wow!
Yeah, that's quite a hack. EFI mixed mode support used to play
similar tricks. Unfortunately, switching in and out of long mode at
runtime works very poorly and has terrible interactions with perf, so
EFI stopped doing that.
> 2. espfix64.
> Its there since 3.16, but dosemu have lots of work-arounds in its code.
> The iret trampoline, for example, uses the carefully aligned stack page,
> where the high word of ESP is zero.
> Another part of the work-around is in a sighandler to decode the
> instruction to figure out what register caused a fault (corrupted esp
> value usually goes into ebp first, then to other regs) and zero out
> the high word of that, plus the high word of esp. There are also other
> bits of the work-around spread around the dosemu code, and I am surprised
> it actually even works!
Wow, no kidding! At least this is fixed now.
>
> 3. SS problem. Was fixed in some versions of 4.1; not fixed any more. ;)
> dosemu did a glorious iret work-around.
>
> 4. FS problem.
> Worked around by autoconf checks to ban some gcc options, plus some
> special care when accessing thread-local vars in a sighandler.
> While your suggestion is to write an asm handlers, to the date I don't
> think anyone did that. It is easier to work-around it by other means.
> Maybe if you show an example of such handler, the things will change,
> but it is simpler to just wait for a kernel fix IMHO.
>
Something like:
mov $__NR_arch_prctl, %rax
mov $ARCH_GET_FS, %rdi
mov [wherever you safe fsbase], %rsi
syscall
mov $ARCH_SET_FS, %rdi
mov (wherever you stashed glibc's value), %rsi
syscall
pushq $0
call real_signal_handler
popq %rax
mov $__NR_arch_prctl, %rax
mov $ARCH_SET_FS, %rdi
mov (saved value), %rsi
syscall
ret
Yeah, it's ugly. It may very well be worth changing this once the
FSBASE stuff happens.
--Andy
On Wed, Aug 19, 2015 at 2:35 AM, Stas Sergeev <[email protected]> wrote:
> 19.08.2015 01:47, Andy Lutomirski пишет:
>> On Mon, Aug 17, 2015 at 11:19 PM, Stas Sergeev <[email protected]> wrote:
>>> 14.08.2015 04:37, Andy Lutomirski пишет:
>>>
>>>> On Thu, Aug 13, 2015 at 6:32 PM, Stas Sergeev <[email protected]> wrote:
>>>>>
>>>>> 14.08.2015 04:21, Andy Lutomirski пишет:
>>>>>
>>>>>> On Thu, Aug 13, 2015 at 5:50 PM, Stas Sergeev <[email protected]> wrote:
>>>>>>>
>>>>>>> 14.08.2015 03:27, Linus Torvalds пишет:
>>>>>>>>
>>>>>>>> On Thu, Aug 13, 2015 at 5:17 PM, Stas Sergeev <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> For example because you can as well do:
>>>>>>>>> prctl(ARCH_SET_SIGNAL_SS, 0)
>>>>>>>>> which will mean "restore ss in sighandler to its current value",
>>>>>>>>
>>>>>>>> I really think a prctl() is the wrong thing to do.
>>>>>>>>
>>>>>>>> If you want a signal handler to save/restore segments, I think it
>>>>>>>> should be a SA_xyz flag to sigaction() (the way we have SA_RESTART
>>>>>>>
>>>>>>> Yes, I was proposing the new sigaction() flag in this thread
>>>>>>> already too. But at the end, prctl() looks better to me because
>>>>>>> it allows to pass the TLS value to use when restoring FS.
>>>>>>> The thing is that I am trying to find the similar treatment for
>>>>>>> both the SS and FS problems. If you don't think they need a
>>>>>>> similar treatment, then perhaps the Andy's patch is enough.
>>>>>>>
>>>>>>>> etc). And off by default because of the obvious compatibility issues.
>>>>>>>
>>>>>>> Of course.
>>>>>>>
>>>>>>> So, what we have right now (in the latest Andy's patch) is:
>>>>>>> 1. lar heuristics
>>>>>>> 2. new uc_flags flag
>>>>>>>
>>>>>>> What it solves: dosemu's regression.
>>>>>>>
>>>>>>> What prctl() can give:
>>>>>>> - fix to dosemu's regression
>>>>>>> - fix to the TLS problem in the future
>>>>>>> - no hack and heuristics
>>>>>>>
>>>>>>> With SA_xyz you can only solve the SS problem, so it is
>>>>>>> probably not any better than the uc_flags things coded
>>>>>>> up by Andy.
>>>>>>
>>>>>> I'm leaning slightly toward LAR heuristic + SA_SAVE_SS.
>>>>>
>>>>> Stop right here, doesn't the SA_xyz allow to avoid the
>>>>> lar heuristic? Why would you still need the lar heuristic then?
>>>>> Just call it SA_RESTORE_SS instead of SA_SAVE_SS, and
>>>>> the lar heuristic is gone.
>>>>
>>>> The LAR heuristic is about five lines of code, and it makes signal
>>>> delivery more reliable.
>>>
>>> Why more reliable? In what case?
>>>
>>>> Sure, we could gate the "regs->ss =
>>>> __USER_DS" line on a flag, but why?
>>>
>>> A few things I can think of why:
>>> - nested signals (usual for dosemu)
>>
>> What's the issue with nested signals?
> If nested signal is async and SS is from LDT just freed, then
> the nested signal will silently change the SS value. So if you
> are saving it somewhere, you'll save it by luck.
> Now this is unlikely to happen, as the async signals in dosemu
> are all blocked inside any sighandler. So the siglongjmp() case
> is more expressive: if dosemu jumped via siglongjmp() and as such
> unblocked the async signals, it will likely want to save its
> registers before doing a new switch. Now, since SS is invalid
> and will be therefore changed by a sighandler, what it will save
> depends on a luck.
> Not that dosemu uses siglongjmp() right now, but I am just asking
> to please not implement the unreliable interfaces _if possible_.
Ok, I think I get it.
I'm not proposing the LAR thing as something that anyone should
intentionally use going forward. It would only be for compatbility
with old code if needed. We certainly should save SS somewhere.
>> Incidentally, I tried implementing the sigaction flag approach. I
>> think it's no good. When we return from a signal, there's no concept
>> of sigaction -- it's just sigreturn. Sigreturn can't look up the
>> sigaction flags -- what if the signal handler calls sigaction itself.
> How about the SA_hyz flag that does the following:
> - Saves SS into sigcontext
> - Forces SS to USER_DS on signal delivery
> - Sets the uc_flags flag for sigreturn() to take care of the rest.
> You'll have both the control on every bit of action, and a simple
> detection logic: if SA_hyz didn't set the uc flag - it didn't work.
> You can even employ your lar heuristic here for the case when the
> aforementioned SA_hyz is not set. But please, please not when it is
> set! In fact, I wonder if you had in mind exactly that: using the
> lar heuristic only if the SA_hyz is not set. If so - I misunderstood.
> Just please don't add it when it is set.
Hmm, interesting. Maybe that would work for everything. How's this
to make it concrete?
Add a sigaction flag SA_RESTORE_SS.
On signal delivery, always save SS into sigcontext->ss. if
SA_RESTORE_SS is set, then unconditionally switch HW SS to __USER_DS
and set UC_RESTORE_SS. If SA_RESTORE_SS is clear, then leave HW SS
alone (i.e. preserve the old behavior).
On signal return, if UC_RESTORE_SS is set, then restore
sigcontext->ss. If not, then set SS to __USER_DS (as old kernels
did).
This should change nothing at all (except the initial value of
sigcontext->ss / __pad0) on old kernels.
>
>> So we either need a per-task flag, a per-sighand flag, or a sigcontext
>> flag indicating what we should do.
>>
>> (Yes, I suspect we really might want some way to get FS, GS, and their
>> bases saved and restored, but I still think we should do that
>> separately.)
> In fact, I have already convinced myself that SA_hyz can take
> care of both cases. :) Maybe you'll just need to extend the struct sigaction
> to pass the TLS address, but this doesn't look absolutely impossible...
I think that should be a separate SA_ flag down the road, if for no
other reason than that the SS thing can be done more or less
immediately, whereas getting FS and GS right needs to wait.
Also, it occurs to me that FS and GS could become more complicated.
There are some proposals to allow tasks to opt in to having a per-cpu
GS. If that happens, then figuring out how it would interact with
signals could be complicated.
--Andy
19.08.2015 18:46, Andy Lutomirski пишет:
> On Wed, Aug 19, 2015 at 2:35 AM, Stas Sergeev <[email protected]> wrote:
>>> Incidentally, I tried implementing the sigaction flag approach. I
>>> think it's no good. When we return from a signal, there's no concept
>>> of sigaction -- it's just sigreturn. Sigreturn can't look up the
>>> sigaction flags -- what if the signal handler calls sigaction itself.
>> How about the SA_hyz flag that does the following:
>> - Saves SS into sigcontext
>> - Forces SS to USER_DS on signal delivery
>> - Sets the uc_flags flag for sigreturn() to take care of the rest.
>> You'll have both the control on every bit of action, and a simple
>> detection logic: if SA_hyz didn't set the uc flag - it didn't work.
>> You can even employ your lar heuristic here for the case when the
>> aforementioned SA_hyz is not set. But please, please not when it is
>> set! In fact, I wonder if you had in mind exactly that: using the
>> lar heuristic only if the SA_hyz is not set. If so - I misunderstood.
>> Just please don't add it when it is set.
>
> Hmm, interesting. Maybe that would work for everything. How's this
> to make it concrete?
>
> Add a sigaction flag SA_RESTORE_SS.
>
> On signal delivery, always save SS into sigcontext->ss. if
> SA_RESTORE_SS is set, then unconditionally switch HW SS to __USER_DS
> and set UC_RESTORE_SS. If SA_RESTORE_SS is clear, then leave HW SS
> alone (i.e. preserve the old behavior).
Either that, or employ the lar heuristic for the "not set" case
(I think its not needed).
> On signal return, if UC_RESTORE_SS is set, then restore
> sigcontext->ss. If not, then set SS to __USER_DS (as old kernels
> did).
>
> This should change nothing at all (except the initial value of
> sigcontext->ss / __pad0) on old kernels.
Agreed.
>>> So we either need a per-task flag, a per-sighand flag, or a sigcontext
>>> flag indicating what we should do.
>>>
>>> (Yes, I suspect we really might want some way to get FS, GS, and their
>>> bases saved and restored, but I still think we should do that
>>> separately.)
>> In fact, I have already convinced myself that SA_hyz can take
>> care of both cases. :) Maybe you'll just need to extend the struct sigaction
>> to pass the TLS address, but this doesn't look absolutely impossible...
>
> I think that should be a separate SA_ flag down the road,
Of course separate.
I only wanted to get a "somewhat similar" solution, so if the
TLS case can also be covered by the (different) SA_ flag (plus
perhaps a struct sigaction extension), then its just excellent.
What I was afraid of is to get an SS-specific fix, like your
initial uc_flags+lar_heuristic solution. Such fix can't be
extended to the TLS case, reducing the chances for the TLS case
to be ever re-visited.
> Also, it occurs to me that FS and GS could become more complicated.
> There are some proposals to allow tasks to opt in to having a per-cpu
> GS. If that happens, then figuring out how it would interact with
> signals could be complicated.
Is this a problem?
At least on 86_64, according to this:
http://wiki.osdev.org/Thread_Local_Storage#x86-64
we only need FS.
In any case, since what you say about GS is optional and should
be explicitly requested by the task, it can just disable the use
of the new SA_TLS flag, making the sigaction() to return error for
example. I think the amount of the supported combinations should
be limited by the practical needs, and I don't think someone will
need per-cpu GS with sighandler re-setting it to something non-per-cpu.
* Andy Lutomirski <[email protected]> wrote:
> > The crash happens when DOS program terminates.
> > At that point dosemu subverts the execution flow by
> > replacing segregs and cs/ip ss/sp in sigcontext with its own.
> > But __pad0 still has DOS SS, which crash because (presumably)
> > the DOS LDT have been just removed.
>
> That's unfortunate.
>
> I don't really know what to do about this. My stupid heuristic for
> signal delivery seems unlikely to cause problems, but I'm not coming
> up with a great heuristic for detecting when a program that *modifies*
> sigcontext hasn't set all the fields. Even adding a flag won't really
> help here, since DOSEMU won't know to manipulate the flag.
>
> Ingo, here's the situation, assuming I remember the versions right:
>
> v4.0 and before: If we try to deliver a signal while SS is bad, we
> fail and the process dies. If SS is good but nonstandard, we end up
> in the signal handler with whatever SS value was loaded when the
> signal was sent. We do *not* put SS anywhere in the sigcontext, so
> the only way for a program to figure out what SS was is to look at the
> HW state before making any syscalls. We also don't even try to
> restore SS, so SS is unconditionally set to __USER_DS, necessitating
> nasty workarounds (and breaking all kinds of test cases).
>
> v4.1 and current -linus: We always set SS to __USER_DS when delivering
> a signal. We save the old SS in the sigcontext and restore it, just
> like 32-bit signals.
>
> My patch: We leave SS alone when delivering a signal, unless it's
> invalid, in which case we replace it with __USER_DS. We still save
> the old SS in the sigcontext and restore it on return.
>
> Apparently the remaining regression is that DOSEMU doesn't realize
> that SS is saved so, when it tries to return to full 64-bit mode after
> a signal that hit in 16-bit mode, it fails because it's invalidated
> the old SS descriptor in the mean time.
>
>
> So... what do we do about it? We could revert the whole mess.
Yes, absolutely - we should restore the ABI behavior to where v4.0 and before
stood.
> [...] We could tell everyone to fix their DOSEMU, which violates policy and is
> especially annoying given how much effort we've put into keeping 16-bit mode
> fully functional lately. [...]
So calling DOSEMU broken annoys me: there's nothing to 'fix' in DOSEMU really.
DOSEMU tried hard to work around a kernel bug. 100% of the blame and the
responsibility to keep things working is on the kernel side. No ifs and whens.
> [...] We could add yet more heuristics and teach sigreturn to ignore the saved
> SS value in sigcontext if the saved CS is 64-bit and the saved SS is unusable.
We could maybe try this - assuming it doesn't break DOSEMU.
Or we could extend the ABI and allow DOSEMU to cleanly use it.
I'm not sure it's worth the complexity, given that the DOSEMU workaround already
exists and will likely exist forever, so that it can support older kernels, right?
Thanks,
Ingo
22.08.2015 15:38, Ingo Molnar пишет:
>> [...] We could add yet more heuristics and teach sigreturn to ignore the saved
>> SS value in sigcontext if the saved CS is 64-bit and the saved SS is unusable.
> We could maybe try this - assuming it doesn't break DOSEMU.
>
> Or we could extend the ABI and allow DOSEMU to cleanly use it.
>
> I'm not sure it's worth the complexity, given that the DOSEMU workaround already
> exists and will likely exist forever, so that it can support older kernels, right?
That assumes dosemu is the only current and future
user of that functionality. That's why I raised the question
whether does Wine want to support a 32/16bit code in its
64bit builds too. So far it doesn't have such functionality,
likely waiting for the kernel to provide it, rather than going
the route of hacks and work-arounds as dosemu did.
Also, the fact that dosemu already have that functionality,
doesn't mean it will not use the new API - it actually will.
With a few run-time checks at first, but when all distros are
upgraded, there is simply no point to support the kernels
that are not in any current distro (after a few years of
deprecation perhaps).
Overall, I think the existance of a few hacks in dosemu is
not the reason for the linux kernel to stop any development
in that area. dosemu did a great work of exploring all the
pitfalls in that area, so now you know what to fix and how.
Please, keep the things going.
* Stas Sergeev <[email protected]> wrote:
> Also, the fact that dosemu already have that functionality,
> doesn't mean it will not use the new API - it actually will.
So if dosemu makes use of the new facility then sure, I'm not
against it at all!
Thanks,
Ingo