Hi all-
I'm working on a massive set of cleanups to Linux's syscall handling.
We currently have a nasty optimization in which we don't save rbx,
rbp, r12, r13, r14, and r15 on x86_64 before calling C functions.
This works, but it makes the code a huge mess. I'd rather save all
regs in asm and then call C code.
Unfortunately, this will add five cycles (on SNB) to one of the
hottest paths in the kernel. To counteract it, I have a gcc feature
request that might not be all that crazy. When writing C functions
intended to be called from asm, what if we could do:
__attribute__((extra_clobber("rbx", "rbp", "r12", "r13", "r14",
"r15"))) void func(void);
This will save enough pushes and pops that it could easily give us our
five cycles back and then some. It's also easy to be compatible with
old GCC versions -- we could just omit the attribute, since preserving
a register is always safe.
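A minimal sketch of that fallback, assuming a compiler that supports the `__has_attribute` feature test (newer GCC and clang do). The `extra_clobber` attribute is the hypothetical one proposed above and does not exist in any released GCC; `ENTRY_HELPER` and `entry_helper` are made-up names for illustration:

```c
/* Hypothetical: "extra_clobber" is only the attribute proposed in this
 * thread. No released compiler has it, so the test below is false today
 * and the macro expands to nothing, i.e. the "just omit it" fallback. */
#ifndef __has_attribute
#define __has_attribute(x) 0    /* compilers without the feature test */
#endif

#if __has_attribute(extra_clobber)
#define ENTRY_HELPER \
    __attribute__((extra_clobber("rbx", "rbp", "r12", "r13", "r14", "r15")))
#else
#define ENTRY_HELPER            /* callee preserves the registers as usual */
#endif

/* A C helper meant to be called from the asm entry code. */
ENTRY_HELPER int entry_helper(int nr);

int entry_helper(int nr)
{
    return nr + 1;
}
```

With the attribute missing, the helper simply keeps saving and restoring the callee-saved registers, which costs cycles but is always correct.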
Thoughts? Is this totally crazy? Is it easy to implement?
(I'm not necessarily suggesting that we do this for the syscall bodies
themselves. I want to do it for the entry and exit helpers, so we'd
still lose the five cycles in the full fast-path case, but we'd do
better in the slower paths, and the slower paths are becoming
increasingly important in real workloads.)
Thanks,
Andy
On 06/30/2015 02:22 PM, Andy Lutomirski wrote:
> Hi all-
>
> I'm working on a massive set of cleanups to Linux's syscall handling.
> We currently have a nasty optimization in which we don't save rbx,
> rbp, r12, r13, r14, and r15 on x86_64 before calling C functions.
> This works, but it makes the code a huge mess. I'd rather save all
> regs in asm and then call C code.
>
> Unfortunately, this will add five cycles (on SNB) to one of the
> hottest paths in the kernel. To counteract it, I have a gcc feature
> request that might not be all that crazy. When writing C functions
> intended to be called from asm, what if we could do:
>
> __attribute__((extra_clobber("rbx", "rbp", "r12", "r13", "r14",
> "r15"))) void func(void);
>
> This will save enough pushes and pops that it could easily give us our
> five cycles back and then some. It's also easy to be compatible with
> old GCC versions -- we could just omit the attribute, since preserving
> a register is always safe.
>
> Thoughts? Is this totally crazy? Is it easy to implement?
>
> (I'm not necessarily suggesting that we do this for the syscall bodies
> themselves. I want to do it for the entry and exit helpers, so we'd
> still lose the five cycles in the full fast-path case, but we'd do
> better in the slower paths, and the slower paths are becoming
> increasingly important in real workloads.)
>
Some gcc targets have done this in the past. There are command-line
options to do that, but using attributes you have to handle cross-ABI
compilation.
However, I don't see this being done in the upstream gcc.
Keep in mind the runway that we'll need, though.
-hpa
On Tue, Jun 30, 2015 at 02:22:33PM -0700, Andy Lutomirski wrote:
> I'm working on a massive set of cleanups to Linux's syscall handling.
> We currently have a nasty optimization in which we don't save rbx,
> rbp, r12, r13, r14, and r15 on x86_64 before calling C functions.
> This works, but it makes the code a huge mess. I'd rather save all
> regs in asm and then call C code.
>
> Unfortunately, this will add five cycles (on SNB) to one of the
> hottest paths in the kernel. To counteract it, I have a gcc feature
> request that might not be all that crazy. When writing C functions
> intended to be called from asm, what if we could do:
>
> __attribute__((extra_clobber("rbx", "rbp", "r12", "r13", "r14",
> "r15"))) void func(void);
>
> This will save enough pushes and pops that it could easily give us our
> five cycles back and then some. It's also easy to be compatible with
> old GCC versions -- we could just omit the attribute, since preserving
> a register is always safe.
>
> Thoughts? Is this totally crazy? Is it easy to implement?
>
> (I'm not necessarily suggesting that we do this for the syscall bodies
> themselves. I want to do it for the entry and exit helpers, so we'd
> still lose the five cycles in the full fast-path case, but we'd do
> better in the slower paths, and the slower paths are becoming
> increasingly important in real workloads.)
GCC already supports the -ffixed-REG, -fcall-used-REG and -fcall-saved-REG
options, which allow tweaking the calling conventions, but only per
translation unit right now. It isn't clear which of these options
you mean with extra_clobber.
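For reference, a rough sketch of how the existing per-translation-unit options are used. The file name and the function are made up for illustration; the flags are the documented -fcall-used-REG form. An object built this way is only safe to call from code (asm, in the use case above) that knows these registers are no longer preserved:

```c
/*
 * helpers.c (hypothetical file name)
 *
 * Normal build:
 *     gcc -O2 -c helpers.c
 *
 * Build that treats rbx and r12-r15 as call-clobbered for the whole file:
 *     gcc -O2 -fcall-used-rbx -fcall-used-r12 -fcall-used-r13
 *         -fcall-used-r14 -fcall-used-r15 -c helpers.c
 *
 * The flags only change whether the compiler must save and restore these
 * registers when it decides to use them; the C source stays the same.
 */
long sum_squares(const long *v, int n)
{
    long acc = 0;

    for (int i = 0; i < n; i++)
        acc += v[i] * v[i];
    return acc;
}
```

Because the option applies to every function in the file, normal C callers in other translation units would not know about the extra clobbers, which is the gap a per-function attribute would close.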
I assume you are looking for a way to make this
per-function, where a caller with a different calling convention has to
adjust for the callee's ABI. To some extent, recent GCC versions
do that automatically with -fipa-ra already: if some call-used registers
are not clobbered by a call and the caller can analyze the callee,
it can keep values in those registers across the call.
I'd say the most natural API for this would be to allow
f{fixed,call-{used,saved}}-REG in the target attribute.
Jakub
On 06/30/2015 02:37 PM, Jakub Jelinek wrote:
> I'd say the most natural API for this would be to allow
> f{fixed,call-{used,saved}}-REG in target attribute.
Either that or
__attribute__((fixed(rbp,rcx),used(rax,rbx),saved(r11)))
... just to be shorter. Either way, I would consider this to be
desirable -- I have myself used this to good effect in a past life
(*cough* Transmeta *cough*) -- but not a high priority feature.
-hpa
On Tue, Jun 30, 2015 at 2:41 PM, H. Peter Anvin wrote:
> On 06/30/2015 02:37 PM, Jakub Jelinek wrote:
>> I'd say the most natural API for this would be to allow
>> f{fixed,call-{used,saved}}-REG in target attribute.
>
> Either that or
>
> __attribute__((fixed(rbp,rcx),used(rax,rbx),saved(r11)))
>
> ... just to be shorter. Either way, I would consider this to be
> desirable -- I have myself used this to good effect in a past life
> (*cough* Transmeta *cough*) -- but not a high priority feature.
I think I mean the per-function equivalent of -fcall-used-reg, so
hpa's "used" suggestion would do the trick.
I guess that clobbering the frame pointer is a non-starter, but five
out of six isn't so bad. It would be nice to error out instead of
producing "disastrous results", though, if another bad reg is chosen.
(Presumably the PIC register on PIC builds would be an example of
that.)
--Andy
On 06/30/2015 02:48 PM, Andy Lutomirski wrote:
> On Tue, Jun 30, 2015 at 2:41 PM, H. Peter Anvin <[email protected]> wrote:
>> On 06/30/2015 02:37 PM, Jakub Jelinek wrote:
>>> I'd say the most natural API for this would be to allow
>>> f{fixed,call-{used,saved}}-REG in target attribute.
>>
>> Either that or
>>
>> __attribute__((fixed(rbp,rcx),used(rax,rbx),saved(r11)))
>>
>> ... just to be shorter. Either way, I would consider this to be
>> desirable -- I have myself used this to good effect in a past life
>> (*cough* Transmeta *cough*) -- but not a high priority feature.
>
> I think I mean the per-function equivalent of -fcall-used-reg, so
> hpa's "used" suggestion would do the trick.
>
> I guess that clobbering the frame pointer is a non-starter, but five
> out of six isn't so bad. It would be nice to error out instead of
> producing "disastrous results", though, if another bad reg is chosen.
> (Presumably the PIC register on PIC builds would be an example of
> that.)
>
Clobbering the frame pointer is perfectly fine, as is the PIC register.
However, gcc might need to handle them as "fixed" rather than "clobbered".
-hpa
On Tue, Jun 30, 2015 at 2:52 PM, H. Peter Anvin wrote:
> On 06/30/2015 02:48 PM, Andy Lutomirski wrote:
>> On Tue, Jun 30, 2015 at 2:41 PM, H. Peter Anvin <[email protected]> wrote:
>>> On 06/30/2015 02:37 PM, Jakub Jelinek wrote:
>>>> I'd say the most natural API for this would be to allow
>>>> f{fixed,call-{used,saved}}-REG in target attribute.
>>>
>>> Either that or
>>>
>>> __attribute__((fixed(rbp,rcx),used(rax,rbx),saved(r11)))
>>>
>>> ... just to be shorter. Either way, I would consider this to be
>>> desirable -- I have myself used this to good effect in a past life
>>> (*cough* Transmeta *cough*) -- but not a high priority feature.
>>
>> I think I mean the per-function equivalent of -fcall-used-reg, so
>> hpa's "used" suggestion would do the trick.
>>
>> I guess that clobbering the frame pointer is a non-starter, but five
>> out of six isn't so bad. It would be nice to error out instead of
>> producing "disastrous results", though, if another bad reg is chosen.
>> (Presumably the PIC register on PIC builds would be an example of
>> that.)
>>
>
> Clobbering the frame pointer is perfectly fine, as is the PIC register.
> However, gcc might need to handle them as "fixed" rather than "clobbered".
Hmm. True, I guess, although I wouldn't necessarily expect gcc to be
able to generate code to call a function like that.
--Andy
On 06/30/2015 02:55 PM, Andy Lutomirski wrote:
> On Tue, Jun 30, 2015 at 2:52 PM, H. Peter Anvin <[email protected]> wrote:
>> On 06/30/2015 02:48 PM, Andy Lutomirski wrote:
>>> On Tue, Jun 30, 2015 at 2:41 PM, H. Peter Anvin <[email protected]> wrote:
>>>> On 06/30/2015 02:37 PM, Jakub Jelinek wrote:
>>>>> I'd say the most natural API for this would be to allow
>>>>> f{fixed,call-{used,saved}}-REG in target attribute.
>>>>
>>>> Either that or
>>>>
>>>> __attribute__((fixed(rbp,rcx),used(rax,rbx),saved(r11)))
>>>>
>>>> ... just to be shorter. Either way, I would consider this to be
>>>> desirable -- I have myself used this to good effect in a past life
>>>> (*cough* Transmeta *cough*) -- but not a high priority feature.
>>>
>>> I think I mean the per-function equivalent of -fcall-used-reg, so
>>> hpa's "used" suggestion would do the trick.
>>>
>>> I guess that clobbering the frame pointer is a non-starter, but five
>>> out of six isn't so bad. It would be nice to error out instead of
>>> producing "disastrous results", though, if another bad reg is chosen.
>>> (Presumably the PIC register on PIC builds would be an example of
>>> that.)
>>>
>>
>> Clobbering the frame pointer is perfectly fine, as is the PIC register.
>> However, gcc might need to handle them as "fixed" rather than "clobbered".
>
> Hmm. True, I guess, although I wouldn't necessarily expect gcc to be
> able to generate code to call a function like that.
>
No, but you need to be able to call other functions, or you just push
the issue down one level.
-hpa
On 06/30/2015 04:02 PM, H. Peter Anvin wrote:
> On 06/30/2015 02:55 PM, Andy Lutomirski wrote:
>> On Tue, Jun 30, 2015 at 2:52 PM, H. Peter Anvin <[email protected]> wrote:
>>> On 06/30/2015 02:48 PM, Andy Lutomirski wrote:
>>>> On Tue, Jun 30, 2015 at 2:41 PM, H. Peter Anvin <[email protected]> wrote:
>>>>> On 06/30/2015 02:37 PM, Jakub Jelinek wrote:
>>>>>> I'd say the most natural API for this would be to allow
>>>>>> f{fixed,call-{used,saved}}-REG in target attribute.
>>>>>
>>>>> Either that or
>>>>>
>>>>> __attribute__((fixed(rbp,rcx),used(rax,rbx),saved(r11)))
>>>>>
>>>>> ... just to be shorter. Either way, I would consider this to be
>>>>> desirable -- I have myself used this to good effect in a past life
>>>>> (*cough* Transmeta *cough*) -- but not a high priority feature.
>>>>
>>>> I think I mean the per-function equivalent of -fcall-used-reg, so
>>>> hpa's "used" suggestion would do the trick.
>>>>
>>>> I guess that clobbering the frame pointer is a non-starter, but five
>>>> out of six isn't so bad. It would be nice to error out instead of
>>>> producing "disastrous results", though, if another bad reg is chosen.
>>>> (Presumably the PIC register on PIC builds would be an example of
>>>> that.)
>>>>
>>>
>>> Clobbering the frame pointer is perfectly fine, as is the PIC register.
>>> However, gcc might need to handle them as "fixed" rather than "clobbered".
>>
>> Hmm. True, I guess, although I wouldn't necessarily expect gcc to be
>> able to generate code to call a function like that.
>>
>
> No, but you need to be able to call other functions, or you just push
> the issue down one level.
For ia32, the PIC register really isn't special anymore. I'd be
surprised if you couldn't clobber it.
jeff
On 06/30/2015 05:37 PM, Jakub Jelinek wrote:
> On Tue, Jun 30, 2015 at 02:22:33PM -0700, Andy Lutomirski wrote:
>> I'm working on a massive set of cleanups to Linux's syscall handling.
>> We currently have a nasty optimization in which we don't save rbx,
>> rbp, r12, r13, r14, and r15 on x86_64 before calling C functions.
>> This works, but it makes the code a huge mess. I'd rather save all
>> regs in asm and then call C code.
>>
>> Unfortunately, this will add five cycles (on SNB) to one of the
>> hottest paths in the kernel. To counteract it, I have a gcc feature
>> request that might not be all that crazy. When writing C functions
>> intended to be called from asm, what if we could do:
>>
>> __attribute__((extra_clobber("rbx", "rbp", "r12", "r13", "r14",
>> "r15"))) void func(void);
>>
>> This will save enough pushes and pops that it could easily give us our
>> five cycles back and then some. It's also easy to be compatible with
>> old GCC versions -- we could just omit the attribute, since preserving
>> a register is always safe.
>>
>> Thoughts? Is this totally crazy? Is it easy to implement?
>>
>> (I'm not necessarily suggesting that we do this for the syscall bodies
>> themselves. I want to do it for the entry and exit helpers, so we'd
>> still lose the five cycles in the full fast-path case, but we'd do
>> better in the slower paths, and the slower paths are becoming
>> increasingly important in real workloads.)
> GCC already supports -ffixed-REG, -fcall-used-REG and -fcall-saved-REG
> options, which allow to tweak the calling conventions; but it is per
> translation unit right now. It isn't clear which of these options
> you mean with the extra_clobber.
> I assume you are looking for a possibility to change this to be
> per-function, with caller with a different calling convention having to
> adjust for different ABI callee. To some extent, recent GCC versions
> do that automatically with -fipa-ra already - if some call used registers
> are not clobbered by some call and the caller can analyze that callee,
> it can stick values in such registers across the call.
> I'd say the most natural API for this would be to allow
> f{fixed,call-{used,saved}}-REG in target attribute.
>
>
One consequence of frequently changing the calling convention or register
usage per function could be a GCC slowdown. RA calculates a lot of data, and
it takes a lot of time to recalculate it after something in the
register usage convention changes.
Another consequence would be that RA fails to generate code in some
cases, and even worse, the failure might depend on the GCC version (I have
already seen PRs where RA worked for an asm in one GCC version because a
pseudo was replaced by an equivalent constant, and failed in another GCC
version where that did not happen).
Other than that, I don't see other complications with implementing such a
feature.
On Wed, Jul 1, 2015 at 8:23 AM, Vladimir Makarov wrote:
>
>
> On 06/30/2015 05:37 PM, Jakub Jelinek wrote:
>>
>> On Tue, Jun 30, 2015 at 02:22:33PM -0700, Andy Lutomirski wrote:
>>>
>>> I'm working on a massive set of cleanups to Linux's syscall handling.
>>> We currently have a nasty optimization in which we don't save rbx,
>>> rbp, r12, r13, r14, and r15 on x86_64 before calling C functions.
>>> This works, but it makes the code a huge mess. I'd rather save all
>>> regs in asm and then call C code.
>>>
>>> Unfortunately, this will add five cycles (on SNB) to one of the
>>> hottest paths in the kernel. To counteract it, I have a gcc feature
>>> request that might not be all that crazy. When writing C functions
>>> intended to be called from asm, what if we could do:
>>>
>>> __attribute__((extra_clobber("rbx", "rbp", "r12", "r13", "r14",
>>> "r15"))) void func(void);
>>>
>>> This will save enough pushes and pops that it could easily give us our
>>> five cycles back and then some. It's also easy to be compatible with
>>> old GCC versions -- we could just omit the attribute, since preserving
>>> a register is always safe.
>>>
>>> Thoughts? Is this totally crazy? Is it easy to implement?
>>>
>>> (I'm not necessarily suggesting that we do this for the syscall bodies
>>> themselves. I want to do it for the entry and exit helpers, so we'd
>>> still lose the five cycles in the full fast-path case, but we'd do
>>> better in the slower paths, and the slower paths are becoming
>>> increasingly important in real workloads.)
>>
>> GCC already supports -ffixed-REG, -fcall-used-REG and -fcall-saved-REG
>> options, which allow to tweak the calling conventions; but it is per
>> translation unit right now. It isn't clear which of these options
>> you mean with the extra_clobber.
>> I assume you are looking for a possibility to change this to be
>> per-function, with caller with a different calling convention having to
>> adjust for different ABI callee. To some extent, recent GCC versions
>> do that automatically with -fipa-ra already - if some call used registers
>> are not clobbered by some call and the caller can analyze that callee,
>> it can stick values in such registers across the call.
>> I'd say the most natural API for this would be to allow
>> f{fixed,call-{used,saved}}-REG in target attribute.
>>
>>
> One consequence of frequent changing calling convention per function or
> register usage could be GCC slowdown. RA calculates too many data and it
> requires a lot of time to recalculate them after something in the register
> usage convention is changed.
Do you mean that RA precalculates things based on the calling
convention and saves it across functions? Hmm. I don't think this
would be a big problem in my intended use case -- there would only be
a handful of functions using this extension, and they'd have very few
non-asm callers.
>
> Another consequence would be that RA fails generate the code in some cases
> and even worse the failure might depend on version of GCC (I already saw PRs
> where RA worked for an asm in one GCC version because a pseudo was changed
> by equivalent constant and failed in another GCC version where it did not
> happen).
>
Would this be a problem generating code for a function with extra
"used" regs, or just a problem generating code to call such a function?
I imagine that, in the former case, RA's job would be easier, not
harder, since there would be more registers to work with. In
practice, though, I think it would just end up changing the prologue
and epilogue.
--Andy
On Wed, Jul 01, 2015 at 11:23:17AM -0400, Vladimir Makarov wrote:
> >>(I'm not necessarily suggesting that we do this for the syscall bodies
> >>themselves. I want to do it for the entry and exit helpers, so we'd
> >>still lose the five cycles in the full fast-path case, but we'd do
> >>better in the slower paths, and the slower paths are becoming
> >>increasingly important in real workloads.)
> >GCC already supports -ffixed-REG, -fcall-used-REG and -fcall-saved-REG
> >options, which allow to tweak the calling conventions; but it is per
> >translation unit right now. It isn't clear which of these options
> >you mean with the extra_clobber.
> >I assume you are looking for a possibility to change this to be
> >per-function, with caller with a different calling convention having to
> >adjust for different ABI callee. To some extent, recent GCC versions
> >do that automatically with -fipa-ra already - if some call used registers
> >are not clobbered by some call and the caller can analyze that callee,
> >it can stick values in such registers across the call.
> >I'd say the most natural API for this would be to allow
> >f{fixed,call-{used,saved}}-REG in target attribute.
> >
> >
> One consequence of frequent changing calling convention per function or
> register usage could be GCC slowdown. RA calculates too many data and it
> requires a lot of time to recalculate them after something in the register
> usage convention is changed.
That is true. i?86/x86_64 is a switchable target, so at least for the case
of info computed for the callee with non-standard calling convention such
info can be computed just once when the function with such a target
attribute would be seen first. But for the caller side, I agree not
everything can be precomputed, if we can't use e.g. regsets saved in the
callee; as a single function can call different functions with different
ABIs. But to some extent we have that already with -fipa-ra, don't we?
Jakub
On 07/01/2015 11:31 AM, Jakub Jelinek wrote:
> On Wed, Jul 01, 2015 at 11:23:17AM -0400, Vladimir Makarov wrote:
>>>> (I'm not necessarily suggesting that we do this for the syscall bodies
>>>> themselves. I want to do it for the entry and exit helpers, so we'd
>>>> still lose the five cycles in the full fast-path case, but we'd do
>>>> better in the slower paths, and the slower paths are becoming
>>>> increasingly important in real workloads.)
>>> GCC already supports -ffixed-REG, -fcall-used-REG and -fcall-saved-REG
>>> options, which allow to tweak the calling conventions; but it is per
>>> translation unit right now. It isn't clear which of these options
>>> you mean with the extra_clobber.
>>> I assume you are looking for a possibility to change this to be
>>> per-function, with caller with a different calling convention having to
>>> adjust for different ABI callee. To some extent, recent GCC versions
>>> do that automatically with -fipa-ra already - if some call used registers
>>> are not clobbered by some call and the caller can analyze that callee,
>>> it can stick values in such registers across the call.
>>> I'd say the most natural API for this would be to allow
>>> f{fixed,call-{used,saved}}-REG in target attribute.
>>>
>>>
>> One consequence of frequent changing calling convention per function or
>> register usage could be GCC slowdown. RA calculates too many data and it
>> requires a lot of time to recalculate them after something in the register
>> usage convention is changed.
> That is true. i?86/x86_64 is a switchable target, so at least for the case
> of info computed for the callee with non-standard calling convention such
> info can be computed just once when the function with such a target
> attribute would be seen first.
Yes, a more clever approach could be used. We can calculate the info for a
specific calling convention, save it, and reuse it for functions with
the same attributes. The compilation speed will be OK even with the
current implementation if there are few calling convention changes.
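A toy sketch of that caching idea, just to make it concrete. None of these names exist in GCC: a "convention" here is only two bitmasks over hard registers, and the derived masks stand in for the expensive tables RA really computes:

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model only, not GCC code. */
struct reg_convention {
    uint32_t fixed;       /* registers the allocator may never use */
    uint32_t call_used;   /* registers clobbered across calls */
};

struct ra_info {
    struct reg_convention key;
    uint32_t allocatable;   /* stand-ins for the expensive derived data */
    uint32_t callee_saved;
};

#define CACHE_SLOTS 8
static struct ra_info cache[CACHE_SLOTS];
static size_t cache_used;
static unsigned recompute_count;   /* how often the slow path ran */

static void recompute(struct ra_info *info, struct reg_convention c)
{
    recompute_count++;
    info->key = c;
    info->allocatable = ~c.fixed;
    info->callee_saved = ~c.fixed & ~c.call_used;
}

/* Return the derived info for a convention, computing it at most once
 * while it stays in the cache. */
const struct ra_info *ra_info_for(struct reg_convention c)
{
    for (size_t i = 0; i < cache_used; i++)
        if (cache[i].key.fixed == c.fixed &&
            cache[i].key.call_used == c.call_used)
            return &cache[i];

    /* Miss: take a free slot, or recycle slot 0 when the cache is full. */
    size_t slot = cache_used < CACHE_SLOTS ? cache_used++ : 0;
    recompute(&cache[slot], c);
    return &cache[slot];
}

unsigned ra_recompute_count(void)
{
    return recompute_count;
}
```

With a single "current convention" slot, alternating between two conventions recomputes on every switch; keyed by convention as above, each distinct convention is computed once.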
> But for the caller side, I agree not
> everything can be precomputed, if we can't use e.g. regsets saved in the
> callee; as a single function can call different functions with different
> ABIs. But to some extent we have that already with -fipa-ra, don't we?
>
>
Yes, with -fipa-ra, if we have seen the function, we know which registers it
actually clobbers. If we have not processed it yet, we use the worst-case
scenario (clobbering all call-clobbered registers according to the calling
convention).
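As a toy model of that rule (not GCC code; the register numbering, `ABI_CALL_USED` and `struct callee` are invented for illustration):

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model: one bit per hard register. Pretend registers 0-3 are the ones
 * the standard calling convention lets a callee clobber. */
#define ABI_CALL_USED 0x0fu

struct callee {
    bool analyzed;        /* has the body already been processed? */
    uint32_t clobbered;   /* registers it actually writes, if analyzed */
};

/* Registers the caller must treat as dead across a call to *f. */
uint32_t call_clobbers(const struct callee *f)
{
    if (f && f->analyzed)
        return f->clobbered & ABI_CALL_USED;   /* precise answer */
    return ABI_CALL_USED;   /* unknown callee: worst case the ABI allows */
}
```

The fallback is only a safe upper bound as long as no callee is allowed to clobber more than the calling convention says.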
Actually, this raises a question for me. Suppose we declare that a function
clobbers more than the calling convention allows, then use it as a value
(assigning it to a variable or passing it as an argument), lose track of
it, and then call it. How can RA know what the call actually clobbers?
So for a function with these attributes we should either prohibit using it
as a value, make the attributes part of the function type, or at least
say it is unsafe. So now I see this as a *bigger problem* with this
extension. Although I guess the problem already exists, since we have the
description of a different ABI as an extension.
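For comparison, a small sketch of how the existing ABI extension behaves on x86-64, where `ms_abi` can be written on a function pointer type as well as on a function. `ALT_ABI` and the function names are made up, and the macro is empty on other targets so the example still builds there:

```c
/* ms_abi only exists on x86; elsewhere the macro expands to nothing. */
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#define ALT_ABI __attribute__((ms_abi))
#else
#define ALT_ABI
#endif

/* The attribute appears on the function ... */
ALT_ABI long add3(long a, long b, long c)
{
    return a + b + c;
}

/* ... and on the pointer type used to call it indirectly. */
typedef long (ALT_ABI *alt_fn)(long, long, long);

long call_through_pointer(alt_fn fn)
{
    /* For this indirect call the compiler takes the argument registers and
     * the clobber set from the pointer's type, not from a known callee. */
    return fn(1, 2, 3);
}

long demo(void)
{
    alt_fn fn = add3;

    /* Assigning add3 to a plain "long (*)(long, long, long)" instead would,
     * as far as I know, be diagnosed by GCC on x86-64 as an incompatible
     * pointer type, because the ABI is part of the function type. */
    return call_through_pointer(fn);
}
```

A per-function clobber attribute would presumably need the same treatment to be safe through pointers; whether that is practical is exactly the open question here.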
On Wed, Jul 1, 2015 at 10:35 AM, Vladimir Makarov wrote:
> Actually it raise a question for me. If we describe that a function
> clobbers more than calling convention and then use it as a value (assigning
> a variable or passing as an argument) and loosing a track of it and than
> call it. How can RA know what the call clobbers actually. So for the
> function with the attributes we should prohibit use it as a value or make
> the attributes as a part of the function type, or at least say it is unsafe.
I think it should be part of the type. This shouldn't compile:
void func(void) __attribute__((used_reg("r12")));
void (*x)(void);
x = func;
--Andy
On Wed, Jul 01, 2015 at 01:35:16PM -0400, Vladimir Makarov wrote:
> Actually it raise a question for me. If we describe that a function
> clobbers more than calling convention and then use it as a value (assigning
> a variable or passing as an argument) and loosing a track of it and than
> call it. How can RA know what the call clobbers actually. So for the
> function with the attributes we should prohibit use it as a value or make
> the attributes as a part of the function type, or at least say it is unsafe.
> So now I see this as a *bigger problem* with this extension. Although I
> guess it already exists as we have description of different ABI as an
> extension.
Unfortunately the target attribute is a function decl attribute rather than
a function type attribute. And having more attributes affect switchable
targets will be non-fun.
Jakub
On 07/01/2015 11:27 AM, Andy Lutomirski wrote:
> On Wed, Jul 1, 2015 at 8:23 AM, Vladimir Makarov <[email protected]> wrote:
>>
>> On 06/30/2015 05:37 PM, Jakub Jelinek wrote:
>>> On Tue, Jun 30, 2015 at 02:22:33PM -0700, Andy Lutomirski wrote:
>>>> I'm working on a massive set of cleanups to Linux's syscall handling.
>>>> We currently have a nasty optimization in which we don't save rbx,
>>>> rbp, r12, r13, r14, and r15 on x86_64 before calling C functions.
>>>> This works, but it makes the code a huge mess. I'd rather save all
>>>> regs in asm and then call C code.
>>>>
>>>> Unfortunately, this will add five cycles (on SNB) to one of the
>>>> hottest paths in the kernel. To counteract it, I have a gcc feature
>>>> request that might not be all that crazy. When writing C functions
>>>> intended to be called from asm, what if we could do:
>>>>
>>>> __attribute__((extra_clobber("rbx", "rbp", "r12", "r13", "r14",
>>>> "r15"))) void func(void);
>>>>
>>>> This will save enough pushes and pops that it could easily give us our
>>>> five cycles back and then some. It's also easy to be compatible with
>>>> old GCC versions -- we could just omit the attribute, since preserving
>>>> a register is always safe.
>>>>
>>>> Thoughts? Is this totally crazy? Is it easy to implement?
>>>>
>>>> (I'm not necessarily suggesting that we do this for the syscall bodies
>>>> themselves. I want to do it for the entry and exit helpers, so we'd
>>>> still lose the five cycles in the full fast-path case, but we'd do
>>>> better in the slower paths, and the slower paths are becoming
>>>> increasingly important in real workloads.)
>>> GCC already supports -ffixed-REG, -fcall-used-REG and -fcall-saved-REG
>>> options, which allow to tweak the calling conventions; but it is per
>>> translation unit right now. It isn't clear which of these options
>>> you mean with the extra_clobber.
>>> I assume you are looking for a possibility to change this to be
>>> per-function, with caller with a different calling convention having to
>>> adjust for different ABI callee. To some extent, recent GCC versions
>>> do that automatically with -fipa-ra already - if some call used registers
>>> are not clobbered by some call and the caller can analyze that callee,
>>> it can stick values in such registers across the call.
>>> I'd say the most natural API for this would be to allow
>>> f{fixed,call-{used,saved}}-REG in target attribute.
>>>
>>>
>> One consequence of frequent changing calling convention per function or
>> register usage could be GCC slowdown. RA calculates too many data and it
>> requires a lot of time to recalculate them after something in the register
>> usage convention is changed.
> Do you mean that RA precalculates things based on the calling
> convention and saves it across functions?
RA calculates a lot of info (register classes, class x class relations, etc.)
based on the register usage convention (fixed regs, call-used registers,
etc.). If the register usage convention has not changed since the previous
function compilation, RA reuses the info. Otherwise, RA recalculates it.
> Hmm. I don't think this
> would be a big problem in my intended use case -- there would only be
> a handful of functions using this extension, and they'd have very few
> non-asm callers.
Good. I guess it will be rarely used and people will tolerate some
extra compilation time.
>> Another consequence would be that RA fails generate the code in some cases
>> and even worse the failure might depend on version of GCC (I already saw PRs
>> where RA worked for an asm in one GCC version because a pseudo was changed
>> by equivalent constant and failed in another GCC version where it did not
>> happen).
>>
> Would this be a problem generating code for a function with extra
> "used" regs or just a problem generating code to call such a function.
> I imagine that, in the former case, RA's job would be easier, not
> harder, since there would be more registers to work with.
Sorry, I meant that the problem will be mostly when the attributes
describe more fixed regs. If you describe more clobbered regs, they
can still be used by the allocator, which can spill/restore them (around
calls) when they cannot be used. Still, I think there will be some rare
and complicated cases where even describing only clobbered regs can make
RA fail in a function that calls the function with additional clobbered regs.
> In
> practice, though, I think it would just end up changing the prologue
> and epilogue.
>
On 07/01/2015 01:43 PM, Jakub Jelinek wrote:
> On Wed, Jul 01, 2015 at 01:35:16PM -0400, Vladimir Makarov wrote:
>> Actually it raise a question for me. If we describe that a function
>> clobbers more than calling convention and then use it as a value (assigning
>> a variable or passing as an argument) and loosing a track of it and than
>> call it. How can RA know what the call clobbers actually. So for the
>> function with the attributes we should prohibit use it as a value or make
>> the attributes as a part of the function type, or at least say it is unsafe.
>> So now I see this as a *bigger problem* with this extension. Although I
>> guess it already exists as we have description of different ABI as an
>> extension.
> Unfortunately target attribute is function decl attribute rather than
> function type. And having more attributes affect switchable targets will be
> non-fun.
>
>
Making the attributes part of the type probably creates a lot of issues too.
Although I am not a front-end developer, I think it is hard to
implement in the front end. Sticking fully to this approach, it would be
logical to describe this in the debug info as well (I am not sure that is
even possible).
Portability would be an issue too. It is hard to prevent a regular
C developer from assigning such a function to a variable because it is OK on
his system, while the compilation of such code may fail on another system.
On Wed, Jul 1, 2015 at 10:43 AM, Jakub Jelinek wrote:
> On Wed, Jul 01, 2015 at 01:35:16PM -0400, Vladimir Makarov wrote:
>> Actually it raise a question for me. If we describe that a function
>> clobbers more than calling convention and then use it as a value (assigning
>> a variable or passing as an argument) and loosing a track of it and than
>> call it. How can RA know what the call clobbers actually. So for the
>> function with the attributes we should prohibit use it as a value or make
>> the attributes as a part of the function type, or at least say it is unsafe.
>> So now I see this as a *bigger problem* with this extension. Although I
>> guess it already exists as we have description of different ABI as an
>> extension.
>
> Unfortunately target attribute is function decl attribute rather than
> function type. And having more attributes affect switchable targets will be
> non-fun.
Just to make sure we're on the same page here, if I write:
    extern void normal_func(void);
    __attribute__((used_regs("r12"))) void weird_func(void)
    {
        // do something
        normal_func();
        // do something
    }
I'd want the code that calls normal_func() to understand that
normal_func() *will* preserve r12 despite the fact that weird_func is
allowed to clobber r12. I think this means that the attribute would
have to be an attribute of a function, not a setting of the RA while
compiling the function.
--Andy
On 07/01/2015 10:43 AM, Jakub Jelinek wrote:
> On Wed, Jul 01, 2015 at 01:35:16PM -0400, Vladimir Makarov wrote:
>> Actually it raise a question for me. If we describe that a function
>> clobbers more than calling convention and then use it as a value (assigning
>> a variable or passing as an argument) and loosing a track of it and than
>> call it. How can RA know what the call clobbers actually. So for the
>> function with the attributes we should prohibit use it as a value or make
>> the attributes as a part of the function type, or at least say it is unsafe.
>> So now I see this as a *bigger problem* with this extension. Although I
>> guess it already exists as we have description of different ABI as an
>> extension.
>
> Unfortunately target attribute is function decl attribute rather than
> function type. And having more attributes affect switchable targets will be
> non-fun.
>
How on Earth does that work with existing switchable ABIs? Keep in mind
that we already support multiple ABIs...
-hpa