2008-06-25 16:08:01

by Agner Fog

[permalink] [raw]
Subject: ABI change for device drivers using future AVX instruction set

The announced future Intel AVX instruction set extends the 128-bit XMM
registers to 256-bit YMM registers.

Intel has proposed relevant ABI extensions, which I assume will be
adopted in the System V ABI. See references below.

Some details are not covered in the Intel documents. I have discussed
this with an Intel engineer who has supplied all the details I asked
for. I have listed the necessary ABI changes in detail in my manual on
calling conventions (see below).

One problem that has not been resolved yet, AFAIK, is how to handle the
possible use of YMM registers in device drivers. Should these registers
be saved before calling a device driver from an interrupt or should it
be the responsibility of the device driver?

This is particularly problematic for the following reasons:

1. The YMM registers must be saved with the new instruction XSAVE and
restored with XRESTOR if done in the device driver. Saving registers
individually will be incompatible with future extensions of the register
size to more than 256 bits.

2. There is a performance cost to using XSAVE / XRESTOR.

3. When compiling a device driver, the compiler may insert implicit
calls to library functions such as memcpy and memset. These functions
typically have a CPU dispatcher that chooses the largest register size
available. The device driver may therefore use YMM registers without the
knowledge of the programmer and without compiling with the AVX switch on.

4. The consequences of failing to save the YMM registers properly would
be intermittent and irreproducible errors that are difficult to trace.

The possible solutions, as I see it, are:

A. The operating system saves the state with XSAVE before calling a
device driver from an interrupt and restores with XRESTOR. The device
driver can use any register. This method is safe but has a performance
penalty.

B. The operating system disables the use of YMM registers with the
instruction XSETBV before calling a device driver from an interrupt. If
the device driver needs to use YMM registers it must save the state with
XSAVE before enabling YMM with XSETBV, and reverse these actions before
returning.

C. Make it the responsibility of the device driver to avoid the use of
YMM registers unless it saves the state with XSAVE. This solution
requires that available compilers have a switch to disable calls to
library functions with internal CPU dispatchers. Appears unsafe to me.

A decision on this question should be made and published in the ABI so
that people can make compatible device drivers.

Note:
Please Cc: me on this thread. I am not on this mailing list and I am not
involved with Linux development.

References:
Intel AVX programming reference:
http://softwarecommunity.intel.com/isn/downloads/intelavx/Intel-AVX-Programming-Reference-31943302.pdf

Intel proposed ABI extensions (very brief):
http://intel.wingateweb.com/SHchina/published/NGMS002/SP_NGMS002_100r_eng.pdf

My interpretation of the ABI extensions in detail:
http://www.agner.org/optimize/calling_conventions.pdf

My discussion with Mark Buxton, Intel
http://softwarecommunity.intel.com/isn/Community/en-US/forums/thread/30257153.aspx


2008-06-25 16:22:36

by Arjan van de Ven

[permalink] [raw]
Subject: Re: ABI change for device drivers using future AVX instruction set

On Wed, 25 Jun 2008 17:32:36 +0200
Agner Fog <[email protected]> wrote:

>
> One problem that has not been resolved yet, AFAIK, is how to handle
> the possible use of YMM registers in device drivers. Should these
> registers be saved before calling a device driver from an interrupt
> or should it be the responsibility of the device driver?

the answer is: you don't use AVX in drivers just as you don't use SSE
in drivers.

Let me repeat this loud and clear:

It is not allowed to use floating point, SSE of AVX in device drivers.

(there are few places that very carefully do this anyway, the raid5
code being the most obvious one, but we should not add more)


> 2. There is a performance cost to using XSAVE / XRESTOR.

a patch introducing xsave/xrestor is being discussed already here...
it's not really more expensive.

>
> 3. When compiling a device driver, the compiler may insert implicit
> calls to library functions such as memcpy and memset. These functions
> typically have a CPU dispatcher that chooses the largest register
> size available. The device driver may therefore use YMM registers
> without the knowledge of the programmer and without compiling with
> the AVX switch on.

this is not correct; the library functions are provided by the kernel
and do not use AVX etc.


--
If you want to reach me at my work email, use [email protected]
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2008-06-25 19:54:22

by Agner Fog

[permalink] [raw]
Subject: Re: ABI change for device drivers using future AVX instruction set

Arjan van de Ven wrote:
>Let me repeat this loud and clear:
>It is not allowed to use floating point, SSE of AVX in device drivers.

Thank you for a clear answer. Re-reading the ABI:
>A.2.4 Miscellaneous Remarks
>Linux Kernel code is not allowed to change the x87 and SSE units. If
those are
>changed by kernel code, they have to be restored properly before
sleeping or leaving
>the kernel. On preemptive kernels also more precautions may be needed.

This states your point, but certainly not "loud and clear" :-)

This ABI statement has to be amended, however, because it is allowed to
use any register as long it is "restored properly". A programmer can
save a single XMM register, use it, and restore it. A programmer might
think that it is possible to do the same with a YMM register, but this
will not work on future systems that extend the 256-bit YMM registers to
512-bit ZMM or whatever. This has to be said "loud and clear" because
nobody would discover the error until a distant future when we have ZMM.

Let me explain the background for this as I understand it:
Intel have decided to make two versions of every existing XMM
instruction in the new AVX: A legacy version that leaves the upper half
of the YMM register unchanged, and a new AVX version that clears the
upper part of the YMM register in order to avoid false dependencies.
This very expensive decision was necessary for only one reason: To
prevent existing device drivers from destroying the upper part of the
YMM registers. Learning from past mistakes, they are saying already now
that they will not use the same fix next time the registers are made
bigger. When ZMM and still bigger registers are introduced, the YMM
instructions will clear bit number 256 and up of the ZMM or whatever
registers.

Note: Please Cc: me of any replies. I am not on the mailing list.

2008-06-25 20:09:29

by Arjan van de Ven

[permalink] [raw]
Subject: Re: ABI change for device drivers using future AVX instruction set

On Wed, 25 Jun 2008 21:54:04 +0200
Agner Fog <[email protected]> wrote:

> Arjan van de Ven wrote:
> >Let me repeat this loud and clear:
> >It is not allowed to use floating point, SSE of AVX in device
> >drivers.
>
> Thank you for a clear answer. Re-reading the ABI:
> >A.2.4 Miscellaneous Remarks
> >Linux Kernel code is not allowed to change the x87 and SSE units.
> >If
> those are
> >changed by kernel code, they have to be restored properly before
> sleeping or leaving
> >the kernel. On preemptive kernels also more precautions may be
> >needed.
>
> This states your point, but certainly not "loud and clear" :-)
>
> This ABI statement has to be amended, however, because it is allowed

the linux kernel policy is loud and clear; this is more an OS policy as
it is a platform ABI issue. We as linux do not allow this (with the
small footnote around the raid stuff, but that has specific very
careful use). In addition, you really aren't allowed to use floating
point in the kernel period as concept, if you need it you very likely
are doing something wrong.


--
If you want to reach me at my work email, use [email protected]
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2008-06-25 20:32:22

by Alan

[permalink] [raw]
Subject: Re: ABI change for device drivers using future AVX instruction set

> 3. When compiling a device driver, the compiler may insert implicit
> calls to library functions such as memcpy and memset. These functions
> typically have a CPU dispatcher that chooses the largest register size
> available. The device driver may therefore use YMM registers without the
> knowledge of the programmer and without compiling with the AVX switch on.

Linux uses its own routines in these cases - and in fact for MMX we
generally avoid kernel usage in order to avoid the overheads (ditto FP).

> A decision on this question should be made and published in the ABI so
> that people can make compatible device drivers.

The beauty of the Linux kernel being source based - we don't have to worry
about ABI problems like this.

Alan

2008-06-26 01:14:42

by H. Peter Anvin

[permalink] [raw]
Subject: Re: ABI change for device drivers using future AVX instruction set

Arjan van de Ven wrote:
>
> Let me repeat this loud and clear:
>
> It is not allowed to use floating point, SSE of AVX in device drivers.
>
> (there are few places that very carefully do this anyway, the raid5
> code being the most obvious one, but we should not add more)
>

Sadly, AVX repeats the mistakes of SSE1, and doesn't implement proper
wide support for integer operations. It has the basic bitwide stuff,
which makes it usable for RAID-5, but it doesn't extend MMX (which SSE2
eventually got around to), so it's not usable for RAID-6. I'd hoped to
find a VPERM-style instruction, like SSE5 has :(

-hpa

2008-06-26 14:01:19

by Andi Kleen

[permalink] [raw]
Subject: Re: ABI change for device drivers using future AVX instruction set

Agner Fog <[email protected]> writes:

> Some details are not covered in the Intel documents. I have discussed
> this with an Intel engineer who has supplied all the details I asked
> for. I have listed the necessary ABI changes in detail in my manual on
> calling conventions (see below).

The discussion there pretty much only applies to other operating
systems, not Linux.

> One problem that has not been resolved yet, AFAIK, is how to handle
> the possible use of YMM registers in device drivers. Should these
> registers be saved before calling a device driver from an interrupt or
> should it be the responsibility of the device driver?

The Linux kernel always saves FP and related state only in the function
that uses it (in the "device driver" in your terminology), never on
kernel entry/exit (at least on x86, some other architectures like
IA64 use slightly different schemes because there even standard
integer code sometimes uses the FPU for common operations)

Also actually using it is very rare. In the mainline kernel it's only
the software RAID support functions.

So you could say it's solved.

> This is particularly problematic for the following reasons:
>
> 1. The YMM registers must be saved with the new instruction XSAVE and
> restored with XRESTOR if done in the device driver. Saving registers
> individually will be incompatible with future extensions of the
> register size to more than 256 bits.

The device driver should know what registers it uses so it can
call the right functions. But Linux has standard functions to do this
anyways, and I believe in the current XSAVE patchkit they already
use XSAVE.

> 3. When compiling a device driver, the compiler may insert implicit
> calls to library functions such as memcpy and memset. These functions
> typically have a CPU dispatcher that chooses the largest register size

Not in the Linux kernel. Or rather there is a CPU dispatcher,
but it only selects between two simple integer copying strategies
(string instructions and an unrolled loop).

That works because most operations in the Linux kernel are only
on 4K pieces where SSE* operations tend to not make too much sense.

> A. The operating system saves the state with XSAVE before calling a
> device driver from an interrupt and restores with XRESTOR. The device
> driver can use any register. This method is safe but has a performance
> penalty.

Linux never did that and there's no need to do it.

> C. Make it the responsibility of the device driver to avoid the use of
> YMM registers unless it saves the state with XSAVE. This solution
> requires that available compilers have a switch to disable calls to
> library functions with internal CPU dispatchers. Appears unsafe to me.

Kernel modules are always compiled with FP/MMX/SSE/etc. disabled
on the compiler level. And we only support one compiler (gcc)

BTW there's also a D. you didn't mention: many OS (including Linux) use lazy
FPU saving schemes and the exception handler doing that could be fixed for
supporting kernel code too. Right now it doesn't on Linux because there's
no need.

-Andi

2008-06-27 11:32:09

by Agner Fog

[permalink] [raw]
Subject: Re: ABI change for device drivers using future AVX instruction set

Arjan van de Ven wrote:

>the linux kernel policy is loud and clear; this is more an OS policy as
>it is a platform ABI issue.

It doesn't help to say it "loud and clear" in a closed mailing list. It has to go into an official document that people can find, and the ABI is the most natural place to look for such rules. The ABI is hard enough to find. Is there an official OS policy document somewhere that I haven't found? Please point me to an authoritative document.

You can't blame driver makers for using XMM or YMM registers in inline assembly or intrinsic functions or calling their own libraries or using a different compiler unless there is an official rule against it written in some official document that is easy to find. If you want to move data in a device driver, it is tempting indeed to use the largest register size available.

I will write the rules in my manual, but it is not authoritative.


H. Peter Anvin wrote:
>Sadly, AVX repeats the mistakes of SSE1, and doesn't implement proper
wide support for integer operations.
>It has the basic bitwide stuff, which makes it usable for RAID-5, but
it doesn't extend MMX (which SSE2
>eventually got around to), so it's not usable for RAID-6. I'd hoped
to find a VPERM-style instruction, like
>SSE5 has :(

There is no problem in using the floating point permutation instructions
on integer data. They will not generate exceptions or anything on
denormal numbers. Only the smallest data size is 32 bits, of course.
Integer YMM instructions will probably come in a later version.

Note: Please Cc: me on answers, I am not on the mailing list.

2008-06-27 14:22:49

by Arjan van de Ven

[permalink] [raw]
Subject: Re: ABI change for device drivers using future AVX instruction set

On Fri, 27 Jun 2008 13:31:49 +0200
Agner Fog <[email protected]> wrote:

> Arjan van de Ven wrote:
>
> >the linux kernel policy is loud and clear; this is more an OS policy
> >as it is a platform ABI issue.
>
> It doesn't help to say it "loud and clear" in a closed mailing list.

lkml is rather public

> It has to go into an official document that people can find, and the
> ABI is the most natural place to look for such rules. The ABI is hard
> enough to find. Is there an official OS policy document somewhere
> that I haven't found? Please point me to an authoritative document.

DocBook/kernel-hacking.tmpl has a section about it.


> You can't blame driver makers for using XMM or YMM registers in
> inline assembly or intrinsic functions or calling their own libraries
> or using a different compiler unless there is an official rule
> against it written in some official document that is easy to find. If
> you want to move data in a device driver, it is tempting indeed to
> use the largest register size available.

the good news is that we review drivers before they get included and we
do catch such things. In addition, we give the compiler the command
line options that prevent it from using these instructions....
I don't think this is a problem in practice for Linux drivers.

(I'm deliberately ignoring non-opensource drivers here; you get what
you get with those; "just say no").



--
If you want to reach me at my work email, use [email protected]
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2008-06-28 08:06:16

by Agner Fog

[permalink] [raw]
Subject: Re: ABI change for device drivers using future AVX instruction set

Arjan van de Ven wrote:
>the good news is that we review drivers before they get included and
we do catch such things.

Are you saying that some group should have the monopoly of approving
device drivers? That is exectly the policy that Microsoft has been so
much criticized for. You can only control the drivers that are included
in your own Linux distribution.

Nothing should prevent me from publishing my own driver for some special
purpose (in fact, I am going to do so). The open source principle would
allow anybody to make a different compiler for device drivers, possibly
using a different programming language. Or making a function library for
use in device drivers.

Thank you for the reference to DocBook/kernel-hacking. The title reads
"Unreliable Guide To Hacking The Linux Kernel". This doesn't really look
like the place to look for authoritative information. It warns against
using floating point/MMX, but not XMM. You can't blame programmers for
making faulty device drivers when there is no authoritative guideline to
follow.

You don't seem to realize the importance of proper documentation. Do you
want me to write in my manual: "There is no authoritative information on
this, but the Linux Junta says so-and-so.."?

How can you expect people to regard Linux as a serious and reliable
alternative to ..BEEP.. when there is no proper documentation?

It is a disgrace that the ABI consists of fractions stored in weird
places and some of them still drafts, but at least there is consensus on
which one to look in for authoritative information. There should be an
authoritative document on what you can do and not do in a device driver,
and the ABI would be the most natural place to put this information. It
is important to tell people that they can save and restore an XMM
register under certain conditions, but not a YMM register. You can't
expect people to search through a huge mailing list archive for such
information.


Note: Please Cc: me of answers

2008-06-28 08:10:32

by David Miller

[permalink] [raw]
Subject: Re: ABI change for device drivers using future AVX instruction set

From: Agner Fog <[email protected]>
Date: Sat, 28 Jun 2008 10:05:46 +0200

> Thank you for the reference to DocBook/kernel-hacking. The title reads
> "Unreliable Guide To Hacking The Linux Kernel".

The title of that document is an application of "sarcasm".

2008-06-28 11:47:57

by Andi Kleen

[permalink] [raw]
Subject: Re: ABI change for device drivers using future AVX instruction set

Agner Fog <[email protected]> writes:

I explained it earlier, but you didn't seem to have understood it.
Here's it again in a simpler form:

If a x86 device driver doesn't use the standard kernel interfaces
for saving/restoring extended state (kernel_fpu_begin/end) it will die.
That is because Linux uses lazy FPU saving by default and when the lazy FPU exception
hits kernel code it will crash because that's not allowed.

And the standard interfaces are going to handle all extended state supported
by the kernel. Full XSAVE support should be there for 2.6.27.

Pretty much the whole problem you describe in your original email doesn't
apply to Linux because of that.

-Andi

2008-06-28 15:09:30

by Agner Fog

[permalink] [raw]
Subject: Re: ABI change for device drivers using future AVX instruction set

Andi Kleen wrote:
>If a x86 device driver doesn't use the standard kernel interfaces for
saving/restoring extended state
> (kernel_fpu_begin/end) it will die.That is because Linux uses lazy
FPU saving by default and when the
>lazy FPU exception hits kernel code it will crash because that's not
allowed. And the standard interfaces
>are going to handle all extended state supported by the kernel. Full
XSAVE support should be there for
> 2.6.27.

Thank you Andi. Finally an exact and exhaustive answer. :-)
This info is not in the "Unreliable Guide To Hacking The Linux Kernel"
or anywhere else except deeply hidden in the archives of this mailing
list. I had to actually look into the source code of kernel_fpu_begin to
verify that it saves not only the FPU but also the XMM registers and
that it disables pre-emption.

You see why I want proper documentation? If this had been documented in
some reference that was easy to find, I wouldn't have needed to take
your time with all these questions...

Note: Please Cc: me of answers.

2008-06-28 15:44:18

by Helge Hafting

[permalink] [raw]
Subject: Re: ABI change for device drivers using future AVX instruction set

Agner Fog wrote:

> You see why I want proper documentation? If this had been documented in
> some reference that was easy to find, I wouldn't have needed to take
> your time with all these questions...

This is correct - but then someone would have to use time to write that
documentation _and keep it up-to-date_. So far, nobody volunteered.
You should feel free to add your newly gained information to the
existing incomplete documentation. :)

This is how open-source works - there is no "boss/management"
that can put someone on the less popular jobs.
No matter how useful this might be.
You want something extra - driver or docs - you make it.

Helge Hafting

2008-06-28 19:51:14

by Andi Kleen

[permalink] [raw]
Subject: Re: ABI change for device drivers using future AVX instruction set

> This info is not in the "Unreliable Guide To Hacking The Linux Kernel"

The unreliable guide is a little outdated.

> or anywhere else except deeply hidden in the archives of this mailing

Normally all introductions of kernel programming (like
the classic "Linux device drivers") should document that FPU code
is not allowed. The kernel also enforces that by passing special
compiler optionsthat cause compiler errors for FPU and SSE code.
You have to explicitely override those.

> list. I had to actually look into the source code of kernel_fpu_begin to
> verify that it saves not only the FPU but also the XMM registers and
> that it disables pre-emption.

The requirement to disable preemption is one reason why XMM in
a driver is not a good idea BTW. XMM should be normally only
used when you plan to spend a lot of CPU cycles (otherwise
the cost of saving the state is not amortized by the improvements).

But keeping preemption disabled for a long time is considered
unfriendly because it increases kernel latencies and might
in the worst case cause visible scheduling problems like skipping audio
etc.

> You see why I want proper documentation? If this had been documented in
> some reference that was easy to find, I wouldn't have needed to take
> your time with all these questions...

Yes agreed the documentation could be better. The standard Linux
reference documentation is "kerneldoc" which you can generate from the
kernel source with make pdfdocs/mandocs/htmldocs
(some distributions ship pregenerated docs in a separate kernel-docs package)

Unfortunately kerneldoc documents a lot of obscure unimportant
functions, but doesn't document important functionality like
kernel_fpu_begin/end (or even important functions like kmalloc!)

There's a new editor for kernel documentation now so things might
improve.

-Andi

2008-06-29 11:33:49

by Avi Kivity

[permalink] [raw]
Subject: Re: ABI change for device drivers using future AVX instruction set

Andi Kleen wrote:
>
>> list. I had to actually look into the source code of kernel_fpu_begin to
>> verify that it saves not only the FPU but also the XMM registers and
>> that it disables pre-emption.
>>
>
> The requirement to disable preemption is one reason why XMM in
> a driver is not a good idea BTW. XMM should be normally only
> used when you plan to spend a lot of CPU cycles (otherwise
> the cost of saving the state is not amortized by the improvements).
>
> But keeping preemption disabled for a long time is considered
> unfriendly because it increases kernel latencies and might
> in the worst case cause visible scheduling problems like skipping audio
> etc.
>

This is fixable. We could change kernel_fpu_begin() not to disable
preemption, but instead set a task flag. When we get the "no device"
fault, if the flag is set, save the fpu state into the kernel fpu save
area instead of the user fpu save area. Similar logic for restoring the
fpu state.

kvm runs with the guest fpu context while in the host kernel, using
preempt notifiers to achieve the same thing.

--
error compiling committee.c: too many arguments to function

2008-06-29 12:21:35

by Andi Kleen

[permalink] [raw]
Subject: Re: ABI change for device drivers using future AVX instruction set

Avi Kivity wrote:

> This is fixable.

Sure nearly everything is fixable, but why would you want to do that?

For me the best fix currently seems to be to just not do that
when it hurts.

> We could change kernel_fpu_begin() not to disable
> preemption, but instead set a task flag. When we get the "no device"
> fault, if the flag is set, save the fpu state into the kernel fpu save
> area

What kernel fpu save area do you mean?

-Andi

2008-06-29 12:32:06

by Avi Kivity

[permalink] [raw]
Subject: Re: ABI change for device drivers using future AVX instruction set

Andi Kleen wrote:
> Avi Kivity wrote:
>
>
>> This is fixable.
>>
>
> Sure nearly everything is fixable, but why would you want to do that?
>
> For me the best fix currently seems to be to just not do that
> when it hurts.
>
>

We already use sse in the kernel (raid), presumably disabling preemption
there hurts some workloads.

With sse/avx gaining more features, we may see more requirements for
kernel fpu.

>> We could change kernel_fpu_begin() not to disable
>> preemption, but instead set a task flag. When we get the "no device"
>> fault, if the flag is set, save the fpu state into the kernel fpu save
>> area
>>
>
> What kernel fpu save area do you mean?
>

A new one, of course.

--
error compiling committee.c: too many arguments to function

2008-06-29 12:46:59

by Alan

[permalink] [raw]
Subject: Re: ABI change for device drivers using future AVX instruction set

> Nothing should prevent me from publishing my own driver for some special
> purpose (in fact, I am going to do so). The open source principle would

Go ahead but the API is *source* level and changes each release so the
ABI question doesn't happen.

> It is a disgrace that the ABI

We don't have an ABI.

Alan

2008-06-29 13:07:27

by Andi Kleen

[permalink] [raw]
Subject: Re: ABI change for device drivers using future AVX instruction set


>>> We could change kernel_fpu_begin() not to disable
>>> preemption, but instead set a task flag. When we get the "no device"
>>> fault, if the flag is set, save the fpu state into the kernel fpu save
>>> area
>>
>> What kernel fpu save area do you mean?
>>
>
> A new one, of course.
>

With that we would be eventually in the mess Agner talked about.

-Andi

2008-06-29 13:19:09

by Avi Kivity

[permalink] [raw]
Subject: Re: ABI change for device drivers using future AVX instruction set

Andi Kleen wrote:
>>>> We could change kernel_fpu_begin() not to disable
>>>> preemption, but instead set a task flag. When we get the "no device"
>>>> fault, if the flag is set, save the fpu state into the kernel fpu save
>>>> area
>>>>
>>> What kernel fpu save area do you mean?
>>>
>>>
>> A new one, of course.
>>
>>
>
> With that we would be eventually in the mess Agner talked about.
>
>

If you use xsave, I don't see how this is different to the user fpu save
area.

--
error compiling committee.c: too many arguments to function

2008-06-29 13:23:29

by Andi Kleen

[permalink] [raw]
Subject: Re: ABI change for device drivers using future AVX instruction set

Avi Kivity wrote:
> Andi Kleen wrote:
>>>>> We could change kernel_fpu_begin() not to disable
>>>>> preemption, but instead set a task flag. When we get the "no device"
>>>>> fault, if the flag is set, save the fpu state into the kernel fpu save
>>>>> area
>>>> What kernel fpu save area do you mean?
>>>>
>>> A new one, of course.
>>>
>>>
>>
>> With that we would be eventually in the mess Agner talked about.
>>
>>
>
> If you use xsave, I don't see how this is different to the user fpu save
> area.

For once there's no clear error handling path for allocation failures
on the (arbitarily sized) xsave state. On user code that can be barely
tolerated, but for the kernel it would be deadly.

-Andi

2008-06-29 13:29:55

by Avi Kivity

[permalink] [raw]
Subject: Re: ABI change for device drivers using future AVX instruction set

Andi Kleen wrote:



>> If you use xsave, I don't see how this is different to the user fpu save
>> area.
>>
>
> For once there's no clear error handling path for allocation failures
> on the (arbitarily sized) xsave state. On user code that can be barely
> tolerated, but for the kernel it would be deadly.
>

if (kernel_fpu_begin() < 0)
goto no_sse;

The question is not if it can be done or not, it's whether making raid
preemptible is worthwhile, and whether we see/want more sse/avx
accelerated drivers or not.

--
error compiling committee.c: too many arguments to function

2008-06-29 22:16:26

by H. Peter Anvin

[permalink] [raw]
Subject: Re: ABI change for device drivers using future AVX instruction set

Andi Kleen wrote:
>>>
>> If you use xsave, I don't see how this is different to the user fpu save
>> area.
>
> For once there's no clear error handling path for allocation failures
> on the (arbitarily sized) xsave state. On user code that can be barely
> tolerated, but for the kernel it would be deadly.
>

Well, that's relatively easily dealt with... you'd have to allocate that
state save area explicitly in kernel_fpu_begin(), and it would be up to
the callers of that function to handle the resulting sleep and/or
allocation failure -- we could even make kernel_fpu_begin() take a GFP_*
flag.

Now, there are a few possibilities what to do with said state area. One
is to make it associated with the task; if used for something as RAID,
this would mean pretty soon *all* (or nearly all) tasks have such a
state area, and we might as well go back to allocating them preemptively
on thread creation. The other, of course, is to destroy it in
kernel_fpu_end(). This may cause quite a bit of allocation/deallocation
overhead, but perhaps this would be a decent use of a quicklist.

-hpa

2008-07-01 15:31:16

by Arjan van de Ven

[permalink] [raw]
Subject: Re: ABI change for device drivers using future AVX instruction set

On Sat, 28 Jun 2008 10:05:46 +0200
Agner Fog <[email protected]> wrote:

> Arjan van de Ven wrote:
> >the good news is that we review drivers before they get included
> >and
> we do catch such things.
>
> Are you saying that some group should have the monopoly of approving
> device drivers? That is exectly the policy that Microsoft has been so
> much criticized for. You can only control the drivers that are
> included in your own Linux distribution.

the linux model is that drivers (eventually) get included into the
linux kernel source code. Everything we do process wise is centered
around that concept.

Nothing prevents you from going outside the process. But you need to
realize that that is exactly that: outside the process. Meaning: you're
partially on your own.


--
If you want to reach me at my work email, use [email protected]
For development, discussion and tips for power savings,
visit http://www.lesswatts.org