Changes since v1 [1]:
* fixup the ifence definition to use alternative_2 per recent AMD
changes in tip/x86/pti (Tom)
* drop 'nospec_ptr' (Linus, Mark)
* rename 'nospec_array_ptr' to 'array_ptr' (Alexei)
* rename 'nospec_barrier' to 'ifence' (Peter, Ingo)
* clean up occasions of 'variable assignment in if()' (Sergei, Stephen)
* make 'array_ptr' use a mask instead of an architectural ifence by
default (Linus, Alexei)
* provide a command line and compile-time opt-in to the ifence
mechanism, if an architecture provides 'ifence_array_ptr'.
* provide an optimized mask generation helper, 'array_ptr_mask', for
x86 (Linus)
* move 'get_user' hardening from '__range_not_ok' to '__uaccess_begin'
(Linus)
* drop "Thermal/int340x: prevent bounds-check..." since userspace does
not have arbitrary control over the 'trip' index (Srinivas)
* update the changelog of "net: mpls: prevent bounds-check..." and keep
it in the series to continue the debate about Spectre hygiene patches.
(Eric).
* record a reviewed-by from Laurent on "[media] uvcvideo: prevent
bounds-check..."
* update the cover letter
[1]: https://lwn.net/Articles/743376/
---
Quoting Mark's original RFC:
"Recently, Google Project Zero discovered several classes of attack
against speculative execution. One of these, known as variant-1, allows
explicit bounds checks to be bypassed under speculation, providing an
arbitrary read gadget. Further details can be found on the GPZ blog [2]
and the Documentation patch in this series."
This series incorporates Mark Rutland's latest ARM changes and adds
the x86 specific implementation of 'ifence_array_ptr'. That ifence
based approach is provided as an opt-in fallback, but the default
mitigation, '__array_ptr', uses a 'mask' approach that removes
conditional branches instructions, and otherwise aims to redirect
speculation to use a NULL pointer rather than a user controlled value.
The mask is generated by the following from Alexei, and Linus:
mask = ~(long)(_i | (_s - 1 - _i)) >> (BITS_PER_LONG - 1);
...and Linus provided an optimized mask generation helper for x86:
asm ("cmpq %1,%2; sbbq %0,%0;"
:"=r" (mask)
:"r"(sz),"r" (idx)
:"cc");
The 'array_ptr' mechanism can be switched between 'mask' and 'ifence'
via the spectre_v1={mask,ifence} command line option, and the
compile-time default is set by selecting either CONFIG_SPECTRE1_MASK or
CONFIG_SPECTRE1_IFENCE.
The 'array_ptr' infrastructure is the primary focus this patch set. The
individual patches that perform 'array_ptr' conversions are a point in
time (i.e. earlier kernel, early analysis tooling, x86 only etc...)
start at finding some of these gadgets.
Another consideration for reviewing these patches is the 'hygiene'
argument. When a patch refers to hygiene it is concerned with stopping
speculation on an unconstrained or insufficiently constrained pointer
value under userspace control. That by itself is not sufficient for
attack (per current understanding) [3], but it is a necessary
pre-condition. So 'hygiene' refers to cleaning up those suspect
pointers regardless of whether they are usable as a gadget.
These patches are also be available via the 'nospec-v2' git branch
here:
git://git.kernel.org/pub/scm/linux/kernel/git/djbw/linux nospec-v2
Note that the BPF fix for Spectre variant1 is merged in the bpf.git
tree [4], and is not included in this branch.
[2]: https://googleprojectzero.blogspot.co.uk/2018/01/reading-privileged-memory-with-side.html
[3]: https://spectreattack.com/spectre.pdf
[4]: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git/commit/?id=b2157399cc98
---
Dan Williams (16):
x86: implement ifence()
x86: implement ifence_array_ptr() and array_ptr_mask()
asm-generic/barrier: mask speculative execution flows
x86: introduce __uaccess_begin_nospec and ASM_IFENCE
x86: use __uaccess_begin_nospec and ASM_IFENCE in get_user paths
ipv6: prevent bounds-check bypass via speculative execution
ipv4: prevent bounds-check bypass via speculative execution
vfs, fdtable: prevent bounds-check bypass via speculative execution
userns: prevent bounds-check bypass via speculative execution
udf: prevent bounds-check bypass via speculative execution
[media] uvcvideo: prevent bounds-check bypass via speculative execution
carl9170: prevent bounds-check bypass via speculative execution
p54: prevent bounds-check bypass via speculative execution
qla2xxx: prevent bounds-check bypass via speculative execution
cw1200: prevent bounds-check bypass via speculative execution
net: mpls: prevent bounds-check bypass via speculative execution
Mark Rutland (3):
Documentation: document array_ptr
arm64: implement ifence_array_ptr()
arm: implement ifence_array_ptr()
Documentation/speculation.txt | 142 ++++++++++++++++++++++++++++++
arch/arm/Kconfig | 1
arch/arm/include/asm/barrier.h | 24 +++++
arch/arm64/Kconfig | 1
arch/arm64/include/asm/barrier.h | 24 +++++
arch/x86/Kconfig | 3 +
arch/x86/include/asm/barrier.h | 46 ++++++++++
arch/x86/include/asm/msr.h | 3 -
arch/x86/include/asm/smap.h | 4 +
arch/x86/include/asm/uaccess.h | 16 +++
arch/x86/include/asm/uaccess_32.h | 6 +
arch/x86/include/asm/uaccess_64.h | 12 +--
arch/x86/lib/copy_user_64.S | 3 +
arch/x86/lib/usercopy_32.c | 8 +-
drivers/media/usb/uvc/uvc_v4l2.c | 9 +-
drivers/net/wireless/ath/carl9170/main.c | 7 +
drivers/net/wireless/intersil/p54/main.c | 9 +-
drivers/net/wireless/st/cw1200/sta.c | 11 +-
drivers/net/wireless/st/cw1200/wsm.h | 4 -
drivers/scsi/qla2xxx/qla_mr.c | 17 ++--
fs/udf/misc.c | 40 +++++---
include/linux/fdtable.h | 7 +
include/linux/nospec.h | 71 +++++++++++++++
kernel/Kconfig.nospec | 31 +++++++
kernel/Makefile | 1
kernel/nospec.c | 52 +++++++++++
kernel/user_namespace.c | 11 +-
lib/Kconfig | 3 +
net/ipv4/raw.c | 10 +-
net/ipv6/raw.c | 10 +-
net/mpls/af_mpls.c | 12 +--
31 files changed, 521 insertions(+), 77 deletions(-)
create mode 100644 Documentation/speculation.txt
create mode 100644 include/linux/nospec.h
create mode 100644 kernel/Kconfig.nospec
create mode 100644 kernel/nospec.c
On Thu, Jan 11, 2018 at 4:46 PM, Dan Williams <[email protected]> wrote:
>
> This series incorporates Mark Rutland's latest ARM changes and adds
> the x86 specific implementation of 'ifence_array_ptr'. That ifence
> based approach is provided as an opt-in fallback, but the default
> mitigation, '__array_ptr', uses a 'mask' approach that removes
> conditional branches instructions, and otherwise aims to redirect
> speculation to use a NULL pointer rather than a user controlled value.
Do you have any performance numbers and perhaps example code
generation? Is this noticeable? Are there any microbenchmarks showing
the difference between lfence use and the masking model?
Having both seems good for testing, but wouldn't we want to pick one in the end?
Also, I do think that there is one particular array load that would
seem to be pretty obvious: the system call function pointer array.
Yes, yes, the actual call is now behind a retpoline, but that protects
against a speculative BTB access, it's not obvious that it protects
against the mispredict of the __NR_syscall_max comparison in
arch/x86/entry/entry_64.S.
The act of fetching code is a kind of read too. And retpoline protects
against BTB stuffing etc, but what if the _actual_ system call
function address is wrong (due to mis-prediction of the system call
index check)?
Should the array access in entry_SYSCALL_64_fastpath be made to use
the masking approach?
Linus
On Fri, Jan 12, 2018 at 4:15 PM, Tony Luck <[email protected]> wrote:
>
> Here there isn't any reason for speculation. The core has the
> value of 'x' in a register and the upper bound encoded into the
> "cmp" instruction. Both are right there, no waiting, no speculation.
So this is an argument I haven't seen before (although it was brought
up in private long ago), but that is very relevant: the actual scope
and depth of speculation.
Your argument basically depends on just what gets speculated, and on
the _actual_ order of execution.
So your argument depends on "the uarch will actually run the code in
order if there are no events that block the pipeline".
Or at least it depends on a certain latency of the killing of any OoO
execution being low enough that the cache access doesn't even begin.
I realize that that is very much a particular microarchitectural
detail, but it's actually a *big* deal. Do we have a set of rules for
what is not a worry, simply because the speculated accesses get killed
early enough?
Apparently "test a register value against a constant" is good enough,
assuming that register is also needed for the address of the access.
Linus
Static analysis reports that 'queue' may be a user controlled value that
is used as a data dependency to read from the 'ar9170_qmap' array. In
order to avoid potential leaks of kernel memory values, block
speculative execution of the instruction stream that could issue reads
based on an invalid result of 'ar9170_qmap[queue]'. In this case the
value of 'ar9170_qmap[queue]' is immediately reused as an index to the
'ar->edcf' array.
Based on an original patch by Elena Reshetova.
Cc: Christian Lamparter <[email protected]>
Cc: Kalle Valo <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Elena Reshetova <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
---
drivers/net/wireless/ath/carl9170/main.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/net/wireless/ath/carl9170/main.c b/drivers/net/wireless/ath/carl9170/main.c
index 988c8857d78c..0acfa8c22b7d 100644
--- a/drivers/net/wireless/ath/carl9170/main.c
+++ b/drivers/net/wireless/ath/carl9170/main.c
@@ -41,6 +41,7 @@
#include <linux/module.h>
#include <linux/etherdevice.h>
#include <linux/random.h>
+#include <linux/nospec.h>
#include <net/mac80211.h>
#include <net/cfg80211.h>
#include "hw.h"
@@ -1384,11 +1385,13 @@ static int carl9170_op_conf_tx(struct ieee80211_hw *hw,
const struct ieee80211_tx_queue_params *param)
{
struct ar9170 *ar = hw->priv;
+ const u8 *elem;
int ret;
mutex_lock(&ar->mutex);
- if (queue < ar->hw->queues) {
- memcpy(&ar->edcf[ar9170_qmap[queue]], param, sizeof(*param));
+ elem = array_ptr(ar9170_qmap, queue, ar->hw->queues);
+ if (elem) {
+ memcpy(&ar->edcf[*elem], param, sizeof(*param));
ret = carl9170_set_qos(ar);
} else {
ret = -EINVAL;
On Fri, Jan 12, 2018 at 12:01 PM, Christian Lamparter
<[email protected]> wrote:
> On Friday, January 12, 2018 7:39:50 PM CET Dan Williams wrote:
>> On Fri, Jan 12, 2018 at 6:42 AM, Christian Lamparter <[email protected]> wrote:
>> > On Friday, January 12, 2018 1:47:46 AM CET Dan Williams wrote:
>> >> Static analysis reports that 'queue' may be a user controlled value that
>> >> is used as a data dependency to read from the 'ar9170_qmap' array. In
>> >> order to avoid potential leaks of kernel memory values, block
>> >> speculative execution of the instruction stream that could issue reads
>> >> based on an invalid result of 'ar9170_qmap[queue]'. In this case the
>> >> value of 'ar9170_qmap[queue]' is immediately reused as an index to the
>> >> 'ar->edcf' array.
>> >>
>> >> Based on an original patch by Elena Reshetova.
>> >>
>> >> Cc: Christian Lamparter <[email protected]>
>> >> Cc: Kalle Valo <[email protected]>
>> >> Cc: [email protected]
>> >> Cc: [email protected]
>> >> Signed-off-by: Elena Reshetova <[email protected]>
>> >> Signed-off-by: Dan Williams <[email protected]>
>> >> ---
>> > This patch (and p54, cw1200) look like the same patch?!
>> > Can you tell me what happend to:
>> >
>> > On Saturday, January 6, 2018 5:34:03 PM CET Dan Williams wrote:
>> >> On Sat, Jan 6, 2018 at 6:23 AM, Christian Lamparter <[email protected]> wrote:
>> >> > And Furthermore a invalid queue (param->ac) would cause a crash in
>> >> > this line in mac80211 before it even reaches the driver [1]:
>> >> > | sdata->tx_conf[params->ac] = p;
>> >> > | ^^^^^^^^
>> >> > | if (drv_conf_tx(local, sdata, >>>> params->ac <<<<, &p)) {
>> >> > | ^^ (this is a wrapper for the *_op_conf_tx)
>> >> >
>> >> > I don't think these chin-up exercises are needed.
>> >>
>> >> Quite the contrary, you've identified a better place in the call stack
>> >> to sanitize the input and disable speculation. Then we can kill the
>> >> whole class of the wireless driver reports at once it seems.
>> > <https://www.spinics.net/lists/netdev/msg476353.html>
>>
>> I didn't see where ac is being validated against the driver specific
>> 'queues' value in that earlier patch.
> The link to the check is right there in the earlier post. It's in
> parse_txq_params():
> <https://github.com/torvalds/linux/blob/master/net/wireless/nl80211.c#L2070>
> | if (txq_params->ac >= NL80211_NUM_ACS)
> | return -EINVAL;
>
> NL80211_NUM_ACS is 4
> <http://elixir.free-electrons.com/linux/v4.15-rc7/source/include/uapi/linux/nl80211.h#L3748>
>
> This check was added ever since mac80211's ieee80211_set_txq_params():
> | sdata->tx_conf[params->ac] = p;
>
> For cw1200: the driver just sets the dev->queue to 4.
> In carl9170 dev->queues is set to __AR9170_NUM_TXQ and
> p54 uses P54_QUEUE_AC_NUM.
>
> Both __AR9170_NUM_TXQ and P54_QUEUE_AC_NUM are 4.
> And this is not going to change since all drivers
> have to follow mac80211's queue API:
> <https://wireless.wiki.kernel.org/en/developers/documentation/mac80211/queues>
>
> Some background:
> In the old days (linux 2.6 and early 3.x), the parse_txq_params() function did
> not verify the "queue" value. That's why these drivers had to do it.
>
> Here's the relevant code from 2.6.39:
> <http://elixir.free-electrons.com/linux/v2.6.39/source/net/wireless/nl80211.c#L879>
> You'll notice that the check is missing there.
> Here's mac80211's ieee80211_set_txq_params for reference:
> <http://elixir.free-electrons.com/linux/v2.6.39/source/net/mac80211/cfg.c#L1197>
>
> However over time, the check in the driver has become redundant.
>
Excellent, thank you for pointing that out and the background so clearly.
What this tells me though is that we want to inject an ifence() at
this input validation point, i.e.:
if (txq_params->ac >= NL80211_NUM_ACS) {
ifence();
return -EINVAL;
}
...but the kernel, in these patches, only has ifence() defined for
x86. The only way we can sanitize the 'txq_params->ac' value without
ifence() is to do it at array access time, but then we're stuck
touching all drivers when standard kernel development practice says
'refactor common code out of drivers'.
Ugh, the bigger concern is that this driver is being flagged and not
that initial bounds check. Imagine if cw1200 and p54 had already been
converted to assume that they can just trust 'queue'. It would never
have been flagged.
I think we should focus on the get_user path and __fcheck_files for v3.
On Thu, Jan 18, 2018 at 08:58:08AM -0800, Dan Williams wrote:
> On Thu, Jan 18, 2018 at 5:18 AM, Will Deacon <[email protected]> wrote:
> > On Thu, Jan 11, 2018 at 05:41:08PM -0800, Dan Williams wrote:
> >> On Thu, Jan 11, 2018 at 5:19 PM, Linus Torvalds
> >> <[email protected]> wrote:
> >> > On Thu, Jan 11, 2018 at 4:46 PM, Dan Williams <[email protected]> wrote:
> >> >>
> >> >> This series incorporates Mark Rutland's latest ARM changes and adds
> >> >> the x86 specific implementation of 'ifence_array_ptr'. That ifence
> >> >> based approach is provided as an opt-in fallback, but the default
> >> >> mitigation, '__array_ptr', uses a 'mask' approach that removes
> >> >> conditional branches instructions, and otherwise aims to redirect
> >> >> speculation to use a NULL pointer rather than a user controlled value.
> >> >
> >> > Do you have any performance numbers and perhaps example code
> >> > generation? Is this noticeable? Are there any microbenchmarks showing
> >> > the difference between lfence use and the masking model?
> >>
> >> I don't have performance numbers, but here's a sample code generation
> >> from __fcheck_files, where the 'and; lea; and' sequence is portion of
> >> array_ptr() after the mask generation with 'sbb'.
> >>
> >> fdp = array_ptr(fdt->fd, fd, fdt->max_fds);
> >> 8e7: 8b 02 mov (%rdx),%eax
> >> 8e9: 48 39 c7 cmp %rax,%rdi
> >> 8ec: 48 19 c9 sbb %rcx,%rcx
> >> 8ef: 48 8b 42 08 mov 0x8(%rdx),%rax
> >> 8f3: 48 89 fe mov %rdi,%rsi
> >> 8f6: 48 21 ce and %rcx,%rsi
> >> 8f9: 48 8d 04 f0 lea (%rax,%rsi,8),%rax
> >> 8fd: 48 21 c8 and %rcx,%rax
> >>
> >>
> >> > Having both seems good for testing, but wouldn't we want to pick one in the end?
> >>
> >> I was thinking we'd keep it as a 'just in case' sort of thing, at
> >> least until the 'probably safe' assumption of the 'mask' approach has
> >> more time to settle out.
> >
> > From the arm64 side, the only concern I have (and this actually applies to
> > our CSDB sequence as well) is the calculation of the array size by the
> > caller. As Linus mentioned at the end of [1], if the determination of the
> > size argument is based on a conditional branch, then masking doesn't help
> > because you bound within the wrong range under speculation.
> >
> > We ran into this when trying to use masking to protect our uaccess routines
> > where the conditional bound is either KERNEL_DS or USER_DS. It's possible
> > that a prior conditional set_fs(KERNEL_DS) could defeat the masking and so
> > we'd need to throw some heavy barriers in set_fs to make it robust.
>
> At least in the conditional mask case near set_fs() usage the approach
> we are taking is to use a barrier. I.e. the following guidance from
> Linus:
>
> "Basically, the rule is trivial: find all 'stac' users, and use address
> masking if those users already integrate the limit check, and lfence
> they don't."
>
> ...which translates to narrow the pointer for get_user() and use a
> barrier for __get_user().
Great, that matches my thinking re set_fs but I'm still worried about
finding all the places where the bound is conditional for other users
of the macro. Then again, finding the places that need this macro in the
first place is tough enough so perhaps analysing the bound calculation
doesn't make it much worse.
Will
On Friday, January 12, 2018 7:39:50 PM CET Dan Williams wrote:
> On Fri, Jan 12, 2018 at 6:42 AM, Christian Lamparter <[email protected]> wrote:
> > On Friday, January 12, 2018 1:47:46 AM CET Dan Williams wrote:
> >> Static analysis reports that 'queue' may be a user controlled value that
> >> is used as a data dependency to read from the 'ar9170_qmap' array. In
> >> order to avoid potential leaks of kernel memory values, block
> >> speculative execution of the instruction stream that could issue reads
> >> based on an invalid result of 'ar9170_qmap[queue]'. In this case the
> >> value of 'ar9170_qmap[queue]' is immediately reused as an index to the
> >> 'ar->edcf' array.
> >>
> >> Based on an original patch by Elena Reshetova.
> >>
> >> Cc: Christian Lamparter <[email protected]>
> >> Cc: Kalle Valo <[email protected]>
> >> Cc: [email protected]
> >> Cc: [email protected]
> >> Signed-off-by: Elena Reshetova <[email protected]>
> >> Signed-off-by: Dan Williams <[email protected]>
> >> ---
> > This patch (and p54, cw1200) look like the same patch?!
> > Can you tell me what happend to:
> >
> > On Saturday, January 6, 2018 5:34:03 PM CET Dan Williams wrote:
> >> On Sat, Jan 6, 2018 at 6:23 AM, Christian Lamparter <[email protected]> wrote:
> >> > And Furthermore a invalid queue (param->ac) would cause a crash in
> >> > this line in mac80211 before it even reaches the driver [1]:
> >> > | sdata->tx_conf[params->ac] = p;
> >> > | ^^^^^^^^
> >> > | if (drv_conf_tx(local, sdata, >>>> params->ac <<<<, &p)) {
> >> > | ^^ (this is a wrapper for the *_op_conf_tx)
> >> >
> >> > I don't think these chin-up exercises are needed.
> >>
> >> Quite the contrary, you've identified a better place in the call stack
> >> to sanitize the input and disable speculation. Then we can kill the
> >> whole class of the wireless driver reports at once it seems.
> > <https://www.spinics.net/lists/netdev/msg476353.html>
>
> I didn't see where ac is being validated against the driver specific
> 'queues' value in that earlier patch.
The link to the check is right there in the earlier post. It's in
parse_txq_params():
<https://github.com/torvalds/linux/blob/master/net/wireless/nl80211.c#L2070>
| if (txq_params->ac >= NL80211_NUM_ACS)
| return -EINVAL;
NL80211_NUM_ACS is 4
<http://elixir.free-electrons.com/linux/v4.15-rc7/source/include/uapi/linux/nl80211.h#L3748>
This check was added ever since mac80211's ieee80211_set_txq_params():
| sdata->tx_conf[params->ac] = p;
For cw1200: the driver just sets the dev->queue to 4.
In carl9170 dev->queues is set to __AR9170_NUM_TXQ and
p54 uses P54_QUEUE_AC_NUM.
Both __AR9170_NUM_TXQ and P54_QUEUE_AC_NUM are 4.
And this is not going to change since all drivers
have to follow mac80211's queue API:
<https://wireless.wiki.kernel.org/en/developers/documentation/mac80211/queues>
Some background:
In the old days (linux 2.6 and early 3.x), the parse_txq_params() function did
not verify the "queue" value. That's why these drivers had to do it.
Here's the relevant code from 2.6.39:
<http://elixir.free-electrons.com/linux/v2.6.39/source/net/wireless/nl80211.c#L879>
You'll notice that the check is missing there.
Here's mac80211's ieee80211_set_txq_params for reference:
<http://elixir.free-electrons.com/linux/v2.6.39/source/net/mac80211/cfg.c#L1197>
However over time, the check in the driver has become redundant.
> > Anyway, I think there's an easy way to solve this: remove the
> > "if (queue < ar->hw->queues)" check altogether. It's no longer needed
> > anymore as the "queue" value is validated long before the driver code
> > gets called.
> >
> > And from my understanding, this will fix the "In this case
> > the value of 'ar9170_qmap[queue]' is immediately reused as an index to
> > the 'ar->edcf' array." gripe your tool complains about.
> >
> > This is here's a quick test-case for carl9170.:
> > ---
> > diff --git a/drivers/net/wireless/ath/carl9170/main.c b/drivers/net/wireless/ath/carl9170/main.c
> > index 988c8857d78c..2d3afb15bb62 100644
> > --- a/drivers/net/wireless/ath/carl9170/main.c
> > +++ b/drivers/net/wireless/ath/carl9170/main.c
> > @@ -1387,13 +1387,8 @@ static int carl9170_op_conf_tx(struct ieee80211_hw *hw,
> > int ret;
> >
> > mutex_lock(&ar->mutex);
> > - if (queue < ar->hw->queues) {
> > - memcpy(&ar->edcf[ar9170_qmap[queue]], param, sizeof(*param));
> > - ret = carl9170_set_qos(ar);
> > - } else {
> > - ret = -EINVAL;
> > - }
> > -
> > + memcpy(&ar->edcf[ar9170_qmap[queue]], param, sizeof(*param));
> > + ret = carl9170_set_qos(ar);
> > mutex_unlock(&ar->mutex);
> > return ret;
> > }
> > ---
> > What does your tool say about this?
>
> If you take away the 'if' then I don't the tool will report on this.
>
> > (If necessary, the "queue" value could be even sanitized with a
> > queue %= ARRAY_SIZE(ar9170_qmap); before the mutex_lock.)
>
> That is what array_ptr() is doing in a more sophisticated way.
I think it's a very roundabout way :D. In any case the queue %= ...
could also be replaced by:
BUILD_BUG_ON(ARRAY_SIZE(ar9170_qmap) != NL80211_NUM_ACS));
(And the equivalent for p54, cw1200)
Do you think that the appropriate patches could be copied to the
appropriate people please?
On Thu, Jan 11, 2018 at 04:46:24PM -0800, Dan Williams wrote:
> Changes since v1 [1]:
> * fixup the ifence definition to use alternative_2 per recent AMD
> changes in tip/x86/pti (Tom)
>
> * drop 'nospec_ptr' (Linus, Mark)
>
> * rename 'nospec_array_ptr' to 'array_ptr' (Alexei)
>
> * rename 'nospec_barrier' to 'ifence' (Peter, Ingo)
>
> * clean up occasions of 'variable assignment in if()' (Sergei, Stephen)
>
> * make 'array_ptr' use a mask instead of an architectural ifence by
> default (Linus, Alexei)
>
> * provide a command line and compile-time opt-in to the ifence
> mechanism, if an architecture provides 'ifence_array_ptr'.
>
> * provide an optimized mask generation helper, 'array_ptr_mask', for
> x86 (Linus)
>
> * move 'get_user' hardening from '__range_not_ok' to '__uaccess_begin'
> (Linus)
>
> * drop "Thermal/int340x: prevent bounds-check..." since userspace does
> not have arbitrary control over the 'trip' index (Srinivas)
>
> * update the changelog of "net: mpls: prevent bounds-check..." and keep
> it in the series to continue the debate about Spectre hygiene patches.
> (Eric).
>
> * record a reviewed-by from Laurent on "[media] uvcvideo: prevent
> bounds-check..."
>
> * update the cover letter
>
> [1]: https://lwn.net/Articles/743376/
>
> ---
>
> Quoting Mark's original RFC:
>
> "Recently, Google Project Zero discovered several classes of attack
> against speculative execution. One of these, known as variant-1, allows
> explicit bounds checks to be bypassed under speculation, providing an
> arbitrary read gadget. Further details can be found on the GPZ blog [2]
> and the Documentation patch in this series."
>
> This series incorporates Mark Rutland's latest ARM changes and adds
> the x86 specific implementation of 'ifence_array_ptr'. That ifence
> based approach is provided as an opt-in fallback, but the default
> mitigation, '__array_ptr', uses a 'mask' approach that removes
> conditional branches instructions, and otherwise aims to redirect
> speculation to use a NULL pointer rather than a user controlled value.
>
> The mask is generated by the following from Alexei, and Linus:
>
> mask = ~(long)(_i | (_s - 1 - _i)) >> (BITS_PER_LONG - 1);
>
> ...and Linus provided an optimized mask generation helper for x86:
>
> asm ("cmpq %1,%2; sbbq %0,%0;"
> :"=r" (mask)
> :"r"(sz),"r" (idx)
> :"cc");
>
> The 'array_ptr' mechanism can be switched between 'mask' and 'ifence'
> via the spectre_v1={mask,ifence} command line option, and the
> compile-time default is set by selecting either CONFIG_SPECTRE1_MASK or
> CONFIG_SPECTRE1_IFENCE.
>
> The 'array_ptr' infrastructure is the primary focus this patch set. The
> individual patches that perform 'array_ptr' conversions are a point in
> time (i.e. earlier kernel, early analysis tooling, x86 only etc...)
> start at finding some of these gadgets.
>
> Another consideration for reviewing these patches is the 'hygiene'
> argument. When a patch refers to hygiene it is concerned with stopping
> speculation on an unconstrained or insufficiently constrained pointer
> value under userspace control. That by itself is not sufficient for
> attack (per current understanding) [3], but it is a necessary
> pre-condition. So 'hygiene' refers to cleaning up those suspect
> pointers regardless of whether they are usable as a gadget.
>
> These patches are also be available via the 'nospec-v2' git branch
> here:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/djbw/linux nospec-v2
>
> Note that the BPF fix for Spectre variant1 is merged in the bpf.git
> tree [4], and is not included in this branch.
>
> [2]: https://googleprojectzero.blogspot.co.uk/2018/01/reading-privileged-memory-with-side.html
> [3]: https://spectreattack.com/spectre.pdf
> [4]: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git/commit/?id=b2157399cc98
>
> ---
>
> Dan Williams (16):
> x86: implement ifence()
> x86: implement ifence_array_ptr() and array_ptr_mask()
> asm-generic/barrier: mask speculative execution flows
> x86: introduce __uaccess_begin_nospec and ASM_IFENCE
> x86: use __uaccess_begin_nospec and ASM_IFENCE in get_user paths
> ipv6: prevent bounds-check bypass via speculative execution
> ipv4: prevent bounds-check bypass via speculative execution
> vfs, fdtable: prevent bounds-check bypass via speculative execution
> userns: prevent bounds-check bypass via speculative execution
> udf: prevent bounds-check bypass via speculative execution
> [media] uvcvideo: prevent bounds-check bypass via speculative execution
> carl9170: prevent bounds-check bypass via speculative execution
> p54: prevent bounds-check bypass via speculative execution
> qla2xxx: prevent bounds-check bypass via speculative execution
> cw1200: prevent bounds-check bypass via speculative execution
> net: mpls: prevent bounds-check bypass via speculative execution
>
> Mark Rutland (3):
> Documentation: document array_ptr
> arm64: implement ifence_array_ptr()
> arm: implement ifence_array_ptr()
>
> Documentation/speculation.txt | 142 ++++++++++++++++++++++++++++++
> arch/arm/Kconfig | 1
> arch/arm/include/asm/barrier.h | 24 +++++
> arch/arm64/Kconfig | 1
> arch/arm64/include/asm/barrier.h | 24 +++++
> arch/x86/Kconfig | 3 +
> arch/x86/include/asm/barrier.h | 46 ++++++++++
> arch/x86/include/asm/msr.h | 3 -
> arch/x86/include/asm/smap.h | 4 +
> arch/x86/include/asm/uaccess.h | 16 +++
> arch/x86/include/asm/uaccess_32.h | 6 +
> arch/x86/include/asm/uaccess_64.h | 12 +--
> arch/x86/lib/copy_user_64.S | 3 +
> arch/x86/lib/usercopy_32.c | 8 +-
> drivers/media/usb/uvc/uvc_v4l2.c | 9 +-
> drivers/net/wireless/ath/carl9170/main.c | 7 +
> drivers/net/wireless/intersil/p54/main.c | 9 +-
> drivers/net/wireless/st/cw1200/sta.c | 11 +-
> drivers/net/wireless/st/cw1200/wsm.h | 4 -
> drivers/scsi/qla2xxx/qla_mr.c | 17 ++--
> fs/udf/misc.c | 40 +++++---
> include/linux/fdtable.h | 7 +
> include/linux/nospec.h | 71 +++++++++++++++
> kernel/Kconfig.nospec | 31 +++++++
> kernel/Makefile | 1
> kernel/nospec.c | 52 +++++++++++
> kernel/user_namespace.c | 11 +-
> lib/Kconfig | 3 +
> net/ipv4/raw.c | 10 +-
> net/ipv6/raw.c | 10 +-
> net/mpls/af_mpls.c | 12 +--
> 31 files changed, 521 insertions(+), 77 deletions(-)
> create mode 100644 Documentation/speculation.txt
> create mode 100644 include/linux/nospec.h
> create mode 100644 kernel/Kconfig.nospec
> create mode 100644 kernel/nospec.c
>
--
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up
Static analysis reports that 'queue' may be a user controlled value that
is used as a data dependency to read from the 'priv->qos_params' array.
In order to avoid potential leaks of kernel memory values, block
speculative execution of the instruction stream that could issue reads
based on an invalid result of 'priv->qos_params[queue]'.
Based on an original patch by Elena Reshetova.
Cc: Christian Lamparter <[email protected]>
Cc: Kalle Valo <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Elena Reshetova <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
---
drivers/net/wireless/intersil/p54/main.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/drivers/net/wireless/intersil/p54/main.c b/drivers/net/wireless/intersil/p54/main.c
index ab6d39e12069..5ce693ff547e 100644
--- a/drivers/net/wireless/intersil/p54/main.c
+++ b/drivers/net/wireless/intersil/p54/main.c
@@ -20,6 +20,7 @@
#include <linux/firmware.h>
#include <linux/etherdevice.h>
#include <linux/module.h>
+#include <linux/nospec.h>
#include <net/mac80211.h>
@@ -411,12 +412,14 @@ static int p54_conf_tx(struct ieee80211_hw *dev,
const struct ieee80211_tx_queue_params *params)
{
struct p54_common *priv = dev->priv;
+ struct p54_edcf_queue_param *p54_q;
int ret;
mutex_lock(&priv->conf_mutex);
- if (queue < dev->queues) {
- P54_SET_QUEUE(priv->qos_params[queue], params->aifs,
- params->cw_min, params->cw_max, params->txop);
+ p54_q = array_ptr(priv->qos_params, queue, dev->queues);
+ if (p54_q) {
+ P54_SET_QUEUE(p54_q[0], params->aifs, params->cw_min,
+ params->cw_max, params->txop);
ret = p54_set_edcf(priv);
} else
ret = -EINVAL;
On Thu, Jan 11, 2018 at 5:19 PM, Linus Torvalds
<[email protected]> wrote:
> On Thu, Jan 11, 2018 at 4:46 PM, Dan Williams <[email protected]> wrote:
>>
>> This series incorporates Mark Rutland's latest ARM changes and adds
>> the x86 specific implementation of 'ifence_array_ptr'. That ifence
>> based approach is provided as an opt-in fallback, but the default
>> mitigation, '__array_ptr', uses a 'mask' approach that removes
>> conditional branches instructions, and otherwise aims to redirect
>> speculation to use a NULL pointer rather than a user controlled value.
>
> Do you have any performance numbers and perhaps example code
> generation? Is this noticeable? Are there any microbenchmarks showing
> the difference between lfence use and the masking model?
I don't have performance numbers, but here's a sample code generation
from __fcheck_files, where the 'and; lea; and' sequence is portion of
array_ptr() after the mask generation with 'sbb'.
fdp = array_ptr(fdt->fd, fd, fdt->max_fds);
8e7: 8b 02 mov (%rdx),%eax
8e9: 48 39 c7 cmp %rax,%rdi
8ec: 48 19 c9 sbb %rcx,%rcx
8ef: 48 8b 42 08 mov 0x8(%rdx),%rax
8f3: 48 89 fe mov %rdi,%rsi
8f6: 48 21 ce and %rcx,%rsi
8f9: 48 8d 04 f0 lea (%rax,%rsi,8),%rax
8fd: 48 21 c8 and %rcx,%rax
> Having both seems good for testing, but wouldn't we want to pick one in the end?
I was thinking we'd keep it as a 'just in case' sort of thing, at
least until the 'probably safe' assumption of the 'mask' approach has
more time to settle out.
>
> Also, I do think that there is one particular array load that would
> seem to be pretty obvious: the system call function pointer array.
>
> Yes, yes, the actual call is now behind a retpoline, but that protects
> against a speculative BTB access, it's not obvious that it protects
> against the mispredict of the __NR_syscall_max comparison in
> arch/x86/entry/entry_64.S.
>
> The act of fetching code is a kind of read too. And retpoline protects
> against BTB stuffing etc, but what if the _actual_ system call
> function address is wrong (due to mis-prediction of the system call
> index check)?
>
> Should the array access in entry_SYSCALL_64_fastpath be made to use
> the masking approach?
I'll take a look. I'm firmly in the 'patch first / worry later' stance
on these investigations.
On Friday, January 12, 2018 1:47:46 AM CET Dan Williams wrote:
> Static analysis reports that 'queue' may be a user controlled value that
> is used as a data dependency to read from the 'ar9170_qmap' array. In
> order to avoid potential leaks of kernel memory values, block
> speculative execution of the instruction stream that could issue reads
> based on an invalid result of 'ar9170_qmap[queue]'. In this case the
> value of 'ar9170_qmap[queue]' is immediately reused as an index to the
> 'ar->edcf' array.
>
> Based on an original patch by Elena Reshetova.
>
> Cc: Christian Lamparter <[email protected]>
> Cc: Kalle Valo <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: Elena Reshetova <[email protected]>
> Signed-off-by: Dan Williams <[email protected]>
> ---
This patch (and p54, cw1200) look like the same patch?!
Can you tell me what happend to:
On Saturday, January 6, 2018 5:34:03 PM CET Dan Williams wrote:
> On Sat, Jan 6, 2018 at 6:23 AM, Christian Lamparter <[email protected]> wrote:
> > And Furthermore a invalid queue (param->ac) would cause a crash in
> > this line in mac80211 before it even reaches the driver [1]:
> > | sdata->tx_conf[params->ac] = p;
> > | ^^^^^^^^
> > | if (drv_conf_tx(local, sdata, >>>> params->ac <<<<, &p)) {
> > | ^^ (this is a wrapper for the *_op_conf_tx)
> >
> > I don't think these chin-up exercises are needed.
>
> Quite the contrary, you've identified a better place in the call stack
> to sanitize the input and disable speculation. Then we can kill the
> whole class of the wireless driver reports at once it seems.
<https://www.spinics.net/lists/netdev/msg476353.html>
Anyway, I think there's an easy way to solve this: remove the
"if (queue < ar->hw->queues)" check altogether. It's no longer needed
anymore as the "queue" value is validated long before the driver code
gets called. And from my understanding, this will fix the "In this case
the value of 'ar9170_qmap[queue]' is immediately reused as an index to
the 'ar->edcf' array." gripe your tool complains about.
This is here's a quick test-case for carl9170.:
---
diff --git a/drivers/net/wireless/ath/carl9170/main.c b/drivers/net/wireless/ath/carl9170/main.c
index 988c8857d78c..2d3afb15bb62 100644
--- a/drivers/net/wireless/ath/carl9170/main.c
+++ b/drivers/net/wireless/ath/carl9170/main.c
@@ -1387,13 +1387,8 @@ static int carl9170_op_conf_tx(struct ieee80211_hw *hw,
int ret;
mutex_lock(&ar->mutex);
- if (queue < ar->hw->queues) {
- memcpy(&ar->edcf[ar9170_qmap[queue]], param, sizeof(*param));
- ret = carl9170_set_qos(ar);
- } else {
- ret = -EINVAL;
- }
-
+ memcpy(&ar->edcf[ar9170_qmap[queue]], param, sizeof(*param));
+ ret = carl9170_set_qos(ar);
mutex_unlock(&ar->mutex);
return ret;
}
---
What does your tool say about this?
(If necessary, the "queue" value could be even sanitized with a
queue %= ARRAY_SIZE(ar9170_qmap); before the mutex_lock.)
On Thu, Jan 11, 2018 at 5:19 PM, Linus Torvalds
<[email protected]> wrote:
> Should the array access in entry_SYSCALL_64_fastpath be made to use
> the masking approach?
That one has a bounds check for an inline constant.
cmpq $__NR_syscall_max, %rax
so should be safe.
The classic Spectre variant #1 code sequence is:
int array_size;
if (x < array_size) {
something with array[x]
}
which runs into problems because the array_size variable may not
be in cache, and while the CPU core is waiting for the value it
speculates inside the "if" body.
The syscall entry is more like:
#define ARRAY_SIZE 10
if (x < ARRAY_SIZE) {
something with array[x]
}
Here there isn't any reason for speculation. The core has the
value of 'x' in a register and the upper bound encoded into the
"cmp" instruction. Both are right there, no waiting, no speculation.
-Tony
Static analysis reports that 'queue' may be a user controlled value that
is used as a data dependency to read 'txq_params' from the
'priv->tx_queue_params.params' array. In order to avoid potential leaks
of kernel memory values, block speculative execution of the instruction
stream that could issue reads based on an invalid value of 'txq_params'.
In this case 'txq_params' is referenced later in the function.
Based on an original patch by Elena Reshetova.
Cc: Solomon Peachy <[email protected]>
Cc: Kalle Valo <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Elena Reshetova <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
---
drivers/net/wireless/st/cw1200/sta.c | 11 +++++++----
drivers/net/wireless/st/cw1200/wsm.h | 4 +---
2 files changed, 8 insertions(+), 7 deletions(-)
diff --git a/drivers/net/wireless/st/cw1200/sta.c b/drivers/net/wireless/st/cw1200/sta.c
index 38678e9a0562..7521077e50a4 100644
--- a/drivers/net/wireless/st/cw1200/sta.c
+++ b/drivers/net/wireless/st/cw1200/sta.c
@@ -14,6 +14,7 @@
#include <linux/firmware.h>
#include <linux/module.h>
#include <linux/etherdevice.h>
+#include <linux/nospec.h>
#include "cw1200.h"
#include "sta.h"
@@ -612,18 +613,20 @@ int cw1200_conf_tx(struct ieee80211_hw *dev, struct ieee80211_vif *vif,
u16 queue, const struct ieee80211_tx_queue_params *params)
{
struct cw1200_common *priv = dev->priv;
+ struct wsm_set_tx_queue_params *txq_params;
int ret = 0;
/* To prevent re-applying PM request OID again and again*/
bool old_uapsd_flags;
mutex_lock(&priv->conf_mutex);
- if (queue < dev->queues) {
+ txq_params = array_ptr(priv->tx_queue_params.params, queue,
+ dev->queues);
+ if (txq_params) {
old_uapsd_flags = le16_to_cpu(priv->uapsd_info.uapsd_flags);
- WSM_TX_QUEUE_SET(&priv->tx_queue_params, queue, 0, 0, 0);
- ret = wsm_set_tx_queue_params(priv,
- &priv->tx_queue_params.params[queue], queue);
+ WSM_TX_QUEUE_SET(txq_params, 0, 0, 0);
+ ret = wsm_set_tx_queue_params(priv, txq_params, queue);
if (ret) {
ret = -EINVAL;
goto out;
diff --git a/drivers/net/wireless/st/cw1200/wsm.h b/drivers/net/wireless/st/cw1200/wsm.h
index 48086e849515..8c8d9191e233 100644
--- a/drivers/net/wireless/st/cw1200/wsm.h
+++ b/drivers/net/wireless/st/cw1200/wsm.h
@@ -1099,10 +1099,8 @@ struct wsm_tx_queue_params {
};
-#define WSM_TX_QUEUE_SET(queue_params, queue, ack_policy, allowed_time,\
- max_life_time) \
+#define WSM_TX_QUEUE_SET(p, ack_policy, allowed_time, max_life_time) \
do { \
- struct wsm_set_tx_queue_params *p = &(queue_params)->params[queue]; \
p->ackPolicy = (ack_policy); \
p->allowedMediumTime = (allowed_time); \
p->maxTransmitLifetime = (max_life_time); \
On Sat, Jan 13, 2018 at 10:51 AM, Linus Torvalds
<[email protected]> wrote:
> On Fri, Jan 12, 2018 at 4:15 PM, Tony Luck <[email protected]> wrote:
> So your argument depends on "the uarch will actually run the code in
> order if there are no events that block the pipeline".
And might be bogus ... I'm a software person not a u-arch expert. That
sounded good in my head, but the level of parallelism may be greater
than I can imagine.
> Or at least it depends on a certain latency of the killing of any OoO
> execution being low enough that the cache access doesn't even begin.
>
> I realize that that is very much a particular microarchitectural
> detail, but it's actually a *big* deal. Do we have a set of rules for
> what is not a worry, simply because the speculated accesses get killed
> early enough?
>
> Apparently "test a register value against a constant" is good enough,
> assuming that register is also needed for the address of the access.
People who do understand this are working on what can be guaranteed.
For now don't make big plans based on my ramblings.
-Tony
Hi Will,
On Thursday, 18 January 2018 19:05:47 EET Will Deacon wrote:
> On Thu, Jan 18, 2018 at 08:58:08AM -0800, Dan Williams wrote:
> > On Thu, Jan 18, 2018 at 5:18 AM, Will Deacon wrote:
> >> On Thu, Jan 11, 2018 at 05:41:08PM -0800, Dan Williams wrote:
> >>> On Thu, Jan 11, 2018 at 5:19 PM, Linus Torvalds wrote:
> >>>> On Thu, Jan 11, 2018 at 4:46 PM, Dan Williams wrote:
> >>>>> This series incorporates Mark Rutland's latest ARM changes and adds
> >>>>> the x86 specific implementation of 'ifence_array_ptr'. That ifence
> >>>>> based approach is provided as an opt-in fallback, but the default
> >>>>> mitigation, '__array_ptr', uses a 'mask' approach that removes
> >>>>> conditional branches instructions, and otherwise aims to redirect
> >>>>> speculation to use a NULL pointer rather than a user controlled
> >>>>> value.
> >>>>
> >>>> Do you have any performance numbers and perhaps example code
> >>>> generation? Is this noticeable? Are there any microbenchmarks showing
> >>>> the difference between lfence use and the masking model?
> >>>
> >>> I don't have performance numbers, but here's a sample code generation
> >>> from __fcheck_files, where the 'and; lea; and' sequence is portion of
> >>> array_ptr() after the mask generation with 'sbb'.
> >>>
> >>> fdp = array_ptr(fdt->fd, fd, fdt->max_fds);
> >>>
> >>> 8e7: 8b 02 mov (%rdx),%eax
> >>> 8e9: 48 39 c7 cmp %rax,%rdi
> >>> 8ec: 48 19 c9 sbb %rcx,%rcx
> >>> 8ef: 48 8b 42 08 mov 0x8(%rdx),%rax
> >>> 8f3: 48 89 fe mov %rdi,%rsi
> >>> 8f6: 48 21 ce and %rcx,%rsi
> >>> 8f9: 48 8d 04 f0 lea (%rax,%rsi,8),%rax
> >>> 8fd: 48 21 c8 and %rcx,%rax
> >>>>
> >>>> Having both seems good for testing, but wouldn't we want to pick one
> >>>> in the end?
> >>>
> >>> I was thinking we'd keep it as a 'just in case' sort of thing, at
> >>> least until the 'probably safe' assumption of the 'mask' approach has
> >>> more time to settle out.
> >>
> >> From the arm64 side, the only concern I have (and this actually applies
> >> to our CSDB sequence as well) is the calculation of the array size by
> >> the caller. As Linus mentioned at the end of [1], if the determination
> >> of the size argument is based on a conditional branch, then masking
> >> doesn't help because you bound within the wrong range under speculation.
> >>
> >> We ran into this when trying to use masking to protect our uaccess
> >> routines where the conditional bound is either KERNEL_DS or USER_DS.
> >> It's possible that a prior conditional set_fs(KERNEL_DS) could defeat
> >> the masking and so we'd need to throw some heavy barriers in set_fs to
> >> make it robust.
> >
> > At least in the conditional mask case near set_fs() usage the approach
> > we are taking is to use a barrier. I.e. the following guidance from
> > Linus:
> >
> > "Basically, the rule is trivial: find all 'stac' users, and use address
> > masking if those users already integrate the limit check, and lfence
> > they don't."
> >
> > ...which translates to narrow the pointer for get_user() and use a
> > barrier for __get_user().
>
> Great, that matches my thinking re set_fs but I'm still worried about
> finding all the places where the bound is conditional for other users
> of the macro. Then again, finding the places that need this macro in the
> first place is tough enough so perhaps analysing the bound calculation
> doesn't make it much worse.
It might not now, but if the bound calculation changes later, I'm pretty sure
we'll forget to update the speculation barrier macro at least in some cases.
Without the help of static (or possibly dynamic) code analysis I think we're
bound to reintroduce problems over time, but that's true for finding places
where the barrier is needed, not just for barrier selection based on bound
calculation.
--
Regards,
Laurent Pinchart
On Fri, Jan 12, 2018 at 6:42 AM, Christian Lamparter <[email protected]> wrote:
> On Friday, January 12, 2018 1:47:46 AM CET Dan Williams wrote:
>> Static analysis reports that 'queue' may be a user controlled value that
>> is used as a data dependency to read from the 'ar9170_qmap' array. In
>> order to avoid potential leaks of kernel memory values, block
>> speculative execution of the instruction stream that could issue reads
>> based on an invalid result of 'ar9170_qmap[queue]'. In this case the
>> value of 'ar9170_qmap[queue]' is immediately reused as an index to the
>> 'ar->edcf' array.
>>
>> Based on an original patch by Elena Reshetova.
>>
>> Cc: Christian Lamparter <[email protected]>
>> Cc: Kalle Valo <[email protected]>
>> Cc: [email protected]
>> Cc: [email protected]
>> Signed-off-by: Elena Reshetova <[email protected]>
>> Signed-off-by: Dan Williams <[email protected]>
>> ---
> This patch (and p54, cw1200) look like the same patch?!
> Can you tell me what happend to:
>
> On Saturday, January 6, 2018 5:34:03 PM CET Dan Williams wrote:
>> On Sat, Jan 6, 2018 at 6:23 AM, Christian Lamparter <[email protected]> wrote:
>> > And Furthermore a invalid queue (param->ac) would cause a crash in
>> > this line in mac80211 before it even reaches the driver [1]:
>> > | sdata->tx_conf[params->ac] = p;
>> > | ^^^^^^^^
>> > | if (drv_conf_tx(local, sdata, >>>> params->ac <<<<, &p)) {
>> > | ^^ (this is a wrapper for the *_op_conf_tx)
>> >
>> > I don't think these chin-up exercises are needed.
>>
>> Quite the contrary, you've identified a better place in the call stack
>> to sanitize the input and disable speculation. Then we can kill the
>> whole class of the wireless driver reports at once it seems.
> <https://www.spinics.net/lists/netdev/msg476353.html>
I didn't see where ac is being validated against the driver specific
'queues' value in that earlier patch.
>
> Anyway, I think there's an easy way to solve this: remove the
> "if (queue < ar->hw->queues)" check altogether. It's no longer needed
> anymore as the "queue" value is validated long before the driver code
> gets called.
Can you point me to where that validation happens?
> And from my understanding, this will fix the "In this case
> the value of 'ar9170_qmap[queue]' is immediately reused as an index to
> the 'ar->edcf' array." gripe your tool complains about.
>
> This is here's a quick test-case for carl9170.:
> ---
> diff --git a/drivers/net/wireless/ath/carl9170/main.c b/drivers/net/wireless/ath/carl9170/main.c
> index 988c8857d78c..2d3afb15bb62 100644
> --- a/drivers/net/wireless/ath/carl9170/main.c
> +++ b/drivers/net/wireless/ath/carl9170/main.c
> @@ -1387,13 +1387,8 @@ static int carl9170_op_conf_tx(struct ieee80211_hw *hw,
> int ret;
>
> mutex_lock(&ar->mutex);
> - if (queue < ar->hw->queues) {
> - memcpy(&ar->edcf[ar9170_qmap[queue]], param, sizeof(*param));
> - ret = carl9170_set_qos(ar);
> - } else {
> - ret = -EINVAL;
> - }
> -
> + memcpy(&ar->edcf[ar9170_qmap[queue]], param, sizeof(*param));
> + ret = carl9170_set_qos(ar);
> mutex_unlock(&ar->mutex);
> return ret;
> }
> ---
> What does your tool say about this?
If you take away the 'if' then I don't the tool will report on this.
> (If necessary, the "queue" value could be even sanitized with a
> queue %= ARRAY_SIZE(ar9170_qmap); before the mutex_lock.)
That is what array_ptr() is doing in a more sophisticated way.
Hi Dan, Linus,
On Thu, Jan 11, 2018 at 05:41:08PM -0800, Dan Williams wrote:
> On Thu, Jan 11, 2018 at 5:19 PM, Linus Torvalds
> <[email protected]> wrote:
> > On Thu, Jan 11, 2018 at 4:46 PM, Dan Williams <[email protected]> wrote:
> >>
> >> This series incorporates Mark Rutland's latest ARM changes and adds
> >> the x86 specific implementation of 'ifence_array_ptr'. That ifence
> >> based approach is provided as an opt-in fallback, but the default
> >> mitigation, '__array_ptr', uses a 'mask' approach that removes
> >> conditional branches instructions, and otherwise aims to redirect
> >> speculation to use a NULL pointer rather than a user controlled value.
> >
> > Do you have any performance numbers and perhaps example code
> > generation? Is this noticeable? Are there any microbenchmarks showing
> > the difference between lfence use and the masking model?
>
> I don't have performance numbers, but here's a sample code generation
> from __fcheck_files, where the 'and; lea; and' sequence is portion of
> array_ptr() after the mask generation with 'sbb'.
>
> fdp = array_ptr(fdt->fd, fd, fdt->max_fds);
> 8e7: 8b 02 mov (%rdx),%eax
> 8e9: 48 39 c7 cmp %rax,%rdi
> 8ec: 48 19 c9 sbb %rcx,%rcx
> 8ef: 48 8b 42 08 mov 0x8(%rdx),%rax
> 8f3: 48 89 fe mov %rdi,%rsi
> 8f6: 48 21 ce and %rcx,%rsi
> 8f9: 48 8d 04 f0 lea (%rax,%rsi,8),%rax
> 8fd: 48 21 c8 and %rcx,%rax
>
>
> > Having both seems good for testing, but wouldn't we want to pick one in the end?
>
> I was thinking we'd keep it as a 'just in case' sort of thing, at
> least until the 'probably safe' assumption of the 'mask' approach has
> more time to settle out.
>From the arm64 side, the only concern I have (and this actually applies to
our CSDB sequence as well) is the calculation of the array size by the
caller. As Linus mentioned at the end of [1], if the determination of the
size argument is based on a conditional branch, then masking doesn't help
because you bound within the wrong range under speculation.
We ran into this when trying to use masking to protect our uaccess routines
where the conditional bound is either KERNEL_DS or USER_DS. It's possible
that a prior conditional set_fs(KERNEL_DS) could defeat the masking and so
we'd need to throw some heavy barriers in set_fs to make it robust.
The question then is whether or not we're likely to run into this elsewhere,
and I don't have a good feel for that.
Will
[1] http://lkml.kernel.org/r/CA+55aFz0tsreoa=5Ud2noFCpng-dizLBhT9WU9asyhpLfjdcYA@mail.gmail.com
On Thu, Jan 18, 2018 at 5:18 AM, Will Deacon <[email protected]> wrote:
> Hi Dan, Linus,
>
> On Thu, Jan 11, 2018 at 05:41:08PM -0800, Dan Williams wrote:
>> On Thu, Jan 11, 2018 at 5:19 PM, Linus Torvalds
>> <[email protected]> wrote:
>> > On Thu, Jan 11, 2018 at 4:46 PM, Dan Williams <[email protected]> wrote:
>> >>
>> >> This series incorporates Mark Rutland's latest ARM changes and adds
>> >> the x86 specific implementation of 'ifence_array_ptr'. That ifence
>> >> based approach is provided as an opt-in fallback, but the default
>> >> mitigation, '__array_ptr', uses a 'mask' approach that removes
>> >> conditional branches instructions, and otherwise aims to redirect
>> >> speculation to use a NULL pointer rather than a user controlled value.
>> >
>> > Do you have any performance numbers and perhaps example code
>> > generation? Is this noticeable? Are there any microbenchmarks showing
>> > the difference between lfence use and the masking model?
>>
>> I don't have performance numbers, but here's a sample code generation
>> from __fcheck_files, where the 'and; lea; and' sequence is portion of
>> array_ptr() after the mask generation with 'sbb'.
>>
>> fdp = array_ptr(fdt->fd, fd, fdt->max_fds);
>> 8e7: 8b 02 mov (%rdx),%eax
>> 8e9: 48 39 c7 cmp %rax,%rdi
>> 8ec: 48 19 c9 sbb %rcx,%rcx
>> 8ef: 48 8b 42 08 mov 0x8(%rdx),%rax
>> 8f3: 48 89 fe mov %rdi,%rsi
>> 8f6: 48 21 ce and %rcx,%rsi
>> 8f9: 48 8d 04 f0 lea (%rax,%rsi,8),%rax
>> 8fd: 48 21 c8 and %rcx,%rax
>>
>>
>> > Having both seems good for testing, but wouldn't we want to pick one in the end?
>>
>> I was thinking we'd keep it as a 'just in case' sort of thing, at
>> least until the 'probably safe' assumption of the 'mask' approach has
>> more time to settle out.
>
> From the arm64 side, the only concern I have (and this actually applies to
> our CSDB sequence as well) is the calculation of the array size by the
> caller. As Linus mentioned at the end of [1], if the determination of the
> size argument is based on a conditional branch, then masking doesn't help
> because you bound within the wrong range under speculation.
>
> We ran into this when trying to use masking to protect our uaccess routines
> where the conditional bound is either KERNEL_DS or USER_DS. It's possible
> that a prior conditional set_fs(KERNEL_DS) could defeat the masking and so
> we'd need to throw some heavy barriers in set_fs to make it robust.
At least in the conditional mask case near set_fs() usage the approach
we are taking is to use a barrier. I.e. the following guidance from
Linus:
"Basically, the rule is trivial: find all 'stac' users, and use address
masking if those users already integrate the limit check, and lfence
they don't."
...which translates to narrow the pointer for get_user() and use a
barrier for __get_user().