Hello,
current -git kills my system. adding
if (!virt_addr_valid(&aad[2])) {
WARN_ON(1);
return -EINVAL;
}
to ieee80211_aes_ccm_decrypt() given the following backtrace
WARNING: CPU: 5 PID: 252 at net/mac80211/aes_ccm.c:77 ieee80211_aes_ccm_decrypt+0xc8/0x197
CPU: 5 PID: 252 Comm: irq/29-iwlwifi Tainted: G W 4.8.0-next-20161010-dbg-00007-g79797e9-dirty #88
ffffc90000413638 ffffffff811ff0e3 0000000000000000 0000000000000000
ffffc90000413678 ffffffff8103fe91 0000004d000001c8 1ffff920000826d3
ffff88040fc526d8 0000000000000008 ffffc90000413978 ffffc9000041397a
Call Trace:
[<ffffffff811ff0e3>] dump_stack+0x4f/0x65
[<ffffffff8103fe91>] __warn+0xc2/0xdd
[<ffffffff8103ff1c>] warn_slowpath_null+0x1d/0x1f
[<ffffffff8142aaa5>] ieee80211_aes_ccm_decrypt+0xc8/0x197
[<ffffffff810ed595>] ? __put_page+0x3c/0x3f
[<ffffffff8131fa42>] ? put_page+0x4a/0x62
[<ffffffff813218d3>] ? __pskb_pull_tail+0x1e8/0x279
[<ffffffff8141a7dc>] ? ccmp_special_blocks.isra.5+0x51/0x12d
[<ffffffff8141b226>] ieee80211_crypto_ccmp_decrypt+0x1ba/0x221
[<ffffffff81432e80>] ieee80211_rx_handlers+0x52a/0x19c2
[<ffffffff81070000>] ? start_dl_timer+0xa8/0xb4
[<ffffffff8107462d>] ? put_lock_stats.isra.24+0xe/0x20
[<ffffffff8108ebec>] ? del_timer+0x57/0x61
[<ffffffff814351a8>] ieee80211_prepare_and_rx_handle+0xcd6/0xd2a
[<ffffffff810742a5>] ? local_clock+0x10/0x12
[<ffffffff8107642b>] ? __lock_acquire.isra.31+0x202/0x57e
[<ffffffff8143207b>] ? rcu_read_unlock+0x23/0x23
[<ffffffff81066e77>] ? sched_clock_cpu+0x17/0xc6
[<ffffffff814357ab>] ieee80211_rx_napi+0x5af/0x698
[<ffffffff810742c0>] ? get_lock_stats+0x19/0x50
[<ffffffff8107462d>] ? put_lock_stats.isra.24+0xe/0x20
[<ffffffffa023aaa9>] iwl_mvm_rx_rx_mpdu+0x5ab/0x60c [iwlmvm]
[<ffffffff810742c0>] ? get_lock_stats+0x19/0x50
[<ffffffffa0235c80>] iwl_mvm_rx+0x45/0x69 [iwlmvm]
[<ffffffffa01a989e>] iwl_pcie_rx_handle+0x478/0x584 [iwlwifi]
[<ffffffffa01aaafd>] iwl_pcie_irq_handler+0x39c/0x52d [iwlwifi]
[<ffffffff81080824>] ? irq_finalize_oneshot+0xa7/0xa7
[<ffffffff81080841>] irq_thread_fn+0x1d/0x34
[<ffffffff81080ab5>] irq_thread+0xe6/0x1bb
[<ffffffff8108093a>] ? wake_threads_waitq+0x2c/0x2c
[<ffffffff810809cf>] ? irq_thread_dtor+0x95/0x95
[<ffffffff81059d79>] kthread+0xc6/0xce
[<ffffffff8107462d>] ? put_lock_stats.isra.24+0xe/0x20
[<ffffffff81059cb3>] ? __list_del_entry+0x22/0x22
[<ffffffff814669d2>] ret_from_fork+0x22/0x30
---[ end trace 94da6d4698b938b2 ]---
-ss
Cc Andy
Andy, can this be related to CONFIG_VMAP_STACK?
On (10/11/16 00:03), Sergey Senozhatsky wrote:
> Hello,
>
> current -git kills my system. adding
>
> if (!virt_addr_valid(&aad[2])) {
> WARN_ON(1);
> return -EINVAL;
> }
>
> to ieee80211_aes_ccm_decrypt() given the following backtrace
>
> WARNING: CPU: 5 PID: 252 at net/mac80211/aes_ccm.c:77 ieee80211_aes_ccm_decrypt+0xc8/0x197
> CPU: 5 PID: 252 Comm: irq/29-iwlwifi Tainted: G W 4.8.0-next-20161010-dbg-00007-g79797e9-dirty #88
> ffffc90000413638 ffffffff811ff0e3 0000000000000000 0000000000000000
> ffffc90000413678 ffffffff8103fe91 0000004d000001c8 1ffff920000826d3
> ffff88040fc526d8 0000000000000008 ffffc90000413978 ffffc9000041397a
> Call Trace:
> [<ffffffff811ff0e3>] dump_stack+0x4f/0x65
> [<ffffffff8103fe91>] __warn+0xc2/0xdd
> [<ffffffff8103ff1c>] warn_slowpath_null+0x1d/0x1f
> [<ffffffff8142aaa5>] ieee80211_aes_ccm_decrypt+0xc8/0x197
> [<ffffffff810ed595>] ? __put_page+0x3c/0x3f
> [<ffffffff8131fa42>] ? put_page+0x4a/0x62
> [<ffffffff813218d3>] ? __pskb_pull_tail+0x1e8/0x279
> [<ffffffff8141a7dc>] ? ccmp_special_blocks.isra.5+0x51/0x12d
> [<ffffffff8141b226>] ieee80211_crypto_ccmp_decrypt+0x1ba/0x221
> [<ffffffff81432e80>] ieee80211_rx_handlers+0x52a/0x19c2
> [<ffffffff81070000>] ? start_dl_timer+0xa8/0xb4
> [<ffffffff8107462d>] ? put_lock_stats.isra.24+0xe/0x20
> [<ffffffff8108ebec>] ? del_timer+0x57/0x61
> [<ffffffff814351a8>] ieee80211_prepare_and_rx_handle+0xcd6/0xd2a
> [<ffffffff810742a5>] ? local_clock+0x10/0x12
> [<ffffffff8107642b>] ? __lock_acquire.isra.31+0x202/0x57e
> [<ffffffff8143207b>] ? rcu_read_unlock+0x23/0x23
> [<ffffffff81066e77>] ? sched_clock_cpu+0x17/0xc6
> [<ffffffff814357ab>] ieee80211_rx_napi+0x5af/0x698
> [<ffffffff810742c0>] ? get_lock_stats+0x19/0x50
> [<ffffffff8107462d>] ? put_lock_stats.isra.24+0xe/0x20
> [<ffffffffa023aaa9>] iwl_mvm_rx_rx_mpdu+0x5ab/0x60c [iwlmvm]
> [<ffffffff810742c0>] ? get_lock_stats+0x19/0x50
> [<ffffffffa0235c80>] iwl_mvm_rx+0x45/0x69 [iwlmvm]
> [<ffffffffa01a989e>] iwl_pcie_rx_handle+0x478/0x584 [iwlwifi]
> [<ffffffffa01aaafd>] iwl_pcie_irq_handler+0x39c/0x52d [iwlwifi]
> [<ffffffff81080824>] ? irq_finalize_oneshot+0xa7/0xa7
> [<ffffffff81080841>] irq_thread_fn+0x1d/0x34
> [<ffffffff81080ab5>] irq_thread+0xe6/0x1bb
> [<ffffffff8108093a>] ? wake_threads_waitq+0x2c/0x2c
> [<ffffffff810809cf>] ? irq_thread_dtor+0x95/0x95
> [<ffffffff81059d79>] kthread+0xc6/0xce
> [<ffffffff8107462d>] ? put_lock_stats.isra.24+0xe/0x20
> [<ffffffff81059cb3>] ? __list_del_entry+0x22/0x22
> [<ffffffff814669d2>] ret_from_fork+0x22/0x30
> ---[ end trace 94da6d4698b938b2 ]---
-ss
Hi,
Sorry - I meant to look into this yesterday but forgot.
> Andy, can this be related to CONFIG_VMAP_STACK?
I think it is.
> > current -git kills my system.
Can you elaborate on how exactly it kills your system?
> > adding
> >
> > if (!virt_addr_valid(&aad[2])) {
> > WARN_ON(1);
> > return -EINVAL;
> > }
That's pretty obviously false with VMAP_STACK, since the caller
(ieee80211_crypto_ccmp_decrypt) puts the aad on the stack. b_0 is also
on the stack, but maybe that doesn't matter.
Herbert, do you know what could cause this, and how we should fix it?
We can't really afford to do an allocation here, and we don't have
space in the skb (not even in skb->cb at that point), so if we really
have no way to continue using the stack we'd ... not sure, use a per-
CPU buffer perhaps.
We need 32 bytes for aad and 16 bytes for b_0, if that also can't be on
the stack any more.
johannes
Hello,
On (10/12/16 11:05), Johannes Berg wrote:
> Sorry - I meant to look into this yesterday but forgot.
>
> > Andy, can this be related to CONFIG_VMAP_STACK?
>
> I think it is.
yeah, the system works fine with !CONFIG_VMAP_STACK.
> > > current -git kills my system.
>
> Can you elaborate on how exactly it kills your system?
the last time I saw it it was a NULL deref at ieee80211_aes_ccm_decrypt.
-ss
> > Can you elaborate on how exactly it kills your system?
>
> the last time I saw it it was a NULL deref at
> ieee80211_aes_ccm_decrypt.
Hm. I was expecting something within the crypto code would cause the
crash, this seems strange.
Anyway, I'm surely out of my depth wrt. the actual cause. Something
like the patch below probably works around it, but it's horribly
inefficient due to the locking and doesn't cover CMAC/GMAC either.
johannes
diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
index 103187ca9474..e820f437f02e 100644
--- a/net/mac80211/ieee80211_i.h
+++ b/net/mac80211/ieee80211_i.h
@@ -27,6 +27,7 @@
#include <linux/leds.h>
#include <linux/idr.h>
#include <linux/rhashtable.h>
+#include <crypto/aes.h>
#include <net/ieee80211_radiotap.h>
#include <net/cfg80211.h>
#include <net/mac80211.h>
@@ -1224,6 +1225,10 @@ struct ieee80211_local {
spinlock_t rx_path_lock;
+ /* temporary buffers for software crypto */
+ u8 aad[2 * AES_BLOCK_SIZE];
+ u8 b_0[AES_BLOCK_SIZE];
+
/* Station data */
/*
* The mutex only protects the list, hash table and
diff --git a/net/mac80211/wpa.c b/net/mac80211/wpa.c
index b48c1e13e281..a3f17a710b85 100644
--- a/net/mac80211/wpa.c
+++ b/net/mac80211/wpa.c
@@ -405,8 +405,8 @@ static int ccmp_encrypt_skb(struct ieee80211_tx_data *tx, struct sk_buff *skb,
u8 *pos;
u8 pn[6];
u64 pn64;
- u8 aad[2 * AES_BLOCK_SIZE];
- u8 b_0[AES_BLOCK_SIZE];
+ u8 *aad = tx->local->aad;
+ u8 *b_0 = tx->local->b_0;
if (info->control.hw_key &&
!(info->control.hw_key->flags & IEEE80211_KEY_FLAG_GENERATE_IV) &&
@@ -460,9 +460,11 @@ static int ccmp_encrypt_skb(struct ieee80211_tx_data *tx, struct sk_buff *skb,
return 0;
pos += IEEE80211_CCMP_HDR_LEN;
+ spin_lock_bh(&tx->local->rx_path_lock);
ccmp_special_blocks(skb, pn, b_0, aad);
ieee80211_aes_ccm_encrypt(key->u.ccmp.tfm, b_0, aad, pos, len,
skb_put(skb, mic_len), mic_len);
+ spin_unlock_bh(&tx->local->rx_path_lock);
return 0;
}
@@ -534,8 +536,9 @@ ieee80211_crypto_ccmp_decrypt(struct ieee80211_rx_data *rx,
}
if (!(status->flag & RX_FLAG_DECRYPTED)) {
- u8 aad[2 * AES_BLOCK_SIZE];
- u8 b_0[AES_BLOCK_SIZE];
+ u8 *aad = rx->local->aad;
+ u8 *b_0 = rx->local->b_0;
+
/* hardware didn't decrypt/verify MIC */
ccmp_special_blocks(skb, pn, b_0, aad);
@@ -639,8 +642,8 @@ static int gcmp_encrypt_skb(struct ieee80211_tx_data *tx, struct sk_buff *skb)
u8 *pos;
u8 pn[6];
u64 pn64;
- u8 aad[2 * AES_BLOCK_SIZE];
- u8 j_0[AES_BLOCK_SIZE];
+ u8 *aad = tx->local->aad;
+ u8 *j_0 = tx->local->b_0;
if (info->control.hw_key &&
!(info->control.hw_key->flags & IEEE80211_KEY_FLAG_GENERATE_IV) &&
@@ -695,9 +698,11 @@ static int gcmp_encrypt_skb(struct ieee80211_tx_data *tx, struct sk_buff *skb)
return 0;
pos += IEEE80211_GCMP_HDR_LEN;
+ spin_lock_bh(&tx->local->rx_path_lock);
gcmp_special_blocks(skb, pn, j_0, aad);
ieee80211_aes_gcm_encrypt(key->u.gcmp.tfm, j_0, aad, pos, len,
skb_put(skb, IEEE80211_GCMP_MIC_LEN));
+ spin_unlock_bh(&tx->local->rx_path_lock);
return 0;
}
@@ -764,8 +769,9 @@ ieee80211_crypto_gcmp_decrypt(struct ieee80211_rx_data *rx)
}
if (!(status->flag & RX_FLAG_DECRYPTED)) {
- u8 aad[2 * AES_BLOCK_SIZE];
- u8 j_0[AES_BLOCK_SIZE];
+ u8 *aad = rx->local->aad;
+ u8 *j_0 = rx->local->b_0;
+
/* hardware didn't decrypt/verify MIC */
gcmp_special_blocks(skb, pn, j_0, aad);
On Wed, 2016-10-12 at 22:39 -0700, Andy Lutomirski wrote:
> In a pinch, I have these patches sitting around:
>
> https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commit/?h=x86/vmap_stack&id=0a39cfa6fbb5d5635c85253cc7d6b44b54822afd
> https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commit/?h=x86/vmap_stack&id=bf8cfa200b5a01383ea39fc8ce2f32909767baa8
That truly sounds like something we'd rather avoid in the TX/RX paths
though, which should perform well.
> I don't like them, though. I think it's rather silly that we can't
> just pass virtual addresses to the crypto code.
I don't really understand it either, hence my question about the actual
crash. I'll try to reproduce it in a VM.
johannes
On Wed, Oct 12, 2016 at 7:22 AM, Johannes Berg
<[email protected]> wrote:
>
>> > Can you elaborate on how exactly it kills your system?
>>
>> the last time I saw it it was a NULL deref at
>> ieee80211_aes_ccm_decrypt.
>
> Hm. I was expecting something within the crypto code would cause the
> crash, this seems strange.
>
> Anyway, I'm surely out of my depth wrt. the actual cause. Something
> like the patch below probably works around it, but it's horribly
> inefficient due to the locking and doesn't cover CMAC/GMAC either.
In a pinch, I have these patches sitting around:
https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commit/?h=x86/vmap_stack&id=0a39cfa6fbb5d5635c85253cc7d6b44b54822afd
https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commit/?h=x86/vmap_stack&id=bf8cfa200b5a01383ea39fc8ce2f32909767baa8
I don't like them, though. I think it's rather silly that we can't
just pass virtual addresses to the crypto code.
On (10/13/16 08:02), Johannes Berg wrote:
> On Wed, 2016-10-12 at 22:39 -0700, Andy Lutomirski wrote:
>
> > In a pinch, I have these patches sitting around:
> >
> > https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commit/?h=x86/vmap_stack&id=0a39cfa6fbb5d5635c85253cc7d6b44b54822afd
> > https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commit/?h=x86/vmap_stack&id=bf8cfa200b5a01383ea39fc8ce2f32909767baa8
>
> That truly sounds like something we'd rather avoid in the TX/RX paths
> though, which should perform well.
didn't fix.
so I finally had some time to do a better bug-reporter job.
I added a bunch of printk-s and several virt_addr_valid()-s
to ieee80211_aes_ccm_encrypt().
and right befoe the Oops I see the following report from
virt_addr_valid()
FAIL: 00004100002cba02 > ffffc900802cba02 || 1 -> (00004100002cba02 >> 39) == 130
which is basically failed '!phys_addr_valid(x)' in __virt_addr_valid()
/* carry flag will be set if starting x was >= PAGE_OFFSET */
if ((x > y) || !phys_addr_valid(x))
return false;
backtrace
------------[ cut here ]------------
WARNING: CPU: 7 PID: 246 at arch/x86/mm/physaddr.c:68 __virt_addr_valid+0xab/0xed
ffffc900002cb6f0 ffffffff8122168c 0000000000000000 0000000000000000
ffffc900002cb730 ffffffff810428d8 0000004400000198 ffff88041bd21022
ffffc900002cba02 1ffff920000596ed ffff88041932d1e0 ffffc900002cba00
Call Trace:
[<ffffffff8122168c>] dump_stack+0x4f/0x65
[<ffffffff810428d8>] __warn+0xc2/0xdd
[<ffffffff81042963>] warn_slowpath_null+0x1d/0x1f
[<ffffffff8103c226>] __virt_addr_valid+0xab/0xed
[<ffffffff8146d31a>] ieee80211_aes_ccm_decrypt+0x8f/0x2da
[<ffffffff812372de>] ? debug_smp_processor_id+0x17/0x19
[<ffffffff810fb7e1>] ? __put_page+0x3c/0x3f
[<ffffffff8145b879>] ? ccmp_special_blocks.isra.1+0x51/0x12d
[<ffffffff8145c445>] ieee80211_crypto_ccmp_decrypt+0x204/0x298
[<ffffffff81476dd1>] ieee80211_rx_handlers+0x7df/0x1c1d
[<ffffffff814791c1>] ieee80211_prepare_and_rx_handle+0xdc2/0xe79
[<ffffffff814793cc>] ? ieee80211_rx_napi+0x154/0x7a5
[<ffffffff814796ec>] ieee80211_rx_napi+0x474/0x7a5
[<ffffffffa01fce3b>] iwl_mvm_rx_rx_mpdu+0x6e6/0x751 [iwlmvm]
[<ffffffffa01f6c49>] iwl_mvm_rx+0x7e/0x98 [iwlmvm]
[<ffffffffa01c0bca>] iwl_pcie_rx_handle+0x523/0x698 [iwlwifi]
[<ffffffffa01c2015>] iwl_pcie_irq_handler+0x45d/0x64d [iwlwifi]
[<ffffffff81089411>] ? irq_finalize_oneshot+0xd4/0xd4
[<ffffffff8108942e>] irq_thread_fn+0x1d/0x34
[<ffffffff810896a2>] irq_thread+0xe6/0x1bb
[<ffffffff81089527>] ? wake_threads_waitq+0x2c/0x2c
[<ffffffff810895bc>] ? irq_thread_dtor+0x95/0x95
[<ffffffff8105d7a3>] kthread+0xfc/0x104
[<ffffffff8107d3ad>] ? put_lock_stats.isra.9+0xe/0x20
[<ffffffff8105d6a7>] ? kthread_create_on_node+0x3f/0x3f
[<ffffffff8105d6a7>] ? kthread_create_on_node+0x3f/0x3f
[<ffffffff8105d6a7>] ? kthread_create_on_node+0x3f/0x3f
[<ffffffff814b2952>] ret_from_fork+0x22/0x30
-ss
On (10/13/16 22:42), Sergey Senozhatsky wrote:
>
> On (10/13/16 08:02), Johannes Berg wrote:
> > On Wed, 2016-10-12 at 22:39 -0700, Andy Lutomirski wrote:
> >
> > > In a pinch, I have these patches sitting around:
> > >
> > > https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commit/?h=x86/vmap_stack&id=0a39cfa6fbb5d5635c85253cc7d6b44b54822afd
> > > https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commit/?h=x86/vmap_stack&id=bf8cfa200b5a01383ea39fc8ce2f32909767baa8
> >
> > That truly sounds like something we'd rather avoid in the TX/RX paths
> > though, which should perform well.
>
> didn't fix.
>
> so I finally had some time to do a better bug-reporter job.
>
> I added a bunch of printk-s and several virt_addr_valid()-s
> to ieee80211_aes_ccm_encrypt().
>
> and right befoe the Oops I see the following report from
> virt_addr_valid()
>
>
> FAIL: 00004100002cba02 > ffffc900802cba02 || 1 -> (00004100002cba02 >> 39) == 130
that `(00004100002cba02 >> 39) == 130' part is
phys_addr_valid()
{
(addr >> boot_cpu_data.x86_phys_bits)
}
-ss
On Thu, 2016-10-13 at 22:42 +0900, Sergey Senozhatsky wrote:
>
> > > https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commi
> > > t/?h=x86/vmap_stack&id=0a39cfa6fbb5d5635c85253cc7d6b44b54822afd
> > > https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commi
> > > t/?h=x86/vmap_stack&id=bf8cfa200b5a01383ea39fc8ce2f32909767baa8
> >
> > That truly sounds like something we'd rather avoid in the TX/RX
> > paths though, which should perform well.
>
> didn't fix.
It couldn't, since the new helpers weren't used in mac80211 in those
patches yet.
> so I finally had some time to do a better bug-reporter job.
>
> I added a bunch of printk-s and several virt_addr_valid()-s
> to ieee80211_aes_ccm_encrypt().
>
> and right befoe the Oops I see the following report from
> virt_addr_valid()
>
>
> FAIL: 00004100002cba02 > ffffc900802cba02 || 1 -> (00004100002cba02
> >> 39) == 130
Yeah, we already know that in this function the aad variable is on the
stack, it explicitly is.
The question, though, is why precisely that fails in the crypto code.
Can you send the Oops report itself?
johannes
On (10/13/16 15:45), Johannes Berg wrote:
> On Thu, 2016-10-13 at 22:42 +0900, Sergey Senozhatsky wrote:
> >?
> > > > https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commi
> > > > t/?h=x86/vmap_stack&id=0a39cfa6fbb5d5635c85253cc7d6b44b54822afd
> > > > https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commi
> > > > t/?h=x86/vmap_stack&id=bf8cfa200b5a01383ea39fc8ce2f32909767baa8
> > >
> > > That truly sounds like something we'd rather avoid in the TX/RX
> > > paths though, which should perform well.
> >
> > didn't fix.
>
> It couldn't, since the new helpers weren't used in mac80211 in those
> patches yet.
indeed. I thought they were.
> > ?FAIL: 00004100002cba02 > ffffc900802cba02 || 1 -> (00004100002cba02
> > >> 39) == 130
>
> The question, though, is why precisely that fails in the crypto code.
> Can you send the Oops report itself?
kernel: BUG: unable to handle kernel NULL pointer dereference at (null)
kernel: IP: [<ffffffff8146d2f4>] ieee80211_aes_ccm_decrypt+0x107/0x27f
kernel: PGD 0
kernel:
kernel: Oops: 0000 [#1] PREEMPT SMP
kernel: Modules linked in: nls_iso8859_1 nls_cp437 vfat fat mousedev psmouse serio_raw atkbd libps2 i915 coretemp i2c_algo_bit hwmon crc32c_intel mxm_wmi drm_kms_helper cfbfillrect syscopyarea cfbimgblt sysfillrect iwlmvm sysimgblt fb_sys_fops i2c_i801 cfbcopyarea ie31200_edac drm iwlwifi i2c
kernel: CPU: 3 PID: 245 Comm: irq/28-iwlwifi Not tainted 4.8.0-next-20161013-dbg-00002-ge789862-dirty #112
kernel: task: ffff88041bf01800 task.stack: ffffc900002d0000
kernel: RIP: 0010:[<ffffffff8146d2f4>] [<ffffffff8146d2f4>] ieee80211_aes_ccm_decrypt+0x107/0x27f
kernel: RSP: 0018:ffffc900002d3770 EFLAGS: 00010246
kernel: RAX: ffffc900002d3930 RBX: ffff8804133cf606 RCX: 0000000000082000
kernel: RDX: 0000000000000000 RSI: 0000000000000018 RDI: 0000000000000a02
kernel: RBP: ffffc900002d39b8 R08: 00000000000005e4 R09: 00000004100002d3
kernel: R10: 000000000000001c R11: ffff8803e66d2d20 R12: ffff8804191c2780
kernel: R13: ffffc900002d39f0 R14: ffff8804133cf022 R15: 1ffff9200005a6ee
kernel: FS: 0000000000000000(0000) GS:ffff88041ea00000(0000) knlGS:0000000000000000
kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 0000000000000000 CR3: 0000000001805000 CR4: 00000000001406e0
kernel: Stack:
kernel: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
kernel: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
kernel: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
kernel: Call Trace:
kernel: [<ffffffff8145c405>] ieee80211_crypto_ccmp_decrypt+0x204/0x298
kernel: [<ffffffff81476cd8>] ieee80211_rx_handlers+0x7df/0x1c1d
kernel: [<ffffffff814790c8>] ieee80211_prepare_and_rx_handle+0xdc2/0xe79
kernel: [<ffffffff814792e7>] ? ieee80211_rx_napi+0x168/0x7b6
kernel: [<ffffffff8147960a>] ieee80211_rx_napi+0x48b/0x7b6
kernel: [<ffffffff8123729e>] ? debug_smp_processor_id+0x17/0x19
kernel: [<ffffffffa01cfe3b>] iwl_mvm_rx_rx_mpdu+0x6e6/0x751 [iwlmvm]
kernel: [<ffffffffa01c9c49>] iwl_mvm_rx+0x7e/0x98 [iwlmvm]
kernel: [<ffffffffa0131bca>] iwl_pcie_rx_handle+0x523/0x698 [iwlwifi]
kernel: [<ffffffffa0133027>] iwl_pcie_irq_handler+0x46f/0x65f [iwlwifi]
kernel: [<ffffffff810893d0>] ? irq_finalize_oneshot+0xd4/0xd4
kernel: [<ffffffff810893ed>] irq_thread_fn+0x1d/0x34
kernel: [<ffffffff81089661>] irq_thread+0xe6/0x1bb
kernel: [<ffffffff810894e6>] ? wake_threads_waitq+0x2c/0x2c
kernel: [<ffffffff8108957b>] ? irq_thread_dtor+0x95/0x95
kernel: [<ffffffff8105d762>] kthread+0xfc/0x104
kernel: [<ffffffff8107d36c>] ? put_lock_stats.isra.9+0xe/0x20
kernel: [<ffffffff8105d666>] ? kthread_create_on_node+0x3f/0x3f
kernel: [<ffffffff814b2852>] ret_from_fork+0x22/0x30
kernel: Code: 01 ca 49 89 d1 48 89 d1 48 c1 ea 23 48 8b 14 d5 80 23 63 82 49 c1 e9 0c 48 c1 e9 1b 48 85 d2 74 0a 0f b6 c9 48 c1 e1 04 48 01 ca <48> 8b 12 49 c1 e1 06 b9 00 00 00 80 89 7d 80 89 75 84 48 8b 3d
kernel: RIP [<ffffffff8146d2f4>] ieee80211_aes_ccm_decrypt+0x107/0x27f
kernel: RSP <ffffc900002d3770>
kernel: CR2: 0000000000000000
kernel: ---[ end trace 3cd1fcd496516f72 ]---
-ss
On (10/14/16 00:00), Sergey Senozhatsky wrote:
> kernel: [<ffffffff8145c405>] ieee80211_crypto_ccmp_decrypt+0x204/0x298
> kernel: [<ffffffff81476cd8>] ieee80211_rx_handlers+0x7df/0x1c1d
> kernel: [<ffffffff814790c8>] ieee80211_prepare_and_rx_handle+0xdc2/0xe79
> kernel: [<ffffffff814792e7>] ? ieee80211_rx_napi+0x168/0x7b6
> kernel: [<ffffffff8147960a>] ieee80211_rx_napi+0x48b/0x7b6
> kernel: [<ffffffff8123729e>] ? debug_smp_processor_id+0x17/0x19
> kernel: [<ffffffffa01cfe3b>] iwl_mvm_rx_rx_mpdu+0x6e6/0x751 [iwlmvm]
> kernel: [<ffffffffa01c9c49>] iwl_mvm_rx+0x7e/0x98 [iwlmvm]
> kernel: [<ffffffffa0131bca>] iwl_pcie_rx_handle+0x523/0x698 [iwlwifi]
> kernel: [<ffffffffa0133027>] iwl_pcie_irq_handler+0x46f/0x65f [iwlwifi]
> kernel: [<ffffffff810893d0>] ? irq_finalize_oneshot+0xd4/0xd4
> kernel: [<ffffffff810893ed>] irq_thread_fn+0x1d/0x34
> kernel: [<ffffffff81089661>] irq_thread+0xe6/0x1bb
> kernel: [<ffffffff810894e6>] ? wake_threads_waitq+0x2c/0x2c
> kernel: [<ffffffff8108957b>] ? irq_thread_dtor+0x95/0x95
> kernel: [<ffffffff8105d762>] kthread+0xfc/0x104
> kernel: [<ffffffff8107d36c>] ? put_lock_stats.isra.9+0xe/0x20
> kernel: [<ffffffff8105d666>] ? kthread_create_on_node+0x3f/0x3f
> kernel: [<ffffffff814b2852>] ret_from_fork+0x22/0x30
> kernel: Code: 01 ca 49 89 d1 48 89 d1 48 c1 ea 23 48 8b 14 d5 80 23 63 82 49 c1 e9 0c 48 c1 e9 1b 48 85 d2 74 0a 0f b6 c9 48 c1 e1 04 48 01 ca <48> 8b 12 49 c1 e1 06 b9 00 00 00 80 89 7d 80 89 75 84 48 8b 3d
> kernel: RIP [<ffffffff8146d2f4>] ieee80211_aes_ccm_decrypt+0x107/0x27f
ffffffff8146d1ed <ieee80211_aes_ccm_decrypt>:
ffffffff8146d1ed: e8 9e 67 04 00 callq ffffffff814b3990 <__fentry__>
ffffffff8146d1f2: 55 push %rbp
ffffffff8146d1f3: 48 89 e5 mov %rsp,%rbp
ffffffff8146d1f6: 41 57 push %r15
ffffffff8146d1f8: 41 56 push %r14
ffffffff8146d1fa: 49 89 ce mov %rcx,%r14
ffffffff8146d1fd: 41 55 push %r13
ffffffff8146d1ff: 41 54 push %r12
ffffffff8146d201: 53 push %rbx
ffffffff8146d202: 48 83 c4 80 add $0xffffffffffffff80,%rsp
ffffffff8146d206: 8b 47 04 mov 0x4(%rdi),%eax
ffffffff8146d209: 48 8d 48 50 lea 0x50(%rax),%rcx
ffffffff8146d20d: 48 83 c0 5e add $0x5e,%rax
ffffffff8146d211: 48 c1 e8 03 shr $0x3,%rax
ffffffff8146d215: 48 c1 e0 03 shl $0x3,%rax
ffffffff8146d219: 48 29 c4 sub %rax,%rsp
ffffffff8146d21c: 4c 8d 7c 24 07 lea 0x7(%rsp),%r15
ffffffff8146d221: 49 c1 ef 03 shr $0x3,%r15
ffffffff8146d225: 4d 85 c0 test %r8,%r8
ffffffff8146d228: 4a 8d 04 fd 00 00 00 lea 0x0(,%r15,8),%rax
ffffffff8146d22f: 00
ffffffff8146d230: 48 89 85 70 ff ff ff mov %rax,-0x90(%rbp)
ffffffff8146d237: 75 0a jne ffffffff8146d243 <ieee80211_aes_ccm_decrypt+0x56>
ffffffff8146d239: b8 ea ff ff ff mov $0xffffffea,%eax
ffffffff8146d23e: e9 1a 02 00 00 jmpq ffffffff8146d45d <ieee80211_aes_ccm_decrypt+0x270>
ffffffff8146d243: 31 c0 xor %eax,%eax
ffffffff8146d245: 49 89 fc mov %rdi,%r12
ffffffff8146d248: 49 89 f5 mov %rsi,%r13
ffffffff8146d24b: 4c 89 85 58 ff ff ff mov %r8,-0xa8(%rbp)
ffffffff8146d252: 4a 8d 3c fd 00 00 00 lea 0x0(,%r15,8),%rdi
ffffffff8146d259: 00
ffffffff8146d25a: be 03 00 00 00 mov $0x3,%esi
ffffffff8146d25f: 4c 89 cb mov %r9,%rbx
ffffffff8146d262: 48 89 95 60 ff ff ff mov %rdx,-0xa0(%rbp)
ffffffff8146d269: f3 aa rep stos %al,%es:(%rdi)
ffffffff8146d26b: 48 8d 85 78 ff ff ff lea -0x88(%rbp),%rax
ffffffff8146d272: 48 89 c7 mov %rax,%rdi
ffffffff8146d275: 48 89 85 68 ff ff ff mov %rax,-0x98(%rbp)
ffffffff8146d27c: e8 46 06 dc ff callq ffffffff8122d8c7 <sg_init_table>
ffffffff8146d281: 48 8b 95 60 ff ff ff mov -0xa0(%rbp),%rdx
ffffffff8146d288: 41 b9 00 00 00 80 mov $0x80000000,%r9d
ffffffff8146d28e: 48 8b 0d 7b cd 39 00 mov 0x39cd7b(%rip),%rcx # ffffffff8180a010 <phys_base>
ffffffff8146d295: 48 8b 85 68 ff ff ff mov -0x98(%rbp),%rax
ffffffff8146d29c: 4c 8b 85 58 ff ff ff mov -0xa8(%rbp),%r8
ffffffff8146d2a3: 0f b7 32 movzwl (%rdx),%esi
ffffffff8146d2a6: 48 83 c2 02 add $0x2,%rdx
ffffffff8146d2aa: 89 d7 mov %edx,%edi
ffffffff8146d2ac: 81 e7 ff 0f 00 00 and $0xfff,%edi
ffffffff8146d2b2: 66 c1 c6 08 rol $0x8,%si
ffffffff8146d2b6: 4c 01 ca add %r9,%rdx
ffffffff8146d2b9: 0f b7 f6 movzwl %si,%esi
ffffffff8146d2bc: 72 0a jb ffffffff8146d2c8 <ieee80211_aes_ccm_decrypt+0xdb>
ffffffff8146d2be: 48 b9 00 00 00 80 ff movabs $0x77ff80000000,%rcx
ffffffff8146d2c5: 77 00 00
ffffffff8146d2c8: 48 01 ca add %rcx,%rdx
ffffffff8146d2cb: 49 89 d1 mov %rdx,%r9
ffffffff8146d2ce: 48 89 d1 mov %rdx,%rcx
ffffffff8146d2d1: 48 c1 ea 23 shr $0x23,%rdx
ffffffff8146d2d5: 48 8b 14 d5 80 23 63 mov -0x7d9cdc80(,%rdx,8),%rdx
ffffffff8146d2dc: 82
ffffffff8146d2dd: 49 c1 e9 0c shr $0xc,%r9
ffffffff8146d2e1: 48 c1 e9 1b shr $0x1b,%rcx
ffffffff8146d2e5: 48 85 d2 test %rdx,%rdx
ffffffff8146d2e8: 74 0a je ffffffff8146d2f4 <ieee80211_aes_ccm_decrypt+0x107>
ffffffff8146d2ea: 0f b6 c9 movzbl %cl,%ecx
ffffffff8146d2ed: 48 c1 e1 04 shl $0x4,%rcx
ffffffff8146d2f1: 48 01 ca add %rcx,%rdx
ffffffff8146d2f4: 48 8b 12 mov (%rdx),%rdx
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ffffffff8146d2f7: 49 c1 e1 06 shl $0x6,%r9
ffffffff8146d2fb: b9 00 00 00 80 mov $0x80000000,%ecx
ffffffff8146d300: 89 7d 80 mov %edi,-0x80(%rbp)
ffffffff8146d303: 89 75 84 mov %esi,-0x7c(%rbp)
ffffffff8146d306: 48 8b 3d 03 cd 39 00 mov 0x39cd03(%rip),%rdi # ffffffff8180a010 <phys_base>
ffffffff8146d30d: 48 83 e2 fc and $0xfffffffffffffffc,%rdx
ffffffff8146d311: 49 01 d1 add %rdx,%r9
ffffffff8146d314: 48 8b 95 78 ff ff ff mov -0x88(%rbp),%rdx
-ss
On Oct 13, 2016 6:46 AM, "Johannes Berg" <[email protected]> wrote:
>
> On Thu, 2016-10-13 at 22:42 +0900, Sergey Senozhatsky wrote:
> >
> > > > https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commi
> > > > t/?h=x86/vmap_stack&id=0a39cfa6fbb5d5635c85253cc7d6b44b54822afd
> > > > https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commi
> > > > t/?h=x86/vmap_stack&id=bf8cfa200b5a01383ea39fc8ce2f32909767baa8
> > >
> > > That truly sounds like something we'd rather avoid in the TX/RX
> > > paths though, which should perform well.
> >
> > didn't fix.
>
> It couldn't, since the new helpers weren't used in mac80211 in those
> patches yet.
>
> > so I finally had some time to do a better bug-reporter job.
> >
> > I added a bunch of printk-s and several virt_addr_valid()-s
> > to ieee80211_aes_ccm_encrypt().
> >
> > and right befoe the Oops I see the following report from
> > virt_addr_valid()
> >
> >
> > FAIL: 00004100002cba02 > ffffc900802cba02 || 1 -> (00004100002cba02
> > >> 39) == 130
>
> Yeah, we already know that in this function the aad variable is on the
> stack, it explicitly is.
>
> The question, though, is why precisely that fails in the crypto code.
> Can you send the Oops report itself?
>
It's failing before that. With CONFIG_VMAP_STACK=y, the stack may not
be physically contiguous and can't be used for DMA, so putting it in a
scatterlist is bogus in general, and the crypto code mostly wants a
scatterlist.
There are a couple (faster!) APIs for crypto that don't use
scatterlists, but I don't think AEAD works with them.
--Andy
On Thu, 2016-10-13 at 14:49 -0700, Andy Lutomirski wrote:
>
> It's failing before that. With CONFIG_VMAP_STACK=y, the stack may
> not be physically contiguous and can't be used for DMA, so putting it
> in a scatterlist is bogus in general, and the crypto code mostly
> wants a scatterlist.
I see, so all this stuff is getting inlined, and we crash in
sg_set_buf() because it does sg_set_page() and that obviously needs to
do virt_to_page(), which is invalid on this address now.
With CONFIG_DEBUG_SG we'd have hit the BUG_ON there instead.
It does indeed look like AEAD doesn't have any non-SG API.
So ultimately, the bug already goes back to Ard's commit 7ec7c4a9a686
("mac80211: port CCMP to cryptoapi's CCM driver") since that already
potentially used stack space for DMA.
Since we don't have any space in the SKB or anywhere else at this point
(other than the stack that we can't use), I see two ways out of this:
1. revert that patch (doing so would need some major adjustments now,
since it's pretty old and a number of new things were added in the
meantime)
2. allocate a per-CPU buffer for all the things that we put on the
stack and use in SG lists, those are:
* CCM/GCM: AAD (32B), B_0/J_0 (16B)
* GMAC: AAD (20B), zero (16B)
* (not sure why CMAC isn't using this API, but it would be like
GMAC)
Thoughts?
johannes
> 1. revert that patch (doing so would need some major adjustments now,
> since it's pretty old and a number of new things were added in the
> meantime)
This it will have to be, I guess.
> 2. allocate a per-CPU buffer for all the things that we put on the
> stack and use in SG lists, those are:
> * CCM/GCM: AAD (32B), B_0/J_0 (16B)
> * GMAC: AAD (20B), zero (16B)
> * (not sure why CMAC isn't using this API, but it would be like GMAC)
This doesn't work - I tried to move the mac80211 buffers, but because
we also put the struct aead_request on the stack, and crypto_ccm has
the "odata" in there, and we can't separate the odata from that struct,
we'd have to also put that into a per-CPU buffer, but it's very big -
456 bytes for CCM, didn't measure the others but I'd expect them to be
larger, if different.
I don't think we can allocate half a kb for each CPU just to be able to
possibly use the acceleration here. We can't even make that conditional
on not having hardware crypto in the wifi NIC because drivers are
always allowed to pass undecrypted frames, regardless of whether or not
HW crypto was attempted, so we don't know upfront if we'll have to
decrypt anything in software...
Given that, I think we have had a bug in here basically since Ard's
patch, we never should've put these structs on the stack. Herbert, you
also touched this later and converted the API usage, did you see the
way the stack is used here and think it should be OK, or did you simply
not realize that?
Ard, are you able to help out working on a revert of your patch? That
would require also reverting a number of other patches (various fixes,
API adjustments, etc. to the AEAD usage), but the more complicated part
is that in the meantime Jouni introduced GCMP and CCMP-256, both of
which we of course need to retain.
johannes
On 14 October 2016 at 09:28, Johannes Berg <[email protected]> wrote:
>
>> 1. revert that patch (doing so would need some major adjustments now,
>> since it's pretty old and a number of new things were added in the
>> meantime)
>
> This it will have to be, I guess.
>
>> 2. allocate a per-CPU buffer for all the things that we put on the
>> stack and use in SG lists, those are:
>> * CCM/GCM: AAD (32B), B_0/J_0 (16B)
>> * GMAC: AAD (20B), zero (16B)
>> * (not sure why CMAC isn't using this API, but it would be like GMAC)
>
> This doesn't work - I tried to move the mac80211 buffers, but because
> we also put the struct aead_request on the stack, and crypto_ccm has
> the "odata" in there, and we can't separate the odata from that struct,
> we'd have to also put that into a per-CPU buffer, but it's very big -
> 456 bytes for CCM, didn't measure the others but I'd expect them to be
> larger, if different.
>
> I don't think we can allocate half a kb for each CPU just to be able to
> possibly use the acceleration here. We can't even make that conditional
> on not having hardware crypto in the wifi NIC because drivers are
> always allowed to pass undecrypted frames, regardless of whether or not
> HW crypto was attempted, so we don't know upfront if we'll have to
> decrypt anything in software...
>
> Given that, I think we have had a bug in here basically since Ard's
> patch, we never should've put these structs on the stack. Herbert, you
> also touched this later and converted the API usage, did you see the
> way the stack is used here and think it should be OK, or did you simply
> not realize that?
>
> Ard, are you able to help out working on a revert of your patch? That
> would require also reverting a number of other patches (various fixes,
> API adjustments, etc. to the AEAD usage), but the more complicated part
> is that in the meantime Jouni introduced GCMP and CCMP-256, both of
> which we of course need to retain.
>
I am missing some context here, but could you explain what exactly is
the problem here?
Look at this code
"""
struct scatterlist sg[3];
char aead_req_data[sizeof(struct aead_request) +
crypto_aead_reqsize(tfm)]
__aligned(__alignof__(struct aead_request));
struct aead_request *aead_req = (void *) aead_req_data;
memset(aead_req, 0, sizeof(aead_req_data));
sg_init_table(sg, 3);
sg_set_buf(&sg[0], &aad[2], be16_to_cpup((__be16 *)aad));
sg_set_buf(&sg[1], data, data_len);
sg_set_buf(&sg[2], mic, mic_len);
aead_request_set_tfm(aead_req, tfm);
aead_request_set_crypt(aead_req, sg, sg, data_len, b_0);
aead_request_set_ad(aead_req, sg[0].length);
"""
I assume the stack buffer itself is not the problem here, but aad,
which is allocated on the stack one frame up.
Do we really need to revert the whole patch to fix that?
On (10/13/16 14:49), Andy Lutomirski wrote:
[..]
> > > FAIL: 00004100002cba02 > ffffc900802cba02 || 1 -> (00004100002cba02
> > > >> 39) == 130
> >
> > Yeah, we already know that in this function the aad variable is on the
> > stack, it explicitly is.
> >
> > The question, though, is why precisely that fails in the crypto code.
> > Can you send the Oops report itself?
> >
>
> It's failing before that. With CONFIG_VMAP_STACK=y, the stack may not
> be physically contiguous and can't be used for DMA, so putting it in a
> scatterlist is bogus in general, and the crypto code mostly wants a
> scatterlist.
>
> There are a couple (faster!) APIs for crypto that don't use
> scatterlists, but I don't think AEAD works with them.
given that we have a known issue shouldn't VMAP_STACK be
disabled for now, or would you rather prefer to mark MAC80211
as incompatible: "depends on CFG80211 && !VMAP_STACK"?
-ss
On 14 October 2016 at 09:39, Ard Biesheuvel <[email protected]> wrote:
> On 14 October 2016 at 09:28, Johannes Berg <[email protected]> wrote:
>>
>>> 1. revert that patch (doing so would need some major adjustments now,
>>> since it's pretty old and a number of new things were added in the
>>> meantime)
>>
>> This it will have to be, I guess.
>>
>>> 2. allocate a per-CPU buffer for all the things that we put on the
>>> stack and use in SG lists, those are:
>>> * CCM/GCM: AAD (32B), B_0/J_0 (16B)
>>> * GMAC: AAD (20B), zero (16B)
>>> * (not sure why CMAC isn't using this API, but it would be like GMAC)
>>
>> This doesn't work - I tried to move the mac80211 buffers, but because
>> we also put the struct aead_request on the stack, and crypto_ccm has
>> the "odata" in there, and we can't separate the odata from that struct,
>> we'd have to also put that into a per-CPU buffer, but it's very big -
>> 456 bytes for CCM, didn't measure the others but I'd expect them to be
>> larger, if different.
>>
>> I don't think we can allocate half a kb for each CPU just to be able to
>> possibly use the acceleration here. We can't even make that conditional
>> on not having hardware crypto in the wifi NIC because drivers are
>> always allowed to pass undecrypted frames, regardless of whether or not
>> HW crypto was attempted, so we don't know upfront if we'll have to
>> decrypt anything in software...
>>
>> Given that, I think we have had a bug in here basically since Ard's
>> patch, we never should've put these structs on the stack. Herbert, you
>> also touched this later and converted the API usage, did you see the
>> way the stack is used here and think it should be OK, or did you simply
>> not realize that?
>>
>> Ard, are you able to help out working on a revert of your patch? That
>> would require also reverting a number of other patches (various fixes,
>> API adjustments, etc. to the AEAD usage), but the more complicated part
>> is that in the meantime Jouni introduced GCMP and CCMP-256, both of
>> which we of course need to retain.
>>
>
> I am missing some context here, but could you explain what exactly is
> the problem here?
>
> Look at this code
>
> """
> struct scatterlist sg[3];
>
> char aead_req_data[sizeof(struct aead_request) +
> crypto_aead_reqsize(tfm)]
> __aligned(__alignof__(struct aead_request));
> struct aead_request *aead_req = (void *) aead_req_data;
>
> memset(aead_req, 0, sizeof(aead_req_data));
>
> sg_init_table(sg, 3);
> sg_set_buf(&sg[0], &aad[2], be16_to_cpup((__be16 *)aad));
> sg_set_buf(&sg[1], data, data_len);
> sg_set_buf(&sg[2], mic, mic_len);
>
> aead_request_set_tfm(aead_req, tfm);
> aead_request_set_crypt(aead_req, sg, sg, data_len, b_0);
> aead_request_set_ad(aead_req, sg[0].length);
> """
>
> I assume the stack buffer itself is not the problem here, but aad,
> which is allocated on the stack one frame up.
> Do we really need to revert the whole patch to fix that?
Ah never mind, this is about 'odata'. Apologies, should have read first
On Fri, 2016-10-14 at 09:41 +0100, Ard Biesheuvel wrote:
> > I assume the stack buffer itself is not the problem here, but aad,
> > which is allocated on the stack one frame up.
> > Do we really need to revert the whole patch to fix that?
>
> Ah never mind, this is about 'odata'. Apologies, should have read
> first
Right, odata also goes into an sg list and further on.
I think we should wait for Herbert to chime in before we do any further
work though, perhaps he has any better ideas.
johannes
On 14 October 2016 at 09:42, Johannes Berg <[email protected]> wrote:
> On Fri, 2016-10-14 at 09:41 +0100, Ard Biesheuvel wrote:
>
>> > I assume the stack buffer itself is not the problem here, but aad,
>> > which is allocated on the stack one frame up.
>> > Do we really need to revert the whole patch to fix that?
>>
>> Ah never mind, this is about 'odata'. Apologies, should have read
>> first
>
> Right, odata also goes into an sg list and further on.
>
> I think we should wait for Herbert to chime in before we do any further
> work though, perhaps he has any better ideas.
>
Do you have a reference for the sg_set_buf() call on odata?
crypto/ccm.c does not seem to have it (afaict), and the same problem
does not exist in the accelerated arm64 implementation. In the mean
time, I will try and see if we can move aad[] off the stack in the WPA
code.
On Fri, 2016-10-14 at 17:39 +0900, Sergey Senozhatsky wrote:
>
> given that we have a known issue shouldn't VMAP_STACK be
> disabled for now, or would you rather prefer to mark MAC80211
> as incompatible: "depends on CFG80211 && !VMAP_STACK"?
Yeah. It's a bit complicated by the fact that most people will probably
have hardware crypto in their wifi NICs, so that they won't actually
hit the software crypto path. As I said in my other email though, we
can't guarantee - even if the driver says it can do hardware crypto -
that it really will do it for all frames (some might not be able to do
for management frames for example), so we also can't really catch this
at runtime ...
Making mac80211 depend on !VMAP_STACK is probably technically best, but
I fear it'll break a lot of people's configurations who don't have a
problem right now (e.g. Linus's, who probably enabled this, but I know
where he uses wifi he uses an Intel NIC that will always do HW crypto).
Andy, what do you think?
johannes
For reference, this was my patch moving the mac80211 buffers to percpu.
diff --git a/net/mac80211/aes_ccm.c b/net/mac80211/aes_ccm.c
index 7663c28ba353..c3709ddf71e9 100644
--- a/net/mac80211/aes_ccm.c
+++ b/net/mac80211/aes_ccm.c
@@ -29,6 +29,8 @@ void ieee80211_aes_ccm_encrypt(struct crypto_aead *tfm, u8 *b_0, u8 *aad,
__aligned(__alignof__(struct aead_request));
struct aead_request *aead_req = (void *) aead_req_data;
+ printk(KERN_INFO "ccm size: %d\n", sizeof(aead_req_data));
+
memset(aead_req, 0, sizeof(aead_req_data));
sg_init_table(sg, 3);
@@ -37,6 +39,9 @@ void ieee80211_aes_ccm_encrypt(struct crypto_aead *tfm, u8 *b_0, u8 *aad,
sg_set_buf(&sg[2], mic, mic_len);
aead_request_set_tfm(aead_req, tfm);
+
+ printk(KERN_INFO "aead: %pf\n", crypto_aead_alg(crypto_aead_reqtfm(aead_req))->encrypt);
+
aead_request_set_crypt(aead_req, sg, sg, data_len, b_0);
aead_request_set_ad(aead_req, sg[0].length);
@@ -67,6 +72,8 @@ int ieee80211_aes_ccm_decrypt(struct crypto_aead *tfm, u8 *b_0, u8 *aad,
aead_request_set_crypt(aead_req, sg, sg, data_len + mic_len, b_0);
aead_request_set_ad(aead_req, sg[0].length);
+ printk(KERN_INFO "aead: %pf\n", crypto_aead_alg(crypto_aead_reqtfm(aead_req))->decrypt);
+
return crypto_aead_decrypt(aead_req);
}
diff --git a/net/mac80211/aes_cmac.c b/net/mac80211/aes_cmac.c
index bdf0790d89cc..ebb8c2dc9928 100644
--- a/net/mac80211/aes_cmac.c
+++ b/net/mac80211/aes_cmac.c
@@ -20,7 +20,6 @@
#define CMAC_TLEN 8 /* CMAC TLen = 64 bits (8 octets) */
#define CMAC_TLEN_256 16 /* CMAC TLen = 128 bits (16 octets) */
-#define AAD_LEN 20
static void gf_mulx(u8 *pad)
@@ -101,7 +100,7 @@ void ieee80211_aes_cmac(struct crypto_cipher *tfm, const u8 *aad,
memset(zero, 0, CMAC_TLEN);
addr[0] = aad;
- len[0] = AAD_LEN;
+ len[0] = CMAC_AAD_LEN;
addr[1] = data;
len[1] = data_len - CMAC_TLEN;
addr[2] = zero;
@@ -119,7 +118,7 @@ void ieee80211_aes_cmac_256(struct crypto_cipher *tfm, const u8 *aad,
memset(zero, 0, CMAC_TLEN_256);
addr[0] = aad;
- len[0] = AAD_LEN;
+ len[0] = CMAC_AAD_LEN;
addr[1] = data;
len[1] = data_len - CMAC_TLEN_256;
addr[2] = zero;
diff --git a/net/mac80211/aes_cmac.h b/net/mac80211/aes_cmac.h
index 3702041f44fd..6645f8963278 100644
--- a/net/mac80211/aes_cmac.h
+++ b/net/mac80211/aes_cmac.h
@@ -11,6 +11,8 @@
#include <linux/crypto.h>
+#define CMAC_AAD_LEN 20
+
struct crypto_cipher *ieee80211_aes_cmac_key_setup(const u8 key[],
size_t key_len);
void ieee80211_aes_cmac(struct crypto_cipher *tfm, const u8 *aad,
diff --git a/net/mac80211/aes_gcm.c b/net/mac80211/aes_gcm.c
index 3afe361fd27c..13e64d383c46 100644
--- a/net/mac80211/aes_gcm.c
+++ b/net/mac80211/aes_gcm.c
@@ -25,6 +25,8 @@ void ieee80211_aes_gcm_encrypt(struct crypto_aead *tfm, u8 *j_0, u8 *aad,
__aligned(__alignof__(struct aead_request));
struct aead_request *aead_req = (void *)aead_req_data;
+ printk(KERN_DEBUG "gcm size: %d\n", sizeof(aead_req_data));
+
memset(aead_req, 0, sizeof(aead_req_data));
sg_init_table(sg, 3);
diff --git a/net/mac80211/aes_gmac.c b/net/mac80211/aes_gmac.c
index 3ddd927aaf30..a2fc69ec5ca9 100644
--- a/net/mac80211/aes_gmac.c
+++ b/net/mac80211/aes_gmac.c
@@ -19,17 +19,18 @@
#define GMAC_MIC_LEN 16
#define GMAC_NONCE_LEN 12
-#define AAD_LEN 20
int ieee80211_aes_gmac(struct crypto_aead *tfm, const u8 *aad, u8 *nonce,
- const u8 *data, size_t data_len, u8 *mic)
+ const u8 *data, size_t data_len, u8 *mic, u8 *zero)
{
struct scatterlist sg[4];
char aead_req_data[sizeof(struct aead_request) +
crypto_aead_reqsize(tfm)]
__aligned(__alignof__(struct aead_request));
struct aead_request *aead_req = (void *)aead_req_data;
- u8 zero[GMAC_MIC_LEN], iv[AES_BLOCK_SIZE];
+ u8 iv[AES_BLOCK_SIZE];
+
+ printk(KERN_DEBUG "gmac size: %d\n", sizeof(aead_req_data));
if (data_len < GMAC_MIC_LEN)
return -EINVAL;
@@ -38,7 +39,7 @@ int ieee80211_aes_gmac(struct crypto_aead *tfm, const u8 *aad, u8 *nonce,
memset(zero, 0, GMAC_MIC_LEN);
sg_init_table(sg, 4);
- sg_set_buf(&sg[0], aad, AAD_LEN);
+ sg_set_buf(&sg[0], aad, GMAC_AAD_LEN);
sg_set_buf(&sg[1], data, data_len - GMAC_MIC_LEN);
sg_set_buf(&sg[2], zero, GMAC_MIC_LEN);
sg_set_buf(&sg[3], mic, GMAC_MIC_LEN);
@@ -49,7 +50,7 @@ int ieee80211_aes_gmac(struct crypto_aead *tfm, const u8 *aad, u8 *nonce,
aead_request_set_tfm(aead_req, tfm);
aead_request_set_crypt(aead_req, sg, sg, 0, iv);
- aead_request_set_ad(aead_req, AAD_LEN + data_len);
+ aead_request_set_ad(aead_req, GMAC_AAD_LEN + data_len);
crypto_aead_encrypt(aead_req);
diff --git a/net/mac80211/aes_gmac.h b/net/mac80211/aes_gmac.h
index d328204d73a8..f06833c9095f 100644
--- a/net/mac80211/aes_gmac.h
+++ b/net/mac80211/aes_gmac.h
@@ -11,10 +11,13 @@
#include <linux/crypto.h>
+#define GMAC_MIC_LEN 16
+#define GMAC_AAD_LEN 20
+
struct crypto_aead *ieee80211_aes_gmac_key_setup(const u8 key[],
size_t key_len);
int ieee80211_aes_gmac(struct crypto_aead *tfm, const u8 *aad, u8 *nonce,
- const u8 *data, size_t data_len, u8 *mic);
+ const u8 *data, size_t data_len, u8 *mic, u8 *zero);
void ieee80211_aes_gmac_key_free(struct crypto_aead *tfm);
#endif /* AES_GMAC_H */
diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
index 103187ca9474..bc2a3282e0a1 100644
--- a/net/mac80211/ieee80211_i.h
+++ b/net/mac80211/ieee80211_i.h
@@ -1129,6 +1129,14 @@ enum mac80211_scan_state {
SCAN_ABORT,
};
+struct ieee80211_crypto_bufs {
+
+ u8 buf1[32];
+ u8 buf2[16];
+};
+
+extern struct ieee80211_crypto_bufs __percpu *ieee80211_crypto_bufs;
+
struct ieee80211_local {
/* embed the driver visible part.
* don't cast (use the static inlines below), but we keep
diff --git a/net/mac80211/main.c b/net/mac80211/main.c
index 1075ac24c8c5..6175cde94c53 100644
--- a/net/mac80211/main.c
+++ b/net/mac80211/main.c
@@ -33,6 +33,8 @@
#include "led.h"
#include "debugfs.h"
+struct ieee80211_crypto_bufs __percpu *ieee80211_crypto_bufs;
+
void ieee80211_configure_filter(struct ieee80211_local *local)
{
u64 mc;
@@ -1234,6 +1236,10 @@ static int __init ieee80211_init(void)
BUILD_BUG_ON(offsetof(struct ieee80211_tx_info, driver_data) +
IEEE80211_TX_INFO_DRIVER_DATA_SIZE > sizeof(skb->cb));
+ ieee80211_crypto_bufs = alloc_percpu(struct ieee80211_crypto_bufs);
+ if (!ieee80211_crypto_bufs)
+ return -ENOMEM;
+
ret = rc80211_minstrel_init();
if (ret)
return ret;
@@ -1264,6 +1270,8 @@ static void __exit ieee80211_exit(void)
ieee80211_iface_exit();
+ free_percpu(ieee80211_crypto_bufs);
+
rcu_barrier();
}
diff --git a/net/mac80211/wpa.c b/net/mac80211/wpa.c
index b48c1e13e281..c02634c4210c 100644
--- a/net/mac80211/wpa.c
+++ b/net/mac80211/wpa.c
@@ -405,8 +405,13 @@ static int ccmp_encrypt_skb(struct ieee80211_tx_data *tx, struct sk_buff *skb,
u8 *pos;
u8 pn[6];
u64 pn64;
- u8 aad[2 * AES_BLOCK_SIZE];
- u8 b_0[AES_BLOCK_SIZE];
+ struct ieee80211_crypto_bufs *bufs =
+ this_cpu_ptr(ieee80211_crypto_bufs);
+ u8 *aad = bufs->buf1;
+ u8 *b_0 = bufs->buf2;
+
+ BUILD_BUG_ON(sizeof(bufs->buf1) < 2 * AES_BLOCK_SIZE);
+ BUILD_BUG_ON(sizeof(bufs->buf2) < AES_BLOCK_SIZE);
if (info->control.hw_key &&
!(info->control.hw_key->flags & IEEE80211_KEY_FLAG_GENERATE_IV) &&
@@ -534,8 +539,14 @@ ieee80211_crypto_ccmp_decrypt(struct ieee80211_rx_data *rx,
}
if (!(status->flag & RX_FLAG_DECRYPTED)) {
- u8 aad[2 * AES_BLOCK_SIZE];
- u8 b_0[AES_BLOCK_SIZE];
+ struct ieee80211_crypto_bufs *bufs =
+ this_cpu_ptr(ieee80211_crypto_bufs);
+ u8 *aad = bufs->buf1;
+ u8 *b_0 = bufs->buf2;
+
+ BUILD_BUG_ON(sizeof(bufs->buf1) < 2 * AES_BLOCK_SIZE);
+ BUILD_BUG_ON(sizeof(bufs->buf2) < AES_BLOCK_SIZE);
+
/* hardware didn't decrypt/verify MIC */
ccmp_special_blocks(skb, pn, b_0, aad);
@@ -639,8 +650,13 @@ static int gcmp_encrypt_skb(struct ieee80211_tx_data *tx, struct sk_buff *skb)
u8 *pos;
u8 pn[6];
u64 pn64;
- u8 aad[2 * AES_BLOCK_SIZE];
- u8 j_0[AES_BLOCK_SIZE];
+ struct ieee80211_crypto_bufs *bufs =
+ this_cpu_ptr(ieee80211_crypto_bufs);
+ u8 *aad = bufs->buf1;
+ u8 *j_0 = bufs->buf2;
+
+ BUILD_BUG_ON(sizeof(bufs->buf1) < 2 * AES_BLOCK_SIZE);
+ BUILD_BUG_ON(sizeof(bufs->buf2) < AES_BLOCK_SIZE);
if (info->control.hw_key &&
!(info->control.hw_key->flags & IEEE80211_KEY_FLAG_GENERATE_IV) &&
@@ -764,8 +780,14 @@ ieee80211_crypto_gcmp_decrypt(struct ieee80211_rx_data *rx)
}
if (!(status->flag & RX_FLAG_DECRYPTED)) {
- u8 aad[2 * AES_BLOCK_SIZE];
- u8 j_0[AES_BLOCK_SIZE];
+ struct ieee80211_crypto_bufs *bufs =
+ this_cpu_ptr(ieee80211_crypto_bufs);
+ u8 *aad = bufs->buf1;
+ u8 *j_0 = bufs->buf2;
+
+ BUILD_BUG_ON(sizeof(bufs->buf1) < 2 * AES_BLOCK_SIZE);
+ BUILD_BUG_ON(sizeof(bufs->buf2) < AES_BLOCK_SIZE);
+
/* hardware didn't decrypt/verify MIC */
gcmp_special_blocks(skb, pn, j_0, aad);
@@ -935,8 +957,12 @@ ieee80211_crypto_aes_cmac_encrypt(struct ieee80211_tx_data *tx)
struct ieee80211_tx_info *info;
struct ieee80211_key *key = tx->key;
struct ieee80211_mmie *mmie;
- u8 aad[20];
u64 pn64;
+ struct ieee80211_crypto_bufs *bufs =
+ this_cpu_ptr(ieee80211_crypto_bufs);
+ u8 *aad = bufs->buf1;
+
+ BUILD_BUG_ON(sizeof(bufs->buf1) < CMAC_AAD_LEN);
if (WARN_ON(skb_queue_len(&tx->skbs) != 1))
return TX_DROP;
@@ -979,8 +1005,12 @@ ieee80211_crypto_aes_cmac_256_encrypt(struct ieee80211_tx_data *tx)
struct ieee80211_tx_info *info;
struct ieee80211_key *key = tx->key;
struct ieee80211_mmie_16 *mmie;
- u8 aad[20];
u64 pn64;
+ struct ieee80211_crypto_bufs *bufs =
+ this_cpu_ptr(ieee80211_crypto_bufs);
+ u8 *aad = bufs->buf1;
+
+ BUILD_BUG_ON(sizeof(bufs->buf1) < CMAC_AAD_LEN);
if (WARN_ON(skb_queue_len(&tx->skbs) != 1))
return TX_DROP;
@@ -1022,8 +1052,13 @@ ieee80211_crypto_aes_cmac_decrypt(struct ieee80211_rx_data *rx)
struct ieee80211_rx_status *status = IEEE80211_SKB_RXCB(skb);
struct ieee80211_key *key = rx->key;
struct ieee80211_mmie *mmie;
- u8 aad[20], mic[8], ipn[6];
+ u8 mic[8], ipn[6];
struct ieee80211_hdr *hdr = (struct ieee80211_hdr *) skb->data;
+ struct ieee80211_crypto_bufs *bufs =
+ this_cpu_ptr(ieee80211_crypto_bufs);
+ u8 *aad = bufs->buf1;
+
+ BUILD_BUG_ON(sizeof(bufs->buf1) < CMAC_AAD_LEN);
if (!ieee80211_is_mgmt(hdr->frame_control))
return RX_CONTINUE;
@@ -1072,8 +1107,13 @@ ieee80211_crypto_aes_cmac_256_decrypt(struct ieee80211_rx_data *rx)
struct ieee80211_rx_status *status = IEEE80211_SKB_RXCB(skb);
struct ieee80211_key *key = rx->key;
struct ieee80211_mmie_16 *mmie;
- u8 aad[20], mic[16], ipn[6];
+ u8 mic[16], ipn[6];
struct ieee80211_hdr *hdr = (struct ieee80211_hdr *)skb->data;
+ struct ieee80211_crypto_bufs *bufs =
+ this_cpu_ptr(ieee80211_crypto_bufs);
+ u8 *aad = bufs->buf1;
+
+ BUILD_BUG_ON(sizeof(bufs->buf1) < CMAC_AAD_LEN);
if (!ieee80211_is_mgmt(hdr->frame_control))
return RX_CONTINUE;
@@ -1123,9 +1163,15 @@ ieee80211_crypto_aes_gmac_encrypt(struct ieee80211_tx_data *tx)
struct ieee80211_key *key = tx->key;
struct ieee80211_mmie_16 *mmie;
struct ieee80211_hdr *hdr;
- u8 aad[20];
u64 pn64;
u8 nonce[12];
+ struct ieee80211_crypto_bufs *bufs =
+ this_cpu_ptr(ieee80211_crypto_bufs);
+ u8 *aad = bufs->buf1;
+ u8 *zero = bufs->buf2;
+
+ BUILD_BUG_ON(sizeof(bufs->buf1) < GMAC_AAD_LEN);
+ BUILD_BUG_ON(sizeof(bufs->buf2) < GMAC_MIC_LEN);
if (WARN_ON(skb_queue_len(&tx->skbs) != 1))
return TX_DROP;
@@ -1158,7 +1204,8 @@ ieee80211_crypto_aes_gmac_encrypt(struct ieee80211_tx_data *tx)
/* MIC = AES-GMAC(IGTK, AAD || Management Frame Body || MMIE, 128) */
if (ieee80211_aes_gmac(key->u.aes_gmac.tfm, aad, nonce,
- skb->data + 24, skb->len - 24, mmie->mic) < 0)
+ skb->data + 24, skb->len - 24, mmie->mic,
+ zero) < 0)
return TX_DROP;
return TX_CONTINUE;
@@ -1171,8 +1218,15 @@ ieee80211_crypto_aes_gmac_decrypt(struct ieee80211_rx_data *rx)
struct ieee80211_rx_status *status = IEEE80211_SKB_RXCB(skb);
struct ieee80211_key *key = rx->key;
struct ieee80211_mmie_16 *mmie;
- u8 aad[20], mic[16], ipn[6], nonce[12];
+ u8 mic[16], ipn[6], nonce[12];
struct ieee80211_hdr *hdr = (struct ieee80211_hdr *)skb->data;
+ struct ieee80211_crypto_bufs *bufs =
+ this_cpu_ptr(ieee80211_crypto_bufs);
+ u8 *aad = bufs->buf1;
+ u8 *zero = bufs->buf2;
+
+ BUILD_BUG_ON(sizeof(bufs->buf1) < GMAC_AAD_LEN);
+ BUILD_BUG_ON(sizeof(bufs->buf2) < GMAC_MIC_LEN);
if (!ieee80211_is_mgmt(hdr->frame_control))
return RX_CONTINUE;
@@ -1204,7 +1258,7 @@ ieee80211_crypto_aes_gmac_decrypt(struct ieee80211_rx_data *rx)
if (ieee80211_aes_gmac(key->u.aes_gmac.tfm, aad, nonce,
skb->data + 24, skb->len - 24,
- mic) < 0 ||
+ mic, zero) < 0 ||
memcmp(mic, mmie->mic, sizeof(mmie->mic)) != 0) {
key->u.aes_gmac.icverrors++;
return RX_DROP_UNUSABLE;
On Fri, 2016-10-14 at 09:47 +0100, Ard Biesheuvel wrote:
>
> Do you have a reference for the sg_set_buf() call on odata?
> crypto/ccm.c does not seem to have it (afaict),
It's indirect - crypto_ccm_encrypt() calls crypto_ccm_init_crypt()
which does it.
> and the same problem
> does not exist in the accelerated arm64 implementation. In the mean
> time, I will try and see if we can move aad[] off the stack in the
> WPA code.
I had that with per-CPU buffers, just sent the patch upthread.
johannes
On 14 October 2016 at 09:55, Johannes Berg <[email protected]> wrote:
> On Fri, 2016-10-14 at 09:47 +0100, Ard Biesheuvel wrote:
>>
>> Do you have a reference for the sg_set_buf() call on odata?
>> crypto/ccm.c does not seem to have it (afaict),
>
> It's indirect - crypto_ccm_encrypt() calls crypto_ccm_init_crypt()
> which does it.
>
Indeed. And the decrypt path does the same for auth_tag[].
But that still means there are two separate problems here, one which
affects the WPA code, and one that only affects the generic CCM
chaining mode (but not the accelerated arm64 implementation)
Unsurprisingly, I would strongly prefer those to be fixed properly
rather than backing out my patch, but I'm happy to help out whichever
solution we reach consensus on.
>> and the same problem
>> does not exist in the accelerated arm64 implementation. In the mean
>> time, I will try and see if we can move aad[] off the stack in the
>> WPA code.
>
> I had that with per-CPU buffers, just sent the patch upthread.
>
I will check whether this removes the issue when not using crypto/ccm.ko
On Fri, 2016-10-14 at 10:05 +0100, Ard Biesheuvel wrote:
>
> Indeed. And the decrypt path does the same for auth_tag[].
Hadn't gotten that far, due to the BUG_ON() in CONFIG_DEBUG_SG in the
encrypt path :)
> But that still means there are two separate problems here, one which
> affects the WPA code, and one that only affects the generic CCM
> chaining mode (but not the accelerated arm64 implementation)
Yes. The generic CCM chaining still doesn't typically have a request on
the stack though. In fact, ESP (net/ipv4/esp4.c) for example will do
temporary allocations with kmalloc for every frame, it seems.
> Unsurprisingly, I would strongly prefer those to be fixed properly
> rather than backing out my patch, but I'm happy to help out whichever
> solution we reach consensus on.
Yeah, obviously, it would be good to use the accelerated versions after
all.
> I will check whether this removes the issue when not using
> crypto/ccm.ko
Ok. I think we can probably live with having those 48 bytes in per-CPU
buffers, but I suppose we don't really want to have ~500.
johannes
On 14 October 2016 at 10:10, Johannes Berg <[email protected]> wrote:
> On Fri, 2016-10-14 at 10:05 +0100, Ard Biesheuvel wrote:
>>
>> Indeed. And the decrypt path does the same for auth_tag[].
>
> Hadn't gotten that far, due to the BUG_ON() in CONFIG_DEBUG_SG in the
> encrypt path :)
>
>> But that still means there are two separate problems here, one which
>> affects the WPA code, and one that only affects the generic CCM
>> chaining mode (but not the accelerated arm64 implementation)
>
> Yes. The generic CCM chaining still doesn't typically have a request on
> the stack though. In fact, ESP (net/ipv4/esp4.c) for example will do
> temporary allocations with kmalloc for every frame, it seems.
>
It is annotated with a TODO, though :-)
38320c70d282b (Herbert Xu 2008-01-28 19:35:05 -0800 41)
* TODO: Use spare space in skb for this where possible.
>> Unsurprisingly, I would strongly prefer those to be fixed properly
>> rather than backing out my patch, but I'm happy to help out whichever
>> solution we reach consensus on.
>
> Yeah, obviously, it would be good to use the accelerated versions after
> all.
>
>> I will check whether this removes the issue when not using
>> crypto/ccm.ko
>
> Ok. I think we can probably live with having those 48 bytes in per-CPU
> buffers, but I suppose we don't really want to have ~500.
>
Agreed.
On Fri, 2016-10-14 at 10:21 +0100, Ard Biesheuvel wrote:
> It is annotated with a TODO, though :-)
>
> 38320c70d282b (Herbert Xu 2008-01-28 19:35:05
> -0800 41)
> * TODO: Use spare space in skb for this where possible.
I saw that, but I don't think generally there will be spare space for
it - the stuff there is likely far too big. Anyway ... same problem
that we have.
I'm not inclined to allocate ~500 bytes temporarily for every frame
either though.
Maybe we could try to manage it in mac80211, we'd "only" need 5 AEAD
structs (which are today on the stack) in parallel for each key (4 TX,
1 RX), but in a typical case of having 3 keys that's already 7.5K worth
of memory that we almost never use. Again, with more complexity, we
could know that the TX will not be used if the driver does the TX, but
the single RX one we'd need unconditionally... decisions decisions...
johannes
On 14 October 2016 at 10:25, Johannes Berg <[email protected]> wrote:
> On Fri, 2016-10-14 at 10:21 +0100, Ard Biesheuvel wrote:
>
>> It is annotated with a TODO, though :-)
>>
>> 38320c70d282b (Herbert Xu 2008-01-28 19:35:05
>> -0800 41)
>> * TODO: Use spare space in skb for this where possible.
>
> I saw that, but I don't think generally there will be spare space for
> it - the stuff there is likely far too big. Anyway ... same problem
> that we have.
>
> I'm not inclined to allocate ~500 bytes temporarily for every frame
> either though.
>
> Maybe we could try to manage it in mac80211, we'd "only" need 5 AEAD
> structs (which are today on the stack) in parallel for each key (4 TX,
> 1 RX), but in a typical case of having 3 keys that's already 7.5K worth
> of memory that we almost never use. Again, with more complexity, we
> could know that the TX will not be used if the driver does the TX, but
> the single RX one we'd need unconditionally... decisions decisions...
>
So why is the performance hit acceptable for ESP but not for WPA? We
could easily implement the same thing, i.e., kmalloc(GFP_ATOMIC)/kfree
the aead_req struct rather than allocate it on the stack
> So why is the performance hit acceptable for ESP but not for WPA? We
> could easily implement the same thing, i.e.,
> kmalloc(GFP_ATOMIC)/kfree the aead_req struct rather than allocate it
> on the stack
Yeah, maybe we should. It's likely a much bigger allocation, but I
don't actually know if that affects speed.
In most cases where you want high performance we never hit this anyway
since we'll have hardware crypto. I know for our (Intel's) devices we
normally never hit these code paths.
But on the other hand, you also did your changes for a reason, and the
only reason I can see of that is performance. So you'd be the one with
most "skin in the game", I guess?
johannes
On 14 October 2016 at 11:00, Johannes Berg <[email protected]> wrote:
>
>> So why is the performance hit acceptable for ESP but not for WPA? We
>> could easily implement the same thing, i.e.,
>> kmalloc(GFP_ATOMIC)/kfree the aead_req struct rather than allocate it
>> on the stack
>
> Yeah, maybe we should. It's likely a much bigger allocation, but I
> don't actually know if that affects speed.
>
> In most cases where you want high performance we never hit this anyway
> since we'll have hardware crypto. I know for our (Intel's) devices we
> normally never hit these code paths.
>
> But on the other hand, you also did your changes for a reason, and the
> only reason I can see of that is performance. So you'd be the one with
> most "skin in the game", I guess?
>
Well, what sucks here is that the accelerated driver I implemented for
arm64 does not actually need this, as long as we take aad[] off the
stack. And note that the API was changed since my patch, to add aad[]
to the scatterlist: prior to this change, it used
aead_request_set_assoc() to set the associated data separately.