Date: Mon, 7 Nov 2016 10:26:46 -0800
From: Eric Biggers
To: "Jason A. Donenfeld"
Cc: David Miller, Herbert Xu, linux-crypto@vger.kernel.org, LKML, Martin Willi, WireGuard mailing list, René van Dorst
Subject: Re: [PATCH] poly1305: generic C can be faster on chips with slow unaligned access
Message-ID: <20161107182646.GA34388@google.com>
References: <20161103004934.GA30775@gondor.apana.org.au> <20161103.130852.1456848512897088071.davem@davemloft.net> <20161104173723.GB34176@google.com>

On Mon, Nov 07, 2016 at 07:08:22PM +0100, Jason A. Donenfeld wrote:
> Hmm...
> The general data flow that strikes me as most pertinent is
> something like:
>
> struct sk_buff *skb = get_it_from_somewhere();
> skb = skb_share_check(skb, GFP_ATOMIC);
> num_frags = skb_cow_data(skb, ..., ...);
> struct scatterlist sg[num_frags];
> sg_init_table(sg, num_frags);
> skb_to_sgvec(skb, sg, ..., ...);
> blkcipher_walk_init(&walk, sg, sg, len);
> blkcipher_walk_virt_block(&desc, &walk, BLOCK_SIZE);
> while (walk.nbytes >= BLOCK_SIZE) {
>     size_t chunk_len = rounddown(walk.nbytes, BLOCK_SIZE);
>     poly1305_update(&poly1305_state, walk.src.virt.addr, chunk_len);
>     blkcipher_walk_done(&desc, &walk, walk.nbytes % BLOCK_SIZE);
> }
> if (walk.nbytes) {
>     poly1305_update(&poly1305_state, walk.src.virt.addr, walk.nbytes);
>     blkcipher_walk_done(&desc, &walk, 0);
> }
>
> Is your suggestion that that in the final if block, walk.src.virt.addr
> might be unaligned? Like in the case of the last fragment being 67
> bytes long?

I was not referring to any users in particular, only what users could do. As an
example, if you did crypto_shash_update() with 32, 15, then 17 bytes, and the
underlying algorithm is poly1305-generic, the last block would end up
misaligned. This doesn't appear possible with your pseudocode because it only
passes in multiples of the block size until the very end. However I don't see
it claimed anywhere that shash API users have to do that.

Eric
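
To make the 32, 15, 17 example concrete, here is a rough userspace sketch of the buffering that a block-oriented update function typically does. It is not the kernel's poly1305-generic code. The names toy_desc, toy_update and toy_run are made up for illustration, and the sketch assumes that each update call is handed its own 4-byte-aligned buffer and that full blocks are read straight from the caller's pointer, while a partial block is carried over in an internal buffer. It only records the offset (relative to the start of each call's buffer) at which every directly consumed block begins; no MAC is computed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 16	/* Poly1305 block size in bytes */
#define MAX_TRACE  16	/* capacity of the offset trace (illustrative) */

/* Hypothetical stand-in for a hash descriptor context. */
struct toy_desc {
	uint8_t buf[BLOCK_SIZE];	/* partial block carried between updates */
	size_t buflen;			/* bytes currently held in buf */
	size_t n_direct;		/* blocks consumed straight from src */
	size_t direct_off[MAX_TRACE];	/* offset of each such block within
					 * the buffer passed to that call */
};

static void toy_init(struct toy_desc *d)
{
	memset(d, 0, sizeof(*d));
}

/* Assumed buffering scheme: top up the carried partial block first, then
 * consume whole blocks directly from src, then stash any remainder. */
static void toy_update(struct toy_desc *d, const uint8_t *src, size_t len)
{
	const uint8_t *start = src;

	if (d->buflen) {
		size_t take = BLOCK_SIZE - d->buflen;

		if (take > len)
			take = len;
		memcpy(d->buf + d->buflen, src, take);
		d->buflen += take;
		src += take;
		len -= take;
		if (d->buflen < BLOCK_SIZE)
			return;
		/* The completed block would be processed from d->buf here. */
		d->buflen = 0;
	}

	while (len >= BLOCK_SIZE) {
		assert(d->n_direct < MAX_TRACE);
		/* A real implementation would load words from src here. */
		d->direct_off[d->n_direct++] = (size_t)(src - start);
		src += BLOCK_SIZE;
		len -= BLOCK_SIZE;
	}

	if (len) {
		memcpy(d->buf, src, len);
		d->buflen = len;
	}
}

/* Feed chunks of the given lengths, each starting at the beginning of a
 * caller buffer that is assumed to be 4-byte aligned, and count the blocks
 * whose starting offset is not a multiple of 4. */
static size_t toy_run(const size_t *lens, size_t n)
{
	static const uint8_t chunk[256] = {0};	/* contents are irrelevant */
	struct toy_desc d;
	size_t i, bad = 0;

	toy_init(&d);
	for (i = 0; i < n; i++) {
		assert(lens[i] <= sizeof(chunk));
		toy_update(&d, chunk, lens[i]);
	}
	for (i = 0; i < d.n_direct; i++)
		if (d.direct_off[i] % 4)
			bad++;
	return bad;
}
```

Under those assumptions, calling toy_run() with lengths 32, 15, 17 reports one block read from offset 1 of the third buffer: the first byte of that call completes the carried 15-byte partial block, so the remaining 16 bytes start one byte in. Calling it with 64, 3 (multiples of the block size until the very end, as in the quoted pseudocode) reports none.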