References:
 <20200420075711.2385190-1-Jason@zx2c4.com> <20200422040415.GA2881@sol.localdomain>
From: Ard Biesheuvel
Date: Thu, 23 Apr 2020 10:45:59 +0200
Subject: Re: [PATCH crypto-stable] crypto: arch/lib - limit simd usage to PAGE_SIZE chunks
To: "Jason A. Donenfeld"
Cc: Eric Biggers, Herbert Xu, Linux Crypto Mailing List, LKML
Content-Type: text/plain; charset="UTF-8"
Sender: linux-crypto-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-crypto@vger.kernel.org

On Wed, 22 Apr 2020 at 22:17, Jason A. Donenfeld wrote:
>
> On Wed, Apr 22, 2020 at 1:51 PM Jason A. Donenfeld wrote:
> >
> > On Wed, Apr 22, 2020 at 1:39 AM Ard Biesheuvel wrote:
> > >
> > > On Wed, 22 Apr 2020 at 09:32, Jason A. Donenfeld wrote:
> > > >
> > > > On Tue, Apr 21, 2020 at 10:04 PM Eric Biggers wrote:
> > > > > Seems this should just be a 'while' loop?
> > > > >
> > > > > while (bytes) {
> > > > > 	unsigned int todo = min_t(unsigned int, PAGE_SIZE, bytes);
> > > > >
> > > > > 	kernel_neon_begin();
> > > > > 	chacha_doneon(state, dst, src, todo, nrounds);
> > > > > 	kernel_neon_end();
> > > > >
> > > > > 	bytes -= todo;
> > > > > 	src += todo;
> > > > > 	dst += todo;
> > > > > }
> > > >
> > > > The for(;;) is how it's done elsewhere in the kernel (that this patch
> > > > doesn't touch), because then we can break out of the loop before
> > > > having to increment src and dst unnecessarily. Likely a pointless
> > > > optimization as probably the compiler can figure out how to avoid
> > > > that. But maybe it can't. If you have a strong preference, I can
> > > > refactor everything to use `while (bytes)`, but if you don't care,
> > > > let's keep this as-is. Opinion?
> > > >
> > >
> > > Since we're bikeshedding, I'd prefer 'do { } while (bytes);' here,
> > > given that bytes is guaranteed to be non-zero before we enter the
> > > loop. But in any case, I'd prefer avoiding for(;;) or while(1) where
> > > we can.
> >
> > Okay, will do-while it up for v2.
>
> I just sent v2 containing do-while, and I'm fine with that going in
> that way. But just in the interest of curiosity in the pan-tone
> palette, check this out:
>
> https://godbolt.org/z/VxXien
>
> It looks like on mine, the compiler avoids unnecessarily calling those
> adds on the last iteration, but on the other hand, it results in an
> otherwise unnecessary unconditional jump for the < 4096 case. Sort of
> interesting. Arm64 code is more or less the same difference too.

Yeah, even if shaving off 1 or 2 cycles mattered here (since we've
just decided that ugh() may take up to 20,000 cycles), hiding a couple
of ALU instructions in the slots between the subs (which sets the zero
flag) and the conditional branch that tests it probably comes for free
on in-order cores anyway. And even if it didn't, backwards branches
are usually statically predicted as taken, in which case their results
are actually needed.

On out-of-order cores under speculation, none of this matters anyway.
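
For anyone who wants to compare the three loop shapes discussed in this thread outside the kernel, here is a rough userspace sketch. It is not the patch itself: CHUNK_SIZE stands in for PAGE_SIZE (assumed to be 4096 here), process_chunk() is a hypothetical placeholder for chacha_doneon(), min_uint() replaces min_t(), and the kernel_neon_begin()/kernel_neon_end() calls appear only as comments. Each function returns how many chunks it processed, so the variants can be checked against each other.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: stands in for PAGE_SIZE, taken to be 4 KiB here. */
#define CHUNK_SIZE 4096u

/* Hypothetical placeholder for chacha_doneon(): XOR each byte with 0x5a. */
static void process_chunk(unsigned char *dst, const unsigned char *src,
                          unsigned int len)
{
    for (unsigned int i = 0; i < len; i++)
        dst[i] = src[i] ^ 0x5a;
}

/* Userspace replacement for min_t(unsigned int, a, b). */
static unsigned int min_uint(unsigned int a, unsigned int b)
{
    return a < b ? a : b;
}

/* for(;;) form: breaks out before advancing src/dst on the last chunk. */
static unsigned int chunked_for(unsigned char *dst, const unsigned char *src,
                                unsigned int bytes)
{
    unsigned int chunks = 0;

    for (;;) {
        unsigned int todo = min_uint(CHUNK_SIZE, bytes);

        /* kernel_neon_begin() would go here in the kernel */
        process_chunk(dst, src, todo);
        /* kernel_neon_end() would go here in the kernel */
        chunks++;

        bytes -= todo;
        if (!bytes)
            break;
        src += todo;
        dst += todo;
    }
    return chunks;
}

/* while form: does nothing at all when bytes == 0. */
static unsigned int chunked_while(unsigned char *dst, const unsigned char *src,
                                  unsigned int bytes)
{
    unsigned int chunks = 0;

    while (bytes) {
        unsigned int todo = min_uint(CHUNK_SIZE, bytes);

        process_chunk(dst, src, todo);
        chunks++;

        bytes -= todo;
        src += todo;
        dst += todo;
    }
    return chunks;
}

/* do-while form: intended for callers that guarantee bytes != 0. */
static unsigned int chunked_do_while(unsigned char *dst,
                                     const unsigned char *src,
                                     unsigned int bytes)
{
    unsigned int chunks = 0;

    do {
        unsigned int todo = min_uint(CHUNK_SIZE, bytes);

        process_chunk(dst, src, todo);
        chunks++;

        bytes -= todo;
        src += todo;
        dst += todo;
    } while (bytes);
    return chunks;
}
```

All three produce identical output for non-zero lengths; they differ only in what happens for bytes == 0 and in whether the pointer increments are skipped after the final chunk, which is the part the compiler output linked above is about.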