Date: Tue, 20 Jun 2023 15:32:44 -0700
From: "Andy Lutomirski"
To: "Dave Hansen", "Kent Overstreet"
Cc: "Mark Rutland", "Linux Kernel Mailing List", linux-fsdevel@vger.kernel.org, "linux-bcachefs@vger.kernel.org", "Kent Overstreet", "Andrew Morton", "Uladzislau Rezki", "hch@infradead.org", linux-mm@kvack.org, "Kees Cook", "the arch/x86 maintainers"
Subject: Re: [PATCH 07/32] mm: Bring back vmalloc_exec
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jun 20, 2023, at 1:42 PM, Andy Lutomirski wrote:
> Hi all-
>
> On Tue, Jun 20, 2023, at 11:48 AM, Dave Hansen wrote:
>>>> No, I'm saying your concerns are baseless and too vague to
>>>> address.
>>> If you don't address them, the NAK will stand forever, or at least
>>> until a different group of people take over x86 maintainership.
>>> That's fine with me.
>>
>> I've got a specific concern: I don't see vmalloc_exec() used in this
>> series anywhere. I also don't see any of the actual assembly that's
>> being generated, or the glue code that's calling into the generated
>> assembly.
>>
>> I grepped around a bit in your git trees, but I also couldn't find it in
>> there. Any chance you could help a guy out and point us to some of the
>> specifics of this new, tiny JIT?
>>
>
> So I had a nice discussion with Kent on IRC, and, for the benefit of
> everyone else reading along, I *think* the JITted code can be replaced
> by a table-driven approach like this:
>
> typedef unsigned int u32;
> typedef unsigned long u64;
>
> struct uncompressed
> {
>     u32 a;
>     u32 b;
>     u64 c;
>     u64 d;
>     u64 e;
>     u64 f;
> };
>
> struct bitblock
> {
>     u64 source;
>     u64 target;
>     u64 mask;
>     int shift;
> };
>
> // out needs to be zeroed first
> void unpack(struct uncompressed *out, const u64 *in,
>             const struct bitblock *blocks, int nblocks)
> {
>     u64 *out_as_words = (u64*)out;
>     for (int i = 0; i < nblocks; i++) {
>         const struct bitblock *b = &blocks[i];
>         out_as_words[b->target] |= (in[b->source] & b->mask) << b->shift;
>     }
> }
>
> void apply_offsets(struct uncompressed *out, const struct uncompressed *offsets)
> {
>     out->a += offsets->a;
>     out->b += offsets->b;
>     out->c += offsets->c;
>     out->d += offsets->d;
>     out->e += offsets->e;
>     out->f += offsets->f;
> }
>
> Which generates nice code: https://godbolt.org/z/3fEq37hf5

Thinking about this a bit more, I think the only real performance issue with my code is that it does 12 read-modify-write (|=) operations in memory, which all depend on each other in horrible ways. If it's reversed, so that each output field is assembled once and the stores all happen in order, then this issue would go away.
typedef unsigned int u32;
typedef unsigned long u64;

struct uncompressed {
    u32 a;
    u32 b;
    u64 c;
    u64 d;
    u64 e;
    u64 f;
};

struct field_piece {
    int source;
    int shift;
    u64 mask;
};

struct field_pieces {
    struct field_piece pieces[2];
    u64 offset;
};

u64 unpack_one(const u64 *in, const struct field_pieces *pieces)
{
    const struct field_piece *p = pieces->pieces;
    return (((in[p[0].source] & p[0].mask) << p[0].shift) |
            ((in[p[1].source] & p[1].mask) << p[1].shift))
        + pieces->offset;
}

struct encoding {
    struct field_pieces a, b, c, d, e, f;
};

void unpack(struct uncompressed *out, const u64 *in, const struct encoding *encoding)
{
    out->a = unpack_one(in, &encoding->a);
    out->b = unpack_one(in, &encoding->b);
    out->c = unpack_one(in, &encoding->c);
    out->d = unpack_one(in, &encoding->d);
    out->e = unpack_one(in, &encoding->e);
    out->f = unpack_one(in, &encoding->f);
}

https://godbolt.org/z/srsfcGK4j

Could be faster. Probably worth testing.
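For anyone who wants to run the store-once version rather than just read the godbolt output, here is a self-contained sketch under stated assumptions. The version above only shifts left, so it can move bits up within a field but not down to bit 0. This sketch therefore adds a per-piece right shift (`rshift`) applied before the mask; that member, the helper name `extract_piece`, the `example_encoding` table and its field layout are all hypothetical and not part of the proposal. It also uses `<stdint.h>` types in place of the `unsigned long` typedef, which is assumed to be equivalent on LP64. An unused second piece is encoded with `mask == 0`, so it contributes nothing.

```c
#include <stdint.h>

typedef uint32_t u32;
typedef uint64_t u64;

struct uncompressed {
	u32 a;
	u32 b;
	u64 c;
	u64 d;
	u64 e;
	u64 f;
};

/*
 * One contiguous run of bits copied from a single input word.
 * Assumption of this sketch: shift right first (so the run can move down
 * to bit 0), mask to the run's width, then shift left into place.
 */
struct field_piece {
	int source;	/* index into the packed input words */
	int rshift;	/* bit offset of the run inside in[source] */
	u64 mask;	/* (1 << width) - 1; 0 means "piece unused" */
	int lshift;	/* bit offset of the run inside the unpacked field */
};

struct field_pieces {
	struct field_piece pieces[2];	/* a field crosses at most one word boundary */
	u64 offset;			/* added after extraction */
};

struct encoding {
	struct field_pieces a, b, c, d, e, f;
};

/* Hypothetical helper: extract one run and position it within the field. */
static u64 extract_piece(const u64 *in, const struct field_piece *p)
{
	return ((in[p->source] >> p->rshift) & p->mask) << p->lshift;
}

static u64 unpack_one(const u64 *in, const struct field_pieces *fp)
{
	return (extract_piece(in, &fp->pieces[0]) |
		extract_piece(in, &fp->pieces[1])) + fp->offset;
}

/* Each field is assembled in registers and stored exactly once, in order. */
void unpack(struct uncompressed *out, const u64 *in, const struct encoding *enc)
{
	out->a = (u32)unpack_one(in, &enc->a);
	out->b = (u32)unpack_one(in, &enc->b);
	out->c = unpack_one(in, &enc->c);
	out->d = unpack_one(in, &enc->d);
	out->e = unpack_one(in, &enc->e);
	out->f = unpack_one(in, &enc->f);
}

/*
 * Hypothetical layout over two packed words:
 *   a: in[0] bits 0..19
 *   b: in[0] bits 20..31, plus an offset of 100
 *   c: in[0] bits 32..63 (low 32 bits) and in[1] bits 0..7 (high 8 bits)
 *   d: in[1] bits 8..15
 *   e: not stored, always 7 (offset only)
 *   f: not stored, always 0
 */
const struct encoding example_encoding = {
	.a = { .pieces = { { 0, 0, 0xfffff, 0 } } },
	.b = { .pieces = { { 0, 20, 0xfff, 0 } }, .offset = 100 },
	.c = { .pieces = { { 0, 32, 0xffffffff, 0 }, { 1, 0, 0xff, 32 } } },
	.d = { .pieces = { { 1, 8, 0xff, 0 } } },
	.e = { .offset = 7 },
};
```

With that hypothetical layout, calling `unpack(&out, in, &example_encoding)` on `in = { 0x12345678123ABCDE, 0x5AAB }` yields a = 0xABCDE, b = 0x123 + 100, c = 0xAB12345678 (reassembled from both words), d = 0x5A, e = 7 and f = 0. Whether this actually beats the OR-into-memory loop is exactly the thing that would need benchmarking.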