From: David Laight
To: 'Kent Overstreet', Eric Biggers
CC: Lorenzo Stoakes, Christoph Hellwig, "linux-kernel@vger.kernel.org",
    "linux-fsdevel@vger.kernel.org", "linux-bcachefs@vger.kernel.org",
    Kent Overstreet, Andrew Morton, Uladzislau Rezki, "linux-mm@kvack.org"
Subject: RE: [PATCH 07/32] mm: Bring back vmalloc_exec
Date: Mon, 15 May 2023 10:29:18 +0000
Message-ID: <1f1d88a6a33f4e5db99544fda965c594@AcuMS.aculab.com>
References: <20230509165657.1735798-1-kent.overstreet@linux.dev>
    <20230509165657.1735798-8-kent.overstreet@linux.dev>
    <20230510064849.GC1851@quark.localdomain>
    <20230513015752.GC3033@quark.localdomain>
X-Mailing-List: linux-kernel@vger.kernel.org

From: Kent Overstreet
> Sent: 14 May 2023 06:45
...
> dynamically generated unpack:
> rand_insert: 20.0 MiB with 1 threads in 33 sec, 1609 nsec per iter, 607 KiB per sec
>
> old C unpack:
> rand_insert: 20.0 MiB with 1 threads in 35 sec, 1672 nsec per iter, 584 KiB per sec
>
> the Eric Biggers special:
> rand_insert: 20.0 MiB with 1 threads in 35 sec, 1676 nsec per iter, 583 KiB per sec
>
> Tested two versions of your approach, one without a shift value, one
> where we use a shift value to try to avoid unaligned access - second was
> perhaps 1% faster

You won't notice any effect of avoiding unaligned accesses on x86.
I think they get split into 64-bit accesses, and again on 64-byte boundaries
(that is what I see for uncached access to PCIe).
The kernel won't be doing accesses wider than 64 bits, and the 'out of order'
pipeline will tend to cover the others (especially since you get 2 reads/clock).

> so it's not looking good. This benchmark doesn't even hit on
> unpack_key() quite as much as I thought, so the difference is
> significant.

Beware: unless you manage to lock the CPU frequency (which is ~impossible
on some CPUs), timings in nanoseconds are pretty useless.
You can use the performance counter to get accurate cycle times
(provided there isn't a CPU switch in the middle of a micro-benchmark).

	David
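
As a rough worked check of the figures quoted in this message, the per-iteration times put the two C variants about 3.9% and 4.2% behind the dynamically generated unpack. The sketch below recomputes those gaps and also shows how large a clock-frequency drop would, on its own, produce the same gap, which is why nanosecond timings are hard to trust without a locked frequency. The helper names, the 4827-cycle iteration cost and the 3.0/2.9 GHz clocks are illustrative assumptions, not measurements from the thread.

```python
# Per-iteration timings quoted in the message above (nanoseconds).
TIMINGS_NS = {
    "dynamically generated unpack": 1609,
    "old C unpack": 1672,
    "the Eric Biggers special": 1676,
}


def slowdown_percent(baseline_ns, other_ns):
    """Percentage by which other_ns exceeds baseline_ns."""
    return (other_ns - baseline_ns) / baseline_ns * 100.0


def clock_drop_percent(baseline_ns, other_ns):
    """Clock slowdown that alone would stretch baseline_ns to other_ns."""
    return (1.0 - baseline_ns / other_ns) * 100.0


def ns_for_cycles(cycles, freq_ghz):
    """Wall-clock nanoseconds taken by `cycles` clock cycles at `freq_ghz`."""
    return cycles / freq_ghz


if __name__ == "__main__":
    base = TIMINGS_NS["dynamically generated unpack"]
    for name, ns in TIMINGS_NS.items():
        print(f"{name}: {ns} ns/iter, "
              f"{slowdown_percent(base, ns):+.1f}% vs generated, "
              f"same gap from a {clock_drop_percent(base, ns):.1f}% clock drop")

    # HYPOTHETICAL numbers, chosen only for illustration: an iteration that
    # always costs 4827 cycles, timed at two assumed clock speeds.
    cycles = 4827
    for ghz in (3.0, 2.9):
        print(f"{cycles} cycles at {ghz} GHz = {ns_for_cycles(cycles, ghz):.0f} ns")
```

The cycle count is identical in both hypothetical runs, yet the nanosecond figure moves by several percent, which is comparable to the differences being benchmarked; counting cycles directly sidesteps that.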