Subject: Re: [PATCH] KVM: MMU: speedup update_permission_bitmask
To: Peter Feiner, Jim Mattson
Cc: LKML, kvm list, David Hildenbrand
References: <1503590171-41030-1-git-send-email-pbonzini@redhat.com>
From: Paolo Bonzini
Date: Tue, 12 Sep 2017 21:55:56 +0200

On 12/09/2017 18:48, Peter Feiner wrote:
>>>
>>> Because update_permission_bitmask is actually the top item in the profile
>>> for nested vmexits, this speeds up an L2->L1 vmexit by about ten thousand
>>> clock cycles, or up to 30%:
>
> This is a great improvement! Why not take it a step further and
> compute the whole table once at module init time and be done with it?
> There are only 5 extra input bits (nx, ept, smep, smap, wp),

4 actually, nx could be ignored (because unlike WP, the bit is reserved
when nx is disabled).  It is only handled for clarity.

> so the
> whole table would only take up (1 << 5) * 16 = 512 bytes. Moreover, if
> you had 32 VMs on the host, you'd actually save memory!

Indeed; my thought was to write a script or something to generate the
tables at compile time, but doing it at module init time would be clever
and easier.
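To make the quoted arithmetic concrete, here is a rough userspace sketch of an init-time table. It is only an illustration: every name in it (perm_table, perm_table_init, perm_lookup, compute_entry, the CFG_* bits) is made up, and compute_entry is a placeholder, not the real permission logic. The only things taken from the discussion above are the 5 input bits and the 16 bytes per configuration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical encoding of the 5 input bits named in the thread. */
enum {
	CFG_NX   = 1 << 0,
	CFG_EPT  = 1 << 1,
	CFG_SMEP = 1 << 2,
	CFG_SMAP = 1 << 3,
	CFG_WP   = 1 << 4,
};

#define NR_CONFIGS (1 << 5)	/* one row per combination of the 5 bits */
#define NR_ENTRIES 16		/* 16 one-byte entries per row */

static uint8_t perm_table[NR_CONFIGS][NR_ENTRIES];

/* (1 << 5) * 16 = 512 bytes, as in the quoted estimate. */
_Static_assert(sizeof(perm_table) == 512, "table should be 512 bytes");

/*
 * Placeholder for the real per-entry computation.  It only produces a
 * distinct, deterministic byte so that the table contents can be checked.
 */
static uint8_t compute_entry(unsigned int cfg, unsigned int idx)
{
	return (uint8_t)((cfg << 3) ^ idx);
}

/* Fill every row once; this would run a single time at init. */
static void perm_table_init(void)
{
	for (unsigned int cfg = 0; cfg < NR_CONFIGS; cfg++)
		for (unsigned int idx = 0; idx < NR_ENTRIES; idx++)
			perm_table[cfg][idx] = compute_entry(cfg, idx);
}

/* Afterwards, picking the 16 bytes for a configuration is just indexing. */
static const uint8_t *perm_lookup(unsigned int cfg)
{
	assert(cfg < NR_CONFIGS);
	return perm_table[cfg];
}
```

With something of this shape, a caller would compute the configuration index once (for example CFG_WP | CFG_SMEP) and use perm_lookup() in place of recomputing the 16 bytes. If nx really can be ignored, the table drops to 16 rows, i.e. 256 bytes.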
That said, the generated code for the function, right now, is pretty
good.  If it saved 1000 clock cycles per nested vmexit it would be very
convincing, but if it were 50 or even 100 a bit less so.

Paolo