2024-01-09 05:47:27

by Jun Miao

Subject: [PATCH] x86/tdx: Optimize try_accept_memory() to reduce 1GB page accepted failed times

The current TDX module ABI spec describes the TDG.MEM.PAGE.ACCEPT leaf
as follows:
"Level of the Secure EPT leaf entry that maps the private page to be
accepted: either 0 (4KB) or 1 (2MB)".

1G pages cannot be accepted dynamically. Trying a 1G accept first always
fails, and each failed attempt costs time in one of two ways:
- When size < 1G, the size check fails and try_accept_one() returns 0
- When size >= 1G, TDCALL<ACCEPT_PAGE> for 1G is really issued and fails
So skip the 1G attempt and go to 2M directly to save time.

Run eatmemory with different memory sizes to measure the elapsed time, as follows:
[root@td-guest ~]# ./eatmemory 8G
Currently total memory: 100169027584
Currently avail memory: 99901911040
Eating 8589934592 bytes in chunks of 1024...

Start time:1704699207487 ms
End time:1704699222966 ms
Cost time: 15479 ms
#
# Compare with/without this optimization
#
# Hardware: ArcherCity Sapphire Rapids 128cores
# Test eatmemory: https://github.com/jmiao2018/eatmemory.git
# Detail test log link: https://github.com/jmiao2018/eatmemory/blob/master/log-tdx.txt
#
# Accept Memory Size  Before(ms)  After(ms)  Trigger 1G Failed Times  Reduce Time (ms, %)
# ..................  ..........  .........  .......................  ...................
#
  1G                  3414        3402       751824                   -12(-0.035%)
  2G                  3853        3804       1015126                  -349(-0.128%)
  4G                  7773        7561       1557834                  -212(-0.281%)
  8G                  15479       15173      2633686                  -306(-0.201%)
  16G                 31527       30379      4785649                  -1148(-0.378%)
  32G                 65058       63723      9087686                  -1335(-0.209%)
  64G                 133379      128354     17693366                 -5025(-0.391%)

Co-developed-by: Zhiquan Li <[email protected]>
Signed-off-by: Jun Miao <[email protected]>
---
arch/x86/coco/tdx/tdx-shared.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/x86/coco/tdx/tdx-shared.c b/arch/x86/coco/tdx/tdx-shared.c
index 1655aa56a0a5..1694b7eba93b 100644
--- a/arch/x86/coco/tdx/tdx-shared.c
+++ b/arch/x86/coco/tdx/tdx-shared.c
@@ -18,7 +18,7 @@ static unsigned long try_accept_one(phys_addr_t start, unsigned long len,
* Pass the page physical address to the TDX module to accept the
* pending, private page.
*
- * Bits 2:0 of RCX encode page size: 0 - 4K, 1 - 2M, 2 - 1G.
+ * Bits 2:0 of RCX encode page size: 0 - 4K, 1 - 2M.
*/
switch (pg_level) {
case PG_LEVEL_4K:
@@ -27,9 +27,6 @@ static unsigned long try_accept_one(phys_addr_t start, unsigned long len,
case PG_LEVEL_2M:
page_size = TDX_PS_2M;
break;
- case PG_LEVEL_1G:
- page_size = TDX_PS_1G;
- break;
default:
return 0;
}
@@ -55,11 +52,14 @@ bool tdx_accept_memory(phys_addr_t start, phys_addr_t end)
* Try larger accepts first. It gives chance to VMM to keep
* 1G/2M Secure EPT entries where possible and speeds up
* process by cutting number of hypercalls (if successful).
- */
+ * Since per current TDX spec, only support for adding 4KB or
+ * 2MB page dynamically.
+ * /

- accept_size = try_accept_one(start, len, PG_LEVEL_1G);
- if (!accept_size)
+ if (IS_ALIGNED(start, PMD_SIZE) && len >= PMD_SIZE)
accept_size = try_accept_one(start, len, PG_LEVEL_2M);
+
+ /* The 4KB page case or accept 2MB page failed case. */
if (!accept_size)
accept_size = try_accept_one(start, len, PG_LEVEL_4K);
if (!accept_size)
--
2.32.0



2024-01-09 11:41:13

by Kirill A. Shutemov

Subject: Re: [PATCH] x86/tdx: Optimize try_accept_memory() to reduce 1GB page accepted failed times

On Tue, Jan 09, 2024 at 01:48:24PM +0800, Jun Miao wrote:
> The current TDX module ABI spec describes the TDG.MEM.PAGE.ACCEPT leaf
> as follows:
> "Level of the Secure EPT leaf entry that maps the private page to be
> accepted: either 0 (4KB) or 1 (2MB)".

Well, it's true that the current implementation supports only 4k and 2M, but
note the reference to the "Secure EPT level" table. This, as well as the size
of the field, suggests that it can be extended to more page levels.

> 1G pages cannot be accepted dynamically. Trying a 1G accept first always
> fails, and each failed attempt costs time in one of two ways:
> - When size < 1G, the size check fails and try_accept_one() returns 0
> - When size >= 1G, TDCALL<ACCEPT_PAGE> for 1G is really issued and fails
> So skip the 1G attempt and go to 2M directly to save time.

Do you actually see TDCALLs issued for 1G pages? It shouldn't be the case.

The kernel accepts memory in MAX_ORDER chunks -- 4MiB at a time.
try_accept_one() will fail on the alignment check 511 times out of 512 and
on the len check for the remaining one. I expected these checks to be within
noise compared to TDCALL.

I don't oppose the patch in principle, but let's establish facts first.

>
> Run eatmemory with different memory sizes to measure the elapsed time, as follows:
> [root@td-guest ~]# ./eatmemory 8G
> Currently total memory: 100169027584
> Currently avail memory: 99901911040
> Eating 8589934592 bytes in chunks of 1024...
>
> Start time:1704699207487 ms
> End time:1704699222966 ms
> Cost time: 15479 ms
> #
> # Compare with/without this optimization
> #
> # Hardware: ArcherCity Sapphire Rapids 128cores
> # Test eatmemory: https://github.com/jmiao2018/eatmemory.git
> # Detail test log link: https://github.com/jmiao2018/eatmemory/blob/master/log-tdx.txt
> #
> # Accept Memory Size  Before(ms)  After(ms)  Trigger 1G Failed Times  Reduce Time (ms, %)
> # ..................  ..........  .........  .......................  ...................
> #
>   1G                  3414        3402       751824                   -12(-0.035%)
>   2G                  3853        3804       1015126                  -349(-0.128%)
>   4G                  7773        7561       1557834                  -212(-0.281%)
>   8G                  15479       15173      2633686                  -306(-0.201%)
>   16G                 31527       30379      4785649                  -1148(-0.378%)
>   32G                 65058       63723      9087686                  -1335(-0.209%)
>   64G                 133379      128354     17693366                 -5025(-0.391%)
>
> Co-developed-by: Zhiquan Li <[email protected]>
> Signed-off-by: Jun Miao <[email protected]>
> ---
> arch/x86/coco/tdx/tdx-shared.c | 14 +++++++-------
> 1 file changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/coco/tdx/tdx-shared.c b/arch/x86/coco/tdx/tdx-shared.c
> index 1655aa56a0a5..1694b7eba93b 100644
> --- a/arch/x86/coco/tdx/tdx-shared.c
> +++ b/arch/x86/coco/tdx/tdx-shared.c
> @@ -18,7 +18,7 @@ static unsigned long try_accept_one(phys_addr_t start, unsigned long len,
> * Pass the page physical address to the TDX module to accept the
> * pending, private page.
> *
> - * Bits 2:0 of RCX encode page size: 0 - 4K, 1 - 2M, 2 - 1G.
> + * Bits 2:0 of RCX encode page size: 0 - 4K, 1 - 2M.
> */
> switch (pg_level) {
> case PG_LEVEL_4K:
> @@ -27,9 +27,6 @@ static unsigned long try_accept_one(phys_addr_t start, unsigned long len,
> case PG_LEVEL_2M:
> page_size = TDX_PS_2M;
> break;
> - case PG_LEVEL_1G:
> - page_size = TDX_PS_1G;
> - break;
> default:
> return 0;
> }
> @@ -55,11 +52,14 @@ bool tdx_accept_memory(phys_addr_t start, phys_addr_t end)
> * Try larger accepts first. It gives chance to VMM to keep
> * 1G/2M Secure EPT entries where possible and speeds up
> * process by cutting number of hypercalls (if successful).
> - */
> + * Since per current TDX spec, only support for adding 4KB or
> + * 2MB page dynamically.
> + * /
>
> - accept_size = try_accept_one(start, len, PG_LEVEL_1G);
> - if (!accept_size)
> + if (IS_ALIGNED(start, PMD_SIZE) && len >= PMD_SIZE)

You duplicate the checks that are already done inside try_accept_one().

> accept_size = try_accept_one(start, len, PG_LEVEL_2M);
> +
> + /* The 4KB page case or accept 2MB page failed case. */
> if (!accept_size)
> accept_size = try_accept_one(start, len, PG_LEVEL_4K);
> if (!accept_size)
> --
> 2.32.0
>

--
Kiryl Shutsemau / Kirill A. Shutemov

2024-01-09 23:15:15

by kernel test robot

Subject: Re: [PATCH] x86/tdx: Optimize try_accept_memory() to reduce 1GB page accepted failed times

Hi Jun,

kernel test robot noticed the following build warnings:

[auto build test WARNING on tip/x86/tdx]
[also build test WARNING on next-20240109]
[cannot apply to linus/master v6.7]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/Jun-Miao/x86-tdx-Optimize-try_accept_memory-to-reduce-1GB-page-accepted-failed-times/20240109-134908
base: tip/x86/tdx
patch link: https://lore.kernel.org/r/20240109054824.9023-1-jun.miao%40intel.com
patch subject: [PATCH] x86/tdx: Optimize try_accept_memory() to reduce 1GB page accepted failed times
config: x86_64-allmodconfig (https://download.01.org/0day-ci/archive/20240110/[email protected]/config)
compiler: ClangBuiltLinux clang version 17.0.6 (https://github.com/llvm/llvm-project 6009708b4367171ccdbf4b5905cb6a803753fe18)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240110/[email protected]/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/

All warnings (new ones prefixed by >>):

>> arch/x86/coco/tdx/tdx-shared.c:62:3: warning: '/*' within block comment [-Wcomment]
62 | /* The 4KB page case or accept 2MB page failed case. */
| ^
>> arch/x86/coco/tdx/tdx-shared.c:63:8: warning: variable 'accept_size' is uninitialized when used here [-Wuninitialized]
63 | if (!accept_size)
| ^~~~~~~~~~~
arch/x86/coco/tdx/tdx-shared.c:49:28: note: initialize the variable 'accept_size' to silence this warning
49 | unsigned long accept_size;
| ^
| = 0
2 warnings generated.


vim +62 arch/x86/coco/tdx/tdx-shared.c

40
41 bool tdx_accept_memory(phys_addr_t start, phys_addr_t end)
42 {
43 /*
44 * For shared->private conversion, accept the page using
45 * TDG_MEM_PAGE_ACCEPT TDX module call.
46 */
47 while (start < end) {
48 unsigned long len = end - start;
49 unsigned long accept_size;
50
51 /*
52 * Try larger accepts first. It gives chance to VMM to keep
53 * 1G/2M Secure EPT entries where possible and speeds up
54 * process by cutting number of hypercalls (if successful).
55 * Since per current TDX spec, only support for adding 4KB or
56 * 2MB page dynamically.
57 * /
58
59 if (IS_ALIGNED(start, PMD_SIZE) && len >= PMD_SIZE)
60 accept_size = try_accept_one(start, len, PG_LEVEL_2M);
61
> 62 /* The 4KB page case or accept 2MB page failed case. */
> 63 if (!accept_size)
64 accept_size = try_accept_one(start, len, PG_LEVEL_4K);
65 if (!accept_size)
66 return false;
67 start += accept_size;
68 }
69
70 return true;
71 }
72

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

2024-01-09 23:47:31

by kernel test robot

Subject: Re: [PATCH] x86/tdx: Optimize try_accept_memory() to reduce 1GB page accepted failed times

Hi Jun,

kernel test robot noticed the following build warnings:

[auto build test WARNING on tip/x86/tdx]
[also build test WARNING on next-20240109]
[cannot apply to linus/master v6.7]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/Jun-Miao/x86-tdx-Optimize-try_accept_memory-to-reduce-1GB-page-accepted-failed-times/20240109-134908
base: tip/x86/tdx
patch link: https://lore.kernel.org/r/20240109054824.9023-1-jun.miao%40intel.com
patch subject: [PATCH] x86/tdx: Optimize try_accept_memory() to reduce 1GB page accepted failed times
config: x86_64-rhel-8.3-bpf (https://download.01.org/0day-ci/archive/20240110/[email protected]/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240110/[email protected]/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/

All warnings (new ones prefixed by >>):

arch/x86/coco/tdx/tdx-shared.c: In function 'tdx_accept_memory':
>> arch/x86/coco/tdx/tdx-shared.c:62:17: warning: "/*" within comment [-Wcomment]
62 | /* The 4KB page case or accept 2MB page failed case. */
|


vim +62 arch/x86/coco/tdx/tdx-shared.c

40
41 bool tdx_accept_memory(phys_addr_t start, phys_addr_t end)
42 {
43 /*
44 * For shared->private conversion, accept the page using
45 * TDG_MEM_PAGE_ACCEPT TDX module call.
46 */
47 while (start < end) {
48 unsigned long len = end - start;
49 unsigned long accept_size;
50
51 /*
52 * Try larger accepts first. It gives chance to VMM to keep
53 * 1G/2M Secure EPT entries where possible and speeds up
54 * process by cutting number of hypercalls (if successful).
55 * Since per current TDX spec, only support for adding 4KB or
56 * 2MB page dynamically.
57 * /
58
59 if (IS_ALIGNED(start, PMD_SIZE) && len >= PMD_SIZE)
60 accept_size = try_accept_one(start, len, PG_LEVEL_2M);
61
> 62 /* The 4KB page case or accept 2MB page failed case. */
63 if (!accept_size)
64 accept_size = try_accept_one(start, len, PG_LEVEL_4K);
65 if (!accept_size)
66 return false;
67 start += accept_size;
68 }
69
70 return true;
71 }
72

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki