2023-05-18 13:34:12

Subject: [RFC PATCH 00/22] riscv: s64ilp32: Running 32-bit Linux kernel on 64-bit supervisor mode

From: Guo Ren <[email protected]>

This patch series adds s64ilp32 support to riscv. The term s64ilp32
means smode-xlen=64 and -mabi=ilp32 (ints, longs, and pointers are all
32-bit), i.e., running a 32-bit Linux kernel in pure 64-bit supervisor
mode. Several 64ilp32 ABIs already exist, such as mips-n32 [1],
arm-aarch64ilp32 [2], and x86-x32 [3], but they all target userspace.
Thus, this should be the first time a 32-bit Linux kernel runs with
the 64ilp32 ABI in supervisor mode (please correct me if not).

Why 32-bit Linux?
=================
The motivation for using a 32-bit Linux kernel is to reduce the memory
footprint so that it fits systems with small DDR and cache capacity
(e.g., 64/128MB SIP SoC).

Here is a summary comparing 32-bit and 64-bit Linux kernel data type
sizes:
                       32-bit      64-bit
sizeof(page):          32 bytes    64 bytes
sizeof(list_head):      8 bytes    16 bytes
sizeof(hlist_head):     8 bytes    16 bytes
sizeof(vm_area):       68 bytes   136 bytes
...
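
To illustrate where the list_head row comes from, here is a minimal
sketch (not kernel code). It models a two-pointer list node with
fixed-width integers standing in for 32-bit and 64-bit pointers; the
struct names list_head_ilp32 and list_head_lp64 are made up for this
example.

```c
#include <assert.h>
#include <stdint.h>

/* Two-pointer list node; pointers are modelled as fixed-width integers. */
struct list_head_ilp32 { uint32_t next, prev; };   /* 32-bit pointers */
struct list_head_lp64  { uint64_t next, prev; };   /* 64-bit pointers */

/* The same layout costs twice as much once pointers are 64-bit. */
static_assert(sizeof(struct list_head_ilp32) == 8,  "ilp32 node is 8 bytes");
static_assert(sizeof(struct list_head_lp64)  == 16, "lp64 node is 16 bytes");
```

Any struct that is mostly pointers and longs scales the same way, which
is why the other rows in the table also roughly double.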

The size of ilp32's long & pointer is just half of lp64's (the rv64
default ABI, where longs and pointers are all 64-bit). This significant
difference in data type size leads to different memory & cache footprint
costs. Here is a comparison of s32ilp32, s64ilp32, and s64lp64 measured
in the same 128MB qemu system environment:

Rootfs:
u32ilp32 - Using the same 32-bit userspace rootfs.ext2 (UXL=32) binary
from buildroot 2023.02-rc3, qemu_riscv32_virt_defconfig

Linux:
s32ilp32 - Linux version 6.3.0-rc1 (124MB)
rv32_defconfig: $(Q)$(MAKE) -f $(srctree)/Makefile
defconfig 32-bit.config

s64lp64 - Linux version 6.3.0-rc1 (126MB)
defconfig: $(Q)$(MAKE) -f $(srctree)/Makefile defconfig

s64ilp32 - Linux version 6.3.0-rc1 (126MB)
rv64ilp32_defconfig: $(Q)$(MAKE) -f $(srctree)/Makefile
defconfig 64ilp32.config

Opensbi:
m64lp64 - (2MB) OpenSBI v1.2-80-g4b28afc98bbe
m32ilp32 - (4MB) OpenSBI v1.2-80-g4b28afc98bbe

+----------------------------------------+--------
| u32ilp32 |
| UXL=32 | Rootfs
+----------------------------------------+--------
| +----------+ +---------+ | +---------+ |
| | s64ilp32 | | s64lp64 | | | s32ilp32| |
| | SXL=64 | | SXL=64 | | | SXL=32 | | Linux
| +----------+ +---------+ | +---------+ |
+----------------------------------------+--------
| +----------------------+ | +---------+ |
| | m64lp64 | | | m32ilp32| |
| | MXL=64 | | | MXL=32 | | Opensbi
| +----------------------+ | +---------+ |
+----------------------------------------+--------
| +----------------------+ | +---------+ |
| | qemu-rv64 | | |qemu-rv32| | HW
| +----------------------+ | +---------+ |
+----------------------------------------+--------

Mem-usage:
(s32ilp32) # free
total used free shared buff/cache available
Mem: 100040 8380 88244 44 3416 88080

(s64lp64) # free
total used free shared buff/cache available
Mem: 91568 11848 75796 44 3924 75952

(s64ilp32) # free
total used free shared buff/cache available
Mem: 101952 8528 90004 44 3420 89816
^^^^^

It's a rough measurement based on the current default config without any
modification, and 32-bit (s32ilp32, s64ilp32) saves more than 16% memory
compared with 64-bit (s64lp64). s32ilp32 & s64ilp32 have a similar memory
footprint (about 0.33% difference), meaning s64ilp32 has a good chance of
replacing s32ilp32 on 64-bit machines.
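
As one possible way to read the `free` output above, the sketch below
computes the relative gain of s64ilp32 over s64lp64 from the "free" and
"available" columns. The helper name gain_permille is made up for this
example, and which column the percentages in the text are based on is
not stated, so treat this as an illustration of the arithmetic only.

```c
#include <assert.h>
#include <stdint.h>

/* Values in KiB, copied from the `free` output above. */
#define AVAILABLE_S64LP64   75952u
#define AVAILABLE_S64ILP32  89816u
#define FREE_S64LP64        75796u
#define FREE_S64ILP32       90004u

/* Gain of `value` over `baseline` in tenths of a percent (integer math). */
static uint32_t gain_permille(uint32_t value, uint32_t baseline)
{
    assert(baseline != 0 && value >= baseline);
    return (uint32_t)(((uint64_t)(value - baseline) * 1000u) / baseline);
}
```

With these inputs, gain_permille(AVAILABLE_S64ILP32, AVAILABLE_S64LP64)
yields 182 (about 18.2%) and the "free" column yields 187 (about 18.7%),
both consistent with "more than 16%".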

Why s64ilp32?
=============
RISC-V currently has the RVA20S64, RVA22S64, and RVA23S64 (ongoing)
profiles [4], but no RVA**S32 profile exists or is planned. That
means when a vendor wants to produce a 32-bit s-mode RISC-V Application
Processor, they have no profile to follow. Therefore, many cheap riscv
chips have come out that follow the RVA2xS64 profiles, such as Allwinner
D1/D1s/F133 [5], SOPHGO CV1800B [6], Canaan Kendryte k230 [7], and
Bouffalo Lab BL808 [8], which target typical cortex a7/a35/a53 product
scenarios. The D1, CV1800B, and BL808 don't support UXL=32 (32-bit
U-mode), so they need a new u64ilp32 userspace ABI, which has no software
ecosystem yet. Thus, the first landing of s64ilp32 would be on the Canaan
Kendryte k230, which has a c908 with rv64gcv and compat user mode
(sstatus.uxl=32/64) and could therefore support the existing rv32
userspace software ecosystem.

Another reason for inventing s64ilp32 is that it brings performance
benefits and simplifies 64-bit CPU hardware design (vs. s32ilp32).

Why s64ilp32 has better performance?
====================================
Generally speaking, to run 32-bit Linux on a 64-bit processor we would
either build a 32-bit hardware s-mode (such as Linux-arm32 on
cortex-a53) or only use the old 32ilp32 ABI on a 64-bit machine (such as
mips SYS_SUPPORTS_32BIT_KERNEL). Neither can reuse the
performance-related features and instructions of the 64-bit hardware,
such as the 64-bit ALU, AMO, and LD/SD, which causes significant
performance gaps in many Linux features:

- memcpy/memset/strcmp (s64ilp32 needs half the instruction count and
gets double the load/store bandwidth per instruction compared with
s32ilp32.)

- ebpf JIT (ebpf is a 64-bit virtual ISA, which is not suitable
for mapping to s32ilp32.)

- Atomic64 (s64ilp32 has the same native instruction mapping as
s64lp64, but s32ilp32 can only use generic_atomic64, a limited
software fallback with tradeoffs.)

- 64-bit native arithmetic instructions for "long long" type

- Support cmpxchg_double for slub (this makes it the 2nd 32-bit Linux
to support the feature; the 1st is i386.)

- ...
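
The memcpy item above can be sketched as follows. This is a simplified
model, not the kernel's memcpy: copy_count_ops is a hypothetical helper
that copies in chunks of word_size bytes and counts one "load/store
pair" per chunk, so a word size of 4 stands in for lw/sw and a word
size of 8 for ld/sd.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy n bytes in word_size-byte chunks and return how many
 * load/store pairs were needed (byte-sized tail included).
 */
static size_t copy_count_ops(void *dst, const void *src, size_t n,
                             size_t word_size)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t ops = 0;

    assert(word_size != 0);
    while (n >= word_size) {
        memcpy(d, s, word_size);    /* models one load + one store */
        d += word_size;
        s += word_size;
        n -= word_size;
        ops++;
    }
    while (n > 0) {                 /* leftover bytes, one at a time */
        *d++ = *s++;
        n--;
        ops++;
    }
    return ops;
}
```

For a 64-byte buffer this model needs 16 pairs with 4-byte words and 8
pairs with 8-byte words, which is the "half the instruction count"
effect described in the list.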

Compared with the user space ecosystem, the 32-bit Linux kernel has an
even greater need for 64ilp32 to improve performance because the Linux
kernel can't utilize the floating-point/vector features of the ISA.

Let's look at performance from another perspective (s64ilp32 vs.
s64lp64). As the first section said, the pointer size of ilp32 is
half that of lp64, which reduces the size of critical data structs
(e.g., page, list, ...). That means a cache of the same capacity can
hold up to twice as many such objects with ilp32 as with lp64, which is
a natural advantage of 32-bit.
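
A rough way to picture the cache effect is shown below. It assumes a
64-byte cache line (an assumption for this example; the text does not
name a line size) and uses the struct sizes from the table in the first
section. The helper objects_per_line is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE_BYTES 64u   /* assumed line size, for illustration */

/* Number of whole objects of object_bytes that fit in one cache line. */
static size_t objects_per_line(size_t line_bytes, size_t object_bytes)
{
    assert(object_bytes != 0);
    return line_bytes / object_bytes;
}
```

Under that assumption one line holds 8 list_head objects with ilp32
(8 bytes each) but only 4 with lp64 (16 bytes each); for struct page it
is 2 versus 1.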

Why s64ilp32 simplifies CPU design?
===================================
Yes, there are a lot of examples in history of running 32-bit Linux on
64-bit hardware, such as arm cortex a35/a53/a55, which implement the
32-bit EL1/EL2/EL3 hardware modes to support 32-bit Linux. We could
follow Arm's style, but riscv can choose a better way. Compared to
UXL=32, MXL=SXL=32 involves many CSR-related hardware functionalities,
and mixing them into 64-bit hardware takes a lot of effort. The s64ilp32
works in MXL=SXL=64 mode, so CPU vendors needn't implement 32-bit
machine and supervisor modes.

How does s64ilp32 work?
=======================
The s64ilp32 is the same as the s64lp64 compat mode from a hardware
view, i.e., MXL=SXL=64 + UXL=32. Because the s64ilp32 uses CONFIG_32BIT
of Linux, it only supports the u32ilp32 ABI user space, i.e., the current
standard rv32 software ecosystem, and it can't work with the u64lp64 ABI
(I don't want that complex and useless stuff). It may work with u64ilp32
in the future; for now, the s64ilp32 depends on the UXL=32 feature of the
hardware.
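
For readers unfamiliar with the UXL field, here is a small sketch of how
it can be decoded from an sstatus value on an SXLEN=64 hart. Per the
RISC-V privileged spec, UXL sits in sstatus bits 33:32 and uses the XL
encoding 1=32, 2=64, 3=128. The macro and function names below are made
up for this example and are not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

#define SSTATUS_UXL_SHIFT 32
#define SSTATUS_UXL_MASK  (UINT64_C(3) << SSTATUS_UXL_SHIFT)

static_assert(SSTATUS_UXL_MASK == UINT64_C(0x300000000),
              "UXL occupies sstatus bits 33:32");

/* Map the 2-bit XL encoding to a width in bits; 0 means reserved. */
static unsigned xl_to_bits(unsigned xl)
{
    return (xl >= 1 && xl <= 3) ? (16u << xl) : 0;
}

/* Width of U-mode XLEN selected by the given sstatus value. */
static unsigned sstatus_uxl_bits(uint64_t sstatus)
{
    return xl_to_bits((unsigned)((sstatus & SSTATUS_UXL_MASK)
                                 >> SSTATUS_UXL_SHIFT));
}
```

In these terms, s64ilp32 needs hardware on which the kernel can set the
field so that sstatus_uxl_bits() reports 32 for user tasks.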

The 64ilp32 gcc still uses sign-extending lw & auipc to generate address
variables because inserting zero-extend instructions to mask the upper
32 bits would cause significant code size and performance problems. Thus,
we invented an OS approach to solve the problem:
- When satp=bare and start physical address < 2GB, there is no
sign-extended address problem.
- When satp=bare and start physical address > 2GB, we need zjpm-like
hardware extensions to mask the upper 32 bits.
(Fortunately, the start physical addresses of all existing SoCs
(D1/D1s/F133, CV1800B, k230, BL808) are < 2GB.)
- When satp=sv39, we invent double mapping to make the sign-extended
virtual address the same as the zero-extended virtual address.

+--------+ +---------+ +--------+
| | +--| 511:PUD1| | |
| | | +---------+ | |
| | | | 510:PUD0|--+ | |
| | | +---------+ | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | INVALID | | | |
| | | | | | | |
| .... | | | | | | .... |
| | | | | | | |
| | | +---------+ | | |
| | +--| 3:PUD1 | | | |
| | | +---------+ | | |
| | | | 2:PUD0 |--+ | |
| | | +---------+ | | |
| | | |1:USR_PUD| | | |
| | | +---------+ | | |
| | | |0:USR_PUD| | | |
+--------+<--+ +---------+ +-->+--------+
PUD1 ^ PGD PUD0
1GB | 4GB 1GB
|
+----------+
| Sv39 PGDP|
+----------+
SATP
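
The PGD indices in the diagram can be checked with a little arithmetic.
The sketch below (helper names are made up for this example) extends a
32-bit kernel address both ways and extracts the Sv39 top-level index,
VPN[2] = va[38:30], where each entry covers 1GB.

```c
#include <assert.h>
#include <stdint.h>

/* Zero-extend a 32-bit address to 64 bits. */
static uint64_t zero_extend32(uint32_t addr)
{
    return (uint64_t)addr;
}

/* Sign-extend a 32-bit address to 64 bits (what the text says lw/auipc do). */
static uint64_t sign_extend32(uint32_t addr)
{
    uint64_t wide = addr;

    return (addr & UINT32_C(0x80000000))
        ? (wide | UINT64_C(0xffffffff00000000)) : wide;
}

/* Sv39 requires bits 63:39 to all equal bit 38. */
static int sv39_is_canonical(uint64_t va)
{
    uint64_t top = va >> 38;    /* bits 63..38, 26 bits */

    return top == 0 || top == UINT64_C(0x3ffffff);
}

/* Top-level (PGD) index of an Sv39 virtual address. */
static unsigned sv39_vpn2(uint64_t va)
{
    assert(sv39_is_canonical(va));
    return (unsigned)((va >> 30) & 0x1ff);
}
```

For 0xc0000000 the zero-extended form lands in PGD entry 3 and the
sign-extended form in entry 511; for 0x80000000 the entries are 2 and
510. Pointing each pair at the same PUD, as in the diagram, makes both
forms reach the same memory. Addresses below 2GB, such as 0x60000000,
extend identically either way.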

The size of xlen was always equal to the pointer/long size before
s64ilp32 emerged. So we need to introduce a new data type, xlen_t,
to handle CSR-related and callee-save/restore operations.
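
As a rough idea of what such a type could look like, here is a
hypothetical definition keyed on the compiler's __riscv_xlen macro. It
is a sketch under that assumption, not the definition used by the
series; struct saved_regs_sketch is likewise invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: register-width integer, independent of sizeof(long). */
#if defined(__riscv_xlen) && __riscv_xlen == 32
typedef uint32_t xlen_t;
#else
/* rv64 with lp64 or ilp32; also taken on non-RISC-V hosts for illustration */
typedef uint64_t xlen_t;
#endif

/* Register save slots are sized by xlen_t, not by long or pointers. */
struct saved_regs_sketch {
    xlen_t ra;
    xlen_t sp;
};

static_assert(sizeof(struct saved_regs_sketch) == 2 * sizeof(xlen_t),
              "two register-width slots, no padding");
```

The point is only that on s64ilp32 sizeof(xlen_t) would be 8 while
sizeof(long) and sizeof(void *) are 4, so code touching CSRs or saved
registers can no longer use long as a stand-in for register width.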

Some kernel features use 32BIT/64BIT to determine the exact ISA; for
example, the ebpf JIT maps to the rv32 ISA when CONFIG_32BIT=y. But
s64ilp32 needs the ebpf JIT to map to the rv64 ISA when CONFIG_32BIT=y,
so we need another config to distinguish the two cases.

For more details, please review the patch series.

How to run s64ilp32?
====================

GNU toolchain
-------------
git clone https://github.com/Liaoshihua/riscv-gnu-toolchain.git
cd riscv-gnu-toolchain
./configure --prefix="$PWD/opt-rv64-ilp32/" --with-arch=rv64imac --with-abi=ilp32
make linux
export PATH=$PATH:$PWD/opt-rv64-ilp32/bin/

Opensbi
-------
git clone https://github.com/riscv-software-src/opensbi.git
cd opensbi
CROSS_COMPILE=riscv64-unknown-linux-gnu- make PLATFORM=generic

Linux kernel
------------
git clone https://github.com/guoren83/linux.git -b s64ilp32
cd linux
make ARCH=riscv CROSS_COMPILE=riscv64-unknown-linux-gnu- rv64ilp32_defconfig
make ARCH=riscv CROSS_COMPILE=riscv64-unknown-linux-gnu- all

Rootfs
------
git clone git://git.busybox.net/buildroot
cd buildroot
make qemu_riscv32_virt_defconfig
make

Qemu
----
git clone https://github.com/plctlab/plct-qemu.git -b plct-s64ilp32-dev
cd plct-qemu
mkdir build
cd build
../configure --target-list="riscv64-softmmu riscv32-softmmu"
make

Run
---
./qemu-system-riscv64 -cpu rv64 -M virt -m 128m -nographic -bios fw_dynamic.bin -kernel Image -drive file=rootfs.ext2,format=raw,id=hd0 -device virtio-blk-device,drive=hd0 -append "rootwait root=/dev/vda ro console=ttyS0 earlycon=sbi" -netdev user,id=net0 -device virtio-net-device,netdev=net0

OpenSBI v1.2-119-gdc1c7db05e07
____ _____ ____ _____
/ __ \ / ____| _ \_ _|
| | | |_ __ ___ _ __ | (___ | |_) || |
| | | | '_ \ / _ \ '_ \ \___ \| _ < | |
| |__| | |_) | __/ | | |____) | |_) || |_
\____/| .__/ \___|_| |_|_____/|___/_____|
| |
|_|

Platform Name : riscv-virtio,qemu
Platform Features : medeleg
Platform HART Count : 1
Platform IPI Device : aclint-mswi
Platform Timer Device : aclint-mtimer @ 10000000Hz
Platform Console Device : uart8250
Platform HSM Device : ---
Platform PMU Device : ---
Platform Reboot Device : sifive_test
Platform Shutdown Device : sifive_test
Platform Suspend Device : ---
Platform CPPC Device : ---
Firmware Base : 0x60000000
Firmware Size : 360 KB
Firmware RW Offset : 0x40000
Runtime SBI Version : 1.0

Domain0 Name : root
Domain0 Boot HART : 0
Domain0 HARTs : 0*
Domain0 Region00 : 0x0000000002000000-0x000000000200ffff M: (I,R,W) S/U: ()
Domain0 Region01 : 0x0000000060040000-0x000000006005ffff M: (R,W) S/U: ()
Domain0 Region02 : 0x0000000060000000-0x000000006003ffff M: (R,X) S/U: ()
Domain0 Region03 : 0x0000000000000000-0xffffffffffffffff M: (R,W,X) S/U: (R,W,X)
Domain0 Next Address : 0x0000000060200000
Domain0 Next Arg1 : 0x0000000067e00000
Domain0 Next Mode : S-mode
Domain0 SysReset : yes
Domain0 SysSuspend : yes

Boot HART ID : 0
Boot HART Domain : root
Boot HART Priv Version : v1.12
Boot HART Base ISA : rv64imafdch
Boot HART ISA Extensions : time,sstc
Boot HART PMP Count : 16
Boot HART PMP Granularity : 4
Boot HART PMP Address Bits: 54
Boot HART MHPM Count : 16
Boot HART MIDELEG : 0x0000000000001666
Boot HART MEDELEG : 0x0000000000f0b509
[ 0.000000] Linux version 6.3.0-rc1-00086-gc8d2fedb997a (guoren@fedora) (riscv64-unknown-linux-gnu-gcc (g5e578a16201f) 13.0.1 20230206 (experimental), GNU ld (GNU Binutils) 2.40.50.20230205) #1 SMP Sun May 14 10:46:42 EDT 2023
[ 0.000000] random: crng init done
[ 0.000000] OF: fdt: Ignoring memory range 0x60000000 - 0x60200000
[ 0.000000] Machine model: riscv-virtio,qemu
[ 0.000000] efi: UEFI not found.
[ 0.000000] OF: reserved mem: 0x60000000..0x6003ffff (256 KiB) map non-reusable mmode_resv1@60000000
[ 0.000000] OF: reserved mem: 0x60040000..0x6005ffff (128 KiB) map non-reusable mmode_resv0@60040000
[ 0.000000] Zone ranges:
[ 0.000000] Normal [mem 0x0000000060200000-0x0000000067ffffff]
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x0000000060200000-0x0000000067ffffff]
[ 0.000000] Initmem setup node 0 [mem 0x0000000060200000-0x0000000067ffffff]
[ 0.000000] On node 0, zone Normal: 512 pages in unavailable ranges
[ 0.000000] SBI specification v1.0 detected
[ 0.000000] SBI implementation ID=0x1 Version=0x10002
[ 0.000000] SBI TIME extension detected
[ 0.000000] SBI IPI extension detected
[ 0.000000] SBI RFENCE extension detected
[ 0.000000] SBI SRST extension detected
[ 0.000000] SBI HSM extension detected
[ 0.000000] riscv: base ISA extensions acdfhim
[ 0.000000] riscv: ELF capabilities acdfim
[ 0.000000] percpu: Embedded 13 pages/cpu s24352 r8192 d20704 u53248
[ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 31941
[ 0.000000] Kernel command line: rootwait root=/dev/vda ro console=ttyS0 earlycon=sbi norandmaps
[ 0.000000] Dentry cache hash table entries: 16384 (order: 4, 65536 bytes, linear)
[ 0.000000] Inode-cache hash table entries: 8192 (order: 3, 32768 bytes, linear)
[ 0.000000] mem auto-init: stack:all(zero), heap alloc:off, heap free:off
[ 0.000000] Virtual kernel memory layout:
[ 0.000000] fixmap : 0x9ce00000 - 0x9d000000 (2048 kB)
[ 0.000000] pci io : 0x9d000000 - 0x9e000000 ( 16 MB)
[ 0.000000] vmemmap : 0x9e000000 - 0xa0000000 ( 32 MB)
[ 0.000000] vmalloc : 0xa0000000 - 0xc0000000 ( 512 MB)
[ 0.000000] lowmem : 0xc0000000 - 0xc7e00000 ( 126 MB)
[ 0.000000] Memory: 97748K/129024K available (8699K kernel code, 8867K rwdata, 4096K rodata, 4204K init, 361K bss, 31276K reserved, 0K cma-reserved)
...
Starting network: udhcpc: started, v1.36.0
udhcpc: broadcasting discover
udhcpc: broadcasting select for 10.0.2.15, server 10.0.2.2
udhcpc: lease of 10.0.2.15 obtained from 10.0.2.2, lease time 86400
deleting routers
adding dns 10.0.2.3
OK

Welcome to Buildroot
buildroot login: root
# cat /proc/cpuinfo
processor : 0
hart : 0
isa : rv64imafdch_zihintpause_zbb_sstc
mmu : sv39
mvendorid : 0x0
marchid : 0x70232
mimpid : 0x70232

# uname -a
Linux buildroot 6.3.0-rc1-00086-gc8d2fedb997a #1 SMP Sun May 14 10:46:42 EDT 2023 riscv32 GNU/Linux
# ls /lib/
ld-linux-riscv32-ilp32d.so.1 libgcc_s.so.1
libanl.so.1 libm.so.6
libatomic.so libnss_dns.so.2
libatomic.so.1 libnss_files.so.2
libatomic.so.1.2.0 libpthread.so.0
libc.so.6 libresolv.so.2
libcrypt.so.1 librt.so.1
libdl.so.2 libutil.so.1
libgcc_s.so modules

# cat /proc/99/maps
0000000055554000-0000000055634000 r-xp 00000000 00000000fe:00 17 /bin/busybox
0000000055634000-0000000055636000 r--p 00000000df000 00000000fe:00 17 /bin/busybox
0000000055636000-0000000055637000 rw-p 00000000e1000 00000000fe:00 17 /bin/busybox
0000000055637000-0000000055659000 rw-p 00000000 00:00 0 [heap]
0000000077e8d000-0000000077fbe000 r-xp 00000000 00000000fe:00 137 /lib/libc.so.6
0000000077fbe000-0000000077fbf000 ---p 00000000131000 00000000fe:00 137 /lib/libc.so.6
0000000077fbf000-0000000077fc1000 r--p 00000000131000 00000000fe:00 137 /lib/libc.so.6
0000000077fc1000-0000000077fc2000 rw-p 00000000133000 00000000fe:00 137 /lib/libc.so.6
0000000077fc2000-0000000077fcc000 rw-p 00000000 00:00 0
0000000077fcc000-0000000077fd4000 r-xp 00000000 00000000fe:00 146 /lib/libresolv.so.2
0000000077fd4000-0000000077fd5000 ---p 000000008000 00000000fe:00 146 /lib/libresolv.so.2
0000000077fd5000-0000000077fd6000 r--p 000000008000 00000000fe:00 146 /lib/libresolv.so.2
0000000077fd6000-0000000077fd7000 rw-p 000000009000 00000000fe:00 146 /lib/libresolv.so.2
0000000077fd7000-0000000077fd9000 rw-p 00000000 00:00 0
0000000077fd9000-0000000077fdb000 r--p 00000000 00:00 0 [vvar]
0000000077fdb000-0000000077fdd000 r-xp 00000000 00:00 0 [vdso]
0000000077fdd000-0000000077ffc000 r-xp 00000000 00000000fe:00 132 /lib/ld-linux-riscv32-ilp32d.so.1
0000000077ffd000-0000000077ffe000 r--p 000000001f000 00000000fe:00 132 /lib/ld-linux-riscv32-ilp32d.so.1
0000000077ffe000-0000000077fff000 rw-p 0000000020000 00000000fe:00 132 /lib/ld-linux-riscv32-ilp32d.so.1
000000007ffde000-000000007ffff000 rw-p 00000000 00:00 0 [stack]

Other resources
===============

OpenEuler riscv32 rootfs
------------------------
You can download the OpenEuler riscv32 rootfs from here:
https://repo.tarsier-infra.com/openEuler-RISC-V/obs/archive/rv32/openeuler-image-qemu-riscv32-20221111070036.rootfs.ext4
(Made by Junqiang Wang)

Debian riscv32 rootfs
---------------------
You can download the Debian riscv32 rootfs from here:
https://github.com/yuzibo/riscv32
(Made by Bo YU and Han Gao)

Fedora riscv32 rootfs
---------------------
https://fedoraproject.org/wiki/Architectures/RISC-V/RV32
(Made by Wei Fu)

LLVM 64ilp32
------------
git clone https://github.com/luxufan/llvm-project.git -b rv64-ilp32
cd llvm-project
mkdir build && cd build
cmake ../llvm -G Ninja -DCMAKE_BUILD_TYPE=Release -DLLVM_TARGETS_TO_BUILD="X86;RISCV" -DLLVM_ENABLE_PROJECTS="clang;lld"
ninja all

(LLVM development status: CC=clang can compile the kernel with LLVM=1,
but the resulting kernel has not yet booted successfully.)

Patch organization
==================
This series depends on 64ilp32 toolchain patches that are not upstream
yet.

PATCH [0-1] unify vdso32 & compat_vdso
PATCH [2] adds time-related vDSO common flow for vdso32
PATCH [3] adds s64ilp32 support of clocksource driver
PATCH [5] adds s64ilp32 support of irqchip driver
PATCH [4,6-12] add basic data types and compiling framework
PATCH [13] adds MMU_SV39 support
PATCH [14] adds native atomic64
PATCH [15] adds TImode
PATCH [16] adds cmpxchg_double
PATCH [17-19] cleanup kconfig & add defconfig
PATCH [20-21] fix temporary compiler problems

Open issues
===========

Callee-saved register width
---------------------------
For 64-bit ISA (including 64lp64, 64ilp32), the callee can't determine
the correct width used in a register, so it saves the maximum width of
the ISA register, i.e., xlen size. We also found this rule in x86-x32,
mips-n32, and aarch64ilp32, which inherit it from 64lp64. See PATCH [20].

Here are two downsides of this:
- It makes the stack frame differ from 32ilp32's, while s64ilp32
reuses the 32ilp32 software stack. Thus, many additional compatibility
problems would appear when porting 64ilp32 software.
- It also increases stack usage.
<setup_vm>:
auipc a3,0xff3fb
add a3,a3,1234 # c0000000
li a5,-1
lui a4,0xc0000
addw sp,sp,-96
srl a5,a5,0x20
subw a4,a4,a3
auipc a2,0x111a
add a2,a2,1212 # c1d1f000
sd s0,80(sp)----+
sd s1,72(sp) |
sd s2,64(sp) |
sd s7,24(sp) |
sd s8,16(sp) |
sd s9,8(sp) |-> All <= 32b widths, but occupy 64b
sd ra,88(sp) | stack space.
sd s3,56(sp) | Affect memory footprint & cache
sd s4,48(sp) | performance.
sd s5,40(sp) |
sd s6,32(sp) |
sd s10,0(sp)----+
sll a1,a4,0x20
subw a2,a2,a3
and a4,a4,a5
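
The stack cost in the listing above can be reproduced with a short
calculation. The sketch assumes the 16-byte stack alignment of the
standard RISC-V calling convention; callee_save_frame_bytes is a
hypothetical helper, and it only counts register save slots, not
locals.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_ALIGN 16u   /* stack alignment assumed for this sketch */

static_assert((STACK_ALIGN & (STACK_ALIGN - 1)) == 0,
              "alignment must be a power of two");

/* Bytes of stack needed to save nregs registers in slot_bytes-wide slots. */
static size_t callee_save_frame_bytes(size_t nregs, size_t slot_bytes)
{
    size_t raw = nregs * slot_bytes;

    return (raw + STACK_ALIGN - 1) / STACK_ALIGN * STACK_ALIGN;
}
```

setup_vm saves 12 registers (ra and s0-s10) with sd, so 8-byte slots
give 96 bytes, matching "addw sp,sp,-96". With 4-byte slots the same
registers would need 48 bytes.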

So here is a proposal for the riscv 64ilp32 ABI:
- Let the compiler avoid keeping ">32b variables" in callee-saved
registers. (Q: We need to measure the impact on 64b variables that
live across function calls.)

EF_RISCV_X32
------------
We add an e_flag (EF_RISCV_X32), which occupies BIT[6] of the e_flags
layout, to distinguish the 32-bit ELF.

ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: REL (Relocatable file)
Machine: RISC-V
Version: 0x1
Entry point address: 0x0
Start of program headers: 0 (bytes into file)
Start of section headers: 24620 (bytes into file)
Flags: 0x21, RVC, X32, soft-float ABI
^^^
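
To make the readelf output above easier to follow, here is a sketch that
decodes the Flags value 0x21. EF_RISCV_RVC and the float-ABI mask are
the values from the RISC-V ELF psABI. The X32 value 0x20 is an
assumption inferred from the dump (0x21 = RVC | 0x20); it matches
"BIT[6]" only if bits are counted from 1, so the macro is named
EF_RISCV_X32_ASSUMED here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EF_RISCV_RVC            0x0001u  /* psABI: compressed extension */
#define EF_RISCV_FLOAT_ABI      0x0006u  /* psABI: float ABI mask */
#define EF_RISCV_FLOAT_ABI_SOFT 0x0000u  /* psABI: soft-float */
#define EF_RISCV_X32_ASSUMED    0x0020u  /* assumption, inferred from 0x21 */

static_assert((EF_RISCV_X32_ASSUMED &
               (EF_RISCV_RVC | EF_RISCV_FLOAT_ABI)) == 0,
              "assumed X32 bit must not overlap existing flags");

static bool elf_flags_rvc(uint32_t e_flags)
{
    return (e_flags & EF_RISCV_RVC) != 0;
}

static bool elf_flags_x32(uint32_t e_flags)
{
    return (e_flags & EF_RISCV_X32_ASSUMED) != 0;
}

static bool elf_flags_soft_float(uint32_t e_flags)
{
    return (e_flags & EF_RISCV_FLOAT_ABI) == EF_RISCV_FLOAT_ABI_SOFT;
}
```

Applied to 0x21, all three helpers return true, which corresponds to
the "RVC, X32, soft-float ABI" text printed by readelf.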

64-bit Optimization problem
---------------------------
There is an existing problem in 64ilp32 gcc: it combines two pointers
in one register. Liao is solving that problem. Until he finishes the
job, we can fortunately work around it with a simple noinline attribute.
struct path {
struct vfsmount *mnt;
struct dentry *dentry;
} __randomize_layout;

struct nameidata {
struct path path;
...
struct path root;
...
} __randomize_layout;

struct nameidata *nd
...
nd->path = nd->root;
6c88 ld a0,24(s1)
^^ // a0 contains two pointers
e088 sd a0,0(s1)
mntget(path->mnt);
// Need "lw a0,0(s1)" or "a0 << 32; a0 >> 32"
2a6150ef jal c01ce946 <mntget> // bug!
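
The effect of that 64-bit load can be modelled in plain C. The sketch
below is a simplified illustration with invented helper names: it builds
the register value that a single ld of two adjacent 32-bit pointers
produces on a little-endian machine, then applies the shift pair
mentioned in the comment above. It keeps only the low 32 bits; whether
the upper half should afterwards be zero- or sign-extended is left open
here.

```c
#include <assert.h>
#include <stdint.h>

/* Register contents after one 64-bit load of {mnt, dentry} (little-endian). */
static uint64_t load_pointer_pair(uint32_t mnt, uint32_t dentry)
{
    return ((uint64_t)dentry << 32) | mnt;
}

/* True when the upper 32 bits carry no stray data. */
static int upper_half_clear(uint64_t reg)
{
    return (reg >> 32) == 0;
}

/* The "a0 << 32; a0 >> 32" fix, using logical shifts. */
static uint64_t keep_low_pointer(uint64_t reg)
{
    uint64_t fixed = (reg << 32) >> 32;

    assert(upper_half_clear(fixed));
    return fixed;
}
```

Passing the unfixed value to a callee that expects a single pointer in
a0 is the bug marked in the listing: the register still carries the
second pointer in its upper half.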

Acknowledgments
===============
The s64ilp32 needs the cooperation of many other projects. Thanks to
everyone involved:
- GNU: LiaoShihua <[email protected]>,
Jiawe Chen<[email protected]>
- Qemu: Weiwei Li <[email protected]>
- LLVM: luxufan <[email protected]>,
Chunyu Liao<[email protected]>
- OpenEuler rv32: Junqiang Wang <[email protected]>
- Debian rv32: Bo YU <[email protected]>
Han Gao <[email protected]>
- Fedora rv32: Wei Fu <[email protected]>

References
==========
[1] https://techpubs.jurassic.nl/manuals/0630/developer/Mpro_n32_ABI/sgi_html/index.html
[2] https://wiki.debian.org/Arm64ilp32Port
[3] https://lwn.net/Articles/456731/
[4] https://github.com/riscv/riscv-profiles/releases
[5] https://www.cnx-software.com/2021/10/25/allwinner-d1s-f133-risc-v-processor-64mb-ddr2/
[6] https://milkv.io/duo/
[7] https://twitter.com/tphuang/status/1631308330256801793
[8] https://www.cnx-software.com/2022/12/02/pine64-ox64-sbc-bl808-risc-v-multi-protocol-wisoc-64mb-ram/

Guo Ren (22):
riscv: vdso: Unify vdso32 & compat_vdso into vdso/Makefile
riscv: vdso: Remove compat_vdso/
riscv: vdso: Add time-related vDSO common flow for vdso32
clocksource: riscv: s64ilp32: Use __riscv_xlen instead of CONFIG_32BIT
riscv: s64ilp32: Introduce xlen_t
irqchip: riscv: s64ilp32: Use __riscv_xlen instead of CONFIG_32BIT
riscv: s64ilp32: Add sbi support
riscv: s64ilp32: Add asid support
riscv: s64ilp32: Introduce PTR_L and PTR_S
riscv: s64ilp32: Enable user space runtime environment
riscv: s64ilp32: Add ebpf jit support
riscv: s64ilp32: Add ELF32 support
riscv: s64ilp32: Add ARCH RV64 ILP32 compiling framework
riscv: s64ilp32: Add MMU_SV39 mode support for 32BIT
riscv: s64ilp32: Enable native atomic64
riscv: s64ilp32: Add TImode (128 int) support
riscv: s64ilp32: Implement cmpxchg_double
riscv: s64ilp32: Disable KVM
riscv: Cleanup rv32_defconfig
riscv: s64ilp32: Add rv64ilp32_defconfig
riscv: s64ilp32: Correct the rv64ilp32 stackframe layout
riscv: s64ilp32: Temporary workaround solution to gcc problem

arch/riscv/Kconfig | 36 +++-
arch/riscv/Makefile | 24 ++-
arch/riscv/configs/32-bit.config | 2 -
arch/riscv/configs/64ilp32.config | 2 +
arch/riscv/include/asm/asm.h | 5 +
arch/riscv/include/asm/atomic.h | 6 +
arch/riscv/include/asm/cmpxchg.h | 53 ++++++
arch/riscv/include/asm/cpu_ops_sbi.h | 4 +-
arch/riscv/include/asm/csr.h | 58 +++---
arch/riscv/include/asm/extable.h | 2 +-
arch/riscv/include/asm/page.h | 24 ++-
arch/riscv/include/asm/pgtable-64.h | 42 ++---
arch/riscv/include/asm/pgtable.h | 26 ++-
arch/riscv/include/asm/processor.h | 8 +-
arch/riscv/include/asm/ptrace.h | 96 +++++-----
arch/riscv/include/asm/sbi.h | 24 +--
arch/riscv/include/asm/stacktrace.h | 6 +
arch/riscv/include/asm/timex.h | 10 +-
arch/riscv/include/asm/vdso.h | 34 +++-
arch/riscv/include/asm/vdso/gettimeofday.h | 84 +++++++++
arch/riscv/include/uapi/asm/elf.h | 2 +-
arch/riscv/include/uapi/asm/unistd.h | 1 +
arch/riscv/kernel/Makefile | 3 +-
arch/riscv/kernel/compat_signal.c | 2 +-
arch/riscv/kernel/compat_vdso/.gitignore | 2 -
arch/riscv/kernel/compat_vdso/compat_vdso.S | 8 -
.../kernel/compat_vdso/compat_vdso.lds.S | 3 -
arch/riscv/kernel/compat_vdso/flush_icache.S | 3 -
arch/riscv/kernel/compat_vdso/getcpu.S | 3 -
arch/riscv/kernel/compat_vdso/note.S | 3 -
arch/riscv/kernel/compat_vdso/rt_sigreturn.S | 3 -
arch/riscv/kernel/cpu.c | 4 +-
arch/riscv/kernel/cpu_ops_sbi.c | 4 +-
arch/riscv/kernel/cpufeature.c | 4 +-
arch/riscv/kernel/entry.S | 24 +--
arch/riscv/kernel/head.S | 8 +-
arch/riscv/kernel/process.c | 8 +-
arch/riscv/kernel/sbi.c | 24 +--
arch/riscv/kernel/signal.c | 6 +-
arch/riscv/kernel/traps.c | 4 +-
arch/riscv/kernel/vdso.c | 4 +-
arch/riscv/kernel/vdso/Makefile | 176 ++++++++++++------
..._vdso_offsets.sh => gen_vdso32_offsets.sh} | 2 +-
.../gen_vdso64_offsets.sh} | 2 +-
arch/riscv/kernel/vdso/vgettimeofday.c | 39 +++-
arch/riscv/kernel/vdso32.S | 8 +
arch/riscv/kernel/{vdso/vdso.S => vdso64.S} | 8 +-
arch/riscv/kvm/Kconfig | 1 +
arch/riscv/lib/Makefile | 1 +
arch/riscv/lib/memset.S | 4 +-
arch/riscv/mm/context.c | 16 +-
arch/riscv/mm/fault.c | 13 +-
arch/riscv/mm/init.c | 29 ++-
arch/riscv/net/Makefile | 6 +-
arch/riscv/net/bpf_jit_comp64.c | 10 +-
drivers/clocksource/timer-riscv.c | 2 +-
drivers/irqchip/irq-riscv-intc.c | 4 +-
fs/namei.c | 2 +-
58 files changed, 675 insertions(+), 317 deletions(-)
create mode 100644 arch/riscv/configs/64ilp32.config
delete mode 100644 arch/riscv/kernel/compat_vdso/.gitignore
delete mode 100644 arch/riscv/kernel/compat_vdso/compat_vdso.S
delete mode 100644 arch/riscv/kernel/compat_vdso/compat_vdso.lds.S
delete mode 100644 arch/riscv/kernel/compat_vdso/flush_icache.S
delete mode 100644 arch/riscv/kernel/compat_vdso/getcpu.S
delete mode 100644 arch/riscv/kernel/compat_vdso/note.S
delete mode 100644 arch/riscv/kernel/compat_vdso/rt_sigreturn.S
rename arch/riscv/kernel/vdso/{gen_vdso_offsets.sh => gen_vdso32_offsets.sh} (78%)
rename arch/riscv/kernel/{compat_vdso/gen_compat_vdso_offsets.sh => vdso/gen_vdso64_offsets.sh} (77%)
create mode 100644 arch/riscv/kernel/vdso32.S
rename arch/riscv/kernel/{vdso/vdso.S => vdso64.S} (73%)

--
2.36.1

2023-05-18 13:34:46

Subject: [RFC PATCH 12/22] riscv: s64ilp32: Add ELF32 support

From: Guo Ren <[email protected]>

Use abi_len to distinguish ELF32 from ELF64 because s64ilp32 has xlen=64
and abi_len=32 (__SIZEOF_POINTER__=4). Like s32ilp32, s64ilp32 is
ELF32-based.

Signed-off-by: Guo Ren <[email protected]>
Signed-off-by: Guo Ren <[email protected]>
---
arch/riscv/include/uapi/asm/elf.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/riscv/include/uapi/asm/elf.h b/arch/riscv/include/uapi/asm/elf.h
index d696d6610231..962e8ec8fe05 100644
--- a/arch/riscv/include/uapi/asm/elf.h
+++ b/arch/riscv/include/uapi/asm/elf.h
@@ -24,7 +24,7 @@ typedef __u64 elf_fpreg_t;
typedef union __riscv_fp_state elf_fpregset_t;
#define ELF_NFPREG (sizeof(struct __riscv_d_ext_state) / sizeof(elf_fpreg_t))

-#if __riscv_xlen == 64
+#if __SIZEOF_POINTER__ == 8
#define ELF_RISCV_R_SYM(r_info) ELF64_R_SYM(r_info)
#define ELF_RISCV_R_TYPE(r_info) ELF64_R_TYPE(r_info)
#else
--
2.36.1

2023-05-18 13:34:51

Subject: [RFC PATCH 07/22] riscv: s64ilp32: Add sbi support

From: Guo Ren <[email protected]>

The sbi uses xlen-sized values as its basic argument elements to connect
m-mode and s-mode. The previous implementation assumes sizeof(xlen_t) =
sizeof(long), but they differ on s64ilp32. So modify the sbi code
to suit s64ilp32.

Signed-off-by: Guo Ren <[email protected]>
Signed-off-by: Guo Ren <[email protected]>
---
arch/riscv/include/asm/cpu_ops_sbi.h | 4 ++--
arch/riscv/include/asm/sbi.h | 24 ++++++++++++------------
arch/riscv/kernel/cpu_ops_sbi.c | 4 ++--
arch/riscv/kernel/sbi.c | 24 ++++++++++++------------
4 files changed, 28 insertions(+), 28 deletions(-)

diff --git a/arch/riscv/include/asm/cpu_ops_sbi.h b/arch/riscv/include/asm/cpu_ops_sbi.h
index d6e4665b3195..d967adad6b48 100644
--- a/arch/riscv/include/asm/cpu_ops_sbi.h
+++ b/arch/riscv/include/asm/cpu_ops_sbi.h
@@ -19,8 +19,8 @@ extern const struct cpu_operations cpu_ops_sbi;
* @stack_ptr: A pointer to the hart specific sp
*/
struct sbi_hart_boot_data {
- void *task_ptr;
- void *stack_ptr;
+ xlen_t task_ptr;
+ xlen_t stack_ptr;
};
#endif

diff --git a/arch/riscv/include/asm/sbi.h b/arch/riscv/include/asm/sbi.h
index 945b7be249c1..d31135715f0e 100644
--- a/arch/riscv/include/asm/sbi.h
+++ b/arch/riscv/include/asm/sbi.h
@@ -123,16 +123,16 @@ enum sbi_ext_pmu_fid {
};

union sbi_pmu_ctr_info {
- unsigned long value;
+ xlen_t value;
struct {
- unsigned long csr:12;
- unsigned long width:6;
+ xlen_t csr:12;
+ xlen_t width:6;
#if __riscv_xlen == 32
- unsigned long reserved:13;
+ xlen_t reserved:13;
#else
- unsigned long reserved:45;
+ xlen_t reserved:45;
#endif
- unsigned long type:1;
+ xlen_t type:1;
};
};

@@ -254,15 +254,15 @@ enum sbi_pmu_ctr_type {

extern unsigned long sbi_spec_version;
struct sbiret {
- long error;
- long value;
+ xlen_t error;
+ xlen_t value;
};

void sbi_init(void);
-struct sbiret sbi_ecall(int ext, int fid, unsigned long arg0,
- unsigned long arg1, unsigned long arg2,
- unsigned long arg3, unsigned long arg4,
- unsigned long arg5);
+struct sbiret sbi_ecall(int ext, int fid, xlen_t arg0,
+ xlen_t arg1, xlen_t arg2,
+ xlen_t arg3, xlen_t arg4,
+ xlen_t arg5);

void sbi_console_putchar(int ch);
int sbi_console_getchar(void);
diff --git a/arch/riscv/kernel/cpu_ops_sbi.c b/arch/riscv/kernel/cpu_ops_sbi.c
index efa0f0816634..01a1e270ec1d 100644
--- a/arch/riscv/kernel/cpu_ops_sbi.c
+++ b/arch/riscv/kernel/cpu_ops_sbi.c
@@ -71,8 +71,8 @@ static int sbi_cpu_start(unsigned int cpuid, struct task_struct *tidle)

/* Make sure tidle is updated */
smp_mb();
- bdata->task_ptr = tidle;
- bdata->stack_ptr = task_stack_page(tidle) + THREAD_SIZE;
+ bdata->task_ptr = (ulong)tidle;
+ bdata->stack_ptr = (ulong)task_stack_page(tidle) + THREAD_SIZE;
/* Make sure boot data is updated */
smp_mb();
hsm_data = __pa(bdata);
diff --git a/arch/riscv/kernel/sbi.c b/arch/riscv/kernel/sbi.c
index 5c87db8fdff2..b649562aff61 100644
--- a/arch/riscv/kernel/sbi.c
+++ b/arch/riscv/kernel/sbi.c
@@ -22,21 +22,21 @@ static int (*__sbi_rfence)(int fid, const struct cpumask *cpu_mask,
unsigned long start, unsigned long size,
unsigned long arg4, unsigned long arg5) __ro_after_init;

-struct sbiret sbi_ecall(int ext, int fid, unsigned long arg0,
- unsigned long arg1, unsigned long arg2,
- unsigned long arg3, unsigned long arg4,
- unsigned long arg5)
+struct sbiret sbi_ecall(int ext, int fid, xlen_t arg0,
+ xlen_t arg1, xlen_t arg2,
+ xlen_t arg3, xlen_t arg4,
+ xlen_t arg5)
{
struct sbiret ret;

- register uintptr_t a0 asm ("a0") = (uintptr_t)(arg0);
- register uintptr_t a1 asm ("a1") = (uintptr_t)(arg1);
- register uintptr_t a2 asm ("a2") = (uintptr_t)(arg2);
- register uintptr_t a3 asm ("a3") = (uintptr_t)(arg3);
- register uintptr_t a4 asm ("a4") = (uintptr_t)(arg4);
- register uintptr_t a5 asm ("a5") = (uintptr_t)(arg5);
- register uintptr_t a6 asm ("a6") = (uintptr_t)(fid);
- register uintptr_t a7 asm ("a7") = (uintptr_t)(ext);
+ register xlen_t a0 asm ("a0") = arg0;
+ register xlen_t a1 asm ("a1") = arg1;
+ register xlen_t a2 asm ("a2") = arg2;
+ register xlen_t a3 asm ("a3") = arg3;
+ register xlen_t a4 asm ("a4") = arg4;
+ register xlen_t a5 asm ("a5") = arg5;
+ register xlen_t a6 asm ("a6") = fid;
+ register xlen_t a7 asm ("a7") = ext;
asm volatile ("ecall"
: "+r" (a0), "+r" (a1)
: "r" (a2), "r" (a3), "r" (a4), "r" (a5), "r" (a6), "r" (a7)
--
2.36.1

2023-05-18 13:35:20

Subject: [RFC PATCH 15/22] riscv: s64ilp32: Enable native atomic64

From: Guo Ren <[email protected]>

The traditional rv32 Linux (s32ilp32) uses the generic
lib/atomic64.c, whose atomic64 primitives are inaccurate and can't
work together with READ_ONCE/WRITE_ONCE or atomic_8/16/32. The s64ilp32
can use native AMO instructions to implement accurate atomic64
primitives.

Signed-off-by: Guo Ren <[email protected]>
Signed-off-by: Guo Ren <[email protected]>
---
arch/riscv/Kconfig | 2 +-
arch/riscv/include/asm/atomic.h | 6 ++++++
2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 9c458496ec3a..33fe624ef6d3 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -60,7 +60,7 @@ config RISCV
select CPU_PM if CPU_IDLE
select EDAC_SUPPORT
select GENERIC_ARCH_TOPOLOGY
- select GENERIC_ATOMIC64 if !64BIT
+ select GENERIC_ATOMIC64 if ARCH_RV32I
select GENERIC_CLOCKEVENTS_BROADCAST if SMP
select GENERIC_EARLY_IOREMAP
select GENERIC_ENTRY
diff --git a/arch/riscv/include/asm/atomic.h b/arch/riscv/include/asm/atomic.h
index 0dfe9d857a76..edfa6a74fe04 100644
--- a/arch/riscv/include/asm/atomic.h
+++ b/arch/riscv/include/asm/atomic.h
@@ -16,6 +16,12 @@
# endif
#endif

+#ifdef CONFIG_ARCH_RV64ILP32
+typedef struct {
+ s64 counter;
+} atomic64_t;
+#endif
+
#include <asm/cmpxchg.h>
#include <asm/barrier.h>

--
2.36.1
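
Note on the atomic64 patch above: a minimal userspace sketch, using C11
atomics rather than kernel code, of the difference between a lock-based
atomic64 and a native one. The struct and function names are
illustrative assumptions, and the single global lock is a
simplification (lib/atomic64.c hashes the variable's address into a
small array of spinlocks).

```c
#include <stdatomic.h>
#include <stdint.h>

/* Lock-based emulation: every operation takes a lock. A plain 64-bit
 * load or store that bypasses the lock is not atomic with respect to
 * these operations. */
struct generic_atomic64 {
	int64_t counter;
};

static atomic_flag generic_lock = ATOMIC_FLAG_INIT;

static int64_t generic_atomic64_add_return(int64_t a,
					   struct generic_atomic64 *v)
{
	int64_t val;

	while (atomic_flag_test_and_set_explicit(&generic_lock,
						 memory_order_acquire))
		; /* spin until the lock is free */
	val = (v->counter += a);
	atomic_flag_clear_explicit(&generic_lock, memory_order_release);
	return val;
}

/* Native variant: a single atomic read-modify-write, which a 64-bit
 * hart can implement with one AMO instruction. */
struct native_atomic64 {
	_Atomic int64_t counter;
};

static int64_t native_atomic64_add_return(int64_t a,
					  struct native_atomic64 *v)
{
	return atomic_fetch_add_explicit(&v->counter, a,
					 memory_order_seq_cst) + a;
}
```

Both functions return the same value when called from a single thread.
The difference is that the native variant stays consistent with other
single-instruction accesses to the same 64-bit word.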

2023-05-18 13:35:34

Subject: [RFC PATCH 04/22] clocksource: riscv: s64ilp32: Use __riscv_xlen instead of CONFIG_32BIT

From: Guo Ren <[email protected]>

When s64ilp32 is enabled, CONFIG_32BIT=y but __riscv_xlen=64, so we
must use __riscv_xlen to detect the real machine XLEN for CSR access.

Signed-off-by: Guo Ren <[email protected]>
Signed-off-by: Guo Ren <[email protected]>
---
drivers/clocksource/timer-riscv.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/clocksource/timer-riscv.c b/drivers/clocksource/timer-riscv.c
index 5f0f10c7e222..459a634012ce 100644
--- a/drivers/clocksource/timer-riscv.c
+++ b/drivers/clocksource/timer-riscv.c
@@ -37,7 +37,7 @@ static int riscv_clock_next_event(unsigned long delta,

csr_set(CSR_IE, IE_TIE);
if (static_branch_likely(&riscv_sstc_available)) {
-#if defined(CONFIG_32BIT)
+#if __riscv_xlen == 32
csr_write(CSR_STIMECMP, next_tval & 0xFFFFFFFF);
csr_write(CSR_STIMECMPH, next_tval >> 32);
#else
--
2.36.1
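
Note on the timer change above: a minimal host-side sketch of why the
choice must follow the hardware XLEN rather than the kernel's
CONFIG_32BIT. With XLEN=32 the 64-bit compare value is split across two
CSRs; with XLEN=64 a single write carries the whole value. SKETCH_XLEN,
struct stimecmp_writes, and program_stimecmp() are hypothetical names;
SKETCH_XLEN stands in for the compiler-provided __riscv_xlen so the
sketch builds on any host, and the CSR writes are recorded instead of
performed.

```c
#include <stdint.h>

/* Hypothetical stand-in for __riscv_xlen. */
#ifndef SKETCH_XLEN
#define SKETCH_XLEN 64
#endif

/* Records the CSR writes instead of issuing them. */
struct stimecmp_writes {
	uint64_t stimecmp;   /* value written to stimecmp */
	uint32_t stimecmph;  /* value written to stimecmph (XLEN=32 only) */
	int      count;      /* number of CSR writes issued */
};

static struct stimecmp_writes program_stimecmp(uint64_t next_tval)
{
	struct stimecmp_writes w = { 0, 0, 0 };

#if SKETCH_XLEN == 32
	/* 32-bit CSRs: low half first, then high half. */
	w.stimecmp = next_tval & 0xFFFFFFFF;
	w.stimecmph = (uint32_t)(next_tval >> 32);
	w.count = 2;
#else
	/* 64-bit CSR: one write carries the full value. */
	w.stimecmp = next_tval;
	w.count = 1;
#endif
	return w;
}
```

An s64ilp32 kernel has CONFIG_32BIT=y but runs with 64-bit CSRs, so it
needs the single-write path, which is what keying on XLEN selects.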

2023-05-18 13:35:48

Subject: [RFC PATCH 02/22] riscv: vdso: Remove compat_vdso/

From: Guo Ren <[email protected]>

After unifying vdso32 & vdso64 into vdso/, we no longer need the
compat_vdso directory. This commit removes the whole compat_vdso/.

Signed-off-by: Guo Ren <[email protected]>
Signed-off-by: Guo Ren <[email protected]>
---
arch/riscv/kernel/compat_vdso/.gitignore | 2 --
arch/riscv/kernel/compat_vdso/compat_vdso.S | 8 --------
arch/riscv/kernel/compat_vdso/compat_vdso.lds.S | 3 ---
arch/riscv/kernel/compat_vdso/flush_icache.S | 3 ---
arch/riscv/kernel/compat_vdso/gen_compat_vdso_offsets.sh | 5 -----
arch/riscv/kernel/compat_vdso/getcpu.S | 3 ---
arch/riscv/kernel/compat_vdso/note.S | 3 ---
arch/riscv/kernel/compat_vdso/rt_sigreturn.S | 3 ---
8 files changed, 30 deletions(-)
delete mode 100644 arch/riscv/kernel/compat_vdso/.gitignore
delete mode 100644 arch/riscv/kernel/compat_vdso/compat_vdso.S
delete mode 100644 arch/riscv/kernel/compat_vdso/compat_vdso.lds.S
delete mode 100644 arch/riscv/kernel/compat_vdso/flush_icache.S
delete mode 100755 arch/riscv/kernel/compat_vdso/gen_compat_vdso_offsets.sh
delete mode 100644 arch/riscv/kernel/compat_vdso/getcpu.S
delete mode 100644 arch/riscv/kernel/compat_vdso/note.S
delete mode 100644 arch/riscv/kernel/compat_vdso/rt_sigreturn.S

diff --git a/arch/riscv/kernel/compat_vdso/.gitignore b/arch/riscv/kernel/compat_vdso/.gitignore
deleted file mode 100644
index 19d83d846c1e..000000000000
--- a/arch/riscv/kernel/compat_vdso/.gitignore
+++ /dev/null
@@ -1,2 +0,0 @@
-# SPDX-License-Identifier: GPL-2.0-only
-compat_vdso.lds
diff --git a/arch/riscv/kernel/compat_vdso/compat_vdso.S b/arch/riscv/kernel/compat_vdso/compat_vdso.S
deleted file mode 100644
index ffd66237e091..000000000000
--- a/arch/riscv/kernel/compat_vdso/compat_vdso.S
+++ /dev/null
@@ -1,8 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-only */
-
-#define vdso_start compat_vdso_start
-#define vdso_end compat_vdso_end
-
-#define __VDSO_PATH "arch/riscv/kernel/compat_vdso/compat_vdso.so"
-
-#include "../vdso/vdso.S"
diff --git a/arch/riscv/kernel/compat_vdso/compat_vdso.lds.S b/arch/riscv/kernel/compat_vdso/compat_vdso.lds.S
deleted file mode 100644
index c7c9355d311e..000000000000
--- a/arch/riscv/kernel/compat_vdso/compat_vdso.lds.S
+++ /dev/null
@@ -1,3 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-only */
-
-#include "../vdso/vdso.lds.S"
diff --git a/arch/riscv/kernel/compat_vdso/flush_icache.S b/arch/riscv/kernel/compat_vdso/flush_icache.S
deleted file mode 100644
index 523dd8b96045..000000000000
--- a/arch/riscv/kernel/compat_vdso/flush_icache.S
+++ /dev/null
@@ -1,3 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-only */
-
-#include "../vdso/flush_icache.S"
diff --git a/arch/riscv/kernel/compat_vdso/gen_compat_vdso_offsets.sh b/arch/riscv/kernel/compat_vdso/gen_compat_vdso_offsets.sh
deleted file mode 100755
index 8ac070c783b3..000000000000
--- a/arch/riscv/kernel/compat_vdso/gen_compat_vdso_offsets.sh
+++ /dev/null
@@ -1,5 +0,0 @@
-#!/bin/sh
-# SPDX-License-Identifier: GPL-2.0
-
-LC_ALL=C
-sed -n -e 's/^[0]\+\(0[0-9a-fA-F]*\) . \(__vdso_[a-zA-Z0-9_]*\)$/\#define compat\2_offset\t0x\1/p'
diff --git a/arch/riscv/kernel/compat_vdso/getcpu.S b/arch/riscv/kernel/compat_vdso/getcpu.S
deleted file mode 100644
index 10f463efe271..000000000000
--- a/arch/riscv/kernel/compat_vdso/getcpu.S
+++ /dev/null
@@ -1,3 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-only */
-
-#include "../vdso/getcpu.S"
diff --git a/arch/riscv/kernel/compat_vdso/note.S b/arch/riscv/kernel/compat_vdso/note.S
deleted file mode 100644
index b10312907542..000000000000
--- a/arch/riscv/kernel/compat_vdso/note.S
+++ /dev/null
@@ -1,3 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-only */
-
-#include "../vdso/note.S"
diff --git a/arch/riscv/kernel/compat_vdso/rt_sigreturn.S b/arch/riscv/kernel/compat_vdso/rt_sigreturn.S
deleted file mode 100644
index 884aada4facc..000000000000
--- a/arch/riscv/kernel/compat_vdso/rt_sigreturn.S
+++ /dev/null
@@ -1,3 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-only */
-
-#include "../vdso/rt_sigreturn.S"
--
2.36.1

2023-05-18 13:48:19

Subject: [RFC PATCH 20/22] riscv: s64ilp32: Add rv64ilp32_defconfig

From: Guo Ren <[email protected]>

Follow the rv32_defconfig rule to add rv64ilp32_defconfig; the only
difference is:
-CONFIG_ARCH_RV32I=y
+CONFIG_ARCH_RV64ILP32=y

Signed-off-by: Guo Ren <[email protected]>
Signed-off-by: Guo Ren <[email protected]>
---
arch/riscv/Makefile | 4 ++++
arch/riscv/configs/64ilp32.config | 2 ++
2 files changed, 6 insertions(+)
create mode 100644 arch/riscv/configs/64ilp32.config

diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
index d47ba6b09b41..22b62de62192 100644
--- a/arch/riscv/Makefile
+++ b/arch/riscv/Makefile
@@ -187,3 +187,7 @@ rv32_defconfig:
PHONY += rv32_nommu_virt_defconfig
rv32_nommu_virt_defconfig:
$(Q)$(MAKE) -f $(srctree)/Makefile nommu_virt_defconfig 32-bit.config
+
+PHONY += rv64ilp32_defconfig
+rv64ilp32_defconfig:
+ $(Q)$(MAKE) -f $(srctree)/Makefile defconfig 64ilp32.config
diff --git a/arch/riscv/configs/64ilp32.config b/arch/riscv/configs/64ilp32.config
new file mode 100644
index 000000000000..7d836aa2fae7
--- /dev/null
+++ b/arch/riscv/configs/64ilp32.config
@@ -0,0 +1,2 @@
+CONFIG_ARCH_RV64ILP32=y
+CONFIG_NONPORTABLE=y
--
2.36.1

2023-05-18 15:53:46

by Palmer Dabbelt

Subject: Re: [RFC PATCH 00/22] riscv: s64ilp32: Running 32-bit Linux kernel on 64-bit supervisor mode

On Thu, 18 May 2023 06:09:51 PDT (-0700), [email protected] wrote:
> From: Guo Ren <[email protected]>
>
> This patch series adds s64ilp32 support to riscv. The term s64ilp32
> means smode-xlen=64 and -mabi=ilp32 (ints, longs, and pointers are all
> 32-bit), i.e., running 32-bit Linux kernel on pure 64-bit supervisor
> mode. There have been many 64ilp32 abis existing, such as mips-n32 [1],
> arm-aarch64ilp32 [2], and x86-x32 [3], but they are all about userspace.
> Thus, this should be the first time running a 32-bit Linux kernel with
> the 64ilp32 ABI at supervisor mode (If not, correct me).

Does anyone actually want this? At a bare minimum we'd need to add it
to the psABI, which would presumably also be required on the compiler
side of things.

It's not even clear anyone wants rv64/ilp32 in userspace, the kernel
seems like it'd be even less widely used.

> Why 32-bit Linux?
> =================
> The motivation for using a 32-bit Linux kernel is to reduce memory
> footprint and meet the small capacity of DDR & cache requirement
> (e.g., 64/128MB SIP SoC).
>
> Here is a summary comparing Linux kernel data type sizes on 32-bit vs.
> 64-bit:
>                         32-bit          64-bit
> sizeof(page):           32bytes         64bytes
> sizeof(list_head):      8bytes          16bytes
> sizeof(hlist_head):     8bytes          16bytes
> sizeof(vm_area):        68bytes         136bytes
> ...
>
> The size of ilp32's long & pointer is just half of lp64's (rv64 default
> abi - longs and pointers are all 64-bit). This significant difference
> in data type causes different memory & cache footprint costs. Here is
> the comparison measurement between s32ilp32, s64ilp32, and s64lp64 in
> the same 128MB qemu system environment:
>
> Rootfs:
> u32ilp32 - Using the same 32-bit userspace rootfs.ext2 (UXL=32) binary
> from buildroot 2023.02-rc3, qemu_riscv32_virt_defconfig
>
> Linux:
> s32ilp32 - Linux version 6.3.0-rc1 (124MB)
> rv32_defconfig: $(Q)$(MAKE) -f $(srctree)/Makefile
> defconfig 32-bit.config
>
> s64lp64 - Linux version 6.3.0-rc1 (126MB)
> defconfig: $(Q)$(MAKE) -f $(srctree)/Makefile defconfig
>
> s64ilp32 - Linux version 6.3.0-rc1 (126MB)
> rv64ilp32_defconfig: $(Q)$(MAKE) -f $(srctree)/Makefile
> defconfig 64ilp32.config
>
> Opensbi:
> m64lp64 - (2MB) OpenSBI v1.2-80-g4b28afc98bbe
> m32ilp32 - (4MB) OpenSBI v1.2-80-g4b28afc98bbe
>
> +----------------------------------------+--------
> | u32ilp32 |
> | UXL=32 | Rootfs
> +----------------------------------------+--------
> | +----------+ +---------+ | +---------+ |
> | | s64ilp32 | | s64lp64 | | | s32ilp32| |
> | | SXL=64 | | SXL=64 | | | SXL=32 | | Linux
> | +----------+ +---------+ | +---------+ |
> +----------------------------------------+--------
> | +----------------------+ | +---------+ |
> | | m64lp64 | | | m32ilp32| |
> | | MXL=64 | | | MXL=32 | | Opensbi
> | +----------------------+ | +---------+ |
> +----------------------------------------+--------
> | +----------------------+ | +---------+ |
> | | qemu-rv64 | | |qemu-rv32| | HW
> | +----------------------+ | +---------+ |
> +----------------------------------------+--------
>
> Mem-usage:
> (s32ilp32) # free
> total used free shared buff/cache available
> Mem: 100040 8380 88244 44 3416 88080
>
> (s64lp64) # free
> total used free shared buff/cache available
> Mem: 91568 11848 75796 44 3924 75952
>
> (s64ilp32) # free
> total used free shared buff/cache available
> Mem: 101952 8528 90004 44 3420 89816
> ^^^^^
>
> It's a rough measurement based on the current default config without any
> modification, and 32-bit (s32ilp32, s64ilp32) saved more than 16% memory
> compared to 64-bit (s64lp64). But s32ilp32 & s64ilp32 have a similar memory
> footprint (about 0.33% difference), meaning s64ilp32 has a good chance to
> replace s32ilp32 on 64-bit machines.
>
> Why s64ilp32?
> =============
> The current RISC-V has the profiles of RVA20S64, RVA22S64, and RVA23S64
> (ongoing) [4], but no RVA**S32 profile exists, nor is one planned. That
> means when a vendor wants to produce a 32-bit s-mode RISC-V Application
> Processor, they have no profile to follow. Therefore, many cheap riscv
> chips have come out that follow the RVA2xS64 profiles, such as Allwinner
> D1/D1s/F133 [5], SOPHGO CV1800B [6], Canaan Kendryte k230 [7], and
> Bouffalo Lab BL808 [8], which target typical cortex a7/a35/a53 product
> scenarios. The D1 & CV1800B & BL808 don't support UXL=32 (32-bit U-mode),
> so they need a new u64ilp32 userspace ABI, which currently has no software
> ecosystem. Thus, the first landing of s64ilp32 would be on Canaan
> Kendryte k230, which has c908 with rv64gcv and compat user mode
> (sstatus.uxl=32/64), which could support the existing rv32 userspace
> software ecosystem.
>
> Other reasons for inventing s64ilp32 are performance benefits and a
> simpler 64-bit CPU hardware design (vs. s32ilp32).
>
> Why s64ilp32 has better performance?
> ====================================
> Generally speaking, we should build a 32-bit hardware s-mode to run
> 32-bit Linux on a 64-bit processor (such as Linux-arm32 on cortex-a53).
> Or only use old 32ilp32-abi on a 64-bit machine (such as mips
> SYS_SUPPORTS_32BIT_KERNEL). These can't reuse performance-related
> features and instructions of the 64-bit hardware, such as 64-bit ALU,
> AMO, and LD/SD, which would cause significant performance gaps on many
> Linux features:
>
> - memcpy/memset/strcmp (s64ilp32 needs half the instruction count
> and has double the load/store bandwidth of s32ilp32.)
>
> - eBPF is a 64-bit virtual ISA, which is not suitable
> for JIT mapping to s32ilp32.
>
> - Atomic64 (s64ilp32 has the exact native instructions mapping as
> s64lp64, but s32ilp32 only uses generic_atomic64, a tradeoff &
> limited software solution.)
>
> - 64-bit native arithmetic instructions for "long long" type
>
> - Support cmpxchg_double for slub (the 2nd 32-bit Linux to
> support the feature; the 1st is i386.)
>
> - ...
>
> Compared with the user space ecosystem, the 32-bit Linux kernel needs
> 64ilp32 even more to improve performance because the Linux kernel
> can't utilize floating-point/vector features of the ISA.
>
> Let's look at performance from another perspective (s64ilp32 vs.
> s64lp64). As the first chapter said, the pointer size of ilp32 is
> half that of lp64, which reduces the size of critical data structs
> (e.g., page, list, ...). That means a cache of the same capacity can
> hold up to twice as much of this data with ilp32 as with lp64, which is a
> natural advantage of 32-bit.
>
> Why s64ilp32 simplifies CPU design?
> ===================================
> Yes, there are many historical examples of running 32-bit Linux on 64-bit
> hardware, such as arm cortex a35/a53/a55, which implement the 32-bit
> EL1/EL2/EL3 hardware mode to support 32-bit Linux. We could follow Arm's
> style, but riscv could choose another, better way. Compared to UXL=32,
> MXL=SXL=32 involves many CSR-related hardware functionalities, which
> take a lot of effort to mix into 64-bit hardware. The s64ilp32
> works in MXL=SXL=64 mode, so CPU vendors needn't implement 32-bit
> machine and supervisor modes.
>
> How does s64ilp32 work?
> =======================
> The s64ilp32 is the same as the s64lp64 compat mode from a hardware
> view, i.e., MXL=SXL=64 + UXL=32. Because the s64ilp32 uses CONFIG_32BIT
> of Linux, it only supports u32ilp32 abi user space, the current standard
> rv32 software ecosystem, and it can't work with u64lp64 abi (I don't
> want that complex and useless stuff). But it may work with u64ilp32 in the
> future; now, the s64ilp32 depends on the UXL=32 feature of the hardware.
>
> The 64ilp32 gcc still uses sign-extending lw & auipc to generate address
> variables because inserting zero-extend instructions to mask the upper
> 32 bits would cause significant code size and performance problems. Thus,
> we invented an OS approach to solve the problem:
> - When satp=bare and start physical address < 2GB, there is no sign-extend
> address problem.
> - When satp=bare and start physical address > 2GB, we need zjpm-like
> hardware extensions to mask the high 32 bits.
> (Fortunately, the start physical address of all existing SoCs
> (D1/D1s/F133, CV1800B, k230, BL808) is < 2GB.)
> - When satp=sv39, we invent double mapping to make the sign-extended
> virtual address the same as the zero-extended virtual address.
>
> +--------+ +---------+ +--------+
> | | +--| 511:PUD1| | |
> | | | +---------+ | |
> | | | | 510:PUD0|--+ | |
> | | | +---------+ | | |
> | | | | | | | |
> | | | | | | | |
> | | | | | | | |
> | | | | INVALID | | | |
> | | | | | | | |
> | .... | | | | | | .... |
> | | | | | | | |
> | | | +---------+ | | |
> | | +--| 3:PUD1 | | | |
> | | | +---------+ | | |
> | | | | 2:PUD0 |--+ | |
> | | | +---------+ | | |
> | | | |1:USR_PUD| | | |
> | | | +---------+ | | |
> | | | |0:USR_PUD| | | |
> +--------+<--+ +---------+ +-->+--------+
> PUD1 ^ PGD PUD0
> 1GB | 4GB 1GB
> |
> +----------+
> | Sv39 PGDP|
> +----------+
> SATP
>
> The size of xlen was always equal to the pointer/long size before
> s64ilp32 emerged. So we need to introduce a new type of data - xlen_t,
> which could deal with CSR-related and callee-save/restore operations.
>
> Some kernel features use 32BIT/64BIT to determine the exact ISA, such as
> ebpf JIT would map to rv32 ISA when CONFIG_32BIT=y. But s64ilp32 needs
> the ebpf JIT map to rv64 ISA when CONFIG_32BIT=y and we need to use
> another config to distinguish the difference.
>
> For more details, please review the patch series.
>
> How to run s64ilp32?
> ====================
>
> GNU toolchain
> -------------
> git clone https://github.com/Liaoshihua/riscv-gnu-toolchain.git
> cd riscv-gnu-toolchain
> ./configure --prefix="$PWD/opt-rv64-ilp32/" --with-arch=rv64imac --with-abi=ilp32
> make linux
> export PATH=$PATH:$PWD/opt-rv64-ilp32/bin/
>
> Opensbi
> -------
> git clone https://github.com/riscv-software-src/opensbi.git
> CROSS_COMPILE=riscv64-unknown-linux-gnu- make PLATFORM=generic
>
> Linux kernel
> ------------
> git clone https://github.com/guoren83/linux.git -b s64ilp32
> cd linux
> make ARCH=riscv CROSS_COMPILE=riscv64-unknown-linux-gnu- rv64ilp32_defconfig
> make ARCH=riscv CROSS_COMPILE=riscv64-unknown-linux-gnu- all
>
> Rootfs
> ------
> git clone git://git.busybox.net/buildroot
> cd buildroot
> make qemu_riscv32_virt_defconfig
> make
>
> Qemu
> ----
> git clone https://github.com/plctlab/plct-qemu.git -b plct-s64ilp32-dev
> cd plct-qemu
> mkdir build
> cd build
> ../qemu/configure --target-list="riscv64-softmmu riscv32-softmmu"
> make
>
> Run
> ---
> ./qemu-system-riscv64 -cpu rv64 -M virt -m 128m -nographic -bios fw_dynamic.bin -kernel Image -drive file=rootfs.ext2,format=raw,id=hd0 -device virtio-blk-device,drive=hd0 -append "rootwait root=/dev/vda ro console=ttyS0 earlycon=sbi" -netdev user,id=net0 -device virtio-net-device,netdev=net0
>
> OpenSBI v1.2-119-gdc1c7db05e07
> ____ _____ ____ _____
> / __ \ / ____| _ \_ _|
> | | | |_ __ ___ _ __ | (___ | |_) || |
> | | | | '_ \ / _ \ '_ \ \___ \| _ < | |
> | |__| | |_) | __/ | | |____) | |_) || |_
> \____/| .__/ \___|_| |_|_____/|___/_____|
> | |
> |_|
>
> Platform Name : riscv-virtio,qemu
> Platform Features : medeleg
> Platform HART Count : 1
> Platform IPI Device : aclint-mswi
> Platform Timer Device : aclint-mtimer @ 10000000Hz
> Platform Console Device : uart8250
> Platform HSM Device : ---
> Platform PMU Device : ---
> Platform Reboot Device : sifive_test
> Platform Shutdown Device : sifive_test
> Platform Suspend Device : ---
> Platform CPPC Device : ---
> Firmware Base : 0x60000000
> Firmware Size : 360 KB
> Firmware RW Offset : 0x40000
> Runtime SBI Version : 1.0
>
> Domain0 Name : root
> Domain0 Boot HART : 0
> Domain0 HARTs : 0*
> Domain0 Region00 : 0x0000000002000000-0x000000000200ffff M: (I,R,W) S/U: ()
> Domain0 Region01 : 0x0000000060040000-0x000000006005ffff M: (R,W) S/U: ()
> Domain0 Region02 : 0x0000000060000000-0x000000006003ffff M: (R,X) S/U: ()
> Domain0 Region03 : 0x0000000000000000-0xffffffffffffffff M: (R,W,X) S/U: (R,W,X)
> Domain0 Next Address : 0x0000000060200000
> Domain0 Next Arg1 : 0x0000000067e00000
> Domain0 Next Mode : S-mode
> Domain0 SysReset : yes
> Domain0 SysSuspend : yes
>
> Boot HART ID : 0
> Boot HART Domain : root
> Boot HART Priv Version : v1.12
> Boot HART Base ISA : rv64imafdch
> Boot HART ISA Extensions : time,sstc
> Boot HART PMP Count : 16
> Boot HART PMP Granularity : 4
> Boot HART PMP Address Bits: 54
> Boot HART MHPM Count : 16
> Boot HART MIDELEG : 0x0000000000001666
> Boot HART MEDELEG : 0x0000000000f0b509
> [ 0.000000] Linux version 6.3.0-rc1-00086-gc8d2fedb997a (guoren@fedora) (riscv64-unknown-linux-gnu-gcc (g5e578a16201f) 13.0.1 20230206 (experimental), GNU ld (GNU Binutils) 2.40.50.20230205) #1 SMP Sun May 14 10:46:42 EDT 2023
> [ 0.000000] random: crng init done
> [ 0.000000] OF: fdt: Ignoring memory range 0x60000000 - 0x60200000
> [ 0.000000] Machine model: riscv-virtio,qemu
> [ 0.000000] efi: UEFI not found.
> [ 0.000000] OF: reserved mem: 0x60000000..0x6003ffff (256 KiB) map non-reusable mmode_resv1@60000000
> [ 0.000000] OF: reserved mem: 0x60040000..0x6005ffff (128 KiB) map non-reusable mmode_resv0@60040000
> [ 0.000000] Zone ranges:
> [ 0.000000] Normal [mem 0x0000000060200000-0x0000000067ffffff]
> [ 0.000000] Movable zone start for each node
> [ 0.000000] Early memory node ranges
> [ 0.000000] node 0: [mem 0x0000000060200000-0x0000000067ffffff]
> [ 0.000000] Initmem setup node 0 [mem 0x0000000060200000-0x0000000067ffffff]
> [ 0.000000] On node 0, zone Normal: 512 pages in unavailable ranges
> [ 0.000000] SBI specification v1.0 detected
> [ 0.000000] SBI implementation ID=0x1 Version=0x10002
> [ 0.000000] SBI TIME extension detected
> [ 0.000000] SBI IPI extension detected
> [ 0.000000] SBI RFENCE extension detected
> [ 0.000000] SBI SRST extension detected
> [ 0.000000] SBI HSM extension detected
> [ 0.000000] riscv: base ISA extensions acdfhim
> [ 0.000000] riscv: ELF capabilities acdfim
> [ 0.000000] percpu: Embedded 13 pages/cpu s24352 r8192 d20704 u53248
> [ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 31941
> [ 0.000000] Kernel command line: rootwait root=/dev/vda ro console=ttyS0 earlycon=sbi norandmaps
> [ 0.000000] Dentry cache hash table entries: 16384 (order: 4, 65536 bytes, linear)
> [ 0.000000] Inode-cache hash table entries: 8192 (order: 3, 32768 bytes, linear)
> [ 0.000000] mem auto-init: stack:all(zero), heap alloc:off, heap free:off
> [ 0.000000] Virtual kernel memory layout:
> [ 0.000000] fixmap : 0x9ce00000 - 0x9d000000 (2048 kB)
> [ 0.000000] pci io : 0x9d000000 - 0x9e000000 ( 16 MB)
> [ 0.000000] vmemmap : 0x9e000000 - 0xa0000000 ( 32 MB)
> [ 0.000000] vmalloc : 0xa0000000 - 0xc0000000 ( 512 MB)
> [ 0.000000] lowmem : 0xc0000000 - 0xc7e00000 ( 126 MB)
> [ 0.000000] Memory: 97748K/129024K available (8699K kernel code, 8867K rwdata, 4096K rodata, 4204K init, 361K bss, 31276K reserved, 0K cma-reserved)
> ...
> Starting network: udhcpc: started, v1.36.0
> udhcpc: broadcasting discover
> udhcpc: broadcasting select for 10.0.2.15, server 10.0.2.2
> udhcpc: lease of 10.0.2.15 obtained from 10.0.2.2, lease time 86400
> deleting routers
> adding dns 10.0.2.3
> OK
>
> Welcome to Buildroot
> buildroot login: root
> # cat /proc/cpuinfo
> processor : 0
> hart : 0
> isa : rv64imafdch_zihintpause_zbb_sstc
> mmu : sv39
> mvendorid : 0x0
> marchid : 0x70232
> mimpid : 0x70232
>
> # uname -a
> Linux buildroot 6.3.0-rc1-00086-gc8d2fedb997a #1 SMP Sun May 14 10:46:42 EDT 2023 riscv32 GNU/Linux
> # ls /lib/
> ld-linux-riscv32-ilp32d.so.1 libgcc_s.so.1
> libanl.so.1 libm.so.6
> libatomic.so libnss_dns.so.2
> libatomic.so.1 libnss_files.so.2
> libatomic.so.1.2.0 libpthread.so.0
> libc.so.6 libresolv.so.2
> libcrypt.so.1 librt.so.1
> libdl.so.2 libutil.so.1
> libgcc_s.so modules
>
> # cat /proc/99/maps
> 0000000055554000-0000000055634000 r-xp 00000000 00000000fe:00 17 /bin/busybox
> 0000000055634000-0000000055636000 r--p 00000000df000 00000000fe:00 17 /bin/busybox
> 0000000055636000-0000000055637000 rw-p 00000000e1000 00000000fe:00 17 /bin/busybox
> 0000000055637000-0000000055659000 rw-p 00000000 00:00 0 [heap]
> 0000000077e8d000-0000000077fbe000 r-xp 00000000 00000000fe:00 137 /lib/libc.so.6
> 0000000077fbe000-0000000077fbf000 ---p 00000000131000 00000000fe:00 137 /lib/libc.so.6
> 0000000077fbf000-0000000077fc1000 r--p 00000000131000 00000000fe:00 137 /lib/libc.so.6
> 0000000077fc1000-0000000077fc2000 rw-p 00000000133000 00000000fe:00 137 /lib/libc.so.6
> 0000000077fc2000-0000000077fcc000 rw-p 00000000 00:00 0
> 0000000077fcc000-0000000077fd4000 r-xp 00000000 00000000fe:00 146 /lib/libresolv.so.2
> 0000000077fd4000-0000000077fd5000 ---p 000000008000 00000000fe:00 146 /lib/libresolv.so.2
> 0000000077fd5000-0000000077fd6000 r--p 000000008000 00000000fe:00 146 /lib/libresolv.so.2
> 0000000077fd6000-0000000077fd7000 rw-p 000000009000 00000000fe:00 146 /lib/libresolv.so.2
> 0000000077fd7000-0000000077fd9000 rw-p 00000000 00:00 0
> 0000000077fd9000-0000000077fdb000 r--p 00000000 00:00 0 [vvar]
> 0000000077fdb000-0000000077fdd000 r-xp 00000000 00:00 0 [vdso]
> 0000000077fdd000-0000000077ffc000 r-xp 00000000 00000000fe:00 132 /lib/ld-linux-riscv32-ilp32d.so.1
> 0000000077ffd000-0000000077ffe000 r--p 000000001f000 00000000fe:00 132 /lib/ld-linux-riscv32-ilp32d.so.1
> 0000000077ffe000-0000000077fff000 rw-p 0000000020000 00000000fe:00 132 /lib/ld-linux-riscv32-ilp32d.so.1
> 000000007ffde000-000000007ffff000 rw-p 00000000 00:00 0 [stack]
>
> Other resources
> ===============
>
> OpenEuler riscv32 rootfs
> ------------------------
> The OpenEuler riscv32 rootfs you can download from here:
> https://repo.tarsier-infra.com/openEuler-RISC-V/obs/archive/rv32/openeuler-image-qemu-riscv32-20221111070036.rootfs.ext4
> (Made by Junqiang Wang)
>
> Debian riscv32 rootfs
> ---------------------
> The Debian riscv32 rootfs you can download from here:
> https://github.com/yuzibo/riscv32
> (Made by Bo YU and Han Gao)
>
> Fedora riscv32 rootfs
> ---------------------
> https://fedoraproject.org/wiki/Architectures/RISC-V/RV32
> (Made by Wei Fu)
>
> LLVM 64ilp32
> ------------
> git clone https://github.com/luxufan/llvm-project.git -b rv64-ilp32
> cd llvm-project
> mkdir build && cd build
> cmake ../llvm -G Ninja -DCMAKE_BUILD_TYPE=Release -DLLVM_TARGETS_TO_BUILD="X86;RISCV" -DLLVM_ENABLE_PROJECTS="clang;lld"
> ninja all
>
> (LLVM development status is that CC=clang can compile the kernel with
> LLVM=1 but has not yet booted successfully.)
>
> Patch organization
> ==================
> This series depends on 64ilp32 toolchain patches that are not upstream
> yet.
>
> PATCH [0-1] unify vdso32 & compat_vdso
> PATCH [2] adds time-related vDSO common flow for vdso32
> PATCH [3] adds s64ilp32 support of clocksource driver
> PATCH [5] adds s64ilp32 support of irqchip driver
> PATCH [4,6-12] add basic data types and compiling framework
> PATCH [13] adds MMU_SV39 support
> PATCH [14] adds native atomic64
> PATCH [15] adds TImode
> PATCH [16] adds cmpxchg_double
> PATCH [17-19] cleanup kconfig & add defconfig
> PATCH [20-21] fix temporary compiler problems
>
> Open issues
> ===========
>
> Callee saved the register width
> -------------------------------
> For 64-bit ISA (including 64lp64, 64ilp32), the callee can't determine the
> correct width used in a register, so it saves the maximum width of
> the ISA register, i.e., xlen size. We also found this rule in x86-x32,
> mips-n32, and aarch64ilp32, which comes from 64lp64. See PATCH [20].
>
> Here are two downsides of this:
> - It would cause a difference with 32ilp32's stack frame, and s64ilp32
> reuses the 32ilp32 software stack. Thus, many additional compatibility
> problems would happen during the porting of 64ilp32 software.
> - It also increases stack usage.
> <setup_vm>:
> auipc a3,0xff3fb
> add a3,a3,1234 # c0000000
> li a5,-1
> lui a4,0xc0000
> addw sp,sp,-96
> srl a5,a5,0x20
> subw a4,a4,a3
> auipc a2,0x111a
> add a2,a2,1212 # c1d1f000
> sd s0,80(sp)----+
> sd s1,72(sp) |
> sd s2,64(sp) |
> sd s7,24(sp) |
> sd s8,16(sp) |
> sd s9,8(sp) |-> All <= 32b widths, but occupy 64b
> sd ra,88(sp) | stack space.
> sd s3,56(sp) | Affect memory footprint & cache
> sd s4,48(sp) | performance.
> sd s5,40(sp) |
> sd s6,32(sp) |
> sd s10,0(sp)----+
> sll a1,a4,0x20
> subw a2,a2,a3
> and a4,a4,a5
>
> So here is a proposal for the riscv 64ilp32 ABI:
> - Let the compiler avoid keeping ">32b variables" in callee-saved
> registers. (Q: We need to measure how much this affects 64b
> variables that live across function calls.)
>
> EF_RISCV_X32
> ------------
> We add an e_flag (EF_RISCV_X32) to distinguish the 32-bit ELF, which
> occupies BIT[6] of the e_flags layout.
>
> ELF Header:
> Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
> Class: ELF32
> Data: 2's complement, little endian
> Version: 1 (current)
> OS/ABI: UNIX - System V
> ABI Version: 0
> Type: REL (Relocatable file)
> Machine: RISC-V
> Version: 0x1
> Entry point address: 0x0
> Start of program headers: 0 (bytes into file)
> Start of section headers: 24620 (bytes into file)
> Flags: 0x21, RVC, X32, soft-float ABI
> ^^^
> 64-bit Optimization problem
> ---------------------------
> There is an existing problem in 64ilp32 gcc that combines two pointers
> in one register. Liao is solving that problem. Before he finishes the
> job, we could prevent it with a simple noinline attribute, fortunately.
> struct path {
> struct vfsmount *mnt;
> struct dentry *dentry;
> } __randomize_layout;
>
> struct nameidata {
> struct path path;
> ...
> struct path root;
> ...
> } __randomize_layout;
>
> struct nameidata *nd
> ...
> nd->path = nd->root;
> 6c88 ld a0,24(s1)
> ^^ // a0 contains two pointers
> e088 sd a0,0(s1)
> mntget(path->mnt);
> // Need "lw a0,0(s1)" or "a0 << 32; a0 >> 32"
> 2a6150ef jal c01ce946 <mntget> // bug!
>
> Acknowledgements
> ================
> The s64ilp32 needs the cooperation of many other projects. Thanks to
> everyone involved:
> - GNU: LiaoShihua <[email protected]>,
> Jiawe Chen<[email protected]>
> - Qemu: Weiwei Li <[email protected]>
> - LLVM: luxufan <[email protected]>,
> Chunyu Liao<[email protected]>
> - OpenEuler rv32: Junqiang Wang <[email protected]>
> - Debian rv32: Bo YU <[email protected]>
> Han Gao <[email protected]>
> - Fedora rv32: Wei Fu <[email protected]>
>
> References
> ==========
> [1] https://techpubs.jurassic.nl/manuals/0630/developer/Mpro_n32_ABI/sgi_html/index.html
> [2] https://wiki.debian.org/Arm64ilp32Port
> [3] https://lwn.net/Articles/456731/
> [4] https://github.com/riscv/riscv-profiles/releases
> [5] https://www.cnx-software.com/2021/10/25/allwinner-d1s-f133-risc-v-processor-64mb-ddr2/
> [6] https://milkv.io/duo/
> [7] https://twitter.com/tphuang/status/1631308330256801793
> [8] https://www.cnx-software.com/2022/12/02/pine64-ox64-sbc-bl808-risc-v-multi-protocol-wisoc-64mb-ram/
>
> Guo Ren (22):
> riscv: vdso: Unify vdso32 & compat_vdso into vdso/Makefile
> riscv: vdso: Remove compat_vdso/
> riscv: vdso: Add time-related vDSO common flow for vdso32
> clocksource: riscv: s64ilp32: Use __riscv_xlen instead of CONFIG_32BIT
> riscv: s64ilp32: Introduce xlen_t
> irqchip: riscv: s64ilp32: Use __riscv_xlen instead of CONFIG_32BIT
> riscv: s64ilp32: Add sbi support
> riscv: s64ilp32: Add asid support
> riscv: s64ilp32: Introduce PTR_L and PTR_S
> riscv: s64ilp32: Enable user space runtime environment
> riscv: s64ilp32: Add ebpf jit support
> riscv: s64ilp32: Add ELF32 support
> riscv: s64ilp32: Add ARCH RV64 ILP32 compiling framework
> riscv: s64ilp32: Add MMU_SV39 mode support for 32BIT
> riscv: s64ilp32: Enable native atomic64
> riscv: s64ilp32: Add TImode (128 int) support
> riscv: s64ilp32: Implement cmpxchg_double
> riscv: s64ilp32: Disable KVM
> riscv: Cleanup rv32_defconfig
> riscv: s64ilp32: Add rv64ilp32_defconfig
> riscv: s64ilp32: Correct the rv64ilp32 stackframe layout
> riscv: s64ilp32: Temporary workaround solution to gcc problem
>
> arch/riscv/Kconfig | 36 +++-
> arch/riscv/Makefile | 24 ++-
> arch/riscv/configs/32-bit.config | 2 -
> arch/riscv/configs/64ilp32.config | 2 +
> arch/riscv/include/asm/asm.h | 5 +
> arch/riscv/include/asm/atomic.h | 6 +
> arch/riscv/include/asm/cmpxchg.h | 53 ++++++
> arch/riscv/include/asm/cpu_ops_sbi.h | 4 +-
> arch/riscv/include/asm/csr.h | 58 +++---
> arch/riscv/include/asm/extable.h | 2 +-
> arch/riscv/include/asm/page.h | 24 ++-
> arch/riscv/include/asm/pgtable-64.h | 42 ++---
> arch/riscv/include/asm/pgtable.h | 26 ++-
> arch/riscv/include/asm/processor.h | 8 +-
> arch/riscv/include/asm/ptrace.h | 96 +++++-----
> arch/riscv/include/asm/sbi.h | 24 +--
> arch/riscv/include/asm/stacktrace.h | 6 +
> arch/riscv/include/asm/timex.h | 10 +-
> arch/riscv/include/asm/vdso.h | 34 +++-
> arch/riscv/include/asm/vdso/gettimeofday.h | 84 +++++++++
> arch/riscv/include/uapi/asm/elf.h | 2 +-
> arch/riscv/include/uapi/asm/unistd.h | 1 +
> arch/riscv/kernel/Makefile | 3 +-
> arch/riscv/kernel/compat_signal.c | 2 +-
> arch/riscv/kernel/compat_vdso/.gitignore | 2 -
> arch/riscv/kernel/compat_vdso/compat_vdso.S | 8 -
> .../kernel/compat_vdso/compat_vdso.lds.S | 3 -
> arch/riscv/kernel/compat_vdso/flush_icache.S | 3 -
> arch/riscv/kernel/compat_vdso/getcpu.S | 3 -
> arch/riscv/kernel/compat_vdso/note.S | 3 -
> arch/riscv/kernel/compat_vdso/rt_sigreturn.S | 3 -
> arch/riscv/kernel/cpu.c | 4 +-
> arch/riscv/kernel/cpu_ops_sbi.c | 4 +-
> arch/riscv/kernel/cpufeature.c | 4 +-
> arch/riscv/kernel/entry.S | 24 +--
> arch/riscv/kernel/head.S | 8 +-
> arch/riscv/kernel/process.c | 8 +-
> arch/riscv/kernel/sbi.c | 24 +--
> arch/riscv/kernel/signal.c | 6 +-
> arch/riscv/kernel/traps.c | 4 +-
> arch/riscv/kernel/vdso.c | 4 +-
> arch/riscv/kernel/vdso/Makefile | 176 ++++++++++++------
> ..._vdso_offsets.sh => gen_vdso32_offsets.sh} | 2 +-
> .../gen_vdso64_offsets.sh} | 2 +-
> arch/riscv/kernel/vdso/vgettimeofday.c | 39 +++-
> arch/riscv/kernel/vdso32.S | 8 +
> arch/riscv/kernel/{vdso/vdso.S => vdso64.S} | 8 +-
> arch/riscv/kvm/Kconfig | 1 +
> arch/riscv/lib/Makefile | 1 +
> arch/riscv/lib/memset.S | 4 +-
> arch/riscv/mm/context.c | 16 +-
> arch/riscv/mm/fault.c | 13 +-
> arch/riscv/mm/init.c | 29 ++-
> arch/riscv/net/Makefile | 6 +-
> arch/riscv/net/bpf_jit_comp64.c | 10 +-
> drivers/clocksource/timer-riscv.c | 2 +-
> drivers/irqchip/irq-riscv-intc.c | 4 +-
> fs/namei.c | 2 +-
> 58 files changed, 675 insertions(+), 317 deletions(-)
> create mode 100644 arch/riscv/configs/64ilp32.config
> delete mode 100644 arch/riscv/kernel/compat_vdso/.gitignore
> delete mode 100644 arch/riscv/kernel/compat_vdso/compat_vdso.S
> delete mode 100644 arch/riscv/kernel/compat_vdso/compat_vdso.lds.S
> delete mode 100644 arch/riscv/kernel/compat_vdso/flush_icache.S
> delete mode 100644 arch/riscv/kernel/compat_vdso/getcpu.S
> delete mode 100644 arch/riscv/kernel/compat_vdso/note.S
> delete mode 100644 arch/riscv/kernel/compat_vdso/rt_sigreturn.S
> rename arch/riscv/kernel/vdso/{gen_vdso_offsets.sh => gen_vdso32_offsets.sh} (78%)
> rename arch/riscv/kernel/{compat_vdso/gen_compat_vdso_offsets.sh => vdso/gen_vdso64_offsets.sh} (77%)
> create mode 100644 arch/riscv/kernel/vdso32.S
> rename arch/riscv/kernel/{vdso/vdso.S => vdso64.S} (73%)

2023-05-18 18:45:25

by Arnd Bergmann

Subject: Re: [RFC PATCH 00/22] riscv: s64ilp32: Running 32-bit Linux kernel on 64-bit supervisor mode

On Thu, May 18, 2023, at 17:38, Palmer Dabbelt wrote:
> On Thu, 18 May 2023 06:09:51 PDT (-0700), [email protected] wrote:
>> From: Guo Ren <[email protected]>
>>
>> This patch series adds s64ilp32 support to riscv. The term s64ilp32
>> means smode-xlen=64 and -mabi=ilp32 (ints, longs, and pointers are all
>> 32-bit), i.e., running 32-bit Linux kernel on pure 64-bit supervisor
>> mode. There have been many 64ilp32 abis existing, such as mips-n32 [1],
>> arm-aarch64ilp32 [2], and x86-x32 [3], but they are all about userspace.
>> Thus, this should be the first time running a 32-bit Linux kernel with
>> the 64ilp32 ABI at supervisor mode (If not, correct me).
>
> Does anyone actually want this? At a bare minimum we'd need to add it
> to the psABI, which would presumably also be required on the compiler
> side of things.
>
> It's not even clear anyone wants rv64/ilp32 in userspace, the kernel
> seems like it'd be even less widely used.

We have had long discussions about supporting ilp32 userspace on
arm64, and I think almost everyone is glad we never merged it into
the mainline kernel, so we don't have to worry about supporting it
in the future. The cost of supporting an extra user space ABI
is huge, and I'm sure you don't want to go there. The other two
cited examples (mips-n32 and x86-x32) are pretty much unused now
as well, but still have a maintenance burden until they can finally
get removed.

If for some crazy reason you'd still want the 64ilp32 ABI in user
space, running the kernel this way is probably still a bad idea,
but that one is less clear. There is clearly a small memory
penalty of running a 64-bit kernel for larger data structures
(page, inode, task_struct, ...) and vmlinux, and there is no
huge additional maintenance cost on top of the ABI itself
that you'd need either way, but using a 64-bit address space
in the kernel has some important advantages even when running
32-bit userland: processes can use the entire 4GB virtual
space, while the kernel can address more than 768MB of lowmem,
and KASLR has more bits to work with for randomization. On
RISCV, some additional features (VMAP_STACK, KASAN, KFENCE,
...) depend on 64-bit kernels even though they don't
strictly need that.

Arnd

2023-05-19 00:18:44

by Paul Walmsley

Subject: Re: [RFC PATCH 00/22] riscv: s64ilp32: Running 32-bit Linux kernel on 64-bit supervisor mode

On Thu, 18 May 2023, Palmer Dabbelt wrote:

> On Thu, 18 May 2023 06:09:51 PDT (-0700), [email protected] wrote:
>
> > This patch series adds s64ilp32 support to riscv. The term s64ilp32
> > means smode-xlen=64 and -mabi=ilp32 (ints, longs, and pointers are all
> > 32-bit), i.e., running 32-bit Linux kernel on pure 64-bit supervisor
> > mode. There have been many 64ilp32 abis existing, such as mips-n32 [1],
> > arm-aarch64ilp32 [2], and x86-x32 [3], but they are all about userspace.
> > Thus, this should be the first time running a 32-bit Linux kernel with
> > the 64ilp32 ABI at supervisor mode (If not, correct me).
>
> Does anyone actually want this? At a bare minimum we'd need to add it to the
> psABI, which would presumably also be required on the compiler side of things.
>
> It's not even clear anyone wants rv64/ilp32 in userspace, the kernel seems
> like it'd be even less widely used.

We've certainly talked to folks who are interested in RV64 ILP32 userspace
with an LP64 kernel. The motivation is the usual one: to reduce data size
and therefore (ideally) BOM cost. I think this work, if it goes forward,
would need to go hand in hand with the RVIA psABI group.

The RV64 ILP32 kernel and ILP32 userspace approach implemented by this
patch is intriguing, but I guess for me, the question is whether it's
worth the extra hassle vs. a pure RV32 kernel & userspace.

- Paul

2023-05-19 00:41:10

by Paul Walmsley

Subject: Re: [RFC PATCH 00/22] riscv: s64ilp32: Running 32-bit Linux kernel on 64-bit supervisor mode

On Thu, 18 May 2023, Arnd Bergmann wrote:

> We have had long discussions about supporting ilp32 userspace on
> arm64, and I think almost everyone is glad we never merged it into
> the mainline kernel, so we don't have to worry about supporting it
> in the future. The cost of supporting an extra user space ABI
> is huge, and I'm sure you don't want to go there. The other two
> cited examples (mips-n32 and x86-x32) are pretty much unused now
> as well, but still have a maintenance burden until they can finally
> get removed.

There probably hasn't been much pressure to support Aarch64 ILP32 since
ARM still has hardware support for Aarch32. Will be interesting to see if
that's still the case after ARM drops Aarch32 support for future designs.

- Paul

2023-05-19 09:28:37

by Arnd Bergmann

Subject: Re: [RFC PATCH 00/22] riscv: s64ilp32: Running 32-bit Linux kernel on 64-bit supervisor mode

On Fri, May 19, 2023, at 02:38, Paul Walmsley wrote:
> On Thu, 18 May 2023, Arnd Bergmann wrote:
>
>> We have had long discussions about supporting ilp32 userspace on
>> arm64, and I think almost everyone is glad we never merged it into
>> the mainline kernel, so we don't have to worry about supporting it
>> in the future. The cost of supporting an extra user space ABI
>> is huge, and I'm sure you don't want to go there. The other two
>> cited examples (mips-n32 and x86-x32) are pretty much unused now
>> as well, but still have a maintenance burden until they can finally
>> get removed.
>
> There probably hasn't been much pressure to support Aarch64 ILP32 since
> ARM still has hardware support for Aarch32. Will be interesting to see if
> that's still the case after ARM drops Aarch32 support for future designs.

I think there was some pressure for 64ilp32 from Arm when aarch64 support
was originally added, as they always planned to drop aarch32 support
eventually, but I don't see that coming back now. I think the situation
is quite different as well:

On aarch64, there is a significant cost in supporting aarch32 userspace
because of the complexity of that particular instruction set, but at
the same time there is also a huge amount of software that is compiled
for or written to support aarch32 software, and nobody wants to
replace that. There are also a lot of existing arm32 chips with
guaranteed availability well into the 2030s, new 32-bit-only chips
based on Cortex-A7 (originally released in 2011) coming out constantly,
and even the latest low-end core (Cortex-A510 r1) still supports aarch32. It's probably
going to be several years before that core even shows up in low-memory
systems, and then decades before this stops being available in SoCs,
even in the unlikely case that no future low-end cores support
aarch32-el0 mode (it's already been announced that there are no
plans for future high-end cores with aarch32 mode, but those won't
be used in low-memory configurations anyway).

For RISC-V, I have not seen much interest in Linux userspace for
the existing rv32 mode, so you could argue that there is not much
to lose in abandoning it. On the other hand, the cost of adding
rv32 support to an rv64 core should be very small as all the
instructions are already present in some other encoding, and
developers have already spent a significant amount of work on
bringing up rv32 userspace that would all have to be done again
for a new ABI, and you'd end up splitting the already tiny
developer base for 32-bit riscv in two for the existing rv32 side
and a new rv64ilp32 side.

I suppose the answer in both cases is the same though: if a
SoC maker wants to sell a product to users with low memory,
they should pick a CPU core that implements standard 32-bit
user space support rather than making a mess of it and
expecting software to work around it.

Arnd

2023-05-19 15:59:26

Subject: Re: [RFC PATCH 00/22] riscv: s64ilp32: Running 32-bit Linux kernel on 64-bit supervisor mode

On Fri, May 19, 2023 at 2:29 AM Arnd Bergmann <[email protected]> wrote:
>
> On Thu, May 18, 2023, at 17:38, Palmer Dabbelt wrote:
> > On Thu, 18 May 2023 06:09:51 PDT (-0700), [email protected] wrote:
> >> From: Guo Ren <[email protected]>
> >>
> >> This patch series adds s64ilp32 support to riscv. The term s64ilp32
> >> means smode-xlen=64 and -mabi=ilp32 (ints, longs, and pointers are all
> >> 32-bit), i.e., running 32-bit Linux kernel on pure 64-bit supervisor
> >> mode. There have been many 64ilp32 abis existing, such as mips-n32 [1],
> >> arm-aarch64ilp32 [2], and x86-x32 [3], but they are all about userspace.
> >> Thus, this should be the first time running a 32-bit Linux kernel with
> >> the 64ilp32 ABI at supervisor mode (If not, correct me).
> >
> > Does anyone actually want this? At a bare minimum we'd need to add it
> > to the psABI, which would presumably also be required on the compiler
> > side of things.
> >
> > It's not even clear anyone wants rv64/ilp32 in userspace, the kernel
> > seems like it'd be even less widely used.
>
> We have had long discussions about supporting ilp32 userspace on
> arm64, and I think almost everyone is glad we never merged it into
> the mainline kernel, so we don't have to worry about supporting it
> in the future. The cost of supporting an extra user space ABI
> is huge, and I'm sure you don't want to go there. The other two
> cited examples (mips-n32 and x86-x32) are pretty much unused now
> as well, but still have a maintenance burden until they can finally
> get removed.
>
> If for some crazy reason you'd still want the 64ilp32 ABI in user
> space, running the kernel this way is probably still a bad idea,
> but that one is less clear. There is clearly a small memory
> penalty of running a 64-bit kernel for larger data structures
> (page, inode, task_struct, ...) and vmlinux, and there is no
I don't think it's a small memory penalty; our measurement is about
16% with defconfig (see the "Why 32-bit Linux?" section).
This patch series doesn't add a 64ilp32 userspace ABI, but it seems you
also don't like running a 32-bit Linux kernel on 64-bit hardware, right?

The motivation of s64ilp32 (running 32-bit Linux kernel on 64-bit s-mode):
- The target hardware (Canaan Kendryte k230) only supports MXL=64,
SXL=64, UXL=64/32.
- The 64-bit Linux + compat 32-bit app can't satisfy the 64/128MB scenarios.

> huge additional maintenance cost on top of the ABI itself
> that you'd need either way, but using a 64-bit address space
> in the kernel has some important advantages even when running
> 32-bit userland: processes can use the entire 4GB virtual
> space, while the kernel can address more than 768MB of lowmem,
> and KASLR has more bits to work with for randomization. On
> RISCV, some additional features (VMAP_STACK, KASAN, KFENCE,
> ...) depend on 64-bit kernels even though they don't
> strictly need that.

I agree that the 64-bit Linux kernel has more functionality, but:
- What do you think about Linux on a 64/128MB SoC? Could it afford
VMAP_STACK, KASAN, and KFENCE?
- I think 32-bit Linux & RTOS have monopolized this market (64/128MB
scenarios), right?

>
> Arnd

--
Best Regards
Guo Ren

2023-05-19 17:05:28

by Arnd Bergmann

Subject: Re: [RFC PATCH 00/22] riscv: s64ilp32: Running 32-bit Linux kernel on 64-bit supervisor mode

On Fri, May 19, 2023, at 17:31, Guo Ren wrote:
> On Fri, May 19, 2023 at 2:29 AM Arnd Bergmann <[email protected]> wrote:
>> On Thu, May 18, 2023, at 17:38, Palmer Dabbelt wrote:
>> > On Thu, 18 May 2023 06:09:51 PDT (-0700), [email protected] wrote:
>>
>> If for some crazy reason you'd still want the 64ilp32 ABI in user
>> space, running the kernel this way is probably still a bad idea,
>> but that one is less clear. There is clearly a small memory
>> penalty of running a 64-bit kernel for larger data structures
>> (page, inode, task_struct, ...) and vmlinux, and there is no
> I don't think it's a small memory penalty, our measurement is about
> 16% with defconfig, see "Why 32-bit Linux?" section.
>
> This patch series doesn't add 64ilp32 userspace abi, but it seems you
> also don't like to run 32-bit Linux kernel on 64-bit hardware, right?

Ok, I'm sorry for missing the important bit here. So if this can
still use the normal 32-bit user space, the cost of this patch set
is not huge, and it's something that can be beneficial in a few
cases, though I suspect most users are still better off running
64-bit kernels.

> The motivation of s64ilp32 (running 32-bit Linux kernel on 64-bit s-mode):
> - The target hardware (Canaan Kendryte k230) only supports MXL=64,
> SXL=64, UXL=64/32.
> - The 64-bit Linux + compat 32-bit app can't satisfy the 64/128MB scenarios.
>
>> huge additional maintenance cost on top of the ABI itself
>> that you'd need either way, but using a 64-bit address space
>> in the kernel has some important advantages even when running
>> 32-bit userland: processes can use the entire 4GB virtual
>> space, while the kernel can address more than 768MB of lowmem,
>> and KASLR has more bits to work with for randomization. On
>> RISCV, some additional features (VMAP_STACK, KASAN, KFENCE,
>> ...) depend on 64-bit kernels even though they don't
>> strictly need that.
>
> I agree that the 64-bit linux kernel has more functionalities, but:
> - What do you think about linux on a 64/128MB SoC? Could it be
> affordable to VMAP_STACK, KASAN, KFENCE?

I would definitely recommend VMAP_STACK, but that can be implemented
and is used on other 32-bit architectures (ppc32, arm32) without a
huge cost. The larger virtual user address space can help even on
machines with 128MB, though most applications probably don't care at
that point.

> - I think 32-bit Linux & RTOS have monopolized this market (64/128MB
> scenarios), right?

The minimum amount of RAM that makes a system usable for Linux is
constantly going up, so I think with 64MB, most new projects are
already better off running some RTOS kernel instead of Linux.
The ones that are still usable today probably won't last a lot
of distro upgrades before the bloat catches up with them, but I
can see how your patch set can give them a few extra years of
updates.

For the 256MB+ systems, I would expect the sensitive kernel
allocations to be small enough that the series makes little
difference. The 128MB systems are the most interesting ones
here, and I'm curious to see where you spot most of the
memory usage differences, I'll also reply to your initial
mail for that.

Arnd

2023-05-19 17:41:20

by Palmer Dabbelt

Subject: Re: [RFC PATCH 00/22] riscv: s64ilp32: Running 32-bit Linux kernel on 64-bit supervisor mode

On Fri, 19 May 2023 09:53:35 PDT (-0700), Arnd Bergmann wrote:
> On Fri, May 19, 2023, at 17:31, Guo Ren wrote:
>> On Fri, May 19, 2023 at 2:29 AM Arnd Bergmann <[email protected]> wrote:
>>> On Thu, May 18, 2023, at 17:38, Palmer Dabbelt wrote:
>>> > On Thu, 18 May 2023 06:09:51 PDT (-0700), [email protected] wrote:
>>>
>>> If for some crazy reason you'd still want the 64ilp32 ABI in user
>>> space, running the kernel this way is probably still a bad idea,
>>> but that one is less clear. There is clearly a small memory
>>> penalty of running a 64-bit kernel for larger data structures
>>> (page, inode, task_struct, ...) and vmlinux, and there is no
>> I don't think it's a small memory penalty, our measurement is about
>> 16% with defconfig, see "Why 32-bit Linux?" section.
>>
>> This patch series doesn't add 64ilp32 userspace abi, but it seems you
>> also don't like to run 32-bit Linux kernel on 64-bit hardware, right?
>
> Ok, I'm sorry for missing the important bit here. So if this can
> still use the normal 32-bit user space, the cost of this patch set
> is not huge, and it's something that can be beneficial in a few
> cases, though I suspect most users are still better off running
> 64-bit kernels.

Running a normal 32-bit userspace would require HW support for the
32-bit mode switch for userspace, though (rv32 isn't a subset of rv64,
so there's nothing we can do to make those binaries function correctly
with uABI). The userspace-only mode switch is a bit simpler than the
user+supervisor switch, but it seems like vendors who really want the
memory savings would just implement both mode switches.

>> The motivation of s64ilp32 (running 32-bit Linux kernel on 64-bit s-mode):
>> - The target hardware (Canaan Kendryte k230) only supports MXL=64,
>> SXL=64, UXL=64/32.
>> - The 64-bit Linux + compat 32-bit app can't satisfy the 64/128MB scenarios.
>>
>>> huge additional maintenance cost on top of the ABI itself
>>> that you'd need either way, but using a 64-bit address space
>>> in the kernel has some important advantages even when running
>>> 32-bit userland: processes can use the entire 4GB virtual
>>> space, while the kernel can address more than 768MB of lowmem,
>>> and KASLR has more bits to work with for randomization. On
>>> RISCV, some additional features (VMAP_STACK, KASAN, KFENCE,
>>> ...) depend on 64-bit kernels even though they don't
>>> strictly need that.
>>
>> I agree that the 64-bit linux kernel has more functionalities, but:
>> - What do you think about linux on a 64/128MB SoC? Could it be
>> affordable to VMAP_STACK, KASAN, KFENCE?
>
> I would definitely recommend VMAP_STACK, but that can be implemented
> and is used on other 32-bit architectures (ppc32, arm32) without a
> huge cost. The larger virtual user address space can help even on
> machines with 128MB, though most applications probably don't care at
> that point.

At least having them as an option seems reasonable. Historically we
haven't gated new base systems on having every feature the others do,
though (!MMU, rv32, etc).

>> - I think 32-bit Linux & RTOS have monopolized this market (64/128MB
>> scenarios), right?
>
> The minimum amount of RAM that makes a system usable for Linux is
> constantly going up, so I think with 64MB, most new projects are
> already better off running some RTOS kernel instead of Linux.
> The ones that are still usable today probably won't last a lot
> of distro upgrades before the bloat catches up with them, but I
> can see how your patch set can give them a few extra years of
> updates.

We also have 32-bit kernel support. Systems that have tens of MB of RAM
tend to end up with some memory technology that doesn't scale to
gigabytes these days, and since that's fixed when the chip is built it
seems like those folks would be better off just having HW support for
32-bit kernels (and maybe not even bothering with HW support for 64-bit
kernels).

> For the 256MB+ systems, I would expect the sensitive kernel
> allocations to be small enough that the series makes little
> difference. The 128MB systems are the most interesting ones
> here, and I'm curious to see where you spot most of the
> memory usage differences, I'll also reply to your initial
> mail for that.

Thanks. I agree we need to see some real systems that benefit from
this, as it's a pretty big support cost. Defconfig sizes alone don't
mean a whole lot, as users on these very constrained systems aren't
likely to run defconfig anyway.

If someone's going to use it then I'm fine taking the code, it just
seems like a very thin set of possible use cases. We've already got
almost no users in RISC-V land, I've got a feeling this is esoteric
enough to actually have zero.

>
> Arnd

2023-05-19 20:55:28

by Arnd Bergmann

Subject: Re: [RFC PATCH 00/22] riscv: s64ilp32: Running 32-bit Linux kernel on 64-bit supervisor mode

On Thu, May 18, 2023, at 15:09, [email protected] wrote:
> From: Guo Ren <[email protected]>
> Why 32-bit Linux?
> =================
> The motivation for using a 32-bit Linux kernel is to reduce memory
> footprint and meet the small capacity of DDR & cache requirement
> (e.g., 64/128MB SIP SoC).
>
> Here are the 32-bit v.s. 64-bit Linux kernel data type comparison
> summary:
> 32-bit 64-bit
> sizeof(page): 32bytes 64bytes
> sizeof(list_head): 8bytes 16bytes
> sizeof(hlist_head): 8bytes 16bytes
> sizeof(vm_area): 68bytes 136bytes
> ...

> Mem-usage:
> (s32ilp32) # free
> total used free shared buff/cache available
> Mem: 100040 8380 88244 44 3416 88080
>
> (s64lp64) # free
> total used free shared buff/cache available
> Mem: 91568 11848 75796 44 3924 75952
>
> (s64ilp32) # free
> total used free shared buff/cache available
> Mem: 101952 8528 90004 44 3420 89816
> ^^^^^
>
> It's a rough measurement based on the current default config without any
> modification, and 32-bit (s32ilp32, s64ilp32) saved more than 16% memory
> to 64-bit (s64lp64). But s32ilp32 & s64ilp32 have a similar memory
> footprint (about 0.33% difference), meaning s64ilp32 has a big chance to
> replace s32ilp32 on the 64-bit machine.

I've tried to run the same numbers for the debate about running
32-bit vs 64-bit arm kernels in the past, but focused mostly on
slightly larger systems: I looked mainly at the 512MB case,
as that is the most cost-efficient DDR3 memory configuration
and fairly common.

What I'd like to understand better in your example is where
the 14MB of memory went. I assume this is for 128MB of total
RAM, so we know that 1MB went into additional 'struct page'
objects (32 bytes * 32768 pages). It would be good to know
where the dynamic allocations went and if they are reclaimable
(e.g. inodes) or non-reclaimable (e.g. kmalloc-128).
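
As a rough check of the arithmetic above, the sketch below recomputes both
figures. It assumes 4 KiB pages and 128 MiB of total RAM (as guessed above),
takes the sizeof(page) values from the quoted table (32 vs. 64 bytes), and
takes the "free" column from the quoted s64ilp32 and s64lp64 `free` output.
The variable names are made up for illustration.

```python
# Rough check of the numbers discussed above; all names are illustrative.
PAGE_SIZE = 4 * 1024              # assumption: 4 KiB pages
RAM_BYTES = 128 * 1024 * 1024     # assumption: 128 MiB of total RAM

# sizeof(page) from the quoted table: 32 bytes (32-bit) vs. 64 bytes (64-bit)
extra_per_page = 64 - 32

num_pages = RAM_BYTES // PAGE_SIZE
extra_struct_page_bytes = extra_per_page * num_pages

# "free" column (KiB) from the quoted `free` output
free_s64ilp32_kib = 90004
free_s64lp64_kib = 75796
free_delta_kib = free_s64ilp32_kib - free_s64lp64_kib

print(num_pages)                         # 32768
print(extra_struct_page_bytes // 1024)   # 1024 (KiB), i.e. 1 MiB
print(free_delta_kib)                    # 14208 (KiB), roughly 14 MB
```

Under these assumptions, struct page explains about 1 MiB of the roughly
14 MB gap, which leaves about 13 MB to be accounted for elsewhere.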

For the vmlinux size, is this already a minimal config
that one would run on a board with 128MB of RAM, or a
defconfig that includes a lot of stuff that is only relevant
for other platforms but also grows on 64-bit?

What do you see in /proc/slabinfo, /proc/meminfo, and
'size vmlinux' for the s64ilp32 and s64lp64 kernels here?
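
One possible way to compare two such /proc/meminfo captures field by field
is sketched below. The helper names (parse_meminfo, meminfo_delta) are
hypothetical, and the two sample captures are placeholders rather than
measurements from this thread.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields are reported in kB; HugePages_* fields are plain counts.
    Lines without a "name: number" shape (e.g. shell prompts) are skipped.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def meminfo_delta(a, b):
    """Return {field: b - a} for every field present in both captures."""
    return {k: b[k] - a[k] for k in a if k in b}


# Placeholder captures, NOT numbers from this thread.
kernel_a = parse_meminfo("MemTotal: 1000 kB\nSlab: 100 kB\nHugePages_Total: 0\n")
kernel_b = parse_meminfo("# cat /proc/meminfo\nMemTotal: 900 kB\nSlab: 150 kB\nHugePages_Total: 0\n")
delta = meminfo_delta(kernel_a, kernel_b)
print(delta)
```

Sorting the resulting dictionary by absolute value would show which
fields (Slab, SUnreclaim, KernelStack, ...) differ most between kernels.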

Arnd

2023-05-20 01:45:30

Subject: Re: [RFC PATCH 00/22] riscv: s64ilp32: Running 32-bit Linux kernel on 64-bit supervisor mode

On Sat, May 20, 2023 at 12:54 AM Arnd Bergmann <[email protected]> wrote:
>
> On Fri, May 19, 2023, at 17:31, Guo Ren wrote:
> > On Fri, May 19, 2023 at 2:29 AM Arnd Bergmann <[email protected]> wrote:
> >> On Thu, May 18, 2023, at 17:38, Palmer Dabbelt wrote:
> >> > On Thu, 18 May 2023 06:09:51 PDT (-0700), [email protected] wrote:
> >>
> >> If for some crazy reason you'd still want the 64ilp32 ABI in user
> >> space, running the kernel this way is probably still a bad idea,
> >> but that one is less clear. There is clearly a small memory
> >> penalty of running a 64-bit kernel for larger data structures
> >> (page, inode, task_struct, ...) and vmlinux, and there is no
> > I don't think it's a small memory penalty, our measurement is about
> > 16% with defconfig, see "Why 32-bit Linux?" section.
> >
> > This patch series doesn't add 64ilp32 userspace abi, but it seems you
> > also don't like to run 32-bit Linux kernel on 64-bit hardware, right?
>
> Ok, I'm sorry for missing the important bit here. So if this can
> still use the normal 32-bit user space, the cost of this patch set
> is not huge, and it's something that can be beneficial in a few
> cases, though I suspect most users are still better off running
> 64-bit kernels.
>
> > The motivation of s64ilp32 (running 32-bit Linux kernel on 64-bit s-mode):
> > - The target hardware (Canaan Kendryte k230) only supports MXL=64,
> > SXL=64, UXL=64/32.
> > - The 64-bit Linux + compat 32-bit app can't satisfy the 64/128MB scenarios.
> >
> >> huge additional maintenance cost on top of the ABI itself
> >> that you'd need either way, but using a 64-bit address space
> >> in the kernel has some important advantages even when running
> >> 32-bit userland: processes can use the entire 4GB virtual
> >> space, while the kernel can address more than 768MB of lowmem,
> >> and KASLR has more bits to work with for randomization. On
> >> RISCV, some additional features (VMAP_STACK, KASAN, KFENCE,
> >> ...) depend on 64-bit kernels even though they don't
> >> strictly need that.
> >
> > I agree that the 64-bit linux kernel has more functionalities, but:
> > - What do you think about linux on a 64/128MB SoC? Could it be
> > affordable to VMAP_STACK, KASAN, KFENCE?
>
> I would definitely recommend VMAP_STACK, but that can be implemented
> and is used on other 32-bit architectures (ppc32, arm32) without a
> huge cost. The larger virtual user address space can help even on
> machines with 128MB, though most applications probably don't care at
> that point.
Good point, I would support VMAP_STACK in ARCH_RV64ILP32.

>
> > - I think 32-bit Linux & RTOS have monopolized this market (64/128MB
> > scenarios), right?
>
> The minimum amount of RAM that makes a system usable for Linux is
> constantly going up, so I think with 64MB, most new projects are
> already better off running some RTOS kernel instead of Linux.
> The ones that are still usable today probably won't last a lot
> of distro upgrades before the bloat catches up with them, but I
> can see how your patch set can give them a few extra years of
> updates.
Linux development is much cheaper than RTOS development, so vendors
develop a Linux version first. If it succeeds in the market, the
vendors create a cost-down solution. So their first choice is to
cut down the memory footprint of that first Linux version instead of
moving to an RTOS.

With the prices of 128MB DDR3 and 64MB DDR2 becoming more and more similar,
32-bit Linux has more opportunities to replace RTOS.

>
> For the 256MB+ systems, I would expect the sensitive kernel
> allocations to be small enough that the series makes little
> difference. The 128MB systems are the most interesting ones
> here, and I'm curious to see where you spot most of the
> memory usage differences, I'll also reply to your initial
> mail for that.
Thx, I also recommend you read the "Why s64ilp32 has better
performance?" section :)
What do you think about running arm32 Linux on Cortex-A35/A53/A55?

>
> Arnd

--
Best Regards
Guo Ren

2023-05-20 02:56:54

Subject: Re: [RFC PATCH 00/22] riscv: s64ilp32: Running 32-bit Linux kernel on 64-bit supervisor mode

On Sat, May 20, 2023 at 4:20 AM Arnd Bergmann <[email protected]> wrote:
>
> On Thu, May 18, 2023, at 15:09, [email protected] wrote:
> > From: Guo Ren <[email protected]>
> > Why 32-bit Linux?
> > =================
> > The motivation for using a 32-bit Linux kernel is to reduce memory
> > footprint and meet the small capacity of DDR & cache requirement
> > (e.g., 64/128MB SIP SoC).
> >
> > Here are the 32-bit v.s. 64-bit Linux kernel data type comparison
> > summary:
> > 32-bit 64-bit
> > sizeof(page): 32bytes 64bytes
> > sizeof(list_head): 8bytes 16bytes
> > sizeof(hlist_head): 8bytes 16bytes
> > sizeof(vm_area): 68bytes 136bytes
> > ...
>
> > Mem-usage:
> > (s32ilp32) # free
> > total used free shared buff/cache available
> > Mem: 100040 8380 88244 44 3416 88080
> >
> > (s64lp64) # free
> > total used free shared buff/cache available
> > Mem: 91568 11848 75796 44 3924 75952
> >
> > (s64ilp32) # free
> > total used free shared buff/cache available
> > Mem: 101952 8528 90004 44 3420 89816
> > ^^^^^
> >
> > It's a rough measurement based on the current default config without any
> > modification, and 32-bit (s32ilp32, s64ilp32) saved more than 16% memory
> > to 64-bit (s64lp64). But s32ilp32 & s64ilp32 have a similar memory
> > footprint (about 0.33% difference), meaning s64ilp32 has a big chance to
> > replace s32ilp32 on the 64-bit machine.
>
> I've tried to run the same numbers for the debate about running
> 32-bit vs 64-bit arm kernels in the past, but focused mostly on
> slightly larger systems, but I looked mainly at the 512MB case,
> as that is the most cost-efficient DDR3 memory configuration
> and fairly common.
512MB is extravagant, in my opinion. In the IPC market, 32/64MB is for
480P/720P/1080p, 128/256MB is for 1080p/2K, and 512/1024MB is for 4K.
Chips with more than 512MB are less than 5% of the total (I guess). Even
in 512MB chips, the additional memory is for the frame buffer, not the
Linux system.
I agree that the > 512MB scenarios are less sensitive to the choice
between a 32-bit and a 64-bit Linux kernel.

>
> What I'd like to understand better in your example is where
> the 14MB of memory went. I assume this is for 128MB of total
> RAM, so we know that 1MB went into additional 'struct page'
> objects (32 bytes * 32768 pages). It would be good to know
> where the dynamic allocations went and if they are reclaimable
> (e.g. inodes) or non-reclaimable (e.g. kmalloc-128).
>
> For the vmlinux size, is this already a minimal config
> that one would run on a board with 128MB of RAM, or a
> defconfig that includes a lot of stuff that is only relevant
> for other platforms but also grows on 64-bit?
It's not a minimal config, it's defconfig. That's why I called it a rough
measurement :)

I admit I wanted to exaggerate it a little bit, but that's the
starting point for cutting down memory usage for most people, right?
During the past year, we have been trying to convince our customers to use
s64lp64 + u32ilp32, but they can't tolerate even a 1% additional memory
cost in 64MB/128MB scenarios, and so they chose Cortex-A7/A35, which can
run 32-bit Linux. I think it's too early to talk about throwing 32-bit
Linux into the garbage, not only because of the memory footprint
but also because of people's ingrained opinions. Changing their minds
takes a long time.

>
> What do you see in /proc/slabinfo, /proc/meminfo/, and
> 'size vmlinux' for the s64ilp32 and s64lp64 kernels here?
Both s64ilp32 & s64lp64 use the same u32ilp32_rootfs.ext2 binary and
the same opensbi binary.
Both use an opensbi (2MB) + Linux (126MB) memory layout.

Here is the result:

s64ilp32:
[ 0.000000] Virtual kernel memory layout:
[ 0.000000] fixmap : 0x9ce00000 - 0x9d000000 (2048 kB)
[ 0.000000] pci io : 0x9d000000 - 0x9e000000 ( 16 MB)
[ 0.000000] vmemmap : 0x9e000000 - 0xa0000000 ( 32 MB)
[ 0.000000] vmalloc : 0xa0000000 - 0xc0000000 ( 512 MB)
[ 0.000000] lowmem : 0xc0000000 - 0xc7e00000 ( 126 MB)
[ 0.000000] Memory: 97748K/129024K available (8699K kernel code, 8867K rwdata, 4096K rodata, 4204K init, 361K bss, 31276K reserved, 0K cma-reserved)
...
# free
total used free shared buff/cache available
Mem: 101952 8516 90016 44 3420 89828
Swap: 0 0 0
# cat /proc/meminfo
MemTotal: 101952 kB
MemFree: 90016 kB
MemAvailable: 89836 kB
Buffers: 292 kB
Cached: 2484 kB
SwapCached: 0 kB
Active: 2556 kB
Inactive: 656 kB
Active(anon): 40 kB
Inactive(anon): 440 kB
Active(file): 2516 kB
Inactive(file): 216 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 32 kB
Writeback: 0 kB
AnonPages: 480 kB
Mapped: 1804 kB
Shmem: 44 kB
KReclaimable: 644 kB
Slab: 4536 kB
SReclaimable: 644 kB
SUnreclaim: 3892 kB
KernelStack: 344 kB
PageTables: 112 kB
SecPageTables: 0 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 50976 kB
Committed_AS: 2040 kB
VmallocTotal: 524288 kB
VmallocUsed: 112 kB
VmallocChunk: 0 kB
Percpu: 64 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 0 kB

# cat /proc/slabinfo

slabinfo - version: 2.1
# name <active_objs> <num_objs> <objsize> <objperslab>
<pagesperslab> : tunables <limit> <batchcount> <sharedfactor> :
slabdata <active_slabs> <num_slabs> <sharedavail>
ext4_groupinfo_1k 28 28 144 28 1 : tunables 0 0
0 : slabdata 1 1 0
p9_req_t 0 0 104 39 1 : tunables 0 0
0 : slabdata 0 0 0
UDPv6 0 0 1088 15 4 : tunables 0 0
0 : slabdata 0 0 0
tw_sock_TCPv6 0 0 200 20 1 : tunables 0 0
0 : slabdata 0 0 0
request_sock_TCPv6 0 0 240 17 1 : tunables 0 0
0 : slabdata 0 0 0
TCPv6 0 0 2048 8 4 : tunables 0 0
0 : slabdata 0 0 0
bio-72 32 32 128 32 1 : tunables 0 0
0 : slabdata 1 1 0
bfq_io_cq 0 0 1000 8 2 : tunables 0 0
0 : slabdata 0 0 0
bio-184 21 21 192 21 1 : tunables 0 0
0 : slabdata 1 1 0
mqueue_inode_cache 10 10 768 10 2 : tunables 0 0
0 : slabdata 1 1 0
v9fs_inode_cache 0 0 576 14 2 : tunables 0 0
0 : slabdata 0 0 0
nfs4_xattr_cache_cache 0 0 1848 17 8 : tunables 0
0 0 : slabdata 0 0 0
nfs_direct_cache 0 0 152 26 1 : tunables 0 0
0 : slabdata 0 0 0
nfs_read_data 36 36 640 12 2 : tunables 0 0
0 : slabdata 3 3 0
nfs_inode_cache 0 0 832 19 4 : tunables 0 0
0 : slabdata 0 0 0
isofs_inode_cache 0 0 528 15 2 : tunables 0 0
0 : slabdata 0 0 0
fat_inode_cache 0 0 632 25 4 : tunables 0 0
0 : slabdata 0 0 0
fat_cache 0 0 24 170 1 : tunables 0 0
0 : slabdata 0 0 0
jbd2_journal_handle 0 0 48 85 1 : tunables 0
0 0 : slabdata 0 0 0
jbd2_journal_head 0 0 80 51 1 : tunables 0 0
0 : slabdata 0 0 0
ext4_fc_dentry_update 0 0 88 46 1 : tunables 0
0 0 : slabdata 0 0 0
ext4_inode_cache 88 88 984 8 2 : tunables 0 0
0 : slabdata 11 11 0
ext4_allocation_context 36 36 112 36 1 : tunables 0
0 0 : slabdata 1 1 0
ext4_io_end_vec 0 0 24 170 1 : tunables 0 0
0 : slabdata 0 0 0
pending_reservation 0 0 16 256 1 : tunables 0
0 0 : slabdata 0 0 0
extent_status 256 256 32 128 1 : tunables 0 0
0 : slabdata 2 2 0
mbcache 102 102 40 102 1 : tunables 0 0
0 : slabdata 1 1 0
dio 0 0 384 10 1 : tunables 0 0
0 : slabdata 0 0 0
audit_tree_mark 0 0 64 64 1 : tunables 0 0
0 : slabdata 0 0 0
rpc_inode_cache 0 0 576 14 2 : tunables 0 0
0 : slabdata 0 0 0
ip4-frags 0 0 152 26 1 : tunables 0 0
0 : slabdata 0 0 0
RAW 9 9 896 9 2 : tunables 0 0
0 : slabdata 1 1 0
UDP 8 8 960 8 2 : tunables 0 0
0 : slabdata 1 1 0
tw_sock_TCP 0 0 200 20 1 : tunables 0 0
0 : slabdata 0 0 0
request_sock_TCP 0 0 240 17 1 : tunables 0 0
0 : slabdata 0 0 0
TCP 0 0 1920 8 4 : tunables 0 0
0 : slabdata 0 0 0
hugetlbfs_inode_cache 8 8 504 8 1 : tunables 0
0 0 : slabdata 1 1 0
bio-164 42 42 192 21 1 : tunables 0 0
0 : slabdata 2 2 0
ep_head 0 0 8 512 1 : tunables 0 0
0 : slabdata 0 0 0
dax_cache 14 14 576 14 2 : tunables 0 0
0 : slabdata 1 1 0
sgpool-128 16 16 2048 8 4 : tunables 0 0
0 : slabdata 2 2 0
sgpool-64 8 8 1024 8 2 : tunables 0 0
0 : slabdata 1 1 0
request_queue 13 13 616 13 2 : tunables 0 0
0 : slabdata 1 1 0
blkdev_ioc 0 0 80 51 1 : tunables 0 0
0 : slabdata 0 0 0
bio-120 64 64 128 32 1 : tunables 0 0
0 : slabdata 2 2 0
biovec-max 40 40 3072 10 8 : tunables 0 0
0 : slabdata 4 4 0
biovec-128 0 0 1536 10 4 : tunables 0 0
0 : slabdata 0 0 0
biovec-64 10 10 768 10 2 : tunables 0 0
0 : slabdata 1 1 0
dmaengine-unmap-2 128 128 32 128 1 : tunables 0 0
0 : slabdata 1 1 0
sock_inode_cache 22 22 704 11 2 : tunables 0 0
0 : slabdata 2 2 0
skbuff_small_head 14 14 576 14 2 : tunables 0 0
0 : slabdata 1 1 0
skbuff_fclone_cache 0 0 448 9 1 : tunables 0
0 0 : slabdata 0 0 0
file_lock_cache 28 28 144 28 1 : tunables 0 0
0 : slabdata 1 1 0
buffer_head 357 357 80 51 1 : tunables 0 0
0 : slabdata 7 7 0
proc_dir_entry 256 256 128 32 1 : tunables 0 0
0 : slabdata 8 8 0
pde_opener 0 0 24 170 1 : tunables 0 0
0 : slabdata 0 0 0
proc_inode_cache 60 60 536 15 2 : tunables 0 0
0 : slabdata 4 4 0
seq_file 42 42 96 42 1 : tunables 0 0
0 : slabdata 1 1 0
sigqueue 85 85 48 85 1 : tunables 0 0
0 : slabdata 1 1 0
bdev_cache 14 14 1152 14 4 : tunables 0 0
0 : slabdata 1 1 0
shmem_inode_cache 637 637 600 13 2 : tunables 0 0
0 : slabdata 49 49 0
kernfs_node_cache 13938 13938 88 46 1 : tunables 0 0
0 : slabdata 303 303 0
inode_cache 360 360 496 8 1 : tunables 0 0
0 : slabdata 45 45 0
dentry 1196 1196 152 26 1 : tunables 0 0
0 : slabdata 46 46 0
names_cache 8 8 4096 8 8 : tunables 0 0
0 : slabdata 1 1 0
net_namespace 0 0 2944 11 8 : tunables 0 0
0 : slabdata 0 0 0
iint_cache 0 0 96 42 1 : tunables 0 0
0 : slabdata 0 0 0
key_jar 105 105 192 21 1 : tunables 0 0
0 : slabdata 5 5 0
uts_namespace 0 0 416 19 2 : tunables 0 0
0 : slabdata 0 0 0
nsproxy 102 102 40 102 1 : tunables 0 0
0 : slabdata 1 1 0
vm_area_struct 255 255 80 51 1 : tunables 0 0
0 : slabdata 5 5 0
signal_cache 55 55 704 11 2 : tunables 0 0
0 : slabdata 5 5 0
sighand_cache 60 60 1088 15 4 : tunables 0 0
0 : slabdata 4 4 0
anon_vma_chain 384 384 32 128 1 : tunables 0 0
0 : slabdata 3 3 0
anon_vma 168 168 72 56 1 : tunables 0 0
0 : slabdata 3 3 0
perf_event 0 0 816 10 2 : tunables 0 0
0 : slabdata 0 0 0
maple_node 32 32 256 16 1 : tunables 0 0
0 : slabdata 2 2 0
radix_tree_node 338 338 304 13 1 : tunables 0 0
0 : slabdata 26 26 0
task_group 8 8 512 8 1 : tunables 0 0
0 : slabdata 1 1 0
mm_struct 20 20 768 10 2 : tunables 0 0
0 : slabdata 2 2 0
vmap_area 102 102 40 102 1 : tunables 0 0
0 : slabdata 1 1 0
page->ptl 256 256 16 256 1 : tunables 0 0
0 : slabdata 1 1 0
kmalloc-cg-8k 0 0 8192 4 8 : tunables 0 0
0 : slabdata 0 0 0
kmalloc-cg-4k 8 8 4096 8 8 : tunables 0 0
0 : slabdata 1 1 0
kmalloc-cg-2k 72 72 2048 8 4 : tunables 0 0
0 : slabdata 9 9 0
kmalloc-cg-1k 32 32 1024 8 2 : tunables 0 0
0 : slabdata 4 4 0
kmalloc-cg-512 32 32 512 8 1 : tunables 0 0
0 : slabdata 4 4 0
kmalloc-cg-256 96 96 256 16 1 : tunables 0 0
0 : slabdata 6 6 0
kmalloc-cg-192 63 63 192 21 1 : tunables 0 0
0 : slabdata 3 3 0
kmalloc-cg-128 160 160 128 32 1 : tunables 0 0
0 : slabdata 5 5 0
kmalloc-cg-64 128 128 64 64 1 : tunables 0 0
0 : slabdata 2 2 0
kmalloc-rcl-8k 0 0 8192 4 8 : tunables 0 0
0 : slabdata 0 0 0
kmalloc-rcl-4k 0 0 4096 8 8 : tunables 0 0
0 : slabdata 0 0 0
kmalloc-rcl-2k 0 0 2048 8 4 : tunables 0 0
0 : slabdata 0 0 0
kmalloc-rcl-1k 0 0 1024 8 2 : tunables 0 0
0 : slabdata 0 0 0
kmalloc-rcl-512 0 0 512 8 1 : tunables 0 0
0 : slabdata 0 0 0
kmalloc-rcl-256 0 0 256 16 1 : tunables 0 0
0 : slabdata 0 0 0
kmalloc-rcl-192 0 0 192 21 1 : tunables 0 0
0 : slabdata 0 0 0
kmalloc-rcl-128 0 0 128 32 1 : tunables 0 0
0 : slabdata 0 0 0
kmalloc-rcl-64 0 0 64 64 1 : tunables 0 0
0 : slabdata 0 0 0
kmalloc-8k 12 12 8192 4 8 : tunables 0 0
0 : slabdata 3 3 0
kmalloc-4k 16 16 4096 8 8 : tunables 0 0
0 : slabdata 2 2 0
kmalloc-2k 40 40 2048 8 4 : tunables 0 0
0 : slabdata 5 5 0
kmalloc-1k 88 88 1024 8 2 : tunables 0 0
0 : slabdata 11 11 0
kmalloc-512 856 856 512 8 1 : tunables 0 0
0 : slabdata 107 107 0
kmalloc-256 64 64 256 16 1 : tunables 0 0
0 : slabdata 4 4 0
kmalloc-192 126 126 192 21 1 : tunables 0 0
0 : slabdata 6 6 0
kmalloc-128 1056 1056 128 32 1 : tunables 0 0
0 : slabdata 33 33 0
kmalloc-64 5302 5312 64 64 1 : tunables 0 0
0 : slabdata 83 83 0
kmem_cache_node 128 128 64 64 1 : tunables 0 0
0 : slabdata 2 2 0
kmem_cache 128 128 128 32 1 : tunables 0 0
0 : slabdata 4 4 0

s64lp64:
[ 0.000000] Virtual kernel memory layout:
[ 0.000000] fixmap : 0xff1bfffffee00000 - 0xff1bffffff000000 (2048 kB)
[ 0.000000] pci io : 0xff1bffffff000000 - 0xff1c000000000000 ( 16 MB)
[ 0.000000] vmemmap : 0xff1c000000000000 - 0xff20000000000000 (1024 TB)
[ 0.000000] vmalloc : 0xff20000000000000 - 0xff60000000000000 (16384 TB)
[ 0.000000] modules : 0xffffffff01579000 - 0xffffffff80000000 (2026 MB)
[ 0.000000] lowmem : 0xff60000000000000 - 0xff60000008000000 ( 128 MB)
[ 0.000000] kernel : 0xffffffff80000000 - 0xffffffffffffffff (2047 MB)
[ 0.000000] Memory: 89380K/131072K available (8638K kernel code, 4979K rwdata, 4096K rodata, 2191K init, 477K bss, 41692K reserved, 0K cma-reserved)
...
# free
total used free shared buff/cache available
Mem: 91568 11472 76264 48 3832 76376
Swap: 0 0 0
# cat /proc/meminfo
MemTotal: 91568 kB
MemFree: 76220 kB
MemAvailable: 76352 kB
Buffers: 292 kB
Cached: 2488 kB
SwapCached: 0 kB
Active: 2560 kB
Inactive: 656 kB
Active(anon): 44 kB
Inactive(anon): 440 kB
Active(file): 2516 kB
Inactive(file): 216 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 16 kB
Writeback: 0 kB
AnonPages: 480 kB
Mapped: 1804 kB
Shmem: 48 kB
KReclaimable: 1092 kB
Slab: 6900 kB
SReclaimable: 1092 kB
SUnreclaim: 5808 kB
KernelStack: 688 kB
PageTables: 120 kB
SecPageTables: 0 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 45784 kB
Committed_AS: 2044 kB
VmallocTotal: 17592186044416 kB
VmallocUsed: 904 kB
VmallocChunk: 0 kB
Percpu: 88 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 0 kB
# cat /proc/slabinfo
slabinfo - version: 2.1
# name <active_objs> <num_objs> <objsize> <objperslab>
<pagesperslab> : tunables <limit> <batchcount> <sharedfactor> :
slabdata <active_slabs> <num_slabs> <sharedavail>
ext4_groupinfo_1k 19 19 208 19 1 : tunables 0 0
0 : slabdata 1 1 0
p9_req_t 0 0 176 23 1 : tunables 0 0
0 : slabdata 0 0 0
ip6-frags 0 0 208 19 1 : tunables 0 0
0 : slabdata 0 0 0
UDPv6 0 0 1472 11 4 : tunables 0 0
0 : slabdata 0 0 0
tw_sock_TCPv6 0 0 264 15 1 : tunables 0 0
0 : slabdata 0 0 0
request_sock_TCPv6 0 0 312 13 1 : tunables 0 0
0 : slabdata 0 0 0
TCPv6 0 0 2560 12 8 : tunables 0 0
0 : slabdata 0 0 0
bio-96 32 32 128 32 1 : tunables 0 0
0 : slabdata 1 1 0
bfq_io_cq 0 0 1352 12 4 : tunables 0 0
0 : slabdata 0 0 0
bfq_queue 0 0 576 14 2 : tunables 0 0
0 : slabdata 0 0 0
mqueue_inode_cache 14 14 1152 14 4 : tunables 0 0
0 : slabdata 1 1 0
v9fs_inode_cache 0 0 888 9 2 : tunables 0 0
0 : slabdata 0 0 0
nfs4_xattr_cache_cache 0 0 3168 10 8 : tunables 0
0 0 : slabdata 0 0 0
nfs_direct_cache 0 0 264 15 1 : tunables 0 0
0 : slabdata 0 0 0
nfs_commit_data 11 11 704 11 2 : tunables 0 0
0 : slabdata 1 1 0
nfs_read_data 36 36 896 9 2 : tunables 0 0
0 : slabdata 4 4 0
nfs_inode_cache 0 0 1272 25 8 : tunables 0 0
0 : slabdata 0 0 0
isofs_inode_cache 0 0 824 19 4 : tunables 0 0
0 : slabdata 0 0 0
fat_inode_cache 0 0 976 8 2 : tunables 0 0
0 : slabdata 0 0 0
fat_cache 0 0 40 102 1 : tunables 0 0
0 : slabdata 0 0 0
jbd2_journal_head 0 0 144 28 1 : tunables 0 0
0 : slabdata 0 0 0
jbd2_revoke_table_s 0 0 16 256 1 : tunables 0
0 0 : slabdata 0 0 0
ext4_fc_dentry_update 0 0 96 42 1 : tunables 0
0 0 : slabdata 0 0 0
ext4_inode_cache 105 105 1496 21 8 : tunables 0 0
0 : slabdata 5 5 0
ext4_allocation_context 30 30 136 30 1 : tunables 0
0 0 : slabdata 1 1 0
ext4_prealloc_space 34 34 120 34 1 : tunables 0
0 0 : slabdata 1 1 0
ext4_system_zone 102 102 40 102 1 : tunables 0 0
0 : slabdata 1 1 0
ext4_io_end_vec 0 0 32 128 1 : tunables 0 0
0 : slabdata 0 0 0
bio_post_read_ctx 170 170 48 85 1 : tunables 0 0
0 : slabdata 2 2 0
pending_reservation 0 0 32 128 1 : tunables 0
0 0 : slabdata 0 0 0
extent_status 102 102 40 102 1 : tunables 0 0
0 : slabdata 1 1 0
mbcache 0 0 56 73 1 : tunables 0 0
0 : slabdata 0 0 0
dnotify_struct 0 0 32 128 1 : tunables 0 0
0 : slabdata 0 0 0
pid_namespace 0 0 160 25 1 : tunables 0 0
0 : slabdata 0 0 0
posix_timers_cache 0 0 272 15 1 : tunables 0 0
0 : slabdata 0 0 0
rpc_inode_cache 0 0 832 19 4 : tunables 0 0
0 : slabdata 0 0 0
UNIX 12 12 1344 12 4 : tunables 0 0
0 : slabdata 1 1 0
ip4-frags 0 0 224 18 1 : tunables 0 0
0 : slabdata 0 0 0
xfrm_dst_cache 0 0 320 12 1 : tunables 0 0
0 : slabdata 0 0 0
ip_fib_trie 85 85 48 85 1 : tunables 0 0
0 : slabdata 1 1 0
ip_fib_alias 73 73 56 73 1 : tunables 0 0
0 : slabdata 1 1 0
UDP 12 12 1280 12 4 : tunables 0 0
0 : slabdata 1 1 0
tw_sock_TCP 0 0 264 15 1 : tunables 0 0
0 : slabdata 0 0 0
request_sock_TCP 0 0 312 13 1 : tunables 0 0
0 : slabdata 0 0 0
TCP 0 0 2432 13 8 : tunables 0 0
0 : slabdata 0 0 0
hugetlbfs_inode_cache 10 10 784 10 2 : tunables 0
0 0 : slabdata 1 1 0
bio-224 48 48 256 16 1 : tunables 0 0
0 : slabdata 3 3 0
ep_head 0 0 16 256 1 : tunables 0 0
0 : slabdata 0 0 0
inotify_inode_mark 0 0 96 42 1 : tunables 0 0
0 : slabdata 0 0 0
dax_cache 8 8 960 8 2 : tunables 0 0
0 : slabdata 1 1 0
sgpool-128 10 10 3072 10 8 : tunables 0 0
0 : slabdata 1 1 0
sgpool-64 10 10 1536 10 4 : tunables 0 0
0 : slabdata 1 1 0
sgpool-16 10 10 384 10 1 : tunables 0 0
0 : slabdata 1 1 0
request_queue 15 15 1040 15 4 : tunables 0 0
0 : slabdata 1 1 0
bio-160 42 42 192 21 1 : tunables 0 0
0 : slabdata 2 2 0
biovec-128 8 8 2048 8 4 : tunables 0 0
0 : slabdata 1 1 0
biovec-64 8 8 1024 8 2 : tunables 0 0
0 : slabdata 1 1 0
user_namespace 0 0 632 25 4 : tunables 0 0
0 : slabdata 0 0 0
uid_cache 84 84 192 21 1 : tunables 0 0
0 : slabdata 4 4 0
dmaengine-unmap-2 64 64 64 64 1 : tunables 0 0
0 : slabdata 1 1 0
sock_inode_cache 24 24 1024 8 2 : tunables 0 0
0 : slabdata 3 3 0
skbuff_small_head 12 12 640 12 2 : tunables 0 0
0 : slabdata 1 1 0
skbuff_fclone_cache 0 0 512 8 1 : tunables 0
0 0 : slabdata 0 0 0
file_lock_cache 17 17 232 17 1 : tunables 0 0
0 : slabdata 1 1 0
fsnotify_mark_connector 0 0 56 73 1 : tunables 0
0 0 : slabdata 0 0 0
pde_opener 0 0 40 102 1 : tunables 0 0
0 : slabdata 0 0 0
proc_inode_cache 57 57 848 19 4 : tunables 0 0
0 : slabdata 3 3 0
seq_file 26 26 152 26 1 : tunables 0 0
0 : slabdata 1 1 0
sigqueue 51 51 80 51 1 : tunables 0 0
0 : slabdata 1 1 0
bdev_cache 18 18 1792 9 4 : tunables 0 0
0 : slabdata 2 2 0
shmem_inode_cache 646 646 936 17 4 : tunables 0 0
0 : slabdata 38 38 0
kernfs_iattrs_cache 0 0 96 42 1 : tunables 0
0 0 : slabdata 0 0 0
kernfs_node_cache 14304 14304 128 32 1 : tunables 0 0
0 : slabdata 447 447 0
filp 84 84 320 12 1 : tunables 0 0
0 : slabdata 7 7 0
inode_cache 360 360 776 10 2 : tunables 0 0
0 : slabdata 36 36 0
dentry 1188 1188 216 18 1 : tunables 0 0
0 : slabdata 66 66 0
names_cache 48 48 4096 8 8 : tunables 0 0
0 : slabdata 6 6 0
net_namespace 0 0 3840 8 8 : tunables 0 0
0 : slabdata 0 0 0
iint_cache 0 0 152 26 1 : tunables 0 0
0 : slabdata 0 0 0
uts_namespace 0 0 432 9 1 : tunables 0 0
0 : slabdata 0 0 0
nsproxy 56 56 72 56 1 : tunables 0 0
0 : slabdata 1 1 0
vm_area_struct 240 240 136 30 1 : tunables 0 0
0 : slabdata 8 8 0
files_cache 22 22 704 11 2 : tunables 0 0
0 : slabdata 2 2 0
signal_cache 56 56 1152 14 4 : tunables 0 0
0 : slabdata 4 4 0
sighand_cache 57 57 1664 19 8 : tunables 0 0
0 : slabdata 3 3 0
task_struct 55 55 2880 11 8 : tunables 0 0
0 : slabdata 5 5 0
anon_vma 120 120 136 30 1 : tunables 0 0
0 : slabdata 4 4 0
perf_event 0 0 1152 14 4 : tunables 0 0
0 : slabdata 0 0 0
maple_node 304 304 256 16 1 : tunables 0 0
0 : slabdata 19 19 0
radix_tree_node 350 350 584 14 2 : tunables 0 0
0 : slabdata 25 25 0
task_group 10 10 768 10 2 : tunables 0 0
0 : slabdata 1 1 0
mm_struct 22 22 1408 11 4 : tunables 0 0
0 : slabdata 2 2 0
vmap_area 168 168 72 56 1 : tunables 0 0
0 : slabdata 3 3 0
page->ptl 170 170 24 170 1 : tunables 0 0
0 : slabdata 1 1 0
kmalloc-cg-8k 0 0 8192 4 8 : tunables 0 0
0 : slabdata 0 0 0
kmalloc-cg-4k 24 24 4096 8 8 : tunables 0 0
0 : slabdata 3 3 0
kmalloc-cg-2k 32 32 2048 8 4 : tunables 0 0
0 : slabdata 4 4 0
kmalloc-cg-1k 24 24 1024 8 2 : tunables 0 0
0 : slabdata 3 3 0
kmalloc-cg-512 32 32 512 8 1 : tunables 0 0
0 : slabdata 4 4 0
kmalloc-cg-256 16 16 256 16 1 : tunables 0 0
0 : slabdata 1 1 0
kmalloc-cg-192 147 147 192 21 1 : tunables 0 0
0 : slabdata 7 7 0
kmalloc-cg-128 64 64 128 32 1 : tunables 0 0
0 : slabdata 2 2 0
kmalloc-cg-64 320 320 64 64 1 : tunables 0 0
0 : slabdata 5 5 0
kmalloc-rcl-8k 0 0 8192 4 8 : tunables 0 0
0 : slabdata 0 0 0
kmalloc-rcl-4k 0 0 4096 8 8 : tunables 0 0
0 : slabdata 0 0 0
kmalloc-rcl-2k 0 0 2048 8 4 : tunables 0 0
0 : slabdata 0 0 0
kmalloc-rcl-1k 0 0 1024 8 2 : tunables 0 0
0 : slabdata 0 0 0
kmalloc-rcl-512 0 0 512 8 1 : tunables 0 0
0 : slabdata 0 0 0
kmalloc-rcl-256 0 0 256 16 1 : tunables 0 0
0 : slabdata 0 0 0
kmalloc-rcl-192 0 0 192 21 1 : tunables 0 0
0 : slabdata 0 0 0
kmalloc-rcl-128 320 320 128 32 1 : tunables 0 0
0 : slabdata 10 10 0
kmalloc-rcl-64 64 64 64 64 1 : tunables 0 0
0 : slabdata 1 1 0
kmalloc-8k 12 12 8192 4 8 : tunables 0 0
0 : slabdata 3 3 0
kmalloc-4k 16 16 4096 8 8 : tunables 0 0
0 : slabdata 2 2 0
kmalloc-2k 64 64 2048 8 4 : tunables 0 0
0 : slabdata 8 8 0
kmalloc-1k 840 840 1024 8 2 : tunables 0 0
0 : slabdata 105 105 0
kmalloc-512 144 144 512 8 1 : tunables 0 0
0 : slabdata 18 18 0
kmalloc-256 816 816 256 16 1 : tunables 0 0
0 : slabdata 51 51 0
kmalloc-192 252 252 192 21 1 : tunables 0 0
0 : slabdata 12 12 0
kmalloc-128 480 480 128 32 1 : tunables 0 0
0 : slabdata 15 15 0
kmalloc-64 4912 4928 64 64 1 : tunables 0 0
0 : slabdata 77 77 0
kmem_cache_node 128 128 128 32 1 : tunables 0 0
0 : slabdata 4 4 0
kmem_cache 126 126 192 21 1 : tunables 0 0
0 : slabdata 6 6 0

>
> Arnd

--
Best Regards
Guo Ren

2023-05-20 10:51:30

by Arnd Bergmann

Subject: Re: [RFC PATCH 00/22] riscv: s64ilp32: Running 32-bit Linux kernel on 64-bit supervisor mode

On Sat, May 20, 2023, at 04:53, Guo Ren wrote:
> On Sat, May 20, 2023 at 4:20 AM Arnd Bergmann <[email protected]> wrote:
>> On Thu, May 18, 2023, at 15:09, [email protected] wrote:
>>
>> I've tried to run the same numbers for the debate about running
>> 32-bit vs 64-bit arm kernels in the past, but focused mostly on
>> slightly larger systems, but I looked mainly at the 512MB case,
>> as that is the most cost-efficient DDR3 memory configuration
>> and fairly common.
> 512MB is extravagant, in my opinion. In the IPC market, 32/64MB is for
> 480P/720P/1080p, 128/256MB is for 1080p/2k, and 512/1024MB is for 4K.
> Chips with more than 512MB are less than 5% of the total (I guess). Even in 512MB
> chips, the additional memory is for the frame buffer, not the Linux
> system.

This depends a lot on the target application of course. For
a phone or NAS box, 512MB is probably the lower limit.

What I observe in arch/arm/ devicetree submissions, in board-db.org,
and when looking at industrial Arm board vendor websites is that
512MB is the most common configuration, and I think 1GB is still
more common than 256MB even for 32-bit machines. There is of course
a difference between number of individual products, and number of
machines shipped in a given configuration, and I guess you have
a good point that the cheapest ones are also the ones that ship
in the highest volume.

>> What I'd like to understand better in your example is where
>> the 14MB of memory went. I assume this is for 128MB of total
>> RAM, so we know that 1MB went into additional 'struct page'
>> objects (32 bytes * 32768 pages). It would be good to know
>> where the dynamic allocations went and if they are reclaimable
>> (e.g. inodes) or non-reclaimable (e.g. kmalloc-128).
>>
>> For the vmlinux size, is this already a minimal config
>> that one would run on a board with 128MB of RAM, or a
>> defconfig that includes a lot of stuff that is only relevant
>> for other platforms but also grows on 64-bit?
> It's not a minimal config, it's defconfig. So I say it's a rough
> measurement :)
>
> I admit I wanted a little bit to exaggerate it, but that's the
> starting point for cutting down memory usage for most people, right?
> During the past year, we have been convincing our customers to use the
> s64lp64 + u32ilp32, but they can't tolerate even 1% memory additional
> cost in 64MB/128MB scenarios and then chose cortex-a7/a35, which could
> run 32-bit Linux. I think it's too early to talk about throwing 32-bit
> Linux into the garbage, not only for the reason of memory footprint
> but also for the ingrained opinion of the people. Changing their mind
> needs a long time.
>
>>
>> What do you see in /proc/slabinfo, /proc/meminfo/, and
>> 'size vmlinux' for the s64ilp32 and s64lp64 kernels here?
> Both s64ilp32 & s64lp64 use the same u32ilp32_rootfs.ext2 binary and
> the same opensbi binary.
> All are opensbi(2MB) + Linux(126MB) memory layout.
>
> Here is the result:
>
> s64ilp32:
> [ 0.000000] Virtual kernel memory layout:
> [ 0.000000] fixmap : 0x9ce00000 - 0x9d000000 (2048 kB)
> [ 0.000000] pci io : 0x9d000000 - 0x9e000000 ( 16 MB)
> [ 0.000000] vmemmap : 0x9e000000 - 0xa0000000 ( 32 MB)
> [ 0.000000] vmalloc : 0xa0000000 - 0xc0000000 ( 512 MB)
> [ 0.000000] lowmem : 0xc0000000 - 0xc7e00000 ( 126 MB)
> [ 0.000000] Memory: 97748K/129024K available (8699K kernel code,
> 8867K rwdata, 4096K rodata, 4204K init, 361K bss, 31276K reserved, 0K
> cma-reserved)

Ok, so it saves only a little bit on .text/.init/.bss/.rodata, but
there is a 4MB difference in rwdata, and a total of 10.4MB difference
in "reserved" size, which I think includes all of the above plus
the mem_map[] array.

89380K/131072K available (8638K kernel code, 4979K rwdata, 4096K rodata, 2191K init, 477K bss, 41692K reserved, 0K cma-reserved)
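
The per-field differences between the two boot lines can be checked mechanically. The following is a minimal Python sketch, not part of the original measurement, that parses the two "Memory:" lines quoted in this thread and prints lp64 minus ilp32 for each field. The helper name parse_memory_line is only an illustrative choice.

```python
import re

# The two "Memory:" boot lines quoted in this thread, joined onto one line each.
ILP32 = ("Memory: 97748K/129024K available (8699K kernel code, 8867K rwdata, "
         "4096K rodata, 4204K init, 361K bss, 31276K reserved, 0K cma-reserved)")
LP64 = ("Memory: 89380K/131072K available (8638K kernel code, 4979K rwdata, "
        "4096K rodata, 2191K init, 477K bss, 41692K reserved, 0K cma-reserved)")


def parse_memory_line(line):
    """Return a dict of field name -> size in K for one 'Memory:' line."""
    fields = {}
    m = re.search(r"(\d+)K/(\d+)K available", line)
    fields["available"] = int(m.group(1))
    fields["total"] = int(m.group(2))
    # Everything inside the parentheses has the form "<number>K <name>".
    inner = line[line.index("(") + 1:line.rindex(")")]
    for part in inner.split(","):
        size, name = part.strip().split("K ", 1)
        fields[name] = int(size)
    return fields


ilp32 = parse_memory_line(ILP32)
lp64 = parse_memory_line(LP64)

# Positive numbers mean the lp64 kernel reports more for that field.
diff = {name: lp64[name] - ilp32[name] for name in ilp32}

for name, delta in diff.items():
    print(f"{name:>14}: {ilp32[name]:>7}K -> {lp64[name]:>7}K ({delta:+}K)")
```

Note that the two lines report different totals (129024K vs 131072K), so 2048K of the "reserved" difference comes from that alone.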

Oddly, I don't see anywhere close to 8MB of rwdata in a riscv64 defconfig
build (linux-next, gcc-13), so I don't know where that comes
from:

$ size -A build/tmp/vmlinux | sort -k2 -nr | head
Total 13518684
.text 8896058 18446744071562076160
.rodata 2219008 18446744071576748032
.data 933760 18446744071583039488
.bss 476080 18446744071584092160
.init.text 264718 18446744071572553728
__ksymtab_strings 183986 18446744071579214312
__ksymtab_gpl 122928 18446744071579091384
__ksymtab 109080 18446744071578982304
__bug_table 98352 18446744071583973248

> KReclaimable: 644 kB
> Slab: 4536 kB
> SReclaimable: 644 kB
> SUnreclaim: 3892 kB
> KernelStack: 344 kB

These look like the only notable differences in meminfo:

KReclaimable: 1092 kB
Slab: 6900 kB
SReclaimable: 1092 kB
SUnreclaim: 5808 kB
KernelStack: 688 kB

The largest chunk here is 2MB in non-reclaimable slab allocations,
or a 50% growth of those.

The kernel stacks are doubled as expected, but that's only 344KB,
similarly for reclaimable slabs.
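
For reference, the growth figures can be recomputed from the two /proc/meminfo dumps. This is a small Python sketch under the assumption that only the five fields listed above matter; the function name growth is made up for illustration.

```python
# Values (in kB) copied from the two /proc/meminfo dumps in this thread.
ILP32 = {"KReclaimable": 644, "Slab": 4536, "SReclaimable": 644,
         "SUnreclaim": 3892, "KernelStack": 344}
LP64 = {"KReclaimable": 1092, "Slab": 6900, "SReclaimable": 1092,
        "SUnreclaim": 5808, "KernelStack": 688}


def growth(ilp32, lp64):
    """Return {field: (delta_kB, percent_growth)} going from ilp32 to lp64."""
    return {
        key: (lp64[key] - ilp32[key],
              100.0 * (lp64[key] - ilp32[key]) / ilp32[key])
        for key in ilp32
    }


result = growth(ILP32, LP64)
for field, (delta, pct) in result.items():
    print(f"{field:<13} +{delta} kB ({pct:.0f}%)")
```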

> # cat /proc/slabinfo
>
> slabinfo - version: 2.1
> # name <active_objs> <num_objs> <objsize> <objperslab>
> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> :
> slabdata <active_slabs> <num_slabs> <sharedavail>
> ext4_groupinfo_1k 28 28 144 28 1 : tunables 0 0
> 0 : slabdata 1 1 0
> p9_req_t 0 0 104 39 1 : tunables 0 0

Did you perhaps miss a few lines while pasting these? It seems
odd that some caches only show up in the ilp32 case (proc_dir_entry,
jbd2_journal_handle, buffer_head, biovec-max, anon_vma_chain, ...) and
some others are only in the lp64 case (UNIX, ext4_prealloc_space,
files_cache, filp, ip_fib_alias, task_struct, uid_cache, ...).

Looking at the ones that are in both and have the largest size
increase, I see

# lp64 (columns: total KB, cache name, active objects, object size in bytes)
1788 kernfs_node_cache 14304 128
590 shmem_inode_cache 646 936
272 inode_cache 360 776
153 ext4_inode_cache 105 1496
250 dentry 1188 216
192 names_cache 48 4096
199 radix_tree_node 350 584
307 kmalloc-64 4912 64
60 kmalloc-128 480 128
47 kmalloc-192 252 192
204 kmalloc-256 816 256
72 kmalloc-512 144 512
840 kmalloc-1k 840 1024

# ilp32 (columns: total KB, cache name, active objects, object size in bytes)
1197 kernfs_node_cache 13938 88
373 shmem_inode_cache 637 600
174 inode_cache 360 496
84 ext4_inode_cache 88 984
177 dentry 1196 152
32 names_cache 8 4096
100 radix_tree_node 338 304
331 kmalloc-64 5302 64
132 kmalloc-128 1056 128
23 kmalloc-192 126 192
16 kmalloc-256 64 256
428 kmalloc-512 856 512
88 kmalloc-1k 88 1024

So sysfs (kernfs_node_cache) has the largest chunk of the
2MB non-reclaimable slab, grown 50% from 1.2MB to 1.8MB.
In some cases, this could be avoided entirely by turning
off sysfs, but most users can't do that.
shmem_inode_cache is probably mostly devtmpfs; the
other inode caches are smaller and likely reclaimable.
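
The per-cache totals in the two lists are active_objs * objsize, rounded down to KB. A rough Python sketch of that calculation follows, fed with a few rows from the lp64 dump in this thread (unwrapped onto one line each). The function name slab_usage_kib is hypothetical, and the result counts object payload only, not the page overhead of partially filled slabs.

```python
# A few /proc/slabinfo rows from the lp64 dump in this thread, unwrapped.
LP64_ROWS = """\
kernfs_node_cache 14304 14304 128 32 1 : tunables 0 0 0 : slabdata 447 447 0
shmem_inode_cache 646 646 936 17 4 : tunables 0 0 0 : slabdata 38 38 0
kmalloc-64 4912 4928 64 64 1 : tunables 0 0 0 : slabdata 77 77 0
kmalloc-1k 840 840 1024 8 2 : tunables 0 0 0 : slabdata 105 105 0
"""


def slab_usage_kib(text):
    """Return [(kib, cache_name), ...] sorted from largest to smallest."""
    rows = []
    for line in text.splitlines():
        # Skip blank lines and the two header lines of /proc/slabinfo.
        if not line or line.startswith(("#", "slabinfo")):
            continue
        name, active_objs, _num_objs, objsize = line.split()[:4]
        rows.append((int(active_objs) * int(objsize) // 1024, name))
    return sorted(rows, reverse=True)


usage = slab_usage_kib(LP64_ROWS)
for kib, name in usage:
    print(f"{kib:6d} {name}")
```

On a running system the same function could be applied to the text of /proc/slabinfo, which normally requires root to read.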

It's interesting how the largest kmalloc cache ends up
being the kmalloc-1k cache (840 1K objects) on lp64,
but the kmalloc-512 cache (856 512B objects) on ilp32.
My guess is that the majority of this is from a single
callsite that has an allocation growing just beyond 512B.
This alone seems significant enough to need further
investigation; I would hope we can completely avoid
these by adding a custom slab cache. I don't see this
effect on an arm64 boot though; for me the 512B allocations
are much higher than the 1K ones.
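
To illustrate the rounding effect: kmalloc serves each request from the smallest size class that fits, so an object that grows from just under 512B to just over it doubles its footprint. The Python sketch below uses only the kmalloc size classes visible in the slabinfo dumps in this thread; the 480B and 520B request sizes are made-up examples, not measured values.

```python
import bisect

# kmalloc size classes that appear in the slabinfo dumps in this thread.
KMALLOC_CLASSES = [64, 128, 192, 256, 512, 1024, 2048, 4096, 8192]


def kmalloc_bucket(size):
    """Return the smallest size class that can hold a request of `size` bytes."""
    index = bisect.bisect_left(KMALLOC_CLASSES, size)
    if index == len(KMALLOC_CLASSES):
        raise ValueError("request is larger than the biggest kmalloc cache")
    return KMALLOC_CLASSES[index]


# Hypothetical request sizes: the same structure before and after its
# pointers and longs grow from 32-bit to 64-bit.
for request in (480, 512, 520):
    bucket = kmalloc_bucket(request)
    print(f"{request}B request -> kmalloc-{bucket} ({bucket - request}B unused)")
```

A dedicated slab cache sized for the actual object would avoid the unused tail, which is the idea behind the custom slab cache suggestion.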

Maybe you can identify the culprit using the boot-time traces
as listed in https://elinux.org/Kernel_dynamic_memory_analysis#Dynamic
That might help everyone running a 64-bit kernel on
low-memory configurations, though it would of course slightly
weaken your argument for an ilp32 kernel ;-)

Arnd

2023-05-20 16:10:13

Subject: Re: [RFC PATCH 00/22] riscv: s64ilp32: Running 32-bit Linux kernel on 64-bit supervisor mode

On Sat, May 20, 2023 at 6:13 PM Arnd Bergmann <[email protected]> wrote:
>
> On Sat, May 20, 2023, at 04:53, Guo Ren wrote:
> > On Sat, May 20, 2023 at 4:20 AM Arnd Bergmann <[email protected]> wrote:
> >> On Thu, May 18, 2023, at 15:09, [email protected] wrote:
> >>
> >> I've tried to run the same numbers for the debate about running
> >> 32-bit vs 64-bit arm kernels in the past, but focused mostly on
> >> slightly larger systems, but I looked mainly at the 512MB case,
> >> as that is the most cost-efficient DDR3 memory configuration
> >> and fairly common.
> > 512MB is extravagant, in my opinion. In the IPC market, 32/64MB is for
> > 480P/720P/1080p, 128/256MB is for 1080p/2k, and 512/1024MB is for 4K.
> > Chips with more than 512MB are less than 5% of the total (I guess). Even in 512MB
> > chips, the additional memory is for the frame buffer, not the Linux
> > system.
>
> This depends a lot on the target application of course. For
> a phone or NAS box, 512MB is probably the lower limit.
>
> What I observe in arch/arm/ devicetree submissions, in board-db.org,
> and when looking at industrial Arm board vendor websites is that
> 512MB is the most common configuration, and I think 1GB is still
> more common than 256MB even for 32-bit machines. There is of course
> a difference between number of individual products, and number of
> machines shipped in a given configuration, and I guess you have
> a good point that the cheapest ones are also the ones that ship
> in the highest volume.
>
> >> What I'd like to understand better in your example is where
> >> the 14MB of memory went. I assume this is for 128MB of total
> >> RAM, so we know that 1MB went into additional 'struct page'
> >> objects (32 bytes * 32768 pages). It would be good to know
> >> where the dynamic allocations went and if they are reclaimable
> >> (e.g. inodes) or non-reclaimable (e.g. kmalloc-128).
> >>
> >> For the vmlinux size, is this already a minimal config
> >> that one would run on a board with 128MB of RAM, or a
> >> defconfig that includes a lot of stuff that is only relevant
> >> for other platforms but also grows on 64-bit?
> > It's not a minimal config, it's defconfig. So I say it's a rough
> > measurement :)
> >
> > I admit I wanted a little bit to exaggerate it, but that's the
> > starting point for cutting down memory usage for most people, right?
> > During the past year, we have been convincing our customers to use the
> > s64lp64 + u32ilp32, but they can't tolerate even 1% memory additional
> > cost in 64MB/128MB scenarios and then chose cortex-a7/a35, which could
> > run 32-bit Linux. I think it's too early to talk about throwing 32-bit
> > Linux into the garbage, not only for the reason of memory footprint
> > but also for the ingrained opinion of the people. Changing their mind
> > needs a long time.
> >
> >>
> >> What do you see in /proc/slabinfo, /proc/meminfo/, and
> >> 'size vmlinux' for the s64ilp32 and s64lp64 kernels here?
> > Both s64ilp32 & s64lp64 use the same u32ilp32_rootfs.ext2 binary and
> > the same opensbi binary.
> > All are opensbi(2MB) + Linux(126MB) memory layout.
> >
> > Here is the result:
> >
> > s64ilp32:
> > [ 0.000000] Virtual kernel memory layout:
> > [ 0.000000] fixmap : 0x9ce00000 - 0x9d000000 (2048 kB)
> > [ 0.000000] pci io : 0x9d000000 - 0x9e000000 ( 16 MB)
> > [ 0.000000] vmemmap : 0x9e000000 - 0xa0000000 ( 32 MB)
> > [ 0.000000] vmalloc : 0xa0000000 - 0xc0000000 ( 512 MB)
> > [ 0.000000] lowmem : 0xc0000000 - 0xc7e00000 ( 126 MB)
> > [ 0.000000] Memory: 97748K/129024K available (8699K kernel code,
> > 8867K rwdata, 4096K rodata, 4204K init, 361K bss, 31276K reserved, 0K
> > cma-reserved)
>
> Ok, so it saves only a little bit on .text/.init/.bss/.rodata, but
> there is a 4MB difference in rwdata, and a total of 10.4MB difference
> in "reserved" size, which I think includes all of the above plus
> the mem_map[] array.
>
> 89380K/131072K available (8638K kernel code, 4979K rwdata, 4096K rodata, 2191K init, 477K bss, 41692K reserved, 0K cma-reserved)
>
> Oddly, I don't see anywhere close to 8MB of rwdata in a riscv64 defconfig
> build (linux-next, gcc-13), so I don't know where that comes
> from:
>
> $ size -A build/tmp/vmlinux | sort -k2 -nr | head
> Total 13518684
> .text 8896058 18446744071562076160
> .rodata 2219008 18446744071576748032
> .data 933760 18446744071583039488
> .bss 476080 18446744071584092160
> .init.text 264718 18446744071572553728
> __ksymtab_strings 183986 18446744071579214312
> __ksymtab_gpl 122928 18446744071579091384
> __ksymtab 109080 18446744071578982304
> __bug_table 98352 18446744071583973248
>
>
>
> > KReclaimable: 644 kB
> > Slab: 4536 kB
> > SReclaimable: 644 kB
> > SUnreclaim: 3892 kB
> > KernelStack: 344 kB
>
> These look like the only notable differences in meminfo:
>
> KReclaimable: 1092 kB
> Slab: 6900 kB
> SReclaimable: 1092 kB
> SUnreclaim: 5808 kB
> KernelStack: 688 kB
>
> The largest chunk here is 2MB in non-reclaimable slab allocations,
> or a 50% growth of those.
>
> The kernel stacks are doubled as expected, but that's only 344KB,
> similarly for reclaimable slabs.
>
> > # cat /proc/slabinfo
> >
> > slabinfo - version: 2.1
> > # name <active_objs> <num_objs> <objsize> <objperslab>
> > <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> :
> > slabdata <active_slabs> <num_slabs> <sharedavail>
> > ext4_groupinfo_1k 28 28 144 28 1 : tunables 0 0
> > 0 : slabdata 1 1 0
> > p9_req_t 0 0 104 39 1 : tunables 0 0
>
> Did you perhaps miss a few lines while pasting these? It seems
> odd that some caches only show up in the ilp32 case (proc_dir_entry,
> jbd2_journal_handle, buffer_head, biovec-max, anon_vma_chain, ...) and
> some others are only in the lp64 case (UNIX, ext4_prealloc_space,
> files_cache, filp, ip_fib_alias, task_struct, uid_cache, ...).
>
> Looking at the ones that are in both and have the largest size
> increase, I see
>
> # lp64
> 1788 kernfs_node_cache 14304 128
> 590 shmem_inode_cache 646 936
> 272 inode_cache 360 776
> 153 ext4_inode_cache 105 1496
> 250 dentry 1188 216
> 192 names_cache 48 4096
> 199 radix_tree_node 350 584
> 307 kmalloc-64 4912 64
> 60 kmalloc-128 480 128
> 47 kmalloc-192 252 192
> 204 kmalloc-256 816 256
> 72 kmalloc-512 144 512
> 840 kmalloc-1k 840 1024
>
> # ilp32
> 1197 kernfs_node_cache 13938 88
> 373 shmem_inode_cache 637 600
> 174 inode_cache 360 496
> 84 ext4_inode_cache 88 984
> 177 dentry 1196 152
> 32 names_cache 8 4096
> 100 radix_tree_node 338 304
> 331 kmalloc-64 5302 64
> 132 kmalloc-128 1056 128
> 23 kmalloc-192 126 192
> 16 kmalloc-256 64 256
> 428 kmalloc-512 856 512
> 88 kmalloc-1k 88 1024
>
> So sysfs (kernfs_node_cache) has the largest chunk of the
> 2MB non-reclaimable slab, grown 50% from 1.2MB to 1.8MB.
> In some cases, this could be avoided entirely by turning
> off sysfs, but most users can't do that.
> shmem_inode_cache is probably mostly devtmpfs, the
> other inode caches are smaller and likely reclaimable.
>
> It's interesting how the largest slab cache ends up
> being the kmalloc-1k cache (840 1K objects) on lp64,
> but the kmalloc-512 cache (856 512B objects) on ilp32.
> My guess is that the majority of this is from a single
> callsite that has an allocation growing just beyond 512B.
> This alone seems significant enough to need further
> investigation, I would hope we can completely avoid
> these by adding a custom slab cache. I don't see this
> effect on an arm64 boot though, for me the 512B allocations
> are much higher than the 1K ones.
>
> Maybe you can identify the culprit using the boot-time traces
> as listed in https://elinux.org/Kernel_dynamic_memory_analysis#Dynamic
> That might help everyone running a 64-bit kernel on
> low-memory configurations, though it would of course slightly
> weaken your argument for an ilp32 kernel ;-)
Thanks for the detailed reply. I will try the approaches you mentioned
later. But these are about the traditional CONFIG_32BIT vs. CONFIG_64BIT
comparison.
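
For reference, the kB figures in the slab tables quoted above can be
reproduced from the object counts and object sizes. The following is a
minimal sketch, assuming the table columns are (kB, cache name, num_objs,
objsize) and that the kB value is rounded down. The helper names
slab_kb and growth_percent are made up for illustration.

```python
# Sketch: reproduce the kB column of the quoted slab tables.
# Assumption: columns are (kB, cache name, num_objs, objsize in bytes).

def slab_kb(num_objs: int, objsize: int) -> int:
    """Memory covered by a cache's objects, in whole kB (rounded down)."""
    return num_objs * objsize // 1024


def growth_percent(small_kb: int, large_kb: int) -> float:
    """Relative growth from small_kb to large_kb, in percent."""
    return 100.0 * (large_kb - small_kb) / small_kb


# (num_objs, objsize) pairs copied from the tables quoted above.
lp64 = {
    "kernfs_node_cache": (14304, 128),
    "kmalloc-512": (144, 512),
    "kmalloc-1k": (840, 1024),
}
ilp32 = {
    "kernfs_node_cache": (13938, 88),
    "kmalloc-512": (856, 512),
    "kmalloc-1k": (88, 1024),
}

for name in lp64:
    print(name, "lp64:", slab_kb(*lp64[name]), "kB",
          "ilp32:", slab_kb(*ilp32[name]), "kB")

# SUnreclaim and KernelStack values from the two meminfo excerpts (kB).
print("SUnreclaim growth:  %.0f%%" % growth_percent(3892, 5808))
print("KernelStack growth: %.0f%%" % growth_percent(344, 688))
```

The same two helpers can be pointed at any other row of /proc/slabinfo
to see which caches dominate the difference.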

Besides the detailed analysis data, we also face a perception problem.
Because struct page, struct list_head, and other structures containing
pointers are significantly smaller with ilp32 than with lp64, people
assume that ilp32 is simply smaller than lp64. This perception prevents
vendors from accepting lp64 as a cost-down solution. They won't even
try it, as I have seen over these years. I was an lp64 kernel supporter
last year, but I ran into a lot of arguments about s64lp64 + u32ilp32.
Some people are using arm32 Linux; they want to stay on 32-bit Linux to
ensure their complex C code keeps working. So our argument about
"ilp32 vs. lp64" won't reach a conclusion.

Let's look at it from another angle: cache utilization. These 64/128MB
SoCs also have limited cache capacities (32KB L1 + 128KB L2, or only a
64KB L1). Operations such as list walks and stack saving/restoring are
very common in Linux. What do you think about "32-bit vs. 64-bit" cache
utilization?
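
To make the cache question concrete, here is a rough sketch of how many
bare struct list_head entries fit in a cache line and in a 32KB L1. The
64-byte line size is an assumption (it is not stated in this thread),
and the function names are made up for illustration. Real list nodes are
embedded in larger structures, so these numbers are only an upper bound.

```python
# Sketch: pointer width vs. cache density of struct list_head.
# Assumption: 64-byte cache lines (not stated in the thread).
CACHE_LINE_BYTES = 64


def list_head_bytes(pointer_bytes: int) -> int:
    """struct list_head holds two pointers: next and prev."""
    return 2 * pointer_bytes


def list_heads_per_line(pointer_bytes: int,
                        line_bytes: int = CACHE_LINE_BYTES) -> int:
    """How many bare list_head entries fit in one cache line."""
    return line_bytes // list_head_bytes(pointer_bytes)


def list_heads_in_cache(pointer_bytes: int, cache_bytes: int) -> int:
    """How many bare list_head entries fit in a cache of cache_bytes."""
    return cache_bytes // list_head_bytes(pointer_bytes)


for abi, ptr in (("ilp32", 4), ("lp64", 8)):
    print(abi,
          "sizeof(list_head) =", list_head_bytes(ptr),
          "per line =", list_heads_per_line(ptr),
          "per 32KB L1 =", list_heads_in_cache(ptr, 32 * 1024))
```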

>
> Arnd

--
Best Regards
Guo Ren

2023-05-21 12:55:03

Subject: Re: [RFC PATCH 00/22] riscv: s64ilp32: Running 32-bit Linux kernel on 64-bit supervisor mode

On Fri, May 19, 2023 at 8:14 AM Paul Walmsley <[email protected]> wrote:
>
> On Thu, 18 May 2023, Palmer Dabbelt wrote:
>
> > On Thu, 18 May 2023 06:09:51 PDT (-0700), [email protected] wrote:
> >
> > > This patch series adds s64ilp32 support to riscv. The term s64ilp32
> > > means smode-xlen=64 and -mabi=ilp32 (ints, longs, and pointers are all
> > > 32-bit), i.e., running 32-bit Linux kernel on pure 64-bit supervisor
> > > mode. There have been many 64ilp32 abis existing, such as mips-n32 [1],
> > > arm-aarch64ilp32 [2], and x86-x32 [3], but they are all about userspace.
> > > Thus, this should be the first time running a 32-bit Linux kernel with
> > > the 64ilp32 ABI at supervisor mode (If not, correct me).
> >
> > Does anyone actually want this? At a bare minimum we'd need to add it to the
> > psABI, which would presumably also be required on the compiler side of things.
> >
> > It's not even clear anyone wants rv64/ilp32 in userspace, the kernel seems
> > like it'd be even less widely used.
>
> We've certainly talked to folks who are interested in RV64 ILP32 userspace
> with an LP64 kernel. The motivation is the usual one: to reduce data size
> and therefore (ideally) BOM cost. I think this work, if it goes forward,
> would need to go hand in hand with the RVIA psABI group.
>
> The RV64 ILP32 kernel and ILP32 userspace approach implemented by this
> patch is intriguing, but I guess for me, the question is whether it's
> worth the extra hassle vs. a pure RV32 kernel & userspace.
Running a pure RV32 kernel on 64-bit hardware (in the manner of
cortex-a35/a53/a55) is not a smart choice, because it wastes the 64-bit
hardware capabilities, and the hardware designer has to spend
additional resources & time on 32-bit machine & supervisor modes (in
Arm terms, the EL3/EL2/EL1 modes). Think about all the PMP CSRs, PMU
CSRs, and mode switching ... it's definitely wrong to follow the
cortex-a35/a53/a55 way to deal with riscv32 on 64-bit hardware. The
chapter "Why s64ilp32 has better performance?" lists the improvements
over pure 32-bit; I repeat them here:

- memcpy/memset/strcmp (s64ilp32 needs half the number of
instructions and has double the bandwidth per load/store instruction
compared with s32ilp32.)

- ebpf JIT: eBPF is a 64-bit virtual ISA, which cannot be mapped
efficiently by s32ilp32, but can be by s64ilp32 (just like s64lp64).

- Atomic64 (s64ilp32 has exactly the same native instruction mapping
as s64lp64, but s32ilp32 can only use generic_atomic64, a limited
software fallback with tradeoffs.)

- 64-bit native arithmetic instructions for the "long long" type

- riscv s64ilp32 could support cmpxchg_double for slub (it would be
the 2nd 32-bit Linux to support the feature; the 1st is i386.)
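
As a rough illustration of the first and fourth points, the sketch below
counts load/store pairs for a naive aligned word-at-a-time copy and
emulates a 64-bit add with 32-bit halves. It is a simplified model, not
the kernel's memcpy: it assumes no unrolling and a bytewise tail, and
both function names are made up for illustration.

```python
# Sketch: why 64-bit registers help even when pointers are 32-bit.
MASK32 = 0xFFFFFFFF


def copy_load_store_pairs(nbytes: int, xlen_bits: int) -> int:
    """Load/store pairs for a naive aligned copy using XLEN-wide words.

    Assumption: whole words are copied first, the tail byte by byte.
    """
    word_bytes = xlen_bits // 8
    words, tail = divmod(nbytes, word_bytes)
    return words + tail


def add64_with_32bit_halves(a: int, b: int) -> int:
    """64-bit add built from 32-bit operations, as a 32-bit ISA must do."""
    lo = (a & MASK32) + (b & MASK32)
    carry = lo >> 32                       # carry out of the low half
    hi = ((a >> 32) + (b >> 32) + carry) & MASK32
    return (hi << 32) | (lo & MASK32)


print("4096-byte copy, XLEN=32:", copy_load_store_pairs(4096, 32), "pairs")
print("4096-byte copy, XLEN=64:", copy_load_store_pairs(4096, 64), "pairs")
print(hex(add64_with_32bit_halves(0xFFFFFFFF, 1)))
```

With 64-bit registers the add is a single native operation, while the
32-bit version needs separate low-half, carry, and high-half steps.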

>
>
> - Paul

--
Best Regards
Guo Ren