2022-01-28 20:40:33

by Jason A. Donenfeld

Subject: Re: "BUG: Invalid wait context" in invalidate_batched_entropy

Hi Jonathan,

Thanks for the report. I'll try to reproduce this and see what's going on.

I'm emailing back right away, though, so that I can CC in Andy
Lutomirski, who I know has been sitting on a stack of patches that fix
up (actually, remove) the locking, so this might be one path to fixing
this.

Thanks,
Jason

On Thu, Jan 27, 2022 at 11:21 PM Jonathan Neuschäfer
<[email protected]> wrote:
>
> Hi,
>
> when booting my ARM board with lockdep and CONFIG_PROVE_LOCKING, I get the following message:
>
> [ 2.500000] =============================
> [ 2.500000] [ BUG: Invalid wait context ]
> [ 2.500000] 5.17.0-rc1 #563 Not tainted
> [ 2.500000] -----------------------------
> [ 2.500000] swapper/1 is trying to lock:
> [ 2.500000] c0b0e9cc (batched_entropy_u32.lock){....}-{3:3}, at: invalidate_batched_entropy+0x18/0x4c
> [ 2.500000] other info that might help us debug this:
> [ 2.500000] context-{2:2}
> [ 2.500000] 3 locks held by swapper/1:
> [ 2.500000] #0: c0ae86ac (event_mutex){+.+.}-{4:4}, at: event_trace_init+0x4c/0xd8
> [ 2.500000] #1: c0ae81b8 (trace_event_sem){+.+.}-{4:4}, at: event_trace_init+0x68/0xd8
> [ 2.500000] #2: c19b05cc (&sb->s_type->i_mutex_key#2){+.+.}-{4:4}, at: start_creating+0x40/0xc4
> [ 2.500000] stack backtrace:
> [ 2.500000] CPU: 0 PID: 1 Comm: swapper Not tainted 5.17.0-rc1 #563
> [ 2.500000] Hardware name: WPCM450 chip
> [ 2.500000] [<c00100a8>] (unwind_backtrace) from [<c000db2c>] (show_stack+0x10/0x14)
> [ 2.500000] [<c000db2c>] (show_stack) from [<c0054eec>] (__lock_acquire+0x3f0/0x189c)
> [ 2.500000] [<c0054eec>] (__lock_acquire) from [<c0054478>] (lock_acquire+0x2b8/0x354)
> [ 2.500000] [<c0054478>] (lock_acquire) from [<c0568028>] (_raw_spin_lock_irqsave+0x60/0x74)
> [ 2.500000] [<c0568028>] (_raw_spin_lock_irqsave) from [<c030b6f4>] (invalidate_batched_entropy+0x18/0x4c)
> [ 2.500000] [<c030b6f4>] (invalidate_batched_entropy) from [<c030e7fc>] (crng_fast_load+0xf0/0x110)
> [ 2.500000] [<c030e7fc>] (crng_fast_load) from [<c030e954>] (add_interrupt_randomness+0x138/0x200)
> [ 2.500000] [<c030e954>] (add_interrupt_randomness) from [<c0061b34>] (handle_irq_event_percpu+0x18/0x38)
> [ 2.500000] [<c0061b34>] (handle_irq_event_percpu) from [<c0061b8c>] (handle_irq_event+0x38/0x5c)
> [ 2.500000] [<c0061b8c>] (handle_irq_event) from [<c0065b28>] (handle_fasteoi_irq+0x9c/0x114)
> [ 2.500000] [<c0065b28>] (handle_fasteoi_irq) from [<c0061178>] (handle_irq_desc+0x24/0x34)
> [ 2.500000] [<c0061178>] (handle_irq_desc) from [<c056214c>] (generic_handle_arch_irq+0x28/0x3c)
> [ 2.500000] [<c056214c>] (generic_handle_arch_irq) from [<c0008eb4>] (__irq_svc+0x54/0x80)
> [ 2.500000] Exception stack(0xc1485d48 to 0xc1485d90)
> [ 2.500000] 5d40: 9780e804 00000001 c09413d4 200000d3 60000053 c016af54
> [ 2.500000] 5d60: 00000000 c0afa5b8 c14194e0 c19a1d48 c0789ce0 00000000 c1490480 c1485d98
> [ 2.500000] 5d80: c0168970 c0168984 20000053 ffffffff
> [ 2.500000] [<c0008eb4>] (__irq_svc) from [<c0168984>] (read_seqbegin.constprop.0+0x6c/0x90)
> [ 2.500000] [<c0168984>] (read_seqbegin.constprop.0) from [<c016af54>] (d_lookup+0x14/0x40)
> [ 2.500000] [<c016af54>] (d_lookup) from [<c015cecc>] (lookup_dcache+0x18/0x50)
> [ 2.500000] [<c015cecc>] (lookup_dcache) from [<c015d868>] (lookup_one_len+0x90/0xe0)
> [ 2.500000] [<c015d868>] (lookup_one_len) from [<c01e33e4>] (start_creating+0x68/0xc4)
> [ 2.500000] [<c01e33e4>] (start_creating) from [<c01e398c>] (tracefs_create_file+0x30/0x11c)
> [ 2.500000] [<c01e398c>] (tracefs_create_file) from [<c00c42f8>] (trace_create_file+0x14/0x38)
> [ 2.500000] [<c00c42f8>] (trace_create_file) from [<c00cc854>] (event_create_dir+0x310/0x420)
> [ 2.500000] [<c00cc854>] (event_create_dir) from [<c00cc9d8>] (__trace_early_add_event_dirs+0x28/0x50)
> [ 2.500000] [<c00cc9d8>] (__trace_early_add_event_dirs) from [<c07c8d64>] (event_trace_init+0x70/0xd8)
> [ 2.500000] [<c07c8d64>] (event_trace_init) from [<c07c8560>] (tracer_init_tracefs+0x14/0x284)
> [ 2.500000] [<c07c8560>] (tracer_init_tracefs) from [<c000a330>] (do_one_initcall+0xdc/0x288)
> [ 2.500000] [<c000a330>] (do_one_initcall) from [<c07bd1e8>] (kernel_init_freeable+0x1c4/0x20c)
> [ 2.500000] [<c07bd1e8>] (kernel_init_freeable) from [<c05629c0>] (kernel_init+0x10/0x110)
> [ 2.500000] [<c05629c0>] (kernel_init) from [<c00084f8>] (ret_from_fork+0x14/0x3c)
> [ 2.500000] Exception stack(0xc1485fb0 to 0xc1485ff8)
> [ 2.500000] 5fa0: 00000000 00000000 00000000 00000000
> [ 2.500000] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> [ 2.500000] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
>
>
>
> I wasn't able to bisect it, but I can see it as far back as v5.13
> (I can't easily test earlier kernels). The lock in question was
> introduced with v5.2-rc1:
>
>
> commit b7d5dc21072cda7124d13eae2aefb7343ef94197
> Author: Sebastian Andrzej Siewior <[email protected]>
> Date: Sat Apr 20 00:09:51 2019 -0400
>
> random: add a spinlock_t to struct batched_entropy
>
> The per-CPU variable batched_entropy_uXX is protected by get_cpu_var().
> This is just a preempt_disable() which ensures that the variable is only
> accessed from the local CPU. It does not protect against users on the same CPU
> from another context. It is possible that a preemptible context reads
> slot 0 and then an interrupt occurs and the same value is read again.
>
> [...]
> Add a spinlock_t to the batched_entropy data structure and acquire the
> lock while accessing it. Acquire the lock with disabled interrupts
> because this function may be used from interrupt context.
>
> Remove the batched_entropy_reset_lock lock. Now that we have a lock for
> the data structure, we can access it from a remote CPU.
>
> Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
> Signed-off-by: Theodore Ts'o <[email protected]>
>
>
> I'm not very familiar with the finer details of locking rules and
> lockdep, so any ideas (or patches) to fix this will be appreciated.
>
>
> Best regards,
> Jonathan Neuschäfer
> --
>
>
> In case it's relevant, here is my config:
>
> CONFIG_SYSVIPC=y
> CONFIG_NO_HZ=y
> CONFIG_HIGH_RES_TIMERS=y
> CONFIG_PREEMPT=y
> CONFIG_IKCONFIG=y
> CONFIG_IKCONFIG_PROC=y
> CONFIG_LOG_BUF_SHIFT=19
> CONFIG_CGROUPS=y
> CONFIG_NAMESPACES=y
> CONFIG_BLK_DEV_INITRD=y
> CONFIG_INITRAMFS_SOURCE="rootfs.cpio"
> CONFIG_CC_OPTIMIZE_FOR_SIZE=y
> CONFIG_EXPERT=y
> CONFIG_PROFILING=y
> # CONFIG_ARCH_MULTI_V7 is not set
> CONFIG_ARCH_NPCM=y
> CONFIG_ARCH_WPCM450=y
> CONFIG_CPU_DCACHE_WRITETHROUGH=y
> CONFIG_AEABI=y
> CONFIG_HIGHMEM=y
> CONFIG_UACCESS_WITH_MEMCPY=y
> CONFIG_ARM_APPENDED_DTB=y
> CONFIG_CMDLINE="earlyprintk debug console=ttyS0,115200 panic=-1"
> CONFIG_CMDLINE_EXTEND=y
> CONFIG_KEXEC=y
> # CONFIG_ATAGS_PROC is not set
> CONFIG_CPU_FREQ=y
> CONFIG_CPU_FREQ_STAT=y
> CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y
> CONFIG_CPU_IDLE=y
> CONFIG_KPROBES=y
> CONFIG_MODULES=y
> CONFIG_MODULE_UNLOAD=y
> CONFIG_ZSWAP=y
> CONFIG_ZSWAP_DEFAULT_ON=y
> CONFIG_Z3FOLD=y
> CONFIG_NET=y
> CONFIG_PACKET=y
> CONFIG_UNIX=y
> CONFIG_INET=y
> CONFIG_IP_MULTICAST=y
> CONFIG_IP_PNP=y
> CONFIG_IP_PNP_DHCP=y
> CONFIG_IP_PNP_BOOTP=y
> CONFIG_NET_DSA=y
> CONFIG_NET_DSA_TAG_DSA=y
> CONFIG_NET_DSA_TAG_EDSA=y
> CONFIG_NET_DSA_TAG_TRAILER=y
> CONFIG_NET_NCSI=y
> CONFIG_NCSI_OEM_CMD_GET_MAC=y
> CONFIG_NET_PKTGEN=m
> # CONFIG_WIRELESS is not set
> CONFIG_DEVTMPFS=y
> CONFIG_DEVTMPFS_MOUNT=y
> CONFIG_DEVTMPFS_SAFE=y
> CONFIG_MTD=y
> CONFIG_MTD_CMDLINE_PARTS=y
> CONFIG_MTD_BLOCK=y
> CONFIG_MTD_CFI=y
> CONFIG_MTD_JEDECPROBE=y
> CONFIG_MTD_CFI_ADV_OPTIONS=y
> CONFIG_MTD_CFI_GEOMETRY=y
> # CONFIG_MTD_MAP_BANK_WIDTH_4 is not set
> CONFIG_MTD_CFI_INTELEXT=y
> CONFIG_MTD_CFI_STAA=y
> CONFIG_MTD_PHYSMAP=y
> CONFIG_MTD_SPI_NOR=y
> CONFIG_BLK_DEV_LOOP=y
> CONFIG_SRAM=y
> CONFIG_EEPROM_AT24=y
> CONFIG_SCSI=y
> # CONFIG_SCSI_PROC_FS is not set
> # CONFIG_SCSI_LOWLEVEL is not set
> CONFIG_NETDEVICES=y
> # CONFIG_NET_VENDOR_ALACRITECH is not set
> # CONFIG_NET_VENDOR_AMAZON is not set
> # CONFIG_NET_VENDOR_AQUANTIA is not set
> # CONFIG_NET_VENDOR_ARC is not set
> # CONFIG_NET_VENDOR_BROADCOM is not set
> # CONFIG_NET_VENDOR_CADENCE is not set
> # CONFIG_NET_VENDOR_CAVIUM is not set
> # CONFIG_NET_VENDOR_CIRRUS is not set
> # CONFIG_NET_VENDOR_CORTINA is not set
> # CONFIG_NET_VENDOR_EZCHIP is not set
> # CONFIG_NET_VENDOR_FARADAY is not set
> # CONFIG_NET_VENDOR_GOOGLE is not set
> # CONFIG_NET_VENDOR_HISILICON is not set
> # CONFIG_NET_VENDOR_HUAWEI is not set
> # CONFIG_NET_VENDOR_INTEL is not set
> # CONFIG_NET_VENDOR_MICROSOFT is not set
> # CONFIG_NET_VENDOR_LITEX is not set
> # CONFIG_NET_VENDOR_MARVELL is not set
> # CONFIG_NET_VENDOR_MELLANOX is not set
> # CONFIG_NET_VENDOR_MICREL is not set
> # CONFIG_NET_VENDOR_MICROCHIP is not set
> # CONFIG_NET_VENDOR_MICROSEMI is not set
> # CONFIG_NET_VENDOR_NATSEMI is not set
> # CONFIG_NET_VENDOR_NETRONOME is not set
> # CONFIG_NET_VENDOR_NI is not set
> # CONFIG_NET_VENDOR_PENSANDO is not set
> # CONFIG_NET_VENDOR_QUALCOMM is not set
> # CONFIG_NET_VENDOR_RENESAS is not set
> # CONFIG_NET_VENDOR_ROCKER is not set
> # CONFIG_NET_VENDOR_SAMSUNG is not set
> # CONFIG_NET_VENDOR_SEEQ is not set
> # CONFIG_NET_VENDOR_SOLARFLARE is not set
> # CONFIG_NET_VENDOR_SMSC is not set
> # CONFIG_NET_VENDOR_SOCIONEXT is not set
> # CONFIG_NET_VENDOR_STMICRO is not set
> # CONFIG_NET_VENDOR_SYNOPSYS is not set
> # CONFIG_NET_VENDOR_VIA is not set
> # CONFIG_NET_VENDOR_WIZNET is not set
> # CONFIG_NET_VENDOR_XILINX is not set
> CONFIG_DAVICOM_PHY=y
> CONFIG_MARVELL_PHY=y
> CONFIG_MICREL_PHY=y
> CONFIG_REALTEK_PHY=y
> # CONFIG_WLAN is not set
> CONFIG_INPUT_EVDEV=y
> CONFIG_KEYBOARD_QT1070=m
> CONFIG_KEYBOARD_GPIO=y
> # CONFIG_INPUT_MOUSE is not set
> CONFIG_VT_HW_CONSOLE_BINDING=y
> CONFIG_LEGACY_PTY_COUNT=16
> CONFIG_SERIAL_8250=y
> CONFIG_SERIAL_8250_CONSOLE=y
> CONFIG_SERIAL_8250_NR_UARTS=6
> CONFIG_SERIAL_8250_RUNTIME_UARTS=6
> CONFIG_SERIAL_8250_EXTENDED=y
> CONFIG_SERIAL_8250_MANY_PORTS=y
> CONFIG_SERIAL_8250_ASPEED_VUART=m
> CONFIG_SERIAL_OF_PLATFORM=y
> CONFIG_HW_RANDOM=y
> CONFIG_I2C=y
> # CONFIG_I2C_COMPAT is not set
> CONFIG_I2C_CHARDEV=y
> CONFIG_I2C_MUX=y
> CONFIG_SPI=y
> CONFIG_SPI_BITBANG=y
> CONFIG_PINCTRL_SINGLE=y
> CONFIG_GPIOLIB=y
> CONFIG_DEBUG_GPIO=y
> CONFIG_GPIO_SYSFS=y
> CONFIG_GPIO_PCF857X=y
> CONFIG_POWER_RESET=y
> CONFIG_POWER_RESET_GPIO=y
> CONFIG_POWER_SUPPLY=y
> CONFIG_THERMAL=y
> CONFIG_WATCHDOG=y
> CONFIG_NPCM7XX_WATCHDOG=y
> CONFIG_MFD_ATMEL_HLCDC=y
> CONFIG_MFD_SYSCON=y
> CONFIG_REGULATOR=y
> CONFIG_REGULATOR_FIXED_VOLTAGE=y
> # CONFIG_DRM_DEBUG_MODESET_LOCK is not set
> CONFIG_FB=y
> CONFIG_FB_MODE_HELPERS=y
> CONFIG_HID_A4TECH=y
> CONFIG_HID_APPLE=y
> CONFIG_HID_BELKIN=y
> CONFIG_HID_CHERRY=y
> CONFIG_HID_CYPRESS=y
> CONFIG_HID_DRAGONRISE=y
> CONFIG_HID_EZKEY=y
> CONFIG_HID_GYRATION=y
> CONFIG_HID_ITE=y
> CONFIG_HID_TWINHAN=y
> CONFIG_HID_KENSINGTON=y
> CONFIG_HID_REDRAGON=y
> CONFIG_HID_MICROSOFT=y
> CONFIG_HID_MONTEREY=y
> CONFIG_HID_PANTHERLORD=y
> CONFIG_HID_PETALYNX=y
> CONFIG_HID_SUNPLUS=y
> CONFIG_HID_GREENASIA=y
> CONFIG_HID_SMARTJOYPLUS=y
> CONFIG_HID_TOPSEED=y
> CONFIG_HID_ZEROPLUS=y
> CONFIG_USB_CHIPIDEA=y
> CONFIG_USB_CHIPIDEA_UDC=y
> CONFIG_NOP_USB_XCEIV=y
> CONFIG_USB_GADGET=y
> CONFIG_U_SERIAL_CONSOLE=y
> CONFIG_USB_CONFIGFS=y
> CONFIG_USB_CONFIGFS_SERIAL=y
> CONFIG_USB_CONFIGFS_RNDIS=y
> CONFIG_USB_CONFIGFS_MASS_STORAGE=y
> CONFIG_USB_CONFIGFS_F_HID=y
> CONFIG_MMC=y
> CONFIG_SDIO_UART=y
> CONFIG_MMC_SDHCI=m
> CONFIG_MMC_SDHCI_PLTFM=m
> CONFIG_MMC_SDHCI_OF_ASPEED=m
> CONFIG_NEW_LEDS=y
> CONFIG_LEDS_CLASS=y
> CONFIG_LEDS_GPIO=y
> CONFIG_LEDS_TRIGGERS=y
> CONFIG_LEDS_TRIGGER_TIMER=y
> CONFIG_LEDS_TRIGGER_ONESHOT=y
> CONFIG_LEDS_TRIGGER_MTD=y
> CONFIG_LEDS_TRIGGER_HEARTBEAT=y
> CONFIG_LEDS_TRIGGER_CPU=y
> CONFIG_LEDS_TRIGGER_ACTIVITY=y
> CONFIG_LEDS_TRIGGER_DEFAULT_ON=y
> CONFIG_LEDS_TRIGGER_PANIC=y
> CONFIG_LEDS_TRIGGER_TTY=y
> CONFIG_RTC_CLASS=y
> CONFIG_RTC_DRV_RS5C372=y
> CONFIG_RTC_DRV_PCF8563=y
> CONFIG_RTC_DRV_S35390A=y
> CONFIG_RTC_DRV_RV3029C2=m
> CONFIG_DMADEVICES=y
> CONFIG_SYNC_FILE=y
> # CONFIG_DMABUF_DEBUG is not set
> # CONFIG_VIRTIO_MENU is not set
> # CONFIG_VHOST_MENU is not set
> CONFIG_STAGING=y
> # CONFIG_SURFACE_PLATFORMS is not set
> # CONFIG_IOMMU_SUPPORT is not set
> CONFIG_MEMORY=y
> CONFIG_PWM=y
> CONFIG_PWM_ATMEL_HLCDC_PWM=m
> CONFIG_GENERIC_PHY=y
> CONFIG_ISO9660_FS=m
> CONFIG_JOLIET=y
> CONFIG_UDF_FS=m
> CONFIG_MSDOS_FS=y
> CONFIG_VFAT_FS=y
> CONFIG_TMPFS=y
> # CONFIG_MISC_FILESYSTEMS is not set
> # CONFIG_NETWORK_FILESYSTEMS is not set
> CONFIG_NLS_CODEPAGE_437=y
> CONFIG_NLS_CODEPAGE_850=y
> CONFIG_NLS_ISO8859_1=y
> CONFIG_NLS_ISO8859_2=y
> CONFIG_NLS_UTF8=y
> CONFIG_KEYS=y
> CONFIG_LSM="lockdown,yama,loadpin,safesetid,integrity,bpf"
> CONFIG_CRYPTO_RSA=y
> CONFIG_CRYPTO_CCM=y
> CONFIG_CRYPTO_GCM=y
> CONFIG_CRYPTO_CMAC=y
> CONFIG_CRYPTO_SHA256=y
> CONFIG_CRYPTO_AES=y
> CONFIG_CRYPTO_DEFLATE=y
> CONFIG_CRYPTO_ZSTD=y
> CONFIG_ASYMMETRIC_KEY_TYPE=y
> CONFIG_ASYMMETRIC_PUBLIC_KEY_SUBTYPE=y
> CONFIG_X509_CERTIFICATE_PARSER=y
> CONFIG_PKCS7_MESSAGE_PARSER=y
> CONFIG_SYSTEM_TRUSTED_KEYRING=y
> CONFIG_CRC_CCITT=y
> CONFIG_CRC16=y
> CONFIG_LIBCRC32C=y
> CONFIG_DMA_API_DEBUG=y
> CONFIG_PRINTK_TIME=y
> CONFIG_DEBUG_INFO=y
> CONFIG_DEBUG_INFO_COMPRESSED=y
> CONFIG_MAGIC_SYSRQ=y
> CONFIG_DEBUG_FS=y
> # CONFIG_SCHED_DEBUG is not set
> # CONFIG_DEBUG_PREEMPT is not set
> CONFIG_PROVE_LOCKING=y
> CONFIG_PROVE_RAW_LOCK_NESTING=y
> CONFIG_DEBUG_LOCKDEP=y
> CONFIG_DEBUG_ATOMIC_SLEEP=y
> # CONFIG_FTRACE is not set
> CONFIG_BACKTRACE_VERBOSE=y
> CONFIG_DEBUG_USER=y
> CONFIG_DEBUG_LL=y
> CONFIG_DEBUG_LL_UART_8250=y
> CONFIG_DEBUG_UART_PHYS=0xb8000000
> CONFIG_DEBUG_UART_VIRT=0x0ff000000
> CONFIG_DEBUG_UART_8250_WORD=y
> CONFIG_EARLY_PRINTK=y


Subject: Re: "BUG: Invalid wait context" in invalidate_batched_entropy

On 2022-01-27 23:26:32 [+0100], Jason A. Donenfeld wrote:
> Hi Jonathan,
Hi Jason,

> Thanks for the report. I'll try to reproduce this and see what's going on.
>
> I'm emailing back right away, though, so that I can CC in Andy
> Lutomirski, who I know has been sitting on a stack of patches that fix
> up (actually, remove) the locking, so this might be one path to fixing
> this.

This report is due to CONFIG_PROVE_LOCKING=y _and_
CONFIG_PROVE_RAW_LOCK_NESTING=y. It reports a nesting problem
(raw_spinlock_t -> spinlock_t lock ordering) which becomes a real
problem on PREEMPT_RT.

I've been testing my old series on top of 5.17-rc1. With blake2 the
numbers dropped a little. I'm getting 3-6us on average and 16-26us
worst case, and with NUMA it still goes up to 40-50us.
If you still object to the previous approach, and neither tglx nor
peterz disagrees, we could try making the lock a raw_spinlock_t and
adding a mutex_t around the userspace interface to lower the lock
contention. But even then we need to find a way to move the crng init
part (crng_fast_load()) out of the hard-IRQ path.

> Thanks,
> Jason

Sebastian

2022-01-31 11:12:25

by Jason A. Donenfeld

Subject: [PATCH] random: remove batched entropy locking

From: Andy Lutomirski <[email protected]>

We don't need spinlocks to protect batched entropy -- all we need
is a little bit of care. This should fix up the following splat that
Jonathan received with a PROVE_LOCKING=y/PROVE_RAW_LOCK_NESTING=y
kernel:

[ 2.500000] [ BUG: Invalid wait context ]
[ 2.500000] 5.17.0-rc1 #563 Not tainted
[ 2.500000] -----------------------------
[ 2.500000] swapper/1 is trying to lock:
[ 2.500000] c0b0e9cc (batched_entropy_u32.lock){....}-{3:3}, at: invalidate_batched_entropy+0x18/0x4c
[ 2.500000] other info that might help us debug this:
[ 2.500000] context-{2:2}
[ 2.500000] 3 locks held by swapper/1:
[ 2.500000] #0: c0ae86ac (event_mutex){+.+.}-{4:4}, at: event_trace_init+0x4c/0xd8
[ 2.500000] #1: c0ae81b8 (trace_event_sem){+.+.}-{4:4}, at: event_trace_init+0x68/0xd8
[ 2.500000] #2: c19b05cc (&sb->s_type->i_mutex_key#2){+.+.}-{4:4}, at: start_creating+0x40/0xc4
[ 2.500000] stack backtrace:
[ 2.500000] CPU: 0 PID: 1 Comm: swapper Not tainted 5.17.0-rc1 #563
[ 2.500000] Hardware name: WPCM450 chip
[ 2.500000] [<c00100a8>] (unwind_backtrace) from [<c000db2c>] (show_stack+0x10/0x14)
[ 2.500000] [<c000db2c>] (show_stack) from [<c0054eec>] (__lock_acquire+0x3f0/0x189c)
[ 2.500000] [<c0054eec>] (__lock_acquire) from [<c0054478>] (lock_acquire+0x2b8/0x354)
[ 2.500000] [<c0054478>] (lock_acquire) from [<c0568028>] (_raw_spin_lock_irqsave+0x60/0x74)
[ 2.500000] [<c0568028>] (_raw_spin_lock_irqsave) from [<c030b6f4>] (invalidate_batched_entropy+0x18/0x4c)
[ 2.500000] [<c030b6f4>] (invalidate_batched_entropy) from [<c030e7fc>] (crng_fast_load+0xf0/0x110)
[ 2.500000] [<c030e7fc>] (crng_fast_load) from [<c030e954>] (add_interrupt_randomness+0x138/0x200)
[ 2.500000] [<c030e954>] (add_interrupt_randomness) from [<c0061b34>] (handle_irq_event_percpu+0x18/0x38)
[ 2.500000] [<c0061b34>] (handle_irq_event_percpu) from [<c0061b8c>] (handle_irq_event+0x38/0x5c)
[ 2.500000] [<c0061b8c>] (handle_irq_event) from [<c0065b28>] (handle_fasteoi_irq+0x9c/0x114)
[ 2.500000] [<c0065b28>] (handle_fasteoi_irq) from [<c0061178>] (handle_irq_desc+0x24/0x34)
[ 2.500000] [<c0061178>] (handle_irq_desc) from [<c056214c>] (generic_handle_arch_irq+0x28/0x3c)
[ 2.500000] [<c056214c>] (generic_handle_arch_irq) from [<c0008eb4>] (__irq_svc+0x54/0x80)
[ 2.500000] Exception stack(0xc1485d48 to 0xc1485d90)
[ 2.500000] 5d40: 9780e804 00000001 c09413d4 200000d3 60000053 c016af54
[ 2.500000] 5d60: 00000000 c0afa5b8 c14194e0 c19a1d48 c0789ce0 00000000 c1490480 c1485d98
[ 2.500000] 5d80: c0168970 c0168984 20000053 ffffffff
[ 2.500000] [<c0008eb4>] (__irq_svc) from [<c0168984>] (read_seqbegin.constprop.0+0x6c/0x90)
[ 2.500000] [<c0168984>] (read_seqbegin.constprop.0) from [<c016af54>] (d_lookup+0x14/0x40)
[ 2.500000] [<c016af54>] (d_lookup) from [<c015cecc>] (lookup_dcache+0x18/0x50)
[ 2.500000] [<c015cecc>] (lookup_dcache) from [<c015d868>] (lookup_one_len+0x90/0xe0)
[ 2.500000] [<c015d868>] (lookup_one_len) from [<c01e33e4>] (start_creating+0x68/0xc4)
[ 2.500000] [<c01e33e4>] (start_creating) from [<c01e398c>] (tracefs_create_file+0x30/0x11c)
[ 2.500000] [<c01e398c>] (tracefs_create_file) from [<c00c42f8>] (trace_create_file+0x14/0x38)
[ 2.500000] [<c00c42f8>] (trace_create_file) from [<c00cc854>] (event_create_dir+0x310/0x420)
[ 2.500000] [<c00cc854>] (event_create_dir) from [<c00cc9d8>] (__trace_early_add_event_dirs+0x28/0x50)
[ 2.500000] [<c00cc9d8>] (__trace_early_add_event_dirs) from [<c07c8d64>] (event_trace_init+0x70/0xd8)
[ 2.500000] [<c07c8d64>] (event_trace_init) from [<c07c8560>] (tracer_init_tracefs+0x14/0x284)
[ 2.500000] [<c07c8560>] (tracer_init_tracefs) from [<c000a330>] (do_one_initcall+0xdc/0x288)
[ 2.500000] [<c000a330>] (do_one_initcall) from [<c07bd1e8>] (kernel_init_freeable+0x1c4/0x20c)
[ 2.500000] [<c07bd1e8>] (kernel_init_freeable) from [<c05629c0>] (kernel_init+0x10/0x110)
[ 2.500000] [<c05629c0>] (kernel_init) from [<c00084f8>] (ret_from_fork+0x14/0x3c)
[ 2.500000] Exception stack(0xc1485fb0 to 0xc1485ff8)
[ 2.500000] 5fa0: 00000000 00000000 00000000 00000000
[ 2.500000] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 2.500000] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000

Signed-off-by: Andy Lutomirski <[email protected]>
[Jason: I extracted this from a larger in-progress series of Andy's that
also unifies the two batches into one and does other performance
things. Since that's still under development, but because we need this
part to fix the CONFIG_PROVE_RAW_LOCK_NESTING issue, I've extracted it
out and applied it to the current setup. This will also make it easier
to backport to old kernels that also need the fix. I've also amended
Andy's original commit message.]
Reported-by: Jonathan Neuschäfer <[email protected]>
Link: https://lore.kernel.org/lkml/YfMa0QgsjCVdRAvJ@latitude/
Fixes: b7d5dc21072c ("random: add a spinlock_t to struct batched_entropy")
Cc: Sebastian Andrzej Siewior <[email protected]>
Cc: [email protected]
Signed-off-by: Jason A. Donenfeld <[email protected]>
---
Andy - could you take a look at this and let me know if it's still
correct after I've ported it out of your series and into a standalone
thing here? I'd prefer to hold off on moving forward on this until I
receive your green light. I'm also still a bit uncertain about your NB:
comment regarding the acceptable race. If you could elaborate on that
argument, it might save me a few cycles with my thinking cap on.

drivers/char/random.c | 57 ++++++++++++++++++++++++-------------------
1 file changed, 32 insertions(+), 25 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index b411182df6f6..22c190aecbe8 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -2063,7 +2063,6 @@ struct batched_entropy {
u32 entropy_u32[CHACHA_BLOCK_SIZE / sizeof(u32)];
};
unsigned int position;
- spinlock_t batch_lock;
};

/*
@@ -2075,7 +2074,7 @@ struct batched_entropy {
* point prior.
*/
static DEFINE_PER_CPU(struct batched_entropy, batched_entropy_u64) = {
- .batch_lock = __SPIN_LOCK_UNLOCKED(batched_entropy_u64.lock),
+ .position = ARRAY_SIZE(((struct batched_entropy *)0)->entropy_u64)
};

u64 get_random_u64(void)
@@ -2083,42 +2082,55 @@ u64 get_random_u64(void)
u64 ret;
unsigned long flags;
struct batched_entropy *batch;
+ size_t position;
static void *previous;

warn_unseeded_randomness(&previous);

- batch = raw_cpu_ptr(&batched_entropy_u64);
- spin_lock_irqsave(&batch->batch_lock, flags);
- if (batch->position % ARRAY_SIZE(batch->entropy_u64) == 0) {
+ local_irq_save(flags);
+ batch = this_cpu_ptr(&batched_entropy_u64);
+ position = READ_ONCE(batch->position);
+ /* NB: position can change to ARRAY_SIZE(batch->entropy_u64) out
+ * from under us -- see invalidate_batched_entropy(). If this
+ * happens, it's okay if we still return the data in the batch. */
+ if (unlikely(position + 1 > ARRAY_SIZE(batch->entropy_u64))) {
extract_crng((u8 *)batch->entropy_u64);
- batch->position = 0;
+ position = 0;
}
- ret = batch->entropy_u64[batch->position++];
- spin_unlock_irqrestore(&batch->batch_lock, flags);
+ ret = batch->entropy_u64[position++];
+ WRITE_ONCE(batch->position, position);
+ local_irq_restore(flags);
return ret;
}
EXPORT_SYMBOL(get_random_u64);

static DEFINE_PER_CPU(struct batched_entropy, batched_entropy_u32) = {
- .batch_lock = __SPIN_LOCK_UNLOCKED(batched_entropy_u32.lock),
+ .position = ARRAY_SIZE(((struct batched_entropy *)0)->entropy_u32)
};
+
u32 get_random_u32(void)
{
u32 ret;
unsigned long flags;
struct batched_entropy *batch;
+ size_t position;
static void *previous;

warn_unseeded_randomness(&previous);

- batch = raw_cpu_ptr(&batched_entropy_u32);
- spin_lock_irqsave(&batch->batch_lock, flags);
- if (batch->position % ARRAY_SIZE(batch->entropy_u32) == 0) {
+ local_irq_save(flags);
+ batch = this_cpu_ptr(&batched_entropy_u32);
+ position = READ_ONCE(batch->position);
+ /* NB: position can change to ARRAY_SIZE(batch->entropy_u32) out
+ * from under us -- see invalidate_batched_entropy(). If this
+ * happens, it's okay if we still return the data in the batch. */
+ if (unlikely(position + 1 > ARRAY_SIZE(batch->entropy_u32))) {
extract_crng((u8 *)batch->entropy_u32);
- batch->position = 0;
+ position = 0;
}
- ret = batch->entropy_u32[batch->position++];
- spin_unlock_irqrestore(&batch->batch_lock, flags);
+ ret = batch->entropy_u64[position++];
+ WRITE_ONCE(batch->position, position);
+ local_irq_restore(flags);
return ret;
}
EXPORT_SYMBOL(get_random_u32);
@@ -2130,20 +2142,15 @@ EXPORT_SYMBOL(get_random_u32);
static void invalidate_batched_entropy(void)
{
int cpu;
- unsigned long flags;

for_each_possible_cpu(cpu) {
- struct batched_entropy *batched_entropy;
+ struct batched_entropy *batch;

- batched_entropy = per_cpu_ptr(&batched_entropy_u32, cpu);
- spin_lock_irqsave(&batched_entropy->batch_lock, flags);
- batched_entropy->position = 0;
- spin_unlock(&batched_entropy->batch_lock);
+ batch = per_cpu_ptr(&batched_entropy_u32, cpu);
+ WRITE_ONCE(batch->position, ARRAY_SIZE(batch->entropy_u32));

- batched_entropy = per_cpu_ptr(&batched_entropy_u64, cpu);
- spin_lock(&batched_entropy->batch_lock);
- batched_entropy->position = 0;
- spin_unlock_irqrestore(&batched_entropy->batch_lock, flags);
+ batch = per_cpu_ptr(&batched_entropy_u64, cpu);
+ WRITE_ONCE(batch->position, ARRAY_SIZE(batch->entropy_u64));
}
}

--
2.35.0

2022-01-31 11:14:04

by Jason A. Donenfeld

Subject: Re: [PATCH] random: remove batched entropy locking

Hi Sebastian,

On Fri, Jan 28, 2022 at 4:44 PM Sebastian Andrzej Siewior
<[email protected]> wrote:
> NO. Could we please look at my RANDOM patches first?
> I can repost my rebased patches if there is no objection.

I did, and my reply is here:
https://lore.kernel.org/lkml/CAHmME9pzdXyD0oRYyCoVUSqqsA9h03-oR7kcNhJuPEcEMTJYgw@mail.gmail.com/

I was hoping for a series that addresses these issues. As I mentioned
before, I'm not super keen on deferring that processing in a
conditional case and having multiple entry ways into that same
functionality. I don't think that's worth it, especially if your
actual concern is just userspace calling RNDADDTOENTCNT too often
(which can be safely ratelimited). I don't think that thread needs to
spill over here, though, so feel free to follow up with a v+1 on that
series and I'll happily take a look. Alternatively, if you'd like to
approach this by providing a patch for Jonathan's issue, that makes
sense too. So far, the things in front of me are: 1) your patchset
from last month that has unresolved issues, and 2) Andy's thing, which
maybe will solve the problem (or it won't?). A third alternative from
you would be most welcome too.

Jason

2022-01-31 11:14:04

by Jason A. Donenfeld

Subject: Re: [PATCH] random: remove batched entropy locking

Hi Andy,

On Fri, Jan 28, 2022 at 4:34 PM Jason A. Donenfeld <[email protected]> wrote:
> Andy - could you take a look at this and let me know if it's still
> correct after I've ported it out of your series and into a standalone
> thing here? I'd prefer to hold off on moving forward on this until I
> receive your green light. I'm also still a bit uncertain about your NB:
> comment regarding the acceptable race. If you could elaborate on that
> argument, it might save me a few cycles with my thinking cap on.
[...]
> + position = READ_ONCE(batch->position);
> + /* NB: position can change to ARRAY_SIZE(batch->entropy_u64) out
> + * from under us -- see invalidate_batched_entropy(). If this
> + * happens, it's okay if we still return the data in the batch. */
> + if (unlikely(position + 1 > ARRAY_SIZE(batch->entropy_u64))) {
> extract_crng((u8 *)batch->entropy_u64);
> - batch->position = 0;
> + position = 0;

So the argument for this is something along the lines of -- if the
pool is invalidated _after_ the position value is read, in the worst
case, we read old data, and then we set position to value 1, so the
remaining reads _also_ are of invalidated data, and then eventually
this depletes and we get newly extracted data? So this means that in
some racy situations, we're lengthening the window during which
get_random_u{32,64}() returns bad randomness, right?
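
To spell out the interleaving I have in mind, as a hypothetical
timeline (the right-hand side is invalidate_batched_entropy() running
on another CPU):

  get_random_u64()                        invalidate_batched_entropy()
  position = READ_ONCE(batch->position);          /* reads 0 */
                                          WRITE_ONCE(batch->position,
                                                     ARRAY_SIZE(...));
  ret = batch->entropy_u64[0];            /* old, invalidated data */
  WRITE_ONCE(batch->position, 1);         /* invalidation overwritten */

...and every subsequent call keeps consuming the stale batch until
position reaches ARRAY_SIZE again.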

Couldn't that race be avoided entirely by something like the below?

Jason

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 50c50a59847e..664f1a7eb293 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -2082,14 +2082,15 @@ u64 get_random_u64(void)
u64 ret;
unsigned long flags;
struct batched_entropy *batch;
- unsigned int position;
+ unsigned int position, original;
static void *previous;

warn_unseeded_randomness(&previous);

local_irq_save(flags);
batch = this_cpu_ptr(&batched_entropy_u64);
- position = READ_ONCE(batch->position);
+try_again:
+ original = position = READ_ONCE(batch->position);
/* NB: position can change to ARRAY_SIZE(batch->entropy_u64) out
* from under us -- see invalidate_batched_entropy(). If this
* happens, it's okay if we still return the data in the batch. */
@@ -2098,6 +2099,8 @@ u64 get_random_u64(void)
position = 0;
}
ret = batch->entropy_u64[position++];
+ if (unlikely(cmpxchg(&batch->position, original, position) != original))
+ goto try_again;
WRITE_ONCE(batch->position, position);
local_irq_restore(flags);
return ret;

Subject: Re: [PATCH] random: remove batched entropy locking

On 2022-01-28 16:33:44 [+0100], Jason A. Donenfeld wrote:
> From: Andy Lutomirski <[email protected]>
>
> We don't need spinlocks to protect batched entropy -- all we need
> is a little bit of care. This should fix up the following splat that
> Jonathan received with a PROVE_LOCKING=y/PROVE_RAW_LOCK_NESTING=y
> kernel:

NO. Could we please look at my RANDOM patches first?
This affects PREEMPT_RT. There is no need to stuff this in and tag it
stable.

I can repost my rebased patches if there is no objection.
This patch invokes extract_crng() with disabled interrupts, so we didn't
gain anything IMHO.

Sebastian

2022-01-31 11:15:29

by Jason A. Donenfeld

Subject: Re: "BUG: Invalid wait context" in invalidate_batched_entropy

Hi Sebastian/Jonathan,

On Fri, Jan 28, 2022 at 9:35 AM Sebastian Andrzej Siewior
<[email protected]> wrote:
> This report is due to CONFIG_PROVE_LOCKING=y _and_
> CONFIG_PROVE_RAW_LOCK_NESTING=y. It reports a nesting problem
> (raw_spinlock_t -> spinlock_t lock ordering) which becomes a real
> problem on PREEMPT_RT.

Hmm, I'm still having a tough time reproducing this. I'm trying to
understand your intuition. Is the problem you see that something else
in the IRQ path uses a raw_spinlock_t, and then with that lock still
held, we call invalidate_batched_entropy(), which takes an ordinary
spinlock_t, non-raw? And taking a spinlock_t while holding a
raw_spinlock_t is illegal?

Jason

Subject: Re: [PATCH] random: remove batched entropy locking

On 2022-01-28 16:54:06 [+0100], Jason A. Donenfeld wrote:
> Hi Sebastian,
Hi Jason,

> On Fri, Jan 28, 2022 at 4:44 PM Sebastian Andrzej Siewior
> <[email protected]> wrote:
> > NO. Could we please look at my RANDOM patches first?
> > I can repost my rebased patched if there no objection.
>
> I did, and my reply is here:
> https://lore.kernel.org/lkml/CAHmME9pzdXyD0oRYyCoVUSqqsA9h03-oR7kcNhJuPEcEMTJYgw@mail.gmail.com/
>
> I was hoping for a series that addresses these issues. As I mentioned
> before, I'm not super keen on deferring that processing in a
> conditional case and having multiple entry ways into that same
> functionality. I don't think that's worth it, especially if your
> actual concern is just userspace calling RNDADDTOENTCNT too often
> (which can be safely ratelimited). I don't think that thread needs to

And what do you do in ratelimiting? As I explained, you get 20 that
"enter" and the following are blocked. The first 20 are already
problematic, and you need a plan B for those that can't enter.
So I suggested a mutex_t around the ioctl(), which would act as rate
limiting. You did not follow up on that idea.

> spill over here, though, so feel free to follow up with a v+1 on that
> series and I'll happily take a look. Alternatively, if you'd like to
> approach this by providing a patch for Jonathan's issue, that makes
> sense too. So far, the things in front of me are: 1) your patchset
> from last month that has unresolved issues, and 2) Andy's thing, which
> maybe will solve the problem (or it won't?). A third alternative from
> you would be most welcome too.

I made a reply yesterday, I think, with some numbers. From my
point of view it is code in IRQ context that can be avoided.
RNDADDTOENTCNT is a simple way to hammer on the lock and see how bad it
gets. Things like add_hwgenerator_randomness() don't occur so often, so
it is hard to figure out what the worst case can be.

Please ignore Jonathan's report for now. As I tried to explain: this
lockdep report shows a serious problem on PREEMPT_RT. There is _no_ need
to be concerned on a non-PREEMPT_RT kernel, but it should be addressed.
If this gets merged as-is then, thanks to the stable tag, it will get
backported (again, no change for !RT) and will collide with the
PREEMPT_RT patch. And as I mentioned, the locking does not work on
PREEMPT_RT.

> Jason

Sebastian

Subject: Re: "BUG: Invalid wait context" in invalidate_batched_entropy

On 2022-01-28 17:04:13 [+0100], Jason A. Donenfeld wrote:
> Hi Sebastian/Jonathan,
>
> On Fri, Jan 28, 2022 at 9:35 AM Sebastian Andrzej Siewior
> <[email protected]> wrote:
> > This report is due to CONFIG_PROVE_LOCKING=y _and_
> > CONFIG_PROVE_RAW_LOCK_NESTING=y. It reports a nesting problem
> > (raw_spinlock_t -> spinlock_t lock ordering) which becomes a real
> > problem on PREEMPT_RT.
>
> Hmm, I'm still having a tough time reproducing this. I'm trying to
> understand your intuition. Is the problem you see that something else
> in the IRQ path uses a raw_spinlock_t, and then with that lock still
> held, we call invalidate_batched_entropy(), which takes an ordinary
> spinlock_t, non-raw? And taking a spinlock_t while holding a
> raw_spinlock_t is illegal?

Correct. You must not acquire a spinlock_t while holding a
raw_spinlock_t. This is because on PREEMPT_RT the spinlock_t is a
sleeping lock, while raw_spinlock_t disables preemption/interrupts, so
sleeping is not possible. See
Documentation/locking/locktypes.rst

On non-PREEMPT_RT kernels both lock types (spinlock_t & raw_spinlock_t)
behave in the same way, but lockdep can tell them apart with
CONFIG_PROVE_RAW_LOCK_NESTING=y and flag code that is problematic on RT.
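
A minimal sketch of the forbidden pattern, with made-up locks purely
for illustration:

    /* Illegal nesting: spinlock_t inside a raw_spinlock_t section. */
    static DEFINE_RAW_SPINLOCK(outer);  /* still spins on PREEMPT_RT */
    static DEFINE_SPINLOCK(inner);      /* sleeping lock on PREEMPT_RT */

    static void broken(void)
    {
            unsigned long flags;

            raw_spin_lock_irqsave(&outer, flags); /* preemption + IRQs off */
            spin_lock(&inner);  /* may sleep on RT -> invalid wait context */
            spin_unlock(&inner);
            raw_spin_unlock_irqrestore(&outer, flags);
    }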

> Jason

Sebastian

2022-01-31 11:21:05

by Jason A. Donenfeld

Subject: Re: "BUG: Invalid wait context" in invalidate_batched_entropy

Hi Sebastian,

On Fri, Jan 28, 2022 at 5:19 PM Sebastian Andrzej Siewior
<[email protected]> wrote:
> Correct. You must not acquire a spinlock_t while holding a
> raw_spinlock_t. This is because on PREEMPT_RT the spinlock_t is a
> sleeping lock, while raw_spinlock_t disables preemption/interrupts, so
> sleeping is not possible. See
> Documentation/locking/locktypes.rst
>
> On non-PREEMPT_RT kernels both lock types (spinlock_t & raw_spinlock_t)
> behave in the same way, but lockdep can tell them apart with
> CONFIG_PROVE_RAW_LOCK_NESTING=y and flag code that is problematic on RT.

Gotcha. Surely, then, Andy's patch at least goes some of the way
toward fixing this, since it outright _removes_ a spinlock_t. There is
still the other spinlock_t that you want removed, I realize, though
this doesn't appear to be the one from Jonathan's bug report. It
sounds like Andy's patch might be one side of the fix, and your patch
the other?

Jason

2022-01-31 11:21:05

by Jason A. Donenfeld

Subject: Re: [PATCH] random: remove batched entropy locking

Hi Sebastian,

I wrote in my last message, "I don't think that thread needs to spill
over here, though," but clearly you disagreed, which is fine I guess.
Replies inline below:

On Fri, Jan 28, 2022 at 5:15 PM Sebastian Andrzej Siewior
<[email protected]> wrote:
> > I did, and my reply is here:
> > https://lore.kernel.org/lkml/CAHmME9pzdXyD0oRYyCoVUSqqsA9h03-oR7kcNhJuPEcEMTJYgw@mail.gmail.com/
> >
> > I was hoping for a series that addresses these issues. As I mentioned
> > before, I'm not super keen on deferring that processing in a
> > conditional case and having multiple entry ways into that same
> > functionality. I don't think that's worth it, especially if your
> > actual concern is just userspace calling RNDADDTOENTCNT too often
> > (which can be safely ratelimited). I don't think that thread needs to
>
> And what do you do in ratelimiting?

If I'm understanding the RT issue correctly, the problem is that
userspace can `for (;;) ioctl(...);`, and the high frequency of ioctls
then increases the average latency of interrupts to a level beyond the
requirements for RT. The idea of ratelimiting the ioctl would be so
that userspace is throttled from calling it too often, so that the
average latency isn't increased.
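
Concretely, something like this is what I have in mind -- a rough
sketch using the kernel's ratelimit helpers, where rndaddtoentcnt_rs
and do_rndaddtoentcnt() are hypothetical names, not the real code:

    #include <linux/ratelimit.h>

    /* Allow bursts of 20 calls, replenished every 5 seconds. */
    static DEFINE_RATELIMIT_STATE(rndaddtoentcnt_rs, 5 * HZ, 20);

    static long rndaddtoentcnt_throttled(int __user *p)
    {
            if (!__ratelimit(&rndaddtoentcnt_rs))
                    return -EBUSY;  /* throttle a hammering userspace */
            return do_rndaddtoentcnt(p); /* stand-in for the real handler */
    }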

> As I explained, you get 20 that
> "enter" and the following are block. The first 20 are already
> problematic and you need a plan-B for those that can't enter.
> So I suggested a mutex_t around the ioctl() which would act as a rate
> limiting. You did not not follow up on that idea.

A mutex_t would be fine, I think. I'd like to see what this looks like
in code, but conceptually I don't see why not.
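
To make that concrete, a sketch of what I picture -- again with
hypothetical names; do_random_ioctl() stands in for the real handler:

    static DEFINE_MUTEX(random_ioctl_lock);

    static long random_ioctl_serialized(unsigned int cmd, void __user *p)
    {
            long ret;

            /* A mutex sleeps instead of spinning, so callers queue up
             * and the ioctl path naturally self-throttles. */
            mutex_lock(&random_ioctl_lock);
            ret = do_random_ioctl(cmd, p);
            mutex_unlock(&random_ioctl_lock);
            return ret;
    }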

> Please ignore Jonathan's report for now. As I tried to explain: this
> lockdep report shows a serious problem on PREEMPT_RT. There is _no_ need
> to be concerned on a non-PREEMPT_RT kernel, but it should be addressed.
> If this gets merged as-is then, thanks to the stable tag, it will get
> backported (again, no change for !RT) and will collide with the
> PREEMPT_RT patch. And as I mentioned, the locking does not work on
> PREEMPT_RT.

Gotcha, okay, that makes sense. It sounds like Andy's patch and your
patch might be two sides of the same non-stable-marked coin for cutting
down on locks in the IRQ path.

[Relatedly, I've been doing a bit of research on other ways to cut
down the amount of processing we're doing in the IRQ path, such as
<https://xn--4db.cc/K4zqXPh8/diff>. This is really not ready to go,
and I'm not ready to have a discussion on the crypto there (please,
nobody comment on the crypto there yet; I'll be really annoyed), but
the general gist is that I think it might be possible to reduce the
number of cycles spent in IRQ with some nice new tricks.]

Jason

Subject: Re: "BUG: Invalid wait context" in invalidate_batched_entropy

On 2022-01-28 17:28:47 [+0100], Jason A. Donenfeld wrote:
> Hi Sebastian,
Hi Jason,

> Gotcha. Surely, then, Andy's patch at least goes some of the way
> toward fixing this, since it outright _removes_ a spinlock_t. There is
> still the other spinlock_t that you want removed, I realize, though
> this doesn't appear to be the one from Jonathan's bug report. It
> sounds like Andy's patch might be one side of the fix, and your patch
> the other?

Only if we want to keep that lock a raw_spinlock_t. And this change
extends your IRQ-off region. Before, I was only worried about that one
lock and all its callers. Now we have a few more possibilities.

Looking at get_random_u32(), the worst case now includes the whole of
extract_crng(). So we have the possible call chain
crng_reseed() -> crng_finalize_init(), and there we have
wake_up_interruptible() and kill_fasync(), both of which can't be called
with disabled interrupts.
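
Spelled out, that worst-case chain is:

    get_random_u32()                  /* IRQs disabled by this patch */
      extract_crng()
        crng_reseed()
          crng_finalize_init()
            wake_up_interruptible()   /* not allowed with IRQs off */
            kill_fasync()             /* same problem */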

> Jason

Sebastian

2022-01-31 11:31:04

by J. Neuschäfer

Subject: Re: [PATCH] random: remove batched entropy locking

On Fri, Jan 28, 2022 at 04:33:44PM +0100, Jason A. Donenfeld wrote:
> From: Andy Lutomirski <[email protected]>
>
> We don't need spinlocks to protect batched entropy -- all we need
> is a little bit of care. This should fix up the following splat that
> Jonathan received with a PROVE_LOCKING=y/PROVE_RAW_LOCK_NESTING=y
> kernel:
>
> [...]
>
> Signed-off-by: Andy Lutomirski <[email protected]>
> [Jason: I extracted this from a larger in-progress series of Andy's that
> also unifies the two batches into one and does other performance
> things. Since that's still under development, but because we need this
> part to fix the CONFIG_PROVE_RAW_LOCK_NESTING issue, I've extracted it
> out and applied it to the current setup. This will also make it easier
> to backport to old kernels that also need the fix. I've also amended
> Andy's original commit message.]
> Reported-by: Jonathan Neuschäfer <[email protected]>
> Link: https://lore.kernel.org/lkml/YfMa0QgsjCVdRAvJ@latitude/
> Fixes: b7d5dc21072c ("random: add a spinlock_t to struct batched_entropy")
> Cc: Sebastian Andrzej Siewior <[email protected]>
> Cc: [email protected]
> Signed-off-by: Jason A. Donenfeld <[email protected]>
> ---
> Andy - could you take a look at this and let me know if it's still
> correct after I've ported it out of your series and into a standalone
> thing here? I'd prefer to hold off on moving forward on this until I
> receive your green light. I'm also still a bit uncertain about your NB:
> comment regarding the acceptable race. If you could elaborate on that
> argument, it might save me a few cycles with my thinking cap on.
>
> drivers/char/random.c | 57 ++++++++++++++++++++++++-------------------
> 1 file changed, 32 insertions(+), 25 deletions(-)

FWIW, this does fix the splat on my machine.

Tested-by: Jonathan Neuschäfer <[email protected]>



Thanks everyone,
Jonathan



2022-01-31 11:47:12

by Jason A. Donenfeld

Subject: [PATCH v2] random: remove batched entropy locking

Rather than use spinlocks to protect batched entropy, we can instead
disable interrupts locally, since we're dealing with per-cpu data, and
manage resets with a basic generation counter. This should fix up the
below splat that Jonathan received with a PROVE_RAW_LOCK_NESTING=y
kernel.

Note that Sebastian has pointed out a few other areas where
using spinlock_t in an IRQ context is potentially problematic for
PREEMPT_RT. This patch handles one of those cases, and we'll hopefully
have additional patches for other cases.

[ 2.500000] [ BUG: Invalid wait context ]
[ 2.500000] 5.17.0-rc1 #563 Not tainted
[ 2.500000] -----------------------------
[ 2.500000] swapper/1 is trying to lock:
[ 2.500000] c0b0e9cc (batched_entropy_u32.lock){....}-{3:3}, at: invalidate_batched_entropy+0x18/0x4c
[ 2.500000] other info that might help us debug this:
[ 2.500000] context-{2:2}
[ 2.500000] 3 locks held by swapper/1:
[ 2.500000] #0: c0ae86ac (event_mutex){+.+.}-{4:4}, at: event_trace_init+0x4c/0xd8
[ 2.500000] #1: c0ae81b8 (trace_event_sem){+.+.}-{4:4}, at: event_trace_init+0x68/0xd8
[ 2.500000] #2: c19b05cc (&sb->s_type->i_mutex_key#2){+.+.}-{4:4}, at: start_creating+0x40/0xc4
[ 2.500000] stack backtrace:
[ 2.500000] CPU: 0 PID: 1 Comm: swapper Not tainted 5.17.0-rc1 #563
[ 2.500000] Hardware name: WPCM450 chip
[ 2.500000] [<c00100a8>] (unwind_backtrace) from [<c000db2c>] (show_stack+0x10/0x14)
[ 2.500000] [<c000db2c>] (show_stack) from [<c0054eec>] (__lock_acquire+0x3f0/0x189c)
[ 2.500000] [<c0054eec>] (__lock_acquire) from [<c0054478>] (lock_acquire+0x2b8/0x354)
[ 2.500000] [<c0054478>] (lock_acquire) from [<c0568028>] (_raw_spin_lock_irqsave+0x60/0x74)
[ 2.500000] [<c0568028>] (_raw_spin_lock_irqsave) from [<c030b6f4>] (invalidate_batched_entropy+0x18/0x4c)
[ 2.500000] [<c030b6f4>] (invalidate_batched_entropy) from [<c030e7fc>] (crng_fast_load+0xf0/0x110)
[ 2.500000] [<c030e7fc>] (crng_fast_load) from [<c030e954>] (add_interrupt_randomness+0x138/0x200)
[ 2.500000] [<c030e954>] (add_interrupt_randomness) from [<c0061b34>] (handle_irq_event_percpu+0x18/0x38)
[ 2.500000] [<c0061b34>] (handle_irq_event_percpu) from [<c0061b8c>] (handle_irq_event+0x38/0x5c)
[ 2.500000] [<c0061b8c>] (handle_irq_event) from [<c0065b28>] (handle_fasteoi_irq+0x9c/0x114)
[ 2.500000] [<c0065b28>] (handle_fasteoi_irq) from [<c0061178>] (handle_irq_desc+0x24/0x34)
[ 2.500000] [<c0061178>] (handle_irq_desc) from [<c056214c>] (generic_handle_arch_irq+0x28/0x3c)
[ 2.500000] [<c056214c>] (generic_handle_arch_irq) from [<c0008eb4>] (__irq_svc+0x54/0x80)
[ 2.500000] Exception stack(0xc1485d48 to 0xc1485d90)
[ 2.500000] 5d40: 9780e804 00000001 c09413d4 200000d3 60000053 c016af54
[ 2.500000] 5d60: 00000000 c0afa5b8 c14194e0 c19a1d48 c0789ce0 00000000 c1490480 c1485d98
[ 2.500000] 5d80: c0168970 c0168984 20000053 ffffffff
[ 2.500000] [<c0008eb4>] (__irq_svc) from [<c0168984>] (read_seqbegin.constprop.0+0x6c/0x90)
[ 2.500000] [<c0168984>] (read_seqbegin.constprop.0) from [<c016af54>] (d_lookup+0x14/0x40)
[ 2.500000] [<c016af54>] (d_lookup) from [<c015cecc>] (lookup_dcache+0x18/0x50)
[ 2.500000] [<c015cecc>] (lookup_dcache) from [<c015d868>] (lookup_one_len+0x90/0xe0)
[ 2.500000] [<c015d868>] (lookup_one_len) from [<c01e33e4>] (start_creating+0x68/0xc4)
[ 2.500000] [<c01e33e4>] (start_creating) from [<c01e398c>] (tracefs_create_file+0x30/0x11c)
[ 2.500000] [<c01e398c>] (tracefs_create_file) from [<c00c42f8>] (trace_create_file+0x14/0x38)
[ 2.500000] [<c00c42f8>] (trace_create_file) from [<c00cc854>] (event_create_dir+0x310/0x420)
[ 2.500000] [<c00cc854>] (event_create_dir) from [<c00cc9d8>] (__trace_early_add_event_dirs+0x28/0x50)
[ 2.500000] [<c00cc9d8>] (__trace_early_add_event_dirs) from [<c07c8d64>] (event_trace_init+0x70/0xd8)
[ 2.500000] [<c07c8d64>] (event_trace_init) from [<c07c8560>] (tracer_init_tracefs+0x14/0x284)
[ 2.500000] [<c07c8560>] (tracer_init_tracefs) from [<c000a330>] (do_one_initcall+0xdc/0x288)
[ 2.500000] [<c000a330>] (do_one_initcall) from [<c07bd1e8>] (kernel_init_freeable+0x1c4/0x20c)
[ 2.500000] [<c07bd1e8>] (kernel_init_freeable) from [<c05629c0>] (kernel_init+0x10/0x110)
[ 2.500000] [<c05629c0>] (kernel_init) from [<c00084f8>] (ret_from_fork+0x14/0x3c)
[ 2.500000] Exception stack(0xc1485fb0 to 0xc1485ff8)
[ 2.500000] 5fa0: 00000000 00000000 00000000 00000000
[ 2.500000] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 2.500000] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000

Reported-by: Jonathan Neuschäfer <[email protected]>
Link: https://lore.kernel.org/lkml/YfMa0QgsjCVdRAvJ@latitude/
Fixes: b7d5dc21072c ("random: add a spinlock_t to struct batched_entropy")
Cc: Andy Lutomirski <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Jason A. Donenfeld <[email protected]>
---
Changes v1->v2:
- We move from Andy's original patch, which was a bit racy, to using a
simple generation counter.

drivers/char/random.c | 58 ++++++++++++++++++++-----------------------
1 file changed, 27 insertions(+), 31 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index b411182df6f6..8b18b3f1c317 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -2057,13 +2057,15 @@ struct ctl_table random_table[] = {
};
#endif /* CONFIG_SYSCTL */

+static atomic_t batch_generation = ATOMIC_INIT(0);
+
struct batched_entropy {
union {
u64 entropy_u64[CHACHA_BLOCK_SIZE / sizeof(u64)];
u32 entropy_u32[CHACHA_BLOCK_SIZE / sizeof(u32)];
};
unsigned int position;
- spinlock_t batch_lock;
+ int generation;
};

/*
@@ -2074,9 +2076,7 @@ struct batched_entropy {
* wait_for_random_bytes() should be called and return 0 at least once at any
* point prior.
*/
-static DEFINE_PER_CPU(struct batched_entropy, batched_entropy_u64) = {
- .batch_lock = __SPIN_LOCK_UNLOCKED(batched_entropy_u64.lock),
-};
+static DEFINE_PER_CPU(struct batched_entropy, batched_entropy_u64);

u64 get_random_u64(void)
{
@@ -2084,41 +2084,52 @@ u64 get_random_u64(void)
unsigned long flags;
struct batched_entropy *batch;
static void *previous;
+ int next_gen;

warn_unseeded_randomness(&previous);

- batch = raw_cpu_ptr(&batched_entropy_u64);
- spin_lock_irqsave(&batch->batch_lock, flags);
- if (batch->position % ARRAY_SIZE(batch->entropy_u64) == 0) {
+ local_irq_save(flags);
+ batch = this_cpu_ptr(&batched_entropy_u64);
+
+ next_gen = atomic_read(&batch_generation);
+ if (batch->position % ARRAY_SIZE(batch->entropy_u64) == 0 ||
+ next_gen != batch->generation) {
extract_crng((u8 *)batch->entropy_u64);
batch->position = 0;
+ batch->generation = next_gen;
}
+
ret = batch->entropy_u64[batch->position++];
- spin_unlock_irqrestore(&batch->batch_lock, flags);
+ local_irq_restore(flags);
return ret;
}
EXPORT_SYMBOL(get_random_u64);

-static DEFINE_PER_CPU(struct batched_entropy, batched_entropy_u32) = {
- .batch_lock = __SPIN_LOCK_UNLOCKED(batched_entropy_u32.lock),
-};
+static DEFINE_PER_CPU(struct batched_entropy, batched_entropy_u32);
+
u32 get_random_u32(void)
{
u32 ret;
unsigned long flags;
struct batched_entropy *batch;
static void *previous;
+ int next_gen;

warn_unseeded_randomness(&previous);

- batch = raw_cpu_ptr(&batched_entropy_u32);
- spin_lock_irqsave(&batch->batch_lock, flags);
- if (batch->position % ARRAY_SIZE(batch->entropy_u32) == 0) {
+ local_irq_save(flags);
+ batch = this_cpu_ptr(&batched_entropy_u32);
+
+ next_gen = atomic_read(&batch_generation);
+ if (batch->position % ARRAY_SIZE(batch->entropy_u32) == 0 ||
+ next_gen != batch->generation) {
extract_crng((u8 *)batch->entropy_u32);
batch->position = 0;
+ batch->generation = next_gen;
}
+
ret = batch->entropy_u32[batch->position++];
- spin_unlock_irqrestore(&batch->batch_lock, flags);
+ local_irq_restore(flags);
return ret;
}
EXPORT_SYMBOL(get_random_u32);
@@ -2129,22 +2140,7 @@ EXPORT_SYMBOL(get_random_u32);
* next usage. */
static void invalidate_batched_entropy(void)
{
- int cpu;
- unsigned long flags;
-
- for_each_possible_cpu(cpu) {
- struct batched_entropy *batched_entropy;
-
- batched_entropy = per_cpu_ptr(&batched_entropy_u32, cpu);
- spin_lock_irqsave(&batched_entropy->batch_lock, flags);
- batched_entropy->position = 0;
- spin_unlock(&batched_entropy->batch_lock);
-
- batched_entropy = per_cpu_ptr(&batched_entropy_u64, cpu);
- spin_lock(&batched_entropy->batch_lock);
- batched_entropy->position = 0;
- spin_unlock_irqrestore(&batched_entropy->batch_lock, flags);
- }
+ atomic_inc(&batch_generation);
}

/**
--
2.35.0

2022-01-31 23:12:12

by kernel test robot

Subject: [random] 1e1724f9dd: UBSAN:array-index-out-of-bounds_in_drivers/char/random.c



Greeting,

FYI, we noticed the following commit (built with clang-14):

commit: 1e1724f9ddd1649555105fd31a8973e7a2e5466c ("[PATCH] random: remove batched entropy locking")
url: https://github.com/0day-ci/linux/commits/Jason-A-Donenfeld/random-remove-batched-entropy-locking/20220128-233457
base: https://git.kernel.org/cgit/linux/kernel/git/gregkh/char-misc.git 710f8af199ee9d72dd87083edd55c5ee250ee6f4
patch link: https://lore.kernel.org/lkml/[email protected]

in testcase: boot

on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):


+----------------------------------------------------------+------------+------------+
| | 710f8af199 | 1e1724f9dd |
+----------------------------------------------------------+------------+------------+
| UBSAN:array-index-out-of-bounds_in_drivers/char/random.c | 0 | 13 |
| BUG:KASAN:global-out-of-bounds_in_get_random_u32 | 0 | 13 |
+----------------------------------------------------------+------------+------------+


If you fix the issue, kindly add following tag
Reported-by: kernel test robot <[email protected]>


[ 29.921782][ T1] UBSAN: array-index-out-of-bounds in drivers/char/random.c:2141:8
[ 29.923207][ T1] index 8 is out of range for type 'u64[8]' (aka 'unsigned long long[8]')
[ 29.923634][ T1] CPU: 0 PID: 1 Comm: swapper Not tainted 5.17.0-rc1-00010-g1e1724f9ddd1 #2 51d507a9ab4d92cb438b1c02ba5a02d8ac52cd1d
[ 29.923634][ T1] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[ 29.923634][ T1] Call Trace:
[ 29.923634][ T1] <TASK>
[ 29.923634][ T1] dump_stack_lvl (??:?)
[ 29.923634][ T1] dump_stack (??:?)
[ 29.923634][ T1] __ubsan_handle_out_of_bounds (??:?)
[ 29.923634][ T1] get_random_u32 (??:?)
[ 29.923634][ T1] bucket_table_alloc (rhashtable.c:?)
[ 29.923634][ T1] rhashtable_init (??:?)
[ 29.923634][ T1] ? rcu_read_lock_sched_held (??:?)
[ 29.923634][ T1] ? bpf_iter_netlink (af_netlink.c:?)
[ 29.923634][ T1] netlink_proto_init (af_netlink.c:?)
[ 29.923634][ T1] do_one_initcall (??:?)
[ 29.923634][ T1] ? bpf_iter_netlink (af_netlink.c:?)
[ 29.923634][ T1] do_initcall_level (main.c:?)
[ 29.923634][ T1] do_initcalls (main.c:?)
[ 29.923634][ T1] do_basic_setup (main.c:?)
[ 29.923634][ T1] kernel_init_freeable (main.c:?)
[ 29.923634][ T1] ? rest_init (main.c:?)
[ 29.923634][ T1] kernel_init (main.c:?)
[ 29.923634][ T1] ? rest_init (main.c:?)
[ 29.923634][ T1] ret_from_fork (??:?)
[ 29.923634][ T1] </TASK>
[ 29.923634][ T1] ================================================================================
[ 29.923718][ T1] ==================================================================
[ 29.924895][ T1] BUG: KASAN: global-out-of-bounds in get_random_u32 (??:?)
[ 29.926024][ T1] Read of size 8 at addr ffffffffb4fe94c0 by task swapper/1
[ 29.926967][ T1]
[ 29.926967][ T1] CPU: 0 PID: 1 Comm: swapper Not tainted 5.17.0-rc1-00010-g1e1724f9ddd1 #2 51d507a9ab4d92cb438b1c02ba5a02d8ac52cd1d
[ 29.926967][ T1] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[ 29.926967][ T1] Call Trace:
[ 29.926967][ T1] <TASK>
[ 29.926967][ T1] dump_stack_lvl (??:?)
[ 29.926967][ T1] print_address_description (report.c:?)
[ 29.926967][ T1] __kasan_report (report.c:?)
[ 29.926967][ T1] ? get_random_u32 (??:?)
[ 29.926967][ T1] kasan_report (??:?)
[ 29.926967][ T1] __asan_report_load8_noabort (??:?)
[ 29.926967][ T1] get_random_u32 (??:?)
[ 29.926967][ T1] bucket_table_alloc (rhashtable.c:?)
[ 29.926967][ T1] rhashtable_init (??:?)
[ 29.926967][ T1] ? rcu_read_lock_sched_held (??:?)
[ 29.926967][ T1] ? bpf_iter_netlink (af_netlink.c:?)
[ 29.926967][ T1] netlink_proto_init (af_netlink.c:?)
[ 29.926967][ T1] do_one_initcall (??:?)
[ 29.926967][ T1] ? bpf_iter_netlink (af_netlink.c:?)
[ 29.926967][ T1] do_initcall_level (main.c:?)
[ 29.926967][ T1] do_initcalls (main.c:?)
[ 29.926967][ T1] do_basic_setup (main.c:?)
[ 29.926967][ T1] kernel_init_freeable (main.c:?)
[ 29.926967][ T1] ? rest_init (main.c:?)
[ 29.926967][ T1] kernel_init (main.c:?)
[ 29.926967][ T1] ? rest_init (main.c:?)
[ 29.926967][ T1] ret_from_fork (??:?)
[ 29.926967][ T1] </TASK>
[ 29.926967][ T1]
[ 29.926967][ T1] The buggy address belongs to the variable:
[ 29.926967][ T1] random_write_wakeup_bits+0x0/0x20
[ 29.926967][ T1]
[ 29.926967][ T1] Memory state around the buggy address:
[ 29.926967][ T1] ffffffffb4fe9380: 00 00 00 00 00 00 00 00 00 00 00 00 f9 f9 f9 f9
[ 29.926967][ T1] ffffffffb4fe9400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 29.926967][ T1] >ffffffffb4fe9480: 00 00 00 00 00 00 00 00 04 f9 f9 f9 00 00 00 00
[ 29.926967][ T1] ^
[ 29.926967][ T1] ffffffffb4fe9500: 00 00 00 00 00 00 f9 f9 f9 f9 f9 f9 01 f9 f9 f9
[ 29.926967][ T1] ffffffffb4fe9580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 29.926967][ T1] ==================================================================
[ 29.926967][ T1] Disabling lock debugging due to kernel taint
[ 29.927133][ T1] NET: Registered PF_NETLINK/PF_ROUTE protocol family
[ 29.930966][ T1] thermal_sys: Registered thermal governor 'fair_share'
[ 29.930971][ T1] thermal_sys: Registered thermal governor 'bang_bang'
[ 29.932004][ T1] thermal_sys: Registered thermal governor 'step_wise'
[ 29.933055][ T1] thermal_sys: Registered thermal governor 'user_space'
[ 29.933708][ T1] cpuidle: using governor ladder
[ 29.935795][ T1] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
[ 29.937434][ T1] PCI: Using configuration type 1 for base access
[ 29.958988][ T1] kprobes: kprobe jump-optimization is enabled. All kprobes are optimized if possible.
[ 29.960327][ T7] Callback from call_rcu_tasks() invoked.
[ 29.961915][ T1] HugeTLB: can free 6 vmemmap pages for hugepages-2048kB
[ 29.962897][ T1] HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages
[ 29.965886][ T1] cryptd: max_cpu_qlen set to 1000
[ 29.967924][ T1] raid6: skipped pq benchmark and selected sse2x4
[ 29.968825][ T1] raid6: using ssse3x2 recovery algorithm
[ 29.969891][ T1] ACPI: Added _OSI(Module Device)
[ 29.970307][ T1] ACPI: Added _OSI(Processor Device)
[ 29.971058][ T1] ACPI: Added _OSI(3.0 _SCP Extensions)
[ 29.971841][ T1] ACPI: Added _OSI(Processor Aggregator Device)
[ 29.972747][ T1] ACPI: Added _OSI(Linux-Dell-Video)
[ 29.973549][ T1] ACPI: Added _OSI(Linux-Lenovo-NV-HDMI-Audio)
[ 29.973648][ T1] ACPI: Added _OSI(Linux-HPI-Hybrid-Graphics)
[ 29.994328][ T1] ACPI: 1 ACPI AML tables successfully acquired and loaded
[ 30.006626][ T1] ACPI: Interpreter enabled
[ 30.007071][ T1] ACPI: PM: (supports S0 S3 S5)
[ 30.007783][ T1] ACPI: Using IOAPIC for interrupt routing
[ 30.008714][ T1] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
[ 30.011387][ T1] ACPI: Enabled 2 GPEs in block 00 to 0F
[ 30.053305][ T1] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
[ 30.053667][ T1] acpi PNP0A03:00: _OSC: OS supports [ASPM ClockPM Segments MSI HPX-Type3]
[ 30.054872][ T1] acpi PNP0A03:00: PCIe port services disabled; not requesting _OSC control
[ 30.056154][ T1] acpi PNP0A03:00: fail to add MMCONFIG information, can't access extended PCI configuration space under this bridge.
[ 30.057970][ T1] acpiphp: Slot [3] registered
[ 30.058877][ T1] acpiphp: Slot [4] registered
[ 30.059769][ T1] acpiphp: Slot [5] registered
[ 30.060516][ T1] acpiphp: Slot [6] registered
[ 30.061393][ T1] acpiphp: Slot [7] registered
[ 30.062306][ T1] acpiphp: Slot [8] registered
[ 30.063187][ T1] acpiphp: Slot [9] registered
[ 30.063877][ T1] acpiphp: Slot [10] registered
[ 30.064814][ T1] acpiphp: Slot [11] registered
[ 30.065712][ T1] acpiphp: Slot [12] registered
[ 30.066613][ T1] acpiphp: Slot [13] registered
[ 30.067181][ T1] acpiphp: Slot [14] registered
[ 30.068082][ T1] acpiphp: Slot [15] registered
[ 30.068992][ T1] acpiphp: Slot [16] registered
[ 30.069889][ T1] acpiphp: Slot [17] registered
[ 30.070506][ T1] acpiphp: Slot [18] registered
[ 30.071401][ T1] acpiphp: Slot [19] registered
[ 30.072314][ T1] acpiphp: Slot [20] registered
[ 30.073206][ T1] acpiphp: Slot [21] registered
[ 30.073840][ T1] acpiphp: Slot [22] registered
[ 30.074765][ T1] acpiphp: Slot [23] registered
[ 30.075669][ T1] acpiphp: Slot [24] registered
[ 30.076557][ T1] acpiphp: Slot [25] registered
[ 30.077176][ T1] acpiphp: Slot [26] registered
[ 30.078073][ T1] acpiphp: Slot [27] registered
[ 30.078982][ T1] acpiphp: Slot [28] registered


To reproduce:

# build kernel
cd linux
cp config-5.17.0-rc1-00010-g1e1724f9ddd1 .config
make HOSTCC=clang-14 CC=clang-14 ARCH=x86_64 olddefconfig prepare modules_prepare bzImage modules
make HOSTCC=clang-14 CC=clang-14 ARCH=x86_64 INSTALL_MOD_PATH=<mod-install-dir> modules_install
cd <mod-install-dir>
find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz


git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> -m modules.cgz job-script # job-script is attached in this email

# if you come across any failure that blocks the test,
# please remove ~/.lkp and /lkp dir to run from a clean state.



---
0DAY/LKP+ Test Infrastructure | Open Source Technology Center
https://lists.01.org/hyperkitty/list/[email protected] | Intel Corporation

Thanks,
Oliver Sang


Attachments:
(No filename) (9.92 kB)
config-5.17.0-rc1-00010-g1e1724f9ddd1 (146.35 kB)
job-script (4.84 kB)
dmesg.xz (13.63 kB)

2022-02-01 07:52:04

by Jason A. Donenfeld

[permalink] [raw]
Subject: Re: [PATCH] random: remove batched entropy locking

Hey Jonathan,

On Fri, Jan 28, 2022 at 7:05 PM Jonathan Neuschäfer
<[email protected]> wrote:
> FWIW, this does fix the splat on my machine.
>
> Tested-by: Jonathan Neuschäfer <[email protected]>

Thanks for testing. If it's not too much trouble, would you verify v2
as well? It's different enough from v1 that it doesn't seem
right to carry through your Tested-by, and from talking a bit with
Andy, I think we're more likely to go with a v2-like approach than a
v1-like one.

Jason

2022-02-01 08:47:11

by J. Neuschäfer

[permalink] [raw]
Subject: Re: [PATCH v2] random: remove batched entropy locking

Hi Jason,

On Fri, Jan 28, 2022 at 11:35:48PM +0100, Jason A. Donenfeld wrote:
> Rather than use spinlocks to protect batched entropy, we can instead
> disable interrupts locally, since we're dealing with per-cpu data, and
> manage resets with a basic generation counter. This should fix up the
> below splat that Jonathan received with a PROVE_RAW_LOCK_NESTING=y
> kernel.
>
> Note that Sebastian has pointed out a few other areas where
> using spinlock_t in an IRQ context is potentially problematic for
> PREEMPT_RT. This patch handles one of those cases, and we'll hopefully
> have additional patches for other cases.
>
> [ 2.500000] [ BUG: Invalid wait context ]
[...]
>
> Reported-by: Jonathan Neuschäfer <[email protected]>
> Link: https://lore.kernel.org/lkml/YfMa0QgsjCVdRAvJ@latitude/
> Fixes: b7d5dc21072c ("random: add a spinlock_t to struct batched_entropy")
> Cc: Andy Lutomirski <[email protected]>
> Cc: Sebastian Andrzej Siewior <[email protected]>
> Signed-off-by: Jason A. Donenfeld <[email protected]>
> ---
> Changes v1->v2:
> - We move from Andy's original patch, which was a bit racey, to using a
> simple generation counter.
>
> drivers/char/random.c | 58 ++++++++++++++++++++-----------------------
> 1 file changed, 27 insertions(+), 31 deletions(-)
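
As an aside, the scheme the quoted commit message describes (per-CPU batches
reset through a single global generation counter) boils down to the following
standalone C sketch. The names and the userspace atomics are illustrative
only; the real code is in the v3 diff further down this thread:

/* generation_sketch.c: illustrative, not the kernel implementation */
#include <stdatomic.h>

static atomic_int batch_generation; /* bumped once to invalidate every batch */

struct batched_entropy {
	unsigned long long entropy_u64[8];
	unsigned int position;
	int generation;
};

/* stand-in for extract_crng(): would refill the batch with CRNG output */
static void refill(struct batched_entropy *b) { (void)b; }

unsigned long long get_u64(struct batched_entropy *batch)
{
	int next_gen = atomic_load(&batch_generation);

	/* refill when the batch is exhausted or was invalidated */
	if (batch->position % 8 == 0 || batch->generation != next_gen) {
		refill(batch);
		batch->position = 0;
		batch->generation = next_gen;
	}
	return batch->entropy_u64[batch->position++];
}

/* resetting all batches is now one atomic increment, no per-CPU locking */
void invalidate_batched_entropy(void)
{
	atomic_fetch_add(&batch_generation, 1);
}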

This patch doesn't seem to apply properly on 5.17-rc1:

$ git am rand2.mbox
Applying: random: remove batched entropy locking
error: patch failed: drivers/char/random.c:2057
error: drivers/char/random.c: patch does not apply
Patch failed at 0001 random: remove batched entropy locking
hint: Use 'git am --show-current-patch=diff' to see the failed patch
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

So I cherry-picked it from https://git.zx2c4.com/linux-rng jd/no-batch-lock.

The result looks alright (no splat during boot).

Tested-by: Jonathan Neuschäfer <[email protected]>
--


With both patches, I get the following about three minutes after boot, but
I guess it will be addressed by one of the other patchsets under discussion:

[ 202.670000] =============================
[ 202.670000] [ BUG: Invalid wait context ]
[ 202.670000] 5.17.0-rc1-00001-g4e53c88bd88e #585 Not tainted
[ 202.670000] -----------------------------
[ 202.670000] swapper/0 is trying to lock:
[ 202.670000] c0b0ff3c (random_write_wait.lock){....}-{3:3}, at: __wake_up_common_lock+0x54/0xb4
[ 202.670000] other info that might help us debug this:
[ 202.670000] context-{2:2}
[ 202.670000] no locks held by swapper/0.
[ 202.670000] stack backtrace:
[ 202.670000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.17.0-rc1-00001-g4e53c88bd88e #585
[ 202.670000] Hardware name: WPCM450 chip
[ 202.670000] [<c00100a8>] (unwind_backtrace) from [<c000db2c>] (show_stack+0x10/0x14)
[ 202.670000] [<c000db2c>] (show_stack) from [<c0054eec>] (__lock_acquire+0x3f0/0x189c)
[ 202.670000] [<c0054eec>] (__lock_acquire) from [<c0054478>] (lock_acquire+0x2b8/0x354)
[ 202.670000] [<c0054478>] (lock_acquire) from [<c0568088>] (_raw_spin_lock_irqsave+0x60/0x74)
[ 202.670000] [<c0568088>] (_raw_spin_lock_irqsave) from [<c004c34c>] (__wake_up_common_lock+0x54/0xb4)
[ 202.670000] [<c004c34c>] (__wake_up_common_lock) from [<c004c3bc>] (__wake_up+0x10/0x18)
[ 202.670000] [<c004c3bc>] (__wake_up) from [<c030d898>] (crng_reseed.constprop.0+0x240/0x338)
[ 202.670000] [<c030d898>] (crng_reseed.constprop.0) from [<c0061b34>] (handle_irq_event_percpu+0x18/0x38)
[ 202.670000] [<c0061b34>] (handle_irq_event_percpu) from [<c0061b8c>] (handle_irq_event+0x38/0x5c)
[ 202.670000] [<c0061b8c>] (handle_irq_event) from [<c0065b28>] (handle_fasteoi_irq+0x9c/0x114)
[ 202.670000] [<c0065b28>] (handle_fasteoi_irq) from [<c0061178>] (handle_irq_desc+0x24/0x34)
[ 202.670000] [<c0061178>] (handle_irq_desc) from [<c05621ac>] (generic_handle_arch_irq+0x28/0x3c)
[ 202.670000] [<c05621ac>] (generic_handle_arch_irq) from [<c0008eb4>] (__irq_svc+0x54/0x80)
[ 202.670000] Exception stack(0xc0931f20 to 0xc0931f68)
[ 202.670000] 1f20: 00000000 0005317f 0005217f 60000013 00000000 00000000 c0930000 ffffffff
[ 202.670000] 1f40: c0937ac0 00000000 00053177 00000000 600000d3 c0931f70 c000b0e8 c000b0e0
[ 202.670000] 1f60: 60000013 ffffffff
[ 202.670000] [<c0008eb4>] (__irq_svc) from [<c000b0e0>] (arch_cpu_idle+0x24/0x34)
[ 202.670000] [<c000b0e0>] (arch_cpu_idle) from [<c0567e24>] (default_idle_call+0x40/0x80)
[ 202.670000] [<c0567e24>] (default_idle_call) from [<c0046fbc>] (do_idle+0xd4/0x254)
[ 202.670000] [<c0046fbc>] (do_idle) from [<c00474e4>] (cpu_startup_entry+0xc/0x10)
[ 202.670000] [<c00474e4>] (cpu_startup_entry) from [<c07bdedc>] (start_kernel+0x5cc/0x6c0)
[ 202.670000] random: crng init done


by Sebastian Andrzej Siewior

[permalink] [raw]
Subject: Re: [PATCH v2] random: remove batched entropy locking

On 2022-02-04 01:27:55 [+0100], Jason A. Donenfeld wrote:
> Hey Andy,
>
> Think I could bug you to review this patch? The general idea is based
> on your original patch, and I think this fits what we talked about on
> IRC. I figure we'll probably both page this out of our minds after
> another week or two of not thinking about it.
>
> It's here on cgit if that's easier to look at:
> https://git.kernel.org/pub/scm/linux/kernel/git/crng/random.git/commit/?h=jd/no-batch-lock

Please don't merge this:
- This splat only occurs with CONFIG_PROVE_RAW_LOCK_NESTING enabled.
This option has this in its description:
│ NOTE: There are known nesting problems. So if you enable this
│ option expect lockdep splats until these problems have been fully
│ addressed which is work in progress. This config switch allows to
│ identify and analyze these problems. It will be removed and the
│ check permanently enabled once the main issues have been fixed.

- The problem identified by the splat affects only PREEMPT_RT. Non-RT is
not affected by this.

- This patch disables interrupts and invokes extract_crng() which leads
to other problems.


If this patch is the way forward, then it should be merged together with
fixes for the other outstanding issues.

> Thanks,
> Jason

Sebastian

by Sebastian Andrzej Siewior

[permalink] [raw]
Subject: Re: [PATCH v2] random: remove batched entropy locking

On 2022-02-04 14:42:03 [+0100], Jason A. Donenfeld wrote:
> Hi Sebastian,
Hi Jason,

> Please calm down a bit: this patch doesn't minimize the importance of
> working out a real solution for PREEMPT_RT, and I'm not under the
> illusion that this one here is the silver bullet. It does, however,
> have other merits, which may or may not have anything to do with
> PREEMPT_RT. To reiterate: I am taking your PREEMPT_RT concerns
> seriously, and I want to come up with a solution to that, which we're
> working toward more broadly in that other thread.
>
> Per your feedback on v1, this is no longer marked for stable and no
> longer purports to fix the PREEMPT_RT issues entirely. Actually, a
> large motivation for this includes the reason why Andy's original
> patch was laying around in the first place: we're trying to make this
> code faster.

The commit in the tree you cited is b43db859a36cb553102c9c80431fc44618703bda.
It does not mention anything about being faster, nor the performance
improvement and its conditions (hot path, etc.). It still has a stable tag.

> I can improve the commit message a bit though.
>
> On Fri, Feb 4, 2022 at 12:10 PM Sebastian Andrzej Siewior
> <[email protected]> wrote:
> > - This splat only occurs with CONFIG_PROVE_RAW_LOCK_NESTING enabled.
>
> Right, the commit message for v2 mentions that.
>
> > - The problem identified by the splat affects only PREEMPT_RT. Non-RT is
> > not affected by this.
>
> Right.
>
> >
> > - This patch disables interrupts and invokes extract_crng() which leads
> > to other problems.
>
> The existing code, which uses a spinlock, also disables interrupts,
> right? So this isn't actually regressing in that regard. It just
> doesn't fix your PREEMPT_RT issue, right?

The existing code uses spin_lock_irqsave(), which does not disable
interrupts on PREEMPT_RT. local_irq_save(), on the other hand, does.

> Or is the issue you see that spinlock_t is a mutex on PREEMPT_RT, so
> we're disabling interrupts here in a way that we _weren't_ originally,
> in a PREEMPT_RT context? If that's the case, then I think I see your
> objection.

Exactly.

> I wonder if it'd be enough here to disable preemption instead? But
> then we run into trouble if this is called from an interrupt.

Disabling preemption does not allow acquiring sleeping locks, so it is no win.

> Maybe it'd be best to retain the spinlock_t, which will amount to
> disabling interrupts on !PREEMPT_RT, since it'll never be contended,
> but will turn into a mutex on PREEMPT_RT, where it'll do the right
> thing from an exclusivity perspective. Would this be reasonable?

What does "retain the spinlock_t" mean, since we already have a spinlock_t?

> Andy? Any suggestions?
>
> Jason

Sebastian
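
To make the distinction in this exchange concrete, here is a minimal
kernel-style sketch (demo names, not taken from the patch) of the two
primitives being contrasted:

#include <linux/spinlock.h>
#include <linux/irqflags.h>

static DEFINE_SPINLOCK(demo_lock);

static void demo(void)
{
	unsigned long flags;

	/* spin_lock_irqsave(): disables interrupts on !PREEMPT_RT, but on
	 * PREEMPT_RT spinlock_t is a sleeping lock and interrupts stay on. */
	spin_lock_irqsave(&demo_lock, flags);
	spin_unlock_irqrestore(&demo_lock, flags);

	/* local_irq_save(): hard-disables interrupts on every configuration,
	 * including PREEMPT_RT, which is the regression at issue here. */
	local_irq_save(flags);
	local_irq_restore(flags);
}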

2022-02-04 17:38:32

by Jason A. Donenfeld

[permalink] [raw]
Subject: Re: [PATCH v2] random: remove batched entropy locking

Hi Sebastian,

On Fri, Feb 4, 2022 at 3:02 PM Sebastian Andrzej Siewior
<[email protected]> wrote:
> The commit in the tree you cited is b43db859a36cb553102c9c80431fc44618703bda.
> It does not mention anything about being faster, nor the performance
> improvement and its conditions (hot path, etc.). It still has a stable tag.

It dropped the Cc: stable@. It still has the Fixes:. I can get rid of
the Fixes: too. I'll improve that message a bunch for a potential v3.

> > Maybe it'd be best to retain the spinlock_t, which will amount to
> > disabling interrupts on !PREEMPT_RT, since it'll never be contended,
> > but will turn into a mutex on PREEMPT_RT, where it'll do the right
> > thing from an exclusivity perspective. Would this be reasonable?
>
> what does retain the spinlock_t mean since we already have a spinlock_t?

The idea would be to keep using spinlock_t like we do now -- no change
there -- but move to using this atomic generation counter so that
there's never any contention. Actually, though, I worry that that
approach would throw out the gains we're getting by chucking the
spinlock in the first place.

What if we keep a spinlock_t there on PREEMPT_RT but stick with
disabling interrupts on !PREEMPT_RT? I wish there was a solution or an
API that amounted to the same thing so there wouldn't need to be an
#ifdef, but I don't know what that'd be.

Jason

2022-02-05 16:03:25

by Jason A. Donenfeld

[permalink] [raw]
Subject: Re: [PATCH v2] random: remove batched entropy locking

Hey Andy,

Think I could bug you to review this patch? The general idea is based
on your original patch, and I think this fits what we talked about on
IRC. I figure we'll probably both page this out of our minds after
another week or two of not thinking about it.

It's here on cgit if that's easier to look at:
https://git.kernel.org/pub/scm/linux/kernel/git/crng/random.git/commit/?h=jd/no-batch-lock

Thanks,
Jason

2022-02-07 07:03:45

by Jason A. Donenfeld

[permalink] [raw]
Subject: Re: [PATCH v2] random: remove batched entropy locking

Hi Sebastian,

On Fri, Feb 4, 2022 at 3:30 PM Sebastian Andrzej Siewior
<[email protected]> wrote:
> > What if we keep a spinlock_t there on PREEMPT_RT but stick with
> > disabling interrupts on !PREEMPT_RT? I wish there was a solution or an
> > API that amounted to the same thing so there wouldn't need to be an
> > #ifdef, but I don't know what that'd be.
>
> If it is still too much, try local_lock_t and
> local_lock_irqsave(). This is kind of like your local_irq_save(), but
> you get lockdep annotations and PREEMPT_RT has spinlock_t-like
> behaviour. It also documents in-code what the scope of your locking is.

Oh, that's terrific, thanks! Sounds like exactly what we were looking
for. I'll respin this patch with that.

Jason

by Sebastian Andrzej Siewior

[permalink] [raw]
Subject: Re: [PATCH v2] random: remove batched entropy locking

On 2022-02-04 15:11:34 [+0100], Jason A. Donenfeld wrote:
> Hi Sebastian,
Hi Jason,

> On Fri, Feb 4, 2022 at 3:02 PM Sebastian Andrzej Siewior
> <[email protected]> wrote:
> > The commit in tree you cited is b43db859a36cb553102c9c80431fc44618703bda.
> > It does not mention anything regarding faster nor the performance
> > improvement and conditions (hoth path, etc). It still has a stable tag.
>
> It dropped the Cc: stable@. It still has the Fixes:. I can get rid of
> the Fixes: too. I'll improve that message a bunch for a potential v3.

Either you argue for bug fixing or for a performance improvement, and I made
it clear that it is not bug fixing. The Fixes: tag is enough for Greg to
backport it.

> > > Maybe it'd be best to retain the spinlock_t, which will amount to
> > > disabling interrupts on !PREEMPT_RT, since it'll never be contended,
> > > but will turn into a mutex on PREEMPT_RT, where it'll do the right
> > > thing from an exclusivity perspective. Would this be reasonable?
> >
> > what does retain the spinlock_t mean since we already have a spinlock_t?
>
> The idea would be to keep using spinlock_t like we do now -- no change
> there -- but move to using this atomic generation counter so that
> there's never any contention. Actually, though, I worry that that
> approach would throw out the gains we're getting by chucking the
> spinlock in the first place.

It is a per-CPU spinlock_t so there should be no contention if there is
no cross-CPU access. The overhead is two atomic operations.

> What if we keep a spinlock_t there on PREEMPT_RT but stick with
> disabling interrupts on !PREEMPT_RT? I wish there was a solution or an
> API that amounted to the same thing so there wouldn't need to be an
> #ifdef, but I don't know what that'd be.

If it is still too much, try local_lock_t and
local_lock_irqsave(). This is kind of like your local_irq_save(), but
you get lockdep annotations and PREEMPT_RT has spinlock_t-like
behaviour. It also documents in-code what the scope of your locking is.

> Jason

Sebastian
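
The local_lock_t pattern Sebastian suggests here looks roughly like the
following sketch (hypothetical demo names; compare the v3 patch below, which
applies the same pattern to the entropy batches):

#include <linux/local_lock.h>
#include <linux/percpu.h>
#include <linux/types.h>

struct demo_pcpu {
	local_lock_t lock;
	u64 value;
};

static DEFINE_PER_CPU(struct demo_pcpu, demo_data) = {
	.lock = INIT_LOCAL_LOCK(demo_data.lock),
};

static u64 demo_get(void)
{
	struct demo_pcpu *d;
	unsigned long flags;
	u64 ret;

	/* On !PREEMPT_RT this disables interrupts (with lockdep coverage);
	 * on PREEMPT_RT it takes a per-CPU spinlock_t and stays preemptible. */
	local_lock_irqsave(&demo_data.lock, flags);
	d = this_cpu_ptr(&demo_data);
	ret = d->value++;
	local_unlock_irqrestore(&demo_data.lock, flags);
	return ret;
}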

2022-02-07 19:03:31

by Jason A. Donenfeld

[permalink] [raw]
Subject: Re: [PATCH v2] random: remove batched entropy locking

Hi Sebastian,

Please calm down a bit: this patch doesn't minimize the importance of
working out a real solution for PREEMPT_RT, and I'm not under the
illusion that this one here is the silver bullet. It does, however,
have other merits, which may or may not have anything to do with
PREEMPT_RT. To reiterate: I am taking your PREEMPT_RT concerns
seriously, and I want to come up with a solution to that, which we're
working toward more broadly in that other thread.

Per your feedback on v1, this is no longer marked for stable and no
longer purports to fix the PREEMPT_RT issues entirely. Actually, a
large motivation for this includes the reason why Andy's original
patch was lying around in the first place: we're trying to make this
code faster.

I can improve the commit message a bit though.

On Fri, Feb 4, 2022 at 12:10 PM Sebastian Andrzej Siewior
<[email protected]> wrote:
> - This splat only occurs with CONFIG_PROVE_RAW_LOCK_NESTING enabled.

Right, the commit message for v2 mentions that.

> - The problem identified by the splat affects only PREEMPT_RT. Non-RT is
> not affected by this.

Right.

>
> - This patch disables interrupts and invokes extract_crng() which leads
> to other problems.

The existing code, which uses a spinlock, also disables interrupts,
right? So this isn't actually regressing in that regard. It just
doesn't fix your PREEMPT_RT issue, right?

Or is the issue you see that spinlock_t is a mutex on PREEMPT_RT, so
we're disabling interrupts here in a way that we _weren't_ originally,
in a PREEMPT_RT context? If that's the case, then I think I see your
objection.

I wonder if it'd be enough here to disable preemption instead? But
then we run into trouble if this is called from an interrupt.

Maybe it'd be best to retain the spinlock_t, which will amount to
disabling interrupts on !PREEMPT_RT, since it'll never be contended,
but will turn into a mutex on PREEMPT_RT, where it'll do the right
thing from an exclusivity perspective. Would this be reasonable?

Andy? Any suggestions?

Jason

2022-02-07 19:48:58

by Jason A. Donenfeld

[permalink] [raw]
Subject: [PATCH v3] random: remove batched entropy locking

Rather than use spinlocks to protect batched entropy, we can instead
disable interrupts locally, since we're dealing with per-cpu data, and
manage resets with a basic generation counter. At the same time, we
can't quite do this on PREEMPT_RT, where we still want spinlocks-as-
mutexes semantics. So we use a local_lock_t, which provides the right
behavior for each. Because this is a per-cpu lock, that generation
counter is still doing the necessary CPU-to-CPU communication.

This should improve performance a bit. It will also fix the linked splat
that Jonathan received with a PROVE_RAW_LOCK_NESTING=y kernel.

Note that Sebastian has pointed out a few other areas where
using spinlock_t in an IRQ context is potentially problematic for
PREEMPT_RT. This patch handles one of those cases, and we'll hopefully
have additional patches for other cases.

Suggested-by: Andy Lutomirski <[email protected]>
Reported-by: Jonathan Neuschäfer <[email protected]>
Tested-by: Jonathan Neuschäfer <[email protected]>
Link: https://lore.kernel.org/lkml/YfMa0QgsjCVdRAvJ@latitude/
Cc: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Jason A. Donenfeld <[email protected]>
---
drivers/char/random.c | 55 ++++++++++++++++++++++---------------------
1 file changed, 28 insertions(+), 27 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 455615ac169a..3e54b90a3ff8 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1732,13 +1732,16 @@ struct ctl_table random_table[] = {
};
#endif /* CONFIG_SYSCTL */

+static atomic_t batch_generation = ATOMIC_INIT(0);
+
struct batched_entropy {
union {
u64 entropy_u64[CHACHA_BLOCK_SIZE / sizeof(u64)];
u32 entropy_u32[CHACHA_BLOCK_SIZE / sizeof(u32)];
};
+ local_lock_t lock;
unsigned int position;
- spinlock_t batch_lock;
+ int generation;
};

/*
@@ -1750,7 +1753,7 @@ struct batched_entropy {
* point prior.
*/
static DEFINE_PER_CPU(struct batched_entropy, batched_entropy_u64) = {
- .batch_lock = __SPIN_LOCK_UNLOCKED(batched_entropy_u64.lock),
+ .lock = INIT_LOCAL_LOCK(batched_entropy_u64.lock)
};

u64 get_random_u64(void)
@@ -1759,41 +1762,54 @@ u64 get_random_u64(void)
unsigned long flags;
struct batched_entropy *batch;
static void *previous;
+ int next_gen;

warn_unseeded_randomness(&previous);

- batch = raw_cpu_ptr(&batched_entropy_u64);
- spin_lock_irqsave(&batch->batch_lock, flags);
- if (batch->position % ARRAY_SIZE(batch->entropy_u64) == 0) {
+ batch = this_cpu_ptr(&batched_entropy_u64);
+ local_lock_irqsave(&batch->lock, flags);
+
+ next_gen = atomic_read(&batch_generation);
+ if (batch->position % ARRAY_SIZE(batch->entropy_u64) == 0 ||
+ next_gen != batch->generation) {
extract_crng((u8 *)batch->entropy_u64);
batch->position = 0;
+ batch->generation = next_gen;
}
+
ret = batch->entropy_u64[batch->position++];
- spin_unlock_irqrestore(&batch->batch_lock, flags);
+ local_unlock_irqrestore(&batch->lock, flags);
return ret;
}
EXPORT_SYMBOL(get_random_u64);

static DEFINE_PER_CPU(struct batched_entropy, batched_entropy_u32) = {
- .batch_lock = __SPIN_LOCK_UNLOCKED(batched_entropy_u32.lock),
+ .lock = INIT_LOCAL_LOCK(batched_entropy_u32.lock)
};
+
u32 get_random_u32(void)
{
u32 ret;
unsigned long flags;
struct batched_entropy *batch;
static void *previous;
+ int next_gen;

warn_unseeded_randomness(&previous);

- batch = raw_cpu_ptr(&batched_entropy_u32);
- spin_lock_irqsave(&batch->batch_lock, flags);
- if (batch->position % ARRAY_SIZE(batch->entropy_u32) == 0) {
+ batch = this_cpu_ptr(&batched_entropy_u32);
+ local_lock_irqsave(&batch->lock, flags);
+
+ next_gen = atomic_read(&batch_generation);
+ if (batch->position % ARRAY_SIZE(batch->entropy_u32) == 0 ||
+ next_gen != batch->generation) {
extract_crng((u8 *)batch->entropy_u32);
batch->position = 0;
+ batch->generation = next_gen;
}
+
ret = batch->entropy_u32[batch->position++];
- spin_unlock_irqrestore(&batch->batch_lock, flags);
+ local_unlock_irqrestore(&batch->lock, flags);
return ret;
}
EXPORT_SYMBOL(get_random_u32);
@@ -1804,22 +1820,7 @@ EXPORT_SYMBOL(get_random_u32);
* next usage. */
static void invalidate_batched_entropy(void)
{
- int cpu;
- unsigned long flags;
-
- for_each_possible_cpu(cpu) {
- struct batched_entropy *batched_entropy;
-
- batched_entropy = per_cpu_ptr(&batched_entropy_u32, cpu);
- spin_lock_irqsave(&batched_entropy->batch_lock, flags);
- batched_entropy->position = 0;
- spin_unlock(&batched_entropy->batch_lock);
-
- batched_entropy = per_cpu_ptr(&batched_entropy_u64, cpu);
- spin_lock(&batched_entropy->batch_lock);
- batched_entropy->position = 0;
- spin_unlock_irqrestore(&batched_entropy->batch_lock, flags);
- }
+ atomic_inc(&batch_generation);
}

/**
--
2.35.0