2024-01-10 22:35:16

by Denis Benato

[permalink] [raw]
Subject: iio: iio-trig-hrtimer bug on suspend/resume when used with bmi160 and bmi323

Hello,

With this mail I am submitting bug report that is probably related to iio-trig-hrtimer but there is also the possibility for it to be specific to bmi160 and bmi323.

The described problem have been reproduced on my handheld PC (Asus RC71L) and in another handheld PC with two different gyroscope drivers: bmi323 (backported by me on v6.7, on RC71L) and bmi160.

My target hardware (RC71L that yeld to this discovery) has a bmi323 chip that does not have any interrupt pins reaching the CPU, yet I need to fetch periodically data from said device, therefore I used iio-trig-hrtimer: created a trigger, set the device and trigger sampling frequencies, bound the trigger to the device and enabled buffer: data is being read and available over /dev/iio:device0.

While in this state if I suspend my handheld I receive (from dmesg) the warning reported below and at resume data is not coming out of the iio device and the hrtimer appears to not be working. If I create a new trigger and bind the new trigger to said iio device and re-enable buffer data does come out of /dev/iio:device0 once more, until the next sleep.

Since this is important to me I have taken the time to look at both drivers and iio-trig-hrtimer and I have identified three possible reasons:

1) iio-trig-hrtimer won't work after suspend regardless of how it is used (this is what I believe is the cause)
2) iio-trig-hrtimer is stopped by the -ESHTDOWN returned by the function printing "Transfer while suspended", however that stack trace does not include function calls related to iio-trig-hrtimer and this seems less plausible
3) bmi160 and bmi323 appears to be similar and maybe are sharing a common bug with suspend (this is also why I have maintainers of those drivers in the recipient list)

Thanks for your time, patience and understanding,
Denis Benato


This one is from my device:
[ 635.956648] ------------[ cut here ]------------
[ 635.956649] i2c i2c-1: Transfer while suspended
[ 635.956663] WARNING: CPU: 9 PID: 848 at __i2c_transfer+0xaf/0x6e0
[ 635.956670] Modules linked in: ccm rfcomm ntfs3 snd_seq_dummy snd_hrtimer snd_seq cmac algif_hash algif_skcipher af_alg bnep iio_trig_hrtimer iio_trig_sysfs hid_playstation led_class_multicolor industrialio_sw_trigger uhid industrialio_configfs snd_sof_amd_acp63 snd_sof_amd_rembrandt mt7921e snd_sof_amd_renoir mt7921_common snd_sof_amd_acp snd_usb_audio hid_logitech_hidpp snd_sof_xtensa_dsp mt792x_lib snd_sof_pci r8153_ecm mt76_connac_lib cdc_ether snd_ump snd_sof snd_usbmidi_lib mt76 usbnet snd_rawmidi snd_hda_codec_realtek snd_sof_utils snd_seq_device snd_hda_scodec_cs35l41_spi mc snd_hda_codec_generic intel_rapl_msr mac80211 hid_multitouch snd_hda_codec_hdmi snd_soc_core snd_compress ac97_bus snd_pcm_dmaengine snd_pci_ps intel_rapl_common snd_hda_intel edac_mce_amd snd_intel_dspcfg snd_rpl_pci_acp6x snd_acp_pci snd_hda_codec snd_acp_legacy_common kvm_amd snd_hda_core snd_pci_acp6x snd_pci_acp5x snd_hwdep btusb snd_rn_pci_acp3x snd_hda_scodec_cs35l41_i2c snd_acp_config btrtl snd_pcm snd_hda_scodec_cs35l41 bmi323_i2c
[ 635.956705] btmtk btbcm bmi323_core kvm snd_hda_cs_dsp_ctls cfg80211 btintel snd_soc_acpi snd_timer uas ucsi_acpi cs_dsp xpad industrialio_triggered_buffer irqbypass crct10dif_pclmul polyval_clmulni polyval_generic bluetooth r8152 joydev ghash_clmulni_intel snd_soc_cs35l41_lib ecdh_generic sha1_ssse3 rapl ecc usb_storage mii pcspkr asus_nb_wmi wmi_bmof efi_pstore typec_ucsi kfifo_buf k10temp i2c_piix4 snd ff_memless snd_pci_acp3x nls_iso8859_1 libarc4 soundcore ccp typec hid_logitech_dj usbmouse usbkbd i2c_hid_acpi i2c_hid industrialio amd_pmc serial_multi_instantiate input_leds mac_hid sch_fq_codel crypto_user nfnetlink dmi_sysfs ip_tables x_tables autofs4 hid_asus asus_wmi ledtrig_audio sparse_keymap platform_profile hid_generic usbhid hid btrfs mmc_block amdgpu drm_buddy drm_suballoc_helper raid6_pq xor libcrc32c drm_exec i2c_algo_bit drm_display_helper cec gpu_sched amdxcp drm_ttm_helper ttm drm_kms_helper video crc32_pclmul sha512_ssse3 drm sha256_ssse3 nvme sdhci_pci nvme_core cqhci serio_raw xhci_pci sdhci
[ 635.956752] xhci_pci_renesas wmi overlay aesni_intel crypto_simd cryptd
[ 635.956755] CPU: 9 PID: 848 Comm: irq/90-bmi323-i Tainted: G W 6.7.0--xanmod1-neroreflex-1 #1
[ 635.956758] Hardware name: ASUSTeK COMPUTER INC. ROG Ally RC71L_RC71L/RC71L, BIOS RC71L.335 11/29/2023
[ 635.956759] RIP: 0010:__i2c_transfer+0xaf/0x6e0
[ 635.956761] Code: 00 00 48 85 c0 0f 84 2a 06 00 00 48 8b 30 48 8b 93 c0 00 00 00 48 85 d2 75 04 48 8b 53 70 48 c7 c7 9a 5e aa 99 e8 31 57 3e ff <0f> 0b e9 2e 04 00 00 ba 02 00 00 00 41 83 ff 02 0f 85 dd 01 00 00
[ 635.956763] RSP: 0018:ffffbf33c160bbf8 EFLAGS: 00010246
[ 635.956764] RAX: 41e9eaf9bb59b300 RBX: ffff9f8f013a28e8 RCX: 0000000000000027
[ 635.956765] RDX: ffffffff9a235ab8 RSI: 0000000000000002 RDI: ffff9f9152261888
[ 635.956766] RBP: 00000000ffffff94 R08: 80000000ffffe5fd R09: 656c696877207265
[ 635.956767] R10: 0000000000000023 R11: 0a6465646e657073 R12: 0000000000000000
[ 635.956768] R13: 0000000000000003 R14: ffffbf33c160bc50 R15: 0000000000000002
[ 635.956769] FS: 0000000000000000(0000) GS:ffff9f9152240000(0000) knlGS:0000000000000000
[ 635.956770] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 635.956771] CR2: 0000020b0a053000 CR3: 000000028ae10000 CR4: 0000000000750ef0
[ 635.956773] PKRU: 55555554
[ 635.956773] Call Trace:
[ 635.956775] <TASK>
[ 635.956777] ? __warn+0x13c/0x1f0
[ 635.956779] ? __i2c_transfer+0xaf/0x6e0
[ 635.956781] ? report_bug+0x16e/0x1f0
[ 635.956784] ? handle_bug+0x42/0x70
[ 635.956786] ? exc_invalid_op+0x16/0x50
[ 635.956788] ? asm_exc_invalid_op+0x16/0x20
[ 635.956791] ? __i2c_transfer+0xaf/0x6e0
[ 635.956793] ? __i2c_transfer+0xaf/0x6e0
[ 635.956794] i2c_transfer+0xe3/0x110
[ 635.956798] bmi323_regmap_i2c_read+0x7b/0xc0 [bmi323_i2c]
[ 635.956803] _regmap_raw_read+0x11d/0x200
[ 635.956806] regmap_raw_read+0x26a/0x2c0
[ 635.956807] ? __schedule+0x604/0x16b0
[ 635.956810] regmap_bulk_read+0x17b/0x200
[ 635.956813] ? iio_store_to_kfifo+0x11/0x30 [kfifo_buf]
[ 635.956818] bmi323_trigger_handler+0x52/0x160 [bmi323_core]
[ 635.956824] irq_thread_fn+0x18/0x50
[ 635.956826] irq_thread+0x21a/0x3c0
[ 635.956828] ? wake_up_and_wait_for_irq_thread_ready+0x190/0x190
[ 635.956830] ? irq_forced_thread_fn+0x110/0x110
[ 635.956831] kthread+0x26c/0x2b0
[ 635.956833] ? irq_forced_secondary_handler+0x20/0x20
[ 635.956835] ? kthreadd+0x300/0x300
[ 635.956836] ret_from_fork+0x2e/0x40
[ 635.956838] ? kthreadd+0x300/0x300
[ 635.956839] ret_from_fork_asm+0x11/0x20
[ 635.956841] </TASK>
[ 635.956842] ---[ end trace 0000000000000000 ]---

The following is from the other device (amd 5600u):
[ 67.131064] ------------[ cut here ]------------
[ 67.131067] i2c i2c-1: Transfer while suspended
[ 67.131078] WARNING: CPU: 1 PID: 1655 at drivers/i2c/i2c-core.h:54 __i2c_transfer+0x594/0x6b0
[ 67.131090] Modules linked in: iio_trig_hrtimer iio_trig_sysfs industrialio_sw_trigger industrialio_configfs ccm rfcomm cmac algif_hash algif_skcipher af_alg snd_seq_dummy snd_hrtimer snd_seq snd_seq_device exfat bnep wl(POE) intel_rapl_msr zenergy(OE) intel_rapl_common mousedev joydev snd_sof_amd_rembrandt snd_sof_amd_renoir snd_sof_amd_acp snd_sof_pci mt7921e snd_sof_xtensa_dsp mt7921_common snd_hda_codec_realtek snd_sof mt792x_lib snd_hda_codec_generic snd_sof_utils mt76_connac_lib cdc_mbim cdc_wdm edac_mce_amd ledtrig_audio snd_hda_codec_hdmi snd_soc_core mt76 kvm_amd snd_hda_intel snd_compress snd_intel_dspcfg ac97_bus mac80211 snd_pcm_dmaengine
[ 67.131146] ACPI Error:
[ 67.131147] snd_intel_sdw_acpi kvm btusb snd_pci_ps
[ 67.131151] No handler for Region [ECRM] (00000000b2bf81c9) [EmbeddedControl]
[ 67.131152] btrtl snd_hda_codec
[ 67.131155] (20230628/evregion-130)
[ 67.131155] snd_rpl_pci_acp6x libarc4 snd_acp_pci
[ 67.131158] ACPI Error:
[ 67.131159] irqbypass
[ 67.131161] Region EmbeddedControl (ID=3) has no handler
[ 67.131165] btintel
[ 67.131166] (20230628/exfldio-261)
[ 67.131169] snd_hda_core snd_acp_legacy_common btbcm crct10dif_pclmul
[ 67.131176] ACPI Error:
[ 67.131176] bmi160_spi
[ 67.131178] Aborting method
[ 67.131178] snd_pci_acp6x
[ 67.131180] \_SB.GPIO._EVT
[ 67.131180] snd_hwdep
[ 67.131182] due to previous error (AE_NOT_EXIST)
[ 67.131182] btmtk
[ 67.131183] (20230628/psparse-529)
[ 67.131183] polyval_clmulni cfg80211 snd_pcm polyval_generic snd_pci_acp5x gf128mul cdc_ncm snd_timer snd_rn_pci_acp3x bluetooth hid_multitouch ghash_clmulni_intel cdc_ether bmi160_i2c snd_acp_config snd usbnet sha1_ssse3 vfat bmi160_core snd_soc_acpi oxp_sensors fat rapl ecdh_generic wmi_bmof industrialio_triggered_buffer mii pcspkr rfkill k10temp i2c_piix4 snd_pci_acp3x soundcore ccp kfifo_buf i2c_hid_acpi soc_button_array i2c_hid industrialio mac_hid xpad ff_memless uinput ec_sys crypto_user fuse loop nfnetlink ip_tables x_tables uas usbhid usb_storage amdgpu btrfs drm_exec blake2b_generic amdxcp xor drm_buddy raid6_pq gpu_sched libcrc32c crc32_pclmul i2c_algo_bit sha512_ssse3 drm_suballoc_helper serio_raw sha256_ssse3 drm_ttm_helper atkbd aesni_intel libps2 ttm nvme vivaldi_fmap crypto_simd drm_display_helper nvme_core cryptd video i8042 cec nvme_common wmi serio overlay ext4 crc32c_generic crc32c_intel crc16 mbcache jbd2 dm_mod
[ 67.131318] CPU: 1 PID: 1655 Comm: irq/70-i2c-BMI0 Tainted: P OE 6.6.7-chos5-chimeraos-1 #1 011c184bc075678bbf5f825a76fa2f06741839b6
[ 67.131324] Hardware name: ONE-NETBOOK TECHNOLOGY CO., LTD. ONE XPLAYER/ONE XPLAYER, BIOS V1.08_P4C8M43L6 04/28/2022
[ 67.131327] RIP: 0010:__i2c_transfer+0x594/0x6b0
[ 67.131334] Code: 8b af c0 00 00 00 48 85 ed 75 04 48 8b 6f 70 48 8d 7b 70 e8 ae 92 ea ff 48 89 ea 48 c7 c7 c8 f0 ed 82 48 89 c6 e8 2c e7 63 ff <0f> 0b bd 94 ff ff ff e9 fc fb ff ff 48 c7 c6 50 95 9e 82 48 c7 c7
[ 67.131336] RSP: 0018:ffffc90012623cb0 EFLAGS: 00010286
[ 67.131339] RAX: 0000000000000000 RBX: ffff8881017c68e8 RCX: 0000000000000027
[ 67.131342] RDX: ffff888360e616c8 RSI: 0000000000000001 RDI: ffff888360e616c0
[ 67.131344] RBP: ffff888100e0de38 R08: ffffffff83756c80 R09: 000000008407cebb
[ 67.131345] R10: ffffffffffffffff R11: 0000000000000100 R12: 0000000000000002
[ 67.131347] R13: ffff888110836c00 R14: 0000000000000000 R15: ffff888110836c00
[ 67.131348] FS: 0000000000000000(0000) GS:ffff888360e40000(0000) knlGS:0000000000000000
[ 67.131351] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 67.131352] CR2: 0000000000000000 CR3: 000000024ec20000 CR4: 0000000000750ee0
[ 67.131354] PKRU: 55555554
[ 67.131356] Call Trace:
[ 67.131358] <TASK>
[ 67.131359] ? __i2c_transfer+0x594/0x6b0
[ 67.131363] ? __warn+0x81/0x130
[ 67.131373] ? __i2c_transfer+0x594/0x6b0
[ 67.131377] ? report_bug+0x171/0x1a0
[ 67.131382] ? srso_alias_return_thunk+0x5/0x7f
[ 67.131387] ? console_unlock+0xcb/0x120
[ 67.131394] ? handle_bug+0x3c/0x80
[ 67.131399] ? exc_invalid_op+0x17/0x70
[ 67.131402] ? asm_exc_invalid_op+0x1a/0x20
[ 67.131411] ? __i2c_transfer+0x594/0x6b0
[ 67.131417] i2c_transfer+0x4f/0x100
[ 67.131421] regmap_i2c_read+0x73/0xb0
[ 67.131428] _regmap_raw_read+0xe6/0x230
[ 67.131434] regmap_raw_read+0x25c/0x2c0
[ 67.131440] regmap_bulk_read+0x1e4/0x290
[ 67.131448] bmi160_trigger_handler+0x56/0x100 [bmi160_core 01d2f0c9a6f31373f2aa6b38a031d29ec6d3cd40]
[ 67.131457] ? __pfx_irq_thread_fn+0x10/0x10
[ 67.131461] irq_thread_fn+0x23/0x60
[ 67.131466] irq_thread+0xfe/0x1c0
[ 67.131470] ? __pfx_irq_thread_dtor+0x10/0x10
[ 67.131475] ? __pfx_irq_thread+0x10/0x10
[ 67.131478] kthread+0xe8/0x120
[ 67.131484] ? __pfx_kthread+0x10/0x10
[ 67.131488] ret_from_fork+0x34/0x50
[ 67.131493] ? __pfx_kthread+0x10/0x10
[ 67.131496] ret_from_fork_asm+0x1b/0x30
[ 67.131506] </TASK>
[ 67.131507] ---[ end trace 0000000000000000 ]---

For reference I place here the bash file that is used to reproduce the issue:
#!/bin/bash
modprobe industrialio-sw-trigger
modprobe iio-trig-sysfs
modprobe iio-trig-hrtimer

# hrtimer
if [ ! -d "/home/config" ]; then
mkdir -p /home/config
fi

mount -t configfs none /home/config
mkdir -p /home/config/iio/triggers/hrtimer/rogue

# set sampling frequency for rogue
for i in /sys/bus/iio/devices/* ; do
if [ -d "$i" ]; then
if [ -f "$i/name" ]; then
name=$(cat "$i/name")
if [ "$name" = "rogue" ]; then
echo "1600" > "$i/sampling_frequency"
fi
fi
fi
done

# set the gyroscope
for i in /sys/bus/iio/devices/* ; do
if [ -d "$i" ]; then
if [ -f "$i/name" ]; then
name=$(cat "$i/name")
if [ "$name" = "bmi323-imu" ]; then

# change chip sampling frequency
echo "1600.000000" > "$i/in_accel_sampling_frequency"
echo "1600.000000" > "$i/in_anglvel_sampling_frequency"

# enable accel data acquisition
echo 1 > "$i/scan_elements/in_accel_x_en"
echo 1 > "$i/scan_elements/in_accel_y_en"
echo 1 > "$i/scan_elements/in_accel_z_en"

# enable gyroscope data acquisition
echo 1 > "$i/scan_elements/in_anglvel_x_en"
echo 1 > "$i/scan_elements/in_anglvel_y_en"
echo 1 > "$i/scan_elements/in_anglvel_z_en"

# enable timestamp reporting
echo 1 > "$i/scan_elements/in_timestamp_en"

# bind rogue hrtimer to to the iio device
echo "rogue" > "$i/trigger/current_trigger"

# enable the buffer
echo 1 > "$i/buffer0/enable"

echo "bmi323-imu buffer started"
fi
fi
fi
done


2024-01-13 17:46:47

by Jonathan Cameron

[permalink] [raw]
Subject: Re: iio: iio-trig-hrtimer bug on suspend/resume when used with bmi160 and bmi323

On Wed, 10 Jan 2024 23:35:01 +0100
Denis Benato <[email protected]> wrote:

> Hello,
>
> With this mail I am submitting bug report that is probably related to
> iio-trig-hrtimer but there is also the possibility for it to be
> specific to bmi160 and bmi323.
>
> The described problem have been reproduced on my handheld PC (Asus
> RC71L) and in another handheld PC with two different gyroscope
> drivers: bmi323 (backported by me on v6.7, on RC71L) and bmi160.
>
> My target hardware (RC71L that yeld to this discovery) has a bmi323
> chip that does not have any interrupt pins reaching the CPU, yet I
> need to fetch periodically data from said device, therefore I used
> iio-trig-hrtimer: created a trigger, set the device and trigger
> sampling frequencies, bound the trigger to the device and enabled
> buffer: data is being read and available over /dev/iio:device0.
>
> While in this state if I suspend my handheld I receive (from dmesg)
> the warning reported below and at resume data is not coming out of
> the iio device and the hrtimer appears to not be working. If I create
> a new trigger and bind the new trigger to said iio device and
> re-enable buffer data does come out of /dev/iio:device0 once more,
> until the next sleep.
>
> Since this is important to me I have taken the time to look at both
> drivers and iio-trig-hrtimer and I have identified three possible
> reasons:
>
> 1) iio-trig-hrtimer won't work after suspend regardless of how it is
> used (this is what I believe is the cause)
me too.

> 2) iio-trig-hrtimer is stopped by the -ESHTDOWN returned by the
> function printing "Transfer while suspended", however that stack
> trace does not include function calls related to iio-trig-hrtimer and
> this seems less plausible 3) bmi160 and bmi323 appears to be similar
> and maybe are sharing a common bug with suspend (this is also why I
> have maintainers of those drivers in the recipient list)
>
> Thanks for your time, patience and understanding,

Hi Denis,

I suspect this is the legacy of the platform I used to test the hrtimer
and similar triggers on never had working suspend and resume (we ripped
support for that board out of the kernel a while back now...)
Hence those paths were never tested by me and others may not have cared
about this particular case.

Anyhow, so I think what is going on is fairly simple.

There is no way for a driver to indicate to a trigger provided by a separate
module / hardware device that it should stop triggering data capture.
The driver in question doesn't block data capture when suspended, which
would be easy enough to add but doesn't feel like the right solution.

So my initial thought is that we should add suspend and resume callbacks to
iio_trigger_ops and call them manually from iio device drivers in their
suspend and resume callbacks. These would simply pause whatever the
trigger source was so that no attempts are made to trigger the use of
the device when it is suspended.

It gets a little messy though as triggers can be shared between
multiple devices so we'd need to reference count suspend and resume
for the trigger to make sure we only resume once all consumers of
the trigger have said they are ready to cope with triggers again.

As mentioned, the alternative would be to block the triggers at ingress
to the bmi323 and bmi160 drivers. There may be a helpful pm flag that could
be used but if not, then setting an is_suspended flag under the data->mutex
to avoid races. and reading it in the trigger_handler to decide whether
to talk to the device should work.

I'd kind of like the generic solution of actually letting the trigger
know, but not sure how much work it would turn out to be. Either way there
are a lot of drivers to fix this problem in as in theory most triggers can
be used with most drivers that support buffered data capture.
There may also be fun corners where a hardware trigger from one IIO
device A is being used by another B and the suspend timing is such that B
finishing with the trigger results in A taking an action (in the try_reenable
callback) that could result in bus traffic.
That one is going to be much more fiddly to handle than the simpler case
you have run into.

Thanks for the detailed bug report btw. To get it fixed a few
questions:
1) Are you happy to test proposed fixes?
2) Do you want to have a go at fixing it yourself? (I'd suggest trying
the fix in the bmi323 driver first rather than looking at the other
solution)
If we eventually implement the more general version, then a bmi323
additional protection against this problem would not be a problem.

Clearly I'd like the answers to be yes to both questions, but up to you!

Jonathan



2024-01-14 20:04:50

by Denis Benato

[permalink] [raw]
Subject: Re: iio: iio-trig-hrtimer bug on suspend/resume when used with bmi160 and bmi323

On 1/13/24 18:46, Jonathan Cameron wrote:
> On Wed, 10 Jan 2024 23:35:01 +0100
> Denis Benato <[email protected]> wrote:
>
>> Hello,
>>
>> With this mail I am submitting bug report that is probably related to
>> iio-trig-hrtimer but there is also the possibility for it to be
>> specific to bmi160 and bmi323.
>>
>> The described problem have been reproduced on my handheld PC (Asus
>> RC71L) and in another handheld PC with two different gyroscope
>> drivers: bmi323 (backported by me on v6.7, on RC71L) and bmi160.
>>
>> My target hardware (RC71L that yeld to this discovery) has a bmi323
>> chip that does not have any interrupt pins reaching the CPU, yet I
>> need to fetch periodically data from said device, therefore I used
>> iio-trig-hrtimer: created a trigger, set the device and trigger
>> sampling frequencies, bound the trigger to the device and enabled
>> buffer: data is being read and available over /dev/iio:device0.
>>
>> While in this state if I suspend my handheld I receive (from dmesg)
>> the warning reported below and at resume data is not coming out of
>> the iio device and the hrtimer appears to not be working. If I create
>> a new trigger and bind the new trigger to said iio device and
>> re-enable buffer data does come out of /dev/iio:device0 once more,
>> until the next sleep.
>>
>> Since this is important to me I have taken the time to look at both
>> drivers and iio-trig-hrtimer and I have identified three possible
>> reasons:
>>
>> 1) iio-trig-hrtimer won't work after suspend regardless of how it is
>> used (this is what I believe is the cause)
> me too.
who and how should investigate this issue? Would putting a kprintf in the hrtimer callback be enough to check?


Just to make sure I understood correctly: is this a separate issue from the warning I receive or are those linked?
>
>> 2) iio-trig-hrtimer is stopped by the -ESHTDOWN returned by the
>> function printing "Transfer while suspended", however that stack
>> trace does not include function calls related to iio-trig-hrtimer and
>> this seems less plausible 3) bmi160 and bmi323 appears to be similar
>> and maybe are sharing a common bug with suspend (this is also why I
>> have maintainers of those drivers in the recipient list)
>>
>> Thanks for your time, patience and understanding,
>
> Hi Denis,
>
> I suspect this is the legacy of the platform I used to test the hrtimer
> and similar triggers on never had working suspend and resume (we ripped
> support for that board out of the kernel a while back now...)
> Hence those paths were never tested by me and others may not have cared
> about this particular case.
>
> Anyhow, so I think what is going on is fairly simple.
>
> There is no way for a driver to indicate to a trigger provided by a separate
> module / hardware device that it should stop triggering data capture.
> The driver in question doesn't block data capture when suspended, which
> would be easy enough to add but doesn't feel like the right solution.
>
> So my initial thought is that we should add suspend and resume callbacks to
> iio_trigger_ops and call them manually from iio device drivers in their
> suspend and resume callbacks. These would simply pause whatever the
> trigger source was so that no attempts are made to trigger the use of
> the device when it is suspended.
>
> It gets a little messy though as triggers can be shared between
> multiple devices so we'd need to reference count suspend and resume
> for the trigger to make sure we only resume once all consumers of
> the trigger have said they are ready to cope with triggers again.
>
> As mentioned, the alternative would be to block the triggers at ingress
> to the bmi323 and bmi160 drivers. There may be a helpful pm flag that could
> be used but if not, then setting an is_suspended flag under the data->mutex
> to avoid races. and reading it in the trigger_handler to decide whether
> to talk to the device should work.
I was thinking of doing this too, but I don't know if adding a mutex to frequently invoked functions is going to introduce some timing problems and so I was waiting for some comments on that matter. If nothing bad is expected I can surely try it.
>
> I'd kind of like the generic solution of actually letting the trigger
> know, but not sure how much work it would turn out to be. Either way there
> are a lot of drivers to fix this problem in as in theory most triggers can
> be used with most drivers that support buffered data capture.
> There may also be fun corners where a hardware trigger from one IIO
> device A is being used by another B and the suspend timing is such that B
> finishing with the trigger results in A taking an action (in the try_reenable
> callback) that could result in bus traffic.
> That one is going to be much more fiddly to handle than the simpler case
> you have run into.
Since more and more handheld PCs are coming and provided many vendors such as asus tends to improve what they did built on previous generations I think a general solution would be desirable.

Plus there are handheld PCs that does not yet have a driver (bmi270) or are using an existing one and it would be very difficult to fix it in every of them as of now, in the future I fear it will become almost impossible or extremely time consuming as market expands.
>
> Thanks for the detailed bug report btw. To get it fixed a few
> questions:
> 1) Are you happy to test proposed fixes?
> 2) Do you want to have a go at fixing it yourself? (I'd suggest trying
> the fix in the bmi323 driver first rather than looking at the other
> solution)
> If we eventually implement the more general version, then a bmi323
> additional protection against this problem would not be a problem.
>
> Clearly I'd like the answers to be yes to both questions, but up to you!
>
> Jonathan
>
>
Hello Jonathan and thank you kindly for you answer,

I am very interested in the iio ecosystem as in those aforementioned handheld PCs the gyroscope plays the role as a mouse so it's a pretty important input device.

I am writing to lkml not just as a single developer, but as part of a larger community in this matter: this means we will be able to test any solution and in more than just one hardware.

To answers your questions:
1) yes, we all will be very happy to
2) as I am very busy right now I might be unable to do that immediately, but I will surely do it if one general solution cannot be found or is impractical.

As of my limited knowledge the number of drivers now existing that are affected are 2, and the number of drivers that will be affected, once the driver is written, will be at least 3.

Thank you very much for your answer,
Denis

2024-01-22 17:52:00

by Jagath Jog J

[permalink] [raw]
Subject: Re: iio: iio-trig-hrtimer bug on suspend/resume when used with bmi160 and bmi323

Hi Denis and Jonathan

Thank you for testing the driver, and I apologize for the delayed response.

During the development of the BMI323 driver, I used an ARM-based
Raspberry Pi for testing. I tested the triggers with interrupt, sysfs, and HRT.
However, I haven't tested the HRT triggers with power-saving modes. Currently,
I'm trying to reproduce the issue by continuously reading the sensor, similar
to your approach, and putting my board into suspend-to-ram mode.
Unfortunately, my board doesn't support low-power modes, resulting in a
continuous data output from the sensor and the following error related to
suspend mode. I plan to acquire another board that supports these
low-power modes soon and will work on resolving this issue.

$ echo mem > /sys/power/state

<6>[ 5636.810797] PM: suspend entry (s2idle)
<6>[ 5636.840781] Filesystems sync: 0.029 seconds
<6>[ 5636.923133] Freezing user space processes completed (elapsed
0.002 seconds)
<6>[ 5636.923156] OOM killer disabled.
<6>[ 5636.923162] Freezing remaining freezable tasks
<6>[ 5636.924689] Freezing remaining freezable tasks completed
(elapsed 0.001 seconds)
<6>[ 5636.924700] printk: Suspending console(s) (use
no_console_suspend to debug)
<4>[ 5636.969691]
<4>[ 5636.969703] WARN::dwc_otg_hcd_urb_dequeue:639: Timed out waiting
for FSM NP transfer to complete on 6
<3>[ 5637.962420] brcmfmac: brcmf_sdio_dpc: failed backplane access
over SDIO, halting operation
<3>[ 5637.962433] ieee80211 phy21: brcmf_cfg80211_reg_notifier:
Country code iovar returned err = -5
<3>[ 5637.962528] ieee80211 phy21: brcmf_proto_bcdc_query_dcmd:
brcmf_proto_bcdc_msg failed w/status -5
<3>[ 5637.962536] ieee80211 phy21: brcmf_cfg80211_reg_notifier:
Country code iovar returned err = -5
<3>[ 5637.962563] ieee80211 phy21: brcmf_fil_cmd_data: bus is down. we
have nothing to do.
<3>[ 5637.962571] ieee80211 phy21: brcmf_cfg80211_get_tx_power: error (-5)
<7>[ 5638.518728] brcmfmac_wcc: brcmf_wcc_detach: executing
<3>[ 5638.561147] usb usb1: PM: dpm_run_callback():
usb_dev_suspend+0x0/0x10 returns -2
<3>[ 5638.561179] usb usb1: PM: failed to suspend async: error -2
<3>[ 5638.561194] PM: Some devices failed to suspend, or early wake
event detected
<7>[ 5638.661065] brcmfmac: F1 signature read @0x18000000=0x15264345
<6>[ 5638.665324] brcmfmac: brcmf_fw_alloc_request: using
brcm/brcmfmac43455-sdio for chip BCM4345/6
<4>[ 5638.665413] brcmfmac mmc1:0001:1: Direct firmware load for
brcm/brcmfmac43455-sdio.raspberrypi,3-model-b-plus.bin failed with
error -2
<6>[ 5638.709079] lan78xx 1-1.1.1:1.0 eth0: Link is Down
<7>[ 5638.841128] brcmfmac_wcc: brcmf_wcc_attach: executing
<6>[ 5638.848752] brcmfmac: brcmf_c_process_txcap_blob: no txcap_blob
available (err=-2)
<6>[ 5638.849206] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM4345/6
wl0: Jan 4 2021 19:56:29 version 7.45.229 (617f1f5 CY) FWID
01-2dbd9d2e
<6>[ 5638.859102] OOM killer enabled.
<6>[ 5638.862718] PM: suspend entry (s2idle)
<6>[ 5638.980447] Filesystems sync: 0.117 seconds
<6>[ 5639.479450] Restarting tasks ... done.
<5>[ 5639.481816] random: crng reseeded on system resumption
<6>[ 5639.481967] PM: suspend exit


Regards
Jagath

Jagath

On Thu, Jan 11, 2024 at 4:05 AM Denis Benato <[email protected]> wrote:
>
> Hello,
>
> With this mail I am submitting bug report that is probably related to iio-trig-hrtimer but there is also the possibility for it to be specific to bmi160 and bmi323.
>
> The described problem have been reproduced on my handheld PC (Asus RC71L) and in another handheld PC with two different gyroscope drivers: bmi323 (backported by me on v6.7, on RC71L) and bmi160.
>
> My target hardware (RC71L that yeld to this discovery) has a bmi323 chip that does not have any interrupt pins reaching the CPU, yet I need to fetch periodically data from said device, therefore I used iio-trig-hrtimer: created a trigger, set the device and trigger sampling frequencies, bound the trigger to the device and enabled buffer: data is being read and available over /dev/iio:device0.
>
> While in this state if I suspend my handheld I receive (from dmesg) the warning reported below and at resume data is not coming out of the iio device and the hrtimer appears to not be working. If I create a new trigger and bind the new trigger to said iio device and re-enable buffer data does come out of /dev/iio:device0 once more, until the next sleep.
>
> Since this is important to me I have taken the time to look at both drivers and iio-trig-hrtimer and I have identified three possible reasons:
>
> 1) iio-trig-hrtimer won't work after suspend regardless of how it is used (this is what I believe is the cause)
> 2) iio-trig-hrtimer is stopped by the -ESHTDOWN returned by the function printing "Transfer while suspended", however that stack trace does not include function calls related to iio-trig-hrtimer and this seems less plausible
> 3) bmi160 and bmi323 appears to be similar and maybe are sharing a common bug with suspend (this is also why I have maintainers of those drivers in the recipient list)
>
> Thanks for your time, patience and understanding,
> Denis Benato
>
>
> This one is from my device:
> [ 635.956648] ------------[ cut here ]------------
> [ 635.956649] i2c i2c-1: Transfer while suspended
> [ 635.956663] WARNING: CPU: 9 PID: 848 at __i2c_transfer+0xaf/0x6e0
> [ 635.956670] Modules linked in: ccm rfcomm ntfs3 snd_seq_dummy snd_hrtimer snd_seq cmac algif_hash algif_skcipher af_alg bnep iio_trig_hrtimer iio_trig_sysfs hid_playstation led_class_multicolor industrialio_sw_trigger uhid industrialio_configfs snd_sof_amd_acp63 snd_sof_amd_rembrandt mt7921e snd_sof_amd_renoir mt7921_common snd_sof_amd_acp snd_usb_audio hid_logitech_hidpp snd_sof_xtensa_dsp mt792x_lib snd_sof_pci r8153_ecm mt76_connac_lib cdc_ether snd_ump snd_sof snd_usbmidi_lib mt76 usbnet snd_rawmidi snd_hda_codec_realtek snd_sof_utils snd_seq_device snd_hda_scodec_cs35l41_spi mc snd_hda_codec_generic intel_rapl_msr mac80211 hid_multitouch snd_hda_codec_hdmi snd_soc_core snd_compress ac97_bus snd_pcm_dmaengine snd_pci_ps intel_rapl_common snd_hda_intel edac_mce_amd snd_intel_dspcfg snd_rpl_pci_acp6x snd_acp_pci snd_hda_codec snd_acp_legacy_common kvm_amd snd_hda_core snd_pci_acp6x snd_pci_acp5x snd_hwdep btusb snd_rn_pci_acp3x snd_hda_scodec_cs35l41_i2c snd_acp_config btrtl snd_pcm snd_hda_scodec_cs35l41 bmi323_i2c
> [ 635.956705] btmtk btbcm bmi323_core kvm snd_hda_cs_dsp_ctls cfg80211 btintel snd_soc_acpi snd_timer uas ucsi_acpi cs_dsp xpad industrialio_triggered_buffer irqbypass crct10dif_pclmul polyval_clmulni polyval_generic bluetooth r8152 joydev ghash_clmulni_intel snd_soc_cs35l41_lib ecdh_generic sha1_ssse3 rapl ecc usb_storage mii pcspkr asus_nb_wmi wmi_bmof efi_pstore typec_ucsi kfifo_buf k10temp i2c_piix4 snd ff_memless snd_pci_acp3x nls_iso8859_1 libarc4 soundcore ccp typec hid_logitech_dj usbmouse usbkbd i2c_hid_acpi i2c_hid industrialio amd_pmc serial_multi_instantiate input_leds mac_hid sch_fq_codel crypto_user nfnetlink dmi_sysfs ip_tables x_tables autofs4 hid_asus asus_wmi ledtrig_audio sparse_keymap platform_profile hid_generic usbhid hid btrfs mmc_block amdgpu drm_buddy drm_suballoc_helper raid6_pq xor libcrc32c drm_exec i2c_algo_bit drm_display_helper cec gpu_sched amdxcp drm_ttm_helper ttm drm_kms_helper video crc32_pclmul sha512_ssse3 drm sha256_ssse3 nvme sdhci_pci nvme_core cqhci serio_raw xhci_pci sdhci
> [ 635.956752] xhci_pci_renesas wmi overlay aesni_intel crypto_simd cryptd
> [ 635.956755] CPU: 9 PID: 848 Comm: irq/90-bmi323-i Tainted: G W 6.7.0--xanmod1-neroreflex-1 #1
> [ 635.956758] Hardware name: ASUSTeK COMPUTER INC. ROG Ally RC71L_RC71L/RC71L, BIOS RC71L.335 11/29/2023
> [ 635.956759] RIP: 0010:__i2c_transfer+0xaf/0x6e0
> [ 635.956761] Code: 00 00 48 85 c0 0f 84 2a 06 00 00 48 8b 30 48 8b 93 c0 00 00 00 48 85 d2 75 04 48 8b 53 70 48 c7 c7 9a 5e aa 99 e8 31 57 3e ff <0f> 0b e9 2e 04 00 00 ba 02 00 00 00 41 83 ff 02 0f 85 dd 01 00 00
> [ 635.956763] RSP: 0018:ffffbf33c160bbf8 EFLAGS: 00010246
> [ 635.956764] RAX: 41e9eaf9bb59b300 RBX: ffff9f8f013a28e8 RCX: 0000000000000027
> [ 635.956765] RDX: ffffffff9a235ab8 RSI: 0000000000000002 RDI: ffff9f9152261888
> [ 635.956766] RBP: 00000000ffffff94 R08: 80000000ffffe5fd R09: 656c696877207265
> [ 635.956767] R10: 0000000000000023 R11: 0a6465646e657073 R12: 0000000000000000
> [ 635.956768] R13: 0000000000000003 R14: ffffbf33c160bc50 R15: 0000000000000002
> [ 635.956769] FS: 0000000000000000(0000) GS:ffff9f9152240000(0000) knlGS:0000000000000000
> [ 635.956770] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 635.956771] CR2: 0000020b0a053000 CR3: 000000028ae10000 CR4: 0000000000750ef0
> [ 635.956773] PKRU: 55555554
> [ 635.956773] Call Trace:
> [ 635.956775] <TASK>
> [ 635.956777] ? __warn+0x13c/0x1f0
> [ 635.956779] ? __i2c_transfer+0xaf/0x6e0
> [ 635.956781] ? report_bug+0x16e/0x1f0
> [ 635.956784] ? handle_bug+0x42/0x70
> [ 635.956786] ? exc_invalid_op+0x16/0x50
> [ 635.956788] ? asm_exc_invalid_op+0x16/0x20
> [ 635.956791] ? __i2c_transfer+0xaf/0x6e0
> [ 635.956793] ? __i2c_transfer+0xaf/0x6e0
> [ 635.956794] i2c_transfer+0xe3/0x110
> [ 635.956798] bmi323_regmap_i2c_read+0x7b/0xc0 [bmi323_i2c]
> [ 635.956803] _regmap_raw_read+0x11d/0x200
> [ 635.956806] regmap_raw_read+0x26a/0x2c0
> [ 635.956807] ? __schedule+0x604/0x16b0
> [ 635.956810] regmap_bulk_read+0x17b/0x200
> [ 635.956813] ? iio_store_to_kfifo+0x11/0x30 [kfifo_buf]
> [ 635.956818] bmi323_trigger_handler+0x52/0x160 [bmi323_core]
> [ 635.956824] irq_thread_fn+0x18/0x50
> [ 635.956826] irq_thread+0x21a/0x3c0
> [ 635.956828] ? wake_up_and_wait_for_irq_thread_ready+0x190/0x190
> [ 635.956830] ? irq_forced_thread_fn+0x110/0x110
> [ 635.956831] kthread+0x26c/0x2b0
> [ 635.956833] ? irq_forced_secondary_handler+0x20/0x20
> [ 635.956835] ? kthreadd+0x300/0x300
> [ 635.956836] ret_from_fork+0x2e/0x40
> [ 635.956838] ? kthreadd+0x300/0x300
> [ 635.956839] ret_from_fork_asm+0x11/0x20
> [ 635.956841] </TASK>
> [ 635.956842] ---[ end trace 0000000000000000 ]---
>
> The following is from the other device (amd 5600u):
> [ 67.131064] ------------[ cut here ]------------
> [ 67.131067] i2c i2c-1: Transfer while suspended
> [ 67.131078] WARNING: CPU: 1 PID: 1655 at drivers/i2c/i2c-core.h:54 __i2c_transfer+0x594/0x6b0
> [ 67.131090] Modules linked in: iio_trig_hrtimer iio_trig_sysfs industrialio_sw_trigger industrialio_configfs ccm rfcomm cmac algif_hash algif_skcipher af_alg snd_seq_dummy snd_hrtimer snd_seq snd_seq_device exfat bnep wl(POE) intel_rapl_msr zenergy(OE) intel_rapl_common mousedev joydev snd_sof_amd_rembrandt snd_sof_amd_renoir snd_sof_amd_acp snd_sof_pci mt7921e snd_sof_xtensa_dsp mt7921_common snd_hda_codec_realtek snd_sof mt792x_lib snd_hda_codec_generic snd_sof_utils mt76_connac_lib cdc_mbim cdc_wdm edac_mce_amd ledtrig_audio snd_hda_codec_hdmi snd_soc_core mt76 kvm_amd snd_hda_intel snd_compress snd_intel_dspcfg ac97_bus mac80211 snd_pcm_dmaengine
> [ 67.131146] ACPI Error:
> [ 67.131147] snd_intel_sdw_acpi kvm btusb snd_pci_ps
> [ 67.131151] No handler for Region [ECRM] (00000000b2bf81c9) [EmbeddedControl]
> [ 67.131152] btrtl snd_hda_codec
> [ 67.131155] (20230628/evregion-130)
> [ 67.131155] snd_rpl_pci_acp6x libarc4 snd_acp_pci
> [ 67.131158] ACPI Error:
> [ 67.131159] irqbypass
> [ 67.131161] Region EmbeddedControl (ID=3) has no handler
> [ 67.131165] btintel
> [ 67.131166] (20230628/exfldio-261)
> [ 67.131169] snd_hda_core snd_acp_legacy_common btbcm crct10dif_pclmul
> [ 67.131176] ACPI Error:
> [ 67.131176] bmi160_spi
> [ 67.131178] Aborting method
> [ 67.131178] snd_pci_acp6x
> [ 67.131180] \_SB.GPIO._EVT
> [ 67.131180] snd_hwdep
> [ 67.131182] due to previous error (AE_NOT_EXIST)
> [ 67.131182] btmtk
> [ 67.131183] (20230628/psparse-529)
> [ 67.131183] polyval_clmulni cfg80211 snd_pcm polyval_generic snd_pci_acp5x gf128mul cdc_ncm snd_timer snd_rn_pci_acp3x bluetooth hid_multitouch ghash_clmulni_intel cdc_ether bmi160_i2c snd_acp_config snd usbnet sha1_ssse3 vfat bmi160_core snd_soc_acpi oxp_sensors fat rapl ecdh_generic wmi_bmof industrialio_triggered_buffer mii pcspkr rfkill k10temp i2c_piix4 snd_pci_acp3x soundcore ccp kfifo_buf i2c_hid_acpi soc_button_array i2c_hid industrialio mac_hid xpad ff_memless uinput ec_sys crypto_user fuse loop nfnetlink ip_tables x_tables uas usbhid usb_storage amdgpu btrfs drm_exec blake2b_generic amdxcp xor drm_buddy raid6_pq gpu_sched libcrc32c crc32_pclmul i2c_algo_bit sha512_ssse3 drm_suballoc_helper serio_raw sha256_ssse3 drm_ttm_helper atkbd aesni_intel libps2 ttm nvme vivaldi_fmap crypto_simd drm_display_helper nvme_core cryptd video i8042 cec nvme_common wmi serio overlay ext4 crc32c_generic crc32c_intel crc16 mbcache jbd2 dm_mod
> [ 67.131318] CPU: 1 PID: 1655 Comm: irq/70-i2c-BMI0 Tainted: P OE 6.6.7-chos5-chimeraos-1 #1 011c184bc075678bbf5f825a76fa2f06741839b6
> [ 67.131324] Hardware name: ONE-NETBOOK TECHNOLOGY CO., LTD. ONE XPLAYER/ONE XPLAYER, BIOS V1.08_P4C8M43L6 04/28/2022
> [ 67.131327] RIP: 0010:__i2c_transfer+0x594/0x6b0
> [ 67.131334] Code: 8b af c0 00 00 00 48 85 ed 75 04 48 8b 6f 70 48 8d 7b 70 e8 ae 92 ea ff 48 89 ea 48 c7 c7 c8 f0 ed 82 48 89 c6 e8 2c e7 63 ff <0f> 0b bd 94 ff ff ff e9 fc fb ff ff 48 c7 c6 50 95 9e 82 48 c7 c7
> [ 67.131336] RSP: 0018:ffffc90012623cb0 EFLAGS: 00010286
> [ 67.131339] RAX: 0000000000000000 RBX: ffff8881017c68e8 RCX: 0000000000000027
> [ 67.131342] RDX: ffff888360e616c8 RSI: 0000000000000001 RDI: ffff888360e616c0
> [ 67.131344] RBP: ffff888100e0de38 R08: ffffffff83756c80 R09: 000000008407cebb
> [ 67.131345] R10: ffffffffffffffff R11: 0000000000000100 R12: 0000000000000002
> [ 67.131347] R13: ffff888110836c00 R14: 0000000000000000 R15: ffff888110836c00
> [ 67.131348] FS: 0000000000000000(0000) GS:ffff888360e40000(0000) knlGS:0000000000000000
> [ 67.131351] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 67.131352] CR2: 0000000000000000 CR3: 000000024ec20000 CR4: 0000000000750ee0
> [ 67.131354] PKRU: 55555554
> [ 67.131356] Call Trace:
> [ 67.131358] <TASK>
> [ 67.131359] ? __i2c_transfer+0x594/0x6b0
> [ 67.131363] ? __warn+0x81/0x130
> [ 67.131373] ? __i2c_transfer+0x594/0x6b0
> [ 67.131377] ? report_bug+0x171/0x1a0
> [ 67.131382] ? srso_alias_return_thunk+0x5/0x7f
> [ 67.131387] ? console_unlock+0xcb/0x120
> [ 67.131394] ? handle_bug+0x3c/0x80
> [ 67.131399] ? exc_invalid_op+0x17/0x70
> [ 67.131402] ? asm_exc_invalid_op+0x1a/0x20
> [ 67.131411] ? __i2c_transfer+0x594/0x6b0
> [ 67.131417] i2c_transfer+0x4f/0x100
> [ 67.131421] regmap_i2c_read+0x73/0xb0
> [ 67.131428] _regmap_raw_read+0xe6/0x230
> [ 67.131434] regmap_raw_read+0x25c/0x2c0
> [ 67.131440] regmap_bulk_read+0x1e4/0x290
> [ 67.131448] bmi160_trigger_handler+0x56/0x100 [bmi160_core 01d2f0c9a6f31373f2aa6b38a031d29ec6d3cd40]
> [ 67.131457] ? __pfx_irq_thread_fn+0x10/0x10
> [ 67.131461] irq_thread_fn+0x23/0x60
> [ 67.131466] irq_thread+0xfe/0x1c0
> [ 67.131470] ? __pfx_irq_thread_dtor+0x10/0x10
> [ 67.131475] ? __pfx_irq_thread+0x10/0x10
> [ 67.131478] kthread+0xe8/0x120
> [ 67.131484] ? __pfx_kthread+0x10/0x10
> [ 67.131488] ret_from_fork+0x34/0x50
> [ 67.131493] ? __pfx_kthread+0x10/0x10
> [ 67.131496] ret_from_fork_asm+0x1b/0x30
> [ 67.131506] </TASK>
> [ 67.131507] ---[ end trace 0000000000000000 ]---
>
> For reference I place here the bash file that is used to reproduce the issue:
> #!/bin/bash
> modprobe industrialio-sw-trigger
> modprobe iio-trig-sysfs
> modprobe iio-trig-hrtimer
>
> # hrtimer
> if [ ! -d "/home/config" ]; then
> mkdir -p /home/config
> fi
>
> mount -t configfs none /home/config
> mkdir -p /home/config/iio/triggers/hrtimer/rogue
>
> # set sampling frequency for rogue
> for i in /sys/bus/iio/devices/* ; do
> if [ -d "$i" ]; then
> if [ -f "$i/name" ]; then
> name=$(cat "$i/name")
> if [ "$name" = "rogue" ]; then
> echo "1600" > "$i/sampling_frequency"
> fi
> fi
> fi
> done
>
> # set the gyroscope
> for i in /sys/bus/iio/devices/* ; do
> if [ -d "$i" ]; then
> if [ -f "$i/name" ]; then
> name=$(cat "$i/name")
> if [ "$name" = "bmi323-imu" ]; then
>
> # change chip sampling frequency
> echo "1600.000000" > "$i/in_accel_sampling_frequency"
> echo "1600.000000" > "$i/in_anglvel_sampling_frequency"
>
> # enable accel data acquisition
> echo 1 > "$i/scan_elements/in_accel_x_en"
> echo 1 > "$i/scan_elements/in_accel_y_en"
> echo 1 > "$i/scan_elements/in_accel_z_en"
>
> # enable gyroscope data acquisition
> echo 1 > "$i/scan_elements/in_anglvel_x_en"
> echo 1 > "$i/scan_elements/in_anglvel_y_en"
> echo 1 > "$i/scan_elements/in_anglvel_z_en"
>
> # enable timestamp reporting
> echo 1 > "$i/scan_elements/in_timestamp_en"
>
> # bind rogue hrtimer to to the iio device
> echo "rogue" > "$i/trigger/current_trigger"
>
> # enable the buffer
> echo 1 > "$i/buffer0/enable"
>
> echo "bmi323-imu buffer started"
> fi
> fi
> fi
> done

2024-05-30 14:07:38

by Denis Benato

[permalink] [raw]
Subject: Re: iio: iio-trig-hrtimer bug on suspend/resume when used with bmi160 and bmi323

On 1/15/24 10:37, Jonathan Cameron wrote:
> On Sun, 14 Jan 2024 21:04:12 +0100
> Denis Benato <[email protected]> wrote:
>
>> On 1/13/24 18:46, Jonathan Cameron wrote:
>>> On Wed, 10 Jan 2024 23:35:01 +0100
>>> Denis Benato <[email protected]> wrote:
>>>
>>>> Hello,
>>>>
>>>> With this mail I am submitting bug report that is probably related to
>>>> iio-trig-hrtimer but there is also the possibility for it to be
>>>> specific to bmi160 and bmi323.
>>>>
>>>> The described problem have been reproduced on my handheld PC (Asus
>>>> RC71L) and in another handheld PC with two different gyroscope
>>>> drivers: bmi323 (backported by me on v6.7, on RC71L) and bmi160.
>>>>
>>>> My target hardware (RC71L that yeld to this discovery) has a bmi323
>>>> chip that does not have any interrupt pins reaching the CPU, yet I
>>>> need to fetch periodically data from said device, therefore I used
>>>> iio-trig-hrtimer: created a trigger, set the device and trigger
>>>> sampling frequencies, bound the trigger to the device and enabled
>>>> buffer: data is being read and available over /dev/iio:device0.
>>>>
>>>> While in this state if I suspend my handheld I receive (from dmesg)
>>>> the warning reported below and at resume data is not coming out of
>>>> the iio device and the hrtimer appears to not be working. If I create
>>>> a new trigger and bind the new trigger to said iio device and
>>>> re-enable buffer data does come out of /dev/iio:device0 once more,
>>>> until the next sleep.
>>>>
>>>> Since this is important to me I have taken the time to look at both
>>>> drivers and iio-trig-hrtimer and I have identified three possible
>>>> reasons:
>>>>
>>>> 1) iio-trig-hrtimer won't work after suspend regardless of how it is
>>>> used (this is what I believe is the cause)
>>> me too.
>> who and how should investigate this issue? Would putting a kprintf in the hrtimer callback be enough to check?
> The warning you have pretty much points at this being the problem, but sure
> you could add a print there to be absolutely sure.
>
>>
>>
>> Just to make sure I understood correctly: is this a separate issue from the warning I receive or are those linked?
>
> I think it's all one issue.
Hello,

Sorry for the delay, I was able to find some time just recently.

Sadly I don't think anymore this is just one issue:

I have setup a proof of concept that at suspend sets a boolean to true and at resume sets the same boolean to false, and on trigger_handler returns IRQ_NONE if the device is sleeping (everything guarded by an additional mutex) and it solved the warning: no more regmap access are being performed after the device is put to sleep, but unfortunately the acquisition does not automatically resume after waking up the device.

How shall we proceed?

>
>>>
>>>> 2) iio-trig-hrtimer is stopped by the -ESHTDOWN returned by the
>>>> function printing "Transfer while suspended", however that stack
>>>> trace does not include function calls related to iio-trig-hrtimer and
>>>> this seems less plausible 3) bmi160 and bmi323 appears to be similar
>>>> and maybe are sharing a common bug with suspend (this is also why I
>>>> have maintainers of those drivers in the recipient list)
>>>>
>>>> Thanks for your time, patience and understanding,
>>>
>>> Hi Denis,
>>>
>>> I suspect this is the legacy of the platform I used to test the hrtimer
>>> and similar triggers on never had working suspend and resume (we ripped
>>> support for that board out of the kernel a while back now...)
>>> Hence those paths were never tested by me and others may not have cared
>>> about this particular case.
>>>
>>> Anyhow, so I think what is going on is fairly simple.
>>>
>>> There is no way for a driver to indicate to a trigger provided by a separate
>>> module / hardware device that it should stop triggering data capture.
>>> The driver in question doesn't block data capture when suspended, which
>>> would be easy enough to add but doesn't feel like the right solution.
>>>
>>> So my initial thought is that we should add suspend and resume callbacks to
>>> iio_trigger_ops and call them manually from iio device drivers in their
>>> suspend and resume callbacks. These would simply pause whatever the
>>> trigger source was so that no attempts are made to trigger the use of
>>> the device when it is suspended.
>>>
>>> It gets a little messy though as triggers can be shared between
>>> multiple devices so we'd need to reference count suspend and resume
>>> for the trigger to make sure we only resume once all consumers of
>>> the trigger have said they are ready to cope with triggers again.
>>>
>>> As mentioned, the alternative would be to block the triggers at ingress
>>> to the bmi323 and bmi160 drivers. There may be a helpful pm flag that could
>>> be used but if not, then setting an is_suspended flag under the data->mutex
>>> to avoid races. and reading it in the trigger_handler to decide whether
>>> to talk to the device should work.
>> I was thinking of doing this too, but I don't know if adding a mutex to frequently invoked functions is going to introduce some timing problems and so I was waiting for some comments on that matter. If nothing bad is expected I can surely try it.
>>>
>>> I'd kind of like the generic solution of actually letting the trigger
>>> know, but not sure how much work it would turn out to be. Either way there
>>> are a lot of drivers to fix this problem in as in theory most triggers can
>>> be used with most drivers that support buffered data capture.
>>> There may also be fun corners where a hardware trigger from one IIO
>>> device A is being used by another B and the suspend timing is such that B
>>> finishing with the trigger results in A taking an action (in the try_reenable
>>> callback) that could result in bus traffic.
>>> That one is going to be much more fiddly to handle than the simpler case
>>> you have run into.
>> Since more and more handheld PCs are coming and provided many vendors such as
>> asus tends to improve what they did built on previous generations I think a
>> general solution would be desirable.
>>
>> Plus there are handheld PCs that does not yet have a driver (bmi270) or are
>> using an existing one and it would be very difficult to fix it in every of
>> them as of now, in the future I fear it will become almost impossible or
>> extremely time consuming as market expands.
>
> Both solutions require specific calls to be added to every driver that might
> encounter this - so most drivers that support triggers other than the ones
> they supply.
>
>>>
>>> Thanks for the detailed bug report btw. To get it fixed a few
>>> questions:
>>> 1) Are you happy to test proposed fixes?
>>> 2) Do you want to have a go at fixing it yourself? (I'd suggest trying
>>> the fix in the bmi323 driver first rather than looking at the other
>>> solution)
>>> If we eventually implement the more general version, then a bmi323
>>> additional protection against this problem would not be a problem.
>>>
>>> Clearly I'd like the answers to be yes to both questions, but up to you!
>>>
>>> Jonathan
>>>
>>>
>> Hello Jonathan and thank you kindly for you answer,
>>
>> I am very interested in the iio ecosystem as in those aforementioned
>> handheld PCs the gyroscope plays the role as a mouse so it's a pretty
>> important input device.
>>
>> I am writing to lkml not just as a single developer, but as part of a
>> larger community in this matter: this means we will be able to test
>> any solution and in more than just one hardware.
>>
>> To answers your questions:
>> 1) yes, we all will be very happy to
>> 2) as I am very busy right now I might be unable to do that immediately,
>> but I will surely do it if one general solution cannot be found or is impractical.
>>
>> As of my limited knowledge the number of drivers now existing that are affected
>> are 2, and the number of drivers that will be affected, once the driver is
>> written, will be at least 3.
>
> The problem appears to be general unfortunately.
>
> I'll have to see if I can fire up something where I can actually test solutions
> and I'm afraid I also don't have a lot of time at the moment.
> Perhaps I can find time in the next few weeks to hack together a prototype
> for one of the drivers you can test.
>
> Jonathan
>
>>
>> Thank you very much for your answer,
>> Denis
>>
>

Thank you for you time and patience,
Denis

2024-06-02 16:08:58

by Jonathan Cameron

[permalink] [raw]
Subject: Re: iio: iio-trig-hrtimer bug on suspend/resume when used with bmi160 and bmi323

On Thu, 30 May 2024 16:07:22 +0200
Denis Benato <[email protected]> wrote:

> On 1/15/24 10:37, Jonathan Cameron wrote:
> > On Sun, 14 Jan 2024 21:04:12 +0100
> > Denis Benato <[email protected]> wrote:
> >
> >> On 1/13/24 18:46, Jonathan Cameron wrote:
> >>> On Wed, 10 Jan 2024 23:35:01 +0100
> >>> Denis Benato <[email protected]> wrote:
> >>>
> >>>> Hello,
> >>>>
> >>>> With this mail I am submitting bug report that is probably related to
> >>>> iio-trig-hrtimer but there is also the possibility for it to be
> >>>> specific to bmi160 and bmi323.
> >>>>
> >>>> The described problem have been reproduced on my handheld PC (Asus
> >>>> RC71L) and in another handheld PC with two different gyroscope
> >>>> drivers: bmi323 (backported by me on v6.7, on RC71L) and bmi160.
> >>>>
> >>>> My target hardware (RC71L that yeld to this discovery) has a bmi323
> >>>> chip that does not have any interrupt pins reaching the CPU, yet I
> >>>> need to fetch periodically data from said device, therefore I used
> >>>> iio-trig-hrtimer: created a trigger, set the device and trigger
> >>>> sampling frequencies, bound the trigger to the device and enabled
> >>>> buffer: data is being read and available over /dev/iio:device0.
> >>>>
> >>>> While in this state if I suspend my handheld I receive (from dmesg)
> >>>> the warning reported below and at resume data is not coming out of
> >>>> the iio device and the hrtimer appears to not be working. If I create
> >>>> a new trigger and bind the new trigger to said iio device and
> >>>> re-enable buffer data does come out of /dev/iio:device0 once more,
> >>>> until the next sleep.
> >>>>
> >>>> Since this is important to me I have taken the time to look at both
> >>>> drivers and iio-trig-hrtimer and I have identified three possible
> >>>> reasons:
> >>>>
> >>>> 1) iio-trig-hrtimer won't work after suspend regardless of how it is
> >>>> used (this is what I believe is the cause)
> >>> me too.
> >> who and how should investigate this issue? Would putting a kprintf in the hrtimer callback be enough to check?
> > The warning you have pretty much points at this being the problem, but sure
> > you could add a print there to be absolutely sure.
> >
> >>
> >>
> >> Just to make sure I understood correctly: is this a separate issue from the warning I receive or are those linked?
> >
> > I think it's all one issue.
> Hello,
>
> Sorry for the delay, I was able to find some time just recently.
>
> Sadly I don't think anymore this is just one issue:
>
> I have setup a proof of concept that at suspend sets a boolean to true and at resume sets the same boolean to false, and on trigger_handler returns IRQ_NONE if the device is sleeping (everything guarded by an additional mutex) and it solved the warning: no more regmap access are being performed after the device is put to sleep, but unfortunately the acquisition does not automatically resume after waking up the device.
>
> How shall we proceed?

It's been a while (and a busy day of reviewing) so I'm struggling
a bit with the best away forwards.

A hackyish approach might be to mask the interrupt at device side
(fake interrupt used as part of the tree from a trigger to a pollfunc).
A tidy wrapped up version of this might be the best route forwards
though it won't currently stop the hrtimer merrily poking interrupts
at no one

As a hack, could you try having your suspend call disable_irq() on
the irq found in iio_dev->poll_func->irq and reenable it in resume?
That might be sufficient.

Check poll_func goes somewhere first though and that irq is > 0
I think that will call iio_trig_subirq_mask() which should block
any further interrupts until it's unmasked again.

We'll need to ensure this doesn't race with pollfunc going away though
which will make it more complex, but the above test should tell us
if there is a fairly smooth path to making this work.

I'll try and find time to do some testing myself, but it won't be
for a few weeks :(

Thanks,

Jonathan


> >
> >>>
> >>>> 2) iio-trig-hrtimer is stopped by the -ESHTDOWN returned by the
> >>>> function printing "Transfer while suspended", however that stack
> >>>> trace does not include function calls related to iio-trig-hrtimer and
> >>>> this seems less plausible 3) bmi160 and bmi323 appears to be similar
> >>>> and maybe are sharing a common bug with suspend (this is also why I
> >>>> have maintainers of those drivers in the recipient list)
> >>>>
> >>>> Thanks for your time, patience and understanding,
> >>>
> >>> Hi Denis,
> >>>
> >>> I suspect this is the legacy of the platform I used to test the hrtimer
> >>> and similar triggers on never had working suspend and resume (we ripped
> >>> support for that board out of the kernel a while back now...)
> >>> Hence those paths were never tested by me and others may not have cared
> >>> about this particular case.
> >>>
> >>> Anyhow, so I think what is going on is fairly simple.
> >>>
> >>> There is no way for a driver to indicate to a trigger provided by a separate
> >>> module / hardware device that it should stop triggering data capture.
> >>> The driver in question doesn't block data capture when suspended, which
> >>> would be easy enough to add but doesn't feel like the right solution.
> >>>
> >>> So my initial thought is that we should add suspend and resume callbacks to
> >>> iio_trigger_ops and call them manually from iio device drivers in their
> >>> suspend and resume callbacks. These would simply pause whatever the
> >>> trigger source was so that no attempts are made to trigger the use of
> >>> the device when it is suspended.
> >>>
> >>> It gets a little messy though as triggers can be shared between
> >>> multiple devices so we'd need to reference count suspend and resume
> >>> for the trigger to make sure we only resume once all consumers of
> >>> the trigger have said they are ready to cope with triggers again.
> >>>
> >>> As mentioned, the alternative would be to block the triggers at ingress
> >>> to the bmi323 and bmi160 drivers. There may be a helpful pm flag that could
> >>> be used but if not, then setting an is_suspended flag under the data->mutex
> >>> to avoid races. and reading it in the trigger_handler to decide whether
> >>> to talk to the device should work.
> >> I was thinking of doing this too, but I don't know if adding a mutex to frequently invoked functions is going to introduce some timing problems and so I was waiting for some comments on that matter. If nothing bad is expected I can surely try it.
> >>>
> >>> I'd kind of like the generic solution of actually letting the trigger
> >>> know, but not sure how much work it would turn out to be. Either way there
> >>> are a lot of drivers to fix this problem in as in theory most triggers can
> >>> be used with most drivers that support buffered data capture.
> >>> There may also be fun corners where a hardware trigger from one IIO
> >>> device A is being used by another B and the suspend timing is such that B
> >>> finishing with the trigger results in A taking an action (in the try_reenable
> >>> callback) that could result in bus traffic.
> >>> That one is going to be much more fiddly to handle than the simpler case
> >>> you have run into.
> >> Since more and more handheld PCs are coming and provided many vendors such as
> >> asus tends to improve what they did built on previous generations I think a
> >> general solution would be desirable.
> >>
> >> Plus there are handheld PCs that does not yet have a driver (bmi270) or are
> >> using an existing one and it would be very difficult to fix it in every of
> >> them as of now, in the future I fear it will become almost impossible or
> >> extremely time consuming as market expands.
> >
> > Both solutions require specific calls to be added to every driver that might
> > encounter this - so most drivers that support triggers other than the ones
> > they supply.
> >
> >>>
> >>> Thanks for the detailed bug report btw. To get it fixed a few
> >>> questions:
> >>> 1) Are you happy to test proposed fixes?
> >>> 2) Do you want to have a go at fixing it yourself? (I'd suggest trying
> >>> the fix in the bmi323 driver first rather than looking at the other
> >>> solution)
> >>> If we eventually implement the more general version, then a bmi323
> >>> additional protection against this problem would not be a problem.
> >>>
> >>> Clearly I'd like the answers to be yes to both questions, but up to you!
> >>>
> >>> Jonathan
> >>>
> >>>
> >> Hello Jonathan and thank you kindly for you answer,
> >>
> >> I am very interested in the iio ecosystem as in those aforementioned
> >> handheld PCs the gyroscope plays the role as a mouse so it's a pretty
> >> important input device.
> >>
> >> I am writing to lkml not just as a single developer, but as part of a
> >> larger community in this matter: this means we will be able to test
> >> any solution and in more than just one hardware.
> >>
> >> To answers your questions:
> >> 1) yes, we all will be very happy to
> >> 2) as I am very busy right now I might be unable to do that immediately,
> >> but I will surely do it if one general solution cannot be found or is impractical.
> >>
> >> As of my limited knowledge the number of drivers now existing that are affected
> >> are 2, and the number of drivers that will be affected, once the driver is
> >> written, will be at least 3.
> >
> > The problem appears to be general unfortunately.
> >
> > I'll have to see if I can fire up something where I can actually test solutions
> > and I'm afraid I also don't have a lot of time at the moment.
> > Perhaps I can find time in the next few weeks to hack together a prototype
> > for one of the drivers you can test.
> >
> > Jonathan
> >
> >>
> >> Thank you very much for your answer,
> >> Denis
> >>
> >
>
> Thank you for you time and patience,
> Denis


2024-06-08 01:25:37

by Denis Benato

[permalink] [raw]
Subject: Re: iio: iio-trig-hrtimer bug on suspend/resume when used with bmi160 and bmi323

On 6/2/24 18:08, Jonathan Cameron wrote:
> On Thu, 30 May 2024 16:07:22 +0200
> Denis Benato <[email protected]> wrote:
>
>> On 1/15/24 10:37, Jonathan Cameron wrote:
>>> On Sun, 14 Jan 2024 21:04:12 +0100
>>> Denis Benato <[email protected]> wrote:
>>>
>>>> On 1/13/24 18:46, Jonathan Cameron wrote:
>>>>> On Wed, 10 Jan 2024 23:35:01 +0100
>>>>> Denis Benato <[email protected]> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> With this mail I am submitting bug report that is probably related to
>>>>>> iio-trig-hrtimer but there is also the possibility for it to be
>>>>>> specific to bmi160 and bmi323.
>>>>>>
>>>>>> The described problem have been reproduced on my handheld PC (Asus
>>>>>> RC71L) and in another handheld PC with two different gyroscope
>>>>>> drivers: bmi323 (backported by me on v6.7, on RC71L) and bmi160.
>>>>>>
>>>>>> My target hardware (RC71L that yeld to this discovery) has a bmi323
>>>>>> chip that does not have any interrupt pins reaching the CPU, yet I
>>>>>> need to fetch periodically data from said device, therefore I used
>>>>>> iio-trig-hrtimer: created a trigger, set the device and trigger
>>>>>> sampling frequencies, bound the trigger to the device and enabled
>>>>>> buffer: data is being read and available over /dev/iio:device0.
>>>>>>
>>>>>> While in this state if I suspend my handheld I receive (from dmesg)
>>>>>> the warning reported below and at resume data is not coming out of
>>>>>> the iio device and the hrtimer appears to not be working. If I create
>>>>>> a new trigger and bind the new trigger to said iio device and
>>>>>> re-enable buffer data does come out of /dev/iio:device0 once more,
>>>>>> until the next sleep.
>>>>>>
>>>>>> Since this is important to me I have taken the time to look at both
>>>>>> drivers and iio-trig-hrtimer and I have identified three possible
>>>>>> reasons:
>>>>>>
>>>>>> 1) iio-trig-hrtimer won't work after suspend regardless of how it is
>>>>>> used (this is what I believe is the cause)
>>>>> me too.
>>>> who and how should investigate this issue? Would putting a kprintf in the hrtimer callback be enough to check?
>>> The warning you have pretty much points at this being the problem, but sure
>>> you could add a print there to be absolutely sure.
>>>
>>>>
>>>>
>>>> Just to make sure I understood correctly: is this a separate issue from the warning I receive or are those linked?
>>>
>>> I think it's all one issue.
>> Hello,
>>
>> Sorry for the delay, I was able to find some time just recently.
>>
>> Sadly I don't think anymore this is just one issue:
>>
>> I have setup a proof of concept that at suspend sets a boolean to true and at resume sets the same boolean to false, and on trigger_handler returns IRQ_NONE if the device is sleeping (everything guarded by an additional mutex) and it solved the warning: no more regmap access are being performed after the device is put to sleep, but unfortunately the acquisition does not automatically resume after waking up the device.
>>
>> How shall we proceed?
>
> It's been a while (and a busy day of reviewing) so I'm struggling
> a bit with the best away forwardHello,

Don't worry about it, I don't have much time available either. That stuff is as precious as gold! Thank you kindly for your time!
>
> A hackyish approach might be to mask the interrupt at device side
> (fake interrupt used as part of the tree from a trigger to a pollfunc).
> A tidy wrapped up version of this might be the best route forwards
> though it won't currently stop the hrtimer merrily poking interrupts
> at no oneI am sorry but I am not sure I understood this correctly: do you mean having something "in-between" iio-trig-hrtimer and the target iio device and use that to selectively poll the device?
>
> As a hack, could you try having your suspend call disable_irq() on
> the irq found in iio_dev->poll_func->irq and reenable it in resume?
> That might be sufficient.
I used disable_irq and enable_irq as suggested and it works perfectly: I tested it four times and all four times (in a row) data flow resumed after s2idle-suspending the device.

I can do more testing, but this is already looking very good and I'm happy (I don't have knowledge about iio internals so I am unable to tell if what I did will introduce some race conditions).

As a reference I leave here the github link to the branch I have been experimenting with: https://github.com/NeroReflex/linux/commits/bmi323-pollfunc/ (Please don't mind that proof-of-concept code being copied and partially commented from bmc150-accel).

No errors are printed out and combining that info with the fact it's now working it can only means pollfunc and irq are both non-NULL and non-zero at both suspend and resume.
>
> Check poll_func goes somewhere first though and that irq is > 0
> I think that will call iio_trig_subirq_mask() which should block
> any further interrupts until it's unmasked again.
>
> We'll need to ensure this doesn't race with pollfunc going away though
> which will make it more complex, but the above test should tell us
> if there is a fairly smooth path to making this work.
>
> I'll try and find time to do some testing myself, but it won't be
> for a few weeks :(
>
> Thanks,
>
> Jonathan
Thank you again for your guidance,
Denis Benato
>
>
>>>
>>>>>
>>>>>> 2) iio-trig-hrtimer is stopped by the -ESHTDOWN returned by the
>>>>>> function printing "Transfer while suspended", however that stack
>>>>>> trace does not include function calls related to iio-trig-hrtimer and
>>>>>> this seems less plausible 3) bmi160 and bmi323 appears to be similar
>>>>>> and maybe are sharing a common bug with suspend (this is also why I
>>>>>> have maintainers of those drivers in the recipient list)
>>>>>>
>>>>>> Thanks for your time, patience and understanding,
>>>>>
>>>>> Hi Denis,
>>>>>
>>>>> I suspect this is the legacy of the platform I used to test the hrtimer
>>>>> and similar triggers on never had working suspend and resume (we ripped
>>>>> support for that board out of the kernel a while back now...)
>>>>> Hence those paths were never tested by me and others may not have cared
>>>>> about this particular case.
>>>>>
>>>>> Anyhow, so I think what is going on is fairly simple.
>>>>>
>>>>> There is no way for a driver to indicate to a trigger provided by a separate
>>>>> module / hardware device that it should stop triggering data capture.
>>>>> The driver in question doesn't block data capture when suspended, which
>>>>> would be easy enough to add but doesn't feel like the right solution.
>>>>>
>>>>> So my initial thought is that we should add suspend and resume callbacks to
>>>>> iio_trigger_ops and call them manually from iio device drivers in their
>>>>> suspend and resume callbacks. These would simply pause whatever the
>>>>> trigger source was so that no attempts are made to trigger the use of
>>>>> the device when it is suspended.
>>>>>
>>>>> It gets a little messy though as triggers can be shared between
>>>>> multiple devices so we'd need to reference count suspend and resume
>>>>> for the trigger to make sure we only resume once all consumers of
>>>>> the trigger have said they are ready to cope with triggers again.
>>>>>
>>>>> As mentioned, the alternative would be to block the triggers at ingress
>>>>> to the bmi323 and bmi160 drivers. There may be a helpful pm flag that could
>>>>> be used but if not, then setting an is_suspended flag under the data->mutex
>>>>> to avoid races. and reading it in the trigger_handler to decide whether
>>>>> to talk to the device should work.
>>>> I was thinking of doing this too, but I don't know if adding a mutex to frequently invoked functions is going to introduce some timing problems and so I was waiting for some comments on that matter. If nothing bad is expected I can surely try it.
>>>>>
>>>>> I'd kind of like the generic solution of actually letting the trigger
>>>>> know, but not sure how much work it would turn out to be. Either way there
>>>>> are a lot of drivers to fix this problem in as in theory most triggers can
>>>>> be used with most drivers that support buffered data capture.
>>>>> There may also be fun corners where a hardware trigger from one IIO
>>>>> device A is being used by another B and the suspend timing is such that B
>>>>> finishing with the trigger results in A taking an action (in the try_reenable
>>>>> callback) that could result in bus traffic.
>>>>> That one is going to be much more fiddly to handle than the simpler case
>>>>> you have run into.
>>>> Since more and more handheld PCs are coming and provided many vendors such as
>>>> asus tends to improve what they did built on previous generations I think a
>>>> general solution would be desirable.
>>>>
>>>> Plus there are handheld PCs that does not yet have a driver (bmi270) or are
>>>> using an existing one and it would be very difficult to fix it in every of
>>>> them as of now, in the future I fear it will become almost impossible or
>>>> extremely time consuming as market expands.
>>>
>>> Both solutions require specific calls to be added to every driver that might
>>> encounter this - so most drivers that support triggers other than the ones
>>> they supply.
>>>
>>>>>
>>>>> Thanks for the detailed bug report btw. To get it fixed a few
>>>>> questions:
>>>>> 1) Are you happy to test proposed fixes?
>>>>> 2) Do you want to have a go at fixing it yourself? (I'd suggest trying
>>>>> the fix in the bmi323 driver first rather than looking at the other
>>>>> solution)
>>>>> If we eventually implement the more general version, then a bmi323
>>>>> additional protection against this problem would not be a problem.
>>>>>
>>>>> Clearly I'd like the answers to be yes to both questions, but up to you!
>>>>>
>>>>> Jonathan
>>>>>
>>>>>
>>>> Hello Jonathan and thank you kindly for you answer,
>>>>
>>>> I am very interested in the iio ecosystem as in those aforementioned
>>>> handheld PCs the gyroscope plays the role as a mouse so it's a pretty
>>>> important input device.
>>>>
>>>> I am writing to lkml not just as a single developer, but as part of a
>>>> larger community in this matter: this means we will be able to test
>>>> any solution and in more than just one hardware.
>>>>
>>>> To answers your questions:
>>>> 1) yes, we all will be very happy to
>>>> 2) as I am very busy right now I might be unable to do that immediately,
>>>> but I will surely do it if one general solution cannot be found or is impractical.
>>>>
>>>> As of my limited knowledge the number of drivers now existing that are affected
>>>> are 2, and the number of drivers that will be affected, once the driver is
>>>> written, will be at least 3.
>>>
>>> The problem appears to be general unfortunately.
>>>
>>> I'll have to see if I can fire up something where I can actually test solutions
>>> and I'm afraid I also don't have a lot of time at the moment.
>>> Perhaps I can find time in the next few weeks to hack together a prototype
>>> for one of the drivers you can test.
>>>
>>> Jonathan
>>>
>>>>
>>>> Thank you very much for your answer,
>>>> Denis
>>>>
>>>
>>
>> Thank you for you time and patience,
>> Denis
>


2024-06-08 13:59:10

by Jonathan Cameron

[permalink] [raw]
Subject: Re: iio: iio-trig-hrtimer bug on suspend/resume when used with bmi160 and bmi323

On Sat, 8 Jun 2024 03:25:13 +0200
Denis Benato <[email protected]> wrote:

> On 6/2/24 18:08, Jonathan Cameron wrote:
> > On Thu, 30 May 2024 16:07:22 +0200
> > Denis Benato <[email protected]> wrote:
> >
> >> On 1/15/24 10:37, Jonathan Cameron wrote:
> >>> On Sun, 14 Jan 2024 21:04:12 +0100
> >>> Denis Benato <[email protected]> wrote:
> >>>
> >>>> On 1/13/24 18:46, Jonathan Cameron wrote:
> >>>>> On Wed, 10 Jan 2024 23:35:01 +0100
> >>>>> Denis Benato <[email protected]> wrote:
> >>>>>
> >>>>>> Hello,
> >>>>>>
> >>>>>> With this mail I am submitting bug report that is probably related to
> >>>>>> iio-trig-hrtimer but there is also the possibility for it to be
> >>>>>> specific to bmi160 and bmi323.
> >>>>>>
> >>>>>> The described problem have been reproduced on my handheld PC (Asus
> >>>>>> RC71L) and in another handheld PC with two different gyroscope
> >>>>>> drivers: bmi323 (backported by me on v6.7, on RC71L) and bmi160.
> >>>>>>
> >>>>>> My target hardware (RC71L that yeld to this discovery) has a bmi323
> >>>>>> chip that does not have any interrupt pins reaching the CPU, yet I
> >>>>>> need to fetch periodically data from said device, therefore I used
> >>>>>> iio-trig-hrtimer: created a trigger, set the device and trigger
> >>>>>> sampling frequencies, bound the trigger to the device and enabled
> >>>>>> buffer: data is being read and available over /dev/iio:device0.
> >>>>>>
> >>>>>> While in this state if I suspend my handheld I receive (from dmesg)
> >>>>>> the warning reported below and at resume data is not coming out of
> >>>>>> the iio device and the hrtimer appears to not be working. If I create
> >>>>>> a new trigger and bind the new trigger to said iio device and
> >>>>>> re-enable buffer data does come out of /dev/iio:device0 once more,
> >>>>>> until the next sleep.
> >>>>>>
> >>>>>> Since this is important to me I have taken the time to look at both
> >>>>>> drivers and iio-trig-hrtimer and I have identified three possible
> >>>>>> reasons:
> >>>>>>
> >>>>>> 1) iio-trig-hrtimer won't work after suspend regardless of how it is
> >>>>>> used (this is what I believe is the cause)
> >>>>> me too.
> >>>> who and how should investigate this issue? Would putting a kprintf in the hrtimer callback be enough to check?
> >>> The warning you have pretty much points at this being the problem, but sure
> >>> you could add a print there to be absolutely sure.
> >>>
> >>>>
> >>>>
> >>>> Just to make sure I understood correctly: is this a separate issue from the warning I receive or are those linked?
> >>>
> >>> I think it's all one issue.
> >> Hello,
> >>
> >> Sorry for the delay, I was able to find some time just recently.
> >>
> >> Sadly I don't think anymore this is just one issue:
> >>
> >> I have setup a proof of concept that at suspend sets a boolean to true and at resume sets the same boolean to false, and on trigger_handler returns IRQ_NONE if the device is sleeping (everything guarded by an additional mutex) and it solved the warning: no more regmap access are being performed after the device is put to sleep, but unfortunately the acquisition does not automatically resume after waking up the device.
> >>
> >> How shall we proceed?
> >
> > It's been a while (and a busy day of reviewing) so I'm struggling
> > a bit with the best away forwardHello,
>
> Don't worry about it, I don't have much time available either. That stuff is as precious as gold! Thank you kindly for your time!
> >
> > A hackyish approach might be to mask the interrupt at device side
> > (fake interrupt used as part of the tree from a trigger to a pollfunc).
> > A tidy wrapped up version of this might be the best route forwards
> > though it won't currently stop the hrtimer merrily poking interrupts
> > at no oneI am sorry but I am not sure I understood this correctly: do you mean having something "in-between" iio-trig-hrtimer and the target iio device and use that to selectively poll the device?
> >
> > As a hack, could you try having your suspend call disable_irq() on
> > the irq found in iio_dev->poll_func->irq and reenable it in resume?
> > That might be sufficient.
> I used disable_irq and enable_irq as suggested and it works perfectly: I tested it four times and all four times (in a row) data flow resumed after s2idle-suspending the device.
>

Excellent.

> I can do more testing, but this is already looking very good and I'm happy (I don't have knowledge about iio internals so I am unable to tell if what I did will introduce some race conditions).

On that note we definitely shouldn't directly mess with the IIO internals
as that implementation has changed in the past and might change again (though
it's been stable for over 10 years, so probably not ;). Hence we should
wrap this up in something like iio_suspend_triggering()/ iio_resume_triggering()
which will then be called from the suspend and resume callbacks of
drivers that need to ensure no accesses once suspended.

>
> As a reference I leave here the github link to the branch I have been experimenting with: https://github.com/NeroReflex/linux/commits/bmi323-pollfunc/ (Please don't mind that proof-of-concept code being copied and partially commented from bmc150-accel).
>
> No errors are printed out and combining that info with the fact it's now working it can only means pollfunc and irq are both non-NULL and non-zero at both suspend and resume.

That's true in the case you care about, but if we were putting in place general infrastructure
it will get a little more 'interesting'.

For example if currently if the trigger has never been set for a device
pollfunc->irq == 0, but we don't actually clear that value when we
detach the trigger (previously it was never used after that detach so
it's value was irrelevant.

So at minimum we need to fix that by resetting the irq value to 0 in
iio_trigger_detach_poll_func(), and add the checks so that if there
is no trigger we don't try to disable the irq.

I'm nervous that it might be possible to suspend whilst also setting
the trigger and hence we need locking. We can probably use the core
state lock (iio_dev->mlock) for that but we'd need care as that's
already taken in some of these paths.

> >
> > Check poll_func goes somewhere first though and that irq is > 0
> > I think that will call iio_trig_subirq_mask() which should block
> > any further interrupts until it's unmasked again.
> >
> > We'll need to ensure this doesn't race with pollfunc going away though
> > which will make it more complex, but the above test should tell us
> > if there is a fairly smooth path to making this work.
> >
> > I'll try and find time to do some testing myself, but it won't be
> > for a few weeks :(
> >
> > Thanks,
> >
> > Jonathan
> Thank you again for your guidance,

No problem.

Jonathan

> Denis Benato
> >
> >
> >>>
> >>>>>
> >>>>>> 2) iio-trig-hrtimer is stopped by the -ESHTDOWN returned by the
> >>>>>> function printing "Transfer while suspended", however that stack
> >>>>>> trace does not include function calls related to iio-trig-hrtimer and
> >>>>>> this seems less plausible 3) bmi160 and bmi323 appears to be similar
> >>>>>> and maybe are sharing a common bug with suspend (this is also why I
> >>>>>> have maintainers of those drivers in the recipient list)
> >>>>>>
> >>>>>> Thanks for your time, patience and understanding,
> >>>>>
> >>>>> Hi Denis,
> >>>>>
> >>>>> I suspect this is the legacy of the platform I used to test the hrtimer
> >>>>> and similar triggers on never had working suspend and resume (we ripped
> >>>>> support for that board out of the kernel a while back now...)
> >>>>> Hence those paths were never tested by me and others may not have cared
> >>>>> about this particular case.
> >>>>>
> >>>>> Anyhow, so I think what is going on is fairly simple.
> >>>>>
> >>>>> There is no way for a driver to indicate to a trigger provided by a separate
> >>>>> module / hardware device that it should stop triggering data capture.
> >>>>> The driver in question doesn't block data capture when suspended, which
> >>>>> would be easy enough to add but doesn't feel like the right solution.
> >>>>>
> >>>>> So my initial thought is that we should add suspend and resume callbacks to
> >>>>> iio_trigger_ops and call them manually from iio device drivers in their
> >>>>> suspend and resume callbacks. These would simply pause whatever the
> >>>>> trigger source was so that no attempts are made to trigger the use of
> >>>>> the device when it is suspended.
> >>>>>
> >>>>> It gets a little messy though as triggers can be shared between
> >>>>> multiple devices so we'd need to reference count suspend and resume
> >>>>> for the trigger to make sure we only resume once all consumers of
> >>>>> the trigger have said they are ready to cope with triggers again.
> >>>>>
> >>>>> As mentioned, the alternative would be to block the triggers at ingress
> >>>>> to the bmi323 and bmi160 drivers. There may be a helpful pm flag that could
> >>>>> be used but if not, then setting an is_suspended flag under the data->mutex
> >>>>> to avoid races. and reading it in the trigger_handler to decide whether
> >>>>> to talk to the device should work.
> >>>> I was thinking of doing this too, but I don't know if adding a mutex to frequently invoked functions is going to introduce some timing problems and so I was waiting for some comments on that matter. If nothing bad is expected I can surely try it.
> >>>>>
> >>>>> I'd kind of like the generic solution of actually letting the trigger
> >>>>> know, but not sure how much work it would turn out to be. Either way there
> >>>>> are a lot of drivers to fix this problem in as in theory most triggers can
> >>>>> be used with most drivers that support buffered data capture.
> >>>>> There may also be fun corners where a hardware trigger from one IIO
> >>>>> device A is being used by another B and the suspend timing is such that B
> >>>>> finishing with the trigger results in A taking an action (in the try_reenable
> >>>>> callback) that could result in bus traffic.
> >>>>> That one is going to be much more fiddly to handle than the simpler case
> >>>>> you have run into.
> >>>> Since more and more handheld PCs are coming and provided many vendors such as
> >>>> asus tends to improve what they did built on previous generations I think a
> >>>> general solution would be desirable.
> >>>>
> >>>> Plus there are handheld PCs that does not yet have a driver (bmi270) or are
> >>>> using an existing one and it would be very difficult to fix it in every of
> >>>> them as of now, in the future I fear it will become almost impossible or
> >>>> extremely time consuming as market expands.
> >>>
> >>> Both solutions require specific calls to be added to every driver that might
> >>> encounter this - so most drivers that support triggers other than the ones
> >>> they supply.
> >>>
> >>>>>
> >>>>> Thanks for the detailed bug report btw. To get it fixed a few
> >>>>> questions:
> >>>>> 1) Are you happy to test proposed fixes?
> >>>>> 2) Do you want to have a go at fixing it yourself? (I'd suggest trying
> >>>>> the fix in the bmi323 driver first rather than looking at the other
> >>>>> solution)
> >>>>> If we eventually implement the more general version, then a bmi323
> >>>>> additional protection against this problem would not be a problem.
> >>>>>
> >>>>> Clearly I'd like the answers to be yes to both questions, but up to you!
> >>>>>
> >>>>> Jonathan
> >>>>>
> >>>>>
> >>>> Hello Jonathan and thank you kindly for you answer,
> >>>>
> >>>> I am very interested in the iio ecosystem as in those aforementioned
> >>>> handheld PCs the gyroscope plays the role as a mouse so it's a pretty
> >>>> important input device.
> >>>>
> >>>> I am writing to lkml not just as a single developer, but as part of a
> >>>> larger community in this matter: this means we will be able to test
> >>>> any solution and in more than just one hardware.
> >>>>
> >>>> To answers your questions:
> >>>> 1) yes, we all will be very happy to
> >>>> 2) as I am very busy right now I might be unable to do that immediately,
> >>>> but I will surely do it if one general solution cannot be found or is impractical.
> >>>>
> >>>> As of my limited knowledge the number of drivers now existing that are affected
> >>>> are 2, and the number of drivers that will be affected, once the driver is
> >>>> written, will be at least 3.
> >>>
> >>> The problem appears to be general unfortunately.
> >>>
> >>> I'll have to see if I can fire up something where I can actually test solutions
> >>> and I'm afraid I also don't have a lot of time at the moment.
> >>> Perhaps I can find time in the next few weeks to hack together a prototype
> >>> for one of the drivers you can test.
> >>>
> >>> Jonathan
> >>>
> >>>>
> >>>> Thank you very much for your answer,
> >>>> Denis
> >>>>
> >>>
> >>
> >> Thank you for you time and patience,
> >> Denis
> >
>


2024-06-09 01:46:22

by Denis Benato

[permalink] [raw]
Subject: Re: iio: iio-trig-hrtimer bug on suspend/resume when used with bmi160 and bmi323

On 6/8/24 15:58, Jonathan Cameron wrote:
> On Sat, 8 Jun 2024 03:25:13 +0200
> Denis Benato <[email protected]> wrote:
>
>> On 6/2/24 18:08, Jonathan Cameron wrote:
>>> On Thu, 30 May 2024 16:07:22 +0200
>>> Denis Benato <[email protected]> wrote:
>>>
>>>> On 1/15/24 10:37, Jonathan Cameron wrote:
>>>>> On Sun, 14 Jan 2024 21:04:12 +0100
>>>>> Denis Benato <[email protected]> wrote:
>>>>>
>>>>>> On 1/13/24 18:46, Jonathan Cameron wrote:
>>>>>>> On Wed, 10 Jan 2024 23:35:01 +0100
>>>>>>> Denis Benato <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> With this mail I am submitting bug report that is probably related to
>>>>>>>> iio-trig-hrtimer but there is also the possibility for it to be
>>>>>>>> specific to bmi160 and bmi323.
>>>>>>>>
>>>>>>>> The described problem have been reproduced on my handheld PC (Asus
>>>>>>>> RC71L) and in another handheld PC with two different gyroscope
>>>>>>>> drivers: bmi323 (backported by me on v6.7, on RC71L) and bmi160.
>>>>>>>>
>>>>>>>> My target hardware (RC71L that yeld to this discovery) has a bmi323
>>>>>>>> chip that does not have any interrupt pins reaching the CPU, yet I
>>>>>>>> need to fetch periodically data from said device, therefore I used
>>>>>>>> iio-trig-hrtimer: created a trigger, set the device and trigger
>>>>>>>> sampling frequencies, bound the trigger to the device and enabled
>>>>>>>> buffer: data is being read and available over /dev/iio:device0.
>>>>>>>>
>>>>>>>> While in this state if I suspend my handheld I receive (from dmesg)
>>>>>>>> the warning reported below and at resume data is not coming out of
>>>>>>>> the iio device and the hrtimer appears to not be working. If I create
>>>>>>>> a new trigger and bind the new trigger to said iio device and
>>>>>>>> re-enable buffer data does come out of /dev/iio:device0 once more,
>>>>>>>> until the next sleep.
>>>>>>>>
>>>>>>>> Since this is important to me I have taken the time to look at both
>>>>>>>> drivers and iio-trig-hrtimer and I have identified three possible
>>>>>>>> reasons:
>>>>>>>>
>>>>>>>> 1) iio-trig-hrtimer won't work after suspend regardless of how it is
>>>>>>>> used (this is what I believe is the cause)
>>>>>>> me too.
>>>>>> who and how should investigate this issue? Would putting a kprintf in the hrtimer callback be enough to check?
>>>>> The warning you have pretty much points at this being the problem, but sure
>>>>> you could add a print there to be absolutely sure.
>>>>>
>>>>>>
>>>>>>
>>>>>> Just to make sure I understood correctly: is this a separate issue from the warning I receive or are those linked?
>>>>>
>>>>> I think it's all one issue.
>>>> Hello,
>>>>
>>>> Sorry for the delay, I was able to find some time just recently.
>>>>
>>>> Sadly I don't think anymore this is just one issue:
>>>>
>>>> I have setup a proof of concept that at suspend sets a boolean to true and at resume sets the same boolean to false, and on trigger_handler returns IRQ_NONE if the device is sleeping (everything guarded by an additional mutex) and it solved the warning: no more regmap access are being performed after the device is put to sleep, but unfortunately the acquisition does not automatically resume after waking up the device.
>>>>
>>>> How shall we proceed?
>>>
>>> It's been a while (and a busy day of reviewing) so I'm struggling
>>> a bit with the best away forwardHello,
>>
>> Don't worry about it, I don't have much time available either. That stuff is as precious as gold! Thank you kindly for your time!
>>>
>>> A hackyish approach might be to mask the interrupt at device side
>>> (fake interrupt used as part of the tree from a trigger to a pollfunc).
>>> A tidy wrapped up version of this might be the best route forwards
>>> though it won't currently stop the hrtimer merrily poking interrupts
>>> at no oneI am sorry but I am not sure I understood this correctly: do you mean having something "in-between" iio-trig-hrtimer and the target iio device and use that to selectively poll the device?
>>>
>>> As a hack, could you try having your suspend call disable_irq() on
>>> the irq found in iio_dev->poll_func->irq and reenable it in resume?
>>> That might be sufficient.
>> I used disable_irq and enable_irq as suggested and it works perfectly: I tested it four times and all four times (in a row) data flow resumed after s2idle-suspending the device.
>>
>
> Excellent.
Hello,

>
>> I can do more testing, but this is already looking very good and I'm happy (I don't have knowledge about iio internals so I am unable to tell if what I did will introduce some race conditions).
>
> On that note we definitely shouldn't directly mess with the IIO internals
> as that implementation has changed in the past and might change again (though
> it's been stable for over 10 years, so probably not ;). Hence we should
> wrap this up in something like iio_suspend_triggering()/ iio_resume_triggering()
> which will then be called from the suspend and resume callbacks of
> drivers that need to ensure no accesses once suspended.
>
I totally agree. I can try to draft something if you are ok with it. I have never done something like this to the kernel so guidance is very welcome.

>>
>> As a reference I leave here the github link to the branch I have been experimenting with: https://github.com/NeroReflex/linux/commits/bmi323-pollfunc/ (Please don't mind that proof-of-concept code being copied and partially commented from bmc150-accel).
>>
>> No errors are printed out and combining that info with the fact it's now working it can only means pollfunc and irq are both non-NULL and non-zero at both suspend and resume.
>
> That's true in the case you care about, but if we were putting in place general infrastructure
> it will get a little more 'interesting'.
>
Understood, thanks.

> For example if currently if the trigger has never been set for a device
> pollfunc->irq == 0, but we don't actually clear that value when we
> detach the trigger (previously it was never used after that detach so
> it's value was irrelevant.
>
> So at minimum we need to fix that by resetting the irq value to 0 in
> iio_trigger_detach_poll_func(), and add the checks so that if there
> is no trigger we don't try to disable the irq.
>
this will also minimize code path divergence for use-cases not affected: I like this.

> I'm nervous that it might be possible to suspend whilst also setting
> the trigger and hence we need locking. We can probably use the core
> state lock (iio_dev->mlock) for that but we'd need care as that's
> already taken in some of these paths.
>
I'm nervous too, but using the same mutex that is used to set a trigger seems like the right thing to do.

>>>
>>> Check poll_func goes somewhere first though and that irq is > 0
>>> I think that will call iio_trig_subirq_mask() which should block
>>> any further interrupts until it's unmasked again.
>>>
>>> We'll need to ensure this doesn't race with pollfunc going away though
>>> which will make it more complex, but the above test should tell us
>>> if there is a fairly smooth path to making this work.
>>>
>>> I'll try and find time to do some testing myself, but it won't be
>>> for a few weeks :(
>>>
>>> Thanks,
>>>
>>> Jonathan
>> Thank you again for your guidance,
>
> No problem.
>
> Jonathan
>
Best regards,
Denis

>> Denis Benato
>>>
>>>
>>>>>
>>>>>>>
>>>>>>>> 2) iio-trig-hrtimer is stopped by the -ESHTDOWN returned by the
>>>>>>>> function printing "Transfer while suspended", however that stack
>>>>>>>> trace does not include function calls related to iio-trig-hrtimer and
>>>>>>>> this seems less plausible 3) bmi160 and bmi323 appears to be similar
>>>>>>>> and maybe are sharing a common bug with suspend (this is also why I
>>>>>>>> have maintainers of those drivers in the recipient list)
>>>>>>>>
>>>>>>>> Thanks for your time, patience and understanding,
>>>>>>>
>>>>>>> Hi Denis,
>>>>>>>
>>>>>>> I suspect this is the legacy of the platform I used to test the hrtimer
>>>>>>> and similar triggers on never had working suspend and resume (we ripped
>>>>>>> support for that board out of the kernel a while back now...)
>>>>>>> Hence those paths were never tested by me and others may not have cared
>>>>>>> about this particular case.
>>>>>>>
>>>>>>> Anyhow, so I think what is going on is fairly simple.
>>>>>>>
>>>>>>> There is no way for a driver to indicate to a trigger provided by a separate
>>>>>>> module / hardware device that it should stop triggering data capture.
>>>>>>> The driver in question doesn't block data capture when suspended, which
>>>>>>> would be easy enough to add but doesn't feel like the right solution.
>>>>>>>
>>>>>>> So my initial thought is that we should add suspend and resume callbacks to
>>>>>>> iio_trigger_ops and call them manually from iio device drivers in their
>>>>>>> suspend and resume callbacks. These would simply pause whatever the
>>>>>>> trigger source was so that no attempts are made to trigger the use of
>>>>>>> the device when it is suspended.
>>>>>>>
>>>>>>> It gets a little messy though as triggers can be shared between
>>>>>>> multiple devices so we'd need to reference count suspend and resume
>>>>>>> for the trigger to make sure we only resume once all consumers of
>>>>>>> the trigger have said they are ready to cope with triggers again.
>>>>>>>
>>>>>>> As mentioned, the alternative would be to block the triggers at ingress
>>>>>>> to the bmi323 and bmi160 drivers. There may be a helpful pm flag that could
>>>>>>> be used but if not, then setting an is_suspended flag under the data->mutex
>>>>>>> to avoid races. and reading it in the trigger_handler to decide whether
>>>>>>> to talk to the device should work.
>>>>>> I was thinking of doing this too, but I don't know if adding a mutex to frequently invoked functions is going to introduce some timing problems and so I was waiting for some comments on that matter. If nothing bad is expected I can surely try it.
>>>>>>>
>>>>>>> I'd kind of like the generic solution of actually letting the trigger
>>>>>>> know, but not sure how much work it would turn out to be. Either way there
>>>>>>> are a lot of drivers to fix this problem in as in theory most triggers can
>>>>>>> be used with most drivers that support buffered data capture.
>>>>>>> There may also be fun corners where a hardware trigger from one IIO
>>>>>>> device A is being used by another B and the suspend timing is such that B
>>>>>>> finishing with the trigger results in A taking an action (in the try_reenable
>>>>>>> callback) that could result in bus traffic.
>>>>>>> That one is going to be much more fiddly to handle than the simpler case
>>>>>>> you have run into.
>>>>>> Since more and more handheld PCs are coming and provided many vendors such as
>>>>>> asus tends to improve what they did built on previous generations I think a
>>>>>> general solution would be desirable.
>>>>>>
>>>>>> Plus there are handheld PCs that does not yet have a driver (bmi270) or are
>>>>>> using an existing one and it would be very difficult to fix it in every of
>>>>>> them as of now, in the future I fear it will become almost impossible or
>>>>>> extremely time consuming as market expands.
>>>>>
>>>>> Both solutions require specific calls to be added to every driver that might
>>>>> encounter this - so most drivers that support triggers other than the ones
>>>>> they supply.
>>>>>
>>>>>>>
>>>>>>> Thanks for the detailed bug report btw. To get it fixed a few
>>>>>>> questions:
>>>>>>> 1) Are you happy to test proposed fixes?
>>>>>>> 2) Do you want to have a go at fixing it yourself? (I'd suggest trying
>>>>>>> the fix in the bmi323 driver first rather than looking at the other
>>>>>>> solution)
>>>>>>> If we eventually implement the more general version, then a bmi323
>>>>>>> additional protection against this problem would not be a problem.
>>>>>>>
>>>>>>> Clearly I'd like the answers to be yes to both questions, but up to you!
>>>>>>>
>>>>>>> Jonathan
>>>>>>>
>>>>>>>
>>>>>> Hello Jonathan and thank you kindly for you answer,
>>>>>>
>>>>>> I am very interested in the iio ecosystem as in those aforementioned
>>>>>> handheld PCs the gyroscope plays the role as a mouse so it's a pretty
>>>>>> important input device.
>>>>>>
>>>>>> I am writing to lkml not just as a single developer, but as part of a
>>>>>> larger community in this matter: this means we will be able to test
>>>>>> any solution and in more than just one hardware.
>>>>>>
>>>>>> To answers your questions:
>>>>>> 1) yes, we all will be very happy to
>>>>>> 2) as I am very busy right now I might be unable to do that immediately,
>>>>>> but I will surely do it if one general solution cannot be found or is impractical.
>>>>>>
>>>>>> As of my limited knowledge the number of drivers now existing that are affected
>>>>>> are 2, and the number of drivers that will be affected, once the driver is
>>>>>> written, will be at least 3.
>>>>>
>>>>> The problem appears to be general unfortunately.
>>>>>
>>>>> I'll have to see if I can fire up something where I can actually test solutions
>>>>> and I'm afraid I also don't have a lot of time at the moment.
>>>>> Perhaps I can find time in the next few weeks to hack together a prototype
>>>>> for one of the drivers you can test.
>>>>>
>>>>> Jonathan
>>>>>
>>>>>>
>>>>>> Thank you very much for your answer,
>>>>>> Denis
>>>>>>
>>>>>
>>>>
>>>> Thank you for you time and patience,
>>>> Denis
>>>
>>
>