2019-05-16 04:44:37

by Eduardo Valentin

Subject: [GIT PULL] Thermal-SoC management changes for v5.2-rc1

Hello Linus,

Please consider the following thermal soc changes for v5.2-rc1.

The following changes since commit 37624b58542fb9f2d9a70e6ea006ef8a5f66c30b:

Linux 5.1-rc7 (2019-04-28 17:04:13 -0700)

are available in the git repository at:

git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal linus

for you to fetch changes up to 37bcec5d9f71bd13142a97d2196b293c9ac23823:

hwmon: (pwm-fan) Use devm_thermal_of_cooling_device_register (2019-05-14 07:00:47 -0700)

Specifics:
- thermal core has a new devm_* API for registering cooling devices, thanks to Guenter R.
I took the entire series, that is why you see changes on drivers/hwmon in this pull.
- rockchip thermal driver gains support for the PX30 SoC, thanks to Elaine Z.
- the generic-adc thermal driver now considers the lookup table DT property as optional,
thanks to Jean-Francois D.
- Refactoring of tsens thermal driver, thanks to Amit K.
- Cleanups on cpu cooling driver, thanks to Daniel L.
- broadcom thermal driver dropped ACPI support, thanks to Srinath M.
- tegra thermal driver gains support for OC hw throttle and GPU throttle, thanks to Wei Ni.
- Fixes in several thermal drivers.

BR,

Eduardo Valentin

----------------------------------------------------------------
Amit Kucheria (21):
drivers: thermal: tsens: Document the data structures
drivers: thermal: tsens: Rename tsens_data
drivers: thermal: tsens: Rename tsens_device
drivers: thermal: tsens: Rename variable tmdev
drivers: thermal: tsens: Use consistent names for variables
drivers: thermal: tsens: Function prototypes should have argument names
drivers: thermal: tsens: Rename tsens-8916 to prepare to merge with tsens-8974
drivers: thermal: tsens: Rename constants to prepare to merge with tsens-8974
drivers: thermal: tsens: Merge tsens-8974 into tsens-v0_1
drivers: thermal: tsens: Introduce reg_fields to deal with register description
drivers: thermal: tsens: Save reference to the device pointer and use it
drivers: thermal: tsens: Don't print error message on -EPROBE_DEFER
drivers: thermal: tsens: Add new operation to check if a sensor is enabled
drivers: thermal: tsens: change data type for sensor IDs
drivers: thermal: tsens: Introduce IP-specific max_sensor count
drivers: thermal: tsens: simplify get_temp_tsens_v2 routine
drivers: thermal: tsens: Move get_temp_tsens_v2 to allow sharing
drivers: thermal: tsens: Common get_temp() learns to do ADC conversion
dt: thermal: tsens: Add bindings for qcs404
drivers: thermal: tsens: Add generic support for TSENS v1 IP
drivers: thermal: tsens: Move calibration constants to header file

Andrey Smirnov (1):
thermal: qoriq: Remove unnecessary DT node is NULL check

Daniel Lezcano (4):
thermal/drivers/cpu_cooling: Remove pointless test in power2state()
thermal/drivers/cpu_cooling: Fixup the header and copyright
thermal/drivers/cpu_cooling: Add Software Package Data Exchange (SPDX)
thermal/drivers/cpu_cooling: Remove pointless field

Elaine Zhang (3):
thermal: rockchip: fix up the tsadc pinctrl setting error
dt-bindings: rockchip-thermal: Support the PX30 SoC compatible
thermal: rockchip: Support the PX30 SoC in thermal driver

Enrico Weigelt, metux IT consult (1):
drivers: thermal: Kconfig: pedantic cleanups

Guenter Roeck (6):
thermal: Introduce devm_thermal_of_cooling_device_register
hwmon: (aspeed-pwm-tacho) Use devm_thermal_of_cooling_device_register
hwmon: (gpio-fan) Use devm_thermal_of_cooling_device_register
hwmon: (mlxreg-fan) Use devm_thermal_of_cooling_device_register
hwmon: (npcm750-pwm-fan) Use devm_thermal_of_cooling_device_register
hwmon: (pwm-fan) Use devm_thermal_of_cooling_device_register

Hoan Nguyen An (1):
thermal: rcar_gen3_thermal: Fix init value of IRQCTL register

Jean-Francois Dagenais (2):
thermal: generic-adc: make lookup table optional
dt-bindings: thermal: generic-adc: make lookup-table optional

Jiada Wang (3):
thermal: rcar_gen3_thermal: fix interrupt type
thermal: rcar_gen3_thermal: disable interrupt in .remove
thermal: rcar_gen3_thermal: Fix to show correct trip points number

Matthias Kaehlcke (1):
thermal: cpu_cooling: Actually trace CPU load in thermal_power_cpu_get_power

Srinath Mannam (1):
thermal: broadcom: Remove ACPI support

Talel Shenhar (3):
dt-bindings: thermal: al-thermal: Add binding documentation
thermal: Introduce Amazon's Annapurna Labs Thermal Driver
thermal: Fix build error of missing devm_ioremap_resource on UM

Wei Ni (9):
of: Add bindings of thermtrip for Tegra soctherm
thermal: tegra: support hw and sw shutdown
of: Add bindings of gpu hw throttle for Tegra soctherm
thermal: tegra: add support for gpu hw-throttle
thermal: tegra: add support for thermal IRQ
thermal: tegra: add set_trips functionality
thermal: tegra: add support for EDP IRQ
of: Add bindings of OC hw throttle for Tegra soctherm
thermal: tegra: enable OC hw throttle

Wolfram Sang (1):
thermal: stm32: simplify getting .driver_data

Yangtao Li (1):
of: thermal: Improve print information

Yoshihiro Kaneko (1):
thermal: rcar_thermal: update calculation formula for R-Car Gen3 SoCs

.../bindings/thermal/amazon,al-thermal.txt | 33 +
.../bindings/thermal/nvidia,tegra124-soctherm.txt | 62 +-
.../devicetree/bindings/thermal/qcom-tsens.txt | 14 +
.../bindings/thermal/rockchip-thermal.txt | 1 +
.../bindings/thermal/thermal-generic-adc.txt | 10 +-
MAINTAINERS | 6 +
drivers/hwmon/aspeed-pwm-tacho.c | 6 +-
drivers/hwmon/gpio-fan.c | 25 +-
drivers/hwmon/mlxreg-fan.c | 31 +-
drivers/hwmon/npcm750-pwm-fan.c | 6 +-
drivers/hwmon/pwm-fan.c | 73 +-
drivers/thermal/Kconfig | 11 +
drivers/thermal/Makefile | 1 +
drivers/thermal/broadcom/sr-thermal.c | 8 -
drivers/thermal/cpu_cooling.c | 30 +-
drivers/thermal/of-thermal.c | 3 +
drivers/thermal/qcom/Makefile | 4 +-
drivers/thermal/qcom/tsens-8916.c | 105 ---
drivers/thermal/qcom/tsens-8960.c | 84 +-
drivers/thermal/qcom/tsens-common.c | 159 +++-
.../thermal/qcom/{tsens-8974.c => tsens-v0_1.c} | 166 +++-
drivers/thermal/qcom/tsens-v1.c | 193 +++++
drivers/thermal/qcom/tsens-v2.c | 111 +--
drivers/thermal/qcom/tsens.c | 100 ++-
drivers/thermal/qcom/tsens.h | 291 ++++++-
drivers/thermal/qoriq_thermal.c | 5 -
drivers/thermal/rcar_gen3_thermal.c | 51 +-
drivers/thermal/rcar_thermal.c | 11 +-
drivers/thermal/rockchip_thermal.c | 74 +-
drivers/thermal/st/Kconfig | 22 +-
drivers/thermal/st/stm_thermal.c | 6 +-
drivers/thermal/tegra/Kconfig | 4 +-
drivers/thermal/tegra/soctherm.c | 961 +++++++++++++++++++--
drivers/thermal/tegra/soctherm.h | 16 +
drivers/thermal/tegra/tegra124-soctherm.c | 7 +-
drivers/thermal/tegra/tegra132-soctherm.c | 7 +-
drivers/thermal/tegra/tegra210-soctherm.c | 15 +-
drivers/thermal/thermal-generic-adc.c | 9 +-
drivers/thermal/thermal_core.c | 49 ++
drivers/thermal/thermal_mmio.c | 129 +++
include/dt-bindings/thermal/tegra124-soctherm.h | 8 +-
include/linux/thermal.h | 13 +
42 files changed, 2330 insertions(+), 590 deletions(-)
create mode 100644 Documentation/devicetree/bindings/thermal/amazon,al-thermal.txt
delete mode 100644 drivers/thermal/qcom/tsens-8916.c
rename drivers/thermal/qcom/{tsens-8974.c => tsens-v0_1.c} (56%)
create mode 100644 drivers/thermal/qcom/tsens-v1.c
create mode 100644 drivers/thermal/thermal_mmio.c


2019-05-16 15:10:02

by Linus Torvalds

Subject: Re: [GIT PULL] Thermal-SoC management changes for v5.2-rc1

On Wed, May 15, 2019 at 9:43 PM Eduardo Valentin <[email protected]> wrote:
>
> - thermal core has a new devm_* API for registering cooling devices, thanks to Guenter R.
> I took the entire series, that is why you see changes on drivers/hwmon in this pull.

This clashed badly with commit 6b1ec4789fb1 ("hwmon: (pwm-fan) Add RPM
support via external interrupt"), which added a timer to the pwm-fan
handling.

In particular, that timer now needed the same kind of cleanup changes,
and I'd like you guys (particularly Guenter, who was involved on both
sides) to double-check my merge.

The way I solved it was to just make the pwm_fan_pwm_disable()
callback do both the pwm_disable() _and_ the del_timer_sync() on the
new timer. That seemed to be the simplest solution that meshed with
the new devm cleanup model, but while I build-tested the result, I
obviously did no actual use testing. And maybe there's some reason why
that approach is flawed.

Guenter?

Linus
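
For readers following the merge resolution described in this message, here is a
minimal userspace sketch of the idea: one cleanup callback that both disables
the PWM and stops the timer, registered once and run when the device goes away.
Everything prefixed sim_ is a hypothetical stand-in (sim_pwm_disable() for
pwm_disable(), sim_del_timer_sync() for del_timer_sync(), sim_devm_add_action()
for the devres action list). It is not the code from the actual merge in
drivers/hwmon/pwm-fan.c, only an illustration of the pattern.

```c
#include <assert.h>

/* Hypothetical userspace stand-ins for the kernel objects involved. */
struct sim_pwm { int enabled; };
struct sim_timer { int armed; };

struct sim_fan_ctx {
	struct sim_pwm pwm;
	struct sim_timer rpm_timer;
};

/* Stand-in for pwm_disable(). */
static void sim_pwm_disable(struct sim_pwm *pwm)
{
	pwm->enabled = 0;
}

/* Stand-in for del_timer_sync(). */
static void sim_del_timer_sync(struct sim_timer *timer)
{
	timer->armed = 0;
}

/* Single cleanup callback that tears down both the PWM and the timer. */
static void sim_fan_cleanup(void *data)
{
	struct sim_fan_ctx *ctx = data;

	sim_pwm_disable(&ctx->pwm);
	sim_del_timer_sync(&ctx->rpm_timer);
}

/* Tiny model of a device-managed action list (assumption, not devres). */
#define SIM_MAX_ACTIONS 8

struct sim_device {
	void (*action[SIM_MAX_ACTIONS])(void *);
	void *data[SIM_MAX_ACTIONS];
	int count;
};

/* Register a cleanup action; returns 0 on success, -1 if the list is full. */
static int sim_devm_add_action(struct sim_device *dev,
			       void (*action)(void *), void *data)
{
	assert(action != 0);
	if (dev->count >= SIM_MAX_ACTIONS)
		return -1;
	dev->action[dev->count] = action;
	dev->data[dev->count] = data;
	dev->count++;
	return 0;
}

/* Run the registered actions in reverse registration order. */
static void sim_device_release(struct sim_device *dev)
{
	while (dev->count > 0) {
		dev->count--;
		dev->action[dev->count](dev->data[dev->count]);
	}
}
```

The point of the pattern is that the driver no longer needs a separate remove
path for the timer: whatever tears down the PWM also stops the timer, in one
place.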

2019-05-16 15:13:00

by pr-tracker-bot

Subject: Re: [GIT PULL] Thermal-SoC management changes for v5.2-rc1

The pull request you sent on Wed, 15 May 2019 21:43:14 -0700:

> git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal linus

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/a455eda33faafcaac1effb31d682765b14ef868c

Thank you!

--
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker

2019-05-16 16:13:45

by Stefan Wahren

Subject: Re: [GIT PULL] Thermal-SoC management changes for v5.2-rc1

Hi Linus,

On 16.05.19 17:07, Linus Torvalds wrote:
> On Wed, May 15, 2019 at 9:43 PM Eduardo Valentin <[email protected]> wrote:
>> - thermal core has a new devm_* API for registering cooling devices, thanks to Guenter R.
>> I took the entire series, that is why you see changes on drivers/hwmon in this pull.
> This clashed badly with commit 6b1ec4789fb1 ("hwmon: (pwm-fan) Add RPM
> support via external interrupt"), which added a timer to the pwm-fan
> handling.
>
> In particular, that timer now needed the same kind of cleanup changes,
> and I'd like you guys (particularly Guenter, who was involved on both
> sides) to double-check my merge.
>
> The way I solved it was to just make the pwm_fan_pwm_disable()
> callback do both the pwm_disable() _and_ the del_timer_sync() on the
> new timer. That seemed to be the simplest solution that meshed with
> the new devm cleanup model, but while I build-tested the result, I
> obviously did no actual use testing. And maybe there's some reason why
> that approach is flawed.

I will try to test on our custom i.MX6 board. Unfortunately this will take
some time since it isn't mainline yet (at least until tomorrow).

Stefan

>
> Guenter?
>
> Linus

2019-05-16 18:23:11

by Guenter Roeck

Subject: Re: [GIT PULL] Thermal-SoC management changes for v5.2-rc1

On 5/16/19 8:07 AM, Linus Torvalds wrote:
> On Wed, May 15, 2019 at 9:43 PM Eduardo Valentin <[email protected]> wrote:
>>
>> - thermal core has a new devm_* API for registering cooling devices, thanks to Guenter R.
>> I took the entire series, that is why you see changes on drivers/hwmon in this pull.
>
> This clashed badly with commit 6b1ec4789fb1 ("hwmon: (pwm-fan) Add RPM
> support via external interrupt"), which added a timer to the pwm-fan
> handling.
>
> In particular, that timer now needed the same kind of cleanup changes,
> and I'd like you guys (particularly Guenter, who was involved on both
> sides) to double-check my merge.
>
> The way I solved it was to just make the pwm_fan_pwm_disable()
> callback do both the pwm_disable() _and_ the del_timer_sync() on the
> new timer. That seemed to be the simplest solution that meshed with
> the new devm cleanup model, but while I build-tested the result, I
> obviously did no actual use testing. And maybe there's some reason why
> that approach is flawed.
>
> Guenter?

Sorry for the trouble. Looks like I did too much cleanup this time around.

Looks ok. I'll have to send a follow-up patch - we should check the
return value of devm_add_action_or_reset(). No idea why I didn't do that
in this series. I'll do that after the commit window closes (and after
I am back from vacation).

Thanks a lot for sorting this out.

Guenter
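
As an illustration of the follow-up mentioned in this message, the sketch below
models what checking the return value looks like. It assumes the usual
semantics of devm_add_action_or_reset(): if the action cannot be registered,
the action is run immediately and an error is returned, so the caller should
stop and propagate that error. All fake_ names are hypothetical userspace
stand-ins, and the one-slot action list exists only to force a failure; this is
not the actual follow-up patch.

```c
#include <assert.h>
#include <errno.h>

/* Deliberately tiny action list so a registration failure is easy to trigger. */
#define FAKE_MAX_ACTIONS 1

struct fake_device {
	void (*action[FAKE_MAX_ACTIONS])(void *);
	void *data[FAKE_MAX_ACTIONS];
	int count;
};

struct fake_fan {
	int pwm_enabled;
};

/* Cleanup action: undo what probe did to the hardware. */
static void fake_fan_disable(void *data)
{
	struct fake_fan *fan = data;

	fan->pwm_enabled = 0;
}

/*
 * Model of the "or_reset" behaviour: on failure, run the action right away
 * so the resource is released, then report the error to the caller.
 */
static int fake_devm_add_action_or_reset(struct fake_device *dev,
					 void (*action)(void *), void *data)
{
	assert(action != 0);
	if (dev->count >= FAKE_MAX_ACTIONS) {
		action(data);
		return -ENOMEM;
	}
	dev->action[dev->count] = action;
	dev->data[dev->count] = data;
	dev->count++;
	return 0;
}

/* Probe path that checks the return value instead of ignoring it. */
static int fake_fan_probe(struct fake_device *dev, struct fake_fan *fan)
{
	int ret;

	fan->pwm_enabled = 1;

	ret = fake_devm_add_action_or_reset(dev, fake_fan_disable, fan);
	if (ret)
		return ret;	/* cleanup already ran; do not continue */

	return 0;
}
```

Without the check, a probe function would carry on as if the cleanup were
registered even though the resource has already been released.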

2019-05-17 12:41:42

by Stefan Wahren

Subject: Re: [GIT PULL] Thermal-SoC management changes for v5.2-rc1

On 16.05.19 18:11, Stefan Wahren wrote:
> Hi Linus,
>
> On 16.05.19 17:07, Linus Torvalds wrote:
>> On Wed, May 15, 2019 at 9:43 PM Eduardo Valentin <[email protected]> wrote:
>>> - thermal core has a new devm_* API for registering cooling devices, thanks to Guenter R.
>>> I took the entire series, that is why you see changes on drivers/hwmon in this pull.
>> This clashed badly with commit 6b1ec4789fb1 ("hwmon: (pwm-fan) Add RPM
>> support via external interrupt"), which added a timer to the pwm-fan
>> handling.
>>
>> In particular, that timer now needed the same kind of cleanup changes,
>> and I'd like you guys (particularly Guenter, who was involved on both
>> sides) to double-check my merge.
>>
>> The way I solved it was to just make the pwm_fan_pwm_disable()
>> callback do both the pwm_disable() _and_ the del_timer_sync() on the
>> new timer. That seemed to be the simplest solution that meshed with
>> the new devm cleanup model, but while I build-tested the result, I
>> obviously did no actual use testing. And maybe there's some reason why
>> that approach is flawed.
> I will try to test on our custom i.MX6 board. Unfortunately this will take
> some time since it isn't mainline yet (at least until tomorrow).

Okay, today's tests based on your tree (a6a4b66bd8f) were successful.

Thanks
Stefan

>
> Stefan
>
>> Guenter?
>>
>> Linus

2019-05-23 09:48:35

by Tomeu Vizoso

Subject: Re: [GIT PULL] Thermal-SoC management changes for v5.2-rc1

Hi Eduardo,

I saw that for 5.1 [0] you included a kernelci boot report for your
tree, but not for 5.2. Have you found anything that should be improved
in KernelCI for it to be more useful to maintainers like you?

[0] https://lore.kernel.org/lkml/[email protected]/

I found out about this when trying to understand why the boot on the
veyron-jaq board has been broken in 5.2-rc1.

Thanks,

Tomeu


2019-05-24 02:38:57

by Eduardo Valentin

Subject: Re: [GIT PULL] Thermal-SoC management changes for v5.2-rc1

On Thu, May 16, 2019 at 09:55:33AM -0700, Guenter Roeck wrote:
> On 5/16/19 8:07 AM, Linus Torvalds wrote:
> >On Wed, May 15, 2019 at 9:43 PM Eduardo Valentin <[email protected]> wrote:
> >>
> >>- thermal core has a new devm_* API for registering cooling devices, thanks to Guenter R.
> >> I took the entire series, that is why you see changes on drivers/hwmon in this pull.
> >
> >This clashed badly with commit 6b1ec4789fb1 ("hwmon: (pwm-fan) Add RPM
> >support via external interrupt"), which added a timer to the pwm-fan
> >handling.
> >
> >In particular, that timer now needed the same kind of cleanup changes,
> >and I'd like you guys (particularly Guenter, who was involved on both
> >sides) to double-check my merge.
> >
> >The way I solved it was to just make the pwm_fan_pwm_disable()
> >callback do both the pwm_disable() _and_ the del_timer_sync() on the
> >new timer. That seemed to be the simplest solution that meshed with
> >the new devm cleanup model, but while I build-tested the result, I
> >obviously did no actual use testing. And maybe there's some reason why
> >that approach is flawed.
> >
> >Guenter?
>
> Sorry for the trouble. Looks like I did too much cleanup this time around.
>
> Looks ok. I'll have to send a follow-up patch - we should check the
> return value of devm_add_action_or_reset(). No idea why I didn't do that
> in this series. I'll do that after the commit window closes (and after
> I am back from vacation).

OK... From what I could tell, looked fine from a thermal perspective.

>
> Thanks a lot for sorting this out.
>
> Guenter

2019-05-24 02:44:21

by Eduardo Valentin

Subject: Re: [GIT PULL] Thermal-SoC management changes for v5.2-rc1

On Thu, May 23, 2019 at 11:46:47AM +0200, Tomeu Vizoso wrote:
> Hi Eduardo,
>
> I saw that for 5.1 [0] you included a kernelci boot report for your
> tree, but not for 5.2. Have you found anything that should be improved
> in KernelCI for it to be more useful to maintainers like you?

Honestly, I take a few automated test results as input before sending
my pulls to Linus: (a) my local tests, (b) kernel-ci, and (c) 0-day.

There was really no specific reason for me not to add the report
from kernelci, except...
>
> [0] https://lore.kernel.org/lkml/[email protected]/
>
> I found out about this when trying to understand why the boot on the
> veyron-jaq board has been broken in 5.2-rc1.
>

I remember a report saying this failed, but from what I could tell from
the boot log, the board booted and reached a terminal. But apparently,
judging by the reports from developers, the veyron-jaq boards were hung.

That was hard for me to tell from your logs, as they looked like
a regular boot that reaches a terminal.

Maybe I should have looked for specific output from a command you guys
run, saying "successful boot" somewhere?

> Thanks,
>
> Tomeu
>

2019-05-24 08:26:26

by Tomeu Vizoso

Subject: Re: [GIT PULL] Thermal-SoC management changes for v5.2-rc1

On Fri, 24 May 2019 at 04:40, Eduardo Valentin <[email protected]> wrote:
>
> On Thu, May 23, 2019 at 11:46:47AM +0200, Tomeu Vizoso wrote:
> > Hi Eduardo,
> >
> > I saw that for 5.1 [0] you included a kernelci boot report for your
> > tree, but not for 5.2. Have you found anything that should be improved
> > in KernelCI for it to be more useful to maintainers like you?
>
> Honestly, I take a couple of automated testing as input before sending
> my pulls to Linux: (a) my local test, (b) kernel-ci, and (c) 0-day.
>
> There was really no reason specifically for me to not add the report
> from kernelci, except..
> >
> > [0] https://lore.kernel.org/lkml/[email protected]/
> >
> > I found about this when trying to understand why the boot on the
> > veyron-jaq board has been broken in 5.2-rc1.
> >
>
> I remember a report saying this failed, but from what I could tell from
> the boot log, the board booted and hit terminal. But apparently, after
> all reports from developers, the veyron-jaq boards were in a hang state.
>
> That was hard for me to tell from your logs, as they looked like
> a regular boot that hits terminal.
>
> Maybe I should have looked for a specific output of a command you guys
> run, saying "successful boot" somewhere?

I think what is easiest and clearest is to consider the bisection
reports as a very strong indication that something is quite wrong in
the branch.

Because if a board stopped booting and the bisection found a
suspicious patch, and reverting it gets the board booting again, then
chances are very high that the patch in question broke that boot.
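
To spell out that reasoning, here is a minimal Python sketch of the bisection argument. It is only an illustration under stated assumptions, not how KernelCI implements its bisections: `boots` is a hypothetical callback (for example, something that builds a commit and boots it in a lab), the commits are ordered oldest to newest, and once the boot breaks it stays broken.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str],
                     boots: Callable[[str], bool]) -> str:
    """Return the earliest commit in `commits` that fails to boot.

    Assumptions (hypothetical, for illustration only):
    - `commits` is ordered oldest to newest,
    - the parent of commits[0] is known to boot,
    - commits[-1] is known not to boot,
    - once the boot breaks, every later commit is broken too.
    """
    if not commits:
        raise ValueError("need at least one candidate commit")
    lo, hi = 0, len(commits) - 1  # invariant: commits[hi] does not boot
    while lo < hi:
        mid = (lo + hi) // 2
        if boots(commits[mid]):
            lo = mid + 1  # the breakage was introduced after mid
        else:
            hi = mid      # mid is broken; the culprit is mid or earlier
    return commits[lo]


# Usage with a made-up history in which "c3" introduced the breakage.
history = ["c1", "c2", "c3", "c4", "c5"]
broken = {"c3", "c4", "c5"}
print(first_bad_commit(history, lambda c: c not in broken))
```

Booting again with the reported commit reverted is then an independent check of the result, which is why the two together are such a strong signal.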

Do you think the wording could be improved to make it clearer? Or
maybe some other changes to make all this more useful to maintainers
like you?

Thanks,

Tomeu

> > Thanks,
> >
> > Tomeu
> >

2019-05-24 13:55:28

by Eduardo Valentin

Subject: Re: [GIT PULL] Thermal-SoC management changes for v5.2-rc1

Hello,

On Fri, May 24, 2019 at 10:23:09AM +0200, Tomeu Vizoso wrote:
> On Fri, 24 May 2019 at 04:40, Eduardo Valentin <[email protected]> wrote:
> >
> > On Thu, May 23, 2019 at 11:46:47AM +0200, Tomeu Vizoso wrote:
> > > Hi Eduardo,
> > >
> > > I saw that for 5.1 [0] you included a kernelci boot report for your
> > > tree, but not for 5.2. Have you found anything that should be improved
> > > in KernelCI for it to be more useful to maintainers like you?
> >
> > Honestly, I take a couple of automated testing as input before sending
> > my pulls to Linux: (a) my local test, (b) kernel-ci, and (c) 0-day.
> >
> > There was really no reason specifically for me to not add the report
> > from kernelci, except..
> > >
> > > [0] https://lore.kernel.org/lkml/[email protected]/
> > >
> > > I found about this when trying to understand why the boot on the
> > > veyron-jaq board has been broken in 5.2-rc1.
> > >
> >
> > I remember a report saying this failed, but from what I could tell from
> > the boot log, the board booted and hit terminal. But apparently, after
> > all reports from developers, the veyron-jaq boards were in a hang state.
> >
> > That was hard for me to tell from your logs, as they looked like
> > a regular boot that hits terminal.
> >
> > Maybe I should have looked for a specific output of a command you guys
> > run, saying "successful boot" somewhere?
>
> I think what is easiest and clearest is to consider the bisection
> reports as a very strong indication that something is quite wrong in
> the branch.

OK. I hear you.

>
> Because if a board stopped booting and the bisection found a
> suspicious patch, and reverting it gets the board booting again, then
> chances are very high that the patch in question broke that boot.
>


Yeah, for sure. If I had understood the report properly, I could have
nacked the patch.

> Do you think the wording could be improved to make it clearer? Or
> maybe some other changes to make all this more useful to maintainers
> like you?
>

Well, from my perspective, I need to judge if the failure on your report
is really related to my changes. Many times, especially on build errors,
we get failures that are unrelated. Build errors are more straightforward
to judge. Similarly, we need to find out if a boot issue is
caused by a change on the branch or something existing. On boot issues
from kernelci reports, I think the false negatives I have been seeing
are labs/boards failing to boot. Those can also be easy to spot, as
in most cases the kernel won't even load.

For this particular case, as I described before, the kernel would
load and hit the shell command line, but in fact it was in a hang state,
IIRC. That is probably why it was not straightforward to understand
from the log. Maybe a successful boot message somewhere would have
helped to spot the problem (or the opposite of it, something
saying, "I was expecting to execute a command and the board was
unresponsive").

2019-05-27 11:39:13

by Tomeu Vizoso

Subject: Re: [GIT PULL] Thermal-SoC management changes for v5.2-rc1

On Fri, 24 May 2019 at 15:54, Eduardo Valentin <[email protected]> wrote:
>
> Hello,
>
> On Fri, May 24, 2019 at 10:23:09AM +0200, Tomeu Vizoso wrote:
> > On Fri, 24 May 2019 at 04:40, Eduardo Valentin <[email protected]> wrote:
> > >
> > > On Thu, May 23, 2019 at 11:46:47AM +0200, Tomeu Vizoso wrote:
> > > > Hi Eduardo,
> > > >
> > > > I saw that for 5.1 [0] you included a kernelci boot report for your
> > > > tree, but not for 5.2. Have you found anything that should be improved
> > > > in KernelCI for it to be more useful to maintainers like you?
> > >
> > > Honestly, I take a couple of automated testing as input before sending
> > > my pulls to Linux: (a) my local test, (b) kernel-ci, and (c) 0-day.
> > >
> > > There was really no reason specifically for me to not add the report
> > > from kernelci, except..
> > > >
> > > > [0] https://lore.kernel.org/lkml/[email protected]/
> > > >
> > > > I found about this when trying to understand why the boot on the
> > > > veyron-jaq board has been broken in 5.2-rc1.
> > > >
> > >
> > > I remember a report saying this failed, but from what I could tell from
> > > the boot log, the board booted and hit terminal. But apparently, after
> > > all reports from developers, the veyron-jaq boards were in a hang state.
> > >
> > > That was hard for me to tell from your logs, as they looked like
> > > a regular boot that hits terminal.
> > >
> > > Maybe I should have looked for a specific output of a command you guys
> > > run, saying "successful boot" somewhere?
> >
> > I think what is easiest and clearest is to consider the bisection
> > reports as a very strong indication that something is quite wrong in
> > the branch.
>
> OK. I hear you.
>
> >
> > Because if a board stopped booting and the bisection found a
> > suspicious patch, and reverting it gets the board booting again, then
> > chances are very high that the patch in question broke that boot.
> >
>
>
> Yeah, for sure. If I had understood the report properly, I could have
> nacked the patch.
>
> > Do you think the wording could be improved to make it clearer? Or
> > maybe some other changes to make all this more useful to maintainers
> > like you?
> >
>
> Well, from my perspective, I need to judge if the failure on your report
> is really related to my changes. Many times, especially on build errors,
> we get failures that are unrelated. Build errors are more straightforward
> to judge. Similarly, we need to find out if a boot issue is
> caused by a change on the branch or something existing. On boot issues
> from kernelci reports, I think the false negatives I have been seeing
> are labs/boards failing to boot. Those can also be easy to spot, as
> in most cases the kernel won't even load.
>
> For this particular case, as I described before, the kernel would
> load and hit the shell command line, but in fact it was in a hang state,
> IIRC. That is probably why it was not straightforward to understand
> from the log. Maybe a successful boot message somewhere would have
> helped to spot the problem (or the opposite of it, something
> saying, "I was expecting to execute a command and the board was
> unresponsive").

Ah, I think I see now what you meant. In the boot log linked below,
near the end we have:

22:14:41.981401 ShellCommand command timed out.: Sending # in case of corruption. Connection timeout 00:04:25, retry in 00:02:12
22:14:42.083594 #

The # character is sent by the LAVA machine that the DUT is connected
to, in the hope that a userspace shell would reply with something.

But the kernel failed to boot to userspace, so we have this at the end:

22:16:54.558087 depthcharge-retry failed: 1 of 1 attempts. 'auto-login-action timed out after 285 seconds'
22:16:54.560479 depthcharge-action failed: 1 of 1 attempts. 'auto-login-action timed out after 285 seconds'
22:16:54.855023 JobError: Your job cannot terminate cleanly.

https://storage.kernelci.org/evalenti/for-kernelci/v5.1-rc6-58-gbe827ffd38ea/arm/multi_v7_defconfig/gcc-8/lab-collabora/boot-rk3288-veyron-jaq.html
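
For what it is worth, lines like these could be flagged automatically in a report. The Python sketch below is only an illustration: the marker strings are copied from the excerpt above, while the function name and the verdict labels are made up here and are not part of LAVA or KernelCI.

```python
# Marker strings taken from the log excerpt above; the names and verdict
# labels below are hypothetical.
FATAL_MARKERS = ("auto-login-action timed out", "JobError")
SUSPECT_MARKERS = ("ShellCommand command timed out",)


def classify_boot_log(log_text: str) -> str:
    """Return a coarse verdict for a boot log passed in as one string."""
    if any(marker in log_text for marker in FATAL_MARKERS):
        return "boot failed"
    if any(marker in log_text for marker in SUSPECT_MARKERS):
        return "suspect"
    # No known failure marker was seen. That alone does not prove the
    # board reached a working userspace shell.
    return "no failure marker found"


excerpt = """\
22:14:41.981401 ShellCommand command timed out.: Sending # in case of corruption.
22:16:54.558087 depthcharge-retry failed: 1 of 1 attempts. 'auto-login-action timed out after 285 seconds'
22:16:54.855023 JobError: Your job cannot terminate cleanly.
"""
print(classify_boot_log(excerpt))
```

A one-line verdict of this kind at the top of the report would cover the "successful boot" message (or its opposite) asked for earlier in the thread.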

Do you think that's the source of the confusion?

Thanks,

Tomeu