Am 13. September 2023 10:16:51 MESZ schrieb AngeloGioacchino Del Regno <[email protected]>:
Hi angelo,
thanks for first look
>Il 11/09/23 20:33, Frank Wunderlich ha scritto:
>> From: Frank Wunderlich <[email protected]>
>>
>> Add Support for mediatek fologic 880/MT7988.
>>
>> Signed-off-by: Frank Wunderlich <[email protected]>
>> ---
>> drivers/thermal/mediatek/lvts_thermal.c | 73 +++++++++++++++++++++++++
>> 1 file changed, 73 insertions(+)
>>
>> diff --git a/drivers/thermal/mediatek/lvts_thermal.c b/drivers/thermal/mediatek/lvts_thermal.c
>> index c1004b4da3b6..48b257a3c80e 100644
>> --- a/drivers/thermal/mediatek/lvts_thermal.c
>> +++ b/drivers/thermal/mediatek/lvts_thermal.c
>> @@ -82,6 +82,8 @@
>> #define LVTS_GOLDEN_TEMP_DEFAULT 50
>> #define LVTS_COEFF_A_MT8195 -250460
>> #define LVTS_COEFF_B_MT8195 250460
>> +#define LVTS_COEFF_A_MT7988 -204650
>> +#define LVTS_COEFF_B_MT7988 204650
>> #define LVTS_MSR_IMMEDIATE_MODE 0
>> #define LVTS_MSR_FILTERED_MODE 1
>> @@ -1272,6 +1274,67 @@ static int lvts_remove(struct platform_device *pdev)
>> return 0;
>> }
>> +/*
>> + * LVTS MT7988
>> + */
>> +#define LVTS_HW_SHUTDOWN_MT7988 117000
>
>Are you sure that this chip's Tj is >117°C ?!
>
>Looks a bit high... if it is exactly 117°C, I would suggest cutting earlier,
>either at 110 (safe side) or 115: after all, this is a life-saver feature and
>the chip is actually never meant to *constantly* work at 110°C (as it would
>degrade fast and say goodbye earlier than "planned").
I took values from SDK
https://git01.mediatek.com/plugins/gitiles/openwrt/feeds/mtk-openwrt-feeds/+/refs/heads/master/target/linux/mediatek/files-5.4/drivers/thermal/mediatek/soc_temp_lvts.c#1483
>> +//enum mt7988_lvts_domain { MT7988_AP_DOMAIN, MT7988_NUM_DOMAIN };
>> +
>> +enum mt7988_lvts_sensor_enum {
>> + MT7988_TS3_0,
>> + MT7988_TS3_1,
>> + MT7988_TS3_2,
>> + MT7988_TS3_3,
>> + MT7988_TS4_0,
>> + MT7988_TS4_1,
>> + MT7988_TS4_2,
>> + MT7988_TS4_3,
>> + MT7988_NUM_TS
>> +};
>This enumeration should be definitions in bindings (mediatek,lvts-thermal.h).
>
>Besides, the LVTS is about internal temperatures, so those TS3_x and 4_x can
>be renamed like what was done for MT8192 and MT8195: this is because you will
>never see TS3_2 being CPU2 on a board and CPU4 on another, being those - again -
>internal to the SoC, hence unchangeable.
Right these sensors are internally only and i took naming from sdk to avoid confusion. And i have not more information about these internal sensors (special meaning),but their values are packed together to get the resulting (average) temperature.
>Another reason is that you'll anyway have to refer to those sensors in the
>devicetree to configure thermal trips and such, so... :-)
In device tree it will look like this:
https://github.com/frank-w/BPI-Router-Linux/blob/6.5-lvts/arch/arm64/boot/dts/mediatek/mt7988a.dtsi#L771
Daniel has also defined thermal trips there,but these are untested atm. I only verified temperature itself i get from sysfs as far as i can (start at ~40°C and reaching ~70 while running).
>Regards,
>Angelo
regards Frank
Il 13/09/23 12:52, Frank Wunderlich ha scritto:
> Am 13. September 2023 10:16:51 MESZ schrieb AngeloGioacchino Del Regno <[email protected]>:
> Hi angelo,
>
> thanks for first look
>
>> Il 11/09/23 20:33, Frank Wunderlich ha scritto:
>>> From: Frank Wunderlich <[email protected]>
>>>
>>> Add Support for mediatek fologic 880/MT7988.
>>>
>>> Signed-off-by: Frank Wunderlich <[email protected]>
>>> ---
>>> drivers/thermal/mediatek/lvts_thermal.c | 73 +++++++++++++++++++++++++
>>> 1 file changed, 73 insertions(+)
>>>
>>> diff --git a/drivers/thermal/mediatek/lvts_thermal.c b/drivers/thermal/mediatek/lvts_thermal.c
>>> index c1004b4da3b6..48b257a3c80e 100644
>>> --- a/drivers/thermal/mediatek/lvts_thermal.c
>>> +++ b/drivers/thermal/mediatek/lvts_thermal.c
>>> @@ -82,6 +82,8 @@
>>> #define LVTS_GOLDEN_TEMP_DEFAULT 50
>>> #define LVTS_COEFF_A_MT8195 -250460
>>> #define LVTS_COEFF_B_MT8195 250460
>>> +#define LVTS_COEFF_A_MT7988 -204650
>>> +#define LVTS_COEFF_B_MT7988 204650
>>> #define LVTS_MSR_IMMEDIATE_MODE 0
>>> #define LVTS_MSR_FILTERED_MODE 1
>>> @@ -1272,6 +1274,67 @@ static int lvts_remove(struct platform_device *pdev)
>>> return 0;
>>> }
>>> +/*
>>> + * LVTS MT7988
>>> + */
>>> +#define LVTS_HW_SHUTDOWN_MT7988 117000
>>
>> Are you sure that this chip's Tj is >117°C ?!
>>
>> Looks a bit high... if it is exactly 117°C, I would suggest cutting earlier,
>> either at 110 (safe side) or 115: after all, this is a life-saver feature and
>> the chip is actually never meant to *constantly* work at 110°C (as it would
>> degrade fast and say goodbye earlier than "planned").
>
> I took values from SDK
>
> https://git01.mediatek.com/plugins/gitiles/openwrt/feeds/mtk-openwrt-feeds/+/refs/heads/master/target/linux/mediatek/files-5.4/drivers/thermal/mediatek/soc_temp_lvts.c#1483
>
That kernel also defines 117°C for MT8195, which leaves me a bit dubious.
For safety, I would recommend using the same 105°C hw shutdown temperature
for 7988 as well: after all if you think about it, reaching that kind of
temperature means that there's a real problem (fan stopped working and/or
heatsink detached) that needs attention.
This is because you'll have trip points in devicetree that should throttle
the CPU in case it reaches at least a dangerously high temperature (read:
a temperature outside the recommended continuous operation range), bringing
the temperatures down because of the throttling action; I would imagine
throttling the CPU a bit down at 80°C, then a bit more at 90°C - but then,
if the temps won't drop like that, there's clearly a HW-related issue that
must be addressed (like the fan/heatsink scenario that I just described).
Though, take this as an advice; I'll respect whichever decision you'll take.
>>> +//enum mt7988_lvts_domain { MT7988_AP_DOMAIN, MT7988_NUM_DOMAIN };
>>> +
>>> +enum mt7988_lvts_sensor_enum {
>>> + MT7988_TS3_0,
>>> + MT7988_TS3_1,
>>> + MT7988_TS3_2,
>>> + MT7988_TS3_3,
>>> + MT7988_TS4_0,
>>> + MT7988_TS4_1,
>>> + MT7988_TS4_2,
>>> + MT7988_TS4_3,
>>> + MT7988_NUM_TS
>>> +};
>
>> This enumeration should be definitions in bindings (mediatek,lvts-thermal.h).
>>
>> Besides, the LVTS is about internal temperatures, so those TS3_x and 4_x can
>> be renamed like what was done for MT8192 and MT8195: this is because you will
>> never see TS3_2 being CPU2 on a board and CPU4 on another, being those - again -
>> internal to the SoC, hence unchangeable.
>
> Right these sensors are internally only and i took naming from sdk to avoid confusion. And i have not more information about these internal sensors (special meaning),but their values are packed together to get the resulting (average) temperature.
>
>> Another reason is that you'll anyway have to refer to those sensors in the
>> devicetree to configure thermal trips and such, so... :-)
>
> In device tree it will look like this:
>
> https://github.com/frank-w/BPI-Router-Linux/blob/6.5-lvts/arch/arm64/boot/dts/mediatek/mt7988a.dtsi#L771
>
> Daniel has also defined thermal trips there,but these are untested atm. I only verified temperature itself i get from sysfs as far as i can (start at ~40°C and reaching ~70 while running).
>
Check how it's done in mt8192.dtsi and mt8195.dtsi: we're using the definitions
from the bindings for thermal zones.
At least for consistency (apart from other even more valid considerations), that's
how it should be done: please do it like that.
Besides, I think that you can definitely guess what `TS` is CPU related: checking
the driver and devicetree from the downstream kernel that you provided, what I
understand is:
1. Your naming TS3,4 corresponds to TS2,3 downstream (so it's wrong?)
2. Downstream TS2 is related to the CPU cores, so it should be
- TS2_0 - CPU0
- TS2_1 - CPU1
- TS2_2 - CPU2
- TS2_3 - CPU3
The other set of thermal sensors seem to be completely unused, so we cannot guess
just by looking at the code. Since you have the hardware, you may be able to take
a position about what they are by checking what sensor heats up for each "action"
(could be ethernet controller or infra or whatever else); if in doubt, just leave
them out, but add a comment saying that there are more sensors and say where, so
that if anyone finds what they're for, it'll be easy to add them.
In any case, adding thermal sensors that you don't know about is meaningless, as
the sense of all this is:
- Monitoring temperatures of the hardware
- Taking action to prevent temperature overrun
but if you don't know what you're reading, you can't interpret temperatures for
something unknown, hence you can't as well take action to prevent overruns, as
you won't know what's the maximum temperature for "unknown thing" :-)
Regards,
Angelo
Hi,
> Gesendet: Mittwoch, 13. September 2023 um 13:43 Uhr
> Von: "AngeloGioacchino Del Regno" <[email protected]>
> An: [email protected], "Frank Wunderlich" <[email protected]>, [email protected]
> Il 13/09/23 12:52, Frank Wunderlich ha scritto:
> > Am 13. September 2023 10:16:51 MESZ schrieb AngeloGioacchino Del Regno <[email protected]>:
> > Hi angelo,
> >
> > thanks for first look
> >
> >> Il 11/09/23 20:33, Frank Wunderlich ha scritto:
> >>> From: Frank Wunderlich <[email protected]>
> >>>
> >>> Add Support for mediatek fologic 880/MT7988.
> >>>
> >>> Signed-off-by: Frank Wunderlich <[email protected]>
> >>> ---
> >>> drivers/thermal/mediatek/lvts_thermal.c | 73 +++++++++++++++++++++++++
> >>> 1 file changed, 73 insertions(+)
> >>>
> >>> diff --git a/drivers/thermal/mediatek/lvts_thermal.c b/drivers/thermal/mediatek/lvts_thermal.c
> >>> index c1004b4da3b6..48b257a3c80e 100644
> >>> --- a/drivers/thermal/mediatek/lvts_thermal.c
> >>> +++ b/drivers/thermal/mediatek/lvts_thermal.c
> >>> @@ -82,6 +82,8 @@
> >>> #define LVTS_GOLDEN_TEMP_DEFAULT 50
> >>> #define LVTS_COEFF_A_MT8195 -250460
> >>> #define LVTS_COEFF_B_MT8195 250460
> >>> +#define LVTS_COEFF_A_MT7988 -204650
> >>> +#define LVTS_COEFF_B_MT7988 204650
> >>> #define LVTS_MSR_IMMEDIATE_MODE 0
> >>> #define LVTS_MSR_FILTERED_MODE 1
> >>> @@ -1272,6 +1274,67 @@ static int lvts_remove(struct platform_device *pdev)
> >>> return 0;
> >>> }
> >>> +/*
> >>> + * LVTS MT7988
> >>> + */
> >>> +#define LVTS_HW_SHUTDOWN_MT7988 117000
> >>
> >> Are you sure that this chip's Tj is >117°C ?!
> >>
> >> Looks a bit high... if it is exactly 117°C, I would suggest cutting earlier,
> >> either at 110 (safe side) or 115: after all, this is a life-saver feature and
> >> the chip is actually never meant to *constantly* work at 110°C (as it would
> >> degrade fast and say goodbye earlier than "planned").
> >
> > I took values from SDK
> >
> > https://git01.mediatek.com/plugins/gitiles/openwrt/feeds/mtk-openwrt-feeds/+/refs/heads/master/target/linux/mediatek/files-5.4/drivers/thermal/mediatek/soc_temp_lvts.c#1483
> >
>
> That kernel also defines 117°C for MT8195, which leaves me a bit dubious.
>
> For safety, I would recommend using the same 105°C hw shutdown temperature
> for 7988 as well: after all if you think about it, reaching that kind of
> temperature means that there's a real problem (fan stopped working and/or
> heatsink detached) that needs attention.
ack, will change to same value and put the define on top, abve the mt8195
> This is because you'll have trip points in devicetree that should throttle
> the CPU in case it reaches at least a dangerously high temperature (read:
> a temperature outside the recommended continuous operation range), bringing
> the temperatures down because of the throttling action; I would imagine
> throttling the CPU a bit down at 80°C, then a bit more at 90°C - but then,
> if the temps won't drop like that, there's clearly a HW-related issue that
> must be addressed (like the fan/heatsink scenario that I just described).
>
> Though, take this as an advice; I'll respect whichever decision you'll take.
>
> >>> +//enum mt7988_lvts_domain { MT7988_AP_DOMAIN, MT7988_NUM_DOMAIN };
> >>> +
> >>> +enum mt7988_lvts_sensor_enum {
> >>> + MT7988_TS3_0,
> >>> + MT7988_TS3_1,
> >>> + MT7988_TS3_2,
> >>> + MT7988_TS3_3,
> >>> + MT7988_TS4_0,
> >>> + MT7988_TS4_1,
> >>> + MT7988_TS4_2,
> >>> + MT7988_TS4_3,
> >>> + MT7988_NUM_TS
> >>> +};
> >
> >> This enumeration should be definitions in bindings (mediatek,lvts-thermal.h).
> >>
> >> Besides, the LVTS is about internal temperatures, so those TS3_x and 4_x can
> >> be renamed like what was done for MT8192 and MT8195: this is because you will
> >> never see TS3_2 being CPU2 on a board and CPU4 on another, being those - again -
> >> internal to the SoC, hence unchangeable.
> >
> > Right these sensors are internally only and i took naming from sdk to avoid confusion. And i have not more information about these internal sensors (special meaning),but their values are packed together to get the resulting (average) temperature.
> >
> >> Another reason is that you'll anyway have to refer to those sensors in the
> >> devicetree to configure thermal trips and such, so... :-)
> >
> > In device tree it will look like this:
> >
> > https://github.com/frank-w/BPI-Router-Linux/blob/6.5-lvts/arch/arm64/boot/dts/mediatek/mt7988a.dtsi#L771
> >
> > Daniel has also defined thermal trips there,but these are untested atm. I only verified temperature itself i get from sysfs as far as i can (start at ~40°C and reaching ~70 while running).
> >
>
> Check how it's done in mt8192.dtsi and mt8195.dtsi: we're using the definitions
> from the bindings for thermal zones.
> At least for consistency (apart from other even more valid considerations), that's
> how it should be done: please do it like that.
>
> Besides, I think that you can definitely guess what `TS` is CPU related: checking
> the driver and devicetree from the downstream kernel that you provided, what I
> understand is:
>
> 1. Your naming TS3,4 corresponds to TS2,3 downstream (so it's wrong?)
> 2. Downstream TS2 is related to the CPU cores, so it should be
> - TS2_0 - CPU0
> - TS2_1 - CPU1
> - TS2_2 - CPU2
> - TS2_3 - CPU3
i took the names from efuse names (which are listed as comment above), had 2/3 before.
need to ask mtk here. same for the second controller and if it is really a "lvts-ap".
> The other set of thermal sensors seem to be completely unused, so we cannot guess
> just by looking at the code. Since you have the hardware, you may be able to take
> a position about what they are by checking what sensor heats up for each "action"
> (could be ethernet controller or infra or whatever else); if in doubt, just leave
> them out, but add a comment saying that there are more sensors and say where, so
> that if anyone finds what they're for, it'll be easy to add them.
>
> In any case, adding thermal sensors that you don't know about is meaningless, as
> the sense of all this is:
> - Monitoring temperatures of the hardware
> - Taking action to prevent temperature overrun
> but if you don't know what you're reading, you can't interpret temperatures for
> something unknown, hence you can't as well take action to prevent overruns, as
> you won't know what's the maximum temperature for "unknown thing" :-)
yes i have only the downstream kernel and some additional information like interupt
number and the efuse offsets for calibration. And for the 8 sensors i get this information:
"8 sensors are distributed across the whole soc in order to get the correct average temperature.
chip designer didn’t disclose more detailed info to us."
so i thought all 8 sensors are taken into account with both controllers to calculate 1
resulting temperature. In downstream i saw also only 1 controller in dts and i have
only 1 interrupt.
https://git01.mediatek.com/plugins/gitiles/openwrt/feeds/mtk-openwrt-feeds/+/refs/heads/master/target/linux/mediatek/files-5.4/arch/arm64/boot/dts/mediatek/mt7988.dtsi#579
and the compatible for both (mt7988 and mt8195) are without "ap" there, so i took the same.
Where have you seen the mt7988-AP ?
i have only seen 1 sensor which should be the SOC itself and this chip has metal surface
which makes it hard to get real temperature (infrared thermometer gives wrong temp).
> Regards,
> Angelo