by Jarkko Sakkinen

[permalink] [raw]

Subject: Re: [PATCH 1/3] tpm: protect against locality counter underflow

On Sun Feb 25, 2024 at 1:23 PM EET, Daniel P. Smith wrote:
> On 2/23/24 07:58, Jarkko Sakkinen wrote:
> > On Fri Feb 23, 2024 at 3:58 AM EET, Daniel P. Smith wrote:
> >>> Just adding here that I wish we also had a log transcript of bug, which
> >>> is right now missing. The explanation believable enough to move forward
> >>> but I still wish to see a log transcript.
> >>
> >> That will be forth coming.
> >
> > I did not respond yet to other responses that you've given in the past
> > 12'ish hours or so (just woke up) but I started to think how all this
> > great and useful information would be best kept in memory. Some of it
> > has been discussed in the past but there is lot of small details that
> > are too easily forgotten.
> >
> > I'd think the best "documentation" approach here would be inject the
> > spec references to the sites where locality behaviour is changed so
> > that it is easy in future cross-reference them, and least of risk
> > of having code changes that would break anything. I think this way
> > all the information that you provided is best preserved for the
> > future.
> >
> > Thanks a lot for great and informative responses!
>
> No problem at all.
>
> Here is a serial output[1] from a dynamic launch using Linux Secure
> Launch v7[2] with one additional patch[3] to dump TPM driver state.

But are this fixes for a kernel tree with [2] applied.

If the bugs do not occur in the mainline tree without the out-of-tree
feature, they are not bug fixes. They should then really be part of that
series.

BR, Jarkko

2024-02-26 12:45:24

by Alexander Steffen

[permalink] [raw]

Subject: Re: [PATCH 1/3] tpm: protect against locality counter underflow

On 23.02.2024 02:55, Daniel P. Smith wrote:
> On 2/20/24 13:42, Alexander Steffen wrote:
>> On 02.02.2024 04:08, Lino Sanfilippo wrote:
>>> On 01.02.24 23:21, Jarkko Sakkinen wrote:
>>>
>>>>
>>>> On Wed Jan 31, 2024 at 7:08 PM EET, Daniel P. Smith wrote:
>>>>> Commit 933bfc5ad213 introduced the use of a locality counter to
>>>>> control when a
>>>>> locality request is allowed to be sent to the TPM. In the commit,
>>>>> the counter
>>>>> is indiscriminately decremented. Thus creating a situation for an
>>>>> integer
>>>>> underflow of the counter.
>>>>
>>>> What is the sequence of events that leads to this triggering the
>>>> underflow? This information should be represent in the commit message.
>>>>
>>>
>>> AFAIU this is:
>>>
>>> 1. We start with a locality_counter of 0 and then we call
>>> tpm_tis_request_locality()
>>> for the first time, but since a locality is (unexpectedly) already
>>> active
>>> check_locality() and consequently __tpm_tis_request_locality() return
>>> "true".
>>
>> check_locality() returns true, but __tpm_tis_request_locality() returns
>> the requested locality. Currently, this is always 0, so the check for
>> !ret will always correctly indicate success and increment the
>> locality_count.
>>
>> But since theoretically a locality != 0 could be requested, the correct
>> fix would be to check for something like ret >= 0 or ret == l instead of
>> !ret. Then the counter will also be incremented correctly for localities
>> != 0, and no underflow will happen later on. Therefore, explicitly
>> checking for an underflow is unnecessary and hides the real problem.
>>
>
> My apologies, but I will have to humbly disagree from a fundamental
> level here. If a state variable has bounds, then those bounds should be
> enforced when the variable is being manipulated.

That's fine, but that is not what your proposed fix does.

tpm_tis_request_locality and tpm_tis_relinquish_locality are meant to be
called in pairs: for every (successful) call to tpm_tis_request_locality
there *must* be a corresponding call to tpm_tis_relinquish_locality
afterwards. Unfortunately, in C there is no language construct to
enforce that (nothing like a Python context manager), so instead
locality_count is used to count the number of successful calls to
tpm_tis_request_locality, so that tpm_tis_relinquish_locality can wait
to actually relinquish the locality until the last expected call has
happened (you can think of that as a Python RLock, to stay with the
Python analogies).

So if locality_count ever gets negative, that is certainly a bug
somewhere. But your proposed fix hides this bug, by allowing
tpm_tis_relinquish_locality to be called more often than
tpm_tis_request_locality. You could have added something like
BUG_ON(priv->locality_count == 0) before decrementing the counter. That
would really enforce the bounds, without hiding the bug, and I would be
fine with that.

Of course, that still leaves the actual bug to be fixed. In this case,
there is no mismatch between the calls to tpm_tis_request_locality and
tpm_tis_relinquish_locality. It is just (as I said before) that the
counting of successful calls in tpm_tis_request_locality is broken for
localities != 0, so that is what you need to fix.

> Assuming that every
> path leading to the variable manipulation code has ensured proper
> manipulation is just that, an assumption. When assumptions fail is how
> bugs and vulnerabilities occur.
>
> To your point, does this full address the situation experienced, I would
> say it does not. IMHO, the situation is really a combination of both
> patch 1 and patch 2, but the request was to split the changes for
> individual discussion. We selected this one as being the fixes for two
> reasons. First, it blocks the underflow such that when the Secure Launch
> series opens Locality 2, it will get incremented at that time and the
> internal locality tracking state variables will end up with the correct
> values. Thus leading to the relinquish succeeding at kernel shutdown.
> Second, it provides a stronger defensive coding practice.
>
> Another reason that this works as a fix is that the TPM specification
> requires the registers to be mirrored across all localities, regardless
> of the active locality. While all the request/relinquishes for Locality
> 0 sent by the early code do not succeed, obtaining the values via the
> Locality 0 registers are still guaranteed to be correct.
>
> v/r,
> dps