This patch prevents NULL dereferencing when using chip->ops while
sending TPM2_Shutdown command if both tpm_class_shutdown handler and
tpm_del_char_device are called during system shutdown.
Both these handlers set chip->ops to NULL but don't check if it's
already NULL when they are called before using it.
This issue was revealed in Chrome OS after a recent set of changes
to the unregister order for spi controllers, such as:
b4c6230bb0ba spi: Fix controller unregister order
f40913d2dca1 spi: pxa2xx: Fix controller unregister order
and similar for other controllers.
Signed-off-by: Andrey Pronin <[email protected]>
---
drivers/char/tpm/tpm-chip.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/char/tpm/tpm-chip.c b/drivers/char/tpm/tpm-chip.c
index 8c77e88012e9..a410ca40a3c5 100644
--- a/drivers/char/tpm/tpm-chip.c
+++ b/drivers/char/tpm/tpm-chip.c
@@ -296,7 +296,7 @@ static int tpm_class_shutdown(struct device *dev)
struct tpm_chip *chip = container_of(dev, struct tpm_chip, dev);
down_write(&chip->ops_sem);
- if (chip->flags & TPM_CHIP_FLAG_TPM2) {
+ if (chip->ops && (chip->flags & TPM_CHIP_FLAG_TPM2)) {
if (!tpm_chip_start(chip)) {
tpm2_shutdown(chip, TPM2_SU_CLEAR);
tpm_chip_stop(chip);
@@ -479,7 +479,7 @@ static void tpm_del_char_device(struct tpm_chip *chip)
/* Make the driver uncallable. */
down_write(&chip->ops_sem);
- if (chip->flags & TPM_CHIP_FLAG_TPM2) {
+ if (chip->ops && (chip->flags & TPM_CHIP_FLAG_TPM2)) {
if (!tpm_chip_start(chip)) {
tpm2_shutdown(chip, TPM2_SU_CLEAR);
tpm_chip_stop(chip);
--
2.25.1
On Thu, Jul 09, 2020 at 05:22:09PM -0700, Andrey Pronin wrote:
> This patch prevents NULL dereferencing when using chip->ops while
> sending TPM2_Shutdown command if both tpm_class_shutdown handler and
> tpm_del_char_device are called during system shutdown.
>
> Both these handlers set chip->ops to NULL but don't check if it's
> already NULL when they are called before using it.
>
> This issue was revealed in Chrome OS after a recent set of changes
> to the unregister order for spi controllers, such as:
> b4c6230bb0ba spi: Fix controller unregister order
> f40913d2dca1 spi: pxa2xx: Fix controller unregister order
> and similar for other controllers.
I'm not sure I fully understand the scenario. When does thi happen?
Why does not tpm_del_char_device need this? The changes listed tell
me nothing. Why they have this effect?
I'm just trying to understand whether this could be a regression or
not.
I neither understand what you mean by "and similar for other
controllers."
NAK for the reason that I don't understand what I'm merging.
/Jarkko
>
> Signed-off-by: Andrey Pronin <[email protected]>
> ---
> drivers/char/tpm/tpm-chip.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/char/tpm/tpm-chip.c b/drivers/char/tpm/tpm-chip.c
> index 8c77e88012e9..a410ca40a3c5 100644
> --- a/drivers/char/tpm/tpm-chip.c
> +++ b/drivers/char/tpm/tpm-chip.c
> @@ -296,7 +296,7 @@ static int tpm_class_shutdown(struct device *dev)
> struct tpm_chip *chip = container_of(dev, struct tpm_chip, dev);
>
> down_write(&chip->ops_sem);
> - if (chip->flags & TPM_CHIP_FLAG_TPM2) {
> + if (chip->ops && (chip->flags & TPM_CHIP_FLAG_TPM2)) {
> if (!tpm_chip_start(chip)) {
> tpm2_shutdown(chip, TPM2_SU_CLEAR);
> tpm_chip_stop(chip);
> @@ -479,7 +479,7 @@ static void tpm_del_char_device(struct tpm_chip *chip)
>
> /* Make the driver uncallable. */
> down_write(&chip->ops_sem);
> - if (chip->flags & TPM_CHIP_FLAG_TPM2) {
> + if (chip->ops && (chip->flags & TPM_CHIP_FLAG_TPM2)) {
> if (!tpm_chip_start(chip)) {
> tpm2_shutdown(chip, TPM2_SU_CLEAR);
> tpm_chip_stop(chip);
> --
> 2.25.1
>
On Fri, Jul 10, 2020 at 4:40 AM Jarkko Sakkinen
<[email protected]> wrote:
>
> On Thu, Jul 09, 2020 at 05:22:09PM -0700, Andrey Pronin wrote:
> > This patch prevents NULL dereferencing when using chip->ops while
> > sending TPM2_Shutdown command if both tpm_class_shutdown handler and
> > tpm_del_char_device are called during system shutdown.
> >
> > Both these handlers set chip->ops to NULL but don't check if it's
> > already NULL when they are called before using it.
> >
> > This issue was revealed in Chrome OS after a recent set of changes
> > to the unregister order for spi controllers, such as:
> > b4c6230bb0ba spi: Fix controller unregister order
> > f40913d2dca1 spi: pxa2xx: Fix controller unregister order
> > and similar for other controllers.
>
> I'm not sure I fully understand the scenario. When does thi happen?
This happens during system shutdown.
Here a sample stack trace from the panic:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
...
Call Trace:
tpm_transmit_cmd+0x21/0x7f
tpm2_shutdown+0x84/0xc6
tpm_chip_unregister+0xa2/0xb9
cr50_spi_remove+0x19/0x26
spi_drv_remove+0x2a/0x42
device_release_driver_internal+0x123/0x1ec
bus_remove_device+0xe8/0x111
device_del+0x1bf/0x319
? spi_unregister_controller+0xfc/0xfc
device_unregister+0x12/0x28
__unregister+0xe/0x12
device_for_each_child+0x79/0xb8
spi_unregister_controller+0x27/0xfc
pxa2xx_spi_remove+0x45/0xb4
device_shutdown+0x181/0x1d3
kernel_restart+0x12/0x56
SyS_reboot+0x16a/0x1e7
do_syscall_64+0x6b/0xf7
entry_SYSCALL_64_after_hwframe+0x41/0xa6
> Why does not tpm_del_char_device need this?
"Not" is a typo in the sentence above, right? tpm_del_char_device *does*
need the fix. When tpm_class_shutdown is called it sets chip->ops to
NULL. If tpm_del_char_device is called after that, it doesn't check if
chip->ops is NULL (normal kernel API and char device API calls go
through tpm_try_get_ops, but tpm_del_char_device doesn't) and proceeds to
call tpm2_shutdown(), which tries sending the command and dereferences
chip->ops.
> The changes listed tell
> me nothing. Why they have this effect?
"spi: pxa2xx: Fix controller unregister order" adds a spi_unregister_controller
call to pxa2xx_spi_remove, which internally calls tpm_del_char_device, which
results in the stack trace leading to the panic above.
>
> I'm just trying to understand whether this could be a regression or
> not.
>
> I neither understand what you mean by "and similar for other
> controllers."
There was a series of spi unregister order changes for various
spi controllers, including the following:
1c6221b430a0 spi: bcm2835: Fix controller unregister order
f40913d2dca1 spi: pxa2xx: Fix controller unregister order
b4c6230bb0ba spi: Fix controller unregister order
54000d2e15e9 spi: dw: Fix controller unregister order
c8f309db490e spi: bcm2835aux: Fix controller unregister order
>
> NAK for the reason that I don't understand what I'm merging.
>
> /Jarkko
>
> >
> > Signed-off-by: Andrey Pronin <[email protected]>
> > ---
> > drivers/char/tpm/tpm-chip.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/char/tpm/tpm-chip.c b/drivers/char/tpm/tpm-chip.c
> > index 8c77e88012e9..a410ca40a3c5 100644
> > --- a/drivers/char/tpm/tpm-chip.c
> > +++ b/drivers/char/tpm/tpm-chip.c
> > @@ -296,7 +296,7 @@ static int tpm_class_shutdown(struct device *dev)
> > struct tpm_chip *chip = container_of(dev, struct tpm_chip, dev);
> >
> > down_write(&chip->ops_sem);
> > - if (chip->flags & TPM_CHIP_FLAG_TPM2) {
> > + if (chip->ops && (chip->flags & TPM_CHIP_FLAG_TPM2)) {
> > if (!tpm_chip_start(chip)) {
> > tpm2_shutdown(chip, TPM2_SU_CLEAR);
> > tpm_chip_stop(chip);
> > @@ -479,7 +479,7 @@ static void tpm_del_char_device(struct tpm_chip *chip)
> >
> > /* Make the driver uncallable. */
> > down_write(&chip->ops_sem);
> > - if (chip->flags & TPM_CHIP_FLAG_TPM2) {
> > + if (chip->ops && (chip->flags & TPM_CHIP_FLAG_TPM2)) {
> > if (!tpm_chip_start(chip)) {
> > tpm2_shutdown(chip, TPM2_SU_CLEAR);
> > tpm_chip_stop(chip);
> > --
> > 2.25.1
> >
--
Andrey
On Thu, 2020-07-09 at 17:22 -0700, Andrey Pronin wrote:
> This patch prevents NULL dereferencing when using chip->ops while
> sending TPM2_Shutdown command if both tpm_class_shutdown handler and
> tpm_del_char_device are called during system shutdown.
>
> Both these handlers set chip->ops to NULL but don't check if it's
> already NULL when they are called before using it.
>
> This issue was revealed in Chrome OS after a recent set of changes
> to the unregister order for spi controllers, such as:
> b4c6230bb0ba spi: Fix controller unregister order
> f40913d2dca1 spi: pxa2xx: Fix controller unregister order
> and similar for other controllers.
>
> Signed-off-by: Andrey Pronin <[email protected]>
> ---
> drivers/char/tpm/tpm-chip.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/char/tpm/tpm-chip.c b/drivers/char/tpm/tpm-
> chip.c
> index 8c77e88012e9..a410ca40a3c5 100644
> --- a/drivers/char/tpm/tpm-chip.c
> +++ b/drivers/char/tpm/tpm-chip.c
> @@ -296,7 +296,7 @@ static int tpm_class_shutdown(struct device *dev)
> struct tpm_chip *chip = container_of(dev, struct tpm_chip,
> dev);
>
> down_write(&chip->ops_sem);
> - if (chip->flags & TPM_CHIP_FLAG_TPM2) {
> + if (chip->ops && (chip->flags & TPM_CHIP_FLAG_TPM2)) {
> if (!tpm_chip_start(chip)) {
> tpm2_shutdown(chip, TPM2_SU_CLEAR);
> tpm_chip_stop(chip);
> @@ -479,7 +479,7 @@ static void tpm_del_char_device(struct tpm_chip
> *chip)
>
> /* Make the driver uncallable. */
> down_write(&chip->ops_sem);
> - if (chip->flags & TPM_CHIP_FLAG_TPM2) {
> + if (chip->ops && (chip->flags & TPM_CHIP_FLAG_TPM2)) {
> if (!tpm_chip_start(chip)) {
> tpm2_shutdown(chip, TPM2_SU_CLEAR);
> tpm_chip_stop(chip);
I really don't think this is the right fix. The problem is that these
two functions are trying to open code tpm_try_get_ops/tpm_put_ops (only
really for the tpm2 shutdown) because they want to NULL out the ops
before final mutex unlock. The problem with the current open coding is
it doesn't shut down the clock if required (not really a problem for
shutdown, but might cause issues for simple rmmod). I think this is
because no-one noticed the open coding when get/put was updated.
This code should all be abstracted into a single function and shared
with tpm_try_get_ops/tpm_put_ops so we can't have this happen in
future. Possibly there should be a chip shutdown function which is
only active for TPM2 which does the correct try_get/shutdown/put
operation and then a separate simple get mutex, null ops, put mutex one
that's guaranteed to be called last.
James
On Fri, Jul 10, 2020 at 12:08 PM James Bottomley
<[email protected]> wrote:
>
> On Thu, 2020-07-09 at 17:22 -0700, Andrey Pronin wrote:
> > This patch prevents NULL dereferencing when using chip->ops while
> > sending TPM2_Shutdown command if both tpm_class_shutdown handler and
> > tpm_del_char_device are called during system shutdown.
> >
> > Both these handlers set chip->ops to NULL but don't check if it's
> > already NULL when they are called before using it.
> >
> > This issue was revealed in Chrome OS after a recent set of changes
> > to the unregister order for spi controllers, such as:
> > b4c6230bb0ba spi: Fix controller unregister order
> > f40913d2dca1 spi: pxa2xx: Fix controller unregister order
> > and similar for other controllers.
> >
> > Signed-off-by: Andrey Pronin <[email protected]>
> > ---
> > drivers/char/tpm/tpm-chip.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/char/tpm/tpm-chip.c b/drivers/char/tpm/tpm-
> > chip.c
> > index 8c77e88012e9..a410ca40a3c5 100644
> > --- a/drivers/char/tpm/tpm-chip.c
> > +++ b/drivers/char/tpm/tpm-chip.c
> > @@ -296,7 +296,7 @@ static int tpm_class_shutdown(struct device *dev)
> > struct tpm_chip *chip = container_of(dev, struct tpm_chip,
> > dev);
> >
> > down_write(&chip->ops_sem);
> > - if (chip->flags & TPM_CHIP_FLAG_TPM2) {
> > + if (chip->ops && (chip->flags & TPM_CHIP_FLAG_TPM2)) {
> > if (!tpm_chip_start(chip)) {
> > tpm2_shutdown(chip, TPM2_SU_CLEAR);
> > tpm_chip_stop(chip);
> > @@ -479,7 +479,7 @@ static void tpm_del_char_device(struct tpm_chip
> > *chip)
> >
> > /* Make the driver uncallable. */
> > down_write(&chip->ops_sem);
> > - if (chip->flags & TPM_CHIP_FLAG_TPM2) {
> > + if (chip->ops && (chip->flags & TPM_CHIP_FLAG_TPM2)) {
> > if (!tpm_chip_start(chip)) {
> > tpm2_shutdown(chip, TPM2_SU_CLEAR);
> > tpm_chip_stop(chip);
>
> I really don't think this is the right fix. The problem is that these
> two functions are trying to open code tpm_try_get_ops/tpm_put_ops (only
> really for the tpm2 shutdown) because they want to NULL out the ops
> before final mutex unlock. The problem with the current open coding is
> it doesn't shut down the clock if required (not really a problem for
> shutdown, but might cause issues for simple rmmod). I think this is
> because no-one noticed the open coding when get/put was updated.
>
> This code should all be abstracted into a single function and shared
> with tpm_try_get_ops/tpm_put_ops so we can't have this happen in
> future. Possibly there should be a chip shutdown function which is
> only active for TPM2 which does the correct try_get/shutdown/put
> operation and then a separate simple get mutex, null ops, put mutex one
> that's guaranteed to be called last.
Yes, went for a minimal patch here to stop kernel panics, didn't try to
refactor. Note that we do hold chip->ops_sem in both cases, and it's a
write-lock, not a read-lock (as tpm_try_get_ops uses) since we are
changing chip->ops. Thanks to this write-lock there, shouldn't be parallel
operations that use chip->ops (so not locking chip->tpm_mutex shouldn't
affect it).
So, if I understand the idea right, can refactor to something like:
1) extract common code between tpm_del_char_device and
tpm_class_shutdown into a shared method;
2) further extract the part between up/down(chip->ops_sem) to be
re-used between tpm_try_get_ops/tpm_put_ops and this flow;
3) still have down_write/up_write in this flow vs
get/put_device + down_read/up_read in tpm_try_get_ops case.
Please let me know if that's a bad idea.
Will be unavailable next week, but will continue after that.
>
> James
>
--
Andrey
On Fri, Jul 10, 2020 at 11:25:44AM -0700, Andrey Pronin wrote:
> > Why does not tpm_del_char_device need this?
>
> "Not" is a typo in the sentence above, right? tpm_del_char_device *does*
> need the fix. When tpm_class_shutdown is called it sets chip->ops to
> NULL. If tpm_del_char_device is called after that, it doesn't check if
> chip->ops is NULL (normal kernel API and char device API calls go
> through tpm_try_get_ops, but tpm_del_char_device doesn't) and proceeds to
> call tpm2_shutdown(), which tries sending the command and dereferences
> chip->ops.
It's a typo, yes. Sorry about that.
tpm_class_shutdown() is essentially tail of tpm_del_char_device().
To clean things up, I'd suggest dropping tpm_del_char_device() and
call tpm_class_shutdown() in tpm_chip_unregisters() along, and open
coding things that prepend it in tpm_del_char_device().
/Jarkko
On Tue, Jul 14, 2020 at 4:32 AM Jarkko Sakkinen
<[email protected]> wrote:
>
> On Fri, Jul 10, 2020 at 11:25:44AM -0700, Andrey Pronin wrote:
> > > Why does not tpm_del_char_device need this?
> >
> > "Not" is a typo in the sentence above, right? tpm_del_char_device *does*
> > need the fix. When tpm_class_shutdown is called it sets chip->ops to
> > NULL. If tpm_del_char_device is called after that, it doesn't check if
> > chip->ops is NULL (normal kernel API and char device API calls go
> > through tpm_try_get_ops, but tpm_del_char_device doesn't) and proceeds to
> > call tpm2_shutdown(), which tries sending the command and dereferences
> > chip->ops.
>
> It's a typo, yes. Sorry about that.
>
> tpm_class_shutdown() is essentially tail of tpm_del_char_device().
>
> To clean things up, I'd suggest dropping tpm_del_char_device() and
> call tpm_class_shutdown() in tpm_chip_unregisters() along, and open
> coding things that prepend it in tpm_del_char_device().
>
Personally I would have preferred two separate patches, one to fix the
immediate problem (with Cc: stable) and one for the cleanup, but I
guess merging both into one is ok as long as it is marked for stable.
Thanks,
Guenter
On Tue, Jul 14, 2020 at 08:48:38AM -0700, Guenter Roeck wrote:
> On Tue, Jul 14, 2020 at 4:32 AM Jarkko Sakkinen
> <[email protected]> wrote:
> >
> > On Fri, Jul 10, 2020 at 11:25:44AM -0700, Andrey Pronin wrote:
> > > > Why does not tpm_del_char_device need this?
> > >
> > > "Not" is a typo in the sentence above, right? tpm_del_char_device *does*
> > > need the fix. When tpm_class_shutdown is called it sets chip->ops to
> > > NULL. If tpm_del_char_device is called after that, it doesn't check if
> > > chip->ops is NULL (normal kernel API and char device API calls go
> > > through tpm_try_get_ops, but tpm_del_char_device doesn't) and proceeds to
> > > call tpm2_shutdown(), which tries sending the command and dereferences
> > > chip->ops.
> >
> > It's a typo, yes. Sorry about that.
> >
> > tpm_class_shutdown() is essentially tail of tpm_del_char_device().
> >
> > To clean things up, I'd suggest dropping tpm_del_char_device() and
> > call tpm_class_shutdown() in tpm_chip_unregisters() along, and open
> > coding things that prepend it in tpm_del_char_device().
> >
>
> Personally I would have preferred two separate patches, one to fix the
> immediate problem (with Cc: stable) and one for the cleanup, but I
> guess merging both into one is ok as long as it is marked for stable.
>
> Thanks,
> Guenter
Not sure about stable as this issue does not afaik concern earlier
kernel versions?
/Jarkko
On Thu, Jul 16, 2020 at 10:28 AM Jarkko Sakkinen
<[email protected]> wrote:
>
> On Tue, Jul 14, 2020 at 08:48:38AM -0700, Guenter Roeck wrote:
> > On Tue, Jul 14, 2020 at 4:32 AM Jarkko Sakkinen
> > <[email protected]> wrote:
> > >
> > > On Fri, Jul 10, 2020 at 11:25:44AM -0700, Andrey Pronin wrote:
> > > > > Why does not tpm_del_char_device need this?
> > > >
> > > > "Not" is a typo in the sentence above, right? tpm_del_char_device *does*
> > > > need the fix. When tpm_class_shutdown is called it sets chip->ops to
> > > > NULL. If tpm_del_char_device is called after that, it doesn't check if
> > > > chip->ops is NULL (normal kernel API and char device API calls go
> > > > through tpm_try_get_ops, but tpm_del_char_device doesn't) and proceeds to
> > > > call tpm2_shutdown(), which tries sending the command and dereferences
> > > > chip->ops.
> > >
> > > It's a typo, yes. Sorry about that.
> > >
> > > tpm_class_shutdown() is essentially tail of tpm_del_char_device().
> > >
> > > To clean things up, I'd suggest dropping tpm_del_char_device() and
> > > call tpm_class_shutdown() in tpm_chip_unregisters() along, and open
> > > coding things that prepend it in tpm_del_char_device().
> > >
> >
> > Personally I would have preferred two separate patches, one to fix the
> > immediate problem (with Cc: stable) and one for the cleanup, but I
> > guess merging both into one is ok as long as it is marked for stable.
> >
> > Thanks,
> > Guenter
>
> Not sure about stable as this issue does not afaik concern earlier
> kernel versions?
>
I just had a quick look into linux-5.4.y, and it seemed to me that it
is affected. Maybe I am wrong. Either case, we already applied this
patch to all affected ChromeOS kernel branches, so from our
perspective it doesn't really matter.
Thanks,
Guenter
On Thu, Jul 16, 2020 at 10:38:00AM -0700, Guenter Roeck wrote:
> On Thu, Jul 16, 2020 at 10:28 AM Jarkko Sakkinen
> <[email protected]> wrote:
> >
> > On Tue, Jul 14, 2020 at 08:48:38AM -0700, Guenter Roeck wrote:
> > > On Tue, Jul 14, 2020 at 4:32 AM Jarkko Sakkinen
> > > <[email protected]> wrote:
> > > >
> > > > On Fri, Jul 10, 2020 at 11:25:44AM -0700, Andrey Pronin wrote:
> > > > > > Why does not tpm_del_char_device need this?
> > > > >
> > > > > "Not" is a typo in the sentence above, right? tpm_del_char_device *does*
> > > > > need the fix. When tpm_class_shutdown is called it sets chip->ops to
> > > > > NULL. If tpm_del_char_device is called after that, it doesn't check if
> > > > > chip->ops is NULL (normal kernel API and char device API calls go
> > > > > through tpm_try_get_ops, but tpm_del_char_device doesn't) and proceeds to
> > > > > call tpm2_shutdown(), which tries sending the command and dereferences
> > > > > chip->ops.
> > > >
> > > > It's a typo, yes. Sorry about that.
> > > >
> > > > tpm_class_shutdown() is essentially tail of tpm_del_char_device().
> > > >
> > > > To clean things up, I'd suggest dropping tpm_del_char_device() and
> > > > call tpm_class_shutdown() in tpm_chip_unregisters() along, and open
> > > > coding things that prepend it in tpm_del_char_device().
> > > >
> > >
> > > Personally I would have preferred two separate patches, one to fix the
> > > immediate problem (with Cc: stable) and one for the cleanup, but I
> > > guess merging both into one is ok as long as it is marked for stable.
> > >
> > > Thanks,
> > > Guenter
> >
> > Not sure about stable as this issue does not afaik concern earlier
> > kernel versions?
> >
>
> I just had a quick look into linux-5.4.y, and it seemed to me that it
> is affected. Maybe I am wrong. Either case, we already applied this
> patch to all affected ChromeOS kernel branches, so from our
> perspective it doesn't really matter.
>
> Thanks,
> Guenter
I'm fine with cc'ing stable after consideration given the benefits.
Given that conclusion, it is better to break this down to two part
series as you proposed.
/Jarkko