2018-08-21 15:03:50

by Christophe Leroy

Subject: Re: BUG: libkcapi tests trigger sleep-in-atomic bug in VMX code (ppc64)



On 21/08/2018 at 16:38, Ondrej Mosnáček wrote:
> On Tue, 21 Aug 2018 at 16:18, Stephan Mueller <[email protected]> wrote:
>> On Tuesday, 21 August 2018, 14:48:11 CEST, Ondrej Mosnáček wrote:
>>
>> Hi Ondrej, Marcelo,
>>
>> (+Marcelo)
>>
>>> Looking at crypto/algif_skcipher.c, I can see that skcipher_recvmsg()
>>> holds the socket lock the whole time and yet passes
>>> CRYPTO_TFM_REQ_MAY_SLEEP to the cipher implementation. Isn't that
>>> wrong?
>>
>> I think you are referring to lock_sock(sk)?
>>
>> If so, this should not be the culprit: the socket lock is in essence a
>> mutex-like operation with its own wait queue, and it is allowed to sleep.
>> lock_sock_nested(), which lock_sock() calls, even contains a might_sleep()
>> call, which indicates that the caller may be put to sleep.
>>
>> Looking into the code (without too much debugging) I see in the function
>> p8_aes_cbc_encrypt that is part of the stack trace the call to
>> preempt_disable() which starts an atomic context. The preempt_enable() is
>> invoked after the walk operation.
>>
>> The preempt_disable increases the preempt_count. That counter is used by
>> in_atomic() to check whether we are in atomic context.
>>
>> The issue is that blkcipher_walk_done may call crypto_yield() which then
>> invokes cond_resched if the implementation is allowed to sleep.
>
> Indeed, you're right, the issue is actually in the vmx_crypto code. I
> remember having looked at the 'ctr(aes)' implementation in there a few
> days ago (I think I was trying to debug this very issue, but for some
> reason I only looked at ctr(aes)...) and I didn't find any bug, so
> that's why I jumped to suspecting the algif_skcipher code... I should
> have double-checked :)
>
> It turns out the 'cbc(aes)' (and actually also 'xts(aes)')
> implementation is coded a bit differently and they both *do* contain
> the sleep-in-atomic bug. I will try to fix them according to the
> correct CTR implementation and send a patch.

CC: [email protected] <[email protected]>

>
> Thanks,
> Ondrej
>
>> @Marcelo: shouldn't the sleep flag be cleared when entering the
>> preempt_disable section?
>>
>> Ciao
>> Stephan
>>
>>


2018-08-21 15:12:00

by Marcelo Henrique Cerri

Subject: Re: BUG: libkcapi tests trigger sleep-in-atomic bug in VMX code (ppc64)

CC: Paulo Flabiano Smorigo <[email protected]>

Yes, I do believe that CTR is doing it right. Preemption only needs to be
disabled during the aes_p8_cbc_encrypt() call, to avoid trashing the
VSX registers during the AES operation.
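
To make the difference concrete, here is a minimal userspace model of the
two loop shapes discussed in this thread. It is only a sketch, not the
vmx_crypto code: model_preempt_disable(), model_preempt_enable(),
model_cipher_step(), model_walk_done(), buggy_walk() and fixed_walk() are
hypothetical stand-ins invented for illustration, and the counter merely
imitates how a non-zero preempt_count marks an atomic context.

```c
/* Hypothetical userspace model; none of these are real kernel functions. */
static int model_preempt_count;   /* imitates the kernel's preempt_count */
static int model_atomic_sleeps;   /* counts "sleep while atomic" events  */

static void model_preempt_disable(void) { model_preempt_count++; }
static void model_preempt_enable(void)  { model_preempt_count--; }

/* Stand-in for the cipher call that must not be preempted. */
static void model_cipher_step(void) { }

/*
 * Stand-in for the walk-done step, which may yield the CPU when the
 * request is allowed to sleep. Yielding with a non-zero count is the bug.
 */
static void model_walk_done(void)
{
    if (model_preempt_count > 0)
        model_atomic_sleeps++;
}

/* Shape described for CBC/XTS: preemption stays off across the whole walk. */
static int buggy_walk(int nsteps)
{
    model_atomic_sleeps = 0;
    model_preempt_disable();
    for (int i = 0; i < nsteps; i++) {
        model_cipher_step();
        model_walk_done();        /* may sleep inside the atomic section */
    }
    model_preempt_enable();
    return model_atomic_sleeps;
}

/* Shape described for CTR: preemption is off only around the cipher call. */
static int fixed_walk(int nsteps)
{
    model_atomic_sleeps = 0;
    for (int i = 0; i < nsteps; i++) {
        model_preempt_disable();
        model_cipher_step();
        model_preempt_enable();
        model_walk_done();        /* atomic section already closed */
    }
    return model_atomic_sleeps;
}
```

Under these assumptions, buggy_walk(3) counts three would-be sleeps in
atomic context, while fixed_walk(3) counts none.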

--
Regards,
Marcelo

On Tue, Aug 21, 2018 at 05:03:50PM +0200, Christophe LEROY wrote:
>
>
> On 21/08/2018 at 16:38, Ondrej Mosnáček wrote:
> > On Tue, 21 Aug 2018 at 16:18, Stephan Mueller <[email protected]> wrote:
> > > On Tuesday, 21 August 2018, 14:48:11 CEST, Ondrej Mosnáček wrote:
> > >
> > > Hi Ondrej, Marcelo,
> > >
> > > (+Marcelo)
> > >
> > > > Looking at crypto/algif_skcipher.c, I can see that skcipher_recvmsg()
> > > > holds the socket lock the whole time and yet passes
> > > > CRYPTO_TFM_REQ_MAY_SLEEP to the cipher implementation. Isn't that
> > > > wrong?
> > >
> > > I think you are referring to lock_sock(sk)?
> > >
> > > If so, this should not be the culprit: the socket lock is in essence a
> > > mutex-like operation with its own wait queue, and it is allowed to sleep.
> > > lock_sock_nested(), which lock_sock() calls, even contains a might_sleep()
> > > call, which indicates that the caller may be put to sleep.
> > >
> > > Looking into the code (without too much debugging) I see in the function
> > > p8_aes_cbc_encrypt that is part of the stack trace the call to
> > > preempt_disable() which starts an atomic context. The preempt_enable() is
> > > invoked after the walk operation.
> > >
> > > The preempt_disable increases the preempt_count. That counter is used by
> > > in_atomic() to check whether we are in atomic context.
> > >
> > > The issue is that blkcipher_walk_done may call crypto_yield() which then
> > > invokes cond_resched if the implementation is allowed to sleep.
> >
> > Indeed, you're right, the issue is actually in the vmx_crypto code. I
> > remember having looked at the 'ctr(aes)' implementation in there a few
> > days ago (I think I was trying to debug this very issue, but for some
> > reason I only looked at ctr(aes)...) and I didn't find any bug, so
> > that's why I jumped to suspecting the algif_skcipher code... I should
> > have double-checked :)
> >
> > It turns out the 'cbc(aes)' (and actually also 'xts(aes)')
> > implementation is coded a bit differently and they both *do* contain
> > the sleep-in-atomic bug. I will try to fix them according to the
> > correct CTR implementation and send a patch.
>
> CC: [email protected] <[email protected]>
>
> >
> > Thanks,
> > Ondrej
> >
> > > @Marcelo: shouldn't the sleep flag be cleared when entering the
> > > preempt_disable section?
> > >
> > > Ciao
> > > Stephan
> > >
> > >

