2014-04-25 13:00:01

by Dmitry Kasatkin

[permalink] [raw]
Subject: Kernel panic at Ubuntu: IMA + Apparmor

Hello,

I discovered a kernel panic on system running Ubuntu when IMA is enabled.
It happens on reboot.

----------------------
[ 106.750100] NSPROXY is NULL: error.log (/var/log/mysql/error.log)
[ 106.750167] BUG: unable to handle kernel NULL pointer dereference at
0000000000000018
[ 106.750221] IP: [<ffffffff811ec7da>] our_mnt+0x1a/0x30
[ 106.750241] PGD 0
[ 106.750254] Oops: 0000 [#1] SMP
[ 106.750272] Modules linked in: cuse parport_pc ppdev bnep rfcomm
bluetooth rpcsec_gss_krb5 nfsd auth_rpcgss nfs_acl nfs lockd sunrpc
fscache dm_crypt intel_rapl x86_pkg_temp_thermal intel_powerclamp
kvm_intel snd_hda_codec_hdmi kvm crct10dif_pclmul crc32_pclmul
ghash_clmulni_intel aesni_intel aes_x86_64 glue_helper lrw gf128mul
ablk_helper cryptd snd_hda_codec_realtek dcdbas snd_hda_intel
snd_hda_codec snd_hwdep snd_pcm snd_page_alloc snd_seq_midi
snd_seq_midi_event snd_rawmidi psmouse snd_seq microcode serio_raw
snd_timer snd_seq_device snd soundcore video lpc_ich coretemp mac_hid lp
parport mei_me mei nbd hid_generic e1000e usbhid ahci ptp hid libahci
pps_core
[ 106.750658] CPU: 6 PID: 1394 Comm: mysqld Not tainted 3.13.0-rc7-kds+ #15
[ 106.750673] Hardware name: Dell Inc. OptiPlex 9010/0M9KCM, BIOS A08
09/19/2012
[ 106.750689] task: ffff8800de804920 ti: ffff880400fca000 task.ti:
ffff880400fca000
[ 106.750704] RIP: 0010:[<ffffffff811ec7da>] [<ffffffff811ec7da>]
our_mnt+0x1a/0x30
[ 106.750725] RSP: 0018:ffff880400fcba60 EFLAGS: 00010286
[ 106.750738] RAX: 0000000000000000 RBX: 0000000000000100 RCX:
ffff8800d51523e7
[ 106.750764] RDX: ffffffffffffffea RSI: ffff880400fcba34 RDI:
ffff880402d20020
[ 106.750791] RBP: ffff880400fcbae0 R08: 0000000000000000 R09:
0000000000000001
[ 106.750817] R10: 0000000000000000 R11: 0000000000000001 R12:
ffff8800d5152300
[ 106.750844] R13: ffff8803eb8df510 R14: ffff880400fcbb28 R15:
ffff8800d51523e7
[ 106.750871] FS: 0000000000000000(0000) GS:ffff88040d200000(0000)
knlGS:0000000000000000
[ 106.750910] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 106.750935] CR2: 0000000000000018 CR3: 0000000001c0e000 CR4:
00000000001407e0
[ 106.750962] Stack:
[ 106.750981] ffffffff813434eb ffff880400fcbb20 ffff880400fcbb18
0000000000000000
[ 106.751037] ffff8800de804920 ffffffff8101b9b9 0001800000000000
0000000000000100
[ 106.751093] 0000010000000000 0000000000000002 000000000000000e
ffff8803eb8df500
[ 106.751149] Call Trace:
[ 106.751172] [<ffffffff813434eb>] ? aa_path_name+0x2ab/0x430
[ 106.751199] [<ffffffff8101b9b9>] ? sched_clock+0x9/0x10
[ 106.751225] [<ffffffff8134a68d>] aa_path_perm+0x7d/0x170
[ 106.751250] [<ffffffff8101b945>] ? native_sched_clock+0x15/0x80
[ 106.751276] [<ffffffff8134aa73>] aa_file_perm+0x33/0x40
[ 106.751301] [<ffffffff81348c5e>] common_file_perm+0x8e/0xb0
[ 106.751327] [<ffffffff81348d78>] apparmor_file_permission+0x18/0x20
[ 106.751355] [<ffffffff8130c853>] security_file_permission+0x23/0xa0
[ 106.751382] [<ffffffff811c77a2>] rw_verify_area+0x52/0xe0
[ 106.751407] [<ffffffff811c789d>] vfs_read+0x6d/0x170
[ 106.751432] [<ffffffff811cda31>] kernel_read+0x41/0x60
[ 106.751457] [<ffffffff8134fd45>] ima_calc_file_hash+0x225/0x280
[ 106.751483] [<ffffffff8134fb52>] ? ima_calc_file_hash+0x32/0x280
[ 106.751509] [<ffffffff8135022d>] ima_collect_measurement+0x9d/0x160
[ 106.751536] [<ffffffff810b552d>] ? trace_hardirqs_on+0xd/0x10
[ 106.751562] [<ffffffff8134f07c>] ? ima_file_free+0x6c/0xd0
[ 106.751587] [<ffffffff81352824>] ima_update_xattr+0x34/0x60
[ 106.751612] [<ffffffff8134f0d0>] ima_file_free+0xc0/0xd0
[ 106.751637] [<ffffffff811c9635>] __fput+0xd5/0x300
[ 106.751662] [<ffffffff811c98ae>] ____fput+0xe/0x10
[ 106.751687] [<ffffffff81086774>] task_work_run+0xc4/0xe0
[ 106.751712] [<ffffffff81066fad>] do_exit+0x2bd/0xa90
[ 106.751738] [<ffffffff8173c958>] ? retint_swapgs+0x13/0x1b
[ 106.751763] [<ffffffff8106780c>] do_group_exit+0x4c/0xc0
[ 106.751788] [<ffffffff81067894>] SyS_exit_group+0x14/0x20
[ 106.751814] [<ffffffff8174522d>] system_call_fastpath+0x1a/0x1f
[ 106.751839] Code: c3 0f 1f 44 00 00 55 48 89 e5 e8 22 fe ff ff 5d c3
0f 1f 44 00 00 55 65 48 8b 04 25 c0 c9 00 00 48 8b 80 28 06 00 00 48 89
e5 5d <48> 8b 40 18 48 39 87 c0 00 00 00 0f 94 c0 c3 0f 1f 80 00 00 00
[ 106.752185] RIP [<ffffffff811ec7da>] our_mnt+0x1a/0x30
[ 106.752214] RSP <ffff880400fcba60>
[ 106.752236] CR2: 0000000000000018
[ 106.752258] ---[ end trace 3c520748b4732721 ]---
[ 106.752282] Fixing recursive fault but reboot is needed!

----------------------------

It happens in "our_mnt", because current->nsproxy is NULL...

It looks like "current->nsproxy" is gone before mysqld closes error.log,
or it is always NULL for mysqld???


Does anyone have any ideas?

Thanks,
Dmitry


2014-04-25 14:47:42

by Dmitry Kasatkin

[permalink] [raw]
Subject: Re: Kernel panic at Ubuntu: IMA + Apparmor

On 25/04/14 16:00, Dmitry Kasatkin wrote:
> Hello,
>
> I discovered a kernel panic on system running Ubuntu when IMA is enabled.
> It happens on reboot.
>
> ----------------------
> [ 106.750100] NSPROXY is NULL: error.log (/var/log/mysql/error.log)
> [ 106.750167] BUG: unable to handle kernel NULL pointer dereference at
> 0000000000000018
> [ 106.750221] IP: [<ffffffff811ec7da>] our_mnt+0x1a/0x30
> [ 106.750241] PGD 0
> [ 106.750254] Oops: 0000 [#1] SMP
> [ 106.750272] Modules linked in: cuse parport_pc ppdev bnep rfcomm
> bluetooth rpcsec_gss_krb5 nfsd auth_rpcgss nfs_acl nfs lockd sunrpc
> fscache dm_crypt intel_rapl x86_pkg_temp_thermal intel_powerclamp
> kvm_intel snd_hda_codec_hdmi kvm crct10dif_pclmul crc32_pclmul
> ghash_clmulni_intel aesni_intel aes_x86_64 glue_helper lrw gf128mul
> ablk_helper cryptd snd_hda_codec_realtek dcdbas snd_hda_intel
> snd_hda_codec snd_hwdep snd_pcm snd_page_alloc snd_seq_midi
> snd_seq_midi_event snd_rawmidi psmouse snd_seq microcode serio_raw
> snd_timer snd_seq_device snd soundcore video lpc_ich coretemp mac_hid lp
> parport mei_me mei nbd hid_generic e1000e usbhid ahci ptp hid libahci
> pps_core
> [ 106.750658] CPU: 6 PID: 1394 Comm: mysqld Not tainted 3.13.0-rc7-kds+ #15
> [ 106.750673] Hardware name: Dell Inc. OptiPlex 9010/0M9KCM, BIOS A08
> 09/19/2012
> [ 106.750689] task: ffff8800de804920 ti: ffff880400fca000 task.ti:
> ffff880400fca000
> [ 106.750704] RIP: 0010:[<ffffffff811ec7da>] [<ffffffff811ec7da>]
> our_mnt+0x1a/0x30
> [ 106.750725] RSP: 0018:ffff880400fcba60 EFLAGS: 00010286
> [ 106.750738] RAX: 0000000000000000 RBX: 0000000000000100 RCX:
> ffff8800d51523e7
> [ 106.750764] RDX: ffffffffffffffea RSI: ffff880400fcba34 RDI:
> ffff880402d20020
> [ 106.750791] RBP: ffff880400fcbae0 R08: 0000000000000000 R09:
> 0000000000000001
> [ 106.750817] R10: 0000000000000000 R11: 0000000000000001 R12:
> ffff8800d5152300
> [ 106.750844] R13: ffff8803eb8df510 R14: ffff880400fcbb28 R15:
> ffff8800d51523e7
> [ 106.750871] FS: 0000000000000000(0000) GS:ffff88040d200000(0000)
> knlGS:0000000000000000
> [ 106.750910] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 106.750935] CR2: 0000000000000018 CR3: 0000000001c0e000 CR4:
> 00000000001407e0
> [ 106.750962] Stack:
> [ 106.750981] ffffffff813434eb ffff880400fcbb20 ffff880400fcbb18
> 0000000000000000
> [ 106.751037] ffff8800de804920 ffffffff8101b9b9 0001800000000000
> 0000000000000100
> [ 106.751093] 0000010000000000 0000000000000002 000000000000000e
> ffff8803eb8df500
> [ 106.751149] Call Trace:
> [ 106.751172] [<ffffffff813434eb>] ? aa_path_name+0x2ab/0x430
> [ 106.751199] [<ffffffff8101b9b9>] ? sched_clock+0x9/0x10
> [ 106.751225] [<ffffffff8134a68d>] aa_path_perm+0x7d/0x170
> [ 106.751250] [<ffffffff8101b945>] ? native_sched_clock+0x15/0x80
> [ 106.751276] [<ffffffff8134aa73>] aa_file_perm+0x33/0x40
> [ 106.751301] [<ffffffff81348c5e>] common_file_perm+0x8e/0xb0
> [ 106.751327] [<ffffffff81348d78>] apparmor_file_permission+0x18/0x20
> [ 106.751355] [<ffffffff8130c853>] security_file_permission+0x23/0xa0
> [ 106.751382] [<ffffffff811c77a2>] rw_verify_area+0x52/0xe0
> [ 106.751407] [<ffffffff811c789d>] vfs_read+0x6d/0x170
> [ 106.751432] [<ffffffff811cda31>] kernel_read+0x41/0x60
> [ 106.751457] [<ffffffff8134fd45>] ima_calc_file_hash+0x225/0x280
> [ 106.751483] [<ffffffff8134fb52>] ? ima_calc_file_hash+0x32/0x280
> [ 106.751509] [<ffffffff8135022d>] ima_collect_measurement+0x9d/0x160
> [ 106.751536] [<ffffffff810b552d>] ? trace_hardirqs_on+0xd/0x10
> [ 106.751562] [<ffffffff8134f07c>] ? ima_file_free+0x6c/0xd0
> [ 106.751587] [<ffffffff81352824>] ima_update_xattr+0x34/0x60
> [ 106.751612] [<ffffffff8134f0d0>] ima_file_free+0xc0/0xd0
> [ 106.751637] [<ffffffff811c9635>] __fput+0xd5/0x300
> [ 106.751662] [<ffffffff811c98ae>] ____fput+0xe/0x10
> [ 106.751687] [<ffffffff81086774>] task_work_run+0xc4/0xe0
> [ 106.751712] [<ffffffff81066fad>] do_exit+0x2bd/0xa90
> [ 106.751738] [<ffffffff8173c958>] ? retint_swapgs+0x13/0x1b
> [ 106.751763] [<ffffffff8106780c>] do_group_exit+0x4c/0xc0
> [ 106.751788] [<ffffffff81067894>] SyS_exit_group+0x14/0x20
> [ 106.751814] [<ffffffff8174522d>] system_call_fastpath+0x1a/0x1f
> [ 106.751839] Code: c3 0f 1f 44 00 00 55 48 89 e5 e8 22 fe ff ff 5d c3
> 0f 1f 44 00 00 55 65 48 8b 04 25 c0 c9 00 00 48 8b 80 28 06 00 00 48 89
> e5 5d <48> 8b 40 18 48 39 87 c0 00 00 00 0f 94 c0 c3 0f 1f 80 00 00 00
> [ 106.752185] RIP [<ffffffff811ec7da>] our_mnt+0x1a/0x30
> [ 106.752214] RSP <ffff880400fcba60>
> [ 106.752236] CR2: 0000000000000018
> [ 106.752258] ---[ end trace 3c520748b4732721 ]---
> [ 106.752282] Fixing recursive fault but reboot is needed!
>
> ----------------------------
>
> It happens in "our_mnt", because current->nsproxy is NULL...
>
> It looks like "current->nsproxy" is gone before mysqld closes error.log,
> or it is always NULL for mysqld???
>
>
> Does anyone have any ideas?
>
> Thanks,
> Dmitry
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>


It seems the problem is the order of functions in do_exit

do_exit()
{
...
exit_task_namespaces(tsk);
exit_task_work(tsk);
...
}

First, namesspaces is cleaned up.
Second, delayed fput is done.

It seems this patch appeared in 3.10 is the reason for the panic:

8aac62706 move exit_task_namespaces() outside of exit_notify()

This patch moved exit_task_namespaces(tsk) before exit_task_work..

Oleg, Al, any thoughts?

Thanks,

- Dmitry

2014-04-25 18:23:26

by Oleg Nesterov

[permalink] [raw]
Subject: Re: Kernel panic at Ubuntu: IMA + Apparmor

On 04/25, Dmitry Kasatkin wrote:
>
> On 25/04/14 16:00, Dmitry Kasatkin wrote:
> > Hello,
> >
> > I discovered a kernel panic on system running Ubuntu when IMA is enabled.
> > It happens on reboot.
> >
> > ----------------------
> > [ 106.750100] NSPROXY is NULL: error.log (/var/log/mysql/error.log)
> > [ 106.750167] BUG: unable to handle kernel NULL pointer dereference at
> > 0000000000000018
> > [ 106.750221] IP: [<ffffffff811ec7da>] our_mnt+0x1a/0x30
...
> > [ 106.751149] Call Trace:
> > [ 106.751172] [<ffffffff813434eb>] ? aa_path_name+0x2ab/0x430
> > [ 106.751199] [<ffffffff8101b9b9>] ? sched_clock+0x9/0x10
> > [ 106.751225] [<ffffffff8134a68d>] aa_path_perm+0x7d/0x170
> > [ 106.751250] [<ffffffff8101b945>] ? native_sched_clock+0x15/0x80
> > [ 106.751276] [<ffffffff8134aa73>] aa_file_perm+0x33/0x40
> > [ 106.751301] [<ffffffff81348c5e>] common_file_perm+0x8e/0xb0
> > [ 106.751327] [<ffffffff81348d78>] apparmor_file_permission+0x18/0x20
> > [ 106.751355] [<ffffffff8130c853>] security_file_permission+0x23/0xa0
> > [ 106.751382] [<ffffffff811c77a2>] rw_verify_area+0x52/0xe0
> > [ 106.751407] [<ffffffff811c789d>] vfs_read+0x6d/0x170
> > [ 106.751432] [<ffffffff811cda31>] kernel_read+0x41/0x60
> > [ 106.751457] [<ffffffff8134fd45>] ima_calc_file_hash+0x225/0x280
> > [ 106.751483] [<ffffffff8134fb52>] ? ima_calc_file_hash+0x32/0x280
> > [ 106.751509] [<ffffffff8135022d>] ima_collect_measurement+0x9d/0x160
> > [ 106.751536] [<ffffffff810b552d>] ? trace_hardirqs_on+0xd/0x10
> > [ 106.751562] [<ffffffff8134f07c>] ? ima_file_free+0x6c/0xd0
> > [ 106.751587] [<ffffffff81352824>] ima_update_xattr+0x34/0x60
> > [ 106.751612] [<ffffffff8134f0d0>] ima_file_free+0xc0/0xd0
> > [ 106.751637] [<ffffffff811c9635>] __fput+0xd5/0x300

fantastic ;)

> > [ 106.751662] [<ffffffff811c98ae>] ____fput+0xe/0x10
> > [ 106.751687] [<ffffffff81086774>] task_work_run+0xc4/0xe0
> > [ 106.751712] [<ffffffff81066fad>] do_exit+0x2bd/0xa90
> > [ 106.751738] [<ffffffff8173c958>] ? retint_swapgs+0x13/0x1b
> > [ 106.751763] [<ffffffff8106780c>] do_group_exit+0x4c/0xc0
...
> It seems the problem is the order of functions in do_exit
>
> do_exit()
> {
> ...
> exit_task_namespaces(tsk);
> exit_task_work(tsk);

Yes.

Eric, this makes me think again that we should do exit_task_namespaces()
after exit_task_work(). We already discussed this before, but this looks
like another indication this change makes sense.

The problem with fput() from free_nsproxy() was hopefully also fixed by
e7b2c4069252. The main motivation for "move" was "outside of exit_notify".
Even if we fix the paths above task_work_add() can have another user which
wants ->nsproxy.

What do you think?

Oleg.

2014-04-25 19:05:42

by Eric W. Biederman

[permalink] [raw]
Subject: Re: Kernel panic at Ubuntu: IMA + Apparmor

Oleg Nesterov <[email protected]> writes:

> On 04/25, Dmitry Kasatkin wrote:
>>
>> On 25/04/14 16:00, Dmitry Kasatkin wrote:
>> > Hello,
>> >
>> > I discovered a kernel panic on system running Ubuntu when IMA is enabled.
>> > It happens on reboot.
>> >
>> > ----------------------
>> > [ 106.750100] NSPROXY is NULL: error.log (/var/log/mysql/error.log)
>> > [ 106.750167] BUG: unable to handle kernel NULL pointer dereference at
>> > 0000000000000018
>> > [ 106.750221] IP: [<ffffffff811ec7da>] our_mnt+0x1a/0x30
> ...
>> > [ 106.751149] Call Trace:
>> > [ 106.751172] [<ffffffff813434eb>] ? aa_path_name+0x2ab/0x430
>> > [ 106.751199] [<ffffffff8101b9b9>] ? sched_clock+0x9/0x10
>> > [ 106.751225] [<ffffffff8134a68d>] aa_path_perm+0x7d/0x170
>> > [ 106.751250] [<ffffffff8101b945>] ? native_sched_clock+0x15/0x80
>> > [ 106.751276] [<ffffffff8134aa73>] aa_file_perm+0x33/0x40
>> > [ 106.751301] [<ffffffff81348c5e>] common_file_perm+0x8e/0xb0
>> > [ 106.751327] [<ffffffff81348d78>] apparmor_file_permission+0x18/0x20
>> > [ 106.751355] [<ffffffff8130c853>] security_file_permission+0x23/0xa0
>> > [ 106.751382] [<ffffffff811c77a2>] rw_verify_area+0x52/0xe0
>> > [ 106.751407] [<ffffffff811c789d>] vfs_read+0x6d/0x170
>> > [ 106.751432] [<ffffffff811cda31>] kernel_read+0x41/0x60
>> > [ 106.751457] [<ffffffff8134fd45>] ima_calc_file_hash+0x225/0x280
>> > [ 106.751483] [<ffffffff8134fb52>] ? ima_calc_file_hash+0x32/0x280
>> > [ 106.751509] [<ffffffff8135022d>] ima_collect_measurement+0x9d/0x160
>> > [ 106.751536] [<ffffffff810b552d>] ? trace_hardirqs_on+0xd/0x10
>> > [ 106.751562] [<ffffffff8134f07c>] ? ima_file_free+0x6c/0xd0
>> > [ 106.751587] [<ffffffff81352824>] ima_update_xattr+0x34/0x60
>> > [ 106.751612] [<ffffffff8134f0d0>] ima_file_free+0xc0/0xd0
>> > [ 106.751637] [<ffffffff811c9635>] __fput+0xd5/0x300
>
> fantastic ;)
>
>> > [ 106.751662] [<ffffffff811c98ae>] ____fput+0xe/0x10
>> > [ 106.751687] [<ffffffff81086774>] task_work_run+0xc4/0xe0
>> > [ 106.751712] [<ffffffff81066fad>] do_exit+0x2bd/0xa90
>> > [ 106.751738] [<ffffffff8173c958>] ? retint_swapgs+0x13/0x1b
>> > [ 106.751763] [<ffffffff8106780c>] do_group_exit+0x4c/0xc0
> ...
>> It seems the problem is the order of functions in do_exit
>>
>> do_exit()
>> {
>> ...
>> exit_task_namespaces(tsk);
>> exit_task_work(tsk);
>
> Yes.
>
> Eric, this makes me think again that we should do exit_task_namespaces()
> after exit_task_work(). We already discussed this before, but this looks
> like another indication this change makes sense.

I know you mentioned something about that. I haven't actually had much
time to think about it.

> The problem with fput() from free_nsproxy() was hopefully also fixed by
> e7b2c4069252. The main motivation for "move" was "outside of exit_notify".
> Even if we fix the paths above task_work_add() can have another user which
> wants ->nsproxy.
>
> What do you think?

I am scratching my head. Delayed work that depends on current sort of
blows my mind.

As I read this it is coming from __fput. Hmm.

So I think the blinkered thing is having a security_file_permission
inside of kernel_read. That is utter nonsense. At least for a
kernel_read we are doing from ima.

apparmor should not be trying to limit ima.

Show me a legitimate bug caused by the ordering and we can discuss
reordering.


Looking at kernel_read it is used elsewhere by the binfmt routines, and
doing permissions checks there on the part of the user that is doing
exec seems appropriate.

So I think the deep bug here is ima is using kernel_read which is
inappropriate for what it is trying to do. Permission checks on an
integrity mechanism?

Even if we ignore insane the security_file_permission in rw_verify_area
we still have mandatory locks that could cause this to fail. Which
looks like an evil user could bypass ima by simply setting a mandatory
lock on a file and not letting anyone else read it. Which makes
vfs_read and thus kernel_read wildly inappropriate for what ima is
trying to do.

Eric

2014-04-25 19:25:58

by Oleg Nesterov

[permalink] [raw]
Subject: Re: Kernel panic at Ubuntu: IMA + Apparmor

On 04/25, Eric W. Biederman wrote:
>
> Oleg Nesterov <[email protected]> writes:
>
> > Eric, this makes me think again that we should do exit_task_namespaces()
> > after exit_task_work(). We already discussed this before, but this looks
> > like another indication this change makes sense.
>
> I know you mentioned something about that. I haven't actually had much
> time to think about it.
>
> > The problem with fput() from free_nsproxy() was hopefully also fixed by
> > e7b2c4069252. The main motivation for "move" was "outside of exit_notify".
> > Even if we fix the paths above task_work_add() can have another user which
> > wants ->nsproxy.
> >
> > What do you think?
>
> I am scratching my head. Delayed work that depends on current sort of
> blows my mind.

But task_work_add(task) was specially introduced to run a callback in the
task's context.

> That is utter nonsense.

Yes I agree, _perhaps_ we can fix this particular problem without changing
the exit_namespace/work ordering, and perhaps this makes sense anyway.

Well. I _think_ that __fput() and ima_file_free() in particular should not
depend on current and/or current->nsproxy. If nothing else, fput() can be
called by the unrelated task which looks into /proc/pid/.

But again, task_work_add() has more and more users, and it seems that even
__fput() paths can do "everything", so perhaps it would be safer to allow
to use ->nsproxy in task_work_run.

Oleg.

2014-04-25 19:41:08

by Eric W. Biederman

[permalink] [raw]
Subject: Re: Kernel panic at Ubuntu: IMA + Apparmor

Oleg Nesterov <[email protected]> writes:

> On 04/25, Eric W. Biederman wrote:
>>
>> Oleg Nesterov <[email protected]> writes:
>>
>> > Eric, this makes me think again that we should do exit_task_namespaces()
>> > after exit_task_work(). We already discussed this before, but this looks
>> > like another indication this change makes sense.
>>
>> I know you mentioned something about that. I haven't actually had much
>> time to think about it.
>>
>> > The problem with fput() from free_nsproxy() was hopefully also fixed by
>> > e7b2c4069252. The main motivation for "move" was "outside of exit_notify".
>> > Even if we fix the paths above task_work_add() can have another user which
>> > wants ->nsproxy.
>> >
>> > What do you think?
>>
>> I am scratching my head. Delayed work that depends on current sort of
>> blows my mind.
>
> But task_work_add(task) was specially introduced to run a callback in the
> task's context.
>
>> That is utter nonsense.
>
> Yes I agree, _perhaps_ we can fix this particular problem without changing
> the exit_namespace/work ordering, and perhaps this makes sense anyway.
>
> Well. I _think_ that __fput() and ima_file_free() in particular should not
> depend on current and/or current->nsproxy. If nothing else, fput() can be
> called by the unrelated task which looks into /proc/pid/.
>
> But again, task_work_add() has more and more users, and it seems that even
> __fput() paths can do "everything", so perhaps it would be safer to allow
> to use ->nsproxy in task_work_run.

Like I said, give me a clear motivating case. Right now not allowing
nsproxy is turning up bugs in __fput. Which seems like a good thing.

Eric

2014-04-25 20:02:18

by Oleg Nesterov

[permalink] [raw]
Subject: Re: Kernel panic at Ubuntu: IMA + Apparmor

On 04/25, Eric W. Biederman wrote:
>
> Oleg Nesterov <[email protected]> writes:
>
> > Well. I _think_ that __fput() and ima_file_free() in particular should not
> > depend on current and/or current->nsproxy. If nothing else, fput() can be
> > called by the unrelated task which looks into /proc/pid/.
> >
> > But again, task_work_add() has more and more users, and it seems that even
> > __fput() paths can do "everything", so perhaps it would be safer to allow
> > to use ->nsproxy in task_work_run.
>
> Like I said, give me a clear motivating case.

I agree, we need a reason. Currently I do not see one.

> Right now not allowing
> nsproxy is turning up bugs in __fput. Which seems like a good thing.

This is what I certainly agree with ;)

Oleg.

2014-04-25 20:20:58

by Dmitry Kasatkin

[permalink] [raw]
Subject: Re: Kernel panic at Ubuntu: IMA + Apparmor

On 25 April 2014 23:01, Oleg Nesterov <[email protected]> wrote:
> On 04/25, Eric W. Biederman wrote:
>>
>> Oleg Nesterov <[email protected]> writes:
>>
>> > Well. I _think_ that __fput() and ima_file_free() in particular should not
>> > depend on current and/or current->nsproxy. If nothing else, fput() can be
>> > called by the unrelated task which looks into /proc/pid/.
>> >
>> > But again, task_work_add() has more and more users, and it seems that even
>> > __fput() paths can do "everything", so perhaps it would be safer to allow
>> > to use ->nsproxy in task_work_run.
>>
>> Like I said, give me a clear motivating case.
>
> I agree, we need a reason. Currently I do not see one.
>
>> Right now not allowing
>> nsproxy is turning up bugs in __fput. Which seems like a good thing.
>
> This is what I certainly agree with ;)
>

Hi,

IMA uses kernel_read API which does not know anything about caller.
And of course security frameworks are at guard as usual.

Exactly after reading first Eric's respons, I thought why to scratch
the head when task work queues are indeed designed for tasks...

And if you to dig for the history, IMA-appraisal was stuck due to
lockdep reporting even though it was on non-everlaping cases.
IIRC files vs. directories...

After that IIRC Al Viro discussed about delayed fput and IIRC Oleg
(sorry if I am wrong) introduced task work queues.

So IMA-appraisal was able to be upstreamed... That was ~3.4 time frame, IIRC

Name space also dated around ~3.4??
Apparmor namespace change was also around that time.

3.10 introduces this name space order change and broke IMA-appraisal.

Isn't it a clear motivating case???

- Dmitry


> Oleg.
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html



--
Thanks,
Dmitry

2014-04-25 20:45:58

by Eric W. Biederman

[permalink] [raw]
Subject: Re: Kernel panic at Ubuntu: IMA + Apparmor

Dmitry Kasatkin <[email protected]> writes:

> On 25 April 2014 23:01, Oleg Nesterov <[email protected]> wrote:
>> On 04/25, Eric W. Biederman wrote:
>>>
>>> Oleg Nesterov <[email protected]> writes:
>>>
>>> > Well. I _think_ that __fput() and ima_file_free() in particular should not
>>> > depend on current and/or current->nsproxy. If nothing else, fput() can be
>>> > called by the unrelated task which looks into /proc/pid/.
>>> >
>>> > But again, task_work_add() has more and more users, and it seems that even
>>> > __fput() paths can do "everything", so perhaps it would be safer to allow
>>> > to use ->nsproxy in task_work_run.
>>>
>>> Like I said, give me a clear motivating case.
>>
>> I agree, we need a reason. Currently I do not see one.
>>
>>> Right now not allowing
>>> nsproxy is turning up bugs in __fput. Which seems like a good thing.
>>
>> This is what I certainly agree with ;)
>>
>
> Hi,
>
> IMA uses kernel_read API which does not know anything about caller.
> And of course security frameworks are at guard as usual.
>
> Exactly after reading first Eric's respons, I thought why to scratch
> the head when task work queues are indeed designed for tasks...

__fput has no guarantee of running in the task that close the file
descriptor. If your code depends on that your code is broken.

> And if you to dig for the history, IMA-appraisal was stuck due to
> lockdep reporting even though it was on non-everlaping cases.
> IIRC files vs. directories...
>
> After that IIRC Al Viro discussed about delayed fput and IIRC Oleg
> (sorry if I am wrong) introduced task work queues.
>
> So IMA-appraisal was able to be upstreamed... That was ~3.4 time frame, IIRC
>
> Name space also dated around ~3.4??
> Apparmor namespace change was also around that time.
>
> 3.10 introduces this name space order change and broke IMA-appraisal.

IMA-appraisal is fundamentally broken because I can take a mandatory
file lock and prevent IMA-apprasial.

Using kernel_read is what allows this.

> Isn't it a clear motivating case???

kernel_read is not appropriate for IMA use. The rest of this is just
the messenger.

IMA needs to use a cousin of kernel_read that operates at a lower level
than vfs_read. A function that all of the permission checks and the
fsnotify work.

I am sorry to be the bearer of bad news. But kernel_read is totally
inappropriate for IMA.

Eric

2014-04-25 20:52:24

by Dmitry Kasatkin

[permalink] [raw]
Subject: Re: Kernel panic at Ubuntu: IMA + Apparmor

On 25 April 2014 23:45, Eric W. Biederman <[email protected]> wrote:
> Dmitry Kasatkin <[email protected]> writes:
>
>> On 25 April 2014 23:01, Oleg Nesterov <[email protected]> wrote:
>>> On 04/25, Eric W. Biederman wrote:
>>>>
>>>> Oleg Nesterov <[email protected]> writes:
>>>>
>>>> > Well. I _think_ that __fput() and ima_file_free() in particular should not
>>>> > depend on current and/or current->nsproxy. If nothing else, fput() can be
>>>> > called by the unrelated task which looks into /proc/pid/.
>>>> >
>>>> > But again, task_work_add() has more and more users, and it seems that even
>>>> > __fput() paths can do "everything", so perhaps it would be safer to allow
>>>> > to use ->nsproxy in task_work_run.
>>>>
>>>> Like I said, give me a clear motivating case.
>>>
>>> I agree, we need a reason. Currently I do not see one.
>>>
>>>> Right now not allowing
>>>> nsproxy is turning up bugs in __fput. Which seems like a good thing.
>>>
>>> This is what I certainly agree with ;)
>>>
>>
>> Hi,
>>
>> IMA uses kernel_read API which does not know anything about caller.
>> And of course security frameworks are at guard as usual.
>>
>> Exactly after reading first Eric's respons, I thought why to scratch
>> the head when task work queues are indeed designed for tasks...
>
> __fput has no guarantee of running in the task that close the file
> descriptor. If your code depends on that your code is broken.
>
>> And if you to dig for the history, IMA-appraisal was stuck due to
>> lockdep reporting even though it was on non-everlaping cases.
>> IIRC files vs. directories...
>>
>> After that IIRC Al Viro discussed about delayed fput and IIRC Oleg
>> (sorry if I am wrong) introduced task work queues.
>>
>> So IMA-appraisal was able to be upstreamed... That was ~3.4 time frame, IIRC
>>
>> Name space also dated around ~3.4??
>> Apparmor namespace change was also around that time.
>>
>> 3.10 introduces this name space order change and broke IMA-appraisal.
>
> IMA-appraisal is fundamentally broken because I can take a mandatory
> file lock and prevent IMA-apprasial.
>

What file lock are you talking about?
IMA-appraisal does not depends on file locks...

> Using kernel_read is what allows this.
>
>> Isn't it a clear motivating case???
>
> kernel_read is not appropriate for IMA use. The rest of this is just
> the messenger.
>
> IMA needs to use a cousin of kernel_read that operates at a lower level
> than vfs_read. A function that all of the permission checks and the
> fsnotify work.
>
> I am sorry to be the bearer of bad news. But kernel_read is totally
> inappropriate for IMA.
>

So you break IMA-appraisal and declare that it cannot be used now?

> Eric
>



--
Thanks,
Dmitry

2014-04-25 21:21:44

by Al Viro

[permalink] [raw]
Subject: Re: Kernel panic at Ubuntu: IMA + Apparmor

On Fri, Apr 25, 2014 at 01:45:17PM -0700, Eric W. Biederman wrote:

> IMA-appraisal is fundamentally broken because I can take a mandatory
> file lock and prevent IMA-apprasial.
>
> Using kernel_read is what allows this.
>
> > Isn't it a clear motivating case???
>
> kernel_read is not appropriate for IMA use. The rest of this is just
> the messenger.
>
> IMA needs to use a cousin of kernel_read that operates at a lower level
> than vfs_read. A function that all of the permission checks and the
> fsnotify work.

It's worse than that, actually ;-/ IMA hooks in __fput() have interesting
interplay with revoke-related stuff as well. Another very messy thing in
the same area is that it actually does ->read() from under ->i_mutex, leading
to all kinds of interesting locking issues...

I doubt that your "let's open-code vfs_read() guts" would be a good idea;
if nothing else, it might make more sense to make rw_verify_area() skip
the mandlock and security theatre when called in such situation.

What a mess... ;-/

2014-04-25 21:27:51

by Eric W. Biederman

[permalink] [raw]
Subject: Re: Kernel panic at Ubuntu: IMA + Apparmor

Dmitry Kasatkin <[email protected]> writes:

> On 25 April 2014 23:45, Eric W. Biederman <[email protected]> wrote:
>> Dmitry Kasatkin <[email protected]> writes:
>>
>>> On 25 April 2014 23:01, Oleg Nesterov <[email protected]> wrote:
>>>> On 04/25, Eric W. Biederman wrote:
>>>>>
>>>>> Oleg Nesterov <[email protected]> writes:
>>>>>
>>>>> > Well. I _think_ that __fput() and ima_file_free() in particular should not
>>>>> > depend on current and/or current->nsproxy. If nothing else, fput() can be
>>>>> > called by the unrelated task which looks into /proc/pid/.
>>>>> >
>>>>> > But again, task_work_add() has more and more users, and it seems that even
>>>>> > __fput() paths can do "everything", so perhaps it would be safer to allow
>>>>> > to use ->nsproxy in task_work_run.
>>>>>
>>>>> Like I said, give me a clear motivating case.
>>>>
>>>> I agree, we need a reason. Currently I do not see one.
>>>>
>>>>> Right now not allowing
>>>>> nsproxy is turning up bugs in __fput. Which seems like a good thing.
>>>>
>>>> This is what I certainly agree with ;)
>>>>
>>>
>>> Hi,
>>>
>>> IMA uses kernel_read API which does not know anything about caller.
>>> And of course security frameworks are at guard as usual.
>>>
>>> Exactly after reading first Eric's respons, I thought why to scratch
>>> the head when task work queues are indeed designed for tasks...
>>
>> __fput has no guarantee of running in the task that close the file
>> descriptor. If your code depends on that your code is broken.
>>
>>> And if you to dig for the history, IMA-appraisal was stuck due to
>>> lockdep reporting even though it was on non-everlaping cases.
>>> IIRC files vs. directories...
>>>
>>> After that IIRC Al Viro discussed about delayed fput and IIRC Oleg
>>> (sorry if I am wrong) introduced task work queues.
>>>
>>> So IMA-appraisal was able to be upstreamed... That was ~3.4 time frame, IIRC
>>>
>>> Name space also dated around ~3.4??
>>> Apparmor namespace change was also around that time.
>>>
>>> 3.10 introduces this name space order change and broke IMA-appraisal.
>>
>> IMA-appraisal is fundamentally broken because I can take a mandatory
>> file lock and prevent IMA-apprasial.
>>
>
> What file lock are you talking about?
> IMA-appraisal does not depends on file locks...

It honors them. Look at rw_verify_area, in vfs_read, in kernel_read.

It sure looks like locks_mandatory_area can cause your kernel_read to
fail.

>> Using kernel_read is what allows this.
>>
>>> Isn't it a clear motivating case???
>>
>> kernel_read is not appropriate for IMA use. The rest of this is just
>> the messenger.
>>
>> IMA needs to use a cousin of kernel_read that operates at a lower level
>> than vfs_read. A function that all of the permission checks and the
>> fsnotify work.
>>
>> I am sorry to be the bearer of bad news. But kernel_read is totally
>> inappropriate for IMA.
>>
>
> So you break IMA-appraisal and declare that it cannot be used now?

I didn't break it. I read the code, and I read the back trace to see
where the bug was.

I see IMA-appraisal trying to read file data as if it were a user space
application in such a way that it can get permission denied for a whole
host of reasons.

My understanding of IMA-appraisal is that using a code path that can
give use permission denined when performing appraisal is a way for
clever people to attack and avoid IMA-appraisal without violating any
security policy.

Am I wrong. Is it ok for IMA-appraisal to get permission denied when it
wants to appraise a file?

Eric

2014-04-25 21:44:21

by Eric W. Biederman

[permalink] [raw]
Subject: Re: Kernel panic at Ubuntu: IMA + Apparmor

Al Viro <[email protected]> writes:

> On Fri, Apr 25, 2014 at 01:45:17PM -0700, Eric W. Biederman wrote:
>
>> IMA-appraisal is fundamentally broken because I can take a mandatory
>> file lock and prevent IMA-apprasial.
>>
>> Using kernel_read is what allows this.
>>
>> > Isn't it a clear motivating case???
>>
>> kernel_read is not appropriate for IMA use. The rest of this is just
>> the messenger.
>>
>> IMA needs to use a cousin of kernel_read that operates at a lower level
>> than vfs_read. A function that all of the permission checks and the
>> fsnotify work.
>
> It's worse than that, actually ;-/ IMA hooks in __fput() have interesting
> interplay with revoke-related stuff as well. Another very messy thing in
> the same area is that it actually does ->read() from under ->i_mutex, leading
> to all kinds of interesting locking issues...
>
> I doubt that your "let's open-code vfs_read() guts" would be a good idea;
> if nothing else, it might make more sense to make rw_verify_area() skip
> the mandlock and security theatre when called in such situation.
>
> What a mess... ;-/

Agreed.

All I really meant is that vfs_read does too much, so it probably needs
to be refactored for this case. But fsnotify_read, add_rchar, and
inc_syscr all seem inappropriate.

So I think we might be able to get away with something like this:

ssize_t __vfs_read(struct file *file, char __user *buf, size_t count, loff_t *pos)
{
ssize_t ret;

if (!(file->f_mode & FMODE_READ))
return -EBADF;
if (!file->f_op->read && !file->f_op->aio_read)
return -EINVAL;
if (unlikely(!access_ok(VERIFY_WRITE, buf, count)))
return -EFAULT;

if (ret >= 0) {
count = ret;
if (file->f_op->read)
ret = file->f_op->read(file, buf, count, pos);
else
ret = do_sync_read(file, buf, count, pos);
}

return ret;
}


How much of the rest we do really would seem to depend on how valuable
the sanity checks are.

This area of code keeps evolving enough that I don't see how we could
possibly avoid going through helper functions to figure out which file
ops we want to use this week.

Eric

2014-04-25 21:46:30

by Dmitry Kasatkin

[permalink] [raw]
Subject: Re: Kernel panic at Ubuntu: IMA + Apparmor

On 26 April 2014 00:27, Eric W. Biederman <[email protected]> wrote:
> Dmitry Kasatkin <[email protected]> writes:
>
>> On 25 April 2014 23:45, Eric W. Biederman <[email protected]> wrote:
>>> Dmitry Kasatkin <[email protected]> writes:
>>>
>>>> On 25 April 2014 23:01, Oleg Nesterov <[email protected]> wrote:
>>>>> On 04/25, Eric W. Biederman wrote:
>>>>>>
>>>>>> Oleg Nesterov <[email protected]> writes:
>>>>>>
>>>>>> > Well. I _think_ that __fput() and ima_file_free() in particular should not
>>>>>> > depend on current and/or current->nsproxy. If nothing else, fput() can be
>>>>>> > called by the unrelated task which looks into /proc/pid/.
>>>>>> >
>>>>>> > But again, task_work_add() has more and more users, and it seems that even
>>>>>> > __fput() paths can do "everything", so perhaps it would be safer to allow
>>>>>> > to use ->nsproxy in task_work_run.
>>>>>>
>>>>>> Like I said, give me a clear motivating case.
>>>>>
>>>>> I agree, we need a reason. Currently I do not see one.
>>>>>
>>>>>> Right now not allowing
>>>>>> nsproxy is turning up bugs in __fput. Which seems like a good thing.
>>>>>
>>>>> This is what I certainly agree with ;)
>>>>>
>>>>
>>>> Hi,
>>>>
>>>> IMA uses kernel_read API which does not know anything about caller.
>>>> And of course security frameworks are at guard as usual.
>>>>
>>>> Exactly after reading first Eric's respons, I thought why to scratch
>>>> the head when task work queues are indeed designed for tasks...
>>>
>>> __fput has no guarantee of running in the task that close the file
>>> descriptor. If your code depends on that your code is broken.
>>>
>>>> And if you to dig for the history, IMA-appraisal was stuck due to
>>>> lockdep reporting even though it was on non-everlaping cases.
>>>> IIRC files vs. directories...
>>>>
>>>> After that IIRC Al Viro discussed about delayed fput and IIRC Oleg
>>>> (sorry if I am wrong) introduced task work queues.
>>>>
>>>> So IMA-appraisal was able to be upstreamed... That was ~3.4 time frame, IIRC
>>>>
>>>> Name space also dated around ~3.4??
>>>> Apparmor namespace change was also around that time.
>>>>
>>>> 3.10 introduces this name space order change and broke IMA-appraisal.
>>>
>>> IMA-appraisal is fundamentally broken because I can take a mandatory
>>> file lock and prevent IMA-apprasial.
>>>
>>
>> What file lock are you talking about?
>> IMA-appraisal does not depends on file locks...
>
> It honors them. Look at rw_verify_area, in vfs_read, in kernel_read.
>
> It sure looks like locks_mandatory_area can cause your kernel_read to
> fail.
>
>>> Using kernel_read is what allows this.
>>>
>>>> Isn't it a clear motivating case???
>>>
>>> kernel_read is not appropriate for IMA use. The rest of this is just
>>> the messenger.
>>>
>>> IMA needs to use a cousin of kernel_read that operates at a lower level
>>> than vfs_read. A function that all of the permission checks and the
>>> fsnotify work.
>>>
>>> I am sorry to be the bearer of bad news. But kernel_read is totally
>>> inappropriate for IMA.
>>>
>>
>> So you break IMA-appraisal and declare that it cannot be used now?
>
> I didn't break it. I read the code, and I read the back trace to see
> where the bug was.
>
> I see IMA-appraisal trying to read file data as if it were a user space
> application in such a way that it can get permission denied for a whole
> host of reasons.
>
> My understanding of IMA-appraisal is that using a code path that can
> give use permission denined when performing appraisal is a way for
> clever people to attack and avoid IMA-appraisal without violating any
> security policy.
>

Interesting thing is that file was already open before and LSM gave OK for this.
Then re-reading the file on close in fact does not need any LSM
permission checks.
But as kernel_read API is still the same, it goes via the same checks...

But on close with delayed fput nsproxy is missing ....

> Am I wrong. Is it ok for IMA-appraisal to get permission denied when it
> wants to appraise a file?
>

IMA is called after may_open...


> Eric



--
Thanks,
Dmitry

2014-04-25 21:55:20

by Al Viro

[permalink] [raw]
Subject: Re: Kernel panic at Ubuntu: IMA + Apparmor

On Fri, Apr 25, 2014 at 02:43:42PM -0700, Eric W. Biederman wrote:

> ssize_t __vfs_read(struct file *file, char __user *buf, size_t count, loff_t *pos)
> {
> ssize_t ret;
>
> if (!(file->f_mode & FMODE_READ))
> return -EBADF;
> if (!file->f_op->read && !file->f_op->aio_read)
> return -EINVAL;
> if (unlikely(!access_ok(VERIFY_WRITE, buf, count)))
> return -EFAULT;
>
> if (ret >= 0) {
> count = ret;
> if (file->f_op->read)
> ret = file->f_op->read(file, buf, count, pos);
> else
> ret = do_sync_read(file, buf, count, pos);
> }
>
> return ret;
> }

... which lacks the f_pos wraparound, etc. checks done by rw_verify_area().
IOW, it's one more place to grep through while verifying that ->read()
et.al. do not get called with such arguments.

fanotify probably could be skipped - ask the security circus crowd about
that one, it's their bast^Wbaby. add_rchar() and inc_syscr()... depends on
whether you want those reads hidden from accounting.

2014-04-25 21:56:36

by Dmitry Kasatkin

[permalink] [raw]
Subject: Re: Kernel panic at Ubuntu: IMA + Apparmor

On 26 April 2014 00:46, Dmitry Kasatkin <[email protected]> wrote:
> On 26 April 2014 00:27, Eric W. Biederman <[email protected]> wrote:
>> Dmitry Kasatkin <[email protected]> writes:
>>
>>> On 25 April 2014 23:45, Eric W. Biederman <[email protected]> wrote:
>>>> Dmitry Kasatkin <[email protected]> writes:
>>>>
>>>>> On 25 April 2014 23:01, Oleg Nesterov <[email protected]> wrote:
>>>>>> On 04/25, Eric W. Biederman wrote:
>>>>>>>
>>>>>>> Oleg Nesterov <[email protected]> writes:
>>>>>>>
>>>>>>> > Well. I _think_ that __fput() and ima_file_free() in particular should not
>>>>>>> > depend on current and/or current->nsproxy. If nothing else, fput() can be
>>>>>>> > called by the unrelated task which looks into /proc/pid/.
>>>>>>> >
>>>>>>> > But again, task_work_add() has more and more users, and it seems that even
>>>>>>> > __fput() paths can do "everything", so perhaps it would be safer to allow
>>>>>>> > to use ->nsproxy in task_work_run.
>>>>>>>
>>>>>>> Like I said, give me a clear motivating case.
>>>>>>
>>>>>> I agree, we need a reason. Currently I do not see one.
>>>>>>
>>>>>>> Right now not allowing
>>>>>>> nsproxy is turning up bugs in __fput. Which seems like a good thing.
>>>>>>
>>>>>> This is what I certainly agree with ;)
>>>>>>
>>>>>
>>>>> Hi,
>>>>>
>>>>> IMA uses kernel_read API which does not know anything about caller.
>>>>> And of course security frameworks are at guard as usual.
>>>>>
>>>>> Exactly after reading first Eric's respons, I thought why to scratch
>>>>> the head when task work queues are indeed designed for tasks...
>>>>
>>>> __fput has no guarantee of running in the task that close the file
>>>> descriptor. If your code depends on that your code is broken.
>>>>
>>>>> And if you to dig for the history, IMA-appraisal was stuck due to
>>>>> lockdep reporting even though it was on non-everlaping cases.
>>>>> IIRC files vs. directories...
>>>>>
>>>>> After that IIRC Al Viro discussed about delayed fput and IIRC Oleg
>>>>> (sorry if I am wrong) introduced task work queues.
>>>>>
>>>>> So IMA-appraisal was able to be upstreamed... That was ~3.4 time frame, IIRC
>>>>>
>>>>> Name space also dated around ~3.4??
>>>>> Apparmor namespace change was also around that time.
>>>>>
>>>>> 3.10 introduces this name space order change and broke IMA-appraisal.
>>>>
>>>> IMA-appraisal is fundamentally broken because I can take a mandatory
>>>> file lock and prevent IMA-apprasial.
>>>>
>>>
>>> What file lock are you talking about?
>>> IMA-appraisal does not depends on file locks...
>>
>> It honors them. Look at rw_verify_area, in vfs_read, in kernel_read.
>>
>> It sure looks like locks_mandatory_area can cause your kernel_read to
>> fail.
>>
>>>> Using kernel_read is what allows this.
>>>>
>>>>> Isn't it a clear motivating case???
>>>>
>>>> kernel_read is not appropriate for IMA use. The rest of this is just
>>>> the messenger.
>>>>
>>>> IMA needs to use a cousin of kernel_read that operates at a lower level
>>>> than vfs_read. A function that all of the permission checks and the
>>>> fsnotify work.
>>>>
>>>> I am sorry to be the bearer of bad news. But kernel_read is totally
>>>> inappropriate for IMA.
>>>>
>>>
>>> So you break IMA-appraisal and declare that it cannot be used now?
>>
>> I didn't break it. I read the code, and I read the back trace to see
>> where the bug was.
>>
>> I see IMA-appraisal trying to read file data as if it were a user space
>> application in such a way that it can get permission denied for a whole
>> host of reasons.
>>
>> My understanding of IMA-appraisal is that using a code path that can
>> give use permission denined when performing appraisal is a way for
>> clever people to attack and avoid IMA-appraisal without violating any
>> security policy.
>>
>
> Interesting thing is that file was already open before and LSM gave OK for this.
> Then re-reading the file on close in fact does not need any LSM
> permission checks.
> But as kernel_read API is still the same, it goes via the same checks...
>
> But on close with delayed fput nsproxy is missing ....
>
>> Am I wrong. Is it ok for IMA-appraisal to get permission denied when it
>> wants to appraise a file?
>>
>
> IMA is called after may_open...
>
>
>> Eric
>
>

Is it really a show stopper to switch order of 2 functions as quick fix?
It was like that before 3.10 and seemed ok...


>
> --
> Thanks,
> Dmitry


--
Thanks,
Dmitry

2014-04-25 22:11:48

by Eric W. Biederman

[permalink] [raw]
Subject: Re: Kernel panic at Ubuntu: IMA + Apparmor

Dmitry Kasatkin <[email protected]> writes:

> On 26 April 2014 00:27, Eric W. Biederman <[email protected]> wrote:
>> Dmitry Kasatkin <[email protected]> writes:
>>
>>> On 25 April 2014 23:45, Eric W. Biederman <[email protected]> wrote:
>>>> Dmitry Kasatkin <[email protected]> writes:
>>>>
>>>>> On 25 April 2014 23:01, Oleg Nesterov <[email protected]> wrote:
>>>>>> On 04/25, Eric W. Biederman wrote:
>>>>>>>
>>>>>>> Oleg Nesterov <[email protected]> writes:
>>>>>>>
>>>>>>> > Well. I _think_ that __fput() and ima_file_free() in particular should not
>>>>>>> > depend on current and/or current->nsproxy. If nothing else, fput() can be
>>>>>>> > called by the unrelated task which looks into /proc/pid/.
>>>>>>> >
>>>>>>> > But again, task_work_add() has more and more users, and it seems that even
>>>>>>> > __fput() paths can do "everything", so perhaps it would be safer to allow
>>>>>>> > to use ->nsproxy in task_work_run.
>>>>>>>
>>>>>>> Like I said, give me a clear motivating case.
>>>>>>
>>>>>> I agree, we need a reason. Currently I do not see one.
>>>>>>
>>>>>>> Right now not allowing
>>>>>>> nsproxy is turning up bugs in __fput. Which seems like a good thing.
>>>>>>
>>>>>> This is what I certainly agree with ;)
>>>>>>
>>>>>
>>>>> Hi,
>>>>>
>>>>> IMA uses kernel_read API which does not know anything about caller.
>>>>> And of course security frameworks are at guard as usual.
>>>>>
>>>>> Exactly after reading first Eric's respons, I thought why to scratch
>>>>> the head when task work queues are indeed designed for tasks...
>>>>
>>>> __fput has no guarantee of running in the task that close the file
>>>> descriptor. If your code depends on that your code is broken.
>>>>
>>>>> And if you to dig for the history, IMA-appraisal was stuck due to
>>>>> lockdep reporting even though it was on non-everlaping cases.
>>>>> IIRC files vs. directories...
>>>>>
>>>>> After that IIRC Al Viro discussed about delayed fput and IIRC Oleg
>>>>> (sorry if I am wrong) introduced task work queues.
>>>>>
>>>>> So IMA-appraisal was able to be upstreamed... That was ~3.4 time frame, IIRC
>>>>>
>>>>> Name space also dated around ~3.4??
>>>>> Apparmor namespace change was also around that time.
>>>>>
>>>>> 3.10 introduces this name space order change and broke IMA-appraisal.
>>>>
>>>> IMA-appraisal is fundamentally broken because I can take a mandatory
>>>> file lock and prevent IMA-apprasial.
>>>>
>>>
>>> What file lock are you talking about?
>>> IMA-appraisal does not depends on file locks...
>>
>> It honors them. Look at rw_verify_area, in vfs_read, in kernel_read.
>>
>> It sure looks like locks_mandatory_area can cause your kernel_read to
>> fail.
>>
>>>> Using kernel_read is what allows this.
>>>>
>>>>> Isn't it a clear motivating case???
>>>>
>>>> kernel_read is not appropriate for IMA use. The rest of this is just
>>>> the messenger.
>>>>
>>>> IMA needs to use a cousin of kernel_read that operates at a lower level
>>>> than vfs_read. A function that all of the permission checks and the
>>>> fsnotify work.
>>>>
>>>> I am sorry to be the bearer of bad news. But kernel_read is totally
>>>> inappropriate for IMA.
>>>>
>>>
>>> So you break IMA-appraisal and declare that it cannot be used now?
>>
>> I didn't break it. I read the code, and I read the back trace to see
>> where the bug was.
>>
>> I see IMA-appraisal trying to read file data as if it were a user space
>> application in such a way that it can get permission denied for a whole
>> host of reasons.
>>
>> My understanding of IMA-appraisal is that using a code path that can
>> give use permission denined when performing appraisal is a way for
>> clever people to attack and avoid IMA-appraisal without violating any
>> security policy.
>>
>
> Interesting thing is that file was already open before and LSM gave OK for this.
> Then re-reading the file on close in fact does not need any LSM
> permission checks.

And LSM are typically an implementation of mandatory access control.
Which means they do the crazy thing of checking every file operation.
Which means from one operation to the next they can give you -EPERM.

> But as kernel_read API is still the same, it goes via the same checks...
>
> But on close with delayed fput nsproxy is missing ....

And you might right in the same processor or you might be called in a
completely different process with completely different credentials
and security labels and the the lsm instead of crashing would silently
tell you no, and you have even stranger bugs to track down.

>> Am I wrong. Is it ok for IMA-appraisal to get permission denied when it
>> wants to appraise a file?
>>
>
> IMA is called after may_open...

Which in the case of mandatory file locking, or lsms means little.

I am afraid your mental model of where linux does permission checks is
about 20 years out of date. Which is interesting to say to an lsm maintainer...

Eric

2014-04-25 22:26:31

by Eric W. Biederman

[permalink] [raw]
Subject: Re: Kernel panic at Ubuntu: IMA + Apparmor

Al Viro <[email protected]> writes:

> On Fri, Apr 25, 2014 at 02:43:42PM -0700, Eric W. Biederman wrote:
>
>> ssize_t __vfs_read(struct file *file, char __user *buf, size_t count, loff_t *pos)
>> {
>> ssize_t ret;
>>
>> if (!(file->f_mode & FMODE_READ))
>> return -EBADF;
>> if (!file->f_op->read && !file->f_op->aio_read)
>> return -EINVAL;
>> if (unlikely(!access_ok(VERIFY_WRITE, buf, count)))
>> return -EFAULT;
>>
>> if (ret >= 0) {
>> count = ret;
>> if (file->f_op->read)
>> ret = file->f_op->read(file, buf, count, pos);
>> else
>> ret = do_sync_read(file, buf, count, pos);
>> }
>>
>> return ret;
>> }
>
> ... which lacks the f_pos wraparound, etc. checks done by rw_verify_area().
> IOW, it's one more place to grep through while verifying that ->read()
> et.al. do not get called with such arguments.

Agreed it must be done more delicately than my sketch. I am not
familiar with how much value such sanity checks add. Especially when
the read is not coming from a potentially hostile userspace.

> fanotify probably could be skipped - ask the security circus crowd about
> that one, it's their bast^Wbaby.

When the point is having a factor of read that skips the security circus
I think it makes sense to skip this too. At least as a starting
position.

> add_rchar() and inc_syscr()... depends on
> whether you want those reads hidden from accounting.

I doubt it matters in practice, the code is cheap.

Still it feels wrong to account reads to a task that did not ask for
them. It feels more correct to account that kind of read into a
different bucket. Say the reads performed by the kernel for mysterious
kernel activities.

Eric


2014-04-25 22:39:41

by Eric W. Biederman

[permalink] [raw]
Subject: Re: Kernel panic at Ubuntu: IMA + Apparmor

Dmitry Kasatkin <[email protected]> writes:

> Is it really a show stopper to switch order of 2 functions as quick fix?
> It was like that before 3.10 and seemed ok...

When that is the question. The answer is yes it is a show stopper.

A quick fix to bury a fundamental design flaw because the code
previously seemed ok. That seems fundamentally wrong.

Having IMA conflict with Apparmor in Kconfig would be a sensible quick
fix.

Eric

2014-04-26 08:49:17

by Dmitry Kasatkin

[permalink] [raw]
Subject: Re: Kernel panic at Ubuntu: IMA + Apparmor

On 26 April 2014 01:11, Eric W. Biederman <[email protected]> wrote:
> Dmitry Kasatkin <[email protected]> writes:
>
>> On 26 April 2014 00:27, Eric W. Biederman <[email protected]> wrote:
>>> Dmitry Kasatkin <[email protected]> writes:
>>>
>>>> On 25 April 2014 23:45, Eric W. Biederman <[email protected]> wrote:
>>>>> Dmitry Kasatkin <[email protected]> writes:
>>>>>
>>>>>> On 25 April 2014 23:01, Oleg Nesterov <[email protected]> wrote:
>>>>>>> On 04/25, Eric W. Biederman wrote:
>>>>>>>>
>>>>>>>> Oleg Nesterov <[email protected]> writes:
>>>>>>>>
>>>>>>>> > Well. I _think_ that __fput() and ima_file_free() in particular should not
>>>>>>>> > depend on current and/or current->nsproxy. If nothing else, fput() can be
>>>>>>>> > called by the unrelated task which looks into /proc/pid/.
>>>>>>>> >
>>>>>>>> > But again, task_work_add() has more and more users, and it seems that even
>>>>>>>> > __fput() paths can do "everything", so perhaps it would be safer to allow
>>>>>>>> > to use ->nsproxy in task_work_run.
>>>>>>>>
>>>>>>>> Like I said, give me a clear motivating case.
>>>>>>>
>>>>>>> I agree, we need a reason. Currently I do not see one.
>>>>>>>
>>>>>>>> Right now not allowing
>>>>>>>> nsproxy is turning up bugs in __fput. Which seems like a good thing.
>>>>>>>
>>>>>>> This is what I certainly agree with ;)
>>>>>>>
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> IMA uses kernel_read API which does not know anything about caller.
>>>>>> And of course security frameworks are at guard as usual.
>>>>>>
>>>>>> Exactly after reading first Eric's respons, I thought why to scratch
>>>>>> the head when task work queues are indeed designed for tasks...
>>>>>
>>>>> __fput has no guarantee of running in the task that close the file
>>>>> descriptor. If your code depends on that your code is broken.
>>>>>
>>>>>> And if you to dig for the history, IMA-appraisal was stuck due to
>>>>>> lockdep reporting even though it was on non-everlaping cases.
>>>>>> IIRC files vs. directories...
>>>>>>
>>>>>> After that IIRC Al Viro discussed about delayed fput and IIRC Oleg
>>>>>> (sorry if I am wrong) introduced task work queues.
>>>>>>
>>>>>> So IMA-appraisal was able to be upstreamed... That was ~3.4 time frame, IIRC
>>>>>>
>>>>>> Name space also dated around ~3.4??
>>>>>> Apparmor namespace change was also around that time.
>>>>>>
>>>>>> 3.10 introduces this name space order change and broke IMA-appraisal.
>>>>>
>>>>> IMA-appraisal is fundamentally broken because I can take a mandatory
>>>>> file lock and prevent IMA-apprasial.
>>>>>
>>>>
>>>> What file lock are you talking about?
>>>> IMA-appraisal does not depends on file locks...
>>>
>>> It honors them. Look at rw_verify_area, in vfs_read, in kernel_read.
>>>
>>> It sure looks like locks_mandatory_area can cause your kernel_read to
>>> fail.
>>>
>>>>> Using kernel_read is what allows this.
>>>>>
>>>>>> Isn't it a clear motivating case???
>>>>>
>>>>> kernel_read is not appropriate for IMA use. The rest of this is just
>>>>> the messenger.
>>>>>
>>>>> IMA needs to use a cousin of kernel_read that operates at a lower level
>>>>> than vfs_read. A function that all of the permission checks and the
>>>>> fsnotify work.
>>>>>
>>>>> I am sorry to be the bearer of bad news. But kernel_read is totally
>>>>> inappropriate for IMA.
>>>>>
>>>>
>>>> So you break IMA-appraisal and declare that it cannot be used now?
>>>
>>> I didn't break it. I read the code, and I read the back trace to see
>>> where the bug was.
>>>
>>> I see IMA-appraisal trying to read file data as if it were a user space
>>> application in such a way that it can get permission denied for a whole
>>> host of reasons.
>>>
>>> My understanding of IMA-appraisal is that using a code path that can
>>> give use permission denined when performing appraisal is a way for
>>> clever people to attack and avoid IMA-appraisal without violating any
>>> security policy.
>>>
>>
>> Interesting thing is that file was already open before and LSM gave OK for this.
>> Then re-reading the file on close in fact does not need any LSM
>> permission checks.
>
> And LSM are typically an implementation of mandatory access control.
> Which means they do the crazy thing of checking every file operation.
> Which means from one operation to the next they can give you -EPERM.
>
>> But as kernel_read API is still the same, it goes via the same checks...
>>
>> But on close with delayed fput nsproxy is missing ....
>
> And you might right in the same processor or you might be called in a
> completely different process with completely different credentials
> and security labels and the the lsm instead of crashing would silently
> tell you no, and you have even stranger bugs to track down.
>

task works are supposed to be called from process context and credentials
should stay the same on any CPU.

Isn't it?

And not taking into account to IMA, reverse order of exit_taskwork and
exit_namespaces
looks reasonable as it was before.

>>> Am I wrong. Is it ok for IMA-appraisal to get permission denied when it
>>> wants to appraise a file?
>>>
>>
>> IMA is called after may_open...
>
> Which in the case of mandatory file locking, or lsms means little.
>

No, it does not mean little.
LSMs, which are AC models, can only impose additional restrictions,
not to relax them.

IMA is not AC LSM, it is integrity verification module.
But follows the same principal.
It may further to restrict the access by performing integrity checks...
It makes sure that that "OK" decision granted by LSM, was actually
based on trustworthy
file credentials...

> I am afraid your mental model of where linux does permission checks is
> about 20 years out of date. Which is interesting to say to an lsm maintainer...
>

Be sure it's not. I am quite aware where linux does permission checks.

> Eric

IMA does kernel_read at file open.
At that point, security_file_permission might have certain sense as
process will be reading file after that.
Though, it will also happen during real file access..

In the case of IMA-appraisal, on file close, indeed file_permission
check is useless.
It may be "utter nonsense" as you say.


--
Thanks,
Dmitry

2014-04-26 08:58:50

by Dmitry Kasatkin

[permalink] [raw]
Subject: Re: Kernel panic at Ubuntu: IMA + Apparmor

On 26 April 2014 01:38, Eric W. Biederman <[email protected]> wrote:
> Dmitry Kasatkin <[email protected]> writes:
>
>> Is it really a show stopper to switch order of 2 functions as quick fix?
>> It was like that before 3.10 and seemed ok...
>
> When that is the question. The answer is yes it is a show stopper.
>
> A quick fix to bury a fundamental design flaw because the code
> previously seemed ok. That seems fundamentally wrong.
>
> Having IMA conflict with Apparmor in Kconfig would be a sensible quick
> fix.
>
> Eric

Conflict with Apparmor means with Ubuntu.

But answering to your early question..
IMA does not want permission denied when measuring and re-measuring files.
may_open() is doing that job before.

We need quickly introduce kernel_read without LSM checks...

--
Thanks,
Dmitry

2014-04-26 13:56:37

by Al Viro

[permalink] [raw]
Subject: Re: Kernel panic at Ubuntu: IMA + Apparmor

On Sat, Apr 26, 2014 at 11:58:45AM +0300, Dmitry Kasatkin wrote:

> Conflict with Apparmor means with Ubuntu.
>
> But answering to your early question..
> IMA does not want permission denied when measuring and re-measuring files.
> may_open() is doing that job before.
>
> We need quickly introduce kernel_read without LSM checks...

*snarl*

What we need quickly is to introduce you to a textbook or two. As the
matter of fact, in this case even wikipedia might suffice...

Please, figure out what "mandatory locking" is about, what kind of
exclusion does it provide and how much is it (un)related to LSM.

It has nothing to do with permission being denied; the normal behaviour is
to *block* until the lock has been removed. Or failure with -EAGAIN if
the file had been opened with O_NDELAY.

The effects apply only to read/write and their ilk; they have nothing
to do with e.g. O_RDWR open(). And having a file already opened r/w
by somebody does not prevent another process from opening it and acquiring
an exclusive lock on some range.

2014-04-26 16:54:52

by Dmitry Kasatkin

[permalink] [raw]
Subject: Re: Kernel panic at Ubuntu: IMA + Apparmor

On 26 April 2014 16:56, Al Viro <[email protected]> wrote:
> On Sat, Apr 26, 2014 at 11:58:45AM +0300, Dmitry Kasatkin wrote:
>
>> Conflict with Apparmor means with Ubuntu.
>>
>> But answering to your early question..
>> IMA does not want permission denied when measuring and re-measuring files.
>> may_open() is doing that job before.
>>
>> We need quickly introduce kernel_read without LSM checks...
>
> *snarl*
>
> What we need quickly is to introduce you to a textbook or two. As the
> matter of fact, in this case even wikipedia might suffice...
>

Hopefully we have you who were introduced to a textbook or two about relevant
subject and able kindly help us with the solution instead of telling
me this crap...


> Please, figure out what "mandatory locking" is about, what kind of
> exclusion does it provide and how much is it (un)related to LSM.
>

I admit, I missed this issue, but see above, we have you :)


> It has nothing to do with permission being denied; the normal behaviour is
> to *block* until the lock has been removed. Or failure with -EAGAIN if
> the file had been opened with O_NDELAY.
>
> The effects apply only to read/write and their ilk; they have nothing
> to do with e.g. O_RDWR open(). And having a file already opened r/w
> by somebody does not prevent another process from opening it and acquiring
> an exclusive lock on some range.


--
Thanks,
Dmitry

2014-04-26 17:42:46

by Al Viro

[permalink] [raw]
Subject: Re: Kernel panic at Ubuntu: IMA + Apparmor

On Sat, Apr 26, 2014 at 07:54:47PM +0300, Dmitry Kasatkin wrote:
> On 26 April 2014 16:56, Al Viro <[email protected]> wrote:
> > On Sat, Apr 26, 2014 at 11:58:45AM +0300, Dmitry Kasatkin wrote:
> >
> >> Conflict with Apparmor means with Ubuntu.
> >>
> >> But answering to your early question..
> >> IMA does not want permission denied when measuring and re-measuring files.
> >> may_open() is doing that job before.
> >>
> >> We need quickly introduce kernel_read without LSM checks...
> >
> > *snarl*
> >
> > What we need quickly is to introduce you to a textbook or two. As the
> > matter of fact, in this case even wikipedia might suffice...
> >
>
> Hopefully we have you who were introduced to a textbook or two about relevant
> subject and able kindly help us with the solution instead of telling
> me this crap...

See the discussion of that very topic (required modifications of vfs_read())
upthread. And Eric has a very good point about the usefulness of understanding
the basics of IO-related system calls in Unix for anybody who does any
kind of development related to keeping track of file contents modifications,
etc. It's *not* about some arcane knowledge of VFS internals (which also might
come handy when sticking hooks into said internals); it's about being familiar
with the semantics of read(2) and related concepts.

2014-04-26 19:03:44

by Dmitry Kasatkin

[permalink] [raw]
Subject: Re: Kernel panic at Ubuntu: IMA + Apparmor

On 26 April 2014 20:42, Al Viro <[email protected]> wrote:
> On Sat, Apr 26, 2014 at 07:54:47PM +0300, Dmitry Kasatkin wrote:
>> On 26 April 2014 16:56, Al Viro <[email protected]> wrote:
>> > On Sat, Apr 26, 2014 at 11:58:45AM +0300, Dmitry Kasatkin wrote:
>> >
>> >> Conflict with Apparmor means with Ubuntu.
>> >>
>> >> But answering to your early question..
>> >> IMA does not want permission denied when measuring and re-measuring files.
>> >> may_open() is doing that job before.
>> >>
>> >> We need quickly introduce kernel_read without LSM checks...
>> >
>> > *snarl*
>> >
>> > What we need quickly is to introduce you to a textbook or two. As the
>> > matter of fact, in this case even wikipedia might suffice...
>> >
>>
>> Hopefully we have you who were introduced to a textbook or two about relevant
>> subject and able kindly help us with the solution instead of telling
>> me this crap...
>
> See the discussion of that very topic (required modifications of vfs_read())
> upthread. And Eric has a very good point about the usefulness of understanding
> the basics of IO-related system calls in Unix for anybody who does any
> kind of development related to keeping track of file contents modifications,
> etc. It's *not* about some arcane knowledge of VFS internals (which also might
> come handy when sticking hooks into said internals); it's about being familiar
> with the semantics of read(2) and related concepts.

Great. Teaching discussions are over?
So how we will solve the problem reported in this thread?


--
Thanks,
Dmitry

2014-04-29 13:01:10

by Mimi Zohar

[permalink] [raw]
Subject: Re: Kernel panic at Ubuntu: IMA + Apparmor

My apologies for those receiving this post a 2nd time. The original
post never made it the mailing lists ...

On Fri, 2014-04-25 at 15:25 -0700, Eric W. Biederman wrote:
> Al Viro <[email protected]> writes:
>
> > On Fri, Apr 25, 2014 at 02:43:42PM -0700, Eric W. Biederman wrote:
> >
> >> ssize_t __vfs_read(struct file *file, char __user *buf, size_t count, loff_t *pos)
> >> {
> >> ssize_t ret;
> >>
> >> if (!(file->f_mode & FMODE_READ))
> >> return -EBADF;
> >> if (!file->f_op->read && !file->f_op->aio_read)
> >> return -EINVAL;
> >> if (unlikely(!access_ok(VERIFY_WRITE, buf, count)))
> >> return -EFAULT;
> >>
> >> if (ret >= 0) {
> >> count = ret;
> >> if (file->f_op->read)
> >> ret = file->f_op->read(file, buf, count, pos);
> >> else
> >> ret = do_sync_read(file, buf, count, pos);
> >> }
> >>
> >> return ret;
> >> }
> >
> > ... which lacks the f_pos wraparound, etc. checks done by rw_verify_area().
> > IOW, it's one more place to grep through while verifying that ->read()
> > et.al. do not get called with such arguments.
>
> Agreed it must be done more delicately than my sketch. I am not
> familiar with how much value such sanity checks add. Especially when
> the read is not coming from a potentially hostile userspace.

Sorry for the delay in commenting, imap problems. This sounds like a
plausible solution, similar to __vfs_setxattr_noperm() vs.
__vfs_setxattr().

> > fanotify probably could be skipped - ask the security circus crowd about
> > that one, it's their bast^Wbaby.
>
> When the point is having a factor of read that skips the security circus
> I think it makes sense to skip this too. At least as a starting
> position.

Right, fsnotify*() is meant for userspace access, not kernel access.
CC'ing Eric Paris for comment.

> > add_rchar() and inc_syscr()... depends on
> > whether you want those reads hidden from accounting.
>
> I doubt it matters in practice, the code is cheap.
>
> Still it feels wrong to account reads to a task that did not ask for
> them. It feels more correct to account that kind of read into a
> different bucket. Say the reads performed by the kernel for mysterious
> kernel activities.

Ok. So who are the interested parties that need to be included in this
discussion?

thanks,

Mimi