2011-04-12 15:00:26

by Michal Suchanek

[permalink] [raw]
Subject: Unionmount status?

Hello,

as some already know the Unionmount VFS union which has been in
development for some years now is the only True Union (TM) that can be
accepted into the kernel mainline by the VFS maintainers (for reasons
of their own which you can surely find if you search the web or ask
them directly).

The current UnionMount version that can be found here:

http://git.kernel.org/?p=linux/kernel/git/val/linux-2.6.git;a=shortlog;h=refs/heads/ext2_works

works for me as good as aufs does. That is I can build a live CD using
this unioning solution and it boots and runs without any apparent
issues.

There are probably many possible uses of the union which I did not
test nor did I test long term stability of using the unioned
filesystem. As far as ephemeral live systems go it works fine for me,
though.

The issue is that while the code is (nearly) finished it is not yet
merged into mainline and as I am not familiar with the details of
ever-changing Linux VFS layer forward-porting this code to current
kernels is somewhat challenging.

What is the plan with unionmount now?

What is required for it to be merged into mainline?

Thanks

Michal


2011-04-12 20:31:45

by Ric Wheeler

[permalink] [raw]
Subject: Re: Unionmount status?

On 04/12/2011 11:00 AM, Michal Suchanek wrote:
> Hello,
>
> as some already know the Unionmount VFS union which has been in
> development for some years now is the only True Union (TM) that can be
> accepted into the kernel mainline by the VFS maintainers (for reasons
> of their own which you can surely find if you search the web or ask
> them directly).
>
> The current UnionMount version that can be found here:
>
> http://git.kernel.org/?p=linux/kernel/git/val/linux-2.6.git;a=shortlog;h=refs/heads/ext2_works
>
> works for me as good as aufs does. That is I can build a live CD using
> this unioning solution and it boots and runs without any apparent
> issues.
>
> There are probably many possible uses of the union which I did not
> test nor did I test long term stability of using the unioned
> filesystem. As far as ephemeral live systems go it works fine for me,
> though.
>
> The issue is that while the code is (nearly) finished it is not yet
> merged into mainline and as I am not familiar with the details of
> ever-changing Linux VFS layer forward-porting this code to current
> kernels is somewhat challenging.
>
> What is the plan with unionmount now?
>
> What is required for it to be merged into mainline?
>
> Thanks
>
> Michal

Hi Michal,

People are actively looking to see what union mount (or overlayfs) solution to
pursue. Val has shifted her focus away from kernel hacking these days, but did
refresh her patch set in the last month or so.

Miklos has an overlay file system that has also been posted upstream.

I think that testing like you did and getting more eyes to look at the code is
the next key step for both projects.

Thanks!

Ric


2011-04-12 21:37:00

by Michal Suchanek

[permalink] [raw]
Subject: Re: Unionmount status?

On 12 April 2011 22:31, Ric Wheeler <[email protected]> wrote:
> On 04/12/2011 11:00 AM, Michal Suchanek wrote:
>>
>> Hello,
>>
>> as some already know the Unionmount VFS union which has been in
>> development for some years now is the only True Union (TM) that can be
>> accepted into the kernel mainline by the VFS maintainers (for reasons
>> of their own which you can surely find if you search the web or ask
>> them directly).
>>
>> The current UnionMount version that can be found here:
>>
>>
>> http://git.kernel.org/?p=linux/kernel/git/val/linux-2.6.git;a=shortlog;h=refs/heads/ext2_works
>>
>> works for me as good as aufs does. That is I can build a live CD using
>> this unioning solution and it boots and runs without any apparent
>> issues.
>>
>> There are probably many possible uses of the union which I did not
>> test nor did I test long term stability of using the unioned
>> filesystem. As far as ephemeral live systems go it works fine for me,
>> though.
>>
>> The issue is that while the code is (nearly) finished it is not yet
>> merged into mainline and as I am not familiar with the details of
>> ever-changing Linux VFS layer forward-porting this code to current
>> kernels is somewhat challenging.
>>
>> What is the plan with unionmount now?
>>
>> What is required  for it to be merged into mainline?
>>
>> Thanks
>>
>> Michal
>
> Hi Michal,
>
> People are actively looking to see what union mount (or overlayfs) solution
> to pursue. Val has shifted her focus away from kernel hacking these days,
> but did refresh her patch set in the last month or so.

I am not aware of such refreshed patch set, at least it is not
published in her repo.

>
> Miklos has an overlay file system that has also been posted upstream.

I have no idea about his other overlay filesystem either.

>
> I think that testing like you did and getting more eyes to look at the code
> is the next key step for both projects.
>

Could you provide more details?

I am not following the Linux fs development closely.

Thanks

Michal

2011-04-13 14:18:47

by Jiri Kosina

[permalink] [raw]
Subject: Re: Unionmount status?

On Tue, 12 Apr 2011, Michal Suchanek wrote:

> > Miklos has an overlay file system that has also been posted upstream.
>
> I have no idea about his other overlay filesystem either.

https://lkml.org/lkml/2011/3/22/222

--
Jiri Kosina
SUSE Labs, Novell Inc.

2011-04-13 15:14:11

by Michal Suchanek

[permalink] [raw]
Subject: Re: Unionmount status?

On 13 April 2011 16:18, Jiri Kosina <[email protected]> wrote:
> On Tue, 12 Apr 2011, Michal Suchanek wrote:
>
>> > Miklos has an overlay file system that has also been posted upstream.
>>
>> I have no idea about his other overlay filesystem either.
>
> https://lkml.org/lkml/2011/3/22/222
>

Thanks for pointing out this announcement.

However, neither this announcement nor the document
Documentation/filesystems/overlayfs.txt included in the git tree
details how to mount the filesystem. Without that it is kind of
useless.

Thanks

Michal

2011-04-13 17:27:07

by Ric Wheeler

[permalink] [raw]
Subject: Re: Unionmount status?

On 04/12/2011 05:36 PM, Michal Suchanek wrote:
> On 12 April 2011 22:31, Ric Wheeler<[email protected]> wrote:
>> On 04/12/2011 11:00 AM, Michal Suchanek wrote:
>>> Hello,
>>>
>>> as some already know the Unionmount VFS union which has been in
>>> development for some years now is the only True Union (TM) that can be
>>> accepted into the kernel mainline by the VFS maintainers (for reasons
>>> of their own which you can surely find if you search the web or ask
>>> them directly).
>>>
>>> The current UnionMount version that can be found here:
>>>
>>>
>>> http://git.kernel.org/?p=linux/kernel/git/val/linux-2.6.git;a=shortlog;h=refs/heads/ext2_works
>>>
>>> works for me as good as aufs does. That is I can build a live CD using
>>> this unioning solution and it boots and runs without any apparent
>>> issues.
>>>
>>> There are probably many possible uses of the union which I did not
>>> test nor did I test long term stability of using the unioned
>>> filesystem. As far as ephemeral live systems go it works fine for me,
>>> though.
>>>
>>> The issue is that while the code is (nearly) finished it is not yet
>>> merged into mainline and as I am not familiar with the details of
>>> ever-changing Linux VFS layer forward-porting this code to current
>>> kernels is somewhat challenging.
>>>
>>> What is the plan with unionmount now?
>>>
>>> What is required for it to be merged into mainline?
>>>
>>> Thanks
>>>
>>> Michal
>> Hi Michal,
>>
>> People are actively looking to see what union mount (or overlayfs) solution
>> to pursue. Val has shifted her focus away from kernel hacking these days,
>> but did refresh her patch set in the last month or so.
> I am not aware of such refreshed patch set, at least it is not
> published in her repo.
>

Val posted the refreshed patches with the title on March 22nd:

http://lwn.net/Articles/435019/

Ric

>> Miklos has an overlay file system that has also been posted upstream.
> I have no idea about his other overlay filesystem either.
>
>> I think that testing like you did and getting more eyes to look at the code
>> is the next key step for both projects.
>>
> Could you provide more details?
>
> I am not following the Linux fs development closely.
>
> Thanks
>
> Michal

2011-04-13 18:58:59

by Michal Suchanek

[permalink] [raw]
Subject: Re: Unionmount status?

On 13 April 2011 19:26, Ric Wheeler <[email protected]> wrote:
> On 04/12/2011 05:36 PM, Michal Suchanek wrote:
>>
>> On 12 April 2011 22:31, Ric Wheeler<[email protected]>  wrote:
>>>
>>> On 04/12/2011 11:00 AM, Michal Suchanek wrote:
>>>>
>>>> Hello,
>>>>
>>>> as some already know the Unionmount VFS union which has been in
>>>> development for some years now is the only True Union (TM) that can be
>>>> accepted into the kernel mainline by the VFS maintainers (for reasons
>>>> of their own which you can surely find if you search the web or ask
>>>> them directly).
>>>>
>>>> The current UnionMount version that can be found here:
>>>>
>>>>
>>>>
>>>> http://git.kernel.org/?p=linux/kernel/git/val/linux-2.6.git;a=shortlog;h=refs/heads/ext2_works
>>>>
>>>> works for me as good as aufs does. That is I can build a live CD using
>>>> this unioning solution and it boots and runs without any apparent
>>>> issues.
>>>>
>>>> There are probably many possible uses of the union which I did not
>>>> test nor did I test long term stability of using the unioned
>>>> filesystem. As far as ephemeral live systems go it works fine for me,
>>>> though.
>>>>
>>>> The issue is that while the code is (nearly) finished it is not yet
>>>> merged into mainline and as I am not familiar with the details of
>>>> ever-changing Linux VFS layer forward-porting this code to current
>>>> kernels is somewhat challenging.
>>>>
>>>> What is the plan with unionmount now?
>>>>
>>>> What is required  for it to be merged into mainline?
>>>>
>>>> Thanks
>>>>
>>>> Michal
>>>
>>> Hi Michal,
>>>
>>> People are actively looking to see what union mount (or overlayfs)
>>> solution
>>> to pursue. Val has shifted her focus away from kernel hacking these days,
>>> but did refresh her patch set in the last month or so.
>>
>> I am not aware of such refreshed patch set, at least it is not
>> published in her repo.
>>
>
> Val posted the refreshed patches with the title on March 22nd:
>
> http://lwn.net/Articles/435019/
>

That article references the same four months old repo which I
mentioned at the start of the thread, only a slightly different
branch.

While it maybe useful for testing unionmount (which I already tried)
it is not a patch against current kernel which could be used to build
current live images.

Thanks

Michal

2011-04-13 19:11:47

by Ric Wheeler

[permalink] [raw]
Subject: Re: Unionmount status?

On 04/13/2011 02:58 PM, Michal Suchanek wrote:
> On 13 April 2011 19:26, Ric Wheeler<[email protected]> wrote:
>> On 04/12/2011 05:36 PM, Michal Suchanek wrote:
>>> On 12 April 2011 22:31, Ric Wheeler<[email protected]> wrote:
>>>> On 04/12/2011 11:00 AM, Michal Suchanek wrote:
>>>>> Hello,
>>>>>
>>>>> as some already know the Unionmount VFS union which has been in
>>>>> development for some years now is the only True Union (TM) that can be
>>>>> accepted into the kernel mainline by the VFS maintainers (for reasons
>>>>> of their own which you can surely find if you search the web or ask
>>>>> them directly).
>>>>>
>>>>> The current UnionMount version that can be found here:
>>>>>
>>>>>
>>>>>
>>>>> http://git.kernel.org/?p=linux/kernel/git/val/linux-2.6.git;a=shortlog;h=refs/heads/ext2_works
>>>>>
>>>>> works for me as good as aufs does. That is I can build a live CD using
>>>>> this unioning solution and it boots and runs without any apparent
>>>>> issues.
>>>>>
>>>>> There are probably many possible uses of the union which I did not
>>>>> test nor did I test long term stability of using the unioned
>>>>> filesystem. As far as ephemeral live systems go it works fine for me,
>>>>> though.
>>>>>
>>>>> The issue is that while the code is (nearly) finished it is not yet
>>>>> merged into mainline and as I am not familiar with the details of
>>>>> ever-changing Linux VFS layer forward-porting this code to current
>>>>> kernels is somewhat challenging.
>>>>>
>>>>> What is the plan with unionmount now?
>>>>>
>>>>> What is required for it to be merged into mainline?
>>>>>
>>>>> Thanks
>>>>>
>>>>> Michal
>>>> Hi Michal,
>>>>
>>>> People are actively looking to see what union mount (or overlayfs)
>>>> solution
>>>> to pursue. Val has shifted her focus away from kernel hacking these days,
>>>> but did refresh her patch set in the last month or so.
>>> I am not aware of such refreshed patch set, at least it is not
>>> published in her repo.
>>>
>> Val posted the refreshed patches with the title on March 22nd:
>>
>> http://lwn.net/Articles/435019/
>>
> That article references the same four months old repo which I
> mentioned at the start of the thread, only a slightly different
> branch.
>
> While it maybe useful for testing unionmount (which I already tried)
> it is not a patch against current kernel which could be used to build
> current live images.
>
> Thanks
>
> Michal

She did post the patch series that same date in March - you can probably grab
the series from linux-fsdevel, look for this series:

"[PATCH 00/74] Union mounts version something or other"

Al Viro was planning on looking at her refreshed patches (he had reviewed them
with her in person), but that is not going to happen any time soon so getting
more eyes and testing would be great!

Ric

2011-04-13 19:47:54

by Michal Suchanek

[permalink] [raw]
Subject: Re: Unionmount status?

On 13 April 2011 21:11, Ric Wheeler <[email protected]> wrote:
> On 04/13/2011 02:58 PM, Michal Suchanek wrote:
>>
>> On 13 April 2011 19:26, Ric Wheeler<[email protected]>  wrote:
>>>
>>> On 04/12/2011 05:36 PM, Michal Suchanek wrote:
>>>>
>>>> On 12 April 2011 22:31, Ric Wheeler<[email protected]>    wrote:
>>>>>
>>>>> On 04/12/2011 11:00 AM, Michal Suchanek wrote:
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> as some already know the Unionmount VFS union which has been in
>>>>>> development for some years now is the only True Union (TM) that can be
>>>>>> accepted into the kernel mainline by the VFS maintainers (for reasons
>>>>>> of their own which you can surely find if you search the web or ask
>>>>>> them directly).
>>>>>>
>>>>>> The current UnionMount version that can be found here:
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> http://git.kernel.org/?p=linux/kernel/git/val/linux-2.6.git;a=shortlog;h=refs/heads/ext2_works
>>>>>>
>>>>>> works for me as good as aufs does. That is I can build a live CD using
>>>>>> this unioning solution and it boots and runs without any apparent
>>>>>> issues.
>>>>>>
>>>>>> There are probably many possible uses of the union which I did not
>>>>>> test nor did I test long term stability of using the unioned
>>>>>> filesystem. As far as ephemeral live systems go it works fine for me,
>>>>>> though.
>>>>>>
>>>>>> The issue is that while the code is (nearly) finished it is not yet
>>>>>> merged into mainline and as I am not familiar with the details of
>>>>>> ever-changing Linux VFS layer forward-porting this code to current
>>>>>> kernels is somewhat challenging.
>>>>>>
>>>>>> What is the plan with unionmount now?
>>>>>>
>>>>>> What is required  for it to be merged into mainline?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> Michal
>>>>>
>>>>> Hi Michal,
>>>>>
>>>>> People are actively looking to see what union mount (or overlayfs)
>>>>> solution
>>>>> to pursue. Val has shifted her focus away from kernel hacking these
>>>>> days,
>>>>> but did refresh her patch set in the last month or so.
>>>>
>>>> I am not aware of such refreshed patch set, at least it is not
>>>> published in her repo.
>>>>
>>> Val posted the refreshed patches with the title on March 22nd:
>>>
>>> http://lwn.net/Articles/435019/
>>>
>> That article references the same four months old repo which I
>> mentioned at the start of the thread, only a slightly different
>> branch.
>>
>> While it maybe useful for testing unionmount (which I already tried)
>> it is not a patch against current kernel which could be used to build
>> current live images.
>>
>> Thanks
>>
>> Michal
>
> She did post the patch series that same date in March - you can probably
> grab the series from linux-fsdevel, look for this series:
>
> "[PATCH 00/74] Union mounts version something or other"
>
> Al Viro was planning on looking at her refreshed patches (he had reviewed
> them with her in person), but that is not going to happen any time soon so
> getting more eyes and testing would be great!
>

Even gmame can't collect the patches back from the ML, I don't want to try.

However, the discussion suggests that these are exactly the 4 months
old branch ending in a commit with the summary "Temporary commit"
which did not inspire confidence in me so I used the previous (also 4
moths old) branch.

Thanks

Michal

2011-04-14 04:50:31

by Ian Kent

[permalink] [raw]
Subject: Re: Unionmount status?

On Wed, 2011-04-13 at 21:47 +0200, Michal Suchanek wrote:
> On 13 April 2011 21:11, Ric Wheeler <[email protected]> wrote:
> > On 04/13/2011 02:58 PM, Michal Suchanek wrote:
> >>
> >> On 13 April 2011 19:26, Ric Wheeler<[email protected]> wrote:
> >>>
> >>> On 04/12/2011 05:36 PM, Michal Suchanek wrote:
> >>>>
> >>>> On 12 April 2011 22:31, Ric Wheeler<[email protected]> wrote:
> >>>>>
> >>>>> On 04/12/2011 11:00 AM, Michal Suchanek wrote:
> >>>>>>
> >>>>>> Hello,
> >>>>>>
> >>>>>> as some already know the Unionmount VFS union which has been in
> >>>>>> development for some years now is the only True Union (TM) that can be
> >>>>>> accepted into the kernel mainline by the VFS maintainers (for reasons
> >>>>>> of their own which you can surely find if you search the web or ask
> >>>>>> them directly).
> >>>>>>
> >>>>>> The current UnionMount version that can be found here:
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> http://git.kernel.org/?p=linux/kernel/git/val/linux-2.6.git;a=shortlog;h=refs/heads/ext2_works
> >>>>>>
> >>>>>> works for me as good as aufs does. That is I can build a live CD using
> >>>>>> this unioning solution and it boots and runs without any apparent
> >>>>>> issues.
> >>>>>>
> >>>>>> There are probably many possible uses of the union which I did not
> >>>>>> test nor did I test long term stability of using the unioned
> >>>>>> filesystem. As far as ephemeral live systems go it works fine for me,
> >>>>>> though.
> >>>>>>
> >>>>>> The issue is that while the code is (nearly) finished it is not yet
> >>>>>> merged into mainline and as I am not familiar with the details of
> >>>>>> ever-changing Linux VFS layer forward-porting this code to current
> >>>>>> kernels is somewhat challenging.
> >>>>>>
> >>>>>> What is the plan with unionmount now?
> >>>>>>
> >>>>>> What is required for it to be merged into mainline?
> >>>>>>
> >>>>>> Thanks
> >>>>>>
> >>>>>> Michal
> >>>>>
> >>>>> Hi Michal,
> >>>>>
> >>>>> People are actively looking to see what union mount (or overlayfs)
> >>>>> solution
> >>>>> to pursue. Val has shifted her focus away from kernel hacking these
> >>>>> days,
> >>>>> but did refresh her patch set in the last month or so.
> >>>>
> >>>> I am not aware of such refreshed patch set, at least it is not
> >>>> published in her repo.
> >>>>
> >>> Val posted the refreshed patches with the title on March 22nd:
> >>>
> >>> http://lwn.net/Articles/435019/
> >>>
> >> That article references the same four months old repo which I
> >> mentioned at the start of the thread, only a slightly different
> >> branch.
> >>
> >> While it maybe useful for testing unionmount (which I already tried)
> >> it is not a patch against current kernel which could be used to build
> >> current live images.
> >>
> >> Thanks
> >>
> >> Michal
> >
> > She did post the patch series that same date in March - you can probably
> > grab the series from linux-fsdevel, look for this series:
> >
> > "[PATCH 00/74] Union mounts version something or other"
> >
> > Al Viro was planning on looking at her refreshed patches (he had reviewed
> > them with her in person), but that is not going to happen any time soon so
> > getting more eyes and testing would be great!
> >
>
> Even gmame can't collect the patches back from the ML, I don't want to try.
>
> However, the discussion suggests that these are exactly the 4 months
> old branch ending in a commit with the summary "Temporary commit"
> which did not inspire confidence in me so I used the previous (also 4
> moths old) branch.

Yes, that's the impression I have too.

I believe David was working to update the patches and his silence
indicates he is probably bogged down with other priority work. If that's
the case, and your still interested, I might be able to help updating
the series some time soon. I haven't reviewed any of Val's series posts
for a while now so I'd need to catch up with the current state of the
project first.

I guess the first thing is to find out if David has made any progress,
David?

As for the overlayfs from Miklos I haven't looked closely at it but
since Miklos hasn't replied so far I'm guessing there's still a way to
with that as well.

Ian

2011-04-14 08:38:59

by Miklos Szeredi

[permalink] [raw]
Subject: Re: Unionmount status?

On Wed, Apr 13, 2011 at 5:13 PM, Michal Suchanek <[email protected]> wrote:
> On 13 April 2011 16:18, Jiri Kosina <[email protected]> wrote:
>> https://lkml.org/lkml/2011/3/22/222
>>
>
> Thanks for pointing out this announcement.
>
> However, neither this announcement nor the document
> Documentation/filesystems/overlayfs.txt included in the git tree
> details how to mount the filesystem. Without that it is kind of
> useless.

Oh, I'll fix that in the docs. Thanks for pointing it out.

Here's how to mount:

mount -t overlayfs overlayfs -olowerdir=/lower -oupperdir=/upper /overlay

Thanks,
Miklos

2011-04-14 09:33:07

by Michal Suchanek

[permalink] [raw]
Subject: Re: Unionmount status?

On 14 April 2011 06:50, Ian Kent <[email protected]> wrote:
> On Wed, 2011-04-13 at 21:47 +0200, Michal Suchanek wrote:
>> On 13 April 2011 21:11, Ric Wheeler <[email protected]> wrote:
>> > On 04/13/2011 02:58 PM, Michal Suchanek wrote:
>> >>
>> >> On 13 April 2011 19:26, Ric Wheeler<[email protected]>  wrote:
>> >>>
>> >>> On 04/12/2011 05:36 PM, Michal Suchanek wrote:
>> >>>>
>> >>>> On 12 April 2011 22:31, Ric Wheeler<[email protected]>    wrote:
>> >>>>>
>> >>>>> On 04/12/2011 11:00 AM, Michal Suchanek wrote:
>> >>>>>>
>> >>>>>> Hello,
>> >>>>>>
>> >>>>>> as some already know the Unionmount VFS union which has been in
>> >>>>>> development for some years now is the only True Union (TM) that can be
>> >>>>>> accepted into the kernel mainline by the VFS maintainers (for reasons
>> >>>>>> of their own which you can surely find if you search the web or ask
>> >>>>>> them directly).
>> >>>>>>
>> >>>>>> The current UnionMount version that can be found here:
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> http://git.kernel.org/?p=linux/kernel/git/val/linux-2.6.git;a=shortlog;h=refs/heads/ext2_works
>> >>>>>>
>> >>>>>> works for me as good as aufs does. That is I can build a live CD using
>> >>>>>> this unioning solution and it boots and runs without any apparent
>> >>>>>> issues.
>> >>>>>>
>> >>>>>> There are probably many possible uses of the union which I did not
>> >>>>>> test nor did I test long term stability of using the unioned
>> >>>>>> filesystem. As far as ephemeral live systems go it works fine for me,
>> >>>>>> though.
>> >>>>>>
>> >>>>>> The issue is that while the code is (nearly) finished it is not yet
>> >>>>>> merged into mainline and as I am not familiar with the details of
>> >>>>>> ever-changing Linux VFS layer forward-porting this code to current
>> >>>>>> kernels is somewhat challenging.
>> >>>>>>
>> >>>>>> What is the plan with unionmount now?
>> >>>>>>
>> >>>>>> What is required  for it to be merged into mainline?
>> >>>>>>
>> >>>>>> Thanks
>> >>>>>>
>> >>>>>> Michal
>> >>>>>
>> >>>>> Hi Michal,
>> >>>>>
>> >>>>> People are actively looking to see what union mount (or overlayfs)
>> >>>>> solution
>> >>>>> to pursue. Val has shifted her focus away from kernel hacking these
>> >>>>> days,
>> >>>>> but did refresh her patch set in the last month or so.
>> >>>>
>> >>>> I am not aware of such refreshed patch set, at least it is not
>> >>>> published in her repo.
>> >>>>
>> >>> Val posted the refreshed patches with the title on March 22nd:
>> >>>
>> >>> http://lwn.net/Articles/435019/
>> >>>
>> >> That article references the same four months old repo which I
>> >> mentioned at the start of the thread, only a slightly different
>> >> branch.
>> >>
>> >> While it maybe useful for testing unionmount (which I already tried)
>> >> it is not a patch against current kernel which could be used to build
>> >> current live images.
>> >>
>> >> Thanks
>> >>
>> >> Michal
>> >
>> > She did post the patch series that same date in March - you can probably
>> > grab the series from linux-fsdevel, look for this series:
>> >
>> > "[PATCH 00/74] Union mounts version something or other"
>> >
>> > Al Viro was planning on looking at her refreshed patches (he had reviewed
>> > them with her in person), but that is not going to happen any time soon so
>> > getting more eyes and testing would be great!
>> >
>>
>> Even gmame can't collect the patches back from the ML, I don't want to try.
>>
>> However, the discussion suggests that these are exactly the 4 months
>> old branch ending in a commit with the summary "Temporary commit"
>> which did not inspire confidence in me so I used the previous (also 4
>> moths old) branch.
>
> Yes, that's the impression I have too.
>
> I believe David was working to update the patches and his silence
> indicates he is probably bogged down with other priority work. If that's
> the case, and your still interested, I might be able to help updating
> the series some time soon. I haven't reviewed any of Val's series posts
> for a while now so I'd need to catch up with the current state of the
> project first.

I guess overlayfs includes the better part of unionmount and achieves
similar level of functionality in much smaller code size and is
actively developed.

This might make it the best candidate for inclusion so far.

It does not (yet?) support NFS which is one of the options commonly
used with union solutions, though.

I personally don;t use NFS and have not tested overlayfs so far so I can't tell.

Thanks

Michal

2011-04-14 09:40:31

by Miklos Szeredi

[permalink] [raw]
Subject: Re: Unionmount status?

On Thu, Apr 14, 2011 at 11:32 AM, Michal Suchanek <[email protected]> wrote:
> I guess overlayfs includes the better part of unionmount and achieves
> similar level of functionality in much smaller code size and is
> actively developed.
>
> This might make it the best candidate for inclusion so far.
>
> It does not (yet?) support NFS which is one of the options commonly
> used with union solutions, though.

NFS is supported as a lower (read-only) layer, but not as an upper
(read-write) layer.

Thanks,
Miklos

2011-04-14 09:48:52

by Sedat Dilek

[permalink] [raw]
Subject: Re: Unionmount status?

On Thu, Apr 14, 2011 at 10:38 AM, Miklos Szeredi <[email protected]> wrote:
> On Wed, Apr 13, 2011 at 5:13 PM, Michal Suchanek <[email protected]> wrote:
>> On 13 April 2011 16:18, Jiri Kosina <[email protected]> wrote:
>>> https://lkml.org/lkml/2011/3/22/222
>>>
>>
>> Thanks for pointing out this announcement.
>>
>> However, neither this announcement nor the document
>> Documentation/filesystems/overlayfs.txt included in the git tree
>> details how to mount the filesystem. Without that it is kind of
>> useless.
>
> Oh, I'll fix that in the docs.  Thanks for pointing it out.
>
> Here's how to mount:
>
>  mount -t overlayfs overlayfs -olowerdir=/lower -oupperdir=/upper /overlay
>
> Thanks,
> Miklos
>

Hi Miklos,

did you stop working on OverlayFS (no hear no read since 3 weeks :-))?
You made some post-v7-patchset patches, but did not publish a v8.

Also, Linus requested a finer-grained split of the big overlayfs.c
file like in the other FS.
What's the status on this?

Regards,
- Sedat -

2011-04-14 09:58:10

by Miklos Szeredi

[permalink] [raw]
Subject: Re: Unionmount status?

On Thu, Apr 14, 2011 at 11:48 AM, Sedat Dilek
<[email protected]> wrote:
> did you stop working on OverlayFS (no hear no read since 3 weeks :-))?
> You made some post-v7-patchset patches, but did not publish a v8.
>
> Also, Linus requested a finer-grained split of the big overlayfs.c
> file like in the other FS.
> What's the status on this?

I'm working on that. Will post patches shortly.

Thanks,
Miklos

2011-04-14 13:21:43

by Ric Wheeler

[permalink] [raw]
Subject: Re: Unionmount status?

On 04/14/2011 05:40 AM, Miklos Szeredi wrote:
> On Thu, Apr 14, 2011 at 11:32 AM, Michal Suchanek<[email protected]> wrote:
>> I guess overlayfs includes the better part of unionmount and achieves
>> similar level of functionality in much smaller code size and is
>> actively developed.
>>
>> This might make it the best candidate for inclusion so far.
>>
>> It does not (yet?) support NFS which is one of the options commonly
>> used with union solutions, though.
> NFS is supported as a lower (read-only) layer, but not as an upper
> (read-write) layer.
>
> Thanks,
> Miklos
I am not that concerned with the state of Val's repo, her intention was to hand
off the project cleanly to others and have them drive the code (that hand off
was the posting of the patch set). Several people (Ian, David Howells and Al
Viro) had been involved with union mounts recently, so we do have reasonable
candidates for a hand off.

One of the concerns with unionfs is the duplication of data. Union mounts avoids
this with that implementation. That might make unionfs more of a burden for very
large file systems, but probably not a concern for many use cases.

The union mount patch set is certainly considerably larger.

Christoph had expressed some concerns about locking that I think it would be
good to discuss again as well.

Ric


2011-04-14 14:55:04

by Michal Suchanek

[permalink] [raw]
Subject: Re: Unionmount status?

On 14 April 2011 15:21, Ric Wheeler <[email protected]> wrote:
> On 04/14/2011 05:40 AM, Miklos Szeredi wrote:
>>
>> On Thu, Apr 14, 2011 at 11:32 AM, Michal Suchanek<[email protected]>
>>  wrote:
>>>
>>> I guess overlayfs includes the better part of unionmount and achieves
>>> similar level of functionality in much smaller code size and is
>>> actively developed.
>>>
>>> This might make it the best candidate for inclusion so far.
>>>
>>> It does not (yet?) support NFS which is one of the options commonly
>>> used with union solutions, though.
>>
>> NFS is supported as a lower (read-only) layer, but not as an upper
>> (read-write) layer.
>>
>> Thanks,
>> Miklos
>
> I am not that concerned with the state of Val's repo, her intention was to
> hand off the project cleanly to others and have them drive the code (that
> hand off was the posting of the patch set). Several people (Ian, David
> Howells and Al Viro) had been involved with union mounts recently, so we do
> have reasonable candidates for a hand off.
>
> One of the concerns with unionfs is the duplication of data. Union mounts
> avoids this with that implementation. That might make unionfs more of a
> burden for very large file systems, but probably not a concern for many use
> cases.

Just to make things clear, what is a very large filesystem?

A heavily compressed DVD image?

Tens or hundreds of gigabytes? Terabytes?

Hundreds, thousands or hundreds of thousands of inodes?

Or is testing required to determine at what size the performance
becomes unacceptable?

Thanks

Michal

2011-04-14 19:14:33

by David Howells

[permalink] [raw]
Subject: Re: Unionmount status?

Ian Kent <[email protected]> wrote:

> I believe David was working to update the patches and his silence
> indicates he is probably bogged down with other priority work. If that's
> the case, and your still interested, I might be able to help updating
> the series some time soon. I haven't reviewed any of Val's series posts
> for a while now so I'd need to catch up with the current state of the
> project first.
>
> I guess the first thing is to find out if David has made any progress,
> David?

I started trying to forwardport them. It's not trivial because of Nick's RCU
pathwalk patches.

However, I noticed that the collection of patches I was working on wasn't the
most recent in Val's GIT tree, so I stopped work on them whilst I found out
from Val which was the correct branch to use. Her reply came just before I
jetted out to LSF, and I haven't got round to working on them again.

David

2011-04-15 11:23:04

by Michal Suchanek

[permalink] [raw]
Subject: Re: Unionmount status?

On 14 April 2011 10:38, Miklos Szeredi <[email protected]> wrote:
> On Wed, Apr 13, 2011 at 5:13 PM, Michal Suchanek <[email protected]> wrote:
>> On 13 April 2011 16:18, Jiri Kosina <[email protected]> wrote:
>>> https://lkml.org/lkml/2011/3/22/222
>>>
>>
>> Thanks for pointing out this announcement.
>>
>> However, neither this announcement nor the document
>> Documentation/filesystems/overlayfs.txt included in the git tree
>> details how to mount the filesystem. Without that it is kind of
>> useless.
>
> Oh, I'll fix that in the docs.  Thanks for pointing it out.
>
> Here's how to mount:
>
>  mount -t overlayfs overlayfs -olowerdir=/lower -oupperdir=/upper /overlay
>

OK, I built a small live CD with the v7 patch and I can boot it but I
get errors like

Cleaning up temporary files...
[ nn.nnnnnn] overlayfs: ERROR - failed to whiteout 'motd'
find: cannot delete `./motd': Operation not supported

Thanks

Michal

2011-04-15 11:31:29

by Miklos Szeredi

[permalink] [raw]
Subject: Re: Unionmount status?

On Fri, Apr 15, 2011 at 1:22 PM, Michal Suchanek <[email protected]> wrote:
> On 14 April 2011 10:38, Miklos Szeredi <[email protected]> wrote:
>> On Wed, Apr 13, 2011 at 5:13 PM, Michal Suchanek <[email protected]> wrote:
>>> On 13 April 2011 16:18, Jiri Kosina <[email protected]> wrote:
>>>> https://lkml.org/lkml/2011/3/22/222
>>>>
>>>
>>> Thanks for pointing out this announcement.
>>>
>>> However, neither this announcement nor the document
>>> Documentation/filesystems/overlayfs.txt included in the git tree
>>> details how to mount the filesystem. Without that it is kind of
>>> useless.
>>
>> Oh, I'll fix that in the docs.  Thanks for pointing it out.
>>
>> Here's how to mount:
>>
>>  mount -t overlayfs overlayfs -olowerdir=/lower -oupperdir=/upper /overlay
>>
>
> OK, I built a small live CD with the v7 patch and I can boot it but I
> get errors like
>
> Cleaning up temporary files...
> [    nn.nnnnnn] overlayfs: ERROR - failed to whiteout 'motd'
> find: cannot delete `./motd': Operation not supported

What is the upper filesystem type? Is xattr support enabled?

Thanks,
Miklos

2011-04-15 11:51:33

by Michal Suchanek

[permalink] [raw]
Subject: Re: Unionmount status?

On 15 April 2011 13:31, Miklos Szeredi <[email protected]> wrote:
> On Fri, Apr 15, 2011 at 1:22 PM, Michal Suchanek <[email protected]> wrote:
>> On 14 April 2011 10:38, Miklos Szeredi <[email protected]> wrote:
>>> On Wed, Apr 13, 2011 at 5:13 PM, Michal Suchanek <[email protected]> wrote:
>>>> On 13 April 2011 16:18, Jiri Kosina <[email protected]> wrote:
>>>>> https://lkml.org/lkml/2011/3/22/222
>>>>>
>>>>
>>>> Thanks for pointing out this announcement.
>>>>
>>>> However, neither this announcement nor the document
>>>> Documentation/filesystems/overlayfs.txt included in the git tree
>>>> details how to mount the filesystem. Without that it is kind of
>>>> useless.
>>>
>>> Oh, I'll fix that in the docs.  Thanks for pointing it out.
>>>
>>> Here's how to mount:
>>>
>>>  mount -t overlayfs overlayfs -olowerdir=/lower -oupperdir=/upper /overlay
>>>
>>
>> OK, I built a small live CD with the v7 patch and I can boot it but I
>> get errors like
>>
>> Cleaning up temporary files...
>> [    nn.nnnnnn] overlayfs: ERROR - failed to whiteout 'motd'
>> find: cannot delete `./motd': Operation not supported
>
> What is the upper filesystem type?  Is xattr support enabled?
>

The upper filesystem is tmpfs and there is not option regarding XATTR
in the config.

Thanks

Michal

2011-04-15 12:29:28

by Miklos Szeredi

[permalink] [raw]
Subject: Re: Unionmount status?

On Fri, Apr 15, 2011 at 1:51 PM, Michal Suchanek <[email protected]> wrote:
> On 15 April 2011 13:31, Miklos Szeredi <[email protected]> wrote:
>> On Fri, Apr 15, 2011 at 1:22 PM, Michal Suchanek <[email protected]> wrote:
>>>
>>> Cleaning up temporary files...
>>> [    nn.nnnnnn] overlayfs: ERROR - failed to whiteout 'motd'
>>> find: cannot delete `./motd': Operation not supported
>>
>> What is the upper filesystem type?  Is xattr support enabled?
>>
>
> The upper filesystem is tmpfs and there is not option regarding XATTR
> in the config.

Apparently tmpfs does not support generic xattr. I understand why
tmpfs is an attractive choice for an upper filesystem, so this should
be addressed.

I see two options here:

1) implement generic xattr in tmpfs
2) take whiteout/opaque support from union mounts and use that

Both have advantages and disadvantages.

Any thoughts?

Thanks,
Miklos

2011-04-15 12:34:40

by Michal Suchanek

[permalink] [raw]
Subject: Re: Unionmount status?

On 15 April 2011 14:29, Miklos Szeredi <[email protected]> wrote:
> On Fri, Apr 15, 2011 at 1:51 PM, Michal Suchanek <[email protected]> wrote:
>> On 15 April 2011 13:31, Miklos Szeredi <[email protected]> wrote:
>>> On Fri, Apr 15, 2011 at 1:22 PM, Michal Suchanek <[email protected]> wrote:
>>>>
>>>> Cleaning up temporary files...
>>>> [    nn.nnnnnn] overlayfs: ERROR - failed to whiteout 'motd'
>>>> find: cannot delete `./motd': Operation not supported
>>>
>>> What is the upper filesystem type?  Is xattr support enabled?
>>>
>>
>> The upper filesystem is tmpfs and there is not option regarding XATTR
>> in the config.
>
> Apparently tmpfs does not support generic xattr.  I understand why
> tmpfs is an attractive choice for an upper filesystem, so this should
> be addressed.
>
> I see two options here:
>
> 1) implement generic xattr in tmpfs

IMHO this is a general feature that keeps overlayfs in itself clean
and small and can benefit other uses of tmpfs.

> 2) take whiteout/opaque support from union mounts and use that

How far from importing full unionmount is that?

Thanks

Michal

2011-04-15 12:48:09

by Miklos Szeredi

[permalink] [raw]
Subject: Re: Unionmount status?

On Fri, Apr 15, 2011 at 2:34 PM, Michal Suchanek <[email protected]> wrote:
> On 15 April 2011 14:29, Miklos Szeredi <[email protected]> wrote:
>> 2) take whiteout/opaque support from union mounts and use that
>
> How far from importing full unionmount is that?

Whiteout/opaque support is quite separate from the rest of union
mounts, and could be reusable for overlayfs.

There are several reasons why I didn't want to go that way with:

- lots of filesystems would have to be updated
- it introduces incompatibility in the filesystem format, which can be
a real pain (not for tmpfs, obviously, since tmpfs doesn't have a
persistent backing)

There *are* advantages to doing whiteouts in the filesystem, for
example it makes file removal atomic. But atomicity is something that
needs to be addressed in lots of other places (e.g. copy up) not just
during whiteout, and there are other ways to do that than push the
responsibility into each and every filesystem.

Thanks,
Miklos

2011-04-15 16:31:50

by Ric Wheeler

[permalink] [raw]
Subject: Re: Unionmount status?

On 04/14/2011 10:54 AM, Michal Suchanek wrote:
> On 14 April 2011 15:21, Ric Wheeler<[email protected]> wrote:
>> On 04/14/2011 05:40 AM, Miklos Szeredi wrote:
>>> On Thu, Apr 14, 2011 at 11:32 AM, Michal Suchanek<[email protected]>
>>> wrote:
>>>> I guess overlayfs includes the better part of unionmount and achieves
>>>> similar level of functionality in much smaller code size and is
>>>> actively developed.
>>>>
>>>> This might make it the best candidate for inclusion so far.
>>>>
>>>> It does not (yet?) support NFS which is one of the options commonly
>>>> used with union solutions, though.
>>> NFS is supported as a lower (read-only) layer, but not as an upper
>>> (read-write) layer.
>>>
>>> Thanks,
>>> Miklos
>> I am not that concerned with the state of Val's repo, her intention was to
>> hand off the project cleanly to others and have them drive the code (that
>> hand off was the posting of the patch set). Several people (Ian, David
>> Howells and Al Viro) had been involved with union mounts recently, so we do
>> have reasonable candidates for a hand off.
>>
>> One of the concerns with unionfs is the duplication of data. Union mounts
>> avoids this with that implementation. That might make unionfs more of a
>> burden for very large file systems, but probably not a concern for many use
>> cases.
> Just to make things clear, what is a very large filesystem?
>
> A heavily compressed DVD image?
>
> Tens or hundreds of gigabytes? Terabytes?
>
> Hundreds, thousands or hundreds of thousands of inodes?
>
> Or is testing required to determine at what size the performance
> becomes unacceptable?
>
> Thanks
>
> Michal

Very large in the number of inodes more so than fs size...

Ric

2011-04-15 21:48:33

by Hugh Dickins

[permalink] [raw]
Subject: Re: Unionmount status?

On Fri, Apr 15, 2011 at 5:29 AM, Miklos Szeredi <[email protected]> wrote:
> On Fri, Apr 15, 2011 at 1:51 PM, Michal Suchanek <[email protected]> wrote:
>> On 15 April 2011 13:31, Miklos Szeredi <[email protected]> wrote:
>>> On Fri, Apr 15, 2011 at 1:22 PM, Michal Suchanek <[email protected]> wrote:
>>>>
>>>> Cleaning up temporary files...
>>>> [    nn.nnnnnn] overlayfs: ERROR - failed to whiteout 'motd'
>>>> find: cannot delete `./motd': Operation not supported
>>>
>>> What is the upper filesystem type?  Is xattr support enabled?
>>>
>>
>> The upper filesystem is tmpfs and there is not option regarding XATTR
>> in the config.
>
> Apparently tmpfs does not support generic xattr.  I understand why
> tmpfs is an attractive choice for an upper filesystem, so this should
> be addressed.
>
> I see two options here:
>
> 1) implement generic xattr in tmpfs
> 2) take whiteout/opaque support from union mounts and use that
>
> Both have advantages and disadvantages.
>
> Any thoughts?

Please get together with Eric: he's having trouble whipping up
enthusiasm for his tmpfs xattr patch, he and I would both appreciate
your support (and if I've persuaded him to leave out a part of it that
you would need, join forces to call me an idiot). Mainly I'm hoping
to hear from hch and jmorris on it, being xattr-illiterate myself.

https://lkml.org/lkml/2011/3/29/302

Hugh

2011-04-15 22:18:48

by Andreas Dilger

[permalink] [raw]
Subject: Re: Unionmount status?

On 2011-04-15, at 6:29 AM, Miklos Szeredi wrote:
> Apparently tmpfs does not support generic xattr. I understand why
> tmpfs is an attractive choice for an upper filesystem, so this should
> be addressed.
>
> I see two options here:
>
> 1) implement generic xattr in tmpfs

There was a patch posted recently to add xattr support to tmpfs, so that it can use security labels:

From: Eric Paris <[email protected]>
Subject: [PATCH] tmpfs: implement xattr support for the entire security namespace
Date: March 29, 2011 12:56:49 PM MDT

Cheers, Andreas




2011-04-18 13:31:54

by Michal Suchanek

[permalink] [raw]
Subject: Re: Unionmount status?

On 16 April 2011 00:18, Andreas Dilger <[email protected]> wrote:
> On 2011-04-15, at 6:29 AM, Miklos Szeredi wrote:
>> Apparently tmpfs does not support generic xattr.  I understand why
>> tmpfs is an attractive choice for an upper filesystem, so this should
>> be addressed.
>>
>> I see two options here:
>>
>> 1) implement generic xattr in tmpfs
>
> There was a patch posted recently to add xattr support to tmpfs, so that it can use security labels:
>
> From: Eric Paris <[email protected]>
> Subject: [PATCH] tmpfs: implement xattr support for the entire security namespace
> Date: March 29, 2011 12:56:49 PM MDT
>
> Cheers, Andreas

Applying this patch is not sufficient. Apparently more xattrs are
needed but adding them on top of this patch should be easy.

The ones mentioned in the overlayfs doc are

trusted.overlay.whiteout
trusted.overlay.opaque

The patch implements security.*

Thanks

Michal

2011-04-18 13:35:00

by Michal Suchanek

[permalink] [raw]
Subject: Re: Unionmount status?

On 16 April 2011 00:18, Andreas Dilger <[email protected]> wrote:
> On 2011-04-15, at 6:29 AM, Miklos Szeredi wrote:
>> Apparently tmpfs does not support generic xattr.  I understand why
>> tmpfs is an attractive choice for an upper filesystem, so this should
>> be addressed.
>>
>> I see two options here:
>>
>> 1) implement generic xattr in tmpfs
>
> There was a patch posted recently to add xattr support to tmpfs, so that it can use security labels:
>
> From: Eric Paris <[email protected]>
> Subject: [PATCH] tmpfs: implement xattr support for the entire security namespace
> Date: March 29, 2011 12:56:49 PM MDT
>
> Cheers, Andreas

Applying this patch is not sufficient. Apparently more xattrs are
needed but adding them on top of this patch should be easy.

The ones mentioned in the overlayfs doc are

trusted.overlay.whiteout
trusted.overlay.opaque

The patch implements security.*

Thanks

Michal

2011-04-18 13:38:20

by Michal Suchanek

[permalink] [raw]
Subject: Re: Unionmount status?

On 16 April 2011 00:18, Andreas Dilger <[email protected]> wrote:
> On 2011-04-15, at 6:29 AM, Miklos Szeredi wrote:
>> Apparently tmpfs does not support generic xattr.  I understand why
>> tmpfs is an attractive choice for an upper filesystem, so this should
>> be addressed.
>>
>> I see two options here:
>>
>> 1) implement generic xattr in tmpfs
>
> There was a patch posted recently to add xattr support to tmpfs, so that it can use security labels:
>
> From: Eric Paris <[email protected]>
> Subject: [PATCH] tmpfs: implement xattr support for the entire security namespace
> Date: March 29, 2011 12:56:49 PM MDT
>
> Cheers, Andreas

Applying this patch is not sufficient. Apparently more xattrs are
needed but adding them on top of this patch should be easy.

The ones mentioned in the overlayfs doc are

trusted.overlay.whiteout
trusted.overlay.opaque

The patch implements security.*

Thanks

Michal

2011-04-19 20:04:26

by Miklos Szeredi

[permalink] [raw]
Subject: [PATCH] tmpfs: implement generic xattr support

Michal Suchanek <[email protected]> writes:
> Applying this patch is not sufficient. Apparently more xattrs are
> needed but adding them on top of this patch should be easy.
>
> The ones mentioned in the overlayfs doc are
>
> trusted.overlay.whiteout
> trusted.overlay.opaque
>
> The patch implements security.*

Here's an updated patch. It changes a number of things:

- it implements shmem specific xattr ops instead of using the generic_*
functions. Which means that there's no back and forth between the
VFS and the filesystem. I basically copied the btrfs way of doing
things.

- adds a new config option: CONFIG_TMPFS_XATTR and makes
CONFIG_TMPFS_POSIX_ACL depend on this. This way xattr support can be
turned on without also adding ACL support.

- now supports trusted.* namespace needed by overlayfs in addition to
security.*. Doesn't yet support user.* since that needs more thought
wrt. kernel memory use.

- supports xattrs on symlinks, again needed by overlayfs

Hugh, Eric, does this look acceptable?

Thanks,
Miklos

---
From: Eric Paris <[email protected]>
Subject: tmpfs: implement generic xattr support

This patch implements generic xattrs for tmpfs filesystems. The feodra
project, while trying to replace suid apps with file capabilities,
realized that tmpfs, which is used on the build systems, does not
support file capabilities and thus cannot be used to build packages
which use file capabilities. Xattrs are also needed for overlayfs.

The xattr interface is a bit, odd. If a filesystem does not implement any
{get,set,list}xattr functions the VFS will call into some random LSM hooks and
the running LSM can then implement some method for handling xattrs. SELinux
for example provides a method to support security.selinux but no other
security.* xattrs.

As it stands today when one enables CONFIG_TMPFS_POSIX_ACL tmpfs will have
xattr handler routines specifically to handle acls. Because of this tmpfs
would loose the VFS/LSM helpers to support the running LSM. To make up for
that tmpfs had stub functions that did nothing but call into the LSM hooks
which implement the helpers.

This new patch does not use the LSM fallback functions and instead
just implements a native get/set/list xattr feature for the full
security.* and trusted.* namespace like a normal filesystem. This
means that tmpfs can now support both security.selinux and
security.capability, which was not previously possible.

The basic implementation is that I attach a:

struct shmem_xattr {
struct list_head list; /* anchored by shmem_inode_info->xattr_list */
char *name;
size_t size;
char value[0];
};

Into the struct shmem_inode_info for each xattr that is set. This
implementation could easily support the user.* namespace as well,
except some care needs to be taken to prevent large amounts of
unswappable memory being allocated for unprivileged users.

Signed-off-by: Eric Paris <[email protected]>
Signed-off-by: Miklos Szeredi <[email protected]>
---
fs/Kconfig | 18 ++
include/linux/shmem_fs.h | 1
mm/shmem.c | 302 +++++++++++++++++++++++++++++++++++++++--------
3 files changed, 273 insertions(+), 48 deletions(-)

Index: linux-2.6/fs/Kconfig
===================================================================
--- linux-2.6.orig/fs/Kconfig 2011-04-19 21:09:33.000000000 +0200
+++ linux-2.6/fs/Kconfig 2011-04-19 21:09:35.000000000 +0200
@@ -121,9 +121,25 @@ config TMPFS

See <file:Documentation/filesystems/tmpfs.txt> for details.

+config TMPFS_XATTR
+ bool "Tmpfs extended attributes"
+ depends on TMPFS
+ default y
+ help
+ Extended attributes are name:value pairs associated with inodes by
+ the kernel or by users (see the attr(5) manual page, or visit
+ <http://acl.bestbits.at/> for details).
+
+ Currently this enables support for the trusted.* and
+ security.* namespaces.
+
+ If unsure, say N.
+
+ You need this for POSIX ACL support on tmpfs.
+
config TMPFS_POSIX_ACL
bool "Tmpfs POSIX Access Control Lists"
- depends on TMPFS
+ depends on TMPFS_XATTR
select GENERIC_ACL
help
POSIX Access Control Lists (ACLs) support permissions for users and
Index: linux-2.6/include/linux/shmem_fs.h
===================================================================
--- linux-2.6.orig/include/linux/shmem_fs.h 2011-04-19 21:09:25.000000000 +0200
+++ linux-2.6/include/linux/shmem_fs.h 2011-04-19 21:09:35.000000000 +0200
@@ -19,6 +19,7 @@ struct shmem_inode_info {
struct page *i_indirect; /* top indirect blocks page */
swp_entry_t i_direct[SHMEM_NR_DIRECT]; /* first blocks */
struct list_head swaplist; /* chain of maybes on swap */
+ struct list_head xattr_list; /* list of shmem_xattr */
struct inode vfs_inode;
};

Index: linux-2.6/mm/shmem.c
===================================================================
--- linux-2.6.orig/mm/shmem.c 2011-04-19 21:09:25.000000000 +0200
+++ linux-2.6/mm/shmem.c 2011-04-19 21:09:35.000000000 +0200
@@ -99,6 +99,13 @@ static struct vfsmount *shm_mnt;
/* Pretend that each entry is of this size in directory's i_size */
#define BOGO_DIRENT_SIZE 20

+struct shmem_xattr {
+ struct list_head list; /* anchored by shmem_inode_info->xattr_list */
+ char *name; /* xattr name */
+ size_t size;
+ char value[0];
+};
+
/* Flag allocation requirements to shmem_getpage and shmem_swp_alloc */
enum sgp_type {
SGP_READ, /* don't exceed i_size, don't allocate page */
@@ -822,6 +829,7 @@ static int shmem_notify_change(struct de
static void shmem_evict_inode(struct inode *inode)
{
struct shmem_inode_info *info = SHMEM_I(inode);
+ struct shmem_xattr *xattr, *nxattr;

if (inode->i_mapping->a_ops == &shmem_aops) {
truncate_inode_pages(inode->i_mapping, 0);
@@ -834,6 +842,11 @@ static void shmem_evict_inode(struct ino
mutex_unlock(&shmem_swaplist_mutex);
}
}
+
+ list_for_each_entry_safe(xattr, nxattr, &info->xattr_list, list) {
+ kfree(xattr->name);
+ kfree(xattr);
+ }
BUG_ON(inode->i_blocks);
shmem_free_inode(inode->i_sb);
end_writeback(inode);
@@ -1597,6 +1610,7 @@ static struct inode *shmem_get_inode(str
spin_lock_init(&info->lock);
info->flags = flags & VM_NORESERVE;
INIT_LIST_HEAD(&info->swaplist);
+ INIT_LIST_HEAD(&info->xattr_list);
cache_no_acl(inode);

switch (mode & S_IFMT) {
@@ -2048,62 +2062,225 @@ static void shmem_put_link(struct dentry
}
}

-static const struct inode_operations shmem_symlink_inline_operations = {
- .readlink = generic_readlink,
- .follow_link = shmem_follow_link_inline,
-};
-
-static const struct inode_operations shmem_symlink_inode_operations = {
- .readlink = generic_readlink,
- .follow_link = shmem_follow_link,
- .put_link = shmem_put_link,
-};
-
-#ifdef CONFIG_TMPFS_POSIX_ACL
+#ifdef CONFIG_TMPFS_XATTR
/*
- * Superblocks without xattr inode operations will get security.* xattr
- * support from the VFS "for free". As soon as we have any other xattrs
+ * Superblocks without xattr inode operations may get some security.* xattr
+ * support from the LSM "for free". As soon as we have any other xattrs
* like ACLs, we also need to implement the security.* handlers at
* filesystem level, though.
*/

-static size_t shmem_xattr_security_list(struct dentry *dentry, char *list,
- size_t list_len, const char *name,
- size_t name_len, int handler_flags)
+static int shmem_xattr_get(struct dentry *dentry, const char *name,
+ void *buffer, size_t size)
{
- return security_inode_listsecurity(dentry->d_inode, list, list_len);
-}
+ struct shmem_inode_info *info;
+ struct shmem_xattr *xattr;
+ int ret = -ENODATA;

-static int shmem_xattr_security_get(struct dentry *dentry, const char *name,
- void *buffer, size_t size, int handler_flags)
-{
- if (strcmp(name, "") == 0)
- return -EINVAL;
- return xattr_getsecurity(dentry->d_inode, name, buffer, size);
+ info = SHMEM_I(dentry->d_inode);
+
+ spin_lock(&dentry->d_inode->i_lock);
+ list_for_each_entry(xattr, &info->xattr_list, list) {
+ if (strcmp(name, xattr->name))
+ continue;
+
+ ret = xattr->size;
+ if (buffer) {
+ if (size < xattr->size)
+ ret = -ERANGE;
+ else
+ memcpy(buffer, xattr->value, xattr->size);
+ }
+ break;
+ }
+ spin_unlock(&dentry->d_inode->i_lock);
+ return ret;
}

-static int shmem_xattr_security_set(struct dentry *dentry, const char *name,
- const void *value, size_t size, int flags, int handler_flags)
+static int shmem_xattr_set(struct dentry *dentry, const char *name,
+ const void *value, size_t size, int flags)
{
- if (strcmp(name, "") == 0)
- return -EINVAL;
- return security_inode_setsecurity(dentry->d_inode, name, value,
- size, flags);
+ struct inode *inode = dentry->d_inode;
+ struct shmem_inode_info *info = SHMEM_I(inode);
+ struct shmem_xattr *xattr;
+ struct shmem_xattr *new_xattr = NULL;
+ size_t len;
+ int err = 0;
+
+ /* value == NULL means remove */
+ if (value) {
+ /* wrap around? */
+ len = sizeof(*new_xattr) + size;
+ if (len <= sizeof(*new_xattr))
+ return -ENOMEM;
+
+ new_xattr = kmalloc(len, GFP_NOFS);
+ if (!new_xattr)
+ return -ENOMEM;
+
+ new_xattr->name = kstrdup(name, GFP_NOFS);
+ if (!new_xattr->name) {
+ kfree(new_xattr);
+ return -ENOMEM;
+ }
+
+ new_xattr->size = size;
+ memcpy(new_xattr->value, value, size);
+ }
+
+ spin_lock(&inode->i_lock);
+ list_for_each_entry(xattr, &info->xattr_list, list) {
+ if (!strcmp(name, xattr->name)) {
+ if (flags & XATTR_CREATE) {
+ xattr = new_xattr;
+ err = -EEXIST;
+ } else if (new_xattr) {
+ list_replace(&xattr->list, &new_xattr->list);
+ } else {
+ list_del(&xattr->list);
+ }
+ goto out;
+ }
+ }
+ if (flags & XATTR_REPLACE) {
+ xattr = new_xattr;
+ err = -ENODATA;
+ } else {
+ list_add(&new_xattr->list, &info->xattr_list);
+ xattr = NULL;
+ }
+out:
+ spin_unlock(&inode->i_lock);
+ if (xattr)
+ kfree(xattr->name);
+ kfree(xattr);
+ return 0;
}

-static const struct xattr_handler shmem_xattr_security_handler = {
- .prefix = XATTR_SECURITY_PREFIX,
- .list = shmem_xattr_security_list,
- .get = shmem_xattr_security_get,
- .set = shmem_xattr_security_set,
-};

static const struct xattr_handler *shmem_xattr_handlers[] = {
+#ifdef CONFIG_TMPFS_POSIX_ACL
&generic_acl_access_handler,
&generic_acl_default_handler,
- &shmem_xattr_security_handler,
+#endif
NULL
};
+
+static int shmem_xattr_validate(const char *name)
+{
+ struct { const char *prefix; size_t len; } arr[] = {
+ { XATTR_SECURITY_PREFIX, XATTR_SECURITY_PREFIX_LEN },
+ { XATTR_TRUSTED_PREFIX, XATTR_TRUSTED_PREFIX_LEN }};
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(arr); i++) {
+ size_t preflen = arr[i].len;
+ if (strncmp(name, arr[i].prefix, preflen) == 0) {
+ if (!name[preflen])
+ return -EINVAL;
+ return 0;
+ }
+ }
+ return -EOPNOTSUPP;
+}
+
+static ssize_t shmem_getxattr(struct dentry *dentry, const char *name,
+ void *buffer, size_t size)
+{
+ int err;
+
+ /*
+ * If this is a request for a synthetic attribute in the system.*
+ * namespace use the generic infrastructure to resolve a handler
+ * for it via sb->s_xattr.
+ */
+ if (!strncmp(name, XATTR_SYSTEM_PREFIX, XATTR_SYSTEM_PREFIX_LEN))
+ return generic_getxattr(dentry, name, buffer, size);
+
+ err = shmem_xattr_validate(name);
+ if (err)
+ return err;
+
+ return shmem_xattr_get(dentry, name, buffer, size);
+}
+
+static int shmem_setxattr(struct dentry *dentry, const char *name,
+ const void *value, size_t size, int flags)
+{
+ int err;
+
+ /*
+ * If this is a request for a synthetic attribute in the system.*
+ * namespace use the generic infrastructure to resolve a handler
+ * for it via sb->s_xattr.
+ */
+ if (!strncmp(name, XATTR_SYSTEM_PREFIX, XATTR_SYSTEM_PREFIX_LEN))
+ return generic_setxattr(dentry, name, value, size, flags);
+
+ err = shmem_xattr_validate(name);
+ if (err)
+ return err;
+
+ if (size == 0)
+ value = ""; /* empty EA, do not remove */
+
+ return shmem_xattr_set(dentry, name, value, size, flags);
+
+}
+
+static int shmem_removexattr(struct dentry *dentry, const char *name)
+{
+ int err;
+
+ /*
+ * If this is a request for a synthetic attribute in the system.*
+ * namespace use the generic infrastructure to resolve a handler
+ * for it via sb->s_xattr.
+ */
+ if (!strncmp(name, XATTR_SYSTEM_PREFIX, XATTR_SYSTEM_PREFIX_LEN))
+ return generic_removexattr(dentry, name);
+
+ err = shmem_xattr_validate(name);
+ if (err)
+ return err;
+
+ return shmem_xattr_set(dentry, name, NULL, 0, XATTR_REPLACE);
+}
+
+static bool xattr_is_trusted(const char *name)
+{
+ return !strncmp(name, XATTR_TRUSTED_PREFIX, XATTR_TRUSTED_PREFIX_LEN);
+}
+
+static ssize_t shmem_listxattr(struct dentry *dentry, char *buffer, size_t size)
+{
+ bool trusted = capable(CAP_SYS_ADMIN);
+ struct shmem_xattr *xattr;
+ struct shmem_inode_info *info;
+ size_t used = 0;
+
+ info = SHMEM_I(dentry->d_inode);
+
+ spin_lock(&dentry->d_inode->i_lock);
+ list_for_each_entry(xattr, &info->xattr_list, list) {
+ /* skip "trusted." attributes for unprivileged callers */
+ if (!trusted && xattr_is_trusted(xattr->name))
+ continue;
+
+ used += strlen(xattr->name) + 1;
+ if (buffer) {
+ if (size < used) {
+ used = -ERANGE;
+ break;
+ }
+ strncpy(buffer, xattr->name, strlen(xattr->name) + 1);
+ buffer += strlen(xattr->name) + 1;
+ }
+ }
+ spin_unlock(&dentry->d_inode->i_lock);
+
+ return used;
+}
#endif

static struct dentry *shmem_get_parent(struct dentry *child)
@@ -2384,8 +2561,10 @@ int shmem_fill_super(struct super_block
sb->s_magic = TMPFS_MAGIC;
sb->s_op = &shmem_ops;
sb->s_time_gran = 1;
-#ifdef CONFIG_TMPFS_POSIX_ACL
+#ifdef CONFIG_TMPFS_XATTR
sb->s_xattr = shmem_xattr_handlers;
+#endif
+#ifdef CONFIG_TMPFS_POSIX_ACL
sb->s_flags |= MS_POSIXACL;
#endif

@@ -2483,16 +2662,41 @@ static const struct file_operations shme
static const struct inode_operations shmem_inode_operations = {
.setattr = shmem_notify_change,
.truncate_range = shmem_truncate_range,
+#ifdef CONFIG_TMPFS_XATTR
+ .setxattr = shmem_setxattr,
+ .getxattr = shmem_getxattr,
+ .listxattr = shmem_listxattr,
+ .removexattr = shmem_removexattr,
+#endif
#ifdef CONFIG_TMPFS_POSIX_ACL
- .setxattr = generic_setxattr,
- .getxattr = generic_getxattr,
- .listxattr = generic_listxattr,
- .removexattr = generic_removexattr,
.check_acl = generic_check_acl,
#endif

};

+static const struct inode_operations shmem_symlink_inline_operations = {
+ .readlink = generic_readlink,
+ .follow_link = shmem_follow_link_inline,
+#ifdef CONFIG_TMPFS_XATTR
+ .setxattr = shmem_setxattr,
+ .getxattr = shmem_getxattr,
+ .listxattr = shmem_listxattr,
+ .removexattr = shmem_removexattr,
+#endif
+};
+
+static const struct inode_operations shmem_symlink_inode_operations = {
+ .readlink = generic_readlink,
+ .follow_link = shmem_follow_link,
+ .put_link = shmem_put_link,
+#ifdef CONFIG_TMPFS_XATTR
+ .setxattr = shmem_setxattr,
+ .getxattr = shmem_getxattr,
+ .listxattr = shmem_listxattr,
+ .removexattr = shmem_removexattr,
+#endif
+};
+
static const struct inode_operations shmem_dir_inode_operations = {
#ifdef CONFIG_TMPFS
.create = shmem_create,
@@ -2505,23 +2709,27 @@ static const struct inode_operations shm
.mknod = shmem_mknod,
.rename = shmem_rename,
#endif
-#ifdef CONFIG_TMPFS_POSIX_ACL
- .setattr = shmem_notify_change,
+#ifdef CONFIG_TMPFS_XATTR
.setxattr = generic_setxattr,
.getxattr = generic_getxattr,
.listxattr = generic_listxattr,
.removexattr = generic_removexattr,
+#endif
+#ifdef CONFIG_TMPFS_POSIX_ACL
+ .setattr = shmem_notify_change,
.check_acl = generic_check_acl,
#endif
};

static const struct inode_operations shmem_special_inode_operations = {
-#ifdef CONFIG_TMPFS_POSIX_ACL
- .setattr = shmem_notify_change,
+#ifdef CONFIG_TMPFS_XATTR
.setxattr = generic_setxattr,
.getxattr = generic_getxattr,
.listxattr = generic_listxattr,
.removexattr = generic_removexattr,
+#endif
+#ifdef CONFIG_TMPFS_POSIX_ACL
+ .setattr = shmem_notify_change,
.check_acl = generic_check_acl,
#endif
};

2011-04-20 02:18:24

by Phillip Lougher

[permalink] [raw]
Subject: Re: [PATCH] tmpfs: implement generic xattr support

On Tue, Apr 19, 2011 at 9:04 PM, Miklos Szeredi <[email protected]> wrote:

> +static ssize_t shmem_listxattr(struct dentry *dentry, char *buffer, size_t size)

> + ? ? ? spin_lock(&dentry->d_inode->i_lock);
> + ? ? ? list_for_each_entry(xattr, &info->xattr_list, list) {
> + ? ? ? ? ? ? ? /* skip "trusted." attributes for unprivileged callers */
> + ? ? ? ? ? ? ? if (!trusted && xattr_is_trusted(xattr->name))
> + ? ? ? ? ? ? ? ? ? ? ? continue;
> +
> + ? ? ? ? ? ? ? used += strlen(xattr->name) + 1;
> + ? ? ? ? ? ? ? if (buffer) {
> + ? ? ? ? ? ? ? ? ? ? ? if (size < used) {
> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? used = -ERANGE;
> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? break;
> + ? ? ? ? ? ? ? ? ? ? ? }
> + ? ? ? ? ? ? ? ? ? ? ? strncpy(buffer, xattr->name, strlen(xattr->name) + 1);
>+ buffer += strlen(xattr->name) + 1;

Why are you doing a strncpy here? strcpy() isn't going to copy more
than strlen(xattr->name) + 1 bytes, and you know buffer is large
enough to hold that because of the previous if (size < used) check?

If you assigned the first strlen(xattr->name) + 1 to a temporary
variable, you could use memcpy here, and avoid the 3 repeated
strlen(xattr->name) calls.

Phillip

2011-04-20 13:43:29

by Miklos Szeredi

[permalink] [raw]
Subject: Re: [PATCH] tmpfs: implement generic xattr support

Phillip Lougher <[email protected]> writes:

> On Tue, Apr 19, 2011 at 9:04 PM, Miklos Szeredi <[email protected]> wrote:
>
>> +static ssize_t shmem_listxattr(struct dentry *dentry, char *buffer, size_t size)
>
>> +       spin_lock(&dentry->d_inode->i_lock);
>> +       list_for_each_entry(xattr, &info->xattr_list, list) {
>> +               /* skip "trusted." attributes for unprivileged callers */
>> +               if (!trusted && xattr_is_trusted(xattr->name))
>> +                       continue;
>> +
>> +               used += strlen(xattr->name) + 1;
>> +               if (buffer) {
>> +                       if (size < used) {
>> +                               used = -ERANGE;
>> +                               break;
>> +                       }
>> +                       strncpy(buffer, xattr->name, strlen(xattr->name) + 1);
>>+ buffer += strlen(xattr->name) + 1;
>
> Why are you doing a strncpy here? strcpy() isn't going to copy more
> than strlen(xattr->name) + 1 bytes, and you know buffer is large
> enough to hold that because of the previous if (size < used) check?
>
> If you assigned the first strlen(xattr->name) + 1 to a temporary
> variable, you could use memcpy here, and avoid the 3 repeated
> strlen(xattr->name) calls.

Yeah, makes sense.

Thanks,
Miklos

2011-04-20 16:02:10

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [PATCH] tmpfs: implement generic xattr support

Quoting Miklos Szeredi ([email protected]):
> Michal Suchanek <[email protected]> writes:
> > Applying this patch is not sufficient. Apparently more xattrs are
> > needed but adding them on top of this patch should be easy.
> >
> > The ones mentioned in the overlayfs doc are
> >
> > trusted.overlay.whiteout
> > trusted.overlay.opaque
> >
> > The patch implements security.*
>
> Here's an updated patch. It changes a number of things:
>
> - it implements shmem specific xattr ops instead of using the generic_*
> functions. Which means that there's no back and forth between the
> VFS and the filesystem. I basically copied the btrfs way of doing
> things.
>
> - adds a new config option: CONFIG_TMPFS_XATTR and makes
> CONFIG_TMPFS_POSIX_ACL depend on this. This way xattr support can be
> turned on without also adding ACL support.
>
> - now supports trusted.* namespace needed by overlayfs in addition to
> security.*. Doesn't yet support user.* since that needs more thought
> wrt. kernel memory use.
>
> - supports xattrs on symlinks, again needed by overlayfs
>
> Hugh, Eric, does this look acceptable?
>
> Thanks,
> Miklos
>
> ---
> From: Eric Paris <[email protected]>
> Subject: tmpfs: implement generic xattr support
>
> This patch implements generic xattrs for tmpfs filesystems. The feodra
> project, while trying to replace suid apps with file capabilities,
> realized that tmpfs, which is used on the build systems, does not
> support file capabilities and thus cannot be used to build packages
> which use file capabilities. Xattrs are also needed for overlayfs.
>
> The xattr interface is a bit, odd. If a filesystem does not implement any
> {get,set,list}xattr functions the VFS will call into some random LSM hooks and
> the running LSM can then implement some method for handling xattrs. SELinux
> for example provides a method to support security.selinux but no other
> security.* xattrs.
>
> As it stands today when one enables CONFIG_TMPFS_POSIX_ACL tmpfs will have
> xattr handler routines specifically to handle acls. Because of this tmpfs
> would loose the VFS/LSM helpers to support the running LSM. To make up for
> that tmpfs had stub functions that did nothing but call into the LSM hooks
> which implement the helpers.
>
> This new patch does not use the LSM fallback functions and instead
> just implements a native get/set/list xattr feature for the full
> security.* and trusted.* namespace like a normal filesystem. This
> means that tmpfs can now support both security.selinux and
> security.capability, which was not previously possible.
>
> The basic implementation is that I attach a:
>
> struct shmem_xattr {
> struct list_head list; /* anchored by shmem_inode_info->xattr_list */
> char *name;
> size_t size;
> char value[0];
> };
>
> Into the struct shmem_inode_info for each xattr that is set. This
> implementation could easily support the user.* namespace as well,
> except some care needs to be taken to prevent large amounts of
> unswappable memory being allocated for unprivileged users.
>
> Signed-off-by: Eric Paris <[email protected]>
> Signed-off-by: Miklos Szeredi <[email protected]>

Looks good to me.

Acked-by: Serge Hallyn <[email protected]>

thanks,
-serge

> ---
> fs/Kconfig | 18 ++
> include/linux/shmem_fs.h | 1
> mm/shmem.c | 302 +++++++++++++++++++++++++++++++++++++++--------
> 3 files changed, 273 insertions(+), 48 deletions(-)
>
> Index: linux-2.6/fs/Kconfig
> ===================================================================
> --- linux-2.6.orig/fs/Kconfig 2011-04-19 21:09:33.000000000 +0200
> +++ linux-2.6/fs/Kconfig 2011-04-19 21:09:35.000000000 +0200
> @@ -121,9 +121,25 @@ config TMPFS
>
> See <file:Documentation/filesystems/tmpfs.txt> for details.
>
> +config TMPFS_XATTR
> + bool "Tmpfs extended attributes"
> + depends on TMPFS
> + default y
> + help
> + Extended attributes are name:value pairs associated with inodes by
> + the kernel or by users (see the attr(5) manual page, or visit
> + <http://acl.bestbits.at/> for details).
> +
> + Currently this enables support for the trusted.* and
> + security.* namespaces.
> +
> + If unsure, say N.
> +
> + You need this for POSIX ACL support on tmpfs.
> +
> config TMPFS_POSIX_ACL
> bool "Tmpfs POSIX Access Control Lists"
> - depends on TMPFS
> + depends on TMPFS_XATTR
> select GENERIC_ACL
> help
> POSIX Access Control Lists (ACLs) support permissions for users and
> Index: linux-2.6/include/linux/shmem_fs.h
> ===================================================================
> --- linux-2.6.orig/include/linux/shmem_fs.h 2011-04-19 21:09:25.000000000 +0200
> +++ linux-2.6/include/linux/shmem_fs.h 2011-04-19 21:09:35.000000000 +0200
> @@ -19,6 +19,7 @@ struct shmem_inode_info {
> struct page *i_indirect; /* top indirect blocks page */
> swp_entry_t i_direct[SHMEM_NR_DIRECT]; /* first blocks */
> struct list_head swaplist; /* chain of maybes on swap */
> + struct list_head xattr_list; /* list of shmem_xattr */
> struct inode vfs_inode;
> };
>
> Index: linux-2.6/mm/shmem.c
> ===================================================================
> --- linux-2.6.orig/mm/shmem.c 2011-04-19 21:09:25.000000000 +0200
> +++ linux-2.6/mm/shmem.c 2011-04-19 21:09:35.000000000 +0200
> @@ -99,6 +99,13 @@ static struct vfsmount *shm_mnt;
> /* Pretend that each entry is of this size in directory's i_size */
> #define BOGO_DIRENT_SIZE 20
>
> +struct shmem_xattr {
> + struct list_head list; /* anchored by shmem_inode_info->xattr_list */
> + char *name; /* xattr name */
> + size_t size;
> + char value[0];
> +};
> +
> /* Flag allocation requirements to shmem_getpage and shmem_swp_alloc */
> enum sgp_type {
> SGP_READ, /* don't exceed i_size, don't allocate page */
> @@ -822,6 +829,7 @@ static int shmem_notify_change(struct de
> static void shmem_evict_inode(struct inode *inode)
> {
> struct shmem_inode_info *info = SHMEM_I(inode);
> + struct shmem_xattr *xattr, *nxattr;
>
> if (inode->i_mapping->a_ops == &shmem_aops) {
> truncate_inode_pages(inode->i_mapping, 0);
> @@ -834,6 +842,11 @@ static void shmem_evict_inode(struct ino
> mutex_unlock(&shmem_swaplist_mutex);
> }
> }
> +
> + list_for_each_entry_safe(xattr, nxattr, &info->xattr_list, list) {
> + kfree(xattr->name);
> + kfree(xattr);
> + }
> BUG_ON(inode->i_blocks);
> shmem_free_inode(inode->i_sb);
> end_writeback(inode);
> @@ -1597,6 +1610,7 @@ static struct inode *shmem_get_inode(str
> spin_lock_init(&info->lock);
> info->flags = flags & VM_NORESERVE;
> INIT_LIST_HEAD(&info->swaplist);
> + INIT_LIST_HEAD(&info->xattr_list);
> cache_no_acl(inode);
>
> switch (mode & S_IFMT) {
> @@ -2048,62 +2062,225 @@ static void shmem_put_link(struct dentry
> }
> }
>
> -static const struct inode_operations shmem_symlink_inline_operations = {
> - .readlink = generic_readlink,
> - .follow_link = shmem_follow_link_inline,
> -};
> -
> -static const struct inode_operations shmem_symlink_inode_operations = {
> - .readlink = generic_readlink,
> - .follow_link = shmem_follow_link,
> - .put_link = shmem_put_link,
> -};
> -
> -#ifdef CONFIG_TMPFS_POSIX_ACL
> +#ifdef CONFIG_TMPFS_XATTR
> /*
> - * Superblocks without xattr inode operations will get security.* xattr
> - * support from the VFS "for free". As soon as we have any other xattrs
> + * Superblocks without xattr inode operations may get some security.* xattr
> + * support from the LSM "for free". As soon as we have any other xattrs
> * like ACLs, we also need to implement the security.* handlers at
> * filesystem level, though.
> */
>
> -static size_t shmem_xattr_security_list(struct dentry *dentry, char *list,
> - size_t list_len, const char *name,
> - size_t name_len, int handler_flags)
> +static int shmem_xattr_get(struct dentry *dentry, const char *name,
> + void *buffer, size_t size)
> {
> - return security_inode_listsecurity(dentry->d_inode, list, list_len);
> -}
> + struct shmem_inode_info *info;
> + struct shmem_xattr *xattr;
> + int ret = -ENODATA;
>
> -static int shmem_xattr_security_get(struct dentry *dentry, const char *name,
> - void *buffer, size_t size, int handler_flags)
> -{
> - if (strcmp(name, "") == 0)
> - return -EINVAL;
> - return xattr_getsecurity(dentry->d_inode, name, buffer, size);
> + info = SHMEM_I(dentry->d_inode);
> +
> + spin_lock(&dentry->d_inode->i_lock);
> + list_for_each_entry(xattr, &info->xattr_list, list) {
> + if (strcmp(name, xattr->name))
> + continue;
> +
> + ret = xattr->size;
> + if (buffer) {
> + if (size < xattr->size)
> + ret = -ERANGE;
> + else
> + memcpy(buffer, xattr->value, xattr->size);
> + }
> + break;
> + }
> + spin_unlock(&dentry->d_inode->i_lock);
> + return ret;
> }
>
> -static int shmem_xattr_security_set(struct dentry *dentry, const char *name,
> - const void *value, size_t size, int flags, int handler_flags)
> +static int shmem_xattr_set(struct dentry *dentry, const char *name,
> + const void *value, size_t size, int flags)
> {
> - if (strcmp(name, "") == 0)
> - return -EINVAL;
> - return security_inode_setsecurity(dentry->d_inode, name, value,
> - size, flags);
> + struct inode *inode = dentry->d_inode;
> + struct shmem_inode_info *info = SHMEM_I(inode);
> + struct shmem_xattr *xattr;
> + struct shmem_xattr *new_xattr = NULL;
> + size_t len;
> + int err = 0;
> +
> + /* value == NULL means remove */
> + if (value) {
> + /* wrap around? */
> + len = sizeof(*new_xattr) + size;
> + if (len <= sizeof(*new_xattr))
> + return -ENOMEM;
> +
> + new_xattr = kmalloc(len, GFP_NOFS);
> + if (!new_xattr)
> + return -ENOMEM;
> +
> + new_xattr->name = kstrdup(name, GFP_NOFS);
> + if (!new_xattr->name) {
> + kfree(new_xattr);
> + return -ENOMEM;
> + }
> +
> + new_xattr->size = size;
> + memcpy(new_xattr->value, value, size);
> + }
> +
> + spin_lock(&inode->i_lock);
> + list_for_each_entry(xattr, &info->xattr_list, list) {
> + if (!strcmp(name, xattr->name)) {
> + if (flags & XATTR_CREATE) {
> + xattr = new_xattr;
> + err = -EEXIST;
> + } else if (new_xattr) {
> + list_replace(&xattr->list, &new_xattr->list);
> + } else {
> + list_del(&xattr->list);
> + }
> + goto out;
> + }
> + }
> + if (flags & XATTR_REPLACE) {
> + xattr = new_xattr;
> + err = -ENODATA;
> + } else {
> + list_add(&new_xattr->list, &info->xattr_list);
> + xattr = NULL;
> + }
> +out:
> + spin_unlock(&inode->i_lock);
> + if (xattr)
> + kfree(xattr->name);
> + kfree(xattr);
> + return 0;
> }
>
> -static const struct xattr_handler shmem_xattr_security_handler = {
> - .prefix = XATTR_SECURITY_PREFIX,
> - .list = shmem_xattr_security_list,
> - .get = shmem_xattr_security_get,
> - .set = shmem_xattr_security_set,
> -};
>
> static const struct xattr_handler *shmem_xattr_handlers[] = {
> +#ifdef CONFIG_TMPFS_POSIX_ACL
> &generic_acl_access_handler,
> &generic_acl_default_handler,
> - &shmem_xattr_security_handler,
> +#endif
> NULL
> };
> +
> +static int shmem_xattr_validate(const char *name)
> +{
> + struct { const char *prefix; size_t len; } arr[] = {
> + { XATTR_SECURITY_PREFIX, XATTR_SECURITY_PREFIX_LEN },
> + { XATTR_TRUSTED_PREFIX, XATTR_TRUSTED_PREFIX_LEN }};
> + int i;
> +
> + for (i = 0; i < ARRAY_SIZE(arr); i++) {
> + size_t preflen = arr[i].len;
> + if (strncmp(name, arr[i].prefix, preflen) == 0) {
> + if (!name[preflen])
> + return -EINVAL;
> + return 0;
> + }
> + }
> + return -EOPNOTSUPP;
> +}
> +
> +static ssize_t shmem_getxattr(struct dentry *dentry, const char *name,
> + void *buffer, size_t size)
> +{
> + int err;
> +
> + /*
> + * If this is a request for a synthetic attribute in the system.*
> + * namespace use the generic infrastructure to resolve a handler
> + * for it via sb->s_xattr.
> + */
> + if (!strncmp(name, XATTR_SYSTEM_PREFIX, XATTR_SYSTEM_PREFIX_LEN))
> + return generic_getxattr(dentry, name, buffer, size);
> +
> + err = shmem_xattr_validate(name);
> + if (err)
> + return err;
> +
> + return shmem_xattr_get(dentry, name, buffer, size);
> +}
> +
> +static int shmem_setxattr(struct dentry *dentry, const char *name,
> + const void *value, size_t size, int flags)
> +{
> + int err;
> +
> + /*
> + * If this is a request for a synthetic attribute in the system.*
> + * namespace use the generic infrastructure to resolve a handler
> + * for it via sb->s_xattr.
> + */
> + if (!strncmp(name, XATTR_SYSTEM_PREFIX, XATTR_SYSTEM_PREFIX_LEN))
> + return generic_setxattr(dentry, name, value, size, flags);
> +
> + err = shmem_xattr_validate(name);
> + if (err)
> + return err;
> +
> + if (size == 0)
> + value = ""; /* empty EA, do not remove */
> +
> + return shmem_xattr_set(dentry, name, value, size, flags);
> +
> +}
> +
> +static int shmem_removexattr(struct dentry *dentry, const char *name)
> +{
> + int err;
> +
> + /*
> + * If this is a request for a synthetic attribute in the system.*
> + * namespace use the generic infrastructure to resolve a handler
> + * for it via sb->s_xattr.
> + */
> + if (!strncmp(name, XATTR_SYSTEM_PREFIX, XATTR_SYSTEM_PREFIX_LEN))
> + return generic_removexattr(dentry, name);
> +
> + err = shmem_xattr_validate(name);
> + if (err)
> + return err;
> +
> + return shmem_xattr_set(dentry, name, NULL, 0, XATTR_REPLACE);
> +}
> +
> +static bool xattr_is_trusted(const char *name)
> +{
> + return !strncmp(name, XATTR_TRUSTED_PREFIX, XATTR_TRUSTED_PREFIX_LEN);
> +}
> +
> +static ssize_t shmem_listxattr(struct dentry *dentry, char *buffer, size_t size)
> +{
> + bool trusted = capable(CAP_SYS_ADMIN);
> + struct shmem_xattr *xattr;
> + struct shmem_inode_info *info;
> + size_t used = 0;
> +
> + info = SHMEM_I(dentry->d_inode);
> +
> + spin_lock(&dentry->d_inode->i_lock);
> + list_for_each_entry(xattr, &info->xattr_list, list) {
> + /* skip "trusted." attributes for unprivileged callers */
> + if (!trusted && xattr_is_trusted(xattr->name))
> + continue;
> +
> + used += strlen(xattr->name) + 1;
> + if (buffer) {
> + if (size < used) {
> + used = -ERANGE;
> + break;
> + }
> + strncpy(buffer, xattr->name, strlen(xattr->name) + 1);
> + buffer += strlen(xattr->name) + 1;
> + }
> + }
> + spin_unlock(&dentry->d_inode->i_lock);
> +
> + return used;
> +}
> #endif
>
> static struct dentry *shmem_get_parent(struct dentry *child)
> @@ -2384,8 +2561,10 @@ int shmem_fill_super(struct super_block
> sb->s_magic = TMPFS_MAGIC;
> sb->s_op = &shmem_ops;
> sb->s_time_gran = 1;
> -#ifdef CONFIG_TMPFS_POSIX_ACL
> +#ifdef CONFIG_TMPFS_XATTR
> sb->s_xattr = shmem_xattr_handlers;
> +#endif
> +#ifdef CONFIG_TMPFS_POSIX_ACL
> sb->s_flags |= MS_POSIXACL;
> #endif
>
> @@ -2483,16 +2662,41 @@ static const struct file_operations shme
> static const struct inode_operations shmem_inode_operations = {
> .setattr = shmem_notify_change,
> .truncate_range = shmem_truncate_range,
> +#ifdef CONFIG_TMPFS_XATTR
> + .setxattr = shmem_setxattr,
> + .getxattr = shmem_getxattr,
> + .listxattr = shmem_listxattr,
> + .removexattr = shmem_removexattr,
> +#endif
> #ifdef CONFIG_TMPFS_POSIX_ACL
> - .setxattr = generic_setxattr,
> - .getxattr = generic_getxattr,
> - .listxattr = generic_listxattr,
> - .removexattr = generic_removexattr,
> .check_acl = generic_check_acl,
> #endif
>
> };
>
> +static const struct inode_operations shmem_symlink_inline_operations = {
> + .readlink = generic_readlink,
> + .follow_link = shmem_follow_link_inline,
> +#ifdef CONFIG_TMPFS_XATTR
> + .setxattr = shmem_setxattr,
> + .getxattr = shmem_getxattr,
> + .listxattr = shmem_listxattr,
> + .removexattr = shmem_removexattr,
> +#endif
> +};
> +
> +static const struct inode_operations shmem_symlink_inode_operations = {
> + .readlink = generic_readlink,
> + .follow_link = shmem_follow_link,
> + .put_link = shmem_put_link,
> +#ifdef CONFIG_TMPFS_XATTR
> + .setxattr = shmem_setxattr,
> + .getxattr = shmem_getxattr,
> + .listxattr = shmem_listxattr,
> + .removexattr = shmem_removexattr,
> +#endif
> +};
> +
> static const struct inode_operations shmem_dir_inode_operations = {
> #ifdef CONFIG_TMPFS
> .create = shmem_create,
> @@ -2505,23 +2709,27 @@ static const struct inode_operations shm
> .mknod = shmem_mknod,
> .rename = shmem_rename,
> #endif
> -#ifdef CONFIG_TMPFS_POSIX_ACL
> - .setattr = shmem_notify_change,
> +#ifdef CONFIG_TMPFS_XATTR
> .setxattr = generic_setxattr,
> .getxattr = generic_getxattr,
> .listxattr = generic_listxattr,
> .removexattr = generic_removexattr,
> +#endif
> +#ifdef CONFIG_TMPFS_POSIX_ACL
> + .setattr = shmem_notify_change,
> .check_acl = generic_check_acl,
> #endif
> };
>
> static const struct inode_operations shmem_special_inode_operations = {
> -#ifdef CONFIG_TMPFS_POSIX_ACL
> - .setattr = shmem_notify_change,
> +#ifdef CONFIG_TMPFS_XATTR
> .setxattr = generic_setxattr,
> .getxattr = generic_getxattr,
> .listxattr = generic_listxattr,
> .removexattr = generic_removexattr,
> +#endif
> +#ifdef CONFIG_TMPFS_POSIX_ACL
> + .setattr = shmem_notify_change,
> .check_acl = generic_check_acl,
> #endif
> };
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2011-04-21 07:00:17

by Michal Suchanek

[permalink] [raw]
Subject: Re: [PATCH] tmpfs: implement generic xattr support

On 20 April 2011 15:43, Miklos Szeredi <[email protected]> wrote:
> Phillip Lougher <[email protected]> writes:
>
>> On Tue, Apr 19, 2011 at 9:04 PM, Miklos Szeredi <[email protected]> wrote:
>>
>>> +static ssize_t shmem_listxattr(struct dentry *dentry, char *buffer, size_t size)
>>
>>> +       spin_lock(&dentry->d_inode->i_lock);
>>> +       list_for_each_entry(xattr, &info->xattr_list, list) {
>>> +               /* skip "trusted." attributes for unprivileged callers */
>>> +               if (!trusted && xattr_is_trusted(xattr->name))
>>> +                       continue;
>>> +
>>> +               used += strlen(xattr->name) + 1;
>>> +               if (buffer) {
>>> +                       if (size < used) {
>>> +                               used = -ERANGE;
>>> +                               break;
>>> +                       }
>>> +                       strncpy(buffer, xattr->name, strlen(xattr->name) + 1);
>>>+                        buffer += strlen(xattr->name) + 1;
>>
>> Why are you doing a strncpy here?  strcpy() isn't going to copy more
>> than strlen(xattr->name) + 1 bytes, and you know buffer is large
>> enough to hold that because of the previous if (size < used) check?
>>
>> If you assigned the first strlen(xattr->name) + 1 to a temporary
>> variable, you could use memcpy here, and avoid the 3 repeated
>> strlen(xattr->name) calls.
>
> Yeah, makes sense.
>

Can you put up a branch with both overlayfs and xattrs?

I would like to point people at a repo with working implementation and
mirroring the kernel tree to add one patch is a bit overkill.

Thanks

Michal

2011-04-21 09:08:18

by Miklos Szeredi

[permalink] [raw]
Subject: Re: [PATCH] tmpfs: implement generic xattr support

Michal Suchanek <[email protected]> writes:

> Can you put up a branch with both overlayfs and xattrs?
>
> I would like to point people at a repo with working implementation and
> mirroring the kernel tree to add one patch is a bit overkill.

Okay, added the tmpfs-xattr patch to the overlayfs.v8 branch.

Thanks,
Miklos

2011-04-21 10:59:23

by Michal Suchanek

[permalink] [raw]
Subject: Re: [PATCH] tmpfs: implement generic xattr support

On 21 April 2011 11:08, Miklos Szeredi <[email protected]> wrote:
> Michal Suchanek <[email protected]> writes:
>
>> Can you put up a branch with both overlayfs and xattrs?
>>
>> I would like to point people at a repo with working implementation and
>> mirroring the kernel tree to add one patch is a bit overkill.
>
> Okay, added the tmpfs-xattr patch to the overlayfs.v8 branch.
>

Okay, I built a small live system using this branch and it boots without errors.

Now it's time to build a larger image I guess.

Thanks

Michal

2011-04-21 14:58:44

by Jordi Pujol

[permalink] [raw]
Subject: Re: [PATCH] tmpfs: implement generic xattr support

A Dijous 21 Abril 2011 12:59:01, Michal Suchanek va escriure:
> Now it's time to build a larger image I guess.

Hello,

I have done it, building a desktop image that contains kde4 and some services,
The tests show that tmpfs is broken, it makes first operations but at some
point it becomes erratic; that occurs in non-persistent mode, where
overlayfs upperdir is on tmpfs, and in persistent mode also but in this case
more user working is needed before tmpfs failure,

in Linux single user mode I have installed the nvidia packages and driver
using dpkg, only some files are installed, therefore I tried creating manually
a required directory and in this manner the process has done some steps more.

kernel source:

http://livenet.selfip.com/ftp/debian/linux-2.6/linux-2.6.38-3.jpp.3-
lnet_2.6.38-15.tar.bz2

logs:

http://livenet.selfip.com/ftp/debian/overlayfs/test-tmpfs-xattrs.tar.bz2

thanks,

Jordi Pujol

Live never ending Tale
GNU/Linux Live forever!
http://livenet.selfip.com

2011-04-21 15:22:52

by Michal Suchanek

[permalink] [raw]
Subject: Re: [PATCH] tmpfs: implement generic xattr support

On 21 April 2011 16:58, Jordi Pujol <[email protected]> wrote:
> A Dijous 21 Abril 2011 12:59:01, Michal Suchanek va escriure:
>> Now it's time to build a larger image I guess.
>
> Hello,
>
> I have done it, building a desktop image that contains kde4 and some services,
> The tests show that tmpfs is broken, it makes first operations but at some
> point it becomes erratic; that occurs in non-persistent mode, where
> overlayfs upperdir is on tmpfs, and in persistent mode also but in this case
> more user working is needed before tmpfs failure,
>
> in Linux single user mode I have installed the nvidia packages and driver
> using dpkg, only some files are installed, therefore I tried creating manually
> a required directory and in this manner the process has done some steps more.
>
> kernel source:
>
> http://livenet.selfip.com/ftp/debian/linux-2.6/linux-2.6.38-3.jpp.3-
> lnet_2.6.38-15.tar.bz2
>
> logs:
>
> http://livenet.selfip.com/ftp/debian/overlayfs/test-tmpfs-xattrs.tar.bz2
>

Yes, while the system appears to run generally fine unpacking dpkg
packages fails.

Thanks

Michal

2011-04-21 15:43:49

by Michal Suchanek

[permalink] [raw]
Subject: Re: [PATCH] tmpfs: implement generic xattr support

On 21 April 2011 17:22, Michal Suchanek <[email protected]> wrote:
> On 21 April 2011 16:58, Jordi Pujol <[email protected]> wrote:
>> A Dijous 21 Abril 2011 12:59:01, Michal Suchanek va escriure:
>>> Now it's time to build a larger image I guess.
>>
>> Hello,
>>
>> I have done it, building a desktop image that contains kde4 and some services,
>> The tests show that tmpfs is broken, it makes first operations but at some
>> point it becomes erratic; that occurs in non-persistent mode, where
>> overlayfs upperdir is on tmpfs, and in persistent mode also but in this case
>> more user working is needed before tmpfs failure,
>>
>> in Linux single user mode I have installed the nvidia packages and driver
>> using dpkg, only some files are installed, therefore I tried creating manually
>> a required directory and in this manner the process has done some steps more.
>>
>> kernel source:
>>
>> http://livenet.selfip.com/ftp/debian/linux-2.6/linux-2.6.38-3.jpp.3-
>> lnet_2.6.38-15.tar.bz2
>>
>> logs:
>>
>> http://livenet.selfip.com/ftp/debian/overlayfs/test-tmpfs-xattrs.tar.bz2
>>
>
> Yes, while the system appears to run generally fine unpacking dpkg
> packages fails.

You will be probably interested in this log:

ls: cannot access /usr/share/doc/ucb*: No such file or directory
...
rmdir("/usr/share/doc/ucblogo.dpkg-new") = -1 ENOENT (No such file or directory)
rmdir("/usr/share/doc/ucblogo.dpkg-tmp") = -1 ENOENT (No such file or directory)
mkdir("/usr/share/doc/ucblogo.dpkg-new", 0) = 0
chown32("/usr/share/doc/ucblogo.dpkg-new", 0, 0) = 0
chmod("/usr/share/doc/ucblogo.dpkg-new", 0755) = 0
utime("/usr/share/doc/ucblogo.dpkg-new", [2011/04/21-15:37:05,
2008/06/08-11:38:17]) = 0
rename("/usr/share/doc/ucblogo.dpkg-new", "/usr/share/doc/ucblogo") =
-1 EOPNOTSUPP (Operation not supported)

Thanks

Michal

2011-04-21 17:26:15

by Miklos Szeredi

[permalink] [raw]
Subject: Re: [PATCH] tmpfs: implement generic xattr support

Michal Suchanek <[email protected]> writes:

> You will be probably interested in this log:
>
> ls: cannot access /usr/share/doc/ucb*: No such file or directory
> ...
> rmdir("/usr/share/doc/ucblogo.dpkg-new") = -1 ENOENT (No such file or directory)
> rmdir("/usr/share/doc/ucblogo.dpkg-tmp") = -1 ENOENT (No such file or directory)
> mkdir("/usr/share/doc/ucblogo.dpkg-new", 0) = 0
> chown32("/usr/share/doc/ucblogo.dpkg-new", 0, 0) = 0
> chmod("/usr/share/doc/ucblogo.dpkg-new", 0755) = 0
> utime("/usr/share/doc/ucblogo.dpkg-new", [2011/04/21-15:37:05,
> 2008/06/08-11:38:17]) = 0
> rename("/usr/share/doc/ucblogo.dpkg-new", "/usr/share/doc/ucblogo") =
> -1 EOPNOTSUPP (Operation not supported)

Oops, stupid mistake.

Following patch should fix it.

Thanks,
Miklos

commit e90f1946938da7c9502337340126beae1931d239
Author: Miklos Szeredi <[email protected]>
Date: Thu Apr 21 19:18:23 2011 +0200

tmpfs: xattr: fix xattr for directories and special files

Forgot to update the xattr methods for some file types.

Signed-off-by: Miklos Szeredi <[email protected]>

diff --git a/mm/shmem.c b/mm/shmem.c
index ac8ec9e..c527484 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2714,10 +2714,10 @@ static const struct inode_operations shmem_dir_inode_operations = {
.rename = shmem_rename,
#endif
#ifdef CONFIG_TMPFS_XATTR
- .setxattr = generic_setxattr,
- .getxattr = generic_getxattr,
- .listxattr = generic_listxattr,
- .removexattr = generic_removexattr,
+ .setxattr = shmem_setxattr,
+ .getxattr = shmem_getxattr,
+ .listxattr = shmem_listxattr,
+ .removexattr = shmem_removexattr,
#endif
#ifdef CONFIG_TMPFS_POSIX_ACL
.setattr = shmem_notify_change,
@@ -2727,10 +2727,10 @@ static const struct inode_operations shmem_dir_inode_operations = {

static const struct inode_operations shmem_special_inode_operations = {
#ifdef CONFIG_TMPFS_XATTR
- .setxattr = generic_setxattr,
- .getxattr = generic_getxattr,
- .listxattr = generic_listxattr,
- .removexattr = generic_removexattr,
+ .setxattr = shmem_setxattr,
+ .getxattr = shmem_getxattr,
+ .listxattr = shmem_listxattr,
+ .removexattr = shmem_removexattr,
#endif
#ifdef CONFIG_TMPFS_POSIX_ACL
.setattr = shmem_notify_change,

2011-04-21 19:17:23

by Michal Suchanek

[permalink] [raw]
Subject: Re: [PATCH] tmpfs: implement generic xattr support

On 21 April 2011 19:26, Miklos Szeredi <[email protected]> wrote:
> Michal Suchanek <[email protected]> writes:
>
>> You will be probably interested in this log:
>>
>> ls: cannot access /usr/share/doc/ucb*: No such file or directory
>> ...
>> rmdir("/usr/share/doc/ucblogo.dpkg-new") = -1 ENOENT (No such file or directory)
>> rmdir("/usr/share/doc/ucblogo.dpkg-tmp") = -1 ENOENT (No such file or directory)
>> mkdir("/usr/share/doc/ucblogo.dpkg-new", 0) = 0
>> chown32("/usr/share/doc/ucblogo.dpkg-new", 0, 0) = 0
>> chmod("/usr/share/doc/ucblogo.dpkg-new", 0755) = 0
>> utime("/usr/share/doc/ucblogo.dpkg-new", [2011/04/21-15:37:05,
>> 2008/06/08-11:38:17]) = 0
>> rename("/usr/share/doc/ucblogo.dpkg-new", "/usr/share/doc/ucblogo") =
>> -1 EOPNOTSUPP (Operation not supported)
>
> Oops, stupid mistake.
>
> Following patch should fix it.
>

Yes, with the additional patch package installation now works.

Thanks

Michal

2011-05-12 04:20:44

by Hugh Dickins

[permalink] [raw]
Subject: Re: [PATCH] tmpfs: implement generic xattr support

Hi Miklos,

I understand from the mm-commits discussion that you're probably
preparing another version: so let me make just a few comments
on this old one now, in case you can factor them in.

On Tue, 19 Apr 2011, Miklos Szeredi wrote:

> Michal Suchanek <[email protected]> writes:
> > Applying this patch is not sufficient. Apparently more xattrs are
> > needed but adding them on top of this patch should be easy.
> >
> > The ones mentioned in the overlayfs doc are
> >
> > trusted.overlay.whiteout
> > trusted.overlay.opaque
> >
> > The patch implements security.*
>
> Here's an updated patch. It changes a number of things:
>
> - it implements shmem specific xattr ops instead of using the generic_*
> functions. Which means that there's no back and forth between the
> VFS and the filesystem. I basically copied the btrfs way of doing
> things.
>
> - adds a new config option: CONFIG_TMPFS_XATTR and makes
> CONFIG_TMPFS_POSIX_ACL depend on this. This way xattr support can be
> turned on without also adding ACL support.
>
> - now supports trusted.* namespace needed by overlayfs in addition to
> security.*. Doesn't yet support user.* since that needs more thought
> wrt. kernel memory use.
>
> - supports xattrs on symlinks, again needed by overlayfs
>
> Hugh, Eric, does this look acceptable?

Pretty much, I think. As ever I'm somewhat bemused by these xattrs,
so don't count too much on my opinion. But if it looks good to xattr
people, then it's welcome in tmpfs - thank you. Eric's ack would be good.

Just a few comments...

>
> Thanks,
> Miklos
>
> ---
> From: Eric Paris <[email protected]>
> Subject: tmpfs: implement generic xattr support
>
> This patch implements generic xattrs for tmpfs filesystems. The feodra
> project, while trying to replace suid apps with file capabilities,
> realized that tmpfs, which is used on the build systems, does not
> support file capabilities and thus cannot be used to build packages
> which use file capabilities. Xattrs are also needed for overlayfs.
>
> The xattr interface is a bit, odd. If a filesystem does not implement any
> {get,set,list}xattr functions the VFS will call into some random LSM hooks and
> the running LSM can then implement some method for handling xattrs. SELinux
> for example provides a method to support security.selinux but no other
> security.* xattrs.
>
> As it stands today when one enables CONFIG_TMPFS_POSIX_ACL tmpfs will have
> xattr handler routines specifically to handle acls. Because of this tmpfs
> would loose the VFS/LSM helpers to support the running LSM. To make up for
> that tmpfs had stub functions that did nothing but call into the LSM hooks
> which implement the helpers.
>
> This new patch does not use the LSM fallback functions and instead
> just implements a native get/set/list xattr feature for the full
> security.* and trusted.* namespace like a normal filesystem. This
> means that tmpfs can now support both security.selinux and
> security.capability, which was not previously possible.
>
> The basic implementation is that I attach a:
>
> struct shmem_xattr {
> struct list_head list; /* anchored by shmem_inode_info->xattr_list */
> char *name;
> size_t size;
> char value[0];
> };
>
> Into the struct shmem_inode_info for each xattr that is set. This
> implementation could easily support the user.* namespace as well,
> except some care needs to be taken to prevent large amounts of
> unswappable memory being allocated for unprivileged users.
>
> Signed-off-by: Eric Paris <[email protected]>
> Signed-off-by: Miklos Szeredi <[email protected]>
> ---
> fs/Kconfig | 18 ++
> include/linux/shmem_fs.h | 1
> mm/shmem.c | 302 +++++++++++++++++++++++++++++++++++++++--------
> 3 files changed, 273 insertions(+), 48 deletions(-)
>
> Index: linux-2.6/fs/Kconfig
> ===================================================================
> --- linux-2.6.orig/fs/Kconfig 2011-04-19 21:09:33.000000000 +0200
> +++ linux-2.6/fs/Kconfig 2011-04-19 21:09:35.000000000 +0200
> @@ -121,9 +121,25 @@ config TMPFS
>
> See <file:Documentation/filesystems/tmpfs.txt> for details.
>
> +config TMPFS_XATTR
> + bool "Tmpfs extended attributes"
> + depends on TMPFS
> + default y
> + help
> + Extended attributes are name:value pairs associated with inodes by
> + the kernel or by users (see the attr(5) manual page, or visit
> + <http://acl.bestbits.at/> for details).
> +
> + Currently this enables support for the trusted.* and
> + security.* namespaces.
> +
> + If unsure, say N.
> +
> + You need this for POSIX ACL support on tmpfs.
> +

I disagree with Andrew on this, it should default to off, consistent with
the "If unsure, say N" comment. Partly because that's how Linus prefers
it, for good anti-bloat reasons: if people have got along without a
feature for years, whyever should we suddenly default it on? And
partly because TMPFS_POSIX_ACL already defaults to off (as do
EXT2_FS_XATTR and EXT2_FS_POSIX_ACL).

However... if someone already has CONFIG_TMPFS_POSIX_ACL=y in their
old config, it would be nice to make CONFIG_TMPFS_XATTR=y automatically,
so we're not in danger of removing their old functionality. But I haven't
thought of the right way to achieve that - maybe your helpful POSIX ACL
comment is enough, but automatic would be better.

Few people would choose TMPFS_XATTR on and TMPFS_POSIX_ACL off: perhaps it
would work to make CONFIG_TMPFS_XATTR the new config option to cover both,
and select it from config TMPFS_POSIX_ACL (without a prompt) so the
oldconfig propagates correctly. Perhaps. But I haven't tried myself,
and you may be forgiveably disinclined to make config experiments!

> config TMPFS_POSIX_ACL
> bool "Tmpfs POSIX Access Control Lists"
> - depends on TMPFS
> + depends on TMPFS_XATTR
> select GENERIC_ACL
> help
> POSIX Access Control Lists (ACLs) support permissions for users and
> Index: linux-2.6/include/linux/shmem_fs.h
> ===================================================================
> --- linux-2.6.orig/include/linux/shmem_fs.h 2011-04-19 21:09:25.000000000 +0200
> +++ linux-2.6/include/linux/shmem_fs.h 2011-04-19 21:09:35.000000000 +0200
> @@ -19,6 +19,7 @@ struct shmem_inode_info {
> struct page *i_indirect; /* top indirect blocks page */
> swp_entry_t i_direct[SHMEM_NR_DIRECT]; /* first blocks */
> struct list_head swaplist; /* chain of maybes on swap */
> + struct list_head xattr_list; /* list of shmem_xattr */
> struct inode vfs_inode;
> };
>
> Index: linux-2.6/mm/shmem.c
> ===================================================================
> --- linux-2.6.orig/mm/shmem.c 2011-04-19 21:09:25.000000000 +0200
> +++ linux-2.6/mm/shmem.c 2011-04-19 21:09:35.000000000 +0200
...
> -#ifdef CONFIG_TMPFS_POSIX_ACL
> +#ifdef CONFIG_TMPFS_XATTR
> /*
> - * Superblocks without xattr inode operations will get security.* xattr
> - * support from the VFS "for free". As soon as we have any other xattrs
> + * Superblocks without xattr inode operations may get some security.* xattr
> + * support from the LSM "for free". As soon as we have any other xattrs
> * like ACLs, we also need to implement the security.* handlers at
> * filesystem level, though.
> */
>
> -static size_t shmem_xattr_security_list(struct dentry *dentry, char *list,
> - size_t list_len, const char *name,
> - size_t name_len, int handler_flags)
> +static int shmem_xattr_get(struct dentry *dentry, const char *name,
> + void *buffer, size_t size)
> {
> - return security_inode_listsecurity(dentry->d_inode, list, list_len);
> -}
> + struct shmem_inode_info *info;
> + struct shmem_xattr *xattr;
> + int ret = -ENODATA;
>
> -static int shmem_xattr_security_get(struct dentry *dentry, const char *name,
> - void *buffer, size_t size, int handler_flags)
> -{
> - if (strcmp(name, "") == 0)
> - return -EINVAL;
> - return xattr_getsecurity(dentry->d_inode, name, buffer, size);
> + info = SHMEM_I(dentry->d_inode);
> +
> + spin_lock(&dentry->d_inode->i_lock);

Not important, but I suggest you use info->lock throughout for this,
instead of dentry->d_inode->i_lock: in each case you need "info" for
info->xattr_list (and don't need "inode" at all I think), so info->lock
seems appropriate, and may be in the same cacheline once I make
shmem_inode_info smaller. But don't worry if you'd prefer to leave it.

> -static int shmem_xattr_security_set(struct dentry *dentry, const char *name,
> - const void *value, size_t size, int flags, int handler_flags)
> +static int shmem_xattr_set(struct dentry *dentry, const char *name,
> + const void *value, size_t size, int flags)
> {
> - if (strcmp(name, "") == 0)
> - return -EINVAL;
> - return security_inode_setsecurity(dentry->d_inode, name, value,
> - size, flags);
> + struct inode *inode = dentry->d_inode;
> + struct shmem_inode_info *info = SHMEM_I(inode);
> + struct shmem_xattr *xattr;
> + struct shmem_xattr *new_xattr = NULL;
> + size_t len;
> + int err = 0;
> +
> + /* value == NULL means remove */
> + if (value) {
> + /* wrap around? */
> + len = sizeof(*new_xattr) + size;
> + if (len <= sizeof(*new_xattr))
> + return -ENOMEM;
> +
> + new_xattr = kmalloc(len, GFP_NOFS);

There's no need for GFP_NOFS in this patch, the page reclaim path won't
ever recurse into xattr handling: just use the usual GFP_KERNEL throughout.

> + if (!new_xattr)
> + return -ENOMEM;
> +
> + new_xattr->name = kstrdup(name, GFP_NOFS);
> + if (!new_xattr->name) {
> + kfree(new_xattr);
> + return -ENOMEM;
> + }
> +
> + new_xattr->size = size;
> + memcpy(new_xattr->value, value, size);
> + }
> +
> + spin_lock(&inode->i_lock);
> + list_for_each_entry(xattr, &info->xattr_list, list) {
> + if (!strcmp(name, xattr->name)) {
> + if (flags & XATTR_CREATE) {
> + xattr = new_xattr;
> + err = -EEXIST;
> + } else if (new_xattr) {
> + list_replace(&xattr->list, &new_xattr->list);
> + } else {
> + list_del(&xattr->list);
> + }
> + goto out;
> + }
> + }
> + if (flags & XATTR_REPLACE) {
> + xattr = new_xattr;
> + err = -ENODATA;
> + } else {
> + list_add(&new_xattr->list, &info->xattr_list);
> + xattr = NULL;
> + }
> +out:
> + spin_unlock(&inode->i_lock);
> + if (xattr)
> + kfree(xattr->name);
> + kfree(xattr);
> + return 0;

I think you meant to return err rather than always 0.

...

> +static ssize_t shmem_listxattr(struct dentry *dentry, char *buffer, size_t size)
> +{
> + bool trusted = capable(CAP_SYS_ADMIN);
> + struct shmem_xattr *xattr;
> + struct shmem_inode_info *info;
> + size_t used = 0;
> +
> + info = SHMEM_I(dentry->d_inode);
> +
> + spin_lock(&dentry->d_inode->i_lock);
> + list_for_each_entry(xattr, &info->xattr_list, list) {
> + /* skip "trusted." attributes for unprivileged callers */
> + if (!trusted && xattr_is_trusted(xattr->name))
> + continue;
> +
> + used += strlen(xattr->name) + 1;
> + if (buffer) {
> + if (size < used) {
> + used = -ERANGE;
> + break;
> + }
> + strncpy(buffer, xattr->name, strlen(xattr->name) + 1);
> + buffer += strlen(xattr->name) + 1;
> + }
> + }
> + spin_unlock(&dentry->d_inode->i_lock);
> +
> + return used;
> +}
> #endif

#endif /* CONFIG_TMPFS_XATTR */

would be welcome by the time we get here.

Thanks,
Hugh

2011-05-12 07:52:50

by Michal Suchanek

[permalink] [raw]
Subject: Re: [PATCH] tmpfs: implement generic xattr support

On 12 May 2011 06:20, Hugh Dickins <[email protected]> wrote:
> On Tue, 19 Apr 2011, Miklos Szeredi wrote:
...
>
> I disagree with Andrew on this, it should default to off, consistent with
> the "If unsure, say N" comment.  Partly because that's how Linus prefers
> it, for good anti-bloat reasons: if people have got along without a
> feature for years, whyever should we suddenly default it on?  And
> partly because TMPFS_POSIX_ACL already defaults to off (as do
> EXT2_FS_XATTR and EXT2_FS_POSIX_ACL).
>
> However... if someone already has CONFIG_TMPFS_POSIX_ACL=y in their
> old config, it would be nice to make CONFIG_TMPFS_XATTR=y automatically,
> so we're not in danger of removing their old functionality.  But I haven't
> thought of the right way to achieve that - maybe your helpful POSIX ACL
> comment is enough, but automatic would be better.
>
> Few people would choose TMPFS_XATTR on and TMPFS_POSIX_ACL off: perhaps it
> would work to make CONFIG_TMPFS_XATTR the new config option to cover both,
> and select it from config TMPFS_POSIX_ACL (without a prompt) so the
> oldconfig propagates correctly.  Perhaps.  But I haven't tried myself,
> and you may be forgiveably disinclined to make config experiments!

The xattrs are required for overlayfs so "generic" kernels will
probably want those. However, the "generic" kernels likely have the
acls already so this is not that much of an issue.

Perhaps the acls should select xattrs so that in configs that had acls
the xattrs are added automagically.

Or does it already work with depends?

Thanks

Michal

>
>>  config TMPFS_POSIX_ACL
>>       bool "Tmpfs POSIX Access Control Lists"
>> -     depends on TMPFS
>> +     depends on TMPFS_XATTR
>>       select GENERIC_ACL
>>       help

2011-05-12 12:27:17

by Miklos Szeredi

[permalink] [raw]
Subject: Re: [PATCH] tmpfs: implement generic xattr support

Hi Hugh,

Hugh Dickins <[email protected]> writes:

> I understand from the mm-commits discussion that you're probably
> preparing another version: so let me make just a few comments
> on this old one now, in case you can factor them in.

For sure. Thanks for your comments.

>> +config TMPFS_XATTR
>> + bool "Tmpfs extended attributes"
>> + depends on TMPFS
>> + default y
>> + help
>> + Extended attributes are name:value pairs associated with inodes by
>> + the kernel or by users (see the attr(5) manual page, or visit
>> + <http://acl.bestbits.at/> for details).
>> +
>> + Currently this enables support for the trusted.* and
>> + security.* namespaces.
>> +
>> + If unsure, say N.
>> +
>> + You need this for POSIX ACL support on tmpfs.
>> +
>
> I disagree with Andrew on this, it should default to off, consistent with
> the "If unsure, say N" comment. Partly because that's how Linus prefers
> it, for good anti-bloat reasons: if people have got along without a
> feature for years, whyever should we suddenly default it on? And
> partly because TMPFS_POSIX_ACL already defaults to off (as do
> EXT2_FS_XATTR and EXT2_FS_POSIX_ACL).

Okay, changed to "default n".

> However... if someone already has CONFIG_TMPFS_POSIX_ACL=y in their
> old config, it would be nice to make CONFIG_TMPFS_XATTR=y automatically,
> so we're not in danger of removing their old functionality. But I haven't
> thought of the right way to achieve that - maybe your helpful POSIX ACL
> comment is enough, but automatic would be better.

AFAICS we don't really support backward compatibility in Kconfig. If
something breaks, it's the user's (kernel builder's) responsibility to
fix up.

> Few people would choose TMPFS_XATTR on and TMPFS_POSIX_ACL off: perhaps it
> would work to make CONFIG_TMPFS_XATTR the new config option to cover both,
> and select it from config TMPFS_POSIX_ACL (without a prompt) so the
> oldconfig propagates correctly. Perhaps. But I haven't tried myself,
> and you may be forgiveably disinclined to make config experiments!

I think it would work, but I prefer to have these options separate and
avoid messy back compat options.

On overlayfs, for example, TMPFS_XATTR would be useful but
TMPFS_POSIX_ACL would not.

>> + info = SHMEM_I(dentry->d_inode);
>> +
>> + spin_lock(&dentry->d_inode->i_lock);
>
> Not important, but I suggest you use info->lock throughout for this,
> instead of dentry->d_inode->i_lock: in each case you need "info" for
> info->xattr_list (and don't need "inode" at all I think), so info->lock
> seems appropriate, and may be in the same cacheline once I make
> shmem_inode_info smaller. But don't worry if you'd prefer to leave
> it.

Makes sense. I updated the locking.

>> + new_xattr = kmalloc(len, GFP_NOFS);
>
> There's no need for GFP_NOFS in this patch, the page reclaim path won't
> ever recurse into xattr handling: just use the usual GFP_KERNEL
> throughout.

Yes. I didn't notice this in Eric's patch, but GFP_NOFS is obviously
not necessary here.

>> + if (!new_xattr)
>> + return -ENOMEM;
>> +
>> + new_xattr->name = kstrdup(name, GFP_NOFS);
>> + if (!new_xattr->name) {
>> + kfree(new_xattr);
>> + return -ENOMEM;
>> + }
>> +
>> + new_xattr->size = size;
>> + memcpy(new_xattr->value, value, size);
>> + }
>> +
>> + spin_lock(&inode->i_lock);
>> + list_for_each_entry(xattr, &info->xattr_list, list) {
>> + if (!strcmp(name, xattr->name)) {
>> + if (flags & XATTR_CREATE) {
>> + xattr = new_xattr;
>> + err = -EEXIST;
>> + } else if (new_xattr) {
>> + list_replace(&xattr->list, &new_xattr->list);
>> + } else {
>> + list_del(&xattr->list);
>> + }
>> + goto out;
>> + }
>> + }
>> + if (flags & XATTR_REPLACE) {
>> + xattr = new_xattr;
>> + err = -ENODATA;
>> + } else {
>> + list_add(&new_xattr->list, &info->xattr_list);
>> + xattr = NULL;
>> + }
>> +out:
>> + spin_unlock(&inode->i_lock);
>> + if (xattr)
>> + kfree(xattr->name);
>> + kfree(xattr);
>> + return 0;
>
> I think you meant to return err rather than always 0.

Right. Well spotted.

>> +static ssize_t shmem_listxattr(struct dentry *dentry, char *buffer, size_t size)
>> +{
>> + bool trusted = capable(CAP_SYS_ADMIN);
>> + struct shmem_xattr *xattr;
>> + struct shmem_inode_info *info;
>> + size_t used = 0;
>> +
>> + info = SHMEM_I(dentry->d_inode);
>> +
>> + spin_lock(&dentry->d_inode->i_lock);
>> + list_for_each_entry(xattr, &info->xattr_list, list) {
>> + /* skip "trusted." attributes for unprivileged callers */
>> + if (!trusted && xattr_is_trusted(xattr->name))
>> + continue;
>> +
>> + used += strlen(xattr->name) + 1;
>> + if (buffer) {
>> + if (size < used) {
>> + used = -ERANGE;
>> + break;
>> + }
>> + strncpy(buffer, xattr->name, strlen(xattr->name) + 1);
>> + buffer += strlen(xattr->name) + 1;
>> + }
>> + }
>> + spin_unlock(&dentry->d_inode->i_lock);
>> +
>> + return used;
>> +}
>> #endif
>
> #endif /* CONFIG_TMPFS_XATTR */
>
> would be welcome by the time we get here.

Yes, fixed.

Updated patch will follow shortly.

Thanks,
Miklos

2011-05-12 14:00:39

by Miklos Szeredi

[permalink] [raw]
Subject: Re: [PATCH] tmpfs: implement generic xattr support

Miklos Szeredi <[email protected]> writes:

>>> + info = SHMEM_I(dentry->d_inode);
>>> +
>>> + spin_lock(&dentry->d_inode->i_lock);
>>
>> Not important, but I suggest you use info->lock throughout for this,
>> instead of dentry->d_inode->i_lock: in each case you need "info" for
>> info->xattr_list (and don't need "inode" at all I think), so info->lock
>> seems appropriate, and may be in the same cacheline once I make
>> shmem_inode_info smaller. But don't worry if you'd prefer to leave
>> it.
>
> Makes sense. I updated the locking.

This uncovered a nasty bug lurking in there: the "info" area, including
lock and xattr_list, may be overwritten by inline symlinks. Because
xattr_list is near the end, this wasn't noticed with casual testing, but
info->lock would immediately Oops on getxattr for symlinks.

I propose the following solution. It results in slightly less space for
inline symlinks, but correct operation for xattrs. Does the anonymous
union/struct solution look acceptable?

Thanks,
Miklos

Index: linux-2.6/include/linux/shmem_fs.h
===================================================================
--- linux-2.6.orig/include/linux/shmem_fs.h 2011-05-12 15:59:08.000000000 +0200
+++ linux-2.6/include/linux/shmem_fs.h 2011-05-12 15:58:25.000000000 +0200
@@ -11,15 +11,39 @@

struct shmem_inode_info {
spinlock_t lock;
- unsigned long flags;
- unsigned long alloced; /* data pages alloced to file */
- unsigned long swapped; /* subtotal assigned to swap */
- unsigned long next_index; /* highest alloced index + 1 */
- struct shared_policy policy; /* NUMA memory alloc policy */
- struct page *i_indirect; /* top indirect blocks page */
- swp_entry_t i_direct[SHMEM_NR_DIRECT]; /* first blocks */
- struct list_head swaplist; /* chain of maybes on swap */
- struct list_head xattr_list; /* list of shmem_xattr */
+
+ /* list of shmem_xattr */
+ struct list_head xattr_list;
+
+ union {
+ char inline_symlink[0];
+
+ /* Members not used by inline symlinks: */
+ struct {
+ unsigned long flags;
+
+ /* data pages alloced to file */
+ unsigned long alloced;
+
+ /* subtotal assigned to swap */
+ unsigned long swapped;
+
+ /* highest alloced index + 1 */
+ unsigned long next_index;
+
+ /* NUMA memory alloc policy */
+ struct shared_policy policy;
+
+ /* top indirect blocks page */
+ struct page *i_indirect;
+
+ /* first blocks */
+ swp_entry_t i_direct[SHMEM_NR_DIRECT];
+
+ /* chain of maybes on swap */
+ struct list_head swaplist;
+ };
+ };
struct inode vfs_inode;
};

Index: linux-2.6/mm/shmem.c
===================================================================
--- linux-2.6.orig/mm/shmem.c 2011-05-12 15:59:08.000000000 +0200
+++ linux-2.6/mm/shmem.c 2011-05-12 15:50:10.000000000 +0200
@@ -2029,9 +2029,9 @@ static int shmem_symlink(struct inode *d

info = SHMEM_I(inode);
inode->i_size = len-1;
- if (len <= (char *)inode - (char *)info) {
+ if (len <= (char *)inode - info->inline_symlink) {
/* do it inline */
- memcpy(info, symname, len);
+ memcpy(info->inline_symlink, symname, len);
inode->i_op = &shmem_symlink_inline_operations;
} else {
error = shmem_getpage(inode, 0, &page, SGP_WRITE, NULL);
@@ -2057,7 +2057,7 @@ static int shmem_symlink(struct inode *d

static void *shmem_follow_link_inline(struct dentry *dentry, struct nameidata *nd)
{
- nd_set_link(nd, (char *)SHMEM_I(dentry->d_inode));
+ nd_set_link(nd, SHMEM_I(dentry->d_inode)->inline_symlink);
return NULL;
}



2011-05-12 16:52:10

by Hugh Dickins

[permalink] [raw]
Subject: Re: [PATCH] tmpfs: implement generic xattr support

On Thu, 12 May 2011, Miklos Szeredi wrote:
> Miklos Szeredi <[email protected]> writes:
>
> >>> + info = SHMEM_I(dentry->d_inode);
> >>> +
> >>> + spin_lock(&dentry->d_inode->i_lock);
> >>
> >> Not important, but I suggest you use info->lock throughout for this,
> >> instead of dentry->d_inode->i_lock: in each case you need "info" for
> >> info->xattr_list (and don't need "inode" at all I think), so info->lock
> >> seems appropriate, and may be in the same cacheline once I make
> >> shmem_inode_info smaller. But don't worry if you'd prefer to leave
> >> it.
> >
> > Makes sense. I updated the locking.
>
> This uncovered a nasty bug lurking in there: the "info" area, including
> lock and xattr_list, may be overwritten by inline symlinks. Because
> xattr_list is near the end, this wasn't noticed with casual testing, but
> info->lock would immediately Oops on getxattr for symlinks.

Yikes, I'd completely forgotten about those inline symlinks:
many thanks for reminding me.

>
> I propose the following solution. It results in slightly less space for
> inline symlinks, but correct operation for xattrs. Does the anonymous
> union/struct solution look acceptable?

You're being conscientious to minimize the space reduction, and I wonder
if I'm being sloppy about it: but I think I'd prefer you to keep it simple
and just make a union of the i_direct[SHMEM_NR_DIRECT] array and the inline
symlink buffer. That does waste space that was occasionally being put to
use before, but saves us from embarrassment next time we forget about the
inline symlinks.

I intend to be removing that i_direct array very soon: I guess I'll want
to kmalloc for short symlinks then, certainly not overlaying over what
fields are left: so you'd be moving in that direction if you just reuse
the i_direct area now.

Hugh

>
> Thanks,
> Miklos
>
> Index: linux-2.6/include/linux/shmem_fs.h
> ===================================================================
> --- linux-2.6.orig/include/linux/shmem_fs.h 2011-05-12 15:59:08.000000000 +0200
> +++ linux-2.6/include/linux/shmem_fs.h 2011-05-12 15:58:25.000000000 +0200
> @@ -11,15 +11,39 @@
>
> struct shmem_inode_info {
> spinlock_t lock;
> - unsigned long flags;
> - unsigned long alloced; /* data pages alloced to file */
> - unsigned long swapped; /* subtotal assigned to swap */
> - unsigned long next_index; /* highest alloced index + 1 */
> - struct shared_policy policy; /* NUMA memory alloc policy */
> - struct page *i_indirect; /* top indirect blocks page */
> - swp_entry_t i_direct[SHMEM_NR_DIRECT]; /* first blocks */
> - struct list_head swaplist; /* chain of maybes on swap */
> - struct list_head xattr_list; /* list of shmem_xattr */
> +
> + /* list of shmem_xattr */
> + struct list_head xattr_list;
> +
> + union {
> + char inline_symlink[0];
> +
> + /* Members not used by inline symlinks: */
> + struct {
> + unsigned long flags;
> +
> + /* data pages alloced to file */
> + unsigned long alloced;
> +
> + /* subtotal assigned to swap */
> + unsigned long swapped;
> +
> + /* highest alloced index + 1 */
> + unsigned long next_index;
> +
> + /* NUMA memory alloc policy */
> + struct shared_policy policy;
> +
> + /* top indirect blocks page */
> + struct page *i_indirect;
> +
> + /* first blocks */
> + swp_entry_t i_direct[SHMEM_NR_DIRECT];
> +
> + /* chain of maybes on swap */
> + struct list_head swaplist;
> + };
> + };
> struct inode vfs_inode;
> };
>
> Index: linux-2.6/mm/shmem.c
> ===================================================================
> --- linux-2.6.orig/mm/shmem.c 2011-05-12 15:59:08.000000000 +0200
> +++ linux-2.6/mm/shmem.c 2011-05-12 15:50:10.000000000 +0200
> @@ -2029,9 +2029,9 @@ static int shmem_symlink(struct inode *d
>
> info = SHMEM_I(inode);
> inode->i_size = len-1;
> - if (len <= (char *)inode - (char *)info) {
> + if (len <= (char *)inode - info->inline_symlink) {
> /* do it inline */
> - memcpy(info, symname, len);
> + memcpy(info->inline_symlink, symname, len);
> inode->i_op = &shmem_symlink_inline_operations;
> } else {
> error = shmem_getpage(inode, 0, &page, SGP_WRITE, NULL);
> @@ -2057,7 +2057,7 @@ static int shmem_symlink(struct inode *d
>
> static void *shmem_follow_link_inline(struct dentry *dentry, struct nameidata *nd)
> {
> - nd_set_link(nd, (char *)SHMEM_I(dentry->d_inode));
> + nd_set_link(nd, SHMEM_I(dentry->d_inode)->inline_symlink);
> return NULL;
> }

2011-06-29 09:39:15

by Ric Wheeler

[permalink] [raw]
Subject: Union mount and overlayfs bake off?


I think that it has been a while since David reposted the refreshed patch set
for union mounts & know that overlayfs has recent updates as well.

Despite that, I have not seen a lot of feedback from reviewers or testers.

Al, from your point of view, where we stand?

Thanks!

Ric

2011-06-29 10:17:57

by David Howells

[permalink] [raw]
Subject: Re: Union mount and overlayfs bake off?

Ric Wheeler <[email protected]> wrote:

> I think that it has been a while since David reposted the refreshed patch set
> for union mounts & know that overlayfs has recent updates as well.
>
> Despite that, I have not seen a lot of feedback from reviewers or testers.

The main problem I've got is that it causes lockdep to generate warnings when
the top layer and one of the lower layers are of the same filesystem type. The
obvious way round this is to give each superblock its own i_mutex lock class
rather than putting this in the file_system_type struct, but I'm not sure of
the consequences (the two obvious problems are superblock transience and the
fact that there may be so many more of them that it may explode lockdep).

I've split out some of the VFS patches that we might be interested in taking
upstream anyway. They're currently sat on Al's plate for his consideration.

I've been dealing with some of Al's issues with the unionmount patches, but I
know he's got more - I just can't remember them all.

David

2011-06-29 11:40:43

by Michal Suchanek

[permalink] [raw]
Subject: Re: Union mount and overlayfs bake off?

On 29 June 2011 11:39, Ric Wheeler <[email protected]> wrote:
>
> I think that it has been a while since David reposted the refreshed patch
> set for union mounts & know that overlayfs has recent updates as well.
>
> Despite that, I have not seen a lot of feedback from reviewers or testers.
>

I can only attest that I can build working live CD on top of either
overlayfs or unionmount.

This is not a good metric for reliability because issues were found
with both solutions after the live CDs were working fine already.

However, it is an usability metric in that both solutions are from the
alpha stage in which crucial features were missing or broken way into
the beta stage in which corner case issues are discovered but the
solution is generally working.

Somebody was running racer on top of overlayfs which uncovered some
problems with overlayfs itself and linux vfs in general. Unionmount
has supposedly its own testsuite but I have never seen that.

There are some issues with neither creating full emulation of a
traditional filesystem and the cost of hiding more effects of the
union from userspace to make the filesystem look more traditional.

With something touching VFS, however, it is important to decide how
(and if) these corner case issues should be resolved because they
could possibly affect unrelated parts of the system. Reviewing all
raised concerns is unfortunately taking quite some time, and then
addressing them in code will probably require more.

Thanks

Michal

2011-06-30 12:44:19

by Miklos Szeredi

[permalink] [raw]
Subject: Re: Union mount and overlayfs bake off?

David Howells <[email protected]> writes:

> I've been dealing with some of Al's issues with the unionmount
> patches, but I know he's got more - I just can't remember them all.

A couple of questions I have:

1) What happens to the union in a cloned namespace (CLONE_NEWNS)?

2) What's the overhead for non-unioned filesystems if CONFIG_UNION_MOUNTS
is enabled? Does it show up in any microbenchmarks?

3) Is there a future strategy for making atomic operations really atomic?
E.g. what happens if power is lost in the middle of a copy-up? Or if
whiteout of source fails after a successful rename()?

4) Have you looked at overlayfs? Do you have any thoughts about the
relative merrits of each solution?

Thanks,
Miklos