As a follow-up to the discussions held on this list after Ted's proposal
on case-insensitive support for ext4 last year, I've implemented my own
version of it to learn my way through the ext4 code and to get started
on the task. As my goal is a slightly more complex approach, with
at least UTF-8 support for lookups, I'd like to hear from you about my
current implementation and the following proposal, and also whether
anyone is currently working on anything like this, so we can
coordinate efforts.
Regarding the Unicode proposal (shivers), I am aware of Ben and
Olaf's proposal from 2014, and I plan to work on top of that to get it
updated and upstream.
Please, let me know your thoughts.
* Current Implementation
Simple learning experience to get the wheel spinning, based on Ted Ts'o's
initial proposal back in 2016 [1]. It supports ASCII case folding only
and doesn't rely on on-disk modifications.
Available at:
https://git.collabora.com/cgit/user/krisman/linux.git/log/?h=ext4-insensitive
- Implemented as a mount option, called ignorecase, which enables
insensitive lookups for the entire filesystem. A lookup will first
attempt the htree search for an exact-case match and, if that fails,
fall back to performing the expensive linear search.
- ASCII case-folding only.
- No on-disk format changes.
- If two files differ only by case, an exact-case lookup will return the
expected file. For a non-exact-case lookup, which of the two files is
returned is unpredictable (it depends on the on-disk order).
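The two-pass lookup described above can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the code from the branch: the directory is modeled as an in-memory array in "on-disk" order, a strcmp() scan stands in for the exact-case htree search, and the names ascii_fold(), ascii_ci_equal() and ci_lookup() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Fold one ASCII byte to lower case; bytes outside A-Z are unchanged. */
static unsigned char ascii_fold(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? (unsigned char)(c - 'A' + 'a') : c;
}

/* Return 1 if a and b are equal after ASCII case folding, 0 otherwise. */
static int ascii_ci_equal(const char *a, const char *b)
{
    while (*a && *b) {
        if (ascii_fold((unsigned char)*a) != ascii_fold((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b; /* equal only if both strings ended together */
}

/*
 * Look up name in entries[0..n), which models a directory in on-disk order.
 * Pass 1 stands in for the exact-case htree search.
 * Pass 2 is the linear, case-insensitive fallback.
 * Returns the index of the matching entry, or -1 if there is none.
 */
static int ci_lookup(const char *const *entries, size_t n, const char *name)
{
    assert(entries != NULL && name != NULL);

    for (size_t i = 0; i < n; i++)
        if (strcmp(entries[i], name) == 0)
            return (int)i;

    for (size_t i = 0; i < n; i++)
        if (ascii_ci_equal(entries[i], name))
            return (int)i;

    return -1;
}
```

With a directory holding both "README" and "readme", an exact-case query finds the entry asked for, while a query such as "ReadMe" returns whichever of the two comes first in the array, which mirrors the ordering dependence noted in the last item.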
* Proposal
- Make insensitive lookups enabled on a per-directory basis via an
attribute.
- Support UTF-8 encoding in-kernel for case folding.
- The superblock (sb) will store the information required for Unicode
versioning and encoding.
- Empty directories are optimized with an insensitive hash for htree
lookups.
- Fall back to linear searches on directories that are not optimized.
[1] https://www.spinics.net/lists/linux-ext4/msg54279.html
--
Gabriel Krisman Bertazi
On Fri, Sep 01, 2017 at 02:57:44AM -0300, Gabriel Krisman Bertazi wrote:
>
> As a follow-up on the discussions held in this list after Ted's proposal
> on case-insensitive support for ext4 last year, I've implemented a my
> version of it to learn my way through the ext4 code, and to get started
> on the task. As my goal would be a slightly more complex approach, with
> at least UTF-8 support for lookups, I'd like to hear from you about my
> current implementation and the following proposal, as well as if there
> is anyone currently working on anything like this, so we could
> coordinate efforts.
>
> Regarding the Unicode proposal (shivers), I am aware of Ben and
> Olaf's proposal from 2014, and I plan to work on top of that to get it
> updated and upstream.
>
> Please, let me know your thoughts.
>
> * Current Implementation
>
> Simple learning experience to get the wheel spinning, based on Ted Ts'o
> initial proposal back in 2016 [1]. It supports ASCII-case folding only
> and doesn't rely on on-disk modifications.
This isn't a complete implementation of my proposal. In particular
one of the things which is missing is:
1. If case-insensitivity is enabled, override the default dcache hash
and compare operations to ones that are case insensitive in ext4's
dcache_operations structure.
This is needed so there is a single dcache entry for case-folded file
names.
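As a rough userspace analogy of the point above (not kernel code), the sketch below shows a small name cache whose hash and compare functions are pluggable, loosely modeled on the d_hash and d_compare callbacks of the kernel's dentry operations. All names here (name_ops, name_cache, cache_get and so on) are hypothetical. With the case-insensitive pair installed, "Foo" and "FOO" resolve to one cache entry; with the exact pair they produce two.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CACHE_BUCKETS 16
#define CACHE_MAX     32

/* Pluggable name operations; compare returns 0 when names match. */
struct name_ops {
    unsigned (*hash)(const char *name);
    int (*compare)(const char *a, const char *b);
};

struct cache_entry {
    const char *name;
    struct cache_entry *next;
};

struct name_cache {
    const struct name_ops *ops;
    struct cache_entry *buckets[CACHE_BUCKETS];
    struct cache_entry pool[CACHE_MAX];
    size_t used;
};

static unsigned char fold(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? (unsigned char)(c - 'A' + 'a') : c;
}

static unsigned hash_exact(const char *s)
{
    unsigned h = 5381;
    for (; *s; s++)
        h = h * 33 + (unsigned char)*s;
    return h;
}

static unsigned hash_ci(const char *s)
{
    unsigned h = 5381;
    for (; *s; s++)
        h = h * 33 + fold((unsigned char)*s);
    return h;
}

static int compare_exact(const char *a, const char *b)
{
    return strcmp(a, b);
}

static int compare_ci(const char *a, const char *b)
{
    while (*a && *b && fold((unsigned char)*a) == fold((unsigned char)*b)) {
        a++;
        b++;
    }
    return fold((unsigned char)*a) - fold((unsigned char)*b);
}

static const struct name_ops exact_ops = { hash_exact, compare_exact };
static const struct name_ops ci_ops = { hash_ci, compare_ci };

/* Return the cached entry matching name, adding one if absent.
 * Returns NULL only when the fixed-size pool is exhausted. */
static struct cache_entry *cache_get(struct name_cache *c, const char *name)
{
    assert(c != NULL && c->ops != NULL && name != NULL);

    unsigned b = c->ops->hash(name) % CACHE_BUCKETS;
    for (struct cache_entry *e = c->buckets[b]; e != NULL; e = e->next)
        if (c->ops->compare(e->name, name) == 0)
            return e;

    if (c->used == CACHE_MAX)
        return NULL;
    struct cache_entry *e = &c->pool[c->used++];
    e->name = name;
    e->next = c->buckets[b];
    c->buckets[b] = e;
    return e;
}
```

The hash and compare functions have to agree: if only the comparison were case-insensitive, "Foo" and "FOO" could hash to different buckets and never be compared against each other.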
- Ted
Theodore Ts'o writes:
Hi Ted,
> This isn't a complete implementation of my proposal. In particular
> one of the things which is missing is:
>
> 1. If case-insensitivity is enabled, override the default dcache hash
> and compare operations to ones that are case insensitive in ext4's
> dcache_operations structure.
>
> This is needed so there is a single dcache entry for case-folded file
> names.
Sorry for the delay in replying. In fact, the dcache hash operations
were part of my original patch, but I dropped them before submitting in
favor of d_add_ci(), which I expected would prevent the dentry cache
from holding duplicate entries for names that differ only by case.
I have shared it in a different branch if you want to take a look.
git://git.collabora.com/git/user/krisman/linux.git -b ext4-insensitive-dcache-patch
That said, I've been learning my way around the VFS subsystem,
investigating the suggestion you and HCH made in the thread I
mentioned:
> I talked to Christoph at the Plumbers Closing party, and he suggested
> that we get something simple in first which (a) assumes no on-disk
> format changes, (b) does everything in the VFS layer, by using a
> MS_CASE_FOLD, uses a case-insensitive dentry hash, and which degrades
> to a brute force search in the VFS by using readdir interfaces if the
> direct lookup does not succeed, and (c) at least initially assumes
> only ASCII.
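For reference, the behaviour in (b) can be approximated from userspace, which may help when reasoning about the semantics. The sketch below is only an assumption of how such a fallback behaves, not VFS code: it tries a direct lookup with lstat() and, if that fails, scans the directory with readdir() and compares names with strcasecmp(), which is ASCII-only in the default "C" locale. The function name ci_lookup_dir() is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>
#include <sys/stat.h>

/*
 * Find name in dirpath, ignoring case if the direct lookup fails.
 * On success, copies the name as stored in the directory into out
 * and returns 0. Returns -1 if nothing matches or a buffer is too small.
 */
static int ci_lookup_dir(const char *dirpath, const char *name,
                         char *out, size_t outlen)
{
    assert(dirpath != NULL && name != NULL && out != NULL);

    char path[4096];
    struct stat st;
    int n = snprintf(path, sizeof path, "%s/%s", dirpath, name);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;

    /* Direct lookup first: cheap when the case already matches. */
    if (lstat(path, &st) == 0) {
        if (strlen(name) >= outlen)
            return -1;
        strcpy(out, name);
        return 0;
    }

    /* Brute-force fallback: walk every entry and compare ignoring case. */
    DIR *d = opendir(dirpath);
    if (d == NULL)
        return -1;

    int ret = -1;
    struct dirent *de;
    while ((de = readdir(d)) != NULL) {
        if (strcasecmp(de->d_name, name) == 0 &&
            strlen(de->d_name) < outlen) {
            strcpy(out, de->d_name);
            ret = 0;
            break;
        }
    }
    closedir(d);
    return ret;
}
```

The cost of the fallback grows with the number of entries in the directory, and a failed lookup always pays for the full scan.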
My current question about this approach is how MS_CASE_FOLD could be
exposed to userspace. Not every system call can receive a new
flag to request an insensitive lookup. In this case, are you
considering a new set of system calls to perform case-insensitive
lookups, some per-process setting, or another approach I'm not considering?
Can you provide me with more information on this?
Thanks for helping out with reviewing my code.
--
Gabriel Krisman Bertazi