2005-09-20 18:48:38

by Blaisorblade

Subject: [PATCH 1/7] Add dm-snapshot tutorial in Documentation

I've recently discovered the real functionality of device-mapper snapshots,
and since they are not well known, I've decided to write some docs for
them.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[email protected]>
---

Documentation/device-mapper/snapshot.txt | 70 ++++++++++++++++++++++++++++++
1 files changed, 70 insertions(+), 0 deletions(-)

diff --git a/Documentation/device-mapper/snapshot.txt b/Documentation/device-mapper/snapshot.txt
new file mode 100644
--- /dev/null
+++ b/Documentation/device-mapper/snapshot.txt
@@ -0,0 +1,70 @@
+Device-mapper snapshot support
+==============================
+
+Device-mapper allows you, without massive data copying,
+
+*) to create snapshots of one block device (i.e. mountable, saved states of
+one block device, which are also writable without interfering with the
+original content),
+*) and to create device "forks", also called COW devices, i.e. multiple
+different versions of the same data stream.
+
+In both cases, dm copies only the changed data (actually, only the changed
+chunks).
+
+There are two available targets, snapshot (for the latter) and snapshot-origin
+(for the former).
+
+*) snapshot <origin> <cow space> <persistent?> <chunksize>
+
+A snapshot of the <origin> block device is created. Changed chunks of
+<chunksize> sectors will be stored on the <cow space> block device. Writes
+will only go to <cow space>; reads will come from <cow space>, or from
+<origin> for unchanged data. <cow space> will normally be smaller than the
+origin, so if too much data is written to the snapshot, it will start
+returning errors on write. However, you can always expand the snapshot later.
+
+<persistent?> is p (persistent) or n (not persistent, will not survive a
+reboot).
+For transient snapshots there is no need to save metadata on disk.
+
+*) snapshot-origin <origin>: <origin> must be a device-mapper block device,
+
+which will normally have one or more snapshots based on it. Reads will be
+mapped directly to the backing device; for each write, the original data
+will be saved in the "cow space" of each snapshot to keep their visible
+content unchanged, at least until the cow space fills up.
+
+How this is used at LVM level
+==============================
+When you create a LVM* snapshot of a volume, four dm devices are used:
+
+1) a device containing the original mapping table of the source volume;
+2) a device used as COW space;
+3) a "snapshot" device, combining #1 and #2, which is the visible snapshot
+ volume;
+4) the "original" volume (which keeps the old minor), whose table is replaced
+ by a "snapshot-origin" mapping from device #1.
+
+Fixed naming schemes are used, so with the following commands:
+
+lvcreate -L 1G -n base volumeGroup
+lvcreate -L 100M --snapshot -n snap volumeGroup/base
+
+we'll have this situation (with volumes in the above order):
+
+# dmsetup table|grep volumeGroup
+
+volumeGroup-base-real: 0 2097152 linear 8:19 384
+volumeGroup-snap-cow: 0 204800 linear 8:19 2097536
+volumeGroup-snap: 0 2097152 snapshot 254:11 254:12 P 16
+volumeGroup-base: 0 2097152 snapshot-origin 254:11
+
+# ll -L /dev/mapper/volumeGroup-*
+brw------- 1 root root 254, 11 29 ago 18:15 /dev/mapper/volumeGroup-base-real
+brw------- 1 root root 254, 12 29 ago 18:15 /dev/mapper/volumeGroup-snap-cow
+brw------- 1 root root 254, 13 29 ago 18:15 /dev/mapper/volumeGroup-snap
+brw------- 1 root root 254, 10 29 ago 18:14 /dev/mapper/volumeGroup-base
+
+* I've verified this with LVM 2.01.09, however I assume this is the LVM2 way
+ of doing this.


2005-09-20 18:49:16

by Blaisorblade

Subject: [PATCH 2/7] i386: little pgtable.h consolidation vs 2/3level

Join together some common functions (pmd_page{,_kernel}) over 2level and
3level pages.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[email protected]>
---

include/asm-i386/pgtable-2level.h | 5 -----
include/asm-i386/pgtable-3level.h | 5 -----
include/asm-i386/pgtable.h | 5 +++++
3 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/include/asm-i386/pgtable-2level.h b/include/asm-i386/pgtable-2level.h
--- a/include/asm-i386/pgtable-2level.h
+++ b/include/asm-i386/pgtable-2level.h
@@ -26,11 +26,6 @@
#define pfn_pte(pfn, prot) __pte(((pfn) << PAGE_SHIFT) | pgprot_val(prot))
#define pfn_pmd(pfn, prot) __pmd(((pfn) << PAGE_SHIFT) | pgprot_val(prot))

-#define pmd_page(pmd) (pfn_to_page(pmd_val(pmd) >> PAGE_SHIFT))
-
-#define pmd_page_kernel(pmd) \
-((unsigned long) __va(pmd_val(pmd) & PAGE_MASK))
-
/*
* All present user pages are user-executable:
*/
diff --git a/include/asm-i386/pgtable-3level.h b/include/asm-i386/pgtable-3level.h
--- a/include/asm-i386/pgtable-3level.h
+++ b/include/asm-i386/pgtable-3level.h
@@ -74,11 +74,6 @@ static inline void set_pte(pte_t *ptep,
*/
static inline void pud_clear (pud_t * pud) { }

-#define pmd_page(pmd) (pfn_to_page(pmd_val(pmd) >> PAGE_SHIFT))
-
-#define pmd_page_kernel(pmd) \
-((unsigned long) __va(pmd_val(pmd) & PAGE_MASK))
-
#define pud_page(pud) \
((struct page *) __va(pud_val(pud) & PAGE_MASK))

diff --git a/include/asm-i386/pgtable.h b/include/asm-i386/pgtable.h
--- a/include/asm-i386/pgtable.h
+++ b/include/asm-i386/pgtable.h
@@ -368,6 +368,11 @@ static inline pte_t pte_modify(pte_t pte
#define pte_offset_kernel(dir, address) \
((pte_t *) pmd_page_kernel(*(dir)) + pte_index(address))

+#define pmd_page(pmd) (pfn_to_page(pmd_val(pmd) >> PAGE_SHIFT))
+
+#define pmd_page_kernel(pmd) \
+((unsigned long) __va(pmd_val(pmd) & PAGE_MASK))
+
/*
* Helper function that returns the kernel pagetable entry controlling
* the virtual address 'address'. NULL means no pagetable entry present.

2005-09-20 18:48:37

by Blaisorblade

Subject: [PATCH 6/7] update stale comment for removal of page->list

From: Paolo 'Blaisorblade' Giarrusso <[email protected]>

Update comment for the 2.6.6-rc1 conversion from page->list and
address_space->{clean,dirty,locked}_pages to radix tree tagging and ->lru.

I've mostly avoided mentioning page lists (at least I've shortened the
comment).

CC: Hugh Dickins <[email protected]>
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[email protected]>
---

include/linux/mm.h | 9 ++++-----
1 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -350,7 +350,8 @@ static inline void put_page(struct page
* only one copy in memory, at most, normally.
*
* For the non-reserved pages, page_count(page) denotes a reference count.
- * page_count() == 0 means the page is free.
+ * page_count() == 0 means the page is free. page->lru is then used for
+ * freelist management in the buddy allocator.
* page_count() == 1 means the page is used for exactly one purpose
* (e.g. a private data page of one process).
*
@@ -376,10 +377,8 @@ static inline void put_page(struct page
* attaches, plus 1 if `private' contains something, plus one for
* the page cache itself.
*
- * All pages belonging to an inode are in these doubly linked lists:
- * mapping->clean_pages, mapping->dirty_pages and mapping->locked_pages;
- * using the page->list list_head. These fields are also used for
- * freelist managemet (when page_count()==0).
+ * Instead of keeping dirty/clean pages in per address-space lists, we now
+ * tag pages as dirty/under writeback in the radix tree.
*
* There is also a per-mapping radix tree mapping index to the page
* in memory if present. The tree is rooted at mapping->root.

2005-09-20 18:48:14

by Blaisorblade

Subject: [PATCH 3/7] fix locking comment in unmap_region()

From: Paolo 'Blaisorblade' Giarrusso <[email protected]>

That comment is plain wrong (we even take the pagetable lock inside
unmap_region()).

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[email protected]>
---

mm/mmap.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1640,7 +1640,7 @@ static void unmap_vma_list(struct mm_str
/*
* Get rid of page table information in the indicated region.
*
- * Called with the page table lock held.
+ * Called with the mm semaphore held.
*/
static void unmap_region(struct mm_struct *mm,
struct vm_area_struct *vma, struct vm_area_struct *prev,

2005-09-20 18:49:38

by Blaisorblade

Subject: [PATCH 4/7] README update from the stone age

We have no options which the user can set in the Makefile. Only the
EXTRAVERSION, which is also useful in place of the "backup modules"
suggestion.

Hey! Can anybody tell me when we last had configuration options in the top
Makefile? Please?

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[email protected]>
---

README | 9 ++++++---
1 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/README b/README
--- a/README
+++ b/README
@@ -149,6 +149,9 @@ CONFIGURING the kernel:
"make gconfig" X windows (Gtk) based configuration tool.
"make oldconfig" Default all questions based on the contents of
your existing ./.config file.
+ "make silentoldconfig"
+ Like above, but avoids cluttering the screen
+ with question already answered.

NOTES on "make config":
- having unnecessary drivers will make the kernel bigger, and can
@@ -169,9 +172,6 @@ CONFIGURING the kernel:
should probably answer 'n' to the questions for
"development", "experimental", or "debugging" features.

- - Check the top Makefile for further site-dependent configuration
- (default SVGA mode etc).
-
COMPILING the kernel:

- Make sure you have gcc 2.95.3 available.
@@ -199,6 +199,9 @@ COMPILING the kernel:
are installing a new kernel with the same version number as your
working kernel, make a backup of your modules directory before you
do a "make modules_install".
+ In alternative, before compiling, edit your Makefile and change the
+ "EXTRAVERSION" line - its content is appended to the regular kernel
+ version.

- In order to boot your new kernel, you'll need to copy the kernel
image (e.g. .../linux/arch/i386/boot/bzImage after compilation)

2005-09-20 18:48:14

by Blaisorblade

Subject: [PATCH 5/7] Clearify comment in swapfile.c

From: Paolo 'Blaisorblade' Giarrusso <[email protected]>

That comment is unclear enough (since there's no pte_wrprotect) that I
"fixed" it, and even Hugh, when rejecting my "fix", agreed on the code
being "mystifying". So here's a note on this.

CC: Hugh Dickins <[email protected]>
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[email protected]>
---

mm/swapfile.c | 7 ++++---
1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -399,9 +399,10 @@ void free_swap_and_cache(swp_entry_t ent

/*
* Always set the resulting pte to be nowrite (the same as COW pages
- * after one process has exited). We don't know just how many PTEs will
- * share this swap entry, so be cautious and let do_wp_page work out
- * what to do if a write is requested later.
+ * after one process has exited - so vma->vm_page_prot is already
+ * write-protected). We don't know just how many PTEs will share this
+ * swap entry, so be cautious and let do_wp_page work out what to do if
+ * a write is requested later.
*
* vma->vm_mm->page_table_lock is held.
*/

2005-09-20 18:48:38

by Blaisorblade

Subject: [PATCH 7/7] Add a note about partially hardcoded VM_* flags

From: Paolo 'Blaisorblade' Giarrusso <[email protected]>

Hugh made me notice this line for permission checking in mprotect():

if ((newflags & ~(newflags >> 4)) & 0xf) {

After figuring out what it is about, I decided it is nasty enough to need a
comment. Btw, Hugh himself didn't like the 0xf.

We can safely change it to VM_READ|VM_WRITE|VM_EXEC because we never change
VM_SHARED, so no need to check that.

CC: Hugh Dickins <[email protected]>
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[email protected]>
---

include/linux/mm.h | 1 +
mm/mprotect.c | 3 ++-
2 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -136,6 +136,7 @@ extern unsigned int kobjsize(const void
#define VM_EXEC 0x00000004
#define VM_SHARED 0x00000008

+/* mprotect() hardcodes VM_MAYREAD >> 4 == VM_READ, and likewise for w/x. */
#define VM_MAYREAD 0x00000010 /* limits for mprotect() etc */
#define VM_MAYWRITE 0x00000020
#define VM_MAYEXEC 0x00000040
diff --git a/mm/mprotect.c b/mm/mprotect.c
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -248,7 +248,8 @@ sys_mprotect(unsigned long start, size_t

newflags = vm_flags | (vma->vm_flags & ~(VM_READ | VM_WRITE | VM_EXEC));

- if ((newflags & ~(newflags >> 4)) & 0xf) {
+ /* newflags >> 4 shifts the VM_MAY% bits into the VM_% positions */
+ if ((newflags & ~(newflags >> 4)) & (VM_READ | VM_WRITE | VM_EXEC)) {
error = -EACCES;
goto out;
}

2005-09-20 22:14:06

by Nix

Subject: Re: [PATCH 1/7] Add dm-snapshot tutorial in Documentation

On 20 Sep 2005, Paolo Giarrusso docced:
> +When you create a LVM* snapshot of a volume, four dm devices are used:
[...]
> +* I've verified this with LVM 2.01.09, however I assume this is the LVM2 way
> + of doing this.

Yes; LVM1 doesn't use device-mapper at all, so these docs don't apply to
it.

--
`One cannot, after all, be expected to read every single word
of a book whose author one wishes to insult.' --- Richard Dawkins

2005-09-20 23:53:54

by Randy Dunlap

Subject: Re: [PATCH 4/7] README update from the stone age

On Tue, 20 Sep 2005, Paolo 'Blaisorblade' Giarrusso wrote:

> We have no options which the user can set in the Makefile. Only the
> EXTRAVERSION, which is also useful in place of the "backup modules"
> suggestion.
>
> Hey! Can anybody tell me when we last had configuration options in the top
> Makefile? Please?
>
> Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[email protected]>
> ---
>
> README | 9 ++++++---
> 1 files changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/README b/README
> --- a/README
> +++ b/README
> @@ -149,6 +149,9 @@ CONFIGURING the kernel:
> "make gconfig" X windows (Gtk) based configuration tool.
> "make oldconfig" Default all questions based on the contents of
> your existing ./.config file.
> + "make silentoldconfig"
> + Like above, but avoids cluttering the screen
> + with question already answered.
questions

>
> NOTES on "make config":
> - having unnecessary drivers will make the kernel bigger, and can
> @@ -169,9 +172,6 @@ CONFIGURING the kernel:
> should probably answer 'n' to the questions for
> "development", "experimental", or "debugging" features.
>
> - - Check the top Makefile for further site-dependent configuration
> - (default SVGA mode etc).
> -
> COMPILING the kernel:
>
> - Make sure you have gcc 2.95.3 available.
> @@ -199,6 +199,9 @@ COMPILING the kernel:
> are installing a new kernel with the same version number as your
> working kernel, make a backup of your modules directory before you
> do a "make modules_install".
> + In alternative, before compiling, edit your Makefile and change the
Alternatively,

> + "EXTRAVERSION" line - its content is appended to the regular kernel
> + version.
Or consider using CONFIG_LOCALVERSION, which can be set by
using the "make *config" tools in the "General Setup" menu.

>
> - In order to boot your new kernel, you'll need to copy the kernel
> image (e.g. .../linux/arch/i386/boot/bzImage after compilation)

--
~Randy

2005-09-21 15:20:29

by Blaisorblade

Subject: Re: [PATCH 1/7] Add dm-snapshot tutorial in Documentation

On Wednesday 21 September 2005 00:13, Nix wrote:
> On 20 Sep 2005, Paolo Giarrusso docced:
> > +When you create a LVM* snapshot of a volume, four dm devices are used:
>
> [...]
>
> > +* I've verified this with LVM 2.01.09, however I assume this is the LVM2 way
> > + of doing this.

> Yes; LVM1 doesn't use device-mapper at all, so these docs don't apply to
> it.
I really meant "I assume that all LVM2 releases work this way".
--
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade

2005-09-21 16:11:28

by Nix

Subject: Re: [PATCH 1/7] Add dm-snapshot tutorial in Documentation

On Wed, 21 Sep 2005, [email protected] spake:
> On Wednesday 21 September 2005 00:13, Nix wrote:
>> On 20 Sep 2005, Paolo Giarrusso docced:
>> > +When you create a LVM* snapshot of a volume, four dm devices are used:
>>
>> [...]
>>
>> > +* I've verified this with LVM 2.01.09, however I assume this is the LVM2 way
>> > + of doing this.
>
>> Yes; LVM1 doesn't use device-mapper at all, so these docs don't apply to
>> it.
> I really meant "I assume that all LVM2 releases work this way".

As far as I know they do, modulo bugs, although if you go back far enough
device-mapper doesn't have support for snapshots at all.

--
`One cannot, after all, be expected to read every single word
of a book whose author one wishes to insult.' --- Richard Dawkins