git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] index: be careful when handling long names
  2008-01-13 19:39           ` Junio C Hamano
@ 2008-01-13 22:36             ` Junio C Hamano
  2008-01-13 22:53               ` Alex Riesen
  0 siblings, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2008-01-13 22:36 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List

We currently use lower 12-bit (masked with CE_NAMEMASK) in the
ce_flags field to store the length of the name in cache_entry,
without checking the length parameter given to
create_ce_flags().  This can make us store incorrect length.

Currently we are mostly protected by the fact that many
codepaths first copy the path in a variable of size PATH_MAX,
which typically is 4096 that happens to match the limit, but
that feels like a bug waiting to happen.  Besides, that would
not allow us to shorten the width of CE_NAMEMASK to use the bits
for new flags.

This redefines the meaning of the name length stored in the
cache_entry.  A name that does not fit is represented by storing
CE_NAMEMASK in the field, and the actual length needs to be
computed by actually counting the bytes in the name[] field.
This way, only the unusually long paths need to suffer.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 Junio C Hamano <gitster@pobox.com> writes:

 > About the CE_NAMEMASK limitation (and currently we do not check
 > it, so I think we would be screwed when a pathname that is
 > longer than (CE_NAMEMASK+1) and still fits under PATH_MAX is
 > given), I think we do not have to limit the maximum pathname
 > length.  Instead we can teach create_ce_flags() and ce_namelen()
 > that a name longer than 2k (or 4k) has the NAMEMASK bits that
 > are all 1 and ce->name[] must be counted if so (with an obvious
 > optimization to start counting at byte position 2k or 4k in
 > ce_namelen()).

 This should fix it.  Passes tests including the new test that
 the existing code fails.

 cache.h          |   17 +++++++++++++++--
 t/t0000-basic.sh |   18 ++++++++++++++++++
 2 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/cache.h b/cache.h
index 39331c2..ad53acc 100644
--- a/cache.h
+++ b/cache.h
@@ -114,8 +114,21 @@ struct cache_entry {
 #define CE_VALID     (0x8000)
 #define CE_STAGESHIFT 12
 
-#define create_ce_flags(len, stage) htons((len) | ((stage) << CE_STAGESHIFT))
-#define ce_namelen(ce) (CE_NAMEMASK & ntohs((ce)->ce_flags))
+static inline unsigned create_ce_flags(size_t len, unsigned stage)
+{
+	if (len >= CE_NAMEMASK)
+		len = CE_NAMEMASK;
+	return htons(len | (stage << CE_STAGESHIFT));
+}
+
+static inline size_t ce_namelen(const struct cache_entry *ce)
+{
+	size_t len = ntohs((ce)->ce_flags) & CE_NAMEMASK;
+	if (len < CE_NAMEMASK) /* likely */
+		return len;
+	return strlen(ce->name + CE_NAMEMASK) + CE_NAMEMASK;
+}
+
 #define ce_size(ce) cache_entry_size(ce_namelen(ce))
 #define ce_stage(ce) ((CE_STAGEMASK & ntohs((ce)->ce_flags)) >> CE_STAGESHIFT)
 
diff --git a/t/t0000-basic.sh b/t/t0000-basic.sh
index 4e49d59..40551a3 100755
--- a/t/t0000-basic.sh
+++ b/t/t0000-basic.sh
@@ -297,4 +297,22 @@ test_expect_success 'absolute path works as expected' '
 	test "$sym" = "$(test-absolute-path $dir2/syml)"
 '
 
+test_expect_success 'very long name in the index handled sanely' '
+
+	a=a && # 1
+	a=$a$a$a$a$a$a$a$a$a$a$a$a$a$a$a$a && # 16
+	a=$a$a$a$a$a$a$a$a$a$a$a$a$a$a$a$a && # 256
+	a=$a$a$a$a$a$a$a$a$a$a$a$a$a$a$a$a && # 4096
+	a=${a}q
+
+	>path4 &&
+	git update-index --add path4 &&
+	(
+		git ls-files -s path4 |
+		sed -e "s/	.*/	/" |
+		tr -d "\012"
+		echo "$a"
+	) | git update-index --index-info
+'
+
 test_done

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] index: be careful when handling long names
  2008-01-13 22:36             ` [PATCH] index: be careful when handling long names Junio C Hamano
@ 2008-01-13 22:53               ` Alex Riesen
  2008-01-13 23:08                 ` Junio C Hamano
  0 siblings, 1 reply; 7+ messages in thread
From: Alex Riesen @ 2008-01-13 22:53 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, Git Mailing List

Junio C Hamano, Sun, Jan 13, 2008 23:36:34 +0100:
> +test_expect_success 'very long name in the index handled sanely' '
> +
> +	a=a && # 1
> +	a=$a$a$a$a$a$a$a$a$a$a$a$a$a$a$a$a && # 16
> +	a=$a$a$a$a$a$a$a$a$a$a$a$a$a$a$a$a && # 256
> +	a=$a$a$a$a$a$a$a$a$a$a$a$a$a$a$a$a && # 4096

I'd expect it to fail on some systems (everywindowsthing up to w2k,
maybe some commercial unices).

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] index: be careful when handling long names
  2008-01-13 22:53               ` Alex Riesen
@ 2008-01-13 23:08                 ` Junio C Hamano
  2008-01-13 23:33                   ` Alex Riesen
  0 siblings, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2008-01-13 23:08 UTC (permalink / raw)
  To: Alex Riesen; +Cc: Linus Torvalds, Git Mailing List

Alex Riesen <raa.lkml@gmail.com> writes:

> Junio C Hamano, Sun, Jan 13, 2008 23:36:34 +0100:
>> +test_expect_success 'very long name in the index handled sanely' '
>> +
>> +	a=a && # 1
>> +	a=$a$a$a$a$a$a$a$a$a$a$a$a$a$a$a$a && # 16
>> +	a=$a$a$a$a$a$a$a$a$a$a$a$a$a$a$a$a && # 256
>> +	a=$a$a$a$a$a$a$a$a$a$a$a$a$a$a$a$a && # 4096
>
> I'd expect it to fail on some systems (everywindowsthing up to w2k,
> maybe some commercial unices).

My understanding is that Everywindowsthing do not come with any
(POSIX compliant) shell that we support by default, so if you
are talking about a limit of shell variable value, I do not
think it is an issue to begin with.  It is just the matter of
picking a sensible shell (I understand both Cygwin and msys
ports use a shell that supports more than 4k bytes in value
given to a variable).

I would agree that it might overflow the argument limit when
this is given to "echo", though.  We cannot do much about it,
but you may have cleverer ideas.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] index: be careful when handling long names
  2008-01-13 23:08                 ` Junio C Hamano
@ 2008-01-13 23:33                   ` Alex Riesen
  2008-01-14 21:03                     ` Junio C Hamano
  0 siblings, 1 reply; 7+ messages in thread
From: Alex Riesen @ 2008-01-13 23:33 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, Git Mailing List

Junio C Hamano, Mon, Jan 14, 2008 00:08:07 +0100:
> Alex Riesen <raa.lkml@gmail.com> writes:
> 
> > Junio C Hamano, Sun, Jan 13, 2008 23:36:34 +0100:
> >> +test_expect_success 'very long name in the index handled sanely' '
> >> +
> >> +	a=a && # 1
> >> +	a=$a$a$a$a$a$a$a$a$a$a$a$a$a$a$a$a && # 16
> >> +	a=$a$a$a$a$a$a$a$a$a$a$a$a$a$a$a$a && # 256
> >> +	a=$a$a$a$a$a$a$a$a$a$a$a$a$a$a$a$a && # 4096
> >
> > I'd expect it to fail on some systems (everywindowsthing up to w2k,
> > maybe some commercial unices).
> 
> My understanding is that Everywindowsthing do not come with any
> (POSIX compliant) shell that we support by default, so if you
> are talking about a limit of shell variable value, I do not
> think it is an issue to begin with.  It is just the matter of

Oh, right. The file system wont even see it, it is passed directly to
update-index.

> picking a sensible shell (I understand both Cygwin and msys
> ports use a shell that supports more than 4k bytes in value
> given to a variable).

can't check right now, but I believe it is so

> I would agree that it might overflow the argument limit when
> this is given to "echo", though.  We cannot do much about it,
> but you may have cleverer ideas.

I thought about conditionally disabling the test, like it was done
when the tabs in filenames had to be tested. Wont be needed for this
particular case.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] index: be careful when handling long names
  2008-01-13 23:33                   ` Alex Riesen
@ 2008-01-14 21:03                     ` Junio C Hamano
  0 siblings, 0 replies; 7+ messages in thread
From: Junio C Hamano @ 2008-01-14 21:03 UTC (permalink / raw)
  To: Alex Riesen; +Cc: Linus Torvalds, Git Mailing List

Alex Riesen <raa.lkml@gmail.com> writes:

> Junio C Hamano, Mon, Jan 14, 2008 00:08:07 +0100:
> ...
>> I would agree that it might overflow the argument limit when
>> this is given to "echo", though.  We cannot do much about it,
>> but you may have cleverer ideas.
>
> I thought about conditionally disabling the test, like it was done
> when the tabs in filenames had to be tested.

Yes, we can and should do that when somebody reports this (or
any other tests) a real issue, as we have done for other tests.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH] index: be careful when handling long names
  2008-01-19  3:25 Updated in-memory index cleanup Linus Torvalds
@ 2008-01-19  7:42 ` Junio C Hamano
  0 siblings, 0 replies; 7+ messages in thread
From: Junio C Hamano @ 2008-01-19  7:42 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List

We currently use lower 12-bit (masked with CE_NAMEMASK) in the
ce_flags field to store the length of the name in cache_entry,
without checking the length parameter given to
create_ce_flags().  This can make us store incorrect length.

Currently we are mostly protected by the fact that many
codepaths first copy the path in a variable of size PATH_MAX,
which typically is 4096 that happens to match the limit, but
that feels like a bug waiting to happen.  Besides, that would
not allow us to shorten the width of CE_NAMEMASK to use the bits
for new flags.

This redefines the meaning of the name length stored in the
cache_entry.  A name that does not fit is represented by storing
CE_NAMEMASK in the field, and the actual length needs to be
computed by actually counting the bytes in the name[] field.
This way, only the unusually long paths need to suffer.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 * This is rebased on your "Updated in-memory index" patch.

   When we read on-disk index, I fear that we trust the name
   length a bit too much.  With or without this patch, you could
   craft a malicious index file that records a namelen that is
   longer than the real string name length for the last entry to
   cause us to go beyond the mmaped area.  Maybe we would want
   to make sure that (1) the name lengths are sane; (2) names
   have NUL at the place we expect them to be; and (3) names are
   sorted in the proper cache order, while we iterate over the
   ondisk structure and convert it to incore structure.

 cache.h          |   17 +++++++++++++++--
 read-cache.c     |   12 +++++++++++-
 t/t0000-basic.sh |   20 ++++++++++++++++++++
 3 files changed, 46 insertions(+), 3 deletions(-)

diff --git a/cache.h b/cache.h
index 4a054c5..9eaffde 100644
--- a/cache.h
+++ b/cache.h
@@ -131,8 +131,21 @@ struct cache_entry {
 #define CE_UPDATE    (0x10000)
 #define CE_REMOVE    (0x20000)
 
-#define create_ce_flags(len, stage) ((len) | ((stage) << CE_STAGESHIFT))
-#define ce_namelen(ce) (CE_NAMEMASK & (ce)->ce_flags)
+static inline unsigned create_ce_flags(size_t len, unsigned stage)
+{
+	if (len >= CE_NAMEMASK)
+		len = CE_NAMEMASK;
+	return (len | (stage << CE_STAGESHIFT));
+}
+
+static inline size_t ce_namelen(const struct cache_entry *ce)
+{
+	size_t len = ce->ce_flags & CE_NAMEMASK;
+	if (len < CE_NAMEMASK)
+		return len;
+	return strlen(ce->name + CE_NAMEMASK) + CE_NAMEMASK;
+}
+
 #define ce_size(ce) cache_entry_size(ce_namelen(ce))
 #define ondisk_ce_size(ce) ondisk_cache_entry_size(ce_namelen(ce))
 #define ce_stage(ce) ((CE_STAGEMASK & (ce)->ce_flags) >> CE_STAGESHIFT)
diff --git a/read-cache.c b/read-cache.c
index f5f9c3d..f98f57c 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -927,6 +927,8 @@ int read_index(struct index_state *istate)
 
 static void convert_from_disk(struct ondisk_cache_entry *ondisk, struct cache_entry *ce)
 {
+	size_t len;
+
 	ce->ce_ctime = ntohl(ondisk->ctime.sec);
 	ce->ce_mtime = ntohl(ondisk->mtime.sec);
 	ce->ce_dev   = ntohl(ondisk->dev);
@@ -938,7 +940,15 @@ static void convert_from_disk(struct ondisk_cache_entry *ondisk, struct cache_en
 	/* On-disk flags are just 16 bits */
 	ce->ce_flags = ntohs(ondisk->flags);
 	hashcpy(ce->sha1, ondisk->sha1);
-	memcpy(ce->name, ondisk->name, ce_namelen(ce)+1);
+
+	len = ce->ce_flags & CE_NAMEMASK;
+	if (len == CE_NAMEMASK)
+		len = strlen(ondisk->name);
+	/*
+	 * NEEDSWORK: If the original index is crafted, this copy could
+	 * go unchecked.
+	 */
+	memcpy(ce->name, ondisk->name, len + 1);
 }
 
 /* remember to discard_cache() before reading a different cache! */
diff --git a/t/t0000-basic.sh b/t/t0000-basic.sh
index 4e49d59..9f84b8d 100755
--- a/t/t0000-basic.sh
+++ b/t/t0000-basic.sh
@@ -297,4 +297,24 @@ test_expect_success 'absolute path works as expected' '
 	test "$sym" = "$(test-absolute-path $dir2/syml)"
 '
 
+test_expect_success 'very long name in the index handled sanely' '
+
+	a=a && # 1
+	a=$a$a$a$a$a$a$a$a$a$a$a$a$a$a$a$a && # 16
+	a=$a$a$a$a$a$a$a$a$a$a$a$a$a$a$a$a && # 256
+	a=$a$a$a$a$a$a$a$a$a$a$a$a$a$a$a$a && # 4096
+	a=${a}q &&
+
+	>path4 &&
+	git update-index --add path4 &&
+	(
+		git ls-files -s path4 |
+		sed -e "s/	.*/	/" |
+		tr -d "\012"
+		echo "$a"
+	) | git update-index --index-info &&
+	len=$(git ls-files "a*" | wc -c) &&
+	test $len = 4098
+'
+
 test_done
-- 
1.5.4.rc3.37.gfdcf3

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH] index: be careful when handling long names
@ 2008-07-20  0:51 Junio C Hamano
  0 siblings, 0 replies; 7+ messages in thread
From: Junio C Hamano @ 2008-07-20  0:51 UTC (permalink / raw)
  To: git; +Cc: Petr Baudis

This is a follow-up to 7fec10b (index: be careful when handling long
names, 2008-01-18) where we fixed handling of index entries with long
names.  There still remained a few remaining unsafe codepaths.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin-apply.c        |    2 +-
 builtin-update-index.c |    2 +-
 read-cache.c           |    2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/builtin-apply.c b/builtin-apply.c
index e15471b..f2377fe 100644
--- a/builtin-apply.c
+++ b/builtin-apply.c
@@ -2758,7 +2758,7 @@ static void add_index_file(const char *path, unsigned mode, void *buf, unsigned
 	ce = xcalloc(1, ce_size);
 	memcpy(ce->name, path, namelen);
 	ce->ce_mode = create_ce_mode(mode);
-	ce->ce_flags = namelen;
+	ce->ce_flags = create_ce_flags(namelen, 0);
 	if (S_ISGITLINK(mode)) {
 		const char *s = buf;
 
diff --git a/builtin-update-index.c b/builtin-update-index.c
index 38eb53c..003cb98 100644
--- a/builtin-update-index.c
+++ b/builtin-update-index.c
@@ -95,7 +95,7 @@ static int add_one_path(struct cache_entry *old, const char *path, int len, stru
 	size = cache_entry_size(len);
 	ce = xcalloc(1, size);
 	memcpy(ce->name, path, len);
-	ce->ce_flags = len;
+	ce->ce_flags = create_ce_flags(len, 0);
 	fill_stat_cache_info(ce, st);
 	ce->ce_mode = ce_mode_from_stat(old, st->st_mode);
 
diff --git a/read-cache.c b/read-cache.c
index 1648428..2dd8131 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -498,7 +498,7 @@ int add_to_index(struct index_state *istate, const char *path, struct stat *st,
 	size = cache_entry_size(namelen);
 	ce = xcalloc(1, size);
 	memcpy(ce->name, path, namelen);
-	ce->ce_flags = namelen;
+	ce->ce_flags = create_ce_flags(namelen, 0);
 	fill_stat_cache_info(ce, st);
 
 	if (trust_executable_bit && has_symlinks)

^ permalink raw reply related	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2008-07-20  0:52 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-07-20  0:51 [PATCH] index: be careful when handling long names Junio C Hamano
  -- strict thread matches above, loose matches on Subject: below --
2008-01-19  3:25 Updated in-memory index cleanup Linus Torvalds
2008-01-19  7:42 ` [PATCH] index: be careful when handling long names Junio C Hamano
2008-01-12 22:46 performance problem: "git commit filename" Linus Torvalds
2008-01-13  1:46 ` Linus Torvalds
2008-01-13  4:04   ` Linus Torvalds
2008-01-13  8:12     ` Junio C Hamano
2008-01-13 10:33       ` Junio C Hamano
2008-01-13 17:24         ` Linus Torvalds
2008-01-13 19:39           ` Junio C Hamano
2008-01-13 22:36             ` [PATCH] index: be careful when handling long names Junio C Hamano
2008-01-13 22:53               ` Alex Riesen
2008-01-13 23:08                 ` Junio C Hamano
2008-01-13 23:33                   ` Alex Riesen
2008-01-14 21:03                     ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).