git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Michael Haggerty <mhagger@alum.mit.edu>
Cc: Jeff King <peff@peff.net>, Johan Herland <johan@herland.net>,
	Thomas Rast <trast@inf.ethz.ch>,
	git@vger.kernel.org
Subject: Re: [PATCH v2 11/25] object_array_remove_duplicates(): rewrite to reduce copying
Date: Wed, 29 May 2013 09:18:05 -0700	[thread overview]
Message-ID: <7vk3mhwyiq.fsf@alter.siamese.dyndns.org> (raw)
In-Reply-To: <1369472904-12875-12-git-send-email-mhagger@alum.mit.edu> (Michael Haggerty's message of "Sat, 25 May 2013 11:08:10 +0200")

Michael Haggerty <mhagger@alum.mit.edu> writes:

> The old version copied one entry to its destination position, then
> deleted any matching entries from the tail of the array.  This
> required the tail of the array to be copied multiple times.  It didn't
> affect the complexity of the algorithm because the whole tail has to
> be searched through anyway.  But all the copying was unnecessary.
>
> Instead, check for the existence of an entry with the same name in the
> *head* of the list before copying an entry to its final position.
> This way each entry has to be copied at most one time.
>
> Extract a helper function contains_name() to do a bit of the work.
>
> Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
> ---
>  object.c | 32 +++++++++++++++++++++-----------
>  object.h |  6 +++++-
>  2 files changed, 26 insertions(+), 12 deletions(-)
>
> diff --git a/object.c b/object.c
> index fcd4a82..10b5349 100644
> --- a/object.c
> +++ b/object.c
> @@ -294,22 +294,32 @@ void object_array_filter(struct object_array *array,
>  	array->nr = dst;
>  }
>  
> +/*
> + * Return true iff array already contains an entry with name.
> + */
> +static int contains_name(struct object_array *array, const char *name)
> +{
> +	unsigned nr = array->nr, i;
> +	struct object_array_entry *object = array->objects;
> +
> +	for (i = 0; i < nr; i++, object++)
> +		if (!strcmp(object->name, name))
> +			return 1;
> +	return 0;
> +}

Because some codepaths (e.g. patch 14/25) stuff NULL in the name
field, we may want to be more careful with this.

This is not a new problem, and I think the longer term solution is
to get rid of object_array_remove_duplicates(), so it is perfectly
fine to leave this function broken with respect to NULL input as-is.

The only caller of remove-duplicates is bundle.c, which gets many
starting points and end points from the command line and tries to be
nice by removing obvious duplicates, e.g.

	git bundle create t.bundle master master

but I think its logic of deduping is wrong.  It runs dwim_ref() on
the incoming refs after the remove-duplicates call, so

	git bundle create t.bundle master heads/mater

will end up with two copies of refs/heads/master.  To fix it, the
code must dedup the result of running dwim_ref(), and at that point,
there is no reason to call object_array_remove_duplicates().

> +
>  void object_array_remove_duplicates(struct object_array *array)
>  {
> -	unsigned int ref, src, dst;
> +	unsigned nr = array->nr, src;
>  	struct object_array_entry *objects = array->objects;
>  
> -	for (ref = 0; ref + 1 < array->nr; ref++) {
> -		for (src = ref + 1, dst = src;
> -		     src < array->nr;
> -		     src++) {
> -			if (!strcmp(objects[ref].name, objects[src].name))
> -				continue;
> -			if (src != dst)
> -				objects[dst] = objects[src];
> -			dst++;
> +	array->nr = 0;
> +	for (src = 0; src < nr; src++) {
> +		if (!contains_name(array, objects[src].name)) {
> +			if (src != array->nr)
> +				objects[array->nr] = objects[src];
> +			array->nr++;
>  		}
> -		array->nr = dst;
>  	}
>  }

>  
> diff --git a/object.h b/object.h
> index 0d39ff4..6c1c27f 100644
> --- a/object.h
> +++ b/object.h
> @@ -96,7 +96,11 @@ typedef int (*object_array_each_func_t)(struct object_array_entry *, void *);
>  void object_array_filter(struct object_array *array,
>  			 object_array_each_func_t want, void *cb_data);
>  
> -void object_array_remove_duplicates(struct object_array *);
> +/*
> + * Remove from array all but the first entry with a given name.
> + * Warning: this function uses an O(N^2) algorithm.
> + */
> +void object_array_remove_duplicates(struct object_array *array);
>  
>  void clear_object_flags(unsigned flags);

  reply	other threads:[~2013-05-29 16:18 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-05-25  9:07 [PATCH v2 00/25] Remove assumptions about each_ref_fn arg lifetimes Michael Haggerty
2013-05-25  9:08 ` [PATCH v2 01/25] describe: make own copy of refname Michael Haggerty
2013-05-25  9:08 ` [PATCH v2 02/25] fetch: make own copies of refnames Michael Haggerty
2013-05-25  9:08 ` [PATCH v2 03/25] add_rev_cmdline(): make a copy of the name argument Michael Haggerty
2013-05-25  9:08 ` [PATCH v2 04/25] builtin_diff_tree(): make it obvious that function wants two entries Michael Haggerty
2013-05-25  9:08 ` [PATCH v2 05/25] cmd_diff(): use an object_array for holding trees Michael Haggerty
2013-05-25  9:08 ` [PATCH v2 06/25] cmd_diff(): rename local variable "list" -> "entry" Michael Haggerty
2013-05-25  9:08 ` [PATCH v2 07/25] cmd_diff(): make it obvious which cases are exclusive of each other Michael Haggerty
2013-05-25  9:08 ` [PATCH v2 08/25] revision: split some overly-long lines Michael Haggerty
2013-05-25  9:08 ` [PATCH v2 09/25] object_array: add function object_array_filter() Michael Haggerty
2013-05-25  9:08 ` [PATCH v2 10/25] revision: use object_array_filter() in implementation of gc_boundary() Michael Haggerty
2013-05-25  9:08 ` [PATCH v2 11/25] object_array_remove_duplicates(): rewrite to reduce copying Michael Haggerty
2013-05-29 16:18   ` Junio C Hamano [this message]
2013-05-30 21:14     ` Michael Haggerty
2013-06-02 21:02       ` Junio C Hamano
2013-05-25  9:08 ` [PATCH v2 12/25] fsck: don't put a void*-shaped peg in a char*-shaped hole Michael Haggerty
2013-05-25  9:08 ` [PATCH v2 13/25] find_first_merges(): initialize merges variable using initializer Michael Haggerty
2013-05-25  9:08 ` [PATCH v2 14/25] find_first_merges(): remove unnecessary code Michael Haggerty
2013-05-25  9:08 ` [PATCH v2 15/25] object_array_entry: fix memory handling of the name field Michael Haggerty
2013-05-29 16:24   ` Junio C Hamano
2013-05-25  9:08 ` [PATCH v2 16/25] do_fetch(): reduce scope of peer_item Michael Haggerty
2013-05-25  9:08 ` [PATCH v2 17/25] do_fetch(): clean up existing_refs before exiting Michael Haggerty
2013-05-25  9:08 ` [PATCH v2 18/25] add_existing(): do not retain a reference to sha1 Michael Haggerty
2013-05-25  9:08 ` [PATCH v2 19/25] show_head_ref(): do not shadow name of argument Michael Haggerty
2013-05-25  9:08 ` [PATCH v2 20/25] show_head_ref(): rename first parameter to "refname" Michael Haggerty
2013-05-25  9:08 ` [PATCH v2 21/25] string_list_add_one_ref(): " Michael Haggerty
2013-05-25  9:08 ` [PATCH v2 22/25] string_list_add_refs_by_glob(): add a comment about memory management Michael Haggerty
2013-05-29  8:21   ` Thomas Rast
2013-05-30 19:29     ` Michael Haggerty
2013-05-30 22:05     ` [PATCH v2 FIXUP 22/25] fixup! " Michael Haggerty
2013-06-03 15:31       ` Junio C Hamano
2013-05-25  9:08 ` [PATCH v2 23/25] exclude_existing(): set existing_refs.strdup_strings Michael Haggerty
2013-05-25  9:08 ` [PATCH v2 24/25] register_ref(): make a copy of the bad reference SHA-1 Michael Haggerty
2013-05-29 16:53   ` Junio C Hamano
2013-05-30 21:51     ` Michael Haggerty
2013-05-30 22:09       ` Philip Oakley
2013-05-25  9:08 ` [PATCH v2 25/25] refs: document the lifetime of the args passed to each_ref_fn Michael Haggerty
2013-05-29 16:54   ` Junio C Hamano
2013-05-29  8:25 ` [PATCH v2 00/25] Remove assumptions about each_ref_fn arg lifetimes Thomas Rast
2013-05-30 19:55   ` Michael Haggerty

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7vk3mhwyiq.fsf@alter.siamese.dyndns.org \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=johan@herland.net \
    --cc=mhagger@alum.mit.edu \
    --cc=peff@peff.net \
    --cc=trast@inf.ethz.ch \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).