git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Brandon Casey <bcasey@nvidia.com>
Cc: git@vger.kernel.org, mfick@codeaurora.org, gitster@pobox.com,
	Brandon Casey <drafnel@gmail.com>
Subject: Re: [PATCH] remote.c: avoid O(n^2) behavior in match_push_refs by using string_list
Date: Wed, 3 Jul 2013 02:23:32 -0400	[thread overview]
Message-ID: <20130703062332.GA16090@sigill.intra.peff.net> (raw)
In-Reply-To: <1372809228-2963-1-git-send-email-bcasey@nvidia.com>

On Tue, Jul 02, 2013 at 04:53:48PM -0700, Brandon Casey wrote:

> From: Brandon Casey <drafnel@gmail.com>
> 
> When pushing, each ref in the local repository must be paired with a
> ref advertised by the remote server.  Currently, this is performed by
> first applying the refspec to the local ref to transform the local ref
> into the name of the remote ref, and then performing a linear search
> through the list of remote refs to see if the remote ref was advertised
> by the remote system.
> 
> This has O(n) complexity and makes match_push_refs() be an O(n^2)
> operation.

Just to be sure I understand correctly, is this actually O(m*n) where
"m" is the number of local refs and "n" is the number of remote refs?

For a repository that repeatedly pushes everything it has to the remote,
we end up with m=n, but it would not necessarily be the case if you are
pushing a subset of your refs. But even pushing a small number of refs
into a repository with a very large number of refs would be
unnecessarily slow, as we would do several O(n) lookups which could be
O(log n). So it may speed things up even in the case of a normal-sized
repo pushing to a large one.

> Dry-run push of a repository with 121913 refs:
> 
>         before     after
> real    1m40.582s  0m0.804s
> user    1m39.914s  0m0.515s
> sys     0m0.125s   0m0.106s

Very nice. :)

> Signed-off-by: Brandon Casey <drafnel@gmail.com>
> ---
>  remote.c | 26 ++++++++++++++++++++++++--
>  1 file changed, 24 insertions(+), 2 deletions(-)

Patch itself looks good to me, although...

> @@ -1362,6 +1378,8 @@ int match_push_refs(struct ref *src, struct ref **dst,
>  		free(dst_name);
>  	}
>  
> +	string_list_clear(&ref_list, 0);
> +
>  	if (flags & MATCH_REFS_FOLLOW_TAGS)
>  		add_missing_tags(src, dst, &dst_tail);
>  
> @@ -1376,11 +1394,15 @@ int match_push_refs(struct ref *src, struct ref **dst,
>  
>  			src_name = get_ref_match(rs, nr_refspec, ref, send_mirror, FROM_DST, NULL);
>  			if (src_name) {
> -				if (!find_ref_by_name(src, src_name))
> +				if (!ref_list.nr)
> +					prepare_searchable_ref_list(src,
> +						&ref_list);
> +				if (!string_list_has_string(&ref_list, src_name))

This hunk threw me for a bit, as it looked like we were lazily
initializing ref_list in case we had not done so earlier. But we would
have cleared it mid-way through the function (in the hunk above), and it
is only that we are reusing the same ref_list for two different
purposes.

I do not feel strongly about it, but it might be a little more obvious
to just declare a new variable in the block, like:

diff --git a/remote.c b/remote.c
index 75255af..53bef82 100644
--- a/remote.c
+++ b/remote.c
@@ -1399,6 +1399,7 @@ int match_push_refs(struct ref *src, struct ref **dst,
 		add_missing_tags(src, dst, &dst_tail);
 
 	if (send_prune) {
+		struct string_list src_ref_index = STRING_LIST_INIT_NODUP;
 		/* check for missing refs on the remote */
 		for (ref = *dst; ref; ref = ref->next) {
 			char *src_name;
@@ -1409,15 +1410,15 @@ int match_push_refs(struct ref *src, struct ref **dst,
 
 			src_name = get_ref_match(rs, nr_refspec, ref, send_mirror, FROM_DST, NULL);
 			if (src_name) {
-				if (!ref_list.nr)
+				if (!src_ref_index.nr)
 					prepare_searchable_ref_list(src,
-						&ref_list);
-				if (!string_list_has_string(&ref_list, src_name))
+						&src_ref_index);
+				if (!string_list_has_string(&src_ref_index, src_name))
 					ref->peer_ref = alloc_delete_ref();
 				free(src_name);
 			}
 		}
-		string_list_clear(&ref_list, 0);
+		string_list_clear(&src_ref_index, 0);
 	}
 	if (errs)
 		return -1;

And similarly maybe call the outer ref_list dst_ref_index or something.
I also note that we don't do the lazy-prepare for the other loop. I
guess that is because we assume that "src" is always non-NULL?

-Peff

  reply	other threads:[~2013-07-03  6:23 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-07-02 23:53 [PATCH] remote.c: avoid O(n^2) behavior in match_push_refs by using string_list Brandon Casey
2013-07-03  6:23 ` Jeff King [this message]
2013-07-03 18:12   ` Brandon Casey
2013-07-03 18:40     ` Junio C Hamano
2013-07-03 19:00       ` Jeff King
2013-07-03 20:05         ` Brandon Casey
2013-07-03 19:21       ` Brandon Casey
2013-07-03 20:22         ` Junio C Hamano
2013-07-08  7:02           ` [PATCH v2] remote.c: avoid O(m*n) behavior in match_push_refs Brandon Casey
2013-07-08  7:50             ` Jeff King
2013-07-08  8:58               ` [PATCH v2 w/prune index] " Brandon Casey
2013-07-08 16:12                 ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130703062332.GA16090@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=bcasey@nvidia.com \
    --cc=drafnel@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=mfick@codeaurora.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).