git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Heiko Voigt <git-list@hvoigt.net>
To: Michael Haggerty <mhagger@alum.mit.edu>
Cc: Junio C Hamano <gitster@pobox.com>,
	ydirson@altern.org, git@vger.kernel.org
Subject: [CVSPS PATCH v2] fix: correct rev order in case commiters clocks were not syncronised
Date: Wed, 18 Mar 2009 18:34:03 +0100	[thread overview]
Message-ID: <49C1308B.7070509@hvoigt.net> (raw)
In-Reply-To: <49B52F74.1090006@alum.mit.edu>

This fixes the following kind of cvs issues:

 Structure of the test cvs repository

 Message   File:Content         Commit Time
 Rev 1     a: 1.1               2009-02-21 19:11:43 +0100
 Rev 2     a: 1.2    b: 1.1     2009-02-21 19:11:14 +0100
 Rev 3               b: 1.2     2009-02-21 19:11:43 +0100

 As you can see the commit of Rev 3 has the same time as
 Rev 1 this was leading to a broken estimation of patchset
 order.

The correction only applies to the main development line and specifically
ignores branches. The reason behind this is that a complete implementation
covering the correction of branches needs much more work.

I do not consider this patch a very "clean" bugfix. It is somewhat hacky
but may help someone who otherwise would not be able to use cvsps. It
additionally can help to find broken revisions and manually correct the
input files.

Signed-off-by: Heiko Voigt <hvoigt@hvoigt.net>
---
I am currently looking into a more clean solution which does not fidle
with the timestamps but only corrects the ordering of patchsets. To
implement this cvsps needs to be extended to use a topological sort
algorithm instead of a simple merge sort. Maybe Michael can give me some
directions how this was solved in cvs2svn.

My current idea is about using a modified selection sort. Sorting in
chronological order. It should always select the minimum date as the
next patchset as long as it does not conflict with the minimum last
chosen revision of all affected files. If it conflicts another pass will
select the patchset with the minimum revision for that file. This means
sorting time will be O(n^2), another drawback is the space required to
remember the last revision of every file and its branches.

Assuming that this can be implemented in my limited time I will attempt
to do so.

Here are a few explanations about this updated patch:

Michael Haggerty schrieb:
> 
> I am not familiar with the cvsps code, so I will just make some comments
> about things that it is not obvious from your patch that you have
> considered.  These things all caused problems in pre-2.0 versions of
> cvs2svn:
> 
> 1. It is not clear from the patch in what order the revisions are being
> processed.  If they are being processed in the order that they appear in
> the RCS file, then you have to consider branches:
> 
>    * The date adjustment should only occur along chains of revisions
> that are "causally related" -- that is, adjacent revisions on trunk, or
> adjacent revisions on a branch, or the first revision on a branch
> relative to the revision from which the branch sprouted.  It is not
> always the case that revisions that are adjacent in the RCS file are
> causally related.
> 
>    * The revisions along trunk appear in RCS files in reverse
> chronological order; e.g., 1.3, 1.2, 1.1 (this seems to be the case you
> handle).  But the revisions along a branch appear in chronological
> order; e.g., 1.3.2.1, 1.3.2.2, 1.3.2.3.  Do you handle both cases
> correctly?  (A unit test involving revisions on branches would be helpful.)

The input which is parsed by cvsps comes from the cvs log command. There
everything appears in reverse chronological order. Branches always come
at the end of the log so they are not timely ordered among the main
development line.

I adressed this issue by only applying the correction to the mainline.
This is not complete but solves the most practical special case.

A complete implementation needs much more work because in addition to
the last revision of the main line the last revision of every branch
would need to be remembered.

> 2. One form of clock skew that is common in CVS repositories is that
> some computer's CMOS battery went dead and the clock reverted to 1970
> after every reboot.  Given that you adjust revisions' times only towards
> the past, then such a glitch would force the times of all earlier
> revisions to be pushed back to 1970.  (Since you unconditionally
> subtract one second from each commit timestamp, this could also
> conceivably cause an underflow to 2038, but this is admittedly rather
> unlikely.)  This is a hard problem to solve generally.  But if you want
> to handle this problem more robustly, I suggest that you always adjust
> times towards the future, as incorrect clock times in the far future
> seem to be less common in practice.
> 
> Of course these clock skew corrections, if only applied to one file at a
> time, can easily cause changesets to be broken up if the time deltas
> exceed five minutes.

I addressed this by checking for time_t underflow before correction.
cvps will exit with an error message in case it happens. Due to the
reverse chronological order of the input I am only able to see one step
into the future but not into the past. Thats why I can only correct into
the past.

> 
> 3. When cvsps collects individual file revisions into changesets (within
> the 5 minute window), a single "consensus" commit time has to be chosen
> from all of the single-file commits.  Depending on how cvsps does this,
> it could be that the consensus commit times for two commits involving
> revisions within a single file are put back out of order (undoing your
> timestamp fixup).  It would be nice to verify that this does not result
> in out-of-order commits.

On file revisions cvsps corrects using this patch it will probably
happen that one change is broken into multiple patchsets. Because cvsps
issues warnings for every revision time it is changing the user can 
manually repair these rcs files.

cvsps keeps the minimum time of any member as the "consensus" time. So
the time correction backwards in time can result in out of order
commits. So it is not ensured that all commits compile. But it should be
ensured that all revisions of single files appear in the correct order.

As stated in the commit message this is not a "clean" solution but can
help to clean the imported repository and get correct revision ordering
on the main development line.

 cvsps.c |   53 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 53 insertions(+), 0 deletions(-)

diff --git a/cvsps.c b/cvsps.c
index 81c6e21..1e72969 100644
--- a/cvsps.c
+++ b/cvsps.c
@@ -259,6 +259,45 @@ int main(int argc, char *argv[])
     exit(0);
 }
 
+void detect_and_repair_time_skew(const char *last_date, char *date, int n,
+                                 PatchSetMember *psm)
+{
+
+    time_t smaller;
+    time_t bigger;
+    struct tm *ts;
+    char *rev_end;
+
+    /* if last_date does not exist do nothing */
+    if (last_date[0] == '\0')
+        return;
+
+    /* TODO: repairing of branch times, skipping them for the moment */
+    /* check whether rev is of the form /1.[0-9]+/ */
+    if (psm->post_rev->rev[0] != '1' || psm->post_rev->rev[1] != '.')
+        return;
+    strtol(psm->post_rev->rev+2, &rev_end, 10);
+    if (*rev_end != '\0')
+        return;
+
+    /* important: because rlog is showing revisions backwards last_date should
+     * always be bigger than date */
+    convert_date(&bigger, last_date);
+    convert_date(&smaller, date);
+
+    if (difftime(bigger, smaller) <= 0) {
+        debug(DEBUG_APPMSG1, "broken revision date: %s -> %s file: %s, repairing.\n",
+              date, last_date, psm->file->filename);
+        if (!(bigger > 0)) {
+            debug(DEBUG_APPERROR, "timestamp underflow, exiting ... ");
+            exit(1);
+        }
+        smaller = bigger - 1;
+        ts = gmtime(&smaller);
+        strftime(date, n, "%Y-%m-%d %H:%M:%S", ts);
+    }
+}
+
 static void load_from_cvs()
 {
     FILE * cvsfp;
@@ -267,6 +306,7 @@ static void load_from_cvs()
     CvsFile * file = NULL;
     PatchSetMember * psm = NULL;
     char datebuff[20];
+    char last_datebuff[20];
     char authbuff[AUTH_STR_MAX];
     int logbufflen = LOG_STR_MAX + 1;
     char * logbuff = malloc(logbufflen);
@@ -334,6 +374,8 @@ static void load_from_cvs()
 	exit(1);
     }
 
+    /* initialize the last_datebuff with value indicating invalid date */
+    last_datebuff[0]='\0';
     for (;;)
     {
 	char * tst;
@@ -474,8 +516,14 @@ static void load_from_cvs()
 	    {
 		if (psm)
 		{
+		    detect_and_repair_time_skew(last_datebuff, datebuff, 20, psm);
 		    PatchSet * ps = get_patch_set(datebuff, logbuff, authbuff, psm->post_rev->branch, psm);
 		    patch_set_add_member(ps, psm);
+
+		    /* remember last revision */
+		    strncpy(last_datebuff, datebuff, 20);
+		    /* just to be sure */
+		    last_datebuff[19] = '\0';
 		}
 
 		logbuff[0] = 0;
@@ -487,8 +535,13 @@ static void load_from_cvs()
 	    {
 		if (psm)
 		{
+		    detect_and_repair_time_skew(last_datebuff, datebuff, 20, psm);
 		    PatchSet * ps = get_patch_set(datebuff, logbuff, authbuff, psm->post_rev->branch, psm);
 		    patch_set_add_member(ps, psm);
+
+		    /* just finished the last revision of this file, set last_datebuff to invalid */
+		    last_datebuff[0]='\0';
+
 		    assign_pre_revision(psm, NULL);
 		}
 
-- 
1.6.1.2.390.gba743

      parent reply	other threads:[~2009-03-18 17:35 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-02-23 18:49 [PATCH] cvsimport: add test illustrating a bug in cvsps Heiko Voigt
2009-02-23 20:35 ` Heiko Voigt
2009-02-24  5:00 ` Michael Haggerty
2009-03-02 17:59   ` [PATCH v2 0/1] " Heiko Voigt
2009-03-02 17:59   ` [PATCH v2 1/1] " Heiko Voigt
2009-03-09 11:26     ` [CVSPS PATCH] fix: correct rev order in case commiters clocks were not syncronised Heiko Voigt
2009-03-09 15:02       ` Michael Haggerty
2009-03-18 17:33         ` [PATCH v3 0/2] cvsimport: add test illustrating a bug in cvsps Heiko Voigt
2009-03-18 18:22           ` Junio C Hamano
2009-03-19 10:41             ` Michael J Gruber
2009-03-19 11:00               ` Johannes Schindelin
2009-03-19 11:22                 ` Michael J Gruber
2009-03-21  5:41               ` Michael Haggerty
2009-03-23 18:11                 ` started a cvsps testsuite Was: " Heiko Voigt
2009-03-23 19:06                   ` Martin Langhoff
2009-03-24  4:50                   ` Michael Haggerty
2009-04-06 19:01                     ` Heiko Voigt
2009-03-23 17:47             ` Heiko Voigt
2009-03-18 17:33         ` [PATCH v3 1/2] " Heiko Voigt
2009-03-18 17:33         ` [PATCH v3 2/2] cvsimport: extend testcase about patchset order to contain branches Heiko Voigt
2009-03-18 17:34         ` Heiko Voigt [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=49C1308B.7070509@hvoigt.net \
    --to=git-list@hvoigt.net \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=mhagger@alum.mit.edu \
    --cc=ydirson@altern.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).