git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
To: "René Scharfe" <l.s.r@web.de>
Cc: git@vger.kernel.org, "Junio C Hamano" <gitster@pobox.com>,
	"Torsten Bögershausen" <tboegi@web.de>
Subject: Re: [PATCH v5 0/8] Introduce timestamp_t for timestamps
Date: Wed, 26 Apr 2017 00:22:00 +0200 (CEST)	[thread overview]
Message-ID: <alpine.DEB.2.20.1704260005310.3480@virtualbox> (raw)
In-Reply-To: <7a2271ff-1386-18a6-5f6d-7eb13dc92509@web.de>

[-- Attachment #1: Type: text/plain, Size: 9771 bytes --]

Hi René,

On Tue, 25 Apr 2017, René Scharfe wrote:

> Am 24.04.2017 um 15:57 schrieb Johannes Schindelin:
> > Git v2.9.2 was released in a hurry to accomodate for platforms like
> > Windows, where the `unsigned long` data type is 32-bit even for 64-bit
> > setups.
> > 
> > The quick fix was to simply disable all the testing with "absurd"
> > future dates.
> > 
> > However, we can do much better than that, as we already make use of
> > 64-bit data types internally. There is no good reason why we should
> > not use the same for timestamps. Hence, let's use uintmax_t for
> > timestamps.
> > 
> > Note: while the `time_t` data type exists and is meant to be used for
> > timestamps, on 32-bit Linux it is *still* 32-bit. An earlier iteration
> > used `time_t` for that reason, but it came with a few serious
> > downsides: as `time_t` can be signed (and indeed, on Windows it is an
> > int64_t), Git's expectation that 0 is the minimal value does no longer
> > hold true, introducing its own set of interesting challenges. Besides,
> > if we *can* handle far in the future timestamps (except for formatting
> > them using the system libraries), it is more consistent to do so.
> 
> time_t is signed on Linux and BSDs as well.

s/is/happens to be/.

The point is: we must not rely on time_t to be signed just because it
*happens* to be the case on the setups to which we have access. We want
Git to be portable, not only "portable to our own setups".

> Using an unsigned type gives us the ability to represent times beyond
> the 292 billion years in the future that int64_t would give us, but
> prevents recording events that occurred before the Epoch.  That doesn't
> sound like a good deal to me -- storing historical works (e.g. law
> texts) with real time stamps is probably more interesting than fixing
> the year 292277026596 problem within this decade.

It sounds like a good deal to me, if the alternative is that *I* have to
patch Git's source code to support signed timestamps, when *I* am probably
the only one in this entire thread who does not even want them.

So could y'all please just stop talking about signed timestamps to me? I
feel that they are really, really start to irritate the hell out of me.

> > The upside of using `uintmax_t` for timestamps is that we do a much
> > better job to support far in the future timestamps across all
> > platforms, including 32-bit ones. The downside is that those platforms
> > that use a 32-bit `time_t` will barf when parsing or formatting those
> > timestamps.
> 
> IIUC this series has two aims: solving the year 2038 problem on 32-bit
> Linux by replacing time_t (int32_t), and solving the year 2106 problem
> on Windows by replacing unsigned long (uint32_t), right?

No. The series has one aim: to stop using `unsigned long` (which is
ill-defined to begin with) for timestamps.

> The latter one sounds more interesting, because 32-bit platforms would
> still be unable to fully use bigger time values as you wrote above.
> 
> Can we leave time_t alone and just do the part where you replace
> unsigned long with timestamp_t defined as uint64_t?  That should already
> help on Windows, correct?  When/if timestamp_t is later changed to a
> signed type then we could easily convert the time_t cases to timestamp_t
> as well, or the other way around.

This patch series leaves time_t alone already, so your wish has been
fulfilled preemptively.

> > This iteration makes the date_overflows() check more stringent again.
> > 
> > It is arguably a bug to paper over too-large author/committer dates and
> > to replace them with Jan 1 1970 without even telling the user that we do
> > that, but this is the behavior that t4212 verifies, so I reinstated that
> > behavior. The change in behavior was missed because of the missing
> > unsigned_add_overflows() test.
> 
> I can't think of many ways to get future time stamps (broken clock,
> broken CMOS battery, bit rot, time travel), so I wouldn't expect a
> change towards better error reporting to affect a lot of users.  (Not
> necessarily as part of this series, of course.)

If you want to suggest that we should stop verifying overflows when a
complex reasoning can prove that the overflow is not happening in a
billion years, I disagree. Not only is it unnecessarily time-consuming to
ask readers to perform the complex reasoning, and not only is there enough
room for bugs to hide in plain sight (because of the complexity), it also
makes the same code harder to reuse in other software where a different
timestamp data type was chosen (or inherited from previous Git versions).

I'd much rather have easy-to-reason code that does not cause head
scratching (like the "why do we ignore a too large timestamp?" triggering
`if (date_overflows(date)) date = 0;`) than pretending to be smart and
clever and make everybody else feel stupid by forcing them through hoops
of thinking bubbles until they also reached the conclusion that this
actually won't happen. Unless there is a bug in the code.

> >   Documentation/technical/api-parse-options.txt |   8 +-
> >   archive-tar.c                                 |   5 +-
> >   archive-zip.c                                 |  12 ++-
> >   archive.h                                     |   2 +-
> >   builtin/am.c                                  |   4 +-
> >   builtin/blame.c                               |  14 ++--
> >   builtin/fsck.c                                |   6 +-
> >   builtin/gc.c                                  |   2 +-
> >   builtin/log.c                                 |   4 +-
> >   builtin/merge-base.c                          |   2 +-
> >   builtin/name-rev.c                            |   6 +-
> >   builtin/pack-objects.c                        |   4 +-
> >   builtin/prune.c                               |   4 +-
> >   builtin/receive-pack.c                        |  14 ++--
> >   builtin/reflog.c                              |  24 +++---
> >   builtin/rev-list.c                            |   2 +-
> >   builtin/rev-parse.c                           |   2 +-
> >   builtin/show-branch.c                         |   4 +-
> >   builtin/worktree.c                            |   4 +-
> >   bundle.c                                      |   4 +-
> >   cache.h                                       |  14 ++--
> >   commit.c                                      |  18 ++--
> >   commit.h                                      |   2 +-
> >   config.c                                      |   2 +-
> >   credential-cache--daemon.c                    |  12 +--
> >   date.c                                        | 113
> >   ++++++++++++++------------
> >   fetch-pack.c                                  |   8 +-
> >   fsck.c                                        |   2 +-
> >   git-compat-util.h                             |   9 ++
> >   http-backend.c                                |   4 +-
> >   parse-options-cb.c                            |   4 +-
> >   pretty.c                                      |   4 +-
> >   reachable.c                                   |   9 +-
> >   reachable.h                                   |   4 +-
> >   ref-filter.c                                  |  22 ++---
> >   reflog-walk.c                                 |   8 +-
> >   refs.c                                        |  14 ++--
> >   refs.h                                        |   8 +-
> >   refs/files-backend.c                          |   8 +-
> >   revision.c                                    |   6 +-
> >   revision.h                                    |   4 +-
> >   sha1_name.c                                   |   6 +-
> >   t/helper/test-date.c                          |  18 ++--
> >   t/helper/test-parse-options.c                 |   4 +-
> >   t/helper/test-ref-store.c                     |   4 +-
> >   t/t0006-date.sh                               |   4 +-
> >   t/t5000-tar-tree.sh                           |   6 +-
> >   t/test-lib.sh                                 |   3 +
> >   tag.c                                         |   6 +-
> >   tag.h                                         |   2 +-
> >   upload-pack.c                                 |   8 +-
> >   vcs-svn/fast_export.c                         |   8 +-
> >   vcs-svn/fast_export.h                         |   4 +-
> >   vcs-svn/svndump.c                             |   2 +-
> >   wt-status.c                                   |   2 +-
> >   55 files changed, 260 insertions(+), 219 deletions(-)
> 
> How did you find all the pieces of code that need to be touched?

Pain and suffering.

Seriosly, for v1 of this patch series, I went painstakingly through `git
grep -Ovi "unsigned long"`. I determined for every single use of `unsigned
long` whether it referred to a timestamp and changed it accordingly.

In subsequent iterations, I went the cheaper route of compiling with
DEVELOPER=1 on Windows and once I even went through replacing the 64-bit
libraries and compiler/linker in my Linux VM with 32-bit ones to imitate
the 32-bit Linux Travis coordinate (because I failed to get the Docker
setup to run in the VM). These build runs identified new callers of
functions whose signature I had to change to avoid `unsigned long` for
timestamps.

This was no fun.

> Is there a regex or something that can be used to spot new such places
> that sneak in, e.g. through in-flight merges?

No, a regex would only work if we already had converted all `unsigned
long` uses to semantically meaningful data types.

Ciao,
Dscho

P.S.: Please remove the quoted interdiff when there is no reason to keep
it around.

  reply	other threads:[~2017-04-25 22:22 UTC|newest]

Thread overview: 113+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-02-27 21:30 [PATCH 0/6] Use time_t Johannes Schindelin
2017-02-27 21:30 ` [PATCH 1/6] t0006 & t5000: prepare for 64-bit time_t Johannes Schindelin
2017-02-27 22:55   ` Junio C Hamano
2017-02-27 21:30 ` [PATCH 2/6] Specify explicitly where we parse timestamps Johannes Schindelin
2017-02-27 22:37   ` Junio C Hamano
2017-02-27 22:51     ` Junio C Hamano
2017-02-28 10:49       ` Johannes Schindelin
2017-02-27 21:31 ` [PATCH 3/6] Introduce a new "printf format" for timestamps Johannes Schindelin
2017-03-01 18:20   ` Junio C Hamano
2017-03-01 19:53     ` Junio C Hamano
2017-02-27 21:31 ` [PATCH 4/6] Prepare for timestamps to use 64-bit signed types Johannes Schindelin
2017-02-27 21:31 ` [PATCH 5/6] ref-filter: avoid using `unsigned long` for catch-all data type Johannes Schindelin
2017-02-27 21:31 ` [PATCH 6/6] Use time_t where appropriate Johannes Schindelin
2017-02-27 22:48 ` [PATCH 0/6] Use time_t Junio C Hamano
2017-02-28 11:32   ` Johannes Schindelin
2017-02-28 14:28 ` Jeff King
2017-02-28 15:01   ` Johannes Schindelin
2017-02-28 16:38   ` René Scharfe
2017-02-28 18:55     ` Junio C Hamano
2017-02-28 20:04       ` Jeff King
2017-02-28 20:54       ` Johannes Schindelin
2017-02-28 21:31         ` Jeff King
2017-02-28 21:31         ` René Scharfe
2017-02-28 23:10           ` Johannes Schindelin
2017-03-01  0:59             ` René Scharfe
2017-02-28 17:26   ` Junio C Hamano
2017-02-28 20:01     ` Jeff King
2017-02-28 22:27       ` Junio C Hamano
2017-02-28 22:33         ` Jeff King
2017-03-01 17:23           ` Junio C Hamano
2017-04-02 19:06 ` [PATCH v2 0/8] Introduce timestamp_t for timestamps Johannes Schindelin
2017-04-02 19:06   ` [PATCH v2 1/8] ref-filter: avoid using `unsigned long` for catch-all data type Johannes Schindelin
2017-04-03  4:22     ` Torsten Bögershausen
2017-04-03 22:47       ` Johannes Schindelin
2017-04-02 19:06   ` [PATCH v2 2/8] t0006 & t5000: prepare for 64-bit timestamps Johannes Schindelin
2017-04-02 19:06   ` [PATCH v2 3/8] t0006 & t5000: skip "far in the future" test when time_t is too limited Johannes Schindelin
2017-04-02 19:06   ` [PATCH v2 4/8] Specify explicitly where we parse timestamps Johannes Schindelin
2017-04-03  4:26     ` Torsten Bögershausen
2017-04-03 22:50       ` Johannes Schindelin
2017-04-02 19:06   ` [PATCH v2 5/8] Introduce a new "printf format" for timestamps Johannes Schindelin
2017-04-02 19:06   ` [PATCH v2 6/8] Introduce a new data type " Johannes Schindelin
2017-04-02 19:07   ` [PATCH v2 7/8] Abort if the system time cannot handle one of our timestamps Johannes Schindelin
2017-04-02 19:07   ` [PATCH v2 8/8] Use uintmax_t for timestamps Johannes Schindelin
2017-04-20 20:52   ` [PATCH v3 0/8] Introduce timestamp_t " Johannes Schindelin
2017-04-20 20:52     ` [PATCH v3 1/8] ref-filter: avoid using `unsigned long` for catch-all data type Johannes Schindelin
2017-04-20 20:52     ` [PATCH v3 2/8] t0006 & t5000: prepare for 64-bit timestamps Johannes Schindelin
2017-04-20 20:58     ` [PATCH v3 3/8] t0006 & t5000: skip "far in the future" test when time_t is too limited Johannes Schindelin
2017-04-20 20:58     ` [PATCH v3 4/8] Specify explicitly where we parse timestamps Johannes Schindelin
2017-04-20 20:58     ` [PATCH v3 5/8] Introduce a new "printf format" for timestamps Johannes Schindelin
2017-04-20 20:58     ` [PATCH v3 6/8] Introduce a new data type " Johannes Schindelin
2017-04-20 20:58     ` [PATCH v3 7/8] Abort if the system time cannot handle one of our timestamps Johannes Schindelin
2017-04-20 20:59     ` [PATCH v3 8/8] Use uintmax_t for timestamps Johannes Schindelin
2017-04-21  6:05     ` [PATCH v3 0/8] Introduce timestamp_t " Junio C Hamano
2017-04-21 10:44       ` Johannes Schindelin
2017-04-21 10:45     ` [PATCH v4 0/9] " Johannes Schindelin
2017-04-21 10:45       ` [PATCH v4 1/9] ref-filter: avoid using `unsigned long` for catch-all data type Johannes Schindelin
2017-04-21 10:45       ` [PATCH v4 2/9] t0006 & t5000: prepare for 64-bit timestamps Johannes Schindelin
2017-04-21 10:45       ` [PATCH v4 3/9] t0006 & t5000: skip "far in the future" test when time_t is too limited Johannes Schindelin
2017-04-21 10:45       ` [PATCH v4 4/9] Specify explicitly where we parse timestamps Johannes Schindelin
2017-04-24  3:19         ` Junio C Hamano
2017-04-21 10:45       ` [PATCH v4 5/9] Introduce a new "printf format" for timestamps Johannes Schindelin
2017-04-21 10:45       ` [PATCH v4 6/9] Introduce a new data type " Johannes Schindelin
2017-04-21 10:45       ` [PATCH v4 7/9] Abort if the system time cannot handle one of our timestamps Johannes Schindelin
2017-04-24  3:16         ` Junio C Hamano
2017-04-24 13:57           ` Johannes Schindelin
2017-04-25  2:37             ` Junio C Hamano
2017-04-25  3:56             ` Junio C Hamano
2017-04-21 10:46       ` [PATCH v4 8/9] Use uintmax_t for timestamps Johannes Schindelin
2017-04-24  3:24         ` Junio C Hamano
2017-04-24 10:28           ` Johannes Schindelin
2017-04-25  3:59             ` Junio C Hamano
2017-04-25 20:10               ` Johannes Schindelin
2017-04-26  1:52                 ` Junio C Hamano
2017-04-26  3:45                   ` Junio C Hamano
2017-04-26  9:32                   ` Johannes Schindelin
2017-04-26 13:18                     ` Junio C Hamano
2017-04-21 10:46       ` [PATCH v4 9/9] show_date_ident(): defer date overflow check Johannes Schindelin
2017-04-24  3:29       ` [PATCH v4 0/9] Introduce timestamp_t for timestamps Junio C Hamano
2017-04-24  6:15         ` Jacob Keller
2017-04-24 14:02           ` Johannes Schindelin
2017-04-24 11:37         ` Jeff King
2017-04-25 20:13           ` Johannes Schindelin
2017-04-24 14:00         ` Johannes Schindelin
2017-04-24 13:57       ` [PATCH v5 0/8] " Johannes Schindelin
2017-04-24 13:57         ` [PATCH v5 1/8] ref-filter: avoid using `unsigned long` for catch-all data type Johannes Schindelin
2017-04-24 13:57         ` [PATCH v5 2/8] t0006 & t5000: prepare for 64-bit timestamps Johannes Schindelin
2017-04-24 13:58         ` [PATCH v5 3/8] t0006 & t5000: skip "far in the future" test when time_t is too limited Johannes Schindelin
2017-04-24 13:58         ` [PATCH v5 4/8] Specify explicitly where we parse timestamps Johannes Schindelin
2017-04-25  5:59           ` Junio C Hamano
2017-04-24 13:58         ` [PATCH v5 5/8] Introduce a new "printf format" for timestamps Johannes Schindelin
2017-04-24 13:58         ` [PATCH v5 6/8] Introduce a new data type " Johannes Schindelin
2017-04-26 16:43           ` Johannes Sixt
2017-04-26 19:18             ` Johannes Schindelin
2017-04-26 22:32           ` René Scharfe
2017-04-24 13:58         ` [PATCH v5 7/8] Abort if the system time cannot handle one of our timestamps Johannes Schindelin
2017-04-24 13:58         ` [PATCH v5 8/8] Use uintmax_t for timestamps Johannes Schindelin
2017-04-26 16:36           ` Johannes Sixt
2017-04-26 19:09             ` Johannes Schindelin
2017-04-25 21:54         ` [PATCH v5 0/8] Introduce timestamp_t " René Scharfe
2017-04-25 22:22           ` Johannes Schindelin [this message]
2017-04-26 22:09             ` René Scharfe
2017-04-26  1:56           ` Junio C Hamano
2017-04-26 19:20         ` [PATCH v6 " Johannes Schindelin
2017-04-26 19:26           ` [PATCH v6 1/8] ref-filter: avoid using `unsigned long` for catch-all data type Johannes Schindelin
2017-04-26 19:26           ` [PATCH v6 2/8] t0006 & t5000: prepare for 64-bit timestamps Johannes Schindelin
2017-04-26 19:26           ` [PATCH v6 3/8] t0006 & t5000: skip "far in the future" test when time_t is too limited Johannes Schindelin
2017-04-26 19:26           ` [PATCH v6 4/8] Specify explicitly where we parse timestamps Johannes Schindelin
2017-04-26 19:29           ` [PATCH v6 5/8] Introduce a new "printf format" for timestamps Johannes Schindelin
2017-04-26 19:29           ` [PATCH v6 6/8] Introduce a new data type " Johannes Schindelin
2017-05-20  5:47             ` [PATCH] name-rev: change a "long" variable to timestamp_t Junio C Hamano
2017-05-22 13:39               ` Johannes Schindelin
2017-04-26 19:29           ` [PATCH v6 7/8] Abort if the system time cannot handle one of our timestamps Johannes Schindelin
2017-04-26 19:29           ` [PATCH v6 8/8] Use uintmax_t for timestamps Johannes Schindelin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.DEB.2.20.1704260005310.3480@virtualbox \
    --to=johannes.schindelin@gmx.de \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=l.s.r@web.de \
    --cc=tboegi@web.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).