git@vger.kernel.org mailing list mirror (one of many)
* Contributor Summit Topics and Logistics
@ 2019-01-22  7:50 Jeff King
  2019-01-22  8:26 ` Jeff King
                   ` (4 more replies)
  0 siblings, 5 replies; 19+ messages in thread
From: Jeff King @ 2019-01-22  7:50 UTC (permalink / raw)
  To: git

The Git Merge Contributor Summit is a little over a week away. If you're
interested in coming but haven't signed up, please do! We have a few
spaces available still. Details are in the previous announcement:

  http://public-inbox.org/git/20181206094805.GA1398@sigill.intra.peff.net/

There's no set agenda; we'll decide what to discuss that day. But if
anybody would like to mention topics they are interested in (whether you
want to present on them, or just have an open discussion), please do so
here. A little advance notice can help people prepare more for the
discussions.

Even if you're not coming, please feel free to suggest topics (but bonus
points if you convince somebody who _is_ coming to lead the session).

If you're not coming, you can probably stop reading this message now.
The rest is all logistics.

We have the room available from 9am-5pm. Breakfast will be served from
9am-10am in the main area (i.e., mingling with workshop attendees and
other such ruffians). So I'd suggest showing up around 9am to get
registered and mingle, and then we can start the Very Serious Business
at 10am.

Lunch will be provided at noon, and some snacks in the afternoon.
There's no organized dinner. However, there will be a social/drinks
event for the broader conference at 7pm; I'll provide more details that
day.

For people who want to try to join remotely, I don't think we're going
to have a particularly fancy AV setup. But there should at least be a
big screen (which we typically do not really use for presenting), and I
hope we can provide some connectivity. I'll be visiting the venue the
day before (Jan 30th) in the late afternoon (Brussels time) and I'll try
to do a test run. If anybody wants to volunteer to be the guinea pig on
the other end of the line, I'd welcome it.

The physical setup this year will actually be 4 round tables, instead of
one giant table. I'm hoping this will facilitate breaking off into
sub-groups and having more intimate conversations, and maybe avoid the
"it's hard to hear people at the other end of the table" issues. Or
maybe it will just make it worse as we shout to each other from all four
tables. I can't wait to see!

If you have any other questions or ideas, please share them here (or
email me off-list if appropriate). I look forward to seeing people
there!

-Peff


* Re: Contributor Summit Topics and Logistics
  2019-01-22  7:50 Contributor Summit Topics and Logistics Jeff King
@ 2019-01-22  8:26 ` Jeff King
  2019-01-22  9:17   ` GSoC 2019 (was: Contributor Summit Topics and Logistics) Christian Couder
  2019-01-22 18:21   ` Contributor Summit Topics and Logistics Stefan Beller
  2019-01-22 18:23 ` Derrick Stolee
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 19+ messages in thread
From: Jeff King @ 2019-01-22  8:26 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller, Christian Couder

On Tue, Jan 22, 2019 at 02:50:27AM -0500, Jeff King wrote:

> There's no set agenda; we'll decide what to discuss that day. But if
> anybody would like to mention topics they are interested in (whether you
> want to present on them, or just have an open discussion), please do so
> here. A little advance notice can help people prepare more for the
> discussions.

One topic worth discussing (here or there): the GSoC org deadline is Feb
6th. Last year's org admins were Christian and Stefan (cc'd). Are you
both interested and able to continue?

-Peff


* GSoC 2019 (was: Contributor Summit Topics and Logistics)
  2019-01-22  8:26 ` Jeff King
@ 2019-01-22  9:17   ` Christian Couder
  2019-01-31  2:02     ` SZEDER Gábor
  2019-01-22 18:21   ` Contributor Summit Topics and Logistics Stefan Beller
  1 sibling, 1 reply; 19+ messages in thread
From: Christian Couder @ 2019-01-22  9:17 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Stefan Beller

On Tue, Jan 22, 2019 at 9:26 AM Jeff King <peff@peff.net> wrote:
>
> On Tue, Jan 22, 2019 at 02:50:27AM -0500, Jeff King wrote:
>
> > There's no set agenda; we'll decide what to discuss that day. But if
> > anybody would like to mention topics they are interested in (whether you
> > want to present on them, or just have an open discussion), please do so
> > here. A little advance notice can help people prepare more for the
> > discussions.
>
> One topic worth discussing (here or there): the GSoC org deadline is Feb
> 6th. Last year's org admins were Christian and Stefan (cc'd). Are you
> both interested and able to continue?

Yeah, I am interested and able to both be org admin and mentor. Thanks
for talking about this.

I think that as usual we will have to prepare a few pages about:

- our application (like https://git.github.io/SoC-2018-Org-Application/)
- microprojects idea for interested students (like
https://git.github.io/SoC-2018-Microprojects/)
- project ideas (like https://git.github.io/SoC-2018-Ideas/)

Suggestions for microprojects or project ideas are welcome! Volunteers
for mentoring or org admin are welcome too!


* Re: Contributor Summit Topics and Logistics
  2019-01-22  8:26 ` Jeff King
  2019-01-22  9:17   ` GSoC 2019 (was: Contributor Summit Topics and Logistics) Christian Couder
@ 2019-01-22 18:21   ` Stefan Beller
  2019-01-22 20:53     ` Jeff King
  1 sibling, 1 reply; 19+ messages in thread
From: Stefan Beller @ 2019-01-22 18:21 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Christian Couder

On Tue, Jan 22, 2019 at 12:26 AM Jeff King <peff@peff.net> wrote:
>
> On Tue, Jan 22, 2019 at 02:50:27AM -0500, Jeff King wrote:
>
> > There's no set agenda; we'll decide what to discuss that day. But if
> > anybody would like to mention topics they are interested in (whether you
> > want to present on them, or just have an open discussion), please do so
> > here. A little advance notice can help people prepare more for the
> > discussions.
>
> One topic worth discussing (here or there): the GSoC org deadline is Feb
> 6th. Last year's org admins were Christian and Stefan (cc'd). Are you
> both interested and able to continue?

I am treading lightly this year; if no one else is around I could be an
admin (definitely not a mentor), but I'd prefer not to.


* Re: Contributor Summit Topics and Logistics
  2019-01-22  7:50 Contributor Summit Topics and Logistics Jeff King
  2019-01-22  8:26 ` Jeff King
@ 2019-01-22 18:23 ` Derrick Stolee
  2019-01-24  8:57   ` Ævar Arnfjörð Bjarmason
  2019-01-22 20:30 ` Elijah Newren
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 19+ messages in thread
From: Derrick Stolee @ 2019-01-22 18:23 UTC (permalink / raw)
  To: Jeff King, git

On 1/22/2019 2:50 AM, Jeff King wrote:
> For people who want to try to join remotely, I don't think we're going
> to have a particularly fancy AV setup. But there should at least be a
> big screen (which we typically do not really use for presenting), and I
> hope we can provide some connectivity. I'll be visiting the venue the
> day before (Jan 30th) in the late afternoon (Brussels time) and I'll try
> to do a test run. If anybody wants to volunteer to be the guinea pig on
> the other end of the line, I'd welcome it.

I would like to join remotely, so I volunteer to do a test run. I'll 
need to wake up early, so let's set an exact time privately.


Topics I would like to hear about:

- commit-graph status report (I can lead, if I'm able to join)

- multi-pack-index status report (same)

- reftable

- partial clone

- test coverage report, usefulness or improvements


Thanks,

-Stolee



* Re: Contributor Summit Topics and Logistics
  2019-01-22  7:50 Contributor Summit Topics and Logistics Jeff King
  2019-01-22  8:26 ` Jeff King
  2019-01-22 18:23 ` Derrick Stolee
@ 2019-01-22 20:30 ` Elijah Newren
  2019-01-30 20:57 ` Ævar Arnfjörð Bjarmason
  2019-01-30 23:07 ` Jeff King
  4 siblings, 0 replies; 19+ messages in thread
From: Elijah Newren @ 2019-01-22 20:30 UTC (permalink / raw)
  To: Jeff King; +Cc: Git Mailing List

On Mon, Jan 21, 2019 at 11:52 PM Jeff King <peff@peff.net> wrote:
>
> The Git Merge Contributor Summit is a little over a week away. If you're
> interested in coming but haven't signed up, please do! We have a few
> spaces available still. Details are in the previous announcement:
>
>   http://public-inbox.org/git/20181206094805.GA1398@sigill.intra.peff.net/
>
> There's no set agenda; we'll decide what to discuss that day. But if
> anybody would like to mention topics they are interested in (whether you
> want to present on them, or just have an open discussion), please do so
> here. A little advance notice can help people prepare more for the
> discussions.

* git repo-filter[1] or whatever it ends up being named (filter-branch
alternative): is it wanted in git.git?

* merge-recursive rewrite -- steps others want to see me take in that process?

* Making --merge option of rebase be the default[2]: what steps need
to be taken?

* I'll second Derrick's request for partial clone, perhaps also
briefly discuss related capabilities like sparse checkouts and partial
indexes too?



[1] https://public-inbox.org/git/20181111062312.16342-1-newren@gmail.com/
[2] https://public-inbox.org/git/xmqqh8jeh1id.fsf@gitster-ct.c.googlers.com/


* Re: Contributor Summit Topics and Logistics
  2019-01-22 18:21   ` Contributor Summit Topics and Logistics Stefan Beller
@ 2019-01-22 20:53     ` Jeff King
  0 siblings, 0 replies; 19+ messages in thread
From: Jeff King @ 2019-01-22 20:53 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, Christian Couder

On Tue, Jan 22, 2019 at 10:21:56AM -0800, Stefan Beller wrote:

> > > There's no set agenda; we'll decide what to discuss that day. But if
> > > anybody would like to mention topics they are interested in (whether you
> > > want to present on them, or just have an open discussion), please do so
> > > here. A little advance notice can help people prepare more for the
> > > discussions.
> >
> > One topic worth discussing (here or there): the GSoC org deadline is Feb
> > 6th. Last year's org admins were Christian and Stefan (cc'd). Are you
> > both interested and able to continue?
> 
> I am treading lightly this year; if no one else is around I could be an
> admin (definitely not a mentor), but I'd prefer not to.

I can be an org admin, as well. If Christian is willing, then I think
you don't need to do it this year.

-Peff


* Re: Contributor Summit Topics and Logistics
  2019-01-22 18:23 ` Derrick Stolee
@ 2019-01-24  8:57   ` Ævar Arnfjörð Bjarmason
  2019-01-29 18:22     ` Derrick Stolee
  0 siblings, 1 reply; 19+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-01-24  8:57 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Jeff King, git


On Tue, Jan 22 2019, Derrick Stolee wrote:

> On 1/22/2019 2:50 AM, Jeff King wrote:
>> For people who want to try to join remotely, I don't think we're going
>> to have a particularly fancy AV setup. But there should at least be a
>> big screen (which we typically do not really use for presenting), and I
>> hope we can provide some connectivity. I'll be visiting the venue the
>> day before (Jan 30th) in the late afternoon (Brussels time) and I'll try
>> to do a test run. If anybody wants to volunteer to be the guinea pig on
>> the other end of the line, I'd welcome it.
>
> I would like to join remotely, so I volunteer to do a test run. I'll
> need to wake up early, so let's set an exact time privately.
>
>
> Topics I would like to hear about:
>
> - commit-graph status report (I can lead, if I'm able to join)

While we're at it, it would be useful to discuss what attendees think
about making core.commitGraph=true && gc.writeCommitGraph=true the
default.


* Re: Contributor Summit Topics and Logistics
  2019-01-24  8:57   ` Ævar Arnfjörð Bjarmason
@ 2019-01-29 18:22     ` Derrick Stolee
  0 siblings, 0 replies; 19+ messages in thread
From: Derrick Stolee @ 2019-01-29 18:22 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Jeff King, git

I was hoping to attend the contributors' summit remotely, but now my leave is
starting before then. This email contains a summary of what I would have
added to the discussion.

Thanks,
-Stolee


Commit-Graph Status Report
==========================

I'm really happy with the progress in this area, especially with the number of
other contributors working on the feature! Thanks Ævar, Jonathan, Josh, Stefan,
and Szeder in particular.

Here are some directions to take the feature in the near future:

File Format v2
--------------

The new format version [1] specifically fixes some shortcomings in v1:

* Uses the 4-byte format id for the hash algorithm.
* Creates a separate version byte for the reachability index.
* Enforces that the unused byte is zero until we use it for incremental writes.

Hopefully, this is the last time we need to update the file header.

[1] https://public-inbox.org/git/pull.112.git.gitgitgadget@gmail.com/
    [PATCH 0/6] Create commit-graph file format v2

Reachability Index
------------------

As discussed on-list [2], we want to replace generation numbers with a different
(negative-cut) reachability index. I used the term "corrected commit date". The
definition is:

* If a commit has no parents, then its corrected commit date is its commit date.

* If a commit has parents, then its corrected commit date is the maximum of:
    - its commit date
    - one more than the maximum corrected commit date of its parents
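
As an illustration only (this is not code from the patch series), the
definition above can be sketched in Python. The function name and the two
dictionaries (`parents`, mapping a commit to its parent list, and
`commit_date`, mapping a commit to its timestamp) are hypothetical stand-ins
for whatever the real implementation reads from the object database.

```python
from typing import Dict, List


def corrected_commit_dates(parents: Dict[str, List[str]],
                           commit_date: Dict[str, int]) -> Dict[str, int]:
    """Compute the corrected commit date of every commit in `parents`.

    Assumes the history is a DAG and that every parent is also a key.
    """
    corrected: Dict[str, int] = {}
    for root in parents:
        if root in corrected:
            continue
        # Iterative depth-first walk so deep histories do not hit the
        # recursion limit; a commit is finished only after its parents.
        stack = [root]
        while stack:
            commit = stack[-1]
            pending = [p for p in parents[commit] if p not in corrected]
            if pending:
                stack.extend(pending)
                continue
            stack.pop()
            if commit in corrected:
                continue  # reached earlier through another child
            value = commit_date[commit]
            for p in parents[commit]:
                # Strictly larger than every parent's corrected date.
                value = max(value, corrected[p] + 1)
            corrected[commit] = value
    return corrected
```

For a linear history A <- B <- C with commit dates 100, 90 and 200, this
yields 100, 101 and 200: only B, whose clock is skewed behind its parent,
gets a value different from its own commit date.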

The benefits of this definition were discussed already, but to summarize:

* This definition will work _at least as well_ as the commit date heuristic,
  with the added bonus of being absolutely sure our results are right. We can
  update algorithms like paint_down_to_common() to use this reachability index
  without performance problems in some cases.

* If someone creates a terrible commit with a date that is far in the future,
  this definition is no worse than existing generation numbers (because we
  enforce that the corrected commit date is strictly larger than the parents'
  corrected commit date).

To implement this index, we can re-use the 30 bits per commit in the
commit-graph file that are used for generation numbers, but use them instead
for the difference between the corrected commit date and the actual commit
date. File format v2 gives us a version value that can be incremented to signal
the change in meaning.
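
A minimal sketch of that storage idea, assuming the 30-bit slot simply holds
the non-negative difference. The function names are hypothetical, and raising
an error on overflow is an assumption made for the sketch, not something the
proposal specifies.

```python
GENERATION_BITS = 30
MAX_OFFSET = (1 << GENERATION_BITS) - 1  # largest value a 30-bit slot can hold


def encode_offset(commit_date: int, corrected_date: int) -> int:
    """Return the value to store in the 30 bits formerly used for generations."""
    offset = corrected_date - commit_date
    if offset < 0:
        raise ValueError("corrected commit date is never below the commit date")
    if offset > MAX_OFFSET:
        # Assumption: the sketch refuses to write; a real format would
        # need a defined overflow rule.
        raise OverflowError("offset does not fit in 30 bits")
    return offset


def decode_corrected_date(commit_date: int, offset: int) -> int:
    """Recover the corrected commit date from the stored offset."""
    return commit_date + offset
```

The offset is zero for every commit whose own date is already later than its
parents' corrected dates, which is the common case.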

Some work is required to adjust the existing generation-number-aware algorithms
to care about an "arbitrary" reachability index. It could be as easy as a
helper function that returns a function pointer to the proper compare function.
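
To make the "helper that returns a compare function" idea concrete, here is
a rough Python analogue. All names (`Commit`, `reachability_compare`, the
version numbers 1 and 2) are hypothetical; in C this would be a function
pointer rather than a callable.

```python
from typing import Callable, NamedTuple


class Commit(NamedTuple):
    oid: str
    generation: int       # index v1: generation number
    corrected_date: int   # index v2: corrected commit date


Compare = Callable[[Commit, Commit], int]


def _three_way(a: int, b: int) -> int:
    """Return -1, 0 or 1, like a C comparison callback."""
    return (a > b) - (a < b)


def compare_generation(a: Commit, b: Commit) -> int:
    return _three_way(a.generation, b.generation)


def compare_corrected_date(a: Commit, b: Commit) -> int:
    return _three_way(a.corrected_date, b.corrected_date)


def reachability_compare(index_version: int) -> Compare:
    """Pick the compare function matching the file's reachability index."""
    if index_version == 1:
        return compare_generation
    if index_version == 2:
        return compare_corrected_date
    raise ValueError(f"unknown reachability index version {index_version}")
```

Walk algorithms would then call the returned function without knowing which
index the commit-graph file actually stores.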

If someone wants to move forward on this topic while I'm gone, please
volunteer. Otherwise, this will be among my first items to work on when I
return from leave.

[2] https://public-inbox.org/git/6367e30a-1b3a-4fe9-611b-d931f51effef@gmail.com/
    [RFC] Generation Number v2

Incremental Writes
------------------

Similar to the split index, an incremental commit-graph file can be implemented
to reduce the write time when adding commits to an existing (large)
commit-graph. In this case, the .git/objects/info/commit-graph file would
be small, and have a pointer to a base file, say "cgraph-<hash>.cgraph", that
contains the majority of the commits.

The important thing to keep in mind here is that we use integers to refer to
a commit's parents. This integer would need to refer to the order of commits
when you concatenate the ordered lists from each file. When doing this, we
can point into the base file as well as the tip file. Since the base
commit-graph file would be closed under reachability, it only needs to care
about commits in its file.
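
The position arithmetic can be sketched as follows. This is only an
illustration: `resolve_position` and the list-of-lists representation are
hypothetical, with the base file first and the tip file last.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Sequence, Tuple


def resolve_position(position: int,
                     files: Sequence[List[str]]) -> Tuple[int, int]:
    """Map a position in the concatenated commit list to (file, local index).

    `files` holds each file's ordered commit list, base file first.
    """
    # ends[i] is the number of commits in files[0..i] combined.
    ends = list(accumulate(len(f) for f in files))
    if position < 0 or not ends or position >= ends[-1]:
        raise IndexError("position outside the concatenated commit list")
    file_index = bisect_right(ends, position)
    start = ends[file_index - 1] if file_index else 0
    return file_index, position - start
```

With a base file of three commits and a tip file of two, positions 0 through
2 land in the base file and positions 3 and 4 in the tip file, so a commit in
the tip can name a parent in either file with a single integer.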

It is also possible to have multiple base files, and we can use the unused
byte in the commit-graph file format v2 to store the number of base files.
We can then store a list of file names in a new chunk, presenting the ordered
list of base files. We still want to keep this list short, but there may be
benefits to a variable number. I expect the first version would limit the
construction to one base file for simplicity's sake.

When this is implemented, we can use it to write the commit-graph at fetch
time. A config setting, say 'fetch.writeCommitGraph', could enable this write.
Since most writes would add a small number of commits compared to the large
base file, this would be a more reasonable cost to add to a fetch. Since we
verify the pack upon download, the commits it contained will already be in
the memory cache and we won't need to re-parse those commits.

Volunteers welcome.

Bloom Filters
-------------

Using bloom filters to speed up file history has been discussed and prototyped
on-list (see [12] and the thread before it). Thanks for lots of contributions
in this area! A lot of people have shown an interest in this feature, and it
is particularly helpful with server-side queries.
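
For readers unfamiliar with the data structure, here is a toy Bloom filter
over changed paths. It is not the prototype from the list; the class name,
the filter size, the number of hashes and the use of SHA-1 as the hash are
all assumptions chosen to keep the sketch short.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Toy Bloom filter recording which paths a commit changed."""

    def __init__(self, num_bits: int = 64, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit set stored in one Python integer

    def _positions(self, path: str) -> Iterator[int]:
        # Derive several bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha1(f"{seed}:{path}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, path: str) -> None:
        for pos in self._positions(path):
            self.bits |= 1 << pos

    def might_contain(self, path: str) -> bool:
        """False means "definitely not changed"; True means "maybe"."""
        return all((self.bits >> pos) & 1 for pos in self._positions(path))
```

A history walk would skip the tree diff for any commit whose filter answers
False, and fall back to the real diff when it answers True, since false
positives are possible but false negatives are not.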

Any implementation here should check that it is helping 'git blame' as much as
it can [13]. It's entirely possible that the performance problem mentioned
there is more about the size of the file and not finding the commits that
changed the file, but it's worth digging in here.

A few people have mentioned that they are interested in pursuing this
implementation, so it would be good to declare intentions during the summit.

[12] https://public-inbox.org/git/61559c5b-546e-d61b-d2e1-68de692f5972@gmail.com/

[13] https://public-inbox.org/git/CABXAcUzoNJ6s3=2xZfWYQUZ_AUefwP=5UVUgMnafKHHtufzbSA@mail.gmail.com/

Enabled by Default?
-------------------

I proposed turning on the feature by default [3], but that had some
resistance [4] and I never followed up on that remark. (It involved the
hope that we could consolidate commit walks during a gc/repack. I'm unsure
this is a goal worth pursuing.) Since there has been more interest recently [5]
I think it would be good to discuss what concerns we may have in turning this
on by default. Specifically, make 'core.commitGraph' and 'gc.writeCommitGraph'
default to 'true'. Users could still opt-out.

[3] https://public-inbox.org/git/pull.50.git.gitgitgadget@gmail.com/
[4] https://public-inbox.org/git/xmqqlg6vvrur.fsf@gitster-ct.c.googlers.com/
[5] https://public-inbox.org/git/87bm464elm.fsf@evledraar.gmail.com/


Multi-Pack-Index Status Report
==============================

The multi-pack-index feature shipped with Git 2.20! We've been using this
feature (or, a similar implementation as it changed a lot with review) in
VFS for Git for a year now. It's been critical to solving the many-packs
problem we have with our prefetch packs model. Our next version ships with
Git 2.20 and the upstream implementation.

We are now able to start tackling our space problem with these many packs.
Our solution includes the 'expire' and 'repack' subcommands [6]. We will run
these in the background [7] to slowly reduce the space we are using. Since
Git references the multi-pack-index, we are able to delete packs that have
no referenced objects from the multi-pack-index without interrupting user
commands (I don't think the same holds for 'git repack'). This "highly
available" model makes me think that this could be useful to other scenarios.

We are looking for interest from other users or groups in this feature. We
want this feature to be adopted, and that means the future of the feature 
should depend on more scenarios than our specific case.

Here are some ideas to make this more useful for others:

1. Incremental writes. See the commit-graph section for details. This would
   allow writing the multi-pack-index on fetch, helping users who have set
   gc.auto=0 keep performance high even though they have packs piling up.

2. Stable object order and bitmaps. This is discussed in the design
   document [8]. This is more useful for server environments.

[6] https://public-inbox.org/git/pull.92.git.gitgitgadget@gmail.com/
    [PATCH 0/5] Create 'expire' and 'repack' verbs for git-multi-pack-index

[7] https://github.com/Microsoft/VFSForGit/blob/9cad154293456a41bef593a75e1ad2cb840c8524/GVFS/GVFS.Common/Maintenance/PackfileMaintenanceStep.cs#L141-L158
    The use of 'expire' and 'repack' in VFS for Git

[8] https://github.com/git/git/blob/master/Documentation/technical/multi-pack-index.txt#L77-L84
    multi-pack-index and stable object order


Test Coverage Report
====================

My intention in creating the test coverage report was to avoid bugs by
double-checking that we are testing all logic that was both (1) non-trivial, and
(2) new. The report does tend to be noisy with a lot of trivial blocks (error
cases) or code that was not covered before but was updated with a mechanical
refactoring. I'm hoping to attack these issues by using a new approach when
generating the reports.

I've created a GitHub repo [9] that contains new logic for generating the test
coverage report. In particular, it will now generate a text report that will
be sent to the list, but also an HTML report that will be posted online (see
[10] for an example).

In addition, the repo has an 'ignored' directory. This directory will be filled
with files that mirror their corresponding files in the Git repo, but contain
line numbers and contents for lines that have been deemed "unimportant". For
instance, I didn't want to just ignore all lines that say simply "return;" but
we can check that line 302 of builtin/checkout.c says "return;" and ignore that
line in the report [11].
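
The matching rule can be sketched like this. The on-disk format of the
'ignored' files is not described here, so the dictionary of line number to
expected content, and the function name, are hypothetical.

```python
from typing import Dict, List


def filter_uncovered(uncovered: List[int],
                     source_lines: List[str],
                     ignored: Dict[int, str]) -> List[int]:
    """Drop uncovered lines whose number AND content match an ignore entry.

    Line numbers are 1-based; `ignored` maps line number to expected text.
    """
    reported = []
    for number in uncovered:
        in_range = 0 < number <= len(source_lines)
        actual = source_lines[number - 1].strip() if in_range else None
        expected = ignored.get(number)
        if expected is not None and actual == expected.strip():
            continue  # still the same unimportant line, keep ignoring it
        reported.append(number)
    return reported
```

Checking the content as well as the number means an entry stops matching as
soon as the file changes around it, so a stale entry cannot hide a new
uncovered line that happens to land on the same line number.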

I'll try to review the test report and add ignored lines before generating the
next report. I'll also accept PRs that add ignored lines (with justification).

I think this will help the usefulness significantly, especially as topics merge
down into 'next' and 'master'. If we track the ignored lines throughout a cycle,
then the report for 'maint' versus 'master' near release time may actually be
reasonable to read.

Any other feedback on the reports is greatly appreciated!

[9] https://github.com/derrickstolee/git-test-coverage

[10] https://derrickstolee.github.io/git-test-coverage/reports/2019-01-29.htm

[11] https://github.com/derrickstolee/git-test-coverage/blob/master/ignored/builtin/checkout.c


* Re: Contributor Summit Topics and Logistics
  2019-01-22  7:50 Contributor Summit Topics and Logistics Jeff King
                   ` (2 preceding siblings ...)
  2019-01-22 20:30 ` Elijah Newren
@ 2019-01-30 20:57 ` Ævar Arnfjörð Bjarmason
  2019-01-30 22:26   ` Jeff Hostetler
  2019-01-30 22:51   ` Philip Oakley
  2019-01-30 23:07 ` Jeff King
  4 siblings, 2 replies; 19+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-01-30 20:57 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Derrick Stolee, Johannes Schindelin


On Tue, Jan 22 2019, Jeff King wrote:

> There's no set agenda; we'll decide what to discuss that day. But if
> anybody would like to mention topics they are interested in (whether you
> want to present on them, or just have an open discussion), please do so
> here. A little advance notice can help people prepare more for the
> discussions.

This is definitely a "little" advance seeing as it's tomorrow morning.

> Even if you're not coming, please feel free to suggest topics (but bonus
> points if you convince somebody who _is_ coming to lead the session).

Things I'd be interested in hearing / talking about that haven't
yet been mentioned.

These are in descending order of how interesting I think these will be
to a general audience, to the point where maybe only I care about the
bottom of this list...

* "Big repos". We had discussions about this in years past. It's a very
  sprawling and vague topic. Do we mean big history, big blobs, big (in
  size/depth/width) checkouts etc?

  But regardless, many of us deal with this in one way or another, and
  it would be good to have a top-level overview of how the various
  solutions to this that are being integrated into git.git are doing /
  what people see on the horizon for scalability.

* "Structured remote logging". We had an RFC spec for turning our trace
  format into something more structural with a way to send it to a
  remote server. There were both implementation & privacy concerns;
  last time at least a couple of users of git reported having in-house
  patches for this (not ready for upstream). Where are we on this now?

* "commit graph by default". I had this on my list, but Derrick Stolee
  sent out a much better summary:
  https://public-inbox.org/git/6d0dc2a2-120c-0d42-1910-14ffed7adaf1@gmail.com/

* I've been using (but haven't yet re-rolled) my "relative SHA-1
  abbreviation" series
  (https://public-inbox.org/git/20180608224136.20220-1-avarab@gmail.com/)

  I'm interested in seeing if anyone else is interested in this, and
  particularly what the overlap (if any) is between this & midx.

* "Making strict fsck checks on clone the default". I worked a bit on
  this in the last year in between a couple of security issues that
  needed fsck checks. Has caveats etc., but would give users some more
  protections.

* "The CI I set up for git on the GCC Compile Farm". Can be folded into
  a general "state of git.git CI" topic:
  https://gitlab.com/git-vcs/git-ci/pipelines

* If people care about making the TAP mode in our test suite mandatory
  (i.e. require "prove" or a tool like it). See
  https://public-inbox.org/git/87zhrj2n2l.fsf@evledraar.gmail.com/


* Re: Contributor Summit Topics and Logistics
  2019-01-30 20:57 ` Ævar Arnfjörð Bjarmason
@ 2019-01-30 22:26   ` Jeff Hostetler
  2019-01-30 22:51   ` Philip Oakley
  1 sibling, 0 replies; 19+ messages in thread
From: Jeff Hostetler @ 2019-01-30 22:26 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, Jeff King
  Cc: git, Derrick Stolee, Johannes Schindelin



On 1/30/2019 3:57 PM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Tue, Jan 22 2019, Jeff King wrote:
> 
...
> * "Structured remote logging". We had an RFC spec for turning our trace
>    format into something more structural with a way to send it to a
>    remote server. There were both implementation & privacy concerns;
>    last time at least a couple of users of git reported having in-house
>    patches for this (not ready for upstream). Where are we on this now?

I won't be attending GitMerge this year, but I can talk about
this work here.

My earlier "structured logging" and/or "telemetry" proposals
have been replaced by my Trace2 patch series now in "pu".

The Trace2 feature is designed to report trace and performance
data from within the git process to a local log file, unix
domain socket, or Windows named pipe.  Functions in the Trace2
API generate structured data and can write either structured
(JSON) or non-structured formats to disk.  (It should not be
hard to add a binary structured format too, but that is beyond
the scope of the current patch series.)

The JSON stream is suitable for post-processing by a local
process.  This can be a daemon listening to the stream or a
cron job processing the trace data after the fact.

I consider it to be the job of the post-processor (after
aggregating, filtering or whatever) to decide what to do with
the data.  This lets the user and/or sysadmin control how
and when data is collected.  The post-processor is free to hook
into something like syslog or ETW or write to a custom DB.

Post-processing tools are not included in the patch series.


Internally within Microsoft, we have a local Windows Service
listening on a named pipe and collecting events from all
git processes for our GVFS users in the Windows OS repo.
It computes a summary record for each git command, for example
combining the argv from the "start" event with the elapsed
time from the "exit" event into a single record.  The service
then sends the aggregate records to a centralized database.
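
A post-processor of that shape could look roughly like the sketch below. It
is not the internal service. It assumes one JSON object per line, and the
field names ("event", "sid", "argv", "t_abs", "code") are assumptions that
should be checked against the actual Trace2 output before relying on them.

```python
import json
from typing import Dict, Iterable, List


def summarize(lines: Iterable[str]) -> List[dict]:
    """Join each command's "start" event with its "exit" event."""
    argv_by_sid: Dict[str, list] = {}
    summaries: List[dict] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        sid = event.get("sid")  # assumed per-process session id
        kind = event.get("event")
        if kind == "start":
            argv_by_sid[sid] = event.get("argv", [])
        elif kind == "exit" and sid in argv_by_sid:
            summaries.append({
                "sid": sid,
                "argv": argv_by_sid.pop(sid),
                "elapsed": event.get("t_abs"),  # assumed seconds since start
                "code": event.get("code"),
            })
    return summaries
```

Because the join is keyed on the session id, events from several git
processes can be interleaved in the same stream without mixing up records.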

This lets us run various database queries to try to understand
pain points that our OS developers are experiencing (and that
may not show up on my machine) and help us prioritize future perf
and scaling work.

But again, this service is but one possible post-processor
and is for internal-use-only.

The Trace2 feature itself does not have any remote capability.
It just writes data locally.

Jeff




* Re: Contributor Summit Topics and Logistics
  2019-01-30 20:57 ` Ævar Arnfjörð Bjarmason
  2019-01-30 22:26   ` Jeff Hostetler
@ 2019-01-30 22:51   ` Philip Oakley
  2019-01-30 23:13     ` Christian Couder
  1 sibling, 1 reply; 19+ messages in thread
From: Philip Oakley @ 2019-01-30 22:51 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, Jeff King
  Cc: git, Derrick Stolee, Johannes Schindelin

On 30/01/2019 20:57, Ævar Arnfjörð Bjarmason wrote:
> On Tue, Jan 22 2019, Jeff King wrote:
>
>> There's no set agenda; we'll decide what to discuss that day. But if
>> anybody would like to mention topics they are interested in (whether you
>> want to present on them, or just have an open discussion), please do so
>> here. A little advance notice can help people prepare more for the
>> discussions.
> This is definitely a "little" advance seeing as it's tomorrow morning.
>
>> Even if you're not coming, please feel free to suggest topics (but bonus
>> points if you convince somebody who _is_ coming to lead the session).
> Things I'd be interested in hearing / talking about that haven't
> yet been mentioned.
>
> These are in descending order of how interesting I think these will be
> to a general audience, to the point where maybe only I care about the
> bottom of this list...
>
> * "Big repos". We had discussions about this in years past. It's a very
>    sprawling and vague topic. Do we mean big history, big blobs, big (in
>    size/depth/width) checkouts etc?
>
>    But regardless, many of us deal with this in one way or another, and
>    it would be good to have a top-level overview of how the various
>    solutions to this that are being integrated into git.git are doing /
>    what people see on the horizon for scalability.


I'd also like a bit of discussion about ensuring that the partial clone 
& filtering aspects of 'big repos' (if partial is needed /used then it's 
big ...) still retain the full 'distributed' nature and capability of git.


Also in some environments the filtering may want to be applied at the 
server end (based on its knowledge of the specific user). Ultimately it
should also pull in some of the sub-module aspects as super projects are 
just big repos in disguise.

>
> * "Structured remote logging". We had an RFC spec for turning our trace
>    format into something more structural with a way to send it to a
>    remote server. There were both implementation & privacy concerns;
>    last time at least a couple of users of git reported having in-house
>    patches for this (not ready for upstream). Where are we on this now?
>
> * "commit graph by default". I had this on my list, but Derrick Stolee
>    sent out a much better summary:
>    https://public-inbox.org/git/6d0dc2a2-120c-0d42-1910-14ffed7adaf1@gmail.com/
>
> * I've been using (but haven't yet re-rolled) my "relative SHA-1
>    abbreviation" series
>    (https://public-inbox.org/git/20180608224136.20220-1-avarab@gmail.com/)
>
>    I'm interested in seeing if anyone else is interested in this, and
>    particularly what the overlap (if any) is between this & midx.
>
> * "Making strict fsck checks on clone the default". I worked a bit on
>    this in the last year in between a couple of security issues that
>    needed fsck checks. Has caveats etc., but would give users some more
>    protections.
>
> * "The CI I set up for git on the GCC Compile Farm". Can be folded into
>    a general "state of git.git CI" topic:
>    https://gitlab.com/git-vcs/git-ci/pipelines
>
> * If people care about making the TAP mode in our test suite mandatory
>    (i.e. require "prove" or a tool like it). See
>    https://public-inbox.org/git/87zhrj2n2l.fsf@evledraar.gmail.com/


I also had some questions regarding tree walk issues for follower and 
friendly fork repos that have lots of deadheads within their tree, such 
as previous release versions in Git for Windows. It should be easier to 
filter those deadheads (or at least suggest the best way of creating 
such sentinels).

--

Philip




* Re: Contributor Summit Topics and Logistics
  2019-01-22  7:50 Contributor Summit Topics and Logistics Jeff King
                   ` (3 preceding siblings ...)
  2019-01-30 20:57 ` Ævar Arnfjörð Bjarmason
@ 2019-01-30 23:07 ` Jeff King
  2019-02-02 12:33   ` Jakub Narebski
  4 siblings, 1 reply; 19+ messages in thread
From: Jeff King @ 2019-01-30 23:07 UTC (permalink / raw)
  To: git

On Tue, Jan 22, 2019 at 02:50:27AM -0500, Jeff King wrote:

> If you're not coming, you can probably stop reading this message now.
> The rest is all logistics.

Here are a few additional last-minute logistics:

> For people who want to try to join remotely, I don't think we're going
> to have a particularly fancy AV setup. But there should at least be a
> big screen (which we typically do not really use for presenting), and I
> hope we can provide some connectivity. I'll be visiting the venue the
> day before (Jan 30th) in the late afternoon (Brussels time) and I'll try
> to do a test run. If anybody wants to volunteer to be the guinea pig on
> the other end of the line, I'd welcome it.

The remote connection will be done via Zoom, using this URL which will
become active shortly before 10:00am (Brussels time):

  https://github.zoom.us/j/186903655

You may need to download an app or other software; solutions are
available for most platforms, and the zoom site should guide you.

Note that this is _not_ configured as a one-way webinar. It's a real
video-conference where joiners can participate in the discussion. So
spectators from the community are OK, but please leave your camera/mic
off if you're not actively participating.

> The physical setup this year will actually be 4 round tables, instead of
> one giant table. I'm hoping this will facilitate breaking off into
> sub-groups and having more intimate conversations, and maybe avoid the
> "it's hard to hear people at the other end of the table" issues. Or
> maybe it will just make it worse as we shout to each other from all four
> tables. I can't wait to see!

There will be outlets for charging laptops, but probably only about half
as many as there are people. So plan accordingly.

> There's no organized dinner. However, there will be a social/drinks
> event for the broader conference at 7pm; I'll provide more details
> that day.

This is indeed happening, and is open to all Git Merge attendees.
Details should have been emailed out to the email address you registered
with. Note that they're asking people to RSVP through a web link. Please
do so if you're planning on coming!

See everybody tomorrow at 9:00am.

-Peff


* Re: Contributor Summit Topics and Logistics
  2019-01-30 22:51   ` Philip Oakley
@ 2019-01-30 23:13     ` Christian Couder
  0 siblings, 0 replies; 19+ messages in thread
From: Christian Couder @ 2019-01-30 23:13 UTC (permalink / raw)
  To: Philip Oakley
  Cc: Ævar Arnfjörð Bjarmason, Jeff King, git,
	Derrick Stolee, Johannes Schindelin

On Thu, Jan 31, 2019 at 12:05 AM Philip Oakley <philipoakley@iee.org> wrote:
>
> On 30/01/2019 20:57, Ævar Arnfjörð Bjarmason wrote:
> >
> > * "Big repos". We had discussions about this in years past. It's a very
> >    sprawly and vague topic. Do we mean big history, big blobs, big (in
> >    size/depth/width) checkouts etc?
> >
> >    But regardless, many of us deal with this in one way or another, and
> >    it would be good to have a top-level overview of how the various
> >    solutions to this that are being integrated into git.git are doing /
> >    what people see on the horizon for scalability.

I am also very interested in that topic ;-)

> I'd also like a bit of discussion about ensuring that the partial clone
> & filtering aspects of 'big repos' (if partial is needed /used then it's
> big ...) still retain the full 'distributed' nature and capability of git.

And in this too, especially regarding my work on many promisor/partial
clone remotes (previously ODBs).


* Re: GSoC 2019 (was: Contributor Summit Topics and Logistics)
  2019-01-22  9:17   ` GSoC 2019 (was: Contributor Summit Topics and Logistics) Christian Couder
@ 2019-01-31  2:02     ` SZEDER Gábor
  2019-01-31  6:11       ` Christian Couder
  0 siblings, 1 reply; 19+ messages in thread
From: SZEDER Gábor @ 2019-01-31  2:02 UTC (permalink / raw)
  To: Christian Couder; +Cc: Jeff King, git, Stefan Beller

On Tue, Jan 22, 2019 at 10:17:59AM +0100, Christian Couder wrote:
> - microprojects idea for interested students (like
> https://git.github.io/SoC-2018-Microprojects/)

> Suggestions for microprojects or project ideas are welcome! Volunteers
> for mentoring or org admin are welcome too!

I think we should remove most (all?) CI-related microprojects.

  - The first three are about adding static analyzers.  Now, while
    adding a new build job to run a static analyzer is easy enough,
    it's also next to useless or even downright harmful in itself.
    Static analyzers are inherently prone to false positives, and
    dealing with those is definitely beyond the scope of a
    microproject.  And adding a static analysis build job that always
    fails because of unaddressed false positives, and thus makes the
    whole build fail, will just make life harder for those who take
    the effort to look at CI results.

    Last year we had submissions for these microprojects, but in the
    end they were not picked up because of this.

  - One project suggests installing CVS, Subversion and Apache in the
    CI environments to increase test coverage.  Well, Subversion and
    Apache are already installed, and have been for a long time
    (though $GIT_TEST_SVNSERVE is not enabled (don't know why) and one
    test script is skipped because "svn-info test (SVN version: 1.8.8
    not supported)").  That leaves only CVS, which is perhaps too small
    a microproject (perhaps even by old standards; our microprojects
    grew considerably over the years).

  - Finally, the last one is about building a webpage that analyses
    Travis CI test results and identifies flaky tests, and then goes
    on to suggest that "look at the randomly failing tests and try to
    figure out why they fail".  I've done my fair share of fixing flaky
    tests, and IMO doing so is definitely beyond the scope of a
    microproject.

Ok, after suggesting the removal of five microproject ideas, here is a
suggestion for a new one:

  Find a test script that verifies the presence/absence of
  files/directories with 'test -(e|f|d|...)' and replace them with the
  appropriate 'test_path_is_file', 'test_path_is_dir', etc. helper
  functions.

The good thing about this is that there are plenty of those test
scripts :)
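
To make the suggested conversion concrete, here is a minimal sketch.
The three helper functions below are simplified stand-ins for the ones
in t/test-lib-functions.sh, defined only so that the snippet runs
outside the test suite; the scratch directory and the file names in it
are made up for illustration.

```shell
#!/bin/sh
# Simplified stand-ins for helpers from Git's t/test-lib-functions.sh,
# defined here only so the sketch runs outside the test suite.
test_path_is_file () {
	if ! test -f "$1"
	then
		echo "File $1 doesn't exist"
		return 1
	fi
}

test_path_is_dir () {
	if ! test -d "$1"
	then
		echo "Directory $1 doesn't exist"
		return 1
	fi
}

test_path_is_missing () {
	if test -e "$1"
	then
		echo "Path $1 exists"
		return 1
	fi
}

# Hypothetical scratch area standing in for a test's trash directory.
scratch=$(mktemp -d)
mkdir "$scratch/dir"
: >"$scratch/file"

# Before: bare 'test' calls, which fail without saying what was wrong:
#	test -f "$scratch/file" && test -d "$scratch/dir" && ! test -e "$scratch/gone"
# After: the helpers name the offending path when a check fails.
test_path_is_file "$scratch/file" &&
test_path_is_dir "$scratch/dir" &&
test_path_is_missing "$scratch/gone"
status=$?

# Failure case: the helper prints a diagnostic and returns non-zero.
failed=0
msg=$(test_path_is_file "$scratch/gone") || failed=$?

rm -rf "$scratch"
```

In a real test script the helpers come from test-lib.sh, so the
conversion itself is just swapping each 'test -f' (and friends) for the
matching helper.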



* Re: GSoC 2019 (was: Contributor Summit Topics and Logistics)
  2019-01-31  2:02     ` SZEDER Gábor
@ 2019-01-31  6:11       ` Christian Couder
  0 siblings, 0 replies; 19+ messages in thread
From: Christian Couder @ 2019-01-31  6:11 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: Jeff King, git, Stefan Beller

On Thu, Jan 31, 2019 at 3:02 AM SZEDER Gábor <szeder.dev@gmail.com> wrote:
>
> I think we should remove most (all?) CI-related microprojects.

Yeah, I agree that they don't make sense anymore.

> Ok, after suggesting the removal of five microproject ideas, here is a
> suggestion for a new one:
>
>   Find a test script that verifies the presence/absence of
>   files/directories with 'test -(e|f|d|...)' and replace them with the
>   appropriate 'test_path_is_file', 'test_path_is_dir', etc. helper
>   functions.
>
> The good thing about this is that there are plenty of those test
> scripts :)

Thank you for this suggestion, I will add it.


* Re: Contributor Summit Topics and Logistics
  2019-01-30 23:07 ` Jeff King
@ 2019-02-02 12:33   ` Jakub Narebski
  2019-02-04 19:30     ` Elijah Newren
  2019-04-23  3:45     ` Jeff King
  0 siblings, 2 replies; 19+ messages in thread
From: Jakub Narebski @ 2019-02-02 12:33 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> On Tue, Jan 22, 2019 at 02:50:27AM -0500, Jeff King wrote:
>
>> If you're not coming, you can probably stop reading this message now.
>> The rest is all logistics.
>
> Here are a few additional last-minute logistics:
>
>> For people who want to try to join remotely, I don't think we're going
>> to have a particularly fancy AV setup. But there should at least be a
>> big screen (which we typically do not really use for presenting), and I
>> hope we can provide some connectivity. I'll be visiting the venue the
>> day before (Jan 30th) in the late afternoon (Brussels time) and I'll try
>> to do a test run. If anybody wants to volunteer to be the guinea pig on
>> the other end of the line, I'd welcome it.
>
> The remote connection will be done via Zoom, using this URL which will
> become active shortly before 10:00am (Brussels time):
>
>   https://github.zoom.us/j/186903655
>
> You may need to download an app or other software; solutions are
> available for most platforms, and the zoom site should guide you.

Thank you very much for setting this remote connection up.  It did make
it possible for me to watch the Git Contributor Summit 2019 (and take
notes for Git Rev News).  I have had Zoom installed already, so it was
not a problem.  (As I have seen, SZEDER Gábor was also spectating ;-)

The audio was not always clear, which depended on where the person
speaking was positioned; I understand that it is a very difficult
problem to get good acoustics in such an unstructured setup.

> Note that this is _not_ configured as a one-way webinar. It's a real
> video-conference where joiners can participate in the discussion. So
> spectators from the community are OK, but please leave your camera/mic
> off if you're not actively participating.

As far as I know it went untested (but then nobody announced that he or
she wanted to actively participate remotely).

I didn't stay for the 15-17 breakout session (talking in individual
groups); I wonder how well the remote connection setup would work with
multiple discussions in parallel.


I have noticed a little 'recording' indicator; would the recorded session
(video or audio only) be made available at some point in time?  Did
anyone take minutes, or take notes (for example of the Summit agenda
created at the start of the meeting -- when the audio was muted)?  I
would be very interested in your impressions.

> See everybody tomorrow at 9:00am.

The event actually started at 10:00am CET.


Thanks again,
--
Jakub Narębski


* Re: Contributor Summit Topics and Logistics
  2019-02-02 12:33   ` Jakub Narebski
@ 2019-02-04 19:30     ` Elijah Newren
  2019-04-23  3:45     ` Jeff King
  1 sibling, 0 replies; 19+ messages in thread
From: Elijah Newren @ 2019-02-04 19:30 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Jeff King, Git Mailing List

Hi Jakub,

On Sat, Feb 2, 2019 at 4:39 AM Jakub Narebski <jnareb@gmail.com> wrote:
> I have noticed a little 'recording' indicator; would the recorded session
> (video or audio only) be made available at some point in time?  Did
> anyone take minutes, or take notes (for example of the Summit agenda
> created at the start of the meeting -- when the audio was muted)?  I
> would be very interested in your impressions.

I took some notes.  I'm not sure how useful they'll be given that they
were meant just for my own memory (my company said I either had to
give a talk at the conference or come back and give a talk to my
coworkers about the conference in order for them to pay for it, so I'm
doing the latter).  But, I'll provide them here in case they're useful
to anyone.

Discussion points:
  * Fetch response CDN offloading (Jonathan Tan)
    * allows resumable cloning
    * does load balancing
    * gets the static part of history (e.g. until a week ago) from CDN, and
      last bits from "main" server
    * questions about whether to do multiple bits offloaded (e.g. almost
      full clone, only stuff from last month, etc.); can server keep track of
      manifest and direct client to necessary subset of pack on a CDN?
  * A review of "Big"
    * references, history, wide-checkout, deep-checkout, lots to gc, etc.
    * newer stuff: partial clones, worktrees, commit-graph
    * plan to do a breakout session later
  * NewHash
    * sha1 -> sha256
    * have sha256 repo locally talking to a server using sha1?
    * as of yesterday, binary that can create either sha1 or sha256 repos
    * will be using fixed length listing of shas in packfile; if given sha1
      is fourth in list, then the corresponding sha256 will be fourth
    * next: interoperation; fetch & push coming up next
    * done a fair amount of work so moving to a new hash in the future with
      a different length should be much less work
    * no automatic translation of commit messages, but maintenance of
      dual-mapping of hashes
    * (Comments on sha1dc & its performance)
    * Submodules is the biggest issue right now
  * Poll: prove vs. jumbled output
    * some people didn't set up prove; some attempts to avoid perl on windows
    * nearly everyone using prove; could switch to it as the default
  * Poll: where should Git Merge be next year?
    * will bring up on list, but Canada is at least an option
    * North America is more likely to get Junio to come
    * I tried to push for North America...
  * Using mailmap by default in git log?
    * People change names for lots of reasons (including transliteration
      differences)
    * Keep an option to not use mailmap
    * People generally positive on the idea
  <Lunch>
  * fetch response sideband-all
    * sidebands for progress messages and errors
    * sideband currently limited to when sending packfile
    * proposal: expand sideband for whole response, not just packfile.
    * particularly useful given ideas to do CDN
    * also needed for keep-alive messages
    * this will be a negotiated new capability (can't do it backward-compatibly)
  * protocol v2 for push
    * ref advertisement the main issue
    * would like to be able to modify the commit message (?!?)
      * rebase-on-push
      * reformat-on-push
    * discussion of how to split messages up into sub-commands
    * a way to retry pushes without re-pushing everything (e.g. someone else
      updated the branch, you then re-merged or rebased locally and want to
      push again, meaning the server already has _most_ of the objects but just
      needs a few new ones)
  * partial clones
    * doing work to have multiple remotes (also ties in to CDN usage)
    * still very tied to having a server around to request additional objects
    * we need to have a way to keep upload-pack open and do multiple requests
    * has some ability to filter trees, but we need them for now for index
      * Matthew DeVore doing some work in this area right now, but it appears
        to be based on depth rather than width?
    * connection with sparse checkout is kind of hacky right now
    * there are reachability enforcement issues in V2, which becomes even more
      of an issue with partial clones (now need to worry about blobs not just
      commits)
    * in a partial clone world, server can't gc
    * sidenote: dumb http support
      * no major hosting provider supports it
      * some people like it due to resumability (e.g. Joey Hess & git annex)
      * cgit provides dumb http support natively
    * questions I had in area: getting list of initial files of interest...
                               gluing together with sparse checkout
                               partial indexes
  <break; talked with Michael H. & Thomas G.: filter-repo, checkout overlay>
  * breakouts: merge, GSoC, structured logging, windows big files; I was
    in "merge"
    * merge-recursive rewrite
      * questions and basic explanation of how the algorithm works
      * want incremental updates on merge-recursive rewrite
      * make merge-recursive code part of libgit.a ?
      * people are very happy about idea to not touch the working tree
    * make rebase --merge the default
      * use performance tests to see how well it compares (p3400-rebase.sh)
      * may later also reimplement the am-specific flags on top of interactive
    * make use of best merge bases in more places (e.g. git diff A...B uses a
      suboptimal one)
    * rebase --rebase-merges:
      * doing a five-way merge, rewriting xdiff to handle five instead of
        three file versions
      * M merges A & B
      * M' should look like a merge of A' and B', but is really involved in
        a five-way merge of A', A, M, B, B' -- and that is necessary in
        order to get evil merges represented
  * overview of "Big"
    * git-sizer (funny: GitLab asks users to run it and return results; GitHub
      runs it for user and shows them the results)
    * large blobs, partial clones
    * partial or hierarchical indexes
  * CI
    * Dscho has a lot of machinery built up around Azure Pipelines
    * PRs to github.com:gitgitgadget/git will automatically be built on
      Windows, macOS, and Linux
    * Interest in getting emails for failures that their topic branch
      caused (note: get topic author from tip commit author if not Junio)
    * This may be able to move to github.com:git/git after Dscho's patches
      merge down
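
As a rough illustration of the positional-correspondence idea in the
NewHash notes above (the fourth sha1 maps to the fourth sha256), here
is a small sketch. It is not the actual pack or index format: the file
names and the shortened object names are placeholders, and line numbers
stand in for fixed-length table slots.

```shell
#!/bin/sh
# Two parallel tables with made-up, shortened placeholder names; real
# object names would be full-length SHA-1 and SHA-256 values.
tmp=$(mktemp -d)
printf '%s\n' sha1-aaaa sha1-bbbb sha1-cccc sha1-dddd >"$tmp/sha1.list"
printf '%s\n' sha256-1111 sha256-2222 sha256-3333 sha256-4444 >"$tmp/sha256.list"

# Translate by position: find the line number of the name in one table
# and read the same line number from the other table.
sha1_to_sha256 () {
	pos=$(grep -n -x -F -- "$1" "$tmp/sha1.list" | cut -d: -f1)
	test -n "$pos" || return 1
	sed -n "${pos}p" "$tmp/sha256.list"
}

# The fourth sha1 entry resolves to the fourth sha256 entry.
fourth=$(sha1_to_sha256 sha1-dddd)
echo "sha1-dddd -> $fourth"

# A name that is not in the table yields a non-zero status.
unknown=0
sha1_to_sha256 sha1-zzzz || unknown=$?

rm -rf "$tmp"
```

With fixed-length entries the lookup needs no separate mapping
structure: the position found in one list is enough to index the
other.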

Stuff that had been mentioned but we didn't get to:
state-of-the-union, commit-graph, evolve (we had the developer of the
feature in mercurial present, but not the folks who had worked on the
feature in git), git filter-repo, maybe a few others I'm forgetting.


* Re: Contributor Summit Topics and Logistics
  2019-02-02 12:33   ` Jakub Narebski
  2019-02-04 19:30     ` Elijah Newren
@ 2019-04-23  3:45     ` Jeff King
  1 sibling, 0 replies; 19+ messages in thread
From: Jeff King @ 2019-04-23  3:45 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

On Sat, Feb 02, 2019 at 01:33:22PM +0100, Jakub Narebski wrote:

> I have noticed a little 'recording' indicator; would the recorded session
> (video or audio only) be made available at some point in time?  Did
> anyone take minutes, or take notes (for example of the Summit agenda
> created at the start of the meeting -- when the audio was muted)?  I
> would be very interested in your impressions.

I did record this. The resulting file is quite large, and full of
incoherent bits and blank spots (where we took a break and turned off
the mics but forgot to pause the recording).

I had planned to try to cut it down (at least roughly removing the
useless spots), but here it is April and I haven't managed to do so. If
anybody wants to volunteer to take a crack at it, let me know.

The video file is a few gigabytes. TBH, I've wondered if just
distributing the audio would be just as useful, since the camera is
mostly a static shot of people who aren't currently talking. ;)

-Peff


