* Contributor Summit Topics and Logistics @ 2019-01-22 7:50 Jeff King 2019-01-22 8:26 ` Jeff King ` (4 more replies) 0 siblings, 5 replies; 19+ messages in thread From: Jeff King @ 2019-01-22 7:50 UTC (permalink / raw) To: git The Git Merge Contributor Summit is a little over a week away. If you're interested in coming but haven't signed up, please do! We have a few spaces available still. Details are in the previous announcement: http://public-inbox.org/git/20181206094805.GA1398@sigill.intra.peff.net/ There's no set agenda; we'll decide what to discuss that day. But if anybody would like to mention topics they are interested in (whether you want to present on them, or just have an open discussion), please do so here. A little advance notice can help people prepare more for the discussions. Even if you're not coming, please feel free to suggest topics (but bonus points if you convince somebody who _is_ coming to lead the session). If you're not coming, you can probably stop reading this message now. The rest is all logistics. We have the room available from 9am-5pm. Breakfast will be served from 9am-10am in the main area (i.e., mingling with workshop attendees and other such ruffians). So I'd suggest to show up around 9am to get registered and mingle, and then we can start the Very Serious Business at 10am. Lunch will be provided at noon, and some snacks in the afternoon. There's no organized dinner. However, there will be a social/drinks event for the broader conference at 7pm; I'll provide more details that day. For people who want to try to join remotely, I don't think we're going to have a particularly fancy AV setup. But there should at least be a big screen (which we typically do not really use for presenting), and I hope we can provide some connectivity. I'll be visiting the venue the day before (Jan 30th) in the late afternoon (Brussels time) and I'll try to do a test run. 
If anybody wants to volunteer to be the guinea pig on the other end of the line, I'd welcome it. The physical setup this year will actually be 4 round tables, instead of one giant table. I'm hoping this will facilitate breaking off into sub-groups and having more intimate conversations, and maybe avoid the "it's hard to hear people at the other end of the table" issues. Or maybe it will just make it worse as we shout to each other from all four tables. I can't wait to see! If you have any other questions or ideas, please share them here (or email me off-list if appropriate). I look forward to seeing people there! -Peff ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Contributor Summit Topics and Logistics 2019-01-22 7:50 Contributor Summit Topics and Logistics Jeff King @ 2019-01-22 8:26 ` Jeff King 2019-01-22 9:17 ` GSoC 2019 (was: Contributor Summit Topics and Logistics) Christian Couder 2019-01-22 18:21 ` Contributor Summit Topics and Logistics Stefan Beller 2019-01-22 18:23 ` Derrick Stolee ` (3 subsequent siblings) 4 siblings, 2 replies; 19+ messages in thread From: Jeff King @ 2019-01-22 8:26 UTC (permalink / raw) To: git; +Cc: Stefan Beller, Christian Couder On Tue, Jan 22, 2019 at 02:50:27AM -0500, Jeff King wrote: > There's no set agenda; we'll decide what to discuss that day. But if > anybody would like to mention topics they are interested in (whether you > want to present on them, or just have an open discussion), please do so > here. A little advance notice can help people prepare more for the > discussions. One topic worth discussing (here or there): the GSoC org deadline is Feb 6th. Last year's org admins were Christian and Stefan (cc'd). Are you both interested and able to continue? -Peff ^ permalink raw reply [flat|nested] 19+ messages in thread
* GSoC 2019 (was: Contributor Summit Topics and Logistics) 2019-01-22 8:26 ` Jeff King @ 2019-01-22 9:17 ` Christian Couder 2019-01-31 2:02 ` SZEDER Gábor 2019-01-22 18:21 ` Contributor Summit Topics and Logistics Stefan Beller 1 sibling, 1 reply; 19+ messages in thread From: Christian Couder @ 2019-01-22 9:17 UTC (permalink / raw) To: Jeff King; +Cc: git, Stefan Beller On Tue, Jan 22, 2019 at 9:26 AM Jeff King <peff@peff.net> wrote: > > On Tue, Jan 22, 2019 at 02:50:27AM -0500, Jeff King wrote: > > > There's no set agenda; we'll decide what to discuss that day. But if > > anybody would like to mention topics they are interested in (whether you > > want to present on them, or just have an open discussion), please do so > > here. A little advance notice can help people prepare more for the > > discussions. > > One topic worth discussing (here or there): the GSoC org deadline is Feb > 6th. Last year's org admins were Christian and Stefan (cc'd). Are you > both interested and able to continue? Yeah, I am interested and able to both be org admin and mentor. Thanks for talking about this. I think that as usual we will have to prepare a few pages about: - our application (like https://git.github.io/SoC-2018-Org-Application/) - microprojects idea for interested students (like https://git.github.io/SoC-2018-Microprojects/) - project ideas (like https://git.github.io/SoC-2018-Ideas/) Suggestions for microprojects or project ideas are welcome! Volunteers for mentoring or org admin are welcome too! ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: GSoC 2019 (was: Contributor Summit Topics and Logistics)
From: SZEDER Gábor @ 2019-01-31 2:02 UTC
To: Christian Couder; +Cc: Jeff King, git, Stefan Beller

On Tue, Jan 22, 2019 at 10:17:59AM +0100, Christian Couder wrote:
> - microprojects idea for interested students (like
>   https://git.github.io/SoC-2018-Microprojects/)

> Suggestions for microprojects or project ideas are welcome! Volunteers
> for mentoring or org admin are welcome too!

I think we should remove most (all?) CI-related microprojects.

- The first three are about adding static analyzers. Now, while
  adding a new build job to run a static analyzer is easy enough, it's
  also next to useless or even downright harmful in itself. Static
  analyzers are inherently prone to false positives, and dealing with
  those is definitely beyond the scope of a microproject. And adding a
  static analysis build job that always fails because of undealt-with
  false positives, and thus makes the whole build fail, will just make
  life harder for those who take the effort to look at CI results.
  Last year we had submissions for these microprojects, but in the end
  they were not picked up because of this.

- One project suggests installing CVS, Subversion and Apache in the CI
  environments to increase test coverage. Well, Subversion and Apache
  are already installed, and have been for a long time (though
  $GIT_TEST_SVNSERVE is not enabled (don't know why) and one test
  script is skipped because "svn-info test (SVN version: 1.8.8 not
  supported)"). That leaves only CVS, which is perhaps too small a
  microproject (perhaps even by old standards; our microprojects grew
  considerably over the years).
- Finally, the last one is about building a webpage that analyses Travis CI test results and identifies flaky tests, and then goes on to suggest that "look at the randomly failing tests and try to figure out why they fail". I've got my fair share in fixing flaky tests, and IMO doing so is definitely beyond the scope of a microproject. Ok, after suggesting the removal of five microproject ideas, here is a suggestion for a new one: Find a test script that verifies the presence/absence of files/directories with 'test -(e|f|d|...)' and replace them with the appropriate 'test_path_is_file', 'test_path_is_dir', etc. helper functions. The good thing about this is that there are plenty of those test scripts :) ^ permalink raw reply [flat|nested] 19+ messages in thread
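To make the suggested microproject concrete, here is a minimal sketch of a script that finds the simplest candidates in a test script. It is only an illustration, not something posted to the list: the function names `rewrite_line` and `rewrite_script` are hypothetical, and it deliberately handles only a lone, unnegated 'test -f' or 'test -d' on its own line, leaving negated or chained checks for a human to review.

```python
import re

# Plain `test` flags and the helper that would replace them.
# Only -f and -d are mapped here; other flags need a closer look.
HELPERS = {"-f": "test_path_is_file", "-d": "test_path_is_dir"}

# A line consisting solely of `test -f <path>` or `test -d <path>`,
# optionally followed by the `&&` that chains it to the next command.
LINE = re.compile(r"^(\s*)test (-[fd]) (\S+)(\s*&&)?\s*$")


def rewrite_line(line):
    """Return the line with a simple `test -f/-d` replaced by its helper."""
    match = LINE.match(line)
    if not match:
        return line  # negated, chained or otherwise complex: leave it alone
    indent, flag, path, chain = match.groups()
    return f"{indent}{HELPERS[flag]} {path}{chain or ''}"


def rewrite_script(text):
    """Apply rewrite_line() to every line of a test script."""
    return "\n".join(rewrite_line(line) for line in text.split("\n"))
```

Each rewritten line would still need a manual check, since the helpers are not always drop-in replacements in every context (for example inside a negation or a subshell condition).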
* Re: GSoC 2019 (was: Contributor Summit Topics and Logistics) 2019-01-31 2:02 ` SZEDER Gábor @ 2019-01-31 6:11 ` Christian Couder 0 siblings, 0 replies; 19+ messages in thread From: Christian Couder @ 2019-01-31 6:11 UTC (permalink / raw) To: SZEDER Gábor; +Cc: Jeff King, git, Stefan Beller On Thu, Jan 31, 2019 at 3:02 AM SZEDER Gábor <szeder.dev@gmail.com> wrote: > > I think we should remove most (all?) CI-related microprojects. Yeah, I agree that they don't make sense anymore. > Ok, after suggesting the removal of five microproject ideas, here is a > suggestion for a new one: > > Find a test script that verifies the presence/absence of > files/directories with 'test -(e|f|d|...)' and replace them with the > appropriate 'test_path_is_file', 'test_path_is_dir', etc. helper > functions. > > The good thing about this is that there are plenty of those test > scripts :) Thank you for this suggestion, I will add it. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Contributor Summit Topics and Logistics 2019-01-22 8:26 ` Jeff King 2019-01-22 9:17 ` GSoC 2019 (was: Contributor Summit Topics and Logistics) Christian Couder @ 2019-01-22 18:21 ` Stefan Beller 2019-01-22 20:53 ` Jeff King 1 sibling, 1 reply; 19+ messages in thread From: Stefan Beller @ 2019-01-22 18:21 UTC (permalink / raw) To: Jeff King; +Cc: git, Christian Couder On Tue, Jan 22, 2019 at 12:26 AM Jeff King <peff@peff.net> wrote: > > On Tue, Jan 22, 2019 at 02:50:27AM -0500, Jeff King wrote: > > > There's no set agenda; we'll decide what to discuss that day. But if > > anybody would like to mention topics they are interested in (whether you > > want to present on them, or just have an open discussion), please do so > > here. A little advance notice can help people prepare more for the > > discussions. > > One topic worth discussing (here or there): the GSoC org deadline is Feb > 6th. Last year's org admins were Christian and Stefan (cc'd). Are you > both interested and able to continue? I am treading lightly this year; if no one else is around I could be an admin (definitely not a mentor), but I'd prefer not to. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Contributor Summit Topics and Logistics 2019-01-22 18:21 ` Contributor Summit Topics and Logistics Stefan Beller @ 2019-01-22 20:53 ` Jeff King 0 siblings, 0 replies; 19+ messages in thread From: Jeff King @ 2019-01-22 20:53 UTC (permalink / raw) To: Stefan Beller; +Cc: git, Christian Couder On Tue, Jan 22, 2019 at 10:21:56AM -0800, Stefan Beller wrote: > > > There's no set agenda; we'll decide what to discuss that day. But if > > > anybody would like to mention topics they are interested in (whether you > > > want to present on them, or just have an open discussion), please do so > > > here. A little advance notice can help people prepare more for the > > > discussions. > > > > One topic worth discussing (here or there): the GSoC org deadline is Feb > > 6th. Last year's org admins were Christian and Stefan (cc'd). Are you > > both interested and able to continue? > > I am treading lightly this year; if no one else is around I could be an > admin (definitely not a mentor), but I'd prefer not to. I can be an org admin, as well. If Christian is willing, then I think you don't need to do it this year. -Peff ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Contributor Summit Topics and Logistics 2019-01-22 7:50 Contributor Summit Topics and Logistics Jeff King 2019-01-22 8:26 ` Jeff King @ 2019-01-22 18:23 ` Derrick Stolee 2019-01-24 8:57 ` Ævar Arnfjörð Bjarmason 2019-01-22 20:30 ` Elijah Newren ` (2 subsequent siblings) 4 siblings, 1 reply; 19+ messages in thread From: Derrick Stolee @ 2019-01-22 18:23 UTC (permalink / raw) To: Jeff King, git On 1/22/2019 2:50 AM, Jeff King wrote: > For people who want to try to join remotely, I don't think we're going > to have a particularly fancy AV setup. But there should at least be a > big screen (which we typically do not really use for presenting), and I > hope we can provide some connectivity. I'll be visiting the venue the > day before (Jan 30th) in the late afternoon (Brussels time) and I'll try > to do a test run. If anybody wants to volunteer to be the guinea pig on > the other end of the line, I'd welcome it. I would like to join remotely, so I volunteer to do a test run. I'll need to wake up early, so let's set an exact time privately. Topics I would like to hear about: - commit-graph status report (I can lead, if I'm able to join) - multi-pack-index status report (same) - reftable - partial clone - test coverage report, usefulness or improvements Thanks, -Stolee ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Contributor Summit Topics and Logistics
From: Ævar Arnfjörð Bjarmason @ 2019-01-24 8:57 UTC
To: Derrick Stolee; +Cc: Jeff King, git

On Tue, Jan 22 2019, Derrick Stolee wrote:

> On 1/22/2019 2:50 AM, Jeff King wrote:
>> For people who want to try to join remotely, I don't think we're going
>> to have a particularly fancy AV setup. But there should at least be a
>> big screen (which we typically do not really use for presenting), and I
>> hope we can provide some connectivity. I'll be visiting the venue the
>> day before (Jan 30th) in the late afternoon (Brussels time) and I'll try
>> to do a test run. If anybody wants to volunteer to be the guinea pig on
>> the other end of the line, I'd welcome it.
>
> I would like to join remotely, so I volunteer to do a test run. I'll
> need to wake up early, so let's set an exact time privately.
>
>
> Topics I would like to hear about:
>
> - commit-graph status report (I can lead, if I'm able to join)

While we're at it, it would be useful to discuss what attendees think
about making core.commitGraph=true && gc.writeCommitGraph=true the
default.

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Contributor Summit Topics and Logistics
From: Derrick Stolee @ 2019-01-29 18:22 UTC
To: Ævar Arnfjörð Bjarmason; +Cc: Jeff King, git

I was hoping to attend the contributors' summit remotely, but now my
leave is starting before then. This email contains a summary of what I
would have added to the discussion.

Thanks,
-Stolee

Commit-Graph Status Report
==========================

I'm really happy with the progress in this area, especially with the
number of other contributors working on the feature! Thanks Ævar,
Jonathan, Josh, Stefan, and Szeder in particular.

Here are some directions to take the feature in the near future:

File Format v2
--------------

The new format version [1] specifically fixes some shortcomings in v1:

* Uses the 4-byte format id for the hash algorithm.
* Creates a separate version byte for the reachability index.
* Enforces that the unused byte is zero until we use it for
  incremental writes.

Hopefully, this is the last time we need to update the file header.

[1] https://public-inbox.org/git/pull.112.git.gitgitgadget@gmail.com/
    [PATCH 0/6] Create commit-graph file format v2

Reachability Index
------------------

As discussed on-list [2], we want to replace generation numbers with a
different (negative-cut) reachability index. I used the term "corrected
commit date". The definition is:

* If a commit has no parents, then its corrected commit date is its
  commit date.

* If a commit has parents, then its corrected commit date is the
  maximum of:
  - its commit date
  - one more than the maximum corrected commit date of its parents

The benefits of this definition were discussed already, but to
summarize:

* This definition will work _at least as well_ as the commit date
  heuristic, with the added bonus of being absolutely sure our results
  are right.
  We can update algorithms like paint_down_to_common() to use this
  reachability index without performance problems in some cases.

* If someone creates a terrible commit with a date that is far in the
  future, this definition is no worse than existing generation numbers
  (because we enforce that the corrected commit date is strictly larger
  than the parents' corrected commit date).

To implement this index, we can re-use the 30 bits per commit in the
commit-graph file that are used for generation numbers, but use them
instead for the difference between the corrected commit date and the
actual commit date. File format v2 gives us a version value that can be
incremented to signal the change in meaning.

Some work is required to adjust the existing generation-number-aware
algorithms to care about an "arbitrary" reachability index. It could be
as easy as a helper function that returns a function pointer to the
proper compare function.

If someone wants to move forward on this topic while I'm gone, please
volunteer. Otherwise, this will be among my first items to work on when
I return from leave.

[2] https://public-inbox.org/git/6367e30a-1b3a-4fe9-611b-d931f51effef@gmail.com/
    [RFC] Generation Number v2

Incremental Writes
------------------

Similar to the split index, an incremental commit-graph file can be
implemented to reduce the write time when adding commits to an existing
(large) commit-graph. In this case, the .git/objects/info/commit-graph
file would be small, and have a pointer to a base file, say
"cgraph-<hash>.cgraph", that contains the majority of the commits.

The important thing to keep in mind here is that we use integers to
refer to a commit's parents. This integer would need to refer to the
order of commits when you concatenate the ordered lists from each file.
When doing this, we can point into the base file as well as the tip
file. Since the base commit-graph file would be closed under
reachability, it only needs to care about commits in its file.
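
The position arithmetic in the paragraph above can be illustrated with a small sketch. This is not code from the proposal: the function name `locate` and the idea of passing a plain list of per-file commit counts are assumptions made for the example. It maps an integer position in the concatenated commit order back to a file and an index within that file.

```python
from bisect import bisect_right
from itertools import accumulate


def locate(position, file_sizes):
    """Map a position in the concatenated order to (file index, local index).

    file_sizes lists the number of commits in each file, base file(s)
    first and the tip file last (a hypothetical in-memory stand-in for
    the on-disk layout).
    """
    ends = list(accumulate(file_sizes))  # exclusive end position of each file
    total = ends[-1] if ends else 0
    if not 0 <= position < total:
        raise IndexError(position)
    file_index = bisect_right(ends, position)
    start = ends[file_index - 1] if file_index else 0
    return file_index, position - start
```

With a base file of 1000 commits and a tip file of 5, `locate(999, [1000, 5])` falls in the base file and `locate(1000, [1000, 5])` is the first commit of the tip file. A parent position stored in the tip file can land in either file, while one stored in the base file always stays within the base file.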
It is also possible to have multiple base files, and we can use the unused byte in the commit-graph file format v2 to store the number of base files. We can then store a list of file names in a new chunk, presenting the ordered list of base files. We still want to keep this list short, but there may be benefits to a variable number. I expect the first version would limit the construction to one base file for simplicity's sake. When this is implemented, we can use it to write the commit-graph at fetch time. A config setting, say 'fetch.writeCommitGraph', could enable this write. Since most writes would add a small number of commits compared to the large base file, this would be a more reasonable cost to add to a fetch. Since we verify the pack upon download, the commits it contained will already be in the memory cache and we won't need to re-parse those commits. Volunteers welcome. Bloom Filters ------------- Using bloom filters to speed up file history has been discussed and prototyped on-list (see [12] and the thread before it). Thanks for lots of contributions in this area! A lot of people have shown an interest in this feature, and it is particularly helpful with server-side queries. Any implementation here should check that it is helping 'git blame' as much as it can [13]. It's entirely possible that the performance problem mentioned there is more about the size of the file and not finding the commits that changed the file, but it's worth digging in here. A few people have mentioned that they are interested in pursuing this implementation, so it would be good to declare intentions during the summit. [12] https://public-inbox.org/git/61559c5b-546e-d61b-d2e1-68de692f5972@gmail.com/ [13] https://public-inbox.org/git/CABXAcUzoNJ6s3=2xZfWYQUZ_AUefwP=5UVUgMnafKHHtufzbSA@mail.gmail.com/ Enabled by Default? ------------------- I proposed turning on the feature by default [3], but that had some resistance [4] and I never followed up to that remark. 
(It involved the hope that we could consolidate commit walks during a gc/repack. I'm unsure this is a goal worth pursuing.) Since there has been more interest recently [5] I think it would be good to discuss what concerns we may have in turning this on by default. Specifically, make 'core.commitGraph' and 'gc.writeCommitGraph' default to 'true'. Users could still opt-out. [3] https://public-inbox.org/git/pull.50.git.gitgitgadget@gmail.com/ [4] https://public-inbox.org/git/xmqqlg6vvrur.fsf@gitster-ct.c.googlers.com/ [5] https://public-inbox.org/git/87bm464elm.fsf@evledraar.gmail.com/ Multi-Pack-Index Status Report ============================== The multi-pack-index feature shipped with Git 2.20! We've been using this feature (or, a similar implementation as it changed a lot with review) in VFS for Git for a year now. It's been critical to solving the many-packs problem we have with our prefetch packs model. Our next version ships with Git 2.20 and the upstream implementation. We are now able to start tackling our space problem with these many packs. Our solution includes the 'expire' and 'repack' subcommands [6]. We will run these in the background [7] to slowly reduce the space we are using. Since Git references the multi-pack-index, we are able to delete packs that have no referenced objects from the multi-pack-index without interrupting user commands (I don't think the same holds for 'git repack'). This "highly available" model makes me think that this could be useful to other scenarios. We are looking for interest from other users or groups in this feature. We want this feature to be adopted, and that means the future of the feature should depend on more scenarios than our specific case. Here are some ideas to make this more useful for others: 1. Incremental writes. See the commit-graph section for details. This would allow writing the multi-pack-index on fetch, helping users who have set gc.auto=0 keep performance high even though they have packs piling up. 2. 
Stable object order and bitmaps. This is discussed in the design
document [8]. This is more useful for server environments.

[6] https://public-inbox.org/git/pull.92.git.gitgitgadget@gmail.com/
    [PATCH 0/5] Create 'expire' and 'repack' verbs for git-multi-pack-index

[7] https://github.com/Microsoft/VFSForGit/blob/9cad154293456a41bef593a75e1ad2cb840c8524/GVFS/GVFS.Common/Maintenance/PackfileMaintenanceStep.cs#L141-L158
    The use of 'expire' and 'repack' in VFS for Git

[8] https://github.com/git/git/blob/master/Documentation/technical/multi-pack-index.txt#L77-L84
    multi-pack-index and stable object order

Test Coverage Report
====================

My intentions in creating the test coverage report were to avoid bugs by
double-checking that we are testing all logic that was both (1)
non-trivial, and (2) new. The report does tend to be noisy with a lot of
trivial blocks (error cases) or code that was not covered before but was
updated with a mechanical refactoring. I'm hoping to attack these issues
by using a new approach when generating the reports.

I've created a GitHub repo [9] that contains new logic for generating
the test coverage report. In particular, it will now generate a text
report that will be sent to the list, but also an HTML report that will
be posted online (see [10] for an example).

In addition, the repo has an 'ignored' directory. This directory will be
filled with files that mirror their corresponding files in the Git repo,
but contain line numbers and contents for lines that have been deemed
"unimportant". For instance, I didn't want to just ignore all lines that
say simply "return;" but we can check that line 302 of
builtin/checkout.c says "return;" and ignore that line in the
report [11].

I'll try to review the test report and add ignored lines before
generating the next report. I'll also accept PRs that add ignored lines
(with justification). I think this will help the usefulness
significantly, especially as topics merge down into 'next' and 'master'.
If we track the ignored lines throughout a cycle, then the report for 'maint' versus 'master' near release time may actually be reasonable to read. Any other feedback on the reports is greatly appreciated! [9] https://github.com/derrickstolee/git-test-coverage [10] https://derrickstolee.github.io/git-test-coverage/reports/2019-01-29.htm [11] https://github.com/derrickstolee/git-test-coverage/blob/master/ignored/builtin/checkout.c ^ permalink raw reply [flat|nested] 19+ messages in thread
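Returning to the "Reachability Index" section of the message above, the corrected commit date definition can be written out as a short sketch. It is an illustration only, not Git's implementation: the function names and the dictionary-based commit representation are hypothetical, and the history is assumed to be acyclic. The second function computes the per-commit difference that the message proposes storing in the 30 bits currently used for generation numbers.

```python
def corrected_commit_dates(commits):
    """Compute corrected commit dates.

    commits maps a commit id to (commit_date, [parent ids]); this is a
    hypothetical in-memory stand-in for the real commit data.
    """
    corrected = {}
    for start in commits:
        stack = [start]  # explicit stack, so deep histories do not recurse
        while stack:
            cid = stack[-1]
            if cid in corrected:
                stack.pop()
                continue
            date, parents = commits[cid]
            pending = [p for p in parents if p not in corrected]
            if pending:
                stack.extend(pending)  # visit parents first
                continue
            # Maximum of the commit's own date and one more than each
            # parent's corrected commit date.
            corrected[cid] = max([date] + [corrected[p] + 1 for p in parents])
            stack.pop()
    return corrected


def date_offsets(commits, corrected):
    """Difference between corrected and actual commit date for each commit."""
    return {cid: corrected[cid] - commits[cid][0] for cid in commits}
```

For example, with a root commit dated 100, a child dated far in the future (5000) and a grandchild dated 200, the grandchild's corrected commit date becomes 5001, so it stays strictly larger than its parent's even though its own date is earlier.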
* Re: Contributor Summit Topics and Logistics 2019-01-22 7:50 Contributor Summit Topics and Logistics Jeff King 2019-01-22 8:26 ` Jeff King 2019-01-22 18:23 ` Derrick Stolee @ 2019-01-22 20:30 ` Elijah Newren 2019-01-30 20:57 ` Ævar Arnfjörð Bjarmason 2019-01-30 23:07 ` Jeff King 4 siblings, 0 replies; 19+ messages in thread From: Elijah Newren @ 2019-01-22 20:30 UTC (permalink / raw) To: Jeff King; +Cc: Git Mailing List On Mon, Jan 21, 2019 at 11:52 PM Jeff King <peff@peff.net> wrote: > > The Git Merge Contributor Summit is a little over a week away. If you're > interested in coming but haven't signed up, please do! We have a few > spaces available still. Details are in the previous announcement: > > http://public-inbox.org/git/20181206094805.GA1398@sigill.intra.peff.net/ > > There's no set agenda; we'll decide what to discuss that day. But if > anybody would like to mention topics they are interested in (whether you > want to present on them, or just have an open discussion), please do so > here. A little advance notice can help people prepare more for the > discussions. * git repo-filter[1] or whatever it ends up being named (filter-branch alternative): is it wanted in git.git? * merge-recursive rewrite -- steps others want to see me take in that process? * Making --merge option of rebase be the default[2]: what steps need to be taken? * I'll second Derrick's request for partial clone, perhaps also briefly discuss related capabilities like sparse checkouts and partial indexes too? [1] https://public-inbox.org/git/20181111062312.16342-1-newren@gmail.com/ [2] https://public-inbox.org/git/xmqqh8jeh1id.fsf@gitster-ct.c.googlers.com/ ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Contributor Summit Topics and Logistics
From: Ævar Arnfjörð Bjarmason @ 2019-01-30 20:57 UTC
To: Jeff King; +Cc: git, Derrick Stolee, Johannes Schindelin

On Tue, Jan 22 2019, Jeff King wrote:

> There's no set agenda; we'll decide what to discuss that day. But if
> anybody would like to mention topics they are interested in (whether you
> want to present on them, or just have an open discussion), please do so
> here. A little advance notice can help people prepare more for the
> discussions.

This is definitely a "little" advance seeing as it's tomorrow morning.

> Even if you're not coming, please feel free to suggest topics (but bonus
> points if you convince somebody who _is_ coming to lead the session).

Things I'd be interested in hearing / talking about that haven't yet
been mentioned.

These are in descending order of how interesting I think these will be
to a general audience, to the point where maybe only I care about the
bottom of this list...

* "Big repos". We had discussions about this in years past. It's a very
  sprawly and vague topic. Do we mean big history, big blobs, big (in
  size/depth/width) checkouts etc?

  But regardless, many of us deal with this in one way or another, and
  it would be good to have a top-level overview of how the various
  solutions to this that are being integrated into git.git are doing /
  what people see on the horizon for scalability.

* "Structured remote logging". We had an RFC spec for turning our trace
  format into something more structural with a way to send it to a
  remote server.
  There were both implementation & privacy concerns; last time, at
  least a couple of users of git reported having in-house patches for
  this (not ready for upstream). Where are we on this now?

* "commit graph by default". I had this on my list, but Derrick Stolee
  sent out a much better summary:
  https://public-inbox.org/git/6d0dc2a2-120c-0d42-1910-14ffed7adaf1@gmail.com/

* I've been using (but haven't yet re-rolled) my "relative SHA-1
  abbreviation" series
  (https://public-inbox.org/git/20180608224136.20220-1-avarab@gmail.com/)

  I'm interested in seeing if anyone else is interested in this, and
  particularly what the overlap (if any) is between this & midx.

* "Making strict fsck checks on clone the default". I worked a bit on
  this in the last year in between a couple of security issues that
  needed fsck checks. Has caveats etc., but would give users some more
  protections.

* "The CI I set up for git on the GCC Compile Farm". Can be folded into
  a general "state of git.git CI" topic:
  https://gitlab.com/git-vcs/git-ci/pipelines

* If people care about making the TAP mode in our test suite mandatory
  (i.e. require "prove" or a tool like it). See
  https://public-inbox.org/git/87zhrj2n2l.fsf@evledraar.gmail.com/
* Re: Contributor Summit Topics and Logistics
From: Jeff Hostetler @ 2019-01-30 22:26 UTC
To: Ævar Arnfjörð Bjarmason, Jeff King
Cc: git, Derrick Stolee, Johannes Schindelin

On 1/30/2019 3:57 PM, Ævar Arnfjörð Bjarmason wrote:
>
> On Tue, Jan 22 2019, Jeff King wrote:
> ...
> * "Structured remote logging". We had an RFC spec for turning our trace
>   format into something more structural with a way to send it to a
>   remote server. There were both implementation & privacy concerns;
>   last time, at least a couple of users of git reported having in-house
>   patches for this (not ready for upstream). Where are we on this now?

I won't be attending Git Merge this year, but I can talk about this
work here.

My earlier "structured logging" and/or "telemetry" proposals have been
replaced by my Trace2 patch series now in "pu".

The Trace2 feature is designed to report trace and performance data
from within the git process to a local log file, unix domain socket, or
Windows named pipe. Functions in the Trace2 API generate structured
data and can write either structured (JSON) or non-structured formats
to disk. (It should not be hard to add a binary structured format too,
but that is beyond the scope of the current patch series.)

The JSON stream is suitable for post-processing by a local process.
This can be a daemon listening to the stream or a cron job processing
the trace data after the fact. I consider it to be the job of the
post-processor (after aggregating, filtering or whatever) to decide
what to do with the data. This lets the user and/or sysadmin control
how and when data is collected. The post-processor is free to hook into
something like syslog or ETW or write to a custom DB.

Post-processing tools are not included in the patch series.
Internally within Microsoft, we have a local Windows Service listening on a named pipe and collecting events from all git processes for our GVFS users in the Windows OS repo. It computes a summary record for each git command, for example combining the argv from the "start" event with the elapsed time from the "exit" event into a single record. The service then sends the aggregate records to a centralized database. This lets us run various database queries to try to understand pain points that our OS developers are experiencing (and that may not show up on my machine) and help us prioritize future perf and scaling work. But again, this service is but one possible post-processor and is for internal-use-only. The Trace2 feature itself does not have any remote capability. It just writes data locally. Jeff ^ permalink raw reply [flat|nested] 19+ messages in thread
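As an illustration of the kind of post-processor described in the message above, here is a minimal sketch that folds a stream of JSON event lines into one summary record per git command, combining the argv from the "start" event with the elapsed time from the "exit" event. It is not part of the Trace2 series and not the internal Microsoft service. The field names used here ("event", "sid", "argv", "t_abs", "code") are assumptions made for the example; check them against the Trace2 documentation before relying on them.

```python
import json


def summarize(lines):
    """Combine 'start' and 'exit' events sharing a session id into records.

    lines is any iterable of JSON-encoded event strings, one event per
    line. The field names are assumptions made for this sketch.
    """
    started = {}    # session id -> argv seen in the "start" event
    summaries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        sid = event.get("sid")
        if event.get("event") == "start":
            started[sid] = event.get("argv", [])
        elif event.get("event") == "exit" and sid in started:
            summaries.append({
                "sid": sid,
                "argv": started.pop(sid),
                "elapsed": event.get("t_abs"),
                "code": event.get("code"),
            })
    return summaries
```

A daemon reading from a socket or a cron job reading a log file could both feed lines into `summarize()`; what happens to the summary records afterwards (syslog, ETW, a database) is left to the post-processor, as the message says.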
* Re: Contributor Summit Topics and Logistics
  2019-01-30 20:57 ` Ævar Arnfjörð Bjarmason
  2019-01-30 22:26   ` Jeff Hostetler
@ 2019-01-30 22:51   ` Philip Oakley
  2019-01-30 23:13     ` Christian Couder
  1 sibling, 1 reply; 19+ messages in thread
From: Philip Oakley @ 2019-01-30 22:51 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, Jeff King
  Cc: git, Derrick Stolee, Johannes Schindelin

On 30/01/2019 20:57, Ævar Arnfjörð Bjarmason wrote:
> On Tue, Jan 22 2019, Jeff King wrote:
>
>> There's no set agenda; we'll decide what to discuss that day. But if
>> anybody would like to mention topics they are interested in (whether you
>> want to present on them, or just have an open discussion), please do so
>> here. A little advance notice can help people prepare more for the
>> discussions.
> This is definitely a "little" advance seeing as it's tomorrow morning.
>
>> Even if you're not coming, please feel free to suggest topics (but bonus
>> points if you convince somebody who _is_ coming to lead the session).
> Things I'd be interested in hearing / talking about that haven't
> yet been mentioned.
>
> These are in descending order of how interesting I think these will be
> to a general audience, to the point where maybe only I care about the
> bottom of this list...
>
>  * "Big repos". We had discussions about this in years past. It's a very
>    sprawly and vague topic. Do we mean big history, big blobs, big (in
>    size/depth/width) checkouts etc?
>
>    But regardless, many of us deal with this in one way or another, and
>    it would be good to have a top-level overview of how the various
>    solutions to this that are being integrated into git.git are doing /
>    what people see on the horizon for scalability.

I'd also like a bit of discussion about ensuring that the partial clone
& filtering aspects of 'big repos' (if partial is needed/used then it's
big ...) still retain the full 'distributed' nature and capability of git.

Also in some environments the filtering may want to be applied at the
server end (based on its knowledge of the specific user).

Ultimately it should also pull in some of the sub-module aspects as
super projects are just big repos in disguise.

>
>  * "Structured remote logging". We had an RFC spec for turning our trace
>    format into something more structural with a way to send it to a
>    remote server. There were both implementation & privacy concerns;
>    last time at least a couple of users of git reported having in-house
>    patches for this (not ready for upstream). Where are we on this now?
>
>  * "commit graph by default". I had this on my list, but Derrick Stolee
>    sent out a much better summary:
>    https://public-inbox.org/git/6d0dc2a2-120c-0d42-1910-14ffed7adaf1@gmail.com/
>
>  * I've been using (but haven't yet re-rolled) my "relative SHA-1
>    abbreviation" series
>    (https://public-inbox.org/git/20180608224136.20220-1-avarab@gmail.com/)
>
>    I'm interested in seeing if anyone else is interested in this, and
>    particularly what the overlap (if any) is between this & midx.
>
>  * "Making strict fsck checks on clone the default". I worked a bit on
>    this in the last year in between a couple of security issues that
>    needed fsck checks. Has caveats etc., but would give users some more
>    protections.
>
>  * "The CI I set up for git on the GCC Compile Farm". Can be folded into
>    a general "state of git.git CI" topic:
>    https://gitlab.com/git-vcs/git-ci/pipelines
>
>  * If people care about making the TAP mode in our test suite mandatory
>    (i.e. require "prove" or a tool like it). See
>    https://public-inbox.org/git/87zhrj2n2l.fsf@evledraar.gmail.com/

I also had some questions regarding tree walk issues for follower and
friendly fork repos that have lots of deadheads within their tree, such
as previous release versions in Git for Windows. It should be easier to
filter those deadheads (or at least suggest the best way of creating
such sentinels).

-- 
Philip

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: Contributor Summit Topics and Logistics
  2019-01-30 22:51   ` Philip Oakley
@ 2019-01-30 23:13     ` Christian Couder
  0 siblings, 0 replies; 19+ messages in thread
From: Christian Couder @ 2019-01-30 23:13 UTC (permalink / raw)
  To: Philip Oakley
  Cc: Ævar Arnfjörð Bjarmason, Jeff King, git, Derrick Stolee,
	Johannes Schindelin

On Thu, Jan 31, 2019 at 12:05 AM Philip Oakley <philipoakley@iee.org> wrote:
>
> On 30/01/2019 20:57, Ævar Arnfjörð Bjarmason wrote:
> >
> >   * "Big repos". We had discussions about this in years past. It's a very
> >     sprawly and vague topic. Do we mean big history, big blobs, big (in
> >     size/depth/width) checkouts etc?
> >
> >     But regardless, many of us deal with this in one way or another, and
> >     it would be good to have a top-level overview of how the various
> >     solutions to this that are being integrated into git.git are doing /
> >     what people see on the horizon for scalability.

I am also very interested in that topic ;-)

> I'd also like a bit of discussion about ensuring that the partial clone
> & filtering aspects of 'big repos' (if partial is needed/used then it's
> big ...) still retain the full 'distributed' nature and capability of git.

And in this too, especially regarding my work on many promisor/partial
clone remotes (previously ODBs).

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: Contributor Summit Topics and Logistics
  2019-01-22  7:50 Contributor Summit Topics and Logistics Jeff King
                   ` (3 preceding siblings ...)
  2019-01-30 20:57 ` Ævar Arnfjörð Bjarmason
@ 2019-01-30 23:07 ` Jeff King
  2019-02-02 12:33   ` Jakub Narebski
  4 siblings, 1 reply; 19+ messages in thread
From: Jeff King @ 2019-01-30 23:07 UTC (permalink / raw)
  To: git

On Tue, Jan 22, 2019 at 02:50:27AM -0500, Jeff King wrote:

> If you're not coming, you can probably stop reading this message now.
> The rest is all logistics.

Here are a few additional last-minute logistics:

> For people who want to try to join remotely, I don't think we're going
> to have a particularly fancy AV setup. But there should at least be a
> big screen (which we typically do not really use for presenting), and I
> hope we can provide some connectivity. I'll be visiting the venue the
> day before (Jan 30th) in the late afternoon (Brussels time) and I'll try
> to do a test run. If anybody wants to volunteer to be the guinea pig on
> the other end of the line, I'd welcome it.

The remote connection will be done via Zoom, using this URL which will
become active shortly before 10:00am (Brussels time):

  https://github.zoom.us/j/186903655

You may need to download an app or other software; solutions are
available for most platforms, and the zoom site should guide you.

Note that this is _not_ configured as a one-way webinar. It's a real
video-conference where joiners can participate in the discussion. So
spectators from the community are OK, but please leave your camera/mic
off if you're not actively participating.

> The physical setup this year will actually be 4 round tables, instead of
> one giant table. I'm hoping this will facilitate breaking off into
> sub-groups and having more intimate conversations, and maybe avoid the
> "it's hard to hear people at the other end of the table" issues. Or
> maybe it will just make it worse as we shout to each other from all four
> tables. I can't wait to see!

There will be outlets for charging laptops, but probably only about half
as many as there are people. So plan accordingly.

> There's no organized dinner. However, there will be a social/drinks
> event for the broader conference at 7pm; I'll provide more details
> that day.

This is indeed happening, and is open to all Git Merge attendees.
Details should have been emailed out to the email address you registered
with. Note that they're asking people to RSVP through a web link. Please
do so if you're planning on coming!

See everybody tomorrow at 9:00am.

-Peff

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: Contributor Summit Topics and Logistics
  2019-01-30 23:07 ` Jeff King
@ 2019-02-02 12:33   ` Jakub Narebski
  2019-02-04 19:30     ` Elijah Newren
  2019-04-23  3:45     ` Jeff King
  0 siblings, 2 replies; 19+ messages in thread
From: Jakub Narebski @ 2019-02-02 12:33 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:
> On Tue, Jan 22, 2019 at 02:50:27AM -0500, Jeff King wrote:
>
>> If you're not coming, you can probably stop reading this message now.
>> The rest is all logistics.
>
> Here are a few additional last-minute logistics:
>
>> For people who want to try to join remotely, I don't think we're going
>> to have a particularly fancy AV setup. But there should at least be a
>> big screen (which we typically do not really use for presenting), and I
>> hope we can provide some connectivity. I'll be visiting the venue the
>> day before (Jan 30th) in the late afternoon (Brussels time) and I'll try
>> to do a test run. If anybody wants to volunteer to be the guinea pig on
>> the other end of the line, I'd welcome it.
>
> The remote connection will be done via Zoom, using this URL which will
> become active shortly before 10:00am (Brussels time):
>
>   https://github.zoom.us/j/186903655
>
> You may need to download an app or other software; solutions are
> available for most platforms, and the zoom site should guide you.

Thank you very much for setting this remote connection up. It made it
possible for me to watch the Git Contributor Summit 2019 (and take
notes for Git Rev News).

I already had Zoom installed, so it was not a problem. (As I have
seen, Szeder Gábor was also spectating ;-)

The audio was not always clear; it depended on where the person
speaking was positioned. I understand that it is very difficult to get
good acoustics in such an unstructured setup.

> Note that this is _not_ configured as a one-way webinar. It's a real
> video-conference where joiners can participate in the discussion. So
> spectators from the community are OK, but please leave your camera/mic
> off if you're not actively participating.

As far as I know it went untested (but then nobody announced that he
or she wanted to actively participate remotely).

I didn't stay for the 15-17 breakout session (talking in individual
groups); I wonder how well the remote connection setup would work with
multiple discussions in parallel.

I have noticed a little 'recording' indicator; will the recorded
session (video or audio only) be made available at some point in time?
Did anyone take minutes, or take notes (for example of the Summit
agenda created at the start of the meeting -- when the audio was
muted)? I would be very interested in your impressions.

> See everybody tomorrow at 9:00am.

The event actually started at 10:00am CET.

Thanks again,
-- 
Jakub Narębski

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: Contributor Summit Topics and Logistics
  2019-02-02 12:33   ` Jakub Narebski
@ 2019-02-04 19:30     ` Elijah Newren
  2019-04-23  3:45     ` Jeff King
  1 sibling, 0 replies; 19+ messages in thread
From: Elijah Newren @ 2019-02-04 19:30 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Jeff King, Git Mailing List

Hi Jakub,

On Sat, Feb 2, 2019 at 4:39 AM Jakub Narebski <jnareb@gmail.com> wrote:

> I have noticed a little 'recording' indicator; will the recorded
> session (video or audio only) be made available at some point in time?
> Did anyone take minutes, or take notes (for example of the Summit
> agenda created at the start of the meeting -- when the audio was
> muted)? I would be very interested in your impressions.

I took some notes. I'm not sure how useful they'll be given that they
were meant just for my own memory (my company said I either had to
give a talk at the conference or come back and give a talk to my
coworkers about the conference in order for them to pay for it, so I'm
doing the latter). But, I'll provide them here in case they're useful
to anyone.

Discussion points:

  * Fetch response CDN offloading (Jonathan Tan)
    * allows resumable cloning
    * does load balancing
    * gets the static part of history (e.g. until a week ago) from CDN,
      and last bits from "main" server
    * questions about whether to do multiple bits offloaded (e.g. almost
      full clone, only stuff from last month, etc.); can server keep
      track of manifest and direct client to necessary subset of pack
      on a CDN?
  * A review of "Big"
    * references, history, wide-checkout, deep-checkout, lots to gc, etc.
    * newer stuff: partial clones, worktrees, commit-graph
    * plan to do a breakout session later
  * NewHash
    * sha1 -> sha256
    * have sha256 repo locally talking to a server using sha1?
    * as of yesterday, binary that can create either sha1 or sha256 repos
    * will be using fixed length listing of shas in packfile; if given
      sha1 is fourth in list, then the corresponding sha256 will be
      fourth
    * next: interoperation; fetch & push coming up next
    * done a fair amount of work so moving to a new hash in the future
      with a different length should be much less work
    * no automatic translation of commit messages, but maintenance of
      dual-mapping of hashes
    * (Comments on sha1dc & its performance)
    * Submodules is the biggest issue right now
  * Poll: prove vs. jumbled output
    * some people didn't set up prove; some attempts to avoid perl on
      windows
    * nearly everyone using prove; could switch to it as the default
  * Poll: where should Git Merge be next year?
    * will bring up on list, but Canada is at least an option
    * North America is more likely to get Junio to come
    * I tried to push for North America...
  * Using mailmap by default in git log?
    * People change names for lots of reasons (including transliteration
      differences)
    * Keep an option to not use mailmap
    * People generally positive on the idea

<Lunch>

  * fetch response sideband-all
    * sidebands for progress messages and errors
    * sideband currently limited to when sending packfile
    * proposal: expand sideband for whole response, not just packfile.
      * particularly useful given ideas to do CDN
      * also needed for keep-alive messages
    * this will be a negotiated new capability (can't do it
      backward-compatibly)
  * protocol v2 for push
    * ref advertisement the main issue
    * would like to be able to modify the commit message (?!?)
    * rebase-on-push
    * reformat-on-push
    * discussion of how to split messages up into sub-commands
    * a way to retry pushes without re-pushing everything (e.g. someone
      else updated the branch, you then re-merged or rebased locally and
      want to push again, meaning the server already has _most_ of the
      objects but just needs a few new ones)
  * partial clones
    * doing work to have multiple remotes (also ties in to CDN usage)
    * still very tied to having a server around to request additional
      objects
    * we need to have a way to keep upload-pack open and do multiple
      requests
    * has some ability to filter trees, but we need them for now for index
    * Matthew DeVore doing some work in this area right now, but it
      appears to be based on depth rather than width?
    * connection with sparse checkout is kind of hacky right now
    * there are reachability enforcement issues in V2, which becomes even
      more of an issue with partial clones (now need to worry about blobs
      not just commits)
    * in a partial clone world, server can't gc
    * sidenote: dumb http support
      * no major hosting provider supports it
      * some people like it due to resumability (e.g. Joey Hess & git
        annex)
      * cgit provides dumb http support natively
    * questions I had in area: getting list of initial files of
      interest... gluing together with sparse checkout partial indexes

<break; talked with Michael H. & Thomas G.: filter-repo, checkout overlay>

  * breakouts: merge, GSoC, structured logging, Windows big files; I was
    in "merge"
    * merge-recursive rewrite
      * questions and basic explanation of how the algorithm works
      * want incremental updates on merge-recursive rewrite
      * make merge-recursive code part of libgit.a ?
      * people are very happy about idea to not touch the working tree
    * make rebase --merge the default
      * use performance tests to see how well it compares
        (p3400-rebase.sh)
      * may later also reimplement the am-specific flags on top of
        interactive
    * make use of best merge bases in more places (e.g. git diff A...B
      uses a suboptimal one)
    * rebase --rebase-merges:
      * doing a five-way merge, rewriting xdiff to handle five instead of
        three file versions
      * M merges A & B
      * M' should look like a merge of A' and B', but really involves a
        five-way merge of A', A, M, B, B' -- and that is necessary in
        order to get evil merge represented
  * overview of "Big"
    * git-sizer (funny: GitLab asks users to run it and return results;
      GitHub runs it for user and shows them the results)
    * large blobs, partial clones
    * partial or hierarchical indexes
  * CI
    * Dscho has a lot of machinery built up around Azure Pipelines
    * PRs to github.com:gitgitgadget/git will automatically be built on
      Windows, MacOS X, and Linux
    * Interest in getting emails for failures that their topic branch
      caused (note: get topic author from tip commit author if not Junio)
    * This may be able to move to github.com:git/git after Dscho's
      patches merge down

Stuff that had been mentioned but we didn't get to: state-of-the-union,
commit-graph, evolve (we had the developer of the feature in mercurial
present, but not the folks who had worked on the feature in git), git
filter-repo, maybe a few others I'm forgetting.

^ permalink raw reply	[flat|nested] 19+ messages in thread
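[Editor's illustration] The NewHash note above ("if given sha1 is fourth
in list, then the corresponding sha256 will be fourth") describes a
positional mapping between two fixed-length hash lists. Below is a
minimal Python sketch of that idea only. It is not the packfile or index
format: DualHashTable and git_object_id() are hypothetical names, the
lists are kept in insertion order for simplicity, and hashing SHA-256
objects with the same "<type> <size>\0" header as SHA-1 objects is an
assumption made for the example.

```python
import hashlib


def git_object_id(algo, kind, data):
    """Hash '<type> <size>' + NUL + content with the named algorithm."""
    header = f"{kind} {len(data)}\0".encode()
    return hashlib.new(algo, header + data).hexdigest()


class DualHashTable:
    """Two parallel lists: entry i of each list names the same object."""

    def __init__(self, blobs):
        self.sha1_ids = [git_object_id("sha1", "blob", b) for b in blobs]
        self.sha256_ids = [git_object_id("sha256", "blob", b) for b in blobs]

    def to_sha256(self, sha1_hex):
        # Position in the SHA-1 list selects the SHA-256 entry.
        return self.sha256_ids[self.sha1_ids.index(sha1_hex)]

    def to_sha1(self, sha256_hex):
        # The same positional lookup works in the other direction.
        return self.sha1_ids[self.sha256_ids.index(sha256_hex)]


# Three made-up blobs; the first is the empty blob.
table = DualHashTable([b"", b"hello\n", b"world\n"])
empty_sha1 = table.sha1_ids[0]
print(empty_sha1)
print(table.to_sha256(empty_sha1))
```

Because every entry in a given list has the same length, a real on-disk
table could locate entry i by offset arithmetic alone; the linear
index() search here is just the simplest way to show the positional
correspondence.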
* Re: Contributor Summit Topics and Logistics
  2019-02-02 12:33   ` Jakub Narebski
  2019-02-04 19:30     ` Elijah Newren
@ 2019-04-23  3:45     ` Jeff King
  1 sibling, 0 replies; 19+ messages in thread
From: Jeff King @ 2019-04-23 3:45 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

On Sat, Feb 02, 2019 at 01:33:22PM +0100, Jakub Narebski wrote:

> I have noticed a little 'recording' indicator; will the recorded
> session (video or audio only) be made available at some point in time?
> Did anyone take minutes, or take notes (for example of the Summit
> agenda created at the start of the meeting -- when the audio was
> muted)? I would be very interested in your impressions.

I did record this. The resulting file is quite large, and full of
incoherent bits and blank spots (where we took a break and turned off
the mics but forgot to pause the recording). I had planned to try to cut
it down (at least roughly removing the useless spots), but here it is
April and I haven't managed to do so.

If anybody wants to volunteer to take a crack at it, let me know. The
video file is a few gigabytes. TBH, I've wondered if just distributing
the audio would be just as useful, since the camera is mostly a static
shot of people who aren't currently talking. ;)

-Peff

^ permalink raw reply	[flat|nested] 19+ messages in thread
end of thread, other threads:[~2019-04-23  3:45 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-22  7:50 Contributor Summit Topics and Logistics Jeff King
2019-01-22  8:26 ` Jeff King
2019-01-22  9:17 ` GSoC 2019 (was: Contributor Summit Topics and Logistics) Christian Couder
2019-01-31  2:02 ` SZEDER Gábor
2019-01-31  6:11 ` Christian Couder
2019-01-22 18:21 ` Contributor Summit Topics and Logistics Stefan Beller
2019-01-22 20:53 ` Jeff King
2019-01-22 18:23 ` Derrick Stolee
2019-01-24  8:57 ` Ævar Arnfjörð Bjarmason
2019-01-29 18:22 ` Derrick Stolee
2019-01-22 20:30 ` Elijah Newren
2019-01-30 20:57 ` Ævar Arnfjörð Bjarmason
2019-01-30 22:26 ` Jeff Hostetler
2019-01-30 22:51 ` Philip Oakley
2019-01-30 23:13 ` Christian Couder
2019-01-30 23:07 ` Jeff King
2019-02-02 12:33 ` Jakub Narebski
2019-02-04 19:30 ` Elijah Newren
2019-04-23  3:45 ` Jeff King