* [GSoC] [PATCH 0/1] draft proposal for reachability bitmap @ 2022-04-09 18:43 Shubham Mishra 2022-04-09 18:43 ` [PATCH 1/1] [GSoC] Initial draft proposal Shubham Mishra 2022-04-09 18:43 ` [PATCH 1/1] " Shubham Mishra 0 siblings, 2 replies; 4+ messages in thread From: Shubham Mishra @ 2022-04-09 18:43 UTC (permalink / raw) To: git; +Cc: me, kaartic.sivaraam, Shubham Mishra Hi Community, I am sharing draft for my proposal on reachability bitmap. I am looking to get your reviews on proposal on every aspect that includes planning, grammar, or anything I might missed to mention. Thanks, looking forward to get some comments and improve the initial version. google doc link - https://docs.google.com/document/d/1Zei9g3mYa0SAz6Z94cIx_1tgFeID6xIL5Qb3Un_itVQ Thanks, Shubham Shubham Mishra (1): Initial draft proposal proposal.txt | 132 +++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 132 insertions(+) create mode 100644 proposal.txt base-commit: bf23fe5c37d62f37267d31d4afa1a1444f70cdac -- 2.25.1 ^ permalink raw reply [flat|nested] 4+ messages in thread
* [PATCH 1/1] [GSoC] Initial draft proposal 2022-04-09 18:43 [GSoC] [PATCH 0/1] draft proposal for reachability bitmap Shubham Mishra @ 2022-04-09 18:43 ` Shubham Mishra 2022-04-09 18:43 ` [PATCH 1/1] " Shubham Mishra 1 sibling, 0 replies; 4+ messages in thread From: Shubham Mishra @ 2022-04-09 18:43 UTC (permalink / raw) To: git; +Cc: me, kaartic.sivaraam, Shubham Mishra Initial draft proposal for reachability bitmap Signed-off-by: Shubham Mishra <shivam828787@gmail.com> --- proposal.txt | 132 +++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 132 insertions(+) create mode 100644 proposal.txt diff --git a/proposal.txt b/proposal.txt new file mode 100644 index 0000000000..0888ba8b38 --- /dev/null +++ b/proposal.txt @@ -0,0 +1,132 @@ +# GIT + +# Reachability bitmap performance improvement + +Name - Shubham Mishra + +Email - [shivam828787@gmail.com](mailto:shivam828787@gmail.com) + +IRC Nick - shubham828 + +Location - Ghaziabad, Uttar Pradesh , India , UTC+5:30 + +Proposal Title - Reachability bitmap performance improvement. + +* * * + +About Me + +I am Shubham, I am currently working as a Software Engineer at Microsoft India. I am a 2021 graduate from Delhi University. I am passionate about core engineering and backend technologies. I love to see beyond all abstractions and how things really work under the hood. So, I can work from their roots and make things better. I feel engineering is all about the tradeoffs that we make and I am trying to learn them to become a better Engineer. + +I am passionate about open source technologies and have quite a good amount of contribution to them, I participated in GSoC 2020 with [KDE](https://summerofcode.withgoogle.com/archive/2020/projects/6473982317953024), did Internship with Linux Foundation - [HDV](https://wiki.lfnetworking.org/display/LN/HDV), Season of KDE - [report](https://community.kde.org/SoK/2020/StatusReport/Shubham), and I am doing voluntary contributions to VSperf, BoostC++ and some other open-source projects. + +Motivation for Proposal + +I have been using Git for the last 3 years now, and I always find myself curious about how it manages a lot of files seamlessly to make developer collaboration so smooth. + +I was listening to one of Derrick Stolee's podcasts, where I felt git contributors really do cool stuff, So I also wanted to be one. By becoming a regular contributor of git, I can give my contribution back to it as well as I can show off it to my friends :) . I love to study advanced data structures and Algorithms that's the reason I chose a bitmap related project. I will get a chance to learn different compression algorithms and analyze their performance. + +Project Abstract + +During repository clones, the Git server needs to find out all the objects which clients do not have and need to be sent to the client. + +To make the process faster, Git uses bitmaps to quickly find all the related objects from an object. Bitmap approach is a performance optimization over the legacy "Counting Objects" - the process in which the git server used to iterate through the graph from branch tips to the beginning of history to list down all objects that need to be sent. + +bitmap made reachability faster but uncompressed bitmaps can cost a lot of extra storage. Git uses a C ported version of [EWAHBoolArray](https://github.com/lemire/EWAHBoolArray) to compress bitmaps which get stored in the ".bitmap" file with the same prefix "sha" as ".pack" and ".idx". + +The aim of the project is to design a performance test suite as well as do the necessary changes to improve bitmap performance by trying out a new compression scheme that can make read operations along with other common operations like intersect, union and negate faster. + +Me & Git: + +## Microproject: + +I worked on the microproject "Avoid pipes in git related commands in test scripts", the patches for it has been merged to master now + +* [https://public-inbox.org/git/20220224054720.23996-3-shivam828787@gmail.com](https://public-inbox.org/git/20220224054720.23996-3-shivam828787@gmail.com/) +* [https://public-inbox.org/git/20220224054720.23996-3-shivam828787@gmail.co/](https://public-inbox.org/git/20220224054720.23996-3-shivam828787@gmail.com/) + + +I run a pattern matching grep to find all git commands on LHS of pipes and fix all of them from file t001-t050. + +As an outcome of this process, I got to learn the code review process at git work, which is quite cool and different from other organization's I contributed to before. + +I learned about building source code, running tests, using email to send patches, communicating with reviewers and sending the next patch version process. + +## Current understanding: + +* I have gone through git internals, and I well understood about the pack files as well as the difference between git objects (tree, blob, commit). +* I have gone through some documentations - "MyFirstObjectWalk", etc. it was a good hands-on to get some glimpse of general object related tasks. +* I understand how bitmap works in general, I have got some idea how EWAH compression works and also I have gone through the research paper on roaring run. +* I played with commands of pack-object - "git pack-objects dir --progress < obj_lists.txt" and read the code of related files "pack-bitmap.c" and parts of "pack-object.c" +* I checked the general documentation of [Croaring](https://github.com/RoaringBitmap/CRoaring) as one of the potential alternatives to EWAH. + + +Execution plan: + +I am interested in keeping my primary focus on "building a performance suite and improving bitmaps performance by finding a better compression scheme" project and if I finish this early or even after the GSoC timeline, I will be happy to contribute to other tasks too. + +From the idea page, I got some sense that decompressing a bitmap for reading bits or doing operations like intersection, negation and union makes them slow and can be improved. + +Roaring + Run was the suggested alternative to explore. It divides the data into chunks of 2^16, which allows you to check for the presence of any one value faster. As a result, Roaring can compute many operations much faster than run-length-encoded formats like WAH, EWAH, Concise. After getting a high level understanding of algorithms, I explored a bit of the [Croaring](https://github.com/RoaringBitmap/CRoaring) library which is a C implementation of roaring bitmap. It provides a lot of useful functions to do all general operations (find cardinality, and, or, copy, equals). Which I think we will be using in the "pack-bitmap.c" and "pack-bitmap-write.c" files as a replacement of ewah/bitmap functions. I do not have enough knowledge yet to figure out how compatible croaring is with the current .bitmap format. We might need to make changes in the current .bitmap format accordingly. + +Steps I will be following to accomplish the task- + +1\. Get a better understanding of bitmap related functionalities/ codebase, EWAH, roaring+run techniques. + +2\. try to build out an initial draft version implementing only minimal required core changes, I will try to get a review on it from a wider audience (including mentors). + +3\. make changes according to the comment and repeat the review process. + +4\. build a performance suite for the changes I have done in the above steps. + +5\. If performance improves, I will be writing the rest of the required code changes to use the Croaring including perf tests for them. + +6\. Until this time, I will also get a good understanding of the bitmap related projects, so if we will be able to make good progress on roaring+run. I can start picking other subprojects too like 'table of contents' for the .bitmap file where past work - [https://lore.kernel.org/git/YNuiM8TR5evSeNsN@nand.local/](https://lore.kernel.org/git/YNuiM8TR5evSeNsN@nand.local/) can act as a good reference to me or/and 'append-only bitmap generation' subproject. + +I feel for any mentoring program, Communication is the key to success. I will be utilizing the community bonding periods to figure out a process for being in regular touch with mentors, which is essential to make sure I am going in the right direction. + +Timeline + +I am available to dedicate around 30-35 hours every week to the project. + +Community bonding periods - + +1. Exploring code base mostly related to bitmaps +2. research on other bitmap compression techniques +3. reading technical documents +4. interacting with mentors to understand them and project in more detail. + + +12 June - 25 July: + +1. Writing the first version with the new compression technique +2. Get the initial version reviewed by reviewers and make changes accordingly. +3. write performance test for these changes + + +25 July - 4 sept + +1. if tests result well, extending the above functionality to completely move to a new technique. +2. writing more performance tests. +3. start picking up other tasks if time left + + +Sept 5 - Sept 12 + +1. I will make sure to get all changes merged before this week including tests +2. if not, make a decision with mentior on extending the project + + +Other proposals + +No, This is the only proposal I am making for GSoC 2022. + +Blog + +I want to make blog writing a habit so I planned to publish biweekly blogs at [https://shubham828.github.io/](https://shubham828.github.io/) during this GSoC and after that. This is something I started during my last GSoC too but unfortunately couldn't continue post-gsoc. This GSoC gives me another opportunity to become a regular blogger. + +After GSoC + +I would love to be a regular contributor of Git. After GSoC, I can pick any left out subprojects of bitmap reachability and I would also be happy to extend my knowledge beyond bitmaps to learn and contribute to other parts of git too. + +EndFragment \ No newline at end of file -- 2.25.1 ^ permalink raw reply related [flat|nested] 4+ messages in thread
* [PATCH 1/1] Initial draft proposal 2022-04-09 18:43 [GSoC] [PATCH 0/1] draft proposal for reachability bitmap Shubham Mishra 2022-04-09 18:43 ` [PATCH 1/1] [GSoC] Initial draft proposal Shubham Mishra @ 2022-04-09 18:43 ` Shubham Mishra 2022-04-13 20:44 ` Taylor Blau 1 sibling, 1 reply; 4+ messages in thread From: Shubham Mishra @ 2022-04-09 18:43 UTC (permalink / raw) To: git; +Cc: me, kaartic.sivaraam, Shubham Mishra Initial draft proposal for reachability bitmap Signed-off-by: Shubham Mishra <shivam828787@gmail.com> --- proposal.txt | 132 +++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 132 insertions(+) create mode 100644 proposal.txt diff --git a/proposal.txt b/proposal.txt new file mode 100644 index 0000000000..0888ba8b38 --- /dev/null +++ b/proposal.txt @@ -0,0 +1,132 @@ +# GIT + +# Reachability bitmap performance improvement + +Name - Shubham Mishra + +Email - [shivam828787@gmail.com](mailto:shivam828787@gmail.com) + +IRC Nick - shubham828 + +Location - Ghaziabad, Uttar Pradesh , India , UTC+5:30 + +Proposal Title - Reachability bitmap performance improvement. + +* * * + +About Me + +I am Shubham, I am currently working as a Software Engineer at Microsoft India. I am a 2021 graduate from Delhi University. I am passionate about core engineering and backend technologies. I love to see beyond all abstractions and how things really work under the hood. So, I can work from their roots and make things better. I feel engineering is all about the tradeoffs that we make and I am trying to learn them to become a better Engineer. + +I am passionate about open source technologies and have quite a good amount of contribution to them, I participated in GSoC 2020 with [KDE](https://summerofcode.withgoogle.com/archive/2020/projects/6473982317953024), did Internship with Linux Foundation - [HDV](https://wiki.lfnetworking.org/display/LN/HDV), Season of KDE - [report](https://community.kde.org/SoK/2020/StatusReport/Shubham), and I am doing voluntary contributions to VSperf, BoostC++ and some other open-source projects. + +Motivation for Proposal + +I have been using Git for the last 3 years now, and I always find myself curious about how it manages a lot of files seamlessly to make developer collaboration so smooth. + +I was listening to one of Derrick Stolee's podcasts, where I felt git contributors really do cool stuff, So I also wanted to be one. By becoming a regular contributor of git, I can give my contribution back to it as well as I can show off it to my friends :) . I love to study advanced data structures and Algorithms that's the reason I chose a bitmap related project. I will get a chance to learn different compression algorithms and analyze their performance. + +Project Abstract + +During repository clones, the Git server needs to find out all the objects which clients do not have and need to be sent to the client. + +To make the process faster, Git uses bitmaps to quickly find all the related objects from an object. Bitmap approach is a performance optimization over the legacy "Counting Objects" - the process in which the git server used to iterate through the graph from branch tips to the beginning of history to list down all objects that need to be sent. + +bitmap made reachability faster but uncompressed bitmaps can cost a lot of extra storage. Git uses a C ported version of [EWAHBoolArray](https://github.com/lemire/EWAHBoolArray) to compress bitmaps which get stored in the ".bitmap" file with the same prefix "sha" as ".pack" and ".idx". + +The aim of the project is to design a performance test suite as well as do the necessary changes to improve bitmap performance by trying out a new compression scheme that can make read operations along with other common operations like intersect, union and negate faster. + +Me & Git: + +## Microproject: + +I worked on the microproject "Avoid pipes in git related commands in test scripts", the patches for it has been merged to master now + +* [https://public-inbox.org/git/20220224054720.23996-3-shivam828787@gmail.com](https://public-inbox.org/git/20220224054720.23996-3-shivam828787@gmail.com/) +* [https://public-inbox.org/git/20220224054720.23996-3-shivam828787@gmail.co/](https://public-inbox.org/git/20220224054720.23996-3-shivam828787@gmail.com/) + + +I run a pattern matching grep to find all git commands on LHS of pipes and fix all of them from file t001-t050. + +As an outcome of this process, I got to learn the code review process at git work, which is quite cool and different from other organization's I contributed to before. + +I learned about building source code, running tests, using email to send patches, communicating with reviewers and sending the next patch version process. + +## Current understanding: + +* I have gone through git internals, and I well understood about the pack files as well as the difference between git objects (tree, blob, commit). +* I have gone through some documentations - "MyFirstObjectWalk", etc. it was a good hands-on to get some glimpse of general object related tasks. +* I understand how bitmap works in general, I have got some idea how EWAH compression works and also I have gone through the research paper on roaring run. +* I played with commands of pack-object - "git pack-objects dir --progress < obj_lists.txt" and read the code of related files "pack-bitmap.c" and parts of "pack-object.c" +* I checked the general documentation of [Croaring](https://github.com/RoaringBitmap/CRoaring) as one of the potential alternatives to EWAH. + + +Execution plan: + +I am interested in keeping my primary focus on "building a performance suite and improving bitmaps performance by finding a better compression scheme" project and if I finish this early or even after the GSoC timeline, I will be happy to contribute to other tasks too. + +From the idea page, I got some sense that decompressing a bitmap for reading bits or doing operations like intersection, negation and union makes them slow and can be improved. + +Roaring + Run was the suggested alternative to explore. It divides the data into chunks of 2^16, which allows you to check for the presence of any one value faster. As a result, Roaring can compute many operations much faster than run-length-encoded formats like WAH, EWAH, Concise. After getting a high level understanding of algorithms, I explored a bit of the [Croaring](https://github.com/RoaringBitmap/CRoaring) library which is a C implementation of roaring bitmap. It provides a lot of useful functions to do all general operations (find cardinality, and, or, copy, equals). Which I think we will be using in the "pack-bitmap.c" and "pack-bitmap-write.c" files as a replacement of ewah/bitmap functions. I do not have enough knowledge yet to figure out how compatible croaring is with the current .bitmap format. We might need to make changes in the current .bitmap format accordingly. + +Steps I will be following to accomplish the task- + +1\. Get a better understanding of bitmap related functionalities/ codebase, EWAH, roaring+run techniques. + +2\. try to build out an initial draft version implementing only minimal required core changes, I will try to get a review on it from a wider audience (including mentors). + +3\. make changes according to the comment and repeat the review process. + +4\. build a performance suite for the changes I have done in the above steps. + +5\. If performance improves, I will be writing the rest of the required code changes to use the Croaring including perf tests for them. + +6\. Until this time, I will also get a good understanding of the bitmap related projects, so if we will be able to make good progress on roaring+run. I can start picking other subprojects too like 'table of contents' for the .bitmap file where past work - [https://lore.kernel.org/git/YNuiM8TR5evSeNsN@nand.local/](https://lore.kernel.org/git/YNuiM8TR5evSeNsN@nand.local/) can act as a good reference to me or/and 'append-only bitmap generation' subproject. + +I feel for any mentoring program, Communication is the key to success. I will be utilizing the community bonding periods to figure out a process for being in regular touch with mentors, which is essential to make sure I am going in the right direction. + +Timeline + +I am available to dedicate around 30-35 hours every week to the project. + +Community bonding periods - + +1. Exploring code base mostly related to bitmaps +2. research on other bitmap compression techniques +3. reading technical documents +4. interacting with mentors to understand them and project in more detail. + + +12 June - 25 July: + +1. Writing the first version with the new compression technique +2. Get the initial version reviewed by reviewers and make changes accordingly. +3. write performance test for these changes + + +25 July - 4 sept + +1. if tests result well, extending the above functionality to completely move to a new technique. +2. writing more performance tests. +3. start picking up other tasks if time left + + +Sept 5 - Sept 12 + +1. I will make sure to get all changes merged before this week including tests +2. if not, make a decision with mentior on extending the project + + +Other proposals + +No, This is the only proposal I am making for GSoC 2022. + +Blog + +I want to make blog writing a habit so I planned to publish biweekly blogs at [https://shubham828.github.io/](https://shubham828.github.io/) during this GSoC and after that. This is something I started during my last GSoC too but unfortunately couldn't continue post-gsoc. This GSoC gives me another opportunity to become a regular blogger. + +After GSoC + +I would love to be a regular contributor of Git. After GSoC, I can pick any left out subprojects of bitmap reachability and I would also be happy to extend my knowledge beyond bitmaps to learn and contribute to other parts of git too. + +EndFragment \ No newline at end of file -- 2.25.1 ^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH 1/1] Initial draft proposal 2022-04-09 18:43 ` [PATCH 1/1] " Shubham Mishra @ 2022-04-13 20:44 ` Taylor Blau 0 siblings, 0 replies; 4+ messages in thread From: Taylor Blau @ 2022-04-13 20:44 UTC (permalink / raw) To: Shubham Mishra; +Cc: git, kaartic.sivaraam Hi Shubham, On Sun, Apr 10, 2022 at 12:13:50AM +0530, Shubham Mishra wrote: > Initial draft proposal for reachability bitmap Thanks so much for your interest in this project and the time you've spent putting together this proposal. Sorry that it has taken me a few days to get back to you. Let's take a look... (I should mention that this shouldn't be reviewed to be picked up by Junio, so it would be fine to just send the contents of "proposal.txt" as an email to the list directly instead of a patch. But it's fine to review it either way, just stating this in case other readers aren't sure). > About Me > > I am Shubham, I am currently working as a Software Engineer at > Microsoft India. I am a 2021 graduate from Delhi University. I am > passionate about core engineering and backend technologies. I love to > see beyond all abstractions and how things really work under the hood. > So, I can work from their roots and make things better. I feel > engineering is all about the tradeoffs that we make and I am trying to > learn them to become a better Engineer. > > I am passionate about open source technologies and have quite a good > amount of contribution to them, I participated in GSoC 2020 with > [KDE](https://summerofcode.withgoogle.com/archive/2020/projects/6473982317953024), > did Internship with Linux Foundation - > [HDV](https://wiki.lfnetworking.org/display/LN/HDV), Season of KDE - > [report](https://community.kde.org/SoK/2020/StatusReport/Shubham), and > I am doing voluntary contributions to VSperf, BoostC++ and some other > open-source projects. Great, that all sounds like terrific experience! Typically GSoC only allows accepted contributors to participate once, but an exception has been made for participants in the 2020 and 2021 cycles, so you would be eligible here under those rules. https://developers.google.com/open-source/gsoc/faq#what_are_the_eligibility_requirements_for_participation > Project Abstract > > During repository clones, the Git server needs to find out all the > objects which clients do not have and need to be sent to the client. > > To make the process faster, Git uses bitmaps to quickly find all the > related objects from an object. Bitmap approach is a performance > optimization over the legacy "Counting Objects" - the process in which > the git server used to iterate through the graph from branch tips to > the beginning of history to list down all objects that need to be > sent. > > bitmap made reachability faster but uncompressed bitmaps can cost a > lot of extra storage. Git uses a C ported version of > [EWAHBoolArray](https://github.com/lemire/EWAHBoolArray) to compress > bitmaps which get stored in the ".bitmap" file with the same prefix > "sha" as ".pack" and ".idx". Right; there are more than 8.5M objects in my (very outdated copy of linux.git), so it would be impractical to store an 8.5M-bit array for each selected commit. > The aim of the project is to design a performance test suite as well > as do the necessary changes to improve bitmap performance by trying > out a new compression scheme that can make read operations along with > other common operations like intersect, union and negate faster. Well stated. Like I mentioned in another similar proposal, I think validating our assumption that the performance of bitmap decompression is worth pursuing is a good first step. We may find that using a different compression scheme: - makes it faster to read bitmaps (by taking less time to decompress them) - makes it faster to write them (by taking less time to compress the bit-arrays) - use less memory for one, the other, or both of the above , alternatively, we may find that switching EWAH for something else doesn't make a meaningful dent in any of those areas, in which case we can find something else to work on, too ;-). > Me & Git: > > ## Microproject: > > I worked on the microproject "Avoid pipes in git related commands in > test scripts", the patches for it has been merged to master now > > * https://public-inbox.org/git/20220224054720.23996-3-shivam828787@gmail.com/ > * https://public-inbox.org/git/20220224054720.23996-3-shivam828787@gmail.com/ > > I run a pattern matching grep to find all git commands on LHS of pipes > and fix all of them from file t001-t050. > > As an outcome of this process, I got to learn the code review process > at git work, which is quite cool and different from other > organization's I contributed to before. > > I learned about building source code, running tests, using email to > send patches, communicating with reviewers and sending the next patch > version process. Terrific; I was glad to see that you were able to work on modernizing those test scripts by removing git invocations on the left-hand side of a pipe. Hopefully that gave you some good hands-on experience with our development process, which it sounds like it did. > ## Current understanding: > > - I have gone through git internals, and I well understood about the > pack files as well as the difference between git objects (tree, blob, > commit). > > - I have gone through some documentations - "MyFirstObjectWalk", etc. > it was a good hands-on to get some glimpse of general object related > tasks. > > - I understand how bitmap works in general, I have got some idea how > EWAH compression works and also I have gone through the research paper > on roaring run. > > - I played with commands of pack-object - "git pack-objects dir > --progress < obj_lists.txt" and read the code of related files > "pack-bitmap.c" and parts of "pack-object.c" Great; if you are looking for more areas to dig in, I'd recommend reading: - pack-bitmap-writer.c - midx.c (the bitmap-related parts) - and spending extra time with pack-objects.c (as you mention you already have, but this is the best place to look for how pack-objects uses bitmaps to accelerate its work). There are other parts of the code that use bitmaps, too, for some more straightforward tasks. See for e.g., the parts of builtin/rev-list.c that match a search for "bitmap". > Execution plan: > > I am interested in keeping my primary focus on "building a performance > suite and improving bitmaps performance by finding a better > compression scheme" project and if I finish this early or even after > the GSoC timeline, I will be happy to contribute to other tasks too. I think that's a good strategy. Like I mentioned in the other proposal, too, the goal behind having a few different sub-projects to pick from is that it gives you some additional flexibility to choose what is most interesting to you. It definitely isn't meant to suggest that we should do all of the sub-projects within, since I think any one of them done right could easily fill up the entire GSoC contribution period. > [...] I do not have enough knowledge yet to figure out how > compatible croaring is with the current .bitmap format. We might need > to make changes in the current .bitmap format accordingly. Good summary in the parts I elided with the [...] in the quoted section above. I think that we will likely want to introduce a new bitmap format. We *could* use the existing format, and indicate to older readers that we are using a newer format by incrementing the version identifier and otherwise leaving the format unchanged. But I would like to get away from the existing .bitmap format, since the way we read the hash-cache extension makes it difficult to add things like a variable-width table of contents. So it would be nice to make the .bitmap format compatible with the new-ish chunk-format API that Stolee worked on a year or two ago. Of course, this will be a not-insignificant project in and of itself. It also carries some notable downside that JGit won't be able to read the newer format of bitmaps. We could consider expanding the GSoC project to help JGit add support for that, but I don't think it's strictly necessary for you to take that on, since JGit should gracefully ignore the unknown version number. > Steps I will be following to accomplish the task- > > 1\. Get a better understanding of bitmap related functionalities/ > codebase, EWAH, roaring+run techniques. > > 2\. try to build out an initial draft version implementing only > minimal required core changes, I will try to get a review on it from a > wider audience (including mentors). > > 3\. make changes according to the comment and repeat the review > process. > > 4\. build a performance suite for the changes I have done in the above > steps. It may be worth making this occur after the first step in your proposal, since it removes bias from when you design the performance tests and keeps us honest about whether or not our changes are worth making. (Please don't worry about the possibility of replacing EWAH compression not helping bitmap performance; if it doesn't, there is no shortage of other things that we can work on in this area ;-)). > 6\. Until this time, I will also get a good understanding of the > bitmap related projects, so if we will be able to make good progress > on roaring+run. I can start picking other subprojects too like 'table > of contents' for the .bitmap file where past work - > [https://lore.kernel.org/git/YNuiM8TR5evSeNsN@nand.local/](https://lore.kernel.org/git/YNuiM8TR5evSeNsN@nand.local/) > can act as a good reference to me or/and 'append-only bitmap > generation' subproject. I think it makes sense to pick one or the other, since both will be fairly involved. Like I said earlier, I think we'll have to investigate a new .bitmap format for the "table of contents" project, so we may want to dedicate some time to sketching out what that would look like. > Timeline > > I am available to dedicate around 30-35 hours every week to the project. That seems reasonable to me. > Community bonding periods - > > 1. Exploring code base mostly related to bitmaps > 2. research on other bitmap compression techniques > 3. reading technical documents > 4. interacting with mentors to understand them and project in more detail. > > 12 June - 25 July: > > 1. Writing the first version with the new compression technique > 2. Get the initial version reviewed by reviewers and make changes accordingly. > 3. write performance test for these changes > > 25 July - 4 sept > > 1. if tests result well, extending the above functionality to completely move to a new technique. > 2. writing more performance tests. > 3. start picking up other tasks if time left > > Sept 5 - Sept 12 > > 1. I will make sure to get all changes merged before this week including tests > 2. if not, make a decision with mentior on extending the project This looks reasonable-ish to me, though I would suggest a couple of changes: - Dedicate some time to investigating whether or not a new .bitmap format is required up front, adjusting the rest of the schedule accordingly if it is. - Move the writing the performance tests up much earlier, since I think it makes sense to start there, rather than to wait until the end to start evaluating our changes (and it removes some degree of bias, like I mentioned earlier). I'm not too attached to the particular dates, TBH, and much more interested in the general pace and priority of things. > Blog > > I want to make blog writing a habit so I planned to publish biweekly > blogs at [https://shubham828.github.io/](https://shubham828.github.io/) > during this GSoC and after that. This is something I started during my > last GSoC too but unfortunately couldn't continue post-gsoc. This GSoC > gives me another opportunity to become a regular blogger. Wonderful, I'll look forward to reading your blog posts in this area if we end up working on it. > After GSoC > > I would love to be a regular contributor of Git. After GSoC, I can > pick any left out subprojects of bitmap reachability and I would also > be happy to extend my knowledge beyond bitmaps to learn and contribute > to other parts of git too. That's great! Thanks, Taylor ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2022-04-13 20:45 UTC | newest] Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-04-09 18:43 [GSoC] [PATCH 0/1] draft proposal for reachability bitmap Shubham Mishra 2022-04-09 18:43 ` [PATCH 1/1] [GSoC] Initial draft proposal Shubham Mishra 2022-04-09 18:43 ` [PATCH 1/1] " Shubham Mishra 2022-04-13 20:44 ` Taylor Blau
Code repositories for project(s) associated with this public inbox https://80x24.org/mirrors/git.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).