git@vger.kernel.org list mirror (unofficial, one of many)
* [Outreachy][Proposal] Accelerate rename detection and the range-diff
From: Sangeeta NB @ 2020-10-26  7:49 UTC (permalink / raw)
  To: Kaartic Sivaraam, christian.couder, Git List

Hey Everyone,

I would love to participate in Outreachy this year with Git in the
project "Accelerate rename detection and the range-diff command in
Git". I have contributed to the microproject "Unify the meaning of
dirty between diff and describe"[1], which is still under review, and
through the process I have become familiar with the mailing list
and patch review system. I am also contributing to another issue[2]
about `git bisect` and `git rebase`, which is still under
discussion[3].

[1] https://lore.kernel.org/git/pull.751.git.1602781723670.gitgitgadget@gmail.com
[2] https://github.com/gitgitgadget/git/issues/486
[3] https://lore.kernel.org/git/pull.765.git.1603271344522.gitgitgadget@gmail.com/

Coming to the project, I have read more about it[4] and have created
an initial version of the timeline. I would really appreciate
comments on it.

[4] https://github.com/gitgitgadget/git/issues/519

Also, there's a field for community-specific questions in the final
application. Is there anything specific that I should fill in there?

Please let me know if I missed anything.

Looking forward to working and learning with you all.

Thanks and Regards,
Sangeeta

=================================================

Link to docs: https://docs.google.com/document/d/15mgqy4id1fXZWE1NvBEERWvET9zy-ZEfhp4x0NNv_d4/edit?usp=sharing

=================================================

## Accelerate rename detection and the range-diff command in Git

# Timeline

## Nov 23 - Dec 1 (Before the internship officially starts)

* Getting to know the mentors.
* Bonding with the community.
* Understanding the structure of the code and familiarizing myself
with the requirements during the internship period.
* Create a concrete workflow for Outreachy tasks.


## Dec 1 - Dec 20

* Study various Approximate Nearest Neighbor (ANN) search algorithms.
* There are various comparisons of ANN algorithms, such as:
  * [ANN benchmarks](http://ann-benchmarks.com/)
  * [How to benchmark ANN algorithms](https://medium.com/gsi-technology/how-to-benchmark-ann-algorithms-a9f1cef6be08)
* Would compare the algorithms and narrow them down to the one or two
best suited to our use case.
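
As a point of reference for the comparison above, an exact (brute-force) nearest neighbor search is the baseline that any approximate algorithm would be measured against. The following is only a minimal sketch: it assumes files are represented as sets of lines and scored with Jaccard similarity, which is an illustrative choice and not how Git's rename detection computes similarity. The function names and sample data are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def exact_nearest(query, candidates):
    """Return (name, score) of the candidate most similar to query.

    Every candidate is compared against the query, so matching n deleted
    files against m added files costs O(n * m) comparisons.
    """
    best_name, best_score = None, -1.0
    for name, lines in candidates.items():
        score = jaccard(query, lines)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


# Hypothetical data: one deleted file and two added files.
deleted = {"int main(void)", "{", "return 0;", "}"}
added = {
    "new_main.c": {"int main(void)", "{", "return 1;", "}"},
    "README": {"Build with make."},
}
print(exact_nearest(deleted, added))
```

The quadratic cost of this baseline is what an approximate search would try to avoid, at the price of occasionally missing the true best match.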

## Dec 11: Initial point of feedback

* Would take feedback from the mentors and ask about the
expectations that the mentors and the community have of me.

## Dec 21 - Jan 05

* Would study how Locality Sensitive Hashing (data-independent) or
Locality Preserving Hashing (data-dependent) can improve our accuracy
(or even complexity).
* Would study various hashing algorithms and combine them with our
nearest neighbor search algorithm.
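
To make the idea of data-independent Locality Sensitive Hashing concrete, here is a minimal sketch of one well-known scheme, MinHash with banding. It is not the algorithm the project would necessarily settle on. It assumes each file is a non-empty set of lines; the helper names, the choice of `blake2b` as the hash, and the parameters (32 hashes, 8 bands) are all illustrative assumptions.

```python
import hashlib
from collections import defaultdict


def _hash(seed, item):
    """Deterministic 64-bit hash of item, varied by seed."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(items, num_hashes=32):
    """MinHash signature: the minimum hash of a non-empty set under each seed."""
    return [min(_hash(seed, item) for item in items) for seed in range(num_hashes)]


def lsh_candidates(signatures, bands=8):
    """Return pairs of names whose signatures agree on at least one whole band."""
    rows = len(next(iter(signatures.values()))) // bands
    buckets = defaultdict(set)
    for name, sig in signatures.items():
        for b in range(bands):
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            buckets[key].add(name)
    pairs = set()
    for names in buckets.values():
        ordered = sorted(names)
        for i, first in enumerate(ordered):
            for second in ordered[i + 1:]:
                pairs.add((first, second))
    return pairs


# Hypothetical files: two with identical content, one unrelated.
files = {
    "old/util.c": {"#include <stdio.h>", "int add(int a, int b)", "return a + b;"},
    "new/util.c": {"#include <stdio.h>", "int add(int a, int b)", "return a + b;"},
    "README": {"Build with make.", "Run the tests."},
}
signatures = {name: minhash_signature(lines) for name, lines in files.items()}
print(lsh_candidates(signatures))
```

The probability that two sets agree at one signature position equals their Jaccard similarity, so similar files are likely to share a band and land in the same bucket. Only those candidate pairs would then need a full comparison.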

## Jan 06 - Jan 20
* Study whether a pre-trained Support Vector Machine (SVM) can add
something to our use case.
* Study how different organizations (e.g. Gerrit) decide whether two
commits are similar.
* SVMs have an accuracy disadvantage compared to nearest neighbor
algorithms. Therefore, I would look into whether we can create a
hybrid algorithm that uses both SVMs and nearest neighbor algorithms
to get better accuracy. There are also some research papers on this
topic. I would study them and finalize the algorithm after discussion
with the mentors and the community.

## Jan 12: Midpoint feedback
* Would take feedback from the mentors and ask about ways
I can improve and places where I am lagging.

## Jan 21 - Feb 15
* Implement the finalized algorithm.
* Benchmark its accuracy and complexity against existing methods.
* Use it for rename detection and for commit matching in `git range-diff`.
* Update the documentation accordingly.
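
One way the accuracy benchmark could be framed is to treat the existing exact method's matches as ground truth and measure how many of them the new algorithm recovers. The following is a minimal sketch under that assumption; the function name and the sample rename pairs are hypothetical and are not taken from Git's test suite.

```python
def pair_metrics(exact_pairs, approx_pairs):
    """Return (recall, precision) of approximate matches against exact ones."""
    hits = len(exact_pairs & approx_pairs)
    recall = hits / len(exact_pairs) if exact_pairs else 1.0
    precision = hits / len(approx_pairs) if approx_pairs else 1.0
    return recall, precision


# Hypothetical rename pairs (old path, new path) reported by each method.
exact = {("a.c", "lib/a.c"), ("b.c", "lib/b.c"), ("c.c", "lib/c.c")}
approx = {("a.c", "lib/a.c"), ("b.c", "lib/b.c"), ("d.c", "lib/c.c")}
recall, precision = pair_metrics(exact, approx)
print(f"recall={recall:.2f} precision={precision:.2f}")
```

Recall shows how many true renames were missed, while precision shows how many reported renames were wrong; both would need to be weighed against the run time saved.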


## Feb 16 - Mar 02 (Wrap up)
* Buffer period for incomplete work.
* Wrap up the code.
* Address the reviews and suggestions given by mentors.
* Write documentation for the code if required.
* Get my patches merged.


## Mar 02: Final feedback
* Would take the final feedback from the mentors and ask about
ways I could have improved.
* Would talk about ways to stay connected even after the Outreachy period.


## Post-Outreachy
* I intend to keep contributing even after the Outreachy period ends.
* Would love to co-mentor (if possible) in the next Outreachy and GSoC rounds.
* Would love to review patches of other contributors and take part in
the mailing list discussions.


# Other Involvements
* Blogging is an important part of Outreachy, so I would love
to write a blog post every weekend or every fortnight, as agreed with
the mentors, summarizing the work done so far, anything I
learned in that period, and my experience.
* I would also be glad to help other contributors and users solve
their issues and to help the maintainers review patches during the
Outreachy period and even after that.
