From: Sangeeta NB <sangunb09@gmail.com>
To: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>,
	christian.couder@gmail.com, Git List <git@vger.kernel.org>
Subject: [Outreachy][Proposal] Accelerate rename detection and the range-diff
Date: Mon, 26 Oct 2020 13:19:50 +0530
Message-ID: <CAHjREB6Hh+urW3j2c9p45ZudSdDv0rUP28Lb4e4TZasqTzRmDA@mail.gmail.com>

Hey Everyone,

I would love to participate in Outreachy this year with Git, on the
project "Accelerate rename detection and the range-diff command in
Git". I have contributed to the microproject "Unify the meaning of
dirty between diff and describe"[1], which is still under review, and
through that process I have become familiar with the mailing list
and patch review system. I am also contributing to another issue[2],
about `git bisect` and `git rebase`, which is still under
discussion[3].

[1] https://lore.kernel.org/git/pull.751.git.1602781723670.gitgitgadget@gmail.com
[2] https://github.com/gitgitgadget/git/issues/486
[3] https://lore.kernel.org/git/pull.765.git.1603271344522.gitgitgadget@gmail.com/

Coming to the project, I have read more about it[4] and have created
an initial version of the timeline. I would really love to have
comments on it.

[4] https://github.com/gitgitgadget/git/issues/519

Also, there's a column for community-specific questions in the final
application. Is there anything specific that I should fill in there?

Please let me know if I missed anything.

Looking forward to working and learning with you all.

Thanks and Regards,
Sangeeta

=================================================

Link to docs: https://docs.google.com/document/d/15mgqy4id1fXZWE1NvBEERWvET9zy-ZEfhp4x0NNv_d4/edit?usp=sharing

=================================================

## Accelerate rename detection and the range-diff command in Git

# Timeline

## Nov 23 - Dec 1 (Before the internship officially starts)

* Getting to know the mentors.
* Bonding with the community.
* Understanding the structure of the code and familiarizing myself
with the requirements during the internship period.
* Create a concrete workflow for Outreachy tasks.


## Dec 1 - Dec 20

* Study various Approximate Nearest Neighbor (ANN) search algorithms.
* There are various comparisons of ANN algorithms, such as:
  * [ANN benchmarks](http://ann-benchmarks.com/)
  * [How to benchmark ANN algorithms](https://medium.com/gsi-technology/how-to-benchmark-ann-algorithms-a9f1cef6be08)

* Would compare the algorithms and narrow them down to the one or two
best suited to our use case.
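
For context, the exhaustive search that an ANN algorithm would try to
avoid can be sketched as below. This is only a toy baseline, not Git's
actual code: the Jaccard similarity over sets of lines, the function
names, the example paths, and the 0.5 threshold are all assumptions
made for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def exact_best_matches(sources, targets, threshold=0.5):
    """Exhaustively match each source to its most similar target.

    sources/targets map a (hypothetical) path to its file content.
    Every source is compared with every target, so the cost is
    O(len(sources) * len(targets)) similarity computations.
    """
    src_sets = {p: set(c.splitlines()) for p, c in sources.items()}
    dst_sets = {p: set(c.splitlines()) for p, c in targets.items()}
    matches = {}
    for src, s_lines in src_sets.items():
        best_path, best_score = None, threshold
        for dst, d_lines in dst_sets.items():
            score = jaccard(s_lines, d_lines)
            if score >= best_score:
                best_path, best_score = dst, score
        if best_path is not None:
            matches[src] = (best_path, best_score)
    return matches
```

The nested loop is what makes this quadratic; an ANN index would aim to
look at only a small set of likely targets per source.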

## Dec 11: Initial point of feedback

* Would take feedback from the mentors and would ask about the
expectations that the mentors and the community have of me.

## Dec 21 - Jan 05

* Would study how Locality Sensitive Hashing (data-independent) or
Locality Preserving Hashing (data-dependent) can improve our accuracy
(or even reduce the time complexity).
* Would study various hashing algorithms and combine them with our
nearest neighbor search algorithm.
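
To make the hashing idea concrete, here is a minimal sketch of one
well-known Locality Sensitive Hashing scheme, MinHash with banding. It
is an illustration under assumptions, not a proposed design: the
4-character shingles, the 8 bands of 4 rows, and all function names are
hypothetical choices.

```python
import hashlib
from collections import defaultdict


def shingles(text, k=4):
    """Return the set of k-character substrings of text (never empty)."""
    return {text[i:i + k] for i in range(max(1, len(text) - k + 1))}


def minhash_signature(items, num_hashes=32):
    """Keep the minimum of each seeded hash; similar sets share many minima."""
    signature = []
    for seed in range(num_hashes):
        prefix = seed.to_bytes(4, "big")
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(prefix + item.encode(), digest_size=8).digest(),
                "big",
            )
            for item in items
        ))
    return signature


def candidate_pairs(docs, bands=8, rows=4):
    """Bucket each signature band by band; docs sharing a bucket are candidates."""
    buckets = defaultdict(set)
    for name, text in docs.items():
        sig = minhash_signature(shingles(text), bands * rows)
        for b in range(bands):
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].add(name)
    pairs = set()
    for names in buckets.values():
        ordered = sorted(names)
        pairs.update(
            (x, y) for i, x in enumerate(ordered) for y in ordered[i + 1:]
        )
    return pairs
```

Only the candidate pairs would then need an exact similarity check,
which is where the saving over comparing every pair would come from.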

## Jan 06 - Jan 20
* Study if a pre-trained Support Vector Machine can add something to
our use case.
* Study how other projects (e.g. Gerrit) decide if two commits
are similar or not.
* SVMs have an accuracy disadvantage compared to nearest neighbor
algorithms. Therefore, I would look into whether we can create a
hybrid algorithm that uses both SVMs and nearest neighbor algorithms
to get better accuracy. There are also some research papers on this,
which I would study before finalizing the algorithm after discussion
with mentors and the community.

## Jan 12: Midpoint feedback
* Would take feedback from the mentors and would ask about ways in
which I can improve and areas where I am lagging.

## Jan 21 - Feb 15
* Implement the finalized algorithm.
* Benchmark its accuracy and complexity against existing methods.
* Use it for rename detection and for commit matching in `git range-diff`.
* Update the documentation accordingly.
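
One possible way to frame the benchmarking step is to treat the
existing exhaustive method as ground truth and measure how often a
faster method agrees with it, along with the time each takes. The
sketch below assumes both methods return a mapping from source to
matched target; the helper names `recall_at_1` and `timed` are
hypothetical.

```python
import time


def recall_at_1(exact, approx):
    """Fraction of the exact matches that the approximate method also found."""
    if not exact:
        return 1.0
    hits = sum(1 for key, target in exact.items() if approx.get(key) == target)
    return hits / len(exact)


def timed(func, *args):
    """Run func(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start
```

For example, if the exact method pairs two files and the approximate
one agrees on only one of them, `recall_at_1` reports 0.5.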


## Feb 16 - Mar 02 (Wrap up)
* Buffer period for incomplete work.
* Wrap up the code.
* Address the reviews and suggestions given by mentors.
* Write documentation for the code if required.
* Get my patches merged.


## Mar 02: Final feedback
* Would take the final feedback from the mentors and would ask about
ways in which I could have improved.
* Would talk about ways to stay connected even after the Outreachy period.


## Post-Outreachy
* I intend to keep contributing even after the Outreachy period ends.
* Would love to co-mentor (if possible) in the next Outreachy and GSoC rounds.
* Would love to review patches of other contributors and take part in
the mailing list discussions.


# Other Involvements
* Blogging is an important part of Outreachy, so I would love to
write a blog post every weekend or every fortnight, as discussed with
the mentors, summarizing the work done so far, anything I learned in
that period, and my experience.
* I would also be glad to help other contributors and users solve
their issues and help the maintainers in reviewing patches over the
Outreachy period and even after that.

