From: Samuel Wales <samologist@gmail.com>
To: git@vger.kernel.org
Subject: Re: is this data corruption?
Date: Fri, 30 Dec 2022 17:33:20 -0700
Message-ID: <CAJcAo8smwU2ddB96J+G2SOAP+FU2p4ejB2JX9+5QHdQEn43htQ@mail.gmail.com>
In-Reply-To: <CAJcAo8tjMLFisK5_13iD_JGo2xVQDJRX3wAC7wRD_V2GKFGevQ@mail.gmail.com>

p.p.s.  git 2.20 has the same problem.

On 12/30/22, Samuel Wales <samologist@gmail.com> wrote:
> i am not subscribed, but am of the impression that's ok.  please copy
> me directly.
>
>
> tldr: git diff is showing differences that do not exist in the files
> themselves.
>
> i have nothing staged, nothing fancy like stashing, etc.  this is a
> repo of mostly emacs org mode files.  mostly ascii text.
>
> git status and these commands show nothing unusual:
>
>     git fsck --strict --no-dangling
>     git gc --prune="0 days"
>
>
> the problem that seems like data corruption is that a few lines appear
> twice as - and once as +.  but in the current version of the files,
> those lines exist only once.  here are the lines.  there are 2 -
> versions and one + version:
>
> +***************** REF bigpart is a partition
> +biglike and homelike are distracting nonsense i think except
> +to describe inferior filesets.  anomalous subset of home
> +might be called homelike or so.
>
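One way to check the claim above is to count how often a given line shows up as - and as + in the saved diff, and how often it appears in the current file. The sketch below assumes a unified diff saved as text; the function names are hypothetical and not part of git. A count of 2 removed and 1 added would be what you'd expect if the old version held the line twice and the new version holds it once, but this only counts lines and does not diagnose the repo.

```python
def count_marked_lines(diff_text, needle):
    """Return (removed, added): how often `needle` appears as a - or + line."""
    removed = added = 0
    for line in diff_text.splitlines():
        # Skip the "--- a/file" and "+++ b/file" headers; they are not content.
        # Limitation: a removed content line that itself starts with "-- "
        # would look like a header here and be skipped too.
        if line.startswith(("--- ", "+++ ")):
            continue
        if line[:1] == "-" and line[1:] == needle:
            removed += 1
        elif line[:1] == "+" and line[1:] == needle:
            added += 1
    return removed, added


def count_in_file(file_text, needle):
    """Return how many lines of the current file are exactly `needle`."""
    return sum(1 for line in file_text.splitlines() if line == needle)
```

For example, pass in the text of the saved diff and the line "might be called homelike or so.", then compare the result with count_in_file on the working-tree copy of the file.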
>
> emacs magit shows the same problem.  however, it shows a slightly
> different diff.  i did a meta-diff on git diff vs. magit, and there
> are about 800 + real-content lines that magit shows but git diff does
> not.  i do not know what this means.  wc -l is like
>
>   62540 aaa.diff
>   62965 bbb--magit.txt
>
> idk why the two diffs would differ only in their + lines?
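
The meta-diff described above can be narrowed to just the + lines. This is a minimal sketch, assuming both outputs are unified diffs saved as text (for instance aaa.diff and bbb--magit.txt); the helper names are hypothetical. It treats the + lines as a multiset, so a line that one tool shows twice and the other shows once is still reported.

```python
from collections import Counter


def added_lines(diff_text):
    """Count each + content line, ignoring the "+++ b/file" headers."""
    return Counter(
        line[1:]
        for line in diff_text.splitlines()
        if line.startswith("+") and not line.startswith("+++ ")
    )


def only_in_second(first_diff, second_diff):
    """Return + lines that the second diff has more copies of than the first."""
    # Counter subtraction keeps only the positive differences.
    extra = added_lines(second_diff) - added_lines(first_diff)
    return sorted(extra.elements())
```

Calling only_in_second with the git diff text first and the magit text second lists the lines that only magit shows. It says nothing about why the two differ; different diff options or context settings are one possibility, not a confirmed cause.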
>
>
> in summary, what is wrong with my repo, if anything, and what can i do
> about it?  nothing on the web for git corruption seems to say much,
> other than pull from github or whatever.  this is my own repo, the
> original repo, so i cannot do that.  org annex has an uncorrupt tool
> of some kind, but it did not seem relevant.  i do have rsnapshot
> [basically rsync] backups of the repo and the most significant files
> and dirs, but i do not know what one does to use that to repair any
> issues.  i won't get into why, but changes were made over months.
>
> is there a protocol for this?
>
> would git fsck have balked?
>
> thank you!
>
>
> p.s.
>
> i have no reason to believe this is related, but git diff has
> intermingled emacs org mode entries.  but i don't have to talk about
> it in org terms; in generic text terms, it has intermingled parts of
> different paragraphs.  as a user, i'd prefer that completely unrelated
> paragraphs not be mingled, regardless of minimality.  if possible.
>
> with respect to the intermingling only, unless this is related to the
> possible corruption, i will presume the diff is correct, in that a
> patch from it would produce the same result as a patch that does not
> intermingle.  i believe this intermingling is because diff does not
> understand org, or paragraphs for that matter.  in org, an entry
> starts with "^[*]+ " and ends at the beginning of another entry or at
> eof.  they consist in my case mostly of ascii text paragraphs.  just
> as with paragraphs, if you move an entry, you don't expect it to be
> mingled with a different one in the diff.
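
To make the entry definition above concrete, here is a small sketch that splits text into entries using the "^[*]+ " pattern from the message. It is only an illustration of the segmentation: it is not how git diff works, and the function name is hypothetical. Anything before the first headline is kept as its own chunk.

```python
import re

# An entry starts at a line beginning with one or more stars and a space.
HEADLINE = re.compile(r"^[*]+ ", re.MULTILINE)


def split_entries(text):
    """Split text into chunks, each starting at a headline and running
    up to the next headline or the end of the text."""
    starts = [m.start() for m in HEADLINE.finditer(text)]
    if not starts:
        return [text] if text else []
    chunks = []
    if starts[0] > 0:
        chunks.append(text[:starts[0]])  # text before the first headline
    for begin, end in zip(starts, starts[1:] + [len(text)]):
        chunks.append(text[begin:end])
    return chunks
```

Joining the chunks gives back the original text unchanged, so a tool could compare whole entries between two versions before falling back to a line diff.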
>
> i have been told that this cannot be fixed by merely telling a
> slightly improved differ that stuff between stars is worth preserving,
> but that a parser, not merely a couple of regexps, is needed to reduce
> this intermingling.  i have also been told that difftastic uses
> tree-sitter, which might get such a syntax for emacs org mode.  and so
> maybe at some point git diff can use that.  idk.
>
> idk if any of this is related but i include it for completeness.
>
> also, please don't laugh, but i am using git version 2.11.0.  i will
> upgrade pending various library and os stuff but my main concern is
> not for git, but for possible corruption in the repo and what is
> possible to do, at least given rsnapshot, to fix it.
>


-- 
The Kafka Pandemic

A blog about science, health, human rights, and misopathy:
https://thekafkapandemic.blogspot.com
