ruby-core@ruby-lang.org archive (unofficial mirror)
* [ruby-core:94402] [Ruby master Bug#16108] gsub gives wrong results with regex backreferencing and triple backslash
From: vivian.unger @ 2019-08-17 18:30 UTC
  To: ruby-core

Issue #16108 has been reported by VivianUnger (Vivian Unger).

----------------------------------------
Bug #16108: gsub gives wrong results with regex backreferencing and triple backslash
https://bugs.ruby-lang.org/issues/16108

* Author: VivianUnger (Vivian Unger)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
* ruby -v: ruby 2.6.3p62 (2019-04-16 revision 67580) [x64-mingw32]
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN
----------------------------------------
I have written a script to convert LaTeX indexing files (.idx) to Macrex backup format (.mbk), so that I can import LaTeX-embedded indexes into the Macrex indexing program. A problem arises when I try to convert bold text. LaTeX marks bold text with the tag \textbf{[bold text]}, while Macrex wraps it in a pair of single backslashes: \[bold text]\.

In my test case, the input string is "\indexentry{\textbf{bold}|hyperpage}{2}", which I need to convert into "\indexentry{\bold\|hyperpage}{2}". For this I am using:

record.gsub(/\\textbf\{([^\}]+)\}/, '\\\1\\')

But instead of the expected output, I get:

\indexentry{\1\|hyperpage}{2}

...as if I only had \\ rather than \\\.

I have tried the same Regex in a search-and-replace in Notepad++ and it works as expected. It's only in Ruby that I get this unexpected result.

The kludgey workaround I have found is to leave a space before the \\:

record.gsub(/\\textbf\{([^\}]+)\}/, '\\ \1\\')

...giving the result:

\indexentry{\ bold\|hyperpage}{2}

But this won't do. Macrex complains and the extra space has to be edited out. Imagine if you have hundreds of lines with bold text in them!



-- 
https://bugs.ruby-lang.org/


* [ruby-core:94403] [Ruby master Bug#16108] gsub gives wrong results with regex backreferencing and triple backslash
From: vivian.unger @ 2019-08-17 18:38 UTC
  To: ruby-core

Issue #16108 has been updated by VivianUnger (Vivian Unger).


I have written a script to convert LaTeX indexing files (.idx) to Macrex backup format (.mbk), so that I can import LaTeX-embedded indexes into the Macrex indexing program. A problem arises when I try to convert bold text. LaTeX indicates bold text with the tag \textbf{[bold text]}, while Macrex wraps it in single backslashes: \[bold text]\.

In my test case, the input string is:

```
\indexentry{\textbf{bold}|hyperpage}{2}
```

I need to convert this into:

```
\indexentry{\bold\|hyperpage}{2}
```

For this I am using the following code:

``` ruby
record.gsub(/\\textbf\{([^\}]+)\}/, '\\\1\\')
```

But instead of the expected output, I get:

```
\indexentry{\1\|hyperpage}{2}
```

...as if I only had 2 backslashes rather than three.

I have tried using the same Regex in a search-and-replace in Notepad++ and it works as expected. It's only in Ruby that I get this unexpected result.

The kludgey workaround I have found is to leave a space before the two backslashes:

``` ruby
record.gsub(/\\textbf\{([^\}]+)\}/, '\\ \1\\')
```

...giving the result:

```
\indexentry{\ bold\|hyperpage}{2}
```

But this won't do. Macrex complains and the extra space has to be edited out. Imagine if you have hundreds of lines with bold text in them!

----------------------------------------
Bug #16108: gsub gives wrong results with regex backreferencing and triple backslash
https://bugs.ruby-lang.org/issues/16108#change-80823


* [ruby-core:94404] [Ruby master Bug#16108] gsub gives wrong results with regex backreferencing and triple backslash
From: vivian.unger @ 2019-08-17 18:41 UTC
  To: ruby-core

Issue #16108 has been updated by VivianUnger (Vivian Unger).


----------------------------------------
Bug #16108: gsub gives wrong results with regex backreferencing and triple backslash
https://bugs.ruby-lang.org/issues/16108#change-80824


* [ruby-core:94405] [Ruby master Bug#16108] gsub gives wrong results with regex backreferencing and triple backslash
From: XrXr @ 2019-08-17 20:00 UTC
  To: ruby-core

Issue #16108 has been updated by alanwu (Alan Wu).


The source of your problem seems to be the behavior below:
```ruby
p ' \1 '.bytes # => [32, 92, 49, 32]
p ' \\ '.bytes # => [32, 92, 32]
p ' \ '.bytes  # => [32, 92, 32]
```
As you can see, two backslashes in a single-quoted string literal give only one backslash in the resulting string.
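To make this first layer concrete, here is a small sketch (the variable names are only illustrative): in a single-quoted Ruby literal the only escape sequences are `\\` and `\'`; any other backslash, as in `'\n'`, is kept as-is, whereas a double-quoted literal turns `\n` into a newline.

```ruby
# Single-quoted literals recognise only two escapes: \\ and \'
single_backslash = '\\'  # one backslash (byte 92)
single_quote     = '\''  # one apostrophe (byte 39)
kept_as_is       = '\n'  # backslash followed by "n" (92, 110), not a newline
# Double-quoted literals interpret many more escapes
newline          = "\n"  # a real newline (byte 10)

p single_backslash.bytes # => [92]
p single_quote.bytes     # => [39]
p kept_as_is.bytes       # => [92, 110]
p newline.bytes          # => [10]
```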

This is further complicated by gsub interpreting the content of its second argument as a replacement directive, which means the backslashes are interpreted a second time. You want the final replacement to be "one backslash, followed by the first match group, then another backslash", or, spelled out naively, `\\1\` (`[92, 92, 49, 92]`). The replacement directive that expresses this is `\\\1\\` (`[92, 92, 92, 49, 92, 92]`), as we need to escape the first and last backslashes by doubling them. We don't want to double the backslash right before "1", as we are not looking for a literal backslash there.
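The second layer can be checked on its own by building the strings from byte values, so that no string-literal escaping is involved at all. This is a sketch using the input and pattern from the report; `bs`, `naive` and `directive` are illustrative names:

```ruby
input   = '\indexentry{\textbf{bold}|hyperpage}{2}'
pattern = /\\textbf\{([^\}]+)\}/
bs      = 92.chr # a single backslash, built without any string-literal escaping

naive     = bs + bs + '1' + bs           # characters \\1\   (92, 92, 49, 92)
directive = bs + bs + bs + '1' + bs + bs # characters \\\1\\ (92, 92, 92, 49, 92, 92)

# gsub reads the leading \\ as a literal backslash, so \1 never forms
puts input.gsub(pattern, naive)     # prints \indexentry{\1\|hyperpage}{2}
# escaping the outer backslashes leaves \1 free to act as a backreference
puts input.gsub(pattern, directive) # prints \indexentry{\bold\|hyperpage}{2}
```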

Now we need to construct a Ruby string literal we can put in the source code that would give us the replacement directive we want, which we could do by doubling all the backslashes:

```ruby
p '\\\\\\1\\\\'.bytes # => [92, 92, 92, 49, 92, 92]
```

We could drop one of the backslashes before the "1", since the single-quoted literal `'\1'` already gives `[92, 49]`:
```ruby
p '\\\\\1\\\\'.bytes # => [92, 92, 92, 49, 92, 92]
```
We could also drop two of the backslashes after the "1", as `gsub` interprets a lone backslash at the very end of the directive as a literal backslash.
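Two rules of gsub's directive parsing are at work in this thread: a backslash at the very end of the directive is copied literally, and a backslash followed by a character that is not a directive (such as a space) is copied together with that character. In the sketch below, the first case shows the shortened literal working, and the second reproduces the output the original poster reported for the space workaround:

```ruby
input   = '\indexentry{\textbf{bold}|hyperpage}{2}'
pattern = /\\textbf\{([^\}]+)\}/

# Trailing lone backslash is copied literally, so it need not be doubled
short = '\\\\\1\\' # characters \\\1\ (92, 92, 92, 49, 92)
puts input.gsub(pattern, short) # prints \indexentry{\bold\|hyperpage}{2}

# "\ " is not a gsub directive, so both the backslash and the space are kept
workaround = '\\ \1\\' # characters \ \1\ (92, 32, 92, 49, 92)
puts input.gsub(pattern, workaround) # prints \indexentry{\ bold\|hyperpage}{2}
```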

This is too many backslashes for my taste, so I would prefer the block form. It substitutes the block's return value for the match verbatim. The special `$1` variable is set within the gsub block, and we can use it to build the replacement we want:

```ruby
input.gsub(pattern) { ["\\", $1, "\\"].join }
```
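A slightly expanded sketch of the block form, using the same input and pattern as the report: string interpolation reads a little more easily than `join`, `Regexp.last_match(1)` is an equivalent of `$1`, and the last variant shows that whatever the block returns is inserted without any backslash processing.

```ruby
input   = '\indexentry{\textbf{bold}|hyperpage}{2}'
pattern = /\\textbf\{([^\}]+)\}/

# $1 is set for each match inside the block
via_global = input.gsub(pattern) { "\\#{$1}\\" }
# Regexp.last_match(1) reads the same capture without the special variable
via_match = input.gsub(pattern) { "\\#{Regexp.last_match(1)}\\" }

# The block's return value is used verbatim: '\1' stays a backslash and a "1"
verbatim = input.gsub(pattern) { '\1' }

puts via_global # prints \indexentry{\bold\|hyperpage}{2}
puts verbatim   # prints \indexentry{\1|hyperpage}{2}
```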
---
Here is a test program for you:

```ruby
input = '\indexentry{\textbf{bold}|hyperpage}{2}'
pattern = /\\textbf\{([^\}]+)\}/

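# Prints the gsub result next to the replacement string and its raw bytes
test = ->(replacement) {
  puts "result: #{input.gsub(pattern, replacement)}, replacement: #{replacement} #{replacement.bytes}"
}
test.call('\\\1\\')       # original: gives \1\ and no backreference
test.call('\\ \1\\')      # workaround: gives \ bold\ with a stray space
test.call('\\\\\\1\\\\')  # every backslash doubled: gives \bold\ as intended
test.call('\\\\\\1\\')    # trailing backslash left single: also \bold\ as intended
test.call('\\\\\1\\')     # backslash before 1 left single too: also \bold\ as intended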

$stdout.write "alternative: "
puts input.gsub(pattern) { ["\\", $1, "\\"].join }
```


----------------------------------------
Bug #16108: gsub gives wrong results with regex backreferencing and triple backslash
https://bugs.ruby-lang.org/issues/16108#change-80825


* [ruby-core:94420] [Ruby master Bug#16108] gsub gives wrong results with regex backreferencing and triple backslash
From: shyouhei @ 2019-08-19  1:17 UTC
  To: ruby-core

Issue #16108 has been updated by shyouhei (Shyouhei Urabe).

Status changed from Open to Rejected

This behaviour is by design.  A backslash is processed twice: first by the Ruby interpreter when it reads the string literal (to handle `\'` etc.), then again by gsub's own preprocessor for the replacement string (to handle `\1` etc.).  You have to understand exactly what each layer does to work with it.

Don't hesitate to resort to the alternative solution shown in @alanwu's comment.
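Another way to see the two layers separately, assuming Ruby 2.3 or later for the `<<~` syntax: a heredoc whose delimiter is in single quotes is copied without any escape processing, so only gsub's layer remains and the directive can be written exactly as it would be typed into an editor such as Notepad++. A minimal sketch; the `REPL` delimiter name is arbitrary:

```ruby
input   = '\indexentry{\textbf{bold}|hyperpage}{2}'
pattern = /\\textbf\{([^\}]+)\}/

# Literal layer skipped: a <<~'...' heredoc keeps every backslash as written
replacement = <<~'REPL'.chomp
  \\\1\\
REPL

p replacement.bytes                   # => [92, 92, 92, 49, 92, 92]
puts input.gsub(pattern, replacement) # prints \indexentry{\bold\|hyperpage}{2}
```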

----------------------------------------
Bug #16108: gsub gives wrong results with regex backreferencing and triple backslash
https://bugs.ruby-lang.org/issues/16108#change-80841

* Author: VivianUnger (Vivian Unger)
* Status: Rejected
* Priority: Normal
* Assignee: 
* Target version: 
* ruby -v: ruby 2.6.3p62 (2019-04-16 revision 67580) [x64-mingw32]
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN
