ruby-core@ruby-lang.org archive (unofficial mirror)
 help / color / mirror / Atom feed
* [ruby-core:105660] [Ruby master Feature#18254] Add an `offset` parameter to String#unpack and String#unpack1
@ 2021-10-18  8:04 byroot (Jean Boussier)
  2021-10-18  8:24 ` [ruby-core:105661] " znz (Kazuhiro NISHIYAMA)
                   ` (11 more replies)
  0 siblings, 12 replies; 13+ messages in thread
From: byroot (Jean Boussier) @ 2021-10-18  8:04 UTC (permalink / raw)
  To: ruby-core

Issue #18254 has been reported by byroot (Jean Boussier).

----------------------------------------
Feature #18254: Add an `offset` parameter to String#unpack and String#unpack1
https://bugs.ruby-lang.org/issues/18254

* Author: byroot (Jean Boussier)
* Status: Open
* Priority: Normal
----------------------------------------
When working with binary protocols it's common to have to first unpack some kind of header or type prefix, and then based on that unpack another part of the string.

For instance here's [a code snippet from Dalli, the most common Memcached client](https://github.com/petergoldstein/dalli/blob/76b79d78cda13562da17bc99f92edcedf1873994/lib/dalli/protocol/binary.rb#L156-L184):

```ruby
while buf.bytesize - pos >= 24
  header = buf.slice(pos, 24)
  (key_length, _, body_length, cas) = header.unpack(KV_HEADER)

  if key_length == 0
    # all done!
    @multi_buffer = nil
    @position = nil
    @inprogress = false
    break

  elsif buf.bytesize - pos >= 24 + body_length
    flags = buf.slice(pos + 24, 4).unpack1("N")
    key = buf.slice(pos + 24 + 4, key_length)
    value = buf.slice(pos + 24 + 4 + key_length, body_length - key_length - 4) if body_length - key_length - 4 > 0

    pos = pos + 24 + body_length

    begin
      values[key] = [deserialize(value, flags), cas]
    rescue DalliError
    end

  else
    # not enough data yet, wait for more
    break
  end
end
@position = pos
```

### Proposal

If `unpack` and `unpack1` had an `offset:` parameter, it would allow this kind of code to extract the fields it needs without allocating and copying as much strings, e.g.:

```ruby
flags = buf.slice(pos + 24, 4).unpack1("N")
```

could be:

```ruby
buf.unpack1("N", offset: pos + 24)
```




-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [ruby-core:105661] [Ruby master Feature#18254] Add an `offset` parameter to String#unpack and String#unpack1
  2021-10-18  8:04 [ruby-core:105660] [Ruby master Feature#18254] Add an `offset` parameter to String#unpack and String#unpack1 byroot (Jean Boussier)
@ 2021-10-18  8:24 ` znz (Kazuhiro NISHIYAMA)
  2021-10-18  8:32 ` [ruby-core:105662] " byroot (Jean Boussier)
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: znz (Kazuhiro NISHIYAMA) @ 2021-10-18  8:24 UTC (permalink / raw)
  To: ruby-core

Issue #18254 has been updated by znz (Kazuhiro NISHIYAMA).


You can use `unpack1("@#{pos + 24}N")`.

----------------------------------------
Feature #18254: Add an `offset` parameter to String#unpack and String#unpack1
https://bugs.ruby-lang.org/issues/18254#change-94160

* Author: byroot (Jean Boussier)
* Status: Open
* Priority: Normal
----------------------------------------
When working with binary protocols it's common to have to first unpack some kind of header or type prefix, and then based on that unpack another part of the string.

For instance here's [a code snippet from Dalli, the most common Memcached client](https://github.com/petergoldstein/dalli/blob/76b79d78cda13562da17bc99f92edcedf1873994/lib/dalli/protocol/binary.rb#L156-L184):

```ruby
while buf.bytesize - pos >= 24
  header = buf.slice(pos, 24)
  (key_length, _, body_length, cas) = header.unpack(KV_HEADER)

  if key_length == 0
    # all done!
    @multi_buffer = nil
    @position = nil
    @inprogress = false
    break

  elsif buf.bytesize - pos >= 24 + body_length
    flags = buf.slice(pos + 24, 4).unpack1("N")
    key = buf.slice(pos + 24 + 4, key_length)
    value = buf.slice(pos + 24 + 4 + key_length, body_length - key_length - 4) if body_length - key_length - 4 > 0

    pos = pos + 24 + body_length

    begin
      values[key] = [deserialize(value, flags), cas]
    rescue DalliError
    end

  else
    # not enough data yet, wait for more
    break
  end
end
@position = pos
```

### Proposal

If `unpack` and `unpack1` had an `offset:` parameter, it would allow this kind of code to extract the fields it needs without allocating and copying as much strings, e.g.:

```ruby
flags = buf.slice(pos + 24, 4).unpack1("N")
```

could be:

```ruby
buf.unpack1("N", offset: pos + 24)
```




-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [ruby-core:105662] [Ruby master Feature#18254] Add an `offset` parameter to String#unpack and String#unpack1
  2021-10-18  8:04 [ruby-core:105660] [Ruby master Feature#18254] Add an `offset` parameter to String#unpack and String#unpack1 byroot (Jean Boussier)
  2021-10-18  8:24 ` [ruby-core:105661] " znz (Kazuhiro NISHIYAMA)
@ 2021-10-18  8:32 ` byroot (Jean Boussier)
  2021-10-18 14:26 ` [ruby-core:105663] " byroot (Jean Boussier)
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: byroot (Jean Boussier) @ 2021-10-18  8:32 UTC (permalink / raw)
  To: ruby-core

Issue #18254 has been updated by byroot (Jean Boussier).


Ah, I didn't know about it, but then you just allocated a string and converted an integer to string, so it's even slower than the `slice` pattern:

```ruby
# frozen_string_literal: true
require 'benchmark/ips'

STRING = Random.bytes(200)
POS = 12
Benchmark.ips do |x|
  x.report("no-offset") { STRING.unpack1("N") }
  x.report("slice-offset") { STRING.slice(POS, 4).unpack1("N")}
  x.report("unpack-offset") { STRING.unpack1("@#{POS}N") }
  x.compare!
end
```

```
# Ruby 2.7.2
Warming up --------------------------------------
           no-offset     1.016M i/100ms
        slice-offset   532.173k i/100ms
       unpack-offset   321.805k i/100ms
Calculating -------------------------------------
           no-offset     10.090M (± 1.2%) i/s -     50.782M in   5.033549s
        slice-offset      5.318M (± 2.1%) i/s -     26.609M in   5.005346s
       unpack-offset      3.205M (± 1.8%) i/s -     16.090M in   5.021922s

Comparison:
           no-offset: 10090269.9 i/s
        slice-offset:  5318453.9 i/s - 1.90x  (± 0.00) slower
       unpack-offset:  3205017.9 i/s - 3.15x  (± 0.00) slower
```

Based on this, an `offset` parameter could make the current code almost 2x more efficient.

----------------------------------------
Feature #18254: Add an `offset` parameter to String#unpack and String#unpack1
https://bugs.ruby-lang.org/issues/18254#change-94161

* Author: byroot (Jean Boussier)
* Status: Open
* Priority: Normal
----------------------------------------
When working with binary protocols it's common to have to first unpack some kind of header or type prefix, and then based on that unpack another part of the string.

For instance here's [a code snippet from Dalli, the most common Memcached client](https://github.com/petergoldstein/dalli/blob/76b79d78cda13562da17bc99f92edcedf1873994/lib/dalli/protocol/binary.rb#L156-L184):

```ruby
while buf.bytesize - pos >= 24
  header = buf.slice(pos, 24)
  (key_length, _, body_length, cas) = header.unpack(KV_HEADER)

  if key_length == 0
    # all done!
    @multi_buffer = nil
    @position = nil
    @inprogress = false
    break

  elsif buf.bytesize - pos >= 24 + body_length
    flags = buf.slice(pos + 24, 4).unpack1("N")
    key = buf.slice(pos + 24 + 4, key_length)
    value = buf.slice(pos + 24 + 4 + key_length, body_length - key_length - 4) if body_length - key_length - 4 > 0

    pos = pos + 24 + body_length

    begin
      values[key] = [deserialize(value, flags), cas]
    rescue DalliError
    end

  else
    # not enough data yet, wait for more
    break
  end
end
@position = pos
```

### Proposal

If `unpack` and `unpack1` had an `offset:` parameter, it would allow this kind of code to extract the fields it needs without allocating and copying as much strings, e.g.:

```ruby
flags = buf.slice(pos + 24, 4).unpack1("N")
```

could be:

```ruby
buf.unpack1("N", offset: pos + 24)
```




-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [ruby-core:105663] [Ruby master Feature#18254] Add an `offset` parameter to String#unpack and String#unpack1
  2021-10-18  8:04 [ruby-core:105660] [Ruby master Feature#18254] Add an `offset` parameter to String#unpack and String#unpack1 byroot (Jean Boussier)
  2021-10-18  8:24 ` [ruby-core:105661] " znz (Kazuhiro NISHIYAMA)
  2021-10-18  8:32 ` [ruby-core:105662] " byroot (Jean Boussier)
@ 2021-10-18 14:26 ` byroot (Jean Boussier)
  2021-10-25  2:08 ` [ruby-core:105766] " matz (Yukihiro Matsumoto)
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: byroot (Jean Boussier) @ 2021-10-18 14:26 UTC (permalink / raw)
  To: ruby-core

Issue #18254 has been updated by byroot (Jean Boussier).


I submitted a pull request for it, https://github.com/ruby/ruby/pull/4984.

----------------------------------------
Feature #18254: Add an `offset` parameter to String#unpack and String#unpack1
https://bugs.ruby-lang.org/issues/18254#change-94162

* Author: byroot (Jean Boussier)
* Status: Open
* Priority: Normal
----------------------------------------
When working with binary protocols it's common to have to first unpack some kind of header or type prefix, and then based on that unpack another part of the string.

For instance here's [a code snippet from Dalli, the most common Memcached client](https://github.com/petergoldstein/dalli/blob/76b79d78cda13562da17bc99f92edcedf1873994/lib/dalli/protocol/binary.rb#L156-L184):

```ruby
while buf.bytesize - pos >= 24
  header = buf.slice(pos, 24)
  (key_length, _, body_length, cas) = header.unpack(KV_HEADER)

  if key_length == 0
    # all done!
    @multi_buffer = nil
    @position = nil
    @inprogress = false
    break

  elsif buf.bytesize - pos >= 24 + body_length
    flags = buf.slice(pos + 24, 4).unpack1("N")
    key = buf.slice(pos + 24 + 4, key_length)
    value = buf.slice(pos + 24 + 4 + key_length, body_length - key_length - 4) if body_length - key_length - 4 > 0

    pos = pos + 24 + body_length

    begin
      values[key] = [deserialize(value, flags), cas]
    rescue DalliError
    end

  else
    # not enough data yet, wait for more
    break
  end
end
@position = pos
```

### Proposal

If `unpack` and `unpack1` had an `offset:` parameter, it would allow this kind of code to extract the fields it needs without allocating and copying as much strings, e.g.:

```ruby
flags = buf.slice(pos + 24, 4).unpack1("N")
```

could be:

```ruby
buf.unpack1("N", offset: pos + 24)
```




-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [ruby-core:105766] [Ruby master Feature#18254] Add an `offset` parameter to String#unpack and String#unpack1
  2021-10-18  8:04 [ruby-core:105660] [Ruby master Feature#18254] Add an `offset` parameter to String#unpack and String#unpack1 byroot (Jean Boussier)
                   ` (2 preceding siblings ...)
  2021-10-18 14:26 ` [ruby-core:105663] " byroot (Jean Boussier)
@ 2021-10-25  2:08 ` matz (Yukihiro Matsumoto)
  2021-10-25  2:36 ` [ruby-core:105769] " mame (Yusuke Endoh)
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: matz (Yukihiro Matsumoto) @ 2021-10-25  2:08 UTC (permalink / raw)
  To: ruby-core

Issue #18254 has been updated by matz (Yukihiro Matsumoto).


Sounds reasonable. Accepted.

Matz.


----------------------------------------
Feature #18254: Add an `offset` parameter to String#unpack and String#unpack1
https://bugs.ruby-lang.org/issues/18254#change-94280

* Author: byroot (Jean Boussier)
* Status: Open
* Priority: Normal
----------------------------------------
When working with binary protocols it's common to have to first unpack some kind of header or type prefix, and then based on that unpack another part of the string.

For instance here's [a code snippet from Dalli, the most common Memcached client](https://github.com/petergoldstein/dalli/blob/76b79d78cda13562da17bc99f92edcedf1873994/lib/dalli/protocol/binary.rb#L156-L184):

```ruby
while buf.bytesize - pos >= 24
  header = buf.slice(pos, 24)
  (key_length, _, body_length, cas) = header.unpack(KV_HEADER)

  if key_length == 0
    # all done!
    @multi_buffer = nil
    @position = nil
    @inprogress = false
    break

  elsif buf.bytesize - pos >= 24 + body_length
    flags = buf.slice(pos + 24, 4).unpack1("N")
    key = buf.slice(pos + 24 + 4, key_length)
    value = buf.slice(pos + 24 + 4 + key_length, body_length - key_length - 4) if body_length - key_length - 4 > 0

    pos = pos + 24 + body_length

    begin
      values[key] = [deserialize(value, flags), cas]
    rescue DalliError
    end

  else
    # not enough data yet, wait for more
    break
  end
end
@position = pos
```

### Proposal

If `unpack` and `unpack1` had an `offset:` parameter, it would allow this kind of code to extract the fields it needs without allocating and copying as much strings, e.g.:

```ruby
flags = buf.slice(pos + 24, 4).unpack1("N")
```

could be:

```ruby
buf.unpack1("N", offset: pos + 24)
```




-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [ruby-core:105769] [Ruby master Feature#18254] Add an `offset` parameter to String#unpack and String#unpack1
  2021-10-18  8:04 [ruby-core:105660] [Ruby master Feature#18254] Add an `offset` parameter to String#unpack and String#unpack1 byroot (Jean Boussier)
                   ` (3 preceding siblings ...)
  2021-10-25  2:08 ` [ruby-core:105766] " matz (Yukihiro Matsumoto)
@ 2021-10-25  2:36 ` mame (Yusuke Endoh)
  2021-10-25  5:39 ` [ruby-core:105774] " nobu (Nobuyoshi Nakada)
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: mame (Yusuke Endoh) @ 2021-10-25  2:36 UTC (permalink / raw)
  To: ruby-core

Issue #18254 has been updated by mame (Yusuke Endoh).


Just a confirmation: the offset is byte-oriented, not character-oriented, right? There are a format "u" which is UTF-8 coding, so the behavior should be explained clearly in the document.

----------------------------------------
Feature #18254: Add an `offset` parameter to String#unpack and String#unpack1
https://bugs.ruby-lang.org/issues/18254#change-94283

* Author: byroot (Jean Boussier)
* Status: Open
* Priority: Normal
----------------------------------------
When working with binary protocols it's common to have to first unpack some kind of header or type prefix, and then based on that unpack another part of the string.

For instance here's [a code snippet from Dalli, the most common Memcached client](https://github.com/petergoldstein/dalli/blob/76b79d78cda13562da17bc99f92edcedf1873994/lib/dalli/protocol/binary.rb#L156-L184):

```ruby
while buf.bytesize - pos >= 24
  header = buf.slice(pos, 24)
  (key_length, _, body_length, cas) = header.unpack(KV_HEADER)

  if key_length == 0
    # all done!
    @multi_buffer = nil
    @position = nil
    @inprogress = false
    break

  elsif buf.bytesize - pos >= 24 + body_length
    flags = buf.slice(pos + 24, 4).unpack1("N")
    key = buf.slice(pos + 24 + 4, key_length)
    value = buf.slice(pos + 24 + 4 + key_length, body_length - key_length - 4) if body_length - key_length - 4 > 0

    pos = pos + 24 + body_length

    begin
      values[key] = [deserialize(value, flags), cas]
    rescue DalliError
    end

  else
    # not enough data yet, wait for more
    break
  end
end
@position = pos
```

### Proposal

If `unpack` and `unpack1` had an `offset:` parameter, it would allow this kind of code to extract the fields it needs without allocating and copying as much strings, e.g.:

```ruby
flags = buf.slice(pos + 24, 4).unpack1("N")
```

could be:

```ruby
buf.unpack1("N", offset: pos + 24)
```




-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [ruby-core:105774] [Ruby master Feature#18254] Add an `offset` parameter to String#unpack and String#unpack1
  2021-10-18  8:04 [ruby-core:105660] [Ruby master Feature#18254] Add an `offset` parameter to String#unpack and String#unpack1 byroot (Jean Boussier)
                   ` (4 preceding siblings ...)
  2021-10-25  2:36 ` [ruby-core:105769] " mame (Yusuke Endoh)
@ 2021-10-25  5:39 ` nobu (Nobuyoshi Nakada)
  2021-10-25  6:56 ` [ruby-core:105775] " byroot (Jean Boussier)
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: nobu (Nobuyoshi Nakada) @ 2021-10-25  5:39 UTC (permalink / raw)
  To: ruby-core

Issue #18254 has been updated by nobu (Nobuyoshi Nakada).


As the RDoc of `String#unpack` states:
```ruby
  # Decodes <i>str</i> (which may contain binary data) according to the
  # format string, returning an array of each value extracted. The
```
Isn't it clear that it is counted as binary?

----------------------------------------
Feature #18254: Add an `offset` parameter to String#unpack and String#unpack1
https://bugs.ruby-lang.org/issues/18254#change-94288

* Author: byroot (Jean Boussier)
* Status: Open
* Priority: Normal
----------------------------------------
When working with binary protocols it's common to have to first unpack some kind of header or type prefix, and then based on that unpack another part of the string.

For instance here's [a code snippet from Dalli, the most common Memcached client](https://github.com/petergoldstein/dalli/blob/76b79d78cda13562da17bc99f92edcedf1873994/lib/dalli/protocol/binary.rb#L156-L184):

```ruby
while buf.bytesize - pos >= 24
  header = buf.slice(pos, 24)
  (key_length, _, body_length, cas) = header.unpack(KV_HEADER)

  if key_length == 0
    # all done!
    @multi_buffer = nil
    @position = nil
    @inprogress = false
    break

  elsif buf.bytesize - pos >= 24 + body_length
    flags = buf.slice(pos + 24, 4).unpack1("N")
    key = buf.slice(pos + 24 + 4, key_length)
    value = buf.slice(pos + 24 + 4 + key_length, body_length - key_length - 4) if body_length - key_length - 4 > 0

    pos = pos + 24 + body_length

    begin
      values[key] = [deserialize(value, flags), cas]
    rescue DalliError
    end

  else
    # not enough data yet, wait for more
    break
  end
end
@position = pos
```

### Proposal

If `unpack` and `unpack1` had an `offset:` parameter, it would allow this kind of code to extract the fields it needs without allocating and copying as much strings, e.g.:

```ruby
flags = buf.slice(pos + 24, 4).unpack1("N")
```

could be:

```ruby
buf.unpack1("N", offset: pos + 24)
```




-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [ruby-core:105775] [Ruby master Feature#18254] Add an `offset` parameter to String#unpack and String#unpack1
  2021-10-18  8:04 [ruby-core:105660] [Ruby master Feature#18254] Add an `offset` parameter to String#unpack and String#unpack1 byroot (Jean Boussier)
                   ` (5 preceding siblings ...)
  2021-10-25  5:39 ` [ruby-core:105774] " nobu (Nobuyoshi Nakada)
@ 2021-10-25  6:56 ` byroot (Jean Boussier)
  2021-10-25  9:26 ` [ruby-core:105783] " duerst
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: byroot (Jean Boussier) @ 2021-10-25  6:56 UTC (permalink / raw)
  To: ruby-core

Issue #18254 has been updated by byroot (Jean Boussier).


> Just a confirmation: the offset is byte-oriented, not character-oriented, right? There

Yes. 

----------------------------------------
Feature #18254: Add an `offset` parameter to String#unpack and String#unpack1
https://bugs.ruby-lang.org/issues/18254#change-94289

* Author: byroot (Jean Boussier)
* Status: Open
* Priority: Normal
----------------------------------------
When working with binary protocols it's common to have to first unpack some kind of header or type prefix, and then based on that unpack another part of the string.

For instance here's [a code snippet from Dalli, the most common Memcached client](https://github.com/petergoldstein/dalli/blob/76b79d78cda13562da17bc99f92edcedf1873994/lib/dalli/protocol/binary.rb#L156-L184):

```ruby
while buf.bytesize - pos >= 24
  header = buf.slice(pos, 24)
  (key_length, _, body_length, cas) = header.unpack(KV_HEADER)

  if key_length == 0
    # all done!
    @multi_buffer = nil
    @position = nil
    @inprogress = false
    break

  elsif buf.bytesize - pos >= 24 + body_length
    flags = buf.slice(pos + 24, 4).unpack1("N")
    key = buf.slice(pos + 24 + 4, key_length)
    value = buf.slice(pos + 24 + 4 + key_length, body_length - key_length - 4) if body_length - key_length - 4 > 0

    pos = pos + 24 + body_length

    begin
      values[key] = [deserialize(value, flags), cas]
    rescue DalliError
    end

  else
    # not enough data yet, wait for more
    break
  end
end
@position = pos
```

### Proposal

If `unpack` and `unpack1` had an `offset:` parameter, it would allow this kind of code to extract the fields it needs without allocating and copying as much strings, e.g.:

```ruby
flags = buf.slice(pos + 24, 4).unpack1("N")
```

could be:

```ruby
buf.unpack1("N", offset: pos + 24)
```




-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [ruby-core:105783] [Ruby master Feature#18254] Add an `offset` parameter to String#unpack and String#unpack1
  2021-10-18  8:04 [ruby-core:105660] [Ruby master Feature#18254] Add an `offset` parameter to String#unpack and String#unpack1 byroot (Jean Boussier)
                   ` (6 preceding siblings ...)
  2021-10-25  6:56 ` [ruby-core:105775] " byroot (Jean Boussier)
@ 2021-10-25  9:26 ` duerst
  2021-10-25  9:41 ` [ruby-core:105784] " byroot (Jean Boussier)
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: duerst @ 2021-10-25  9:26 UTC (permalink / raw)
  To: ruby-core

Issue #18254 has been updated by duerst (Martin Dürst).


mame (Yusuke Endoh) wrote in #note-5:
> Just a confirmation: the offset is byte-oriented, not character-oriented, right? There are a format "u" which is UTF-8 coding, so the behavior should be explained clearly in the document.

This is not only a problem of "explain it in the document". In order for this offset to work well, there should be a way to know how many bytes an invocation of String#unpack consumes. In many cases, that's very easy to calculate from the format string, but in others, in particular for UTF-8, it's not easy.

----------------------------------------
Feature #18254: Add an `offset` parameter to String#unpack and String#unpack1
https://bugs.ruby-lang.org/issues/18254#change-94299

* Author: byroot (Jean Boussier)
* Status: Open
* Priority: Normal
----------------------------------------
When working with binary protocols it's common to have to first unpack some kind of header or type prefix, and then based on that unpack another part of the string.

For instance here's [a code snippet from Dalli, the most common Memcached client](https://github.com/petergoldstein/dalli/blob/76b79d78cda13562da17bc99f92edcedf1873994/lib/dalli/protocol/binary.rb#L156-L184):

```ruby
while buf.bytesize - pos >= 24
  header = buf.slice(pos, 24)
  (key_length, _, body_length, cas) = header.unpack(KV_HEADER)

  if key_length == 0
    # all done!
    @multi_buffer = nil
    @position = nil
    @inprogress = false
    break

  elsif buf.bytesize - pos >= 24 + body_length
    flags = buf.slice(pos + 24, 4).unpack1("N")
    key = buf.slice(pos + 24 + 4, key_length)
    value = buf.slice(pos + 24 + 4 + key_length, body_length - key_length - 4) if body_length - key_length - 4 > 0

    pos = pos + 24 + body_length

    begin
      values[key] = [deserialize(value, flags), cas]
    rescue DalliError
    end

  else
    # not enough data yet, wait for more
    break
  end
end
@position = pos
```

### Proposal

If `unpack` and `unpack1` had an `offset:` parameter, it would allow this kind of code to extract the fields it needs without allocating and copying as much strings, e.g.:

```ruby
flags = buf.slice(pos + 24, 4).unpack1("N")
```

could be:

```ruby
buf.unpack1("N", offset: pos + 24)
```




-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [ruby-core:105784] [Ruby master Feature#18254] Add an `offset` parameter to String#unpack and String#unpack1
  2021-10-18  8:04 [ruby-core:105660] [Ruby master Feature#18254] Add an `offset` parameter to String#unpack and String#unpack1 byroot (Jean Boussier)
                   ` (7 preceding siblings ...)
  2021-10-25  9:26 ` [ruby-core:105783] " duerst
@ 2021-10-25  9:41 ` byroot (Jean Boussier)
  2021-10-25 14:34 ` [ruby-core:105788] " byroot (Jean Boussier)
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: byroot (Jean Boussier) @ 2021-10-25  9:41 UTC (permalink / raw)
  To: ruby-core

Issue #18254 has been updated by byroot (Jean Boussier).


That argument will indeed be pretty much worthless if you use the `U` format, but I don't really see it as a blocker. It is meant to help binary parsers, I don't see `U` making sense for these.

As for the documentation, we indeed need to be clear that it's a **byte** offset.

----------------------------------------
Feature #18254: Add an `offset` parameter to String#unpack and String#unpack1
https://bugs.ruby-lang.org/issues/18254#change-94300

* Author: byroot (Jean Boussier)
* Status: Open
* Priority: Normal
----------------------------------------
When working with binary protocols it's common to have to first unpack some kind of header or type prefix, and then based on that unpack another part of the string.

For instance here's [a code snippet from Dalli, the most common Memcached client](https://github.com/petergoldstein/dalli/blob/76b79d78cda13562da17bc99f92edcedf1873994/lib/dalli/protocol/binary.rb#L156-L184):

```ruby
while buf.bytesize - pos >= 24
  header = buf.slice(pos, 24)
  (key_length, _, body_length, cas) = header.unpack(KV_HEADER)

  if key_length == 0
    # all done!
    @multi_buffer = nil
    @position = nil
    @inprogress = false
    break

  elsif buf.bytesize - pos >= 24 + body_length
    flags = buf.slice(pos + 24, 4).unpack1("N")
    key = buf.slice(pos + 24 + 4, key_length)
    value = buf.slice(pos + 24 + 4 + key_length, body_length - key_length - 4) if body_length - key_length - 4 > 0

    pos = pos + 24 + body_length

    begin
      values[key] = [deserialize(value, flags), cas]
    rescue DalliError
    end

  else
    # not enough data yet, wait for more
    break
  end
end
@position = pos
```

### Proposal

If `unpack` and `unpack1` had an `offset:` parameter, it would allow this kind of code to extract the fields it needs without allocating and copying as much strings, e.g.:

```ruby
flags = buf.slice(pos + 24, 4).unpack1("N")
```

could be:

```ruby
buf.unpack1("N", offset: pos + 24)
```




-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [ruby-core:105788] [Ruby master Feature#18254] Add an `offset` parameter to String#unpack and String#unpack1
  2021-10-18  8:04 [ruby-core:105660] [Ruby master Feature#18254] Add an `offset` parameter to String#unpack and String#unpack1 byroot (Jean Boussier)
                   ` (8 preceding siblings ...)
  2021-10-25  9:41 ` [ruby-core:105784] " byroot (Jean Boussier)
@ 2021-10-25 14:34 ` byroot (Jean Boussier)
  2021-10-25 16:55 ` [ruby-core:105792] " mame (Yusuke Endoh)
  2021-10-25 17:36 ` [ruby-core:105794] " byroot (Jean Boussier)
  11 siblings, 0 replies; 13+ messages in thread
From: byroot (Jean Boussier) @ 2021-10-25 14:34 UTC (permalink / raw)
  To: ruby-core

Issue #18254 has been updated by byroot (Jean Boussier).


I extended the pull request to clearly document the `offset` keyword and stress that it's a byte offset. Hopefully that clears that concern.

----------------------------------------
Feature #18254: Add an `offset` parameter to String#unpack and String#unpack1
https://bugs.ruby-lang.org/issues/18254#change-94306

* Author: byroot (Jean Boussier)
* Status: Open
* Priority: Normal
----------------------------------------
When working with binary protocols it's common to have to first unpack some kind of header or type prefix, and then based on that unpack another part of the string.

For instance here's [a code snippet from Dalli, the most common Memcached client](https://github.com/petergoldstein/dalli/blob/76b79d78cda13562da17bc99f92edcedf1873994/lib/dalli/protocol/binary.rb#L156-L184):

```ruby
while buf.bytesize - pos >= 24
  header = buf.slice(pos, 24)
  (key_length, _, body_length, cas) = header.unpack(KV_HEADER)

  if key_length == 0
    # all done!
    @multi_buffer = nil
    @position = nil
    @inprogress = false
    break

  elsif buf.bytesize - pos >= 24 + body_length
    flags = buf.slice(pos + 24, 4).unpack1("N")
    key = buf.slice(pos + 24 + 4, key_length)
    value = buf.slice(pos + 24 + 4 + key_length, body_length - key_length - 4) if body_length - key_length - 4 > 0

    pos = pos + 24 + body_length

    begin
      values[key] = [deserialize(value, flags), cas]
    rescue DalliError
    end

  else
    # not enough data yet, wait for more
    break
  end
end
@position = pos
```

### Proposal

If `unpack` and `unpack1` had an `offset:` parameter, it would allow this kind of code to extract the fields it needs without allocating and copying as much strings, e.g.:

```ruby
flags = buf.slice(pos + 24, 4).unpack1("N")
```

could be:

```ruby
buf.unpack1("N", offset: pos + 24)
```




-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [ruby-core:105792] [Ruby master Feature#18254] Add an `offset` parameter to String#unpack and String#unpack1
  2021-10-18  8:04 [ruby-core:105660] [Ruby master Feature#18254] Add an `offset` parameter to String#unpack and String#unpack1 byroot (Jean Boussier)
                   ` (9 preceding siblings ...)
  2021-10-25 14:34 ` [ruby-core:105788] " byroot (Jean Boussier)
@ 2021-10-25 16:55 ` mame (Yusuke Endoh)
  2021-10-25 17:36 ` [ruby-core:105794] " byroot (Jean Boussier)
  11 siblings, 0 replies; 13+ messages in thread
From: mame (Yusuke Endoh) @ 2021-10-25 16:55 UTC (permalink / raw)
  To: ruby-core

Issue #18254 has been updated by mame (Yusuke Endoh).


@byroot Thank you for adding documentation. I agree with merging.

> there should be a way to know how many bytes an invocation of String#unpack consumes.

In fact, some committers discussed this point at the dev-meeting. However, in many cases, it is trivial (or able to calculate) for a programmer how many bytes are consumed. Also, it looks difficult to provide the feature by just extending the current API design of `String#unpack`. So, matz concluded that those who really wants the feaature should create another ticket with use case discussion and a concrete API proposal.

----------------------------------------
Feature #18254: Add an `offset` parameter to String#unpack and String#unpack1
https://bugs.ruby-lang.org/issues/18254#change-94310

* Author: byroot (Jean Boussier)
* Status: Open
* Priority: Normal
----------------------------------------
When working with binary protocols it's common to have to first unpack some kind of header or type prefix, and then based on that unpack another part of the string.

For instance here's [a code snippet from Dalli, the most common Memcached client](https://github.com/petergoldstein/dalli/blob/76b79d78cda13562da17bc99f92edcedf1873994/lib/dalli/protocol/binary.rb#L156-L184):

```ruby
while buf.bytesize - pos >= 24
  header = buf.slice(pos, 24)
  (key_length, _, body_length, cas) = header.unpack(KV_HEADER)

  if key_length == 0
    # all done!
    @multi_buffer = nil
    @position = nil
    @inprogress = false
    break

  elsif buf.bytesize - pos >= 24 + body_length
    flags = buf.slice(pos + 24, 4).unpack1("N")
    key = buf.slice(pos + 24 + 4, key_length)
    value = buf.slice(pos + 24 + 4 + key_length, body_length - key_length - 4) if body_length - key_length - 4 > 0

    pos = pos + 24 + body_length

    begin
      values[key] = [deserialize(value, flags), cas]
    rescue DalliError
    end

  else
    # not enough data yet, wait for more
    break
  end
end
@position = pos
```

### Proposal

If `unpack` and `unpack1` had an `offset:` parameter, it would allow this kind of code to extract the fields it needs without allocating and copying as much strings, e.g.:

```ruby
flags = buf.slice(pos + 24, 4).unpack1("N")
```

could be:

```ruby
buf.unpack1("N", offset: pos + 24)
```




-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [ruby-core:105794] [Ruby master Feature#18254] Add an `offset` parameter to String#unpack and String#unpack1
  2021-10-18  8:04 [ruby-core:105660] [Ruby master Feature#18254] Add an `offset` parameter to String#unpack and String#unpack1 byroot (Jean Boussier)
                   ` (10 preceding siblings ...)
  2021-10-25 16:55 ` [ruby-core:105792] " mame (Yusuke Endoh)
@ 2021-10-25 17:36 ` byroot (Jean Boussier)
  11 siblings, 0 replies; 13+ messages in thread
From: byroot (Jean Boussier) @ 2021-10-25 17:36 UTC (permalink / raw)
  To: ruby-core

Issue #18254 has been updated by byroot (Jean Boussier).


Agreed. The goal is to avoid slicing anyway, and to slice you need to know how many bytes you consumed.

If there's no other objections I'll merge in a day or two.

----------------------------------------
Feature #18254: Add an `offset` parameter to String#unpack and String#unpack1
https://bugs.ruby-lang.org/issues/18254#change-94313

* Author: byroot (Jean Boussier)
* Status: Open
* Priority: Normal
----------------------------------------
When working with binary protocols it's common to have to first unpack some kind of header or type prefix, and then based on that unpack another part of the string.

For instance here's [a code snippet from Dalli, the most common Memcached client](https://github.com/petergoldstein/dalli/blob/76b79d78cda13562da17bc99f92edcedf1873994/lib/dalli/protocol/binary.rb#L156-L184):

```ruby
while buf.bytesize - pos >= 24
  header = buf.slice(pos, 24)
  (key_length, _, body_length, cas) = header.unpack(KV_HEADER)

  if key_length == 0
    # all done!
    @multi_buffer = nil
    @position = nil
    @inprogress = false
    break

  elsif buf.bytesize - pos >= 24 + body_length
    flags = buf.slice(pos + 24, 4).unpack1("N")
    key = buf.slice(pos + 24 + 4, key_length)
    value = buf.slice(pos + 24 + 4 + key_length, body_length - key_length - 4) if body_length - key_length - 4 > 0

    pos = pos + 24 + body_length

    begin
      values[key] = [deserialize(value, flags), cas]
    rescue DalliError
    end

  else
    # not enough data yet, wait for more
    break
  end
end
@position = pos
```

### Proposal

If `unpack` and `unpack1` had an `offset:` parameter, it would allow this kind of code to extract the fields it needs without allocating and copying as much strings, e.g.:

```ruby
flags = buf.slice(pos + 24, 4).unpack1("N")
```

could be:

```ruby
buf.unpack1("N", offset: pos + 24)
```




-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2021-10-25 17:36 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-18  8:04 [ruby-core:105660] [Ruby master Feature#18254] Add an `offset` parameter to String#unpack and String#unpack1 byroot (Jean Boussier)
2021-10-18  8:24 ` [ruby-core:105661] " znz (Kazuhiro NISHIYAMA)
2021-10-18  8:32 ` [ruby-core:105662] " byroot (Jean Boussier)
2021-10-18 14:26 ` [ruby-core:105663] " byroot (Jean Boussier)
2021-10-25  2:08 ` [ruby-core:105766] " matz (Yukihiro Matsumoto)
2021-10-25  2:36 ` [ruby-core:105769] " mame (Yusuke Endoh)
2021-10-25  5:39 ` [ruby-core:105774] " nobu (Nobuyoshi Nakada)
2021-10-25  6:56 ` [ruby-core:105775] " byroot (Jean Boussier)
2021-10-25  9:26 ` [ruby-core:105783] " duerst
2021-10-25  9:41 ` [ruby-core:105784] " byroot (Jean Boussier)
2021-10-25 14:34 ` [ruby-core:105788] " byroot (Jean Boussier)
2021-10-25 16:55 ` [ruby-core:105792] " mame (Yusuke Endoh)
2021-10-25 17:36 ` [ruby-core:105794] " byroot (Jean Boussier)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).