ruby-core@ruby-lang.org archive (unofficial mirror)
* [ruby-core:95875] [Ruby master Bug#16352] Marshal limit of >= 2 GiB
       [not found] <redmine.issue-16352.20191118113902@ruby-lang.org>
@ 2019-11-18 11:39 ` seoanezonjic
  2019-11-20  2:05 ` [ruby-core:95890] " shyouhei
  2019-11-21 13:59 ` [ruby-core:95905] " shevegen
  2 siblings, 0 replies; 4+ messages in thread
From: seoanezonjic @ 2019-11-18 11:39 UTC
  To: ruby-core

Issue #16352 has been reported by seoanezonjic (Pedro Seoane).

----------------------------------------
Bug #16352: Marshal limit of  >= 2 GiB
https://bugs.ruby-lang.org/issues/16352

* Author: seoanezonjic (Pedro Seoane)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
* ruby -v: ruby 2.7.0dev (2019-11-12T12:03:22Z master 3816622fbe) [x86_64-linux]
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN
----------------------------------------
Hi,
While using numo-narray (Numo::NArray), a gem for matrix operations, I hit the following error when saving a large matrix:
in `dump': long too big to dump (TypeError)
GitHub thread: https://github.com/ruby-numo/numo-narray/issues/144
Digging with the authors, we found the following code that reproduces the error:
ruby -e 'Marshal.dump(" "*2**31)'
Executed on:
ruby 2.7.0dev (2019-11-12T12:03:22Z master 3816622fbe) [x86_64-linux]
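
The one-liner above can be wrapped so that the failure is reported instead of aborting the script. This is only a sketch: `dump_size_or_error` is a hypothetical helper name, and the `MARSHAL_BIG` environment variable is an assumption added here so the 2 GiB allocation stays opt-in.

```ruby
# Hypothetical helper for illustration: returns the dumped size in bytes,
# or the error message if Marshal refuses the object with a TypeError.
def dump_size_or_error(obj)
  Marshal.dump(obj).bytesize
rescue TypeError => e
  e.message
end

# A small string dumps without trouble.
puts dump_size_or_error(" " * 16)

# The case from the report needs more than 2 GiB of free memory,
# so it only runs when MARSHAL_BIG is set (an assumption of this sketch).
if ENV["MARSHAL_BIG"]
  puts dump_size_or_error(" " * 2**31)
end
```

On the build named in the report, the second call would be expected to return the "long too big to dump" message rather than a size.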

The Marshal library has a limit that is checked using the SIZEOF_LONG constant. The check is performed at lines 301 to 321 of marshal.c (https://github.com/ruby/ruby/blob/e7ea6e078fecb70fbc91b04878b69f696749afac/marshal.c#L301). I don't understand the motivation for this limit, and it has a great impact on libraries that need to serialize large objects such as numeric matrices. In this case the limit (objects of 2 GiB or more cannot be dumped) is easily reached, and it blocks Ruby development in scientific projects like the one cited. I found another related bug, #1560, but the Marshal problem itself was not addressed there.
Thank you in advance
Pedro Seoane



-- 
https://bugs.ruby-lang.org/


* [ruby-core:95890] [Ruby master Bug#16352] Marshal limit of >= 2 GiB
       [not found] <redmine.issue-16352.20191118113902@ruby-lang.org>
  2019-11-18 11:39 ` [ruby-core:95875] [Ruby master Bug#16352] Marshal limit of >= 2 GiB seoanezonjic
@ 2019-11-20  2:05 ` shyouhei
  2019-11-20 19:58   ` [ruby-core:95898] " Austin Ziegler
  2019-11-21 13:59 ` [ruby-core:95905] " shevegen
  2 siblings, 1 reply; 4+ messages in thread
From: shyouhei @ 2019-11-20  2:05 UTC
  To: ruby-core

Issue #16352 has been updated by shyouhei (Shyouhei Urabe).

Description updated

This behaviour has been there since the beginning.  No ruby version since 0.49 has successfully dumped such a long string.  The same thing happens for a very big bignum, a very long array, a class that has a very long classpath (Q::W::E::R::...), an object with 2**31 instance variables (which isn't impossible these days), and much more.

The limitation is due to marshal's binary format.  I guess the reason behind this is simply that at the time the format was designed (back in the 1990s), there was no such thing as a 64-bit integer type.  To address this properly we would have to reconsider every use of `long` in the marshal format.  I guess that is essentially a format change.  That would hurt data portability, so it is not that easy.

Any nice ideas to fix the situation?
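
To make the format limitation concrete, here is a minimal Ruby sketch of the integer encoding as documented in doc/marshal.rdoc and as I read `w_long` in marshal.c. `marshal_long` is a hypothetical name used only for this illustration; it is not part of Ruby. String lengths, array lengths and similar counts are written with this encoding, and the leading byte can announce at most four payload bytes, which is where the 2**31 ceiling comes from.

```ruby
# Hypothetical helper: a Ruby re-implementation of Marshal's integer
# encoding, for illustration only.
def marshal_long(x)
  # The format has no way to announce more than 4 payload bytes.
  unless (-2**31...2**31).cover?(x)
    raise TypeError, "long too big to dump"
  end

  return [0].pack("c")     if x.zero?
  return [x + 5].pack("c") if x.between?(1, 122)    # small positive: one byte
  return [x - 5].pack("c") if x.between?(-123, -1)  # small negative: one byte

  # Otherwise: a signed count byte followed by little-endian payload bytes.
  bytes = []
  loop do
    bytes << (x & 0xff)
    x >>= 8
    return [bytes.size, *bytes].pack("cC*")  if x.zero?
    return [-bytes.size, *bytes].pack("cC*") if x == -1
  end
end

# Compare with what Marshal itself writes for small Integers:
# 2 version bytes, the "i" type tag, then the encoded value.
[0, 1, 122, 123, 300, 65_536, -1, -123, -124].each do |n|
  same = Marshal.dump(n).byteslice(3..-1) == marshal_long(n)
  puts "#{n}: #{same ? 'matches' : 'differs'}"
end
```

Lifting the limit would mean giving the count byte a meaning beyond 4 (or adding a new type tag), which older readers would not understand. That is the portability concern.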

----------------------------------------
Bug #16352: Marshal limit of  >= 2 GiB
https://bugs.ruby-lang.org/issues/16352#change-82729


* [ruby-core:95898] Re: [Ruby master Bug#16352] Marshal limit of >= 2 GiB
  2019-11-20  2:05 ` [ruby-core:95890] " shyouhei
@ 2019-11-20 19:58   ` Austin Ziegler
  0 siblings, 0 replies; 4+ messages in thread
From: Austin Ziegler @ 2019-11-20 19:58 UTC
  To: Ruby developers



Marshal2?

On Tue, Nov 19, 2019 at 9:05 PM <shyouhei@ruby-lang.org> wrote:

> Issue #16352 has been updated by shyouhei (Shyouhei Urabe).
>
> Description updated
>
> This behaviour has been there since the beginning.  No ruby version since
> 0.49 has successfully dumped such long string.  Same thing happens for a
> very big bignum, a very long array, a class that has very long classpath
> (Q::W::E::R::...), an object of 2**31 instance variables (which isn't
> impossible these days), and much much more.
>
> The limitation is due to marshal's binary format.  I guess the reason
> behind this is simply because at the time the format was designed (back in
> 1990s), there simply was no such thing like a 64 bit integer type.  To
> properly reroute we have to reconsider all use of `long` in marshal
> format.  I guess that is essentially a format change.  That should hurt
> data portability so not that easy.
>
> Any nice idea to fix the situation?
>


-- 
Austin Ziegler • halostatue@gmail.com • austin@halostatue.ca
http://www.halostatue.ca/ • http://twitter.com/halostatue


* [ruby-core:95905] [Ruby master Bug#16352] Marshal limit of >= 2 GiB
       [not found] <redmine.issue-16352.20191118113902@ruby-lang.org>
  2019-11-18 11:39 ` [ruby-core:95875] [Ruby master Bug#16352] Marshal limit of >= 2 GiB seoanezonjic
  2019-11-20  2:05 ` [ruby-core:95890] " shyouhei
@ 2019-11-21 13:59 ` shevegen
  2 siblings, 0 replies; 4+ messages in thread
From: shevegen @ 2019-11-21 13:59 UTC
  To: ruby-core

Issue #16352 has been updated by shevegen (Robert A. Heiler).


> I don't understand the motivation for this limit, and it has a great impact on libraries that need to serialize large objects such as numeric matrices.
> In this case the limit (objects of 2 GiB or more cannot be dumped) is easily reached, and it blocks Ruby development in scientific projects like the one cited.

Shyouhei already pointed out the historic reason. I believe you can quite easily convince the Ruby core team that a change may
be necessary in the long run (most likely after Ruby 3.0) based on use cases. Matz likes to hear real-world use cases, so the
more information you can give, the better. :)

As for the possibility of change, I guess the current Marshal format could remain the default, with another variant added
that people could opt into, a bit like how syck and psych could be used interchangeably for YAML to some extent
(I kept using syck for quite some time after psych was added, before I finally moved to Unicode; I used to select
the YAML engine via e.g. YAML.engine = or something like that).
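
Until something like that exists, an application-level workaround is to keep every dumped object under the limit. The following is only a sketch of that idea and not something proposed by the core team: `dump_string_in_chunks`, `load_string_from_chunks` and `DEFAULT_CHUNK` are hypothetical names, and the payload is assumed to be raw bytes (the caller restores any encoding afterwards).

```ruby
require "stringio"

# Hypothetical default: 1 GiB per chunk, safely below the 2**31 - 1 byte ceiling.
DEFAULT_CHUNK = 2**30

# Writes a chunk count, then each slice of the string as its own Marshal record.
def dump_string_in_chunks(str, io, chunk_size = DEFAULT_CHUNK)
  count = (str.bytesize + chunk_size - 1) / chunk_size
  Marshal.dump(count, io)
  count.times do |i|
    # Treat the slice as raw bytes so multibyte characters may be split safely.
    chunk = str.byteslice(i * chunk_size, chunk_size)
    Marshal.dump(chunk.force_encoding(Encoding::BINARY), io)
  end
  io
end

# Reads the chunk count, then reassembles the slices into one binary string.
def load_string_from_chunks(io)
  count = Marshal.load(io)
  result = String.new(encoding: Encoding::BINARY)
  count.times { result << Marshal.load(io) }
  result
end

# Usage with a deliberately tiny chunk size so the example stays small.
io = StringIO.new("".b)
payload = ("0123456789" * 5).b
dump_string_in_chunks(payload, io, 7)
io.rewind
puts load_string_from_chunks(io) == payload
```

The same pattern would apply to a matrix library by dumping the raw buffer in slices plus a small header with shape and dtype, at the cost of a container format that plain Marshal.load does not understand.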

----------------------------------------
Bug #16352: Marshal limit of  >= 2 GiB
https://bugs.ruby-lang.org/issues/16352#change-82748

