MemoryBlock <-> string Memory usage

I think I know the answer but I want to be sure

Let’s say you have a MemoryBlock MB with a lot of data. If you do:

Dim S as String = MB

Are you copying all the bytes (so using approximately twice the RAM), or does the String internally just point to the same data?
(I am pretty sure that it copies the bytes, unfortunately.)

Also if you do
MB = S

Again, are you copying all the bytes, or just a reference to the underlying bytes? Since Strings are immutable, in theory it could go either way, depending on how things are structured under the hood.

Using a MemoryBlock to build a large string is certainly a way to keep RAM usage down and stay fast. But when one needs to go between a String and a MemoryBlock, depending on how things work, one could be doubling RAM usage, at least until the MemoryBlock is set to Nil or you do S = “”.

And I would like to avoid that RAM usage spike, as I cannot predict how large the String/MemoryBlock contents could sometimes be.

Thanks,

  • karen

It copies, otherwise it would be a backdoor to making a String mutable. So yes, expect to double the RAM usage as the MB is assigned to a String and vice versa. Unfortunately any method of building a large String will be a trade-off between speed and memory. Edit: Speed and memory considerations, that is. There is a direct relation, i.e., less speed = less RAM, more speed = more RAM.
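A minimal sketch of how one might check the copy behaviour described above. It assumes the classic framework (`MemoryBlock.Byte`, `MemoryBlock.StringValue`, `System.DebugLog`); the sizes and values are made up for illustration.

```xojo
// Sketch only: checks whether a String made from a MemoryBlock
// is independent of later changes to that MemoryBlock.
Dim mb As New MemoryBlock(4)
mb.StringValue(0, 4) = "abcd"

Dim s As String = mb   // implicit MemoryBlock -> String conversion

mb.Byte(0) = 65        // overwrite the first byte of the MemoryBlock with "A"

// If the conversion copied the bytes, s is unaffected by the write above.
System.DebugLog(s)
System.DebugLog(mb.StringValue(0, 4))
```

If the two logged values differ in their first character, the String holds its own copy of the bytes, which is what the immutability argument predicts.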

But one thing you may not have considered: the MemoryBlock --> String operation gives you a String with no encoding so it’s up to you to define the encoding, creating yet another version of the String. If you don’t do it properly, you will temporarily triple your RAM usage.
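As a rough illustration of the encoding step, assuming the bytes in the MemoryBlock really are UTF-8 text: `DefineEncoding` only labels the existing bytes, whereas `ConvertEncoding` would rewrite them. Whether `DefineEncoding` itself allocates another buffer is not something this sketch demonstrates.

```xojo
// Sketch only: the content and the choice of UTF-8 are assumptions for illustration.
Dim mb As New MemoryBlock(5)
mb.StringValue(0, 5) = "hello"

Dim s As String = mb                    // String created from raw bytes
If s.Encoding Is Nil Then
  // Label the bytes as UTF-8 without changing them.
  s = DefineEncoding(s, Encodings.UTF8)
End If
```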

I am curious about why this is even an issue though. Unless you’re building strings in the GB range, you shouldn’t have to worry about it.

If you are doing that, perhaps using a file to hold completed chunks might be a better option.

Best way to build large strings is an array and Join later.
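A short sketch of the array-and-Join approach, using the classic `Append` and `Join` calls; the row content is invented for illustration.

```xojo
// Sketch only: collect the pieces in an array, then build the final String once.
Dim parts() As String
For i As Integer = 1 To 1000
  parts.Append("row " + Str(i))
Next

// One allocation for the joined result instead of repeated concatenation.
Dim result As String = Join(parts, EndOfLine)
```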

Christian, that would involve the same RAM concerns. Also, in my experience, that’s not the fastest way, although it does help with encoding concerns.

If you have repeating parts, the array can reference them.

Ah and if you need a memoryblock, we do have a join function in our plugin to join an array of variants with strings or memory blocks.

[quote=438751:@Kem Tekinay]I am curious about why this is even an issue though. Unless you’re building strings in the GB range, you shouldn’t have to worry about it.
[/quote]

First, I do know when to use DefineEncoding, and that should not copy the string. But in this case I am talking about a mix of text and binary data in the MemoryBlock, so the whole string would never have a TextEncoding.

In theory I should never need to convert the whole MemoryBlock to a string (only small parts at a time). But if, for example, I try to send the MB contents through a socket, or shove the data from one into a MemoryBlock, a copy will be generated, because the APIs take strings (which automagically get generated on assignment to methods expecting a string). Very inefficient and wasteful…

Some things should be able to take a MemoryBlock directly, depending on what they do, so that they can work with MemoryBlocks without copying any data.

So what am I trying to do? Well, reinvent the wheel! :wink:
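One possible workaround, sketched under assumptions: hand the socket small slices via `MemoryBlock.StringValue(offset, length)` so that only one chunk-sized String exists at a time. `SendInChunks` and `kChunkSize` are hypothetical names, and since `TCPSocket.Write` may buffer data internally, a real version would probably pace the writes from the `SendProgress` or `SendComplete` events rather than loop straight through.

```xojo
// Sketch only: hypothetical helper, not a framework API.
Sub SendInChunks(mb As MemoryBlock, sock As TCPSocket)
  Const kChunkSize = 65536   // assumed chunk size; tune as needed

  Dim offset As Integer = 0
  While offset < mb.Size
    Dim n As Integer = Min(kChunkSize, mb.Size - offset)
    // Only this slice is copied into a temporary String.
    sock.Write(mb.StringValue(offset, n))
    offset = offset + n
  Wend
End Sub
```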

To start with, I am writing 2 classes for dealing with generic data sets… then I will take the concept beyond that with additional capabilities.

The first class is to (among other things) take any database record set and serialize all the records (with all different types of fields: text, numeric and blob) in as compact and CPU-efficient a way as I can think of, to be able to transfer the data to Xojo client apps either through IPC or the net, and sometimes maybe a file. I am doing that by writing the data sequentially to a MemoryBlock using a BinaryStream.

It can also load data through an API where you define fields and datatypes and then add data for the fields. You tell it when you are done entering data for each record.

I want to handle as big a dataset as possible, as efficiently as possible, because I don’t know the number of fields, the datatypes, or the number of records I will eventually need to handle.
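A minimal sketch of writing fields sequentially into a MemoryBlock through a BinaryStream. The record layout here (a 4-byte length prefix before each text field, fixed-width numbers) is an assumption for illustration, not the layout actually used by the classes described above.

```xojo
// Sketch only: the field values and layout are hypothetical.
Dim mb As New MemoryBlock(0)
Dim bs As New BinaryStream(mb)   // the stream grows the MemoryBlock as it writes
bs.LittleEndian = True

Dim fieldText As String = "karen"

bs.WriteInt32(42)                  // a numeric field
bs.WriteUInt32(LenB(fieldText))    // byte length of the text field
bs.Write(fieldText)                // the text bytes themselves
bs.WriteDouble(3.14)               // another numeric field
bs.Close
```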

The second class receives the data and acts like a recordset, but keeps the data for all but the current record in the MemoryBlock to minimize the memory footprint on the client for large data sets (and in the first class the records can be compressed individually or in blocks of n records when they are large).
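A rough sketch of the "only the current record becomes a String" idea, assuming a hypothetical layout of length-prefixed records and a separate array of record offsets. It builds a tiny block first so the example stands on its own.

```xojo
// Sketch only: layout and names are assumptions for illustration.
Dim mb As New MemoryBlock(0)
Dim offsets() As Integer

// Write two length-prefixed records and remember where each one starts.
Dim w As New BinaryStream(mb)
For Each txt As String In Array("first", "second")
  offsets.Append(w.Position)
  w.WriteUInt32(LenB(txt))
  w.Write(txt)
Next
w.Close

// Later: pull out only record 1; the rest stays in the MemoryBlock.
Dim r As New BinaryStream(mb)
r.Position = offsets(1)
Dim n As Integer = r.ReadUInt32
Dim recordText As String = r.Read(n)   // a copy of just this record's bytes
r.Close
```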

There is more to it than that but that is the high level view…

For client-server applications, both would be Xojo apps, and the server might be an old PC or Mac (so slow) without a ton of RAM. The server would be multi-user and could be based on Aloe Express. So speed and memory footprint are important.

I am trying to make sure I squeeze out the highest performance I can, as a tool for future projects.

Basically I am looking to create a standard data set transfer mechanism for Xojo Middleware apps (or communication with Xojo helper apps) I create…

As I said, reinventing the wheel… but then it will be my wheel!

That said, I think having to pass MemoryBlocks to framework APIs that expect a string but really only need a byte buffer is potentially hugely wasteful of resources.

  • karen