Capstring benchmark 1
Start time: 20:22:18 (HH:MM:SS)
Finish time: 20:22:18 (Took LESS THAN ONE SECOND)

Regular ansistring benchmark 1
Start time: 20:22:18 (HH:MM:SS)
Finish time: 20:32:24 (Took OVER TEN MINUTES)
Note: an ansistring has fairly good performance in simple loops such as:
    var
      s: ansistring;
      i: integer;
    begin
      for i:= 1 to 50000 do
        s:= s + ' more';
    end.

In the above loop, you won't notice much difference in speed with a Capstring versus an Ansistring. However, Ansistrings are very unpredictable inside loops when we do concatenations such as below:
    const
      CRLF = #13#10;
    var
      s: ansistring;
      i: integer;
    begin
      for i:= 1 to 50000 do
      begin
        // usually an ansistring is EXTREMELY slow here, therefore hard to master/predict
        s:= s + ' more' + CRLF;
        s:= s + ' more';
      end;
    end.

Because ansistrings are unpredictable, and working around that unpredictability takes expertise (such as using SetLength, UniqueString, and accessing the string char by char), the capstring can be used when predictable performance is needed.
The capstring grows chunk by chunk, and you may specify the chunk size by which it grows. For example, if you specify a chunk of 128000 bytes, the string will only grow in size once for every 128000 characters you add to it. You can add a string of any size to the capstring even though it grows in fixed chunks. Once you are done adding strings to the capstring, you tell it you are finished; it then shrinks to the correct size and is ready for read access. A regular ansistring, by contrast, may grow every time you add a string to it, and it does not grow in intelligent chunks.
When using the capstring, once you are done with a loop, remember to call the EndUpdate function, which tells the capstring to shrink down to its actual string size (chunk growth means it makes extra room during updates). The EndUpdate function also makes the capstring data field (an ansistring) available for read-only access. To add more strings to the capstring, simply call AddStr(). To add more characters to the capstring, simply call AddChar.
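The chunked growth and the final shrink described above can be sketched in a few lines. This is only an illustration of the general technique, not the capstr unit's actual implementation: the names TChunkBuf, ResetChunkBuf, AppendStr and FinishUpdate are hypothetical and were chosen so they do not clash with the real resetbuf, addstr and endupdate.

```pascal
program ChunkBufSketch;
{$mode objfpc}{$H+}

// Hypothetical illustration only: these names are NOT the capstr unit's API.
type
  TChunkBuf = record
    Data: ansistring;    // backing storage, may be longer than Used
    Used: integer;       // number of characters actually stored
    ChunkSize: integer;  // growth step, in characters
  end;

procedure ResetChunkBuf(var B: TChunkBuf; AChunkSize: integer);
begin
  B.Data := '';
  B.Used := 0;
  B.ChunkSize := AChunkSize;
end;

procedure AppendStr(var B: TChunkBuf; const S: ansistring);
var
  Needed, NewCap: integer;
begin
  if S = '' then
    Exit;
  Needed := B.Used + Length(S);
  if Needed > Length(B.Data) then
  begin
    // Round the required size up to a whole number of chunks.
    NewCap := ((Needed + B.ChunkSize - 1) div B.ChunkSize) * B.ChunkSize;
    SetLength(B.Data, NewCap);
  end;
  // Copy the new characters into the spare room; no reallocation happens here.
  Move(S[1], B.Data[B.Used + 1], Length(S));
  B.Used := Needed;
end;

procedure FinishUpdate(var B: TChunkBuf);
begin
  // Trim the spare capacity so Data holds exactly the stored text.
  SetLength(B.Data, B.Used);
end;

var
  Buf: TChunkBuf;
  i: integer;
begin
  ResetChunkBuf(Buf, 128000);
  for i := 1 to 50000 do
  begin
    AppendStr(Buf, ' more');
    AppendStr(Buf, #13#10);
  end;
  FinishUpdate(Buf);
  writeln(Length(Buf.Data)); // prints 350000 (50000 * 7 characters)
end.
```

With a chunk of 128000 characters, the loop above resizes the backing string only three times while adding 100000 pieces, plus one final trim.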
The capstring is similar to a stringlist in some ways, in that you call an AddStr() function instead of using the plus sign operator. The plus operator is not used, since capstring is not yet a built-in type in the modern Pascal languages (but it could be in future compiler versions).
Sometimes even with stringlists one makes a reference to stringlist.text in loops, which retrieves the full ansistring rather than the list of strings. Since stringlists can be very large, making calls to stringlist.text or stringlist.text[i] in a loop will slow the program down significantly. Not all text operations can be done using stringlist.items; in the real world some must still be done using stringlist.text, which slows the program down. Neither can all operations be done using some other concoction such as an array of strings with a delimiter. Strings are strings of text at times, rather than arrays or items of text. That is why they are called strings.
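When the loop only needs to read the text, one common mitigation is to fetch the Text property once into a local variable before the loop, since each access to the property assembles a fresh string from all the lines. A minimal sketch using the standard TStringList from the Classes unit follows; the variable names and the space-counting task are made up for illustration.

```pascal
program HoistText;
{$mode objfpc}{$H+}

uses
  Classes;

var
  sl: TStringList;
  whole: ansistring;
  i, spaces: integer;
begin
  sl := TStringList.Create;
  try
    sl.Add('first line');
    sl.Add('second line');
    // Read the property once, outside the loop, instead of
    // evaluating sl.Text[i] on every iteration.
    whole := sl.Text;
    spaces := 0;
    for i := 1 to Length(whole) do
      if whole[i] = ' ' then
        Inc(spaces);
    writeln(spaces); // prints 2
  finally
    sl.Free;
  end;
end.
```

This does not help when the text has to be modified inside the loop, which is the case the capstring is aimed at.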
Capstring came about because time after time after time, I found myself allocating my own memory with setlength or resorting to uglier pchars during big concatenation loops. Not only was it annoying to escape the ansistring automated type, but the programmer had to be an expert on compilers in order to understand what tricks could be used to fool the compiler/reference counter/memory allocator into making the ansistring more performance friendly. In fact, even though pchars are hard to use, they are more predictable than ansistrings, since they are cruder and since ansistrings have so much magic going on in the background that you have to escape, magic which only compiler writers or extremely experienced programmers understand well.
Concatenation inside loops occurs extremely often in programming - and loops are probably the most prevalent bottleneck in all applications. I'm not even a performance freak myself - I'm against premature optimization - and I still had a significant need for capstring, because some ansistring concatenations inside loops literally took 5 hours to process if I didn't tune them using setlengths and string[i] indexing! A capstring allows you to do concatenations inside loops without allocating your memory with a setlength ahead of time - plus you don't have to access the string char by char as you do with string[i] indexing.
In cases where one is concatenating hundreds or thousands of strings inside a loop, the ansistring or stringlist.text field becomes slower and slower as it grows, and escaping the automated type using setlength becomes extremely tedious, almost as tedious as using pchars. Most of the ansistring performance issues are due to constant memory allocations on the heap every time a new string is added to an existing ansistring.
Tuning ansistrings by using uniquestring and using pchar casts along with an initial setlength is tedious, time consuming, and far from automated. Programmers have better things to be doing every day. Some folks then end up trying string lists or arrays of chars, or arrays of strings - but what they really need is a capacitance based ansistring.
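For comparison, here is roughly what the manual tuning looks like when the final size happens to be known in advance: one SetLength up front, then each piece is copied into place by hand with Move while a write position is tracked. This is a sketch of the general workaround, not code from the capstr unit, and the variable names are illustrative.

```pascal
program ManualPrealloc;
{$mode objfpc}{$H+}

const
  COUNT = 50000;

var
  s, piece: ansistring;
  i, p: integer;
begin
  piece := ' more';
  // Reserve the full final size once; this only works when the
  // total length can be computed before the loop starts.
  SetLength(s, COUNT * Length(piece));
  p := 1;
  for i := 1 to COUNT do
  begin
    // Copy each piece into position instead of writing s := s + piece.
    Move(piece[1], s[p], Length(piece));
    Inc(p, Length(piece));
  end;
  writeln(Length(s)); // prints 250000
end.
```

The bookkeeping around p and the up-front size calculation is exactly the kind of work a capacity-based string is meant to take off the programmer's hands.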
It is possible to develop a capacitance based stringlist too, but in many cases, why bother using a stringlist if an ansistring with a capacitance is more appropriate? I think there is both a need for a capacitance stringlist and a capacitance ansistring, and one does not always substitute the other.
The capstring data structure and algorithms could eventually be built into the Modern Pascal programming languages (and others, such as Java/C++/Ada/Etc.). A capstring as a record, struct, or class is not as beautiful as a built in type such as the ansistring that comes with many modern pascals. However, for now, a capstring isn't built into the Pascal language and we must make a compromise.
The notes you should know before using capstring are given as comments in the program below:
    program CapstrDemo; {$mode objfpc} {$H+}

    uses
      capstr;

    var
      buf: TCapstr;
      i: integer;
    begin
      resetbuf(@buf); // always reset before using a new capstring
      for i:= ... do
      begin
        // in a large loop you will notice performance is consistent and predictable
        addstr('test123', @buf);
        addstr(' and testabc', @buf);
      end;
      endupdate(@buf); // we are done with data, must clean up
      writeln(buf.data); // can access data as read only after cleaning up
      readln;
    end.

The above program is equivalent to doing this:
    program AnsistringDemo; {$mode objfpc} {$H+}

    var
      s: ansistring;
      i: integer;
    begin
      s:= '';
      for i:= ... do
      begin
        // in a large loop you will notice this is very slow or unpredictable
        s:= s + 'test123' + ' and testabc';
      end;
      writeln(s);
      readln;
    end.

Sometimes the ansistring is extremely slow only when you are concatenating three or more ansistrings together in one shot (as opposed to just two). Sometimes the ansistring is slow only when you add two or more string constants to an ansistring (as opposed to just adding a single string constant). Sometimes the ansistring is fast; other times it is bizarrely slow. Because of this unpredictability, it is better to use a more predictable and consistently performance-friendly type, which the capstring intends to be.