help files



Notes

Capstring is a performance-friendly ansistring enhancement. I (Lars Olson) invented it when I needed an ansistring type that didn't perform so terribly in certain loops.

Performance notes

See a Screenshot and Benchmark.
  Capstring benchmark 1
  Start time: 20:22:18 (HH:MM:SS)
  Finish time: 20:22:18
  (Took LESS THAN ONE SECOND)
  Regular ansistring benchmark 1
  Start time: 20:22:18 (HH:MM:SS)
  Finish time: 20:32:24
  (Took OVER TEN MINUTES)

Note: an ansistring has fairly good performance in simple loops such as:

var
  s: ansistring;
  i: integer;
begin
  for i:= 1 to 50000 do
    s:= s + ' more';
end.
In the above loop, you won't notice much difference in speed between a Capstring and an Ansistring. However, Ansistrings are very unpredictable inside loops when we do concatenations such as the ones below:
const
  CRLF = #13#10;
var
  s: ansistring;
  i: integer;
begin
  for i:= 1 to 50000 do
  begin
    // usually an ansistring is EXTREMELY slow here, therefore hard to master/predict
    s:= s + ' more' + CRLF;
    s:= s + ' more';
  end;
end.
Since ansistrings are unpredictable, and it takes expertise to work around that unpredictability (using SetLength, UniqueString, and char-by-char access), the capstring can be used when you need performance that is predictable.

The capstring grows chunk by chunk, and you may specify the size of the chunk it grows by. For example, if you set the chunk to 128000 bytes, the string only grows once for roughly every 128000 characters you add to it. You can add a string of any size to the capstring even though it grows in fixed chunks. Once you are done adding strings to the capstring, you tell it you are finished; it then shrinks to the correct size and is ready for read access. A regular ansistring may grow every time you add any string to it, and it does not grow in intelligent chunks.

When using the capstring, once you are done with a loop, remember to call the EndUpdate function, which tells the capstring to shrink down to its actual string size (chunk growth means it makes extra room during updates). The EndUpdate function also sets up the capstring data field (an ansistring) to be available for read-only access. To add more strings to the capstring, simply call AddStr. To add more characters to the capstring, simply call AddChar.
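
As a rough sketch of the chunk growth and final shrink described above: the program below is not the capstr source. The TChunkBuf record and the BufInit, BufAdd and BufFinish routines are hypothetical names made up for illustration, and the chunk size of 16 is deliberately tiny so the effect is visible.

```pascal
program ChunkGrowthSketch;
{$mode objfpc}{$H+}

type
  // Hypothetical record for illustration only; not the real TCapstr layout.
  TChunkBuf = record
    Data: ansistring;  // backing storage, may be longer than Used
    Used: longint;     // number of characters actually written
    Chunk: longint;    // growth step, in characters
  end;

procedure BufInit(out b: TChunkBuf; AChunk: longint);
begin
  b.Data := '';
  b.Used := 0;
  b.Chunk := AChunk;
end;

procedure BufAdd(var b: TChunkBuf; const s: ansistring);
var
  need: longint;
begin
  if s = '' then
    exit;
  need := b.Used + Length(s);
  // Reallocate only when the spare room is used up, rounding up to whole chunks
  if need > Length(b.Data) then
    SetLength(b.Data, ((need + b.Chunk - 1) div b.Chunk) * b.Chunk);
  // Copy the new text into the spare room
  Move(s[1], b.Data[b.Used + 1], Length(s));
  b.Used := need;
end;

procedure BufFinish(var b: TChunkBuf);
begin
  // Trim the unused capacity so Data holds exactly what was written
  SetLength(b.Data, b.Used);
end;

var
  buf: TChunkBuf;
  i: longint;
begin
  BufInit(buf, 16);
  for i := 1 to 3 do
    BufAdd(buf, 'abcdefghij');   // 30 characters in total
  writeln(Length(buf.Data));     // capacity before trimming: 32 (two chunks)
  BufFinish(buf);
  writeln(Length(buf.Data));     // after trimming: 30
end.
```

With a realistic chunk size, most calls to the add routine only copy the new text, and the allocator is touched once per chunk instead of once per concatenation.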

The capstring is similar to a stringlist in some ways, in that you call an AddStr() function instead of using the plus sign operator. The plus operator is not used, since capstring is not yet a built-in type in the modern Pascal languages (but it could be in future compiler versions).

Ansistring performance the Hard Way

Using string[i] indexed access to characters and calling UniqueString/SetLength ahead of time before loops is tedious, and it takes ridiculous effort to trick the compiler/reference counter and memory allocator into doing performance-friendly maneuvers. The ansistring is called an automated type, but it really isn't automatic at all when you are concatenating in larger loops. The ansistring becomes so slow in larger loops that you must escape the automation and handle the ansistring manually yourself at times. The ansistring is only an automated type when you are doing small random concatenations in extremely low quantity.
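
For readers who have not done it, the "hard way" looks roughly like the sketch below: reserve the final size once with SetLength, fill the string through indexed writes, then trim. The constants and variable names are illustrative assumptions, and the loop simply mirrors the ' more' example from earlier on this page.

```pascal
program PreallocSketch;
{$mode objfpc}{$H+}
const
  Piece: ansistring = ' more';  // text appended on every iteration
  Count = 50000;
var
  s: ansistring;
  i, j, p: longint;
begin
  // Reserve the final size once, instead of growing on every concatenation
  SetLength(s, Count * Length(Piece));
  p := 0;  // number of characters written so far
  for i := 1 to Count do
    for j := 1 to Length(Piece) do
    begin
      Inc(p);
      s[p] := Piece[j];  // char-by-char indexed write
    end;
  // Trim, in case fewer characters were written than were reserved
  SetLength(s, p);
  writeln(Length(s));  // 250000
end.
```

The bookkeeping (the running position p, the size computed up front, the final trim) is exactly the part that has to be rewritten by hand for every such loop.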

Sometimes even with stringlists one makes a reference to stringlist.text in loops, which builds the full ansistring rather than using the list of strings. Since stringlists can be very large, making calls to stringlist.text or stringlist.text[i] in a loop will slow the program down significantly. Not all text operations can be done using the stringlist's individual items; in the real world some still must be done using stringlist.text, which slows the program down. Neither can all operations be done using some other concoction such as an array of strings with a delimiter. Strings are strings of text at times, rather than arrays or items of text. That is why they are called strings.
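
When the full text really is needed, one common mitigation is to read the Text property once into a local ansistring and index that copy, as in the minimal sketch below. It uses TStringList from the standard Classes unit; the variable names and the space-counting task are only illustrative.

```pascal
program TextHoistSketch;
{$mode objfpc}{$H+}
uses
  Classes;
var
  list: TStringList;
  whole: ansistring;
  i, spaces: longint;
begin
  list := TStringList.Create;
  try
    list.Add('first line');
    list.Add('second line');
    // Slow pattern: writing list.Text[i] inside the loop would rebuild
    // the full text on every iteration. Build it once instead.
    whole := list.Text;
    spaces := 0;
    for i := 1 to Length(whole) do
      if whole[i] = ' ' then
        Inc(spaces);
    writeln(spaces);  // 2
  finally
    list.Free;
  end;
end.
```

This only helps for reading. As soon as the loop has to build or extend a large string, the concatenation problem described on this page comes back.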

Capstring came about because time after time after time, I found myself allocating my own memory with setlength or resorting to uglier pchars during big concatenation loops. Not only was it annoying to escape the ansistring automated type, but the programmer had to be an expert on compilers in order to understand what tricks could be used to fool the compiler/reference counter/memory allocator into making the ansistring more performance friendly. In fact, even though pchars are hard to use, they are more predictable than ansistrings, since they are more crude and since ansistrings have so much magic going on in the background that you have to escape, which only compiler writers or extremely experienced programmers understand well.

Concatenation inside loops occurs extremely often in programming - and loops are probably the most prevalent bottleneck in all applications. I'm not even a performance freak myself - I'm against premature optimization - and I still had a significant need for capstring, because some ansistring concatenations inside loops literally took 5 hours to process if I didn't tune them using setlengths and string[i] indexing! A capstring allows you to do concatenations inside loops without allocating your memory with a setlength ahead of time - plus you don't have to access the string char by char as you do with string[i] indexing.

In the cases where one is concatenating hundreds or thousands of strings inside a loop, the ansistring or stringlist.text field becomes disproportionately slower as it grows (if every append reallocates and copies the whole string, the total work grows roughly quadratically rather than linearly), and escaping the automated type using setlength becomes extremely tedious, almost as tedious as using pchars. Most of the ansistring performance issues are due to constant memory allocations on the heap every time a new string is added to an existing ansistring.

Tuning ansistrings by using uniquestring and using pchar casts along with an initial setlength is tedious, time consuming, and far from automated. Programmers have better things to be doing every day. Some folks then end up trying string lists or arrays of chars, or arrays of strings - but what they really need is a capacitance based ansistring.

It is possible to develop a capacitance based stringlist too, but in many cases, why bother using a stringlist if an ansistring with a capacitance is more appropriate? I think there is both a need for a capacitance stringlist and a capacitance ansistring, and one does not always substitute the other.

The capstring data structure and algorithms could eventually be built into the Modern Pascal programming languages (and others, such as Java/C++/Ada/Etc.). A capstring as a record, struct, or class is not as beautiful as a built in type such as the ansistring that comes with many modern pascals. However, for now, a capstring isn't built into the Pascal language and we must make a compromise.

The notes you should know before using capstring provided below are:

Download

The capstring unit is available here with a demo.

Code example

program CapstrDemo;
{$mode objfpc} {$H+}
uses
  capstr;
var
  buf: TCapstr;
  i: integer; // loop counter
begin
  resetbuf(@buf); // always reset before using a new capstring
  for i:= ... do
  begin // in a large loop you will notice performance is consistent and predictable
    addstr('test123', @buf);
    addstr(' and testabc', @buf);
  end;
  endupdate(@buf); // we are done with data, must clean up
  writeln(buf.data); // can access data as read only after cleaning up
  readln;
end.
The above program is equivalent to doing this:
program AnsistringDemo;
{$mode objfpc} {$H+}
var
  s: ansistring;
  i: integer; // loop counter
begin
  s:= '';
  for i:= ... do
  begin // in a large loop you will notice this is very slow or unpredictable
    s:= s + 'test123' + ' and testabc';
  end;
  writeln(s); 
  readln;
end.
Sometimes the ansistring is extremely slow only when you are concatenating three or more ansistrings together in one shot (as opposed to just two). Sometimes the ansistring is slow only when you add two or more string constants to an ansistring (as opposed to just adding a single string constant). Sometimes the ansistring is fast, other times it is bizarrely slow. Because of this unpredictability of the ansistring, it is better to use a more predictable and consistent performance friendly type, which the capstring intends to be.

Why Not Just Use Pchars

Pchars can offer predictable performance if you grow them in chunks or allocate a large block of memory ahead of time. They are very crude and offer no automation. Allocating the memory is tedious with pchars, and it is easy to make mistakes with them. Pchars can be slow if you don't spend hours tuning them to fit the situation. Most people reinvent and roll their own capstring each time, without reusing important procedures or methods, i.e. most people keep track of the string growth in their head and grow the string in chunks. This is a waste of time, to put it bluntly.
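
To give an idea of the manual bookkeeping involved, here is a minimal sketch of growing a pchar in chunks with ReallocMem. The AddPiece procedure, the global variables, and the tiny ChunkSize are assumptions made for illustration; real code would also need error handling and a much larger chunk.

```pascal
program PCharChunkSketch;
{$mode objfpc}{$H+}
const
  ChunkSize = 16;  // illustrative only; real code would use a far larger chunk
var
  buf: PChar;
  cap: longint;    // bytes currently allocated
  used: longint;   // bytes written, not counting the terminating #0

procedure AddPiece(const s: ansistring);
var
  need: longint;
begin
  if s = '' then
    exit;
  need := used + Length(s) + 1;  // +1 leaves room for the terminating #0
  if need > cap then
  begin
    // Round the allocation up to a whole number of chunks
    cap := ((need + ChunkSize - 1) div ChunkSize) * ChunkSize;
    ReallocMem(buf, cap);
  end;
  Move(s[1], buf[used], Length(s));
  Inc(used, Length(s));
  buf[used] := #0;  // keep the buffer null-terminated after every add
end;

begin
  buf := nil;
  cap := 0;
  used := 0;
  AddPiece('test123');
  AddPiece(' and testabc');
  writeln(buf);  // test123 and testabc
  FreeMem(buf);
end.
```

Every detail here (the terminator, the capacity rounding, freeing the memory) has to be tracked by the programmer, which is the tedium this section describes.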

When to use Capstring

The capstring will not be very much faster than a manually tuned ansistring or manually tuned pchar. So why use it? Because programmers don't have time to tune pchars and ansistrings! Simple as that. The capstring will be predictable and fast in large concatenation loops and memory allocation is done automatically. The capstring will not be faster in simple loops - it will offer equal performance to an ansistring. In most software programs, loops are far from simple. Because of the capstring's predictability, it is a better type to use in large loops where you demand predictable performance without allocating your own memory.




