TBBMM - A Multicore Scalable Delphi and C++ Builder Memory Manager Replacement

FastMM was adopted by Borland / CodeGear (now Embarcadero) at the end of 2005 as the default memory manager for Delphi and C++ Builder.

FastMM is fast, but...

FastMM does not scale well in multithreaded programs, especially on machines with more than one core. This is quite surprising, considering that one of CodeGear's main selling points for the new memory manager was its good scalability in multithreaded environments.

To FastMM's credit, however (and I have tremendous respect for Pierre le Riche, the author of FastMM), it is about 10% faster than Microsoft Visual Studio's built-in memory manager in single-threaded use.

I came across this while doing something as simple as concatenating strings in multiple threads, trying to simulate a server that processes requests from multiple clients. Reducing the problem to minimal code, I came up with this:

void __fastcall TAnsiStringTesterThread::Execute()
{
    AnsiString str;
    for (int i = 0; i < 10000000; i++)
        str = " something ";  // each assignment goes through LStrFromPCharLen, which calls GetMem
}

procedure TAnsiStringTesterThread.Execute;
var
  i: Integer;
  str: string;
begin
  for i := 0 to 10000000 - 1 do
    str := IntToStr(10000000);  // IntToStr returns a newly allocated string on every call
end;

After a lengthy investigation, tracing into the RTL, I found that the devil lies in LStrFromPCharLen, which eventually calls GetMem to allocate memory for the AnsiString. The problem is more severe in C++ Builder, because every AnsiString created from a C string goes through LStrFromPCharLen. In Delphi, a string literal is already an AnsiString (simply called string in Delphi), so assigning one needs no allocation; that is why the Delphi test calls IntToStr, which forces a fresh allocation on every iteration.

Whether you use Microsoft Visual C++ or C++ Builder, their std::string implementations suffer from the same scalability problem: replace AnsiString with std::string and the problem persists. The Microsoft Visual C++ memory manager does scale better, though (~90% scaling vs. FastMM's ~0%).

These days, with both Intel and AMD so heavily backing multi/many-core CPUs as the future, we need a memory manager which is both fast and scalable.

Luckily for CodeGear users, the flexible memory manager design lets us swap out the default for something better. (With Microsoft Visual C++, you would need to patch its RTL at run time, almost like a virus infecting an EXE/DLL.)

The Solution - TBBMM

Meet TBBMM, built on top of Intel's Threading Building Blocks (TBB):

[Figure: TBBMM vs FastMM performance scaling]

Intel's Threading Building Blocks (TBB) source code and binaries are available for download under the GPL v2 with the runtime exception.

The benchmark was run with the following thread function:
#include <windows.h>

DWORD WINAPI threadExecute(LPVOID obj)
{
    for (int i = 0; i < 10000000; i++) {
        char* test = new char[100];  // small allocation, freed immediately
        delete[] test;
    }
    return 0;
}

Download TBBMM

Latest Version: 09-09-2010 (Updated on 09 September 2010)

Download TBBMM here: BorlndMM-09-09-2010.zip (98.94 kB)

Installation: close RAD Studio and back up the existing BorlndMM.dll in your CodeGear RAD Studio Bin folder, then unzip the archive into the Bin folder.

For support, visit the TBBMM Forum.


By downloading, you agree to the following terms and conditions:
  • TBBMM.dll and BorlndMM.dll contained in the archive can be freely distributed
  • You agree to the terms of Intel's Threading Building Blocks license (GPL v2 with the runtime exception)
  • Other standard software EULA terms, such as this one, apply

Change History

Version 09-09-2010
  • BorlndMM.dll reverted to a plain wrapper around TBBMM.dll.
  • TBB memory allocator v2.2 and v3.0 (commercial-aligned release, Update 2) are included in the archive.
  • Starting with v2.2, the TBB memory allocator returns freed memory to the OS, but it still uses more memory than FastMM4 and WinHeap.
Version 2.1.2008.605 - 22-11-2008
  • Solved more incompatibility issues when running with CodeGuard.
Version 2.1.2008.605 - 19-11-2008
  • With the previous version, CodeGuard complained about 'Bad Parameters' when enabled with TBBMM.
  • Padding GetMem calls by 4 bytes seems to have solved it.
Version 2.1.2008.605 - 18-11-2008
  • As TBB's allocator only has bins up to 8064 bytes, anything larger falls back to the Microsoft CRT's malloc / free.
  • The MS CRT's malloc / free are painfully slow for medium sizes, roughly 20 times slower than FastMM.
  • This version falls back to FastMM instead of the MS CRT.
  • TBBMM.dll is now standalone (no longer requires MSVCR90.dll and TBBMALLOC.dll)
Version 2.1.2008.605 - 16-11-2008
  • Pilot Release