笔记列表
《提高C++性能的编程技术》笔记:总结
《提高C++性能的编程技术》笔记:跟踪
《提高C++性能的编程技术》笔记:构造函数、析构函数
《提高C++性能的编程技术》笔记:临时对象
《提高C++性能的编程技术》笔记:内存池(单线程、多线程)
《提高C++性能的编程技术》笔记:内联
《提高C++性能的编程技术》笔记:STL
《提高C++性能的编程技术》笔记:引用计数
《提高C++性能的编程技术》笔记:编码优化
《提高C++性能的编程技术》笔记:设计优化/可扩展性/系统体系结构
单线程内存池
频繁地分配和回收内存会严重地降低程序的性能。性能降低的原因在于默认的内存管理是通用的。应用程序可能会以某种特定的方式使用内存,并且为不需要的功能付出性能上的代价。通过开发专用的内存管理器可以解决这个问题。对专用内存管理器的设计可以从多个角度考虑。我们至少可以想到两个方面:大小和并发。
从大小的角度分为以下两种:
(1)、固定大小:分配固定大小内存块的内存管理器。
(2)、可变大小:分配任意大小内存块的内存管理器。所请求分配的大小事先是未知的。
类似的,从并发的角度也分为以下两种:
(1)、单线程:内存管理器局限在一个线程内。内存被一个线程使用,并且不越出该线程的界限。这种内存管理器不涉及相互访问的多线程。
(2)、多线程:内存管理器被多个线程并发地使用。这种实现需要包含互斥执行的代码段。无论什么时候,只能有一个线程在执行一个代码段。
全局函数new()和delete():从设计上来说,默认的内存管理器是通用的。当调用全局函数new()和delete()时,我们使用的正是默认内存管理器。这两个函数的实现不能作任何简化假设。它们在进程范围内管理内存。既然一个进程可能产生多个线程,new()和delete()就必须能够在多线程环境中运行。而且,每次请求分配的内存大小是不同的。这种灵活性以速度为代价。所做的计算越多,消耗的周期就越多。
通常情况下,客户端代码不需要全局函数new()和delete()的全部功能。它可能只(或通常)需要特定大小的内存块。客户端代码很可能在单线程环境中运行,在这种环境中并不真正需要默认的new()和delete()所提供的并发保护。这样的话,使用这两个函数的全部功能就会浪费CPU周期。通过调整内存分配策略来更好地匹配特定需求,可以明显地提高性能。
灵活性以速度的降低为代价.随着内存管理的功能和灵活性的增强,执行速度将降低.
全局内存管理器(由new()和delete()执行)是通用的,因此代价高。
专用内存管理器比全局内存管理器快一个数量级以上。
如果主要分配固定大小的内存块,专用的固定大小内存管理器将明显地提升性能。
如果主要分配限于单线程的内存块,那么内存管理器也会有类似的性能提高。由于省去了全局函数new()和delete()必须处理的并发问题,单线程内存管理器的性能有所提高。
以下是测试代码:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 #include <iostream> #include <chrono> #include <string> namespace single_threaded_memory_pool_ { class Rational1 {public : Rational1 (int a = 0 , int b = 1 ) : n (a), d (b) {} private : int n; int d; }; class NextOnFreeList {public : NextOnFreeList* next; }; class Rational2 {public : Rational2 (int a = 0 , int b = 1 ) : n (a),d (b) {} inline void * operator new (size_t size) ; inline void operator delete (void * doomed, size_t size) ; static void newMemPool () { expandTheFreeList (); } static void deleteMemPool () ; private : static NextOnFreeList* freeList; static void expandTheFreeList () ; enum { EXPANSION_SIZE = 32 }; int n; int d; }; NextOnFreeList* Rational2::freeList = nullptr ; inline void * Rational2::operator new (size_t size) { if (nullptr == freeList) { expandTheFreeList (); } NextOnFreeList* head = freeList; freeList = head->next; return head; } inline void Rational2::operator delete (void * doomed, size_t size) { NextOnFreeList* head = static_cast <NextOnFreeList*>(doomed); head->next = freeList; freeList = head; } void Rational2::expandTheFreeList () { size_t size = sizeof (Rational2) > sizeof (NextOnFreeList*) ? sizeof (Rational2) : sizeof (NextOnFreeList*); NextOnFreeList* runner = static_cast <NextOnFreeList*>((void *)new char [size]); freeList = runner; for (int i = 0 ; i < EXPANSION_SIZE; ++i) { runner->next = static_cast <NextOnFreeList*>((void *)new char [size]); runner = runner->next; } runner->next = nullptr ; } void Rational2::deleteMemPool () { NextOnFreeList* nextPtr; for (nextPtr = freeList; nextPtr != nullptr ; nextPtr = freeList) { freeList = freeList->next; delete [] nextPtr; } } template <class T >class MemoryPool1 {public : MemoryPool1 (size_t size = EXPANSION_SIZE); ~MemoryPool1 (); inline void * alloc (size_t size) ; inline void free (void * someElement) ; private : MemoryPool1<T>* next; enum { EXPANSION_SIZE = 32 }; void expandTheFreeList (int howMany = EXPANSION_SIZE) ; }; template <class T >MemoryPool1<T>::MemoryPool1 (size_t size) { expandTheFreeList (size); } template <class T >MemoryPool1<T>::~MemoryPool1 () { MemoryPool1<T>* nextPtr = next; for (nextPtr = next; nextPtr != nullptr ; nextPtr = next) { next = next->next; delete [] static_cast <char *>(static_cast <void *>(nextPtr)); } } template <class T >inline void * MemoryPool1<T>::alloc (size_t size){ if (!next) { expandTheFreeList (); } MemoryPool1<T>* head = next; next = head->next; return head; } template <class T >inline void MemoryPool1<T>::free (void * doomed){ MemoryPool1<T>* head = static_cast <MemoryPool1<T>*>(doomed); head->next = next; next = head; } template <class T >void MemoryPool1<T>::expandTheFreeList (int howMany){ size_t size = sizeof (T) > sizeof (MemoryPool1<T>*) ? sizeof (T) : sizeof (MemoryPool1<T>*); MemoryPool1<T>* runner = static_cast <MemoryPool1<T>*>((void *)(new char [size])); next = runner; for (int i = 0 ; i < howMany; ++i) { runner->next = static_cast <MemoryPool1<T>*>((void *)(new char [size])); runner = runner->next; } runner->next = nullptr ; } class Rational3 {public : Rational3 (int a = 0 , int b = 1 ) : n (a),d (b) {} void * operator new (size_t size) { return memPool->alloc (size); } void operator delete (void * doomed, size_t size) { memPool->free (doomed); } static void newMemPool () { memPool = new MemoryPool1<Rational3>; } static void deleteMemPool () { delete memPool; } private : int n; int d; static MemoryPool1<Rational3>* memPool; }; MemoryPool1<Rational3>* Rational3::memPool = nullptr ; class MemoryChunk {public : MemoryChunk (MemoryChunk* nextChunk, size_t chunkSize); ~MemoryChunk () { delete [] mem; } inline void * alloc (size_t size) ; inline void free (void * someElement) ; MemoryChunk* nextMemChunk () { return next; } size_t spaceAvailable () { return chunkSize - bytesAlreadyAllocated; } enum { DEFAULT_CHUNK_SIZE = 4096 }; private : MemoryChunk* next; void * mem; size_t chunkSize; size_t bytesAlreadyAllocated; }; MemoryChunk::MemoryChunk (MemoryChunk* nextChunk, size_t reqSize) { chunkSize = (reqSize > DEFAULT_CHUNK_SIZE) ? reqSize : DEFAULT_CHUNK_SIZE; next = nextChunk; bytesAlreadyAllocated = 0 ; mem = (void *)(new char [chunkSize]); } void * MemoryChunk::alloc (size_t requestSize) { void * addr = static_cast <void *>(static_cast <char *>(mem) + bytesAlreadyAllocated); bytesAlreadyAllocated += requestSize; return addr; } inline void MemoryChunk::free (void * doomed) {} class ByteMemoryPool {public : ByteMemoryPool (size_t initSize = MemoryChunk::DEFAULT_CHUNK_SIZE); ~ByteMemoryPool (); inline void * alloc (size_t size) ; inline void free (void * someElement) ; private : MemoryChunk* listOfMemoryChunks = nullptr ; void expandStorage (size_t reqSize) ; }; ByteMemoryPool::ByteMemoryPool (size_t initSize) { expandStorage (initSize); } ByteMemoryPool::~ByteMemoryPool () { MemoryChunk* memChunk = listOfMemoryChunks; while (memChunk) { listOfMemoryChunks = memChunk->nextMemChunk (); delete memChunk; memChunk = listOfMemoryChunks; } } void * ByteMemoryPool::alloc (size_t requestSize) { size_t space = listOfMemoryChunks->spaceAvailable (); if (space < requestSize) { expandStorage (requestSize); } return listOfMemoryChunks->alloc (requestSize); } inline void ByteMemoryPool::free (void * doomed) { listOfMemoryChunks->free (doomed); } void ByteMemoryPool::expandStorage (size_t reqSize) { listOfMemoryChunks = new MemoryChunk (listOfMemoryChunks, reqSize); } class Rational4 {public : Rational4 (int a = 0 , int b = 1 ) : n (a),d (b) {} void * operator new (size_t size) { return memPool->alloc (size); } void operator delete (void * doomed, size_t size) { memPool->free (doomed); } static void newMemPool () { memPool = new ByteMemoryPool; } static void deleteMemPool () { delete memPool; } private : int n; int d; static ByteMemoryPool* memPool; }; } using namespace single_threaded_memory_pool_;ByteMemoryPool* Rational4::memPool = nullptr ; int main () { using namespace std::chrono; high_resolution_clock::time_point time_start, time_end; const int cycle_number1{10000 }, cycle_number2{1000 }; { Rational1* array[cycle_number2]; time_start = high_resolution_clock::now (); for (int j =0 ; j < cycle_number1; ++j) { for (int i =0 ; i < cycle_number2; ++i) { array[i] = new Rational1 (i); } for (int i = 0 ; i < cycle_number2; ++i) { delete array[i]; } } time_end = high_resolution_clock::now (); fprintf (stdout, "global function new/delete time spend: %f seconds\n" ,(duration_cast<duration<double >>(time_end - time_start)).count ()); } { Rational2* array[cycle_number2]; time_start = high_resolution_clock::now (); Rational2::newMemPool (); for (int j = 0 ; j < cycle_number1; ++j) { for (int i = 0 ; i < cycle_number2; ++i) { array[i] = new Rational2 (i); } for (int i = 0 ; i < cycle_number2; ++i) { delete array[i]; } } Rational2::deleteMemPool (); time_end = high_resolution_clock::now (); fprintf (stdout, "specialized rational2 memory manager time spend: %f seconds\n" ,(duration_cast<duration<double >>(time_end - time_start)).count ()); } { Rational3* array[cycle_number2]; time_start = high_resolution_clock::now (); Rational3::newMemPool (); for (int j = 0 ; j < cycle_number1; ++j) { for (int i = 0 ; i < cycle_number2; ++i) { array[i] = new Rational3 (i); } for (int i = 0 ; i < cycle_number2; ++i) { delete array[i]; } } Rational3::deleteMemPool (); time_end = high_resolution_clock::now (); fprintf (stdout, "fixed-size object memory pool time spend: %f seconds\n" ,(duration_cast<duration<double >>(time_end - time_start)).count ()); } { Rational4* array[cycle_number2]; time_start = high_resolution_clock::now (); Rational4::newMemPool (); for (int j = 0 ; j < cycle_number1; ++j) { for (int i = 0 ; i < cycle_number2; ++i) { array[i] = new Rational4 (i); } for (int i = 0 ; i < cycle_number2; ++i) { delete array[i]; } } Rational4::deleteMemPool (); time_end = high_resolution_clock::now (); fprintf (stdout, "single-threaded variable-size memory manager time spend: %f seconds\n" ,(duration_cast<duration<double >>(time_end - time_start)).count ()); } return 0 ; }
利用 godbolt 执行,相应代码见 点击这里 测试结果如下:
1 2 3 4 global function new/delete time spend: 0.232326 seconds specialized rational2 memory manager time spend: 0.022057 seconds fixed-size object memory pool time spend: 0.043718 seconds single-threaded variable-size memory manager time spend: 0.045875 seconds
多线程内存池
为了使多个线程并发地分配和释放内存,必须在分配器方法中添加互斥锁。
全局内存管理器(通过new()和delete()实现)是通用的,因此它的开销也非常大。
因为单线程内存管理器要比多线程内存管理器快的多,所以如果要分配的大多数内存块限于单线程中使用,那么可以显著提升性能。
如果开发了一套有效的单线程分配器,那么通过模板可以方便地将它们扩展到多线程环境中。
以下是测试代码:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 #include <iostream> #include <chrono> #include <string> #include <mutex> namespace multi_threaded_memory_pool_ { class Rational1 {public : Rational1 (int a = 0 , int b = 1 ) : n (a), d (b) {} private : int n; int d; }; class MemoryChunk {public : MemoryChunk (MemoryChunk* nextChunk, size_t chunkSize); ~MemoryChunk () { delete [] mem; } inline void * alloc (size_t size) ; inline void free (void * someElement) ; MemoryChunk* nextMemChunk () { return next; } size_t spaceAvailable () { return chunkSize - bytesAlreadyAllocated; } enum { DEFAULT_CHUNK_SIZE = 4096 }; private : MemoryChunk* next; void * mem; size_t chunkSize; size_t bytesAlreadyAllocated; }; MemoryChunk::MemoryChunk (MemoryChunk* nextChunk, size_t reqSize) { chunkSize = (reqSize > DEFAULT_CHUNK_SIZE) ? reqSize : DEFAULT_CHUNK_SIZE; next = nextChunk; bytesAlreadyAllocated = 0 ; mem = new char [chunkSize]; } void * MemoryChunk::alloc (size_t requestSize) { void * addr = static_cast <void *>(static_cast <char *>(mem) + bytesAlreadyAllocated); bytesAlreadyAllocated += requestSize; return addr; } inline void MemoryChunk::free (void * doomed) {} class ByteMemoryPool {public : ByteMemoryPool (size_t initSize = MemoryChunk::DEFAULT_CHUNK_SIZE); ~ByteMemoryPool (); inline void * alloc (size_t size) ; inline void free (void * someElement) ; private : MemoryChunk* listOfMemoryChunks = nullptr ; void expandStorage (size_t reqSize) ; }; ByteMemoryPool::ByteMemoryPool (size_t initSize) { expandStorage (initSize); } ByteMemoryPool::~ByteMemoryPool () { MemoryChunk* memChunk = listOfMemoryChunks; while (memChunk) { listOfMemoryChunks = memChunk->nextMemChunk (); delete memChunk; memChunk = listOfMemoryChunks; } } void * ByteMemoryPool::alloc (size_t requestSize) { size_t space = listOfMemoryChunks->spaceAvailable (); if (space < requestSize) { expandStorage (requestSize); } return listOfMemoryChunks->alloc (requestSize); } inline void ByteMemoryPool::free (void * doomed) { listOfMemoryChunks->free (doomed); } void ByteMemoryPool::expandStorage (size_t reqSize) { listOfMemoryChunks = new MemoryChunk (listOfMemoryChunks, reqSize); } template <class POOLTYPE , class LOCK >class MTMemoryPool {public : inline void * alloc (size_t size) ; inline void free (void * someElement) ; private : POOLTYPE stPool; LOCK theLock; }; template <class M , class L >inline void * MTMemoryPool<M, L>::alloc (size_t size){ void * mem; theLock.lock (); mem = stPool.alloc (size); theLock.unlock (); return mem; } template <class M , class L >inline void MTMemoryPool<M, L>::free (void * doomed){ theLock.lock (); stPool.free (doomed); theLock.unlock (); } class ABCLock { public : virtual ~ABCLock () {} virtual void lock () = 0 ; virtual void unlock () = 0 ; }; class MutexLock : public ABCLock {public : MutexLock () {} ~MutexLock () {} inline void lock () { mtx.lock (); } inline void unlock () { mtx.unlock (); } private : std::mutex mtx; }; class Rational2 {public : Rational2 (int a = 0 , int b = 1 ) : n (a),d (b) {} void * operator new (size_t size) { return memPool->alloc (size); } void operator delete (void * doomed, size_t size) { memPool->free (doomed); } static void newMemPool () { memPool = new MTMemoryPool<ByteMemoryPool, MutexLock>; } static void deleteMemPool () { delete memPool; } private : int n; int d; static MTMemoryPool<ByteMemoryPool, MutexLock>* memPool; }; MTMemoryPool<ByteMemoryPool, MutexLock>* Rational2::memPool = nullptr ; class DummyLock : public ABCLock {public : inline void lock () {} inline void unlock () {} }; class Rational3 {public : Rational3 (int a = 0 , int b = 1 ) : n (a),d (b) {} void * operator new (size_t size) { return memPool->alloc (size); } void operator delete (void * doomed, size_t size) { memPool->free (doomed); } static void newMemPool () { memPool = new MTMemoryPool<ByteMemoryPool, DummyLock>; } static void deleteMemPool () { delete memPool; } private : int n; int d; static MTMemoryPool<ByteMemoryPool, DummyLock>* memPool; }; } using namespace multi_threaded_memory_pool_;MTMemoryPool<ByteMemoryPool, DummyLock>* Rational3::memPool = nullptr ; int main () { using namespace std::chrono; high_resolution_clock::time_point time_start, time_end; const int cycle_number1{10000 }, cycle_number2{1000 }; { Rational1* array[cycle_number2]; time_start = high_resolution_clock::now (); for (int j =0 ; j < cycle_number1; ++j) { for (int i =0 ; i < cycle_number2; ++i) { array[i] = new Rational1 (i); } for (int i = 0 ; i < cycle_number2; ++i) { delete array[i]; } } time_end = high_resolution_clock::now (); fprintf (stdout, "global function new/delete time spend: %f seconds\n" ,(duration_cast<duration<double >>(time_end - time_start)).count ()); } { Rational2* array[cycle_number2]; time_start = high_resolution_clock::now (); Rational2::newMemPool (); for (int j = 0 ; j < cycle_number1; ++j) { for (int i = 0 ; i < cycle_number2; ++i) { array[i] = new Rational2 (i); } for (int i = 0 ; i < cycle_number2; ++i) { delete array[i]; } } Rational2::deleteMemPool (); time_end = high_resolution_clock::now (); fprintf (stdout, "multi-threaded variable-size memory manager time spend: %f seconds\n" ,(duration_cast<duration<double >>(time_end - time_start)).count ()); } { Rational3* array[cycle_number2]; time_start = high_resolution_clock::now (); Rational3::newMemPool (); for (int j = 0 ; j < cycle_number1; ++j) { for (int i = 0 ; i < cycle_number2; ++i) { array[i] = new Rational3 (i); } for (int i = 0 ; i < cycle_number2; ++i) { delete array[i]; } } Rational3::deleteMemPool (); time_end = high_resolution_clock::now (); fprintf (stdout, "multi-threaded variable-size memory manager in single-threaded environment time spend: %f seconds\n" ,(duration_cast<duration<double >>(time_end - time_start)).count ()); } return 0 ; }
利用 godbolt 执行,相应代码见 点击这里 测试结果如下:
1 2 3 global function new/delete time spend: 0.192321 seconds multi-threaded variable-size memory manager time spend: 0.083261 seconds multi-threaded variable-size memory manager in single-threaded environment time spend: 0.028608 seconds