  INTERNAL_SIZE_T      mchunk_prev_size;  /* Size of previous chunk (if free).  */
  INTERNAL_SIZE_T      mchunk_size;       /* Size in bytes, including overhead. */

  struct malloc_chunk* fd;                /* double links -- used only if free. */
  struct malloc_chunk* bk;

  /* Only used for large blocks: pointer to next larger size.  */
  struct malloc_chunk* fd_nextsize;       /* double links -- used only if free. */
  struct malloc_chunk* bk_nextsize;
};
/* size field is or'ed with NON_MAIN_ARENA if the chunk was obtained
   from a non-main arena.  This is only set immediately before handing
   the chunk to the user, if necessary.  */
#define NON_MAIN_ARENA 0x4
/* Bits to mask off when extracting size

   Note: IS_MMAPPED is intentionally not masked off from size field in
   macros for which mmapped chunks should never be seen.  This should
   cause helpful core dumps to occur if it is tried by accident by
   people extending or adapting this malloc.  */
#define SIZE_BITS (PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)
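To make the flag layout concrete, here is a small self-contained sketch. The values follow the standard glibc definitions (PREV_INUSE = 0x1, IS_MMAPPED = 0x2, NON_MAIN_ARENA = 0x4); the sample size field is made up.

```c
/* Extracting the real size and the A/M/P bits from a raw size field. */
#include <stdio.h>

#define PREV_INUSE     0x1
#define IS_MMAPPED     0x2
#define NON_MAIN_ARENA 0x4
#define SIZE_BITS      (PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)

int main(void) {
  unsigned long mchunk_size = 0x91 | NON_MAIN_ARENA; /* 0x90-byte chunk, P and A set */
  printf("size       = %#lx\n", mchunk_size & ~SIZE_BITS);           /* 0x90 */
  printf("prev_inuse = %lu\n",  mchunk_size & PREV_INUSE);            /* 1 */
  printf("is_mmapped = %lu\n",  mchunk_size & IS_MMAPPED);            /* 0 */
  printf("non_main   = %lu\n", (mchunk_size & NON_MAIN_ARENA) >> 2);  /* 1 */
  return 0;
}
```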
/*
   malloc_chunk details:

    (The following includes lightly edited explanations by Colin Plumb.)

    Chunks of memory are maintained using a `boundary tag' method as
    described in e.g., Knuth or Standish.  (See the paper by Paul
    Wilson ftp://ftp.cs.utexas.edu/pub/garbage/allocsrv.ps for a
    survey of such techniques.)  Sizes of free chunks are stored both
    in the front of each chunk and at the end.  This makes
    consolidating fragmented chunks into bigger chunks very fast.  The
    size fields also hold bits representing whether chunks are free or
    in use.

    An allocated chunk looks like this:

    chunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Size of previous chunk, if unallocated (P clear)  |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Size of chunk, in bytes                     |A|M|P|
      mem-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             User data starts here...                          .
            .                                                               .
            .             (malloc_usable_size() bytes)                      .
            .                                                               |
nextchunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             (size of chunk, but used for application data)    |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Size of next chunk, in bytes                |A|0|1|
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    Where "chunk" is the front of the chunk for the purpose of most of
    the malloc code, but "mem" is the pointer that is returned to the
    user.  "Nextchunk" is the beginning of the next contiguous chunk.

    Chunks always begin on even word boundaries, so the mem portion
    (which is returned to the user) is also on an even word boundary, and
    thus at least double-word aligned.

    Free chunks are stored in circular doubly-linked lists, and look like this:

    chunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Size of previous chunk, if unallocated (P clear)  |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    `head:' |             Size of chunk, in bytes                     |A|0|P|
      mem-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Forward pointer to next chunk in list             |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Back pointer to previous chunk in list            |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Unused space (may be 0 bytes long)                .
            .                                                               .
            .                                                               |
nextchunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    `foot:' |             Size of chunk, in bytes                           |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Size of next chunk, in bytes                |A|0|0|
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    The P (PREV_INUSE) bit, stored in the unused low-order bit of the
    chunk size (which is always a multiple of two words), is an in-use
    bit for the *previous* chunk.  If that bit is *clear*, then the
    word before the current chunk size contains the previous chunk
    size, and can be used to find the front of the previous chunk.
    The very first chunk allocated always has this bit set,
    preventing access to non-existent (or non-owned) memory. If
    prev_inuse is set for any given chunk, then you CANNOT determine
    the size of the previous chunk, and might even get a memory
    addressing fault when trying to do so.

    The A (NON_MAIN_ARENA) bit is cleared for chunks on the initial,
    main arena, described by the main_arena variable.  When additional
    threads are spawned, each thread receives its own arena (up to a
    configurable limit, after which arenas are reused for multiple
    threads), and the chunks in these arenas have the A bit set.  To
    find the arena for a chunk on such a non-main arena, heap_for_ptr
    performs a bit mask operation and indirection through the ar_ptr
    member of the per-heap header heap_info (see arena.c).

    Note that the `foot' of the current chunk is actually represented
    as the prev_size of the NEXT chunk. This makes it easier to
    deal with alignments etc but can be very confusing when trying
    to extend or adapt this code.

    The three exceptions to all this are:

     1. The special chunk `top' doesn't bother using the
        trailing size field since there is no next contiguous chunk
        that would have to index off it. After initialization, `top'
        is forced to always exist.  If it would become less than
        MINSIZE bytes long, it is replenished.

     2. Chunks allocated via mmap, which have the second-lowest-order
        bit M (IS_MMAPPED) set in their size fields.  Because they are
        allocated one-by-one, each must contain its own trailing size
        field.  If the M bit is set, the other bits are ignored
        (because mmapped chunks are neither in an arena, nor adjacent
        to a freed chunk).  The M bit is also used for chunks which
        originally came from a dumped heap via malloc_set_state in
        hooks.c.

     3. Chunks in fastbins are treated as allocated chunks from the
        point of view of the chunk allocator.  They are consolidated
        with their neighbors only in bulk, in malloc_consolidate.
 */
/* ---------- Size and alignment checks and conversions ---------- */
/* Conversion from malloc headers to user pointers, and back.  When
   using memory tagging the user data and the malloc data structure
   headers have distinct tags.  Converting fully from one to the other
   involves extracting the tag at the other address and creating a
   suitable pointer using it.  That can be quite expensive.  There are
   cases when the pointers are not dereferenced (for example only used
   for alignment check) so the tags are not relevant, and there are
   cases when user data is not tagged distinctly from malloc headers
   (user data is untagged because tagging is done late in malloc and
   early in free).  User memory tagging across internal interfaces:

      sysmalloc: Returns untagged memory.
      _int_malloc: Returns untagged memory.
      _int_free: Takes untagged memory.
      _int_memalign: Returns untagged memory.
      _mid_memalign: Returns tagged memory.
      _int_realloc: Takes and returns tagged memory.
*/
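Setting the tagging details aside, the conversion itself is just a fixed offset of two size words. Here is a minimal sketch of the untagged chunk2mem/mem2chunk form; the tagged variants in glibc additionally rewrite the pointer's tag.

```c
/* Minimal illustration: the user pointer sits CHUNK_HDR_SZ (= 2 * SIZE_SZ)
   bytes past the chunk header; going back is the same offset in reverse. */
#include <stdio.h>
#include <stddef.h>

#define SIZE_SZ      sizeof (size_t)
#define CHUNK_HDR_SZ (2 * SIZE_SZ)

#define chunk2mem(p)   ((void *) ((char *) (p)   + CHUNK_HDR_SZ))
#define mem2chunk(mem) ((void *) ((char *) (mem) - CHUNK_HDR_SZ))

int main(void) {
  unsigned char fake_chunk[64];            /* pretend this is a chunk */
  void *mem = chunk2mem(fake_chunk);
  printf("mem - chunk = %td bytes, round trip ok = %d\n",
         (char *) mem - (char *) fake_chunk,
         mem2chunk(mem) == (void *) fake_chunk);
  return 0;
}
```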
  /* Set if the fastbin chunks contain recently inserted free blocks.  */
  /* Note this is a bool but not all targets support atomics on booleans.  */
  int have_fastchunks;

  /* Fastbins */
  mfastbinptr fastbinsY[NFASTBINS];

  /* Base of the topmost chunk -- not otherwise kept in a bin */
  mchunkptr top;

  /* The remainder from the most recent split of a small request */
  mchunkptr last_remainder;

  /* Normal bins packed as described above */
  mchunkptr bins[NBINS * 2 - 2];

  /* Bitmap of bins */
  unsigned int binmap[BINMAPSIZE];

  /* Linked list */
  struct malloc_state *next;

  /* Linked list for free arenas.  Access to this field is serialized
     by free_list_lock in arena.c.  */
  struct malloc_state *next_free;

  /* Number of threads attached to this arena.  0 if the arena is on
     the free list.  Access to this field is serialized by
     free_list_lock in arena.c.  */
  INTERNAL_SIZE_T attached_threads;

  /* Memory allocated from the system in this arena.  */
  INTERNAL_SIZE_T system_mem;
  INTERNAL_SIZE_T max_system_mem;
};
In the source above, the flags field records status bits for the arena, for example whether the arena currently has fastbin chunks and whether its memory is contiguous; we can set it aside for now.
The fields defined after it are the more important ones. The two key arrays are fastbinsY[] and bins[]: fastbinsY has NFASTBINS entries, which the macro definitions work out to 10.
In the bins array, the first bin is the unsorted bin (1 bin), bins 2 through 63 are the small bins (62 bins), and bins 64 through 126 are the large bins (63 bins).
At this point a question arises: the macros define NBINS as 128, so bins has NBINS * 2 - 2 = 254 entries, yet the bins counted above total only 126. Why do they add up to barely half of the array?
The answer is that the unsorted bin, small bins, and large bins are all circular doubly-linked lists, so each bin consumes two adjacent array entries, which act as the fd and bk pointers of that bin's list head (a fake chunk header).
So a more accurate way to put it is: bins is organized as pairs of elements, where pair 1 is the unsorted bin, pairs 2 through 63 are the small bins, and pairs 64 through 126 are the large bins. The binmap array is a bitmap in which each bit records whether the corresponding bin may contain free chunks.
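A runnable sketch of this pairing, modeled on glibc's bin_at macro (the standalone bin_at re-implementation and the fake_state type here are illustrative, not the glibc code):

```c
#include <stddef.h>
#include <stdio.h>

typedef struct malloc_chunk {
  size_t prev_size, size;
  struct malloc_chunk *fd, *bk;
} *mchunkptr;

#define NBINS 128

/* The two pointer fields before bins[] mirror top/last_remainder in
   malloc_state; bin_at relies on them so the fake chunk header it
   computes stays inside the struct. */
struct fake_state {
  mchunkptr top;
  mchunkptr last_remainder;
  mchunkptr bins[NBINS * 2 - 2];
};

/* bin_at(i): pretend &bins[(i-1)*2] is the fd field of a chunk, so that
   bin->fd and bin->bk overlap two consecutive bins[] slots. */
static mchunkptr bin_at(struct fake_state *m, int i) {
  return (mchunkptr) ((char *) &m->bins[(i - 1) * 2]
                      - offsetof(struct malloc_chunk, fd));
}

int main(void) {
  struct fake_state m;
  mchunkptr unsorted = bin_at(&m, 1);       /* bin 1 is the unsorted bin */
  unsorted->fd = unsorted->bk = unsorted;   /* an empty bin points at itself */
  printf("fd slot is bins[0]? %d\n", &unsorted->fd == &m.bins[0]);
  printf("bk slot is bins[1]? %d\n", &unsorted->bk == &m.bins[1]);
  return 0;
}
```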
/*
   Indexing

    Bins for sizes < 512 bytes contain chunks of all the same size, spaced
    8 bytes apart.  Larger bins are approximately logarithmically spaced:

    64 bins of size       8
    32 bins of size      64
    16 bins of size     512
     8 bins of size    4096
     4 bins of size   32768
     2 bins of size  262144
     1 bin  of size what's left

    There is actually a little bit of slop in the numbers in bin_index
    for the sake of speed.  This makes no difference elsewhere.

    The bins top out around 1MB because we expect to service large
    requests via mmap.

    Bin 0 does not exist.  Bin 1 is the unordered list; if that would be
    a valid chunk size the small bins are bumped up one.
 */
small bin: the small bin index for a request of size sz is determined by the relation chunk_size = MALLOC_ALIGNMENT * idx, where MALLOC_ALIGNMENT = 2 * SIZE_SZ. Consequently the largest small-bin chunk is 504 bytes on 32-bit and 1008 bytes on 64-bit.
large bin: there are 63 large bins in total. Chunks in a given large bin are not all the same size; each bin covers a range of sizes. The 63 bins are divided into 6 groups, and within each group the spacing between consecutive bins is constant.
Within a large bin, chunks are kept sorted from largest to smallest, so the bin header's bk pointer points at the smallest chunk.
The unsorted bin and small bins operate FIFO (first in, first out); the rest operate FILO (i.e., LIFO).
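To connect the indexing table above with the small bin / large bin description, here is a runnable sketch assuming a 64-bit layout (SIZE_SZ = 8, small bins spaced 16 bytes apart). The functions are rewrites of glibc's smallbin_index and largebin_index_64 macros as plain functions, so treat the exact constants as illustrative.

```c
#include <stdio.h>

static unsigned smallbin_index(size_t sz) {
  return (unsigned) (sz >> 4);              /* chunk size / 16 */
}

static unsigned largebin_index_64(size_t sz) {
  return ((sz >> 6)  <= 48) ? 48  + (unsigned) (sz >> 6)  :  /* 32 bins, 64 apart     */
         ((sz >> 9)  <= 20) ? 91  + (unsigned) (sz >> 9)  :  /* 16 bins, 512 apart    */
         ((sz >> 12) <= 10) ? 110 + (unsigned) (sz >> 12) :  /*  8 bins, 4096 apart   */
         ((sz >> 15) <= 4)  ? 119 + (unsigned) (sz >> 15) :  /*  4 bins, 32768 apart  */
         ((sz >> 18) <= 2)  ? 124 + (unsigned) (sz >> 18) :  /*  2 bins, 262144 apart */
         126;                                                /* everything larger     */
}

int main(void) {
  printf("chunk 0x20  -> small bin %u\n", smallbin_index(0x20));     /* 2  */
  printf("chunk 0x3f0 -> small bin %u\n", smallbin_index(0x3f0));    /* 63, last small bin  */
  printf("chunk 0x400 -> large bin %u\n", largebin_index_64(0x400)); /* 64, first large bin */
  printf("chunk 0x10000 -> large bin %u\n", largebin_index_64(0x10000));
  return 0;
}
```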
_heap_info
typedef struct _heap_info
{
  mstate ar_ptr;            /* Arena for this heap. */
  struct _heap_info *prev;  /* Previous heap. */
  size_t size;              /* Current size in bytes. */
  size_t mprotect_size;     /* Size in bytes that has been mprotected
                               PROT_READ|PROT_WRITE.  */
  size_t pagesize;          /* Page size used when allocating the arena.  */
  /* Make sure the following data is properly aligned, particularly
     that sizeof (heap_info) + 2 * SIZE_SZ is a multiple of
     MALLOC_ALIGNMENT. */
  char pad[-3 * SIZE_SZ & MALLOC_ALIGN_MASK];
} heap_info;
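This is also where the heap_for_ptr trick mentioned earlier comes in: sub-heaps of non-main arenas are mmapped aligned to the heap size, so masking a chunk address recovers the heap_info at the start of its heap, and ar_ptr then gives the owning arena. A minimal sketch, assuming the classic fixed HEAP_MAX_SIZE layout (newer glibc versions size heaps dynamically); the demo addresses are invented.

```c
#include <stdio.h>
#include <stdint.h>

typedef struct _heap_info {
  void *ar_ptr;                  /* stand-in for "mstate ar_ptr" */
  /* ... remaining fields omitted ... */
} heap_info;

#define HEAP_MAX_SIZE (64 * 1024 * 1024UL)   /* illustrative value */

/* Mask off the low bits of any pointer inside the heap to land on the
   heap_info header at its (HEAP_MAX_SIZE-aligned) start. */
#define heap_for_ptr(ptr) \
  ((heap_info *) ((uintptr_t) (ptr) & ~(HEAP_MAX_SIZE - 1)))

int main(void) {
  uintptr_t heap_base = 0x7f0004000000UL;    /* multiple of HEAP_MAX_SIZE */
  uintptr_t chunk     = heap_base + 0x12340; /* some chunk inside that heap */
  printf("heap_for_ptr(chunk) == heap_base?  %d\n",
         (uintptr_t) heap_for_ptr(chunk) == heap_base);
  return 0;
}
```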
#if USE_TCACHE
  /* int_free also calls request2size, be careful to not pad twice.  */
  size_t tbytes;
  if (!checked_request2size (bytes, &tbytes))
    {
      __set_errno (ENOMEM);
      return NULL;
    }
  size_t tc_idx = csize2tidx (tbytes);
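csize2tidx maps an internal chunk size to a tcache bin index. A sketch with the usual 64-bit constants (MALLOC_ALIGNMENT = 16, MINSIZE = 0x20); the formula mirrors the csize2tidx macro in malloc.c.

```c
#include <stdio.h>

#define MALLOC_ALIGNMENT 16
#define MINSIZE          0x20
#define csize2tidx(x) (((x) - MINSIZE + MALLOC_ALIGNMENT - 1) / MALLOC_ALIGNMENT)

int main(void) {
  printf("%d\n", (int) csize2tidx(0x20));   /* 0  -- smallest chunk             */
  printf("%d\n", (int) csize2tidx(0x30));   /* 1                                */
  printf("%d\n", (int) csize2tidx(0x410));  /* 63 -- last tcache bin by default */
  return 0;
}
```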
if (ar_ptr != NULL) __libc_lock_unlock (ar_ptr->mutex);
  /* In a low memory situation, we may not be able to allocate memory
     - in which case, we just keep trying later.  However, we
     typically do this very early, so either there is sufficient
     memory, or there isn't enough memory to do non-trivial
     allocations anyway.  */
  if (victim)
    {
      tcache = (tcache_perthread_struct *) victim;
      memset (tcache, 0, sizeof (tcache_perthread_struct));
    }
}

# define MAYBE_INIT_TCACHE() \
  if (__glibc_unlikely (tcache == NULL)) \
    tcache_init();
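For orientation, here is a stripped-down model (my own simplification, not glibc's code) of the per-thread structure that tcache_init allocates: one singly linked list per size class plus a counter that the tcache_count limit is checked against. Real glibc additionally XOR-encodes the next pointers (safe-linking) and stores a key field to catch double frees.

```c
#include <stdio.h>
#include <stddef.h>

#define TCACHE_MAX_BINS 64

typedef struct tcache_entry { struct tcache_entry *next; } tcache_entry;

typedef struct {
  unsigned short counts[TCACHE_MAX_BINS];
  tcache_entry *entries[TCACHE_MAX_BINS];
} tcache_model;

static void put(tcache_model *tc, void *mem, size_t idx) {
  tcache_entry *e = mem;          /* the link lives inside the freed user data */
  e->next = tc->entries[idx];
  tc->entries[idx] = e;
  ++tc->counts[idx];
}

static void *get(tcache_model *tc, size_t idx) {
  tcache_entry *e = tc->entries[idx];
  if (e == NULL)
    return NULL;
  tc->entries[idx] = e->next;
  --tc->counts[idx];
  return e;
}

int main(void) {
  static tcache_model tc;
  char block[32];                 /* stand-in for a freed allocation */
  put(&tc, block, 0);
  printf("cached %p, got back %p\n", (void *) block, get(&tc, 0));
  return 0;
}
```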
/*
   Binmap

    To help compensate for the large number of bins, a one-level index
    structure is used for bin-by-bin searching.  `binmap' is a
    bitvector recording whether bins are definitely empty so they can
    be skipped over during traversals.  The bits are NOT always
    cleared as soon as bins are empty, but instead only
    when they are noticed to be empty during traversal in malloc.
 */
/* Conservatively use 32 bits per map word, even if on 64bit system */
#define BINMAPSHIFT      5
#define BITSPERMAP       (1U << BINMAPSHIFT)
#define BINMAPSIZE       (NBINS / BITSPERMAP)
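The companion helpers split a bin index into a 32-bit map word and a bit within it. A small self-contained sketch; the idx2block/idx2bit expressions follow the helper macros that accompany these definitions in malloc.c, reproduced from memory, so treat them as illustrative.

```c
#include <stdio.h>

#define NBINS        128
#define BINMAPSHIFT  5
#define BITSPERMAP   (1U << BINMAPSHIFT)
#define BINMAPSIZE   (NBINS / BITSPERMAP)

#define idx2block(i) ((i) >> BINMAPSHIFT)                       /* which map word */
#define idx2bit(i)   (1U << ((i) & ((1U << BINMAPSHIFT) - 1)))  /* which bit      */

int main(void) {
  unsigned int binmap[BINMAPSIZE] = { 0 };
  int idx = 64;                                   /* first large bin */
  binmap[idx2block(idx)] |= idx2bit(idx);         /* mark_bin        */
  printf("bin %d marked? %s\n", idx,
         (binmap[idx2block(idx)] & idx2bit(idx)) ? "yes" : "no");
  return 0;
}
```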
/*
   Convert request size to internal form by adding SIZE_SZ bytes
   overhead plus possibly more to obtain necessary alignment and/or
   to obtain a size of at least MINSIZE, the smallest allocatable
   size.  Also, checked_request2size returns false for request sizes
   that are so large that they wrap around zero when padded and
   aligned.
 */
  if (!checked_request2size (bytes, &nb))
    {
      __set_errno (ENOMEM);
      return NULL;
    }
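A sketch of what this conversion does, with typical 64-bit constants (SIZE_SZ = 8, MALLOC_ALIGN_MASK = 15, MINSIZE = 0x20); my_checked_request2size is a simplified stand-in, not the glibc inline function.

```c
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

#define SIZE_SZ           8
#define MALLOC_ALIGN_MASK 15
#define MINSIZE           0x20

/* Pad with SIZE_SZ bytes of overhead, round up to the alignment, and
   never go below MINSIZE. */
#define request2size(req)                                \
  (((req) + SIZE_SZ + MALLOC_ALIGN_MASK < MINSIZE) ?     \
   MINSIZE :                                             \
   ((req) + SIZE_SZ + MALLOC_ALIGN_MASK) & ~MALLOC_ALIGN_MASK)

static bool my_checked_request2size(size_t req, size_t *sz) {
  if (req > PTRDIFF_MAX)          /* would wrap around once padded/aligned */
    return false;
  *sz = request2size(req);
  return true;
}

int main(void) {
  size_t nb;
  my_checked_request2size(24, &nb);
  printf("malloc(24) -> chunk of %#zx bytes\n", nb);   /* 0x20 */
  my_checked_request2size(25, &nb);
  printf("malloc(25) -> chunk of %#zx bytes\n", nb);   /* 0x30 */
  return 0;
}
```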
  /* There are no usable arenas.  Fall back to sysmalloc to get a chunk from
     mmap.  */
  if (__glibc_unlikely (av == NULL))
    {
      void *p = sysmalloc (nb, av);
      if (p != NULL)
        alloc_perturb (p, bytes);
      return p;
    }
In the malloc implementation, chunks are constantly being inserted into and removed from the various bins, and many of these code paths rely on CAS operations. Because they show up so often, here is a brief introduction first.
CAS is short for compare and swap. It is an atomic primitive used in multithreaded programming to perform an uninterruptible read-modify-write: it compares the value at a memory location against an expected value and, only if they match, replaces it with a new value. This avoids the inconsistencies that arise when several threads update the same data under unpredictable scheduling and interrupts.
CAS takes three operands: a memory address, the expected old value, and the new value to store. When the instruction executes, the memory location is updated to the new value if and only if its current contents equal the expected value; otherwise nothing happens. The whole compare-and-replace sequence is a single atomic operation. An example follows:
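The example below is a generic illustration (not glibc code): popping the head of a lock-free singly linked list with the GCC `__atomic` builtins. glibc's REMOVE_FB macro, which appears in the fastbin snippets that follow, builds a retry loop of roughly this shape around catomic_compare_and_exchange_val_acq.

```c
#include <stdio.h>

struct node { struct node *next; int val; };

static struct node *head;                    /* shared list head */

static struct node *pop(void) {
  struct node *old_head, *next_head;
  do {
    old_head = __atomic_load_n(&head, __ATOMIC_ACQUIRE);
    if (old_head == NULL)
      return NULL;
    next_head = old_head->next;
    /* CAS: only swing head from old_head to next_head if no other
       thread changed it in the meantime; otherwise retry. */
  } while (!__atomic_compare_exchange_n(&head, &old_head, next_head, 0,
                                        __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE));
  return old_head;
}

int main(void) {
  struct node b = { NULL, 2 }, a = { &b, 1 };
  head = &a;
  printf("%d\n", pop()->val);   /* 1 */
  printf("%d\n", pop()->val);   /* 2 */
  return 0;
}
```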
  if (victim != NULL)
    {
      if (__glibc_unlikely (misaligned_chunk (victim)))
        malloc_printerr ("malloc(): unaligned fastbin chunk detected 2");
      if (SINGLE_THREAD_P)
        *fb = REVEAL_PTR (victim->fd);
      else
        REMOVE_FB (fb, pp, victim);
      if (__glibc_likely (victim != NULL))
        {
          size_t victim_idx = fastbin_index (chunksize (victim));
          if (__builtin_expect (victim_idx != idx, 0))
            malloc_printerr ("malloc(): memory corruption (fast)");
          check_remalloced_chunk (av, victim, nb);
```
_int_malloc first checks whether the internal size falls in the fastbin range by comparing it against the maximum size returned by get_max_fast():
```c
static inline INTERNAL_SIZE_T
get_max_fast (void)
{
  /* Tell the GCC optimizers that global_max_fast is never larger
     than MAX_FAST_SIZE.  This avoids out-of-bounds array accesses in
     _int_malloc after constant propagation of the size parameter.
     (The code never executes because malloc preserves the
     global_max_fast invariant, but the optimizers may not recognize
     this.)  */
  if (global_max_fast > MAX_FAST_SIZE)
    __builtin_unreachable ();
  return global_max_fast;
}
```
#if USE_TCACHE
          /* While we're here, if we see other chunks of the same size,
             stash them in the tcache.  */
          size_t tc_idx = csize2tidx (nb);
          if (tcache && tc_idx < mp_.tcache_bins)
            {
              mchunkptr tc_victim;
              /* While bin not empty and tcache not full, copy chunks.  */
              while (tcache->counts[tc_idx] < mp_.tcache_count
                     && (tc_victim = *fb) != NULL)
                {
                  if (__glibc_unlikely (misaligned_chunk (tc_victim)))
                    malloc_printerr ("malloc(): unaligned fastbin chunk detected 3");
                  if (SINGLE_THREAD_P)
                    *fb = REVEAL_PTR (tc_victim->fd);
                  else
                    {
                      REMOVE_FB (fb, pp, tc_victim);
                      if (__glibc_unlikely (tc_victim == NULL))
                        break;
                    }
                  tcache_put (tc_victim, tc_idx);
                }
            }
  if (in_smallbin_range (nb))
    {
      idx = smallbin_index (nb);
      bin = bin_at (av, idx);
      if ((victim = last (bin)) != bin)
        {
          bck = victim->bk;
          if (__glibc_unlikely (bck->fd != victim))
            malloc_printerr ("malloc(): smallbin double linked list corrupted");
          set_inuse_bit_at_offset (victim, nb);
          bin->bk = bck;
          bck->fd = bin;
          if (av != &main_arena)
            set_non_main_arena (victim);
          check_malloced_chunk (av, victim, nb);
#if USE_TCACHE
          /* While we're here, if we see other chunks of the same size,
             stash them in the tcache.  */
          size_t tc_idx = csize2tidx (nb);
          if (tcache && tc_idx < mp_.tcache_bins)
            {
              mchunkptr tc_victim;
              /* While bin not empty and tcache not full, copy chunks over.  */
              while (tcache->counts[tc_idx] < mp_.tcache_count
                     && (tc_victim = last (bin)) != bin)
                {
                  if (tc_victim != 0)
                    {
                      bck = tc_victim->bk;
                      set_inuse_bit_at_offset (tc_victim, nb);
                      if (av != &main_arena)
                        set_non_main_arena (tc_victim);
                      bin->bk = bck;
                      bck->fd = bin;
As mentioned earlier, a small bin allocation only proceeds if the size check and the doubly-linked-list integrity check pass; if they do not, nothing is allocated from the small bin, and execution continues here:
  /*
     If this is a large request, consolidate fastbins before continuing.
     While it might look excessive to kill all fastbins before
     even seeing if there is space available, this avoids
     fragmentation problems normally associated with fastbins.
     Also, in practice, programs tend to have runs of either small or
     large requests, but less often mixtures, so consolidation is not
     invoked all that often in most programs. And the programs that
     it is called frequently in otherwise tend to fragment.
   */
/*
  ------------------------- malloc_consolidate -------------------------

  malloc_consolidate is a specialized version of free() that tears
  down chunks held in fastbins.  Free itself cannot be used for this
  purpose since, among other things, it might place chunks back onto
  fastbins.  So, instead, we need to use a minor variant of the same
  code.
*/
static void malloc_consolidate(mstate av)
{
  mfastbinptr*    fb;                 /* current fastbin being consolidated */
  mfastbinptr*    maxfb;              /* last fastbin (for loop control) */
  mchunkptr       p;                  /* current chunk being consolidated */
  mchunkptr       nextp;              /* next chunk to consolidate */
  mchunkptr       unsorted_bin;       /* bin header */
  mchunkptr       first_unsorted;     /* chunk to link to */
  /* These have same use as in free() */
  mchunkptr       nextchunk;
  INTERNAL_SIZE_T size;
  INTERNAL_SIZE_T nextsize;
  INTERNAL_SIZE_T prevsize;
  int             nextinuse;
  /*
    Remove each chunk from fast bin and consolidate it, placing it
    then in unsorted bin. Among other reasons for doing this,
    placing in unsorted bin avoids needing to calculate actual bins
    until malloc is sure that chunks aren't immediately going to be
    reused anyway.
  */
  maxfb = &fastbin (av, NFASTBINS - 1);
  fb = &fastbin (av, 0);
  do {
    p = atomic_exchange_acq (fb, NULL);
    if (p != 0) {
      do {
        {
          if (__glibc_unlikely (misaligned_chunk (p)))
            malloc_printerr ("malloc_consolidate(): "
                             "unaligned fastbin chunk detected");