Sunday, 28 March 2021

Enable root login over SSH:

As root, edit the sshd_config file in /etc/ssh/sshd_config:

nano /etc/ssh/sshd_config

Add a line in the Authentication section of the file that says PermitRootLogin yes. This line may already exist and be commented out with a "#". In this case, remove the "#".

# Authentication:
#LoginGraceTime 2m
PermitRootLogin yes
#StrictModes yes
#MaxAuthTries 6
#MaxSessions 10

Save the updated /etc/ssh/sshd_config file and restart the SSH server:

service sshd restart
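To confirm the new setting is active, the effective configuration can be dumped with sshd's extended test mode (a quick check, assuming OpenSSH is the SSH server on this system):

sshd -T | grep -i permitrootlogin

This should print "permitrootlogin yes".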

Tuesday, 9 March 2021

How mdb calculates ZFS-related values and how those differ from ZFS ARC size

 


Applies to:

Solaris Operating System - Version 10 6/06 U2 and later
Information in this document applies to any platform.

Purpose

This document describes how mdb calculates ZFS-related values and how those differ from the ZFS ARC size, so that users correctly understand the relationship between the two.

Details

ARC size reported by arcstats

The arcstats kernel statistics report the current ZFS ARC usage.

# kstat -n arcstats
module: zfs                             instance: 0     
name:   arcstats                        class:    misc
        buf_size                        37861488
        data_size                       7838309824
        l2_hdr_size                     0
        meta_used                       170464568
        other_size                      115650152
        prefetch_meta_size              16952928
        rawdata_size                    0
        size                            8008774392

(The output is cut for brevity.)

'size' is the amount of active data in the ARC and it can be broken down as follows.

Solaris 11.x prior to Solaris 11.3 SRU 13.4 and Solaris 10 without 150400-46/150401-46

size = meta_used + data_size;

Solaris 11.3 SRU 13.4 or later and Solaris 10 with 150400-46/150401-46 or later

size = data_size;


meta_used = buf_size + other_size + l2_hdr_size + rawdata_size + prefetch_meta_size;

buf_size: size of in-core data to manage ARC buffers.

other_size: size of in-core data to manage ZFS objects.

l2_hdr_size: size of in-core data to manage L2ARC.

rawdata_size: size of raw data used for persistent L2ARC. (Solaris 11.2.8 or later)

prefetch_meta_size: size of in-core data to manage prefetch. (Solaris 11.3 or later)

data_size: size of cached on-disk file data and on-disk meta data.
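The breakdown above can be cross-checked against the live kstat values; this is only a rough sketch (some fields, such as rawdata_size and prefetch_meta_size, exist only on the newer releases noted above, and the pattern also matches other *_size statistics, which can be ignored here):

# kstat -p zfs:0:arcstats | egrep 'size|meta_used'

On the newer releases listed above, the reported 'size' should equal 'data_size'; on the older ones it should equal 'data_size' plus 'meta_used'.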

 

How ZFS ARC is allocated from kernel memory

The way the ZFS ARC is allocated from kernel memory depends on the Solaris version.

Solaris 10, Solaris 11.0, Solaris 11.1

To cache on-disk file data, ARC is allocated from 'zio_data_buf_XXX' (XXX indicates cache unit size, such as '4096', '8192' etc.) kmem caches allocated from 'zfs_file_data_buf' virtual memory (vmem) arena.
To cache on-disk meta data, ARC is allocated from 'zio_buf_XXX' kmem caches allocated from 'kmem_default' vmem arena.
In-core data is allocated from other kmem caches, 'arc_buf_t', 'dmu_buf_impl_t', 'l2arc_buf_t', etc. allocated from 'kmem_default' vmem arena.
Note that 'zio_data_buf_XXX' and 'zio_buf_XXX' are not only used to cache on-disk file and meta data; they are also used by ZFS I/O routines for purposes other than the ARC.

Pages for 'zio_data_buf_XXX' are associated with the 'zvp' vnode and in the 'kzioseg' kernel segment.
Pages for 'zio_buf_XXX' and other caches are associated with 'kvp', the usual kernel vnode.

On Solaris 11.1 with SRU 3.4 or later, in addition to the above, 'zfs_file_data_lp_buf' vmem arena is used to allocate large pages.

Solaris 11.2

To cache on-disk file data, ARC is allocated from 'zio_data_buf_XXX' kmem caches allocated from 'zfs_file_data_buf' vmem arena.
To cache on-disk meta data, ARC is allocated from 'zio_buf_XXX' kmem caches allocated from 'zfs_metadata_buf' vmem arena.
In-core data is allocated from other kmem caches, 'arc_buf_t', 'dmu_buf_impl_t', 'l2arc_buf_t', 'zfetch_triggert_t', etc. allocated from 'kmem_default' vmem arena.
Note that 'zio_data_buf_XXX' and 'zio_buf_XXX' are not only used to cache on-disk file and meta data; they are also used by ZFS I/O routines for purposes other than the ARC.

Pages for both 'zio_data_buf_XXX' and 'zio_buf_XXX' are associated with the 'zvp' vnode and in the 'kzioseg' kernel segment.
Pages for other caches are associated with 'kvp', the usual kernel vnode.

Solaris 11.3 prior to SRU 21.5

A new kernel memory allocation mechanism, the Kernel Object Manager (KOM), is introduced.
To cache on-disk file data, ARC is allocated from 'arc_data' kom class.
To cache on-disk meta data, ARC is allocated from 'arc_meta' kom class.
In-core data is allocated from other kmem caches, 'arc_buf_t', 'dmu_buf_impl_t', 'l2arc_buf_t', 'zfetch_triggert_t', etc. allocated from 'kmem_default' vmem arena.
Memory used by ZFS I/O routines for purposes other than the ARC is allocated as 'kmem_alloc_XXX' from the 'kmem_default' vmem arena.

'kzioseg' segment and 'zvp' vnode no longer exist.

Solaris 11.3 SRU 21.5 or later

To cache on-disk file data, ARC is allocated from 'arc_data' kom class.
To cache on-disk meta data, ARC is allocated from the 'arc_meta' kom class.

The 'kmem_default_zfs' vmem arena is introduced to account for kernel memory used by ZFS other than for caching on-disk data.

In-core data ('arc_buf_t', 'dmu_buf_impl_t', 'l2arc_buf_t', 'zfetch_triggert_t', etc.) is now allocated from the 'kmem_default_zfs' vmem arena.
Memory used by ZFS I/O routines for purposes other than the ARC is also allocated as 'zio_buf_XXX' from the 'kmem_default_zfs' vmem arena.
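Usage of the 'kmem_default_zfs' arena can be read directly from the vmem kstats; a rough sketch, using the arena name above (mem_inuse and mem_total are standard vmem kstat statistics):

# kstat -p -c vmem | egrep kmem_default_zfs | egrep 'mem_inuse|mem_total'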

 

ZFS information reported by ::memstat in mdb
::memstat also reports ZFS-related memory usage, but it is not exactly the same as arcstats, and its implementation depends on the OS version.

Solaris 10, Solaris 11.0, Solaris 11.1

> ::memstat
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                     540356              2110   13%
ZFS File Data              609140              2379   15%
Anon                        41590               162    1%
Exec and libs                5231                20    0%
Page cache                   2883                11    0%
Free (cachelist)           800042              3125   19%
Free (freelist)           2192512              8564   52%

Total                     4191754             16374
Physical                  4102251             16024

'ZFS File Data' shows the size of the pages associated with the 'zvp' vnode, which is the amount allocated from the 'zio_data_buf_XXX' kmem caches.
It does not include on-disk meta data or in-core data. It also contains some amount of memory used by ZFS I/O routines.
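On these releases, the per-size-class kmem caches behind 'ZFS File Data' can be inspected with ::kmastat (described further down in this document); for example:

# echo "::kmastat" | mdb -k | grep zio_data_buf

The "memory in use" column totalled over the zio_data_buf_XXX caches roughly corresponds to the 'ZFS File Data' figure.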

Solaris 11.2

> ::memstat
Page Summary                 Pages             Bytes  %Tot
----------------- ----------------  ----------------  ----
Kernel                      237329              1.8G   23%
Guest                            0                 0    0%
ZFS Metadata                 28989            226.4M    3%
ZFS File Data               699858              5.3G   67%
Anon                         41418            323.5M    4%
Exec and libs                 1366             10.6M    0%
Page cache                    4782             37.3M    0%
Free (cachelist)              1017              7.9M    0%
Free (freelist)              33817            264.1M    3%
Total                      1048576                8G

'ZFS File Data' shows the size allocated from the 'zfs_file_data_buf' vmem arena. 'ZFS Metadata' shows the size of the pages associated with 'zvp' minus 'ZFS File Data'.

Solaris 11.3 prior to SRU 17.5.0

> ::memstat
Page Summary                 Pages             Bytes  %Tot
----------------- ----------------  ----------------  ----
Kernel                      558607              4.2G    7%
ZFS Metadata                 27076            211.5M    0%
ZFS File Data              2743214             20.9G   33%
Anon                         68656            536.3M    1%
Exec and libs                 2067             16.1M    0%
Page cache                    7285             56.9M    0%
Free (cachelist)             21596            168.7M    0%
Free (freelist)            4927709             37.5G   59%
Total                      8372224             63.8G

> ::kom_class
ADDR             FLAGS NAME             RSS        MEM_TOTAL
4c066e91d80      -L-   arc_meta         211.5m     280m      
4c066e91c80      ---   arc_data         20.9g      20.9g 

'ZFS File Data' shows the KOM statistics of the 'arc_data' class. 'ZFS Metadata' shows the KOM statistics of the 'arc_meta' class.

Solaris 11.3 with SRU 17.5 and without SRU 21.5

> ::memstat -v
Page Summary                            Pages             Bytes  %Tot
---------------------------- ----------------  ----------------  ----
Kernel                                 636916              4.8G    4%
Kernel (ZFS ARC excess)                 16053            125.4M    0%
Defdump prealloc                       291049              2.2G    2%
ZFS Metadata                           137434              1.0G    1%
ZFS File Data                         4244593             32.3G   25%
Anon                                   114975            898.2M    1%
Exec and libs                            2000             15.6M    0%
Page cache                              15548            121.4M    0%
Free (cachelist)                       253689              1.9G    2%
Free (freelist)                      11064959             84.4G   66%
Total                                16777216              128G

::memstat on Solaris 11.3 SRU 17.5 or later has a '-v' option to show the details.

'ZFS File Data' and 'ZFS Metadata' show the same KOM statistics as before.

In addition, 'Kernel (ZFS ARC excess)' shows the memory wasted on top of 'ZFS File Data' and 'ZFS Metadata': KOM can keep allocated memory which is not actually used at the moment, and such memory is considered wasted.

Solaris 11.3 SRU 21.5 or later

> ::memstat -v
Page Summary                            Pages             Bytes  %Tot
---------------------------- ----------------  ----------------  ----
Kernel                                 671736              2.5G    6%
Kernel (ZFS ARC excess)                 21159             82.6M    0%
Defdump prealloc                       361273              1.3G    3%
ZFS Kernel Data                        131699            514.4M    1%
ZFS Metadata                            42962            167.8M    0%
ZFS File Data                         8857479             33.7G   84%
Anon                                    99066            386.9M    1%
Exec and libs                            2050              8.0M    0%
Page cache                               9265             36.1M    0%
Free (cachelist)                        14663             57.2M    0%
Free (freelist)                        273905              1.0G    3%
Total                                10485257             39.9G

In addition to the information shown prior to Solaris 11.3 SRU 21.5, 'ZFS Kernel Data' shows the size allocated from the 'kmem_default_zfs' arena (and its overhead).

Solaris 11.4 or later

> ::memstat -v
Usage Type/Subtype                      Pages    Bytes  %Tot  %Tot/%Subt
---------------------------- ---------------- -------- ----- -----------
Kernel                                3669091    13.9g  7.2%
  Regular Kernel                      2602037     9.9g        5.1%/70.9%
  ZFS ARC Fragmentation                 14515    56.6m        0.0%/ 0.3%
  Defdump prealloc                    1052539     4.0g        2.0%/28.6%
ZFS                                  28359638   108.1g 56.3%
  ZFS Metadata                         116083   453.4m        0.2%/ 0.4%
  ZFS Data                           27959629   106.6g       55.5%/98.5%
  ZFS Kernel Data                      283926     1.0g        0.5%/ 1.0%
User/Anon                              201462   786.9m  0.4%
Exec and libs                            3062    11.9m  0.0%
Page Cache                              29372   114.7m  0.0%
Free (cachelist)                          944     3.6m  0.0%
Free                                 18033911    68.7g 35.8%
Total                                50297480   191.8g  100%

 'ZFS ARC Fragmentation' under 'Kernel' shows the wasted memory.

 

Why are the values reported by ::memstat different from the size reported by arcstats?

There are a few factors.

The ARC size includes cached on-disk file data, cached on-disk meta data, and various in-core data, but ::memstat does not report all of them. Prior to Solaris 11.2, only 'ZFS File Data' is reported.
Even on Solaris 11.2 and 11.3, in-core data is not reported, and the accounting by arcstats and ::memstat does not completely match.

::memstat on Solaris 11.3 SRU 21.5 or later reports in-core data as 'ZFS Kernel Data', though the in-core data counted by arcstats and by ::memstat is not exactly the same.

Another factor is wasted memory in kmem caches.
Consider a possible scenario: a customer ran a workload that was largely 128K-blocksize based, which filled the ARC cache with, say, X GB of 128K blocks. The customer then switched to an 8K-based workload, and the ARC cache filled up with Y GB of 8K blocks (the 128K blocks were evicted). When the 128K blocks are evicted from the ARC cache, they are returned to the 'zio_data_buf_131072' cache, where they stay (unused by the ARC) until either re-allocated or "reaped" by the VM system.

Under such a condition, 'ZFS File Data' shown by ::memstat can be much higher than the ARC size.
In particular, from Solaris 11.1 SRU 3.4 through Solaris 11.1 SRU 21.4, large pages are used by default and the situation can be worse.

::memstat reports such waste as 'Kernel (ZFS ARC excess)' on Solaris 11.3 SRU 17.5 or later, or 'ZFS ARC Fragmentation' on Solaris 11.4 or later.

It can also happen that 'ZFS File Data' is higher than the ARC size even though 'ZFS ARC excess' / 'ZFS ARC Fragmentation' is not high.
In this case, the ARC memory has been freed but still has KOM objects associated with it.

As discussed above, the values reported by ::memstat do not have to match the ZFS ARC size. It is not an issue if the ::memstat values are larger or smaller than the ZFS ARC size.
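For a quick side-by-side comparison of the two views, something like the following can be used (a rough sketch only; the ::memstat line labels vary by release as shown above):

# kstat -p zfs:0:arcstats:size
# echo "::memstat" | mdb -k | egrep ZFS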

 

-------------

 


Monday, 8 March 2021

How to Check and Analyze Solaris Memory Usage

Applies to:

Solaris Operating System - Version 8.0 to 11.4 [Release 8.0 to 11.0]
All Platforms
*** Checked for currency and updated for Solaris 11.2 11-March-2015 ***


Goal

This document is intended to give hints on where to look when checking and troubleshooting memory usage.
In principle, the investigation of memory usage is split into checking the usage of kernel memory and of user memory.

Please be aware that in case of a memory-usage problem on a system, corrective actions usually require deep knowledge and must be performed with great care.

Solution

A general system practice is to keep the system up to date with the latest Solaris releases and patches.

First, you need to check how much memory is used by the kernel and how much is used as user memory. This is important for deciding which further troubleshooting steps are required.

A very useful mdb dcmd is '::memstat' (this command can take several minutes to complete).
For more information on using the modular debugger, see the Oracle Solaris Modular Debugger Guide.
'::memstat' is available on the Solaris[TM] 9 Operating System or greater only. The output format varies with the OS release. This example is from Solaris 11.2.

# echo "::memstat" | mdb -k
Page Summary                 Pages             Bytes  %Tot
----------------- ----------------  ----------------  ----
Kernel                      585584              4.4G   14%
Defdump prealloc            204802              1.5G    5%
Guest                            0                 0    0%
ZFS Metadata                 21436            167.4M    0%
ZFS File Data               342833              2.6G    8%
Anon                         56636            442.4M    1%
Exec and libs                 1131              8.8M    0%
Page cache                    4339             33.8M    0%
Free (cachelist)              8011             62.5M    0%
Free (freelist)            2969532             22.6G   71%
Total                      4194304               32G



User memory usage : print out the processes using the most user memory
% prstat -s size # sorted by userland virtual memory consumption
% prstat -s rss # sorted by userland physical memory consumption

% prstat -s rss
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
  4051 user1     297M  258M sleep   59    0   1:35:05 0.0% mysqld/10
 26286 user2     229M  180M sleep   59    0   0:05:07 0.0% java/53
 27101 user2     237M  150M sleep   59    0   0:02:21 0.0% soffice.bin/5
 23335 user2     193M  135M sleep   59    0   0:12:33 0.0% firefox-bin/10
  3727 noaccess  192M  131M sleep   59    0   0:36:22 0.0% java/18
 22751 root      165M  131M sleep   59    0   1:13:12 0.0% java/46
  1448 noaccess  192M  108M sleep   59    0   0:34:47 0.0% java/18
 10115 root      129M   82M sleep   59    0   0:31:29 0.0% java/41
 20274 root      136M   77M stop    59    0   0:04:08 0.0% java/25
  3397 root      138M   76M sleep   59    0   0:12:42 0.0% java/37
 12949 pgsql      81M   70M sleep   59    0   0:09:36 0.0% postgres/1
 12945 pgsql      80M   70M sleep   59    0   0:00:05 0.0% postgres/1



 User Memory Usage : shows Shared Memory and Semaphores:

% ipcs -a

IPC status from
T  ID     KEY        MODE     OWNER   GROUP  CREATOR  CGROUP CBYTES  QNUM     QBYTES  LSPID  LRPID   STIME    RTIME    CTIME
Message Queues:
q  0  0x55460272 -Rrw-rw----   root    root     root    root    0       0     4194304  1390  18941  14:12:20  14:12:21  10:23:32
q  1  0x41460272 --rw-rw----   root    root     root    root    0       0     4194304  5914   1390   8:03:34   8:03:34  10:23:39
q  2  0x4b460272 --rw-rw----   root    root     root    root    0       0     4194304     0      0  no-entry  no-entry  10:23:39

T  ID      KEY       MODE      OWNER     GROUP CREATOR    CGROUP    NATTCH       SEGSZ  CPID   LPID     ATIME     DTIME    CTIME
Shared Memory:
m  0  0x50000b3f --rw-r--r--   root      root     root      root         1           4   738   738   18:50:36  18:50:36  18:50:36
m  1  0x52574801 --rw-rw----   root    oracle     root    oracle        35  1693450240  2049  26495  10:30:00  10:30:00  18:51:13
m  2  0x52574802 --rw-rw----   root    oracle     root    oracle        35  1258291200  2049  26495  10:30:00  10:30:00  18:51:16
m  3  0x52594801 --rw-rw----   root    oracle     root    oracle        12   241172480  2098  14328   7:58:33   7:58:33  18:51:27
m  4  0x52594802 --rw-rw----   root    oracle     root    oracle        12    78643200  2098  14329   7:58:32   7:58:33  18:51:27
m  5  0x52584801 --rw-rw----   root    oracle     root    oracle        13   125829120  2125  27492   1:36:12   1:36:12  18:51:34
m  6  0x52584802 --rw-rw----   root    oracle     root    oracle        13   268435456  2125  27487   1:36:10   1:36:11  18:51:34
m  7  0x525a4801 --rw-rw----   root    oracle     root    oracle        15   912261120  2160  27472   1:36:09   1:36:09  18:51:40
m  8  0x525a4802 --rw-rw----   root    oracle     root    oracle        15   268435456  2160  27467   1:36:08   1:36:09  18:51:42
m 8201 0x4d2     --rw-rw-rw-   root      root     root      root         0       32008  1528   1543  10:26:03  10:26:04  10:25:53

T  ID  KEY       MODE     OWNER       GROUP       CREATOR        CGROUP         NSEMS     OTIME    CTIME
Semaphores:
s  0   0x1   --ra-ra-ra-   root        root          root         root              1     16:17:35  18:50:33
s  1     0   --ra-ra----   root       oracle         root         oracle           36     10:33:28  18:51:17
s  2     0   --ra-ra----   root       oracle         root         oracle           13     10:33:28  18:51:27
s  3     0   --ra-ra----   root       oracle         root         oracle           14     10:33:28  18:51:34
s  4     0   --ra-ra----   root       oracle         root         oracle           16     10:33:27  18:51:42
s  5 0x4d2   --ra-ra-ra-   root       root           root         root               1    no-entry  10:25:53
s  6 0x4d3   --ra-ra-ra-   root       root           root         root               1    no-entry  10:25:53




User Memory Usage : lists User Memory usage of all processes ( except PID 0,2,3 )

# pmap -x /proc/* > /var/tmp/pmap-x
short list of total usage of these processes

% egrep "[0-9]:|^total" /var/tmp/pmap-x
     1:   /sbin/init
total Kb 2336 2080  128 -
1006:  rlogin cores4
total Kb 2216 1696    80 -
1007:  rlogin cores4
total Kb 2216 1696  104 -
  115:  /usr/sbin/nscd
total Kb 4208 3784 1704 -
-- snip --




User Memory Usage : check the usage of /tmp

% df -kl /tmp
Filesystem kbytes        used        avail capacity  Mounted on 
swap        1355552    2072 1353480        1%      /tmp

print the biggest 10 files and dirs in /tmp

% du -akd /tmp/ | sort -n | tail -10
288     /tmp/SUNWut
328     /tmp/log
576     /tmp/ips2
584     /tmp/explo
608     /tmp/ipso
3408    /tmp/sshd-truss.out
17992   /tmp/truss.p
22624   /tmp/js
49208   /tmp



 
User Memory Usage : Overall Memory usage on system

% vmstat -p 3
     memory           page          executable      anonymous      filesystem
   swap  free     re  mf  fr  de  sr  epi  epo  epf  api  apo  apf  fpi  fpo  fpf
19680912 27487976 21  94   0   0   0    0    0    0    0    0    0   14    0    0
 3577608 11959480  0  20   0   0   0    0    0    0    0    0    0    0    0    0
 3577328 11959240  0   5   0   0   0    0    0    0    0    0    0    0    0    0
 3577328 11959112 38 207   0   0   0    0    0    0    0    0    0    0    0    0
 3577280 11958944  0   1   0   0   0    0    0    0    0    0    0    0    0    0

 

The scan rate 'sr' should be 0 or near zero.



 
User Memory Usage : Swap usage

% swap -l
swapfile              dev    swaplo  blocks      free
/dev/dsk/c0t0d0s1   32,25        16  1946032  1946032

% swap -s
total: 399400k bytes allocated + 18152k reserved = 417552k used, 1355480k available




common kernel statistics

print out all kernel statistics in a parseable format

% kstat -p > /var/tmp/kstat-p



kernel memory statistics:

% kstat -p -c kmem_cache
% kstat -p -m vmem
% kstat -p -c vmem
% kstat -p | egrep zfs_file_data_buf | egrep mem_total



As an alternative to kstat, you can get kernel memory usage with ::kmastat, which prints the kmem cache buffers:

# echo "::kmastat" | mdb -k > /var/tmp/kmastat
% more /var/tmp/kmastat
    cache                     buf    buf    buf     memory     alloc  alloc
    name                     size in use   total    in use   succeed  fail
 ------------------------- ------ ------  ------ --------- --------- -----
  kmem_magazine_1              16    470     508      8192       470     0
  kmem_magazine_3              32    970    1016     32768      1164     0
  kmem_magazine_7              64   1690    1778    114688      1715     0


Look for the highest numbers in column "memory in use" and for any numbers higher than '0' in column "alloc fail"
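To spot the largest consumers quickly, the kmastat output can be sorted on the "memory in use" column (a sketch only; field 5 is assumed from the column layout above, header lines will sort out of place, and on releases that print human-readable sizes the numeric sort will not be meaningful):

# echo "::kmastat" | mdb -k | sort -n -k5 | tail -20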

 

ZFS File Data:
    Keep system up-to-date with latest Solaris releases and patches
    Size memory requirements to actual system workload
        With a known application memory footprint, such as for a database application, you might cap the ARC size so that the application will not need to reclaim its necessary memory from the ZFS cache.
        Consider de-duplication memory requirements
        Identify ZFS memory usage with the following command:

# mdb -k
Loading modules: [ unix genunix specfs dtrace zfs scsi_vhci sd mpt mac px ldc ip
 hook neti ds arp usba kssl sockfs random mdesc idm nfs cpc crypto fcip fctl ufs
 logindmux ptm sppp ipc ]
> ::memstat
Page Summary                 Pages             Bytes  %Tot
----------------- ----------------  ----------------  ----
Kernel                      261969              1.9G    6%
Guest                            0                 0    0%
ZFS Metadata                 13915            108.7M    0%
ZFS File Data               111955            874.6M    3%
Anon                         52339            408.8M    1%
Exec and libs                 1308             10.2M    0%
Page cache                    5932             46.3M    0%
Free (cachelist)             16460            128.5M    0%
Free (freelist)            3701754             28.2G   89%
Total                      4165632             31.7G
> $q

In case the amount of ZFS File Data is too high on the system, you might consider limiting how much memory ZFS can consume.

For Solaris revisions prior to Solaris 11, the only way to accomplish this is to limit the ARC cache
by setting zfs:zfs_arc_max in /etc/system:
set zfs:zfs_arc_max = [size]
e.g. to limit the cache to 1 GB in size:
set zfs:zfs_arc_max = 1073741824
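After a reboot, the effective cap can be cross-checked against the ARC's target maximum in arcstats (a sketch; 'c_max' reflects the maximum size the ARC will grow to):

# kstat -p zfs:0:arcstats:c_max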

Please check the following documents on how to check and limit the ARC:
How to Understand "ZFS File Data" Value by mdb and ZFS ARC Size. (Doc ID 1430323.1)
Oracle Solaris Tunable Parameters Reference Manual

Starting at Solaris 11, a second method, reserving memory for applications, may be used to prevent ZFS from using too much memory.

The entry in /etc/system looks like this:

set user_reserve_hint_pct=60
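The value currently in effect can be read from the running kernel with mdb (a sketch; this assumes the tunable is a 32-bit integer, as implied by the percentage setting above):

# echo "user_reserve_hint_pct/D" | mdb -k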

 

Configure the /dev/shm size on Linux

 

How do you configure the /dev/shm size on Linux?

To change the configuration for /dev/shm, add one line to /etc/fstab as follows.

tmpfs /dev/shm tmpfs defaults,size=8g 0 0

Here, the /dev/shm size is configured to be 8GB (make sure you have enough physical memory installed).

It will take effect the next time Linux reboots. If you would like to make it take effect immediately, run:

# mount -o remount /dev/shm
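The new size can then be confirmed with df (as also shown in the more detailed /dev/shm section below):

# df -h /dev/shm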

 

========

For many facilities there are system calls, others are hidden behind netlink interfaces, and yet others are exposed via virtual file systems such as /proc or /sys. These file systems are programming interfaces; they are not actually backed by real, persistent storage. They simply use the file system interface of the kernel as an interface to various unrelated mechanisms.


Now, by default, systemd assigns a certain part of your physical memory to these partitions as a size limit. But what if your requirements call for a different tmpfs partition size?

For some of the tmpfs partitions, you can change the size limit by using fstab, while for other partitions, such as /run/user/, which are created at runtime, you cannot use fstab to change the tmpfs partition size.

Below is the list of tmpfs partitions available in RHEL 7:

Filesystem Size Used Avail Use% Mounted on
tmpfs      187G    0  187G   0% /dev/shm
tmpfs      187G  41M  187G   1%  /run
tmpfs      187G    0  187G   0% /sys/fs/cgroup
tmpfs       38G    0   38G   0% /run/user/1710
tmpfs       38G    0   38G   0% /run/user/0
NOTE:
You may notice that /etc/fstab does not contain entries for these tmpfs partitions, but df -h will still show them.

 

Change tmpfs partition size for /dev/shm

If an application is POSIX compliant or it uses GLIBC (2.2 and above) on a Red Hat Enterprise Linux system, it will usually use /dev/shm for shared memory (shm_open, shm_unlink). /dev/shm is a temporary filesystem (tmpfs) which is mounted from /etc/fstab. Hence the standard options like "size" supported for tmpfs can be used to increase or decrease the size of the tmpfs on /dev/shm (by default it is half of the available system RAM).


For example, to set the size of /dev/shm to 2GiB, change the following line in /etc/fstab:

Default:

none     /dev/shm       tmpfs   defaults                0 0

To:

none     /dev/shm       tmpfs   defaults,size=2G        0 0

For the changes to take effect immediately remount /dev/shm:

# mount -o remount /dev/shm
NOTE:
A mount -o remount to shrink a tmpfs will only succeed if there are no blocks or inodes allocated beyond the new, smaller tmpfs size. It is not possible to predict or control this; however, the remount simply will not work if it cannot be done. In that case, stop all processes using the tmpfs, unmount it, and remount it using the new size.
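A possible fallback sequence for that case (a sketch only; fuser -k forcibly terminates the processes that still have files open on the mount, so use it with care):

# fuser -km /dev/shm
# umount /dev/shm
# mount /dev/shm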

Lastly validate the new size

# df -h /dev/shm
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           2.0G     0  2.0G   0% /dev/shm

 

Change tmpfs partition size for /run

/run is a filesystem which is used by applications the same way /var/run was used in previous versions of RHEL; /var/run is now a symlink to the /run filesystem. Previously, early-boot programs placed runtime data in /dev under numerous hidden dot directories, because /dev was known to be available very early in the machine boot process. Because /var/run was available only late during boot, as /var might reside on a separate file system, the /run directory was implemented.

 

By default you may not find any /etc/fstab entry for /run, so you can add the line below:

none     /run          tmpfs       defaults,size=600M        0 0

For the changes to take effect immediately remount /run:

# mount -o remount /run

Lastly, validate the new size:

# df -h /run
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           600M  9.6M  591M   2% /run

 

Change tmpfs partition size for /run/user/$UID

/run/user/$UID is a filesystem used by pam_systemd to store files used by running processes for that user. In previous releases these files were typically stored in /tmp, as it was the only location specified by the FHS which is local and writable by all users. However, using /tmp can cause issues because it is writable by anyone, so access control was challenging. Using /run/user/$UID fixes the issue because it is accessible only by the target user.

IMPORTANT NOTE:
You cannot change tmpfs partition size for /run/user/$UID using /etc/fstab.

The tmpfs partition size for /run/user/$UID is taken from the RuntimeDirectorySize value in /etc/systemd/logind.conf:

# grep -i runtime /etc/systemd/logind.conf
RuntimeDirectorySize=10%

By default, the threshold for these runtime directories is 10% of the total physical memory.

From the man page of logind.conf

RuntimeDirectorySize=
      Sets the size limit on the $XDG_RUNTIME_DIR runtime directory for each user who logs in. Takes a size in bytes, optionally suffixed with the usual K, G, M, and T suffixes, to the base 1024 (IEC). Alternatively, a numerical percentage suffixed by "%" may be specified, which sets the size limit relative to the amount of physical RAM. Defaults to 10%. Note that this size is a safety limit only. As each runtime directory is a tmpfs file system, it will only consume as much memory as is needed.

Modify this variable to your required value; for example, here a threshold of 100M is provided:

# grep -i runtime /etc/systemd/logind.conf
RuntimeDirectorySize=100M

Next restart the systemd-logind service
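On RHEL 7 this is typically done with systemctl (a sketch; note the remark below that a reboot is required to fully activate the change):

# systemctl restart systemd-logind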

IMPORTANT NOTE:
A reboot of the node is required to activate the changes.

 

Change tmpfs partition size for /sys/fs/cgroup

/sys/fs/cgroup is an interface through which Control Groups can be accessed. By default there may or may not be an /etc/fstab entry for /sys/fs/cgroup, so add a new one.

Current value for /sys/fs/cgroup

# df -h /sys/fs/cgroup
Filesystem      Size  Used Avail Use% Mounted on
tmpfs            63G     0   63G   0% /sys/fs/cgroup

Add the line below to your /etc/fstab to change the size limit to 2GB:

none          /sys/fs/cgroup          tmpfs       defaults,size=2G         0 0

Remount the partition /sys/fs/cgroup

# mount -o remount /sys/fs/cgroup

Lastly validate the updated changes

# df -h /sys/fs/cgroup
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           2.0G     0  2.0G   0% /sys/fs/cgroup

 

 
