Monday, 8 March 2021

How to Check and Analyze Solaris Memory Usage

Solaris Operating System - Version 8.0 to 11.4 [Release 8.0 to 11.0] All Platforms
*** Checked for currency and updated for Solaris 11.2 11-March-2015 ***

Goal

This document gives hints on where to look when checking and troubleshooting memory usage. In principle, the investigation of memory usage is split into checking the usage of kernel memory and of user memory. Please be aware that in case of a memory-usage problem on a system, corrective actions usually require deep knowledge and must be performed with great care.

Solution

General system practice is to keep the system up to date with the latest Solaris releases and patches.

First, you need to check how much memory is used by the kernel and how much is used as user memory. This is important for deciding which further troubleshooting steps are required. A very useful mdb dcmd is '::memstat' (this command can take several minutes to complete). For more information on using the modular debugger, see the Oracle Solaris Modular Debugger Guide. Note that '::memstat' is available on the Solaris[TM] 9 Operating System or greater only, and the output format varies with the OS release. This example is from Solaris 11.2:

# echo "::memstat" | mdb -k
Page Summary                 Pages             Bytes  %Tot
-----------------  ---------------  ----------------  ----
Kernel                      585584              4.4G   14%
Defdump prealloc            204802              1.5G    5%
Guest                            0                 0    0%
ZFS Metadata                 21436            167.4M    0%
ZFS File Data               342833              2.6G    8%
Anon                         56636            442.4M    1%
Exec and libs                 1131              8.8M    0%
Page cache                    4339             33.8M    0%
Free (cachelist)              8011             62.5M    0%
Free (freelist)            2969532             22.6G   71%
Total                      4194304               32G

User memory usage: print the processes using the most user memory

% prstat -s size     # sorted by userland virtual memory consumption
% prstat -s rss      # sorted by userland physical memory consumption

% prstat -s rss
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
  4051 user1     297M  258M sleep   59    0   1:35:05 0.0% mysqld/10
 26286 user2     229M  180M sleep   59    0   0:05:07 0.0% java/53
 27101 user2     237M  150M sleep   59    0   0:02:21 0.0% soffice.bin/5
 23335 user2     193M  135M sleep   59    0   0:12:33 0.0% firefox-bin/10
  3727 noaccess  192M  131M sleep   59    0   0:36:22 0.0% java/18
 22751 root      165M  131M sleep   59    0   1:13:12 0.0% java/46
  1448 noaccess  192M  108M sleep   59    0   0:34:47 0.0% java/18
 10115 root      129M   82M sleep   59    0   0:31:29 0.0% java/41
 20274 root      136M   77M stop    59    0   0:04:08 0.0% java/25
  3397 root      138M   76M sleep   59    0   0:12:42 0.0% java/37
 12949 pgsql      81M   70M sleep   59    0   0:09:36 0.0% postgres/1
 12945 pgsql      80M   70M sleep   59    0   0:00:05 0.0% postgres/1
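To get a rough total of resident memory across all user processes, the per-process RSS values can be summed with ps(1) and awk. This is only a sketch: RSS is reported in kilobytes, and shared pages (libraries, shared memory segments) are counted once per process, so the sum overstates the real physical usage.

% ps -eo rss,pid,args | sort -rn | head -10      # largest resident sets first
% ps -eo rss | awk 'NR > 1 { kb += $1 }
      END { printf("sum of RSS: %.1f MB (shared pages counted per process)\n", kb / 1024) }'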
User memory usage: show shared memory segments, message queues and semaphores

% ipcs -a
IPC status from
T         ID      KEY         MODE        OWNER    GROUP  CREATOR   CGROUP CBYTES  QNUM  QBYTES LSPID LRPID    STIME    RTIME    CTIME
Message Queues:
q          0   0x55460272  -Rrw-rw----     root     root     root     root      0     0 4194304  1390 18941 14:12:20 14:12:21 10:23:32
q          1   0x41460272  --rw-rw----     root     root     root     root      0     0 4194304  5914  1390  8:03:34  8:03:34 10:23:39
q          2   0x4b460272  --rw-rw----     root     root     root     root      0     0 4194304     0     0 no-entry no-entry 10:23:39
T         ID      KEY         MODE        OWNER    GROUP  CREATOR   CGROUP NATTCH       SEGSZ  CPID  LPID    ATIME    DTIME    CTIME
Shared Memory:
m          0   0x50000b3f  --rw-r--r--     root     root     root     root      1           4   738   738 18:50:36 18:50:36 18:50:36
m          1   0x52574801  --rw-rw----     root   oracle     root   oracle     35  1693450240  2049 26495 10:30:00 10:30:00 18:51:13
m          2   0x52574802  --rw-rw----     root   oracle     root   oracle     35  1258291200  2049 26495 10:30:00 10:30:00 18:51:16
m          3   0x52594801  --rw-rw----     root   oracle     root   oracle     12   241172480  2098 14328  7:58:33  7:58:33 18:51:27
m          4   0x52594802  --rw-rw----     root   oracle     root   oracle     12    78643200  2098 14329  7:58:32  7:58:33 18:51:27
m          5   0x52584801  --rw-rw----     root   oracle     root   oracle     13   125829120  2125 27492  1:36:12  1:36:12 18:51:34
m          6   0x52584802  --rw-rw----     root   oracle     root   oracle     13   268435456  2125 27487  1:36:10  1:36:11 18:51:34
m          7   0x525a4801  --rw-rw----     root   oracle     root   oracle     15   912261120  2160 27472  1:36:09  1:36:09 18:51:40
m          8   0x525a4802  --rw-rw----     root   oracle     root   oracle     15   268435456  2160 27467  1:36:08  1:36:09 18:51:42
m       8201   0x4d2       --rw-rw-rw-     root     root     root     root      0       32008  1528  1543 10:26:03 10:26:04 10:25:53
T         ID      KEY         MODE        OWNER    GROUP  CREATOR   CGROUP NSEMS    OTIME    CTIME
Semaphores:
s          0   0x1         --ra-ra-ra-     root     root     root     root      1 16:17:35 18:50:33
s          1   0           --ra-ra----     root   oracle     root   oracle     36 10:33:28 18:51:17
s          2   0           --ra-ra----     root   oracle     root   oracle     13 10:33:28 18:51:27
s          3   0           --ra-ra----     root   oracle     root   oracle     14 10:33:28 18:51:34
s          4   0           --ra-ra----     root   oracle     root   oracle     16 10:33:27 18:51:42
s          5   0x4d2       --ra-ra-ra-     root     root     root     root      1 no-entry 10:25:53
s          6   0x4d3       --ra-ra-ra-     root     root     root     root      1 no-entry 10:25:53

User memory usage: list the user memory usage of all processes (except PIDs 0, 2 and 3)

# pmap -x /proc/* > /var/tmp/pmap-x

Short list of the total usage of these processes:

% egrep "[0-9]:|^total" /var/tmp/pmap-x
1:      /sbin/init
total Kb        2336    2080     128       -
1006:   rlogin cores4
total Kb        2216    1696      80       -
1007:   rlogin cores4
total Kb        2216    1696     104       -
115:    /usr/sbin/nscd
total Kb        4208    3784    1704       -
-- snip --

User memory usage: check the usage of /tmp (on Solaris, /tmp is a tmpfs filesystem backed by swap, so large files there consume virtual memory)

% df -kl /tmp
Filesystem            kbytes    used   avail capacity  Mounted on
swap                 1355552    2072 1353480     1%    /tmp

Print the biggest 10 files and directories in /tmp:

% du -akd /tmp/ | sort -n | tail -10
288     /tmp/SUNWut
328     /tmp/log
576     /tmp/ips2
584     /tmp/explo
608     /tmp/ipso
3408    /tmp/sshd-truss.out
17992   /tmp/truss.p
22624   /tmp/js
49208   /tmp

User memory usage: overall memory usage on the system

% vmstat -p 3
     memory           page          executable      anonymous      filesystem
   swap  free  re  mf  fr  de  sr  epi  epo  epf  api  apo  apf  fpi  fpo  fpf
 19680912 27487976 21  94  0   0   0    0    0    0    0    0    0   14    0    0
 3577608 11959480   0  20  0   0   0    0    0    0    0    0    0    0    0    0
 3577328 11959240   0   5  0   0   0    0    0    0    0    0    0    0    0    0
 3577328 11959112  38 207  0   0   0    0    0    0    0    0    0    0    0    0
 3577280 11958944   0   1  0   0   0    0    0    0    0    0    0    0    0    0

The scan rate 'sr' should be zero or near zero.

User memory usage: swap usage

% swap -l
swapfile             dev  swaplo blocks   free
/dev/dsk/c0t0d0s1   32,25      16 1946032 1946032
% swap -s
total: 399400k bytes allocated + 18152k reserved = 417552k used, 1355480k available

Common kernel statistics

Print out all kernel statistics in a parseable format:

% kstat -p > /var/tmp/kstat-p

Kernel memory statistics:

% kstat -p -c kmem_cache
% kstat -p -m vmem
% kstat -p -c vmem
% kstat -p | egrep zfs_file_data_buf | egrep mem_total

Alternatively to kstat, you can get kernel memory usage with '::kmastat', which prints the kmem cache buffers:

# echo "::kmastat" | mdb -k > /var/tmp/kmastat
% more /var/tmp/kmastat
cache                       buf    buf    buf    memory     alloc alloc
name                       size in use  total    in use   succeed  fail
------------------------- ------ ------ ------ --------- --------- -----
kmem_magazine_1              16    470    508      8192       470     0
kmem_magazine_3              32    970   1016     32768      1164     0
kmem_magazine_7              64   1690   1778    114688      1715     0

Look for the highest numbers in the "memory in use" column and for any numbers higher than 0 in the "alloc fail" column.
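To quantify two of the usual suspects from the kstat data above, the following sketch prints the ZFS file data vmem arena usage and ranks the saved kmastat output by the "memory in use" column. It assumes the arena is named 'zfs_file_data_buf' (Solaris 10 through 11.2, as described later in this document) and that ::kmastat printed plain byte counts in the fifth column, as in the example above.

% kstat -p vmem::zfs_file_data_buf:mem_inuse | \
      awk '{ printf("zfs_file_data_buf arena in use: %.1f GB\n", $2 / 1073741824) }'
% sort -k5,5n /var/tmp/kmastat | tail -10      # kmem caches with the most memory in use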
ZFS File Data:
- Keep the system up to date with the latest Solaris releases and patches.
- Size memory requirements to the actual system workload.
- With a known application memory footprint, such as for a database application, you might cap the ARC size so that the application will not need to reclaim its necessary memory from the ZFS cache.
- Consider de-duplication memory requirements.

Identify ZFS memory usage with the following command:

# mdb -k
Loading modules: [ unix genunix specfs dtrace zfs scsi_vhci sd mpt mac px ldc ip hook neti ds arp usba kssl sockfs random mdesc idm nfs cpc crypto fcip fctl ufs logindmux ptm sppp ipc ]
> ::memstat
Page Summary                 Pages             Bytes  %Tot
-----------------  ---------------  ----------------  ----
Kernel                      261969              1.9G    6%
Guest                            0                 0    0%
ZFS Metadata                 13915            108.7M    0%
ZFS File Data               111955            874.6M    3%
Anon                         52339            408.8M    1%
Exec and libs                 1308             10.2M    0%
Page cache                    5932             46.3M    0%
Free (cachelist)             16460            128.5M    0%
Free (freelist)            3701754             28.2G   89%
Total                      4165632             31.7G
> $q

If the amount of ZFS File Data is too high on the system, you might consider limiting how much memory ZFS can consume. For Solaris revisions prior to Solaris 11, the only way to accomplish this is to limit the ARC cache by setting zfs:zfs_arc_max in /etc/system:

set zfs:zfs_arc_max = [size]

For example, to limit the cache to 1 GB in size:

set zfs:zfs_arc_max = 1073741824

Please check the following documents on how to check and limit the ARC:

How to Understand "ZFS File Data" Value by mdb and ZFS ARC Size (Doc ID 1430323.1)
Oracle Solaris Tunable Parameters Reference Manual

Starting with Solaris 11, a second method, reserving memory for applications, may be used to prevent ZFS from using too much memory. The entry in /etc/system looks like this:

set user_reserve_hint_pct=60

ARC size reported by arcstats

The arcstats kernel statistics report the current ZFS ARC usage.

# kstat -n arcstats
module: zfs                             instance: 0
name:   arcstats                        class:    misc
        buf_size                        37861488
        data_size                       7838309824
        l2_hdr_size                     0
        meta_used                       170464568
        other_size                      115650152
        prefetch_meta_size              16952928
        rawdata_size                    0
        size                            8008774392
(The output is cut for brevity.)

'size' is the amount of active data in the ARC, and it can be broken down as follows.

Solaris 11.x prior to Solaris 11.3 SRU 13.4, and Solaris 10 without 150400-46/150401-46:
    size = meta_used + data_size

Solaris 11.3 SRU 13.4 or later, and Solaris 10 with 150400-46/150401-46 or later:
    size      = data_size
    meta_used = buf_size + other_size + l2_hdr_size + rawdata_size + prefetch_meta_size

buf_size:            size of in-core data to manage ARC buffers.
other_size:          size of in-core data to manage ZFS objects.
l2_hdr_size:         size of in-core data to manage the L2ARC.
rawdata_size:        size of raw data used for the persistent L2ARC (Solaris 11.2.8 or later).
prefetch_meta_size:  size of in-core data to manage prefetch (Solaris 11.3 or later).
data_size:           size of cached on-disk file data and on-disk meta data.
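The breakdown above can be read directly from the arcstats kstat. This is a minimal sketch assuming the newer accounting (Solaris 11.3 SRU 13.4 or later), where 'size' equals 'data_size' and 'meta_used' is tracked separately; on older releases 'size' already includes 'meta_used', as noted above.

% kstat -p -n arcstats | awk -F'[:\t]' '
      $4 == "size"      { size = $5 }
      $4 == "data_size" { data = $5 }
      $4 == "meta_used" { meta = $5 }
      END {
          printf("ARC size    : %.1f GB\n", size / 1073741824)
          printf("  data_size : %.1f GB (cached on-disk data)\n", data / 1073741824)
          printf("  meta_used : %.1f GB (in-core management data)\n", meta / 1073741824)
      }'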
How ZFS ARC is allocated from kernel memory

The way the ZFS ARC is allocated from kernel memory depends on the Solaris version.

Solaris 10, Solaris 11.0, Solaris 11.1

To cache on-disk file data, the ARC allocates from the 'zio_data_buf_XXX' kmem caches (XXX indicates the cache unit size, such as '4096', '8192', etc.), which are allocated from the 'zfs_file_data_buf' virtual memory (vmem) arena. To cache on-disk meta data, the ARC allocates from the 'zio_buf_XXX' kmem caches, which are allocated from the 'kmem_default' vmem arena. In-core data is allocated from other kmem caches ('arc_buf_t', 'dmu_buf_impl_t', 'l2arc_buf_t', etc.), also from the 'kmem_default' vmem arena. Note that 'zio_data_buf_XXX' and 'zio_buf_XXX' are not used only to cache on-disk file and meta data; they are also used by ZFS I/O routines for purposes other than the ARC. Pages for 'zio_data_buf_XXX' are associated with the 'zvp' vnode and belong to the 'kzioseg' kernel segment. Pages for 'zio_buf_XXX' and the other caches are associated with 'kvp', the usual kernel vnode.

On Solaris 11.1 with SRU 3.4 or later, in addition to the above, the 'zfs_file_data_lp_buf' vmem arena is used to allocate large pages.

Solaris 11.2

To cache on-disk file data, the ARC allocates from the 'zio_data_buf_XXX' kmem caches, allocated from the 'zfs_file_data_buf' vmem arena. To cache on-disk meta data, the ARC allocates from the 'zio_buf_XXX' kmem caches, allocated from the 'zfs_metadata_buf' vmem arena. In-core data is allocated from other kmem caches ('arc_buf_t', 'dmu_buf_impl_t', 'l2arc_buf_t', 'zfetch_triggert_t', etc.) from the 'kmem_default' vmem arena. Again, 'zio_data_buf_XXX' and 'zio_buf_XXX' are not used only to cache on-disk file and meta data; they are also used by ZFS I/O routines for purposes other than the ARC. Pages for both 'zio_data_buf_XXX' and 'zio_buf_XXX' are associated with the 'zvp' vnode and belong to the 'kzioseg' kernel segment. Pages for the other caches are associated with 'kvp', the usual kernel vnode.

Solaris 11.3 prior to SRU 21.5

A new kernel memory allocation mechanism, the Kernel Object Manager (KOM), is introduced. To cache on-disk file data, the ARC allocates from the 'arc_data' KOM class. To cache on-disk meta data, the ARC allocates from the 'arc_meta' KOM class. In-core data is allocated from other kmem caches ('arc_buf_t', 'dmu_buf_impl_t', 'l2arc_buf_t', 'zfetch_triggert_t', etc.) from the 'kmem_default' vmem arena. Memory used by ZFS I/O routines for purposes other than the ARC is allocated as 'kmem_alloc_XXX' from the 'kmem_default' vmem arena. The 'kzioseg' segment and the 'zvp' vnode no longer exist.

Solaris 11.3 SRU 21.5 or later

To cache on-disk file data, the ARC allocates from the 'arc_data' KOM class. To cache on-disk meta data, the ARC allocates from the 'arc_meta' KOM class. The 'kmem_default_zfs' vmem arena is introduced to account for kernel memory used by ZFS other than for caching on-disk data. In-core data ('arc_buf_t', 'dmu_buf_impl_t', 'l2arc_buf_t', 'zfetch_triggert_t', etc.) is now allocated from the 'kmem_default_zfs' vmem arena. Memory used by ZFS I/O routines for purposes other than the ARC is allocated as 'zio_buf_XXX' from the 'kmem_default_zfs' vmem arena as well.

ZFS information reported by ::memstat in mdb

::memstat also reports ZFS related memory usage, but it is not exactly the same as arcstats, and its implementation depends on the OS version.

Solaris 10, Solaris 11.0, Solaris 11.1

> ::memstat
Page Summary                 Pages                MB  %Tot
-----------------  ---------------  ----------------  ----
Kernel                      540356              2110   13%
ZFS File Data               609140              2379   15%
Anon                         41590               162    1%
Exec and libs                 5231                20    0%
Page cache                    2883                11    0%
Free (cachelist)            800042              3125   19%
Free (freelist)            2192512              8564   52%
Total                      4191754             16374
Physical                   4102251             16024

'ZFS File Data' shows the size of the pages associated with the 'zvp' vnode, which is the size allocated from the 'zio_data_buf_XXX' kmem caches. It does not include on-disk meta data or in-core data. It also contains some amount of data used by ZFS I/O routines.

Solaris 11.2

> ::memstat
Page Summary                 Pages             Bytes  %Tot
-----------------  ---------------  ----------------  ----
Kernel                      237329              1.8G   23%
Guest                            0                 0    0%
ZFS Metadata                 28989            226.4M    3%
ZFS File Data               699858              5.3G   67%
Anon                         41418            323.5M    4%
Exec and libs                 1366             10.6M    0%
Page cache                    4782             37.3M    0%
Free (cachelist)              1017              7.9M    0%
Free (freelist)              33817            264.1M    3%
Total                      1048576                8G

'ZFS File Data' shows the size allocated from the 'zfs_file_data_buf' vmem arena. 'ZFS Metadata' shows the size of the "pages associated with zvp" minus 'ZFS File Data'.
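On the releases above where cached file data lives in the 'zio_data_buf_XXX' kmem caches (Solaris 10 through Solaris 11.2), a rough estimate of that data can also be derived from the kmem cache kstats. This is only a sketch: it multiplies buffers in use by buffer size, ignores slab and magazine overhead, and does not apply to Solaris 11.3 or later, where the KOM classes are used instead.

% kstat -p -c kmem_cache | awk -F'[:\t]' '
      $3 ~ /^zio_data_buf_/ && $4 == "buf_inuse" { inuse[$3] = $5 }
      $3 ~ /^zio_data_buf_/ && $4 == "buf_size"  { bsize[$3] = $5 }
      END {
          for (c in inuse) total += inuse[c] * bsize[c]
          printf("approx. data in zio_data_buf caches: %.1f GB\n", total / 1073741824)
      }'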
Solaris 11.3 prior to SRU 17.5

> ::memstat
Page Summary                 Pages             Bytes  %Tot
-----------------  ---------------  ----------------  ----
Kernel                      558607              4.2G    7%
ZFS Metadata                 27076            211.5M    0%
ZFS File Data              2743214             20.9G   33%
Anon                         68656            536.3M    1%
Exec and libs                 2067             16.1M    0%
Page cache                    7285             56.9M    0%
Free (cachelist)             21596            168.7M    0%
Free (freelist)            4927709             37.5G   59%
Total                      8372224             63.8G

> ::kom_class
ADDR            FLAGS  NAME          RSS     MEM_TOTAL
4c066e91d80     -L-    arc_meta      211.5m  280m
4c066e91c80     ---    arc_data      20.9g   20.9g

'ZFS File Data' shows the KOM statistics of 'arc_data'. 'ZFS Metadata' shows the KOM statistics of 'arc_meta'.

Solaris 11.3 SRU 17.5 or later, prior to SRU 21.5

> ::memstat -v
Page Summary                            Pages             Bytes  %Tot
----------------------------  ---------------  ----------------  ----
Kernel                                 636916              4.8G    4%
Kernel (ZFS ARC excess)                 16053            125.4M    0%
Defdump prealloc                       291049              2.2G    2%
ZFS Metadata                           137434              1.0G    1%
ZFS File Data                         4244593             32.3G   25%
Anon                                   114975            898.2M    1%
Exec and libs                            2000             15.6M    0%
Page cache                              15548            121.4M    0%
Free (cachelist)                       253689              1.9G    2%
Free (freelist)                      11064959             84.4G   66%
Total                                16777216              128G

::memstat on Solaris 11.3 SRU 17.5 or later has a '-v' option to show the details. 'ZFS File Data' and 'ZFS Metadata' show the KOM statistics, the same as before. In addition, 'Kernel (ZFS ARC excess)' shows memory wasted on top of the sum of 'ZFS File Data' and 'ZFS Metadata': KOM can keep allocated memory which is not actually used at the moment, and that memory is considered wasted.

Solaris 11.3 SRU 21.5 or later

> ::memstat -v
Page Summary                            Pages             Bytes  %Tot
----------------------------  ---------------  ----------------  ----
Kernel                                 671736              2.5G    6%
Kernel (ZFS ARC excess)                 21159             82.6M    0%
Defdump prealloc                       361273              1.3G    3%
ZFS Kernel Data                        131699            514.4M    1%
ZFS Metadata                            42962            167.8M    0%
ZFS File Data                         8857479             33.7G   84%
Anon                                    99066            386.9M    1%
Exec and libs                            2050              8.0M    0%
Page cache                               9265             36.1M    0%
Free (cachelist)                        14663             57.2M    0%
Free (freelist)                        273905              1.0G    3%
Total                                10485257             39.9G

In addition to the information available prior to Solaris 11.3 SRU 21.5, 'ZFS Kernel Data' shows the size allocated from the 'kmem_default_zfs' arena (and its overhead).

Solaris 11.4 or later

> ::memstat -v
Usage Type/Subtype                      Pages     Bytes  %Tot  %Tot/%Subt
----------------------------  ---------------  --------  ----  ----------
Kernel                                3669091     13.9g  7.2%
  Regular Kernel                      2602037      9.9g         5.1%/70.9%
  ZFS ARC Fragmentation                 14515     56.6m         0.0%/ 0.3%
  Defdump prealloc                    1052539      4.0g         2.0%/28.6%
ZFS                                  28359638    108.1g 56.3%
  ZFS Metadata                         116083    453.4m         0.2%/ 0.4%
  ZFS Data                           27959629    106.6g        55.5%/98.5%
  ZFS Kernel Data                      283926      1.0g         0.5%/ 1.0%
User/Anon                              201462    786.9m  0.4%
Exec and libs                            3062     11.9m  0.0%
Page Cache                              29372    114.7m  0.0%
Free (cachelist)                          944      3.6m  0.0%
Free                                 18033911     68.7g 35.8%
Total                                50297480    191.8g  100%

'ZFS ARC Fragmentation' under 'Kernel' shows the wasted memory.
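All of the mdb dcmds shown above can also be run non-interactively from the shell, in the same way as the '::memstat' pipeline at the beginning of this document, which is convenient when collecting data repeatedly. Note that the '-v' option and '::kom_class' exist only on the releases listed above.

# echo "::memstat -v" | mdb -k > /var/tmp/memstat-v.out     # Solaris 11.3 SRU 17.5 or later
# echo "::kom_class" | mdb -k > /var/tmp/kom_class.out      # Solaris 11.3 or later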
Why do the values reported by ::memstat differ from the size reported by arcstats?

There are a few factors. The ARC size includes cached on-disk file data, cached on-disk meta data, and various in-core data, but ::memstat does not report each of them separately. Prior to Solaris 11.2, only 'ZFS File Data' is reported; even on Solaris 11.2 and 11.3, in-core data is not reported. Also, the accounting by arcstats and by ::memstat does not completely match. ::memstat on Solaris 11.3 SRU 21.5 or later reports in-core data as 'ZFS Kernel Data', although the in-core data counted by arcstats and by ::memstat is not exactly the same.

Another factor is wasted memory in the kmem caches. Consider a possible scenario: a customer ran a workload that was largely 128K-blocksize based. This filled the ARC cache with, say, X GB of 128K blocks. The customer then switched to a workload that was 8K based, and the ARC cache filled up with Y GB of 8K blocks (the 128K blocks were evicted). When the 128K blocks are evicted from the ARC cache, they are returned to the 'zio_data_buf_131072' kmem cache, where they stay (unused by the ARC) until they are either re-allocated or "reaped" by the VM system. Under such a condition, 'ZFS File Data' shown by ::memstat can be much higher than the ARC size. In particular, from Solaris 11.1 with SRU 3.4 through Solaris 11.1 with SRU 21.4, large pages are used by default and the situation can be worse. ::memstat reports such waste as 'Kernel (ZFS ARC excess)' on Solaris 11.3 SRU 17.5 or later, or as 'ZFS ARC Fragmentation' on Solaris 11.4 or later. It can also happen that 'ZFS File Data' is higher than the ARC size even though 'ZFS ARC excess' / 'ZFS ARC Fragmentation' is not high; in this case the ARC memory has been freed but still has KOM objects associated with it.

As discussed above, the values reported by ::memstat do not have to match the ZFS ARC size. It is not an issue if the ::memstat values are higher or lower than the ZFS ARC size.
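To see both views side by side on a live system, the ARC size from arcstats can be printed next to the ZFS lines from ::memstat. This is just a convenience sketch; as explained above, the two values are not expected to match.

% kstat -p zfs:0:arcstats:size | \
      awk '{ printf("ARC size (arcstats): %.1f GB\n", $2 / 1073741824) }'
# echo "::memstat" | mdb -k | egrep 'ZFS|Total'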
