Processes with OnOutOfMemoryError are using a lot of I/O

This is the output I get from ps -ef | grep OnOutOfMemoryError:
omm      13141     1  3 Jul26 ?        2-01:49:42 /opt/huawei/Bigdata/jdk1.8.0_72/bin/java -Dprocess.name=nodeagent -Dbeetle.application.home.path=/opt/huawei/Bigdata/security/config -Dsun.rmi.transport.tcp.responseTimeout=60000 -Djava.library.path=/opt/huawei/Bigdata/nodeagent/lib -XX:ErrorFile=/var/log/Bigdata/nodeagent/agentlog/hs_error%p.log -Xms128m -Xmx2048m -XX:MetaspaceSize=64M -XX:MaxMetaspaceSize=128m -Dnodeagent.log.dir=/var/log/Bigdata/nodeagent -Dnodeagent.home=/opt/huawei/Bigdata/nodeagent -cp /opt/huawei/Bigdata/nodeagent/etc/agent:/opt/huawei/Bigdata/nodeagent/share/om/common/*:/opt/huawei/Bigdata/nodeagent/share/om/agent/*:/opt/huawei/Bigdata/nodeagent/share/om/lib/*:/opt/huawei/Bigdata/nodeagent/etc/om com.huawei.hadoop.om.agent.services.NodeAgent -XX:OnOutOfMemoryError=kill -9 %p
omm      18364     1 62 Jul26 ?        33-17:30:30 /opt/huawei/Bigdata/jdk1.8.0_72//bin/java -Dproc_datanode -XX:OnOutOfMemoryError=kill -9 %p -Xmx20622m -DIgnoreReplayReqDetect -Djava.security.auth.login.config=/opt/huawei/Bigdata/FusionInsight_V100R002C60SPC200/etc/1_8_DataNode/jaas.conf -Dzookeeper.server.principal=zookeeper/hadoop.hadoop_seq.com -Djava.security.krb5.conf=/opt/huawei/Bigdata/FusionInsight_V100R002C60SPC200/etc/1_6_KerberosClient/kdc.conf -Dzookeeper.request.timeout=120000 -Xms8G -Xmx8G -XX:NewSize=512M -XX:MaxNewSize=1G -XX:MetaspaceSize=128M -XX:MaxMetaspaceSize=128M -XX:CMSFullGCsBeforeCompaction=1 -XX:MaxDirectMemorySize=1G -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+UseCMSCompactAtFullCollection -XX:CMSInitiatingOccupancyFraction=65 -Xloggc:/var/log/Bigdata/hdfs/dn/datanode-omm-gc.log -XX:+PrintGCDetails -Dsun.rmi.dgc.client.gcInterval=0x7FFFFFFFFFFFFFE -Dsun.rmi.dgc.server.gcInterval=0x7FFFFFFFFFFFFFE -XX:-OmitStackTraceInFastThrow -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=1M -Xmx20622M -XX:ErrorFile=/var/log/Bigdata/hdfs/dn/datanode-omm-gc-err-%p.log -Dhadoop.log.dir=/var/log/Bigdata/hdfs/dn -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/opt/huawei/Bigdata/FusionInsight/FusionInsight-Hadoop-2.7.2/hadoop -Dhadoop.id.str= -Dhadoop.root.logger=INFO,RFA -Dlog4j.configuration=log4j.properties -Djava.library.path=/opt/huawei/Bigdata/FusionInsight/FusionInsight-Hadoop-2.7.2/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dlog4j.configuration.watch=true -Dhadoop.log.dir=/var/log/Bigdata/hdfs/dn -Dhadoop.log.file=hadoop-omm-datanode-pc-zjqbrq61.log -Dhadoop.home.dir=/opt/huawei/Bigdata/FusionInsight/FusionInsight-Hadoop-2.7.2/hadoop -Dhadoop.id.str=omm -Dhadoop.root.logger=INFO,RFA -Dlog4j.configuration=log4j.properties -Djava.library.path=/opt/huawei/Bigdata/FusionInsight/FusionInsight-Hadoop-2.7.2/hadoop/lib/native 
-Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -server -Dhadoop.metrics.log.file=hadoop-omm-datanode-metrics-pc-zjqbrq61.log -Dhadoop.security.logger=INFO,RFAS org.apache.hadoop.hdfs.server.datanode.DataNode
omm      20735     1 42 Jul26 ?        22-22:07:28 /opt/huawei/Bigdata/jdk1.8.0_72//bin/java -Dproc_nodemanager -XX:OnOutOfMemoryError=kill -9 %p -Xmx1000m -Dlibrary.leveldbjni.path=/srv/BigData/tmp/nm -DIgnoreReplayReqDetect -Djava.security.auth.login.config=/opt/huawei/Bigdata/FusionInsight_V100R002C60SPC200/etc/1_9_NodeManager/jaas.conf -Dzookeeper.server.principal=zookeeper/hadoop.hadoop_seq.com -Djava.security.krb5.conf=/opt/huawei/Bigdata/FusionInsight_V100R002C60SPC200/etc/1_6_KerberosClient/kdc.conf -Dzookeeper.request.timeout=120000 -Xms2G -Xmx4G -XX:NewSize=128M -XX:MaxNewSize=128M -XX:MetaspaceSize=128M -XX:MaxMetaspaceSize=128M -XX:CMSFullGCsBeforeCompaction=1 -XX:MaxDirectMemorySize=128M -XX:+UseConcMarkSweepGC -XX:+UseCMSCompactAtFullCollection -XX:CMSInitiatingOccupancyFraction=65 -Xloggc:/var/log/Bigdata/yarn/nm/nodemanager-omm-gc.log -XX:+PrintGCDetails -Dsun.rmi.dgc.client.gcInterval=0x7FFFFFFFFFFFFFE -Dsun.rmi.dgc.server.gcInterval=0x7FFFFFFFFFFFFFE -XX:-OmitStackTraceInFastThrow -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=1M -XX:ErrorFile=/var/log/Bigdata/yarn/nm/nodemanager-omm-gc-err-%p.log -Dlog4j.configuration.watch=true -server -Dhadoop.log.dir=/var/log/Bigdata/yarn/nm -Dyarn.log.dir=/var/log/Bigdata/yarn/nm -Dhadoop.log.file=yarn-omm-nodemanager-pc-zjqbrq61.log -Dyarn.log.file=yarn-omm-nodemanager-pc-zjqbrq61.log -Dyarn.home.dir=/opt/huawei/Bigdata/FusionInsight/FusionInsight-Hadoop-2.7.2/hadoop -Dhadoop.home.dir=/opt/huawei/Bigdata/FusionInsight/FusionInsight-Hadoop-2.7.2/hadoop -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Dlog4j.configuration=log4j.properties -Djava.library.path=/opt/huawei/Bigdata/FusionInsight/FusionInsight-Hadoop-2.7.2/hadoop/lib/native -classpath 
/opt/huawei/Bigdata/FusionInsight_V100R002C60SPC200/etc/1_9_NodeManager:/opt/huawei/Bigdata/FusionInsight_V100R002C60SPC200/etc/1_9_NodeManager:/opt/huawei/Bigdata/FusionInsight_V100R002C60SPC200/etc/1_9_NodeManager:/opt/huawei/Bigdata/FusionInsight/FusionInsight-Hadoop-2.7.2/hadoop/share/hadoop/common/lib/*:/opt/huawei/Bigdata/FusionInsight/FusionInsight-Hadoop-2.7.2/hadoop/share/hadoop/common/*:/opt/huawei/Bigdata/FusionInsight/FusionInsight-Hadoop-2.7.2/hadoop/share/hadoop/hdfs:/opt/huawei/Bigdata/FusionInsight/FusionInsight-Hadoop-2.7.2/hadoop/share/hadoop/hdfs/lib/*:/opt/huawei/Bigdata/FusionInsight/FusionInsight-Hadoop-2.7.2/hadoop/share/hadoop/hdfs/*:/opt/huawei/Bigdata/FusionInsight/FusionInsight-Hadoop-2.7.2/hadoop/share/hadoop/yarn/lib/*:/opt/huawei/Bigdata/FusionInsight/FusionInsight-Hadoop-2.7.2/hadoop/share/hadoop/yarn/*:/opt/huawei/Bigdata/FusionInsight/FusionInsight-Hadoop-2.7.2/hadoop/share/hadoop/mapreduce/lib/*:/opt/huawei/Bigdata/FusionInsight/FusionInsight-Hadoop-2.7.2/hadoop/share/hadoop/mapreduce/*:/opt/huawei/Bigdata/FusionInsight/FusionInsight-Hadoop-2.7.2/hadoop/share/hadoop/yarn/*:/opt/huawei/Bigdata/FusionInsight/FusionInsight-Hadoop-2.7.2/hadoop/share/hadoop/yarn/lib/* org.apache.hadoop.yarn.server.nodemanager.NodeManager
root     21233  8984  0 15:23 pts/1    00:00:00 grep OnOutOfMemoryError
omm      26523 26508  0 Jul26 ?        03:17:24 /opt/huawei/Bigdata/jdk1.8.0_72//bin/java -Dproc_regionserver -XX:OnOutOfMemoryError=kill -9 %p -Xmx4124m -DIgnoreReplayReqDetect -Djava.security.auth.login.config=/opt/huawei/Bigdata/FusionInsight_V100R002C60SPC200/etc/1_19_RegionServer2/jaas.conf -Dzookeeper.server.principal=zookeeper/hadoop.hadoop_seq.com -Dzookeeper.request.timeout=120000 -Djava.security.krb5.conf=/opt/huawei/Bigdata/FusionInsight_V100R002C60SPC200/etc/1_6_KerberosClient/kdc.conf -Dzookeeper.client.bind.address=10.78.155.128 -server -Xms4G -XX:NewSize=512M -XX:MaxNewSize=512M -XX:MetaspaceSize=128M -XX:MaxMetaspaceSize=512M -XX:MaxDirectMemorySize=512M -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=65 -XX:+PrintGCDetails -Dsun.rmi.dgc.client.gcInterval=0x7FFFFFFFFFFFFFE -Dsun.rmi.dgc.server.gcInterval=0x7FFFFFFFFFFFFFE -XX:-OmitStackTraceInFastThrow -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=1M -Xloggc:/var/log/Bigdata/hbase2/rs/regionserver-omm-20170726105148-gc.log -Djava.io.tmpdir=/srv/BigData/tmp/snappy_hbase2 -Dorg.xerial.snappy.tempdir=/srv/BigData/tmp/snappy_hbase2 -Dbeetle.application.home.path=/opt/huawei/Bigdata/security/config -Dhbase.log.dir=/var/log/Bigdata/hbase2/rs -Dhbase.log.file=hbase-omm-regionserver-pc-zjqbrq61.log -Dhbase.home.dir=/opt/huawei/Bigdata/FusionInsight/FusionInsight-HBase-1.0.2/hbase -Dhbase.id.str=omm -Dhbase.root.logger=INFO,RFA -Djava.library.path=/opt/huawei/Bigdata/FusionInsight/FusionInsight-HBase-1.0.2/hbase/lib/native:/opt/huawei/Bigdata/FusionInsight/FusionInsight-HBase-1.0.2/hbase/lib/native/Linux-amd64-64 -Dlog4j.configuration.watch=true -Dhbase.security.logger=INFO,RFAS org.apache.hadoop.hbase.regionserver.HRegionServer start
omm      27643 27623 65 Jul26 ?        35-16:55:29 /opt/huawei/Bigdata/jdk1.8.0_72//bin/java -Dproc_regionserver -XX:OnOutOfMemoryError=kill -9 %p -Xmx20622m -DIgnoreReplayReqDetect -Djava.security.auth.login.config=/opt/huawei/Bigdata/FusionInsight_V100R002C60SPC200/etc/1_10_RegionServer/jaas.conf -Dzookeeper.server.principal=zookeeper/hadoop.hadoop_seq.com -Dzookeeper.request.timeout=120000 -Djava.security.krb5.conf=/opt/huawei/Bigdata/FusionInsight_V100R002C60SPC200/etc/1_6_KerberosClient/kdc.conf -Dzookeeper.client.bind.address=10.78.155.128 -server -Xms16G -XX:NewSize=1G -XX:MaxNewSize=1G -XX:MetaspaceSize=128M -XX:MaxMetaspaceSize=512M -XX:MaxDirectMemorySize=512M -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=85 -XX:+PrintGCDetails -Dsun.rmi.dgc.client.gcInterval=0x7FFFFFFFFFFFFFE -Dsun.rmi.dgc.server.gcInterval=0x7FFFFFFFFFFFFFE -XX:-OmitStackTraceInFastThrow -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=1M -Xloggc:/var/log/Bigdata/hbase/rs/regionserver-omm-20170726105153-gc.log -Djava.io.tmpdir=/srv/BigData/tmp/snappy_hbase -Dorg.xerial.snappy.tempdir=/srv/BigData/tmp/snappy_hbase -Dbeetle.application.home.path=/opt/huawei/Bigdata/security/config -Dhbase.log.dir=/var/log/Bigdata/hbase/rs -Dhbase.log.file=hbase-omm-regionserver-pc-zjqbrq61.log -Dhbase.home.dir=/opt/huawei/Bigdata/FusionInsight/FusionInsight-HBase-1.0.2/hbase -Dhbase.id.str=omm -Dhbase.root.logger=INFO,RFA -Djava.library.path=/opt/huawei/Bigdata/FusionInsight/FusionInsight-HBase-1.0.2/hbase/lib/native:/opt/huawei/Bigdata/FusionInsight/FusionInsight-HBase-1.0.2/hbase/lib/native/Linux-amd64-64 -Dlog4j.configuration.watch=true -Dhbase.security.logger=INFO,RFAS org.apache.hadoop.hbase.regionserver.HRegionServer start
omm      28904 28888 82 Jul26 ?        44-12:38:09 /opt/huawei/Bigdata/jdk1.8.0_72//bin/java -Dproc_regionserver -XX:OnOutOfMemoryError=kill -9 %p -Xmx8249m -DIgnoreReplayReqDetect -Djava.security.auth.login.config=/opt/huawei/Bigdata/FusionInsight_V100R002C60SPC200/etc/1_13_RegionServer1/jaas.conf -Dzookeeper.server.principal=zookeeper/hadoop.hadoop_seq.com -Dzookeeper.request.timeout=120000 -Djava.security.krb5.conf=/opt/huawei/Bigdata/FusionInsight_V100R002C60SPC200/etc/1_6_KerberosClient/kdc.conf -Dzookeeper.client.bind.address=10.78.155.128 -server -Xms4G -XX:NewSize=512M -XX:MaxNewSize=512M -XX:MetaspaceSize=128M -XX:MaxMetaspaceSize=512M -XX:MaxDirectMemorySize=512M -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=65 -XX:+PrintGCDetails -Dsun.rmi.dgc.client.gcInterval=0x7FFFFFFFFFFFFFE -Dsun.rmi.dgc.server.gcInterval=0x7FFFFFFFFFFFFFE -XX:-OmitStackTraceInFastThrow -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=1M -Xloggc:/var/log/Bigdata/hbase1/rs/regionserver-omm-20170726105155-gc.log -Djava.io.tmpdir=/srv/BigData/tmp/snappy_hbase1 -Dorg.xerial.snappy.tempdir=/srv/BigData/tmp/snappy_hbase1 -Dbeetle.application.home.path=/opt/huawei/Bigdata/security/config -Dhbase.log.dir=/var/log/Bigdata/hbase1/rs -Dhbase.log.file=hbase-omm-regionserver-pc-zjqbrq61.log -Dhbase.home.dir=/opt/huawei/Bigdata/FusionInsight/FusionInsight-HBase-1.0.2/hbase -Dhbase.id.str=omm -Dhbase.root.logger=INFO,RFA -Djava.library.path=/opt/huawei/Bigdata/FusionInsight/FusionInsight-HBase-1.0.2/hbase/lib/native:/opt/huawei/Bigdata/FusionInsight/FusionInsight-HBase-1.0.2/hbase/lib/native/Linux-amd64-64 -Dlog4j.configuration.watch=true -Dhbase.security.logger=INFO,RFAS org.apache.hadoop.hbase.regionserver.HRegionServer start
omm      29554 29540  0 Jul26 ?        03:17:50 /opt/huawei/Bigdata/jdk1.8.0_72//bin/java -Dproc_regionserver -XX:OnOutOfMemoryError=kill -9 %p -Xmx4124m -DIgnoreReplayReqDetect -Djava.security.auth.login.config=/opt/huawei/Bigdata/FusionInsight_V100R002C60SPC200/etc/1_20_RegionServer3/jaas.conf -Dzookeeper.server.principal=zookeeper/hadoop.hadoop_seq.com -Dzookeeper.request.timeout=120000 -Djava.security.krb5.conf=/opt/huawei/Bigdata/FusionInsight_V100R002C60SPC200/etc/1_6_KerberosClient/kdc.conf -Dzookeeper.client.bind.address=10.78.155.128 -server -Xms4G -XX:NewSize=512M -XX:MaxNewSize=512M -XX:MetaspaceSize=128M -XX:MaxMetaspaceSize=512M -XX:MaxDirectMemorySize=512M -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=65 -XX:+PrintGCDetails -Dsun.rmi.dgc.client.gcInterval=0x7FFFFFFFFFFFFFE -Dsun.rmi.dgc.server.gcInterval=0x7FFFFFFFFFFFFFE -XX:-OmitStackTraceInFastThrow -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=1M -Xloggc:/var/log/Bigdata/hbase3/rs/regionserver-omm-20170726105158-gc.log -Djava.io.tmpdir=/srv/BigData/tmp/snappy_hbase3 -Dorg.xerial.snappy.tempdir=/srv/BigData/tmp/snappy_hbase3 -Dbeetle.application.home.path=/opt/huawei/Bigdata/security/config -Dhbase.log.dir=/var/log/Bigdata/hbase3/rs -Dhbase.log.file=hbase-omm-regionserver-pc-zjqbrq61.log -Dhbase.home.dir=/opt/huawei/Bigdata/FusionInsight/FusionInsight-HBase-1.0.2/hbase -Dhbase.id.str=omm -Dhbase.root.logger=INFO,RFA -Djava.library.path=/opt/huawei/Bigdata/FusionInsight/FusionInsight-HBase-1.0.2/hbase/lib/native:/opt/huawei/Bigdata/FusionInsight/FusionInsight-HBase-1.0.2/hbase/lib/native/Linux-amd64-64 -Dlog4j.configuration.watch=true -Dhbase.security.logger=INFO,RFAS org.apache.hadoop.hbase.regionserver.HRegionServer start
omm      30202 30097  0 Jul26 ?        03:16:51 /opt/huawei/Bigdata/jdk1.8.0_72//bin/java -Dproc_regionserver -XX:OnOutOfMemoryError=kill -9 %p -Xmx4124m -DIgnoreReplayReqDetect -Djava.security.auth.login.config=/opt/huawei/Bigdata/FusionInsight_V100R002C60SPC200/etc/1_21_RegionServer4/jaas.conf -Dzookeeper.server.principal=zookeeper/hadoop.hadoop_seq.com -Dzookeeper.request.timeout=120000 -Djava.security.krb5.conf=/opt/huawei/Bigdata/FusionInsight_V100R002C60SPC200/etc/1_6_KerberosClient/kdc.conf -Dzookeeper.client.bind.address=10.78.155.128 -server -Xms4G -XX:NewSize=512M -XX:MaxNewSize=512M -XX:MetaspaceSize=128M -XX:MaxMetaspaceSize=512M -XX:MaxDirectMemorySize=512M -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=65 -XX:+PrintGCDetails -Dsun.rmi.dgc.client.gcInterval=0x7FFFFFFFFFFFFFE -Dsun.rmi.dgc.server.gcInterval=0x7FFFFFFFFFFFFFE -XX:-OmitStackTraceInFastThrow -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=1M -Xloggc:/var/log/Bigdata/hbase4/rs/regionserver-omm-20170726105200-gc.log -Djava.io.tmpdir=/srv/BigData/tmp/snappy_hbase4 -Dorg.xerial.snappy.tempdir=/srv/BigData/tmp/snappy_hbase4 -Dbeetle.application.home.path=/opt/huawei/Bigdata/security/config -Dhbase.log.dir=/var/log/Bigdata/hbase4/rs -Dhbase.log.file=hbase-omm-regionserver-pc-zjqbrq61.log -Dhbase.home.dir=/opt/huawei/Bigdata/FusionInsight/FusionInsight-HBase-1.0.2/hbase -Dhbase.id.str=omm -Dhbase.root.logger=INFO,RFA -Djava.library.path=/opt/huawei/Bigdata/FusionInsight/FusionInsight-HBase-1.0.2/hbase/lib/native:/opt/huawei/Bigdata/FusionInsight/FusionInsight-HBase-1.0.2/hbase/lib/native/Linux-amd64-64 -Dlog4j.configuration.watch=true -Dhbase.security.logger=INFO,RFAS org.apache.hadoop.hbase.regionserver.HRegionServer start
igloo1986

Upvoted by: lvwenyuan

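OnOutOfMemoryError is a JVM startup parameter. Running ps -ef | grep OnOutOfMemoryError only lists the processes that were started with this JVM option; it does not indicate any problem, and it is not an error. In an HBase cluster, HBase itself does not work with the local disks directly, so the I/O must be coming from the DataNode.
The HBase operations that generate I/O are:
compaction, flushing, writing logs, and log splitting during recovery after a crash. There is almost nothing else.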
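To make the point about the startup parameter concrete, here is a minimal Python sketch that reads ps -ef lines and reports, for each Java process, its PID, its role, and whether it was started with the option. The function name summarize_ps_lines is hypothetical, and taking the role from -Dproc_<name> or -Dprocess.name=<name> is an assumption based only on the command lines pasted in the question. The sample lines are shortened copies of that output.

```python
import re

def summarize_ps_lines(lines):
    """Return (pid, role, has_oom_flag) for each java line of `ps -ef` output."""
    rows = []
    for line in lines:
        # ps -ef columns: UID PID PPID C STIME TTY TIME CMD (CMD may contain spaces)
        fields = line.split(None, 7)
        if len(fields) < 8 or "/bin/java" not in fields[7]:
            continue  # skips short lines and non-java matches such as the grep itself
        cmd = fields[7]
        # Assumption: the role is carried by -Dproc_<name> or -Dprocess.name=<name>
        role = re.search(r"-D(?:proc_|process\.name=)(\S+)", cmd)
        rows.append((
            int(fields[1]),
            role.group(1) if role else "unknown",
            "-XX:OnOutOfMemoryError=" in cmd,
        ))
    return rows

# Shortened sample lines taken from the question's output
sample = [
    "omm      18364     1 62 Jul26 ?        33-17:30:30 /opt/huawei/Bigdata/jdk1.8.0_72//bin/java -Dproc_datanode -XX:OnOutOfMemoryError=kill -9 %p -Xmx20622m",
    "root     21233  8984  0 15:23 pts/1    00:00:00 grep OnOutOfMemoryError",
]
for pid, role, has_flag in summarize_ps_lines(sample):
    print(pid, role, has_flag)
```

The flag appears on every Hadoop and HBase daemon on the node, so matching on it says nothing about which process is actually doing the I/O.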

libis - HBase enthusiast

It is not clear what the question is.

lvwenyuan

I checked with iotop. Most of the I/O comes from processes like java -Dproc_datanode -XX:OnOutOfMemoryError=kill -9 %p -Xmx20622m -DIgnoreReplayReqDete~1.log -Dhadoop.security.logger=INFO,RFAS org.apache.hadoop.hdfs.server.datanode.DataNode, and their writes also take up a fair share of the I/O.
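For anyone who wants to check the same thing without iotop, the rough sketch below reads the cumulative disk counters Linux exposes in /proc/<pid>/io. The helper names parse_proc_io and disk_bytes are hypothetical, and the numbers in the sample text are made up for illustration. Reading another user's /proc/<pid>/io normally requires root or the same user as the process.

```python
def parse_proc_io(text):
    """Parse the contents of /proc/<pid>/io into a dict of integer counters."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            counters[key.strip()] = int(value)
    return counters

def disk_bytes(pid):
    """Return (read_bytes, write_bytes) for a PID: bytes that actually hit storage."""
    with open(f"/proc/{pid}/io") as f:
        counters = parse_proc_io(f.read())
    return counters.get("read_bytes", 0), counters.get("write_bytes", 0)

# Illustrative file contents; the values are invented, not measured
sample_io = """rchar: 4096
wchar: 8192
syscr: 10
syscw: 20
read_bytes: 1024
write_bytes: 2048
cancelled_write_bytes: 0
"""
print(parse_proc_io(sample_io)["write_bytes"])
```

These counters are totals since the process started, so sampling disk_bytes(pid) twice a few seconds apart and taking the difference gives a rate for the DataNode and RegionServer PIDs listed in the question.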

igloo1986

Yes. Apart from its local logs, HBase writes all of its data through HDFS, and the DataNode is the HDFS service that finally persists that data to disk.
