一、背景简介
作为一个DBA难免不了会遇到性能问题,那么我们遇到性能问题该如何进行排查呢?例如我们在高并发的业务下,出现业务响应慢,处理时间长我们又该如何入手进行排查,本片文章将分析io高的情况下如何分析及定位。
二、环境复现环境配置:本次测试使用C_G_4TSSD服务器配置,MySQL版本为8.0.27场景模拟:使用sysbench创建5个表,每个表2亿条数据,执行产生笛卡尔积查询的sql语句,产生io,可以模拟业务压力。首先使用sysbench进行数据压测三、系统层面底层故障排查Shellsysbench--test=/usr/local/share/sysbench/oltp_insert.lua--mysql-host=XXX--mysql-port=--mysql-user=pcms--mysql-password=abc--mysql-db=sysbench--percentile=99--table-size=--tables=5--threads=prepare
使用sysbench进行模拟高并发
shellsysbench--test=/usr/local/share/sysbench/oltp_write_only.lua--mysql-host=xxx--mysql-port=--mysql-user=pcms--mysql-password=abc--mysql-db=sysbench--percentile=99--table-size=--tables=5--threads=--max-time=--report-interval=1--threads=--max-requests=0--mysql-ignore-errors=allrun
执行笛卡尔积sql语句
mysqlselectSQL_NO_CACHEb.id,a.kfromsbtest_aaleftjoinsbtest_bbona.id=b.idgroupbya.korderbyb.cdesc;3.1检查当前服务器状态
shelltoptop-19:49:05up10days,8:16,2users,loadaverage:72.56,40.21,17.08Tasks:8total,1running,sleeping,0stopped,0zombie%Cpu(s):19.7us,4.2sy,0.0ni,75.9id,1.0wa,0.0hi,0.2si,0.0stKiBMem:+total,+free,+used,buff/cacheKiBSwap:0total,0free,0used.06+availMem
由上可知:目前一分钟负载为72.56,且呈上升趋势,并且存在io压力
3.2查看当前各个磁盘设备的io情况shelliostat-m-x1Linux4.14.0-.el7a.0.1.aarch64(mysql-4)01/08/_aarch64_(CPU)Device:rrqm/swrqm/sr/sw/srMB/swMB/savgrq-szavgqu-szawaitr_awaitw_awaitsvctm%utilsda0.............00sdb0.............00dm-00.............00dm-10.............00dm-20.............00
由上可知:目前有多块物理磁盘,sda磁盘的io压力较大
3.3检查sda磁盘当前的io读写情况shelliostat-d/dev/sda-m-x1Linux4.14.0-.el7a.0.1.aarch64(mysql-4)01/08/_aarch64_(CPU)Device:rrqm/swrqm/sr/sw/srMB/swMB/savgrq-szavgqu-szawaitr_awaitw_awaitsvctm%utilsda0.............33Device:rrqm/swrqm/sr/sw/srMB/swMB/savgrq-szavgqu-szawaitr_awaitw_awaitsvctm%utilsda0.............00Device:rrqm/swrqm/sr/sw/srMB/swMB/savgrq-szavgqu-szawaitr_awaitw_awaitsvctm%utilsda0...0044...3......0.00Device:rrqm/swrqm/sr/sw/srMB/swMB/savgrq-szavgqu-szawaitr_awaitw_awaitsvctm%utilsda0.............00
由上可知:目前sda磁盘的压力比较大,每秒写入比每秒读差距较大,证明目前有大量的io写入
3.4检查sda磁盘中哪个应用程序占用的io比较高shellpidstat-d1Linux4.14.0-.el7a.0.1.aarch64(mysql-4)01/08/_aarch64_(CPU)08:01:43PMUIDPIDkB_rd/skB_wr/skB_ccwr/sCommand08:01:44PM62018...00mysqld08:01:44PMUIDPIDkB_rd/skB_wr/skB_ccwr/sCommand08:01:45PM...00mysqld08:01:45PMUIDPIDkB_rd/skB_wr/skB_ccwr/sCommand08:01:46PM141.00433..00mysqld
由上可知:占用io高的应用程序是mysql,且pid为
3.5分析应用程序中哪一个线程占用的io比较高shellpidstat-dt-p1执行两三秒即可Average:-738...00
__mysqldAverage:-...00
__mysqld11:56:18PM-...00
__mysqld11:56:17PM-...00
__mysqldAverage:-.4003..00
__mysqld
由上可知:这个线程占用的io比较高
3.6分析这个线程在干什么?Shellperftrace-t-o/tmp/tmp_aa.pstraceShellcat/tmp/tmp_aa.pstrace.(1.ms):futex(uaddr:0xae9c4,op:WAIT
PRIVATE_FLAG,val:1)=.(0.ms):futex(uaddr:0xae,op:WAKE
PRIVATE_FLAG,val:1)=.(0.ms):clock_gettime(which_clock:MONOTONIC,tp:0xfff7bdf68)=.(0.ms):write(fd:,buf:0xd7a20,count:)=2852.(0.ms):clock_gettime(which_clock:MONOTONIC,tp:0xfff7bdf68)=.(0.ms):clock_gettime(which_clock:MONOTONIC,tp:0xfff7bdf58)=.(0.ms):clock_gettime(which_clock:MONOTONIC,tp:0xfff7bdf38)=.(0.ms):clock_gettime(which_clock:MONOTONIC,tp:0xfff7bdf68)=.(0.ms):clock_gettime(which_clock:MONOTONIC,tp:0xfff7bdf68)=.(0.ms):write(fd:64,buf:0xcd51e,count:)=2853.(0.ms):clock_gettime(which_clock:MONOTONIC,tp:0xfff7bdf68)=.(0.ms):clock_gettime(which_clock:MONOTONIC,tp:0xfff7bdf58)=.(0.ms):clock_gettime(which_clock:MONOTONIC,tp:0xfff7bdf38)=0
由上可知:目前这个线程在写入多个文件,fd为文件句柄,文件句柄号有64、
3.7查看这个文件句柄是什么shelllsof-p
grepumysqldmysqluREG8,/mysql/mysqldata/fff-5fd5-4c47-a-a9e1ad0d/tmp/#70(deleted)[root
mysql-4~]#lsof-pgrep64umysqldmysql64uREG8,79331/mysql/mysqldata/fff-5fd5-4c47-a-a9e1ad0d/tmp/#6979331(deleted)
由上可知:这个线程在大量的写入临时文件
四、分析MySQL应用程序4.1查看当前的会话列表mysqlselect*frominformation_schema.processlistwhere