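Consider the following when locating a fault:
Determine exactly what went wrong with the system.
What can the system still do, and what can it no longer do?
When did the fault occur?
Was anything done differently from normal operation?
Does the fault follow a pattern? Does it occur at fixed times or irregularly, and how often?
Is one machine affected or several, and are the symptoms the same?
Were there any recent changes, such as new hardware or software being installed or system settings being modified?

Collecting fault information
Collecting fault information is essential for diagnosing the cause of a fault and repairing the system.

1.1 Console information
Use the system console to collect system information: the hardware self-test messages printed at boot and the messages printed during normal operation. For example:

Sun Fire 480R, No Keyboard
Copyright 1998-2002 Sun Microsystems, Inc. All rights reserved.
OpenBoot 4.7.5, 4096 MB memory installed, Serial #54021334.
Ethernet address 0:3:ba:38:4c:d6, Host ID: 83384cd6.
Rebooting with command: boot
Boot device: /pci9,600000/SUNW,qlc2/fp0,0/diskw21000004cfd98e33,0:a File and args:
SunOS Release 5.8 Version Generic_117350-18 64-bit
Copyright 1983-2003 Sun Microsystems, Inc.
Starting VxVM restore daemon.
VxVM starting in boot mode.
/usr/sbin/prtconf: getexecname() failed
vxvm:vxconfigd: NOTICE: atf vendor_info: readlink fails for /dev/rdsk/c2t1d0s2: No such file or directory
NOTICE: vxvm:vxdmp: added disk array OTHER_DISKS, datype = OTHER_DISKS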
Unable to resolve duplicate diskid. Please refer to release notes and admin guide for possible action/solution.
Following are the disks with duplicate diskid:
Vendor: SEAGATE Product: ST336607FSUN36G - c1t1d0s2, c1t0d0s2
WARNING: Detaching plex rootvol-01 from volume rootvol
ERROR: Cannot start rootvol volume, no valid plexes
System startup failed
syncing file systems. done
Program terminated

1.2 System fault log (messages)
The syslog daemon starts automatically at boot and records hardware, software, and other operational messages. The fault log file is /var/adm/messages; it can be backed up or copied to another machine for analysis.

# vi /var/adm/messages

A short excerpt of error messages:

Apr 3 03:10:20 s9svr2 in.mpathd1967: ID 472890 daemon.error phyint_inst_v4_sockinit: setsockopt IP_DONTFAILOVER (inet rf2): Option not supported by protocol
11:20 s9svr2 last message repeated 3 times
40 s9svr2 in.mpathd1967:
26:08 s9svr2 rf: ID 885255 kern.notice NOTICE: rf1: link down detected: mii_stat:7809 restarting auto-negotiation
10 s9svr2 rf: ID 345559 kern.info rf0: auto-negotiation done
ID 345559 kern.info rf1:
ID 103695 kern.info rf0: Link up: 100 Mbps full duplex without flow control
ID 103695 kern.info rf1:
12 s9svr2 cl_runtime: ID 273354 kern.notice NOTICE: CMM: Node s9svr1 (nodeid = 1) is dead

Each record contains the time of the event, the event ID, and the event type. When reviewing the messages file, pay particular attention to entries whose type is error or warning.

1.3 LEDs on the host panel
LED states differ from model to model; consult the documentation of each product for their exact meaning. Normally, a green LED indicates that the device is operating correctly.

1.4 System boot fault log
During boot, some system messages, both normal and error messages, are not displayed directly but are recorded in a log. Use the dmesg command to display them.

# dmesg
Wed Apr 10 17:04:48 EDT 2002
Apr 10 16:39:35 s9svr2 genunix: ID 936769 kern.info devinfo0 is /pseudo/devinfo0
35 s9svr2 cl_runtime: ID 499756 kern.notice NOTICE: Node s9svr2: joined cluster.
36 s9svr2 cl_runtime: ID 487827 kern.notice NOTICE: CCR: Waiting for repository synchronization to finish.
37 s9svr2 pseudo: ID 129642 kern.info pseudo-device: clprivnet0
57 s9svr2 rootnex: ID 349649 kern.info ffb0 at root: UPA 0x1e 0x0
57 s9svr2 genunix: ID 936769 kern.info ffb0 is /SUNW,ffb1e,0
40:14 s9svr2 xntpd379: ID 301315 daemon.notice tickadj = 5, tick = 10000, tvu_maxslew = 495, est. hz = 100
ID 798731 daemon.notice using kernel phase-lock loop 0041
14 s9svr2 last message repeated 1 time
14 s9svr2 Cluster.Framework: ID 801593 daemon.notice stdout: releasing reservations for scsi-2 disks shared with s9svr1
21 s9svr2 Cluster.Framework: resetting scsi buses shared with non-cluster nodes
42:12 s9svr2 in.mpathd1962: ID 472890 daemon.error phyint_inst_v4_sockinit:

Note: as in the messages file, each record contains the time of the event, the event ID, and the event type. When reviewing these records, pay particular attention to entries whose type is error or warning.

1.5 MAIL
After a fault occurs, the system usually notifies the root user periodically by sending mail to root that reports the error.

# mail
From *.b Tue Apr 9 06:53:56 2002
Date: Tue, 9 Apr 2002 06:56 +0800 (CST)
From: Super-User
Message-Id:
To: *.b
Subject: Attempting VxVM relocation on host s9svr1
Content-Length: 940

Relocation was not successful for subdisks
on disk rootdisk_1 in volume lvtest1 in disk group rootdg. No replacement was made and the disk is still unusable.
The following volumes have storage on rootdisk_1:
lvtest1
lvtest3
rootdisk_16vol
rootvol
swapvol
These volumes are still usable, but the the redundancy of those volumes is reduced. Any RAID-5 volumes with storage on the failed disk may become unusable in the face of further failures.

1.6 Running the diagnostic program (prtdiag -v)
prtdiag checks and diagnoses the system hardware. Run it immediately whenever a hardware fault is found or suspected.

# prtdiag -v
System Configuration: Sun Microsystems sun4u Sun Ultra 30 UPA/PCI (UltraSPARC-II 296MHz)
System clock frequency: 99 MHz
Memory size: 512 Megabytes

= CPUs =
             Run  Ecache  CPU    CPU
Brd  Module  MHz  MB      Impl.  Mask
---  ------  ---  ------  -----  ----
0    0       296  2.0     US-II  2.0

= IO Cards =
Bus Freq Type Slot Name Model
- - - -
PCI 33 On-Board network-SUNW,hme
scsi-glm/disk (block) Symbios,53C875
pcib slot 2 ethernet-pci10ec,8139
pcib - 66 pcia slot 1 ethernet-pci1113,1211
UPA 99 30 FFB, Single Buffered SUNW,501-4789

No failures found in System

Check the status reported for each device and the final conclusion.

1.7 Other commands for collecting system information
Collect Explorer data for the system:

# cd /opt/SUNWexplo/bin/
# ./explorer
# cd ..
# cd output
# ls -l

This directory will contain the newest explorer file; download it to a PC over ftp in binary (bin) mode for analysis.

# prtconf
sun4u 256 Megabytes
System Peripherals (Software Nodes):
SUNW,Ultra-30
packages (driver not attached)
terminal-emulator (driver not attached)
deblocker (driver not attache
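
Sections 1.2 and 1.4 advise paying particular attention to records whose type is error or warning. A minimal sketch of that filtering step follows. The helper name filter_severe is hypothetical (it is not a Solaris command), and the pattern assumes the facility.level tags seen in the excerpts above, such as daemon.error. It relies on grep -E; on older Solaris releases /usr/bin/grep may not accept -E, in which case /usr/xpg4/bin/grep or egrep would be needed instead.

```shell
#!/bin/sh
# Hypothetical helper (not part of Solaris): print only the syslog records
# whose facility.level tag indicates warning severity or worse.
# Reads the files given as arguments, or standard input when none are given.
filter_severe() {
    # Match tags such as "daemon.error" or "kern.warning"; the trailing
    # group keeps longer words that merely start with a level name from matching.
    grep -E '[a-z0-9]+\.(emerg|alert|crit|err|error|warning)([^a-z]|$)' "$@"
}
```

Possible usage, under the same assumptions: filter_severe /var/adm/messages for the log file, or dmesg | filter_severe for the boot records. Kernel messages that carry only an uppercase WARNING: or ERROR: prefix, like those in the console output of section 1.1, are not covered by this pattern and would need a separate search.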