Flaws (Author: 乖乖迪)   Date: 2004.3.12

Keywords: Hardware Software Bug 7200 7500 GSR I/O Controller

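    No hardware or software is entirely free of defects, even when strict quality-control standards are followed throughout design, testing, manufacturing, and inspection. The hardware and software bugs described below are ones that 乖乖迪 ran into in practice. They are shared here for reference so that you can recognize and deal with them quickly if you run into them yourself.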

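    (Editor's note: 实战Cisco is not a personal website. It collects and publishes the Cisco experience of technical staff who exchange knowledge in an open spirit and offer help free of charge, with the aim of raising the level of information technology practice in our country. If you are such a person, and you have studied a technical problem from your own work in depth and understand it accurately, you are welcome to share that experience with everyone through this site. 麦子 eagerly looks forward to more friends joining "实战Cisco".)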

    1. A hardware bug in the Cisco 7200 router's I/O controller module may cause the following errors at system startup:

	Warning: monitor nvram area is corrupt ... using default values
	environment checksum in NVRAM failed
	C7200 platform with 262144 Kbytes of main memory

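    The router then drops into ROMMON mode. The probability of this problem is reportedly about 10^-19. The fix is to return the I/O controller module for repair.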
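    As a rough aid for spotting this signature in a captured console log, here is a minimal Python sketch. The function name `has_nvram_corruption` and the sample text are illustrative assumptions, not part of any Cisco tool; the check only matches the two message strings quoted above.

```python
# Hypothetical helper (not part of any Cisco tool): checks a captured
# console log for the two ROMMON messages quoted in item 1.
NVRAM_SIGNATURES = (
    "monitor nvram area is corrupt",
    "environment checksum in NVRAM failed",
)


def has_nvram_corruption(console_text: str) -> bool:
    """Return True only if every signature string appears in the capture."""
    lowered = console_text.lower()
    return all(sig.lower() in lowered for sig in NVRAM_SIGNATURES)


if __name__ == "__main__":
    sample = (
        "Warning: monitor nvram area is corrupt ... using default values\n"
        "environment checksum in NVRAM failed\n"
        "C7200 platform with 262144 Kbytes of main memory\n"
    )
    print(has_nvram_corruption(sample))
```

    A match only says the capture looks like the symptom described here. It does not by itself prove that the I/O controller is at fault.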

    2. The following messages on the Cisco 7200, 7500, and GSR platforms:

	*Nov 30 00:00:40.771:  WARNING: Enviro Monitor Reference Voltage was ZERO !
	*Nov 30 00:00:41.771:  WARNING: Enviro Monitor Reference Voltage was ZERO !
	*Nov 30 00:00:42.771:  WARNING: Enviro Monitor Reference Voltage was ZERO !
	*Nov 30 00:00:43.771:  WARNING: Enviro Monitor Reference Voltage was ZERO !
	*Nov 30 00:00:44.771: %ENVM-0-SHUT: Environmental Monitor initiated shutdown
	Buffered messages:
	System Bootstrap, Version 12.2(4r)B2, RELEASE SOFTWARE (fc2)
	TAC Support: http://www.cisco.com/tac
	Copyright (c) 2002 by cisco Systems, Inc.

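    These messages look like a power-supply fault, but they are in fact caused by another hardware bug. The router boots normally. Once startup completes, the warning repeats and the router then powers itself off (even with both power supplies switched on), after which it can only be powered on again by hand. The output of show env is: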

	Router#sh env
	All measured values are normal
	Router#sh env last
	  I/O Cont Inlet     previously measured at 24C/75F
	  I/O Cont Outlet    previously measured at 24C/75F
	  NPE Inlet          previously measured at 25C/77F
	  NPE Outlet         previously measured at 25C/77F
	  +3.45 V            is unmeasured
	  +5.15 V            is unmeasured
	  +12.15 V           is unmeasured
	  -11.95 V           is unmeasured
	  last shutdown reason - critical voltage
	Router#sh env all
	Power Supplies:
	  Power Supply 1 is unmeasured.
	  Power Supply 2 is unmeasured.

	Temperature readings:
  	  I/O Cont Inlet   measured at 24C/75F 
	  I/O Cont Outlet  measured at 25C/77F 
	  NPE Inlet        measured at 26C/78F 
	  NPE Outlet       measured at 26C/78F 

	Voltage readings:
	  +3.45 V       is unmeasured
	  +5.15 V       is unmeasured
	  +12.15 V      is unmeasured
	  -11.95 V      is unmeasured

	Envm stats saved 0 time(s) since reload

    Fix: return the I/O Controller for repair.
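    The symptom pattern in item 2 (repeated reference-voltage warnings, an ENVM shutdown, and readings reported as unmeasured) can be checked mechanically in a saved capture. The following is a minimal Python sketch under stated assumptions: the function name `summarize_envm` and the returned dictionary keys are hypothetical, and the patterns are taken only from the output quoted above.

```python
import re

# Patterns copied from the console output quoted in item 2.
ZERO_REF = re.compile(r"Enviro Monitor Reference Voltage was ZERO")
SHUTDOWN = re.compile(r"%ENVM-0-SHUT")
UNMEASURED = re.compile(r"^\s*(.+?)\s+is unmeasured\.?\s*$")


def summarize_envm(text: str) -> dict:
    """Count the warnings and collect the sensors reported as unmeasured.

    Hypothetical helper for illustration; it reads plain text only and
    does not talk to a router.
    """
    zero_warnings = 0
    shutdown_seen = False
    unmeasured = []
    for line in text.splitlines():
        if ZERO_REF.search(line):
            zero_warnings += 1
        elif SHUTDOWN.search(line):
            shutdown_seen = True
        else:
            match = UNMEASURED.match(line)
            # Keep each sensor name once, in order of first appearance.
            if match and match.group(1) not in unmeasured:
                unmeasured.append(match.group(1))
    return {
        "zero_ref_warnings": zero_warnings,
        "shutdown_seen": shutdown_seen,
        "unmeasured": unmeasured,
    }


if __name__ == "__main__":
    sample = (
        "*Nov 30 00:00:43.771:  WARNING: Enviro Monitor Reference Voltage was ZERO !\n"
        "*Nov 30 00:00:44.771: %ENVM-0-SHUT: Environmental Monitor initiated shutdown\n"
        "  Power Supply 1 is unmeasured.\n"
        "  +3.45 V       is unmeasured\n"
    )
    print(summarize_envm(sample))
```

    If the capture shows warnings, a shutdown, and unmeasured voltages together while the temperature readings look normal, it resembles the case described here. Treat that as a hint for further checking rather than a diagnosis.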

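    3. On a Cisco Catalyst 6500 with a SUP720 supervisor engine running IOS 12.2.14, if NAT is configured, the following messages appear once the number of NAT translation entries reaches roughly 6000 or more: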

	*Mar  2 01:19:56.738: %SYS-3-CPUHOG: Task ran for 2048 msec (39/2), process = IP NAT Ager, PC = 4021C300.
	-Traceback= 4021C308 40EFB5CC 40EFBA64 40EFBF44
	*Mar  2 01:20:06.974: %SYS-3-CPUHOG: Task ran for 2172 msec (44/2), process = IP NAT Ager, PC = 4021C300.
	-Traceback= 4021C308 40EFB5CC 40EFBA64 40EFBF44

    The system reboots at intervals and records crash information in bootflash. At that point, show process cpu shows the IP NAT process consuming a large share of the CPU:

	------------------ show process cpu ------------------


	CPU utilization for five seconds: 44%/8%; one minute: 43%; five minutes: 44%
	 PID Runtime(ms)   Invoked      uSecs   5Sec   1Min   5Min TTY Process 
       
	  72         200      6178         32  0.00%  0.00%  0.00%   0 Spanning Tree    
	  73           0         2          0  0.00%  0.00%  0.00%   0 Const MPLS RP pr 
	  74     1465884   7504888        195  3.18%  3.25%  3.65%   0 IP Input         
	  75        2480      6141        403  0.00%  0.00%  0.00%   0 CDP Protocol     
	  76           0         1          0  0.00%  0.00%  0.00%   0 PPPATM Session d 
	  77           0         2          0  0.00%  0.00%  0.00%   0 PASVC create VA  
	  78     7856368   4520305       1738 28.58% 30.23% 30.71%   0 IP NAT Ager      
	  79          32       393         81  0.00%  0.00%  0.00%   0 HWIF QoS Process 

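    It later turned out to be a software bug, and an internal Cisco bug at that, which is not visible even with a CCO account. Upgrading to 12.2.17 resolved it: with the same number of NAT entries, the IP NAT Ager process used only 0.16% of the CPU.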
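    To pick out a process such as IP NAT Ager from a long show process cpu capture, the rows can be parsed and filtered by their 5-second utilization. The following is a minimal Python sketch: the function name `busy_processes` and the 10% default threshold are assumptions chosen for illustration, and the row pattern is based only on the column layout shown above.

```python
import re

# One row of "show process cpu" as laid out in the capture above:
# PID, Runtime(ms), Invoked, uSecs, 5Sec, 1Min, 5Min, TTY, Process.
ROW = re.compile(
    r"^\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+([\d.]+)%\s+([\d.]+)%\s+([\d.]+)%"
    r"\s+(\d+)\s+(.+?)\s*$"
)


def busy_processes(text: str, threshold: float = 10.0) -> list:
    """Return (process name, 5-second CPU %) pairs at or above threshold.

    Hypothetical helper; the 10% default is an arbitrary example value.
    Lines that are not process rows (headers, blanks) are skipped.
    """
    found = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match is None:
            continue
        five_sec = float(match.group(5))
        if five_sec >= threshold:
            found.append((match.group(9), five_sec))
    # Busiest process first.
    return sorted(found, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    sample = (
        " PID Runtime(ms)   Invoked      uSecs   5Sec   1Min   5Min TTY Process \n"
        "  74     1465884   7504888        195  3.18%  3.25%  3.65%   0 IP Input         \n"
        "  78     7856368   4520305       1738 28.58% 30.23% 30.71%   0 IP NAT Ager      \n"
    )
    for name, cpu in busy_processes(sample):
        print(name, cpu)
```

    Running the same filter on captures taken before and after a software upgrade gives a quick way to compare the figures quoted above.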

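    <Appendix> Related links (a CCO account may be required):
    Cisco Bug search tool: http://www.cisco.com/cgi-bin/Support/Bugtool/home.pl
    Functions: 1. Look up the details of a known Bug ID 2. Look up the known bugs in a given IOS software release 3. Look up known bugs in other Cisco software/hardware products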
    
    Latest product issue summaries (Field notices): http://www.cisco.com/kobayashi/support/tac/fn_index.html
    
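    The Release notes for a given piece of software or hardware are also an important source of information about its problems, including open caveats, resolved caveats, Important Notes, and Limitations and Restrictions. For example: the Release notes for the latest 3550 software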

 

Comments and discussion by e-mail are welcome.
All rights reserved. Please credit the author and source when reproducing.