Aacraid monitoring: difference between revisions
Sirmax (talk | contribs)
Revision as of 10:23, 5 May 2009
Adaptec Raid Monitoring
After a year in production it occurred to me that, apart from looking at the status LEDs, I have no way of knowing whether the drives in the RAID are alive. And I have two such arrays, in SuperMicro chassis.
04:01.0 RAID bus controller: Adaptec AAC-RAID (rev 02)
Subsystem: Adaptec ASR-4000
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr+ Stepping+ SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 32 (250ns min, 250ns max), Cache Line Size: 32 bytes
Interrupt: pin A routed to IRQ 16
Region 0: Memory at d8200000 (64-bit, non-prefetchable) [size=2M]
Region 2: Memory at d8000000 (32-bit, non-prefetchable) [size=2M]
Region 4: Memory at c0000000 (32-bit, prefetchable) [size=256M]
Capabilities: [c0] Power Management version 2
Flags: PMEClk- DSI- D1+ D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [d0] Message Signalled Interrupts: Mask- 64bit+ Queue=0/1 Enable-
Address: 0000000000000000 Data: 0000
Capabilities: [e0] PCI-X non-bridge device
Command: DPERE- ERO- RBC=512 OST=4
Status: Dev=04:01.0 64bit+ 133MHz+ SCD- USC- DC=bridge DMMRBC=1024 DMOST=4 DMCRS=16 RSCEM- 266MHz- 533MHz-
Kernel driver in use: aacraid
For management there is a vendor-supplied utility, Storage Manager.
Installation
Both servers run Gentoo; the ebuild comes from http://www.gentoo.ru/node/14090
Pay attention to the versions: the ebuild may need adjusting. In addition, I had to replace
SRC_URI_amd64="${SRC_URI_BASE}/${PN}_linux_x64_v${PV}.rpm"
with
SRC_URI_amd64="http://download.adaptec.com/raid/storage_manager/asm_linux_x64_v5_20_17414.rpm"
# Copyright 1999-2009 Gentoo Foundation
# Distributed under the terms of the GNU General Public License v2
# $Header: $
EAPI=2
inherit multilib rpm versionator
DESCRIPTION="Storage manager for Adaptec RAID controller"
HOMEPAGE="http://www.adaptec.com/en-US/downloads/"
LICENSE="Adaptec"
SLOT="0"
KEYWORDS="~amd64"
IUSE="X"
RESTRICT="mirror"
SRC_URI_BASE="http://download.adaptec.com/raid/storage_manager"
SRC_URI_amd64="${SRC_URI_BASE}/${PN}_linux_x64_v${PV}.rpm"
SRC_URI="amd64? ( ${SRC_URI_amd64} )"
RDEPEND="sys-libs/libstdc++-v3
!=sys-devel/gcc-3*
X? ( dev-java/sun-jdk:1.5[X] )
!X? ( dev-java/sun-jdk:1.5 )"
S="${WORKDIR}/usr/StorMan"
src_unpack() {
	rpm_src_unpack
}
src_configure() {
	# binpkg - nothing to do here
	:;
}
src_compile() {
	# binpkg - nothing to do here
	:;
}
src_install() {
	if use X ; then
		cd "${S}" || die
		insinto /opt/StorMan
		doins index.html *.jar *.pps *.so
		# StorMan needs the help inside of /opt/StorMan
		doins -r help
		into /opt
		dobin "${FILESDIR}"/StorMan.sh
		dosed "s:%LIBDIR%:/usr/$(get_libdir):" /opt/bin/StorMan.sh
		dobin "${FILESDIR}"/StorAgnt.sh
		dosed "s:%LIBDIR%:/usr/$(get_libdir):" /opt/bin/StorAgnt.sh
		dosed 's:\(\.log=\):\1/var/log:g' /opt/StorMan/RaidLog.pps
	fi
	into /opt/StorMan
	dobin {arc,hr}conf
	dosym ../StorMan/bin/arcconf /opt/bin/arcconf
	dosym ../StorMan/bin/hrconf /opt/bin/hrconf
	dodoc README.TXT
}
Installing the x86 version is practically the same as for amd64, except that the version numbers and the way they are named differ. This is reflected in the patch below:
--- asm-5.01.16862.ebuild 2009-03-04 13:48:02.000000000 +0300
+++ asm-5.20.17414.ebuild 2009-03-04 14:06:57.000000000 +0300
@@ -6,20 +6,21 @@
inherit multilib rpm versionator
+CH_PV="$(replace_all_version_separators _ ${PV})"
DESCRIPTION="Storage manager for Adaptec RAID controller"
HOMEPAGE="http://www.adaptec.com/en-US/downloads/"
LICENSE="Adaptec"
SLOT="0"
-KEYWORDS="~amd64"
+KEYWORDS="~x86"
IUSE="X"
RESTRICT="mirror"
SRC_URI_BASE="http://download.adaptec.com/raid/storage_manager"
-SRC_URI_amd64="${SRC_URI_BASE}/${PN}_linux_x64_v${PV}.rpm"
+SRC_URI_x86="${SRC_URI_BASE}/${PN}_linux_x86_v${CH_PV}.rpm"
-SRC_URI="amd64? ( ${SRC_URI_amd64} )"
+SRC_URI="x86? ( ${SRC_URI_x86} )"
RDEPEND="sys-libs/libstdc++-v3
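The CH_PV line in the patch turns the dotted package version into the underscore form used in the download file names. A minimal sketch of that transformation outside Portage, assuming a purely dotted version string: the function name pv_to_underscores is hypothetical, and tr is only a stand-in for the versionator helper replace_all_version_separators.

```shell
#!/bin/sh
# Hypothetical helper: approximates what
#   replace_all_version_separators _ ${PV}
# produces for a version that uses only dots as separators.
pv_to_underscores() {
    printf '%s\n' "$1" | tr '.' '_'
}

PV="5.20.17414"                      # version taken from the ebuild file name
CH_PV="$(pv_to_underscores "$PV")"   # dots replaced by underscores

# File name in the form used by the patched SRC_URI_x86 line
echo "asm_linux_x86_v${CH_PV}.rpm"
```

The same mismatch between dots in ${PV} and underscores in the file name appears to be why the amd64 SRC_URI had to be hard-coded above.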
Usage
Everything is more or less obvious; I think the output of the command needs no special explanation.
mordred ~ # arcconf GETCONFIG 1 AL
Controllers found: 1
----------------------------------------------------------------------
Controller information
----------------------------------------------------------------------
Controller Status : Optimal
Channel description : SAS/SATA
Controller Model : Adaptec 4000SAS
Controller Serial Number : BAD0
Physical Slot : 1
Installed memory : 256 MB
Copyback : Disabled
Background consistency check : Disabled
Automatic Failover : Enabled
Defunct disk drive count : 0
Logical devices/Failed/Degraded : 1/0/0
--------------------------------------------------------
Controller Version Information
--------------------------------------------------------
BIOS : 5.1-0 (8461)
Firmware : 5.1-0 (8461)
Driver : 1.1-5 (2449)
Boot Flash : 0.0-0 (0)
--------------------------------------------------------
Controller Battery Information
--------------------------------------------------------
Status : Not Installed
----------------------------------------------------------------------
Logical device information
----------------------------------------------------------------------
Logical device number 0
Logical device name : System
RAID level : 10
Status of logical device : Optimal
Size : 279600 MB
Stripe-unit size : 256 KB
Read-cache mode : Enabled
Write-cache mode : Enabled (write-back)
Write-cache setting : Enabled (write-back)
Partitioned : Yes
Protected by Hot-Spare : No
Bootable : Yes
Failed stripes : No
--------------------------------------------------------
Logical device segment information
--------------------------------------------------------
Group 0, Segment 0 : Present (0,0) 3LQ1GXXX
Group 0, Segment 1 : Present (0,4) 3LQ1GXXX
Group 1, Segment 0 : Present (0,1) 3LQ1GXXX
Group 1, Segment 1 : Present (0,5) 3LQ1LXXX
Group 2, Segment 0 : Present (0,2) 3LQ1GXXX
Group 2, Segment 1 : Present (0,6) 3LQ1NXXX
Group 3, Segment 0 : Present (0,3) 3LQ1NXXX
Group 3, Segment 1 : Present (0,7) 3LQ1NXXX
----------------------------------------------------------------------
Physical Device information
----------------------------------------------------------------------
Device #0
Device is a Hard drive
State : Online
Supported : Yes
Transfer Speed : SAS 3.0 Gb/s
Reported Channel,Device : 0,0
Reported Location : Enclosure 0, Slot 0
Reported ESD : 2,0
Vendor : SEAGATE
Model : ST373455SS
Firmware : S515
Serial number : 3LQ1GXXX
World-wide name : 5000C50006493XXX
Size : 70007 MB
Write Cache : Unknown
FRU : None
S.M.A.R.T. : No
Device #1
Device is a Hard drive
State : Online
Supported : Yes
Transfer Speed : SAS 3.0 Gb/s
Reported Channel,Device : 0,1
Reported Location : Enclosure 0, Slot 1
Reported ESD : 2,0
Vendor : SEAGATE
Model : ST37345XXX
Firmware : S515
Serial number : 3LQ1GZ79
World-wide name : 5000C50006499XXX
Size : 70007 MB
Write Cache : Unknown
FRU : None
S.M.A.R.T. : No
Device #2
Device is a Hard drive
State : Online
Supported : Yes
Transfer Speed : SAS 3.0 Gb/s
Reported Channel,Device : 0,2
Reported Location : Enclosure 0, Slot 2
Reported ESD : 2,0
Vendor : SEAGATE
Model : ST37345XXX
Firmware : S515
Serial number : 3LQ1GYDP
World-wide name : 5000C50006492XXX
Size : 70007 MB
Write Cache : Unknown
FRU : None
S.M.A.R.T. : No
Device #3
Device is a Hard drive
State : Online
Supported : Yes
Transfer Speed : SAS 3.0 Gb/s
Reported Channel,Device : 0,3
Reported Location : Enclosure 0, Slot 3
Reported ESD : 2,0
Vendor : SEAGATE
Model : ST37345XXX
Firmware : S515
Serial number : 3LQ1NE5D
World-wide name : 5000C500064AFXXX
Size : 70007 MB
Write Cache : Unknown
FRU : None
S.M.A.R.T. : No
Device #4
Device is a Hard drive
State : Online
Supported : Yes
Transfer Speed : SAS 3.0 Gb/s
Reported Channel,Device : 0,4
Reported Location : Enclosure 1, Slot 4
Reported ESD : 2,1
Vendor : SEAGATE
Model : ST37345XXX
Firmware : S515
Serial number : 3LQ1GXCS
World-wide name : 5000C50006493XXX
Size : 70007 MB
Write Cache : Unknown
FRU : None
S.M.A.R.T. : No
Device #5
Device is a Hard drive
State : Online
Supported : Yes
Transfer Speed : SAS 3.0 Gb/s
Reported Channel,Device : 0,5
Reported Location : Enclosure 1, Slot 5
Reported ESD : 2,1
Vendor : SEAGATE
Model : ST37345XXX
Firmware : S515
Serial number : 3LQ1LMCV
World-wide name : 5000C5000649AXXX
Size : 70007 MB
Write Cache : Unknown
FRU : None
S.M.A.R.T. : No
Device #6
Device is a Hard drive
State : Online
Supported : Yes
Transfer Speed : SAS 3.0 Gb/s
Reported Channel,Device : 0,6
Reported Location : Enclosure 1, Slot 6
Reported ESD : 2,1
Vendor : SEAGATE
Model : ST37345XXX
Firmware : S515
Serial number : 3LQ1NNSB
World-wide name : 5000C5000649AXXX
Size : 70007 MB
Write Cache : Unknown
FRU : None
S.M.A.R.T. : No
Device #7
Device is a Hard drive
State : Online
Supported : Yes
Transfer Speed : SAS 3.0 Gb/s
Reported Channel,Device : 0,7
Reported Location : Enclosure 1, Slot 7
Reported ESD : 2,1
Vendor : SEAGATE
Model : ST37345XXX
Firmware : S515
Serial number : 3LQ1NE9Q
World-wide name : 5000C500064B0XXX
Size : 70007 MB
Write Cache : Unknown
FRU : None
S.M.A.R.T. : No
Device #8
Device is an Enclosure services device
Reported Channel,Device : 2,0
Enclosure ID : 0
Type : SES2
Vendor : AMI
Model : MG9072
Firmware : 0005
Status of Enclosure services device
Temperature : Normal
Device #9
Device is an Enclosure services device
Reported Channel,Device : 2,1
Enclosure ID : 1
Type : SES2
Vendor : AMI
Model : MG9072
Firmware : 0005
Status of Enclosure services device
Temperature : Normal
Command completed successfully.
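The full listing is long, so a short per-drive summary can be easier to read. The following is a rough sketch, not part of the original setup: the function name drive_states is hypothetical, and it assumes the "Device #N", "State" and "Reported Location" lines look like the ones in the output above.

```shell
#!/bin/sh
# Hypothetical helper: reads `arcconf GETCONFIG 1 AL` output on stdin and
# prints one tab-separated line per hard drive: device number, state, location.
drive_states() {
    awk '
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        # "Device #3" starts a new physical device section
        /^[ \t]*Device #[0-9]+[ \t]*$/ { dev = $2; sub(/^#/, "", dev); next }
        {
            pos = index($0, ":")
            if (pos == 0) next                      # not a "key : value" line
            key = trim(substr($0, 1, pos - 1))
            val = trim(substr($0, pos + 1))
            if (key == "State") { n++; id[n] = dev; st[n] = val }
            if (key == "Reported Location" && n > 0 && id[n] == dev) loc[n] = val
        }
        END { for (i = 1; i <= n; i++) printf "%s\t%s\t%s\n", id[i], st[i], loc[i] }
    '
}

# Run against the real controller only when arcconf is installed
if command -v arcconf >/dev/null 2>&1; then
    arcconf GETCONFIG 1 AL | drive_states
fi
```

Enclosure services devices have no State line in the output above, so they are skipped.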
Integration with Nagios
To monitor the state of the array remotely, I decided to run the command arcconf GETCONFIG 1 AL every 5 minutes (via cron) and parse its output, returning the values through snmpd. I don't think there is any point in reporting how many drives are working or not working: a single error is already a reason to take action.
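One possible shape for such a check is sketched below. It is not the author's script: the name check_aacraid and the choice of fields are assumptions. It looks only at three fields visible in the output above (Controller Status, Status of logical device, Defunct disk drive count) and collapses them into a single Nagios-style result (exit code 0 = OK, 2 = CRITICAL, 3 = UNKNOWN), in line with the idea that any single error is enough.

```shell
#!/bin/sh
# Hypothetical check: reads `arcconf GETCONFIG 1 AL` output on stdin,
# prints a one-line summary and returns a Nagios-style exit code.
check_aacraid() {
    awk '
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        {
            pos = index($0, ":")
            if (pos == 0) next                      # not a "key : value" line
            key = trim(substr($0, 1, pos - 1))
            val = trim(substr($0, pos + 1))
            if (key == "Controller Status") {
                seen = 1
                if (val != "Optimal") bad = bad " controller=" val
            }
            if (key == "Status of logical device" && val != "Optimal")
                bad = bad " logical=" val
            if (key == "Defunct disk drive count" && val + 0 != 0)
                bad = bad " defunct=" val
        }
        END {
            if (!seen)     { print "UNKNOWN: no controller status in input"; exit 3 }
            if (bad != "") { print "CRITICAL:" bad; exit 2 }
            print "OK: controller and logical devices report Optimal"
            exit 0
        }
    '
}

# Run against the real controller only when arcconf is installed
if command -v arcconf >/dev/null 2>&1; then
    arcconf GETCONFIG 1 AL | check_aacraid
fi
```

Saved under a path of your choice, the script could be started from a cron entry beginning with */5 * * * * and its result exposed through the net-snmp extend directive in snmpd.conf; the exact paths and names are left to the installation.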