Instruction/ maintenance manual of the product RS/6000 SP IBM
RS/6000 SP: SP Switch Service Guide GA22-7443-04 IBM
Note! Before using this information and the product it supports, read the information in “Safety and environmental notices” on page ix and “Notices” on page A-1. Fifth Edition (April 2002) This book replaces GA22-7443-03. IBM welcomes your comments.
Contents
Figures ........ v
Tables ........ vii
Safety and environmental notices ........ ix
Safety notices ........ ix
Danger notices ...........
Selecting the switch clock source ........ 3-6
Determining the correct switch clock source ........ 3-6
Removing and restoring switch resources ........ 3-7
Removing a switch assembly from the active configuration .
Figures
1-1. SP Switch high-level diagram ........ 1-2
1-2. SP Switch wrap plugs ........ 1-4
1-3. SP Switch inner chassis and front chassis cables ........ 1-8
2-1. Front view of frame locations ....
vi RS/6000 SP: SP Switch Service Guide.
Tables
1-1. Switch problem diagnostics ........ 1-4
1-2. Environmental messages for switches ........ 1-5
1-3. Switch connector resistance values ........ 1-6
1-4. Fan failure diagnostics ............
Safety and environmental notices
For general information concerning safety, refer to Electrical Safety for IBM Customer Engineers, S229-8124. For a copy of the publication, contact your IBM account representative or the IBM branch office serving your locality.
DANGER Before you connect the power cable of this product to ac power, verify that the power receptacle is correctly grounded and has the correct voltage. (SPSFD004)
DANGER During an electrical storm, do not connect or disconnect any cable that has a conductive outer surface or a conductive connector.
DANGER The remaining steps of the procedure contain measurements that are taken with power on. Remember that hazardous voltages are present. (SPSFD013)
DANGER The frame main circuit breaker and the controller must not be switched on again now.
CAUTION: The covers are to be closed at all times except for service by trained service personnel. (SPSFC003)
CAUTION: When the unit is being serviced, the covers should not be left off or opened while the machine is running unattended.
CAUTION:
• When moving frames into position, team members should work together. Using one person on each corner of the frame can prevent strain.
• In raised floor installations, mechanically safe moldings should be installed around floor cutouts.
This product might contain nickel-cadmium or lithium batteries in communication adapters. The batteries must be recycled or disposed of properly. Recycling facilities might not be available in your area.
About this book
This book is part of the RS/6000® SP™ hardware service library and applies to the RS/6000 SP Switch. Use this book to assist you in performing the following tasks:
• Identify fiel.
How to send your comments
Your feedback is important in helping to provide the most accurate and highest quality information. If you have any comments about this book or any other RS/6000 SP documentation:
• Send your comments by e-mail to mhvrcfs@us.
Summary of changes
GA22-7443-04
This edition replaces GA22-7443-03 and any update versions made to that level and makes them obsolete. This edition contains minor changes and fixes to softcopy cross-book links.
GA22-7443-03
This edition replaces GA22-7443-02 and any update versions made to that level and makes them obsolete.
Chapter 1. Maintenance Analysis Procedures (MAPs) This chapter provides information for identifying problems and guides you to the most likely failed Field Replaceable Unit (FRU). The MAPs then refer you to the FRU Removal/Replacement procedures for the corrective action.
FRUs include: Fans, circuit breaker/LED card, switch supervisor card, switch power card(s), inner chassis cable, front chassis cable, complete assembly.
There are two LEDs on the front of each SP Switch. For quick reference, their definitions are as follows:
Environment (Yellow)
Off: No environmental problems detected by switch supervisor card.
On: Warning of environmental condition out of nominal range.
Table 1-1. Switch problem diagnostics
Priority | Message or condition | Action
1 (1 of 3) | Environmental problems
• Errpt: “Failure...”
• Log: “Shutdown: Voltage...”
• Log: “Shutdown: Fan...”
• Log: “Shutdown: Temperature...”
• Errpt: “Warning.
SP Switch environment (MAP 0600) Purpose of this MAP This MAP provides diagnostic information for switch problems that are related to the operating environment. Note: Refer to “Service position procedures” on page 3-9 for placing a switch into the service position or for removing the switch from the service position.
Step 0600-004
Perspectives indicated a shutdown condition and Table 1-2 on page 1-5 directed you to this step.
1. One or more of the following conditions exist:
• Voltage out of range: +5 V “shutdownP5”
• Voltage out of range: +12 V “shutdownP12”
• Voltage out of range: −5 V “shutdownN5”
2.
3. Is the measured resistance now within the acceptable range?
• If yes, go to “Step 0600-017” on page 1-10 to verify fix.
• If no, go to “Step 0600-008”.
Step 0600-008
You replaced the inner chassis cable and the front chassis cable but the measured resistance is still outside of the acceptable range.
Table 1-4. Fan failure diagnostics
Priority | Component | Action
1 (1 of 5) | Fan 1, 2, 3, 4 or 5 | a. Check specified fans for blockages or loose cable connections. b. Fix any obvious problems and continue at “Step 0600-012”. c. If you do not find any problems, continue at Priority 2.
6. Check the Environmental (yellow) LED for an ON or FLASHING condition.
7. Is the Environmental (yellow) LED ON or FLASHING?
• If the Environmental LED is on or flashing:
a. Put the SP Switch’s circuit breaker into the Off (‘0’) position.
b. Return to “Step 0600-011” on page 1-7 and continue service with the next highest priority.
2. Remove blockage.
3. If required, put the switch into service position (refer to “Service position procedures” on page 3-9).
4. With all cables replugged and Environmental (yellow) LED OFF, power on the SP Switch.
5. Go to “Step 0600-017” to verify fix.
2. Remove the switch supervisor card (refer to “Removing the switch supervisor card” on page 4-4).
3. Using a digital multimeter, measure resistance at the planar connection for the supervisor card, between pins 12A and 12B.
• The resistance should be in a range of 4 to 20 ohms.
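The range check in the step above can be written as a small helper, for example when readings are recorded in a service log. This is an illustrative sketch only: the function name is hypothetical and treating the 4 and 20 ohm end points as acceptable is an assumption. Only the 4 to 20 ohm limits come from the step itself.

```python
# Illustrative sketch only; the limits are taken from the step above.
# Treating the end points (4 and 20 ohms) as acceptable is an assumption.
MIN_OHMS = 4.0
MAX_OHMS = 20.0


def resistance_in_range(measured_ohms: float) -> bool:
    """Return True when a reading between pins 12A and 12B is within 4 to 20 ohms."""
    return MIN_OHMS <= measured_ohms <= MAX_OHMS


if __name__ == "__main__":
    # Sample readings are made up for demonstration.
    for reading in (3.2, 12.0, 25.5):
        status = "within range" if resistance_in_range(reading) else "out of range"
        print(f"{reading} ohms: {status}")
```

A reading outside the range would send you on through the MAP as the step directs; the helper only classifies the number.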
Step 0600-025
You have a PS1Fail problem.
1. Replace power supply card PS1.
2. Reinstall the SPS front panel assembly, being careful to align the guide pins on the P1 to the inner chassis cup guide.
3. Replug the power cable (J1) and supervisor cable (J2) to the rear of the assembly.
Step 0600-029
You have a PSFuseGood problem that did not go away when you removed power supply PS1.
1. Put the circuit breaker in the Off (‘0’) position.
2. Remove power cable (J1) and supervisor cable (J2) from the rear of the SP Switch.
3. Remove the front panel assembly.
Step 0610-001
A message in Perspectives indicated that you have a power problem and Table 1-1 on page 1-4 or Table 1-2 on page 1-5 directed you to this MAP.
1. From a Perspectives window on the control workstation or by looking at the SP Switch, check the Power (green) LED for this SP Switch.
Step 0610-005 When you put the circuit breaker into the On (‘1’) position, the circuit breaker tripped into the Off (‘0’) position. 1. Have the customer remove the SP Switch from the active configuration and power off the SP Switch. 2. Put the switch into service position (refer to “Service position procedures” on page 3-9).
• If yes:
a. Have the customer remove the SP Switch from the active configuration and power off the SP Switch.
b. Go to “Step 0610-010”.
• If no:
a.
a. Replace the front chassis cable.
b. Return to “Step 0610-004” on page 1-14 to verify the replacement cable.
• If no:
a. Replace the circuit breaker.
b. Return to “Step 0610-004” on page 1-14 to verify the replacement breaker.
Step 0610-013
You have continuity on all cables.
splstdata -n | pg
5. From the system file server, log into “primary” processor node as root using the telnet command:
telnet PrimaryNodename
6. Check errpt -a -N Worm | pg for any switch-related problems. If any errors are listed, use the error information, with this MAP, to help isolate the problem.
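If you want to capture the output of these commands in a file instead of paging through it, the steps above can be sketched as follows. This is a hedged illustration, not part of the service procedure: the helper names are hypothetical, the `| pg` pager is dropped, and the script simply skips any command that is not installed on the machine where it runs. In the procedure itself, splstdata is run before logging in to the primary node and errpt is run on the primary node.

```python
# Illustrative sketch only. The command names and flags are the ones quoted
# in the steps above; the helper functions are hypothetical.
import shutil
import subprocess
from typing import List, Optional


def diagnostic_commands() -> List[List[str]]:
    """Return the commands from the steps above, without the `| pg` pager."""
    return [
        ["splstdata", "-n"],
        ["errpt", "-a", "-N", "Worm"],
    ]


def run_if_available(argv: List[str]) -> Optional[str]:
    """Run a command and return its standard output, or None if it is not installed."""
    if shutil.which(argv[0]) is None:
        return None
    result = subprocess.run(argv, capture_output=True, text=True, check=False)
    return result.stdout


if __name__ == "__main__":
    for argv in diagnostic_commands():
        output = run_if_available(argv)
        label = " ".join(argv)
        if output is None:
            print(f"{label}: command not found on this machine")
        else:
            print(f"--- {label} ---")
            print(output)
```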
Table 1-7. SP Switch error conditions
Error # | Message/condition | Description and action
2 (SPS) | Initialized | Description: Initialization detected a wrapped port where a processor node or dependent node was expected (this may result from isolation procedures), or else a disconnected cable.
Table 1-7. SP Switch error conditions (continued)
Error # | Message/condition | Description and action
−3 (SPS) | Device status: Device has been removed from network because of a bad signature Link status: Not operational | Description: Possible hardware problem.
Table 1-7. SP Switch error conditions (continued)
Error # | Message/condition | Description and action
−8 (SPS) | Device status: Device has been removed from network because of a miswire Link status: Lin.
Refer to notes at beginning of “SP Switch function (MAP 0620)” on page 1-17 for more information on cable connections and logical-to-physical frame translations.
1. Make sure a wrap plug is properly installed on the connector. Refer to Figure 1-2 on page 1-4 for views of the switch wrap plugs.
2. Have customer check (and update if necessary) the switch configuration file appropriately before continuing:
• For code level 1.
• If yes:
a. An SRN was obtained from diagnostics.
b. Use the following SRN table (Table 1-9) to continue service.
c. After using Table 1-9, go to “Step 0620-012” on page 1-26.
• If no:
a. The diagnostics did not detect a problem, therefore the problem is in SP Switch.
Table 1-9. Service Request Number (SRN) table for SP Switch adapters (continued)
Service Request Number | SRN source | Failing component(s) | Description
763-200 through 763-299 (except 763-282) | D | ext clock SPS MX adapter wrap plug | Problem detected with the external clock (SP switch).
Table 1-9. Service Request Number (SRN) table for SP Switch adapters (continued)
Service Request Number | SRN source | Failing component(s) | Description
764-2A0 through 764-2A9 | D | SP System Attachment Adapter | Problem detected with the internal clock.
Table 1-10. Switch problem priority listing
Priority | Failing component | Action
1 (1 of 7) | Software | a. Have customer verify that the software is configured and operating correctly for this processor node/system. b. If no problem is found, continue with next highest priority item in the list for this SRN.
Step 0620-013
An SRN listed in Table 1-9 on page 1-24 indicated that you have a switch clock problem and Priority 2 in Table 1-10 on page 1-27 directed you to this step.
Step 0620-019
Since the diagnostics did not return an SRN, no problem was detected.
1. Have you just reseated, repaired, or replaced a component?
• If yes, go to “Step 0620-043” on page 1-36 to verify fix.
• If no:
a. Problem is in the SP Switch.
b.
Note: When unplugging the two ends of the suspect cable, check /var/adm/SPlogs/css/out.top to verify only two (2) ports were lost. If four (4) ports were lost, then two cables were swapped across that switch-to-switch connection. Connect the cables correctly and check /var/adm/SPlogs/css/out.
• If yes, go to “Step 0620-027”.
• If no:
a. The problem switch is providing the master clock for the system.
b. Have customer use the Eclock command to select a different master clock for the system.
c. Go to “Step 0620-043” on page 1-36 to verify fix.
• S00-BH-J9
5. Disconnect the switch data cable at the tailgate of the frame containing the processor node.
6. Connect the end of the processor node data cable to the jack.
7. Run advanced diagnostics on “css0” on the “test” processor node and its associated switch port.
Step 0620-033 All ground straps connecting the frames make adequate contact at both ends. 1. Disconnect clock source data cable at S00-BH-J3, J5, J7, or J9, then reconnect to SP Switch. (This is done to eliminate clocking noise from cable.) 2. Find a processor node in the “problem” frame which is usable for service.
5. Set the switch clock selections on the “problem” switches (refer to “Selecting appropriate switch clocks” on page 3-6).
6. Run advanced diagnostics on “css0” on the “test” processor node and its associated switch port.
• Do not perform cable wrap test.
5. Look for an SRN indicating a clock problem, such as “External clock” being listed as one of the failing components.
6. Do the “css0” diagnostics fail with indication of a clock problem?
• If yes:
a. Problem is the clock selection in this SP Switch.
2. Put the SP Switch’s circuit breaker into the On (‘1’) position.
3. Go to “Step 0620-043” to verify fix.
Step 0620-043
You have replaced switch components and need to verify that the problem has been fixed.
1. Make sure any processor node(s) that was put in SERVICE mode is returned to NORMAL mode.
Chapter 2. Locations
Naming standard for RS/6000 SP components ........ 2-1
Format structure ........ 2-1
Example of format structure ........ 2-1
Frame (WWW) ........ 2-1
Major assembly (XXX) .
– 01 - 99 for frames 1-99 (specific to that frame)
Notes:
1. E01 designates RS/6000 SP physical frame 1
2. L00 designates any/all RS/6000 SP logical frames
3.
Front and rear views of RS/6000 SP frame Figure 2-1 shows a front view of the RS/6000 SP frame locations. “Frame (FRA)” on page 2-6 describes the assembly designations for the RS/6000 SP frame. Figure notes: 1. Wide processor nodes take up an entire shelf position (two thin processor node slots).
Figure 2-2 shows a front view of the RS/6000 SP multi-switch frame. Figure 2-3 on page 2-5 shows a front view of the Model 3AX (49-inch) frame.
[Figure labels: Main Power Switch with LED, Left Skirt, Right Skirt, Switch.]
Figure notes: 1. Wide processor nodes take up an entire shelf position (two thin processor node slots). They are identified by the odd numbered position. 2. In a F/C 2030/1 frame, switch assemblies take up an entire shelf partition. (They are identified by the even-numbered position.
Note: See notes under Figure 2-1 on page 2-3 for processor node/switch assembly numbering. Frame locations Figure 2-1 on page 2-3 shows a front view of the RS/6000 SP frame locations, with numbered processor nodes, and the three phase SEPBU.
G6: Front door ground
G7: Rear door ground
G8: Ground
SW: Power-on switch
LD: LED card
FC: Front cover
RC: Rear cover
Example: E01-FRA-G1
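Location codes such as the example above can be split into their fields programmatically. The sketch below is only an illustration built from what this chapter shows: the first field is the frame (WWW), the second is the major assembly (XXX), and the notes state that E01 is physical frame 1 and L00 is any/all logical frames. The class name, the function name, and the handling of any other frame prefix as "unknown" are assumptions. Codes written without a frame field are not handled.

```python
# Illustrative sketch only; names are hypothetical and the field meanings
# are limited to what the naming-standard notes in this chapter state.
import re
from typing import NamedTuple, Optional, Tuple


class Location(NamedTuple):
    frame: str                    # first field, for example "E01"
    major_assembly: str           # second field, for example "FRA"
    remainder: Tuple[str, ...]    # any further fields, for example ("G1",)
    frame_kind: str               # "physical", "logical", or "unknown"
    frame_number: Optional[int]   # two-digit frame number, if present


# Per the notes: E01 is a physical frame, L00 is any/all logical frames.
_FRAME_KINDS = {"E": "physical", "L": "logical"}


def parse_location(code: str) -> Location:
    """Split a hyphen-separated location code into its fields."""
    parts = code.strip().split("-")
    if len(parts) < 2 or not all(parts):
        raise ValueError(f"expected at least FRAME-ASSEMBLY, got {code!r}")
    frame, major = parts[0], parts[1]
    match = re.fullmatch(r"([A-Z])(\d{2})", frame)
    kind = _FRAME_KINDS.get(match.group(1), "unknown") if match else "unknown"
    number = int(match.group(2)) if match else None
    return Location(frame, major, tuple(parts[2:]), kind, number)


if __name__ == "__main__":
    print(parse_location("E01-FRA-G1"))
```

Here `parse_location("E01-FRA-G1")` yields frame `E01` (physical frame 1), major assembly `FRA`, and the remaining field `G1`, which the list above identifies as a ground location.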
Switch assembly locations
[Figure 2-5. Labels: Fan 1, Fan 2, Fan 3, Fan 4, Fan 5, P4, P5, P6, J1, J2, CB & LED Card, Switch Planar, Power Supply 1, Power Supply 2, Supervisor Card, Supervisor Bus Card, Air Baffle]
Connector details
Figure 2-6 shows RS/6000 SP component connector details.
Cable routing
Figure 2-7 on page 2-10 and Figure 2-8 on page 2-10 show back views of the RS/6000 SP frame, showing the horizontal and vertical paths of cable routing from connector-to-connector, with the depth amplified on the drawing.
Note: When attaching exterior and interior cables to a POWER3 SMP High Node allow for enough cable for a 2-foot service loop for node movement into service position. Note: For a multi-switch frame (F/C 2030/1), refer to Figure 2-7. Figure 2-7. Frame cabling routing path in rear of RS/6000 SP frame — 1.
Table 2-1 shows external cable routing in an RS/6000 SP frame populated with 16 processor nodes. (Refer to “Cable routing” on page 2-9 to see the routing paths.
Figure 2-9. Frame cable routing paths in rear of RS/6000 SP multi-switch frame (F/C 2030/1) — 1.93 m frame Figure 2-10. Frame cable routing paths in rear of RS/6000 SP multi-switch frame (F/C 2030/1) — 2.
Switch data cables
SPS data cables
Table 2-2 describes the attachment locations and routing for the internal SPS Switch data cables:
Table 2-2. SPS Switch data cable chart
Cable Part Number Plug fro.
Chapter 3. Service procedures
Personal ESD requirements ........ 3-1
Tools and files overview ........ 3-1
Using the css.snap script ........ 3-3
Switch supervisor self-test .......
Table 3-1. Service procedure tools
Utility (see note) | Runs on | Description
fault_service_Worm_RTG | All nodes | Monitors the switch for faults. It restarts the switch if a fault is detected.
fs_monitor | All nodes | Monitors the adapter for interrupts that have not been serviced.
Table 3-3. Tuning output files (continued)
File (see note) | Location | Description
daemon.stderr | Primary | A record of which nodes were not initialized.
out.top | Primary | Reports errors from the last tuning procedure. It begins as a copy of the topology file and errors are indicated to the right of each entry.
The files ending in .out are produced by running the appropriate command to dump internal (in memory) trace information or dump data to a file. The completed output file will be found in /var/adm/SPlogs/css/css.snap.[date-time]tar.Z.
css.snap avoids flooding /var by following these rules:
• If less than 10% of /var is free, css.
2. In the Node pane, click the icon of the node you want to verify
3. Click the “Notebook” icon on the tool bar
• When the Notebook window opens, make certain that the “Node Status” tab is selected
4.
• “Yes” displayed in a red box indicates that the switch supervisor has failed and it is not responding to the frame supervisor.
Note: Clicking “Help” in the Notebook window’s lower right corner displays attribute descriptions.
1 Input 1 (BH-J3 for SPS)
2 Input 2 (BH-J4 for SPS)
3 Input 3 (BH-J5 for SPS)
Table 3-4. Setting switch clock sources
Model Number of Logical Frames Master Clock Choice 123 30x 1-4 L01-S00 = i L02-S0.
6. The customer can re-initialize the switch using the Estart command. The frame and processor nodes which were removed in this procedure will appear in the out.top file with error messages; however, the remainder of the switch resources are now available for customer use.
Efence of primary and primary backup nodes By design, Efence of primary and primary backup nodes is not allowed. If you attempt to fence either of these nodes, you will get the following responses: Efence: 0028-147 Node number designates the Primary Node.
d. Choose “Normal Mode” e. Choose “Display Current Bootlist” This will display the current bootlist. 2. Power down the node, service it, and hook it back into the frame. 3. On the control workstation, run spbootins to set the node to boot in maintenance mode.
The following 3 adapters require functional microcode to be installed: Adapter Package ESCON® Control Unit Adapters Feature 2756 ESCON BLKMUX S/370™ Control Unit Feature 2755 BLKMUX FDDI Adapters.
To complete the microcode update, it is usually necessary to remove and then replace the device from the configuration. The most reliable method to do this is to reboot the node. Some adapters can actually require a power off cycle to complete the microcode update.
Chapter 4. FRU removals and replacements
Handling static-sensitive devices ........ 4-1
Procedures for switch assemblies ........ 4-2
Removing the switch assembly ........ 4-2
Replacing the switch assembly .
Procedures for switch assemblies CAUTION: The unit weight exceeds 18 Kg (40 lbs) and requires two service personnel to lift. (SPSFC002) Attention: Components in the frame are susceptible to damage from static discharge. Always use an ESD wristband when working inside frame covers.
Removing the switch fans Note: Refer to “Handling static-sensitive devices” on page 4-1 before removing or installing ESD sensitive devices. Perform these procedures to remove a fan from an SP Switch assembly: 1. Perform “Placing a switch assembly into service position” on page 3-9 to place the switch assembly into the service position.
Perform these procedures to remove the fan control cable from an SP Switch assembly: 1. Perform “Placing a switch assembly into service position” on page 3-9 to place the switch assembly into the service position. 2. Unplug connectors P7, P8 and P9.
Replacing the switch supervisor card Perform these procedures to replace the supervisor card in an SP Switch assembly: 1. Insert supervisor card. 2. Rotate card thumb locks inward to seat card.
Replacing the switch inner chassis cable
Perform these procedures to replace the supervisor power cable from an SP Switch assembly:
1. Plug connectors P3, P4, P5 and P6. Route cable along the raceway, hooking retaining material where needed. Attach P1 connector to the rear of the switch assembly with screws retained in the removal procedure.
4. Remove power supply card.
Replacing the switch power cards
Perform these procedures to replace the switch power card(s) in an SP Switch assembly:
1. Insert power supply card.
2. Rotate card thumb locks inward to seat card.
3. Plug connector P4 (PS1) or P6 (PS2).
Chapter 5. Parts catalog
SPS, SPS-8 Switch assembly (feature) (view 1) ........ 5-2
SPS, SPS-8 Switch assembly (feature) (view 2) ........ 5-4
SPS, SPS-8 Switch assembly (feature) (view 3) ........ 5-6
Switch cables (feature) .
SPS, SPS-8 Switch assembly (feature) (view 1)
Table 5-1. SPS, SPS-8 Switch assembly (feature) (view 1)
Assembly index | Part number | Units | Description
SPS Switch assembly (reference only)
SPS-8 Switch assembly (reference only)
1 | 26H7255 | 1 | Front cha.
SPS, SPS-8 Switch assembly (feature) (view 2)
Table 5-2. SPS, SPS-8 Switch assembly (feature) (view 2)
Assembly index | Part number | Units | Description
SPS Switch assembly (reference only)
SPS-8 Switch assembly (reference only)
1 | 11P0655 | 1 | Inner ch.
SPS, SPS-8 Switch assembly (feature) (view 3)
[Figure: callouts 1 to 8, with air flow direction indicated]
Table 5-3. SPS, SPS-8 Switch assembly (feature) (view 3)
Assembly index | Part number | Units | Description
SPS switch assembly (reference only)
SPS-8 switch assembly (reference only)
1 | 26H7391 | 1 | Cable, front chassis
2 | 32G1547 | 1 | Screw, hex M4 x 5
3 | 46H9778 | 2 | Cup guide (2.
Switch cables (feature)
Table 5-4. Switch cables (feature)
Assembly index | Part number | Units | Description
SP Switch data cables (SPS)
 | 11J6091 | AR | Cable, Switch data node 01 - (1345 mm)
 | 11J6092 | AR | Cabl.
Multi-switch frame (F/C 2030/1)
Table 5-5. Multi-switch frame (F/C 2030/1)
Assembly index | Part number | Units | Description
1 | | | SPS Switch Assembly
2 | 54G3281 | 16 | Screw, Phil Pan Hd M5 x 12
3 | 93G1065 | AR | Shelf Assembly
4 | 0375867 | 8 | Nut, Cli.
Notices This information was developed for products and services offered in the U.S.A. IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area.
Electronic emissions notices Federal Communications Commission (FCC) statement This equipment has been tested and found to comply with the limits for a Class A digital device, pursuant to Part 15 of the FCC Rules.
For installations in Japan: The following is a summary of the VCCI Japanese statement in the box above. This is a Class A product based on the standard of the Voluntary Control Council for Interference by Information Technology Equipment (VCCI). If this equipment is used in a domestic environment, radio disturbance may arise.
Index
Numerics
0375867 5-11
04H9469 5-11
08J5557 5-11
11J3975 5-8
11J3976 5-8
11J3977 5-8
11J3978 5-8
11J3979 5-8
11J3980 5-8
11J3981 5-8
11J3982 5-8
11J5189 5-11
11J5191 5-11
11J5193 .
frame locations 2-3, 2-5, 2-6 frame naming standard 2-1 frame supervisor verification 3-5 front chassis cable, SPS 1-8 front view of 49-inch frame locations 2-4 front view of frame locations 2-3 front.
SPS assembly, placing into service position 3-9
SPS assembly, removing 4-2
SPS assembly, replacing 4-2
SPS assembly, replacing from service position 3-9
SPS fan control cable, removing 4-3
SPS fan.
Reader’s comments – We’d like to hear from you
RS/6000 SP SP Switch Service Guide
Publication No. GA22-7443-04
Overall, how satisfied are you with the information in this book? Very Satisfied .