Instruction/maintenance manual of the product V5 PAR Technologies
ParaStation5 Administrator's Guide
Release 5.0.5
Published April 2010.
Copyright © 2002-2010 ParTec Cluster Competence Center GmbH
Printed 7 April 2010, 14:11
Reproduction in any manner whatsoever without the written permission of ParTec Cluster Competence Center GmbH is strictly forbidden.
Table of Contents
1. Introduction
1.1. What is ParaStation
6.2. Problem: node shown as "down"
6.3. Problem: cannot start parallel task
Chapter 1. Introduction
1.1. What is ParaStation
ParaStation is an integrated cluster management and communication solution.
In the middle of 2004, all rights on ParaStation were transferred from ParTec AG to the ParTec Cluster Competence Center GmbH. This new company takes a much more service-oriented approach to the customer.
Chapter 2. Technical overview
This section gives a brief technical overview of ParaStation5. The various software modules constituting ParaStation5 are explained.
• p4sock.o: this module implements the kernel based ParaStation5 communication protocol.
• e1000_glue.o, bcm5700_glue.o: these modules enable even more efficient communication to the network drivers coming with ParaStation5 (see below).
Chapter 3. Installation
This chapter describes the installation of ParaStation5. First, the prerequisites to use ParaStation5 are discussed. Next, the directory structure of all installed components is explained.
Software
ParaStation requires an RPM-based Linux installation, as the ParaStation software is based on installable RPM packages. All current distributions from Novell and Red Hat are supported, like
• SuSE Linux Enterprise Server (SLES) 9 and 10
• SuSE Professional 9.
man contains the manual pages describing the ParaStation daemons, utilities and configuration files after installing the documentation package. The necessary steps are described in Section 3.
Please note that the individual version numbers of the distinct packages building the ParaStation5 system do not necessarily have to match.
# rpm -Uv psmgmt.5.0.0-0.i586.rpm pscom.5.0.0-0.i586.rpm pscom-modules.5.0.0-0.i586.rpm
This will copy all the necessary files to /opt/parastation and the kernel modules to /lib/modules/kernelversion/kernel/drivers/net/ps4.
# rpm -Uv psdoc-5.0.0-1.noarch.rpm
All the PDF and HTML files will be installed within the directory /opt/parastation/doc, the manual pages will reside in /opt/parastation/man.
• testing
These steps will be discussed in Chapter 4, Configuration.
3.7. Uninstalling ParaStation5
After stopping the ParaStation daemons, the corresponding packages can be removed using
# /etc/init.
Chapter 4. Configuration
After installing the ParaStation software successfully, only a few modifications to the configuration file parastation.conf(5) have to be made in order to enable ParaStation on the local cluster.
The values that might be assigned to the HWType parameter have to be defined within the parastation.conf configuration file. Have a brief look at the various Hardware sections of this file in order to find out which hardware types are actually defined.
transfer application data across Ethernet, these adapted drivers should be used, too. To enable these drivers, the simplest way is to rename the original modules and recreate the module dependencies:
# cd /lib/modules/$(uname -r)/kernel/drivers/net
# mv e1000/e1000.
Alternatively, it is possible to use the single command form of the psiadmin command:
# /opt/parastation/bin/psiadmin -s -c "list"
The command should be repeated until all nodes are up.
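Repeating the single command form until all nodes are up can be scripted. The following is a minimal sketch, not part of ParaStation: the helper names list_nodes and wait_until_up are hypothetical, and because the output format of the list directive is not reproduced here, the caller has to supply the all_up check as an assumption about that output.

```python
import subprocess
import time

# Path and arguments are taken from the command shown in the text.
PSIADMIN = "/opt/parastation/bin/psiadmin"


def list_nodes():
    """Run the single command form once and return its raw output."""
    result = subprocess.run(
        [PSIADMIN, "-s", "-c", "list"],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


def wait_until_up(all_up, fetch=list_nodes, interval=5.0, max_tries=60,
                  sleep=time.sleep):
    """Repeat the query until all_up(output) is true.

    all_up is a caller-supplied predicate (hypothetical, since the exact
    output format is an assumption). Returns the number of attempts that
    were needed, or None if max_tries was exhausted.
    """
    for attempt in range(1, max_tries + 1):
        if all_up(fetch()):
            return attempt
        if attempt < max_tries:
            sleep(interval)
    return None
```

A caller might, for example, pass a predicate that checks that the word "down" does not occur in the output. That check is an assumption and should be adapted to what psiadmin actually prints on the local cluster.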
Chapter 5. Insight ParaStation5
This chapter provides more technical details and background information about ParaStation5.
5.1. ParaStation5 pscom communication library
The ParaStation communication library libpscom offers secure and reliable end-to-end connectivity.
The p4sock.ko module inserts a number of entries within the /proc filesystem. All ParaStation5 entries are located within the subdirectory /proc/sys/ps4. Three different subdirectories, listed below, are available.
• MaxAcksPending: maximum number of pending ACK messages until an "urgent" ACK message is sent.
• MaxDevSendQSize: maximum number of entries of the (protocol internal) send queue to the network device.
a predefined node list. If not defined, all currently known nodes are taken into account. Also, the variables PSI_NODES_SORT, PSI_LOOP_NODES_FIRST, PSI_EXCLUSIVE and PSI_OVERBOOK are observed.
In order to run applications linked with one of those MPI libraries, ParaStation5 provides dedicated mpirun commands. The processes for this type of parallel task are spawned obeying all restrictions described in Section 5.
PSP_SHM or PSP_SHAREDMEM
Don't use shared memory for communication within the same node.
PSP_P4S or PSP_P4SOCK
Don't use ParaStation p4sock protocol for communication.
etc/passwd. Common authentication schemes like NIS are not required on the compute nodes, so user management can be limited to the frontend nodes. Authentication of users is restricted to login or frontend nodes and is outside of the scope of ParaStation.
5.14. Integration with AFS
To run parallel tasks spawned by ParaStation on clusters using AFS, ParaStation provides the scripts env2tok and tok2env. On the frontend side, calling . tok2env will create an environment variable AFS_TOKEN containing an encoded access token for AFS.
If an external queuing system is used, the environment variable PSI_NODES_SORT should be set to "none", so that ParaStation does not sort any predefined node list.
# UseMCast statement. If Multicast is enabled, the ParaStation daemons exchange status information using multicast messages. Thus, a Linux kernel supporting multicast on all nodes of the cluster is required.
To list, sort and filter all the collected information, the command psaccview is available. See psaccounter(8) and psaccview(8) for details.
5.19. Using ParaStation process pinning
ParaStation is able to pin down compute tasks to particular cores.
and change the default port number 888. Modify the entry port = 888 within the file /etc/xinetd.d/psidstarter to reflect the newly assigned port number. In addition, the ParaStation daemon psid(8) uses the UDP port 886 for RDP connections.
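The change to the port = 888 entry can also be applied programmatically. The sketch below only rewrites a "port = N" line in a piece of configuration text. The function name set_port and the sample stanza in the usage note are hypothetical, and the real psidstarter file may contain further settings that are not shown here.

```python
import re

# Matches a line of the form "port = 888", keeping its indentation.
_PORT_LINE = re.compile(r"^([ \t]*port[ \t]*=[ \t]*)\d+", re.MULTILINE)


def set_port(config_text, new_port):
    """Return config_text with its single 'port = N' entry set to new_port.

    Raises ValueError if there is not exactly one such entry, so that an
    unexpected file layout is noticed instead of being silently rewritten.
    """
    if not 0 < int(new_port) < 65536:
        raise ValueError("port out of range: %r" % (new_port,))
    new_text, count = _PORT_LINE.subn(
        lambda match: match.group(1) + str(int(new_port)), config_text
    )
    if count != 1:
        raise ValueError("expected exactly one port entry, found %d" % count)
    return new_text
```

Applied to a hypothetical stanza such as "service psid { port = 888 }" written one setting per line, set_port(text, 1888) returns the same text with the entry changed to port = 1888. Writing the result back to the file and reloading xinetd are left to the administrator.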
Chapter 6. Troubleshooting
This chapter provides some hints on common problems seen while installing or using ParaStation5.
Or, logged on to this node, run psiadmin, which also starts up the ParaStation daemon psid. See Section 6.1, "Problem: psiadmin returns error" for more details. Check the logfile /var/log/messages on this node for error messages.
This typically happens if the frontend or head node is included as a compute node and also acts as gateway for the compute nodes. The "external" address of the frontend is not known to the compute nodes.
Make sure no other process uses this port. Alternatively, use the RDPPort directive within parastation.conf to redefine this port for all daemons within the cluster.
Reference Pages
This appendix lists all reference pages related to ParaStation5 administration tasks. For reference pages describing user related commands and information, refer to the ParaStation5 User's Guide.
parastation.conf
parastation.conf - the ParaStation configuration file
Description
Upon execution, the ParaStation daemon psid(8) reads its configuration information from a configuration file which, by default, is /etc/parastation.
The following five types of parameters within the Hardware environment will get a special handling from the ParaStation daemon psid(8). These define different script files called in order to execute various operations towards the corresponding communication hardware.
p4sock
Use optimized communication via (Gigabit) Ethernet. The script handling this hardware type ps_p4sock is also located in the config subdirectory. It understands the following two environment variables:
PS_TCP
If set to an address range, e.
accounter
This is actually a pseudo communication layer. It is only used for configuring nodes running the ParaStation accounting daemon and should be used only in a particular Nodes entry.
NrOfNodes num
Define the number of connected nodes including the frontend node.
Node[s] hostname id [HWType-entry] [starter-entry] [runJobs-entry] [env name value] [env { name value ... }]
Node[s] { { hostname id [HWType-entry] [starter-entry] [runJobs-entry] [env name value] [env { name value .
SelectTime time
Set the timeout of the central select(2) of the ParaStation daemon psid(8) to time seconds. The default value is 2 seconds. This parameter can be set during runtime via the set selecttime directive within the ParaStation administration and management tool psiadmin(1).
The default port to use is 886.
RLimit { Core size | CPUTime time | DataSize size | MemLock size | StackSize size | RSSize size }
RLimit { { Core size | CPUTime time | DataSize size | MemLock size | StackSize size | RSSize size }.
The value part of each line either is a single word or an expression enclosed by single or double quotes. The expression might contain whitespace characters. If the expression is enclosed by single quotes, it is allowed to use balanced or unbalanced double quotes within this expression and vice versa.
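The quoting rule above can be illustrated with a small parser. This is only a sketch of the rule as stated, not the parser used by psid(8): the function name parse_value is hypothetical, and the sketch assumes that no escape sequences exist and that nothing may follow the closing quote.

```python
def parse_value(text):
    """Parse the value part of a line according to the quoting rule.

    A value is either a single word or an expression enclosed by single
    or double quotes. Inside one kind of quote, the other kind may appear
    freely, balanced or not.
    """
    text = text.strip()
    if not text:
        raise ValueError("empty value")

    quote = text[0]
    if quote in ("'", '"'):
        # The expression ends at the next occurrence of the same quote.
        end = text.find(quote, 1)
        if end == -1:
            raise ValueError("missing closing quote")
        if text[end + 1:].strip():
            raise ValueError("unexpected text after closing quote")
        return text[1:end]

    # Unquoted values must be a single word without whitespace.
    if any(ch.isspace() for ch in text):
        raise ValueError("unquoted value must be a single word")
    return text
```

With this sketch, a single-quoted expression that contains an unbalanced double quote is accepted unchanged, while an unquoted value containing a blank is rejected.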
This only comes into play if the user does not define a sorting strategy explicitly via PSI_NODES_SORT. Be aware of the fact that using a batch-system like PBS or LSF *will* set the strategy explicitly, namely to NONE.
rdpMaxRetrans number
Set the maximum number of retransmissions within the RDP facility. If more than this number of retransmissions would have been necessary to deliver the packet to the remote destination, this connection is declared to be down.
ACK is sent piggyback within the next regular packet to this node or as soon as a retransmission occurred. If set to 1, each RDP packet received is acknowledged by an explicit ACK.
psiadmin
psiadmin - the ParaStation administration and management tool
Synopsis
psiadmin [ -denqrsv? ] [ -c command ] [ -f program-file ] [ --usage ]
Description
The psiadmin command provides an administrator interface to the ParaStation system.
--usage
Display a brief usage message.
Standard Input
The psiadmin command reads standard input for directives until end of file is reached, or the exit or quit directive is read.
If nodes is empty, the node range preselected via the range command is used. The default preselected node range contains all nodes of the ParaStation cluster. The from and to parts of each range are node IDs.
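How a node selection expands into node IDs can be sketched as follows. The exact syntax of nodes is not reproduced in this excerpt, so the comma-separated "from-to" notation used here is an assumption, and the function name expand_nodes is hypothetical. Only the fallback to the preselected range for an empty selection is taken from the text.

```python
def expand_nodes(spec, preselected):
    """Expand a node selection into a list of node IDs.

    spec uses an assumed notation: comma-separated entries, each either a
    single node ID ("7") or a range "from-to" ("0-3"). An empty spec
    falls back to the preselected node range, as described in the text.
    """
    if not spec.strip():
        return list(preselected)

    ids = []
    for part in spec.split(","):
        first, dash, last = part.strip().partition("-")
        start = int(first)
        end = int(last) if dash else start
        if end < start:
            raise ValueError("range end before start: %r" % part)
        ids.extend(range(start, end + 1))
    return ids
```

For a cluster whose preselected range is all 16 nodes, expand_nodes("", range(16)) yields every node ID, while a selection such as "0-3,7" yields only the five listed nodes.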
count [hw hw]
List the status of the communication system(s) on the selected node(s). Various counters are displayed. If the hw option is given, only the counters concerning the hw hardware type are displayed.
TaskID
The ParaStation task ID of the process, both as decimal and hexadecimal number. The task ID of a process is unique within the cluster and is composed of the ParaStation ID of the node the process is running on and the local process ID of the process, i.
range {[nodes] | all }
Preselect or display the default set of nodes. If nodes or all is given, this directive modifies the default set of nodes all following directives will act on. nodes is given in the same syntax as within any other directive, i.
master [ nodes ]
Show the current master on the selected node(s). The master node's task is the management and allocation of resources within the cluster. It is elected among the running nodes during runtime.
cpumap [ nodes ]
Show the CPU-slot to core mapping list for the selected nodes.
bindmem [ nodes ]
Show flag marking if this node uses binding as NUMA policy.
adminuser [ nodes ]
Show users allowed to start admin-tasks, i.
rl_sigpending [ nodes ]
Show RLIMIT_SIGPENDING on this node.
rl_stack [ nodes ]
Show RLIMIT_STACK on this node.
supplementaryGroups [ nodes ]
Show supplementaryGroups flag.
statusBroadcasts [ nodes ]
Show the maximum number of status broadcasts initiated by lost connections to other daemons.
hwstart [hw { hw | all }] [ nodes ]
Start the declared hardware on the selected nodes. Starting a specific hardware will be tried on the selected nodes regardless of whether this hardware is specified for these nodes within the parastation.
adminuser [ + | - ] { name | any } [ nodes ]
Grant authorization to start admin-tasks, i.e. tasks not blocking a dedicated CPU, to a particular or any user.
Pattern    Name          Description
0x0000001  PSC_LOG_PART  Partitioning functions (i.e. PSpart_())
0x0000002  PSC_LOG_TASK  Task structure handling (i.
Pattern  Name          Description
0x0001   RDP_LOG_CONN  Uncritical errors on connection loss
0x0002   RDP_LOG_INIT  Info from initialization (IP, FE, NFTS etc.
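The patterns in these tables are bit values, so several of them can be combined into one mask with a bitwise OR. The sketch below uses only the two RDP patterns visible above; further patterns exist but are not reproduced in this excerpt. The helper names build_mask and decode_mask are hypothetical.

```python
# Only the two patterns shown in the table above. The full table contains
# more entries that are not reproduced in this excerpt.
RDP_LOG_BITS = {
    "RDP_LOG_CONN": 0x0001,
    "RDP_LOG_INIT": 0x0002,
}


def build_mask(names):
    """Combine the named patterns into a single mask value."""
    mask = 0
    for name in names:
        mask |= RDP_LOG_BITS[name]
    return mask


def decode_mask(mask):
    """Return the names of all known patterns that are set in mask."""
    return [name for name, bit in RDP_LOG_BITS.items() if mask & bit]
```

Enabling both patterns therefore gives the mask 0x0003, and decoding 0x0002 names only RDP_LOG_INIT. How the resulting mask is passed to the daemon is described with the corresponding psiadmin directive.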
nodesSort { PROC | LOAD_1 | LOAD_5 | LOAD_15 | PROC+LOAD | NONE } [ nodes ]
Define the default sorting strategy for nodes when attaching them to a partition.
bindmem [ 0 | 1 ] [ nodes ]
Set flag marking if this node will use memory-binding as NUMA policy. Relevant values are 'false', 'true', 'no', 'yes', 0 or different from 0.
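The accepted flag values can be summarised in a few lines of code. This is a sketch of the rule as stated, not the parser used by psiadmin: the function name parse_flag is hypothetical, and treating the words case-insensitively is an assumption.

```python
def parse_flag(text):
    """Interpret a flag value: 'true'/'yes', 'false'/'no', or a number.

    A number counts as false if it is 0 and as true otherwise.
    Case-insensitive matching of the words is an assumption.
    """
    word = text.strip().lower()
    if word in ("true", "yes"):
        return True
    if word in ("false", "no"):
        return False
    # Anything else must be a decimal number; int() raises ValueError
    # for input that is neither a known word nor a number.
    return int(word) != 0
```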
quiet
Quiet execution. Only a short message is printed if the test was successful.
normal
Normal execution with some messages during runtime. This is the default.
verbose
Very verbose execution with many messages during runtime.
psid
psid - the ParaStation daemon. The organizer of the ParaStation software architecture.
Synopsis
psid [-v?] [-d level] [-f configfile] [-l logfile] [--usage]
Description
The ParaStation daemon is implemented as a Unix daemon process.
Options
-d, --debug=level
Activate the debugging mode and set the debugging level to level. If debugging is enabled, i.e. if level is larger than 0 and option -l is set to stdout, no fork(2) is made on startup, which is usually done in order to run psid as a daemon process in background.
test_config
test_config - verify the ParaStation4 configuration file.
Synopsis
test_config [-vad?] [-v] [-a] [-d] [-?] [-f filename]
Description
test_config reads and analyses the ParaStation4 configuration file.
test_nodes
test_nodes - test physical connections within a cluster.
Synopsis
test_nodes [-np num] [-cnt count] [-map] [-type]
Description
Tests all or some physical (low level) connections within a cluster.
test_pse
test_pse - test virtual connections within a cluster.
Synopsis
test_pse [-np num]
Description
This command spawns num processes within the cluster. It's intended to test the process spawning capabilities of ParaStation.
p4stat
p4stat - display information about the p4sock protocol.
Synopsis
p4stat [ -v ] [ -s ] [ -n ] [ -? ] [ --sock ] [ --net ] [ --version ] [ --help ] [ --usage ]
Description
Display information for sockets and network connections using the ParaStation4 protocol p4sock.
p4tcp
p4tcp - configure the ParaStation4 TCP bypass.
Synopsis
p4tcp [ -v ] [ -a ] [ -d ] [ -? ] [ from [ to ]]
Description
p4tcp configures the ParaStation4 TCP bypass. Without an argument, the current configuration is printed.
psaccounter
psaccounter - Write accounting information from the ParaStation psid to the accounting files.
Synopsis
psaccounter [ -e | --extend ] [ -d | --d.
Calling psaccounter with -p gzip would call the command gzip yyyymmdd and therefore compress the least recently used accounting file.
-c, --dumpcore
Define that a core file should be written in case of a catastrophe.
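The call generated for the -p option in the example above (gzip yyyymmdd) can be pictured with a short sketch. It only shows how the command and a file name of the form yyyymmdd fit together. The function name postprocess_command is hypothetical, and how psaccounter actually selects the file and invokes the command is not covered here.

```python
from datetime import date


def postprocess_command(command, day):
    """Build the argument list 'command yyyymmdd' for the given day.

    The yyyymmdd file name follows the example in the text. The result
    is returned as a list, not executed.
    """
    return [command, day.strftime("%Y%m%d")]
```

For the print date of this guide, postprocess_command("gzip", date(2010, 4, 7)) gives the argument list for the call gzip 20100407.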
psaccview
psaccview - Print ParaStation accounting information.
Synopsis
psaccview [ -? | --help ] [ -h | --human ] [ -nh | --noheader ] [ -l | --logdir= d.
Grouping jobs
-lj, --ljobs
Print detailed jobs list. Lists all jobs, one per line.
-lu, --ltotuser
Print user list. Lists job summary per user, one user per line.
-lg, --ltotgroup
Print group list. Lists job summary per group, one group per line.
Upon startup psaccview tries to find the file .psaccviewrc in the user's home directory. Within this file, pre-defined variables in the command may be redefined. See the configuration section within the psaccview script.
These column names may also be used for sorting lists, where applicable.
Files
/var/account/*, /var/account/*.gz, /var/account/*.bz2
Accounting files, one per day.
$HOME/.psaccviewrc
Initialization file.
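Since the daily accounting files may be plain, gzip-compressed or bzip2-compressed, a script that reads them has to choose the right opener per file. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, and the record format inside the files is not interpreted: lines are returned as raw text.

```python
import bz2
import gzip
from pathlib import Path


def open_accounting_file(path):
    """Open one accounting file as text, whatever its compression."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    if path.suffix == ".bz2":
        return bz2.open(path, "rt")
    return open(path, "rt")


def read_accounting_lines(logdir="/var/account"):
    """Yield every line of every accounting file, in file name order.

    The default directory is the one listed under Files above.
    """
    for path in sorted(Path(logdir).iterdir()):
        if not path.is_file():
            continue
        with open_accounting_file(path) as handle:
            for line in handle:
                yield line.rstrip("\n")
```

For listing, sorting and filtering the records themselves, psaccview remains the tool to use. A sketch like this is only useful for simple tasks such as counting lines or searching for a job name.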
mlisten
mlisten - display multicast pings from the ParaStation daemon psid(8)
Synopsis
mlisten [-dv?] [-m MCAST] [-p PORT] [-n IP] [-# NODES] [--usage]
Description
Display the multicast pings the ParaStation daemon psid(8) is emitting continuously.
Appendix A. Quick Installation Guide
This appendix gives a brief overview of how to install ParaStation5 on a cluster. A detailed description can be found in Chapter 3, Installation and Chapter 4, Configuration.
Provided the ParaStation daemon is started by xinetd, run the psiadmin(1) command located in /opt/parastation/bin and execute the add command. This will bring up the ParaStation daemon psid(8) on every node.
Appendix B. ParaStation license
The ParaStation software may be used under the following terms and conditions only.
Software and Know-how License Agreement
Version 1.0
between ParTec Cluster Competence Center GmbH place of business: Possartstr.
Commercial Use means any non-consumer use that is not covered by University Use. Know-how means program documents and information which relates to Software, .
§ 6 Grant-Back
1. Licensee grants ParTec for Modifications being severable improvements a nonexclusive, perpetual, irrevocable, worldwide and royalty-free license, and for Modifications being non-severable improvements an exclusive, perpetual, irrevocable, worldwide and royalty-free license to a.
2. A breach by Licensee of any one of the obligations under sections §4, §5 and §6, will automatically terminate Licensee's rights under this license.
§ 12 Rights after Expiration of the Agreement
1.
Appendix C. Upgrading ParaStation4 to ParaStation5
This appendix explains how to upgrade an existing ParaStation4 installation to the current ParaStation5 version.
C.1. Building and installing ParaStation5 packages
Just recompile the packages:
# rpmbuild --rebuild psmgmt.
Use the mpiexec command instead! Executables linked with ParaStation4 can be run using the new mpiexec command. In this case, the option -b or --bnr is required. The environment variable PSP_P4SOCK was renamed to PSP_P4S, but is still recognized.
Glossary
Address Resolution Protocol
A sending host decides, through a protocol's routing mechanism, that it wants to transmit to a target host located some place on a connected piece of a physical network. To actually transmit the hardware packet, a hardware address must usually be generated.
to store it to a given address. The rest of the job is done by this controller without producing further load on the CPU. Obviously this concept helps to relieve the CPU of work which is not its first task and thus gives more power to solve the actual application.
Serial Task
A single process running on one of the compute nodes within the cluster. This process does not communicate with other processes using MPI. ParaStation knows about this process and where it is started from.